javascript - Large dataset visualisation - Stack Overflow


I have hourly data spanning more than 20 years, and I would like some hints on how to display such a large amount of data in the browser. I want to display the data as time series, because all of the data sets have the same format (a value at a certain time) even though they represent different kinds of information. I looked at d3.js and managed to plot all my data (20 years or more), then used brushing to zoom in, based on this very good example.
But the browser cannot handle that much data and becomes extremely slow.
On the server side I use servlets to send the data in JSON format.

The data sets represent different kinds of information, but they all share the same format: a time and a value.

Thanks for any advice, hints and examples of best practices for visualizing large datasets.


asked Jul 11, 2014 at 10:10 by jerome (edited Jul 11, 2014 at 12:25)
  • Aggregate the data or show only part of it and load on demand? – Lars Kotthoff Commented Jul 11, 2014 at 10:31
  • The kind of visualization also depends on the kind of data (whether a line or an area chart is better, for example; even a calendar choropleth would be cool if the data allows it). But yes, I agree with Lars: you should try to find a way to show the data that isn't too crowded. It would be cool to compare the same days, or the same hours, for example, to see if patterns emerge. – tomtomtom Commented Jul 11, 2014 at 10:40
  • It is continuous data; what would be a good aggregation for showing it yearly? For a precise hourly view I will use a line chart with the method described below, but if another kind of chart gives a good overview of the 20 years of data, I'll take it. – jerome Commented Jul 11, 2014 at 12:40

2 Answers


Don't bring all the data to the client side.

Instead, you could implement a server side method that will look like this:
getData(startDate, endDate, maxSteps)

This method always returns at most maxSteps records; which records it returns is entirely up to you and your data. I would suggest one of the following approaches.

The following steps are common to both methods:

  • get all records available between startDate and endDate
  • if there are fewer records than maxSteps, return all of them

Using the subset of records determined by startDate and endDate continue with the following steps.
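
The common steps above could be sketched as follows. This is only an illustration in JavaScript (the question's server side uses Java servlets, where the same logic applies). The record shape `{ time, value }`, the assumption that records are sorted by time, and the function names `firstIndex`, `recordsInRange` and `commonSteps` are all hypothetical and not part of the original answer.

```javascript
// Assumed (hypothetical) record shape: { time: <ms since epoch>, value: <number> },
// with `records` sorted by ascending time.

// Binary search: index of the first record for which `pred` is true.
// `pred` must be false for a prefix of the array and true for the rest.
function firstIndex(records, pred) {
  let lo = 0;
  let hi = records.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (pred(records[mid])) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}

// All records with startDate <= time <= endDate.
function recordsInRange(records, startDate, endDate) {
  const from = firstIndex(records, (r) => r.time >= startDate);
  const to = firstIndex(records, (r) => r.time > endDate);
  return records.slice(from, to);
}

// Shared first steps of both methods. `done` is true when the subset
// already fits within maxSteps and can be returned unchanged.
function commonSteps(records, startDate, endDate, maxSteps) {
  const subset = recordsInRange(records, startDate, endDate);
  return { subset, done: subset.length <= maxSteps };
}
```

When `done` is false, `subset` is what Method 1 or Method 2 would then reduce to at most maxSteps records.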

Method 1: return actual records from your data. Determining the right ones can be expensive:

  • determine equidistant points in your data
  • get records from data that are closest to the selected points

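    // Pseudocode: pick maxSteps equidistant points in time and return the
    // record closest to each one.
    point = startDate;
    // With maxSteps = 1 there is a single point, so avoid dividing by zero.
    stepTimeSpan = maxSteps > 1 ? (endDate - startDate) / (maxSteps - 1) : 0;
    for (i = 0; i < maxSteps; i++)
    {
        records.Add(getClosestTo(point));
        point = point + stepTimeSpan;
    }
    return records;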
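
A runnable JavaScript sketch of Method 1, under stated assumptions: `records` is the subset already restricted to the requested interval, it is sorted by time, and each record looks like `{ time, value }` with `time` in milliseconds. The names `closestTo` and `sampleEquidistant` are hypothetical. A binary search keeps the "closest record" lookup cheap, which addresses the cost concern mentioned above.

```javascript
// Record closest in time to t. Assumes `records` is non-empty and sorted by time.
function closestTo(records, t) {
  let lo = 0;
  let hi = records.length;
  // Binary search for the first record with time >= t.
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (records[mid].time < t) lo = mid + 1;
    else hi = mid;
  }
  if (lo === 0) return records[0];
  if (lo === records.length) return records[records.length - 1];
  const before = records[lo - 1];
  const after = records[lo];
  return t - before.time <= after.time - t ? before : after;
}

// Returns at most maxSteps actual records, spread evenly over the interval.
function sampleEquidistant(records, startDate, endDate, maxSteps) {
  if (records.length <= maxSteps) return records.slice();
  if (maxSteps <= 0) return [];
  if (maxSteps === 1) return [closestTo(records, startDate)];
  const step = (endDate - startDate) / (maxSteps - 1);
  const result = [];
  for (let i = 0; i < maxSteps; i++) {
    const rec = closestTo(records, startDate + i * step);
    // Two neighbouring sample points can map to the same record; keep it once.
    if (result[result.length - 1] !== rec) result.push(rec);
  }
  return result;
}
```

Because duplicates are skipped, the result can contain fewer than maxSteps records when the data has gaps.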
    

Method 2: return records resulting from aggregation:

  • split the records into maxSteps buckets (by date)
  • obtain one record from each bucket as the result of an aggregation

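    // Pseudocode: average each of the maxSteps time buckets into one record.
    bucketStart = startDate;
    bucketTimeSpan = (endDate - startDate) / maxSteps;
    for (i = 0; i < maxSteps; i++)
    {
        bucket = getRecordsBetween(bucketStart, bucketStart + bucketTimeSpan);
        if (bucket is not empty) // a gap in the data would otherwise give an undefined average
            records.Add( new Record( AvgDate(bucket), AvgValue(bucket) ) );
        bucketStart = bucketStart + bucketTimeSpan;
    }
    return records;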
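
A runnable JavaScript sketch of Method 2, again assuming records of the hypothetical shape `{ time, value }` with `time` in milliseconds. The function name `aggregateByBuckets` is made up for illustration. It makes a single pass over the records instead of querying each bucket separately, and uses the average as the aggregation; a min, max or median could be substituted.

```javascript
// Averages the records of each of maxSteps equal time buckets into one record.
function aggregateByBuckets(records, startDate, endDate, maxSteps) {
  if (records.length <= maxSteps) return records.slice();
  if (maxSteps <= 0) return [];
  const span = (endDate - startDate) / maxSteps;
  // One running sum per bucket; time is summed as an offset from startDate
  // to keep the numbers small.
  const sums = Array.from({ length: maxSteps }, () => ({ offset: 0, value: 0, count: 0 }));
  for (const r of records) {
    if (r.time < startDate || r.time > endDate) continue;
    // Clamp so that a record exactly at endDate falls into the last bucket.
    const idx = span > 0
      ? Math.min(maxSteps - 1, Math.floor((r.time - startDate) / span))
      : 0;
    sums[idx].offset += r.time - startDate;
    sums[idx].value += r.value;
    sums[idx].count += 1;
  }
  // Empty buckets (gaps in the data) produce no record.
  return sums
    .filter((b) => b.count > 0)
    .map((b) => ({
      time: startDate + b.offset / b.count,
      value: b.value / b.count,
    }));
}
```

Averaging smooths out spikes, so this method suits an overview better than a view in which individual extremes matter.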
    

Call this method on the client side each time the user changes the interval (using the small chart at the bottom of your example).
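
On the client, the call could look roughly like the sketch below. The endpoint path and the query parameter names (`start`, `end`, `maxSteps`) are assumptions, not something the question or answer specifies; adapt them to whatever your servlet expects. The `debounce` helper is an addition of this sketch: it delays the request until the user has stopped dragging, so the server is not hit on every intermediate brush position.

```javascript
// Builds the request URL. Parameter names are hypothetical.
function buildDataUrl(baseUrl, startDate, endDate, maxSteps) {
  const params = new URLSearchParams({
    start: String(startDate.getTime()),
    end: String(endDate.getTime()),
    maxSteps: String(maxSteps),
  });
  return `${baseUrl}?${params.toString()}`;
}

// Returns a wrapper that runs fn only after waitMs without further calls.
function debounce(fn, waitMs) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// Fetches the reduced series for the visible interval and passes it to `render`,
// a callback you supply to redraw the chart.
async function loadRange(baseUrl, startDate, endDate, maxSteps, render) {
  const response = await fetch(buildDataUrl(baseUrl, startDate, endDate, maxSteps));
  if (!response.ok) throw new Error(`Request failed: ${response.status}`);
  render(await response.json());
}
```

From the brush handler you would then call something like `debounce(loadRange, 200)` with the selected start and end dates. A reasonable starting value for maxSteps is the width of the chart in pixels, since more points than pixels cannot be distinguished on a line chart.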

Play with maxSteps value until you find the right balance between performance and detail.

An issue with libraries such as d3.js is that they rely on SVG to draw the data and keep an object referencing it. Depending on the size of your dataset, this can lead to a DOM explosion. You could sample the data on the server before sending it to the browser, but granularity and accuracy could be lost; you may need those non-outlier points to identify trends. It really depends on the size of your dataset, though.
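
If you do sample, one way to limit the loss of detail is min/max decimation: keep the lowest and the highest record of each bucket instead of an average, so spikes survive. This technique is not part of the answer above; the sketch below is only an illustration, and the record shape `{ time, value }` and the name `minMaxDecimate` are assumptions.

```javascript
// Keeps the minimum and maximum record of each bucket, in time order.
// Returns at most 2 * bucketCount records. Assumes `records` is sorted by time.
function minMaxDecimate(records, bucketCount) {
  if (bucketCount <= 0) return [];
  if (records.length <= 2 * bucketCount) return records.slice();
  const result = [];
  for (let b = 0; b < bucketCount; b++) {
    // Integer bucket boundaries by index; the last bucket ends at records.length.
    const from = Math.floor((b * records.length) / bucketCount);
    const to = Math.floor(((b + 1) * records.length) / bucketCount);
    let min = records[from];
    let max = records[from];
    for (let i = from + 1; i < to; i++) {
      if (records[i].value < min.value) min = records[i];
      if (records[i].value > max.value) max = records[i];
    }
    if (min === max) result.push(min); // flat bucket: one record is enough
    else if (min.time <= max.time) result.push(min, max);
    else result.push(max, min);
  }
  return result;
}
```

With bucketCount set to roughly the chart width in pixels, the drawn line covers the same vertical range per pixel column as the full data would.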

Assuming a dataset of ~175,200 points (one for every hour in 20 years), I would suggest a library called ZingChart (http://www.zingchart.). It has many styling options, but more importantly it offers different rendering modes (SVG or canvas) that can handle the amount of data you are trying to visualize. In particular, take note of the zoom function, which can visualize every single point, and of the ability to add custom tags to each node.
