Summarise transformation

{ summarise: ... }

summarise is used to calculate summary statistics over columns or groups.

Instructions

Type    Description                                                 Result
Object  Object containing aggregation instructions per new column  New dataframe with summarised data

Usage

summarise is most useful in combination with groupBy or binning. For the sake of explanation, however, the next section first shows how to summarise an entire dataframe. The section after that applies the same principles to data grouped with groupBy or binning.

Summarising an entire dataframe

The summarise instructions Object has new column names as keys and 'aggregation instructions' as values:

{ summarise: { <new column name>: <aggregation instructions> } }

The aggregation instructions can be either an Object or a Function. When passed as an Object, it must contain exactly one key/value pair. The key is the name of a column in the data to which the transformation is applied; the value is either a String with the name of an aggregation function or an actual Function.

{ summarise: { <new column name>: { <old column name>: ... } } }

For example, to calculate the sum of all values in a single column, we can use the 'sum' aggregation:

<vgg-data
  :data="{ a: [1, 2, 3, 4] }"
  :transform="{ summarise: { sumA: { a: 'sum' } } }"
>

  <!-- Data scope: { sumA: [10] } -->

</vgg-data>

The following aggregation Functions can be accessed through the String syntax:

Aggregation  Description
count        Number of occurrences/rows
sum          Sum of all values in column
mean         Mean of all values in column
median       Median of all values in column
mode         Most frequently occurring value in column
min          Lowest value in column
max          Highest value in column
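
To make the table concrete, here is a minimal plain-JavaScript sketch of what each named aggregation computes on a column array. The `aggregations` object and its implementations are illustrative assumptions, not the library's actual code. In particular, how the library breaks ties in `mode` or treats empty columns and missing values is not specified here.

```javascript
// Illustrative sketch only: hypothetical stand-ins for the named aggregations.
const aggregations = {
  count: col => col.length,
  sum: col => col.reduce((total, value) => total + value, 0),
  mean: col => aggregations.sum(col) / col.length,
  median: col => {
    const sorted = [...col].sort((x, y) => x - y)
    const mid = Math.floor(sorted.length / 2)
    // Even length: average the two middle values
    return sorted.length % 2 === 0
      ? (sorted[mid - 1] + sorted[mid]) / 2
      : sorted[mid]
  },
  mode: col => {
    const counts = new Map()
    let best
    let bestCount = 0
    for (const value of col) {
      const n = (counts.get(value) || 0) + 1
      counts.set(value, n)
      // Assumption: on ties, the value that reaches the top count first wins
      if (n > bestCount) { best = value; bestCount = n }
    }
    return best
  },
  min: col => Math.min(...col),
  max: col => Math.max(...col)
}

const a = [1, 2, 3, 4]
console.log(aggregations.sum(a))    // 10
console.log(aggregations.mean(a))   // 2.5
console.log(aggregations.median(a)) // 2.5
```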

Alternatively, you can use your own custom Function. This Function will be called with the entire column as its first argument:

<vgg-data
  :data="{ a: [1, 2, 3, 4] }"
  :transform="{ summarise: {
    meanA: { a: col => {
      let sum = 0
      col.forEach(value => { sum += value })
      return sum / col.length
    } }
  } }"
>

  <!-- Data scope: { meanA: [2.5] } -->

</vgg-data>
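
The Object form with a custom Function can be pictured as follows. This is a rough sketch under the assumption that the data is an object of column arrays, as in the examples above; `summariseColumns` and `range` are hypothetical names introduced for illustration and are not part of the library.

```javascript
// Illustrative sketch only: hypothetical helper, not the library's code.
// Handles only the Object form of the instructions with a Function value.
function summariseColumns (data, instructions) {
  const result = {}
  for (const newCol of Object.keys(instructions)) {
    // Each instruction Object has exactly one key/value pair:
    // the old column name and the aggregation Function
    const [oldCol, aggregate] = Object.entries(instructions[newCol])[0]
    // The Function receives the entire column as its first argument
    result[newCol] = [aggregate(data[oldCol])]
  }
  return result
}

// Custom aggregation: difference between the highest and lowest value
const range = col => Math.max(...col) - Math.min(...col)

console.log(summariseColumns({ a: [1, 2, 3, 4] }, { rangeA: { a: range } }))
// { rangeA: [ 3 ] }
```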

Finally, we can provide the aggregation instructions as a Function instead of an Object. This Function receives the dataframe itself (df in the example below) rather than a single column:

{ summarise: { <new column name>: <aggregation Function> } }

This example computes the same mean as the one above, but because the Function can access every column, more advanced summaries are possible with this method:

<vgg-data
  :data="{ a: [1, 2, 3, 4], b: [5, 6, 7, 8] }"
  :transform="{ summarise: {
    meanA: df => {
      let sum = 0
      let col = df.a
      col.forEach(value => { sum += value })
      return sum / col.length
    }
  } }"
>

  <!-- Data scope: { meanA: [2.5] } -->

</vgg-data>
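
As one possible 'more advanced' summary, the sketch below combines two columns: the mean of `a` weighted by `b`. It assumes, as in the example above, that the Function is handed the dataframe as an object of column arrays; `weightedMeanA` is a hypothetical name chosen for illustration.

```javascript
// Hypothetical aggregation Function: mean of column `a`, weighted by column `b`.
// Assumes the Function receives the dataframe as an object of column arrays.
const weightedMeanA = df => {
  let weightedSum = 0
  let weightTotal = 0
  df.a.forEach((value, i) => {
    weightedSum += value * df.b[i]
    weightTotal += df.b[i]
  })
  return weightedSum / weightTotal
}

const data = { a: [1, 2, 3, 4], b: [5, 6, 7, 8] }
console.log(weightedMeanA(data)) // 70 / 26, roughly 2.69
```

Under the same assumption, a Function like this could be passed in the transform as `{ summarise: { weightedMeanA } }`.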

Summarising groups

As mentioned above, summarise is especially powerful in combination with grouped data. For more information on how to create grouped data, see the groupBy and binning documentation.

The summarise syntax is exactly the same for grouped data. The difference is that, instead of a dataframe with only one row, a dataframe with one row per group is returned. For example, to calculate the mean value per fruit:

<vgg-data
  :data="{ value: [1, 2, 3, 4], fruit: ['apple', 'banana', 'apple', 'banana'] }"
  :transform="[
    { groupBy: 'fruit' },
    { summarise: { meanFruit: { value: 'mean' } } }
  ]"
>

  <!-- Data scope: { fruit: ['apple', 'banana'], meanFruit: [2, 3] } -->

</vgg-data>
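
Conceptually, the grouped case splits the summarised column by group key and applies the aggregation once per group. The following plain-JavaScript sketch reproduces the result above under that assumption; `summariseByGroup` is a hypothetical helper, not the library's implementation, and the order in which the library returns groups may differ.

```javascript
// Illustrative sketch only: not the library's implementation.
// Groups a column-oriented dataframe by one column, then applies a
// column aggregation Function to each group.
function summariseByGroup (data, groupCol, newCol, oldCol, aggregate) {
  const groups = new Map() // a Map keeps group keys in first-seen order
  data[groupCol].forEach((key, i) => {
    if (!groups.has(key)) groups.set(key, [])
    groups.get(key).push(data[oldCol][i])
  })
  const result = { [groupCol]: [], [newCol]: [] }
  for (const [key, values] of groups) {
    result[groupCol].push(key)
    result[newCol].push(aggregate(values)) // one row per group
  }
  return result
}

const mean = col => col.reduce((total, value) => total + value, 0) / col.length
const data = { value: [1, 2, 3, 4], fruit: ['apple', 'banana', 'apple', 'banana'] }

console.log(summariseByGroup(data, 'fruit', 'meanFruit', 'value', mean))
// { fruit: [ 'apple', 'banana' ], meanFruit: [ 2, 3 ] }
```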

Or, to calculate the number of items in each bin when using binning:

<vgg-data
  :data="{ a: [1, 2, 3, 4, 5, 6, 7], b: [8, 9, 10, 11, 12, 13, 14] }"
  :transform="[
    { binning: { groupBy: 'a', method: 'EqualInterval', numClasses: 3 } },
    { summarise: { binCount: { b: 'count' } } }
  ]"
>

  <!-- Data scope: {
    bins: [[1, 3], [4, 5], [6, 7]],
    binCount: [3, 2, 2]
  } -->

</vgg-data>