User Tools

Site Tools


transformations:createbuckets

CREATE BUCKETS

Category: Transform / Advanced


Description

Creates numerical ranges of fixed width and assigns each row to one of the ranges based on values in the specified column.


Use cases

The action is typically used for grouping number values (such as age or tenure) by intervals of fixed width (e.g. age group). Created group labels can later be used in reports and charts.


Action settings

SettingDescription
ColumnSelect the column containing the values to create buckets for.
Bucket column nameEnter a name for the new column containing the bucket range values.
Bottom threshold*Enter the lowest value to assign to a bucket. Values below this threshold will be assigned to a "less than…" bucket.
Bucket width*Enter the number of values each value range (bucket) will contain.
Upper threshold*Enter the highest value to assign to a bucket. Values above this threshold will be assigned to a "greater than…" bucket.
ModeOptions: Lower boundary inclusive, upper exclusive or Lower boundary exclusive, upper inclusive. See table below.
Label styleSelect the format the bucket labels will appear in. Options: 0 < X < 10, < 0, > 10; 0 to 10, less than 0, greater
than 10
; or 0 .. 10, < 0, > 10.
Add columns with lower/upper boundariesWhen selected, an additional column named Lower boundary will be created to hold the lower boundary
values of the buckets, and a column named Upper boundary will be created to hold the upper boundary values of the buckets.
Generate empty rows for empty bucketsWhen checked, will generate rows for all bucket ranges in which a value did not fall ("unused" buckets).

* Setting can be specified using a parameter.

Mode settings

ModeDescription
Lower boundary inclusive,
upper exclusive
In this mode, values falling on the lower boundary will be included in the bucket.
Lower boundary exclusive,
upper inclusive
In this mode, values falling on the upper boundary will be included in the bucket.


Remarks

As bucket ranges reuse values for upper and lower boundaries, the Mode determines where a value that falls on a boundary ends up. For example, a value of "50", with the buckets 40 to 50 and 50 to 60 would fall into the 40 to 50 bucket if the "lower exclusive, upper inclusive" mode. If the "lower inclusive, upper exclusive" mode is used, it will fall into the 50 to 60 bucket.

Values that fall into the "less than…" range will have no lower boundary value. Values that fall into the "greater than…" range will have no upper boundary value.

This process is also called "binning".


Examples

Example #1

Generate buckets for a list of age values.

Before (source table)

Age
21
33
43
65
72
81

After (result table)

Age Age Ranges
21less than 30
3330 to 40
4340 to 50
6560 to 70
7270 to 80
81greater than 80

Notice that values that fall on the upper boundary are included within that range due to the "lower exclusive, upper inclusive" setting.

Action parameters

Column: Age
Bucket column name: Age Ranges
Bottom threshold: 30
Bucket width: 10
Upper threshold: 80
Mode: lower exclusive, upper inclusive
Label style: 0 to 10, less than 0, greater than 10


See also

transformations/createbuckets.txt · Last modified: 2025/01/15 17:51 by craigt

Donate Powered by PHP Valid HTML5 Valid CSS Driven by DokuWiki