DEDUPLICATE ROWS
Category: Transform / Advanced
Description
This action removes duplicate rows from the table so that only one copy of each row remains. Rows can be compared using all columns, or only selected columns.
Use cases
Use Deduplicate rows to clean a dataset of records that were duplicated in the source data or by previous actions.
Action settings
Setting | Description |
---|---|
Apply to | Select which columns are compared to detect duplicates. Options: All columns, or Selected columns (then choose the columns to compare from the list). |
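The Apply to setting can be sketched in Python as follows. This is a minimal illustration, not the product's implementation: the function name `deduplicate`, the table representation (a header list plus rows as tuples), and the rule that the first occurrence is kept (inferred from the Remarks) are all assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[object, ...]


def deduplicate(header: Sequence[str], rows: List[Row],
                columns: Optional[Sequence[str]] = None) -> List[Row]:
    """Hypothetical sketch of Deduplicate rows.

    columns=None stands in for Apply to "All columns";
    a list of column names stands in for "Selected columns".
    Assumes the first (topmost) occurrence of each duplicate is kept.
    """
    if columns is None:
        indices = list(range(len(header)))  # compare every field
    else:
        # Compare only the named fields; raises ValueError for an unknown name.
        indices = [header.index(name) for name in columns]

    seen = set()
    kept: List[Row] = []
    for row in rows:
        key = tuple(row[i] for i in indices)
        if key not in seen:  # first occurrence wins, later copies are dropped
            seen.add(key)
            kept.append(row)
    return kept
```

For example, `deduplicate(header, rows, ["River"])` would treat two rows as duplicates whenever their River values match, regardless of the other columns.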
Remarks
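When deduplicating based on selected columns:
- Values in non-selected columns are not compared. Rows that match in the selected columns are treated as duplicates even if they differ in other columns.
- Duplicate rows are removed from the bottom up: the first (topmost) occurrence is kept and later occurrences are removed.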
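The two remarks can be illustrated with a short Python sketch. It is not the product's code; the helper `dedupe_selected` and its sample rows are hypothetical, and the second Nile row deliberately carries a different length value to show that a non-selected column does not prevent a match.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def dedupe_selected(rows: Iterable[Tuple], key_indices: Sequence[int]) -> List[Tuple]:
    """Hypothetical helper: keep the first row for each combination of
    values found at key_indices; all other columns are ignored."""
    first_seen: Dict[Tuple, Tuple] = {}
    for row in rows:
        key = tuple(row[i] for i in key_indices)
        # setdefault stores the row only if the key is new, so the topmost
        # row wins and rows further down with the same key are discarded.
        first_seen.setdefault(key, row)
    # Dicts preserve insertion order (Python 3.7+), so the original row order is kept.
    return list(first_seen.values())


# Hypothetical rows: River, Length (km), Continent.
rows = [
    ("Nile", 6650, "Africa"),
    ("Amazon", 6400, "South America"),
    ("Nile", 6695, "Africa"),  # differs only in the non-selected Length column
]

# Selected columns: River (index 0) and Continent (index 2).
for row in dedupe_selected(rows, [0, 2]):
    print(row)
```

Because Length is not selected, the third row counts as a duplicate of the first and is removed; comparing on all three indices would keep all three rows.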
Examples
Objective: Find and remove duplicate rows.
Source table: The longest rivers in the world
River | Length (km) | Continent |
---|---|---|
Nile | 6650 | Africa |
Amazon | 6400 | South America |
Nile | 6650 | Africa |
Action parameters:
Apply to "All columns"
Result:
River | Length (km) | Continent |
---|---|---|
Nile | 6650 | Africa |
Amazon | 6400 | South America |
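The example above can be reproduced in plain Python to make the expected behaviour concrete. This only mirrors the result table and assumes, as the Remarks describe, that the first occurrence is kept; it says nothing about how the action is implemented internally. With Apply to "All columns", the whole row acts as the comparison key, which `dict.fromkeys` handles while preserving order.

```python
# Source table from the example (rows as tuples: River, Length (km), Continent).
rivers = [
    ("Nile", 6650, "Africa"),
    ("Amazon", 6400, "South America"),
    ("Nile", 6650, "Africa"),
]

# dict.fromkeys keeps the first occurrence of each key in original order;
# using the whole row as the key corresponds to Apply to "All columns".
result = list(dict.fromkeys(rivers))

for river, length_km, continent in result:
    print(f"{river} | {length_km} | {continent}")
# Nile | 6650 | Africa
# Amazon | 6400 | South America
```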
Last modified: 2021/04/25 12:17 by craigt