DEDUPLICATE ROWS

Category: Transform / Filters


Description

This action removes duplicate rows from the table, keeping one copy of each. Rows can be compared on all columns, or only on specific columns.


Use cases

Use Deduplicate rows to clean datasets of records that may have been duplicated in the source dataset, or during previous actions.


Action settings

Setting     Description
Apply to    Select whether to base deduplication on all columns in the dataset, or only on selected columns.
            Options: All columns, or Selected columns (then select the columns to use from the list).


Remarks

When deduplicating based on specific columns:

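  • Values in non-selected columns are ignored: rows that match on the selected columns count as duplicates even if their other columns differ.
  • Duplicate rows are removed from the bottom up, so the topmost occurrence of each row is the one that is kept.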
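
The two remarks above can be sketched in plain Python. This is only an illustration of the described behaviour, not the tool's implementation: the `deduplicate` function, its `columns` parameter, and the altered length in the third row are all hypothetical.

```python
def deduplicate(rows, columns=None):
    """Return rows with duplicates removed, keeping the first occurrence.

    rows: list of dicts, one per table row (hypothetical representation).
    columns: column names to compare; None means compare all columns.
    """
    seen = set()
    result = []
    for row in rows:
        keys = columns if columns is not None else sorted(row)
        key = tuple(row[c] for c in keys)
        if key not in seen:
            # First time this combination of values appears: keep the row.
            seen.add(key)
            result.append(row)
    return result


# Illustrative data: the third row's length is deliberately altered.
rows = [
    {"River": "Nile", "Length (km)": 6650, "Continent": "Africa"},
    {"River": "Amazon", "Length (km)": 6400, "Continent": "South America"},
    {"River": "Nile", "Length (km)": 6853, "Continent": "Africa"},
]

by_river = deduplicate(rows, columns=["River"])  # compares "River" only
all_cols = deduplicate(rows)                     # compares every column
```

Here `by_river` holds two rows, because the second Nile row matches the first on the selected column and its differing length is ignored. `all_cols` keeps all three rows, since no two rows agree on every column.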


Examples

Example: Find and remove duplicate rows.

Source table: The longest rivers in the world

River     Length (km)   Continent
Nile      6650          Africa
Amazon    6400          South America
Nile      6650          Africa


Action parameters:

Apply to "All columns"


Result table:

River     Length (km)   Continent
Nile      6650          Africa
Amazon    6400          South America
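
For comparison, the same result can be reproduced outside the tool with a short Python sketch. It assumes each row is represented as a tuple and relies on `dict` preserving insertion order (Python 3.7+); it is an equivalent illustration, not how the action itself is implemented.

```python
# Source table from the example, one tuple per row.
rows = [
    ("Nile", 6650, "Africa"),
    ("Amazon", 6400, "South America"),
    ("Nile", 6650, "Africa"),
]

# dict.fromkeys keeps the first occurrence of each key, in order,
# which matches deduplicating on all columns.
result = list(dict.fromkeys(rows))

for row in result:
    print(row)
```

The second Nile row is dropped and the remaining rows keep their original order, as in the result table.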



transformations/deduplicate.txt · Last modified: 2021/07/19 02:15 by craigt
