
KEEP DUPLICATES

Category: Transform / Filters


Description

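This action keeps all duplicate rows and removes all unique rows. Every occurrence of a duplicated row is kept, including the first one. Duplicates can be detected across all columns or only in specific columns. In the latter case, a row counts as a duplicate when its values in the selected columns match those of another row, and values in the unselected columns are ignored.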
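The behavior described above can be illustrated with a minimal Python sketch. It assumes a table is represented as a list of dictionaries, one per row, all with the same columns. The function name `keep_duplicates` and its `columns` parameter are hypothetical and only mirror the "Apply to" setting: `None` stands for "All columns", and a list of column names stands for "Selected columns". This is not EasyMorph's implementation.

```python
from collections import Counter
from typing import Optional, Sequence


def keep_duplicates(rows: list[dict], columns: Optional[Sequence[str]] = None) -> list[dict]:
    """Hypothetical helper: return only rows whose key occurs more than once.

    columns=None  -> "All columns": every column forms the key.
    columns=[...] -> "Selected columns": only these columns form the key,
                     all other columns are ignored.
    Assumes every row has the same columns. Original row order is preserved.
    """
    if not rows:
        return []
    key_columns = list(columns) if columns is not None else list(rows[0].keys())
    # Build the comparison key of each row from the chosen columns.
    keys = [tuple(row[c] for c in key_columns) for row in rows]
    counts = Counter(keys)
    # Keep every occurrence of a repeated key, including the first one.
    return [row for row, key in zip(rows, keys) if counts[key] > 1]
```

Readers familiar with pandas may recognize a comparable filter in `df[df.duplicated(subset=columns, keep=False)]`, where `keep=False` marks all occurrences of a duplicate rather than only the repeats.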


Use cases

In datasets where every record should be unique, this action helps with cleaning by isolating duplicate values or records for review, so you can decide whether those records should be removed or modified.


Action settings

Setting    Description
Apply to   Select whether to check the values in all columns for duplicates, or only in selected columns. Options: "All columns" or "Selected columns" (then select the columns to check).


Examples

Example: Find duplicates in column "Continent".

Source table: The longest rivers in the world

River                     Length (km)   Continent
Nile                      6650          Africa
Amazon                    6400          South America
Yangtze                   6300          Eurasia
Yellow River (Huang He)   5464          Eurasia


Action parameters:

Apply to "Selected columns"
Columns: "Continent"


Result table:

River                     Length (km)   Continent
Yangtze                   6300          Eurasia
Yellow River (Huang He)   5464          Eurasia
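The example above can be reproduced with a short, self-contained Python sketch. Representing rows as tuples and the variable names used here are illustrative assumptions, not part of EasyMorph.

```python
from collections import Counter

# Source table rows: (River, Length (km), Continent)
rivers = [
    ("Nile", 6650, "Africa"),
    ("Amazon", 6400, "South America"),
    ("Yangtze", 6300, "Eurasia"),
    ("Yellow River (Huang He)", 5464, "Eurasia"),
]

CONTINENT = 2  # index of "Continent", the only selected column

# Count how often each Continent value occurs.
counts = Counter(row[CONTINENT] for row in rivers)

# Keep every row whose Continent value occurs more than once.
result = [row for row in rivers if counts[row[CONTINENT]] > 1]

for river, length_km, continent in result:
    print(river, length_km, continent)
```

This prints the two Eurasian rivers, matching the result table. With "All columns" instead, no row would be kept, because every row of the source table is unique when all of its values are compared.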


Community examples


See also

transformations/keepduplicates.txt · Last modified: 2021/07/19 02:13 by craigt
