transformations:deduplicate

Differences

This shows you the differences between two versions of the page.

transformations:deduplicate [2016/06/12 11:12] dmitrytransformations:deduplicate [2021/04/25 12:17] craigt
Line 1: Line 1:
-===== Deduplicate =====+{{ transformations:DeduplicateAction.png}} 
 +====== DEDUPLICATE ROWS ====== 
 +Category: Transform / Advanced\\
  
-This transformation removes all duplicate rows in entire table. Deduplication can be performed on all columns, or only on specific columns. In the latter case uniqueness of values in non-selected columns will be ignored.+\\  
 +=====Description===== 
 +This action removes all duplicate rows in the entire table. Deduplication can be performed based on all columns, or only on specific columns.\\
  
-EXAMPLE+\\ 
 +=====Use cases===== 
 +Use //Deduplicate rows// to clean datasets of records that may have been duplicated in the source dataset, or during previous actions. 
  
-**Source table:** The longest rivers in the world+\\  
 +=====Action settings===== 
 +^Setting ^Description ^ 
 +|Apply to|Select whether to base deduplication on all columns in the dataset, or only selected columns.\\  Options:  //All columns// or //Selected columns// (and select the columns to use from the list).|
  
-^  River  ^  Length (km)  ^  Continent  ^ +\\ 
-| Nile  |  6650  |  Africa  | +=====Remarks===== 
-| Amazon  |  6400  |  South America  | +When deduplicating based on specific columns: 
-| Nile  |  6650  |  Africa  |+  * The uniqueness of values in non-selected columns will be ignored. 
 +  * Duplicate rows are removed from the bottom of the dataset up, so the first occurrence of each row is kept.
  
-**Objective:** Find and remove duplicate rows. 
  
-**Output table:**+\\  
 +=====Examples===== 
 +**Objective:** Find and remove duplicate rows.\\
  
-^  River  ^  Length (km)  ^  Continent  ^ +**Source table:** The longest rivers in the world 
-| Nile  |  6650  |  Africa  | +^ River  ^ Length (km)  ^ Continent  ^
-| Amazon  |  6400  |  South America  |+| Nile  |  6650| Africa  |
 +| Amazon  |  6400| South America  |
 +| Nile  |  6650| Africa  |
 +\\  
 +**Action parameters:** 
 +> Apply to "All columns"
  
 +\\ 
 +**Result:**
 +^River  ^Length (km)  ^Continent  ^
 +| Nile  |  6650| Africa  |
 +| Amazon  |  6400| South America  |
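
The behaviour described above can be sketched in plain Python. This is only an illustration, not the tool's actual implementation: the function name `deduplicate_rows` and its `columns` parameter are hypothetical, and the sketch assumes that removing duplicates "from the bottom up" means the first occurrence of each row is kept, which matches the result table.

```python
from typing import Iterable, Optional, Sequence


def deduplicate_rows(rows: Iterable[dict],
                     columns: Optional[Sequence[str]] = None) -> list:
    """Return rows with duplicates removed, keeping the first occurrence.

    Hypothetical helper: columns=None compares whole rows ("All columns");
    otherwise only the named columns are compared ("Selected columns"),
    so values in the other columns are ignored.
    """
    seen = set()
    result = []
    for row in rows:
        names = columns if columns is not None else sorted(row)
        key = tuple((name, row[name]) for name in names)
        if key not in seen:  # later rows with the same key are dropped
            seen.add(key)
            result.append(row)
    return result


# Source table from the example: the longest rivers in the world.
rivers = [
    {"River": "Nile", "Length (km)": 6650, "Continent": "Africa"},
    {"River": "Amazon", "Length (km)": 6400, "Continent": "South America"},
    {"River": "Nile", "Length (km)": 6650, "Continent": "Africa"},
]

all_columns = deduplicate_rows(rivers)  # Apply to "All columns"
by_continent = deduplicate_rows(rivers, columns=["Continent"])  # "Selected columns"
```

With this source table both calls return the Nile row followed by the Amazon row. The two modes differ only when rows agree on the selected columns but not on the remaining ones.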
  
 +\\ 
 +=====See also=====
 +  * [[transformations:keepduplicates|Keep duplicates]]
transformations/deduplicate.txt · Last modified: 2021/07/19 02:15 by craigt
