transformations:filesplitter · current revision 2022/12/04 16:34 by dmitry (edited section: Action settings) · previous revision 2020/02/19 06:30 by dmitry
{{ transformations:FileSplitterAction.png}}
======SPLIT DELIMITED FILE ======
Category: Import / File\\
  
\\ 
=====Description=====
This action helps process very large text files. It splits a text file into smaller chunks (which are also text files) that can then be processed using iterations. Chunks either have a fixed length (a row count) or are split by the unique values of a column (e.g. one chunk per unique Date).\\
  
\\ 
=====Action settings=====
^ Setting  ^ Description  ^
|Input file<sup>*</sup>|File name of the text file to be split, including its path (relative or absolute).|
|Encoding|Encoding of the input file: ASCII, ANSI (with a code page), or another encoding. If you're not sure which to choose, try UTF-8, the most\\ common Unicode encoding.|
|Skip first lines<sup>*</sup>|The number of lines to skip at the beginning of the file. This helps when the first rows contain\\ metadata (a file header) and the actual tabular data starts only after it.|
|Ignore quoting | When checked, double quotes are treated as regular characters rather than as quotes that enclose a value. |
|Output folder<sup>*</sup>|The folder where the split files are saved.|
|Output mode|How the input file is split.\\Options: //Split by fixed row count// (define the number of rows), or\\ //Split by column value// (define the separator and column).|
<sup>*</sup> Setting can be specified using a [[:parameters|parameter]].\\
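The effect of //Skip first lines// and //Ignore quoting// can be illustrated with Python's standard ''csv'' module. This is a rough analogy, not the action's actual implementation: the helper ''read_rows'' and the sample text are hypothetical, and ''csv.QUOTE_NONE'' stands in for checking //Ignore quoting//.

```python
import csv
import io
import itertools


def read_rows(text, skip_first_lines=0, ignore_quoting=False, separator=","):
    """Hypothetical helper: parse delimited text the way the settings describe."""
    lines = io.StringIO(text)
    # "Skip first lines": drop metadata lines before the tabular data.
    data_lines = itertools.islice(lines, skip_first_lines, None)
    # "Ignore quoting": QUOTE_NONE treats double quotes as ordinary characters.
    quoting = csv.QUOTE_NONE if ignore_quoting else csv.QUOTE_MINIMAL
    return list(csv.reader(data_lines, delimiter=separator, quoting=quoting))


# Hypothetical sample: one metadata line, then a header and one data row.
sample = 'Export header line\nName,City\nSmith,"Portland, OR"\n'
print(read_rows(sample, skip_first_lines=1))
# [['Name', 'City'], ['Smith', 'Portland, OR']]
print(read_rows(sample, skip_first_lines=1, ignore_quoting=True))
# [['Name', 'City'], ['Smith', '"Portland', ' OR"']]
```

With quoting honored, the comma inside ''"Portland, OR"'' stays part of one value; with quoting ignored, the quotes are kept as ordinary characters and the value is split in two.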
\\ 
====Output mode settings====
^Setting^Description^
|Split by fixed row count|The input file is split into a new file after every //Chunk size (rows)<sup>*</sup>// rows.\\ An 8-digit sequence number is appended to each output filename.|
|Split by column value|The input file is split into one file per unique value found in the specified column. Set the delimiter\\ the file uses with the //Separator// setting, and select the column that holds the unique values with the //Column// setting.\\ Each output file has its unique value appended to its filename.|
<sup>*</sup> Setting can be specified using a [[:parameters|parameter]].\\
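A minimal Python sketch of //Split by fixed row count//, assuming the input is read line by line so that only one chunk is held in memory at a time. The function ''split_by_row_count'', the ''<name>_00000001<ext>'' naming pattern, and the line-based reading are assumptions for illustration; this page only states that an 8-digit sequence number is appended. Unlike a full delimited-text parser, the sketch does not handle quoted values that contain line breaks.

```python
import itertools
from pathlib import Path


def split_by_row_count(input_file, output_folder, chunk_size,
                       encoding="utf-8", skip_first_lines=0):
    """Hypothetical helper: stream input_file into files of chunk_size lines.

    Returns the full paths of the files it wrote.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    src = Path(input_file)
    out_dir = Path(output_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with src.open("r", encoding=encoding, newline="") as f:
        # Skip metadata lines at the top of the file ("Skip first lines").
        lines = itertools.islice(f, skip_first_lines, None)
        for seq in itertools.count(1):
            # Only the current chunk is held in memory.
            chunk = list(itertools.islice(lines, chunk_size))
            if not chunk:
                break
            # Assumed naming pattern: <stem>_<8-digit sequence><suffix>
            target = out_dir / f"{src.stem}_{seq:08d}{src.suffix}"
            with target.open("w", encoding=encoding, newline="") as out:
                out.writelines(chunk)
            written.append(str(target.resolve()))
    return written
```

For a 25-line ''data.txt'' and a chunk size of 10, this would write ''data_00000001.txt'' through ''data_00000003.txt'' (the last holding 5 lines) and return their full paths, similar to the filenames dataset described in the Remarks.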
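A minimal Python sketch of //Split by column value//: rows are streamed one at a time and each row is written to the output file for its value, so memory grows with the number of distinct values rather than with the file size. Assumptions not specified on this page: the first line is a header naming the columns, that header is copied into every output file, and files are named ''<name>_<value><ext>''. The function name is hypothetical, and a real implementation would also need to sanitize values that are not valid in filenames and cope with operating-system limits on open files.

```python
import csv
from pathlib import Path


def split_by_column_value(input_file, output_folder, column,
                          separator=",", encoding="utf-8"):
    """Hypothetical helper: write one file per unique value of `column`.

    Assumes the first line is a header row; returns the written file paths.
    """
    src = Path(input_file)
    out_dir = Path(output_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = {}  # unique value -> (open file, csv writer)
    written = []
    try:
        with src.open("r", encoding=encoding, newline="") as f:
            reader = csv.reader(f, delimiter=separator)
            header = next(reader, None)
            if header is None:  # empty input file
                return written
            col = header.index(column)  # raises ValueError if the column is missing
            for row in reader:
                value = row[col]
                if value not in outputs:
                    # Assumed naming pattern: <stem>_<value><suffix>
                    target = out_dir / f"{src.stem}_{value}{src.suffix}"
                    out = target.open("w", encoding=encoding, newline="")
                    writer = csv.writer(out, delimiter=separator)
                    writer.writerow(header)  # assumption: repeat the header
                    outputs[value] = (out, writer)
                    written.append(str(target.resolve()))
                outputs[value][1].writerow(row)
    finally:
        # Close every output file, even if an error occurred.
        for out, _ in outputs.values():
            out.close()
    return written
```

Files are created in the order their values first appear, so the returned list also follows that order.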
\\ 
=====Remarks=====
Since this action doesn't load the entire file into memory, it can split files too large to fit in RAM.

In addition to splitting the input file, this action outputs a single-column dataset that lists the filenames of the split files.\\
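The filenames dataset is what enables chunk-by-chunk processing: each filename can be handed to an iteration that works on one chunk at a time. As a loose Python analogy (''process_chunks'' and the line-counting handler are hypothetical, not part of the product), iterating over such a list could look like this:

```python
from pathlib import Path


def process_chunks(filenames, handle_chunk):
    """Hypothetical helper: apply handle_chunk to each split file in turn."""
    results = {}
    for name in filenames:
        # Each call sees only one chunk, never the whole original file.
        results[name] = handle_chunk(Path(name))
    return results


def count_lines(path, encoding="utf-8"):
    """Example handler: count the lines in one chunk by streaming it."""
    with path.open("r", encoding=encoding) as f:
        return sum(1 for _ in f)
```

Calling ''process_chunks(filenames, count_lines)'' returns a mapping from each filename to its line count.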

\\ 
=====Examples=====
**Example 1:** Splitting a comma-delimited text file with 10,000 rows into chunks of 1,000 rows.\\

\\ 
**Action parameters:**
>  (Split by fixed row count) Chunk size (rows) is 1000\\

\\ 
**Results:**
  * Ten 1,000-row files, with "00000001" through "00000010" appended to the filenames.
  * A workflow dataset containing the fully qualified filenames of the split files.

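The file count in Example 1 is simple arithmetic: 10,000 rows in chunks of 1,000 rows gives 10 files. A short Python check of the expected suffixes, assuming (this page does not say so) that a final partial chunk also gets its own file, so 10,500 rows would give 11 files:

```python
def expected_chunk_suffixes(total_rows, chunk_size):
    """Hypothetical helper: 8-digit suffixes a fixed-row-count split yields."""
    # Ceiling division: assumes a final partial chunk gets its own file.
    count = (total_rows + chunk_size - 1) // chunk_size
    return [f"{seq:08d}" for seq in range(1, count + 1)]


suffixes = expected_chunk_suffixes(10_000, 1_000)
print(len(suffixes), suffixes[0], suffixes[-1])  # 10 00000001 00000010
```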
\\ \\ 
**Example 2:** Splitting a comma-delimited text file with 10,000 rows by each unique Region value.\\

\\ 
**Action parameters:**
>  (Split by column value) Separator is Comma\\
>  (Split by column value) Column is Region

\\ 
**Results:**
  * One file for each unique Region value, with the Region value appended to each filename.
  * A workflow dataset containing the fully qualified filenames of the split files.

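In Example 2, the number of output files equals the number of distinct Region values, not the number of rows. A small Python illustration with made-up sample data (the helper ''distinct_values'' and the data are hypothetical, and it assumes the first line is a header row):

```python
import csv
import io


def distinct_values(text, column, separator=","):
    """Hypothetical helper: distinct values of `column` in first-seen order."""
    reader = csv.DictReader(io.StringIO(text), delimiter=separator)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(row[column] for row in reader))


# Made-up sample: four rows, three distinct regions, so three output files.
sample = "Region,Sales\nEast,10\nWest,20\nEast,5\nNorth,7\n"
print(distinct_values(sample, "Region"))  # ['East', 'West', 'North']
```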

\\ 
=====See also=====

  * [[transformations:importtext|Import delimited text file]]
  
transformations/filesplitter.1582111853.txt.gz · Last modified: 2020/02/19 06:30 by dmitry
