Table of Contents
Category: Transform / Advanced
This action removes "invisible" characters from text values that are frequently unwanted because they may lead to mismatches and wrong merges:
- Hidden system characters
- Line breaks
- Leading spaces
- Trailing spaces
- Repeating spaces
Non-text values (numbers, symbols, etc.) are not affected by this action.
- Use Sanitize text on text columns to be used in a merge action, just prior to performing the merge, to ensure "hidden" characters don't prevent proper matches.
- Remove markup tags from XML- or HTML-based files, leaving the plain text for downstream processing.
|Remove system characters||When checked, ASCII characters 0-31 are removed, except for tab, carriage return and line feed characters.|
|Tabs||Select how tab characters embedded in the text will be handled. Options: Do nothing, Remove, Remove repeating,
and Replace with spaces.
|Line breaks||Select how line breaks embedded in the text will be handled. Options: Do nothing, Remove, Remove repeating,
and Replace with spaces.
|Remove ASCII FE-FF||When checked, the characters with ASCII codes 0xFE (hexadecimal, 254 decimal) and 0xFF (hexadecimal, 255 decimal) will be removed.|
|Trim leading spaces||When checked, whitespace occurring at the start of text will be removed.|
|Trim trailing spaces||When checked, whitespace occurring at the end of text will be removed.|
|Remove repeating spaces||When checked, instances of more than one, adjacent space will be converted to a single space.|
|Remove XML/HTML tabs||When checked, all XML and HTML markup tags will be removed.|
|Sanitize columns||Select whether to sanitize all columns, or selected columns. Options: Sanitize all columns or Sanitize only
selected columns (and select which columns to process).
The Remove repeating spaces option removes repeating spaces from anywhere within the text, leading spaces, and trailing spaces. All occurrences found within a text value will be replaced, so more than one instance within a single text value will be addressed.
Example: Clean out all unneeded text characters.
Source data: (raw text shown for clarity)
Sample Text " 2 Leading spaces" "2 Trailing spaces " "<b>Bold HTML tags</b>" "2 spaces here-> and 3 spaces here-> ."
Row 1 requires Trim leading spaces
Row 2 requires Trim trailing spaces
Row 3 requires Remove XML/HTML tags
Row 4 requires Remove repeating spaces
Sanitize all columns.
|2 Leading spaces|
|2 Trailing spaces|
|Bold HTML tags|
|2 spaces here→ and 3 spaces here→ .|
- “Printed” Text File: Could EasyMorph import this? (Project; Module: Parse Group; Group: Tab 1; Table: Header (3); Action position: 5)
- How to pull data from web APIs with pagination (Project; Module: Main; Group: Group 1; Table: Query API with pagination;
Action position: 5)
transformations/sanitize.txt · Last modified: 2021/07/19 02:18 by craigt