transformations:sanitize
Table of Contents
SANITIZE TEXT
Category: Transform / Advanced
Description
This action removes "invisible" characters from text values that are frequently unwanted because they may lead to mismatches and wrong merges:
- Hidden system characters
- Tabs
- Line breaks
- Leading spaces
- Trailing spaces
- Repeating spaces
Non-text values (numbers, symbols, etc.) are not affected by this action.
Use cases
- Use Sanitize text on text columns to be used in a merge action, just prior to performing the merge, to ensure "hidden" characters don't prevent proper matches.
- Remove markup tags from XML- or HTML-based files, leaving the plain text for downstream processing.
Action settings
Setting | Description |
---|---|
Remove system characters | When checked, ASCII characters 0-31 are removed, except for tab, carriage return and line feed characters. |
Tabs | Select how tab characters embedded in the text will be handled. Options: Do nothing, Remove, Remove repeating, and Replace with spaces. |
Line breaks | Select how line breaks embedded in the text will be handled. Options: Do nothing, Remove, Remove repeating, and Replace with spaces. |
Remove ASCII FE-FF | When checked, the characters with ASCII codes 0xFE (hexadecimal, 254 decimal) and 0xFF (hexadecimal, 255 decimal) will be removed. |
Trim leading spaces | When checked, whitespace occurring at the start of text will be removed. |
Trim trailing spaces | When checked, whitespace occurring at the end of text will be removed. |
Remove repeating spaces | When checked, instances of more than one, adjacent space will be converted to a single space. |
Remove XML/HTML tabs | When checked, all XML and HTML markup tags will be removed. |
Sanitize columns | Select whether to sanitize all columns, or selected columns. Options: Sanitize all columns or Sanitize only selected columns (and select which columns to process). |
Remarks
The Remove repeating spaces option removes repeating spaces from anywhere within the text, leading spaces, and trailing spaces. All occurrences found within a text value will be replaced, so more than one instance within a single text value will be addressed.
Examples
Example: Clean out all unneeded text characters.
Source data: (raw text shown for clarity)
Sample Text " 2 Leading spaces" "2 Trailing spaces " "<b>Bold HTML tags</b>" "2 spaces here-> and 3 spaces here-> ."
Action parameters:
Row 1 requires Trim leading spaces
Row 2 requires Trim trailing spaces
Row 3 requires Remove XML/HTML tags
Row 4 requires Remove repeating spaces
Sanitize all columns.
Result table:
Sample Text |
---|
2 Leading spaces |
2 Trailing spaces |
Bold HTML tags |
2 spaces here→ and 3 spaces here→ . |
Community examples
- “Printed” Text File: Could EasyMorph import this? (Project; Module: Parse Group; Group: Tab 1; Table: Header (3); Action position: 5)
- How to pull data from web APIs with pagination (Project; Module: Main; Group: Group 1; Table: Query API with pagination;
Action position: 5)
transformations/sanitize.txt · Last modified: 2021/07/19 02:18 by craigt