User Tools

Site Tools


transformations:sanitize

SANITIZE TEXT

Category: Transform / Advanced


Description

This action removes "invisible" characters from text values that are frequently unwanted because they may lead to mismatches and wrong merges:

  • Hidden system characters
  • Tabs
  • Line breaks
  • Leading spaces
  • Trailing spaces
  • Repeating spaces

Non-text values (numbers, symbols, etc.) are not affected by this action.


Use cases

  • Use Sanitize text on text columns to be used in a merge action, just prior to performing the merge, to ensure "hidden" characters don't prevent proper matches.
  • Remove markup tags from XML- or HTML-based files, leaving the plain text for downstream processing.


Action settings

SettingDescription
Remove system charactersWhen checked, ASCII characters 0-31 are removed, except for tab, carriage return and line feed characters.
TabsSelect how tab characters embedded in the text will be handled. Options: Do nothing, Remove, Remove repeating,
and Replace with spaces.
Line breaksSelect how line breaks embedded in the text will be handled. Options: Do nothing, Remove, Remove repeating,
and Replace with spaces.
Remove ASCII FE-FFWhen checked, the characters with ASCII codes 0xFE (hexadecimal, 254 decimal) and 0xFF (hexadecimal, 255 decimal) will be removed.
Trim leading spacesWhen checked, whitespace occurring at the start of text will be removed.
Trim trailing spacesWhen checked, whitespace occurring at the end of text will be removed.
Remove repeating spacesWhen checked, instances of more than one, adjacent space will be converted to a single space.
Remove XML/HTML tabsWhen checked, all XML and HTML markup tags will be removed.
Sanitize columnsSelect whether to sanitize all columns, or selected columns. Options: Sanitize all columns or Sanitize only
selected columns
(and select which columns to process).


Remarks

The Remove repeating spaces option removes repeating spaces from anywhere within the text, leading spaces, and trailing spaces. All occurrences found within a text value will be replaced, so more than one instance within a single text value will be addressed.


Examples

Objective: Clean out all unneeded text characters.

Source data: (raw text shown for clarity)

Sample Text
"  2 Leading spaces"
"2 Trailing spaces  "
"<b>Bold HTML tags</b>"
"2 spaces here->  and 3 spaces here->   ."


Action parameters:

Row 1 requires Trim leading spaces
Row 2 requires Trim trailing spaces
Row 3 requires Remove XML/HTML tags
Row 4 requires Remove repeating spaces
Sanitize all columns.


Output:

Sample Text
2 Leading spaces
2 Trailing spaces
Bold HTML tags
2 spaces here→ and 3 spaces here→ .
transformations/sanitize.txt · Last modified: 2021/04/14 14:29 by dmitry