{{ transformations:SanitizeAction.png}}
====== SANITIZE TEXT ======
Category: Transform / Advanced\\
\\
=====Description=====
This action removes "invisible" characters from text values. Such characters are usually unwanted because they can cause mismatches and incorrect merges:
  * Hidden system characters
  * Tabs
  * Line breaks
  * Leading spaces
  * Trailing spaces
  * Repeating spaces

Non-text values (numbers, symbols, etc.) are not affected by this action.\\
\\
=====Use cases=====
  * Use //Sanitize text// on text columns that will be used in a merge action, just before performing the merge, to ensure that "hidden" characters don't prevent proper matches.
  * Remove markup tags from XML- or HTML-based files, leaving the plain text for downstream processing.

\\
=====Action settings=====
^Setting^Description^
|Remove system characters|When checked, ASCII characters 0-31 are removed, except for //tab//, //carriage return// and //line feed// characters.|
|Tabs|Select how tab characters embedded in the text will be handled. Options: //Do nothing//, //Remove//, //Remove repeating//,\\ and //Replace with spaces//.|
|Line breaks|Select how line breaks embedded in the text will be handled. Options: //Do nothing//, //Remove//, //Remove repeating//,\\ and //Replace with spaces//.|
|Remove ASCII FE-FF|When checked, the characters with codes 0xFE (254 decimal) and 0xFF (255 decimal) are removed.|
|Trim leading spaces|When checked, whitespace occurring at the start of text is removed.|
|Trim trailing spaces|When checked, whitespace occurring at the end of text is removed.|
|Remove repeating spaces|When checked, each run of two or more adjacent spaces is converted to a single space.|
|Remove XML/HTML tags|When checked, all XML and HTML markup tags are removed.|
|Sanitize columns|Select whether to sanitize all columns, or selected columns.
Options: //Sanitize all columns// or //Sanitize only selected columns// (and select which columns to process).|

\\
=====Remarks=====
The //Remove repeating spaces// option collapses repeating spaces //anywhere// within the text, including among leading and trailing spaces. Every occurrence within a text value is replaced, so a value that contains several runs of repeated spaces is cleaned in full.\\
\\
=====Examples=====
**Example:** Clean out all unneeded text characters.\\
\\
**Source data:** (raw text shown for clarity)\\
Sample Text\\
"  2 Leading spaces"\\
"2 Trailing spaces  "\\
"Bold HTML tags"\\
"2 spaces here->  and 3 spaces here->   ."\\
\\
**Action parameters:**
>Row 1 requires //Trim leading spaces//
>Row 2 requires //Trim trailing spaces//
>Row 3 requires //Remove XML/HTML tags//
>Row 4 requires //Remove repeating spaces//
>Sanitize all columns.

\\
**Result table:**
^Sample Text^
|2 Leading spaces|
|2 Trailing spaces|
|Bold HTML tags|
|2 spaces here-> and 3 spaces here-> .|

\\
====Community examples====
  * [[https://community.easymorph.com/t//2008/2|“Printed” Text File: Could EasyMorph import this?]] ([[https://community.easymorph.com/uploads/short-url/kIb1qqOJb9WK6D1N1jFZdnHzC46.morph|Project]]; Module: //Parse Group//; Group: //Tab 1//; Table: //Header (3)//; Action position: //5//)
  * [[https://community.easymorph.com/t//2160/1|How to pull data from web APIs with pagination]] ([[https://community.easymorph.com/uploads/short-url/dvCSpcEDXYZ8aB0B2gtnt7qulTF.morph|Project]]; Module: //Main//; Group: //Group 1//; Table: //Query API with pagination//; Action position: //5//)

\\
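====Illustrative sketch====
For readers who want to reason about what each setting in the //Action settings// table does, the following Python sketch approximates the behavior described on this page. It is not EasyMorph's implementation. The function name ''sanitize_text'', its parameter names, the default values, the order in which the steps run, the reading of //Remove repeating// as "collapse a run to its first occurrence", and the simple tag-matching pattern are all assumptions made for illustration.

```python
import re


def _apply_mode(text, unit, mode):
    """Handle tabs or line breaks according to a (hypothetical) mode name."""
    run = re.compile(f"(?:{unit})+")
    if mode == "remove":
        return run.sub("", text)
    if mode == "remove_repeating":
        # Assumption: a run of repeats is collapsed to its first occurrence.
        first = re.compile(unit)
        return run.sub(lambda m: first.match(m.group(0)).group(0), text)
    if mode == "replace_with_spaces":
        # Each tab or line break becomes exactly one space.
        return re.sub(unit, " ", text)
    return text  # "do_nothing"


def sanitize_text(value,
                  remove_system_chars=True,
                  tabs="do_nothing",
                  line_breaks="do_nothing",
                  remove_fe_ff=False,
                  trim_leading=True,
                  trim_trailing=True,
                  remove_repeating_spaces=True,
                  remove_tags=False):
    """Approximate the Sanitize text settings for a single value."""
    if not isinstance(value, str):
        return value  # non-text values are left untouched
    text = value
    if remove_tags:
        # Naive pattern: anything between '<' and '>' counts as a tag.
        text = re.sub(r"<[^>]+>", "", text)
    if remove_system_chars:
        # Codes 0-31 except tab (9), line feed (10), carriage return (13).
        text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", text)
    if remove_fe_ff:
        text = re.sub(r"[\xfe\xff]", "", text)
    text = _apply_mode(text, r"\t", tabs)
    text = _apply_mode(text, r"\r\n|\r|\n", line_breaks)
    if remove_repeating_spaces:
        text = re.sub(r" {2,}", " ", text)
    if trim_leading:
        text = text.lstrip()
    if trim_trailing:
        text = text.rstrip()
    return text


if __name__ == "__main__":
    samples = [
        "  2 Leading spaces",
        "2 Trailing spaces  ",
        "2 spaces here->  and 3 spaces here->   .",
    ]
    for sample in samples:
        print(repr(sanitize_text(sample)))
```

Running the sketch over the sample rows from the example above prints each value with leading, trailing, and repeated spaces removed. Passing ''remove_tags=True'' additionally strips simple markup tags.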