
Transliteration and Character Set Mapping

Transliterate non-Latin scripts so that addresses can be easily validated.

Overview

Are you facing the challenge of working with foreign languages like Greek, Russian, Japanese or Chinese without being a native speaker?

Address Verification Character Set Mapping and Transliteration helps you work with strings in almost all common character sets. It maps between 40 different character sets and transliterates the six non-Latin writing systems listed below into Latin characters. Its features include:

  • Mapping between 40 different character sets, including UTF-8, ISO 8859-1, GBK, BIG5, JIS, EBCDIC
  • Character filter on 'a'-'z', 'A'-'Z' and '0'-'9'
  • Correct "removal" of diacritics according to language-specific rules
  • HTML and URL encoding and decoding
  • Unix <-> Windows line break conversions
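
To make the operations in this list concrete, here is a minimal sketch that uses only the Python standard library. It is not the Address Verification API: every function name and the GERMAN_RULES table are hypothetical, chosen for illustration, and the diacritic handling covers only a handful of German characters.

```python
import html
import re
import unicodedata
from typing import Mapping, Optional
from urllib.parse import quote, unquote


def remap(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one character set to another via Unicode."""
    return data.decode(source).encode(target)


def keep_ascii_alnum(text: str) -> str:
    """Drop every character outside a-z, A-Z and 0-9."""
    return re.sub(r"[^A-Za-z0-9]", "", text)


# Hypothetical example of language-specific rules (German subset only).
GERMAN_RULES = {"ä": "ae", "ö": "oe", "ü": "ue",
                "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}


def strip_diacritics(text: str, rules: Optional[Mapping[str, str]] = None) -> str:
    """Apply language rules first, then remove any remaining combining marks."""
    for src, dst in (rules or {}).items():
        text = text.replace(src, dst)
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_unix_newlines(text: str) -> str:
    """Convert Windows line breaks (CRLF) to Unix line breaks (LF)."""
    return text.replace("\r\n", "\n")


def to_windows_newlines(text: str) -> str:
    """Convert to CRLF; normalise first so existing CRLF is not doubled."""
    return to_unix_newlines(text).replace("\n", "\r\n")


if __name__ == "__main__":
    print(remap("Köln".encode("utf-8"), "utf-8", "iso-8859-1"))
    print(strip_diacritics("Müller", GERMAN_RULES), strip_diacritics("Müller"))
    print(html.escape("Smith & Sons <HQ>"), quote("Köln Straße"))
```

The language-specific step matters because a generic decomposition turns "Müller" into "Muller", whereas German convention writes "Mueller". HTML and URL handling are left to `html.escape`, `html.unescape`, `urllib.parse.quote` and `urllib.parse.unquote`.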

Address Verification supports the following non-Latin writing systems:

  • Greek transliteration (BGN/PCGN 1962, ISO 843:1997)
  • Cyrillic transliteration (BGN/PCGN 1947, ISO 9:1995)
  • Japanese Katakana, Hiragana and Kanji transliteration
  • Chinese Pinyin transliteration (Mandarin, Cantonese) for both Simplified and Traditional Chinese
  • Korean Hangul transliteration
  • Hebrew transliteration
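
The difference between two transliteration standards for the same script can be illustrated with a small table-driven sketch. The tables below are a deliberately tiny subset of the Russian BGN/PCGN 1947 and ISO 9:1995 mappings, just enough for the sample address. The names transliterate, BGN_PCGN_SUBSET and ISO9_SUBSET are hypothetical and do not come from the product, and context-dependent rules are not modelled.

```python
# Letters that both systems romanise identically (partial subset).
COMMON = {
    "а": "a", "в": "v", "и": "i", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "с": "s", "у": "u",
}

# Where the systems differ: BGN/PCGN uses digraphs, ISO 9 uses
# one Latin character (with a diacritic if needed) per Cyrillic letter.
BGN_PCGN_SUBSET = {**COMMON, "ц": "ts", "ш": "sh"}
ISO9_SUBSET = {**COMMON, "ц": "c", "ш": "š"}


def transliterate(text: str, table: dict) -> str:
    """Map each character through the table, preserving capitalisation.

    Characters missing from the table (spaces, digits, unmapped letters)
    are passed through unchanged.
    """
    out = []
    for ch in text:
        latin = table.get(ch.lower())
        if latin is None:
            out.append(ch)
        elif ch.isupper():
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)


if __name__ == "__main__":
    street = "улица Пушкина"
    print(transliterate(street, BGN_PCGN_SUBSET))
    print(transliterate(street, ISO9_SUBSET))
```

A simple per-character lookup works for this sample, but a complete implementation would also need the context-sensitive rules some systems define, and dictionary-based approaches for scripts such as Kanji or Chinese where a character's reading depends on the word it appears in.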

Address Verification Character Set Mapping and Transliteration are fully Unicode-enabled.