unicode

The unicode module provides Unicode normalization, codepoint category lookup, and character-class queries.

use std::core::unicode

unicode::normalize(text: string, form: string) -> string

Normalize a string to a Unicode normalization form. form is one of "NFC", "NFD", "NFKC", or "NFKD". An unknown form is a runtime error.

// Compose any decomposed sequences into precomposed form.
let composed = unicode::normalize("café", "NFC")
print(composed)
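
The difference between the forms is easier to see at the codepoint level. The following is a minimal sketch in Python, using Python's standard unicodedata module rather than this module, so it illustrates the normalization forms themselves and not this library's implementation. The helper name codepoints is hypothetical.

```python
import unicodedata


def codepoints(text: str) -> list[str]:
    """Render each codepoint as U+XXXX so visually identical strings can be told apart."""
    return [f"U+{ord(ch):04X}" for ch in text]


decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "caf\u00e9"   # single codepoint U+00E9

# Note: Python's argument order is (form, text), the reverse of unicode::normalize.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)

print(codepoints(nfc))             # four codepoints, ending in U+00E9
print(codepoints(nfd))             # five codepoints, ending in U+0065, U+0301
print(decomposed == precomposed)   # False: different codepoint sequences
print(nfc == precomposed)          # True: equal once both are in NFC

# The "K" (compatibility) forms additionally fold compatibility characters,
# such as the U+FB01 "fi" ligature, into their plain equivalents.
print(unicodedata.normalize("NFKC", "\ufb01"))
```

Normalizing both sides to the same form before comparing or hashing strings avoids treating visually identical text as different.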

unicode::category(codepoint: int) -> string

Get the Unicode general-category abbreviation for a codepoint. The result is one of "Lu" (uppercase letter), "Ll" (lowercase letter), "Lo" (other letter), "Nd" (decimal digit), "No" (other number), "Zs" (space separator), "Cc" (control), "Po" (other punctuation), or "Cn" (unassigned or uncategorized).

print(unicode::category(65)) // "Lu" (codepoint for 'A')
print(unicode::category(97)) // "Ll" (codepoint for 'a')
print(unicode::category(48)) // "Nd" (codepoint for '0')
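
For comparison, here is a minimal Python sketch of the same lookup built on Python's standard unicodedata module. It is an illustration of general categories, not this module's implementation, and the wrapper name category is hypothetical. Python reports the full set of Unicode categories, so it can return abbreviations outside the nine listed above (for example "Pd" for a hyphen).

```python
import unicodedata


def category(codepoint: int) -> str:
    """Return the two-letter general category for a codepoint (hypothetical wrapper)."""
    return unicodedata.category(chr(codepoint))


samples = {
    65: "LATIN CAPITAL LETTER A",
    97: "LATIN SMALL LETTER A",
    48: "DIGIT ZERO",
    32: "SPACE",
    10: "LINE FEED",
    33: "EXCLAMATION MARK",
    0x4E2D: "CJK ideograph",
    0x00B2: "SUPERSCRIPT TWO",
    0x10FFFF: "noncharacter",
}

# Print each sample as "U+XXXX  category  description".
for cp, label in samples.items():
    print(f"U+{cp:04X}  {category(cp)}  {label}")
```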

unicode::is_letter(char: string) -> bool

Return true if the first character of char is alphabetic according to Unicode. Any characters after the first are ignored.

print(unicode::is_letter("a")) // true
print(unicode::is_letter("é")) // true
print(unicode::is_letter("1")) // false

unicode::is_digit(char: string) -> bool

Return true if the first character of char is a Unicode digit. Any characters after the first are ignored.

print(unicode::is_digit("0")) // true
print(unicode::is_digit("9")) // true
print(unicode::is_digit("a")) // false
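
The "first character" behavior of both predicates can be sketched in Python as below. This is an approximation under stated assumptions, not this module's implementation: "alphabetic" is approximated as any general category starting with "L", "digit" as category "Nd", and an empty string is assumed to yield false. The source does not specify any of these details.

```python
import unicodedata


def is_letter(char: str) -> bool:
    """True if the first character is a letter (assumption: any L* category)."""
    # Assumption: an empty string yields False rather than an error.
    return bool(char) and unicodedata.category(char[0]).startswith("L")


def is_digit(char: str) -> bool:
    """True if the first character is a decimal digit (assumption: category Nd)."""
    return bool(char) and unicodedata.category(char[0]) == "Nd"


print(is_letter("a"))        # True
print(is_letter("\u00e9"))   # True: U+00E9 is a lowercase letter
print(is_letter("1"))        # False
print(is_letter("a1"))       # True: only the first character is inspected
print(is_digit("\u0663"))    # True: ARABIC-INDIC DIGIT THREE is category Nd
print(is_digit("a"))         # False
```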

| Function | Signature | Description |
| --- | --- | --- |
| unicode::normalize(text, form) | (string, string) -> string | NFC / NFD / NFKC / NFKD normalization |
| unicode::category(codepoint) | (int) -> string | General-category abbreviation (Lu, Ll, Nd, …) |
| unicode::is_letter(char) | (string) -> bool | First character is alphabetic |
| unicode::is_digit(char) | (string) -> bool | First character is a digit |

The unicode stdlib source also declares unicode::graphemes(text: string) -> Array<string>, which splits a string into Unicode grapheme clusters: the user-visible characters, each of which may span multiple codepoints.
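
To illustrate why grapheme clusters differ from codepoints, the Python sketch below groups each combining mark with the character before it. This is a deliberately simplified stand-in, not the full Unicode segmentation algorithm and not this module's implementation: it does not handle emoji joiner sequences, regional-indicator pairs, or Hangul jamo. The function name rough_clusters is hypothetical.

```python
import unicodedata


def rough_clusters(text: str) -> list[str]:
    """Group combining marks (categories Mn, Mc, Me) with the preceding character.

    Simplified illustration only; real grapheme segmentation has more rules.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch    # attach the mark to the previous cluster
        else:
            clusters.append(ch)   # start a new cluster
    return clusters


word = "cafe\u0301"                 # 'e' + U+0301 COMBINING ACUTE ACCENT
print(len(word))                    # 5 codepoints
print(len(rough_clusters(word)))    # 4 user-visible characters
```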

  • Strings — the built-in string type and its methods