unicode
The unicode module provides Unicode normalization, codepoint category
lookup, and character-class queries.
Import
```
use std::core::unicode
```
Functions
unicode::normalize(text: string, form: string) -> string
Normalize `text` to a Unicode normalization form. `form` must be one of "NFC",
"NFD", "NFKC", or "NFKD"; any other value is a runtime error.
```
// Compose any decomposed sequences into precomposed form.
let composed = unicode::normalize("café", "NFC")
print(composed)
```
unicode::category(codepoint: int) -> string
Return the Unicode general-category abbreviation for `codepoint`. The result is one of:
- "Lu": uppercase letter
- "Ll": lowercase letter
- "Lo": other letter
- "Nd": decimal digit
- "No": other number
- "Zs": space separator
- "Cc": control
- "Po": other punctuation
- "Cn": unassigned / uncategorized
```
print(unicode::category(65)) // "Lu" (codepoint for 'A')
print(unicode::category(97)) // "Ll" (codepoint for 'a')
print(unicode::category(48)) // "Nd" (codepoint for '0')
```
unicode::is_letter(char: string) -> bool
Return `true` if the first character of `char` is alphabetic according to Unicode.
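Unicode has two notions of "alphabetic": the letter general categories (Lu, Ll, Lt, Lm, Lo), and the broader Alphabetic property, which also covers letter-like numbers (Nl) such as Roman numerals. The text above does not say which one is_letter uses. As an illustration only, this hypothetical Python analogue assumes the letter-category definition, which is what Python's `str.isalpha()` implements; its handling of the empty string is also an assumption.

```python
def is_letter(char: str) -> bool:
    """Hypothetical Python analogue of unicode::is_letter.

    Assumes "alphabetic" means general category Lu, Ll, Lt, Lm, or Lo,
    which is exactly what str.isalpha() tests. Returning False for an
    empty string is an assumption of this sketch.
    """
    return len(char) > 0 and char[0].isalpha()


print(is_letter("a"))       # → True
print(is_letter("\u00e9"))  # → True ("é")
print(is_letter("1"))       # → False
print(is_letter("\u2167"))  # → False: ROMAN NUMERAL EIGHT is Nl, not a letter category
```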
```
print(unicode::is_letter("a")) // true
print(unicode::is_letter("é")) // true
print(unicode::is_letter("1")) // false
```
unicode::is_digit(char: string) -> bool
Return `true` if the first character of `char` is a Unicode digit.
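"Unicode digit" can also mean two different sets. The hypothetical Python helpers below (both names are invented for this sketch) contrast general category Nd, tested by `str.isdecimal()`, with the wider Numeric_Type=Decimal/Digit set tested by `str.isdigit()`, which also includes superscript digits such as "²". Which set unicode::is_digit uses is not stated above, so treat either as an assumption.

```python
def is_digit_decimal(char: str) -> bool:
    # Hypothetical: true only for general category Nd (decimal digits),
    # e.g. "0"-"9" or ARABIC-INDIC DIGIT THREE (U+0663).
    return len(char) > 0 and char[0].isdecimal()


def is_digit_numeric_type(char: str) -> bool:
    # Hypothetical: Numeric_Type=Decimal or Numeric_Type=Digit, which is
    # what str.isdigit() tests; this also accepts superscript digits.
    return len(char) > 0 and char[0].isdigit()


for sample in ["0", "9", "\u0663", "\u00b2", "a"]:
    print(repr(sample), is_digit_decimal(sample), is_digit_numeric_type(sample))
```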
```
print(unicode::is_digit("0")) // true
print(unicode::is_digit("9")) // true
print(unicode::is_digit("a")) // false
```
Function Reference
| Function | Signature | Description |
|---|---|---|
| `unicode::normalize(text, form)` | `(string, string) -> string` | NFC / NFD / NFKC / NFKD normalization |
| `unicode::category(codepoint)` | `(int) -> string` | General-category abbreviation (Lu, Ll, Nd, …) |
| `unicode::is_letter(char)` | `(string) -> bool` | First character is alphabetic |
| `unicode::is_digit(char)` | `(string) -> bool` | First character is a digit |
Grapheme segmentation
The unicode stdlib source also declares `unicode::graphemes(text: string) -> Array<string>`, which splits a string into Unicode grapheme clusters:
the user-perceived characters, each of which may consist of multiple codepoints.
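The source does not show how unicode::graphemes segments text, and Python's standard library has no grapheme segmenter. The rough Python sketch below (`rough_graphemes` is a hypothetical name) only shows why the codepoint count and the number of user-perceived characters differ. It applies two simplified rules and skips most of the UAX #29 algorithm, such as Hangul syllables, flag emoji built from regional indicators, CR LF, and skin-tone modifiers.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def rough_graphemes(text: str) -> list[str]:
    """Deliberately simplified clustering; NOT the full UAX #29 algorithm.

    Rules assumed for illustration only:
    - a combining mark (general category M*) joins the previous cluster;
    - a ZWJ joins the previous cluster, and so does the character after it.
    """
    clusters: list[str] = []
    after_zwj = False
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if clusters and (is_mark or ch == ZWJ or after_zwj):
            clusters[-1] += ch
        else:
            clusters.append(ch)
        after_zwj = ch == ZWJ
    return clusters


word = "cafe\u0301"  # "café" spelled with a combining acute accent
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"  # man, woman, girl joined by ZWJ

print(len(word), rough_graphemes(word))  # 5 codepoints, 4 clusters
print(len(family), len(rough_graphemes(family)))  # → 5 1
```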
See Also
- Strings: the built-in string type and its methods