Module: ActiveSupport::Multibyte::Unicode
Relationships & Source Files
Defined in: activesupport/lib/active_support/multibyte/unicode.rb
Constant Summary
- UNICODE_VERSION = RbConfig::CONFIG["UNICODE_VERSION"]
  The Unicode version that is supported by the implementation.
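The constant simply reads a value that Ruby records at build time. A minimal sketch of the same lookup in plain Ruby, without loading Active Support (the local variable name is only for illustration):

```ruby
require "rbconfig"

# Same lookup the constant performs; the value depends on the Ruby build in use.
unicode_version = RbConfig::CONFIG["UNICODE_VERSION"]
puts "This Ruby was built against Unicode #{unicode_version}"
```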
Instance Method Summary
- #compose(codepoints)
  Compose decomposed characters to the composed form.
- #decompose(type, codepoints)
  Decompose composed characters to the decomposed form.
- #tidy_bytes(string, force = false)
  Replaces all ISO-8859-1 or CP1252 characters with their UTF-8 equivalent, resulting in a valid UTF-8 string.
- #recode_windows1252_chars(string) private
Instance Method Details
#compose(codepoints)
Compose decomposed characters to the composed form.
# File 'activesupport/lib/active_support/multibyte/unicode.rb', line 21
def compose(codepoints)
  codepoints.pack("U*").unicode_normalize(:nfc).codepoints
end
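To see what composition does to an array of codepoints, the following sketch reproduces the method body as a standalone helper. The name `compose_sketch` is hypothetical and used only so the example runs without Active Support.

```ruby
# Hypothetical standalone copy of the #compose body shown above.
def compose_sketch(codepoints)
  codepoints.pack("U*").unicode_normalize(:nfc).codepoints
end

decomposed = [0x65, 0x301]            # "e" followed by COMBINING ACUTE ACCENT
composed   = compose_sketch(decomposed)

p composed                            # => [233], i.e. U+00E9 (precomposed "é")
p compose_sketch([0x61, 0x62])        # plain ASCII is unchanged: [97, 98]
```

NFC only merges sequences that have a precomposed form, so codepoints with nothing to combine pass through untouched.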
#decompose(type, codepoints)
Decompose composed characters to the decomposed form.
# File 'activesupport/lib/active_support/multibyte/unicode.rb', line 12
def decompose(type, codepoints)
  if type == :compatibility
    codepoints.pack("U*").unicode_normalize(:nfkd).codepoints
  else
    codepoints.pack("U*").unicode_normalize(:nfd).codepoints
  end
end
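The `type` argument selects between compatibility decomposition (NFKD) and canonical decomposition (NFD); any value other than `:compatibility` falls through to NFD. A small sketch of the difference, using a hypothetical standalone helper `decompose_sketch` that mirrors the method body:

```ruby
# Hypothetical standalone copy of the #decompose logic shown above.
def decompose_sketch(type, codepoints)
  form = type == :compatibility ? :nfkd : :nfd
  codepoints.pack("U*").unicode_normalize(form).codepoints
end

e_acute  = [0xE9]    # precomposed "é"
ligature = [0xFB01]  # LATIN SMALL LIGATURE FI

p decompose_sketch(:canonical, e_acute)       # => [101, 769]   ("e" + U+0301)
p decompose_sketch(:canonical, ligature)      # => [64257]      (no canonical decomposition)
p decompose_sketch(:compatibility, ligature)  # => [102, 105]   ("f", "i")
```

The symbol `:canonical` here is just an illustrative value; the source only checks for `:compatibility`.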
#recode_windows1252_chars(string) (private)
# File 'activesupport/lib/active_support/multibyte/unicode.rb', line 37
def recode_windows1252_chars(string)
  string.encode(Encoding::UTF_8, Encoding::Windows_1252, invalid: :replace, undef: :replace)
end
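Because the method is private, it cannot be called directly from outside the module. The sketch below applies the same `String#encode` call to raw bytes to show how Windows-1252 bytes map to UTF-8; the helper name `recode_sketch` is hypothetical.

```ruby
# Hypothetical standalone copy of the private helper's body.
def recode_sketch(string)
  string.encode(Encoding::UTF_8, Encoding::Windows_1252, invalid: :replace, undef: :replace)
end

euro   = recode_sketch("\x80".b)                # 0x80 is the euro sign in Windows-1252
quoted = recode_sketch("\x93".b + "hi".b + "\x94".b)  # 0x93 / 0x94 are curly double quotes

p euro.codepoints     # => [8364]                  (U+20AC)
p quoted.codepoints   # => [8220, 104, 105, 8221]  (U+201C "hi" U+201D)
p euro.encoding       # => #<Encoding:UTF-8>
```

The explicit source encoding passed to `encode` means the bytes are interpreted as Windows-1252 regardless of how the string is tagged.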
#tidy_bytes(string, force = false)
Replaces all ISO-8859-1 or CP1252 characters with their UTF-8 equivalent, resulting in a valid UTF-8 string.
Passing true for force will forcibly tidy all bytes, assuming that the string’s encoding is entirely CP1252 or ISO-8859-1.
# File 'activesupport/lib/active_support/multibyte/unicode.rb', line 30
def tidy_bytes(string, force = false)
  return string if string.empty? || string.ascii_only?
  return recode_windows1252_chars(string) if force
  string.scrub { |bad| recode_windows1252_chars(bad) }
end
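The difference between the default and forced modes is easiest to see on a string that mixes valid UTF-8 with a stray CP1252 byte. The following is a sketch under the assumption that the two method bodies above are copied into hypothetical standalone helpers (`recode_sketch`, `tidy_bytes_sketch`) so it runs without Active Support.

```ruby
# Hypothetical standalone copies of the two method bodies shown above.
def recode_sketch(string)
  string.encode(Encoding::UTF_8, Encoding::Windows_1252, invalid: :replace, undef: :replace)
end

def tidy_bytes_sketch(string, force = false)
  return string if string.empty? || string.ascii_only?
  return recode_sketch(string) if force
  string.scrub { |bad| recode_sketch(bad) }
end

# "caf" + a lone CP1252 byte 0xE9, then a correctly encoded UTF-8 "résumé".
mixed = ("caf\xE9 ".b + "r\u00E9sum\u00E9".b).force_encoding(Encoding::UTF_8)
tidied = tidy_bytes_sketch(mixed)
p mixed.valid_encoding?   # => false
p tidied                  # => "café résumé" (only the invalid byte was recoded)

# With force, every byte is treated as CP1252, so valid UTF-8 gets double-encoded.
valid  = "r\u00E9sum\u00E9"
forced = tidy_bytes_sketch(valid, true)
p forced.codepoints.map { |c| format("U+%04X", c) }
```

In the default mode `String#scrub` hands only the invalid byte sequences to the block, which is why already-valid UTF-8 survives; `force` is appropriate only when the whole input is known to be CP1252 or ISO-8859-1.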