Class: ActiveSupport::Multibyte::Chars

Relationships & Source Files
Super Chains via Extension / Inclusion / Inheritance
Instance Chain:
self, Comparable
Inherits: Object
Defined in: activesupport/lib/active_support/multibyte/chars.rb

Overview

Chars enables you to work transparently with UTF-8 encoding in the Ruby ::String class without extensive knowledge of the encoding. A Chars object wraps the string passed to its constructor and proxies ::String methods in an encoding-safe manner. Every ordinary ::String method is also available on the proxy: some are reimplemented by Chars, and the rest are forwarded to the wrapped string.

A Chars proxy is obtained by calling the mb_chars method on a ::String. Methods that would normally return a ::String return a Chars object instead, so calls can be chained.

'The Perfect String  '.mb_chars.downcase.strip.normalize # => "the perfect string"

Chars objects are interchangeable with ::String objects as long as no explicit class checks are made. If a method explicitly checks the class of its argument, call #to_s before passing a Chars object to it.

bad.explicit_checking_method 'T'.mb_chars.downcase.to_s

The default Chars implementation assumes that the string is encoded in UTF-8. To handle other encodings, write your own multibyte string handler and configure it through ActiveSupport::Multibyte.proxy_class.

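# A handler for UTF-32 strings. bytesize is used so the count of 4-byte
# code units is right regardless of the string's encoding label.
class CharsForUTF32
  def initialize(string)
    @wrapped_string = string
  end

  def size
    @wrapped_string.bytesize / 4
  end

  # Same role as Chars.consumes?: reports whether this handler applies.
  def self.consumes?(string)
    string.bytesize % 4 == 0
  end
end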

ActiveSupport::Multibyte.proxy_class = CharsForUTF32

Constructor Details

.new(string) ⇒ Chars

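Creates a new Chars instance by wrapping string. Unless string is frozen, its encoding label is set to UTF-8 in place with force_encoding; the bytes themselves are not converted.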

# File 'activesupport/lib/active_support/multibyte/chars.rb', line 52

def initialize(string)
  @wrapped_string = string
  @wrapped_string.force_encoding(Encoding::UTF_8) unless @wrapped_string.frozen?
end
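The constructor relies on String#force_encoding, which only relabels the bytes and raises on frozen strings; that is why frozen input is left untouched. A plain-Ruby illustration of both behaviors, using only the standard library (Ruby 2.5 or later for FrozenError):

```ruby
# UTF-8 bytes for "café", held in a binary (ASCII-8BIT) string.
raw = "caf\xC3\xA9".b

# force_encoding changes only the encoding label; the bytes stay the same.
raw.force_encoding(Encoding::UTF_8)

# A frozen string cannot be relabeled, so the constructor skips it.
frozen = "caf\xC3\xA9".b.freeze
frozen_error_raised =
  begin
    frozen.force_encoding(Encoding::UTF_8)
    false
  rescue FrozenError
    true
  end
```

After relabeling, raw compares equal to the literal "café" and has four characters, while frozen keeps its binary encoding.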

Dynamic Method Handling

This class handles dynamic methods through the method_missing method.

#method_missing(method, *args, &block)

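Forwards all undefined methods to the wrapped string. A bang method returns the Chars object itself, or nil if the wrapped string's method returned nil; any other ::String result is wrapped in a new Chars object.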

# File 'activesupport/lib/active_support/multibyte/chars.rb', line 58

def method_missing(method, *args, &block)
  result = @wrapped_string.__send__(method, *args, &block)
  if method.to_s =~ /!$/
    self if result
  else
    result.kind_of?(String) ? chars(result) : result
  end
end
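The forwarding rules above can be sketched outside Rails. StringProxy is a hypothetical class, not part of ActiveSupport; it reproduces the two branches of #method_missing (bang methods return the proxy or nil, other ::String results are re-wrapped) and pairs them with #respond_to_missing? so that respond_to? stays accurate:

```ruby
# Hypothetical minimal proxy showing the forwarding pattern of Chars.
class StringProxy
  attr_reader :wrapped_string

  def initialize(string)
    @wrapped_string = string
  end

  def to_s
    @wrapped_string
  end

  private

  # Forward every unknown call to the wrapped string.
  def method_missing(method, *args, &block)
    result = @wrapped_string.__send__(method, *args, &block)
    if method.to_s.end_with?("!")
      # Bang methods mutate in place: return the proxy, or nil if nothing changed.
      self if result
    else
      # Re-wrap String results so calls can be chained.
      result.is_a?(String) ? self.class.new(result) : result
    end
  end

  # Keep respond_to? consistent with what method_missing can handle.
  def respond_to_missing?(method, include_private = false)
    @wrapped_string.respond_to?(method, include_private) || super
  end
end
```

For example, StringProxy.new("  Hello ").strip.downcase returns another StringProxy whose to_s is "hello", while non-String results such as length pass through unchanged.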

Class Method Details

.consumes?(string) ⇒ Boolean

Returns true when the proxy class can handle the string, which for Chars means that the string's encoding is UTF-8. Returns false otherwise.

# File 'activesupport/lib/active_support/multibyte/chars.rb', line 76

def self.consumes?(string)
  string.encoding == Encoding::UTF_8
end

Instance Attribute Details

#to_s (readonly)

Alias for #wrapped_string.

# File 'activesupport/lib/active_support/multibyte/chars.rb', line 46

alias to_s wrapped_string

#to_str (readonly)

Alias for #wrapped_string.

# File 'activesupport/lib/active_support/multibyte/chars.rb', line 47

alias to_str wrapped_string

#wrapped_string (readonly) Also known as: #to_s, #to_str

# File 'activesupport/lib/active_support/multibyte/chars.rb', line 45

attr_reader :wrapped_string

Instance Method Details

#<=>

# File 'activesupport/lib/active_support/multibyte/chars.rb', line 49

delegate :<=>, :=~, :acts_like_string?, :to => :wrapped_string

#=~

# File 'activesupport/lib/active_support/multibyte/chars.rb', line 49

delegate :<=>, :=~, :acts_like_string?, :to => :wrapped_string

#acts_like_string? ⇒ Boolean

# File 'activesupport/lib/active_support/multibyte/chars.rb', line 49

delegate :<=>, :=~, :acts_like_string?, :to => :wrapped_string

#capitalize

Converts the first character to uppercase and the remainder to lowercase.

'über'.mb_chars.capitalize.to_s # => "Über"
  
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 135

def capitalize
  (slice(0) || chars('')).upcase + (slice(1..-1) || chars('')).downcase
end
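For comparison, the same slicing logic on a plain ::String. This sketch assumes Ruby 2.4 or later, where String#upcase and String#downcase apply Unicode case mapping; capitalize_first is a hypothetical helper name, not part of ActiveSupport:

```ruby
# Upcase the first character and downcase the rest, treating a
# missing part (e.g. in an empty string) as an empty string.
def capitalize_first(str)
  first = str[0] || ""
  rest = str[1..-1] || ""
  first.upcase + rest.downcase
end
```

capitalize_first("über") gives "Über", matching the Chars example above.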

#compose

Performs composition on all the characters.

"e\u0301".length                       # => 2
"e\u0301".mb_chars.compose.to_s.length # => 1
  
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 171

def compose
  chars(Unicode.compose(@wrapped_string.codepoints.to_a).pack('U*'))
end

#decompose

Performs canonical decomposition on all the characters.

"\u00e9".length                         # => 1
"\u00e9".mb_chars.decompose.to_s.length # => 2
  
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 163

def decompose
  chars(Unicode.decompose(:canonical, @wrapped_string.codepoints.to_a).pack('U*'))
end
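Since Ruby 2.2, String#unicode_normalize offers a standard-library counterpart to #compose and #decompose. NFC and NFD are full normalization forms (NFC decomposes before recomposing), so this is close to, not identical with, what these methods do with ActiveSupport's own Unicode tables:

```ruby
# "é" as a base letter plus a combining acute accent (decomposed form)...
decomposed = "e\u0301"
# ...and as a single precomposed code point.
precomposed = "\u00e9"

# NFC composes canonically equivalent sequences; NFD decomposes them.
composed = decomposed.unicode_normalize(:nfc)
split_up = precomposed.unicode_normalize(:nfd)
```

composed is the single code point U+00E9, and split_up is back to two code points.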

#downcase

Converts characters in the string to lowercase.

'VĚDA A VÝZKUM'.mb_chars.downcase.to_s # => "věda a výzkum"
  
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 121

def downcase
  chars Unicode.downcase(@wrapped_string)
end
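On Ruby 2.4 and later, ::String's own case-mapping methods follow Unicode rules, so the examples used on this page for #downcase, #upcase and #swapcase also work without a proxy. A short check, standard library only:

```ruby
# Unicode-aware case mapping built into String (Ruby 2.4+).
lower = "VĚDA A VÝZKUM".downcase
upper = "Laurent, où sont les tests ?".upcase
swapped = "El Cañón".swapcase
```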

#grapheme_length

Returns the number of grapheme clusters in the string.

'क्षि'.mb_chars.length   # => 4
'क्षि'.mb_chars.grapheme_length # => 3
  
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 179

def grapheme_length
  Unicode.unpack_graphemes(@wrapped_string).length
end
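Ruby 2.5 and later expose String#grapheme_clusters, which splits a string into extended grapheme clusters. It may count some scripts differently from the implementation above (the Devanagari example is one such case), so this sketch sticks to inputs whose counts are unambiguous:

```ruby
# "e" followed by a combining acute accent: two code points, one visible character.
accented = "e\u0301"
# Two regional-indicator symbols that together form one flag.
flag = "\u{1F1EF}\u{1F1F5}"
```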

#limit(limit)

Truncates the string to at most limit bytes without splitting a multibyte character. Useful when the storage available for a string is limited to a number of bytes.

'こんにちは'.mb_chars.limit(7).to_s # => "こん"
  
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 107

def limit(limit)
  slice(0...translate_offset(limit))
end
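The same idea in plain Ruby: keep appending whole characters while the byte count stays within the limit. This is a sketch with a hypothetical helper name, limit_bytes; it does not use translate_offset, which is internal to Chars:

```ruby
# Keep whole characters while the running byte count stays within max_bytes.
def limit_bytes(str, max_bytes)
  result = String.new(encoding: str.encoding)
  str.each_char do |char|
    break if result.bytesize + char.bytesize > max_bytes
    result << char
  end
  result
end
```

Each character of "こんにちは" takes 3 bytes in UTF-8, so limit_bytes("こんにちは", 7) stops after two characters (6 bytes), matching the example above.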

#normalize(form = nil)

Returns the NFKC normalization of the string by default. NFKC is considered the best normalization form for passing strings to databases and validations.

  • form - The normalization form to use: one of :c, :kc, :d, or :kd. Defaults to ActiveSupport::Multibyte::Unicode.default_normalization_form.

# File 'activesupport/lib/active_support/multibyte/chars.rb', line 155

def normalize(form = nil)
  chars(Unicode.normalize(@wrapped_string, form))
end
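A standard-library sketch (Ruby 2.2+) of the same normalization forms. The FORMS table and normalize_like_chars are hypothetical names that map the symbols accepted by #normalize onto those String#unicode_normalize expects; the :kc default assumes the NFKC default described above:

```ruby
# Hypothetical mapping from #normalize's form symbols to Ruby's.
FORMS = { c: :nfc, kc: :nfkc, d: :nfd, kd: :nfkd }.freeze

def normalize_like_chars(str, form = :kc)
  str.unicode_normalize(FORMS.fetch(form))
end
```

NFKC also folds compatibility characters into plain equivalents: the "ﬁ" ligature (U+FB01) becomes "fi" and the circled digit ① (U+2460) becomes "1", whereas NFC leaves both unchanged.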

#respond_to_missing?(method, include_private) ⇒ Boolean

Returns true if the wrapped string responds to the given method. Private methods are included in the search only if include_private is true.

# File 'activesupport/lib/active_support/multibyte/chars.rb', line 70

def respond_to_missing?(method, include_private)
  @wrapped_string.respond_to?(method, include_private)
end

#reverse

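Reverses all characters in the string. The string is reversed by grapheme cluster, so combining marks stay attached to their base characters.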

'Café'.mb_chars.reverse.to_s # => "éfaC"
  
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 98

def reverse
  chars(Unicode.unpack_graphemes(@wrapped_string).reverse.flatten.pack('U*'))
end
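Why reversing by grapheme cluster matters, shown with the standard library (Ruby 2.5+ for grapheme_clusters). With a decomposed "é", plain String#reverse moves the combining accent to the front:

```ruby
word = "Cafe\u0301"   # "Café" spelled with a combining acute accent

# Reversing code points detaches the accent from its base letter...
naive = word.reverse
# ...while reversing grapheme clusters keeps "e\u0301" together.
by_grapheme = word.grapheme_clusters.reverse.join
```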

#slice!(*args)

Works like String#slice!, but returns an instance of Chars, or nil if the string was not modified.

# File 'activesupport/lib/active_support/multibyte/chars.rb', line 91

def slice!(*args)
  string_sliced = @wrapped_string.slice!(*args)
  chars(string_sliced) if string_sliced
end
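String#slice! returns nil when it removes nothing, which is the case the description above calls out: a wrapper has to check the result before wrapping it in a Chars object. A plain-Ruby demonstration of the two outcomes:

```ruby
text = String.new("hello")   # an unfrozen string, since slice! mutates it

removed = text.slice!(1..2)  # removes the characters at indexes 1 and 2
missed  = text.slice!("xyz") # substring not present: nothing is removed
```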

#split(*args)

Works just like String#split, with the exception that the items in the resulting list are Chars instances instead of ::String. This makes chaining methods easier.

'Café périferôl'.mb_chars.split(/é/).map { |part| part.upcase.to_s } # => ["CAF", " P", "RIFERÔL"]
  
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 85

def split(*args)
  @wrapped_string.split(*args).map { |i| self.class.new(i) }
end

#swapcase

Converts characters in the string to the opposite case.

'El Cañón'.mb_chars.swapcase.to_s # => "eL cAÑÓN"
  
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 128

def swapcase
  chars Unicode.swapcase(@wrapped_string)
end

#tidy_bytes(force = false)

Replaces all ISO-8859-1 or CP1252 characters with their UTF-8 equivalents, resulting in a valid UTF-8 string.

Passing true will forcibly tidy all bytes, assuming that the string's encoding is entirely CP1252 or ISO-8859-1.

# File 'activesupport/lib/active_support/multibyte/chars.rb', line 188

def tidy_bytes(force = false)
  chars(Unicode.tidy_bytes(@wrapped_string, force))
end
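A rough standard-library approximation of the non-forced mode, not the ActiveSupport implementation: String#scrub yields each invalid byte sequence of a UTF-8 string, and the block reinterprets those bytes as CP1252 before transcoding them. tidy_cp1252 is a hypothetical helper name; bytes that CP1252 leaves undefined become U+FFFD:

```ruby
# Replace invalid UTF-8 byte sequences by decoding them as CP1252.
def tidy_cp1252(str)
  str.scrub do |bad_bytes|
    # Reinterpret the offending bytes as CP1252, then transcode them.
    cp1252 = bad_bytes.dup.force_encoding(Encoding::Windows_1252)
    cp1252.encode(Encoding::UTF_8, undef: :replace)
  end
end
```

Valid UTF-8 passes through unchanged, so "café" stays as it is, while the stray byte in "caf\xE9" becomes "é".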

#titlecase

Alias for #titleize.

# File 'activesupport/lib/active_support/multibyte/chars.rb', line 146

alias_method :titlecase, :titleize

#titleize Also known as: #titlecase

Capitalizes the first letter of every word, when possible.

"ÉL QUE SE ENTERÓ".mb_chars.titleize    # => "Él Que Se Enteró"
"日本語".mb_chars.titleize                 # => "日本語"
  
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 143

def titleize
  chars(downcase.to_s.gsub(/\b('?\S)/u) { Unicode.upcase($1)})
end
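A simplified variant for comparison, assuming Ruby 2.4+ for Unicode-aware String#capitalize. Unlike the regular expression used by #titleize, this hypothetical titleize_words helper treats only runs of whitespace as word separators, keeping them via the capture group in split:

```ruby
# Split on whitespace (keeping the separators), capitalize each piece, rejoin.
def titleize_words(str)
  str.split(/(\s+)/).map(&:capitalize).join
end
```

It reproduces both examples above: "ÉL QUE SE ENTERÓ" becomes "Él Que Se Enteró", and "日本語" is returned unchanged because those characters have no case.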

#upcase

Converts characters in the string to uppercase.

'Laurent, où sont les tests ?'.mb_chars.upcase.to_s # => "LAURENT, OÙ SONT LES TESTS ?"
  
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 114

def upcase
  chars Unicode.upcase(@wrapped_string)
end