Class: String
Relationships & Source Files
Super Chains via Extension / Inclusion / Inheritance
Instance Chain: self, ::Comparable
Inherits: Object
Defined in: string.c, complex.c, encoding.c, pack.rb, rational.c, symbol.c, transcode.c
Class Method Summary
- .try_convert(object) ⇒ Object, ...
  Attempts to convert the given object to a string.
- .new(*args) (constructor)
  Internal use only.
Instance Attribute Summary
- #ascii_only? ⇒ Boolean (readonly)
  Returns whether self contains only ASCII characters.
- #empty? ⇒ Boolean (readonly)
  Returns whether the length of self is zero.
- #valid_encoding? ⇒ Boolean (readonly)
  Returns true if self is encoded correctly, false otherwise.
Instance Method Summary
- #%(object) ⇒ String
  Returns the result of formatting object into the format specifications contained in self (see Format Specifications).
- #*(n) ⇒ String
  Returns a new string containing n copies of self.
- #+(other_string) ⇒ String
  Returns a new string containing other_string concatenated to self.
- #+@ ⇒ String, self
  Returns self if self is not frozen and can be mutated without warning issuance.
- #-@ ⇒ String (also: #dedup)
  Returns a frozen string equal to self.
- #<<(object) ⇒ self
  Appends a string representation of object to self; returns self.
- #<=>(other_string) ⇒ 1, ...
  Compares self and other_string, returning -1, 0, 1, or nil.
- #==(object) ⇒ Boolean (also: #===)
  Returns whether object is equal to self.
- #===(object) ⇒ Boolean
  Alias for #==.
- #=~(object) ⇒ Integer?
  When object is a ::Regexp, returns the index of the first substring in self matched by object, or nil if no match is found; updates Regexp-related global variables.
- #[](index) ⇒ String? (also: #slice)
  Returns the substring of self specified by the arguments.
- #[]=(index, new_string)
  Replaces all, some, or none of the contents of self; returns new_string.
- #append_as_bytes(*objects) ⇒ self
  Concatenates each object in objects into self; returns self; performs no encoding validation or conversion.
- #b ⇒ String
  Returns a copy of self that has ASCII-8BIT encoding; the underlying bytes are not modified.
- #byteindex(object, offset = 0) ⇒ Integer?
  Returns the 0-based integer index of a substring of self specified by object (a string or ::Regexp) and offset, or nil if there is no such substring; the returned index is the count of bytes (not characters).
- #byterindex(object, offset = self.bytesize) ⇒ Integer?
  Returns the 0-based integer index of a substring of self that is the last match for the given object (a string or ::Regexp) and offset, or nil if there is no such substring; the returned index is the count of bytes (not characters).
- #bytes ⇒ array_of_bytes
  Returns an array of the bytes in self.
- #bytesize ⇒ Integer
  Returns the count of bytes in self.
- #byteslice(offset, length = 1) ⇒ String?
  Returns a substring of self, or nil if the substring cannot be constructed.
- #bytesplice(offset, length, str) ⇒ self
  Replaces target bytes in self with source bytes from the given string str; returns self.
- #capitalize(mapping = :ascii) ⇒ String
  Returns a string containing the characters in self, each with possibly changed case.
- #capitalize!(mapping = :ascii) ⇒ self?
  Like #capitalize, except that any changes are made in self; returns self if any changes are made, nil otherwise.
- #casecmp(other_string) ⇒ 1, ...
  Ignoring case, compares self and other_string; returns -1, 0, 1, or nil.
- #casecmp?(other_string) ⇒ true, ...
  Returns true if self and other_string are equal after Unicode case folding, false if unequal, nil if incomparable.
- #center(size, pad_string = ' ') ⇒ String
  Returns a centered copy of self.
- #chars ⇒ array_of_characters
  Returns an array of the characters in self.
- #chomp(line_sep = $/) ⇒ String
  Returns a new string copied from self, with trailing characters possibly removed.
- #chomp!(line_sep = $/) ⇒ self?
  Like #chomp, except that any removal is made in self; returns self if any characters are removed, nil otherwise.
- #chop ⇒ String
  Returns a new string copied from self, with trailing characters possibly removed.
- #chop! ⇒ self?
  Like #chop, except that any removal is made in self; returns self if any characters are removed, nil otherwise.
- #chr ⇒ String
  Returns a string containing the first character of self.
- #clear ⇒ self
  Removes the contents of self.
- #codepoints ⇒ array_of_integers
  Returns an array of the codepoints in self; each codepoint is the integer value for a character.
- #concat(*objects) ⇒ String
  Concatenates each object in objects to self; returns self.
- #count(*selectors) ⇒ Integer
  Returns the total number of characters in self that are specified by the given selectors.
- #crypt(salt_str) ⇒ String
  Returns the string generated by calling the crypt(3) standard library function with self and salt_str, in this order, as its arguments.
- #dedup ⇒ String
  Alias for #-@.
- #delete(*selectors) ⇒ String
  Returns a new string that is a copy of self with certain characters removed; the removed characters are all instances of those specified by the given selectors.
- #delete!(*selectors) ⇒ self?
  Like #delete, but modifies self in place; returns self if any characters were deleted, nil otherwise.
- #delete_prefix(prefix) ⇒ String
  Returns a copy of self with leading substring prefix removed.
- #delete_prefix!(prefix) ⇒ self?
  Like #delete_prefix, except that self is modified in place; returns self if the prefix is removed, nil otherwise.
- #delete_suffix(suffix) ⇒ String
  Returns a copy of self with trailing substring suffix removed.
- #delete_suffix!(suffix) ⇒ self?
  Like #delete_suffix, except that self is modified in place; returns self if the suffix is removed, nil otherwise.
- #downcase(mapping) ⇒ String
  Returns a new string containing the downcased characters in self.
- #downcase!(mapping) ⇒ self?
  Like #downcase, except that any changes are made in self; returns self if any changes are made, nil otherwise.
- #dump ⇒ String
  Returns a printable version of self, enclosed in double-quotes.
- #each_byte {|byte| ... } ⇒ self
  With a block given, calls the block with each successive byte from self; returns self.
- #each_char {|char| ... } ⇒ self
  With a block given, calls the block with each successive character from self; returns self.
- #each_codepoint {|codepoint| ... } ⇒ self
  With a block given, calls the block with each successive codepoint from self; each codepoint is the integer value for a character; returns self.
- #each_grapheme_cluster {|grapheme_cluster| ... } ⇒ self
  With a block given, calls the block with each successive grapheme cluster from self; returns self.
- #each_line(record_separator = $/, chomp: false) {|substring| ... } ⇒ self
  With a block given, forms the substrings (lines) that are the result of splitting self at each occurrence of the given record_separator; passes each line to the block; returns self.
- #encode(dst_encoding = Encoding.default_internal, **enc_opts) ⇒ String
  Returns a copy of self transcoded as determined by dst_encoding; see Encodings.
- #encode!(dst_encoding = Encoding.default_internal, **enc_opts) ⇒ self
  Like #encode, but applies encoding changes to self; returns self.
- #encoding ⇒ Encoding
  Returns the ::Encoding object that represents the encoding of self.
- #end_with?(*strings) ⇒ Boolean
  Returns whether self ends with any of the given strings.
- #eql?(object) ⇒ Boolean
  Returns whether self and object have the same length and content.
- #force_encoding(encoding) ⇒ self
  Changes the encoding of self to the given encoding, which may be a string encoding name or an ::Encoding object; does not change the underlying bytes; returns self.
- #getbyte(index) ⇒ Integer?
  Returns the byte at zero-based index as an integer.
- #grapheme_clusters ⇒ array_of_grapheme_clusters
  Returns an array of the grapheme clusters in self (see Unicode Grapheme Cluster Boundaries).
- #gsub(pattern, replacement) ⇒ String
  Returns a copy of self with zero or more substrings replaced.
- #gsub!(pattern, replacement) ⇒ self?
  Like #gsub, except that any replacements are made in self; returns self if any replacements are made, nil otherwise.
- #hash ⇒ Integer
  Returns the integer hash value for self.
- #hex ⇒ Integer
  Interprets the leading substring of self as hexadecimal; returns its integer value.
- #include?(other_string) ⇒ Boolean
  Returns whether self contains other_string.
- #index(pattern, offset = 0) ⇒ Integer?
  Returns the integer position of the first substring that matches the given argument pattern, or nil if none found.
- #new(string = ''.encode(Encoding::ASCII_8BIT), **options) ⇒ String (constructor)
  Returns a new String object containing the given string.
- #initialize_copy(other_string) ⇒ self
  Alias for #replace.
- #insert(offset, other_string) ⇒ self
  Inserts the given other_string into self; returns self.
- #inspect ⇒ String
  Returns a printable version of self, enclosed in double-quotes.
- #intern ⇒ Symbol (also: #to_sym)
  Returns the ::Symbol object derived from self, creating it if it did not already exist.
- #length ⇒ Integer (also: #size)
  Returns the count of characters (not bytes) in self.
- #lines(record_separator = $/, chomp: false) ⇒ Array
  Returns substrings (“lines”) of self according to the given arguments.
- #ljust(width, pad_string = ' ') ⇒ String
  Returns a copy of self, left-justified and, if necessary, right-padded with the pad_string.
- #lstrip ⇒ String
  Returns a copy of self with leading whitespace removed; see Whitespace in Strings.
- #lstrip! ⇒ self?
  Like #lstrip, except that any modifications are made in self; returns self if any modifications are made, nil otherwise.
- #match(pattern, offset = 0) ⇒ MatchData?
  Returns a ::MatchData object (or nil) based on self and the given pattern.
- #match?(pattern, offset = 0) ⇒ Boolean
  Returns true or false based on whether a match is found for self and pattern.
- #next ⇒ String (also: #succ)
  Returns the successor to self.
- #next! ⇒ self (also: #succ!)
  Equivalent to #succ, but modifies self in place; returns self.
- #oct ⇒ Integer
  Interprets the leading substring of self as a string of octal digits (with an optional sign) and returns the corresponding number; returns zero if there is no such leading substring.
- #ord ⇒ Integer
  Returns the integer ordinal of the first character of self.
- #partition(string_or_regexp) ⇒ Array, ...
  Returns a 3-element array of substrings of self.
- #prepend(*other_strings) ⇒ String
  Prepends each string in other_strings to self and returns self.
- #replace(other_string) ⇒ self (also: #initialize_copy)
  Replaces the contents of self with the contents of other_string.
- #reverse ⇒ String
  Returns a new string with the characters from self in reverse order.
- #reverse! ⇒ self
  Reverses the characters of self in place; returns self.
- #rindex(substring, offset = self.length) ⇒ Integer?
  Returns the ::Integer index of the last occurrence of the given substring, or nil if none found.
- #rjust(size, pad_string = ' ') ⇒ String
  Returns a right-justified copy of self.
- #rpartition(sep) ⇒ Array, ...
  Returns a 3-element array of substrings of self.
- #rstrip ⇒ String
  Returns a copy of the receiver with trailing whitespace removed; see Whitespace in Strings.
- #rstrip! ⇒ self?
  Like #rstrip, except that any modifications are made in self; returns self if any modifications are made, nil otherwise.
- #scan(string_or_regexp) ⇒ Array
  Matches a pattern against self; the pattern is a string or a ::Regexp.
- #scrub(replacement_string = default_replacement) ⇒ String
  Returns a copy of self with each invalid byte sequence replaced by the given replacement_string.
- #scrub! ⇒ self
  Like #scrub, except that any replacements are made in self.
- #setbyte(index, integer) ⇒ Integer
  Sets the byte at zero-based index to integer; returns integer.
- #size ⇒ Integer
  Alias for #length.
- #slice(index) ⇒ String?
  Alias for #[].
- #slice!(index) ⇒ String?
  Removes and returns the substring of self specified by the arguments.
- #split(field_sep = $;, limit = 0) ⇒ Array
  Returns an array of substrings of self that are the result of splitting self at each occurrence of the given field separator field_sep.
- #squeeze(*selectors) ⇒ String
  Returns a copy of self with characters specified by selectors “squeezed” (see Multiple Character Selectors).
- #squeeze!(*selectors) ⇒ self?
  Like #squeeze, but modifies self in place.
- #start_with?(*string_or_regexp) ⇒ Boolean
  Returns whether self starts with any of the given string_or_regexp.
- #strip ⇒ String
  Returns a copy of the receiver with leading and trailing whitespace removed; see Whitespace in Strings.
- #strip! ⇒ self?
  Like #strip, except that any modifications are made in self; returns self if any modifications are made, nil otherwise.
- #sub(pattern, replacement) ⇒ String
  Returns a copy of self with only the first occurrence (not all occurrences) of the given pattern replaced.
- #sub!(pattern, replacement) ⇒ self?
  Replaces the first occurrence (not all occurrences) of the given pattern in self; returns self if a replacement occurred, nil otherwise.
- #succ ⇒ String
  Alias for #next.
- #succ! ⇒ self
  Alias for #next!.
- #sum(n = 16) ⇒ Integer
  Returns a basic n-bit checksum of the characters in self; the checksum is the sum of the binary value of each byte in self, truncated to its low n bits (that is, modulo 2**n).
- #swapcase(mapping) ⇒ String
  Returns a string containing the characters in self, with cases reversed; each uppercase character is downcased; each lowercase character is upcased.
- #swapcase!(mapping) ⇒ self?
  Upcases each lowercase character in self; downcases each uppercase character; returns self if any changes were made, nil otherwise.
- #to_c ⇒ Complex
  Returns self interpreted as a ::Complex object; leading whitespace and trailing garbage are ignored.
- #to_f ⇒ Float
  Returns the result of interpreting leading characters in self as a ::Float.
- #to_i(base = 10) ⇒ Integer
  Returns the result of interpreting leading characters in self as an integer in the given base (which must be in (0, 2..36)).
- #to_r ⇒ Rational
  Returns the result of interpreting leading characters in self as a rational.
- #to_s ⇒ self, String (also: #to_str)
  Returns self if self is a String, or self converted to a String if self is an instance of a subclass of String.
- #to_str ⇒ self, String
  Alias for #to_s.
- #to_sym ⇒ Symbol
  Alias for #intern.
- #tr(selector, replacements) ⇒ String
  Returns a copy of self with each character specified by string selector translated to the corresponding character in string replacements.
- #tr!(selector, replacements) ⇒ self?
  Like #tr, but modifies self in place.
- #tr_s(selector, replacements) ⇒ String
  Like #tr, but also squeezes the modified portions of the translated string; returns a new string (translated and squeezed).
- #tr_s!(selector, replacements) ⇒ self?
  Like #tr_s, but modifies self in place.
- #undump ⇒ String
  Returns an unescaped version of self.
- #unicode_normalize(form = :nfc) ⇒ String
  Returns a copy of self with Unicode normalization applied.
- #unicode_normalize!(form = :nfc) ⇒ self
  Like #unicode_normalize, except that the normalization is performed on self.
- #unicode_normalized?(form = :nfc) ⇒ Boolean
  Returns true if self is in the given form of Unicode normalization, false otherwise.
- #unpack(template, offset: 0, &block) ⇒ Array
  Extracts data from self.
- #unpack1(template, offset: 0) ⇒ Object
  Like #unpack, but unpacks and returns only the first extracted object.
- #upcase(mapping) ⇒ String
  Returns a string containing the upcased characters in self.
- #upcase!(mapping) ⇒ self?
  Upcases the characters in self; returns self if any changes were made, nil otherwise.
- #upto(other_string, exclusive = false) {|string| ... } ⇒ self
  With a block given, calls the block with each String value returned by successive calls to String#succ; the first value is self, the next is self.succ, and so on; the sequence terminates when value other_string is reached; returns self.
- #dup
  Internal use only.
- #freeze
  Internal use only.
Methods Included from ::Comparable
- #<
  Compares two objects based on the receiver’s #<=> method, returning true if it returns a value less than 0.
- #<=
  Compares two objects based on the receiver’s #<=> method, returning true if it returns a value less than or equal to 0.
- #==
  Compares two objects based on the receiver’s #<=> method, returning true if it returns 0.
- #>
  Compares two objects based on the receiver’s #<=> method, returning true if it returns a value greater than 0.
- #>=
  Compares two objects based on the receiver’s #<=> method, returning true if it returns a value greater than or equal to 0.
- #between?(min, max)
  Returns false if self <=> min is less than 0 or self <=> max is greater than 0, true otherwise.
- #clamp(min, max)
  Returns min if self <=> min is less than 0, max if self <=> max is greater than 0, and self otherwise.
Constructor Details
.new(*args)
# File 'string.c', line 2102
static VALUE
rb_str_s_new(int argc, VALUE *argv, VALUE klass)
{
    if (klass != rb_cString) {
        return rb_class_new_instance_pass_kw(argc, argv, klass);
    }

    static ID keyword_ids[2];
    VALUE orig, opt, encoding = Qnil, capacity = Qnil;
    VALUE kwargs[2];
    rb_encoding *enc = NULL;

    int n = rb_scan_args(argc, argv, "01:", &orig, &opt);
    if (NIL_P(opt)) {
        return rb_class_new_instance_pass_kw(argc, argv, klass);
    }

    keyword_ids[0] = rb_id_encoding();
    CONST_ID(keyword_ids[1], "capacity");

    rb_get_kwargs(opt, keyword_ids, 0, 2, kwargs);
    encoding = kwargs[0];
    capacity = kwargs[1];

    if (n == 1) {
        orig = StringValue(orig);
    }
    else {
        orig = Qnil;
    }

    if (UNDEF_P(encoding)) {
        if (!NIL_P(orig)) {
            encoding = rb_obj_encoding(orig);
        }
    }

    if (!UNDEF_P(encoding)) {
        enc = rb_to_encoding(encoding);
    }

    // If capacity is nil, we're basically just duping `orig`.
    if (UNDEF_P(capacity)) {
        if (NIL_P(orig)) {
            VALUE empty_str = str_new(klass, "", 0);
            if (enc) {
                rb_enc_associate(empty_str, enc);
            }
            return empty_str;
        }
        VALUE copy = str_duplicate(klass, orig);
        rb_enc_associate(copy, enc);
        ENC_CODERANGE_CLEAR(copy);
        return copy;
    }

    long capa = 0;
    capa = NUM2LONG(capacity);
    if (capa < 0) {
        capa = 0;
    }

    if (!NIL_P(orig)) {
        long orig_capa = rb_str_capacity(orig);
        if (orig_capa > capa) {
            capa = orig_capa;
        }
    }

    VALUE str = str_enc_new(klass, NULL, capa, enc);
    STR_SET_LEN(str, 0);
    TERM_FILL(RSTRING_PTR(str), enc ? rb_enc_mbmaxlen(enc) : 1);

    if (!NIL_P(orig)) {
        rb_str_buf_append(str, orig);
    }

    return str;
}
#new(string = ''.encode(Encoding::ASCII_8BIT), **options) ⇒ String
Returns a new String object containing the given string.
The options are optional keyword options (see below).
With no argument given and keyword option encoding also not given, returns an empty string with the ::Encoding ASCII-8BIT:
s = String.new # => ""
s.encoding # => #<Encoding:ASCII-8BIT>
With argument string given and keyword option encoding not given, returns a new string with the same encoding as string:
s0 = 'foo'.encode(Encoding::UTF_16)
s1 = String.new(s0)
s1.encoding # => #<Encoding:UTF-16 (dummy)>
(Unlike String.new, a string literal like '' or a here document literal always has the script encoding.)
With keyword option encoding given, returns a string with the specified encoding; the encoding may be an ::Encoding object, an encoding name, or an encoding name alias:
String.new(encoding: Encoding::US_ASCII).encoding # => #<Encoding:US-ASCII>
String.new('', encoding: Encoding::US_ASCII).encoding # => #<Encoding:US-ASCII>
String.new('foo', encoding: Encoding::US_ASCII).encoding # => #<Encoding:US-ASCII>
String.new('foo', encoding: 'US-ASCII').encoding # => #<Encoding:US-ASCII>
String.new('foo', encoding: 'ASCII').encoding # => #<Encoding:US-ASCII>
The given encoding need not be valid for the string’s content, and its validity is not checked:
s = String.new('こんにちは', encoding: 'ascii')
s.valid_encoding? # => false
But the given encoding itself is checked:
String.new('foo', encoding: 'bar') # Raises ArgumentError.
With keyword option capacity given, the given value is advisory only, and may or may not set the size of the internal buffer, which may in turn affect performance:
String.new('foo', capacity: 1) # Buffer size is at least 4 (includes terminal null byte).
String.new('foo', capacity: 4096) # Buffer size is at least 4;
# may be equal to, greater than, or less than 4096.
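The cases above can be condensed into one runnable sketch. It relies only on the behavior described in this section; the sample strings and variable names are arbitrary, and nothing is asserted about actual buffer sizes, since capacity is advisory.

```ruby
# No arguments: an empty binary (ASCII-8BIT) string.
empty = String.new
empty.encoding == Encoding::BINARY    # => true

# With an argument: a copy that keeps the argument's encoding.
source = 'héllo'
copy = String.new(source)
copy.encoding == source.encoding      # => true

# The encoding: keyword overrides that; the alias 'ASCII' names US-ASCII.
ascii = String.new('foo', encoding: 'ASCII')
ascii.encoding == Encoding::US_ASCII  # => true

# capacity: is only a hint for the internal buffer; the content is unaffected.
buffer = String.new('foo', capacity: 4096)
buffer << 'bar'
buffer                                # => "foobar"
```

A preallocated buffer like this is a common pattern when many small pieces will be appended in a loop, though any speedup depends on the Ruby implementation.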
# File 'string.c', line 2024
static VALUE
rb_str_init(int argc, VALUE *argv, VALUE str)
{
    static ID keyword_ids[2];
    VALUE orig, opt, venc, vcapa;
    VALUE kwargs[2];
    rb_encoding *enc = 0;
    int n;

    if (!keyword_ids[0]) {
        keyword_ids[0] = rb_id_encoding();
        CONST_ID(keyword_ids[1], "capacity");
    }

    n = rb_scan_args(argc, argv, "01:", &orig, &opt);
    if (!NIL_P(opt)) {
        rb_get_kwargs(opt, keyword_ids, 0, 2, kwargs);
        venc = kwargs[0];
        vcapa = kwargs[1];
        if (!UNDEF_P(venc) && !NIL_P(venc)) {
            enc = rb_to_encoding(venc);
        }
        if (!UNDEF_P(vcapa) && !NIL_P(vcapa)) {
            long capa = NUM2LONG(vcapa);
            long len = 0;
            int termlen = enc ? rb_enc_mbminlen(enc) : 1;

            if (capa < STR_BUF_MIN_SIZE) {
                capa = STR_BUF_MIN_SIZE;
            }
            if (n == 1) {
                StringValue(orig);
                len = RSTRING_LEN(orig);
                if (capa < len) {
                    capa = len;
                }
                if (orig == str) n = 0;
            }
            str_modifiable(str);
            if (STR_EMBED_P(str) || FL_TEST(str, STR_SHARED|STR_NOFREE)) {
                /* make noembed always */
                const size_t size = (size_t)capa + termlen;
                const char *const old_ptr = RSTRING_PTR(str);
                const size_t osize = RSTRING_LEN(str) + TERM_LEN(str);
                char *new_ptr = ALLOC_N(char, size);
                if (STR_EMBED_P(str)) RUBY_ASSERT((long)osize <= str_embed_capa(str));
                memcpy(new_ptr, old_ptr, osize < size ? osize : size);
                FL_UNSET_RAW(str, STR_SHARED|STR_NOFREE);
                RSTRING(str)->as.heap.ptr = new_ptr;
            }
            else if (STR_HEAP_SIZE(str) != (size_t)capa + termlen) {
                SIZED_REALLOC_N(RSTRING(str)->as.heap.ptr, char,
                                (size_t)capa + termlen, STR_HEAP_SIZE(str));
            }
            STR_SET_LEN(str, len);
            TERM_FILL(&RSTRING(str)->as.heap.ptr[len], termlen);
            if (n == 1) {
                memcpy(RSTRING(str)->as.heap.ptr, RSTRING_PTR(orig), len);
                rb_enc_cr_str_exact_copy(str, orig);
            }
            FL_SET(str, STR_NOEMBED);
            RSTRING(str)->as.heap.aux.capa = capa;
        }
        else if (n == 1) {
            rb_str_replace(str, orig);
        }
        if (enc) {
            rb_enc_associate(str, enc);
            ENC_CODERANGE_CLEAR(str);
        }
    }
    else if (n == 1) {
        rb_str_replace(str, orig);
    }
    return str;
}
Class Method Details
.try_convert(object) ⇒ Object, ...
Attempts to convert the given object to a string.
If object is already a string, returns object, unmodified.
Otherwise, if object responds to :to_str, calls object.to_str and returns the result.
Returns nil if object does not respond to :to_str.
Raises TypeError unless object.to_str returns a string.
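The outcomes described above (returned unchanged, converted through to_str, or nil), plus the error case, can be seen with two small classes. Path and BadPath are hypothetical names invented for this sketch; they are not part of Ruby.

```ruby
# Hypothetical class that can stand in for a String via to_str.
class Path
  def initialize(path)
    @path = path
  end

  def to_str
    @path
  end
end

# Hypothetical class whose to_str breaks the contract by returning an Integer.
class BadPath
  def to_str
    42
  end
end

s = 'abc'
same = String.try_convert(s)                     # s itself, unmodified
converted = String.try_convert(Path.new('/tmp')) # => "/tmp"
missing = String.try_convert(42)                 # => nil: Integer has no to_str

error =
  begin
    String.try_convert(BadPath.new)
    nil
  rescue TypeError => e
    e # raised because to_str did not return a String
  end
```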
# File 'string.c', line 2931
static VALUE
rb_str_s_try_convert(VALUE dummy, VALUE str)
{
    return rb_check_string_type(str);
}
Instance Attribute Details
#ascii_only? ⇒ Boolean
(readonly)
Returns whether self contains only ASCII characters:
'abc'.ascii_only? # => true
"abc\u{6666}".ascii_only? # => false
Related: see Querying.
# File 'string.c', line 11571
static VALUE
rb_str_is_ascii_only_p(VALUE str)
{
    int cr = rb_enc_str_coderange(str);

    return RBOOL(cr == ENC_CODERANGE_7BIT);
}
#empty? ⇒ Boolean
(readonly)
Returns whether the length of self is zero:
'hello'.empty? # => false
' '.empty? # => false
''.empty? # => true
Related: see Querying.
# File 'string.c', line 2429
static VALUE
rb_str_empty(VALUE str)
{
    return RBOOL(RSTRING_LEN(str) == 0);
}
#valid_encoding? ⇒ Boolean
(readonly)
Returns true if self is encoded correctly, false otherwise:
"\xc2\xa1".force_encoding(Encoding::UTF_8).valid_encoding? # => true
"\xc2".force_encoding(Encoding::UTF_8).valid_encoding? # => false
"\x80".force_encoding(Encoding::UTF_8).valid_encoding? # => false
# File 'string.c', line 11551
static VALUE
rb_str_valid_encoding_p(VALUE str)
{
    int cr = rb_enc_str_coderange(str);

    return RBOOL(cr != ENC_CODERANGE_BROKEN);
}
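A typical use of the attribute predicates ascii_only?, empty?, and valid_encoding? is to inspect text that arrived as raw bytes and repair it only when necessary. This sketch assumes made-up input bytes and uses String#scrub (listed in the summary) for the repair; it is one reasonable approach, not the only one.

```ruby
# Made-up input: bytes labeled UTF-8 whose last byte (0xC3) starts a sequence that never finishes.
raw = "caf\xC3".b.force_encoding(Encoding::UTF_8)

raw.valid_encoding?   # => false
raw.ascii_only?       # => false: 0xC3 is not an ASCII byte
raw.empty?            # => false

# Repair only when needed: scrub replaces each invalid byte sequence with '?'.
clean = raw.valid_encoding? ? raw : raw.scrub('?')
clean                 # => "caf?"
clean.valid_encoding? # => true
clean.ascii_only?     # => true
```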
Instance Method Details
#%(object) ⇒ String
Returns the result of formatting object into the format specifications contained in self (see Format Specifications):
'%05d' % 123 # => "00123"
If self contains multiple format specifications, object must be an array or hash containing the objects to be formatted:
'%-5s: %016x' % [ 'ID', self.object_id ] # => "ID : 00002b054ec93168"
'foo = %{foo}' % {foo: 'bar'} # => "foo = bar"
'foo = %{foo}, baz = %{baz}' % {foo: 'bar', baz: 'bat'} # => "foo = bar, baz = bat"
Related: see Converting to New String.
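The hash-based form has two kinds of named reference: %{name} substitutes the value's to_s unformatted, while %<name> is followed by an ordinary format specification that is applied to the value. The sketch below (sample values are arbitrary) also shows that an array argument is spread across the specifications, so an array meant as a single value has to be wrapped.

```ruby
pi = 3.14159

# %{name}: plain substitution of pi.to_s; no formatting is applied.
plain = 'pi = %{pi}' % { pi: pi }           # => "pi = 3.14159"

# %<name>spec: the spec after the reference (here 05.1f) is applied.
formatted = 'pi = %<pi>05.1f' % { pi: pi }  # => "pi = 003.1"

# An array supplies one value per specification...
pair = '%s and %s' % %w[salt pepper]        # => "salt and pepper"

# ...so an array meant as one value must be wrapped in another array.
whole = 'list: %s' % [[1, 2]]               # => "list: [1, 2]"
```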
# File 'string.c', line 2597
static VALUE
rb_str_format_m(VALUE str, VALUE arg)
{
    VALUE tmp = rb_check_array_type(arg);

    if (!NIL_P(tmp)) {
        return rb_str_format(RARRAY_LENINT(tmp), RARRAY_CONST_PTR(tmp), str);
    }
    return rb_str_format(1, &arg, str);
}
#*(n) ⇒ String
Returns a new string containing n copies of self:
'Ho!' * 3 # => "Ho!Ho!Ho!"
'No!' * 0 # => ""
Related: see Converting to New String.
# File 'string.c', line 2519
VALUE
rb_str_times(VALUE str, VALUE times)
{
    VALUE str2;
    long n, len;
    char *ptr2;
    int termlen;

    if (times == INT2FIX(1)) {
        return str_duplicate(rb_cString, str);
    }
    if (times == INT2FIX(0)) {
        str2 = str_alloc_embed(rb_cString, 0);
        rb_enc_copy(str2, str);
        return str2;
    }
    len = NUM2LONG(times);
    if (len < 0) {
        rb_raise(rb_eArgError, "negative argument");
    }
    if (RSTRING_LEN(str) == 1 && RSTRING_PTR(str)[0] == 0) {
        if (STR_EMBEDDABLE_P(len, 1)) {
            str2 = str_alloc_embed(rb_cString, len + 1);
            memset(RSTRING_PTR(str2), 0, len + 1);
        }
        else {
            str2 = str_alloc_heap(rb_cString);
            RSTRING(str2)->as.heap.aux.capa = len;
            RSTRING(str2)->as.heap.ptr = ZALLOC_N(char, (size_t)len + 1);
        }
        STR_SET_LEN(str2, len);
        rb_enc_copy(str2, str);
        return str2;
    }
    if (len && LONG_MAX/len < RSTRING_LEN(str)) {
        rb_raise(rb_eArgError, "argument too big");
    }

    len *= RSTRING_LEN(str);
    termlen = TERM_LEN(str);
    str2 = str_enc_new(rb_cString, 0, len, STR_ENC_GET(str));
    ptr2 = RSTRING_PTR(str2);
    if (len) {
        n = RSTRING_LEN(str);
        memcpy(ptr2, RSTRING_PTR(str), n);
        while (n <= len/2) {
            memcpy(ptr2 + n, ptr2, n);
            n *= 2;
        }
        memcpy(ptr2 + n, ptr2, len-n);
    }
    STR_SET_LEN(str2, len);
    TERM_FILL(&ptr2[len], termlen);
    rb_enc_cr_str_copy_for_substr(str2, str);

    return str2;
}
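rb_str_times avoids n separate copies: it copies self once, repeatedly doubles the filled region with memcpy, and finishes with one partial copy. The following is a Ruby re-expression of that loop for illustration only. repeat_by_doubling is a hypothetical helper, not part of the String API; it works on a binary copy so that all lengths are byte counts, as in the C code, but it allocates new strings where the C code copies within one preallocated buffer, and it skips the C fast path for a single NUL byte.

```ruby
# Hypothetical helper: a Ruby sketch of the copy-by-doubling loop in rb_str_times.
def repeat_by_doubling(str, n)
  raise ArgumentError, 'negative argument' if n.negative?

  target = str.bytesize * n
  # Nothing to copy; mirrors the `if (len)` guard in the C code.
  return String.new('', encoding: str.encoding) if target.zero?

  buf = str.b # binary copy, so the lengths below are byte counts
  # Double the filled region while it is at most half the target size.
  buf = buf + buf while buf.bytesize <= target / 2
  # One final partial copy fills the remaining bytes.
  buf << buf.byteslice(0, target - buf.bytesize)
  buf.force_encoding(str.encoding)
end

repeat_by_doubling('Ho!', 3) # => "Ho!Ho!Ho!"
```

Doubling needs only about log2(n) copy steps instead of n, which is why the C implementation uses it.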
#+(other_string) ⇒ String
Returns a new string containing other_string concatenated to self:
'Hello from ' + self.to_s # => "Hello from main"
Related: see Converting to New String.
# File 'string.c', line 2446
VALUE
rb_str_plus(VALUE str1, VALUE str2)
{
    VALUE str3;
    rb_encoding *enc;
    char *ptr1, *ptr2, *ptr3;
    long len1, len2;
    int termlen;

    StringValue(str2);
    enc = rb_enc_check_str(str1, str2);
    RSTRING_GETMEM(str1, ptr1, len1);
    RSTRING_GETMEM(str2, ptr2, len2);
    termlen = rb_enc_mbminlen(enc);
    if (len1 > LONG_MAX - len2) {
        rb_raise(rb_eArgError, "string size too big");
    }
    str3 = str_enc_new(rb_cString, 0, len1+len2, enc);
    ptr3 = RSTRING_PTR(str3);
    memcpy(ptr3, ptr1, len1);
    memcpy(ptr3+len1, ptr2, len2);
    TERM_FILL(&ptr3[len1+len2], termlen);

    ENCODING_CODERANGE_SET(str3, rb_enc_to_index(enc),
                           ENC_CODERANGE_AND(ENC_CODERANGE(str1), ENC_CODERANGE(str2)));
    RB_GC_GUARD(str1);
    RB_GC_GUARD(str2);
    return str3;
}
#+@ ⇒ String, self
Returns self if self is not frozen and can be mutated without warning issuance.
Otherwise returns self.dup, which is not frozen.
Related: see Freezing/Unfreezing.
# File 'string.c', line 3260
static VALUE
str_uplus(VALUE str)
{
    if (OBJ_FROZEN(str) || CHILLED_STRING_P(str)) {
        return rb_str_dup(str);
    }
    else {
        return str;
    }
}
#-@ ⇒ String
Also known as: #dedup
Returns a frozen string equal to self.
The returned string is self if and only if all of the following are true:
- self is already frozen.
- self is an instance of String (rather than of a subclass of String).
- self has no instance variables set on it.
Otherwise, the returned string is a frozen copy of self.
Returning self, when possible, saves duplicating self; see Data deduplication.
It may also save duplicating other, already-existing, strings:
s0 = 'foo'
s1 = 'foo'
s0.object_id == s1.object_id # => false
(-s0).object_id == (-s1).object_id # => true
Note that method #-@ is convenient for defining a constant:
FileName = -'config/database.yml'
While its alias #dedup is better suited for chaining:
'foo'.dedup.gsub!('o')
Related: see Freezing/Unfreezing.
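A sketch contrasting the two unary operators. It builds receivers with String.new so they are plain, unfrozen String instances regardless of whether string literals are frozen or chilled in the running Ruby version; the variable names are arbitrary.

```ruby
# Unary plus on a frozen receiver yields an unfrozen copy...
frozen = 'abc'.freeze
copy = +frozen
copy.frozen?             # => false
copy.equal?(frozen)      # => false

# ...while an already-mutable String is returned as-is.
buffer = String.new('abc')
(+buffer).equal?(buffer) # => true

# Unary minus: equal strings deduplicate to one shared frozen object.
a = String.new('config')
b = String.new('config')
a.equal?(b)              # => false
(-a).equal?(-b)          # => true
(-a).frozen?             # => true
```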
# File 'string.c', line 3305
static VALUE
str_uminus(VALUE str)
{
    if (!BARE_STRING_P(str) && !rb_obj_frozen_p(str)) {
        str = rb_str_dup(str);
    }
    return rb_fstring(str);
}
#<<(object) ⇒ self
Appends a string representation of object to self; returns self.
If object is a string, appends it to self:
s = 'foo'
s << 'bar' # => "foobar"
s # => "foobar"
If object is an integer, its value is considered a codepoint; converts the value to a character before concatenating:
s = 'foo'
s << 33 # => "foo!"
Additionally, if the codepoint is in range 0..0xff and the encoding of self is Encoding::US_ASCII, changes the encoding to Encoding::ASCII_8BIT:
s = 'foo'.encode(Encoding::US_ASCII)
s.encoding # => #<Encoding:US-ASCII>
s << 0xff # => "foo\xFF"
s.encoding # => #<Encoding:BINARY (ASCII-8BIT)>
Raises RangeError if that codepoint is not representable in the encoding of self:
s = 'foo'
s.encoding # => #<Encoding:UTF-8>
s << 0x00110000 # 1114112 out of char range (RangeError)
s = 'foo'.encode(Encoding::EUC_JP)
s << 0x00800080 # invalid codepoint 0x800080 in EUC-JP (RangeError)
Related: see Modifying.
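A sketch of the integer cases above on a UTF-8 string; the sample text is arbitrary. Because U+00E9 takes two bytes in UTF-8, the byte count grows by two while the character count grows by one, and an out-of-range codepoint raises before anything is appended.

```ruby
s = String.new('caf') # UTF-8, copied from the literal
s << 0xE9             # codepoint U+00E9 is appended as "é"
s                     # => "café"
s.length              # => 4
s.bytesize            # => 5: "é" is two bytes in UTF-8

# Codepoints beyond U+10FFFF cannot be represented in UTF-8.
error =
  begin
    s << 0x110000
    nil
  rescue RangeError => e
    e
  end
error.class           # => RangeError
s                     # => "café": unchanged by the failed append
```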
# File 'string.c', line 3993
VALUE
rb_str_concat(VALUE str1, VALUE str2)
{
    unsigned int code;
    rb_encoding *enc = STR_ENC_GET(str1);
    int encidx;

    if (RB_INTEGER_TYPE_P(str2)) {
        if (rb_num_to_uint(str2, &code) == 0) {
        }
        else if (FIXNUM_P(str2)) {
            rb_raise(rb_eRangeError, "%ld out of char range", FIX2LONG(str2));
        }
        else {
            rb_raise(rb_eRangeError, "bignum out of char range");
        }
    }
    else {
        return rb_str_append(str1, str2);
    }

    encidx = rb_ascii8bit_appendable_encoding_index(enc, code);

    if (encidx >= 0) {
        rb_str_buf_cat_byte(str1, (unsigned char)code);
    }
    else {
        long pos = RSTRING_LEN(str1);
        int cr = ENC_CODERANGE(str1);
        int len;
        char *buf;

        switch (len = rb_enc_codelen(code, enc)) {
          case ONIGERR_INVALID_CODE_POINT_VALUE:
            rb_raise(rb_eRangeError, "invalid codepoint 0x%X in %s", code, rb_enc_name(enc));
            break;
          case ONIGERR_TOO_BIG_WIDE_CHAR_VALUE:
          case 0:
            rb_raise(rb_eRangeError, "%u out of char range", code);
            break;
        }
        buf = ALLOCA_N(char, len + 1);
        rb_enc_mbcput(code, buf, enc);
        if (rb_enc_precise_mbclen(buf, buf + len + 1, enc) != len) {
            rb_raise(rb_eRangeError, "invalid codepoint 0x%X in %s", code, rb_enc_name(enc));
        }
        rb_str_resize(str1, pos+len);
        memcpy(RSTRING_PTR(str1) + pos, buf, len);
        if (cr == ENC_CODERANGE_7BIT && code > 127) {
            cr = ENC_CODERANGE_VALID;
        }
        else if (cr == ENC_CODERANGE_BROKEN) {
            cr = ENC_CODERANGE_UNKNOWN;
        }
        ENC_CODERANGE_SET(str1, cr);
    }
    return str1;
}
#<=>(other_string) ⇒ 1, ...
Compares self and other_string, returning:
- -1 if other_string is larger.
- 0 if the two are equal.
- 1 if other_string is smaller.
- nil if the two are incomparable.
Examples:
'foo' <=> 'foo' # => 0
'foo' <=> 'food' # => -1
'food' <=> 'foo' # => 1
'FOO' <=> 'foo' # => -1
'foo' <=> 'FOO' # => 1
'foo' <=> 1 # => nil
Related: see Comparing.
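Because <=> compares bytes, every uppercase ASCII letter sorts before every lowercase one. This sketch, with arbitrary sample words, contrasts a plain sort with one that uses #casecmp (listed in the summary), and shows two of the ::Comparable methods that are built on <=>.

```ruby
words = %w[cherry apple Banana]

# sort uses <=>, which compares bytes: "B" (0x42) precedes "a" (0x61).
by_bytes = words.sort                          # => ["Banana", "apple", "cherry"]

# casecmp ignores ASCII case when ordering.
by_case = words.sort { |a, b| a.casecmp(b) }   # => ["apple", "Banana", "cherry"]

# Methods from ::Comparable use the same <=>.
'm'.between?('a', 'z')                         # => true
'zebra'.clamp('a', 'm')                        # => "m"
'apple' < 'apricot'                            # => true
```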
# File 'string.c', line 4279
static VALUE
rb_str_cmp_m(VALUE str1, VALUE str2)
{
    int result;
    VALUE s = rb_check_string_type(str2);
    if (NIL_P(s)) {
        return rb_invcmp(str1, str2);
    }
    result = rb_str_cmp(str1, s);
    return INT2FIX(result);
}
#==(object) ⇒ Boolean
Also known as: #===
Returns whether object is equal to self.
When object is a string, returns whether object has the same length and content as self:
s = 'foo'
s == 'foo' # => true
s == 'food' # => false
s == 'FOO' # => false
Returns false if the two strings’ encodings are not compatible:
"\u{e4 f6 fc}".encode(Encoding::ISO_8859_1) == ("\u{c4 d6 dc}") # => false
When object is not a string:
- If object responds to method #to_str, object == self is called and its return value is returned.
- If object does not respond to #to_str, false is returned.
Related: see Comparing.
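The non-string rule above means that responding to to_str is not enough by itself: String#== hands the comparison to the other object's own ==. Label and EqualLabel are hypothetical classes used only to show the two outcomes.

```ruby
# Hypothetical: responds to to_str but keeps Object#== (identity comparison).
class Label
  def initialize(text)
    @text = text
  end

  def to_str
    @text
  end
end

# Hypothetical: also defines == to compare against a string's content.
class EqualLabel < Label
  def ==(other)
    other.is_a?(String) && other == to_str
  end
end

'foo' == Label.new('foo')      # => false: Label#== is identity
'foo' == EqualLabel.new('foo') # => true: String#== calls EqualLabel#==
'foo' == 42                    # => false: Integer does not respond to to_str
```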
# File 'string.c', line 4227
VALUE
rb_str_equal(VALUE str1, VALUE str2)
{
    if (str1 == str2) return Qtrue;
    if (!RB_TYPE_P(str2, T_STRING)) {
        if (!rb_respond_to(str2, idTo_str)) {
            return Qfalse;
        }
        return rb_equal(str2, str1);
    }
    return rb_str_eql_internal(str1, str2);
}
#===(object) ⇒ Boolean
Alias for #==.
#=~(object) ⇒ Integer?
When object is a ::Regexp, returns the index of the first substring in self matched by object, or nil if no match is found; updates Regexp-related global variables:
'foo' =~ /f/ # => 0
$~ # => #<MatchData "f">
'foo' =~ /o/ # => 1
$~ # => #<MatchData "o">
'foo' =~ /x/ # => nil
$~ # => nil
Note that string =~ regexp is different from regexp =~ string (see Regexp#=~):
number = nil
'no. 9' =~ /(?<number>\d+)/ # => 4
number # => nil # Not assigned.
/(?<number>\d+)/ =~ 'no. 9' # => 4
number # => "9" # Assigned.
If object is a string, raises TypeError.
If object is neither a string nor a ::Regexp, returns the value returned by object =~ self.
Related: see Querying.
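The two non-Regexp branches of rb_str_match (whose source follows) can be checked directly: a string argument raises TypeError, and any other object has its own =~ called with self as the argument. length_matcher is a hypothetical object defined only for this sketch.

```ruby
# A string on the right-hand side is rejected outright.
message =
  begin
    'foo' =~ 'o'
  rescue TypeError => e
    e.message
  end
message                  # => "type mismatch: String given"

# Any other object gets the call reversed: object =~ self.
# Hypothetical matcher whose =~ returns the string's length.
length_matcher = Object.new
def length_matcher.=~(other)
  other.length
end

'abcd' =~ length_matcher # => 4
```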
# File 'string.c', line 5046
static VALUE
rb_str_match(VALUE x, VALUE y)
{
    switch (OBJ_BUILTIN_TYPE(y)) {
      case T_STRING:
        rb_raise(rb_eTypeError, "type mismatch: String given");

      case T_REGEXP:
        return rb_reg_match(y, x);

      default:
        return rb_funcall(y, idEqTilde, 1, x);
    }
}
#[](index) ⇒ String?
#[](start, length) ⇒ String?
#[](range) ⇒ String?
#[](substring) ⇒ String?
Also known as: #slice
Returns the substring of self specified by the arguments. See examples at String Slices.
Related: see Converting to New String.
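The following quick sketch shows several forms side by side. The regexp-with-capture form (s[regexp, capture]) is not listed in the signatures above, but the argc == 2 branch of the C function below handles it.

```ruby
s = 'hello world'
s[0]                # => "h"
s[0, 5]             # => "hello"
s[6..]              # => "world" (endless range, Ruby 2.6 or later)
s['wor']            # => "wor"
s['xyz']            # => nil
s[/o\s?w/]          # => "o w"
s[/(\w+) (\w+)/, 2] # => "world" (second capture group)
s[20]               # => nil
```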
# File 'string.c', line 5804
static VALUE rb_str_aref_m(int argc, VALUE *argv, VALUE str) { if (argc == 2) { if (RB_TYPE_P(argv[0], T_REGEXP)) { return rb_str_subpat(str, argv[0], argv[1]); } else { return rb_str_substr_two_fixnums(str, argv[0], argv[1], TRUE); } } rb_check_arity(argc, 1, 2); return rb_str_aref(str, argv[0]); }
#[]=(index, new_string)
#[]=(start, length, new_string)
#[]=(range, new_string)
#[]=(substring, new_string)
Replaces all, some, or none of the contents of self; returns new_string. See String Slices.
A few examples:
s = 'foo'
s[2] = 'rtune' # => "rtune"
s # => "fortune"
s[1, 5] = 'init' # => "init"
s # => "finite"
s[3..4] = 'al' # => "al"
s # => "finale"
s[/e$/] = 'ly' # => "ly"
s # => "finally"
s['lly'] = 'ncial' # => "ncial"
s # => "financial"
Related: see Modifying.
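The examples above do not show two cases, which this short sketch covers. The first replaces a single capture group (the argc == 3 regexp branch of the C function below). The second triggers the IndexError that is raised when the target cannot be found.

```ruby
s = +'hello world'
s[/(\w+) (\w+)/, 2] = 'there' # replace only the second capture group
s # => "hello there"

begin
  s['xyz'] = 'abc' # 'xyz' does not occur in s
rescue IndexError
  puts 'substring not found' # reached: IndexError was raised
end

begin
  s[20] = '!' # index beyond the end of s
rescue IndexError
  puts 'index out of range' # reached: IndexError was raised
end
```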
# File 'string.c', line 6041
static VALUE rb_str_aset_m(int argc, VALUE *argv, VALUE str) { if (argc == 3) { if (RB_TYPE_P(argv[0], T_REGEXP)) { rb_str_subpat_set(str, argv[0], argv[1], argv[2]); } else { rb_str_update(str, NUM2LONG(argv[0]), NUM2LONG(argv[1]), argv[2]); } return argv[2]; } rb_check_arity(argc, 2, 3); return rb_str_aset(str, argv[0], argv[1]); }
#append_as_bytes(*objects) ⇒ self
Concatenates each object in objects into self; returns self; performs no encoding validation or conversion:
s = 'foo'
s.append_as_bytes(" \xE2\x82") # => "foo \xE2\x82"
s.valid_encoding? # => false
s.append_as_bytes("\xAC 12") # => "foo € 12"
s.valid_encoding? # => true
When a given object is an integer, its value is treated as an 8-bit byte; if the integer occupies more than one byte (i.e., is greater than 255), only its low-order byte is appended (similar to #setbyte):
s = ""
s.append_as_bytes(0, 257) # => "\u0000\u0001"
s.bytesize # => 2
Related: see Modifying.
# File 'string.c', line 3842
VALUE
rb_str_append_as_bytes(int argc, VALUE *argv, VALUE str)
{
    long needed_capacity = 0;
    volatile VALUE t0;
    enum ruby_value_type *types = ALLOCV_N(enum ruby_value_type, t0, argc);

    for (int index = 0; index < argc; index++) {
        VALUE obj = argv[index];
        enum ruby_value_type type = types[index] = rb_type(obj);
        switch (type) {
          case T_FIXNUM:
          case T_BIGNUM:
            needed_capacity++;
            break;
          case T_STRING:
            needed_capacity += RSTRING_LEN(obj);
            break;
          default:
            rb_raise(
                rb_eTypeError,
                "wrong argument type %"PRIsVALUE" (expected String or Integer)",
                rb_obj_class(obj)
            );
            break;
        }
    }

    str_ensure_available_capa(str, needed_capacity);
    char *sptr = RSTRING_END(str);

    for (int index = 0; index < argc; index++) {
        VALUE obj = argv[index];
        enum ruby_value_type type = types[index];
        switch (type) {
          case T_FIXNUM:
          case T_BIGNUM: {
            argv[index] = obj = rb_int_and(obj, INT2FIX(0xff));
            char byte = (char)(NUM2INT(obj) & 0xFF);
            *sptr = byte;
            sptr++;
            break;
          }
          case T_STRING: {
            const char *ptr;
            long len;
            RSTRING_GETMEM(obj, ptr, len);
            memcpy(sptr, ptr, len);
            sptr += len;
            break;
          }
          default:
            rb_bug("append_as_bytes arguments should have been validated");
        }
    }

    STR_SET_LEN(str, RSTRING_LEN(str) + needed_capacity);
    TERM_FILL(sptr, TERM_LEN(str)); /* sentinel */

    int cr = ENC_CODERANGE(str);
    switch (cr) {
      case ENC_CODERANGE_7BIT: {
        for (int index = 0; index < argc; index++) {
            VALUE obj = argv[index];
            enum ruby_value_type type = types[index];
            switch (type) {
              case T_FIXNUM:
              case T_BIGNUM: {
                if (!ISASCII(NUM2INT(obj))) {
                    goto clear_cr;
                }
                break;
              }
              case T_STRING: {
                if (ENC_CODERANGE(obj) != ENC_CODERANGE_7BIT) {
                    goto clear_cr;
                }
                break;
              }
              default:
                rb_bug("append_as_bytes arguments should have been validated");
            }
        }
        break;
      }
      case ENC_CODERANGE_VALID:
        if (ENCODING_GET_INLINED(str) == ENCINDEX_ASCII_8BIT) {
            goto keep_cr;
        }
        else {
            goto clear_cr;
        }
        break;
      default:
        goto clear_cr;
        break;
    }

    RB_GC_GUARD(t0);

  clear_cr:
    // If no fast path was hit, we clear the coderange.
    // append_as_bytes is predominantly meant to be used in
    // buffering situations, hence it's likely the coderange
    // will never be scanned, so it's not worth spending time
    // precomputing the coderange except for simple and common
    // situations.
    ENC_CODERANGE_CLEAR(str);
  keep_cr:
    return str;
}
#b ⇒ String
Returns a copy of self that has ASCII-8BIT encoding; the underlying bytes are not modified:
s = "\x99"
s.encoding # => #<Encoding:UTF-8>
t = s.b # => "\x99"
t.encoding # => #<Encoding:ASCII-8BIT>
s = "\u4095" # => "䂕"
s.encoding # => #<Encoding:UTF-8>
s.bytes # => [228, 130, 149]
t = s.b # => "\xE4\x82\x95"
t.encoding # => #<Encoding:ASCII-8BIT>
t.bytes # => [228, 130, 149]
Related: see Converting to New String.
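This small sketch shows the practical consequence: after #b, length-based methods count bytes rather than characters, while the receiver keeps its original encoding.

```ruby
s = 'résumé'
raw = s.b
s.size                                       # => 6 (characters)
raw.size                                     # => 8 (bytes: each "é" is 2 bytes in UTF-8)
raw.encoding == Encoding::ASCII_8BIT         # => true
s.encoding                                   # => #<Encoding:UTF-8> (receiver unchanged)
raw.dup.force_encoding(Encoding::UTF_8) == s # => true (same bytes)
```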
# File 'string.c', line 11507
static VALUE
rb_str_b(VALUE str)
{
    VALUE str2;
    if (STR_EMBED_P(str)) {
        str2 = str_alloc_embed(rb_cString, RSTRING_LEN(str) + TERM_LEN(str));
    }
    else {
        str2 = str_alloc_heap(rb_cString);
    }
    str_replace_shared_without_enc(str2, str);

    if (rb_enc_asciicompat(STR_ENC_GET(str))) {
        // BINARY strings can never be broken; they're either 7-bit ASCII or VALID.
        // If we know the receiver's code range then we know the result's code range.
        int cr = ENC_CODERANGE(str);
        switch (cr) {
          case ENC_CODERANGE_7BIT:
            ENC_CODERANGE_SET(str2, ENC_CODERANGE_7BIT);
            break;
          case ENC_CODERANGE_BROKEN:
          case ENC_CODERANGE_VALID:
            ENC_CODERANGE_SET(str2, ENC_CODERANGE_VALID);
            break;
          default:
            ENC_CODERANGE_CLEAR(str2);
            break;
        }
    }

    return str2;
}
#byteindex(object, offset = 0) ⇒ Integer?
Returns the 0-based integer index of a substring of self specified by object (a string or ::Regexp) and offset, or nil if there is no such substring; the returned index is a count of bytes (not characters).
When object is a string, returns the index of the first found substring equal to object:
s = 'foo' # => "foo"
s.size # => 3 # Three 1-byte characters.
s.bytesize # => 3 # Three bytes.
s.byteindex('f') # => 0
s.byteindex('o') # => 1
s.byteindex('oo') # => 1
s.byteindex('ooo') # => nil
When object is a ::Regexp, returns the index of the first found substring matching object; updates Regexp-related global variables:
s = 'foo'
s.byteindex(/f/) # => 0
$~ # => #<MatchData "f">
s.byteindex(/o/) # => 1
s.byteindex(/oo/) # => 1
s.byteindex(/ooo/) # => nil
$~ # => nil
Integer argument offset, if given, specifies the 0-based index of the byte where searching is to begin.
When offset is non-negative, searching begins at byte position offset:
s = 'foo'
s.byteindex('o', 1) # => 1
s.byteindex('o', 2) # => 2
s.byteindex('o', 3) # => nil
When offset is negative, counts backward from the end of self:
s = 'foo'
s.byteindex('o', -1) # => 2
s.byteindex('o', -2) # => 1
s.byteindex('o', -3) # => 1
s.byteindex('o', -4) # => nil
Raises IndexError if the byte at offset is not the first byte of a character:
s = "\uFFFF\uFFFF" # => "\uFFFF\uFFFF"
s.size # => 2 # Two 3-byte characters.
s.bytesize # => 6 # Six bytes.
s.byteindex("\uFFFF") # => 0
s.byteindex("\uFFFF", 1) # Raises IndexError
s.byteindex("\uFFFF", 2) # Raises IndexError
s.byteindex("\uFFFF", 3) # => 3
s.byteindex("\uFFFF", 4) # Raises IndexError
s.byteindex("\uFFFF", 5) # Raises IndexError
s.byteindex("\uFFFF", 6) # => nil
Related: see Querying.
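In this sketch (Ruby 3.2 or later, where #byteindex is available), the byte offset returned by #byteindex is passed directly to #byteslice. By contrast, #index returns a character offset.

```ruby
s = 'こんにちは世界'
i = s.byteindex('世')            # => 15 (five 3-byte characters precede it)
s.index('世')                    # => 5 (character index, for comparison)
s.byteslice(i, '世'.bytesize)    # => "世"
```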
# File 'string.c', line 4637
static VALUE rb_str_byteindex_m(int argc, VALUE *argv, VALUE str) { VALUE sub; VALUE initpos; long pos; if (rb_scan_args(argc, argv, "11", &sub, &initpos) == 2) { long slen = RSTRING_LEN(str); pos = NUM2LONG(initpos); if (pos < 0 ? (pos += slen) < 0 : pos > slen) { if (RB_TYPE_P(sub, T_REGEXP)) { rb_backref_set(Qnil); } return Qnil; } } else { pos = 0; } str_ensure_byte_pos(str, pos); if (RB_TYPE_P(sub, T_REGEXP)) { if (rb_reg_search(sub, str, pos, 0) >= 0) { VALUE match = rb_backref_get(); struct re_registers *regs = RMATCH_REGS(match); pos = BEG(0); return LONG2NUM(pos); } } else { StringValue(sub); pos = rb_str_byteindex(str, sub, pos); if (pos >= 0) return LONG2NUM(pos); } return Qnil; }
#byterindex(object, offset = self.bytesize) ⇒ Integer?
Returns the 0-based integer index of a substring of self that is the last match for the given object (a string or ::Regexp) and offset, or nil if there is no such substring; the returned index is a count of bytes (not characters).
When object is a string, returns the index of the last found substring equal to object:
s = 'foo' # => "foo"
s.size # => 3 # Three 1-byte characters.
s.bytesize # => 3 # Three bytes.
s.byterindex('f') # => 0
s.byterindex('o') # => 2
s.byterindex('oo') # => 1
s.byterindex('ooo') # => nil
When object is a ::Regexp, returns the index of the last found substring matching object; updates Regexp-related global variables:
s = 'foo'
s.byterindex(/f/) # => 0
$~ # => #<MatchData "f">
s.byterindex(/o/) # => 2
s.byterindex(/oo/) # => 1
s.byterindex(/ooo/) # => nil
$~ # => nil
The last match is the one that starts at the last possible position, not the last of the longest matches:
s = 'foo'
s.byterindex(/o+/) # => 2
$~ # => #<MatchData "o">
To get the last longest match, use a negative lookbehind:
s = 'foo'
s.byterindex(/(?<!o)o+/) # => 1
$~ # => #<MatchData "oo">
Or use method #byteindex with negative lookahead:
s = 'foo'
s.byteindex(/o+(?!.*o)/) # => 1
$~ # => #<MatchData "oo">
Integer argument offset, if given, specifies the 0-based index of the byte where searching is to end.
When offset is non-negative, searching ends at byte position offset:
s = 'foo'
s.byterindex('o', 0) # => nil
s.byterindex('o', 1) # => 1
s.byterindex('o', 2) # => 2
s.byterindex('o', 3) # => 2
When offset is negative, counts backward from the end of self:
s = 'foo'
s.byterindex('o', -1) # => 2
s.byterindex('o', -2) # => 1
s.byterindex('o', -3) # => nil
Raises IndexError if the byte at offset is not the first byte of a character:
s = "\uFFFF\uFFFF" # => "\uFFFF\uFFFF"
s.size # => 2 # Two 3-byte characters.
s.bytesize # => 6 # Six bytes.
s.byterindex("\uFFFF") # => 3
s.byterindex("\uFFFF", 1) # Raises IndexError
s.byterindex("\uFFFF", 2) # Raises IndexError
s.byterindex("\uFFFF", 3) # => 3
s.byterindex("\uFFFF", 4) # Raises IndexError
s.byterindex("\uFFFF", 5) # Raises IndexError
s.byterindex("\uFFFF", 6) # => nil
Related: see Querying.
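This sketch (Ruby 3.2 or later) splits a path at its last separator by byte offset. The path value is made up for the example, and character-based methods such as #rindex could produce the same result.

```ruby
path = 'données/rapport/été.txt'
i = path.byterindex('/')                    # => 16 ("é" in "données" takes 2 bytes)
path.rindex('/')                            # => 15 (character index, for comparison)
dir  = path.byteslice(0, i)                 # => "données/rapport"
base = path.byteslice(i + 1, path.bytesize) # => "été.txt" (length clamped to what is available)
```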
# File 'string.c', line 4976
static VALUE rb_str_byterindex_m(int argc, VALUE *argv, VALUE str) { VALUE sub; VALUE initpos; long pos, len = RSTRING_LEN(str); if (rb_scan_args(argc, argv, "11", &sub, &initpos) == 2) { pos = NUM2LONG(initpos); if (pos < 0 && (pos += len) < 0) { if (RB_TYPE_P(sub, T_REGEXP)) { rb_backref_set(Qnil); } return Qnil; } if (pos > len) pos = len; } else { pos = len; } str_ensure_byte_pos(str, pos); if (RB_TYPE_P(sub, T_REGEXP)) { if (rb_reg_search(sub, str, pos, 1) >= 0) { VALUE match = rb_backref_get(); struct re_registers *regs = RMATCH_REGS(match); pos = BEG(0); return LONG2NUM(pos); } } else { StringValue(sub); pos = rb_str_byterindex(str, sub, pos); if (pos >= 0) return LONG2NUM(pos); } return Qnil; }
#bytes ⇒ array_of_bytes
Returns an array of the bytes in self:
'hello'.bytes # => [104, 101, 108, 108, 111]
'тест'.bytes # => [209, 130, 208, 181, 209, 129, 209, 130]
'こんにちは'.bytes
# => [227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175]
Related: see Converting to Non-String.
# File 'string.c', line 9791
static VALUE rb_str_bytes(VALUE str) { VALUE ary = WANTARRAY("bytes", RSTRING_LEN(str)); return rb_str_enumerate_bytes(str, ary); }
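#bytesize ⇒ Integer
Returns the count of bytes in self:
'foo'.bytesize # => 3
'тест'.bytesize # => 8
'こんにちは'.bytesize # => 15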
# File 'string.c', line 2410
VALUE rb_str_bytesize(VALUE str) { return LONG2NUM(RSTRING_LEN(str)); }
#byteslice(offset, length = 1) ⇒ String?
#byteslice(range) ⇒ String?
Returns a substring of self, or nil if the substring cannot be constructed.
With integer arguments offset and length given, returns the substring beginning at the given offset and of the given length (as available):
s = '0123456789' # => "0123456789"
s.byteslice(2) # => "2"
s.byteslice(200) # => nil
s.byteslice(4, 3) # => "456"
s.byteslice(4, 30) # => "456789"
Returns nil if length is negative or offset falls outside of self:
s.byteslice(4, -1) # => nil
s.byteslice(40, 2) # => nil
Counts backward from the end of self if offset is negative:
s = '0123456789' # => "0123456789"
s.byteslice(-4) # => "6"
s.byteslice(-4, 3) # => "678"
With Range argument range given, returns byteslice(range.begin, range.size):
s = '0123456789' # => "0123456789"
s.byteslice(4..6) # => "456"
s.byteslice(-6..-4) # => "456"
s.byteslice(5..2) # => "" # range.size is zero.
s.byteslice(40..42) # => nil
The starting and ending offsets need not be on character boundaries:
s = 'こんにちは'
s.byteslice(0, 3) # => "こ"
s.byteslice(1, 3) # => "\x81\x93\xE3"
The encodings of self and the returned substring are always the same:
s.encoding # => #<Encoding:UTF-8>
s.byteslice(0, 3).encoding # => #<Encoding:UTF-8>
s.byteslice(1, 3).encoding # => #<Encoding:UTF-8>
But, depending on the character boundaries, the encoding of the returned substring may not be valid:
s.valid_encoding? # => true
s.byteslice(0, 3).valid_encoding? # => true
s.byteslice(1, 3).valid_encoding? # => false
Related: see Converting to New String.
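Because #byteslice may cut through a multibyte character, a common pattern is to drop a trailing partial character afterward. This minimal sketch assumes the input is valid UTF-8, and truncate_bytes is a hypothetical helper name. Here, String#scrub('') removes invalid bytes; for valid input, the only invalid bytes can be a partial character at the cut.

```ruby
# Hypothetical helper: keep at most max_bytes bytes of str without
# ending in a partial character. Assumes str is valid and max_bytes >= 0.
def truncate_bytes(str, max_bytes)
  str.byteslice(0, max_bytes).scrub('')
end

truncate_bytes('こんにちは', 7) # => "こん" (byte 6 starts "に", which is cut off)
truncate_bytes('hello', 3)      # => "hel"
truncate_bytes('hello', 50)     # => "hello"
```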
# File 'string.c', line 6862
static VALUE rb_str_byteslice(int argc, VALUE *argv, VALUE str) { if (argc == 2) { long beg = NUM2LONG(argv[0]); long len = NUM2LONG(argv[1]); return str_byte_substr(str, beg, len, TRUE); } rb_check_arity(argc, 1, 2); return str_byte_aref(str, argv[0]); }
#bytesplice(offset, length, str) ⇒ self
#bytesplice(offset, length, str, str_offset, str_length) ⇒ self
#bytesplice(range, str) ⇒ self
#bytesplice(range, str, str_range) ⇒ self
Replaces target bytes in self with source bytes from the given string str; returns self.
In the first form, arguments offset and length determine the target bytes, and the source bytes are all of the given str:
'0123456789'.bytesplice(0, 3, 'abc') # => "abc3456789"
'0123456789'.bytesplice(3, 3, 'abc') # => "012abc6789"
'0123456789'.bytesplice(0, 50, 'abc') # => "abc"
'0123456789'.bytesplice(50, 3, 'abc') # Raises IndexError.
The counts of the target bytes and source bytes may differ:
'0123456789'.bytesplice(0, 6, 'abc') # => "abc6789" # Shorter source.
'0123456789'.bytesplice(0, 1, 'abc') # => "abc123456789" # Shorter target.
And either count may be zero (i.e., specifying an empty string):
'0123456789'.bytesplice(0, 3, '') # => "3456789" # Empty source.
'0123456789'.bytesplice(0, 0, 'abc') # => "abc0123456789" # Empty target.
In the second form, just as in the first, arguments offset and length determine the target bytes; argument str contains the source bytes, and the additional arguments str_offset and str_length determine the actual source bytes:
'0123456789'.bytesplice(0, 3, 'abc', 0, 3) # => "abc3456789"
'0123456789'.bytesplice(0, 3, 'abc', 1, 1) # => "b3456789" # Shorter source.
'0123456789'.bytesplice(0, 1, 'abc', 0, 3) # => "abc123456789" # Shorter target.
'0123456789'.bytesplice(0, 3, 'abc', 1, 0) # => "3456789" # Empty source.
'0123456789'.bytesplice(0, 0, 'abc', 0, 3) # => "abc0123456789" # Empty target.
In the third form, argument range determines the target bytes, and the source bytes are all of the given str:
'0123456789'.bytesplice(0..2, 'abc') # => "abc3456789"
'0123456789'.bytesplice(3..5, 'abc') # => "012abc6789"
'0123456789'.bytesplice(0..5, 'abc') # => "abc6789" # Shorter source.
'0123456789'.bytesplice(0..0, 'abc') # => "abc123456789" # Shorter target.
'0123456789'.bytesplice(0..2, '') # => "3456789" # Empty source.
'0123456789'.bytesplice(0...0, 'abc') # => "abc0123456789" # Empty target.
In the fourth form, just as in the third, argument range determines the target bytes; argument str contains the source bytes, and the additional argument str_range determines the actual source bytes:
'0123456789'.bytesplice(0..2, 'abc', 0..2) # => "abc3456789"
'0123456789'.bytesplice(3..5, 'abc', 0..2) # => "012abc6789"
'0123456789'.bytesplice(0..2, 'abc', 0..1) # => "ab3456789" # Shorter source.
'0123456789'.bytesplice(0..1, 'abc', 0..2) # => "abc23456789" # Shorter target.
'0123456789'.bytesplice(0..2, 'abc', 0...0) # => "3456789" # Empty source.
'0123456789'.bytesplice(0...0, 'abc', 0..2) # => "abc0123456789" # Empty target.
In any of the forms, the beginnings and endings of both source and target must be on character boundaries.
In these examples, self has five 3-byte characters, and so has character boundaries at offsets 0, 3, 6, 9, 12, and 15:
'こんにちは'.bytesplice(0, 3, 'abc') # => "abcんにちは"
'こんにちは'.bytesplice(1, 3, 'abc') # Raises IndexError.
'こんにちは'.bytesplice(0, 2, 'abc') # Raises IndexError.
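This sketch (Ruby 3.2 or later) replaces one multibyte character by byte offset. It shows that #bytesplice modifies and returns the receiver itself. It also shows that the replacement may have a different byte length than the target.

```ruby
s = +'こんにちは'
r = s.bytesplice(3, 3, 'ばん') # target is "ん", bytes 3...6
s           # => "こばんにちは"
r.equal?(s) # => true (the receiver itself is returned)

begin
  s.bytesplice(1, 3, 'x') # offset 1 is inside "こ"
rescue IndexError
  puts 'not on a character boundary' # reached: IndexError was raised
end
s # => "こばんにちは" (unchanged by the failed call)
```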
# File 'string.c', line 6907
static VALUE rb_str_bytesplice(int argc, VALUE *argv, VALUE str) { long beg, len, vbeg, vlen; VALUE val; int cr; rb_check_arity(argc, 2, 5); if (!(argc == 2 || argc == 3 || argc == 5)) { rb_raise(rb_eArgError, "wrong number of arguments (given %d, expected 2, 3, or 5)", argc); } if (argc == 2 || (argc == 3 && !RB_INTEGER_TYPE_P(argv[0]))) { if (!rb_range_beg_len(argv[0], &beg, &len, RSTRING_LEN(str), 2)) { rb_raise(rb_eTypeError, "wrong argument type %s (expected Range)", rb_builtin_class_name(argv[0])); } val = argv[1]; StringValue(val); if (argc == 2) { /* bytesplice(range, str) */ vbeg = 0; vlen = RSTRING_LEN(val); } else { /* bytesplice(range, str, str_range) */ if (!rb_range_beg_len(argv[2], &vbeg, &vlen, RSTRING_LEN(val), 2)) { rb_raise(rb_eTypeError, "wrong argument type %s (expected Range)", rb_builtin_class_name(argv[2])); } } } else { beg = NUM2LONG(argv[0]); len = NUM2LONG(argv[1]); val = argv[2]; StringValue(val); if (argc == 3) { /* bytesplice(index, length, str) */ vbeg = 0; vlen = RSTRING_LEN(val); } else { /* bytesplice(index, length, str, str_index, str_length) */ vbeg = NUM2LONG(argv[3]); vlen = NUM2LONG(argv[4]); } } str_check_beg_len(str, &beg, &len); str_check_beg_len(val, &vbeg, &vlen); str_modify_keep_cr(str); if (RB_UNLIKELY(ENCODING_GET_INLINED(str) != ENCODING_GET_INLINED(val))) { rb_enc_associate(str, rb_enc_check(str, val)); } rb_str_update_1(str, beg, len, val, vbeg, vlen); cr = ENC_CODERANGE_AND(ENC_CODERANGE(str), ENC_CODERANGE(val)); if (cr != ENC_CODERANGE_BROKEN) ENC_CODERANGE_SET(str, cr); return str; }
#capitalize(mapping = :ascii) ⇒ String
Returns a string containing the characters in self, each with possibly changed case:
- The first character is upcased.
- All other characters are downcased.
Examples:
'hello world'.capitalize # => "Hello world"
'HELLO WORLD'.capitalize # => "Hello world"
Some characters have no uppercase or lowercase forms, and so are not changed; see Case Mapping:
'1, 2, 3, ...'.capitalize # => "1, 2, 3, ..."
The casing is affected by the given mapping, such as :ascii or :turkic (:fold is accepted only by the downcasing methods); with no mapping given, full Unicode case mapping is applied. See Case Mapping.
Related: see Converting to New String.
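This short sketch shows how the mapping changes the result for non-ASCII text. The outputs assume a UTF-8 source file and reflect the full Unicode case mapping used when no mapping is given.

```ruby
'éCOLE'.capitalize             # => "École" (full Unicode case mapping)
'éCOLE'.capitalize(:ascii)     # => "école" (only ASCII letters change)
'iSTANBUL'.capitalize          # => "Istanbul"
'iSTANBUL'.capitalize(:turkic) # => "İstanbul" (Turkic dotted capital I)

begin
  'abc'.capitalize(:fold)
rescue ArgumentError
  puts ':fold is rejected' # reached: :fold is only for downcasing
end
```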
# File 'string.c', line 8262
static VALUE rb_str_capitalize(int argc, VALUE *argv, VALUE str) { rb_encoding *enc; OnigCaseFoldType flags = ONIGENC_CASE_UPCASE | ONIGENC_CASE_TITLECASE; VALUE ret; flags = check_case_options(argc, argv, flags); enc = str_true_enc(str); if (RSTRING_LEN(str) == 0 || !RSTRING_PTR(str)) return str; if (flags&ONIGENC_CASE_ASCII_ONLY) { ret = rb_str_new(0, RSTRING_LEN(str)); rb_str_ascii_casemap(str, ret, &flags, enc); } else { ret = rb_str_casemap(str, &flags, enc); } return ret; }
#capitalize!(mapping = :ascii) ⇒ self?
Like #capitalize, except that:
- Changes character casings in self (not in a copy of self).
- Returns self if any changes are made, nil otherwise.
Related: see Modifying.
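Because #capitalize! returns nil when nothing changes, chaining another call on its result can fail. This minimal sketch shows the pitfall and one common workaround.

```ruby
s = +'Hello'
s.capitalize! # => nil (already capitalized, so nothing changed)

t = +'hello'
t.capitalize! # => "Hello"

# s.capitalize!.strip would raise NoMethodError here, because the result is nil.
u = +'Alice'
(u.capitalize! || u).strip # => "Alice" (fall back to the receiver)
```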
# File 'string.c', line 8215
static VALUE rb_str_capitalize_bang(int argc, VALUE *argv, VALUE str) { rb_encoding *enc; OnigCaseFoldType flags = ONIGENC_CASE_UPCASE | ONIGENC_CASE_TITLECASE; flags = check_case_options(argc, argv, flags); str_modify_keep_cr(str); enc = str_true_enc(str); if (RSTRING_LEN(str) == 0 || !RSTRING_PTR(str)) return Qnil; if (flags&ONIGENC_CASE_ASCII_ONLY) rb_str_ascii_casemap(str, str, &flags, enc); else str_shared_replace(str, rb_str_casemap(str, &flags, enc)); if (ONIGENC_CASE_MODIFIED&flags) return str; return Qnil; }
#casecmp(other_string) ⇒ -1, 0, 1, or nil
Ignoring case, compares self and other_string; returns:
- -1 if self.downcase is smaller than other_string.downcase.
- 0 if the two are equal.
- 1 if self.downcase is larger than other_string.downcase.
- nil if the two are incomparable.
See Case Mapping.
Examples:
'foo'.casecmp('goo') # => -1
'goo'.casecmp('foo') # => 1
'foo'.casecmp('food') # => -1
'food'.casecmp('foo') # => 1
'FOO'.casecmp('foo') # => 0
'foo'.casecmp('FOO') # => 0
'foo'.casecmp(1) # => nil
Related: see Comparing.
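For strings, #casecmp returns -1, 0, or 1, so it can serve directly as a sort comparator. This sketch uses made-up words:

```ruby
words = %w[apple Banana cherry]
words.sort                         # => ["Banana", "apple", "cherry"] (byte order: uppercase first)
words.sort { |a, b| a.casecmp(b) } # => ["apple", "Banana", "cherry"]
```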
# File 'string.c', line 4320
static VALUE rb_str_casecmp(VALUE str1, VALUE str2) { VALUE s = rb_check_string_type(str2); if (NIL_P(s)) { return Qnil; } return str_casecmp(str1, s); }
#casecmp?(other_string) ⇒ true, false, or nil
Returns true if self and other_string are equal after Unicode case folding, false if unequal, nil if incomparable.
See Case Mapping.
Examples:
'foo'.casecmp?('goo') # => false
'goo'.casecmp?('foo') # => false
'foo'.casecmp?('food') # => false
'food'.casecmp?('foo') # => false
'FOO'.casecmp?('foo') # => true
'foo'.casecmp?('FOO') # => true
'foo'.casecmp?(1) # => nil
Related: see Comparing.
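The difference between #casecmp and #casecmp? shows up with non-ASCII letters. In this sketch, #casecmp folds only ASCII letters and compares the remaining bytes as-is, while #casecmp? applies Unicode case folding.

```ruby
a = "\u{e4 f6 fc}" # "äöü"
b = "\u{c4 d6 dc}" # "ÄÖÜ"
a.casecmp(b)          # => 1 (non-ASCII letters are not folded)
a.casecmp?(b)         # => true (Unicode case folding)
'FOO'.casecmp?('foo') # => true
```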
# File 'string.c', line 4409
static VALUE rb_str_casecmp_p(VALUE str1, VALUE str2) { VALUE s = rb_check_string_type(str2); if (NIL_P(s)) { return Qnil; } return str_casecmp_p(str1, s); }
#center(size, pad_string = ' ') ⇒ String
Returns a centered copy of self.
If integer argument size is greater than the size (in characters) of self, returns a new string of length size that is a copy of self, centered and padded on one or both ends with pad_string:
'hello'.center(6) # => "hello " # Padded on one end.
'hello'.center(10) # => "  hello   " # Padded on both ends.
'hello'.center(20, '-|') # => "-|-|-|-hello-|-|-|-|" # Some padding repeated.
'hello'.center(10, 'abcdefg') # => "abhelloabc" # Some padding not used.
' hello '.center(13) # => "    hello    "
'тест'.center(10) # => "   тест   "
'こんにちは'.center(10) # => "  こんにちは   " # Multi-byte characters.
If size is less than or equal to the size of self, returns an unpadded copy of self:
'hello'.center(5) # => "hello"
'hello'.center(-10) # => "hello"
Related: see Converting to New String.
# File 'string.c', line 11110
static VALUE rb_str_center(int argc, VALUE *argv, VALUE str) { return rb_str_justify(argc, argv, str, 'c'); }
#chars ⇒ array_of_characters
Returns an array of the characters in self:
'hello'.chars # => ["h", "e", "l", "l", "o"]
'тест'.chars # => ["т", "е", "с", "т"]
'こんにちは'.chars # => ["こ", "ん", "に", "ち", "は"]
''.chars # => []
Related: see Converting to Non-String.
# File 'string.c', line 9860
static VALUE rb_str_chars(VALUE str) { VALUE ary = WANTARRAY("chars", rb_str_strlen(str)); return rb_str_enumerate_chars(str, ary); }
#chomp(line_sep = $/) ⇒ String
Returns a new string copied from self, with trailing characters possibly removed.
When line_sep is "\n", removes the last one or two characters if they are "\r", "\n", or "\r\n" (but not "\n\r"):
$/ # => "\n"
"abc\r".chomp # => "abc"
"abc\n".chomp # => "abc"
"abc\r\n".chomp # => "abc"
"abc\n\r".chomp # => "abc\n"
"тест\r\n".chomp # => "тест"
"こんにちは\r\n".chomp # => "こんにちは"
When line_sep is '' (an empty string), removes multiple trailing occurrences of "\n" or "\r\n" (but not "\r" or "\n\r"):
"abc\n\n\n".chomp('') # => "abc"
"abc\r\n\r\n\r\n".chomp('') # => "abc"
"abc\n\n\r\n\r\n\n\n".chomp('') # => "abc"
"abc\n\r\n\r\n\r".chomp('') # => "abc\n\r\n\r\n\r"
"abc\r\r\r".chomp('') # => "abc\r\r\r"
When line_sep is neither "\n" nor '', removes a single trailing line separator if there is one:
'abcd'.chomp('cd') # => "ab"
'abcdcd'.chomp('cd') # => "abcd"
'abcd'.chomp('xx') # => "abcd"
When line_sep is nil, returns an unmodified copy of self.
Related: see Converting to New String.
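The line_sep argument can be any string, so #chomp can also strip a known suffix. This sketch shows that use, along with the nil case, where the copy is returned untouched.

```ruby
'archive.tar.gz'.chomp('.gz')  # => "archive.tar"
'archive.tar.gz'.chomp('.zip') # => "archive.tar.gz" (suffix absent, nothing removed)
"line\n".chomp(nil)            # => "line\n" (nil: nothing removed)
```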
# File 'string.c', line 10329
static VALUE rb_str_chomp(int argc, VALUE *argv, VALUE str) { VALUE rs = chomp_rs(argc, argv); if (NIL_P(rs)) return str_duplicate(rb_cString, str); return rb_str_subseq(str, 0, chompped_length(str, rs)); }
#chomp!(line_sep = $/) ⇒ self?
Like #chomp, except that:
- Removes trailing characters from self (not from a copy of self).
- Returns self if any characters are removed, nil otherwise.
Related: see Modifying.
# File 'string.c', line 10309
static VALUE rb_str_chomp_bang(int argc, VALUE *argv, VALUE str) { VALUE rs; str_modifiable(str); if (RSTRING_LEN(str) == 0 && argc < 2) return Qnil; rs = chomp_rs(argc, argv); if (NIL_P(rs)) return Qnil; return rb_str_chomp_string(str, rs); }
#chop ⇒ String
Returns a new string copied from self, with trailing characters possibly removed.
Removes "\r\n" if those are the last two characters:
"abc\r\n".chop # => "abc"
"тест\r\n".chop # => "тест"
"こんにちは\r\n".chop # => "こんにちは"
Otherwise, removes the last character if it exists:
'abcd'.chop # => "abc"
'тест'.chop # => "тес"
'こんにちは'.chop # => "こんにち"
''.chop # => ""
If you only need to remove the newline separator at the end of the string, #chomp is a better alternative.
Related: see Converting to New String.
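The difference between #chomp and #chop matters when the last line lacks a newline, which is why #chomp is the better choice for removing line separators. This sketch uses made-up input:

```ruby
lines = "alpha\r\nbeta\ngamma".lines # last line has no newline
lines.map(&:chomp) # => ["alpha", "beta", "gamma"]
lines.map(&:chop)  # => ["alpha", "beta", "gamm"] (a real character is lost)
```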
# File 'string.c', line 10153
static VALUE rb_str_chop(VALUE str) { return rb_str_subseq(str, 0, chopped_length(str)); }
#chop! ⇒ self?
Like #chop, except that:
- Removes trailing characters from self (not from a copy of self).
- Returns self if any characters are removed, nil otherwise.
Related: see Modifying.
# File 'string.c', line 10127
static VALUE rb_str_chop_bang(VALUE str) { str_modify_keep_cr(str); if (RSTRING_LEN(str) > 0) { long len; len = chopped_length(str); STR_SET_LEN(str, len); TERM_FILL(&RSTRING_PTR(str)[len], TERM_LEN(str)); if (ENC_CODERANGE(str) != ENC_CODERANGE_7BIT) { ENC_CODERANGE_CLEAR(str); } return str; } return Qnil; }
#chr ⇒ String
Returns a string containing the first character of self:
'hello'.chr # => "h"
'тест'.chr # => "т"
'こんにちは'.chr # => "こ"
''.chr # => ""
Related: see Converting to New String.
# File 'string.c', line 6689
static VALUE rb_str_chr(VALUE str) { return rb_str_substr(str, 0, 1); }
#clear ⇒ self
Removes the contents of self:
s = 'foo'
s.clear # => ""
s # => ""
Related: see Modifying.
# File 'string.c', line 6667
static VALUE rb_str_clear(VALUE str) { str_discard(str); STR_SET_EMBED(str); STR_SET_LEN(str, 0); RSTRING_PTR(str)[0] = 0; if (rb_enc_asciicompat(STR_ENC_GET(str))) ENC_CODERANGE_SET(str, ENC_CODERANGE_7BIT); else ENC_CODERANGE_SET(str, ENC_CODERANGE_VALID); return str; }
#codepoints ⇒ array_of_integers
Returns an array of the codepoints in self; each codepoint is the integer value for a character:
'hello'.codepoints # => [104, 101, 108, 108, 111]
'тест'.codepoints # => [1090, 1077, 1089, 1090]
'こんにちは'.codepoints # => [12371, 12435, 12395, 12385, 12399]
''.codepoints # => []
Related: see Converting to Non-String.
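Since each codepoint is the integer value of a character, the array can be turned back into a string. This sketch does so in two ways: with Array#pack and the 'U*' directive, and with Integer#chr given an explicit encoding.

```ruby
cps = 'тест'.codepoints                        # => [1090, 1077, 1089, 1090]
cps.pack('U*')                                 # => "тест"
cps.map { |cp| cp.chr(Encoding::UTF_8) }.join  # => "тест"
```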
# File 'string.c', line 9920
static VALUE rb_str_codepoints(VALUE str) { VALUE ary = WANTARRAY("codepoints", rb_str_strlen(str)); return rb_str_enumerate_codepoints(str, ary); }
#concat(*objects) ⇒ String
Concatenates each object in objects to self; returns self:
'foo'.concat('bar', 'baz') # => "foobarbaz"
For each given object that is an integer, the value is considered a codepoint and converted to a character before concatenation:
'foo'.concat(32, 'bar', 32, 'baz') # => "foo bar baz" # Embeds spaces.
'те'.concat(1089, 1090) # => "тест"
'こん'.concat(12395, 12385, 12399) # => "こんにちは"
Related: see Modifying.
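This sketch illustrates one consequence of the C implementation below. With several arguments, #concat first builds the appended text from all arguments and only then appends it, so passing self repeatedly appends the original contents each time. Chained #<< calls, by contrast, see the receiver grow.

```ruby
s = +'ab'
s.concat(s, s) # => "ababab" (both arguments read before self changes)

t = +'ab'
t << t << t    # => "abababab" (second << appends the already-doubled string)
```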
# File 'string.c', line 3796
static VALUE rb_str_concat_multi(int argc, VALUE *argv, VALUE str) { str_modifiable(str); if (argc == 1) { return rb_str_concat(str, argv[0]); } else if (argc > 1) { int i; VALUE arg_str = rb_str_tmp_new(0); rb_enc_copy(arg_str, str); for (i = 0; i < argc; i++) { rb_str_concat(arg_str, argv[i]); } rb_str_buf_append(str, arg_str); } return str; }
#count(*selectors) ⇒ Integer
Returns the total number of characters in self that are specified by the given selectors.
For one 1-character selector, returns the count of instances of that character:
s = 'abracadabra'
s.count('a') # => 5
s.count('b') # => 2
s.count('x') # => 0
s.count('') # => 0
s = 'тест'
s.count('т') # => 2
s.count('е') # => 1
s = 'よろしくお願いします'
s.count('よ') # => 1
s.count('し') # => 2
For one multi-character selector, returns the count of instances for all specified characters:
s = 'abracadabra'
s.count('ab') # => 7
s.count('abc') # => 8
s.count('abcd') # => 9
s.count('abcdr') # => 11
s.count('abcdrx') # => 11
Order and repetition do not matter:
s.count('ba') == s.count('ab') # => true
s.count('baab') == s.count('ab') # => true
For multiple selectors, forms a single selector that is the intersection of characters in all selectors and returns the count of instances for that selector:
s = 'abcdefg'
s.count('abcde', 'dcbfg') == s.count('bcd') # => true
s.count('abc', 'def') == s.count('') # => true
In a character selector, three characters get special treatment:
- A caret ('^') functions as a negation operator for the immediately following characters:
    s = 'abracadabra'
    s.count('^bc') # => 8 # Count of all except 'b' and 'c'.
- A hyphen ('-') between two other characters defines a range of characters:
    s = 'abracadabra'
    s.count('a-c') # => 8 # Count of all 'a', 'b', and 'c'.
- A backslash ('\') acts as an escape for a caret, a hyphen, or another backslash:
    s = 'abracadabra'
    s.count('\^bc') # => 3 # Count of '^', 'b', and 'c'.
    s.count('a\-c') # => 6 # Count of 'a', '-', and 'c'.
    'foo\bar\baz'.count('\\') # => 2 # Count of '\'.
These usages may be mixed:
s = 'abracadabra'
s.count('a-cq-t') # => 10 # Multiple ranges.
s.count('ac-d') # => 7 # Range mixed with plain characters.
s.count('^a-c') # => 3 # Range mixed with negation.
For multiple selectors, all forms may be used, including negations, ranges, and escapes.
s = 'abracadabra'
s.count('^abc', '^def') == s.count('^abcdef') # => true
s.count('a-e', 'c-g') == s.count('cde') # => true
s.count('^abc', 'c-g') == s.count('defg') # => true
Related: see Querying.
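This sketch applies the selector rules to a made-up string. String#delete accepts the same selector syntax, which makes it a handy way to check what a selector matches.

```ruby
s = 'hello world'
s.count('aeiou')         # => 3 (e, o, o)
s.count('a-z', '^aeiou') # => 7 (lowercase letters that are not vowels)
s.count('lo')            # => 5 (three "l", two "o")
s.delete('aeiou')        # => "hll wrld"
```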
# File 'string.c', line 9086
static VALUE rb_str_count(int argc, VALUE *argv, VALUE str) { char table[TR_TABLE_SIZE]; rb_encoding *enc = 0; VALUE del = 0, nodel = 0, tstr; char *s, *send; int i; int ascompat; size_t n = 0; rb_check_arity(argc, 1, UNLIMITED_ARGUMENTS); tstr = argv[0]; StringValue(tstr); enc = rb_enc_check(str, tstr); if (argc == 1) { const char *ptstr; if (RSTRING_LEN(tstr) == 1 && rb_enc_asciicompat(enc) && (ptstr = RSTRING_PTR(tstr), ONIGENC_IS_ALLOWED_REVERSE_MATCH(enc, (const unsigned char *)ptstr, (const unsigned char *)ptstr+1)) && !is_broken_string(str)) { int clen; unsigned char c = rb_enc_codepoint_len(ptstr, ptstr+1, &clen, enc); s = RSTRING_PTR(str); if (!s || RSTRING_LEN(str) == 0) return INT2FIX(0); send = RSTRING_END(str); while (s < send) { if (*(unsigned char*)s++ == c) n++; } return SIZET2NUM(n); } } tr_setup_table(tstr, table, TRUE, &del, &nodel, enc); for (i=1; i<argc; i++) { tstr = argv[i]; StringValue(tstr); enc = rb_enc_check(str, tstr); tr_setup_table(tstr, table, FALSE, &del, &nodel, enc); } s = RSTRING_PTR(str); if (!s || RSTRING_LEN(str) == 0) return INT2FIX(0); send = RSTRING_END(str); ascompat = rb_enc_asciicompat(enc); while (s < send) { unsigned int c; if (ascompat && (c = *(unsigned char*)s) < 0x80) { if (table[c]) { n++; } s++; } else { int clen; c = rb_enc_codepoint_len(s, send, &clen, enc); if (tr_find(c, table, del, nodel)) { n++; } s += clen; } } return SIZET2NUM(n); }
#crypt(salt_str) ⇒ String
Returns the string generated by calling the crypt(3) standard library function with self and salt_str, in this order, as its arguments. Please do not use this method any longer. It is legacy, provided only for backward compatibility with Ruby scripts from earlier days. It is bad to use in contemporary programs for several reasons:
- The behaviour of C's crypt(3) depends on the OS it runs on. The generated string lacks data portability.
- On some OSes such as Mac OS, crypt(3) never fails (i.e. it silently ends up with unexpected results).
- On some OSes such as Mac OS, crypt(3) is not thread safe.
- So-called "traditional" usage of crypt(3) is extremely weak. According to its manpage, Linux's traditional crypt(3) output has only 2**56 variations; too easy to brute force today. And this is the default behaviour.
- In order to make things robust, some OSes implement so-called "modular" usage. To use it, you have to build up the salt_str parameter by hand, which is complex. Failure to generate a proper salt string tends not to yield any errors; typos in parameters are normally not detectable.
  - For instance, in the following example, the second invocation of String#crypt is wrong; it has a typo in "round=" (it lacks an "s"). However, the call does not fail, and something unexpected is generated.
      "foo".crypt("$5$rounds=1000$salt$") # OK, proper usage
      "foo".crypt("$5$round=1000$salt$")  # Typo not detected
- Even in the "modular" mode, some hash functions are considered archaic and no longer recommended at all; for instance, module $1$ is officially abandoned by its author: see phk.freebsd.dk/sagas/md5crypt_eol/ . As another instance, module $3$ is considered completely broken: see the manpage of FreeBSD.
- On some OSes such as Mac OS, there is no modular mode. Yet, as written above, crypt(3) on Mac OS never fails. This means that even if you build up a proper salt string, it generates a traditional DES hash anyway, and there is no way for you to detect this:
      "foo".crypt("$5$rounds=1000$salt$") # => "$5fNPQMxC5j6."
If for some reason you cannot migrate to other secure contemporary password hashing algorithms, install the string-crypt gem and require 'string/crypt'
to continue using it.
# File 'string.c', line 10831
static VALUE rb_str_crypt(VALUE str, VALUE salt)
{
#ifdef HAVE_CRYPT_R
    VALUE databuf;
    struct crypt_data *data;
#   define CRYPT_END() ALLOCV_END(databuf)
#else
    char *tmp_buf;
    extern char *crypt(const char *, const char *);
#   define CRYPT_END() rb_nativethread_lock_unlock(&crypt_mutex.lock)
#endif
    VALUE result;
    const char *s, *saltp;
    char *res;
#ifdef BROKEN_CRYPT
    char salt_8bit_clean[3];
#endif

    StringValue(salt);
    mustnot_wchar(str);
    mustnot_wchar(salt);
    s = StringValueCStr(str);
    saltp = RSTRING_PTR(salt);
    if (RSTRING_LEN(salt) < 2 || !saltp[0] || !saltp[1]) {
        rb_raise(rb_eArgError, "salt too short (need >=2 bytes)");
    }

#ifdef BROKEN_CRYPT
    if (!ISASCII((unsigned char)saltp[0]) || !ISASCII((unsigned char)saltp[1])) {
        salt_8bit_clean[0] = saltp[0] & 0x7f;
        salt_8bit_clean[1] = saltp[1] & 0x7f;
        salt_8bit_clean[2] = '\0';
        saltp = salt_8bit_clean;
    }
#endif
#ifdef HAVE_CRYPT_R
    data = ALLOCV(databuf, sizeof(struct crypt_data));
# ifdef HAVE_STRUCT_CRYPT_DATA_INITIALIZED
    data->initialized = 0;
# endif
    res = crypt_r(s, saltp, data);
#else
    rb_nativethread_lock_lock(&crypt_mutex.lock);
    res = crypt(s, saltp);
#endif
    if (!res) {
        int err = errno;
        CRYPT_END();
        rb_syserr_fail(err, "crypt");
    }
#ifdef HAVE_CRYPT_R
    result = rb_str_new_cstr(res);
    CRYPT_END();
#else
    // We need to copy this buffer because it's static and we need to unlock the mutex
    // before allocating a new object (the string to be returned). If we allocate while
    // holding the lock, we could run GC which fires the VM barrier and causes a deadlock
    // if other ractors are waiting on this lock.
    size_t res_size = strlen(res)+1;
    tmp_buf = ALLOCA_N(char, res_size); // should be small enough to alloca
    memcpy(tmp_buf, res, res_size);
    res = tmp_buf;
    CRYPT_END();
    result = rb_str_new_cstr(res);
#endif
    return result;
}
#dedup ⇒ String
Alias for #-@.
#delete(*selectors) ⇒ String
Returns a new string that is a copy of self with certain characters removed; the removed characters are all instances of those specified by the given string selectors.
For one 1-character selector, removes all instances of that character:
s = 'abracadabra'
s.delete('a') # => "brcdbr"
s.delete('b') # => "aracadara"
s.delete('x') # => "abracadabra"
s.delete('') # => "abracadabra"
s = 'тест'
s.delete('т') # => "ес"
s.delete('е') # => "тст"
s = 'よろしくお願いします'
s.delete('よ') # => "ろしくお願いします"
s.delete('し') # => "よろくお願います"
For one multi-character selector, removes all instances of the specified characters:
s = 'abracadabra'
s.delete('ab') # => "rcdr"
s.delete('abc') # => "rdr"
s.delete('abcd') # => "rr"
s.delete('abcdr') # => ""
s.delete('abcdrx') # => ""
Order and repetition do not matter:
s.delete('ba') == s.delete('ab') # => true
s.delete('baab') == s.delete('ab') # => true
For multiple selectors, forms a single selector that is the intersection of characters in all selectors and removes all instances of characters specified by that selector:
s = 'abcdefg'
s.delete('abcde', 'dcbfg') == s.delete('bcd') # => true
s.delete('abc', 'def') == s.delete('') # => true
In a character selector, three characters get special treatment:
- A leading caret ('^') negates the selector, so that it selects every character not listed after the caret:
  s = 'abracadabra'
  s.delete('^bc') # => "bcb" # Deletes all except 'b' and 'c'.
- A hyphen ('-') between two other characters defines a range of characters:
  s = 'abracadabra'
  s.delete('a-c') # => "rdr" # Deletes all 'a', 'b', and 'c'.
- A backslash ('\') acts as an escape for a caret, a hyphen, or another backslash:
  s = 'abracadabra'
  s.delete('\^bc')           # => "araadara"  # Deletes all '^', 'b', and 'c'.
  s.delete('a\-c')           # => "brdbr"     # Deletes all 'a', '-', and 'c'.
  'foo\bar\baz'.delete('\\') # => "foobarbaz" # Deletes all '\'.
These usages may be mixed:
s = 'abracadabra'
s.delete('a-cq-t') # => "d" # Multiple ranges.
s.delete('ac-d') # => "brbr" # Range mixed with plain characters.
s.delete('^a-c') # => "abacaaba" # Range mixed with negation.
For multiple selectors, all forms may be used, including negations, ranges, and escapes.
s = 'abracadabra'
s.delete('^abc', '^def') == s.delete('^abcdef') # => true
s.delete('a-e', 'c-g') == s.delete('cde') # => true
s.delete('^abc', 'c-g') == s.delete('defg') # => true
Related: see Converting to New String.
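The selector rules above (negation, ranges, escapes, and intersection across several selectors) can be modelled in a few lines of plain Ruby. What follows is a simplified sketch, not Ruby's implementation, which lives in the C function tr_setup_table. The helper names expand_selector and naive_delete are hypothetical. The sketch also ignores edge cases such as an escaped character starting a range, and it does not raise the ArgumentError that Ruby raises for a reversed range like 'c-a'.

```ruby
require 'set'

# Hypothetical helper: expands one tr-style selector into [negated, chars].
# Simplified model: handles a leading '^', 'x-y' ranges, and '\' escapes only.
def expand_selector(selector)
  chars = selector.chars
  # A leading caret negates the selector only when something follows it.
  negated = chars.length > 1 && chars.first == '^'
  chars.shift if negated
  set = Set.new
  i = 0
  while i < chars.length
    c = chars[i]
    if c == '\\' && i + 1 < chars.length
      set << chars[i + 1]                     # Escaped character is taken literally.
      i += 2
    elsif chars[i + 1] == '-' && i + 2 < chars.length
      (c..chars[i + 2]).each { |x| set << x } # Inclusive range such as 'a-c'.
      i += 3
    else
      set << c                                # Plain character (or a trailing '-').
      i += 1
    end
  end
  [negated, set]
end

# Hypothetical helper: a character is deleted only if EVERY selector selects it,
# which is the "intersection" rule for multiple selectors.
def naive_delete(str, *selectors)
  expanded = selectors.map { |sel| expand_selector(sel) }
  str.chars.reject { |ch| expanded.all? { |negated, set| negated ^ set.include?(ch) } }.join
end

s = 'abracadabra'
p naive_delete(s, '^bc')                               # prints "bcb"
p naive_delete(s, 'a-cq-t')                            # prints "d"
p naive_delete(s, '^abc', 'c-g') == s.delete('defg')   # prints true
```

The intersection falls out of `all?`. A negated selector flips membership with `^`, so a character survives as soon as any one selector declines to select it.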
# File 'string.c', line 8919
static VALUE rb_str_delete(int argc, VALUE *argv, VALUE str)
{
    str = str_duplicate(rb_cString, str);
    rb_str_delete_bang(argc, argv, str);
    return str;
}
#delete!(*selectors) ⇒ self?
Like #delete, but modifies self in place; returns self if any characters were deleted, nil otherwise.
Related: see Modifying.
# File 'string.c', line 8849
static VALUE rb_str_delete_bang(int argc, VALUE *argv, VALUE str)
{
    char squeez[TR_TABLE_SIZE];
    rb_encoding *enc = 0;
    char *s, *send, *t;
    VALUE del = 0, nodel = 0;
    int modify = 0;
    int i, ascompat, cr;

    if (RSTRING_LEN(str) == 0 || !RSTRING_PTR(str)) return Qnil;
    rb_check_arity(argc, 1, UNLIMITED_ARGUMENTS);
    for (i=0; i<argc; i++) {
        VALUE s = argv[i];

        StringValue(s);
        enc = rb_enc_check(str, s);
        tr_setup_table(s, squeez, i==0, &del, &nodel, enc);
    }

    str_modify_keep_cr(str);
    ascompat = rb_enc_asciicompat(enc);
    s = t = RSTRING_PTR(str);
    send = RSTRING_END(str);
    cr = ascompat ? ENC_CODERANGE_7BIT : ENC_CODERANGE_VALID;
    while (s < send) {
        unsigned int c;
        int clen;

        if (ascompat && (c = *(unsigned char*)s) < 0x80) {
            if (squeez[c]) {
                modify = 1;
            }
            else {
                if (t != s) *t = c;
                t++;
            }
            s++;
        }
        else {
            c = rb_enc_codepoint_len(s, send, &clen, enc);

            if (tr_find(c, squeez, del, nodel)) {
                modify = 1;
            }
            else {
                if (t != s) rb_enc_mbcput(c, t, enc);
                t += clen;
                if (cr == ENC_CODERANGE_7BIT) cr = ENC_CODERANGE_VALID;
            }
            s += clen;
        }
    }
    TERM_FILL(t, TERM_LEN(str));
    STR_SET_LEN(str, t - RSTRING_PTR(str));
    ENC_CODERANGE_SET(str, cr);

    if (modify) return str;
    return Qnil;
}
#delete_prefix(prefix) ⇒ String
Returns a copy of self with leading substring prefix removed:
'oof'.delete_prefix('o') # => "of"
'oof'.delete_prefix('oo') # => "f"
'oof'.delete_prefix('oof') # => ""
'oof'.delete_prefix('x') # => "oof"
'тест'.delete_prefix('те') # => "ст"
'こんにちは'.delete_prefix('こん') # => "にちは"
Related: see Converting to New String.
# File 'string.c', line 11346
static VALUE rb_str_delete_prefix(VALUE str, VALUE prefix)
{
    long prefixlen;

    prefixlen = deleted_prefix_length(str, prefix);
    if (prefixlen <= 0) return str_duplicate(rb_cString, str);

    return rb_str_subseq(str, prefixlen, RSTRING_LEN(str) - prefixlen);
}
#delete_prefix!(prefix) ⇒ self?
Like #delete_prefix, except that self is modified in place; returns self if the prefix is removed, nil otherwise.
Related: see Modifying.
# File 'string.c', line 11326
static VALUE rb_str_delete_prefix_bang(VALUE str, VALUE prefix)
{
    long prefixlen;
    str_modify_keep_cr(str);

    prefixlen = deleted_prefix_length(str, prefix);
    if (prefixlen <= 0) return Qnil;

    return rb_str_drop_bytes(str, prefixlen);
}
#delete_suffix(suffix) ⇒ String
Returns a copy of self with trailing substring suffix removed:
'foo'.delete_suffix('o') # => "fo"
'foo'.delete_suffix('oo') # => "f"
'foo'.delete_suffix('foo') # => ""
'foo'.delete_suffix('f') # => "foo"
'foo'.delete_suffix('x') # => "foo"
'тест'.delete_suffix('ст') # => "те"
'こんにちは'.delete_suffix('ちは') # => "こんに"
Related: see Converting to New String.
# File 'string.c', line 11430
static VALUE rb_str_delete_suffix(VALUE str, VALUE suffix)
{
    long suffixlen;

    suffixlen = deleted_suffix_length(str, suffix);
    if (suffixlen <= 0) return str_duplicate(rb_cString, str);

    return rb_str_subseq(str, 0, RSTRING_LEN(str) - suffixlen);
}
#delete_suffix!(suffix) ⇒ self?
Like #delete_suffix, except that self is modified in place; returns self if the suffix is removed, nil otherwise.
Related: see Modifying.
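Because #delete_prefix! and #delete_suffix! return nil when nothing is removed, chaining another call directly onto them is a common source of NoMethodError. Here is a short illustration with arbitrary sample strings. The `.dup` calls are there only to get mutable receivers in case string literals are frozen.

```ruby
s = 'oof'.dup                      # Mutable copy; the bang methods modify their receiver.
p s.delete_prefix!('o')            # prints "of" (returns self)
p s.delete_prefix!('x')            # prints nil  (prefix absent, nothing changed)
p s                                # prints "of"

t = 'foo'.dup
p t.delete_suffix!('oo')           # prints "f"
p t.delete_suffix!('oo')           # prints nil  (suffix already gone)

# t.delete_suffix!('x').upcase     # Would raise NoMethodError: the bang call returned nil.
p 'foo'.delete_suffix('x').upcase  # prints "FOO": the non-bang form always returns a String
```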
# File 'string.c', line 11402
static VALUE rb_str_delete_suffix_bang(VALUE str, VALUE suffix)
{
    long olen, suffixlen, len;
    str_modifiable(str);

    suffixlen = deleted_suffix_length(str, suffix);
    if (suffixlen <= 0) return Qnil;

    olen = RSTRING_LEN(str);
    str_modify_keep_cr(str);
    len = olen - suffixlen;
    STR_SET_LEN(str, len);
    TERM_FILL(&RSTRING_PTR(str)[len], TERM_LEN(str));
    if (ENC_CODERANGE(str) != ENC_CODERANGE_7BIT) {
        ENC_CODERANGE_CLEAR(str);
    }
    return str;
}
#downcase(mapping) ⇒ String
Returns a new string containing the downcased characters in self:
'Hello, World!'.downcase # => "hello, world!"
'ТЕСТ'.downcase # => "тест"
'よろしくお願いします'.downcase # => "よろしくお願いします"
Some characters, such as the Japanese characters above, have no upcased or downcased versions and are left unchanged.
The casing may be affected by the given mapping; see Case Mapping.
Related: see Converting to New String.
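The mapping options are symbols described under Case Mapping. Two of them, :ascii and :turkic, are shown below, together with the nil return of #downcase! when nothing changes. This is a minimal illustration that assumes a UTF-8 source file.

```ruby
p 'ÄÖÜ Abc'.downcase           # prints "äöü abc"
p 'ÄÖÜ Abc'.downcase(:ascii)   # prints "ÄÖÜ abc": only ASCII letters are changed
p 'I'.downcase                 # prints "i"
p 'I'.downcase(:turkic)        # prints "ı": U+0131 LATIN SMALL LETTER DOTLESS I
p 'hello'.dup.downcase!        # prints nil: nothing to change
```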
# File 'string.c', line 8177
static VALUE rb_str_downcase(int argc, VALUE *argv, VALUE str)
{
    rb_encoding *enc;
    OnigCaseFoldType flags = ONIGENC_CASE_DOWNCASE;
    VALUE ret;

    flags = check_case_options(argc, argv, flags);
    enc = str_true_enc(str);
    if (case_option_single_p(flags, enc, str)) {
        ret = rb_str_new(RSTRING_PTR(str), RSTRING_LEN(str));
        str_enc_copy_direct(ret, str);
        downcase_single(ret);
    }
    else if (flags&ONIGENC_CASE_ASCII_ONLY) {
        ret = rb_str_new(0, RSTRING_LEN(str));
        rb_str_ascii_casemap(str, ret, &flags, enc);
    }
    else {
        ret = rb_str_casemap(str, &flags, enc);
    }

    return ret;
}
#downcase!(mapping) ⇒ self?
Like #downcase, except that:
- Changes character casings in self (not in a copy of self).
- Returns self if any changes are made, nil otherwise.
Related: see Modifying.
# File 'string.c', line 8146
static VALUE rb_str_downcase_bang(int argc, VALUE *argv, VALUE str)
{
    rb_encoding *enc;
    OnigCaseFoldType flags = ONIGENC_CASE_DOWNCASE;

    flags = check_case_options(argc, argv, flags);
    str_modify_keep_cr(str);
    enc = str_true_enc(str);
    if (case_option_single_p(flags, enc, str)) {
        if (downcase_single(str))
            flags |= ONIGENC_CASE_MODIFIED;
    }
    else if (flags&ONIGENC_CASE_ASCII_ONLY)
        rb_str_ascii_casemap(str, str, &flags, enc);
    else
        str_shared_replace(str, rb_str_casemap(str, &flags, enc));

    if (ONIGENC_CASE_MODIFIED&flags) return str;
    return Qnil;
}
#dump ⇒ String
Returns a printable version of self, enclosed in double-quotes:
'hello'.dump # => "\"hello\""
Certain special characters are rendered with escapes:
'"'.dump # => "\"\\\"\""
'\\'.dump # => "\"\\\\\""
Non-printing characters are rendered with escapes:
s = ''
s << 7 # Alarm (bell).
s << 8 # Back space.
s << 9 # Horizontal tab.
s << 10 # Line feed.
s << 11 # Vertical tab.
s << 12 # Form feed.
s << 13 # Carriage return.
s # => "\a\b\t\n\v\f\r"
s.dump # => "\"\\a\\b\\t\\n\\v\\f\\r\""
If self is encoded in UTF-8 and contains Unicode characters, renders each such character as a Unicode escape sequence:
'тест'.dump # => "\"\\u0442\\u0435\\u0441\\u0442\""
'こんにちは'.dump # => "\"\\u3053\\u3093\\u306B\\u3061\\u306F\""
If the encoding of self is not ASCII-compatible (i.e., self.encoding.ascii_compatible? returns false), renders printable ASCII bytes as ASCII characters and all other bytes as escape sequences (hexadecimal \xNN unless a special escape such as \n applies). Appends .dup.force_encoding("<encoding>"), where <encoding> is self.encoding.name:
s = 'hello'
s.encoding # => #<Encoding:UTF-8>
s.dump # => "\"hello\""
s.encode('utf-16').dump # => "\"\\xFE\\xFF\\x00h\\x00e\\x00l\\x00l\\x00o\".dup.force_encoding(\"UTF-16\")"
s.encode('utf-16le').dump # => "\"h\\x00e\\x00l\\x00l\\x00o\\x00\".dup.force_encoding(\"UTF-16LE\")"
s = 'тест'
s.encoding # => #<Encoding:UTF-8>
s.dump # => "\"\\u0442\\u0435\\u0441\\u0442\""
s.encode('utf-16').dump # => "\"\\xFE\\xFF\\x04B\\x045\\x04A\\x04B\".dup.force_encoding(\"UTF-16\")"
s.encode('utf-16le').dump # => "\"B\\x045\\x04A\\x04B\\x04\".dup.force_encoding(\"UTF-16LE\")"
s = 'こんにちは'
s.encoding # => #<Encoding:UTF-8>
s.dump # => "\"\\u3053\\u3093\\u306B\\u3061\\u306F\""
s.encode('utf-16').dump # => "\"\\xFE\\xFF0S0\\x930k0a0o\".dup.force_encoding(\"UTF-16\")"
s.encode('utf-16le').dump # => "\"S0\\x930k0a0o0\".dup.force_encoding(\"UTF-16LE\")"
Related: see Converting to New String.
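String#undump reverses #dump, so the pair can round-trip a string through a printable, ASCII-only form. The C source above also shows two escapes that the prose does not list: ESC is rendered as \e, and a # that would start interpolation (#{, #$, #@) is escaped. The following quick check assumes a UTF-8 source file.

```ruby
samples = ["\a\b\t\n\v\f\r", 'тест', 'こんにちは', "\e[0m", '#{x}']
samples.each do |s|
  d = s.dump
  raise "not ASCII-only: #{d}" unless d.ascii_only?     # dump output is printable ASCII
  raise "round trip failed: #{d}" unless d.undump == s  # undump reverses dump
end

p '#{x}'.dump  # prints "\"\\\#{x}\"": the '#' is escaped so the result cannot interpolate
p "\e".dump    # prints "\"\\e\""
```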
# File 'string.c', line 7417
VALUE rb_str_dump(VALUE str)
{
    int encidx = rb_enc_get_index(str);
    rb_encoding *enc = rb_enc_from_index(encidx);
    long len;
    const char *p, *pend;
    char *q, *qend;
    VALUE result;
    int u8 = (encidx == rb_utf8_encindex());
    static const char nonascii_suffix[] = ".dup.force_encoding(\"%s\")";

    len = 2;                    /* "" */
    if (!rb_enc_asciicompat(enc)) {
        len += strlen(nonascii_suffix) - rb_strlen_lit("%s");
        len += strlen(enc->name);
    }

    p = RSTRING_PTR(str); pend = p + RSTRING_LEN(str);
    while (p < pend) {
        int clen;
        unsigned char c = *p++;

        switch (c) {
          case '"':  case '\\':
          case '\n': case '\r':
          case '\t': case '\f':
          case '\013': case '\010': case '\007': case '\033':
            clen = 2;
            break;

          case '#':
            clen = IS_EVSTR(p, pend) ? 2 : 1;
            break;

          default:
            if (ISPRINT(c)) {
                clen = 1;
            }
            else {
                if (u8 && c > 0x7F) {   /* \u notation */
                    int n = rb_enc_precise_mbclen(p-1, pend, enc);
                    if (MBCLEN_CHARFOUND_P(n)) {
                        unsigned int cc = rb_enc_mbc_to_codepoint(p-1, pend, enc);
                        if (cc <= 0xFFFF)
                            clen = 6;   /* \uXXXX */
                        else if (cc <= 0xFFFFF)
                            clen = 9;   /* \u{XXXXX} */
                        else
                            clen = 10;  /* \u{XXXXXX} */
                        p += MBCLEN_CHARFOUND_LEN(n)-1;
                        break;
                    }
                }
                clen = 4;               /* \xNN */
            }
            break;
        }

        if (clen > LONG_MAX - len) {
            rb_raise(rb_eRuntimeError, "string size too big");
        }
        len += clen;
    }

    result = rb_str_new(0, len);
    p = RSTRING_PTR(str); pend = p + RSTRING_LEN(str);
    q = RSTRING_PTR(result); qend = q + len + 1;

    *q++ = '"';
    while (p < pend) {
        unsigned char c = *p++;

        if (c == '"' || c == '\\') {
            *q++ = '\\';
            *q++ = c;
        }
        else if (c == '#') {
            if (IS_EVSTR(p, pend)) *q++ = '\\';
            *q++ = '#';
        }
        else if (c == '\n') {
            *q++ = '\\';
            *q++ = 'n';
        }
        else if (c == '\r') {
            *q++ = '\\';
            *q++ = 'r';
        }
        else if (c == '\t') {
            *q++ = '\\';
            *q++ = 't';
        }
        else if (c == '\f') {
            *q++ = '\\';
            *q++ = 'f';
        }
        else if (c == '\013') {
            *q++ = '\\';
            *q++ = 'v';
        }
        else if (c == '\010') {
            *q++ = '\\';
            *q++ = 'b';
        }
        else if (c == '\007') {
            *q++ = '\\';
            *q++ = 'a';
        }
        else if (c == '\033') {
            *q++ = '\\';
            *q++ = 'e';
        }
        else if (ISPRINT(c)) {
            *q++ = c;
        }
        else {
            *q++ = '\\';
            if (u8) {
                int n = rb_enc_precise_mbclen(p-1, pend, enc) - 1;
                if (MBCLEN_CHARFOUND_P(n)) {
                    int cc = rb_enc_mbc_to_codepoint(p-1, pend, enc);
                    p += n;
                    if (cc <= 0xFFFF)
                        snprintf(q, qend-q, "u%04X", cc);    /* \uXXXX */
                    else
                        snprintf(q, qend-q, "u{%X}", cc);    /* \u{XXXXX} or \u{XXXXXX} */
                    q += strlen(q);
                    continue;
                }
            }
            snprintf(q, qend-q, "x%02X", c);
            q += 3;
        }
    }
    *q++ = '"';
    *q = '\0';
    if (!rb_enc_asciicompat(enc)) {
        snprintf(q, qend-q, nonascii_suffix, enc->name);
        encidx = rb_ascii8bit_encindex();
    }
    /* result from dump is ASCII */
    rb_enc_associate_index(result, encidx);
    ENC_CODERANGE_SET(result, ENC_CODERANGE_7BIT);
    return result;
}
#dup
# File 'string.c', line 1962
VALUE rb_str_dup_m(VALUE str)
{
    if (LIKELY(BARE_STRING_P(str))) {
        return str_duplicate(rb_cString, str);
    }
    else {
        return rb_obj_dup(str);
    }
}
#each_byte {|byte| ... } ⇒ self
#each_byte ⇒ Enumerator
With a block given, calls the block with each successive byte from self; returns self:
a = []
'hello'.each_byte {|byte| a.push(byte) } # Five 1-byte characters.
a # => [104, 101, 108, 108, 111]
a = []
'тест'.each_byte {|byte| a.push(byte) } # Four 2-byte characters.
a # => [209, 130, 208, 181, 209, 129, 209, 130]
a = []
'こんにちは'.each_byte {|byte| a.push(byte) } # Five 3-byte characters.
a # => [227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175]
With no block given, returns an enumerator.
Related: see Iterating.
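Without a block, these iterators return an Enumerator that knows its size in advance, because the C source passes a size function to RETURN_SIZED_ENUMERATOR. The enumerator can therefore be sized and chained without running the iteration. This assumes a UTF-8 source file.

```ruby
e = 'тест'.each_byte
p e.class                          # prints Enumerator
p e.size                           # prints 8: the byte count, same as 'тест'.bytesize
p e.map { |byte| byte.to_s(16) }   # prints ["d1", "82", "d0", "b5", "d1", "81", "d1", "82"]
p 'тест'.each_char.size            # prints 4: the character count
```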
# File 'string.c', line 9776
static VALUE rb_str_each_byte(VALUE str)
{
    RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_byte_size);
    return rb_str_enumerate_bytes(str, 0);
}
#each_char {|char| ... } ⇒ self
#each_char ⇒ Enumerator
With a block given, calls the block with each successive character from self; returns self:
a = []
'hello'.each_char do |char|
a.push(char)
end
a # => ["h", "e", "l", "l", "o"]
a = []
'тест'.each_char do |char|
a.push(char)
end
a # => ["т", "е", "с", "т"]
a = []
'こんにちは'.each_char do |char|
a.push(char)
end
a # => ["こ", "ん", "に", "ち", "は"]
With no block given, returns an enumerator.
Related: see Iterating.
# File 'string.c', line 9845
static VALUE rb_str_each_char(VALUE str)
{
    RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_char_size);
    return rb_str_enumerate_chars(str, 0);
}
#each_codepoint {|codepoint| ... } ⇒ self
#each_codepoint ⇒ Enumerator
With a block given, calls the block with each successive codepoint from self; each codepoint is the integer value for a character; returns self:
a = []
'hello'.each_codepoint do |codepoint|
a.push(codepoint)
end
a # => [104, 101, 108, 108, 111]
a = []
'тест'.each_codepoint do |codepoint|
a.push(codepoint)
end
a # => [1090, 1077, 1089, 1090]
a = []
'こんにちは'.each_codepoint do |codepoint|
a.push(codepoint)
end
a # => [12371, 12435, 12395, 12385, 12399]
With no block given, returns an enumerator.
Related: see Iterating.
# File 'string.c', line 9905
static VALUE rb_str_each_codepoint(VALUE str)
{
    RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_char_size);
    return rb_str_enumerate_codepoints(str, 0);
}
#each_grapheme_cluster {|grapheme_cluster| ... } ⇒ self
#each_grapheme_cluster ⇒ Enumerator
With a block given, calls the given block with each successive grapheme cluster from self (see Unicode Grapheme Cluster Boundaries); returns self:
a = []
'hello'.each_grapheme_cluster do |grapheme_cluster|
a.push(grapheme_cluster)
end
a # => ["h", "e", "l", "l", "o"]
a = []
'тест'.each_grapheme_cluster do |grapheme_cluster|
a.push(grapheme_cluster)
end
a # => ["т", "е", "с", "т"]
a = []
'こんにちは'.each_grapheme_cluster do |grapheme_cluster|
a.push(grapheme_cluster)
end
a # => ["こ", "ん", "に", "ち", "は"]
With no block given, returns an enumerator.
Related: see Iterating.
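For the ASCII, Cyrillic, and Japanese strings above, #each_grapheme_cluster gives the same result as #each_char. The two differ when several characters form one user-perceived character, for example a base letter plus a combining mark, or a CRLF pair.

```ruby
s = "e\u0301"                            # 'e' followed by U+0301 COMBINING ACUTE ACCENT
p s.each_char.to_a.size                  # prints 2: two characters
p s.each_grapheme_cluster.to_a.size      # prints 1: one grapheme cluster
p s.each_grapheme_cluster.to_a == [s]    # prints true
p "a\r\nb".each_grapheme_cluster.to_a    # prints ["a", "\r\n", "b"]
```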
# File 'string.c', line 10075
static VALUE rb_str_each_grapheme_cluster(VALUE str)
{
    RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_grapheme_cluster_size);
    return rb_str_enumerate_grapheme_clusters(str, 0);
}
#each_line(record_separator = $/, chomp: false) {|substring| ... } ⇒ self
#each_line(record_separator = $/, chomp: false) ⇒ Enumerator
With a block given, forms the substrings (lines) that are the result of splitting self at each occurrence of the given record_separator; passes each line to the block; returns self.
With the default record_separator:
$/ # => "\n"
s = <<~EOT
This is the first line.
This is line two.

This is line four.
This is line five.
EOT
s.each_line {|line| p line }
Output:
"This is the first line.\n"
"This is line two.\n"
"\n"
"This is line four.\n"
"This is line five.\n"
With a different record_separator:
record_separator = ' is '
s.each_line(record_separator) {|line| p line }
Output:
"This is "
"the first line.\nThis is "
"line two.\n\nThis is "
"line four.\nThis is "
"line five.\n"
With chomp: true, removes the trailing record_separator from each line:
s.each_line(chomp: true) {|line| p line }
Output:
"This is the first line."
"This is line two."
""
"This is line four."
"This is line five."
With an empty string as record_separator, forms and passes “paragraphs” by splitting at each occurrence of two or more newlines:
record_separator = ''
s.each_line(record_separator) {|line| p line }
Output:
"This is the first line.\nThis is line two.\n\n"
"This is line four.\nThis is line five.\n"
With no block given, returns an enumerator.
Related: see Iterating.
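The enumerator returned without a block keeps the keyword arguments, so chomp: true can be combined with other Enumerator methods. The line-numbering below is only an illustration. Note that a final line without a trailing separator is still yielded.

```ruby
text = "alpha\nbeta\ngamma"
numbered = text.each_line(chomp: true).with_index(1).map { |line, n| "#{n}: #{line}" }
p numbered              # prints ["1: alpha", "2: beta", "3: gamma"]
p text.each_line.to_a   # prints ["alpha\n", "beta\n", "gamma"]
```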
# File 'string.c', line 9682
static VALUE rb_str_each_line(int argc, VALUE *argv, VALUE str)
{
    RETURN_SIZED_ENUMERATOR(str, argc, argv, 0);
    return rb_str_enumerate_lines(argc, argv, str, 0);
}
#encode(dst_encoding = Encoding.default_internal, **enc_opts) ⇒ String
#encode(dst_encoding, src_encoding, **enc_opts) ⇒ String
Returns a copy of self transcoded as determined by dst_encoding; see Encodings.
By default, raises an exception if self contains an invalid byte or a character not defined in dst_encoding; that behavior may be modified by encoding options; see below.
With no arguments:
- Uses the encoding of self if Encoding.default_internal is nil (the default); the result is a copy of self with unchanged bytes:
  Encoding.default_internal # => nil
  s = "Ruby\x99".force_encoding('Windows-1252')
  s.encoding   # => #<Encoding:Windows-1252>
  s.bytes      # => [82, 117, 98, 121, 153]
  t = s.encode # => "Ruby\x99"
  t.encoding   # => #<Encoding:Windows-1252>
  t.bytes      # => [82, 117, 98, 121, 153]
- Otherwise, uses the encoding Encoding.default_internal:
  Encoding.default_internal = 'UTF-8'
  t = s.encode # => "Ruby™"
  t.encoding   # => #<Encoding:UTF-8>
With only argument dst_encoding given, uses that encoding:
s = "Ruby\x99".force_encoding('Windows-1252')
s.encoding # => #<Encoding:Windows-1252>
t = s.encode('UTF-8') # => "Ruby™"
t.encoding # => #<Encoding:UTF-8>
With arguments dst_encoding and src_encoding given, interprets self using src_encoding and encodes the new string using dst_encoding:
s = "Ruby\x99"
t = s.encode('UTF-8', 'Windows-1252') # => "Ruby™"
t.encoding # => #<Encoding:UTF-8>
Optional keyword arguments enc_opts specify encoding options; see Encoding Options.
Note that, unless the invalid: :replace option is given, conversion from an encoding enc to the same encoding enc (whether enc is given explicitly or implicitly) is a no-op: the string is simply copied without any changes, and no exception is raised, even if there are invalid bytes.
Related: see Converting to New String.
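The same-encoding rule above means that encode does not validate a string whose source and destination encodings match. Only invalid: :replace makes it repair invalid bytes, using U+FFFD REPLACEMENT CHARACTER by default for a Unicode encoding. A real conversion to a different encoding does check the input. Here is a short illustration that assumes a UTF-8 source file.

```ruby
bad = "abc\xFF"                          # Invalid UTF-8: byte 0xFF never occurs in UTF-8.
p bad.valid_encoding?                    # prints false
copy = bad.encode('UTF-8')               # Same encoding: a plain copy, no exception.
p copy == bad                            # prints true
fixed = bad.encode('UTF-8', invalid: :replace)
p fixed == "abc\uFFFD"                   # prints true
p fixed.valid_encoding?                  # prints true
begin
  bad.encode('UTF-16LE')                 # Different encoding: the input is validated.
rescue Encoding::InvalidByteSequenceError => e
  p e.class                              # prints Encoding::InvalidByteSequenceError
end
```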
# File 'transcode.c', line 2931
static VALUE str_encode(int argc, VALUE *argv, VALUE str)
{
    VALUE newstr = str;
    int encidx = str_transcode(argc, argv, &newstr);
    return encoded_dup(newstr, str, encidx);
}
#encode!(dst_encoding = Encoding.default_internal, **enc_opts) ⇒ self
#encode!(dst_encoding, src_encoding, **enc_opts) ⇒ self
Like #encode, but applies encoding changes to self; returns self.
Related: see Modifying.
# File 'transcode.c', line 2900
static VALUE str_encode_bang(int argc, VALUE *argv, VALUE str)
{
    VALUE newstr;
    int encidx;

    rb_check_frozen(str);

    newstr = str;
    encidx = str_transcode(argc, argv, &newstr);

    if (encidx < 0) return str;
    if (newstr == str) {
        rb_enc_associate_index(str, encidx);
        return str;
    }
    rb_str_shared_replace(str, newstr);
    return str_encode_associate(str, encidx);
}
#encoding ⇒ Encoding
Returns the ::Encoding object that represents the encoding of self. (It shares its implementation with Regexp#encoding.)
#end_with?(*strings) ⇒ Boolean
Returns whether self ends with any of the given strings:
'foo'.end_with?('oo') # => true
'foo'.end_with?('bar', 'oo') # => true
'foo'.end_with?('bar', 'baz') # => false
'foo'.end_with?('') # => true
'тест'.end_with?('т') # => true
'こんにちは'.end_with?('は') # => true
Related: see Querying.
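The C source below contains an at_char_boundary check: a candidate suffix must also begin on a character boundary of self. A trailing byte that is only the tail of a multibyte character therefore does not count, even though the bytes match. Here is a small illustration that assumes a UTF-8 source file. The binary comparison uses String#b.

```ruby
s = 'あ'                          # U+3042, encoded in UTF-8 as bytes E3 81 82
p s.bytes                         # prints [227, 129, 130]
p s.end_with?("\x82")             # prints false: "\x82" is only the last byte of 'あ'
p s.b.end_with?("\x82".b)         # prints true: in ASCII-8BIT every byte is a character
```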
# File 'string.c', line 11241
static VALUE rb_str_end_with(int argc, VALUE *argv, VALUE str)
{
    int i;

    for (i=0; i<argc; i++) {
        VALUE tmp = argv[i];
        const char *p, *s, *e;
        long slen, tlen;
        rb_encoding *enc;

        StringValue(tmp);
        enc = rb_enc_check(str, tmp);
        if ((tlen = RSTRING_LEN(tmp)) == 0) return Qtrue;
        if ((slen = RSTRING_LEN(str)) < tlen) continue;
        p = RSTRING_PTR(str);
        e = p + slen;
        s = e - tlen;
        if (!at_char_boundary(p, s, e, enc))
            continue;
        if (memcmp(s, RSTRING_PTR(tmp), tlen) == 0)
            return Qtrue;
    }
    return Qfalse;
}
#eql?(object) ⇒ Boolean
Returns whether self and object have the same length and content:
s = 'foo'
s.eql?('foo') # => true
s.eql?('food') # => false
s.eql?('FOO') # => false
Returns false if the two strings’ encodings are not compatible:
s0 = "äöü" # => "äöü"
s1 = s0.encode(Encoding::ISO_8859_1) # => "\xE4\xF6\xFC"
s0.encoding # => #<Encoding:UTF-8>
s1.encoding # => #<Encoding:ISO-8859-1>
s0.eql?(s1) # => false
See Encodings.
Related: see Querying.
# File 'string.c', line 4248
VALUE rb_str_eql(VALUE str1, VALUE str2)
{
    if (str1 == str2) return Qtrue;
    if (!RB_TYPE_P(str2, T_STRING)) return Qfalse;
    return rb_str_eql_internal(str1, str2);
}
#force_encoding(encoding) ⇒ self
Changes the encoding of self to the given encoding, which may be a string encoding name or an ::Encoding object; does not change the underlying bytes; returns self:
s = 'łał'
s.bytes # => [197, 130, 97, 197, 130]
s.encoding # => #<Encoding:UTF-8>
s.valid_encoding? # => true
s.force_encoding('ascii') # => "\xC5\x82a\xC5\x82"
s.encoding # => #<Encoding:US-ASCII>
s.bytes # => [197, 130, 97, 197, 130]
Makes the change even if the bytes of self are not valid in the given encoding (as is the case above):
s.valid_encoding? # => false
See Encodings.
Related: see Modifying.
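The difference between relabeling and transcoding is easiest to see side by side. #force_encoding keeps the bytes and changes only how they are interpreted, while #encode produces new bytes that represent the same characters. The example uses 'é' and ISO-8859-1 as arbitrary choices and assumes a UTF-8 source file.

```ruby
s = 'é'                                            # UTF-8: bytes C3 A9
p s.bytes                                          # prints [195, 169]
relabeled = s.dup.force_encoding('ISO-8859-1')
p relabeled.bytes                                  # prints [195, 169]: bytes unchanged
p relabeled.length                                 # prints 2: now read as two Latin-1 characters
converted = s.encode('ISO-8859-1')
p converted.bytes                                  # prints [233]: the Latin-1 byte for é
p converted.encode('UTF-8') == s                   # prints true
```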
# File 'string.c', line 11474
static VALUE rb_str_force_encoding(VALUE str, VALUE enc)
{
    str_modifiable(str);

    rb_encoding *encoding = rb_to_encoding(enc);
    int idx = rb_enc_to_index(encoding);

    // If the encoding is unchanged, we do nothing.
    if (ENCODING_GET(str) == idx) {
        return str;
    }

    rb_enc_associate_index(str, idx);

    // If the coderange was 7bit and the new encoding is ASCII-compatible
    // we can keep the coderange.
    if (ENC_CODERANGE(str) == ENC_CODERANGE_7BIT && encoding && rb_enc_asciicompat(encoding)) {
        return str;
    }

    ENC_CODERANGE_CLEAR(str);
    return str;
}
#freeze
# File 'string.c', line 3237
VALUE rb_str_freeze(VALUE str)
{
    if (CHILLED_STRING_P(str)) {
        FL_UNSET_RAW(str, STR_CHILLED);
    }

    if (OBJ_FROZEN(str)) return str;
    rb_str_resize(str, RSTRING_LEN(str));
    return rb_obj_freeze(str);
}
#getbyte(index) ⇒ Integer?
Returns the byte at zero-based index as an integer:
s = 'foo'
s.getbyte(0) # => 102
s.getbyte(1) # => 111
s.getbyte(2) # => 111
Counts backward from the end if index is negative:
s.getbyte(-3) # => 102
Returns nil if index is out of range:
s.getbyte(3) # => nil
s.getbyte(-4) # => nil
More examples:
s = 'тест'
s.bytes # => [209, 130, 208, 181, 209, 129, 209, 130]
s.getbyte(2) # => 208
s = 'こんにちは'
s.bytes # => [227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175]
s.getbyte(2) # => 147
Related: see Converting to Non-String.
# File 'string.c', line 6702
VALUE rb_str_getbyte(VALUE str, VALUE index)
{
    long pos = NUM2LONG(index);

    if (pos < 0)
        pos += RSTRING_LEN(str);
    if (pos < 0 || RSTRING_LEN(str) <= pos)
        return Qnil;

    return INT2FIX((unsigned char)RSTRING_PTR(str)[pos]);
}
#grapheme_clusters ⇒ array_of_grapheme_clusters
Returns an array of the grapheme clusters in self (see Unicode Grapheme Cluster Boundaries):
s = "ä-pqr-b̈-xyz-c̈"
s.size # => 16
s.bytesize # => 19
s.grapheme_clusters.size # => 13
s.grapheme_clusters
# => ["ä", "-", "p", "q", "r", "-", "b̈", "-", "x", "y", "z", "-", "c̈"]
Details:
s = "ä"
s.grapheme_clusters # => ["ä"] # One grapheme cluster.
s.bytes # => [97, 204, 136] # Three bytes.
s.chars # => ["a", "̈"] # Two characters.
s.chars.map {|char| char.ord } # => [97, 776] # Their values.
Related: see Converting to Non-String.
# File 'string.c', line 10090
static VALUE rb_str_grapheme_clusters(VALUE str)
{
    VALUE ary = WANTARRAY("grapheme_clusters", rb_str_strlen(str));
    return rb_str_enumerate_grapheme_clusters(str, ary);
}
#gsub(pattern, replacement) ⇒ String
#gsub(pattern) {|match| ... } ⇒ String
#gsub(pattern) ⇒ Enumerator
Returns a copy of self with zero or more substrings replaced.
Argument pattern may be a string or a ::Regexp; argument replacement may be a string or a ::Hash. Varying types for the argument values make this method very versatile.
Below are some simple examples; for many more examples, see Substitution Methods.
With arguments pattern and string replacement given, replaces each matching substring with the given replacement string:
s = 'abracadabra'
s.gsub('ab', 'AB') # => "ABracadABra"
s.gsub(/[a-c]/, 'X') # => "XXrXXXdXXrX"
With arguments pattern and hash replacement given, replaces each matching substring with a value from the given replacement hash, or removes it if the hash has no matching key:
h = {'a' => 'A', 'b' => 'B', 'c' => 'C'}
s.gsub(/[a-c]/, h) # => "ABrACAdABrA" # 'a', 'b', 'c' replaced.
s.gsub(/[a-d]/, h) # => "ABrACAABrA" # 'd' removed.
With argument pattern and a block given, calls the block with each matching substring; replaces that substring with the block’s return value:
s.gsub(/[a-d]/) {|substring| substring.upcase }
# => "ABrACADABrA"
With argument pattern and no block given, returns a new ::Enumerator.
Related: see Converting to New String.
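A replacement string can also refer back to the match. For instance, \1 and \2 insert numbered groups and \k<name> inserts a named group; Substitution Methods has the full list. Single quotes are used below so that Ruby passes the backslashes through to #gsub unchanged. The dates and names are arbitrary sample data.

```ruby
p '2024-01-15'.gsub(/(\d+)-(\d+)-(\d+)/, '\3/\2/\1')                    # prints "15/01/2024"
p 'John Smith'.gsub(/(?<first>\w+) (?<last>\w+)/, '\k<last>, \k<first>')  # prints "Smith, John"
p 'a-b-c'.gsub('-') { |m| m * 2 }                                        # prints "a--b--c"
```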
# File 'string.c', line 6625
static VALUE rb_str_gsub(int argc, VALUE *argv, VALUE str)
{
    return str_gsub(argc, argv, str, 0);
}
#gsub!(pattern, replacement) ⇒ self?
#gsub!(pattern) {|match| ... } ⇒ self?
#gsub!(pattern) ⇒ Enumerator
Like #gsub, except that:
- Performs substitutions in self (not in a copy of self).
- Returns self if any substitutions are made, nil otherwise.
Related: see Modifying.
# File 'string.c', line 6574
static VALUE rb_str_gsub_bang(int argc, VALUE *argv, VALUE str)
{
    str_modify_keep_cr(str);
    return str_gsub(argc, argv, str, 1);
}
#hash ⇒ Integer
Returns the integer hash value for self.
Two String objects that have identical content and compatible encodings also have the same hash value; see Object#hash and Encodings:
s = 'foo'
h = s.hash # => -569050784 # The actual value differs from process to process.
h == 'foo'.hash # => true
h == 'food'.hash # => false
h == 'FOO'.hash # => false
s0 = "äöü"
s1 = s0.encode(Encoding::ISO_8859_1)
s0.encoding # => #<Encoding:UTF-8>
s1.encoding # => #<Encoding:ISO-8859-1>
s0.hash == s1.hash # => false
Related: see Querying.
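Hash lookups use #hash and #eql? together, so the rules above decide which strings are interchangeable as keys. ASCII-only content hashes and compares alike across ASCII-compatible encodings. The same non-ASCII text in two different encodings gives two distinct keys. The keys and values below are arbitrary, and the source file is assumed to be UTF-8.

```ruby
h = { 'abc' => 1, 'äöü' => 2 }
p 'abc'.eql?('abc'.b)                  # prints true: ASCII-only content
p h['abc'.b]                           # prints 1: same hash value and eql? is true
p h['äöü'.encode('ISO-8859-1')]        # prints nil: different bytes and encoding
```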
# File 'string.c', line 4140
static VALUE rb_str_hash_m(VALUE str)
{
    st_index_t hval = rb_str_hash(str);
    return ST2FIX(hval);
}
#hex ⇒ Integer
Interprets the leading substring of self as hexadecimal; returns its integer value:
'0xFFFF'.hex # => 65535
'FFzzzFF'.hex # => 255 # Hex ends at first non-hex character, 'z'.
'ffzzzFF'.hex # => 255 # Case does not matter.
'-FFzzzFF'.hex # => -255 # May have leading '-'.
'0xFFzzzFF'.hex # => 255 # May have leading '0x'.
'-0xFFzzzFF'.hex # => -255 # May have leading '-0x'.
Returns zero if there is no such leading substring:
'zzz'.hex # => 0
Related: see Converting to Non-String.
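#hex is lenient in the same way as String#to_i with base 16: it stops at the first character that is not a hex digit. When trailing garbage should be treated as an error, Kernel#Integer with base 16 is the strict alternative. Here is a brief comparison.

```ruby
p 'FFzzzFF'.hex           # prints 255: lenient, stops at the first 'z'
p 'FFzzzFF'.to_i(16)      # prints 255: equally lenient
p Integer('0xFF', 16)     # prints 255: strict, but accepts a matching 0x prefix
begin
  Integer('FFzzzFF', 16)  # Strict: the whole string must be a valid number.
rescue ArgumentError
  puts 'ArgumentError raised'
end
```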
# File 'string.c', line 10729
static VALUE rb_str_hex(VALUE str)
{
    return rb_str_to_inum(str, 16, FALSE);
}
#include?(other_string) ⇒ Boolean
Returns whether self contains other_string:
s = 'bar'
s.include?('ba') # => true
s.include?('ar') # => true
s.include?('bar') # => true
s.include?('a') # => true
s.include?('') # => true
s.include?('foo') # => false
Related: see Querying.
# File 'string.c', line 7086
VALUE rb_str_include(VALUE str, VALUE arg)
{
    long i;

    StringValue(arg);
    i = rb_str_index(str, arg, 0);

    return RBOOL(i != -1);
}
#index(pattern, offset = 0) ⇒ Integer?
Returns the integer position of the first substring that matches the given argument pattern, or nil if none found.
When pattern is a string, returns the index of the first matching substring in self:
'foo'.index('f') # => 0
'foo'.index('o') # => 1
'foo'.index('oo') # => 1
'foo'.index('ooo') # => nil
'тест'.index('с') # => 2 # Characters, not bytes.
'こんにちは'.index('ち') # => 3
When pattern is a ::Regexp, returns the index of the first match in self:
'foo'.index(/o./) # => 1
'foo'.index(/.o/) # => 0
When offset is non-negative, begins the search at position offset; the returned index is relative to the beginning of self:
'bar'.index('r', 0) # => 2
'bar'.index('r', 1) # => 2
'bar'.index('r', 2) # => 2
'bar'.index('r', 3) # => nil
'bar'.index(/[r-z]/, 0) # => 2
'тест'.index('с', 1) # => 2
'тест'.index('с', 2) # => 2
'тест'.index('с', 3) # => nil # Offset in characters, not bytes.
'こんにちは'.index('ち', 2) # => 3
With negative integer argument offset, selects the search position by counting backward from the end of self:
'foo'.index('o', -1) # => 2
'foo'.index('o', -2) # => 1
'foo'.index('o', -3) # => 1
'foo'.index('o', -4) # => nil
'foo'.index(/o./, -2) # => 1
'foo'.index(/.o/, -2) # => 1
Related: see Querying.
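The offset argument makes it easy to find every occurrence. The helper below, all_indexes, is hypothetical (it is not part of String). It restarts the search one character past each hit, so overlapping occurrences are included.

```ruby
# Hypothetical helper: collects the start position of every occurrence of
# sub in str, including overlapping ones.
def all_indexes(str, sub)
  positions = []
  pos = 0
  while (i = str.index(sub, pos))
    positions << i
    pos = i + 1 # advance by one character so overlaps are found
  end
  positions
end

all_indexes('banana', 'an') # => [1, 3]
all_indexes('aaa', 'aa')    # => [0, 1]
all_indexes('тест', 'т')    # => [0, 3]: character positions, not bytes
```

To count only non-overlapping occurrences, advance pos by sub.length instead of 1.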
# File 'string.c', line 4508
static VALUE rb_str_index_m(int argc, VALUE *argv, VALUE str) { VALUE sub; VALUE initpos; rb_encoding *enc = STR_ENC_GET(str); long pos; if (rb_scan_args(argc, argv, "11", &sub, &initpos) == 2) { long slen = str_strlen(str, enc); /* str's enc */ pos = NUM2LONG(initpos); if (pos < 0 ? (pos += slen) < 0 : pos > slen) { if (RB_TYPE_P(sub, T_REGEXP)) { rb_backref_set(Qnil); } return Qnil; } } else { pos = 0; } if (RB_TYPE_P(sub, T_REGEXP)) { pos = str_offset(RSTRING_PTR(str), RSTRING_END(str), pos, enc, single_byte_optimizable(str)); if (rb_reg_search(sub, str, pos, 0) >= 0) { VALUE match = rb_backref_get(); struct re_registers *regs = RMATCH_REGS(match); pos = rb_str_sublen(str, BEG(0)); return LONG2NUM(pos); } } else { StringValue(sub); pos = rb_str_index(str, sub, pos); if (pos >= 0) { pos = rb_str_sublen(str, pos); return LONG2NUM(pos); } } return Qnil; }
#initialize_copy(other_string) ⇒ self
Alias for #replace.
#insert(offset, other_string) ⇒ self
Inserts the given other_string into self; returns self.
If the given offset is non-negative, inserts other_string at that offset:
'foo'.insert(0, 'bar') # => "barfoo"
'foo'.insert(1, 'bar') # => "fbaroo"
'foo'.insert(3, 'bar') # => "foobar"
'тест'.insert(2, 'bar') # => "теbarст" # Characters, not bytes.
'こんにちは'.insert(2, 'bar') # => "こんbarにちは"
If offset is negative, counts backward from the end of self and inserts other_string after the character at that offset:
'foo'.insert(-2, 'bar') # => "fobaro"
Related: see Modifying.
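A few edge cases, read from the C source above: an offset of -1 appends (it calls rb_str_append directly), -(length + 1) inserts before the first character, and an offset beyond the end raises IndexError. Receivers are duplicated so the sketch works with frozen literals.

```ruby
s = 'foo'.dup
r = s.insert(-1, 'bar') # offset -1 appends
r                       # => "foobar"
r.equal?(s)             # => true: the receiver itself is modified and returned

'foo'.dup.insert(-4, 'bar') # => "barfoo"

# An offset past the end of the string raises IndexError.
error =
  begin
    'foo'.dup.insert(4, 'bar')
  rescue IndexError => e
    e
  end
error.class # => IndexError
```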
# File 'string.c', line 6065
static VALUE rb_str_insert(VALUE str, VALUE idx, VALUE str2) { long pos = NUM2LONG(idx); if (pos == -1) { return rb_str_append(str, str2); } else if (pos < 0) { pos++; } rb_str_update(str, pos, 0, str2); return str; }
#inspect ⇒ String
Returns a printable version of self, enclosed in double-quotes.
Most printable characters are rendered simply as themselves:
'abc'.inspect # => "\"abc\""
'012'.inspect # => "\"012\""
''.inspect # => "\"\""
"\u000012".inspect # => "\"\\u000012\""
'тест'.inspect # => "\"тест\""
'こんにちは'.inspect # => "\"こんにちは\""
But the printable characters double-quote ('"') and backslash ('\') are escaped:
'"'.inspect # => "\"\\\"\""
'\\'.inspect # => "\"\\\\\""
Unprintable characters are the ASCII characters whose values are in range 0..31, along with the character whose value is 127.
Most of these characters are rendered thus:
0.chr.inspect # => "\"\\x00\""
1.chr.inspect # => "\"\\x01\""
2.chr.inspect # => "\"\\x02\""
# ...
A few, however, have special renderings:
7.chr.inspect # => "\"\\a\"" # BEL
8.chr.inspect # => "\"\\b\"" # BS
9.chr.inspect # => "\"\\t\"" # TAB
10.chr.inspect # => "\"\\n\"" # LF
11.chr.inspect # => "\"\\v\"" # VT
12.chr.inspect # => "\"\\f\"" # FF
13.chr.inspect # => "\"\\r\"" # CR
27.chr.inspect # => "\"\\e\"" # ESC
Related: see Converting to Non-String.
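The C source above also escapes '#' when it is followed by '$', '@', or '{', so the result cannot trigger interpolation when read back as a double-quoted literal. The sketch below checks that round trip. eval is used only for illustration and should never be applied to untrusted input.

```ruby
s = "tab:\t quote:\" hash-brace:\#{x}"
puts s.inspect
# Output: "tab:\t quote:\" hash-brace:\#{x}"

# Reading the inspected form back as a Ruby literal reproduces the original.
eval(s.inspect) == s # => true

# '#' is escaped only before '$', '@', or '{'.
'#{x}'.inspect == '"\#{x}"' # => true
'#x'.inspect == '"#x"'      # => true
```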
# File 'string.c', line 7310
VALUE rb_str_inspect(VALUE str) { int encidx = ENCODING_GET(str); rb_encoding *enc = rb_enc_from_index(encidx); const char *p, *pend, *prev; char buf[CHAR_ESC_LEN + 1]; VALUE result = rb_str_buf_new(0); rb_encoding *resenc = rb_default_internal_encoding(); int unicode_p = rb_enc_unicode_p(enc); int asciicompat = rb_enc_asciicompat(enc); if (resenc == NULL) resenc = rb_default_external_encoding(); if (!rb_enc_asciicompat(resenc)) resenc = rb_usascii_encoding(); rb_enc_associate(result, resenc); str_buf_cat2(result, "\""); p = RSTRING_PTR(str); pend = RSTRING_END(str); prev = p; while (p < pend) { unsigned int c, cc; int n; n = rb_enc_precise_mbclen(p, pend, enc); if (!MBCLEN_CHARFOUND_P(n)) { if (p > prev) str_buf_cat(result, prev, p - prev); n = rb_enc_mbminlen(enc); if (pend < p + n) n = (int)(pend - p); while (n--) { snprintf(buf, CHAR_ESC_LEN, "\\x%02X", *p & 0377); str_buf_cat(result, buf, strlen(buf)); prev = ++p; } continue; } n = MBCLEN_CHARFOUND_LEN(n); c = rb_enc_mbc_to_codepoint(p, pend, enc); p += n; if ((asciicompat || unicode_p) && (c == '"'|| c == '\\' || (c == '#' && p < pend && MBCLEN_CHARFOUND_P(rb_enc_precise_mbclen(p,pend,enc)) && (cc = rb_enc_codepoint(p,pend,enc), (cc == '$' || cc == '@' || cc == '{'))))) { if (p - n > prev) str_buf_cat(result, prev, p - n - prev); str_buf_cat2(result, "\\"); if (asciicompat || enc == resenc) { prev = p - n; continue; } } switch (c) { case '\n': cc = 'n'; break; case '\r': cc = 'r'; break; case '\t': cc = 't'; break; case '\f': cc = 'f'; break; case '\013': cc = 'v'; break; case '\010': cc = 'b'; break; case '\007': cc = 'a'; break; case 033: cc = 'e'; break; default: cc = 0; break; } if (cc) { if (p - n > prev) str_buf_cat(result, prev, p - n - prev); buf[0] = '\\'; buf[1] = (char)cc; str_buf_cat(result, buf, 2); prev = p; continue; } /* The special casing of 0x85 (NEXT_LINE) here is because * Oniguruma historically treats it as printable, but it * doesn't match the print POSIX bracket class or character * 
property in regexps. * * See Ruby Bug #16842 for details: * https://bugs.ruby-lang.org/issues/16842 */ if ((enc == resenc && rb_enc_isprint(c, enc) && c != 0x85) || (asciicompat && rb_enc_isascii(c, enc) && ISPRINT(c))) { continue; } else { if (p - n > prev) str_buf_cat(result, prev, p - n - prev); rb_str_buf_cat_escaped_char(result, c, unicode_p); prev = p; continue; } } if (p > prev) str_buf_cat(result, prev, p - prev); str_buf_cat2(result, "\""); return result; }
#intern ⇒ Symbol
Also known as: #to_sym
Returns the ::Symbol object derived from self, creating it if it did not already exist:
'foo'.intern # => :foo
'тест'.intern # => :тест
'こんにちは'.intern # => :こんにちは
Related: see Converting to Non-String.
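A short illustration of what "creating it if it did not already exist" implies: equal strings always intern to the very same Symbol object, and strings that are not valid identifiers still intern, with Symbol#inspect quoting them.

```ruby
a = 'foo'.intern
b = ('f' + 'oo').to_sym # #to_sym is an alias of #intern
a.equal?(b)             # => true: one Symbol object per distinct name
a.to_s == 'foo'         # => true

'foo bar'.intern.inspect # => ":\"foo bar\""
```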
# File 'symbol.c', line 936
VALUE rb_str_intern(VALUE str) { return sym_find_or_insert_dynamic_symbol(&ruby_global_symbols, str); }
#length ⇒ Integer
Also known as: #size
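This entry has no description of its own. Judging by the C source below (str_strlen), the result is a count of characters, not bytes. The sketch compares it with String#bytesize for strings used elsewhere on this page, assuming UTF-8 source encoding (the Ruby default).

```ruby
'hello'.length       # => 5
'тест'.length        # => 4
'тест'.bytesize      # => 8: each of these Cyrillic letters is 2 bytes in UTF-8
'こんにちは'.length     # => 5
'こんにちは'.bytesize   # => 15: each of these characters is 3 bytes in UTF-8
''.length            # => 0
```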
# File 'string.c', line 2396
VALUE rb_str_length(VALUE str) { return LONG2NUM(str_strlen(str, NULL)); }
#lines(record_separator = $/, chomp: false) ⇒ Array
Returns substrings (“lines”) of self according to the given arguments:
s = <<~EOT
This is the first line.
This is line two.

This is line four.
This is line five.
EOT
With the default argument values:
$/ # => "\n"
s.lines
# =>
["This is the first line.\n",
"This is line two.\n",
"\n",
"This is line four.\n",
"This is line five.\n"]
With a different record_separator:
record_separator = ' is '
s.lines(record_separator)
# =>
["This is ",
"the first line.\nThis is ",
"line two.\n\nThis is ",
"line four.\nThis is ",
"line five.\n"]
With keyword argument chomp as true, removes the trailing newline from each line:
s.lines(chomp: true)
# =>
["This is the first line.",
"This is line two.",
"",
"This is line four.",
"This is line five."]
Related: see Converting to Non-String.
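#lines with chomp: true can look interchangeable with split("\n"), but the two differ on trailing empty lines. #split drops trailing empty fields when no limit is given. A minimal comparison:

```ruby
s = "a\n\n"
s.lines              # => ["a\n", "\n"]
s.lines(chomp: true) # => ["a", ""]
s.split("\n")        # => ["a"]: trailing empty fields are dropped
```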
# File 'string.c', line 9740
static VALUE rb_str_lines(int argc, VALUE *argv, VALUE str) { VALUE ary = WANTARRAY("lines", 0); return rb_str_enumerate_lines(argc, argv, str, ary); }
#ljust(width, pad_string = ' ') ⇒ String
Returns a copy of self, left-justified and, if necessary, right-padded with pad_string:
'hello'.ljust(10) # => "hello     "
' hello'.ljust(10) # => " hello    "
'hello'.ljust(10, 'ab') # => "helloababa"
'тест'.ljust(10) # => "тест      "
'こんにちは'.ljust(10) # => "こんにちは     "
If width <= self.length, returns a copy of self:
'hello'.ljust(5) # => "hello"
'hello'.ljust(1) # => "hello" # Does not truncate to width.
Related: see Converting to New String.
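A typical use is aligning columns of text. The rows below are made-up sample data. Widths are measured in characters, so multibyte strings pad the same way.

```ruby
# Hypothetical rows for illustration.
rows = [['apple', 3], ['kiwi', 12]]
lines = rows.map { |name, qty| name.ljust(8, '.') + qty.to_s.rjust(3) }
lines # => ["apple...  3", "kiwi.... 12"]

'тест'.ljust(6, '.') # => "тест.."
```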
# File 'string.c', line 11079
static VALUE rb_str_ljust(int argc, VALUE *argv, VALUE str) { return rb_str_justify(argc, argv, str, 'l'); }
#lstrip ⇒ String
# File 'string.c', line 10408
static VALUE rb_str_lstrip(VALUE str) { char *start; long len, loffset; RSTRING_GETMEM(str, start, len); loffset = lstrip_offset(str, start, start+len, STR_ENC_GET(str)); if (loffset <= 0) return str_duplicate(rb_cString, str); return rb_str_subseq(str, loffset, len - loffset); }
#lstrip! ⇒ self?
# File 'string.c', line 10370
static VALUE rb_str_lstrip_bang(VALUE str) { rb_encoding *enc; char *start, *s; long olen, loffset; str_modify_keep_cr(str); enc = STR_ENC_GET(str); RSTRING_GETMEM(str, start, olen); loffset = lstrip_offset(str, start, start+olen, enc); if (loffset > 0) { long len = olen-loffset; s = start + loffset; memmove(start, s, len); STR_SET_LEN(str, len); TERM_FILL(start+len, rb_enc_mbminlen(enc)); return str; } return Qnil; }
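Neither #lstrip nor #lstrip! has a description here. Reading the C source above: #lstrip returns a new string without the leading whitespace (a plain copy if there is none), and #lstrip! removes it in place and returns nil when nothing was removed. A short sketch under that reading:

```ruby
"  \t hello  ".lstrip # => "hello  ": only leading whitespace is removed

s = '  hello'.dup
s.lstrip! # => "hello"
s.lstrip! # => nil: nothing left to strip
s         # => "hello"

t = 'hello'
t.lstrip.equal?(t) # => false: a copy is returned even when nothing is stripped
```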
#match(pattern, offset = 0) ⇒ MatchData?
Returns a ::MatchData object (or nil) based on self and the given pattern.
Note: also updates Regexp-related global variables.
- Computes regexp by converting pattern (if not already a ::Regexp): regexp = Regexp.new(pattern)
- Computes matchdata, which will be either a ::MatchData object or nil (see Regexp#match): matchdata = regexp.match(self)
With no block given, returns the computed matchdata:
'foo'.match('f') # => #<MatchData "f">
'foo'.match('o') # => #<MatchData "o">
'foo'.match('x') # => nil
If Integer argument offset is given, the search begins at index offset:
'foo'.match('f', 1) # => nil
'foo'.match('o', 1) # => #<MatchData "o">
With a block given, calls the block with the computed matchdata and returns the block’s return value; if there is no match, the block is not called and nil is returned:
'foo'.match(/o/) {|matchdata| matchdata } # => #<MatchData "o">
'foo'.match(/x/) {|matchdata| matchdata } # => nil
'foo'.match(/f/, 1) {|matchdata| matchdata } # => nil
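A few further illustrations: named captures read from the returned MatchData, a block that transforms the match, and a reminder that a String pattern is converted to a Regexp rather than quoted, so metacharacters such as '.' keep their regexp meaning.

```ruby
md = 'John Smith'.match(/(?<first>\w+) (?<last>\w+)/)
md[:first] # => "John"
md[:last]  # => "Smith"

# With a block, the block's value is returned.
parts = '2024-05-17'.match(/(\d+)-(\d+)-(\d+)/) { |m| m.captures.map(&:to_i) }
parts # => [2024, 5, 17]

# '.' in a String pattern still matches any character.
'foo'.match('.')[0] # => "f"
```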
# File 'string.c', line 5100
static VALUE rb_str_match_m(int argc, VALUE *argv, VALUE str) { VALUE re, result; if (argc < 1) rb_check_arity(argc, 1, 2); re = argv[0]; argv[0] = str; result = rb_funcallv(get_pat(re), rb_intern("match"), argc, argv); if (!NIL_P(result) && rb_block_given_p()) { return rb_yield(result); } return result; }
#match?(pattern, offset = 0) ⇒ Boolean
Returns true or false based on whether a match is found for self and pattern.
Note: does not update Regexp-related global variables.
Computes regexp by converting pattern (if not already a ::Regexp):
regexp = Regexp.new(pattern)
Returns true if self.match(regexp) returns a ::MatchData object, false otherwise:
'foo'.match?(/o/) # => true
'foo'.match?('o') # => true
'foo'.match?(/x/) # => false
If Integer argument offset is given, the search begins at index offset:
'foo'.match?('f', 1) # => false
'foo'.match?('o', 1) # => true
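The note about global variables is easy to observe: after a #match call, a subsequent #match? leaves $~ holding the earlier MatchData.

```ruby
'foo'.match(/f/) # sets $~
$~[0]            # => "f"

'bar'.match?(/b/) # => true, but $~ is left alone
$~[0]             # => "f": still the MatchData from the earlier #match
```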
# File 'string.c', line 5139
static VALUE rb_str_match_m_p(int argc, VALUE *argv, VALUE str) { VALUE re; rb_check_arity(argc, 1, 2); re = get_pat(argv[0]); return rb_reg_match_p(re, str, argc > 1 ? NUM2LONG(argv[1]) : 0); }
#next ⇒ String
Also known as: #succ
Returns the successor to self. The successor is calculated by incrementing characters.
The first character to be incremented is the rightmost alphanumeric or, if there are no alphanumerics, the rightmost character:
'THX1138'.succ # => "THX1139"
'<<koala>>'.succ # => "<<koalb>>"
'***'.succ # => "**+"
The successor to a digit is another digit, “carrying” to the next-left character for a “rollover” from 9 to 0, and prepending another digit if necessary:
'00'.succ # => "01"
'09'.succ # => "10"
'99'.succ # => "100"
The successor to a letter is another letter of the same case, carrying to the next-left character for a rollover, and prepending another same-case letter if necessary:
'aa'.succ # => "ab"
'az'.succ # => "ba"
'zz'.succ # => "aaa"
'AA'.succ # => "AB"
'AZ'.succ # => "BA"
'ZZ'.succ # => "AAA"
The successor to a non-alphanumeric character is the next character in the underlying character set’s collating sequence, carrying to the next-left character for a rollover, and prepending another character if necessary:
s = 0.chr * 3
s # => "\x00\x00\x00"
s.succ # => "\x00\x00\x01"
s = 255.chr * 3
s # => "\xFF\xFF\xFF"
s.succ # => "\x01\x00\x00\x00"
Carrying can occur between and among mixtures of alphanumeric characters:
s = 'zz99zz99'
s.succ # => "aaa00aa00"
s = '99zz99zz'
s.succ # => "100aa00aa"
The successor to an empty String is a new empty String:
''.succ # => ""
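Ranges of strings are iterated with #succ (through String#upto), so the carrying rules above determine which elements a range yields. A few examples:

```ruby
('a'..'e').to_a   # => ["a", "b", "c", "d", "e"]
('az'..'bc').to_a # => ["az", "ba", "bb", "bc"]: 'z' rolls over and carries

'Az'.succ # => "Ba"
'v9'.succ # => "w0"
```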
# File 'string.c', line 5391
VALUE rb_str_succ(VALUE orig) { VALUE str; str = rb_str_new(RSTRING_PTR(orig), RSTRING_LEN(orig)); rb_enc_cr_str_copy_for_substr(str, orig); return str_succ(str); }
#next! ⇒ self
Also known as: #succ!
Equivalent to #succ, but modifies self in place; returns self.
# File 'string.c', line 5495
static VALUE rb_str_succ_bang(VALUE str) { rb_str_modify(str); str_succ(str); return str; }
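Because #succ! returns the same object each time, values meant to be kept must be copied. A sketch of a hypothetical ID generator:

```ruby
# Hypothetical ID generator: succ! advances one String object in place,
# so each value is duplicated before it is stored.
id = 'item-a9'.dup
ids = Array.new(3) { id.succ!.dup }
ids # => ["item-b0", "item-b1", "item-b2"]
id  # => "item-b2"
```

Without the .dup, all three array elements would be the same object, each showing the final value "item-b2".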
#oct ⇒ Integer
Interprets the leading substring of self as a string of octal digits (with an optional sign) and returns the corresponding number; returns zero if there is no such leading substring:
'123'.oct # => 83
'-377'.oct # => -255
'0377non-numeric'.oct # => 255
'non-numeric'.oct # => 0
If self starts with 0, radix indicators are honored; see Kernel.Integer.
Related: #hex.
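A few examples of the radix indicators mentioned above. The C source passes base -8 to rb_str_to_inum, which is what enables prefix detection when the string starts with 0:

```ruby
'0x1F'.oct  # => 31: '0x' selects hexadecimal
'0b101'.oct # => 5: '0b' selects binary
'0o17'.oct  # => 15: '0o' is an explicit octal prefix
'017'.oct   # => 15: a bare leading 0 is octal too
'19'.oct    # => 1: parsing stops at the first non-octal digit, '9'
```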
# File 'string.c', line 10756
static VALUE rb_str_oct(VALUE str) { return rb_str_to_inum(str, -8, FALSE); }
#ord ⇒ Integer
Returns the integer ordinal of the first character of self:
'h'.ord # => 104
'hello'.ord # => 104
'тест'.ord # => 1090
'こんにちは'.ord # => 12371
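Integer#chr with an explicit encoding reverses #ord. An empty receiver has no first character, so #ord raises ArgumentError:

```ruby
1090.chr(Encoding::UTF_8)           # => "т"
'т'.ord.chr(Encoding::UTF_8) == 'т' # => true

error =
  begin
    ''.ord
  rescue ArgumentError => e
    e
  end
error.class # => ArgumentError
```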
# File 'string.c', line 10909
static VALUE rb_str_ord(VALUE s) { unsigned int c; c = rb_enc_codepoint(RSTRING_PTR(s), RSTRING_END(s), STR_ENC_GET(s)); return UINT2NUM(c); }
#partition(string_or_regexp) ⇒ Array, ...
Returns a 3-element array of substrings of self.
Matches a pattern against self, scanning from the beginning. The pattern is:
- string_or_regexp itself, if it is a ::Regexp.
- Regexp.quote(string_or_regexp), if string_or_regexp is a string.
If the pattern is matched, returns pre-match, first-match, post-match:
'hello'.partition('l') # => ["he", "l", "lo"]
'hello'.partition('ll') # => ["he", "ll", "o"]
'hello'.partition('h') # => ["", "h", "ello"]
'hello'.partition('o') # => ["hell", "o", ""]
'hello'.partition(/l+/) #=> ["he", "ll", "o"]
'hello'.partition('') # => ["", "", "hello"]
'тест'.partition('т') # => ["", "т", "ест"]
'こんにちは'.partition('に') # => ["こん", "に", "ちは"]
If the pattern is not matched, returns a copy of self and two empty strings:
'hello'.partition('x') # => ["hello", "", ""]
Related: #rpartition, #split.
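A common use is splitting a "key=value" setting (the sample is made up) at the first separator only. The second pair of lines shows the quoting rule above: a String '.' matches only a literal dot, and a Regexp '.' matches any character.

```ruby
key, sep, value = 'name=Ruby=3.4'.partition('=')
key   # => "name"
sep   # => "="
value # => "Ruby=3.4"

'a.b'.partition('.') # => ["a", ".", "b"]
'a.b'.partition(/./) # => ["", "a", ".b"]
```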
# File 'string.c', line 11124
static VALUE rb_str_partition(VALUE str, VALUE sep) { long pos; sep = get_pat_quoted(sep, 0); if (RB_TYPE_P(sep, T_REGEXP)) { if (rb_reg_search(sep, str, 0, 0) < 0) { goto failed; } VALUE match = rb_backref_get(); struct re_registers *regs = RMATCH_REGS(match); pos = BEG(0); sep = rb_str_subseq(str, pos, END(0) - pos); } else { pos = rb_str_index(str, sep, 0); if (pos < 0) goto failed; } return rb_ary_new3(3, rb_str_subseq(str, 0, pos), sep, rb_str_subseq(str, pos+RSTRING_LEN(sep), RSTRING_LEN(str)-pos-RSTRING_LEN(sep))); failed: return rb_ary_new3(3, str_duplicate(rb_cString, str), str_new_empty_String(str), str_new_empty_String(str)); }
#prepend(*other_strings) ⇒ String
Prepends each string in other_strings to self and returns self:
s = 'foo'
s.prepend('bar', 'baz') # => "barbazfoo"
s # => "barbazfoo"
Related: #concat.
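A sketch showing that the receiver itself is returned and, as the C source suggests through its string-update calls, that non-String arguments are rejected with TypeError:

```ruby
s = 'world'.dup
s.prepend('hello', ', ').equal?(s) # => true: the receiver itself is returned
s                                  # => "hello, world"

error =
  begin
    'x'.dup.prepend(1)
  rescue TypeError => e
    e
  end
error.class # => TypeError
```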
# File 'string.c', line 4085
static VALUE rb_str_prepend_multi(int argc, VALUE *argv, VALUE str) { str_modifiable(str); if (argc == 1) { rb_str_update(str, 0L, 0L, argv[0]); } else if (argc > 1) { int i; VALUE arg_str = rb_str_tmp_new(0); rb_enc_copy(arg_str, str); for (i = 0; i < argc; i++) { rb_str_append(arg_str, argv[i]); } rb_str_update(str, 0L, 0L, arg_str); } return str; }
#replace(other_string) ⇒ self
Also known as: #initialize_copy
Replaces the contents of self with the contents of other_string:
s = 'foo' # => "foo"
s.replace('bar') # => "bar"
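#replace changes the object in place, so every reference to it sees the new contents. Plain assignment, by contrast, only rebinds a variable:

```ruby
s = 'foo'.dup
other_ref = s        # a second reference to the same object
s.replace('bar')
other_ref            # => "bar": the object changed
other_ref.equal?(s)  # => true

s = 'baz'            # assignment rebinds s only
other_ref            # => "bar"
```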
# File 'string.c', line 6643
VALUE rb_str_replace(VALUE str, VALUE str2) { str_modifiable(str); if (str == str2) return str; StringValue(str2); str_discard(str); return str_replace(str, str2); }
#reverse ⇒ String
Returns a new string with the characters from self in reverse order.
'stressed'.reverse # => "desserts"
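"Characters" here means code points, not bytes and not user-perceived characters. A combining accent is its own character and moves to the other side of its base letter. Reversing String#grapheme_clusters keeps such pairs together:

```ruby
'тест'.reverse # => "тсет"

s = "ne\u0301" # "n", "e", COMBINING ACUTE ACCENT
s.reverse == "\u0301en"                        # => true
s.grapheme_clusters.reverse.join == "e\u0301n" # => true
```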
# File 'string.c', line 6979
static VALUE rb_str_reverse(VALUE str) { rb_encoding *enc; VALUE rev; char *s, *e, *p; int cr; if (RSTRING_LEN(str) <= 1) return str_duplicate(rb_cString, str); enc = STR_ENC_GET(str); rev = rb_str_new(0, RSTRING_LEN(str)); s = RSTRING_PTR(str); e = RSTRING_END(str); p = RSTRING_END(rev); cr = ENC_CODERANGE(str); if (RSTRING_LEN(str) > 1) { if (single_byte_optimizable(str)) { while (s < e) { *--p = *s++; } } else if (cr == ENC_CODERANGE_VALID) { while (s < e) { int clen = rb_enc_fast_mbclen(s, e, enc); p -= clen; memcpy(p, s, clen); s += clen; } } else { cr = rb_enc_asciicompat(enc) ? ENC_CODERANGE_7BIT : ENC_CODERANGE_VALID; while (s < e) { int clen = rb_enc_mbclen(s, e, enc); if (clen > 1 || (*s & 0x80)) cr = ENC_CODERANGE_UNKNOWN; p -= clen; memcpy(p, s, clen); s += clen; } } } STR_SET_LEN(rev, RSTRING_LEN(str)); str_enc_copy_direct(rev, str); ENC_CODERANGE_SET(rev, cr); return rev; }
#reverse! ⇒ self
Reverses the characters of self in place; returns self:
s = 'stressed'
s.reverse! # => "desserts"
s # => "desserts"
# File 'string.c', line 7042
static VALUE rb_str_reverse_bang(VALUE str) { if (RSTRING_LEN(str) > 1) { if (single_byte_optimizable(str)) { char *s, *e, c; str_modify_keep_cr(str); s = RSTRING_PTR(str); e = RSTRING_END(str) - 1; while (s < e) { c = *s; *s++ = *e; *e-- = c; } } else { str_shared_replace(str, rb_str_reverse(str)); } } else { str_modify_keep_cr(str); } return str; }
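#rindex(substring, offset = self.length) ⇒ Integer?
#rindex(regexp, offset = self.length) ⇒ Integer?
Returns the ::Integer index of the last occurrence of the given substring, or nil if none found: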
'foo'.rindex('f') # => 0
'foo'.rindex('o') # => 2
'foo'.rindex('oo') # => 1
'foo'.rindex('ooo') # => nil
Returns the ::Integer index of the last match for the given ::Regexp regexp, or nil if none found:
'foo'.rindex(/f/) # => 0
'foo'.rindex(/o/) # => 2
'foo'.rindex(/oo/) # => 1
'foo'.rindex(/ooo/) # => nil
The last match means the match that starts at the last possible position, not the last of the longest matches:
'foo'.rindex(/o+/) # => 2
$~ #=> #<MatchData "o">
To get the last longest match, combine the regexp with a negative lookbehind:
'foo'.rindex(/(?<!o)o+/) # => 1
$~ #=> #<MatchData "oo">
Or use #index with a negative lookahead:
'foo'.index(/o+(?!.*o)/) # => 1
$~ #=> #<MatchData "oo">
If ::Integer argument offset is given and non-negative, it specifies the maximum position at which a match may start:
'foo'.rindex('o', 0) # => nil
'foo'.rindex('o', 1) # => 1
'foo'.rindex('o', 2) # => 2
'foo'.rindex('o', 3) # => 2
If offset is a negative ::Integer, the maximum starting position is the sum of the string’s length and offset:
'foo'.rindex('o', -1) # => 2
'foo'.rindex('o', -2) # => 1
'foo'.rindex('o', -3) # => nil
'foo'.rindex('o', -4) # => nil
Related: #index.
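A small sketch that takes the text after the last '.' of a made-up file name. For real paths, File.extname is usually the better tool.

```ruby
path = 'archive.tar.gz'
dot = path.rindex('.') # => 11: the last '.', not the first
path[(dot + 1)..]      # => "gz"
path[0...dot]          # => "archive.tar"

'README'.rindex('.')   # => nil: no '.' at all
```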
# File 'string.c', line 4815
static VALUE rb_str_rindex_m(int argc, VALUE *argv, VALUE str) { VALUE sub; VALUE initpos; rb_encoding *enc = STR_ENC_GET(str); long pos, len = str_strlen(str, enc); /* str's enc */ if (rb_scan_args(argc, argv, "11", &sub, &initpos) == 2) { pos = NUM2LONG(initpos); if (pos < 0 && (pos += len) < 0) { if (RB_TYPE_P(sub, T_REGEXP)) { rb_backref_set(Qnil); } return Qnil; } if (pos > len) pos = len; } else { pos = len; } if (RB_TYPE_P(sub, T_REGEXP)) { /* enc = rb_enc_check(str, sub); */ pos = str_offset(RSTRING_PTR(str), RSTRING_END(str), pos, enc, single_byte_optimizable(str)); if (rb_reg_search(sub, str, pos, 1) >= 0) { VALUE match = rb_backref_get(); struct re_registers *regs = RMATCH_REGS(match); pos = rb_str_sublen(str, BEG(0)); return LONG2NUM(pos); } } else { StringValue(sub); pos = rb_str_rindex(str, sub, pos); if (pos >= 0) { pos = rb_str_sublen(str, pos); return LONG2NUM(pos); } } return Qnil; }
#rjust(size, pad_string = ' ') ⇒ String
Returns a right-justified copy of self.
If integer argument size is greater than the size (in characters) of self, returns a new string of length size that is a copy of self, right justified and padded on the left with pad_string:
'hello'.rjust(10) # => "     hello"
'hello '.rjust(10) # => "    hello "
'hello'.rjust(10, 'ab') # => "ababahello"
'тест'.rjust(10) # => "      тест"
'こんにちは'.rjust(10) # => "     こんにちは"
If size is not greater than the size of self, returns a copy of self:
'hello'.rjust(5, 'ab') # => "hello"
'hello'.rjust(1, 'ab') # => "hello"
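Zero-padding numbers is a frequent use, and for non-negative integers it gives the same result as a format string. An empty pad_string is rejected with ArgumentError:

```ruby
'42'.rjust(5, '0')                       # => "00042"
'42'.rjust(5, '0') == format('%05d', 42) # => true

error =
  begin
    'abc'.rjust(5, '')
  rescue ArgumentError => e
    e
  end
error.class # => ArgumentError
```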
# File 'string.c', line 11095
static VALUE rb_str_rjust(int argc, VALUE *argv, VALUE str) { return rb_str_justify(argc, argv, str, 'r'); }
#rpartition(string_or_regexp) ⇒ Array, ...
Returns a 3-element array of substrings of self.
Matches a pattern against self, scanning backwards from the end. The pattern is:
- string_or_regexp itself, if it is a ::Regexp.
- Regexp.quote(string_or_regexp), if string_or_regexp is a string.
If the pattern is matched, returns pre-match, last-match, post-match:
'hello'.rpartition('l') # => ["hel", "l", "o"]
'hello'.rpartition('ll') # => ["he", "ll", "o"]
'hello'.rpartition('h') # => ["", "h", "ello"]
'hello'.rpartition('o') # => ["hell", "o", ""]
'hello'.rpartition(/l+/) # => ["hel", "l", "o"]
'hello'.rpartition('') # => ["hello", "", ""]
'тест'.rpartition('т') # => ["тес", "т", ""]
'こんにちは'.rpartition('に') # => ["こん", "に", "ちは"]
If the pattern is not matched, returns two empty strings and a copy of self:
'hello'.rpartition('x') # => ["", "", "hello"]
Related: #partition, #split.
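Splitting a made-up path at its last '/' is a natural fit. When there is no '/', the whole string lands in the last element:

```ruby
dir, _, base = 'path/to/file.rb'.rpartition('/')
dir  # => "path/to"
base # => "file.rb"

'file.rb'.rpartition('/') # => ["", "", "file.rb"]
```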
# File 'string.c', line 11161
static VALUE rb_str_rpartition(VALUE str, VALUE sep) { long pos = RSTRING_LEN(str); sep = get_pat_quoted(sep, 0); if (RB_TYPE_P(sep, T_REGEXP)) { if (rb_reg_search(sep, str, pos, 1) < 0) { goto failed; } VALUE match = rb_backref_get(); struct re_registers *regs = RMATCH_REGS(match); pos = BEG(0); sep = rb_str_subseq(str, pos, END(0) - pos); } else { pos = rb_str_sublen(str, pos); pos = rb_str_rindex(str, sep, pos); if (pos < 0) { goto failed; } } return rb_ary_new3(3, rb_str_subseq(str, 0, pos), sep, rb_str_subseq(str, pos+RSTRING_LEN(sep), RSTRING_LEN(str)-pos-RSTRING_LEN(sep))); failed: return rb_ary_new3(3, str_new_empty_String(str), str_new_empty_String(str), str_duplicate(rb_cString, str)); }
#rstrip ⇒ String
# File 'string.c', line 10495
static VALUE rb_str_rstrip(VALUE str) { rb_encoding *enc; char *start; long olen, roffset; enc = STR_ENC_GET(str); RSTRING_GETMEM(str, start, olen); roffset = rstrip_offset(str, start, start+olen, enc); if (roffset <= 0) return str_duplicate(rb_cString, str); return rb_str_subseq(str, 0, olen-roffset); }
#rstrip! ⇒ self?
# File 'string.c', line 10458
static VALUE rb_str_rstrip_bang(VALUE str) { rb_encoding *enc; char *start; long olen, roffset; str_modify_keep_cr(str); enc = STR_ENC_GET(str); RSTRING_GETMEM(str, start, olen); roffset = rstrip_offset(str, start, start+olen, enc); if (roffset > 0) { long len = olen - roffset; STR_SET_LEN(str, len); TERM_FILL(start+len, rb_enc_mbminlen(enc)); return str; } return Qnil; }
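As with #lstrip, these two entries have no description. Reading the C source above: #rstrip returns a copy without the trailing whitespace, and #rstrip! trims in place and returns nil when there was nothing to trim. A sketch under that reading:

```ruby
"  hello \t\n".rstrip # => "  hello": only trailing whitespace is removed

s = "hello  \n".dup
s.rstrip! # => "hello"
s.rstrip! # => nil: nothing left to strip
s         # => "hello"
```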
#scan(string_or_regexp) ⇒ Array
#scan(string_or_regexp) {|matches| ... } ⇒ self
Matches a pattern against self; the pattern is:
- string_or_regexp itself, if it is a ::Regexp.
- Regexp.quote(string_or_regexp), if string_or_regexp is a string.
Iterates through self, generating a collection of matching results:
- If the pattern contains no groups, each result is the matched string, $&.
- If the pattern contains groups, each result is an array containing one entry per group.
With no block given, returns an array of the results:
s = 'cruel world'
s.scan(/\w+/) # => ["cruel", "world"]
s.scan(/.../) # => ["cru", "el ", "wor"]
s.scan(/(...)/) # => [["cru"], ["el "], ["wor"]]
s.scan(/(..)(..)/) # => [["cr", "ue"], ["l ", "wo"]]
With a block given, calls the block with each result; returns self:
s.scan(/\w+/) {|w| print "<<#{w}>> " }
print "\n"
s.scan(/(.)(.)/) {|x,y| print y, x }
print "\n"
Output:
<<cruel>> <<world>>
rceu lowlr
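A few everyday patterns built on the result shapes described above (Array#tally requires Ruby 2.7 or later). The last pair shows the quoting rule: a String '.' matches only literal dots.

```ruby
text = 'the cat and the hat'
text.scan(/\w+/).tally # => {"the" => 2, "cat" => 1, "and" => 1, "hat" => 1}

# Two groups give two-element results, which convert directly to a Hash.
'k1=v1;k2=v2'.scan(/(\w+)=(\w+)/).to_h # => {"k1" => "v1", "k2" => "v2"}

'1.2.3'.scan('.') # => [".", "."]
'1.2.3'.scan(/./) # => ["1", ".", "2", ".", "3"]
```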
# File 'string.c', line 10674
static VALUE rb_str_scan(VALUE str, VALUE pat) { VALUE result; long start = 0; long last = -1, prev = 0; char *p = RSTRING_PTR(str); long len = RSTRING_LEN(str); pat = get_pat_quoted(pat, 1); mustnot_broken(str); if (!rb_block_given_p()) { VALUE ary = rb_ary_new(); while (!NIL_P(result = scan_once(str, pat, &start, 0))) { last = prev; prev = start; rb_ary_push(ary, result); } if (last >= 0) rb_pat_search(pat, str, last, 1); else rb_backref_set(Qnil); return ary; } while (!NIL_P(result = scan_once(str, pat, &start, 1))) { last = prev; prev = start; rb_yield(result); str_mod_check(str, p, len); } if (last >= 0) rb_pat_search(pat, str, last, 1); return str; }
#scrub(replacement_string = default_replacement) ⇒ String
#scrub {|bytes| ... } ⇒ String
Returns a copy of self with each invalid byte sequence replaced by the given replacement_string.
With no block given and no argument, replaces each invalid sequence with the default replacement string ("�" for a Unicode encoding, '?' otherwise):
s = "foo\x81\x81bar"
s.scrub # => "foo��bar"
With no block given and argument replacement_string given, replaces each invalid sequence with that string:
"foo\x81\x81bar".scrub('xyzzy') # => "fooxyzzyxyzzybar"
With a block given, replaces each invalid sequence with the value of the block:
"foo\x81\x81bar".scrub {|bytes| p bytes; 'XYZZY' }
# => "fooXYZZYXYZZYbar"
Output:
"\x81"
"\x81"
# File 'string.c', line 11906
static VALUE str_scrub(int argc, VALUE *argv, VALUE str) { VALUE repl = argc ? (rb_check_arity(argc, 0, 1), argv[0]) : Qnil; VALUE new = rb_str_scrub(str, repl); return NIL_P(new) ? str_duplicate(rb_cString, str): new; }
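A sketch of #scrub cleaning up a truncated UTF-8 sequence, such as one left when a byte stream is cut mid-character. The block form here renders each invalid sequence as hex for debugging.

```ruby
raw = "caf\xC3"     # "\xC3" starts a 2-byte sequence that never finishes
raw.valid_encoding? # => false

raw.scrub('')                 # => "caf": invalid bytes dropped
raw.scrub('').valid_encoding? # => true

raw.scrub { |bytes| "<#{bytes.unpack1('H*')}>" } # => "caf<c3>"
```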
#scrub! ⇒ self
#scrub!(replacement_string = default_replacement) ⇒ self
#scrub! {|bytes| ... } ⇒ self
Like #scrub, except that any replacements are made in self.
# File 'string.c', line 11923
static VALUE str_scrub_bang(int argc, VALUE *argv, VALUE str) { VALUE repl = argc ? (rb_check_arity(argc, 0, 1), argv[0]) : Qnil; VALUE new = rb_str_scrub(str, repl); if (!NIL_P(new)) rb_str_replace(str, new); return str; }
#setbyte(index, integer) ⇒ Integer
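This entry has no description. Reading the C source below: #setbyte stores integer & 0xFF at byte position index (a negative index counts from the end), returns integer unchanged, and raises IndexError when index is outside the string. A sketch under that reading:

```ruby
s = 'abc'.dup
s.setbyte(0, 65)    # => 65: the given integer is returned
s                   # => "Abc"
s.setbyte(-1, 0x5A) # a negative index counts from the end
s                   # => "AbZ"

# Only the low 8 bits are stored: 321 & 0xFF == 65 ("A").
s.setbyte(1, 321)   # => 321
s                   # => "AAZ"

error =
  begin
    s.setbyte(3, 0)
  rescue IndexError => e
    e
  end
error.class # => IndexError
```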
# File 'string.c', line 6727
VALUE rb_str_setbyte(VALUE str, VALUE index, VALUE value) { long pos = NUM2LONG(index); long len = RSTRING_LEN(str); char *ptr, *head, *left = 0; rb_encoding *enc; int cr = ENC_CODERANGE_UNKNOWN, width, nlen; if (pos < -len || len <= pos) rb_raise(rb_eIndexError, "index %ld out of string", pos); if (pos < 0) pos += len; VALUE v = rb_to_int(value); VALUE w = rb_int_and(v, INT2FIX(0xff)); char byte = (char)(NUM2INT(w) & 0xFF); if (!str_independent(str)) str_make_independent(str); enc = STR_ENC_GET(str); head = RSTRING_PTR(str); ptr = &head[pos]; if (!STR_EMBED_P(str)) { cr = ENC_CODERANGE(str); switch (cr) { case ENC_CODERANGE_7BIT: left = ptr; *ptr = byte; if (ISASCII(byte)) goto end; nlen = rb_enc_precise_mbclen(left, head+len, enc); if (!MBCLEN_CHARFOUND_P(nlen)) ENC_CODERANGE_SET(str, ENC_CODERANGE_BROKEN); else ENC_CODERANGE_SET(str, ENC_CODERANGE_VALID); goto end; case ENC_CODERANGE_VALID: left = rb_enc_left_char_head(head, ptr, head+len, enc); width = rb_enc_precise_mbclen(left, head+len, enc); *ptr = byte; nlen = rb_enc_precise_mbclen(left, head+len, enc); if (!MBCLEN_CHARFOUND_P(nlen)) ENC_CODERANGE_SET(str, ENC_CODERANGE_BROKEN); else if (MBCLEN_CHARFOUND_LEN(nlen) != width || ISASCII(byte)) ENC_CODERANGE_CLEAR(str); goto end; } } ENC_CODERANGE_CLEAR(str); *ptr = byte; end: return value; }
#size ⇒ Integer
Alias for #length.
#slice(index) ⇒ String?
#slice(start, length) ⇒ String?
#slice(range) ⇒ String?
#slice(regexp, capture = 0) ⇒ String?
#slice(substring) ⇒ String?
Alias for #[].
#slice!(index) ⇒ String?
#slice!(start, length) ⇒ String?
#slice!(range) ⇒ String?
#slice!(regexp, capture = 0) ⇒ String?
#slice!(substring) ⇒ String?
Removes and returns the substring of self specified by the arguments; see String Slices.
A few examples:
string = "This is a string"
string.slice!(2) #=> "i"
string.slice!(3..6) #=> " is "
string.slice!(/s.*t/) #=> "sa st"
string.slice!("r") #=> "r"
string #=> "Thing"
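Two more cases: with a regexp and a capture number, only that group is removed. When nothing matches, nil is returned and self is left unchanged.

```ruby
s = 'hello world'.dup
s.slice!(/(\w+) (\w+)/, 2) # => "world": only capture group 2 is removed
s                          # => "hello "

s.slice!('xyz') # => nil
s.slice!(100)   # => nil
s               # => "hello "
```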
# File 'string.c', line 6103
static VALUE rb_str_slice_bang(int argc, VALUE *argv, VALUE str) { VALUE result = Qnil; VALUE indx; long beg, len = 1; char *p; rb_check_arity(argc, 1, 2); str_modify_keep_cr(str); indx = argv[0]; if (RB_TYPE_P(indx, T_REGEXP)) { if (rb_reg_search(indx, str, 0, 0) < 0) return Qnil; VALUE match = rb_backref_get(); struct re_registers *regs = RMATCH_REGS(match); int nth = 0; if (argc > 1 && (nth = rb_reg_backref_number(match, argv[1])) < 0) { if ((nth += regs->num_regs) <= 0) return Qnil; } else if (nth >= regs->num_regs) return Qnil; beg = BEG(nth); len = END(nth) - beg; goto subseq; } else if (argc == 2) { beg = NUM2LONG(indx); len = NUM2LONG(argv[1]); goto num_index; } else if (FIXNUM_P(indx)) { beg = FIX2LONG(indx); if (!(p = rb_str_subpos(str, beg, &len))) return Qnil; if (!len) return Qnil; beg = p - RSTRING_PTR(str); goto subseq; } else if (RB_TYPE_P(indx, T_STRING)) { beg = rb_str_index(str, indx, 0); if (beg == -1) return Qnil; len = RSTRING_LEN(indx); result = str_duplicate(rb_cString, indx); goto squash; } else { switch (rb_range_beg_len(indx, &beg, &len, str_strlen(str, NULL), 0)) { case Qnil: return Qnil; case Qfalse: beg = NUM2LONG(indx); if (!(p = rb_str_subpos(str, beg, &len))) return Qnil; if (!len) return Qnil; beg = p - RSTRING_PTR(str); goto subseq; default: goto num_index; } } num_index: if (!(p = rb_str_subpos(str, beg, &len))) return Qnil; beg = p - RSTRING_PTR(str); subseq: result = rb_str_new(RSTRING_PTR(str)+beg, len); rb_enc_cr_str_copy_for_substr(result, str); squash: if (len > 0) { if (beg == 0) { rb_str_drop_bytes(str, len); } else { char *sptr = RSTRING_PTR(str); long slen = RSTRING_LEN(str); if (beg + len > slen) /* pathological check */ len = slen - beg; memmove(sptr + beg, sptr + beg + len, slen - (beg + len)); slen -= len; STR_SET_LEN(str, slen); TERM_FILL(&sptr[slen], TERM_LEN(str)); } } return result; }
#split(field_sep = $;, limit = 0) ⇒ Array
#split(field_sep = $;, limit = 0) {|substring| ... } ⇒ self
Returns an array of substrings of self that are the result of splitting self at each occurrence of the given field separator field_sep.
When field_sep is $;:
- If $; is nil (its default value), the split occurs just as if field_sep were given as a space character (see below).
- If $; is a string, the split occurs just as if field_sep were given as that string (see below).
When field_sep is ' ' and limit is 0 (its default value), the split occurs at each sequence of whitespace:
'abc def ghi'.split(' ') # => ["abc", "def", "ghi"]
"abc \n\tdef\t\n ghi".split(' ') # => ["abc", "def", "ghi"]
'abc  def   ghi'.split(' ') # => ["abc", "def", "ghi"]
''.split(' ') # => []
When field_sep is a string different from ' ' and limit is 0, the split occurs at each occurrence of field_sep; trailing empty substrings are not returned:
'abracadabra'.split('ab') # => ["", "racad", "ra"]
'aaabcdaaa'.split('a') # => ["", "", "", "bcd"]
''.split('a') # => []
'3.14159'.split('1') # => ["3.", "4", "59"]
'!@#$%^$&*($)_+'.split('$') # => ["!@#", "%^", "&*(", ")_+"]
'тест'.split('т') # => ["", "ес"]
'こんにちは'.split('に') # => ["こん", "ちは"]
When field_sep is a ::Regexp and limit is 0, the split occurs at each occurrence of a match; trailing empty substrings are not returned:
'abracadabra'.split(/ab/) # => ["", "racad", "ra"]
'aaabcdaaa'.split(/a/) # => ["", "", "", "bcd"]
'aaabcdaaa'.split(//) # => ["a", "a", "a", "b", "c", "d", "a", "a", "a"]
'1 + 1 == 2'.split(/\W+/) # => ["1", "1", "2"]
If the Regexp contains groups, their matches are also included in the returned array:
'1:2:3'.split(/(:)()()/, 2) # => ["1", ":", "", "", "2:3"]
As seen above, if limit is 0, trailing empty substrings are not returned:
'aaabcdaaa'.split('a') # => ["", "", "", "bcd"]
If limit is a positive integer n, no more than n - 1 splits occur, so that at most n substrings are returned, and trailing empty substrings are included:
'aaabcdaaa'.split('a', 1) # => ["aaabcdaaa"]
'aaabcdaaa'.split('a', 2) # => ["", "aabcdaaa"]
'aaabcdaaa'.split('a', 5) # => ["", "", "", "bcd", "aa"]
'aaabcdaaa'.split('a', 7) # => ["", "", "", "bcd", "", "", ""]
'aaabcdaaa'.split('a', 8) # => ["", "", "", "bcd", "", "", ""]
Note that if field_sep is a Regexp containing groups, their matches are in the returned array, but do not count toward the limit.
If limit is negative, there is no limit on the number of splits, and trailing empty substrings are included:
'aaabcdaaa'.split('a', -1) # => ["", "", "", "bcd", "", "", ""]
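A minimal sketch contrasting the three limit cases on one input (the string 'a,b,,c,,' is an arbitrary choice for illustration), plus a Regexp with a captured group, whose captures appear in the result without counting toward the limit:

```ruby
s = 'a,b,,c,,'

# limit 0 (the default): no limit on splits, trailing empty strings dropped.
no_limit = s.split(',')               # => ["a", "b", "", "c"]

# Positive limit: at most 2 fields; the remainder is left unsplit.
two_fields = s.split(',', 2)          # => ["a", "b,,c,,"]

# Negative limit: no limit on splits, trailing empty strings kept.
keep_trailing = s.split(',', -1)      # => ["a", "b", "", "c", "", ""]

# Captured separators are returned but do not count toward the limit.
with_groups = '1-2-3'.split(/(-)/, 2) # => ["1", "-", "2-3"]
```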
If a block is given, calls the block with each substring and returns self:
'abc def ghi'.split(' ') {|substring| p substring }
Output:
"abc"
"def"
"ghi"
#=> "abc def ghi"
Note that the above example is functionally the same as calling #each after #split and giving the same block. However, the above example has better performance because it avoids the creation of an intermediate array. Also, note the different return values.
'abc def ghi'.split(' ').each {|substring| p substring }
Output:
"abc"
"def"
"ghi"
#=> ["abc", "def", "ghi"]
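A small sketch of the block form (the lengths array is only for illustration): the block sees each field, and the value returned is the receiver itself, not a new array:

```ruby
line = 'abc def ghi'
lengths = []

# The block receives each field in turn; no array of fields is returned.
returned = line.split(' ') { |field| lengths << field.length }

lengths                # => [3, 3, 3]
returned.equal?(line)  # => true: the receiver itself is returned
```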
Related: #partition, #rpartition.
# File 'string.c', line 9253
static VALUE rb_str_split_m(int argc, VALUE *argv, VALUE str) { rb_encoding *enc; VALUE spat; VALUE limit; split_type_t split_type; long beg, end, i = 0, empty_count = -1; int lim = 0; VALUE result, tmp; result = rb_block_given_p() ? Qfalse : Qnil; if (rb_scan_args(argc, argv, "02", &spat, &limit) == 2) { lim = NUM2INT(limit); if (lim <= 0) limit = Qnil; else if (lim == 1) { if (RSTRING_LEN(str) == 0) return result ? rb_ary_new2(0) : str; tmp = str_duplicate(rb_cString, str); if (!result) { rb_yield(tmp); return str; } return rb_ary_new3(1, tmp); } i = 1; } if (NIL_P(limit) && !lim) empty_count = 0; enc = STR_ENC_GET(str); split_type = SPLIT_TYPE_REGEXP; if (!NIL_P(spat)) { spat = get_pat_quoted(spat, 0); } else if (NIL_P(spat = rb_fs)) { split_type = SPLIT_TYPE_AWK; } else if (!(spat = rb_fs_check(spat))) { rb_raise(rb_eTypeError, "value of $; must be String or Regexp"); } else { rb_category_warn(RB_WARN_CATEGORY_DEPRECATED, "$; is set to non-nil value"); } if (split_type != SPLIT_TYPE_AWK) { switch (BUILTIN_TYPE(spat)) { case T_REGEXP: rb_reg_options(spat); /* check if uninitialized */ tmp = RREGEXP_SRC(spat); split_type = literal_split_pattern(tmp, SPLIT_TYPE_REGEXP); if (split_type == SPLIT_TYPE_AWK) { spat = tmp; split_type = SPLIT_TYPE_STRING; } break; case T_STRING: mustnot_broken(spat); split_type = literal_split_pattern(spat, SPLIT_TYPE_STRING); break; default: UNREACHABLE_RETURN(Qnil); } } #define SPLIT_STR(beg, len) ( \ empty_count = split_string(result, str, beg, len, empty_count), \ str_mod_check(str, str_start, str_len)) beg = 0; char *ptr = RSTRING_PTR(str); char *const str_start = ptr; const long str_len = RSTRING_LEN(str); char *const eptr = str_start + str_len; if (split_type == SPLIT_TYPE_AWK) { char *bptr = ptr; int skip = 1; unsigned int c; if (result) result = rb_ary_new(); end = beg; if (is_ascii_string(str)) { while (ptr < eptr) { c = (unsigned char)*ptr++; if (skip) { if (ascii_isspace(c)) { beg = ptr - bptr; } else { end = ptr - bptr; skip 
= 0; if (!NIL_P(limit) && lim <= i) break; } } else if (ascii_isspace(c)) { SPLIT_STR(beg, end-beg); skip = 1; beg = ptr - bptr; if (!NIL_P(limit)) ++i; } else { end = ptr - bptr; } } } else { while (ptr < eptr) { int n; c = rb_enc_codepoint_len(ptr, eptr, &n, enc); ptr += n; if (skip) { if (rb_isspace(c)) { beg = ptr - bptr; } else { end = ptr - bptr; skip = 0; if (!NIL_P(limit) && lim <= i) break; } } else if (rb_isspace(c)) { SPLIT_STR(beg, end-beg); skip = 1; beg = ptr - bptr; if (!NIL_P(limit)) ++i; } else { end = ptr - bptr; } } } } else if (split_type == SPLIT_TYPE_STRING) { char *substr_start = ptr; char *sptr = RSTRING_PTR(spat); long slen = RSTRING_LEN(spat); if (result) result = rb_ary_new(); mustnot_broken(str); enc = rb_enc_check(str, spat); while (ptr < eptr && (end = rb_memsearch(sptr, slen, ptr, eptr - ptr, enc)) >= 0) { /* Check we are at the start of a char */ char *t = rb_enc_right_char_head(ptr, ptr + end, eptr, enc); if (t != ptr + end) { ptr = t; continue; } SPLIT_STR(substr_start - str_start, (ptr+end) - substr_start); str_mod_check(spat, sptr, slen); ptr += end + slen; substr_start = ptr; if (!NIL_P(limit) && lim <= ++i) break; } beg = ptr - str_start; } else if (split_type == SPLIT_TYPE_CHARS) { int n; if (result) result = rb_ary_new_capa(RSTRING_LEN(str)); mustnot_broken(str); enc = rb_enc_get(str); while (ptr < eptr && (n = rb_enc_precise_mbclen(ptr, eptr, enc)) > 0) { SPLIT_STR(ptr - str_start, n); ptr += n; if (!NIL_P(limit) && lim <= ++i) break; } beg = ptr - str_start; } else { if (result) result = rb_ary_new(); long len = RSTRING_LEN(str); long start = beg; long idx; int last_null = 0; struct re_registers *regs; VALUE match = 0; for (; rb_reg_search(spat, str, start, 0) >= 0; (match ? 
(rb_match_unbusy(match), rb_backref_set(match)) : (void)0)) { match = rb_backref_get(); if (!result) rb_match_busy(match); regs = RMATCH_REGS(match); end = BEG(0); if (start == end && BEG(0) == END(0)) { if (!ptr) { SPLIT_STR(0, 0); break; } else if (last_null == 1) { SPLIT_STR(beg, rb_enc_fast_mbclen(ptr+beg, eptr, enc)); beg = start; } else { if (start == len) start++; else start += rb_enc_fast_mbclen(ptr+start,eptr,enc); last_null = 1; continue; } } else { SPLIT_STR(beg, end-beg); beg = start = END(0); } last_null = 0; for (idx=1; idx < regs->num_regs; idx++) { if (BEG(idx) == -1) continue; SPLIT_STR(BEG(idx), END(idx)-BEG(idx)); } if (!NIL_P(limit) && lim <= ++i) break; } if (match) rb_match_unbusy(match); } if (RSTRING_LEN(str) > 0 && (!NIL_P(limit) || RSTRING_LEN(str) > beg || lim < 0)) { SPLIT_STR(beg, RSTRING_LEN(str)-beg); } return result ? result : str; }
#squeeze(*selectors) ⇒ String
Returns a copy of self with characters specified by selectors “squeezed” (see Multiple Character Selectors).
“Squeezed” means that each multiple-character run of a selected character is squeezed down to a single character; with no arguments given, squeezes all characters:
"yellow moon".squeeze #=> "yelow mon"
"  now   is  the".squeeze(" ") #=> " now is the"
"putters shoot balls".squeeze("m-z") #=> "puters shot balls"
# File 'string.c', line 9029
static VALUE
rb_str_squeeze(int argc, VALUE *argv, VALUE str)
{
    str = str_duplicate(rb_cString, str);
    rb_str_squeeze_bang(argc, argv, str);
    return str;
}
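A short sketch of selector handling, assuming the usual rule for multiple character selectors (a character is selected only if every selector matches it); the input string is arbitrary:

```ruby
s = 'aaabbbccc'

all_runs = s.squeeze               # => "abc"   (no selectors: every run)
a_and_c  = s.squeeze('a-c', '^b')  # => "abbbc" (selectors intersect to 'a' and 'c')

unchanged = 'abc'.dup.squeeze!     # => nil     (nothing to squeeze)
```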
#squeeze!(*selectors) ⇒ self?
Like #squeeze, but modifies self in place. Returns self if any changes were made, nil otherwise.
# File 'string.c', line 8936
static VALUE rb_str_squeeze_bang(int argc, VALUE *argv, VALUE str) { char squeez[TR_TABLE_SIZE]; rb_encoding *enc = 0; VALUE del = 0, nodel = 0; unsigned char *s, *send, *t; int i, modify = 0; int ascompat, singlebyte = single_byte_optimizable(str); unsigned int save; if (argc == 0) { enc = STR_ENC_GET(str); } else { for (i=0; i<argc; i++) { VALUE s = argv[i]; StringValue(s); enc = rb_enc_check(str, s); if (singlebyte && !single_byte_optimizable(s)) singlebyte = 0; tr_setup_table(s, squeez, i==0, &del, &nodel, enc); } } str_modify_keep_cr(str); s = t = (unsigned char *)RSTRING_PTR(str); if (!s || RSTRING_LEN(str) == 0) return Qnil; send = (unsigned char *)RSTRING_END(str); save = -1; ascompat = rb_enc_asciicompat(enc); if (singlebyte) { while (s < send) { unsigned int c = *s++; if (c != save || (argc > 0 && !squeez[c])) { *t++ = save = c; } } } else { while (s < send) { unsigned int c; int clen; if (ascompat && (c = *s) < 0x80) { if (c != save || (argc > 0 && !squeez[c])) { *t++ = save = c; } s++; } else { c = rb_enc_codepoint_len((char *)s, (char *)send, &clen, enc); if (c != save || (argc > 0 && !tr_find(c, squeez, del, nodel))) { if (t != s) rb_enc_mbcput(c, t, enc); save = c; t += clen; } s += clen; } } } TERM_FILL((char *)t, TERM_LEN(str)); if ((char *)t - RSTRING_PTR(str) != RSTRING_LEN(str)) { STR_SET_LEN(str, (char *)t - RSTRING_PTR(str)); modify = 1; } if (modify) return str; return Qnil; }
#start_with?(*string_or_regexp) ⇒ Boolean
Returns whether self starts with any of the given string_or_regexp.
Matches patterns against the beginning of self. For each given string_or_regexp, the pattern is:
- string_or_regexp itself, if it is a ::Regexp.
- Regexp.quote(string_or_regexp), if string_or_regexp is a string.
Returns true if any pattern matches the beginning, false otherwise:
'hello'.start_with?('hell') # => true
'hello'.start_with?(/H/i) # => true
'hello'.start_with?('heaven', 'hell') # => true
'hello'.start_with?('heaven', 'paradise') # => false
'тест'.start_with?('т') # => true
'こんにちは'.start_with?('こ') # => true
Related: #end_with?.
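A few edge cases, following the C source below (an empty string prefix returns true immediately, and a Regexp is matched only at the start of self):

```ruby
s = 'hello'

empty_prefix = s.start_with?('')         # => true  (an empty prefix always matches)
inner_match  = s.start_with?(/ll/)       # => false (the Regexp must match at index 0)
any_of       = s.start_with?(/l+/, 'he') # => true  (one matching pattern is enough)
```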
# File 'string.c', line 11201
static VALUE
rb_str_start_with(int argc, VALUE *argv, VALUE str)
{
    int i;

    for (i=0; i<argc; i++) {
        VALUE tmp = argv[i];
        if (RB_TYPE_P(tmp, T_REGEXP)) {
            if (rb_reg_start_with_p(tmp, str))
                return Qtrue;
        }
        else {
            const char *p, *s, *e;
            long slen, tlen;
            rb_encoding *enc;

            StringValue(tmp);
            enc = rb_enc_check(str, tmp);
            if ((tlen = RSTRING_LEN(tmp)) == 0) return Qtrue;
            if ((slen = RSTRING_LEN(str)) < tlen) continue;
            p = RSTRING_PTR(str);
            e = p + slen;
            s = p + tlen;
            if (!at_char_right_boundary(p, s, e, enc))
                continue;
            if (memcmp(p, RSTRING_PTR(tmp), tlen) == 0)
                return Qtrue;
        }
    }
    return Qfalse;
}
#strip ⇒ String
Returns a copy of self with leading and trailing whitespace removed; see also #lstrip and #rstrip.
# File 'string.c', line 10563
static VALUE
rb_str_strip(VALUE str)
{
    char *start;
    long olen, loffset, roffset;
    rb_encoding *enc = STR_ENC_GET(str);

    RSTRING_GETMEM(str, start, olen);
    loffset = lstrip_offset(str, start, start+olen, enc);
    roffset = rstrip_offset(str, start+loffset, start+olen, enc);

    if (loffset <= 0 && roffset <= 0) return str_duplicate(rb_cString, str);
    return rb_str_subseq(str, loffset, olen-loffset-roffset);
}
#strip! ⇒ self?
Like #strip, but modifies self in place. Returns self if any changes were made, nil otherwise.
# File 'string.c', line 10521
static VALUE
rb_str_strip_bang(VALUE str)
{
    char *start;
    long olen, loffset, roffset;
    rb_encoding *enc;

    str_modify_keep_cr(str);
    enc = STR_ENC_GET(str);
    RSTRING_GETMEM(str, start, olen);
    loffset = lstrip_offset(str, start, start+olen, enc);
    roffset = rstrip_offset(str, start+loffset, start+olen, enc);

    if (loffset > 0 || roffset > 0) {
        long len = olen-roffset;
        if (loffset > 0) {
            len -= loffset;
            memmove(start, start + loffset, len);
        }
        STR_SET_LEN(str, len);
        TERM_FILL(start+len, rb_enc_mbminlen(enc));
        return str;
    }
    return Qnil;
}
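A sketch contrasting #strip and #strip! (the padded sample string is arbitrary): the non-bang form leaves self alone, and the bang form returns nil when there is nothing to remove:

```ruby
padded = "\t  hello world \n"

stripped = padded.strip   # => "hello world"
padded                    # => "\t  hello world \n" (unchanged)

copy = padded.dup
copy.strip!               # => "hello world" (returns self)
copy                      # => "hello world"

unchanged = 'clean'.dup.strip!  # => nil (nothing removed)
```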
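#sub(pattern, replacement) ⇒ String
#sub(pattern) {|match| ... } ⇒ String
Returns a copy of self with only the first match of pattern replaced. The replacement is given by replacement (a string, or a hash whose value for the matched text is used) or, if a block is given, by the block's return value.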
# File 'string.c', line 6408
static VALUE
rb_str_sub(int argc, VALUE *argv, VALUE str)
{
    str = str_duplicate(rb_cString, str);
    rb_str_sub_bang(argc, argv, str);
    return str;
}
#sub!(pattern, replacement) ⇒ self?
#sub!(pattern) {|match| ... } ⇒ self?
Like #sub, but modifies self in place. Returns self if a replacement was made, nil otherwise.
# File 'string.c', line 6283
static VALUE rb_str_sub_bang(int argc, VALUE *argv, VALUE str) { VALUE pat, repl, hash = Qnil; int iter = 0; long plen; int min_arity = rb_block_given_p() ? 1 : 2; long beg; rb_check_arity(argc, min_arity, 2); if (argc == 1) { iter = 1; } else { repl = argv[1]; hash = rb_check_hash_type(argv[1]); if (NIL_P(hash)) { StringValue(repl); } } pat = get_pat_quoted(argv[0], 1); str_modifiable(str); beg = rb_pat_search(pat, str, 0, 1); if (beg >= 0) { rb_encoding *enc; int cr = ENC_CODERANGE(str); long beg0, end0; VALUE match, match0 = Qnil; struct re_registers *regs; char *p, *rp; long len, rlen; match = rb_backref_get(); regs = RMATCH_REGS(match); if (RB_TYPE_P(pat, T_STRING)) { beg0 = beg; end0 = beg0 + RSTRING_LEN(pat); match0 = pat; } else { beg0 = BEG(0); end0 = END(0); if (iter) match0 = rb_reg_nth_match(0, match); } if (iter || !NIL_P(hash)) { p = RSTRING_PTR(str); len = RSTRING_LEN(str); if (iter) { repl = rb_obj_as_string(rb_yield(match0)); } else { repl = rb_hash_aref(hash, rb_str_subseq(str, beg0, end0 - beg0)); repl = rb_obj_as_string(repl); } str_mod_check(str, p, len); rb_check_frozen(str); } else { repl = rb_reg_regsub(repl, str, regs, RB_TYPE_P(pat, T_STRING) ? 
Qnil : pat); } enc = rb_enc_compatible(str, repl); if (!enc) { rb_encoding *str_enc = STR_ENC_GET(str); p = RSTRING_PTR(str); len = RSTRING_LEN(str); if (coderange_scan(p, beg0, str_enc) != ENC_CODERANGE_7BIT || coderange_scan(p+end0, len-end0, str_enc) != ENC_CODERANGE_7BIT) { rb_raise(rb_eEncCompatError, "incompatible character encodings: %s and %s", rb_enc_inspect_name(str_enc), rb_enc_inspect_name(STR_ENC_GET(repl))); } enc = STR_ENC_GET(repl); } rb_str_modify(str); rb_enc_associate(str, enc); if (ENC_CODERANGE_UNKNOWN < cr && cr < ENC_CODERANGE_BROKEN) { int cr2 = ENC_CODERANGE(repl); if (cr2 == ENC_CODERANGE_BROKEN || (cr == ENC_CODERANGE_VALID && cr2 == ENC_CODERANGE_7BIT)) cr = ENC_CODERANGE_UNKNOWN; else cr = cr2; } plen = end0 - beg0; rlen = RSTRING_LEN(repl); len = RSTRING_LEN(str); if (rlen > plen) { RESIZE_CAPA(str, len + rlen - plen); } p = RSTRING_PTR(str); if (rlen != plen) { memmove(p + beg0 + rlen, p + beg0 + plen, len - beg0 - plen); } rp = RSTRING_PTR(repl); memmove(p + beg0, rp, rlen); len += rlen - plen; STR_SET_LEN(str, len); TERM_FILL(&RSTRING_PTR(str)[len], TERM_LEN(str)); ENC_CODERANGE_SET(str, cr); RB_GC_GUARD(match); return str; } return Qnil; }
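A sketch of the four replacement styles that the C source handles (string, backreference string, hash lookup, block), plus the nil result of #sub! when nothing matches; the sample text is arbitrary:

```ruby
s = 'hello hello'

first_only = s.sub('l', 'L')               # => "heLlo hello" (first match only)
swapped    = s.sub(/(h)(e)/, '\2\1')       # => "ehllo hello" (backreferences)
from_hash  = s.sub(/[aeiou]/, 'e' => '3')  # => "h3llo hello" (hash keyed by match)
from_block = s.sub(/l+/) { |m| m.upcase }  # => "heLLo hello" (block result)

no_match = s.dup.sub!('x', 'y')            # => nil (no match, no change)
```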
#next ⇒ String
#succ ⇒ String
Alias for #next.
#next! ⇒ self
#succ! ⇒ self
Alias for #next!.
#sum(n = 16) ⇒ Integer
Returns a basic n-bit checksum of the characters in self; the checksum is the sum of the binary value of each byte in self, modulo 2**n (when n is zero or negative, no modulus is applied):
'hello'.sum # => 532
'hello'.sum(4) # => 4
'hello'.sum(64) # => 532
'тест'.sum # => 1405
'こんにちは'.sum # => 2582
This is not a particularly strong checksum.
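The computation can be mirrored in plain Ruby. The helper byte_checksum below is hypothetical, a sketch that follows the C source (sum every byte, then keep the low n bits unless n is zero or negative); it assumes Ruby 2.4 or later for Array#sum:

```ruby
# Hypothetical helper reproducing String#sum: add up the bytes,
# then reduce modulo 2**bits unless bits is zero or negative.
def byte_checksum(str, bits = 16)
  total = str.bytes.sum
  bits.positive? ? total % (1 << bits) : total
end

byte_checksum('hello')     # => 532
byte_checksum('hello', 4)  # => 4
'hello'.sum(0)             # => 532 (no modulus applied)
```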
# File 'string.c', line 10925
static VALUE
rb_str_sum(int argc, VALUE *argv, VALUE str)
{
    int bits = 16;
    char *ptr, *p, *pend;
    long len;
    VALUE sum = INT2FIX(0);
    unsigned long sum0 = 0;

    if (rb_check_arity(argc, 0, 1) && (bits = NUM2INT(argv[0])) < 0) {
        bits = 0;
    }
    ptr = p = RSTRING_PTR(str);
    len = RSTRING_LEN(str);
    pend = p + len;

    while (p < pend) {
        if (FIXNUM_MAX - UCHAR_MAX < sum0) {
            sum = rb_funcall(sum, '+', 1, LONG2FIX(sum0));
            str_mod_check(str, ptr, len);
            sum0 = 0;
        }
        sum0 += (unsigned char)*p;
        p++;
    }

    if (bits == 0) {
        if (sum0) {
            sum = rb_funcall(sum, '+', 1, LONG2FIX(sum0));
        }
    }
    else {
        if (sum == INT2FIX(0)) {
            if (bits < (int)sizeof(long)*CHAR_BIT) {
                sum0 &= (((unsigned long)1)<<bits)-1;
            }
            sum = LONG2FIX(sum0);
        }
        else {
            VALUE mod;

            if (sum0) {
                sum = rb_funcall(sum, '+', 1, LONG2FIX(sum0));
            }

            mod = rb_funcall(INT2FIX(1), idLTLT, 1, INT2FIX(bits));
            mod = rb_funcall(mod, '-', 1, INT2FIX(1));
            sum = rb_funcall(sum, '&', 1, mod);
        }
    }
    return sum;
}
#swapcase(mapping) ⇒ String
Returns a string containing the characters in self, with cases reversed: each uppercase character is downcased, and each lowercase character is upcased:
s = 'Hello World!' # => "Hello World!"
s.swapcase # => "hELLO wORLD!"
The casing may be affected by the given mapping; see Case Mapping.
Related: #swapcase!.
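A sketch of how the :ascii case-mapping option narrows the effect to ASCII letters (the sample "ÄbC" is written with a Unicode escape so the example does not depend on source-file encoding):

```ruby
s = "\u00C4bC"               # "ÄbC"

full  = s.swapcase           # => "äBc" (full Unicode case mapping)
ascii = s.swapcase(:ascii)   # => "ÄBc" (only ASCII letters change)

none = '123'.dup.swapcase!   # => nil   (no cased characters)
```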
# File 'string.c', line 8340
static VALUE
rb_str_swapcase(int argc, VALUE *argv, VALUE str)
{
    rb_encoding *enc;
    OnigCaseFoldType flags = ONIGENC_CASE_UPCASE | ONIGENC_CASE_DOWNCASE;
    VALUE ret;

    flags = check_case_options(argc, argv, flags);
    enc = str_true_enc(str);
    if (RSTRING_LEN(str) == 0 || !RSTRING_PTR(str)) return str_duplicate(rb_cString, str);
    if (flags&ONIGENC_CASE_ASCII_ONLY) {
        ret = rb_str_new(0, RSTRING_LEN(str));
        rb_str_ascii_casemap(str, ret, &flags, enc);
    }
    else {
        ret = rb_str_casemap(str, &flags, enc);
    }
    return ret;
}
#swapcase!(mapping) ⇒ self?
Upcases each lowercase character in self and downcases each uppercase character; returns self if any changes were made, nil otherwise:
s = 'Hello World!' # => "Hello World!"
s.swapcase! # => "hELLO wORLD!"
s # => "hELLO wORLD!"
''.swapcase! # => nil
The casing may be affected by the given mapping; see Case Mapping.
Related: #swapcase.
# File 'string.c', line 8303
static VALUE
rb_str_swapcase_bang(int argc, VALUE *argv, VALUE str)
{
    rb_encoding *enc;
    OnigCaseFoldType flags = ONIGENC_CASE_UPCASE | ONIGENC_CASE_DOWNCASE;

    flags = check_case_options(argc, argv, flags);
    str_modify_keep_cr(str);
    enc = str_true_enc(str);
    if (flags&ONIGENC_CASE_ASCII_ONLY)
        rb_str_ascii_casemap(str, str, &flags, enc);
    else
        str_shared_replace(str, rb_str_casemap(str, &flags, enc));

    if (ONIGENC_CASE_MODIFIED&flags) return str;
    return Qnil;
}
#to_c ⇒ Complex
Returns self interpreted as a ::Complex object; leading whitespace and trailing garbage are ignored:
'9'.to_c # => (9+0i)
'2.5'.to_c # => (2.5+0i)
'2.5/1'.to_c # => ((5/2)+0i)
'-3/2'.to_c # => ((-3/2)+0i)
'-i'.to_c # => (0-1i)
'45i'.to_c # => (0+45i)
'3-4i'.to_c # => (3-4i)
'-4e2-4e-2i'.to_c # => (-400.0-0.04i)
'-0.0-0.0i'.to_c # => (-0.0-0.0i)
'1/2+3/4i'.to_c # => ((1/2)+(3/4)*i)
'1.0@0'.to_c # => (1+0.0i)
"1.0@#{Math::PI/2}".to_c # => (0.0+1i)
"1.0@#{Math::PI}".to_c # => (-1+0.0i)
Returns Complex zero if the string cannot be converted:
'ruby'.to_c # => (0+0i)
See Kernel.Complex.
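A sketch contrasting the lenient #to_c with the strict Kernel#Complex conversion, which (as far as standard Ruby behavior goes) raises ArgumentError on unparsable input instead of returning zero:

```ruby
z = '3-4i'.to_c           # => (3-4i)
magnitude = z.abs         # => 5.0

lenient = 'ruby'.to_c     # => (0+0i): unparsable input gives zero

# Kernel#Complex, by contrast, raises on the same input.
strict_raised =
  begin
    Complex('ruby')
    false
  rescue ArgumentError
    true
  end
strict_raised             # => true
```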
# File 'complex.c', line 2255
static VALUE
string_to_c(VALUE self)
{
    VALUE num;

    rb_must_asciicompat(self);

    (void)parse_comp(rb_str_fill_terminator(self, 1), FALSE, &num);

    return num;
}
#to_f ⇒ Float
Returns the result of interpreting leading characters in self as a ::Float:
'3.14159'.to_f # => 3.14159
'1.234e-2'.to_f # => 0.01234
Characters past a leading valid number are ignored:
'3.14 (pi to two places)'.to_f # => 3.14
Returns zero if there is no leading valid number:
'abcdef'.to_f # => 0.0
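A few more inputs, sketched under standard Ruby behavior: leading whitespace and trailing text are skipped, underscores between digits are accepted, and Kernel#Float is the strict alternative that raises instead of returning 0.0:

```ruby
with_suffix = '  2.5 meters'.to_f  # => 2.5    (leading whitespace and suffix ignored)
exponent    = '1e3'.to_f           # => 1000.0
underscored = '1_000.5'.to_f       # => 1000.5 (underscores between digits allowed)
garbage     = 'abc'.to_f           # => 0.0

# Kernel#Float is stricter and raises on the same bad input.
strict_raised =
  begin
    Float('abc')
    false
  rescue ArgumentError
    true
  end
```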
# File 'string.c', line 7161
static VALUE
rb_str_to_f(VALUE str)
{
    return DBL2NUM(rb_str_to_dbl(str, FALSE));
}
#to_i(base = 10) ⇒ Integer
Returns the result of interpreting leading characters in self as an integer in the given base, which must be 0 or in the range 2..36:
'123456'.to_i # => 123456
'123def'.to_i(16) # => 1195503
With base zero, the string may begin with a prefix that specifies the actual base (0b, 0o, 0d, 0x, or a leading 0 for octal):
'123def'.to_i(0) # => 123
'0123def'.to_i(0) # => 83
'0b123def'.to_i(0) # => 1
'0o123def'.to_i(0) # => 83
'0d123def'.to_i(0) # => 123
'0x123def'.to_i(0) # => 1195503
Characters past a leading valid number (in the given base) are ignored:
'12.345'.to_i # => 12
'12345'.to_i(2) # => 1
Returns zero if there is no leading valid number:
'abcdef'.to_i # => 0
'2'.to_i(2) # => 0
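A sketch of a few more cases: a sign is honored, letters serve as digits in higher bases, underscores between digits are accepted, and a base outside 0 and 2..36 raises ArgumentError:

```ruby
negative_hex = '-1A'.to_i(16)  # => -26
base36       = 'z'.to_i(36)    # => 35
underscored  = '1_000'.to_i    # => 1000

# Bases outside 0 and 2..36 raise ArgumentError.
rejected = [1, 37].select do |base|
  begin
    '12'.to_i(base)
    false
  rescue ArgumentError
    true
  end
end
rejected                       # => [1, 37]
```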
# File 'string.c', line 7130
static VALUE
rb_str_to_i(int argc, VALUE *argv, VALUE str)
{
    int base = 10;

    if (rb_check_arity(argc, 0, 1) && (base = NUM2INT(argv[0])) < 0) {
        rb_raise(rb_eArgError, "invalid radix %d", base);
    }
    return rb_str_to_inum(str, base, FALSE);
}
#to_r ⇒ Rational
Returns the result of interpreting leading characters in self as a rational. Leading whitespace and extraneous characters past the end of a valid number are ignored. Digit sequences can be separated by an underscore. If there is not a valid number at the start of self, zero is returned rather than an exception being raised.
' 2 '.to_r #=> (2/1)
'300/2'.to_r #=> (150/1)
'-9.2'.to_r #=> (-46/5)
'-9.2e2'.to_r #=> (-920/1)
'1_234_567'.to_r #=> (1234567/1)
'21 June 09'.to_r #=> (21/1)
'21/06/09'.to_r #=> (7/2)
'BWV 1079'.to_r #=> (0/1)
NOTE: "0.3".to_r is not the same as 0.3.to_r. The former is equivalent to "3/10".to_r; the latter is not, because the Float 0.3 is only a binary approximation of 3/10.
"0.3".to_r == 3/10r #=> true
0.3.to_r == 3/10r #=> false
See also Kernel.Rational.
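A sketch of why parsing the decimal text matters: '0.1'.to_r yields exactly 1/10, while 0.1.to_r converts the binary Float and so differs from 1/10:

```ruby
exact  = '0.1'.to_r   # => (1/10)
approx = 0.1.to_r     # the binary Float's exact value, not (1/10)

exact == Rational(1, 10)      # => true
approx == Rational(1, 10)     # => false
exact * 3 == Rational(3, 10)  # => true
'abc'.to_r                    # => (0/1)
```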
# File 'rational.c', line 2496
static VALUE
string_to_r(VALUE self)
{
    VALUE num;

    rb_must_asciicompat(self);

    num = parse_rat(RSTRING_PTR(self), RSTRING_END(self), 0, TRUE);

    if (RB_FLOAT_TYPE_P(num) && !FLOAT_ZERO_P(num))
        rb_raise(rb_eFloatDomainError, "Infinity");
    return num;
}
#to_s ⇒ self, String
Also known as: #to_str
Returns self if self is a String, or a new String with the same content if self is an instance of a subclass of String.
# File 'string.c', line 7176
static VALUE
rb_str_to_s(VALUE str)
{
    if (rb_obj_class(str) != rb_cString) {
        return str_duplicate(rb_cString, str);
    }
    return str;
}
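A sketch of both branches of the C code above; MyString is a hypothetical subclass used only for illustration:

```ruby
# Hypothetical subclass, for illustration only.
class MyString < String; end

plain = 'abc'
same_object = plain.to_s.equal?(plain)  # => true  (a String returns itself)

custom = MyString.new('abc')
converted = custom.to_s
converted.class                         # => String (a new, plain String)
converted.equal?(custom)                # => false
```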
#to_s ⇒ self, String
#to_str ⇒ self, String
Alias for #to_s.
#to_sym ⇒ Symbol
Alias for #intern.
#tr(selector, replacements) ⇒ String
Returns a copy of self with each character specified by string selector translated to the corresponding character in string replacements. The correspondence is positional:
- Each occurrence of the first character specified by selector is translated to the first character in replacements.
- Each occurrence of the second character specified by selector is translated to the second character in replacements.
- And so on.
Example:
'hello'.tr('el', 'ip') #=> "hippo"
If replacements is shorter than selector, it is implicitly padded with its own last character:
'hello'.tr('aeiou', '-') # => "h-ll-"
'hello'.tr('aeiou', 'AA-') # => "hAll-"
Arguments selector and replacements must be valid character selectors (see Character Selectors), and may use any of their valid forms, including negation, ranges, and escaping:
# Negation.
'hello'.tr('^aeiou', '-') # => "-e--o"
# Ranges.
'ibm'.tr('b-z', 'a-z') # => "hal"
# Escapes.
'hel^lo'.tr('\^aeiou', '-') # => "h-l-l-" # Escaped leading caret.
'i-b-m'.tr('b\-z', 'a-z') # => "ibabm" # Escaped embedded hyphen.
'foo\\bar'.tr('ab\\', 'XYZ') # => "fooZYXr" # Escaped backslash.
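A classic use of positional translation with ranges is ROT13; the helper name rot13 is hypothetical. The 52-letter selector 'A-Za-z' lines up one-to-one with the 52-letter replacement 'N-ZA-Mn-za-m', and characters outside the selector pass through unchanged:

```ruby
# Hypothetical helper: rotate each ASCII letter 13 places.
def rot13(text)
  text.tr('A-Za-z', 'N-ZA-Mn-za-m')
end

rot13('Hello, World!')         # => "Uryyb, Jbeyq!"
rot13(rot13('Hello, World!'))  # => "Hello, World!"
```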
# File 'string.c', line 8743
static VALUE
rb_str_tr(VALUE str, VALUE src, VALUE repl)
{
    str = str_duplicate(rb_cString, str);
    tr_trans(str, src, repl, 0);
    return str;
}
#tr!(selector, replacements) ⇒ self?
Like #tr, but modifies self in place. Returns self if any changes were made, nil otherwise.
# File 'string.c', line 8697
static VALUE
rb_str_tr_bang(VALUE str, VALUE src, VALUE repl)
{
    return tr_trans(str, src, repl, 0);
}
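#tr_s(selector, replacements) ⇒ String
Like #tr, but squeezes each run of consecutive characters that the translation maps to the same character down to a single character; runs of untranslated characters are left as-is.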
# File 'string.c', line 9070
static VALUE
rb_str_tr_s(VALUE str, VALUE src, VALUE repl)
{
    str = str_duplicate(rb_cString, str);
    tr_trans(str, src, repl, 1);
    return str;
}
#tr_s!(selector, replacements) ⇒ self?
Like #tr_s, but modifies self in place. Returns self if any changes were made, nil otherwise.
# File 'string.c', line 9048
static VALUE
rb_str_tr_s_bang(VALUE str, VALUE src, VALUE repl)
{
    return tr_trans(str, src, repl, 1);
}
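A sketch contrasting #tr and #tr_s: only runs produced by the translation are squeezed, which matches the squeeze flag (the last argument, 1) passed to tr_trans in the C source above:

```ruby
plain    = 'hello'.tr('l', 'r')     # => "herro"
squeezed = 'hello'.tr_s('l', 'r')   # => "hero"
merged   = 'hello'.tr_s('el', '-')  # => "h-o" ('e', 'l', 'l' all become '-')

# Runs of characters that are not translated are left alone.
partial  = 'aabbcc'.tr_s('a', 'x')  # => "xbbcc"
```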
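#undump ⇒ String
Returns a copy of self with the escaping performed by #dump reversed. Raises RuntimeError if self is not in the form that #dump produces (for example, if it is not wrapped in double quotes).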
# File 'string.c', line 7712
static VALUE str_undump(VALUE str) { const char *s = RSTRING_PTR(str); const char *s_end = RSTRING_END(str); rb_encoding *enc = rb_enc_get(str); VALUE undumped = rb_enc_str_new(s, 0L, enc); bool utf8 = false; bool binary = false; int w; rb_must_asciicompat(str); if (rb_str_is_ascii_only_p(str) == Qfalse) { rb_raise(rb_eRuntimeError, "non-ASCII character detected"); } if (!str_null_check(str, &w)) { rb_raise(rb_eRuntimeError, "string contains null byte"); } if (RSTRING_LEN(str) < 2) goto invalid_format; if (*s != '"') goto invalid_format; /* strip '"' at the start */ s++; for (;;) { if (s >= s_end) { rb_raise(rb_eRuntimeError, "unterminated dumped string"); } if (*s == '"') { /* epilogue */ s++; if (s == s_end) { /* ascii compatible dumped string */ break; } else { static const char force_encoding_suffix[] = ".force_encoding(\""; /* "\")" */ static const char dup_suffix[] = ".dup"; const char *encname; int encidx; ptrdiff_t size; /* check separately for strings dumped by older versions */ size = sizeof(dup_suffix) - 1; if (s_end - s > size && memcmp(s, dup_suffix, size) == 0) s += size; size = sizeof(force_encoding_suffix) - 1; if (s_end - s <= size) goto invalid_format; if (memcmp(s, force_encoding_suffix, size) != 0) goto invalid_format; s += size; if (utf8) { rb_raise(rb_eRuntimeError, "dumped string contained Unicode escape but used force_encoding"); } encname = s; s = memchr(s, '"', s_end-s); size = s - encname; if (!s) goto invalid_format; if (s_end - s != 2) goto invalid_format; if (s[0] != '"' || s[1] != ')') goto invalid_format; encidx = rb_enc_find_index2(encname, (long)size); if (encidx < 0) { rb_raise(rb_eRuntimeError, "dumped string has unknown encoding name"); } rb_enc_associate_index(undumped, encidx); } break; } if (*s == '\\') { s++; if (s >= s_end) { rb_raise(rb_eRuntimeError, "invalid escape"); } undump_after_backslash(undumped, &s, s_end, &enc, &utf8, &binary); } else { rb_str_cat(undumped, s++, 1); } } RB_GC_GUARD(str); return undumped; 
invalid_format: rb_raise(rb_eRuntimeError, "invalid dumped string; not wrapped with '\"' nor '\"...\".force_encoding(\"...\")' form"); }
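A round-trip sketch (the sample text is arbitrary): #dump escapes non-ASCII and control characters into an ASCII-only literal, #undump restores the original, and text not wrapped in double quotes hits the invalid_format error above:

```ruby
original = "caf\u00E9\n\"quoted\""

dumped = original.dump          # escaped, ASCII-only representation
dumped.ascii_only?              # => true
round_trip = dumped.undump      # back to the original content
round_trip == original          # => true

# Text that is not wrapped in double quotes is rejected.
message =
  begin
    'abc'.undump
  rescue RuntimeError => e
    e.message
  end
message.start_with?('invalid dumped string')  # => true
```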
#unicode_normalize(form = :nfc) ⇒ String
Returns a copy of self with Unicode normalization applied.
Argument form must be one of the following symbols (see Unicode normalization forms):
- :nfc: Canonical decomposition, followed by canonical composition.
- :nfd: Canonical decomposition.
- :nfkc: Compatibility decomposition, followed by canonical composition.
- :nfkd: Compatibility decomposition.
The encoding of self must be one of:
- Encoding::UTF_8
- Encoding::UTF_16BE
- Encoding::UTF_16LE
- Encoding::UTF_32BE
- Encoding::UTF_32LE
- Encoding::GB18030
- Encoding::UCS_2BE
- Encoding::UCS_4BE
Examples:
"a\u0300".unicode_normalize        # => "à" (one code point, "\u00E0")
"\u00E0".unicode_normalize(:nfd)   # => "à" (two code points, "a\u0300")
Related: #unicode_normalize!, #unicode_normalized?.
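A sketch showing why normalization matters for comparison: the composed and decomposed spellings of "é" look identical but are unequal until normalized to the same form, and the compatibility forms additionally fold variants such as the "fi" ligature (U+FB01):

```ruby
composed   = "\u00E9"    # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

plain_equal = composed == decomposed                  # => false
nfd_equal   = composed.unicode_normalize(:nfd) == decomposed  # => true
nfc_equal   = decomposed.unicode_normalize == composed        # => true (default :nfc)
decomposed.length                                     # => 2

# Compatibility forms also fold variants such as ligatures.
ligature = "\uFB01".unicode_normalize(:nfkc)          # => "fi"
```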
# File 'string.c', line 11984
static VALUE
rb_str_unicode_normalize(int argc, VALUE *argv, VALUE str)
{
    return unicode_normalize_common(argc, argv, str, id_normalize);
}
#unicode_normalize!(form = :nfc) ⇒ self
Like #unicode_normalize, except that the normalization is performed on self in place.
Related: #unicode_normalize, #unicode_normalized?.
# File 'string.c', line 12000
static VALUE
rb_str_unicode_normalize_bang(int argc, VALUE *argv, VALUE str)
{
    return rb_str_replace(str, unicode_normalize_common(argc, argv, str, id_normalize));
}
#unicode_normalized?(form = :nfc) ⇒ Boolean
Returns true if self is in the given form of Unicode normalization, false otherwise. The form must be one of :nfc, :nfd, :nfkc, or :nfkd.
Examples:
"a\u0300".unicode_normalized? # => false
"a\u0300".unicode_normalized?(:nfd) # => true
"\u00E0".unicode_normalized? # => true
"\u00E0".unicode_normalized?(:nfd) # => false
Raises an exception if self is not in a Unicode encoding:
s = "\xE0".force_encoding(Encoding::ISO_8859_1)
s.unicode_normalized? # Raises Encoding::CompatibilityError.
Related: #unicode_normalize, #unicode_normalize!.
# File 'string.c', line 12029
static VALUE
rb_str_unicode_normalized_p(int argc, VALUE *argv, VALUE str)
{
    return unicode_normalize_common(argc, argv, str, id_normalized_p);
}
#unpack(template, offset: 0, &block) ⇒ Array
Extracts data from self, interpreting it according to template and beginning at byte offset offset.
With no block given, forms the extracted objects into a new array and returns that array. With a block given, yields each extracted object to the block instead.
See Packed Data.
# File 'pack.rb', line 23
def unpack(fmt, offset: 0)
  Primitive.attr! :use_block
  Primitive.pack_unpack(fmt, offset)
end
#unpack1(template, offset: 0) ⇒ Object
Like #unpack, but unpacks and returns only the first extracted object. See Packed Data.
# File 'pack.rb', line 33
def unpack1(fmt, offset: 0)
  Primitive.pack_unpack1(fmt, offset)
end
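A sketch of both methods on a small hand-built byte string, using the 'v' (16-bit little-endian unsigned) and 'a' (raw bytes) directives; the offset: keyword assumes Ruby 3.1 or later:

```ruby
# Two little-endian unsigned 16-bit integers followed by three bytes.
data = "\x01\x00\x02\x00ABC"

fields = data.unpack('v2a3')           # => [1, 2, "ABC"]
second = data.unpack1('v', offset: 2)  # => 2 (starts at byte offset 2)

# With a block, each extracted object is yielded instead.
values = []
data.unpack('v2') { |n| values << n }
values                                 # => [1, 2]
```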
#upcase(mapping) ⇒ String
Returns a string containing the upcased characters in self:
s = 'Hello World!' # => "Hello World!"
s.upcase # => "HELLO WORLD!"
The casing may be affected by the given mapping; see Case Mapping.
Related: #upcase!, #downcase, #downcase!.
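A sketch of how the mapping options change the result: full Unicode mapping can change the string's length (ß becomes "SS"), :ascii limits changes to ASCII letters, and :turkic maps "i" to the dotted capital "İ" (U+0130):

```ruby
s = "i\u00DF"                  # "iß"

full    = s.upcase             # => "ISS" (ß expands to "SS")
ascii   = s.upcase(:ascii)     # => "Iß"  (only ASCII letters change)
turkish = 'i'.upcase(:turkic)  # => "İ"   (dotted capital I)
```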
# File 'string.c', line 8090
static VALUE
rb_str_upcase(int argc, VALUE *argv, VALUE str)
{
    rb_encoding *enc;
    OnigCaseFoldType flags = ONIGENC_CASE_UPCASE;
    VALUE ret;

    flags = check_case_options(argc, argv, flags);
    enc = str_true_enc(str);
    if (case_option_single_p(flags, enc, str)) {
        ret = rb_str_new(RSTRING_PTR(str), RSTRING_LEN(str));
        str_enc_copy_direct(ret, str);
        upcase_single(ret);
    }
    else if (flags&ONIGENC_CASE_ASCII_ONLY) {
        ret = rb_str_new(0, RSTRING_LEN(str));
        rb_str_ascii_casemap(str, ret, &flags, enc);
    }
    else {
        ret = rb_str_casemap(str, &flags, enc);
    }

    return ret;
}
#upcase!(mapping) ⇒ self?
Upcases the characters in self; returns self if any changes were made, nil otherwise:
s = 'Hello World!' # => "Hello World!"
s.upcase! # => "HELLO WORLD!"
s # => "HELLO WORLD!"
s.upcase! # => nil
The casing may be affected by the given mapping; see Case Mapping.
Related: #upcase, #downcase, #downcase!.
# File 'string.c', line 8051
static VALUE
rb_str_upcase_bang(int argc, VALUE *argv, VALUE str)
{
    rb_encoding *enc;
    OnigCaseFoldType flags = ONIGENC_CASE_UPCASE;

    flags = check_case_options(argc, argv, flags);
    str_modify_keep_cr(str);
    enc = str_true_enc(str);
    if (case_option_single_p(flags, enc, str)) {
        if (upcase_single(str))
            flags |= ONIGENC_CASE_MODIFIED;
    }
    else if (flags&ONIGENC_CASE_ASCII_ONLY)
        rb_str_ascii_casemap(str, str, &flags, enc);
    else
        str_shared_replace(str, rb_str_casemap(str, &flags, enc));

    if (ONIGENC_CASE_MODIFIED&flags) return str;
    return Qnil;
}
#upto(other_string, exclusive = false) {|string| ... } ⇒ self
#upto(other_string, exclusive = false) ⇒ Enumerator
With a block given, calls the block with each String value returned by successive calls to String#succ: the first value is self, the next is self.succ, and so on; the sequence terminates when value other_string is reached. Returns self:
'a8'.upto('b6') {|s| print s, ' ' } # => "a8"
Output:
a8 a9 b0 b1 b2 b3 b4 b5 b6
If argument exclusive is given as a truthy object, the last value is omitted:
'a8'.upto('b6', true) {|s| print s, ' ' } # => "a8"
Output:
a8 a9 b0 b1 b2 b3 b4 b5
If other_string would not be reached, does not call the block:
'25'.upto('5') {|s| fail s }
'aa'.upto('a') {|s| fail s }
With no block given, returns a new ::Enumerator:
'a8'.upto('b6') # => #<Enumerator: "a8":upto("b6")>
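A sketch using the Enumerator form to collect values; it also shows the special case for strings made only of digits, which (per standard Ruby behavior) are compared numerically, so '9' reaches '11' even though '9' sorts after '11' as text:

```ruby
letters   = 'a'.upto('e').to_a        # => ["a", "b", "c", "d", "e"]
exclusive = 'a'.upto('e', true).to_a  # => ["a", "b", "c", "d"]

# All-digit strings are compared numerically.
numeric   = '9'.upto('11').to_a       # => ["9", "10", "11"]
```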
# File 'string.c', line 5555
static VALUE
rb_str_upto(int argc, VALUE *argv, VALUE beg)
{
    VALUE end, exclusive;

    rb_scan_args(argc, argv, "11", &end, &exclusive);
    RETURN_ENUMERATOR(beg, argc, argv);
    return rb_str_upto_each(beg, end, RTEST(exclusive), str_upto_i, Qnil);
}