Class: StringScanner
| Inherits: | Object |
| Defined in: | ext/strscan/strscan.c, ext/strscan/lib/strscan/strscan.rb |
Overview
Class StringScanner supports processing a stored string as a stream;
this code creates a new StringScanner object with string 'foobarbaz':
require 'strscan'
scanner = StringScanner.new('foobarbaz')
About the Examples
All examples here assume that StringScanner has been required:
require 'strscan'
Some examples here assume that these constants are defined:
MULTILINE_TEXT = <<~EOT
Go placidly amid the noise and haste,
and remember what peace there may be in silence.
EOT
HIRAGANA_TEXT = 'こんにちは'
ENGLISH_TEXT = 'Hello'
Some examples here assume that certain helper methods are defined:
- put_situation(scanner): Displays the values of the scanner's methods #pos, #charpos, #rest, and #rest_size.
- put_match_values(scanner): Displays the scanner's match values.
- match_values_cleared?(scanner): Returns whether the scanner's match values are cleared.
See examples at helper methods.
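For readers who want to run the examples without the helper definitions at hand, the following is a minimal sketch of put_situation. It is a hypothetical stand-in, not the documentation's own helper: the method situation and the exact output layout are assumptions, and only the four reported methods come from the description above.

```ruby
require 'strscan'

# Hypothetical stand-in for the put_situation helper described above;
# the real helper may format its output differently.
def situation(scanner)
  {
    pos: scanner.pos,
    charpos: scanner.charpos,
    rest: scanner.rest,
    rest_size: scanner.rest_size
  }
end

# Prints one comment-style line per value.
def put_situation(scanner)
  puts '# Situation:'
  situation(scanner).each do |name, value|
    puts "#   #{name}: #{value.inspect}"
  end
end

scanner = StringScanner.new('foobarbaz')
scanner.pos = 3
put_situation(scanner)
```

Keeping the values in a separate method makes them easy to check in code as well as to print.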
The StringScanner Object
This code creates a StringScanner object
(we'll call it simply a scanner),
and shows some of its basic properties:
scanner = StringScanner.new('foobarbaz')
scanner.string # => "foobarbaz"
put_situation(scanner)
# Situation:
# pos: 0
# charpos: 0
# rest: "foobarbaz"
# rest_size: 9
The scanner has:
A stored string, which is:
- Initially set by StringScanner.new(string) to the given string ('foobarbaz' in the example above).
- Modifiable by methods #string=(new_string) and #concat(more_string).
- Returned by method #string.
More at Stored String below.
A position; a zero-based index into the bytes of the stored string (not into its characters):
- Initially set by StringScanner.new to 0.
- Returned by method #pos.
- Modifiable explicitly by methods #reset, #terminate, and #pos=(new_pos).
- Modifiable implicitly (various traversing methods, among others).
More at Byte Position below.
A target substring, which is a trailing substring of the stored string; it extends from the current position to the end of the stored string:
- Initially set by StringScanner.new(string) to the given string ('foobarbaz' in the example above).
- Returned by method #rest.
- Modified by any modification to either the stored string or the position.
Most importantly: the searching and traversing methods operate on the target substring, which may be (and often is) less than the entire stored string.
More at Target Substring below.
Stored String
The stored string is the string stored in the StringScanner object.
Each of these methods sets, modifies, or returns the stored string:
| Method | Effect |
|---|---|
| .new(string) | Creates a new scanner for the given string. |
| #string=(new_string) | Replaces the existing stored string. |
| #concat(more_string) | Appends a string to the existing stored string. |
| #string | Returns the stored string. |
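The methods in the table can be exercised together, as in this small sketch. The helper snapshot is a name introduced here for illustration; the string is duplicated first only so that the scanner is guaranteed an unfrozen string to append to.

```ruby
require 'strscan'

# Returns the stored string, byte position, and target substring.
def snapshot(scanner)
  [scanner.string, scanner.pos, scanner.rest]
end

# .dup gives the scanner an unfrozen string, so #concat can append to it.
scanner = StringScanner.new('foo'.dup)
scanner.scan(/fo/)
snapshot(scanner)      # => ["foo", 2, "o"]

scanner.concat('bar')  # Appends; position unchanged.
snapshot(scanner)      # => ["foobar", 2, "obar"]

scanner.string = 'baz' # Replaces; position reset to zero.
snapshot(scanner)      # => ["baz", 0, "baz"]
```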
Positions
A StringScanner object maintains a zero-based byte position
and a zero-based character position.
Each of these methods explicitly sets positions:
| Method | Effect |
|---|---|
| #reset | Sets both positions to zero (beginning of stored string). |
| #terminate | Sets both positions to the end of the stored string. |
| #pos=(new_byte_position) | Sets byte position; adjusts character position. |
Byte Position (Position)
The byte position (or simply position)
is a zero-based index into the bytes in the scanner's stored string;
for a new StringScanner object, the byte position is zero.
When the byte position is:
- Zero (at the beginning), the target substring is the entire stored string.
- Equal to the size of the stored string (at the end), the target substring is the empty string ''.
To get or set the byte position, use methods #pos and #pos=(new_byte_position).
Many methods use the byte position as the basis for finding matches; many others set, increment, or decrement the byte position:
scanner = StringScanner.new('foobar')
scanner.pos # => 0
scanner.scan(/foo/) # => "foo" # Match found.
scanner.pos # => 3 # Byte position incremented.
scanner.scan(/foo/) # => nil # Match not found.
scanner.pos # => 3 # Byte position not changed.
Some methods implicitly modify the byte position; see Setting the Target Substring and Traversing the Target Substring below.
The values of these methods are derived directly from the values of #pos and #string:
- #charpos: the character position.
- #rest: the target substring.
- #rest_size: the size of the target substring in bytes (rest.bytesize).
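As a rough illustration of that derivation, the three values can be recomputed with plain String methods. The helper derived_values is hypothetical, and the sketch assumes the byte position falls on a character boundary; the hiragana text is written with escapes so the literal is UTF-8 regardless of source encoding.

```ruby
require 'strscan'

# Recomputes charpos, rest, and rest_size from the stored string and the
# byte position alone. Assumes pos falls on a character boundary.
def derived_values(string, pos)
  head = string.byteslice(0, pos)
  tail = string.byteslice(pos, string.bytesize - pos)
  [head.size, tail, tail.bytesize]
end

# "Hello" followed by five 3-byte hiragana characters (こんにちは).
scanner = StringScanner.new("Hello\u3053\u3093\u306B\u3061\u306F")
scanner.pos = 8 # Past "Hello" (5 bytes) and one 3-byte character.

derived_values(scanner.string, scanner.pos)
# => [6, "んにちは", 12]
[scanner.charpos, scanner.rest, scanner.rest_size]
# => [6, "んにちは", 12]
```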
Character Position
The character position is a zero-based index into the characters
in the stored string;
for a new StringScanner object, the character position is zero.
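Method #charpos returns the character position; it may not be set directly, but it is adjusted whenever the byte position changes.
Some methods change (increment or reset) the character position; see Setting the Target Substring and Traversing the Target Substring below.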
Example (string includes multi-byte characters):
scanner = StringScanner.new(ENGLISH_TEXT) # Five 1-byte characters.
scanner.concat(HIRAGANA_TEXT) # Five 3-byte characters
scanner.string # => "Helloこんにちは" # Twenty bytes in all.
put_situation(scanner)
# Situation:
# pos: 0
# charpos: 0
# rest: "Helloこんにちは"
# rest_size: 20
scanner.scan(/Hello/) # => "Hello" # Five 1-byte characters.
put_situation(scanner)
# Situation:
# pos: 5
# charpos: 5
# rest: "こんにちは"
# rest_size: 15
scanner.getch # => "こ" # One 3-byte character.
put_situation(scanner)
# Situation:
# pos: 8
# charpos: 6
# rest: "んにちは"
# rest_size: 12
Target Substring
The target substring is the part of the stored string that extends from the current byte position to the end of the stored string; it is always either:
- The entire stored string (byte position is zero).
- A trailing substring of the stored string (byte position positive).
The target substring is returned by method #rest, and its size is returned by method #rest_size.
Examples:
scanner = StringScanner.new('foobarbaz')
put_situation(scanner)
# Situation:
# pos: 0
# charpos: 0
# rest: "foobarbaz"
# rest_size: 9
scanner.pos = 3
put_situation(scanner)
# Situation:
# pos: 3
# charpos: 3
# rest: "barbaz"
# rest_size: 6
scanner.pos = 9
put_situation(scanner)
# Situation:
# pos: 9
# charpos: 9
# rest: ""
# rest_size: 0
Setting the Target Substring
The target substring is set whenever:
- The stored string is set (position reset to zero; target substring set to stored string).
- The byte position is set (target substring adjusted accordingly).
Querying the Target Substring
This table summarizes (details and examples at the links):
| Method | Returns |
|---|---|
| #rest | Target substring. |
| #rest_size | Size (bytes) of target substring. |
Searching the Target Substring
A search method examines the target substring, but does not advance the positions or (by implication) shorten the target substring.
This table summarizes (details and examples at the links):
| Method | Returns | Sets Match Values? |
|---|---|---|
| #check(pattern) | Matched leading substring or nil. | Yes. |
| #check_until(pattern) | Matched substring (anywhere) or nil. | Yes. |
| #exist?(pattern) | Matched substring (anywhere) end index. | Yes. |
| #match?(pattern) | Size of matched leading substring or nil. | Yes. |
| #peek(size) | Leading substring of given length (bytes). | No. |
| #peek_byte | Integer leading byte or nil. | No. |
| #rest | Target substring (from byte position to end). | No. |
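A short sketch of several search methods side by side; search_demo is a name made up for this example. The final element of the result shows that the byte position is left where it started.

```ruby
require 'strscan'

# Runs each search method once and records the byte position afterward.
def search_demo(scanner)
  results = [
    scanner.check(/bar/),       # Match at the beginning.
    scanner.check_until(/baz/), # Match anywhere.
    scanner.exist?(/baz/),      # Offset to the end of a match anywhere.
    scanner.match?(/bar/),      # Size of a match at the beginning.
    scanner.peek(4)             # Next four bytes.
  ]
  [results, scanner.pos]
end

scanner = StringScanner.new('foobarbaz')
scanner.pos = 3
search_demo(scanner)
# => [["bar", "barbaz", 6, 3, "barb"], 3]
```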
Traversing the Target Substring
A traversal method examines the target substring, and, if successful:
- Advances the positions.
- Shortens the target substring.
This table summarizes (details and examples at links):
| Method | Returns | Sets Match Values? |
|---|---|---|
| #get_byte | Leading byte or nil. | No. |
| #getch | Leading character or nil. | No. |
| #scan(pattern) | Matched leading substring or nil. | Yes. |
| #scan_byte | Integer leading byte or nil. | No. |
| #scan_until(pattern) | Matched substring (anywhere) or nil. | Yes. |
| #skip(pattern) | Matched leading substring size or nil. | Yes. |
| #skip_until(pattern) | Position delta to end-of-matched-substring or nil. | Yes. |
| #unscan | self. | No. |
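The following sketch chains several traversal methods and pairs each return value with the byte position it leaves behind. The helper traverse_demo is illustrative only.

```ruby
require 'strscan'

# Applies a sequence of traversal calls, pairing each return value
# with the byte position that results from it.
def traverse_demo(scanner)
  steps = [
    -> { scanner.scan(/foo/) },     # Matched leading substring.
    -> { scanner.skip(/ba/) },      # Size of matched leading substring.
    -> { scanner.getch },           # Leading character.
    -> { scanner.skip_until(/a/) }, # Distance to the end of the match.
    -> { scanner.unscan.pos },      # Undoes the previous match's advance.
    -> { scanner.scan_until(/z/) }  # Substring through the end of the match.
  ]
  steps.map { |step| [step.call, scanner.pos] }
end

traverse_demo(StringScanner.new('foobarbaz'))
# => [["foo", 3], [2, 5], ["r", 6], [2, 8], [6, 6], ["baz", 9]]
```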
Querying the Scanner
Each of these methods queries the scanner object without modifying it (details and examples at links):
| Method | Returns |
|---|---|
| #beginning_of_line? | true or false. |
| #charpos | Character position. |
| #eos? | true or false. |
| #fixed_anchor? | true or false. |
| #inspect | String representation of self. |
| #pos | Byte position. |
| #rest | Target substring. |
| #rest_size | Size of target substring. |
| #string | Stored string. |
Matching
StringScanner implements pattern matching via Ruby class Regexp,
and its matching behaviors are the same as Ruby's
except for the fixed-anchor property.
Matcher Methods
Each matcher method takes a single argument pattern,
and attempts to find a matching substring in the target substring.
| Method | Pattern Type | Matches Target Substring | Success Return | May Update Positions? |
|---|---|---|---|---|
| #check | Regexp or String. | At beginning. | Matched substring. | No. |
| #check_until | Regexp or String. | Anywhere. | Substring. | No. |
| #match? | Regexp or String. | At beginning. | Match size. | No. |
| #exist? | Regexp or String. | Anywhere. | Substring size. | No. |
| #scan | Regexp or String. | At beginning. | Matched substring. | Yes. |
| #scan_until | Regexp or String. | Anywhere. | Substring. | Yes. |
| #skip | Regexp or String. | At beginning. | Match size. | Yes. |
| #skip_until | Regexp or String. | Anywhere. | Substring size. | Yes. |
Which matcher you choose will depend on:
Where you want to find a match:
- Only at the beginning of the target substring: #check, #match?, #scan, #skip.
- Anywhere in the target substring: #check_until, #exist?, #scan_until, #skip_until.
Whether you want to:
- Traverse, by advancing the positions: #scan, #scan_until, #skip, #skip_until.
- Keep the positions unchanged: #check, #check_until, #match?, #exist?.
What you want for the return value:
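- The matched substring: #check, #scan.
- The substring from the current position through the end of the match: #check_until, #scan_until.
- The size of the matched substring: #match?, #skip.
- The size of the substring from the current position through the end of the match: #exist?, #skip_until.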
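One way these choices play out in practice is a small tokenizer: #skip where only advancing matters, #scan where the matched text is needed, and #eos? to end the loop. This is a sketch under assumptions; the method tokenize, its token names, and its tiny grammar are all hypothetical.

```ruby
require 'strscan'

# Hypothetical tokenizer: the token names and grammar are illustrative only.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next # Whitespace: only advancing matters, so the size is ignored.
    elsif (digits = scanner.scan(/\d+/))
      tokens << [:int, digits.to_i]
    elsif (name = scanner.scan(/[A-Za-z_]\w*/))
      tokens << [:ident, name]
    elsif (op = scanner.scan(%r{[-+*/=()]}))
      tokens << [:op, op]
    else
      raise ArgumentError, "unexpected input at byte #{scanner.pos}"
    end
  end
  tokens
end

tokenize('x = 12 + y')
# => [[:ident, "x"], [:op, "="], [:int, 12], [:op, "+"], [:ident, "y"]]
```

Every branch anchors at the beginning of the target substring, so input that matches no branch is reported instead of being silently skipped.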
Match Values
The match values in a StringScanner object
generally contain the results of the most recent attempted match.
Each match value may be thought of as:
- Clear: Initially, or after an unsuccessful match attempt: usually, false, nil, or {}.
- Set: After a successful match attempt: true, string, array, or hash.
Each of these methods clears match values:
- .new(string).
- #reset.
- #terminate.
Each of these methods attempts a match based on a pattern, and either sets match values (if successful) or clears them (if not):
- #check(pattern)
- #check_until(pattern)
- #exist?(pattern)
- #match?(pattern)
- #scan(pattern)
- #scan_until(pattern)
- #skip(pattern)
- #skip_until(pattern)
Basic Match Values
Basic match values are those not related to captures.
Each of these methods returns a basic match value:
| Method | Return After Match | Return After No Match |
|---|---|---|
| #matched? | true. | false. |
| #matched_size | Size of matched substring. | nil. |
| #matched | Matched substring. | nil. |
| #pre_match | Substring preceding matched substring. | nil. |
| #post_match | Substring following matched substring. | nil. |
See examples below.
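A compact sketch of how the three substring values relate: after a successful match they partition the stored string. The helper match_parts is a hypothetical name.

```ruby
require 'strscan'

# Splits a string around the first match of pattern, or returns nil
# when the pattern does not match anywhere.
def match_parts(string, pattern)
  scanner = StringScanner.new(string)
  return nil unless scanner.exist?(pattern)

  [scanner.pre_match, scanner.matched, scanner.post_match]
end

match_parts('foobarbaz', /bar/)  # => ["foo", "bar", "baz"]
match_parts('foobarbaz', /nope/) # => nil
```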
Captured Match Values
Captured match values are those related to captures.
Each of these methods returns a captured match value:
| Method | Return After Match | Return After No Match |
|---|---|---|
| #size | Count of captured substrings. | nil. |
| #[](n) | nth captured substring. | nil. |
| #captures | Array of all captured substrings. | nil. |
| #values_at(*n) | Array of specified captured substrings. | nil. |
| #named_captures | Hash of named captures. | {}. |
See examples below.
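As an illustration of reading captured values after a traversal, this sketch pulls named captures out with #[]. The method parse_date_prefix and the shape of its result are assumptions made for the example.

```ruby
require 'strscan'

# Scans a leading "weekday month day " prefix and returns its parts
# together with whatever remains, or nil if the prefix is absent.
def parse_date_prefix(text)
  scanner = StringScanner.new(text)
  return nil unless scanner.scan(/(?<wday>\w+) (?<month>\w+) (?<day>\d+) /)

  {
    wday: scanner[:wday],
    month: scanner[:month],
    day: scanner[:day].to_i,
    rest: scanner.rest
  }
end

parsed = parse_date_prefix('Fri Dec 12 1975 14:39')
parsed[:wday]  # => "Fri"
parsed[:month] # => "Dec"
parsed[:day]   # => 12
parsed[:rest]  # => "1975 14:39"
```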
Match Values Examples
Successful basic match attempt (no captures):
scanner = StringScanner.new('foobarbaz')
scanner.exist?(/bar/)
put_match_values(scanner)
# Basic match values:
# matched?: true
# matched_size: 3
# pre_match: "foo"
# matched : "bar"
# post_match: "baz"
# Captured match values:
# size: 1
# captures: []
# named_captures: {}
# values_at: ["bar", nil]
# []:
# [0]: "bar"
# [1]: nil
Failed basic match attempt (no captures):
scanner = StringScanner.new('foobarbaz')
scanner.exist?(/nope/)
match_values_cleared?(scanner) # => true
Successful unnamed capture match attempt:
scanner = StringScanner.new('foobarbazbatbam')
scanner.exist?(/(foo)bar(baz)bat(bam)/)
put_match_values(scanner)
# Basic match values:
# matched?: true
# matched_size: 15
# pre_match: ""
# matched : "foobarbazbatbam"
# post_match: ""
# Captured match values:
# size: 4
# captures: ["foo", "baz", "bam"]
# named_captures: {}
# values_at: ["foobarbazbatbam", "foo", "baz", "bam", nil]
# []:
# [0]: "foobarbazbatbam"
# [1]: "foo"
# [2]: "baz"
# [3]: "bam"
# [4]: nil
Successful named capture match attempt; same as unnamed above, except for #named_captures:
scanner = StringScanner.new('foobarbazbatbam')
scanner.exist?(/(?<x>foo)bar(?<y>baz)bat(?<z>bam)/)
scanner.named_captures # => {"x"=>"foo", "y"=>"baz", "z"=>"bam"}
Failed unnamed capture match attempt:
scanner = StringScanner.new('somestring')
scanner.exist?(/(foo)bar(baz)bat(bam)/)
match_values_cleared?(scanner) # => true
Failed named capture match attempt; same as unnamed above, except for #named_captures:
scanner = StringScanner.new('somestring')
scanner.exist?(/(?<x>foo)bar(?<y>baz)bat(?<z>bam)/)
match_values_cleared?(scanner) # => false
scanner.named_captures # => {"x"=>nil, "y"=>nil, "z"=>nil}
Fixed-Anchor Property
Pattern matching in StringScanner is the same as in Ruby,
except for its fixed-anchor property,
which determines the meaning of '\A':
- false (the default): matches the current byte position.
  scanner = StringScanner.new('foobar')
  scanner.scan(/\A./) # => "f"
  scanner.scan(/\A./) # => "o"
  scanner.scan(/\A./) # => "o"
  scanner.scan(/\A./) # => "b"
- true: matches the beginning of the stored string; never matches unless the byte position is zero:
  scanner = StringScanner.new('foobar', fixed_anchor: true)
  scanner.scan(/\A./) # => "f"
  scanner.scan(/\A./) # => nil
  scanner.reset
  scanner.scan(/\A./) # => "f"
The fixed-anchor property is set when the StringScanner object is created,
and may not be modified
(see .new);
method #fixed_anchor? returns the setting.
Class Method Summary
- .new(string, fixed_anchor: false) ⇒ string_scanner (constructor, private)
  Returns a new StringScanner object whose stored string is the given string; sets the fixed-anchor property.
- .must_C_version (internal use only)
Instance Attribute Summary
- #beginning_of_line? ⇒ Boolean (readonly)
  Returns whether the position is at the beginning of a line; that is, at the beginning of the stored string or immediately after a newline.
- #eos? ⇒ Boolean (readonly)
  Returns whether the position is at the end of the stored string.
- #fixed_anchor? ⇒ Boolean (readonly)
  Returns whether the fixed-anchor property is set.
- #matched ⇒ matched_substring? (readonly)
  Returns the matched substring from the most recent match attempt if it was successful, or nil otherwise; see Basic Match Values.
- #matched? ⇒ Boolean (readonly)
  Returns true if the most recent match attempt was successful, false otherwise; see Basic Match Values.
- #pointer ⇒ byte_position (rw)
  Alias for #pos.
- #pos ⇒ byte_position (rw) (also: #pointer)
  Returns the integer byte position, which may be different from the character position.
- #pos=(n) ⇒ n (rw) (also: #pointer=)
  Sets the byte position and the character position; returns n.
- #rest ⇒ target_substring (readonly)
  Returns the 'rest' of the stored string (all after the current position), which is the target substring.
- #string ⇒ stored_string (rw)
  Returns the stored string.
- #string=(other_string) ⇒ other_string (rw)
  Replaces the stored string with the given other_string.
- #rest? ⇒ Boolean (readonly, internal use only)
Instance Method Summary
- #<<(more_string) ⇒ self (also: #concat)
  Appends the given more_string to the stored string.
- #[](specifier) ⇒ substring?
  Returns a captured substring or nil; see Captured Match Values.
- #captures ⇒ substring_array?
  Returns the array of captured match values at indexes (1..) if the most recent match attempt succeeded, or nil otherwise.
- #charpos ⇒ character_position
  Returns the character position (initially zero), which may be different from the byte position given by method #pos.
- #check(pattern) ⇒ matched_substring?
  Attempts to match the given pattern at the beginning of the target substring; does not modify the positions.
- #check_until(pattern) ⇒ substring?
  Attempts to match the given pattern anywhere (at any position) in the target substring; does not modify the positions.
- #concat(more_string) ⇒ self
  Alias for #<<.
- #exist?(pattern) ⇒ byte_offset?
  Attempts to match the given pattern anywhere (at any position) in the target substring; does not modify the positions.
- #get_byte ⇒ byte_as_character?
  Returns the next byte, if available.
- #getch ⇒ character?
  Returns the next (possibly multibyte) character, if available.
- #inspect ⇒ String
  Returns a string representation of self.
- #match?(pattern) ⇒ updated_position?
  Attempts to match the given pattern at the beginning of the target substring; does not modify the positions.
- #matched_size ⇒ substring_size?
  Returns the size (in bytes) of the matched substring from the most recent match attempt if it was successful, or nil otherwise; see Basic Match Values.
- #named_captures ⇒ Hash
  Returns the hash of named captures from the most recent match attempt; see Captured Match Values.
- #peek(length) ⇒ substring
  Returns the substring string[pos, length]; does not update match values or positions.
- #peek_byte
  Peeks at the current byte and returns it as an integer.
- #post_match ⇒ substring
  Returns the substring that follows the matched substring from the most recent match attempt if it was successful, or nil otherwise; see Basic Match Values.
- #pre_match ⇒ substring
  Returns the substring that precedes the matched substring from the most recent match attempt if it was successful, or nil otherwise; see Basic Match Values.
- #reset ⇒ self
  Sets both byte position and character position to zero, and clears match values; returns self.
- #rest_size ⇒ Integer
  Returns the size (in bytes) of the #rest of the stored string.
- #scan(pattern) ⇒ substring?
  Attempts to match the given pattern at the beginning of the target substring.
- #scan_byte ⇒ integer_byte
  Scans one byte and returns it as an integer.
- #scan_integer(base: 10)
  If base isn't provided or is 10, then it is equivalent to calling #scan with a [+-]?\d+ pattern, and returns an Integer or nil.
- #scan_until(pattern) ⇒ substring?
  Attempts to match the given pattern anywhere (at any position) in the target substring.
- #size ⇒ captures_count
  Returns the count of captures if the most recent match attempt succeeded, nil otherwise; see Captured Match Values.
- #skip(pattern) ⇒ match_size or nil
  Attempts to match the given pattern at the beginning of the target substring.
- #skip_until(pattern) ⇒ matched_substring_size?
  Attempts to match the given pattern anywhere (at any position) in the target substring; if successful, advances the positions.
- #terminate ⇒ self
  Sets the scanner to end-of-string; returns self.
- #unscan ⇒ self
- #values_at(*specifiers) ⇒ array_of_captures?
  Returns an array of captured substrings, or nil if none.
- #dup ⇒ shallow_copy (private)
  Returns a shallow copy of self; the stored string in the copy is the same string as in self.
- #scan_full(re, s, f) (internal use only)
- #search_full(re, s, f) (internal use only)
- #scan_base10_integer (private, internal use only)
- #scan_base16_integer (private, internal use only)
Constructor Details
.new(string, fixed_anchor: false) ⇒ string_scanner (private)
Returns a new StringScanner object whose stored string
is the given string;
sets the fixed-anchor property:
scanner = StringScanner.new('foobarbaz')
scanner.string # => "foobarbaz"
scanner.fixed_anchor? # => false
put_situation(scanner)
# Situation:
# pos: 0
# charpos: 0
# rest: "foobarbaz"
# rest_size: 9
# File 'ext/strscan/strscan.c', line 254
static VALUE
strscan_initialize(int argc, VALUE *argv, VALUE self)
{
    struct strscanner *p;
    VALUE str, options;
    p = check_strscan(self);
    rb_scan_args(argc, argv, "11", &str, &options);
    options = rb_check_hash_type(options);
    if (!NIL_P(options)) {
        VALUE fixed_anchor;
        ID keyword_ids[1];
        keyword_ids[0] = rb_intern("fixed_anchor");
        rb_get_kwargs(options, keyword_ids, 0, 1, &fixed_anchor);
        if (fixed_anchor == Qundef) {
            p->fixed_anchor_p = false;
        }
        else {
            p->fixed_anchor_p = RTEST(fixed_anchor);
        }
    }
    else {
        p->fixed_anchor_p = false;
    }
    StringValue(str);
    RB_OBJ_WRITE(self, &p->str, str);
    return self;
}
Class Method Details
.must_C_version
# File 'ext/strscan/strscan.c', line 332
static VALUE
strscan_s_mustc(VALUE self)
{
return self;
}
Instance Attribute Details
#beginning_of_line? ⇒ Boolean (readonly)
Returns whether the position is at the beginning of a line; that is, at the beginning of the stored string or immediately after a newline:
scanner = StringScanner.new(MULTILINE_TEXT)
scanner.string
# => "Go placidly amid the noise and haste,\nand remember what peace there may be in silence.\n"
scanner.pos # => 0
scanner.beginning_of_line? # => true
scanner.scan_until(/,/) # => "Go placidly amid the noise and haste,"
scanner.beginning_of_line? # => false
scanner.scan(/\n/) # => "\n"
scanner.beginning_of_line? # => true
scanner.terminate
scanner.beginning_of_line? # => true
scanner.concat('x')
scanner.terminate
scanner.beginning_of_line? # => false
StringScanner#bol? is an alias for beginning_of_line?.
# File 'ext/strscan/strscan.c', line 1445
static VALUE
strscan_bol_p(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
if (CURPTR(p) > S_PEND(p)) return Qnil;
if (p->curr == 0) return Qtrue;
return (*(CURPTR(p) - 1) == '\n') ? Qtrue : Qfalse;
}
#eos? ⇒ Boolean (readonly)
Returns whether the position is at the end of the stored string:
scanner = StringScanner.new('foobarbaz')
scanner.eos? # => false
scanner.pos = 3
scanner.eos? # => false
scanner.terminate
scanner.eos? # => true
# File 'ext/strscan/strscan.c', line 1476
static VALUE
strscan_eos_p(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
return EOS_P(p) ? Qtrue : Qfalse;
}
#fixed_anchor? ⇒ Boolean (readonly)
Returns whether the fixed-anchor property is set.
# File 'ext/strscan/strscan.c', line 2112
static VALUE
strscan_fixed_anchor_p(VALUE self)
{
struct strscanner *p;
p = check_strscan(self);
return p->fixed_anchor_p ? Qtrue : Qfalse;
}
#matched ⇒ matched_substring? (readonly)
Returns the matched substring from the most recent match attempt
if it was successful,
or nil otherwise;
see Basic Match Values:
scanner = StringScanner.new('foobarbaz')
scanner.matched # => nil
scanner.pos = 3
scanner.match?(/bar/) # => 3
scanner.matched # => "bar"
scanner.match?(/nope/) # => nil
scanner.matched # => nil
# File 'ext/strscan/strscan.c', line 1561
static VALUE
strscan_matched(VALUE self)
{
    struct strscanner *p;
    GET_SCANNER(self, p);
    if (! MATCHED_P(p)) return Qnil;
    return extract_range(p,
                         adjust_register_position(p, p->regs.beg[0]),
                         adjust_register_position(p, p->regs.end[0]));
}
#matched? ⇒ Boolean (readonly)
Returns true if the most recent match attempt was successful,
false otherwise;
see Basic Match Values:
scanner = StringScanner.new('foobarbaz')
scanner.matched? # => false
scanner.pos = 3
scanner.exist?(/baz/) # => 6
scanner.matched? # => true
scanner.exist?(/nope/) # => nil
scanner.matched? # => false
# File 'ext/strscan/strscan.c', line 1529
static VALUE
strscan_matched_p(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
return MATCHED_P(p) ? Qtrue : Qfalse;
}
#pointer ⇒ byte_position (rw)
Alias for #pos.
#pos ⇒ byte_position (rw) Also known as: #pointer
Returns the integer byte position, which may be different from the character position:
scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string # => "こんにちは"
scanner.pos # => 0
scanner.getch # => "こ" # 3-byte character.
scanner.charpos # => 1
scanner.pos # => 3
# File 'ext/strscan/strscan.c', line 509
static VALUE
strscan_get_pos(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
return LONG2NUM(p->curr);
}
#pos=(n) ⇒ n (rw) Also known as: #pointer=
Sets the byte position and the character position;
returns n.
Does not affect match values.
For non-negative n, sets the position to n:
scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string # => "こんにちは"
scanner.pos = 3 # => 3
scanner.rest # => "んにちは"
scanner.charpos # => 1
For negative n, counts from the end of the stored string:
scanner.pos = -9 # => -9
scanner.pos # => 6
scanner.rest # => "にちは"
scanner.charpos # => 2
# File 'ext/strscan/strscan.c', line 538
static VALUE
strscan_set_pos(VALUE self, VALUE v)
{
struct strscanner *p;
long i;
GET_SCANNER(self, p);
i = NUM2LONG(v);
if (i < 0) i += S_LEN(p);
if (i < 0) rb_raise(rb_eRangeError, "index out of range");
if (i > S_LEN(p)) rb_raise(rb_eRangeError, "index out of range");
p->curr = i;
return LONG2NUM(i);
}
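The bounds checks in the C source above raise RangeError when the resulting index falls outside the stored string; the end of the string itself is a valid position. A minimal sketch, with try_set_pos as a hypothetical helper that turns the error into a symbol:

```ruby
require 'strscan'

# Attempts to set the byte position; returns the resulting position,
# or :out_of_range when the scanner rejects the value.
def try_set_pos(scanner, n)
  scanner.pos = n
  scanner.pos
rescue RangeError
  :out_of_range
end

scanner = StringScanner.new('foobar') # Six bytes.
try_set_pos(scanner, -2) # => 4
try_set_pos(scanner, 6)  # => 6 (end of string is allowed)
try_set_pos(scanner, 7)  # => :out_of_range
try_set_pos(scanner, -7) # => :out_of_range
```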
#rest ⇒ target_substring (readonly)
Returns the 'rest' of the stored string (all after the current position), which is the target substring:
scanner = StringScanner.new('foobarbaz')
scanner.rest # => "foobarbaz"
scanner.pos = 3
scanner.rest # => "barbaz"
scanner.terminate
scanner.rest # => ""
# File 'ext/strscan/strscan.c', line 1949
static VALUE
strscan_rest(VALUE self)
{
    struct strscanner *p;
    GET_SCANNER(self, p);
    if (EOS_P(p)) {
        return str_new(p, "", 0);
    }
    return extract_range(p, p->curr, S_LEN(p));
}
#rest? ⇒ Boolean (readonly)
# File 'ext/strscan/strscan.c', line 1498
static VALUE
strscan_rest_p(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
return EOS_P(p) ? Qfalse : Qtrue;
}
#string ⇒ stored_string (rw)
Returns the stored string:
scanner = StringScanner.new('foobar')
scanner.string # => "foobar"
scanner.concat('baz')
scanner.string # => "foobarbaz"
# File 'ext/strscan/strscan.c', line 408
static VALUE
strscan_get_string(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
return p->str;
}
#string=(other_string) ⇒ other_string (rw)
Replaces the stored string with the given other_string:
- Sets both positions to zero.
- Clears match values.
- Returns other_string.
scanner = StringScanner.new('foobar')
scanner.scan(/foo/)
put_situation(scanner)
# Situation:
# pos: 3
# charpos: 3
# rest: "bar"
# rest_size: 3
match_values_cleared?(scanner) # => false
scanner.string = 'baz' # => "baz"
put_situation(scanner)
# Situation:
# pos: 0
# charpos: 0
# rest: "baz"
# rest_size: 3
match_values_cleared?(scanner) # => true
# File 'ext/strscan/strscan.c', line 452
static VALUE
strscan_set_string(VALUE self, VALUE str)
{
struct strscanner *p = check_strscan(self);
StringValue(str);
RB_OBJ_WRITE(self, &p->str, str);
p->curr = 0;
CLEAR_MATCH_STATUS(p);
return str;
}
Instance Method Details
#<<(more_string) ⇒ self Also known as: #concat
- Appends the given more_string to the stored string.
- Returns self.
- Does not affect the positions or match values.
scanner = StringScanner.new('foo')
scanner.string # => "foo"
scanner.terminate
scanner.concat('barbaz') # => #<StringScanner 3/9 "foo" @ "barba...">
scanner.string # => "foobarbaz"
put_situation(scanner)
# Situation:
# pos: 3
# charpos: 3
# rest: "barbaz"
# rest_size: 6
# File 'ext/strscan/strscan.c', line 493
static VALUE
strscan_concat(VALUE self, VALUE str)
{
struct strscanner *p;
GET_SCANNER(self, p);
StringValue(str);
rb_str_append(p->str, str);
return self;
}
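Because appending leaves the position untouched, #<< lends itself to feeding a scanner in chunks. The sketch below extracts only complete lines and leaves a partial line in place until more input arrives; feed_lines is a hypothetical helper, and the unary plus is there only to guarantee an unfrozen buffer.

```ruby
require 'strscan'

# Appends a chunk to the scanner, then returns every complete line
# (without its newline) now available past the current position.
def feed_lines(scanner, chunk)
  scanner << chunk
  lines = []
  while (line = scanner.scan_until(/\n/))
    lines << line.chomp
  end
  lines
end

scanner = StringScanner.new(+'') # Unfrozen, initially empty buffer.
feed_lines(scanner, "ab\ncd") # => ["ab"]
scanner.rest                  # => "cd" (incomplete line stays unscanned)
feed_lines(scanner, "ef\n")   # => ["cdef"]
scanner.eos?                  # => true
```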
#[](specifier) ⇒ substring?
Returns a captured substring or nil;
see Captured Match Values.
When there are captures:
scanner = StringScanner.new('Fri Dec 12 1975 14:39')
scanner.scan(/(?<wday>\w+) (?<month>\w+) (?<day>\d+) /)
- specifier zero: returns the entire matched substring:
  scanner[0] # => "Fri Dec 12 "
  scanner.pre_match # => ""
  scanner.post_match # => "1975 14:39"
- specifier positive integer: returns the nth capture, or nil if out of range:
  scanner[1] # => "Fri"
  scanner[2] # => "Dec"
  scanner[3] # => "12"
  scanner[4] # => nil
- specifier negative integer: counts backward from the last subgroup:
  scanner[-1] # => "12"
  scanner[-4] # => "Fri Dec 12 "
  scanner[-5] # => nil
- specifier symbol or string: returns the named subgroup, or nil if no such:
  scanner[:wday] # => "Fri"
  scanner['wday'] # => "Fri"
  scanner[:month] # => "Dec"
  scanner[:day] # => "12"
  scanner[:nope] # => nil
When there are no captures, only [0] returns non-nil:
scanner = StringScanner.new('foobarbaz')
scanner.exist?(/bar/)
scanner[0] # => "bar"
scanner[1] # => nil
For a failed match, even [0] returns nil:
scanner.scan(/nope/) # => nil
scanner[0] # => nil
scanner[1] # => nil
# File 'ext/strscan/strscan.c', line 1695
static VALUE
strscan_aref(VALUE self, VALUE idx)
{
    const char *name;
    struct strscanner *p;
    long i;
    GET_SCANNER(self, p);
    if (! MATCHED_P(p)) return Qnil;
    switch (TYPE(idx)) {
      case T_SYMBOL:
        idx = rb_sym2str(idx);
        /* fall through */
      case T_STRING:
        RSTRING_GETMEM(idx, name, i);
        i = name_to_backref_number(&(p->regs), p->regex, name, name + i, rb_enc_get(idx));
        break;
      default:
        i = NUM2LONG(idx);
    }
    if (i < 0)
        i += p->regs.num_regs;
    if (i < 0) return Qnil;
    if (i >= p->regs.num_regs) return Qnil;
    if (p->regs.beg[i] == -1) return Qnil;
    return extract_range(p,
                         adjust_register_position(p, p->regs.beg[i]),
                         adjust_register_position(p, p->regs.end[i]));
}
#captures ⇒ substring_array?
Returns the array of captured match values at indexes (1..)
if the most recent match attempt succeeded, or nil otherwise:
scanner = StringScanner.new('Fri Dec 12 1975 14:39')
scanner.captures # => nil
scanner.exist?(/(?<wday>\w+) (?<month>\w+) (?<day>\d+) /)
scanner.captures # => ["Fri", "Dec", "12"]
scanner.values_at(*0..4) # => ["Fri Dec 12 ", "Fri", "Dec", "12", nil]
scanner.exist?(/Fri/)
scanner.captures # => []
scanner.scan(/nope/)
scanner.captures # => nil
# File 'ext/strscan/strscan.c', line 1788
static VALUE
strscan_captures(VALUE self)
{
    struct strscanner *p;
    int i, num_regs;
    VALUE new_ary;
    GET_SCANNER(self, p);
    if (! MATCHED_P(p)) return Qnil;
    num_regs = p->regs.num_regs;
    new_ary = rb_ary_new2(num_regs);
    for (i = 1; i < num_regs; i++) {
        VALUE str;
        if (p->regs.beg[i] == -1)
            str = Qnil;
        else
            str = extract_range(p,
                                adjust_register_position(p, p->regs.beg[i]),
                                adjust_register_position(p, p->regs.end[i]));
        rb_ary_push(new_ary, str);
    }
    return new_ary;
}
#charpos ⇒ character_position
Returns the character position (initially zero), which may be different from the byte position given by method #pos:
scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string # => "こんにちは"
scanner.getch # => "こ" # 3-byte character.
scanner.getch # => "ん" # 3-byte character.
put_situation(scanner)
# Situation:
# pos: 6
# charpos: 2
# rest: "にちは"
# rest_size: 9
# File 'ext/strscan/strscan.c', line 523
static VALUE
strscan_get_charpos(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
return LONG2NUM(rb_enc_strlen(S_PBEG(p), CURPTR(p), rb_enc_get(p->str)));
}
#check(pattern) ⇒ matched_substring?
Attempts to match the given pattern
at the beginning of the target substring;
does not modify the positions.
If the match succeeds:
- Returns the matched substring.
- Sets all match values.
scanner = StringScanner.new('foobarbaz')
scanner.pos = 3
scanner.check('bar') # => "bar"
put_match_values(scanner)
# Basic match values:
# matched?: true
# matched_size: 3
# pre_match: "foo"
# matched : "bar"
# post_match: "baz"
# Captured match values:
# size: 1
# captures: []
# named_captures: {}
# values_at: ["bar", nil]
# []:
# [0]: "bar"
# [1]: nil
put_situation(scanner)
# Situation:
# pos: 3
# charpos: 3
# rest: "barbaz"
# rest_size: 6
If the match fails:
- Returns nil.
- Clears all match values.
scanner.check(/nope/) # => nil
match_values_cleared?(scanner) # => true
# File 'ext/strscan/strscan.c', line 896
static VALUE
strscan_check(VALUE self, VALUE re)
{
return strscan_do_scan(self, re, 0, 1, 1);
}
#check_until(pattern) ⇒ substring?
Attempts to match the given pattern
anywhere (at any position)
in the target substring;
does not modify the positions.
If the match succeeds:
- Sets all match values.
- Returns the substring that extends from the current position to the end of the matched substring.
scanner = StringScanner.new('foobarbazbatbam')
scanner.pos = 6
scanner.check_until(/bat/) # => "bazbat"
put_match_values(scanner)
#### Basic match values:
#### matched?: true
#### matched_size: 3
#### pre_match: "foobarbaz"
#### matched : "bat"
#### post_match: "bam"
#### Captured match values:
#### size: 1
#### captures: []
#### named_captures: {}
#### values_at: ["bat", nil]
#### []:
#### [0]: "bat"
#### [1]: nil
put_situation(scanner)
#### Situation:
#### pos: 6
#### charpos: 6
#### rest: "bazbatbam"
#### rest_size: 9
If the match fails:
- Clears all match values.
- Returns nil.
scanner.check_until(/nope/) # => nil
match_values_cleared?(scanner) # => true
# File 'ext/strscan/strscan.c', line 1069
static VALUE
strscan_check_until(VALUE self, VALUE re)
{
return strscan_do_scan(self, re, 0, 1, 0);
}
#<<(more_string) ⇒ self
#concat(more_string) ⇒ self
#concat(more_string) ⇒ self
Alias for #<<.
#exist?(pattern) ⇒ byte_offset?
Attempts to match the given pattern
anywhere (at any position)
in the target substring;
does not modify the positions.
If the match succeeds:
- Returns a byte offset: the distance in bytes between the current position and the end of the matched substring.
- Sets all match values.
scanner = StringScanner.new('foobarbazbatbam')
scanner.pos = 6
scanner.exist?(/bat/) # => 6
put_match_values(scanner)
#### Basic match values:
#### matched?: true
#### matched_size: 3
#### pre_match: "foobarbaz"
#### matched : "bat"
#### post_match: "bam"
#### Captured match values:
#### size: 1
#### captures: []
#### named_captures: {}
#### values_at: ["bat", nil]
#### []:
#### [0]: "bat"
#### [1]: nil
put_situation(scanner)
#### Situation:
#### pos: 6
#### charpos: 6
#### rest: "bazbatbam"
#### rest_size: 9
If the match fails:
- Returns nil.
- Clears all match values.
scanner.exist?(/nope/) # => nil
match_values_cleared?(scanner) # => true
# File 'ext/strscan/strscan.c', line 995
static VALUE
strscan_exist_p(VALUE self, VALUE re)
{
return strscan_do_scan(self, re, 0, 0, 0);
}
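The relationship between #exist? and #check_until can be seen by running both on the same scanner. This is a minimal sketch based on the examples above: the offset returned by #exist? is the byte length of the substring that #check_until returns, and neither call moves the position.

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbazbatbam')
scanner.pos = 6

offset = scanner.exist?(/bat/)       # => 6
scanner.peek(offset)                 # => "bazbat"
scanner.check_until(/bat/)           # => "bazbat"
scanner.check_until(/bat/).bytesize  # => 6
scanner.pos                          # => 6 (unchanged by either method)
```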
#get_byte ⇒ byte_as_character?
Returns the next byte, if available:
If the position is not at the end of the stored string:
- Returns the next byte.
- Increments the byte position.
- Adjusts the character position.
scanner = StringScanner.new(HIRAGANA_TEXT) # => #<StringScanner 0/15 @ "\xE3\x81\x93\xE3\x82...">
scanner.string # => "こんにちは"
[scanner.get_byte, scanner.pos, scanner.charpos] # => ["\xE3", 1, 1]
[scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x81", 2, 2]
[scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x93", 3, 1]
[scanner.get_byte, scanner.pos, scanner.charpos] # => ["\xE3", 4, 2]
[scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x82", 5, 3]
[scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x93", 6, 2]
Otherwise, returns nil, and does not change the positions.
scanner.terminate
[scanner.get_byte, scanner.pos, scanner.charpos] # => [nil, 15, 5]
# File 'ext/strscan/strscan.c', line 1190
static VALUE
strscan_get_byte(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
CLEAR_MATCH_STATUS(p);
if (EOS_P(p))
return Qnil;
p->prev = p->curr;
p->curr++;
MATCHED(p);
adjust_registers_to_matched(p);
return extract_range(p,
adjust_register_position(p, p->regs.beg[0]),
adjust_register_position(p, p->regs.end[0]));
}
#getch ⇒ character?
Returns the next (possibly multibyte) character, if available:
If the position is at the beginning of a character:
- Returns the character.
- Increments the character position by 1.
- Increments the byte position by the size (in bytes) of the character.
scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string # => "こんにちは"
[scanner.getch, scanner.pos, scanner.charpos] # => ["こ", 3, 1]
[scanner.getch, scanner.pos, scanner.charpos] # => ["ん", 6, 2]
[scanner.getch, scanner.pos, scanner.charpos] # => ["に", 9, 3]
[scanner.getch, scanner.pos, scanner.charpos] # => ["ち", 12, 4]
[scanner.getch, scanner.pos, scanner.charpos] # => ["は", 15, 5]
[scanner.getch, scanner.pos, scanner.charpos] # => [nil, 15, 5]
If the position is within a multi-byte character (that is, not at its beginning), behaves like #get_byte (returns a 1-byte character):
scanner.pos = 1
[scanner.getch, scanner.pos, scanner.charpos] # => ["\x81", 2, 2]
[scanner.getch, scanner.pos, scanner.charpos] # => ["\x93", 3, 1]
[scanner.getch, scanner.pos, scanner.charpos] # => ["ん", 6, 2]
If the position is at the end of the stored string, returns nil and does not modify the positions:
scanner.terminate
[scanner.getch, scanner.pos, scanner.charpos] # => [nil, 15, 5]
# File 'ext/strscan/strscan.c', line 1117
static VALUE
strscan_getch(VALUE self)
{
struct strscanner *p;
long len;
GET_SCANNER(self, p);
CLEAR_MATCH_STATUS(p);
if (EOS_P(p))
return Qnil;
len = rb_enc_mbclen(CURPTR(p), S_PEND(p), rb_enc_get(p->str));
len = minl(len, S_RESTLEN(p));
p->prev = p->curr;
p->curr += len;
MATCHED(p);
adjust_registers_to_matched(p);
return extract_range(p,
adjust_register_position(p, p->regs.beg[0]),
adjust_register_position(p, p->regs.end[0]));
}
#dup ⇒ shallow_copy (private)
Returns a shallow copy of self;
the stored string in the copy is the same string as in self.
# File 'ext/strscan/strscan.c', line 300
static VALUE
strscan_init_copy(VALUE vself, VALUE vorig)
{
struct strscanner *self, *orig;
self = check_strscan(vself);
orig = check_strscan(vorig);
if (self != orig) {
self->flags = orig->flags;
RB_OBJ_WRITE(vself, &self->str, orig->str);
self->prev = orig->prev;
self->curr = orig->curr;
if (rb_reg_region_copy(&self->regs, &orig->regs))
rb_memerror();
RB_GC_GUARD(vorig);
}
return vself;
}
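As a small illustration of the shallow copy described above, the sketch below duplicates a scanner and advances only the copy. It assumes nothing beyond the documented behavior: the copy shares the stored string object but keeps its own position.

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')
copy = scanner.dup

copy.string.equal?(scanner.string) # => true (same String object)
copy.scan(/foo/)                   # => "foo"
copy.pos                           # => 3
scanner.pos                        # => 0 (the original is unaffected)
```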
#inspect ⇒ String
Returns a string representation of self that may show:
- The current position.
- The size (in bytes) of the stored string.
- The substring preceding the current position.
- The substring following the current position (which is also the target substring).
scanner = StringScanner.new("Fri Dec 12 1975 14:39")
scanner.pos = 11
scanner.inspect # => "#<StringScanner 11/21 \"...c 12 \" @ \"1975 ...\">"
If at beginning-of-string, item 3 above (preceding substring) is omitted:
scanner.reset
scanner.inspect # => "#<StringScanner 0/21 @ \"Fri D...\">"
If at end-of-string, all items above are omitted:
scanner.terminate
scanner.inspect # => "#<StringScanner fin>"
# File 'ext/strscan/strscan.c', line 2034
static VALUE
strscan_inspect(VALUE self)
{
struct strscanner *p;
VALUE a, b;
p = check_strscan(self);
if (NIL_P(p->str)) {
a = rb_sprintf("#<%"PRIsVALUE" (uninitialized)>", rb_obj_class(self));
return a;
}
if (EOS_P(p)) {
a = rb_sprintf("#<%"PRIsVALUE" fin>", rb_obj_class(self));
return a;
}
if (p->curr == 0) {
b = inspect2(p);
a = rb_sprintf("#<%"PRIsVALUE" %ld/%ld @ %"PRIsVALUE">",
rb_obj_class(self),
p->curr, S_LEN(p),
b);
return a;
}
a = inspect1(p);
b = inspect2(p);
a = rb_sprintf("#<%"PRIsVALUE" %ld/%ld %"PRIsVALUE" @ %"PRIsVALUE">",
rb_obj_class(self),
p->curr, S_LEN(p),
a, b);
return a;
}
#match?(pattern) ⇒ updated_position?
Attempts to match the given pattern
at the beginning of the target substring;
does not modify the positions.
If the match succeeds:
- Sets match values.
- Returns the size in bytes of the matched substring.
scanner = StringScanner.new('foobarbaz')
scanner.pos = 3
scanner.match?(/bar/) # => 3
put_match_values(scanner)
#### Basic match values:
#### matched?: true
#### matched_size: 3
#### pre_match: "foo"
#### matched : "bar"
#### post_match: "baz"
#### Captured match values:
#### size: 1
#### captures: []
#### named_captures: {}
#### values_at: ["bar", nil]
#### []:
#### [0]: "bar"
#### [1]: nil
put_situation(scanner)
#### Situation:
#### pos: 3
#### charpos: 3
#### rest: "barbaz"
#### rest_size: 6
If the match fails:
- Clears match values.
- Returns nil.
- Does not increment positions.
scanner.match?(/nope/) # => nil
match_values_cleared?(scanner) # => true
# File 'ext/strscan/strscan.c', line 824
static VALUE
strscan_match_p(VALUE self, VALUE re)
{
return strscan_do_scan(self, re, 0, 0, 1);
}
#matched_size ⇒ substring_size?
Returns the size (in bytes) of the matched substring
from the most recent match attempt if it was successful,
or nil otherwise;
see Basic Match Values:
scanner = StringScanner.new('foobarbaz')
scanner.matched_size # => nil
scanner.pos = 3
scanner.exist?(/baz/) # => 6
scanner.matched_size # => 3
scanner.exist?(/nope/) # => nil
scanner.matched_size # => nil
# File 'ext/strscan/strscan.c', line 1598
static VALUE
strscan_matched_size(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
if (! MATCHED_P(p)) return Qnil;
return LONG2NUM(p->regs.end[0] - p->regs.beg[0]);
}
#named_captures ⇒ Hash
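Returns a hash of named captures from the most recent match attempt: each key is a group name, and each value is the captured substring, or nil if the attempt failed. Returns an empty hash if no match has been attempted or the pattern has no named groups; see Captured Match Values: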
scanner = StringScanner.new('Fri Dec 12 1975 14:39')
scanner.named_captures # => {}
pattern = /(?<wday>\w+) (?<month>\w+) (?<day>\d+) /
scanner.match?(pattern)
scanner.named_captures # => {"wday"=>"Fri", "month"=>"Dec", "day"=>"12"}
scanner.string = 'nope'
scanner.match?(pattern)
scanner.named_captures # => {"wday"=>nil, "month"=>nil, "day"=>nil}
scanner.match?(/nosuch/)
scanner.named_captures # => {}
# File 'ext/strscan/strscan.c', line 2176
static VALUE
strscan_named_captures(VALUE self)
{
struct strscanner *p;
named_captures_data data;
GET_SCANNER(self, p);
data.self = self;
data.captures = rb_hash_new();
if (!RB_NIL_P(p->regex)) {
onig_foreach_name(RREGEXP_PTR(p->regex), named_captures_iter, &data);
}
return data.captures;
}
#peek(length) ⇒ substring
Returns the substring string[pos, length];
does not update match values or positions:
scanner = StringScanner.new('foobarbaz')
scanner.pos = 3
scanner.peek(3) # => "bar"
scanner.terminate
scanner.peek(3) # => ""
# File 'ext/strscan/strscan.c', line 1228
static VALUE
strscan_peek(VALUE self, VALUE vlen)
{
struct strscanner *p;
long len;
GET_SCANNER(self, p);
len = NUM2LONG(vlen);
if (EOS_P(p))
return str_new(p, "", 0);
len = minl(len, S_RESTLEN(p));
return extract_beg_len(p, p->curr, len);
}
#peek_byte
Peeks at the current byte and returns it as an integer, or returns nil at end-of-string; does not update positions.
s = StringScanner.new('ab')
s.peek_byte # => 97
# File 'ext/strscan/strscan.c', line 1173
static VALUE
strscan_peek_byte(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
if (EOS_P(p))
return Qnil;
return INT2FIX((unsigned char)*CURPTR(p));
}
#post_match ⇒ substring
Returns the substring that follows the matched substring
from the most recent match attempt if it was successful,
or nil otherwise;
see Basic Match Values:
scanner = StringScanner.new('foobarbaz')
scanner.post_match # => nil
scanner.pos = 3
scanner.match?(/bar/) # => 3
scanner.post_match # => "baz"
scanner.match?(/nope/) # => nil
scanner.post_match # => nil
# File 'ext/strscan/strscan.c', line 1917
static VALUE
strscan_post_match(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
if (! MATCHED_P(p)) return Qnil;
return extract_range(p,
adjust_register_position(p, p->regs.end[0]),
S_LEN(p));
}
#pre_match ⇒ substring
Returns the substring that precedes the matched substring
from the most recent match attempt if it was successful,
or nil otherwise;
see Basic Match Values:
scanner = StringScanner.new('foobarbaz')
scanner.pre_match # => nil
scanner.pos = 3
scanner.exist?(/baz/) # => 6
scanner.pre_match # => "foobar" # Substring of entire string, not just target string.
scanner.exist?(/nope/) # => nil
scanner.pre_match # => nil
# File 'ext/strscan/strscan.c', line 1880
static VALUE
strscan_pre_match(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
if (! MATCHED_P(p)) return Qnil;
return extract_range(p,
0,
adjust_register_position(p, p->regs.beg[0]));
}
#reset ⇒ self
Sets both byte position and character position to zero,
and clears match values;
returns self:
scanner = StringScanner.new('foobarbaz')
scanner.exist?(/bar/) # => 6
scanner.reset # => #<StringScanner 0/9 @ "fooba...">
put_situation(scanner)
#### Situation:
#### pos: 0
#### charpos: 0
#### rest: "foobarbaz"
#### rest_size: 9
#### => nil
match_values_cleared?(scanner) # => true
# File 'ext/strscan/strscan.c', line 364
static VALUE
strscan_reset(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
p->curr = 0;
CLEAR_MATCH_STATUS(p);
return self;
}
#rest_size ⇒ Integer
# File 'ext/strscan/strscan.c', line 1983
static VALUE
strscan_rest_size(VALUE self)
{
struct strscanner *p;
long i;
GET_SCANNER(self, p);
if (EOS_P(p)) {
return INT2FIX(0);
}
i = S_RESTLEN(p);
return INT2FIX(i);
}
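The listing above carries no description, so here is a hedged sketch of what the source implies: #rest_size returns the size in bytes of the remaining (target) substring, and zero at end-of-string. The literal string below stands in for HIRAGANA_TEXT.

```ruby
require 'strscan'

scanner = StringScanner.new('こんにちは')
scanner.rest_size                          # => 15 (five 3-byte characters)
scanner.getch                              # => "こ"
scanner.rest_size                          # => 12
scanner.rest_size == scanner.rest.bytesize # => true
scanner.terminate
scanner.rest_size                          # => 0
```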
#scan(pattern) ⇒ substring?
Attempts to match the given pattern
at the beginning of the target substring.
If the match succeeds:
- Returns the matched substring.
- Increments the byte position by substring.bytesize, and may increment the character position.
- Sets match values.
scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string # => "こんにちは"
scanner.pos = 6
scanner.scan(/に/) # => "に"
put_match_values(scanner)
# Basic match values:
# matched?: true
# matched_size: 3
# pre_match: "こん"
# matched : "に"
# post_match: "ちは"
# Captured match values:
# size: 1
# captures: []
# named_captures: {}
# values_at: ["に", nil]
# []:
# [0]: "に"
# [1]: nil
put_situation(scanner)
# Situation:
# pos: 9
# charpos: 3
# rest: "ちは"
# rest_size: 6
If the match fails:
- Returns nil.
- Does not increment byte and character positions.
- Clears match values.
scanner.scan(/nope/) # => nil
match_values_cleared?(scanner) # => true
# File 'ext/strscan/strscan.c', line 762
static VALUE
strscan_scan(VALUE self, VALUE re)
{
return strscan_do_scan(self, re, 1, 1, 1);
}
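Because #scan advances only on success and returns nil otherwise, it composes naturally into a loop. The following is a minimal tokenizer sketch, not part of the library: the helper name tokenize and its token rules (words, plus single characters for anything else) are assumptions chosen for illustration.

```ruby
require 'strscan'

# Hypothetical helper: splits +text+ into word tokens and
# single-character tokens, discarding whitespace.
def tokenize(text)
  scanner = StringScanner.new(text)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/\s+/)          # Drop whitespace; nothing to record.
    if (word = scanner.scan(/\w+/))      # Advances only when a word matches here.
      tokens << word
    else
      tokens << scanner.getch            # Fall back to one character.
    end
  end
  tokens
end

tokenize('foo + bar42') # => ["foo", "+", "bar42"]
```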
#scan_base10_integer (private)
# File 'ext/strscan/strscan.c', line 1282
static VALUE
strscan_scan_base10_integer(VALUE self)
{
char *ptr;
long len = 0, remaining_len;
struct strscanner *p;
GET_SCANNER(self, p);
CLEAR_MATCH_STATUS(p);
strscan_must_ascii_compat(p->str);
ptr = CURPTR(p);
remaining_len = S_RESTLEN(p);
if (remaining_len <= 0) {
return Qnil;
}
if (ptr[len] == '-' || ptr[len] == '+') {
len++;
}
if (!rb_isdigit(ptr[len])) {
return Qnil;
}
p->prev = p->curr;
while (len < remaining_len && rb_isdigit(ptr[len])) {
len++;
}
return strscan_parse_integer(p, 10, len);
}
#scan_base16_integer (private)
# File 'ext/strscan/strscan.c', line 1320
static VALUE
strscan_scan_base16_integer(VALUE self)
{
char *ptr;
long len = 0, remaining_len;
struct strscanner *p;
GET_SCANNER(self, p);
CLEAR_MATCH_STATUS(p);
strscan_must_ascii_compat(p->str);
ptr = CURPTR(p);
remaining_len = S_RESTLEN(p);
if (remaining_len <= 0) {
return Qnil;
}
if (ptr[len] == '-' || ptr[len] == '+') {
len++;
}
if ((remaining_len >= (len + 3)) && ptr[len] == '0' && ptr[len + 1] == 'x' && rb_isxdigit(ptr[len + 2])) {
len += 2;
}
if (len >= remaining_len || !rb_isxdigit(ptr[len])) {
return Qnil;
}
p->prev = p->curr;
while (len < remaining_len && rb_isxdigit(ptr[len])) {
len++;
}
return strscan_parse_integer(p, 16, len);
}
#scan_byte ⇒ integer_byte
Scans one byte and returns it as an integer, or returns nil at end-of-string. This method is not multibyte character sensitive. See also: #getch.
# File 'ext/strscan/strscan.c', line 1148
static VALUE
strscan_scan_byte(VALUE self)
{
struct strscanner *p;
VALUE byte;
GET_SCANNER(self, p);
CLEAR_MATCH_STATUS(p);
if (EOS_P(p))
return Qnil;
byte = INT2FIX((unsigned char)*CURPTR(p));
p->prev = p->curr;
p->curr++;
MATCHED(p);
adjust_registers_to_matched(p);
return byte;
}
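A short sketch of how #scan_byte might be used in a loop, relying on the nil return at end-of-string shown in the source above. The helper name byte_values is hypothetical, and the method requires a strscan version that provides #scan_byte.

```ruby
require 'strscan'

# Hypothetical helper: collects every byte of +text+ as an integer
# by calling #scan_byte until it returns nil at end-of-string.
def byte_values(text)
  scanner = StringScanner.new(text)
  bytes = []
  while (byte = scanner.scan_byte)  # 0 is truthy in Ruby, so NUL bytes are kept.
    bytes << byte
  end
  bytes
end
```

With a supporting strscan version, byte_values('ab') returns [97, 98].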
#scan_full(re, s, f)
# File 'ext/strscan/strscan.c', line 921
static VALUE
strscan_scan_full(VALUE self, VALUE re, VALUE s, VALUE f)
{
return strscan_do_scan(self, re, RTEST(s), RTEST(f), 1);
}
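This method has no description here, so the following sketch is inferred from the source listings: the second argument controls whether the position advances, and the third controls whether the matched string (rather than its byte length) is returned. Under that reading, the four flag combinations correspond to #check, #match?, #skip, and #scan.

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')

scanner.scan_full(/foo/, false, true)  # => "foo" (like #check)
scanner.pos                            # => 0
scanner.scan_full(/foo/, false, false) # => 3 (like #match?)
scanner.pos                            # => 0
scanner.scan_full(/foo/, true, false)  # => 3 (like #skip)
scanner.pos                            # => 3
scanner.scan_full(/bar/, true, true)   # => "bar" (like #scan)
scanner.pos                            # => 6
```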
#scan_integer(base: 10)
If base isn’t provided or is 10, then it is equivalent to calling #scan with a [+-]?\d+ pattern, and returns an Integer or nil.
If base is 16, then it is equivalent to calling #scan with a [+-]?(0x)?[0-9a-fA-F]+ pattern, and returns an Integer or nil.
The scanned string must be encoded with an ASCII compatible encoding, otherwise Encoding::CompatibilityError will be raised.
# File 'ext/strscan/lib/strscan/strscan.rb', line 15
def scan_integer(base: 10)
  case base
  when 10
    scan_base10_integer
  when 16
    scan_base16_integer
  else
    raise ArgumentError, "Unsupported integer base: #{base.inspect}, expected 10 or 16"
  end
end
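The equivalence described above can be sketched with #scan and Kernel#Integer. The helper scan_integer_via_scan is hypothetical, not part of StringScanner; it only approximates the documented behavior and may differ from the native method in edge cases.

```ruby
require 'strscan'

# Hypothetical helper: emulates the documented patterns with #scan,
# then converts the matched text with Kernel#Integer.
def scan_integer_via_scan(scanner, base: 10)
  pattern =
    case base
    when 10 then /[+-]?\d+/
    when 16 then /[+-]?(0x)?[0-9a-fA-F]+/
    else raise ArgumentError, "Unsupported integer base: #{base.inspect}, expected 10 or 16"
    end
  text = scanner.scan(pattern)  # nil when no integer starts at the position.
  text && Integer(text, base)
end

scanner = StringScanner.new('-42 0x1F')
scan_integer_via_scan(scanner)           # => -42
scanner.skip(/\s+/)                      # => 1
scan_integer_via_scan(scanner, base: 16) # => 31
scan_integer_via_scan(scanner)           # => nil (at end-of-string)
```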
#scan_until(pattern) ⇒ substring?
Attempts to match the given pattern
anywhere (at any position) in the target substring.
If the match attempt succeeds:
- Sets match values.
- Sets the byte position to the end of the matched substring; may adjust the character position.
- Returns the matched substring.
scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string # => "こんにちは"
scanner.pos = 6
scanner.scan_until(/ち/) # => "にち"
put_match_values(scanner)
# Basic match values:
# matched?: true
# matched_size: 3
# pre_match: "こんに"
# matched : "ち"
# post_match: "は"
# Captured match values:
# size: 1
# captures: []
# named_captures: {}
# values_at: ["ち", nil]
# []:
# [0]: "ち"
# [1]: nil
put_situation(scanner)
# Situation:
# pos: 12
# charpos: 4
# rest: "は"
# rest_size: 3
If the match attempt fails:
- Clears match data.
- Returns nil.
- Does not update positions.
scanner.scan_until(/nope/) # => nil
match_values_cleared?(scanner) # => true
# File 'ext/strscan/strscan.c', line 932
static VALUE
strscan_scan_until(VALUE self, VALUE re)
{
return strscan_do_scan(self, re, 1, 1, 0);
}
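Since #scan_until returns everything from the old position through the end of the match, it can drive a simple delimiter-based splitter. This is a sketch only; the helper name fields and the comma delimiter are assumptions for illustration.

```ruby
require 'strscan'

# Hypothetical helper: splits +line+ on commas using #scan_until.
def fields(line)
  scanner = StringScanner.new(line)
  out = []
  while (chunk = scanner.scan_until(/,/))
    out << chunk.chomp(',')   # The chunk includes the delimiter; strip it.
  end
  out << scanner.rest         # Whatever follows the last delimiter.
  out
end

fields('a,b,c') # => ["a", "b", "c"]
```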
#search_full(re, s, f)
# File 'ext/strscan/strscan.c', line 1094
static VALUE
strscan_search_full(VALUE self, VALUE re, VALUE s, VALUE f)
{
return strscan_do_scan(self, re, RTEST(s), RTEST(f), 0);
}
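As with #scan_full, this method has no description here; the sketch below is inferred from the source listings. The second argument appears to control whether the position advances, and the third whether a string (rather than a byte count) is returned, so the flag combinations mirror #exist?, #check_until, #skip_until, and #scan_until.

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbazbatbam')
scanner.pos = 6

scanner.search_full(/bat/, false, false) # => 6 (like #exist?)
scanner.search_full(/bat/, false, true)  # => "bazbat" (like #check_until)
scanner.pos                              # => 6
scanner.search_full(/bat/, true, false)  # => 6 (like #skip_until)
scanner.pos                              # => 12
scanner.search_full(/bam/, true, true)   # => "bam" (like #scan_until)
scanner.pos                              # => 15
```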
#size ⇒ captures_count
Returns the count of captures if the most recent match attempt succeeded, nil otherwise;
see Captured Match Values:
scanner = StringScanner.new('Fri Dec 12 1975 14:39')
scanner.size # => nil
pattern = /(?<wday>\w+) (?<month>\w+) (?<day>\d+) /
scanner.match?(pattern)
scanner.values_at(*0..scanner.size) # => ["Fri Dec 12 ", "Fri", "Dec", "12", nil]
scanner.size # => 4
scanner.match?(/nope/) # => nil
scanner.size # => nil
# File 'ext/strscan/strscan.c', line 1752
static VALUE
strscan_size(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
if (! MATCHED_P(p)) return Qnil;
return INT2FIX(p->regs.num_regs);
}
#skip(pattern) ⇒ match_size or nil
Attempts to match the given pattern
at the beginning of the target substring.
If the match succeeds:
- Increments the byte position by substring.bytesize, and may increment the character position.
- Sets match values.
- Returns the size (bytes) of the matched substring.
scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string # => "こんにちは"
scanner.pos = 6
scanner.skip(/に/) # => 3
put_match_values(scanner)
# Basic match values:
# matched?: true
# matched_size: 3
# pre_match: "こん"
# matched : "に"
# post_match: "ちは"
# Captured match values:
# size: 1
# captures: []
# named_captures: {}
# values_at: ["に", nil]
# []:
# [0]: "に"
# [1]: nil
put_situation(scanner)
# Situation:
# pos: 9
# charpos: 3
# rest: "ちは"
# rest_size: 6
If the match fails:
- Returns nil.
- Does not update positions.
- Clears match values.
scanner.skip(/nope/) # => nil
match_values_cleared?(scanner) # => true
# File 'ext/strscan/strscan.c', line 835
static VALUE
strscan_skip(VALUE self, VALUE re)
{
return strscan_do_scan(self, re, 1, 0, 1);
}
#skip_until(pattern) ⇒ matched_substring_size?
Attempts to match the given pattern
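anywhere (at any position) in the target substring.
If the match attempt succeeds:
- Sets match values.
- Sets the byte position to the end of the matched substring; may adjust the character position.
- Returns the number of bytes from the old position to the end of the matched substring.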
scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string # => "こんにちは"
scanner.pos = 6
scanner.skip_until(/ち/) # => 6
put_match_values(scanner)
# Basic match values:
# matched?: true
# matched_size: 3
# pre_match: "こんに"
# matched : "ち"
# post_match: "は"
# Captured match values:
# size: 1
# captures: []
# named_captures: {}
# values_at: ["ち", nil]
# []:
# [0]: "ち"
# [1]: nil
put_situation(scanner)
# Situation:
# pos: 12
# charpos: 4
# rest: "は"
# rest_size: 3
If the match attempt fails:
- Clears match values.
- Returns
nil.
scanner.skip_until(/nope/) # => nil
match_values_cleared?(scanner) # => true
# File 'ext/strscan/strscan.c', line 1006
static VALUE
strscan_skip_until(VALUE self, VALUE re)
{
return strscan_do_scan(self, re, 1, 0, 0);
}
#terminate ⇒ self
Sets the scanner to end-of-string;
returns self:
- Sets both positions to end-of-string.
- Clears match values.
scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string # => "こんにちは"
scanner.scan_until(/に/)
put_situation(scanner)
# Situation:
# pos: 9
# charpos: 3
# rest: "ちは"
# rest_size: 6
match_values_cleared?(scanner) # => false
scanner.terminate # => #<StringScanner fin>
put_situation(scanner)
# Situation:
# pos: 15
# charpos: 5
# rest: ""
# rest_size: 0
match_values_cleared?(scanner) # => true
# File 'ext/strscan/strscan.c', line 380
static VALUE
strscan_terminate(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
p->curr = S_LEN(p);
CLEAR_MATCH_STATUS(p);
return self;
}
#unscan ⇒ self
Sets the position to the value it had before the most recent successful match attempt, and clears match values; returns self:
scanner = StringScanner.new('foobarbaz')
scanner.scan(/foo/)
put_situation(scanner)
#### Situation:
#### pos: 3
#### charpos: 3
#### rest: "barbaz"
#### rest_size: 6
scanner.unscan
#### => #<StringScanner 0/9 @ "fooba...">
put_situation(scanner)
#### Situation:
#### pos: 0
#### charpos: 0
#### rest: "foobarbaz"
#### rest_size: 9
Raises an exception if match values are clear:
scanner.scan(/nope/) # => nil
match_values_cleared?(scanner) # => true
scanner.unscan # Raises StringScanner::Error.
# File 'ext/strscan/strscan.c', line 1399
static VALUE
strscan_unscan(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
if (! MATCHED_P(p))
rb_raise(ScanError, "unscan failed: previous match record not exist");
p->curr = p->prev;
CLEAR_MATCH_STATUS(p);
return self;
}
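One use of #unscan is single-step backtracking. The sketch below uses a hypothetical helper, scan_key, that accepts a word only when it is followed by "="; otherwise it gives the word back. It checks the next byte with #peek rather than #check, because a failed #check would clear the match values and make #unscan raise.

```ruby
require 'strscan'

# Hypothetical helper: returns the word at the current position if it is
# followed by "=", otherwise restores the position and returns nil.
def scan_key(scanner)
  key = scanner.scan(/\w+/)
  return nil unless key
  return key if scanner.peek(1) == '='  # #peek leaves match values intact.
  scanner.unscan                        # Undo the scan of the word.
  nil
end

with_equals = StringScanner.new('name=val')
scan_key(with_equals) # => "name"
with_equals.pos       # => 4

without_equals = StringScanner.new('name val')
scan_key(without_equals) # => nil
without_equals.pos       # => 0
```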
#values_at(*specifiers) ⇒ array_of_captures?
Returns an array of captured substrings, or nil if the most recent match attempt failed.
For each specifier, the returned substring is [specifier];
see #[].
scanner = StringScanner.new('Fri Dec 12 1975 14:39')
pattern = /(?<wday>\w+) (?<month>\w+) (?<day>\d+) /
scanner.match?(pattern)
scanner.values_at(*0..3) # => ["Fri Dec 12 ", "Fri", "Dec", "12"]
scanner.values_at(*%i[wday month day]) # => ["Fri", "Dec", "12"]
# File 'ext/strscan/strscan.c', line 1837
static VALUE
strscan_values_at(int argc, VALUE *argv, VALUE self)
{
struct strscanner *p;
long i;
VALUE new_ary;
GET_SCANNER(self, p);
if (! MATCHED_P(p)) return Qnil;
new_ary = rb_ary_new2(argc);
for (i = 0; i<argc; i++) {
rb_ary_push(new_ary, strscan_aref(self, argv[i]));
}
return new_ary;
}