Module: URI
Super Chains via Extension / Inclusion / Inheritance
Class Chain: self, Escape
Instance Chain: self, RFC2396_REGEXP
Defined in: lib/uri.rb, lib/uri/common.rb, lib/uri/ftp.rb, lib/uri/generic.rb, lib/uri/http.rb, lib/uri/https.rb, lib/uri/ldap.rb, lib/uri/ldaps.rb, lib/uri/mailto.rb, lib/uri/rfc2396_parser.rb, lib/uri/rfc3986_parser.rb
Overview
URI is a module providing classes to handle Uniform Resource Identifiers (RFC2396).
Features
- Uniform way of handling URIs.
- Flexibility to introduce custom URI schemes.
- Flexibility to have an alternate URI::Parser (or just different patterns and regexps).
Basic example
require 'uri'
uri = URI("http://foo.com/posts?id=30&limit=5#time=1305298413")
#=> #<URI::HTTP:0x00000000b14880 URL:http://foo.com/posts?id=30&limit=5#time=1305298413>
uri.scheme
#=> "http"
uri.host
#=> "foo.com"
uri.path
#=> "/posts"
uri.query
#=> "id=30&limit=5"
uri.fragment
#=> "time=1305298413"
uri.to_s
#=> "http://foo.com/posts?id=30&limit=5#time=1305298413"
Adding custom URIs
module URI
class RSYNC < Generic
DEFAULT_PORT = 873
end
@@schemes['RSYNC'] = RSYNC
end
#=> URI::RSYNC
URI.scheme_list
#=> {"FTP"=>URI::FTP, "HTTP"=>URI::HTTP, "HTTPS"=>URI::HTTPS,
#    "LDAP"=>URI::LDAP, "LDAPS"=>URI::LDAPS, "MAILTO"=>URI::MailTo,
#    "RSYNC"=>URI::RSYNC}
uri = URI("rsync://rsync.foo.com")
#=> #<URI::RSYNC:0x00000000f648c8 URL:rsync://rsync.foo.com>
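Newer releases of the uri library keep the scheme registry elsewhere and expose `URI.register_scheme` instead of the `@@schemes` class variable used above. The sketch below registers the same RSYNC class and works with either API, assuming one of the two is present; it then checks that parsing picks up the class and its `DEFAULT_PORT`.

```ruby
require 'uri'

module URI
  # Custom scheme class; Generic#default_port reads DEFAULT_PORT from here.
  class RSYNC < Generic
    DEFAULT_PORT = 873
  end
end

# Register the scheme with whichever API this uri version offers.
if URI.respond_to?(:register_scheme)
  URI.register_scheme('RSYNC', URI::RSYNC)
else
  # Older versions store the registry in the @@schemes class variable.
  URI.class_variable_get(:@@schemes)['RSYNC'] = URI::RSYNC
end

uri = URI("rsync://rsync.foo.com")
p uri.class # URI::RSYNC
p uri.port  # 873, filled in from DEFAULT_PORT because the URI has no explicit port
```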
RFC References
A good place to view an RFC spec is www.ietf.org/rfc.html.
The two parsers shipped with this module follow RFC2396 (lib/uri/rfc2396_parser.rb) and RFC3986 (lib/uri/rfc3986_parser.rb).
Class tree
- Generic (in uri/generic.rb)
  - URI::FTP - (in uri/ftp.rb)
  - URI::HTTP - (in uri/http.rb)
    - URI::HTTPS - (in uri/https.rb)
  - URI::LDAP - (in uri/ldap.rb)
    - URI::LDAPS - (in uri/ldaps.rb)
  - URI::MailTo - (in uri/mailto.rb)
- Parser - (in uri/common.rb)
- REGEXP - (in uri/common.rb)
  - URI::REGEXP::PATTERN - (in uri/common.rb)
- Util - (in uri/common.rb)
- Escape - (in uri/common.rb)
- Error - (in uri/common.rb)
  - URI::InvalidURIError - (in uri/common.rb)
  - URI::InvalidComponentError - (in uri/common.rb)
  - URI::BadURIError - (in uri/common.rb)
Copyright Info
- Author: Akira Yamada <akira@ruby-lang.org>
- Documentation: Akira Yamada <akira@ruby-lang.org>, Dmitry V. Sabanin <sdmitry@lrn.ru>, Vincent Batts <vbatts@hashbangbash.com>
- License: Copyright © 2001 akira yamada <akira@ruby-lang.org>. You can redistribute it and/or modify it under the same term as Ruby.
- Revision: $Id$
Constant Summary
- DEFAULT_PARSER =
  Parser.new (an instance of URI::Parser)
- HTML5ASCIIINCOMPAT = (internal use only)
  # File 'lib/uri/common.rb', line 360
  defined? Encoding::UTF_7 ? [Encoding::UTF_7, Encoding::UTF_16BE, Encoding::UTF_16LE, Encoding::UTF_32BE, Encoding::UTF_32LE] : [] # :nodoc:
- Parser =
  # File 'lib/uri/common.rb', line 18
  RFC2396_Parser
- REGEXP =
  # File 'lib/uri/common.rb', line 17
  RFC2396_REGEXP
- RFC3986_PARSER =
  # File 'lib/uri/common.rb', line 19
  RFC3986_Parser.new
- TBLDECWWWCOMP_ = (internal use only)
  # File 'lib/uri/common.rb', line 349
  {}
- TBLENCWWWCOMP_ = (internal use only)
  # File 'lib/uri/common.rb', line 343
  {}
- VERSION = (internal use only)
  # File 'lib/uri.rb', line 100
  VERSION_CODE.scan(/../).collect{|n| n.to_i}.join('.').freeze
- VERSION_CODE = (internal use only)
  # File 'lib/uri.rb', line 99
  '001000'.freeze
- WEB_ENCODINGS_ = (internal use only)
  Generated by the following command:
  curl encoding.spec.whatwg.org/encodings.json|
  ruby -rjson -e 'H={}
  h={
    "shift_jis"=>"Windows-31J",
    "euc-jp"=>"cp51932",
    "iso-2022-jp"=>"cp50221",
    "x-mac-cyrillic"=>"macCyrillic",
  }
  JSON($<.read).map{|x|x["encodings"]}.flatten.each{|x|
    Encoding.find(n=h.fetch(n=x["name"].downcase,n))rescue next
    x["labels"].each{|y|H[y]=n}
  }
  puts "{"
  H.each{|k,v|puts %[ #{k.dump}=>#{v.dump},]}
  puts "}"
  '
{ "unicode-1-1-utf-8"=>"utf-8", "utf-8"=>"utf-8", "utf8"=>"utf-8", "866"=>"ibm866", "cp866"=>"ibm866", "csibm866"=>"ibm866", "ibm866"=>"ibm866", "csisolatin2"=>"iso-8859-2", "iso-8859-2"=>"iso-8859-2", "iso-ir-101"=>"iso-8859-2", "iso8859-2"=>"iso-8859-2", "iso88592"=>"iso-8859-2", "iso_8859-2"=>"iso-8859-2", "iso_8859-2:1987"=>"iso-8859-2", "l2"=>"iso-8859-2", "latin2"=>"iso-8859-2", "csisolatin3"=>"iso-8859-3", "iso-8859-3"=>"iso-8859-3", "iso-ir-109"=>"iso-8859-3", "iso8859-3"=>"iso-8859-3", "iso88593"=>"iso-8859-3", "iso_8859-3"=>"iso-8859-3", "iso_8859-3:1988"=>"iso-8859-3", "l3"=>"iso-8859-3", "latin3"=>"iso-8859-3", "csisolatin4"=>"iso-8859-4", "iso-8859-4"=>"iso-8859-4", "iso-ir-110"=>"iso-8859-4", "iso8859-4"=>"iso-8859-4", "iso88594"=>"iso-8859-4", "iso_8859-4"=>"iso-8859-4", "iso_8859-4:1988"=>"iso-8859-4", "l4"=>"iso-8859-4", "latin4"=>"iso-8859-4", "csisolatincyrillic"=>"iso-8859-5", "cyrillic"=>"iso-8859-5", "iso-8859-5"=>"iso-8859-5", "iso-ir-144"=>"iso-8859-5", "iso8859-5"=>"iso-8859-5", "iso88595"=>"iso-8859-5", "iso_8859-5"=>"iso-8859-5", "iso_8859-5:1988"=>"iso-8859-5", "arabic"=>"iso-8859-6", "asmo-708"=>"iso-8859-6", "csiso88596e"=>"iso-8859-6", "csiso88596i"=>"iso-8859-6", "csisolatinarabic"=>"iso-8859-6", "ecma-114"=>"iso-8859-6", "iso-8859-6"=>"iso-8859-6", "iso-8859-6-e"=>"iso-8859-6", "iso-8859-6-i"=>"iso-8859-6", "iso-ir-127"=>"iso-8859-6", "iso8859-6"=>"iso-8859-6", "iso88596"=>"iso-8859-6", "iso_8859-6"=>"iso-8859-6", "iso_8859-6:1987"=>"iso-8859-6", "csisolatingreek"=>"iso-8859-7", "ecma-118"=>"iso-8859-7", "elot_928"=>"iso-8859-7", "greek"=>"iso-8859-7", "greek8"=>"iso-8859-7", "iso-8859-7"=>"iso-8859-7", "iso-ir-126"=>"iso-8859-7", "iso8859-7"=>"iso-8859-7", "iso88597"=>"iso-8859-7", "iso_8859-7"=>"iso-8859-7", "iso_8859-7:1987"=>"iso-8859-7", "sun_eu_greek"=>"iso-8859-7", "csiso88598e"=>"iso-8859-8", "csisolatinhebrew"=>"iso-8859-8", "hebrew"=>"iso-8859-8", "iso-8859-8"=>"iso-8859-8", "iso-8859-8-e"=>"iso-8859-8", 
"iso-ir-138"=>"iso-8859-8", "iso8859-8"=>"iso-8859-8", "iso88598"=>"iso-8859-8", "iso_8859-8"=>"iso-8859-8", "iso_8859-8:1988"=>"iso-8859-8", "visual"=>"iso-8859-8", "csisolatin6"=>"iso-8859-10", "iso-8859-10"=>"iso-8859-10", "iso-ir-157"=>"iso-8859-10", "iso8859-10"=>"iso-8859-10", "iso885910"=>"iso-8859-10", "l6"=>"iso-8859-10", "latin6"=>"iso-8859-10", "iso-8859-13"=>"iso-8859-13", "iso8859-13"=>"iso-8859-13", "iso885913"=>"iso-8859-13", "iso-8859-14"=>"iso-8859-14", "iso8859-14"=>"iso-8859-14", "iso885914"=>"iso-8859-14", "csisolatin9"=>"iso-8859-15", "iso-8859-15"=>"iso-8859-15", "iso8859-15"=>"iso-8859-15", "iso885915"=>"iso-8859-15", "iso_8859-15"=>"iso-8859-15", "l9"=>"iso-8859-15", "iso-8859-16"=>"iso-8859-16", "cskoi8r"=>"koi8-r", "koi"=>"koi8-r", "koi8"=>"koi8-r", "koi8-r"=>"koi8-r", "koi8_r"=>"koi8-r", "koi8-ru"=>"koi8-u", "koi8-u"=>"koi8-u", "dos-874"=>"windows-874", "iso-8859-11"=>"windows-874", "iso8859-11"=>"windows-874", "iso885911"=>"windows-874", "tis-620"=>"windows-874", "windows-874"=>"windows-874", "cp1250"=>"windows-1250", "windows-1250"=>"windows-1250", "x-cp1250"=>"windows-1250", "cp1251"=>"windows-1251", "windows-1251"=>"windows-1251", "x-cp1251"=>"windows-1251", "ansi_x3.4-1968"=>"windows-1252", "ascii"=>"windows-1252", "cp1252"=>"windows-1252", "cp819"=>"windows-1252", "csisolatin1"=>"windows-1252", "ibm819"=>"windows-1252", "iso-8859-1"=>"windows-1252", "iso-ir-100"=>"windows-1252", "iso8859-1"=>"windows-1252", "iso88591"=>"windows-1252", "iso_8859-1"=>"windows-1252", "iso_8859-1:1987"=>"windows-1252", "l1"=>"windows-1252", "latin1"=>"windows-1252", "us-ascii"=>"windows-1252", "windows-1252"=>"windows-1252", "x-cp1252"=>"windows-1252", "cp1253"=>"windows-1253", "windows-1253"=>"windows-1253", "x-cp1253"=>"windows-1253", "cp1254"=>"windows-1254", "csisolatin5"=>"windows-1254", "iso-8859-9"=>"windows-1254", "iso-ir-148"=>"windows-1254", "iso8859-9"=>"windows-1254", "iso88599"=>"windows-1254", "iso_8859-9"=>"windows-1254", 
"iso_8859-9:1989"=>"windows-1254", "l5"=>"windows-1254", "latin5"=>"windows-1254", "windows-1254"=>"windows-1254", "x-cp1254"=>"windows-1254", "cp1255"=>"windows-1255", "windows-1255"=>"windows-1255", "x-cp1255"=>"windows-1255", "cp1256"=>"windows-1256", "windows-1256"=>"windows-1256", "x-cp1256"=>"windows-1256", "cp1257"=>"windows-1257", "windows-1257"=>"windows-1257", "x-cp1257"=>"windows-1257", "cp1258"=>"windows-1258", "windows-1258"=>"windows-1258", "x-cp1258"=>"windows-1258", "x-mac-cyrillic"=>"macCyrillic", "x-mac-ukrainian"=>"macCyrillic", "chinese"=>"gbk", "csgb2312"=>"gbk", "csiso58gb231280"=>"gbk", "gb2312"=>"gbk", "gb_2312"=>"gbk", "gb_2312-80"=>"gbk", "gbk"=>"gbk", "iso-ir-58"=>"gbk", "x-gbk"=>"gbk", "gb18030"=>"gb18030", "big5"=>"big5", "big5-hkscs"=>"big5", "cn-big5"=>"big5", "csbig5"=>"big5", "x-x-big5"=>"big5", "cseucpkdfmtjapanese"=>"cp51932", "euc-jp"=>"cp51932", "x-euc-jp"=>"cp51932", "csiso2022jp"=>"cp50221", "iso-2022-jp"=>"cp50221", "csshiftjis"=>"Windows-31J", "ms932"=>"Windows-31J", "ms_kanji"=>"Windows-31J", "shift-jis"=>"Windows-31J", "shift_jis"=>"Windows-31J", "sjis"=>"Windows-31J", "windows-31j"=>"Windows-31J", "x-sjis"=>"Windows-31J", "cseuckr"=>"euc-kr", "csksc56011987"=>"euc-kr", "euc-kr"=>"euc-kr", "iso-ir-149"=>"euc-kr", "korean"=>"euc-kr", "ks_c_5601-1987"=>"euc-kr", "ks_c_5601-1989"=>"euc-kr", "ksc5601"=>"euc-kr", "ksc_5601"=>"euc-kr", "windows-949"=>"euc-kr", "utf-16be"=>"utf-16be", "utf-16"=>"utf-16le", "utf-16le"=>"utf-16le", }
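VERSION is derived from VERSION_CODE by splitting the code into two-digit groups, converting each to an integer, and joining them with dots. A standalone sketch of that arithmetic for the '001000' code listed above (the local variable names are illustrative only):

```ruby
version_code = '001000'

# scan(/../) splits the code into two-digit groups: ["00", "10", "00"]
parts = version_code.scan(/../).collect { |n| n.to_i }

# Integer conversion drops leading zeros, giving [0, 10, 0]
version = parts.join('.')

puts version # prints 0.10.0
```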
Class Method Summary
- .decode_www_form(str, enc = Encoding::UTF_8, separator: '&', use__charset_: false, isindex: false)
  Decode URL-encoded form data from given str.
- .decode_www_form_component(str, enc = Encoding::UTF_8)
  Decode given str of URL-encoded form data.
- .encode_www_form(enum, enc = nil)
  Generate URL-encoded form data from given enum.
- .encode_www_form_component(str, enc = nil)
  Encode given str to URL-encoded form data.
- .extract(str, schemes = nil, &block)
  Extracts URIs from a string (obsolete).
- .join(*str)
  Joins URIs.
- .parse(uri)
  Creates an instance of the appropriate URI subclass from a string.
- .regexp(schemes = nil)
  Returns a Regexp that matches URI-like strings (obsolete).
- .scheme_list
  Returns a Hash of the defined schemes.
- .split(uri)
  Splits a URI string into its component parts.
- .get_encoding(label) (private, internal use only)
  Returns the encoding for a label, or nil. See encoding.spec.whatwg.org/#concept-encoding-get.
Escape - Extended
Method | Description
decode | Alias for Escape#unescape.
encode | Alias for Escape#escape.
escape | Replaces unsafe characters in a string with %XX escapes.
unescape | Decodes %XX escapes in a string.
Class Method Details
.decode_www_form(str, enc = Encoding::UTF_8, separator: '&', use__charset_: false, isindex: false)
Decode URL-encoded form data from given str.
This decodes application/x-www-form-urlencoded data and returns an array of key-value arrays.
This follows url.spec.whatwg.org/#concept-urlencoded-parser, so by default it splits only on the & separator; ; is not treated as a separator unless you pass it via the separator: keyword.
ary = URI.decode_www_form("a=1&a=2&b=3")
p ary #=> [['a', '1'], ['a', '2'], ['b', '3']]
p ary.assoc('a').last #=> '1'
p ary.assoc('b').last #=> '3'
p ary.reverse.assoc('a').last #=> '2'
p Hash[ary] # => {"a"=>"2", "b"=>"3"}
# File 'lib/uri/common.rb', line 460
def self.decode_www_form(str, enc=Encoding::UTF_8, separator: '&', use__charset_: false, isindex: false)
  raise ArgumentError, "the input of #{self.name}.#{__method__} must be ASCII only string" unless str.ascii_only?
  ary = []
  return ary if str.empty?
  enc = Encoding.find(enc)
  str.b.each_line(separator) do |string|
    string.chomp!(separator)
    key, sep, val = string.partition('=')
    if isindex
      if sep.empty?
        val = key
        key = ''
      end
      isindex = false
    end
    if use__charset_ and key == '_charset_' and e = get_encoding(val)
      enc = e
      use__charset_ = false
    end
    key.gsub!(/\+|%\h\h/, TBLDECWWWCOMP_)
    if val
      val.gsub!(/\+|%\h\h/, TBLDECWWWCOMP_)
    else
      val = ''
    end
    ary << [key, val]
  end
  ary.each do |k, v|
    k.force_encoding(enc)
    k.scrub!
    v.force_encoding(enc)
    v.scrub!
  end
  ary
end
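Because decode_www_form returns an ordered array of pairs rather than a Hash, repeated keys survive decoding, whereas Hash[ary] keeps only the last value. A small sketch that groups every value under its key and also exercises the separator: keyword from the signature above:

```ruby
require 'uri'

pairs = URI.decode_www_form("a=1&a=2&b=hello+world")

# Keep every value for repeated keys instead of letting later ones win.
grouped = pairs.group_by(&:first).transform_values { |kv| kv.map(&:last) }
# grouped maps "a" to ["1", "2"] and "b" to ["hello world"] ('+' decodes to a space)

# Split on ';' instead of the default '&'.
semi = URI.decode_www_form("x=1;y=2", separator: ';')
# semi is [["x", "1"], ["y", "2"]]
```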
.decode_www_form_component(str, enc = Encoding::UTF_8)
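Decode given str of URL-encoded form data.
This decodes + to SP (ASCII space) and each %XX escape to the corresponding byte, then tags the result with enc. It raises ArgumentError if a % is not followed by two hexadecimal digits.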
# File 'lib/uri/common.rb', line 392
def self.decode_www_form_component(str, enc=Encoding::UTF_8)
  raise ArgumentError, "invalid %-encoding (#{str})" if /%(?!\h\h)/ =~ str
  str.b.gsub(/\+|%\h\h/, TBLDECWWWCOMP_).force_encoding(enc)
end
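A quick check of the two behaviours described above: `+` and `%XX` decoding (interpreted as UTF-8 by default), and rejection of a stray `%`:

```ruby
require 'uri'

# '+' becomes a space; %C3%A9 is the UTF-8 byte pair for "é"; %21 is "!".
decoded = URI.decode_www_form_component("caf%C3%A9+au+lait%21")
puts decoded # café au lait!

# A '%' that is not followed by two hex digits raises ArgumentError.
rejected =
  begin
    URI.decode_www_form_component("100%")
    false
  rescue ArgumentError
    true
  end
puts rejected # true
```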
.encode_www_form(enum, enc = nil)
Generate URL-encoded form data from given enum.
This generates application/x-www-form-urlencoded data, as defined in HTML5, from the given Enumerable object.
This internally uses .encode_www_form_component(str).
This method doesn't convert the encoding of the given items, so convert them before calling this method if you want to send data in an encoding other than the original one, or mixed-encoding data. (Strings encoded in an HTML5 ASCII-incompatible encoding are converted to UTF-8.)
This method doesn't handle files. When you send a file, use multipart/form-data.
This follows url.spec.whatwg.org/#concept-urlencoded-serializer.
URI.encode_www_form([["q", "ruby"], ["lang", "en"]])
#=> "q=ruby&lang=en"
URI.encode_www_form("q" => "ruby", "lang" => "en")
#=> "q=ruby&lang=en"
URI.encode_www_form("q" => ["ruby", "perl"], "lang" => "en")
#=> "q=ruby&q=perl&lang=en"
URI.encode_www_form([["q", "ruby"], ["q", "perl"], ["lang", "en"]])
#=> "q=ruby&q=perl&lang=en"
# File 'lib/uri/common.rb', line 424
def self.encode_www_form(enum, enc=nil)
  enum.map do |k,v|
    if v.nil?
      encode_www_form_component(k, enc)
    elsif v.respond_to?(:to_ary)
      v.to_ary.map do |w|
        str = encode_www_form_component(k, enc)
        unless w.nil?
          str << '='
          str << encode_www_form_component(w, enc)
        end
      end.join('&')
    else
      str = encode_www_form_component(k, enc)
      str << '='
      str << encode_www_form_component(v, enc)
    end
  end.join('&')
end
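Two branches of the method above are easy to miss: a nil value emits the bare key with no `=`, and an array value repeats the key once per element. A sketch exercising both together with space-to-`+` encoding:

```ruby
require 'uri'

# ["flag", nil] -> "flag"; "ruby lang" -> "ruby+lang"; ["a", "b"] -> "tag=a&tag=b"
query = URI.encode_www_form([["flag", nil], ["q", "ruby lang"], ["tag", ["a", "b"]]])
puts query # flag&q=ruby+lang&tag=a&tag=b
```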
.encode_www_form_component(str, enc = nil)
Encode given str to URL-encoded form data.
This method doesn't convert *, -, ., 0-9, A-Z, _, a-z, but does convert SP (ASCII space) to + and converts every other byte to %XX.
If enc is given, str is converted to that encoding before percent-encoding.
This is an implementation of www.w3.org/TR/2013/CR-html5-20130806/forms.html#url-encoded-form-data
# File 'lib/uri/common.rb', line 374
def self.encode_www_form_component(str, enc=nil)
  str = str.to_s.dup
  if str.encoding != Encoding::ASCII_8BIT
    if enc && enc != Encoding::ASCII_8BIT
      str.encode!(Encoding::UTF_8, invalid: :replace, undef: :replace)
      str.encode!(enc, fallback: ->(x){"&##{x.ord};"})
    end
    str.force_encoding(Encoding::ASCII_8BIT)
  end
  str.gsub!(/[^*\-.0-9A-Z_a-z]/, TBLENCWWWCOMP_)
  str.force_encoding(Encoding::US_ASCII)
end
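The character rules and the enc parameter described above can be checked directly. The sketch encodes the same "é" once as UTF-8 (two bytes) and once after transcoding to ISO-8859-1 (one byte):

```ruby
require 'uri'

# *, -, ., _ pass through; space becomes '+'; '~' is not in the safe set.
safe = URI.encode_www_form_component("a b*-._~")
puts safe # a+b*-._%7E

# Without enc, the UTF-8 bytes of "é" are percent-encoded one by one.
utf8 = URI.encode_www_form_component("é")
puts utf8 # %C3%A9

# With enc, the string is transcoded first, so "é" becomes the single byte 0xE9.
latin1 = URI.encode_www_form_component("é", Encoding::ISO_8859_1)
puts latin1 # %E9
```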
.extract(str, schemes = nil, &block)
Synopsis
URI::extract(str[, schemes][,&blk])
Args
str
-
String to extract URIs from.
schemes
-
Limit URI matching to specific schemes.
Description
Extracts URIs from a string. If a block is given, yields each matched URI and returns nil; otherwise returns an array of the matches.
Usage
require "uri"
URI.extract("text here http://foo.example.org/bla and here mailto:test@example.com and here also.")
# => ["http://foo.example.org/bla", "mailto:test@example.com"]
# File 'lib/uri/common.rb', line 302
def self.extract(str, schemes = nil, &block)
  warn "URI.extract is obsolete", uplevel: 1 if $VERBOSE
  DEFAULT_PARSER.extract(str, schemes, &block)
end
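A sketch of the two call forms described above: restricting matches with the schemes argument, and the block form, which yields each match and returns nil. Note that the method is marked obsolete and warns when `$VERBOSE` is set.

```ruby
require 'uri'

text = "See http://foo.example.org/bla or mailto:test@example.com for details."

# Only URIs whose scheme is "http" are returned.
http_only = URI.extract(text, ["http"])
p http_only # ["http://foo.example.org/bla"]

# Block form: each match is yielded and the method itself returns nil.
found = []
result = URI.extract(text) { |uri| found << uri }
p result # nil
p found  # ["http://foo.example.org/bla", "mailto:test@example.com"]
```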
.get_encoding(label) (private)
Returns the Encoding for the given label, or nil if the label is unknown. See encoding.spec.whatwg.org/#concept-encoding-get.
# File 'lib/uri/common.rb', line 729
def self.get_encoding(label)
  Encoding.find(WEB_ENCODINGS_[label.to_str.strip.downcase]) rescue nil
end
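Since get_encoding is private, a standalone sketch of the same lookup pattern may be easier to experiment with. LABELS is a hypothetical three-entry excerpt standing in for WEB_ENCODINGS_, and lookup_encoding is a hypothetical name; unknown labels yield nil because Encoding.find(nil) raises and the error is rescued, as in the original.

```ruby
# Hypothetical excerpt of the label table (the real one is WEB_ENCODINGS_).
LABELS = {
  "utf8"      => "utf-8",
  "latin1"    => "windows-1252",
  "shift_jis" => "Windows-31J",
}.freeze

# Hypothetical helper mirroring get_encoding: normalize, map, resolve, nil on failure.
def lookup_encoding(label)
  Encoding.find(LABELS[label.to_str.strip.downcase])
rescue StandardError
  # Encoding.find(nil) raises TypeError for unknown labels.
  nil
end

p lookup_encoding("  UTF8 ")  # whitespace and case are ignored
p lookup_encoding("Latin1")   # resolves to Windows-1252
p lookup_encoding("klingon")  # nil
```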
.join(*str)
Synopsis
URI::join(str[, str, ...])
Args
str
-
String(s) to work with; each is converted to an RFC3986 URI before merging.
Description
Joins URIs.
Usage
require 'uri'
p URI.join("http://example.com/","main.rbx")
# => #<URI::HTTP:0x2022ac02 URL:http://example.com/main.rbx>
p URI.join('http://example.com', 'foo')
# => #<URI::HTTP:0x01ab80a0 URL:http://example.com/foo>
p URI.join('http://example.com', '/foo', '/bar')
# => #<URI::HTTP:0x01aaf0b0 URL:http://example.com/bar>
p URI.join('http://example.com', '/foo', 'bar')
# => #<URI::HTTP:0x801a92af0 URL:http://example.com/bar>
p URI.join('http://example.com', '/foo/', 'bar')
# => #<URI::HTTP:0x80135a3a0 URL:http://example.com/foo/bar>
# File 'lib/uri/common.rb', line 274
def self.join(*str)
  RFC3986_PARSER.join(*str)
end
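The usage examples above follow RFC 3986 reference resolution: the last segment of the base path is dropped unless the base ends in a slash, and `..` climbs one level. A sketch making both rules explicit:

```ruby
require 'uri'

# "b" is dropped from the base, then ".." removes "a": the result is /c.
climbed = URI.join("http://example.com/a/b", "../c").to_s
puts climbed # http://example.com/c

# A trailing slash on the base keeps "foo" as a directory.
kept = URI.join("http://example.com/foo/", "bar").to_s
puts kept # http://example.com/foo/bar

# Without it, "foo" is replaced by the relative reference.
replaced = URI.join("http://example.com/foo", "bar").to_s
puts replaced # http://example.com/bar
```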
.parse(uri)
Synopsis
URI::parse(uri_str)
Args
uri_str
-
String with URI.
Description
Creates an instance of the appropriate URI subclass from the string.
Raises
URI::InvalidURIError
Raised if the given URI is not a correct one.
Usage
require 'uri'
uri = URI.parse("http://www.ruby-lang.org/")
p uri
# => #<URI::HTTP:0x202281be URL:http://www.ruby-lang.org/>
p uri.scheme
# => "http"
p uri.host
# => "www.ruby-lang.org"
It's recommended to first escape the provided uri_str with .escape if it contains any characters that are invalid in a URI.
# File 'lib/uri/common.rb', line 236
def self.parse(uri)
  RFC3986_PARSER.parse(uri)
end
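When input comes from users, it is common to turn the parse error into a nil result. A minimal sketch, where safe_parse is a hypothetical helper name:

```ruby
require 'uri'

# Hypothetical helper: returns nil instead of raising for invalid input.
def safe_parse(str)
  URI.parse(str)
rescue URI::InvalidURIError
  nil
end

p safe_parse("http://www.ruby-lang.org/").class # URI::HTTP
p safe_parse("http://exa mple.com/")            # nil (a raw space is not allowed)
```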
.regexp(schemes = nil)
Synopsis
URI::regexp([match_schemes])
Args
match_schemes
-
Array of schemes. If given, the resulting regexp matches only URIs whose scheme is one of the match_schemes.
Description
Returns a Regexp object that matches URI-like strings. The Regexp object returned by this method includes an arbitrary number of capture groups (parentheses). Never rely on their number.
Usage
require 'uri'
# extract first URI from html_string
html_string.slice(URI.regexp)
# remove ftp URIs
html_string.gsub(URI.regexp(['ftp']), '')
# You should not rely on the number of parentheses
html_string.scan(URI.regexp) do |*matches|
p $&
end
# File 'lib/uri/common.rb', line 338
def self.regexp(schemes = nil)
  warn "URI.regexp is obsolete", uplevel: 1 if $VERBOSE
  DEFAULT_PARSER.make_regexp(schemes)
end
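The returned Regexp is unanchored, so it finds URIs inside larger text. To test whether a whole string is an http(s) URI, one option is to interpolate it between `\A` and `\z`; interpolation preserves the inner Regexp's own flags. This is a sketch only; HTTP_URI is a hypothetical constant name, and URI.regexp warns as obsolete when `$VERBOSE` is set.

```ruby
require 'uri'

# Anchor the scheme-restricted pattern so it must match the entire string.
HTTP_URI = /\A#{URI.regexp(%w[http https])}\z/

p HTTP_URI.match?("https://example.com/path?q=1") # true
p HTTP_URI.match?("mailto:test@example.com")      # false: scheme not allowed
```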
.scheme_list
Returns a Hash of the defined schemes, mapping each upper-case scheme name to the class that handles it.
# File 'lib/uri/common.rb', line 146
def self.scheme_list
  @@schemes
end
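Because the keys are upper-case scheme names, a lookup should upcase its input first. A sketch with a hypothetical handler_for helper that falls back to URI::Generic, which is also the class URI.parse produces for an unregistered scheme:

```ruby
require 'uri'

# Hypothetical helper: class registered for a scheme, or URI::Generic if none.
def handler_for(scheme)
  URI.scheme_list.fetch(scheme.upcase, URI::Generic)
end

p handler_for("https")                  # URI::HTTPS
p handler_for("foo")                    # URI::Generic
p URI.parse("foo://example.com").class  # URI::Generic
```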
.split(uri)
Synopsis
URI::split(uri)
Args
uri
-
String with URI.
Description
Splits the string into the following parts and returns them as an array, in this order:
* Scheme
* Userinfo
* Host
* Port
* Registry
* Path
* Opaque
* Query
* Fragment
Usage
require 'uri'
p URI.split("http://www.ruby-lang.org/")
# => ["http", nil, "www.ruby-lang.org", nil, nil, "/", nil, nil, nil]
# File 'lib/uri/common.rb', line 198
def self.split(uri)
  RFC3986_PARSER.split(uri)
end
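Since the result is a fixed-order array, multiple assignment is a convenient way to name the parts. The sketch below uses a URI with every common component; note that the port comes back as a String, not an Integer, and that components absent from the input are nil.

```ruby
require 'uri'

scheme, userinfo, host, port, registry, path, opaque, query, fragment =
  URI.split("http://user@example.com:8080/docs?page=2#intro")

p [scheme, userinfo, host, port] # ["http", "user", "example.com", "8080"]
p [registry, path, opaque]       # [nil, "/docs", nil]
p [query, fragment]              # ["page=2", "intro"]
```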