Class: Unicorn::HttpParser

Relationships & Source Files
Inherits:	Object
Defined in:	lib/unicorn/http_request.rb, ext/unicorn_http/unicorn_http.rl

Constant Summary

CHUNK_MAX =

The maximum size of a single chunk when using chunked transfer encoding. This is only a theoretical maximum used to detect errors in clients; it is highly unlikely to encounter clients that send more than several kilobytes at once.

# File 'ext/unicorn_http/unicorn_http.rl', line 1030
```
OFFT2NUM(UH_OFF_T_MAX)
```

DEFAULTS = Internal use only

Default parameters we merge into the request env for Rack handlers.

# File 'lib/unicorn/http_request.rb', line 12

{
  "rack.errors" => $stderr,
  "rack.multiprocess" => true,
  "rack.multithread" => false,
  "rack.run_once" => false,
  "rack.version" => [1, 2],
  "rack.hijack?" => true,
  "SCRIPT_NAME" => "",

  # this is not in the Rack spec, but some apps may rely on it
  "SERVER_SOFTWARE" => "Unicorn #{Unicorn::Const::UNICORN_VERSION}"
}
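As a rough illustration of how these defaults reach the Rack env, the sketch below merges a stand-in hash into a partially filled env with Hash#merge!, the same call #read_headers makes. The SAMPLE_DEFAULTS constant and the env contents are assumptions made up for the demonstration; they are not the real constant.

```ruby
# Stand-in for Unicorn::HttpParser::DEFAULTS (hypothetical subset).
SAMPLE_DEFAULTS = {
  "rack.multiprocess" => true,
  "rack.multithread" => false,
  "SCRIPT_NAME" => "",
}.freeze

# A partially filled env, as a parser might leave it (illustrative values).
env = { "REQUEST_METHOD" => "GET", "SCRIPT_NAME" => "/ignored" }

# merge! mutates env in place; keys from the argument win on conflict.
env.merge!(SAMPLE_DEFAULTS)
```

Because the defaults are merged last, any key they share with the parsed env is overwritten by the default value.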

EMPTY_ARRAY = Internal use only
# File 'lib/unicorn/http_request.rb', line 29
```
[].freeze
```
HTTP_RESPONSE_START = Internal use only
# File 'lib/unicorn/http_request.rb', line 28
```
[ 'HTTP'.freeze, '/1.1 '.freeze ]
```
LENGTH_MAX =

The maximum size of the body as specified by Content-Length. This is only a theoretical maximum; the actual limit is subject to the limits of the file system used for Dir.tmpdir.

# File 'ext/unicorn_http/unicorn_http.rl', line 1037
```
OFFT2NUM(UH_OFF_T_MAX)
```
NULL_IO = Internal use only
# File 'lib/unicorn/http_request.rb', line 25
```
StringIO.new("")
```
TCPI = Internal use only
# File 'lib/unicorn/http_request.rb', line 109
```
Raindrops::TCP_Info.allocate
```

Class Attribute Summary

.max_header_len=(len) writeonly

This is only intended for use with Rainbows!
.check_client_connection rw Internal use only
.check_client_connection=(bool) rw Internal use only
.input_class rw Internal use only
.input_class=(klass) rw Internal use only

Class Method Summary

.new ⇒ parser constructor

Creates a new parser.
.is_chunked?(v) ⇒ Boolean Internal use only

Called by ext/unicorn_http/unicorn_http.rl via rb_funcall.

Instance Attribute Summary

#body_eof? ⇒ Boolean readonly

Detects if we’re done filtering the body or not.
#headers? ⇒ Boolean readonly

This should be used to detect if a request has headers (and if the response will have headers as well).
#keepalive? ⇒ Boolean readonly

This should be used to detect if a request can really handle keepalives and pipelining.
#next? ⇒ Boolean readonly

Exactly like #keepalive?, except it will reset the internal parser state on next parse if it returns true.
#response_start_sent rw

Whether the start of the response has already been sent. The setter's return value is ignored by Ruby anyway.
#response_start_sent=(boolean) rw
#chunkable_response? ⇒ Boolean readonly Internal use only
#hijacked? ⇒ Boolean readonly Internal use only

Instance Method Summary

#add_parse(buffer) ⇒ env, nil

Adds the contents of buffer to the internal buffer and attempts to continue parsing.
#buf
#clear ⇒ parser

Resets the parser to its initial state so that you can reuse it rather than making new ones.
#content_length ⇒ nil, Integer

Returns the number of bytes left to run through #filter_body.
#env
#filter_body(dst, src) ⇒ nil/src

Takes a String src and copies the body into dst, dechunking it first if the request uses chunked transfer encoding.
#headers(env, buf) (also: #trailers) readonly
#hijacked!
#parse ⇒ env, nil

Parses the internal buffer (#buf), filling in the #env Hash. Returns the Hash if parsing is finished, nil otherwise. When returning the env Hash, it may modify the buffer to point to where body processing should begin.
#trailers(env, buf)

Alias for #headers.
#call Internal use only

For rack.hijack, we respond to this method so that no extra proc object needs to be allocated.
#check_client_connection(socket, ai) Internal use only

Ruby 2.2+ can show struct tcp_info as a string via Socket::Option#inspect.
#closed_state?(state) ⇒ Boolean Internal use only

raindrops before 0.18 only supported TCP_INFO under Linux.
#closed_state_str?(state) ⇒ Boolean Internal use only
#read_headers(socket, ai) Internal use only

Does the majority of the IO processing.
#write_http_header(socket) Internal use only

Constructor Details

.new ⇒ `parser`

Creates a new parser.

# File 'ext/unicorn_http/unicorn_http.rl', line 638


static VALUE HttpParser_init(VALUE self)
{
  struct http_parser *hp = data_get(self);

  http_parser_init(hp);
  hp->buf = rb_str_new(NULL, 0);
  hp->env = rb_hash_new();

  return self;
}

Class Attribute Details

.check_client_connection (rw)

This method is for internal use only.

# File 'lib/unicorn/http_request.rb', line 42


def self.check_client_connection
  @@check_client_connection
end

.check_client_connection=(bool) (rw)

This method is for internal use only.

# File 'lib/unicorn/http_request.rb', line 46


def self.check_client_connection=(bool)
  @@check_client_connection = bool
end

.input_class (rw)

This method is for internal use only.

# File 'lib/unicorn/http_request.rb', line 34


def self.input_class
  @@input_class
end

.input_class=(klass) (rw)

This method is for internal use only.

# File 'lib/unicorn/http_request.rb', line 38


def self.input_class=(klass)
  @@input_class = klass
end

.max_header_len=(len) (writeonly)

This is only intended for use with Rainbows!

# File 'ext/unicorn_http/unicorn_http.rl', line 43


static VALUE set_maxhdrlen(VALUE self, VALUE len)
{
  return UINT2NUM(MAX_HEADER_LEN = NUM2UINT(len));
}

Class Method Details

.is_chunked?(v) ⇒ `Boolean`

This method is for internal use only.

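Called by ext/unicorn_http/unicorn_http.rl via rb_funcall. Returns true only when "chunked" is the final transfer coding, and raises if "chunked" appears twice or is not last.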

Raises:

(Unicorn::HttpParserError)

# File 'lib/unicorn/http_request.rb', line 171


def self.is_chunked?(v) # :nodoc:
  vals = v.split(/[ \t]*,[ \t]*/).map!(&:downcase)
  if vals.pop == 'chunked'.freeze
    return true unless vals.include?('chunked'.freeze)
    raise Unicorn::HttpParserError, 'double chunked', []
  end
  return false unless vals.include?('chunked'.freeze)
  raise Unicorn::HttpParserError, 'chunked not last', []
end
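To make the accepted and rejected Transfer-Encoding values concrete, here is a free-standing copy of the same logic. The method name chunked? and the SketchParserError class are stand-ins introduced so the sketch runs without the unicorn gem loaded.

```ruby
# Hypothetical stand-in for Unicorn::HttpParserError so the sketch runs alone.
class SketchParserError < StandardError; end

# Same logic as the method above, written as a free-standing method.
def chunked?(value)
  # Split on commas, tolerating spaces and tabs, and compare case-insensitively.
  vals = value.split(/[ \t]*,[ \t]*/).map!(&:downcase)
  if vals.pop == "chunked"
    return true unless vals.include?("chunked")
    raise SketchParserError, "double chunked"
  end
  return false unless vals.include?("chunked")
  raise SketchParserError, "chunked not last"
end

chunked?("chunked")        # true
chunked?("gzip, Chunked")  # true: "chunked" is the final coding
chunked?("gzip")           # false: no chunked coding at all
```

Values such as "chunked, gzip" or "chunked, chunked" raise instead of returning, since "chunked" must appear exactly once and in the last position.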

Instance Attribute Details

#body_eof? ⇒ `Boolean` (readonly)

Detects if we’re done filtering the body or not. This can be used to detect when to stop calling #filter_body.

# File 'ext/unicorn_http/unicorn_http.rl', line 799


static VALUE HttpParser_body_eof(VALUE self)
{
  struct http_parser *hp = data_get(self);

  if (HP_FL_TEST(hp, CHUNKED))
    return chunked_eof(hp) ? Qtrue : Qfalse;

  return hp->len.content == 0 ? Qtrue : Qfalse;
}

#chunkable_response? ⇒ `Boolean` (readonly)

This method is for internal use only.

# File 'ext/unicorn_http/unicorn_http.rl', line 828


static VALUE chunkable_response_p(VALUE self)
{
  const struct http_parser *hp = data_get(self);

  return HP_FL_ALL(hp, RES_CHUNKABLE) ? Qtrue : Qfalse;
}

#headers? ⇒ `Boolean` (readonly)

This should be used to detect if a request has headers (and if the response will have headers as well). HTTP/0.9 requests should return false; all subsequent HTTP versions will return true.

# File 'ext/unicorn_http/unicorn_http.rl', line 861


static VALUE HttpParser_has_headers(VALUE self)
{
  struct http_parser *hp = data_get(self);

  return HP_FL_TEST(hp, HASHEADER) ? Qtrue : Qfalse;
}

#hijacked? ⇒ `Boolean` (readonly)

This method is for internal use only.

# File 'lib/unicorn/http_request.rb', line 104


def hijacked?
  env.include?('rack.hijack_io'.freeze)
end

#keepalive? ⇒ `Boolean` (readonly)

This should be used to detect if a request can really handle keepalives and pipelining. Currently, the rules are:

MUST be a GET or HEAD request
MUST be HTTP/1.1 or HTTP/1.0 with “Connection: keep-alive”
MUST NOT have “Connection: close” set

# File 'ext/unicorn_http/unicorn_http.rl', line 820


static VALUE HttpParser_keepalive(VALUE self)
{
  struct http_parser *hp = data_get(self);

  return HP_FL_ALL(hp, KEEPALIVE) ? Qtrue : Qfalse;
}
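The three documented rules can be restated as a small predicate. This is only a sketch: keepalive_allowed? is a hypothetical helper, the real check is a flag test inside the C extension, and the sketch assumes the Connection header carries a single token.

```ruby
# Hypothetical helper restating the documented rules; not part of unicorn.
def keepalive_allowed?(request_method, http_version, connection_header)
  conn = connection_header.to_s.downcase

  # Rule 1: only GET and HEAD requests qualify.
  return false unless %w[GET HEAD].include?(request_method)
  # Rule 3: an explicit "Connection: close" always disables keepalive.
  return false if conn == "close"

  # Rule 2: HTTP/1.1 qualifies by default, HTTP/1.0 only when it opts in.
  case http_version
  when "HTTP/1.1" then true
  when "HTTP/1.0" then conn == "keep-alive"
  else false
  end
end

keepalive_allowed?("GET", "HTTP/1.1", nil)           # true
keepalive_allowed?("HEAD", "HTTP/1.0", "Keep-Alive") # true
keepalive_allowed?("POST", "HTTP/1.1", nil)          # false
```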

#next? ⇒ `Boolean` (readonly)

Exactly like #keepalive?, except it will reset the internal parser state on next parse if it returns true.

# File 'ext/unicorn_http/unicorn_http.rl', line 842


static VALUE HttpParser_next(VALUE self)
{
  struct http_parser *hp = data_get(self);

  if (HP_FL_ALL(hp, KEEPALIVE)) {
    HP_FL_SET(hp, TO_CLEAR);
    return Qtrue;
  }
  return Qfalse;
}
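The difference between #keepalive? and #next? is the deferred reset. The toy model below imitates that flag handling in plain Ruby. FlagModel and its instance variables are invented for illustration, and the model simplifies the real reset, which reinitializes the whole parser.

```ruby
# Toy model of the flag handling; this is not the real parser.
class FlagModel
  attr_reader :env

  def initialize(keepalive)
    @keepalive = keepalive
    @to_clear = false
    @env = { "REQUEST_METHOD" => "GET" } # pretend a request was parsed
  end

  # Pure query: never changes state.
  def keepalive?
    @keepalive
  end

  # Same answer as keepalive?, but schedules a reset when true.
  def next?
    return false unless @keepalive
    @to_clear = true
    true
  end

  # The reset is deferred until the next parse call.
  def parse
    if @to_clear
      @env.clear
      @to_clear = false
    end
    @env
  end
end

model = FlagModel.new(true)
model.keepalive?  # leaves state alone
model.parse       # env still holds the previous request
model.next?       # marks the state for clearing
model.parse       # env is emptied before parsing continues
```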

#response_start_sent (rw)

Whether the start of the response has already been sent. The setter's return value is ignored by Ruby anyway.

# File 'ext/unicorn_http/unicorn_http.rl', line 977


static VALUE HttpParser_rssget(VALUE self)
{
  struct http_parser *hp = data_get(self);

  return HP_FL_TEST(hp, RESSTART) ? Qtrue : Qfalse;
}

#response_start_sent=(boolean) (rw)

# File 'ext/unicorn_http/unicorn_http.rl', line 965


static VALUE HttpParser_rssset(VALUE self, VALUE boolean)
{
  struct http_parser *hp = data_get(self);

  if (RTEST(boolean))
    HP_FL_SET(hp, RESSTART);
  else
    HP_FL_UNSET(hp, RESSTART);

  return boolean; /* ignored by Ruby anyways */
}

Instance Method Details

#add_parse(buffer) ⇒ env, nil

Adds the contents of buffer to the internal buffer and attempts to continue parsing. Returns the #env Hash on success or nil if more data is needed.

Raises HttpParserError if there are parsing errors.

# File 'ext/unicorn_http/unicorn_http.rl', line 756


static VALUE HttpParser_add_parse(VALUE self, VALUE buffer)
{
  struct http_parser *hp = data_get(self);

  Check_Type(buffer, T_STRING);
  rb_str_buf_append(hp->buf, buffer);

  return HttpParser_parse(self);
}
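The "append, then retry" contract can be tried out with a toy parser. In the sketch below, ToyParser, the REQUEST_LINE key and the 8-byte read size are all assumptions for illustration. It only looks for the blank line that ends a header block, whereas the real parser is a Ragel state machine.

```ruby
require "stringio"

# Hypothetical toy parser, not the real Unicorn::HttpParser.
class ToyParser
  attr_reader :buf, :env

  def initialize
    @buf = +""
    @env = {}
  end

  # Append data, then try to finish; returns env or nil, like add_parse.
  def add_parse(data)
    @buf << data
    parse
  end

  def parse
    head, sep, rest = @buf.partition("\r\n\r\n")
    return nil if sep.empty?          # header block not complete yet
    @env["REQUEST_LINE"] = head.lines.first.chomp
    @buf.replace(rest)                # leave only body bytes in the buffer
    @env
  end
end

socket = StringIO.new("GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
parser = ToyParser.new
# Feed 8 bytes at a time to force several partial reads.
false until parser.add_parse(socket.readpartial(8))
```

The `false until ...` idiom keeps reading until the parser returns a non-nil env. An exception raised by the parser or by readpartial ends the loop.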

#buf

# File 'ext/unicorn_http/unicorn_http.rl', line 868


static VALUE HttpParser_buf(VALUE self)
{
  return data_get(self)->buf;
}

#call

This method is for internal use only.

For rack.hijack, we respond to this method so that no extra proc object needs to be allocated.

# File 'lib/unicorn/http_request.rb', line 99


def call
  hijacked!
  env['rack.hijack_io'] = env['unicorn.socket']
end

#check_client_connection(socket, ai)

This method is for internal use only.

Ruby 2.2+ can show struct tcp_info as a string via Socket::Option#inspect. This is not that efficient, but probably still better than doing unnecessary work after a client gives up.

See additional method definition at line 111.

# File 'lib/unicorn/http_request.rb', line 142


def check_client_connection(socket, ai) # :nodoc:
  if ai.ip?
    # Raindrops::TCP_Info#get!, #state (reads struct tcp_info#tcpi_state)
    raise Errno::EPIPE, "client closed connection".freeze,
          EMPTY_ARRAY if closed_state?(TCPI.get!(socket).state)
  else
    write_http_header(socket)
  end
end

#clear ⇒ `parser`

Resets the parser to its initial state so that you can reuse it rather than making new ones.

# File 'ext/unicorn_http/unicorn_http.rl', line 656


static VALUE HttpParser_clear(VALUE self)
{
  struct http_parser *hp = data_get(self);

  /* we can't safely reuse .buf and .env if hijacked */
  if (HP_FL_TEST(hp, HIJACK))
    return HttpParser_init(self);

  http_parser_init(hp);
  rb_hash_clear(hp->env);

  return self;
}

#closed_state?(state) ⇒ `Boolean`

This method is for internal use only.

raindrops before 0.18 only supported TCP_INFO under Linux

# File 'lib/unicorn/http_request.rb', line 132


def closed_state?(state) # :nodoc:
  # TCP_ESTABLISHED == 1 on Linux
  state == 1 ? false : true
end

#closed_state_str?(state) ⇒ `Boolean`

This method is for internal use only.

# File 'lib/unicorn/http_request.rb', line 158


def closed_state_str?(state)
  state == 'ESTABLISHED' ? false : true
end

#content_length ⇒ `nil`, `Integer`

Returns the number of bytes left to run through #filter_body. This will initially be the value of the “Content-Length” HTTP header after header parsing is complete and will decrease in value as #filter_body is called for each chunk. This should return zero for requests with no body.

This will return nil on “Transfer-Encoding: chunked” requests.

# File 'ext/unicorn_http/unicorn_http.rl', line 698


static VALUE HttpParser_content_length(VALUE self)
{
  struct http_parser *hp = data_get(self);

  return HP_FL_TEST(hp, CHUNKED) ? Qnil : OFFT2NUM(hp->len.content);
}

#env

# File 'ext/unicorn_http/unicorn_http.rl', line 873


static VALUE HttpParser_env(VALUE self)
{
  return data_get(self)->env;
}

#filter_body(dst, src) ⇒ `nil`/`src`

Takes a String src and copies the body into dst, dechunking it first if the request uses chunked transfer encoding. Returns nil if there is more data left to process. Returns src if body processing is complete. When returning src, it may modify src so the start of the string points to where the body ended so that trailer processing can begin.

Raises HttpParserError if there are dechunking errors. Basically this is a glorified memcpy(3) that copies src into dst while filtering it through the dechunker.

# File 'ext/unicorn_http/unicorn_http.rl', line 901


static VALUE HttpParser_filter_body(VALUE self, VALUE dst, VALUE src)
{
  struct http_parser *hp = data_get(self);
  char *srcptr;
  long srclen;

  srcptr = RSTRING_PTR(src);
  srclen = RSTRING_LEN(src);

  StringValue(dst);

  if (HP_FL_TEST(hp, CHUNKED)) {
    if (!chunked_eof(hp)) {
      rb_str_modify(dst);
      rb_str_resize(dst, srclen); /* we can never copy more than srclen bytes */

      hp->s.dest_offset = 0;
      hp->cont = dst;
      hp->buf = src;
      http_parser_execute(hp, srcptr, srclen);
      if (hp->cs == http_parser_error)
        parser_raise(eHttpParserError, "Invalid HTTP format, parsing fails.");

      assert(hp->s.dest_offset <= hp->offset &&
             "destination buffer overflow");
      advance_str(src, hp->offset);
      rb_str_set_len(dst, hp->s.dest_offset);

      if (RSTRING_LEN(dst) == 0 && chunked_eof(hp)) {
        assert(hp->len.chunk == 0 && "chunk at EOF but more to parse");
      } else {
        src = Qnil;
      }
    }
  } else {
    /* no need to enter the Ragel machine for unchunked transfers */
    assert(hp->len.content >= 0 && "negative Content-Length");
    if (hp->len.content > 0) {
      long nr = MIN(srclen, hp->len.content);

      rb_str_modify(dst);
      rb_str_resize(dst, nr);
      /*
       * using rb_str_replace() to avoid memcpy() doesn't help in
       * most cases because a GC-aware programmer will pass an explicit
       * buffer to env["rack.input"].read and reuse the buffer in a loop.
       * This causes copy-on-write behavior to be triggered anyways
       * when the src buffer is modified (when reading off the socket).
       */
      hp->buf = src;
      memcpy(RSTRING_PTR(dst), srcptr, nr);
      hp->len.content -= nr;
      if (hp->len.content == 0) {
        HP_FL_SET(hp, REQEOF);
        hp->cs = http_parser_first_final;
      }
      advance_str(src, nr);
      src = Qnil;
    }
  }
  hp->offset = 0; /* for trailer parsing */
  return src;
}
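The unchunked branch is simple enough to model in plain Ruby. The sketch below is an assumption-laden illustration: IdentityBody and its remaining counter are invented names, and the chunked branch (which runs the Ragel machine) is not modeled at all.

```ruby
# Hypothetical model of the unchunked branch only; not the real parser.
class IdentityBody
  attr_reader :remaining

  def initialize(content_length)
    @remaining = content_length
  end

  # Copies at most @remaining bytes from src into dst and consumes them
  # from src. Returns nil if bytes were copied, src otherwise.
  def filter_body(dst, src)
    return src unless @remaining > 0
    nr = [src.bytesize, @remaining].min
    dst.replace(src.byteslice(0, nr))
    src.replace(src.byteslice(nr, src.bytesize - nr))
    @remaining -= nr
    nil
  end

  def body_eof?
    @remaining.zero?
  end
end

body = IdentityBody.new(5)
dst = +""
first = +"hel"
second = +"lo extra"
body.filter_body(dst, first)   # dst holds "hel"; 2 bytes remain
body.filter_body(dst, second)  # dst holds "lo"; " extra" stays in second
```

In this model the remaining counter plays the role of #content_length, and #body_eof? turns true once it reaches zero. A further call then returns src untouched.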

#headers(env, buf) (readonly) Also known as: #trailers

# File 'ext/unicorn_http/unicorn_http.rl', line 777


static VALUE HttpParser_headers(VALUE self, VALUE env, VALUE buf)
{
  struct http_parser *hp = data_get(self);

  hp->env = env;
  hp->buf = buf;

  return HttpParser_parse(self);
}

#hijacked!

# File 'ext/unicorn_http/unicorn_http.rl', line 878


static VALUE HttpParser_hijacked_bang(VALUE self)
{
  struct http_parser *hp = data_get(self);

  HP_FL_SET(hp, HIJACK);

  return self;
}

#parse ⇒ env, nil

Parses the internal buffer (#buf), filling in the #env Hash. Returns the Hash if parsing is finished, nil otherwise. When returning the env Hash, it may modify the buffer to point to where body processing should begin.

Raises HttpParserError if there are parsing errors.

# File 'ext/unicorn_http/unicorn_http.rl', line 717


static VALUE HttpParser_parse(VALUE self)
{
  struct http_parser *hp = data_get(self);
  VALUE data = hp->buf;

  if (HP_FL_TEST(hp, TO_CLEAR))
    HttpParser_clear(self);

  http_parser_execute(hp, RSTRING_PTR(data), RSTRING_LEN(data));
  if (hp->offset > MAX_HEADER_LEN)
    parser_raise(e413, "HTTP header is too large");

  if (hp->cs == http_parser_first_final ||
      hp->cs == http_parser_en_ChunkedBody) {
    advance_str(data, hp->offset + 1);
    hp->offset = 0;
    if (HP_FL_TEST(hp, INTRAILER))
      HP_FL_SET(hp, REQEOF);

    return hp->env;
  }

  if (hp->cs == http_parser_error)
    parser_raise(eHttpParserError, "Invalid HTTP format, parsing fails.");

  return Qnil;
}

#read_headers(socket, ai)

This method is for internal use only.

Does the majority of the IO processing. It has been written in Ruby using about 8 different IO processing strategies.

It is currently carefully constructed to make sure that it gets the best possible performance for the common case: GET requests that are fully complete after a single read(2).

Anyone who thinks they can make it faster is more than welcome to take a crack at it.

Returns an environment hash suitable for Rack if successful. This does minimal exception trapping, and it is up to the caller to handle any socket errors (e.g. user aborted upload).

# File 'lib/unicorn/http_request.rb', line 65


def read_headers(socket, ai)
  e = env

  # From https://www.ietf.org/rfc/rfc3875:
  # "Script authors should be aware that the REMOTE_ADDR and
  #  REMOTE_HOST meta-variables (see sections 4.1.8 and 4.1.9)
  #  may not identify the ultimate source of the request.  They
  #  identify the client for the immediate request to the server;
  #  that client may be a proxy, gateway, or other intermediary
  #  acting on behalf of the actual source client."
  e['REMOTE_ADDR'] = ai.unix? ? '127.0.0.1' : ai.ip_address

  # short circuit the common case with small GET requests first
  socket.readpartial(16384, buf)
  if parse.nil?
    # Parser is not done, queue up more data to read and continue parsing
    # an Exception thrown from the parser will throw us out of the loop
    false until add_parse(socket.readpartial(16384))
  end

  check_client_connection(socket, ai) if @@check_client_connection

  e['rack.input'] = 0 == content_length ?
                    NULL_IO : @@input_class.new(socket, self)

  # for Rack hijacking in Rack 1.5 and later
  e['unicorn.socket'] = socket
  e['rack.hijack'] = self

  e.merge!(DEFAULTS)
end
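The REMOTE_ADDR line above relies on Addrinfo from Ruby's standard socket library. The sketch below isolates that one decision. The helper name remote_addr_for, the documentation-range IP address and the socket path are all illustrative assumptions.

```ruby
require "socket"

# Hypothetical helper mirroring the REMOTE_ADDR line above.
def remote_addr_for(ai)
  # UNIX domain sockets have no IP address, so loopback is reported instead.
  ai.unix? ? "127.0.0.1" : ai.ip_address
end

tcp_peer  = Addrinfo.tcp("192.0.2.10", 8080)   # documentation-range address
unix_peer = Addrinfo.unix("/tmp/example.sock") # path is illustrative only

remote_addr_for(tcp_peer)   # the peer's IP address as a String
remote_addr_for(unix_peer)  # always "127.0.0.1"
```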

#trailers(env, buf)

Alias for #headers.

#write_http_header(socket)

This method is for internal use only.

# File 'lib/unicorn/http_request.rb', line 163


def write_http_header(socket) # :nodoc:
  if headers?
    self.response_start_sent = true
    HTTP_RESPONSE_START.each { |c| socket.write(c) }
  end
end
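To see what this early write puts on the wire, the sketch below replays it against a StringIO. RESPONSE_START is a local stand-in holding the same two fragments as HTTP_RESPONSE_START, and the StringIO takes the place of the client socket.

```ruby
require "stringio"

# Stand-in for HTTP_RESPONSE_START; same two fragments as the constant.
RESPONSE_START = ["HTTP", "/1.1 "].freeze

socket = StringIO.new
# Each fragment is written separately, as in write_http_header.
RESPONSE_START.each { |c| socket.write(c) }

socket.string # the bytes written so far
```

Only the start of the status line is written at this point. In #check_client_connection this write is used for non-IP sockets, where the TCP state check does not apply.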
