TypeProf: A type analysis tool for Ruby code based on abstract interpretation
Show me demo first
See demo.md.
How to use TypeProf
Analyze app.rb:
$ typeprof app.rb
Analyze app.rb with sig/app.rbs that specifies some method types:
$ typeprof sig/app.rbs app.rb
Here is a typical use case:
$ typeprof sig/app.rbs app.rb -o sig/app.gen.rbs
Here is a list of currently available options:
-o OUTFILE: Write the analyze result to OUTFILE instead of standard output-q: Hide the progress indicator-v: Alias to-fshow-errors-d: Show the analysis log (Currently, the log is just for debugging and may become very huge)-I DIR: AddDIRto the file search path ofrequire-r GEMNAME: Load the RBS files ofGEMNAME--exclude-dir DIR: Omit the result of files that are placed under the directoryDIR. If there are some directory specifications, the latter one is stronger. (Assuming that--include-dir foo --exclude-dir foo/baris specified, the analysis result of foo/bar/baz.rb is omitted, but foo/baz.rb is shown.)--include-dir DIR: Show the result of files that are placed under the directoryDIR. If there are some directory specifications, the latter one is stronger. (Assuming that--exclude-dir foo --include-dir foo/baris specified, the analysis result of foo/bar/baz.rb is shown, but foo/baz.rb is omitted.)--show-errors: Prints out possible bugs found during execution (often a lot of false positives).--show-untyped: When TypeProf infers a typeA | untyped, it simply outputsAby default. But this option forces to outputA | untyped.--type-depth-limit=NUM: (undocumented yet)
What is a TypeProf?
TypeProf is a Ruby interpreter that abstractly executes Ruby programs at the type level. It executes a given program and observes what types are passed to and returned from methods and what types are assigned to instance variables. All values are, in principle, abstracted to the class to which the object belongs, not the object itself (detailed in the next section).
Here is an example of a method call.
def foo(n)
p n #=> Integer
n.to_s
end
p foo(42) #=> String
The analysis results of TypeProf are as follows.
$ ruby exe/typeprof test.rb
# Revealed types
# test.rb:2 #=> Integer
# test.rb:6 #=> String
# Classes
class Object
def foo : (Integer) -> String
end
When the method call foo(42) is executed, the type (abstract value) "Integer" is passed instead of the Integer object 42.
The method foo executes n.to_s.
Then, the built-in method Integer#to_s is called and you get the type "String", which the method foo returns.
Collecting observations of these execution results, TypeProf outputs, "the method foo receives Integer and returns String" in the RBS format.
Also, the argument of p is output in the Revealed types section.
Instance variables are stored in each object in Ruby, but are aggregated in class units in TypeProf.
class Foo
def initialize
@a = 42
end
attr_accessor :a
end
Foo.new.a = "str"
p Foo.new.a #=> Integer | String
$ ruby exe/typeprof test.rb
# Revealed types
# test.rb:11 #=> Integer | String
# Classes
class Foo
attr_accessor a : Integer | String
def initialize : -> Integer
end
Abstract values
As mentioned above, TypeProf abstracts almost all Ruby values to the type level, with some exceptions like class objects. To avoid confusion with normal Ruby values, we use the word "abstract value" to refer the values that TypeProf handles.
TypeProf handles the following abstract values.
- Instance of a class
- Class object
- Symbol
untyped- Union of abstract values
- Instance of a container class
- Proc object
Instances of classes are the most common values.
A Ruby code Foo.new returns an instance of the class Foo.
This abstract value is represented as Foo in the RBS format, though it is a bit confusing.
The integer literal 42 generates an instance of Integer and the string literal "str" generates an instance of String.
A class object is a value that represents the class itself.
For example, the constants Integer and String has class objects.
In Ruby semantics, a class object is an instance of the class Class, but it is not abstracted into Class in TypeProf.
This is because, if it is abstracted, TypeProf cannot handle constant references and class methods correctly.
A symbol is an abstract value returned by Symbol literals like :foo.
A symbol object is not abstracted to an instance of the class Symbol because its concrete value is often required in many cases, such as keyword arguments, JSON data keys, the argument of Module#attr_reader, etc.
Note that some Symbol objects are handled as instances of the class Symbol, for example, the return value of String#to_sym and Symbol literals that contains interpolation like :"foo_#{ x }".
untyped is an abstract value generated when TypeProf fails to trace values due to analysis limits or restrictions.
Any operations and method calls on untyped are ignored, and the evaluation result is also untyped.
A union of abstract values is a value that represents multiple possibilities.,
For (a bit artificial) example, the result of rand < 0.5 ? 42 : "str" is a union, Integer | String.
An instance of a container class, such as Array and Hash, is an object that contains other abstract values as elements. At present, only Array, Enumerator and Hash are supported. Details will be described later.
A Proc object is a closure produced by lambda expressions (-> {... }) and block parameters (&blk).
During the interpretation, these objects are not abstracted but treated as concrete values associated with a piece of code.
In the RBS result, they are represented by using anonymous proc type, whose types they accepted and returned.
Execution
TypeProf is a kind of Ruby interpreter, but its execution order is quite different from Ruby semantics.
Branch
When it executes a branch, both clauses are executed in parallel. It is unspecified which is evaluated first.
In the following example, the "then" clause assigns an Integer to the variable x and the "else" clause assigns a String to x.
if rand <0.5
x = 42
else
x = "str"
end
p x #=> Integer | String
TypeProf first evaluates the conditional expression, then does both "then" and "else" clauses (we cannot tell which comes first), and after the branch, evaluates the method call to p with Integer | String.
Restart
If you assign different abstract values to an instance variable, the order of execution may be more complicated.
class Foo
def initialize
@x = 1
end
def get_x
@x
end
def update_x
@x = "str"
end
end
foo = Foo.new
# ...
p foo.get_x #=> Integer | String
# ...
foo.update_x
In the above example, an Integer is assigned to the instance variable @x in Foo#initialize.
Foo#get_x reads @x and returns an Integer once.
However, when Foo#update_x is called, the abstract value of the instance variable @x is expanded to Integer | String.
Therefore, reading @x should have returned a Integer | String instead of a simple Integer, and the access to @x in Foo#get_x restarts to return Integer | String, i.e., retroactively executed again.
Therefore, the return value of the call to Foo#get_x will eventually be Integer | String.
Method call
TypeProf does not keep track of the call stack. In other words, there is no concept of "caller" during the execution of the method. Instead, when a method returns, it returns the abstract value to all possible places that may invoke to the method.
def fib(n)
if n <2
return n
else
fib(n-1) + fib(n-2)
end
end
p fib(10) #=> Integer
In the above example, the method fib is called from three places (including recursive calls).
When return n is executed, TypeProf returns an Integer to all three places.
Note that, in Ruby, we cannot statically identify all places that may call to the method (because it depends upon the type of receiver).
Therefore, if TypeProf finds a new call to fib after return n is executed, the call also returns an Integer immediately.
If a method returns different abstract values, it can lead to retrospective execution.
Stub execution
Even after TypeProf traced all programs as possible, there may be methods or blocks that aren't executed.
For example, a method is not executed if it is called from nowhere; this is typical for library method that has no test.
(Basically, when you use TypeProf, it is recommended to invoke all methods with supposed argument types.)
TypeProf forcibly calls these unreachable methods and blocks with untyped as arguments.
def foo(n)
end
def (n)
foo(1)
end
In the above program, neither the method foo nor the method bar is called.
TypeProf stub-calls the bar with a untyped argument, so you can get the information that an Integer is passed to a method foo.
However, this feature may slow down the analysis and may also brings many wrong guesses, so we plan to allow a user to enable/disable this feature in the configuration.
Limitations
Some Ruby language features cannot be handled because they abstract values.
Basically, it ignores language features whose object identity is important, such as singleton methods for general objects. Note that class method definitions are handled correctly; class objects are not abstracted for the sake. Currently, TypeProf only handles instance methods and class methods; it has no general concept of metaclasses (a class of a class).
Meta programming is only partially supported.
Module#attr_readerandObject#sendhandle correctly only when symbol abstract value is passed (for example, when written in a symbol literal).Kernel#instance_evalonly supports the function to replace the receiver object when a block is passed (the contents of the string are not tracked).Class.newis not supported; it always returnsuntyped.Kernel#requirehas a dedicated support only when the argument string is a literal.
Other features
Partial RBS specification
Sometimes, TypeProf fails to correctly infer the programmer's intent due to theoretical or implementation limitations. In such cases, you can manually write a RBS description for some difficult methods to convey your intent to TypeProf.
For example, TypeProf does not handle a overloaded method.
# Programmer Intent: (Integer) -> Integer | (String) -> String
# TypeProf : (Integer | String) -> (Integer | String)
def foo(n)
if n.is_a?(Integer)
42
else
"str"
end
end
# Overload intent not respected
p foo(42) #=> Integer | String
p foo("str") #=> Integer | String
Assume that a programmer write the method foo as a overloaded method that returns an Integer only when an Integer is passed, and that returns a String only when a String is passed.
Thus, we expect the result of foo(42) to be an Integer. However, it's a bit wider result, Integer | String.
If you write the RBS manually to specify the intention of the method foo, the result will be as intended.
# test.rbs
class Object
def foo: (Integer) -> Integer | (String) -> String
end
# test.rb
def foo(n)
# Regardless of the contents, the description of test.rbs has priority
end
# Overload is respected correctly
p foo(42) #=> Integer
p foo("str") #=> String
Many of the built-in class methods are also specified by RBS. We plan a feature to load all RBS files of libraries required in Gemfile (but not implemented yet).
RBS's "interface" type is not supported and is treated as untyped.
Debug feature
Unfortunately, understanding the behavior and analysis results of TypeProf is sometimes difficult.
Currently, you can observe the abstract value of the argument by calling Kernel#p in your code, as if you debug your program in Ruby.
The only way to get a deeper understanding of the analysis is to watch the debug output with the environment variable TP_DEBUG=1.
We plan to provide some more useful way to make it easy to understand the analysis result in the future.
Flow-sensitive analysis
TypeProf attempts to separate branches if the condition separates a union abstract value.
For example, consider that a local variable var has an abstract value Foo | Bar, and that a branch condition is var.is_a?(Foo).
TypeProf will execute the "then" clause with var as only a Foo, and does the "else" clause with var as only a Bar.
Note that it can work well only if the receiver is a local variable defined in the current scope.
If the condition is about an instance variable, say @var.is_a?(Foo), or if the variable var is defined outside the block, the union is not separated.
At present, only the following simple patterns (is_a?, respond_to?, and case/when) can be handled well.
def foo(x)
if x.is_a?(Integer)
p x #=> Integer
else
p x #=> String
end
end
foo(42)
foo("str")
def foo(x)
if x.respond_to?(:times)
p x #=> Integer
else
p x #=> String
end
end
foo(42)
foo("str")
def foo(x)
case x
when Integer
p x #=> Integer
when String
p x #=> String
end
end
foo(42)
foo("str")
Container type
At present, only Array-like containers (Array and Enumerator) and Hash-like containers (Hash) are supported.
TypeProf keeps the object identity inside a method; the container instances are identified by the place where it is created. You can update the types; this allows the following code to initialize the array:
def foo
a = []
100.times {|n| a << n.to_s}
a
end
p foo #=> Array[String]
However, we do not track updates across methods (due to performance reasons).
def (a)
a << "str"
end
def foo
a = []
(a)
a
end
foo #=> [], not Array[String]
When a container abstract value is read from an instance variable, an update operation against it will be respected to the instance variable.
Currently, TypeProf has some limitations about container instances (because of performance).
- If you put a container type into a key of hash object, the key is replaced with
untyped. - The maximum depth of nested arrays and hashes is limited to 5.
We plan to allow them to be configurable, and relax the depth limitation when RBS is manually speficied (mainly for JSON data).
(Write later)
- Proc
- Struct