Binary Searching
Each of these methods returns an enumerator if no block is given. Given a block, each of these methods
returns an element (or element index) from self as determined by a binary search. The search finds an
element of self which meets the given condition in O(log n) operations, where n is the count of
elements. self should be sorted, but this is not checked.
There are two search modes:
Find-minimum mode: Method bsearch returns the first element for which the block returns true; the block
must return true or false.
Find-any mode: Method bsearch returns some element, if any, for which the block returns zero. The block must
return a numeric value.
The block should not mix the modes by sometimes returning true or false and
other times returning a numeric value, but this is not checked.
Find-Minimum Mode: In find-minimum mode, the block must return true or false. The further requirement
(though not checked) is that there are no indexes i and j such that:
0 <= i < j <= self.size.
The block returns true for self[i] and false for self[j].
Less formally: the block is such that all false-evaluating elements precede all true-evaluating elements.
In find-minimum mode, method bsearch returns the first element for which the block returns true.
Examples:
a = [0, 4, 7, 10, 12] # sample sorted array, consistent with the results shown below
r = (0...a.size)
r.bsearch {|i| a[i] >= 4 } #=> 1
r.bsearch {|i| a[i] >= 6 } #=> 2
r.bsearch {|i| a[i] >= 8 } #=> 3
r.bsearch {|i| a[i] >= 100 } #=> nil
r = (0.0...Float::INFINITY)
r.bsearch {|x| Math.log(x) >= 0 } #=> 1.0
Find-Any Mode: In find-any mode, the block must return a numeric value. The further requirement (though
not checked) is that there are no indexes i and j such that: 0 <= i < j <= self.size.
The block returns a negative value for self[i] and a positive value for self[j].
The block returns a negative value for self[i] and zero for self[j].
The block returns zero for self[i] and a positive value for self[j].
Examples:
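A minimal sketch of find-any mode, assuming a small sorted sample array (the array and variable names are illustrative, not taken from the text above). The block returns a positive value for elements that lie before the target, zero for a match, and a negative value for elements that lie after it:

```ruby
# Sample sorted array (an assumption made for this illustration).
a = [0, 4, 7, 10, 12]

# The spaceship operator gives 1, 0, or -1, which fits find-any mode.
found = a.bsearch { |element| 7 <=> element }    # => 7

# No element makes the block return zero, so the result is nil.
missing = a.bsearch { |element| 5 <=> element }  # => nil

# Several elements (4 and 7) make the block return zero;
# find-any mode may return either of them.
in_range = a.bsearch { |element| 1 - element / 4 }

p found, missing, in_range
```

Note that when more than one element satisfies the block, the caller should not rely on which one is returned.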
Crash Bugs With 3rd Party C Extensions: If the crash happens inside a 3rd party C extension, try to figure
out inside which C extension it happens, and add a note to the issue to report the issue to that C extension,
and set the status to Third Party’s Issue.
Non-Bug reports: Any issues in the bug tracker that are not reports of problems should have the tracker
changed from Bug to either Feature (new features or performance improvements) or Misc. This change can
be made without adding a note.
Stale Issues: There are many issues that are stale, with no updates in months or even years. For stale
issues in Feedback state, where the feedback has not been received, you can change the status to Closed
without adding a note. For stale issues in Assigned state, you can reach out to the assignee and see if they
can update the issue. If the assignee is no longer an active committer, remove them as the assignee and
change the status to Open.
In Symbol:
Symbol#capitalize
Symbol#casecmp
Symbol#casecmp?
Symbol#downcase
Symbol#swapcase
Symbol#upcase
Default Case Mapping: By default, all of these methods use full Unicode case mapping, which is suitable for
most languages. See Section 3.13 (Default Case Algorithms) of the Unicode standard. Non-ASCII case
mapping and folding are supported for UTF-8, UTF-16BE/LE, UTF-32BE/LE, and ISO-8859-1~16 Strings
/Symbols.
Context-dependent case mapping as described in Table 3-17 (Context Specification for Casing) of the
Unicode standard is currently not supported.
In most cases, case conversions of a string have the same number of characters. There are exceptions
(see also :fold below):
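As an illustration of such an exception, the following sketch upcases the German sharp s; the sample character is chosen here for illustration:

```ruby
# "\u00DF" is LATIN SMALL LETTER SHARP S ("ß").
s = "\u00DF"

# Full Unicode case mapping turns one character into two.
up = s.upcase  # => "SS"

puts "#{s.length} character(s) before, #{up.length} after"
```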
Case mapping may also depend on locale (see also :turkic below):
Case-changing methods may not maintain Unicode normalization (see String#unicode_normalize).
:turkic: Full Unicode case mapping, adapted for the Turkic languages that distinguish dotted and
dotless I, for example Turkish and Azeri.
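A short sketch of the locale dependence described above, comparing the default mapping with the :turkic option (variable names are illustrative):

```ruby
upper_i = "\u0049"  # "I"
lower_i = "\u0069"  # "i"

# Default mapping: "I" becomes the dotted lowercase "i".
default_down = upper_i.downcase

# Turkic mapping: "I" becomes the dotless "ı" (U+0131).
turkic_down = upper_i.downcase(:turkic)

# Turkic mapping: "i" becomes the dotted capital "İ" (U+0130).
turkic_up = lower_i.upcase(:turkic)

p default_down, turkic_down, turkic_up
```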
Character Selector: A character selector is a string argument accepted by certain Ruby methods. Each of
these instance methods accepts one or more character selectors:
String#tr(selector, replacements): returns a new string.
String#tr!(selector, replacements): returns self or nil.
String#tr_s(selector, replacements): returns a new string.
String#tr_s!(selector, replacements): returns self or nil.
String#count(*selectors): returns the count of the specified characters.
String#delete(*selectors): returns a new string.
String#delete!(*selectors): returns self or nil.
String#squeeze(*selectors): returns a new string.
String#squeeze!(*selectors): returns self or nil.
A character selector identifies zero or more characters in self that are to be operands for the method.
In this section, we illustrate using method String#delete(selector), which deletes the selected characters.
In the simplest case, the characters selected are exactly those contained in the selector itself:
A hyphen ('-') between two other characters defines a range of characters instead of a plain string
of characters:
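A brief sketch of both forms, using a sample string chosen for this illustration:

```ruby
s = 'abracadabra'

# Plain selectors: delete exactly the characters listed.
only_a  = s.delete('a')   # => "brcdbr"
a_and_b = s.delete('ab')  # => "rcdr"

# Range selector: 'a-c' selects a, b, and c.
a_to_c = s.delete('a-c')  # => "rdr"

p only_a, a_and_b, a_to_c
```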
In effect, the given selectors are formed into a single selector consisting of only those characters common
to all of the given selectors. All forms of selectors may be used, including negations, ranges, and escapes.
Each of these pairs of method calls is equivalent:
s.delete('abcde', 'dcbfg')
s.delete('bcd')
s.delete('^abc', '^def')
s.delete('^abcdef')
s.delete('a-e', 'c-g')
s.delete('cde')
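The equivalences listed above can be checked with a small script. This is only a sketch; the sample string is an assumption made for the illustration:

```ruby
s = 'abcdefghij'

# Each pair holds the multi-selector call and its single-selector equivalent.
pairs = [
  [s.delete('abcde', 'dcbfg'), s.delete('bcd')],
  [s.delete('^abc', '^def'),   s.delete('^abcdef')],
  [s.delete('a-e', 'c-g'),     s.delete('cde')]
]

pairs.each do |multi, single|
  puts "#{multi.inspect} == #{single.inspect}: #{multi == single}"
end
```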
Command Injection: Some Ruby core methods accept string data that includes text to be executed as a
system command. They should not be called with unknown or unsanitized commands. These methods
include:
Kernel.system
`command` (backtick method) (also called by the expression %x[command]).
IO.popen(command).
IO.read(command).
IO.write(command).
IO.binread(command).
IO.binwrite(command).
IO.readlines(command).
IO.foreach(command).
Note that some of these methods do not execute commands when called from subclass File:
File.read(path). File.binread(path). File.readlines(path).
File.write(path). File.binwrite(path). File.foreach(path).
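One common way to reduce the risk is to avoid the shell altogether by passing the program name and its arguments as separate values. The sketch below assumes a POSIX-like system with an echo executable; the untrusted value is hypothetical:

```ruby
# Hypothetical untrusted input that tries to smuggle in a second command.
untrusted = 'notes.txt; echo INJECTED'

# Risky (shown only as a comment): a single string is handed to the shell,
# so the text after ';' would run as a separate command.
#   system("echo #{untrusted}")

# Safer: the array form runs echo directly, without a shell, so the
# whole value reaches echo as one literal argument.
output = IO.popen(['echo', untrusted], &:read)

puts output
```

The same idea applies to Kernel.system, which also accepts the program and its arguments as separate values.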
Contributing to Ruby: This guide outlines ways to get started with contributing to Ruby:
Reporting issues: How to report issues, how to request features, and how backporting works
Building Ruby: How to build Ruby on your local machine for development
Testing Ruby: How to test Ruby on your local machine once you’ve built it
Making changes to Ruby: How to submit pull requests to change Ruby’s documentation, code, test
suite, or standard libraries
Making changes to Ruby standard libraries: How to build, test, and contribute to Ruby standard
libraries
5. Create a build directory outside of the source directory: mkdir build && cd build
While it's not necessary to build in a separate directory, it's good practice to do so.
o If you are frequently building Ruby, add the --disable-install-doc flag to not build
documentation which will speed up the build process.
8. Build Ruby: make install
o If you're on macOS and installed OpenSSL through Homebrew, you may encounter a failure to
build the OpenSSL extension that looks like this:
openssl:
ruby/ext/openssl/extconf.rb: OpenSSL library could not be found. You might want to use
--with-openssl-dir=<dir> option to specify the prefix where OpenSSL is installed.
Adding --with-openssl-dir=$(brew --prefix openssl) to the list of options passed to configure may solve the
issue.
Remember to delete your build directory and start again from the configure step.
If you are having unexplainable build errors, after saving all your work, try running git clean -xfd in the source
root to remove all git ignored local files. If you are working from a source directory that's been updated
several times, you may have temporary build artifacts from previous releases which can cause build
failures.
More details
If you're interested in continuing development on Ruby, here are more details about Ruby's build to help out.
We can also set MAKEFLAGS to run all make commands in parallel. Having the right --jobs flag will ensure
all processors are utilized when building software projects. To do this effectively, you can set MAKEFLAGS
in your shell configuration/profile:
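The exact profile line depends on your shell. As a hedged illustration, this Ruby sketch derives a job count from the number of processors and prints a line in POSIX shell syntax; using one job per processor is an assumption, not a recommendation from this guide:

```ruby
require 'etc'

# Number of logical processors reported by the operating system.
jobs = Etc.nprocessors

# Value that could be assigned to MAKEFLAGS (one job per processor is an assumption).
makeflags = "--jobs=#{jobs}"

# Print a line suitable for a POSIX-style shell profile.
puts %(export MAKEFLAGS="#{makeflags}")
```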
Debugging: You can use either lldb or gdb for debugging. Before debugging, you need to create
a test.rb with the Ruby script you’d like to run. You can use the following make targets:
./autogen.sh
mkdir build && cd build
export ASAN_OPTIONS="halt_on_error=0:use_sigaltstack=0:detect_leaks=0"
../configure cppflags="-fsanitize=address -fno-omit-frame-pointer" optflags=-O0 LDFLAGS="-fsanitize=address -fno-omit-frame-pointer"
make
On Linux it is important to specify -O0 when debugging. This is especially true for ASAN which sometimes
works incorrectly at higher optimization levels.
If you need only C code coverage, you can remove COVERAGE=true from the above process. You can
also use gcov command directly to get per-file coverage.
If you need only Ruby code coverage, you can remove --enable-gcov. Note that
test-coverage.dat accumulates all runs of make test-all. Make sure that you remove the file if you want to
measure one test run. You can see the coverage result of CI: rubyci.org/coverage
Documentation Guide: This guide discusses recommendations for documenting classes, modules, and
methods in the Ruby core and in the Ruby standard library.
Generating documentation
Most Ruby documentation lives in the source files and is written in RDoc format. Some pages live under
the doc folder and can be written in either .rdoc or .md format, determined by the file extension.
To generate the output of documentation changes in HTML in the {build folder}/.ext/html directory, run the
following inside your build directory: make html
Then you can preview your changes by opening {build folder}/.ext/html/index.html file in your browser.
Goal: The goal of Ruby documentation is to impart the most important and relevant information in the shortest time. The
reader should be able to quickly understand the usefulness of the subject code and how to use it.
Providing too little information is bad, but providing unimportant information or unnecessary examples is not
good either. Use your judgment about what the user needs to know.
General Guidelines
Keep in mind that the reader may not be fluent in English.
Write short declarative or imperative sentences.
Group sentences into (ideally short) paragraphs, each covering a single topic.
Organize material with headers.
Refer to authoritative and relevant sources using links.
Use simple verb tenses: simple present, simple past, simple future.
Use simple sentence structure, not compound or complex structure.
Avoid:
o Excessive comma-separated phrases; consider a list.
o Idioms and culture-specific references.
o Overuse of headers.
o Using US-ASCII-incompatible characters in C source files; see Characters below.
Characters
Use only US-ASCII-compatible characters in a C source file. (If you use other characters, the Ruby CI will
gently let you know.) If you want to put ASCII-incompatible characters into the documentation for a C-coded
class, module, or method, there are workarounds involving new files doc/*.rdoc:
For class Foo (defined in file foo.c), create file doc/foo.rdoc, declare class Foo; end, and place
the class documentation above that declaration:
# Documentation for class Foo goes here.
class Foo; end
Similarly, for module Bar (defined in file bar.c), create file doc/bar.rdoc, declare module Bar; end,
and place the module documentation above that declaration:
# Documentation for module Bar goes here.
module Bar; end
For a method, things are different. Documenting a method as above disables the "click to toggle
source" feature in the rendered documentation.
Therefore it's best to use file inclusion:
o Retain the call-seq in the C code.
o Use file inclusion (:include:) to include text from an .rdoc file.
Example:
/*
 * call-seq:
 *   each_byte {|byte| ... } -> self
 *   each_byte -> enumerator
 *
 * :include: doc/string/each_byte.rdoc
 *
 */
RDoc: Ruby is documented using RDoc. For information on RDoc syntax and features, see the RDoc
Markup Reference.
Output from irb: For code examples, consider using interactive Ruby, irb. For a code example that
includes irb output, consider aligning # => ... in successive lines. Alignment may sometimes aid readability:
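A small illustration of aligned output comments, using a sample array chosen for this sketch:

```ruby
a = [3, 1, 2]   # => [3, 1, 2]
a.sort!         # => [1, 2, 3]
a.first(2)      # => [1, 2]
a.sum           # => 6
```

With the comments lined up in one column, the results can be scanned without reading the code on each line.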
Blank Lines: A blank line begins a new paragraph. A code block or list should be preceded by and followed
by a blank line. This is unnecessary for the HTML output, but helps in the ri output.
HTML Tags: In general, avoid using HTML tags (even in formats where it’s allowed) because ri (the Ruby
Interactive reference tool) may not render them properly.
Tables: In particular, avoid building tables with HTML tags (<table>, etc.). Alternatives are:
The GFM (GitHub Flavored Markdown) table extension, which is enabled by default. See GFM
tables extension.
A verbatim text block, using spaces and punctuation to format the text. Note that text markup will
not be honored.
Documenting Classes and Modules: The general structure of the class or module documentation should be:
Synopsis
Common uses, with examples
“What’s Here” summary (optional)
Synopsis: The synopsis is a short description of what the class or module does and why the reader might
want to use it. Avoid details in the synopsis.
Common Uses: Show common uses of the class or module. Depending on the class or module, this section
may vary greatly in both length and complexity.
“What’s Here” Summary: The documentation for a class or module may include a “What’s Here” section.
Guidelines:
The section title is What's Here.
Consider listing the parent class and any included modules; consider links to their "What's Here"
sections if those exist.
List methods as a bullet list:
o Begin each item with the method name, followed by a colon and a short description.
o If the method has aliases, mention them in parentheses before the colon (and do not list
the aliases separately).
o Check the rendered documentation to determine whether RDoc has recognized the
method and linked to it; if not, manually insert a link.
If there are numerous entries, consider grouping them into subsections with headers.
If there are more than a few such subsections, consider adding a table of contents just below the
main section title.
Example:
* call-seq:
* Hash.new(default_value = nil) -> new_hash
* Hash.new {|hash, key| ... } -> new_hash
For an instance method, use the form (omitting any prefix, just as RDoc does for a Ruby-coded method):
* call-seq:
* count -> integer
* count(obj) -> integer
* count {|element| ... } -> integer
Arguments:
If the method does not accept arguments, omit the parentheses.
If the method accepts optional arguments:
o Separate each argument name and its default value with = (equal-sign with surrounding
spaces).
o If the method has the same behavior with either an omitted or an explicit argument, use
a call-seq with optional arguments. For example, use:
Block:
If the method does not accept a block, omit the block.
If the method accepts a block, the call-seq should have {|args| ... }, not {|args| block } or {|args|
code }.
Return types:
If the method can return multiple different types, separate the types with “or” and, if necessary,
commas.
If the method can return many different types (too many to list), use object.
If the method returns the receiver, use self.
If the method returns an object of the same class, prefix new_ if and only if the object is not self;
example: new_array.
Aliases:
Omit aliases from the call-seq, but mention them near the end (see below).
Synopsis: The synopsis comes next, and is a short description of what the method does and why you would
want to use it. Ideally, this is a single sentence, but for more complex methods it may require an entire
paragraph.
For Array#count, the synopsis is:
This is great as it is short and descriptive. Avoid documenting too much in the synopsis; stick to the most
important information for the benefit of the reader.
Details and Examples: Most non-trivial methods benefit from examples, as well as details beyond what is
given in the synopsis. In the details and examples section, you can document how the method handles
different types of arguments, and provide examples on proper usage. In this section, focus on how to use
the method properly, not on how the method handles improper arguments or corner cases.
Not every behavior of a method requires an example. If the method is documented to return self, you don’t
need to provide an example showing the return value is the same as the receiver. If the method is
documented to return nil, you don’t need to provide an example showing that it returns nil. If the details
mention that for a certain argument type, an empty array is returned, you don’t need to provide an example
for that.
Only add an example if it provides the user additional information; do not add an example if it provides the
same information given in the synopsis or details. The purpose of examples is not to prove what the details
are stating.
Argument Description (if necessary): For methods that require arguments, if not obvious and not explicitly
mentioned in the details or implicitly shown in the examples, you can provide details about the types of
arguments supported. When discussing the types of arguments, use simple language even if less-precise,
such as "level must be an integer", not "level must be an Integer-convertible object". The vast majority of
use will be with the expected type, not an argument that is explicitly convertible to the expected type, and
documenting the difference is not important.
For methods that take blocks, it can be useful to document the type of argument passed if it is not obvious,
not explicitly mentioned in the details, and not implicitly shown in the examples.
If there is more than one argument or block argument, use a labeled list.
Corner Cases and Exceptions: For corner cases of methods, such as atypical usage, briefly mention the
behavior, but do not provide any examples.
Only document exceptions raised if they are not obvious. For example, if you have stated earlier that an
argument type must be an integer, you do not need to document that a TypeError is raised if a non-integer
is passed. Do not provide examples of exceptions being raised unless that is a common case, such
as Hash#fetch raising a KeyError.
Aliases
Mention aliases in the form
Related Methods (optional): In some cases, it is useful to document which methods are related to the
current method. For example, documentation for Hash#[] might mention Hash#fetch as a related method,
and Hash#merge might mention Hash#merge! as a related method.
Consider which methods may be related to the current method, and if you think the reader would
benefit from it, at the end of the method documentation, add a line starting with "Related: " (e.g.
"Related: fetch.").
Don't list more than three related methods. If you think more than three methods are related, list
the three you think are most important.
Consider adding:
o A phrase suggesting how the related method is similar to, or different from, the current
method. See an example at Time#getutc.
o Example code that illustrates the similarities and differences. See examples
at Time#ctime, Time#inspect, Time#to_s.
Methods Accepting Multiple Argument Types: For methods that accept multiple argument types, in some
cases it can be useful to document the different argument types separately. It's best to use a separate
paragraph for each case you are discussing.
Commit messages
Use the following style for commit messages:
Use a succinct subject line
Include reasoning behind the change in the commit message, focusing on why the change is being
made
Refer to the issue (such as Fixes [Bug #1234] or Implements [Feature #3456]), or discussion on the
mailing list (such as [ruby-core:12345])
CI: GitHub Actions will run on each pull request. There is a CI that runs on master. It has broad coverage of
different systems and architectures, such as Solaris SPARC and macOS.
Making Changes To Standard Libraries: Everything in the lib directory is mirrored from a standalone
repository into the Ruby repository. If you’d like to make contributions to standard libraries, do so in the
standalone repositories, and the changes will be automatically mirrored into the Ruby repository.
For example, CSV lives in a separate repository and is mirrored into Ruby.
bundle install
Libraries with C-extension: If the library has a /ext directory, it has C files that you need to compile with:
bundle exec rake compile
Running tests: All standard libraries use test-unit as the test framework. To run all tests:
Reporting bugs: If you’ve encountered a bug in Ruby, please report it to the Redmine issue tracker available
at bugs.ruby-lang.org, by following these steps:
Check if anyone has already reported your issue by searching the Redmine issue tracker.
If you haven’t already, sign up for an account on the Redmine issue tracker.
If you can’t find a ticket addressing your issue, please create a new issue. You will need to fill in
the subject, description and Ruby version.
o Ensure the issue exists on Ruby master by trying to replicate your bug on the head of
master (see "making changes to Ruby").
o Write a concise subject and briefly describe your problem in the description section. If
your issue affects a released version of Ruby, please say so.
o Fill in the Ruby version you're using when experiencing this issue (the output of
running ruby -v).
o Attach any logs or reproducible programs to provide additional information. Any scripts
should be as small as possible.
If the ticket doesn’t have any replies after 10 days, you can send a reminder.
Please reply to feedback requests. If a bug report doesn't get any feedback, it'll eventually get
rejected.
Reporting website issues: If you’re having an issue with the bug tracker or the mailing list, you can contact
the webmaster, Hiroshi SHIBATA ([email protected]). You can report issues with ruby-lang.org on
the repo's issue tracker.
Requesting features: If there’s a new feature that you want to see added to Ruby, you will need to write a
proposal on the Redmine issue tracker. When you open the issue, select Feature in the Tracker dropdown.
When writing a proposal, be sure to check for previous discussions on the topic and have a solid use case.
You should also consider the potential compatibility issues that this new feature might raise. Consider
making your feature into a gem, and if there are enough people who benefit from your feature it could help
persuade Ruby core.
Here is a template you can use for a feature proposal:
Backport requests: If a bug exists in a released version of Ruby, please report this in the issue. Once this
bug is fixed, the fix can be backported if deemed necessary. Only Ruby committers can request
backporting, and backporting is done by the backport manager. New patch versions are released at the
discretion of the backport manager.
Ruby versions can be in one of three maintenance states:
Stable releases: backport any bug fixes
Security maintenance: only backport security fixes
End of life: no backports, please upgrade your Ruby version
Testing Ruby - Test suites: There are several test suites in the Ruby codebase. We can run any of the make
scripts in parallel to speed them up.
1. bootstraptest/
This is a small test suite that runs on Miniruby (see building Ruby). We can run it with:
make btest
To run it with logs, we can use:
To run individual bootstrap tests, we can either specify a list of filenames or use the --sets flag in
the variable BTESTS:
If we want to run the bootstrap test suite on Ruby (not Miniruby), we can use:
make test
make test/ruby/test_foo.rb
make test/ruby/test_foo.rb TESTOPTS="-n /test_bar/"
2. test/
This is a more comprehensive test suite that runs on Ruby. We can run it with:
make test-all
We can run a specific test directory in this suite using the TESTS option, for example:
We can run a specific test file in this suite by also using the TESTS option, for example:
We can run a specific test in this suite using the TESTS option, specifying first the file name, and
then the test name, prefixed with --name. For example:
make check
3. spec/ruby
This is a test suite that exists in the Ruby spec repository and is mirrored into
the spec/ruby directory in the Ruby repository. It tests the behavior of the Ruby programming
language. We can run this using:
make test-spec
To run a specific file, we can also use MSPECOPT to specify the file:
To run a specific test, we can use the --example flag to match against the test name:
make spec/ruby/core/foo/bar_spec.rb
4. spec/bundler
The bundler test suite exists in the RubyGems repository and is mirrored into
the spec/bundler directory in the Ruby repository. We can run this using:
make test-bundler
item = {
id: "0001",
type: "donut",
name: "Cake",
ppu: 0.55,
batters: {
batter: [
{id: "1001", type: "Regular"},
{id: "1002", type: "Chocolate"},
{id: "1003", type: "Blueberry"},
{id: "1004", type: "Devil's Food"}
]
},
topping: [
{id: "5001", type: "None"},
{id: "5002", type: "Glazed"},
{id: "5005", type: "Sugar"},
{id: "5007", type: "Powdered Sugar"},
{id: "5006", type: "Chocolate with Sprinkles"},
{id: "5003", type: "Chocolate"},
{id: "5004", type: "Maple"}
]
}
Without a dig method, you can write, erroneously (raises NoMethodError (undefined method `[]' for
nil:NilClass)):
item[:batters][:BATTER][1][:type]
With a dig method, you can write (still erroneously, but avoiding the exception):
item.dig(:batters, :BATTER, 1, :type) # => nil
A dig method raises an exception if any receiver does not respond to #dig:
h = { foo: 1 }
# Raises TypeError (Integer does not have #dig method):
h.dig(:foo, :bar)
What Else?: The structure above has Hash objects and Array objects, both of which have instance
method dig. Altogether there are six built-in Ruby classes that have method dig, three in the core classes
and three in the standard library. In the core:
Array#dig: the first argument is an Integer index.
Hash#dig: the first argument is a key.
Struct#dig: the first argument is a key.
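A minimal sketch showing the three core dig methods working together in one call chain. The struct and data below are hypothetical and exist only for this illustration:

```ruby
# Hypothetical Struct class used only in this sketch.
point = Struct.new(:x, :y)

shape = {
  name: 'segment',
  points: [point.new(1, 2), point.new(3, 4)]
}

# Hash#dig takes a key, Array#dig an index, Struct#dig a member name.
last_y = shape.dig(:points, 1, :y)   # => 4

# An out-of-range index yields nil, which ends the chain without an exception.
missing = shape.dig(:points, 5, :y)  # => nil

p last_y, missing
```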
provider:module:function:name(arguments)
Since module and function cannot be specified, they will be blank. An example probe definition for Ruby
would then be:
ruby:::method-entry(class name, method name, file name, line number)
Where “ruby” is the provider name, module and function names are blank, the probe name is “method-
entry”, and the probe takes four arguments:
class name
method name
file name
line number
Probes List
Stability
Before we list the specific probes, let’s talk about stability. Probe stability is declared in the probes.d file at
the bottom on the pragma D attributes lines. Here is a description of each of the stability declarations.
Provider name stability
The provider name of “ruby” has been declared as stable. It is unlikely that we will change the
provider name from “ruby” to something else.
Module and Function stability
Since we are not allowed to provide values for the module and function name, the values we
have provided (no value) are declared as stable.
Probe name stability
The probe names are likely to change in the future, so they are marked as “Evolving”.
Consumers should not depend on these names to be stable.
Probe argument stability
The parameters passed to the probes are likely to change in the future, so they are marked as
“Evolving”. Consumers should not depend on these to be stable.
Declared probes
Probes are defined in the probes.d file. Here are the declared probes along with when they are fired and the
arguments they take:
ruby:::method-entry(classname, methodname, filename, lineno);
This probe is fired just before a method is entered.
classname
name of the class (a string)
methodname
name of the method about to be executed (a string)
filename
the file name where the method is _being called_ (a string)
lineno
the line number where the method is _being called_ (an int)
NOTE: will only be fired if tracing is enabled, e.g. with: TracePoint.new{}.enable.
See Feature#14104 for more details.
ruby:::method-return(classname, methodname, filename, lineno);
This probe is fired just after a method has returned. The arguments are the same as
“ruby:::method-entry”.
NOTE: will only be fired if tracing is enabled, e.g. with: TracePoint.new{}.enable.
See Feature#14104 for more details.
ruby:::cmethod-entry(classname, methodname, filename, lineno);
This probe is fired just before a C method is entered. The arguments are the same as
“ruby:::method-entry”.
ruby:::cmethod-return(classname, methodname, filename, lineno);
This probe is fired just before a C method returns. The arguments are the same as
“ruby:::method-entry”.
ruby:::require-entry(requiredfile, filename, lineno);
This probe is fired on calls to rb_require_safe (when a file is required).
requiredfile
the name of the file to be required (string).
filename
the file that called “require” (string).
lineno
the line number where the call to require was made (int).
ruby:::require-return(requiredfile, filename, lineno);
This probe is fired just before rb_require_safe (when a file is required) returns. The arguments
are the same as “ruby:::require-entry”. This probe will not fire if there was an exception during file
require.
ruby:::find-require-entry(requiredfile, filename, lineno);
This probe is fired right before search_required is called. search_required determines whether
the file has already been required by searching loaded features ($"), and if not, figures out which
file must be loaded.
requiredfile
the file to be required (string).
filename
the file that called “require” (string).
lineno
the line number where the call to require was made (int).
ruby:::find-require-return(requiredfile, filename, lineno);
This probe is fired right after search_required returns. See the documentation for “ruby:::find-
require-entry” for more details. Arguments for this probe are the same as “ruby:::find-require-
entry”.
ruby:::load-entry(loadedfile, filename, lineno);
This probe is fired when calls to “load” are made. The arguments are the same as “ruby:::require-
entry”.
ruby:::load-return(loadedfile, filename, lineno);
This probe is fired when “load” returns. The arguments are the same as “ruby:::load-entry”.
ruby:::raise(classname, filename, lineno);
This probe is fired when an exception is raised.
classname
the class name of the raised exception (string)
filename
the name of the file where the exception was raised (string)
lineno
the line number in the file where the exception was raised (int)
ruby:::object-create(classname, filename, lineno);
This probe is fired when an object is about to be allocated.
classname
the class of the allocated object (string)
filename
the name of the file where the object is allocated (string)
lineno
the line number in the file where the object is allocated (int)
ruby:::array-create(length, filename, lineno);
This probe is fired when an Array is about to be allocated.
length
the size of the array (long)
filename
the name of the file where the array is allocated (string)
lineno
the line number in the file where the array is allocated (int)
ruby:::hash-create(length, filename, lineno);
This probe is fired when a Hash is about to be allocated.
length
the size of the hash (long)
filename
the name of the file where the hash is allocated (string)
lineno
the line number in the file where the hash is allocated (int)
ruby:::string-create(length, filename, lineno);
This probe is fired when a String is about to be allocated.
length
the size of the string (long)
filename
the name of the file where the string is allocated (string)
lineno
the line number in the file where the string is allocated (int)
ruby:::symbol-create(str, filename, lineno);
This probe is fired when a Symbol is about to be allocated.
str
the contents of the symbol (string)
filename
the name of the file where the symbol is allocated (string)
lineno
the line number in the file where the symbol is allocated (int)
ruby:::parse-begin(sourcefile, lineno);
Fired just before parsing and compiling a source file.
sourcefile
the file being parsed (string)
lineno
the line number where the source starts (int)
ruby:::parse-end(sourcefile, lineno);
Fired just after parsing and compiling a source file.
sourcefile
the file being parsed (string)
lineno
the line number where the source ended (int)
ruby:::gc-mark-begin();
Fired at the beginning of a mark phase.
ruby:::gc-mark-end();
Fired at the end of a mark phase.
ruby:::gc-sweep-begin();
Fired at the beginning of a sweep phase.
ruby:::gc-sweep-end();
Fired at the end of a sweep phase.
ruby:::method-cache-clear(class, sourcefile, lineno);
Fired when the method cache is cleared.
class
the classname being cleared, or “global” (string)
sourcefile
the source file being executed when the cache is cleared (string)
lineno
the line number being executed when the cache is cleared (int)
Encodings
The Basics
A character encoding, often shortened to encoding, is a mapping between:
A sequence of 8-bit bytes (each byte in the range 0..255).
Characters in a specific character set.
Some character sets contain only 1-byte characters; US-ASCII, for example, has 128 1-byte characters.
This string, encoded in US-ASCII, has six characters that are stored as six bytes:
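As a minimal sketch (the literal 'Hello!' is an assumed example, not taken from the original text), such a string can be built and inspected like this:

```ruby
s = 'Hello!'.encode('US-ASCII')
s.encoding == Encoding::US_ASCII # => true
s.size                           # => 6 (characters)
s.bytesize                       # => 6 (bytes)
s.bytes                          # => [72, 101, 108, 108, 111, 33]
```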
Other encodings may involve multi-byte characters. UTF-8, for example, encodes more than one million
characters, encoding each in one to four bytes. The lowest-valued of these characters correspond to ASCII
characters, and so are 1-byte characters:
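The byte counts can be checked directly. This sketch uses arbitrarily chosen sample characters (an assumption for illustration) and relies on \u escapes, which always produce UTF-8 strings:

```ruby
'Hello!'.bytes    # => [72, 101, 108, 108, 111, 33]  (ASCII range: 1 byte each)
"\u00E9".bytes    # => [195, 169]                    (2 bytes)
"\u20AC".bytes    # => [226, 130, 172]               (3 bytes)
"\u{1F600}".bytes # => [240, 159, 152, 128]          (4 bytes)
```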
Encoding Objects
Ruby encodings are defined by constants in class Encoding. There can be only one instance of Encoding
for each of these constants. Method Encoding.list returns an array of Encoding objects (one for each
constant):
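A short sketch of inspecting that list; the number of entries depends on the Ruby build, so no count is shown:

```ruby
encodings = Encoding.list
encodings.all?(Encoding)            # => true
encodings.include?(Encoding::UTF_8) # => true
encodings.size                      # varies between Ruby versions
```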
An Encoding object has zero or more aliases; method Encoding#names returns an array containing the
name and all aliases:
Encoding::ASCII_8BIT.names
# => ["ASCII-8BIT", "BINARY"]
Encoding::WINDOWS_31J.names
#=> ["Windows-31J", "CP932", "csWindows31J", "SJIS", "PCK"]
Encoding.aliases.size # => 71
Encoding.aliases.take(3)
# => [["BINARY", "ASCII-8BIT"], ["CP437", "IBM437"], ["CP720", "IBM720"]]
Method Encoding.name_list returns an array of all the encoding names and aliases:
Method name_list returns more entries than method list because it includes both the names and their
aliases.
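A sketch comparing the two methods; exact sizes vary between Ruby versions, so only the relationship is shown:

```ruby
names = Encoding.name_list
names.include?('UTF-8')         # => true (a name)
names.include?('BINARY')        # => true (listed among the aliases above)
names.size > Encoding.list.size # => true
```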
Method Encoding.find returns the Encoding for a given name or alias, if it exists:
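A minimal sketch of lookups by name and by alias; the failing name 'no-such-encoding' is a made-up value used only to show the error case:

```ruby
Encoding.find('UTF-8') == Encoding::UTF_8       # => true
Encoding.find('BINARY') == Encoding::ASCII_8BIT # => true (lookup by alias)
Encoding.find('utf-8') == Encoding::UTF_8       # => true (case is ignored)

begin
  Encoding.find('no-such-encoding')
rescue ArgumentError => e
  e.message # the message names the unknown encoding
end
```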
Default Encodings
Method Encoding.find, above, also returns a default Encoding for each of these special names:
external: the default external Encoding:
Compatible Encodings
Method Encoding.compatible? returns whether two given objects are encoding-compatible (that is, whether
they can be concatenated); returns the Encoding of the concatenated string, or nil if incompatible:
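A sketch of both outcomes, using byte strings that are tagged with an encoding via force_encoding (the sample bytes are assumptions for illustration):

```ruby
a = "\xa1".dup.force_encoding('ISO-8859-1')
b = 'b'.dup.force_encoding('ISO-8859-1')
Encoding.compatible?(a, b) == Encoding::ISO_8859_1 # => true

# Two non-ASCII strings in different encodings cannot be concatenated.
c = "\xa1\xa1".dup.force_encoding('EUC-JP')
Encoding.compatible?(a, c) # => nil
```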
String Encoding
A Ruby String object has an encoding that is an instance of class Encoding. The encoding may be retrieved
by method String#encoding.
The default encoding for a string literal is the script encoding (see Script encoding at Encoding):
The default encoding for a string created with method String.new is:
For a String object argument, the encoding of that string.
For a string literal, the script encoding (see Script encoding at Encoding).
In either case, any encoding may be specified:
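The three cases can be sketched as follows (the sample strings are assumptions for illustration):

```ruby
# A string literal argument: the script encoding.
String.new('foo').encoding == __ENCODING__ # => true

# A String object argument: the encoding of that string.
latin = 'foo'.encode('ISO-8859-1')
String.new(latin).encoding == Encoding::ISO_8859_1 # => true

# An explicitly specified encoding.
String.new('foo', encoding: 'US-ASCII').encoding == Encoding::US_ASCII # => true
```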
Changing the assigned encoding does not alter the content of the string; it changes only the way the content
is to be interpreted:
s # => "R\xC3\xA9sum\xC3\xA9"
s.force_encoding('UTF-8') # => "Résumé"
The actual content of a string may also be altered; see Transcoding a String.
Here are a couple of useful query methods:
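The original examples are not shown here; as an assumption, the sketch below uses String#ascii_only? and String#valid_encoding?, two query methods commonly used for this purpose:

```ruby
'abc'.ascii_only?              # => true
"R\u00E9sum\u00E9".ascii_only? # => false

"\u00E9".valid_encoding?       # => true
# A lone lead byte is an incomplete UTF-8 sequence.
"\xC3".dup.force_encoding('UTF-8').valid_encoding? # => false
```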
Filesystem Encoding
The filesystem encoding is the default Encoding for a string from the filesystem:
Encoding.find("filesystem") # => #<Encoding:UTF-8>
Locale Encoding
The locale encoding is the default encoding for a string from the environment, other than from the
filesystem:
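A sketch of querying it; the result depends on the environment, so no specific encoding is shown:

```ruby
Encoding.find('locale')                 # the locale encoding for this process
Encoding.find('locale').is_a?(Encoding) # => true
Encoding.locale_charmap                 # the locale charmap name, as a String
```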
Stream Encodings
Certain stream objects can have two encodings; these objects include instances of:
IO.
File.
ARGF.
StringIO.
The two encodings are:
An external encoding, which identifies the encoding of the stream.
An internal encoding, which (if not nil) specifies the encoding to be used for the string constructed
from the stream.
External Encoding
The external encoding, which is an Encoding object, specifies how bytes read from the stream are to be
interpreted as characters.
The default external encoding is:
UTF-8 for a text stream.
ASCII-8BIT for a binary stream.
The default external encoding is returned by method Encoding.default_external, and may be set by:
Ruby command-line options --external_encoding or -E.
You can also set the default external encoding using method Encoding.default_external=, but doing so may
cause problems; strings created before and after the change may have different encodings.
For an IO or File object, the external encoding may be set by:
Open options external_encoding or encoding, when the object is created; see Open Options.
For an IO, File, ARGF, or StringIO object, the external encoding may be set by:
Methods set_encoding or (except for ARGF) set_encoding_by_bom.
Internal Encoding
The internal encoding, which is an Encoding object or nil, specifies how characters read from the stream are
to be converted to characters in the internal encoding; those characters become a string whose encoding is
set to the internal encoding.
The default internal encoding is nil (no conversion). It is returned by method Encoding.default_internal, and
may be set by:
Ruby command-line options --internal_encoding or -E.
You can also set the default internal encoding using method Encoding.default_internal=, but doing so may
cause problems; strings created before and after the change may have different encodings.
For an IO or File object, the internal encoding may be set by:
Open options internal_encoding or encoding, when the object is created; see Open Options.
For an IO, File, ARGF, or StringIO object, the internal encoding may be set by:
Method set_encoding.
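The following sketch shows both ways of setting stream encodings on a File. The helper stream_encodings and the temporary file name are hypothetical, introduced only for illustration:

```ruby
require 'tempfile'

# Hypothetical helper: opens +path+ with the given open options and
# returns the stream's [external, internal] encodings.
def stream_encodings(path, **opts)
  File.open(path, **opts) { |f| [f.external_encoding, f.internal_encoding] }
end

Tempfile.create('enc-demo') do |tmp|
  # Encodings given as open options.
  stream_encodings(tmp.path, external_encoding: 'ISO-8859-1', internal_encoding: 'UTF-8')
  # => [#<Encoding:ISO-8859-1>, #<Encoding:UTF-8>]

  # Encodings changed on an already open stream.
  tmp.set_encoding('Windows-1252', 'UTF-8')
  tmp.external_encoding == Encoding::Windows_1252 # => true
  tmp.internal_encoding == Encoding::UTF_8        # => true
end
```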
Script Encoding
A Ruby script has a script encoding, which may be retrieved by:
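A minimal sketch, assuming a source file with no magic comment:

```ruby
__ENCODING__                       # the script encoding of this file
'literal'.encoding == __ENCODING__ # => true
```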
The default script encoding is UTF-8; a Ruby source file may set its script encoding with a magic comment
on the first line of the file (or second line, if there is a shebang on the first). The comment must contain the
word coding or encoding, followed by a colon, space and the Encoding name or alias:
# encoding: ISO-8859-1
__ENCODING__ #=> #<Encoding:ISO-8859-1>
Transcoding
Transcoding is the process of changing a sequence of characters from one encoding to another.
As far as possible, the characters remain the same, but the bytes that represent them may change.
The handling for characters that cannot be represented in the destination encoding may be specified by
encoding options; see Encoding Options.
Transcoding a String
Each of these methods transcodes a string:
String#encode: Transcodes self into a new string according to given encodings and options.
String#encode!: Like String#encode, but transcodes self in place.
String#scrub: Transcodes self into a new string by replacing invalid byte sequences with a given or
default replacement string.
String#scrub!: Like String#scrub, but transcodes self in place.
String#unicode_normalize: Transcodes self into a new string according to Unicode normalization.
String#unicode_normalize!: Like String#unicode_normalize, but transcodes self in place.
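A short sketch of the non-bang methods; the sample strings are assumptions chosen for illustration:

```ruby
s = "R\u00E9sum\u00E9"
s.bytesize                     # => 8 (UTF-8: each accented letter takes two bytes)
latin1 = s.encode('ISO-8859-1')
latin1.bytes                   # => [82, 233, 115, 117, 109, 233]

broken = "abc\xFFdef".dup.force_encoding('UTF-8')
broken.valid_encoding?         # => false
broken.scrub('?')              # => "abc?def"

decomposed = "e\u0301"         # "e" followed by a combining acute accent
decomposed.unicode_normalize(:nfc) == "\u00E9" # => true
```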
Transcoding a Stream
Each of these methods may transcode a stream; whether it does so depends on the external and internal
encodings:
IO.foreach: Yields each line of given stream to the block.
IO.new: Creates and returns a new IO object for the given integer file descriptor.
IO.open: Creates a new IO object.
IO.pipe: Creates a connected pair of reader and writer IO objects.
IO.popen: Creates an IO object to interact with a subprocess.
IO.read: Returns a string with all or a subset of bytes from the given stream.
IO.readlines: Returns an array of strings, which are the lines from the given stream.
IO.write: Writes a given string to the given stream.
This example writes a string to a file, encoding it as ISO-8859-1, then reads the file into a new string,
encoding it as UTF-8:
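s = "R\u00E9sum\u00E9"
path = 't.tmp'
ext_enc = 'ISO-8859-1'
int_enc = 'UTF-8'
# Write the string, transcoding it to the external encoding.
File.write(path, s, external_encoding: ext_enc)
# Read the raw bytes back, with no transcoding.
raw_text = File.binread(path)
# Read again, transcoding from the external to the internal encoding.
transcoded_text = File.read(path, external_encoding: ext_enc, internal_encoding: int_enc)
p raw_text
p transcoded_text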
Output:
"R\xE9sum\xE9"
"Résumé"
Encoding Options
A number of methods in the Ruby core accept keyword arguments as encoding options.
Some of the options specify or utilize a replacement string, to be used in certain transcoding operations. A
replacement string may be in any encoding that can be converted to the encoding of the destination string.
These keyword-value pairs specify encoding options:
For an invalid byte sequence:
o :invalid: nil (default): Raise exception.
o :invalid: :replace: Replace each invalid byte sequence with the replacement string.
Examples:
s = "\x80foo\x80"
s.encode('ISO-8859-3') # Raises Encoding::InvalidByteSequenceError.
s.encode('ISO-8859-3', invalid: :replace) # => "?foo?"
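For an undefined character:
o :undef: nil (default): Raise exception.
o :undef: :replace: Replace each undefined character with the replacement string.
Examples:
s = "\x80foo\x80"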
"\x80".encode('UTF-8', 'ASCII-8BIT') # Raises Encoding::UndefinedConversionError.
s.encode('UTF-8', 'ASCII-8BIT', undef: :replace) # => "�foo�"
Replacement string:
o :replace: nil (default): Set replacement string to default value: "\uFFFD"(“�”) for a
Unicode encoding, '?' otherwise.
o :replace: some_string: Set replacement string to the given some_string;
overrides :fallback.
Examples:
s = "\xA5foo\xA5"
options = {:undef => :replace, :replace => 'xyzzy'}
s.encode('UTF-8', 'ISO-8859-3', **options) # => "xyzzyfooxyzzy"
Replacement fallback:
One of these may be specified:
o :fallback: nil (default): No replacement fallback.
o :fallback: hash_like_object: Set replacement fallback to the given hash_like_object; the
replacement string is hash_like_object[X].
o :fallback: method: Set replacement fallback to the given method; the replacement string
is method(X).
o :fallback: proc: Set replacement fallback to the given proc; the replacement string
is proc[X].
Examples:
s = "\u3042foo\u3043"
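The three fallback forms can be sketched as follows. The replacement texts and the helper to_codepoint are hypothetical, chosen only for illustration:

```ruby
s = "\u3042foo\u3043"

# Hash-like fallback: the replacement is hash[char].
hash = { "\u3042" => 'a', "\u3043" => 'i' }
s.encode('ASCII', fallback: hash) # => "afooi"

# Proc fallback: the replacement is proc[char].
as_codepoint = proc { |char| format('U+%04X', char.unpack1('U')) }
s.encode('ASCII', fallback: as_codepoint) # => "U+3042fooU+3043"

# Method fallback: the replacement is method(char).
def to_codepoint(char)
  format('<%04X>', char.unpack1('U'))
end
s.encode('ASCII', fallback: method(:to_codepoint)) # => "<3042>foo<3043>"
```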
XML entities:
One of these may be specified:
o :xml: nil (default): No handling for XML entities.
o :xml: :text: Treat source text as XML; replace each undefined character with its
upper-case hexadecimal numeric character reference, except that:
& is replaced with &amp;.
< is replaced with &lt;.
> is replaced with &gt;.
o :xml: :attr: Treat source text as XML attribute value; replace each undefined character
with its upper-case hexadecimal numeric character reference, except that:
The replacement string r is double-quoted ("r").
Each embedded double-quote is replaced with &quot;.
& is replaced with &amp;.
< is replaced with &lt;.
> is replaced with &gt;.
Examples:
s = 'foo"<&>"bar' + "\u3042"
s.encode('ASCII', xml: :text) # => "foo\"&lt;&amp;&gt;\"bar&#x3042;"
s.encode('ASCII', xml: :attr) # => "\"foo&quot;&lt;&amp;&gt;&quot;bar&#x3042;\""
Newlines:
One of these may be specified:
o :cr_newline: true: Replace each line-feed character ("\n") with a carriage-return character
("\r").
o :crlf_newline: true: Replace each line-feed character ("\n") with a carriage-return/line-feed
string ("\r\n").
o :universal_newline: true: Replace each carriage-return character ("\r") and each carriage-
return/line-feed string ("\r\n") with a line-feed character ("\n").
Examples:
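A minimal sketch of the three newline options, using short sample strings chosen for illustration:

```ruby
s = "a\nb"
s.encode('ASCII', cr_newline: true)   # => "a\rb"
s.encode('ASCII', crlf_newline: true) # => "a\r\nb"

"a\r\nb\rc".encode('ASCII', universal_newline: true) # => "a\nb\nc"
```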
Basic Knowledge
In C, variables have types and data do not have types. In contrast, Ruby variables do not have a static type,
and data themselves have types, so data will need to be converted between the languages.
Data in Ruby are represented by the C type ‘VALUE’. Each VALUE data has its data type.
To retrieve C data from a VALUE, you need to:
1. Identify the VALUE’s data type
2. Convert the VALUE into C data
Converting to the wrong data type may cause serious problems.
Data Types
The Ruby interpreter has the following data types:
T_NIL
nil
T_OBJECT
ordinary object
T_CLASS
class
T_MODULE
module
T_FLOAT
floating point number
T_STRING
string
T_REGEXP
regular expression
T_ARRAY
array
T_HASH
associative array
T_STRUCT
(Ruby) structure
T_BIGNUM
multi precision integer
T_FIXNUM
Fixnum (31-bit or 63-bit integer)
T_COMPLEX
complex number
T_RATIONAL
rational number
T_FILE
IO
T_TRUE
true
T_FALSE
false
T_DATA
data
T_SYMBOL
symbol
In addition, there are several other types used internally:
T_ICLASS
included module
T_MATCH
MatchData object
T_UNDEF
undefined
T_NODE
syntax tree node
T_ZOMBIE
object awaiting finalization
Most of the types are represented by C structures.
Check Data Type of the VALUE
The macro TYPE() defined in ruby.h shows the data type of the VALUE. TYPE() returns the constant
number T_XXXX described above. To handle data types, your code will look something like this:
switch (TYPE(obj)) {
  case T_FIXNUM:
    /* process Fixnum */
    break;
  case T_STRING:
    /* process String */
    break;
  case T_ARRAY:
    /* process Array */
    break;
  default:
    /* raise exception */
    break;
}
There is also a data type check function:
void Check_Type(VALUE value, int type)
which raises an exception if the VALUE does not have the type specified.
There are also faster check macros for fixnums and nil.
FIXNUM_P(obj)
NIL_P(obj)
These functions return the newly created class or module. You may want to save this reference into a
variable to use later.
To define nested classes or modules, use the functions below:
The ‘argc’ represents the number of the arguments to the C function, which must be less than 17. But I
doubt you’ll need that many.
If ‘argc’ is negative, it specifies the calling sequence, not number of the arguments.
If argc is -1, the function will be called as:
VALUE func(int argc, VALUE *argv, VALUE obj)
where argc is the actual number of arguments, argv is the C array of the arguments, and obj is the receiver.
If argc is -2, the arguments are passed in a Ruby array. The function will be called like:
VALUE func(VALUE obj, VALUE args)
Finally, rb_define_module_function defines a module function, which is both a private method and a
singleton method of the module. For example, sqrt is a module function defined in the Math module. It
can be called in the following way:
Math.sqrt(4)
or
include Math
sqrt(4)
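At the Ruby level, Module#module_function gives the same pair of methods. The sketch below uses hypothetical names (Demo, twice, Calculator) to show both sides of that behavior:

```ruby
module Demo
  module_function

  def twice(x)
    x * 2
  end
end

# Callable as a singleton method of the module.
Demo.twice(2) # => 4

# Also defined as a private instance method.
Demo.private_instance_methods(false).include?(:twice) # => true

class Calculator
  include Demo

  def quadruple(x)
    twice(twice(x)) # private method, called without an explicit receiver
  end
end

Calculator.new.quadruple(3) # => 12
```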
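In addition, function-like methods, which are private methods defined in the Kernel module, can be defined
using:
void rb_define_global_function(const char *name, VALUE (*func)(ANYARGS), int argc)
To define a reader/writer for an attribute:
void rb_define_attr(VALUE klass, const char *name, int read, int write)
To define and undefine the 'allocate' class method:
void rb_define_alloc_func(VALUE klass, VALUE (*func)(VALUE klass));
void rb_undef_alloc_func(VALUE klass);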
func has to take the klass as the argument and return a newly allocated instance. This instance should be
as empty as possible, without any expensive (including external) resources.
If you are overriding an existing method of any ancestor of your class, you may rely on:
kw_splat can have these possible values (used by all methods that accept kw_splat argument):
RB_NO_KEYWORDS
Do not pass keywords
RB_PASS_KEYWORDS
Pass keywords, final argument should be a hash of keywords
RB_PASS_CALLED_KEYWORDS
Pass keywords if current method was called with keywords, useful for argument delegation
To obtain the receiver of the current scope (if no other way is available), you can use:
VALUE rb_current_receiver(void)
Constant Definition
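We have 2 functions to define constants:
void rb_define_const(VALUE klass, const char *name, VALUE val)
void rb_define_global_const(const char *name, VALUE val)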
The former is to define a constant under specified class/module. The latter is to define a global constant.
Evaluation is done under the current context, thus current local variables of the innermost method (which is
defined by Ruby) can be accessed.
Note that the evaluation can raise an exception. There is a safer function:
It returns nil when an error occurred. Moreover, *state is zero if str was successfully evaluated, or nonzero
otherwise.
ID or Symbol
You can invoke methods directly, without parsing the string. First, a word about IDs: an ID is an integer
that represents a Ruby identifier, such as a variable name. The Ruby data type corresponding to ID
is Symbol. It can be accessed from Ruby in the form:
:Identifier
or
You can get the ID value from a string within C code by using
You can retrieve ID from Ruby object (Symbol or String) given as an argument by using
rb_to_id(VALUE symbol)
These functions try to convert the argument to a String if it is neither a Symbol nor a String. The second
function stores the converted result into *name, and returns 0 if the string is not a known symbol. When this
function returns a non-zero value, *name is always a Symbol or a String; when the result is 0, *name is a
String. The third function takes a NUL-terminated C string, not a Ruby VALUE.
You can retrieve Symbol from Ruby object (Symbol or String) given as an argument by using
rb_to_symbol(VALUE name)
These functions are similar to above functions except that these return a Symbol instead of an ID.
You can convert a C ID to a Ruby Symbol by using
VALUE ID2SYM(ID id)
and to convert a Ruby Symbol object to an ID, use
ID SYM2ID(VALUE symbol)
This function invokes a method on the recv, with the method name specified by the symbol mid.
This function defines the variable which is shared by both environments. The value of the global variable
pointed to by ‘var’ can be accessed through Ruby’s global variable named ‘name’.
You can define read-only (from Ruby, of course) variables using the function below.
You can define hooked variables. The accessor functions (getter and setter) are called on access to the
hooked variables.
void rb_define_hooked_variable(const char *name, VALUE *var,
                               VALUE (*getter)(), void (*setter)())
If you need to supply only a setter or only a getter, just supply 0 for the hook you don’t need. If both
hooks are 0, rb_define_hooked_variable() works just like rb_define_variable().
The prototypes of the getter and setter functions are as follows:
Also you can define a Ruby global variable without a corresponding C variable. The value of the variable will
be set/get only by hooks.
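struct rb_data_type_struct {
    const char *wrap_struct_name;
    struct {
        void (*dmark)(void*);
        void (*dfree)(void*);
        size_t (*dsize)(const void *);
        void (*dcompact)(void*);
        void *reserved[1];
    } function;
    const struct rb_data_type_struct *parent;
    void *data;
    VALUE flags;
};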
wrap_struct_name is an identifier of this instance of the struct. It is basically used for collecting and emitting
statistics. So the identifier must be unique in the process, but doesn’t need to be valid as a C or Ruby
identifier.
These dmark / dfree functions are invoked during GC execution. No object allocations are allowed during it,
so do not allocate ruby objects inside them.
dmark is a function to mark Ruby objects referred from your struct. It must mark all references from your
struct with rb_gc_mark or its family if your struct keeps such references.
dfree is a function to free the pointer allocation. If this is RUBY_DEFAULT_FREE, the pointer will be just
freed.
dsize calculates memory consumption in bytes by the struct. Its parameter is a pointer to your struct. You
can pass 0 as dsize if it is hard to implement such a function. But it is still recommended to avoid 0.
dcompact is invoked when memory compaction took place. Referred Ruby objects that were marked by
rb_gc_mark_movable() can here be updated per rb_gc_location().
You have to fill reserved with 0.
parent can point to another C type definition that the Ruby object is inherited from. Then
TypedData_Get_Struct() does also accept derived objects.
You can fill “data” with an arbitrary value for your use. Ruby does nothing with the member.
flags is a bitwise-OR of the following flag values. Since they require deep understanding of garbage
collector in Ruby, you can just set 0 to flags if you are not sure.
RUBY_TYPED_FREE_IMMEDIATELY
This flag makes the garbage collector immediately invoke dfree() during GC when it needs to free
your struct. You can specify this flag if the dfree never unlocks Ruby’s internal lock (GVL).
If this flag is not set, Ruby defers invocation of dfree() and invokes dfree() at the same time as
finalizers.
RUBY_TYPED_WB_PROTECTED
It shows that implementation of the object supports write barriers. If this flag is set, Ruby is better
able to do garbage collection of the object.
When it is set, however, you are responsible for putting write barriers in all implementations of
methods of that object as appropriate. Otherwise Ruby might crash while running.
More about write barriers can be found in “Generational GC” in Appendix D.
RUBY_TYPED_FROZEN_SHAREABLE
This flag indicates that the object is a shareable object if the object is frozen. See Appendix F for
more details.
If this flag is not set, the object cannot become a shareable object
by the Ractor.make_shareable() method.
You can allocate and wrap the structure in one step.
This macro returns an allocated Data object, wrapping the pointer to the structure, which is also allocated.
This macro works like:
Arguments klass and data_type work like their counterparts in TypedData_Wrap_Struct(). A pointer to the
allocated structure will be assigned to sval, which should be a pointer of the type specified.
% mkdir ext/dbm
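#include <ruby.h>
void
Init_dbm(void)
{
    /* Define the DBM class */
    VALUE cDBM = rb_define_class("DBM", rb_cObject);
    /* Redefine DBM.allocate */
    rb_define_alloc_func(cDBM, fdbm_alloc);
    /* DBM includes the Enumerable module */
    rb_include_module(cDBM, rb_mEnumerable);
    /* ... */
    id_dbm = rb_intern("dbm");
    /* ... */
}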
The dbm extension wraps the dbm struct in the C environment using TypedData_Make_Struct.
struct dbmdata {
int di_size;
DBM *di_dbm;
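};
static const rb_data_type_t dbm_type = {
    "dbm",
    {0, free_dbm, memsize_dbm,},
    0, 0,
    RUBY_TYPED_FREE_IMMEDIATELY,
};
static VALUE
fdbm_alloc(VALUE klass)
{
    struct dbmdata *dbmp;
    /* Allocate T_DATA object and C struct and fill struct with zero bytes */
    return TypedData_Make_Struct(klass, struct dbmdata, &dbm_type, dbmp);
}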
This code wraps the dbmdata structure into a Ruby object. We avoid wrapping DBM* directly, because we
want to cache size information. Since Object.allocate allocates an ordinary T_OBJECT type (instead of
T_DATA), it’s important to either use rb_define_alloc_func() to overwrite it or rb_undef_alloc_func() to delete
it.
To retrieve the dbmdata structure from a Ruby object, we define the following macro:
#define GetDBM(obj, dbmp) do {\
    TypedData_Get_Struct((obj), struct dbmdata, &dbm_type, (dbmp));\
    if ((dbmp) == 0) closed_dbm();\
    if ((dbmp)->di_dbm == 0) closed_dbm();\
} while (0)
This sort of complicated macro does the retrieving and close checking for the DBM.
There are three ways to receive method arguments. First, methods with a fixed number of
arguments receive arguments like this:
static VALUE
fdbm_aref(VALUE obj, VALUE keystr)
{
    struct dbmdata *dbmp;
    GetDBM(obj, dbmp);
    /* Use dbmp to access the key */
    dbm_fetch(dbmp->di_dbm, StringValueCStr(keystr));
    /* ... */
}
The first argument of the C function is the self, the rest are the arguments to the method.
Second, methods with an arbitrary number of arguments receive arguments like this:
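static VALUE
fdbm_s_open(int argc, VALUE *argv, VALUE klass)
{
    /* ... */
    if (rb_scan_args(argc, argv, "11", &file, &vmode) == 1) {
        mode = 0666; /* default value */
    }
    /* ... */
}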
The first argument is the number of method arguments, the second argument is the C array of the method
arguments, and the third argument is the receiver of the method.
You can use the function rb_scan_args() to check and retrieve the arguments. The third argument is a string
that specifies how to capture method arguments and assign them to the following VALUE references.
You can also just check the number of arguments with rb_check_arity(); this is handy when you want to
treat the arguments as a list.
The following is an example of a method that takes arguments by Ruby’s array:
static VALUE
/* ... */
The first argument is the receiver, the second one is the Ruby array which contains the arguments to the
method.
Notice: The GC should know about global variables that refer to Ruby objects but are not exported to the
Ruby world. You need to protect them with:
void rb_global_variable(VALUE *var)
Prepare extconf.rb
If the file named extconf.rb exists, it will be executed to generate Makefile.
extconf.rb is the file for checking compilation conditions etc. You need to put
require 'mkmf'
at the top of the file. You can use the functions below to check various conditions.
have_library(lib[, func[, headers[, opt]]]): check whether library containing function exists
Generate Makefile
Try generating the Makefile by:
ruby extconf.rb
If the library should be installed under the vendor_ruby directory instead of the site_ruby directory, use
the --vendor option as follows.
ruby extconf.rb --vendor
You don’t need this step if you put the extension library under the ext directory of the ruby source tree. In
that case, compilation of the interpreter will do this step for you.
Run make
Type
make
to compile your extension. You don’t need this step either if you have put the extension library under the ext
directory of the ruby source tree.
Debug
You may need to debug the extension. Extensions can be linked statically by adding the directory name
in the ext/Setup file so that you can inspect the extension with the debugger.
compile.c
eval.c
eval_error.c
eval_jump.c
eval_safe.c
thread_pthread.c : ditto
vm.c
vm_dump.c
vm_eval.c
vm_exec.c
vm_insnhelper.c
vm_method.c
regcomp.c
regenc.c
regerror.c
regexec.c
regparse.c
regsyntax.c
Utility Functions
debug.c
debug symbols for C debugger
dln.c
dynamic loading
st.c
general purpose hash table
strftime.c
formatting times
util.c
misc utilities
Ruby Interpreter Implementation
dmyext.c
dmydln.c
dmyencoding.c
id.c
inits.c
main.c
ruby.c
version.c
gem_prelude.rb
prelude.rb
Class Library
array.c
Array
bignum.c
Bignum
compar.c
Comparable
complex.c
Complex
cont.c
Fiber, Continuation
dir.c
Dir
enum.c
Enumerable
enumerator.c
Enumerator
file.c
File
hash.c
Hash
io.c
IO
marshal.c
Marshal
math.c
Math
numeric.c
Numeric, Integer, Fixnum, Float
pack.c
Array#pack, String#unpack
proc.c
Binding, Proc
process.c
Process
random.c
random number
range.c
Range
rational.c
Rational
re.c
Regexp, MatchData
signal.c
Signal
sprintf.c
String#sprintf
string.c
String
struct.c
Struct
time.c
Time
defs/known_errors.def
Errno::* exception classes
-> known_errors.inc
automatically generated
Multilingualization
encoding.c
Encoding
transcode.c
Encoding::Converter
enc/*.c
encoding classes
enc/trans/*
codepoint mapping tables
goruby.c
Types
VALUE
The type for the Ruby object. Actual structures are defined in ruby.h, such as struct RString, etc.
To refer to the values in structures, use casting macros like RSTRING(obj).
C Pointer Wrapping
Data_Wrap_Struct(VALUE klass, void (*mark)(), void (*free)(), void *sval)
Wrap a C pointer into a Ruby object. If object has references to other Ruby objects, they should
be marked by using the mark function during the GC process. Otherwise, mark should be 0.
When this object is no longer referred to from anywhere, the pointer will be discarded by the free function.
Data_Make_Struct(klass, type, mark, free, sval)
This macro allocates memory using malloc(), assigns it to the variable sval, and returns the
DATA encapsulating the pointer to memory region.
Data_Get_Struct(data, type, sval)
This macro retrieves the pointer value from DATA, and assigns it to the variable sval.
Checking Data Types
RB_TYPE_P(value, type)
Is value an internal type (T_NIL, T_FIXNUM, etc.)?
TYPE(value)
Internal type (T_NIL, T_FIXNUM, etc.)
FIXNUM_P(value)
Is value a Fixnum?
NIL_P(value)
Is value nil?
RB_INTEGER_TYPE_P(value)
Is value an Integer?
RB_FLOAT_TYPE_P(value)
Is value a Float?
void Check_Type(VALUE value, int type)
Ensures value is of the given internal type or raises a TypeError
Data Type Conversion
FIX2INT(value), INT2FIX(i)
Fixnum <-> integer
FIX2LONG(value), LONG2FIX(l)
Fixnum <-> long
NUM2INT(value), INT2NUM(i)
Numeric <-> integer
NUM2UINT(value), UINT2NUM(ui)
Numeric <-> unsigned integer
NUM2LONG(value), LONG2NUM(l)
Numeric <-> long
NUM2ULONG(value), ULONG2NUM(ul)
Numeric <-> unsigned long
NUM2LL(value), LL2NUM(ll)
Numeric <-> long long
NUM2ULL(value), ULL2NUM(ull)
Numeric <-> unsigned long long
NUM2OFFT(value), OFFT2NUM(off)
Numeric <-> off_t
NUM2SIZET(value), SIZET2NUM(size)
Numeric <-> size_t
NUM2SSIZET(value), SSIZET2NUM(ssize)
Numeric <-> ssize_t
rb_integer_pack(value, words, numwords, wordsize, nails, flags),
rb_integer_unpack(words, numwords, wordsize, nails, flags)
Numeric <-> Arbitrary size integer buffer
NUM2DBL(value)
Numeric -> double
rb_float_new(f)
double -> Float
RSTRING_LEN(str)
String -> length of String data in bytes
RSTRING_PTR(str)
String -> pointer to String data. Note that the result pointer may not be NUL-terminated.
StringValue(value)
Object with #to_str -> String
StringValuePtr(value)
Object with #to_str -> pointer to String data
StringValueCStr(value)
Object with #to_str -> pointer to String data without NUL bytes. It is guaranteed that the result
data is NUL-terminated.
rb_str_new2(s)
char * -> String
The getter function must return the value for the access.
void rb_define_hooked_variable(const char *name, VALUE *var, VALUE (*getter)(), void (*setter)())
Defines hooked variable. It’s a virtual variable with a C variable. The getter is called as
Constant Definition
void rb_define_const(VALUE klass, const char *name, VALUE val)
Defines a new constant under the class/module.
void rb_define_global_const(const char *name, VALUE val)
Defines a global constant. This is just the same as
rb_define_const(rb_cObject, name, val)
Method Definition
rb_define_method(VALUE klass, const char *name, VALUE (*func)(ANYARGS), int argc)
Defines a method for the class. func is the function pointer. argc is the number of arguments. If
argc is -1, the function will receive 3 arguments: argc, argv, and self. If argc is -2, the function will
receive 2 arguments, self and args, where args is a Ruby array of the method arguments.
rb_define_private_method(VALUE klass, const char *name, VALUE (*func)(ANYARGS), int argc)
Defines a private method for the class. Arguments are same as rb_define_method().
rb_define_singleton_method(VALUE klass, const char *name, VALUE (*func)(ANYARGS), int argc)
Defines a singleton method. Arguments are same as rb_define_method().
rb_check_arity(int argc, int min, int max)
Checks that the number of arguments, argc, is in the range min..max. If max is
UNLIMITED_ARGUMENTS, the upper bound is not checked. If argc is out of bounds,
an ArgumentError will be raised.
rb_scan_args(int argc, VALUE *argv, const char *fmt, …)
Retrieve argument from argc and argv to given VALUE references according to the format string.
The format can be described in ABNF as follows:
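scan-arg-spec  := param-arg-spec [keyword-arg-spec] [block-arg-spec]
param-arg-spec := pre-arg-spec [post-arg-spec] / post-arg-spec /
                  pre-opt-post-arg-spec
pre-arg-spec   := num-of-leading-mandatory-args
                  [num-of-optional-args]
post-arg-spec  := sym-for-variable-length-args
                  [num-of-trailing-mandatory-args]
pre-opt-post-arg-spec := num-of-leading-mandatory-args
                         num-of-optional-args
                         num-of-trailing-mandatory-args
keyword-arg-spec := sym-for-keyword-arg
block-arg-spec := sym-for-block-arg
num-of-leading-mandatory-args  := DIGIT ; The number of leading
                                        ; mandatory arguments
num-of-optional-args           := DIGIT ; The number of optional
                                        ; arguments
sym-for-variable-length-args   := "*"   ; Indicates that variable
                                        ; length arguments are
                                        ; captured as a ruby array
num-of-trailing-mandatory-args := DIGIT ; The number of trailing
                                        ; mandatory arguments
sym-for-keyword-arg            := ":"   ; Indicates that keyword
                                        ; argument captured as a hash.
                                        ; If keyword arguments are not
                                        ; provided, returns nil.
sym-for-block-arg              := "&"   ; Indicates that an iterator
                                        ; block should be captured if
                                        ; given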
For example, “12” means that the method requires at least one argument, and at most receives
three (1+2) arguments. So, the format string must be followed by three variable references, which
are to be assigned to captured arguments. For omitted arguments, variables are set to Qnil.
NULL can be put in place of a variable reference, which means the corresponding captured
argument(s) should be just dropped.
The number of given arguments, excluding an option hash or iterator block, is returned.
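In Ruby terms, the format “12” corresponds roughly to a method with one mandatory and two optional parameters. The sketch below is a Ruby-level analogue only, not the C API itself, and the method name scan_12 is hypothetical:

```ruby
# Ruby-level counterpart of the C format string "12":
# one leading mandatory argument followed by two optional arguments.
def scan_12(a, b = nil, c = nil)
  [a, b, c]
end

method(:scan_12).arity # => -2 (one required argument, then optional ones)
scan_12(1)             # => [1, nil, nil]  (omitted arguments are nil)
scan_12(1, 2, 3)       # => [1, 2, 3]

begin
  scan_12 # fewer than one argument
rescue ArgumentError => e
  e.class # => ArgumentError
end
```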
rb_scan_args_kw(int kw_splat, int argc, VALUE *argv, const char *fmt, …)
The same as rb_scan_args, except the kw_splat argument specifies whether keyword arguments
are provided (instead of being determined by the call from Ruby to the C
function). kw_splat should be one of the following values:
RB_SCAN_ARGS_PASS_CALLED_KEYWORDS
Same behavior as rb_scan_args.
RB_SCAN_ARGS_KEYWORDS
The final argument should be a hash treated as keywords.
RB_SCAN_ARGS_LAST_HASH_KEYWORDS
Treat a final argument as keywords if it is a hash, and not as keywords otherwise.
int rb_get_kwargs(VALUE keyword_hash, const ID *table, int required, int optional, VALUE
*values)
Retrieves argument VALUEs bound to keywords, as directed by table, into values, deleting
retrieved entries from keyword_hash along the way. The first required number of IDs referred to
by table are mandatory, and the succeeding optional (- optional - 1 if optional is negative) number of
IDs are optional. If a mandatory key is not contained in keyword_hash, raises “missing
keyword” ArgumentError. If an optional key is not present in keyword_hash, the corresponding
element in values is set to Qundef. If optional is negative, the rest of keyword_hash is ignored;
otherwise, raises “unknown keyword” ArgumentError.
Be warned, handling keyword arguments in the C API is less efficient than handling them in
Ruby. Consider using a Ruby wrapper method around a non-keyword C function. ref:
bugs.ruby-lang.org/issues/11339
VALUE rb_extract_keywords(VALUE *original_hash)
Extracts pairs whose key is a symbol into a new hash from a hash object referred
by original_hash. If the original hash contains non-symbol keys, then they are copied to another
hash and the new hash is stored through original_hash, else 0 is stored.
Instance Variables
VALUE rb_iv_get(VALUE obj, const char *name)
Retrieve the value of the instance variable. If the name is not prefixed by ‘@’, that variable shall
be inaccessible from Ruby.
VALUE rb_iv_set(VALUE obj, const char *name, VALUE val)
Sets the value of the instance variable.
Control Structure
VALUE rb_block_call(VALUE recv, ID mid, int argc, VALUE * argv, VALUE (*func) (ANYARGS),
VALUE data2)
Calls a method on the recv, with the method name specified by the symbol mid, with argc
arguments in argv, supplying func as the block. When func is called as the block, it will receive
the value from yield as the first argument, and data2 as the second argument. When yielded with
multiple values (in C, rb_yield_values(), rb_yield_values2() and rb_yield_splat()), data2 is packed
as an Array, whereas yielded values can be gotten via argc/argv of the third/fourth arguments.
VALUE rb_block_call_kw(VALUE recv, ID mid, int argc, VALUE * argv, VALUE (*func) (ANYARGS),
VALUE data2, int kw_splat)
Same as rb_block_call, using kw_splat to determine whether keyword arguments are
passed.
[OBSOLETE] VALUE rb_iterate(VALUE (*func1)(), VALUE arg1, VALUE (*func2)(), VALUE arg2)
Calls the function func1, supplying func2 as the block. func1 will be called with the argument
arg1. func2 receives the value from yield as the first argument, arg2 as the second argument.
When rb_iterate is used in 1.9, func1 has to call some Ruby-level method. This function is
obsolete since 1.9; use rb_block_call instead.
VALUE rb_yield(VALUE val)
Yields val as a single argument to the block.
VALUE rb_yield_values(int n, …)
Yields n number of arguments to the block, using one C argument per Ruby argument.
VALUE rb_yield_values2(int n, VALUE *argv)
Yields n number of arguments to the block, with all Ruby arguments in the C argv array.
VALUE rb_yield_values_kw(int n, VALUE *argv, int kw_splat)
Same as rb_yield_values2, using kw_splat to determine whether keyword arguments are passed.
VALUE rb_yield_splat(VALUE args)
Same as rb_yield_values2, except arguments are specified by the Ruby array args.
VALUE rb_yield_splat_kw(VALUE args, int kw_splat)
Same as rb_yield_splat, using kw_splat to determine whether keyword arguments are passed.
VALUE rb_rescue(VALUE (*func1)(ANYARGS), VALUE arg1, VALUE (*func2)(ANYARGS),
VALUE arg2)
Calls the function func1, with arg1 as the argument. If an exception occurs during func1, it calls
func2 with arg2 as the first argument and the exception object as the second argument. The
return value of rb_rescue() is the return value from func1 if no exception occurs, from func2
otherwise.
VALUE rb_ensure(VALUE (*func1)(ANYARGS), VALUE arg1, VALUE (*func2)
(ANYARGS), VALUE arg2)
Calls the function func1 with arg1 as the argument, then calls func2 with arg2 once execution of func1
has terminated. The return value from rb_ensure() is that of func1 when no exception occurred.
VALUE rb_protect(VALUE (*func) (VALUE), VALUE arg, int *state)
Calls the function func with arg as the argument. If no exception occurred during func, it returns
the result of func and *state is zero. Otherwise, it returns Qnil and sets *state to nonzero. If state
is NULL, it is not set in either case. You have to clear the error info with rb_set_errinfo(Qnil) when
ignoring the caught exception.
void rb_jump_tag(int state)
Continues the exception caught by rb_protect() and rb_eval_string_protect(). state must be the
value returned from those functions. This function never returns to the caller.
void rb_iter_break()
Exits from the current innermost block. This function never returns to the caller.
void rb_iter_break_value(VALUE value)
Exits from the current innermost block with the value. The block will return the given argument
value. This function never returns to the caller.
Threading
As of Ruby 1.9, Ruby supports native 1:1 threading with one kernel thread per Ruby Thread object.
Currently, there is a GVL (Global VM Lock) which prevents simultaneous execution of Ruby code. The GVL
may be released by the rb_thread_call_without_gvl and rb_thread_call_without_gvl2 functions. These
functions are tricky to use and are documented in thread.c; do not use them before reading the comments in
thread.c.
void rb_thread_schedule(void)
Give the scheduler a hint to pass execution to another thread.
Input/Output (IO) on a single file descriptor
int rb_io_wait_readable(int fd)
Wait indefinitely for the given FD to become readable, allowing other threads to be scheduled.
Returns a true value if a read may be performed, false if there is an unrecoverable error.
int rb_io_wait_writable(int fd)
Like rb_io_wait_readable, but for writability.
int rb_wait_for_single_fd(int fd, int events, struct timeval *timeout)
Allows waiting on a single FD for one or multiple events with a specified timeout.
events is a mask of any combination of the following values:
RB_WAITFD_IN - wait for readability of normal data
I/O Multiplexing
Ruby supports I/O multiplexing based on the select(2) system call. The Linux select_tut(2) manpage
<man7.org/linux/man-pages/man2/select_tut.2.html> provides a good overview on how to use select(2), and
the Ruby API has analogous functions and data structures to the well-known select API. Understanding of
select(2) is required to understand this section.
typedef struct rb_fdset_t
The data structure which wraps the fd_set bitmap used by select(2). This allows Ruby to use FD
sets larger than that allowed by historic limitations on modern platforms.
void rb_fd_init(rb_fdset_t *)
Initializes the rb_fdset_t. It must be initialized before other rb_fd_* operations. Analogous to
calling malloc(3) to allocate an fd_set.
void rb_fd_term(rb_fdset_t *)
Destroys the rb_fdset_t, releasing any memory and resources it used. It must be reinitialized
using rb_fd_init before future use. Analogous to calling free(3) to release memory for an fd_set.
void rb_fd_zero(rb_fdset_t *)
Clears all FDs from the rb_fdset_t, analogous to FD_ZERO(3).
void rb_fd_set(int fd, rb_fdset_t *)
Adds a given FD to the rb_fdset_t, analogous to FD_SET(3).
void rb_fd_clr(int fd, rb_fdset_t *)
Removes a given FD from the rb_fdset_t, analogous to FD_CLR(3).
int rb_fd_isset(int fd, const rb_fdset_t *)
Returns true if a given FD is set in the rb_fdset_t, false if not. Analogous to FD_ISSET(3).
int rb_thread_fd_select(int nfds, rb_fdset_t *readfds, rb_fdset_t *writefds, rb_fdset_t
*exceptfds, struct timeval *timeout)
Analogous to the select(2) system call, but allows other Ruby threads to be scheduled while
waiting.
When only waiting on a single FD, favor rb_io_wait_readable, rb_io_wait_writable, or
rb_wait_for_single_fd functions since they can be optimized for specific platforms (currently, only
Linux).
RUBY_EVENT_LINE
RUBY_EVENT_CLASS
RUBY_EVENT_END
RUBY_EVENT_CALL
RUBY_EVENT_RETURN
RUBY_EVENT_C_CALL
RUBY_EVENT_C_RETURN
RUBY_EVENT_RAISE
RUBY_EVENT_ALL
The third argument ‘data’ to rb_add_event_hook() is passed to the hook function as the second
argument, which was the pointer to the current NODE in 1.8. See
RB_EVENT_HOOKS_HAVE_CALLBACK_DATA below.
int rb_remove_event_hook(rb_event_hook_func_t func)
Removes the specified hook function.
Memory usage
void rb_gc_adjust_memory_usage(ssize_t diff)
Adjusts the amount of registered external memory. With this function you can tell the GC how much
memory is used by an external library. Calling this function with a positive diff means the memory
usage has increased: a new memory block was allocated or a block was reallocated at a larger size.
Calling this function with a negative diff means the memory usage has decreased: a memory block was
freed or a block was reallocated at a smaller size. This function may trigger the GC.
#ifndef RB_PASS_KEYWORDS
rb_enumeratorize_with_size((obj), ID2SYM(rb_frame_this_func()), \
} while (0)
#endif
Incompatibility
You can’t write to the RBASIC(obj)->klass field directly because it is now a const value.
In general you should not write to this field, because MRI expects it to be immutable, but if you want to
do so in your extension you can use the following functions:
VALUE rb_obj_hide(VALUE obj)
Clears the RBasic::klass field. The object becomes an internal object, which
ObjectSpace::each_object can’t find.
VALUE rb_obj_reveal(VALUE obj, VALUE klass)
Resets RBasic::klass to be klass. klass is expected to be the class hidden by rb_obj_hide().
Write barriers
RGenGC doesn’t require write barriers to support generational GC. However, taking care with write barriers can
improve the performance of RGenGC. Please check the following tips.
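VALUE s, w;
const char *sptr;
s = rb_str_new_cstr("hello world!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!");
sptr = RSTRING_PTR(s);
w = rb_str_new(sptr, 36); /* last use of sptr; may trigger GC */
RB_GC_GUARD(s); /* keeps s (and thus sptr) alive up to this point */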
In the above example, RB_GC_GUARD must be placed after the last use of sptr. Placing RB_GC_GUARD
before dereferencing sptr would be of no use. RB_GC_GUARD is only effective on the VALUE data type,
not converted C data types.
RB_GC_GUARD would not be necessary at all in the above example if non-inlined function calls are made
on the ‘s’ VALUE after sptr is dereferenced. Thus, in the above example, calling any un-inlined function on
‘s’ such as:
rb_str_modify(s);
Will ensure ‘s’ stays on the stack or register to prevent a GC invocation from prematurely freeing it.
Using the RB_GC_GUARD macro is preferable to using the “volatile” keyword in C. RB_GC_GUARD has
the following advantages:
1. the intent of the macro use is clear
2. RB_GC_GUARD only affects its call site, “volatile” generates some extra code every time the
variable is used, hurting optimization.
3. “volatile” implementations may be buggy/inconsistent in some compilers and architectures.
RB_GC_GUARD is customizable for broken systems/compilers without negatively affecting other
systems.
Appendix F. Ractor support
Ractors are the parallel execution mechanism introduced in Ruby 3.0. Ractors can run in parallel, each on a
different OS thread (using an underlying system-provided thread), so a C extension should be thread-safe.
A C extension that can run in multiple ractors is called “Ractor-safe”.
Ractor safety around C extensions has the following properties:
1. By default, all C extensions are recognized as Ractor-unsafe.
2. Ractor-unsafe C-methods may only be called from the main Ractor. If invoked by a non-
main Ractor, then a Ractor::UnsafeError is raised.
3. If an extension desires to be marked as Ractor-safe the extension should call
rb_ext_ractor_safe(true) at the Init_ function for the extension, and all defined methods will be
marked as Ractor-safe.
To make a “Ractor-safe” C extension, we need to check the following points:
(1) Do not share unshareable objects between ractors
For example, a C global variable can lead to sharing unshareable objects between ractors.
VALUE g_var;
A set() and get() pair can share unshareable objects through g_var, which is Ractor-unsafe.
Besides the direct use of global variables, indirect data structures such as a global st_table can also share
objects, so please take care.
Note that class and module objects are shareable objects, so you can keep the code “cFoo =
rb_define_class(…)” with C’s global variables.
(2) Check the thread-safety of the extension
An extension should be thread-safe. For example, the following code is not thread-safe:
g_called = true;
g_called = false;
return ret;
because access to the g_called global variable must be synchronized with other ractors’ threads. To avoid such
data races, some synchronization should be used. Check include/ruby/thread_native.h and include/ruby/atomic.h.
With Ractors, all objects given as method parameters and the receiver (self) are guaranteed to be from the
current Ractor or to be shareable. As a consequence, it is easier to make code Ractor-safe than to make
code generally thread-safe. For example, we don’t need to lock an array object to access its elements.
(3) Check the thread-safety of any used library
If the extension relies on an external library, such as a function foo() from a library libfoo, then libfoo’s
foo() must be thread-safe.
(4) Make an object shareable
This is not required to make an extension Ractor-safe.
If an extension provides special objects defined by rb_data_type_t, consider whether these objects can become
shareable.
The RUBY_TYPED_FROZEN_SHAREABLE flag indicates that these objects can be shareable if they are
frozen. This means that once the object is frozen, mutation of the wrapped data is not allowed.
(5) Others
There are possibly other points or requirements which must be considered in the making of a Ractor-safe
extension. This document will be extended as they are discovered.
Fiber
Fibers provide a mechanism for cooperative concurrency.
Context Switching
Fibers execute a user-provided block. During the execution, the block may
call Fiber.yield or Fiber#transfer to switch to another fiber. Fiber#resume is used to continue execution from
the point where Fiber.yield was called.
#!/usr/bin/env ruby
f = Fiber.new do
  puts "3: Entered fiber."
  Fiber.yield
  puts "5: Resumed fiber."
end
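The fragment above shows only the fiber body. A complete round trip between the main program and the fiber can be sketched as follows; this is an illustrative reconstruction, and the helper name fiber_trace and the messages for steps 1, 2, 4 and 6 are assumptions chosen to match the numbering in the fragment.

```ruby
# Records the order in which the main program and the fiber run.
def fiber_trace
  log = []
  log << "1: Start program."
  fiber = Fiber.new do
    log << "3: Entered fiber."
    Fiber.yield               # hand control back to the caller of resume
    log << "5: Resumed fiber."
  end
  log << "2: Resume fiber first time."
  fiber.resume                # runs the block up to Fiber.yield
  log << "4: Resume fiber second time."
  fiber.resume                # continues after Fiber.yield until the block ends
  log << "6: Finished."
  log
end

puts fiber_trace
```

The messages are recorded in numeric order, which shows that each resume runs the fiber only until its next Fiber.yield or the end of its block.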
Scheduler
The scheduler interface is used to intercept blocking operations. A typical implementation would be a
wrapper for a gem like EventMachine or Async. This design provides separation of concerns between the
event loop implementation and application code. It also allows for layered schedulers which can perform
instrumentation.
To set the scheduler for the current thread:
Fiber.set_scheduler(MyScheduler.new)
Passing nil removes the scheduler; this also happens implicitly when the thread exits:
Fiber.set_scheduler(nil)
Design
The scheduler interface is designed to be an unopinionated, lightweight layer between user code and
blocking operations. The scheduler hooks should avoid translating or converting arguments or return values.
Ideally, the exact same arguments from the user code are provided directly to the scheduler hook with no
changes.
Interface
This is the interface you need to implement.
class Scheduler
  # Wait for the specified process ID to exit.
  # This hook is optional.
  # @parameter pid [Integer] The process ID to wait for.
  # @parameter flags [Integer] A bit-mask of flags suitable for `Process::Status.wait`.
  # @returns [Process::Status] A process status instance.
  def process_wait(pid, flags)
    Thread.new do
      Process::Status.wait(pid, flags)
    end.value
  end
  # Wait for the given io readiness to match the specified events within
  # the specified timeout.
  # @parameter io [IO] The io object to wait on.
  # @parameter events [Integer] A bit mask of `IO::READABLE`,
  #   `IO::WRITABLE` and `IO::PRIORITY`.
  # @parameter timeout [Numeric] The amount of time to wait for the event in seconds.
  # @returns [Integer] The subset of events that are ready.
  def io_wait(io, events, timeout)
  end
  # Sleep the current task for the specified duration, or forever if not
  # specified.
  # @parameter duration [Numeric] The amount of time to sleep in seconds.
  def kernel_sleep(duration = nil)
  end
  # Execute the given block. If the block execution exceeds the given timeout,
  # the specified exception `klass` will be raised. Typically, only non-blocking
  # methods which enter the scheduler will raise such exceptions.
  # @parameter duration [Integer] The amount of time to wait, after which an exception will be raised.
  # @parameter klass [Class] The exception class to raise.
  # @parameter *arguments [Array] The arguments to send to the constructor of the exception.
  # @yields {...} The user code to execute.
  def timeout_after(duration, klass, *arguments, &block)
  end
  def run
    # Implement event loop here.
  end
end
Additional hooks may be introduced in the future; we will use feature detection in order to enable these
hooks.
Non-blocking Execution
The scheduler hooks will only be used in special non-blocking execution contexts. Non-blocking execution
contexts introduce non-determinism because the execution of scheduler hooks may introduce context
switching points into your program.
Fibers
Fibers can be used to create non-blocking execution contexts.
Fiber.new do
  puts Fiber.current.blocking? # false
end.resume
We also introduce a new method which simplifies the creation of these non-blocking fibers:
Fiber.schedule do
  puts Fiber.current.blocking? # false
end
The purpose of this method is to allow the scheduler to internally decide the policy for when to start the
fiber, and whether to use symmetric or asymmetric fibers.
You can also create blocking execution contexts:
Fiber.new(blocking: true) do
  # Won't use the scheduler:
  sleep(n)
end
However, you should generally avoid this unless you are implementing a scheduler.
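The difference between the two kinds of fiber can be observed without a scheduler. The following is a minimal sketch, assuming Ruby 3.0 or later; the helper name fiber_blocking_status is hypothetical and exists only for this illustration.

```ruby
require 'fiber' # needed for Fiber.current before Ruby 3.1; harmless afterwards

# Returns the blocking status seen from inside a fiber that was created
# with the given options.
def fiber_blocking_status(**options)
  Fiber.new(**options) { Fiber.current.blocking? }.resume
end

fiber_blocking_status                 # => false
fiber_blocking_status(blocking: true) # => true
```

Only the non-blocking fiber is a candidate for the scheduler hooks; the blocking one behaves as fibers did before the scheduler interface existed.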
IO
By default, I/O is non-blocking. Not all operating systems support non-blocking I/O. Windows is a notable
example where socket I/O can be non-blocking but pipe I/O is blocking. Provided that there is a scheduler
and the current thread is non-blocking, the operation will invoke the scheduler.
Mutex
The Mutex class can be used in a non-blocking context and is fiber-specific.
ConditionVariable
The ConditionVariable class can be used in a non-blocking context and is fiber-specific.
Queue / SizedQueue
The Queue and SizedQueue classes can be used in a non-blocking context and are fiber-specific.
Thread
The Thread#join operation can be used in a non-blocking context and is fiber-specific.
Format Specifications
Several Ruby core classes have instance method printf or sprintf:
ARGF#printf
IO#printf
Kernel#printf
Kernel#sprintf
Each of these methods takes:
Argument format_string, which has zero or more embedded format specifications (see below).
Arguments *arguments, which are zero or more objects to be formatted.
Each of these methods prints or returns the string resulting from replacing each format specification
embedded in format_string with a string form of the corresponding argument among arguments.
A simple example:
sprintf('Name: %s; value: %d', 'Foo', 0) # => "Name: Foo; value: 0"
A format specification has the form:
%[flags][width][.precision]type
It consists of:
A leading percent character.
Zero or more flags (each is a character).
An optional width specifier (an integer).
An optional precision specifier (a period followed by a non-negative integer).
A type specifier (a character).
Except for the leading percent character, the only required part is the type specifier, so we begin with that.
Type Specifiers
This section provides a brief explanation of each type specifier. The links lead to the details and examples.
Flags
The effect of a flag may vary greatly among type specifiers. These remarks are general in nature. See type-
specific details.
Multiple flags may be given with a single type specifier; order does not matter.
' ' Flag
Insert a space before a non-negative number:
'#' Flag
Use an alternate format; varies among types:
'+' Flag
Add a leading plus sign for a non-negative number:
'-' Flag
Left justify the value in its field:
'0' Flag
Left-pad with zeros instead of spaces:
'*' Flag
Use the next argument as the field width:
'n$' Flag
Format the (1-based) nth argument into this field:
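The flags listed above can be illustrated with a few calls to sprintf. These are illustrative values chosen for this sketch, not examples taken from the original page; the '|' in the left-justify case is only there to make the trailing padding visible.

```ruby
# ' ' flag: a space takes the place of the sign for a non-negative number.
sprintf('% d', 10)     # => " 10"
sprintf('% d', -10)    # => "-10"
# '#' flag: alternate format, here a radix prefix for hexadecimal.
sprintf('%#x', 100)    # => "0x64"
# '+' flag: explicit plus sign for a non-negative number.
sprintf('%+d', 10)     # => "+10"
# '-' flag: left-justify within the field width.
sprintf('%-6d|', 10)   # => "10    |"
# '0' flag: pad with zeros instead of spaces.
sprintf('%06d', 10)    # => "000010"
# '*' flag: take the field width from the next argument.
sprintf('%*d', 6, 10)  # => "    10"
# 'n$' flag: pick the (1-based) nth argument.
sprintf('%2$s %1$s', 'world', 'hello') # => "hello world"
```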
Width Specifier
In general, a width specifier determines the minimum width (in characters) of the formatted field:
# Left-justify if negative.
sprintf('%-10d', 100) # => "100       "
Precision Specifier
A precision specifier is a decimal point followed by zero or more decimal digits.
For integer type specifiers, the precision specifies the minimum number of digits to be written. If the
integer has fewer digits than the precision, the result is padded with leading zeros. There is no modification or
truncation of the result if the integer is longer than the precision:
For the a/A, e/E, f/F specifiers, the precision specifies the number of digits after the decimal point to be
written:
For the g/G specifiers, the precision specifies the number of significant digits to be written:
For the s, p specifiers, the precision specifies the number of characters to write:
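The precision rules above can be sketched with one call per group of specifiers. The values are illustrative choices for this sketch rather than examples from the original page.

```ruby
# Integer specifiers: minimum number of digits, zero-padded, never truncated.
sprintf('%.3d', 1)        # => "001"
sprintf('%.3d', 1000)     # => "1000"
# e/E and f/F: number of digits after the decimal point.
sprintf('%.2f', 3.14159)  # => "3.14"
sprintf('%.2e', 3.14159)  # => "3.14e+00"
# g/G: number of significant digits.
sprintf('%.4g', 123.456)  # => "123.5"
# s: maximum number of characters written.
sprintf('%.3s', 'foobar') # => "foo"
```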
# Capital 'A' means that alphabetical characters are printed in upper case.
sprintf('%A', 4096) # => "0X1P+12"
sprintf('%A', -4096) # => "-0X1P+12"
Specifiers b and B
The two specifiers b and B behave identically except when flag '#' is used.
Format argument as a binary integer:
# Alternate format.
sprintf('%#b', 4) # => "0b100"
sprintf('%#B', 4) # => "0B100"
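For comparison with the alternate format, the plain binary conversion can be sketched as follows. The values are illustrative; note how a negative argument is shown in two's complement form with a '..' prefix unless a sign flag is given.

```ruby
sprintf('%b', 1)   # => "1"
sprintf('%b', 4)   # => "100"
# Two's complement form, prefixed with '..', for a negative value.
sprintf('%b', -4)  # => "..100"
# With the '+' flag the sign is written explicitly instead.
sprintf('%+b', -4) # => "-100"
```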
Specifier c
Format argument as a single character:
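A short sketch with illustrative values: the argument may be a one-character string or an integer code point.

```ruby
sprintf('%c', 'A') # => "A"
sprintf('%c', 65)  # => "A"
```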
Specifier d
Format argument as a decimal integer:
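A short sketch with illustrative values, including the field width and zero-padding flag described earlier:

```ruby
sprintf('%d', 100)   # => "100"
sprintf('%d', -100)  # => "-100"
sprintf('%5d', 100)  # => "  100"
sprintf('%05d', 100) # => "00100"
```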
Specifier f
Format argument as a floating-point number:
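A short sketch with illustrative values; without a precision specifier, six digits are written after the decimal point.

```ruby
sprintf('%f', 3.14159) # => "3.141590"
sprintf('%f', 100)     # => "100.000000"
sprintf('%.1f', 100)   # => "100.0"
```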
# Alternate format.
sprintf('%#g', 100000000000) # => "1.00000e+11"
sprintf('%#g', 0.000000000001) # => "1.00000e-12"
sprintf('%#G', 100000000000) # => "1.00000E+11"
sprintf('%#G', 0.000000000001) # => "1.00000E-12"
Specifier o
Format argument as an octal integer. If argument is negative, it will be formatted as a two’s complement
prefixed with ..7:
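A short sketch with illustrative values, showing the two's complement form for a negative argument and the effect of the '+' and '#' flags:

```ruby
sprintf('%o', 16)   # => "20"
# Two's complement form, prefixed with '..7', for a negative value.
sprintf('%o', -16)  # => "..760"
# With the '+' flag the sign is written explicitly instead.
sprintf('%+o', -16) # => "-20"
# Alternate format: leading zero.
sprintf('%#o', 16)  # => "020"
```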
Specifier p
Format argument as a string via argument.inspect:
t = Time.now
sprintf('%p', t) # => "2022-05-01 13:42:07.1645683 -0500"
Specifier s
Format argument as a string via argument.to_s:
t = Time.now
sprintf('%s', t) # => "2022-05-01 13:42:07 -0500"
Specifier %
Format argument ('%') as a single percent character:
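A short sketch with an illustrative value; the doubled percent character consumes no argument.

```ruby
sprintf('%d %%', 100) # => "100 %"
```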
Reference by Name
For more complex formatting, Ruby supports a reference by name. %<name>s style uses format style, but
%{name} style doesn’t.
Examples:
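The two styles can be sketched as follows, with illustrative keys foo and bar. In the %<name> style a type specifier follows the name and controls the formatting; in the %{name} style the value is inserted as a string with no type specifier.

```ruby
# %<name>type: the named value is formatted by the type specifier.
sprintf('%<foo>d : %<bar>f', { foo: 1, bar: 2 }) # => "1 : 2.000000"
# %{name}: the named value is inserted via to_s.
sprintf('%{foo} : %{bar}', { foo: 1, bar: 2 })   # => "1 : 2"
```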
Implicit Conversions
Some Ruby methods accept one or more objects that can be either:
Of a given class, and so accepted as is.
Implicitly convertible to that class, in which case the called method converts the object.
For each of the relevant classes, the conversion is done by calling a specific conversion method:
Array: to_ary
Hash: to_hash
Integer: to_int
String: to_str
Array-Convertible Objects
An Array-convertible object is an object that:
Has instance method to_ary.
The method accepts no arguments.
The method returns an object obj for which obj.kind_of?(Array) returns true.
The Ruby core class that satisfies these requirements is:
Array
The examples in this section use method Array#replace, which accepts an Array-convertible argument.
This class is Array-convertible:
class ArrayConvertible
  def to_ary
    [:foo, 'bar', 2]
  end
end
a = []
a.replace(ArrayConvertible.new) # => [:foo, "bar", 2]
This class is not Array-convertible (method to_ary takes arguments):
class NotArrayConvertible
  def to_ary(x)
    [:foo, 'bar', 2]
  end
end
a = []
# Raises ArgumentError (wrong number of arguments (given 0, expected 1))
a.replace(NotArrayConvertible.new)
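A to_ary method that returns a non-Array object also fails the requirements listed above. The following is a minimal sketch; the class name ToAryReturnsSymbol is hypothetical and chosen for this illustration only.

```ruby
# Not Array-convertible: to_ary takes no arguments but returns a Symbol.
class ToAryReturnsSymbol
  def to_ary
    :foo
  end
end

a = []
begin
  a.replace(ToAryReturnsSymbol.new)
rescue TypeError => e
  puts e.class # prints TypeError
end
```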
Hash-Convertible Objects
A Hash-convertible object is an object that:
Has instance method to_hash.
The method accepts no arguments.
The method returns an object obj for which obj.kind_of?(Hash) returns true.
The Ruby core class that satisfies these requirements is:
Hash
The examples in this section use method Hash#merge, which accepts a Hash-convertible argument.
This class is Hash-convertible:
class HashConvertible
  def to_hash
    {foo: 0, bar: 1, baz: 2}
  end
end
h = {}
h.merge(HashConvertible.new) # => {:foo=>0, :bar=>1, :baz=>2}
This class is not Hash-convertible (method to_hash takes arguments):
class NotHashConvertible
  def to_hash(x)
    {foo: 0, bar: 1, baz: 2}
  end
end
h = {}
# Raises ArgumentError (wrong number of arguments (given 0, expected 1))
h.merge(NotHashConvertible.new)
This class is not Hash-convertible (method to_hash returns a non-Hash):
class NotHashConvertible
  def to_hash
    :foo
  end
end
h = {}
# Raises TypeError (can't convert NotHashConvertible to Hash (NotHashConvertible#to_hash gives Symbol))
h.merge(NotHashConvertible.new)
Integer-Convertible Objects
An Integer-convertible object is an object that:
Has instance method to_int.
The method accepts no arguments.
The method returns an object obj for which obj.kind_of?(Integer) returns true.
The Ruby core classes that satisfy these requirements are:
Integer
Float
Complex
Rational
The examples in this section use method Array.new, which accepts an Integer-convertible argument.
This user-defined class is Integer-convertible:
class IntegerConvertible
  def to_int
    3
  end
end
a = Array.new(IntegerConvertible.new).size
a # => 3
This class is not Integer-convertible (method to_int takes arguments):
class NotIntegerConvertible
  def to_int(x)
    3
  end
end
# Raises ArgumentError (wrong number of arguments (given 0, expected 1))
Array.new(NotIntegerConvertible.new)
This class is not Integer-convertible (method to_int returns a non-Integer):
class NotIntegerConvertible
  def to_int
    :foo
  end
end
# Raises TypeError (can't convert NotIntegerConvertible to Integer (NotIntegerConvertible#to_int gives Symbol))
Array.new(NotIntegerConvertible.new)
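The core classes listed above are converted in the same way as a user-defined class. A minimal sketch with illustrative values, relying on Float#to_int and Rational#to_int truncating toward zero:

```ruby
# Float#to_int truncates 3.9 to 3.
Array.new(3.9).size            # => 3
# Rational#to_int truncates 7/2 to 3.
Array.new(Rational(7, 2)).size # => 3
```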
String-Convertible Objects
A String-convertible object is an object that:
Has instance method to_str.
The method accepts no arguments.
The method returns an object obj for which obj.kind_of?(String) returns true.
The Ruby core class that satisfies these requirements is:
String
The examples in this section use method String::new, which accepts a String-convertible argument.
This class is String-convertible:
class StringConvertible
  def to_str
    'foo'
  end
end
String.new(StringConvertible.new) # => "foo"
This class is not String-convertible (method to_str takes arguments):
class NotStringConvertible
  def to_str(x)
    'foo'
  end
end
# Raises ArgumentError (wrong number of arguments (given 0, expected 1))
String.new(NotStringConvertible.new)
This class is not String-convertible (method to_str returns a non-String):
class NotStringConvertible
  def to_str
    :foo
  end
end
# Raises TypeError (can't convert NotStringConvertible to String (NotStringConvertible#to_str gives Symbol))
String.new(NotStringConvertible.new)
Keywords
The following keywords are used by Ruby.
__ENCODING__
The script encoding of the current file. See Encoding.
__LINE__
The line number of this keyword in the current file.
__FILE__
The path to the current file.
BEGIN
Runs before any other code in the current file. See miscellaneous syntax
END
Runs after any other code in the current file. See miscellaneous syntax
alias
Creates an alias between two methods (and other things). See modules and classes syntax
and
Short-circuit Boolean and with lower precedence than &&
begin
Starts an exception handling block. See exceptions syntax
break
Leaves a block early. See control expressions syntax
case
Starts a case expression. See control expressions syntax
class
Creates or opens a class. See modules and classes syntax
def
Defines a method. See methods syntax
defined?
Returns a string describing its argument. See miscellaneous syntax
do
Starts a block.
else
The unhandled condition in case, if and unless expressions. See control expressions
elsif
An alternate condition for an if expression. See control expressions
end
The end of a syntax block. Used by classes, modules, methods, exception handling and control
expressions.
ensure
Starts a section of code that is always run, whether or not an exception is raised. See exception handling
false
Boolean false. See literals
for
A loop that is similar to using the each method. See control expressions
if
Used for if and modifier if statements. See control expressions
in
Used to separate the iterable object and iterator variable in a for loop. See control expressions. It
also serves as a pattern in a case expression. See pattern matching
module
Creates or opens a module. See modules and classes syntax
next
Skips the rest of the block. See control expressions
nil
A false value usually indicating “no value” or “unknown”. See literals
not
Inverts the following boolean expression. Has a lower precedence than !
or
Boolean or with lower precedence than ||
redo
Restarts execution in the current block. See control expressions
rescue
Starts an exception section of code in a begin block. See exception handling
retry
Retries an exception block. See exception handling
return
Exits a method. See methods. If met in top-level scope, immediately stops interpretation of the
current file.
self
The object the current method is attached to. See methods
super
Calls the current method in a superclass. See methods
then
Indicates the end of conditional blocks in control structures. See control expressions
true
Boolean true. See literals
undef
Prevents a class or module from responding to a method call. See modules and classes
unless
Used for unless and modifier unless statements. See control expressions
until
Creates a loop that executes until the condition is true. See control expressions
when
A condition in a case expression. See control expressions
while
Creates a loop that executes while the condition is true. See control expressions
yield
Starts execution of the block sent to the current method. See methods
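Two of the entries above, defined? and the precedence of and relative to &&, are easy to misread, so a small sketch may help. The method name and_versus_double_ampersand and the variable names are hypothetical and exist only for this illustration.

```ruby
# defined? returns a description string, or nil for something undefined.
defined?(String)       # => "constant"
defined?(puts)         # => "method"
defined?(@no_such_var) # => nil

# `&&` binds more tightly than assignment; `and` binds more loosely.
def and_versus_double_ampersand
  a = true && false   # parsed as a = (true && false)
  b = true and false  # parsed as (b = true) and false
  [a, b]
end

and_versus_double_ampersand # => [false, true]
```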
Maintainers
This page describes the current module, library, and extension maintainers of Ruby.
Module Maintainers
A module maintainer is responsible for a certain part of Ruby.
The maintainer fixes bugs of the part. Particularly, they should fix security vulnerabilities as soon
as possible.
They handle issues related to the module on Redmine or the mailing list.
They may be discharged by the 3 months rule [ruby-core:25764].
They have commit right to Ruby’s repository to modify their part in the repository.
They have the “developer” role on Redmine to modify issues.
They have authority to decide the feature of their part. But they should always respect discussions
on ruby-core/ruby-dev.
A submaintainer of a module is like a maintainer, but the submaintainer does not have authority to
change or add a feature on their part. They need consensus on ruby-core/ruby-dev before changing or adding one.
Some submaintainers have commit rights, others don’t.
Evaluator
Koichi Sasada (ko1)
Core classes
Yukihiro Matsumoto (matz)
Libraries
lib/mkmf.rb
unmaintained
lib/rubygems.rb, lib/rubygems/*
Eric Hodel (drbrain), Hiroshi SHIBATA (hsbt) github.com/rubygems/rubygems
lib/unicode_normalize.rb, lib/unicode_normalize/*
Martin J. Dürst
Extensions
ext/continuation
Koichi Sasada (ko1)
ext/coverage
Yusuke Endoh (mame)
ext/fiber
Koichi Sasada (ko1)
ext/monitor
Koichi Sasada (ko1)
ext/objspace
unmaintained
ext/pty
unmaintained
ext/ripper
unmaintained
ext/socket
Tanaka Akira (akr)
API change needs matz’s approval
ext/win32
NAKAMURA Usaku (usa)
Libraries
lib/abbrev.rb
Akinori MUSHA (knu) github.com/ruby/abbrev rubygems.org/gems/abbrev
lib/base64.rb
Yusuke Endoh (mame) github.com/ruby/base64 rubygems.org/gems/base64
lib/benchmark.rb
unmaintained github.com/ruby/benchmark rubygems.org/gems/benchmark
lib/bundler.rb, lib/bundler/*
Hiroshi SHIBATA (hsbt) github.com/rubygems/rubygems rubygems.org/gems/bundler
lib/cgi.rb, lib/cgi/*
unmaintained github.com/ruby/cgi rubygems.org/gems/cgi
lib/csv.rb
Kenta Murata (mrkn), Kouhei Sutou (kou) github.com/ruby/csv rubygems.org/gems/csv
lib/English.rb
unmaintained github.com/ruby/English rubygems.org/gems/English
lib/debug.rb
unmaintained github.com/ruby/debug
lib/delegate.rb
unmaintained github.com/ruby/delegate rubygems.org/gems/delegate
lib/did_you_mean.rb
Yuki Nishijima (yuki24) github.com/ruby/did_you_mean rubygems.org/gems/did_you_mean
ext/digest, ext/digest/*
Akinori MUSHA (knu) github.com/ruby/digest rubygems.org/gems/digest
lib/drb.rb, lib/drb/*
Masatoshi SEKI (seki) github.com/ruby/drb rubygems.org/gems/drb
lib/erb.rb
Masatoshi SEKI (seki), Takashi Kokubun (k0kubun) github.com/ruby/erb rubygems.org/gems/erb
lib/error_highlight.rb, lib/error_highlight/*
Yusuke Endoh (mame) github.com/ruby/error_highlight rubygems.org/gems/error_highlight
lib/fileutils.rb
unmaintained github.com/ruby/fileutils rubygems.org/gems/fileutils
lib/find.rb
Kazuki Tsujimoto (ktsj) github.com/ruby/find rubygems.org/gems/find
lib/forwardable.rb
Keiju ISHITSUKA (keiju) github.com/ruby/forwardable rubygems.org/gems/forwardable
lib/getoptlong.rb
unmaintained github.com/ruby/getoptlong rubygems.org/gems/getoptlong
lib/ipaddr.rb
Akinori MUSHA (knu) github.com/ruby/ipaddr rubygems.org/gems/ipaddr
lib/irb.rb, lib/irb/*
aycabta github.com/ruby/irb rubygems.org/gems/irb
lib/optparse.rb, lib/optparse/*
Nobuyoshi Nakada (nobu) github.com/ruby/optparse
lib/logger.rb
Naotoshi Seo (sonots) github.com/ruby/logger rubygems.org/gems/logger
lib/mutex_m.rb
Keiju ISHITSUKA (keiju) github.com/ruby/mutex_m rubygems.org/gems/mutex_m
lib/net/http.rb, lib/net/https.rb
NARUSE, Yui (naruse) github.com/ruby/net-http rubygems.org/gems/net-http
lib/net/protocol.rb
unmaintained github.com/ruby/net-protocol rubygems.org/gems/net-protocol
lib/observer.rb
unmaintained github.com/ruby/observer rubygems.org/gems/observer
lib/open3.rb
unmaintained github.com/ruby/open3 rubygems.org/gems/open3
lib/open-uri.rb
Tanaka Akira (akr) github.com/ruby/open-uri
lib/ostruct.rb
Marc-André Lafortune (marcandre) github.com/ruby/ostruct rubygems.org/gems/ostruct
lib/pp.rb
Tanaka Akira (akr) github.com/ruby/pp rubygems.org/gems/pp
lib/prettyprint.rb
Tanaka Akira (akr) github.com/ruby/prettyprint rubygems.org/gems/prettyprint
lib/pstore.rb
unmaintained github.com/ruby/pstore rubygems.org/gems/pstore
lib/racc.rb, lib/racc/*
Aaron Patterson (tenderlove), Hiroshi SHIBATA (hsbt) github.com/ruby/racc rubygems.org/gems/racc
lib/readline.rb
aycabta github.com/ruby/readline rubygems.org/gems/readline
lib/resolv.rb
Tanaka Akira (akr) github.com/ruby/resolv rubygems.org/gems/resolv
lib/resolv-replace.rb
Tanaka Akira (akr) github.com/ruby/resolv-replace rubygems.org/gems/resolv-replace
lib/rdoc.rb, lib/rdoc/*
Eric Hodel (drbrain), Hiroshi SHIBATA (hsbt) github.com/ruby/rdoc rubygems.org/gems/rdoc
lib/reline.rb, lib/reline/*
aycabta github.com/ruby/reline rubygems.org/gems/reline
lib/rinda/*
Masatoshi SEKI (seki) github.com/ruby/rinda rubygems.org/gems/rinda
lib/securerandom.rb
Tanaka Akira (akr) github.com/ruby/securerandom rubygems.org/gems/securerandom
lib/set.rb
Akinori MUSHA (knu) github.com/ruby/set rubygems.org/gems/set
lib/shellwords.rb
Akinori MUSHA (knu) github.com/ruby/shellwords rubygems.org/gems/shellwords
lib/singleton.rb
Yukihiro Matsumoto (matz) github.com/ruby/singleton rubygems.org/gems/singleton
lib/tempfile.rb
unmaintained github.com/ruby/tempfile rubygems.org/gems/tempfile
lib/time.rb
Tanaka Akira (akr) github.com/ruby/time rubygems.org/gems/time
lib/timeout.rb
Yukihiro Matsumoto (matz) github.com/ruby/timeout rubygems.org/gems/timeout
lib/thwait.rb
Keiju ISHITSUKA (keiju) github.com/ruby/thwait rubygems.org/gems/thwait
lib/tmpdir.rb
unmaintained github.com/ruby/tmpdir rubygems.org/gems/tmpdir
lib/tsort.rb
Tanaka Akira (akr) github.com/ruby/tsort rubygems.org/gems/tsort
lib/un.rb
WATANABE Hirofumi (eban) github.com/ruby/un rubygems.org/gems/un
lib/uri.rb, lib/uri/*
YAMADA, Akira (akira) github.com/ruby/uri rubygems.org/gems/uri
lib/yaml.rb, lib/yaml/*
Aaron Patterson (tenderlove), Hiroshi SHIBATA (hsbt) github.com/ruby/yaml rubygems.org/gems/yaml
lib/weakref.rb
unmaintained github.com/ruby/weakref rubygems.org/gems/weakref
Extensions
ext/bigdecimal
Kenta Murata (mrkn) github.com/ruby/bigdecimal rubygems.org/gems/bigdecimal
ext/cgi
Nobuyoshi Nakada (nobu) github.com/ruby/cgi rubygems.org/gems/cgi
ext/date
unmaintained github.com/ruby/date rubygems.org/gems/date
ext/etc
Ruby core team github.com/ruby/etc rubygems.org/gems/etc
ext/fcntl
Ruby core team github.com/ruby/fcntl rubygems.org/gems/fcntl
ext/fiddle
Aaron Patterson (tenderlove) github.com/ruby/fiddle rubygems.org/gems/fiddle
ext/io/console
Nobuyoshi Nakada (nobu) github.com/ruby/io-console rubygems.org/gems/io-console
ext/io/nonblock
Nobuyoshi Nakada (nobu) github.com/ruby/io-nonblock rubygems.org/gems/io-nonblock
ext/io/wait
Nobuyoshi Nakada (nobu) github.com/ruby/io-wait rubygems.org/gems/io-wait
ext/json
NARUSE, Yui (naruse), Hiroshi SHIBATA (hsbt) github.com/flori/json rubygems.org/gems/json
ext/nkf
NARUSE, Yui (naruse) github.com/ruby/nkf rubygems.org/gems/nkf
ext/openssl
Kazuki Yamaguchi (rhe) github.com/ruby/openssl rubygems.org/gems/openssl
ext/pathname
Tanaka Akira (akr) github.com/ruby/pathname rubygems.org/gems/pathname
ext/psych
Aaron Patterson (tenderlove), Hiroshi SHIBATA (hsbt) github.com/ruby/psych rubygems.org/gems/psych
ext/racc
Aaron Patterson (tenderlove), Hiroshi SHIBATA (hsbt) github.com/ruby/racc rubygems.org/gems/racc
ext/readline
TAKAO Kouji (kouji) github.com/ruby/readline-ext rubygems.org/gems/readline-ext
ext/stringio
Nobuyoshi Nakada (nobu) github.com/ruby/stringio rubygems.org/gems/stringio
ext/strscan
Kouhei Sutou (kou) github.com/ruby/strscan rubygems.org/gems/strscan
ext/syslog
Akinori MUSHA (knu) github.com/ruby/syslog rubygems.org/gems/syslog
ext/win32ole
Masaki Suketa (suke) github.com/ruby/win32ole rubygems.org/gems/win32ole
ext/zlib
NARUSE, Yui (naruse) github.com/ruby/zlib rubygems.org/gems/zlib
Platform Maintainers
mswin64 (Microsoft Windows)
NAKAMURA Usaku (usa)
mingw32 (Minimalist GNU for Windows)
Nobuyoshi Nakada (nobu)
AIX
Yutaka Kanemoto (kanemoto)
FreeBSD
Akinori MUSHA (knu)
Solaris
Naohisa Goto (ngoto)
RHEL, CentOS
KOSAKI Motohiro (kosaki)
macOS
Kenta Murata (mrkn)
OpenBSD
Jeremy Evans (jeremyevans0)
cygwin, …
none. (Maintainer WANTED)
WebAssembly/WASI
Yuta Saito (katei)
Marshal Format
The Marshal format is used to serialize ruby objects. The format can store arbitrary objects through three
user-defined extension mechanisms.
For documentation on using Marshal to serialize and deserialize objects, see the Marshal module.
This document calls a serialized set of objects a stream. The Ruby implementation can load a set of objects
from a String, an IO or an object that implements a getc method.
Stream Format
The first two bytes of the stream contain the major and minor version, each as a single byte encoding a
digit. The version implemented in Ruby is 4.8 (stored as “\x04\x08”) and is supported by Ruby 1.8.0 and
newer.
Different major versions of the Marshal format are not compatible and cannot be understood by other major
versions. Lesser minor versions of the format can be understood by newer minor versions. Format 4.7 can
be loaded by a 4.8 implementation but format 4.8 cannot be loaded by a 4.7 implementation.
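The version bytes can be observed directly. The following is a minimal sketch; the variable names are illustrative, and Marshal::MAJOR_VERSION and Marshal::MINOR_VERSION are the constants the Marshal module provides for the implemented version.

```ruby
# Dump any object and inspect the first two bytes of the stream.
stream = Marshal.dump(nil)
major, minor = stream.bytes.first(2)

p [major, minor]                                    # version bytes of this stream
p [Marshal::MAJOR_VERSION, Marshal::MINOR_VERSION]  # version implemented by this Ruby
```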
Following the version bytes is a stream describing the serialized object. The stream contains nested objects
(the same as a Ruby object) but objects in the stream do not necessarily have a direct mapping to the Ruby
object model.
Each object in the stream is described by a byte indicating its type followed by one or more bytes describing
the object. When “object” is mentioned below it means any of the types below that defines a Ruby object.
"\x04\x08:\x0ahello"
“;” represents a Symbol link which references a previously defined Symbol. Following the type byte is a long
containing the index in the lookup table for the linked (referenced) Symbol.
For example, the following stream contains [:hello, :hello]:
"\x04\b[\a:\nhello;\x00"
When a “symbol” is referenced below it may be either a real symbol or a symbol link.
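As a small sketch of the symbol link, dumping an Array that holds the same Symbol twice should produce the stream shown above. The variable names are illustrative only.

```ruby
stream = Marshal.dump([:hello, :hello])
p stream

# The second :hello is written as a symbol link: ";" followed by index 0.
expected = "\x04\b[\a:\nhello;\x00".b
p stream == expected
```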
Object References
Separate from but similar to symbol references, the stream contains only one copy of each object (as
determined by object_id). This applies to all objects except true, false, nil, Fixnums and Symbols (which are
stored separately as described above). A one-indexed 32 bit value is stored for each object and reused when
the object is encountered again. (The first object has an index of 1).
“@” represents an object link. Following the type byte is a long giving the index of the object.
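The effect of an object link is visible after loading: both positions refer to one object again. This is a sketch under one assumption, namely that the String is binary (created with String#b) so that no encoding instance variable appears in the stream.

```ruby
hello = "hello".b   # binary String, so no encoding instance variable is dumped
stream = Marshal.dump([hello, hello])
p stream

loaded = Marshal.load(stream)
p loaded[0].equal?(loaded[1])   # both slots refer to one String object
```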
For example, the following stream contains an Array of the same "hello" object twice:
"\004\b[\a\"\nhello@\006"
Instance Variables
“I” indicates that instance variables follow the next object. An object follows the type byte. Following the
object is a length indicating the number of instance variables for the object. Following the length is a set of
name-value pairs. The names are symbols while the values are objects. The symbols must be instance
variable names (:@name).
An Object (“o” type, described below) uses the same format for its instance variables as described here.
For a String and Regexp (described below) a special instance variable :E is used to indicate the Encoding.
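The :E instance variable can be seen by dumping short Strings in two encodings. This is only a sketch; it relies on current Ruby behavior, where a UTF-8 String carries a true value for :E and a US-ASCII String carries a false value.

```ruby
utf8  = Marshal.dump("hi".encode(Encoding::UTF_8))
ascii = Marshal.dump("hi".encode(Encoding::US_ASCII))

# Each stream is: version, "I", the String itself, an instance variable
# count of 1, the symbol :E, and then the value of :E.
p utf8
p ascii
```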
Extended
“e” indicates that the next object is extended by a module. An object follows the type byte. Following the
object is a symbol that contains the name of the module the object is extended by.
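A minimal sketch of an extended object follows. Greeter is a hypothetical module defined only for this example.

```ruby
module Greeter   # hypothetical module used only for this example
  def greet
    "hi"
  end
end

obj = Object.new.extend(Greeter)
stream = Marshal.dump(obj)
p stream   # begins with the version bytes, then "e" and the symbol :Greeter

copy = Marshal.load(stream)
p copy.greet   # the loaded object is extended by Greeter again
```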
Array
“[” represents an Array. Following the type byte is a long indicating the number of objects in the array. The
given number of objects follow the length.
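As a short sketch, a three-element Array of nil, true and false shows the type byte, the length and the elements in order. The single bytes "0", "T" and "F" used for nil, true and false are how current Ruby writes those values.

```ruby
stream = Marshal.dump([nil, true, false])

# "[" is the type byte, "\b" is the long encoding of the length 3,
# and "0", "T", "F" are the three elements.
p stream
```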
Bignum
“l” represents a Bignum which is composed of three parts:
sign
A single byte containing “+” for a positive value or “-” for a negative value.
length
A long indicating the number of bytes of Bignum data follows, divided by two. Multiply the length
by two to determine the number of bytes of data that follow.
data
Bytes of Bignum data representing the number.
The following ruby code will reconstruct the Bignum value from an array of bytes:
result = 0
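The reconstruction can be sketched as a complete example. It assumes the data bytes are stored least significant byte first, which is how current Ruby writes them; bignum_from_bytes is a hypothetical helper name, not part of Marshal.

```ruby
# Hypothetical helper: rebuild an Integer from the sign and data bytes
# of an "l" entry, assuming little-endian byte order.
def bignum_from_bytes(sign, bytes)
  result = 0
  bytes.each_with_index do |byte, exp|
    result += byte * 2**(exp * 8)
  end
  sign == "-" ? -result : result
end

dumped = Marshal.dump(2**64)
sign = dumped[3]             # after the two version bytes and the "l" type byte
data = dumped.bytes.drop(5)  # the length fits in one byte for this value

p bignum_from_bytes(sign, data) == 2**64
```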
Regular Expression
“/” represents a regular expression. Following the type byte is a byte sequence containing the regular
expression source. Following the source is a byte containing the regular expression options
(case-insensitive, etc.) as a signed 8-bit value.
Regular expressions can have an encoding attached through instance variables (see above). If no encoding
is attached escapes for the following regexp specials not present in ruby 1.8 must be removed: g-m, o-q, u,
y, E, F, H-L, N-V, X, Y.
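A brief sketch of a dumped regular expression: in current Ruby the options byte comes right after the source bytes, and the value 1 corresponds to Regexp::IGNORECASE.

```ruby
stream = Marshal.dump(/ab/i)

# "/" is the type byte, "\a" is the long encoding of the source length 2,
# "ab" is the source, and "\x01" is the options byte.
p stream
p Regexp::IGNORECASE
```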
String
‘“’ represents a String. Following the type byte is a byte sequence containing the string content. When
dumped from ruby 1.9 an encoding instance variable (:E see above) should be included unless the
encoding is binary.
Struct
“S” represents a Struct. Following the type byte is a symbol containing the name of the struct. Following the
name is a long indicating the number of members in the struct. Double the number of objects follow the
member count. Each member is a pair containing the member’s symbol and an object for the value of that
member.
If the struct name does not match a Struct subclass in the running ruby an exception should be raised.
If there is a mismatch between the struct in the currently running ruby and the member count in the
marshaled struct an exception should be raised.
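The Struct layout can be sketched with a two-member struct. Point is a hypothetical Struct defined only for this example.

```ruby
Point = Struct.new(:x, :y)   # hypothetical Struct used only for this example

stream = Marshal.dump(Point.new(1, 2))

# "S", the symbol :Point, a member count of 2, then the pairs
# (:x, 1) and (:y, 2).
p stream
p Marshal.load(stream) == Point.new(1, 2)
```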
User Class
“C” represents a subclass of a String, Regexp, Array or Hash. Following the type byte is a symbol
containing the name of the subclass. Following the name is the wrapped object.
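A minimal sketch of the “C” type, using a hypothetical Array subclass named Stack:

```ruby
class Stack < Array; end   # hypothetical subclass used only for this example

stream = Marshal.dump(Stack.new([1]))

# "C", the symbol :Stack, then the wrapped Array ("[" with one element).
p stream
p Marshal.load(stream).class
```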
User Defined
“u” represents an object with a user-defined serialization format using the _dump instance method
and _load class method. Following the type byte is a symbol containing the class name. Following the class
name is a byte sequence containing the user-defined representation of the object.
The class method _load is called on the class with a string created from the byte-sequence.
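The _dump and _load pair can be sketched as follows. Temperature is a hypothetical class, and storing the value as a decimal string is just one possible user-defined representation.

```ruby
class Temperature   # hypothetical class used only for this example
  attr_reader :degrees

  def initialize(degrees)
    @degrees = degrees
  end

  # Called by Marshal.dump; must return a String.
  def _dump(_level)
    @degrees.to_s
  end

  # Called by Marshal.load with the String produced by _dump.
  def self._load(string)
    new(Integer(string))
  end
end

stream = Marshal.dump(Temperature.new(21))
copy = Marshal.load(stream)
p copy.degrees
```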
User Marshal
“U” represents an object with a user-defined serialization format using
the marshal_dump and marshal_load instance methods. Following the type byte is a symbol containing the
class name. Following the class name is an object containing the data.
Upon loading a new instance must be allocated and marshal_load must be called on the instance with the
data.
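A sketch of the marshal_dump and marshal_load pair follows. Account is a hypothetical class, and the choice of an Array as the dumped data is an assumption made for illustration; any marshalable object would do.

```ruby
class Account   # hypothetical class used only for this example
  attr_reader :owner, :balance

  def initialize(owner, balance)
    @owner = owner
    @balance = balance
  end

  # Returns the object that is written to the stream after the class name.
  def marshal_dump
    [@owner, @balance]
  end

  # Called on a newly allocated instance with the loaded data.
  def marshal_load(data)
    @owner, @balance = data
  end
end

stream = Marshal.dump(Account.new("alice", 10))
copy = Marshal.load(stream)
p [copy.owner, copy.balance]
```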
MemoryView
MemoryView provides features for sharing multidimensional homogeneous arrays of fixed-size elements in
memory among extension libraries.
Disclaimer
This feature is still experimental. The specification described here can be changed in the future.
This document is under construction. Please refer to the master branch of ruby for the latest version
of this document.
Overview
We sometimes deal with objects whose internal representation is an array of fixed-size elements of the same
type in a contiguous memory area. Numo::NArray in numo-narray and Magick::Image in rmagick are typical
examples of such objects. MemoryView acts as a hub for sharing the internal data of such objects among
such libraries without copying.
Sharing data without copying is very important in fields such as data analysis, machine learning, and
image processing. In these fields, people need to handle large amounts of in-memory data with several
libraries. If we are forced to copy large data in order to exchange it among libraries, a large part of the
data processing time is spent copying. MemoryView lets you avoid that wasted time.
MemoryView has two categories of APIs:
1. Producer API
Classes can register their own MemoryView entry, which allows objects of those classes to expose their
MemoryView.
2. Consumer API
The Consumer API allows us to obtain and manage the MemoryView of an object.
MemoryView structure
A MemoryView structure, rb_memory_view_t, is used for exporting objects’ MemoryView. This structure
contains the reference of the object, which is the owner of the MemoryView, the pointer to the head of
exported memory, and the metadata that describes the structure of the memory. The metadata can describe
multidimensional arrays with strides.
MemoryView APIs
For consumers
bool rb_memory_view_available_p(VALUE obj)
Return true if obj supports exporting a MemoryView. Return false otherwise.
A true result does not guarantee that the function rb_memory_view_get will succeed.
bool rb_memory_view_get(VALUE obj, rb_memory_view_t *view, int flags)
If the given obj supports exporting a MemoryView that conforms to the given flags, this function
fills view with the information of the MemoryView and returns true. In this case, the reference count
of obj is increased.
If the given combination of obj and flags cannot export a MemoryView, this function returns false.
The content of view is not touched in this case.
The exported MemoryView must be released by rb_memory_view_release when the MemoryView
is no longer needed.
bool rb_memory_view_release(rb_memory_view_t *view)
Release the given MemoryView view and decrement the reference count of view->obj.
Consumers must call this function when the MemoryView is no longer needed. Failing to call this
function causes a memory leak.
ssize_t rb_memory_view_item_size_from_format(const char *format, const char **err)
Calculate the number of bytes occupied by an element.
When the calculation fails, the location of the failure in format is stored in err, and the function returns -1.
void *rb_memory_view_get_item_pointer(rb_memory_view_t *view, const ssize_t *indices)
Calculate the location of the item indicated by the given indices. The length of indices must equal
view->ndim. This function initializes view->item_desc if needed.
VALUE rb_memory_view_get_item(rb_memory_view_t *view, const ssize_t *indices)
Return the Ruby object representation of the item indicated by the given indices. The length
of indices must equal view->ndim. This function uses rb_memory_view_get_item_pointer.
bool rb_memory_view_init_as_byte_array(rb_memory_view_t *view, VALUE obj, void *data, const
ssize_t len, const bool readonly)
Fill the members of view as a 1-dimensional byte array.
void rb_memory_view_fill_contiguous_strides(const ssize_t ndim, const ssize_t item_size, const
ssize_t *const shape, const bool row_major_p, ssize_t *const strides)
Fill the strides array with the byte strides of a contiguous array of the given shape with the given element size.
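To illustrate what byte strides of a contiguous array look like, here is a sketch in Ruby rather than C. The method contiguous_strides is a hypothetical stand-in written for this explanation; it is not the C function and is not part of any Ruby API.

```ruby
# Hypothetical illustration: compute byte strides for a contiguous array.
# In row-major order the last dimension varies fastest; in column-major
# order the first dimension varies fastest.
def contiguous_strides(shape, item_size, row_major: true)
  strides = Array.new(shape.size)
  step = item_size
  dims = row_major ? (shape.size - 1).downto(0) : 0.upto(shape.size - 1)
  dims.each do |i|
    strides[i] = step   # bytes to skip when index i increases by one
    step *= shape[i]
  end
  strides
end

p contiguous_strides([2, 3], 8)                    # row-major
p contiguous_strides([2, 3], 8, row_major: false)  # column-major
```

For a 2x3 array of 8-byte items, moving one row down in row-major order skips a whole row of three items, while moving one column across skips a single item.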
void rb_memory_view_prepare_item_desc(rb_memory_view_t *view)
Fill the item_desc member of view.
bool rb_memory_view_is_contiguous(const rb_memory_view_t *view)
Return true if the data in the MemoryView view is row-major or column-major contiguous.
Return false otherwise.
bool rb_memory_view_is_row_major_contiguous(const rb_memory_view_t *view)
Return true if the data in the MemoryView view is row-major contiguous.
Return false otherwise.
bool rb_memory_view_is_column_major_contiguous(const rb_memory_view_t *view)
Return true if the data in the MemoryView view is column-major contiguous.
Return false otherwise.
Argument Converters
An option can specify that its argument is to be converted from the default String to an instance of another
class.
Contents
Built-In Argument Converters
o Date
o DateTime
o Time
o URI
o Shellwords
o Integer
o Float
o Numeric
o DecimalInteger
o OctalInteger
o DecimalNumeric
o TrueClass
o FalseClass
o Object
o String
o Array
o Regexp
Custom Argument Converters
Date
File date.rb defines an option whose argument is to be converted to a Date object. The argument is
converted by method Date#parse.
require 'optparse/date'
parser = OptionParser.new
parser.on('--date=DATE', Date) do |value|
p [value, value.class]
end
parser.parse!
Executions:
DateTime
File datetime.rb defines an option whose argument is to be converted to a DateTime object. The argument
is converted by method DateTime#parse.
require 'optparse/date'
parser = OptionParser.new
parser.on('--datetime=DATETIME', DateTime) do |value|
p [value, value.class]
end
parser.parse!
Executions:
Time
File time.rb defines an option whose argument is to be converted to a Time object. The argument is
converted by method Time#httpdate or Time#parse.
require 'optparse/time'
parser = OptionParser.new
parser.on('--time=TIME', Time) do |value|
p [value, value.class]
end
parser.parse!
Executions:
$ ruby time.rb --time "Thu, 06 Oct 2011 02:26:12 GMT"
URI
File uri.rb defines an option whose argument is to be converted to a URI object. The argument is converted
by method URI#parse.
require 'optparse/uri'
parser = OptionParser.new
parser.on('--uri=URI', URI) do |value|
p [value, value.class]
end
parser.parse!
Executions:
Shellwords
File shellwords.rb defines an option whose argument is to be converted to an Array object by
method Shellwords#shellwords.
require 'optparse/shellwords'
parser = OptionParser.new
parser.on('--shellwords=SHELLWORDS', Shellwords) do |value|
p [value, value.class]
end
parser.parse!
Executions:
Integer
File integer.rb defines an option whose argument is to be converted to an Integer object. The argument is
converted by method Kernel#Integer.
require 'optparse'
parser = OptionParser.new
parser.on('--integer=INTEGER', Integer) do |value|
p [value, value.class]
end
parser.parse!
Executions:
[100, Integer]
[-100, Integer]
[64, Integer]
[256, Integer]
$ ruby integer.rb --integer 0b100
[4, Integer]
Float
File float.rb defines an option whose argument is to be converted to a Float object. The argument is
converted by method Kernel#Float.
require 'optparse'
parser = OptionParser.new
parser.on('--float=FLOAT', Float) do |value|
p [value, value.class]
end
parser.parse!
Executions:
[1.0, Float]
[3.14159, Float]
[123.4, Float]
[0.01234, Float]
Numeric
File numeric.rb defines an option whose argument is to be converted to an instance of Rational, Float, or
Integer. The argument is converted by method Kernel#Rational, Kernel#Float, or Kernel#Integer.
require 'optparse'
parser = OptionParser.new
parser.on('--numeric=NUMERIC', Numeric) do |value|
p [value, value.class]
end
parser.parse!
Executions:
[(1/3), Rational]
[0.3333, Float]
[3, Integer]
DecimalInteger
File decimal_integer.rb defines an option whose argument is to be converted to an Integer object. The
argument is converted by method Kernel#Integer.
require 'optparse'
include OptionParser::Acceptables
parser = OptionParser.new
parser.on('--decimal_integer=DECIMAL_INTEGER', DecimalInteger) do |value|
p [value, value.class]
end
parser.parse!
The argument may not be in a binary or hexadecimal format; a leading zero is ignored (not parsed as octal).
Executions:
[100, Integer]
[-100, Integer]
[-100, Integer]
OctalInteger
File octal_integer.rb defines an option whose argument is to be converted to an Integer object. The
argument is converted by method Kernel#Integer.
require 'optparse'
include OptionParser::Acceptables
parser = OptionParser.new
parser.on('--octal_integer=OCTAL_INTEGER', OctalInteger) do |value|
p [value, value.class]
end
parser.parse!
The argument may not be in a binary or hexadecimal format; it is parsed as octal, regardless of whether it
has a leading zero.
Executions:
[64, Integer]
[-64, Integer]
[64, Integer]
DecimalNumeric
File decimal_numeric.rb defines an option whose argument is to be converted to an Integer object. The
argument is converted by method Kernel#Integer.
require 'optparse'
include OptionParser::Acceptables
parser = OptionParser.new
parser.on('--decimal_numeric=DECIMAL_NUMERIC', DecimalNumeric) do |value|
p [value, value.class]
end
parser.parse!
The argument may not be in a binary or hexadecimal format; a leading zero causes the argument to be
parsed as octal.
Executions:
[100, Integer]
[-100, Integer]
[64, Integer]
TrueClass
File true_class.rb defines an option whose argument is to be converted to true or false. The argument is
evaluated by method Object#nil?.
require 'optparse'
parser = OptionParser.new
parser.on('--true_class=TRUE_CLASS', TrueClass) do |value|
p [value, value.class]
end
parser.parse!
[true, TrueClass]
[true, TrueClass]
[false, FalseClass]
[false, FalseClass]
[false, FalseClass]
[false, FalseClass]
FalseClass
File false_class.rb defines an option whose argument is to be converted to true or false. The argument is
evaluated by method Object#nil?.
require 'optparse'
parser = OptionParser.new
parser.on('--false_class=FALSE_CLASS', FalseClass) do |value|
p [value, value.class]
end
parser.parse!
[false, FalseClass]
[false, FalseClass]
$ ruby false_class.rb --false_class -
[false, FalseClass]
[false, FalseClass]
[true, TrueClass]
[true, TrueClass]
[true, TrueClass]
Object
File object.rb defines an option whose argument is not to be converted from String.
require 'optparse'
parser = OptionParser.new
parser.on('--object=OBJECT', Object) do |value|
p [value, value.class]
end
parser.parse!
Executions:
["foo", String]
["nil", String]
String
File string.rb defines an option whose argument is not to be converted from String.
require 'optparse'
parser = OptionParser.new
parser.on('--string=STRING', String) do |value|
p [value, value.class]
end
parser.parse!
Executions:
["foo", String]
["nil", String]
Array
File array.rb defines an option whose argument is to be converted from String to an array of strings, based
on comma-separated substrings.
require 'optparse'
parser = OptionParser.new
parser.on('--array=ARRAY', Array) do |value|
p [value, value.class]
end
parser.parse!
Executions:
[[], Array]
Regexp
require 'optparse'
parser = OptionParser.new
parser.on('--regexp=REGEXP', Regexp) do |value|
p [value, value.class]
end
parser.parse!
Executions:
Custom Argument Converters
require 'optparse/date'
parser = OptionParser.new
parser.accept(Complex) do |value|
value.to_c
end
parser.on('--complex COMPLEX', Complex) do |value|
p [value, value.class]
end
parser.parse!
Executions:
[(0+0i), Complex]
[(1+0i), Complex]
[(0.3-0.5i), Complex]
This custom converter accepts any 1-word argument and capitalizes it, if possible.
require 'optparse/date'
parser = OptionParser.new
parser.accept(:capitalize, /\w*/) do |value|
value.capitalize
end
parser.on('--capitalize XXX', :capitalize) do |value|
p [value, value.class]
end
parser.parse!
Executions:
["Foo", String]
Creates an option from the given parameters params. See Parameters for New Options.
The block, if given, is the handler for the created option. When the option is encountered during
command-line parsing, the block is called with the argument given for the option, if any. See Option Handlers.
Option Names
There are two kinds of option names:
Short option name, consisting of a single hyphen and a single character.
Long option name, consisting of two hyphens and one or more characters.
Short Names
require 'optparse'
parser = OptionParser.new
parser.on('-x', 'One short name') do |value|
p ['-x', value]
end
parser.on('-1', '-%', 'Two short names (aliases)') do |value|
p ['-1 or -%', value]
end
parser.parse!
Executions:
$ ruby short_simple.rb -x
["-x", true]
$ ruby short_simple.rb -1 -x -%
["-x", true]
require 'optparse'
parser = OptionParser.new
parser.on('-xXXX', 'Short name with required argument') do |value|
p ['-x', value]
end
parser.parse!
Executions:
$ ruby short_required.rb -x
["-x", "FOO"]
require 'optparse'
parser = OptionParser.new
parser.on('-x [XXX]', 'Short name with optional argument') do |value|
p ['-x', value]
end
parser.parse!
Executions:
$ ruby short_optional.rb -x
["-x", nil]
$ ruby short_optional.rb -x FOO
["-x", "FOO"]
require 'optparse'
parser = OptionParser.new
parser.on('-[!-~]', 'Short names in (very large) range') do |name, value|
p ['!-~', name, value]
end
parser.parse!
Executions:
$ ruby short_range.rb -!
$ ruby short_range.rb -!
$ ruby short_range.rb -A
$ ruby short_range.rb -z
Long Names
p ['--xxx', value]
end
parser.parse!
Executions:
["--xxx", true]
["--xxx", true]
require 'optparse'
parser = OptionParser.new
parser.on('--xxx XXX', 'Long name with required argument') do |value|
p ['--xxx', value]
end
parser.parse!
Executions:
["--xxx", "FOO"]
require 'optparse'
parser = OptionParser.new
parser.on('--xxx [XXX]', 'Long name with optional argument') do |value|
p ['--xxx', value]
end
parser.parse!
Executions:
["--xxx", nil]
["--xxx", "FOO"]
Long Names with Negation
A long name may be defined with both positive and negative senses.
File long_with_negation.rb defines an option that has both senses.
require 'optparse'
parser = OptionParser.new
parser.on('--[no-]binary', 'Long name with negation') do |value|
p [value, value.class]
end
parser.parse!
Executions:
[true, TrueClass]
[false, FalseClass]
Mixed Names
An option may have both short and long names.
File mixed_names.rb defines a mixture of short and long names.
require 'optparse'
parser = OptionParser.new
parser.on('-x', '--xxx', 'Short and long, no argument') do |value|
p ['--xxx', value]
end
parser.on('-yYYY', '--yyy', 'Short and long, required argument') do |value|
p ['--yyy', value]
end
parser.on('-z [ZZZ]', '--zzz', 'Short and long, optional argument') do |value|
p ['--zzz', value]
end
parser.parse!
Executions:
$ ruby mixed_names.rb -x
["--xxx", true]
["--xxx", true]
$ ruby mixed_names.rb -y
["--yyy", "FOO"]
["--yyy", "BAR"]
$ ruby mixed_names.rb -z
["--zzz", nil]
["--zzz", "BAZ"]
["--zzz", "BAT"]
Argument Keywords
As seen above, a given option name string may itself indicate whether the option has no argument, a
required argument, or an optional argument.
An alternative is to use a separate symbol keyword, which is one of :NONE (the
default), :REQUIRED, :OPTIONAL.
File argument_keywords.rb defines an option with a required argument.
require 'optparse'
parser = OptionParser.new
parser.on('-x', '--xxx', :REQUIRED, 'Required argument') do |value|
p ['--xxx', value]
end
parser.parse!
Executions:
["--xxx", "FOO"]
Argument Strings
Still another way to specify a required argument is to define it in a string separate from the name string.
File argument_strings.rb defines an option with a required argument.
require 'optparse'
parser = OptionParser.new
parser.on('-x', '--xxx', '=XXX', 'Required argument') do |value|
p ['--xxx', value]
end
parser.parse!
Executions:
["--xxx", "FOO"]
Argument Values
Permissible argument values may be restricted either by specifying explicit values or by providing a pattern
that the given value must match.
require 'optparse'
parser = OptionParser.new
parser.on('-xXXX', ['foo', 'bar'], 'Values for required argument' ) do |value|
p ['-x', value]
end
parser.on('-y [YYY]', ['baz', 'bat'], 'Values for optional argument') do |value|
p ['-y', value]
end
parser.parse!
Executions:
$ ruby explicit_array_values.rb -x
["-x", "foo"]
$ ruby explicit_array_values.rb -x f
["-x", "foo"]
["-x", "bar"]
$ ruby explicit_array_values.rb -y ba
Executions:
$ ruby explicit_hash_values.rb -x
["-x", 0]
$ ruby explicit_hash_values.rb -x f
["-x", 0]
["-x", 1]
$ ruby explicit_hash_values.rb -y
["-y", nil]
["-y", 3]
$ ruby explicit_hash_values.rb -y ba
["-y", nil]
require 'optparse'
parser = OptionParser.new
parser.on('--xxx XXX', /foo/i, 'Matched values') do |value|
p ['--xxx', value]
end
parser.parse!
Executions:
["--xxx", "foo"]
["--xxx", "FOO"]
Argument Converters
An option can specify that its argument is to be converted from the default String to an instance of another
class.
There are a number of built-in converters. You can also define custom converters.
See Argument Converters.
Descriptions
A description parameter is any string parameter that is not recognized as an option name or a terminator; in
other words, it does not begin with a hyphen.
You may give any number of description parameters; each becomes a line in the text generated by option
--help.
File descriptions.rb has six strings in its array descriptions. These are all passed as parameters
to OptionParser#on, so that they all, line for line, become the option’s description.
require 'optparse'
parser = OptionParser.new
description = <<-EOT
Lorem ipsum dolor sit amet, consectetuer
adipiscing elit. Aenean commodo ligula eget.
Aenean massa. Cum sociis natoque penatibus
et magnis dis parturient montes, nascetur
ridiculus mus. Donec quam felis, ultricies
nec, pellentesque eu, pretium quis, sem.
EOT
descriptions = description.split($/)
parser.on('--xxx', *descriptions) do |value|
p ['--xxx', value]
end
parser.parse!
Executions:
["--xxx", true]
Option Handlers
The handler for an option is an executable that will be called when the option is encountered. The handler
may be:
A block (this is most often seen).
A proc.
A method.
Handler Blocks
An option handler may be a block.
File block.rb defines an option that has a handler block.
require 'optparse'
parser = OptionParser.new
parser.on('--xxx', 'Option with no argument') do |value|
p ['Handler block for -xxx called with value:', value]
end
parser.on('--yyy YYY', 'Option with required argument') do |value|
p ['Handler block for -yyy called with value:', value]
end
parser.parse!
Executions:
Handler Procs
An option handler may be a Proc.
File proc.rb defines an option that has a handler proc.
require 'optparse'
parser = OptionParser.new
parser.on(
'--xxx',
'Option with no argument',
->(value) {p ['Handler proc for -xxx called with value:', value]}
)
parser.on(
'--yyy YYY',
'Option with required argument',
->(value) {p ['Handler proc for -yyy called with value:', value]}
)
parser.parse!
Executions:
Handler Methods
An option handler may be a Method.
File method.rb defines an option that has a handler method.
require 'optparse'
parser = OptionParser.new
def xxx_handler(value)
p ['Handler method for -xxx called with value:', value]
end
parser.on('--xxx', 'Option with no argument', method(:xxx_handler))
def yyy_handler(value)
p ['Handler method for -yyy called with value:', value]
end
parser.on('--yyy YYY', 'Option with required argument', method(:yyy_handler))
parser.parse!
Executions:
Tutorial
Why OptionParser?
When a Ruby program executes, it captures its command-line arguments and options into variable ARGV.
This simple program just prints its ARGV:
p ARGV
The executing program is responsible for parsing and handling the command-line options.
OptionParser offers methods for parsing and handling those options.
With OptionParser, you can define options so that for each option:
The code that defines the option and code that handles that option are in the same place.
The option may take no argument, a required argument, or an optional argument.
The argument may be automatically converted to a specified class.
The argument may be restricted to specified forms.
The argument may be restricted to specified values.
The class also has method help, which displays automatically-generated help text.
Contents
To Begin With
Defining Options
Option Names
o Short Option Names
o Long Option Names
o Mixing Option Names
o Option Name Abbreviations
Option Arguments
o Option with No Argument
o Option with Required Argument
o Option with Optional Argument
o Argument Abbreviations
Argument Values
o Explicit Argument Values
Explicit Values in Array
Explicit Values in Hash
o Argument Value Patterns
Keyword Argument into
o Collecting Options
o Checking for Missing Options
o Default Values for Options
Argument Converters
Help
Top List and Base List
Defining Options
Parsing
o Method parse!
o Method parse
o Method order!
o Method order
o Method permute!
o Method permute
To Begin With
To use OptionParser:
1. Require the OptionParser code.
2. Create an OptionParser object.
3. Define one or more options.
4. Parse the command line.
File basic.rb defines three options, -x, -y, and -z, each with a descriptive string, and each with a block.
From these defined options, the parser automatically builds help text:
-x Whether to X
-y Whether to Y
-z Whether to Z
When an option is found during parsing, the block defined for the option is called with the argument value.
An invalid option raises an exception.
Method parse!, which is used most often in this tutorial, removes from ARGV the options and arguments it
finds, leaving other non-option arguments for the program to handle on its own. The method returns the
possibly-reduced ARGV array.
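The four steps can be sketched as follows. This is not necessarily the exact content of basic.rb; to keep the example self-contained it parses an explicit array (the illustrative variable argv) instead of the real ARGV.

```ruby
require 'optparse'              # 1. Require the OptionParser code.

parser = OptionParser.new       # 2. Create an OptionParser object.

# 3. Define options, each with a descriptive string and a block.
parser.on('-x', 'Whether to X') { |value| p ['x', value] }
parser.on('-y', 'Whether to Y') { |value| p ['y', value] }
parser.on('-z', 'Whether to Z') { |value| p ['z', value] }

# 4. Parse. parse! removes the options it finds from the given array
#    and returns what is left.
argv = %w[-x -z input_file.txt output_file.txt]
rest = parser.parse!(argv)
p rest
```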
Executions:
$ ruby basic.rb -x -z
["x", true]
["z", true]
[]
$ ruby basic.rb -z -y -x
["z", true]
["y", true]
["x", true]
[]
["x", true]
["input_file.txt", "output_file.txt"]
$ ruby basic.rb -a
Option Names
You can give an option one or more names of two types:
Short (1-character) name, beginning with one hyphen (-).
Long (multi-character) name, beginning with two hyphens (--).
require 'optparse'
parser = OptionParser.new
parser.on('-x', 'Short name') do |value|
p ['x', value]
end
parser.on('-1', '-%', 'Two short names') do |value|
p ['-1 or -%', value]
end
parser.parse!
Executions:
-x Short name
-1, -% Two short names
$ ruby short_names.rb -x
["x", true]
$ ruby short_names.rb -1
$ ruby short_names.rb -%
["x", true]
require 'optparse'
parser = OptionParser.new
parser.on('--xxx', 'Long name') do |value|
p ['-xxx', value]
end
parser.on('--y1%', '--z2#', "Two long names") do |value|
p ['--y1% or --z2#', value]
end
parser.parse!
Executions:
$ ruby long_names.rb --help
["-xxx", true]
A long name may be defined with both positive and negative senses.
File long_with_negation.rb defines an option that has both senses.
require 'optparse'
parser = OptionParser.new
parser.on('--[no-]binary', 'Long name with negation') do |value|
p [value, value.class]
end
parser.parse!
Executions:
[true, TrueClass]
require 'optparse'
parser = OptionParser.new
parser.on('-x', '--xxx', 'Short and long, no argument') do |value|
p ['--xxx', value]
end
parser.on('-yYYY', '--yyy', 'Short and long, required argument') do |value|
p ['--yyy', value]
end
parser.on('-z [ZZZ]', '--zzz', 'Short and long, optional argument') do |value|
p ['--zzz', value]
end
parser.parse!
Executions:
$ ruby mixed_names.rb -x
["--xxx", true]
["--xxx", true]
$ ruby mixed_names.rb -y
mixed_names.rb:12:in `<main>': missing argument: -y (OptionParser::MissingArgument)
["--yyy", "FOO"]
["--yyy", "BAR"]
$ ruby mixed_names.rb -z
["--zzz", nil]
["--zzz", "BAZ"]
["--zzz", nil]
["--zzz", "BAT"]
require 'optparse'
parser = OptionParser.new
parser.on('-n', '--dry-run',) do |value|
p ['--dry-run', value]
end
parser.on('-d', '--draft',) do |value|
p ['--draft', value]
end
parser.parse!
Executions:
-n, --dry-run
-d, --draft
$ ruby name_abbrev.rb -n
["--dry-run", true]
["--dry-run", true]
$ ruby name_abbrev.rb -d
["--draft", true]
["--draft", true]
["--dry-run", true]
["--draft", true]
require 'optparse'
parser = OptionParser.new
parser.on('-n', '--dry-run',) do |value|
p ['--dry-run', value]
end
parser.on('-d', '--draft',) do |value|
p ['--draft', value]
end
parser.require_exact = true
parser.parse!
Executions:
["--dry-run", true]
Option Arguments
An option may take no argument, a required argument, or an optional argument.
require 'optparse'
parser = OptionParser.new
parser.on('-x XXX', '--xxx', 'Required argument via short name') do |value|
p ['--xxx', value]
end
parser.on('-y', '--y YYY', 'Required argument via long name') do |value|
p ['--yyy', value]
end
parser.parse!
["--xxx", "AAA"]
["--yyy", "BBB"]
$ ruby required_argument.rb -x
require 'optparse'
parser = OptionParser.new
parser.on('-x [XXX]', '--xxx', 'Optional argument via short name') do |value|
p ['--xxx', value]
end
parser.on('-y', '--yyy [YYY]', 'Optional argument via long name') do |value|
p ['--yyy', value]
end
parser.parse!
["--xxx", "AAA"]
["--yyy", "BBB"]
Argument Values
Permissible argument values may be restricted either by specifying explicit values or by providing a pattern
that the given value must match.
require 'optparse'
parser = OptionParser.new
parser.on('-xXXX', ['foo', 'bar'], 'Values for required argument' ) do |value|
p ['-x', value]
end
parser.on('-y [YYY]', ['baz', 'bat'], 'Values for optional argument') do |value|
p ['-y', value]
end
parser.parse!
Executions:
$ ruby explicit_array_values.rb -x
["-x", "foo"]
$ ruby explicit_array_values.rb -x f
["-x", "foo"]
["-x", "bar"]
$ ruby explicit_array_values.rb -y ba
require 'optparse'
parser = OptionParser.new
parser.on('-xXXX', {foo: 0, bar: 1}, 'Values for required argument' ) do |value|
p ['-x', value]
end
parser.on('-y [YYY]', {baz: 2, bat: 3}, 'Values for optional argument') do |value|
p ['-y', value]
end
parser.parse!
Executions:
$ ruby explicit_hash_values.rb -x
["-x", 0]
$ ruby explicit_hash_values.rb -x f
["-x", 0]
["-x", 1]
$ ruby explicit_hash_values.rb -y
["-y", nil]
["-y", 2]
["-y", 3]
$ ruby explicit_hash_values.rb -y ba
["-y", nil]
require 'optparse'
parser = OptionParser.new
parser.on('--xxx XXX', /foo/i, 'Matched values') do |value|
p ['--xxx', value]
end
parser.parse!
Executions:
["--xxx", "foo"]
["--xxx", "FOO"]
Collecting Options
Use keyword argument into to collect options.
require 'optparse'
parser = OptionParser.new
parser.on('-x', '--xxx', 'Short and long, no argument')
parser.on('-yYYY', '--yyy', 'Short and long, required argument')
parser.on('-z [ZZZ]', '--zzz', 'Short and long, optional argument')
options = {}
parser.parse!(into: options)
p options
Executions:
{:xxx=>true}
{:xxx=>true, :yyy=>"FOO"}
{:xxx=>true, :yyy=>"BAR"}
Note in the last execution that the argument value for option --yyy was overwritten.
Checking for Missing Options
Use the collected options to check for missing options.
require 'optparse'
parser = OptionParser.new
parser.on('-x', '--xxx', 'Short and long, no argument')
parser.on('-yYYY', '--yyy', 'Short and long, required argument')
parser.on('-z [ZZZ]', '--zzz', 'Short and long, optional argument')
options = {}
parser.parse!(into: options)
required_options = [:xxx, :zzz]
missing_options = required_options - options.keys
unless missing_options.empty?
fail "Missing required options: #{missing_options}"
end
Executions:
require 'optparse'
parser = OptionParser.new
parser.on('-x', '--xxx', 'Short and long, no argument')
parser.on('-yYYY', '--yyy', 'Short and long, required argument')
parser.on('-z [ZZZ]', '--zzz', 'Short and long, optional argument')
options = {yyy: 'AAA', zzz: 'BBB'}
parser.parse!(into: options)
p options
Executions:
{:yyy=>"FOO", :zzz=>"BBB"}
Argument Converters
An option can specify that its argument is to be converted from the default String to an instance of another
class. There are a number of built-in converters.
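As a small illustration of built-in converters, the sketch below uses Integer and Float; the option names --count and --ratio are made up for this example.

```ruby
require 'optparse'

parser = OptionParser.new
options = {}

# The class given after the option name selects the converter.
parser.on('--count N', Integer, 'Converted to Integer')
parser.on('--ratio R', Float, 'Converted to Float')

parser.parse!(['--count', '3', '--ratio', '0.5'], into: options)
p options

# A value the converter cannot handle raises OptionParser::InvalidArgument.
rejected = false
begin
  parser.parse!(['--count', 'three'])
rescue OptionParser::InvalidArgument => e
  rejected = true
  puts e.message
end
```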
Example: File date.rb defines an option whose argument is to be converted to a Date object. The argument
is converted by method Date.parse.
require 'optparse/date'
parser = OptionParser.new
parser.on('--date=DATE', Date) do |value|
p [value, value.class]
end
parser.parse!
Executions:
Help
OptionParser makes automatically generated help text available.
The help text consists of:
A banner, showing the usage.
Option short and long names.
Option dummy argument names.
Option descriptions.
Example code:
require 'optparse'
parser = OptionParser.new
parser.on(
'-x', '--xxx',
'Adipiscing elit. Aenean commodo ligula eget.',
'Aenean massa. Cum sociis natoque penatibus',
)
parser.on(
'-y', '--yyy YYY',
'Lorem ipsum dolor sit amet, consectetuer.'
)
parser.on(
'-z', '--zzz [ZZZ]',
'Et magnis dis parturient montes, nascetur',
'ridiculus mus. Donec quam felis, ultricies',
'nec, pellentesque eu, pretium quis, sem.',
)
parser.parse!
The option names and dummy argument names are defined as described above.
The option description consists of the strings that are not themselves option names; an option can have
more than one description string.
Execution:
The program name is included in the default banner: Usage: #{program_name} [options]; you can change
the program name.
require 'optparse'
parser = OptionParser.new
parser.program_name = 'help_program_name.rb'
parser.parse!
Execution:
require 'optparse'
parser = OptionParser.new
parser.banner = "Usage: ruby help_banner.rb"
parser.parse!
Execution:
By default, the option names are indented 4 spaces and the width of the option-names field is 32 spaces.
You can change these values, along with the banner, by passing parameters to OptionParser.new.
require 'optparse'
parser = OptionParser.new(
'ruby help_format.rb [options]', # Banner
20, # Width of options field
' ' * 2 # Indentation
)
parser.on(
'-x', '--xxx',
'Adipiscing elit. Aenean commodo ligula eget.',
'Aenean massa. Cum sociis natoque penatibus',
)
parser.on(
'-y', '--yyy YYY',
'Lorem ipsum dolor sit amet, consectetuer.'
)
parser.on(
'-z', '--zzz [ZZZ]',
'Et magnis dis parturient montes, nascetur',
'ridiculus mus. Donec quam felis, ultricies',
'nec, pellentesque eu, pretium quis, sem.',
)
parser.parse!
Execution:
Defining Options
Option-defining methods allow you to create an option, and also append/prepend it to the top list or append
it to the base list.
Each of these next three methods accepts a sequence of parameter arguments and a block, creates an
option object using method OptionParser#make_switch (see below), and returns the created option:
Method OptionParser#define appends the created option to the top list.
Method OptionParser#define_head prepends the created option to the top list.
Method OptionParser#define_tail appends the created option to the base list.
These next three methods are identical to the three above, except for their return values:
Method OptionParser#on is identical to method OptionParser#define, except that it returns the
parser object self.
Method OptionParser#on_head is identical to method OptionParser#define_head, except that it
returns the parser object self.
Method OptionParser#on_tail is identical to method OptionParser#define_tail, except that it returns
the parser object self.
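The difference in return values, and the effect of the top and base lists on help order, can be sketched as follows. The option names and the banner are arbitrary choices for this example.

```ruby
require 'optparse'

parser = OptionParser.new
parser.banner = 'Usage: example [options]'

# define returns the created option object.
option = parser.define('-x', 'Appended to the top list')
# on returns the parser itself, so calls can be chained.
result = parser.on('-y', 'Also appended to the top list')

parser.on_head('-a', 'Prepended to the top list')
parser.on_tail('-t', 'Appended to the base list')

p option.is_a?(OptionParser::Switch) # the option is some kind of Switch
p result.equal?(parser)              # same object as the parser
help = parser.help
puts help # top-list options come first, base-list options last
```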
Though you may never need to call it directly, here’s the core method for defining an option:
Method OptionParser#make_switch accepts an array of parameters and a block. See Parameters
for New Options. This method is unlike others here in that it:
o Accepts an array of parameters; others accept a sequence of parameter arguments.
o Returns an array containing the created option object, option names, and other values;
others return either the created option object or the parser object self.
Parsing
OptionParser has six instance methods for parsing.
Three have names ending with a “bang” (!):
parse!
order!
permute!
Each of these methods:
Accepts an optional array of string arguments argv; if not given, argv defaults to the value
of OptionParser#default_argv, whose initial value is ARGV.
Accepts an optional keyword argument into (see Keyword Argument into).
Returns argv, possibly with some elements removed.
The three other methods have names not ending with a “bang”:
parse
order
permute
Each of these methods:
Accepts an array of string arguments or zero or more string arguments.
Accepts an optional keyword argument into (see Keyword Argument into).
Returns argv, possibly with some elements removed.
Method parse!
Method parse!:
Accepts an optional array of string arguments argv; if not given, argv defaults to the value
of OptionParser#default_argv, whose initial value is ARGV.
Accepts an optional keyword argument into (see Keyword Argument into).
Returns argv, possibly with some elements removed.
The method processes the elements in argv beginning at argv[0], and ending, by default, at the end.
Otherwise processing ends and the method returns when:
The terminator argument -- is found; the terminator argument is removed before the return.
Environment variable POSIXLY_CORRECT is defined and a non-option argument is found; the
non-option argument is not removed. Note that the value of that variable does not matter, as only
its existence is checked.
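The two stopping conditions can be seen side by side in a short sketch. It sets and clears POSIXLY_CORRECT inside the process purely for demonstration; the argument list is invented.

```ruby
require 'optparse'

parser = OptionParser.new
parser.on('--xxx')
parser.on('--yyy')

args = ['--xxx', 'file1', '--', '--yyy', 'file2']

# Variable not defined: parsing continues past file1 and stops at the
# terminator, which is removed; what follows it is left untouched.
ENV.delete('POSIXLY_CORRECT')
default_rest = parser.parse!(args.dup)
p default_rest # => ["file1", "--yyy", "file2"]

# Variable defined (any value): parsing stops at the first non-option
# argument, which is not removed.
ENV['POSIXLY_CORRECT'] = '1'
posix_rest = parser.parse!(args.dup)
p posix_rest # => ["file1", "--", "--yyy", "file2"]
ENV.delete('POSIXLY_CORRECT')
```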
File parse_bang.rb:
require 'optparse'
parser = OptionParser.new
parser.on('--xxx') do |value|
p ['--xxx', value]
end
parser.on('--yyy YYY') do |value|
p ['--yyy', value]
end
parser.on('--zzz [ZZZ]') do |value|
p ['--zzz', value]
end
ret = parser.parse!
puts "Returned: #{ret} (#{ret.class})"
Help:
--xxx
--yyy YYY
--zzz [ZZZ]
Default behavior:
["--xxx", true]
["--yyy", "FOO"]
["--zzz", "BAR"]
["--xxx", true]
["--yyy", "FOO"]
["--xxx", true]
Method parse
Method parse:
Accepts an array of string arguments or zero or more string arguments.
Accepts an optional keyword argument into (see Keyword Argument into).
Returns argv, possibly with some elements removed.
If given an array ary, the method forms array argv as ary.dup. If given zero or more string arguments, those
arguments are formed into array argv.
The method calls parse!(argv, into: into).
Note that environment variable POSIXLY_CORRECT and the terminator argument -- are honored.
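Because parse works on a copy, the caller's array is left as it was. A brief sketch, with an invented argument list:

```ruby
require 'optparse'

parser = OptionParser.new
parser.on('--xxx')

args = ['--xxx', 'file.txt']
from_array = parser.parse(args)                  # array form
from_strings = parser.parse('--xxx', 'file.txt') # separate-strings form

p from_array   # => ["file.txt"]
p from_strings # => ["file.txt"]
p args         # => ["--xxx", "file.txt"] (unchanged)
```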
File parse.rb:
require 'optparse'
parser = OptionParser.new
parser.on('--xxx') do |value|
p ['--xxx', value]
end
parser.on('--yyy YYY') do |value|
p ['--yyy', value]
end
parser.on('--zzz [ZZZ]') do |value|
p ['--zzz', value]
end
ret = parser.parse(ARGV)
puts "Returned: #{ret} (#{ret.class})"
Help:
--xxx
--yyy YYY
--zzz [ZZZ]
Default behavior:
["--xxx", true]
["--yyy", "FOO"]
["--zzz", "BAR"]
Returned: ["input_file.txt", "output_file.txt"] (Array)
["--xxx", true]
["--yyy", "FOO"]
["--xxx", true]
Method order!
Calling method OptionParser#order! gives exactly the same result as calling
method OptionParser#parse! with environment variable POSIXLY_CORRECT defined.
Method order
Calling method OptionParser#order gives exactly the same result as calling
method OptionParser#parse with environment variable POSIXLY_CORRECT defined.
Method permute!
Calling method OptionParser#permute! gives exactly the same result as calling
method OptionParser#parse! with environment variable POSIXLY_CORRECT not defined.
Method permute
Calling method OptionParser#permute gives exactly the same result as calling
method OptionParser#parse with environment variable POSIXLY_CORRECT not defined.
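The contrast between order and permute can be sketched without touching the environment. The helper build_parser and the argument list are hypothetical.

```ruby
require 'optparse'

# Hypothetical helper: a parser that records which options it saw.
def build_parser(seen)
  parser = OptionParser.new
  parser.on('--xxx') { seen << '--xxx' }
  parser.on('--yyy') { seen << '--yyy' }
  parser
end

args = ['--xxx', 'input.txt', '--yyy']

ordered_seen = []
ordered_rest = build_parser(ordered_seen).order(args)

permuted_seen = []
permuted_rest = build_parser(permuted_seen).permute(args)

p ordered_seen  # => ["--xxx"]           (stopped at input.txt)
p ordered_rest  # => ["input.txt", "--yyy"]
p permuted_seen # => ["--xxx", "--yyy"]  (options after input.txt processed too)
p permuted_rest # => ["input.txt"]
```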
Packed Data
Certain Ruby core methods deal with packing and unpacking data:
Method Array#pack: Formats each element in array self into a binary string; returns that string.
Method String#unpack: Extracts data from string self, forming objects that become the elements of
a new array; returns that array.
Method String#unpack1: Does the same, but unpacks and returns only the first extracted object.
Each of these methods accepts a string template, consisting of zero or more directive characters, each
followed by zero or more modifier characters.
Examples (directive 'C' specifies ‘unsigned character’):
The string template may contain any mixture of valid directives (directive 'c' specifies ‘signed character’):
The string template may contain whitespace (which is ignored) and comments, each of which begins with
character '#' and continues up to and including the next following newline:
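The examples for these three points are missing from this copy. A short sketch of what they might look like, using only directives 'C' and 'c' (variable names are arbitrary):

```ruby
# One directive per element; '*' applies a directive to all remaining elements.
one   = [65].pack('C')          # => "A"
two   = [65, 66].pack('CC')     # => "AB"
all   = [65, 66, 67].pack('C*') # => "ABC"
codes = 'AB'.unpack('CC')       # => [65, 66]
first = 'AB'.unpack1('C')       # => 65

# A mixture of directives: 'C' reads a byte as unsigned, 'c' as signed.
mixed = "\xFF\xFF".unpack('Cc') # => [255, -1]

# Whitespace and a comment inside the template are ignored.
commented = [65, 66].pack("C # first byte\n C") # => "AB"
```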
Packing Method
Method Array#pack accepts optional keyword argument buffer that specifies the target string (instead of a
new string):
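A brief sketch of the buffer keyword; the initial buffer content 'foo' is arbitrary.

```ruby
buffer = +'foo' # unfrozen target string
result = [65, 66].pack('C*', buffer: buffer)

p result # => "fooAB"
p buffer # => "fooAB" (the packed bytes were appended to the buffer)
```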
Unpacking Methods
Methods String#unpack and String#unpack1 each accept an optional keyword argument offset that
specifies an offset into the string:
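A brief sketch of the offset keyword, assuming a Ruby version that supports it:

```ruby
whole  = 'ABC'.unpack('C*')            # => [65, 66, 67]
tail   = 'ABC'.unpack('C*', offset: 1) # => [66, 67]
single = 'ABC'.unpack1('C', offset: 2) # => 67
```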
Integer Directives
Each integer directive specifies the packing or unpacking for one element in the input or output array.
s = [67305985, -50462977].pack('l*')
# => "\x01\x02\x03\x04\xFF\xFE\xFD\xFC"
s.unpack('l*')
# => [67305985, -50462977]
'L' - 32-bit unsigned integer, native-endian (like C uint32_t):
s = [67305985, 4244504319].pack('L*')
# => "\x01\x02\x03\x04\xFF\xFE\xFD\xFC"
s.unpack('L*')
# => [67305985, 4244504319]
s = [0,1,-1].pack('N*')
# => "\x00\x00\x00\x00\x00\x00\x00\x01\xFF\xFF\xFF\xFF"
s.unpack('N*')
# => [0, 1, 4294967295]
s = [0,1,-1].pack('V*')
# => "\x00\x00\x00\x00\x01\x00\x00\x00\xFF\xFF\xFF\xFF"
s.unpack('v*')
# => [0, 0, 1, 0, 65535, 65535]
s = [578437695752307201, -506097522914230529].pack('q*')
# => "\x01\x02\x03\x04\x05\x06\a\b\xFF\xFE\xFD\xFC\xFB\xFA\xF9\xF8"
s.unpack('q*')
# => [578437695752307201, -506097522914230529]
s = [578437695752307201, 17940646550795321087].pack('Q*')
# => "\x01\x02\x03\x04\x05\x06\a\b\xFF\xFE\xFD\xFC\xFB\xFA\xF9\xF8"
s.unpack('Q*')
# => [578437695752307201, 17940646550795321087]
s = [67305985, -50462977].pack('i*')
# => "\x01\x02\x03\x04\xFF\xFE\xFD\xFC"
s.unpack('i*')
# => [67305985, -50462977]
Pointer Directives
'j' - 64-bit pointer-width signed integer, native-endian (like C intptr_t):
s = [67305985, -50462977].pack('j*')
# => "\x01\x02\x03\x04\x00\x00\x00\x00\xFF\xFE\xFD\xFC\xFF\xFF\xFF\xFF"
s.unpack('j*')
# => [67305985, -50462977]
s = [67305985, 4244504319].pack('J*')
# => "\x01\x02\x03\x04\x00\x00\x00\x00\xFF\xFE\xFD\xFC\x00\x00\x00\x00"
s.unpack('J*')
# => [67305985, 4244504319]
s = [4194304].pack('U*')
# => "\xF8\x90\x80\x80\x80"
s.unpack('U*')
# => [4194304]
s = [1073741823].pack('w*')
# => "\x83\xFF\xFF\xFF\x7F"
s.unpack('w*')
# => [1073741823]
Float Directives
Each float directive specifies the packing or unpacking for one element in the input or output array.
'e' - Little-endian:
'g' - Big-endian:
'E' - Little-endian:
'G' - Big-endian:
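The examples for the float directives are missing from this copy. In the sketch below, 'e' and 'g' are the single-precision directives and 'E' and 'G' the double-precision ones; the value 3.0 is chosen because it is exactly representable.

```ruby
single_le = [3.0].pack('e') # 4 bytes, little-endian
single_be = [3.0].pack('g') # 4 bytes, big-endian
double_le = [3.0].pack('E') # 8 bytes, little-endian
double_be = [3.0].pack('G') # 8 bytes, big-endian

p single_le.bytes # => [0, 0, 64, 64]
p single_be.bytes # => [64, 64, 0, 0]
p double_le.bytes # => [0, 0, 0, 0, 0, 0, 8, 64]
p double_be.bytes # => [64, 8, 0, 0, 0, 0, 0, 0]

round_trip = double_be.unpack1('G') # => 3.0
```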
String Directives
Each string directive specifies the packing or unpacking for one byte in the input or output string.
'Z' - Same as 'a', except that null is added or ignored with '*':
'm' - Base64 encoded string; count specifies input bytes between each newline, rounded down to
nearest multiple of 3; if count is zero, no newlines are added; (see RFC 4648):
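A short sketch of directives 'Z' and 'm', with 'a' shown for comparison; the sample strings are arbitrary.

```ruby
# 'Z*' appends a null when packing and stops at the first null when unpacking.
plain      = ['foo'].pack('a*')        # => "foo"
terminated = ['foo'].pack('Z*')        # => "foo\x00"
kept       = "foo\0bar".unpack('a*')   # => ["foo\x00bar"]
cut        = "foo\0bar".unpack('Z*')   # => ["foo"]

# 'm' encodes as Base64; a count of zero suppresses the trailing newline.
encoded      = ['hello'].pack('m')       # => "aGVsbG8=\n"
encoded_flat = ['hello'].pack('m0')      # => "aGVsbG8="
decoded      = "aGVsbG8=\n".unpack1('m') # => "hello"
```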
Offset Directives
'@' - Begin packing at the given byte offset; for packing, null fill if necessary:
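A brief sketch of the '@' directive in both directions; the byte values are arbitrary.

```ruby
# Packing: after the first byte, jump to offset 3 (null-filling the gap),
# then pack the second byte there.
packed = [1, 2].pack('C@3C')
p packed.bytes # => [1, 0, 0, 2]

# Unpacking: read the first byte, jump to offset 3, read another byte.
unpacked = "\x01\x00\x00\x02".unpack('C@3C')
p unpacked # => [1, 2]
```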
Summary
Thread-safety
Ractor helps you write thread-safe concurrent programs, but it is still possible to write thread-unsafe
programs with Ractors.
GOOD: Sharing limitation
Most objects are unshareable, so we can’t write programs with data races or race conditions on them.
Shareable objects are protected by an interpreter or locking mechanism.
BAD: Class/Module can violate this assumption
To stay compatible with old behavior, classes and modules can introduce data races and similar problems.
Ruby programmers should take care when modifying class/module objects in multi-Ractor programs.
BAD: Ractor can’t solve all thread-safety problems
There are several blocking operations (waiting send, waiting yield and waiting take), so you can
write a program which has deadlock and livelock issues.
Some kinds of shareable objects can introduce transactions (STM, for example). However,
misusing transactions will generate inconsistent state.
Without Ractor, we need to trace all state mutations to debug thread-safety issues. With Ractor, you can
concentrate on the suspicious code that is shared between Ractors.
begin
a = true
r = Ractor.new do
a #=> ArgumentError because this block accesses `a`.
end
r.take # see later
rescue ArgumentError
end
r = Ractor.new do
p self.class #=> Ractor
self.object_id
end
r.take == self.object_id #=> false
Arguments passed to Ractor.new() become block parameters for the given block. However, the interpreter
does not pass the parameter object references; it sends them as messages (see below for details).
r = Ractor.new do
'ok'
end
r.take #=> 'ok'
# almost similar to the last example
r = Ractor.new do
Ractor.yield 'ok'
end
r.take #=> 'ok'
An error raised in the given block is propagated to the receiver of the outgoing message.
r = Ractor.new do
raise 'ok' # exception will be transferred to the receiver
end
begin
r.take
rescue Ractor::RemoteError => e
e.cause.class #=> RuntimeError
e.cause.message #=> 'ok'
e.ractor #=> r
end
Sending/Receiving ports
Each Ractor has an incoming port and an outgoing port. The incoming port is connected to an incoming
queue of unbounded size.
 Ractor r
+-------------------------------------------+
| incoming                         outgoing |
| port                                 port |
|                |                          |
|                v                          |
|           Ractor.receive                  |
+-------------------------------------------+

+----+     +----+
* r1 |---->* r2 *
+----+     +----+

+----+     +----+
* r1 *---->- r2 *
+----+     +----+

+----+
* r1 *------+
+----+      |
+----+      |
* r2 *------|
+----+
r = Ractor.new do
msg = Ractor.receive # Receive from r's incoming queue
msg # send back msg as block return value
end
r.send 'ok' # Send 'ok' to r's incoming port -> incoming queue
r.take # Receive from r's outgoing port
+------+ +---+
+------+ +---+ |
^ |
+-------------------+
msg # Return value of the given block will be sent via outgoing port
end
When the block's return value becomes available, the Ractor is dead, so no Ractor other than the taking
Ractor can touch the return value; any value can therefore be sent via this communication path without
modification.
r = Ractor.new do
a = "hello"
binding
end
r.take.eval("p a") #=> "hello" (other communication path can not send a Binding object directly)
r1 = Ractor.new{'r1'}
r, obj = Ractor.select(r1)
r == r1 and obj == 'r1' #=> true
r1 = Ractor.new{'r1'}
r2 = Ractor.new{'r2'}
rs = [r1, r2]
as = []
Complex example:
pipe = Ractor.new do
loop do
Ractor.yield Ractor.receive
end
end
RN = 10
rs = RN.times.map{|i|
Ractor.new pipe, i do |pipe, i|
msg = pipe.take
msg # ping-pong
end
}
RN.times{|i|
pipe << i
}
RN.times.map{
r, n = Ractor.select(*rs)
rs.delete r
n
}.sort #=> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
pipe = Ractor.new do
loop do
Ractor.yield Ractor.receive
end
end
RN = 10
rs = RN.times.map{|i|
Ractor.new pipe, i do |pipe, i|
pipe << i
end
}
RN.times.map{
pipe.take
}.sort #=> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
TODO: The current Ractor.select() has the same issue as select(2), so this interface should be refined.
TODO: The select syntax of the Go language uses a round-robin technique to make scheduling fair.
Ractor.select() does not use it yet.
r = Ractor.new do
'finish'
end
r.take # success (will return 'finish')
begin
o = r.take # try to take from closed Ractor
rescue Ractor::ClosedError
'ok'
else
"ng: #{o}"
end
begin
r.send(1)
rescue Ractor::ClosedError
'ok'
else
'ng'
end
When multiple Ractors are waiting for Ractor.yield(), Ractor#close_outgoing will cancel all blocking by
raising an exception (ClosedError).
obj = 'str'.dup
r = Ractor.new obj do |msg|
# return received msg's object_id
msg.object_id
end
Some objects cannot be copied; trying to send one raises an exception.
obj = Thread.new{}
begin
Ractor.new obj do |msg|
msg
end
rescue TypeError => e
e.message #=> #<TypeError: allocator undefined for Thread>
else
'ng' # unreachable here
end
str = 'hello'
r.send str, move: true
modified = r.take #=> 'hello world'
begin
# Error because it touches moved str.
str << ' exception' # raise Ractor::MovedError
rescue Ractor::MovedError
modified #=> 'hello world'
else
raise 'unreachable'
end
# move with Ractor.yield
r = Ractor.new do
obj = 'hello'
Ractor.yield obj, move: true
obj << 'world' # raise Ractor::MovedError
end
str = r.take
begin
r.take
rescue Ractor::RemoteError
p str #=> "hello"
end
Some objects cannot be moved; trying to move one raises an exception.
r = Ractor.new do
Ractor.receive
end
Access to moved objects is prohibited by means of a class replacement technique.
Shareable objects
The following objects are shareable.
Immutable objects
Small integers, some symbols, true, false, nil (a.k.a. SPECIAL_CONST_P() objects in internal)
Frozen native objects
o Numeric objects: Float, Complex, Rational, big integers (T_BIGNUM in internal)
o All Symbols.
Frozen String and Regexp objects (their instance variables should refer only to shareable objects)
Class, Module objects (T_CLASS, T_MODULE and T_ICLASS in internal)
Ractor and other special objects which care about synchronization.
Implementation: Currently, shareable objects (RVALUE) have the FL_SHAREABLE flag. This flag can be added
lazily.
To make an object shareable, use Ractor.make_shareable(obj). It tries to make obj shareable by freezing
obj and, recursively, every object reachable from it. The method accepts a copy: keyword (default
value is false). Ractor.make_shareable(obj, copy: true) tries to make a deep copy of obj and make the
copied object shareable.
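The rules above can be checked with Ractor.shareable?. The sketch below creates no Ractors; the config hash and its contents are invented for illustration.

```ruby
mutable = 'str'.dup
p Ractor.shareable?(1)              # small integer: shareable
p Ractor.shareable?(:sym)           # symbol: shareable
p Ractor.shareable?(mutable)        # unfrozen String: not shareable
p Ractor.shareable?(mutable.freeze) # frozen String: shareable

# Hypothetical nested structure with mutable parts.
config = { name: 'ko1'.dup, tags: ['a'.dup, 'b'.dup] }

# copy: true leaves the original alone and returns a shareable deep copy.
shared_copy = Ractor.make_shareable(config, copy: true)
p Ractor.shareable?(shared_copy) # true
p config.frozen?                 # false

# Without copy:, the object itself and everything reachable from it is frozen.
Ractor.make_shareable(config)
p config.frozen?        # true
p config[:tags].frozen? # true
```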
Global variables
Only the main Ractor (the Ractor created when the interpreter starts) can access global variables.
$gv = 1
r = Ractor.new do
$gv
end
begin
r.take
rescue Ractor::RemoteError => e
e.cause.message #=> 'can not access global variables from non-main Ractors'
end
Note that some special global variables are ractor-local, like $stdin, $stdout, $stderr. See [Bug #17268] for
more details.
class C
@iv = 1
end
p Ractor.new do
class C
@iv
end
end.take #=> 1
Otherwise, only the main Ractor can access instance variables of shareable objects.
class C
@iv = [] # unshareable object
end
Ractor.new do
class C
begin
p @iv
rescue Ractor::IsolationError
p $!.message
#=> "can not get unshareable values from instance variables of classes/modules from non-main Ractors"
end
begin
@iv = 42
rescue Ractor::IsolationError
p $!.message
#=> "can not set instance variables of classes/modules by non-main Ractors"
end
end
end.take
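shared = Ractor.new{}
shared.instance_variable_set(:@iv, 'str')
# r reads @iv of the shareable object from a non-main Ractor
r = Ractor.new shared do |shared|
  p shared.instance_variable_get(:@iv)
end
begin
  r.take
rescue Ractor::RemoteError => e
  e.cause.message #=> can not access instance variables of shareable objects from non-main Ractors (Ractor::IsolationError)
end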
Note that instance variables for class/module objects are also prohibited on Ractors.
Class variables
Only the main Ractor can access class variables.
class C
@@cv = 'str'
end
r = Ractor.new do
class C
p @@cv
end
end
begin
r.take
rescue => e
e.class #=> Ractor::IsolationError
end
Constants
Only the main Ractor can read constants which refer to an unshareable object.
class C
CONST = 'str'
end
r = Ractor.new do
C::CONST
end
begin
r.take
rescue => e
e.class #=> Ractor::IsolationError
end
Only the main Ractor can define constants which refer to an unshareable object.
class C
end
r = Ractor.new do
C::CONST = 'str'
end
begin
r.take
rescue => e
e.class #=> Ractor::IsolationError
end
To make a library usable from multiple Ractors, its constants should refer only to shareable objects.
In this case, TABLE references an unshareable Hash object, so other Ractors cannot
refer to the TABLE constant. To make it shareable, we can use Ractor.make_shareable().
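The TABLE definition itself is missing from this copy. A sketch of the shareable form, with placeholder keys and values, might be:

```ruby
# Hypothetical table contents; the point is the make_shareable wrapper.
TABLE = Ractor.make_shareable({ a: 'ko1', b: 'ko2', c: 'ko3' })

p Ractor.shareable?(TABLE) # the Hash and its values are now frozen and shareable
p TABLE[:a].frozen?
```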
# shareable_constant_value: literal
shareable_constant_value directive accepts the following modes (descriptions use the example: CONST =
expr):
none: Do nothing. Same as: CONST = expr
literal:
Implementation note
Each Ractor has its own thread, which means each Ractor has at least one native thread.
Each Ractor has its own ID (rb_ractor_t::pub::id).
In debug mode, all unshareable objects are labeled with the current Ractor’s ID, which is checked to
detect unshareable object leaks (access to an object from a different Ractor) in the VM.
Examples
RN = 1_000
CR = Ractor.current
r = Ractor.new do
p Ractor.receive
CR << :fin
end
RN.times{
r = Ractor.new r do |next_r|
next_r << Ractor.receive
end
}
p :setup_ok
r << 1
p Ractor.receive
Fork-join
def fib n
if n < 2
1
else
fib(n-2) + fib(n-1)
end
end
RN = 10
rs = (1..RN).map do |i|
Ractor.new i do |i|
[i, fib(i)]
end
end
until rs.empty?
r, v = Ractor.select(*rs)
rs.delete r
p answer: v
end
Worker pool
require 'prime'
pipe = Ractor.new do
loop do
Ractor.yield Ractor.receive
end
end
N = 1000
RN = 10
workers = (1..RN).map do
Ractor.new pipe do |pipe|
while n = pipe.take
Ractor.yield [n, n.prime?]
end
end
end
(1..N).each{|i|
pipe << i
}
pp (1..N).map{
_r, (n, b) = Ractor.select(*workers)
[n, b]
}.sort_by{|(n, b)| n}
Pipeline
r2 = Ractor.new r1 do |r1|
r1.take + 'r2'
end
r3 = Ractor.new r2 do |r2|
r2.take + 'r3'
end
r2 = Ractor.new r3 do |r3|
r3.send Ractor.receive + 'r2'
end
r1 = Ractor.new r2 do |r2|
r2.send Ractor.receive + 'r1'
end
r1 << 'r0'
p Ractor.receive #=> "r0r1r2r3"
Supervise
r = Ractor.current
(1..10).map{|i|
r = Ractor.new r, i do |r, i|
r.send Ractor.receive + "r#{i}"
end
}
r.send "r0"
p Ractor.receive #=> "r0r10r9r8r7r6r5r4r3r2r1"
# ring example with an error
r = Ractor.current
rs = (1..10).map{|i|
r = Ractor.new r, i do |r, i|
loop do
msg = Ractor.receive
raise if /e/ =~ msg
r.send msg + "r#{i}"
end
end
}
r.send "r0"
p Ractor.receive #=> "r0r10r9r8r7r6r5r4r3r2r1"
r.send "r0"
p Ractor.select(*rs, Ractor.current) #=> [:receive, "r0r10r9r8r7r6r5r4r3r2r1"]
r.send "e0"
p Ractor.select(*rs, Ractor.current)
#=>
#<Thread:0x000056262de28bd8 run> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
2: from /home/ko1/src/ruby/trunk/test.rb:7:in `block (2 levels) in <main>'
1: from /home/ko1/src/ruby/trunk/test.rb:7:in `loop'
/home/ko1/src/ruby/trunk/test.rb:9:in `block (3 levels) in <main>': unhandled exception
Traceback (most recent call last):
2: from /home/ko1/src/ruby/trunk/test.rb:7:in `block (2 levels) in <main>'
1: from /home/ko1/src/ruby/trunk/test.rb:7:in `loop'
/home/ko1/src/ruby/trunk/test.rb:9:in `block (3 levels) in <main>': unhandled exception
1: from /home/ko1/src/ruby/trunk/test.rb:21:in `<main>'
<internal:ractor>:69:in `select': thrown by remote Ractor. (Ractor::RemoteError)
# resend non-error message
r = Ractor.current
rs = (1..10).map{|i|
r = Ractor.new r, i do |r, i|
loop do
msg = Ractor.receive
raise if /e/ =~ msg
r.send msg + "r#{i}"
end
end
}
r.send "r0"
p Ractor.receive #=> "r0r10r9r8r7r6r5r4r3r2r1"
r.send "r0"
p Ractor.select(*rs, Ractor.current)
#=> [:receive, "r0r10r9r8r7r6r5r4r3r2r1"]
msg = 'e0'
begin
r.send msg
p Ractor.select(*rs, Ractor.current)
rescue Ractor::RemoteError
msg = 'r0'
retry
end
def make_ractor r, i
Ractor.new r, i do |r, i|
loop do
msg = Ractor.receive
raise if /e/ =~ msg
r.send msg + "r#{i}"
end
end
end
r = Ractor.current
rs = (1..10).map{|i|
r = make_ractor(r, i)
}
Regular expressions (regexps) are patterns which describe the contents of a string. They’re used for testing
whether a string contains a given pattern, or extracting the portions that match. They are created with
the /pat/ and %r{pat} literals or the Regexp.new constructor.
A regexp is usually delimited with forward slashes (/). For example:
/hay/ =~ 'haystack' #=> 0
/y/.match('haystack') #=> #<MatchData "y">
If a string contains the pattern it is said to match. A literal string matches itself.
Here ‘haystack’ does not contain the pattern ‘needle’, so it doesn’t match:
Specifically, /st/ requires that the string contains the letter s followed by the letter t, so it matches haystack,
also.
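The matching and non-matching cases described above can be sketched as follows (variable names are arbitrary):

```ruby
no_match = /needle/.match('haystack') # => nil
hay      = /hay/.match('haystack')    # => #<MatchData "hay">
st_index = /st/ =~ 'haystack'         # => 3

p no_match
p hay[0]
p st_index
```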
Note that any Regexp matching will raise a RuntimeError if a timeout is set and exceeded.
See the “Timeout” section for details.
Regexp Interpolation
A regexp may contain interpolated strings; trivially:
foo = 'bar'
/#{foo}/ # => /bar/
=~ and Regexp#match
Pattern matching may be achieved by using =~ operator or Regexp#match method.
=~ Operator
=~ is Ruby’s basic pattern-matching operator. When one operand is a regular expression and the other is a
string, the regular expression is used as a pattern to match against the string. (This operator is
equivalently defined by Regexp and String, so the order of the String and the Regexp does not matter. Other
classes may have different implementations of =~.) If a match is found, the operator returns the index of
the first match in the string; otherwise it returns nil.
When the =~ operator is used with a String and a Regexp, the $~ global variable is set after a successful
match. $~ holds a MatchData object. Regexp.last_match is equivalent to $~.
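A short sketch of the operator and of the match data it leaves behind; the sample string is arbitrary.

```ruby
index = /at/ =~ 'input data'  # => 7
last  = $~                    # MatchData for the successful match above
whole = Regexp.last_match(0)  # same match, via Regexp.last_match

reversed = 'input data' =~ /at/ # operand order does not matter
missing  = /xyz/ =~ 'input data' # => nil

p index, last[0], last.pre_match, whole, reversed, missing
```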
Regexp#match Method
The match method returns a MatchData object:
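For instance (a sketch; the strings are arbitrary):

```ruby
m = /st/.match('haystack')
p m            # => #<MatchData "st">
p m.begin(0)   # => 3
p m.pre_match  # => "hay"
p m.post_match # => "ack"

none = /st/.match('needle') # => nil when there is no match
```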
Patterns behave like double-quoted strings and can contain the same backslash escapes (the meaning of
\s is different, however; see below).
Arbitrary Ruby expressions can be embedded into patterns with the #{...} construct.
place = "東京都"
/#{place}/.match("Go to 東京都")
#=> #<MatchData "東京都">
Character Classes
A character class is delimited with square brackets ([, ]) and lists characters that may appear at that point in
the match. /[ab]/ means a or b, as opposed to /ab/ which means a followed by b.
Within a character class the hyphen (-) is a metacharacter denoting an inclusive range of
characters. [abcd] is equivalent to [a-d]. A range can be followed by another range, so [abcdwxyz] is
equivalent to [a-dw-z]. The order in which ranges or individual characters appear inside a character class is
irrelevant.
If the first character of a character class is a caret (^) the class is inverted: it matches any
character except those named.
A character class may contain another character class. By itself this isn’t useful because [a-z[0-9]] describes
the same set as [a-z0-9]. However, character classes also support the && operator which performs set
intersection on its arguments. The two can be combined as follows:
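The example is missing from this copy. One way to sketch negation, nesting and intersection together (the patterns are illustrative choices):

```ruby
# Inverted class: any character except a-e and g-z.
only_f = /[^a-eg-z]/.match('f')[0] # => "f"

# Intersection: lowercase letters that are not vowels.
consonant = /[a-z&&[^aeiou]]/
first_consonant = consonant =~ 'audio' # => 2

# Nesting combined with intersection: [a-w] AND ([^c-g] OR z).
combined = /[a-w&&[^c-g]z]/
p combined.match?('b') # true:  in a-w and not in c-g
p combined.match?('e') # false: excluded by [^c-g]
p combined.match?('z') # false: not in a-w
```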
Repetition
The constructs described so far match a single character. They can be followed by a repetition
metacharacter to specify how many times they need to occur. Such metacharacters are called quantifiers.
* - Zero or more times
+ - One or more times
? - Zero or one times (optional)
{n} - Exactly n times
{n,} - n or more times
{,m} - m or fewer times
{n,m} - At least n and at most m times
At least one uppercase character (‘H’), at least one lowercase character (‘e’), two ‘l’ characters, then one ‘o’:
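The pattern for this description is missing here; a sketch that fits it, followed by two smaller quantifier examples:

```ruby
m = 'Hello'.match(/[[:upper:]]+[[:lower:]]+l{2}o/)
p m[0] # => "Hello"

bounded  = /ab{2,3}/.match('abbbb')[0] # => "abbb" (at most three b's)
optional = /colou?r/.match?('color')   # => true  (the u is optional)
```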
Greedy Match
Repetition is greedy by default: as many occurrences as possible are matched while still allowing the overall
match to succeed. By contrast, lazy matching makes the minimal amount of matches necessary for overall
success. Most greedy metacharacters can be made lazy by following them with ?. For the {n} pattern,
because it specifies an exact number of characters to match and not a variable number of characters,
the ? metacharacter instead makes the repeated pattern optional.
Both patterns below match the string. The first uses a greedy quantifier, so the overall match is ‘<a><b>’; the
second uses a lazy quantifier, so the overall match is ‘<a>’:
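A sketch of the two patterns, assuming the subject string is "<a><b>":

```ruby
greedy = /<.+>/.match("<a><b>")
lazy   = /<.+?>/.match("<a><b>")

greedy[0] #=> "<a><b>"  (.+ runs as far as it can before the final >)
lazy[0]   #=> "<a>"     (.+? stops at the first > that lets the match succeed)
```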
Possessive Match
A quantifier followed by + matches possessively: once it has matched it does not backtrack. Possessive
quantifiers behave like greedy quantifiers, but having matched they refuse to “give up” their match even if this
jeopardises the overall match.
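The difference can be sketched with one pattern written both ways (the subject string is an illustrative
choice):

```ruby
# Greedy: .* first swallows "a><b>", then backtracks until "><.+>" can match.
/<.*><.+>/.match("<a><b>")  #=> #<MatchData "<a><b>">

# Possessive: .*+ swallows "a><b>" and never gives any of it back,
# so nothing is left for "><.+>" and the overall match fails.
/<.*+><.+>/.match("<a><b>") #=> nil
```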
Capturing
Parentheses can be used for capturing. The text enclosed by the nth group of parentheses can be
subsequently referred to with n. Within a pattern use the backreference \n (e.g. \1); outside of the pattern
use MatchData[n] (e.g. MatchData[1]).
In this example, 'at' is captured by the first group of parentheses, then referred to later with \1:
/[csh](..) [csh]\1 in/.match("The cat sat in the hat")
#=> #<MatchData "cat sat in" 1:"at">
Regexp#match returns a MatchData object which makes the captured text available with its [] method:
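A short sketch reusing the pattern from the previous example (the variable name is illustrative):

```ruby
match = /[csh](..) [csh]\1 in/.match("The cat sat in the hat")

match[0] #=> "cat sat in"  (the entire match)
match[1] #=> "at"          (text captured by the first group)
```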
While Ruby supports an arbitrary number of numbered captured groups, only groups 1-9 are supported
using the \n backreference syntax.
Ruby also supports \0 as a special backreference, which references the entire matched string. This is also
available at MatchData[0]. Note that the \0 backreference cannot be used inside the regexp, as
backreferences can only be used after the end of the capture group, and the \0 backreference uses the
implicit capture group of the entire match. However, you can use this backreference when doing
substitution:
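A minimal sketch of \0 in a replacement string. Single quotes are used so that Ruby passes the backslash
through to gsub unchanged:

```ruby
# Each matched word is re-inserted via \0 and followed by an "s".
"The cat sat in the hat".gsub(/[csh]at/, '\0s')
#=> "The cats sats in the hats"
```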
Named Captures
Capture groups can be referred to by name when defined with the (?<name>) or (?'name') constructs.
/\$(?<dollars>\d+)\.(?<cents>\d+)/.match("$3.67")
#=> #<MatchData "$3.67" dollars:"3" cents:"67">
/\$(?<dollars>\d+)\.(?<cents>\d+)/.match("$3.67")[:dollars] #=> "3"
Named groups can be backreferenced with \k<name>, where name is the group name.
/(?<vowel>[aeiou]).\k<vowel>.\k<vowel>/.match('ototomy')
#=> #<MatchData "ototo" vowel:"o">
Note: A regexp can’t use named backreferences and numbered backreferences simultaneously. Also, if a
named capture is used in a regexp, then parentheses used for grouping which would otherwise result in an
unnamed capture are treated as non-capturing.
When named capture groups are used with a literal regexp on the left-hand side of an expression and
the =~ operator, the captured text is also assigned to local variables with corresponding names.
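The sketch below wraps the behaviour in a small helper so the assigned variables are easy to see. The method
name split_price is hypothetical and exists only for this illustration:

```ruby
# split_price is a hypothetical helper used only to show the assignment.
def split_price(text)
  # The regexp literal must be on the left-hand side of =~ for the named
  # groups to become the local variables dollars and cents.
  if /\$(?<dollars>\d+)\.(?<cents>\d+)/ =~ text
    [dollars, cents]
  end
end

split_price("$3.67") #=> ["3", "67"]
split_price("free")  #=> nil
```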
Grouping
Parentheses also group the terms they enclose, allowing them to be quantified as one atomic whole.
The pattern below matches a vowel followed by 2 word characters:
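A sketch of the ungrouped form, assuming the same subject string as the grouped example that follows. Here
the quantifier applies only to \w:

```ruby
# {2} binds to \w alone: one vowel, then two word characters.
/[aeiou]\w{2}/.match("Caenorhabditis elegans")
#=> #<MatchData "aen">
```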
Whereas the following pattern matches a vowel followed by a word character, twice, i.e. [aeiou]\w[aeiou]\w:
‘enor’.
/([aeiou]\w){2}/.match("Caenorhabditis elegans")
#=> #<MatchData "enor" 1:"or">
The (?:…) construct provides grouping without capturing. That is, it combines the terms it contains into an
atomic whole without creating a backreference. This benefits performance at the slight expense of
readability.
The first group of parentheses captures ‘n’ and the second ‘ti’. The second group is referred to later with the
backreference \2:
/I(n)ves(ti)ga\2ons/.match("Investigations")
#=> #<MatchData "Investigations" 1:"n" 2:"ti">
The first group of parentheses is now made non-capturing with ‘?:’, so it still matches ‘n’, but doesn’t create
the backreference. Thus, the backreference \1 now refers to ‘ti’.
/I(?:n)ves(ti)ga\1ons/.match("Investigations")
#=> #<MatchData "Investigations" 1:"ti">
Atomic Grouping
Grouping can be made atomic with (?>pat). This causes the subexpression pat to be matched
independently of the rest of the expression such that what it matches becomes fixed for the remainder of the
match, unless the entire subexpression must be abandoned and subsequently revisited. In this way pat is
treated as a non-divisible whole. Atomic grouping is typically used to optimise patterns so as to prevent the
regular expression engine from backtracking needlessly.
The " in the pattern below matches the first character of the string, then .* matches Quote". This causes the
overall match to fail, so the text matched by .* is backtracked by one position, which leaves the final
character of the string available to match ":
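A sketch contrasting the ordinary pattern with its atomic counterpart, assuming the subject string
"Quote" including its double quotes:

```ruby
# Ordinary group: .* backtracks by one character so the closing " can match.
/".*"/.match('"Quote"')     #=> #<MatchData "\"Quote\"">

# Atomic group: (?>.*) keeps everything it consumed, including the final ",
# so the closing " in the pattern has nothing left to match.
/"(?>.*)"/.match('"Quote"') #=> nil
```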
Subexpression Calls
The \g<name> syntax matches the previous subexpression named name, which can be a group name or
number, again. This differs from backreferences in that it re-executes the group rather than simply trying to
re-match the same text.
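This pattern matches a ( character and assigns it to the paren group, tries to call the paren subexpression
again but fails, then matches a literal ):
/\A(?<paren>\(\g<paren>*\))*\z/ =~ '()'
For the longer string '(())', the same pattern matches as follows:
/\A(?<paren>\(\g<paren>*\))*\z/ =~ '(())' #=> 0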
1. Matches at the beginning of the string, i.e. before the first character.
2. Enters a named capture group called paren
3. Matches a literal (, the first character in the string
4. Calls the paren group again, i.e. recurses back to the second step
5. Re-enters the paren group
6. Matches a literal (, the second character in the string
7. Tries to call paren a third time, but fails because doing so would prevent an overall successful match
8. Matches a literal ), the third character in the string. This marks the end of the second recursive call
9. Matches a literal ), the fourth character in the string
10. Matches the end of the string
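The recursion traced above makes the pattern usable as a balanced-parentheses check. In the sketch below the
constant name BALANCED is an illustrative assumption:

```ruby
# BALANCED is an illustrative name for the pattern discussed above.
BALANCED = /\A(?<paren>\(\g<paren>*\))*\z/

BALANCED.match?("(())") #=> true  (one pair nested inside another)
BALANCED.match?("()()") #=> true  (the outer * repeats the group)
BALANCED.match?("(()")  #=> false (one closing parenthesis is missing)
BALANCED.match?("")     #=> true  (zero repetitions of the group)
```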
Alternation
The vertical bar metacharacter (|) combines several expressions into a single one that matches any of the
expressions. Each expression is an alternative.
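A small sketch of alternation inside a capturing group (the subject string is an illustrative choice):

```ruby
# Either "and" or "or", surrounded by one word character on each side.
m = /\w(and|or)\w/.match("Feliformia")
m[0] #=> "form"
m[1] #=> "or"   (the alternative that actually matched)
```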
Character Properties
The \p{} construct matches characters with the named property, much like POSIX bracket classes.
/\p{Alnum}/ - Alphabetic and numeric character
/\p{Alpha}/ - Alphabetic character
/\p{Blank}/ - Space or tab
/\p{Cntrl}/ - Control character
/\p{Digit}/ - Digit
/\p{Emoji}/ - Unicode emoji
/\p{Graph}/ - Non-blank character (excludes spaces, control characters, and similar)
/\p{Lower}/ - Lowercase alphabetical character
/\p{Print}/ - Like \p{Graph}, but includes the space character
/\p{Punct}/ - Punctuation character
/\p{Space}/ - Whitespace character ([:blank:], newline, carriage return, etc.)
/\p{Upper}/ - Uppercase alphabetical
/\p{XDigit}/ - Digit allowed in a hexadecimal number (i.e., 0-9a-fA-F)
/\p{Word}/ - A member of one of the following Unicode general categories: Letter, Mark, Number,
Connector_Punctuation
/\p{ASCII}/ - A character in the ASCII character set
/\p{Any}/ - Any Unicode character (including unassigned characters)
/\p{Assigned}/ - An assigned character
A Unicode character’s General Category value can also be matched with \p{Ab} where Ab is the category’s
abbreviation as described below:
/\p{L}/ - ‘Letter’
/\p{Ll}/ - ‘Letter: Lowercase’
/\p{Lm}/ - ‘Letter: Modifier’
/\p{Lo}/ - ‘Letter: Other’
/\p{Lt}/ - ‘Letter: Titlecase’
/\p{Lu}/ - ‘Letter: Uppercase’
/\p{M}/ - ‘Mark’
/\p{Mn}/ - ‘Mark: Nonspacing’
/\p{Mc}/ - ‘Mark: Spacing Combining’
/\p{Me}/ - ‘Mark: Enclosing’
/\p{N}/ - ‘Number’
/\p{Nd}/ - ‘Number: Decimal Digit’
/\p{Nl}/ - ‘Number: Letter’
/\p{No}/ - ‘Number: Other’
/\p{P}/ - ‘Punctuation’
/\p{Pc}/ - ‘Punctuation: Connector’
/\p{Pd}/ - ‘Punctuation: Dash’
/\p{Ps}/ - ‘Punctuation: Open’
/\p{Pe}/ - ‘Punctuation: Close’
/\p{Pi}/ - ‘Punctuation: Initial Quote’
/\p{Pf}/ - ‘Punctuation: Final Quote’
/\p{Po}/ - ‘Punctuation: Other’
/\p{S}/ - ‘Symbol’
/\p{Sm}/ - ‘Symbol: Math’
/\p{Sc}/ - ‘Symbol: Currency’
/\p{Sk}/ - ‘Symbol: Modifier’
/\p{So}/ - ‘Symbol: Other’
/\p{Z}/ - ‘Separator’
/\p{Zs}/ - ‘Separator: Space’
/\p{Zl}/ - ‘Separator: Line’
/\p{Zp}/ - ‘Separator: Paragraph’
/\p{C}/ - ‘Other’
/\p{Cc}/ - ‘Other: Control’
/\p{Cf}/ - ‘Other: Format’
/\p{Cn}/ - ‘Other: Not Assigned’
/\p{Co}/ - ‘Other: Private Use’
/\p{Cs}/ - ‘Other: Surrogate’
Lastly, \p{} matches a character’s Unicode script. The following scripts are
supported: Arabic, Armenian, Balinese, Bengali, Bopomofo, Braille, Buginese, Buhid, Canadian_Aboriginal,
Carian, Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic,
Georgian, Glagolitic, Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana, Inherited,
Kannada, Katakana, Kayah_Li, Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lycian, Lydian,
Malayalam, Mongolian, Myanmar, New_Tai_Lue, Nko, Ogham, Ol_Chiki, Old_Italic, Old_Persian, Oriya,
Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Saurashtra, Shavian, Sinhala, Sundanese, Syloti_Nagri,
Syriac, Tagalog, Tagbanwa, Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Vai, and Yi.
Unicode codepoint U+06E9 is named “ARABIC PLACE OF SAJDAH” and belongs to the Arabic script:
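A sketch of a script match for that codepoint, assuming a UTF-8 source file (Ruby's default):

```ruby
/\p{Arabic}/.match?("\u06E9") #=> true  (U+06E9 belongs to the Arabic script)
/\p{Arabic}/.match?("a")      #=> false (a Latin letter)
```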
All character properties can be inverted by prefixing their name with a caret (^).
Letter ‘A’ is not in the Unicode Ll (Letter; Lowercase) category, so this match succeeds:
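A sketch of the inverted property described above:

```ruby
/\p{^Ll}/.match("A")  #=> #<MatchData "A">  ('A' is not a lowercase letter)
/\p{^Ll}/.match("a")  #=> nil               ('a' is in Ll, so the inverted class rejects it)
```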
Anchors
Anchors are metacharacters that match the zero-width positions between characters, anchoring the match to
a specific position.
^ - Matches beginning of line
$ - Matches end of line
\A - Matches beginning of string.
\Z - Matches end of string. If string ends with a newline, it matches just before newline
\z - Matches end of string
\G - Matches first matching position:
In methods like String#gsub and String#scan, it changes on each iteration. It initially matches the
beginning of subject, and in each following iteration it matches where the last match finished.
In methods like Regexp#match and String#match that take an (optional) offset, it matches where
the search begins.
\b - Matches word boundaries when outside brackets; backspace (0x08) when inside brackets
\B - Matches non-word boundaries
(?=pat) - Positive lookahead assertion: ensures that the following characters match pat, but
doesn’t include those characters in the matched text
(?!pat) - Negative lookahead assertion: ensures that the following characters do not match pat, but
doesn’t include those characters in the matched text
(?<=pat) - Positive lookbehind assertion: ensures that the preceding characters match pat, but
doesn’t include those characters in the matched text
(?<!pat) - Negative lookbehind assertion: ensures that the preceding characters do not match pat,
but doesn’t include those characters in the matched text
\K - Match reset: the matched content preceding \K in the regexp is excluded from the result. For
example, the following two regexps are almost equivalent:
These match the same string, and $& equals "c" in both cases, but the matched position is different.
As are the following two regexps:
/(a)\K(b)\Kc/
/(?<=(?<=(a))(b))c/
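A sketch of the simpler kind of pair, assuming /ab\Kc/ and /(?<=ab)c/ as the two near-equivalent forms (these
patterns are an assumption made for illustration):

```ruby
keep     = "abc".match(/ab\Kc/)     # "ab" must match, but is dropped from the result
lookback = "abc".match(/(?<=ab)c/)  # "ab" is only asserted, never consumed

keep[0]     #=> "c"
lookback[0] #=> "c"

# \K is handy in substitutions: only the text after \K is replaced.
"abc".sub(/ab\Kc/, "X") #=> "abX"
```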
Anchoring the pattern to the beginning of the string forces the match to start there. ‘real’ doesn’t occur at the
beginning of the string, so now the match fails:
/\Areal/.match("surrealist") #=> nil
The match below fails because although ‘Demand’ contains ‘and’, the pattern does not occur at a word
boundary.
/\band/.match("Demand")
Whereas in the following example ‘and’ has been anchored to a non-word boundary so instead of matching
the first ‘and’ it matches from the fourth letter of ‘demand’ instead:
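A sketch of the non-word-boundary case. The subject string is an assumption chosen to contain both a
standalone ‘and’ and the ‘and’ inside ‘demand’:

```ruby
# The standalone "and" follows a space (a word boundary), so \B rejects it;
# the "and" inside "demand" follows "m", which is not a boundary.
/\Band.+/.match("Supply and demand curve")
#=> #<MatchData "and curve">
```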
The pattern below uses positive lookahead and positive lookbehind to match text appearing in tags without
including the tags in the match:
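A sketch of such a pattern, assuming <b>...</b> tags and an illustrative subject string:

```ruby
# (?<=<b>) requires "<b>" just before the match; (?=<\/b>) requires "</b>" just after.
# Neither assertion contributes characters to the matched text.
/(?<=<b>)\w+(?=<\/b>)/.match("Fortune favours the <b>bold</b>")
#=> #<MatchData "bold">
```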
Options
The end delimiter for a regexp can be followed by one or more single-letter options which control how the
pattern can match.
/pat/i - Ignore case
/pat/m - Treat a newline as a character matched by .
/pat/x - Ignore whitespace and comments in the pattern
/pat/o - Perform #{} interpolation only once
i, m, and x can also be applied on the subexpression level with the (?on-off) construct, which enables
options on, and disables options off for the expression enclosed by the parentheses:
Additionally, these options can also be toggled for the remainder of the pattern:
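Both forms can be sketched with the i option: the scoped (?on-off:...) group, and the bare (?on-off) toggle
that stays in effect for the rest of the pattern:

```ruby
# Scoped: only "b" is matched case-insensitively.
/a(?i:b)c/.match("aBc")   #=> #<MatchData "aBc">

# Scoped off: the whole regexp ignores case, except "b".
/a(?-i:b)c/i.match("ABC") #=> nil

# Toggle: everything after (?i) ignores case; the leading "a" does not.
/a(?i)bc/.match("abC")    #=> #<MatchData "abC">
/a(?i)bc/.match("Abc")    #=> nil
```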
float_pat = /\A
[[:digit:]]+ # 1 or more digits before the decimal point
(\. # Decimal point
[[:digit:]]+ # 1 or more digits after the decimal point
)? # The decimal point and following digits are optional
\Z/x
float_pat.match('3.14') #=> #<MatchData "3.14" 1:".14">
Performance
Certain pathological combinations of constructs can lead to abysmally bad performance.
Consider a string of 25 ‘a’s, a ‘d’, 4 ‘a’s, and a ‘c’.
s = 'a' * 25 + 'd' + 'a' * 4 + 'c'
#=> "aaaaaaaaaaaaaaaaaaaaaaaaadaaaac"
/(b|a)/ =~ s #=> 0
/(b|a+)/ =~ s #=> 0
/(b|a+)*/ =~ s #=> 0
/(b|a+)*c/ =~ s #=> 26
The last pattern takes appreciably longer than the first three. This happens because an atom in the regexp is
quantified by both an immediate + and an enclosing * with nothing to differentiate which is in control of any
particular character. The nondeterminism that results produces super-linear performance. (Consult Mastering
Regular Expressions (3rd ed.), p. 222, by Jeffrey Friedl, for an in-depth analysis.) This particular case can
be fixed by use of atomic grouping, which prevents the unnecessary backtracking:
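A sketch of the atomic variant, using the same string as above. Timing is left out because it depends on the
Ruby version and machine:

```ruby
s = 'a' * 25 + 'd' + 'a' * 4 + 'c'

# (?>a+) commits to the whole run of a's, so the engine cannot retry
# every possible way of splitting that run between + and *.
/(b|(?>a+))*c/ =~ s #=> 26
```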
A similar case is typified by the following example, which takes approximately 60 seconds to execute for
me:
Match a string of 29 ‘a’s against a pattern of 29 optional ‘a’s followed by 29 mandatory ‘a’s:
The 29 optional ‘a’s match the string, but this prevents the 29 mandatory ‘a’s that follow from matching. Ruby
must then backtrack repeatedly so as to satisfy as many of the optional matches as it can while still
matching the mandatory 29. It is plain to us that none of the optional matches can succeed, but this fact
unfortunately eludes Ruby.
The best way to improve performance is to significantly reduce the amount of backtracking needed. For this
case, instead of individually matching 29 optional ‘a’s, a range of optional ‘a’s can be matched all at once
with a{0,29}:
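A sketch of the rewritten pattern. Only the fast form is run here; the variable names are illustrative:

```ruby
subject = 'a' * 29

# One bounded quantifier replaces 29 separate optional atoms, so there is a
# single backtracking decision (how many a's to give back), not 29 of them.
fast = Regexp.new('a{0,29}' + 'a' * 29)

fast =~ subject #=> 0
```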
Timeout
There are two APIs for setting a timeout. One is Regexp.timeout=, which is a process-global timeout setting
for Regexp matching.
Regexp.timeout = 3
s = 'a' * 25 + 'd' + 'a' * 4 + 'c'
/(b|a+)*c/ =~ s #=> This raises an exception in three seconds
The other is the timeout keyword of Regexp.new, which applies only to the regexp being created.
re = Regexp.new("(b|a+)*c", timeout: 3)
s = 'a' * 25 + 'd' + 'a' * 4 + 'c'
re =~ s #=> This raises an exception in three seconds
When using Regexps to process untrusted input, you should use the timeout feature to avoid excessive
backtracking. Otherwise, a malicious user can supply input that causes a denial-of-service attack.
Note that the timeout is not set by default because an appropriate limit depends heavily on the application’s
requirements and context.
Ruby Security
The Ruby programming language is large and complex and there are many security pitfalls often
encountered by newcomers and experienced Rubyists alike.
This document aims to discuss many of these pitfalls and provide more secure alternatives where
applicable.
Please check the full list of publicly known CVEs and how to correctly report a security vulnerability
at www.ruby-lang.org/en/security/. The Japanese version is at www.ruby-lang.org/ja/security/.
Security vulnerabilities should be reported via an email to [email protected] (the PGP public key),
which is a private mailing list. Reported problems will be published after fixes.
Marshal.load
Ruby’s Marshal module provides methods for serializing and deserializing Ruby object trees to and from a
binary data format.
Never use Marshal.load to deserialize untrusted or user supplied data. Because Marshal can deserialize to
almost any Ruby object and has full control over instance variables, it is possible to craft a malicious
payload that executes code shortly after deserialization.
If you need to deserialize untrusted data, you should use JSON as it is only capable of returning ‘primitive’
types such as strings, arrays, hashes, numbers and nil. If you need to deserialize other classes, you should
handle this manually. Never deserialize to a user specified class.
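A minimal sketch of the JSON route, assuming the untrusted payload arrives as a string (the payload and
variable names are made up for illustration):

```ruby
require 'json'

untrusted = '{"name": "alice", "roles": ["admin", "dev"], "age": 30}'

# JSON.parse only builds hashes, arrays, strings, numbers, booleans and nil.
data = JSON.parse(untrusted)

data["name"]  #=> "alice"
data["roles"] #=> ["admin", "dev"]
data.class    #=> Hash
```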
YAML
YAML is a popular human readable data serialization format used by many Ruby programs for configuration
and database persistence of Ruby object trees.
Similar to Marshal, it is able to deserialize into arbitrary Ruby classes. For example, the
following YAML data will create an ERB object when deserialized:
!ruby/object:ERB
src: puts `uname`
Because of this, many of the security considerations applying to Marshal are also applicable to YAML. Do
not use YAML to deserialize untrusted data.
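The advice above still stands. For illustration only, the sketch below shows YAML.safe_load refusing a
document tagged with an arbitrary Ruby class. The helper name yaml_rejected? is hypothetical:

```ruby
require 'yaml'

# yaml_rejected? is a hypothetical helper: true when safe_load refuses the document.
def yaml_rejected?(document)
  YAML.safe_load(document)
  false
rescue Psych::DisallowedClass
  true
end

yaml_rejected?("--- !ruby/object:ERB\nsrc: puts 1\n") #=> true
yaml_rejected?("name: alice\n")                       #=> false
```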
Symbols
Symbols are often seen as syntax sugar for simple strings, but they play a much more crucial role. The MRI
Ruby implementation uses Symbols internally for method, variable and constant names. The reason for this
is that symbols are simply integers with names attached to them, so they are faster to look up in hashtables.
Starting in version 2.2, most symbols can be garbage collected; these are called mortal symbols. Most
symbols you create (e.g. by calling to_sym) are mortal.
Immortal symbols on the other hand will never be garbage collected. They are created when modifying
code:
defining a method (e.g. with define_method),
setting an instance variable (e.g. with instance_variable_set),
creating a variable or constant (e.g. with const_set)
C extensions that have not been updated and are still calling SYM2ID will create immortal symbols. Bugs
in 2.2.0: send and __send__ also created immortal symbols, and calling methods with keyword arguments
could also create some.
Don’t create immortal symbols from user inputs. Otherwise, this would allow a user to mount a denial of
service attack against your application by flooding it with unique strings, which will cause memory to grow
indefinitely until the Ruby process is killed or causes the system to slow to a halt.
While it might not be a good idea to call these with user inputs, methods that used to be vulnerable such
as to_sym, respond_to?, method, instance_variable_get, const_get, etc. are no longer a threat.
Regular expressions
Ruby’s regular expression syntax has some minor differences when compared to other languages. In Ruby,
the ^ and $ anchors do not refer to the beginning and end of the string, rather the beginning and end of
a line.
This means that if you’re using a regular expression like /^[a-z]+$/ to restrict a string to only letters, an
attacker can bypass this check by passing a string containing a letter, then a newline, then any string of
their choosing.
If you want to match the beginning and end of the entire string in Ruby, use the anchors \A and \z.
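The bypass and its fix can be sketched as follows (the input string is a made-up example of attacker-supplied
data):

```ruby
input = "abc\n<script>alert(1)</script>"

# ^ and $ are satisfied by the first line alone, so the check passes.
/^[a-z]+$/.match?(input)   #=> true

# \A and \z anchor to the whole string, so the extra line is rejected.
/\A[a-z]+\z/.match?(input) #=> false
```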
eval
Never pass untrusted or user controlled input to eval.
Unless you are implementing a REPL like irb or pry, eval is almost certainly not what you want. Do not
attempt to filter user input before passing it to eval - this approach is fraught with danger and will most likely
open your application up to a serious remote code execution vulnerability.
send
‘Global functions’ in Ruby (puts, exit, etc.) are actually private instance methods on Object. This means it is
possible to invoke these methods with send, even if the call to send has an explicit receiver.
For example, the following code snippet writes “Hello world” to the terminal:
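A sketch consistent with that description, using the integer 1 as an arbitrary explicit receiver:

```ruby
1.respond_to?(:puts)         #=> false (puts is private, so 1.puts("...") raises NoMethodError)
1.respond_to?(:puts, true)   #=> true  (the private method is still there)

# send ignores visibility, so the private method is reachable.
1.send(:puts, "Hello world") # prints "Hello world"
```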
You should never call send with user supplied input as the first parameter. Doing so can introduce a denial
of service vulnerability:
If an attacker can control the first two arguments to send, remote code execution is possible:
When dispatching a method call based on user input, carefully verify the method name. If possible,
check it against a whitelist of safe method names.
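A minimal sketch of such a check. The constant ALLOWED_METHODS and the helper dispatch are hypothetical names,
and the list of permitted methods is only an example:

```ruby
# Hypothetical whitelist of method names that user input may select.
ALLOWED_METHODS = %w[upcase downcase reverse].freeze

# dispatch is a hypothetical helper: it calls the method only if it is listed.
def dispatch(receiver, requested)
  name = requested.to_s
  raise ArgumentError, "method not allowed: #{name}" unless ALLOWED_METHODS.include?(name)
  receiver.public_send(name)
end

dispatch("Hello", "upcase") #=> "HELLO"

begin
  dispatch("Hello", "exit!")
rescue ArgumentError => e
  e.message #=> "method not allowed: exit!"
end
```

Comparing against a fixed list of strings also avoids converting arbitrary user input into method calls at all.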
Note that the use of public_send is also dangerous, as send itself is public:
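A harmless sketch of the problem, using the private Kernel method format in place of anything dangerous:

```ruby
# Directly, public_send respects visibility and refuses the private method.
begin
  1.public_send(:format, "%d", 5)
rescue NoMethodError => e
  e.class #=> NoMethodError
end

# But send is itself public, so public_send can call send,
# which then reaches the private method anyway.
1.public_send(:send, :format, "%d", 5) #=> "5"
```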
DRb
As DRb allows remote clients to invoke arbitrary methods, it is not suitable to expose to untrusted clients.
When using DRb, try to avoid exposing it over the network if possible. If this isn’t possible and you need to
expose DRb to the world, you must configure an appropriate security policy with DRb::ACL.
Libraries
MakeMakefile
Module used to generate a Makefile for C extensions
RbConfig
Information about your configuration and build of Ruby
Gem
Package management framework for Ruby
Extensions
Coverage
Provides coverage measurement for Ruby
Monitor
Provides an object or module to use safely by more than one thread
objspace
Extends ObjectSpace module to add methods for internal statistics
PTY
Creates and manages pseudo terminals
Ripper
Provides an interface for parsing Ruby programs into S-expressions
Socket
Access underlying OS socket implementations
Default gems
Libraries
Abbrev
Calculates a set of unique abbreviations for a given set of strings
Base64
Support for encoding and decoding binary data using a Base64 representation
Benchmark
Provides methods to measure and report the time used to execute code
Bundler
Manage your Ruby application’s gem dependencies
CGI
Support for the Common Gateway Interface protocol
CSV
Provides an interface to read and write CSV files and data
Delegator
Provides three abilities to delegate method calls to an object
DidYouMean
“Did you mean?” experience in Ruby
DRb
Distributed object system for Ruby
English
Provides references to special global variables with less cryptic names
ERB
An easy to use but powerful templating system for Ruby
ErrorHighlight
Highlight error location in your code
FileUtils
Several file utility methods for copying, moving, removing, etc
Find
This module supports top-down traversal of a set of file paths
Forwardable
Provides delegation of specified methods to a designated object
GetoptLong
Parse command line options similar to the GNU C getopt_long()
IPAddr
Provides methods to manipulate IPv4 and IPv6 IP addresses
IRB
Interactive Ruby command-line tool for REPL (Read Eval Print Loop)
OptionParser
Ruby-oriented class for command-line option analysis
Logger
Provides a simple logging utility for outputting messages
Mutex_m
Mixin to extend objects to be handled like a Mutex
Net::HTTP
HTTP client API for Ruby
Observable
Provides a mechanism for publish/subscribe pattern in Ruby
Open3
Provides access to stdin, stdout and stderr when running other programs
OpenStruct
Class to build custom data structures, similar to a Hash
OpenURI
An easy-to-use wrapper for Net::HTTP, Net::HTTPS and Net::FTP
PP
Provides a PrettyPrinter for Ruby objects
PrettyPrinter
Implements a pretty printing algorithm for readable structure
PStore
Implements a file based persistence mechanism based on a Hash
Readline
Wrapper for Readline extension and Reline
Reline
Pure Ruby implementation of GNU Readline and Editline.
Resolv
Thread-aware DNS resolver library in Ruby
resolv-replace.rb
Replace Socket DNS with Resolv
RDoc
Produces HTML and command-line documentation for Ruby
Rinda
The Linda distributed computing paradigm in Ruby
SecureRandom
Interface for secure random number generator
Set
Provides a class to deal with collections of unordered, unique values
Shellwords
Manipulates strings with word parsing rules of UNIX Bourne shell
Singleton
Implementation of the Singleton pattern for Ruby
Tempfile
A utility class for managing temporary files
Time
Extends the Time class with methods for parsing and conversion
Timeout
Auto-terminate potentially long-running operations in Ruby
tmpdir.rb
Extends the Dir class to manage the OS temporary file path
TSort
Topological sorting using Tarjan’s algorithm
un.rb
Utilities to replace common UNIX commands
URI
A Ruby module providing support for Uniform Resource Identifiers
YAML
Ruby client library for the Psych YAML implementation
WeakRef
Allows a referenced object to be garbage-collected
Extensions
BigDecimal
Provides arbitrary-precision floating point decimal arithmetic
Date
A subclass of Object that includes the Comparable module, for handling dates
DateTime
Subclass of Date for handling dates, hours, minutes, seconds, and offsets
Digest
Provides a framework for message digest libraries
Etc
Provides access to information typically stored in UNIX /etc directory
Fcntl
Loads constants defined in the OS fcntl.h C header file
Fiddle
A libffi wrapper for Ruby
IO
Extensions for Ruby IO class, including wait, nonblock and ::console
JSON
Implements JavaScript Object Notation for Ruby
NKF
Ruby extension for Network Kanji Filter
OpenSSL
Provides SSL, TLS and general purpose cryptography for Ruby
Pathname
Representation of the name of a file or directory on the filesystem
Psych
A YAML parser and emitter for Ruby
Racc
A LALR(1) parser generator written in Ruby.
Readline
Provides an interface for GNU Readline and Edit Line (libedit)
StringIO
Pseudo I/O on String objects
StringScanner
Provides lexical scanning operations on a String
Syslog
Ruby interface for the POSIX system logging facility
WIN32OLE
Provides an interface for OLE Automation in Ruby
Zlib
Ruby interface for the zlib compression/decompression library
Bundled gems
Libraries
MiniTest
A test suite with TDD, BDD, mocking and benchmarking
PowerAssert
Power Assert for Ruby.
Rake
Ruby build program with capabilities similar to make
Test::Unit
A compatibility layer for MiniTest
REXML
An XML toolkit for Ruby
RSS
Family of libraries that support various formats of XML “feeds”
Net::FTP
Support for the File Transfer Protocol
Net::IMAP
Ruby client API for Internet Message Access Protocol
Net::POP3
Ruby client library for POP3
Net::SMTP
Simple Mail Transfer Protocol client library for Ruby
Matrix
Represents a mathematical matrix.
Prime
Prime numbers and factorization library
RBS
RBS is a language to describe the structure of Ruby programs
TypeProf
A type analysis tool for Ruby code based on abstract interpretation
DEBUGGER__
Debugging functionality for Ruby
%[flags][width]conversion
It consists of:
A leading percent character.
Zero or more flags (each is a character).
An optional width specifier (an integer).
A conversion specifier (a character).
Except for the leading percent character, the only required part is the conversion specifier, so we begin with
that.
Conversion Specifiers
%C - Century, zero-padded:
%h - Same as %b.
%d - Day of the month, in range (1..31), zero-padded:
Timezone
%z - Timezone as hour and minute offset from UTC:
Weekday
%A - Full weekday name:
Week Number
%U - Week number of the year, in range (0..53), zero-padded, where each week begins on a
Sunday:
%W - Week number of the year, in range (0..53), zero-padded, where each week begins on a
Monday:
Week Dates
See ISO 8601 week dates.
%G - Week-based year:
Literals
%n - Newline character “\n”:
%D - Date:
%v - VMS date:
%x - Same as %D.
%X - Same as %T.
%r - 12-hour time:
%R - 24-hour time:
%T - 24-hour time:
DateTime.now.strftime('%+')
# => "Wed Jun 29 08:31:53 -05:00 2022"
DateTime.now.strftime('%a %b %e %H:%M:%S %Z %Y')
# => "Wed Jun 29 08:32:18 -05:00 2022"
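Several of the conversion specifiers listed above can be tried against a fixed UTC time. The date below is an
arbitrary choice for this sketch:

```ruby
t = Time.utc(2001, 2, 3, 4, 5, 6)  # Saturday, 3 February 2001, 04:05:06 UTC

t.strftime('%C') #=> "20"          century
t.strftime('%h') #=> "Feb"         same as %b
t.strftime('%d') #=> "03"          day of the month, zero-padded
t.strftime('%z') #=> "+0000"       offset from UTC
t.strftime('%A') #=> "Saturday"    full weekday name
t.strftime('%U') #=> "04"          week number, weeks begin on Sunday
t.strftime('%W') #=> "05"          week number, weeks begin on Monday
t.strftime('%G') #=> "2001"        week-based year
t.strftime('%D') #=> "02/03/01"    date
t.strftime('%R') #=> "04:05"       24-hour time without seconds
t.strftime('%T') #=> "04:05:06"    24-hour time with seconds
t.strftime('%r') #=> "04:05:06 AM" 12-hour time
```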
Flags
Flags may affect certain formatting specifications.
Multiple flags may be given with a single conversion specifier; order does not matter.
Padding Flags
0 - Pad with zeroes:
- - Don’t pad:
# - Swapcase result:
Timezone Flags
: - Put timezone as colon-separated hours and minutes:
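A sketch of a few of these flags against a fixed UTC time (the date is arbitrary):

```ruby
t = Time.utc(2001, 2, 3, 4, 5, 6)

t.strftime('%e')  #=> " 3"     %e is blank-padded by default
t.strftime('%-d') #=> "3"      "-" suppresses padding
t.strftime('%p')  #=> "AM"
t.strftime('%#p') #=> "am"     "#" changes the case of the result
t.strftime('%z')  #=> "+0000"
t.strftime('%:z') #=> "+00:00" ":" separates hours and minutes
```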
Width Specifiers
The integer width specifier gives a minimum width for the returned string:
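A sketch of the width specifier on a numeric and a textual conversion (the date is arbitrary):

```ruby
t = Time.utc(2001, 2, 3)

t.strftime('%Y')   #=> "2001"
t.strftime('%10Y') #=> "0000002001"  numeric results are padded with zeroes
t.strftime('%B')   #=> "February"
t.strftime('%10B') #=> "  February"  textual results are padded with blanks
t.strftime('%3B')  #=> "February"    a width that is too small is ignored
```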
HTTP Format
The HTTP date format is based on RFC 2616, and treats dates in the format '%a, %d %b %Y %T GMT':
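A sketch producing that format two ways: with the format string quoted above, and with Time#httpdate from the
time library. The time value is arbitrary:

```ruby
require 'time'

t = Time.utc(2001, 2, 3, 4, 5, 6)

t.strftime('%a, %d %b %Y %T GMT') #=> "Sat, 03 Feb 2001 04:05:06 GMT"
t.httpdate                        #=> "Sat, 03 Feb 2001 04:05:06 GMT"
```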
Dates
See ISO 8601 dates.
Years:
o Basic year (YYYY):
Calendar dates:
o Basic date (YYYYMMDD):
Week dates:
o Basic date (YYYYWww or YYYYWwwD):
Ordinal dates:
o Basic date (YYYYDDD):
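The basic forms named above can be sketched with ordinary conversion specifiers. The format strings here are
assembled for illustration and the date is arbitrary:

```ruby
t = Time.utc(2001, 2, 3)

t.strftime('%Y')      #=> "2001"     basic year (YYYY)
t.strftime('%Y%m%d')  #=> "20010203" basic calendar date (YYYYMMDD)
t.strftime('%GW%V')   #=> "2001W05"  basic week date (YYYYWww)
t.strftime('%GW%V%u') #=> "2001W056" basic week date with weekday (YYYYWwwD)
t.strftime('%Y%j')    #=> "2001034"  basic ordinal date (YYYYDDD)
```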
Times
See ISO 8601 times.
Times:
o See also:
Local time (unqualified).
Coordinated Universal Time (UTC).
Time offsets from UTC.
Ruby Syntax
The Ruby syntax is large and is split up into the following sections:
Literals
Numbers, Strings, Arrays, Hashes, etc.
Assignment
Assignment and variables
Control Expressions
if, unless, while, until, for, break, next, redo
Pattern matching
Experimental structural pattern matching and variable binding syntax
Methods
Method and method argument syntax
Calling Methods
How to call a method (or send a message to a method)
Modules and Classes
Creating modules and classes including inheritance
Exceptions
Exception handling syntax
Precedence
Precedence of ruby operators
Refinements
Use and behavior of the refinements feature
Miscellaneous
alias, undef, BEGIN, END
Comments
Line and block code comments
Assignment
In Ruby, assignment uses the = (equals sign) character. This example assigns the number five to the local
variable v:
v = 5
Assignment creates a local variable if the variable was not previously referenced.
An assignment expression result is always the assigned value, including assignment methods.
1.times do
  a = 1
  puts "local variables in the block: #{local_variables.join ", "}"
end
This prints:
local variables in the block: a
Since the block creates a new scope, any local variables created inside it do not leak to the surrounding
scope.
Variables defined in an outer scope appear in the inner scope:
a = 0
1.times do
  puts "local variables: #{local_variables.join ", "}"
end
This prints:
local variables: a
You may isolate variables in a block from the outer scope by listing them following a ; in the block’s
arguments. See the documentation for block local variables in the calling methods documentation for an
example.
See also Kernel#local_variables, but note that a for loop does not create a new scope like a block does.
p a # prints nil
The similarity between method and local variable names can lead to confusing code, for example:
def big_calculation
42 # pretend this takes a long time
end
big_calculation = big_calculation()
Now any reference to big_calculation is considered a local variable and will be cached. To call the method,
use self.big_calculation.
You can force a method call by using empty argument parentheses as shown above or by using an explicit
receiver like self. Using an explicit receiver may raise a NameError if the method’s visibility is not public or
the receiver is the literal self.
Another commonly confusing case is when using a modifier if:
p a if a = 0.zero?
Rather than printing “true” you receive a NameError, “undefined local variable or method ‘a’”. Since Ruby
parses the bare a left of the if first and has not yet seen an assignment to a, it assumes you wish to call a
method. Ruby then sees the assignment to a and will assume you are referencing a local variable.
The confusion comes from the out-of-order execution of the expression. First the local variable is assigned
to, then you attempt to call a nonexistent method.
def m
eval "bar = 1"
lvs = eval "baz = 2; ary = [local_variables, foo, baz]; x = 2; ary"
eval "quux = 3"
foo = 1
lvs << local_variables
end
m
# => [[:baz, :ary, :x, :lvs, :foo], nil, 2, [:lvs, :foo]]
Instance Variables
Instance variables are shared across all methods for the same object.
An instance variable must start with a @ (“at” sign or commercial at). Otherwise instance variable names
follow the same rules as local variable names. Since the instance variable starts with an @ the second
character may be an upper-case letter.
Here is an example of instance variable usage:
class C
def initialize(value)
@instance_variable = value
end
def value
@instance_variable
end
end
An uninitialized instance variable has a value of nil. If you run Ruby with warnings enabled, you will get a
warning when accessing an uninitialized instance variable.
The value method has access to the value set by the initialize method, but only for the same object.
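A sketch of that per-object behaviour. Holder is a hypothetical class that mirrors C above under a different
name:

```ruby
class Holder  # hypothetical class mirroring C above
  def initialize(value)
    @stored = value
  end

  def value
    @stored
  end
end

first  = Holder.new(1)
second = Holder.new(2)

first.value  #=> 1
second.value #=> 2 (each object keeps its own @stored)

# allocate creates an object without running initialize,
# so @stored was never set and reads as nil.
Holder.allocate.value #=> nil
```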
Class Variables
Class variables are shared between a class, its subclasses and its instances.
A class variable must start with a @@ (two “at” signs). The rest of the name follows the same rules as
instance variables.
Here is an example:
class A
@@class_variable = 0
def value
@@class_variable
end
def update
@@class_variable = @@class_variable + 1
end
end
class B < A
def update
@@class_variable = @@class_variable + 2
end
end
a = A.new
b = B.new
puts "A value: #{a.value}"
puts "B value: #{b.value}"
This prints:
A value: 0
B value: 0
Continuing with the same example, we can update using objects from either class and the value is shared:
puts "update A"
a.update
This prints:
update A
A value: 1
B value: 1
update B
A value: 3
B value: 3
update A
A value: 4
B value: 4
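The sequence of updates behind that output can be sketched in a compact, self-contained form. Counter and
DoubleCounter are hypothetical stand-ins for A and B:

```ruby
class Counter  # hypothetical stand-in for A
  @@count = 0

  def value
    @@count
  end

  def update
    @@count += 1
  end
end

class DoubleCounter < Counter  # hypothetical stand-in for B
  def update
    @@count += 2  # the same @@count as in Counter
  end
end

a = Counter.new
b = DoubleCounter.new

a.update  # both a.value and b.value are now 1
b.update  # both are now 3
a.update  # both are now 4
```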
Global Variables
Global variables are accessible everywhere.
Global variables start with a $ (dollar sign). The rest of the name follows the same rules as instance
variables.
Here is an example:
$global = 0
class C
puts "in a class: #{$global}"
def my_method
puts "in a method: #{$global}"
$global = $global + 1
$other_global = 3
end
end
C.new.my_method
This prints:
in a class: 0
in a method: 0
Assignment Methods
You can define methods that will behave like assignment, for example:
class C
def value=(value)
@value = value
end
end
c = C.new
c.value = 42
Using assignment methods allows your programs to look nicer. When assigning to an instance variable
most people use Module#attr_accessor:
class C
attr_accessor :value
end
When using method assignment you must always have a receiver. If you do not have a receiver, Ruby
assumes you are assigning to a local variable:
class C
attr_accessor :value
def my_method
value = 42
puts "local_variables: #{local_variables.join ", "}"
puts "@value: #{@value.inspect}"
end
end
C.new.my_method
This prints:
local_variables: value
@value: nil
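To use the assignment method you must set the receiver:
class C
attr_accessor :value
def my_method
self.value = 42
puts "local_variables: #{local_variables.join ", "}"
puts "@value: #{@value.inspect}"
end
end
C.new.my_method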
This prints:
local_variables:
@value: 42
Note that the value returned by an assignment method is always ignored, since the result of an assignment
expression is always the assigned value.
Abbreviated Assignment
You can mix several of the operators and assignment. To add 2 to an object you can write:
a = 1
a += 2
p a # prints 3
This is equivalent to:
a = 1
a = a + 2
p a # prints 3
You can use the following operators this way: +, -, *, /, %, **, &, |, ^, <<, >>
There are also ||= and &&=. The former makes an assignment if the value was nil or false while the latter
makes an assignment if the value was not nil or false.
Here is an example:
a ||= 0
a &&= 1
p a # prints 1
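You can also implicitly create an array by listing multiple values when assigning:
a = 1, 2, 3
p a # prints [1, 2, 3]
You can splat or unpack an array anywhere on the right-hand side of the assignment:
a = *[1, 2, 3]
p a # prints [1, 2, 3]
a = 1, *[2, 3]
p a # prints [1, 2, 3]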
Multiple Assignment
You can assign multiple values on the right-hand side to multiple variables:
a, b = 1, 2
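In the following sections, anywhere “variable” is used, an assignment method or an instance, class or global
variable will also work:
def value=(value)
p assigned: value
end
self.value, $global = 1, 2 # prints {:assigned=>1}
p $global # prints 2
You can use multiple assignment to swap two values in place:
old_value = 1
new_value, old_value = old_value, 2
p new_value: new_value, old_value: old_value
# prints {:new_value=>1, :old_value=>2}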
If you have more values on the right-hand side of the assignment than variables on the left-hand side, the
extra values are ignored:
a, b = 1, 2, 3
p a: a, b: b # prints {:a=>1, :b=>2}
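You can use * to gather extra values on the right-hand side of the assignment. The * can appear anywhere
on the left-hand side, but only once:
a, *b = 1, 2, 3
*a, b = 1, 2, 3
Like Array decomposition in method arguments, you can decompose an Array during assignment using
parentheses, also as part of a larger multiple assignment:
(a, b) = [1, 2]
a, (b, c) = 1, [2, 3]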
Since each decomposition is considered its own multiple assignment you can use * to gather arguments in
the decomposition:
a, (b, *c), *d = 1, [2, 3, 4], 5, 6
p a: a, b: b, c: c, d: d
# prints {:a=>1, :b=>2, :c=>[3, 4], :d=>[5, 6]}
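The gathering and decomposition rules can be checked with a short sketch. The variable names here are illustrative only:

```ruby
first, *rest = 1, 2, 3               # * gathers the remaining values
*init, last = 1, 2, 3                # * may also come first
x, (y, z) = 1, [2, 3]                # parentheses decompose the nested Array
a, (b, *c), *d = 1, [2, 3, 4], 5, 6  # one * per decomposition level

p first, rest     # 1 and [2, 3]
p init, last      # [1, 2] and 3
p [x, y, z]       # [1, 2, 3]
p [a, b, c, d]    # [1, 2, [3, 4], [5, 6]]
```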
Calling Methods
Calling a method sends a message to an object so it can perform some work.
In ruby you send a message to an object like this:
my_method()
Note that the parentheses are optional:
my_method
Except where using and omitting parentheses behave differently, this document uses parentheses
when arguments are present to avoid confusion.
This section only covers calling methods. See also the syntax documentation on defining methods.
Receiver
self is the default receiver. If you don’t specify any receiver self will be used. To specify a receiver use .:
my_object.my_method
This sends the my_method message to my_object. Any object can be a receiver but depending on the
method’s visibility sending a message may raise a NoMethodError.
You may also use :: to designate a receiver, but this is rarely used due to the potential for confusion
with :: for namespaces.
You can “chain” method calls by immediately following one method call with another. This example chains
methods Array#append and Array#compact:
a = [:foo, 'bar', 2]
a1 = [:baz, nil, :bam, nil]
a2 = a.append(*a1).compact
a2 # => [:foo, "bar", 2, :baz, :bam]
Details:
First method append adds (separately) each element of a1 to a itself, and
returns a, which is now [:foo, "bar", 2, :baz, nil, :bam, nil].
Chained method compact creates a copy of that return value, removes its nil-valued entries, and
returns [:foo, "bar", 2, :baz, :bam].
You can chain methods that are in different classes. This example chains
methods Hash#to_a and Array#reverse:
Details:
First method Hash#to_a converts the hash to an Array of [key, value] pairs, and returns that Array.
Chained method Array#reverse creates a copy of that return value, reverses it, and returns the copy.
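As an illustration of chaining across classes, here is a minimal sketch. The hash h and its contents are hypothetical sample data, not taken from the text above:

```ruby
h = { foo: 0, bar: 1, baz: 2 }   # hypothetical sample data

pairs = h.to_a                    # Hash#to_a returns an Array of pairs
reversed = h.to_a.reverse         # Array#reverse is called on that Array

p pairs      # [[:foo, 0], [:bar, 1], [:baz, 2]]
p reversed   # [[:baz, 2], [:bar, 1], [:foo, 0]]
```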
The &. operator makes it easy to chain methods that could return nil. Note that &. skips only the next call, so
for a longer chain it is necessary to add the operator at each level:
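A minimal sketch of this rule, assuming a variable name that happens to be nil:

```ruby
name = nil

one_level = name&.strip             # strip is skipped, result is nil

# Only the first call is guarded here, so upcase is sent to nil.
unguarded = begin
  name&.strip.upcase
rescue NoMethodError => error
  error.class
end

every_level = name&.strip&.upcase   # both calls are skipped, result is nil

p one_level, unguarded, every_level
```

With a non-nil receiver such as " ruby ", the fully guarded chain behaves like an ordinary chain and returns "RUBY".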
Arguments
There are three types of arguments when sending a message, the positional arguments, keyword (or
named) arguments and the block argument. Each message sent may use one, two or all types of
arguments, but the arguments must be supplied in this order.
All arguments in ruby are passed by reference and are not lazily evaluated.
Arguments are separated by commas (,). An argument may be an expression, a hash argument such as
'key' => value, or a keyword argument:
key: value
Hash and keyword arguments must be contiguous and must appear after all positional arguments, but may
be mixed:
Positional Arguments
The positional arguments for the message follow the method name:
my_method(argument1, argument2)
However, parentheses are necessary to avoid ambiguity. This will raise a SyntaxError because Ruby does
not know which method argument3 should be sent to:
method_one argument1, method_two argument2, argument3
If the method definition has a *argument, extra positional arguments will be assigned to argument in the
method as an Array.
If the method definition doesn’t include keyword arguments, the keyword or hash-type arguments are
assigned as a single hash to the last argument:
def my_method(options)
p options
end
Here c and d have default values which ruby will apply for you. If you send only two arguments to this
method:
my_method(1, 2)
my_method(1, 2, 5)
def my_method(a, b = 2, c = 3, d)
p [a, b, c, d]
end
Here b and c have default values. If you send only two arguments to this method:
my_method(1, 4)
my_method(1, 5, 6)
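The way Ruby fills in default values can be seen by defining both shapes side by side. The method names below are hypothetical, chosen so that the two definitions do not overwrite each other:

```ruby
# Defaults at the end of the parameter list.
def trailing_defaults(a, b, c = 3, d = 4)
  [a, b, c, d]
end

# Defaults in the middle: the required arguments a and d are filled first.
def middle_defaults(a, b = 2, c = 3, d)
  [a, b, c, d]
end

p trailing_defaults(1, 2)      # [1, 2, 3, 4]
p trailing_defaults(1, 2, 5)   # [1, 2, 5, 4]
p middle_defaults(1, 4)        # [1, 2, 3, 4]
p middle_defaults(1, 5, 6)     # [1, 5, 3, 6]
```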
Keyword Arguments
Keyword arguments follow any positional arguments and are separated by commas like positional
arguments:
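Be aware that when a keyword argument value is omitted and the method parentheses are omitted too, the
parsing order might be unexpected:
my_method positional1, keyword1:
some_other_expression
# parsed as my_method(positional1, keyword1: some_other_expression)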
Block Argument
The block argument sends a closure from the calling scope to the method.
The block argument is always last when sending a message to a method. A block is sent to a method
using do ... end or { ... }:
my_method do
# ...
end
or:
my_method {
# ...
}
do ... end has lower precedence than { ... }, so:
method_1 method_2 {
# ...
}
sends the block to method_2, while:
method_1 method_2 do
# ...
end
sends the block to method_1. Note that in the first case, if parentheses are used, the block is sent
to method_1.
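Which method receives the block can be observed with a small sketch. The two method bodies are hypothetical helpers that only report whether they were given a block:

```ruby
def method_1(argument = nil)
  block_given? ? :method_1_got_block : argument
end

def method_2
  block_given? ? :method_2_got_block : :method_2_no_block
end

# Braces bind to the nearest call, method_2.
with_braces = method_1 method_2 { :unused }

# do ... end binds to the outermost call, method_1.
with_do_end = method_1 method_2 do
  :unused
end

# With parentheses the braces can only belong to method_1.
with_parens = method_1(method_2) { :unused }

p with_braces, with_do_end, with_parens
```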
A block will accept arguments from the method it was sent to. Arguments are defined similar to the way a
method defines arguments. The block’s arguments go in | ... | following the opening do or {:
def my_method
yield self
end
place = "world"
This prints:
So the place variable in the block is not the same place variable as outside the block. Removing ;
place from the block arguments gives this result:
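The effect of declaring a block-local variable after ; can be sketched as follows. The block bodies are illustrative, and the results are captured in variables instead of being printed:

```ruby
def my_method
  yield self
end

place = "world"

# place after the ; is local to the block, so the outer place is untouched.
inside = my_method do |obj; place|
  place = "block"
  "this is #{place}"
end
isolated = place

# Without the block-local declaration the block assigns the outer place.
my_method do |obj|
  place = "block"
end
shared = place

p inside, isolated, shared
```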
You can turn an Array into an argument list with * (or splat) operator:
arguments = [1, 2, 3]
my_method(*arguments)
or:
arguments = [2, 3]
my_method(1, *arguments)
Both are equivalent to:
my_method(1, 2, 3)
If the method accepts keyword arguments, the splat operator will convert a hash at the end of the array into
keyword arguments:
def my_method(a, b, c: 3)
end
arguments = [1, 2, { c: 4 }]
my_method(*arguments)
Note that this behavior was deprecated in Ruby 2.7, where it emits a warning, and was removed in
Ruby 3.0.
You may also use the ** (described next) to convert a Hash into keyword arguments.
If the number of objects in the Array does not match the number of arguments for the method,
an ArgumentError will be raised.
If the splat operator comes first in the call, parentheses must be used to avoid a warning:
You can turn a Hash into keyword arguments with the ** (keyword splat) operator:
arguments = { first: 3, second: 4, third: 5 }
my_method(**arguments)
or:
If the method definition uses the keyword splat operator to gather arbitrary keyword arguments, they will not
be gathered by *:
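A minimal sketch of both keyword splat rules follows. The method signatures are assumptions made for illustration, since the definitions are not shown here, and only Symbol keys are used:

```ruby
# Hypothetical method with three optional keyword arguments.
def my_method(first: 1, second: 2, third: 3)
  [first, second, third]
end

arguments = { first: 3, second: 4, third: 5 }
all_from_hash = my_method(**arguments)        # [3, 4, 5]

arguments = { first: 3, second: 4 }
mixed = my_method(third: 5, **arguments)      # [3, 4, 5]

# Hypothetical method that gathers both kinds of arguments.
def gather(*positional, **keywords)
  [positional, keywords]
end

gathered = gather(1, 2, three: 4, five: 6)

p all_from_hash, mixed, gathered
```

The keywords end up in the ** parameter, while the * parameter receives only the positional values 1 and 2.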
Prints:
def my_method
yield self
end
You can convert a proc or lambda to a block argument with the & (block conversion) operator:
my_method(&argument)
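For illustration, a sketch of the conversion with one proc and one lambda. The bodies of argument and doubler are hypothetical:

```ruby
def my_method
  yield 21
end

# A proc stored in a variable, later passed as the block.
argument = proc { |number| number * 2 }
from_proc = my_method(&argument)

# A lambda converts the same way and works with any block-taking method.
doubler = lambda { |number| number * 2 }
doubled = [1, 2, 3].map(&doubler)

p from_proc, doubled   # 42 and [2, 4, 6]
```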
If the block conversion operator comes first in the call, parentheses must be used to avoid a warning:
Method Lookup
When you send a message, Ruby looks up the method that matches the name of the message for the
receiver. Methods are stored in classes and modules so method lookup walks these, not the objects
themselves.
Here is the order of method lookup for the receiver’s class or module R:
The prepended modules of R in reverse order
For a matching method in R
The included modules of R in reverse order
If R is a class with a superclass, this is repeated with R’s superclass until a method is found.
Once a match is found method lookup stops.
If no match is found this repeats from the beginning, but looking for method_missing. The
default method_missing is BasicObject#method_missing which raises a NameError when invoked.
If refinements (an experimental feature) are active, the method lookup changes. See the refinements
documentation for details.
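The lookup order can be inspected with Module#ancestors. This is a small sketch; the module and class names are hypothetical:

```ruby
module Included
  def who
    "Included"
  end
end

module Prepended
  def who
    "Prepended, then #{super}"
  end
end

class Base
  def who
    "Base"
  end
end

class R < Base
  include Included
  prepend Prepended

  def who
    "R, then #{super}"
  end
end

lookup_order = R.ancestors.take(4)   # [Prepended, R, Included, Base]
answer = R.new.who                   # "Prepended, then R, then Included"

p lookup_order, answer
```

Each super call continues the search at the next entry in the ancestors list. Base#who is never reached because Included#who does not call super.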
Code Comments
Ruby has two types of comments: inline and block.
Inline comments start with the # character and continue until the end of the line:
# On a separate line
class Foo # or at the end of the line
# can be indented
def bar
end
end
Block comments start with =begin and end with =end. Each should start on a separate line.
=begin
This is
commented out
=end
class Foo
end
=begin some_tag
this works, too
=end
=begin and =end cannot be indented, so this is a syntax error:
class Foo
  =begin
  Will not work
  =end
end
Magic Comments
While comments are typically ignored by Ruby, special “magic comments” contain directives that affect how
the code is interpreted.
Top-level magic comments must appear in the first comment section of a file.
NOTE: Magic comments affect only the file in which they appear; other files are unaffected.
# frozen_string_literal: true
var = 'hello'
var.frozen? # => true
Alternative syntax
Magic comments may consist of a single directive (as in the example above). Alternatively, multiple
directives may appear on the same line if separated by “;” and wrapped between “-*-” (see Emacs’ file
variables).
encoding Directive
Indicates which string encoding should be used for string literals, regexp literals and __ENCODING__:
# encoding: big5
frozen_string_literal Directive
Indicates that string literals should be allocated once at parse time and frozen:
# frozen_string_literal: true
3.times do
p 'hello'.object_id # => prints same number
end
p 'world'.frozen? # => true
The default is false; this can be changed with --enable=frozen-string-literal. Without the directive, or with #
frozen_string_literal: false, the example above would print 3 different numbers and “false”.
Starting in Ruby 3.0, string literals that are dynamic are not frozen nor reused:
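# frozen_string_literal: true
p "Addition: #{2 + 2}".frozen? # => false
warn_indent Directive
This directive can turn on detection of bad indentation for statements that follow it:
def foo
  end # => no warning
# warn_indent: true
def bar
  end # => warning: mismatched indentations at 'end' with 'def' at 6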
Another way to get these warnings is to run Ruby with warnings enabled (ruby -w). Using a directive to
set this to false will prevent these warnings from being shown.
shareable_constant_value Directive
Note: This directive is experimental in Ruby 3.0 and may change in future releases.
This special directive helps to create constants that hold only immutable objects, or Ractor-shareable
constants.
The directive can specify special treatment for values assigned to constants:
none: (default)
literal: literals are implicitly frozen, others must be Ractor-shareable
experimental_everything: all made shareable
experimental_copy: copy deeply and make it shareable
Mode none (default)
No special treatment in this mode (as in Ruby 2.x): no automatic freezing and no checks.
It has always been a good idea to deep-freeze constants; Ractor makes this an even better idea as only the
main ractor can access non-shareable constants:
# shareable_constant_value: none
A = {foo: []}
A.frozen? # => false
Ractor.new { puts A } # => can not access non-shareable objects by non-main Ractor.
Mode literal
In “literal” mode, constants assigned to literals will be deeply-frozen:
# shareable_constant_value: literal
X = [{foo: []}] # => same as [{foo: [].freeze}.freeze].freeze
Other values must be shareable:
# shareable_constant_value: literal
X = Object.new # => cannot assign unshareable object to X
Note that only literals directly assigned to constants, or recursively held in such literals will be frozen:
# shareable_constant_value: literal
var = [{foo: []}]
var.frozen? # => false (assignment was made to local variable)
X = var # => cannot assign unshareable object to X
Mode experimental_everything
In this mode, all values assigned to constants are made shareable:
# shareable_constant_value: experimental_everything
FOO = Set[1, 2, {foo: []}]
# same as FOO = Ractor.make_shareable(...)
# OR same as `FOO = Set[1, 2, {foo: [].freeze}.freeze].freeze`
This mode is “experimental”, because it might be error prone, for example by deep-freezing the constants of
an external resource which could cause errors:
# shareable_constant_value: experimental_everything
FOO = SomeGem::Something::FOO
# => deep freezes the gem's constant!
This will be revisited before Ruby 3.1 to either allow “everything” or to instead remove this mode.
The method Module#const_set is not affected.
Mode experimental_copy
In this mode, all values assigned to constants are deeply copied and made shareable. It is a safer mode
than experimental_everything.
# shareable_constant_value: experimental_copy
var = [{foo: []}]
var.frozen? # => false (assignment was made to local variable)
X = var # => calls `Ractor.make_shareable(var, copy: true)`
var.frozen? # => false
Ractor.shareable?(X) #=> true
var.object_id == X.object_id #=> false
This mode is “experimental” and has not been discussed thoroughly. This will be revisited before Ruby 3.1
to either allow “copy” or to instead remove this mode.
The method Module#const_set is not affected.
Scope
This directive can be used multiple times in the same file:
# shareable_constant_value: none
A = {foo: []}
A.frozen? # => false
Ractor.new { puts A } # => can not access non-shareable objects by non-main Ractor.
# shareable_constant_value: literal
B = {foo: []}
B.frozen? # => true
B[:foo].frozen? # => true
D = [Object.new.freeze]
D.frozen? # => true
# shareable_constant_value: experimental_everything
E = Set[1, 2, Object.new]
E.frozen? # => true
E.all?(&:frozen?) # => true
The directive affects only subsequent constants and only for the current scope:
module Mod
# shareable_constant_value: literal
A = [1, 2, 3]
module Sub
B = [4, 5]
end
end
C = [4, 5]
module Mod
D = [6]
end
p Mod::A.frozen?, Mod::Sub::B.frozen? # => true, true
p C.frozen?, Mod::D.frozen? # => false, false
Control Expressions
Ruby has a variety of ways to control execution. All the expressions described here return a value.
For the tests in these control expressions, nil and false are false-values and true and any other object are
true-values. In this document “true” will mean “true-value” and “false” will mean “false-value”.
if Expression
The simplest if expression has two parts, a “test” expression and a “then” expression. If the “test” expression
evaluates to true, the “then” expression is evaluated.
Here is a simple if statement:
if true then
puts "the test resulted in a true-value"
end
The then is optional:
if true
puts "the test resulted in a true-value"
end
This document will omit the optional then for all expressions as that is the most common usage of if.
You may also add an else expression. If the test does not evaluate to true the else expression will be
executed:
if false
puts "the test resulted in a true-value"
else
puts "the test resulted in a false-value"
end
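You may add an arbitrary number of extra tests to an if expression using elsif. An elsif executes when all
tests above the elsif are false:
a = 1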
if a == 0
puts "a is zero"
elsif a == 1
puts "a is one"
else
puts "a is some other value"
end
This will print “a is one” because 1 is not equal to 0. The else branch is executed only when no condition
matches.
Once a condition matches, either the if condition or any elsif condition, the if expression is complete and no
further tests will be performed.
Like an if, an elsif condition may be followed by a then.
In this example only “a is one” is printed:
a = 1
if a == 0
puts "a is zero"
elsif a == 1
puts "a is one"
elsif a >= 1
puts "a is greater than or equal to one"
else
puts "a is some other value"
end
The tests for if and elsif may have side effects. The most common use of a side effect is to cache a value in
a local variable:
if a = object.some_value
# do something to a
end
The result value of an if expression is the last value executed in the expression.
Ternary if
You may also write an if-then-else expression using ? and :. This ternary if:
input_type = gets =~ /hello/i ? "greeting" : "other"
is the same as this if expression:
input_type =
if gets =~ /hello/i
"greeting"
else
"other"
end
While the ternary if is much shorter to write than the more verbose form, for readability it is recommended
that the ternary if is only used for simple conditionals. Also, avoid using multiple ternary conditions in the
same expression as this can be confusing.
unless Expression
The unless expression is the opposite of the if expression. If the value is false, the “then” expression is
executed:
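unless true
puts "the value is a false-value"
end
This prints nothing as true is not a false-value. The unless expression above is the same as:
if not true
puts "the value is a false-value"
end
Like an if expression, you may use an else condition with unless:
unless true
puts "the value is false"
else
puts "the value is true"
end
This prints “the value is true” from the else condition.
if and unless can also be used as modifiers, where the left-hand side is the “then” statement and the
right-hand side is the “test” expression:
a = 0
a += 1 if a.zero?
p a # prints 1
a = 0
a += 1 unless a.zero?
p a # prints 0
The modifier and standard versions are not exact transformations of each other due to parse order. This
raises a NameError because a is referenced before it is assigned:
p a if a = 0.zero?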
Here the string "12345" is compared with /^1/ by calling /^1/ === "12345" which returns true. Like
the if expression, the first when that matches is executed and all other matches are ignored.
If no matches are found, the else is executed.
The else and then are optional; this case expression gives the same result as the one above:
case "12345"
when /^1/
puts "the string starts with one"
end
case "2"
when /^1/, "2"
puts "the string starts with one or is '2'"
end
Ruby will try each condition in turn, so first /^1/ === "2" returns false, then "2" === "2" returns true, so “the
string starts with one or is ‘2’” is printed.
You may use then after the when condition. This is most frequently used to place the body of the when on a
single line.
case a
when 1, 2 then puts "a is one or two"
when 3 then puts "a is three"
else puts "I don't know what a is"
end
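A case expression can also be used like an if-elsif expression, with the tests written in the when clauses:
a = 2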
case
when a == 1, a == 2
puts "a is one or two"
when a == 3
puts "a is three"
else
puts "I don't know what a is"
end
case {a: 1, b: 2, c: 3}
in a: Integer => m
"matched: #{m}"
else
"not matched"
end
# => "matched: 1"
while Loop
The while loop executes while a condition is true:
a = 0
while a < 10 do
p a
a += 1
end
p a
Prints the numbers 0 through 10. The condition a < 10 is checked before the loop is entered, then the body
executes, then the condition is checked again. When the condition results in false the loop is terminated.
The do keyword is optional. The following loop is equivalent to the loop above:
while a < 10
p a
a += 1
end
The result of a while loop is nil unless break is used to supply a value.
until Loop
The until loop executes while a condition is false:
a = 0
until a > 10 do
p a
a += 1
end
p a
This prints the numbers 0 through 11. Like a while loop the condition a > 10 is checked when entering the
loop and each time the loop body executes. If the condition is false the loop will continue to execute.
Like a while loop, the do is optional.
Like a while loop, the result of an until loop is nil unless break is used.
for Loop
The for loop consists of for followed by a variable to contain the iteration argument followed by in and the
value to iterate over using each. The do is optional:
Prints 1, 2 and 3.
Like while and until, the do is optional.
The for loop is similar to using each, but does not create a new variable scope.
The result value of a for loop is the value iterated over unless break is used.
The for loop is rarely used in modern ruby programs.
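A short sketch of these properties of the for loop, with illustrative variable names:

```ruby
collected = []

# The for expression evaluates to the object that was iterated over.
result = for value in [1, 2, 3]
  collected << value
end

# for does not create a new scope, so value is still visible here.
last_value = value

# A block parameter, by contrast, is not visible outside the block.
[1, 2, 3].each { |inner| inner }
inner_visible = defined?(inner)

p collected, result, last_value, inner_visible
```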
Modifier while and until
Like if and unless, while and until can be used as modifiers:
a = 0
a += 1 while a < 10
p a # prints 10
a = 0
a += 1 until a > 10
p a # prints 11
You can use begin and end to create a while loop that runs the body once before the condition:
a = 0
begin
a += 1
end while a < 10
p a # prints 10
If you don’t use rescue or ensure, Ruby optimizes away any exception handling overhead.
break Statement
Use break to leave a block early. This will stop iterating over the items in values if one of them is even:
values.each do |value|
break if value.even?
# ...
end
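You can also terminate a while loop using break:
a = 0
while true do
p a
a += 1
break if a < 10
end
p a
break accepts a value that supplies the result of the expression it is “breaking” out of:
result = [1, 2, 3].each do |value|
break value * 2 if value.even?
end
p result # prints 4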
next Statement
Use next to skip the rest of the current iteration:
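result = [1, 2, 3].map do |value|
next if value.even?
value * 2
end
p result # prints [2, nil, 6]
next accepts an argument that can be used as the result of the current block iteration:
result = [1, 2, 3].map do |value|
next value if value.even?
value * 2
end
p result # prints [2, 2, 6]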
redo Statement
Use redo to redo the current iteration:
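result = []
while result.length < 10 do
result << result.length
redo if result.last.even?
result << result.length + 1
end
p result # prints [0, 1, 3, 3, 5, 5, 7, 7, 9, 9, 11]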
Modifier Statements
Ruby’s grammar differentiates between statements and expressions. All expressions are statements (an
expression is a type of statement), but not all statements are expressions. Some parts of the grammar
accept expressions and not other types of statements, which causes code that looks similar to be parsed
differently.
For example, when not used as a modifier, if, unless, while, until, and begin are expressions (and also
statements). However, when used as a modifier, if, unless, while, until and rescue are statements but not
expressions.
If you put a space between the method name and opening parenthesis, you do not need two sets of
parentheses.
This is because this is parsed similar to a method call without parentheses. It is equivalent to the following
code, without the creation of a local variable:
x = (1 if true)
p x
In a modifier statement, the left-hand side must be a statement and the right-hand side must be an
expression.
So in a if b rescue c, because b rescue c is a statement that is not an expression, and therefore is not
allowed as the right-hand side of the if modifier statement, the code is necessarily parsed as (a if b) rescue
c.
This interacts with operator precedence in such a way that:
This is because modifier rescue has higher precedence than =, and modifier if has lower precedence
than =.
Flip-Flop
The flip-flop is a slightly special conditional expression. One of its typical uses is processing text from ruby
one-line programs used with ruby -n or ruby -p.
The form of the flip-flop is an expression that indicates when the flip-flop turns on, .. (or ...), then an
expression that indicates when the flip-flop will turn off. While the flip-flop is on it will continue to evaluate
to true, and false when off.
Here is an example:
selected = []
0.upto 10 do |value|
selected << value if value==2..value==8
end
In the above example, the ‘on’ condition is value==2. The flip-flop is initially ‘off’ (false) for 0 and 1, but becomes
‘on’ (true) for 2 and remains ‘on’ through 8. After 8 it turns off and remains ‘off’ for 9 and 10.
The flip-flop must be used inside a conditional such as !, ? :, not, if, while, unless, until, etc., including the
modifier forms.
When you use an inclusive range (..), the ‘off’ condition is evaluated when the ‘on’ condition changes:
selected = []
0.upto 5 do |value|
selected << value if value==2..value==2
end
Here, both sides of the flip-flop are evaluated so the flip-flop turns on and off only when value equals 2.
Since the flip-flop turned on in the iteration it returns true.
When you use an exclusive range (...), the ‘off’ condition is evaluated on the following iteration:
selected = []
0.upto 5 do |value|
selected << value if value==2...value==2
end
Here, the flip-flop turns on when value equals 2, but doesn’t turn off on the same iteration. The ‘off’ condition
isn’t evaluated until the following iteration and value will never be two again.
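The three flip-flops discussed above can be compared in one sketch. The array names are illustrative, and the parentheses around the conditions are only for readability:

```ruby
wide = []
0.upto(10) { |value| wide << value if (value == 2)..(value == 8) }

# Two dots: the 'off' condition is checked in the same iteration.
inclusive = []
0.upto(5) { |value| inclusive << value if (value == 2)..(value == 2) }

# Three dots: the 'off' condition is first checked on the next iteration.
exclusive = []
0.upto(5) { |value| exclusive << value if (value == 2)...(value == 2) }

p wide        # [2, 3, 4, 5, 6, 7, 8]
p inclusive   # [2]
p exclusive   # [2, 3, 4, 5]
```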
Exception Handling
Exceptions are rescued in a begin/end block:
begin
# code that might raise
rescue
# handle exception
end
If you are inside a method, you do not need to use begin or end unless you wish to limit the scope of
rescued exceptions:
def my_method
# ...
rescue
# ...
end
You can assign the exception to a local variable by using => variable_name at the end of the rescue line:
begin
# ...
rescue => exception
warn exception.message
raise # re-raise the current exception
end
By default, StandardError and its subclasses are rescued. You can rescue a specific set of exception
classes (and their subclasses) by listing them after rescue:
begin
# ...
rescue ArgumentError, NameError
# handle ArgumentError or NameError
end
begin
# ...
rescue ArgumentError
# handle ArgumentError
rescue NameError
# handle NameError
rescue
# handle any StandardError
end
The exception is matched to the rescue section starting at the top, and matches only once. If
an ArgumentError is raised in the begin section, it will not be handled in the StandardError section.
You may retry rescued exceptions:
begin
# ...
rescue
# do something that may change the result of the begin block
retry
end
Execution will resume at the start of the begin block, so be careful not to create an infinite loop.
Inside a rescue block is the only valid location for retry; all other uses will raise a SyntaxError. If you wish to
retry a block iteration use redo. See Control Expressions for details.
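One way to keep retry from looping forever is to count attempts. This is a minimal sketch in which the failure on the first two attempts is simulated:

```ruby
attempts = 0

result = begin
  attempts += 1
  raise "flaky" if attempts < 3    # simulated failure on attempts 1 and 2
  "succeeded after #{attempts} attempts"
rescue
  retry if attempts < 3            # bounded, so the loop always ends
  "gave up"
end

p result   # "succeeded after 3 attempts"
```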
To always run some code whether an exception was raised or not, use ensure:
begin
# ...
rescue
# ...
ensure
# this always runs
end
You may also run some code when an exception is not raised:
begin
# ...
rescue
# ...
else
# this runs only when no exception was raised
ensure
# ...
end
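The order in which the sections run can be recorded with a small sketch. The method name run and its should_raise parameter are hypothetical:

```ruby
def run(should_raise)
  order = []
  begin
    order << :begin
    raise ArgumentError, "boom" if should_raise
  rescue ArgumentError
    order << :rescue
  else
    order << :else       # only when nothing was raised
  ensure
    order << :ensure     # always
  end
  order
end

p run(false)   # [:begin, :else, :ensure]
p run(true)    # [:begin, :rescue, :ensure]
```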
Literals
Literals create objects you can use in your program. Literals include:
Boolean and Nil Literals
Number Literals
  Integer Literals
  Float Literals
  Rational Literals
  Complex Literals
String Literals
Here Document Literals
Symbol Literals
Array Literals
Hash Literals
Range Literals
Regexp Literals
Lambda Proc Literals
Percent Literals
  %q: Non-Interpolable String Literals
  % and %Q: Interpolable String Literals
  %w and %W: String-Array Literals
  %i and %I: Symbol-Array Literals
  %r: Regexp Literals
  %s: Symbol Literals
  %x: Backtick Literals
Number Literals
Integer Literals
You can write integers of any size as follows:
1234
1_234
These numbers have the same value, 1,234. The underscore may be used to enhance readability for
humans. You may place an underscore anywhere in the number.
You can use a special prefix to write numbers in decimal, hexadecimal, octal or binary formats. For decimal
numbers use a prefix of 0d, for hexadecimal numbers use a prefix of 0x, for octal numbers use a prefix
of 0 or 0o, for binary numbers use a prefix of 0b. The alphabetic component of the number is not case-
sensitive.
Examples:
0d170
0D170
0xaa
0xAa
0xAA
0Xaa
0XAa
0XaA
0252
0o252
0O252
0b10101010
0B10101010
All these numbers have the same decimal value, 170. Like integers and floats you may use an underscore
for readability.
Float Literals
Floating-point numbers may be written as follows:
12.34
1234e-2
1.234E1
These numbers have the same value, 12.34. You may use underscores in floating point numbers as well.
Rational Literals
You can write a Rational literal using a special suffix, 'r'.
Examples:
1r # => (1/1)
2/3r # => (2/3) # With denominator.
-1r # => (-1/1) # With signs.
-2/3r # => (-2/3)
2/-3r # => (-2/3)
-2/-3r # => (2/3)
+1/+3r # => (1/3)
1.2r # => (6/5) # With fractional part.
1_1/2_1r # => (11/21) # With embedded underscores.
2/4r # => (1/2) # Automatically reduced.
Syntax:
<digit> = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
Note this, which is parsed as Float numerator 1.2 divided by Rational denominator 3r, resulting in a Float:
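A sketch of the difference, with illustrative variable names. The exact Float value is not asserted here:

```ruby
# The r suffix applies only to 3, so this is Float 1.2 divided by Rational 3r.
quotient = 1.2/3r

# Putting the suffix on the numerator keeps the whole computation exact.
exact = 1.2r / 3       # (6/5) / 3

p quotient.class   # Float
p exact            # (2/5)
```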
Complex Literals
You can write a Complex number as follows (suffixed i):
1i #=> (0+1i)
1i * 1i #=> (-1+0i)
String Literals
The most common way of writing strings is using ":
"This is a string."
Double-quote strings allow escaped characters such as \n for newline, \t for tab, etc. Supported escape
sequences include:
\\ backslash, \
\nnn octal bit pattern, where nnn is 1-3 octal digits ([0-7])
\u{nnnn ...} Unicode character(s), where each nnnn is 1-6 hexadecimal digits ([0-9a-fA-F])
Any expression may be placed inside the interpolated section, but it’s best to keep the expression small for
readability.
You can also use #@foo, #@@foo and #$foo as a shorthand for,
respectively, #{ @foo }, #{ @@foo } and #{ $foo }.
Interpolation may be disabled by escaping the “#” character or using single-quote strings:
In addition to disabling interpolation, single-quoted strings also disable all escape sequences except for the
single-quote (\') and backslash (\\).
Adjacent string literals are automatically concatenated by the interpreter:
Any combination of adjacent single-quote, double-quote, percent strings will be concatenated as long as a
percent-string is not last.
There is also a character literal notation to represent single-character strings, whose syntax is a question
mark (?) followed by a single character or escape sequence that corresponds to a single codepoint in the
script encoding:
?a #=> "a"
?あ #=> "あ"
See also:
%q: Non-Interpolable String Literals
% and %Q: Interpolable String Literals
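Here Document Literals
If you are writing a large block of text you may use a “here document” or “heredoc”:
expected_result = <<HEREDOC
This would contain specially formatted text.
HEREDOC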
The heredoc starts on the line following <<HEREDOC and ends with the next line that starts
with HEREDOC. The result includes the ending newline.
You may use any identifier with a heredoc, but all-uppercase identifiers are typically used.
You may indent the ending identifier if you place a “-” after <<:
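expected_result = <<-INDENTED_HEREDOC
This would contain specially formatted text.
  INDENTED_HEREDOC
To have indented content as well as an indented closing identifier, you can use a “squiggly” heredoc, which
uses a “~” instead of a “-” after <<:
expected_result = <<~SQUIGGLY_HEREDOC
  This would contain specially formatted text.
SQUIGGLY_HEREDOC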
The indentation of the least-indented line will be removed from each line of the content. Note that empty
lines and lines consisting solely of literal tabs and spaces will be ignored for the purposes of determining
indentation, but escaped tabs and spaces are considered non-indentation characters.
For the purpose of measuring an indentation, a horizontal tab is regarded as a sequence of one to eight
spaces such that the column position corresponding to its end is a multiple of eight. The amount to be
removed is counted in terms of the number of spaces. If the boundary appears in the middle of a tab, that
tab is not removed.
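The indentation rule for squiggly heredocs can be seen in a short sketch. The identifiers and text are illustrative:

```ruby
# The least-indented line has two spaces, so two spaces are removed per line.
squiggly = <<~SQUIGGLY
    four spaces deep
  two spaces deep
SQUIGGLY

# With <<- only the closing identifier may be indented; content is kept as is.
dash = <<-DASH
  kept as written
  DASH

p squiggly   # "  four spaces deep\ntwo spaces deep\n"
p dash
```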
A heredoc allows interpolation and escaped characters. You may disable interpolation and escaping by
surrounding the opening identifier with single quotes:
expected_result = <<-'EXPECTED'
One plus one is #{1 + 1}
EXPECTED
The identifier may also be surrounded with double quotes (which is the same as no quotes) or with
backticks. When surrounded by backticks the HEREDOC behaves like Kernel#`:
puts <<-`HEREDOC`
cat #{__FILE__}
HEREDOC
When surrounding with quotes, any character but that quote and newline (CR and/or LF) can be used as the
identifier.
To call a method on a heredoc place it after the opening identifier:
expected_result = <<-EXPECTED.chomp
One plus one is #{1 + 1}
EXPECTED
You may open multiple heredocs on the same line, but this can be difficult to read:
puts(<<-ONE, <<-TWO)
content for heredoc one
ONE
content for heredoc two
TWO
Symbol Literals
A Symbol represents a name inside the ruby interpreter. See Symbol for more details on what symbols are
and when ruby creates them internally.
You may reference a symbol using a colon: :my_symbol.
You may also create symbols by interpolation:
:"my_symbol1"
:"my_symbol#{1 + 1}"
When creating a Hash, there is a special syntax for referencing a Symbol as well.
See also:
%s: Symbol Literals
Array Literals
An array is created using the objects between [ and ]:
[1, 2, 3]
[1, 1 + 1, 1 + 2]
[1, [1 + 1, [1 + 2]]]
See also:
%w and %W: String-Array Literals
%i and %I: Symbol-Array Literals
See Array for the methods you may use with an array.
Hash Literals
A hash is created using key-value pairs between { and }:
{ "a" => 1, "b" => 2 }
You can create a hash using symbol keys with the following syntax:
{ a: 1, b: 2 }
is equal to
{ :a => 1, :b => 2 }
Hash values can be omitted, meaning that the value will be fetched from the context by the name of the key:
x = 100
y = 200
h = { x:, y: }
#=> {:x=>100, :y=>200}
See Hash for the methods you may use with a hash.
Range Literals
A range represents an interval of values. The range may include or exclude its ending value.
You may create a range of any object. See the Range documentation for details on the methods you need
to implement.
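A brief sketch of including and excluding the ending value, using .. and ... with illustrative values:

```ruby
inclusive = (1..3).to_a      # .. includes the ending value
exclusive = (1...3).to_a     # ... excludes the ending value
letters = ('a'..'c').to_a    # ranges are not limited to numbers

p inclusive   # [1, 2, 3]
p exclusive   # [1, 2]
p letters     # ["a", "b", "c"]
```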
Regexp Literals
A regular expression may be created using leading and trailing slash ('/') characters:
/my regular expression/
Lambda Proc Literals
A lambda proc can be created with ->:
-> { 1 + 1 }
->(v) { 1 + v }
Percent Literals
Each of the literals described in this section may use these paired delimiters:
[ and ].
( and ).
{ and }.
< and >.
Any other character, as both beginning and ending delimiters.
These are demonstrated in the next section.
%q: Non-Interpolable String Literals
You can write a non-interpolable string with %q. The created string is the same as if you created it with
single quotes:
%q[foo bar baz] # => "foo bar baz"
%r: Regexp Literals
You can write a regular expression with %r. The trailing delimiter may be followed by one or more flag
characters that modify the behavior. See Regexp options for details.
%x: Backtick Literals
You can write and execute a shell command with %x:
%x(echo 1) # => "1\n"
Methods
Methods implement the functionality of your program. Here is a simple method definition:
def one_plus_one
1+1
end
A method definition consists of the def keyword, a method name, the body of the method, the return value
and the end keyword. When called, the method will execute the body of the method. This method returns 2.
Since Ruby 3.0, there is also a shorthand syntax for methods consisting of exactly one expression:
def one_plus_one = 1 + 1
This section only covers defining methods. See also the syntax documentation on calling methods.
Method Names
Method names may be one of the operators or must start with a letter or a character with the eighth bit set.
They may contain letters, numbers, an _ (underscore or low line) or a character with the eighth bit set. The
convention is to use underscores to separate words in a multiword method name:
def method_name
puts "use underscores to separate words"
end
Ruby programs must be written in a US-ASCII-compatible character set such as UTF-8, ISO-8859-1 etc. In
such character sets if the eighth bit is set it indicates an extended character. Ruby allows method names
and other identifiers to contain such characters. Ruby programs cannot contain some characters like ASCII
NUL (\x00).
The following are examples of valid Ruby methods:
def hello
"hello"
end
def こんにちは
puts "means hello in Japanese"
end
Typically method names are US-ASCII compatible since the keys to type them exist on all keyboards.
Method names may end with a ! (bang or exclamation mark), a ? (question mark), or = (equals sign).
The bang methods (! at the end of the method name) are called and executed just like any other method.
However, by convention, a method with an exclamation point or bang is considered dangerous. In Ruby’s
core library, a trailing bang usually indicates that the method, unlike its non-bang equivalent, permanently
modifies its receiver. Almost every bang method in the core library has a non-bang counterpart (a method
whose name does not end with !) that does not modify the receiver. This convention typically holds for the
Ruby core library but may or may not hold for other Ruby libraries.
Methods that end with a question mark by convention return boolean, but they may not always return
just true or false. Often, they will return an object to indicate a true value (or “truthy” value).
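A small sketch of this convention, using two predicate methods from the core Integer class: zero? returns true or false, while nonzero? returns the receiver itself or nil. The variable names are only for illustration.

```ruby
zero_check   = 0.zero?      # true
nonzero_five = 5.nonzero?   # 5 itself, which is truthy but is not `true`
nonzero_zero = 0.nonzero?   # nil, which is falsy

# Either kind of result works in a condition.
label = 5.nonzero? ? "non-zero" : "zero"

p zero_check     # prints true
p nonzero_five   # prints 5
p nonzero_zero   # prints nil
p label          # prints "non-zero"
```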
Methods that end with an equals sign indicate an assignment method.
class C
def attr
@attr
end
def attr=(val)
@attr = val
end
end
c = C.new
c.attr #=> nil
c.attr = 10 # calls "attr=(10)"
c.attr #=> 10
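To define the unary minus and plus methods, follow the operator with an @, as in -@ and +@:
class C
  def -@
    puts "you inverted this object"
  end
end
obj = C.new
-obj # prints "you inverted this object"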
The @ is needed to differentiate unary minus and plus operators from binary minus and plus operators.
You can also follow tilde and not (!) unary methods with @, but it is not required as there are no binary tilde
and not operators.
Unary methods accept zero arguments.
Additionally, methods for element reference and assignment may be defined: [] and []= respectively. Both
can take one or more arguments, and element reference can take none.
class C
  def [](a, b)
    puts a + b
  end
  def []=(a, b, c)
    puts a * b + c
  end
end
obj = C.new
obj[2, 3] # prints "5"
obj[2, 3] = 4 # prints "10"
Return Values
By default, a method returns the last expression that was evaluated in the body of the method. In the
example above, the last (and only) expression evaluated was the simple sum 1 + 1. The return keyword can
be used to make it explicit that a method returns a value.
def one_plus_one
return 1 + 1
end
It can also be used to make a method return before the last expression is evaluated.
def two_plus_two
return 2 + 2
1 + 1 # this expression is never evaluated
end
Note that for assignment methods the return value will be ignored when using the assignment syntax.
Instead, the argument will be returned:
def a=(value)
return 1 + value
end
p(self.a = 5) # prints 5
The actual return value will be returned when invoking the method directly:
p send(:a=, 5) # prints 6
Scope
The standard syntax to define a method:
def my_method
# ...
end
adds the method to a class. You can define an instance method on a specific class with the class keyword:
class C
def my_method
# ...
end
end
A method may be defined on another object. You may define a “class method” (a method that is defined on
the class, not an instance of the class) like this:
class C
def self.my_method
# ...
end
end
However, this is simply a special case of a greater syntactical power in Ruby, the ability to add methods to
any object. Classes are objects, so adding class methods is simply adding methods to the Class object.
The syntax for adding a method to an object is as follows:
greeting = "Hello"
def greeting.broaden
self + ", world!"
end
def String.hello
"Hello, world!"
end
A method defined like this is called a “singleton method”. broaden will only exist on the string
instance greeting. Other strings will not have broaden.
Overriding
When Ruby encounters the def keyword, it doesn’t consider it an error if the method already exists: it simply
redefines it. This is called overriding. Rather like extending core classes, this is a potentially dangerous
ability, and should be used sparingly because it can cause unexpected results. For example, consider this
irb session:
>> "43".to_i
=> 43
>> class String
>>   def to_i
>>     42
>>   end
>> end
=> :to_i
>> "43".to_i
=> 42
This will effectively sabotage any code which makes use of the method String#to_i to parse numbers from
strings.
Arguments
A method may accept arguments. The argument list follows the method name:
def add_one(value)
value + 1
end
When called, the user of the add_one method must provide an argument. The argument is a local variable
in the method body. The method will then add one to this argument and return the value. If given 1 this
method will return 2.
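The parentheses around the arguments are optional:
def add_one value
  value + 1
end
The parentheses are mandatory in shorthand method definitions:
# OK
def add_one(value) = value + 1
# SyntaxError
def add_one value = value + 1
Multiple arguments are separated by a comma:
def add_values(a, b)
  a + b
end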
When called, the arguments must be provided in the exact order. In other words, the arguments are
positional.
Default Values
Arguments may have default values:
def add_values(a, b = 1)
a+b
end
The default value does not need to appear first, but arguments with defaults must be grouped together. This
is ok:
def add_values(a = 1, b = 2, c)
  a + b + c
end
This will raise a SyntaxError:
def add_values(a = 1, b, c = 1)
  a + b + c
end
Default argument values can refer to arguments that have already been evaluated as local variables, and
argument values are always evaluated left to right. So this is allowed:
def add_values(a = 1, b = a)
a+b
end
add_values
# => 2
But this will raise a NameError (unless there is a method named b defined):
def add_values(a = b, b = 1)
a+b
end
add_values
# NameError (undefined local variable or method `b' for main:Object)
Array Decomposition
You can decompose (unpack or extract values from) an Array using extra parentheses in the arguments:
def my_method((a, b))
  p a: a, b: b
end
my_method([1, 2])
This prints:
{:a=>1, :b=>2}
If the argument has extra elements in the Array they will be ignored:
my_method([1, 2, 3])
This has the same output as above.
You can use a * to collect the remaining elements. This splits an Array into a first element and the rest:
def my_method((a, *b))
  p a: a, b: b
end
my_method([1, 2, 3])
This prints:
{:a=>1, :b=>[2, 3]}
The argument will be decomposed if it responds to to_ary. You should only define to_ary if you can use your
object in place of an Array.
Use of the inner parentheses only uses one of the sent arguments. If the argument is not an Array it will be
assigned to the first argument in the decomposition and the remaining arguments in the decomposition will
be nil:
def my_method(a, (b, c), d)
  p a: a, b: b, c: c, d: d
end
my_method(1, 2, 3)
This prints:
{:a=>1, :b=>2, :c=>nil, :d=>3}
Array/Hash Argument
Prefixing an argument with * causes any remaining arguments to be converted to an Array:
def gather_arguments(*arguments)
  p arguments
end
gather_arguments 1, 2, 3 # prints [1, 2, 3]
The array argument will capture a Hash as the last entry if keywords were provided by the caller after all
positional arguments.
gather_arguments 1, a: 2 # prints [1, {:a=>2}]
However, this only occurs if the method does not declare any keyword arguments.
def gather_arguments_keyword(*positional, keyword: nil)
  p positional: positional, keyword: keyword
end
gather_arguments_keyword 1, 2, three: 3
#=> raises: unknown keyword: :three (ArgumentError)
Also, note that a bare * can be used to ignore arguments:
def ignore_arguments(*)
end
You can also use a bare * when calling a method to pass the arguments directly to another method:
def delegate_arguments(*)
other_method(*)
end
Keyword Arguments
Keyword arguments are similar to positional arguments with default values, except that the caller passes
each one by name.
When calling a method with keyword arguments the arguments may appear in any order. If an unknown
keyword argument is sent by the caller, and the method does not accept arbitrary keyword arguments,
an ArgumentError is raised.
To require a specific keyword argument, do not include a default value for the keyword argument.
When mixing keyword arguments and positional arguments, all positional arguments must appear before
any keyword arguments.
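The rules above can be sketched as follows. The three methods are hypothetical and exist only for illustration: one with keyword defaults, one with required keywords, and one that gathers arbitrary keywords with **.

```ruby
# Keyword arguments with default values.
def add_values(first: 1, second: 2)
  first + second
end

# No defaults, so both keywords are required.
def require_both(first:, second:)
  first + second
end

# **rest collects any keywords that are not named explicitly.
def gather(first: nil, **rest)
  [first, rest]
end

sum_default = add_values                        # uses both defaults
sum_reorder = add_values(second: 5, first: 3)   # order does not matter
gathered    = gather(first: 1, x: 2, y: 3)

# Omitting a required keyword raises ArgumentError.
missing = begin
  require_both(first: 1)
rescue ArgumentError => e
  e.message
end

p sum_default   # prints 3
p sum_reorder   # prints 8
```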
Also, note that ** can be used to ignore keyword arguments:
def ignore_keywords(**)
end
You can also use ** when calling a method to delegate keyword arguments to another method:
def delegate_keywords(**)
other_method(**)
end
To mark a method as accepting keywords, but not actually accepting keywords, you can use **nil:
def no_keywords(**nil)
end
Calling such a method with keywords or a non-empty keyword splat will result in an ArgumentError. This
syntax is supported so that keywords can be added to the method later without affecting backwards
compatibility.
If a method definition does not accept any keywords, and the **nil syntax is not used, any keywords
provided when calling the method will be converted to a Hash positional argument:
def meth(arg)
arg
end
meth(a: 1)
# => {:a=>1}
Block Argument
The block argument is indicated by & and must come last:
def my_method(&my_block)
my_block.call(self)
end
Most frequently the block argument is used to pass a block to another method:
def each_item(&block)
@items.each(&block)
end
You are not required to give a name to the block if you will just be passing it to another method:
def each_item(&)
@items.each(&)
end
If you are only going to call the block and will not otherwise manipulate it or send it to another method,
using yield without an explicit block parameter is preferred. This method is equivalent to the first method in
this section:
def my_method
yield self
end
Argument Forwarding
Since Ruby 2.7, an all-arguments forwarding syntax is available:
def concrete_method(*positional_args, **keyword_args, &block)
[positional_args, keyword_args, block]
end
def forwarding_method(...)
concrete_method(...)
end
forwarding_method(1, b: 2) { puts 3 }
#=> [[1], {:b=>2}, #<Proc:...skip...>]
Calling with forwarding ... is available only in methods defined with ....
def regular_method(arg, **kwarg)
  concrete_method(...) # SyntaxError
end
Since Ruby 3.0, there can be leading arguments before ... both in definitions and in invocations (but in
definitions they can be only positional arguments without default values).
def get(...)
request(:GET, ...) # leading argument in invoking
end
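A runnable sketch of leading arguments on both sides, assuming Ruby 3.0 or later. The request helper, the logged_get method and the "/docs" path are hypothetical; request only returns its arguments so the forwarding is visible.

```ruby
# Hypothetical helper: returns what it received instead of doing any I/O.
def request(method, path, **headers)
  [method, path, headers]
end

def get(...)
  request(:GET, ...)            # leading argument in the invocation
end

def logged_get(message, ...)    # leading argument in the definition
  puts message
  get(...)                      # forwards everything except message
end

result = logged_get("fetching", "/docs", accept: "text/html")
# result holds :GET, "/docs" and the accept: keyword as a hash
```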
Note that omitting parentheses in forwarding calls may lead to unexpected results:
def log(...)
puts ... # This would be treated as `puts()...',
# i.e. endless range from puts result
end
log("test")
# Prints: warning: ... at EOL, should be parenthesized?
# ...and then empty line
Exception Handling
Methods have an implied exception handling block so you do not need to use begin or end to handle
exceptions. This:
def my_method
begin
# code that may raise an exception
rescue
# handle exception
end
end
Can be written as:
def my_method
# code that may raise an exception
rescue
# handle exception
end
Similarly, if you wish to always run code even if an exception is raised, you can
use ensure without begin and end:
def my_method
# code that may raise an exception
ensure
# code that runs even if previous code raised an exception
end
You can also combine rescue with ensure and/or else, without begin and end:
def my_method
# code that may raise an exception
rescue
# handle exception
else
# only run if no exception raised above
ensure
# code that runs even if previous code raised an exception
end
If you wish to rescue an exception for only part of your method, use begin and end. For more details see the
page on exception handling.
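As a minimal sketch of rescuing only part of a method, the hypothetical parse_all helper below wraps a single conversion in begin and end, so one bad element does not abort the whole method:

```ruby
# Hypothetical helper: converts each string to an Integer, or nil if it is not numeric.
def parse_all(strings)
  strings.map do |s|
    begin
      Integer(s)            # raises ArgumentError for non-numeric strings
    rescue ArgumentError
      nil                   # only this one conversion is rescued
    end
  end
end

p parse_all(["1", "x", "3"])   # prints [1, nil, 3]
```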
Miscellaneous Syntax
Ending an Expression
Ruby uses a newline as the end of an expression. When ending a line with an operator, open parentheses,
comma, etc. the expression will continue.
You can end an expression with a ; (semicolon). Semicolons are most frequently used with ruby -e.
Indentation
Ruby does not require any indentation. Typically, ruby programs are indented two spaces.
If you run ruby with warnings enabled and have an indentation mismatch, you will receive a warning.
alias
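The alias keyword is most frequently used to alias methods. When aliasing a method, you can use either its
name or a symbol:
alias new_name old_name
alias :new_name :old_name
You can also use alias to alias global variables:
$old = 0
alias $new $old
p $new # prints 0
undef
The undef keyword prevents the current class from responding to calls to the named methods:
undef my_method
defined?
defined? is a keyword that returns a string describing its argument, or nil if the argument is not defined.
You don’t need to use parentheses with defined?, but they are recommended due to the low
precedence of defined?.
For example, if you wish to check if an instance variable exists and that the instance variable is zero:
defined? @instance_variable && @instance_variable.zero?
This returns "expression", which is not what you want if the instance variable is not defined.
@instance_variable = 1
defined?(@instance_variable) && @instance_variable.zero?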
Adding parentheses when checking if the instance variable is defined is a better check. This correctly
returns nil when the instance variable is not defined and false when the instance variable is not zero.
Using the specific reflection methods such as instance_variable_defined? for instance variables or
const_defined? for constants is less error prone than using defined?.
defined? handles some regexp global variables specially based on whether there is an active regexp match
and how many capture groups there are:
/b/ =~ 'a'
defined?($~) # => "global-variable"
defined?($&) # => nil
defined?($`) # => nil
defined?($') # => nil
defined?($+) # => nil
defined?($1) # => nil
defined?($2) # => nil
/./ =~ 'a'
defined?($~) # => "global-variable"
defined?($&) # => "global-variable"
defined?($`) # => "global-variable"
defined?($') # => "global-variable"
defined?($+) # => nil
defined?($1) # => nil
defined?($2) # => nil
/(.)/ =~ 'a'
defined?($~) # => "global-variable"
defined?($&) # => "global-variable"
defined?($`) # => "global-variable"
defined?($') # => "global-variable"
defined?($+) # => "global-variable"
defined?($1) # => "global-variable"
defined?($2) # => nil
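BEGIN and END
BEGIN defines a block that is run before any other code in the current file. It is typically used in one-liners
with ruby -e. Similarly, END defines a block that is run after any other code.
BEGIN {
  count = 0
}
You must use { and }; you may not use do and end.
Here is an example one-liner that adds numbers from standard input or any files in the argument list:
ruby -ne 'BEGIN { count = 0 }; END { puts count }; count += $_.to_i'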
Modules
Modules serve two purposes in Ruby, namespacing and mix-in functionality.
A namespace can be used to organize code by package or functionality that separates common names
from interference by other packages. For example, the IRB namespace provides functionality for irb that
prevents a collision for the common name “Context”.
Mix-in functionality allows sharing common methods across multiple classes or modules. Ruby comes with
the Enumerable mix-in module which provides many enumeration methods based on the each method
and Comparable allows comparison of objects based on the <=> comparison method.
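A short sketch of the Comparable mix-in. The Version class is hypothetical; it defines only <=> and gets <, between?, and the other comparison methods from the module.

```ruby
# Hypothetical class used only to illustrate mixing in Comparable.
class Version
  include Comparable
  attr_reader :major, :minor

  def initialize(major, minor)
    @major, @minor = major, minor
  end

  # Comparable builds <, <=, ==, >, >=, between? and clamp on top of <=>.
  def <=>(other)
    [major, minor] <=> [other.major, other.minor]
  end
end

older = Version.new(1, 2)
newer = Version.new(1, 10)

p older < newer                                # prints true
p older.between?(Version.new(1, 0), newer)     # prints true
p [newer, older].min.minor                     # prints 2
```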
Note that there are many similarities between modules and classes. Besides the ability to mix-in a module,
the description of modules below also applies to classes.
Module Definition
A module is created using the module keyword:
module MyModule
# ...
end
A module may be reopened any number of times to add, change or remove functionality:
module MyModule
def my_method
end
end
module MyModule
alias my_alias my_method
end
module MyModule
remove_method :my_method
end
Reopening classes is a very powerful feature of Ruby, but it is best to only reopen classes you own.
Reopening classes you do not own may lead to naming conflicts or difficult to diagnose bugs.
Nesting
Modules may be nested:
module Outer
module Inner
end
end
Many packages create a single outermost module (or class) to provide a namespace for their functionality.
You may also define inner modules using :: provided the outer modules (or classes) are already defined:
module Outer::Inner::GrandChild
end
Note that this will raise a NameError if Outer and Outer::Inner are not already defined.
This style has the benefit of allowing the author to reduce the amount of indentation. Instead of 3 levels of
indentation only one is necessary. However, the scope of constant lookup is different for creating a
namespace using this syntax instead of the more verbose syntax.
Scope
self
self refers to the object that defines the current scope. self will change when entering a different method or
when defining a new module.
Constants
Accessible constants are different depending on the module nesting (which syntax was used to define the
module). In the following example the constant A::Z is accessible from B as A is part of the nesting:
module A
Z=1
module B
p Module.nesting #=> [A::B, A]
p Z #=> 1
end
end
However, if you use :: to define A::B without nesting it inside A, a NameError exception will be raised
because the nesting does not include A:
module A
Z=1
end
module A::B
p Module.nesting #=> [A::B]
p Z #=> raises NameError
end
If a constant is defined at the top-level you may precede it with :: to reference it:
Z=0
module A
Z=1
module B
p ::Z #=> 0
end
end
Methods
For method definition documentation see the syntax documentation for methods.
Class methods may be called directly. (This is slightly confusing, but a method on a module is often called a
“class method” instead of a “module method”. See also Module#module_function which can convert an
instance method into a class method.)
When a class method references a constant, it uses the same rules as referencing it outside the method as
the scope is the same.
Instance methods defined in a module are only callable when included. These methods have access to the
constants defined when they were included through the ancestors list:
module A
Z=1
def z
Z
end
end
include A
z #=> 1
Visibility
Ruby has three types of visibility. The default is public. A public method may be called from any other
object.
The second visibility is protected. When calling a protected method the sender must inherit
the Class or Module which defines the method. Otherwise a NoMethodError will be raised.
Protected visibility is most frequently used to define == and other comparison methods where the author
does not wish to expose an object’s state to any caller and would like to restrict it only to inherited classes.
Here is an example:
class A
def n(other)
other.m
end
end
class B < A
def m
1
end
protected :m
end
class C < B
end
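a = A.new
b = B.new
c = C.new
c.n b #=> 1 -- C is a subclass of B
b.n b #=> 1 -- m called on the defining class
a.n b # raises NoMethodError -- A is not a subclass of B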
The third visibility is private. A private method may only be called from inside the owner class without a
receiver, or with a literal self as a receiver. If a private method is called with a receiver other than a
literal self, a NoMethodError will be raised.
class A
def without
m
end
def with_self
self.m
end
def with_other
A.new.m
end
def with_renamed
copy = self
copy.m
end
def m
1
end
private :m
end
a = A.new
a.without #=> 1
a.with_self #=> 1
a.with_other # NoMethodError (private method `m' called for #<A:0x0000559c287f27d0>)
a.with_renamed # NoMethodError (private method `m' called for #<A:0x0000559c285f8330>)
Classes
Every class is also a module, but unlike modules a class may not be mixed-in to another module (or class).
Like a module, a class can be used as a namespace. A class also inherits methods and constants from its
superclass.
Defining a class
Use the class keyword to create a class:
class MyClass
# ...
end
If you do not supply a superclass your new class will inherit from Object. You may inherit from a different
class using < followed by a class name:
class MySubclass < String
  # ...
end
There is a special class BasicObject which is designed as a blank class and includes a minimum of built-in
methods. You can use BasicObject to create an independent inheritance structure. See
the BasicObject documentation for further details.
Inheritance
Any method defined on a class is callable from its subclass:
class A
Z=1
def z
Z
end
end
class B < A
end
p B.new.z #=> 1
The same is true for constants:
class A
Z=1
end
class B < A
def z
Z
end
end
p B.new.z #=> 1
You can override the functionality of a superclass method by redefining the method:
class A
def m
1
end
end
class B < A
def m
2
end
end
p B.new.m #=> 2
If you wish to invoke the superclass functionality from a method use super:
class A
def m
1
end
end
class B < A
def m
2 + super
end
end
p B.new.m #=> 3
When used without any arguments super uses the arguments given to the subclass method. To send no
arguments to the superclass method use super(). To send specific arguments to the superclass method
provide them manually like super(2).
super may be called as many times as you like in the subclass method.
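The three forms of super can be compared in one small sketch. Base and Child are hypothetical classes, and the default argument values are chosen only to make the difference visible.

```ruby
class Base
  def greet(name = "nobody")
    "hello #{name}"
  end
end

class Child < Base
  def greet(name = "child")
    [
      super,            # forwards the arguments given to this method
      super(),          # sends no arguments, so Base falls back to its default
      super("other")    # sends exactly the arguments listed
    ]
  end
end

p Child.new.greet("x")
# prints ["hello x", "hello nobody", "hello other"]
```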
Singleton Classes
The singleton class (also known as the metaclass or eigenclass) of an object is a class that holds methods
for only that instance. You can access the singleton class of an object using class << object like this:
class C
end
class << C
# self is the singleton class here
end
Most frequently you’ll see the singleton class accessed like this:
class C
class << self
# ...
end
end
This allows definition of methods and attributes on a class (or module) without needing to write def
self.my_method.
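For instance, a sketch with a hypothetical Config class: inside class << self, both def and attr_accessor apply to the class itself rather than to its instances.

```ruby
# Hypothetical class used only for illustration.
class Config
  class << self
    attr_accessor :environment    # defines Config.environment and Config.environment=

    def describe                  # same effect as def self.describe
      "running in #{environment}"
    end
  end
end

Config.environment = "test"
p Config.describe                                  # prints "running in test"
p Config.singleton_methods.include?(:describe)     # prints true
```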
Since you can open the singleton class of any object this means that this code block:
o = Object.new
def o.my_method
1+1
end
is equivalent to this code block:
o = Object.new
class << o
def my_method
1+1
end
end
Pattern matching
Pattern matching is a feature allowing deep matching of structured values: checking the structure and
binding the matched parts to local variables.
Pattern matching in Ruby is implemented with the case/in expression:
case <expression>
in <pattern1>
...
in <pattern2>
...
in <pattern3>
...
else
...
end
(Note that in and when branches can NOT be mixed in one case expression.)
Or with the => operator and the in operator, which can be used in a standalone expression:
<expression> => <pattern>
<expression> in <pattern>
The case/in expression is exhaustive: if the value of the expression does not match any branch of
the case expression (and the else branch is absent), NoMatchingPatternError is raised.
Therefore, the case expression might be used for conditional matching and unpacking:
config = {db: {user: 'admin', pass: 'abc123'}}
case config
in db: {user:} # matches subhash and puts matched value in variable user
puts "Connect with user '#{user}'"
in connection: {username: }
puts "Connect with user '#{username}'"
else
puts "Unrecognized structure of config"
end
# Prints: "Connect with user 'admin'"
whilst the => operator is most useful when the expected data structure is known beforehand, to just unpack
parts of it:
config => {db: {user:}} # will raise if the config's structure is unexpected
<expression> in <pattern> is the same as case <expression>; in <pattern>; true; else false; end. You can
use it when you only want to know if a pattern has been matched or not:
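A minimal sketch with made-up user data: because in evaluates to true or false instead of raising, it fits naturally inside select, any?, and similar methods. The parentheses around the last expression keep the assignment from binding more tightly than in.

```ruby
# Hypothetical data for illustration.
users = [
  {name: "John", role: "user"},
  {name: "Barbara", role: "admin"}
]

admins      = users.select { |user| user in {role: "admin"} }
has_b_admin = users.any? { |user| user in {name: /B/, role: "admin"} }
no_match    = ({name: "John"} in {role: String})   # false, and no error is raised

p admins.map { |user| user[:name] }   # prints ["Barbara"]
p has_b_admin                         # prints true
p no_match                            # prints false
```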
Patterns
Patterns can be:
any Ruby object (matched by the === operator, like in when); (Value pattern)
array pattern: [<subpattern>, <subpattern>, <subpattern>, ...]; (Array pattern)
find pattern: [*variable, <subpattern>, <subpattern>, <subpattern>, ..., *variable]; (Find pattern)
hash pattern: {key: <subpattern>, key: <subpattern>, ...}; (Hash pattern)
combination of patterns with |; (Alternative pattern)
variable capture: <pattern> => variable or variable; (As pattern, Variable pattern)
Any pattern can be nested inside array/find/hash patterns where <subpattern> is specified.
Array patterns and find patterns match arrays, or objects that respond to deconstruct (see below about the
latter). Hash patterns match hashes, or objects that respond to deconstruct_keys (see below about the
latter). Note that only symbol keys are supported for hash patterns.
An important difference between array and hash pattern behavior is that arrays match only a whole array:
case [1, 2, 3]
in [Integer, Integer]
"matched"
else
"not matched"
end
#=> "not matched"
while the hash matches even if there are other keys besides the specified part:
case {a: 1, b: 2, c: 3}
in {a: Integer}
"matched"
else
"not matched"
end
#=> "matched"
{} is the only exclusion from this rule. It matches only if an empty hash is given:
case {a: 1, b: 2, c: 3}
in {}
"matched"
else
"not matched"
end
#=> "not matched"
case {}
in {}
"matched"
else
"not matched"
end
#=> "matched"
There is also a way to specify there should be no other keys in the matched hash except those explicitly
specified by the pattern, with **nil:
case {a: 1, b: 2}
in {a: Integer, **nil} # this will not match the pattern having keys other than a:
"matched a part"
in {a: Integer, b: Integer, **nil}
"matched a whole"
else
"not matched"
end
#=> "matched a whole"
Both array and hash patterns support “rest” specification:
case [1, 2, 3]
in [Integer, *]
"matched"
else
"not matched"
end
#=> "matched"
case {a: 1, b: 2, c: 3}
in {a: Integer, **}
"matched"
else
"not matched"
end
#=> "matched"
Parentheses around both kinds of patterns could be omitted:
case [1, 2]
in Integer, Integer
"matched"
else
"not matched"
end
#=> "matched"
case {a: 1, b: 2, c: 3}
in a: Integer
"matched"
else
"not matched"
end
#=> "matched"
[1, 2] => a, b
[1, 2] in a, b
{a: 1, b: 2, c: 3} => a:
{a: 1, b: 2, c: 3} in a:
Find pattern is similar to array pattern but it can be used to check if the given object has any elements that
match the pattern:
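A sketch of the find pattern on an arbitrary sample array. The leading and trailing * let the two consecutive String subpatterns match anywhere in the array, and naming the splats captures what surrounds the match. On Ruby 3.0 and 3.1 this may print an "experimental" warning.

```ruby
sample = ["a", 1, "b", "c", 2]

# Is there any place where two Strings appear next to each other?
result =
  case sample
  in [*, String, String, *]
    "matched"
  else
    "not matched"
  end

# The same search, binding the matched elements and their surroundings.
case sample
in [*pre, String => x, String => y, *post]
  found = {pre: pre, x: x, y: y, post: post}
end

p result        # prints "matched"
p found[:pre]   # prints ["a", 1]
p found[:x]     # prints "b"
p found[:y]     # prints "c"
p found[:post]  # prints [2]
```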
Variable binding
Besides deep structural checks, one of the very important features of the pattern matching is the binding of
the matched parts to local variables. The basic form of binding is just specifying => variable_name after the
matched (sub)pattern (one might find this similar to storing exceptions in local variables in a rescue
ExceptionClass => var clause):
case [1, 2]
in Integer => a, Integer
"matched: #{a}"
else
"not matched"
end
#=> "matched: 1"
case {a: 1, b: 2, c: 3}
in a: Integer => m
"matched: #{m}"
else
"not matched"
end
#=> "matched: 1"
If no additional check is required, for only binding some part of the data to a variable, a simpler form could
be used:
case [1, 2]
in a, Integer
"matched: #{a}"
else
"not matched"
end
#=> "matched: 1"
case {a: 1, b: 2, c: 3}
in a: m
"matched: #{m}"
else
"not matched"
end
#=> "matched: 1"
For hash patterns, even a simpler form exists: key-only specification (without any sub-pattern) binds the
local variable with the key’s name, too:
case {a: 1, b: 2, c: 3}
in a:
"matched: #{a}"
else
"not matched"
end
#=> "matched: 1"
The “rest” part of a pattern also can be bound to a variable:
case [1, 2, 3]
in a, *rest
"matched: #{a}, #{rest}"
else
"not matched"
end
#=> "matched: 1, [2, 3]"
case {a: 1, b: 2, c: 3}
in a:, **rest
"matched: #{a}, #{rest}"
else
"not matched"
end
#=> "matched: 1, {:b=>2, :c=>3}"
Binding to variables currently does NOT work for alternative patterns joined with |:
case {a: 1, b: 2}
in {a: } | Array
"matched: #{a}"
else
"not matched"
end
# SyntaxError (illegal variable in alternative pattern (a))
Variables that start with _ are the only exclusions from this rule:
case {a: 1, b: 2}
in {a: _, b: _foo} | Array
"matched: #{_}, #{_foo}"
else
"not matched"
end
# => "matched: 1, 2"
It is, though, not advised to reuse the bound value, as this pattern’s goal is to signify a discarded value.
Variable pinning
Due to the variable binding feature, an existing local variable cannot be used straightforwardly as a
sub-pattern:
expectation = 18
case [1, 2]
in expectation, *rest
"matched. expectation was: #{expectation}"
else
"not matched. expectation was: #{expectation}"
end
# expected: "not matched. expectation was: 18"
# real: "matched. expectation was: 1" -- local variable just rewritten
For this case, the pin operator ^ can be used, to tell Ruby “just use this value as part of the pattern”:
expectation = 18
case [1, 2]
in ^expectation, *rest
"matched. expectation was: #{expectation}"
else
"not matched. expectation was: #{expectation}"
end
#=> "not matched. expectation was: 18"
One important usage of variable pinning is specifying that the same value should occur in the pattern
several times:
jane = {school: 'high', schools: [{id: 1, level: 'middle'}, {id: 2, level: 'high'}]}
john = {school: 'high', schools: [{id: 1, level: 'middle'}]}
case jane
in school:, schools: [*, {id:, level: ^school}] # select the last school, level should match
"matched. school: #{id}"
else
"not matched"
end
#=> "matched. school: 2"
case john # the specified school level is "high", but last school does not match
in school:, schools: [*, {id:, level: ^school}]
"matched. school: #{id}"
else
"not matched"
end
#=> "not matched"
In addition to pinning local variables, you can also pin instance, global, and class variables:
$gvar = 1
class A
@ivar = 2
@@cvar = 3
case [1, 2, 3]
in ^$gvar, ^@ivar, ^@@cvar
"matched"
else
"not matched"
end
#=> "matched"
end
You can also pin the result of arbitrary expressions using parentheses:
a=1
b=2
case 3
in ^(a + b)
"matched"
else
"not matched"
end
#=> "matched"
Matching non-primitive objects: deconstruct and deconstruct_keys
As already mentioned above, array, find, and hash patterns, besides literal arrays and hashes, will try to match
any object implementing deconstruct (for array/find patterns) or deconstruct_keys (for hash patterns).
class Point
def initialize(x, y)
@x, @y = x, y
end
def deconstruct
puts "deconstruct called"
[@x, @y]
end
def deconstruct_keys(keys)
puts "deconstruct_keys called with #{keys.inspect}"
{x: @x, y: @y}
end
end
keys are passed to deconstruct_keys to leave room for optimization in the matched class: if calculating
a full hash representation is expensive, one may calculate only the necessary subhash. When
the **rest pattern is used, nil is passed as the keys value.
Additionally, when matching custom classes, the expected class can be specified as part of the pattern and
is checked with ===
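The following self-contained sketch restates a Point class (without the puts calls, and with readers added for brevity) and matches it three ways: by position through deconstruct, by key with an explicit class check, and against a value of the wrong class. The variable names are illustrative only.

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end

  def deconstruct            # used by array and find patterns
    [x, y]
  end

  def deconstruct_keys(keys) # used by hash patterns; keys is ignored in this sketch
    {x: x, y: y}
  end
end

by_position =
  case Point.new(1, -2)
  in [px, Integer => py]                     # array pattern calls deconstruct
    "x=#{px}, y=#{py}"
  end

by_key =
  case Point.new(1, -2)
  in Point(x: Integer => px, y: Integer)     # Point === value, then deconstruct_keys
    "x is #{px}"
  else
    "not matched"
  end

# A Hash has the right keys but fails the class check.
wrong_class = ({x: 1, y: -2} in Point(x:, y:))

p by_position   # prints "x=1, y=-2"
p by_key        # prints "x is 1"
p wrong_class   # prints false
```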
Guard clauses
if can be used to attach an additional condition (guard clause) when the pattern matches. This condition
may use bound variables:
case [1, 2]
in a, b if b == a*2
"matched"
else
"not matched"
end
#=> "matched"
case [1, 1]
in a, b if b == a*2
"matched"
else
"not matched"
end
#=> "not matched"
unless works, too:
case [1, 1]
in a, b unless b == a*2
"matched"
else
"not matched"
end
#=> "matched"
Current feature status
As of Ruby 3.1, find patterns are considered experimental, and using one prints a warning. To suppress this
warning, one may use the Warning::[]= method:
Warning[:experimental] = false
eval('[0] => [*, 0, *]')
# ...no warning printed...
Note that pattern-matching warnings are raised at compile time, so this will not suppress the warning:
# By the time the next line is evaluated, parsing has already happened and the warning has been emitted
Warning[:experimental] = false
[0] => [*, 0, *]
So, only subsequently loaded files or eval-ed code are affected by switching the flag.
Alternatively, the command line option -W:no-experimental can be used to turn off “experimental” feature
warnings.
Appendix A. Pattern syntax
Approximate syntax is:
pattern: value_pattern
| variable_pattern
| alternative_pattern
| as_pattern
| array_pattern
| find_pattern
| hash_pattern
value_pattern: literal
| Constant
| ^local_variable
| ^instance_variable
| ^class_variable
| ^global_variable
| ^(expression)
variable_pattern: variable
Appendix B. Some undefined behavior examples
To leave room for optimization in the future, the specification contains some undefined behavior.
Use of a variable in an unmatched pattern:
case [0, 1]
in [a, 2]
"not matched"
in b
"matched"
in c
"not matched"
end
a #=> undefined
c #=> undefined
Number of deconstruct, deconstruct_keys method calls:
$i = 0
ary = [0]
def ary.deconstruct
$i += 1
self
end
case ary
in [0, 1]
"not matched"
in [0]
"matched"
end
$i #=> undefined
Precedence
From highest to lowest, this is the precedence table for ruby. High precedence operations happen before
low precedence operations.
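!, ~, unary +
**
unary -
*, /, %
+, -
<<, >>
&
|, ^
>, >=, <, <=
<=>, ==, ===, !=, =~, !~
&&
||
.., ...
?, :
modifier-rescue
=, +=, -=, etc.
defined?
not
or, and
modifier-if, modifier-unless, modifier-while, modifier-until
{ } blocks
Unary + and unary - are for +1, -1 or -(a + b).
Modifier-if, modifier-unless, etc. are for the modifier versions of those keywords. For example, this is a
modifier-unless statement:
a += 1 unless a.zero?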
Note that (a if b rescue c) is parsed as ((a if b) rescue c) due to reasons not related to precedence.
See modifier statements.
{ ... } blocks have priority below all listed operations, but do ... end blocks have lower priority. All other words
in the precedence table above are keywords.
Refinements
Due to Ruby’s open classes, you can redefine or add functionality to existing classes. This is called a
“monkey patch”. Unfortunately, the scope of such changes is global: all users of the monkey-patched class
see the same changes. This can cause unintended side effects or breakage of programs.
Refinements are designed to reduce the impact of monkey patching on other users of the monkey-patched
class. Refinements provide a way to extend a class locally. Refinements can modify both classes and
modules.
Here is a basic refinement:
class C
  def foo
    puts "C#foo"
  end
end
module M
  refine C do
    def foo
      puts "C#foo in M"
    end
  end
end
using M
c = C.new
c.foo # prints "C#foo in M"
Scope
You may activate refinements at top-level, and inside classes and modules. You may not activate
refinements in method scope. Refinements are activated until the end of the current class or module
definition, or until the end of the current file if used at the top-level.
You may activate refinements in a string passed to Kernel#eval. Refinements are active until the end of the
eval string.
Refinements are lexical in scope. Refinements are only active within a scope after the call to using. Any
code before the using statement will not have the refinement activated.
When control is transferred outside the scope, the refinement is deactivated. This means that if you require
or load a file, or call a method that is defined outside the current scope, the refinement will be deactivated:
class C
end
module M
  refine C do
    def foo
      puts "C#foo in M"
    end
  end
end
def call_foo(x)
  x.foo
end
using M
x = C.new
x.foo       # prints "C#foo in M"
call_foo(x) #=> raises NoMethodError
If a method is defined in a scope where a refinement is active, the refinement will be active when the
method is called. This example spans multiple files:
c.rb:
class C
end
m.rb:
require "c"
module M
  refine C do
    def foo
      puts "C#foo in M"
    end
  end
end
m_user.rb:
require "m"
using M
class MUser
  def call_foo(x)
    x.foo
  end
end
main.rb:
require "m_user"
x = C.new
m_user = MUser.new
m_user.call_foo(x) # prints "C#foo in M"
x.foo              #=> raises NoMethodError
Since the refinement M is active in m_user.rb, where MUser#call_foo is defined, it is also active
when main.rb calls call_foo.
Since using is a method, refinements are only active when it is called. Here are examples of where a
refinement M is and is not active.
In a file:
In a class:
Note that the refinements in M are not activated automatically if the class Foo is reopened later.
In eval:
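As one illustration of the class case and of the note about reopening, the sketch below defines methods before and after a call to using inside a class body, then reopens the class. The names Report, Fancy, and ScopeDemo are hypothetical.

```ruby
class Report
  def title
    "plain"
  end
end

module Fancy
  refine Report do
    def title
      "fancy"
    end
  end
end

class ScopeDemo
  # Defined before the call to using: the refinement is not active here.
  def self.before
    Report.new.title
  end

  using Fancy

  # Defined after the call to using: the refinement is active here.
  def self.after
    Report.new.title
  end
end

# Reopening the class later does not re-activate the refinement.
class ScopeDemo
  def self.reopened
    Report.new.title
  end
end

ScopeDemo.before   #=> "plain"
ScopeDemo.after    #=> "fancy"
ScopeDemo.reopened #=> "plain"
```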
When defining multiple refinements in the same module inside multiple refine blocks, all refinements from
the same module are active when a refined method (any of the to_json methods from the example below) is
called:
module ToJSON
  refine Integer do
    def to_json
      to_s
    end
  end
  refine Array do
    def to_json
      "[" + map { |i| i.to_json }.join(",") + "]"
    end
  end
  refine Hash do
    def to_json
      "{" + map { |k, v| k.to_s.dump + ":" + v.to_json }.join(",") + "}"
    end
  end
end
using ToJSON
p [{1=>2}, {3=>4}].to_json # prints "[{\"1\":2},{\"3\":4}]"
Method Lookup
When looking up a method for an instance of class C Ruby checks:
If refinements are active for C, in the reverse order they were activated:
o The prepended modules from the refinement for C
o The refinement for C
o The included modules from the refinement for C
The prepended modules of C
C
The included modules of C
If no method was found at any point this repeats with the superclass of C.
Note that methods in a subclass have priority over refinements in a superclass. For example, if the
method / is defined in a refinement for Numeric, 1 / 2 invokes the original Integer#/ because Integer is a
subclass of Numeric and is searched before the refinements for the superclass Numeric. Since the
method / is also present in the child class Integer, the method lookup does not move up to the superclass.
However, if a method foo is defined on Numeric in a refinement, 1.foo invokes that method since foo does
not exist on Integer.
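The Numeric example above can be sketched as follows. The module names NumericExtras and LookupDemo and the return symbols are hypothetical; the refinement is activated inside a module body so that its scope is explicit.

```ruby
module NumericExtras
  refine Numeric do
    # Never reached for integers: Integer#/ is found first.
    def /(other)
      :refined_division
    end

    # Reached for integers: Integer itself has no foo.
    def foo
      :refined_foo
    end
  end
end

module LookupDemo
  using NumericExtras

  def self.divide
    1 / 2
  end

  def self.call_foo
    1.foo
  end
end

LookupDemo.divide   #=> 0
LookupDemo.call_foo #=> :refined_foo
```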
super
When super is invoked method lookup checks:
The included modules of the current class. Note that the current class may be a refinement.
If the current class is a refinement, the method lookup proceeds as in the Method Lookup section
above.
If the current class has a direct superclass, the method lookup proceeds as in the Method Lookup section
above using the superclass.
Note that super in a method of a refinement invokes the method in the refined class even if there is another
refinement which has been activated in the same context. This is only true for super in a method of a
refinement; it does not apply to super in a method in a module that is included in a refinement.
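A minimal sketch of super inside a refinement method, where it reaches the method of the refined class. Greeter, Loud, and SuperDemo are hypothetical names.

```ruby
class Greeter
  def greet
    "hello"
  end
end

module Loud
  refine Greeter do
    def greet
      # super here invokes Greeter#greet in the refined class.
      super.upcase + "!"
    end
  end
end

module SuperDemo
  using Loud

  def self.run
    Greeter.new.greet
  end
end

SuperDemo.run     #=> "HELLO!"
Greeter.new.greet #=> "hello" (the refinement is not active at this call site)
```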
Methods Introspection
When using introspection methods such as Kernel#method or Kernel#methods, refinements are not
honored.
This behavior may be changed in the future.
Refinement inheritance by Module#include
When a module X is included into a module Y, Y inherits refinements from X.
For example, C inherits refinements from A and B in the following code:
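module A
  refine X do ... end
  refine Y do ... end
end
module B
  refine Z do ... end
end
module C
  include A
  include B
end
using C
# Refinements in A and B are activated here.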
Further Reading
See bugs.ruby-lang.org/projects/ruby-master/wiki/RefinementsSpec for the current specification for
implementing refinements. The specification also contains more details.
Timezones
Timezone Specifiers
Certain Time methods accept arguments that specify timezones:
Time.at: keyword argument in:.
Time.new: positional argument zone or keyword argument in:.
Time.now: keyword argument in:.
Time#getlocal: positional argument zone.
Time#localtime: positional argument zone.
The value given with any of these must be one of the following (each detailed below):
Hours/minutes offset.
Single-letter offset.
Integer offset.
Timezone object.
Hours/Minutes Offsets
The zone value may be a string offset from UTC in the form '+HH:MM' or '-HH:MM', where:
HH is the 2-digit hour in the range 0..23.
MM is the 2-digit minute in the range 0..59.
Examples:
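A minimal sketch, assuming a Ruby version in which Time.at accepts the in: keyword with a string offset. The constant HM_BASE and the helper at_hm_offset are hypothetical names used only for this illustration.

```ruby
# Fixed UTC reference time for the examples.
HM_BASE = Time.utc(2000, 1, 1, 20, 15, 1)

# Hypothetical helper: the same instant viewed at a '+HH:MM' or '-HH:MM' offset.
def at_hm_offset(offset)
  Time.at(HM_BASE, in: offset)
end

west = at_hm_offset('-23:59')
east = at_hm_offset('+23:59')

west.utc_offset #=> -86340
east.utc_offset #=> 86340
west == east    #=> true (same instant, different local representation)
```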
Single-Letter Offsets
The zone value may be a letter in the range 'A'..'I' or 'K'..'Z'; see List of military time zones:
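As a sketch of the letter form, assuming a Ruby version whose Time.at accepts single-letter offsets through in:, the helper below reports the resulting offset in whole hours. LETTER_BASE and letter_offset_hours are hypothetical names.

```ruby
# Fixed UTC reference time for the examples.
LETTER_BASE = Time.utc(2000, 1, 1, 20, 15, 1)

# Hypothetical helper: UTC offset, in hours, selected by a military zone letter.
def letter_offset_hours(letter)
  Time.at(LETTER_BASE, in: letter).utc_offset / 3600
end

letter_offset_hours('A') #=> 1
letter_offset_hours('I') #=> 9
letter_offset_hours('K') #=> 10
letter_offset_hours('Y') #=> -12
letter_offset_hours('Z') #=> 0
```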
Integer Offsets
The zone value may be an integer number of seconds in the range -86399..86399:
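A short sketch of the integer form, again assuming Time.at with the in: keyword. INT_BASE and at_second_offset are hypothetical names.

```ruby
# Fixed UTC reference time for the examples.
INT_BASE = Time.utc(2000, 1, 1, 20, 15, 1)

# Hypothetical helper: the same instant viewed at an offset given in seconds.
def at_second_offset(seconds)
  Time.at(INT_BASE, in: seconds)
end

at_second_offset(-86399).utc_offset #=> -86399
at_second_offset(3600).hour         #=> 21
```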
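Timezone Objects
The zone value may be an object that responds to certain timezone methods, including:
utc_to_local: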
o Called when Time.at or Time.now is invoked with tz as the value for keyword
argument in:, and when Time#getlocal or Time#localtime is called with tz as the value for
positional argument zone.
o Argument: a Time::tm object.
o Returns: a Time-like object in the local timezone.
A custom timezone class may have these instance methods, which will be called if defined:
abbr:
o Called when Time#strftime is invoked with a format involving %Z.
o Argument: a Time::tm object.
o Returns: a string abbreviation for the timezone name.
dst?:
o Called when Time.at or Time.now is invoked with tz as the value for keyword
argument in:, and when Time#getlocal or Time#localtime is called with tz as the value for
positional argument zone.
o Argument: a Time::tm object.
o Returns: whether the time is daylight saving time.
name:
o Called when Marshal.dump(t) is invoked.
o Argument: none.
o Returns: the string name of the timezone.
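The methods described above can be combined into a small custom timezone class. This is only a sketch under stated assumptions: FixedOffsetZone, EXAMPLE_ZONE, and in_example_zone are hypothetical names, the zone has a constant offset with no daylight saving time, and the object passed to utc_to_local is assumed to support + with a number of seconds.

```ruby
# Hypothetical timezone class with a fixed offset; not part of Ruby itself.
class FixedOffsetZone
  # name: returns the string name of the timezone.
  attr_reader :name

  def initialize(name, offset_seconds)
    @name = name
    @offset_seconds = offset_seconds
  end

  # Receives a Time-like object in UTC and returns the local equivalent.
  def utc_to_local(time)
    time + @offset_seconds
  end

  # Abbreviation used for the %Z format directive.
  def abbr(_time)
    @name
  end

  # This sketch never observes daylight saving time.
  def dst?(_time)
    false
  end
end

EXAMPLE_ZONE = FixedOffsetZone.new("EXZ", 3600)

# Hypothetical helper: view a UTC time in the example zone.
def in_example_zone(utc_time)
  Time.at(utc_time, in: EXAMPLE_ZONE)
end

local = in_example_zone(Time.utc(2000, 1, 1, 20, 15, 1))
local.utc_offset #=> 3600
local.hour       #=> 21
```

A real timezone class would normally look up offsets that change over the year; the constant offset here keeps the example self-contained.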
Requirement
1. Windows 7 or later.
2. Visual C++ 12.0 (2013) or later.
Note
If you want to build the x64 version, use the native compiler for x64.
3. Set the environment variables INCLUDE, LIB, and PATH so that the required commands run properly from
the command line.
Note
Building ruby requires the following commands:
o nmake
o cl
o ml
o lib
o dumpbin
4. If you want to build from Git source, the following commands are also required:
o bison
o patch
o sed
o ruby 2.0 or later
5. Enable Command Extensions for your command line. This is the default behavior of cmd.exe. If you
want to enable it explicitly, run cmd.exe with the /E:ON option.
Icons
Any icon files (*.ico) in the build directory, in directories specified with the icondirs make variable,
and in the win32 directory under the ruby source directory will be included in DLL or executable files,
according to their base names.
$(RUBY_INSTALL_NAME).ico or ruby.ico --> $(RUBY_INSTALL_NAME).exe
Although no icons are distributed with the ruby source, you can use anything you like. You can
find many images with search engines. For example, the following were made from the Ruby logo kit:
Small favicon in the official site
ruby.morphball.net/vit-ruby-ico_en.html or icon itself
Build examples
Build in the ruby source directory:
C:
cd \ruby
win32\configure --prefix=/usr/local
nmake
nmake check
nmake install
Build in a subdirectory of the ruby source directory:
C:
cd \ruby
mkdir mswin32
cd mswin32
..\win32\configure --prefix=/usr/local
nmake
nmake check
nmake install
Build on a different drive from the ruby source directory:
D:
cd D:\build\ruby
C:\src\ruby\win32\configure --prefix=/usr/local
nmake
nmake check
C:
cd \ruby
nmake
nmake check
nmake install
Bugs
You can NOT use a path name that contains any white space characters as the ruby source directory. This
restriction comes from the behavior of !INCLUDE directives of NMAKE.
You can build ruby in any directory, including the source directory, except the win32 directory in the source
directory. This restriction originates in the path search method of NMAKE.