Binary Searching

The document provides information about binary searching and case mapping in Ruby: - It describes Ruby methods like Array#bsearch that perform binary searches on collections and return elements matching search criteria in O(log n) time. These methods operate in either find-minimum or find-any mode depending on the block returned. - It also covers string and symbol methods like downcase, upcase, capitalize that perform case mapping on characters using Unicode mappings. Case mapping may depend on locale and some mappings are context-sensitive or result in different character counts.
Ruby User Manual

Ruby is an interpreted object-oriented programming language often used for web development. It also
offers many scripting features to process plain text and serialized files, or manage system tasks.
It is simple, straightforward, and extensible.
Binary Searching
A few Ruby methods support binary searching in a collection:
 Array#bsearch: Returns an element selected via a binary search as determined by a given block.
 Array#bsearch_index: Returns the index of an element selected via a binary search as determined by a
given block.
 Range#bsearch: Returns an element selected via a binary search as determined by a given block.

Each of these methods returns an enumerator if no block is given. Given a block, each of these methods
returns an element (or element index) from self as determined by a binary search. The search finds an
element of self which meets the given condition in O(log n) operations, where n is the count of
elements. self should be sorted, but this is not checked.
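
The index variant and the no-block form can be sketched as follows. The array and the thresholds are illustrative values only:

```ruby
a = [0, 4, 7, 10, 12]

# bsearch_index returns the position of the selected element, not the element.
index   = a.bsearch_index { |x| x >= 7 }   # => 2
missing = a.bsearch_index { |x| x >= 100 } # => nil

# Without a block, an enumerator is returned instead of a search result.
enum = a.bsearch
enum.class # => Enumerator
```
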
There are two search modes:

Find-minimum mode: Method bsearch returns the first element for which the block returns true; the block
must return true or false.
Find-any mode: Method bsearch returns some element, if any, for which the block returns zero; the block
must return a numeric value.
The block should not mix the modes by sometimes returning true or false and other times returning a
numeric value, but this is not checked.
Find-Minimum Mode: In find-minimum mode, the block must return true or false. The further requirement
(though not checked) is that there are no indexes i and j such that:
 0 <= i < j < self.size.
 The block returns true for self[i] and false for self[j].

Less formally: the block is such that all false-evaluating elements precede all true-evaluating elements.
In find-minimum mode, method bsearch returns the first element for which the block returns true.

Examples:

a = [0, 4, 7, 10, 12]


a.bsearch {|x| x >= 4 } # => 4
a.bsearch {|x| x >= 6 } # => 7
a.bsearch {|x| x >= -1 } # => 0
a.bsearch {|x| x >= 100 } # => nil

r = (0...a.size)
r.bsearch {|i| a[i] >= 4 } #=> 1
r.bsearch {|i| a[i] >= 6 } #=> 2
r.bsearch {|i| a[i] >= 8 } #=> 3
r.bsearch {|i| a[i] >= 100 } #=> nil
r = (0.0...Float::INFINITY)
r.bsearch {|x| Math.log(x) >= 0 } #=> 1.0

These blocks make sense in find-minimum mode:


a = [0, 4, 7, 10, 12]
a.map {|x| x >= 4 } # => [false, true, true, true, true]
a.map {|x| x >= 6 } # => [false, false, true, true, true]
a.map {|x| x >= -1 } # => [true, true, true, true, true]
a.map {|x| x >= 100 } # => [false, false, false, false, false]

This would not make sense:

a.map {|x| x == 7 } # => [false, false, true, false, false]
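
Because the requirement is not checked by bsearch, it can be useful to check a block in a test or while debugging. The sketch below is a hypothetical helper, not part of Ruby; it scans every element, so it is O(n) and is meant only for verification:

```ruby
# Hypothetical helper: true when no true result is followed by a false
# result, that is, when all false results precede all true results.
def find_minimum_block?(array, &block)
  array.map(&block).each_cons(2).none? { |left, right| left && !right }
end

a = [0, 4, 7, 10, 12]
ok  = find_minimum_block?(a) { |x| x >= 6 } # => true
bad = find_minimum_block?(a) { |x| x == 7 } # => false
```
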

Find-Any Mode: In find-any mode, the block must return a numeric value. The further requirement (though
not checked) is that there are no indexes i and j such that 0 <= i < j < self.size and any of these holds:
 The block returns a negative value for self[i] and a positive value for self[j].
 The block returns a negative value for self[i] and zero for self[j].
 The block returns zero for self[i] and a positive value for self[j].

Less formally: the block is such that:


 All positive-evaluating elements precede all zero-evaluating elements.
 All positive-evaluating elements precede all negative-evaluating elements.
 All zero-evaluating elements precede all negative-evaluating elements.
In find-any mode, method bsearch returns some element for which the block returns zero, or nil if no such
element is found.

Examples:

a = [0, 4, 7, 10, 12]


a.bsearch {|element| 7 <=> element } # => 7
a.bsearch {|element| -1 <=> element } # => nil
a.bsearch {|element| 5 <=> element } # => nil
a.bsearch {|element| 15 <=> element } # => nil

a = [0, 100, 100, 100, 200]


r = (0..4)
r.bsearch {|i| 100 - a[i] } #=> 1, 2 or 3
r.bsearch {|i| 300 - a[i] } #=> nil
r.bsearch {|i| 50 - a[i] } #=> nil

These blocks make sense in find-any mode:


a = [0, 4, 7, 10, 12]
a.map {|element| 7 <=> element } # => [1, 1, 0, -1, -1]
a.map {|element| -1 <=> element } # => [-1, -1, -1, -1, -1]
a.map {|element| 5 <=> element } # => [1, 1, -1, -1, -1]
a.map {|element| 15 <=> element } # => [1, 1, 1, 1, 1]

This would not make sense:

a.map {|element| element <=> 7 } # => [-1, -1, 0, 1, 1]
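
One common use of find-any mode is a membership test on a sorted array. The helper name below is hypothetical, and the sketch assumes the array is sorted in ascending order and contains no nil elements:

```ruby
# Hypothetical helper: membership test on an array sorted in ascending order.
def sorted_include?(sorted, target)
  # target <=> element is positive while element is too small, zero on a
  # match, and negative once element is too large, as find-any mode requires.
  !sorted.bsearch { |element| target <=> element }.nil?
end

a = [0, 4, 7, 10, 12]
found  = sorted_include?(a, 10) # => true
absent = sorted_include?(a, 5)  # => false
```

Note the operand order: the target is on the left of <=>, which yields the positive, zero, negative sequence described above.
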

Bug Triaging Guide


This guide discusses recommendations for triaging bugs in Ruby’s bug tracker.

Bugs with Reproducible Examples


These are the best bug reports. First, consider whether the bug reported is actually an issue or if it is
expected Ruby behavior. If it is expected Ruby behavior, update the issue with why the behavior is
expected, and set the status to Rejected.
If the bug reported appears to be an actual bug, try reproducing the bug with the master branch. If you are
not able to reproduce the issue on the master branch, try reproducing it on the latest version for the branch
the bug was reported on. If you cannot reproduce the issue in either case, update the issue stating you
cannot reproduce the issue, ask the reporter if they can reproduce the issue with either the master branch
or a later release, and set the status to Feedback.
If you can reproduce the example with the master branch, try to figure out what is causing the issue. If you
feel comfortable, try working on a patch for the issue, update the issue, and attach the patch. Try to figure
out which committer should be assigned to the issue, and set them as the assignee, and set the status to
Assigned.
If you cannot reproduce the example with the master branch, but can reproduce the issue on the latest
version for the branch, then it is likely the bug has already been fixed, but it has not been backported yet.
Try to determine which commit fixed it, and update the issue noting that the issue has been fixed but not yet
backported. If the Ruby version is in the security maintenance phase or no longer supported, change the
status to Closed. This change can be made without adding a note to avoid spamming the mailing list.
For issues that may require backwards incompatible changes or may benefit from general committer
attention or discussion, consider adding them as agenda items for the next committer meeting
(bugs.ruby-lang.org/issues/14770).

Crash Bugs without Reproducers


Many bugs reported have little more than a crash report, often with no way to reproduce the issue. These
bugs are difficult to triage as they often do not contain enough information.
For these bugs, if the Ruby version is the master branch or is the latest release for the branch and the
branch is in normal maintenance phase, look at the backtrace and see if you can determine what could be
causing the issue. If you can guess what could be causing the issue, see if you can put together a
reproducible example (this is in general quite difficult). If you cannot guess what could be causing the issue,
or cannot put together a reproducible example yourself, please ask the reporter to provide a reproducible
example, and change the status to Feedback.
If the Ruby version is no longer current (e.g. 2.5.0 when the latest version on the Ruby 2.5 branch is 2.5.5),
add a note to the issue asking the reporter to try the latest Ruby version for the branch and report back, and
change the status to Feedback. If the Ruby version is in the security maintenance phase or no longer
supported, change the status to Closed. This change can be made without adding a note.

Crash Bugs With 3rd Party C Extensions: If the crash happens inside a 3rd party C extension, try to figure
out inside which C extension it happens, and add a note to the issue to report the issue to that C extension,
and set the status to Third Party’s Issue.

Non-Bug reports: Any issues in the bug tracker that are not reports of problems should have the tracker
changed from Bug to either Feature (new features or performance improvements) or Misc. This change can
be made without adding a note.

Stale Issues: There are many issues that are stale, with no updates in months or even years. For stale
issues in Feedback state, where the feedback has not been received, you can change the status to Closed
without adding a note. For stale issues in Assigned state, you can reach out to the assignee and see if they
can update the issue. If the assignee is no longer an active committer, remove them as the assignee and
change the status to Open.

Case Mapping: Some string-oriented methods use case mapping. In String:


 String#capitalize  String#downcase  String#upcase
 String#capitalize!  String#downcase!  String#upcase!
 String#casecmp  String#swapcase
 String#casecmp?  String#swapcase!

In Symbol:
 Symbol#capitalize  Symbol#casecmp?  Symbol#swapcase
 Symbol#casecmp  Symbol#downcase  Symbol#upcase
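
A brief sketch of the Symbol methods listed above, using an arbitrary example symbol:

```ruby
up    = :fooBar.upcase       # => :FOOBAR
down  = :fooBar.downcase     # => :foobar
cap   = :fooBar.capitalize   # => :Foobar
swap  = :fooBar.swapcase     # => :FOObAR
same  = :foo.casecmp?(:FOO)  # => true
order = :foo.casecmp(:FOO)   # => 0
```
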

Default Case Mapping: By default, all of these methods use full Unicode case mapping, which is suitable for
most languages. See Section 3.13 (Default Case Algorithms) of the Unicode standard. Non-ASCII case
mapping and folding are supported for UTF-8, UTF-16BE/LE, UTF-32BE/LE, and ISO-8859-1~16
Strings/Symbols.
Context-dependent case mapping as described in Table 3-17 (Context Specification for Casing) of the
Unicode standard is currently not supported.

In most cases, case conversions of a string have the same number of characters. There are exceptions
(see also :fold below):

s = "\u00DF" # => "ß"


s.upcase # => "SS"
s = "\u0149" # => "ʼn"
s.upcase # => "ʼN"

Case mapping may also depend on locale (see also :turkic below):

s = "\u0049" # => "I"


s.downcase # => "i" # Dot above.
s.downcase(:turkic) # => "ı" # No dot above.

Case changes may not be reversible:

s = 'Hello World!' # => "Hello World!"


s.downcase # => "hello world!"
s.downcase.upcase # => "HELLO WORLD!" # Different from original s.
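
A round trip can also change the characters themselves, not only their case. A short sketch using the sharp s from the earlier example:

```ruby
s = "\u00DF"                   # => "ß"
round_trip = s.upcase.downcase # => "ss"
same = (round_trip == s)       # => false
upper_length = s.upcase.length # => 2
```
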

Case changing methods may not maintain Unicode normalization. See String#unicode_normalize.

Options for Case Mapping


Except for casecmp and casecmp?, each of the case-mapping methods listed above accepts optional
arguments, *options.
The arguments may be:
 :ascii only.
 :fold only.
 :turkic or :lithuanian or both.
The options:
 :ascii: ASCII-only mapping: uppercase letters (‘A’..‘Z’) are mapped to lowercase letters (‘a’..‘z’);
other characters are not changed

 s = "Foo \u00D8 \u00F8 Bar" # => "Foo Ø ø Bar"


 s.upcase # => "FOO Ø Ø BAR"
 s.downcase # => "foo ø ø bar"
 s.upcase(:ascii) # => "FOO Ø ø BAR"
 s.downcase(:ascii) # => "foo Ø ø bar"

 :turkic: Full Unicode case mapping, adapted for the Turkic languages that distinguish dotted and
dotless I, for example Turkish and Azeri.

 s = 'Türkiye' # => "Türkiye"


 s.upcase # => "TÜRKIYE"
 s.upcase(:turkic) # => "TÜRKİYE" # Dot above.

 s = 'TÜRKIYE' # => "TÜRKIYE"
 s.downcase # => "türkiye"
 s.downcase(:turkic) # => "türkıye" # No dot above.

 :lithuanian: Not yet implemented.


 :fold (available only for String#downcase, String#downcase!, and Symbol#downcase): Unicode
case folding, which is more far-reaching than Unicode case mapping.

 s = "\u00DF" # => "ß"


 s.downcase # => "ß"
 s.downcase(:fold) # => "ss"
 s.upcase # => "SS"

 s = "\uFB04" # => "ﬄ"
 s.downcase # => "ﬄ"
 s.upcase # => "FFL"
 s.downcase(:fold) # => "ffl"
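
Folding is mainly useful for caseless comparison. The sketch below compares two spellings of the same word; the strings are illustrative values only:

```ruby
a = "Stra\u00DFe" # => "Straße"
b = "STRASSE"

# Plain downcase keeps the sharp s, so the results differ.
mapped_equal = (a.downcase == b.downcase)               # => false

# Folding maps the sharp s to "ss", so the results match.
folded_equal = (a.downcase(:fold) == b.downcase(:fold)) # => true

# casecmp? compares the two strings after Unicode case folding.
caseless = a.casecmp?(b)                                # => true
```
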

Character Selector: A character selector is a string argument accepted by certain Ruby methods. Each of
these instance methods accepts one or more character selectors:
 String#tr(selector, replacements): returns a new string.
 String#tr!(selector, replacements): returns self or nil.
 String#tr_s(selector, replacements): returns a new string.
 String#tr_s!(selector, replacements): returns self or nil.
 String#count(*selectors): returns the count of the specified characters.
 String#delete(*selectors): returns a new string.
 String#delete!(*selectors): returns self or nil.
 String#squeeze(*selectors): returns a new string.
 String#squeeze!(*selectors): returns self or nil.

A character selector identifies zero or more characters in self that are to be operands for the method.
In this section, we illustrate using method String#delete(selector), which deletes the selected characters.
In the simplest case, the characters selected are exactly those contained in the selector itself:

'abracadabra'.delete('a') # => "brcdbr"


'abracadabra'.delete('ab') # => "rcdr"
'abracadabra'.delete('abc') # => "rdr"
'0123456789'.delete('258') # => "0134679"
'!@#$%&*()_+'.delete('+&#') # => "!@$%*()_"
'тест'.delete('т') # => "ес"
'こんにちは'.delete('に') # => "こんちは"

Note that order and repetitions do not matter:

'abracadabra'.delete('dcab') # => "rr"


'abracadabra'.delete('aaaa') # => "brcdbr"

In a character selector, these three characters get special treatment:


 A leading caret ('^') functions as a “not” operator for the characters to its right:

 'abracadabra'.delete('^bc') # => "bcb"


 '0123456789'.delete('^852') # => "258"

 A hyphen ('-') between two other characters defines a range of characters instead of a plain string
of characters:

 'abracadabra'.delete('a-d') # => "rr"


 '0123456789'.delete('4-7') # => "012389"
 '!@#$%&*()_+'.delete(' -/') # => "@^_"

 # May contain more than one range.
 'abracadabra'.delete('a-cq-t') # => "d"

 # Ranges may be mixed with plain characters.
 '0123456789'.delete('67-950-23') # => "4"

 # Ranges may be mixed with negations.
 'abracadabra'.delete('^a-c') # => "abacaaba"

 A backslash ('\') acts as an escape for a caret, a hyphen, or another backslash:

 'abracadabra^'.delete('\^bc') # => "araadara"


 'abracadabra-'.delete('a\-d') # => "brcbr"
 "hello\r\nworld".delete("\r") # => "hello\nworld"
 "hello\r\nworld".delete("\\r") # => "hello\r\nwold"
 "hello\r\nworld".delete("\\\r") # => "hello\nworld"

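The same selector syntax applies to the other methods listed at the start of this section. A short sketch with arbitrary example strings:

```ruby
translated = 'hello'.tr('el', 'ip')               # => "hippo"
shifted    = 'hello'.tr('a-y', 'b-z')             # => "ifmmp"
vowels     = 'abracadabra'.count('aeiou')         # => 5
non_vowels = 'abracadabra'.count('^aeiou')        # => 6
squeezed   = 'putters shoot balls'.squeeze('m-z') # => "puters shot balls"
```
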
Multiple Character Selectors


These instance methods accept multiple character selectors:
 String#count(*selectors): returns the count of the specified characters.
 String#delete(*selectors): returns a new string.
 String#delete!(*selectors): returns self or nil.
 String#squeeze(*selectors): returns a new string.
 String#squeeze!(*selectors): returns self or nil.

In effect, the given selectors are formed into a single selector consisting of only those characters common
to all of the given selectors. All forms of selectors may be used, including negations, ranges, and escapes.
Each of these pairs of method calls is equivalent:

s.delete('abcde', 'dcbfg')
s.delete('bcd')

s.delete('^abc', '^def')
s.delete('^abcdef')

s.delete('a-e', 'c-g')
s.delete('cde')
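
The equivalences can be checked directly. In this sketch, s is an arbitrary sample string:

```ruby
s = 'abcdefghij'

# Each pair holds a multi-selector call and its single-selector equivalent.
pairs = [
  [s.delete('abcde', 'dcbfg'), s.delete('bcd')],
  [s.delete('^abc', '^def'),   s.delete('^abcdef')],
  [s.delete('a-e', 'c-g'),     s.delete('cde')]
]

all_equal = pairs.all? { |left, right| left == right } # => true
results   = pairs.map(&:first) # => ["aefghij", "abcdef", "abfghij"]
```
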

Command Injection: Some Ruby core methods accept string data that includes text to be executed as a
system command. They should not be called with unknown or unsanitized commands. These methods
include:

 Kernel.system
 `command` (backtick method) (also called by the expression %x[command]).
 IO.popen(command).
 IO.read(command).
 IO.write(command).
 IO.binread(command).
 IO.binwrite(command).
 IO.readlines(command).
 IO.foreach(command).

Note that some of these methods do not execute commands when called from subclass File:
 File.read(path).  File.binread(path).  File.readlines(path).
 File.write(path).  File.binwrite(path).  File.foreach(path).
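
One way to reduce the risk is to avoid the shell entirely and to use File methods when only a path is intended. The sketch below is a minimal illustration, not a complete defense; the variable names and the untrusted text are hypothetical. It relies on the array form of IO.popen, which passes the program and its arguments directly to the new process:

```ruby
require 'rbconfig'
require 'tempfile'

untrusted = 'hello; echo injected'

# Array form: no shell is involved, so the ';' in the untrusted text is
# delivered to the child process as ordinary argument data.
echoed = IO.popen([RbConfig.ruby, '-e', 'print ARGV[0]', untrusted], &:read)
echoed # => "hello; echo injected"

# File.read treats its argument only as a path.
contents = Tempfile.create('injection-demo') do |file|
  file.write('data')
  file.flush
  File.read(file.path)
end
contents # => "data"
```
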
Contributing to Ruby: This guide outlines ways to get started with contributing to Ruby:

 Reporting issues: How to report issues, how to request features, and how backporting works

 Building Ruby: How to build Ruby on your local machine for development

 Testing Ruby: How to test Ruby on your local machine once you’ve built it

 Making changes to Ruby: How to submit pull requests to change Ruby’s documentation, code, test
suite, or standard libraries

 Making changes to Ruby standard libraries: How to build, test, and contribute to Ruby standard
libraries

 Making changes to Ruby documentation: How to make changes to Ruby documentation

 Benchmarking Ruby: How to benchmark Ruby

Building Ruby - Quick start guide

1. Install the prerequisite dependencies for building the CRuby interpreter:

o C compiler
o autoconf - 2.67 or later
o bison - 3.0 or later
o gperf - 3.0.3 or later
o ruby - 2.7 or later

2. Install optional, recommended dependencies:


o OpenSSL/LibreSSL
o readline/editline (libedit)
o zlib
o libffi
o libyaml
o libexecinfo (FreeBSD)
o rustc - 1.58.0 or later (if you wish to build YJIT)

3. Check out the CRuby source code: git clone https://github.com/ruby/ruby.git


4. Generate the configure file: ./autogen.sh

5. Create a build directory outside of the source directory: mkdir build && cd build

While it's not necessary to build in a separate directory, it's good practice to do so.

6. We'll install Ruby in ~/.rubies/ruby-master, so create the directory: mkdir ~/.rubies

7. Run configure: ../configure --prefix="${HOME}/.rubies/ruby-master"

o If you are frequently building Ruby, add the --disable-install-doc flag to not build
documentation which will speed up the build process.

8. Build Ruby: make install

o If you're on macOS and installed OpenSSL through Homebrew, you may encounter failures to
build OpenSSL that look like this:

openssl:
    Could not be configured. It will not be installed.
    ruby/ext/openssl/extconf.rb: OpenSSL library could not be found. You might want to use
    --with-openssl-dir=<dir> option to specify the prefix where OpenSSL is installed.
    Check ext/openssl/mkmf.log for more details.

Adding --with-openssl-dir=$(brew --prefix openssl) to the list of options passed to configure may solve the
issue.

Remember to delete your build directory and start again from the configure step.

9. Run tests to confirm your build succeeded.

Unexplainable Build Errors

If you are having unexplainable build errors, after saving all your work, try running git clean -xfd in the source
root to remove all git ignored local files. If you are working from a source directory that's been updated
several times, you may have temporary build artifacts from previous releases which can cause build
failures.

More details
If you're interested in continuing development on Ruby, here are more details about Ruby's build to help out.

Running make scripts in parallel

In GNU make and BSD make implementations, to run a specific make script in parallel, pass the flag
-j<number of processes>. For instance, to run tests on 8 processes, use: make test-all -j8

We can also set MAKEFLAGS to run all make commands in parallel. Having the right --jobs flag will ensure
all processors are utilized when building software projects. To do this effectively, you can set MAKEFLAGS
in your shell configuration/profile:

# On macOS with Fish shell:
export MAKEFLAGS="--jobs "(sysctl -n hw.ncpu)

# On macOS with Bash/ZSH shell:
export MAKEFLAGS="--jobs $(sysctl -n hw.ncpu)"

# On Linux with Fish shell:
export MAKEFLAGS="--jobs "(nproc)

# On Linux with Bash/ZSH shell:
export MAKEFLAGS="--jobs $(nproc)"
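
If the build is driven from a Ruby script rather than from a shell profile, the same value can be computed portably. This is only a sketch under that assumption; it uses Etc.nprocessors from the standard library, and the make invocation is left commented out:

```ruby
require 'etc'

# Etc.nprocessors reports the number of processors available to this process.
jobs = Etc.nprocessors
makeflags = "--jobs #{jobs}"

# Setting the variable here affects only this process and the children it starts.
ENV['MAKEFLAGS'] = makeflags
# system('make') # would inherit MAKEFLAGS; commented out in this sketch
```
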
Miniruby vs Ruby: Miniruby is a version of Ruby which has no external dependencies and lacks certain
features. It can be useful in Ruby development because it allows for faster build times. Miniruby is built
before Ruby. A functional Miniruby is required to build Ruby. To build Miniruby: make miniruby

Debugging: You can use either lldb or gdb for debugging. Before debugging, you need to create
a test.rb with the Ruby script you’d like to run. You can use the following make targets:

 make run: Runs test.rb using Miniruby


 make lldb: Runs test.rb using Miniruby in lldb
 make gdb: Runs test.rb using Miniruby in gdb
 make runruby: Runs test.rb using Ruby
 make lldb-ruby: Runs test.rb using Ruby in lldb
 make gdb-ruby: Runs test.rb using Ruby in gdb
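
A minimal test.rb might look like the following. The contents are hypothetical; the sketch uses only core classes because Miniruby lacks certain features:

```ruby
# test.rb: a small script to run under `make run` or `make runruby`.
values = [3, 1, 2]
sorted = values.sort

puts RUBY_DESCRIPTION # shows which interpreter build is running
p sorted              # prints [1, 2, 3]
```
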

Building with Address Sanitizer


Using the address sanitizer is a great way to detect memory issues:

./autogen.sh
mkdir build && cd build
export ASAN_OPTIONS="halt_on_error=0:use_sigaltstack=0:detect_leaks=0"
../configure cppflags="-fsanitize=address -fno-omit-frame-pointer" optflags=-O0 LDFLAGS="-fsanitize=address -fno-omit-frame-pointer"
make

On Linux it is important to specify -O0 when debugging. This is especially true for ASAN which sometimes
works incorrectly at higher optimization levels.

How to measure coverage of C and Ruby code


You need to be able to use gcc (gcov) and lcov visualizer.
./autogen.sh
./configure --enable-gcov
make
make update-coverage
rm -f test-coverage.dat
make test-all COVERAGE=true
make lcov
open lcov-out/index.html

If you need only C code coverage, you can remove COVERAGE=true from the above process. You can
also use gcov command directly to get per-file coverage.

If you need only Ruby code coverage, you can remove --enable-gcov. Note that test-coverage.dat
accumulates all runs of make test-all. Make sure that you remove the file if you want to
measure one test run. You can see the coverage result of CI: rubyci.org/coverage
Documentation Guide: This guide discusses recommendations for documenting classes, modules, and
methods in the Ruby core and in the Ruby standard library.

Generating documentation
Most Ruby documentation lives in the source files and is written in RDoc format. Some pages live under
the doc folder and can be written in either .rdoc or .md format, determined by the file extension.

To generate the output of documentation changes in HTML in the {build folder}/.ext/html directory, run the
following inside your build directory: make html

Then you can preview your changes by opening {build folder}/.ext/html/index.html file in your browser.

Goal: The goal of Ruby documentation is to impart the most important and relevant information in the
shortest time. The reader should be able to quickly understand the usefulness of the subject code and how
to use it.
Providing too little information is bad, but providing unimportant information or unnecessary examples is not
good either. Use your judgment about what the user needs to know.

General Guidelines
 Keep in mind that the reader may not be fluent in English.
 Write short declarative or imperative sentences.
 Group sentences into (ideally short) paragraphs, each covering a single topic.
 Organize material with headers.
 Refer to authoritative and relevant sources using links.
 Use simple verb tenses: simple present, simple past, simple future.
 Use simple sentence structure, not compound or complex structure.
 Avoid:
o Excessive comma-separated phrases; consider a list.
o Idioms and culture-specific references.
o Overuse of headers.
o Using US-ASCII-incompatible characters in C source files; see Characters below.

Characters
Use only US-ASCII-compatible characters in a C source file. (If you use other characters, the Ruby CI will
gently let you know.) If you want to put ASCII-incompatible characters into the documentation for a C-coded
class, module, or method, there are workarounds involving new files doc/*.rdoc:
 For class Foo (defined in file foo.c), create file doc/foo.rdoc, declare class Foo; end, and place
the class documentation above that declaration:
 # Documentation for class Foo goes here.
 class Foo; end

 Similarly, for module Bar (defined in file bar.c), create file doc/bar.rdoc, declare module Bar; end,
and place the module documentation above that declaration:

 # Documentation for module Bar goes here.


 module Bar; end

 For a method, things are different. Documenting a method as above disables the "click to toggle
source" feature in the rendered documentation.
Therefore it's best to use file inclusion:
o Retain the call-seq in the C code.
o Use file inclusion (:include:) to include text from an .rdoc file.
Example:

/*
 *  call-seq:
 *    each_byte {|byte| ... } -> self
 *    each_byte               -> enumerator
 *
 *  :include: doc/string/each_byte.rdoc
 *
 */

RDoc: Ruby is documented using RDoc. For information on RDoc syntax and features, see the RDoc
Markup Reference.

Output from irb: For code examples, consider using interactive Ruby, irb. For a code example that
includes irb output, consider aligning # => ... in successive lines. Alignment may sometimes aid readability:

a = [1, 2, 3] #=> [1, 2, 3]
a.shuffle!    #=> [2, 3, 1]
a             #=> [2, 3, 1]

Headers: Organize a long discussion with headers.

Blank Lines: A blank line begins a new paragraph. A code block or list should be preceded by and followed
by a blank line. This is unnecessary for the HTML output, but helps in the ri output.

Method Names: For a method name in text:


 For a method in the current class or module, use a double-colon for a singleton method, or a hash
mark for an instance method: ::bar, #baz.
 Otherwise, include the class or module name and use a dot for a singleton method, or a hash
mark for an instance method: Foo.bar, Foo#baz.
Auto-Linking: In general, RDoc’s auto-linking should not be suppressed. For example, we should
write Array, not \Array. We might consider whether to suppress when:
 The word in question does not refer to a Ruby entity (e.g., some uses of Class or English).
 The reference is to the current class document (e.g., Array in the documentation for class Array).
 The same reference is repeated many times (e.g., RDoc on this page).

HTML Tags: In general, avoid using HTML tags (even in formats where it’s allowed) because ri (the Ruby
Interactive reference tool) may not render them properly.

Tables: In particular, avoid building tables with HTML tags (<table>, etc.). Alternatives are:
 The GFM (GitHub Flavored Markdown) table extension, which is enabled by default. See GFM
tables extension.
 A verbatim text block, using spaces and punctuation to format the text. Note that text markup will
not be honored.
Documenting Classes and Modules: The general structure of the class or module documentation should be:
 Synopsis
 Common uses, with examples
 “What’s Here” summary (optional)

Synopsis: The synopsis is a short description of what the class or module does and why the reader might
want to use it. Avoid details in the synopsis.

Common Uses: Show common uses of the class or module. Depending on the class or module, this section
may vary greatly in both length and complexity.

What’s Here Summary: The documentation for a class or module may include a “What’s Here” section.
Guidelines:
 The section title is What's Here.
 Consider listing the parent class and any included modules; consider links to their "What's Here"
sections if those exist.
 List methods as a bullet list:
o Begin each item with the method name, followed by a colon and a short description.
o If the method has aliases, mention them in parentheses before the colon (and do not list
the aliases separately).
o Check the rendered documentation to determine whether RDoc has recognized the
method and linked to it; if not, manually insert a link.
 If there are numerous entries, consider grouping them into subsections with headers.
 If there are more than a few such subsections, consider adding a table of contents just below the
main section title.

Documenting Methods: General Structure


The general structure of the method documentation should be:
 Calling sequence (methods written in C).
 Synopsis (short description).
 Details and examples.
 Argument description (if necessary).
 Corner cases and exceptions.
 Aliases.
 Related methods (optional).

Calling Sequence (for methods written in C)


 For methods written in Ruby, RDoc documents the calling sequence automatically.
 For methods written in C, RDoc cannot determine what arguments the method accepts, so those
need to be documented using the RDoc directive call-seq:.
 For a singleton method, use the form:

class_name.method_name(method_args) {|block_args| ... } -> return_type

Example:

* call-seq:
* Hash.new(default_value = nil) -> new_hash
* Hash.new {|hash, key| ... } -> new_hash

For an instance method, use the form (omitting any prefix, just as RDoc does for a Ruby-coded method):

method_name(method_args) {|block_args| ... } -> return_type

For example, in Array, use:

* call-seq:
* count -> integer
* count(obj) -> integer
* count {|element| ... } -> integer

* call-seq:
* <=> other -> -1, 0, 1, or nil

Arguments:
 If the method does not accept arguments, omit the parentheses.
 If the method accepts optional arguments:
o Separate each argument name and its default value with = (equal-sign with surrounding
spaces).
o If the method has the same behavior with either an omitted or an explicit argument, use
a call-seq with optional arguments. For example, use:

respond_to?(symbol, include_all = false) -> true or false


o If the behavior is different with an omitted or an explicit argument, use a call-seq with
separate lines. For example, in Enumerable, use:

* max -> element
* max(n) -> array

Block:
 If the method does not accept a block, omit the block.
 If the method accepts a block, the call-seq should have {|args| ... }, not {|args| block } or {|args|
code }.

Return types:
 If the method can return multiple different types, separate the types with “or” and, if necessary,
commas.
 If the method can return more types than can reasonably be listed, use object.
 If the method returns the receiver, use self.
 If the method returns an object of the same class, prefix new_ if and only if the object is not self;
example: new_array.

Aliases:
 Omit aliases from the call-seq, but mention them near the end (see below).

Synopsis: The synopsis comes next, and is a short description of what the method does and why you would
want to use it. Ideally, this is a single sentence, but for more complex methods it may require an entire
paragraph.
For Array#count, the synopsis is:

Returns a count of specified elements.

This is great as it is short and descriptive. Avoid documenting too much in the synopsis; stick to the most
important information for the benefit of the reader.

Details and Examples: Most non-trivial methods benefit from examples, as well as details beyond what is
given in the synopsis. In the details and examples section, you can document how the method handles
different types of arguments, and provide examples of proper usage. In this section, focus on how to use
the method properly, not on how the method handles improper arguments or corner cases.
Not every behavior of a method requires an example. If the method is documented to return self, you don’t
need to provide an example showing the return value is the same as the receiver. If the method is
documented to return nil, you don’t need to provide an example showing that it returns nil. If the details
mention that for a certain argument type, an empty array is returned, you don’t need to provide an example
for that.
Only add an example if it provides the user additional information; do not add an example if it provides the
same information given in the synopsis or details. The purpose of examples is not to prove what the details
are stating.
Argument Description (if necessary): For methods that require arguments, if not obvious and not explicitly
mentioned in the details or implicitly shown in the examples, you can provide details about the types of
arguments supported. When discussing the types of arguments, use simple language even if less-precise,
such as "level must be an integer", not "level must be an Integer-convertible object". The vast majority of
use will be with the expected type, not an argument that is explicitly convertible to the expected type, and
documenting the difference is not important.
For methods that take blocks, it can be useful to document the type of argument passed if it is not obvious,
not explicitly mentioned in the details, and not implicitly shown in the examples.
If there is more than one argument or block argument, use a labeled list.

Corner Cases and Exceptions: For corner cases of methods, such as atypical usage, briefly mention the
behavior, but do not provide any examples.
Only document exceptions raised if they are not obvious. For example, if you have stated earlier that an
argument type must be an integer, you do not need to document that a TypeError is raised if a non-integer
is passed. Do not provide examples of exceptions being raised unless that is a common case, such
as Hash#fetch raising a KeyError.

Aliases
Mention aliases in the form

// Array#find_index is an alias for Array#index.

Related Methods (optional): In some cases, it is useful to document which methods are related to the
current method. For example, documentation for Hash#[] might mention Hash#fetch as a related method,
and Hash#merge might mention Hash#merge! as a related method.
 Consider which methods may be related to the current method, and if you think the reader would
benefit from it, at the end of the method documentation, add a line starting with "Related: " (e.g.
"Related: fetch.").
 Don't list more than three related methods. If you think more than three methods are related, list
the three you think are most important.
 Consider adding:
o A phrase suggesting how the related method is similar to, or different from, the current
method. See an example at Time#getutc.
o Example code that illustrates the similarities and differences. See examples
at Time#ctime, Time#inspect, Time#to_s.
Methods Accepting Multiple Argument Types: For methods that accept multiple argument types, in some
cases it can be useful to document the different argument types separately. It's best to use a separate
paragraph for each case you are discussing.
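
The guidelines above can be sketched on a small documented method. This is only an illustration: the Basket class and its items are hypothetical, and the comment simply follows the order described here (call-seq, synopsis, details and examples, related methods).

```ruby
# Hypothetical class, used only to illustrate the layout of method documentation.
class Basket
  def initialize(items = [])
    @items = items
  end

  # call-seq:
  #   count -> integer
  #   count(item) -> integer
  #   count {|item| ... } -> integer
  #
  # Returns a count of specified items.
  #
  # With no argument and no block, returns the count of all items:
  #
  #   Basket.new([:apple, :pear]).count # => 2
  #
  # With argument +item+, returns the count of items equal to +item+:
  #
  #   Basket.new([:apple, :pear, :apple]).count(:apple) # => 2
  #
  # With a block, returns the count of items for which the block
  # returns a truthy value.
  #
  # Related: Array#count.
  def count(*args, &block)
    # Delegates to Array#count, which supports the same three call forms.
    @items.count(*args, &block)
  end
end

basket = Basket.new([:apple, :pear, :apple])
p basket.count
p basket.count(:apple)
p basket.count { |item| item == :pear }
```

Note that the call-seq uses one line per calling form because the behavior differs with and without an argument, and the block form is written as {|item| ... }.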

Contributing a pull request - Code style


Here are some general rules to follow when writing Ruby and C code for CRuby:
 Do not change code unrelated to your pull request (including style fixes)
 Indent 4 spaces for C without tabs (tabs are two levels of indentation, equivalent to 8 spaces)
 Indent 2 spaces for Ruby without tabs
 ANSI C style for function declarations
 Follow C99 Standard
 PascalStyle for class/module names
 UNDERSCORE_SEPARATED_UPPER_CASE for other constants
 Abbreviations should be all upper case

Commit messages
Use the following style for commit messages:
 Use a succinct subject line
 Include reasoning behind the change in the commit message, focusing on why the change is being
made
 Refer to issue (such as Fixes [Bug #1234] or Implements [Feature #3456]), or discussion on the
mailing list (such as [ruby-core:12345])

CI: GitHub Actions will run on each pull request. There is also a CI that runs on master. It has broad coverage of
different systems and architectures, such as Solaris SPARC and macOS.

Making Changes To Standard Libraries: Everything in the lib directory is mirrored from a standalone
repository into the Ruby repository. If you’d like to make contributions to standard libraries, do so in the
standalone repositories, and the changes will be automatically mirrored into the Ruby repository.
For example, CSV lives in a separate repository and is mirrored into Ruby.

Maintainers: You can find the list of maintainers here.

Build: First, install the library's dependencies using:

bundle install

Libraries with C-extension: If the library has a /ext directory, it has C files that you need to compile with:
bundle exec rake compile

Running tests: All standard libraries use test-unit as the test framework. To run all tests:

bundle exec rake test

To run a single test file:

bundle exec rake test TEST="test/test_foo.rb"

To run a single test case:

bundle exec rake test TEST="test/test_foo.rb" TESTOPTS="--name=/test_mytest/"

Reporting Issues

Reporting security issues: If you’ve found a security vulnerability, please follow these instructions.

Reporting bugs: If you’ve encountered a bug in Ruby, please report it to the Redmine issue tracker available
at bugs.ruby-lang.org, by following these steps:
 Check if anyone has already reported your issue by searching the Redmine issue tracker.
 If you haven’t already, sign up for an account on the Redmine issue tracker.
 If you can’t find a ticket addressing your issue, please create a new issue. You will need to fill in
the subject, description and Ruby version.
o Ensure the issue exists on Ruby master by trying to replicate your bug on the head of
master (see "making changes to Ruby").
o Write a concise subject and briefly describe your problem in the description section. If
your issue affects a released version of Ruby, please say so.
o Fill in the Ruby version you're using when experiencing this issue (the output of
running ruby -v).
o Attach any logs or reproducible programs to provide additional information. Any scripts
should be as small as possible.
 If the ticket doesn’t have any replies after 10 days, you can send a reminder.
 Please reply to feedback requests. If a bug report doesn't get any feedback, it'll eventually get
rejected.
Reporting website issues: If you’re having an issue with the bug tracker or the mailing list, you can contact
the webmaster, Hiroshi SHIBATA ([email protected]). You can report issues with ruby-lang.org on
the repo's issue tracker.

Requesting features: If there’s a new feature that you want to see added to Ruby, you will need to write a
proposal on the Redmine issue tracker. When you open the issue, select Feature in the Tracker dropdown.
When writing a proposal, be sure to check for previous discussions on the topic and have a solid use case.
You should also consider the potential compatibility issues that this new feature might raise. Consider
making your feature into a gem, and if there are enough people who benefit from your feature it could help
persuade Ruby core.
Here is a template you can use for a feature proposal:

[Abstract] Briefly summarize your feature
[Background] Describe current behavior
[Proposal] Describe your feature in detail
[Use cases] Give specific example uses of your feature
[Discussion] Describe why this feature is necessary and better than using existing features
[See also] Link to other related resources (such as implementations in other languages)

Backport requests: If a bug exists in a released version of Ruby, please report this in the issue. Once this
bug is fixed, the fix can be backported if deemed necessary. Only Ruby committers can request
backporting, and backporting is done by the backport manager. New patch versions are released at the
discretion of the backport manager.
Ruby versions can be in one of three maintenance states:
 Stable releases: backport any bug fixes
 Security maintenance: only backport security fixes
 End of life: no backports, please upgrade your Ruby version

Add context to existing issues


There are several ways you can help with a bug that aren’t directly resolving it. These include:
 Verifying or reproducing the existing issue and reporting it
 Adding more specific reproduction instructions
 Contributing a failing test as a patch (see “making changes to Ruby”)
 Testing patches that others have submitted (see “making changes to Ruby”)

Testing Ruby - Test suites: There are several test suites in the Ruby codebase. We can run any of the make
scripts in parallel to speed them up.
1. bootstraptest/
This is a small test suite that runs on Miniruby (see building Ruby). We can run it with:

make btest
To run it with logs, we can use:

make btest OPTS=-v

To run individual bootstrap tests, we can either specify a list of filenames or use the --sets flag in
the variable BTESTS:

make btest BTESTS="bootstraptest/test_fork.rb bootstraptest/test_gc.rb"


make btest BTESTS="--sets=fork,gc"

If we want to run the bootstrap test suite on Ruby (not Miniruby), we can use:

make test

To run it with logs, we can use:

make test OPTS=-v

To run a file or directory with GNU make, we can use:

make test/ruby/test_foo.rb
make test/ruby/test_foo.rb TESTOPTS="-n /test_bar/"

2. test/
This is a more comprehensive test suite that runs on Ruby. We can run it with:

make test-all

We can run a specific test directory in this suite using the TESTS option, for example:

make test-all TESTS=test/rubygems

We can run a specific test file in this suite by also using the TESTS option, for example:

make test-all TESTS=test/ruby/test_array.rb

We can run a specific test in this suite using the TESTS option, specifying first the file name, and
then the test name, prefixed with --name. For example:

make test-all TESTS="../test/ruby/test_alias.rb --name=/test_alias_with_zsuper_method/"

To run these tests with logs, we can use:

make test-all TESTS=-v


If we would like to run both the test/ and bootstraptest/ test suites, we can run

make check

3. spec/ruby
This is a test suite that exists in the Ruby spec repository and is mirrored into
the spec/ruby directory in the Ruby repository. It tests the behavior of the Ruby programming
language. We can run this using:

make test-spec

To run a specific directory, we can use MSPECOPT to specify the directory:

make test-spec MSPECOPT=spec/ruby/core/array

To run a specific file, we can also use MSPECOPT to specify the file:

make test-spec MSPECOPT=spec/ruby/core/array/any_spec.rb

To run a specific test, we can use the --example flag to match against the test name:

make test-spec MSPECOPT="../spec/ruby/core/array/any_spec.rb --example='is false if the array is empty'"

To run these specs with logs, we can use:

make test-spec MSPECOPT=-Vfs

To run a ruby-spec file or directory with GNU make, we can use

make spec/ruby/core/foo/bar_spec.rb

4. spec/bundler
The bundler test suite exists in the RubyGems repository and is mirrored into
the spec/bundler directory in the Ruby repository. We can run this using:

make test-bundler

To run a specific bundler spec file, we can use BUNDLER_SPECS as follows:

make test-bundler BUNDLER_SPECS=commands/exec_spec.rb


Dig Methods: Ruby’s dig methods are useful for accessing nested data structures. Consider this data:

item = {
  id: "0001",
  type: "donut",
  name: "Cake",
  ppu: 0.55,
  batters: {
    batter: [
      {id: "1001", type: "Regular"},
      {id: "1002", type: "Chocolate"},
      {id: "1003", type: "Blueberry"},
      {id: "1004", type: "Devil's Food"}
    ]
  },
  topping: [
    {id: "5001", type: "None"},
    {id: "5002", type: "Glazed"},
    {id: "5005", type: "Sugar"},
    {id: "5007", type: "Powdered Sugar"},
    {id: "5006", type: "Chocolate with Sprinkles"},
    {id: "5003", type: "Chocolate"},
    {id: "5004", type: "Maple"}
  ]
}

Without a dig method, you can write:

item[:batters][:batter][1][:type] # => "Chocolate"

With a dig method, you can write:

item.dig(:batters, :batter, 1, :type) # => "Chocolate"

Without a dig method, you can write, erroneously (raises NoMethodError (undefined method `[]' for
nil:NilClass)):

item[:batters][:BATTER][1][:type]

With a dig method, you can write (still erroneously, but avoiding the exception):

item.dig(:batters, :BATTER, 1, :type) # => nil


Why Is dig Better?
 It has fewer syntactical elements (to get wrong).
 It reads better.
 It does not raise an exception if an item is not found.

How Does dig Work?


The call sequence is:

obj.dig(*identifiers)

The identifiers define a “path” into the nested data structures:


 For each identifier in identifiers, calls method #dig on a receiver with that identifier.
 The first receiver is self.
 Each successive receiver is the value returned by the previous call to dig.
 The value finally returned is the value returned by the last call to dig.
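
The steps above can be sketched as a small loop. The helper name manual_dig and the menu data are hypothetical and exist only for illustration; the built-in dig methods are implemented differently, and this sketch only mirrors the behavior described here.

```ruby
# Hypothetical helper (not part of Ruby) that spells out the steps of dig.
def manual_dig(receiver, *identifiers)
  identifiers.each do |identifier|
    # A nil intermediate value ends the walk early, as dig does.
    return nil if receiver.nil?

    unless receiver.respond_to?(:dig)
      raise TypeError, "#{receiver.class} does not have #dig method"
    end

    # Each call consumes one identifier; its result becomes the next receiver.
    receiver = receiver.dig(identifier)
  end
  receiver
end

menu = { batters: { batter: [{ type: "Regular" }, { type: "Chocolate" }] } }
path = [:batters, :batter, 1, :type]

p manual_dig(menu, *path)                       # same value as menu.dig(*path)
p manual_dig(menu, :batters, :BATTER, 1, :type) # nil, because :BATTER is missing
```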

A dig method raises an exception if any receiver does not respond to #dig:

h = { foo: 1 }
# Raises TypeError (Integer does not have #dig method):
h.dig(:foo, :bar)

What Else?: The structure above has Hash objects and Array objects, both of which have instance
method dig. Altogether there are six built-in Ruby classes that have method dig, three in the core classes
and three in the standard library. In the core:
 Array#dig: the first argument is an Integer index.
 Hash#dig: the first argument is a key.
 Struct#dig: the first argument is a key.

In the standard library:


 OpenStruct#dig: the first argument is a String name.
 CSV::Table#dig: the first argument is an Integer index or a String header.
 CSV::Row#dig: the first argument is an Integer index or a String header.
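
Because each step simply calls dig on the next receiver, one path can cross several of these classes. A minimal sketch using only core classes, where the Topping struct and the order data are made up for illustration:

```ruby
# Hypothetical data: a Hash containing an Array of Struct instances.
Topping = Struct.new(:id, :type)

order = {
  toppings: [
    Topping.new("5001", "None"),
    Topping.new("5002", "Glazed")
  ]
}

# Hash#dig takes the key, Array#dig the index, Struct#dig the member name.
p order.dig(:toppings, 1, :type)
p order.dig(:toppings, 5, :type) # index out of range, so the walk stops with nil
```
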
DTrace Probes
A list of DTrace probes and their functionality. “Module” and “Function” cannot be defined in user defined
probes (known as USDT), so they will not be specified. Probe definitions are in the format of:

provider:module:function:name(arguments)

Since module and function cannot be specified, they will be blank. An example probe definition for Ruby
would then be:

ruby:::method-entry(class name, method name, file name, line number)

Where “ruby” is the provider name, module and function names are blank, the probe name is “method-
entry”, and the probe takes four arguments:
 class name
 method name
 file name
 line number

Probes List

Stability
Before we list the specific probes, let’s talk about stability. Probe stability is declared in the probes.d file at
the bottom on the pragma D attributes lines. Here is a description of each of the stability declarations.
Provider name stability
The provider name of “ruby” has been declared as stable. It is unlikely that we will change the
provider name from “ruby” to something else.
Module and Function stability
Since we are not allowed to provide values for the module and function name, the values we
have provided (no value) are declared as stable.
Probe name stability
The probe names are likely to change in the future, so they are marked as “Evolving”.
Consumers should not depend on these names to be stable.
Probe argument stability
The parameters passed to the probes are likely to change in the future, so they are marked as
“Evolving”. Consumers should not depend on these to be stable.
Declared probes
Probes are defined in the probes.d file. Here are the declared probes along with when they are fired and the
arguments they take:
ruby:::method-entry(classname, methodname, filename, lineno);
This probe is fired just before a method is entered.
classname
name of the class (a string)
methodname
name of the method about to be executed (a string)
filename
the file name where the method is _being called_ (a string)
lineno
the line number where the method is _being called_ (an int)
NOTE: will only be fired if tracing is enabled, e.g. with: TracePoint.new{}.enable.
See Feature#14104 for more details.
ruby:::method-return(classname, methodname, filename, lineno);
This probe is fired just after a method has returned. The arguments are the same as
“ruby:::method-entry”.
NOTE: will only be fired if tracing is enabled, e.g. with: TracePoint.new{}.enable.
See Feature#14104 for more details.
ruby:::cmethod-entry(classname, methodname, filename, lineno);
This probe is fired just before a C method is entered. The arguments are the same as
“ruby:::method-entry”.
ruby:::cmethod-return(classname, methodname, filename, lineno);
This probe is fired just before a C method returns. The arguments are the same as
“ruby:::method-entry”.
ruby:::require-entry(requiredfile, filename, lineno);
This probe is fired on calls to rb_require_safe (when a file is required).
requiredfile
the name of the file to be required (string).
filename
the file that called “require” (string).
lineno
the line number where the call to require was made (int).
ruby:::require-return(requiredfile, filename, lineno);
This probe is fired just before rb_require_safe (when a file is required) returns. The arguments
are the same as “ruby:::require-entry”. This probe will not fire if there was an exception during file
require.
ruby:::find-require-entry(requiredfile, filename, lineno);
This probe is fired right before search_required is called. search_required determines whether
the file has already been required by searching loaded features ($"), and if not, figures out which
file must be loaded.
requiredfile
the file to be required (string).
filename
the file that called “require” (string).
lineno
the line number where the call to require was made (int).
ruby:::find-require-return(requiredfile, filename, lineno);
This probe is fired right after search_required returns. See the documentation for “ruby:::find-
require-entry” for more details. Arguments for this probe are the same as “ruby:::find-require-
entry”.
ruby:::load-entry(loadedfile, filename, lineno);
This probe is fired when calls to “load” are made. The arguments are the same as “ruby:::require-
entry”.
ruby:::load-return(loadedfile, filename, lineno);
This probe is fired when “load” returns. The arguments are the same as “ruby:::load-entry”.
ruby:::raise(classname, filename, lineno);
This probe is fired when an exception is raised.
classname
the class name of the raised exception (string)
filename
the name of the file where the exception was raised (string)
lineno
the line number in the file where the exception was raised (int)
ruby:::object-create(classname, filename, lineno);
This probe is fired when an object is about to be allocated.
classname
the class of the allocated object (string)
filename
the name of the file where the object is allocated (string)
lineno
the line number in the file where the object is allocated (int)
ruby:::array-create(length, filename, lineno);
This probe is fired when an Array is about to be allocated.
length
the size of the array (long)
filename
the name of the file where the array is allocated (string)
lineno
the line number in the file where the array is allocated (int)
ruby:::hash-create(length, filename, lineno);
This probe is fired when a Hash is about to be allocated.
length
the size of the hash (long)
filename
the name of the file where the hash is allocated (string)
lineno
the line number in the file where the hash is allocated (int)
ruby:::string-create(length, filename, lineno);
This probe is fired when a String is about to be allocated.
length
the size of the string (long)
filename
the name of the file where the string is allocated (string)
lineno
the line number in the file where the string is allocated (int)
ruby:::symbol-create(str, filename, lineno);
This probe is fired when a Symbol is about to be allocated.
str
the contents of the symbol (string)
filename
the name of the file where the symbol is allocated (string)
lineno
the line number in the file where the symbol is allocated (int)
ruby:::parse-begin(sourcefile, lineno);
Fired just before parsing and compiling a source file.
sourcefile
the file being parsed (string)
lineno
the line number where the source starts (int)
ruby:::parse-end(sourcefile, lineno);
Fired just after parsing and compiling a source file.
sourcefile
the file being parsed (string)
lineno
the line number where the source ended (int)
ruby:::gc-mark-begin();
Fired at the beginning of a mark phase.
ruby:::gc-mark-end();
Fired at the end of a mark phase.
ruby:::gc-sweep-begin();
Fired at the beginning of a sweep phase.
ruby:::gc-sweep-end();
Fired at the end of a sweep phase.
ruby:::method-cache-clear(class, sourcefile, lineno);
Fired when the method cache is cleared.
class
the classname being cleared, or “global” (string)
sourcefile
the file being parsed (string)
lineno
the line number where the source ended (int)
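
The notes on ruby:::method-entry and ruby:::method-return say that those probes fire only while tracing is enabled, for example through TracePoint. The sketch below is a Ruby-side analogue rather than the DTrace probes themselves: it records the same kind of information (class, method name, file, line) for a hypothetical greet method. One assumption to keep in mind is that TracePoint reports its own notion of the line number, which need not match the line number passed to the probe.

```ruby
# Hypothetical method to trace.
def greet
  :hello
end

events = []

# :call and :return are the TracePoint events for entering and leaving
# a method written in Ruby.
trace = TracePoint.new(:call, :return) do |tp|
  events << [tp.event, tp.defined_class, tp.method_id, tp.path, tp.lineno]
end

# Tracing is active only while the block runs.
trace.enable { greet }

events.each { |event| p event }
```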

Encodings

The Basics
A character encoding, often shortened to encoding, is a mapping between:
 A sequence of 8-bit bytes (each byte in the range 0..255).
 Characters in a specific character set.

Some character sets contain only 1-byte characters; US-ASCII, for example, has 128 1-byte characters.
This string, encoded in US-ASCII, has six characters that are stored as six bytes:

s = 'Hello!'.encode('US-ASCII') # => "Hello!"


s.encoding # => #<Encoding:US-ASCII>
s.bytes # => [72, 101, 108, 108, 111, 33]

Other encodings may involve multi-byte characters. UTF-8, for example, encodes more than one million
characters, encoding each in one to four bytes. The lowest-valued of these characters correspond to ASCII
characters, and so are 1-byte characters:

s = 'Hello!' # => "Hello!"


s.bytes # => [72, 101, 108, 108, 111, 33]

Other characters, such as the Euro symbol, are multi-byte:

s = "\u20ac" # => "€"


s.bytes # => [226, 130, 172]
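
A practical consequence is that the number of characters and the number of bytes in a string can differ. A minimal sketch, using a made-up price string:

```ruby
# "\u20ac" is the Euro symbol; a \u escape always produces a UTF-8 string.
price = "5\u20ac"

p price.length                      # number of characters
p price.bytesize                    # number of bytes
p price.each_char.map(&:bytesize)   # bytes used by each character
```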

The Encoding Class

Encoding Objects
Ruby encodings are defined by constants in class Encoding. There can be only one instance of Encoding
for each of these constants. Method Encoding.list returns an array of Encoding objects (one for each
constant):

Encoding.list.size # => 103


Encoding.list.first.class # => Encoding
Encoding.list.take(3)
# => [#<Encoding:ASCII-8BIT>, #<Encoding:UTF-8>, #<Encoding:US-ASCII>]

Names and Aliases


Method Encoding#name returns the name of an Encoding:

Encoding::ASCII_8BIT.name # => "ASCII-8BIT"


Encoding::WINDOWS_31J.name # => "Windows-31J"

An Encoding object has zero or more aliases; method Encoding#names returns an array containing the
name and all aliases:
Encoding::ASCII_8BIT.names
# => ["ASCII-8BIT", "BINARY"]
Encoding::WINDOWS_31J.names
#=> ["Windows-31J", "CP932", "csWindows31J", "SJIS", "PCK"]

Method Encoding.aliases returns a hash of all alias/name pairs:

Encoding.aliases.size # => 71
Encoding.aliases.take(3)
# => [["BINARY", "ASCII-8BIT"], ["CP437", "IBM437"], ["CP720", "IBM720"]]

Method Encoding.name_list returns an array of all the encoding names and aliases:

Encoding.name_list.size # => 175


Encoding.name_list.take(3)
# => ["ASCII-8BIT", "UTF-8", "US-ASCII"]

Method name_list returns more entries than method list because it includes both the names and their
aliases.
Method Encoding.find returns the Encoding for a given name or alias, if it exists:

Encoding.find("US-ASCII") # => #<Encoding:US-ASCII>


Encoding.find("US-ASCII").class # => Encoding
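
Since there is only one Encoding instance per constant, looking up a name and one of its aliases should give the very same object. The sketch below checks that, and also wraps the lookup for names that do not exist; the helper find_encoding_or_nil is hypothetical and relies on Encoding.find raising ArgumentError for an unknown name.

```ruby
by_name  = Encoding.find("Windows-31J")
by_alias = Encoding.find("CP932")

p by_name.equal?(by_alias)                        # same object, not just an equal one
p Encoding.find("utf-8").equal?(Encoding::UTF_8)  # lookup ignores letter case

# Hypothetical helper: returns nil instead of raising for an unknown name.
def find_encoding_or_nil(name)
  Encoding.find(name)
rescue ArgumentError
  nil
end

p find_encoding_or_nil("no-such-encoding")
```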

Default Encodings
Method Encoding.find, above, also returns a default Encoding for each of these special names:
 external: the default external Encoding:

Encoding.find("external") # => #<Encoding:UTF-8>

 internal: the default internal Encoding (may be nil):

Encoding.find("internal") # => nil

 locale: the default Encoding for a string from the environment:

Encoding.find("locale") # => #<Encoding:UTF-8> # Linux
Encoding.find("locale") # => #<Encoding:IBM437> # Windows

 filesystem: the default Encoding for a string from the filesystem:

Encoding.find("filesystem") # => #<Encoding:UTF-8>

Method Encoding.default_external returns the default external Encoding:


Encoding.default_external # => #<Encoding:UTF-8>

Method Encoding.default_external= sets that value:

Encoding.default_external = 'US-ASCII' # => "US-ASCII"


Encoding.default_external # => #<Encoding:US-ASCII>

Method Encoding.default_internal returns the default internal Encoding:

Encoding.default_internal # => nil

Method Encoding.default_internal= sets the default internal Encoding:

Encoding.default_internal = 'US-ASCII' # => "US-ASCII"


Encoding.default_internal # => #<Encoding:US-ASCII>

Compatible Encodings
Method Encoding.compatible? returns whether two given objects are encoding-compatible (that is, whether
they can be concatenated); returns the Encoding of the concatenated string, or nil if incompatible:

rus = "\u{442 435 441 442}"


eng = 'text'
Encoding.compatible?(rus, eng) # => #<Encoding:UTF-8>

s0 = "\xa1\xa1".force_encoding('iso-8859-1') # => "\xA1\xA1"


s1 = "\xa1\xa1".force_encoding('euc-jp') # => "\x{A1A1}"
Encoding.compatible?(s0, s1) # => nil
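
To make the link to concatenation concrete, the sketch below concatenates the same pairs of strings. The helper safe_concat is hypothetical; it only shows one way to check compatibility before joining two strings instead of rescuing Encoding::CompatibilityError afterwards.

```ruby
rus = "\u{442 435 441 442}"
eng = 'text'
s0 = "\xa1\xa1".dup.force_encoding('iso-8859-1')
s1 = "\xa1\xa1".dup.force_encoding('euc-jp')

# Hypothetical helper: concatenates only when the encodings are compatible.
def safe_concat(a, b)
  Encoding.compatible?(a, b) ? a + b : nil
end

p safe_concat(rus, eng).encoding
p safe_concat(s0, s1)

# Without the check, the incompatible pair raises.
begin
  s0 + s1
rescue Encoding::CompatibilityError => e
  puts e.message
end
```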

String Encoding
A Ruby String object has an encoding that is an instance of class Encoding. The encoding may be retrieved
by method String#encoding.
The default encoding for a string literal is the script encoding (see Script encoding at Encoding):

's'.encoding # => #<Encoding:UTF-8>

The default encoding for a string created with method String.new is:
 For a String object argument, the encoding of that string.

 For a string literal, the script encoding (see Script encoding at Encoding).
In either case, any encoding may be specified:

s = String.new(encoding: 'UTF-8') # => ""


s.encoding # => #<Encoding:UTF-8>
s = String.new('foo', encoding: 'ASCII-8BIT') # => "foo"
s.encoding # => #<Encoding:ASCII-8BIT>

The encoding for a string may be changed:

s = "R\xC3\xA9sum\xC3\xA9" # => "Résumé"


s.encoding # => #<Encoding:UTF-8>
s.force_encoding('ISO-8859-1') # => "R\xC3\xA9sum\xC3\xA9"
s.encoding # => #<Encoding:ISO-8859-1>

Changing the assigned encoding does not alter the content of the string; it changes only the way the content
is to be interpreted:

s # => "R\xC3\xA9sum\xC3\xA9"
s.force_encoding('UTF-8') # => "Résumé"

The actual content of a string may also be altered; see Transcoding a String.
Here are a couple of useful query methods:

s = "abc".force_encoding("UTF-8") # => "abc"


s.ascii_only? # => true
s = "abc\u{6666}".force_encoding("UTF-8") # => "abc晦"
s.ascii_only? # => false

s = "\xc2\xa1".force_encoding("UTF-8") # => "¡"


s.valid_encoding? # => true
s = "\xc2".force_encoding("UTF-8") # => "\xC2"
s.valid_encoding? # => false
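
The difference between relabeling a string (force_encoding) and converting it (encode, covered under Transcoding a String) is easy to miss. A short sketch of the contrast, with variable names chosen only for illustration:

```ruby
utf8 = "R\u00E9sum\u00E9"

# force_encoding keeps the bytes and changes only how they are interpreted.
relabeled = utf8.dup.force_encoding('ISO-8859-1')

# encode keeps the characters and changes the bytes that represent them.
transcoded = utf8.encode('ISO-8859-1')

p relabeled.bytes == utf8.bytes   # same bytes as the original
p relabeled.length                # each byte now counts as one character
p transcoded.bytes                # one byte per character in ISO-8859-1
p transcoded.length
p transcoded.encode('UTF-8') == utf8
```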

Symbol and Regexp Encodings


The string stored in a Symbol or Regexp object also has an encoding; the encoding may be retrieved by
method Symbol#encoding or Regexp#encoding.
The default encoding for these, however, is:
 US-ASCII, if all characters are US-ASCII.
 The script encoding, otherwise (see Script encoding at Encoding).
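
A quick way to see both cases, assuming the script encoding is the default UTF-8:

```ruby
ascii_symbol = :abc
other_symbol = :"caf\u00E9"
ascii_regexp = /abc/
other_regexp = /caf\u00E9/

p ascii_symbol.encoding   # all characters are US-ASCII
p other_symbol.encoding   # contains a non-ASCII character
p ascii_regexp.encoding
p other_regexp.encoding
```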

Filesystem Encoding
The filesystem encoding is the default Encoding for a string from the filesystem:
Encoding.find("filesystem") # => #<Encoding:UTF-8>

Locale Encoding
The locale encoding is the default encoding for a string from the environment, other than from the
filesystem:

Encoding.find('locale') # => #<Encoding:IBM437>

Stream Encodings
Certain stream objects can have two encodings; these objects include instances of:
 IO.
 File.
 ARGF.
 StringIO.
The two encodings are:
 An external encoding, which identifies the encoding of the stream.
 An internal encoding, which (if not nil) specifies the encoding to be used for the string constructed
from the stream.

External Encoding
The external encoding, which is an Encoding object, specifies how bytes read from the stream are to be
interpreted as characters.
The default external encoding is:
 UTF-8 for a text stream.
 ASCII-8BIT for a binary stream.
The default external encoding is returned by method Encoding.default_external, and may be set by:
 Ruby command-line options --external_encoding or -E.
You can also set the default external encoding using method Encoding.default_external=, but doing so may
cause problems; strings created before and after the change may have different encodings.
For an IO or File object, the external encoding may be set by:
 Open options external_encoding or encoding, when the object is created; see Open Options.
For an IO, File, ARGF, or StringIO object, the external encoding may be set by:
 Methods set_encoding or (except for ARGF) set_encoding_by_bom.

Internal Encoding
The internal encoding, which is an Encoding object or nil, specifies how characters read from the stream are
to be converted to characters in the internal encoding; those characters become a string whose encoding is
set to the internal encoding.
The default internal encoding is nil (no conversion). It is returned by method Encoding.default_internal, and
may be set by:
 Ruby command-line options --internal_encoding or -E.
You can also set the default internal encoding using method Encoding.default_internal=, but doing so may
cause problems; strings created before and after the change may have different encodings.
For an IO or File object, the internal encoding may be set by:
 Open options internal_encoding or encoding, when the object is created; see Open Options.
For an IO, File, ARGF, or StringIO object, the internal encoding may be set by:
 Method set_encoding.
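
The two ways of setting stream encodings described above (open options and set_encoding) can be sketched as follows. The temporary file and the results hash are scaffolding for the example only; it assumes the default internal encoding is nil, as stated above.

```ruby
require 'tempfile'

results = {}

Tempfile.create('stream-encoding') do |tmp|
  # Write the ISO-8859-1 bytes for "Résumé" without any transcoding.
  tmp.binmode
  tmp.write("R\xE9sum\xE9".b)
  tmp.flush

  # Open options set both encodings when the object is created.
  File.open(tmp.path, external_encoding: 'ISO-8859-1', internal_encoding: 'UTF-8') do |f|
    results[:external] = f.external_encoding
    results[:internal] = f.internal_encoding
    results[:via_options] = f.read
  end

  # set_encoding sets them on an object that already exists.
  File.open(tmp.path) do |f|
    f.set_encoding('ISO-8859-1', 'UTF-8')
    results[:via_set_encoding] = f.read
  end
end

p results
```

In both cases the bytes are interpreted as ISO-8859-1 (the external encoding) and the string handed back to the program is in UTF-8 (the internal encoding).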

Script Encoding
A Ruby script has a script encoding, which may be retrieved by:

__ENCODING__ # => #<Encoding:UTF-8>

The default script encoding is UTF-8; a Ruby source file may set its script encoding with a magic comment
on the first line of the file (or second line, if there is a shebang on the first). The comment must contain the
word coding or encoding, followed by a colon, space and the Encoding name or alias:

# encoding: ISO-8859-1
__ENCODING__ #=> #<Encoding:ISO-8859-1>

Transcoding
Transcoding is the process of changing a sequence of characters from one encoding to another.
As far as possible, the characters remain the same, but the bytes that represent them may change.
The handling for characters that cannot be represented in the destination encoding may be specified by
encoding options (see Encoding Options).

Transcoding a String
Each of these methods transcodes a string:
 String#encode: Transcodes self into a new string according to given encodings and options.
 String#encode!: Like String#encode, but transcodes self in place.
 String#scrub: Transcodes self into a new string by replacing invalid byte sequences with a given or
default replacement string.
 String#scrub!: Like String#scrub, but transcodes self in place.
 String#unicode_normalize: Transcodes self into a new string according to Unicode normalization.
 String#unicode_normalize!: Like String#unicode_normalize, but transcodes self in place.
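
A brief sketch of the three non-bang methods listed above, using made-up sample strings:

```ruby
# encode: same character, different bytes in the destination encoding.
accented = "\u00E9"
p accented.bytes
p accented.encode('ISO-8859-1').bytes

# scrub: replaces invalid byte sequences; here \xFF is not valid UTF-8.
broken = "abc\xFFdef".dup.force_encoding('UTF-8')
p broken.valid_encoding?
p broken.scrub('?')
p broken.scrub          # default replacement for a Unicode encoding

# unicode_normalize: "e" plus a combining acute accent becomes one character.
decomposed = "e\u0301"
composed = decomposed.unicode_normalize
p decomposed.length
p composed.length
```
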
Transcoding a Stream
Each of these methods may transcode a stream; whether it does so depends on the external and internal
encodings:
 IO.foreach: Yields each line of given stream to the block.
 IO.new: Creates and returns a new IO object for the given integer file descriptor.
 IO.open: Creates a new IO object.
 IO.pipe: Creates a connected pair of reader and writer IO objects.
 IO.popen: Creates an IO object to interact with a subprocess.
 IO.read: Returns a string with all or a subset of bytes from the given stream.
 IO.readlines: Returns an array of strings, which are the lines from the given stream.
 IO.write: Writes a given string to the given stream.
This example writes a string to a file, encoding it as ISO-8859-1, then reads the file into a new string,
encoding it as UTF-8:

s = "R\u00E9sum\u00E9"
path = 't.tmp'
ext_enc = 'ISO-8859-1'
int_enc = 'UTF-8'

File.write(path, s, external_encoding: ext_enc)


raw_text = File.binread(path)

transcoded_text = File.read(path, external_encoding: ext_enc, internal_encoding: int_enc)

p raw_text
p transcoded_text

Output:

"R\xE9sum\xE9"
"Résumé"

Encoding Options
A number of methods in the Ruby core accept keyword arguments as encoding options.
Some of the options specify or utilize a replacement string, to be used in certain transcoding operations. A
replacement string may be in any encoding that can be converted to the encoding of the destination string.
These keyword-value pairs specify encoding options:
• For an invalid byte sequence:
o :invalid: nil (default): Raise exception.
o :invalid: :replace: Replace each invalid byte sequence with the replacement string.
Examples:

s = "\x80foo\x80"
s.encode('ISO-8859-3') # Raises Encoding::InvalidByteSequenceError.
s.encode('ISO-8859-3', invalid: :replace) # => "?foo?"

• For an undefined character:
o :undef: nil (default): Raise exception.
o :undef: :replace: Replace each undefined character with the replacement string.
Examples:

s = "\x80foo\x80"
"\x80".encode('UTF-8', 'ASCII-8BIT') # Raises Encoding::UndefinedConversionError.
s.encode('UTF-8', 'ASCII-8BIT', undef: :replace) # => "�foo�"

• Replacement string:
o :replace: nil (default): Set replacement string to default value: "\uFFFD" ("�") for a
Unicode encoding, '?' otherwise.
o :replace: some_string: Set replacement string to the given some_string;
overrides :fallback.
Examples:

s = "\xA5foo\xA5"
options = {:undef => :replace, :replace => 'xyzzy'}
s.encode('UTF-8', 'ISO-8859-3', **options) # => "xyzzyfooxyzzy"

• Replacement fallback:
One of these may be specified; in each case, X is the undefined character:
o :fallback: nil (default): No replacement fallback.
o :fallback: hash_like_object: Set replacement fallback to the given hash_like_object; the
replacement string is hash_like_object[X].
o :fallback: method: Set replacement fallback to the given method; the replacement string
is method(X).
o :fallback: proc: Set replacement fallback to the given proc; the replacement string
is proc[X].
Examples:

s = "\u3042foo\u3043"

hash = {"\u3042" => 'xyzzy'}
hash.default = 'XYZZY'
s.encode('ASCII', fallback: hash) # => "xyzzyfooXYZZY"

def (fallback = "U+%.4X").escape(x)
  self % x.unpack("U")
end
"\u{3042}".encode("US-ASCII", fallback: fallback.method(:escape)) # => "U+3042"

proc = Proc.new {|x| x == "\u3042" ? 'xyzzy' : 'XYZZY' }
s.encode('ASCII', fallback: proc) # => "xyzzyfooXYZZY"

• XML entities:
One of these may be specified:
o :xml: nil (default): No handling for XML entities.
o :xml: :text: Treat source text as XML; replace each undefined character with its upper-
case hexadecimal numeric character reference, except that:
▪ & is replaced with &amp;.
▪ < is replaced with &lt;.
▪ > is replaced with &gt;.
o :xml: :attr: Treat source text as XML attribute value; replace each undefined character
with its upper-case hexadecimal numeric character reference, except that:
▪ The replacement string r is double-quoted ("r").
▪ Each embedded double-quote is replaced with &quot;.
▪ & is replaced with &amp;.
▪ < is replaced with &lt;.
▪ > is replaced with &gt;.
Examples:

s = 'foo"<&>"bar' + "\u3042"
s.encode('ASCII', xml: :text) # => "foo\"&lt;&amp;&gt;\"bar&#x3042;"
s.encode('ASCII', xml: :attr) # => "\"foo&quot;&lt;&amp;&gt;&quot;bar&#x3042;\""

• Newlines:
One of these may be specified:
o :cr_newline: true: Replace each line-feed character ("\n") with a carriage-return character
("\r").
o :crlf_newline: true: Replace each line-feed character ("\n") with a carriage-return/line-feed
string ("\r\n").
o :universal_newline: true: Replace each carriage-return character ("\r") and each carriage-
return/line-feed string ("\r\n") with a line-feed character ("\n").
Examples:

s = "\n \r \r\n" # => "\n \r \r\n"
s.encode('ASCII', cr_newline: true) # => "\r \r \r\r"
s.encode('ASCII', crlf_newline: true) # => "\r\n \r \r\r\n"
s.encode('ASCII', universal_newline: true) # => "\n \n \n"
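Several of these options can be passed in one call. The sketch below combines :invalid, :undef and :replace to force arbitrary text into US-ASCII; the helper name to_ascii is hypothetical and the choice of '?' as replacement is only an example.

```ruby
# Hypothetical helper: force text into US-ASCII, replacing anything
# that is invalid in the source or undefined in the destination.
def to_ascii(text)
  text.encode('US-ASCII', invalid: :replace, undef: :replace, replace: '?')
end

# Undefined characters: U+00E9 has no US-ASCII equivalent.
puts to_ascii("R\u00E9sum\u00E9")

# Invalid byte sequence: \xFF can never appear in UTF-8.
bad = "abc\xFFdef".b.force_encoding('UTF-8')
puts to_ascii(bad)
```

Without the :invalid and :undef options, either call would raise an exception instead of returning a string.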

Creating Extension Libraries for Ruby


This document explains how to make extension libraries for Ruby.

Basic Knowledge
In C, variables have types and data do not have types. In contrast, Ruby variables do not have a static type,
and data themselves have types, so data will need to be converted between the languages.
Data in Ruby are represented by the C type ‘VALUE’. Each VALUE has its own data type.
To retrieve C data from a VALUE, you need to:
1. Identify the VALUE’s data type
2. Convert the VALUE into C data
Converting to the wrong data type may cause serious problems.
Data Types
The Ruby interpreter has the following data types:
T_NIL: nil
T_OBJECT: ordinary object
T_CLASS: class
T_MODULE: module
T_FLOAT: floating point number
T_STRING: string
T_REGEXP: regular expression
T_ARRAY: array
T_HASH: associative array
T_STRUCT: (Ruby) structure
T_BIGNUM: multi precision integer
T_FIXNUM: Fixnum (31-bit or 63-bit integer)
T_COMPLEX: complex number
T_RATIONAL: rational number
T_FILE: IO
T_TRUE: true
T_FALSE: false
T_DATA: data
T_SYMBOL: symbol
In addition, there are several other types used internally:
T_ICLASS: included module
T_MATCH: MatchData object
T_UNDEF: undefined
T_NODE: syntax tree node
T_ZOMBIE: object awaiting finalization
Most of the types are represented by C structures.
Check Data Type of the VALUE
The macro TYPE() defined in ruby.h shows the data type of the VALUE. TYPE() returns the constant
number T_XXXX described above. To handle data types, your code will look something like this:

switch (TYPE(obj)) {
  case T_FIXNUM:
    /* process Fixnum */
    break;
  case T_STRING:
    /* process String */
    break;
  case T_ARRAY:
    /* process Array */
    break;
  default:
    /* raise exception */
    rb_raise(rb_eTypeError, "not valid value");
    break;
}

There is the data type check function

void Check_Type(VALUE value, int type)

which raises an exception if the VALUE does not have the type specified.
There are also faster check macros for fixnums and nil.

FIXNUM_P(obj)
NIL_P(obj)

Convert VALUE into C Data


The data for type T_NIL, T_FALSE, T_TRUE are nil, false, true respectively. They are singletons for the
data type. The equivalent C constants are: Qnil, Qfalse, Qtrue. RTEST() will return true if a VALUE is
neither Qfalse nor Qnil. If you need to differentiate Qfalse from Qnil, specifically test against Qfalse.
The T_FIXNUM data is a 31bit or 63bit length fixed integer. This size depends on the size of long: if long is
32bit then T_FIXNUM is 31bit, if long is 64bit then T_FIXNUM is 63bit. T_FIXNUM can be converted to a C
integer by using the FIX2INT() macro or FIX2LONG(). You have to check that the data is really a
FIXNUM before using them, but they are faster. FIX2LONG() never raises exceptions, but FIX2INT()
raises RangeError if the result does not fit in an int. There are also NUM2INT() and
NUM2LONG(), which convert any Ruby number into a C integer. These macros include a type check, so
an exception will be raised if the conversion fails. NUM2DBL() can be used to retrieve the double float
value in the same way.
You can use the macros StringValue() and StringValuePtr() to get a char* from a VALUE. StringValue(var)
replaces var’s value with the result of “var.to_str()”. StringValuePtr(var) does the same replacement and
returns the char* representation of var. These macros will skip the replacement if var is a String. Notice that
the macros take only the lvalue as their argument, to change the value of var in place.
You can also use the macro named StringValueCStr(). This is just like StringValuePtr(), but always adds a
NUL character at the end of the result. If the result contains a NUL character, this macro causes
the ArgumentError exception. StringValuePtr() doesn’t guarantee the existence of a NUL at the end of the
result, and the result may contain NUL.
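The to_str conversion and the NUL check can also be observed from Ruby code. The following sketch uses a hypothetical FileName class; it relies on String#+ applying the implicit to_str conversion to its argument and on File.exist? rejecting paths that contain a NUL character, which is the behavior of current CRuby rather than something stated above.

```ruby
# Hypothetical class for illustration: any object that defines to_str
# can be used where a core method expects a String.
class FileName
  def initialize(name)
    @name = name
  end

  def to_str
    @name
  end
end

# String#+ converts its argument through to_str.
def join_path(dir, name)
  dir + name
end

# Returns true if the path is rejected because it contains a NUL character.
def nul_rejected?(path)
  File.exist?(path)
  false
rescue ArgumentError
  true
end

puts join_path("data/", FileName.new("t.tmp"))
puts nul_rejected?("bad\0name")
```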
Other data types have corresponding C structures, e.g. struct RArray for T_ARRAY etc. The VALUE of the
type which has the corresponding structure can be cast to retrieve the pointer to the struct. The casting
macro will be of the form RXXXX for each data type; for instance, RARRAY(obj). See “ruby.h”. However, we
do not recommend accessing RXXXX data directly because these data structures are complex. Use the
corresponding rb_xxx() functions to access the internal struct. For example, to access an entry of an array,
use rb_ary_entry(ary, offset) and rb_ary_store(ary, offset, obj).
There are some accessing macros for structure members, for example ‘RSTRING_LEN(str)’ to get the size
of the Ruby String object. The allocated region can be accessed by ‘RSTRING_PTR(str)’.
Notice: Do not change the value of the structure directly, unless you are responsible for the result. This ends
up being the cause of interesting bugs.
Convert C Data into VALUE
To convert C data to Ruby values:
FIXNUM: left shift 1 bit, and turn on its least significant bit (LSB).
Other pointer values: cast to VALUE.
You can determine whether a VALUE is a pointer or not by checking its LSB.
Notice: Ruby does not allow arbitrary pointer values to be a VALUE. They should be pointers to the
structures which Ruby knows about. The known structures are defined in <ruby.h>.
To convert C numbers to Ruby values, use these macros:
INT2FIX(): for integers within 31 bits.
INT2NUM(): for arbitrary sized integers.
INT2NUM() converts an integer into a Bignum if it is out of the FIXNUM range, but is a bit slower.
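The FIXNUM tagging rule described above (shift left one bit, set the LSB) can be modelled with ordinary Ruby integers. This is only an arithmetic sketch with hypothetical helper names; it does not call or reproduce the interpreter's macros.

```ruby
# Model of the Fixnum tagging scheme using plain Ruby integers.
# The helper names are hypothetical; they are not part of the C API.
def tag_fixnum(n)
  (n << 1) | 1            # shift left one bit, then turn on the LSB
end

def fixnum_tagged?(word)
  (word & 1) == 1         # a set LSB marks a Fixnum rather than a pointer
end

def untag_fixnum(word)
  word >> 1               # arithmetic shift, so the sign is preserved
end

[0, 1, 42, -7].each do |n|
  word = tag_fixnum(n)
  puts format('%3d -> %3d -> %3d', n, word, untag_fixnum(word))
end
```

One bit of the machine word is spent on the tag, which is why a Fixnum holds 31 or 63 bits rather than 32 or 64.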
Manipulating Ruby Data
As I already mentioned, it is not recommended to modify an object’s internal structure. To manipulate
objects, use the functions supplied by the Ruby interpreter. Some (not all) of the useful functions are listed
below:
String Functions
rb_str_new(const char *ptr, long len)
Creates a new Ruby string.
rb_str_new2(const char *ptr)
rb_str_new_cstr(const char *ptr)
Creates a new Ruby string from a C string. This is equivalent to rb_str_new(ptr, strlen(ptr)).
rb_str_new_literal(const char *ptr)
Creates a new Ruby string from a C string literal.
rb_sprintf(const char *format, …)
rb_vsprintf(const char *format, va_list ap)
Creates a new Ruby string with printf(3) format.
Note: In the format string, “%”PRIsVALUE can be used for Object#to_s (or Object#inspect if ‘+’
flag is set) output (and related argument must be a VALUE). Since it conflicts with “%i”, for
integers in format strings, use “%d”.
rb_str_append(VALUE str1, VALUE str2)
Appends Ruby string str2 to Ruby string str1.
rb_str_cat(VALUE str, const char *ptr, long len)
Appends len bytes of data from ptr to the Ruby string.
rb_str_cat2(VALUE str, const char* ptr)
rb_str_cat_cstr(VALUE str, const char* ptr)
Appends C string ptr to Ruby string str. This function is equivalent to rb_str_cat(str, ptr,
strlen(ptr)).
rb_str_catf(VALUE str, const char* format, …)
rb_str_vcatf(VALUE str, const char* format, va_list ap)
Appends C string format and successive arguments to Ruby string str according to a printf-like
format. These functions are equivalent to rb_str_append(str, rb_sprintf(format, …)) and
rb_str_append(str, rb_vsprintf(format, ap)), respectively.
rb_enc_str_new(const char *ptr, long len, rb_encoding *enc)
rb_enc_str_new_cstr(const char *ptr, rb_encoding *enc)
Creates a new Ruby string with the specified encoding.
rb_enc_str_new_literal(const char *ptr, rb_encoding *enc)
Creates a new Ruby string from a C string literal with the specified encoding.
rb_usascii_str_new(const char *ptr, long len)
rb_usascii_str_new_cstr(const char *ptr)
Creates a new Ruby string with encoding US-ASCII.
rb_usascii_str_new_literal(const char *ptr)
Creates a new Ruby string from a C string literal with encoding US-ASCII.
rb_utf8_str_new(const char *ptr, long len)
rb_utf8_str_new_cstr(const char *ptr)
Creates a new Ruby string with encoding UTF-8.
rb_utf8_str_new_literal(const char *ptr)
Creates a new Ruby string from a C string literal with encoding UTF-8.
rb_str_resize(VALUE str, long len)
Resizes a Ruby string to len bytes. If str is not modifiable, this function raises an exception. The
length of str must be set in advance. If len is less than the old length the content beyond len
bytes is discarded, else if len is greater than the old length the content beyond the old length
bytes will not be preserved but will be garbage. Note that RSTRING_PTR(str) may change by
calling this function.
rb_str_set_len(VALUE str, long len)
Sets the length of a Ruby string. If str is not modifiable, this function raises an exception. This
function preserves the content up to len bytes, regardless of RSTRING_LEN(str). len must not
exceed the capacity of str.
rb_str_modify(VALUE str)
Prepares a Ruby string for modification. If str is not modifiable, this function raises an exception;
if the buffer of str is shared, this function allocates a new buffer to make it unshared. You
MUST always call this function before modifying the contents using RSTRING_PTR and/or
rb_str_set_len.
Array Functions
rb_ary_new()
Creates an array with no elements.
rb_ary_new2(long len)
rb_ary_new_capa(long len)
Creates an array with no elements, allocating internal buffer for len elements.
rb_ary_new3(long n, …)
rb_ary_new_from_args(long n, …)
Creates an n-element array from the arguments.
rb_ary_new4(long n, VALUE *elts)
rb_ary_new_from_values(long n, VALUE *elts)
Creates an n-element array from a C array.
rb_ary_to_ary(VALUE obj)
Converts the object into an array. Equivalent to Object#to_ary.
There are many functions to operate on an array. They may dump core if other types are given.
rb_ary_aref(int argc, const VALUE *argv, VALUE ary)
Equivalent to Array#[].
rb_ary_entry(VALUE ary, long offset)
ary[offset]
rb_ary_store(VALUE ary, long offset, VALUE obj)
ary[offset] = obj
rb_ary_subseq(VALUE ary, long beg, long len)
ary[beg, len]
rb_ary_push(VALUE ary, VALUE val)
rb_ary_pop(VALUE ary)
rb_ary_shift(VALUE ary)
rb_ary_unshift(VALUE ary, VALUE val)
ary.push, ary.pop, ary.shift, ary.unshift
rb_ary_cat(VALUE ary, const VALUE *ptr, long len)
Appends len elements of objects from ptr to the array.
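For orientation, the Ruby-level operations that these C functions correspond to can be run side by side. The mapping in the comments follows the descriptions above, except that pairing rb_ary_cat with Array#concat is an approximation chosen for this sketch; the helper name array_demo is hypothetical.

```ruby
# Ruby-level counterparts of the C array functions listed above.
def array_demo
  ary = [10, 20, 30]

  entry = ary[1]          # rb_ary_entry(ary, 1)
  ary[3] = 40             # rb_ary_store(ary, 3, obj)
  sub = ary[1, 2]         # rb_ary_subseq(ary, 1, 2)

  ary.push(50)            # rb_ary_push(ary, val)
  last = ary.pop          # rb_ary_pop(ary)
  first = ary.shift       # rb_ary_shift(ary)
  ary.unshift(5)          # rb_ary_unshift(ary, val)
  ary.concat([60, 70])    # roughly rb_ary_cat(ary, ptr, 2)

  { entry: entry, sub: sub, last: last, first: first, ary: ary }
end

p array_demo
```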

Extending Ruby with C

Adding New Features to Ruby


You can add new features (classes, methods, etc.) to the Ruby interpreter. Ruby provides APIs for defining
the following things:
• Classes, Modules
• Methods, Singleton Methods
• Constants
Class and Module Definition
To define a class or module, use the functions below:
VALUE rb_define_class(const char *name, VALUE super)

VALUE rb_define_module(const char *name)

These functions return the newly created class or module. You may want to save this reference into a
variable to use later.
To define nested classes or modules, use the functions below:

VALUE rb_define_class_under(VALUE outer, const char *name, VALUE super)

VALUE rb_define_module_under(VALUE outer, const char *name)

Method and Singleton Method Definition


To define methods or singleton methods, use these functions:

void rb_define_method(VALUE klass, const char *name,
                      VALUE (*func)(ANYARGS), int argc)

void rb_define_singleton_method(VALUE object, const char *name,
                                VALUE (*func)(ANYARGS), int argc)

The ‘argc’ represents the number of arguments to the C function, which must be less than 17. But I
doubt you’ll need that many.
If ‘argc’ is negative, it specifies the calling sequence, not the number of arguments.
If argc is -1, the function will be called as:

VALUE func(int argc, VALUE *argv, VALUE obj)

where argc is the actual number of arguments, argv is the C array of the arguments, and obj is the receiver.
If argc is -2, the arguments are passed in a Ruby array. The function will be called like:

VALUE func(VALUE obj, VALUE args)


where obj is the receiver, and args is the Ruby array containing actual arguments.
There are some more functions to define methods. One takes an ID as the name of method to be defined.
See also ID or Symbol below.

void rb_define_method_id(VALUE klass, ID name,
                         VALUE (*func)(ANYARGS), int argc)

There are two functions to define private/protected methods:

void rb_define_private_method(VALUE klass, const char *name,
                              VALUE (*func)(ANYARGS), int argc)

void rb_define_protected_method(VALUE klass, const char *name,
                                VALUE (*func)(ANYARGS), int argc)

Finally, rb_define_module_function defines module functions, which are private AND singleton methods of
the module. For example, sqrt is a module function defined in the Math module. It can be called in the
following way:

Math.sqrt(4)

or

include Math
sqrt(4)

To define module functions, use:

void rb_define_module_function(VALUE module, const char *name,
                               VALUE (*func)(ANYARGS), int argc)
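At the Ruby level, the same dual nature (singleton method plus private instance method) is what Module#module_function produces. The sketch below shows both call styles; the Geometry module and Shape class are hypothetical names used only for illustration.

```ruby
# Hypothetical module for illustration.
module Geometry
  def double(x)
    x * 2
  end
  # Makes double a singleton method of Geometry and a private instance method.
  module_function :double
end

class Shape
  include Geometry

  def doubled_side(side)
    double(side)          # private method, callable without a receiver
  end
end

puts Geometry.double(4)
puts Shape.new.doubled_side(4)
puts Geometry.private_instance_methods(false).include?(:double)
```

Because the instance method is private, calling Shape.new.double(4) with an explicit receiver raises NoMethodError.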

In addition, function-like methods, which are private methods defined in the Kernel module, can be defined
using:

void rb_define_global_function(const char *name, VALUE (*func)(ANYARGS), int argc)

To define an alias for the method,


void rb_define_alias(VALUE module, const char* new, const char* old);

To define a reader/writer for an attribute,

void rb_define_attr(VALUE klass, const char *name, int read, int write)

To define and undefine the ‘allocate’ class method,

void rb_define_alloc_func(VALUE klass, VALUE (*func)(VALUE klass));

void rb_undef_alloc_func(VALUE klass);

func has to take the klass as the argument and return a newly allocated instance. This instance should be
as empty as possible, without any expensive (including external) resources.
If you are overriding an existing method of any ancestor of your class, you may rely on:

VALUE rb_call_super(int argc, const VALUE *argv)

To specify whether keyword arguments are passed when calling super:

VALUE rb_call_super_kw(int argc, const VALUE *argv, int kw_splat)

kw_splat can have these possible values (used by all methods that accept kw_splat argument):
RB_NO_KEYWORDS
Do not pass keywords
RB_PASS_KEYWORDS
Pass keywords, final argument should be a hash of keywords
RB_PASS_CALLED_KEYWORDS
Pass keywords if current method was called with keywords, useful for argument delegation
To obtain the receiver of the current scope (if no other way is available), you can use:

VALUE rb_current_receiver(void)

Constant Definition
We have 2 functions to define constants:

void rb_define_const(VALUE klass, const char *name, VALUE val)

void rb_define_global_const(const char *name, VALUE val)

The former is to define a constant under specified class/module. The latter is to define a global constant.

Use Ruby Features from C


There are several ways to invoke Ruby’s features from C code.
Evaluate Ruby Programs in a String
The easiest way to use Ruby’s functionality from a C program is to evaluate a string as a Ruby program.
This function will do the job:

VALUE rb_eval_string(const char *str)

Evaluation is done under the current context, thus current local variables of the innermost method (which is
defined by Ruby) can be accessed.
Note that the evaluation can raise an exception. There is a safer function:

VALUE rb_eval_string_protect(const char *str, int *state)

It returns nil when an error occurred. Moreover, *state is zero if str was successfully evaluated, or nonzero
otherwise.
ID or Symbol
You can invoke methods directly, without parsing the string. First I need to explain ID. An ID is an integer
that represents a Ruby identifier, such as a variable name. The Ruby data type corresponding to ID
is Symbol. It can be accessed from Ruby in the form:

:Identifier

or

:"any kind of string"

You can get the ID value from a string within C code by using

rb_intern(const char *name)


rb_intern_str(VALUE name)

You can retrieve ID from Ruby object (Symbol or String) given as an argument by using
rb_to_id(VALUE symbol)

rb_check_id(volatile VALUE *name)

rb_check_id_cstr(const char *name, long len, rb_encoding *enc)

These functions try to convert the argument to a String if it is neither a Symbol nor a String. The second
function stores the converted result into *name, and returns 0 if the string is not a known symbol. After this
function returns a non-zero value, *name is always a Symbol or a String; if the result is 0, *name is a
String. The third function takes a NUL-terminated C string, not a Ruby VALUE.
You can retrieve Symbol from Ruby object (Symbol or String) given as an argument by using

rb_to_symbol(VALUE name)

rb_check_symbol(volatile VALUE *namep)

rb_check_symbol_cstr(const char *ptr, long len, rb_encoding *enc)

These functions are similar to above functions except that these return a Symbol instead of an ID.
You can convert C ID to Ruby Symbol by using

VALUE ID2SYM(ID id)

and to convert Ruby Symbol object to ID, use

ID SYM2ID(VALUE symbol)
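Seen from Ruby, the effect of interning is that a given name always maps to the same Symbol object. A small sketch, with a hypothetical helper name symbol_demo:

```ruby
# Symbols are interned: the same name always yields the same object.
def symbol_demo
  a = :Identifier
  b = "Identifier".to_sym         # Symbol created from a String at run time
  c = :"any kind of string"       # quoted form allows arbitrary characters

  {
    same_object: a.equal?(b),     # identity, not just equality
    quoted_name: c.to_s,
    quoted_class: c.class
  }
end

p symbol_demo
```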

Invoke Ruby Method from C


To invoke methods directly, you can use the function below

VALUE rb_funcall(VALUE recv, ID mid, int argc, ...)

This function invokes a method on the recv, with the method name specified by the symbol mid.
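A close Ruby-level counterpart is Object#send, which also takes the method name as a Symbol followed by the arguments. The sketch below is only an analogy written in Ruby, with a hypothetical call_by_name helper; it says nothing about how the C function is implemented.

```ruby
# Hypothetical helper: call a method chosen at run time by its name,
# in the spirit of rb_funcall(recv, mid, argc, ...).
def call_by_name(recv, mid, *args)
  recv.send(mid, *args)
end

p call_by_name([3, 1, 2], :sort)         # no arguments
p call_by_name("ruby", :+, " rocks")     # one argument
p call_by_name(2, :**, 10)               # operators are methods too
```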

Accessing the Variables and Constants


You can access class variables and instance variables using access functions. Also, global variables can be
shared between both environments. There’s no way to access Ruby’s local variables.
The functions to access/modify instance variables are below:
VALUE rb_ivar_get(VALUE obj, ID id)

VALUE rb_ivar_set(VALUE obj, ID id, VALUE val)

id must be the symbol, which can be retrieved by rb_intern().


To access the constants of the class/module:

VALUE rb_const_get(VALUE obj, ID id)

See also Constant Definition above.
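The same accesses are available in Ruby through reflection methods, which may help when checking what a C extension should observe. This sketch uses a hypothetical Account class; the C calls named in the comments are the ones described above, and the Ruby names passed to them would include the "@" prefix for instance variables that are visible from Ruby.

```ruby
# Hypothetical class for illustration.
class Account
  LIMIT = 100

  def initialize
    @balance = 0
  end
end

def ivar_const_demo
  acct = Account.new
  acct.instance_variable_set(:@balance, 25)         # cf. rb_ivar_set(obj, id, val)
  balance = acct.instance_variable_get(:@balance)   # cf. rb_ivar_get(obj, id)
  limit = Account.const_get(:LIMIT)                 # cf. rb_const_get(obj, id)
  [balance, limit]
end

p ivar_const_demo
```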

Information Sharing Between Ruby and C

Ruby Constants That Can Be Accessed From C


As stated in “Convert VALUE into C Data” above, the following Ruby constants can be referred to from C.
Qtrue
Qfalse
Boolean values. Qfalse is false in C also (i.e. 0).
Qnil
Ruby nil in C scope.

Global Variables Shared Between C and Ruby


Information can be shared between the two environments using shared global variables. To define them,
you can use functions listed below:

void rb_define_variable(const char *name, VALUE *var)

This function defines the variable which is shared by both environments. The value of the global variable
pointed to by ‘var’ can be accessed through Ruby’s global variable named ‘name’.
You can define read-only (from Ruby, of course) variables using the function below.

void rb_define_readonly_variable(const char *name, VALUE *var)

You can define hooked variables. The accessor functions (getter and setter) are called on access to the
hooked variables.

void rb_define_hooked_variable(const char *name, VALUE *var,
                               VALUE (*getter)(), void (*setter)())

If you need to supply only one of the setter and getter, just supply 0 for the hook you don’t need. If both
hooks are 0, rb_define_hooked_variable() works just like rb_define_variable().
The prototypes of the getter and setter functions are as follows:

VALUE (*getter)(ID id, VALUE *var);

void (*setter)(VALUE val, ID id, VALUE *var);

Also you can define a Ruby global variable without a corresponding C variable. The value of the variable will
be set/get only by hooks.

void rb_define_virtual_variable(const char *name,
                                VALUE (*getter)(), void (*setter)())

The prototypes of the getter and setter functions are as follows:

VALUE (*getter)(ID id);

void (*setter)(VALUE val, ID id);

Encapsulate C Data into a Ruby Object


Sometimes you need to expose your struct in the C world as a Ruby object. In a situation like this, making
use of the TypedData_XXX macro family, the pointer to the struct and the Ruby object can be mutually
converted.

C struct to Ruby object


You can convert sval, a pointer to your struct, into a Ruby object with the next macro.

TypedData_Wrap_Struct(klass, data_type, sval)

TypedData_Wrap_Struct() returns a created Ruby object as a VALUE.


The klass argument is the class for the object. The klass should derive from rb_cObject, and the allocator
must be set by calling rb_define_alloc_func or rb_undef_alloc_func.
data_type is a pointer to a const rb_data_type_t which describes how Ruby should manage the struct.
rb_data_type_t is defined like this. Let’s take a look at each member of the struct.

typedef struct rb_data_type_struct rb_data_type_t;

struct rb_data_type_struct {
    const char *wrap_struct_name;
    struct {
        void (*dmark)(void*);
        void (*dfree)(void*);
        size_t (*dsize)(const void *);
        void (*dcompact)(void*);
        void *reserved[1];
    } function;
    const rb_data_type_t *parent;
    void *data;
    VALUE flags;
};

wrap_struct_name is an identifier of this instance of the struct. It is basically used for collecting and emitting
statistics. So the identifier must be unique in the process, but doesn’t need to be valid as a C or Ruby
identifier.
These dmark / dfree functions are invoked during GC execution. No object allocations are allowed during it,
so do not allocate ruby objects inside them.
dmark is a function to mark Ruby objects referred from your struct. It must mark all references from your
struct with rb_gc_mark or its family if your struct keeps such references.
dfree is a function to free the pointer allocation. If this is RUBY_DEFAULT_FREE, the pointer will be just
freed.
dsize calculates memory consumption in bytes by the struct. Its parameter is a pointer to your struct. You
can pass 0 as dsize if it is hard to implement such a function. But it is still recommended to avoid 0.
dcompact is invoked when memory compaction takes place. References to Ruby objects that were marked with
rb_gc_mark_movable() can be updated here using rb_gc_location().
You have to fill reserved with 0.
parent can point to another C type definition that the Ruby object is inherited from. In that case,
TypedData_Get_Struct() also accepts derived objects.
You can fill “data” with an arbitrary value for your use. Ruby does nothing with the member.
flags is a bitwise-OR of the following flag values. Since they require deep understanding of garbage
collector in Ruby, you can just set 0 to flags if you are not sure.
RUBY_TYPED_FREE_IMMEDIATELY
This flag makes the garbage collector immediately invoke dfree() during GC when it needs to free
your struct. You can specify this flag if the dfree never unlocks Ruby’s internal lock (GVL).
If this flag is not set, Ruby defers invocation of dfree() and invokes dfree() at the same time as
finalizers.
RUBY_TYPED_WB_PROTECTED
It shows that implementation of the object supports write barriers. If this flag is set, Ruby is better
able to do garbage collection of the object.
When it is set, however, you are responsible for putting write barriers in all implementations of
methods of that object as appropriate. Otherwise Ruby might crash while running.
More about write barriers can be found in “Generational GC” in Appendix D.
RUBY_TYPED_FROZEN_SHAREABLE
This flag indicates that the object is a shareable object if the object is frozen. See Appendix F for
more details.
If this flag is not set, the object cannot become a shareable object
by the Ractor.make_shareable() method.
You can allocate and wrap the structure in one step.

TypedData_Make_Struct(klass, type, data_type, sval)

This macro returns an allocated Data object, wrapping the pointer to the structure, which is also allocated.
This macro works like:

(sval = ZALLOC(type), TypedData_Wrap_Struct(klass, data_type, sval))

Arguments klass and data_type work like their counterparts in TypedData_Wrap_Struct(). A pointer to the
allocated structure will be assigned to sval, which should be a pointer of the type specified.

Ruby object to C struct


To retrieve the C pointer from the Data object, use the macro TypedData_Get_Struct().
TypedData_Get_Struct(obj, type, &data_type, sval)

A pointer to the structure will be assigned to the variable sval.


See the example below for details.

Example - Creating the dbm Extension


OK, here’s an example of making an extension library. This is the extension to access DBMs. The full
source is included in the ext/ directory in Ruby’s source tree.

Make the Directory

% mkdir ext/dbm

Make a directory for the extension library under ext directory.

Design the Library


You need to design the library features, before making it.

Write the C Code


You need to write C code for your extension library. If your library has only one source file, choosing
“LIBRARY.c” as a file name is preferred. On the other hand, in case your library has multiple source files,
avoid choosing “LIBRARY.c” for a file name. It may conflict with an intermediate file “LIBRARY.o” on some
platforms. Note that some functions in mkmf library described below generate a file “conftest.c” for checking
with compilation. You shouldn’t choose “conftest.c” as a name of a source file.
Ruby will execute the initializing function named “Init_LIBRARY” in the library. For example, “Init_dbm()” will
be executed when loading the library.
Here’s the example of an initializing function.

#include <ruby.h>

void
Init_dbm(void)
{
    /* define DBM class */
    VALUE cDBM = rb_define_class("DBM", rb_cObject);

    /* Redefine DBM.allocate */
    rb_define_alloc_func(cDBM, fdbm_alloc);

    /* DBM includes Enumerable module */
    rb_include_module(cDBM, rb_mEnumerable);

    /* DBM has class method open(): arguments are received as C array */
    rb_define_singleton_method(cDBM, "open", fdbm_s_open, -1);

    /* DBM instance method close(): no args */
    rb_define_method(cDBM, "close", fdbm_close, 0);

    /* DBM instance method []: 1 argument */
    rb_define_method(cDBM, "[]", fdbm_aref, 1);

    /* ... */

    /* ID for an instance variable to store DBM data */
    id_dbm = rb_intern("dbm");
}

The dbm extension wraps the dbm struct in the C environment using TypedData_Make_Struct.

struct dbmdata {
    int di_size;
    DBM *di_dbm;
};

static const rb_data_type_t dbm_type = {
    "dbm",
    {0, free_dbm, memsize_dbm,},
    0, 0,
    RUBY_TYPED_FREE_IMMEDIATELY,
};

static VALUE
fdbm_alloc(VALUE klass)
{
    struct dbmdata *dbmp;
    /* Allocate T_DATA object and C struct and fill struct with zero bytes */
    return TypedData_Make_Struct(klass, struct dbmdata, &dbm_type, dbmp);
}

This code wraps the dbmdata structure into a Ruby object. We avoid wrapping DBM* directly, because we
want to cache size information. Since Object.allocate allocates an ordinary T_OBJECT type (instead of
T_DATA), it’s important to either use rb_define_alloc_func() to overwrite it or rb_undef_alloc_func() to delete
it.
To retrieve the dbmdata structure from a Ruby object, we define the following macro:

#define GetDBM(obj, dbmp) do {\
    TypedData_Get_Struct((obj), struct dbmdata, &dbm_type, (dbmp));\
    if ((dbmp) == 0) closed_dbm();\
    if ((dbmp)->di_dbm == 0) closed_dbm();\
} while (0)

This somewhat complicated macro does the retrieving and close checking for the DBM.
There are three ways to receive method arguments. First, methods with a fixed number of
arguments receive arguments like this:

static VALUE
fdbm_aref(VALUE obj, VALUE keystr)
{
    struct dbmdata *dbmp;
    GetDBM(obj, dbmp);
    /* Use dbmp to access the key */
    dbm_fetch(dbmp->di_dbm, StringValueCStr(keystr));
    /* ... */
}

The first argument of the C function is the self, the rest are the arguments to the method.
Second, methods with an arbitrary number of arguments receive arguments like this:

static VALUE
fdbm_s_open(int argc, VALUE *argv, VALUE klass)
{
    /* ... */
    if (rb_scan_args(argc, argv, "11", &file, &vmode) == 1) {
        mode = 0666; /* default value */
    }
    /* ... */
}

The first argument is the number of method arguments, the second argument is the C array of the method
arguments, and the third argument is the receiver of the method.
You can use the function rb_scan_args() to check and retrieve the arguments. The third argument is a string
that specifies how to capture method arguments and assign them to the following VALUE references.
You can also just check the number of arguments with rb_check_arity(); this is handy when you want to
treat the arguments as a list.
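In the fdbm_s_open fragment above, the format string "11" asks for one required argument followed by one optional argument. A Ruby method with the same shape might look like the sketch below; open_db is a hypothetical name and the return value is chosen only to make the result visible.

```ruby
# Hypothetical Ruby counterpart of a C method that scans its arguments
# with the format "11": one required and one optional argument.
def open_db(file, vmode = nil)
  mode = vmode.nil? ? 0o666 : vmode   # default value, as in fdbm_s_open
  [file, mode]
end

p open_db('test')
p open_db('test', 0o644)
p method(:open_db).arity              # negative: the argument count is variable
```

Calling open_db with no arguments or with three arguments raises ArgumentError, which is the kind of check rb_scan_args() performs on the C side.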
The following is an example of a method that takes arguments by Ruby’s array:

static VALUE
thread_initialize(VALUE thread, VALUE args)
{
    /* ... */
}

The first argument is the receiver, the second one is the Ruby array which contains the arguments to the
method.
Notice: GC should know about global variables which refer to Ruby’s objects, but are not exported to the
Ruby world. You need to protect them by

void rb_global_variable(VALUE *var)

or the objects themselves by

void rb_gc_register_mark_object(VALUE object)

Prepare extconf.rb
If the file named extconf.rb exists, it will be executed to generate Makefile.
extconf.rb is the file for checking compilation conditions etc. You need to put

require 'mkmf'
at the top of the file. You can use the functions below to check various conditions.

append_cppflags(array-of-flags[, opt]): append each flag to $CPPFLAGS if usable

append_cflags(array-of-flags[, opt]): append each flag to $CFLAGS if usable

append_ldflags(array-of-flags[, opt]): append each flag to $LDFLAGS if usable

have_macro(macro[, headers[, opt]]): check whether macro is defined

have_library(lib[, func[, headers[, opt]]]): check whether library containing function exists

find_library(lib[, func, *paths]): find library from paths

have_func(func[, headers[, opt]]): check whether function exists

have_var(var[, headers[, opt]]): check whether variable exists

have_header(header[, preheaders[, opt]]): check whether header file exists

find_header(header, *paths): find header from paths

have_framework(fw): check whether framework exists (for MacOS X)

have_struct_member(type, member[, headers[, opt]]): check whether struct has member

have_type(type[, headers[, opt]]): check whether type exists

find_type(type, opt, *headers): check whether type exists in headers

have_const(const[, headers[, opt]]): check whether constant is defined

check_sizeof(type[, headers[, opts]]): check size of type

check_signedness(type[, headers[, opts]]): check signedness of type

convertible_int(type[, headers[, opts]]): find convertible integer type

find_executable(bin[, path]): find executable file path

create_header(header): generate configured header

create_makefile(target[, target_prefix]): generate Makefile

See MakeMakefile for full documentation of these functions.


The value of the variables below will affect the Makefile.
$CFLAGS: included in CFLAGS make variable (such as -O)

$CPPFLAGS: included in CPPFLAGS make variable (such as -I, -D)

$LDFLAGS: included in LDFLAGS make variable (such as -L)

$objs: list of object file names

Compiler and linker flags are usually not portable, so you should use append_cppflags, append_cflags and
append_ldflags respectively instead of appending to the above variables directly.
Normally, the object files list is automatically generated by searching source files, but you must define them
explicitly if any sources will be generated while building.
If a compilation condition is not fulfilled, you should not call create_makefile. The Makefile will then not be
generated, and compilation will not be done.
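Putting these pieces together, a minimal extconf.rb might look like the sketch below. The extension name "mydbm" and the particular header, function and flag being checked are assumptions chosen for illustration; only the mkmf functions themselves come from the list above.

```ruby
# extconf.rb: a sketch for a hypothetical extension named "mydbm".
require 'mkmf'

# Add a flag only if the compiler accepts it, instead of
# appending to $CFLAGS directly.
append_cflags(['-Wall'])

# Check the compilation conditions the extension depends on.
ok = have_header('ndbm.h') && have_func('dbm_open', 'ndbm.h')

# Generate the Makefile only when every condition is fulfilled.
create_makefile('mydbm') if ok
```

When a check fails, no Makefile is written, so a later make step has nothing to build.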

Prepare Depend (Optional)


If the file named depend exists, Makefile will include that file to check dependencies. You can make this file
by invoking

% gcc -MM *.c > depend

It’s harmless. Prepare it.

Generate Makefile
Try generating the Makefile by:

ruby extconf.rb

If the library should be installed under vendor_ruby directory instead of site_ruby directory, use the --vendor
option as follows.

ruby extconf.rb --vendor

You don’t need this step if you put the extension library under the ext directory of the ruby source tree. In
that case, compilation of the interpreter will do this step for you.

Run make
Type
make

to compile your extension. You don’t need this step either if you have put the extension library under the ext
directory of the ruby source tree.

Debug
You may need to debug the extension. Extensions can be linked statically by adding the directory name
in the ext/Setup file so that you can inspect the extension with the debugger.

Done! Now You Have the Extension Library


You can do anything you want with your library. The author of Ruby will not claim any restrictions on your
code depending on the Ruby API. Feel free to use, modify, distribute or sell your program.

Appendix A. Ruby Header and Source Files Overview

Ruby Header Files


Everything under $repo_root/include/ruby is installed with make install. These headers should be included via
#include <ruby.h> from C extensions. All symbols are public API with the exception of symbols prefixed
with rbimpl_ or RBIMPL_. They are implementation details and shouldn’t be used by C extensions.
Only $repo_root/include/ruby/*.h whose corresponding macros are defined in
the $repo_root/include/ruby.h header are allowed to be #include-d by C extensions.
Header files under $repo_root/internal/ or directly under the root $repo_root/*.h are not make-installed. They
are internal headers with only internal APIs.

Ruby Language Core


class.c
classes and modules
error.c
exception classes and exception mechanism
gc.c
memory management
load.c
library loading
object.c
objects
variable.c
variables and constants

Ruby Syntax Parser


parse.y
grammar definition
parse.c
automatically generated from parse.y
defs/keywords
reserved keywords
lex.c
automatically generated from keywords

Ruby Evaluator (a.k.a. YARV)

compile.c

eval.c

eval_error.c

eval_jump.c

eval_safe.c

insns.def : definition of VM instructions

iseq.c : implementation of VM::ISeq

thread.c : thread management and context switching

thread_win32.c : thread implementation

thread_pthread.c : ditto

vm.c

vm_dump.c

vm_eval.c

vm_exec.c
vm_insnhelper.c

vm_method.c

defs/opt_insns_unif.def : instruction unification

defs/opt_operand.def : definitions for optimization

-> insn*.inc : automatically generated

-> opt*.inc : automatically generated

-> vm.inc : automatically generated

Regular Expression Engine (Onigmo)

regcomp.c
regenc.c
regerror.c
regexec.c
regparse.c
regsyntax.c

Utility Functions
debug.c
debug symbols for C debugger
dln.c
dynamic loading
st.c
general purpose hash table
strftime.c
formatting times
util.c
misc utilities

Ruby Interpreter Implementation

dmyext.c
dmydln.c
dmyencoding.c
id.c
inits.c
main.c
ruby.c
version.c

gem_prelude.rb
prelude.rb

Class Library
array.c
Array
bignum.c
Bignum
compar.c
Comparable
complex.c
Complex
cont.c
Fiber, Continuation
dir.c
Dir
enum.c
Enumerable
enumerator.c
Enumerator
file.c
File
hash.c
Hash
io.c
IO
marshal.c
Marshal
math.c
Math
numeric.c
Numeric, Integer, Fixnum, Float
pack.c
Array#pack, String#unpack
proc.c
Binding, Proc
process.c
Process
random.c
random number
range.c
Range
rational.c
Rational
re.c
Regexp, MatchData
signal.c
Signal
sprintf.c
Kernel#sprintf
string.c
String
struct.c
Struct
time.c
Time
defs/known_errors.def
Errno::* exception classes
-> known_errors.inc
automatically generated

Multilingualization
encoding.c
Encoding
transcode.c
Encoding::Converter
enc/*.c
encoding classes
enc/trans/*
codepoint mapping tables

goruby Interpreter Implementation

goruby.c

golf_prelude.rb : goruby specific libraries.

-> golf_prelude.c : automatically generated

Appendix B. Ruby Extension API Reference

Types
VALUE
The type for the Ruby object. Actual structures are defined in ruby.h, such as struct RString, etc.
To refer to the values in structures, use casting macros like RSTRING(obj).

Variables and Constants


Qnil
nil object
Qtrue
true object (default true value)
Qfalse
false object

C Pointer Wrapping
Data_Wrap_Struct(VALUE klass, void (*mark)(), void (*free)(), void *sval)
Wrap a C pointer into a Ruby object. If the object has references to other Ruby objects, they should
be marked by using the mark function during the GC process. Otherwise, mark should be 0.
When this object is no longer referenced anywhere, the pointer will be discarded by the free function.
Data_Make_Struct(klass, type, mark, free, sval)
This macro allocates memory using malloc(), assigns it to the variable sval, and returns the
DATA encapsulating the pointer to memory region.
Data_Get_Struct(data, type, sval)
This macro retrieves the pointer value from DATA, and assigns it to the variable sval.

Checking Data Types
RB_TYPE_P(value, type)
Is value an internal type (T_NIL, T_FIXNUM, etc.)?
TYPE(value)
Internal type (T_NIL, T_FIXNUM, etc.)
FIXNUM_P(value)
Is value a Fixnum?
NIL_P(value)
Is value nil?
RB_INTEGER_TYPE_P(value)
Is value an Integer?
RB_FLOAT_TYPE_P(value)
Is value a Float?
void Check_Type(VALUE value, int type)
Ensures value is of the given internal type or raises a TypeError.

Data Type Conversion
FIX2INT(value), INT2FIX(i)
Fixnum <-> integer
FIX2LONG(value), LONG2FIX(l)
Fixnum <-> long
NUM2INT(value), INT2NUM(i)
Numeric <-> integer
NUM2UINT(value), UINT2NUM(ui)
Numeric <-> unsigned integer
NUM2LONG(value), LONG2NUM(l)
Numeric <-> long
NUM2ULONG(value), ULONG2NUM(ul)
Numeric <-> unsigned long
NUM2LL(value), LL2NUM(ll)
Numeric <-> long long
NUM2ULL(value), ULL2NUM(ull)
Numeric <-> unsigned long long
NUM2OFFT(value), OFFT2NUM(off)
Numeric <-> off_t
NUM2SIZET(value), SIZET2NUM(size)
Numeric <-> size_t
NUM2SSIZET(value), SSIZET2NUM(ssize)
Numeric <-> ssize_t
rb_integer_pack(value, words, numwords, wordsize, nails, flags),
rb_integer_unpack(words, numwords, wordsize, nails, flags)
Numeric <-> Arbitrary size integer buffer
NUM2DBL(value)
Numeric -> double
rb_float_new(f)
double -> Float
RSTRING_LEN(str)
String -> length of String data in bytes
RSTRING_PTR(str)
String -> pointer to String data. Note that the result pointer may not be NUL-terminated.
StringValue(value)
Object with #to_str -> String
StringValuePtr(value)
Object with #to_str -> pointer to String data
StringValueCStr(value)
Object with #to_str -> pointer to String data without NUL bytes. It is guaranteed that the result
data is NUL-terminated.
rb_str_new2(s)
char * -> String
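The string conversions above have visible counterparts in Ruby code. The sketch below assumes that built-in methods taking string-like arguments perform the same implicit #to_str conversion and that APIs needing a C string reject embedded NUL bytes; the class Label is hypothetical.

```ruby
# Hypothetical class: any object that defines #to_str is accepted
# wherever a built-in method performs an implicit String conversion.
class Label
  def initialize(text)
    @text = text
  end

  def to_str
    @text
  end
end

greeting = "hello, " + Label.new("world") # implicit #to_str conversion

# Strings may contain NUL bytes, so the byte length is tracked
# separately from the data (compare RSTRING_LEN and RSTRING_PTR).
binary = "a\0b"
length = binary.bytesize

# APIs that need a NUL-terminated C string reject embedded NUL bytes.
rejected = begin
  File.open(binary)
  false
rescue ArgumentError
  true
end
```

This is why RSTRING_PTR results should always be paired with RSTRING_LEN, while StringValueCStr is the safer choice when the data is handed to a C function expecting a terminated string.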

Defining Classes and Modules


VALUE rb_define_class(const char *name, VALUE super)
Defines a new Ruby class as a subclass of super.
VALUE rb_define_class_under(VALUE module, const char *name, VALUE super)
Creates a new Ruby class as a subclass of super, under the module’s namespace.
VALUE rb_define_module(const char *name)
Defines a new Ruby module.
VALUE rb_define_module_under(VALUE module, const char *name)
Defines a new Ruby module under the module’s namespace.
void rb_include_module(VALUE klass, VALUE module)
Includes module into class. If the class already includes it, the call is ignored.
void rb_extend_object(VALUE object, VALUE module)
Extend the object with the module’s attributes.
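As a point of reference, each function above has a familiar Ruby-level equivalent. The sketch below uses hypothetical names (Shapes, Circle, Printable) and notes the corresponding C call in a comment next to each construct.

```ruby
# rb_define_module("Shapes")
module Shapes
  # rb_define_class_under(Shapes, "Circle", rb_cObject)
  class Circle
  end
end

# rb_define_module_under(Shapes, "Printable")
module Shapes::Printable
  def describe
    "a #{self.class.name}"
  end
end

# rb_include_module(Circle, Printable): every instance gains the methods.
Shapes::Circle.include(Shapes::Printable)
# Including the same module a second time is ignored.
Shapes::Circle.include(Shapes::Printable)

# rb_extend_object(obj, Printable): only this one object gains the methods.
tag = Object.new
tag.extend(Shapes::Printable)
```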

Defining Global Variables


void rb_define_variable(const char *name, VALUE *var)
Defines a global variable which is shared between C and Ruby. If name contains a character
which is not allowed to be part of the symbol, it can’t be seen from Ruby programs.
void rb_define_readonly_variable(const char *name, VALUE *var)
Defines a read-only global variable. Works just like rb_define_variable(), except the defined
variable is read-only.
void rb_define_virtual_variable(const char *name, VALUE (*getter)(), void (*setter)())
Defines a virtual variable, whose behavior is defined by a pair of C functions. The getter function
is called when the variable is referenced. The setter function is called when the variable is set to
a value. The prototypes for the getter/setter functions are:

VALUE getter(ID id)

void setter(VALUE val, ID id)

The getter function must return the value for the access.
void rb_define_hooked_variable(const char *name, VALUE *var, VALUE (*getter)(), void (*setter)())
Defines a hooked variable. It’s a virtual variable with a C variable. The getter is called as

VALUE getter(ID id, VALUE *var)

returning a new value. The setter is called as

void setter(VALUE val, ID id, VALUE *var)

void rb_global_variable(VALUE *var)
Tells GC to protect a C global variable that holds a Ruby value to be marked.
void rb_gc_register_mark_object(VALUE object)
Tells GC to protect the object, which may not be referenced anywhere.

Constant Definition
void rb_define_const(VALUE klass, const char *name, VALUE val)
Defines a new constant under the class/module.
void rb_define_global_const(const char *name, VALUE val)
Defines a global constant. This is just the same as
rb_define_const(rb_cObject, name, val)
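In Ruby terms, the equivalence stated above means a global constant is simply a constant defined on Object. A minimal sketch, with the module Limits and both constant names chosen only for illustration:

```ruby
# Hypothetical module standing in for the klass argument.
module Limits
end

# Like rb_define_const(Limits, "MAX_KEYS", INT2FIX(16))
Limits.const_set(:MAX_KEYS, 16)

# Like rb_define_global_const("DEFAULT_MODE", ...), which is the same
# as defining the constant on Object (rb_cObject).
Object.const_set(:DEFAULT_MODE, 0o666)
```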

Method Definition
rb_define_method(VALUE klass, const char *name, VALUE (*func)(ANYARGS), int argc)
Defines a method for the class. func is the function pointer. argc is the number of arguments. If
argc is -1, the function will receive 3 arguments: argc, argv, and self. If argc is -2, the function will
receive 2 arguments, self and args, where args is a Ruby array of the method arguments.
rb_define_private_method(VALUE klass, const char *name, VALUE (*func)(ANYARGS), int argc)
Defines a private method for the class. Arguments are same as rb_define_method().
rb_define_singleton_method(VALUE klass, const char *name, VALUE (*func)(ANYARGS), int argc)
Defines a singleton method. Arguments are same as rb_define_method().
rb_check_arity(int argc, int min, int max)
Checks that the number of arguments, argc, is in the range of min..max. If max is
UNLIMITED_ARGUMENTS, the upper bound is not checked. If argc is out of bounds,
an ArgumentError will be raised.
rb_scan_args(int argc, VALUE *argv, const char *fmt, …)
Retrieve argument from argc and argv to given VALUE references according to the format string.
The format can be described in ABNF as follows:

scan-arg-spec  := param-arg-spec [keyword-arg-spec] [block-arg-spec]

param-arg-spec := pre-arg-spec [post-arg-spec] / post-arg-spec /
                  pre-opt-post-arg-spec
pre-arg-spec   := num-of-leading-mandatory-args [num-of-optional-args]
post-arg-spec  := sym-for-variable-length-args
                  [num-of-trailing-mandatory-args]
pre-opt-post-arg-spec := num-of-leading-mandatory-args num-of-optional-args
                         num-of-trailing-mandatory-args
keyword-arg-spec := sym-for-keyword-arg
block-arg-spec := sym-for-block-arg

num-of-leading-mandatory-args  := DIGIT ; The number of leading
                                        ; mandatory arguments
num-of-optional-args           := DIGIT ; The number of optional
                                        ; arguments
sym-for-variable-length-args   := "*"   ; Indicates that variable
                                        ; length arguments are
                                        ; captured as a ruby array
num-of-trailing-mandatory-args := DIGIT ; The number of trailing
                                        ; mandatory arguments
sym-for-keyword-arg            := ":"   ; Indicates that keyword
                                        ; argument captured as a hash.
                                        ; If keyword arguments are not
                                        ; provided, returns nil.
sym-for-block-arg              := "&"   ; Indicates that an iterator
                                        ; block should be captured if
                                        ; given

For example, “12” means that the method requires at least one argument, and at most receives
three (1+2) arguments. So, the format string must be followed by three variable references, which
are to be assigned to captured arguments. For omitted arguments, variables are set to Qnil.
NULL can be put in place of a variable reference, which means the corresponding captured
argument(s) should be just dropped.
The number of given arguments, excluding an option hash or iterator block, is returned.
rb_scan_args_kw(int kw_splat, int argc, VALUE *argv, const char *fmt, …)
The same as rb_scan_args, except the kw_splat argument specifies whether keyword arguments
are provided (instead of being determined by the call from Ruby to the C
function). kw_splat should be one of the following values:
RB_SCAN_ARGS_PASS_CALLED_KEYWORDS
Same behavior as rb_scan_args.
RB_SCAN_ARGS_KEYWORDS
The final argument should be a hash treated as keywords.
RB_SCAN_ARGS_LAST_HASH_KEYWORDS
Treat a final argument as keywords if it is a hash, and not as keywords otherwise.
int rb_get_kwargs(VALUE keyword_hash, const ID *table, int required, int optional, VALUE
*values)
Retrieves argument VALUEs bound to keywords, as directed by table, into values, deleting
retrieved entries from keyword_hash along the way. The first required number of IDs referred to
by table are mandatory, and the succeeding optional (- optional - 1 if optional is negative) number of
IDs are optional. If a mandatory key is not contained in keyword_hash, raises “missing
keyword” ArgumentError. If an optional key is not present in keyword_hash, the corresponding
element in values is set to Qundef. If optional is negative, the rest of keyword_hash is ignored;
otherwise an “unknown keyword” ArgumentError is raised.
Be warned, handling keyword arguments in the C API is less efficient than handling them in
Ruby. Consider using a Ruby wrapper method around a non-keyword C function. ref:
bugs.ruby-lang.org/issues/11339
VALUE rb_extract_keywords(VALUE *original_hash)
Extracts pairs whose key is a symbol into a new hash from a hash object referred
by original_hash. If the original hash contains non-symbol keys, then they are copied to another
hash and the new hash is stored through original_hash, else 0 is stored.
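To make the rb_scan_args format grammar more concrete, the following Ruby sketch parses a format string and reports the argument counts it implies. The helper describe_scan_args and the constant SCAN_ARGS_FORMAT are hypothetical and not part of any Ruby API; the counts exclude the option hash and the block, in the same way as the return value of rb_scan_args.

```ruby
# Models the rb_scan_args format grammar: optional lead and optional-arg
# digits, "*", a trailing digit, ":" and "&", in that order.
SCAN_ARGS_FORMAT = /\A(?:(\d)(\d)?)?(\*)?(\d)?(:)?(&)?\z/

def describe_scan_args(fmt)
  m = SCAN_ARGS_FORMAT.match(fmt) or raise ArgumentError, "bad format: #{fmt}"
  lead  = m[1].to_i  # leading mandatory arguments
  opt   = m[2].to_i  # optional arguments
  rest  = !m[3].nil? # "*": remaining arguments captured as an array
  trail = m[4].to_i  # trailing mandatory arguments
  kw    = !m[5].nil? # ":": keyword arguments captured as a hash
  block = !m[6].nil? # "&": block captured if given
  {
    min: lead + trail,
    max: rest ? nil : lead + opt + trail, # nil means no upper bound
    # number of VALUE references that must follow the format string
    references: lead + opt + trail + [rest, kw, block].count(true)
  }
end
```

For the format "12" this yields a minimum of one argument, a maximum of three, and three variable references, which agrees with the description of that example above.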

Invoking Ruby method


VALUE rb_funcall(VALUE recv, ID mid, int narg, …)
Invokes a method. To retrieve mid from a method name, use rb_intern(). Able to call even
private/protected methods.
VALUE rb_funcall2(VALUE recv, ID mid, int argc, VALUE *argv)
VALUE rb_funcallv(VALUE recv, ID mid, int argc, VALUE *argv)
Invokes a method, passing arguments as an array of values. Able to call even private/protected
methods.
VALUE rb_funcallv_kw(VALUE recv, ID mid, int argc, VALUE *argv, int kw_splat)
Same as rb_funcallv, using kw_splat to determine whether keyword arguments are passed.
VALUE rb_funcallv_public(VALUE recv, ID mid, int argc, VALUE *argv)
Invokes a method, passing arguments as an array of values. Able to call only public methods.
VALUE rb_funcallv_public_kw(VALUE recv, ID mid, int argc, VALUE *argv, int kw_splat)
Same as rb_funcallv_public, using kw_splat to determine whether keyword arguments are
passed.
VALUE rb_funcall_passing_block(VALUE recv, ID mid, int argc, const VALUE* argv)
Same as rb_funcallv_public, except it passes the currently active block as the block when calling
the method.
VALUE rb_funcall_passing_block_kw(VALUE recv, ID mid, int argc, const VALUE* argv, int
kw_splat)
Same as rb_funcall_passing_block, using kw_splat to determine whether keyword arguments are
passed.
VALUE rb_funcall_with_block(VALUE recv, ID mid, int argc, const VALUE *argv, VALUE
passed_procval)
Same as rb_funcallv_public, except passed_procval specifies the block to pass to the method.
VALUE rb_funcall_with_block_kw(VALUE recv, ID mid, int argc, const VALUE *argv, VALUE
passed_procval, int kw_splat)
Same as rb_funcall_with_block, using kw_splat to determine whether keyword arguments are
passed.
VALUE rb_eval_string(const char *str)
Compiles and executes the string as a Ruby program.
ID rb_intern(const char *name)
Returns ID corresponding to the name.
char *rb_id2name(ID id)
Returns the name corresponding to the ID.
char *rb_class2name(VALUE klass)
Returns the name of the class.
int rb_respond_to(VALUE obj, ID id)
Returns true if the object responds to the message specified by id.
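The visibility difference between rb_funcall and rb_funcallv_public mirrors the difference between send and public_send in Ruby. A small sketch, using a hypothetical class Vault, with the analogous C call noted in comments:

```ruby
# Hypothetical class with one public and one private method.
class Vault
  def label
    "vault"
  end

  private

  def combination
    "12-34-56"
  end
end

vault = Vault.new
mid = :combination # rb_intern("combination") yields the matching ID

# Like rb_funcall / rb_funcallv: private methods can be called.
secret = vault.send(mid)

# Like rb_funcallv_public: only public methods can be called.
refused = begin
  vault.public_send(mid)
  false
rescue NoMethodError
  true
end

# Like rb_respond_to, which checks the message named by the ID.
responds = vault.respond_to?(:label)
```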

Instance Variables
VALUE rb_iv_get(VALUE obj, const char *name)
Retrieve the value of the instance variable. If the name is not prefixed by ‘@’, that variable shall
be inaccessible from Ruby.
VALUE rb_iv_set(VALUE obj, const char *name, VALUE val)
Sets the value of the instance variable.

Control Structure
VALUE rb_block_call(VALUE recv, ID mid, int argc, VALUE * argv, VALUE (*func) (ANYARGS),
VALUE data2)
Calls a method on the recv, with the method name specified by the symbol mid, with argc
arguments in argv, supplying func as the block. When func is called as the block, it will receive
the value from yield as the first argument, and data2 as the second argument. When yielded with
multiple values (in C, rb_yield_values(), rb_yield_values2() and rb_yield_splat()), data2 is packed
as an Array, whereas yielded values can be gotten via argc/argv of the third/fourth arguments.
VALUE rb_block_call_kw(VALUE recv, ID mid, int argc, VALUE * argv, VALUE (*func) (ANYARGS),
VALUE data2, int kw_splat)
Same as rb_block_call, using kw_splat to determine whether keyword arguments are
passed.
[OBSOLETE] VALUE rb_iterate(VALUE (*func1)(), VALUE arg1, VALUE (*func2)(), VALUE arg2)
Calls the function func1, supplying func2 as the block. func1 will be called with the argument
arg1. func2 receives the value from yield as the first argument, arg2 as the second argument.
When rb_iterate is used in 1.9, func1 has to call some Ruby-level method. This function is
obsolete since 1.9; use rb_block_call instead.
VALUE rb_yield(VALUE val)
Yields val as a single argument to the block.
VALUE rb_yield_values(int n, …)
Yields n number of arguments to the block, using one C argument per Ruby argument.
VALUE rb_yield_values2(int n, VALUE *argv)
Yields n number of arguments to the block, with all Ruby arguments in the C argv array.
VALUE rb_yield_values_kw(int n, VALUE *argv, int kw_splat)
Same as rb_yield_values2, using kw_splat to determine whether keyword arguments are passed.
VALUE rb_yield_splat(VALUE args)
Same as rb_yield_values2, except arguments are specified by the Ruby array args.
VALUE rb_yield_splat_kw(VALUE args, int kw_splat)
Same as rb_yield_splat, using kw_splat to determine whether keyword arguments are passed.
VALUE rb_rescue(VALUE (*func1)(ANYARGS), VALUE arg1, VALUE (*func2)(ANYARGS),
VALUE arg2)
Calls the function func1, with arg1 as the argument. If an exception occurs during func1, it calls
func2 with arg2 as the first argument and the exception object as the second argument. The
return value of rb_rescue() is the return value from func1 if no exception occurs, from func2
otherwise.
VALUE rb_ensure(VALUE (*func1)(ANYARGS), VALUE arg1, VALUE (*func2)
(ANYARGS), VALUE arg2)
Calls the function func1 with arg1 as the argument, then calls func2 with arg2 once execution of func1
has terminated, whether normally or by an exception. The return value from rb_ensure() is that of
func1 when no exception occurred.
VALUE rb_protect(VALUE (*func) (VALUE), VALUE arg, int *state)
Calls the function func with arg as the argument. If no exception occurred during func, it returns
the result of func and *state is zero. Otherwise, it returns Qnil and sets *state to nonzero. If state
is NULL, it is not set in either case. You have to clear the error info with rb_set_errinfo(Qnil) when
ignoring the caught exception.
void rb_jump_tag(int state)
Continues the exception caught by rb_protect() and rb_eval_string_protect(). state must be the
returned value from those functions. This function never returns to the caller.
void rb_iter_break()
Exits from the current innermost block. This function never returns to the caller.
void rb_iter_break_value(VALUE value)
Exits from the current innermost block with the value. The block will return the given argument
value. This function never returns to the caller.
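The behavior of rb_rescue, rb_ensure and rb_protect can be modeled with ordinary Ruby exception handling. The helpers below are hypothetical Ruby stand-ins, not the C functions themselves: callables take the place of the C function pointers, and protect_like returns the state alongside the result instead of writing through a pointer. The sketch assumes that rb_rescue handles StandardError and that rb_protect intercepts any exception.

```ruby
# Ruby-level models of rb_rescue, rb_ensure and rb_protect.

def rescue_like(func1, arg1, func2, arg2)
  func1.call(arg1)
rescue StandardError => e
  func2.call(arg2, e) # the exception object is the second argument
end

def ensure_like(func1, arg1, func2, arg2)
  func1.call(arg1)    # this value is returned when no exception occurs
ensure
  func2.call(arg2)    # always runs; its value is discarded
end

# Returns [result, state]: state is 0 on success, nonzero otherwise.
def protect_like(func, arg)
  [func.call(arg), 0]
rescue Exception      # modeled as catching every exception, not only StandardError
  [nil, 1]
end
```

In C the nonzero state would later be handed to rb_jump_tag() to continue the exception; the model simply reports it.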

Exceptions and Errors


void rb_warn(const char *fmt, …)
Prints a warning message according to a printf-like format.
void rb_warning(const char *fmt, …)
Prints a warning message according to a printf-like format, if $VERBOSE is true.
void rb_raise(rb_eRuntimeError, const char *fmt, …)
Raises RuntimeError. The fmt is a format string just like printf().
void rb_raise(VALUE exception, const char *fmt, …)
Raises a class exception. The fmt is a format string just like printf().
void rb_fatal(const char *fmt, …)
Raises a fatal error, terminates the interpreter. No exception handling will be done for fatal errors,
but ensure blocks will be executed.
void rb_bug(const char *fmt, …)
Terminates the interpreter immediately. This function should be called only in situations caused
by a bug in the interpreter. Neither exception handling nor ensure execution will be done.
Note: In the format string, "%"PRIsVALUE can be used for Object#to_s (or Object#inspect if the '+' flag is set)
output (and the related argument must be a VALUE). Since it conflicts with "%i", use "%d" for integers in
format strings.

Threading
As of Ruby 1.9, Ruby supports native 1:1 threading with one kernel thread per Ruby Thread object.
Currently, there is a GVL (Global VM Lock) which prevents simultaneous execution of Ruby code. The GVL
may be released by the rb_thread_call_without_gvl and rb_thread_call_without_gvl2 functions. These
functions are tricky to use and are documented in thread.c; do not use them before reading the comments in
thread.c.
void rb_thread_schedule(void)
Give the scheduler a hint to pass execution to another thread.

Input/Output (IO) on a single file descriptor
int rb_io_wait_readable(int fd)
Wait indefinitely for the given FD to become readable, allowing other threads to be scheduled.
Returns a true value if a read may be performed, false if there is an unrecoverable error.
int rb_io_wait_writable(int fd)
Like rb_io_wait_readable, but for writability.
int rb_wait_for_single_fd(int fd, int events, struct timeval *timeout)
Allows waiting on a single FD for one or multiple events with a specified timeout.
events is a mask of any combination of the following values:
RB_WAITFD_IN - wait for readability of normal data
RB_WAITFD_OUT - wait for writability
RB_WAITFD_PRI - wait for readability of urgent data
Use a NULL timeout to wait indefinitely.

I/O Multiplexing
Ruby supports I/O multiplexing based on the select(2) system call. The Linux select_tut(2) manpage
<man7.org/linux/man-pages/man2/select_tut.2.html> provides a good overview on how to use select(2), and
the Ruby API has analogous functions and data structures to the well-known select API. Understanding of
select(2) is required to understand this section.
typedef struct rb_fdset_t
The data structure which wraps the fd_set bitmap used by select(2). This allows Ruby to use FD
sets larger than that allowed by historic limitations on modern platforms.
void rb_fd_init(rb_fdset_t *)
Initializes the rb_fdset_t. It must be initialized before other rb_fd_* operations. Analogous to
calling malloc(3) to allocate an fd_set.
void rb_fd_term(rb_fdset_t *)
Destroys the rb_fdset_t, releasing any memory and resources it used. It must be reinitialized
using rb_fd_init before future use. Analogous to calling free(3) to release memory for an fd_set.
void rb_fd_zero(rb_fdset_t *)
Clears all FDs from the rb_fdset_t, analogous to FD_ZERO(3).
void rb_fd_set(int fd, rb_fdset_t *)
Adds a given FD in the rb_fdset_t, analogous to FD_SET(3).
void rb_fd_clr(int fd, rb_fdset_t *)
Removes a given FD from the rb_fdset_t, analogous to FD_CLR(3).
int rb_fd_isset(int fd, const rb_fdset_t *)
Returns true if a given FD is set in the rb_fdset_t, false if not. Analogous to FD_ISSET(3).
int rb_thread_fd_select(int nfds, rb_fdset_t *readfds, rb_fdset_t *writefds, rb_fdset_t
*exceptfds, struct timeval *timeout)
Analogous to the select(2) system call, but allows other Ruby threads to be scheduled while
waiting.
When only waiting on a single FD, favor rb_io_wait_readable, rb_io_wait_writable, or
rb_wait_for_single_fd functions since they can be optimized for specific platforms (currently, only
Linux).
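In Ruby code, the same select(2)-style waiting is available as IO.select, which makes it a convenient way to try out the semantics before writing the C version. The sketch below waits on the read end of a pipe; the variable names are illustrative only.

```ruby
# A pipe gives two file descriptors to wait on.
reader, writer = IO.pipe

# Nothing has been written yet, so a zero timeout returns nil (timed out).
before = IO.select([reader], nil, nil, 0)

writer.write("ping")
writer.flush

# Now the read end is readable; select returns [readable, writable, errored].
readable, = IO.select([reader], nil, nil, 1)
data = readable.first.read_nonblock(4)

reader.close
writer.close
```

The three array arguments play the role of readfds, writefds and exceptfds, and the last argument is the timeout in seconds.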

Initialize and Start the Interpreter


The embedding API functions are below (not needed for extension libraries):
void ruby_init()
Initializes the interpreter.
void *ruby_options(int argc, char **argv)
Processes command line arguments for the interpreter and compiles the Ruby source to execute.
It returns an opaque pointer to the compiled source or an internal special value.
int ruby_run_node(void *n)
Runs the given compiled source and exits this process. It returns EXIT_SUCCESS if the source
runs successfully. Otherwise, it returns another value.
void ruby_script(char *name)
Specifies the name of the script ($0).

Hooks for the Interpreter Events


void rb_add_event_hook(rb_event_hook_func_t func, rb_event_flag_t events, VALUE data)
Adds a hook function for the specified interpreter events. events should be OR’ed value of:

RUBY_EVENT_LINE
RUBY_EVENT_CLASS
RUBY_EVENT_END
RUBY_EVENT_CALL
RUBY_EVENT_RETURN
RUBY_EVENT_C_CALL
RUBY_EVENT_C_RETURN
RUBY_EVENT_RAISE
RUBY_EVENT_ALL

The definition of rb_event_hook_func_t is below:

typedef void (*rb_event_hook_func_t)(rb_event_t event, VALUE data,
                                     VALUE self, ID id, VALUE klass)

The third argument ‘data’ to rb_add_event_hook() is passed to the hook function as the second
argument, which was the pointer to the current NODE in 1.8. See
RB_EVENT_HOOKS_HAVE_CALLBACK_DATA below.
int rb_remove_event_hook(rb_event_hook_func_t func)
Removes the specified hook function.
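At the Ruby level, comparable interpreter events are exposed through TracePoint, which can help when deciding which events a C hook should subscribe to. The sketch below records call and return events for a hypothetical method named area; filtering on the method name is a choice made here to keep the output small.

```ruby
# Hypothetical method whose calls and returns will be traced.
def area(w, h)
  w * h
end

events = []
# :call and :return are the TracePoint names for Ruby method
# call/return events (compare RUBY_EVENT_CALL and RUBY_EVENT_RETURN).
trace = TracePoint.new(:call, :return) do |tp|
  events << [tp.event, tp.method_id] if tp.method_id == :area
end

result = nil
trace.enable { result = area(3, 4) } # tracing is active only inside the block
```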

Memory usage
void rb_gc_adjust_memory_usage(ssize_t diff)
Adjusts the amount of registered external memory. You can tell GC how much memory is used
by an external library by this function. Calling this function with positive diff means the memory
usage is increased; new memory block is allocated or a block is reallocated as larger size.
Calling this function with negative diff means the memory usage is decreased; a memory block is
freed or a block is reallocated as smaller size. This function may trigger the GC.

Macros for Compatibility


Some macros to check API compatibilities are available by default.
NORETURN_STYLE_NEW
Means that NORETURN macro is functional style instead of prefix.
HAVE_RB_DEFINE_ALLOC_FUNC
Means that function rb_define_alloc_func() is provided, that means the allocation framework is
used. This is the same as the result of have_func("rb_define_alloc_func", "ruby.h").
HAVE_RB_REG_NEW_STR
Means that function rb_reg_new_str() is provided, that creates Regexp object from String object.
This is the same as the result of have_func("rb_reg_new_str", "ruby.h").
HAVE_RB_IO_T
Means that type rb_io_t is provided.
USE_SYMBOL_AS_METHOD_NAME
Means that Symbols will be returned as method names, e.g., Module#methods,
#singleton_methods and so on.
HAVE_RUBY_*_H
Defined in ruby.h and means corresponding header is available. For instance, when
HAVE_RUBY_ST_H is defined you should use ruby/st.h not mere st.h.
Header files corresponding to these macros may be #included directly from extension libraries.
RB_EVENT_HOOKS_HAVE_CALLBACK_DATA
Means that rb_add_event_hook() takes the third argument ‘data’, to be passed to the given event
hook function.

Defining backward compatible macros for keyword argument functions


Most ruby C extensions are designed to support multiple Ruby versions. In order to correctly support Ruby
2.7+ in regards to keyword argument separation, C extensions need to use *_kw functions. However, these
functions do not exist in Ruby 2.6 and below, so in those cases macros should be defined to allow you to
use the same code on multiple Ruby versions. Here are example macros you can use in extensions that
support Ruby 2.6 (or below) when using the *_kw functions introduced in Ruby 2.7.

#ifndef RB_PASS_KEYWORDS
/* Only define macros on Ruby <2.7 */
#define rb_funcallv_kw(o, m, c, v, kw) rb_funcallv(o, m, c, v)
#define rb_funcallv_public_kw(o, m, c, v, kw) rb_funcallv_public(o, m, c, v)
#define rb_funcall_passing_block_kw(o, m, c, v, kw) rb_funcall_passing_block(o, m, c, v)
#define rb_funcall_with_block_kw(o, m, c, v, b, kw) rb_funcall_with_block(o, m, c, v, b)
#define rb_scan_args_kw(kw, c, v, s, ...) rb_scan_args(c, v, s, __VA_ARGS__)
#define rb_call_super_kw(c, v, kw) rb_call_super(c, v)
#define rb_yield_values_kw(c, v, kw) rb_yield_values2(c, v)
#define rb_yield_splat_kw(a, kw) rb_yield_splat(a)
#define rb_block_call_kw(o, m, c, v, f, p, kw) rb_block_call(o, m, c, v, f, p)
#define rb_fiber_resume_kw(o, c, v, kw) rb_fiber_resume(o, c, v)
#define rb_fiber_yield_kw(c, v, kw) rb_fiber_yield(c, v)
#define rb_enumeratorize_with_size_kw(o, m, c, v, f, kw) rb_enumeratorize_with_size(o, m, c, v, f)
#define SIZED_ENUMERATOR_KW(obj, argc, argv, size_fn, kw_splat) \
    rb_enumeratorize_with_size((obj), ID2SYM(rb_frame_this_func()), \
                               (argc), (argv), (size_fn))
#define RETURN_SIZED_ENUMERATOR_KW(obj, argc, argv, size_fn, kw_splat) do { \
        if (!rb_block_given_p()) \
            return SIZED_ENUMERATOR(obj, argc, argv, size_fn); \
    } while (0)
#define RETURN_ENUMERATOR_KW(obj, argc, argv, kw_splat) RETURN_SIZED_ENUMERATOR(obj, argc, argv, 0)
#define rb_check_funcall_kw(o, m, c, v, kw) rb_check_funcall(o, m, c, v)
#define rb_obj_call_init_kw(o, c, v, kw) rb_obj_call_init(o, c, v)
#define rb_class_new_instance_kw(c, v, k, kw) rb_class_new_instance(c, v, k)
#define rb_proc_call_kw(p, a, kw) rb_proc_call(p, a)
#define rb_proc_call_with_block_kw(p, c, v, b, kw) rb_proc_call_with_block(p, c, v, b)
#define rb_method_call_kw(c, v, m, kw) rb_method_call(c, v, m)
#define rb_method_call_with_block_kw(c, v, m, b, kw) rb_method_call_with_block(c, v, m, b)
#define rb_eval_cmd_kwd(c, a, kw) rb_eval_cmd(c, a, 0)
#endif

Appendix C. Functions available for use in extconf.rb


See documentation for mkmf.

Appendix D. Generational GC
Ruby 2.1 introduced a generational garbage collector (called RGenGC). RGenGC (mostly) keeps
compatibility.
Generally, generational GC requires extension libraries to use a technique called write barriers
(en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29). RGenGC, however, works
fine without write barriers in extension libraries.
If your library adheres to the following tips, performance can be further improved. Especially, the “Don’t
touch pointers directly” section is important.

Incompatibility
You can’t write to the RBASIC(obj)->klass field directly because it is now a const value.
Basically you should not write this field because MRI expects it to be an immutable field, but if you want to
do it in your extension you can use the following functions:
VALUE rb_obj_hide(VALUE obj)
Clear RBasic::klass field. The object will be an internal object. ObjectSpace::each_object can’t
find this object.
VALUE rb_obj_reveal(VALUE obj, VALUE klass)
Reset RBasic::klass to be klass. We expect the ‘klass’ is hidden class by rb_obj_hide().

Write barriers
RGenGC doesn’t require write barriers to support generational GC. However, caring about write barrier can
improve the performance of RGenGC. Please check the following tips.

Don’t touch pointers directly


In MRI (include/ruby/ruby.h), some macros to acquire pointers to the internal data structures are supported
such as RARRAY_PTR(), RSTRUCT_PTR() and so on.
DO NOT USE THESE MACROS and instead use the corresponding C-APIs such as rb_ary_aref(),
rb_ary_store() and so on.

Consider whether to insert write barriers


You don’t need to care about write barriers if you only use built-in types.
If you support T_DATA objects, you may consider using write barriers.
Inserting write barriers into T_DATA objects only works with the following type objects: (a) long-lived
objects, (b) when a huge number of objects are generated and (c) container-type objects that have
references to other objects. If your extension provides such a type of T_DATA objects, consider inserting
write barriers.
(a): short-lived objects don’t become old generation objects. (b): only a few oldgen objects don’t have
performance impact. (c): only a few references don’t have performance impact.
Inserting write barriers is a very difficult hack, and it is easy to introduce critical bugs. Inserting write barriers
also has several areas of overhead. Basically we don’t recommend that you insert write barriers. Please carefully
consider the risks.

Combine with built-in types


Please consider utilizing built-in types. Most built-in types support write barrier, so you can use them to
avoid manually inserting write barriers.
For example, if your T_DATA has references to other objects, then you can move these references to Array.
A T_DATA object only has a reference to an array object. Or you can also use a Struct object to gather a
T_DATA object (without any references) and an Array that contains references.
With use of such techniques, you don’t need to insert write barriers anymore.

Insert write barriers


[AGAIN] Inserting write barriers is a very difficult hack, and it is easy to introduce critical bugs. And inserting
write barriers has several areas of overhead. Basically we don’t recommend you insert write barriers.
Please carefully consider the risks.
Before inserting write barriers, you need to know about RGenGC algorithm (gc.c will help you). Macros and
functions to insert write barriers are available in include/ruby/ruby.h. An example is available in iseq.c.
For a complete guide for RGenGC and write barriers, please refer to <bugs.ruby-lang.org/projects/ruby-
master/wiki/RGenGC>.
Appendix E. RB_GC_GUARD to protect from premature GC
C Ruby currently uses conservative garbage collection, thus VALUE variables must remain visible on the
stack or registers to ensure any associated data remains usable. Optimizing C compilers are not designed
with conservative garbage collection in mind, so they may optimize away the original VALUE even if the
code depends on data associated with that VALUE.
The following example illustrates the use of RB_GC_GUARD to ensure the contents of sptr remain valid
while the second invocation of rb_str_new_cstr is running.

VALUE s, w;

const char *sptr;

s = rb_str_new_cstr("hello world!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!");

sptr = RSTRING_PTR(s);

w = rb_str_new_cstr(sptr + 6); /* Possible GC invocation */

RB_GC_GUARD(s); /* ensure s (and thus sptr) do not get GC-ed */

In the above example, RB_GC_GUARD must be placed after the last use of sptr. Placing RB_GC_GUARD
before dereferencing sptr would be of no use. RB_GC_GUARD is only effective on the VALUE data type,
not converted C data types.
RB_GC_GUARD would not be necessary at all in the above example if non-inlined function calls are made
on the ‘s’ VALUE after sptr is dereferenced. Thus, in the above example, calling any un-inlined function on
‘s’ such as:

rb_str_modify(s);

Will ensure ‘s’ stays on the stack or register to prevent a GC invocation from prematurely freeing it.
Using the RB_GC_GUARD macro is preferable to using the “volatile” keyword in C. RB_GC_GUARD has
the following advantages:
1. the intent of the macro use is clear
2. RB_GC_GUARD only affects its call site, whereas “volatile” generates some extra code every time the
variable is used, hurting optimization.
3. “volatile” implementations may be buggy/inconsistent in some compilers and architectures.
RB_GC_GUARD is customizable for broken systems/compilers without negatively affecting other
systems.
Appendix F. Ractor support
Ractor(s) are the parallel execution mechanism introduced in Ruby 3.0. All ractors can run in parallel on a
different OS thread (using an underlying system provided thread), so the C extension should be thread-safe.
A C extension that can run in multiple ractors is called “Ractor-safe”.
Ractor safety around C extensions has the following properties:
1. By default, all C extensions are recognized as Ractor-unsafe.
2. Ractor-unsafe C-methods may only be called from the main Ractor. If invoked by a non-
main Ractor, then a Ractor::UnsafeError is raised.
3. If an extension desires to be marked as Ractor-safe the extension should call
rb_ext_ractor_safe(true) at the Init_ function for the extension, and all defined methods will be
marked as Ractor-safe.
To make a “Ractor-safe” C extension, we need to check the following points:
(1) Do not share unshareable objects between ractors
For example, a C global variable can lead to sharing unshareable objects between ractors.

VALUE g_var;

VALUE set(VALUE self, VALUE v){ return g_var = v; }

VALUE get(VALUE self){ return g_var; }

The set() and get() pair can share an unshareable object through g_var, so it is Ractor-unsafe.
Besides direct use of global variables, indirect data structures such as a global st_table can also share
objects, so please take care.
Note that class and module objects are shareable objects, so you can keep the code “cFoo =
rb_define_class(…)” with C’s global variables.
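The difference between shareable and unshareable objects can also be observed from the Ruby side. The following is a minimal sketch, assuming Ruby 3.0 or later; the variable names are illustrative only.

```ruby
# Classes and modules are shareable, which is why keeping them in C globals is fine.
class_shareable = Ractor.shareable?(String)

# A mutable string is not shareable; a frozen one is.
mutable_shareable = Ractor.shareable?(+"hello")
frozen_shareable  = Ractor.shareable?("hello".freeze)

# Ractor.make_shareable deep-freezes an object graph so it can be shared.
config = Ractor.make_shareable({ name: +"demo", sizes: [1, 2, 3] })
config_shareable = Ractor.shareable?(config)
```

An object stored in a C global and handed out to several ractors would need to pass this kind of check to be safe.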
(2) Check the thread-safety of the extension
An extension should be thread-safe. For example, the following code is not thread-safe:

bool g_called = false;

VALUE call(VALUE self) {
    /* Unsynchronized check-then-set on a global: a data race across ractors. */
    if (g_called) rb_raise(rb_eRuntimeError, "recursive call is not allowed.");
    g_called = true;
    VALUE ret = do_something();
    g_called = false;
    return ret;
}

because access to the g_called global variable is not synchronized against threads of other ractors. To avoid
such data races, some synchronization should be used. Check include/ruby/thread_native.h and include/ruby/atomic.h.
With Ractors, all objects given as method parameters and the receiver (self) are guaranteed to be from the
current Ractor or to be shareable. As a consequence, it is easier to make code ractor-safe than to make
code generally thread-safe. For example, we don’t need to lock an array object to access the element of it.
(3) Check the thread-safety of any used library
If the extension relies on an external library, such as a function foo() from a library libfoo, then foo()
itself should be thread-safe.
(4) Make an object shareable
This is not required to make an extension Ractor-safe.
If an extension provides special objects defined by rb_data_type_t, consider whether these objects can
become shareable.
RUBY_TYPED_FROZEN_SHAREABLE flag indicates that these objects can be shareable objects if the
object is frozen. This means that if the object is frozen, the mutation of wrapped data is not allowed.
(5) Others
There are possibly other points or requirements which must be considered in the making of a Ractor-safe
extension. This document will be extended as they are discovered.

Fiber
Fibers provide a mechanism for cooperative concurrency.

Context Switching
Fibers execute a user-provided block. During the execution, the block may
call Fiber.yield or Fiber.transfer to switch to another fiber. Fiber#resume is used to continue execution from
the point where Fiber.yield was called.
#!/usr/bin/env ruby

puts "1: Start program."

f = Fiber.new do
  puts "3: Entered fiber."
  Fiber.yield
  puts "5: Resumed fiber."
end

puts "2: Resume fiber first time."
f.resume

puts "4: Resume fiber second time."
f.resume

puts "6: Finished."

This program demonstrates the flow control of fibers.
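Context switches can also carry values in both directions. The sketch below is illustrative only (the names counter, step and results are not from the original text): arguments to Fiber#resume become the return value of Fiber.yield, and the argument to Fiber.yield becomes the return value of Fiber#resume.

```ruby
# A fiber that hands a running total back to its caller.
counter = Fiber.new do |start|
  value = start
  3.times do
    # Fiber.yield returns whatever is passed to the next #resume call.
    step = Fiber.yield(value)
    value += step
  end
  :done
end

results = []
results << counter.resume(10) # starts the block with start = 10
results << counter.resume(1)  # step = 1
results << counter.resume(5)  # step = 5
results << counter.resume(0)  # step = 0; the loop ends and the block returns
```

After the final resume the block has returned, so the fiber is no longer alive and resuming it again would raise FiberError.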

Scheduler
The scheduler interface is used to intercept blocking operations. A typical implementation would be a
wrapper for a gem like EventMachine or Async. This design provides separation of concerns between the
event loop implementation and application code. It also allows for layered schedulers which can perform
instrumentation.
To set the scheduler for the current thread:

Fiber.set_scheduler(MyScheduler.new)

When the thread exits, there is an implicit call to set_scheduler:

Fiber.set_scheduler(nil)

Design
The scheduler interface is designed to be an unopinionated, lightweight layer between user code and
blocking operations. The scheduler hooks should avoid translating or converting arguments or return values.
Ideally, the exact same arguments from the user code are provided directly to the scheduler hook with no
changes.

Interface
This is the interface you need to implement.
class Scheduler
  # Wait for the specified process ID to exit.
  # This hook is optional.
  # @parameter pid [Integer] The process ID to wait for.
  # @parameter flags [Integer] A bit-mask of flags suitable for `Process::Status.wait`.
  # @returns [Process::Status] A process status instance.
  def process_wait(pid, flags)
    Thread.new do
      Process::Status.wait(pid, flags)
    end.value
  end

  # Wait for the given io readiness to match the specified events within
  # the specified timeout.
  # @parameter io [IO] The io to wait on.
  # @parameter events [Integer] A bit mask of `IO::READABLE`,
  #   `IO::WRITABLE` and `IO::PRIORITY`.
  # @parameter timeout [Numeric] The amount of time to wait for the event in seconds.
  # @returns [Integer] The subset of events that are ready.
  def io_wait(io, events, timeout)
  end

  # Read from the given io into the specified buffer.
  # WARNING: Experimental hook! Do not use in production code!
  # @parameter io [IO] The io to read from.
  # @parameter buffer [IO::Buffer] The buffer to read into.
  # @parameter length [Integer] The minimum amount to read.
  def io_read(io, buffer, length)
  end

  # Write from the given buffer into the specified IO.
  # WARNING: Experimental hook! Do not use in production code!
  # @parameter io [IO] The io to write to.
  # @parameter buffer [IO::Buffer] The buffer to write from.
  # @parameter length [Integer] The minimum amount to write.
  def io_write(io, buffer, length)
  end

  # Sleep the current task for the specified duration, or forever if not
  # specified.
  # @parameter duration [Numeric] The amount of time to sleep in seconds.
  def kernel_sleep(duration = nil)
  end

  # Execute the given block. If the block execution exceeds the given timeout,
  # the specified exception `klass` will be raised. Typically, only non-blocking
  # methods which enter the scheduler will raise such exceptions.
  # @parameter duration [Integer] The amount of time to wait, after which an exception will be raised.
  # @parameter klass [Class] The exception class to raise.
  # @parameter *arguments [Array] The arguments to send to the constructor of the exception.
  # @yields {...} The user code to execute.
  def timeout_after(duration, klass, *arguments, &block)
  end

  # Resolve hostname to an array of IP addresses.
  # This hook is optional.
  # @parameter hostname [String] Example: "www.ruby-lang.org".
  # @returns [Array] An array of IPv4 and/or IPv6 address strings that the hostname resolves to.
  def address_resolve(hostname)
  end

  # Block the calling fiber.
  # @parameter blocker [Object] What we are waiting on, informational only.
  # @parameter timeout [Numeric | Nil] The amount of time to wait for in seconds.
  # @returns [Boolean] Whether the blocking operation was successful or not.
  def block(blocker, timeout = nil)
  end

  # Unblock the specified fiber.
  # @parameter blocker [Object] What we are waiting on, informational only.
  # @parameter fiber [Fiber] The fiber to unblock.
  # @reentrant Thread safe.
  def unblock(blocker, fiber)
  end

  # Intercept the creation of a non-blocking fiber.
  # @returns [Fiber]
  def fiber(&block)
    Fiber.new(blocking: false, &block)
  end

  # Invoked when the thread exits.
  def close
    self.run
  end

  def run
    # Implement event loop here.
  end
end

Additional hooks may be introduced in the future; we will use feature detection in order to enable these
hooks.
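Feature detection of this kind can be pictured as a respond_to? check on the scheduler object. The sketch below is only an illustration: MinimalScheduler and optional_hook_available? are hypothetical names, and the real detection happens inside the Ruby runtime rather than in user code.

```ruby
# Hypothetical scheduler that implements only a subset of the hooks.
class MinimalScheduler
  def kernel_sleep(duration = nil); end
  def block(blocker, timeout = nil); end
  def unblock(blocker, fiber); end
  def io_wait(io, events, timeout); end
end

# Illustrative helper: an optional hook is used only if the scheduler defines it.
def optional_hook_available?(scheduler, hook)
  scheduler.respond_to?(hook)
end

scheduler   = MinimalScheduler.new
has_sleep   = optional_hook_available?(scheduler, :kernel_sleep)
has_resolve = optional_hook_available?(scheduler, :address_resolve)
```

A scheduler written this way keeps working when new optional hooks appear, because hooks it does not define are simply not called.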

Non-blocking Execution
The scheduler hooks will only be used in special non-blocking execution contexts. Non-blocking execution
contexts introduce non-determinism because the execution of scheduler hooks may introduce context
switching points into your program.

Fibers
Fibers can be used to create non-blocking execution contexts.

Fiber.new do
  puts Fiber.current.blocking? # false

  # May invoke `Fiber.scheduler&.io_wait`.
  io.read(...)

  # May invoke `Fiber.scheduler&.io_wait`.
  io.write(...)

  # Will invoke `Fiber.scheduler&.kernel_sleep`.
  sleep(n)
end.resume

We also introduce a new method which simplifies the creation of these non-blocking fibers:

Fiber.schedule do
  puts Fiber.current.blocking? # false
end

The purpose of this method is to allow the scheduler to internally decide the policy for when to start the
fiber, and whether to use symmetric or asymmetric fibers.
You can also create blocking execution contexts:

Fiber.new(blocking: true) do
  # Won't use the scheduler:
  sleep(n)
end

However, you should generally avoid this unless you are implementing a scheduler.
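The blocking state of an execution context can be inspected with Fiber#blocking?. A minimal sketch, assuming Ruby 3.0 or later (the require is only needed before Ruby 3.1, where Fiber.current lives in the fiber library); the variable names are illustrative.

```ruby
require 'fiber'

# A fiber created with the default options is a non-blocking context.
non_blocking = Fiber.new { Fiber.current.blocking? }.resume

# Passing blocking: true opts out of the scheduler hooks.
blocking = Fiber.new(blocking: true) { Fiber.current.blocking? }.resume

# The main fiber of a thread is a blocking context.
main_blocking = Fiber.current.blocking?
```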
IO
By default, I/O is non-blocking. Not all operating systems support non-blocking I/O. Windows is a notable
example where socket I/O can be non-blocking but pipe I/O is blocking. Provided that there is a scheduler
and the current thread is non-blocking, the operation will invoke the scheduler.

Mutex
The Mutex class can be used in a non-blocking context and is fiber specific.

ConditionVariable
The ConditionVariable class can be used in a non-blocking context and is fiber-specific.

Queue / SizedQueue
The Queue and SizedQueue classes can be used in a non-blocking context and are fiber-specific.
Thread
The Thread#join operation can be used in a non-blocking context and is fiber-specific.

Format Specifications
Several Ruby core classes have instance method printf or sprintf:
 ARGF#printf
 IO#printf
 Kernel#printf
 Kernel#sprintf
Each of these methods takes:
 Argument format_string, which has zero or more embedded format specifications (see below).
 Arguments *arguments, which are zero or more objects to be formatted.
Each of these methods prints or returns the string resulting from replacing each format specification
embedded in format_string with a string form of the corresponding argument among arguments.
A simple example:

sprintf('Name: %s; value: %d', 'Foo', 0) # => "Name: Foo; value: 0"

A format specification has the form:

%[flags][width][.precision]type

It consists of:
 A leading percent character.
 Zero or more flags (each is a character).
 An optional width specifier (an integer).
 An optional precision specifier (a period followed by a non-negative integer).
 A type specifier (a character).
Except for the leading percent character, the only required part is the type specifier, so we begin with that.
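As an illustration of how the parts combine, the sketch below builds up one specification piece by piece. The variable names are illustrative only.

```ruby
# Only the type specifier is required.
bare = sprintf('%f', 3.14159)       # => "3.141590"

# Width 8 and precision 3, no flags.
padded = sprintf('%8.3f', 3.14159)  # => "   3.142"

# Flags '+' and '0', width 8, precision 3, type f.
full = sprintf('%+08.3f', 3.14159)  # => "+003.142"
```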

Type Specifiers
This section provides a brief explanation of each type specifier. The links lead to the details and examples.

Integer Type Specifiers


 b or B: Format argument as a binary integer. See Specifiers b and B.
 d, i, or u (all are identical): Format argument as a decimal integer. See Specifier d.
 o: Format argument as an octal integer. See Specifier o.
 x or X: Format argument as a hexadecimal integer. See Specifiers x and X.

Floating-Point Type Specifiers


 a or A: Format argument as hexadecimal floating-point number. See Specifiers a and A.
 e or E: Format argument in scientific notation. See Specifiers e and E.
 f: Format argument as a decimal floating-point number. See Specifier f.
 g or G: Format argument in a “general” format. See Specifiers g and G.

Other Type Specifiers


 c: Format argument as a character. See Specifier c.
 p: Format argument as a string via argument.inspect. See Specifier p.
 s: Format argument as a string via argument.to_s. See Specifier s.
 %: Format argument ('%') as a single percent character. See Specifier %.

Flags
The effect of a flag may vary greatly among type specifiers. These remarks are general in nature. See type-
specific details.
Multiple flags may be given with a single type specifier; order does not matter.
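A short sketch of combining flags, with illustrative variable names. The trailing '|' is literal text that makes the padding visible.

```ruby
# '+' (explicit sign) and '-' (left-justify) in either order.
a = sprintf('%+-8d|', 42)   # => "+42     |"
b = sprintf('%-+8d|', 42)   # => "+42     |"

# '#' (alternate format) and '0' (zero padding) in either order.
hex_a = sprintf('%#08x', 100)
hex_b = sprintf('%0#8x', 100)
```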
' ' Flag
Insert a space before a non-negative number:

sprintf('%d', 10) # => "10"


sprintf('% d', 10) # => " 10"

Insert a minus sign for negative value:

sprintf('%d', -10) # => "-10"


sprintf('% d', -10) # => "-10"

'#' Flag
Use an alternate format; varies among types:

sprintf('%x', 100) # => "64"


sprintf('%#x', 100) # => "0x64"

'+' Flag
Add a leading plus sign for a non-negative number:

sprintf('%x', 100) # => "64"


sprintf('%+x', 100) # => "+64"

'-' Flag
Left justify the value in its field:

sprintf('%6d', 100) # => "   100"


sprintf('%-6d', 100) # => "100   "

'0' Flag
Left-pad with zeros instead of spaces:

sprintf('%6d', 100) # => "   100"


sprintf('%06d', 100) # => "000100"

'*' Flag
Use the next argument as the field width:

sprintf('%d', 20, 14) # => "20"


sprintf('%*d', 20, 14) # => "                  14"

'n$' Flag
Format the (1-based) nth argument into this field:

sprintf("%s %s", 'world', 'hello') # => "world hello"


sprintf("%2$s %1$s", 'world', 'hello') # => "hello world"

Width Specifier
In general, a width specifier determines the minimum width (in characters) of the formatted field:

sprintf('%10d', 100) # => "       100"

# Left-justify if negative.
sprintf('%-10d', 100) # => "100       "

# Ignore if too small.


sprintf('%1d', 100) # => "100"

Precision Specifier
A precision specifier is a decimal point followed by zero or more decimal digits.
For integer type specifiers, the precision specifies the minimum number of digits to be written. If the
integer has fewer digits than the precision, the result is padded with leading zeros. There is no modification or
truncation of the result if the integer is longer than the precision:

sprintf('%.3d', 1) # => "001"


sprintf('%.3d', 1000) # => "1000"

# If the precision is 0 and the value is 0, nothing is written


sprintf('%.d', 0) # => ""
sprintf('%.0d', 0) # => ""

For the a/A, e/E, f/F specifiers, the precision specifies the number of digits after the decimal point to be
written:

sprintf('%.2f', 3.14159) # => "3.14"


sprintf('%.10f', 3.14159) # => "3.1415900000"

# With no precision specifier, defaults to 6-digit precision.


sprintf('%f', 3.14159) # => "3.141590"

For the g/G specifiers, the precision specifies the number of significant digits to be written:

sprintf('%.2g', 123.45) # => "1.2e+02"


sprintf('%.3g', 123.45) # => "123"
sprintf('%.10g', 123.45) # => "123.45"

# With no precision specifier, defaults to 6 significant digits.


sprintf('%g', 123.456789) # => "123.457"

For the s, p specifiers, the precision specifies the number of characters to write:

sprintf('%s', Time.now) # => "2022-05-04 11:59:16 -0400"


sprintf('%.10s', Time.now) # => "2022-05-04"

Type Specifier Details and Examples


Specifiers a and A
Format argument as hexadecimal floating-point number:
sprintf('%a', 3.14159) # => "0x1.921f9f01b866ep+1"
sprintf('%a', -3.14159) # => "-0x1.921f9f01b866ep+1"
sprintf('%a', 4096) # => "0x1p+12"
sprintf('%a', -4096) # => "-0x1p+12"

# Capital 'A' means that alphabetical characters are printed in upper case.
sprintf('%A', 4096) # => "0X1P+12"
sprintf('%A', -4096) # => "-0X1P+12"

Specifiers b and B
The two specifiers b and B behave identically except when flag '#' is used.
Format argument as a binary integer:

sprintf('%b', 1) # => "1"


sprintf('%b', 4) # => "100"

# Prefix '..' for negative value.


sprintf('%b', -4) # => "..100"

# Alternate format.
sprintf('%#b', 4) # => "0b100"
sprintf('%#B', 4) # => "0B100"

Specifier c
Format argument as a single character:

sprintf('%c', 'A') # => "A"


sprintf('%c', 65) # => "A"

Specifier d
Format argument as a decimal integer:

sprintf('%d', 100) # => "100"


sprintf('%d', -100) # => "-100"

Flag '#' does not apply.


Specifiers e and E
Format argument in scientific notation:

sprintf('%e', 3.14159) # => "3.141590e+00"


sprintf('%E', -3.14159) # => "-3.141590E+00"

Specifier f
Format argument as a floating-point number:

sprintf('%f', 3.14159) # => "3.141590"


sprintf('%f', -3.14159) # => "-3.141590"

Flag '#' does not apply.


Specifiers g and G
Format argument using exponential form (e/E specifier) if the exponent is less than -4 or greater than or
equal to the precision. Otherwise format argument using floating-point form (f specifier):

sprintf('%g', 100) # => "100"


sprintf('%g', 100.0) # => "100"
sprintf('%g', 3.14159) # => "3.14159"
sprintf('%g', 100000000000) # => "1e+11"
sprintf('%g', 0.000000000001) # => "1e-12"

# Capital 'G' means use capital 'E'.


sprintf('%G', 100000000000) # => "1E+11"
sprintf('%G', 0.000000000001) # => "1E-12"

# Alternate format.
sprintf('%#g', 100000000000) # => "1.00000e+11"
sprintf('%#g', 0.000000000001) # => "1.00000e-12"
sprintf('%#G', 100000000000) # => "1.00000E+11"
sprintf('%#G', 0.000000000001) # => "1.00000E-12"
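The switch between the two forms can be seen at the boundaries of the rule. In the sketch below (illustrative variable names), the default precision of 6 applies unless one is given.

```ruby
small_fixed = sprintf('%g', 0.0001)     # exponent -4: fixed form          => "0.0001"
small_exp   = sprintf('%g', 0.00001)    # exponent -5: exponential form    => "1e-05"
large_fixed = sprintf('%g', 999999.0)   # exponent 5 < precision 6         => "999999"
large_exp   = sprintf('%g', 1000000.0)  # exponent 6 >= precision 6        => "1e+06"
tight       = sprintf('%.2g', 123.45)   # exponent 2 >= precision 2        => "1.2e+02"
```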

Specifier o
Format argument as an octal integer. If argument is negative, it will be formatted as a two’s complement
prefixed with ..7:

sprintf('%o', 16) # => "20"

# Prefix '..7' for negative value.


sprintf('%o', -16) # => "..760"

# Prefix zero for alternate format if positive.


sprintf('%#o', 16) # => "020"
sprintf('%#o', -16) # => "..760"

Specifier p
Format argument as a string via argument.inspect:

t = Time.now
sprintf('%p', t) # => "2022-05-01 13:42:07.1645683 -0500"

Specifier s
Format argument as a string via argument.to_s:
t = Time.now
sprintf('%s', t) # => "2022-05-01 13:42:07 -0500"

Flag '#' does not apply.


Specifiers x and X
Format argument as a hexadecimal integer. If argument is negative, it will be formatted as a two’s
complement prefixed with ..f:

sprintf('%x', 100) # => "64"

# Prefix '..f' for negative value.


sprintf('%x', -100) # => "..f9c"

# Use alternate format.


sprintf('%#x', 100) # => "0x64"

# Alternate format for negative value.


sprintf('%#x', -100) # => "0x..f9c"
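When the two's complement form is not wanted, the '+' or ' ' flag makes a negative value print as a sign followed by its magnitude. A brief sketch with illustrative variable names:

```ruby
twos_complement = sprintf('%x', -100)   # => "..f9c"
signed_plus     = sprintf('%+x', -100)  # => "-64"
signed_space    = sprintf('% x', -100)  # => "-64"
signed_octal    = sprintf('%+o', -16)   # => "-20"
```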

Specifier %
Format argument ('%') as a single percent character:

sprintf('%d %%', 100) # => "100 %"

Flags do not apply.

Reference by Name
For more complex formatting, Ruby supports a reference by name. The %<name>s style applies the type
specifier that follows the name, but the %{name} style substitutes the value as a string and takes no type specifier.
Examples:

sprintf("%<foo>d : %<bar>f", { :foo => 1, :bar => 2 }) # => "1 : 2.000000"


sprintf("%{foo}f", { :foo => 1 }) # => "1f"
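The two styles can be compared side by side on the same hash. This is a small sketch; the hash row and its keys are made up for illustration.

```ruby
row = { item: 'Widget', price: 3.5, qty: 12 }

# %<name> is followed by the usual precision and type specifier.
typed = sprintf('%<item>s costs %<price>.2f (x%<qty>d)', row)
# => "Widget costs 3.50 (x12)"

# %{name} inserts the value's string form; no type specifier is involved.
plain = sprintf('%{item}: %{price}', row)
# => "Widget: 3.5"
```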

Pre-defined global variables


$!
The Exception object set by Kernel#raise.
$@
The same as $!.backtrace.
$~
The information about the last match in the current scope (thread-local and frame-local).
$&
The string matched by the last successful match.
$`
The string to the left of the last successful match.
$'
The string to the right of the last successful match.
$+
The highest group matched by the last successful match.
$1
The Nth group of the last successful match. May be > 1.
$=
This variable is no longer effective. Deprecated.
$/
The input record separator, newline by default. Aliased to $-0.
$\
The output record separator for Kernel#print and IO#write. Default is nil.
$,
The output field separator for Kernel#print and Array#join. Non-nil $, will be deprecated.
$;
The default separator for String#split. Non-nil $; will be deprecated. Aliased to $-F.
$.
The current input line number of the last file that was read.
$<
The same as ARGF.
$>
The default output stream for Kernel#print and Kernel#printf. $stdout by default.
$_
The last input line of string by gets or readline.
$0
Contains the name of the script being executed. May be assignable.
$*
The same as ARGV.
$$
The process number of the Ruby running this script. Same as Process.pid.
$?
The status of the last executed child process (thread-local).
$LOAD_PATH
Load path for searching Ruby scripts and extension libraries used
by Kernel#load and Kernel#require. Aliased to $: and $-I. Has a singleton
method $LOAD_PATH.resolve_feature_path(feature) that returns [:rb or :so, path], which
resolves the feature to the path the original Kernel#require method would load.
$LOADED_FEATURES
The array contains the module names loaded by require. Aliased to $".
$DEBUG
The debug flag, which is set by the -d switch. Enabling debug output prints each exception raised
to $stderr (but not its backtrace). Setting this to a true value enables debug output as if -d were
given on the command line. Setting this to a false value disables debug output. Aliased to $-d.
$FILENAME
Current input filename from ARGF. Same as ARGF.filename.
$stderr
The current standard error output.
$stdin
The current standard input.
$stdout
The current standard output.
$VERBOSE
The verbose flag, which is set by the -w or -v switch. Setting this to a true value enables warnings
as if -w or -v were given on the command line. Setting this to nil disables warnings, including
from Kernel#warn. Aliased to $-v and $-w.
$-a
True if option -a is set. Read-only variable.
$-i
In in-place-edit mode, this variable holds the extension, otherwise nil.
$-l
True if option -l is set. Read-only variable.
$-p
True if option -p is set. Read-only variable.
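The match-related variables above are easiest to understand together. The sketch below uses a hypothetical helper, match_globals, that collects them right after a successful match; because these variables are frame-local, they are read in the same method that performed the match.

```ruby
# Hypothetical helper: run one match and collect the related globals.
def match_globals(text)
  return nil unless text =~ /(\d+)-(\d+)/

  {
    matched:    $&,          # the matched text
    before:     $`,          # text to the left of the match
    after:      $',          # text to the right of the match
    first:      $1,          # first capture group
    second:     $2,          # second capture group
    last_group: $+,          # highest group that matched
    position:   $~.begin(0)  # $~ is the MatchData object
  }
end

info = match_globals('range 10-25 units')
```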

Pre-defined global constants


STDIN
The standard input. The default value for $stdin.
STDOUT
The standard output. The default value for $stdout.
STDERR
The standard error output. The default value for $stderr.
ENV
The hash contains current environment variables.
ARGF
The virtual concatenation of the files given on command line (or from $stdin if no files were
given).
ARGV
An Array of command line arguments given for the script.
DATA
The file object of the script, pointing just after __END__.
TOPLEVEL_BINDING
The Binding of the top level scope.
RUBY_VERSION
The Ruby language version.
RUBY_RELEASE_DATE
The release date string.
RUBY_PLATFORM
The platform identifier.
RUBY_PATCHLEVEL
The patchlevel for this Ruby. If this is a development build of Ruby the patchlevel will be -1.
RUBY_REVISION
The GIT commit hash for this Ruby.
RUBY_COPYRIGHT
The copyright string for Ruby.
RUBY_ENGINE
The name of the Ruby implementation.
RUBY_ENGINE_VERSION
The version of the Ruby implementation.
RUBY_DESCRIPTION
The same as ruby --version, a String describing various aspects of the Ruby implementation.
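A common use of these constants is to report or gate on the running interpreter. The sketch below assumes RubyGems is loaded (the default), since it uses Gem::Version for a numeric comparison; the variable names are illustrative.

```ruby
# Compare versions numerically rather than as plain strings.
scheduler_capable = Gem::Version.new(RUBY_VERSION) >= Gem::Version.new('3.0')

# Build a short description from the individual constants.
summary = "#{RUBY_ENGINE} #{RUBY_VERSION} on #{RUBY_PLATFORM}"
```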

Implicit Conversions
Some Ruby methods accept one or more objects that can be either:
 Of a given class, and so accepted as is.
 Implicitly convertible to that class, in which case the called method converts the object.
For each of the relevant classes, the conversion is done by calling a specific conversion method:
 Array: to_ary
 Hash: to_hash
 Integer: to_int
 String: to_str
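The following sketch shows implicit conversion happening inside core methods. The classes Label and Count are hypothetical and exist only for this illustration.

```ruby
# Hypothetical class that is implicitly convertible to String.
class Label
  def initialize(text)
    @text = text
  end

  def to_str
    @text
  end
end

# Hypothetical class that is implicitly convertible to Integer.
class Count
  def initialize(number)
    @number = number
  end

  def to_int
    @number
  end
end

greeting = 'Hello, ' + Label.new('world')  # String#+ calls to_str
third    = %w[a b c d][Count.new(2)]       # Array#[] calls to_int

# An object without the conversion method is rejected with TypeError.
message = begin
  'Hello, ' + 42
rescue TypeError => e
  e.message
end
```

Note that to_s and to_i are not used here: implicit conversion relies only on to_str, to_int, to_ary and to_hash.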

Array-Convertible Objects
An Array-convertible object is an object that:
 Has instance method to_ary.
 The method accepts no arguments.
 The method returns an object obj for which obj.kind_of?(Array) returns true.
The Ruby core class that satisfies these requirements is:
 Array
The examples in this section use method Array#replace, which accepts an Array-convertible argument.
This class is Array-convertible:

class ArrayConvertible
def to_ary
[:foo, 'bar', 2]
end
end
a = []
a.replace(ArrayConvertible.new) # => [:foo, "bar", 2]

This class is not Array-convertible (no to_ary method):

class NotArrayConvertible; end


a = []
# Raises TypeError (no implicit conversion of NotArrayConvertible into Array)
a.replace(NotArrayConvertible.new)

This class is not Array-convertible (method to_ary takes arguments):

class NotArrayConvertible
def to_ary(x)
[:foo, 'bar', 2]
end
end
a = []
# Raises ArgumentError (wrong number of arguments (given 0, expected 1))
a.replace(NotArrayConvertible.new)

This class is not Array-convertible (method to_ary returns non-Array):


class NotArrayConvertible
def to_ary
:foo
end
end
a = []
# Raises TypeError (can't convert NotArrayConvertible to Array (NotArrayConvertible#to_ary gives Symbol))
a.replace(NotArrayConvertible.new)

Hash-Convertible Objects
A Hash-convertible object is an object that:
 Has instance method to_hash.
 The method accepts no arguments.
 The method returns an object obj for which obj.kind_of?(Hash) returns true.
The Ruby core class that satisfies these requirements is:
 Hash
The examples in this section use method Hash#merge, which accepts a Hash-convertible argument.
This class is Hash-convertible:

class HashConvertible
def to_hash
{foo: 0, bar: 1, baz: 2}
end
end
h = {}
h.merge(HashConvertible.new) # => {:foo=>0, :bar=>1, :baz=>2}

This class is not Hash-convertible (no to_hash method):

class NotHashConvertible; end


h = {}
# Raises TypeError (no implicit conversion of NotHashConvertible into Hash)
h.merge(NotHashConvertible.new)

This class is not Hash-convertible (method to_hash takes arguments):

class NotHashConvertible
def to_hash(x)
{foo: 0, bar: 1, baz: 2}
end
end
h = {}
# Raises ArgumentError (wrong number of arguments (given 0, expected 1))
h.merge(NotHashConvertible.new)

This class is not Hash-convertible (method to_hash returns non-Hash):

class NotHashConvertible
def to_hash
:foo
end
end
h = {}
# Raises TypeError (can't convert NotHashConvertible to Hash (NotHashConvertible#to_hash gives Symbol))
h.merge(NotHashConvertible.new)

Integer-Convertible Objects
An Integer-convertible object is an object that:
 Has instance method to_int.
 The method accepts no arguments.
 The method returns an object obj for which obj.kind_of?(Integer) returns true.
The Ruby core classes that satisfy these requirements are:
 Integer
 Float
 Complex
 Rational
The examples in this section use method Array.new, which accepts an Integer-convertible argument.
This user-defined class is Integer-convertible:

class IntegerConvertible
def to_int
3
end
end
a = Array.new(IntegerConvertible.new).size
a # => 3

This class is not Integer-convertible (method to_int takes arguments):

class NotIntegerConvertible
def to_int(x)
3
end
end
# Raises ArgumentError (wrong number of arguments (given 0, expected 1))
Array.new(NotIntegerConvertible.new)

This class is not Integer-convertible (method to_int returns non-Integer):

class NotIntegerConvertible
def to_int
:foo
end
end
# Raises TypeError (can't convert NotIntegerConvertible to Integer (NotIntegerConvertible#to_int gives Symbol))
Array.new(NotIntegerConvertible.new)

String-Convertible Objects
A String-convertible object is an object that:
 Has instance method to_str.
 The method accepts no arguments.
 The method returns an object obj for which obj.kind_of?(String) returns true.
The Ruby core class that satisfies these requirements is:
 String
The examples in this section use method String::new, which accepts a String-convertible argument.
This class is String-convertible:

class StringConvertible
def to_str
'foo'
end
end
String.new(StringConvertible.new) # => "foo"

This class is not String-convertible (no to_str method):

class NotStringConvertible; end


# Raises TypeError (no implicit conversion of NotStringConvertible into String)
String.new(NotStringConvertible.new)

This class is not String-convertible (method to_str takes arguments):

class NotStringConvertible
  def to_str(x)
    'foo'
  end
end
# Raises ArgumentError (wrong number of arguments (given 0, expected 1))
String.new(NotStringConvertible.new)

This class is not String-convertible (method to_str returns non-String):

class NotStringConvertible
  def to_str
    :foo
  end
end
# Raises TypeError (can't convert NotStringConvertible to String (NotStringConvertible#to_str gives Symbol))
String.new(NotStringConvertible.new)
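
Implicit String conversion is not limited to String.new. As a small sketch (the class PathLike and the method join_with_prefix are hypothetical names, not from the original), String#+ also accepts any String-convertible argument and raises TypeError for anything else:

```ruby
# Hypothetical String-convertible class, used only for this illustration.
class PathLike
  def initialize(path)
    @path = path
  end

  # Implicit conversion hook; must return a String.
  def to_str
    @path
  end
end

# String#+ converts its argument implicitly through to_str.
def join_with_prefix(prefix, suffix)
  prefix + suffix
end

join_with_prefix('dir/', PathLike.new('a.txt')) # => "dir/a.txt"
```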

Keywords
The following keywords are used by Ruby.
__ENCODING__
The script encoding of the current file. See Encoding.
__LINE__
The line number of this keyword in the current file.
__FILE__
The path to the current file.
BEGIN
Runs before any other code in the current file. See miscellaneous syntax
END
Runs after any other code in the current file. See miscellaneous syntax
alias
Creates an alias between two methods (and other things). See modules and classes syntax
and
Short-circuit Boolean and with lower precedence than &&
begin
Starts an exception handling block. See exceptions syntax
break
Leaves a block early. See control expressions syntax
case
Starts a case expression. See control expressions syntax
class
Creates or opens a class. See modules and classes syntax
def
Defines a method. See methods syntax
defined?
Returns a string describing its argument. See miscellaneous syntax
do
Starts a block.
else
The unhandled condition in case, if and unless expressions. See control expressions
elsif
An alternate condition for an if expression. See control expressions
end
The end of a syntax block. Used by classes, modules, methods, exception handling and control
expressions.
ensure
Starts a section of code that is always run, whether or not an exception is raised. See exception handling
false
Boolean false. See literals
for
A loop that is similar to using the each method. See control expressions
if
Used for if and modifier if statements. See control expressions
in
Used to separate the iterable object and iterator variable in a for loop. See control expressions. It
also serves as a pattern in a case expression. See pattern matching
module
Creates or opens a module. See modules and classes syntax
next
Skips the rest of the block. See control expressions
nil
A false value usually indicating “no value” or “unknown”. See literals
not
Inverts the following boolean expression. Has a lower precedence than !
or
Boolean or with lower precedence than ||
redo
Restarts execution in the current block. See control expressions
rescue
Starts an exception section of code in a begin block. See exception handling
retry
Retries an exception block. See exception handling
return
Exits a method. See methods. If met in top-level scope, immediately stops interpretation of the
current file.
self
The object the current method is attached to. See methods
super
Calls the current method in a superclass. See methods
then
Indicates the end of conditional blocks in control structures. See control expressions
true
Boolean true. See literals
undef
Prevents a class or module from responding to a method call. See modules and classes
unless
Used for unless and modifier unless statements. See control expressions
until
Creates a loop that executes until the condition is true. See control expressions
when
A condition in a case expression. See control expressions
while
Creates a loop that executes while the condition is true. See control expressions
yield
Starts execution of the block sent to the current method. See methods
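
A few of these keywords are easiest to understand in combination. The sketch below uses two hypothetical helper methods (run_with_cleanup and precedence_demo are names made up for this illustration). The first shows begin, rescue and ensure together, the second shows how the lower precedence of and changes the result of an assignment compared with &&.

```ruby
# Records which sections of a begin block run, with and without an exception.
def run_with_cleanup(should_fail)
  events = []
  begin
    events << :start
    raise 'boom' if should_fail
    events << :finish
  rescue RuntimeError
    events << :rescued
  ensure
    events << :cleanup # runs in both cases
  end
  events
end

# Shows the precedence difference between && and the keyword and.
def precedence_demo
  a = true && false   # parsed as a = (true && false)
  b = true and false  # parsed as (b = true) and false
  [a, b]
end

run_with_cleanup(false) # => [:start, :finish, :cleanup]
run_with_cleanup(true)  # => [:start, :rescued, :cleanup]
precedence_demo         # => [false, true]
```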

Maintainers
This page describes the current module, library, and extension maintainers of Ruby.
Module Maintainers
A module maintainer is responsible for a certain part of Ruby.
 The maintainer fixes bugs of the part. Particularly, they should fix security vulnerabilities as soon
as possible.
 They handle issues related to the module on the Redmine or ML.
 They may be discharged by the 3 months rule [ruby-core:25764].
 They have commit right to Ruby’s repository to modify their part in the repository.
 They have “developer” role on the Redmine to modify issues.
 They have authority to decide the feature of their part. But they should always respect discussions
on ruby-core/ruby-dev.
A submaintainer of a module is like a maintainer, but does not have authority to change or add a feature
in their part. They need consensus on ruby-core/ruby-dev before changing or adding a feature.
Some submaintainers have commit rights; others don’t.

Language core features including security


Yukihiro Matsumoto (matz)

Evaluator
Koichi Sasada (ko1)

Core classes
Yukihiro Matsumoto (matz)

Standard Library Maintainers

Libraries
lib/mkmf.rb
unmaintained
lib/rubygems.rb, lib/rubygems/*
Eric Hodel (drbrain), Hiroshi SHIBATA (hsbt) github.com/rubygems/rubygems
lib/unicode_normalize.rb, lib/unicode_normalize/*
Martin J. Dürst

Extensions
ext/continuation
Koichi Sasada (ko1)
ext/coverage
Yusuke Endoh (mame)
ext/fiber
Koichi Sasada (ko1)
ext/monitor
Koichi Sasada (ko1)
ext/objspace
unmaintained
ext/pty
unmaintained
ext/ripper
unmaintained
ext/socket
 Tanaka Akira (akr)
 API change needs matz’s approval
ext/win32
NAKAMURA Usaku (usa)

Default gems Maintainers

Libraries
lib/abbrev.rb
Akinori MUSHA (knu) github.com/ruby/abbrev rubygems.org/gems/abbrev
lib/base64.rb
Yusuke Endoh (mame) github.com/ruby/base64 rubygems.org/gems/base64
lib/benchmark.rb
unmaintained github.com/ruby/benchmark rubygems.org/gems/benchmark
lib/bundler.rb, lib/bundler/*
Hiroshi SHIBATA (hsbt) github.com/rubygems/rubygems rubygems.org/gems/bundler
lib/cgi.rb, lib/cgi/*
unmaintained github.com/ruby/cgi rubygems.org/gems/cgi
lib/csv.rb
Kenta Murata (mrkn), Kouhei Sutou (kou) github.com/ruby/csv rubygems.org/gems/csv
lib/English.rb
unmaintained github.com/ruby/English rubygems.org/gems/English
lib/debug.rb
unmaintained github.com/ruby/debug
lib/delegate.rb
unmaintained github.com/ruby/delegate rubygems.org/gems/delegate
lib/did_you_mean.rb
Yuki Nishijima (yuki24) github.com/ruby/did_you_mean rubygems.org/gems/did_you_mean
ext/digest, ext/digest/*
Akinori MUSHA (knu) github.com/ruby/digest rubygems.org/gems/digest
lib/drb.rb, lib/drb/*
Masatoshi SEKI (seki) github.com/ruby/drb rubygems.org/gems/drb
lib/erb.rb
Masatoshi SEKI (seki), Takashi Kokubun (k0kubun) github.com/ruby/erb rubygems.org/gems/erb
lib/error_highlight.rb, lib/error_highlight/*
Yusuke Endoh (mame) github.com/ruby/error_highlight rubygems.org/gems/error_highlight
lib/fileutils.rb
unmaintained github.com/ruby/fileutils rubygems.org/gems/fileutils
lib/find.rb
Kazuki Tsujimoto (ktsj) github.com/ruby/find rubygems.org/gems/find
lib/forwardable.rb
Keiju ISHITSUKA (keiju) github.com/ruby/forwardable rubygems.org/gems/forwardable
lib/getoptlong.rb
unmaintained github.com/ruby/getoptlong rubygems.org/gems/getoptlong
lib/ipaddr.rb
Akinori MUSHA (knu) github.com/ruby/ipaddr rubygems.org/gems/ipaddr
lib/irb.rb, lib/irb/*
aycabta github.com/ruby/irb rubygems.org/gems/irb
lib/optparse.rb, lib/optparse/*
Nobuyoshi Nakada (nobu) github.com/ruby/optparse
lib/logger.rb
Naotoshi Seo (sonots) github.com/ruby/logger rubygems.org/gems/logger
lib/mutex_m.rb
Keiju ISHITSUKA (keiju) github.com/ruby/mutex_m rubygems.org/gems/mutex_m
lib/net/http.rb, lib/net/https.rb
NARUSE, Yui (naruse) github.com/ruby/net-http rubygems.org/gems/net-http
lib/net/protocol.rb
unmaintained github.com/ruby/net-protocol rubygems.org/gems/net-protocol
lib/observer.rb
unmaintained github.com/ruby/observer rubygems.org/gems/observer
lib/open3.rb
unmaintained github.com/ruby/open3 rubygems.org/gems/open3
lib/open-uri.rb
Tanaka Akira (akr) github.com/ruby/open-uri
lib/ostruct.rb
Marc-André Lafortune (marcandre) github.com/ruby/ostruct rubygems.org/gems/ostruct
lib/pp.rb
Tanaka Akira (akr) github.com/ruby/pp rubygems.org/gems/pp
lib/prettyprint.rb
Tanaka Akira (akr) github.com/ruby/prettyprint rubygems.org/gems/prettyprint
lib/pstore.rb
unmaintained github.com/ruby/pstore rubygems.org/gems/pstore
lib/racc.rb, lib/racc/*
Aaron Patterson (tenderlove), Hiroshi SHIBATA (hsbt) github.com/ruby/racc rubygems.org/gems/racc
lib/readline.rb
aycabta github.com/ruby/readline rubygems.org/gems/readline
lib/resolv.rb
Tanaka Akira (akr) github.com/ruby/resolv rubygems.org/gems/resolv
lib/resolv-replace.rb
Tanaka Akira (akr) github.com/ruby/resolv-replace rubygems.org/gems/resolv-replace
lib/rdoc.rb, lib/rdoc/*
Eric Hodel (drbrain), Hiroshi SHIBATA (hsbt) github.com/ruby/rdoc rubygems.org/gems/rdoc
lib/reline.rb, lib/reline/*
aycabta github.com/ruby/reline rubygems.org/gems/reline
lib/rinda/*
Masatoshi SEKI (seki) github.com/ruby/rinda rubygems.org/gems/rinda
lib/securerandom.rb
Tanaka Akira (akr) github.com/ruby/securerandom rubygems.org/gems/securerandom
lib/set.rb
Akinori MUSHA (knu) github.com/ruby/set rubygems.org/gems/set
lib/shellwords.rb
Akinori MUSHA (knu) github.com/ruby/shellwords rubygems.org/gems/shellwords
lib/singleton.rb
Yukihiro Matsumoto (matz) github.com/ruby/singleton rubygems.org/gems/singleton
lib/tempfile.rb
unmaintained github.com/ruby/tempfile rubygems.org/gems/tempfile
lib/time.rb
Tanaka Akira (akr) github.com/ruby/time rubygems.org/gems/time
lib/timeout.rb
Yukihiro Matsumoto (matz) github.com/ruby/timeout rubygems.org/gems/timeout
lib/thwait.rb
Keiju ISHITSUKA (keiju) github.com/ruby/thwait rubygems.org/gems/thwait
lib/tmpdir.rb
unmaintained github.com/ruby/tmpdir rubygems.org/gems/tmpdir
lib/tsort.rb
Tanaka Akira (akr) github.com/ruby/tsort rubygems.org/gems/tsort
lib/un.rb
WATANABE Hirofumi (eban) github.com/ruby/un rubygems.org/gems/un
lib/uri.rb, lib/uri/*
YAMADA, Akira (akira) github.com/ruby/uri rubygems.org/gems/uri
lib/yaml.rb, lib/yaml/*
Aaron Patterson (tenderlove), Hiroshi SHIBATA (hsbt) github.com/ruby/yaml rubygems.org/gems/yaml
lib/weakref.rb
unmaintained github.com/ruby/weakref rubygems.org/gems/weakref

Extensions
ext/bigdecimal
Kenta Murata (mrkn) github.com/ruby/bigdecimal rubygems.org/gems/bigdecimal
ext/cgi
Nobuyoshi Nakada (nobu) github.com/ruby/cgi rubygems.org/gems/cgi
ext/date
unmaintained github.com/ruby/date rubygems.org/gems/date
ext/etc
Ruby core team github.com/ruby/etc rubygems.org/gems/etc
ext/fcntl
Ruby core team github.com/ruby/fcntl rubygems.org/gems/fcntl
ext/fiddle
Aaron Patterson (tenderlove) github.com/ruby/fiddle rubygems.org/gems/fiddle
ext/io/console
Nobuyoshi Nakada (nobu) github.com/ruby/io-console rubygems.org/gems/io-console
ext/io/nonblock
Nobuyoshi Nakada (nobu) github.com/ruby/io-nonblock rubygems.org/gems/io-nonblock
ext/io/wait
Nobuyoshi Nakada (nobu) github.com/ruby/io-wait rubygems.org/gems/io-wait
ext/json
NARUSE, Yui (naruse), Hiroshi SHIBATA (hsbt) github.com/flori/json rubygems.org/gems/json
ext/nkf
NARUSE, Yui (naruse) github.com/ruby/nkf rubygems.org/gems/nkf
ext/openssl
Kazuki Yamaguchi (rhe) github.com/ruby/openssl rubygems.org/gems/openssl
ext/pathname
Tanaka Akira (akr) github.com/ruby/pathname rubygems.org/gems/pathname
ext/psych
Aaron Patterson (tenderlove), Hiroshi SHIBATA (hsbt) github.com/ruby/psych rubygems.org/gems/psych
ext/racc
Aaron Patterson (tenderlove), Hiroshi SHIBATA (hsbt) github.com/ruby/racc rubygems.org/gems/racc
ext/readline
TAKAO Kouji (kouji) github.com/ruby/readline-ext rubygems.org/gems/readline-ext
ext/stringio
Nobuyoshi Nakada (nobu) github.com/ruby/stringio rubygems.org/gems/stringio
ext/strscan
Kouhei Sutou (kou) github.com/ruby/strscan rubygems.org/gems/strscan
ext/syslog
Akinori MUSHA (knu) github.com/ruby/syslog rubygems.org/gems/syslog
ext/win32ole
Masaki Suketa (suke) github.com/ruby/win32ole rubygems.org/gems/win32ole
ext/zlib
NARUSE, Yui (naruse) github.com/ruby/zlib rubygems.org/gems/zlib

Bundled gems upstream repositories


minitest
github.com/seattlerb/minitest
power_assert
github.com/ruby/power_assert
rake
github.com/ruby/rake
test-unit
github.com/test-unit/test-unit
rexml
github.com/ruby/rexml
rss
github.com/ruby/rss
net-ftp
github.com/ruby/net-ftp
net-imap
github.com/ruby/net-imap
net-pop
github.com/ruby/net-pop
net-smtp
github.com/ruby/net-smtp
matrix
github.com/ruby/matrix
prime
github.com/ruby/prime
rbs
github.com/ruby/rbs
typeprof
github.com/ruby/typeprof

Platform Maintainers
mswin64 (Microsoft Windows)
NAKAMURA Usaku (usa)
mingw32 (Minimalist GNU for Windows)
Nobuyoshi Nakada (nobu)
AIX
Yutaka Kanemoto (kanemoto)
FreeBSD
Akinori MUSHA (knu)
Solaris
Naohisa Goto (ngoto)
RHEL, CentOS
KOSAKI Motohiro (kosaki)
macOS
Kenta Murata (mrkn)
OpenBSD
Jeremy Evans (jeremyevans0)
cygwin, …
none. (Maintainer WANTED)
WebAssembly/WASI
Yuta Saito (katei)
Marshal Format
The Marshal format is used to serialize ruby objects. The format can store arbitrary objects through three
user-defined extension mechanisms.
For documentation on using Marshal to serialize and deserialize objects, see the Marshal module.
This document calls a serialized set of objects a stream. The Ruby implementation can load a set of objects
from a String, an IO or an object that implements a getc method.

Stream Format
The first two bytes of the stream contain the major and minor version, each as a single byte encoding a
digit. The version implemented in Ruby is 4.8 (stored as “\x04\x08”) and is supported by ruby 1.8.0 and
newer.
Different major versions of the Marshal format are not compatible and cannot be understood by other major
versions. Lesser minor versions of the format can be understood by newer minor versions. Format 4.7 can
be loaded by a 4.8 implementation but format 4.8 cannot be loaded by a 4.7 implementation.
Following the version bytes is a stream describing the serialized object. The stream contains nested objects
(the same as a Ruby object) but objects in the stream do not necessarily have a direct mapping to the Ruby
object model.
Each object in the stream is described by a byte indicating its type followed by one or more bytes describing
the object. When “object” is mentioned below it means any of the types below that defines a Ruby object.

true, false, nil


These objects are each one byte long. “T” represents true, “F” represents false and “0” represents nil.
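
The version bytes and the one-byte types can be checked directly with Marshal.dump. This is a small sketch; the helper names marshal_version and marshal_body are hypothetical and exist only for this illustration.

```ruby
# Returns the two version bytes at the start of a Marshal stream.
def marshal_version(object)
  Marshal.dump(object).bytes.first(2)
end

# Returns everything after the version bytes (type byte plus any payload).
def marshal_body(object)
  Marshal.dump(object)[2..-1]
end

marshal_version(nil) # => [4, 8]
marshal_body(true)   # => "T"
marshal_body(false)  # => "F"
marshal_body(nil)    # => "0"
```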

Fixnum and long


“i” represents a signed 32 bit value using a packed format. One through five bytes follow the type. The
value loaded will always be a Fixnum. On 32 bit platforms (where the precision of a Fixnum is less than 32
bits) loading large values will cause overflow on CRuby.
The fixnum type is used to represent both ruby Fixnum objects and the sizes of marshaled arrays, hashes,
instance variables and other types. In the following sections “long” will mean the format described below,
which supports full 32 bit precision.
The first byte has the following special values:
“\x00”
The value of the integer is 0. No bytes follow.
“\x01”
The total size of the integer is two bytes. The following byte is a positive integer in the range of 0
through 255. Only values between 123 and 255 should be represented this way to save bytes.
“\xff”
The total size of the integer is two bytes. The following byte is a negative integer in the range of -1
through -256.
“\x02”
The total size of the integer is three bytes. The following two bytes are a positive little-endian
integer.
“\xfe”
The total size of the integer is three bytes. The following two bytes are a negative little-endian
integer.
“\x03”
The total size of the integer is four bytes. The following three bytes are a positive little-endian
integer.
“\xfd”
The total size of the integer is four bytes. The following three bytes are a negative little-endian
integer.
“\x04”
The total size of the integer is five bytes. The following four bytes are a positive little-endian
integer. For compatibility with 32 bit ruby, only Fixnums less than 1073741824 should be
represented this way. For sizes of stream objects full precision may be used.
“\xfc”
The total size of the integer is five bytes. The following four bytes are a negative little-endian
integer. For compatibility with 32 bit ruby, only Fixnums greater than -1073741824 should be
represented this way. For sizes of stream objects full precision may be used.
Otherwise the first byte is a sign-extended eight-bit value with an offset. If the value is positive the value is
determined by subtracting 5 from the value. If the value is negative the value is determined by adding 5 to
the value.
There are multiple representations for many values. CRuby always outputs the shortest representation
possible.
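
The rules above can be sketched as a small decoder. This is an illustration only, not the CRuby implementation: the method names read_marshal_long and dumped_long_bytes are hypothetical, and the input is assumed to be an array of byte values starting at the first byte of the long.

```ruby
# Decodes a Marshal "long" from an array of byte values.
# Returns [value, number_of_bytes_consumed].
def read_marshal_long(bytes)
  first = bytes[0]
  first -= 256 if first > 127 # reinterpret as a signed 8-bit value

  case first
  when 0
    [0, 1]
  when 1..4 # positive little-endian integer in the next `first` bytes
    value = 0
    first.times { |i| value |= bytes[1 + i] << (8 * i) }
    [value, 1 + first]
  when -4..-1 # negative little-endian integer in the next `-first` bytes
    count = -first
    value = -1 # start with all bits set, then fill in the low bytes
    count.times do |i|
      value &= ~(0xff << (8 * i))
      value |= bytes[1 + i] << (8 * i)
    end
    [value, 1 + count]
  when 5..127
    [first - 5, 1]
  else # -128..-5
    [first + 5, 1]
  end
end

# Bytes of a dumped Fixnum after the version bytes and the "i" type byte.
# Assumes the value is small enough to be dumped with the "i" type.
def dumped_long_bytes(integer)
  Marshal.dump(integer).bytes.drop(3)
end

read_marshal_long(dumped_long_bytes(123))  # => [123, 2]
read_marshal_long(dumped_long_bytes(-257)) # => [-257, 3]
```

Comparing the decoder against Marshal.dump for values on both sides of each boundary (122 and 123, 255 and 256, and so on) is a quick way to check the table.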

Symbols and Byte Sequence


“:” represents a real symbol. A real symbol contains the data needed to define the symbol for the rest of the
stream as future occurrences in the stream will instead be references (a symbol link) to this one. The
reference is a zero-indexed 32 bit value (so the first occurrence of :hello is 0).
Following the type byte is a byte sequence, which consists of a long indicating the number of bytes in the
sequence followed by that many bytes of data. Byte sequences have no encoding.
For example, the following stream contains the Symbol :hello:

"\x04\x08:\x0ahello"

“;” represents a Symbol link which references a previously defined Symbol. Following the type byte is a long
containing the index in the lookup table for the linked (referenced) Symbol.
For example, the following stream contains [:hello, :hello]:
"\x04\b[\a:\nhello;\x00"

When a “symbol” is referenced below it may be either a real symbol or a symbol link.
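
Both stream examples can be reproduced with Marshal.dump. In this sketch the helper name dump_symbols is hypothetical; the third call shows that the link index counts distinct symbols in order of first appearance.

```ruby
# Thin wrapper so the examples read uniformly.
def dump_symbols(object)
  Marshal.dump(object)
end

dump_symbols(:hello)           # => "\x04\b:\nhello"
dump_symbols([:hello, :hello]) # => "\x04\b[\a:\nhello;\x00"
dump_symbols([:a, :b, :a])     # => "\x04\b[\b:\x06a:\x06b;\x00"
```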
Object References
Separate from but similar to symbol references, the stream contains only one copy of each object (as
determined by object_id). This applies to all objects except true, false, nil, Fixnums and Symbols (which
are stored separately as described above). For each such object a one-indexed 32 bit value is stored and
reused when the object is encountered again. (The first object has an index of 1.)
“@” represents an object link. Following the type byte is a long giving the index of the object.
For example, the following stream contains an Array of the same "hello" object twice:

"\004\b[\a\"\nhello@\006"

Instance Variables
“I” indicates that instance variables follow the next object. An object follows the type byte. Following the
object is a length indicating the number of instance variables for the object. Following the length is a set of
name-value pairs. The names are symbols while the values are objects. The symbols must be instance
variable names (:@name).
An Object (“o” type, described below) uses the same format for its instance variables as described here.
For a String and Regexp (described below) a special instance variable :E is used to indicate the Encoding.
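
The “I” prefix and the :E instance variable are visible when a String is dumped on ruby 1.9 or newer. The following sketch (the helper name dump_pair is hypothetical) dumps the same String object twice in an Array, once with UTF-8 encoding and once as a binary String. The binary String carries no encoding instance variable, and in both cases the second occurrence is written as an “@” object link.

```ruby
# Dumps an Array that contains the same String object twice.
def dump_pair(string)
  Marshal.dump([string, string])
end

utf8   = 'hello'.dup.force_encoding(Encoding::UTF_8)
binary = 'hello'.b

dump_pair(utf8)   # => "\x04\b[\aI\"\nhello\x06:\x06ET@\x06"
dump_pair(binary) # => "\x04\b[\a\"\nhello@\x06"

# Loading restores the sharing: both elements are the same object.
loaded = Marshal.load(dump_pair(utf8))
loaded[0].equal?(loaded[1]) # => true
```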

Extended
“e” indicates that the next object is extended by a module. An object follows the type byte. Following the
object is a symbol that contains the name of the module the object is extended by.
Array
“[” represents an Array. Following the type byte is a long indicating the number of objects in the array. The
given number of objects follow the length.

Bignum
“l” represents a Bignum which is composed of three parts:
sign
A single byte containing “+” for a positive value or “-” for a negative value.
length
A long indicating the number of bytes of Bignum data that follow, divided by two. Multiply the length
by two to determine the number of bytes of data that follow.
data
Bytes of Bignum data representing the number.
The following ruby code will reconstruct the Bignum value from an array of bytes:

result = 0
bytes.each_with_index do |byte, exp|
  result += (byte * 2 ** (exp * 8))
end
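
Putting the three parts together, a Bignum can be recovered from a dumped stream as follows. This is a sketch under stated assumptions: the method name bignum_from_dump is hypothetical, and the length is assumed to fit the single-byte form of a long (fewer than 123 two-byte units), which holds for the values shown.

```ruby
# Reconstructs an Integer from a Marshal stream that holds a single Bignum.
def bignum_from_dump(dump)
  bytes = dump.bytes
  # bytes[0..1] are the version, bytes[2] is the "l" type byte.
  raise ArgumentError, 'not a Bignum stream' unless bytes[2] == 'l'.ord

  sign = bytes[3].chr
  length = bytes[4] - 5          # single-byte long: stored value minus 5
  data = bytes[5, length * 2]    # the length counts two-byte units

  result = 0
  data.each_with_index do |byte, exp|
    result += byte * 2**(exp * 8) # least significant byte first
  end
  sign == '-' ? -result : result
end

bignum_from_dump(Marshal.dump(2**64)) == 2**64 # => true
```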

Class and Module


“c” represents a Class object, “m” represents a Module and “M” represents either a class or module (this is
an old-style type kept for compatibility). No class or module content is included; this type is only a reference.
Following the type byte is a byte sequence which is used to look up an existing class or module,
respectively.
Instance variables are not allowed on a class or module.
If no class or module exists an exception should be raised.
For “c” and “m” types, the loaded object must be a class or module, respectively.
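
As a brief illustration, a class reference stream can be built by hand and loaded. The helper name class_reference_stream is hypothetical, and the sketch assumes the name is shorter than 123 bytes so that its length fits the single-byte form of a long.

```ruby
# Builds a Marshal stream that refers to the class with the given name.
def class_reference_stream(name)
  "\x04\bc".b + (name.bytesize + 5).chr + name
end

Marshal.dump(String)                           # => "\x04\bc\vString"
Marshal.dump(Kernel)                           # => "\x04\bm\vKernel"
Marshal.load(class_reference_stream('String')) # => String

# A name that does not resolve to an existing class raises ArgumentError.
begin
  Marshal.load(class_reference_stream('NoSuchClassXyz'))
rescue ArgumentError => e
  e.class # => ArgumentError
end
```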
Data
“d” represents a Data object. (Data objects are wrapped pointers from ruby extensions.) Following the type
byte is a symbol indicating the class for the Data object and an object that contains the state of
the Data object.
To dump a Data object Ruby calls _dump_data. To load a Data object Ruby calls _load_data with the state
of the object on a newly allocated instance.
Float
“f” represents a Float object. Following the type byte is a byte sequence containing the float value. The
following values are special:
“inf”
Positive infinity
“-inf”
Negative infinity
“nan”
Not a Number
Otherwise the byte sequence contains a C double (loadable by strtod(3)). Older minor versions
of Marshal also stored extra mantissa bits to ensure portability across platforms but 4.8 does not include
these. See ruby-talk:69518 for some explanation.
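
The textual payload is easy to inspect. In this sketch the helper name float_payload is hypothetical; it skips the two version bytes, the “f” type byte and the single length byte.

```ruby
# Returns the byte sequence data of a dumped Float (without its length).
def float_payload(float)
  Marshal.dump(float)[4..-1]
end

float_payload(1.5)              # => "1.5"
float_payload(Float::INFINITY)  # => "inf"
float_payload(-Float::INFINITY) # => "-inf"
float_payload(Float::NAN)       # => "nan"
```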
Hash and Hash with Default Value
“{” represents a Hash object while “}” represents a Hash with a default value set (Hash.new 0). Following the
type byte is a long indicating the number of key-value pairs in the Hash, the size. Double the given number
of objects follow the size.
For a Hash with a default value, the default value follows all the pairs.
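
For example, the two Hash types differ only in the type byte and the trailing default value. The helper name dump_hash below is hypothetical.

```ruby
# Thin wrapper so the examples read uniformly.
def dump_hash(hash)
  Marshal.dump(hash)
end

dump_hash({ 1 => 2 })  # => "\x04\b{\x06i\x06i\a"
dump_hash(Hash.new(0)) # => "\x04\b}\x00i\x00"

# The default value survives a round trip.
Marshal.load(dump_hash(Hash.new(0)))[:missing] # => 0
```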
Module and Old Module
Object
“o” represents an object that doesn’t have any other special form (such as a user-defined or built-in format).
Following the type byte is a symbol containing the class name of the object. Following the class name is a
long indicating the number of instance variables for the object. That many name-value pairs (double that
number of objects) follow the size.
The keys in the pairs must be symbols containing instance variable names.

Regular Expression
“/” represents a regular expression. Following the type byte is a byte sequence containing the regular
expression source. Following the source is a byte containing the regular expression options (case-insensitive,
etc.) as a signed 8-bit value.
Regular expressions can have an encoding attached through instance variables (see above). If no encoding
is attached escapes for the following regexp specials not present in ruby 1.8 must be removed: g-m, o-q, u,
y, E, F, H-L, N-V, X, Y.
String
‘“’ represents a String. Following the type byte is a byte sequence containing the string content. When
dumped from ruby 1.9 an encoding instance variable (:E see above) should be included unless the
encoding is binary.
Struct
“S” represents a Struct. Following the type byte is a symbol containing the name of the struct. Following the
name is a long indicating the number of members in the struct. Double the number of objects follow the
member count. Each member is a pair containing the member’s symbol and an object for the value of that
member.
If the struct name does not match a Struct subclass in the running ruby an exception should be raised.
If there is a mismatch between the struct in the currently running ruby and the member count in the
marshaled struct an exception should be raised.
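
A small Struct shows the layout. Point is a hypothetical Struct defined only for this illustration; the stream contains the struct name, the member count, and then a symbol and a value for each member.

```ruby
# Hypothetical Struct used only for this illustration.
Point = Struct.new(:x, :y)

# Dumps a Point built from the given coordinates.
def dump_point(x, y)
  Marshal.dump(Point.new(x, y))
end

dump_point(1, 2)               # => "\x04\bS:\nPoint\a:\x06xi\x06:\x06yi\a"
Marshal.load(dump_point(1, 2)) # => #<struct Point x=1, y=2>
```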
User Class
“C” represents a subclass of a String, Regexp, Array or Hash. Following the type byte is a symbol
containing the name of the subclass. Following the name is the wrapped object.

User Defined
“u” represents an object with a user-defined serialization format using the _dump instance method
and _load class method. Following the type byte is a symbol containing the class name. Following the class
name is a byte sequence containing the user-defined representation of the object.
The class method _load is called on the class with a string created from the byte-sequence.
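
A minimal sketch of this mechanism follows. The class Celsius is hypothetical; its _dump returns the state as a String and its _load rebuilds an instance from that String.

```ruby
# Hypothetical class with a user-defined serialization format.
class Celsius
  attr_reader :degrees

  def initialize(degrees)
    @degrees = degrees
  end

  # Called by Marshal.dump; must return a String.
  def _dump(_level)
    @degrees.to_s
  end

  # Called by Marshal.load with the String produced by _dump.
  def self._load(data)
    new(Integer(data))
  end
end

dump = Marshal.dump(Celsius.new(21))
dump.include?("u:\fCelsius\a21") # => true
Marshal.load(dump).degrees       # => 21
```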
User Marshal
“U” represents an object with a user-defined serialization format using
the marshal_dump and marshal_load instance methods. Following the type byte is a symbol containing the
class name. Following the class name is an object containing the data.
Upon loading a new instance must be allocated and marshal_load must be called on the instance with the
data.
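
The instance-method variant can be sketched the same way. The class Temperature is hypothetical; marshal_dump may return any marshalable object (an Array here), and marshal_load receives that object on a newly allocated instance.

```ruby
# Hypothetical class using marshal_dump and marshal_load.
class Temperature
  attr_reader :degrees, :unit

  def initialize(degrees, unit)
    @degrees = degrees
    @unit = unit
  end

  # Called by Marshal.dump; returns the state to serialize.
  def marshal_dump
    [@degrees, @unit]
  end

  # Called by Marshal.load on an allocated (uninitialized) instance.
  def marshal_load(data)
    @degrees, @unit = data
  end
end

dump = Marshal.dump(Temperature.new(21, :celsius))
dump.start_with?("\x04\bU:") # => true
restored = Marshal.load(dump)
[restored.degrees, restored.unit] # => [21, :celsius]
```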
MemoryView
MemoryView provides the features to share multidimensional homogeneous arrays of fixed-size elements in
memory among extension libraries.

Disclaimer
 This feature is still experimental. The specification described here may change in the future.
 This document is under construction. Please refer to the master branch of ruby for the latest version
of this document.

Overview
We sometimes deal with objects whose internal representation is an array of same-typed, fixed-size elements
in a contiguous memory area. Numo::NArray in numo-narray and Magick::Image
in rmagick are typical examples of such objects. MemoryView plays the role of a hub for sharing the internal
data of such objects among such libraries without copying.
Copy-less sharing of data is very important in fields such as data analysis, machine learning, and
image processing. In these fields, people need to handle large amounts of in-memory data with several
libraries. If we are forced to copy large data in order to exchange it among libraries, a large part of the
data processing time is spent copying. MemoryView avoids this wasted time.
MemoryView has two categories of APIs:
1. Producer API
Classes can register their own MemoryView entry, which allows objects of those classes to expose their
MemoryView.
2. Consumer API
The Consumer API allows us to obtain and manage the MemoryView of an object.

MemoryView structure
A MemoryView structure, rb_memory_view_t, is used for exporting objects’ MemoryView. This structure
contains the reference of the object, which is the owner of the MemoryView, the pointer to the head of
exported memory, and the metadata that describes the structure of the memory. The metadata can describe
multidimensional arrays with strides.

The members of the MemoryView structure


The MemoryView structure consists of the following members.
 VALUE obj
The reference to the original object that has the memory exported via the MemoryView.
RubyVM manages the reference count of the MemoryView-exported objects to guard them from
garbage collection. Consumers do not need to protect this object from GC themselves.
 void *data
The pointer to the head of the exported memory.
 ssize_t byte_size
The number of bytes in the memory pointed to by data.
 bool readonly
true for readonly memory, false for writable memory.
 const char *format
A string to describe the format of an element, or NULL for unsigned byte.
 ssize_t item_size
The number of bytes in each element.
 const rb_memory_view_item_component_t *item_desc.components
The array of the metadata of the component in an element.
 size_t item_desc.length
The number of items in item_desc.components.
 ssize_t ndim
The number of dimensions.
 const ssize_t *shape
A ndim size array indicating the number of elements in each dimension. This can
be NULL when ndim is 1.
 const ssize_t *strides
A ndim size array indicating the number of bytes to skip to go to the next element in each
dimension. This can be NULL when ndim is 1.
 const ssize_t *sub_offsets
A ndim size array consisting of the offsets in each dimension when the MemoryView exposes a
nested array. This can be NULL when the MemoryView exposes a flat array.
 void *private_data
The private data that MemoryView provider uses internally. This can be NULL when any private
data is unnecessary.

MemoryView APIs

For consumers
 bool rb_memory_view_available_p(VALUE obj)
Return true if obj supports exporting a MemoryView. Return false otherwise.
Even if this function returns true, it doesn’t mean the function rb_memory_view_get will succeed.
 bool rb_memory_view_get(VALUE obj, rb_memory_view_t *view, int flags)
If the given obj supports exporting a MemoryView that conforms to the given flags, this function
fills view with the information of the MemoryView and returns true. In this case, the reference count
of obj is increased.
If the given combination of obj and flags cannot export a MemoryView, this function returns false.
The content of view is not touched in this case.
The exported MemoryView must be released by rb_memory_view_release when the MemoryView
is no longer needed.
 bool rb_memory_view_release(rb_memory_view_t *view)
Release the given MemoryView view and decrement the reference count of view->obj.
Consumers must call this function when the MemoryView is no longer needed. Failing to call this
function leads to a memory leak.
 ssize_t rb_memory_view_item_size_from_format(const char *format, const char **err)
Calculate the number of bytes occupied by an element.
When the calculation fails, the failed location in format is stored into err, and the function returns -1.
 void *rb_memory_view_get_item_pointer(rb_memory_view_t *view, const ssize_t *indices)
Calculate the location of the item indicated by the given indices. The length of indices must be equal
to view->ndim. This function initializes view->item_desc if needed.
 VALUE rb_memory_view_get_item(rb_memory_view_t *view, const ssize_t *indices)
Return the Ruby object representation of the item indicated by the given indices. The length
of indices must be equal to view->ndim. This function uses rb_memory_view_get_item_pointer.
 rb_memory_view_init_as_byte_array(rb_memory_view_t *view, VALUE obj, void *data, const
ssize_t len, const bool readonly)
Fill the members of view as a 1-dimensional byte array.
 void rb_memory_view_fill_contiguous_strides(const ssize_t ndim, const ssize_t item_size, const
ssize_t *const shape, const bool row_major_p, ssize_t *const strides)
Fill the strides array with the byte strides of a contiguous array of the given shape with the given element size.
 void rb_memory_view_prepare_item_desc(rb_memory_view_t *view)
Fill the item_desc member of view.
 bool rb_memory_view_is_contiguous(const rb_memory_view_t *view)
Return true if the data in the MemoryView view is row-major or column-major contiguous.
Return false otherwise.
 bool rb_memory_view_is_row_major_contiguous(const rb_memory_view_t *view)
Return true if the data in the MemoryView view is row-major contiguous.
Return false otherwise.
 bool rb_memory_view_is_column_major_contiguous(const rb_memory_view_t *view)
Return true if the data in the MemoryView view is column-major contiguous.
Return false otherwise.
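
The stride arithmetic behind contiguous arrays can be illustrated in plain Ruby. This is only a sketch of the calculation, not the C API: the method names contiguous_strides and item_offset are hypothetical, and the standard definitions of row-major and column-major layout are assumed.

```ruby
# Computes byte strides for a contiguous array of the given shape.
# Row-major: the last dimension varies fastest. Column-major: the first does.
def contiguous_strides(shape, item_size, row_major: true)
  strides = Array.new(shape.size)
  running = item_size
  order = row_major ? (shape.size - 1).downto(0) : 0.upto(shape.size - 1)
  order.each do |dim|
    strides[dim] = running
    running *= shape[dim]
  end
  strides
end

# Byte offset of the element at the given indices.
def item_offset(indices, strides)
  indices.zip(strides).sum { |index, stride| index * stride }
end

contiguous_strides([2, 3, 4], 8)                   # => [96, 32, 8]
contiguous_strides([2, 3, 4], 8, row_major: false) # => [8, 16, 48]
item_offset([1, 2, 3], [96, 32, 8])                # => 184
```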

Argument Converters
An option can specify that its argument is to be converted from the default String to an instance of another
class.

Contents
 Built-In Argument Converters
o Date
o DateTime
o Time
o URI
o Shellwords
o Integer
o Float
o Numeric
o DecimalInteger
o OctalInteger
o DecimalNumeric
o TrueClass
o FalseClass
o Object
o String
o Array
o Regexp
 Custom Argument Converters

Built-In Argument Converters


OptionParser has a number of built-in argument converters, which are demonstrated below.

Date
File date.rb defines an option whose argument is to be converted to a Date object. The argument is
converted by method Date.parse.

require 'optparse/date'
parser = OptionParser.new
parser.on('--date=DATE', Date) do |value|
  p [value, value.class]
end
parser.parse!

Executions:

$ ruby date.rb --date 2001-02-03

[#<Date: 2001-02-03 ((2451944j,0s,0n),+0s,2299161j)>, Date]

$ ruby date.rb --date 20010203

[#<Date: 2001-02-03 ((2451944j,0s,0n),+0s,2299161j)>, Date]

$ ruby date.rb --date "3rd Feb 2001"


[#<Date: 2001-02-03 ((2451944j,0s,0n),+0s,2299161j)>, Date]
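
The same converter can be tried without a separate script by passing an explicit argument array to OptionParser#parse instead of reading ARGV. This is a minimal sketch; the method name parse_date_option is hypothetical.

```ruby
require 'optparse'
require 'optparse/date'

# Parses the given argument array and returns the converted --date value.
def parse_date_option(argv)
  result = nil
  parser = OptionParser.new
  parser.on('--date=DATE', Date) { |value| result = value }
  parser.parse(argv)
  result
end

parse_date_option(['--date', '2001-02-03']) == Date.new(2001, 2, 3) # => true

# An argument that cannot be converted raises OptionParser::InvalidArgument.
begin
  parse_date_option(['--date', 'xyz'])
rescue OptionParser::InvalidArgument => e
  e.class # => OptionParser::InvalidArgument
end
```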

DateTime
File datetime.rb defines an option whose argument is to be converted to a DateTime object. The argument
is converted by method DateTime.parse.

require 'optparse/date'
parser = OptionParser.new
parser.on('--datetime=DATETIME', DateTime) do |value|
  p [value, value.class]
end
parser.parse!

Executions:

$ ruby datetime.rb --datetime 2001-02-03T04:05:06+07:00

[#<DateTime: 2001-02-03T04:05:06+07:00 ((2451943j,75906s,0n),+25200s,2299161j)>, DateTime]

$ ruby datetime.rb --datetime 20010203T040506+0700

[#<DateTime: 2001-02-03T04:05:06+07:00 ((2451943j,75906s,0n),+25200s,2299161j)>, DateTime]

$ ruby datetime.rb --datetime "3rd Feb 2001 04:05:06 PM"

[#<DateTime: 2001-02-03T16:05:06+00:00 ((2451944j,57906s,0n),+0s,2299161j)>, DateTime]

Time
File time.rb defines an option whose argument is to be converted to a Time object. The argument is
converted by method Time.httpdate or Time.parse.

require 'optparse/time'
parser = OptionParser.new
parser.on('--time=TIME', Time) do |value|
p [value, value.class]
end
parser.parse!

Executions:
$ ruby time.rb --time "Thu, 06 Oct 2011 02:26:12 GMT"

[2011-10-06 02:26:12 UTC, Time]

$ ruby time.rb --time 2010-10-31

[2010-10-31 00:00:00 -0500, Time]

URI
File uri.rb defines an option whose argument is to be converted to a URI object. The argument is converted
by method URI.parse.

require 'optparse/uri'
parser = OptionParser.new
parser.on('--uri=URI', URI) do |value|
p [value, value.class]
end
parser.parse!

Executions:

$ ruby uri.rb --uri https://github.com

[#<URI::HTTPS https://github.com>, URI::HTTPS]

$ ruby uri.rb --uri http://github.com

[#<URI::HTTP http://github.com>, URI::HTTP]

$ ruby uri.rb --uri file://~/var

[#<URI::File file://~/var>, URI::File]

Shellwords
File shellwords.rb defines an option whose argument is to be converted to an Array object by
method Shellwords.shellwords.

require 'optparse/shellwords'
parser = OptionParser.new
parser.on('--shellwords=SHELLWORDS', Shellwords) do |value|
p [value, value.class]
end
parser.parse!

Executions:

$ ruby shellwords.rb --shellwords "ruby my_prog.rb | less"

[["ruby", "my_prog.rb", "|", "less"], Array]

$ ruby shellwords.rb --shellwords "here are 'two words'"

[["here", "are", "two words"], Array]

Integer
File integer.rb defines an option whose argument is to be converted to an Integer object. The argument is
converted by method Kernel#Integer.

require 'optparse'
parser = OptionParser.new
parser.on('--integer=INTEGER', Integer) do |value|
p [value, value.class]
end
parser.parse!

Executions:

$ ruby integer.rb --integer 100

[100, Integer]

$ ruby integer.rb --integer -100

[-100, Integer]

$ ruby integer.rb --integer 0100

[64, Integer]

$ ruby integer.rb --integer 0x100

[256, Integer]
$ ruby integer.rb --integer 0b100

[4, Integer]

Float
File float.rb defines an option whose argument is to be converted to a Float object. The argument is
converted by method Kernel#Float.

require 'optparse'
parser = OptionParser.new
parser.on('--float=FLOAT', Float) do |value|
p [value, value.class]
end
parser.parse!

Executions:

$ ruby float.rb --float 1

[1.0, Float]

$ ruby float.rb --float 3.14159

[3.14159, Float]

$ ruby float.rb --float 1.234E2

[123.4, Float]

$ ruby float.rb --float 1.234E-2

[0.01234, Float]

Numeric
File numeric.rb defines an option whose argument is to be converted to an instance of Rational, Float, or
Integer. The argument is converted by method Kernel#Rational, Kernel#Float, or Kernel#Integer.

require 'optparse'
parser = OptionParser.new
parser.on('--numeric=NUMERIC', Numeric) do |value|
p [value, value.class]
end
parser.parse!

Executions:

$ ruby numeric.rb --numeric 1/3

[(1/3), Rational]

$ ruby numeric.rb --numeric 3.333E-1

[0.3333, Float]

$ ruby numeric.rb --numeric 3

[3, Integer]

DecimalInteger
File decimal_integer.rb defines an option whose argument is to be converted to an Integer object. The
argument is converted by method Kernel#Integer.

require 'optparse'
include OptionParser::Acceptables
parser = OptionParser.new
parser.on('--decimal_integer=DECIMAL_INTEGER', DecimalInteger) do |value|
p [value, value.class]
end
parser.parse!

The argument may not be in a binary or hexadecimal format; a leading zero is ignored (not parsed as octal).
Executions:

$ ruby decimal_integer.rb --decimal_integer 100

[100, Integer]

$ ruby decimal_integer.rb --decimal_integer -100

[-100, Integer]

$ ruby decimal_integer.rb --decimal_integer 0100


[100, Integer]

$ ruby decimal_integer.rb --decimal_integer -0100

[-100, Integer]

OctalInteger
File octal_integer.rb defines an option whose argument is to be converted to an Integer object. The
argument is converted by method Kernel#Integer.

require 'optparse'
include OptionParser::Acceptables
parser = OptionParser.new
parser.on('--octal_integer=OCTAL_INTEGER', OctalInteger) do |value|
p [value, value.class]
end
parser.parse!

The argument may not be in a binary or hexadecimal format; it is parsed as octal, regardless of whether it
has a leading zero.
Executions:

$ ruby octal_integer.rb --octal_integer 100

[64, Integer]

$ ruby octal_integer.rb --octal_integer -100

[-64, Integer]

$ ruby octal_integer.rb --octal_integer 0100

[64, Integer]

DecimalNumeric
File decimal_numeric.rb defines an option whose argument is to be converted to an Integer object. The
argument is converted by method Kernel#Integer.

require 'optparse'
include OptionParser::Acceptables
parser = OptionParser.new
parser.on('--decimal_numeric=DECIMAL_NUMERIC', DecimalNumeric) do |value|
p [value, value.class]
end
parser.parse!

The argument may not be in a binary or hexadecimal format; a leading zero causes the argument to be
parsed as octal.
Executions:

$ ruby decimal_numeric.rb --decimal_numeric 100

[100, Integer]

$ ruby decimal_numeric.rb --decimal_numeric -100

[-100, Integer]

$ ruby decimal_numeric.rb --decimal_numeric 0100

[64, Integer]

TrueClass
File true_class.rb defines an option whose argument is to be converted to true or false. The argument is
evaluated by method Object#nil?.

require 'optparse'
parser = OptionParser.new
parser.on('--true_class=TRUE_CLASS', TrueClass) do |value|
p [value, value.class]
end
parser.parse!

The argument may be any of those shown in the examples below.


Executions:

$ ruby true_class.rb --true_class true

[true, TrueClass]

$ ruby true_class.rb --true_class yes


[true, TrueClass]

$ ruby true_class.rb --true_class +

[true, TrueClass]

$ ruby true_class.rb --true_class false

[false, FalseClass]

$ ruby true_class.rb --true_class no

[false, FalseClass]

$ ruby true_class.rb --true_class -

[false, FalseClass]

$ ruby true_class.rb --true_class nil

[false, FalseClass]

FalseClass
File false_class.rb defines an option whose argument is to be converted to true or false. The argument is
evaluated by method Object#nil?.

require 'optparse'
parser = OptionParser.new
parser.on('--false_class=FALSE_CLASS', FalseClass) do |value|
p [value, value.class]
end
parser.parse!

The argument may be any of those shown in the examples below.


Executions:

$ ruby false_class.rb --false_class false

[false, FalseClass]

$ ruby false_class.rb --false_class no

[false, FalseClass]
$ ruby false_class.rb --false_class -

[false, FalseClass]

$ ruby false_class.rb --false_class nil

[false, FalseClass]

$ ruby false_class.rb --false_class true

[true, TrueClass]

$ ruby false_class.rb --false_class yes

[true, TrueClass]

$ ruby false_class.rb --false_class +

[true, TrueClass]

Object
File object.rb defines an option whose argument is not to be converted from String.

require 'optparse'
parser = OptionParser.new
parser.on('--object=OBJECT', Object) do |value|
p [value, value.class]
end
parser.parse!

Executions:

$ ruby object.rb --object foo

["foo", String]

$ ruby object.rb --object nil

["nil", String]

String
File string.rb defines an option whose argument is not to be converted from String.
require 'optparse'
parser = OptionParser.new
parser.on('--string=STRING', String) do |value|
p [value, value.class]
end
parser.parse!

Executions:

$ ruby string.rb --string foo

["foo", String]

$ ruby string.rb --string nil

["nil", String]

Array
File array.rb defines an option whose argument is to be converted from String to an array of strings, based
on comma-separated substrings.

require 'optparse'
parser = OptionParser.new
parser.on('--array=ARRAY', Array) do |value|
p [value, value.class]
end
parser.parse!

Executions:

$ ruby array.rb --array ""

[[], Array]

$ ruby array.rb --array foo,bar,baz

[["foo", "bar", "baz"], Array]

$ ruby array.rb --array "foo, bar, baz"

[["foo", " bar", " baz"], Array]
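As the last execution shows, whitespace around the commas is kept in the elements. If that is not wanted, the handler can clean the elements itself. The following is a minimal sketch, not part of the original examples: it passes an explicit argument list to OptionParser#parse instead of reading ARGV, and the variable name items is only illustrative.

```ruby
require 'optparse'

items = nil
parser = OptionParser.new
parser.on('--array=ARRAY', Array) do |value|
  # Strip the whitespace that the Array converter leaves in place.
  items = value.map(&:strip)
end
# Parse an explicit argument list rather than ARGV.
parser.parse(['--array', 'foo, bar, baz'])
p items
```

Doing the cleanup in the handler keeps the built-in converter unchanged, so other options that rely on the raw substrings are not affected.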


Regexp
File regexp.rb defines an option whose argument is to be converted to a Regexp object.

require 'optparse'
parser = OptionParser.new
parser.on('--regexp=REGEXP', Regexp) do |value|
p [value, value.class]
end
parser.parse!

Executions:

$ ruby regexp.rb --regexp foo
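The same conversion can be tried without a separate script file. This is a small sketch under the assumption that an explicit argument list is passed to OptionParser#parse; the variable name pattern is illustrative only.

```ruby
require 'optparse'

pattern = nil
parser = OptionParser.new
parser.on('--regexp=REGEXP', Regexp) do |value|
  # The handler receives a Regexp, not the original String.
  pattern = value
end
parser.parse(['--regexp', 'foo'])
p [pattern, pattern.class]
p pattern.match?('seafood')
```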

Custom Argument Converters


You can create custom argument converters. To create a custom converter, call OptionParser#accept with:
 An identifier, which may be any object.

 An optional match pattern, which defaults to /.*/m.


 A block that accepts the argument and returns the converted value.
This custom converter accepts any argument and converts it, if possible, to a Complex object.

require 'optparse'
parser = OptionParser.new
parser.accept(Complex) do |value|
value.to_c
end
parser.on('--complex COMPLEX', Complex) do |value|
p [value, value.class]
end
parser.parse!

Executions:

$ ruby custom_converter.rb --complex 0

[(0+0i), Complex]

$ ruby custom_converter.rb --complex 1

[(1+0i), Complex]

$ ruby custom_converter.rb --complex 1+2i


[(1+2i), Complex]

$ ruby custom_converter.rb --complex 0.3-0.5i

[(0.3-0.5i), Complex]

This custom converter accepts any 1-word argument and capitalizes it, if possible.

require 'optparse'
parser = OptionParser.new
parser.accept(:capitalize, /\w*/) do |value|
value.capitalize
end
parser.on('--capitalize XXX', :capitalize) do |value|
p [value, value.class]
end
parser.parse!

Executions:

$ ruby match_converter.rb --capitalize foo

["Foo", String]

$ ruby match_converter.rb --capitalize "foo bar"

match_converter.rb:9:in `<main>': invalid argument: --capitalize foo bar (OptionParser::InvalidArgument)
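In a real program the exception is usually rescued rather than left to terminate the script. The sketch below reuses the :capitalize converter from above and parses two explicit argument lists; the array name results and the choice to store the error message are assumptions made for illustration.

```ruby
require 'optparse'

results = []
parser = OptionParser.new
parser.accept(:capitalize, /\w*/) do |value|
  value.capitalize
end
parser.on('--capitalize XXX', :capitalize) do |value|
  results << value
end

[['--capitalize', 'foo'], ['--capitalize', 'foo bar']].each do |argv|
  begin
    parser.parse(argv)
  rescue OptionParser::InvalidArgument => e
    # Record the rejected argument instead of terminating the program.
    results << e.message
  end
end
p results
```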

Creates an option from the given parameters params. See Parameters for New Options.
The block, if given, is the handler for the created option. When the option is encountered during command-
line parsing, the block is called with the argument given for the option, if any. See Option Handlers.

Parameters for New Options


Option-creating methods in OptionParser accept arguments that determine the behavior of a new option:
 OptionParser#on
 OptionParser#on_head
 OptionParser#on_tail
 OptionParser#define
 OptionParser#define_head
 OptionParser#define_tail
 OptionParser#make_switch
The code examples on this page use:
 OptionParser#on, to define options.
 OptionParser#parse!, to parse the command line.
 Built-in option --help, to display defined options.
Contents:
 Option Names
o Short Names
 Simple Short Names
 Short Names with Required Arguments
 Short Names with Optional Arguments
 Short Names from Range
o Long Names
 Simple Long Names
 Long Names with Required Arguments
 Long Names with Optional Arguments
 Long Names with Negation
o Mixed Names
 Argument Styles
 Argument Values
o Explicit Argument Values
 Explicit Values in Array
 Explicit Values in Hash
o Argument Value Patterns
 Argument Converters
 Descriptions
 Option Handlers
o Handler Blocks
o Handler Procs
o Handler Methods

Option Names
There are two kinds of option names:
 Short option name, consisting of a single hyphen and a single character.
 Long option name, consisting of two hyphens and one or more characters.

Short Names

Simple Short Names


File short_simple.rb defines two options:
 One with short name -x.
 The other with two short names, in effect, aliases, -1 and -%.

require 'optparse'
parser = OptionParser.new
parser.on('-x', 'One short name') do |value|
p ['-x', value]
end
parser.on('-1', '-%', 'Two short names (aliases)') do |value|
p ['-1 or -%', value]
end
parser.parse!

Executions:

$ ruby short_simple.rb --help

Usage: short_simple [options]

-x One short name

-1, -% Two short names (aliases)

$ ruby short_simple.rb -x

["-x", true]

$ ruby short_simple.rb -1 -x -%

["-1 or -%", true]

["-x", true]

["-1 or -%", true]

Short Names with Required Arguments


A short name followed (no whitespace) by a dummy word defines an option that requires an argument.
File short_required.rb defines an option -x that requires an argument.

require 'optparse'
parser = OptionParser.new
parser.on('-xXXX', 'Short name with required argument') do |value|
p ['-x', value]
end
parser.parse!

Executions:

$ ruby short_required.rb --help

Usage: short_required [options]

-xXXX Short name with required argument

$ ruby short_required.rb -x

short_required.rb:6:in `<main>': missing argument: -x (OptionParser::MissingArgument)

$ ruby short_required.rb -x FOO

["-x", "FOO"]

Short Names with Optional Arguments


A short name followed (with whitespace) by a dummy word in square brackets defines an option that allows
an optional argument.
File short_optional.rb defines an option -x that allows an optional argument.

require 'optparse'
parser = OptionParser.new
parser.on('-x [XXX]', 'Short name with optional argument') do |value|
p ['-x', value]
end
parser.parse!

Executions:

$ ruby short_optional.rb --help

Usage: short_optional [options]

-x [XXX] Short name with optional argument

$ ruby short_optional.rb -x

["-x", nil]
$ ruby short_optional.rb -x FOO

["-x", "FOO"]

Short Names from Range


You can define an option with multiple short names taken from a range of characters. The parser yields both
the actual character cited and the value.
File short_range.rb defines an option with short names for all printable characters from ! to ~:

require 'optparse'
parser = OptionParser.new
parser.on('-[!-~]', 'Short names in (very large) range') do |name, value|
p ['!-~', name, value]
end
parser.parse!

Executions:

$ ruby short_range.rb --help

Usage: short_range [options]

-[!-~] Short names in (very large) range

$ ruby short_range.rb -!

["!-~", "!", nil]

$ ruby short_range.rb -A

["!-~", "A", nil]

$ ruby short_range.rb -z

["!-~", "z", nil]

Long Names

Simple Long Names


File long_simple.rb defines two options:
 One with long name --xxx.
 The other with two long names, in effect, aliases, --y1% and --z2#.

require 'optparse'
parser = OptionParser.new
parser.on('--xxx', 'One long name') do |value|
p ['--xxx', value]
end
parser.on('--y1%', '--z2#', 'Two long names (aliases)') do |value|
p ['--y1% or --z2#', value]
end
parser.parse!

Executions:

$ ruby long_simple.rb --help

Usage: long_simple [options]

--xxx One long name

--y1%, --z2# Two long names (aliases)

$ ruby long_simple.rb --xxx

["--xxx", true]

$ ruby long_simple.rb --y1% --xxx --z2#

["--y1% or --z2#", true]

["--xxx", true]

["--y1% or --z2#", true]

Long Names with Required Arguments


A long name followed (with whitespace) by a dummy word defines an option that requires an argument.
File long_required.rb defines an option --xxx that requires an argument.

require 'optparse'
parser = OptionParser.new
parser.on('--xxx XXX', 'Long name with required argument') do |value|
p ['--xxx', value]
end
parser.parse!
Executions:

$ ruby long_required.rb --help

Usage: long_required [options]

--xxx XXX Long name with required argument

$ ruby long_required.rb --xxx

long_required.rb:6:in `<main>': missing argument: --xxx (OptionParser::MissingArgument)

$ ruby long_required.rb --xxx FOO

["--xxx", "FOO"]

Long Names with Optional Arguments


A long name followed (with whitespace) by a dummy word in square brackets defines an option that allows
an optional argument.
File long_optional.rb defines an option --xxx that allows an optional argument.

require 'optparse'
parser = OptionParser.new
parser.on('--xxx [XXX]', 'Long name with optional argument') do |value|
p ['--xxx', value]
end
parser.parse!

Executions:

$ ruby long_optional.rb --help

Usage: long_optional [options]

--xxx [XXX] Long name with optional argument

$ ruby long_optional.rb --xxx

["--xxx", nil]

$ ruby long_optional.rb --xxx FOO

["--xxx", "FOO"]
Long Names with Negation
A long name may be defined with both positive and negative senses.
File long_with_negation.rb defines an option that has both senses.

require 'optparse'
parser = OptionParser.new
parser.on('--[no-]binary', 'Long name with negation') do |value|
p [value, value.class]
end
parser.parse!

Executions:

$ ruby long_with_negation.rb --help

Usage: long_with_negation [options]

--[no-]binary Long name with negation

$ ruby long_with_negation.rb --binary

[true, TrueClass]

$ ruby long_with_negation.rb --no-binary

[false, FalseClass]
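Both senses can also be checked in-process. This is a minimal sketch that passes explicit argument lists to OptionParser#parse; the array name seen is illustrative.

```ruby
require 'optparse'

seen = []
parser = OptionParser.new
parser.on('--[no-]binary', 'Long name with negation') do |value|
  # The handler receives true for --binary and false for --no-binary.
  seen << value
end
parser.parse(['--binary'])
parser.parse(['--no-binary'])
p seen
```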

Mixed Names
An option may have both short and long names.
File mixed_names.rb defines a mixture of short and long names.

require 'optparse'
parser = OptionParser.new
parser.on('-x', '--xxx', 'Short and long, no argument') do |value|
p ['--xxx', value]
end
parser.on('-yYYY', '--yyy', 'Short and long, required argument') do |value|
p ['--yyy', value]
end
parser.on('-z [ZZZ]', '--zzz', 'Short and long, optional argument') do |value|
p ['--zzz', value]
end
parser.parse!
Executions:

$ ruby mixed_names.rb --help

Usage: mixed_names [options]

-x, --xxx Short and long, no argument

-y, --yyyYYY Short and long, required argument

-z, --zzz [ZZZ] Short and long, optional argument

$ ruby mixed_names.rb -x

["--xxx", true]

$ ruby mixed_names.rb --xxx

["--xxx", true]

$ ruby mixed_names.rb -y

mixed_names.rb:12:in `<main>': missing argument: -y (OptionParser::MissingArgument)

$ ruby mixed_names.rb -y FOO

["--yyy", "FOO"]

$ ruby mixed_names.rb --yyy

mixed_names.rb:12:in `<main>': missing argument: --yyy (OptionParser::MissingArgument)

$ ruby mixed_names.rb --yyy BAR

["--yyy", "BAR"]

$ ruby mixed_names.rb -z

["--zzz", nil]

$ ruby mixed_names.rb -z BAZ

["--zzz", "BAZ"]

$ ruby mixed_names.rb --zzz


["--zzz", nil]

$ ruby mixed_names.rb --zzz BAT

["--zzz", "BAT"]

Argument Keywords
As seen above, a given option name string may itself indicate whether the option has no argument, a
required argument, or an optional argument.
An alternative is to use a separate symbol keyword, which is one of :NONE (the
default), :REQUIRED, :OPTIONAL.
File argument_keywords.rb defines an option with a required argument.

require 'optparse'
parser = OptionParser.new
parser.on('-x', '--xxx', :REQUIRED, 'Required argument') do |value|
p ['--xxx', value]
end
parser.parse!

Executions:

$ ruby argument_keywords.rb --help

Usage: argument_keywords [options]

-x, --xxx Required argument

$ ruby argument_keywords.rb --xxx

argument_keywords.rb:6:in `<main>': missing argument: --xxx (OptionParser::MissingArgument)

$ ruby argument_keywords.rb --xxx FOO

["--xxx", "FOO"]

Argument Strings
Still another way to specify a required argument is to define it in a string separate from the name string.
File argument_strings.rb defines an option with a required argument.
require 'optparse'
parser = OptionParser.new
parser.on('-x', '--xxx', '=XXX', 'Required argument') do |value|
p ['--xxx', value]
end
parser.parse!

Executions:

$ ruby argument_strings.rb --help

Usage: argument_strings [options]

-x, --xxx=XXX Required argument

$ ruby argument_strings.rb --xxx

argument_strings.rb:9:in `<main>': missing argument: --xxx (OptionParser::MissingArgument)

$ ruby argument_strings.rb --xxx FOO

["--xxx", "FOO"]

Argument Values
Permissible argument values may be restricted either by specifying explicit values or by providing a pattern
that the given value must match.

Explicit Argument Values


You can specify argument values in either of two ways:
 Specify values in an array of strings.
 Specify values in a hash.
Explicit Values in Array
You can specify explicit argument values in an array of strings. The argument value must be one of those
strings, or an unambiguous abbreviation.
File explicit_array_values.rb defines options with explicit argument values.

require 'optparse'
parser = OptionParser.new
parser.on('-xXXX', ['foo', 'bar'], 'Values for required argument' ) do |value|
p ['-x', value]
end
parser.on('-y [YYY]', ['baz', 'bat'], 'Values for optional argument') do |value|
p ['-y', value]
end
parser.parse!

Executions:

$ ruby explicit_array_values.rb --help

Usage: explicit_array_values [options]

-xXXX Values for required argument

-y [YYY] Values for optional argument

$ ruby explicit_array_values.rb -x

explicit_array_values.rb:9:in `<main>': missing argument: -x (OptionParser::MissingArgument)

$ ruby explicit_array_values.rb -x foo

["-x", "foo"]

$ ruby explicit_array_values.rb -x f

["-x", "foo"]

$ ruby explicit_array_values.rb -x bar

["-x", "bar"]

$ ruby explicit_array_values.rb -y ba

explicit_array_values.rb:9:in `<main>': ambiguous argument: -y ba (OptionParser::AmbiguousArgument)

$ ruby explicit_array_values.rb -x baz

explicit_array_values.rb:9:in `<main>': invalid argument: -x baz (OptionParser::InvalidArgument)
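The abbreviation and rejection behavior shown above can be observed in one run by rescuing the exception. This sketch covers only the -x option and parses explicit argument lists; the array name outcomes is an assumption for illustration.

```ruby
require 'optparse'

outcomes = []
parser = OptionParser.new
parser.on('-xXXX', ['foo', 'bar'], 'Values for required argument') do |value|
  outcomes << value
end

%w[foo f baz].each do |arg|
  begin
    parser.parse(['-x', arg])
  rescue OptionParser::InvalidArgument => e
    # A value outside the permitted list is rejected.
    outcomes << e.class
  end
end
p outcomes
```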

Explicit Values in Hash


You can specify explicit argument values in a hash (the example below uses symbol keys). The value passed
must be the name of one of those keys, or an unambiguous abbreviation; the value yielded will be the value for that key.
File explicit_hash_values.rb defines options with explicit argument values.
require 'optparse'
parser = OptionParser.new
parser.on('-xXXX', {foo: 0, bar: 1}, 'Values for required argument' ) do |value|
p ['-x', value]
end
parser.on('-y [YYY]', {baz: 2, bat: 3}, 'Values for optional argument') do |value|
p ['-y', value]
end
parser.parse!

Executions:

$ ruby explicit_hash_values.rb --help

Usage: explicit_hash_values [options]

-xXXX Values for required argument

-y [YYY] Values for optional argument

$ ruby explicit_hash_values.rb -x

explicit_hash_values.rb:9:in `<main>': missing argument: -x (OptionParser::MissingArgument)

$ ruby explicit_hash_values.rb -x foo

["-x", 0]

$ ruby explicit_hash_values.rb -x f

["-x", 0]

$ ruby explicit_hash_values.rb -x bar

["-x", 1]

$ ruby explicit_hash_values.rb -x baz

explicit_hash_values.rb:9:in `<main>': invalid argument: -x baz (OptionParser::InvalidArgument)

$ ruby explicit_hash_values.rb -y

["-y", nil]

$ ruby explicit_hash_values.rb -y baz


["-y", 2]

$ ruby explicit_hash_values.rb -y bat

["-y", 3]

$ ruby explicit_hash_values.rb -y ba

explicit_hash_values.rb:9:in `<main>': ambiguous argument: -y ba (OptionParser::AmbiguousArgument)

$ ruby explicit_hash_values.rb -y bam

["-y", nil]

Argument Value Patterns


You can restrict permissible argument values by specifying a Regexp that the given argument must match.
File matched_values.rb defines options with matched argument values.

require 'optparse'
parser = OptionParser.new
parser.on('--xxx XXX', /foo/i, 'Matched values') do |value|
p ['--xxx', value]
end
parser.parse!

Executions:

$ ruby matched_values.rb --help

Usage: matched_values [options]

--xxx XXX Matched values

$ ruby matched_values.rb --xxx foo

["--xxx", "foo"]

$ ruby matched_values.rb --xxx FOO

["--xxx", "FOO"]

$ ruby matched_values.rb --xxx bar


matched_values.rb:6:in `<main>': invalid argument: --xxx bar (OptionParser::InvalidArgument)

Argument Converters
An option can specify that its argument is to be converted from the default String to an instance of another
class.
There are a number of built-in converters. You can also define custom converters.
See Argument Converters.
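As a brief illustration, the sketch below uses the built-in Integer converter and rescues the error raised for a non-numeric argument. The option name --count and the variable names are hypothetical, and the argument lists are passed explicitly instead of being read from ARGV.

```ruby
require 'optparse'

count = nil
error = nil
parser = OptionParser.new
parser.on('--count=N', Integer, 'A hypothetical integer-valued option') do |value|
  # The handler receives an Integer, not a String.
  count = value
end
parser.parse(['--count', '42'])
begin
  parser.parse(['--count', 'abc'])
rescue OptionParser::InvalidArgument => e
  # An argument that cannot be converted is reported as invalid.
  error = e.message
end
p [count, count.class]
p error
```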

Descriptions
A description parameter is any string parameter that is not recognized as an option name or a terminator; in
other words, it does not begin with a hyphen.
You may give any number of description parameters; each becomes a line in the text generated by option
--help.
File descriptions.rb has six strings in its array descriptions. These are all passed as parameters
to OptionParser#on, so that they all, line for line, become the option’s description.

require 'optparse'
parser = OptionParser.new
description = <<-EOT
Lorem ipsum dolor sit amet, consectetuer
adipiscing elit. Aenean commodo ligula eget.
Aenean massa. Cum sociis natoque penatibus
et magnis dis parturient montes, nascetur
ridiculus mus. Donec quam felis, ultricies
nec, pellentesque eu, pretium quis, sem.
EOT
descriptions = description.split($/)
parser.on('--xxx', *descriptions) do |value|
p ['--xxx', value]
end
parser.parse!

Executions:

$ ruby descriptions.rb --help

Usage: descriptions [options]

--xxx Lorem ipsum dolor sit amet, consectetuer


adipiscing elit. Aenean commodo ligula eget.

Aenean massa. Cum sociis natoque penatibus

et magnis dis parturient montes, nascetur

ridiculus mus. Donec quam felis, ultricies

nec, pellentesque eu, pretium quis, sem.

$ ruby descriptions.rb --xxx

["--xxx", true]

Option Handlers
The handler for an option is an executable that will be called when the option is encountered. The handler
may be:
 A block (this is most often seen).
 A proc.
 A method.

Handler Blocks
An option handler may be a block.
File block.rb defines an option that has a handler block.

require 'optparse'
parser = OptionParser.new
parser.on('--xxx', 'Option with no argument') do |value|
p ['Handler block for -xxx called with value:', value]
end
parser.on('--yyy YYY', 'Option with required argument') do |value|
p ['Handler block for -yyy called with value:', value]
end
parser.parse!

Executions:

$ ruby block.rb --help

Usage: block [options]


--xxx Option with no argument

--yyy YYY Option with required argument

$ ruby block.rb --xxx

["Handler block for -xxx called with value:", true]

$ ruby block.rb --yyy FOO

["Handler block for -yyy called with value:", "FOO"]

Handler Procs
An option handler may be a Proc.
File proc.rb defines an option that has a handler proc.

require 'optparse'
parser = OptionParser.new
parser.on(
'--xxx',
'Option with no argument',
->(value) {p ['Handler proc for -xxx called with value:', value]}
)
parser.on(
'--yyy YYY',
'Option with required argument',
->(value) {p ['Handler proc for -yyy called with value:', value]}
)
parser.parse!

Executions:

$ ruby proc.rb --help

Usage: proc [options]

--xxx Option with no argument

--yyy YYY Option with required argument

$ ruby proc.rb --xxx

["Handler proc for -xxx called with value:", true]


$ ruby proc.rb --yyy FOO

["Handler proc for -yyy called with value:", "FOO"]

Handler Methods
An option handler may be a Method.
File method.rb defines an option that has a handler method.

require 'optparse'
parser = OptionParser.new
def xxx_handler(value)
p ['Handler method for -xxx called with value:', value]
end
parser.on('--xxx', 'Option with no argument', method(:xxx_handler))
def yyy_handler(value)
p ['Handler method for -yyy called with value:', value]
end
parser.on('--yyy YYY', 'Option with required argument', method(:yyy_handler))
parser.parse!

Executions:

$ ruby method.rb --help

Usage: method [options]

--xxx Option with no argument

--yyy YYY Option with required argument

$ ruby method.rb --xxx

["Handler method for -xxx called with value:", true]

$ ruby method.rb --yyy FOO

["Handler method for -yyy called with value:", "FOO"]

Tutorial

Why OptionParser?
When a Ruby program executes, it captures its command-line arguments and options into variable ARGV.
This simple program just prints its ARGV:

p ARGV

Execution, with arguments and options:

$ ruby argv.rb foo --bar --baz bat bam

["foo", "--bar", "--baz", "bat", "bam"]

The executing program is responsible for parsing and handling the command-line options.
OptionParser offers methods for parsing and handling those options.
With OptionParser, you can define options so that for each option:
 The code that defines the option and code that handles that option are in the same place.
 The option may take no argument, a required argument, or an optional argument.
 The argument may be automatically converted to a specified class.
 The argument may be restricted to specified forms.
 The argument may be restricted to specified values.
The class also has method help, which displays automatically-generated help text.

Contents
 To Begin With
 Defining Options
 Option Names
o Short Option Names
o Long Option Names
o Mixing Option Names
o Option Name Abbreviations
 Option Arguments
o Option with No Argument
o Option with Required Argument
o Option with Optional Argument
o Argument Abbreviations
 Argument Values
o Explicit Argument Values
 Explicit Values in Array
 Explicit Values in Hash
o Argument Value Patterns
 Keyword Argument into
o Collecting Options
o Checking for Missing Options
o Default Values for Options
 Argument Converters
 Help
 Top List and Base List
 Defining Options
 Parsing
o Method parse!
o Method parse
o Method order!
o Method order
o Method permute!
o Method permute

To Begin With
To use OptionParser:
1. Require the OptionParser code.
2. Create an OptionParser object.
3. Define one or more options.
4. Parse the command line.
File basic.rb defines three options, -x, -y, and -z, each with a descriptive string, and each with a block.

# Require the OptionParser code.


require 'optparse'
# Create an OptionParser object.
parser = OptionParser.new
# Define one or more options.
parser.on('-x', 'Whether to X') do |value|
p ['x', value]
end
parser.on('-y', 'Whether to Y') do |value|
p ['y', value]
end
parser.on('-z', 'Whether to Z') do |value|
p ['z', value]
end
# Parse the command line and return pared-down ARGV.
p parser.parse!

From these defined options, the parser automatically builds help text:

$ ruby basic.rb --help


Usage: basic [options]

-x Whether to X

-y Whether to Y

-z Whether to Z

When an option is found during parsing, the block defined for the option is called with the argument value.
An invalid option raises an exception.
Method parse!, which is used most often in this tutorial, removes from ARGV the options and arguments it
finds, leaving other non-option arguments for the program to handle on its own. The method returns the
possibly-reduced ARGV array.
Executions:

$ ruby basic.rb -x -z

["x", true]

["z", true]

[]

$ ruby basic.rb -z -y -x

["z", true]

["y", true]

["x", true]

[]

$ ruby basic.rb -x input_file.txt output_file.txt

["x", true]

["input_file.txt", "output_file.txt"]

$ ruby basic.rb -a

basic.rb:16:in `<main>': invalid option: -a (OptionParser::InvalidOption)
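The behavior of parse! described above can also be seen by passing it an array directly. This is a sketch, not part of basic.rb: the names seen, rest, and message are illustrative, and rescuing the exception is one possible way to report an invalid option.

```ruby
require 'optparse'

seen = []
parser = OptionParser.new
parser.on('-x', 'Whether to X') { |value| seen << ['x', value] }

# parse! removes the options it handles from the given array and returns that array.
argv = ['-x', 'input_file.txt', 'output_file.txt']
rest = parser.parse!(argv)
p seen
p rest

# An undefined option raises OptionParser::InvalidOption, which can be rescued.
message = nil
begin
  parser.parse!(['-a'])
rescue OptionParser::InvalidOption => e
  message = e.message
end
p message
```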


Defining Options
A common way to define an option in OptionParser is with instance method OptionParser#on.
The method may be called with any number of arguments (whose order does not matter), and may also
have a trailing optional keyword argument into.
The given arguments determine the characteristics of the new option. These may include:
 One or more short option names.
 One or more long option names.
 Whether the option takes no argument, an optional argument, or a required argument.
 Acceptable forms for the argument.
 Acceptable values for the argument.
 A proc or method to be called when the parser encounters the option.
 String descriptions for the option.

Option Names
You can give an option one or more names of two types:
 Short (1-character) name, beginning with one hyphen (-).
 Long (multi-character) name, beginning with two hyphens (--).

Short Option Names


A short option name consists of a hyphen and a single character.
File short_names.rb defines an option with a short name, -x, and an option with two short names (aliases, in
effect) -1 and -%.

require 'optparse'
parser = OptionParser.new
parser.on('-x', 'Short name') do |value|
p ['x', value]
end
parser.on('-1', '-%', 'Two short names') do |value|
p ['-1 or -%', value]
end
parser.parse!

Executions:

$ ruby short_names.rb --help

Usage: short_names [options]

-x Short name
-1, -% Two short names

$ ruby short_names.rb -x

["x", true]

$ ruby short_names.rb -1

["-1 or -%", true]

$ ruby short_names.rb -%

["-1 or -%", true]

Multiple short names can “share” a hyphen:

$ ruby short_names.rb -x1%

["x", true]

["-1 or -%", true]

["-1 or -%", true]
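The shared-hyphen form can be checked in-process as well. This sketch defines the same options as short_names.rb but records the handler calls in an array (the name calls is illustrative) and parses an explicit argument list.

```ruby
require 'optparse'

calls = []
parser = OptionParser.new
parser.on('-x', 'Short name') { |value| calls << ['x', value] }
parser.on('-1', '-%', 'Two short names') { |value| calls << ['-1 or -%', value] }
# One hyphen followed by three short names.
parser.parse(['-x1%'])
p calls
```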

Long Option Names


A long option name consists of two hyphens and one or more characters (usually two or more characters).
File long_names.rb defines an option with a long name, --xxx, and an option with two long names (aliases,
in effect) --y1% and --z2#.

require 'optparse'
parser = OptionParser.new
parser.on('--xxx', 'Long name') do |value|
p ['-xxx', value]
end
parser.on('--y1%', '--z2#', "Two long names") do |value|
p ['--y1% or --z2#', value]
end
parser.parse!

Executions:
$ ruby long_names.rb --help

Usage: long_names [options]

--xxx Long name

--y1%, --z2# Two long names

$ ruby long_names.rb --xxx

["-xxx", true]

$ ruby long_names.rb --y1%

["--y1% or --z2#", true]

$ ruby long_names.rb --z2#

["--y1% or --z2#", true]

A long name may be defined with both positive and negative senses.
File long_with_negation.rb defines an option that has both senses.

require 'optparse'
parser = OptionParser.new
parser.on('--[no-]binary', 'Long name with negation') do |value|
p [value, value.class]
end
parser.parse!

Executions:

$ ruby long_with_negation.rb --help

Usage: long_with_negation [options]

--[no-]binary Long name with negation

$ ruby long_with_negation.rb --binary

[true, TrueClass]

$ ruby long_with_negation.rb --no-binary


[false, FalseClass]
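The two senses can also be exercised from inside a script by parsing explicit arrays. The sketch below assumes nothing beyond the option defined above; the variable names binary, on_value, and off_value are illustrative.

```ruby
require 'optparse'

binary = nil
parser = OptionParser.new
parser.on('--[no-]binary', 'Long name with negation') { |value| binary = value }

# Positive sense yields true.
parser.parse(%w[--binary])
on_value = binary

# Negative sense yields false.
parser.parse(%w[--no-binary])
off_value = binary

p [on_value, off_value] # => [true, false]
```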

Mixing Option Names


Many developers like to mix short and long option names, so that a short name is in effect an abbreviation
of a long name.
File mixed_names.rb defines options that each have both a short and a long name.

require 'optparse'
parser = OptionParser.new
parser.on('-x', '--xxx', 'Short and long, no argument') do |value|
p ['--xxx', value]
end
parser.on('-yYYY', '--yyy', 'Short and long, required argument') do |value|
p ['--yyy', value]
end
parser.on('-z [ZZZ]', '--zzz', 'Short and long, optional argument') do |value|
p ['--zzz', value]
end
parser.parse!

Executions:

$ ruby mixed_names.rb --help

Usage: mixed_names [options]

-x, --xxx Short and long, no argument

-y, --yyyYYY Short and long, required argument

-z, --zzz [ZZZ] Short and long, optional argument

$ ruby mixed_names.rb -x

["--xxx", true]

$ ruby mixed_names.rb --xxx

["--xxx", true]

$ ruby mixed_names.rb -y
mixed_names.rb:12:in `<main>': missing argument: -y (OptionParser::MissingArgument)

$ ruby mixed_names.rb -y FOO

["--yyy", "FOO"]

$ ruby mixed_names.rb --yyy

mixed_names.rb:12:in `<main>': missing argument: --yyy (OptionParser::MissingArgument)

$ ruby mixed_names.rb --yyy BAR

["--yyy", "BAR"]

$ ruby mixed_names.rb -z

["--zzz", nil]

$ ruby mixed_names.rb -z BAZ

["--zzz", "BAZ"]

$ ruby mixed_names.rb --zzz

["--zzz", nil]

$ ruby mixed_names.rb --zzz BAT

["--zzz", "BAT"]

Option Name Abbreviations


By default, abbreviated option names on the command-line are allowed. An abbreviated name is valid if it is
unique among abbreviated option names.

require 'optparse'
parser = OptionParser.new
parser.on('-n', '--dry-run',) do |value|
p ['--dry-run', value]
end
parser.on('-d', '--draft',) do |value|
p ['--draft', value]
end
parser.parse!

Executions:

$ ruby name_abbrev.rb --help

Usage: name_abbrev [options]

-n, --dry-run

-d, --draft

$ ruby name_abbrev.rb -n

["--dry-run", true]

$ ruby name_abbrev.rb --dry-run

["--dry-run", true]

$ ruby name_abbrev.rb -d

["--draft", true]

$ ruby name_abbrev.rb --draft

["--draft", true]

$ ruby name_abbrev.rb --d

name_abbrev.rb:9:in `<main>': ambiguous option: --d (OptionParser::AmbiguousOption)

$ ruby name_abbrev.rb --dr

name_abbrev.rb:9:in `<main>': ambiguous option: --dr (OptionParser::AmbiguousOption)

$ ruby name_abbrev.rb --dry

["--dry-run", true]

$ ruby name_abbrev.rb --dra

["--draft", true]
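In a script that should not terminate on an ambiguous abbreviation, the exception can be rescued. This is a sketch under the assumption that reporting the message is enough; the variables seen and ambiguous are illustrative.

```ruby
require 'optparse'

seen = []
parser = OptionParser.new
parser.on('-n', '--dry-run') { |value| seen << ['--dry-run', value] }
parser.on('-d', '--draft') { |value| seen << ['--draft', value] }

# A unique prefix is accepted as the full name.
parser.parse(%w[--dry])

# A prefix shared by two long names raises AmbiguousOption.
ambiguous = begin
  parser.parse(%w[--dr])
  nil
rescue OptionParser::AmbiguousOption => e
  e.message
end

p seen      # => [["--dry-run", true]]
p ambiguous # => "ambiguous option: --dr"
```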

You can disable abbreviation using method require_exact.

require 'optparse'
parser = OptionParser.new
parser.on('-n', '--dry-run',) do |value|
p ['--dry-run', value]
end
parser.on('-d', '--draft',) do |value|
p ['--draft', value]
end
parser.require_exact = true
parser.parse!

Executions:

$ ruby no_abbreviation.rb --dry-ru

no_abbreviation.rb:10:in `<main>': invalid option: --dry-ru (OptionParser::InvalidOption)

$ ruby no_abbreviation.rb --dry-run

["--dry-run", true]

Option Arguments
An option may take no argument, a required argument, or an optional argument.

Option with No Argument


All the examples above define options with no argument.

Option with Required Argument


Specify a required argument for an option by adding a dummy word to its name definition.
File required_argument.rb defines two options; each has a required argument because the name definition
has a following dummy word.

require 'optparse'
parser = OptionParser.new
parser.on('-x XXX', '--xxx', 'Required argument via short name') do |value|
p ['--xxx', value]
end
parser.on('-y', '--y YYY', 'Required argument via long name') do |value|
p ['--yyy', value]
end
parser.parse!

When an option is found, the given argument is yielded.


Executions:

$ ruby required_argument.rb --help

Usage: required_argument [options]

-x, --xxx XXX Required argument via short name

-y, --y YYY Required argument via long name

$ ruby required_argument.rb -x AAA

["--xxx", "AAA"]

$ ruby required_argument.rb -y BBB

["--yyy", "BBB"]

Omitting a required argument raises an error:

$ ruby required_argument.rb -x

required_argument.rb:9:in `<main>': missing argument: -x (OptionParser::MissingArgument)
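Rather than letting the exception end the program with a backtrace, a script can rescue OptionParser::ParseError, the common superclass of the parsing errors shown in this section. The helper name parse_or_report is hypothetical, and returning a string is just one possible way to report the problem.

```ruby
require 'optparse'

parser = OptionParser.new
parser.on('-x XXX', '--xxx', 'Required argument') { |value| p ['--xxx', value] }

# Hypothetical helper: returns the leftover arguments on success,
# or a short error string when parsing fails.
def parse_or_report(parser, argv)
  parser.parse(argv)
rescue OptionParser::ParseError => e
  "error: #{e.message}"
end

result = parse_or_report(parser, %w[-x])
p result # => "error: missing argument: -x"
```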

Option with Optional Argument


Specify an optional argument for an option by adding a dummy word enclosed in square brackets to its
name definition.
File optional_argument.rb defines two options; each has an optional argument because the name definition
has a following dummy word in square brackets.

require 'optparse'
parser = OptionParser.new
parser.on('-x [XXX]', '--xxx', 'Optional argument via short name') do |value|
p ['--xxx', value]
end
parser.on('-y', '--yyy [YYY]', 'Optional argument via long name') do |value|
p ['--yyy', value]
end
parser.parse!

When an option with an argument is found, the given argument is yielded.


Executions:
$ ruby optional_argument.rb --help

Usage: optional_argument [options]

-x, --xxx [XXX] Optional argument via short name

-y, --yyy [YYY] Optional argument via long name

$ ruby optional_argument.rb -x AAA

["--xxx", "AAA"]

$ ruby optional_argument.rb -y BBB

["--yyy", "BBB"]

Omitting an optional argument does not raise an error.

Argument Values
Permissible argument values may be restricted either by specifying explicit values or by providing a pattern
that the given value must match.

Explicit Argument Values


You can specify argument values in either of two ways:
 Specify values in an array of strings.
 Specify values in a hash.
Explicit Values in Array
You can specify explicit argument values in an array of strings. The argument value must be one of those
strings, or an unambiguous abbreviation.
File explicit_array_values.rb defines options with explicit argument values.

require 'optparse'
parser = OptionParser.new
parser.on('-xXXX', ['foo', 'bar'], 'Values for required argument' ) do |value|
p ['-x', value]
end
parser.on('-y [YYY]', ['baz', 'bat'], 'Values for optional argument') do |value|
p ['-y', value]
end
parser.parse!

Executions:

$ ruby explicit_array_values.rb --help

Usage: explicit_array_values [options]

-xXXX Values for required argument

-y [YYY] Values for optional argument

$ ruby explicit_array_values.rb -x

explicit_array_values.rb:9:in `<main>': missing argument: -x (OptionParser::MissingArgument)

$ ruby explicit_array_values.rb -x foo

["-x", "foo"]

$ ruby explicit_array_values.rb -x f

["-x", "foo"]

$ ruby explicit_array_values.rb -x bar

["-x", "bar"]

$ ruby explicit_array_values.rb -y ba

explicit_array_values.rb:9:in `<main>': ambiguous argument: -y ba (OptionParser::AmbiguousArgument)

$ ruby explicit_array_values.rb -x baz

explicit_array_values.rb:9:in `<main>': invalid argument: -x baz (OptionParser::InvalidArgument)

Explicit Values in Hash


You can specify explicit argument values in a hash. The value passed must be one of the hash keys (symbols in
the example below), or an unambiguous abbreviation; the value yielded will be the value for that key.
File explicit_hash_values.rb defines options with explicit argument values.

require 'optparse'
parser = OptionParser.new
parser.on('-xXXX', {foo: 0, bar: 1}, 'Values for required argument' ) do |value|
p ['-x', value]
end
parser.on('-y [YYY]', {baz: 2, bat: 3}, 'Values for optional argument') do |value|
p ['-y', value]
end
parser.parse!

Executions:

$ ruby explicit_hash_values.rb --help

Usage: explicit_hash_values [options]

-xXXX Values for required argument

-y [YYY] Values for optional argument

$ ruby explicit_hash_values.rb -x

explicit_hash_values.rb:9:in `<main>': missing argument: -x (OptionParser::MissingArgument)

$ ruby explicit_hash_values.rb -x foo

["-x", 0]

$ ruby explicit_hash_values.rb -x f

["-x", 0]

$ ruby explicit_hash_values.rb -x bar

["-x", 1]

$ ruby explicit_hash_values.rb -x baz

explicit_hash_values.rb:9:in `<main>': invalid argument: -x baz (OptionParser::InvalidArgument)

$ ruby explicit_hash_values.rb -y

["-y", nil]

$ ruby explicit_hash_values.rb -y baz

["-y", 2]

$ ruby explicit_hash_values.rb -y bat

["-y", 3]
$ ruby explicit_hash_values.rb -y ba

explicit_hash_values.rb:9:in `<main>': ambiguous argument: -y ba (OptionParser::AmbiguousArgument)

$ ruby explicit_hash_values.rb -y bam

["-y", nil]

Argument Value Patterns


You can restrict permissible argument values by specifying a Regexp that the given argument must match.
File matched_values.rb defines options with matched argument values.

require 'optparse'
parser = OptionParser.new
parser.on('--xxx XXX', /foo/i, 'Matched values') do |value|
p ['--xxx', value]
end
parser.parse!

Executions:

$ ruby matched_values.rb --help

Usage: matched_values [options]

--xxx XXX Matched values

$ ruby matched_values.rb --xxx foo

["--xxx", "foo"]

$ ruby matched_values.rb --xxx FOO

["--xxx", "FOO"]

$ ruby matched_values.rb --xxx bar

matched_values.rb:6:in `<main>': invalid argument: --xxx bar (OptionParser::InvalidArgument)

Keyword Argument into


In parsing options, you can pass keyword argument into with a hash-like object; each parsed option will be
added to it as a name/value pair.
This is useful for:
 Collecting options.
 Checking for missing options.
 Providing default values for options.

Collecting Options
Use keyword argument into to collect options.

require 'optparse'
parser = OptionParser.new
parser.on('-x', '--xxx', 'Short and long, no argument')
parser.on('-yYYY', '--yyy', 'Short and long, required argument')
parser.on('-z [ZZZ]', '--zzz', 'Short and long, optional argument')
options = {}
parser.parse!(into: options)
p options

Executions:

$ ruby collected_options.rb --help

Usage: collected_options [options]

-x, --xxx Short and long, no argument

-y, --yyyYYY Short and long, required argument

-z, --zzz [ZZZ] Short and long, optional argument

$ ruby collected_options.rb --xxx

{:xxx=>true}

$ ruby collected_options.rb --xxx --yyy FOO

{:xxx=>true, :yyy=>"FOO"}

$ ruby collected_options.rb --xxx --yyy FOO --zzz Bar

{:xxx=>true, :yyy=>"FOO", :zzz=>"Bar"}

$ ruby collected_options.rb --xxx --yyy FOO --yyy BAR

{:xxx=>true, :yyy=>"BAR"}

Note in the last execution that the argument value for option --yyy was overwritten.
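If every occurrence of a repeated option should be kept rather than overwritten, one possible approach is to use a block that appends to an array instead of relying on into. This is a sketch, not part of the files above; the array name values is illustrative.

```ruby
require 'optparse'

values = []
parser = OptionParser.new
# The block runs once per occurrence, so each argument is retained.
parser.on('-yYYY', '--yyy', 'Repeatable option') { |value| values << value }

parser.parse(%w[--yyy FOO --yyy BAR])

p values # => ["FOO", "BAR"]
```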
Checking for Missing Options
Use the collected options to check for missing options.

require 'optparse'
parser = OptionParser.new
parser.on('-x', '--xxx', 'Short and long, no argument')
parser.on('-yYYY', '--yyy', 'Short and long, required argument')
parser.on('-z [ZZZ]', '--zzz', 'Short and long, optional argument')
options = {}
parser.parse!(into: options)
required_options = [:xxx, :zzz]
missing_options = required_options - options.keys
unless missing_options.empty?
fail "Missing required options: #{missing_options}"
end

Executions:

$ ruby missing_options.rb --help

Usage: missing_options [options]

-x, --xxx Short and long, no argument

-y, --yyyYYY Short and long, required argument

-z, --zzz [ZZZ] Short and long, optional argument

$ ruby missing_options.rb --yyy FOO

missing_options.rb:11:in `<main>': Missing required options: [:xxx, :zzz] (RuntimeError)

Default Values for Options


Initialize the into argument to define default values for options.

require 'optparse'
parser = OptionParser.new
parser.on('-x', '--xxx', 'Short and long, no argument')
parser.on('-yYYY', '--yyy', 'Short and long, required argument')
parser.on('-z [ZZZ]', '--zzz', 'Short and long, optional argument')
options = {yyy: 'AAA', zzz: 'BBB'}
parser.parse!(into: options)
p options

Executions:

$ ruby default_values.rb --help

Usage: default_values [options]

-x, --xxx Short and long, no argument

-y, --yyyYYY Short and long, required argument

-z, --zzz [ZZZ] Short and long, optional argument

$ ruby default_values.rb --yyy FOO

{:yyy=>"FOO", :zzz=>"BBB"}

Argument Converters
An option can specify that its argument is to be converted from the default String to an instance of another
class. There are a number of built-in converters.
Example: File date.rb defines an option whose argument is to be converted to a Date object. The argument
is converted by method Date.parse.

require 'optparse/date'
parser = OptionParser.new
parser.on('--date=DATE', Date) do |value|
p [value, value.class]
end
parser.parse!

Executions:

$ ruby date.rb --date 2001-02-03

[#<Date: 2001-02-03 ((2451944j,0s,0n),+0s,2299161j)>, Date]

$ ruby date.rb --date 20010203

[#<Date: 2001-02-03 ((2451944j,0s,0n),+0s,2299161j)>, Date]

$ ruby date.rb --date "3rd Feb 2001"

[#<Date: 2001-02-03 ((2451944j,0s,0n),+0s,2299161j)>, Date]


You can also define custom converters. See Argument Converters for both built-in and custom converters.
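Integer is one of the classes that can be given as a converter with only require 'optparse'. The sketch below shows a converted value and the error raised for text that is not an integer; the option name --count and the variable names are illustrative.

```ruby
require 'optparse'

count = nil
parser = OptionParser.new
parser.on('--count=N', Integer, 'Number of repetitions') { |value| count = value }

# The string "42" is converted before the block is called.
parser.parse(%w[--count 42])
p [count, count.class] # => [42, Integer]

# Text that cannot be converted raises InvalidArgument.
invalid = begin
  parser.parse(%w[--count forty-two])
  nil
rescue OptionParser::InvalidArgument => e
  e.message
end
p invalid
```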

Help
OptionParser makes automatically generated help text available.
The help text consists of:
 A banner, showing the usage.
 Option short and long names.
 Option dummy argument names.
 Option descriptions.
Example code:

require 'optparse'
parser = OptionParser.new
parser.on(
'-x', '--xxx',
'Adipiscing elit. Aenean commodo ligula eget.',
'Aenean massa. Cum sociis natoque penatibus',
)
parser.on(
'-y', '--yyy YYY',
'Lorem ipsum dolor sit amet, consectetuer.'
)
parser.on(
'-z', '--zzz [ZZZ]',
'Et magnis dis parturient montes, nascetur',
'ridiculus mus. Donec quam felis, ultricies',
'nec, pellentesque eu, pretium quis, sem.',
)
parser.parse!

The option names and dummy argument names are defined as described above.
The option description consists of the strings that are not themselves option names; an option can have
more than one description string.

Execution:

Usage: help [options]

-x, --xxx Adipiscing elit. Aenean commodo ligula eget.

Aenean massa. Cum sociis natoque penatibus

-y, --yyy YYY Lorem ipsum dolor sit amet, consectetuer.


-z, --zzz [ZZZ] Et magnis dis parturient montes, nascetur

ridiculus mus. Donec quam felis, ultricies

nec, pellentesque eu, pretium quis, sem.

The program name is included in the default banner: Usage: #{program_name} [options]; you can change
the program name.

require 'optparse'
parser = OptionParser.new
parser.program_name = 'help_program_name.rb'
parser.parse!

Execution:

$ ruby help_program_name.rb --help

Usage: help_program_name.rb [options]

You can also change the entire banner.

require 'optparse'
parser = OptionParser.new
parser.banner = "Usage: ruby help_banner.rb"
parser.parse!

Execution:

$ ruby help_banner.rb --help

Usage: ruby help_banner.rb

By default, the option names are indented 4 spaces and the width of the option-names field is 32 spaces.
You can change these values, along with the banner, by passing parameters to OptionParser.new.

require 'optparse'
parser = OptionParser.new(
'ruby help_format.rb [options]', # Banner
20, # Width of options field
' '*2 # Indentation
)
parser.on(
'-x', '--xxx',
'Adipiscing elit. Aenean commodo ligula eget.',
'Aenean massa. Cum sociis natoque penatibus',
)
parser.on(
'-y', '--yyy YYY',
'Lorem ipsum dolor sit amet, consectetuer.'
)
parser.on(
'-z', '--zzz [ZZZ]',
'Et magnis dis parturient montes, nascetur',
'ridiculus mus. Donec quam felis, ultricies',
'nec, pellentesque eu, pretium quis, sem.',
)
parser.parse!

Execution:

$ ruby help_format.rb --help

ruby help_format.rb [options]

-x, --xxx Adipiscing elit. Aenean commodo ligula eget.

Aenean massa. Cum sociis natoque penatibus

-y, --yyy YYY Lorem ipsum dolor sit amet, consectetuer.

-z, --zzz [ZZZ] Et magnis dis parturient montes, nascetur

ridiculus mus. Donec quam felis, ultricies

nec, pellentesque eu, pretium quis, sem.
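The help text is also available as a string through OptionParser#help, which makes the effect of the banner, width, and indentation parameters easy to inspect. This is a minimal sketch; the banner text and descriptions are placeholders.

```ruby
require 'optparse'

parser = OptionParser.new(
  'Usage: demo [options]', # Banner
  20,                      # Width of options field
  ' ' * 2                  # Indentation
)
parser.on('-x', '--xxx', 'First description line.', 'Second description line.')
parser.on('-y', '--yyy YYY', 'Takes a required argument.')

# The same text that --help prints, returned as a String.
help_text = parser.help
puts help_text

# The formatting values remain readable on the parser.
p parser.summary_width  # => 20
p parser.summary_indent # => "  "
```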

Top List and Base List


An OptionParser object maintains a stack of OptionParser::List objects, each of which has a collection of
zero or more options. It is unlikely that you’ll need to add or take away from that stack.
The stack includes:
 The top list, given by OptionParser#top.
 The base list, given by OptionParser#base.
When OptionParser builds its help text, the options in the top list precede those in the base list.

Defining Options
Option-defining methods allow you to create an option, and also append/prepend it to the top list or append
it to the base list.
Each of these next three methods accepts a sequence of parameter arguments and a block, creates an
option object using method OptionParser#make_switch (see below), and returns the created option:
 Method OptionParser#define appends the created option to the top list.
 Method OptionParser#define_head prepends the created option to the top list.
 Method OptionParser#define_tail appends the created option to the base list.
These next three methods are identical to the three above, except for their return values:
 Method OptionParser#on is identical to method OptionParser#define, except that it returns the
parser object self.
 Method OptionParser#on_head is identical to method OptionParser#define_head, except that it
returns the parser object self.
 Method OptionParser#on_tail is identical to method OptionParser#define_tail, except that it returns
the parser object self.
Though you may never need to call it directly, here’s the core method for defining an option:
 Method OptionParser#make_switch accepts an array of parameters and a block. See Parameters
for New Options. This method is unlike others here in that it:
o Accepts an array of parameters; others accept a sequence of parameter arguments.
o Returns an array containing the created option object, option names, and other values;
others return either the created option object or the parser object self.
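The difference in return values, and the ordering of the top and base lists in the help text, can be observed directly. In this sketch the option names --alpha, --beta, and --omega are illustrative.

```ruby
require 'optparse'

parser = OptionParser.new
parser.banner = 'Usage: demo [options]'

# define returns the created option object.
switch = parser.define('--beta', 'Appended to the top list')

# on_head returns the parser itself.
same = parser.on_head('--alpha', 'Prepended to the top list')

# on_tail adds to the base list, which is listed after the top list.
parser.on_tail('--omega', 'Appended to the base list')

p switch.is_a?(OptionParser::Switch) # => true
p same.equal?(parser)                # => true
puts parser.help                     # lists --alpha, then --beta, then --omega
```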

Parsing
OptionParser has six instance methods for parsing.
Three have names ending with a “bang” (!):
 parse!

 order!
 permute!
Each of these methods:
 Accepts an optional array of string arguments argv; if not given, argv defaults to the value
of OptionParser#default_argv, whose initial value is ARGV.
 Accepts an optional keyword argument into (see Keyword Argument into).
 Returns argv, possibly with some elements removed.
The three other methods have names not ending with a “bang”:
 parse

 order
 permute
Each of these methods:
 Accepts an array of string arguments or zero or more string arguments.
 Accepts an optional keyword argument into (see Keyword Argument into).
 Returns argv, possibly with some elements removed.

Method parse!
Method parse!:
 Accepts an optional array of string arguments argv; if not given, argv defaults to the value
of OptionParser#default_argv, whose initial value is ARGV.
 Accepts an optional keyword argument into (see Keyword Argument into).
 Returns argv, possibly with some elements removed.
The method processes the elements in argv beginning at argv[0], and ending, by default, at the end.
Otherwise processing ends and the method returns when:
 The terminator argument -- is found; the terminator argument is removed before the return.
 Environment variable POSIXLY_CORRECT is defined and a non-option argument is found; the
non-option argument is not removed. Note that the value of that variable does not matter, as only
its existence is checked.
File parse_bang.rb:

require 'optparse'
parser = OptionParser.new
parser.on('--xxx') do |value|
p ['--xxx', value]
end
parser.on('--yyy YYY') do |value|
p ['--yyy', value]
end
parser.on('--zzz [ZZZ]') do |value|
p ['--zzz', value]
end
ret = parser.parse!
puts "Returned: #{ret} (#{ret.class})"

Help:

$ ruby parse_bang.rb --help

Usage: parse_bang [options]

--xxx
--yyy YYY

--zzz [ZZZ]

Default behavior:

$ ruby parse_bang.rb input_file.txt output_file.txt --xxx --yyy FOO --zzz BAR

["--xxx", true]

["--yyy", "FOO"]

["--zzz", "BAR"]

Returned: ["input_file.txt", "output_file.txt"] (Array)

Processing ended by terminator argument:

$ ruby parse_bang.rb input_file.txt output_file.txt --xxx --yyy FOO -- --zzz BAR

["--xxx", true]

["--yyy", "FOO"]

Returned: ["input_file.txt", "output_file.txt", "--zzz", "BAR"] (Array)

Processing ended by non-option found when POSIXLY_CORRECT is defined:

$ POSIXLY_CORRECT=true ruby parse_bang.rb --xxx input_file.txt output_file.txt -yyy FOO

["--xxx", true]

Returned: ["input_file.txt", "output_file.txt", "-yyy", "FOO"] (Array)

Method parse
Method parse:
 Accepts an array of string arguments or zero or more string arguments.
 Accepts an optional keyword argument into (see Keyword Argument into).
 Returns argv, possibly with some elements removed.
If given an array ary, the method forms array argv as ary.dup. If given zero or more string arguments, those
arguments are formed into array argv.
The method calls

parse!(argv, into: into)

Note that environment variable POSIXLY_CORRECT and the terminator argument -- are honored.
File parse.rb:

require 'optparse'
parser = OptionParser.new
parser.on('--xxx') do |value|
p ['--xxx', value]
end
parser.on('--yyy YYY') do |value|
p ['--yyy', value]
end
parser.on('--zzz [ZZZ]') do |value|
p ['--zzz', value]
end
ret = parser.parse(ARGV)
puts "Returned: #{ret} (#{ret.class})"

Help:

$ ruby parse.rb --help

Usage: parse [options]

--xxx

--yyy YYY

--zzz [ZZZ]

Default behavior:

$ ruby parse.rb input_file.txt output_file.txt --xxx --yyy FOO --zzz BAR

["--xxx", true]

["--yyy", "FOO"]

["--zzz", "BAR"]
Returned: ["input_file.txt", "output_file.txt"] (Array)

Processing ended by terminator argument:

$ ruby parse.rb input_file.txt output_file.txt --xxx --yyy FOO -- --zzz BAR

["--xxx", true]

["--yyy", "FOO"]

Returned: ["input_file.txt", "output_file.txt", "--zzz", "BAR"] (Array)

Processing ended by non-option found when POSIXLY_CORRECT is defined:

$ POSIXLY_CORRECT=true ruby parse.rb --xxx input_file.txt output_file.txt -yyy FOO

["--xxx", true]

Returned: ["input_file.txt", "output_file.txt", "-yyy", "FOO"] (Array)

Method order!
Calling method OptionParser#order! gives exactly the same result as calling
method OptionParser#parse! with environment variable POSIXLY_CORRECT defined.

Method order
Calling method OptionParser#order gives exactly the same result as calling
method OptionParser#parse with environment variable POSIXLY_CORRECT defined.

Method permute!
Calling method OptionParser#permute! gives exactly the same result as calling
method OptionParser#parse! with environment variable POSIXLY_CORRECT not defined.

Method permute
Calling method OptionParser#permute gives exactly the same result as calling
method OptionParser#parse with environment variable POSIXLY_CORRECT not defined.
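Because order! and permute! do not consult POSIXLY_CORRECT, they make the two behaviors easy to compare side by side. The sketch below reuses two of the options from parse_bang.rb; the helper build_parser and the variable names are illustrative.

```ruby
require 'optparse'

# Hypothetical helper: builds a parser that records each option it sees.
def build_parser(seen)
  parser = OptionParser.new
  parser.on('--xxx') { |value| seen << ['--xxx', value] }
  parser.on('--yyy YYY') { |value| seen << ['--yyy', value] }
  parser
end

# permute! processes options wherever they appear.
permute_seen = []
permute_argv = %w[--xxx input.txt --yyy FOO]
build_parser(permute_seen).permute!(permute_argv)
p permute_seen # => [["--xxx", true], ["--yyy", "FOO"]]
p permute_argv # => ["input.txt"]

# order! stops at the first non-option argument.
order_seen = []
order_argv = %w[--xxx input.txt --yyy FOO]
build_parser(order_seen).order!(order_argv)
p order_seen # => [["--xxx", true]]
p order_argv # => ["input.txt", "--yyy", "FOO"]
```

Both methods modify the array they are given, which is why each call above gets its own copy of the arguments.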

Packed Data
Certain Ruby core methods deal with packing and unpacking data:
 Method Array#pack: Formats each element in array self into a binary string; returns that string.
 Method String#unpack: Extracts data from string self, forming objects that become the elements of
a new array; returns that array.
 Method String#unpack1: Does the same, but unpacks and returns only the first extracted object.
Each of these methods accepts a string template, consisting of zero or more directive characters, each
followed by zero or more modifier characters.
Examples (directive 'C' specifies ‘unsigned character’):

[65].pack('C') # => "A" # One element, one directive.


[65, 66].pack('CC') # => "AB" # Two elements, two directives.
[65, 66].pack('C') # => "A" # Extra element is ignored.
[65].pack('') # => "" # No directives.
[65].pack('CC') # Extra directive raises ArgumentError.

'A'.unpack('C') # => [65] # One character, one directive.


'AB'.unpack('CC') # => [65, 66] # Two characters, two directives.
'AB'.unpack('C') # => [65] # Extra character is ignored.
'A'.unpack('CC') # => [65, nil] # Extra directive generates nil.
'AB'.unpack('') # => [] # No directives.

The string template may contain any mixture of valid directives (directive 'c' specifies ‘signed character’):

[65, -1].pack('cC') # => "A\xFF"


"A\xFF".unpack('cC') # => [65, 255]

The string template may contain whitespace (which is ignored) and comments, each of which begins with
character '#' and continues up to and including the next following newline:

[0,1].pack(" C #foo \n C ") # => "\x00\x01"


"\0\1".unpack(" C #foo \n C ") # => [0, 1]

Any directive may be followed by either of these modifiers:


 '*' - The directive is to be applied as many times as needed:

 [65, 66].pack('C*') # => "AB"


 'AB'.unpack('C*') # => [65, 66]

 Integer count - The directive is to be applied count times:

 [65, 66].pack('C2') # => "AB"


 [65, 66].pack('C3') # Raises ArgumentError.
 'AB'.unpack('C2') # => [65, 66]
 'AB'.unpack('C3') # => [65, 66, nil]

Note: Directives in %w[A a Z m] use count differently; see String Directives.


If an element does not fit the provided directive, only its least significant bits are encoded:
[257].pack("C").unpack("C") # => [1]
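The truncation is equivalent to masking the value to the width of the directive. A small sketch, with illustrative variable names:

```ruby
value = 0x1234

# 'C' keeps only the low 8 bits.
byte = [value].pack('C').unpack1('C')
p byte == (value & 0xFF) # => true (both are 0x34)

# 'S' keeps only the low 16 bits.
wide = [65_537].pack('S').unpack1('S')
p wide == (65_537 & 0xFFFF) # => true (both are 1)
```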

Packing Method
Method Array#pack accepts optional keyword argument buffer that specifies the target string (instead of a
new string):

[65, 66].pack('C*', buffer: 'foo') # => "fooAB"

The method can accept a block:

# Packed string is passed to the block.


[65, 66].pack('C*') {|s| p s } # => "AB"

Unpacking Methods
Methods String#unpack and String#unpack1 each accept an optional keyword argument offset that
specifies an offset into the string:

'ABC'.unpack('C*', offset: 1) # => [66, 67]


'ABC'.unpack1('C*', offset: 1) # => 66

Both methods can accept a block:

# Each unpacked object is passed to the block.


ret = []
"ABCD".unpack("C*") {|c| ret << c }
ret # => [65, 66, 67, 68]

# The single unpacked object is passed to the block.


'AB'.unpack1('C*') {|ele| p ele } # => 65

Integer Directives
Each integer directive specifies the packing or unpacking for one element in the input or output array.

8-Bit Integer Directives


 'c' - 8-bit signed integer (like C signed char):

 [0, 1, 255].pack('c*') # => "\x00\x01\xFF"


 s = [0, 1, -1].pack('c*') # => "\x00\x01\xFF"
 s.unpack('c*') # => [0, 1, -1]

 'C' - 8-bit unsigned integer (like C unsigned char):

 [0, 1, 255].pack('C*') # => "\x00\x01\xFF"


 s = [0, 1, -1].pack('C*') # => "\x00\x01\xFF"
 s.unpack('C*') # => [0, 1, 255]

16-Bit Integer Directives


 's' - 16-bit signed integer, native-endian (like C int16_t):

 [513, -514].pack('s*') # => "\x01\x02\xFE\xFD"


 s = [513, 65022].pack('s*') # => "\x01\x02\xFE\xFD"
 s.unpack('s*') # => [513, -514]

 'S' - 16-bit unsigned integer, native-endian (like C uint16_t):

 [513, -514].pack('S*') # => "\x01\x02\xFE\xFD"


 s = [513, 65022].pack('S*') # => "\x01\x02\xFE\xFD"
 s.unpack('S*') # => [513, 65022]

 'n' - 16-bit network integer, big-endian:

 s = [0, 1, -1, 32767, -32768, 65535].pack('n*')


 # => "\x00\x00\x00\x01\xFF\xFF\x7F\xFF\x80\x00\xFF\xFF"
 s.unpack('n*')
 # => [0, 1, 65535, 32767, 32768, 65535]

 'v' - 16-bit VAX integer, little-endian:

 s = [0, 1, -1, 32767, -32768, 65535].pack('v*')


 # => "\x00\x00\x01\x00\xFF\xFF\xFF\x7F\x00\x80\xFF\xFF"
 s.unpack('v*')
 # => [0, 1, 65535, 32767, 32768, 65535]

32-Bit Integer Directives


 'l' - 32-bit signed integer, native-endian (like C int32_t):

 s = [67305985, -50462977].pack('l*')
 # => "\x01\x02\x03\x04\xFF\xFE\xFD\xFC"
 s.unpack('l*')
 # => [67305985, -50462977]
 'L' - 32-bit unsigned integer, native-endian (like C uint32_t):

 s = [67305985, 4244504319].pack('L*')
 # => "\x01\x02\x03\x04\xFF\xFE\xFD\xFC"
 s.unpack('L*')
 # => [67305985, 4244504319]

 'N' - 32-bit network integer, big-endian:

 s = [0,1,-1].pack('N*')
 # => "\x00\x00\x00\x00\x00\x00\x00\x01\xFF\xFF\xFF\xFF"
 s.unpack('N*')
 # => [0, 1, 4294967295]

 'V' - 32-bit VAX integer, little-endian:

 s = [0,1,-1].pack('V*')
 # => "\x00\x00\x00\x00\x01\x00\x00\x00\xFF\xFF\xFF\xFF"
 s.unpack('V*')
 # => [0, 1, 4294967295]

64-Bit Integer Directives


 'q' - 64-bit signed integer, native-endian (like C int64_t):

 s = [578437695752307201, -506097522914230529].pack('q*')
 # => "\x01\x02\x03\x04\x05\x06\a\b\xFF\xFE\xFD\xFC\xFB\xFA\xF9\xF8"
 s.unpack('q*')
 # => [578437695752307201, -506097522914230529]

 'Q' - 64-bit unsigned integer, native-endian (like C uint64_t):

 s = [578437695752307201, 17940646550795321087].pack('Q*')
 # => "\x01\x02\x03\x04\x05\x06\a\b\xFF\xFE\xFD\xFC\xFB\xFA\xF9\xF8"
 s.unpack('Q*')
 # => [578437695752307201, 17940646550795321087]

Platform-Dependent Integer Directives


 'i' - Platform-dependent width signed integer, native-endian (like C int):

 s = [67305985, -50462977].pack('i*')
 # => "\x01\x02\x03\x04\xFF\xFE\xFD\xFC"
 s.unpack('i*')
 # => [67305985, -50462977]

 'I' - Platform-dependent width unsigned integer, native-endian (like C unsigned int):


 s = [67305985, -50462977].pack('I*')
 # => "\x01\x02\x03\x04\xFF\xFE\xFD\xFC"
 s.unpack('I*')
 # => [67305985, 4244504319]

Pointer Directives
 'j' - 64-bit pointer-width signed integer, native-endian (like C intptr_t):

 s = [67305985, -50462977].pack('j*')
 # => "\x01\x02\x03\x04\x00\x00\x00\x00\xFF\xFE\xFD\xFC\xFF\xFF\xFF\xFF"
 s.unpack('j*')
 # => [67305985, -50462977]

 'J' - 64-bit pointer-width unsigned integer, native-endian (like C uintptr_t):

 s = [67305985, 4244504319].pack('J*')
 # => "\x01\x02\x03\x04\x00\x00\x00\x00\xFF\xFE\xFD\xFC\x00\x00\x00\x00"
 s.unpack('J*')
 # => [67305985, 4244504319]

Other Integer Directives


 'U' - UTF-8 character:

 s = [4194304].pack('U*')
 # => "\xF8\x90\x80\x80\x80"
 s.unpack('U*')
 # => [4194304]

 'w' - BER-encoded integer (see BER encoding):

 s = [1073741823].pack('w*')
 # => "\x83\xFF\xFF\xFF\x7F"
 s.unpack('w*')
 # => [1073741823]

Modifiers for Integer Directives


For directives in 'i', 'I', 's', 'S', 'l', 'L', 'q', 'Q', 'j', and 'J', these modifiers may be suffixed:
 '!' or '_' - Underlying platform’s native size.
 '>' - Big-endian.
 '<' - Little-endian.

Float Directives
Each float directive specifies the packing or unpacking for one element in the input or output array.

Single-Precision Float Directives


 'F' or 'f' - Native format:

 s = [3.0].pack('F') # => "\x00\x00@@"


 s.unpack('F') # => [3.0]

 'e' - Little-endian:

 s = [3.0].pack('e') # => "\x00\x00@@"


 s.unpack('e') # => [3.0]

 'g' - Big-endian:

 s = [3.0].pack('g') # => "@@\x00\x00"


 s.unpack('g') # => [3.0]

Double-Precision Float Directives


 'D' or 'd' - Native format:

 s = [3.0].pack('D') # => "\x00\x00\x00\x00\x00\x00\b@"


 s.unpack('D') # => [3.0]

 'E' - Little-endian:

 s = [3.0].pack('E') # => "\x00\x00\x00\x00\x00\x00\b@"


 s.unpack('E') # => [3.0]

 'G' - Big-endian:

 s = [3.0].pack('G') # => "@\b\x00\x00\x00\x00\x00\x00"


 s.unpack('G') # => [3.0]
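Ruby Float values are double precision, so a round trip through a single-precision directive can lose precision, while a double-precision directive preserves the value. A short sketch using 0.1, which has no exact binary representation:

```ruby
single = [0.1].pack('e').unpack1('e') # 4 bytes, single precision
double = [0.1].pack('E').unpack1('E') # 8 bytes, double precision

p [0.1].pack('e').bytesize # => 4
p [0.1].pack('E').bytesize # => 8

p single == 0.1 # => false (rounded to the nearest single-precision value)
p double == 0.1 # => true
p (single - 0.1).abs < 1e-7 # => true (the error is small)
```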

The value packed or unpacked by a float directive may be infinity or not-a-number:

inf = 1.0/0.0 # => Infinity


[inf].pack('f') # => "\x00\x00\x80\x7F"
"\x00\x00\x80\x7F".unpack('f') # => [Infinity]

nan = inf/inf # => NaN


[nan].pack('f') # => "\x00\x00\xC0\x7F"
"\x00\x00\xC0\x7F".unpack('f') # => [NaN]

String Directives
Each string directive specifies the packing or unpacking for one byte in the input or output string.

Binary String Directives


 'A' - Arbitrary binary string (space padded; count is width); nil is treated as the empty string:

 ['foo'].pack('A') # => "f"


 ['foo'].pack('A*') # => "foo"
 ['foo'].pack('A2') # => "fo"
 ['foo'].pack('A4') # => "foo "
 [nil].pack('A') # => " "
 [nil].pack('A*') # => ""
 [nil].pack('A2') # => "  "
 [nil].pack('A4') # => "    "

 "foo\0".unpack('A') # => ["f"]
 "foo\0".unpack('A4') # => ["foo"]
 "foo\0bar".unpack('A10') # => ["foo\x00bar"] # Reads past "\0".
 "foo ".unpack('A') # => ["f"]
 "foo ".unpack('A4') # => ["foo"]
 "foo".unpack('A4') # => ["foo"]

 russian = "\u{442 435 441 442}" # => "тест"
 russian.size # => 4
 russian.bytesize # => 8
 [russian].pack('A') # => "\xD1"
 [russian].pack('A*') # => "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"
 russian.unpack('A') # => ["\xD1"]
 russian.unpack('A2') # => ["\xD1\x82"]
 russian.unpack('A4') # => ["\xD1\x82\xD0\xB5"]
 russian.unpack('A*') # => ["\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"]

 'a' - Arbitrary binary string (null padded; count is width):

 ["foo"].pack('a') # => "f"


 ["foo"].pack('a*') # => "foo"
 ["foo"].pack('a2') # => "fo"
 ["foo\0"].pack('a4') # => "foo\x00"
 [nil].pack('a') # => "\x00"
 [nil].pack('a*') # => ""
 [nil].pack('a2') # => "\x00\x00"
 [nil].pack('a4') # => "\x00\x00\x00\x00"

 "foo\0".unpack('a') # => ["f"]
 "foo\0".unpack('a4') # => ["foo\x00"]
 "foo ".unpack('a4') # => ["foo "]
 "foo".unpack('a4') # => ["foo"]
 "foo\0bar".unpack('a4') # => ["foo\x00"] # Reads past "\0".

 'Z' - Same as 'a', except that null is added or ignored with '*':

 ["foo"].pack('Z*') # => "foo\x00"


 [nil].pack('Z*') # => "\x00"

 "foo\0".unpack('Z*') # => ["foo"]
 "foo".unpack('Z*') # => ["foo"]
 "foo\0bar".unpack('Z*') # => ["foo"] # Does not read past "\0".

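To compare the three directives side by side, the sketch below packs the same name three times into one hypothetical record layout (the layout is invented for illustration) and unpacks it again.

```ruby
# A hypothetical record: three encodings of the same 4-byte name.
record = ['ruby', 'ruby', 'ruby'].pack('A6a6Z*')
record.bytesize    # => 17  (6 + 6 + 5)

space_padded, null_padded, c_string = record.unpack('A6a6Z*')
space_padded       # => "ruby"          ('A' strips trailing spaces and nulls)
null_padded        # => "ruby\x00\x00"  ('a' returns the bytes unchanged)
c_string           # => "ruby"          ('Z*' stops at the first null)
```

'A' suits human-readable fixed-width fields, 'a' suits raw byte fields whose padding matters, and 'Z*' suits C-style strings.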
Bit String Directives


 'B' - Bit string (high byte first):

 ['11111111' + '00000000'].pack('B*') # => "\xFF\x00"


 ['10000000' + '01000000'].pack('B*') # => "\x80@"

 ['1'].pack('B0') # => ""
 ['1'].pack('B1') # => "\x80"
 ['1'].pack('B2') # => "\x80\x00"
 ['1'].pack('B3') # => "\x80\x00"
 ['1'].pack('B4') # => "\x80\x00\x00"
 ['1'].pack('B5') # => "\x80\x00\x00"
 ['1'].pack('B6') # => "\x80\x00\x00\x00"

 "\xff\x00".unpack("B*") # => ["1111111100000000"]
 "\x01\x02".unpack("B*") # => ["0000000100000010"]

 "".unpack("B0") # => [""]
 "\x80".unpack("B1") # => ["1"]
 "\x80".unpack("B2") # => ["10"]
 "\x80".unpack("B3") # => ["100"]

 'b' - Bit string (low byte first):


 ['11111111' + '00000000'].pack('b*') # => "\xFF\x00"
 ['10000000' + '01000000'].pack('b*') # => "\x01\x02"

 ['1'].pack('b0') # => ""
 ['1'].pack('b1') # => "\x01"
 ['1'].pack('b2') # => "\x01\x00"
 ['1'].pack('b3') # => "\x01\x00"
 ['1'].pack('b4') # => "\x01\x00\x00"
 ['1'].pack('b5') # => "\x01\x00\x00"
 ['1'].pack('b6') # => "\x01\x00\x00\x00"

 "\xff\x00".unpack("b*") # => ["1111111100000000"]
 "\x01\x02".unpack("b*") # => ["1000000001000000"]

 "".unpack("b0") # => [""]
 "\x01".unpack("b1") # => ["1"]
 "\x01".unpack("b2") # => ["10"]
 "\x01".unpack("b3") # => ["100"]

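The difference between the two bit orders is easiest to see on a single byte. The following sketch (variable names are illustrative) decodes a one-byte flag field both ways.

```ruby
flags = "\x05".b                   # one byte: 0b00000101
msb_first = flags.unpack1('B8')    # => "00000101"
lsb_first = flags.unpack1('b8')    # => "10100000"

# With 'b', string index i corresponds to bit i of the byte.
set_bits = lsb_first.each_char.with_index.select { |bit, _| bit == '1' }.map(&:last)
set_bits                           # => [0, 2]

# Each directive reverses its own unpacking.
[msb_first].pack('B8') == flags    # => true
[lsb_first].pack('b8') == flags    # => true
```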
Hex String Directives


 'H' - Hex string (high nibble first):

 ['10ef'].pack('H*') # => "\x10\xEF"


 ['10ef'].pack('H0') # => ""
 ['10ef'].pack('H3') # => "\x10\xE0"
 ['10ef'].pack('H5') # => "\x10\xEF\x00"

 ['fff'].pack('H3') # => "\xFF\xF0"
 ['fff'].pack('H4') # => "\xFF\xF0"
 ['fff'].pack('H5') # => "\xFF\xF0\x00"
 ['fff'].pack('H6') # => "\xFF\xF0\x00"
 ['fff'].pack('H7') # => "\xFF\xF0\x00\x00"
 ['fff'].pack('H8') # => "\xFF\xF0\x00\x00"

 "\x10\xef".unpack('H*') # => ["10ef"]
 "\x10\xef".unpack('H0') # => [""]
 "\x10\xef".unpack('H1') # => ["1"]
 "\x10\xef".unpack('H2') # => ["10"]
 "\x10\xef".unpack('H3') # => ["10e"]
 "\x10\xef".unpack('H4') # => ["10ef"]
 "\x10\xef".unpack('H5') # => ["10ef"]

 'h' - Hex string (low nibble first):


 ['10ef'].pack('h*') # => "\x01\xFE"
 ['10ef'].pack('h0') # => ""
 ['10ef'].pack('h3') # => "\x01\x0E"
 ['10ef'].pack('h5') # => "\x01\xFE\x00"

 ['fff'].pack('h3') # => "\xFF\x0F"
 ['fff'].pack('h4') # => "\xFF\x0F"
 ['fff'].pack('h5') # => "\xFF\x0F\x00"
 ['fff'].pack('h6') # => "\xFF\x0F\x00"
 ['fff'].pack('h7') # => "\xFF\x0F\x00\x00"
 ['fff'].pack('h8') # => "\xFF\x0F\x00\x00"

 "\x01\xfe".unpack('h*') # => ["10ef"]
 "\x01\xfe".unpack('h0') # => [""]
 "\x01\xfe".unpack('h1') # => ["1"]
 "\x01\xfe".unpack('h2') # => ["10"]
 "\x01\xfe".unpack('h3') # => ["10e"]
 "\x01\xfe".unpack('h4') # => ["10ef"]
 "\x01\xfe".unpack('h5') # => ["10ef"]

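A common use of 'H*' is converting between raw bytes and a hex dump. The helper names to_hex and from_hex below are hypothetical, introduced only for this sketch.

```ruby
# Hypothetical helpers, not part of Ruby itself.
def to_hex(bytes)
  bytes.unpack1('H*')
end

def from_hex(hex)
  [hex].pack('H*')
end

raw = "\x10\xEF".b
to_hex(raw)                # => "10ef"
from_hex('10ef') == raw    # => true
raw.unpack1('h*')          # => "01fe"  (nibbles swapped within each byte)
```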
Pointer String Directives


 'P' - Pointer to a structure (fixed-length string):

 s = ['abc'].pack('P') # => "\xE0O\x7F\xE5\xA1\x01\x00\x00"


 s.unpack('P*') # => ["abc"]
 ".".unpack("P") # => []
 ("\0" * 8).unpack("P") # => [nil]
 [nil].pack("P") # => "\x00\x00\x00\x00\x00\x00\x00\x00"

 'p' - Pointer to a null-terminated string:

 s = ['abc'].pack('p') # => "(\xE4u\xE5\xA1\x01\x00\x00"


 s.unpack('p*') # => ["abc"]
 ".".unpack("p") # => []
 ("\0" * 8).unpack("p") # => [nil]
 [nil].pack("p") # => "\x00\x00\x00\x00\x00\x00\x00\x00"

Other String Directives


 'M' - Quoted printable, MIME encoding; text mode, but input must use LF and output LF; (see RFC
2045):
 ["a b c\td \ne"].pack('M') # => "a b c\td =\n\ne=\n"
 ["\0"].pack('M') # => "=00=\n"

 ["a"*1023].pack('M') == ("a"*73+"=\n")*14+"a=\n" # => true
 ("a"*73+"=\na=\n").unpack('M') == ["a"*74] # => true
 (("a"*73+"=\n")*14+"a=\n").unpack('M') == ["a"*1023] # => true

 "a b c\td =\n\ne=\n".unpack('M') # => ["a b c\td \ne"]
 "=00=\n".unpack('M') # => ["\x00"]

 "pre=31=32=33after".unpack('M') # => ["pre123after"]
 "pre=\nafter".unpack('M') # => ["preafter"]
 "pre=\r\nafter".unpack('M') # => ["preafter"]
 "pre=".unpack('M') # => ["pre="]
 "pre=\r".unpack('M') # => ["pre=\r"]
 "pre=hoge".unpack('M') # => ["pre=hoge"]
 "pre==31after".unpack('M') # => ["pre==31after"]
 "pre===31after".unpack('M') # => ["pre===31after"]

 'm' - Base64 encoded string; count specifies input bytes between each newline, rounded down to
nearest multiple of 3; if count is zero, no newlines are added; (see RFC 4648):

 [""].pack('m') # => ""


 ["\0"].pack('m') # => "AA==\n"
 ["\0\0"].pack('m') # => "AAA=\n"
 ["\0\0\0"].pack('m') # => "AAAA\n"
 ["\377"].pack('m') # => "/w==\n"
 ["\377\377"].pack('m') # => "//8=\n"
 ["\377\377\377"].pack('m') # => "////\n"

 "".unpack('m') # => [""]
 "AA==\n".unpack('m') # => ["\x00"]
 "AAA=\n".unpack('m') # => ["\x00\x00"]
 "AAAA\n".unpack('m') # => ["\x00\x00\x00"]
 "/w==\n".unpack('m') # => ["\xFF"]
 "//8=\n".unpack('m') # => ["\xFF\xFF"]
 "////\n".unpack('m') # => ["\xFF\xFF\xFF"]
 "A\n".unpack('m') # => [""]
 "AA\n".unpack('m') # => ["\x00"]
 "AA=\n".unpack('m') # => ["\x00"]
 "AAA\n".unpack('m') # => ["\x00\x00"]

 [""].pack('m0') # => ""
 ["\0"].pack('m0') # => "AA=="
 ["\0\0"].pack('m0') # => "AAA="
 ["\0\0\0"].pack('m0') # => "AAAA"
 ["\377"].pack('m0') # => "/w=="
 ["\377\377"].pack('m0') # => "//8="
 ["\377\377\377"].pack('m0') # => "////"

 "".unpack('m0') # => [""]
 "AA==".unpack('m0') # => ["\x00"]
 "AAA=".unpack('m0') # => ["\x00\x00"]
 "AAAA".unpack('m0') # => ["\x00\x00\x00"]
 "/w==".unpack('m0') # => ["\xFF"]
 "//8=".unpack('m0') # => ["\xFF\xFF"]
 "////".unpack('m0') # => ["\xFF\xFF\xFF"]
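
 'u' - UU-encoded string (note: the examples below exercise the distinct 'U' directive, which packs each integer as a UTF-8 character):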

 [0].pack("U") # => "\u0000"


 [0x3fffffff].pack("U") # => "\xFC\xBF\xBF\xBF\xBF\xBF"
 [0x40000000].pack("U") # => "\xFD\x80\x80\x80\x80\x80"
 [0x7fffffff].pack("U") # => "\xFD\xBF\xBF\xBF\xBF\xBF"
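The three text encodings in this section ('M', 'm' and 'u') can be compared with one round trip each. This is a minimal sketch with illustrative variable names; the sample inputs are arbitrary.

```ruby
text = "caf\u00E9=1\n"                   # UTF-8 text; the accented letter is 2 bytes

# 'M': bytes above 126 and "=" itself become =XX escapes.
quoted = [text].pack('M')                 # => "caf=C3=A9=3D1\n"
quoted.unpack1('M').bytes == text.bytes   # => true

# 'm': by default a newline is added after every 45 input bytes (60 output characters).
data = 'x' * 60
wrapped = [data].pack('m')
line_sizes = wrapped.lines.map { |line| line.chomp.size }   # => [60, 20]
unwrapped = [data].pack('m0')             # no newlines at all
unwrapped.size                            # => 80
wrapped.unpack1('m') == data              # => true

# 'u': UU-encoding; the first character of each line encodes the line's byte count.
uu = ['abc'].pack('u')                    # => "#86)C\n"
uu_decoded = uu.unpack1('u')              # => "abc"
```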

Offset Directives
 '@' - Begin packing at the given byte offset; for packing, null fill if necessary:

 [1, 2].pack("C@0C") # => "\x02"


 [1, 2].pack("C@1C") # => "\x01\x02"
 [1, 2].pack("C@5C") # => "\x01\x00\x00\x00\x00\x02"

 "\x01\x00\x00\x02".unpack("C@3C") # => [1, 2]
 "\x00".unpack("@1C") # => [nil]

 'X' - Back up a byte:

 [0, 1, 2].pack("CCXC") # => "\x00\x02"


 [0, 1, 2].pack("CCX2C") # => "\x02"
 "\x00\x02".unpack("CCXC") # => [0, 2, 2]

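Offsets are useful for fixed binary layouts. The sketch below builds a hypothetical 8-byte header (a type byte, three padding bytes, then a 32-bit big-endian length) in two equivalent ways; the layout itself is an assumption made for illustration.

```ruby
# 'x3' inserts three null bytes after the type byte.
header = [7, 258].pack('Cx3N')
header.bytes                       # => [7, 0, 0, 0, 0, 0, 1, 2]

# '@4' reaches the same position by absolute offset instead of counting padding.
header == [7, 258].pack('C@4N')    # => true

# When unpacking, 'x3' skips bytes and '@4' jumps straight to the offset.
type, length = header.unpack('Cx3N')
length_only = header.unpack1('@4N')   # => 258
```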
Null Byte Directive


 'x' - Null byte:

 [].pack("x0") # => ""


 [].pack("x") # => "\x00"
 [].pack("x8") # => "\x00\x00\x00\x00\x00\x00\x00\x00"
 "\x00\x00\x02".unpack("CxC") # => [0, 2]
Ractor - Ruby’s Actor-like concurrent abstraction
Ractor is designed to provide a parallel execution feature of Ruby without thread-safety concerns.

Summary

Multiple Ractors in an interpreter process


You can make multiple Ractors and they run in parallel.
 Ractor.new{ expr } creates a new Ractor and expr is run in parallel on a parallel computer.
 The interpreter starts with the first Ractor (called the main Ractor).
 When the main Ractor terminates, all other Ractors receive a terminate request, just as all running
Threads are asked to terminate when the main Thread (the first invoked Thread) terminates.
 Each Ractor has 1 or more Threads.
 Threads in a Ractor share a Ractor-wide global lock like the GIL (GVL in MRI terminology), so they
can’t run in parallel (without releasing the GVL explicitly at C level). Threads in different Ractors run in
parallel.
 The overhead of creating a Ractor is similar to the overhead of creating one Thread.

Limited sharing between multiple ractors


Ractors don’t share everything, unlike threads.
 Most objects are Unshareable objects, so you don’t need to care about thread-safety problems
which are caused by sharing.
 Some objects are Shareable objects.
 Immutable objects: frozen objects which don’t refer to unshareable-objects.
o i = 123: i is an immutable object.
o s = "str".freeze: s is an immutable object.
o a = [1, [2], 3].freeze: a is not an immutable object because a refers to the unshareable
object [2] (which is not frozen).
o h = {c: Object}.freeze: h is an immutable object because h refers only to the Symbol :c and the
shareable Object class object (which itself is not frozen).
 Class/Module objects
 Special shareable objects
o Ractor object itself.
o And more…
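The classification above can be verified with Ractor.shareable?, which reports whether an object may be shared between Ractors. A minimal sketch (the hash keys are just labels for this example):

```ruby
checks = {
  integer:       Ractor.shareable?(123),                    # immutable
  frozen_string: Ractor.shareable?('str'.freeze),           # immutable
  nested_array:  Ractor.shareable?([1, [2], 3].freeze),     # inner [2] is not frozen
  symbol_class:  Ractor.shareable?({ c: Object }.freeze),   # Symbol and Class are shareable
  plain_string:  Ractor.shareable?(+'str'),                 # unfrozen String
  ractor_itself: Ractor.shareable?(Ractor.current)          # special shareable object
}
checks
# => {integer: true, frozen_string: true, nested_array: false,
#     symbol_class: true, plain_string: false, ractor_itself: true}
```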

Two-types communication between Ractors


Ractors communicate with each other and synchronize the execution by message exchanging between
Ractors. There are two message exchange protocols: push type (message passing) and pull type.
 Push type message passing: Ractor#send(obj) and Ractor.receive() pair.
 Sender ractor passes the obj to the ractor r by r.send(obj) and receiver ractor receives the
message with Ractor.receive.
 Sender knows the destination Ractor r and the receiver does not know the sender (accept all
messages from any ractors).
 Receiver has infinite queue and sender enqueues the message. Sender doesn’t block to put
message into this queue.
 This type of message exchanging is employed by many other Actor-based languages.
 Ractor.receive_if{ filter_expr } is a variant of Ractor.receive to select a message.
 Pull type communication: Ractor.yield(obj) and Ractor#take() pair.
 Sender ractor declares that it yields obj by Ractor.yield(obj), and the receiver Ractor takes it with r.take.
 Sender doesn’t know a destination Ractor and receiver knows the sender Ractor r.
 Sender or receiver will block if there is no other side.
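As a rough illustration of the push type, the sketch below uses only the Ractor#send and Ractor.receive pair: the main Ractor pushes a number to a worker, and the worker pushes the result back. The names doubler and reply_to are hypothetical, and Ruby may print a warning that Ractor is experimental.

```ruby
main = Ractor.current

# The worker knows its destination (reply_to) but not who will send to it.
doubler = Ractor.new(main) do |reply_to|
  number = Ractor.receive      # blocks until a message arrives
  reply_to.send(number * 2)    # push the result to the known destination
end

doubler.send(21)               # does not block: the incoming queue is unbounded
answer = Ractor.receive        # read from the main Ractor's own incoming queue
answer                         # => 42
```

Because the receiver accepts messages from any sender, the reply destination has to be passed in explicitly, here as an argument to Ractor.new.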

Copy & Move semantics to send messages


To send unshareable objects as messages, objects are copied or moved.
 Copy: use deep-copy.
 Move: move membership.
 Sender can not access the moved object after moving the object.
 Guarantees that at most 1 Ractor can access the object at a time.

Thread-safety
Ractor helps to write a thread-safe concurrent program, but we can make thread-unsafe programs with
Ractors.
 GOOD: Sharing limitation

 Most objects are unshareable, so we can’t make data-racy and race-conditional programs.
 Shareable objects are protected by an interpreter or locking mechanism.
 BAD: Class/Module can violate this assumption
 To stay compatible with old behavior, classes and modules can introduce data races and so on.
 Ruby programmers should take care if they modify class/module objects in multi-Ractor programs.
 BAD: Ractor can’t solve all thread-safety problems
 There are several blocking operations (waiting send, waiting yield and waiting take) so you can
make a program which has dead-lock and live-lock issues.
 Some kind of shareable objects can introduce transactions (STM, for example). However,
misusing transactions will generate inconsistent state.
Without Ractor, we need to trace all state mutations to debug thread-safety issues. With Ractor, you can
concentrate on the suspicious code that is shared between Ractors.

Creation and termination


Ractor.new
 Ractor.new{ expr } generates another Ractor.

# Ractor.new with a block creates new Ractor


r = Ractor.new do
# This block will be run in parallel with other ractors
end

# You can name a Ractor with `name:` argument.


r = Ractor.new name: 'test-name' do
end

# and Ractor#name returns its name.


r.name #=> 'test-name'

Given block isolation


The Ractor executes given expr in a given block. Given block will be isolated from outer scope by
the Proc#isolate method (not exposed yet for Ruby users). To prevent sharing unshareable objects between
ractors, block outer-variables, self and other information are isolated.
Proc#isolate is called at Ractor creation time (when Ractor.new is called). If the given Proc object cannot be
isolated because of outer variables and so on, an error will be raised.

begin
a = true
r = Ractor.new do
a #=> ArgumentError because this block accesses `a`.
end
r.take # see later
rescue ArgumentError
end

 The self of the given block is the Ractor object itself.

r = Ractor.new do
p self.class #=> Ractor
self.object_id
end
r.take == self.object_id #=> false

Arguments passed to Ractor.new() become block parameters for the given block. However, the interpreter
does not pass the parameter object references; it sends them as messages (see below for details).

r = Ractor.new 'ok' do |msg|


msg #=> 'ok'
end
r.take #=> 'ok'
# almost similar to the last example
r = Ractor.new do
msg = Ractor.receive
msg
end
r.send 'ok'
r.take #=> 'ok'

An execution result of given block


Return value of the given block becomes an outgoing message (see below for details).

r = Ractor.new do
'ok'
end
r.take #=> 'ok'
# almost similar to the last example
r = Ractor.new do
Ractor.yield 'ok'
end
r.take #=> 'ok'

Error in the given block will be propagated to the receiver of an outgoing message.

r = Ractor.new do
raise 'ok' # exception will be transferred to the receiver
end

begin
r.take
rescue Ractor::RemoteError => e
e.cause.class #=> RuntimeError
e.cause.message #=> 'ok'
e.ractor #=> r
end

Communication between Ractors


Communication between Ractors is achieved by sending and receiving messages. There are two ways to
communicate with each other.
 (1) Message sending/receiving
 (1-1) push type send/receive (sender knows receiver). Similar to the Actor model.
 (1-2) pull type yield/take (receiver knows sender).
 (2) Using shareable container objects
 Ractor::TVar gem (ko1/ractor-tvar)
 more?
Users can control program execution timing with (1), but should not control with (2) (only manage as critical
section).
For message sending and receiving, there are two types of APIs: push type and pull type.
 (1-1) send/receive (push type)
 Ractor#send(obj) (Ractor#<<(obj) is an alias) sends a message to the Ractor’s incoming port.
The incoming port is connected to the infinite size incoming queue, so Ractor#send will never block.
 Ractor.receive dequeues a message from its own incoming queue. If the incoming queue is
empty, the Ractor.receive call will block.
 Ractor.receive_if{|msg| filter_expr } is a variant of Ractor.receive. receive_if only receives a
message for which filter_expr is true (so Ractor.receive is the same as Ractor.receive_if{ true }).
 (1-2) yield/take (pull type)
 Ractor.yield(obj) sends a message via the outgoing port to a Ractor which is calling Ractor#take. If
no Ractors are waiting for it, the Ractor.yield(obj) will block. If multiple Ractors are waiting
for Ractor.yield(obj), only one Ractor can receive the message.
 Ractor#take receives a message which is waiting by Ractor.yield(obj) method from the
specified Ractor. If the Ractor has not called Ractor.yield yet, the Ractor#take call will block.
 Ractor.select() can wait for the success of take, yield and receive.
 You can close the incoming port or outgoing port.
 You can close them with Ractor#close_incoming and Ractor#close_outgoing.
 If the incoming port is closed for a Ractor, you can’t send to the Ractor. If Ractor.receive is
blocked for the closed incoming port, then it will raise an exception.
 If the outgoing port is closed for a Ractor, you can’t call Ractor#take and Ractor.yield on
the Ractor. If ractors are blocking by Ractor#take or Ractor.yield, closing outgoing port will raise
an exception on these blocking ractors.
 When a Ractor is terminated, the Ractor’s ports are closed.
 There are 3 ways to send an object as a message
 (1) Send a reference: Sending a shareable object, send only a reference to the object (fast)
 (2) Copy an object: Sending an unshareable object by copying an object deeply (slow). Note that
you can not send an object which does not support deep copy. Some T_DATA objects are not
supported.
 (3) Move an object: Sending an unshareable object reference together with its membership.
The sender Ractor can no longer access moved objects (an exception is raised) after moving them.
The current implementation creates a new object as the moved object for the receiver Ractor and copies
the references of the sent object to the moved object.
 You can choose between “Copy” and “Move” with the move: keyword, Ractor#send(obj, move:
true/false) and Ractor.yield(obj, move: true/false) (the default is false, i.e. copy).

Sending/Receiving ports
Each Ractor has incoming-port and outgoing-port. Incoming-port is connected to the infinite sized incoming
queue.

              Ractor r
              +-------------------------------------------+
              | incoming                         outgoing |
              | port                                 port |
r.send(obj) ->*->[incoming queue]     Ractor.yield(obj) ->*-> r.take
              |                |                          |
              |                v                          |
              |           Ractor.receive                  |
              +-------------------------------------------+

Connection example: r2.send obj on r1, Ractor.receive on r2

  +----+     +----+
  * r1 |---->* r2 *
  +----+     +----+

Connection example: Ractor.yield(obj) on r1, r1.take on r2

  +----+     +----+
  * r1 *---->- r2 *
  +----+     +----+

Connection example: Ractor.yield(obj) on r1 and r2,
                    and waiting for both simultaneously by Ractor.select(r1, r2)

  +----+
  * r1 *------+
  +----+      |
              +----> Ractor.select(r1, r2)
  +----+      |
  * r2 *------|
  +----+

r = Ractor.new do
msg = Ractor.receive # Receive from r's incoming queue
msg # send back msg as block return value
end
r.send 'ok' # Send 'ok' to r's incoming port -> incoming queue
r.take # Receive from r's outgoing port

The last example shows the following ractor network.

  +------+        +---+
  * main |------> * r *---+
  +------+        +---+   |
      ^                   |
      +-------------------+

And this code can be simplified by using an argument for Ractor.new.


# Actual argument 'ok' for `Ractor.new()` will be sent to created Ractor.
r = Ractor.new 'ok' do |msg|
# Values for formal parameters will be received from incoming queue.
# Similar to: msg = Ractor.receive

msg # Return value of the given block will be sent via outgoing port
end

# receive from the r's outgoing port.


r.take #=> 'ok'

Return value of a block for Ractor.new


As already explained, the return value of Ractor.new (an evaluated value of expr in Ractor.new{ expr }) can
be taken by Ractor#take.

Ractor.new{ 42 }.take #=> 42

When the block return value is available, the Ractor is dead, so no Ractor except the taking Ractor can
touch the return value. Any value can therefore be sent along this communication path without modification.

r = Ractor.new do
a = "hello"
binding
end

r.take.eval("p a") #=> "hello" (other communication path can not send a Binding object directly)

Wait for multiple Ractors with Ractor.select


You can wait for yields from multiple Ractors with Ractor.select(*ractors). The return value of Ractor.select() is [r,
msg] where r is the yielding Ractor and msg is the yielded message.
Wait for a single ractor (same as Ractor#take):

r1 = Ractor.new{'r1'}

r, obj = Ractor.select(r1)
r == r1 and obj == 'r1' #=> true

Wait for two ractors:

r1 = Ractor.new{'r1'}
r2 = Ractor.new{'r2'}
rs = [r1, r2]
as = []

# Wait for r1 or r2's Ractor.yield


r, obj = Ractor.select(*rs)
rs.delete(r)
as << obj

# Second try (rs only contain not-closed ractors)


r, obj = Ractor.select(*rs)
rs.delete(r)
as << obj
as.sort == ['r1', 'r2'] #=> true

Complex example:

pipe = Ractor.new do
loop do
Ractor.yield Ractor.receive
end
end

RN = 10
rs = RN.times.map{|i|
Ractor.new pipe, i do |pipe, i|
msg = pipe.take
msg # ping-pong
end
}
RN.times{|i|
pipe << i
}
RN.times.map{
r, n = Ractor.select(*rs)
rs.delete r
n
}.sort #=> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Multiple Ractors can send to one Ractor.

# Create 10 ractors and they send objects to pipe ractor.


# pipe ractor yield received objects

pipe = Ractor.new do
loop do
Ractor.yield Ractor.receive
end
end
RN = 10
rs = RN.times.map{|i|
Ractor.new pipe, i do |pipe, i|
pipe << i
end
}

RN.times.map{
pipe.take
}.sort #=> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

TODO: Current Ractor.select() has the same issue of select(2), so this interface should be refined.
TODO: select syntax of go-language uses round-robin technique to make fair scheduling.
Now Ractor.select() doesn’t use it.

Closing Ractor’s ports


 Ractor#close_incoming/outgoing close incoming/outgoing ports (similar to Queue#close).
 Ractor#close_incoming
 r.send(obj) raises an exception if r’s incoming port is closed.
 When the incoming queue is empty and the incoming port is closed, Ractor.receive raises an
exception. If the incoming queue is not empty, it dequeues an object without exceptions.
 Ractor#close_outgoing
 Ractor.yield on a Ractor whose outgoing port is closed raises an exception.
 Ractor#take on a Ractor whose outgoing port is closed raises an exception. If a Ractor#take call is
already blocking, it raises an exception as well.
 When a Ractor terminates, the ports are closed automatically.
 Return value of the Ractor’s block will be yielded as Ractor.yield(ret_val), even if the
implementation terminates the underlying native thread.
Example (try to take from closed Ractor):

r = Ractor.new do
'finish'
end
r.take # success (will return 'finish')
begin
o = r.take # try to take from closed Ractor
rescue Ractor::ClosedError
'ok'
else
"ng: #{o}"
end

Example (try to send to closed (terminated) Ractor):


r = Ractor.new do
end

r.take # wait terminate

begin
r.send(1)
rescue Ractor::ClosedError
'ok'
else
'ng'
end

When multiple Ractors are waiting for Ractor.yield(), Ractor#close_outgoing will cancel all blocking by
raising an exception (ClosedError).

Send a message by copying


Ractor#send(obj) or Ractor.yield(obj) copy obj deeply if obj is an unshareable object.

obj = 'str'.dup
r = Ractor.new obj do |msg|
# return received msg's object_id
msg.object_id
end

obj.object_id == r.take #=> false

Some objects do not support copying, and an exception is raised when they are sent.

obj = Thread.new{}
begin
Ractor.new obj do |msg|
msg
end
rescue TypeError => e
e.message #=> "allocator undefined for Thread"
else
'ng' # unreachable here
end

Send a message by moving


Ractor#send(obj, move: true) or Ractor.yield(obj, move: true) move obj to the destination Ractor. If the
source Ractor touches the moved object (for example, call the method like obj.foo()), it will be an error.

# move with Ractor#send


r = Ractor.new do
obj = Ractor.receive
obj << ' world'
end

str = 'hello'
r.send str, move: true
modified = r.take #=> 'hello world'

# str is moved, and accessing str from this Ractor is prohibited

begin
# Error because it touches moved str.
str << ' exception' # raise Ractor::MovedError
rescue Ractor::MovedError
modified #=> 'hello world'
else
raise 'unreachable'
end
# move with Ractor.yield
r = Ractor.new do
obj = 'hello'
Ractor.yield obj, move: true
obj << 'world' # raise Ractor::MovedError
end

str = r.take
begin
r.take
rescue Ractor::RemoteError
p str #=> "hello"
end

Some objects do not support moving, and an exception will be raised.

r = Ractor.new do
Ractor.receive
end

r.send(Thread.new{}, move: true) #=> allocator undefined for Thread (TypeError)

To prohibit access to moved objects, a class replacement technique is used.
Shareable objects
The following objects are shareable.
 Immutable objects
 Small integers, some symbols, true, false, nil (a.k.a. SPECIAL_CONST_P() objects in internal)
 Frozen native objects
o Numeric objects: Float, Complex, Rational, big integers (T_BIGNUM in internal)
o All Symbols.
 Frozen String and Regexp objects (their instance variables should refer only shareable objects)
 Class, Module objects (T_CLASS, T_MODULE and T_ICLASS in internal)
 Ractor and other special objects which care about synchronization.
Implementation: Now shareable objects (RVALUE) have FL_SHAREABLE flag. This flag can be added
lazily.
To make shareable objects, the Ractor.make_shareable(obj) method is provided. It tries to make obj
shareable by freezing obj and, recursively, the objects reachable from it. This method accepts the copy: keyword (default
value is false). Ractor.make_shareable(obj, copy: true) tries to make a deep copy of obj and make the
copied object shareable.
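The difference between the two modes can be seen in a short sketch. The hashes below are arbitrary sample data; +'…' is used only to guarantee unfrozen strings.

```ruby
config = { name: +'ko1', tags: [+'a', +'b'] }
before = Ractor.shareable?(config)         # => false

# Default mode: freeze config itself and everything reachable from it.
shared = Ractor.make_shareable(config)
shared.equal?(config)                      # => true (same object, now frozen)
config[:tags].first.frozen?                # => true

# copy: true leaves the original untouched and returns a frozen deep copy.
original = { tags: [+'a'] }
copied = Ractor.make_shareable(original, copy: true)
original.frozen?                           # => false
original[:tags].frozen?                    # => false
Ractor.shareable?(copied)                  # => true
```

The copying mode is the safer choice when other code still expects to mutate the original object.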

Language changes to isolate unshareable objects between Ractors


To isolate unshareable objects between Ractors, we introduced additional language semantics on multi-
Ractor Ruby programs.
Note that without using Ractors, these additional semantics are not needed (100% compatible with Ruby 2).

Global variables
Only the main Ractor (a Ractor created at starting of interpreter) can access global variables.

$gv = 1
r = Ractor.new do
$gv
end

begin
r.take
rescue Ractor::RemoteError => e
e.cause.message #=> 'can not access global variables from non-main Ractors'
end

Note that some special global variables are ractor-local, like $stdin, $stdout, $stderr. See [Bug #17268] for
more details.

Instance variables of shareable objects


Instance variables of classes/modules can be read from non-main Ractors if the values they refer to are
shareable objects.

class C
@iv = 1
end

# Assign the Ractor first: in `p Ractor.new do ... end` the block would bind to `p`.
r = Ractor.new do
class C
@iv
end
end
p r.take #=> 1

Otherwise, only the main Ractor can access instance variables of shareable objects.

class C
@iv = [] # unshareable object
end

Ractor.new do
class C
begin
p @iv
rescue Ractor::IsolationError
p $!.message
#=> "can not get unshareable values from instance variables of classes/modules from non-main Ractors"
end

begin
@iv = 42
rescue Ractor::IsolationError
p $!.message
#=> "can not set instance variables of classes/modules by non-main Ractors"
end
end
end.take
shared = Ractor.new{}
shared.instance_variable_set(:@iv, 'str')

r = Ractor.new shared do |shared|


p shared.instance_variable_get(:@iv)
end

begin
r.take
rescue Ractor::RemoteError => e
e.cause.message #=> can not access instance variables of shareable objects from non-main Ractors (Ractor::IsolationError)
end

Note that instance variables for class/module objects are also prohibited on Ractors.
Class variables
Only the main Ractor can access class variables.

class C
@@cv = 'str'
end

r = Ractor.new do
class C
p @@cv
end
end

begin
r.take
rescue => e
e.class #=> Ractor::IsolationError
end

Constants
Only the main Ractor can read constants which refer to the unshareable object.

class C
CONST = 'str'
end
r = Ractor.new do
C::CONST
end
begin
r.take
rescue => e
e.class #=> Ractor::IsolationError
end

Only the main Ractor can define constants which refer to the unshareable object.

class C
end
r = Ractor.new do
C::CONST = 'str'
end
begin
r.take
rescue => e
e.class #=> Ractor::IsolationError
end

To make a library usable from multiple Ractors, its constants should refer only to shareable objects.

TABLE = {a: 'ko1', b: 'ko2', c: 'ko3'}

In this case, TABLE refers to an unshareable Hash object, so other Ractors cannot
refer to the TABLE constant. To make it shareable, we can use Ractor.make_shareable() like this.

TABLE = Ractor.make_shareable( {a: 'ko1', b: 'ko2', c: 'ko3'} )

To make this easier, Ruby 3.0 introduced the new shareable_constant_value directive.

# shareable_constant_value: literal

TABLE = {a: 'ko1', b: 'ko2', c: 'ko3'}


#=> Same as: TABLE = Ractor.make_shareable( {a: 'ko1', b: 'ko2', c: 'ko3'} )

shareable_constant_value directive accepts the following modes (descriptions use the example: CONST =
expr):
 none: Do nothing. Same as: CONST = expr
 literal:
 if expr consists of literals, it is replaced with CONST = Ractor.make_shareable(expr).
 otherwise: it is replaced with CONST = expr.tap{|o| raise unless Ractor.shareable?(o)}.
 experimental_everything: replaced to CONST = Ractor.make_shareable(expr).
 experimental_copy: replaced to CONST = Ractor.make_shareable(expr, copy: true).
Except the none mode (default), it is guaranteed that the assigned constants refer to only shareable objects.
See doc/syntax/comments.rdoc for more details.

Implementation note
 Each Ractor has its own thread, which means each Ractor has at least 1 native thread.
 Each Ractor has its own ID (rb_ractor_t::pub::id).
 On debug mode, all unshareable objects are labeled with current Ractor’s id, and it is checked to
detect unshareable object leak (access an object from different Ractor) in VM.
Examples

Traditional Ring example in Actor-model

RN = 1_000
CR = Ractor.current

r = Ractor.new do
p Ractor.receive
CR << :fin
end

RN.times{
r = Ractor.new r do |next_r|
next_r << Ractor.receive
end
}

p :setup_ok
r << 1
p Ractor.receive

Fork-join

def fib n
if n < 2
1
else
fib(n-2) + fib(n-1)
end
end

RN = 10
rs = (1..RN).map do |i|
Ractor.new i do |i|
[i, fib(i)]
end
end

until rs.empty?
r, v = Ractor.select(*rs)
rs.delete r
p answer: v
end

Worker pool

require 'prime'

pipe = Ractor.new do
loop do
Ractor.yield Ractor.receive
end
end

N = 1000
RN = 10
workers = (1..RN).map do
Ractor.new pipe do |pipe|
while n = pipe.take
Ractor.yield [n, n.prime?]
end
end
end

(1..N).each{|i|
pipe << i
}

pp (1..N).map{
_r, (n, b) = Ractor.select(*workers)
[n, b]
}.sort_by{|(n, b)| n}

Pipeline

# pipeline with yield/take


r1 = Ractor.new do
'r1'
end

r2 = Ractor.new r1 do |r1|
r1.take + 'r2'
end
r3 = Ractor.new r2 do |r2|
r2.take + 'r3'
end

p r3.take #=> 'r1r2r3'


# pipeline with send/receive

r3 = Ractor.new Ractor.current do |cr|


cr.send Ractor.receive + 'r3'
end

r2 = Ractor.new r3 do |r3|
r3.send Ractor.receive + 'r2'
end

r1 = Ractor.new r2 do |r2|
r2.send Ractor.receive + 'r1'
end

r1 << 'r0'
p Ractor.receive #=> "r0r1r2r3"

Supervise

# ring example again

r = Ractor.current
(1..10).map{|i|
r = Ractor.new r, i do |r, i|
r.send Ractor.receive + "r#{i}"
end
}

r.send "r0"
p Ractor.receive #=> "r0r10r9r8r7r6r5r4r3r2r1"
# ring example with an error

r = Ractor.current
rs = (1..10).map{|i|
r = Ractor.new r, i do |r, i|
loop do
msg = Ractor.receive
raise if /e/ =~ msg
r.send msg + "r#{i}"
end
end
}

r.send "r0"
p Ractor.receive #=> "r0r10r9r8r7r6r5r4r3r2r1"
r.send "r0"
p Ractor.select(*rs, Ractor.current) #=> [:receive, "r0r10r9r8r7r6r5r4r3r2r1"]
r.send "e0"
p Ractor.select(*rs, Ractor.current)
#=>
#<Thread:0x000056262de28bd8 run> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
2: from /home/ko1/src/ruby/trunk/test.rb7:in `block (2 levels) in <main>'
1: from /home/ko1/src/ruby/trunk/test.rb:7:in `loop'
/home/ko1/src/ruby/trunk/test.rb:9:in `block (3 levels) in <main>': unhandled exception
Traceback (most recent call last):
2: from /home/ko1/src/ruby/trunk/test.rb7:in `block (2 levels) in <main>'
1: from /home/ko1/src/ruby/trunk/test.rb:7:in `loop'
/home/ko1/src/ruby/trunk/test.rb:9:in `block (3 levels) in <main>': unhandled exception
1: from /home/ko1/src/ruby/trunk/test.rb21:in `<main>'
<internal:ractor>:69:in `select': thrown by remote Ractor. (Ractor::RemoteError)
# resend non-error message

r = Ractor.current
rs = (1..10).map{|i|
r = Ractor.new r, i do |r, i|
loop do
msg = Ractor.receive
raise if /e/ =~ msg
r.send msg + "r#{i}"
end
end
}

r.send "r0"
p Ractor.receive #=> "r0r10r9r8r7r6r5r4r3r2r1"
r.send "r0"
p Ractor.select(*rs, Ractor.current) #=> [:receive, "r0r10r9r8r7r6r5r4r3r2r1"]
msg = 'e0'
begin
r.send msg
p Ractor.select(*rs, Ractor.current)
rescue Ractor::RemoteError
msg = 'r0'
retry
end

#=> <internal:ractor>:100:in `send': The incoming-port is already closed (Ractor::ClosedError)


# because r == rs[-1] is terminated.
# ring example with supervisor and re-start

def make_ractor r, i
Ractor.new r, i do |r, i|
loop do
msg = Ractor.receive
raise if /e/ =~ msg
r.send msg + "r#{i}"
end
end
end

r = Ractor.current
rs = (1..10).map{|i|
r = make_ractor(r, i)
}

msg = 'e0' # error causing message


begin
r.send msg
p Ractor.select(*rs, Ractor.current)
rescue Ractor::RemoteError
r = rs[-1] = make_ractor(rs[-2], rs.size-1)
msg = 'x0'
retry
end

#=> [:receive, "x0r9r9r8r7r6r5r4r3r2r1"]

Regular expressions (regexps) are patterns which describe the contents of a string. They’re used for testing
whether a string contains a given pattern, or extracting the portions that match. They are created with
the /pat/ and %r{pat} literals or the Regexp.new constructor.
A regexp is usually delimited with forward slashes (/). For example:
/hay/ =~ 'haystack' #=> 0
/y/.match('haystack') #=> #<MatchData "y">

If a string contains the pattern it is said to match. A literal string matches itself.
Here ‘haystack’ does not contain the pattern ‘needle’, so it doesn’t match:

/needle/.match('haystack') #=> nil

Here ‘haystack’ contains the pattern ‘hay’, so it matches:

/hay/.match('haystack') #=> #<MatchData "hay">

Specifically, /st/ requires that the string contains the letter s followed by the letter t, so it matches haystack,
also.
Note that any Regexp match raises Regexp::TimeoutError if a timeout is set and exceeded.
See the “Timeout” section for details.

Regexp Interpolation
A regexp may contain interpolated strings; trivially:

foo = 'bar'
/#{foo}/ # => /bar/
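Interpolated text is treated as pattern source, so any metacharacters it contains stay active. The sketch below assumes a hypothetical variable term that holds text meant to be matched literally, and quotes it with Regexp.escape before interpolating:

```ruby
# Hypothetical user-supplied text that happens to contain a metacharacter.
term = "1+1"

raw     = /#{term}/                  # "+" acts as a quantifier here
escaped = /#{Regexp.escape(term)}/   # "+" is matched literally

raw_hit     = raw.match?("1+1")      # false: the pattern needs two consecutive 1s
escaped_hit = escaped.match?("1+1")  # true
```

Escaping is usually the safer default whenever the interpolated value is data rather than a deliberately written sub-pattern.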

=~ and Regexp#match
Pattern matching may be achieved by using =~ operator or Regexp#match method.
=~ Operator
=~ is Ruby’s basic pattern-matching operator. When one operand is a regular expression and the other is a
string then the regular expression is used as a pattern to match against the string. (This operator is
equivalently defined by Regexp and String so the order of String and Regexp does not matter. Other classes
may have different implementations of =~.) If a match is found, the operator returns the index of the first
match in the string, otherwise it returns nil.

/hay/ =~ 'haystack' #=> 0


'haystack' =~ /hay/ #=> 0
/a/ =~ 'haystack' #=> 1
/u/ =~ 'haystack' #=> nil

When the =~ operator is used with a String and a Regexp, the $~ global variable is set after a successful
match. $~ holds a MatchData object. Regexp.last_match is equivalent to $~.
Regexp#match Method
The match method returns a MatchData object:

/st/.match('haystack') #=> #<MatchData "st">


Metacharacters and Escapes
The following are metacharacters: (, ), [, ], {, }, ., ?, +, *. They have a specific meaning when appearing in a
pattern. To match them literally they must be backslash-escaped. To match a backslash literally, backslash-
escape it: \\.

/1 \+ 2 = 3\?/.match('Does 1 + 2 = 3?') #=> #<MatchData "1 + 2 = 3?">


/a\\\\b/.match('a\\\\b') #=> #<MatchData "a\\\\b">

Patterns behave like double-quoted strings and can contain the same backslash escapes (the meaning of \s
is different, however, see below).

/\s\u{6771 4eac 90fd}/.match("Go to 東京都")


#=> #<MatchData " 東京都">

Arbitrary Ruby expressions can be embedded into patterns with the #{...} construct.

place = "東京都"
/#{place}/.match("Go to 東京都")
#=> #<MatchData "東京都">

Character Classes
A character class is delimited with square brackets ([, ]) and lists characters that may appear at that point in
the match. /[ab]/ means a or b, as opposed to /ab/ which means a followed by b.

/W[aeiou]rd/.match("Word") #=> #<MatchData "Word">

Within a character class the hyphen (-) is a metacharacter denoting an inclusive range of
characters. [abcd] is equivalent to [a-d]. A range can be followed by another range, so [abcdwxyz] is
equivalent to [a-dw-z]. The order in which ranges or individual characters appear inside a character class is
irrelevant.

/[0-9a-f]/.match('9f') #=> #<MatchData "9">


/[9f]/.match('9f') #=> #<MatchData "9">

If the first character of a character class is a caret (^) the class is inverted: it matches any
character except those named.

/[^a-eg-z]/.match('f') #=> #<MatchData "f">

A character class may contain another character class. By itself this isn’t useful because [a-z[0-9]] describes
the same set as [a-z0-9]. However, character classes also support the && operator which performs set
intersection on its arguments. The two can be combined as follows:

/[a-w&&[^c-g]z]/ # ([a-w] AND ([^c-g] OR z))

This is equivalent to:


/[abh-w]/
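One way to check the claimed equivalence is to run both classes over the lowercase alphabet and compare the letters each one accepts. This is only an illustrative check; the variable names are arbitrary:

```ruby
letters = ("a".."z").to_a

# Letters accepted by the intersection form and by the spelled-out form.
intersected = letters.grep(/[a-w&&[^c-g]z]/)
spelled_out = letters.grep(/[abh-w]/)

same_set = (intersected == spelled_out)   # true
```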

The following metacharacters also behave like character classes:


 /./ - Any character except a newline.
 /./m - Any character (the m modifier enables multiline mode)
 /\w/ - A word character ([a-zA-Z0-9_])
 /\W/ - A non-word character ([^a-zA-Z0-9_]). Please take a look at Bug #4044 if using /\W/ with
the /i modifier.
 /\d/ - A digit character ([0-9])
 /\D/ - A non-digit character ([^0-9])
 /\h/ - A hexdigit character ([0-9a-fA-F])
 /\H/ - A non-hexdigit character ([^0-9a-fA-F])
 /\s/ - A whitespace character: /[ \t\r\n\f\v]/
 /\S/ - A non-whitespace character: /[^ \t\r\n\f\v]/
 /\R/ - A linebreak: \n, \v, \f, \r, \u0085 (NEXT LINE), \u2028 (LINE SEPARATOR),
\u2029 (PARAGRAPH SEPARATOR) or \r\n.
POSIX bracket expressions are also similar to character classes. They provide a portable alternative to the
above, with the added benefit that they encompass non-ASCII characters. For instance, /\d/ matches only
the ASCII decimal digits (0-9); whereas /[[:digit:]]/ matches any character in the Unicode Nd category.
 /[[:alnum:]]/ - Alphabetic and numeric character
 /[[:alpha:]]/ - Alphabetic character
 /[[:blank:]]/ - Space or tab
 /[[:cntrl:]]/ - Control character
 /[[:digit:]]/ - Digit
 /[[:graph:]]/ - Non-blank character (excludes spaces, control characters, and similar)
 /[[:lower:]]/ - Lowercase alphabetical character
 /[[:print:]]/ - Like [:graph:], but includes the space character
 /[[:punct:]]/ - Punctuation character
 /[[:space:]]/ - Whitespace character ([:blank:], newline, carriage return, etc.)
 /[[:upper:]]/ - Uppercase alphabetical character
 /[[:xdigit:]]/ - Digit allowed in a hexadecimal number (i.e., 0-9a-fA-F)
Ruby also supports the following non-POSIX character classes:
 /[[:word:]]/ - A character in one of the following Unicode general
categories: Letter, Mark, Number, Connector_Punctuation
 /[[:ascii:]]/ - A character in the ASCII character set

# U+06F2 is "EXTENDED ARABIC-INDIC DIGIT TWO"
/[[:digit:]]/.match("\u06F2") #=> #<MatchData "\u{06F2}">
/[[:upper:]][[:lower:]]/.match("Hello") #=> #<MatchData "He">
/[[:xdigit:]][[:xdigit:]]/.match("A6") #=> #<MatchData "A6">
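The difference between the ASCII-only shorthand and the Unicode-aware classes can be seen by testing the same non-ASCII digit against each. A small sketch, using the U+06F2 character from the example above:

```ruby
arabic_two = "\u06F2"   # EXTENDED ARABIC-INDIC DIGIT TWO

shorthand = /\A\d\z/.match?(arabic_two)            # false: \d covers only 0-9
posix     = /\A[[:digit:]]\z/.match?(arabic_two)   # true
property  = /\A\p{Nd}\z/.match?(arabic_two)        # true
```

When validating input that must be plain ASCII digits, /\d/ or an explicit [0-9] is therefore the stricter choice.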

Repetition
The constructs described so far match a single character. They can be followed by a repetition
metacharacter to specify how many times they need to occur. Such metacharacters are called quantifiers.
 * - Zero or more times
 + - One or more times
 ? - Zero or one times (optional)
 {n} - Exactly n times
 {n,} - n or more times
 {,m} - m or fewer times
 {n,m} - At least n and at most m times
At least one uppercase character (‘H’), at least one lowercase character (‘e’), two ‘l’ characters, then one ‘o’:

"Hello".match(/[[:upper:]]+[[:lower:]]+l{2}o/) #=> #<MatchData "Hello">

Greedy Match
Repetition is greedy by default: as many occurrences as possible are matched while still allowing the overall
match to succeed. By contrast, lazy matching makes the minimal amount of matches necessary for overall
success. Most greedy metacharacters can be made lazy by following them with ?. For the {n} pattern,
because it specifies an exact number of characters to match and not a variable number of characters,
the ? metacharacter instead makes the repeated pattern optional.
Both patterns below match the string. The first uses a greedy quantifier so ‘.+’ matches ‘<a><b>’; the
second uses a lazy quantifier so ‘.+?’ matches ‘<a>’:

/<.+>/.match("<a><b>") #=> #<MatchData "<a><b>">


/<.+?>/.match("<a><b>") #=> #<MatchData "<a>">
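The same contrast shows up when collecting every match with String#scan. The following sketch reuses the string from the example above:

```ruby
html = "<a><b>"

greedy_tags = html.scan(/<.+>/)    # one match that spans both tags
lazy_tags   = html.scan(/<.+?>/)   # one match per tag
```

Here greedy_tags holds a single element and lazy_tags holds two, because the lazy quantifier stops at the first closing bracket.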

Possessive Match
A quantifier followed by + matches possessively: once it has matched it does not backtrack. Possessive
quantifiers behave like greedy quantifiers, but having matched they refuse to “give up” their match even if
this jeopardises the overall match.

/<.*><.+>/.match("<a><b>") #=> #<MatchData "<a><b>">


/<.*+><.+>/.match("<a><b>") #=> nil
/<.*><.++>/.match("<a><b>") #=> nil

Capturing
Parentheses can be used for capturing. The text enclosed by the nth group of parentheses can be
subsequently referred to with n. Within a pattern use the backreference \n (e.g. \1); outside of the pattern
use MatchData[n] (e.g. MatchData[1]).
In this example, 'at' is captured by the first group of parentheses, then referred to later with \1:
/[csh](..) [csh]\1 in/.match("The cat sat in the hat")
#=> #<MatchData "cat sat in" 1:"at">

Regexp#match returns a MatchData object which makes the captured text available with its [] method:

/[csh](..) [csh]\1 in/.match("The cat sat in the hat")[1] #=> 'at'

While Ruby supports an arbitrary number of numbered captured groups, only groups 1-9 are supported
using the \n backreference syntax.
Ruby also supports \0 as a special backreference, which references the entire matched string. This is also
available at MatchData[0]. Note that the \0 backreference cannot be used inside the regexp, as
backreferences can only be used after the end of the capture group, and the \0 backreference uses the
implicit capture group of the entire match. However, you can use this backreference when doing
substitution:

"The cat sat in the hat".gsub(/[csh]at/, '\0s')


# => "The cats sats in the hats"
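Numbered groups can be used in a replacement string in the same way. The sketch below, with an arbitrary sample name, reorders two captured words and then uses the block form, where the groups are available as $1, $2 and so on. The replacement is single-quoted so that Ruby does not interpret \1 and \2 as string escapes:

```ruby
name = "Ada Lovelace"

# Swap the two words using backreferences in the replacement string.
swapped = name.sub(/(\w+) (\w+)/, '\2, \1')

# Block form: each match exposes its first group as $1.
initials = name.gsub(/(\w)\w*/) { "#{$1}." }
```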

Named Captures
Capture groups can be referred to by name when defined with the (?<name>) or (?'name') constructs.

/\$(?<dollars>\d+)\.(?<cents>\d+)/.match("$3.67")
#=> #<MatchData "$3.67" dollars:"3" cents:"67">
/\$(?<dollars>\d+)\.(?<cents>\d+)/.match("$3.67")[:dollars] #=> "3"

Named groups can be backreferenced with \k<name>, where name is the group name.

/(?<vowel>[aeiou]).\k<vowel>.\k<vowel>/.match('ototomy')
#=> #<MatchData "ototo" vowel:"o">

Note: A regexp can’t use named backreferences and numbered backreferences simultaneously. Also, if a
named capture is used in a regexp, then parentheses used for grouping which would otherwise result in an
unnamed capture are treated as non-capturing.

/(\w)(\w)/.match("ab").captures # => ["a", "b"]


/(\w)(\w)/.match("ab").named_captures # => {}

/(?<c>\w)(\w)/.match("ab").captures # => ["a"]


/(?<c>\w)(\w)/.match("ab").named_captures # => {"c"=>"a"}

When named capture groups are used with a literal regexp on the left-hand side of an expression and
the =~ operator, the captured text is also assigned to local variables with corresponding names.

/\$(?<dollars>\d+)\.(?<cents>\d+)/ =~ "$3.67" #=> 0


dollars #=> "3"
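A short sketch of both ways to read named groups follows. The local-variable form relies on the regexp being a literal on the left-hand side; the MatchData form works in any position. The variable names are illustrative:

```ruby
price = "$3.67"

# Literal regexp on the left: dollars and cents become local variables.
if /\$(?<dollars>\d+)\.(?<cents>\d+)/ =~ price
  total_cents = dollars.to_i * 100 + cents.to_i
end

# Regexp passed as an argument: read the groups from the MatchData instead.
m = price.match(/\$(?<dollars>\d+)\.(?<cents>\d+)/)
as_hash = m.named_captures
```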

Grouping
Parentheses also group the terms they enclose, allowing them to be quantified as one atomic whole.
The pattern below matches a vowel followed by 2 word characters:

/[aeiou]\w{2}/.match("Caenorhabditis elegans") #=> #<MatchData "aen">

Whereas the following pattern matches a vowel followed by a word character, twice, i.e. [aeiou]\w[aeiou]\w:
‘enor’.

/([aeiou]\w){2}/.match("Caenorhabditis elegans")
#=> #<MatchData "enor" 1:"or">

The (?:…) construct provides grouping without capturing. That is, it combines the terms it contains into an
atomic whole without creating a backreference. This benefits performance at the slight expense of
readability.
The first group of parentheses captures ‘n’ and the second ‘ti’. The second group is referred to later with the
backreference \2:

/I(n)ves(ti)ga\2ons/.match("Investigations")
#=> #<MatchData "Investigations" 1:"n" 2:"ti">

The first group of parentheses is now made non-capturing with ‘?:’, so it still matches ‘n’, but doesn’t create
the backreference. Thus, the backreference \1 now refers to ‘ti’.

/I(?:n)ves(ti)ga\1ons/.match("Investigations")
#=> #<MatchData "Investigations" 1:"ti">

Atomic Grouping
Grouping can be made atomic with (?>pat). This causes the subexpression pat to be matched
independently of the rest of the expression such that what it matches becomes fixed for the remainder of the
match, unless the entire subexpression must be abandoned and subsequently revisited. In this way pat is
treated as a non-divisible whole. Atomic grouping is typically used to optimise patterns so as to prevent the
regular expression engine from backtracking needlessly.
The " in the pattern below matches the first character of the string, then .* matches Quote". This causes the
overall match to fail, so the text matched by .* is backtracked by one position, which leaves the final
character of the string available to match ":

/".*"/.match('"Quote"') #=> #<MatchData "\"Quote\"">


If .* is grouped atomically, it refuses to backtrack Quote", even though this means that the overall match
fails:

/"(?>.*)"/.match('"Quote"') #=> nil

Subexpression Calls
The \g<name> syntax matches the previous subexpression named name, which can be a group name or
number, again. This differs from backreferences in that it re-executes the group rather than simply trying to
re-match the same text.
This pattern matches a ( character and assigns it to the paren group, tries to call the paren
subexpression again but fails, then matches a literal ):

/\A(?<paren>\(\g<paren>*\))*\z/ =~ '()' #=> 0

/\A(?<paren>\(\g<paren>*\))*\z/ =~ '(())' #=> 0
# ^1
#      ^2
#           ^3
#                 ^4
#      ^5
#           ^6
#                      ^7
#                       ^8
#                       ^9
#                           ^10

1. Matches at the beginning of the string, i.e. before the first character.
2. Enters a named capture group called paren
3. Matches a literal (, the first character in the string
4. Calls the paren group again, i.e. recurses back to the second step
5. Re-enters the paren group
6. Matches a literal (, the second character in the string
7. Tries to call paren a third time, but fails because doing so would prevent an overall successful match
8. Matches a literal ), the third character in the string. Marks the end of the second recursive call
9. Matches a literal ), the fourth character in the string
10. Matches the end of the string
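The steps above can be exercised by running the same pattern against a few balanced and unbalanced strings. This is a small sketch; the sample strings are arbitrary:

```ruby
balanced = /\A(?<paren>\(\g<paren>*\))*\z/

samples = ["", "()", "(())", "(()())", "(()", "())("]

# Map each sample to whether the whole string is a balanced sequence.
results = samples.map { |s| [s, balanced.match?(s)] }.to_h
```

Only the first four samples are accepted; the last two leave a parenthesis unmatched, so the \z anchor cannot be reached.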

Alternation
The vertical bar metacharacter (|) combines several expressions into a single one that matches any of the
expressions. Each expression is an alternative.

/\w(and|or)\w/.match("Feliformia") #=> #<MatchData "form" 1:"or">


/\w(and|or)\w/.match("furandi") #=> #<MatchData "randi" 1:"and">
/\w(and|or)\w/.match("dissemblance") #=> nil

Character Properties
The \p{} construct matches characters with the named property, much like POSIX bracket classes.
 /\p{Alnum}/ - Alphabetic and numeric character
 /\p{Alpha}/ - Alphabetic character
 /\p{Blank}/ - Space or tab
 /\p{Cntrl}/ - Control character
 /\p{Digit}/ - Digit
 /\p{Emoji}/ - Unicode emoji
 /\p{Graph}/ - Non-blank character (excludes spaces, control characters, and similar)
 /\p{Lower}/ - Lowercase alphabetical character
 /\p{Print}/ - Like \p{Graph}, but includes the space character
 /\p{Punct}/ - Punctuation character
 /\p{Space}/ - Whitespace character ([:blank:], newline, carriage return, etc.)
 /\p{Upper}/ - Uppercase alphabetical character
 /\p{XDigit}/ - Digit allowed in a hexadecimal number (i.e., 0-9a-fA-F)
 /\p{Word}/ - A member of one of the following Unicode general
categories: Letter, Mark, Number, Connector_Punctuation
 /\p{ASCII}/ - A character in the ASCII character set
 /\p{Any}/ - Any Unicode character (including unassigned characters)
 /\p{Assigned}/ - An assigned character
A Unicode character’s General Category value can also be matched with \p{Ab} where Ab is the category’s
abbreviation as described below:
 /\p{L}/ - ‘Letter’
 /\p{Ll}/ - ‘Letter: Lowercase’
 /\p{Lm}/ - ‘Letter: Modifier’
 /\p{Lo}/ - ‘Letter: Other’
 /\p{Lt}/ - ‘Letter: Titlecase’
 /\p{Lu}/ - ‘Letter: Uppercase’
 /\p{M}/ - ‘Mark’
 /\p{Mn}/ - ‘Mark: Nonspacing’
 /\p{Mc}/ - ‘Mark: Spacing Combining’
 /\p{Me}/ - ‘Mark: Enclosing’
 /\p{N}/ - ‘Number’
 /\p{Nd}/ - ‘Number: Decimal Digit’
 /\p{Nl}/ - ‘Number: Letter’
 /\p{No}/ - ‘Number: Other’
 /\p{P}/ - ‘Punctuation’
 /\p{Pc}/ - ‘Punctuation: Connector’
 /\p{Pd}/ - ‘Punctuation: Dash’
 /\p{Ps}/ - ‘Punctuation: Open’
 /\p{Pe}/ - ‘Punctuation: Close’
 /\p{Pi}/ - ‘Punctuation: Initial Quote’
 /\p{Pf}/ - ‘Punctuation: Final Quote’
 /\p{Po}/ - ‘Punctuation: Other’
 /\p{S}/ - ‘Symbol’
 /\p{Sm}/ - ‘Symbol: Math’
 /\p{Sc}/ - ‘Symbol: Currency’
 /\p{Sk}/ - ‘Symbol: Modifier’
 /\p{So}/ - ‘Symbol: Other’
 /\p{Z}/ - ‘Separator’
 /\p{Zs}/ - ‘Separator: Space’
 /\p{Zl}/ - ‘Separator: Line’
 /\p{Zp}/ - ‘Separator: Paragraph’
 /\p{C}/ - ‘Other’
 /\p{Cc}/ - ‘Other: Control’
 /\p{Cf}/ - ‘Other: Format’
 /\p{Cn}/ - ‘Other: Not Assigned’
 /\p{Co}/ - ‘Other: Private Use’
 /\p{Cs}/ - ‘Other: Surrogate’
Lastly, \p{} matches a character’s Unicode script. The following scripts are supported: Arabic, Armenian,
Balinese, Bengali, Bopomofo, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Cham, Cherokee,
Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian, Glagolitic, Gothic,
Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana, Inherited, Kannada, Katakana,
Kayah_Li, Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lycian, Lydian, Malayalam, Mongolian,
Myanmar, New_Tai_Lue, Nko, Ogham, Ol_Chiki, Old_Italic, Old_Persian, Oriya, Osmanya, Phags_Pa,
Phoenician, Rejang, Runic, Saurashtra, Shavian, Sinhala, Sundanese, Syloti_Nagri, Syriac, Tagalog,
Tagbanwa, Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Vai, and Yi.
Unicode codepoint U+06E9 is named “ARABIC PLACE OF SAJDAH” and belongs to the Arabic script:

/\p{Arabic}/.match("\u06E9") #=> #<MatchData "\u06E9">

All character properties can be inverted by prefixing their name with a caret (^).
Letter ‘A’ is not in the Unicode Ll (Letter; Lowercase) category, so this match succeeds:

/\p{^Ll}/.match("A") #=> #<MatchData "A">
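Script properties are convenient for splitting mixed-script text. The sketch below uses the “Go to 東京都” string from earlier in this section, written with \u escapes so that the snippet itself stays ASCII:

```ruby
text = "Go to \u6771\u4EAC\u90FD"

han_runs   = text.scan(/\p{Han}+/)     # runs of Han characters
latin_runs = text.scan(/\p{Latin}+/)   # runs of Latin letters

# True only if the text consists solely of Latin letters and whitespace.
latin_only = text.match?(/\A[\p{Latin}\p{Space}]+\z/)
```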

Anchors
Anchors are metacharacters that match the zero-width positions between characters, anchoring the match to
a specific position.
 ^ - Matches beginning of line
 $ - Matches end of line
 \A - Matches beginning of string.
 \Z - Matches end of string. If string ends with a newline, it matches just before newline
 \z - Matches end of string
 \G - Matches first matching position:
In methods like String#gsub and String#scan, it changes on each iteration. It initially matches the
beginning of subject, and in each following iteration it matches where the last match finished.

"    a b c".gsub(/ /, '_') #=> "____a_b_c"
"    a b c".gsub(/\G /, '_') #=> "____a b c"

In methods like Regexp#match and String#match that take an (optional) offset, it matches where
the search begins.

"hello, world".match(/,/, 3) #=> #<MatchData ",">


"hello, world".match(/\G,/, 3) #=> nil

 \b - Matches word boundaries when outside brackets; backspace (0x08) when inside brackets
 \B - Matches non-word boundaries
 (?=pat) - Positive lookahead assertion: ensures that the following characters match pat, but
doesn’t include those characters in the matched text
 (?!pat) - Negative lookahead assertion: ensures that the following characters do not match pat, but
doesn’t include those characters in the matched text
 (?<=pat) - Positive lookbehind assertion: ensures that the preceding characters match pat, but
doesn’t include those characters in the matched text
 (?<!pat) - Negative lookbehind assertion: ensures that the preceding characters do not match pat,
but doesn’t include those characters in the matched text
 \K - Match reset: the matched content preceding \K in the regexp is excluded from the result. For
example, the following two regexps are almost equivalent:

/ab\Kc/ =~ "abc" #=> 0
/(?<=ab)c/ =~ "abc" #=> 2

Both match the same string and set $& to "c", but the reported match position differs.
The same holds for the following two regexps:

/(a)\K(b)\Kc/
/(?<=(?<=(a))(b))c/

If a pattern isn’t anchored it can begin at any point in the string:

/real/.match("surrealist") #=> #<MatchData "real">

Anchoring the pattern to the beginning of the string forces the match to start there. ‘real’ doesn’t occur at the
beginning of the string, so now the match fails:
/\Areal/.match("surrealist") #=> nil

The match below fails because although ‘Demand’ contains ‘and’, the pattern does not occur at a word
boundary.

/\band/.match("Demand") #=> nil

Whereas in the following example ‘and’ has been anchored to a non-word boundary, so instead of matching
the first ‘and’ it matches from the fourth letter of ‘demand’:

/\Band.+/.match("Supply and demand curve") #=> #<MatchData "and curve">

The pattern below uses positive lookahead and positive lookbehind to match text appearing in tags without
including the tags in the match:

/(?<=<b>)\w+(?=<\/b>)/.match("Fortune favours the <b>bold</b>")


#=> #<MatchData "bold">
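Because lookaround assertions are zero-width, they can also mark positions to insert text at. The following sketch is an illustrative use, not taken from the original text: it adds thousands separators by matching every position that has a digit before it and a multiple of three digits after it:

```ruby
digits = "1234567"

# (?<=\d)        a digit precedes this position
# (?=(\d{3})+\z) groups of exactly three digits follow, up to the end
grouped = digits.gsub(/(?<=\d)(?=(\d{3})+\z)/, ",")
```

Nothing is consumed by the pattern, so the original digits are left in place and only the commas are inserted.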

Options
The end delimiter for a regexp can be followed by one or more single-letter options which control how the
pattern can match.
 /pat/i - Ignore case
 /pat/m - Treat a newline as a character matched by .
 /pat/x - Ignore whitespace and comments in the pattern
 /pat/o - Perform #{} interpolation only once
i, m, and x can also be applied on the subexpression level with the (?on-off) construct, which enables
options on, and disables options off for the expression enclosed by the parentheses:

/a(?i:b)c/.match('aBc') #=> #<MatchData "aBc">


/a(?-i:b)c/i.match('ABC') #=> nil

Additionally, these options can also be toggled for the remainder of the pattern:

/a(?i)bc/.match('abC') #=> #<MatchData "abC">

Options may also be used with Regexp.new:

Regexp.new("abc", Regexp::IGNORECASE) #=> /abc/i


Regexp.new("abc", Regexp::MULTILINE) #=> /abc/m
Regexp.new("abc # Comment", Regexp::EXTENDED) #=> /abc # Comment/x
Regexp.new("abc", Regexp::IGNORECASE | Regexp::MULTILINE) #=> /abc/mi
Regexp.new("abc", "i") #=> /abc/i
Regexp.new("abc", "m") #=> /abc/m
Regexp.new("abc # Comment", "x") #=> /abc # Comment/x
Regexp.new("abc", "im") #=> /abc/mi
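The options of an existing regexp can be read back with Regexp#options, which returns the same bit flags that Regexp.new accepts. A brief sketch with an arbitrary pattern:

```ruby
re = Regexp.new("a.c", Regexp::IGNORECASE | Regexp::MULTILINE)

# Test individual flags with a bitwise AND.
ignore_case = (re.options & Regexp::IGNORECASE) != 0   # true
extended    = (re.options & Regexp::EXTENDED) != 0     # false

# With both options set, "." also matches the newline and case is ignored.
hit = re.match?("A\nC")
```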

Free-Spacing Mode and Comments


As mentioned above, the x option enables free-spacing mode. Literal white space inside the pattern is
ignored, and the octothorpe (#) character introduces a comment until the end of the line. This allows the
components of the pattern to be organized in a potentially more readable fashion.
A contrived pattern to match a number with optional decimal places:

float_pat = /\A
[[:digit:]]+ # 1 or more digits before the decimal point
(\. # Decimal point
[[:digit:]]+ # 1 or more digits after the decimal point
)? # The decimal point and following digits are optional
\Z/x
float_pat.match('3.14') #=> #<MatchData "3.14" 1:".14">

There are a number of strategies for matching whitespace:


 Use a pattern such as \s or \p{Space}.
 Use escaped whitespace such as \ , i.e. a space preceded by a backslash.
 Use a character class such as [ ].
Comments can be included in a non-x pattern with the (?#comment) construct, where comment is arbitrary
text ignored by the regexp engine.
Comments in regexp literals cannot include unescaped terminator characters.
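As an illustration of the whitespace strategies above, the following sketch matches two words separated by exactly one space while in free-spacing mode. The pattern and group names are invented for the example:

```ruby
name_pat = /\A
  (?<first> [[:alpha:]]+ )   # given name; the spaces around it are ignored
  [ ]                        # one literal space, written as a character class
  (?<last>  [[:alpha:]]+ )   # family name
\z/x

m = name_pat.match("Ada Lovelace")
parts = [m[:first], m[:last]]

# Without the separating space the pattern does not match.
no_space = name_pat.match?("AdaLovelace")
```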
Encoding
Regular expressions are assumed to use the source encoding. This can be overridden with one of the
following modifiers.
 /pat/u - UTF-8
 /pat/e - EUC-JP
 /pat/s - Windows-31J
 /pat/n - ASCII-8BIT
A regexp can be matched against a string when they either share an encoding, or the regexp’s encoding
is US-ASCII and the string’s encoding is ASCII-compatible.
If a match between incompatible encodings is attempted an Encoding::CompatibilityError exception is raised.
The Regexp#fixed_encoding? predicate indicates whether the regexp has a fixed encoding, that is one
incompatible with ASCII. A regexp’s encoding can be explicitly fixed by
supplying Regexp::FIXEDENCODING as the second argument of Regexp.new:
r = Regexp.new("a".force_encoding("iso-8859-1"), Regexp::FIXEDENCODING)
r =~ "a\u3042"
# raises Encoding::CompatibilityError: incompatible encoding regexp match
# (ISO-8859-1 regexp with UTF-8 string)
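The same behaviour can be observed with the /u modifier. In this sketch an ASCII-only literal has no fixed encoding, while the /u version is fixed to UTF-8 and refuses an EUC-JP string; the byte sequence is just sample data:

```ruby
ascii_re = /a/    # no fixed encoding
utf8_re  = /a/u   # fixed to UTF-8

flags = [ascii_re.fixed_encoding?, utf8_re.fixed_encoding?]

euc_text = "\xA1\xA2".dup.force_encoding("EUC-JP")

# Matching a fixed UTF-8 regexp against an EUC-JP string raises.
outcome =
  begin
    utf8_re =~ euc_text
  rescue Encoding::CompatibilityError => e
    e.class
  end
```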

Regexp Global Variables


Pattern matching sets some global variables:
 $~ is equivalent to Regexp.last_match;
 $& contains the complete matched text;
 $` contains string before match;
 $' contains string after match;
 $1, $2 and so on contain text matching first, second, etc capture group;
 $+ contains last capture group.
Example:

m = /s(\w{2}).*(c)/.match('haystack') #=> #<MatchData "stac" 1:"ta" 2:"c">


$~ #=> #<MatchData "stac" 1:"ta" 2:"c">
Regexp.last_match #=> #<MatchData "stac" 1:"ta" 2:"c">

$& #=> "stac"


# same as m[0]
$` #=> "hay"
# same as m.pre_match
$' #=> "k"
# same as m.post_match
$1 #=> "ta"
# same as m[1]
$2 #=> "c"
# same as m[2]
$3 #=> nil
# no third group in pattern
$+ #=> "c"
# same as m[-1]

These global variables are thread-local and method-local variables.
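The method-local behaviour can be seen in a small sketch: a match performed inside a helper method does not disturb the match variables of its caller. The helper name find_digits is made up for the example:

```ruby
def find_digits(str)
  str =~ /\d+/
  $&               # the match made inside this method
end

"abc" =~ /b/
inner = find_digits("x42")   # match data local to find_digits
outer = $&                   # still the caller's own match
```

Here inner is "42" while outer remains "b".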

Performance
Certain pathological combinations of constructs can lead to abysmally bad performance.
Consider a string of 25 ‘a’s, a ‘d’, 4 ‘a’s, and a ‘c’:
s = 'a' * 25 + 'd' + 'a' * 4 + 'c'
#=> "aaaaaaaaaaaaaaaaaaaaaaaaadaaaac"

The following patterns match instantly as you would expect:

/(b|a)/ =~ s #=> 0
/(b|a+)/ =~ s #=> 0
/(b|a+)*/ =~ s #=> 0

However, the following pattern takes appreciably longer:

/(b|a+)*c/ =~ s #=> 26

This happens because an atom in the regexp is quantified by both an immediate + and an enclosing * with
nothing to differentiate which is in control of any particular character. The nondeterminism that results
produces super-linear performance. (Consult Mastering Regular Expressions (3rd ed.), p. 222, by Jeffrey
Friedl, for an in-depth analysis.) This particular case can be fixed by use of atomic grouping, which prevents
the unnecessary backtracking:

(start = Time.now) && /(b|a+)*c/ =~ s && (Time.now - start)


#=> 24.702736882
(start = Time.now) && /(?>b|a+)*c/ =~ s && (Time.now - start)
#=> 0.000166571

A similar case is typified by the following example, which takes approximately 60 seconds to execute for
me:
Match a string of 29 ‘a’s against a pattern of 29 optional ‘a’s followed by 29 mandatory ‘a’s:

Regexp.new('a?' * 29 + 'a' * 29) =~ 'a' * 29

The 29 optional ‘a’s match the string, but this prevents the 29 mandatory ‘a’s that follow from matching. Ruby
must then backtrack repeatedly so as to satisfy as many of the optional matches as it can while still
matching the mandatory 29. It is plain to us that none of the optional matches can succeed, but this fact
unfortunately eludes Ruby.
The best way to improve performance is to significantly reduce the amount of backtracking needed. For this
case, instead of individually matching 29 optional ‘a’s, a range of optional ‘a’s can be matched all at once
with a{0,29}:

Regexp.new('a{0,29}' + 'a' * 29) =~ 'a' * 29

Timeout
There are two APIs for setting a timeout. One is Regexp.timeout=, which is a process-global timeout
setting for Regexp matching.
Regexp.timeout = 3
s = 'a' * 25 + 'd' + 'a' * 4 + 'c'
/(b|a+)*c/ =~ s #=> This raises an exception in three seconds

The other is the timeout keyword of Regexp.new.

re = Regexp.new("(b|a+)*c", timeout: 3)
s = 'a' * 25 + 'd' + 'a' * 4 + 'c'
re =~ s #=> This raises an exception in three seconds

When using Regexps to process untrusted input, you should use the timeout feature to avoid excessive
backtracking. Otherwise, a malicious user can supply input that triggers excessive backtracking and mount a
denial-of-service attack.
Note that no timeout is set by default, because an appropriate limit depends heavily on the application’s
requirements and context.

Ruby Security
The Ruby programming language is large and complex and there are many security pitfalls often
encountered by newcomers and experienced Rubyists alike.
This document aims to discuss many of these pitfalls and provide more secure alternatives where
applicable.
Please check the full list of publicly known CVEs and how to correctly report a security vulnerability
at www.ruby-lang.org/en/security/. A Japanese version is available at www.ruby-lang.org/ja/security/.
Security vulnerabilities should be reported via an email to [email protected] (the PGP public key),
which is a private mailing list. Reported problems will be published after fixes.
Marshal.load
Ruby’s Marshal module provides methods for serializing and deserializing Ruby object trees to and from a
binary data format.
Never use Marshal.load to deserialize untrusted or user supplied data. Because Marshal can deserialize to
almost any Ruby object and has full control over instance variables, it is possible to craft a malicious
payload that executes code shortly after deserialization.
If you need to deserialize untrusted data, you should use JSON as it is only capable of returning ‘primitive’
types such as strings, arrays, hashes, numbers and nil. If you need to deserialize other classes, you should
handle this manually. Never deserialize to a user specified class.
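A minimal sketch of the JSON approach follows, assuming a made-up payload. JSON.parse returns only hashes, arrays, strings, numbers, booleans and nil, and any further conversion is done explicitly by the application:

```ruby
require "json"

# Hypothetical untrusted payload received from a client.
payload = '{"name": "widget", "tags": ["a", "b"], "count": 3}'

data = JSON.parse(payload)

# Convert and validate individual fields by hand instead of letting the
# input choose which class to instantiate.
count = Integer(data.fetch("count"))
tags  = Array(data["tags"]).map(&:to_s)
```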
YAML
YAML is a popular human readable data serialization format used by many Ruby programs for configuration
and database persistence of Ruby object trees.
Similar to Marshal, it is able to deserialize into arbitrary Ruby classes. For example, the
following YAML data will create an ERB object when deserialized:

!ruby/object:ERB
src: puts `uname`

Because of this, many of the security considerations applying to Marshal are also applicable to YAML. Do
not use YAML to deserialize untrusted data.
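Where YAML input cannot be avoided, the YAML library’s safe_load method restricts deserialization to basic types. The sketch below is not part of the original text and uses made-up documents: a plain mapping loads normally, while the ERB document shown above is rejected with Psych::DisallowedClass:

```ruby
require "yaml"

# Plain data: scalars, sequences and mappings are allowed.
config = YAML.safe_load("name: demo\nports: [80, 443]")

# A document that asks for an arbitrary Ruby object is refused.
blocked =
  begin
    YAML.safe_load("--- !ruby/object:ERB\nsrc: puts `uname`\n")
    false
  rescue Psych::DisallowedClass
    true
  end
```

This narrows the risk but does not make untrusted YAML a good idea; treating it like any other untrusted input remains the safer policy.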

Symbols
Symbols are often seen as syntax sugar for simple strings, but they play a much more crucial role. The MRI
Ruby implementation uses Symbols internally for method, variable and constant names. The reason for this
is that symbols are simply integers with names attached to them, so they are faster to look up in hashtables.
Starting in version 2.2, most symbols can be garbage collected; these are called mortal symbols. Most
symbols you create (e.g. by calling to_sym) are mortal.
Immortal symbols on the other hand will never be garbage collected. They are created when modifying
code:
 defining a method (e.g. with define_method),
 setting an instance variable (e.g. with instance_variable_set),
 creating a variable or constant (e.g. with const_set)
C extensions that have not been updated and are still calling SYM2ID will create immortal symbols. Bugs
in 2.2.0: send and __send__ also created immortal symbols, and calling methods with keyword arguments
could also create some.
Don’t create immortal symbols from user inputs. Otherwise, this would allow a user to mount a denial of
service attack against your application by flooding it with unique strings, which will cause memory to grow
indefinitely until the Ruby process is killed or causes the system to slow to a halt.
While it might not be a good idea to call these with user inputs, methods that used to be vulnerable such
as to_sym, respond_to?, method, instance_variable_get, const_get, etc. are no longer a threat.

Regular expressions
Ruby’s regular expression syntax has some minor differences when compared to other languages. In Ruby,
the ^ and $ anchors do not refer to the beginning and end of the string, rather the beginning and end of
a line.
This means that if you’re using a regular expression like /^[a-z]+$/ to restrict a string to only letters, an
attacker can bypass this check by passing a string containing a letter, then a newline, then any string of
their choosing.
If you want to match the beginning and end of the entire string in Ruby, use the anchors \A and \z.
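The bypass is easy to reproduce. In the following sketch, a hypothetical input holds a valid first line followed by attacker-chosen text; the line anchors accept it, while the string anchors reject it:

```ruby
# Hypothetical attacker input: letters, a newline, then arbitrary text.
input = "abc\n<script>"

loose  = /^[a-z]+$/.match?(input)     # true: "abc" fills the first line
strict = /\A[a-z]+\z/.match?(input)   # false: the whole string must be letters
```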
eval
Never pass untrusted or user controlled input to eval.
Unless you are implementing a REPL like irb or pry, eval is almost certainly not what you want. Do not
attempt to filter user input before passing it to eval - this approach is fraught with danger and will most likely
open your application up to a serious remote code execution vulnerability.
send
‘Global functions’ in Ruby (puts, exit, etc.) are actually private instance methods on Object. This means it is
possible to invoke these methods with send, even if the call to send has an explicit receiver.
For example, the following code snippet writes “Hello world” to the terminal:

1.send(:puts, "Hello world")

You should never call send with user supplied input as the first parameter. Doing so can introduce a denial
of service vulnerability:

foo.send(params[:bar]) # params[:bar] is "exit!"

If an attacker can control the first two arguments to send, remote code execution is possible:

# params is { :a => "eval", :b => "...ruby code to be executed..." }


foo.send(params[:a], params[:b])

When dispatching a method call based on user input, carefully verify the method name. If possible,
check it against a whitelist of safe method names.
Note that the use of public_send is also dangerous, as send itself is public:

1.public_send("send", "eval", "...ruby code to be executed...")
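A whitelist check of the kind recommended above might look like the following minimal sketch. The constant ALLOWED_ACTIONS and the method dispatch are hypothetical names chosen for this illustration, and the list of permitted methods is an assumption that would differ per application:

```ruby
# Hypothetical whitelist; only these method names may be dispatched.
ALLOWED_ACTIONS = %w[upcase downcase capitalize].freeze

# Hypothetical helper: dispatches only after the name passes the whitelist.
def dispatch(receiver, method_name)
  name = method_name.to_s
  unless ALLOWED_ACTIONS.include?(name)
    raise ArgumentError, "method not permitted: #{name.inspect}"
  end
  # The name is now known to be one of the whitelisted public methods.
  receiver.public_send(name)
end

dispatch("hello", "upcase") # => "HELLO"

begin
  dispatch("hello", "exit!")
rescue ArgumentError => e
  e.message # => "method not permitted: \"exit!\""
end
```

Comparing against a fixed list of strings means that neither send nor eval can ever be reached through user input, whatever the receiver is.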

DRb
As DRb allows remote clients to invoke arbitrary methods, it is not suitable to expose to untrusted clients.
When using DRb, try to avoid exposing it over the network if possible. If this isn’t possible and you need to
expose DRb to the world, you must configure an appropriate security policy with DRb::ACL.

Caveats for implementing Signal.trap callbacks


As with implementing signal handlers in C or most other languages, all code passed to Signal.trap must be
reentrant. If you are not familiar with reentrancy, you need to read up on it at Wikipedia or elsewhere before
reading the rest of this document.
Most importantly, “thread-safety” does not guarantee reentrancy; and methods such as Mutex#lock and
Mutex#synchronize which are commonly used for thread-safety even prevent reentrancy.

An implementation detail of the Ruby VM


The Ruby VM defers Signal.trap callbacks from running until it is safe for its internal data structures, but it
does not know when it is safe for data structures in YOUR code. Ruby implements deferred signal handling
by registering short C functions with only async-signal-safe functions as signal handlers. These short C
functions only do enough to tell the VM to run callbacks registered via Signal.trap later in the main
Ruby Thread.
Unsafe methods to call in Signal.trap blocks
When in doubt, consider anything not listed as safe below as being unsafe.
 Mutex#lock, Mutex#synchronize and any code using them are explicitly unsafe. This
includes Monitor in the standard library which uses Mutex to provide reentrancy.
 Dir.chdir with block
 any IO write operations when IO#sync is false; including IO#write, IO#write_nonblock, IO#puts.
Pipes and sockets default to ‘IO#sync = true’, so it is safe to write to them unless IO#sync was
disabled.
 File#flock, as the underlying flock(2) call is not specified by POSIX
Commonly safe operations inside Signal.trap blocks
 Assignment and retrieval of local, instance, and class variables
 Most object allocations and initializations of common types including Array, Hash, String, Struct, Time.
 Common Array, Hash, String, Struct operations which do not execute a block are generally safe;
but beware if iteration is occurring elsewhere.
 Hash#[], Hash#[]= (unless Hash.new was given an unsafe block)
 Thread::Queue#push and Thread::SizedQueue#push (since Ruby 2.1)
 Creating a new Thread via Thread.new/Thread.start can be used to get around the unusability of
Mutexes inside a signal handler
 Signal.trap is safe to use inside blocks passed to Signal.trap
 arithmetic on Integer and Float (‘+’, ‘-’, ‘%’, ‘*’, ‘/’)
Additionally, signal handlers do not run between two successive local variable accesses, so
shortcuts such as ‘+=’ and ‘-=’ will not trigger a data race when used on Integer and Float classes
in signal handlers.
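One way to combine the safe operations listed above is to let the handler do nothing but push onto a Thread::Queue and leave the real work, including any Mutex use, to an ordinary thread. The following is a minimal sketch, assuming a POSIX platform where the USR1 signal is available; the method name handle_usr1_via_queue is hypothetical:

```ruby
# Hypothetical helper: sends USR1 to the current process and handles it
# outside the signal handler. Assumes a platform that supports USR1.
def handle_usr1_via_queue
  events = Thread::Queue.new
  log = []
  log_lock = Mutex.new

  # The worker runs in normal thread context, so using a Mutex is fine here.
  worker = Thread.new do
    signal_name = events.pop
    log_lock.synchronize { log << "handled #{signal_name}" }
  end

  # The handler itself only calls Thread::Queue#push.
  previous_handler = Signal.trap("USR1") { events.push("USR1") }
  begin
    Process.kill("USR1", Process.pid)
    worker.join(5) # wait up to 5 seconds for the worker to finish
  ensure
    Signal.trap("USR1", previous_handler) # restore the earlier handler
  end
  log
end

handle_usr1_via_queue # => ["handled USR1"]
```

The queue decouples the moment the signal arrives from the moment it is processed, so the handler stays short and reentrant.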
System call wrapper methods which are safe inside Signal.trap
Since Ruby has wrappers around many async-signal-safe C functions the corresponding wrappers for
many IO, File, Dir, and Socket methods are safe.
(Incomplete list)
 Dir.chdir (without block arg)
 Dir.mkdir
 Dir.open
 File#truncate
 File.link
 File.open
 File.readlink
 File.rename
 File.stat
 File.symlink
 File.truncate
 File.unlink
 File.utime
 IO#close
 IO#dup
 IO#fsync
 IO#read
 IO#read_nonblock
 IO#stat
 IO#sysread
 IO#syswrite
 IO.select
 IO.pipe
 Process.clock_gettime
 Process.exit!
 Process.fork
 Process.kill
 Process.pid
 Process.ppid
 Process.waitpid

Ruby Standard Library


The Ruby Standard Library is a vast collection of classes and modules that you can require in your code for
additional features.
Below is an overview of libraries and extensions followed by a brief description.

Libraries
MakeMakefile
Module used to generate a Makefile for C extensions
RbConfig
Information about your Ruby configuration and build
Gem
Package management framework for Ruby

Extensions
Coverage
Provides coverage measurement for Ruby
Monitor
Provides an object or module to use safely by more than one thread
objspace
Extends ObjectSpace module to add methods for internal statistics
PTY
Creates and manages pseudo terminals
Ripper
Provides an interface for parsing Ruby programs into S-expressions
Socket
Access underlying OS socket implementations
Default gems

Libraries
Abbrev
Calculates a set of unique abbreviations for a given set of strings
Base64
Support for encoding and decoding binary data using a Base64 representation
Benchmark
Provides methods to measure and report the time used to execute code
Bundler
Manage your Ruby application’s gem dependencies
CGI
Support for the Common Gateway Interface protocol
CSV
Provides an interface to read and write CSV files and data
Delegator
Provides three abilities to delegate method calls to an object
DidYouMean
“Did you mean?” experience in Ruby
DRb
Distributed object system for Ruby
English
Provides references to special global variables with less cryptic names
ERB
An easy to use but powerful templating system for Ruby
ErrorHighlight
Highlight error location in your code
FileUtils
Several file utility methods for copying, moving, removing, etc
Find
This module supports top-down traversal of a set of file paths
Forwardable
Provides delegation of specified methods to a designated object
GetoptLong
Parse command line options similar to the GNU C getopt_long()
IPAddr
Provides methods to manipulate IPv4 and IPv6 IP addresses
IRB
Interactive Ruby command-line tool for REPL (Read Eval Print Loop)
OptionParser
Ruby-oriented class for command-line option analysis
Logger
Provides a simple logging utility for outputting messages
Mutex_m
Mixin to extend objects to be handled like a Mutex
Net::HTTP
HTTP client api for Ruby
Observable
Provides a mechanism for publish/subscribe pattern in Ruby
Open3
Provides access to stdin, stdout and stderr when running other programs
OpenStruct
Class to build custom data structures, similar to a Hash
OpenURI
An easy-to-use wrapper for Net::HTTP, Net::HTTPS and Net::FTP
PP
Provides a PrettyPrinter for Ruby objects
PrettyPrinter
Implements a pretty printing algorithm for readable structure
PStore
Implements a file based persistence mechanism based on a Hash
Readline
Wrapper for Readline extension and Reline
Reline
Pure Ruby implementation of GNU Readline and Editline.
Resolv
Thread-aware DNS resolver library in Ruby
resolv-replace.rb
Replace Socket DNS with Resolv
RDoc
Produces HTML and command-line documentation for Ruby
Rinda
The Linda distributed computing paradigm in Ruby
SecureRandom
Interface for secure random number generator
Set
Provides a class to deal with collections of unordered, unique values
Shellwords
Manipulates strings with word parsing rules of UNIX Bourne shell
Singleton
Implementation of the Singleton pattern for Ruby
Tempfile
A utility class for managing temporary files
Time
Extends the Time class with methods for parsing and conversion
Timeout
Auto-terminate potentially long-running operations in Ruby
tmpdir.rb
Extends the Dir class to manage the OS temporary file path
TSort
Topological sorting using Tarjan’s algorithm
un.rb
Utilities to replace common UNIX commands
URI
A Ruby module providing support for Uniform Resource Identifiers
YAML
Ruby client library for the Psych YAML implementation
WeakRef
Allows a referenced object to be garbage-collected

Extensions
BigDecimal
Provides arbitrary-precision floating point decimal arithmetic
Date
A subclass of Object that includes the Comparable module, for handling dates
DateTime
Subclass of Date for handling dates, hours, minutes, seconds, and offsets
Digest
Provides a framework for message digest libraries
Etc
Provides access to information typically stored in UNIX /etc directory
Fcntl
Loads constants defined in the OS fcntl.h C header file
Fiddle
A libffi wrapper for Ruby
IO
Extensions for Ruby IO class, including wait, nonblock and ::console
JSON
Implements Javascript Object Notation for Ruby
NKF
Ruby extension for Network Kanji Filter
OpenSSL
Provides SSL, TLS and general purpose cryptography for Ruby
Pathname
Representation of the name of a file or directory on the filesystem
Psych
A YAML parser and emitter for Ruby
Racc
A LALR(1) parser generator written in Ruby.
Readline
Provides an interface for GNU Readline and Edit Line (libedit)
StringIO
Pseudo I/O on String objects
StringScanner
Provides lexical scanning operations on a String
Syslog
Ruby interface for the POSIX system logging facility
WIN32OLE
Provides an interface for OLE Automation in Ruby
Zlib
Ruby interface for the zlib compression/decompression library

Bundled gems

Libraries
MiniTest
A test suite with TDD, BDD, mocking and benchmarking
PowerAssert
Power Assert for Ruby.
Rake
Ruby build program with capabilities similar to make
Test::Unit
A compatibility layer for MiniTest
REXML
An XML toolkit for Ruby
RSS
Family of libraries that support various formats of XML “feeds”
Net::FTP
Support for the File Transfer Protocol
Net::IMAP
Ruby client api for Internet Message Access Protocol
Net::POP3
Ruby client library for POP3
Net::SMTP
Simple Mail Transfer Protocol client library for Ruby
Matrix
Represents a mathematical matrix.
Prime
Prime numbers and factorization library
RBS
RBS is a language to describe the structure of Ruby programs
TypeProf
A type analysis tool for Ruby code based on abstract interpretation
DEBUGGER__
Debugging functionality for Ruby

Formats for Dates and Times


Several Ruby time-related classes have instance method strftime, which returns a formatted string
representing all or part of a date or time:
 Date#strftime.
 DateTime#strftime.
 Time#strftime.
Each of these methods takes optional argument format, which has zero or more
embedded format specifications (see below).
Each of these methods returns the string resulting from replacing each format specification embedded
in format with a string form of one or more parts of the date or time.
A simple example:

Time.now.strftime('%H:%M:%S') # => "14:02:07"

A format specification has the form:

%[flags][width]conversion
It consists of:
 A leading percent character.
 Zero or more flags (each is a character).
 An optional width specifier (an integer).
 A conversion specifier (a character).
Except for the leading percent character, the only required part is the conversion specifier, so we begin with
that.

Conversion Specifiers

Date (Year, Month, Day)


 %Y - Year including century, zero-padded:

 Time.now.strftime('%Y') # => "2022"


 Time.new(-1000).strftime('%Y') # => "-1000" # Before common era.
 Time.new(10000).strftime('%Y') # => "10000" # Far future.
 Time.new(10).strftime('%Y') # => "0010" # Zero-padded by default.

 %y - Year without century, in range (0..99), zero-padded:

 Time.now.strftime('%y') # => "22"


 Time.new(1).strftime('%y') # => "01" # Zero-padded by default.

 %C - Century, zero-padded:

 Time.now.strftime('%C') # => "20"


 Time.new(-1000).strftime('%C') # => "-10" # Before common era.
 Time.new(10000).strftime('%C') # => "100" # Far future.
 Time.new(100).strftime('%C') # => "01" # Zero-padded by default.

 %m - Month of the year, in range (1..12), zero-padded:

 Time.new(2022, 1).strftime('%m') # => "01" # Zero-padded by default.


 Time.new(2022, 12).strftime('%m') # => "12"

 %B - Full month name, capitalized:

 Time.new(2022, 1).strftime('%B') # => "January"


 Time.new(2022, 12).strftime('%B') # => "December"

 %b - Abbreviated month name, capitalized:

 Time.new(2022, 1).strftime('%b') # => "Jan"


 Time.new(2022, 12).strftime('%h') # => "Dec"

 %h - Same as %b.
 %d - Day of the month, in range (1..31), zero-padded:

 Time.new(2002, 1, 1).strftime('%d') # => "01"


 Time.new(2002, 1, 31).strftime('%d') # => "31"

 %e - Day of the month, in range (1..31), blank-padded:

 Time.new(2002, 1, 1).strftime('%e') # => " 1"


 Time.new(2002, 1, 31).strftime('%e') # => "31"

 %j - Day of the year, in range (1..366), zero-padded:

 Time.new(2002, 1, 1).strftime('%j') # => "001"


 Time.new(2002, 12, 31).strftime('%j') # => "365"

Time (Hour, Minute, Second, Subsecond)


 %H - Hour of the day, in range (0..23), zero-padded:

 Time.new(2022, 1, 1, 1).strftime('%H') # => "01"


 Time.new(2022, 1, 1, 13).strftime('%H') # => "13"

 %k - Hour of the day, in range (0..23), blank-padded:

 Time.new(2022, 1, 1, 1).strftime('%k') # => " 1"


 Time.new(2022, 1, 1, 13).strftime('%k') # => "13"

 %I - Hour of the day, in range (1..12), zero-padded:

 Time.new(2022, 1, 1, 1).strftime('%I') # => "01"


 Time.new(2022, 1, 1, 13).strftime('%I') # => "01"

 %l - Hour of the day, in range (1..12), blank-padded:

 Time.new(2022, 1, 1, 1).strftime('%l') # => " 1"


 Time.new(2022, 1, 1, 13).strftime('%l') # => " 1"

 %P - Meridian indicator, lowercase:

 Time.new(2022, 1, 1, 1).strftime('%P') # => "am"


 Time.new(2022, 1, 1, 13).strftime('%P') # => "pm"

 %p - Meridian indicator, uppercase:

 Time.new(2022, 1, 1, 1).strftime('%p') # => "AM"


 Time.new(2022, 1, 1, 13).strftime('%p') # => "PM"

 %M - Minute of the hour, in range (0..59), zero-padded:


 Time.new(2022, 1, 1, 1, 0, 0).strftime('%M') # => "00"

 %S - Second of the minute in range (0..59), zero-padded:

 Time.new(2022, 1, 1, 1, 0, 0, 0).strftime('%S') # => "00"

 %L - Millisecond of the second, in range (0..999), zero-padded:

 Time.new(2022, 1, 1, 1, 0, 0, 0).strftime('%L') # => "000"

 %N - Fractional seconds, default width is 9 digits (nanoseconds):

 t = Time.now # => 2022-06-29 07:10:20.3230914 -0500


 t.strftime('%N') # => "323091400" # Default.

Use width specifiers to adjust units:

t.strftime('%3N') # => "323" # Milliseconds.


t.strftime('%6N') # => "323091" # Microseconds.
t.strftime('%9N') # => "323091400" # Nanoseconds.
t.strftime('%12N') # => "323091400000" # Picoseconds.
t.strftime('%15N') # => "323091400000000" # Femtoseconds.
t.strftime('%18N') # => "323091400000000000" # Attoseconds.
t.strftime('%21N') # => "323091400000000000000" # Zeptoseconds.
t.strftime('%24N') # => "323091400000000000000000" # Yoctoseconds.

 %s - Number of seconds since the epoch:

 Time.now.strftime('%s') # => "1656505136"

Timezone
 %z - Timezone as hour and minute offset from UTC:

 Time.now.strftime('%z') # => "-0500"

 %Z - Timezone name (platform-dependent):

 Time.now.strftime('%Z') # => "Central Daylight Time"

Weekday
 %A - Full weekday name:

 Time.now.strftime('%A') # => "Wednesday"


 %a - Abbreviated weekday name:

 Time.now.strftime('%a') # => "Wed"

 %u - Day of the week, in range (1..7), Monday is 1:

 t = Time.new(2022, 6, 26) # => 2022-06-26 00:00:00 -0500


 t.strftime('%a') # => "Sun"
 t.strftime('%u') # => "7"

 %w - Day of the week, in range (0..6), Sunday is 0:

 t = Time.new(2022, 6, 26) # => 2022-06-26 00:00:00 -0500


 t.strftime('%a') # => "Sun"
 t.strftime('%w') # => "0"

Week Number
 %U - Week number of the year, in range (0..53), zero-padded, where each week begins on a
Sunday:

 t = Time.new(2022, 6, 26) # => 2022-06-26 00:00:00 -0500


 t.strftime('%a') # => "Sun"
 t.strftime('%U') # => "26"

 %W - Week number of the year, in range (0..53), zero-padded, where each week begins on a
Monday:

 t = Time.new(2022, 6, 26) # => 2022-06-26 00:00:00 -0500


 t.strftime('%a') # => "Sun"
 t.strftime('%W') # => "25"

Week Dates
See ISO 8601 week dates.

t0 = Time.new(2023, 1, 1) # => 2023-01-01 00:00:00 -0600


t1 = Time.new(2024, 1, 1) # => 2024-01-01 00:00:00 -0600

 %G - Week-based year:

 t0.strftime('%G') # => "2022"


 t1.strftime('%G') # => "2024"

 %g - Week-based year without century, in range (0..99), zero-padded:


 t0.strftime('%g') # => "22"
 t1.strftime('%g') # => "24"

 %V - Week number of the week-based year, in range (1..53), zero-padded:

 t0.strftime('%V') # => "52"


 t1.strftime('%V') # => "01"

Literals
 %n - Newline character “\n”:

 Time.now.strftime('%n') # => "\n"

 %t - Tab character “\t”:

 Time.now.strftime('%t') # => "\t"

 %% - Percent character ‘%’:

 Time.now.strftime('%%') # => "%"

Shorthand Conversion Specifiers


Each shorthand specifier here is shown with its corresponding longhand specifier.
 %c - Date and time:

 Time.now.strftime('%c') # => "Wed Jun 29 08:01:41 2022"


 Time.now.strftime('%a %b %e %T %Y') # => "Wed Jun 29 08:02:07 2022"

 %D - Date:

 Time.now.strftime('%D') # => "06/29/22"


 Time.now.strftime('%m/%d/%y') # => "06/29/22"

 %F - ISO 8601 date:

 Time.now.strftime('%F') # => "2022-06-29"


 Time.now.strftime('%Y-%m-%d') # => "2022-06-29"

 %v - VMS date:

 Time.now.strftime('%v') # => "29-JUN-2022"


 Time.now.strftime('%e-%^b-%4Y') # => "29-JUN-2022"

 %x - Same as %D.
 %X - Same as %T.
 %r - 12-hour time:

 Time.new(2022, 1, 1, 1).strftime('%r') # => "01:00:00 AM"


 Time.new(2022, 1, 1, 1).strftime('%I:%M:%S %p') # => "01:00:00 AM"
 Time.new(2022, 1, 1, 13).strftime('%r') # => "01:00:00 PM"
 Time.new(2022, 1, 1, 13).strftime('%I:%M:%S %p') # => "01:00:00 PM"

 %R - 24-hour time:

 Time.new(2022, 1, 1, 1).strftime('%R') # => "01:00"


 Time.new(2022, 1, 1, 1).strftime('%H:%M') # => "01:00"
 Time.new(2022, 1, 1, 13).strftime('%R') # => "13:00"
 Time.new(2022, 1, 1, 13).strftime('%H:%M') # => "13:00"

 %T - 24-hour time:

 Time.new(2022, 1, 1, 1).strftime('%T') # => "01:00:00"


 Time.new(2022, 1, 1, 1).strftime('%H:%M:%S') # => "01:00:00"
 Time.new(2022, 1, 1, 13).strftime('%T') # => "13:00:00"
 Time.new(2022, 1, 1, 13).strftime('%H:%M:%S') # => "13:00:00"

 %+ (not supported in Time#strftime) - Date and time:

 DateTime.now.strftime('%+')
 # => "Wed Jun 29 08:31:53 -05:00 2022"
 DateTime.now.strftime('%a %b %e %H:%M:%S %Z %Y')
 # => "Wed Jun 29 08:32:18 -05:00 2022"

Flags
Flags may affect certain formatting specifications.
Multiple flags may be given with a single conversion specifier; order does not matter.

Padding Flags
 0 - Pad with zeroes:

 Time.new(10).strftime('%0Y') # => "0010"

 _ - Pad with blanks:

 Time.new(10).strftime('%_Y') # => " 10"

 - - Don’t pad:

 Time.new(10).strftime('%-Y') # => "10"


Casing Flags
 ^ - Upcase result:

 Time.new(2022, 1).strftime('%B') # => "January" # No casing flag.


 Time.new(2022, 1).strftime('%^B') # => "JANUARY"

 # - Swapcase result:

 Time.now.strftime('%p') # => "AM"


 Time.now.strftime('%^p') # => "AM"
 Time.now.strftime('%#p') # => "am"

Timezone Flags
 : - Put timezone as colon-separated hours and minutes:

 Time.now.strftime('%:z') # => "-05:00"

 :: - Put timezone as colon-separated hours, minutes, and seconds:

 Time.now.strftime('%::z') # => "-05:00:00"

Width Specifiers
The integer width specifier gives a minimum width for the returned string:

Time.new(2002).strftime('%Y') # => "2002" # No width specifier.


Time.new(2002).strftime('%10Y') # => "0000002002"
Time.new(2002, 12).strftime('%B') # => "December" # No width specifier.
Time.new(2002, 12).strftime('%10B') # => " December"
Time.new(2002, 12).strftime('%3B') # => "December" # Ignored if too small.
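Flags, width specifiers, and conversion specifiers can be combined in a single format string. The following sketch uses a fixed, arbitrarily chosen local time so that the results are reproducible:

```ruby
# An arbitrary fixed time: Wednesday, 5 January 2022, 09:03:07 local time.
t = Time.new(2022, 1, 5, 9, 3, 7)

# The - flag removes padding from month, day, and hour.
t.strftime('%-m/%-d/%Y')  # => "1/5/2022"
t.strftime('%-I:%M %P')   # => "9:03 am"

# The ^ flag upcases names; %e stays blank-padded.
t.strftime('%^a %e %^b')  # => "WED  5 JAN"

# A flag and a width together: upcased and blank-padded to 10 characters.
t.strftime('%^10B')       # => "   JANUARY"
```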

Specialized Format Strings


Here are a few specialized format strings, each based on an external standard.

HTTP Format
The HTTP date format is based on RFC 2616, and treats dates in the format '%a, %d %b %Y %T GMT':

d = Date.new(2001, 2, 3) # => #<Date: 2001-02-03>


# Return HTTP-formatted string.
httpdate = d.httpdate # => "Sat, 03 Feb 2001 00:00:00 GMT"
# Return new date parsed from HTTP-formatted string.
Date.httpdate(httpdate) # => #<Date: 2001-02-03>
# Return hash parsed from HTTP-formatted string.
Date._httpdate(httpdate)
# =>
{:wday=>6, :mday=>3, :mon=>2, :year=>2001, :hour=>0, :min=>0, :sec=>0, :zone=>"GMT", :offset=>0}

RFC 3339 Format


The RFC 3339 date format is based on RFC 3339:

d = Date.new(2001, 2, 3) # => #<Date: 2001-02-03>


# Return 3339-formatted string.
rfc3339 = d.rfc3339 # => "2001-02-03T00:00:00+00:00"
# Return new date parsed from 3339-formatted string.
Date.rfc3339(rfc3339) # => #<Date: 2001-02-03>
# Return hash parsed from 3339-formatted string.
Date._rfc3339(rfc3339)
# => {:year=>2001, :mon=>2, :mday=>3, :hour=>0, :min=>0, :sec=>0, :zone=>"+00:00", :offset=>0}

RFC 2822 Format


The RFC 2822 date format is based on RFC 2822, and treats dates in the format '%a, %-d %b %Y %T %z':

d = Date.new(2001, 2, 3) # => #<Date: 2001-02-03>


# Return 2822-formatted string.
rfc2822 = d.rfc2822 # => "Sat, 3 Feb 2001 00:00:00 +0000"
# Return new date parsed from 2822-formatted string.
Date.rfc2822(rfc2822) # => #<Date: 2001-02-03>
# Return hash parsed from 2822-formatted string.
Date._rfc2822(rfc2822)
# =>
{:wday=>6, :mday=>3, :mon=>2, :year=>2001, :hour=>0, :min=>0, :sec=>0, :zone=>"+0000", :offset=>0}

JIS X 0301 Format


The JIS X 0301 format includes the Japanese era name, and treats dates in the format '%Y-%m-%d' with the
first letter of the romanized era name prefixed:

d = Date.new(2001, 2, 3) # => #<Date: 2001-02-03>


# Return 0301-formatted string.
jisx0301 = d.jisx0301 # => "H13.02.03"
# Return new date parsed from 0301-formatted string.
Date.jisx0301(jisx0301) # => #<Date: 2001-02-03>
# Return hash parsed from 0301-formatted string.
Date._jisx0301(jisx0301) # => {:year=>2001, :mon=>2, :mday=>3}

ISO 8601 Format Specifications


This section shows format specifications that are compatible with ISO 8601. Details for various formats may
be seen at the links.
Examples in this section assume:

t = Time.now # => 2022-06-29 16:49:25.465246 -0500

Dates
See ISO 8601 dates.
 Years:
o Basic year (YYYY):

o t.strftime('%Y') # => "2022"

o Expanded year (±YYYYY):

o t.strftime('+%5Y') # => "+02022"


o t.strftime('-%5Y') # => "-02022"

 Calendar dates:
o Basic date (YYYYMMDD):

o t.strftime('%Y%m%d') # => "20220629"

o Extended date (YYYY-MM-DD):

o t.strftime('%Y-%m-%d') # => "2022-06-29"

o Reduced extended date (YYYY-MM):

o t.strftime('%Y-%m') # => "2022-06"

 Week dates:
o Basic date (YYYYWww or YYYYWwwD):

o t.strftime('%GW%V') # => "2022W26"
o t.strftime('%GW%V%u') # => "2022W263"

o Extended date (YYYY-Www or YYYY-Www-D):

o t.strftime('%G-W%V') # => "2022-W26"
o t.strftime('%G-W%V-%u') # => "2022-W26-3"

 Ordinal dates:
o Basic date (YYYYDDD):

o t.strftime('%Y%j') # => "2022180"

o Extended date (YYYY-DDD):

o t.strftime('%Y-%j') # => "2022-180"

Times
See ISO 8601 times.
 Times:

o Basic time (Thhmmss.sss, Thhmmss, Thhmm, or Thh):

o t.strftime('T%H%M%S.%L') # => "T164925.465"


o t.strftime('T%H%M%S') # => "T164925"
o t.strftime('T%H%M') # => "T1649"
o t.strftime('T%H') # => "T16"

o Extended time (Thh:mm:ss.sss, Thh:mm:ss, or Thh:mm):

o t.strftime('T%H:%M:%S.%L') # => "T16:49:25.465"


o t.strftime('T%H:%M:%S') # => "T16:49:25"
o t.strftime('T%H:%M') # => "T16:49"

 Time zone designators:


o Timezone (time represents a valid time, hh represents a valid 2-digit hour,
and mm represents a valid 2-digit minute):
 Basic timezone (time±hhmm, time±hh, or timeZ):

 t.strftime('T%H%M%S%z') # => "T164925-0500"


 t.strftime('T%H%M%S%z').slice(0..-3) # => "T164925-05"
 t.strftime('T%H%M%SZ') # => "T164925Z"

 Extended timezone (time±hh:mm):

 t.strftime('T%H:%M:%S%:z') # => "T16:49:25-05:00"

o See also:
 Local time (unqualified).
 Coordinated Universal Time (UTC).
 Time offsets from UTC.

Combined Date and Time


See ISO 8601 Combined date and time representations.
An ISO 8601 combined date and time representation may be any ISO 8601 date and any ISO 8601 time,
separated by the letter T.
For the relevant strftime formats, see Dates and Times above.
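As an illustration, a combined representation can be built by joining a date format and a time format with the letter T. This sketch assumes a fixed time with an explicit UTC offset (rather than Time.now) so the output is reproducible, and compares the result with Time#iso8601 from the time library:

```ruby
require 'time' # provides Time#iso8601 on Ruby versions where it is not built in

# A fixed time with an explicit -05:00 offset, chosen for this example.
t = Time.new(2022, 6, 29, 16, 49, 25, "-05:00")

# Basic format: no separators inside the date or the time.
t.strftime('%Y%m%dT%H%M%S%z')      # => "20220629T164925-0500"

# Extended format: hyphens in the date, colons in the time and offset.
t.strftime('%Y-%m-%dT%H:%M:%S%:z') # => "2022-06-29T16:49:25-05:00"

# Time#iso8601 returns the same extended representation.
t.iso8601                          # => "2022-06-29T16:49:25-05:00"
```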

Ruby Syntax
The Ruby syntax is large and is split up into the following sections:
Literals
Numbers, Strings, Arrays, Hashes, etc.
Assignment
Assignment and variables
Control Expressions
if, unless, while, until, for, break, next, redo
Pattern matching
Experimental structural pattern matching and variable binding syntax
Methods
Method and method argument syntax
Calling Methods
How to call a method (or send a message to a method)
Modules and Classes
Creating modules and classes including inheritance
Exceptions
Exception handling syntax
Precedence
Precedence of ruby operators
Refinements
Use and behavior of the refinements feature
Miscellaneous
alias, undef, BEGIN, END
Comments
Line and block code comments

Assignment
In Ruby, assignment uses the = (equals sign) character. This example assigns the number five to the local
variable v:
v=5

Assignment creates a local variable if the variable was not previously referenced.
An assignment expression result is always the assigned value, including assignment methods.

Local Variable Names


A local variable name must start with a lowercase US-ASCII letter or a character with the eighth bit set.
Typically local variables are US-ASCII compatible since the keys to type them exist on all keyboards.
(Ruby programs must be written in a US-ASCII-compatible character set. In such character sets, if the eighth
bit is set it indicates an extended character. Ruby allows local variables to contain such characters.)
A local variable name may contain letters, numbers, an _ (underscore or low line) or a character with the
eighth bit set.

Local Variable Scope


Once a local variable name has been assigned-to all uses of the name for the rest of the scope are
considered local variables.
Here is an example:

1.times do
a=1
puts "local variables in the block: #{local_variables.join ", "}"
end

puts "no local variables outside the block" if local_variables.empty?

This prints:

local variables in the block: a

no local variables outside the block

Since the block creates a new scope, any local variables created inside it do not leak to the surrounding
scope.
Variables defined in an outer scope appear in the inner scope:

a=0

1.times do
puts "local variables: #{local_variables.join ", "}"
end

This prints:

local variables: a

You may isolate variables in a block from the outer scope by listing them following a ; in the block’s
arguments. See the documentation for block local variables in the calling methods documentation for an
example.
See also Kernel#local_variables, but note that a for loop does not create a new scope like a block does.
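Both points can be seen in one short sketch: a block-local variable declared after a ; leaves the outer variable untouched, while variables assigned in a for loop remain visible afterwards. The method name scope_demo is hypothetical:

```ruby
# Hypothetical demonstration method; returns the values visible at the end.
def scope_demo
  total = 0

  # "; total" declares a block-local variable that shadows the outer total.
  [1, 2].each do |n; total|
    total = n * 10
  end

  # A for loop does not create a new scope, so i and last_seen
  # are still defined after the loop ends.
  for i in 1..3
    last_seen = i
  end

  [total, i, last_seen]
end

scope_demo # => [0, 3, 3]
```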

Local Variables and Methods


In Ruby local variable names and method names are nearly identical. If you have not assigned to one of
these ambiguous names ruby will assume you wish to call a method. Once you have assigned to the name
ruby will assume you wish to reference a local variable.
The local variable is created when the parser encounters the assignment, not when the assignment occurs:

a = 0 if false # does not assign to a

p local_variables # prints [:a]

p a # prints nil

The similarity between method and local variable names can lead to confusing code, for example:

def big_calculation
42 # pretend this takes a long time
end

big_calculation = big_calculation()

Now any reference to big_calculation is considered a local variable and will be cached. To call the method,
use self.big_calculation.
You can force a method call by using empty argument parentheses as shown above or by using an explicit
receiver like self. Using an explicit receiver may raise a NameError if the method’s visibility is not public or
the receiver is the literal self.
Another commonly confusing case is when using a modifier if:

p a if a = 0.zero?

Rather than printing “true” you receive a NameError, “undefined local variable or method ‘a’”. Since ruby
parses the bare a left of the if first and has not yet seen an assignment to a it assumes you wish to call a
method. Ruby then sees the assignment to a and will assume you are referencing a local variable.
The confusion comes from the out-of-order execution of the expression. First the local variable is assigned-
to then you attempt to call a nonexistent method.
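A minimal sketch of the problem and one way around it follows. The method names are hypothetical; the first method rescues the NameError only so that the result can be inspected:

```ruby
# Modifier form: the bare "value" on the left is parsed as a method call
# before the parser has seen the assignment on the right.
def modifier_form
  value if value = 0.zero?
rescue NameError => error
  error.class
end

# Statement form: the assignment is parsed first, so "value" in the body
# is already known to be a local variable.
def statement_form
  if (value = 0.zero?)
    value
  end
end

modifier_form  # => NameError
statement_form # => true
```

Writing the condition before the body, as in the statement form, keeps the order in which the parser reads the code the same as the order in which it runs.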

Local Variables and eval


Using eval to evaluate Ruby code will allow access to local variables defined in the same scope, even if the
local variables are not defined until after the call to eval. However, local variables defined inside the call
to eval will not be reflected in the surrounding scope. Inside the call to eval, local variables defined in the
surrounding scope and local variables defined inside the call to eval will be accessible. However, you will
not be able to access local variables defined in previous or subsequent calls to eval in the same scope.
Consider each eval call a separate nested scope. Example:

def m
eval "bar = 1"
lvs = eval "baz = 2; ary = [local_variables, foo, baz]; x = 2; ary"
eval "quux = 3"
foo = 1
lvs << local_variables
end

m
# => [[:baz, :ary, :x, :lvs, :foo], nil, 2, [:lvs, :foo]]

Instance Variables
Instance variables are shared across all methods for the same object.
An instance variable must start with a @ (“at” sign or commercial at). Otherwise instance variable names
follow the same rules as local variable names. Since the instance variable starts with an @ the second character
may be an upper-case letter.
Here is an example of instance variable usage:

class C
def initialize(value)
@instance_variable = value
end

def value
@instance_variable
end
end

object1 = C.new "some value"


object2 = C.new "other value"
p object1.value # prints "some value"
p object2.value # prints "other value"

An uninitialized instance variable has a value of nil. If you run Ruby with warnings enabled, you will get a
warning when accessing an uninitialized instance variable.
The value method has access to the value set by the initialize method, but only for the same object.
Class Variables
Class variables are shared between a class, its subclasses and its instances.
A class variable must start with a @@ (two “at” signs). The rest of the name follows the same rules as
instance variables.
Here is an example:

class A
@@class_variable = 0

def value
@@class_variable
end

def update
@@class_variable = @@class_variable + 1
end
end

class B < A
def update
@@class_variable = @@class_variable + 2
end
end

a = A.new
b = B.new

puts "A value: #{a.value}"


puts "B value: #{b.value}"

This prints:

A value: 0
B value: 0

Continuing with the same example, we can update using objects from either class and the value is shared:
puts "update A"
a.update

puts "A value: #{a.value}"


puts "B value: #{b.value}"

puts "update B"


b.update

puts "A value: #{a.value}"


puts "B value: #{b.value}"

puts "update A"


a.update

puts "A value: #{a.value}"


puts "B value: #{b.value}"

This prints:

update A
A value: 1
B value: 1
update B
A value: 3
B value: 3
update A
A value: 4
B value: 4

Accessing an uninitialized class variable will raise a NameError exception.


Note that classes have instance variables because classes are objects, so try not to confuse class and
instance variables.
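The distinction can be illustrated with a small sketch. The class names Parent and Child and the variable names are hypothetical; @registry is an instance variable of the class object Parent, while @@shared is a class variable:

```ruby
class Parent
  @registry = [:parent_only] # instance variable of the Parent class object
  @@shared = 0               # class variable, shared with subclasses

  class << self
    attr_reader :registry    # reads @registry on whichever class receives the call
  end

  def self.shared
    @@shared
  end

  def self.shared=(value)
    @@shared = value
  end
end

class Child < Parent
end

Child.shared = 5
Parent.shared   # => 5               (the class variable is shared)
Parent.registry # => [:parent_only]
Child.registry  # => nil             (Child has its own, unset, @registry)
```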

Global Variables
Global variables are accessible everywhere.
Global variables start with a $ (dollar sign). The rest of the name follows the same rules as instance
variables.
Here is an example:

$global = 0
class C
puts "in a class: #{$global}"

def my_method
puts "in a method: #{$global}"

$global = $global + 1
$other_global = 3
end
end

C.new.my_method

puts "at top-level, $global: #{$global}, $other_global: #{$other_global}"

This prints:

in a class: 0

in a method: 0

at top-level, $global: 1, $other_global: 3

An uninitialized global variable has a value of nil.


Ruby has some special globals that behave differently depending on context such as the regular expression
match variables or that have a side-effect when assigned to. See the global variables documentation for
details.

Assignment Methods
You can define methods that will behave like assignment, for example:

class C
def value=(value)
@value = value
end
end

c = C.new
c.value = 42

Using assignment methods allows your programs to look nicer. When assigning to an instance variable
most people use Module#attr_accessor:
class C
attr_accessor :value
end

When using method assignment you must always have a receiver. If you do not have a receiver, Ruby
assumes you are assigning to a local variable:

class C
attr_accessor :value

def my_method
value = 42

puts "local_variables: #{local_variables.join ", "}"


puts "@value: #{@value.inspect}"
end
end

C.new.my_method

This prints:

local_variables: value

@value: nil

To use the assignment method you must set the receiver:

class C
attr_accessor :value

def my_method
self.value = 42

puts "local_variables: #{local_variables.join ", "}"


puts "@value: #{@value.inspect}"
end
end

C.new.my_method

This prints:
local_variables:

@value: 42

Note that the value returned by an assignment method is always ignored, since the result of an assignment
expression is always the assigned value.
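A minimal sketch of this rule, using a hypothetical Box class whose assignment method deliberately returns something else. Calling the method through send bypasses the assignment syntax and exposes the real return value:

```ruby
class Box
  def value=(value)
    @value = value
    :ignored_by_assignment_syntax
  end
end

box = Box.new
p(box.value = 42)        # prints 42
p box.send(:value=, 42)  # prints :ignored_by_assignment_syntax
```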

Abbreviated Assignment
You can mix several of the operators and assignment. To add 2 to an object you can write:

a = 1

a += 2

p a # prints 3

This is equivalent to:

a = 1

a = a + 2

p a # prints 3

You can use the following operators this way: +, -, *, /, %, **, &, |, ^, <<, >>
There are also ||= and &&=. The former makes an assignment if the value was nil or false while the latter
makes an assignment if the value was not nil or false.
Here is an example:

a ||= 0
a &&= 1

p a # prints 1

Note that these two operators behave more like a || a = 0 than a = a || 0.
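The difference is visible when the left-hand side is an assignment method. In this sketch the hypothetical Settings class counts how often its writer is called:

```ruby
class Settings
  attr_reader :name, :writes

  def initialize
    @name = "configured"
    @writes = 0
  end

  def name=(value)
    @writes += 1
    @name = value
  end
end

settings = Settings.new

settings.name ||= "default"                 # name is already set, writer is not called
p settings.writes # prints 0

settings.name = settings.name || "default"  # writer is always called
p settings.writes # prints 1
```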


Implicit Array Assignment
You can implicitly create an array by listing multiple values when assigning:

a = 1, 2, 3

p a # prints [1, 2, 3]

This implicitly creates an Array.


You can use *, the “splat” operator, to unpack an Array when assigning. This is similar to multiple
assignment:

a = *[1, 2, 3]

p a # prints [1, 2, 3]

You can splat anywhere in the right-hand side of the assignment:

a = 1, *[2, 3]

p a # prints [1, 2, 3]

Multiple Assignment
You can assign multiple values on the right-hand side to multiple variables:

a, b = 1, 2

p a: a, b: b # prints {:a=>1, :b=>2}

In the following sections any place “variable” is used an assignment method, instance, class or global will
also work:

def value=(value)
p assigned: value
end

self.value, $global = 1, 2 # prints {:assigned=>1}

p $global # prints 2

You can use multiple assignment to swap two values in-place:

old_value = 1

new_value, old_value = old_value, 2

p new_value: new_value, old_value: old_value


# prints {:new_value=>1, :old_value=>2}

If you have more values on the right hand side of the assignment than variables on the left hand side, the
extra values are ignored:

a, b = 1, 2, 3
p a: a, b: b # prints {:a=>1, :b=>2}

You can use * to gather extra values on the right-hand side of the assignment.

a, *b = 1, 2, 3

p a: a, b: b # prints {:a=>1, :b=>[2, 3]}

The * can appear anywhere on the left-hand side:

*a, b = 1, 2, 3

p a: a, b: b # prints {:a=>[1, 2], :b=>3}

But you may only use one * in an assignment.
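A few further cases, sketched with hypothetical method names: a * in the middle of the left-hand side, fewer values than variables, and a * with nothing left to gather:

```ruby
def middle_splat
  first, *middle, last = 1, 2, 3, 4
  [first, middle, last]
end

def too_few_values
  a, b, c = 1, 2
  [a, b, c]
end

def empty_splat
  a, *rest = 1
  [a, rest]
end

p middle_splat   # prints [1, [2, 3], 4]
p too_few_values # prints [1, 2, nil]
p empty_splat    # prints [1, []]
```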


Array Decomposition
Like Array decomposition in method arguments you can decompose an Array during assignment using
parentheses:

(a, b) = [1, 2]

p a: a, b: b # prints {:a=>1, :b=>2}

You can decompose an Array as part of a larger multiple assignment:

a, (b, c) = 1, [2, 3]

p a: a, b: b, c: c # prints {:a=>1, :b=>2, :c=>3}

Since each decomposition is considered its own multiple assignment you can use * to gather arguments in
the decomposition:

a, (b, *c), *d = 1, [2, 3, 4], 5, 6

p a: a, b: b, c: c, d: d
# prints {:a=>1, :b=>2, :c=>[3, 4], :d=>[5, 6]}

Calling Methods
Calling a method sends a message to an object so it can perform some work.
In ruby you send a message to an object like this:

my_method()
Note that the parentheses are optional:

my_method

Except when there is a difference between using and omitting parentheses, this document uses parentheses
when arguments are present to avoid confusion.
This section only covers calling methods. See also the syntax documentation on defining methods.

Receiver
self is the default receiver. If you don’t specify any receiver self will be used. To specify a receiver use .:

my_object.my_method

This sends the my_method message to my_object. Any object can be a receiver but depending on the
method’s visibility sending a message may raise a NoMethodError.
You may also use :: to designate a receiver, but this is rarely used due to the potential for confusion
with :: for namespaces.

Chaining Method Calls


You can “chain” method calls by immediately following one method call with another.
This example chains methods Array#append and Array#compact:

a = [:foo, 'bar', 2]
a1 = [:baz, nil, :bam, nil]
a2 = a.append(*a1).compact
a2 # => [:foo, "bar", 2, :baz, :bam]

Details:
• First method Array#append appends (separately) each element of a1 to a itself (it modifies a in place
rather than copying it), and returns a:

  [:foo, "bar", 2, :baz, nil, :bam, nil]

• Chained method Array#compact creates a copy of that return value, removes its nil-valued entries, and
returns the copy:

  [:foo, "bar", 2, :baz, :bam]
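When chaining, it helps to know which methods return a new object and which return the receiver. In this sketch (chain_demo is a hypothetical name) Array#append returns the receiver itself, while Array#compact returns a new Array:

```ruby
def chain_demo
  a = [:foo, 'bar', 2]
  a1 = [:baz, nil, :bam, nil]

  appended = a.append(*a1)      # same object as a, now with 7 elements
  compacted = appended.compact  # a new Array without the nil entries

  {
    append_returns_receiver: appended.equal?(a),
    compact_returns_copy: !compacted.equal?(a),
    original_still_has_nil: a.include?(nil),
    compacted: compacted
  }
end

p chain_demo
```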

You can chain methods that are in different classes. This example chains
methods Hash#to_a and Array#reverse:

h = {foo: 0, bar: 1, baz: 2}


h.to_a.reverse # => [[:baz, 2], [:bar, 1], [:foo, 0]]

Details:
• First method Hash#to_a converts h to an Array, and returns

  [[:foo, 0], [:bar, 1], [:baz, 2]]

• Chained method Array#reverse creates a copy of that return value, reverses it, and returns

  [[:baz, 2], [:bar, 1], [:foo, 0]]

Safe Navigation Operator


&., called the “safe navigation operator”, skips the method call when the receiver is nil. It returns nil and
doesn’t evaluate the method’s arguments if the call is skipped.

REGEX = /(ruby) is (\w+)/i


"Ruby is awesome!".match(REGEX).values_at(1, 2)
# => ["Ruby", "awesome"]
"Python is fascinating!".match(REGEX).values_at(1, 2)
# NoMethodError: undefined method `values_at' for nil:NilClass
"Python is fascinating!".match(REGEX)&.values_at(1, 2)
# => nil

This makes it easy to chain methods that could return nil. Note that &. skips only the next call, so
for a longer chain it is necessary to add the operator at each level:

"Python is fascinating!".match(REGEX)&.values_at(1, 2).join(' - ')


# NoMethodError: undefined method `join' for nil:NilClass
"Python is fascinating!".match(REGEX)&.values_at(1, 2)&.join(' - ')
# => nil
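That the arguments are not evaluated when the call is skipped can be checked with a sketch like this one. The helper tracked_argument is hypothetical and only records that it ran:

```ruby
def tracked_argument(log)
  log << :argument_evaluated
  1
end

def safe_navigation_demo(receiver)
  log = []
  result = receiver&.values_at(tracked_argument(log))
  [result, log]
end

p safe_navigation_demo(nil)       # prints [nil, []]
p safe_navigation_demo([:a, :b])  # prints [[:b], [:argument_evaluated]]
```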

Arguments
There are three types of arguments when sending a message, the positional arguments, keyword (or
named) arguments and the block argument. Each message sent may use one, two or all types of
arguments, but the arguments must be supplied in this order.
All arguments in ruby are passed by reference and are not lazily evaluated.
Each argument is separated by a ,:

my_method(1, '2', :three)

Arguments may be an expression, a hash argument:

'key' => value

or a keyword argument:
key: value

Hash and keyword arguments must be contiguous and must appear after all positional arguments, but may
be mixed:

my_method('a' => 1, b: 2, 'c' => 3)

Positional Arguments
The positional arguments for the message follow the method name:

my_method(argument1, argument2)

In many cases, parentheses are not necessary when sending a message:

my_method argument1, argument2

However, parentheses are necessary to avoid ambiguity. This will raise a SyntaxError because ruby does
not know which method argument3 should be sent to:

method_one argument1, method_two argument2, argument3

If the method definition has a *argument extra positional arguments will be assigned to argument in the
method as an Array.
If the method definition doesn’t include keyword arguments, the keyword or hash-type arguments are
assigned as a single hash to the last argument:

def my_method(options)
p options
end

my_method('a' => 1, b: 2) # prints: {'a'=>1, :b=>2}

If too many positional arguments are given, an ArgumentError is raised.
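Both outcomes can be sketched with two hypothetical methods, one with a fixed argument list and one with a * parameter:

```ruby
def strict(a, b)
  [a, b]
end

def flexible(a, *rest)
  [a, rest]
end

p flexible(1, 2, 3) # prints [1, [2, 3]]

begin
  strict(1, 2, 3)
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```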

Default Positional Arguments


When the method defines default arguments you do not need to supply all the arguments to the method.
Ruby will fill in the missing arguments in-order.
First we’ll cover the simple case where the default arguments appear on the right. Consider this method:
def my_method(a, b, c = 3, d = 4)
p [a, b, c, d]
end

Here c and d have default values which ruby will apply for you. If you send only two arguments to this
method:

my_method(1, 2)

You will see ruby print [1, 2, 3, 4].


If you send three arguments:

my_method(1, 2, 5)

You will see ruby print [1, 2, 5, 4]


Ruby fills in the missing arguments from left to right.
Ruby allows default values to appear in the middle of positional arguments. Consider this more complicated
method:

def my_method(a, b = 2, c = 3, d)
p [a, b, c, d]
end

Here b and c have default values. If you send only two arguments to this method:

my_method(1, 4)

You will see ruby print [1, 2, 3, 4].


If you send three arguments:

my_method(1, 5, 6)

You will see ruby print [1, 5, 3, 6].


Describing this in words gets complicated and confusing. I’ll describe it in variables and values instead.
First 1 is assigned to a, then 6 is assigned to d. This leaves only the arguments with default values.
Since 5 has not been assigned to a variable yet, it is given to b and c uses its default value of 3.
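The same rules can be tried out with a variant that returns the values instead of printing them. The name middle_defaults is hypothetical:

```ruby
def middle_defaults(a, b = 2, c = 3, d)
  [a, b, c, d]
end

p middle_defaults(1, 4)       # prints [1, 2, 3, 4]
p middle_defaults(1, 5, 6)    # prints [1, 5, 3, 6]
p middle_defaults(1, 5, 6, 7) # prints [1, 5, 6, 7]

begin
  middle_defaults(1) # a and d are both required
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```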

Keyword Arguments
Keyword arguments follow any positional arguments and are separated by commas like positional
arguments:

my_method(positional1, keyword1: value1, keyword2: value2)


Any keyword arguments not given will use the default value from the method definition. If a keyword
argument is given that the method did not list, and the method definition does not accept arbitrary keyword
arguments, an ArgumentError will be raised.
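A short sketch of both cases, using a hypothetical connect method with two keyword defaults:

```ruby
def connect(host, port: 80, secure: false)
  "#{secure ? 'https' : 'http'}://#{host}:#{port}"
end

p connect("example.test", secure: true) # prints "https://example.test:80"

begin
  connect("example.test", timeout: 5)   # timeout: is not a listed keyword
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```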
A keyword argument’s value can be omitted, meaning the value will be fetched from the context by the
name of the key:

keyword1 = 'some value'


my_method(positional1, keyword1:)
# ...is the same as
my_method(positional1, keyword1: keyword1)

Be aware that when the method parentheses are omitted, too, the parsing order might be unexpected:

my_method positional1, keyword1:

some_other_expression

# ...is actually parsed as


my_method(positional1, keyword1: some_other_expression)

Block Argument
The block argument sends a closure from the calling scope to the method.
The block argument is always last when sending a message to a method. A block is sent to a method
using do ... end or { ... }:

my_method do
# ...
end

or:

my_method {
# ...
}

do end has lower precedence than { } so:

method_1 method_2 {
# ...
}

Sends the block to method_2 while:

method_1 method_2 do
# ...
end

Sends the block to method_1. Note that in the first case if parentheses are used the block is sent
to method_1.
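Which method receives the block can be observed with block_given?. This is a sketch only; the two methods are hypothetical stand-ins that report whether they were given a block:

```ruby
def method_1(argument = nil)
  [:method_1, block_given?, argument]
end

def method_2
  [:method_2, block_given?]
end

def with_braces
  method_1 method_2 { :block }
end

def with_do_end
  method_1 method_2 do
    :block
  end
end

p with_braces # prints [:method_1, false, [:method_2, true]]
p with_do_end # prints [:method_1, true, [:method_2, false]]
```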
A block will accept arguments from the method it was sent to. Arguments are defined similar to the way a
method defines arguments. The block’s arguments go in | ... | following the opening do or {:

my_method do |argument1, argument2|


# ...
end

Block Local Arguments


You may also declare block-local arguments to a block using ; in the block arguments list. Assigning to a
block-local argument will not override local variables outside the block in the caller’s scope:

def my_method
yield self
end

place = "world"

my_method do |obj; place|


place = "block"
puts "hello #{obj} this is #{place}"
end

puts "place is: #{place}"

This prints:

hello main this is block


place is: world

So the place variable in the block is not the same place variable as outside the block. Removing ;
place from the block arguments gives this result:

hello main this is block


place is: block

Array to Arguments Conversion


Given the following method:
def my_method(argument1, argument2, argument3)
end

You can turn an Array into an argument list with * (or splat) operator:

arguments = [1, 2, 3]
my_method(*arguments)

or:

arguments = [2, 3]
my_method(1, *arguments)

Both are equivalent to:

my_method(1, 2, 3)

If the method accepts keyword arguments, the splat operator will convert a hash at the end of the array into
keyword arguments:

def my_method(a, b, c: 3)
end

arguments = [1, 2, { c: 4 }]
my_method(*arguments)

Note that this behavior was deprecated in Ruby 2.7, where it emits a warning, and was removed in
Ruby 3.0, where the trailing hash is passed as a positional argument instead.
You may also use the ** (described next) to convert a Hash into keyword arguments.
If the number of objects in the Array does not match the number of arguments for the method,
an ArgumentError will be raised.
If the splat operator comes first in the call, parentheses must be used to avoid a warning:

my_method *arguments # warning


my_method(*arguments) # no warning

Hash to Keyword Arguments Conversion


Given the following method:

def my_method(first: 1, second: 2, third: 3)


end

You can turn a Hash into keyword arguments with the ** (keyword splat) operator:
arguments = { first: 3, second: 4, third: 5 }
my_method(**arguments)

or:

arguments = { first: 3, second: 4 }


my_method(third: 5, **arguments)

Both are equivalent to:

my_method(first: 3, second: 4, third: 5)

If the method definition uses the keyword splat operator to gather arbitrary keyword arguments, they will not
be gathered by *:

def my_method(*a, **kw)


p arguments: a, keywords: kw
end

my_method(1, 2, '3' => 4, five: 6)

Prints:

{:arguments=>[1, 2], :keywords=>{'3'=>4, :five=>6}}

Proc to Block Conversion


Given a method that uses a block:

def my_method
yield self
end

You can convert a proc or lambda to a block argument with the & (block conversion) operator:

argument = proc { |a| puts "#{a.inspect} was yielded" }

my_method(&argument)

If the block conversion operator comes first in the call, parentheses must be used to avoid a warning:

my_method &argument # warning


my_method(&argument) # no warning
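The & operator also works with lambdas and with any object that responds to to_proc, such as a Symbol. A brief sketch, where apply_to_each is a hypothetical method that yields each item:

```ruby
def apply_to_each(items)
  items.map { |item| yield item }
end

doubler = lambda { |n| n * 2 }

p apply_to_each([1, 2, 3], &doubler) # prints [2, 4, 6]
p apply_to_each(%w[a b], &:upcase)   # prints ["A", "B"]
```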

Method Lookup
When you send a message, Ruby looks up the method that matches the name of the message for the
receiver. Methods are stored in classes and modules so method lookup walks these, not the objects
themselves.
Here is the order of method lookup for the receiver’s class or module R:
• The prepended modules of R in reverse order
• R itself
• The included modules of R in reverse order
If R is a class with a superclass, this is repeated with R’s superclass until a method is found.
Once a match is found method lookup stops.
If no match is found this repeats from the beginning, but looking for method_missing. The
default method_missing is BasicObject#method_missing which raises a NameError when invoked.
If refinements (an experimental feature) are active, the method lookup changes. See the refinements
documentation for details.
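The lookup order can be inspected with Module#ancestors and followed at run time with super. The modules and class in this sketch are hypothetical:

```ruby
module Prepended
  def who
    [:prepended] + super
  end
end

module IncludedFirst
  def who
    [:included_first]
  end
end

module IncludedLast
  def who
    [:included_last] + super
  end
end

class Receiver
  prepend Prepended
  include IncludedFirst
  include IncludedLast

  def who
    [:receiver] + super
  end
end

p Receiver.ancestors.take(4)
# prints [Prepended, Receiver, IncludedLast, IncludedFirst]

p Receiver.new.who
# prints [:prepended, :receiver, :included_last, :included_first]
```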

Code Comments
Ruby has two types of comments: inline and block.
Inline comments start with the # character and continue until the end of the line:

# On a separate line
class Foo # or at the end of the line
# can be indented
def bar
end
end

Block comments start with =begin and end with =end. Each should start on a separate line.

=begin
This is
commented out
=end

class Foo
end

=begin some_tag
this works, too
=end

=begin and =end can not be indented, so this is a syntax error:

class Foo
=begin

Will not work

=end

end

Magic Comments
While comments are typically ignored by Ruby, special “magic comments” contain directives that affect how
the code is interpreted.
Top-level magic comments must appear in the first comment section of a file.
NOTE: Magic comments affect only the file in which they appear; other files are unaffected.

# frozen_string_literal: true

var = 'hello'
var.frozen? # => true

Alternative syntax
Magic comments may consist of a single directive (as in the example above). Alternatively, multiple
directives may appear on the same line if separated by “;” and wrapped between “-*-” (see Emacs’ file
variables).

# emacs-compatible; -*- coding: big5; mode: ruby; frozen_string_literal: true -*-

p 'hello'.frozen? # => true


p 'hello'.encoding # => #<Encoding:Big5>

encoding Directive
Indicates which string encoding should be used for string literals, regexp literals and __ENCODING__:

# encoding: big5

''.encoding # => #<Encoding:Big5>

Default encoding is UTF-8.


Top-level magic comments must start on the first line, or on the second line if the first line looks like a #!
shebang line.
The word “coding” may be used instead of “encoding”.
frozen_string_literal Directive
Indicates that string literals should be allocated once at parse time and frozen.

# frozen_string_literal: true

3.times do
p 'hello'.object_id # => prints same number
end
p 'world'.frozen? # => true

The default is false; this can be changed with --enable=frozen-string-literal. Without the directive, or with #
frozen_string_literal: false, the example above would print 3 different numbers and “false”.
Starting in Ruby 3.0, string literals that are dynamic are not frozen nor reused:

# frozen_string_literal: true

p "Addition: #{2 + 2}".frozen? # => false

It must appear in the first comment section of a file.


warn_indent Directive
This directive can turn on detection of bad indentation for statements that follow it:

def foo
end # => no warning

# warn_indent: true
def bar
end # => warning: mismatched indentations at 'end' with 'def' at 6

Another way to get these warnings to show is by running Ruby with warnings (ruby -w). Using a directive to
set this false will prevent these warnings from being shown.
shareable_constant_value Directive
Note: This directive is experimental in Ruby 3.0 and may change in future releases.
This special directive helps to create constants that hold only immutable objects, or Ractor-shareable
constants.
The directive can specify special treatment for values assigned to constants:
• none: (default)
• literal: literals are implicitly frozen, others must be Ractor-shareable
• experimental_everything: all made shareable
• experimental_copy: copy deeply and make it shareable
Mode none (default)
No special treatment in this mode (as in Ruby 2.x): no automatic freezing and no checks.
It has always been a good idea to deep-freeze constants; Ractor makes this an even better idea as only the
main ractor can access non-shareable constants:
# shareable_constant_value: none
A = {foo: []}
A.frozen? # => false
Ractor.new { puts A } # => can not access non-shareable objects by non-main Ractor.

Mode literal
In “literal” mode, constants assigned to literals will be deeply-frozen:

# shareable_constant_value: literal
X = [{foo: []}] # => same as [{foo: [].freeze}.freeze].freeze

Other values must be shareable:

# shareable_constant_value: literal
X = Object.new # => cannot assign unshareable object to X

Note that only literals directly assigned to constants, or recursively held in such literals will be frozen:

# shareable_constant_value: literal
var = [{foo: []}]
var.frozen? # => false (assignment was made to local variable)
X = var # => cannot assign unshareable object to X

X = Set[1, 2, {foo: []}].freeze # => cannot assign unshareable object to X


# (`Set[...]` is not a literal and
# `{foo: []}` is an argument to `Set.[]`)

The method Module#const_set is not affected.


Mode experimental_everything
In this mode, all values assigned to constants are made shareable.

# shareable_constant_value: experimental_everything
FOO = Set[1, 2, {foo: []}]
# same as FOO = Ractor.make_shareable(...)
# OR same as `FOO = Set[1, 2, {foo: [].freeze}.freeze].freeze`

var = [{foo: []}]


var.frozen? # => false (assignment was made to local variable)
X = var # => calls `Ractor.make_shareable(var)`
var.frozen? # => true

This mode is “experimental”, because it might be error prone, for example by deep-freezing the constants of
an external resource which could cause errors:
# shareable_constant_value: experimental_everything
FOO = SomeGem::Something::FOO
# => deep freezes the gem's constant!

This will be revisited before Ruby 3.1 to either allow “everything” or to instead remove this mode.
The method Module#const_set is not affected.
Mode experimental_copy
In this mode, all values assigned to constants are deeply copied and made shareable. It is a safer mode
than experimental_everything.

# shareable_constant_value: experimental_copy
var = [{foo: []}]
var.frozen? # => false (assignment was made to local variable)
X = var # => calls `Ractor.make_shareable(var, copy: true)`
var.frozen? # => false
Ractor.shareable?(X) #=> true
var.object_id == X.object_id #=> false

This mode is “experimental” and has not been discussed thoroughly. This will be revisited before Ruby 3.1
to either allow “copy” or to instead remove this mode.
The method Module#const_set is not affected.

Scope
This directive can be used multiple times in the same file:

# shareable_constant_value: none
A = {foo: []}
A.frozen? # => false
Ractor.new { puts A } # => can not access non-shareable objects by non-main Ractor.

# shareable_constant_value: literal
B = {foo: []}
B.frozen? # => true
B[:foo].frozen? # => true

C = [Object.new] # => cannot assign unshareable object to C (Ractor::IsolationError)

D = [Object.new.freeze]
D.frozen? # => true

# shareable_constant_value: experimental_everything
E = Set[1, 2, Object.new]
E.frozen? # => true
E.all?(&:frozen?) # => true

The directive affects only subsequent constants and only for the current scope:

module Mod
# shareable_constant_value: literal
A = [1, 2, 3]
module Sub
B = [4, 5]
end
end

C = [4, 5]

module Mod
D = [6]
end
p Mod::A.frozen?, Mod::Sub::B.frozen? # => true, true
p C.frozen?, Mod::D.frozen? # => false, false

Control Expressions
Ruby has a variety of ways to control execution. All the expressions described here return a value.
For the tests in these control expressions, nil and false are false-values and true and any other object are
true-values. In this document “true” will mean “true-value” and “false” will mean “false-value”.
if Expression
The simplest if expression has two parts, a “test” expression and a “then” expression. If the “test” expression
evaluates to true then the “then” expression is evaluated.
Here is a simple if statement:

if true then
puts "the test resulted in a true-value"
end

This will print “the test resulted in a true-value”.


The then is optional:

if true
puts "the test resulted in a true-value"
end

This document will omit the optional then for all expressions as that is the most common usage of if.
You may also add an else expression. If the test does not evaluate to true the else expression will be
executed:

if false
puts "the test resulted in a true-value"
else
puts "the test resulted in a false-value"
end

This will print “the test resulted in a false-value”.


You may add an arbitrary number of extra tests to an if expression using elsif. An elsif executes when all
tests above the elsif are false.

a = 1

if a == 0
puts "a is zero"
elsif a == 1
puts "a is one"
else
puts "a is some other value"
end

This will print “a is one”: a is not equal to 0, so the if test fails, and the elsif test a == 1 then matches.
The else is skipped, since it is only executed when there are no matching conditions.
Once a condition matches, either the if condition or any elsif condition, the if expression is complete and no
further tests will be performed.
Like an if, an elsif condition may be followed by a then.
In this example only “a is one” is printed:

a = 1

if a == 0
puts "a is zero"
elsif a == 1
puts "a is one"
elsif a >= 1
puts "a is greater than or equal to one"
else
puts "a is some other value"
end

The tests for if and elsif may have side-effects. The most common use of side-effect is to cache a value into
a local variable:

if a = object.some_value
# do something to a
end

The result value of an if expression is the last value executed in the expression.
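Because if is an expression, its result can be returned or assigned directly. In this sketch (the method names are hypothetical) the second method shows that the result is nil when no branch runs:

```ruby
def classify(number)
  if number < 0
    "negative"
  elsif number == 0
    "zero"
  else
    "positive"
  end
end

def without_else(flag)
  if flag
    "taken"
  end
end

p classify(-5)        # prints "negative"
p classify(0)         # prints "zero"
p without_else(false) # prints nil
```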

Ternary if
You may also write an if-then-else expression using ? and :. This ternary if:

input_type = gets =~ /hello/i ? "greeting" : "other"

Is the same as this if expression:

input_type =
if gets =~ /hello/i
"greeting"
else
"other"
end

While the ternary if is much shorter to write than the more verbose form, for readability it is recommended
that the ternary if is only used for simple conditionals. Also, avoid using multiple ternary conditions in the
same expression as this can be confusing.
unless Expression
The unless expression is the opposite of the if expression. If the value is false, the “then” expression is
executed:

unless true
puts "the value is a false-value"
end

This prints nothing as true is not a false-value.


You may use an optional then with unless just like if.
Note that the above unless expression is the same as:

if not true
puts "the value is a false-value"
end

Like an if expression you may use an else condition with unless:

unless true
puts "the value is false"
else
puts "the value is true"
end

This prints “the value is true” from the else condition.


You may not use elsif with an unless expression.
The result value of an unless expression is the last value executed in the expression.
Modifier if and unless
if and unless can also be used to modify an expression. When used as a modifier the left-hand side is the
“then” statement and the right-hand side is the “test” expression:

a = 0

a += 1 if a.zero?

p a

This will print 1.

a = 0

a += 1 unless a.zero?

p a

This will print 0.


While the modifier and standard versions have both a “test” expression and a “then” statement, they are not
exact transformations of each other due to parse order. Here is an example that shows the difference:

p a if a = 0.zero?

This raises the NameError “undefined local variable or method ‘a’”.


When ruby parses this expression it first encounters a as a method call in the “then” expression, then later it
sees the assignment to a in the “test” expression and marks a as a local variable.
When running this line it first executes the “test” expression, a = 0.zero?.
Since the test is true it executes the “then” expression, p a. Since the a in the body was recorded as a
method which does not exist the NameError is raised.
The same is true for unless.
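The two forms can be compared side by side. This is a sketch with hypothetical method names, and it assumes no method called seen exists:

```ruby
def modifier_form
  seen if seen = 0.zero?  # the first `seen` is parsed as a method call
end

def standard_form
  if seen = 0.zero?       # `seen` is already a local variable in the body
    seen
  end
end

begin
  modifier_form
rescue NameError
  puts "modifier form raised NameError"
end

p standard_form # prints true
```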
case Expression
The case expression can be used in two ways.
The most common way is to compare an object against multiple patterns. The patterns are matched using
the === method, which is aliased to == on Object. Other classes must override it to give meaningful
behavior. See Module#=== and Regexp#=== for examples.
Here is an example of using case to compare a String against a pattern:
case "12345"
when /^1/
puts "the string starts with one"
else
puts "I don't know what the string starts with"
end

Here the string "12345" is compared with /^1/ by calling /^1/ === "12345" which returns true. Like
the if expression, the first when that matches is executed and all other matches are ignored.
If no matches are found, the else is executed.
The else and then are optional. This case expression gives the same result as the one above:

case "12345"
when /^1/
puts "the string starts with one"
end

You may place multiple conditions on the same when:

case "2"
when /^1/, "2"
puts "the string starts with one or is '2'"
end

Ruby will try each condition in turn, so first /^1/ === "2" returns false, then "2" === "2" returns true, so “the
string starts with one or is ‘2’” is printed.
You may use then after the when condition. This is most frequently used to place the body of the when on a
single line.

case a
when 1, 2 then puts "a is one or two"
when 3 then puts "a is three"
else puts "I don't know what a is"
end
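Since matching goes through ===, different kinds of pattern can be mixed in one case. The sketch below uses a Range, a Class, a Regexp and a lambda. The method name describe is hypothetical:

```ruby
def describe(value)
  case value
  when 0..9    then "single digit"   # Range#===
  when Integer then "integer"        # Module#===
  when /\Ah/i  then "starts with h"  # Regexp#===
  when ->(v) { v.respond_to?(:empty?) && v.empty? } then "empty" # Proc#===
  else "unknown"
  end
end

p describe(5)       # prints "single digit"
p describe(42)      # prints "integer"
p describe("Hello") # prints "starts with h"
p describe([])      # prints "empty"
p describe(nil)     # prints "unknown"
```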

The other way to use a case expression is like an if-elsif expression:

a = 2

case
when a == 1, a == 2
puts "a is one or two"
when a == 3
puts "a is three"
else
puts "I don't know what a is"
end

Again, the then and else are optional.


The result value of a case expression is the last value executed in the expression.
Since Ruby 2.7, case expressions also provide a more powerful experimental pattern matching feature via
the in keyword:

case {a: 1, b: 2, c: 3}
in a: Integer => m
"matched: #{m}"
else
"not matched"
end
# => "matched: 1"

The pattern matching syntax is described on its own page.


while Loop
The while loop executes while a condition is true:

a = 0

while a < 10 do
p a
a += 1
end

p a

Prints the numbers 0 through 10. The condition a < 10 is checked before the loop is entered, then the body
executes, then the condition is checked again. When the condition results in false the loop is terminated.
The do keyword is optional. The following loop is equivalent to the loop above:

while a < 10
p a
a += 1
end

The result of a while loop is nil unless break is used to supply a value.
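Both results can be seen in a short sketch. The method names are hypothetical, and each method simply returns the value of its while loop:

```ruby
def first_power_of_two_above(limit)
  n = 1
  while true
    break n if n > limit  # the value given to break becomes the loop's result
    n *= 2
  end
end

def plain_loop_result
  i = 0
  while i < 3
    i += 1
  end
end

p first_power_of_two_above(100) # prints 128
p plain_loop_result             # prints nil
```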
until Loop
The until loop executes while a condition is false:

a = 0

until a > 10 do
p a
a += 1
end

p a

This prints the numbers 0 through 11. Like a while loop the condition a > 10 is checked when entering the
loop and each time the loop body executes. If the condition is false the loop will continue to execute.
Like a while loop, the do is optional.
Like a while loop, the result of an until loop is nil unless break is used.
for Loop
The for loop consists of for followed by a variable to contain the iteration argument followed by in and the
value to iterate over using each. The do is optional:

for value in [1, 2, 3] do


puts value
end

Prints 1, 2 and 3.
Like while and until, the do is optional.
The for loop is similar to using each, but does not create a new variable scope.
The result value of a for loop is the value iterated over unless break is used.
The for loop is rarely used in modern ruby programs.
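The scoping difference between for and each, and the result value of for, can be sketched like this (the method names are hypothetical):

```ruby
def for_scope_demo
  result = for i in [1, 2, 3]
    inside = i * 10
  end
  [i, inside, result]  # i and inside are still visible after the loop
end

def each_scope_demo
  [1, 2, 3].each { |j| inside = j * 10 }
  [defined?(j), defined?(inside)]  # neither variable exists out here
end

p for_scope_demo  # prints [3, 30, [1, 2, 3]]
p each_scope_demo # prints [nil, nil]
```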
Modifier while and until
Like if and unless, while and until can be used as modifiers:

a = 0

a += 1 while a < 10

p a # prints 10

until used as a modifier:

a = 0

a += 1 until a > 10

p a # prints 11

You can use begin and end to create a while loop that runs the body once before the condition:

a = 0

begin
a += 1
end while a < 10

p a # prints 10

If you don’t use rescue or ensure, Ruby optimizes away any exception handling overhead.
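The difference from the plain modifier form shows when the condition is false from the start. In this sketch (hypothetical method names) the plain form never runs the body, while the begin/end form runs it once:

```ruby
def plain_modifier_runs(keep_going)
  runs = 0
  runs += 1 while keep_going
  runs
end

def begin_end_modifier_runs(keep_going)
  runs = 0
  begin
    runs += 1
  end while keep_going
  runs
end

p plain_modifier_runs(false)     # prints 0
p begin_end_modifier_runs(false) # prints 1
```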
break Statement
Use break to leave a block early. This will stop iterating over the items in values if one of them is even:

values.each do |value|
break if value.even?

# ...
end

You can also terminate from a while loop using break:

a = 0

while true do
p a
a += 1

break if a < 10
end

p a

This prints the numbers 0 and 1.


break accepts a value that supplies the result of the expression it is “breaking” out of:

result = [1, 2, 3].each do |value|


break value * 2 if value.even?
end

p result # prints 4

next Statement
Use next to skip the rest of the current iteration:

result = [1, 2, 3].map do |value|


next if value.even?

value * 2
end
p result # prints [2, nil, 6]

next accepts an argument that can be used as the result of the current block iteration:

result = [1, 2, 3].map do |value|


next value if value.even?

value * 2
end

p result # prints [2, 2, 6]

redo Statement
Use redo to redo the current iteration:

result = []

while result.length < 10 do


result << result.length

redo if result.last.even?

result << result.length + 1


end

p result

This prints [0, 1, 3, 3, 5, 5, 7, 7, 9, 9, 11]


In Ruby 1.8, you could also use retry where you used redo. This is no longer true, now you will receive
a SyntaxError when you use retry outside of a rescue block. See Exceptions for proper usage of retry.

Modifier Statements
Ruby’s grammar differentiates between statements and expressions. All expressions are statements (an
expression is a type of statement), but not all statements are expressions. Some parts of the grammar
accept expressions and not other types of statements, which causes code that looks similar to be parsed
differently.
For example, when not used as a modifier, if, unless, while, until, and begin are expressions (and also
statements). However, when used as a modifier, if, unless, while, until and rescue are statements but not
expressions.

if true; 1 end # expression (and therefore statement)


1 if true # statement (not expression)
Statements that are not expressions cannot be used in contexts where an expression is expected, such as
method arguments.

puts( 1 if true ) #=> SyntaxError

You can wrap a statement in parentheses to create an expression.

puts((1 if true)) #=> 1

If you put a space between the method name and opening parenthesis, you do not need two sets of
parentheses.

puts (1 if true) #=> 1, because of optional parentheses for method

This is because this is parsed similar to a method call without parentheses. It is equivalent to the following
code, without the creation of a local variable:

x = (1 if true)
p x

In a modifier statement, the left-hand side must be a statement and the right-hand side must be an
expression.
So in a if b rescue c, because b rescue c is a statement that is not an expression, and therefore is not
allowed as the right-hand side of the if modifier statement, the code is necessarily parsed as (a if b) rescue
c.
This interacts with operator precedence in such a way that:

stmt if v = expr rescue x


stmt if v = expr unless x

are parsed as:

stmt if v = (expr rescue x)


(stmt if v = expr) unless x

This is because modifier rescue has higher precedence than =, and modifier if has lower precedence
than =.
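
The first parse above can be checked with a small sketch. The method name collect_number and its inputs are illustrative assumptions, not part of the original text. Because modifier rescue binds tighter than =, the fallback value is assigned to n before the if modifier tests it.

```ruby
# Hypothetical helper: appends the parsed number (or a fallback) to a list.
def collect_number(text)
  numbers = []
  n = nil
  # Parsed as: numbers << n if n = (Integer(text) rescue -1)
  numbers << n if n = Integer(text) rescue -1
  numbers
end

p collect_number("42")   # prints [42]
p collect_number("oops") # prints [-1]
```

If the line were parsed as (numbers << n if n = Integer(text)) rescue -1, the second call would return an empty array instead.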

Flip-Flop
The flip-flop is a slightly special conditional expression. One of its typical uses is processing text from ruby
one-line programs used with ruby -n or ruby -p.
The form of the flip-flop is an expression that indicates when the flip-flop turns on, .. (or ...), then an
expression that indicates when the flip-flop will turn off. While the flip-flop is on it will continue to evaluate
to true, and false when off.
Here is an example:

selected = []

0.upto 10 do |value|
selected << value if value==2..value==8
end

p selected # prints [2, 3, 4, 5, 6, 7, 8]

In the above example, the ‘on’ condition is value==2. The flip-flop is initially ‘off’ (false) for 0 and 1, but becomes
‘on’ (true) for 2 and remains ‘on’ through 8. After 8 it turns off and remains ‘off’ for 9 and 10.
The flip-flop must be used inside a conditional such as !, ? :, not, if, while, unless, until, etc., including the
modifier forms.
When you use an inclusive range (..), the ‘off’ condition is evaluated when the ‘on’ condition changes:

selected = []

0.upto 5 do |value|
selected << value if value==2..value==2
end

p selected # prints [2]

Here, both sides of the flip-flop are evaluated so the flip-flop turns on and off only when value equals 2.
Since the flip-flop turned on in the iteration it returns true.
When you use an exclusive range (...), the ‘off’ condition is evaluated on the following iteration:

selected = []

0.upto 5 do |value|
selected << value if value==2...value==2
end

p selected # prints [2, 3, 4, 5]

Here, the flip-flop turns on when value equals 2, but doesn’t turn off on the same iteration. The ‘off’ condition
isn’t evaluated until the following iteration and value will never be two again.
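
As a sketch of the text-processing use mentioned at the start of this section, the following selects everything between two marker lines. The helper name lines_between_markers and the "BEGIN"/"END" markers are assumptions made for illustration.

```ruby
# Hypothetical helper: returns the lines from "BEGIN" through "END", inclusive.
def lines_between_markers(lines)
  selected = []
  lines.each do |line|
    # The flip-flop turns on at "BEGIN" and turns off once "END" has been seen.
    selected << line if (line == "BEGIN")..(line == "END")
  end
  selected
end

p lines_between_markers(%w[a BEGIN b c END d])
# prints ["BEGIN", "b", "c", "END"]
```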

Exception Handling
Exceptions are rescued in a begin/end block:

begin
# code that might raise
rescue
# handle exception
end

If you are inside a method, you do not need to use begin or end unless you wish to limit the scope of
rescued exceptions:

def my_method
# ...
rescue
# ...
end

The same is true for a class, module, and block:

[0, 1, 2].map do |i|


10 / i
rescue ZeroDivisionError
nil
end
#=> [nil, 10, 5]

You can assign the exception to a local variable by using => variable_name at the end of the rescue line:

begin
# ...
rescue => exception
warn exception.message
raise # re-raise the current exception
end

By default, StandardError and its subclasses are rescued. You can rescue a specific set of exception
classes (and their subclasses) by listing them after rescue:

begin
# ...
rescue ArgumentError, NameError
# handle ArgumentError or NameError
end

You may rescue different types of exceptions in different ways:

begin
# ...
rescue ArgumentError
# handle ArgumentError
rescue NameError
# handle NameError
rescue
# handle any StandardError
end

The exception is matched to the rescue section starting at the top, and matches only once. If
an ArgumentError is raised in the begin section, it will not be handled in the StandardError section.
You may retry rescued exceptions:

begin
# ...
rescue
# do something that may change the result of the begin block
retry
end

Execution will resume at the start of the begin block, so be careful not to create an infinite loop.
Inside a rescue block is the only valid location for retry; all other uses will raise a SyntaxError. If you wish to
retry a block iteration, use redo. See Control Expressions for details.
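
One way to avoid the infinite loop is to count attempts and re-raise once a limit is reached. This is a minimal sketch; the helper name with_retries and the choice of RuntimeError are assumptions for illustration.

```ruby
# Hypothetical helper: runs the block, retrying while it raises RuntimeError,
# and re-raises the last error after max_attempts tries.
def with_retries(max_attempts)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue RuntimeError
    retry if attempts < max_attempts # bounded, so this cannot loop forever
    raise
  end
end

result = with_retries(3) do |attempt|
  raise "not ready" if attempt < 3
  "succeeded on attempt #{attempt}"
end
p result # prints "succeeded on attempt 3"
```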
To always run some code whether an exception was raised or not, use ensure:

begin
# ...
rescue
# ...
ensure
# this always runs
end

You may also run some code when an exception is not raised:

begin
# ...
rescue
# ...
else
# this runs only when no exception was raised
ensure
# ...
end
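
The order in which the clauses run can be seen in the following sketch. The helper name parse_port and the event labels are assumptions used only for illustration.

```ruby
# Hypothetical helper: parses a port number and records which clauses ran.
def parse_port(text)
  events = []
  port = nil
  begin
    port = Integer(text)
  rescue ArgumentError, TypeError => e
    events << "rescue #{e.class}"
  else
    events << "else"
  ensure
    events << "ensure"
  end
  [port, events]
end

p parse_port("8080") # prints [8080, ["else", "ensure"]]
p parse_port("http") # prints [nil, ["rescue ArgumentError", "ensure"]]
```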
Literals
Literals create objects you can use in your program. Literals include:
• Boolean and Nil Literals
• Number Literals
  ◦ Integer Literals
  ◦ Float Literals
  ◦ Rational Literals
  ◦ Complex Literals
• String Literals
• Here Document Literals
• Symbol Literals
• Array Literals
• Hash Literals
• Range Literals
• Regexp Literals
• Lambda Proc Literals
• Percent Literals
  ◦ %q: Non-Interpolable String Literals
  ◦ % and %Q: Interpolable String Literals
  ◦ %w and %W: String-Array Literals
  ◦ %i and %I: Symbol-Array Literals
  ◦ %r: Regexp Literals
  ◦ %s: Symbol Literals
  ◦ %x: Backtick Literals

Boolean and Nil Literals


nil and false are both false values. nil is sometimes used to indicate “no value” or “unknown” but evaluates
to false in conditional expressions.
true is a true value. All objects except nil and false evaluate to a true value in conditional expressions.

Number Literals

Integer Literals
You can write integers of any size as follows:

1234
1_234

These numbers have the same value, 1,234. The underscore may be used to enhance readability for
humans. You may place an underscore between any two digits of the number.
You can use a special prefix to write numbers in decimal, hexadecimal, octal or binary formats. For decimal
numbers use a prefix of 0d, for hexadecimal numbers use a prefix of 0x, for octal numbers use a prefix
of 0 or 0o, for binary numbers use a prefix of 0b. The alphabetic component of the number is not case-
sensitive.
Examples:

0d170
0D170

0xaa
0xAa
0xAA
0Xaa
0XAa
0XaA

0252
0o252
0O252

0b10101010
0B10101010

All these numbers have the same decimal value, 170. As with decimal integers and floats, you may use
underscores for readability.
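
The equivalence of these spellings can be checked directly. This is a small sketch; the constant name SAME_VALUE is an assumption for illustration.

```ruby
# Each literal below is a different spelling of the same Integer.
SAME_VALUE = [170, 1_7_0, 0d170, 0xAA, 0252, 0o252, 0b1010_1010].freeze

p SAME_VALUE.uniq # prints [170]
p 0xAA.to_s(2)    # prints "10101010"
p 170.to_s(8)     # prints "252"
```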

Float Literals
Floating-point numbers may be written as follows:

12.34
1234e-2
1.234E1

These numbers have the same value, 12.34. You may use underscores in floating point numbers as well.

Rational Literals
You can write a Rational literal using a special suffix, 'r'.
Examples:

1r # => (1/1)
2/3r # => (2/3) # With denominator.
-1r # => (-1/1) # With signs.
-2/3r # => (-2/3)
2/-3r # => (-2/3)
-2/-3r # => (2/3)
+1/+3r # => (1/3)
1.2r # => (6/5) # With fractional part.
1_1/2_1r # => (11/21) # With embedded underscores.
2/4r # => (1/2) # Automatically reduced.

Syntax:

<rational-literal> = <numerator> [ '/' <denominator> ] 'r'

<numerator> = [ <sign> ] <digits> [ <fractional-part> ]

<fractional-part> = '.' <digits>

<denominator> = [ <sign> ] <digits>

<sign> = '-' | '+'

<digits> = <digit> { <digit> | '_' <digit> }

<digit> = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'

Note this, which is parsed as Float numerator 1.2 divided by Rational denominator 3r, resulting in a Float:

1.2/3r # => 0.39999999999999997

Complex Literals
You can write a Complex number as follows (suffixed i):

1i #=> (0+1i)
1i * 1i #=> (-1+0i)

Also Rational numbers may be imaginary numbers.

12.3ri #=> (0+(123/10)*i)

i must be placed after r; the opposite is not allowed.

12.3ir #=> Syntax error
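
A short sketch of why these literals are useful: Rational literals keep decimal fractions exact where Float arithmetic rounds, and Complex literals combine with ordinary arithmetic. The constant names are assumptions for illustration.

```ruby
# Binary floats accumulate rounding error; Rational literals stay exact.
FLOAT_SUM_EXACT    = (0.1 + 0.2 == 0.3)    # false
RATIONAL_SUM_EXACT = (0.1r + 0.2r == 0.3r) # true

p FLOAT_SUM_EXACT    # prints false
p RATIONAL_SUM_EXACT # prints true
p(1/3r + 1/6r)       # prints (1/2)
p((3 + 4i).abs)      # prints 5.0
p(2i * 2i)           # prints (-4+0i)
```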


Strings

String Literals
The most common way of writing strings is using ":

"This is a string."

The string may be many lines long.


Any internal " must be escaped:

"This string has a quote: \". As you can see, it is escaped"

Double-quote strings allow escaped characters such as \n for newline, \t for tab, etc. The full list of
supported escape sequences is as follows:

\a bell, ASCII 07h (BEL)

\b backspace, ASCII 08h (BS)

\t horizontal tab, ASCII 09h (TAB)

\n newline (line feed), ASCII 0Ah (LF)

\v vertical tab, ASCII 0Bh (VT)

\f form feed, ASCII 0Ch (FF)

\r carriage return, ASCII 0Dh (CR)

\e escape, ASCII 1Bh (ESC)

\s space, ASCII 20h (SPC)

\\ backslash, \

\nnn octal bit pattern, where nnn is 1-3 octal digits ([0-7])

\xnn hexadecimal bit pattern, where nn is 1-2 hexadecimal digits ([0-9a-fA-F])

\unnnn Unicode character, where nnnn is exactly 4 hexadecimal digits ([0-9a-fA-F])

\u{nnnn ...} Unicode character(s), where each nnnn is 1-6 hexadecimal digits ([0-9a-fA-F])

\cx or \C-x control character, where x is an ASCII printable character

\M-x meta character, where x is an ASCII printable character


\M-\C-x meta control character, where x is an ASCII printable character

\M-\cx same as above

\c\M-x same as above

\c? or \C-? delete, ASCII 7Fh (DEL)

Any other character following a backslash is interpreted as the character itself.


Double-quote strings allow interpolation of other values using #{...}:

"One plus one is two: #{1 + 1}"

Any expression may be placed inside the interpolated section, but it’s best to keep the expression small for
readability.
You can also use #@foo, #@@foo and #$foo as a shorthand for,
respectively, #{ @foo }, #{ @@foo } and #{ $foo }.
Interpolation may be disabled by escaping the “#” character or using single-quote strings:

'#{1 + 1}' #=> "\#{1 + 1}"

In addition to disabling interpolation, single-quoted strings also disable all escape sequences except for the
single-quote (\') and backslash (\\).
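
The difference between the two quoting styles can be seen side by side in this sketch. The constant names are assumptions for illustration.

```ruby
LANGUAGE = "Ruby"

DOUBLE_QUOTED = "Hello, #{LANGUAGE}!\n" # interpolated; \n is one newline character
SINGLE_QUOTED = 'Hello, #{LANGUAGE}!\n' # kept literally; \n is two characters
ESCAPED_HASH  = "Hello, \#{LANGUAGE}!"  # interpolation disabled by the backslash

p DOUBLE_QUOTED          # prints "Hello, Ruby!\n"
p SINGLE_QUOTED.length   # prints 21
p 'It\'s a \\ backslash' # prints "It's a \\ backslash"
```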
Adjacent string literals are automatically concatenated by the interpreter:

"con" "cat" "en" "at" "ion" #=> "concatenation"


"This string contains "\
"no newlines." #=> "This string contains no newlines."

Any combination of adjacent single-quote, double-quote, percent strings will be concatenated as long as a
percent-string is not last.

%q{a} 'b' "c" #=> "abc"


"a" 'b' %q{c} #=> NameError: uninitialized constant q

There is also a character literal notation to represent single-character strings, whose syntax is a question
mark (?) followed by a single character or escape sequence that corresponds to a single codepoint in the
script encoding:

?a #=> "a"

?abc #=> SyntaxError


?\n #=> "\n"

?\s #=> " "

?\\ #=> "\\"

?\u{41} #=> "A"

?\C-a #=> "\x01"

?\M-a #=> "\xE1"

?\M-\C-a #=> "\x81"

?\C-\M-a #=> "\x81", same as above

?あ #=> "あ"

See also:
• %q: Non-Interpolable String Literals
• % and %Q: Interpolable String Literals

Here Document Literals


If you are writing a large block of text you may use a “here document” or “heredoc”:

expected_result = <<HEREDOC
This would contain specially formatted text.

That might span many lines


HEREDOC

The heredoc starts on the line following <<HEREDOC and ends with the next line that starts
with HEREDOC. The result includes the ending newline.
You may use any identifier with a heredoc, but all-uppercase identifiers are typically used.
You may indent the ending identifier if you place a “-” after <<:

expected_result = <<-INDENTED_HEREDOC
This would contain specially formatted text.

That might span many lines


INDENTED_HEREDOC
Note that while the closing identifier may be indented, the content is always treated as if it is flush left. If you
indent the content those spaces will appear in the output.
To have indented content as well as an indented closing identifier, you can use a “squiggly” heredoc, which
uses a “~” instead of a “-” after <<:

expected_result = <<~SQUIGGLY_HEREDOC
This would contain specially formatted text.

That might span many lines


SQUIGGLY_HEREDOC

The indentation of the least-indented line will be removed from each line of the content. Note that empty
lines and lines consisting solely of literal tabs and spaces will be ignored for the purposes of determining
indentation, but escaped tabs and spaces are considered non-indentation characters.
For the purpose of measuring an indentation, a horizontal tab is regarded as a sequence of one to eight
spaces such that the column position corresponding to its end is a multiple of eight. The amount to be
removed is counted in terms of the number of spaces. If the boundary appears in the middle of a tab, that
tab is not removed.
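
The following sketch contrasts the “-” and “~” forms on identical content. The method names dash_heredoc and squiggly_heredoc are assumptions for illustration.

```ruby
# Hypothetical helpers comparing the two indented heredoc forms.
def dash_heredoc
  <<-TEXT
    Usage: tool [options]
      -h  show help
  TEXT
end

def squiggly_heredoc
  <<~TEXT
    Usage: tool [options]
      -h  show help
  TEXT
end

p dash_heredoc     # prints "    Usage: tool [options]\n      -h  show help\n"
p squiggly_heredoc # prints "Usage: tool [options]\n  -h  show help\n"
```

Only the indentation shared by all lines (four spaces here) is removed by the squiggly form, so the relative indentation of the second line survives.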
A heredoc allows interpolation and escaped characters. You may disable interpolation and escaping by
surrounding the opening identifier with single quotes:

expected_result = <<-'EXPECTED'
One plus one is #{1 + 1}
EXPECTED

p expected_result # prints: "One plus one is \#{1 + 1}\n"

The identifier may also be surrounded with double quotes (which is the same as no quotes) or with
backticks. When surrounded by backticks the HEREDOC behaves like Kernel#`:

puts <<-`HEREDOC`
cat #{__FILE__}
HEREDOC

When surrounding with quotes, any character but that quote and newline (CR and/or LF) can be used as the
identifier.
To call a method on a heredoc place it after the opening identifier:

expected_result = <<-EXPECTED.chomp
One plus one is #{1 + 1}
EXPECTED

You may open multiple heredocs on the same line, but this can be difficult to read:
puts(<<-ONE, <<-TWO)
content for heredoc one
ONE
content for heredoc two
TWO

Symbol Literals
A Symbol represents a name inside the ruby interpreter. See Symbol for more details on what symbols are
and when ruby creates them internally.
You may reference a symbol using a colon: :my_symbol.
You may also create symbols by interpolation:

:"my_symbol1"
:"my_symbol#{1 + 1}"

Like strings, a single-quote may be used to disable interpolation:

:'my_symbol#{1 + 1}' #=> :"my_symbol\#{1 + 1}"

When creating a Hash, there is a special syntax for referencing a Symbol as well.
See also:
• %s: Symbol Literals

Array Literals
An array is created using the objects between [ and ]:

[1, 2, 3]

You may place expressions inside the array:

[1, 1 + 1, 1 + 2]
[1, [1 + 1, [1 + 2]]]

See also:
• %w and %W: String-Array Literals
• %i and %I: Symbol-Array Literals
See Array for the methods you may use with an array.

Hash Literals
A hash is created using key-value pairs between { and }:
{ "a" => 1, "b" => 2 }

Both the key and value may be any object.


You can create a hash using symbol keys with the following syntax:

{ a: 1, b: 2 }

This same syntax is used for keyword arguments for a method.


Like Symbol literals, you can quote symbol keys.

{ "a 1": 1, "b #{1 + 1}": 2 }

is equal to

{ :"a 1" => 1, :"b 2" => 2 }

Hash values can be omitted, meaning that value will be fetched from the context by the name of the key:

x = 100
y = 200
h = { x:, y: }
#=> {:x=>100, :y=>200}

See Hash for the methods you may use with a hash.

Range Literals
A range represents an interval of values. The range may include or exclude its ending value.

(1..2) # includes its ending value


(1...2) # excludes its ending value
(1..) # endless range, representing infinite sequence from 1 to Infinity
(..1) # beginless range, representing infinite sequence from -Infinity to 1

You may create a range of any object. See the Range documentation for details on the methods you need
to implement.

Regexp Literals
A regular expression may be created using leading and trailing slash ('/') characters:

re = /foo/ # => /foo/


re.class # => Regexp
The trailing slash may be followed by one or more flag characters that modify the behavior. See Regexp
options for details.
Interpolation may be used inside regular expressions along with escaped characters. Note that a regular
expression may require more escaped characters than a string.
See also:
• %r: Regexp Literals
See Regexp for a description of the syntax of regular expressions.
Lambda Proc Literals
A lambda proc can be created with ->:

-> { 1 + 1 }

Calling the above proc will give a result of 2.


You can require arguments for the proc as follows:

->(v) { 1 + v }

This proc will add one to its argument.

Percent Literals
Each of the literals described in this section may use these paired delimiters:
• [ and ].
• ( and ).
• { and }.
• < and >.
• Any other character, as both beginning and ending delimiters.
These are demonstrated in the next section.
%q: Non-Interpolable String Literals
You can write a non-interpolable string with %q. The created string is the same as if you created it with
single quotes:

%q[foo bar baz] # => "foo bar baz" # Using [].
%q(foo bar baz) # => "foo bar baz" # Using ().
%q{foo bar baz} # => "foo bar baz" # Using {}.
%q<foo bar baz> # => "foo bar baz" # Using <>.
%q|foo bar baz| # => "foo bar baz" # Using two |.
%q:foo bar baz: # => "foo bar baz" # Using two :.
%q(1 + 1 is #{1 + 1}) # => "1 + 1 is \#{1 + 1}" # No interpolation.

% and %Q: Interpolable String Literals


You can write an interpolable string with %Q or with its alias %:

%[foo bar baz] # => "foo bar baz"


%(1 + 1 is #{1 + 1}) # => "1 + 1 is 2" # Interpolation.

%w and %W: String-Array Literals


You can write an array of strings with %w (non-interpolable) or %W (interpolable):

%w[foo bar baz] # => ["foo", "bar", "baz"]


%w[1 % *] # => ["1", "%", "*"]
# Use backslash to embed spaces in the strings.
%w[foo\ bar baz\ bat] # => ["foo bar", "baz bat"]
%w(#{1 + 1}) # => ["\#{1", "+", "1}"]
%W(#{1 + 1}) # => ["2"]

%i and %I: Symbol-Array Literals


You can write an array of symbols with %i (non-interpolable) or %I (interpolable):

%i[foo bar baz] # => [:foo, :bar, :baz]


%i[1 % *] # => [:"1", :%, :*]
# Use backslash to embed spaces in the symbols.
%i[foo\ bar baz\ bat] # => [:"foo bar", :"baz bat"]
%i(#{1 + 1}) # => [:"\#{1", :+, :"1}"]
%I(#{1 + 1}) # => [:"2"]

%s: Symbol Literals


You can write a symbol with %s:

%s[foo] # => :foo
%s[foo bar] # => :"foo bar"

%r: Regexp Literals


You can write a regular expression with %r; the character used as the leading and trailing delimiter may be
(almost) any character:

%r/foo/ # => /foo/


%r:name/value pair: # => /name\/value pair/

A few “symmetrical” character pairs may be used as delimiters:

%r[foo] # => /foo/


%r{foo} # => /foo/
%r(foo) # => /foo/
%r<foo> # => /foo/

The trailing delimiter may be followed by one or more flag characters that modify the behavior. See Regexp
options for details.
%x: Backtick Literals
You can write and execute a shell command with %x:
%x(echo 1) # => "1\n"

Methods
Methods implement the functionality of your program. Here is a simple method definition:

def one_plus_one
1 + 1
end

A method definition consists of the def keyword, a method name, the body of the method, the return value and
the end keyword. When called, the method will execute its body. This method returns 2.
Since Ruby 3.0, there is also a shorthand syntax for methods consisting of exactly one expression:

def one_plus_one = 1 + 1

This section only covers defining methods. See also the syntax documentation on calling methods.
Method Names
Method names may be one of the operators or must start with a letter or a character with the eighth bit set. They
may contain letters, numbers, an _ (underscore or low line) or a character with the eighth bit set. The
convention is to use underscores to separate words in a multiword method name:

def method_name
puts "use underscores to separate words"
end

Ruby programs must be written in a US-ASCII-compatible character set such as UTF-8, ISO-8859-1 etc. In
such character sets if the eighth bit is set it indicates an extended character. Ruby allows method names
and other identifiers to contain such characters. Ruby programs cannot contain some characters like ASCII
NUL (\x00).
The following are examples of valid Ruby methods:

def hello
"hello"
end

def こんにちは
puts "means hello in Japanese"
end

Typically method names are US-ASCII compatible since the keys to type them exist on all keyboards.
Method names may end with a ! (bang or exclamation mark), a ? (question mark), or = (equals sign).
Bang methods (those with ! at the end of the method name) are called and executed just like any other method.
By convention, however, a method ending in a bang is considered dangerous. In Ruby’s core library, the
bang usually indicates that the method, unlike its non-bang equivalent, modifies its receiver in place.
Almost every bang method in the core library has a non-bang counterpart (a method with the same name
but without the !) that does not modify the receiver. This convention generally holds for the Ruby core
library but may or may not hold for other Ruby libraries.
Methods that end with a question mark by convention return a boolean, but they may not always return
just true or false. Often, they will return an object to indicate a true value (or “truthy” value).
Methods that end with an equals sign indicate an assignment method.

class C
def attr
@attr
end

def attr=(val)
@attr = val
end
end

c = C.new
c.attr #=> nil
c.attr = 10 # calls "attr=(10)"
c.attr #=> 10

Assignment methods can not be defined using the shorthand syntax.


These are method names for the various Ruby operators. Each of these operators accepts only one
argument. Following the operator is the typical use or name of the operator. Creating an alternate meaning
for the operator may lead to confusion as the user expects plus to add things, minus to subtract things, etc.
Additionally, you cannot alter the precedence of the operators.
+     add
-     subtract
*     multiply
**    power
/     divide
%     modulus division, String#%
&     AND
^     XOR (exclusive OR)
>>    right-shift
<<    left-shift, append
==    equal
!=    not equal
===   case equality. See Object#===
=~    pattern match. (Not just for regular expressions)
!~    does not match
<=>   comparison aka spaceship operator. See Comparable
<     less-than
<=    less-than or equal
>     greater-than
>=    greater-than or equal
To define unary methods minus and plus, follow the operator with an @ as in +@:

class C
def -@
puts "you inverted this object"
end
end

obj = C.new

-obj # prints "you inverted this object"

The @ is needed to differentiate unary minus and plus operators from binary minus and plus operators.
You can also follow tilde and not (!) unary methods with @, but it is not required as there are no binary tilde
and not operators.
Unary methods accept zero arguments.
Additionally, methods for element reference and assignment may be defined: [] and []= respectively. Both
can take one or more arguments, and element reference can take none.

class C
def [](a, b)
puts a + b
end

def []=(a, b, c)
puts a * b + c
end
end

obj = C.new

obj[2, 3] # prints "5"


obj[2, 3] = 4 # prints "10"
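
The operator methods described above can be combined in a single class. This is a minimal sketch; the class Vec2 and its behavior are assumptions used only to illustrate the syntax.

```ruby
# Hypothetical 2D vector class, used only to illustrate operator methods.
class Vec2
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Binary plus takes exactly one argument.
  def +(other)
    Vec2.new(x + other.x, y + other.y)
  end

  # Unary minus takes no arguments; note the trailing @.
  def -@
    Vec2.new(-x, -y)
  end

  # Element reference: index 0 is x, index 1 is y.
  def [](index)
    [x, y].fetch(index)
  end

  def ==(other)
    other.is_a?(Vec2) && x == other.x && y == other.y
  end
end

sum = Vec2.new(1, 2) + Vec2.new(3, 4)
p sum == Vec2.new(4, 6) # prints true
p((-sum)[0])            # prints -4
p sum != Vec2.new(0, 0) # prints true; the default != negates ==
```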

Return Values
By default, a method returns the last expression that was evaluated in the body of the method. In the
example above, the last (and only) expression evaluated was the simple sum 1 + 1. The return keyword can
be used to make it explicit that a method returns a value.

def one_plus_one
return 1 + 1
end

It can also be used to make a method return before the last expression is evaluated.

def two_plus_two
return 2 + 2
1 + 1 # this expression is never evaluated
end

Note that for assignment methods the return value will be ignored when using the assignment syntax.
Instead, the argument will be returned:

def a=(value)
return 1 + value
end
p(self.a = 5) # prints 5

The actual return value will be returned when invoking the method directly:

p send(:a=, 5) # prints 6

Scope
The standard syntax to define a method:

def my_method
# ...
end

adds the method to a class. You can define an instance method on a specific class with the class keyword:

class C
def my_method
# ...
end
end

A method may be defined on another object. You may define a “class method” (a method that is defined on
the class, not an instance of the class) like this:

class C
def self.my_method
# ...
end
end

However, this is simply a special case of a greater syntactical power in Ruby, the ability to add methods to
any object. Classes are objects, so adding class methods is simply adding methods to the Class object.
The syntax for adding a method to an object is as follows:

greeting = "Hello"

def greeting.broaden
self + ", world!"
end

greeting.broaden # returns "Hello, world!"


self is a keyword referring to the current object under consideration by the compiler, which might make the
use of self in defining a class method above a little clearer. Indeed, the example of adding a hello method to
the class String can be rewritten thus:

def String.hello
"Hello, world!"
end

A method defined like this is called a “singleton method”. broaden will only exist on the string
instance greeting. Other strings will not have broaden.

Overriding
When Ruby encounters the def keyword, it doesn’t consider it an error if the method already exists: it simply
redefines it. This is called overriding. Rather like extending core classes, this is a potentially dangerous
ability, and should be used sparingly because it can cause unexpected results. For example, consider this
irb session:

>> "43".to_i

=> 43

>> class String

>> def to_i

>> 42

>> end

>> end

=> nil

>> "43".to_i

=> 42

This will effectively sabotage any code which makes use of the method String#to_i to parse numbers from
strings.

Arguments
A method may accept arguments. The argument list follows the method name:
def add_one(value)
value + 1
end

When called, the user of the add_one method must provide an argument. The argument is a local variable
in the method body. The method will then add one to this argument and return the value. If given 1 this
method will return 2.
The parentheses around the arguments are optional:

def add_one value


value + 1
end

The parentheses are mandatory in shorthand method definitions:

# OK

def add_one(value) = value + 1

# SyntaxError

def add_one value = value + 1

Multiple arguments are separated by a comma:

def add_values(a, b)
a + b
end

When called, the arguments must be provided in the exact order. In other words, the arguments are
positional.

Default Values
Arguments may have default values:

def add_values(a, b = 1)
a + b
end

The default value does not need to appear first, but arguments with defaults must be grouped together. This
is ok:
def add_values(a = 1, b = 2, c)
a + b + c
end

This will raise a SyntaxError:

def add_values(a = 1, b, c = 1)

a + b + c

end

Default argument values can refer to arguments that have already been evaluated as local variables, and
argument values are always evaluated left to right. So this is allowed:

def add_values(a = 1, b = a)
a + b
end
add_values
# => 2

But this will raise a NameError (unless there is a method named b defined):

def add_values(a = b, b = 1)
a + b
end
add_values
# NameError (undefined local variable or method `b' for main:Object)

Array Decomposition
You can decompose (unpack or extract values from) an Array using extra parentheses in the arguments:

def my_method((a, b))


p a: a, b: b
end

my_method([1, 2])

This prints:

{:a=>1, :b=>2}

If the argument has extra elements in the Array they will be ignored:
def my_method((a, b))
p a: a, b: b
end

my_method([1, 2, 3])

This has the same output as above.


You can use a * to collect the remaining arguments. This splits an Array into a first element and the rest:

def my_method((a, *b))


p a: a, b: b
end

my_method([1, 2, 3])

This prints:

{:a=>1, :b=>[2, 3]}

The argument will be decomposed if it responds to to_ary. You should only define to_ary if you can use your
object in place of an Array.
Use of the inner parentheses only uses one of the sent arguments. If the argument is not an Array it will be
assigned to the first argument in the decomposition and the remaining arguments in the decomposition will
be nil:

def my_method(a, (b, c), d)


p a: a, b: b, c: c, d: d
end

my_method(1, 2, 3)

This prints:

{:a=>1, :b=>2, :c=>nil, :d=>3}

You can nest decomposition arbitrarily:

def my_method(((a, b), c))


# ...
end
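
A sketch of nested decomposition with concrete data follows. The method name describe_city and the sample values are assumptions for illustration; block parameters decompose in the same way.

```ruby
# Hypothetical helper: the argument is a [name, [latitude, longitude]] pair.
def describe_city((name, (latitude, longitude)))
  "#{name} is at #{latitude}, #{longitude}"
end

p describe_city(["Tokyo", [35.68, 139.69]])
# prints "Tokyo is at 35.68, 139.69"

# Block parameters accept the same nested pattern.
totals = { a: [1, 2], b: [3, 4] }.map { |key, (low, high)| [key, low + high] }
p totals # prints [[:a, 3], [:b, 7]]
```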

Array/Hash Argument
Prefixing an argument with * causes any remaining arguments to be converted to an Array:
def gather_arguments(*arguments)
p arguments
end

gather_arguments 1, 2, 3 # prints [1, 2, 3]

The array argument must appear before any keyword arguments.


It is possible to gather arguments at the beginning or in the middle:

def gather_arguments(first_arg, *middle_arguments, last_arg)


p middle_arguments
end

gather_arguments 1, 2, 3, 4 # prints [2, 3]

The array argument will capture a Hash as the last entry if keywords were provided by the caller after all
positional arguments.

def gather_arguments(*arguments)
p arguments
end

gather_arguments 1, a: 2 # prints [1, {:a=>2}]

However, this only occurs if the method does not declare any keyword arguments.

def gather_arguments_keyword(*positional, keyword: nil)


p positional: positional, keyword: keyword
end

gather_arguments_keyword 1, 2, three: 3
#=> raises: unknown keyword: three (ArgumentError)

Also, note that a bare * can be used to ignore arguments:

def ignore_arguments(*)
end

You can also use a bare * when calling a method to pass the arguments directly to another method:

def delegate_arguments(*)
other_method(*)
end
Keyword Arguments
Keyword arguments are similar to positional arguments with default values:

def add_values(first: 1, second: 2)


first + second
end

Arbitrary keyword arguments will be accepted with **:

def gather_arguments(first: nil, **rest)


p first, rest
end

gather_arguments first: 1, second: 2, third: 3


# prints 1 then {:second=>2, :third=>3}

When calling a method with keyword arguments the arguments may appear in any order. If an unknown
keyword argument is sent by the caller, and the method does not accept arbitrary keyword arguments,
an ArgumentError is raised.
To require a specific keyword argument, do not include a default value for the keyword argument:

def add_values(first:, second:)


first + second
end
add_values
# ArgumentError (missing keywords: first, second)
add_values(first: 1, second: 2)
# => 3

When mixing keyword arguments and positional arguments, all positional arguments must appear before
any keyword arguments.
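
The following sketch mixes a required positional argument, an optional positional argument, a keyword with a default, and arbitrary keywords. The method name connect and its parameters are assumptions for illustration.

```ruby
# Hypothetical method mixing positional and keyword arguments.
def connect(host, port = 80, timeout: 5, **options)
  { host: host, port: port, timeout: timeout, options: options }
end

# Keywords may be given in any order, after all positional arguments.
settings = connect("example.org", 443, tls: true, timeout: 10)
p settings[:port]                     # prints 443
p settings[:timeout]                  # prints 10
p settings[:options] == { tls: true } # prints true
```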
Also, note that ** can be used to ignore keyword arguments:

def ignore_keywords(**)
end

You can also use ** when calling a method to delegate keyword arguments to another method:

def delegate_keywords(**)
other_method(**)
end

To mark a method as explicitly accepting no keywords, you can use **nil:

def no_keywords(**nil)
end
Calling such a method with keywords or a non-empty keyword splat will result in an ArgumentError. This
syntax is supported so that keywords can be added to the method later without affecting backwards
compatibility.
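
A brief sketch of the resulting behavior, assuming Ruby 2.7 or later. The method name checksum is an assumption for illustration.

```ruby
# Hypothetical method that explicitly accepts no keywords.
def checksum(data, **nil)
  data.sum
end

p checksum([1, 2, 3]) # prints 6

begin
  checksum([1, 2, 3], strict: true)
rescue ArgumentError => e
  puts "rejected: #{e.class}" # prints "rejected: ArgumentError"
end
```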
If a method definition does not accept any keywords, and the **nil syntax is not used, any keywords
provided when calling the method will be converted to a Hash positional argument:

def meth(arg)
arg
end
meth(a: 1)
# => {:a=>1}

Block Argument
The block argument is indicated by & and must come last:

def my_method(&my_block)
my_block.call(self)
end

Most frequently the block argument is used to pass a block to another method:

def each_item(&block)
@items.each(&block)
end

You are not required to give a name to the block if you will just be passing it to another method:

def each_item(&)
@items.each(&)
end

If you are only going to call the block and will not otherwise manipulate it or send it to another method,
using yield without an explicit block parameter is preferred. This method is equivalent to the first method in
this section:

def my_method
yield self
end
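
The two styles can be compared on a method that calls its block more than once. This is a sketch; both method names are assumptions for illustration.

```ruby
# Hypothetical helpers: both apply the given block to value two times.
def apply_twice_with_block(value, &block)
  block.call(block.call(value))
end

def apply_twice_with_yield(value)
  yield(yield(value))
end

p apply_twice_with_block(2) { |n| n * 3 } # prints 18
p apply_twice_with_yield(2) { |n| n * 3 } # prints 18
```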

Argument Forwarding
Since Ruby 2.7, an all-arguments forwarding syntax is available:
def concrete_method(*positional_args, **keyword_args, &block)
[positional_args, keyword_args, block]
end

def forwarding_method(...)
concrete_method(...)
end

forwarding_method(1, b: 2) { puts 3 }
#=> [[1], {:b=>2}, #<Proc:...skip...>]

Calling with forwarding ... is available only in methods defined with ....

def regular_method(arg, **kwarg)

concrete_method(...) # Syntax error

end

Since Ruby 3.0, there can be leading arguments before ... both in definitions and in invocations (but in
definitions they can be only positional arguments without default values).

def request(method, path, **headers)


puts "#{method.upcase} #{path} #{headers}"
end

def get(...)
request(:GET, ...) # leading argument in invoking
end

get('http://ruby-lang.org', 'Accept' => 'text/html')
# Prints: GET http://ruby-lang.org {"Accept"=>"text/html"}

def logged_get(msg, ...) # leading argument in definition


puts "Invoking #get: #{msg}"
get(...)
end

logged_get('Ruby site', 'http://ruby-lang.org')


# Prints:
# Invoking #get: Ruby site
# GET http://ruby-lang.org {}

Note that omitting parentheses in forwarding calls may lead to unexpected results:
def log(...)
puts ... # This would be treated as `puts()...',
# i.e. endless range from puts result
end

log("test")
# Prints: warning: ... at EOL, should be parenthesized?
# ...and then empty line
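
A parenthesized forwarding call avoids that ambiguity. The following is a minimal sketch with hypothetical method names (format_line and log_line); it assumes Ruby 2.7 or later.

```ruby
# Hypothetical helper that joins its arguments into one line.
def format_line(*parts, sep: " ")
  parts.join(sep)
end

# The parentheses make this an argument-forwarding call
# rather than the start of an endless range.
def log_line(...)
  format_line(...)
end

log_line("a", "b", sep: "-")
```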

Exception Handling
Methods have an implied exception handling block so you do not need to use begin or end to handle
exceptions. This:

def my_method
begin
# code that may raise an exception
rescue
# handle exception
end
end

May be written as:

def my_method
# code that may raise an exception
rescue
# handle exception
end

Similarly, if you wish to always run code even if an exception is raised, you can
use ensure without begin and end:

def my_method
# code that may raise an exception
ensure
# code that runs even if previous code raised an exception
end

You can also combine rescue with ensure and/or else, without begin and end:

def my_method
# code that may raise an exception
rescue
# handle exception
else
# only run if no exception raised above
ensure
# code that runs even if previous code raised an exception
end

If you wish to rescue an exception for only part of your method, use begin and end. For more details see the
page on exception handling.
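
The order in which the clauses run can be traced with a small sketch. The method name safe_divide and the trace array are assumptions made for this illustration only.

```ruby
# Records which clauses run; trace is an Array supplied by the caller.
def safe_divide(a, b, trace)
  result = a / b
rescue ZeroDivisionError
  trace << :rescue
  nil
else
  trace << :else
  result
ensure
  # Runs in both cases; its value does not become the return value.
  trace << :ensure
end

ok_trace = []
safe_divide(6, 3, ok_trace)

error_trace = []
safe_divide(6, 0, error_trace)
```

With no exception, the else clause runs before ensure; when the division fails, rescue runs instead of else.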

Miscellaneous Syntax

Ending an Expression
Ruby uses a newline as the end of an expression. When a line ends with an operator, an open parenthesis,
a comma, etc., the expression continues on the next line.
You can end an expression with a ; (semicolon). Semicolons are most frequently used with ruby -e.

Indentation
Ruby does not require any indentation. Typically, Ruby programs are indented two spaces.
If you run ruby with warnings enabled and have an indentation mismatch, you will receive a warning.
alias
The alias keyword is most frequently used to alias methods. When aliasing a method, you can use either its
name or a symbol:

alias new_name old_name
alias :new_name :old_name

For methods, Module#alias_method can often be used instead of alias.

You can also use alias to alias global variables:

$old = 0

alias $new $old

p $new # prints 0

You may use alias in any scope.
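
One common use of alias is keeping access to a method's earlier implementation when the method is redefined. The sketch below uses a hypothetical Greeter class to show this, along with the Module#alias_method form.

```ruby
# Greeter is a hypothetical class used only for illustration.
class Greeter
  def hello
    "hello"
  end

  # Keep a handle on the current implementation before replacing it.
  alias original_hello hello

  def hello
    "#{original_hello}, world"
  end

  # Module#alias_method does the same job, taking symbols as arguments.
  alias_method :hi, :hello
end

Greeter.new.hello
Greeter.new.hi
```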


undef
The undef keyword prevents the current class from responding to calls to the named methods.

undef my_method

You may use symbols instead of method names:

undef :my_method

You may undef multiple methods:

undef method1, method2

You may use undef in any scope. See also Module#undef_method.
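
The effect of undef is easiest to see next to Module#remove_method. In this sketch the class names are hypothetical: undef stops the class from responding even to an inherited method, while remove_method only deletes the class's own definition.

```ruby
# Hypothetical classes used only for illustration.
class Parent
  def greet
    "parent"
  end
end

class Quiet < Parent
  undef greet # instances no longer respond to greet at all
end

class Plain < Parent
  def greet
    "plain"
  end

  remove_method :greet # only Plain's own definition is removed
end

Quiet.new.respond_to?(:greet) # the inherited method is hidden
Plain.new.greet               # falls back to Parent#greet
```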


defined?
defined? is a keyword that returns a string describing its argument:

p defined?(UNDEFINED_CONSTANT) # prints nil
p defined?(RUBY_VERSION) # prints "constant"
p defined?(1 + 1) # prints "method"

You don’t need to use parentheses with defined?, but they are recommended due to the low
precedence of defined?.
For example, if you wish to check if an instance variable exists and that the instance variable is zero:

defined? @instance_variable && @instance_variable.zero?

This returns "expression", which is not what you want if the instance variable is not defined.

@instance_variable = 1
defined?(@instance_variable) && @instance_variable.zero?

Adding parentheses when checking if the instance variable is defined is a better check. This correctly
returns nil when the instance variable is not defined and false when the instance variable is not zero.
Using the specific reflection methods such as instance_variable_defined? for instance variables or
const_defined? for constants is less error prone than using defined?.
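
A minimal sketch of the reflection-based check follows. The Counter class and its zero_value? method are hypothetical names introduced for this example.

```ruby
# Counter is a hypothetical class used only for illustration.
class Counter
  def initialize(set_value)
    @value = 0 if set_value
  end

  # No precedence pitfalls: the check is an ordinary method call.
  def zero_value?
    instance_variable_defined?(:@value) && @value.zero?
  end
end

Counter.new(true).zero_value?
Counter.new(false).zero_value?
Object.const_defined?(:RUBY_VERSION)
```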
defined? handles some regexp global variables specially based on whether there is an active regexp match
and how many capture groups there are:

/b/ =~ 'a'
defined?($~) # => "global-variable"
defined?($&) # => nil
defined?($`) # => nil
defined?($') # => nil
defined?($+) # => nil
defined?($1) # => nil
defined?($2) # => nil

/./ =~ 'a'
defined?($~) # => "global-variable"
defined?($&) # => "global-variable"
defined?($`) # => "global-variable"
defined?($') # => "global-variable"
defined?($+) # => nil
defined?($1) # => nil
defined?($2) # => nil

/(.)/ =~ 'a'
defined?($~) # => "global-variable"
defined?($&) # => "global-variable"
defined?($`) # => "global-variable"
defined?($') # => "global-variable"
defined?($+) # => "global-variable"
defined?($1) # => "global-variable"
defined?($2) # => nil

BEGIN and END


BEGIN defines a block that is run before any other code in the current file. It is typically used in one-liners
with ruby -e. Similarly END defines a block that is run after any other code.
BEGIN must appear at top-level and END will issue a warning when you use it inside a method. Here is an
example:

BEGIN {
count = 0
}

You must use { and }; you may not use do and end.
Here is an example one-liner that adds numbers from standard input or any files in the argument list:

ruby -ne 'BEGIN { count = 0 }; END { puts count }; count += $_.to_i'

Modules
Modules serve two purposes in Ruby, namespacing and mix-in functionality.
A namespace can be used to organize code by package or functionality that separates common names
from interference by other packages. For example, the IRB namespace provides functionality for irb that
prevents a collision for the common name “Context”.
Mix-in functionality allows sharing common methods across multiple classes or modules. Ruby comes with
the Enumerable mix-in module, which provides many enumeration methods based on the each method,
and Comparable, which allows comparison of objects based on the <=> comparison method.
Note that there are many similarities between modules and classes. Besides the ability to mix-in a module,
the description of modules below also applies to classes.
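
To illustrate the mix-in idea, here is a small sketch with two hypothetical classes. Version defines only <=> and gains the comparison operators from Comparable; ReleaseList defines only each and gains methods such as max and sort from Enumerable.

```ruby
# Version and ReleaseList are hypothetical classes for illustration.
class Version
  include Comparable

  attr_reader :major, :minor

  def initialize(major, minor)
    @major = major
    @minor = minor
  end

  # Comparable builds <, <=, ==, >, >= and between? on top of this.
  def <=>(other)
    [major, minor] <=> [other.major, other.minor]
  end
end

class ReleaseList
  include Enumerable

  def initialize(*versions)
    @versions = versions
  end

  # Enumerable builds map, sort, max, etc. on top of this.
  def each(&block)
    @versions.each(&block)
  end
end

releases = ReleaseList.new(Version.new(1, 10), Version.new(1, 2))
releases.max.minor
releases.sort.map(&:minor)
```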
Module Definition
A module is created using the module keyword:
module MyModule
# ...
end

A module may be reopened any number of times to add, change or remove functionality:

module MyModule
def my_method
end
end

module MyModule
alias my_alias my_method
end

module MyModule
remove_method :my_method
end

Reopening classes is a very powerful feature of Ruby, but it is best to only reopen classes you own.
Reopening classes you do not own may lead to naming conflicts or difficult-to-diagnose bugs.

Nesting
Modules may be nested:

module Outer
module Inner
end
end

Many packages create a single outermost module (or class) to provide a namespace for their functionality.
You may also define inner modules using :: provided the outer modules (or classes) are already defined:

module Outer::Inner::GrandChild
end

Note that this will raise a NameError if Outer and Outer::Inner are not already defined.
This style has the benefit of allowing the author to reduce the amount of indentation. Instead of 3 levels of
indentation only one is necessary. However, the scope of constant lookup is different for creating a
namespace using this syntax instead of the more verbose syntax.

Scope
self
self refers to the object that defines the current scope. self will change when entering a different method or
when defining a new module.

Constants
Accessible constants are different depending on the module nesting (which syntax was used to define the
module). In the following example the constant A::Z is accessible from B as A is part of the nesting:

module A
Z=1

module B
p Module.nesting #=> [A::B, A]
p Z #=> 1
end
end

However, if you use :: to define A::B without nesting it inside A, a NameError exception will be raised
because the nesting does not include A:

module A
Z=1
end

module A::B
p Module.nesting #=> [A::B]
p Z #=> raises NameError
end

If a constant is defined at the top-level you may precede it with :: to reference it:

Z=0

module A
Z=1

module B
p ::Z #=> 0
end
end

Methods
For method definition documentation see the syntax documentation for methods.
Class methods may be called directly. (This is slightly confusing, but a method on a module is often called a
“class method” instead of a “module method”. See also Module#module_function which can convert an
instance method into a class method.)
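
As a brief sketch of Module#module_function (Util and double are hypothetical names), the method becomes callable directly on the module, while the instance-method copy that includers receive is private:

```ruby
# Util is a hypothetical module used only for illustration.
module Util
  def double(x)
    x * 2
  end

  # Makes double callable as Util.double.
  module_function :double
end

Util.double(21)
```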
When a class method references a constant, it uses the same rules as referencing it outside the method as
the scope is the same.
Instance methods defined in a module are only callable when included. These methods have access to the
constants defined when they were included through the ancestors list:

module A
Z=1

def z
Z
end
end

include A

p self.class.ancestors #=> [Object, A, Kernel, BasicObject]


p z #=> 1

Visibility
Ruby has three types of visibility. The default is public. A public method may be called from any other
object.
The second visibility is protected. When calling a protected method the sender must inherit
the Class or Module which defines the method. Otherwise a NoMethodError will be raised.
Protected visibility is most frequently used to define == and other comparison methods where the author
does not wish to expose an object’s state to any caller and would like to restrict it only to inherited classes.
Here is an example:

class A
  def n(other)
    other.m
  end
end

class B < A
  def m
    1
  end

  protected :m
end

class C < B
end

a = A.new
b = B.new
c = C.new

c.n b #=> 1 -- C is a subclass of B
b.n b #=> 1 -- m called on defining class
a.n b # raises NoMethodError -- A is not a subclass of B
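
The comparison use case mentioned above might look like the following sketch. Account is a hypothetical class: its balance reader is protected, so == can read another account's balance while outside callers cannot.

```ruby
# Account is a hypothetical class used only for illustration.
class Account
  def initialize(balance)
    @balance = balance
  end

  # other.balance is allowed here because self is also an Account.
  def ==(other)
    other.is_a?(Account) && balance == other.balance
  end

  protected

  attr_reader :balance
end

Account.new(5) == Account.new(5)
Account.new(5) == Account.new(7)
```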

The third visibility is private. A private method may only be called from inside the owner class without a
receiver, or with a literal self as a receiver. If a private method is called with a receiver other than a
literal self, a NoMethodError will be raised.

class A
def without
m
end

def with_self
self.m
end

def with_other
A.new.m
end

def with_renamed
copy = self
copy.m
end

def m
1
end

private :m
end

a = A.new
a.without #=> 1
a.with_self #=> 1
a.with_other # NoMethodError (private method `m' called for #<A:0x0000559c287f27d0>)
a.with_renamed # NoMethodError (private method `m' called for #<A:0x0000559c285f8330>)

alias and undef


You may also alias or undefine methods, but these operations are not restricted to modules or classes. See
the miscellaneous syntax section for documentation.

Classes
Every class is also a module, but unlike modules a class may not be mixed-in to another module (or class).
Like a module, a class can be used as a namespace. A class also inherits methods and constants from its
superclass.

Defining a class
Use the class keyword to create a class:

class MyClass
# ...
end

If you do not supply a superclass your new class will inherit from Object. You may inherit from a different
class using < followed by a class name:

class MySubclass < MyClass
  # ...
end

There is a special class BasicObject which is designed as a blank class and includes a minimum of built-in
methods. You can use BasicObject to create an independent inheritance structure. See
the BasicObject documentation for further details.

Inheritance
Any method defined on a class is callable from its subclass:

class A
Z=1

def z
Z
end
end

class B < A
end
p B.new.z #=> 1

The same is true for constants:

class A
Z=1
end

class B < A
def z
Z
end
end

p B.new.z #=> 1

You can override the functionality of a superclass method by redefining the method:

class A
def m
1
end
end

class B < A
def m
2
end
end

p B.new.m #=> 2

If you wish to invoke the superclass functionality from a method use super:

class A
def m
1
end
end

class B < A
def m
2 + super
end
end

p B.new.m #=> 3

When used without any arguments super uses the arguments given to the subclass method. To send no
arguments to the superclass method use super(). To send specific arguments to the superclass method
provide them manually like super(2).
super may be called as many times as you like in the subclass method.
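
The three forms of super can be compared in one method. Base and Derived in this sketch are hypothetical classes; the superclass method has default values so that each call is valid.

```ruby
# Base and Derived are hypothetical classes for illustration.
class Base
  def describe(a = :default, b = :default)
    [a, b]
  end
end

class Derived < Base
  def describe(a, b)
    [
      super,    # forwards a and b as received
      super(),  # sends no arguments
      super(a)  # sends only a
    ]
  end
end

Derived.new.describe(1, 2)
#=> [[1, 2], [:default, :default], [1, :default]]
```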
Singleton Classes
The singleton class (also known as the metaclass or eigenclass) of an object is a class that holds methods
for only that instance. You can access the singleton class of an object using class << object like this:

class C
end

class << C
# self is the singleton class here
end

Most frequently you’ll see the singleton class accessed like this:

class C
class << self
# ...
end
end

This allows definition of methods and attributes on a class (or module) without needing to write def
self.my_method.
Since you can open the singleton class of any object this means that this code block:

o = Object.new

def o.my_method
1+1
end

is equivalent to this code block:

o = Object.new

class << o
def my_method
1+1
end
end

Both objects will have a my_method that returns 2.
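
The equivalence can be checked with Object#singleton_methods. The two builder methods in this sketch are hypothetical helpers that wrap the two definition styles.

```ruby
# Hypothetical helpers wrapping the two definition styles.
def build_with_def
  o = Object.new
  def o.my_method
    1 + 1
  end
  o
end

def build_with_singleton_class
  o = Object.new
  class << o
    def my_method
      1 + 1
    end
  end
  o
end

build_with_def.singleton_methods             # lists :my_method
build_with_singleton_class.singleton_methods # lists :my_method
Object.new.respond_to?(:my_method)           # other objects are unaffected
```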

Pattern matching
Pattern matching is a feature allowing deep matching of structured values: checking the structure and
binding the matched parts to local variables.
Pattern matching in Ruby is implemented with the case/in expression:

case <expression>
in <pattern1>
  ...
in <pattern2>
  ...
in <pattern3>
  ...
else
  ...
end

(Note that in and when branches can NOT be mixed in one case expression.)
Or with the => operator and the in operator, which can be used in a standalone expression:

<expression> => <pattern>

<expression> in <pattern>
The case/in expression is exhaustive: if the value of the expression does not match any branch of
the case expression (and the else branch is absent), NoMatchingPatternError is raised.
Therefore, the case expression might be used for conditional matching and unpacking:

config = {db: {user: 'admin', password: 'abc123'}}

case config
in db: {user:} # matches subhash and puts matched value in variable user
puts "Connect with user '#{user}'"
in connection: {username: }
puts "Connect with user '#{username}'"
else
puts "Unrecognized structure of config"
end
# Prints: "Connect with user 'admin'"

whilst the => operator is most useful when the expected data structure is known beforehand, to just unpack
parts of it:

config = {db: {user: 'admin', password: 'abc123'}}

config => {db: {user:}} # will raise if the config's structure is unexpected

puts "Connect with user '#{user}'"
# Prints: "Connect with user 'admin'"
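
When no branch matches and there is no else, the resulting NoMatchingPatternError can be rescued like any other exception. A minimal sketch, with db_user as a hypothetical method name:

```ruby
# db_user is a hypothetical helper used only for illustration.
def db_user(config)
  case config
  in {db: {user:}}
    user
  end
rescue NoMatchingPatternError
  # Reached when config does not have the expected structure.
  :no_match
end

db_user({db: {user: 'admin'}})
db_user({connection: {}})
```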

<expression> in <pattern> is the same as case <expression>; in <pattern>; true; else false; end. You can
use it when you only want to know if a pattern has been matched or not:

users = [{name: "Alice", age: 12}, {name: "Bob", age: 23}]
users.any? {|user| user in {name: /B/, age: 20..} } #=> true

See below for more examples and explanations of the syntax.

Patterns
Patterns can be:
 any Ruby object (matched by the === operator, like in when); (Value pattern)
 array pattern: [<subpattern>, <subpattern>, <subpattern>, ...]; (Array pattern)
 find pattern: [*variable, <subpattern>, <subpattern>, <subpattern>, ..., *variable]; (Find pattern)
 hash pattern: {key: <subpattern>, key: <subpattern>, ...}; (Hash pattern)
 combination of patterns with |; (Alternative pattern)
 variable capture: <pattern> => variable or variable; (As pattern, Variable pattern)
Any pattern can be nested inside array/find/hash patterns where <subpattern> is specified.
Array patterns and find patterns match arrays, or objects that respond to deconstruct (see below about the
latter). Hash patterns match hashes, or objects that respond to deconstruct_keys (see below about the
latter). Note that only symbol keys are supported for hash patterns.
An important difference between array and hash pattern behavior is that array patterns match only a whole array:

case [1, 2, 3]
in [Integer, Integer]
"matched"
else
"not matched"
end
#=> "not matched"

while the hash matches even if there are other keys besides the specified part:

case {a: 1, b: 2, c: 3}
in {a: Integer}
"matched"
else
"not matched"
end
#=> "matched"

{} is the only exception to this rule. It matches only if an empty hash is given:

case {a: 1, b: 2, c: 3}
in {}
"matched"
else
"not matched"
end
#=> "not matched"

case {}
in {}
"matched"
else
"not matched"
end
#=> "matched"

There is also a way to specify there should be no other keys in the matched hash except those explicitly
specified by the pattern, with **nil:

case {a: 1, b: 2}
in {a: Integer, **nil} # this will not match the pattern having keys other than a:
"matched a part"
in {a: Integer, b: Integer, **nil}
"matched a whole"
else
"not matched"
end
#=> "matched a whole"

Both array and hash patterns support “rest” specification:

case [1, 2, 3]
in [Integer, *]
"matched"
else
"not matched"
end
#=> "matched"

case {a: 1, b: 2, c: 3}
in {a: Integer, **}
"matched"
else
"not matched"
end
#=> "matched"

The enclosing brackets or braces around both kinds of patterns can be omitted:

case [1, 2]
in Integer, Integer
"matched"
else
"not matched"
end
#=> "matched"

case {a: 1, b: 2, c: 3}
in a: Integer
"matched"
else
"not matched"
end
#=> "matched"

[1, 2] => a, b
[1, 2] in a, b
{a: 1, b: 2, c: 3} => a:
{a: 1, b: 2, c: 3} in a:

Find pattern is similar to array pattern but it can be used to check if the given object has any elements that
match the pattern:

case ["a", 1, "b", "c", 2]
in [*, String, String, *]
  "matched"
else
  "not matched"
end
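
The splats and sub-patterns of a find pattern can also bind variables. The sketch below (first_string_pair is a hypothetical method name, and Ruby 3.0 or later is assumed) captures the first pair of adjacent strings together with the elements before and after it.

```ruby
# first_string_pair is a hypothetical helper for illustration.
def first_string_pair(list)
  case list
  in [*before, String => first, String => second, *after]
    { before: before, pair: [first, second], after: after }
  else
    nil
  end
end

first_string_pair(["a", 1, "b", "c", 2])
#=> {:before=>["a", 1], :pair=>["b", "c"], :after=>[2]}
```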

Variable binding
Besides deep structural checks, one of the very important features of the pattern matching is the binding of
the matched parts to local variables. The basic form of binding is just specifying => variable_name after the
matched (sub)pattern (one might find this similar to storing exceptions in local variables in a rescue
ExceptionClass => var clause):

case [1, 2]
in Integer => a, Integer
"matched: #{a}"
else
"not matched"
end
#=> "matched: 1"

case {a: 1, b: 2, c: 3}
in a: Integer => m
"matched: #{m}"
else
"not matched"
end
#=> "matched: 1"

If no additional check is required, for only binding some part of the data to a variable, a simpler form could
be used:

case [1, 2]
in a, Integer
"matched: #{a}"
else
"not matched"
end
#=> "matched: 1"

case {a: 1, b: 2, c: 3}
in a: m
"matched: #{m}"
else
"not matched"
end
#=> "matched: 1"

For hash patterns, an even simpler form exists: key-only specification (without any sub-pattern) binds the
local variable with the key’s name, too:

case {a: 1, b: 2, c: 3}
in a:
"matched: #{a}"
else
"not matched"
end
#=> "matched: 1"

Binding works for nested patterns as well:

case {name: 'John', friends: [{name: 'Jane'}, {name: 'Rajesh'}]}
in name:, friends: [{name: first_friend}, *]
  "matched: #{first_friend}"
else
  "not matched"
end
#=> "matched: Jane"

The “rest” part of a pattern also can be bound to a variable:

case [1, 2, 3]
in a, *rest
"matched: #{a}, #{rest}"
else
"not matched"
end
#=> "matched: 1, [2, 3]"

case {a: 1, b: 2, c: 3}
in a:, **rest
"matched: #{a}, #{rest}"
else
"not matched"
end
#=> "matched: 1, {:b=>2, :c=>3}"

Binding to variables currently does NOT work for alternative patterns joined with |:

case {a: 1, b: 2}
in {a: } | Array
  "matched: #{a}"
else
  "not matched"
end
# SyntaxError (illegal variable in alternative pattern (a))

Variables that start with _ are the only exceptions to this rule:

case {a: 1, b: 2}
in {a: _, b: _foo} | Array
"matched: #{_}, #{_foo}"
else
"not matched"
end
# => "matched: 1, 2"

It is, though, not advised to reuse the bound value, as this pattern’s goal is to signify a discarded value.
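
Alternative patterns remain useful when no binding is needed. In this sketch, classify is a hypothetical method that groups values by shape using | only.

```ruby
# classify is a hypothetical helper used only for illustration.
def classify(value)
  case value
  in Integer | Float
    :number
  in String | Symbol
    :text
  in [] | {}
    :empty
  else
    :other
  end
end

classify(1.5)
classify(:name)
classify([])
classify([1])
```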

Variable pinning
Due to the variable binding feature, an existing local variable cannot be used directly as a
sub-pattern:

expectation = 18

case [1, 2]
in expectation, *rest
"matched. expectation was: #{expectation}"
else
"not matched. expectation was: #{expectation}"
end
# expected: "not matched. expectation was: 18"
# real: "matched. expectation was: 1" -- local variable just rewritten

For this case, the pin operator ^ can be used, to tell Ruby “just use this value as part of the pattern”:

expectation = 18
case [1, 2]
in ^expectation, *rest
"matched. expectation was: #{expectation}"
else
"not matched. expectation was: #{expectation}"
end
#=> "not matched. expectation was: 18"

One important usage of variable pinning is specifying that the same value should occur in the pattern
several times:

jane = {school: 'high', schools: [{id: 1, level: 'middle'}, {id: 2, level: 'high'}]}
john = {school: 'high', schools: [{id: 1, level: 'middle'}]}

case jane
in school:, schools: [*, {id:, level: ^school}] # select the last school, level should match
"matched. school: #{id}"
else
"not matched"
end
#=> "matched. school: 2"

case john # the specified school level is "high", but last school does not match
in school:, schools: [*, {id:, level: ^school}]
"matched. school: #{id}"
else
"not matched"
end
#=> "not matched"

In addition to pinning local variables, you can also pin instance, global, and class variables:

$gvar = 1
class A
@ivar = 2
@@cvar = 3
case [1, 2, 3]
in ^$gvar, ^@ivar, ^@@cvar
"matched"
else
"not matched"
end
#=> "matched"
end

You can also pin the result of arbitrary expressions using parentheses:

a=1
b=2
case 3
in ^(a + b)
"matched"
else
"not matched"
end
#=> "matched"

Matching non-primitive objects: deconstruct and deconstruct_keys


As already mentioned above, besides literal arrays and hashes, array, find, and hash patterns will try to
match any object implementing deconstruct (for array/find patterns) or deconstruct_keys (for hash patterns).

class Point
def initialize(x, y)
@x, @y = x, y
end

def deconstruct
puts "deconstruct called"
[@x, @y]
end

def deconstruct_keys(keys)
puts "deconstruct_keys called with #{keys.inspect}"
{x: @x, y: @y}
end
end

case Point.new(1, -2)
in px, Integer # sub-patterns and variable binding works
  "matched: #{px}"
else
  "not matched"
end
# prints "deconstruct called"
#=> "matched: 1"

case Point.new(1, -2)
in x: 0.. => px
  "matched: #{px}"
else
  "not matched"
end
# prints: deconstruct_keys called with [:x]
#=> "matched: 1"

keys are passed to deconstruct_keys to leave room for optimization in the matched class: if calculating
a full hash representation is expensive, one may calculate only the necessary subhash. When
the **rest pattern is used, nil is passed as the keys value:

case Point.new(1, -2)
in x: 0.. => px, **rest
  "matched: #{px}"
else
  "not matched"
end
# prints: deconstruct_keys called with nil
#=> "matched: 1"
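
One way a class might take advantage of the keys argument is sketched below. LazyRecord, KNOWN_KEYS, expensive_lookup, and the helper methods are all hypothetical; the class records which keys it actually computed so the effect is visible.

```ruby
# LazyRecord is a hypothetical class used only for illustration.
class LazyRecord
  KNOWN_KEYS = %i[id name].freeze

  attr_reader :computed

  def initialize
    @computed = []
  end

  def deconstruct_keys(keys)
    # keys is nil when the pattern uses **rest: return everything.
    wanted = keys ? (keys & KNOWN_KEYS) : KNOWN_KEYS
    wanted.each_with_object({}) do |key, hash|
      @computed << key
      hash[key] = expensive_lookup(key)
    end
  end

  private

  # Stand-in for a costly computation.
  def expensive_lookup(key)
    key == :id ? 1 : "record"
  end
end

def record_id(record)
  case record
  in {id: Integer => id}
    id
  else
    nil
  end
end

def record_rest(record)
  case record
  in {id:, **rest}
    rest
  else
    nil
  end
end
```

After record_id runs, only :id has been computed; record_rest forces every known key to be computed because nil is passed.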

Additionally, when matching custom classes, the expected class can be specified as part of the pattern and
is checked with ===:

class SuperPoint < Point
end

case Point.new(1, -2)
in SuperPoint(x: 0.. => px)
  "matched: #{px}"
else
  "not matched"
end
#=> "not matched"

case SuperPoint.new(1, -2)
in SuperPoint[x: 0.. => px] # [] or () parentheses are allowed
  "matched: #{px}"
else
  "not matched"
end
#=> "matched: 1"

Guard clauses
if can be used to attach an additional condition (guard clause) when the pattern matches. This condition
may use bound variables:

case [1, 2]
in a, b if b == a*2
"matched"
else
"not matched"
end
#=> "matched"

case [1, 1]
in a, b if b == a*2
"matched"
else
"not matched"
end
#=> "not matched"

unless works, too:

case [1, 1]
in a, b unless b == a*2
"matched"
else
"not matched"
end
#=> "matched"

Current feature status


As of Ruby 3.1, find patterns are considered experimental: their syntax can change in the future. Every time
you use these features in code, a warning will be printed:

[0] => [*, 0, *]
# warning: Find pattern is experimental, and the behavior may change in future versions of Ruby!
# warning: One-line pattern matching is experimental, and the behavior may change in future versions of Ruby!

To suppress this warning, one may use the Warning::[]= method:

Warning[:experimental] = false
eval('[0] => [*, 0, *]')
# ...no warning printed...

Note that pattern-matching warnings are raised at compile time, so this will not suppress the warning:

# By the time the next line is evaluated, parsing has already happened and the warning has been emitted
Warning[:experimental] = false
[0] => [*, 0, *]

So, only subsequently loaded files or eval-ed code is affected by switching the flag.
Alternatively, the command line option -W:no-experimental can be used to turn off “experimental” feature
warnings.

Appendix A. Pattern syntax


Approximate syntax is:

pattern: value_pattern
       | variable_pattern
       | alternative_pattern
       | as_pattern
       | array_pattern
       | find_pattern
       | hash_pattern

value_pattern: literal
             | Constant
             | ^local_variable
             | ^instance_variable
             | ^class_variable
             | ^global_variable
             | ^(expression)

variable_pattern: variable

alternative_pattern: pattern | pattern | ...

as_pattern: pattern => variable

array_pattern: [pattern, ..., *variable]
             | Constant(pattern, ..., *variable)
             | Constant[pattern, ..., *variable]

find_pattern: [*variable, pattern, ..., *variable]
            | Constant(*variable, pattern, ..., *variable)
            | Constant[*variable, pattern, ..., *variable]

hash_pattern: {key: pattern, key:, ..., **variable}
            | Constant(key: pattern, key:, ..., **variable)
            | Constant[key: pattern, key:, ..., **variable]


Appendix B. Some undefined behavior examples
To leave room for optimization in the future, the specification contains some undefined behavior.
Use of a variable in an unmatched pattern:

case [0, 1]
in [a, 2]
"not matched"
in b
"matched"
in c
"not matched"
end
a #=> undefined
c #=> undefined

Number of deconstruct, deconstruct_keys method calls:

$i = 0
ary = [0]
def ary.deconstruct
$i += 1
self
end
case ary
in [0, 1]
"not matched"
in [0]
"matched"
end
$i #=> undefined

Precedence
From highest to lowest, this is the precedence table for Ruby. High precedence operations happen before
low precedence operations.

!, ~, unary +

**

unary -

*, /, %

+, -

<<, >>

&

|, ^

>, >=, <, <=

<=>, ==, ===, !=, =~, !~

&&

||

.., ...

?, :

modifier-rescue

=, +=, -=, etc.

defined?

not

or, and

modifier-if, modifier-unless, modifier-while, modifier-until

{ } blocks

Unary + and unary - are for +1, -1 or -(a + b).


Modifier-if, modifier-unless, etc. are for the modifier versions of those keywords. For example, this is a
modifier-unless statement:

a += 1 unless a.zero?

Note that (a if b rescue c) is parsed as ((a if b) rescue c) due to reasons not related to precedence.
See modifier statements.

{ ... } blocks have priority below all listed operations, but do ... end blocks have lower priority. All other words
in the precedence table above are keywords.
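
A few consequences of the table can be checked directly. The sketch below wraps them in a hypothetical precedence_demo method; the comments show how each expression is grouped.

```ruby
# precedence_demo is a hypothetical helper used only for illustration.
def precedence_demo
  a = false || true # || binds tighter than =, so a = (false || true)
  c = nil
  c = false or true # or binds looser than =, so (c = false) or true

  {
    a: a,
    c: c,
    power: -2 ** 2,                 # ** is above unary -, so -(2 ** 2)
    arithmetic: 2 + 3 * 4,          # * is above +, so 2 + (3 * 4)
    bang: (!true && false),         # ! is above &&, so (!true) && false
    not_word: (not true && false)   # not is below &&, so not (true && false)
  }
end

precedence_demo
```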
Refinements
Due to Ruby’s open classes you can redefine or add functionality to existing classes. This is called a
“monkey patch”. Unfortunately the scope of such changes is global. All users of the monkey-patched class
see the same changes. This can cause unintended side-effects or breakage of programs.
Refinements are designed to reduce the impact of monkey patching on other users of the monkey-patched
class. Refinements provide a way to extend a class locally. Refinements can modify both classes and
modules.
Here is a basic refinement:

class C
def foo
puts "C#foo"
end
end

module M
refine C do
def foo
puts "C#foo in M"
end
end
end

First, a class C is defined. Next a refinement for C is created using Module#refine.


Module#refine creates an anonymous module that contains the changes or refinements to the class (C in
the example). self in the refine block is this anonymous module, similar to Module#module_eval.
Activate the refinement with using:

using M

c = C.new

c.foo # prints "C#foo in M"

Scope
You may activate refinements at top-level, and inside classes and modules. You may not activate
refinements in method scope. Refinements are activated until the end of the current class or module
definition, or until the end of the current file if used at the top-level.
You may activate refinements in a string passed to Kernel#eval. Refinements are active until the end of the
eval string.
Refinements are lexical in scope. Refinements are only active within a scope after the call to using. Any
code before the using statement will not have the refinement activated.
When control is transferred outside the scope, the refinement is deactivated. This means that if you require
or load a file or call a method that is defined outside the current scope the refinement will be deactivated:

class C
end

module M
refine C do
def foo
puts "C#foo in M"
end
end
end

def call_foo(x)
x.foo
end

using M

x = C.new
x.foo # prints "C#foo in M"
call_foo(x) #=> raises NoMethodError
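
The same lexical rule can be seen when using is called inside a module body. In this sketch, Shout and Announcer are hypothetical modules: the refined method is visible only to code written inside Announcer after the using call.

```ruby
# Shout and Announcer are hypothetical modules for illustration.
module Shout
  refine String do
    def shout
      upcase + "!"
    end
  end
end

module Announcer
  using Shout # active until the end of this module definition

  def self.announce(text)
    text.shout # defined where the refinement is active
  end
end

Announcer.announce("hi")  # uses the refined String#shout
"hi".respond_to?(:shout)  # the refinement is not visible out here
```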

If a method is defined in a scope where a refinement is active, the refinement will be active when the
method is called. This example spans multiple files:
c.rb:

class C
end

m.rb:

require "c"

module M
refine C do
def foo
puts "C#foo in M"
end
end
end

m_user.rb:
require "m"

using M

class MUser
def call_foo(x)
x.foo
end
end

main.rb:

require "m_user"

x = C.new
m_user = MUser.new
m_user.call_foo(x) # prints "C#foo in M"
x.foo #=> raises NoMethodError

Since the refinement M is active in m_user.rb where MUser#call_foo is defined it is also active
when main.rb calls call_foo.
Since using is a method, refinements are only active when it is called. Here are examples of where a
refinement M is and is not active.
In a file:

# not activated here
using M
# activated here
class Foo
# activated here
def foo
# activated here
end
# activated here
end
# activated here

In a class:

# not activated here
class Foo
# not activated here
def foo
# not activated here
end
using M
# activated here
def bar
# activated here
end
# activated here
end
# not activated here

Note that the refinements in M are not activated automatically if the class Foo is reopened later.
In eval:

# not activated here
eval <<EOF
# not activated here
using M
# activated here
EOF
# not activated here

When not evaluated:

# not activated here
if false
using M
end
# not activated here

When defining multiple refinements in the same module inside multiple refine blocks, all refinements from
the same module are active when a refined method (any of the to_json methods from the example below) is
called:

module ToJSON
refine Integer do
def to_json
to_s
end
end

refine Array do
def to_json
"[" + map { |i| i.to_json }.join(",") + "]"
end
end

refine Hash do
def to_json
"{" + map { |k, v| k.to_s.dump + ":" + v.to_json }.join(",") + "}"
end
end
end

using ToJSON

p [{1=>2}, {3=>4}].to_json # prints "[{\"1\":2},{\"3\":4}]"

Method Lookup
When looking up a method for an instance of class C Ruby checks:
 If refinements are active for C, in the reverse order they were activated:
o The prepended modules from the refinement for C
o The refinement for C
o The included modules from the refinement for C
 The prepended modules of C
 C
 The included modules of C
If no method was found at any point this repeats with the superclass of C.
Note that methods in a subclass have priority over refinements in a superclass. For example, if the
method / is defined in a refinement for Numeric, 1 / 2 invokes the original Integer#/ because Integer is a
subclass of Numeric and is searched before the refinements for the superclass Numeric. Since the
method / is also present in the child class Integer, the method lookup does not move up to the superclass.
However, if a method foo is defined on Numeric in a refinement, 1.foo invokes that method since foo does
not exist on Integer.
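
The Numeric example above can be written out as a short sketch. NumericRefinement and LookupDemo are hypothetical module names, and the symbols returned by the refined methods are placeholders.

```ruby
# NumericRefinement and LookupDemo are hypothetical names.
module NumericRefinement
  refine Numeric do
    def /(other)
      :refined_division
    end

    def foo
      :refined_foo
    end
  end
end

module LookupDemo
  using NumericRefinement

  def self.results
    {
      division: 1 / 2, # Integer#/ is found before Numeric is searched
      foo: 1.foo       # Integer has no foo, so the refinement is used
    }
  end
end

LookupDemo.results
```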
super
When super is invoked method lookup checks:
 The included modules of the current class. Note that the current class may be a refinement.

 If the current class is a refinement, the method lookup proceeds as in the Method Lookup section
above.
 If the current class has a direct superclass, the method lookup proceeds as in the Method Lookup section
above using the superclass.
Note that super in a method of a refinement invokes the method in the refined class even if there is another
refinement which has been activated in the same context. This is only true for super in a method of a
refinement; it does not apply to super in a method in a module that is included in a refinement.
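
As a rough illustration of the note above, the following sketch activates two refinements of the same class and calls super from one of them. The class Greeter and the modules Polite, Loud, and SuperDemo are hypothetical names chosen for this example.

```ruby
# Hypothetical class and refinements, used only to illustrate super.
class Greeter
  def greet
    "hello"
  end
end

module Polite
  refine Greeter do
    def greet
      "hello, please"
    end
  end
end

module Loud
  refine Greeter do
    def greet
      # super goes to Greeter#greet in the refined class,
      # not to the greet defined in the Polite refinement.
      super.upcase + "!"
    end
  end
end

module SuperDemo
  using Polite
  using Loud # activated last, so it is consulted first

  RESULT = Greeter.new.greet
end

p SuperDemo::RESULT # prints "HELLO!"
```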

Methods Introspection
When using introspection methods such as Kernel#method or Kernel#methods, refinements are not
honored.
This behavior may be changed in the future.
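
A minimal sketch of this, using Kernel#methods only. The names Shout and IntrospectionDemo are hypothetical. Kernel#method is left out of the sketch because its handling of refinements may depend on the Ruby version in use.

```ruby
# Hypothetical refinement used only to illustrate introspection.
module Shout
  refine String do
    def shout
      upcase + "!"
    end
  end
end

module IntrospectionDemo
  using Shout

  CALLED = "hi".shout                    # the refinement is active for direct calls
  LISTED = "hi".methods.include?(:shout) # Kernel#methods does not list it
end

p IntrospectionDemo::CALLED # prints "HI!"
p IntrospectionDemo::LISTED # prints false
```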
Refinement inheritance by Module#include
When a module X is included into a module Y, Y inherits refinements from X.
For example, C inherits refinements from A and B in the following code:

module A
  refine X do ... end
  refine Y do ... end
end

module B
  refine Z do ... end
end

module C
  include A
  include B
end

using C
# Refinements in A and B are activated here.

Refinements in descendants have higher precedence than those of ancestors.

Further Reading
See bugs.ruby-lang.org/projects/ruby-master/wiki/RefinementsSpec for the current specification for
implementing refinements. The specification also contains more details.

Timezones
Timezone Specifiers
Certain Time methods accept arguments that specify timezones:
 Time.at: keyword argument in:.
 Time.new: positional argument zone or keyword argument in:.
 Time.now: keyword argument in:.
 Time#getlocal: positional argument zone.
 Time#localtime: positional argument zone.
The value given with any of these must be one of the following (each detailed below):
 Hours/minutes offset.
 Single-letter offset.
 Integer offset.
 Timezone object.
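
The following sketch passes a zone value to several of the methods listed above. The variable names are arbitrary, and Time.now is omitted so that the results do not depend on the current time.

```ruby
base = Time.utc(2000, 1, 1, 20, 15, 1)

# Time.at with keyword argument in: (hours/minutes offset).
by_keyword = Time.at(base, in: "+09:00")

# Time.new with positional argument zone (hours/minutes offset).
by_position = Time.new(2000, 1, 1, 20, 15, 1, "-05:00")

# Time#getlocal with positional argument zone (single-letter offset).
by_letter = base.getlocal("A")

# Time#localtime with positional argument zone (integer offset).
# It converts the receiver in place, so a copy is used here.
by_integer = base.dup.localtime(3600)

p by_keyword.utc_offset  # prints 32400
p by_position.utc_offset # prints -18000
p by_letter.utc_offset   # prints 3600
p by_integer.utc_offset  # prints 3600
```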

Hours/Minutes Offsets
The zone value may be a string offset from UTC in the form '+HH:MM' or '-HH:MM', where:
 HH is the 2-digit hour in the range 0..23.
 MM is the 2-digit minute in the range 0..59.
Examples:

t = Time.utc(2000, 1, 1, 20, 15, 1) # => 2000-01-01 20:15:01 UTC


Time.at(t, in: '-23:59') # => 1999-12-31 20:16:01 -2359
Time.at(t, in: '+23:59') # => 2000-01-02 20:14:01 +2359

Single-Letter Offsets
The zone value may be a letter in the range 'A'..'I' or 'K'..'Z'; see List of military time zones:

t = Time.utc(2000, 1, 1, 20, 15, 1) # => 2000-01-01 20:15:01 UTC


Time.at(t, in: 'A') # => 2000-01-01 21:15:01 +0100
Time.at(t, in: 'I') # => 2000-01-02 05:15:01 +0900
Time.at(t, in: 'K') # => 2000-01-02 06:15:01 +1000
Time.at(t, in: 'Y') # => 2000-01-01 08:15:01 -1200
Time.at(t, in: 'Z') # => 2000-01-01 20:15:01 UTC

Integer Offsets
The zone value may be an integer number of seconds in the range -86399..86399:

t = Time.utc(2000, 1, 1, 20, 15, 1) # => 2000-01-01 20:15:01 UTC


Time.at(t, in: -86399) # => 1999-12-31 20:15:02 -235959
Time.at(t, in: 86399) # => 2000-01-02 20:15:00 +235959
Timezone Objects
In most cases, the zone value may be an object responding to certain timezone methods.
Exceptions (timezone object not allowed):
 Time.new with positional argument zone.
 Time.now with keyword argument in:.
The timezone methods are:
 local_to_utc:
o Called when Time.new is invoked with tz as the value of positional argument zone or
keyword argument in:.
o Argument: a Time::tm object.
o Returns: a Time-like object in the UTC timezone.

 utc_to_local:
o Called when Time.at or Time.now is invoked with tz as the value for keyword
argument in:, and when Time#getlocal or Time#localtime is called with tz as the value for
positional argument zone.
o Argument: a Time::tm object.
o Returns: a Time-like object in the local timezone.

A custom timezone class may have these instance methods, which will be called if defined:
 abbr:
o Called when Time#strftime is invoked with a format involving %Z.
o Argument: a Time::tm object.
o Returns: a string abbreviation for the timezone name.

 dst?:
o Called when Time.at or Time.now is invoked with tz as the value for keyword
argument in:, and when Time#getlocal or Time#localtime is called with tz as the value for
positional argument zone.
o Argument: a Time::tm object.
o Returns: whether the time is daylight saving time.

 name:
o Called when Marshal.dump(t) is invoked.
o Argument: none.
o Returns: the string name of the timezone.
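
A minimal sketch of a timezone object with a fixed offset, assuming the methods described above. The class FixedOffsetZone and its constructor arguments are hypothetical and not part of Ruby; a real timezone class would usually compute the offset from the time it receives, and could also define dst?.

```ruby
# Hypothetical timezone class with a fixed offset; not part of Ruby itself.
class FixedOffsetZone
  attr_reader :name # optional: the string name of the timezone

  def initialize(name, abbr, offset_seconds)
    @name = name
    @abbr = abbr
    @offset = offset_seconds
  end

  # Receives a UTC time and returns the corresponding local time.
  def utc_to_local(t)
    t + @offset
  end

  # Receives a local time and returns the corresponding UTC time.
  def local_to_utc(t)
    t - @offset
  end

  # Optional: abbreviation used for %Z in Time#strftime.
  def abbr(t)
    @abbr
  end
end

zone = FixedOffsetZone.new("Asia/Kolkata", "IST", 19_800) # 5 hours 30 minutes
local = Time.at(Time.utc(2000, 1, 1, 20, 15, 1), in: zone)

p local.utc_offset        # prints 19800
p [local.hour, local.min] # prints [1, 45]
puts local.strftime("%Z") # calls zone.abbr
```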

How to build ruby using Visual C++

Requirement
1. Windows 7 or later.
2. Visual C++ 12.0 (2013) or later.
Note
If you want to build the x64 version, use the native compiler for x64.
3. Set the environment variables INCLUDE, LIB, and PATH so that the required commands run properly from
the command line.
Note
Building ruby requires the following commands:
o nmake
o cl
o ml
o lib
o dumpbin
4. If you want to build from the Git source, the following commands are required:
o bison
o patch
o sed
o ruby 2.0 or later
5. Enable Command Extensions for your command line. This is the default behavior of cmd.exe. To
enable it explicitly, run cmd.exe with the /E:ON option.

How to compile and install


1. Execute win32\configure.bat in your build directory. You can specify the target platform as an
argument. For example, run ‘configure --target=i686-mswin32’. You can also specify the install
directory. For example, run ‘configure --prefix=<install_directory>’. The default install directory
is /usr. The default PLATFORM is ‘i386-mswin32_MSRTVERSION’ on 32-bit platforms, or
‘x64-mswin64_MSRTVERSION’ on x64 platforms. MSRTVERSION is the 2- or 3-digit version of the
Microsoft Runtime Library.
2. Change RUBY_INSTALL_NAME and RUBY_SO_NAME in Makefile if you want to change the
names of the executable files. Also add RUBYW_INSTALL_NAME if you want to change the name of the
executable without a console window.
3. Run ‘nmake up’ if you are building from the Git source.
4. Run ‘nmake’
5. Run ‘nmake check’
6. Run ‘nmake install’

Icons
Any icon files (*.ico) in the build directory, in directories specified with the icondirs make variable,
and in the win32 directory under the ruby source directory will be included in DLL or executable files, according to
their base names.
$(RUBY_INSTALL_NAME).ico or ruby.ico --> $(RUBY_INSTALL_NAME).exe

$(RUBYW_INSTALL_NAME).ico or rubyw.ico --> $(RUBYW_INSTALL_NAME).exe

the others --> $(RUBY_SO_NAME).dll

Although no icons are distributed with the ruby source, you can use anything you like. You can
find many images with search engines. For example, the following are made from the Ruby logo kit:
 Small favicon in the official site
 ruby.morphball.net/vit-ruby-ico_en.html or icon itself

Build examples
 Build in the ruby source directory.
ex.)
ruby source directory: C:\ruby
build directory: C:\ruby
install directory: C:\usr\local

C:
cd \ruby
win32\configure --prefix=/usr/local
nmake
nmake check
nmake install

 Build in a directory relative to the ruby source directory.
ex.)
ruby source directory: C:\ruby
build directory: C:\ruby\mswin32
install directory: C:\usr\local

C:
cd \ruby
mkdir mswin32
cd mswin32
..\win32\configure --prefix=/usr/local
nmake
nmake check
nmake install

 Build on a different drive.
ex.)
ruby source directory: C:\src\ruby
build directory: D:\build\ruby
install directory: C:\usr\local

D:
cd D:\build\ruby
C:\src\ruby\win32\configure --prefix=/usr/local
nmake
nmake check
nmake install DESTDIR=C:


 Build x64 version (requires native x64 VC++ compiler)
ex.)
ruby source directory: C:\ruby
build directory: C:\ruby
install directory: C:\usr\local

C:
cd \ruby
win32\configure --prefix=/usr/local --target=x64-mswin64
nmake
nmake check
nmake install

Bugs
You can NOT use a path name that contains any white space characters as the ruby source directory. This
restriction comes from the behavior of the !INCLUDE directives of NMAKE.
You can build ruby in any directory, including the source directory, except the win32 directory in the source
directory. This restriction originates in the path search method of NMAKE.
