Nexmon 2017
ABSTRACT
The most widespread Wi-Fi enabled devices are smartphones. They are mobile, close to people and available in large quantities, which makes them perfect candidates for real-world wireless testbeds. Unfortunately, most smartphones contain closed-source FullMAC Wi-Fi chips that hinder the modification of lower-layer Wi-Fi mechanisms and the implementation of new algorithms. To enable researchers' access to lower-layer frame processing and advanced physical-layer functionalities on Broadcom Wi-Fi chips, we developed the Nexmon firmware patching framework. It allows users to create firmware modifications for embedded ARM processors using C code and to change the behavior of Broadcom's real-time processor using Assembly. Currently, our framework supports five Broadcom chips available in smartphones and Raspberry Pis. Our example patches enable monitor mode, frame injection, handling of ioctls, ucode compression and flashpatches. In a simple ping offloading example, we demonstrate how handling pings in firmware reduces power consumption by up to 165 mW and is nine times faster than in the kernel on a Nexus 5. Using Nexmon, researchers can unleash the full capabilities of off-the-shelf Wi-Fi devices.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
WiNTECH’17, October 20, 2017, Snowbird, UT, USA.
© 2017 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ISBN 978-1-4503-5147-8/17/10...$15.00
DOI: https://doi.org/10.1145/3131473.3131476

1 INTRODUCTION AND RELATED WORK
The widespread availability of wireless infrastructure is one of the major factors that led to the success of smartphones. Their mobility makes them a perfect candidate for mobile testbeds. Also, the Internet of things (IoT) strongly relies on wireless communication for monitoring and control applications. As a small and cheap Wi-Fi-enabled platform, the Raspberry Pi is a good candidate for experimentation in this domain. Both platforms seek low energy consumption to enhance battery life. Hence, they use FullMAC Wi-Fi chips to handle Wi-Fi-related tasks in an embedded processor that only wakes up the device’s main processor if frames need handling by an application. Unfortunately, FullMAC chips reduce the flexibility to modify Wi-Fi’s behavior in testbeds and research applications. To circumvent this limitation, researchers often employ software-defined radios (SDRs) to access lower layers, which results in testbeds such as NITOS [10] or CRC [6]. Schulz et al. connected WARP SDRs [1] to Android devices in [12] to gain access to Wi-Fi’s physical layer to change parameters such as modulation schemes and transmit powers to enhance video streaming. These modifications would also run on off-the-shelf hardware, but the blackbox nature of FullMAC chips forces researchers to either move to oversized experimental platforms or limit themselves to the capabilities of proprietary Wi-Fi firmwares as done by Eriksson et al. for their cross-layer optimizations in [3].
In this work, we introduce Nexmon [13], an open-source framework to write firmware patches in C instead of Assembly with a special focus on modifying Broadcom FullMAC Wi-Fi firmwares. Using C as the programming language allows rapid prototyping and easy porting of existing algorithms to run on the Wi-Fi chip’s embedded processor. By cleverly using linker scripts, we also managed to call functions of the original firmware similar to library functions defined in a header file. We further provide means to free multiple kilobytes of space in the original firmware to place new functionalities. Our main contributions are:
• Presentation of how processing works in Broadcom FullMAC Wi-Fi chips.
• Design and development of the Nexmon firmware patching framework with instructions to implement new functionalities.
• Evaluation of the general operation, energy consumption and delay of a ping offloading application.

Below, we first present the in-chip processing in Section 2, introduce Nexmon in Section 3, explain how testbed developers can achieve custom goals in Section 4 and then present our evaluation results in Section 5 followed by a discussion and a conclusion in Section 6 and Section 7.

2 IN-CHIP PROCESSING
As illustrated in Figure 1, all Broadcom Wi-Fi chips consist of an interface to the host (such as the secure digital input output (SDIO) interface or the peripheral component interconnect express (PCIE) bus system), a physical layer to implement the digital baseband signal processing, an analog front end to mix baseband signals up to or down from the transmission frequency, as well as a D11 core to handle real-time MAC functionalities. While SoftMAC chips handle non-time-critical functions in the Wi-Fi driver running on the host system, FullMAC chips move these responsibilities to an ARM processor embedded in the Wi-Fi chip. This reduces energy consumption, as the host’s processor only needs to wake up from a sleep state to handle application traffic. Management and control
Session: Innovative Experimentation Platforms and Methods WiNTECH17, October 20, 2017, Snowbird, UT, USA.
[Figure 1 (block diagram, image not reproduced). Blocks shown: radio front-end (mixers and amplifiers, DAC, ADC); physical layer (PHY registers; baseband with OFDM and DSSS); D11 core for real-time processing (programmable state machine (PSM) with ucode memory, shared memory, object memory, PSM registers, special purpose registers and condition registers; template RAM; receive, crypto, transmit and transmit modification engines; timer; an RX FIFO and background, best-effort, video and voice TX FIFOs); embedded processor (ARM processor, ROM, RAM holding firmware and heap, access to D11 registers, object memory, PHY registers and template RAM); host (operating system, BCMDHD or brcmfmac driver, SDIO/PCIE interface, kernel memory); plus RX and TX processing paths and DMA transfers between the blocks.]
Figure 1: For frame processing, each Broadcom Wi-Fi chip contains a D11 core with a programmable state machine to handle
real-time tasks. FullMAC chips, additionally, have an embedded processor to convert between Ethernet frames on the host
side to Wi-Fi frames on the D11 core side and handle Wi-Fi related MAC layer operations.
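
The conversion between Ethernet frames and Wi-Fi frames mentioned in the caption can be illustrated with a minimal sketch. This is not Broadcom's firmware code: the function name eth_to_dot11 and its parameters are hypothetical. It covers only the simplest case, an unencrypted non-QoS data frame sent from a station to an access point (ToDS set), and uses the standard LLC/SNAP encapsulation of the EtherType. Duration and sequence control are left at zero on the assumption that lower layers fill them in.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN   14 /* destination (6) + source (6) + EtherType (2) */
#define DOT11_HDR_LEN 24 /* frame control, duration, 3 addresses, seq. control */
#define LLC_SNAP_LEN  8  /* AA AA 03 00 00 00 + EtherType */

/* Hypothetical helper: turn an Ethernet II frame into an 802.11 data frame
 * addressed to the access point identified by bssid.
 * Returns the number of bytes written to out, or 0 on error. */
size_t eth_to_dot11(const uint8_t *eth, size_t eth_len,
                    const uint8_t bssid[6],
                    uint8_t *out, size_t out_cap)
{
    static const uint8_t llc_snap[6] = { 0xAA, 0xAA, 0x03, 0x00, 0x00, 0x00 };

    if (eth_len < ETH_HDR_LEN)
        return 0;
    size_t payload_len = eth_len - ETH_HDR_LEN;
    size_t out_len = DOT11_HDR_LEN + LLC_SNAP_LEN + payload_len;
    if (out_cap < out_len)
        return 0;

    memset(out, 0, DOT11_HDR_LEN);
    out[0] = 0x08;                 /* protocol version 0, type data, subtype 0 */
    out[1] = 0x01;                 /* ToDS flag */
    memcpy(out + 4, bssid, 6);     /* address 1: receiver (the access point) */
    memcpy(out + 10, eth + 6, 6);  /* address 2: transmitter (Ethernet source) */
    memcpy(out + 16, eth, 6);      /* address 3: final destination */

    memcpy(out + DOT11_HDR_LEN, llc_snap, 6);
    out[30] = eth[12];             /* EtherType moves into the SNAP header */
    out[31] = eth[13];
    memcpy(out + DOT11_HDR_LEN + LLC_SNAP_LEN, eth + ETH_HDR_LEN, payload_len);
    return out_len;
}
```

A 16-byte Ethernet frame (14 header bytes plus 2 payload bytes) becomes a 34-byte Wi-Fi frame here, because the 14-byte Ethernet header is replaced by a 24-byte 802.11 header and an 8-byte LLC/SNAP header.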
frames are handled in the Wi-Fi chip. Using a direct memory access (DMA) controller, the host only exchanges Ethernet frames with the Wi-Fi chip. The latter is responsible for forwarding the frame’s payload over the wireless interface using Wi-Fi headers and correct physical layer settings to reach the destination node. After preprocessing frames, the ARM firmware places them in its DMA ring buffers and triggers DMA transfers into FIFO buffers of the D11 core. There, a programmable state machine (PSM) takes over to control specialized frame processing hardware such as the transmit engine that is responsible for passing frames from the FIFOs to the physical layer. Encryption is employed in the crypto engine and frame headers are quickly rewritten in the transmit modification engine. To control these processing steps, the PSM accesses special purpose registers that influence the engines’ behaviors. Even though the ARM processor can also access these registers, it is too slow to apply changes on a per-frame basis. The PSM, instead, executes optimized code to quickly react to changing conditions. This could be a timer indicating the need for a retransmission due to a missing acknowledgment. The PSM also handles receptions. To this end, it analyzes the received bytes in real-time and decides if frames need to be dropped, forwarded to the ARM processor or if they require an acknowledgment. The latter has strict timing constraints that the PSM can meet by scheduling a transmission from the template random-access memory (RAM) a defined time after completing a frame reception. To program the PSM, we disassemble the so-called ucode, change it and reassemble it. In FullMAC chips, the ARM processor’s firmware stores a binary blob of the ucode and loads it through object memory access into the D11 core during initialization of the chip. The ARM firmware itself is split in two parts, one persistently stored in read-only memory (ROM, 640 KiB on a BCM4339¹ [2]) and one loaded into RAM (768 KiB on a BCM4339 [2]) by the BCMDHD (smartphone) or brcmfmac (Raspberry Pi) Wi-Fi driver. Using Nexmon, we can extend the firmware loaded into RAM and thereby change the chip’s internal behavior as explained in Sections 3 and 4. In the latter, we further explain how to patch the ROM using flashpatches and how to rewrite ucode.

3 INTRODUCING NEXMON
To create patches for embedded firmwares, we created Nexmon. It follows the philosophy of collecting all the information required for patching a firmware directly in the C files that also contain the patch code. To define where functions and variables (in general symbols) should be placed, we introduced a new at-attribute and targetregion-pragma that we evaluate during compilation with our plugin for the GNU compiler collection (GCC). This approach makes it possible to reuse Nexmon for patching firmwares of other systems with GCC compiler support.
In Figure 2, we present the whole firmware handling workflow. Every firmware analysis starts by extracting both RAM and ROM and analyzing them in IDA to extract address information (see Section 3.4) that either ends up in our C patch files to place symbols or in the definitions.mk file used to define addresses for patch placement and the location of binary blobs. To make space for our own patch code, we implemented ucode compression based on [8] to roughly halve the size of the ucode stored in the ARM firmware. During chip initialization we decompress the ucode directly into the D11 core’s ucode memory using an adaptation of Andrew Church’s tiny inflate library² (see Section 3.2). Between extraction and compression of the ucode, we can disassemble and extend it, as done by Schulz et al. in [11] to create a reactive jammer on smartphones. As the binary blob to initialize the template RAM is stored after the ucode, we extract it and let the linker place it directly after the compressed ucode. The space freed by ucode compression is used to store symbols that we do not explicitly place by our at-attribute. Instead, we let the linker collect them in a patch-region using our targetregion-pragma.
During compilation, our GCC plugin extracts placement information and stores it in a nexmon.pre file that Nexmon re-sorts for prioritization resulting in a nexmon2.pre file. Then, Nexmon creates linker and makefiles used to produce and embed patch binaries into the original firmware file. To call original firmware functions, we insert their signatures with a dummy function stub and placement information into the wrapper.c file. This file is

¹ Also called CYW4339 after a takeover of Broadcom’s wireless Internet of things business by Cypress in April 2016.
² Original tinflate.c file: http://achurch.org/tinflate.c
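
The space budget behind ucode compression in Section 3 can be sketched with a few lines of arithmetic. The sketch below assumes that the original firmware stores the ucode blob directly followed by the template RAM blob, and that after compression the template RAM blob is moved directly behind the compressed ucode, so that the freed bytes form one contiguous patch region. The structure and function names (region, fw_layout, plan_layout) and all addresses are hypothetical and do not come from the Nexmon sources.

```c
#include <stdint.h>

/* A contiguous address range inside the RAM firmware (hypothetical type). */
struct region {
    uint32_t start;
    uint32_t size;
};

/* Resulting placement of the three areas (hypothetical type). */
struct fw_layout {
    struct region ucode_compressed;
    struct region templateram;
    struct region patch;
};

/* Compute where the compressed ucode, the relocated template RAM blob and the
 * freed patch region end up. Returns 0 on success, -1 if compression did not
 * shrink the ucode. */
int plan_layout(uint32_t ucode_start, uint32_t ucode_orig_size,
                uint32_t ucode_comp_size, uint32_t templateram_size,
                struct fw_layout *out)
{
    if (ucode_comp_size > ucode_orig_size)
        return -1;

    out->ucode_compressed.start = ucode_start;
    out->ucode_compressed.size = ucode_comp_size;

    /* The template RAM blob follows the compressed ucode directly. */
    out->templateram.start = ucode_start + ucode_comp_size;
    out->templateram.size = templateram_size;

    /* Whatever compression saved is available for patch symbols. */
    out->patch.start = out->templateram.start + templateram_size;
    out->patch.size = ucode_orig_size - ucode_comp_size;
    return 0;
}
```

With this model, the patch region always ends where the original template RAM blob ended, so no data outside the former ucode and template RAM area is touched.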
[Figure 2 (workflow diagram, image not reproduced). Stages shown: firmware analysis (rom.clean.bin, the ROM dump without flashpatches, and rom.bin, the ROM dump with flashpatches, are handled with fpext and merged into complete_fw.bin, which is analyzed in IDA Pro as complete_fw.idb to extract addresses and structures); binary blob extraction from fw_bcmdhd.orig.bin (ucode.bin via ucodeext or dd, templateram.bin via dd, flashpatches.c via fpext, templateram.c via xxd); ucode modification (b43dasm produces ucode.asm, which is modified into ucode.modified.asm or described by ucode.*.patch files and reassembled by b43asm into ucode.new.bin); ucode compression (zlibflate and xxd produce ucode_compressed.c, taking the original ucode if no modified one exists); information storage (structs.h, definitions.mk, wrapper.c and wrapper.h); Nexmon patching process (src/*.c are compiled by gcc with the Nexmon plugin into gen/*.o and nexmon.pre; gawk produces nexmon2.pre, gen/*.ld and gen/*.mk; ld produces patch.elf; make inserts its symbols into fw_bcmdhd.orig.bin to obtain the patched fw_bcmdhd.bin).]
Figure 2: Illustration of the whole Nexmon workflow. We start by analyzing the firmware in IDA to extract address and struc-
ture information. Using this information, we extract binary blobs for replacement (templateram), modification (flashpatches)
and compression (ucode). We require the latter to attain space for firmware patches. Before compression, we can modify
the ucode to change the chip’s real-time behavior. To modify the ARM firmware, we write patches in C, link them against
firmware functions and merge the result into a new firmware.
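
The final merge step of the workflow, inserting placed patch bytes into the firmware file, boils down to translating a chip address into a file offset. The following is a minimal sketch of that idea, not the actual Nexmon build step, which the figure shows as generated make files operating on patch.elf. The function name embed_patch and the parameter ram_start (the chip address at which the first byte of the RAM firmware file is loaded) are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy patch_len bytes to the position in the firmware
 * image that corresponds to chip address target_addr.
 * Returns 0 on success, -1 if the patch would not fit into the image. */
int embed_patch(uint8_t *fw, size_t fw_len, uint32_t ram_start,
                uint32_t target_addr, const uint8_t *patch, size_t patch_len)
{
    if (target_addr < ram_start)
        return -1;                        /* address lies before the image */

    size_t offset = (size_t)(target_addr - ram_start);
    if (offset > fw_len || patch_len > fw_len - offset)
        return -1;                        /* patch would run past the image */

    memcpy(fw + offset, patch, patch_len);
    return 0;
}
```

Rejecting out-of-range addresses matters because symbols placed in ROM cannot be written into the RAM firmware file; according to Section 3.3 they are handled through flashpatches instead.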
compiled like any other C file, but the resulting binary blobs are not embedded into the patched firmware. Nevertheless, the linker knows where to find firmware functions and is able to call them from our patch code. To avoid redefinitions of all function signatures in a header file, we use the wrapper.h file that automatically removes the function stubs and only keeps the signatures. Below, we present how to handle Nexmon in general and in Section 4 we explicitly focus on extending Broadcom firmwares.

3.1 How to write patches?
To place functions or variables at arbitrary positions, we can prepend their definitions by our at-attribute:
at(0x100, "", CHIP_VER_BCM4339, FW_VER_ALL)
It takes four parameters. The first defines the target address (e.g., 0x100), the second is a string that can be set to "flashpatch" or "dummy". In wrapper.c, "dummy" is used to avoid placing function stubs into the firmware. "flashpatch" tells Nexmon to create a flashpatch that overwrites up to eight bytes in the ROM at the specified address (see Section 3.3). The other two parameters of the at-attribute restrict the use of this attribute to certain chip and firmware versions (e.g., CHIP_VER_BCM4339 for the BCM4339 and FW_VER_ALL used for symbols in ROM, whose addresses do not change according to the firmware files loaded into RAM). By prepending multiple at-attributes with different version parameters, one can write one C file and apply it to multiple platforms and firmware versions.
Besides simply overwriting a function with a patch function, we supply a set of macros to create patches based on inline Assembly
code. They are defined in the patcher.h file. Each macro expects a name as first parameter that influences how the generated symbol is called in the linker scripts. Placement is done with the at-attribute. Below, we introduce our macros:
BLPatch(name, func) and BPatch(name, func): Both create branch instructions resulting in jumps to the target function func that can either be a function name or an address. The addresses are calculated relative to the program counter. During runtime, BLPatch additionally sets the link register to the address after the created BL instruction, which makes it possible to call functions that return.
HookPatch4(name, func, inst): Calls a hook function func before calling the original function by overwriting the first four bytes of the original function with a branch instruction to an intermediate function. The latter pushes the first four registers and the link register to the stack to save them from being overwritten in the hook function func. After calling the hook function, this patch pops the saved registers from the stack and executes the instruction inst before continuing to execute the original function. The parameter inst needs to be the assembler instruction that was overwritten in the original function.
GenericPatch1/2/4(name, val): Overwrites one, two or four bytes with val in the original firmware. We can use the four-byte version to overwrite function pointers in a function table. The target function address should be increased by one to indicate the Thumb instruction set.
All symbols that we do not place explicitly using the at-attribute are collected by the linker and stored in the region defined by the targetregion-pragma. For every code file, this should be set to the patch-region that is located at the end of the original ucode blob in the firmware that was freed by ucode compression. Below, we describe how it works.

3.2 Where to embed the patch code?
Symbols that are not explicitly placed are collected in memory regions that also need placement in the firmware file at a location that is not overwritten during runtime. Most firmware files do not have such empty spaces, hence, we needed to find a way to clear space for our patches. Analyzing the firmware at runtime, we realized that certain functions and data regions are only needed during the initialization of the Wi-Fi chip. After using the data, the hndrte_reclaim function is called to free the now unused space and assign it to the heap. The largest chunk of memory is freed after writing the ucode firmware into the memory of the programmable state machine (PSM) responsible for real-time operations. Analyzing this ucode binary reveals that it can be compressed by roughly 50 percent, reducing the size from 44.7 KiB to 22.4 KiB on a BCM4339. This is free space that can be used for our firmware patch code. Hence, we integrated a ucode compression mechanism based on the deflate algorithm into our build toolchain. When the ucode should be loaded into the code memory of the PSM, we decompress it on-the-fly as implemented in the ucode_compression_code.c file whose wlc_ucode_write_compressed function we call by patching the call to wlc_ucode_write in the wlc_ucode_download function. To finally reserve the freed space for our patches, we reduced the amount of memory assigned to the heap and placed our patch binaries at the end of the former ucode region. As a side-effect, ucode compression also makes it possible to simply extend the ucode without the need to worry about its size for storing it in the ARM firmware.

3.3 How to patch read-only memory?
Besides the firmware that is loaded by the driver into the RAM of the Wi-Fi chip, the chip itself holds a part of the firmware in read-only memory (ROM). Even though it is not possible to permanently overwrite this part, a flash patching unit exists in most Broadcom chips. It overlays a number of up to eight byte long memory chunks by data defined in RAM. Reading from those patched locations delivers the overlaid data. Hence, it is possible to redirect the program flow from ROM to RAM by simply overlaying an instruction in ROM with a branch instruction (e.g., by using a BLPatch or BPatch). Internally, flash patches are defined by creating an entry in the flash patch configuration array consisting of the target address, the length of the patch and a pointer to the patch data in RAM, which is also stored in an array of eight byte long entries. As the original firmwares do not reserve space to add new flash patch configurations, we automatically extract all flash patches and store them in a flashpatch.c file using our fpext utility. During the firmware build, we reassemble the flashpatches and place them into the space freed by ucode compression. After firmware initialization this space is freed and assigned to the heap. To define a flash patch in C code, one simply uses the keyword “flashpatch” as second parameter of the “at”-attribute:
__attribute__((at(..., "flashpatch", ..., ...)))

3.4 How to analyze the firmware?
To analyze the whole firmware binary, the ROM of the Wi-Fi chip needs to be extracted. To extract a clean ROM dump without applied flashpatches, the extraction must take place before the configuration of the latter starts during runtime. To achieve this, we created firmware patches that copy the whole ROM content into the RAM directly after starting the chip (rom_extraction projects in the Nexmon repository [13]). Then, we wait in an endless loop. To avoid hanging up the driver during normal interface setup, we use dhdutil’s download function to reload the firmware on an already running Wi-Fi chip. Then, we use dhdutil’s membytes function to dump the RAM content and thereby dump the previously copied ROM contents. To analyze this binary in conjunction with a RAM firmware file, flashpatches should be applied manually to the ROM file using the fpext utility.
Equipped with RAM and ROM binaries, we can create a complete binary of the Wi-Fi firmware. To analyze this firmware and find new functions and data structures, we can use IDA Pro with the ARM Decompiler plugin. The latter creates C-like code that helps to understand the program flow and allows comparisons to other code sources such as the brcmsmac driver that contains functions similar to those in the firmware. In IDA we first make sure that the code is interpreted as ARM Thumb code in little-endian byte order. Then we start looking for strings that look like function names, find their references and name the enclosing functions accordingly. Then we compare the found function names with functions of the brcmsmac driver or binaries of the wl driver including symbol names to label more functions in the firmware binary. The brcmsmac code also helps to name function arguments
and define their types as structures to make the code more readable. Once functions are found and declared in one firmware version, we can use zynamics’s bindiff plugin for IDA Pro to find the same functions in other firmwares, even those of other chips.

3.5 How to adapt to new firmware files?
Each chip has a subdirectory (e.g., BCM4339) under the firmwares directory. Each firmware version has an individual subdirectory (e.g., 6_37_34_43) in such a chip subdirectory. Besides the firmware file (e.g., fw_bcmdhd.bin), it contains a definitions.mk file with firmware specific addresses, such as the start address and size of the original ucode. To adapt the definitions.mk file, we need to find those addresses in the new firmware mainly by comparing disassembled code patterns of an already analyzed firmware with those of the new firmware. After updating the definitions, we need to find all functions we want to call from our firmware patches. If we already have an IDA file of another firmware version, we can find functions in new firmwares by using IDA’s bindiff plugin. After that we append new “at”-attributes to function stubs in the wrapper.c file containing the addresses in the new firmware. To create a new patching project, it is best to copy one of the nexmon projects from another firmware to the newly added one and adjust all “at”-attributes to place patches at the correct locations in the new firmware file. In the next section, we present how researchers may use the extracted information to achieve goals often required in a testbed but hard to reach with unmodified FullMAC firmwares.

4 ACHIEVING TESTBED GOALS
Researchers often write firmware patches to accomplish higher goals that are not achievable with unmodified Wi-Fi firmwares. This includes the activation of monitor mode and frame injection to implement custom low-layer communication protocols in the operating system followed by a firmware implementation with reduced latencies and lower power consumption. Besides regular frame processing, Nexmon further offers direct access to the physical layer that, for example, unleashes SDR-like features to transmit arbitrary signals as done in [11]. Below, we present a selected set of goals that can be achieved, mainly focusing on the extension of frame processing capabilities and more control over frame transmission parameters.

4.1 How to handle receptions?
In the ARM processor, all frames received by the D11 core are handled in the wlc_bmac_recv function that collects them from the DMA ring buffers and passes them to the wlc_recv function. If monitor mode is active (e.g., by calling nexutil -m1), this function calls the wlc_monitor function that extracts receive statistics and writes them into the wl_rxsts structure. Then it passes both the statistics and the frame to the wl_monitor function. This is the function we hook to implement monitor mode with radiotap headers. If the Wi-Fi chip is connected to a network, the wlc_recv function also calls a chain of functions used to strip Wi-Fi headers and replace them with Ethernet headers. At the end, wl_sendup is called to initiate the transfer of the received frames to the host’s operating system. This makes wl_sendup the perfect place to implement mechanisms with the benefits of running in the firmware without the need of handling Wi-Fi headers. We use it in our experimental evaluation in Section 5.

4.2 How to perform transmissions?
If connected to a network, we can trigger the transmission of Ethernet frames, for example, after processing a received frame in wl_sendup. To this end, we call the wlc_sendpkt function. It strips the Ethernet headers, adds Wi-Fi headers and chooses physical layer parameters required to reach the destination. Responsible for actually setting those parameters is the wlc_d11hdrs_ext function that appends a d11txhdr structure to each frame before it is passed to the D11 core for transmission. To this end, frames are first enqueued with the wlc_prec_enq function and then transmitted by calling wlc_send_q. To change transmission parameters, we can place a hook at the end of the wlc_d11hdrs_ext function and change the d11txhdr structure accordingly.
To inject arbitrary frames, Nexmon offers the sendframe helper function. It can send raw 802.11 frames starting with Wi-Fi headers. For those frames, sendframe calls the light-weight wlc_sendctl function discovered by Hoffmann in [7]. It takes raw frames, adds the d11txhdr structure, enqueues frames and triggers their transmission. Additionally, sendframe can handle frames that already contain the d11txhdr structure. Then sendframe only enqueues and sends those frames. The latter option is useful to gain more control over the transmission settings by manually calling the wlc_d11hdrs_ext function to create the d11txhdr structure and then modifying its contents before calling sendframe. In any case, frames for injection either need to come from the host or need to be crafted from scratch in the firmware. For the latter, we need to create an sk_buff structure by calling pkt_buf_get_skb and fill its data section with the raw frame bytes.

4.3 How to handle retransmissions?
Retransmissions are handled by the D11 core. Whenever a transmitted frame requires an acknowledgment by the receiver, the frame is retransmitted as often as defined by the short retry limit (SRL) or the long retry limit (LRL), respectively. By default SRL is set to 6 and LRL to 7. We can change the values by using the WLC_SET_SRL and WLC_SET_LRL ioctls either with nexutil from userspace, or within the firmware by calling our set_intioctl helper function. For retransmissions, we can define up to four fallback rates on 802.11ac chips. The first is used for the first three retransmissions, the second for the fourth, the third for the fifth and the fourth for any other retransmission. To define those rates, we hooked the wlc_antsel_antcfg_get function that is called during the preparation of the d11txhdr. Using this hook, we get access to an instance of the ratesel_txparams structure that contains the rspec array to define the retransmission rates.

4.4 How to set transmit powers?
Broadcom offers the qtxpower iovar that can be set using the WLC_SET_VAR ioctl. It overwrites the transmit power for all transmitted frames. In FullMAC firmwares, this setting can only choose transmit powers smaller than the regulatory limitations. To exceed these limitations, a debugging firmware is required
that checks the txpwroverride variable. As we also want to enable arbitrary power settings in off-the-shelf firmwares, we simply nop the call to the ppr_compare_min function that calculates the minimum between user targets and the regulatory limits in the wlc_phy_txpower_recalc_target function. The value set by qtxpower is first translated into a power index that the hardware uses to set actual gains at the amplifiers automatically. To also get full control over the amplifier values, we need to deactivate hardware power control using the wlc_phy_txpwrctrl_enable_acphy function and can then abuse the wlc_phy_txcal_cleanup_acphy function to set all gains manually according to the definitions in the ac_txgain_settings structure.

4.5 What are the internal structures?
To handle the internal state of the firmware, a number of structure instances are used and passed to functions. Most of these instances are created on the heap during the initialization of the firmware. Even though they are always placed at the same positions in one firmware version, absolute references to these addresses should be avoided in the patch code as firmware patches allocating space on the heap can lead to address changes of these structures. If the location of one structure is known, we can derive the addresses of the other structures. The wlc_info structure is the main structure handling the state of the high-layer driver functionalities such as the association state. It is mainly passed to functions starting with wlc_, but not to those starting with wlc_bmac_. The latter normally expect the wlc_hw_info structure managing hardware specific states such as access to the physical layer. The above mentioned structures are independent of the operating system. The osl_info structure keeps track of using operating system resources such as those used for the creation of sk_buff instances. Even though no operating system is running on the Wi-Fi chip, Broadcom offers a minimal library with functions required to operate the Wi-Fi firmware. Another operating system specific structure is wl_info that is required by functions interacting with the operating system interface, for example, to pass frames from the firmware to the Linux kernel.

4.7 How to talk to the firmware?
For many applications, it is helpful to configure a firmware during runtime or extract information for debugging purposes. Below, we present means to directly access the chip’s memory (1), use the printf function (2), extract data through tunnels using the user datagram protocol (UDP) (3) and use ioctls to control the firmware (4). To directly access the chip’s internal memory (1), we can use the dhdutil with its membytes option. It can read from and write to arbitrary memory locations in the RAM and may also directly read the ROM on some chips. Additionally, dhdutil offers the consoledump option that dumps the internal console buffer of the firmware to which we can write by calling the printf function (2). This makes it possible to pass small amounts of textual data to the user space. To send more data, we can encapsulate it in a UDP frame (3) and send it to the broadcast Internet protocol (IP) address 255.255.255.255. Those frames are always accepted by the Linux kernel and passed on into the user space, where they can even be received by apps without root privileges. To implement this in the firmware, we first create a new sk_buff buffer and fill it with the desired data and then prepend Ethernet, IP and UDP headers using our prepend_ethernet_ipv4_udp_header helper function (that uses UDP port 5500 by default). Then, we call the xmit function of the wl device to send the frame to the host. Alternatively, to initiate transfers from the firmware, a user-space program such as nexutil can also initiate a synchronous data exchange with the firmware by calling ioctls in the firmware (4). Each ioctl contains a command number, a pointer to a buffer to exchange data and the length of this buffer. Ioctls can either only set data or set and get data back from the firmware. For the two directions, nexutil offers the two parameters -s<command_number> and -g<command_number> and may either pass integers, strings, raw data from the standard input or base64-encoded raw data to the firmware. There, ioctls are handled in the wlc_ioctl function that we hooked to check for custom ioctl command numbers and handle them in ioctl.c. To easily send back strings to the caller of a get-ioctl, we offer the argprintf function that writes strings into the ioctl buffer and handles the remaining size automatically. In our git repository [13], you can find examples for all four ways of communication as well as the sources to build firmware patches and the used utilities.
4.6 How to set channel specifications?
For some experiments, researchers need to set restricted channel 4.8 How to modify the real-time firmware?
specifications (e.g., to use channel 14). On FullMAC chips, all avail- The real-time firmware is the ucode running in the programmable
able channels are defined in the firmware and only those allowed state machine (PSM) in the D11 core. In FullMAC chips, the ARM
in the regulatory domain are selectable. These channels are also firmware contains the ucode as binary blob and loads it into the
reflected in the operating system. Hence, by patching the firm- ucode memory of the D11 core. As only seven out of eight ucode
ware, we automatically modify the channels selectable by the host bytes are actually used, some firmwares store the ucode with
system. When the list of selectable channel specifications is gener- the eighth byte omitted. To extract those firmwares, we use our
ated at chip initialization or when changing regulatory domains, ucodeext utility. For ucodes that contain the eighth byte, we simply
the wlc_valid_chanspec_ext function is called for all possible use dd to extract them from the ARM firmware. After extraction, we
channel specifications. It returns 1 for every valid selection. To use the b43-dasm disassembler contained in the b43-tools3 to disas-
activate more channels, we hook the wlc_valid_chanspec_ext semble the ucode. As illustrated in Figure 1, the PSM has access to
function and return 1 for any channel we intend to activate. This condition registers and special purpose registers (SPRs). To replace
only allows to select channels that are standardized. To further register numbers by speaking names defined in the cond.inc and
set arbitrary specifications (e.g., to activate 80 MHz bandwidth spr.inc, we use the b43-beautifier. As it is still hard to under-
in the 2.4 GHz band as demonstrated in [11]), we need to patch stand the meaning of uncommented code, we intended to analyze
the wf_chspec_malformed function to always return 0 to disable
checking for a legal set of parameters. 3 b43-tools repository: https://github.com/mbuesch/b43-tools
64
Session: Innovative Experimentation Platforms and Methods WiNTECH17, October 20, 2017, Snowbird, UT, USA.
it to figure out its meaning. To this end, Koch realized in [9] that 550 Ping handling in kernel
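The compressed ucode storage described in Section 4.8 can be illustrated with a small sketch. It is not the implementation of our ucodeext utility. It only shows the idea of storing seven bytes per instruction and restoring the eighth on extraction. The function names ucode_expand and ucode_pack are hypothetical, and the sketch assumes that the omitted byte is the last byte of each 8-byte instruction and that it is always zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UCODE_WORD_BYTES   8u /* one instruction as loaded into ucode memory */
#define UCODE_PACKED_BYTES 7u /* bytes kept per instruction when compressed */

/*
 * Expand a packed ucode image (7 bytes per instruction) into full
 * 8-byte instructions.
 * ASSUMPTION: the omitted byte is the last one of each instruction
 * and is restored as zero.
 * Returns the number of bytes written, or 0 on invalid or empty input.
 */
size_t ucode_expand(uint8_t *dst, size_t dst_len,
                    const uint8_t *src, size_t src_len)
{
    assert(dst != NULL && src != NULL);

    if (src_len % UCODE_PACKED_BYTES != 0)
        return 0; /* not a whole number of packed instructions */

    size_t count = src_len / UCODE_PACKED_BYTES;
    if (dst_len < count * UCODE_WORD_BYTES)
        return 0; /* output buffer too small */

    for (size_t i = 0; i < count; i++) {
        uint8_t *out = dst + i * UCODE_WORD_BYTES;
        memcpy(out, src + i * UCODE_PACKED_BYTES, UCODE_PACKED_BYTES);
        out[UCODE_PACKED_BYTES] = 0; /* restore the omitted eighth byte */
    }
    return count * UCODE_WORD_BYTES;
}

/*
 * Inverse operation: drop the eighth byte of every instruction.
 * Returns 0 if an eighth byte is nonzero, because dropping it would
 * lose information. dst may be partially written in that case.
 */
size_t ucode_pack(uint8_t *dst, size_t dst_len,
                  const uint8_t *src, size_t src_len)
{
    assert(dst != NULL && src != NULL);

    if (src_len % UCODE_WORD_BYTES != 0)
        return 0; /* not a whole number of instructions */

    size_t count = src_len / UCODE_WORD_BYTES;
    if (dst_len < count * UCODE_PACKED_BYTES)
        return 0; /* output buffer too small */

    for (size_t i = 0; i < count; i++) {
        const uint8_t *in = src + i * UCODE_WORD_BYTES;
        if (in[UCODE_PACKED_BYTES] != 0)
            return 0; /* eighth byte is used, cannot compress */
        memcpy(dst + i * UCODE_PACKED_BYTES, in, UCODE_PACKED_BYTES);
    }
    return count * UCODE_PACKED_BYTES;
}
```

Under these assumptions, a packed blob of 14 bytes expands to two 8-byte instructions, and packing the result yields the original 14 bytes again. The expanded output is the form that a disassembler such as b43-dasm would expect as input, while a ucode that already contains the eighth byte needs no such step.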
5 EXPERIMENTAL RESULTS
To demonstrate the benefits of modifying firmwares with Nexmon, we chose a simple ping offloading application⁴. Instead of answering ping requests in the kernel, we do it in the firmware. We chose the ping application as it is similar to forwarding frames in a mesh setup, but does not require us to modify the kernel ourselves. Additionally, handling frames in the kernel is the most efficient implementation achievable on the host’s processor running Linux.

4 Ping offloading application source code: https://nexmon.org/ping_offloading

Figure 5: The round trip time to answer pings in the firmware is deterministically low at 230 µs, while it strongly varies and is much higher in the kernel, likely due to waking up from energy-saving states. (x-axis: Targeted Number of Pings per Second)

Our setup consists of two Nexus 5 smartphones that run the rooted stock firmware version M4B30Z and are located one meter apart. Both are connected in Ad-Hoc mode on the otherwise unused channel 6 with 20 MHz bandwidth. They exchange 802.11ac frames with MCS 8, which is normally not supported in the 2.4 GHz band, but still available due to Nexmon. We disabled retransmissions and AMPDUs to send only one frame per ping request and reply. Using the Android Debug Bridge (ADB), we set up the first phone to transmit ping requests with a 1200-byte payload to the second phone. For our experiments, we increased ping intervals by a factor of 5/3, resulting in the targeted frame rates shown in our Figures 3 to 5. On the second phone, we installed our modified firmware that hooks the call to the handler function used for offloading the address resolution protocol (ARP) in the wl_sendup function, which is called after replacing Wi-Fi headers by Ethernet headers, shortly before pushing up frames to the host. Here, we check for ping requests and generate ping responses encapsulated in Ethernet frames that we send using the wlc_sendpkt function, which creates the correct Ad-Hoc Wi-Fi headers and transmits the frames. During our experiments, we can toggle this ping offloading functionality