Anki Vector
A LOVE LETTER TO THE LITTLE DUDE
AUTHOR RANDALL MAAS
OVERVIEW This book explores how the Anki Vector was realized in hardware and software.
LinkedIn: http://www.linkedin.com/pub/randall-maas/9/838/8b1
PREFACE
1. ORGANIZATION OF THIS DOCUMENT
1.1. ORDER OF DEVELOPMENT
1.2. VERSION(S)
1.3. CUSTOMIZATION AND PATCHING
1.4. CODE NAMES OR VECTOR VS VICTOR
CHAPTER 1: OVERVIEW OF VECTOR
2. OVERVIEW
2.1. COMPELLING CHARACTER
2.2. FEATURES
3. PRIVACY AND SECURITY
4. COZMO
5. ALEXA INTEGRATION
PART I: ELECTRONICS DESIGN
CHAPTER 2
CHAPTER 3
CHAPTER 4
CHAPTER 5
PART II: BASIC OPERATION
CHAPTER 6: ARCHITECTURE
16. OVERVIEW OF VECTOR’S COMMUNICATION INFRASTRUCTURE
16.1. APPLICATION SERVICES ARCHITECTURE
16.2. EMOTION MODEL, BEHAVIOUR ENGINE, ACTIONS AND ANIMATION ENGINE
17. STORAGE SYSTEM
17.1. ELECTRONIC MEDICAL RECORD (EMR)
17.2. OEM PARTITION FOR OWNER CERTIFICATES AND LOGS
18. SECURITY AND PRIVACY
18.1. ENCRYPTED COMMUNICATION
18.2. ENCRYPTED FILESYSTEM
18.3. THE OPERATING SYSTEM
18.4. AUTHENTICATION
19. CONFIGURATION AND ASSET FILES
19.1. CONFIGURATION FILES
20. SOFTWARE-HARDWARE LAYERS
20.1. THE BODY BOARD INPUT/OUTPUT
20.2. THE LCD DISPLAY
20.3. THE CAMERA
21. REFERENCES & RESOURCES
CHAPTER 7: STARTUP
22. STARTUP
22.1. QUALCOMM’S PRIMARY AND SECONDARY BOOTLOADER
22.2. ANDROID BOOTLOADER (ABOOT)
22.3. REGULAR SYSTEM BOOT
22.4. ABNORMAL SYSTEM BOOT
22.5. REGULAR REBOOTS
23. REFERENCES & RESOURCES
CHAPTER 8: POWER MANAGEMENT
24. POWER MANAGEMENT
24.1. BATTERY MANAGEMENT
24.2. RESPONSES, SHEDDING LOAD / POWER SAVING EFFORTS
24.3. SLEEP STATES
24.4. ACTIVITY LEVEL MANAGEMENT
24.5. SHUTDOWN
24.6. THE CUBE POWER MANAGEMENT
25. CHARGING
CHAPTER 9: BUTTON & TOUCH INPUT AND OUTPUT LEDS
CHAPTER 10
PART III: COMMUNICATION
CHAPTER 11: COMMUNICATION
30. OVERVIEW OF VECTOR’S COMMUNICATION INFRASTRUCTURE
31. INTERNAL COMMUNICATION WITH PERIPHERALS
31.1. COMMUNICATION WITH THE BODY-BOARD
CHAPTER 12
CHAPTER 13
This book is my attempt to understand the Anki Vector and its construction; it is not authoritative
and is based on speculation. Speculation informed by Anki’s SDKs, blog posts, patents and FCC
filings; by articles about Anki and presentations by Anki employees; by PCB photos and hardware
teardowns from others; by a team of people (Project Victor) analyzing the released software; and
by experience with the parts and the functional areas. When you do find errors (and typos), please
contact me (my email is on the second page).
PART I: ELECTRICAL DESIGN. This part provides an overview of the design of the electronics in
Vector and his accessories:
PART II: BASIC OPERATION. This part provides an overview of Vector’s software design.
CHAPTER 9: BUTTON & TOUCH INPUT AND OUTPUT LEDS. A look at the push button, touch
sensing, surface proximity sensors, time of flight proximity sensing, and backpack LEDs.
PART III: COMMUNICATION. This part provides details of Vector’s communication protocols. These
chapters describe the structure of the communication, the information that is exchanged, its
encoding, and the sequences needed to accomplish tasks. Other chapters delve into the functional
design that the communication provides an interface to.
CHAPTER 12: COMMUNICATION WITH THE BODY-BOARD. The protocol that the body-board
responds to.
CHAPTER 13: BLUETOOTH LE. The Bluetooth LE protocol that Vector responds to.
CHAPTER 14: SDK PROTOCOL. The HTTPS protocol that Vector responds to.
CHAPTER 16: CLOUD. A look at how Vector syncs with remote services.
PART IV: ADVANCED FUNCTIONS. This part describes items that are Vector’s primary function.
CHAPTER 17: AUDIO INPUT. A look at Vector’s ability to hear spoken commands, and ambient
sounds.
CHAPTER 18: IMAGE PROCESSING. Vector’s vision system is sophisticated, with the ability to
recognize markers, faces, and objects, and to take photographs; it acts as a key part of the
navigation system.
CHAPTER 19: MAPPING & NAVIGATION. A look at Vector’s mapping and navigation systems.
CHAPTER 20: ACCESSORIES. A look at Vector’s home (charging station), companion cube and
custom objects.
CHAPTER 21: ANIMATION. An overview of how Vector’s scripted animations represent the
“movements, faces, lights and sounds,” and how they are coordinated.
CHAPTER 22: LIGHT ANIMATION. An overview of the backpack and cube light animation.
CHAPTER 23: DISPLAY & PROCEDURAL FACE. Vector displays a face to convey his mood and to
help form an emotional connection with his human.
CHAPTER 24: AUDIO PRODUCTION. A look at Vector’s sound effects and how he speaks.
CHAPTER 26: ANIMATION FILE FORMAT. The format of Vector’s binary animation file.
CHAPTER 28: EMOTION/MOOD MODEL. A look at Vector’s emotions, where they come from, and
how they impact the sounds and choices he makes.
CHAPTER 29: BEHAVIOR TREES. A look at how the behaviors are selected and their settings.
PART VII: MAINTENANCE. This part describes items that are not Vector’s primary function; they
are practical items to support Vector’s operation.
CHAPTER 30: SETTINGS, PREFERENCES, FEATURES AND STATISTICS. A look at how Vector
syncs with remote servers.
CHAPTER 31: SOFTWARE UPDATES. How Vector’s software updates are applied.
REFERENCES AND RESOURCES. This provides further reading and referenced documents.
APPENDICES: The appendices provide extra material supplemental to the main narrative. These
include tables of information, numbers and keys.
APPENDIX B: TOOL CHAIN. This appendix lists the tools known or suspected to have been
used by Anki to create and customize the Vector, and for the servers; and tools that can be
used to analyze Vector.
APPENDIX C: ALEXA MODULES. This appendix describes the modules used by the Alexa client.
APPENDIX D: FAULT AND STATUS CODES. This appendix describes the system fault
codes and update status codes.
APPENDIX E: FILE SYSTEM. This appendix lists the key files that are baked into the system.
APPENDIX G: SERVERS. This appendix lists the servers that the Anki Vector and App
contact.
APPENDIX H: FEATURES. This appendix enumerates the Vector OS “features” that can be
enabled and disabled, and the AI behaviors called “features.”
APPENDIX I: PHRASES. This appendix reproduces the phrases that Vector keys off of.
APPENDIX J: EMOTION EVENTS. This appendix provides a list of the emotion events that
Vector internally responds to.
APPENDIX K: DAS EVENTS. This appendix describes the identified DAS events.
APPENDIX L: PLEO. This appendix gives a brief overview of the Pleo animatronic dinosaur, an
antecedent with many similarities.
Note: I use many diagrams from Cozmo literature. They’re close enough.
Most chapters describe a vertical slice or stack of the software. The higher levels will
discuss features and interactions with other subsystems that have not been discussed in detail yet.
For instance, the section on the basic operation of Vector’s hardware includes layers that link to
the behavior and communication well ahead of those portions. Just assume that you’ll have to flip
forward and backward from time to time.
The communication interface has its own section with the relevant interactions, commands,
structures and so on.
1.2. VERSION(S)
The software analyzed here is mostly versions 1.5 and 1.6 of Vector’s production software, as well
as some of the development version of 1.7. There are incremental differences with each version.
The software in the main processor may be customizable; that will be discussed in many
areas of the rest of the document.
The body-board firmware is field updatable, but it will take expertise to construct updates.
The cube firmware can be updated, but that appears to be the hardest to change, and not
likely to be useful.
Overview of Vector
Anki Vector is a cute, palm-sized robot; a buddy with a playful, slightly mischievous character.
This chapter provides an overview of Vector:
2. OVERVIEW
Vector is an emotionally expressive, life-like, animatronic robot pet that people connect with and
feel affection for.
A character has identifiable traits and moods, something that we can empathize with.
A compelling character tries but doesn’t always succeed. As Pixar said, “we admire a
character trying more than for their successes.”
He can sense the environment and has some awareness of what he and others are doing.
He knows that he succeeded – or didn’t – and that affects his mood. So a character has
moods and emotions, and they affect what it does and how it does it.
Movements vary and are never quite the same. When they look repetitive, they break the
illusion. This is true for choices, reactions and other behaviors too.
There are little motions, sounds and body affects that anticipate what a character is
thinking and going to do.
Vector has a wide variety of behaviors, little motions (animations), and even some emotions that
give him a personality. He can express emotions thru expressive eyes (on an LCD display), raising
and lowering his head, sounds, wiggling his body (by using his treads), or lifting his arms… or
shaking them. He can sense the surrounding environment, and interact and respond to it. He can
recognize his name,1 follow the gaze of a person looking at him, and seek petting.2
2.2. FEATURES
Although cute, small, and affordable,3 Vector’s design is structured like many other robots.
Internal microphone(s) to listen, hear commands and sense the ambient activity level
A button that is used to turn Vector on, to cause him to listen – or to be quiet (and not
listen), to reset him (wiping out his robot-specific information).
Segmented lights on Vector’s backpack are used to indicate when he is on, needs the
charger, has heard the wake word, is talking to the Cloud, can’t detect WiFi, is booting, is
resetting (wiping out his personality and robot-specific information).
An LCD display, primarily to show the eyes of a face. Robot eyes were Anki’s strongest piece
of imagery. Vector smiles and shows a range of expressions with his eyes.
His head can be tilted up and down to represent sadness, happiness, etc.
He can use his treads to shake or wiggle, usually to express happiness or embarrassment.
A camera is used to map the area, detect and identify objects and faces.
Fist-bump and being lifted can be detected using an internal inertial measurement unit
(IMU)
1. Vector can’t be individually named.
2. Admittedly this is a bit hit and miss.
3. Although priced as an expensive toy, this feature set in a robot is usually an order of magnitude more expensive, with less quality.
4. And possibly a pat on his head?
Ground-sensing proximity sensors that are used to detect cliffs at the edge of his area and
to follow lines when he is reversing onto his charger.
Using his arms Vector can lift or flip a cube; he can pop a wheelie, or lift himself over a
small obstacle.
Communication with the external world is thru WiFi and Bluetooth LE.
Motion control
At the lowest level, Vector can control each motor’s speed, degree of rotation, etc. This
allows him to make quick actions.
Combined with the internal sensing, he can drive in a straight line and turn very tightly.
To do all this, the motion control takes in feedback from the motor encoders and the IMU
gyroscope. It may also use the image processing for SLAM-based orientation and
movement.
Vector plans a route to his goals – if he knows where his goal is – along a path free of
obstacles; he adapts, moving around in changing conditions.
Maps are built using simultaneous location and mapping (SLAM) algorithms, using the
camera and IMU gyroscope movement tracking, time of flight sensor to measure distances,
and particle system algorithms to fill in the gaps.
Behaviour system, tracking emotion dimensions such as:
Confident
Social
Stimulated
Vision. This is one of Anki’s hallmarks: they used vision where others used beacons. For instance,
iRobot has a set of IR beacons to keep its robots out of areas, and to guide them to the dock. Mint
has an IR beacon that the Mint robots use to navigate and drive in straight lines. Although Vector’s
companion cube is powered, this is not used for localization. It has markers that are visually
recognized by Vector.
Illumination sensing
Motion sensing
Detecting faces, and gaze detection, allows him to maintain eye contact
4. COZMO
We shouldn't discuss Vector without mentioning the prior generation. Vector’s body is based
heavily on Cozmo; the mechanical refinements and differences are relatively small. Vector’s
software architecture also borrows from Cozmo and extends it greatly. Many of Vector’s
behaviours, senses, and functions were first implemented in Cozmo (and/or in the smartphone
application). One notable difference is that Cozmo did not include a microphone.
Cozmo includes a wide variety of games, behaviours, and ~940 animation scripts. Cozmo’s engine
is reported to be “about 1.8 million lines of code, the AI, computer vision, path planning,
everything.”5 This number should be discounted somewhat, as it likely includes many large 3rd
party modules… Nonetheless, it represents the scale of work to migrate Cozmo’s code base for
reuse in Vector.
5. https://www.reddit.com/r/IAmA/comments/7c2b5k/were_the_founders_of_anki_a_robotics_and_ai/
5. ALEXA INTEGRATION
Vector includes Amazon Alexa functionality, but it is not intimately integrated. Vector only acts
like an Echo Dot, a pass-thru for the Alexa service. By using the key word “Alexa,” Vector will
suppress his activity, face and speech, and the Alexa functionality takes over. Vector has no
awareness of Alexa’s to-do list, reminders, messages, alarms, notifications, or question-and-answers
– and vice versa; nor can he react to them.
The most likely reason for including Alexa is the times: everything had to include Alexa to be hip,
or there would be great outcry. Including Alexa may have also been intended to provide
functionality and features that Anki couldn’t, to gain experience with the features that Amazon
provides, and (possibly) with the intent to more tightly integrate those features into Anki products
while differentiating themselves in other areas.
“[Alexa Voice Service] solutions for Alexa Built-in products required expensive
application processor-based devices with >50MB memory running on Linux or Android”6
Alexa’s software resources consume as much space as Vector’s main software. And the software
is not power efficient. Even casual use of Alexa noticeably reduces battery life, and (anecdotally)
increases the processor temperature.
6. https://aws.amazon.com/blogs/iot/introducing-alexa-voice-service-integration-for-aws-iot-core/
Alexa’s SDK and services have continued to evolve. New Alexa SDKs allow simpler processors and smaller code by acting as little more than a remote microphone.
Electronics Design
This part provides an overview of the design of the electronics in Vector and his accessories
BACKPACK & BODY-BOARD ELECTRONICS DESIGN. A detailed look at the electronics design of
Vector’s backpack and motor driver boards.
Note: previous versions of this document called the circuit board in the bottom half the
“base-board.” It is now referred to as the “body-board” to match Anki’s naming.
Electronics Design
Description
This chapter describes the design of Vector’s electronics:
Design Overview, outlining the main subsystems
Power distribution
Subsequent chapters will examine in detail the design of the subsystems
6. DESIGN OVERVIEW
Vector’s design includes numerous features to sense and interact with his environment, and others
to interact with people and express emotion and behaviour.
Figure 3: Circuit board topology – the head-board connects to the body-board; the body-board connects to the backpack board, the time of flight sensor board, the surface proximity sensors, and four motors with shaft encoders.
The two main boards are the head-board, where the majority of Vector’s processing occurs, and the
body-board, which drives the motors and connects to the other boards.
Figure 4: Vector’s main circuit boards – the main microcontroller board, the LCD display for facial expression, and the HD camera.
Table 3: Vector’s circuit boards
backpack board – Has 4 RGB LEDs, 4 MEMS microphones, a touch wire, and a button. This board connects to the body-board.
body-board – Drives the motors, provides power management, and the battery charger.
encoder boards – The two encoder boards have a single opto-coupler encoder each. The encoder is used to monitor the position of the arms and head, either as driven by the motor, or by a person manipulating them.
head-board – Includes the main processor, flash & RAM memory storage, an IMU, and a PMIC. The WiFi and Bluetooth LE are built into the processor. The camera and LCD are attached to the board thru a flex tape. The speaker is also attached to this board.
time of flight sensor board – The time of flight sensor is on a separate board, allowing it to be mounted in Vector’s front.
Figure 5: Power distribution – the battery and MP2617B charger on the body board supply the backpack board and the rest of the system.
When the charging pads are energized – when Vector is in the charging dock – the system is
powered by the external power source.
Excessive current demand – such as from a stalled motor – can trigger a system brown-out and
shutdown.
6.1.1 Battery
Vector’s battery is a single-cell 3.7V 320mAh “toy safe” lithium-ion polymer battery, connected
to the body-board. The pack is not a “smart” battery – it only has positive and negative leads,
lacking an onboard temperature sensor or battery management system (BMS).
Battery heat is the most significant source of battery “aging” – reduced effective service life. High
recharge rates internally heat the cells, causing them to deteriorate. Vector’s battery thinness gives
it a high surface-area-to-volume ratio, allowing it to shed heat much faster, greatly reducing the
internal heating from charging and heavy loads. The battery is physically separated from the body-
board, isolating it from the heat generated in the charging, power distribution and motor driver
circuits. This increases the battery service life.
Vector takes care to thermally manage the battery, to promote a longer service life. The software
monitors the body board temperature (as a proxy of the battery temperature). When the
temperature gets above one or more thresholds (e.g. 50°C), Vector can slow down or stop his
activities and charging to allow the battery to cool.
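This policy amounts to a simple hysteresis check. A minimal sketch in C, assuming a periodic call with the thermistor reading; the thresholds, hysteresis gap, and helper names are my own illustration, not Anki’s firmware:

#include <stdbool.h>

/* Assumed helpers -- placeholders, not Anki's API. */
void charger_pause(void);
void charger_resume(void);
void activity_slow_down(void);
void activity_resume(void);

#define TEMP_THROTTLE_C 50  /* trip point suggested above */
#define TEMP_RESUME_C   45  /* hysteresis so the policy doesn't oscillate */

static bool throttled;

/* Called periodically with the body-board temperature (deg C), a proxy
   for the battery temperature. */
void thermal_policy_tick(int body_temp_c)
{
    if (!throttled && body_temp_c >= TEMP_THROTTLE_C) {
        throttled = true;
        charger_pause();       /* pause charging so the battery can cool */
        activity_slow_down();  /* shed motion and processing load */
    } else if (throttled && body_temp_c <= TEMP_RESUME_C) {
        throttled = false;
        charger_resume();
        activity_resume();
    }
}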
The battery has a low internal resistance. This reduces the internal heating, and allows it to
usefully deliver higher currents without causing a brown-out. “Vector has brief but high (2A)
peak currents when doing certain computations or flipping himself with his lift.”
Anki engineers certainly desired easy-to-replace batteries, and larger batteries. But there were
challenges. Battery replacement requires more parts and design features. A larger battery would
allow longer play time between charges, but larger batteries often have higher internal resistance
(thus are more prone to brown-out). So it would have taken finding one with good thermal
characteristics (i.e. that didn’t get too hot), that was toy safe despite holding more charge and
chemicals, and so on. Ultimately, schedule prevented finding a suitable larger battery.
When Vector is going into an off state – such as running too low on power, going into a
ship state before first use, or having been turned off by a human companion – the MP2617B
charger and power converter can be signaled to turn off.
When Vector is turned off, the boards are not energized. The exception is that the high side
of the push button is connected to the battery. When closed, this signals the MP2617B to
connect the battery to the rest of the system, powering it up.
The MP2617B is also responsible for charging the battery. There are two pads that mate with
the dock to supply energy to charge the battery.
In many rechargeable lithium ion battery systems there is a coulomb counter to track the state of
charge. Vector does not have one. The need for recharge is triggered solely on the battery voltage.
Head-board Electronics Design
Description
This chapter describes the electronic design of Vector’s head-board:
Detailed design of the head-board
Figure 6: Head-board block diagram – the APQ8009 microprocessor with a PM8916 PMIC (driving the speaker and LCD backlight), flash/RAM on SDHC1, the LCD on SPI, the camera on MIPI, the IMU on I2C6, a UART for body-board communication, a console UART, and USB.
The APQ8009 processor is a sibling to the MSM8909 processor employed in cell phones, where
APQ is short for “Application Processor Qualcomm” and MSM is short for “Mobile Station
Modem.” The difference is that the latter includes some form of modem, such as HSPA, CDMA,
or LTE. Both designators are used in software code-bases employed by Vector. The most likely
reason is that the naming of registers, drivers, and other useful software didn’t carefully limit the
use of MSMxxxx references to just the processors with modems.
The flash & RAM are connected to the processor on SDHC1. The device tree file shows that,
during development, Vector also supported an SD card slot on SDHC2.
The processor dynamically adjusts its clock frequency, within an allowed region. The processor
can be configured to limit its speed.
8.2. SPEAKER
The speaker is driven at 16 bits, single channel, with a sample rate of 8000-16025 samples/sec.
The prior generation, Cozmo, used an OLED display for his face and eyes (US Patent 20372659).
This display had the strengths of high contrast and self-illumination. However, OLEDs are
susceptible to burn-in and uneven dimming or discoloration of overused pixels. Anki addressed this
with two accommodations. First, it gave the eyes regular motion, looking around and blinking.
Second, the display’s illuminated rows were regularly alternated to give a retro-technology
interlaced row effect, like old CRTs.
Vector’s IPS display gives smoother imagery – Cozmo’s OLED was simply black and white.
The LCD is also much less susceptible to burn-in, at the expense of higher power. Vector’s LCD
can also develop dead lines (or pixels) that grow in number until the display is non-functional.
Some units have a defective LCD, where the glass is not properly sealed. This allows moisture in,
causing progressive damage to the LCD. It is also speculated that these lines come from shocks to
the head, causing breaks in the LCD connections.
The head-board can be put into a lower power state by reducing the clock rate of the processor and
using its sleep features; by dimming or turning off the LCD; and by reducing the camera frame rate
(or turning it off). The APQ8009 processor has many sophisticated power controls, but these were
not fully realized in Vector’s software.
There are per-unit keys, MAC addresses and serial numbers:
Each processor has its own unique key, used with the TrustZone.
The USB interface is used to load firmware. The microprocessors include a built-in boot-loader
(ABOOT), which includes support for loading firmware into the device’s flash.
There is a UART that provides a boot console, but does not accept input.
The WiFi, once MAC addresses have been loaded into the unit.
Figure 7: Backpack board block diagram – Bat+ feeds the push button, whose button-state signal goes to the connector; a 74AHC164, fed clock & data lines, drives the 4 RGB LEDs; the 4 MEMS microphones share clock & data lines; the touch wire passes through; Vpwr powers the board.
Table 6: Backpack board functional elements
74AHC164 – A SPI-like GPIO expander. This is used to drive the RGB LEDs.
microphones – There are 4 far-field MEMS PDM microphones, accessed via SPI in an output-only mode. These are designated MK1, MK2, MK3, and MK4.
push button – A momentary push button is connected to the battery terminal, allowing a press to wake Vector, as well as to signal the processor(s).
RGB LEDs – There are 4 RGB LEDs that make up a segmented display. Each segment can be illuminated individually (in a time-multiplexed manner) or may share a colour configuration with its counterparts.
touch sensor – A touch-sensing wire (and passive components).
10.3. OPERATION
The touch sensor conditioning and sensing is handled by the body-board. The touch sense wire is
merely an extension from the body-board through the backpack board.
The push-button is wired to the battery. When pressed, the other side of the push button signals
both the body-board microcontroller and (if Vector is off) the charger chip to connect power. The
theory of operation is discussed further in the body-board section below.
The 74AHC164 serial-shift-register is used as a GPIO expander. It takes a clock signal and serial
digital input, which are used to control up to 8 outputs. The inputs determine the state of 8 digital
outputs used to control the RGB LEDs.
Each of the 4 MEMS microphones takes a clock signal, and provides a serial digital output. The
body-board reads all four microphones simultaneously. (This will be discussed in the body-board
section.)
Figure 8: Possible light topology on backpack board – the 74AHC164’s clock and data lines drive shared RGB signals, with select lines enabling the LEDs in pairs.
The first possibility is that the LEDs share their RGB signals in pairs, with select lines enabling alternate pairs. The process of illuminating the lights would be:
1. The firmware would send the RGB signals for LEDs 1 and 3, enabling them and disabling
LEDs 2 and 4.
2. Delay.
3. The firmware would send the RGB signals for LEDs 2 and 4, enabling them and disabling
LEDs 1 and 3.
4. Delay, and repeat.
The second possibility is that each LED’s red signal goes to the same signal on the 74AHC164, and
similarly for green and blue; however, each LED’s low side is connected to a separate signal on the
74AHC164.
Figure 9: Another possible light topology on backpack board – the 74AHC164 drives common RGB signals, with each LED’s low side on a separate select signal.
This approach takes more work. The process of illuminating the lights in this configuration would
be:
1. The RGB color and light 1 signal enables are sent, illuminating the first light.
2. Then the RGB color and light 2 signal enables are sent – but the first light signal is
disabled – illuminating the second light.
7. I’d need to physically examine a backpack board; this is the limit of examining the available photos.
With either approach, if the switching between the LEDs is done quickly enough – in a short time
interval – the off period isn’t visible. LEDs don’t immediately turn off; rather, their brightness
decays over a short period, and the human eye doesn’t perceive short flickers. Although the
lights are “pulse width modulated” – they are turned off a portion of the time, dimming them –
the current limiting resistors may have been set to achieve the desired maximum brightness for the
fastest multiplexing time.
The body-board controller can dim the brightness of the LEDs further by choosing larger numbers
of time slots in which a light is not illuminated.
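To make the multiplexing and slot-based dimming concrete, here is a minimal sketch in C. The byte encoding of the select and colour bits is a guess (either topology above would change it), and the helper ahc164_write() is my own, defined in the outputs section below:

#include <stdint.h>

void ahc164_write(uint8_t bits);  /* shifts 8 bits into the 74AHC164 */

#define NUM_LEDS  4
#define NUM_SLOTS 8   /* dimming resolution: lit_slots/NUM_SLOTS duty */

struct led { uint8_t rgb_bits; uint8_t lit_slots; };
static struct led leds[NUM_LEDS];

/* Called from a fast periodic timer; each call is one multiplex slot.
   Bit layout (low 4 bits select, next 3 bits RGB) is illustrative only. */
void led_mux_tick(void)
{
    static uint8_t led_i, slot;

    if (slot < leds[led_i].lit_slots)
        ahc164_write((uint8_t)((1u << led_i) | (leds[led_i].rgb_bits << 4)));
    else
        ahc164_write(0);  /* a dark slot dims this LED */

    if (++led_i == NUM_LEDS) {
        led_i = 0;
        if (++slot == NUM_SLOTS)
            slot = 0;
    }
}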
(Block diagram: the battery connects thru reverse polarity protection and a battery switch; the MP2617B charger manages the charging terminals and system power (Vpwr); a regulator supplies the logic; a thermistor is sampled by an ADC. The STM32F030 microcontroller (system controller) communicates with the head-board over a UART; drives the backpack’s 74AHC164, 4 RGB LEDs, and touch sense over SPI/GPIO; reads the 4 MEMS microphones over SPI; drives the 4 motors thru PWM/GPIO motor drivers with 4 optical shaft encoders on counter inputs; and reads the 4 surface proximity sensors and the time of flight sensor over I2C. ESD protection sits on the external connections.)
battery switch – Used to disconnect the battery to support off-mode (such as when stored) and to reconnect the battery with a button press.
charging pads – Two pads on the bottom are used to replenish the energy in the battery pack from the dock.
motor drivers – There are four motor drivers, based on an H-bridge design. This allows a motor to be driven forward and backward.
motors – There are four motors, with encoders to measure their position and approximate speed. One motor controls the tilt of the head assembly. Another controls the lift of his arms. Two are used to drive him in a skid-steering fashion.
MP2617B charger – The Monolithic Power Systems MP2617B serves as the battery charger. It provides a state of charge to the microcontroller. It also directs power from the charging pads to the rest of the system while the robot is on the charging dock.
optical shaft encoders – A Sharp GP1S092HCPIF opto-coupler, in conjunction with a slotted disc on a motor’s shaft, is used to measure the amount a shaft has turned, and its speed.
regulator – A 3.3V regulator is used to supply power to the microcontroller and logic components.
reverse polarity protection – Protects the circuitry from energy being applied to the charging pads in reverse polarity, such as putting Vector onto the charging pads in reverse.
STM32F030 microcontroller – The “brains” of the body-board, used to drive the motors and RGB LEDs; to sample the microphones, time of flight sensor, proximity sensors, temperature, and the touch sense; and to monitor the battery charge state. It communicates with the head-board.
surface proximity sensors – 4 infrared proximity sensors are used to detect the surface beneath Vector – to detect drop-offs (“cliffs”) at the edge of his driving area and to follow lines.
thermistor – A temperature-sense resistor used to measure the battery pack temperature; it is used to prevent overheating during recharge.
VL53L0x time of flight sensor – A ST Microelectronics VL53L0x time of flight sensor is used to measure distance to objects in front of Vector. This sensor is connected by I2C.
11.1.1 Protections
The charging pads have reverse polarity protection.
The MP2617B has an over-current cut-off. If the current exceeds ~5A (4-6A), the battery will be
disconnected from the system bus. Such a high current indicates a short. There is no fuse.
The MP2617B has a low-voltage cut-off. If the battery voltage drops below ~2.4V (2.2-2.7V), the
battery will be disconnected from the system bus (TBD) until the battery voltage rises above ~2.6V
(2.4-2.8V).
The MP2617B has a temperature sense. If the temperature exceeds a threshold, charging is paused
until the battery cools. The temperature sense is not on the battery. It is on the circuit board.
Figure 11: A representative battery connect switch – Bat+ is switched to Vbat; the backpack push button provides a Button State signal, and the MCU’s Pwr Enable line holds the switch closed.
Two MOSFETs (a PFET and NFET)8 act as a switch. These are in a single package, the
DMC2038LVT. (This part is also used in the motor drivers.)
When the system is in an off state, the MOSFETs are kept off with biasing
resistors. The PFET’s gate is biased high with a resistor. The NFET gate is biased low, to
ground. There is no current flow. Two MOSFETs are needed due to internal body diodes.
The PFET body diode would allow current to flow from the battery (from the source to the
drain); however, this current is blocked by the NFET body diode, which has the opposite
polarity.
The push button can wake the system. When the button is closed, the battery terminal
(Bat+) is connected to the gate of the NFET, turning it on. A second NFET is also
energized, pulling the PFET gate to ground, turning it on as well. When the button is open,
Bat+ is not connected to anything, so there is no leakage path draining the battery.
To keep the system energized when the button is open, the STM32F030 MCU must drive
the Pwr Enable line high, which has the same effect as the button being closed. The gate
threshold voltage is 1V, well within the GPIO range of the MCU.
The MCU can de-energize the system by pulling the Pwr Enable line low. The switches will
open, disconnecting the battery.
The MCU needs to be able to sense the state of the button while Pwr Enable is pulled high.
The MCU can do this by sampling the Button State signal. This signal is isolated from
Pwr Enable by a large resistor, and pulled to ground by a smaller resistor. This biases
the signal to ground while the button is open.
This circuit also provides reverse polarity protection. It will not close the switch if the battery is
connected backwards.
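The firmware’s side of this circuit reduces to three small operations. A sketch in C, with assumed GPIO helpers and made-up pin numbers:

#include <stdbool.h>

void gpio_set(int pin, bool level);  /* assumed GPIO helpers */
bool gpio_read(int pin);

#define PIN_PWR_ENABLE   0  /* pin numbers are placeholders */
#define PIN_BUTTON_STATE 1

/* Latch power on early in boot, before the user releases the button. */
void power_latch_on(void)  { gpio_set(PIN_PWR_ENABLE, true); }

/* Release the latch; the bias resistors then open both MOSFETs. */
void power_off(void)       { gpio_set(PIN_PWR_ENABLE, false); }

/* The large series resistor isolates Button State from Pwr Enable, so the
   button remains readable while the latch holds power. */
bool button_pressed(void)  { return gpio_read(PIN_BUTTON_STATE); }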
11.1.3 Charging
The charging station pads are connected to a MP2617B charger IC thru a reverse polarity
protection circuit. The reverse polarity protection9 is a DMG2305UX PFET in a diode-like
configuration.
8. Q11 and/or Q12
9. Q14
Figure 12: A representative PFET-based reverse polarity protection, between Charger+ and the MP2617B.
The MP2617B internally switches the charger input voltage to supply the system with power, and
to begin charging the battery. This allows the charger to power the system whenever the robot is
in the charging station, even when the battery is depleted, or disconnected.
The presence of the dock power, and the state of the MP2617B (charging or not), are signaled to
the microcontroller.
The charger goes through different states as it charges the battery. Each state pulls a different
amount of current from the charging pads and treats the battery differently.
(Chart: battery voltage rising over time during charging, starting near 3V.)
The basic idea is that the charger first applies a low current to the battery to bring it up to a
threshold; this is called prequalification in the diagram. Then it applies a high current, called
constant current. Once the battery voltage has risen to a threshold, the charger switches to
constant voltage, and the current into the battery tapers off. I refer you to the data sheet for more detail.
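A rough classifier of these phases from the battery voltage and charge current, as a C sketch; the charger IC manages the real profile in hardware, and all thresholds here are illustrative rather than taken from the data sheet:

enum charge_phase { PREQUALIFICATION, CONSTANT_CURRENT, CONSTANT_VOLTAGE, DONE };

enum charge_phase classify(float vbat_volts, float ibat_ma)
{
    if (vbat_volts < 3.0f)  return PREQUALIFICATION;  /* low current, raising V */
    if (vbat_volts < 4.15f) return CONSTANT_CURRENT;  /* full charge current */
    if (ibat_ma > 30.0f)    return CONSTANT_VOLTAGE;  /* current tapering off */
    return DONE;
}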
The MP2617B measures the battery temperature by proxy using a thermistor on the PCBA. If the
temperature exceeds a threshold, charging is paused until the battery cools. The microcontroller
also samples this temperature.
The MP2617B supports limiting the input current, to accommodate the capabilities of external
USB power converters. There are four different levels that the IC may be configured for:
2A is the default limit, 450mA to support USB 2.0 limits, 825mA to support USB 3.0 limits, and a
custom limit that can be set by resistors. The input limit appears to be set for either the default (up
to ~2A input) or a programmable input.
Commentary. In my testing, using a USB battery pack, charging pulls up to 1A during the
constant-current phase, then falls off to 100mA-200mA during constant voltage, depending on the
battery’s state of charge.10
With larger batteries this charge rate would be too high. Battery cells are normally charged at no
more than a “1C” rate – e.g. the battery’s maximum charge rate “should” be 320mA at most.
Vector’s battery can be charged at a rate higher than 1C. Heat is what damages batteries: this
battery’s low internal resistance doesn’t produce as much heat, and its large surface-to-volume
ratio lets it shed heat.
11.1.4 Brown-out
The motor stall current is enough to cause Vector to brown out and shut down unexpectedly.
This indicates two possible mechanisms:
If the system browns out the STM32F030, the MCU will no longer hold the power switch
closed, and the system power will be disconnected.
If the current exceeds a threshold, the MP2617B will disconnect power to the system. This
threshold is very high – ~5A – and is unlikely to ever be encountered in operation.
Commentary: It may be interesting to modify either the MCU’s Vdd to have a larger retaining
capacitor, or to add a current limiting mechanism for the motors, such as an inline resistor.
10. Other reports suggest up to 2A into the battery, possibly with the use of high-power USB adapters intended to support tablet recharge. As a preventative measure, I have a current limiter between my USB power adapter and Vector’s charging dock: 1Ω on the USB power. I tried 1Ω-14Ω; these should have limited the current to 1A and 500mA respectively. Instead, Vector would only pull 40mA-370mA; in many cases, not enough to charge. Most likely the resistor acted as a part of a resistive divider and undermined the charger’s feedback loops.
2 SPI to LED outputs: uses a clock and data line to send the state to the LEDs.
5 SPI from microphones: an SPI MCLK to clock out, a timer divider (in and out), and 2
MISO to receive the data from the microphones.
Note: The microcontroller does not have an external crystal11 and uses an internal RC oscillator
instead.
The body-board likely also provides RS232-style bidirectional communication that can be used to
issue commands, query results, and store calibration and serial number information.
STM32 readout protection is set to the highest level in the microcontroller. This is intended to
prevent SWD-based reading or modification of the firmware (including the boot loader). STM32
processors include a separate boot loader from ST as well; this bootloader will crash if any access
to program memory is attempted with the readout protection flags set. It is possible to disable the
read-out protection – mass-erasing the chip in the process – with physical access and SWD
tools.12 Extracting the original bootloader would require more skilled and invasive techniques.13
Future changes to the body-board firmware will require expertise. The STM32F030 firmware can
be analyzed using the syscon.dfu file (or be extracted with a ST-Link) and disassembled. Shy of
recreating the firmware source code, patches can replace a key instruction here and there with a
jump to patch code (most likely written in assembly) that fixes or adds a feature, then jumps back.
11. As far as I can see.
12. https://stackoverflow.com/questions/32509747/stm32-read-out-protection-via-openocd
13. https://rtfm.newae.com/Capture/ChipWhisperer-Nano/ and https://www.cl.cam.ac.uk/~sps32/mcu_lock.html
11.4. SENSING
The time of flight sensor works by timing how long it takes for a coded pulse to return. The time
value is then converted to a distance. Items too close return the pulse faster than the sensor can
measure. The measured distance is available to the microcontroller over I2C.
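Reading the result over I2C is straightforward. A sketch in C, assuming a generic I2C read helper; the register constants follow ST’s published conventions for the VL53L0x, but treat them as assumptions:

#include <stdint.h>

int i2c_read(uint8_t addr, uint8_t reg, uint8_t *buf, int len);  /* assumed helper */

#define VL53L0X_ADDR     0x29  /* default 7-bit I2C address */
#define REG_RESULT_RANGE 0x14  /* result block; distance at offset 10 */

/* Fetch the last measured distance in millimetres. */
int tof_read_mm(uint16_t *mm)
{
    uint8_t b[2];
    int rc = i2c_read(VL53L0X_ADDR, REG_RESULT_RANGE + 10, b, 2);
    if (rc == 0)
        *mm = (uint16_t)((b[0] << 8) | b[1]);  /* result is big-endian */
    return rc;
}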
The microphones take a clock signal as input, and always drive one bit per clock; they have no
chip select. Two microphones can share a single data line. We’ll refer to them as “left” and
“right” here.
Pulling the left microphone’s “left/right” signal low configures it to emit its data bit while the
PDM clock is low; it does not drive the data line when the clock is high. Similarly, pulling the
right microphone’s “left/right” signal high configures it to drive the data line only while the clock
is high.
SPI, however, only receives data bits on the clock’s falling transition – not the rising edge. The
trick is to run the SPI clock at twice the frequency of the PDM clock, so that the SPI clock’s first
falling transition is for the left microphone bit, and the second falling transition is for the right
microphone bit. This is done by dividing the SPI clock by two to produce the PDM clock to the
microphones:
Figure 15: Microphone clock and signals – the SPI clock, the PDM clock at half its rate, and the data line alternating left/right bits.
The received data bits (in each byte) will alternate between the left and right microphones, and
need to be separated and converted by firmware. The SPI peripheral, along with DMA, can be
configured to clock large batches of bytes into a buffer for further processing.
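A sketch of that separation step in C: each SPI byte carries four interleaved left/right bit pairs (MSB first), and two input bytes yield one packed byte per channel. The framing assumption (left bits in the even positions) follows the clocking described above:

#include <stdint.h>

/* in: DMA buffer of SPI bytes, n_bytes even; left/right: n_bytes/2 each. */
void pdm_deinterleave(const uint8_t *in, int n_bytes,
                      uint8_t *left, uint8_t *right)
{
    for (int i = 0; i < n_bytes; i += 2) {
        uint8_t l = 0, r = 0;
        for (int k = 0; k < 2; k++) {        /* two input bytes per output byte */
            uint8_t b = in[i + k];
            for (int j = 0; j < 4; j++) {
                l = (uint8_t)((l << 1) | ((b >> (7 - 2 * j)) & 1));
                r = (uint8_t)((r << 1) | ((b >> (6 - 2 * j)) & 1));
            }
        }
        left[i / 2]  = l;
        right[i / 2] = r;
    }
}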
Dividing the clock by two can be performed by a timer built into the STM32. The SPI clock signal
is connected to the input of an STM32 timer (TIMxCHIN). The timer is configured to use an external
input clock source, and generate an output after a divide by two. The output of the timer
(TIMxCHOUT) can then be used as the clock for the PDM microphones (and 74AHC164 GPIO
expander).
The clock rates have a limited range on the body board. PDM MEMS microphone clock rates
must be in the range 1 MHz to 3.25 MHz. (The products are pretty consistent about this range.)
The SPI clock rate is 2x the PDM clock, so the SPI clock rate must be in the range of 2 MHz to
6.5 MHz. The ST processor’s clock is 48 MHz, and its SPI clock must be this frequency divided by
a power of two. This means there are only two possibilities: an 8:1 divider gives an SPI clock
frequency of 6 MHz, and a 16:1 divider gives a clock rate of 3 MHz.
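That arithmetic can be checked by brute force. A small C program enumerating the STM32’s power-of-two SPI prescalers against the constraints above; it prints exactly the 8:1 and 16:1 cases:

#include <stdio.h>

int main(void)
{
    const double sysclk = 48e6;
    for (int div = 2; div <= 256; div *= 2) {
        double spi = sysclk / div, pdm = spi / 2;
        if (spi >= 2e6 && spi <= 6.5e6 && pdm >= 1e6 && pdm <= 3.25e6)
            printf("divider %3d:1  SPI %.2f MHz  PDM %.3f MHz\n",
                   div, spi / 1e6, pdm / 1e6);
    }
    return 0;
}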
11.5. OUTPUTS
The 74AHC164 does not share a clock with the PDM microphones. The data and clock could be
bit-banged – it’s only 8 bits and there are few timing requirements – or they could be driven by an
SPI peripheral.
Note: care must be taken so that an extra clock edge isn’t received by the 74AHC164 (for
instance, during body board initialization). There is no synchronization to indicate the first bit of
the 8 bits sent to the 74AHC164.
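A bit-banged driver is a few lines of C. The ’164 shifts on the rising clock edge; keeping the clock line quiescent except inside this routine is what avoids the stray-edge problem noted above. GPIO helpers and pin numbers are assumptions of mine:

#include <stdint.h>
#include <stdbool.h>

void gpio_set(int pin, bool level);  /* assumed GPIO helper */
#define PIN_164_CLK  2               /* placeholder pin numbers */
#define PIN_164_DATA 3

/* Shift 8 bits into the 74AHC164, MSB first. */
void ahc164_write(uint8_t bits)
{
    for (int i = 7; i >= 0; i--) {
        gpio_set(PIN_164_DATA, (bits >> i) & 1);
        gpio_set(PIN_164_CLK, true);   /* data is sampled on this edge */
        gpio_set(PIN_164_CLK, false);
    }
}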
Each side of the H-bridge is based on the DMC2038LVT, which has a P-FET and N-FET in each
package. Two of these are needed for each motor.
The MCU (probably) independently controls the high side and low side to prevent shoot-thru. This
is done by delaying a period of time between turning off one FET and turning on the other.
The motors can be controlled with a control loop that takes feedback from the optical encoder to
represent speed and position. The firmware must take care to prevent burning the motors out if
they have been stalled at full power for 15 seconds or more.
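One iteration of such a loop might look like the following C sketch: a proportional speed controller plus a stall cut-off that powers the motor down after sustained full drive with no encoder motion. The gains, the 10 ms period, and the helper names are illustrative, not Anki’s values:

#include <stdint.h>

void motor_set_pwm(int motor, int duty);  /* assumed; duty is -100..100 */
int  encoder_speed(int motor);            /* assumed; ticks per period */

#define STALL_LIMIT_TICKS 1500  /* ~15 s of 10 ms control periods */

void motor_tick(int motor, int target_speed)
{
    static int stall_ticks[4];
    int speed = encoder_speed(motor);
    int duty  = (target_speed - speed) * 2;  /* P-only control */

    if (duty > 100)  duty = 100;
    if (duty < -100) duty = -100;

    if ((duty == 100 || duty == -100) && speed == 0)
        stall_ticks[motor]++;   /* full drive, no motion: counting toward stall */
    else
        stall_ticks[motor] = 0;

    if (stall_ticks[motor] > STALL_LIMIT_TICKS)
        duty = 0;               /* cut power to protect the motor */

    motor_set_pwm(motor, duty);
}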
11.6. COMMUNICATION
The body-board communicates with the head-board via RS-232 at 3.3V (3 Mbit/s14). As the MCU
does not have a crystal, there may be communication issues from clock drift at extreme
temperatures; since Vector is intended for use at room temperature, the effect may be negligible.
The firmware can be updated over the serial communication by the head-board.
14. Value from analyzing the RAMPOST and vic-switchboard programs. Melanie T measured it on an oscilloscope and estimated it to be 2Mbps.
Encoder connector pinouts (partial):
Pin 2 – E2 / HENCB (yellow); Pin 3 – E1 / HENCA (green)
Pin 2 – E2 / B (yellow); Pin 3 – E1 / A (green)
Everlight, EAAPMST3923A2
Monolithic Power, MP2617A/MP2617B 3A Switching Charger with NVDC Power Path Management for Single Cell Li+ Battery, Rev 1.22, 2017 Jun 29. https://www.monolithicpower.com/pub/media/document/MP2617A_MP2617B_r1.22.pdf
Panda, a data sheet for a similar single-cell lithium battery. https://panda-bg.com/datasheet/2408-363215-Battery-Cell-37V-320-mAh-Li-Po-303040.pdf
Sharp, GP1S092HCPIF Compact Transmissive Photointerrupter, 2005 Oct 3. https://datasheet.lcsc.com/szlcsc/Sharp-Microelectronics-GP1S092HCPIF_C69422.pdf
ST Microelectronics, STM32F030x8, Rev 4, 2019 Jan. https://www.st.com/resource/en/datasheet/stm32f030c8.pdf
Accessory Electronics Design
Description
This chapter describes the electronic design of the Anki Vector accessories:
The charging station
The habitat (Vector space)
The companion cube
The charging station has a USB cable that plugs into an outlet adapter or battery. The adapter or
battery supplies power to the charging station. The base of the station has two terminals to supply
+5V (from the power adapter) to Vector, allowing him to recharge. The terminals are offset in
such a way as to prevent Vector from accidentally being subjected to the wrong polarity. Vector has
to be backed into the charging station to mate with the connectors; driving in face-first, even with
his arms lifted, he will not contact the terminals.
The charging station has an optical marker used by Vector to identify the charging station and its
pose (see chapter 20).
There are some references to the habitat in the behavior tree and in the developer visualization
tools. It is possible that Anki created, or was creating, the ability for Vector to recognize the
habitat and adjust his behaviors. The bottom of the habitat is dark, but with a thick white line
around the perimeter near the edge. The line likely serves as a signal to Vector to turn away before
running into the edge, or to drive along. It may be detected by Vector’s cliff sensors.
15. CUBE
The companion cube is a small toy for Vector to play with. He can fetch it, roll it, and use it to pop
wheelies. Each face of the cube has a unique optical marker used by Vector to identify the cube
and its pose (see chapter 18).
Although the companion cube is powered, this is not used for localization or pose. The electronics
are only used to flash lights for his human companion, and to detect when a person taps the cube,
moves it, or changes its orientation.
The cube has holes near the corners to allow the lift to engage, allowing Vector to lift the cube.
Not all corners have such holes: the top – the side with the multicolour LEDs – does not have
them. Vector is able to recognize the cube’s orientation by the symbols on each face, and to flip
the cube so that he can lift it.
The electronics in the cube are conventional for a small Bluetooth LE accessory:
(Block diagram: a Dialog DA14580 connects to an EEPROM and an accelerometer via SPI/I2C, and drives 4 RGB LEDs via PWM/GPIO, all powered from VDD.)
battery – The cube is powered by a 1.5 volt N / E90 / LR1 battery cell.15
crystal – The crystal provides the accurate frequency reference used by the Bluetooth LE radio.
Dialog DA14580 – The Bluetooth LE module (transmitter/receiver, as well as microcontroller and protocol implementation).
EEPROM – The EEPROM holds the updatable application firmware.
RGB LEDs – There are 4 RGB LEDs that can flash and blink. Unlike the backpack LEDs, two LEDs can have independent colors.
If the application passes control back to the boot loader – or there isn’t a valid application in
EEPROM – a new application can be downloaded. The boot loader uses a different set of services
and characteristics to support the boot loading process.
Paul digs into emulating Vector’s cube and identifies elements of the protocol.
15. The size is similar to that of the A23 battery – a 12V battery which, if inserted, will damage the cube’s electronics.
Basic Operation
This part provides an overview of Vector’s software design.
THE SOFTWARE ARCHITECTURE. A detailed look at Vector’s overall software architecture and
main modules.
BASIC INPUT AND OUTPUT. A look at the push button, touch sensing, surface proximity sensors,
time of flight proximity sensing, and backpack LEDs.
Architecture
This chapter describes Vector’s software architecture:
The architecture
The emotion-behaviour system
The communication infrastructure
Internal support
(Block diagram: inputs, including video frames, feed the emotion engine, which maintains the emotion state; inputs and intents feed the behaviour engine, which selects animations; the animations drive a motion/trajectory generator controlling the motors, LEDs, LCD, and sound; behaviour results feed back into the emotion state.)
Fast control loops – to respond quickly – are run on Vector’s hardware. Speech recognition and
natural language processing – very processing-intensive items – are sent to the cloud. Face
recognition, and training for faces, are not sent to the cloud.
Vector is built on a version of Yocto Linux. Anki selected this for a balance of reasons (some
explored by Casner and Wiltz): some form of Linux is required to use the Qualcomm processor, the
up-front (and royalty) costs are low, and tools and software modules are readily available.
Qualcomm pushes the Android stack of tools in particular for their processors. The Qualcomm
chip is a multi-processor, with four main processing cores and a GPU. Vector runs a handful of
different application programs, in addition to the OS’s foundational service tasks and processes.
(Diagram: the vic- server processes. Vic-robot talks to the body-board; vic-anim drives the screen; vic-engine processes the camera; vic-gateway serves the mobile app and Python SDK applications; vic-cloud exchanges preferences and audio for NLP with the cloud; vic-dasmgr uploads stats & diagnostic data.)
Within each vic- server process, there are one or more event-driven communication threads. A
thread likely has the following basic structure: the communication thread has an input message
queue, and blocks on one or more message queue events. It wakes when there is an incoming
event/message, or there has been an error or timeout while waiting. When it wakes, it dequeues the
message, takes action, and goes back to waiting. It may post messages (or other
signals) to other threads, possibly indirectly as a result of framework/library/system calls.
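A minimal sketch of that thread structure in C, using a condition-variable message queue; Anki’s actual framework surely differs, and the types and names here are my own:

#include <pthread.h>
#include <stddef.h>

struct msg { struct msg *next; int type; /* payload... */ };

struct queue {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    struct msg     *head;
};

void handle_message(struct msg *m);  /* take action; may post elsewhere */

void *comms_thread(void *arg)
{
    struct queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->head == NULL)              /* block until a message arrives */
            pthread_cond_wait(&q->ready, &q->lock);
        struct msg *m = q->head;             /* dequeue */
        q->head = m->next;
        pthread_mutex_unlock(&q->lock);

        handle_message(m);
    }
    return NULL;
}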
Within a server process, convenient C++ data structures are used. The vic- servers also use CLAD
and JSON data structures, and include many helper procedures to convert between the two. It
appears that a process interprets and generates JSON data structures internally. To communicate
with another process, it converts the JSON to a CLAD structure (since that is a contiguous span of
bytes) and sends it to the other process; the other process reverses the procedure, converting it to
JSON and using that to interpret the message.
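The appeal of the CLAD form for transport is that it is a single contiguous, fixed-layout span of bytes. A C sketch with an invented message type – the field names and tag value are mine, not Anki’s:

#include <stdint.h>
#include <string.h>
#include <stdio.h>

#pragma pack(push, 1)
struct clad_set_eye_color {  /* hypothetical CLAD-style message */
    uint8_t tag;             /* message type discriminator */
    uint8_t r, g, b;
};
#pragma pack(pop)

/* Serialize: one memcpy, because the struct is already wire-format. */
int encode(uint8_t *wire, uint8_t r, uint8_t g, uint8_t b)
{
    struct clad_set_eye_color m = { 0x42, r, g, b };
    memcpy(wire, &m, sizeof m);
    return (int)sizeof m;
}

/* The receiving process re-expands the bytes into JSON for its own use. */
void decode_to_json(const uint8_t *wire, char *json, size_t n)
{
    struct clad_set_eye_color m;
    memcpy(&m, wire, sizeof m);
    snprintf(json, n, "{\"setEyeColor\":{\"r\":%u,\"g\":%u,\"b\":%u}}",
             m.r, m.g, m.b);
}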
There are many similar terms used within Vector’s AI model, but there are subtle distinctions
between them:
· An AI feature is a high-level behavior, as a person would experience it. There are about
70 of these. Note that the name shouldn’t be confused with a feature flag or feature toggle;
that is a different concept, for software elements that are not ready yet but are included in the
code base.
· An action is like a mini-behavior, with some differences. Multiple actions can run at a
time – so long as they don’t use the same resources – but only one behavior can run at a
time. Actions can wait in a queue.
· An animation is a scripted motion, sound, light pattern, and/or facial animation (or picture
on the display) that Vector carries out. Behaviors and actions can initiate animations. The
animation engine selects the specific animation, from a pool of alternatives, based on
context and current emotional state. An animation can’t use the sensors, so it can’t adapt
to environmental conditions. For instance, driving up to a hand (or a cube) requires the
time of flight sensor, so an action is required.
Most of the partitions on the flash storage are not modifiable – and are checked for authenticity
(and alteration). These partitions hold the software and assets as delivered by Anki (and
Qualcomm) for a particular release of the firmware. They are integrity checked as part of the
startup procedure. (See Chapter 7 for a description.)
Data that is specific to the robot – such as settings, security information, logs, and user data (such
as pictures) – is stored in modifiable partitions. Some of this data is erased when the unit is “reset”
to factory conditions.
This information is not modified after manufacture; it persists after a device reset or wipe.
This mode is never intended to be seen outside of the factory, so little is known; only a couple of
details have surfaced.16
The robots were made “in big batches in July/August, and they didn’t start coming back [to
customer service] until January/February,” when Anki would “put the fixes into the next big batch
the upcoming year.”
The images that Vector sees during these tests are kept with the unit, as part of the manufacturing
record for analysis of returned products. This way, if the unit is returned later with a vision-related
problem, the images from manufacturing are there: “we can go back to those images and see if
it’s a new problem or was always there.”
There is also a sound booth that checked that his speaker was working properly and did not exceed
limits.
16
https://forums.anki.com/t/any-one-know-what-error-code-50-is/40891
Photographs taken by Vector are not sent to (nor stored in) a remote server. They are
stored in encrypted file system, and only provided to authenticated applications on the
local network. Each photograph can be individually deleted (via the mobile application).
The image stream from Vector’s camera is not sent to a remote server. It is only provided
to authenticated applications on the local network.
The data used to recognize faces17 and the names that Vector knows are not sent to (nor
stored in) a remote server. The information is stored in an encrypted file system. The list
of known faces (and their names) is only provided to authenticated applications on the
local network. Any facial recognition data not associated with a name is deleted when
Vector goes to sleep. Facial data associated with an individual name can be deleted (along
with the name) via the mobile application.
“[After] you say the wake words, “Hey Vector”, Vector streams your voice command to
the cloud, where it is processed. Voice command audio is deleted after processing. Text
translations of commands are saved for product improvement not associated with a user.”
The audio stream from the microphone – if it had finished being implemented – would have
been provided to authenticated applications on the local network.
Information about the owner can be erased using the Clear User Data menu option.
Control of the robots movement, speech & sound, display, etc. is limited to authenticated
applications on the local network.
Vector’s software is protected from being altered in a way that would impair its ability to secure
the above. At the high level, this is done by requiring signed software files, and a signed file
system that is checked for alteration. The protections extend all the way to low-level electronics,
where the JTAG access fuses are blown, so that extracting or modifying RAM, flash or other data
cannot be done. (Anki did this as a matter of standard operating procedure on all electronic
products.)
17
The Anki privacy and security documents logically imply that the face image is not sent to Anki servers to construct a recognition
pattern. There are no communication structures to send images to the cloud.
When the microphone is actively listening, it is always indicated on the backpack lights
(blue).
The microphone is enabled by default, but only listening for the wake word, unless
Vector’s microphone has been disabled.
When the camera is taking a picture (to be saved), Vector makes a sound
Unless the backpack lights are all orange, the WiFi is enabled. (All orange indicates it is
disabled.)
Android boot loaders typically include a few powerful (but unchecked) bits that disable the
signature checking, and other security features. These bits typically are set either thru commands
to the firmware during boot up, by applications, or possibly by hack/exploit. Sometimes this
requires disassembling the device and shorting some pins on the circuit board.
Vector doesn't support those bits, nor those commands. Signature checking of the boot loader,
kernel and RAM disk can't be turned off.
Note: the OTA software has a “dev” (or development) set of OTA packages. Those packages are
not the same; they are essentially software release candidates being pushed out for test purposes.
The files are cleanly formatted, not minimized to the most compact size.
The JSON parser supports comments, which are not valid JSON. Many files have
comments in them. Many have sections of the configuration that are commented out.
/anki/etc/config/platform_config.json
This path is hardcoded into the vic-dasmgr, and provided in the editable startup files for vic-anim
and vic-engine. The configuration file contains a JSON structure with the following fields:
When describing the configuration and asset files, a full path will be provided. When the path is
constructed from different parts, the part that is specified in another configuration or binary file
will be outlined. The path to a settings file might look like:
/anki/assets/cozmo_resources/ config/engine/settings_config.json
The path leading up to the settings file (not outlined in red) is specified in an earlier configuration
file, usually the platform configuration file described above.
[Figure: vic-robot's connections – the camera, the body (via a debounce UART board), and the frame buffer (/dev/fb0) driving the LCD display over SPI]
The LCD is connected to the MPU via an SPI interface (/dev/spidev1.0). The frame buffer
(/dev/fb0) is essentially a buffer with metadata about its width, height, pixel format, and
orientation. An application modifies the frame buffer by write(), or by mmap()'ing it and
modifying the bytes. Then the frame buffer's bytes are transferred (via SPI) to the display.
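A minimal sketch of modifying the frame buffer this way. The RGB565 pixel format is an assumption (typical for small LCDs); the width, height, and bits-per-pixel should be taken from what the FBIOGET_VSCREENINFO ioctl actually reports:

#include <fcntl.h>
#include <linux/fb.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/fb0", O_RDWR);
    if (fd < 0)
        return 1;

    struct fb_var_screeninfo vinfo;                /* the buffer's metadata */
    ioctl(fd, FBIOGET_VSCREENINFO, &vinfo);

    size_t len = (size_t)vinfo.xres * vinfo.yres * (vinfo.bits_per_pixel / 8);
    uint16_t *px = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (px == MAP_FAILED) {
        close(fd);
        return 1;
    }

    for (size_t i = 0; i < len / 2; i++)           /* fill the screen green */
        px[i] = 0x07E0;                            /* RGB565 pure green */

    munmap(px, len);
    close(fd);
    return 0;
}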
vic-anim employs a clever screen compositing system to create Vector’s face (his eyes), animate
text jumping and exploding, and small videos, such as rain or fireworks.
The vic-faultCodeDisplay and Customer Care Information Screen of vic-anim have a visual
aesthetic unlike the rest of Vector. These modes employ a bare-bones system for the display.
Not sure if the transfer is in a driver, in the kernel, or in user space... or which process would have
it in user space.
[Figure: camera data path – the camera (mm-anki-camera) connects via MIPI; images are delivered over /dev/socket/vic-engine-cam_client0, and the camera's vertical sync signal is wired to the IMU]
The camera’s vertical synchronization signal is connected to the interrupt line on the IMU,
triggering accelerometer and gyroscope sampling in sync with the camera frame. (Daniel Casner,
Embedded Vision Summit, 2019) The vision is used as a navigation aid, along with the IMU data.
The two sources of information are fused together in the navigation system (see chapter 19) to
form a more accurate position and relative movement measure. The image must be closely matched
in time with the IMU samples. However, the transfer of the image from the camera to the
processor, then thru several services to vic-engine, introduces variable or unpredictable delays.
The camera’s vertical sync – an indication of when the image starts being sampled – is used to
trigger the IMU to take a sample at the same time.
The camera is also used as an ambient light sensor when Vector is in low power mode (e.g.
napping, or sleeping). In low power mode, the camera is suspended and not acquiring images.
Although in a low power state, it is still powered. The software reads the camera’s auto
exposure/gain settings and uses these as an ambient light sensor. (This allows it to detect when
there is activity and Vector should wake.)
Startup
This chapter describes Vector’s start up and shutdown processes:
The startup process
The shutdown steps
22. STARTUP
Vector’s startup is based on the Android boot loader and Linux startup.18 These are otherwise not
specific to Vector, and their own documentation is referred to for details. The boot process gets
quite far before knowing why it booted up or being able to respond in a complex fashion.
1. The backpack button is pressed, or Vector is placed into the charger. This powers the body
board, and the head-board.
a. The body-board boot loader checks the application for validity, using a private key.
The application is run only if it passes the integrity checks.
2. The body-board displays an animation of the backpack LEDs while turning on.
a. If turned on from a button press and the button is released before the LED segments
are fully lit, the power will go off.
b. If the button is held down – for about 5 seconds – the head-board will have reached a
point in its boot process to direct the body-board to keep the battery switch closed.
c. If held for 15 seconds, the body-board will hold its TX line – the head-board's RX line
– low during the boot process. This tells the system to boot into recovery mode.
3. While the head-board boots, the body-board performs several self tests. These include
checking that the microcontroller can communicate with the 4 cliff (surface proximity)
sensors, and the time of flight sensor.
1. “Qualcomm’s Primary Boot Loader is verified and loaded into [RAM] memory19 from
BootROM, a non-writable storage on the SoC. [The primary boot loader] is then
executed and brings up a nominal amount of hardware.” (Nolen Johnson)
2. The primary boot loader checks to see if a test point is shorted on the board; if so, the
unit will go into emergency download (EDL) mode. (Roee Hay) It is known that when the
F_USB pad on the head-board is pulled to Vcc, USB is enabled; this may be the relevant pin.
18
An ideal embedded system has a fast (seemingly instant) turn on. Vector’s startup isn’t fast. The steps to check the integrity of the
large flash storage – including checking the security signatures – and the complex processes that Linux provides each contribute to the
noticeable slow turn on time. Checking the signatures is inherently slow, by design.
19
The boot loader is placed into RAM for execution to defeat emulators.
4. If the secondary boot loader does not pass checks, the primary boot loader will go into
emergency download mode.
5. “The next boot loader(s) in the chain are SBL*/XBL (Qualcomm’s Secondary/eXtensible
Boot Loader). These early boot loaders bring up core hardware like CPU cores, the MMU,
etc. They are also responsible for bringing up core processes .. [for] TrustZone. The last
purpose of SBL*/XBL is to verify the signature of, load, and execute aboot/ABL [Android
boot loader].”
The Android boot loader (aboot) is stored on the “ABOOT” partition.
The secondary bootloader also supports the Sahara protocol; it is not known how to
activate it.
Note: The keys for the boot loaders and TrustZone are generated by Qualcomm, with the public
keys programmed into the hardware fuses before delivery to Anki or other customers. The signed
key pair for the secondary boot loader is not necessarily the same signed key pair for the aboot.
They are unique for each of Qualcomm’s customers. Being fuses, they cannot be modified, even
with physical access.
a. On other Android devices, aboot reads the DEVINFO partition for a structure. (Roee Hay)
It checks the header of the structure for a magic string (“ANDROID-BOOT!”) and
then uses the values within the structure to indicate whether or not the device is
unlocked, whether verity-mode is enabled or disabled, as well as a few other settings.
By writing a version of this structure to the partition, the device can be placed into
unlock mode.
b. “The build system calculates the SHA256 hash of the raw boot.img and signs the
hash with the user’s private key… It then concatenates this signed hash value at the
end of raw boot.img to generate signed boot.img.” (Qualcomm LM80-P0436)
c. “During bootup, [Aboot20] strips out the raw boot.img and signed hash attached at the
end of the image. [Aboot] calculates the SHA256 hash of the complete raw boot.img
and compares it with the hash provided in the boot.img. If both hashes match, kernel
image is verified successfully.”
2. ABoot can either program the flash with software via boot loader mode, or load a kernel.
The kernel can be flagged to use a recovery RAM disk or mount a regular system.
3. If recovery mode, it will load the kernel and file systems from the active RECOVERY
partitions.
20
The Qualcomm document speaks directly about Little Kernel; ABoot is based on Little Kernel.
b. The RX signal from the body-board may be held low when aboot starts,
indicating that the operator has held the button and wishes to initiate recovery
mode.21 If this is the case, “anki.unbrick=1” is prepended to the command line
passed to the kernel.
4. ABoot loads the kernel and RAM file system from the active “BOOT” partition and passes it a
command line directing it to check the signatures of the boot image and RAM file system.22 The
command line is stored in the header of the boot partition; it is checked as part of the
signature check of the boot partition and RAM file system. If ABoot is compiled for a
developer robot, it will add “anki.dev” to the command line.
Many of these elements will be revisited in Chapter 31 where updating aboot, boot, and system
partitions are discussed.
anki.dev  This is set to confirm (to the Linux system) that this robot is a development
          robot and can run development software systems.
dm=       The dm-verity command line used to verify the system file system.
21
The body-board may reset/restart the head-board so that the bootloader runs again.
22
The check specifies the blocks on the storage to perform a SHA256 check over and provides expected signature result.
[Flowchart: if the kernel command line has “dm=”, dm-verity is set up; if it does not, and the command line also lacks “anki.dev”, RAMPOST is told to cut power]
1. The RAM file system consists primarily of two programs: init and /bin/rampost. init is a
shell script and the first program launched by the kernel. This script turns on the LCD and its
backlight, and initiates communication with the body-board. (These occur ~6.7 seconds
after power-on is initiated.)
a. rampost initializes the LCD, clearing the display. It also shows a start up screen on
the display of developer units.
b. rampost will perform a firmware upgrade of the body-board if its version is out of
date. It loads the firmware from syscon.dfu (Note: the firmware in the body-board is
referred to as syscon.)
c. rampost checks the battery voltage, temperature and error flags. It posts any issues to
/dev/rampost_error
2. Next, init performs a signature check of the system partition to ensure integrity. This is
triggered by the command line which includes dm-verity options prefixed with “dm=”. If
the system does not pass checks, init fails and exits.
a. Note: none of the file systems in fstab are marked for verity checking, so this is the only
place where it is performed.
3. The main system file-system is mounted and launches the main system initialization.
2. The encrypted user file system is checked and mounted (via the mount-data service). This
file system is where all of the logs, people’s faces, and other information specific to the
individual Vector are stored. The keys to this file system are stored in the TrustZone, in the
MPU’s SoC fuse area. This file system can only be read by the MPU that created it.
a. If “anki.unbrick” is on the command line, the user data partition is not touched;
instead a temporary file system is created and used instead.23 This flag is not
meaningful in the regular system since the bootloader will only launch the recovery
partition software with “anki.unbrick”
b. If the data partition is empty (i.e., erased to clear the user data), the user data
partition is formatted.
3. The MPU’s clock rate is limited to 533MHz, and the RAM is limited to 400MHz to prevent
overheating.
c. The time client (chronyd), to retrieve network time. (Vector does not have a clock
that keeps time when turned off)
d. init-debuggerd
8. The “Victor Boot Animator” is started (~8 seconds after power on) and shows the sparks
turning into the “V” splash screen on the display.
9. Victor Boot completes ~20.5 seconds after power on, and the post-boot services launch.
10. The vic-crashuploader service is started to gather crash logs and dump files, some of which
may have been created during a previous boot attempt. These will be uploaded when
internet access is restored.
12. Once the startup has brought up enough of the system, the next set of animations and the
boot sound play.
13. VicOS is running ~32 seconds after power on. The boot is complete, and Vector is ready
to play.
23
I’m not sure how this would be useful as is with the regular system software. It seems like Vector could boot up, appear like
everything is wiped, and needs to be re-set up… then some time later, Vector would reboot, and appear to be his previous self –
including any misconfiguration that motivated the unbrick the first time.
Hay, Roee. fastboot oem vuln: Android Bootloader Vulnerabilities in Vendor Customizations,
Aleph Research, HCL Technologies, 2017
https://www.usenix.org/system/files/conference/woot17/woot17-paper-hay.pdf
Hay, Roee; Noam Hadad. Exploiting Qualcomm EDL Programmers, 2018 Jan 22
Part 1: Gaining Access & PBL Internals
https://alephsecurity.com/2018/01/22/qualcomm-edl-1/
Part 2: Storage-based Attacks & Rooting
https://alephsecurity.com/2018/01/22/qualcomm-edl-2/
Part 3: Memory-based Attacks & PBL Extraction
https://alephsecurity.com/2018/01/22/qualcomm-edl-3/
Part 4: Runtime Debugger
https://alephsecurity.com/2018/01/22/qualcomm-edl-4/
Part 5: Breaking Nokia 6's Secure Boot
https://alephsecurity.com/2018/01/22/qualcomm-edl-5/
Johnson, Nolen; Qualcomm’s Chain of Trust, Lineage OS, 2018 Sept 17
https://lineageos.org/engineering/Qualcomm-Firmware/
A good overview of Qualcomm’s boot loader, boot process, and differences between versions
of Qualcomm’s process. Quotes are slightly edited for grammar.
Nakamoto, Ryan; Secure Boot and Image Authentication, Qualcomm , 2016 Oct
https://www.qualcomm.com/media/documents/files/secure-boot-and-image-authentication-
technical-overview-v1-0.pdf
Qualcomm, DragonBoard™ 410c based on Qualcomm® Snapdragon™ 410E processor Little
Kernel Boot Loader Overview, LM80-P0436-1, Rev D, 2016 Jul
lm80-p0436-1_little_kernel_boot_loader_overview.pdf
https://github.com/alephsecurity
A set of repositories researching tools to discover commands in the Sahara and EDL protocols.
https://github.com/openpst
A set of repositories researching and implementing an interface to the Sahara protocol.
Power management
This chapter describes Vector’s power management:
The battery management
Load shedding
Charger info
[Flowchart: determining the BatteryLevel – if the battery is not connected, the level is unknown; otherwise the categorization depends on whether Vector is on the charger]
The BatteryLevel enumeration is used to categorize the condition of the Vector battery:
Table 19: BatteryLevel codes24 as they apply to Vector
Name                    Value  Description
BATTERY_LEVEL_UNKNOWN   0      If the battery is not connected, Vector can’t measure its battery.
BATTERY_LEVEL_LOW       1      Vector’s battery is 3.6V or less; or, if Vector is on the charger, the battery voltage is 4V or less.
BATTERY_LEVEL_NOMINAL   2      Vector’s battery level is between low and full.
BATTERY_LEVEL_FULL      3      Vector’s battery is at least 4.1V.
The current battery level and voltage can be requested with the Battery State command (see
Chapter 14, section 49.2 Battery State). The response will provide the current battery voltage, and
interpreted level.
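A sketch of the categorization, using the thresholds from Table 19; whether the boundary comparisons are inclusive or exclusive is an assumption:

typedef enum {
    BATTERY_LEVEL_UNKNOWN = 0,
    BATTERY_LEVEL_LOW     = 1,
    BATTERY_LEVEL_NOMINAL = 2,
    BATTERY_LEVEL_FULL    = 3
} BatteryLevel;

static BatteryLevel categorize(int connected, int on_charger, float volts)
{
    if (!connected)
        return BATTERY_LEVEL_UNKNOWN;             /* can't measure the battery   */
    if (volts <= (on_charger ? 4.0f : 3.6f))
        return BATTERY_LEVEL_LOW;                 /* 4V threshold on the charger */
    if (volts >= 4.1f)
        return BATTERY_LEVEL_FULL;
    return BATTERY_LEVEL_NOMINAL;                 /* between low and full        */
}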
For Vector, a fuel gauge would give him smarts about knowing when he will need to plan to return
home, or is getting low. His hardware doesn’t have a coulomb counter, for a variety of reasons.
Any effort, beyond simple battery voltage, to estimate the remaining play time would have to be
based on software tracking the battery performance.
24
The levels are from robot.py
Depending on the state of the battery – and charging – Vector may engage in behaviours that
override others.
[Flowchart: if the battery level is too low, disconnect the battery; otherwise, if the level is low enough to seek the charger, queue a high-priority task to seek the charger; otherwise done]
If his power is low, Vector will launch a behavior to seek the charger out, and recharge. If he is
stuck, his behaviors will have him cry out.
If Vector is unable to dock (or even locate a dock) he sheds load as he goes into a lower state:
If the body board is overheated, a flag in the HTTPS API RobotStatus bit mask is set (see
Chapter 14, section 42.1.2 RobotStatus). Note: this is speculated, not proven.
At some point past 90°C, Vector starts a clean shutdown (see earlier). The software in the
head idles, and turns off as many peripherals (e.g. WiFi, display, etc.) as it can, with “the
goal to save enough power in the head to let the chip cool off, so we could continue driving
home.”
If the APQ8009 processor is hot, it will throttle its clocks. If the MP2617B charging chip
is reaching the thermal limits related to charging, it will throttle the charging.
The battery overheated icon is displayed by vic-faultCodeDisplay, which has a hard-coded path to
the icon:
/anki/data/assets/cozmo_resources/config/devOnlySprites/independentSprites/battery_overheated.png
Version 1.6 uses very conservative thresholds (to protect the battery), with the intention that
follow-up releases fine-tune the thresholds.
Whether Vector is in calm power mode (or not) is reported in the RobotStatus message in the status
field. (See chapter 14 for details.) Vector is in calm power mode if the
ROBOT_STATUS_CALM_POWER_MODE bit is set (in the status value).
The encoders are mostly turned off; they “pulsed at 1% duty cycle and watched for
changes” to detect someone moving Vector around;
Comatose
Deep sleep
Emergency sleep
Asleep, but held in palm
Asleep, on palm
Asleep on charger
Light sleep
Internally Vector tracks this as an amount of time he needs to sleep (sleep_debt_hours, a floating
point number). This increments with activity (and charging), and decrements (at a different rate)
when sleeping.
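A sketch of this bookkeeping. Only the variable (sleep_debt_hours) and the increment/decrement behavior come from the text; the rates are made-up placeholders:

static float sleep_debt_hours = 0.0f;

static void sleep_debt_tick(float dt_hours, int sleeping)
{
    if (sleeping)
        sleep_debt_hours -= 0.5f * dt_hours;   /* pays down debt while asleep    */
    else
        sleep_debt_hours += 0.1f * dt_hours;   /* accrues with activity/charging */

    if (sleep_debt_hours < 0.0f)
        sleep_debt_hours = 0.0f;
}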
Behaviors are responsible for requesting that Vector enter a power saving or other sleep state.
24.5. SHUTDOWN
Turning Vector off manually
Vector cannot be turned off via Bluetooth LE, or the local HTTPS SDK access. There are no
exposed commands that do this. Using a verbal command, like “turn off” does not direct Vector to
shut off (disconnect the battery). Instead it goes into a quiet mode. Although it may be possible
for a Cloud command to turn Vector off, this seems unlikely.
However, there is likely a command to automate the manufacture and preparation for ship process.
The shutdown code is logged, and broadcast but not otherwise stored.
24.5.2 Unintentionally
The body-board is responsible for keeping the battery connected. However, brownouts,
self-protection when the voltage gets too low, and bugs can cause the battery to be disconnected.
The body board will turn off power if it doesn’t hear from the head-board in a regular fashion.
This could be because of a software crash.
25. CHARGING
Vector tracks whether charging is in process, and for how long. The software has some initial
estimates of how long before charging is complete. This is similar to the software “fuel gauge.”
It takes some model of the battery’s capacity, and typical charging times given that.
The state of the charger is reported in the RobotStatus message in the status field. (See chapter 14
for details.) Vector is on the charger if the ROBOT_STATUS_IS_ON_CHARGER bit is set (in the status
value), and charging if the ROBOT_STATUS_IS_CHARGING bit is set.
Version 1.5 slowed down the charging, to reduce heat, prolonging the battery life.
Additional information about the state of the charger can be requested with the Battery State
command (see Chapter 14, section 49.2 Battery State). The response will provide flags indicating
whether or not Vector is on the charger, and if it is charging. The response also provides a
suggested amount of time to charge the batteries.
Note: the audio sampling will be covered in a later chapter (Chapter 17)
[Figure: body-board sensor connections – the touch sensor (ADC), button (GPIO), the surface proximity sensors, and the time-of-flight sensor (I2C), with a UART link to the head-board]
The states of the inputs (button, touch, surface proximity and time of flight sensors) are reported in
the RobotStatus message. (See chapter 14 for details.) The button state can be found in the status
field. The button is pressed if the ROBOT_STATUS_IS_BUTTON_PRESSED bit is set (in the status
value).
The surface proximity sensors (aka “cliff sensors”) are used to determine if there is a cliff, or
if Vector is potentially in the air. The individual sensor values are not accessible. The cliff
detection state can be found in the status field. A cliff is presently detected if the
ROBOT_STATUS_CLIFF_DETECTED bit is set (in the status value).
These measures could potentially distinguish between light touch (e.g. tip of the finger), heavy
touch (e.g. a full palm?), and perhaps even changing touch.
The touch sensor readings can be found in the touch_data field of the RobotStatus message. The
values indicate whether Vector is being touched (e.g. petted).
The touch sensor module produces a JSON structure for internal use:
[Figure: backpack LED path – vic-robot (vic-spine) communicates over UART with the body-board, which drives the backpack LEDs (SPI)]
The software can direct the body-board to illuminate the backpack lights with individually
different colors and brightnesses. The body-board pulse-width modulates (PWMs) the LEDs to
achieve different colors and intensities.
Inertial Motion
Sensing
This chapter describes Vector’s motion sensing:
Measuring motion as feedback to motion control, and allow moving along paths in a
smooth and controlled fashion
[Figure: the IMU signals feed fist-bump, poke, and being-held detectors]
By blending the accelerometer and gyroscope signals together, each can compensate for and cancel
out the other's weaknesses.
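One common way to do this blending is a complementary filter: trust the gyroscope in the short term, and let the accelerometer's gravity measurement correct the drift. Vector's actual filter is not documented; the 0.98 blend factor below is a typical placeholder:

#include <math.h>

static float pitch;    /* estimated body angle, in radians */

static void imu_update(float gyro_rate,     /* rad/s, from the gyroscope     */
                       float ax, float az,  /* m/s^2, from the accelerometer */
                       float dt)            /* seconds since the last sample */
{
    float gyro_pitch  = pitch + gyro_rate * dt;  /* smooth, but drifts    */
    float accel_pitch = atan2f(ax, az);          /* noisy, but drift-free */

    pitch = 0.98f * gyro_pitch + 0.02f * accel_pitch;
}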
The IMU can be used to detect the angle of Vector’s body. This is important, as the charging
behaviour uses the tilt of the charging station ramp to know that it is in the right place.
By using combinations of high-pass, low-pass, and band filters, and looking for signature
patterns, Vector can identify the kinds of physical interactions that are occurring.
Taps and pokes may tilt Vector, but will also produce a “frequency” response in the signals that
can be used to trigger on. The movement will change his position quickly and slightly, over a
small distance, but Vector will resume his prior position very quickly.
Fist-bumps are like pokes, except that the lift has already been raised, and most of the frequency
response and motion will be predictable from receiving the bump on the lift.
Being picked up is distinct because of the direction of acceleration and previous orientation of
Vector’s body.
Being held is sensed, in part by first being picked up, and by motions that indicate it is not on a
solid surface.
A similar set of interaction sensing is present with the cube. It can sense that it is being tapped (or
double tapped), picked up, and held. See Chapter 20.
Patent filings (e.g. WO 2019/173321) indicate that Anki had ideas of how this could be extended
to detect riding in a car, and even estimating how fast it is moving.
Communication
This part provides details of Vector’s communication protocols. These chapters describe the
structure of communication, the information that is exchanged, its encoding, and the sequences
needed to accomplish tasks. Other chapters will delve into the functional design that the
communication provides an interface to.
COMMUNICATION WITH THE BODY-BOARD. The protocol that the body-board responds to.
Communication
This chapter describes the system of communication with the devices internal and external
to Vector:
[Figure: communication overview – the engine connects internally to the LCD, console, IMU stack, Bluetooth and WiFi stacks, USB, the body-board (motors, LEDs & sensors), and offboard vision; external parties include the Python SDK, serial applications, the mobile app, and the cube]
31.3. USB
There are pins for USB on the head board. Asserting the “F_USB” pad to VCC enables the port.
(Melanie T) During power-on and initial boot it is a Qualcomm QDL port. The USB supports a
Qualcomm debugging driver (QDL), but the readout is locked. It appears to be intended to inject
software during manufacture.
The /etc/initscriptsusb file enables the USB and the usual functionfs adb. It lives in
/sbin/usr/composition/9091 (I think, if I understand the part number matching correctly). This
launches ADB (DIAG + MODEM + QMI_RMNET + ADB)
Vector's log shows the USB being disabled 24 seconds after Linux starts. It is enabled only on
development units.
32. BLUETOOTH LE
Bluetooth LE is used for two purposes:
1. Bluetooth LE is used to initially configure Vector, to reconfigure him when the WiFi
changes, and to pair him with the companion cube accessory. It potentially allows some
diagnostics and customization.
2. Bluetooth LE is used to communicate with the companion Cube: to detect its movement,
taps, and to set the state of its LEDs.
[Figure: Bluetooth stack – the cube library (libcubeBleClient) and libanki-ble talk to ankibluethd over /data/misc/bluetooth/abtd.socket (with properties in /data/misc/bluetooth/btprop), which sits atop BlueZ and the Qualcomm Bluetooth LE hardware]
25
The library includes a great deal of built-in knowledge of the state of the application (“game engine”), animations, and other elements.
Three different rates of communication are used between the Cube and Vector:
1. The lowest level is unconnected – the Cube is just sending out advertisements (that is, “a
hello-world, I exist”) at a modest interval; there isn’t an active Bluetooth LE connection.
2. The next level is background. The application is getting just enough information from the
cube to know its orientation, broad movements (and maybe that it was tapped).
3. The highest update rate is the interactable level. The cube is configured to send much
more responsive information on the cube orientation, sent fast (or sensitive) enough to
detect taps, and to tell if the cube is being held. This rate consumes the most power.
The behavior system drives the level of interest in the cube. The condition or active behavior
requests a level of service. The request can be temporary, using a timeout, so that if nothing
interesting is detected in a reasonable period, it falls back to a lower rate.
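A sketch of this level-of-service scheme with a timeout-based fallback; the names and timeout values are illustrative, not taken from Vector's code:

#include <time.h>

typedef enum { CUBE_UNCONNECTED, CUBE_BACKGROUND, CUBE_INTERACTABLE } CubeLevel;

static CubeLevel level = CUBE_UNCONNECTED;
static time_t    expiry;                    /* when a temporary boost lapses */

static void request_level(CubeLevel want, int timeout_s)
{
    level  = want;
    expiry = time(NULL) + timeout_s;
}

static void cube_poll(int interesting_event)
{
    if (interesting_event)
        expiry = time(NULL) + 30;           /* keep the boosted rate alive */
    else if (level == CUBE_INTERACTABLE && time(NULL) > expiry)
        level = CUBE_BACKGROUND;            /* nothing seen: drop the rate */
}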
1. WiFi is used to provide access to the remote servers for Vector’s speech recognition and
natural language processing.
2. WiFi is used to provide access to the remote servers for software updates, and for
providing diagnostic logging and troubleshooting information to Anki.
4. To provide an interface, on the local network, that the mobile application can use to
configure Vector, and change his settings.
5. To provide an interface, on the local network, that SDK applications can use to program
Vector.
6. To provide interfaces, on the local network, that allow development Vectors (special
internal versions) to be debugged and characterized
[Figure: WiFi stack – vic-gateway (libvictor_web_libray, Civet webserver / libcivetweb), vic-switchboard, and libcubeBleClient sit above the Avahi mDNS server and connman (/net/connman/service/wifi_..._managed_psk), atop the Qualcomm WiFi hardware]
26
All of the software versions include an Anki webserver service systemd configuration file whose executable is missing. The most
likely explanation is that early architecture (and possibly early versions) included this separate server, and that the systemd
configuration file is an unnoticed remnant.
/etc/iptables/iptables.rules
/etc/iptables/ip6tables.rules
These iptables rules are set to block incoming traffic (but not internal traffic), except for:
1. Responses to outgoing traffic
2. DHCP
3. TCP port 443 for vic-gateway
4. UDP port 5353 for mDNS (Avahi)
5. And ICMP ping
The connman settings – files for accessing known WiFi access points – are stored on the encrypted
file-system /data, in the folder:
/data/lib/connman
The path is hard-coded into connman itself. This folder is created (if it doesn’t exist) by mount-
data when it sets /data up for the robot (such as when it is new or has had its user data erased via
the “Clear User Data” menu). The contents of /var/lib/connman are copied here with each system
start.
Vector's name is used as:
· the advertised Bluetooth LE peripheral name (although spaces are used instead of dashes),
· the mDNS network name (dashes are used instead of spaces),
· the name used to sign certificates, and
· the name of his WiFi Access Point, when placed into Access Point mode.
A client token is passed to Vector in each of the HTTPS-based SDK commands, and in the
Bluetooth LE SDK Proxy commands. It is generated in one of two ways. One method is by the
Bluetooth LE command (cloud session); the other is by sending a User Authentication command (see
Chapter 14 section 50.5 User Authentication). The client token should be saved indefinitely for
future use. It is not clear if the client token can be shared between the two transport mechanisms.
A certificate is also generated by Vector for use with the API and vic-gateway. The certificate is
intended to be added to the trusted SSL certificates before an HTTPS communication session. The
certificate issued by Vector is good for 100 years.
Note: the certificates are invalidated and new ones are created when user data is cleared. Vector is
assigned a new robot name as well.
27
https://groups.google.com/forum/#!msg/anki-vector-rooting/YlYQsX08OD4/fvkAOZ91CgAJ
https://groups.google.com/forum/#!msg/anki-vector-rooting/XAaBE6e94ek/OdES50PaBQAJ
The web-sockets provide access to internal variables and other software state. In some cases they
provide points of control. The web-server, esp. thru the webdav support, allows files to be
downloaded and uploaded into Vector. This includes the ability to add animation files that can be
tested.
Note: the tool is rumoured to consume a lot of resources, causing unusual faults to occur on
Vector. It has a small overlap with the functions that can be accessed via the SDK interface.
[Figure: cloud services – Chipper performs automatic speech recognition and hand-off; Lex provides language understanding; Houndify provides automatic speech recognition and knowledge Q&A; IBM Weather provides weather-related Q&A]
For natural language processing, the audio stream (after the “Hey Vector”) is sent to a group of
remote servers for processing. The functions are divided up across several different servers which
can provide specialized services:
Chapter 16 describes the communication with these servers, including the responses that they send
back.
Chapter 17 describes typical natural language processing, and the processing of intents.
Vector has a couple UDP ports open internally; likely this is inherited from libcozmo_engine.
The PyCozmo project has reverse engineered much of Cozmo’s UDP protocol.
Body-board
Communication
Protocol
This chapter describes Vector’s body-board communication protocol.
The messages from the head board to the body-board have the content:
Bootloader updates to the firmware: Entering the bootloader, erasing flash, writing a new
application, and verifying it
Controls for the motors: possible direction and enable; direction and duty cycle; or a target
position and speed.
Power control information: disable power to the system, turn off distance, cliff sensors, etc.
In turn, the body board messages to the head-board can contain (depending on the type of packet):
[Figure: frame layout – a header (type, size), a payload carrying the CLAD message (type, parameters, data…), and a CRC]
THE RS232 SERIAL LINK is used as the transport. It handles the delivery of the bytes between
the body board and the head board. The data rate is 3 Mbits/sec.28
THE FRAME provides information that identifies the start and end of a frame, and error detection. It
also includes the kind of CLAD message that is contained.
THE C-LIKE ABSTRACT DATA (CLAD) is the layer that decodes the messages into values for fields,
and interprets them.
TIMEOUTS. The body-board maintains a timer to detect the loss of communication from the head-
board – perhaps from a software crash. If the body-board does not receive communication
within this timeout period, it will turn off power.
When the head-board sends messages to the body-board, the header is:
The body-board sends messages in response to commands, and at regular intervals to the head-
board. The header of a message is:
28
Value from analysis of the RAMPOST, vic-robot, and dfu programs.
The payload type is 16 bits. The packet type implies both the size of the payload, and the
contents. If the packet type is not recognized, or the implied size does not match the
passed payload size, the packet is considered in error.
The payload size is a 16 bit number. The maximum payload size is 1280 bytes.
The tag and CLAD payload are passed to the application for interpretation.
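A sketch of validating an incoming frame against these rules. The 16-bit type, the 16-bit size, and the 1280-byte limit come from the text, and the per-type sizes from the command table later in this chapter; the exact field order within the header is an assumption:

#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    uint16_t type;    /* implies both the payload size and contents */
    uint16_t size;    /* payload size in bytes; at most 1280        */
} FrameHeader;
#pragma pack(pop)

/* Returns the payload size implied by a known type, or -1 if unknown. */
static int expected_size(uint16_t type)
{
    switch (type) {
    case 0x6466: return 32;      /* 'df' data frame            */
    case 0x6675: return 1028;    /* 'fu' firmware update frame */
    default:     return -1;
    }
}

static int frame_ok(const FrameHeader *h)
{
    int want = expected_size(h->type);
    /* Unknown type, or a size mismatch: the frame is in error. */
    return want >= 0 && h->size == (uint16_t)want && h->size <= 1280;
}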
2. It sends a serial sequence of the application data using the 0x6675 command.
3. Then the 0x7374 command is sent to validate the download (including checking its
authenticity using a digital signature), and start the application.
4. The boot-loader sends the results of the check in a 0x6b61 response. The head-board
application checks the results.
1. If text characters are received, the body board sends them with the 0x6364 command to the
head board.
2. The head-board receives these, and buffers them. When it sees a new line or carriage
return, it examines them. If the line starts with a ‘>’ and is followed by a valid 3-letter
command, it will carry out the command. This may include reporting sensed values, or
writing the factory calibration values or EMR.
3. If the head-board wishes to send text to display, via the body-board's outgoing serial port, it
uses the 0x6364 command to send the text characters.
esn
bsv
mot
get
fcc
rlg
The following kinds of messages can be sent from the head-board to the body-board:
0x6466 ('df')  32    Data frame. This has all the bits for the LEDs, motor drivers, power
                     controls, etc. Seems to have a sequence number in it.
0x6473 ('ds')  0     Disconnect the battery, to shut off the system.
0x6675 ('fu')  1028  Firmware update frame. Sends 1024B as part of the DFU payload. The
                     first 16b is the offset in the program memory to update; the next 16b
                     are the number of 32-bit words in the payload to write. (The packet is
                     a fixed size, so it may be padded out.)
0x6D64 ('md')  0     Change the mode: enter the boot-loader? start regular reports?
0x7276 ('rv')  0     Requests the application version. If there is an application, it
                     responds with a 0x7276. If there isn't an application, the boot-loader
                     responds with a 0x6B61 with a 0 payload (a NAK).
0x7374 ('st')  0     Validate the flash: check the newly downloaded program and that it
                     passed signature checks. The boot-loader sends back a 0x6B61 to ACK
                     that the firmware passed checks, or NAK that it did not. If
                     successful, the application is started.
0x7878 ('xx')  0     Erases the current program memory (the currently installed image). The
                     boot-loader sends back a 0x6B61 to acknowledge the erase when it has
                     completed.
38.1. ENUMERATIONS
38.1.2 Motors
The motor indices are:
The parameters for the response message are: {offsets are unknown}
128  640  uint16_t[320]  The microphone samples. The size of the message suggests that it
                         holds 80 samples from each microphone (4 microphones ×
                         2 bytes/sample × 80 samples/microphone == 640 bytes) for the voice
                         activity detection audio processing.
Some of these bits may have had different meaning in the past, and became unused with body-
board firmware revisions.
Bluetooth LE
Communication
Protocol
This chapter describes Vector’s Bluetooth LE communication protocol.
Note: communication with the Cube is simply reading and writing a characteristic, and is covered
in Appendix F.
The application layer messages may be arbitrarily large. To support Bluetooth LE 4.1 (the version
in Vector, and many mobile devices) the CLAD message must be broken up into small chunks to
be sent, and then reassembled on receipt.
[Figure: Bluetooth LE protocol stack – CLAD messages (tag, parameters, data) are wrapped in RTS framing (version, tag), pass through the encrypt/decrypt layer after the handshake, are fragmented into frames (control byte, length, data), and are carried as Bluetooth LE characteristic reads and writes]
THE BLUETOOTH LE layer is the link/transport media. It handles the delivery, and low-level error
detection, of exchanging message frames. The frames are fragments of the overall message. The
GUIDs for the services and characteristics can be found in Appendix F.
THE FRAGMENTATION & REASSEMBLY layer is responsible for breaking up a message into multiple
frames and reassembling them into a message.
THE ENCRYPTION & DECRYPTION LAYER is used to encrypt and decrypt the messages, after the
communication channel has been set up.
THE RTS is extra framing information that identifies the kind of CLAD message, and the version of
its format. The format changed with versions, so this version code is embedded at this layer.
THE C-LIKE ABSTRACT DATA (CLAD) is the layer that decodes the messages into values for fields,
and interprets them.
If you connect for the “first time” – or wish to re-pair with him – put him on the charger and press
the backpack button twice quickly. He’ll display a screen indicating he is getting ready to pair.
If you have already paired the application with Vector, the encryption keys can be reused.
The process to set up a Bluetooth LE communication with Vector is complex. The sequence has
many steps:
Connection Request
Connection Response
Nonce
Nonce Response
Challenge
Challenge response
Challenge success
1. The application opens Bluetooth LE connection (retrieving the service and characteristics
handles) and subscribes to the “read” characteristic (see Appendix F for the UUID).
2. Vector sends handshake message; which the application receives. The handshake message
structure is given below. The handshake message includes the version of the protocol
supported.
4. Then Vector will send a connection request, consisting of the public key to use for the
session. The application's response depends on whether this is a first-time pairing, or a
reuse.
a. First time pairing requires that Vector have already been placed into pairing
mode prior to connecting to Vector. The application keys should be created (see
section 39.3.1 First time pairing above).
b. Reconnection can reuse the public and secret keys, and the encryption and
decryption keys from a prior pairing
7. Vector will send a nonce message. After the application has sent its response, the channel
will now be encrypted.
8. Vector will send a challenge message. The application should increment the passed value
and send it back as a challenge message.
If the user puts Vector on the charger, and double clicks the backpack button, Vector will usually
send a disconnect request.
The control byte is used to tell the receiver how to reassemble the message using this frame.
If the MSB bit (bit 7) is set, this is the start of a new message. The previous message
should be discarded.
If the 2nd MSB (bit 6) is set, this is the end of the message; there are no more frames.
The 6 LSB bits (bits 0..5) are the number of payload bytes in the frame to use.
The receiver would append the payload onto the end of the message buffer. If there are no more
frames to be received it will pass the buffer (and size count) on to the next stage. If encryption has
been set up, the message buffer will be decrypted and then passed to the RTS and CLAD. If
encryption has not been set up, it is passed directly to the RTS & CLAD.
1. Set the MSB (bit 7) of the control byte, since this is the start of a message.
3. Set the number of bytes in the 6 LSB bits of the control byte.
4. If there are no more bytes remaining, set the 2nd MSB (bit 6) of the control byte.
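A sketch of the sending side of this fragmentation; the 20-byte frame payload limit is an assumption based on typical Bluetooth LE 4.x characteristic sizes, and send_frame() stands in for a characteristic write:

#include <stdint.h>
#include <string.h>

#define FRAME_PAYLOAD_MAX 20

extern void send_frame(const uint8_t *frame, size_t len);

static void send_message(const uint8_t *msg, size_t len)
{
    size_t off = 0;
    int first = 1;

    while (off < len) {
        size_t chunk = len - off;
        if (chunk > FRAME_PAYLOAD_MAX)
            chunk = FRAME_PAYLOAD_MAX;

        uint8_t frame[1 + FRAME_PAYLOAD_MAX];
        frame[0] = (uint8_t)chunk;            /* bits 0..5: byte count   */
        if (first)
            frame[0] |= 0x80;                 /* bit 7: start of message */
        if (off + chunk == len)
            frame[0] |= 0x40;                 /* bit 6: end of message   */
        memcpy(frame + 1, msg + off, chunk);

        send_frame(frame, 1 + chunk);
        off += chunk;
        first = 0;
    }
}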
The application should generate its own internal public and secret keys at start:
crypto_kx_keypair(publicKey, secretKey);
Example 2: Bluetooth LE key pair
The application will send a connection response with first-time-pairing set, and the public key.
After Vector receives the connection response, he will display the pin code. (See the steps in the
next section for when this will occur.)
39.3.2 Reconnecting
Reconnecting can reuse the public and secret keys, and the encryption and decryption keys. It is
not known how long these persist on Vector.
Each received enciphered message can be decrypted from cipher text (cipher and cipherLen) to the
message buffer (message and messageLen) for further processing:
Each message to be sent can be encrypted from the message buffer (message and messageLen) into
cipher text (cipher and cipherLen) that can be fragmented and sent:
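A sketch of how these calls could be built with libsodium (the library the key-pair example above comes from). The use of crypto_kx session keys and crypto_secretbox, and the nonce handling, are assumptions; Vector's actual cipher construction is not confirmed here:

#include <sodium.h>

static unsigned char tx_key[crypto_kx_SESSIONKEYBYTES];
static unsigned char rx_key[crypto_kx_SESSIONKEYBYTES];

/* Derive the session keys from our key pair and Vector's public key. */
static int make_session(const unsigned char *our_pk,
                        const unsigned char *our_sk,
                        const unsigned char *vector_pk)
{
    return crypto_kx_client_session_keys(rx_key, tx_key, our_pk, our_sk, vector_pk);
}

/* message -> cipher; cipher needs messageLen + crypto_secretbox_MACBYTES bytes. */
static int encrypt_msg(unsigned char *cipher, const unsigned char *message,
                       unsigned long long messageLen, const unsigned char *nonce)
{
    return crypto_secretbox_easy(cipher, message, messageLen, nonce, tx_key);
}

/* cipher -> message; returns nonzero if authentication fails. */
static int decrypt_msg(unsigned char *message, const unsigned char *cipher,
                       unsigned long long cipherLen, const unsigned char *nonce)
{
    return crypto_secretbox_open_easy(message, cipher, cipherLen, nonce, rx_key);
}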
Tag
If the type byte is 4, the version is held in the next byte. (If the type is 1, there is no
version byte.)
The next byte is the tag – the value used to interpret the message.
The tag, parameter body, and version are passed to the CLAD layer for interpretation. This is
described in the next section.
The log request is sent to Vector. In principle this includes a list of the kinds of logs (called
filter names) to be included. In practice, the “filter name” makes no difference.
Vector's response, if a file will be sent, includes an affirmative and a 32-bit file identifier
used for the file transfer.
Vector zips the log files up (as a tar.bz2 compressed archive) and sends the chunks to the
application. Each chunk has this file identifier. (Conceptually there could be several files in
transfer at a time.)
The file transfer is complete when the packet number matches the packet total.
The BLE Shell Connect request is sent to Vector. Vector's response will include a status code
indicating success or not. If successful, a bi-directional stream can be sent.
The client has the option to close the shell connection at any time by sending a BLE Shell
Disconnect request.
Note: The BLE Shell connection requires Version 6 of the BLE protocol to be honored by Vector.
No version of the Vector software has been identified that supports this version.
Requests are from the mobile application to Vector, and responses are from Vector to the
application.
Disconnect  0x11  0
Response    0x21  4
40.1.1 Request
The parameters of the request body are:
40.1.2 Response
There is no response.
40.2.1 Request
The request body has no parameters.
40.2.2 Response
The parameters of the response body are:
40.3.1 Request
The request body has no parameters.
40.3.2 Response
The parameters of the response body are:
40.4.1 Request
The parameters of the request body are:
2  varies  uint8_t[text length]  text  The text to send to the client from the shell.
40.4.2 Response
The parameters of the response body are:
40.5.1 Request
The parameters of the request body are:
2  varies  uint8_t[text length]  text  The text to send to the shell (server) from the client.
40.5.2 Response
The parameters of the response body are:
40.6.1 Request
The command has no parameters.
40.6.2 Response
There is no response.
40.7.1 Request
The parameters of the request body are:
The application, when it receives this message, should increment the value and send the response
(a challenge message).
40.7.2 Response
The parameters of the response body are:
40.8.1 Request
The command has no parameters.
40.8.2 Response
There is no response.
40.9.1 Command
The parameters of the request body are:
29
https://groups.google.com/forum/#!msg/anki-vector-rooting/YlYQsX08OD4/fvkAOZ91CgAJ
https://groups.google.com/forum/#!msg/anki-vector-rooting/XAaBE6e94ek/OdES50PaBQAJ
40.10.1 Request
The parameters of the request body are:
The application, when it receives this message, should use the public key for the session, and
send a response back.
40.10.2 Response
The parameters for the connection response message are:
The application sends the response, with its publicKey (see section 39.3 Encryption support). A
“first time pairing” connection type will cause Vector to display a pin code on the screen.
If Vector is not in pairing mode – was not put on his charger and the backpack button
pressed twice, quickly – Vector will respond. Attempting to enter pairing mode now will
cause Vector to send a disconnect request.
If Vector is in pairing mode, Vector will display a pin code on the screen, and send a nonce
message, triggering the next steps of the conversation.
If a reconnection is sent, the application would employ the public and secret keys, and the
encryption and decryption keys from a prior pairing.
The application may send this to request Vector to close the connection.
40.11.1 Request
The command has no parameters.
40.11.2 Response
There is no response.
40.12.1 Request
There is no direct request.
40.12.2 Response
The parameters of the response body are:
1 4 uint32_t file id
40.13.1 Request
The parameters of the request body are:
40.13.2 Response
It can take several seconds for Vector to prepare the log archive file and send a response. The
response will be a “log response” (below) and a series of “file download” responses.
1 4 uint32_t file id A 32-bit identifier that will be used in the file download
messages.
40.14.1 Request
The parameters for the nonce request message are:
40.14.2 Response
After receiving a nonce, if the application is in first-time pairing the application should send a
response, with a value of 3.
After the response has been sent, the channel will now be encrypted. If Vector likes the response,
he will send a challenge message.
40.15.1 Request
The parameters of the request body are:
40.15.2 Response
The response will be one or more “OTA response” messages indicating the status of the update, or
errors. Status codes >= 200 indicate that the update process has completed. The update has
completed the download when the current number of bytes matches the expected number of bytes.
Note: the status codes 200 and above are from the update-engine, and are given in Appendix D.
40.17.1 Request
The parameters of the request body are:
40.17.2 Response
The parameters for the response message are:
40.18.1 Request
The SSH key command passes the authorization key by dividing it up into substrings and passing
the list of substrings. The substrings are appended together by the recipient to form the overall
authorization key.
40.18.2 Response
The response has no parameters.
40.19.1 Request
The request has no parameters.
40.19.2 Response
The parameters for the response message are:
1       uint8_t                   version length    The number of bytes in the version string; may be 0. (version >= 2)
varies  uint8_t[version length]   version           The version string. (version >= 2)
1       uint8_t                   ESN length        The number of bytes in the ESN string; may be 0. (version >= 4)
varies  uint8_t[ESN length]       ESN               The electronic serial number string. (version >= 4)
1       uint8_t                   OTA in progress   0 if an over-the-air update is not in progress, otherwise in process of an over-the-air update. (version >= 2)
1       uint8_t                   has owner         0 if it does not have an owner, otherwise has an owner. (version >= 3)
1       uint8_t                   cloud authorized  0 if not cloud authorized, otherwise is cloud authorized. (version >= 5)
Note: a hex string is a series of bytes with values 0-15. Every pair of bytes must be converted to a
single byte to get the characters. Even bytes are the high nibble, odd bytes are the low nibble.
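A sketch of decoding that convention:

#include <stddef.h>
#include <stdint.h>

/* in: inLen nibble values (each 0-15); out: inLen/2 characters plus a NUL. */
static void hexstring_decode(const uint8_t *in, size_t inLen, char *out)
{
    for (size_t i = 0; i + 1 < inLen; i += 2)
        out[i / 2] = (char)((in[i] << 4) | (in[i + 1] & 0x0F));   /* high, low nibble */
    out[inLen / 2] = '\0';
}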
40.20.1 Request
The request body has no parameters.
40.20.2 Response
The parameters of the response body are:
If successful, Vector will provide a WiFi Access Point with an SSID that matches his robot name.
40.21.1 Request
The parameters of the request body are:
40.21.2 Response
If the Bluetooth LE session is not cloud authorized a “response” message will be sent with this
error. Otherwise the WiFi Access Point response message will be sent.
1  1       uint8_t                   SSID length      The number of bytes in the SSID string; may be 0
2  varies  uint8_t[SSID length]      SSID             The WiFi SSID (hex string)
   1       uint8_t                   password length  The number of bytes in the password string; may be 0
   varies  uint8_t[password length]  password         The WiFi password
40.22.1 Request
The parameters for the request message are:
1 WEP
2 WEP shared
3 IEEE8021X
4 WPA PSK
5 WPA2 PSK
6 WPA2 EAP
40.22.2 Response
The parameters for the response message are:
1  varies  uint8_t[SSID length]  SSID            The SSID (hex string) that was deleted
   1       uint8_t               WiFi state      See Table 70: WiFi state enumeration
   1       uint8_t               connect result  (version >= 3)
40.23.1 Request
The parameters for the request message are:
40.23.2 Response
The parameters for the response message are:
40.24.1 Request
The request has no parameters
40.24.2 Response
The parameters for the response message are:
1  1   uint8_t      has IPv6      0 if Vector doesn't have an IPv6 address; otherwise it does
2  4   uint8_t[4]   IPv4 address  Vector's IPv4 address
6  32  uint8_t[16]  IPv6 address  Vector's IPv6 address
40.25.2 Response
The response lists the Wi-Fi access points Vector can find. The parameters for the response
message are:
Note: the information in this chapter comes from the protobuf specification files in the python
SDK, from the SDK itself, and some analysis of the mobile application. All quotes (unless
otherwise indicated) are from the SDK.
Bearer BASE64KEY
Content-Type: application/json
30
The protocol was specified in Google Protobuf.
Motion Control
Motion Sensing – how Vector senses that he is moving
Onboarding
Photos – commands to access (and delete) photographs and their thumbnails
Settings and Preferences
Software Updates, used to update Vector’s software – operating system, applications,
assets, etc.
42.1. ENUMERATIONS
42.1.1 ResultCode
The ResultCode enumeration has the following named values:
42.1.2 RobotStatus
The RobotStatus is a bit mask used to indicate what Vector is doing, and the status of his controls.
It is used in the RobotState message. The enumeration has the following named bits (any number
may be set). Note that some bits have two names; the second name is one employed by Anki’s
python SDK.
42.2.1 CladPoint
The CladPoint is used to represent a 2D rectilinear point on an image or in the 2D map. It has the
following fields:
42.2.2 CladRect
The CladRect is used to represent a 2D rectilinear rectangle on an image. It has the following
fields:
Table 85: CladRect JSON structure
Field   Type   Units   Description
height  float  pixels  The height of the rectangle
42.2.3 PoseStruct
The PoseStruct is used to represent a 3D rectilinear point and orientation on the map. It has the
following fields:
42.2.4 ResponseStatus
The ResponseStatus is “a shared response message sent back as part of most requests. This will
indicate the generic state of the request.” It has the following fields:
You too can create custom objects for Vector to… at least see and perceive. Maybe even love.
There are four kinds of custom objects that you can define:
A fixed, unmarked cube-shaped object. The object is in a fixed position and orientation,
and it can’t be observed (since it is unmarked). So there won’t be any events related to this
object. “This could be used to make Vector aware of objects and know to plot a path
around them.”
A note about object id’s: The object id may change: “a cube disconnecting and reconnecting it's
removed and then re-added to robot's internal world model which results in a new ID.”
The client should employ a timer for each potential visual object. If there isn’t an “object
observed” event received in the time period, it should be assumed “that Vector can no longer see
an object.”
43.1. ENUMERATIONS
The CustomObjectMarker enumerates the marker symbols
The CustomType refers to the one of the 20 possible custom objects that can be defined
The ObjectFamily is an older, now deprecated method, of enumerating the kind of object
(as in, charger, light cube, wall, box, or custom cube).
The ObjectType enumeration is the preferred method of enumerating the kinds of objects
Table 89: CustomObjectMarker enumeration
Name                        Value
CUSTOM_MARKER_UNKNOWN       0
CUSTOM_MARKER_CIRCLES_2     1
CUSTOM_MARKER_CIRCLES_3     2
CUSTOM_MARKER_CIRCLES_4     3
CUSTOM_MARKER_CIRCLES_5     4
CUSTOM_MARKER_DIAMONDS_2    5
CUSTOM_MARKER_DIAMONDS_3    6
CUSTOM_MARKER_DIAMONDS_4    7
CUSTOM_MARKER_DIAMONDS_5    8
CUSTOM_MARKER_HEXAGONS_2    9
CUSTOM_MARKER_HEXAGONS_3    10
CUSTOM_MARKER_HEXAGONS_4    11
CUSTOM_MARKER_HEXAGONS_5    12
CUSTOM_MARKER_TRIANGLES_2   13
CUSTOM_MARKER_TRIANGLES_3   14
CUSTOM_MARKER_TRIANGLES_4   15
CUSTOM_MARKER_TRIANGLES_5   16
CUSTOM_MARKER_COUNT         16
43.1.4 ObjectType
The ObjectType is used to represent the type of object that a symbol is attached to. The
enumeration has the following named values:
43.2.1 ObjectEvent
The ObjectEvent event is sent (see Event message) when the state of an object has changed. The
structure has one (and only one) of the following fields:
43.2.2 ObjectAvailable
The ObjectAvailable event is sent (see section 43.2.1 ObjectEvent) when Vector has received
Bluetooth LE advertisements from the object (cube).
Table 94: ObjectAvailable JSON structure
Field       Type    Units  Description
factory_id  string         The identifier for the cube. This is built into the cube.
43.2.3 ObjectConnectionState
The ObjectConnectedState event is to “indicate that a cube has connected or disconnected to the
robot. This message will be sent for any connects or disconnects regardless of whether it
originated from us or underlying robot behavior.”
Table 95: ObjectConnectedState JSON structure
Field        Type        Units  Description
connected    bool               True if Vector has a Bluetooth LE connection with the Cube.
factory_id   string             The identifier for the cube. This is built into the cube.
object_id    uint32             The identifier of the object that Vector is (or was) connected to.
object_type  ObjectType         The type of object referred to.
43.2.4 ObjectMoved
The ObjectMoved event is sent (see section 43.2.1 ObjectEvent) when an object has changed its
position. The structure has the following fields:
43.2.5 ObjectStoppedMoving
The ObjectStoppedMoving event is sent (see section 43.2.1 ObjectEvent) when an object previously
identified as moving has come to rest. The structure has the following fields:
Table 97: ObjectStoppedMoving JSON structure
Field      Type    Units  Description
object_id  uint32         The identifier of the object that was moving.
timestamp  uint32         The time that the event occurred. The format is milliseconds since Vector's epoch.
43.2.6 ObjectUpAxisChanged
The ObjectUpAxisChanged event is sent (see section 43.2.1 ObjectEvent) if the orientation of the object has significantly changed, leaving it with a new face upward. The structure has the following fields:
43.2.7 RobotObservedObject
The RobotObservedObject event is sent when “an object with [the] specified ID/Type was seen at a
particular location in the image and the world.” This event structure has the following fields:
Table 100: RobotObservedObject JSON structure
Field                     Type          Units    Description
img_rect                  CladRect               The position of the object within the vision image.
is_active                 uint32
object_family             ObjectFamily           Deprecated. "Use ObjectType instead to reason about groupings of objects."
object_id                 int32                  The identifier of the object that has been seen. Note that this is signed (int32 instead of uint32) for internal compatibility reasons.
object_type               ObjectType             The type of object referred to.
pose                      PoseStruct             The observed pose of this object. Optional.
timestamp                 uint32                 The time that the object was most recently observed. The format is milliseconds since Vector's epoch.
top_face_orientation_rad  float         radians  The "absolute orientation of [the] top face, iff isActive==true"; the "angular distance from the current reported up axis."
Post: “/v1/create_fixed_custom_object”
43.3.1 Request
The CreateFixedCustomObjectRequest structure has the following fields:
Table 101: CreateFixedCustomObjectRequest JSON structure
Field      Type        Units  Description
pose       PoseStruct         The position and orientation of this object.
x_size_mm  float       mm     The size of the object along the x-axis.
y_size_mm  float       mm     The size of the object along the y-axis.
z_size_mm  float       mm     The size of the object along the z-axis.
43.3.2 Response
The CreateFixedCustomObjectResponse structure has the following fields:
Table 102: CreateFixedCustomObjectResponse JSON structure
Field      Type            Description
object_id  uint32          The object identifier assigned to this object.
status     ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Note: “No instances of this object are added to the world until they have been seen.”
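As a hedged sketch (not SDK code), a fixed obstacle can be declared with a single POST. The robot address and authentication token are placeholders, the PoseStruct field names used (x/y/z, quaternion q0..q3, origin_id) are assumptions, and certificate checking is skipped because the robot uses a self-signed certificate.

    # Sketch: declare a 50 mm cube-shaped obstacle 200 mm ahead of the origin.
    import requests

    ROBOT_IP = "192.168.1.42"          # placeholder
    TOKEN = "your-client-token-guid"   # placeholder, from /v1/user_authentication

    body = {
        "pose": {"x": 200.0, "y": 0.0, "z": 0.0,       # assumed field names
                 "q0": 1.0, "q1": 0.0, "q2": 0.0, "q3": 0.0,
                 "origin_id": 1},
        "x_size_mm": 50.0, "y_size_mm": 50.0, "z_size_mm": 50.0,
    }
    resp = requests.post(f"https://{ROBOT_IP}/v1/create_fixed_custom_object",
                         json=body,
                         headers={"Authorization": f"Bearer {TOKEN}"},
                         verify=False)  # self-signed certificate
    print(resp.json()["object_id"])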
Post: “/v1/define_custom_object”
43.4.1 Request
The DefineCustomObjectRequest structure has the following fields:
Table 103: DefineCustomObjectRequest JSON structure
Field        Type        Units  Description
custom_type  CustomType         The object type to be assigned to this object.
Note: only one of “custom_box,” “custom_cube,” or “custom_wall” can be used in the request.
The CustomBoxDefinition defines a custom box of the given size. The structure has the following fields:
Table 104: CustomBoxDefinition JSON structure
Field             Type                Units  Description
marker_back       CustomObjectMarker         The marker symbol used on the back surface of the box. This marker must be unique (not used by any of the other sides on this box, or in any other shape).
marker_bottom     CustomObjectMarker         The marker symbol used on the bottom surface of the box. This marker must be unique (not used by any of the other sides on this box, or in any other shape).
marker_front      CustomObjectMarker         The marker symbol used on the front surface of the box. This marker must be unique (not used by any of the other sides on this box, or in any other shape).
marker_left       CustomObjectMarker         The marker symbol used on the left-hand side of the box. This marker must be unique (not used by any of the other sides on this box, or in any other shape).
marker_right      CustomObjectMarker         The marker symbol used on the right-hand side of the box. This marker must be unique (not used by any of the other sides on this box, or in any other shape).
marker_top        CustomObjectMarker         The marker symbol used on the top surface of the box. This marker must be unique (not used by any of the other sides on this box, or in any other shape).
marker_height_mm  float               mm     The height of the marker symbol.
marker_width_mm   float               mm     The width of the marker symbol.
x_size_mm         float               mm     The size of the object, along the x-axis, that the marker symbol is on.
y_size_mm         float               mm     The size of the object, along the y-axis, that the marker symbol is on.
z_size_mm         float               mm     The size of the object, along the z-axis, that the marker symbol is on.
The CustomCubeDefinition “defines a custom cube of the given size.” The structure has the
following fields:
Table 105: CustomCubeDefinition JSON structure
Field             Type                Units  Description
marker            CustomObjectMarker         The marker symbol used on all of the cube surfaces; "the same marker [must] be centered on all faces."
marker_height_mm  float               mm     The height of the marker symbol.
marker_width_mm   float               mm     The width of the marker symbol.
size_mm           float               mm     The height, width, and depth of the object that the marker symbol is on.
The CustomWallDefinition defines a custom wall of the given height and width. The structure has the following fields:
Table 106: CustomWallDefinition JSON structure
Field             Type                Units  Description
marker            CustomObjectMarker         The marker symbol used on the wall surfaces; "the same marker centered on both sides (front and back)."
marker_height_mm  float               mm     The height of the marker symbol.
marker_width_mm   float               mm     The width of the marker symbol.
height_mm         float               mm     The height of the object that the marker symbol is on.
width_mm          float               mm     The width of the object that the marker symbol is on.
43.4.2 Response
The DefineCustomObjectResponse type has the following fields:
Table 107: DefineCustomObjectResponse JSON structure
Field    Type            Description
status   ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
success  bool            True if the object was successfully defined; otherwise there was an error.
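A sketch of defining a custom cube follows, under the same assumptions as the earlier example (placeholder address and token, self-signed certificate); the numeric value used for custom_type is also an assumption about the enumeration's values.

    # Sketch: define a 44 mm custom cube using the Circles2 marker.
    import requests

    ROBOT_IP = "192.168.1.42"          # placeholder
    TOKEN = "your-client-token-guid"   # placeholder

    body = {
        "custom_type": 1,              # assumption: first of the 20 custom slots
        "custom_cube": {
            "marker": 1,               # CUSTOM_MARKER_CIRCLES_2 (value 1, Table 89)
            "size_mm": 44.0,
            "marker_width_mm": 25.0,
            "marker_height_mm": 25.0,
        },
    }
    resp = requests.post(f"https://{ROBOT_IP}/v1/define_custom_object",
                         json=body,
                         headers={"Authorization": f"Bearer {TOKEN}"},
                         verify=False)  # self-signed certificate
    print(resp.json()["status"])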
Post: “/v1/delete_custom_objects”
43.5.1 Request
The DeleteCustomObjectsRequest type has the following fields:
Table 108: DeleteCustomObjectsRequest JSON structure
Field  Type                      Description
mode   CustomObjectDeletionMode  The kind of custom objects to remove.
The CustomObjectDeletionMode is used to specify which kinds of custom objects should be deleted
from the internal database. The enumeration has the following named values:
Table 109: CustomObjectDeletionMode Enumeration
Name                   Value  Description
DELETION_MASK_UNKNOWN  0
43.5.2 Response
The DeleteCustomObjectsResponse type has the following fields:
Table 110: DeleteCustomObjectsResponse JSON structure
Field   Type            Description
status  ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
See also section 51 Cube, and section 57 Interactions with Objects, which covers actions/behaviors
that involve interacting with objects and faces.
Actions often have a tag (an arbitrary value given to them by the SDK application) and a result code. An action can be cancelled using its tag. Behaviors do not have tags.
Behaviors are part of the behavior tree, and can potentially submit other behaviors based on
prevailing conditions. See Chapter 27 for more detail on behaviors.
Behaviors are submitted at the priority level associated with the connection. If the connection has released control, requested behaviors and actions are ignored. The SDK application chooses a priority level at the time it requests control. Behaviors requested by Vector's internal AI with a lower priority will be ignored; internal behaviors with a higher priority will take control (causing the SDK to lose control). By giving up control, or changing the control priority, the SDK can effectively cancel the behavior it requested.
Requesting control at the RESERVE_CONTROL priority level "can be used to suppress the ordinary idle behaviors of the Robot and keep Vector still between SDK control instances. Care must be taken when blocking background behaviors, as this may make Vector appear non-responsive."
44.1. ENUMERATIONS
44.1.1 ActionTagConstants
The ActionTagConstants give the range of numbers from which an identifier can be assigned to an action, so that the action can be cancelled later.
Table 111: ActionTagConstants Enumeration
Name             Value  Description
INVALID_SDK_TAG  0
44.1.2 BehaviorResults
The BehaviorResults enumeration is used to report the outcome of a requested behavior. The enumeration has the following named values:
Table 112: BehaviorResults Enumeration
Name                          Value  Description
BEHAVIOR_INVALID_STATE        0
BEHAVIOR_COMPLETE_STATE       1
BEHAVIOR_WONT_ACTIVATE_STATE  2
44.2.1 FeatureStatus
The FeatureStatus event is sent as Vector's behavior focus changes. The structure has the following fields:
Table 113: FeatureStatus JSON structure
Field         Type    Description
feature_name  string  The current active behaviour (feature). See Appendix H, Table 612: The AI behaviour features for a list and description.
source        string  Where the direction to do this behavior came from: "Voice", "App", "AI", or "Unknown". "Voice" is for responses to voice commands and intents; "App" is for application-submitted intents; "AI" is for behaviors initiated by the high-level AI.
Note: for Vector-OS feature flags, see section 55 Features & Entitlements.
44.2.2 StimulationInfo
The StimulationInfo event is used report events that impact Vector’s emotion state and overall
stimulation level. The structure has the following fields:
Table 114: StimulationInfo JSON structure
Field  Type   Units     Description
accel  float  mm/sec^2  The acceleration at the time of the stimulation.
44.3.1 ActionResults
“The possible results of running an action.” The structure has the following fields:
Table 115: ActionResults JSON structure
Field  Type              Description
code   ActionResultCode  The results.
Table 116: ActionResultCode Enumeration
Name                   Value  Description
ACTION_RESULT_SUCCESS  0      "Action completed successfully."
The request specifies a priority level. After control is granted, Vector’s AI will suppress internal
behaviors with a lower priority. When a behavior is commanded by the SDK, it will be associated
with the priority level selected here. Note: the priority level is represented by a number where
lower values represent higher priorities, and higher values represent lower priorities. See Chapter
27 for a detailed description of behavior priorities.
There are two entry points: AssumeBehaviorControl and BehaviorControl. Both employ the same
request and response message structures. The response is a stream that includes information when
the control was acquired, and lost.
Post: “/v1/assume_behavior_control”
44.4.1 Request
The BehaviorControlRequest is used to request control of Vector’s behavior stream, and to release
it. This structure includes one (and only one) of the following fields:
Table 117: BehaviorControlRequest JSON structure
Field            Type            Description
control_release  {}              This is used when the application is releasing control back to Vector; the value is an empty dictionary.
control_request  ControlRequest  This is used when the application is requesting control of Vector; see below for a description.
The ControlRequest is used to request control of the behavior system at a given priority. This
structure has the following fields:
Table 118: ControlRequest JSON structure
Field     Type      Description
priority  Priority  This is the priority level that should be employed for requested behaviors; internal behaviors with a priority lower than this will be suppressed.
The Priority enumeration has the following named priority level values:
The BehaviorControlResponse structure has one (and only one) of the following fields:
Table 120: BehaviorControlResponse JSON structure
Field                        Type           Description
control_granted_response     {}             The application is now in control of the behavior stream and is "free to run any actions and behaviors they like. Until a ControlLostResponse is received, they are directly in control of Vector's behavior system."
control_lost_event           {}             "This informs the user that they lost control of the behavior system... to a higher priority behavior." "This control can be regained through another" BehaviorControlRequest.
keep_alive                   KeepAlivePing  "Used by Vector to verify the connection is still alive."
reserved_control_lost_event  {}             The "behavior system lock has been lost to another connection." "This control can be regained through another" BehaviorControlRequest. This is sent when the SDK is at the RESERVE_CONTROL priority level.
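A sketch of acquiring control over the HTTPS interface follows. The placeholder address and token, the priority value 20 (lower numbers are higher priority), and the gateway's line-delimited JSON framing with a "result" wrapper are all assumptions.

    # Sketch: take control of the behavior stream at a mid-level priority.
    import json
    import requests

    ROBOT_IP = "192.168.1.42"          # placeholder
    TOKEN = "your-client-token-guid"   # placeholder

    body = {"control_request": {"priority": 20}}   # assumed priority value
    with requests.post(f"https://{ROBOT_IP}/v1/assume_behavior_control",
                       json=body,
                       headers={"Authorization": f"Bearer {TOKEN}"},
                       stream=True, verify=False) as stream:
        for line in stream.iter_lines():
            if not line:
                continue
            msg = json.loads(line)
            payload = msg.get("result", msg)       # gateway may wrap messages
            if "control_granted_response" in payload:
                print("control granted; SDK actions and behaviors may run now")
                break
            if "control_lost_event" in payload:
                print("a higher-priority behavior took control")
                break
    # In practice, dropping the stream (leaving the with-block) releases
    # control; sending control_release ({}) does so explicitly.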
Post: “/v1/cancel_action_by_id_tag”
44.5.1 Request
The CancelActionByIdTagRequest structure has the following fields:
Table 121: CancelActionByIdTagRequest JSON structure
Field   Type    Description
id_tag  uint32  "Use the id_tag provided to the action request"
44.5.2 Response
The CancelActionByIdTagResponse type has the following fields:
Table 122: CancelActionByIdTagResponse JSON structure
Field   Type            Description
status  ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Post: “/v1/cancel_behavior”
44.6.1 Request
The CancelBehaviorRequest structure has no fields.
44.6.2 Response
The CancelBehaviorResponse type has the following fields:
Table 123: CancelBehaviorResponse JSON structure
Field   Type            Description
status  ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Post: “/v1/look_around_in_place”
44.7.1 Request
The LookAroundInPlaceRequest structure has no fields.
44.7.2 Response
The LookAroundInPlaceResponse structure has the following fields:
Table 124: LookAroundInPlaceResponse JSON structure
Field   Type             Description
result  BehaviorResults
45.1. ENUMERATIONS
45.1.1 AlexaAuthState
The AlexaAuthState is used represent how far in the Alexa Voice Services authorization process
Vector is. The enumeration has the following named values:
Table 125: AlexaAuthState Enumeration
Name                Value  Description
ALEXA_AUTH_INVALID  0      "Invalid/error/versioning issue"
45.2. EVENTS
45.2.1 AlexaAuthEvent
The AlexaAuthEvent is used to post updates to SDK application (via the Event message) when the
authorization with Alexa Voice Services change. The structure has the following fields:
Table 126: AlexaAuthEvent JSON structure
Field       Type            Description
auth_state  AlexaAuthState
extra       string
Post: “/v1/alexa_auth_state”
45.3.1 Request
The AlexaAuthStateRequest structure has no fields.
45.3.2 Response
The AlexaAuthStateResponse structure has the following fields:
Table 127: AlexaAuthStateResponse JSON structure
Field       Type            Description
auth_state  AlexaAuthState
extra       string
status      ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Post: “/v1/alexa_opt_in”
45.4.1 Request
The AlexaOptInRequest structure has the following fields:
Table 128: AlexaOptInRequest JSON structure
Field   Type  Description
opt_in  bool  True, if Vector should employ Alexa Voice Services; otherwise Vector should not.
45.4.2 Response
The AlexaOptInResponse structure has the following fields:
Table 129: AlexaOptInResponse JSON structure
Field   Type            Description
status  ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
46.1. STRUCTURES
46.1.1 Animation
This structure is used to provide the name of an animation. The Animation structure has the
following fields:
46.1.2 AnimationTrigger
This structure is used to provide the name of an animation group (aka its trigger name). The
AnimationTrigger structure has the following fields:
Table 131: AnimationTrigger JSON structure
Field  Type    Description
name   string  "The name of a given animation trigger"
Post: “/v1/list_animations”
46.2.1 Request
The ListAnimationsRequest has no fields.
46.2.2 Response
The ListAnimationsResponse structure has the following fields:
Table 132: ListAnimationsResponse JSON structure
Field            Type         Description
animation_names  Animation[]  "The animations that Vector knows."
Post: “/v1/list_animation_triggers”
46.3.1 Request
The ListAnimationTriggersRequest has no fields.
46.3.2 Response
The ListAnimationTriggersResponse structure has the following fields:
Table 133: ListAnimationTriggersResponse JSON structure
Field                    Type                Description
animation_trigger_names  AnimationTrigger[]  "The animation triggers that Vector knows."
46.4.1 Request
The PlayAnimationRequest structure has the following fields:
Table 134: PlayAnimationRequest JSON structure
Field      Type       Units  Description
animation  Animation         "The animation to play."
46.4.2 Response
The PlayAnimationResponse structure has the following fields:
Table 135: PlayAnimationResponse JSON structure
Field      Type       Description
animation  Animation  "The animation that the robot executed."
46.5.1 Request
The PlayAnimationTriggerRequest structure has the following fields:
Table 136: PlayAnimationTriggerRequest JSON structure
Field              Type              Units  Description
animation_trigger  AnimationTrigger         "The animation trigger to play."
46.5.2 Response
See the response for Play Animation.
47.1. EVENTS
47.1.1 AttentionTransfer
This event is sent when Vector redirects his attention, e.g. because a voice request could not be carried out (for one of the reasons enumerated below). The AttentionTransfer structure has the following fields:
Table 137: AttentionTransfer JSON structure
Field   Type                     Description
reason  AttentionTransferReason  The reason that the attention was changed.
The AttentionTransferReason is used to represent why the attention was transferred. The
enumeration has the following named values:
Table 138: AttentionTransferReason Enumeration
Name               Value  Description
Invalid            0
NoCloudConnection  1
NoWifi             2
UnmatchedIntent    3
Post: “/v1/get_latest_attention_transfer”
47.2.1 Request
The GetLatestAttentionTransferRequest has no fields.
47.2.2 Response
The GetLatestAttentionTransferResponse has the following fields:
Table 139: GetLatestAttentionTransferResponse JSON structure
Field                      Type                     Description
latest_attention_transfer  LatestAttentionTransfer
status                     ResponseStatus           A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Table 140: LatestAttentionTransfer JSON structure
Field               Type               Description
attention_transfer  AttentionTransfer  When and why the attention was changed.
48.1. ENUMERATIONS
48.1.1 AudioProcessingMode
The AudioProcessingMode is used to represent the different ways that Vector can process the
microphone audio. The enumeration has the following named values:
Table 141: AudioProcessingMode Enumeration
Name                     Value  Description
AUDIO_UNKNOWN            0      "error value"
AUDIO_OFF                1      The audio settings from the HTTPS API will not be used.
AUDIO_FAST_MODE          2      The spatial audio processing is disabled; the sound is used from a single microphone. This has the lowest processing overhead.
AUDIO_DIRECTIONAL_MODE   3      Use "beamforming support for focusing on specific direction – [this] sounds cleanest"
AUDIO_VOICE_DETECT_MODE  4      Use "multi-microphone non-beamforming. [This is] best for voice detection programs."
48.1.2 MasterVolumeLevel
The MasterVolumeLevel is used to control the volume of audio played by Vector, including text to
speech. It is used in the MasterVolumeLevelRequest. The enumeration has the following named
values:
Table 142: MasterVolumeLevel Enumeration
Name                Value  Description
VOLUME_LOW          0
VOLUME_MEDIUM_LOW   1
VOLUME_MEDIUM       2
VOLUME_MEDIUM_HIGH  3
VOLUME_HIGH         4
48.1.3 UtteranceState
The UtteranceState is used to represent the state of audio playback by Vector, including text to
speech. It is used in the SayTextResponse. The enumeration has the following named values:
Table 143: UtteranceState Enumeration
Name     Value  Description
INVALID  0
48.2. EVENTS
The following events are sent in the Event message. When a person speaks the wake word, the
WakeWordBegin event will be sent, followed by the WakeWordEnd event and possibly a
UserIntent event.
48.2.1 AudioSendModeChanged
Note: this event is not available; it was defined in the API protocol but never implemented, and has been removed. It is reproduced here for informational purposes; it may appear in future releases.
This event is "sent when the robot changes the mode it's processing and sending audio" in.
See Chapter 17, section 74.2 Spatial audio processing for more information.
Table 144: AudioSendModeChanged JSON structure
Field  Type                 Description
mode   AudioProcessingMode  The requested audio processing mode.
48.2.2 UserIntent
The UserIntent event is sent by Vector when an intent is received (from the cloud), after a person
has said the wake word and spoken. The UserIntent structure has the following fields:
48.2.3 WakeWord
This event is sent when the wake word is heard, and then when the cloud response is received. The
WakeWord structure has the following fields, only one is present at any time:
(Note: the use of an enumeration rather than a string is unusual here, and seems limiting.)
Table 147: WakeWordEnd JSON structure
Field         Type    Description
intent_heard  bool    True if a sentence was recognized with an associated intent; false otherwise.
intent_json   string  The intent and parameters as a JSON-formatted string. This is empty if an intent was not heard (intent_heard will be false), or if the client does not have control. In the latter case, a UserIntent event with the intent JSON data will be sent.
Post: “/v1/app_intent”
48.3.1 Request
The AppIntentRequest structure has the following fields:
Table 148: AppIntentRequest JSON structure
Field   Type    Description
intent  string  The name of the intent to request; Vector (probably) will only honor the intents listed in the "App Intent" column in Appendix I, Table 615: Mapping of different intent names.
param   string  The parameters for the intent. This is usually a JSON-formatted string. This can be empty if the intent does not require any additional information.
Table 149: intent_meet_victor parameters
Field  Type  Units  Description
param
The intent_clock_settimer intent parameter isn't used. Instead, the length of the param string is used as the number of seconds to set the timer for.
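For example, here is a hedged sketch of requesting a 300-second timer using this length-of-param convention; the robot address and token are placeholders.

    # Sketch: request a 5-minute timer via the app intent interface.
    import requests

    ROBOT_IP = "192.168.1.42"          # placeholder
    TOKEN = "your-client-token-guid"   # placeholder

    seconds = 300
    body = {
        "intent": "intent_clock_settimer",
        "param": "x" * seconds,        # the param's *length* sets the duration
    }
    resp = requests.post(f"https://{ROBOT_IP}/v1/app_intent",
                         json=body,
                         headers={"Authorization": f"Bearer {TOKEN}"},
                         verify=False)  # self-signed certificate
    print(resp.json()["status"])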
48.3.2 Response
The AppIntentResponse has the following fields:
Table 150: AppIntentResponse JSON structure
Field   Type            Description
status  ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
See Chapter 17, section 74.2 Spatial audio processing for more information.
Post: “/v1/audio_feed”
48.4.1 Request
The AudioFeedRequest has no fields.
48.4.2 Response
The response is a stream of the following AudioFeedResponse structure. This structure has the
following fields:
Table 151: AudioFeedResponse JSON structure
Field                Type    Description
direction_strengths  bytes   "Histogram data of which directions this audio chunk came from."
group_id             uint32  "The index of this audio feed response."
noise_floor_power    uint32  The background noise level, as a "power value, convert to db with log10(value)."
robot_time_stamp     uint32  The "robot time at the transmission of this audio sample group."
signal_power         bytes   The stream of sound that Vector hears, as "mono audio amplitude samples." This is 1600 "16-bit little-endian PCM audio" samples, at 11025 samples/sec.
source_confidence    uint32  The "accuracy of the calculated source_direction."
source_direction     uint32  0-11: the index of the direction that the voice or key sound is coming from. 12: there is no identifiable sound, or the direction cannot be determined.
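A small, self-contained sketch of decoding one response's signal_power field follows; it assumes the JSON gateway base64-encodes bytes fields, which is the usual convention but is not verified here against the robot.

    # Sketch: unpack signal_power into PCM samples and report the RMS level.
    import base64
    import math
    import struct

    def decode_audio_chunk(signal_power_b64: str) -> list[int]:
        """Unpack little-endian 16-bit PCM samples (11025 samples/sec)."""
        raw = base64.b64decode(signal_power_b64)
        count = len(raw) // 2
        return list(struct.unpack(f"<{count}h", raw))

    def rms(samples: list[int]) -> float:
        return math.sqrt(sum(s * s for s in samples) / max(len(samples), 1))

    # Example with a silent 1600-sample chunk (3200 zero bytes):
    silent = base64.b64encode(bytes(3200)).decode()
    print(rms(decode_audio_chunk(silent)))   # -> 0.0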
This command is used to "request how the robot should process and send audio." Specifically, it can turn off the audio processing, and enable or disable the spatial audio processing. See Chapter 17, section 74.2 Spatial audio processing for more information.
48.5.1 Request
The AudioSendModeRequest has the following fields:
Table 152: AudioSendModeRequest JSON structure
Field  Type                 Description
mode   AudioProcessingMode  The requested audio processing mode.
48.5.2 Response
There is no response.
Playing externally provided audio is a two-step process (see the sketch after this list):
1. Setting up the audio playback, by sending the "audio_stream_prepare" substructure with the audio rate and volume
2. Sending the audio data in chunks (up to 1024 bytes, or 512 samples) using the "audio_stream_chunk" structure
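A sketch of building that message sequence is below. It only constructs the JSON payloads; the transport (how each payload reaches the robot) is left as a placeholder, and the base64 encoding of bytes fields is an assumption.

    # Sketch: yield an ExternalAudioStreamRequest sequence for a PCM buffer.
    import base64

    def external_audio_messages(pcm: bytes, rate: int = 11025, volume: int = 80):
        """Yield prepare + chunk payloads for the external audio stream."""
        assert 8000 <= rate <= 16025, "rate must be 8000..16025 samples/sec"
        yield {"audio_stream_prepare": {"audio_frame_rate": rate,
                                        "audio_volume": volume}}
        for off in range(0, len(pcm), 1024):          # max 1024 bytes per chunk
            chunk = pcm[off:off + 1024]
            yield {"audio_stream_chunk": {
                "audio_chunk_size_bytes": len(chunk),
                "audio_chunk_samples": base64.b64encode(chunk).decode()}}

    # Example: one second of silence at 11025 samples/sec (16-bit mono).
    for msg in external_audio_messages(bytes(2 * 11025)):
        pass  # send_message(msg) -- transport is a placeholder you supply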
48.6.1 Request
The ExternalAudioStreamRequest is used to stream a chunk of audio to Vector. This structure has
one (and only one) of the following fields:
Table 153: ExternalAudioStreamRequest JSON structure
Field                Type  Description
audio_stream_cancel  {}    "Cancel a playing external robot audio stream"
Table 154: ExternalAudioStreamPrepare JSON structure
Field             Type    Description
audio_frame_rate  uint32  The sample rate for the audio. This must be in the range of 8000 to 16025 samples/sec.
audio_volume      uint32  The volume to play the audio at: 0-100.
Table 155: ExternalAudioStreamChunk JSON structure
Field                   Type    Description
audio_chunk_samples     byte[]  The audio samples, encoded as 16-bit values in little-endian order. This must be 1024 or fewer bytes.
audio_chunk_size_bytes  uint32  The number of bytes sent; the max is 1024 (i.e. a max of 512 samples). (I am curious: why does this field exist? The array intrinsically knows its size…)
Table 156: ExternalAudioStreamResponse JSON structure
Field                           Type  Description
audio_stream_playback_complete  {}    "Audio has been played on the Robot"
Table 157: ExternalAudioStreamBufferOverrun JSON structure
Field                 Type    Description
audio_samples_played  uint32  The number of samples that were played.
audio_samples_sent    uint32  The number of audio samples that were sent [to Vector? to the audio subsystem?]
(Yes, that mis-spelling is correct.)
48.7.1 Request
The MasterVolumeRequest has the following fields:
Table 158: MasterVolumeRequest JSON structure
Field         Type               Description
volume_level  MasterVolumeLevel  This is used to set the volume of Vector's audio playback.
48.7.2 Response
The MasterVolumeResponse has the following fields:
Table 159: MasterVolumeResponse JSON structure
Field   Type            Description
status  ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Post: “/v1/say_text”
48.8.1 Request
The SayTextRequest structure has the following fields:
Table 160: SayTextRequest JSON structure
Field             Type    Units  Description
duration_scalar   float   ratio  This controls the speed at which Vector speaks. 1.0 is the normal rate; a value less than 1 increases the speed (e.g. 0.8 causes Vector to speak in just 80% of the usual time), and a value larger than 1 slows the speech (e.g. 1.2 causes Vector to take 120% of the usual time to speak). Allowed range is 0.5..20.0. Default: 1.0
pitch_scalar      float          Negative values lower the pitch, positive values raise the pitch. Allowed range is -1.0..1.0. Default: 0.0. Note: this field is optional, and available only in 1.7 or later versions.
text              string         The text (the words) that Vector should say.
use_vector_voice  bool           True if the text should be spoken in "Vector's robot voice; otherwise, he uses a generic human male voice."
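A hedged sketch of the request (placeholder address and token; certificate checking skipped because the robot's certificate is self-signed):

    # Sketch: make Vector speak, slightly slower than normal, in his own voice.
    import requests

    ROBOT_IP = "192.168.1.42"          # placeholder
    TOKEN = "your-client-token-guid"   # placeholder

    body = {
        "text": "Hello from the HTTPS API",
        "duration_scalar": 1.2,        # >1.0 slows the speech down
        "use_vector_voice": True,
    }
    resp = requests.post(f"https://{ROBOT_IP}/v1/say_text",
                         json=body,
                         headers={"Authorization": f"Bearer {TOKEN}"},
                         verify=False)  # self-signed certificate
    print(resp.json().get("state"))    # an UtteranceState value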
48.8.2 Response
The SayTextResponse structure has the following fields:
Table 161: SayTextResponse JSON structure
Field  Type            Description
state  UtteranceState  Where in the speaking process Vector currently is.
See section 57 Interactions with Objects for actions to drive onto and off of the charger.
49.1. ENUMERATIONS
The BatteryLevel enumeration is located in Chapter 8, Power Management, Table 19: BatteryLevel codes as they apply to Vector.
Post: “/v1/battery_state”
49.2.1 Request
No parameters
49.2.2 Response
The BatteryStateResponse structure has the following fields:
Table 162: BatteryStateResponse JSON structure
Field          Type          Units  Description
battery_level  BatteryLevel         The interpretation of the battery level.
50.1. EVENTS
50.1.1 ConnectionResponse
The ConnectionResponse structure has the following fields:
Table 163: ConnectionResponse JSON structure
Field       Type  Description
is_primary  bool
50.1.2 Event
The Event structure is used to deliver messages that some event has occurred. It is received periodically as part of the response to the Event Stream command. All of the events are carried in this one structure, which has one (and only one) of the following fields:
50.1.3 KeepAlivePing
This is “a null message used by streams to verify that the client is still connected.” This message
has no fields.
50.1.4 TimeStampedStatus
The TimeStampedStatus structure has the following fields:
Table 165: TimeStampedStatus JSON structure
Field          Type    Description
status         Status
timestamp_utc  uint32  The time that the status occurred. The format is unix time: seconds since 1970, in UTC.
The Status structure has one (and only one) of the following fields:
Post: “"/v1/event_stream”
Get: “"/v1/event_stream”
50.2.1 Request
The EventRequest has the following fields:
Table 167: EventRequest JSON structure
Field          Type        Description
black_list     FilterList  The list of events to not include. (?)
connection_id  string
white_list     FilterList  The list of events to include.
50.2.2 Response
The response is a stream of EventResponse structures. These have the following fields:
Table 169: EventResponse JSON structure
Field   Type            Description
event   Event           The event that occurred. This structure is described above in the subsection Events.
status  ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
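A sketch of consuming the stream follows; the shape of the filter list ({"list": [...]}) and the gateway's line-delimited JSON framing are assumptions, along with the placeholder address and token.

    # Sketch: subscribe to the event stream, filtered to face observations.
    import json
    import requests

    ROBOT_IP = "192.168.1.42"          # placeholder
    TOKEN = "your-client-token-guid"   # placeholder

    body = {
        "connection_id": "my-connection-1",
        "white_list": {"list": ["robot_observed_face"]},   # assumed shape
    }
    with requests.post(f"https://{ROBOT_IP}/v1/event_stream",
                       json=body,
                       headers={"Authorization": f"Bearer {TOKEN}"},
                       stream=True, verify=False) as stream:
        for line in stream.iter_lines():
            if line:
                print(json.loads(line))   # each line carries one EventResponse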
Post: “/v1/protocol_version”
“The valid versions of the protocol. Protocol versions are updated when messages change
significantly: new ones are added and removed, fields deprecated, etc. The goal is to support as
many old versions as possible, only bumping the minimum when there is no way to handle a prior
version.”
50.3.1 Request
The ProtocolVersionRequest has the following fields:
Table 170: ProtocolVersionRequest JSON structure
Field           Type   Description
client_version  int64  The version of the protocol that the client is using.
50.3.2 Response
The ProtocolVersionResponse has the following fields:
Table 171: ProtocolVersionResponse JSON structure
Field         Type   Description
host_version  int64  The version of the protocol that the robot supports.
The Result is used to indicate whether the client version is supported. The enumeration has the
following named values:
Post: “/v1/sdk_initialization”
50.4.1 Request
The SDKInitializationRequest has the following fields:
Table 173: SDKInitializationRequest JSON structure
Field                  Type    Description
cpu_version            string  The CPU model that the client (SDK) is using; informational only.
os_version             string  The version of the operating system that the client (SDK) is using; informational only.
python_implementation  string
python_version         string  The version of python that the client (SDK) is using; informational only.
sdk_module_version     string  The version of the SDK software that the client is using.
50.4.2 Response
The SDKInitializationResponse type has the following fields:
Table 174: SDKInitializationResponse JSON structure
Field   Type            Description
status  ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Post: “/v1/user_authentication”
50.5.1 Request
The UserAuthenticationRequest has the following fields:
Table 175: UserAuthenticationRequest JSON structure
Field            Type   Description
client_name      bytes
user_session_id  bytes
50.5.2 Response
The UserAuthenticationResponse has the following fields:
Table 176: UserAuthenticationResponse JSON structure
Field              Type            Description
client_token_guid  bytes           The token bytes to be included in subsequent HTTPS postings. This token should be saved for future use.
code               Code            The result of the authentication request.
status             ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Post: “/v1/version_state”
50.6.1 Request
The VersionStateRequest has no fields.
50.6.2 Response
The VersionStateResponse type has the following fields:
Table 178: VersionStateResponse JSON structure
Field            Type    Description
engine_build_id  string  The robot's software build identifier.
Comment: Many of the commands are specific to interacting with a cube, but appear to have been
intended to be generalized to work with a wider range of objects.
See also section 43.4 Define Custom Object for a description how to create custom box and cube
objects.
51.1. ENUMERATIONS
51.1.1 AlignmentType
The AlignmentType is used to indicate how Vector should align with the object. The enumeration
has the following named values:
Table 179: AlignmentType Enumeration
Name                        Value  Description
ALIGNMENT_TYPE_UNKNOWN      0
ALIGNMENT_TYPE_LIFT_FINGER  1      "Align the tips of the lift fingers with the target object"
ALIGNMENT_TYPE_LIFT_PLATE   2      "Align the flat part of the lift with the object (useful for getting the fingers in the cube's grooves)"
ALIGNMENT_TYPE_BODY         3      "Align the front of Vector's body (useful for when the lift is up)"
ALIGNMENT_TYPE_CUSTOM       4      "For use with distanceFromMarker parameter"
51.1.2 CubeBatteryLevel
The CubeBatteryLevel enumeration is used to categorize the condition of the Cube battery:
Table 180: CubeBatteryLevel codes as they apply to Vector
Name                  Value  Description
BATTERY_LEVEL_LOW     0      The Cube battery is 1.1V or less.
BATTERY_LEVEL_NORMAL  1      The Cube battery is at normal operating levels, i.e. >1.1V.
(The levels are from robot.py.)
51.2.1 CubeBattery
The CubeBattery structure has the following fields:
51.2.2 CubeConnectionLost
“Indicates that the connection subscribed through ConnectCube has been lost.”
51.2.3 ObjectTapped
The ObjectTapped event is sent (see ObjectEvent) when an object has received a finger-tap. This
event is only sent by the cube. Note: this event can have false triggers; it may sent when Vector is
picking up, carrying, or putting down the Cube.
Table 182: ObjectTapped JSON structure
Field      Type    Units  Description
object_id  uint32         The identifier of the object tapped.
timestamp  uint32         The time that the event occurred. The format is milliseconds since Vector's epoch.
Post: “/v1/connect_cube”
51.3.1 Request
The ConnectCubeRequest has no fields.
51.3.2 Response
The ConnectCubeResponse type has the following fields:
Table 183: ConnectCubeResponse JSON structure
Field       Type            Description
factory_id  string          The identifier for the cube. This is built into the cube.
object_id   uint32          The identifier of the cube that we connected with. This is Vector's internal identifier, and only the preferred cube is assigned one.
status      ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
success     bool            True if Vector was able to successfully connect, via Bluetooth LE, with the cube.
Post: “/v1/cubes_available”
51.4.1 Request
The CubesAvailableRequest has no fields.
51.4.2 Response
The CubesAvailableResponse is sent to indicate whether the action successfully completed or not.
This structure has the following fields:
Table 184: CubesAvailableResponse JSON structure
Field        Type            Description
factory_ids  string[]        A list of the cubes that were seen via Bluetooth LE. Each cube's built-in identifier (its factory id) is sent.
status       ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Post: “/v1/disconnect_cube”
51.5.1 Request
The DisconnectCubeRequest has no fields.
51.5.2 Response
The DisconnectCubeResponse is sent to indicate whether the action successfully completed or not.
This structure has the following fields:
Table 185: DisconnectCubeResponse JSON structure
Field   Type            Description
status  ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
This action requires the use of the wheels (tracks). “Actions that use the wheels cannot be
performed at the same time; otherwise you may see a TRACKS_LOCKED error.”
Post: “/v1/dock_with_cube”
51.6.1 Request
The DockWithCubeRequest structure has the following fields:
Table 186: DockWithCubeRequest JSON structure
Field                    Type               Units    Description
alignment_type           AlignmentType               "Which part of the robot to align with the object."
approach_angle_rad       float              radians  "The angle to approach the cube from. For example, 180 degrees will cause Vector to drive past the cube and approach it from behind."
distance_from_marker_mm  float              mm       "The distance from the object to stop. This is the distance between the origins." Use 0 mm to dock.
id_tag                   int32                       This is an action tag that can be assigned to this request and used later to cancel the action. Optional.
motion_prof              PathMotionProfile           Modifies how Vector should approach the cube. Optional.
num_retries              int32                       The maximum number of times to attempt to reach the object. A retry is attempted if Vector is unable to reach the target object.
object_id                int32                       The identifier of the object to dock with.
use_approach_angle       bool                        If true, Vector will approach the cube from the given approach angle; otherwise Vector will approach from the most convenient angle.
use_pre_dock_pose        bool                        If true, "first position the robot next to the object" before docking; otherwise "try to immediately [dock with the] object." Recommended to set this to the same as use_approach_angle.
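A hedged sketch of the request, docking from behind with the lift plate aligned (placeholder address and token; behavior control from section 44 must already be held, and passing the enumeration as a number is an assumption about the gateway):

    # Sketch: dock with a cube, approaching from behind.
    import math
    import requests

    ROBOT_IP = "192.168.1.42"          # placeholder
    TOKEN = "your-client-token-guid"   # placeholder

    body = {
        "object_id": 1,                # from /v1/connect_cube
        "alignment_type": 2,           # ALIGNMENT_TYPE_LIFT_PLATE (Table 179)
        "approach_angle_rad": math.pi, # drive past and approach from behind
        "use_approach_angle": True,
        "use_pre_dock_pose": True,
        "num_retries": 2,
        "id_tag": 1000001,             # arbitrary tag for later cancellation;
                                       # must fall in the SDK tag range
    }
    resp = requests.post(f"https://{ROBOT_IP}/v1/dock_with_cube",
                         json=body,
                         headers={"Authorization": f"Bearer {TOKEN}"},
                         verify=False)  # self-signed certificate
    print(resp.json()["result"])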
51.6.2 Response
The DockWithCubeResponse is sent to indicate whether the action successfully completed or not.
This structure has the following fields:
Table 187: DockWithCubeResponse JSON structure
Field   Type            Description
result  ActionResult    An error code indicating the success of the action, the detailed reason why it failed, or that the action is still being carried out.
status  ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Note: “This [command] is intended for app level user surfacing of cube connectivity, not for SDK
cube light control.”
Post: “/v1/flash_cube_lights”
51.7.1 Request
The FlashCubeLightsRequest has no fields.
51.7.2 Response
The FlashCubeLightsResponse is sent to indicate whether the action successfully completed or not.
This structure has the following fields:
Table 188:
Field Type Description
FlashCubeLightsRespon
status ResponseStatus A generic status of whether the request was able to se JSON structure
be carried out, or an error code indicating why it
was unable to be carried out.
Post: “/v1/forget_preferred_cube”
51.8.1 Request
The ForgetPreferredCubeRequest has no fields.
51.8.2 Response
The ForgetPreferredCubeResponse is sent to indicate whether the action successfully completed or
not. This structure has the following fields:
Table 189:
Field Type Description
ForgetPreferredCubeRe
status ResponseStatus A generic status of whether the request was able to sponse JSON structure
be carried out, or an error code indicating why it
was unable to be carried out.
“Note that actions that use the wheels cannot be performed at the same time, otherwise you may
see a TRACKS_LOCKED error.”
51.9.1 Request
The PickupObjectRequest structure has the following fields:
Table 190: PickupObjectRequest JSON structure
Field               Type               Units    Description
approach_angle_rad  float              radians  "The angle to approach the cube from. For example, 180 degrees will cause Vector to drive past the cube and approach it from behind."
id_tag              int32                       This is an action tag that can be assigned to this request and used later to cancel the action. Optional.
motion_prof         PathMotionProfile           Optional.
num_retries         int32                       The maximum number of times to attempt to reach the object. A retry is attempted if Vector is unable to reach the target object.
object_id           int32                       The identifier of the object to pick up. "Negative value means currently selected object."
use_approach_angle  bool                        If true, Vector will approach the cube from the given approach angle; otherwise Vector will approach from the most convenient angle.
use_pre_dock_pose   bool                        "Whether or not to try to immediately pick up an object or first position the robot next to the object."
51.9.2 Response
The PickupObjectResponse is sent to indicate whether the action successfully completed or not.
This structure has the following fields:
Table 191: PickupObjectResponse JSON structure
Field   Type            Description
result  ActionResult    An error code indicating the success of the action, the detailed reason why it failed, or that the action is still being carried out.
status  ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
51.10.1 Request
The PlaceObjectOnGroundRequest structure has the following fields:
Table 192: PlaceObjectOnGroundRequest JSON structure
Field        Type   Units  Description
id_tag       int32         This is an action tag that can be assigned to this request and used later to cancel the action. Optional.
num_retries  int32         The maximum number of times to attempt to reach the object. A retry is attempted if Vector is unable to reach the target object.
51.10.2 Response
The PlaceObjectOnGroundResponse is sent to indicate whether the action successfully completed or
not. This structure has the following fields:
Table 193: PlaceObjectOnGroundResponse JSON structure
Field   Type            Description
result  ActionResult    An error code indicating the success of the action, the detailed reason why it failed, or that the action is still being carried out.
status  ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
51.11.1 Request
The PopAWheelieRequest structure has the following fields:
Table 194: PopAWheelieRequest JSON structure
Field               Type               Units    Description
approach_angle_rad  float              radians  "The angle to approach the cube from. For example, 180 degrees will cause Vector to drive past the cube and approach it from behind."
id_tag              int32                       This is an action tag that can be assigned to this request and used later to cancel the action. Optional.
motion_prof         PathMotionProfile           Optional.
num_retries         int32                       The maximum number of times to attempt to reach the object. A retry is attempted if Vector is unable to reach the target object.
object_id           int32                       The identifier of the object to use to pop a wheelie. "Negative value means currently selected object."
use_approach_angle  bool                        If true, Vector will approach the cube from the given approach angle; otherwise Vector will approach from the most convenient angle.
use_pre_dock_pose   bool                        "Whether or not to try to immediately [use the] object or first position the robot next to the object." Recommended to set this to the same as use_approach_angle.
51.11.2 Response
The PopAWheelieResponse is sent to indicate whether the action successfully completed or not.
This structure has the following fields:
Table 195: PopAWheelieResponse JSON structure
Field   Type            Description
result  ActionResult    An error code indicating the success of the action, the detailed reason why it failed, or that the action is still being carried out.
status  ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Post: “/v1/roll_block”
51.12.1 Request
The RollBlockRequest has no fields.
51.12.2 Response
The RollBlockResponse structure has the following fields:
Table 196: RollBlockResponse JSON structure
Field   Type             Description
result  BehaviorResults
51.13.1 Request
The RollObjectRequest structure has the following fields:
Table 197: RollObjectRequest JSON structure
Field               Type               Units    Description
approach_angle_rad  float              radians  "The angle to approach the cube from. For example, 180 degrees will cause Vector to drive past the cube and approach it from behind."
id_tag              int32                       This is an action tag that can be assigned to this request and used later to cancel the action. Optional.
motion_prof         PathMotionProfile           Optional.
num_retries         int32                       The maximum number of times to attempt to reach the object. A retry is attempted if Vector is unable to reach the target object.
object_id           int32                       The identifier of the object to roll. "Negative value means currently selected object."
use_approach_angle  bool                        If true, Vector will approach the cube from the given approach angle; otherwise Vector will approach from the most convenient angle.
use_pre_dock_pose   bool                        "Whether or not to try to immediately [roll the] object or first position the robot next to the object." Recommended to set this to the same as use_approach_angle.
51.13.2 Response
The RollObjectResponse is sent to indicate whether the action successfully completed or not. This
structure has the following fields:
Table 198: RollObjectResponse JSON structure
Field   Type            Description
result  ActionResult    An error code indicating the success of the action, the detailed reason why it failed, or that the action is still being carried out.
status  ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
“Sets each LED on [Vector]'s cube. Two states are specified designated ‘on’ and ‘off’, each with a
color, duration, and state transition time.”
51.14.1 Request
The SetCubeLightsRequest is used to specify the light pattern on the cube. The structure has the following fields:
Table 199: SetCubeLightsRequest JSON structure
Field      Type    Units  Description
object_id  uint32         The internal id for the cube.
The MakeRelativeMode enumeration has the following named values:
Table 200: MakeRelativeMode Enumeration
Name       Value  Description
UNKNOWN    0
OFF        1
BY_CORNER  2
BY_SIDE    3
51.14.2 Response
The SetCubeLightsResponse is sent to indicate whether the action successfully completed or not.
This structure has the following fields:
Table 201: SetCubeLightsResponse JSON structure
Field   Type            Description
status  ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Post: “/v1/set_preferred_cube”
51.15.1 Request
The SetPreferredCubeRequest structure has the following fields:
Table 202: SetPreferredCubeRequest JSON structure
Field       Type    Units  Description
factory_id  string         The identifier of the cube to use. This is built into the cube.
51.15.2 Response
The SetPreferredCubeResponse is sent to indicate whether the action successfully completed or not.
This structure has the following fields:
Table 203: SetPreferredCubeResponse JSON structure
Field   Type            Description
status  ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Post: “/v1/check_cloud_connection”
52.1.1 Request
The CheckCloudRequest has no fields.
52.1.2 Response
The CheckCloudResponse has the following fields:
Table 204: CheckCloudResponse JSON structure
Field             Type            Description
code              ConnectionCode  Whether the cloud is available, or the relevant connection error.
expected_packets  int32           The number of packets expected to have been exchanged with the cloud server.
num_packets       int32           The number of packets actually exchanged with the cloud server.
status            ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
status_message    string
The ConnectionCode is used to indicate whether the cloud is available. It is used in the response to
the CheckCloudConnectionRequest command. The ConnectionCode enumeration has the following
named values:
Table 205: ConnectionCode Enumeration
Name              Value  Description
AVAILABLE         1      The cloud is connected, and has authenticated successfully.
BAD_CONNECTIVITY  2      The internet or servers are down.
FAILED_AUTH       4      The cloud connection has failed due to an authentication issue.
FAILED_TLS        3      The cloud connection has failed due to [a TLS certificate?] issue.
UNKNOWN           0      There is an error connecting to the cloud, but the reason is unknown.
Post: “/v1/upload_debug_logs”
52.2.1 Request
The UploadDebugLogsRequest structure has no fields.
52.2.2 Response
The UploadDebugLogsResponse structure has the following fields:
Table 206: UploadDebugLogsResponse JSON structure
Field   Type            Description
status  ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
url     string
53.1. EVENTS
53.1.1 MirrorModeDisabled
The MirrorModeDisabled event is sent (see Event) “if MirrorMode (camera feed displayed on face) is
currently enabled but is automatically being disabled.”
Post: “/v1/display_face_image_rgb”
53.2.1 Request
The DisplayFaceImageRGBRequest structure has the following fields:
Table 207: DisplayFaceImageRGBRequest JSON structure
Field              Type    Units  Description
duration_ms        uint32  ms     "How long to display the image on the face."
face_data          bytes          The raw data for the image to display. The LCD is 184x96, with RGB565 pixels (16 bits/pixel).
interrupt_running  bool           "If this image should overwrite any current images on the face."
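A sketch that fills the screen with one color follows; the RGB565 byte order and the base64 encoding of bytes fields are assumptions, as are the address and token placeholders.

    # Sketch: fill Vector's 184x96 LCD with solid orange for two seconds.
    import base64
    import requests

    ROBOT_IP = "192.168.1.42"          # placeholder
    TOKEN = "your-client-token-guid"   # placeholder

    def rgb_to_565(r: int, g: int, b: int) -> bytes:
        """Pack 8-bit RGB into one RGB565 pixel (byte order assumed)."""
        value = ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)
        return value.to_bytes(2, "big")

    pixel = rgb_to_565(255, 128, 0)
    face_data = pixel * (184 * 96)     # one pixel value for the whole screen

    body = {
        "face_data": base64.b64encode(face_data).decode(),
        "duration_ms": 2000,
        "interrupt_running": True,
    }
    requests.post(f"https://{ROBOT_IP}/v1/display_face_image_rgb",
                  json=body,
                  headers={"Authorization": f"Bearer {TOKEN}"},
                  verify=False)         # self-signed certificate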
53.2.2 Response
The DisplayFaceImageRGBResponse structure has the following fields:
Table 208: DisplayFaceImageRGBResponse JSON structure
Field   Type            Description
status  ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Post: “/v1/enable_mirror_mode”
53.3.1 Request
The EnableMirrorModeRequest message has the following fields:
Table 209: EnableMirrorModeRequest JSON structure
Field   Type  Description
enable  bool  If true, enables displaying the camera feed (and detections) on the LCD.
53.3.2 Response
The EnableMirrorModeResponse structure has the following fields:
Table 210: EnableMirrorModeResponse JSON structure
Field   Type            Description
status  ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Post: “/v1/set_eye_color”
53.4.1 Request
The SetEyeColorRequest has the following fields:
Table 211: SetEyeColorRequest JSON structure
Field       Type   Description
hue         float  The hue to set Vector's eyes to.
saturation  float  The saturation of the color to set Vector's eyes to.
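A short sketch, deriving hue and saturation from an RGB color with colorsys; the assumption here is that both fields use the same 0.0-1.0 ranges as colorsys, and the address and token are placeholders.

    # Sketch: set Vector's eyes to a teal color.
    import colorsys
    import requests

    ROBOT_IP = "192.168.1.42"          # placeholder
    TOKEN = "your-client-token-guid"   # placeholder

    hue, saturation, _value = colorsys.rgb_to_hsv(0.0, 0.8, 0.7)
    requests.post(f"https://{ROBOT_IP}/v1/set_eye_color",
                  json={"hue": hue, "saturation": saturation},
                  headers={"Authorization": f"Bearer {TOKEN}"},
                  verify=False)         # self-signed certificate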
53.4.2 Response
The SetEyeColorResponse structure has the following fields:
Table 212: SetEyeColorResponse JSON structure
Field   Type            Description
status  ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Note: an int32 identifier is used to distinguish between faces that are seen. Each face will have a separate identifier. A positive identifier is used for a face that is known (recognized). This value will be the same when the face disappears and reappears later; the value likely persists across reboots. A negative identifier is used for a face that is not recognized; as unknown faces appear and disappear, they may be assigned different negative numbers. If a face becomes recognized, a RobotChangedObservedFaceID event will be sent, along with a change in the identifier used.
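A sketch of the client-side bookkeeping this implies is below; the event plumbing (how the two callbacks get invoked) is left to the caller.

    # Sketch: follow the identifier rules above, remapping a tracked face's
    # negative id to its positive id when RobotChangedObservedFaceID arrives.
    faces: dict[int, str] = {}   # face_id -> name ("" while unrecognized)

    def on_observed_face(face_id: int, name: str) -> None:
        faces[face_id] = name    # negative ids are unrecognized faces

    def on_changed_observed_face_id(old_id: int, new_id: int) -> None:
        # The tracked (probably negative) id was recognized or merged.
        name = faces.pop(old_id, "")
        faces[new_id] = faces.get(new_id) or name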
54.1. ENUMERATIONS
54.1.1 FaceEnrollmentResult
The FaceEnrollmentResult is used to represent the success of associating a face with a name, or a reason code if there was an error. The enumeration has the following named values:
Table 213: FaceEnrollmentResult Enumeration
Name                Value  Description
SUCCESS             0      A face was seen, and its facial signature and associated name were successfully saved.
SAW_WRONG_FACE      1
SAW_MULTIPLE_FACES  2      Too many faces were seen, and Vector did not know which one to associate with the name.
TIMED_OUT           3
SAVED_FAILED        4      There was an error saving the facial signature and associated name to non-volatile storage.
INCOMPLETE          5
CANCELLED           6      See Cancel Face Enrollment.
NAME_IN_USE         7
NAMED_STORAGE_FULL  8      There was no more room in the non-volatile storage to hold another facial signature and associated name.
UNKOWN_FAILURE      9
54.1.2 FacialExpression
The FacialExpression is used to estimate the emotion expressed by each face that Vector sees. The enumeration has the following named values:
Table 214: FacialExpression Enumeration
Name                Value  Description
EXPRESSION_UNKNOWN  0      The facial expression could not be estimated. Note: this could be because facial expression estimation is disabled.
54.2. EVENTS
54.2.1 FaceEnrollmentComplete
The FaceEnrollmentComplete structure has the following fields:
Table 215: FaceEnrollmentComplete JSON structure
Field    Type   Description
face_id  int32  The identifier code for the face.
54.2.4 RobotChangedObservedFaceID
This event occurs when a tracked (but not yet recognized) face is recognized and receives a
positive ID. This happens when Vector’s view of the face improves. This event can also occur
“when face records get merged” “(on realization that 2 faces are actually the same).”
Table 216: RobotChangedObservedFaceID JSON structure
Field   Type   Description
new_id  int32  The new identifier code for the face that has been recognized.
old_id  int32  The identifier code that was used for the face until now. This is probably negative.
54.2.5 RobotErasedEnrolledFace
The RobotErasedEnrolledFace event is sent to confirm that an enrolled face has been removed from
the robot. This structure has the following fields:
Table 217: RobotErasedEnrolledFace JSON structure
Field    Type    Description
face_id  int32   The identifier code for the face; negative if the face is not recognized, positive if it has been recognized.
name     string  The name associated with the face. Empty if a name is not known.
54.2.6 RobotObservedFace
The RobotObservedFace event is sent when faces are observed within the field of view. This event
is only sent if face detection is enabled. This structure has the following fields:
Table 218: RobotObservedFace JSON structure
Field              Type              Description
face_id            int32             The identifier code for the face; negative if the face is not recognized, positive if it has been recognized.
expression         FacialExpression  The estimated facial expression seen on the face.
expression_values  uint32[]          An array that represents the histogram of confidence scores in each individual expression. If the expression is not known (e.g. expression estimation is disabled), the array will be all zeros. Otherwise, it will sum to 100.
img_rect           CladRect          The area within the camera view holding the face.
name               string            The name associated with the face (if recognized). Empty if a name is not known.
pose               PoseStruct        The position and orientation of the face.
left_eye           CladPoint[]       A polygon outlining the left eye, with respect to the image rectangle.
mouth              CladPoint[]       A polygon outlining the mouth; the coordinates are in the camera image.
nose               CladPoint[]       A polygon outlining the nose; the coordinates are in the camera image.
right_eye          CladPoint[]       A polygon outlining the right eye; the coordinates are in the camera image.
timestamp          uint32            The time that the most recent facial information was obtained. The format is milliseconds since Vector's epoch.
The RobotRenamedEnrolledFace event is sent when the name associated with a face has changed. This structure has the following fields:
Table 219: RobotRenamedEnrolledFace JSON structure
Field    Type    Description
face_id  int32   The identifier code for the face; negative if the face is not recognized, positive if it has been recognized.
name     string  The name now associated with the face. Empty if a name is not known.
Post: “/v1/cancel_face_enrollment”
54.3.1 Request
The CancelFaceEnrollmentRequest structure has no fields.
54.3.2 Response
The CancelFaceEnrollmentResponse has the following fields:
Table 220: CancelFaceEnrollmentResponse JSON structure
Field   Type            Description
status  ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Post: “/v1/enable_face_detection”
54.4.1 Request
The EnableFaceDetectionRequest structure has the following fields:
Table 221: EnableFaceDetectionRequest JSON structure
Field                         Type  Description
enable                        bool  If true, face detection (and recognition) is enabled; otherwise face detection processes are disabled.
enable_blink_detection        bool  If true, Vector will attempt "to detect how much detected faces are blinking." Note: the blink amount is not reported.
enable_expression_estimation  bool  If true, Vector will attempt to estimate facial expressions.
enable_gaze_detection         bool  If true, Vector will attempt "to detect where detected faces are looking." Note: the gaze direction is not reported.
enable_smile_detection        bool  If true, Vector will attempt "to detect smiles in detected faces." Note: the smile is not reported.
54.4.2 Response
The EnableFaceDetectionResponse has the following fields:
Table 222: EnableFaceDetectionResponse JSON structure
Field   Type            Description
status  ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Post: “/v1/enroll_face”
54.5.1 Request
The EnrollFaceRequest structure has no fields.
54.5.2 Response
The EnrollFacesResponse structure has the following fields:
Table 223: EnrollFacesResponse JSON structure
Field   Type             Description
result  BehaviorResults
Post: “/v1/erase_all_enrolled_faces”
54.6.1 Request
The EraseAllEnrolledFacesRequest structure has no fields.
54.6.2 Response
The EraseAllEnrolledFacesResponse has the following fields:
Table 224: EraseAllEnrolledFacesResponse JSON structure
Field   Type            Description
status  ResponseStatus  A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
post: “/v1/erase_enrolled_face_by_id”
54.7.1 Request
The EraseEnrolledFaceByIDRequest structure has the following fields:
Table 225: EraseEnrolledFaceByIDRequest JSON structure
Field Type Description
face_id int32 The identifier code for the face to erase.
54.7.2 Response
The EraseEnrolledFaceByIDResponse has the following fields:
Table 226: EraseEnrolledFaceByIDResponse JSON structure
Field Type Description
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
post: “/v1/find_faces”
54.8.1 Request
The FindFacesRequest structure has no fields.
54.8.2 Response
The FindFacesResponse structure has the following fields:
Table 227: FindFacesResponse JSON structure
Field Type Description
result BehaviorResults
post: “/v1/request_enrolled_names”
54.9.1 Request
The RequestEnrolledNamesRequest structure has no fields.
54.9.2 Response
The RequestEnrolledNamesResponse structure has the following fields:
Table 228: RequestEnrolledNamesResponse JSON structure
Field Type Description
faces LoadedKnownFace[] An array of the faces that are associated with names.
Table 229: LoadedKnownFace JSON structure
Field Type Units Description
face_id int32 The identifier code for the face.
last_seen_seconds_since_epoch int64 seconds The timestamp of the time the face was last seen. The format is unix time: seconds since 1970, presumably in UTC.
name string The name associated with the face.
seconds_since_first_enrolled int64 seconds The number of seconds since the face was first associated with a name and entered into the known faces database.
seconds_since_last_seen int64 seconds The number of seconds since the face was last seen.
seconds_since_last_updated int64 seconds The number of seconds since the name associated with the face was last changed (unconfirmed).
post: “/v1/set_face_to_enroll”
54.10.1 Request
The SetFaceToEnrollRequest structure has the following fields:
Table 230: SetFaceToEnrollRequest JSON structure
Field Type Description
name string The name to associate with the face.
54.10.2 Response
The SetFaceToEnrollResponse has the following fields:
Table 231: SetFaceToEnrollResponse JSON structure
Field Type Description
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
post: “/v1/update_enrolled_face_by_id”
54.11.1 Request
The UpdateEnrolledFaceByIDRequest structure has the following fields:
Table 232: UpdateEnrolledFaceByIDRequest JSON structure
Field Type Description
face_id int32 The identifier code for the face.
new_name string The new name to associate with the face.
old_name string The name associated (until now) with the face. This name must match the one Vector has for the face_id; if it does not, the command will not be honoured.
54.11.2 Response
The UpdateEnrolledFaceByIDResponse has the following fields:
Table 233: UpdateEnrolledFaceByIDResponse JSON structure
Field Type Description
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
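As an illustrative sketch (not from the SDK), the following Python fragment renames an enrolled face over the HTTPS API. The address, certificate file, and access token are placeholders standing in for the values obtained when pairing with the robot:

import requests  # third-party HTTP client

ROBOT = "https://192.168.1.50"              # assumed robot address
HEADERS = {"Authorization": "Bearer <access token>"}

body = {"face_id": 1, "old_name": "Alice", "new_name": "Alicia"}
r = requests.post(ROBOT + "/v1/update_enrolled_face_by_id", json=body,
                  headers=HEADERS, verify="vector.cert")
print(r.json()["status"])                   # the ResponseStatus of Table 233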
Note: the API does not include the ability to enable a feature.
55.1. ENUMERATIONS
55.1.1 UserEntitlement
The UserEntitlement enumeration has the following named values:
Table 234: UserEntitlement Enumeration
Name Value Description
KICKSTARTER_EYES 0 Note: this was an entitlement that was explored, but not used.
post: “/v1/feature_flag”
55.2.1 Request
The FeatureFlagRequest message has the following fields:
Table 235: FeatureFlagRequest JSON structure
Field Type Description
feature_name string The name of the feature; this feature name should be one of those listed in response to Get Feature Flag List (section 55.3). See Appendix H, Table 611: The features.
55.2.2 Response
The FeatureFlagResponse type has the following fields:
Table 236: FeatureFlagResponse JSON structure
Field Type Description
feature_enabled bool True if the feature is enabled, false if not.
post: “/v1/feature_flag_list”
55.3.1 Request
The FeatureFlagListRequest message (possibly streamed to the robot) has the following fields:
Table 237: FeatureFlagListRequest JSON structure
Field Type Description
request_list string
55.3.2 Response
The FeatureFlagListResponse type has the following fields:
Table 238: FeatureFlagListResponse JSON structure
Field Type Description
list string[] An array of the feature flags; see Appendix H, Table 611: The features for a description.
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
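A sketch of how the two endpoints compose, using the same hypothetical connection details as the earlier sketch: fetch the list of feature flags, then query the state of each one.

import requests

ROBOT = "https://192.168.1.50"              # assumed robot address
HEADERS = {"Authorization": "Bearer <access token>"}

flags = requests.post(ROBOT + "/v1/feature_flag_list", json={},
                      headers=HEADERS, verify="vector.cert").json()["list"]
for name in flags:
    r = requests.post(ROBOT + "/v1/feature_flag", json={"feature_name": name},
                      headers=HEADERS, verify="vector.cert").json()
    print(name, r["feature_enabled"])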
Post: “/v1/update_user_entitlements”
55.4.1 Request
The UpdateUserEntitlementsRequest has the following fields:
55.4.2 Response
The UpdateUserEntitlementsResponse type has the following fields:
Table 241: UpdateUserEntitlementsResponse JSON structure
Field Type Description
code ResultCode
doc Jdoc
56.1. ENUMERATIONS
56.1.1 ImageEncoding
The ImageEncoding is used to describe the format of the image data contained in the chunk. The
enumeration has the following named values:
Table 242: ImageEncoding Enumeration
Name Value Description
NONE_IMAGE_ENCODING 0 Image is not encoded. TBD: does this mean no image?
RAW_GRAY 1 “No compression”
RAW_RGB 2 “no compression, just [RGBRGBRG...]”
YUYV 3
YUV420SP 4
BAYER 5
JPEG_GRAY 6
JPEG_COLOR 7
JPEG_COLOR_HALF_WIDTH 8
JPEG_MINIMIZED_GRAY 9 “Minimized grayscale JPEG - no header, no footer, no byte stuffing”
JPEG_MINIMIZED_COLOR 10 “Minimized grayscale JPEG – no header, no footer, no byte stuffing, with added color data.”
56.2. EVENTS
56.2.1 CameraSettingsUpdate
This CameraSettingsUpdate event is sent when the camera exposure settings change. This structure
has the following fields:
Table 243: CameraSettingsUpdate parameters
Field Type Units Description
auto_exposure_enabled bool
exposure_ms uint32 ms
gain float
Table 244: RobotObservedMotion parameters
Field Type Units Description
bottom_img_area float area fraction “Area of the supporting region for the point, as a fraction of the bottom region”
bottom_img_x int32 pixel “Pixel coordinate of the point in the image, relative to top-left corner.”
bottom_img_y int32 pixel “Pixel coordinate of the point in the image, relative to top-left corner.”
ground_area float area fraction “Area of the supporting region for the point, as a fraction of the ground ROI. If unable to map to the ground, area=0.”
ground_x int32 mm “Coordinates of the point on the ground, relative to robot, in mm.”
ground_y int32 mm “Coordinates of the point on the ground, relative to robot, in mm.”
img_area float area fraction “Area of the supporting region for the point, as a fraction of the image”
img_x int32 pixel “Pixel coordinate of the point in the image, relative to top-left corner.”
img_y int32 pixel “Pixel coordinate of the point in the image, relative to top-left corner.”
left_img_area float area fraction “Area of the supporting region for the point, as a fraction of the left region.”
left_img_x int32 pixel “Pixel coordinate of the point in the image, relative to top-left corner.”
left_img_y int32 pixel “Pixel coordinate of the point in the image, relative to top-left corner.”
right_img_area float area fraction “Area of the supporting region for the point, as a fraction of the right region.”
right_img_x int32 pixel “Pixel coordinate of the point in the image, relative to top-left corner.”
right_img_y int32 pixel “Pixel coordinate of the point in the image, relative to top-left corner.”
timestamp uint ms “Timestamp of the corresponding image”
top_img_area float area fraction “Area of the supporting region for the point, as a fraction of the top region”
top_img_x int32 pixel “Pixel coordinate of the point in the image, relative to top-left corner.”
top_img_y int32 pixel “Pixel coordinate of the point in the image, relative to top-left corner.”
Post: “/v1/camera_feed”
56.3.1 Request
The CameraFeedRequest has no fields.
56.3.2 Response
The response is a stream of the following CameraFeedResponse structure. This structure has the
following fields:
Table 245: CameraFeedResponse JSON structure
Field Type Description
data bytes The bytes of the image.
frame_time_stamp uint32 The time that the image frame was captured.
image_encoding ImageEncoding The data format used for the image.
image_id uint32
Post: “/v1/capture_single_image”
56.4.1 Request
The CaptureSingleImageRequest has the following fields:
Table 246: CaptureSingleImageRequest JSON structure
Field Type Description
enable_high_resolution bool True if the image should be captured in high resolution; false to capture in 640x360 resolution. Default: false. Optional. Note: this field is only honoured in version 1.7 and later of the software.
56.4.2 Response
The CaptureSingleImageResponse structure has the following fields:
Table 247: CaptureSingleImageResponse JSON structure
Field Type Description
data bytes The bytes of the image.
frame_time_stamp uint32 The time that the image frame was captured.
image_encoding ImageEncoding The data format used for the image.
image_id uint32
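A sketch of fetching a single frame, with the same assumed connection details as before. In the JSON mapping of the protobuf interface, a bytes field arrives base64 encoded, so the sketch decodes the data before saving it:

import base64
import requests

ROBOT = "https://192.168.1.50"              # assumed robot address
HEADERS = {"Authorization": "Bearer <access token>"}

r = requests.post(ROBOT + "/v1/capture_single_image", json={},
                  headers=HEADERS, verify="vector.cert").json()
with open("frame.jpg", "wb") as f:          # assuming a JPEG image encoding
    f.write(base64.b64decode(r["data"]))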
Post: “/v1/enable_image_streaming”
56.5.1 Request
The EnableImageStreamingRequest type has the following fields:
Table 248: EnableImageStreamingRequest JSON structure
Field Type Description
enable bool True if Vector should send a stream of images from the camera.
enable_high_resolution bool True if the image should be captured in high resolution; false to capture in 640x360 resolution. Default: false. Optional. Note: this field is only honoured in version 1.7 and later of the software.
56.5.2 Response
The EnableImageStreamingResponse has the following fields:
Table 249: EnableImageStreamingResponse JSON structure
Field Type Description
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Note: The custom marker detection may remain internally enabled, even if disabled by the SDK,
“if another subscriber (including one internal to the robot) requests this vision mode be active.”
Post: “/v1/enable_marker_detection”
56.6.1 Request
The EnableMarkerDetectionRequest has the following fields:
56.6.2 Response
The EnableMarkerDetectionResponse has the following fields:
Table 251: EnableMarkerDetectionResponse JSON structure
Field Type Description
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Post: “/v1/enable_motion_detection”
56.7.1 Request
The EnableMotionDetectionRequest structure has the following fields:
Table 252: EnableMotionDetectionRequest JSON structure
Field Type Description
enable bool True if RobotObservedMotion events should be sent.
56.7.2 Response
The EnableMotionDetectionResponse has the following fields:
Table 253: EnableMotionDetectionResponse JSON structure
Field Type Description
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Post: “/v1/get_camera_config”
56.8.1 Request
The CameraConfigRequest has no fields.
56.8.2 Response
The CameraConfigResponse structure has the following fields:
Table 254: CameraConfigResponse JSON structure
Field Type Units Description
center_x float “The position of the optical center of projection within the image. It will be close to the center of the image, but adjusted based on the calibration of the lens at the factory.”
center_y float (See center_x.)
focal_length_x float The “focal length combined with pixel skew (as the pixels aren't perfectly square), so there are subtly different values for x and y.”
focal_length_y float (See focal_length_x.)
fov_x float degree The full field of view along the x-axis.
fov_y float degree The full field of view along the y-axis.
max_camera_exposure_time_ms uint32 ms The maximum duration allowed for a frame exposure.
min_camera_exposure_time_ms uint32 ms The minimum duration allowed for a frame exposure.
max_camera_gain float The maximum allowed camera gain setting.
min_camera_gain float The minimum allowed camera gain setting.
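For a pinhole camera model, the field of view and the focal length (in pixels) are two views of the same calibration: fov_x = 2·atan(width / (2·focal_length_x)). A quick consistency check, with hypothetical numbers (neither value is from this document):

import math

focal_length_x = 290.0   # hypothetical calibration value, in pixels
width = 640              # image width, in pixels
fov_x = 2 * math.degrees(math.atan(width / (2 * focal_length_x)))
print(round(fov_x, 1))   # ~95.6 degrees for these example numbers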
Post: “/v1/is_image_streaming_enabled”
56.9.1 Request
The IsImageStreamingRequest has no fields.
56.9.2 Response
The IsImageStreamingResponse “indicates whether or not image streaming is enabled on the robot.”
The structure has the following fields:
Table 255: IsImageStreamingResponse JSON structure
Field Type Description
enable bool True if image streaming is enabled, false otherwise.
Post: “/v1/set_camera_config”
56.10.1 Request
The SetCameraSettingsRequest has the following fields:
Table 256: SetCameraSettings parameters
Field Type Units Description
auto_exposure_enabled bool True if the camera should use auto-exposure mode.
exposure_ms uint32 ms The requested duration of exposure, when in manual settings.
gain float
56.10.2 Response
The SetCameraSettingsResponse structure has the following fields:
Table 257: SetCameraSettingsResponse JSON structure
Field Type Description
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
status_string string
Some behaviours can be assigned a tag that can be used to cancel them later.
Some behaviours accept a parameter to modify their motion profile.
Actions
57.1. STRUCTURES
57.1.1 PathMotionProfile
This structure contains “all the information relevant to how a path should be modified or
traversed.”
Table 258: PathMotionProfile JSON structure
Field Type Units Description
accel_mmps2 float mm/sec2 How fast Vector should accelerate to achieve the target speed.
decel_mmps2 float mm/sec2 How fast Vector should decelerate to the target speed.
is_custom bool
dock_accel_mmps2 float mm/sec2 How fast Vector should accelerate when performing the docking procedure.
dock_decel_mmps2 float mm/sec2 How fast Vector should decelerate when performing the docking procedure.
dock_speed_mmps float mm/sec The speed that Vector should perform the docking procedure at.
point_turn_accel_mmps2 float mm/sec2 How fast Vector should accelerate when turning (in place).
point_turn_decel_mmps2 float mm/sec2 How fast Vector should decelerate when turning (in place).
point_turn_speed_mmps float mm/sec The speed that Vector should perform a turn (in place) at.
reverse_speed_mmps float mm/sec How fast Vector should move when backing up.
speed_mmps float mm/sec The speed that Vector should move along the path.
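An illustrative motion profile, using the field names above; the specific numbers are assumptions for illustration, not recommended defaults:

# A hypothetical profile to pass as the motion_prof field of movement requests.
motion_prof = {
    "speed_mmps": 100.0,            # cruise speed along the path
    "accel_mmps2": 200.0,           # ramp up to the cruise speed
    "decel_mmps2": 500.0,           # ramp back down
    "point_turn_speed_mmps": 100.0,
    "dock_speed_mmps": 60.0,        # slower, more careful docking approach
    "reverse_speed_mmps": 80.0,
    "is_custom": True,              # mark the profile as caller-supplied
}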
Post: “/v1/drive_off_charger”
57.2.1 Request
The DriveOffChargerRequest structure has no fields.
57.2.2 Response
The DriveOffChargerResponse type has the following fields:
Table 259: DriveOffChargerResponse JSON structure
Field Type Description
result BehaviorResults
Post: “/v1/drive_on_charger”
57.3.1 Request
The DriveOnChargerRequest structure has no fields.
57.3.2 Response
The DriveOnChargerResponse type has the following fields:
Table 260: DriveOnChargerResponse JSON structure
Field Type Description
result BehaviorResults
57.4.1 Request
The GoToObjectRequest structure has the following fields:
Table 261: GoToObjectRequest JSON structure
Field Type Units Description
id_tag int32 This is an action tag that can be assigned to this request and used later to cancel the action. Optional.
motion_prof PathMotionProfile Optional.
num_retries int32 The maximum number of times to attempt to reach the object. A retry is attempted if Vector is unable to reach the target object.
object_id int32 The identifier of the object to drive to. Note: custom objects “are not supported”
distance_from_object_origin_mm float mm “The distance from the object to stop. This is the distance between the origins. For instance, the distance from the robot's origin (between Vector's two front wheels) to the cube's origin (at the center of the cube) is ~40mm.”
use_pre_dock_pose bool Set this to false.
57.4.2 Response
The GoToObjectResponse is sent to indicate whether the action successfully completed or not. This
structure has the following fields:
Table 262: GoToObjectResponse JSON structure
Field Type Description
result ActionResult An error code indicating the success of the action, the detailed reason why it failed, or that the action is still being carried out.
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
57.5.1 Request
The TurnTowardsFaceRequest structure has the following fields:
Table 263: TurnTowardsFaceRequest JSON structure
Field Type Description
face_id int32 The identifier of the face to look for.
57.5.2 Response
The TurnTowardsFaceResponse is sent to indicate whether the action successfully completed or not.
This structure has the following fields:
Table 264: TurnTowardsFaceResponse JSON structure
Field Type Description
result ActionResult An error code indicating the success of the action, the detailed reason why it failed, or that the action is still being carried out.
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
58.1. ENUMERATIONS
58.1.1 JdocType
The JdocType enumeration has the following named values:
58.2. STRUCTURES
58.2.1 Jdoc
The Jdoc type has the following fields:
58.2.2 NamedJdoc
The NamedJdoc type has the following fields:
58.3.1 JdocsChanged
The JdocsChanged message is sent when a Jdoc object has been changed. This message has the following fields:
Post: “/v1/pull_jdocs”
58.4.1 Request
The PullJdocsRequest has the following fields:
58.4.2 Response
The PullJdocsResponse has the following fields:
Post: “/v1/nav_map_feed”
59.1.1 Request
“Requests [navigation] map data from the robot at a specified maximum update frequency.
Responses in the [navigation] map stream may be sent less frequently if the robot does not consider
there to be relevant new information.”
59.1.2 Response
“A full [navigation] map sent from the robot. It contains an origin_id which can be compared against the robot's current origin_id, general info about the map, and a collection of quads representing the map's content.”
The NavMapInfo is used to describe the map as a whole. It has the following fields:
Table 273: NavMapInfo JSON structure
Field Type Units Description
root_center_x float mm The x coordinate of the map’s center.
root_center_y float mm The y coordinate of the map’s center.
root_center_z float mm The z coordinate of the map’s center.
root_depth int The depth of the quad tree: the number of levels to the leaf nodes.
root_size_mm float mm The length and width of the whole map. (The map is square.)
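Since the map is square and each level of the quad tree halves the side length, the size of a leaf-level quad follows directly from these two fields; a small sketch of the arithmetic, with hypothetical values:

root_size_mm = 4000.0                        # hypothetical 4 m square map
root_depth = 8                               # hypothetical tree depth
leaf_size_mm = root_size_mm / (2 ** root_depth)
print(leaf_size_mm)                          # 15.625 mm per leaf quad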
Table 274: NavMapQuadInfo structure
Field Type Description
color_rgba uint32 Suggested color for the area of the map, used when visualizing the map.
content NavNodeContentType A tag of what Vector has identified as located in this area.
depth uint32 The depth within the tree.
“Every tile in the [navigation] map will be tagged with a content key referring to the different
environmental elements that Vector can identify.” The NavNodeContentType is used to represent
the kind of environmental element.
Table 275: NavNodeContentType Enumeration
Name Value Description
NAV_NODE_UNKNOWN 0 It is not known what is in the area.
Note: “Vector will drive for the specified distance (forwards or backwards). Vector must be off of the charger for this movement action. [Actions] that use the wheels cannot be performed at the same time; otherwise you may see a TRACKS_LOCKED error.”
60.1.1 Request
The DriveStraightRequest has the following fields:
Table 276: DriveStraightRequest JSON structure
Field Type Units Description
dist_mm float mm The distance to drive. (Negative is backwards.)
60.1.2 Response
The DriveStraightResponse has the following fields:
Table 277: DriveStraightResponse JSON structure
Field Type Description
response ActionResult Whether the action is able to be run. If not, an error code indicating why not.
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
60.2.1 Request
The DriveWheelsRequest has the following fields:
Table 278: DriveWheelsRequest JSON structure
Field Type Units Description
left_wheel_mmps float mm/sec The initial speed to set the left wheel to.
left_wheel_mmps2 float mm/sec2 How fast to increase the speed of the left wheel.
right_wheel_mmps float mm/sec The initial speed to set the right wheel to.
right_wheel_mmps2 float mm/sec2 How fast to increase the speed of the right wheel.
60.2.2 Response
The DriveWheelsResponse has the following fields:
Table 279: DriveWheelsResponse JSON structure
Field Type Description
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
“Note that actions that use the wheels cannot be performed at the same time, otherwise you may
see a TRACKS_LOCKED error.”
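A sketch of a gentle forward drive using this request. The endpoint path is assumed here to follow the naming pattern of the other commands, and the connection details are the same placeholders as in the earlier sketches:

import requests

ROBOT = "https://192.168.1.50"              # assumed robot address
HEADERS = {"Authorization": "Bearer <access token>"}

body = {"left_wheel_mmps": 50.0, "right_wheel_mmps": 50.0,
        "left_wheel_mmps2": 100.0, "right_wheel_mmps2": 100.0}
requests.post(ROBOT + "/v1/drive_wheels", json=body,
              headers=HEADERS, verify="vector.cert")
# Posting again with both speeds set to 0 stops the wheels.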
Post: “/v1/go_to_pose”
60.3.1 Request
The GoToPoseRequest structure has the following fields:
Table 280: GoToPoseRequest JSON structure
Field Type Units Description
id_tag int32 This is an action tag that can be assigned to this request and used later to cancel the action. Optional.
motion_prof PathMotionProfile
num_retries int32 The maximum number of times to attempt to reach the pose. A retry is attempted if Vector is unable to reach the target pose.
rad float radians The angle to change orientation to.
x_mm float mm The x-coordinate of the position to move to.
y_mm float mm The y-coordinate of the position to move to.
60.3.2 Response
The GoToPoseResponse is sent to indicate whether the action successfully completed or not. This
structure has the following fields:
Table 281: GoToPoseResponse JSON structure
Field Type Description
result ActionResult An error code indicating the success of the action, the detailed reason why it failed, or that the action is still being carried out.
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
60.4.1 Request
The MoveHeadRequest has the following fields:
Table 282: MoveHeadRequest JSON structure
Field Type Units Description
speed_rad_per_sec float radian/sec The speed to drive the head motor at. Positive values tilt the head up, negative tilt the head down. A value of 0 will unlock the head track.
60.4.2 Response
The MoveHeadResponse has the following fields:
Table 283: MoveHeadResponse JSON structure
Field Type Description
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
60.5.1 Request
The MoveLiftRequest has the following fields:
Table 284: MoveLiftRequest JSON structure
Field Type Units Description
speed_rad_per_sec float radian/sec The speed to drive the lift at. Positive values move the lift up, negative move the lift down. A value of 0 will unlock the lift track.
60.5.2 Response
The MoveLiftResponse has the following fields:
Table 285: MoveLiftResponse JSON structure
Field Type Description
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
60.6.1 Request
The SetHeadAngleRequest has the following fields:
Table 286: SetHeadAngleRequest JSON structure
Field Type Units Description
accel_rad_per_sec2 float radian/sec2 How fast to increase the speed the head is moving at. Recommended value: 10 radians/sec2.
angle_rad float radians The target angle to move Vector’s head to. This should be in the range -22.0° to 45.0°.
duration_sec float sec “Time for Vector's head to move in seconds. A value of zero will make Vector try to do it as quickly as possible.”
id_tag int32 This is an action tag that can be assigned to this request and used later to cancel the action. Optional.
max_speed_rad_per_sec float radian/sec The maximum speed to move the head at. (This clamps the speed from further acceleration.) Recommended value: 10 radians/sec.
num_retries int32 count The maximum number of times to attempt to move the head to the target angle. A retry is attempted if Vector is unable to reach the target angle.
60.6.2 Response
The SetHeadAngleResponse has the following fields:
Table 287: SetHeadAngleResponse JSON structure
Field Type Description
response ActionResult Whether the action is able to be run. If not, an error code indicating why not.
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
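Because angle_rad is in radians while the documented limits are in degrees, a caller will typically clamp and convert first; a minimal sketch:

import math

def head_angle_request(degrees: float) -> dict:
    # Clamp to the documented -22.0 to 45.0 degree range, then convert.
    clamped = max(-22.0, min(45.0, degrees))
    return {"angle_rad": math.radians(clamped),
            "max_speed_rad_per_sec": 10.0,  # recommended value (Table 286)
            "accel_rad_per_sec2": 10.0,     # recommended value (Table 286)
            "duration_sec": 0.0}            # 0 = as quickly as possible

print(head_angle_request(30.0))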
60.7.1 Request
The SetLiftRequest has the following fields:
Table 288: SetLiftRequest JSON structure
Field Type Units Description
accel_rad_per_sec2 float radian/sec2 How fast to increase the speed the lift is moving at. Recommended value: 10 radians/sec2.
duration_sec float sec “Time for Vector's lift to move in seconds. A value of zero will make Vector try to do it as quickly as possible.”
height_mm float mm The target height to raise the lift to. Note: the python API employs a different range for this parameter.
id_tag int32 This is an action tag that can be assigned to this request and used later to cancel the action. Optional.
max_speed_rad_per_sec float radian/sec The maximum speed to move the lift at. (This clamps the speed from further acceleration.) Recommended value: 10 radians/sec.
num_retries int32 count The maximum number of times to attempt to move the lift to the target height. A retry is attempted if Vector is unable to reach the target height.
60.7.2 Response
The SetLiftResponse has the following fields:
Table 289: SetLiftResponse JSON structure
Field Type Description
response ActionResult Whether the action is able to be run. If not, an error code indicating why not.
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
60.8.1 Request
The StopAllMotorsRequest structure has no fields.
60.8.2 Response
The StopAllMotorsResponse has the following fields:
Table 290: StopAllMotorsResponse JSON structure
Field Type Description
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Note: “Vector must be off of the charger for this movement action. Note that actions that use the
wheels cannot be performed at the same time, otherwise you may see a TRACKS_LOCKED
error.”
60.9.1 Request
The TurnInPlaceRequest has the following fields:
Table 291: TurnInPlaceRequest JSON structure
Field Type Units Description
accel_rad_per_sec2 float radian/sec2 How fast to increase the speed the body is moving at.
angle_rad float radians If is_absolute is 0, turn relative to the current heading by this number of radians; positive means turn left, negative means turn right. Otherwise, turn to the absolute orientation given by this angle.
id_tag int32 This is an action tag that can be assigned to this request and used later to cancel the action. Optional.
is_absolute uint32 If 0, turn by angle_rad relative to the current orientation. If 1, turn to the absolute angle given by angle_rad.
num_retries int32 count The maximum number of times to attempt to turn to the target angle. A retry is attempted if Vector is unable to reach the target angle.
speed_rad_per_sec float radian/sec The speed to move around the arc.
tol_rad float radians “The angular tolerance to consider the action complete (this is clamped to a minimum of 2 degrees internally).”
60.9.2 Response
The TurnInPlaceResponse has the following fields:
Table 292: TurnInPlaceResponse JSON structure
Field Type Description
response ActionResult Whether the action is able to be run. If not, an error code indicating why not.
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
The values are given with respect to a coordinate space that “is relative to Vector, where Vector's origin is the point on the ground between Vector's two front wheels. The X axis is Vector's forward direction, the Y axis is to Vector's left, and the Z axis is up.”
61.1. ENUMERATIONS
61.1.1 UnexpectedMovementSide
Table 293: UnexpectedMovementSide Enumeration
Name Value Description
UNKNOWN 0
FRONT 1
BACK 2
LEFT 3
RIGHT 4
61.1.2 UnexpectedMovementType
Table 294: UnexpectedMovementType Enumeration
Name Value Description
TURNED_BUT_STOPPED 0
TURNED_IN_SAME_DIRECTION 1
TURNED_IN_OPPOSITE_DIRECTION 2
ROTATING_WITHOUT_MOTORS 3
61.2. STRUCTURES
61.2.1 AccelData
This structure is used to report the accelerometer readings, as part of the RobotState structure. The accelerometer is located in Vector’s head, so its XYZ axes are not the same as Vector’s body axes. When motionless, the accelerometer can be used to calculate the angle of Vector’s head. The AccelData has the following fields:
61.2.2 GyroData
This structure is used to report the gyroscope readings, as part of the RobotState structure. The gyroscope is located in Vector’s head, so its XYZ axes are not the same as Vector’s body axes. The GyroData has the following fields:
61.2.3 ProxData
This structure is used to report the “time of flight” proximity sensor readings, as part of the
RobotState structure.
“The proximity sensor is located near the bottom of Vector between the two front wheels, facing forward. The reported distance describes how far in front of this sensor the robot feels an obstacle is. The sensor estimates based on time-of-flight information within a field of view which the engine resolves to a certain quality value.”
The distance measurement may not be valid. The sensor may be blocked by Vector’s lift or the item he is carrying, or the sensor may not have picked up anything significant. These conditions are indicated by “four additional flags [that] are supplied by the engine to indicate whether this proximity data is considered valid for the robot's internal pathfinding.” It is recommended that an application track the most recent proximity data from the robot, and the most recent proximity data which did not have the lift blocking.
61.3. EVENTS
61.3.1 RobotState
The RobotState structure is periodically posted in an Event message. The structure has the
following fields:
Table 300: UnexpectedMovement JSON structure
Field Type Description
movement_side UnexpectedMovementSide
movement_type UnexpectedMovementType
timestamp uint32 The time that the movement was sensed. The format is unix time: seconds since 1970, in UTC.
62.1. ENUMERATIONS
62.1.1 OnboardingPhase
Table 301: OnboardingPhase Enumeration
Name Value Description
InvalidPhase 0
Default 1
LookAtPhone 2
WakeUp 3
LookAtUser 4
TeachWakeWord 5
TeachComeHere 6
TeachMeetVictor 7
62.1.2 OnboardingPhaseState
Table 302: OnboardingPhaseState Enumeration
Name Value Description
PhaseInvalid 0
PhasePending 1
PhaseInProgress 2
PhaseComplete 3
62.2.1 Onboarding
The Onboarding event is sent as different stages of the onboarding process have been completed.
This structure has the following fields:
62.2.2 Onboarding1p0ChargingInfo
This structure is used to report whether Vector needs to charge, and an estimated (or
recommended) duration. It is part of the Onboarding event structure. This structure has the
following fields:
Table 304: Onboarding1p0ChargingInfo JSON structure
Field Type Units Description
needs_to_charge bool If true, Vector needs to charge before onboarding can continue.
on_charger bool If true, Vector is on his charger, and there is power supplied to the charger.
suggested_charger_time float The estimated amount of time to charge Vector before completing the onboarding process.
62.2.3 OnboardingState
The OnboardingState type has the following fields:
Table 305: OnboardingState JSON structure
Field Type Description
stage OnboardingStages Where in the onboarding process we are.
Table 306: OnboardingStages Enumeration
Name Value Description
NotStarted 0 The onboarding process has not started yet.
62.3.1 Request
The OnboardingCompleteRequest structure has no fields.
62.3.2 Response
The OnboardingCompleteResponse type has the following fields:
Table 307: OnboardingCompleteResponse JSON structure
Field Type Description
completed bool True if the onboarding process has completed.
62.4.1 Request
The OnboardingInputRequest has one (and only one) of the following fields:
Table 308: OnboardingInputRequest JSON structure
Field Type Description
onboarding_charge_info_request {} This is a request for charging information; it contains no fields.
onboarding_complete_request {} This is a request to complete the onboarding; it contains no fields.
onboarding_mark_complete_and_exit {} This contains no fields.
Table 309: OnboardingSetPhaseRequest JSON structure
Field Type Description
phase OnboardingPhase The desired phase to be in.
Table 310: OnboardingInputResponse JSON structure
Field Type Description
onboarding_charge_info_response OnboardingChargingInfoResponse See below.
ONBOARDINGCHARGINGINFORESPONSE
This structure is used to report whether Vector needs to charge, and an estimated (or suggested)
duration. It is part of the OnboardingInputResponse event structure. This structure has the
following fields:
Table 311: OnboardingChargingInfoResponse structure
Field Type Units Description
needs_to_charge bool If true, Vector needs to charge before onboarding can continue.
on_charger bool If true, Vector is on his charger, and there is power supplied to the charger.
required_charge_time float The estimated amount of time to charge Vector before completing the onboarding process.
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Note: this structure is similar to the Onboarding1p0ChargingInfo structure. That structure is older, but retained as software had already been developed against it.
ONBOARDINGCOMPLETERESPONSE
The OnboardingCompleteResponse type has the following fields:
Table 312: OnboardingCompleteResponse JSON structure
Field Type Description
completed bool True if the onboarding has completed.
Table 313: OnboardingPhaseProgressResponse structure
Field Type Units Description
last_set_phase OnboardingPhase
last_set_phase_state OnboardingPhaseState
percent_completed int32 % How far we are in the phase.
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
ONBOARDINGSETPHASERESPONSE
The OnboardingSetPhaseResponse type has the following fields:
Table 314: OnboardingSetPhaseResponse JSON structure
Field Type Units Description
last_set_phase OnboardingPhase
last_set_phase_state OnboardingPhaseState
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
ONBOARDINGWAKEUPRESPONSE
The OnboardingWakeupResponse type has the following fields:
Table 315: OnboardingWakeupResponse JSON structure
Field Type Description
charging_info Onboarding1p0ChargingInfo Whether or not Vector needs to charge after waking up.
waking_up bool True if Vector is waking up.
ONBOARDINGWAKEUPSTARTEDRESPONSE
The OnboardingWakeupStartedResponse type has the following fields:
Table 316: OnboardingWakeupStartedResponse JSON structure
Field Type Description
already_started bool True if the process of waking Vector up for onboarding has already been started.
Post: “/v1/get_onboarding_state”
62.5.1 Request
The OnboardingStateRequest structure has no fields.
62.5.2 Response
The OnboardingStateResponse type has the following fields:
Table 317: OnboardingStateResponse JSON structure
Field Type Description
onboarding_state OnboardingState Where in the onboarding process we are.
62.6.1 Request
The OnboardingWakeUpRequest structure has no fields.
62.6.2 Response
The OnboardingWakeUpResponse type has the following fields:
Table 318: OnboardingWakeUpResponse JSON structure
Field Type Description
already_started bool True if the process of waking Vector up for onboarding has already been started.
62.7.1 Request
The OnboardingWakeUpStartedRequest structure has no fields.
62.7.2 Response
The OnboardingWakeUpStartedResponse type has the following fields:
Table 319: OnboardingWakeUpStartedResponse JSON structure
Field Type Description
charging_info Onboarding1p0ChargingInfo The state of Vector’s initial charging.
waking_up bool True if TBD.
63.1. STRUCTURES
Table 320: PhotoPathMessage JSON structure
Field Type Description
full_path string
Table 321: ThumbnailPathMessage JSON structure
Field Type Description
full_path string
63.2. EVENTS
63.2.1 PhotoTaken
The PhotoTaken event is sent after Vector has taken a photograph and stored it. This structure has
the following fields:
Post: “/v1/delete_photo”
63.3.1 Request
The DeletePhotoRequest has the following fields:
Table 323: DeletePhotoRequest JSON structure
Field Type Description
photo_id uint32 The identifier of the photograph to delete.
63.3.2 Response
The DeletePhotoResponse type has the following fields:
Table 324: DeletePhotoResponse JSON structure
Field Type Description
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
success bool True if the photograph was successfully removed; otherwise there was an error.
63.4. PHOTO
This command is used to retrieve the photograph’s image.
Post: “/v1/photo”
63.4.1 Request
The PhotoRequest structure has the following fields:
Table 325: PhotoRequest JSON structure
Field Type Description
photo_id uint32 The identifier of the photograph to request.
63.4.2 Response
The PhotoResponse type has the following fields:
Table 326: PhotoResponse JSON structure
Field Type Description
image bytes The data that make up the photograph’s image.
Post: “/v1/photos_info”
63.5.1 Request
The PhotosInfoRequest structure has no fields.
63.5.2 Response
The PhotosInfoResponse type has the following fields:
Table 327: PhotosInfoResponse JSON structure
Field Type Description
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
photo_infos PhotoInfo[] An array of information about the photographs available on Vector.
Post: “/v1/thumbnail”
63.6.1 Request
The ThumbnailRequest structure has the following fields:
Table 329: ThumbnailRequest JSON structure
Field Type Description
photo_id uint32 The identifier of the photograph to request a thumbnail for.
63.6.2 Response
The ThumbnailResponse type has the following fields:
Table 330: ThumbnailResponse JSON structure
Field Type Description
image bytes The data that make up the thumbnail’s image.
RobotSettingsConfig
64.1. STRUCTURES
64.1.1 AccountSettingsConfig
The AccountSettingsConfig type has the following fields:
Table 331: AccountSettingsConfig JSON structure
Field Type Description
app_locale string The IETF language tag of the human companion’s language preference – American English, UK English, Australian English, German, French, Japanese, etc. Default: “en-US”
data_collection boolean True if data collection – crash logs and DAS events – is allowed to be uploaded to the server.
Post: “/v1/update_settings”
64.2.1 Request
The UpdateSettingsRequest has the following fields:
Table 332: UpdateSettingsRequest JSON structure
Field Type Description
settings RobotSettingsConfig The settings to apply to the robot.
64.2.2 Response
The UpdateSettingsResponse type has the following fields:
Table 333: UpdateSettingsResponse JSON structure
Field Type Description
code ResultCode Whether or not the update was accepted and completed.
doc Jdoc The Jdoc with the updated settings.
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
Post: “/v1/update_account_settings”
64.3.1 Request
The UpdateAccountsSettingsRequest has the following fields:
64.3.2 Response
The UpdateAccountsSettingsResponse type has the following fields:
Table 335: UpdateAccountSettingsResponse JSON structure
Field Type Description
code ResultCode Whether or not the update was accepted and completed.
65.1. ENUMERATIONS
65.1.1 UpdateStatus
The UpdateStatus enumeration has the following named values:
Table 336: UpdateStatus Enumeration
Name Value Description
IN_PROGRESS_DOWNLOAD 2 The software update is currently being downloaded.
Post: “/v1/start_update_engine”
This command uses the same request and response structures as CheckUpdateStatus
Post: “/v1/check_update_status”
65.3.1 Request
The CheckUpdateStatusRequest structure has no fields.
65.3.2 Response
This is a streamed set of update status messages. The CheckUpdateStatusResponse type has the following fields:
Table 337: CheckUpdateStatusResponse JSON structure
Field Type Description
expected int64 The number of bytes expected to be downloaded.
65.4.1 Request
The UpdateAndRestartRequest structure has no fields.
65.4.2 Response
The UpdateAndRestartResponse has the following fields:
Table 338: UpdateAndRestartResponse JSON structure
Field Type Description
status ResponseStatus A generic status of whether the request was able to be carried out, or an error code indicating why it was unable to be carried out.
https://github.com/anki/vector-python-sdk/tree/c14082af5a947c23016111c1f73a445d8356dbf8
Some commands were removed from this repository (possibly later) because they were not implemented on Vector; they include:
Network statistics
Caveat: This feature is not present in the production releases, nor in many of the development releases. As this is a debugging tool, the schema for the data provided over the web-socket probably changed with each software version.
The developer build includes some special URLs for listing a manifest of the control variables. It also provides an HTTP GET interface to access the control variables, and to initiate functions:
/daslog
/getAppMessages
/processstatus
/sendAppMessage
/systemctl
/consolevarlist
note: the module name can have spaces; it can also be dotted, as in a Major and Minor name.
/consolevarget?key=name_of_variable
/consolevarset?key=name_of_variable&value=new_value
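For illustration, fetching and setting a console variable over this interface might look like the following; the variable name here is hypothetical, and the port is whichever developer web-server port the module of interest vends:

import requests

BASE = "http://192.168.1.50:8888"   # assumed robot address and developer port

print(requests.get(BASE + "/consolevarlist").text)
requests.get(BASE + "/consolevarget", params={"key": "SomeVariable"})
requests.get(BASE + "/consolevarset",
             params={"key": "SomeVariable", "value": "1"})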
Each frame posted on the web socket includes the following elements:
· Type
· Module
· Data
Before we go further we’ll need to know the module identifiers. The modules differ by ports that
vend their events:
ws://address:port/socket
Where the address is the address of the Vector of interest, and the port is the shared port for the
modules of interest, given in the earlier table.
To subscribe to events from a module post the following JSON structure to the web socket:
To unsubscribe from events from a module post the following JSON structure to the web socket:
Table 343: Unsubscribing from a module’s events
Field Type Description
module string The lower case name of the module to unsubscribe from. See Table 341: Module names and their ports for the module names.
type string “unsubscribe”
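By symmetry with Table 343, the two control messages posted to the socket would look like the following; the module name “behaviors” here is just an example:

{"type": "subscribe", "module": "behaviors"}
{"type": "unsubscribe", "module": "behaviors"}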
Events related to the module will come with the following structure:
When an audio event is sent, the event structure has the following fields:
When all audio events are stopped, the event structure has the following fields:
When an audio group is set to a new state, an event structure with the following fields is sent:
68.2.6 Behaviors
Todo: there are two forms of events sent by the behaviors module.
When a behavior event is sent, the event structure has the following fields:
When a behavior condition event is sent, the event structure has the following fields:
The cloud intent event is a structure or an array of structures with the following fields:
68.2.9 CPU
When a CPU event is sent, the event structure has the following fields:
35
http://www.linuxhowtos.org/System/procstat.htm
68.2.11 Cubes
Note: the Cube events are sent regardless of whether Vector is in communication with his cube.
When a Cube event is sent, the event structure has the following fields:
36
What does this mean if there are both?
Table 368: StateCountDown parameters
Field Type Description
DisconnectingIn The duration before the connection will exit.
SwitchingToBackgroundIn string
Table 369: SubscriberData parameters
Field Type Description
ExpiresIn The duration before the connection will exit. This field is only present if the subscriber has included a timeout. Optional.
SubscriberName string The name of the behavior (the ID) that requested this connection.
SubscriptionType string “Background” or “Interactable”
68.2.12 Features
The features event is an array of structures with the following fields:
68.2.16 Intents
The intents events can come in two different forms. One kind is an array of the following structure:
68.2.18 Mood
These structures are similar to, but differ from, those in Chapter 28.
When a mood event is sent, the event structure has the following fields:
68.2.19 NavMap
The NavMap events are used to transfer the current navigational map, and location of items in the
map. Map events won’t be sent unless the application has sent a request to enable the events. (See
section 68.3.7 NavMap)
The navigation map events include a type field that describes how to interpret the rest of the structure. Note: the observed object events are also sent to NavMap subscribers, to update their positions.
When the map is sent, there are several different structures: one to begin, one or more contents, and
then one to end. The beginning has the following fields:
The QuadInfo is “an individual sample of Vector's [navigation] map. This quad's size will vary and
depends on the resolution the map requires to effectively identify boundaries in the environment.”
It has the following fields:
The cube location is updated with an event with the following fields:
The deleted face event is sent when Vector no longer sees a given face. The structure has the
following fields:
This observed face event is sent while Vector sees and tracks a face in his view. The structure has
the following fields:
This observed object event is sent while Vector sees and tracks an object in his view. The structure has the following fields:
Note the code has object observed events sent on two websockets!
68.2.21 Power
The power event is a structure with the following fields (or an array of these structures):
68.2.22 Sleeping
The sleeping event is structure with the following fields:
68.2.23 SoundReactions
See also section 68.2.23 SoundReactions
Table 398: SoundReactions event parameters
Field Type Units Description
activeState boolean True if a voice has been detected. False otherwise.
endTime_ms uint ms
startTime_ms uint ms
count uint
enabled string “true” or “false”
esn string The robot’s electronic serial number.
osVersion string The version of the software running on Vector.
numActiveModes uint count The number of vision modes that are currently enabled.
patternWidth uint
68.3.1 Behaviors
The following command is sent to submit a behavior. It is not clear whether it bypasses the condition checks.
68.3.2 Cubes
The following command is sent to enable and disable features on the cube:
68.3.3 Features
The feature settings can be enabled, disabled, or reset. The posted structure includes a type field
that describes how to interpret the rest of the structure.
Table 405: Enable/disable Feature parameters
Field Type Units Description
name string The name of the AI feature to enable or disable.
override string “none”, “enabled”, “disabled” Whether or not the feature should be enabled or disabled.
The following command is sent to reset all of the features to their default state:
68.3.4 Habitat
The following command is sent to force Vector to think that he is in or out of his habitat:
68.3.5 Intent
The following command is sent to submit an intent to AI engine:
68.3.6 Mood
The following command is sent to interact with the mood module:
68.3.7 NavMap
The following command is sent to request an updated map:
69. CONFIGURATION
The server URLs are specified in:
/anki/data/assets/cozmo_resources/config/server_config.json
/anki/etc/vic-crashuploader.env
/anki/etc/update-engine.env
/anki/data/assets/cozmo_resources/config/DASConfig.json
/anki/data/assets/cozmo_resources/config/engine/jdocs_config.json
The last of these also sets how often Vector contacts the jdocs server. (The path to this JSON file is hardcoded in libcozmo_engine.) The configuration also lists the base name of the json file (without the .json extension) used to store the jdoc file locally.
The interactions are basic: store, read, and delete a JSON blob by an identifier. The description below gives the JSON keys and value formats. It is implemented as a gRPC/protobuf interaction over HTTP.
37
The protocol was specified in Google Protobuf. Vic-Cloud and Vic-Gateway were both written in Go. There is enough information
in those binaries to reconstruct significant portions of the Protobuf specification in the future.
70.2.1 Request
The DeleteDocReq request message has the following fields:
70.2.2 Response
The DeleteDocResp response message has the following fields:
70.3.1 Request
EchoReq
data
70.3.2 Response
EchoResp
data
70.4.1 Request
The ReadDocsReq request message has the following fields
70.4.2 Response
ReadDocsResp
items
70.5.1 Request
The ReadDocsReq_Item request message has the following fields
70.5.2 Response
The ReadDocsResp_Item response message has the following fields:
70.6.1 Request
The WriteDocReq request message has the following fields
70.6.2 Response
The WriteDocResp response message has the following fields:
71.1.1 Request
The request sent to the server has the following fields:
Not sure where the stream open goes. Does it upload the file, or live stream it?
71.1.2 Response
The server response message has the following fields
Table 425: intent_clock_settimer_extend parameters
Field Type Units Description
timer_duration int seconds The number of seconds to set the timer to.
Table 426: intent_global_stop_deletable parameters
Field Type Units Description
entity_behavior_deletable bool
Table 427: intent_global_stop_extend parameters
Field Type Units Description
entity_behavior_stoppable bool
Table 428: intent_imperative_eyecolor_extend parameters
Field Type Units Description
eye_color string The name of the color to set the eye color to.
Table 429: intent_imperative_volumelevel_extend parameters
Field Type Units Description
volume_level string
Table 430: intent_knowledge_response_extend parameters
Field Type Description
answer string The text to be spoken.
Table 431: intent_message_playmessage_extend parameters
Field Type Units Description
given_name string The name of the person to send the message to.
Table 432: intent_names_username_extend parameters
Field Type Units Description
username string The name of the user.
Table 433: intent_photo_take_extend parameters
Field Type Units Description
entity_photo_selfie string Empty string if taking a photo, “photo_selfie” if taking a selfie.
Table 434: intent_weather_extend parameters
Field Type Units Description
condition string The current weather conditions. One of “Clear”, “Cloudy”, “Cold”, “Rain”, “Snow”, “Stars”, “Sunny”, “Thunderstorms”, or “Windy”
is_forecast string “true” or “false” “false” if these are the current weather conditions; “true” if they are forecasted weather conditions.
local_datetime string The local time (where the weather conditions apply) in UTC ISO 8601 format.
speakable_location_string string The location name that Vector could employ in his verbal description of the temperature.
temperature string degrees The current or forecasted temperature, in the given units.
temperature_unit string F or C, for the units.
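A hypothetical example of the parameters accompanying a weather intent (the values are illustrative only):

{
  "condition": "Sunny",
  "is_forecast": "false",
  "local_datetime": "2019-07-04T17:00:00Z",
  "speakable_location_string": "San Francisco",
  "temperature": "72",
  "temperature_unit": "F"
}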
72.1.1 vic-log-upload
vic-log-upload sends logs to an Amazon S3 server, with the bucket information in the server_config.json file. See chapter 32, section 142.3 Gathering logs, regularly for more details on this file.
72.1.2 vic-logmgr-upload
This section describes how logs are uploaded by vic-logmgr-upload. That program is not called.
See chapter 32, section 142.2 Vic-logmgr-upload for more details.
The logs are uploaded by performing an HTTP PUT to the server. The URL is the “logfiles” URL in the server configuration file, with a file name of the form:
year-month-day-hour-minute-seconds
The URL (including the key) is set in the vic-crashuploader configuration file. See chapter 32
section 142.7 Crash Logs for more details on vic-crashuploader and how minidumps are acquired.
MessageAttribute.1.Name DAS-Transport-Version
MessageAttribute.1.Value.DataType Number
MessageAttribute.1.Value.StringValue 2
MessageAttribute.2.Name Content-Encoding
MessageAttribute.2.Value.DataType String
MessageAttribute.2.Value.StringValue gzip, base64
MessageAttribute.3.Name Content-Type
MessageAttribute.3.Value.DataType String
MessageAttribute.3.Value.StringValue application/vnd.anki.json; format=normal; product=vic
MessageBody
Version 2012-11-05
Note: there may be a body of compressed JSON data. These values are hardcoded in vic-dasmgr
and libcozmo_engine. The URL is set in the vic-dasmgr configuration file.
Each entry of the upload JSON data includes a profile id; it can be tied to the user account, but “Unless you create an account and log in, Analytics Data is stored under a unique ID and not connected to you.”
See Chapter 32, section 144.2 DAS for more information on the DAS events and configuration file.
38
This date is very far in the past, before Vector or Cozmo were developed. This was the time frame of the Overdrive product
development.
Advanced Functions
This part describes the items that constitute Vector’s primary functions.
AUDIO INPUT. A look at Vector’s ability to hear spoken commands, and ambient sounds.
IMAGE PROCESSING. Vector’s vision system is sophisticated, with the ability to recognize markers, faces, and objects, and to take photographs; it acts as a key part of the navigation system.
ACCESSORIES. A look at Vector’s home (charging station), companion cube and custom
objects.
Audio Input
This chapter describes the sound input system:
Spatial audio processing separates the sound of someone talking from the background music.
The feature extraction detects the ambient activity, and the tempo of the music. If the tempo is right, Vector will dance to it. This also provides basic stimulation to Vector.
The voice activity detector is usually triggered off of the signal before the beam-forming.
A wake word is used to engage the automatic speech recognition system. Note: the wake word is also referred to as the trigger word.
A CODEC is used to compress the audio before sending it to the remote server; Alexa Voice Services use the Opus audio CODEC.
The speech recognition system is on a remote server. The audio sent to the automatic speech recognition system is compressed to reduce data usage.
[Figure: the audio input path – four MEMS microphones connect to the body-board over SPI; the body-board sends the samples over a UART to Vic-Spine (part of Vic-Robot), and the audio is intended to reach SDK applications via Vic-Gateway.]
Note: providing the audio input to the SDK (via Vic-gateway) was never completed. It will be
discussed based on what was laid out in the protobuf specification files.
The audio processing blocks, except where otherwise discussed, are part of Vic-Anim. These blocks were implemented by Signal Essence, LLC. They probably consulted on the MEMs microphones and their configuration. Although the Qualcomm family includes software support for these tasks as part of the Hexagon DSP SDK, it is believed that Signal Essence did not take advantage of it.
The body-board samples each microphone at 1.5 M samples/sec – but at only 1 bit/sample! It passes the stream of samples through a filter, producing audio at 15,625 samples/sec with 16 bits/sample (effectively it may have anything in the range 10 to 16 bits, with the rest padded out). The filter also acts as a low-pass filter, removing high-frequency sampling artifacts. The most important part is that it preserves “phase information” so that the beam-forming and direction finding steps work well. (More on this in a later section.)
The audio samples are transferred to the Vic-spine module (part of Vic-robot) as part of the body-board’s regular communication with the head-board. The message from the body-board to the head-board for sending 4 channels of audio samples includes 80 samples per channel (320 samples total).
The input system triggers the SPIs to start gathering the data into their respective buffers. After
that:
1. When the DMA has filled half of the buffer, it generates an interrupt. The filtering on all four channels is initiated for this half of the buffer, putting the result into the outgoing message buffer.
2. When the DMA has filled the second half of the buffer, it generates an end-of-transfer interrupt. The filtering on all four channels is initiated for this second half of the buffer, again putting the result into the outgoing message buffer. (In the meantime, the DMA has automatically looped back to the start of the buffer and kept the SPI transferring the bits.)
3. If the outgoing buffer is full (i.e. after the DMA buffers have been filled twice), the UART transmit is initiated.
It is possible that the firmware uses two buffers – one being filled by the filtering while the other
is sending data on the UART – swapping every time a buffer fills. It is more likely that the
body-board fills the same output buffer as data is being sent from it to the head-board, to save
on memory usage. Although the SPI is 2-3x faster than the UART, the filter stage consumes 6
input bits for every data bit that is sent to the head board. The UART can therefore effectively
send data at least 2x faster than the SPIs receive it.
Each microphone is driven at 1.5 M samples/sec (half the SPI clock frequency). The ratio
between this input sample rate and the output sample rate (15,625 samples/sec) – called the
decimation ratio – is 96:1.
Since it takes 96 input samples (bits) to produce one 16-bit output sample39, the bit-rate
reduction is 6:1.
Altogether the audio sampling, filtering/decimation, and sending to the head-board uses at least
4KB of the MCU’s 8KB of RAM.
39 The filtering may give the audio samples an effective range of ~11 or 12 bits. The Customer Care Information Screen (CCIS) shows the microphones to be about 1024 when quiet.
BEAM-FORMING combines the multiple microphone inputs to cancel audio coming from other
directions. For each chunk of audio, the spatial processing produces:
A histogram of the directions that the sound(s) in this chunk of audio came from. There
are 12 bins, each representing a 30° direction.
The direction that is picked for the origin of the sound of interest.
The sound stream isolated for the picked direction, in the form of 160 16-bit PCM audio
samples.
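A delay-and-sum beamformer is the simplest way to realize this kind of direction finding. The sketch below is illustrative only – Signal Essence's actual algorithm is not public – and it assumes the per-direction sample delays (which depend on the physical microphone layout) are already known:

import numpy as np

def steered_power(mics: np.ndarray, delays: np.ndarray) -> float:
    """Delay-and-sum power for one candidate direction.
    mics:   (4, N) array of time-aligned microphone samples.
    delays: integer sample delay per microphone for this direction."""
    n = mics.shape[1] - int(delays.max())
    summed = sum(mics[i, d:d + n].astype(np.float64)
                 for i, d in enumerate(delays))
    return float(np.mean(summed ** 2))

def direction_histogram(mics: np.ndarray, delay_table):
    # delay_table: 12 entries (one per 30-degree bin) of per-mic delays.
    powers = [steered_power(mics, np.asarray(d)) for d in delay_table]
    return powers, int(np.argmax(powers))   # histogram, picked direction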
See also:
Chapter 14, section 48.5 Audio Processing Mode for a potential method to enable and
disable the spatial sound processing;
Chapter 14, section 48.4 Audio Feed (from the Microphones) for potential access to the
audio stream via the HTTPS API.
[Figure: audio clean-up chain – spatial audio processing, echo cancellation, noise suppression, and automatic gain control, before the stream is handed to Vic-Cloud.]
The combination of spatial processing and noise reduction gives the cleanest sound (as compared
with no noise reduction and/or no spatial processing).
Vector is also likely to ignore the microphones while sounds are playing.
[Figure: a percentile filter produces the estimated sound level.]
The TBD {loudness estimator} might use an algorithm similar to the following steps:
1. First, the sound is filtered to better reflect how our ears hear it, and/or to remove elements
that would cause false triggers. Two popular approaches are "equal loudness" by David
Robinson and "A-weighting." Both take into account how people perceive loudness by
giving less weight to some frequency regions (the very low and high), and more weight to
others (the middle).
2. Every few tens or hundreds of milliseconds the "power" level of the sound is computed.
This is the logarithm of the root mean square (RMS) of the filtered values – squaring each
value, averaging that, taking the square root and then computing its logarithm. Often this
calculation is rearranged to be a bit faster, by skipping the square root and adjusting the
logarithm scaling factor.
3. The computed power can then be compared against an estimate of the noise floor (the
generic ambient sound level), to see if there is some activity, even the beat of music.
4. The power levels are also tracked for a second (or a few seconds). The values could be
averaged. Or the values could be sorted, from smallest to largest, and a low percentile
taken: the value that ~95% of the samples exceed.
The noise floor could be taken from the lowest value in the sorted array (step 4) – or the value that
is, say, 5% into the array can be treated as the noise floor. Or it could be estimated by applying a
low-pass filter to the lowest values. The key is that even when the sound level increases, the
noise-floor estimate moves up only slowly. A low-pass filter has the advantage of not taking a
large amount of memory – a large percentile-filter window (using the lowest value in it) would
take much more memory to avoid confusing several minutes of music with silence.
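A minimal sketch of such an estimator, under the assumptions above (the weighting filter is omitted; the noise floor is tracked with an asymmetric low-pass filter that falls quickly and rises slowly):

import math

def sound_power_db(samples) -> float:
    # Log of the mean square; the square root is folded into the scale
    # factor, since 10*log10(mean_square) == 20*log10(RMS).
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_sq + 1e-12)

class NoiseFloor:
    """Track the ambient level: drop quickly to quiet readings, rise slowly."""
    def __init__(self, rise_rate: float = 0.01):
        self.level = None
        self.rise_rate = rise_rate

    def update(self, power_db: float) -> float:
        if self.level is None or power_db < self.level:
            self.level = power_db
        else:
            self.level += self.rise_rate * (power_db - self.level)
        return self.level

Activity (even the beat of music) can then be flagged whenever the computed power exceeds the tracked floor by some margin.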
The beat detection is made of two related sub-functions. The first is a fast detector that can be used
for quick dance responses in time to the music. The second finds the tempo – the beats per minute
– of the music, which is also a good indication that there is music playing (and not other activity).
Note: See chapter 24, section 108.1.1 Pitch tracker for how to find the pitch.
Once a beat is detected, it holds off sending another event until the signal has dropped below a
threshold for at least half a second or more. Another timer may be used to tell when the music has
stopped: the timer is reset whenever a new beat is detected, and expires if a beat has not been
detected for a few seconds.
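A sketch of that gating logic (the half-second hold-off and the music-stopped timeout are assumptions drawn from the description above):

class BeatGate:
    HOLDOFF_S = 0.5   # signal must stay below threshold this long to re-arm
    SILENCE_S = 3.0   # no beats for this long -> report that music stopped

    def __init__(self):
        self.armed = True
        self.below_since = None
        self.last_beat_at = None

    def on_level(self, t: float, level: float, threshold: float):
        if level < threshold:
            if self.below_since is None:
                self.below_since = t
            if t - self.below_since >= self.HOLDOFF_S:
                self.armed = True               # quiet long enough: re-arm
        else:
            self.below_since = None
            if self.armed:
                self.armed = False
                self.last_beat_at = t
                return "beat"
        if self.last_beat_at is not None and t - self.last_beat_at >= self.SILENCE_S:
            self.last_beat_at = None            # expire: music has stopped
            return "music_stopped"
        return None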
Although simple to implement, this detector can be tricked by loud noises, and it is not very good
at measuring the tempo (the speed of the music in beats per minute).
74.5.2 Tempo
A more accurate approach is to use a spectrogram to measure the tempo. The beats appear as very
low frequency signals in the spectrogram. Music might be in the range of 50-110 bpm (0.7 Hz to 2 Hz).
The approach is to search the spectrogram in this frequency range for signals above a minimum
threshold (to screen out generic sounds), and pick the strongest.
1. Take the sound input, and perform a low-pass filter on it; this eliminates aliasing noises that
can come from down-sampling.
2. Next, down-sample the audio to only a few samples per second, and hold the results in
a window a few seconds wide.
3. Compute an FFT over the window, turning the sound envelope into a spectrum.
4. The FFT results are examined to find frequencies with a power above a threshold. These
are the potential beats (in Hz).
5. The beats are then tracked in a scoreboard. The scoreboard tracks which beats are
consistent and which are transitory. The beat-rates that haven’t been heard in a while are
discounted or cleared out with time.
6. A tempo, perhaps the highest persistent beats/minute, is then reported as the most likely
rate.
The drawback of this approach is that it is "slow" and can't be used to dance in time to the music.
The time window needed to find slower beats (the ones about every second) is very long; it can
take a few seconds before it knows anything about the music's beat.
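A compact version of those steps (the envelope rate and threshold here are assumptions; real tuning would depend on the microphone chain):

import numpy as np

THRESHOLD = 1.0  # minimum spectral power to count as a beat; needs tuning

def estimate_tempo_bpm(audio: np.ndarray, rate: int = 15625):
    env_rate = 8                                 # a few samples per second
    hop = rate // env_rate
    n = len(audio) // hop
    # Steps 1-2: rectify and average each hop -> a low-rate loudness envelope
    envelope = np.abs(audio[: n * hop]).reshape(n, hop).mean(axis=1)
    envelope -= envelope.mean()
    # Step 3: FFT of the envelope window
    spectrum = np.abs(np.fft.rfft(envelope))
    freqs = np.fft.rfftfreq(n, d=1.0 / env_rate)
    # Steps 4-6: pick the strongest component in the beat band
    band = (freqs >= 0.7) & (freqs <= 2.0)
    if not band.any() or spectrum[band].max() < THRESHOLD:
        return None                              # no convincing beat found
    return float(freqs[band][np.argmax(spectrum[band])] * 60.0)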
The voice activity detector and the wake word are used so that downstream processing – the wake
word detector, and the automatic speech recognition system – is not engaged all the time. Both
are expensive (in terms of power and CPU load), and the speech recognition is prone to
misunderstanding.
When the voice activity detector triggers – indicating that a person may be talking – the spatial
audio processing is engaged (to improve the audio quality) and the audio signals are passed to the
Wake Word Detector.
The detector for the “Hey, Vector” is provided by Sensory, Inc. Pryon, Inc provided the detector
for “Alexa.” 41 The recognition is locale dependent, detecting different wake words for German,
etc. It may be possible to create other recognition files for other wake words.
1. A connection (via Vic-Cloud) is made to the remote speech processing server for automatic
speech recognition.
2. If there was an intent found (and control is not reserved), the intent is mapped to a local
behaviour to be carried out. This is described in a later section.
3. A WakeWordEnd (see Chapter 14 section 48.2.3 WakeWord) event message is sent (to Vic-Gateway
for possible forwarding to a connected application) when Vic-Cloud has received a
response back. If control has not been reserved, and an intent was received, the intent
JSON data structure is included.
4. If there was no intent found, a StimulationInfo (see Chapter 14, section 44.2.2
StimulationInfo) event message is posted (to Vic-Gateway), with an emotion event such as
NoValidVoiceIntent.
5. If there was an intent found (and control is reserved), a UserIntent (see Chapter 14, section
48.2.2 UserIntent) event is posted to Vic-Gateway for possible forwarding to a connected
application. In this case, the intent will not be carried out.
An external application can send an intent to Vector using the AppIntent command (see Chapter
14, section 48.3 App Intent).
The wake word recognition models are configured in:
/anki/data/assets/cozmo_resources/config/micData/micTriggerConfig.json
A WakeWordLocale is used to map a language locale to the wake word recognition models to use.
This structure has the following fields:
locale          string           The IETF language tag of the human companion's language
                                 preference – American English, UK English, Australian English,
                                 German, French, Japanese, etc. Default: "en-US"
modelList       WakeWordModel[]  The wake word speech recognition models, in a variety of sizes
Each WakeWordModel provides a set of word recognition models that can be used. The structure
has the following fields:
defaultSearchFileIndex  uint            The index of the model (in searchFileList) to use by default.
modelType               string          e.g. "size_500kb" or "size_1mb"
netFileName             string          Name of a file.
searchFileList          WakeWordFile[]  The wake word speech recognition models, in a variety of sizes
searchFileList          string          The name of the file…? (relative to the data directory). "NA" if a
                                        file name is not applicable.
40 Vector's wake word detection and speech recognition are pretty hit and miss. Signal Essence's demonstration videos show much better performance; the differences are that they used more microphones and the spatial audio filtering in their demos. Version 1.7 improved echo cancellation and wake word detection.
41 This appears to be standard for Alexa device SDKs.
42 The names of the structures here were created for clarity; they are not actually used in the files.
The audio feed (see Chapter 14, section 48.4 Audio Feed) was specified to let an application:
Select whether the audio would have the spatial audio filter and noise reduction processing
done on it.
Include the direction-of-sound information from the spatial audio processing (see section
74.2 Spatial audio processing).
Each message carries 1600 audio samples; note: this is 10x the 160-sample chunk size of the
internal processing.
[Figure: cloud speech processing – Vic-Cloud sends the audio to the speech recognition (ASR) stage, which produces words and tone; the language understanding stage maps those to an intent and the referred-to entities.]
What the user said is mapped to a user intent. This is a code and structure that represents an action
to carry out in response to the spoken request, query, or statement; it may represent the action
requested, an answer to a query, or an action that emotionally responds to what was said. The
intent includes some supporting information – the colour to set the eyes to, for instance. Many of
the phrase patterns and the intent they map to can be found in Appendix I. The intent may be
further handled by Anki servers; the intent is eventually sent back to Vector.
The intent system does some replacement on the intent names and parameters 43 from the cloud and
SDK application to names used internally within Vector’s engine.
It uses separate tables for the intents passed by the cloud and those passed from an SDK
application. With a cloud-based intent, the engine:
1. Looks up to see if there is a rule matching the name of the passed intent. If there is no
match, the intent (may be) is passed on to the next stage. If there is a match, the internal
intent name associated with the rule will be used, and
2. Each of the passed intent parameter names is checked to see if the name should be changed
to an internal name. If so, it is changed to the internal name; otherwise the parameter's
passed name is (probably) used.
43
The complexity suggests that the development of the server, mobile application and Vector were not fully coordinated and needed this
to bridge a gap.
With an SDK-passed intent, the engine:
1. Looks up to see if there is a rule matching the name of the passed intent. If there is no
match, the intent is discarded. If there is a match, the internal intent name associated with
the rule will be used, and
2. Each of the passed intent parameter names is checked to see if the name should be changed
to an internal name. If so, it is changed to the internal name; otherwise the parameter is
discarded.
The intent is also checked to see if it is enabled. Each intent can be associated with a feature flag;
if it is, the flag is looked up to see if the corresponding feature is enabled. (see also Chapter 30
section 131 Feature Flags).
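The translation step might look like the following sketch. The rule-table shapes here are assumptions for illustration; the real tables live inside libcozmo_engine and their format is not documented here.

def remap_intent(intent, name_rules, param_rules, from_sdk: bool):
    """Translate a cloud/SDK intent to Vector's internal names (a sketch)."""
    internal_name = name_rules.get(intent["name"])
    if internal_name is None:
        # Cloud intents without a rule pass through; SDK intents are dropped.
        return None if from_sdk else intent
    params = {}
    for name, value in intent.get("params", {}).items():
        if name in param_rules:
            params[param_rules[name]] = value      # rename to internal name
        elif not from_sdk:
            params[name] = value                   # cloud: keep as-is
    return {"name": internal_name, "params": params}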
Table 445: global_delete parameters
Field            Type     Units    Description
what_to_stop     string

Table 447: imperative_eyecolor_specific parameters
Field            Type     Units    Description
eye_color        string            The name of the color to set the eye color to

Table 448: imperative_volumelevel parameters
Field            Type     Units    Description
volume_level     string

Table 449: knowledge_response parameters
Field            Type     Units    Description
answer           string            The text to be spoken(?)

Table 451: message_playback parameters
Field            Type     Units    Description
given_name       string            The name of the person to send the message to

Table 453: take_a_photo parameters
Field            Type     Units    Description
empty_or_selfie  string            Empty string if taking a photo, "photo_selfie" if
                                   taking a selfie.

Table 455: test_timeWithUnits parameters
Field            Type     Units    Description
time             uint
units            string

Table 456: weather_response parameters
Field                    Type     Units    Description
condition                string            The current weather conditions. One of "Clear",
                                           "Cloudy", "Cold", "Rain", "Snow", "Stars",
                                           "Sunny", "Thunderstorms", or "Windy"
isForecast               string            "false" if it is the current weather conditions;
                                           "true" if forecasted weather conditions.
localDateTime            string            The local time (where the weather conditions
                                           apply) in UTC ISO 8601 format.
speakableLocationString  string            The location name that Vector could employ in his
                                           verbal description of the temperature.
temperature              string   degrees  The current or forecasted temperature, in the given
                                           units.
temperatureUnit          string            F or C, for the units
/anki/data/assets/cozmo_resources/config/engine/behaviorComponent/user_intent_map.json
The path is hard coded into libcozmo_engine.so. The file has the following structure:
Each of the simple voice response mapping entries has the following structure:
Each of the user intent mapping entries has the following fields:
A detailed description of how the sound loudness can be measured and used to adjust the
volume of music playback.
Note: the filter implementation for audio effects can be very complex; for sound detection it is
very simple.
ST Microelectronics, Reference manual, STM32F030x4/x6/x8/xC and STM32F070x6/xB advanced
ARM®-based 32-bit MCUs, 2017 Apr, Rev 4
https://www.st.com/resource/en/reference_manual/dm00091010-stm32f030x4-x6-x8-xc-and-
stm32f070x6-xb-advanced-arm-based-32-bit-mcus-stmicroelectronics.pdf
Image Processing
Vector has a clever vision processing system:
[Figure: vision processing pipeline – the camera (mm-anki-camera, connected via MIPI) supplies frames over dev/socket/vic-engine-cam_client0; a frame-rate reducer, conversion to gray-scale with image shrinking, calibration/correction, and illumination-level measurement feed the detection stages (marker detection, object & face recognition, motion detection) in Vic-engine; results flow to mapping, photos, and – via Vic-Gateway – Python SDK applications.]
The vision system looks for:
Special visual markers; Vector treats all marked objects as moveable… and all other
objects in his driving area as fixed & unmovable.
Faces
Hands
Several considerations drive the reduction of the frame size and rate:
More pixels require much more memory at each stage of the image processing.
It takes much, much longer (and more power) to process larger frames. There is the added
time to process each of the added pixels. Second, the neural-net models (used for human,
pet and object recognition) are much larger as well, taking much longer to process with the
many stages involved in these models.
That extra processing is among the most power-expensive items in Vector, rapidly
depleting his battery and shortening the time between charges.
The extra processing also generates heat in the head board, and
Image processing tasks don't need more pixels: there is rarely any improvement in visual
detection from using more pixels or higher frame rates.
The software reduces the frame rate by skipping frames (no fancy interpolation needed). Then
the image is converted to gray scale and scaled down to quarter size (640x360). (This was also the
case with Cozmo.)
Vector’s camera has ~120° diagonal field of view.44 For comparison the iPhone’s camera has a 73˚
field of view, and the human eye is approximately 95˚. The cropped sensor image has a 90°
horizontal field of view and a 50° vertical field of view.
44 The press release for Vector reported a 120° field of view, but this should be discounted as it does not match the frame field-of-view numbers given in the SDK documentation.
Vector’s calibration uses focal length instead of field of view. The two values are related:
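For a pinhole camera model, the focal length in pixels follows from the field of view and the image width:

focal_length_px = (image_width_px / 2) / tan(field_of_view / 2)

For example, a 90° horizontal field of view across a 640-pixel-wide image gives 320 / tan(45°) = 320 pixels.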
“A full 3x3 calibration matrix for doing 3D reasoning based on the camera images would look
like:”
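The standard form of such a camera intrinsics matrix is:

[ fx   0   cx ]
[  0  fy   cy ]
[  0   0    1 ]

where fx and fy are the focal lengths in pixels, and (cx, cy) is the optical center of the image.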
77.3. CORRECTION
With each image frame to be processed, the software applies some processing to improve the
image contrast. This helps with the low light that is common in rooms and at night. (The software
also monitors the illumination levels and tweaks the exposure settings so that the image is as good
as possible before it gets to the software stage.)
The vision processing system has many detectors and functions, some of which run at different
rates. While most are independent of each other, they are often grouped together.
Mode          Description
OverheadMap   Disabled.
People        This mode is used to detect people, rather than faces.
Pets          This mode is used to detect pets, such as cats and dogs.
Hands         Used to detect hands (for purposes of pouncing on them).
SaveImages    This mode is used to save the camera image as a photograph.
Stats         This is probably used to compute statistics about the images or image processing.
Viz           This module creates a marked-up image showing where Vector sees the charger,
              cubes, faces, and other interesting things.
WhiteBalance  This mode is used to estimate the white balance.
The camera is also used as an ambient light sensor when Vector is in low power mode (e.g.
napping, or sleeping). In low power mode, the camera is suspended and not acquiring images.
Although in a low power state, it is still powered. The software reads the camera’s auto
exposure/gain settings and uses these as an ambient light sensor. (This allows it to detect when
there is activity and Vector should wake.)
Vector can detect visual movement in his field of view. This motion detector looks in two regions
of the camera view (the lower left and the top right) for movement, and it looks at its projected view
of the ground for movement.
[Figure: motion detection regions – the left region, the right region, and their vertical size.]
The detector likely does pixel subtraction in these regions between frames, computing a score from
the number of pixels that changed and how wide an area it was (the centroid). Then it adds this
to the past value (using the inverse of DecreaseFactor as a weight). If the score is above a threshold
(MaxValue?) it concludes that there is motion in that region.
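That description suggests something like the following sketch (the decay weighting and threshold names mirror the configuration fields mentioned above; everything else is an assumption):

import numpy as np

class RegionMotionDetector:
    def __init__(self, decrease_factor: float = 2.0, max_value: float = 50.0):
        self.decrease_factor = decrease_factor   # decay of the running score
        self.max_value = max_value               # trigger threshold
        self.prev = None
        self.score = 0.0

    def update(self, region: np.ndarray) -> bool:
        """region: gray-scale pixels for one watched area of the frame."""
        moving = False
        if self.prev is not None:
            changed = np.abs(region.astype(np.int16) - self.prev) > 20
            self.score += changed.sum() / self.decrease_factor
            moving = self.score > self.max_value
        self.score /= self.decrease_factor       # decay between frames
        self.prev = region.astype(np.int16)
        return moving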
See Section 83.1.7 MotionDetector for a description of the motion detectors configuration.
The motion detector is used by the pouncing behaviors – see Chapter 29, section 122.2 Pouncing
# Camera rotation relative to the head, as a row-major 3x3 matrix
DEFAULT_HEAD_CAM_ROTATION = [
    0, -0.0698, 0.9976,
    -1, 0, 0,
    0, -0.9976, -0.0698 ]
# Get the neck pose (transform the initial offset by the robot's pose);
# TransformPose and NECK_JOINT_POSITION come from the forum example below.
neck_pose = TransformPose(NECK_JOINT_POSITION, robot.pose)
45
https://forums.anki.com/t/camera-matrix-for-3d-positionning/13254/5
A key characteristic of the markers is a big, bold square line around them:
The square is used to estimate the distance and relative orientation (pose) of the marker and the
object it is on. Vector, internally, knows the physical size of the marker. The size of the square in
the view – and being told how big the shape really is – lets Vector know enough to compute the
likely physical distance to the marked item. And since the "true" mark has parallel lines, Vector
can infer the pose (relative angles) of the surface the mark is on.
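The distance estimate follows from similar triangles in the pinhole camera model; a one-function sketch (the names are illustrative, not Vector's):

def marker_distance_mm(focal_length_px: float,
                       marker_size_mm: float,
                       marker_size_px: float) -> float:
    # An object of known physical size appears smaller in inverse
    # proportion to its distance from the camera.
    return focal_length_px * marker_size_mm / marker_size_px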
The process of finding and decoding the marker symbols is very straightforward, since there is
quite a lot known about the structure of the marker image ahead of time. This allows the use of
computation friendly algorithms.
2. Apply classic erosion-dilation and Sobel transforms to build a vector representation (no
pun intended) of the image; this is most familiar as "vector drawing" vs bitmap images.
3. Detect the squares – the parallel and perpendicular lines – in the vector drawing. These will
be the potential areas that a symbol is in.
4. Analyze each square to determine its size and affine transform – how it is tilted up-and-down,
and tilted away from the camera.
2. Perform morphological operations (erosion, dilation) that strip out noise and fill in minor
pixel gaps. There are no small features, so fine detail is not important.
3. The image is then converted to high-contrast black and white (there is no signal in grey
scale). This is done by computing a histogram of the grey-scale values and finding a median
value. This value is used as a threshold: greys darker than this are considered black (a
1 bit), and all others are white (0 bit).
1. Typically a pair of Sobel filters is applied to identify edges of the black areas, and the
gradients (the x-y derivative) of the edges.
2. The adjacent (or nearby) pixels with similar gradients are connected together into a list.
Straight line segments will have very consistent gradients along them. In other words, the
bitmap is converted into a vector drawing. In jargon, this is called the morphology.
3. The lists of lines are organized into a containment tree. A bounding box (min and max
positions of the points in the list) can be used to find which shapes are around others. The
outer most shape is the boundary.
4. "Corners of the boundaries are identified… by filtering the (x,y) coordinates of the
boundaries and looking for peaks in curvature." (Stein, 2017) This yields a set of
quadrilaterals (by removing those shapes that do not have four corners).
5. A perspective transformation is computed for the square (based on the corners), using
homography (“which is a mathematical specification of the perspective transformation”).
This tells how tilted the square is.
6. The list of squares is filtered, keeping those that are big enough to analyze, and not
distorted with a high skew or other asymmetries.
1. The software uses the perspective transform to map the first point location to one in the
image;
2. The pixels at that point in the image are sampled and used to assign a 0 or 1 bit for the
sample point.
4. The above steps are repeated for the rest of the probe locations
This process allows Vector to decode images warped by the camera, its lens, and the relative tilt of
the area.
Next, the bit patterns are compared against a table of known symbol patterns. The table includes
multiple possible bit patterns for any single symbol, to accommodate the marker being rotated.
There is always a good chance of a mistake in decoding a bit. To find the right symbol, Vector:
1. XOR’s the decoded bit pattern with each in its symbol table,
2. Counts the number of bits in the result that are set. (A perfect match will have no bits set,
a pattern that is off by one bit will have a single bit set in the result, and so on.)
3. Vector keeps the symbol with the fewest bits set in the XOR result.
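In code, this nearest-match step is a minimum Hamming distance search; a sketch (the table layout is hypothetical):

def best_symbol(decoded: int, symbol_table: dict) -> str:
    """Pick the known marker whose bit pattern differs from the decoded
    pattern in the fewest bits. symbol_table maps a symbol name to its
    bit patterns (one per possible rotation of the marker)."""
    best_name, best_dist = None, 1 << 30
    for name, patterns in symbol_table.items():
        for pattern in patterns:
            dist = bin(decoded ^ pattern).count("1")   # Hamming distance
            if dist < best_dist:
                best_name, best_dist = name, dist
    return best_name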
Vector knows (or is told) the physical size of the symbol, and of the object holding the symbol.
Combining this with the visual size of the object, the time-of-flight distance measurement (if any),
and Vector's known position allows Vector to infer the object's place in the map.
Face detection – the ability to sense that there is a face in the field of view, and
locate it within the image.
Face recognition – the ability to identify whose face it is, looking up the identity from a set
of known faces.
Recognizing parts of the face, such as eyes, nose and mouth, and where they are located
within the image.
Blink detection.
There are a couple of areas that Vector includes access to in the SDK API, but that were not
incorporated fully into Vector's AI:
The ability to recognize the facial expression: happiness, surprise, anger, sadness and
neutral. This is likely to be unreliable; that is the consensus of research on facial
expression software.
And there are several features in OKAO that are not used:
Hand detection and the ability to detect an open palm. The hand detection used in Vector
is done in a different way (which we will discuss in a section below.)
Vector's face detector (and facial recognition) can't tell that it is looking at an image of a face –
such as a picture, or on a computer screen – rather than an actual face (Daniel Casner, 2019). One
thing that Anki was considering for future products was to move the time of flight sensor next to
the camera. This would have given a way to check that there is face-like depth at that spot, rather
than a flat picture.
Side note: Anki was exploring ideas (akin to the idea of object permanence) to keep track of a
known person or object in the field of view even when it was too small to be recognized (or
detected).
If you introduce yourself to Vector by voice, you are permitting the robot to associate the name
you provide with Facial Features Data for you. Facial Features Data is stored with the name you
provide, and the robot uses this data to enhance and personalize your experience and do things like
greet you by that name. This data is stored locally on the robot and in the robot’s app. It is not
uploaded to Anki nor shared, and you can delete it anytime.
The Enable Face Detection (see Chapter 14 section 54.4 Enable Face Detection) command
enables and disables the face detection and analysis stages.
The Set Face to Enroll (see Chapter 14 section 54.10 Set Face to Enroll) command is used
to assign a name to a face, and the Update Enrolled Face By ID (see Chapter 14
section 54.10 Set Face to Enroll) command is used to change the name of a known face.
The Request Enrolled Names (see Chapter 14 section 54.9 Request Enrolled Names)
command is used to retrieve a list of the known faces.
The ability to remove a facial identity (see Chapter 14 section 54.7 Erase Enrolled Face By
Id), or all facial identities (see Chapter 14 section 54.6 Erase All Enrolled Faces).
The Find Faces (see Chapter 14 section 54.8 Find Faces) command initiates the search for
faces.
(drawing by Jesse Easley)
Vector’s hand detection is done with a custom TensorFlow Lite DNN model.47 Vector also has a
custom person detector; this may be used to quickly identify whether there is a face in view before
engaging the potentially more expensive OKAO framework.
The host application has to do preprocessing, such as feature extraction, to prepare the input for the
system. For instance, the image must be converted to grey scale and scaled down to 128 pixels by
128 pixels. (More pixels require much more memory and processing steps, often with no
improvement in detection; some higher quality models do use slightly larger image sizes.)
Then each of the operations in the model is carried out. An operation might perform a simple
calculation like summing values, keeping the smallest or largest, etc., or an operation might be a
complex calculation such as a convolution. Once all of the operations have completed, the results
are not a "this is a hand" or other conventional software result. Instead, the results are a big list of
values indicating how confident the model is in each possible item. An application typically
chooses the top item or two as the output – if their confidence is high enough.
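Using the TensorFlow Lite Python interpreter, the whole cycle looks roughly like this (the model file name and the confidence cut-off are placeholders, not Vector's):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="hand_detector.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Preprocessing: a stand-in for the 128x128 grey-scale camera frame,
# shaped and typed to match the model's input tensor.
frame = np.zeros(inp["shape"], dtype=inp["dtype"])

interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()                       # run all the model's operations
scores = interpreter.get_tensor(out["index"])[0]

best = int(np.argmax(scores))              # the most confident label
if scores[best] > 0.6:                     # accept only confident results
    print("detected class", best, "score", scores[best])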
46 Since TensorFlow Lite was introduced at the end of 2017, there has been a steady trickle of improvements to it. There is a lower-power version that targets microcontrollers.
47
There are four different hand detector models – only one is used – which suggests that the hand detector was actively being tweaked
and improved.
Figure 75: [TensorFlow Lite loads the model from a FlatBuffers file; hardware-specific Delegate(s) can hand operations to accelerators such as the ARM Neural Network framework and the Qualcomm GPU framework.]
In addition, applications using TensorFlow Lite can provide their own, faster or more efficient
implementations of operations.
Each TensorFlow Lite model is probably run in its own thread. The benchmarks posted by
TensorFlow48 show models taking tens to hundreds of milliseconds to run on smartphones.
Putting each model on its own thread and waiting for posted results allows the rest of the
processing to execute in a consistent fashion.
48
https://www.tensorflow.org/lite/performance/benchmarks
MobileNet V1 includes higher-quality models than the one employed, which may be explored.
Since this model was released, version 2 and version 3 of MobileNet have been developed and
released. Version 2 is reported to be faster, higher quality, and/or to require fewer processor
resources. (Version 3 is slower and takes more processor resources, but is much more accurate.)
The configuration file shows experimentation with MobileNet V2 (using 192x192 input images),
but it was disabled.
(drawing by Jesse Easley)
49
Or a special model for recognizing pets may have been under development
The camera/image processing pipeline in Vector is entirely focused on his AI features, with as low
a battery impact as practical. The images available for taking a picture are not filtered or cleaned
up, so the pictures that Vector takes are noisy and smaller.
Commentary: The quality of photos seen on a mobile phone is achieved using a camera processing
pipeline to enhance the images, removing noise and applying special filters to reconstruct textures.
It is conceivable that the camera processing framework(s) from Qualcomm and Android could be
added to an open-source Vector. That would come at the cost of battery performance and heat, and
could potentially overwhelm the memory resources (there are still bugs in Vector where the memory
use becomes too high, and the system thrashes, slowing down noticeably and eventually crashing).
It is more practical, in a future open-source Vector, to export the raw camera images (in RAW
format and at different illumination levels) and process the images on a PC or mobile device. The
availability of sophisticated image processing frameworks is much wider for those devices. See
Chapter 14, section 56 Image Processing for the camera access API.
The PhotoTaken event (see Chapter 14 section 63.2.1 PhotoTaken) is used to receive a
notification when Vector has taken a photograph.
The Photos Info (see Chapter 14 section 63.5 Photos Info) command is used to retrieve a
list of the photographs that Vector currently has
The Photo (see Chapter 14 section 63.4 Photo) command is used to retrieve a photo
The Delete Photo command (see Chapter 14 section 63.3 Delete Photo) removes a photo
from the system.
The Thumbnail (see Chapter 14 section 63.6 Thumbnail) command retrieves a small
version of the image, suitable for displaying as a thumbnail
/anki/data/assets/cozmo_resources/config/engine/vision_config.json
This path is hardcoded into libcozmo_engine.so. It configures each of the image processing
modules, and the schedule defaults. The file is a structure with the following fields:
83.1.1 FaceRecognition
The FaceRecognition structure includes the following fields:
MaxDepth            uint
MinSampleCount      uint
OnTheFlyTrain       boolean
PositiveWeight      float
TruncatePrunedTree  boolean
Use1SERule          boolean
Note: The Ground Plane classifier is a bit unusual. It is one of only two YAML files. The YAML
file is an openCV based classifier tree, instead of TensorFlow Lite. This suggests it may have been
older (i.e. from Cozmo), and/or it may have been more efficient to implement in openCV.
83.1.3 IlluminationDetector
The IlluminationDetector structure includes the following fields:
83.1.4 InitialModeSchedules
The InitialModeSchedules provides the default frequency at which each vision processing step is
run. (For any step not listed here, the default is that it is not scheduled to run.) The structure
includes the following fields:
numImageReadyCyclesBeforeReset  uint
percentileForMaxIntensity       uint
83.1.6 ImageQuality
The ImageQuality structure includes the following fields:
SubSample uint
83.1.7 MotionDetector
The MotionDetector structure includes the following fields:
83.1.8 NeuralNets
The NeuralNets structure includes the following fields:
labelsFile string The name of the text file (.txt) that gives text strings for the
classification output of the model.
memoryMapGraph uint ?If non-zero, memory-map the TensorFlow Lite file in,
rather than loading it with file reads.
minScore float If the highest “score” for a label is below this value, none of
the items was recognized in the image.
modelType string “TFLite” for TensorFlow Lite files.
networkName string The name of the vision processing step.
numGridCols uint Optional.
numGridRows uint Optional.
outputLayerNames string The name of the output layer in the TensorFlow Lite file.
outputType string “classification” vs “binary_localization”
pollPeriod_ms uint
timeoutDuration_sec float ?The time to allow the model to run in a background thread
without any results before it is considered timed out, and
must be restarted?
useFloatInput uint If non-zero, use float data type within the model
useGrayscale uint
/anki/data/assets/cozmo_resources/config/engine/dnn_models
83.1.10 PerformanceLogging
The PerformanceLogging provides the frequency to log stats about the vision processing. The
structure includes the following fields:
TimeBetweenProfilerDasLogs_sec  uint  "How often to print Profiler info messages to the logs"
83.1.11 PetTracker
The PetTracker structure includes the following fields:
MaxPets      uint  The maximum number of animals that are detectable &
                   trackable at the same time.
MinFaceSize  uint
Comment: The ability to search for a lost pet would have been really cool.
/anki/data/assets/cozmo_resources/config/engine/visionScheduleMediator_config.json
This is an array of structures. Each structure gives the frequency to run a given image processing
step, for each of the vision processing subsystems modes. 1 means “runs every frame,” 4 every
fourth frame, and so on. The structure has the following fields:
/anki/data/assets/cozmo_resources/config/engine/photography_config.json
MedianFilterSize uint “If > 0, enables a median filter before saving. Must be odd.
3 or 5 are reasonable values.”
SharpeningAmount float 0.0 disables sharpening
RemoveDistortion boolean
SaveQuality uint
ThumbnailScale float
Qualcomm Neural Processing software development kit (SDK) for advanced on-device AI,
the Qualcomm Computer Vision Suite
Situnayake, Daniel; Pete Warden, TinyML, O’Reilly Media, Inc. 2019 Dec,
https://www.oreilly.com/library/view/tinyml/9781492052036/
Stein, Andrew; Decoding Machine-Readable Optical codes with Aesthetic Component, Anki,
Patent US 9,607,199 B2, 2017 Mar. 28
TensorFlow, Mobile Net v1
https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md
“small, low-latency, low-power models” that can recognize a variety of objects (including
animals) in images, while running on a microcontroller
TensorFlow, TensorFlow Lite GPU delegate
https://www.tensorflow.org/lite/performance/gpu
TensorFlow, TensorFlow Lite inference
https://www.tensorflow.org/lite/guide/inference
This Week in Machine Learning (TWIMLAI), episode 102, Computer Vision for Cozmo, the
Cutest Toy Robot Everrrrr! with Andrew Stein
https://twimlai.com/twiml-talk-102-computer-vision-cozmo-cutest-toy-robot-everrrrr-andrew-
stein/
Mapping Overview
Navigation and Path Planning
2D surface map
A 2D map that is used to track where objects (especially objects whose marker symbols he
recognizes), cliffs, and other things are on the surfaces that he can drive on. Vector uses
this map to navigate. This map has an arbitrary origin and orientation.
Vector also tracks where faces, pets and some kinds of recognized objects are in his
camera image area; these objects are tracked in the image pixels. (Never mind that the
camera pose can change!)
Vector's 2-D surface map system works with the localization and navigation subsystem. It uses
several sensors to know where he is and what is around him.
[Figure: the map quad-tree – inner nodes point down to child nodes; leaf nodes (quads) at, e.g., level 2 hold the map contents.]
The tree has two kinds of nodes: inner nodes and leaf nodes:
The inner nodes do not hold any information about the region (except its size). Instead
they point to 4 child nodes at the next lower layer. The topmost node is called the root
node.
The leaf nodes of the tree are square cells (called quads) that hold information about what
is there (or that the area is unexplored).
Each node represents a square area. The size of the square depends on how many levels it is from
the root node. The root node covers the whole map. The nodes in the next layer down are half the
width and height of the root node. (In general, a node is half the width and height of a node the
next layer up.) Nodes (including quads) at the same level – the same distance from the root node –
are the same size. Each node’s coordinates can be figured in a similar way by knowing the
coordinates of the root node.
When Vector reaches the edge of his map area and needs to expand it, he adds a new node at
the top (this becomes the new root node) and adds nodes down until the tree can contain the
information at the edge of the map.
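A toy model of that structure and the grow-at-the-root expansion (an assumed layout, not Anki's implementation):

class Node:
    """One square region of the map."""
    def __init__(self, cx: float, cy: float, half: float):
        self.cx, self.cy, self.half = cx, cy, half   # center and half-width
        self.children = None       # inner node: list of 4; leaf (quad): None
        self.content = "unknown"   # leaf payload: cliff, obstacle, clear...

    def subdivide(self):
        h = self.half / 2
        self.children = [Node(self.cx + dx * h, self.cy + dy * h, h)
                         for dx in (-1, 1) for dy in (-1, 1)]

def grow_root(root: Node) -> Node:
    # The old root becomes the lower-left quadrant of a new root that is
    # twice as wide, extending the map up and to the right.
    new_root = Node(root.cx + root.half, root.cy + root.half, root.half * 2)
    new_root.subdivide()
    new_root.children[0] = root    # the (-1,-1) child coincides with the old root
    return new_root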
Client applications (the ones that talk via the HTTPS API) may also wish to know that the map was
thrown out and a new one created – and thus know they should toss out their map and the locations
of objects. Vector associates a unique identifier with each generation of the map, called origin_id.
Whenever a new map is created, the "origin_id [is] incremented to show that [the] poses" from the
earlier map can no longer be compared with the new ones.
The readings from the time of flight sensor also feed the map:
A filter is applied to them (probably a median filter), throwing out values that are too near
or too far.
Combining Vector's current position and orientation with the distance to the
object, he can estimate the object's position; and
Vector can infer that the space between him and the object is free of other objects and
obstacles. (This means splitting up the map quads into a fine-grained resolution along the
narrow beam path.)
In addition to this, if the object has a known marker, the vision system estimates the angle of the
object, and a distance to it. This is based on the known visual size of the marker, and the observed
size. If the time of flight sensor is not blocked, only the angle need be used. If the sensor is
blocked, the visually estimated distance to the object can be used instead.
87.1. FILTERING
The time of flight sensor emits a stream of pulses that are detected by a grid of single photon
avalanche diode (SPAD) detectors. The detectors measure two things:
1. The duration from the time that the pulse was emitted; this is a direct measure of the
distance to the object.
2. A count of the number of photons received back from the object. This is a measure of how
reflective the object is. This can potentially be used to distinguish between two different
objects.
The software has to clean up the histogram since it is very noisy, with lots of spikes:
Comment: The histogram is actually part of how the sensor cleverly rejects noise. The detectors
will pick up light from other sources, such as bright sunlight. By using pulses and controlling
when they are sent, the sensor can measure the background (or ambient) light level, and better
discriminate its own light pulses from the rest. The noise can come from light being reflected
back by dirt on the sensor lens, dust in the atmosphere, or light bouncing around and coming back a
little later than the directly reflected light. Gathering the measurements into a histogram spreads
the noise out, mostly randomly, making it easier to pick out the useful measurement.
The easiest way to eliminate the histogram spikes is to do a smoothing pass over it, setting each
point's value to a weighted average with the values to its left and right. Values below a noise floor
can be tossed out.
Then a good distance measurement can be found by looking for the peak, or by finding the median.
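A sketch of that clean-up, with an assumed 1-2-1 weighting and bins converted to millimetres:

def range_from_histogram(hist, bin_width_mm: float, noise_floor: float):
    # Smooth each bin with its neighbours (1-2-1 weighted average).
    n = len(hist)
    smoothed = [(hist[max(i - 1, 0)] + 2 * hist[i] + hist[min(i + 1, n - 1)]) / 4.0
                for i in range(n)]
    # Toss out bins below the noise floor, then take the strongest peak.
    candidates = [(v, i) for i, v in enumerate(smoothed) if v >= noise_floor]
    if not candidates:
        return None
    _, peak_bin = max(candidates)
    return peak_bin * bin_width_mm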
The VL53L1X's detector is a 16x16 grid of SPAD detectors. The sensor can be configured to use
rectangular areas of the detector grid, called the region of interest (ROI), instead of the whole grid.
Within the sensor's field of view, different regions look in different directions. By creatively
choosing regions to get measurements from – and using the reflectivity measurement to distinguish
between objects – the software could look around, track multiple objects and scan the driving
surface. In other words, it could work like a low-resolution depth-sensing camera, with a very
good measurement of depth and surface reflectivity. It can even detect swiping motions.
Since Anki had placed the time of flight sensor in the robot's head, near the camera, there was
more potential for smarter interaction. Obviously, the head could scan up and down, giving a
vertical sweep of depth measurements.
Table 479: DistanceSensorData parameters
Field                          Type   Units    Description
proxDistanceToTarget_mm        float  mm       The distance to the object, as measured by the time of
                                               flight sensor.
visualAngleAwayFromTarget_rad  float  radians  The target's relative orientation angle, as estimated
                                               by the vision system.
visualDistanceToTarget_mm      float  mm       The distance to the object, as estimated by the vision
                                               system.

Table 480: RangeSensorData parameters
Field          Type   Units    Description
headAngle_rad  float  radians  The angle (tilt) of the robot's head.
The sensor-related data structures involve a complex nesting of structures. To help clarify:
The RangeDataRaw structure is just a link to an array of arrays of measurements. It has the
following fields:
Table 481: RangeDataRaw parameters
Field  Type           Units  Description
data   RangingData[]         An array of the sensor data to process, and the
                             results.
The RangeReading structure is basically identical to the structure in ST's software to interface with
the time of flight sensor. It has the following fields:
Table 483: RangeReading parameters
Field             Type   Units  Description
ambientRate_mcps  float  mcps   The ambient number of counts; this is the noise
                                floor.
rawRange_mm       int    mm
Table 484: RangeDataDisplay parameters
Field  Type                  Units  Description
data   RangingDataDisplay[]         The ranging data, for potential display.

Table 485: RangingDataDisplay parameters
Field               Type   Units  Description
padding             uint          Likely a CLAD structure field that is reserved for
                                  future use that was automatically converted to
                                  JSON.
processedRange_mm   float  mm     The range to the object after processing & filtering
                                  of the data.
roi                 uint          The region of interest that was measured.
roiStatus           uint          A code indicating whether there is a valid
                                  measurement for this region.
spadCount           float  count  "the time difference (shift) between the reference
                                  and return [detector] arrays." This translates to
                                  distance to the target.
signalRate_mcps     float  mcps   The "return signal rate measurement… represents
                                  the amplitude of the signal reflected from the target
                                  and detected by the device."
status              uint          A code with 0 indicating a valid measurement,
                                  otherwise indicating an error during measurement
                                  or processing.
Each quad tracks:
What Vector knows is in the quad – a cliff, the edge of a line, an object with a marker
symbol on it, or an object without a symbol (aka an obstacle),
A list of what Vector doesn't know about the quad – i.e. that he doesn't know whether or not
there is a cliff or interesting line edge there.
Vector subdivides quads to better represent the space. A quad is probably only slightly bigger than
the object in it. But the quad (probably) can be smaller than the object, to accommodate an object
not oriented and aligned to fit quite perfectly in the quad. More than one quad can refer to a
contained object.
A pose. The image skew of the marker symbol gives some partial attitude (relative
orientation) information about the object, and Vector can compute an estimated orientation
(relative to the coordinate system) of the object from this and Vector's own pose. Vector
can estimate the object's position from his own position, orientation, and the distance
measured by the time of flight sensor.
A size of the object. Vector is told the size of objects with the given symbol.
A link to a control structure for the kind of object. For instance, accessory cubes can be
blinked and sensed.
If he sees a symbol, he uses the object's known size, the image scale, its pose (if known) and any
time-of-flight information to (a) refine his estimated location on the map, and (b) update the
location and orientation of that object.
[Figure: position estimation – identified markers (distance in mm, angle in radians) and the time of flight sensor are filtered to select the best measurement; the IMU and odometry feed a Kalman filter; the current position and orientation are applied to build the map, yielding an enhanced position & angle.]
SLAM consists of multiple parts. It integrates the sensors for distance and movement. It also uses
image processing to figure out where it is. It identifies landmarks, and information about
them. In a sophisticated integration process, it can estimate Vector's orientation and whether an
object has moved. The estimate of Vector's orientation is based on turn information from the IMU,
and refined by what it can see.
Vector’s map is based on occupancy grids, except it does not use probabilities.
Accessories
Vector’s accessories include his charging station, companion cube, and custom items that can be
defined thru the SDK.
Companion cube, which is "smart" – sensing movement, orientation, taps, and being held –
and able to provide feedback via lights
Custom items
91.1. DOCKING
Docking is a behaviour/action that is used for approaching the cube, the charging station (home),
and other marked items.
92.1. DOCKING
Vector's steps in docking with the charging station are:
3. Reverse and back up the ramp. Vector uses a line follower, with his cliff sensors, to drive
straight backwards. (Since he is going backwards, he can't use vision.) He uses the tilt of
the ramp to confirm that he is on the charger.
4. He also checks that he is in the right spot by looking for power to his charging pads, as
reported by the body-board charging circuit. If he is unable to find the spot, he grumbles
about it, drives off and retries.
Vector has a cute low-light mode that turns on most of the pixels on his display so that he can see a
bit more, and locate his home.
Vector can roll his cube, shove it around, use it to "pop a wheelie," and pick it up. To do these, he
must line up squarely with the cube. Vision was found to be needed in Cozmo to align precisely
enough to get the lift hooks into the cube.
93.1. COMMUNICATION
Vector connects with the cube via Bluetooth LE. This communication link provides the ability for
Vector to:
Discover cubes
Pair with a cube (note that Vector can pair with only one cube; if he is not already
paired, he will automatically pair with the first cube whose Bluetooth LE advertising he
receives.)
List the available cubes, see Chapter 14, section 51.4 Cubes Available
Forget (or unpair) from his preferred cube, see Chapter 14, section 51.8 Forget Preferred
Cube
Pair to the first cube detected, see Chapter 14, section 51.15 Set Preferred Cube
Connect to his cube, see Chapter 14, section 51.3 Connect Cube
Disconnect from the cube, see Chapter 14, section 51.5 Disconnect Cube
Dock with his cube, see Chapter 14, section 51.6 Dock With Cube
Flash the cube lights, see Chapter 14, section 51.7 Flash Cube Lights and 51.14 Set Cube
Lights. The latter allows using a complex pattern.
Pick up an object (his cube), see Chapter 14, section 51.9 Pickup Object
Place his object (his cube) on the ground, see Chapter 14, section 51.10 Place Object on
Ground Here
Roll his cube, see Chapter 14, section 51.12 Roll Block and 51.13 Roll Object
Monitor the cube's battery level, see Chapter 14, section 51.2.1 CubeBattery
Detect a loss of the connection with the cube, see Chapter 14, section 51.2.2 CubeConnectionLost
The robot state event (see Chapter 14, section 61.3.1 RobotState) provides other info about
Vector’s attempt to interact with the cube. This includes what object he is carrying. There
are bits to indicate when
The object event (see Chapter 14, section 43.2.1 ObjectEvent) provides other info about the
state of the cube as it happens: taps, loss of connection, state of connection, being moved,
etc.
93.2. ACCELEROMETER
The cube has an accelerometer built in – the software can used this to determine the cube’s cube accelerometer
orientation, whether it is being held, and to detect taps (or double taps). The software detects
these by have the Cube stream accelerometer data, filtering and looking for patterns. In that way,
the orientation and being held sensing is very similar to how Vector measure his own orientation
and decides if he is being held:
[Figure: a high-pass filter feeds a classifier that decides whether the cube is being held.]
The software also detects taps by filtering and looking for a shock pattern:
[Figure: a filter and pattern detector emit a "tap detected" event.]
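A sketch of that shock detection (the filter constant and threshold are illustrative, not the cube firmware's values):

import numpy as np

def detect_taps(accel: np.ndarray, rate: int, threshold: float = 2.0):
    """accel: (N, 3) accelerometer samples. Returns tap times in seconds."""
    mag = np.linalg.norm(accel, axis=1)
    # One-pole high-pass filter removes gravity and slow hand motion,
    # leaving only sharp shocks.
    alpha = 0.9
    hp = np.zeros_like(mag)
    for i in range(1, len(mag)):
        hp[i] = alpha * (hp[i - 1] + mag[i] - mag[i - 1])
    return np.nonzero(np.abs(hp) > threshold)[0] / rate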
93.3. DOCKING
The docking with a cube is based on the Hanns Maneuver, named for Hanns Tappeiner who
described it to his team.
Defining a custom object takes three kinds of information. First, the shape of the item – whether it
is a "wall," box or cube. Second, optionally, assigning some of the handful of predefined symbols
to the item. And third, measuring the size of the marker symbols and the object.
[Figure: a wall object with a centered marker, dimensioned by width (mm) and height (mm).]
The marker must be horizontally and vertically centered. The width of the marker doesn’t have to
be the same as the height… but probably should be.
The body origin is 5 mm behind the center of the face. When Vector is tracking the position
and orientation of this object, the position it gives is for the point in the wall 5 mm behind the face,
at half the height and width – the center of the wall.
[Figure: a cube object with a centered marker on each face, dimensioned by size (mm).]
The marker must be horizontally and vertically centered on each face. The width of the marker
doesn’t have to be the same as the height… but probably should be.
The body origin is the very center of the cube. When Vector is tracking the position and
orientation of this object, the position it gives is for the very center of the cube, not for a visible
face.
[Figure: a box object with centered markers, dimensioned by width (mm) and depth (mm).]
The marker must be horizontally and vertically centered on each face. The width of the marker
doesn’t have to be the same as the height… but probably should be.
The body origin is the very center of the box. When Vector is tracking the position and orientation
of this object, the position it gives is for the very center of the box, not for a face.
94.5. COMMUNICATION
The Chapter 14 HTTPS API provides the following custom-object related commands:
Create a custom unmarked object (see Chapter 14 section 43.3 Create Fixed Custom
Object) or one with markers that can be tracked (see Chapter 14 section 43.4 Define
Custom Object)
Drive to the object, see Chapter 14 section 57.4 Go To Object. Note Vector thinks in terms
of the center of the object, not the face; for larger objects add the distance from the center
to the face for Vector’s position.
As the state of the cube changes, the following events are posted to the API:
The object event (see Chapter 14, section 43.2.1 ObjectEvent) provides other info about the
state of the object as it happens: that it is observed or lost, being moved, that its orientation
has changed, etc.
Animation
Vector uses animations – “sequence[s] of highly coordinated movements, faces, lights, and
sounds” – “to demonstrate an emotion or reaction.” This part describes how the animation system
works.
DISPLAY & PROCEDURAL FACE. Vector displays a face to convey his mood and helps form an
emotional connection with his human.
Animation
This chapter describes Vector’s animation engine:
Animation Engine, animation groups, triggers, and events
Animation file formats
[Figure: animation flow – the Emotion Engine's emotion state is used to select an animation, based on mood, from an animation trigger; the Animation Engine then coordinates sound (files & parametric sounds), the display (procedural face, sprites & text), the backpack and cube lights, and motion control (head angle, lift height, face tilt, eye controls, turn in place, driving).]
Not surprisingly, much of the animation is carried out by vic-anim. The motor controls, including
driving along a path, are performed in vic-robot.
Vector employs two levels of referring to an animation. Individual animations have an
animation name. Animations are also grouped together by type, with an identifier for the group
called an animation trigger name. Vector "pick[s] one of a number of actual animations to play
based on Vector's mood or emotion, or with random weighting. Thus playing the same trigger
twice may not result in the exact same underlying animation playing twice."
[Figure: animation file relationships – an animation trigger map links trigger names to animation groups (JSON), which name the clips inside the binary and JSON animation files; separate trigger maps link to backpack light sequences and cube light sequences; composite image layouts and image maps reference PNG sprite files; the animations reference audio in WEM sound bank files.]
There are seven types of animation files and other animation sources:
JSON files that describe how the backpack lights should behave (see Chapter 22)
JSON files that describe how the cube lights should behave (see Chapter 22)
Binary animation files holding one or more related animations that coordinate
sophisticated sounds, eye animations, linking together sprite sequences, and coordinating
head & lift movements with driving (see Chapter 25 for details of this file)
JSON animation files, which are very similar to binary animation files. They hold one or more
related animations that coordinate sophisticated sounds, eye animations, linking together
sprite sequences, and coordinating head & lift movements with driving (see Chapter TBD for
details of this file)
Sprite sequences (see Chapter 23), which are folders of PNG image files to display in
sequence
Composited screens (see Chapter 23) showing icons and text information driven by the
behaviors and cloud server intents.
Procedural animations are generated by vic-anim. These perform text to speech, driving
around obstacles, animating Vector’s eyes, and other tasks that are not practical to script in
a file.
JSON files that map the trigger names to the animation groups, and to the backpack and
cube light animations. (These will be described below.)
The animation binary file may direct a sprite sequence and/or audio file to play. These will
be described in Chapter 26
JSON files to layout the display; these may call out animation sequences and places to
composite icons and text. These will be described in Chapter 23
The files mapping a name to other files, or other information, end with “Map”.
The names of the animation clips start with the base name of the animation file that contains them.
(It may even be the same name). This makes it easier to find the animation file given the clip
name.
The animation trigger name is mapped to an animation file (and group of animations). The table
that defines this mapping is found in the following file:
/anki/data/assets/cozmo_resources/assets/cladToFileMaps/AnimationTriggerMap.json
The format of the files is the same. The file is an array of structures. Each structure has the
following fields:
CladEvent string This is the animation trigger name to match when looking
up the animation.
The cube's and backpack light animation file name is usually the same as the trigger name, except
that the first letter is lower case.
/anki/data/assets/cozmo_resources/assets/animationGroups
This path is hardcoded into libcozmo_engine. Inside are folders (grouping the animation groups),
each of which holds the JSON files. By convention, the animation group file names are all lower
case. Some names may look similar to the trigger name (but not always).
Each animation group JSON file is a structure with the following fields:
The AnimationGroupItem structure describes the specific animation clip to use. It may also
specify some head movement, with some variability; this is optional. The structure has the
following fields:
Table 489: AnimationGroupItem structure
Field             Type    Units    Description
CooldownTime_Sec  float   seconds  The minimum duration, after this animation has completed,
                                   before it can be used again. Typically 0.0
HeadAngleMin_Deg  float   degrees  The head is to move to a random angle greater than (or equal
                                   to) this. This should be in the range -22.0° to 45.0°. Only
                                   used if UseHeadAngle is true.
HeadAngleMax_Deg  float   degrees  The head is to move to a random angle less than (or equal to)
                                   this. This should be in the range -22.0° to 45.0°. Only used
                                   if UseHeadAngle is true.
Mood              string  emotion  The name of a "simple mood" that should be applied, or
                          name     "Default". See Chapter 28 for more information on simple
                                   moods.
Name              string           The name of the animation clip to play. This clip is defined
                                   within one of the animation binary files – the binary file
                                   (without the ".bin" suffix) or the JSON file (without the
                                   ".json" suffix) for the animation.
UseHeadAngle      bool             If true, this enables the head to be moved to some random
                                   angle within the specified range. Optional, default is false
Weight            float            How much "weight" to give to this entry. Typically 1.0
The possible animations are screened for being applicable to the current emotional state (using
Mood). The result is then randomly selected from that set: the weights of the selected items are
summed and normalized, giving the probability that each entry will be selected.
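In code, this is a standard weighted random choice; a sketch over the structure fields above:

import random

def pick_animation(group_items, current_mood: str):
    eligible = [item for item in group_items
                if item["Mood"] in ("Default", current_mood)]
    if not eligible:
        return None
    total = sum(item["Weight"] for item in eligible)
    r = random.uniform(0, total)          # normalization happens implicitly
    for item in eligible:
        r -= item["Weight"]
        if r <= 0:
            return item["Name"]
    return eligible[-1]["Name"]           # guard against rounding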
96. ANIMATIONS
When an animation is played, it locks the tracks that it is using for the duration of the animation.
If one of the tracks that it needs is already locked, the animation can’t be played (and generates an internal error).
When an animation is submitted to be played, several of the tracks (the lift, head, and body) can be flagged to be ignored if they are already used elsewhere by the animation system.
/anki/data/assets/cozmo_resources/assets/animations
This path is hardcoded into vic-anim. Each of these files may contain several animations (called
clips). By convention, the name of the animation starts with the name of the file. See Chapter 26
for a detailed description of these files.
This path is hardcoded into libcozmo_engine. The file is an array of structures. Each structure
has the following fields:
A list of animations triggers can be retrieved with the List Animation Triggers command
(see Chapter 14 section 46.3 List Animation Triggers).
A list of animations can be retrieved with the List Animation command (see Chapter 14
section 46.2 List Animations).
An animation can be played by selecting the animation trigger (see Chapter 14 section 46.5 Play Animation Trigger command). Vector will select the specific animation from the group. Or, a specific animation can be played directly by name.
As the individual animations are low-level, they are the most likely to change, be renamed, or be removed altogether in software updates. Anki strongly recommends using the trigger names instead: “Specific animations may be renamed or removed in future updates of the app.”
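For example, using the Python SDK (a sketch based on the anki_vector package; exact names can vary by SDK version):

    import anki_vector

    with anki_vector.Robot() as robot:
        # Enumerate the stable, high-level trigger names...
        print(len(robot.anim.anim_trigger_list), "animation triggers")
        # ...and play one; Vector picks the specific clip from the group.
        robot.anim.play_animation_trigger("GreetAfterLongTime")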
Lights Animation
This chapter describes the light animations:
The animation binary files that animate the backpack lights. These drive most of the light animation.
The JSON files for the cube and backpack light animations. There are four kinds of JSON files: files for the cube’s light sequences, files for the backpack light sequences, and two files to map animation trigger names to each of those light sequences.
The Cube Spinner game, which is a notable client of the JSON-driven light animations.
The light animations may be triggered by the cube spinner game configuration, or by behaviors
(within libcozmo_engine), such as those related to exploring, interaction, pouncing etc.
The companion cube and backpack light animations are very similar, so they have been grouped
here for discussion.
The Cube Spinner game’s configuration file is located with the behavior folders:
/anki/data/assets/cozmo_resources/config/engine/behaviorComponent/cubeSpinnerLightMaps.json
Table 493: BackpackLightMap JSON structure

Field  Type  Description
celebration  string  The animation trigger name for the celebration event.
holdTarget  string  The animation trigger name for the holdTarget event.
selectTarget  string  The animation trigger name for the selectTarget event.
CubeLightMap is a structure used to map an event to an animation trigger name. The animation trigger name is mapped to a cube light animation. This structure has the following fields:

Table 494: CubeLightMap JSON structure

Field  Type  Description
celebration  string  The animation trigger name for the celebration event.
cycle  string  The animation trigger name for the cycle event.
locked  string  The animation trigger name for the locked event.
lockedPulse  string  The animation trigger name for the lockedPulse event.
lockIn  string  The animation trigger name for the lockIn event.
3. The animation file provides the sequence to illuminate the backpack lights.
This path is hardcoded into vic-anim. This file maps the trigger name to the name of the animation file. The file’s schema is the same as in Chapter 21, section 95.3 Trigger Map Configuration files.
/anki/data/assets/cozmo_resources/config/engine/lights/backpackLights/
This path is hardcoded into vic-anim. All of the JSON files have the same structure, with the following fields:
Note: These sequences do not have the parametric variation based on emotion or random
weighting.
3. The animation file provides the sequence to illuminate the cube lights.
/anki/data/assets/cozmo_resources/assets/cladToFileMaps/CubeAnimationTriggerMap.json
This path is hardcoded into libcozmo_engine.so. This file maps the trigger name to the name of the animation file.
The file’s schema is the same as in Chapter 21, section 95.3 Trigger Map Configuration files,
/anki/data/assets/cozmo_resources/config/engine/lights/cubeLights
and within folders (and sub-folders) therein. This path is hardcoded into libcozmo_engine.
All of the cube light animation JSON files have the same structure. They are an array of structures.
(There is usually one item, but there may be more.) Each structure may contain the following
fields:
This structure is very similar to that used by the backpack lights. The obvious differences are:
Full screen sprites — each frame is a PNG image that covers the whole display. A
sequence of frames (PNGs) is drawn regularly to create the animated effect.
Procedural face to draw the face in a complex way (more on this later)
The first two are used as part of behaviors and intents. A visual “movie” is shown when the behavior starts, and another provides the response. The compositor map allows mixing in iconography, digits, and text to show information in the response.
Using the full-screen sprites above, with the eyes pre-drawn in the PNGs
Note: the sprite and procedural face can be drawn at the same time, with sprites drawn over the
eyes. This is done to create weather effects over Vector’s face.
102.1. ORIGIN
The display system – especially the procedural face module – was pioneered in Cozmo (see US Patent 20372659). To prevent burn-in and discoloration of the OLED display, Cozmo was given two features. First, Cozmo was given regular eye motion, looking around and blinking. Second, the illuminated rows were regularly alternated to give a retro-technology interlaced row effect, like old CRTs.
Vector’s eyes are more refined, but kept the regular eye motion. The interlacing was made
optional, and disabled by default.
(Figure: the display pipeline – camera images flow through MIPI processing to the vision system; the procedural face, drawn using the colour preferences, is composited with any images into the frame buffer and sent to the LCD display via /dev/fb0 over SPI.)
A screen layout defining rectangular areas on the display (called sprite boxes) where
images and sprite sequences will be drawn.
Sprite sequence to display in the layout areas. Not all screen layouts have an associated
sprite sequence.
These forms are only used by a couple of behaviors, to support the weather, timers, and the
blackjack game. Version 1.7 began the process of migrating to a slightly different structure that
used the binary animation file.
/anki/data/assets/cozmo_resources/config/engine/animations/boot_anim.raw
/anki/data/assets/cozmo_resources/assets/cladToFileMaps/CompositeImageLayoutMap.json
This path is hardcoded into libcozmo_engine.so. The format of the file is an array of structures. Each structure has the following fields:
LayoutName  string  The name of the JSON file (without the “.json” suffix) for the animation.
/anki/data/assets/cozmo_resources/assets/cladToFileMaps/CompositeImageMapMap.json
This path is hardcoded into libcozmo_engine.so. The format of the file is an array of structures. Each structure has the following fields:
MapName  string  The name of the JSON file (without the “.json” suffix) for the animation.
/anki/data/assets/cozmo_resources/assets/compositeImageResources/imageLayouts
Each layout is formatted as an array of zero or more structures, although most have a single structure. Each structure has the following fields:
A sprite box defines a rectangular region on the display in which to draw an image from a file. Each SpriteBox structure has the following fields:
See also Chapter 26 section 114.16 RobotAudio for an alternate method to define a sprite box.
/anki/data/assets/cozmo_resources/assets/compositeImageResources/imageMaps
Each image map file is formatted as an array of zero or more structures, although most have a
single structure. Each structure has the following fields:
/anki/data/assets/cozmo_resources/assets/sprites/independentSprites
/anki/data/assets/cozmo_resources/config/sprites/independentSprites
/anki/data/assets/cozmo_resources/config/facePNGs
/anki/data/assets/cozmo_resources/config/devOnlySprites/independentSprites
These paths are hardcoded into libcozmo_engine, vic-anim, and vic-faultCodeDisplay. Not all of the images in those paths are used.
The independent sprite PNG files can be any size so long as they fit within the width and height of the display (184x96). The images may be colored, or in gray scale with an alpha channel. If the sprite is gray-scale, it will be colourized with the current eye colour setting, using the gray scale for the pixel brightness level.
/anki/data/assets/cozmo_resources/assets/sprites/spriteSequences
/anki/data/assets/cozmo_resources/config/sprites/spriteSequences
and
/anki/data/assets/cozmo_resources/config/devOnlySprites/spriteSequences
These paths are hardcoded into libcozmo_engine. Note: the folder name may have a different case
than the sprite sequence name used by the SpriteMapBox or the animation; the name should be
matched in a case insensitive manner.
The sprite sequence PNG files are typically sized to fill the display. The images must match the width and height of the sprite box they are displayed in, or of the display (184x96) if they are employed by a binary animation file. The images may be colored, or in gray scale with an alpha channel. If the sprite is gray-scale, it will be colourized with the current eye colour setting, using the gray scale for the pixel brightness level.
These sprites are displayed as a sequence. The frame number is appended to the file name – ranging from 2 to 5 digits – starting with 0. The frame rate is computed as the number of images in the sequence (the number of frames) divided by the duration of the animation (given in the animation manifest) that it is associated with.
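A sketch of how a player might gather a sequence's frames and derive the frame rate (assuming an underscore before the 2-5 digit frame number; the folder layout is otherwise an assumption):

    import re
    from pathlib import Path

    def load_sequence(folder: Path, duration_sec: float):
        """Collect the numbered frame PNGs in order and compute the frame
        rate as frame count divided by the animation's duration."""
        pattern = re.compile(r"_(\d{2,5})\.png$", re.IGNORECASE)
        frames = sorted(
            (p for p in folder.iterdir() if pattern.search(p.name)),
            key=lambda p: int(pattern.search(p.name).group(1)))
        return frames, len(frames) / duration_sec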
The images are composited on top of the eye layer. The eyes may have been turned off, or they may be present.
These procedures are only used in exceptional circumstances. (The typeface is inelegant; if they were something Vector used more frequently, undoubtedly the typeface designs would have been improved.) They are used to display the fault codes (via vic-faultCodeDisplay) when the system is unable to operate the software, and to display information on the customer care information screen (CCIS) in vic-anim.
The parameters of the face controls are divided into the overall view of the face and the individual
characteristics of each eye:
(Figure: the overall face parameters – the face center, scale, and angle.)
The color to draw the eyes in. Vector’s eye color is a preference setting, but can be
temporarily overridden by the SDK.
The angle of the face; tilt (or rotation) of the face gives the impression of tilting the head
The illusion of gaze – the intuition that Vector is looking at something – is achieved by
giving each eye a soft spherical rounding effect. The center of the shading, the equivalent
of a pupil, may be moved around the eye area. This gives a sense of where Vector is
looking – and by moving the center, Vector can appear to be looking around. Coordinated
with the face detector, Vector can make (and maintain) direct eye contact.
The outer shape of the eyes, which gives a sense of the emotions – smiling, frustration,
sleep etc.
There is a scan line opacity factor. This controls how much alternating lines are
illuminated and darkened. A value of 1.0 has odd and even lines with the same coloring.
Where the eyes are looking is controlled within the procedural face manager, rather than in the
animation files. It controls the blink rate, the focus of the eyes and how much the eyes dart around.
The manager contributes looking at a face and making eye contact.
(Figure: the hotspot center of an eye’s spherical shading.)
The shape of the eyes is parametrically controlled by the animation engine. An internal configuration variable controls how fast the shading falls off from the center toward the edge. A bit of random noise is added to remove the banding from the spherical gradient, and to give the eye shading a little texture. This too has an internal configuration variable to control the noise factor.
(Figure: each eye corner – upper outer, upper inner, lower outer, lower inner – has its own x and y radius.)
The basic shape of the eye is controlled by the roundedness of the corners.
The size and width of the eye are created by the face’s scaling factors.
(Figure: the Y position of the upper eyelid.)
An arc represents the upper eyelid and erases (or occludes) the upper portion of the eye; these help create the sleepy and frustrated/angry emotions.
An arc represents the lower eyelid and cheek, and erases (or occludes) the lower portion of the eye; these help create the happy emotions.
An eye can be made smaller – or to squint – by having no bend to the eyelids, but moving the
eyelids position closer to the center.
Assuming that a complex clipping path is less efficient, the eye could be rendered as:
1. The eye is rendered as a gradient pattern into a buffer, with the scale
2. The eyelid is drawn, forcing the pixels (of the eyelid area) to become transparent
3. The cheek is drawn, forcing the pixels (of the cheek area) to become transparent
4. The rectangular area where the eye will go is scaled, rotated, and offset for where the eye will go
5. Each pixel in the translated rounded-rectangle region is mapped to one in the eye pixel buffer, and copied to the display buffer
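Step 5 is an inverse mapping: for each destination pixel, the inverse of the scale/rotate/offset transform finds the matching pixel in the eye buffer. A compact sketch of the idea (the parameter names and nearest-neighbour sampling are illustrative assumptions):

    import math
    import numpy as np

    def blit_eye(display, eye, cx, cy, scale_x, scale_y, angle_deg):
        """Copy the pre-shaded eye buffer into the display buffer, scaled,
        rotated, and offset; pixels outside the eye are left untouched."""
        h, w = eye.shape
        c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
        for dy in range(display.shape[0]):
            for dx in range(display.shape[1]):
                # Inverse transform: display coordinates back to eye-buffer coordinates.
                rx, ry = dx - cx, dy - cy
                ex = ( rx * c + ry * s) / scale_x + w / 2
                ey = (-rx * s + ry * c) / scale_y + h / 2
                if 0 <= ex < w and 0 <= ey < h:
                    display[dy, dx] = eye[int(ey), int(ex)]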
105. COMMANDS
The HTTPS SDK API (Chapter 14) includes commands that affect the display
Display RGB image (see Chapter 14 section 53.2 Display Image RGB)
Mirror display (see Chapter 14 section 53.3 Enable Mirror Mode)
Audio Production
This chapter describes how Vector produces sounds and the audio output system:
106. SPEAKER
Vector uses sound to convey emotion and activities, to speak, and to play sounds streamed from
SDK applications and Alexa’s remote servers. There are five sources of sound:
(Figure: the audio output pipeline – text to speech from the Acapela engine, sounds from the AudioKinetic Wwise-based libaudio_engine, and Alexa audio decoded by MPG123 are mixed and sent to the speaker via ALSA.)
Compression is not used to send audio from SDK applications to Vector. The vic-engine passes the received samples to the audio engine to mix into its playback.
Wwise was used in Cozmo’s mobile application. That application was designed as a kind of video game, and employs a lot of video-game design approaches, so it makes sense that an audio tool targeting video games would be used there. In turn, since Vector draws on Cozmo’s frameworks – both the mobile application and what ran on the hardware – and its creation tools, it isn’t surprising that the same framework would be employed by Vector.
Triggering sound effects, muting sounds, and changing parameters of sound playback (by
sending the framework audio events)
Converting the sample rates from the different sources to the one played
Managing a library of files that specify how to respond to audio events, how to create
music and sound effects, and can hold pre-recorded sounds.
The most important plug-ins are the ones receiving the audio output (aka “sink” plug-ins). This is how the audio sounds are taken from the audio engine and sent to Vector’s speaker.
The ALSA plug-in gathers the audio output and passes it to the “Advanced Linux Sound
System” (ALSA) sound handler, which in turn passes it thru to Qualcomm’s audio driver.
The Hijack plug-in is probably unused on Vector, but is used on desktop computers to
allow recording of Vector’s sounds…? (It may also have been intended to be used as part
of the message-recording, with the microphone audio piped thru the audio engine to be
filtered/cleaned, and then saved.)
There are two plug-ins allowing audio from external sources to be processed by the audio engine
and delivered to Vector’s speaker:
Wave Portal
Streaming Wave Portal. This receives the audio from vic-engine for playback.
The Krotos “Dehumaniser” vocoder is used to give Vector his unique vocal qualities.
(Figure: the audio engine holds key-value tables of parameter settings and audio states.)
The audio pipeline is driven by audio events it receives from the main application. These events are like the animation triggers. The metaphor is that when something occurs in the application (usually a video game), it is represented as an event distributed to a variety of subsystems to respond to, including the audio engine. Typically an event will cause the audio engine to play a sound, but the event system is much more powerful than that. Events trigger an action, which is the heart of the pipeline, and it is the action that plays the sound. The actions, in turn, can be configured to change sound parameters, stop playback, and so on. From the Wwise documentation:
“[Audio] events apply actions to the different sound objects or object groups in your project hierarchy. The actions you select specify whether the Wwise objects will play, stop, pause, .. mute, set volume, enable effect bypass, and so on.”
The Wwise framework employs event ID numbers to refer to events. Events can also be referred to using lexical names – as strings. A later section will describe how to translate a string to an event ID.
Audio parameters are settable values, used by the actions, that control how they sound. Vector mainly uses these to adjust the sounds based on his current mood. Like the actions, these parameters are on a per-object (within the audio engine) basis.
An audio state is used to set the context for the sound system overall, so that the right sounds and effects are used in responding to events and actions, generally across all of the game objects.
An audio switch is similar, but it sets up the context so that the right sound (or sound effect, etc.) is used for a particular object or event. AudioKinetic gives an example of foot-step events triggering a footstep sound, where the audio switch is set to the kind of surface being walked on – selecting walking on grass, gravel, pavement, etc.
The sound events for these are directed to a special game object just for them. Most of Vector’s sounds are driven by the animation, and when they are sent to the audio engine, they are tagged with “Animation” as their game object. For the procedural sound effects, the events are tagged with “Procedural” as their game object.
The mood manager also sets audio parameters based on the current mood. This gives “feedback
cues about the robot's emotion state.” See Chapter 28, section 118.2 The emotion model.
The binary animation file can trigger sending audio events (by ID), setting parameters, states and
switches. See Chapter 26, section 114.3 AudioEventGroup for more information.
107.4. EQUALIZER
The Wwise sound equalizer is used to compensate for some of the distortion of Vector’s small speaker. It is also used to “prevent the higher pitches from ever getting very loud” – something that is physically possible despite the speaker’s small dimensions. The standards for toy sound levels vary by country, but typically are limited to 75-80 dB at the ear.
/anki/data/assets/cozmo_resources/sound
There are three kinds of files located there: configuration files, sound bank files (which may include sounds), and sound files.
/anki/data/assets/cozmo_resources/sound/SoundbankBundleInfo.json

Field  Type  Description
language  string  “SFX” or “English(US)”. (It isn’t clear how to interpret these.)
path  string  The path of the sound bank file, relative to the location of the configuration file.
soundbank_name  string  The name of the sound bank file, without the “.bnk” extension.
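A small sketch of reading that manifest (the field names are those above; treating the top level as a flat array is an assumption):

    import json

    with open("SoundbankBundleInfo.json") as f:
        entries = json.load(f)  # assumed: a flat array of entries

    for e in entries:
        print(e["soundbank_name"], e["language"], e["path"])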
The sound bank file has setups for how the sounds flow from files (and other inputs), thru mixers and other filters, to the output. It calls this the audio bus hierarchy.
Sound effects,
Music compositions to play (these probably were used heavily in Cozmo, but appear unused in Vector)
State transition management: how the settings of effects are altered during play.
A map of audio events to the actions to carry out when that event occurs, such as playing a sound, stopping other sounds, changing mixer settings, and so on
Wwise always has an “Init.bnk” sound bank. It is loaded first, since it holds sections that are
shared across all of the sound bank files. It does not contain any sounds.
WEM files also include optional looping parameters. A sound file can be configured to loop indefinitely or a fixed number of times.
In practice, Vector’s WEM sound files are usually single channel (mono) but may have two channels. Two different AudioKinetic-specific encodings are used. The first is a modified Vorbis encoding; the key changes from a regular Vorbis stream are that it has its own packet wrapper, and the information shared across audio files has been separated out to make the files smaller.
The second is an IMA ADPCM encoding:
The sound file only has 4-bit values for each sample. Each 4-bit value is used as the index into a pair of look-up tables for how much to add to or subtract from the previous 16-bit value for the new output; this is the adaptive differential (AD) portion.
The tables and their interpretation are standardized by a committee, which is the IMA portion.
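A sketch of that decoding step using the standard IMA ADPCM tables (Wwise's container framing is omitted here, and its variant may differ in details):

    # Standard IMA ADPCM tables: INDEX_TABLE adapts the step size for the
    # next sample; STEP_TABLE holds the quantizer step for each index.
    INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8,
                   -1, -1, -1, -1, 2, 4, 6, 8]
    STEP_TABLE = [
        7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31,
        34, 37, 41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130,
        143, 157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449,
        494, 544, 598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411,
        1552, 1707, 1878, 2066, 2272, 2499, 2749, 3024, 3327, 3660, 4026,
        4428, 4871, 5358, 5894, 6484, 7132, 7845, 8630, 9493, 10442,
        11487, 12635, 13899, 15289, 16818, 18500, 20350, 22385, 24623,
        27086, 29794, 32767]

    def decode_nibble(nibble, predictor, index):
        """Turn one 4-bit code into the next 16-bit sample."""
        step = STEP_TABLE[index]
        # Rebuild the difference from the three magnitude bits...
        diff = step >> 3
        if nibble & 1: diff += step >> 2
        if nibble & 2: diff += step >> 1
        if nibble & 4: diff += step
        # ...and the sign bit says whether to add or subtract it.
        predictor += -diff if nibble & 8 else diff
        predictor = max(-32768, min(32767, predictor))        # clamp to 16 bits
        index = max(0, min(88, index + INDEX_TABLE[nibble]))  # adapt the step size
        return predictor, index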
This approach is easy on the processor, and takes little working memory. It does make for larger files than, say, MP3 or other more sophisticated compression. That is acceptable since the sound segments are all short, and Vector has a large storage area to hold the files.
This approach has one drawback. It uses only 4 bits in each sample to represent the change in the analog waveform. Often this isn’t enough; it takes several samples to add or subtract enough for the output to catch up to the desired values (from the source material). At “low” sample rates, this can create audible distortion. The fix is to use a higher sample rate for encoding. First, there is less change between two points closer together in time. Second, the higher rate lets the decoder catch up faster; this effectively moves the distortion to high, inaudible frequencies. Playback can be done at this higher sample rate, or the audio can be down-sampled again after decoding.
Vector (probably) down-samples many of the sound files (after decoding) during playback. The audio files have sample rates much higher than is supported through the SDK audio channels: many at 30,000 samples/sec, some as high as 44,100 samples/sec.
1. “Start with an initial hash value of FNV offset basis.” (Use 2166136261 for this offset.)
2. For each byte of the lower-cased name:
   a. “multiply [the] hash by the FNV prime,” (use 16777619 for the prime), then
   b. XOR the hash with the byte.
50
https://www.audiokinetic.com/library/edge/?source=SDK&id=_ak_f_n_v_hash_8h_source.html
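Putting that together as a sketch in Python (the 32-bit truncation and lower-casing follow the FNV-1 procedure above; the example event name is hypothetical):

    def wwise_event_id(name: str) -> int:
        """32-bit FNV-1 hash of the lower-cased event name."""
        h = 2166136261                        # FNV offset basis
        for byte in name.lower().encode():
            h = (h * 16777619) & 0xFFFFFFFF   # multiply by the FNV prime...
            h ^= byte                         # ...then XOR in the next byte
        return h

    print(wwise_event_id("Play_Robot_Vic_Sfx"))  # hypothetical event name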
/anki/data/assets/cozmo_resources/tts
co-French-Bruno-22khz
co-German-Klaus-22khz
co-Japanese-Sakura-22khz
co-USEnglish-Bendnn-22khz
The exact implementation isn’t known, but there are common techniques. A typical vocoder works by estimating the pitch of the text-to-speech voice every 30ms, and then adjusting the gain settings on a multi-band equalizer. There are several common methods of estimating the pitch:
1. Autocorrelation
2. Cepstrum
3. McLeod pitch detection method (MPM)
4. Frequency spectrum-based
5. YIN
51
https://forums.anki.com/t/multiple-actions-possibility-for-sdk/104
(Figure: the pitch-estimation pipeline – the signal is cut into 30ms segments, an FFT spectrum is computed, and from it the cepstrum / mel-cepstrum is derived.)
While there are distinctions between these methods, there are far more similarities. They all build on basic techniques like autocorrelation and fast Fourier transforms (FFTs). An FFT computes a spectrograph, giving the strength of each frequency. The simplest (or naive) approach is to find the strongest frequency and call it the pitch. This approach is easily fooled. A variety of other techniques have been developed to work around this.
108.1.2 Autocorrelation
Autocorrelation is a slow, brute-force algorithm for finding the pitch. It works by shifting the signal in slight amounts until the shifted signal best matches the original one. The core algorithm looks something like:
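(A minimal Python sketch of this brute-force search; the 50-1000 Hz search band is an assumption.)

    import numpy as np

    def pitch_by_autocorrelation(x, rate, f_min=50, f_max=1000):
        """Try each shift (lag); the lag where the shifted signal best
        matches the original gives the pitch."""
        x = np.asarray(x, dtype=np.float64)
        best_lag, best_score = None, float("inf")
        for lag in range(rate // f_max, rate // f_min):
            d = x[lag:] - x[:-lag]       # difference of signal and shifted signal
            score = float(np.dot(d, d))  # square the differences and sum them
            if score < best_score:
                best_lag, best_score = lag, score
        return rate / best_lag           # convert the best lag to a pitch in Hz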
It does this by computing the difference between each of the samples in the shifted waveform and the original waveform, squaring that difference, and then summing it all up. The shift offset with the smallest sum is the best match, and is used to compute the pitch. Note: it only needs to try the offsets that cover the first few kHz at the given sample rate.
This approach is an expensive way to find the pitch: every audio sample must be scanned a huge number of times.
2. Compute the square magnitude of each complex value – that is, multiply each complex number by its complex conjugate. (Or, for those not steeped in the jargon, real*real + imag*imag, ignoring the part where multiplying two imaginary numbers becomes negative.)
McLeod’s method adds a few more polishing steps to clean this up, and mixes together a few of the best results to get a better one.
3. Change from frequency (Hz) to the “mel scale”, which is a perceptual scale for pitch
/anki/data/assets/cozmo_resources/config/engine/tts_config.json
(This path is hardcoded into vic-anim.) This file is organized as a dictionary whose key is the operating system. The “vicos” key is the one relevant for Vector.52 This dereferences to a dictionary whose key is the language base: “de”, “en”, “fr”, or “ja”. The language dereferences to a structure with the following fields:
rangeMin  uint
textLengthMax  uint
textLengthMin  uint
52
The other OS key is “osx”, which suggests that Vector’s software was developed on an OS X platform.
The localization files, where each feature stores its text strings (to be spoken), are in
/anki/data/assets/cozmo_resources/LocalizedStrings
This path is not present in versions before v1.6. The folder holds sub-folders based on the
language:
Inside of each are three files intended to provide the strings, for a behaviour, in the locale:
BehaviorStrings.json
BlackJackStrings.json
FaceEnrollmentStrings.json
The dictionary also includes keys, such as “BehaviorDisplayWeather.Rain”, that map to a locale-specific string. These have the following fields:
53
it really has this key.
/anki/data/assets/cozmo_resources/config/engine/behaviorComponent/weather/condition_to_tts.json
This path is hardcoded into libcozmo_engine. The JSON file is an array of structures. Each structure has the following fields:54
108.3. CUSTOMIZATION
Vector’s voice files are from Acapela. Acapela sells language packs for book readers, but the format appears different and is likely very difficult to modify or create.
Cozmo employs a different English voice (in the Cozmo APK). This likely could be extracted and used on Vector. (In turn, Vector’s voice could probably be used with Cozmo.)
Customization of the localization TTS strings would give Vector a bit more personality.
109. COMMANDS
The HTTPS SDK API (Chapter 14) includes commands that affect the sounds:
Audio stream commands (see Chapter 14 section 48.6 External Audio Stream Playback)
Text to speech (see Chapter 14 section 48.8 Say Text). An external application can direct Vector to speak using the Say Text command. The response(s) provide the status of where Vector is in the speaking process.
Vector’s volume can be set as a setting using the UpdateSettings command (see Chapter 14 section 64.2 Update Settings) and the RobotSettingsConfig structure (see Chapter 30), or using the Master Volume command (see Chapter 14 section 48.7 Master Volume). Note: the volume levels using settings don’t fully match those in the master volume command.
Note: commands can also trigger animations, which may play sound effects as well.
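For example, with the Python SDK (a sketch based on the anki_vector package; names may vary by SDK version):

    import anki_vector
    from anki_vector import audio

    with anki_vector.Robot() as robot:
        robot.audio.set_master_volume(audio.RobotVolumeLevel.MEDIUM)
        robot.behavior.say_text("I am Vector")  # responses report speaking progress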
54
That this file (and many others) is a simple 1:1 transform lends the suspicion that the localization process is needlessly complex.
This site provides a wealth of information on the format of the Sound Bank files.
Unfortunately not all sections of the file have been documented, and there are sections in
Vector’s Sound Bank files that were not known when this page was written
Some example code for YIN and McLeod pitch tracking
https://github.com/ashokfernandez/Yin-Pitch-Tracking
https://github.com/adamski/pitch_detector/blob/master/source/PitchMPM.h
https://github.com/sevagh/pitch-detection/blob/master/src/mpm.cpp
Motion Control
This chapter describes the motion control subsystem:
Note: the motion control is implemented in vic-robot (except where stated otherwise, of course)
(Figure: the motion control subsystem – the action controller produces a path (turns in place, lines, arcs); the path follower and steering controller translate it into commands for the wheel controllers, alongside the head and lift controllers.)
The path planner thinks of the world, and robot coordinates within it, in terms of x, y, and θ (theta) coordinates, θ being the direction angle that Vector is facing at the time. It builds a list of straight line segments, arcs, and point turns. The PathFollower carries these out. Each of the motors is independently driven and controlled, with the steering controller coordinating the driving actions: it sets gains, executes turns, and does docking.
The individual motors have controllers to calibrate, move, prevent motor burnout, and perform any special movements.
The linear speed can be estimated from the motors’ shaft rotation speed (and some estimated tread slip), merged with IMU information.
The speed that the robot is rotating can be measured by the IMU and the vision system.
The navigation and localization subsystem employs a sophisticated Kalman filter on all of the above to estimate the position.
The motor control loops are implemented in the head-board. They are implemented using floating point (rather than the fixed point55 that the body-board’s M0 microcontroller would require), and are updated 200 times per second. The body-board is responsible for driving the motors and sampling the encoders. It is also responsible for protecting the motors in case of a stall.
The lift and head motors are position-controlled. The motors can be commanded to travel to an encoder position at a speed (given in radians/sec). The position – the cumulative number of radians that the shaft has turned – can be computed by counting the encoder events, with the expected direction that the motor has turned.
The speed of rotation is also computed from the encoder count. One typical approach is to regularly take a derivative of the position (say, once every millisecond), and filter it. Since the encoder is discrete, at slow speeds its update rate will produce false measures of shaft speed.
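A sketch of that typical approach (the sample period and the filter constant are illustrative, not Vector's values):

    class SpeedEstimator:
        """Differentiate the encoder count each tick, then low-pass filter it."""
        def __init__(self, counts_per_radian, dt=0.001, alpha=0.05):
            self.k, self.dt, self.alpha = counts_per_radian, dt, alpha
            self.last_count = 0
            self.speed = 0.0  # filtered shaft speed, radians/sec

        def update(self, count):
            raw = (count - self.last_count) / (self.k * self.dt)  # derivative of position
            self.last_count = count
            self.speed += self.alpha * (raw - self.speed)         # single-pole low-pass
            return self.speed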
55
Although both approaches work, fixed point (using integers and scaling factors rather than floating point) takes a bit more effort to tune, as small but important parts of the feedback signals can be dropped… this can introduce effects like jerkiness, stutter, or motor noise.
So if you drive a motor toward the limit but someone is pulling on it the other way, it
might push hard at first, then quickly “relax” to a voltage that's safe for continuous use, but
never stop pushing just in case you let go.
The software control loops can also detect when a person is playing with Vector’s lift (or head, or tracks), and then unlock the motors.
the PID controller violently fights your attempt to pull the lift, smacking your fingers and
oscillating and otherwise causing trouble. The PID controller is pretty feisty, because it
has to operate across a huge range of forces – between flipping or lifting the robot's entire
weight and delicately setting down or lifting cubes without flinging them.
Both head and lift angle must be known exactly, since we need to know exactly where the
tongs (on the lift) are relative to what the camera sees. Otherwise we couldn't engage (lift)
and disengage (pull out) the block.
At startup Vector performs a calibration procedure, “which is just an animation that pushes the head/lift to [their] hard stop.” Both the lift and head have hard stops at their most downward position, which serve as a well-defined starting point. When these motors reach the end of travel, the measured speed will fall below a threshold, and the software knows to zero the estimated position.
Vector’s software has two backups in case the position is wrong. This can happen if the calibration was wrong – something, perhaps a block or an impatient human companion, prevented the head or lift from moving further – or if someone moved his lift or head (since the position encoder is single step, Vector won’t be able to tell which direction they were moved).
1. The body-board firmware has motor burnout prevention features. This quickly drops the power applied to the motor if there is a stall.
2. If a motor stalls unexpectedly, or the motor isn’t stopped (by the hard stop) within 5% of where it should be, Vector schedules another calibration procedure. (This is handled by the ReactToUncalibratedHeadAndLift behavior.)
To turn in place, the treads turn at the same rate, but in opposite directions. The speed of the turn is proportional to the speed of the treads.
To drive in an arc, the left and right treads are driven at the speeds:

  v_left = ω · (radius − track/2)
  v_right = ω · (radius + track/2)

where radius is the distance from the center of the arc to the midpoint between the treads, ω is the rate of the turn (in radians/sec), and track is the distance between the treads.
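Expressed as code (a sketch; the function and the default track width are illustrative assumptions, not Vector's actual values):

    def arc_tread_speeds(radius_mm, omega_rad_per_sec, track_mm=48.0):
        """Tread speeds (mm/sec) for an arc of the given radius; radius 0
        degenerates to a turn in place (equal and opposite speeds)."""
        v_left = omega_rad_per_sec * (radius_mm - track_mm / 2)
        v_right = omega_rad_per_sec * (radius_mm + track_mm / 2)
        return v_left, v_right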
111.6.1 Slip
In practice, Vector’s actual movement won’t quite match what he attempted to do. Mainly this will
come from how the treads slip a bit (especially while trying to push an object), and some variation
in how driving the motors maps to actual motion.
Drive Wheels
Move Head
Move Lift
Stop All Motors
Drive Straight
Stop All Motors
Turn In Place
Set Head Angle
Set Lift Height
Go to Pose
Turn Towards face
Go To Object
/anki/data/assets/cozmo_resources/config/ cozmo_anim.fbs
Note: this file is not read by any program in Vector. A compatible parser is compiled in.
(Figure: the tracks within an animation clip – audio events and parametric sounds, backpack lights, sprite sequences, the procedural face (face tilt, eye controls), and motion control (head angle, lift height, turn in place, driving).)
Each of the tracks within the clip is composed of key frames (with settings for each of the relevant
tracks) that are triggered at different points in time.
The PyCozmo project has the (experimental) ability to extract Cozmo’s animations, and may be useful for this transcoding and adjustment to Vector’s aesthetic.
114. STRUCTURES
The animation file starts with an AnimClips structure. Unless specified otherwise, each structure is
the same as in Cozmo.
114.1. ANIMCLIPS
The AnimClips structure is the “root” type for the file. It provides one or more animation “clips” in the file. Each clip has one or more tracks. The structure has the following fields:
114.2. ANIMCLIP
The AnimClip is a named animation that can be played. This structure has the following fields:
114.3. AUDIOEVENTGROUP
The AudioEventGroup structure is used to randomly select an audio event (and volume), and send it
to the audio subsystem. See Chapter 24, section 107.2 Audio Pipeline for a description of audio
events. This structure has the following fields:
Table 515: AudioEventGroup structure

Field  Type  Units  Description
eventIds  uint[]    The audio event IDs, weighted by a probability.

114.4. AUDIOPARAMETER
The AudioParameter structure is used to set an audio parameter to a value. See Chapter 24, section 107.2 Audio Pipeline for a description of audio parameters. This structure has the following fields:

Table 516: AudioParameter structure

Field  Type  Units  Description
parameterId  uint    The identifier of the parameter to set. Default: 0
114.5. AUDIOSTATE
The AudioState structure is used to put the audio system into a particular state. See Chapter 24,
section 107.2 Audio Pipeline for a description of audio state. This structure has the following
fields:
114.6. AUDIOSWITCH
The AudioSwitch structure is used to put an audio switch into a particular setting. See Chapter 24,
section 107.2 Audio Pipeline for a description of audio switches. This structure has the following
fields:
Table 519: BackpackLights structure

Field  Type  Units  Description
triggerTime_ms  uint  ms  The time at which the backpack lights animation should begin.
durationTime_ms  uint  ms  The duration before a transition to the next backlight setting may begin. During this time the lights should be illuminated with these colors; after this the colors may transition from these to the next colors.
Front  float[4]  RGBA  Each color is represented as 4 floats (red, green, blue, and alpha), in the range 0..1. Alpha is always 0 (the value is ignored).
Middle  float[4]  RGBA  Each color is represented as 4 floats (red, green, blue, and alpha), in the range 0..1. Alpha is always 0 (the value is ignored).
Back  float[4]  RGBA  Each color is represented as 4 floats (red, green, blue, and alpha), in the range 0..1. Alpha is always 0 (the value is ignored).
See also: Chapter 22 section 27 Backpack lights control for a similar JSON structure.
Note: Cozmo’s animation structure includes a left and right LED animation.
The best interpretation of durationTime_ms is that, once a frame is triggered, the LED is set to the given color. The LED won’t be changed for at least durationTime_ms. Once that time has expired, the LED color is ramped linearly to the color of the next frame.
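Under that interpretation, the color at a given time could be computed as follows (a sketch; only the Front light is shown, the others are analogous, and the frame fields are those of Table 519):

    def backpack_color_at(frames, t_ms):
        """Hold each frame's color for durationTime_ms, then ramp linearly
        toward the next frame's color until that frame triggers."""
        for cur, nxt in zip(frames, frames[1:]):
            if cur["triggerTime_ms"] <= t_ms < nxt["triggerTime_ms"]:
                hold_end = cur["triggerTime_ms"] + cur["durationTime_ms"]
                if t_ms <= hold_end or nxt["triggerTime_ms"] <= hold_end:
                    return cur["Front"]
                f = (t_ms - hold_end) / (nxt["triggerTime_ms"] - hold_end)
                return [a + f * (b - a) for a, b in zip(cur["Front"], nxt["Front"])]
        return frames[-1]["Front"]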
114.8. BODYMOTION
The BodyMotion structure is used to specify driving motions for Vector. This structure has the
following fields:
Note: it is possible that the driving should ramp to the speed in the given duration. This is a TBD.
CHANGE_EYE_COLOR
CUBE_LIGHT_TOGGLE
DANCE_BEAT_SYNC
DEAL_CARDS_BEGIN
FLIP_DOWN_BEGIN
LISTENING_BEGIN
STRAIGHT
SWIPE_CARDS_BEGIN
TAPPED_BLOCK
TOGGLE_NUMBERS_DISPLAY
TURN_IN_PLACE
Note: unless otherwise specified, the animations are not allowed to have event key frames – the behavior wouldn’t expect to send the events to them.
114.10. FACEANIMATION
The FaceAnimation structure is used to specify the JSON file to animate Vector’s display. This structure has the following fields:

Table 522: FaceAnimation structure

Field  Type  Units  Description
triggerTime_ms  uint  ms  The time at which the motion is triggered.
animName  string    The name of the face animation to play. See Chapter 23 section 103.6 Sprite Sequences. Required
scanlineOpacity  float    This is new for Vector. Default: 1.0

The scanlineOpacity is new to support Vector’s display. With Cozmo, per the Cozmo SDK (Anki): “the screen is displayed interlaced, with only every other line displayed. This alternates every time the image is changed (no longer than 30 seconds) to prevent screen burn-in. Therefore to ensure the image looks correct on either scan-line offset we use half the vertical resolution”
114.12. LIFTHEIGHT
The LiftHeight structure is used to specify how to move Vector’s lift. The lift should reach the
target height in the duration given, ramping up the movement speed smoothly (with some
variability) until it reaches that. This structure has the following fields:
Note: Each of the structures has a time code. Within each array, the time code(s) must be in
ascending order; no two entries in the same array can share the same time code.
Table 526: ProceduralFace structure

Field  Type  Units  Description
triggerTime_ms  uint  ms  The time at which the motion is triggered.

The arrays of floats for each eye in Cozmo’s animations have been deciphered by the PyCozmo project, and are presumed to be the same for Vector:

Parameter  Default  Description
lower_inner_radius_x  0.5
lower_inner_radius_y  0.5
lower_outer_radius_x  0.5
lower_outer_radius_y  0.5
upper_inner_radius_x  0.5
upper_inner_radius_y  0.5
upper_outer_radius_x  0.5
upper_outer_radius_y  0.5
upper_lid_y  0.0  The vertical position of the upper eyelid (which occludes the eye).
upper_lid_angle  0.0  The angle of the upper eyelid.
upper_lid_bend  0.0  The bend to the upper eyelid.
lower_lid_y  0.0  The vertical position of the lower eyelid / cheek (which occludes the eye).
lower_lid_angle  0.0  The angle of the lower eyelid / cheek.
lower_lid_bend  0.0  The bend to the lower eyelid / cheek.
Table 527: RecordHeading structure

Field  Type  Units  Description
triggerTime_ms  uint  ms  The time when the robot should record his heading?
114.16. ROBOTAUDIO
The RobotAudio structure is used to interact with the audio engine. It is new to Vector; a very
different structure with a similar name was used with Cozmo. This structure has the following
fields:
The box coordinates and area should smoothly move and change size to the reach the target
position and size by the given trigger time.
See also Chapter 23 section 103.3 Layout file for another method of defining a sprite box.
Table 530: TurnToRecordedHeading structure

Field  Type  Units  Description
triggerTime_ms  uint  ms  The time when Vector should begin to turn to the recorded heading.
durationTime_ms  uint  ms  The amount of time to move to the recorded heading.
offset_deg  short  deg  Default: 0
speed_degPerSec  short  deg/sec  The speed that Vector should turn at.
accel_degPerSec2  short  deg/sec²  How fast Vector should accelerate when turning. Default: 1000
decel_degPerSec2  short  deg/sec²  How fast Vector should decelerate when turning. Default: 1000
tolerance_deg  ushort  deg  This specifies how close the actual heading must be to the target before considering the movement complete. Default: 2
numHalfRevs  ushort    Default: 0
useShortestDir  bool    Default: false
High Level AI
This part describes the items that make up Vector’s behaviour function.
BEHAVIOR TREES. A look at how the behaviors are selected and their settings
Behavior
This chapter describes Vector’s action, behaviour, and emotion system:
Actions and behaviour queues
The emotion-behaviour system, and stimulation
115. OVERVIEW
How does Vector get excited from praise, and then decide to go exploring and play? How does he
decide it’s time to take a nap?
Vector’s high-level AI – his emotions, sense of the environment and himself, and behaviors – are a
key part of how he creates a compelling character. He has an emotional state that is seen in his
affect – his facial expression, head and arm posture – how he behaves and responds, as well as the
actions he initiates.
Actions can have an associated “tag” used to refer to that running instance. The client can cancel the action using the tag.
116.2. BEHAVIORS
Unlike actions, only one behavior can be active at a time. The others are waiting in a stack. A
behavior is submitted (to be run) with a priority; if its priority is higher priority the current one, it is
run instead. The old behavior is pushed down in the stack. When a behavior completes, the next
high priority one is resumed.
(Figure: an example behavior stack – trigger-word detection, the SDK’s RESERVE_CONTROL (priority 30), and idle behaviors such as exploring.)
The behaviors are grouped, from the highest priority to the least, into the following categories:
MandatoryPhysicalReactions
TriggerWordDetected
SDKDefault (the behaviors submitted via the SDK if the default priority was used)
SingletonWallTimeCoordinator
TimerUtilityCoordinator
WeatherResponses
TakeAPhotoCoordinator
ReactToRobotShaken
ReactToTouchPetting
BasicVoiceCommands (“simple voice commands that we want to ignore obstacles”)
ReactToObstacle
InterruptingVoiceReactions
ChangeEyeColor
ReactToUnclaimedIntent
HeldInPalmDispatcher
WhileInAirDispatcher
ReactToPutDown
ReactToDarkness
GreetAfterLongTime
ReactToUncalibratedHeadAndLift
DanceToTheBeatCoordinator
StayOnChargerUntilCharged
ReactToSoundAwake
ConfirmHabitat
It can have conditions (usually on the current executing environment) that must be met before the behavior can be activated, and other conditions that must be met for it to keep running.
A behavior can have a cool-down period associated with it – a period of time after the end of its last use before it can be run again.
A behavior can trigger animations or actions when it is activated (referred to as the “get in” animations), and when it stops running (the “get out” animations)
From the Anki SDK documentation: “For commands such as go_to_pose, drive_on_charger and dock_with_cube, Vector uses path planning, which refers to the problem of navigating the robot from point A to B without collisions. Vector loads known obstacles from his map, creates a path to navigate around those objects, and then starts following the path. If a new obstacle is found while following the path, a new plan may be created.”
Vector’s behavior follows a hierarchy. “The highest level is what kind of things should the robot be doing right now – Should he be quiet? Should he be engaging? Should he be sleeping? Is his battery super-low, and he needs to recharge?” (Captain 2018, quoting Brad Neuman). Different behaviors flow from these high-level states, in response to events and the states of his Emotion Engine.
The behavior tree works by allowing the currently executing behavior to submit other behaviors to run; but those behaviors can have sophisticated rules (and priorities) that govern whether they can run, or should stop running. The details of the behavior tree will be examined in the next chapter.
When starting a behavior, a working instance of the behavior is created from the behavior node – the node specifies the class and its configuration, but the state is not preserved between uses. Then:
Then the topmost behavior carries out any updates to its activities and state. The behavior may also choose to cancel itself, or to initiate another behavior.
(Figure: responding to an intent – the behavior node(s) remap parameters and localization text keys to populate the text-to-speech string, and the behavior controller selects the cube animation, backpack animation, composite image, and animation groups/animations to play.)
Construct the text to be spoken, from templates and parameters. The parameters are from
the cloud and within the controller.
Select cube and backpack light animations, as well as other animations to play. Some of
these animations are called out in the behavior node.
postAudioEvent
earConAudioEventNeutral
earConAudioEventSuccess
earConAudioEventBegin
Emotion Model
This chapter describes Vector’s action, behaviour, and emotion system:
Actions and behaviour queues
The emotion-behaviour system, and stimulation
117. OVERVIEW
How does Vector get excited from praise? Vector has an emotional state that is seen in his affect –
his facial expression, head and arm posture – how he behaves and responds, as well as the actions
he initiates.
(Figure: the emotion model – stimulation feeds the mood, the mood influences the behavior tree, and emotions decay over time.)
Vector’s mood is affected by external stimulation, and his feedback on his successes (and failures) in his activities. His current mood affects the choices he makes and the behaviors he takes, including those in response to events and stimulation. His emotional state is also reflected in how the audio engine modulates its effects, even potentially choosing other effects or sounds. Vector’s emotions are transitory though: heightened emotions decay, based on the stimulus and behavior that drove them.
This emotion model, and the coupling of its effects with other systems, is managed by the “MoodManager.”
118.1. STIMULATION
Vector uses a concept of a stimulation level to guide how much he should initiate.
“When stimulation is low, the robot is chill,” … Vector is studiously observing but not acting out. “Then if you start making noise, or make eye contact with the robot, and certainly if you say ‘Hey Vector,’ that spikes [stimulation] way up...” But Vector also picks up subtler actions – peripheral movement and noises, for instance, or the room lights turning on and off. “If he gets stimulated enough, he’ll drive off his charger and start to socialize with you, … say your name, greet you, give you a fist-bump, potentially.” (Captain 2018, quoting Brad Neuman)
(Figure: stimulation inputs – video (illumination level, faces, gaze), touch (petting), and the IMU (fall detection, fist bump, poke, being held).)
Stimulated (or the stimulation level) comes from those sensory experiences described earlier;
Social, or “how eager [he] is to interact with users generally” (Wolford et al, 2018). “Hearing his name stimulates Vector, for instance, but it also makes him more social.” (Captain 2018)
Confident: “Vector’s confidence is affected by his success in the real world. The hooks on his arms sometimes don’t line up with those on his cube, for instance, and he can’t pick it up. Sometimes he gets stuck while driving around. These failures make him feel less confident, while successes make him more confident and more happy.”
Happy. This is Vector’s sense that, overall, things are going well.
Trust56
Overall, Vector possesses just enough dimensions/aspects in his emotion model to drive responses and his goal-driven behaviour, giving him a personality. When more dimensions are used, it is harder to get them right, and the character is less convincing when they aren’t.
56
Trust was added in version 1.6. Vector initially only had the first four. Cozmo had nine, so it seems plausible that Vector would have
developed more dimensions over time.
The active behaviors (which are also selected by the behavior tree) may post emotion events to the
mood manager as well.
Default
Frustrated
HighStim
LowStim
MedStim
The configuration files for the mood manager are located in a folder at:
/anki/data/assets/cozmo_resources/config/engine/emotionevents
This path is hardcoded into libcozmo_engine. It is a folder that contains a set of JSON files, all with the same structure. Each of these files is loaded. Each is a structure containing the following fields:
Table 531: Emotion event file JSON structure

Field  Type  Description
emotionEvents  EmotionEvent[]  An array of emotion event structures (see below).

The EmotionEvent describes how the emotions respond to an event. It has the following structure:

Table 532: EmotionEvent JSON structure

Field  Type  Description
name  string  The name of the event (see Appendix J, Table 616: The emotion event names)
emotionAffectors  EmotionAffector[]  The impact on the emotion state.
repetitionPenalty  RepetitionPenalty  This is a “time ratio” describing how the value decays. Optional.

Table 533: EmotionAffector JSON structure

Field  Type  Description
emotionType  string  The dimension or type of emotion (“Happy”, “Confident”, “Stimulated”, “Social”, or “Trust”)
value  float  The value to add to the emotional state. The range is usually -1 to 1

Altogether, the files respond to the following “emotion event” names. Some are external stimuli, some are events in general, and some are events regarding whether or not a behaviour succeeded or failed (failed with retry, failed with abort).

Table 534: RepetitionPenalty structure

Field  Type  Description
nodes  XY[]  This is a “time ratio” describing how the value decays with time.
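As an illustration, a hypothetical emotion event file following these structures might look like the Python literal below. The event name, the values, and the x/y field names inside the XY nodes are made-up assumptions:

    example_emotion_event_file = {
        "emotionEvents": [
            {
                "name": "FistBumpSuccess",  # hypothetical event name
                "emotionAffectors": [
                    {"emotionType": "Happy",     "value": 0.5},
                    {"emotionType": "Confident", "value": 0.3},
                ],
                # Time-ratio nodes: full effect at first, decaying on repetition.
                "repetitionPenalty": {"nodes": [{"x": 0,  "y": 1.0},
                                                {"x": 60, "y": 0.25}]},
            }
        ]
    }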
/anki/data/assets/cozmo_resources/ config/engine/mood_config.json
Behavior Tree
This chapter describes Vector’s behaviour tree and how behaviors are configured:
Behavior trees, parameters for behaviors, conditions that allow a behavior or stop a
behavior
Cube spinner event mapping.
120. OVERVIEW
Behaviors are why Vector wants to shove stuff off of the desk.
Vector employs a behavior tree that decides if a behavior can run or can no longer run. It doesn’t take this to the extreme of a detailed decision tree scripting every action and response. Most of the behavior tree is focused on ensuring that the transition between behaviors isn’t too abrupt, and on providing the settings (or preferences) for the behaviors.
The fields and structures of the behavior tree are pretty ad hoc though. This seems to be the norm in the video game industry:
“The ‘design principles’ listed in this paper are a rather transparent attempt to impose a structure on what might otherwise appear to be a random grab-bag of ideas – interesting, perhaps, in and of themselves but not terribly cohesive as a whole.” (Isla 2005)
The nodes also have a field – behaviorClass – that says how to interpret the node parameters, if the
behavior is activated. This class name links to code/modules within libcozmo_engine. There are
86 different behaviour classes.
Behavior nodes can initiate other behaviors. The identity of the behavior they launch may be called out in the configuration of the node, or be hardcoded internally. To prevent loops, the chain of nodes must be acyclic. The concern is that a behavior node kicks off another (and so on), until eventually a child node initiates another copy of the first node, leading to an infinite loop of behaviors being started and pushed onto the stack. Not only would it not give the expected results, eventually the software would run out of memory and crash.
libcozmo_engine kicks off the initial behavior that forms the root of the tree. Vector, at the top
level, has 7 broad states:
PR demo
Factory test (e.g. the playpen tests)
Acoustic testing
On-boarding
These states are mapped to an initial behavior identifier. Some have the mapping built into the software (hardcoded); for the others, this mapping is in the JSON configuration file (the victor_behavior_config.json file; more on this below). In normal operation, this is the “InitNormalOperation” behavior.
(Figure: the behavior tree – behavior nodes refer to other behavior nodes by their BehaviorIDs.)
The decision tree logic is called out within the nodes. One portion of the logic is used to check whether the behavior can be run; this can be used to delay running the behavior until some cleanup or stabilization of other activity has occurred. Another portion of the logic is used to check whether the behavior should be cancelled.
121.1. TIMERS
Behaviors can have an associated timer, similar to an animation’s cool-down timer. This prevents the behavior from re-engaging too quickly. These timers can be used as part of the conditional rules that enable or disable a behavior.
ObservingOffChager
ObservingOnCharger
ReactToIllumination
ReactToJoltInPalm
/anki/data/assets/cozmo_resources/config/engine/behaviorComponent/victor_behavior_config.json
Note: most of the names of the structures in this chapter are arbitrary. They were made up to ease
readability and documentation. The files do not reference any such structure names.
behaviorName
delegateID string
driveOffChargerBehavior
BEHAVIORCONFIG
The BehaviorConfig structure has the following fields:

Table 541: BehaviorConfig parameters

Field  Type  Units  Description
behavior  string    The name of a behaviorID. The anonymous behaviors in the current behavior node are checked first to find a behavior node with this id; then the global table.
cooldown_s  float  seconds  The amount of time after this behaviour completes before it can be run again.
weight  float    Optional

Given an array of BehaviorConfig structures, the list is prescreened to eliminate behaviors that are already running or still in cooldown. A behavior is randomly selected from this list based on its weighting, and launched.
TriggerWordPending

Field  Type  Condition  Description
begin_s  float  TimerInRange  Wait for the timer to have been going for at least this number of seconds.
conditionType  string  (all conditions)  One of the conditions listed in Table 542: Types of condition nodes.
cooldown_s  float  BehaviorTimer  The minimum duration between behaviors.
dedupInterval_ms  int  TimedDedup
/anki/data/assets/cozmo_resources/config/engine/behaviorComponent/behaviors/victorBehaviorTree/highLevelDelegates/exploring/exploringBumpObject.json
122.2. POUNCING
Pouncing is where Vector springs forward to leap on an object, such as a finger.
Vector detects visual motion, and turns toward it (see motion detection). In this he turns left (or right) to where he detected the motion (the Turn behavior class).
When he has a distance measurement (from the proximity sensor), he drives toward the target (the PounceWithProx behavior class).
When he is close enough, the animation takes over; he’ll make his facial expressions, move his arms, and try to pin the object with his arms (“mousetrap”). Note: the animations can’t be used to drive toward the target earlier; they aren’t linked into the proximity sensors for driving.
If nothing else is happening, he’ll wait for up to 30 seconds before losing interest.
A behavior tree node, using the DispatcherStrictPriority behavior class, coordinates these. The DispatcherStrictPriority class takes the following extra parameters:
The ReactToSound behavior class is used to rouse Vector and respond if there are any sudden noises, or it sounds like there is activity in the room:

Table 547: ReactToSound parameters

Field  Type  Description
micAbsolutePowerThreshold  float  “a mic power above this will always be considered a valid reaction sound” 0…4?
micConfidenceThresholdAtMinPower  float  Used in conjunction with micMinPower? 0…5000
micDirectionReactionBehavior  string  The behavior ID to use for reactions
micMinPowerThreshold  float  “a mic power above this will require a confidence of at least kRTS_ConfidenceThresholdAtMinPower to be considered a valid reaction sound” 0..3? “999.9 is considered impossibly high”
The ReactToMicDirection behavior class is used to allow Vector to respond to the direction that the sound is coming from. It maps the sound direction to terms such as “TwelveOClock” and “OneOClock”, and has conditions like “OnSurface” and “OnCharger”.
See Chapter 16, section 74.2 Spatial audio processing for how the direction the microphone sound is coming from is determined.
Vector can dance to music, making moves in response to the beats. The dancing can be initiated in two different ways. The first is if a beat is detected. The second is if Vector is verbally told to dance.
(Figure: the dance-to-the-beat coordinator – a listen-for-the-beat behavior feeds a dance dispatcher, which launches the dance moves.)
The details of the beat detector and tempo measurement are in Chapter 17 section 74.5 Beat
Detection.
Table 548: BeatDetected parameters

Field  Type  Units  Description
allowPotentialBeat  boolean    Default: false. Optional
If a beat has been heard, the DanceToTheBeatCoordinator proceeds in two phases. The first kicks off a helper behavior to listen for music. If it detects music (beats), it then fires off a dance behavior: there are two such behaviors, depending on whether or not Vector is on the charger. If there is no music detected – or Vector is no longer on his treads – this behavior exits.
Table 549: DanceToTheBeatCoordinator parameters
Field Type Units Description
listeningBehavior string behaviorID The name of a behavior node to invoke.
Otherwise, the intent is dequeued; Vector drives off of the charger, listens for the music beats, and (if there are any) begins dancing.
Figure 116: ListenForBeats behavior. (Flow: perform the preListeningAnim animation; perform the listeningAnim animation; listen for beats; if the listening times out, perform the noBeatAnim animation, otherwise set BeatDetected to true; perform the postListeningAnim animation; done.)
The behavior plays animations when it begins, ends, and if it doesn’t hear any beats. (If it does
hear beats, the dancing behaviors will play their own animations.) It sets the behavior tree variable
“BeatDetected” to true if it heard beats; otherwise it is set to false.
Table 550: ListenForBeats parameters
Field Type Units Description
cancelSelfIfBeatLost boolean If true, exits the behavior when the beat has been lost.
listeningAnim string The name of an animation trigger that is played while listening for music and getting the tempo.
maxListeningTime_sec float seconds The maximum amount of time to listen for music.
minListeningTime_sec float seconds Listen for at least this amount before concluding that there is no music and exiting.
noBeatAnim string The name of an animation trigger that is played when the behavior exits because there is no music playing.
postListeningAnim string The name of an animation trigger that is played after music has been detected, and is transitioning to the good stuff.
preListeningAnim string The name of an animation trigger that is played when this behavior is started, before listening for music has fully started.
The selection of the dance is performed using a node with the DispatcherRandom behavior class. The different dances (as behavior nodes) are listed in the “behaviors” array (along with some weighting to help randomize which dance is selected). A new behavior – one that performs the dancing – is randomly selected from this.
A particular dance is an instance of the DanceToTheBeat behavior class. A dance includes the dance moves, whether the backpack lights can play along, and the facial expression.
Table 551: DanceToTheBeat parameters
Field Type Description
backpackAnim string The backpack animation trigger name to play while dancing.
danceSessions DanceSession[] The dance moves that make up the dance.
eyeHoldAnim string The animation trigger name to animate the face
getOutAnim string The animation trigger name to play when exiting this behavior.
useBackpackLights boolean If true, play the backpack lights animation. Default is false(?)
Table 552: DanceSession parameters
Field Type Description
canListenForBeats boolean If true, then the animations (in the dance phrases) will be synchronized with the beat. If false, then the beat events from the beat-detector will be ignored.
dancePhrases DancePhraseConfig[] The sequence of (randomized) animations. These are played in order.
playGetoutIfInterrupted boolean If true, and the dance is interrupted by another animation, it plays the animation specified by getOutAnim (in the containing structure).
The dancing motions – dance phrases – are “made up of one or more possible dance animations … strung together and played on sequential musical beats.” The DancePhraseConfig structure “specifies the rules by which dance phrases are generated when the behavior is run.” These differ from animation groups: here a random list of animations to play is created, rather than selecting just one. This structure has the following fields:
Table 553: DancePhraseConfig parameters
Field Type Description
anims string[] The list of animation names (rather than trigger names) to randomly draw from. There must be at least one animation given.
maxBeats uint The animation is played no more than this number of times.
minBeats uint The animation is played at least this number of times.
multipleOf uint The animation is played a multiple of this number of times.
“The number of animations that make up the phrase is random, but is always between ‘minBeats’ and ‘maxBeats’, and is always a multiple of ‘multipleOf’.” The “animations are randomly drawn from the [anims] list in accordance with the min/max beats.”
If canListenForBeats (in the containing structure) is true, the animation (may) have an event key
frame that pauses the animation until a musical beat is heard and a DANCE_BEAT_SYNC event is sent
to the animation engine. In this case, the animations must have one event key frame, and the
event_id must be “DANCE_BEAT_SYNC”.
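The quoted generation rules can be restated as a short sketch (assuming uniform random draws, which is not confirmed; the animation names passed in below are hypothetical):

  import random

  # Sketch of generating one dance phrase from a DancePhraseConfig-like
  # dict, per the quoted rules: the phrase length is between minBeats and
  # maxBeats and a multiple of multipleOf; animations are drawn at random
  # from anims.
  def generate_phrase(cfg):
      lo = -(-cfg["minBeats"] // cfg["multipleOf"])    # ceiling division
      hi = cfg["maxBeats"] // cfg["multipleOf"]        # floor division
      count = random.randint(lo, hi) * cfg["multipleOf"]
      return [random.choice(cfg["anims"]) for _ in range(count)]

  phrase = generate_phrase({"anims": ["anim_a", "anim_b"],   # hypothetical names
                            "minBeats": 2, "maxBeats": 8, "multipleOf": 2})

With minBeats 2, maxBeats 8, and multipleOf 2, the sketch produces phrases of 2, 4, 6, or 8 animations, as the quoted rules require.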
/anki/data/assets/cozmo_resources/config/engine/userDefinedBehaviorTree/conditionToBehaviorMaps.json
Maintenance
This part describes practical items to support Vector’s operation.
SETTINGS, PREFERENCES, FEATURES AND STATISTICS. A look at how Vector syncs with remote
servers
DIAGNOSTICS & STATS. The diagnostic support built into Vector, including logging and usage
statistics
Settings, Preferences,
Features, and Statistics
This chapter describes:
/net/connman/service/wifi_..._managed_psk
WiFi Configuration
The Vic-Switchbox interacts with the WiFi subsystem (connman) to allow the mobile App to set the preferred WiFi network to use. The mobile app must use Bluetooth LE to do this.
Vic-Gateway interacts with the mobile App and SDK programs to change the robot settings.
Vic-Engine receives the preferences from the Vic-Cloud and Vic-Gateway, to carry out any changes in Vector’s behaviour.
The settings in the “/data/data/com.anki.victor/persistent/jdocs” folder are all JSON files with the following fields:
The mobile application can configure the WiFi settings via Vic-Switchbox commands. The WiFi is managed by connman thru the Vic-Switchbox:
created_by_app_name string The name of the mobile application that registered the owner. Example: “chewie”
created_by_app_platform string The mobile OS version string when the mobile application created the owner’s account. Example: “ios 12.1.2; iPhone8,1”
created_by_app_version string The version of the mobile application that registered the owner. Example: “1.3.1”
deactivation_reason
dob YYYY-MM-DD The owner’s date of birth (the one given at time of registration)57
57 It is not clear why there is so much information, and why this is sent from the Jdocs server in so many cases.
email_is_verified boolean True if the email verification has successfully completed; false otherwise.
email_lang IETF language tag The IETF language tag of the owner’s language preference. Example: “en-US”
family_name string The surname of the owner; null if not set
gender string The gender of the owner; null if not set
given_name string The given name of the owner; null if not set
is_email_account boolean
no_autodelete boolean
password_is_complex boolean
player_id GUID A GUID to identify the owner. This is the same as the “drive_guest_id”
purge_reason
128.1. ENUMERATIONS
128.1.1 ButtonWakeWord
When Vector’s backpack button is pressed once for attention, he acts as if someone has said his
wake word. The ButtonWakeWord enumeration describes which wake word is treated as having
been said:
Table 556: ButtonWakeWord enumeration
Name Value Description
BUTTON_WAKEWORD_HEY_VECTOR 0 When the button is pressed, act as if “Hey, Vector” was said.
BUTTON_WAKEWORD_ALEXA 1 When the button is pressed, act as if “Alexa” was said.
128.1.2 EyeColor
/anki/assets/cozmo_resources/ config/engine/eye_color_config.json
(This path is hardcoded into libcozmo_engine.so.) This JSON configuration file is a hash that
maps the EyeColor name (not the numeric value) to a structure with the “Hue” and “Saturation”
values suitable for the SetEyeColor API command. The structure has the following fields:
This structure has the same interpretation as the SetEyeColor request, except that the first letter of each key is capitalized here.
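As a sketch, an entry in this file plausibly looks like the following; the colour name and the numeric values here are illustrative, not taken from the actual file. Hue and Saturation are in the 0..1 range used by the SetEyeColor API request.

  import json

  # Illustrative shape of eye_color_config.json; the colour name and
  # numbers here are made up.
  example = {
      "EXAMPLE_GREEN": {"Hue": 0.33, "Saturation": 1.0}
  }
  print(json.dumps(example, indent=2))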
The mapping of the number to the JSON key for the eye colours configuration file is embedded in Vic-Gateway. Adding more named colours would likely require complete, successful decompilation and modification; patching the binary is unlikely to be practical. The colours for the existing names can be modified to give custom, permanent eye colours.
128.1.3 Volume
This is the volume to employ when speaking and for sound effects. Note: the MasterVolume API enumeration is a slightly different enumeration.
LOW 1
MEDIUM_LOW 2
MEDIUM 3
MEDIUM_HIGH 4
HIGH 5
The file is specified in the “jdocs_config.json” file (see Chapter 16, section 70 JDocs Server) by
the “docName” key within the “ROBOT_SETTINGS” subsection. The “jdoc” field is a
RobotSettingsConfig structure with the following fields:
The default values for each of the settings are held in:
/anki/assets/cozmo_resources/ config/engine/settings_config.json
(This path is hardcoded into libcozmo_engine.so.) The file is a JSON structure that maps each of
the fields of RobotSettingsConfig to a control structure. Each control structure has the following
fields:
58
Anyone else notice that metric requires a true for distance, but a false for temperature? Parity.
It is implied that the setting value is to be pulled from the Cloud when the robot is restored after
clearing.
The only entitlement defined in Vector’s API (and internal configuration files) is “kickstarter eyes”
(JSON key “KICKSTARTER_EYES”). Anki decided not to pursue this, and its feature(s) remain
unimplemented.
The entitlement settings associated with the account (as opposed to the per-robot settings) are stored in the cloud. The settings are retrieved and a local copy is kept on the robot.
The file is specified in the “jdocs_config.json” file (see Chapter 16, section 70 JDocs Server) by
the “docName” key within the “ACCOUNT_SETTINGS” subsection. The default entitlement settings
are held in
/anki/assets/cozmo_resources/ config/engine/userEntitlements_config.json
(This path is hardcoded into libcozmo_engine.so.) The file is a JSON structure that maps each of
the entitlement to a control structure. The control structure is the same as Table 561: The setting
control structure, used in settings in the previous section.
The file is specified in the “jdocs_config.json” file (see Chapter 16, section 70 JDocs Server) by
the “docName” key within the “ACCOUNT_SETTINGS” subsection. The “jdoc” field is a structure
with the following settings:
/anki/etc/ config/engine/accountSettings_config.json
This path is hardcoded into libcozmo_engine.so and these settings are only read (possibly) by vic-
gateway. The file is a JSON structure that maps each of the settings to a control structure. The
control structure is the same as Table 561: The setting control structure, used in settings in an
earlier section.
/anki/data/assets/cozmo_resources/ config/features.json
(This path is hardcoded into libcozmo_engine.so.) This file is organized as an array of structures
with the following fields:
The set of feature flags and their enabled/disabled state can be found in Appendix H. The features are often used as a linking mechanism between modules; each flag likely gates a module of behaviour or functionality.
The file is specified in the “jdocs_config.json” file (see Chapter 16, section 70 JDocs Server) by
the “docName” key within the “ROBOT_LIFETIME_STATS” subsection. The “jdoc” field holds a
structure with the following fields:
BStat.PettingReachedMaxBliss
BStat.ReactedToCliff count
BStat.ReactedToEyeContact count
BStat.ReactedToMotion count
BStat.ReactedToSound count
BStat.ReactedToTriggerWord count
Feature.AI.DanceToTheBeat
Feature.AI.Exploring
Feature.AI.FistBump
Feature.AI.GoHome
Feature.AI.InTheAir
Feature.AI.ListeningForBeats
Feature.AI.LowBattery
Feature.AI.Observing
Feature.AI.ObservingOnCharger
Feature.AI.Onboarding
Feature.AI.Sleeping
Feature.AI.StuckOnEdge
Feature.AI.UnmatchedVoiceIntent
Feature.Voice.VC_Greeting
FeatureType.Autonomous
FeatureType.Failure
FeatureType.Sleep
FeatureType.Social
FeatureType.Play
Odom.Body
Pet.ms ms The cumulative time petted
(Figure: the update-engine downloads updates from the OTA server.)
The Vic-Gateway and Vic-Switchbox both may interact with the mobile App and SDK programs to
receive software update commands, and to provide update status information.
The update-engine is responsible for downloading the update, validating it, applying it, and providing status information to Vic-Gateway and Vic-Switchbox. The update engine can be initiated by Vic-Switchbox via a Bluetooth LE command. [It isn’t known yet how they kick off the update automatically or via the HTTPS commands.] The update-engine provides status information in a set of files within the “/run/update-engine” folder.
Production updates. These modify the ABOOT, BOOT, and SYSTEM partitions
Delta updates. These modify the file system partitions; by sending only the changes to the
underlying partitions, the updates can be very compact.
135.1. MANIFEST.INI
The manifest.ini is checked by verifying its signature59 against manifest.sha256 using a public key (/anki/etc/ota.pub).
Note: it is this signature check that prevents turning off the encryption checks in the manifest below. At this time the signing (private) key is not known.
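A minimal sketch of how such a check could be performed, assuming ota.pub is a PEM-format RSA public key and manifest.sha256 holds a binary signature over manifest.ini (neither detail is confirmed):

  import subprocess

  # Sketch: verify manifest.ini against the signature in manifest.sha256
  # using the public key. Assumes an RSA/SHA-256 signature and a
  # PEM-format key; neither is confirmed.
  result = subprocess.run(
      ["openssl", "dgst", "-sha256",
       "-verify", "/anki/etc/ota.pub",
       "-signature", "manifest.sha256", "manifest.ini"],
      capture_output=True, text=True)
  print(result.stdout.strip())   # "Verified OK" on success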
All forms of update have a [META] section. This section has the following structure:
Table 566: manifest.ini META section
Key Description
ankidev 0 if production release, 1 if development
59
I’m using the information originally at: https://github.com/GooeyChickenman/victor/tree/master/firmware
A factory update has [ABOOT], [RECOVERY], and [RECOVERYFS] sections; all 3 must be
present.
Table 567: manifest.ini image stream sections
Key Description
base_version The version that Vector’s software must be at in order to accept this update. Honored only in delta updates. This prevents corrupting a filesystem by ensuring that it has the expected layout.
bytes The number of bytes in the uncompressed archive
compression gz (for gzipped). This is the only supported compression type.
delta 1 if this is a delta update; 0 otherwise
encryption 1 if the archive file is encrypted; 0 if the archive file is not encrypted.
sha256 The digest of the decompressed file must match this
wbits 31. Not used by update-engine
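Since manifest.ini is an ordinary INI file, the documented keys can be read with a stock parser. A sketch follows; the [SYSTEM] section name matches the partition names above, but a real manifest may differ.

  import configparser

  # Sketch of reading the documented manifest.ini keys; section names
  # beyond [META] vary with the update type.
  manifest = configparser.ConfigParser()
  manifest.read("manifest.ini")

  ankidev = manifest.getint("META", "ankidev")   # 0 production, 1 development
  system = manifest["SYSTEM"]
  print(system["compression"])                   # "gz" is the only supported type
  print(system.getint("encryption"), system["sha256"])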
There are also subtly different kinds of development software. This is indicated in the suffix at the end of the version string – blank, “d” or “ud”. The update-engine ensures that a Vector cannot be changed from running software with one kind of suffix to another kind. (Credit: Wire/Kerigan Creighton)
60
https://docs.google.com/document/d/1KZ93SW7geM0gA-LBXHdt55a9NR1jfKp7UZyqlRuokno/edit
Example 10: Decrypting the OTA update archives with OpenSSL (1.1.0 and later)
openssl enc -d -aes-256-ctr -pass file:ota.pas -md md5 -in apq8009-robot-boot.img.gz -out apq8009-robot-boot.img.dec.gz
openssl enc -d -aes-256-ctr -pass file:ota.pas -md md5 -in apq8009-robot-sysfs.img.gz -out apq8009-robot-sysfs.img.dec.gz
Note: the password on this file is insecure (ota.pas has only a few bytes62) and is likely intended only to prevent casually seeing the assets inside of the update file. The security comes from (a) the individual image files being signed (this is checked by the updater), and (b) the file systems that they contain also being signed, and checked by aboot and the initial kernel load. See Chapter 7 Startup for the gory details.
The update-engine-oneshot.service is used to initiate the first attempt to update after access to the
internet has been restored.
The /sbin/update-os can be used to initiate the software update process from the command-line on
developer units. This acts as if the vic-switchboard had initiated the download and install.
Downgrading is automatically enabled. This command is new to version 1.7.
61
https://groups.google.com/forum/#!searchin/anki-vector-rooting/ota.pas%7Csort:date/anki-vector-
rooting/YlYQsX08OD4/fvkAOZ91CgAJ
62
Opening up the file in a UTF text editor will show Chinese glyphs; google translate reveals that they say “This is a password”. This
password is a bit of humour to comply with a security consultant.
This folder also holds the unencrypted, uncompressed files from the OTA file:
manifest.ini
manifest.sha256
delta.bin
aboot.img
boot.img
136.2. PROCESS
The update process works as follows; if there is an error at any step, it skips the rest and deletes the bin and img files.
2. Begin downloading the OTA file. It does not download the TAR and then unpack it; the file is unpacked as it is received.
a) If this is a development Vector (i.e. anki.dev is set on the linux boot command line), and the current software has UPDATE_ENGINE_ALLOW_DOWNGRADE internally set (to true), the next two checks are skipped (until step d). Otherwise,
b) Does the suffix at the end of the version number in the new manifest match the suffix in the currently running software? If not, a 216 error code is produced.
c) Is the new version number in the new manifest greater than the one in the currently running software? If not, a 216 error code is produced.
d) The ankidev variable in the manifest must be set on developer units, and must not be set on production units; otherwise a 214 error code is produced. (A sketch of these checks appears after this list.)
7. If this is a factory update, it checks that the QSN in the manifest matches Vector’s QSN.
8. It marks the target partition slots as unbootable.
9. Checks the img and bin contents:
a) delta file
b) boot & system archive files
c) If this is a factory update, aboot, recovery, and recoveryfs
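As promised above, here is a sketch of the suffix, version, and ankidev checks (items b–d). It assumes the version string is digits-and-dots followed by an optional suffix (“”, “d”, or “ud”); the exact parsing inside update-engine is not confirmed.

  import re

  # Sketch of checks (b)-(d); the version-string format is an assumption.
  def check_update(current, new, manifest_ankidev, is_dev_robot,
                   allow_downgrade=False):
      parse = lambda v: re.match(r"([0-9.]+)(.*)$", v).groups()
      cur_num, cur_suffix = parse(current)
      new_num, new_suffix = parse(new)
      if not (is_dev_robot and allow_downgrade):
          if new_suffix != cur_suffix:
              return 216        # suffix kind changed
          as_tuple = lambda n: tuple(int(x) for x in n.split("."))
          if as_tuple(new_num) <= as_tuple(cur_num):
              return 216        # not a newer version
      if bool(manifest_ankidev) != is_dev_robot:
          return 214            # dev/production mismatch
      return 0                  # accepted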
/anki/etc/update-engine.env
/run/ update-engine-oneshot.env
/run/vic-switchboard/update-engine.env
This path is in the start-up /lib/systemd/system/update-engine.service file that starts the update-engine service. This file can have the following fields (if none are set, the update-engine reverts to these defaults):
UPDATE_ENGINE_BASE_URL The URL to inquire for new update OTA files, when UPDATE_ENGINE_URL is “auto”. The shard id and file request is appended to this.
UPDATE_ENGINE_BASE_URL_LATEST
UPDATE_ENGINE_DEBUG false
UPDATE_ENGINE_ENABLED Does not appear to be used
UPDATE_ENGINE_OTA_TYPE diff
UPDATE_ENGINE_SHARD
63
There is a slight race condition here: the file that signals that the user data is to be wiped is in a tmpfs. It is possible that the other partitions could be updated, and the system stops executing – has a kernel panic or loses power – before it gets to the step to wipe the data. This flag will be gone when the system restarts.
Why reboot so regularly? Vector was a new system with software initially (and hurriedly) ported from mobile phone applications meant to be run only for a few hours. The longer a program runs, the more likely a latent bug will cause it to crash. The system software might have:
If that happens while someone is using it, Vector’s applications might crash... or things limp along with mysterious inconsistent behaviors, slowdowns, etc. By rebooting, these issues can be cleared when no one is looking, and Vector can be played with at a much lower risk of a crash.
Other processes can request that the rebooter not reboot by creating one of the following files (and removing it when the delay is no longer needed):
/data/inhibit_reboot
/run/inhibit_reboot
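For example, a process could delay the maintenance reboot along these lines (a sketch; /run/inhibit_reboot is one of the two documented paths):

  from pathlib import Path

  # Sketch: hold off the maintenance reboot during important work, then
  # allow it again.
  inhibit = Path("/run/inhibit_reboot")
  inhibit.touch()
  try:
      pass  # ... work that the nightly reboot should not interrupt ...
  finally:
      inhibit.unlink(missing_ok=True)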
If those files do not exist, it checks to see if the updater has completed applying an update and is waiting for the reboot. It does this by checking whether “/run/update-engine/done” exists. If it does not exist, the robot will also check the following:
That the processor is in a power-saving state. If not, this indicates that it is perhaps active and being used; this will trigger a delay.
Whether the updater is being run; if it is, this will also delay the reboot.
The reboot can only occur within a configurable time window. If the reboot is delayed until the
robot is outside of the time window, the reboot is skipped for the day.
When the reboot does occur, the rebooter creates the file /data/maintenance_reboot to indicate the type of reboot to the start-up scripts. The startup moves the file to /run/after_maintenance_reboot.
/data/etc/rebooter.env
That the configuration file was located in the user’s private file system indicates a potential per-robot configuration. The reboot time of day (etc.) may have been intended (or at least considered) to be a settable preference in the future.
Diagnostics
This chapter describes the diagnostic support built into Vector
138. OVERVIEW
Anki gathers “analytics data to enable and improve the services and enhance your gameplay… Analytics Data enables us to analyze crashes, fix bugs, and personalize or develop new features and services.” Several services accomplish this analytics gathering. The data is roughly: logs, crash dumps, and the “DAS manager”.
Logging and diagnostic messages are typically not presented to the owner, neither in use with Vector, nor thru the mobile application... nor even in the SDK.
The exception is gross failures that are displayed with a 3-digit error code. This is intended to be very exceptional.
anki-crash-log Copies the last 500 system messages and the crash dump passed to the command
line to a given log file. This is called by vic-cloud, vic-dasmgr, vic-engine, vic-
gateway, vic-log-kernel-panic, vic-log-upload, vic-robot, vic-switchboard, and the
anki-crash-log service.
ankitrace This program wraps the Linux tracing toolkit (LTTng). This program is not present
in Vector’s file system. This is called by fault-code-handler.
diagnostics-logger Bundles together several log and configuration states into a compressed tar file.
This is called by vic-switchboard, in a response to a Bluetooth LE log command.
displayFaultCode Displays error fault codes on the LCD. This is not called; see vic-faultCodeDisplay.
64
The lack of documentation indicates that this was not intended to be supported and employed by the public... at least not until other
areas had been resolved.
fault-code-handler This is called by the fault-code service. It listens for a fault code, initiates capturing
crash logs, and calls vic-faultCode to display the fault code. Located in /bin
librobotLogUploader.so Sends logs to cloud. This library is employed by libcozmo_engine, vic-gateway and
vic-log-upload.
libosState Used to profile the CPU temperature, frequency, and load; the WiFi statistics; etc. This is used by libvictor_web_library, vic-anim, and vic-dasmgr.
libwhiskeyToF This unusually named library65 has lots of time of flight sensor diagnostics. This is present only in version 1.6 and later. This library is employed by libcozmo_engine.
rampost This performs initial communication and version check of the firmware on the
body-board (syscon). This exists within the initial RAM disk, and is called by init.
vic-anim Includes the support for the Customer Care Information Screen. This is started by
the vic-anim service.
vic-crashuploader-init Removes empty crash files, renames the files ending in “.dmp~” to “.dmp”. This is
called by the vic-crashuploader service.
vic-crashuploader A script that sends crash mini-dump files to backtrace.io. This is called by the vic-
crashuploader service.
vic-faultCodeDisplay Displays error fault codes on the LCD. This is called by fault-code-handler.
vic-init.sh Takes the log messages from rampost and places them into the system log, forwards any kernel panics. This is started by the vic-init service.
vic-log-event A program that is passed an event code in the command line. This is called by
TBD.
vic-log-uploader “This script runs as a background to periodically check for outgoing files and
attempt to upload them by calling 'vic-log-upload'.” This is started by the vic-log-
uploader service.
vic-logmgr-upload “This script collects a snapshot of recent log data" into a compressed (gzip) file,
then uploads the file” and software revision “to an Anki Blobstore bucket.” This is
not called.
vic-on-exit Called by systemd after any service stops. This script posts the fault code
associated with the service (if another fault code is not pending) to fault-code-
handler for handling and display.
vic-powerstatus.sh Records, every 10 seconds, the CPU frequency, temperature and the CPU & memory usage of the “vic-” processes. This is not called.
(Quotes from Anki scripts.) Support programs are located in /bin, /anki/bin, and /usr/bin
65
Anki took great care to maintain a squeaky-clean image, even throughout the internal files, so it was a surprise to see one that might appear to be named after a rude acronym (WTF). The name is a result of the internal product codes: Whiskey was the code name for a new generation of Cozmo in development. This was its time of flight (ToF) sensor library, using a modified Vector (called “Spiderface”) as a development prototype. On Whiskey, the time of flight sensor would connect directly to the main processor.
A Customer Care Info Screen (CCIS) that can display sensor values and other internal
measures,
A debug screen used to display Vector’s serial number (ESN) and IP address, and
The fault code display which is used to display a 3-digit fault code when there is an
internal failure (this screen is only displayed if there is a fault, and can’t be initiated by an
operator.)
Entering recovery mode, to force Vector to use factory software and download replacement firmware. (This mode doesn’t delete any user data.)
“Factory reset” which erases all user data, and Vector’s robot name
139.2. VECTOR’S DEBUG SCREEN (TO GET INFO FOR USE WITH THE SDK)
Steps to enter the debug screen
This will display his ESN (serial number) and IP address. The font is much smaller than normal,
and may be hard to read.
139.3. DISPLAYING FAULT CODES FOR ABNORMAL SYSTEM SERVICE EXIT / HANG
If there is a problem while the system is starting or running – such as one of the services exiting early (e.g. crashing), or encountering an internal error – a fault code associated with that service is displayed, and crash information is gathered for later analysis. See Appendix D for fault codes.
The implementation details will be discussed in section 142.6 Fault Code Handler below.
The application in the recovery mode attempts to download and reinstall the latest software. This
is likely done under the assumption that the firmware may be corrupted, or not the latest, and that a
check for corruption isn’t possible with the read-only filesystems of production software.
The menu is implemented in the vic-anim program. When the Clear User Data menu option is selected and confirmed, it triggers erasing all of the user data when the system shuts down to reboot. First, it creates the file /run/wipe-data and then begins the shutdown and reboot process. During the system shutdown, the mount-data service will detect the existence of the /run/wipe-data file and erase the user data (/data) and the switchboard partitions.
The name “factory reset” is slightly controversial, as this does not truly place Vector into a software state identical to a robot fresh from the factory.
· The connectivity with the cloud can be checked to see if the servers can be reached, if the authentication (i.e. username and password) is valid, and if the server certificate is valid. See Chapter 14, section 52.1 Check Cloud Connection
· The debug logs can be requested to be sent to the server for analysis. See the Upload
Debug Logs command in Chapter 14, section 52.2 Upload Debug Logs
142. LOGS
Acquiring Logs
The logs can be sent to the server using the Upload Debug Logs API command. See
Chapter 14 section 52.2 Upload Debug Logs
66
The web page says that these are “indicated by a blank screen. If you get a status code between 200-219, recovery mode will also help.”
142.2. VIC-LOGMGR-UPLOAD
The vic-logmgr-upload script is not used, but it is instructive to examine. When called, it copies all of the messages from /var/log/messages.1.gz and /var/log/messages, then sends the compressed result to the URL given on the command line.
/anki/etc/vic-log-uploader.env
This path is in the start-up /lib/systemd/system/vic-log-uploader.service file that starts the log
uploader service. This file can have the following fields (if none are set, the log uploader reverts to
these defaults):
VIC_LOG_UPLOADER_FOLDER The path on the local file system to store the logs until they can
be uploaded.
VIC_LOG_UPLOADER_QUOTA_MB 10 The maximum allowed total size of the log files to leave in the upload folder; the oldest files are removed until the total size is less than (or equal to) this.
VIC_LOG_UPLOADER_SLEEP_SEC 30 The amount of time between checks for log files.
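The quota rule amounts to pruning the oldest files first. A sketch of that logic follows; the actual script’s behaviour beyond the description above is not confirmed.

  from pathlib import Path

  # Sketch of the VIC_LOG_UPLOADER_QUOTA_MB rule: remove the oldest files
  # in the upload folder until the total size is within the quota.
  def enforce_quota(folder, quota_mb):
      files = sorted(Path(folder).iterdir(), key=lambda p: p.stat().st_mtime)
      total = sum(p.stat().st_size for p in files)
      for p in files:                        # oldest first
          if total <= quota_mb * 1024 * 1024:
              break
          total -= p.stat().st_size
          p.unlink()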
142.4. OPTING INTO (AND OUT OF) UPLOADING LOGS AND DAS EVENTS
The fault handler and crash uploader also check for the existence of the following file before passing logs to vic-log-uploader:
/run/das_allow_upload
This file is intended to indicate – by existing – that the user accepts uploading diagnostic information, and to not exist if they have opted out of data collection.67 If this file exists, the crash minidump traces and log files are captured by fault-code-handler, the log files are captured by vic-crashuploader, and both are passed on to be uploaded. If it does not exist, the log files are not captured or uploaded. (vic-crashuploader uploads the crash minidumps either way, but will only include the log files when allowed.)
This file is created by the DAS-manager (more on its event collection later).
/data/data/com.anki.victor/persistent/dasGlobals.json
67
Since the file exists, and there was no opt-out setting in the Vector mobile app (that I could find), this indicates that either the opt-in/opt-out setting was not implemented yet, or it was found to be unneeded.
When a fault occurs, the record of activity is saved for later examination.
Both the service to start the tracing, and to record (on demand) a snapshot of the trace are handled
by the ankitrace script.
(Figure: crash and trace captures from vic-cloud, vic-robot, and vic-anim – including the kernel trace – are placed in /data/data/com.anki.victor/cache/outgoing.)
The fault code is sent by writing a string with the fault code to the FIFO file located at:
/run/fault_code
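For example, a fault could be posted by writing its code to the FIFO. The code value below is arbitrary; see Appendix D for the real fault codes.

  # Sketch: post a fault code to the handler by writing it to the FIFO.
  with open("/run/fault_code", "w") as fifo:
      fifo.write("999\n")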
/anki/etc/fault-code-handler.env
This path is in the start-up /lib/systemd/system/fault-code.service file that starts the fault-codes
service.
2. When there is any input on the FIFO, systemd launches the corresponding fault-
code.service. This launches fault-code-handler with its stdin set to read from the FIFO.
3. Then a line of text is read from the /run/fault_code FIFO, and cleaned up to contain only digits. If there are no digits – or the fault code is 0 – it exits.
4. The handler checks to see if /run/fault_code.pending exists. If so, it exits. This file is used to tell if the fault-code-handler is still handling a fault, possibly while waiting for the system to be powered off by the body-board.
5. It begins the process of capturing diagnostic traces, and logs for later analysis of the fault;
6. The system services are stopped; depending on the classification of the fault, this may stop
all, or just a few.
a. Then the vic-faultCodeDisplay is executed to display the fault code. The fault
code is passed on the command line.
11. Attempt to restart the system services, after a delay – if that is allowed with this fault classification, and there have not been too many restarts in an attempt to clear the error. The handler counts the number of restarts (of the system services) within a time window; if there have been too many restarts, another one is not performed.
a. If a restart is not allowed, the body-board will eventually power off the system.
The vic-crashuploader service regularly checks for log files to send to a server. The outgoing logs are in non-volatile memory, so they can wait across a reboot before they are sent if the robot loses power, has a serious fault, or network access isn’t available.
/anki/etc/vic-crashuploader.env
This path is in the start-up /lib/systemd/system/vic-crashuploader.service file that starts the vic-crashuploader service.
/run/anki-crash-log
3. When an Anki application crashes, the breakpad toolkit creates a minidump file in the VIC_CRASH_FOLDER, then it posts the path to the FIFO file.
4. When there is any input on the FIFO, systemd launches the corresponding anki-crash-log.service. This launches the anki-crash-log script with its stdin set to read from the FIFO.
5. This script reads a line of text from the /run/anki-crash-log FIFO, and copies the last 400 messages from the system log to a file in the same directory.
/anki/data/assets/cozmo_resources/ config/engine/console_filter_config.json
This file is organized as a dictionary whose key is the host operating system. The “vicos” key is the one relevant for Vector.68 It dereferences to a structure with the following fields:
68
The other OS key is “osx”, which suggests that Vector’s software was developed on an OS X platform.
enabled boolean True if information from the channel should be logged, false if not.
The channels correspond to modules of behavior / functionality, and are used as linking mechanisms between the modules. It is not clear how it all ties together.
AIWhiteboard false
Alexa false
Audio false
Behaviors false
BlockPool false
BlockWorld false
CpuProfiler true
FaceRecognizer false
FaceWorld false
JdocsManager true
MessageProfiler true
Microphones false
NeuralNets false
PerfMetric true
SpeechRecognizer false
VisionComponent false
VisionSystem false
* false
“The Services collect gameplay data such as scores, achievements, and feature usage. The
Services also automatically keep track of information such as events or failures within
them. In addition, we may collect your device make and model, an Anki-generated
randomized device ID for the mobile device on which you run our apps, robot/vehicle ID
of your Anki device, ZIP-code level data about your location (obtained from your IP
address), operating system version, and other device-related information like battery level
(collectively, “Analytics Data”).”
The DAS manager protocol’s version identifier dates to the development of Overdrive. One patent on their “Adaptive Data Analytics Service” is quite an ambitious plan to tune and improve systems. There is no information on whether this was actually accomplished, or whether these techniques were used in Cozmo or Vector. Anki developed “both batch and real-time dashboards to gain insights over device and user behavior,” according to their Elemental toolkit literature.
Speculated purpose:
To identify how far people got in a process, or what their flow thru an interaction is
To estimate durations of activities, such as onboarding, how long Vector can play between
charge cycles, and how long a charge cycle is.
/anki/data/assets/cozmo_resources/ config/DASConfig.json
This path is in the vic-dasmgr executable. This file can have the following fields:
backup_quota 10000000
file_threshold_size 1000000
flush_interval 600
persistent_globals_path
storage_path /run/das
storage_quota 5000000
transient_globals_path
144.2. DAS
The DAS engine uploads JSON files. Each file holds an array of structures with the following
fields:
event string The name of the event/error that occurred, or the type of stats logged. Sometimes the event is generic – as with “log.error” – so the s1 field needs to be examined. Spaces should be trimmed from the start and end of the field. Some event names are accidentally logged with a trailing space (e.g. “rampost.dfu.desired_version ”).
feature_run_id string
feature_type string
i1 int64 Extra information, in integer format. Note: for at least one kind of entry the value domain is 64-bits.
i2 int Extra information, in integer format.
i3 int Extra information, in integer format.
i4 int Extra information, in integer format.
level string “info”, “warning”, “error”, etc.
profile_id string The account profile id... probably tied to jdocs, and token
69
This is a very helpful feature
This record is generic enough that it can hold each of the events in this form. Not every field is
used every time, and not necessarily used in the same way.
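As an illustration, a single uploaded record plausibly looks like the following. The values are made up; the field names follow the table above, and the event name is one documented later in this chapter.

  # Illustrative single DAS record; values hypothetical.
  record = {
      "event": "robot.wifi_info",
      "level": "info",
      "i1": 0,         # extra integer information; meaning varies by event
      "profile_id": "00000000-0000-0000-0000-000000000000",
  }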
Basic information about the robot – the version of software it is running, and what the
robot’s identifier/serial number is.
Whether Vector is booted into recovery mode when it is sending the information.
The uptime – how long Vector has been running since the last reboot or power on.
The WiFi performance, to understand the connectivity at home since Vector depends so
heavily on cloud connectivity for his voice interactions.
The CPU temperature profile, to find the balance between overheating and AI
performance. Some versions and features of Vector can cause faults due to the processor
overheating. Anki probably wanted to identify unusual temperatures and whether their
revised settings addressed it.
The CPU and memory usage statistics for the “vic-” application services. Anki probably sought to identify typical and unusual processing loads and heavy use cases.
The condition of the storage system – information about the flash size & partitions,
whether the user space is “secure”, and whether the EMR is valid.
Speculated purpose: To identify typical and unusual processing loads and temperatures. The heavy use cases are likely undesired and would be something to identify.
The data gathered in Vector for these is primarily based on a library called libosState.
How this is used: to get a sense of WiFi connectivity in the home, and the rooms where Vector is used. Anki’s internal research showed that rooms in a home can have a wide range of connectivity characteristics. (Jane Fraser, 2019)
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
How this is used: This information was probably intended to find the balance between overheating
and AI performance.
The experiment can be run for a bounded period of time, with an optional period during which the experiment is paused (perhaps for holidays). An experiment structure has the following fields:
version int 0
70
I suspect that this would have changed once experiments were initiated with Vector
This gives an overview of the breakpad process of capturing crash information as minidumps, and forwarding it to centralized servers for analysis.
The LTTng Project, The LTTng Documentation, 2020 Aug 5
https://lttng.org/docs/v2.12/
Microsoft, minidump files
https://docs.microsoft.com/en-us/windows/win32/debug/minidump-files
https://docs.microsoft.com/en-us/windows/win32/api/minidumpapiset/
146. CREDITS
Credit and thanks to Anki who made Vector possible; C0RE, Melanie T for access to the flash
partitions, file-systems, decode keys, board shots, unusual LED codes, information on the
electronics, and OTA URLs. Wire/Kerigan Creighton for board shots, the Project Victor website
& public relations, finding the web-visualization tool, OTA URLs, identifying the valuable OTA
versions, checking the compatibility with Cozmo animations and fun with boot animations. Fictiv
for board shots. (The board shots helped identify parts on the board and inter-connections on the board.) GooeyChickenman for the github repository. Cyril Peponet for aboot analysis, finding OTA v1.7, and pointing me to valuable past discord postings. Alexander Entinger for connector
signal information. HSReina for Bluetooth LE protocol information. Wayne Venables for crafting
a C# version of the SDK. Silvarius/Silvarius613 & nammo for info on the other Anki products that
were under development. nammo for information on error codes, shaft encoders, battery life,
signal processing, and much more. Mike Huller for catching several typos. Several drawings were
adapted from Steph Dere, and Jesse Easley’s twitter & instagram.
Thank-you Frien and Wire for posting JSON intents, and keeping the communities together.
Cyke for alerting people to interesting updates.
Thank-you to Digital Dream Labs (DDL) for continuing support for Vector; DDL and Drew
Zhrodague for providing error tables and cloud information.
147.1. ANKI
Anki, Vector Quick Start Guide, 293-00036 Rev: B, 2018
Anki, Vector Pillars, 2018
Casner, Daniel, Sensor Fusion in Consumer Robots, Embedded Vision Summit, 2019 May
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-
vision-training/videos/pages/may-2019-embedded-vision-summit-casner
https://www.youtube.com/watch?v=NTU1egF3Z3g
Casner, Daniel; Lee Crippen, Hanns Tappeiner, Anthony Armenta, Kevin Yoon; Map Related
Acoustic Filtering by a Mobile Robot, Anki, US Patent 0212441 A1, 2019 Jul 11
Fraser, Jane, IoT: How it Changes the Way We Test, Spring 2019 Software Test Professionals
Conference, 2019 Apr 3
https://spring2019.stpcon.com/wp-content/uploads/2019/03/Fraser-IoT-How-it-changes-the-
way-we-test-updated.pdf
147.2. OTHER
Coull, Ashley, Sound for Robots: An Interview with Sr. Sound Designer Ben Gabaldon, 2016 Nov
15, Designing Sound
http://designingsound.org/2016/11/15/sound-for-robots-an-interview-with-sr-sound-designer-
ben-gabaldon/
cozmopedia.org
Crowe, Steven, Anki was developing security robots before shutdown, The Robot Report, 2020 Feb
25
https://www.therobotreport.com/anki-developing-security-robots-before-shutdown/
Easley, Jesse
https://fatralla.tumblr.com/
FCC ID 2AAIC00010 internal photos
https://fccid.io/2AAIC00010
FCC ID 2AAIC00011 internal photos
https://fccid.io/2AAIC00011
FPL, FlatBuffers
https://google.github.io/flatbuffers/
Kinvert, Anki Vector Customer Care Info Screen (CCIS)
https://www.kinvert.com/anki-vector-customer-care-info-screen-ccis/
Sriram, Swetha, Anki Vector Robot Teardown, Fictiv, 2019 Aug 6
https://www.fictiv.com/blog/anki-vector-robot-teardown
Tenchov, Kaloyan; PyCozmo
https://github.com/zayfod/pycozmo/tree/master/pycozmo
Venables, Wayne; Anki.Vector.SDK
https://github.com/codaris/Anki.Vector.SDK
https://github.com/codaris/Anki.Vector.Samples
https://weekendrobot.com/
Zaks, Maxim, FlatBuffers Explained, 2016-Jan-30
https://github.com/mzaks/FlatBuffersSwift/wiki/FlatBuffers-Explained
147.3. QUALCOMM
Although detailed documentation isn’t available for the Qualcomm APQ8009, there is
documentation available for the sibling APQ8016 processor.
TOOL CHAIN. This appendix lists the tools known or suspected to have been used by Anki to create and customize the Vector, and for the servers; and tools that can be used to analyze Vector.
ALEXA MODULES. This appendix describes the modules used by the Alexa client
FAULT AND STATUS CODES. This appendix describes the system fault codes, and update status codes.
FILE SYSTEM. This appendix lists the key files that are baked into the system.
SERVERS. This appendix lists the servers that the Anki Vector and App contact.
FEATURES. This appendix enumerates the Vector OS “features” that can be enabled and disabled; and the AI behaviors called “features.”
PHRASES. This appendix reproduces the phrases that the Vector keys off of.
EMOTION EVENTS. This appendix provides a list of the emotion events that Vector internally
responds to.
PLEO. This appendix gives a brief overview of the Pleo animatronic dinosaur, an antecedent
with many similarities.
Abbreviations,
Acronyms, Glossary
Table 591: Common acronyms and abbreviations
Abbreviation / Acronym Phrase
ADC analog to digital converter
AG animation group
ALSA advanced Linux sound architecture
APQ application processor Qualcomm (used when there is no modem
in the processor module)
ASR automatic speech recognition
AVS Alexa Voice Service
BIN binary file
BMS battery management system
BNK AudioKinetic sound bank file
CCIS customer care information screen
CLAD C-like abstract data structures
CLAHE contrast-limited adaptive histogram equalization
CNN convolution neural network
CRC cyclic redundancy check
CSI Camera serial interface
DAS unknown (diagnostic/data analytics service?)
DFU device firmware upgrade
DTTB Dance to the beat
DVT design validation test
EEPROM electrical-erasable programmable read-only memory
EMR electronic medical record
ESD electro-static discharge
ESN electronic serial number
EVT engineering validation test
FBS flat buffers
FDE full disk encryption
FFT fast Fourier transform
71
https://forums.anki.com/t/what-is-the-clad-tool/102/3
Tool chain
This appendix tries to capture the tools that Anki is known or suspected to have used for the Anki
Vector and its cloud server.
Amazon Simple Queue Service (SQS) Vector employs Amazon’s SQS for its DAS functions.
Amazon Simple Storage Service (S3) Vector’s cloud interface uses Amazon’s AWS go module to interact with Amazon’s service:
https://docs.aws.amazon.com/sdk-for-go/api/service/s3/
https://docs.aws.amazon.com/AmazonS3/latest/API/API_Operations_Amazon_Simple_Storage_Service.html
android boot-loader Vector uses the Android Boot-loader; the code can be found in the earlier archive.
ARM NN ARM’s neural network support
https://github.com/ARM-software/armnn
AudioKinetic Used to craft the parametric sound effects, and play pre-recorded effects.
Wwise72 https://www.audiokinetic.com/products/wwise/
Backtrace.io A service that receives uploaded minidumps from applications in the field and provides
tools to analyze them.
https://backtrace.io
chromium update ?
civetweb The embedded webserver that allows mobile apps and the python SDK to communicate with Vector.
72
https://blog.audiokinetic.com/interactive-audio-brings-cozmo-to-life/
Google FlatBuffers Google FlatBuffers is used to encode the animation data structures. “It is similar to protocol buffers, but the primary difference is that FlatBuffers does not need a parsing/unpacking step to a secondary representation before you can access data, often coupled with per-object memory allocation. Also, the code footprint of FlatBuffers is an order of magnitude smaller than protocol buffers”73 https://github.com/google/flatbuffers
Google Protobuf Google’s Protobuf interface-description language is used to describe the format/encoding of data sent over gRPC to and from Vector. This is used by the mobile and python SDK, as well as on the server.
https://developers.google.com/protocol-buffers
Google RPC (gRPC) A “remote procedure call” standard that allows mobile apps and the python SDK to communicate with Vector.
https://grpc.io/docs/quickstart/cpp/
hdr-histogram This is a library used to support gathering histograms over a potentially wide range. It is
most likely used when gathering stats about internet access speeds, and equalizing images
from the camera.
https://github.com/HdrHistogram/HdrHistogram
libsodium Cryptography library suitable for the small packet size in Bluetooth LE connections. Used to encrypt the mobile application’s Bluetooth LE connection with Vector.
https://github.com/jedisct1/libsodium
linux, yocto (v3.18.66) The family of linux distribution used for the Anki Vector74
linux Linux is used on the server.
linux unified key storage (LUKS) Used to encrypt the user data partition.
Maya A character animation tool set, used to design the look and movements of Cozmo and
Vector. The tool emitted the animation scripts.
mpg123 A MPEG audio decoder and player. This is needed by Alexa; other uses are unknown.
https://www.mpg123.de/index.shtml
Omron OKAO Vector uses the Omron Okao Vision library for face recognition and tracking.
Vision https://plus-sensing.omron.com/technology/position/index.html
open CV Used for the first-level image processing – to locate faces, hands, and possibly accessory
symbols.
73
https://nlp.gitbook.io/book/tensorflow/tensorflow-lite
74
https://www.designnews.com/electronics-test/lessons-after-failure-anki-robotics/140103493460822
Pryon, Inc The recognition for the Alexa keyword; at least, the file system includes the same model as distributed in AMAZONLITE.
https://www.pryon.com/company/
python A programming language and framework used with desktop tools to communicate with
Vector. Vector has python installed. Probably used on the server as well.
https://www.python.org
Qualcomm Qualcomm’s device drivers, camera support and other kit are used.
Segger ICD A high-end ARM compatible in-circuit debugging probe. Rumoured to have been used by
Anki engineers, probably with the STM32F030
https://www.segger.com/products/debug-probes/j-link/
Sensory TrulyHandsFree Vector’s recognition for “Hey Vector” and the Alexa wake word is done by Sensory, Inc’s TrulyHandsfree SDK 4.4.23 (c 2008).
https://www.sensory.com/products/technologies/trulyhandsfree/
https://en.wikipedia.org/wiki/Sensory,_Inc.
Signal Essence Designed the microphone array, and the low-level signal processing of audio input.
https://signalessence.com/
Sound Hound, inc (Houndify) Vector’s Q&A “knowledge graph” is done by Sound Hound, using their Houndify product.
https://blog.soundhound.com/hey-vector-i-have-a-question-3c174ef226fb
https://www.houndify.com/
tensor flow lite (TFLite) TensorFlow lite is used to recognize hands, the desk surface, and was intended to support recognizing pets and common objects.
https://www.tensorflow.org/lite/microcontrollers/get_started
https://anki-vic-pubfiles.anki.com/license/prod/1.0.0/licences/OStarball.v160.tgz
https://anki-vic-pubfiles.anki.com/license/prod/1.0.0/licences/engineTarball.v160.tgz
Note: Other open source tools may have been used by Anki without Anki posting their version (or modifications); the licenses may not require them to.
75
You can only read the acknowledgements in the mobile application if you are connected to a robot.
Alexa modules
This Appendix outlines the modules used by the Alexa client built into Vector (using the Alexa
Client SDK). Alexa’s modules connect together like so:
File system
This Appendix describes the file systems on Vector’s flash. As the Vector uses the Android
bootloader, it reuses – or at least reserves – many of the Android partitions76 and file systems.
Many are probably not used. Quotes are from Android documentation.
The file system table tells us where they are stored in the partitions, and if they are non-volatile.
76
https://forum.xda-developers.com/android/general/info-android-device-partitions-basic-t3586565
77
This is mounted by “mount-data.service”. The file has a lot of information on how it unbricks.
78
Much information from: https://source.android.com/devices/bootloader/partitions-images
SBL1 512KB The primary and back up partitions for the secondary boot-loader. Responsible
SBL1BAK 512KB for loading aboot; has an “Emergency” download (EDL) mode using
Qualcomm’s Sahara protocol. This is in the format of a signed, statically linked
ELF binary.
SEC 16KB The secure boot fuse settings, OEM settings, signed-bootloader stuff
SSD 8KB “Secure software download” for secure storage, encrypted RSA keys, etc
SYSTEM_A 896MB The primary and backup system applications and libraries with application
SYSTEM_B 896MB specific code. Updates modify the non-active partition, and then swap which one
is active.
SWITCHBOARD 16 MB This is a modifiable data area used by Vic-switchboard to hold persistent
communication tokens. This appears to be a binary data structure, rather than a
file system.
TZ 768KB The primary and backup TrustZone. This is in the format of a signed, statically
TZBAK 768KB linked ELF binary. This code is executed with special privileges to allow
encrypting and decrypting key-value pairs without any other modules (or
debuggers) having access to the secrets.
USERDATA 768MB The data created for the specific robot (and user) that customizes it. A factory reset wipes out this user data. This partition is encrypted using “Linux Unified Key Storage” (LUKS).
The following files are employed in the Vector binaries and scripts:
/data/panics
/data/vic-gateway
/tmp/vision/neural_nets
79
https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-devices-system-cpu
80
https://manpages.debian.org/jessie/fake-hwclock/fake-hwclock.8.en.html
Bluetooth LE Services
& Characteristics
This Appendix describes the configuration of the Bluetooth LE services – and the data access they
provide – for the accessory cube and for Vector.
Note: It appears that there isn’t a battery service on the Cube. When in over-the-air update mode, there may be other services present (i.e. provided by a bootloader).
81
All values are little endian, per the Bluetooth 4.0 GATT specification
82
http://developer.bluetooth.org/gatt/services/Pages/ServiceViewer.aspx?u=org.bluetooth.service.device_information.xml
83
http://developer.bluetooth.org/gatt/services/Pages/ServiceViewer.aspx?u=org.bluetooth.service.generic_access.xml
84
http://developer.bluetooth.org/gatt/services/Pages/ServiceViewer.aspx?u=org.bluetooth.service.generic_attribute.xml
Presumably some of these will cause the Cube to go into over-the-air update (OTAU) mode, allowing its firmware to be updated.
Others turn the RGB LEDs on to a given color, possibly with a duty cycle and pulsing duty cycle.
85
All values are little endian, per the Bluetooth 4.0 GATT specification
86
Todo: sync up with info at: https://github.com/anki-community/vector-archive
87
Project Victor had a write up, reference that.
88
https://developer.amazon.com/docs/alexa-voice-service/api-overview.html
Features
The following is the set of application-level feature flags and whether they are enabled (i.e.
sufficiently developed to be used) in Vector:
CubeSpinner false
GreetAfterLongTime true
HowOldAreYou true The ability for Vector to track how long it has been since he was
activated (his age) and use that info to respond to the question “How
old are you?”
Invalid false
Keepaway true
KnowledgeGraph true The ability for Vector to answer a question when asked “Hey Vector,
I have a question…”
Laser false
Messaging false
MoveCube true
PopAWheelie true The ability for Vector to pop a wheelie using his cube
PRDemo false
ReactToHeldCube true
ReactToIllumination true
RollCube true The ability for Vector to drive up and roll his cube
TestFeature false
The following is the set of AI features (related to, but not the same as, the feature flags), which identify an active behavior:
InTheAir Vector has detected that he is in the air. If he thinks he is falling, he may engage in “tuck and roll”, where he lowers his lift and tilts his head down.
KeepAway
KnowledgeGraph Vector has been asked to answer a question (“Hey Vector, I have a question…”) and this behaviour is used to perform the rest of the interaction.
ListeningForBeats Vector thinks that music may be playing and is listening for the beat
of the music to dance to. (He may follow this with the
DanceToTheBeat feature).
LookAtMe Vector is looking for a person’s face, to look into their gaze.
MovementBackward
MovementForward
MovementLeft
MovementRight
MovementTurnAround
NoFeature When Vector isn’t doing anything and his mind is blank… he’ll probably pick exploring, observing, or sleeping as his next activity.
Observing Vector is looking around.
ObservingOnCharger Vector is looking around while on his charger.
Petting Vector is being petted.
PlayingMessage The messaging features are not yet supported.
PopAWheelie Vector is attempting to pop a wheelie using his cube.
ReactToAbuse Vector is responding to verbally abusive statements (represented as
an intent).
ReactToAffirmative Vector is responding to verbal compliments (represented as an intent).
ReactToApology Vector is responding to an apology (represented as an intent).
ReactToCliff Vector has detected a cliff while driving, and is reacting to it.
ReactToGazeDirection Vector has detected a face looking at him (the gaze) and is reacting
to it.
ReactToGoodBye Vector is responding to a verbal goodbye (represented as an intent).
ReactToGoodMorning Vector is responding to a verbal good morning (represented as an
intent).
ReactToHand Vector has seen a hand and is reacting to it.
ReactToHello Vector is responding to a verbal hello (represented as an intent).
ReactToLove Vector is responding to a verbal statement of affection (represented
as an intent).
ReactToNegative Vector is responding to verbal abuse.
ReactToRobotOnSide Vector has fallen (possibly from driving off the edge of his area) and
is on his side.
RecordingMessage The messaging features are not yet supported.
RequestCharger Vector is asking his human to help him by putting him on his
charger. This happens if Vector can’t get to his charger – he is stuck
or doesn’t know where it is.
RobotShaken Vector has detected being shaken, like a snow globe.
RollBlock Vector is driving up to his cube and rolling it.
SDK
TimerRinging Vector is playing the timer ring (i.e. the timer has expired) animation
as part of the timer behavior.
TimerSet
UnmatchedVoiceIntent The cloud wasn’t able to identify an intent based on what was said (if
anything) after the Hey Vector wake word.
VolumeAdjustment Vector’s volume was adjusted by a voice command.
Weather Vector is looking up the weather (from the cloud) and animating the
results.
WhatsMyName Vector is looking for a face and identifying it.
blackjack_playagain
blackjack_stand
global_delete
imperative_lookoverthere
knowledge_response
knowledge_unknown
meet_victor
message_playback
message_record
silence
status_feeling
imperative_affirmative
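As promised under ListeningForBeats above, here is a minimal sketch of energy-based beat detection, a common general technique for this kind of feature. This is an illustration, not Anki's algorithm; the frame size and spike threshold are invented tunables.

import numpy as np

def detect_beats(samples: np.ndarray, rate: int, frame_ms: int = 20):
    """Return times (in seconds) of frames whose energy spikes above the recent average."""
    frame = int(rate * frame_ms / 1000)
    n = len(samples) // frame
    # Short-time energy per frame of audio.
    energy = np.array([np.sum(samples[i*frame:(i+1)*frame] ** 2.0) for i in range(n)])
    beats = []
    history = 50  # look back ~1 second of 20 ms frames
    for i in range(history, n):
        if energy[i] > 1.5 * energy[i-history:i].mean():  # spike threshold (tunable)
            beats.append(i * frame_ms / 1000.0)
    return beats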
Emotion Events
The following is the set of emotion event names used by Vector’s mood manager. Some come
from external events. Many indicate whether a behavior or action succeeded or failed (failed with
retry, failed with abort):
KeepawayPounce
KeepawayStarted
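A minimal sketch of how such an event might nudge the mood model follows. The mood dimension names and the delta values here are assumptions for illustration, not Anki's actual tables.

from dataclasses import dataclass

@dataclass
class Mood:
    # Dimension names are assumptions for illustration, not Anki's exact set.
    happy: float = 0.0
    confident: float = 0.0
    social: float = 0.0
    stimulated: float = 0.0

# Hypothetical mapping from emotion event name to per-dimension deltas.
EMOTION_EVENTS = {
    "KeepawayPounce":  {"happy": 0.2, "stimulated": 0.3},
    "KeepawayStarted": {"social": 0.1, "stimulated": 0.2},
}

def apply_emotion_event(mood: Mood, event: str) -> None:
    """Apply an event's deltas, clamping each dimension to [-1, 1]."""
    for dim, delta in EMOTION_EVENTS.get(event, {}).items():
        setattr(mood, dim, max(-1.0, min(1.0, getattr(mood, dim) + delta)))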
robot.boot_info
robot.cpu_info
robot.disk_info
robot.memory_info
vectorbot.main_cycle_too_long
ntp.timesync
profile_id.start
profile_id.stop
rampost.lcd_check
random_generator.seed
robot.engine_ready
robot.init.time_spent_ms
robot.maintenance_reboot
switchboard.hello
vic.cloud.hello.world
Note: other startup events are covered elsewhere with their functional groups.
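The names above, and throughout the tables that follow, are DAS telemetry events. As a minimal sketch, a posted event might be serialized as a small name-plus-payload record; the field names here are assumptions for illustration, not the actual DAS wire format.

import json
import time

def make_das_event(name: str, **data) -> str:
    # Hypothetical record layout: event name, millisecond timestamp, payload.
    record = {
        "event": name,                   # e.g. "robot.engine_ready"
        "ts": int(time.time() * 1000),   # millisecond timestamp
        "data": data,
    }
    return json.dumps(record)

print(make_das_event("robot.engine_ready", boot_id="abc123"))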
behavior.sleeping.wake_up
engine.power_save.end
engine.power_save.start
hal.active_power_mode
robot.power_off
robot.power_on
vectorbot.prep_for_shutdown
battery.voltage_reset
rampost.battery_level
battery.saturation_charging
robot.off_charger
robot.on_charger
rampost.battery_temperature
gyro.drift_detected
imu_filter.fall_impact_event
imu_filter.falling_event
imu_filter.gyro_calibrated
Table 625: Microphone statistics and events, posted to DAS
Event Description & Notes
behavior.trigger_word.dropped
behavior.voice_command.dropped
mic_data_system.speech_trigger_recognized
robot.microphone_on
robot.reacted_to_sound
robot.stuck_mic_bit
wakeword.triggered
wakeword.vad
hal.severe_invalid_prox_reading_report
robot.cliff_detected
robot.bad_prox_data
head_motor_calibrated
head_motor_uncalibrated
lift_motor_calibrated
lift_motor_uncalibrated
rampost.dfu.open_file
rampost.dfu.installed_version
rampost.dfu.request_version
Note: see the updates section for events related to updating the body-board firmware
ble_conn_id.start
ble_conn_id.stop
ble.disconnection
dasmgr.upload.stats
dasmgr.upload.failed
robot.cloud_response_failed
robot.wifi_info
robot.sdk_wrong_version
wifi_conn_id.start
wifi_conn_id.stop
wifi.connection
wifi.disconnection
wifi.initial_state
cube.scan_result
cube.unexpected_connect_disconnect
Note: see the updates section for events related to updating the cube firmware
robot.settings.updated
robot.settings.volume
robot.timezone The “tz database name” of the time zone to use for the
time and alarms. (See the validation sketch after this
list.)
default: “America/Los_Angeles”
sdk.activate
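As referenced at robot.timezone above, a minimal sketch of validating a “tz database name” such as that setting's value, using Python's standard zoneinfo module:

from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

def is_valid_timezone(name: str) -> bool:
    """Return True if the name is a valid tz database entry."""
    try:
        ZoneInfo(name)
        return True
    except (ZoneInfoNotFoundError, ValueError):
        return False

print(is_valid_timezone("America/Los_Angeles"))  # Vector's default -> True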
robot.delocalized
robot.delocalized_map_info
robot.dock_action_completed
robot.fallback_planner_used
robot_impl_messaging.handle_robot_stopped
robot.object_located
robot.obstacle_detected
robot.offtreadsstatechanged
robot.plan_complete
robot.planner_selected
robot.too_long_in_air
robot.vision.image_quality
robot.vision.profiler.
behavior.find_home.invalid_turn_angle
go_home.charger_not_visible
go_home.result
robust_observe_charger.stats
The face recognition subsystem posts the following events and statistics:
robot.vision.detected_pet
robot.vision.face_recognition.immediate_recognition
robot.vision.face_recognition.persistent_session_only
robot.vision.loaded_face_enrollment_entry
robot.vision.remove_unobserved_session_only_face
robot.vision.update_face_id
turn_towards_face.might_say_name
turn_towards_face.recognition_timeout
behavior.cliffreaction
behavior.cycle_detected
behavior.exploring.end
behavior.exploring.poke
behavior.feature.end
engine.state
mood.event
mood.simple_mood_transition
robot.dizzy_reaction
dttb.cancel_beat_lost
dttb.end
Pleo
The Pleo, sold in 2007 – a decade prior to Vector – has many similarities to Vector. The Pleo was
a soft-skinned animatronic baby dinosaur created by Caleb Chung, John Sosoka, and their team at
Ugobe. Ugobe went bankrupt in 2009, and the rights were bought by Innvo Labs, which introduced
a second generation in 2010. This appendix is mostly adapted from the Wikipedia article and the
reference manual.
Two microphones, supporting beat detection that allowed Pleo to dance to music. The second
generation (2010) could localize the sound and turn towards its source.
12 touch sensors (head, chin, shoulders, back, feet) to detect when petted.
Environmental sensors.
A camera-based vision system (for light detection and navigation). The first generation
treated the image as gray-scale; the second generation could recognize colors and patterns.
Four foot sensors to detect the ground. The second generation could prevent falls by
detecting drop-offs.
An infrared mouth sensor to detect objects placed in the mouth, in the first generation. The
second generation could sense accessories with an RFID system.
14 motors.
Steel wires to move the neck and tail (these tended to break in the first generation).
The processing
152.10. SALES
Pleo’s original MSRP was $350, while “the wholesale cost of Pleo was $195, and the cost to
manufacture each one was $140”.89 Pleo sold ~100,000 units, for ~$20 million in sales90 –
consistent with ~100,000 units at the $195 wholesale price (~$19.5 million).
152.11. RESOURCES
Wikipedia article. https://en.wikipedia.org/wiki/Pleo
89 https://news.ycombinator.com/item?id=17755596
90 https://www.idahostatesman.com/news/business/article59599691.html
John Sosoka, “The Rise and Fall of Pleo,” a farewell lecture by the former CTO of Ugobe.
https://www.robotshop.com/community/blog/show/the-rise-and-fall-of-pleo-a-fairwell-lecture-by-john-sosoka-former-cto-of-ugobe