Thesis 2021
The aim of this bachelor thesis is to elaborate on the development and implementation of a
versatile hexapod robot that can navigate and explore terrains which are difficult for
humans to reach, such as disaster sites or inaccessible cultural heritage areas. The goal of
developing such a robot is to combine functional aspects of today's intelligent robots into a
more adaptable and versatile robot that can autonomously perform missions without human
supervision. For exploration and navigation, a hexapod configuration was chosen, and the
design allows the robot to change its gait to match its required energy consumption or
configuration. To keep the robot modular and easily changeable, PLA was used to construct
the parts of the body, and servo motors were used for actuation. As for control, a Raspberry Pi
was used, and the control policy was trained in ROS and Gazebo using reinforcement learning,
specifically Twin Delayed Deep Deterministic Policy Gradient (TD3), so that the algorithm
could find a suitable gait based on the criteria provided to it. A prototype of this robot was
assembled, tested and analysed to gauge its functionality.
Curiosity can be regarded as humanity's most defining trait and a fundamental instinct. From
that curiosity has stemmed exploration, which humans have pursued for hundreds of years,
reaching even our own archeologically significant sites. Over the course of human history, we have
developed and created many robots to lead our exploration, such as the Curiosity rover to
explore Mars or Sentry for exploring the oceans. Applications of these robots in our lives
extend from small everyday conveniences in our homes to helping in critical scenarios such as
search and rescue. One quality that allows them to help us is that small robots have access to
spaces too confined for a person, taking tiny cameras and sensors with them to detect
survivors and map their locations. Similarly, aerial robots can provide a broad
view of the search and rescue scenario and autonomous robots can increase the amount of
information available to rescue teams (Lima & Ventura, 2012). Well-known work on Urban
Search and Rescue (USAR) robots in the US aided rescue operations at the World Trade
Center after the 9/11 attacks, and this technology has also been tested during catastrophes
such as Fukushima and Chernobyl and natural disasters like Hurricane Katrina
(Lima & Ventura, 2012). These robots come in diverse configurations, some wheeled or
tracked, others legged. In recent years, however, multi-legged walking robots have been
broadly recognised for their adaptability. Such robots have a variety of advantages in
scenarios with difficult terrain, such as the aforementioned search and rescue missions, or
the exploration of volcanoes, for example.
Legged robots, having discrete contact points with the ground surface, as compared to
wheeled robots, which require continuous contact, are advantageous in an unstructured
environment, and thanks to the agility and flexibility afforded by the degrees of freedom of the
leg mechanisms, their body trajectories are less affected by uneven ground (Bo et al., 2011).
Among legged robots, a specific class, namely hexapods, stands out. Hexapods have six legs,
which gives them more versatility compared to other legged robots (Ding et al., 2010). Moreover,
hexapods can turn on the spot easily, and they have been observed to be the fastest among
legged robots, as adding more than six legs did not improve walking speed. Along with this,
hexapods showed the highest static as well as dynamic stability
(Preumont et al., 1991). In addition to this, hexapods can modify their gait to accommodate
the loss of a limb or even use a limb for more complex tasks, such as pressing buttons (Ding
et al., 2010).
Apart from hexapod robots, another class of robots that have gained popularity over the last
decade is biomimetic robots, specifically quadrupeds. The most famous quadruped, Spot, has
recently been commercialised and sold nearly 400 units, at $75,000 each, amounting to
approximately 30 million dollars (Dickson, 2020). Apart from this indicator of popularity, a
lot of research has gone into quadrupeds, starting from the 1970s with the Hirose-Fukushima
Robotics Lab working on its TITAN series for virtually 40 years. Moreover, there are iconic
robots like ANYmal, the MIT Cheetah, and the BigDog series. These robots performed well,
reaching speeds of 45 km/h (Cheetah), starting, turning and stopping immediately, and
even performing complex movements such as somersaults and backflips (MIT Cheetah).
Control methods for robots can be divided into two broad categories, traditional and
new. Traditional control methods use a robotics control pipeline which involves six different
steps: observation; state estimation; modelling and prediction; planning; low-level
control; and deployment. Newer methods use artificial intelligence to condense the steps
between observation and deployment into training, using methods such as deep learning and
reinforcement learning.
The former approach makes use of model-based properties such as kinematics and dynamics
so that a predetermined control policy, such as a tripod gait or a wave gait, can be
implemented manually. These gaits are usually not very adaptable to new terrain and need to
be modified using sensor data. The latter approach, however, can be used to generate a gait
which is optimal for the robot and the conditions it is in using techniques like reinforcement
learning, which uses the model of the robot to simulate a series of inputs to each joint,
rewarding or penalising the robot based on its progress. For example, if the robot tips over, it
is penalised, but if it covers some distance in a straight line, it is rewarded. After training the
robot this way, an optimal gait is reached which has the highest reward. Compared to other
machine learning paradigms, reinforcement learning is neither task-driven (as in supervised
ML) nor data-driven (as in unsupervised ML) (Shaikh, 2017).
In this thesis, the objective is to assimilate qualities of both hexapods and quadrupeds, to
develop and build a prototype for a versatile robot that can perform dexterous tasks and adapt
to the given environment, and then evaluate the performance of such a robot based on widely
recognised tests. The applications of the robot would range from household assistance to
USAR and exploration of unseen and uneven terrain. This objective extends to choosing a
well-suited gait using reinforcement learning to allow smooth operation on any terrain.
A substantial part of the terrain on Earth is inaccessible to exploration with only wheeled
locomotion. In particular, natural terrain has obstacles like large rocks, loose soil, deep
ravines and steep slopes which leave wheeled locomotion not only inefficient, but in many
cases, ineffective. In these types of landscapes, legs have been proven to be suitable for
locomotion since small obstacles can be avoided by discrete contact with the ground and not
engaging with undesirable footholds. Moreover, these mechanisms can climb over obstacles
and step over ditches while staying level and stable, a feat that wheeled locomotion would
only achieve at a small scale. Along with that, the use of multiple degrees of freedom in the
leg joints gives legged vehicles the ability to change their heading without slippage. A unique
advantage that legged systems provide is a higher tolerance for failure during statically stable
locomotion than that of a wheeled robot. For example, a failing wheel is detrimental to the
locomotion and mobility of a wheeled vehicle; in contrast, legged robots can possess
redundant legs, maintaining static balance and continuing locomotion even with one or more
damaged legs (Silva & Machado, 2007).
That being said, there are limitations which should be considered while evaluating a legged
robot. Although superior to wheeled locomotion, legged robots often have low speeds, can be
difficult to build and require more complicated control algorithms. Apart from this, their high
degrees of freedom can lead to high energy consumption as well (Silva & Machado,
2007).
Despite these shortcomings, legged robots have been researched as early as 1878, with the
analysis of gaits used by different mammals, and by the mid-1950s, different laboratories
were designing and building walking robots. After that came a flurry of walking robots,
although rudimentary in design and implementation, such as the six-legged machine by the
Space General Corporation for space travel in the early 1960s; the General Electric
Quadruped in 1968, which was the first to use different gaits; and the Phoney Poney in 1966,
which was the first robot to move autonomously under computer control and electrical
actuation (Figure 1).
From there, research in legged robotics has come a long way, giving rise to different classes
of legged robots, namely monopods, bipeds, and so on, according to the number of limbs used
to walk. Out of these classes, one stands out the most: the hexapod class.
2.2 Hexapods
Hexapods, defined as six legged robots, have been used for exploration of remote locations
and hostile environments such as the seabed, in nuclear power stations, in space or on other
planets, and in search and rescue operations as well (Tedeschi & Carbone, 2014).
Referring to the aforementioned high failure tolerance, hexapods could even utilise one or
two limbs to perform a dextrous task, such as pushing a button or loading and unloading a
payload. They have greater mobility in natural surroundings than other legged robots and
have a lower impact on the terrain as well, which can be crucial in remote or hostile
environments. Additional features which merit the development of hexapods are that they can
have variable geometry and that their static and dynamic stability is the highest compared to
other legged robots.
A successful early design traced back to the University of Rome in 1972 was constructed as a
computer-controlled walking machine with electric drives, and another design from Ohio
State University in 1977 was the OSU Hexapod. These robots could walk short distances and
even traverse obstacles, but they were heavy and lacked versatility and adaptability. By the
early 90s, many renowned organisations and universities such as Mobot Lab and TUM had
built hexapods which showed higher mobility and computing power, such as the Attila and
Hannibal robots, which possessed over 19 degrees of freedom, more than 60 sensory inputs,
and 8 on-board microprocessors.
However, in the last two decades, there has been an even more rapid development in control
systems technology, which has led to hexapods being equipped with various sensors and
controlled with artificial intelligence systems to analyse the environment and motion of the
robots on complex surfaces. The robots of this generation were fast and agile, such as the
Biobot, which was built as a biomimetic robot modeled after the American cockroach;
Hamlet, constructed by the University of Canterbury, which had legs with three joints each to
study force and position control on even terrain; and lastly, RHex, which came from a
multi-university DARPA-funded effort and rotates its six compliant legs like
wheels. Other impressive robots included AQUA, the amphibious hexapod, which could
change from a walking gait to a swimming gait for inspecting shipwrecks and surveying the
seafloor in high current and turbid environments. These developments in hexapod robots
helped to popularise the utility of this class of robots (Tedeschi & Carbone, 2014).
Tedeschi and Carbone (2014) divided the design of a hexapod into three sections, namely:
the robot body architecture, the kinematic architecture of the legs, and the actuation system.
For the robot body architecture there are two options, either rectangular or hexagonal. Both have
their merits and demerits, and among the most prominent are that rectangular hexapods are
better suited for moving along a straight line, and that the feasibility of their gaits has been
more widely tested. On the other hand, hexagonal robots do not require a turning motion as
they do not have a front or rear as rectangular ones do. Another point of comparison was that
rectangular hexapods had a higher longitudinal stability margin but hexagonal ones had a
higher turning ability and stride. Coming to the second section, the kinematic architecture of
legs, there are many different options to consider depending on the type of workspace,
payload and terrain the robot would be facing. To navigate this section, there are three stages:
choosing the type of leg, the orientation of the legs, and the joint configuration.
Biomimetic options for the legs are inspired by reptiles, mammals or insectoids. Reptilian
legs are placed on the sides of the body with the knees to the side of the base; mammalian
legs are placed under the body and provide less support, but require less energy
as well; and lastly, insectoid legs are on the sides of the body with the knees above
the base level of the body. In contrast to that, non-zoomorphically inspired legs are not
motivated by any natural gait and can be of many different types, such as hybrid legs with
wheels at their ends, or telescopic legs where the segments get smaller the
further they are from the body. After deciding on the type of leg comes the next stage, the
orientation of the legs. There are three options here, which are frontal, where the legs are
perpendicular to the direction of movement, sagittal, where the legs are parallel to the
direction of movement, or circular, where the legs are present radially to the body, allowing
easier movement in any direction. Depending on the type of orientation chosen, the control
methods can be more or less complicated according to the application of the robot. The last
stage of this section is choosing the joint configuration of the legs: the knees pointing
outwards, pointing inwards, or facing the same direction on either side. Of these three, the
option seen most frequently throughout hexapods in history is the knees pointing outwards,
since this grants more dynamic and static stability and does not compromise
the mobility of the robot as the other two configurations do (Tedeschi & Carbone, 2014).
The third and last section regarding hexapod design is the actuation, which can be divided into
two parts: the type of actuator and the arrangement of the actuators. The types of actuators used
in popular hexapods have ranged from electric motors to pneumatic actuators. Electric
rotating motors are cheap, easy to control and easy to power as well, so they make good
actuators for smaller, lighter hexapods. Some hexapods have used linear motors in the past as
well, but their low movable range to weight ratio makes them undesirable to use. Pneumatic
actuators have low stiffness and an inaccurate response but offer a short response time. They do
require a heavy air supply from bottles or compressors, which can make a robot bulkier than it
needs to be. A popular choice among heavier and larger hexapods is hydraulic actuation,
since they have a high power to weight ratio and they can supply high force, but they do need
an engine to drive the pump, so they are suitable only for large hexapods. There has been use
of unconventional actuators such as shape memory alloys, which contract when heated, but
these have been most applicable to microrobots since they have a slow response time. The
second part of this section is the arrangement of the actuators within the
kinematic structure to save energy consumption. There are three popular approaches that are
used in hexapods, namely attaching the actuator at the joints, attaching the actuator using a
pulley and belt, and remote actuation. The first approach is the most straightforward and least
complex with respect to control algorithms. Using a pulley and belt, or a remote actuator
where the actuator is present at the end of the leg segment, usually makes the control
algorithm more complex without providing any prominent merits. This concludes the three
sections of hexapod design.
The next step in the design and implementation of a hexapod robot is gait planning. A gait
can be defined as the sequence of periodic or aperiodic leg motions coordinated with a
sequence of body motions to move the robot in the desired direction and orientation from one
point to another. Traditional periodic gaits for hexapods are inspired by nature and animals,
and the most popular three gaits are the metachronal/wave gait, ripple gait and the tripod gait.
The following image shows how they work, the white representing the foot in contact with
the ground and the black representing the foot being off the ground.
IMAGE
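To make the phasing concrete, the three periodic gaits can be encoded as per-leg phase
offsets within one gait cycle. The sketch below is purely illustrative: the leg labels and the
exact swing fractions are assumptions for demonstration, not values from the thesis.

```python
# Illustrative phasing of the three classic hexapod gaits. Legs are labelled
# L1..L3 (left side, front to rear) and R1..R3 (right side). A leg is in
# swing (off the ground) for `swing` fraction of the cycle, starting at its
# phase offset; otherwise it is in stance, supporting the body.

GAITS = {
    # wave gait: one leg swings at a time, five legs support -> slow, most stable
    "wave": {"swing": 1 / 6,
             "offsets": {"L3": 0 / 6, "L2": 1 / 6, "L1": 2 / 6,
                         "R3": 3 / 6, "R2": 4 / 6, "R1": 5 / 6}},
    # ripple gait: two legs swing at a time, four legs support
    "ripple": {"swing": 1 / 3,
               "offsets": {"L3": 0 / 6, "R2": 1 / 6, "L1": 2 / 6,
                           "R3": 3 / 6, "L2": 4 / 6, "R1": 5 / 6}},
    # tripod gait: alternating tripods (L1, L3, R2) and (R1, R3, L2) -> fastest
    "tripod": {"swing": 1 / 2,
               "offsets": {"L1": 0.0, "L3": 0.0, "R2": 0.0,
                           "R1": 0.5, "R3": 0.5, "L2": 0.5}},
}

def legs_in_swing(gait: str, phase: float) -> list:
    """Return the legs that are off the ground at `phase` in [0, 1)."""
    g = GAITS[gait]
    return [leg for leg, off in g["offsets"].items()
            if (phase - off) % 1.0 < g["swing"]]

print(legs_in_swing("tripod", 0.25))  # -> ['L1', 'L3', 'R2']
```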
As we move from the wave to ripple to tripod gait, the speed of the robot increases, but it is a
tradeoff with the dynamic stability of the robot. Since in the wave gait, one leg moves at a
time, the supporting polygon formed by the other five legs makes for a stable base of the
robot. Similarly, in the ripple gait, two legs move at a time and the other four form a
supporting polygon, and lastly in the tripod gait, three legs move at one time, and the other
three form the supporting polygon. A problem with periodic gaits is that they require
synchronization between the expected and actual times that the legs make contact with the
terrain, which can be problematic over uneven ground, where the leg phasing and stability
are disturbed.
IMAGE
A solution to this is discussed by Ding et al. (2010): free gaits, which are aperiodic gaits
that are non-symmetric and terrain-adaptive. Each leg has a flexible motion as a function of
the overall trajectory of the robot, and many free gaits have been developed over the
years. In recent years, however, simulation-based design has been used more often to
generate gaits. After adding the mechanical structure to a simulation environment,
kinematics and dynamics properties are added, and finally, actuator control. This approach is
a powerful alternative to the classic design methods as it gives more room for gait
optimization and allows testing in a virtual environment, allowing more freedom without
risking damage to a physical prototype.
As for types of algorithms to generate an optimal gait, there are quite a few options, such as
genetic and evolutionary algorithms, or data driven machine learning algorithms. Our focus
will be on reinforcement learning, which is an effective method to tackle the gait generation
problem. The main components of reinforcement learning are the agent and the environment;
the agent in this case is our hexapod robot, and the environment is the virtual
environment it interacts with. At every step of the interaction, the agent sees an
observation of the state of the world and then decides on an action to take. As the
state changes, the reward given to the agent changes as well, depending on our criteria, and
the objective of the agent is to maximise the cumulative reward, called the return. Throughout
this interaction, a state s is a complete description of the state of the world, while an
observation o is a partial description.
In deep reinforcement learning, both are represented with a real-valued vector, matrix or
higher-order tensor, such as representing the state of the robot by its joint angles, velocities
and orientation with respect to the world frame. Our action space, which is the set of valid
actions in the environment, will be continuous, as opposed to discrete, since we have joint
angles and
velocities that can be variable. Our policy, which is the function that takes the state as
input and outputs the action, has many parameters, such as the weights of the neural network.
Lastly comes Q-learning, where the function Q gives us the expected future reward for each
action in each state. Since our goal is to maximise the reward, we in turn also need to
maximise the Q-value function. In a complex environment, however, this function has to
perform an innumerable number of calculations to cover the high number of actions and
states (Achiam, 2020).
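In standard notation (following Achiam, 2020), the discounted return over a trajectory $\tau$
and the Q-value function being maximised can be written as:

$$R(\tau) = \sum_{t=0}^{\infty} \gamma^{t} r_{t}, \qquad Q^{\pi}(s, a) = \mathbb{E}_{\tau \sim \pi}\big[ R(\tau) \mid s_{0} = s,\ a_{0} = a \big]$$

where $\gamma \in (0, 1)$ is the discount factor discussed later in this section.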
When there is an extremely high number of calculations to be done, deep Q-learning is used,
which uses a neural network to predict the Q-values as close to their Q-target as possible. The
goal of this neural network is to reduce the loss between the prediction and target over time
through stochastic gradient descent. However, deep Q-learning applies only to discrete
action spaces, and since ours is continuous, it alone would not solve our problem. In addition
to this, there is the policy gradient method, in which a second neural network approximates
the policy directly: it takes the states as input and returns the actions as output, and its
weights are updated iteratively. Since our goal is to increase the total return, we use gradient
ascent on the policy parameters.
As previously mentioned, deep Q-learning does not apply to our problem since we have a
continuous action space, so we use something called the Actor-Critic model, which is actually
two models, the actor and the critic, working together. The actor is the policy which takes the
states as input and returns actions as output, and is updated through a deterministic policy
gradient algorithm. The critic is what takes the place of the deep Q learning neural network,
and takes both states and actions as inputs, outputting Q-values. This way the critic's
Q-values get closer and closer to the target and, in turn, the Q-values approximate the
expected return, which is used to perform gradient ascent on the actor
(Foy, 2019).
To summarise, according to Fujimoto et al. (2018), the policy $\pi_\phi$, known as the actor,
can be updated through the deterministic policy gradient:

$$\nabla_\phi J(\phi) = \mathbb{E}_{s \sim p_\pi}\left[ \nabla_a Q^\pi(s, a)\big|_{a = \pi(s)} \nabla_\phi \pi_\phi(s) \right]$$

where $Q^\pi(s, a) = \mathbb{E}_{s_i \sim p_\pi,\, a_i \sim \pi}[R_t \mid s, a]$ is the expected
return when performing action a in state s and following $\pi$ thereafter.
Coming to the method we’ll be using, Twin Delayed Deep Deterministic Policy Gradient
(herein referred to as TD3), the Actor-Critic model is used. To reiterate, the actor is the policy
that takes the state as input and outputs action, and the critic takes states and actions as inputs
and outputs a Q-value. In TD3, to improve upon the approximation errors in the Actor-Critic
model, there are two critics instead of one (the "Twin" in TD3), and an experience replay
memory allows the model to store and access past transitions from which it learns the
Q-values. The transitions are stored as (s, s', a, r): the current state, the next state, the
action and the reward. Next, a neural network is built for the actor model and one for the
actor target as well. Both neural networks are initialised the same way, and the weights are
updated over the iterations to return optimal actions. Fujimoto et al. (2018) suggest a specific
architecture of two hidden layers of 400 and 300 units with ReLU activations and a tanh
output layer.
Next, two neural networks are built for the critic models, and two for the critic targets
following the aforementioned architecture. At this moment, we let our two actor neural
networks learn the policy and the four critic neural networks learn the Q-values.
Once the actor target is built, it outputs the actions for the critic targets, which in turn feed
the critic models, and finally, the actor model gets its weights updated through gradient ascent.
Moving on to the training process, a batch of transitions (s, s', a, r) from memory goes
through the following steps: from the next state s', the actor target plays the next action a',
Gaussian noise is added to a', and the values are clipped into a range supported by the
environment. The Gaussian noise lets the agent explore its action space by perturbing the
action. Then the two critic targets take the pair (s', a') as input and return two Q-values,
Qt1(s', a') and Qt2(s', a'), and the minimum of the two is kept as the approximated value of
the next state.
Taking the minimum prevents the approximation error of the classic Actor-Critic model from
growing too large, which in turn stabilises the training process. The aforementioned minimum
Q-value is then used to compute the final target of the two critic models by the formula:

$$Q_t = r + \gamma \min\big(Q_{t1}(s', a'),\ Q_{t2}(s', a')\big)$$
where $\gamma$ is the discount factor, which determines how strongly the agent values future
rewards over immediate ones, i.e. whether it seeks a quick reward or a steadier long-term one.
The two critic models take the pair (s, a) as input and return two Q-values, Q1 and Q2, which
are compared with the target Qt to compute the loss function:

$$\mathcal{L} = \text{MSE}\big(Q_1(s, a),\ Q_t\big) + \text{MSE}\big(Q_2(s, a),\ Q_t\big)$$
The resulting critic loss is backpropagated to update the critic model parameters and reduce
the loss function. This concludes the Q-learning part of the training process, and we move on
to updating the parameters of the actor model so that the agent can perform the optimal action
in each state
to maximise the expected return. The Q-values we obtained from the critic models have a
positive correlation with the expected return so we will use those to perform gradient ascent
by differentiating their output with respect to the weights of the actor model. As the actor
model's weights update in the direction of a higher Q-value, the agent returns better actions
that increase the Q-value of the state-action pair, bringing it closer to the optimal return.
Every two iterations, we will update the actor model by performing gradient ascent on the
output of the first critic model and the weights of the actor target are updated with polyak
averaging. Polyak averaging updates the parameters of the actor target using a weighted sum
of the actor model parameters $\theta$, scaled by a very small number $\tau$ (tau), and the
current target parameters:

$$\theta' \leftarrow \tau \theta + (1 - \tau)\, \theta'$$

This equation represents a slight transfer of weights from the actor model to the target; thus
the actor target gets closer to the model with each iteration, giving the model time to learn
from the target. Similarly, the critic targets' weights are updated with polyak averaging as
well. Since the actor and the targets are updated only every two iterations, this is where the
"Delayed" aspect of TD3 comes in.
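Putting these pieces together, one training iteration can be sketched as follows. This is a
simplified sketch assuming the PyTorch networks above; the function name and the
hyperparameters (tau, gamma, policy_noise, policy_delay) follow common TD3
implementations rather than the thesis repository:

```python
import torch
import torch.nn.functional as F

def td3_update(batch, actor, actor_target, critic1, critic2,
               critic1_target, critic2_target, actor_opt, critic_opt,
               it, gamma=0.99, tau=0.005, policy_noise=0.2,
               noise_clip=0.5, policy_delay=2, max_action=1.0):
    state, next_state, action, reward, done = batch

    with torch.no_grad():
        # Target policy smoothing: next action plus clipped Gaussian noise
        noise = (torch.randn_like(action) * policy_noise).clamp(-noise_clip, noise_clip)
        next_action = (actor_target(next_state) + noise).clamp(-max_action, max_action)
        # Clipped double Q-learning: keep the smaller of the two target Q-values
        target_q = torch.min(critic1_target(next_state, next_action),
                             critic2_target(next_state, next_action))
        target_q = reward + (1 - done) * gamma * target_q

    # Both critics regress towards the same target (critic_opt holds both)
    critic_loss = (F.mse_loss(critic1(state, action), target_q) +
                   F.mse_loss(critic2(state, action), target_q))
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Delayed policy update: only every `policy_delay` iterations
    if it % policy_delay == 0:
        actor_loss = -critic1(state, actor(state)).mean()  # gradient ascent on Q
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

        # Polyak averaging of all target networks
        for net, tgt in [(actor, actor_target), (critic1, critic1_target),
                         (critic2, critic2_target)]:
            for p, tp in zip(net.parameters(), tgt.parameters()):
                tp.data.mul_(1 - tau).add_(tau * p.data)
```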
Being a newer model, TD3 shows a lot of promise and improvement compared to classic
DDPG and the Actor-Critic model, which is why it was chosen to generate the optimal gait
for our hexapod.
For the mechanical design of the hexapod, the three sections of hexapod design described by
Tedeschi & Carbone (2014) were followed. For the robot body architecture, the
rectangular form was chosen over the hexagonal one, since longitudinal stability
was more important: our hexapod, Audax, would incorporate computer vision in the future,
which would mean that a camera placed on the body would give it a "front" and "rear" side,
more in line with a rectangular design. A hybrid design between reptilian and insectoid
legs was chosen for Audax, where the leg is composed of three segments which are nearly
identical in size, allowing for a larger workspace and higher fault tolerance, although it
requires higher torque from the joints. To provide the higher torque, the actuators
chosen were servo motors, since they are easy to control and can generate high torque.
Similarly, to make sure that the control algorithm is not too complex, the actuator
arrangement chosen was attaching the actuators directly at the joints.
As for the material, PLA was chosen, since the body was designed to be modular and easily
changeable and making more parts for the body would be a simple process. PLA is easy to
use with a 3D printer, is lightweight and relatively strong, and small aluminium plates were
added to strengthen the joints of the main body. The parts to be 3D-printed were designed in
the CAD software Autodesk Fusion 360, using direct modelling to create a mesh and model of
the hexapod. The main body is composed of two subparts, the roofplate and the baseplate,
where the baseplate houses the power source and the joints where the legs connect to the
body, and the roofplate houses the computing unit, a Raspberry Pi 4 Model B, along
with the servo HAT attachments that increase the number of channels available to control all
the servos. Each subpart is then divided into thirds which are connected via screws.
IMAGE
The mounting holes in the baseplate are present for servos to fit in. The leg itself is divided
into three segments as well, namely the arm which is connected to the main body, then the
forearm, and finally the hand which comes into contact with the ground. The joints present
are the shoulder joint which connects the body and the arm, the elbow joint, which connects
the arm and the forearm, and finally the wrist joint, which connects the forearm and the hand.
IMAGE
The length of the leg segments was chosen as a hybrid between a quadruped leg and a
hexapod leg. Quadruped legs have two segments that are identically shaped, and hexapod
legs have three segments that get longer as they move further away from the main body.
IMAGES
This design of the legs was chosen to have the possibility of an alternate quadruped gait if
two of the legs were to get damaged, or if the middle two legs were used to perform
dexterous tasks, such as loading or unloading a payload, pressing a button, or pushing a lever.
The longer leg segments allowed for a wider action space, which increased the mobility of
the legs as well. As mentioned previously, servo motors were chosen for the
actuators of the joints as they generate the high torques needed to support the main
body of the robot. At 6 V, the servos generate 15.5 kg·cm of torque, which is more than
sufficient to support the approximately 3.5 kg main body with its weight divided among the
six legs. Each joint on the leg was fastened using metal screws rather than nylon screws to
prevent any deformation. Moreover, the shoulder joints, where the legs join the body, used
servos with a metal servo horn instead of the standard plastic one, since the weight of the
body is applied most at this joint. This measure was taken after deformation occurred in one
of the plastic servo horns.
IMAGE
After the parts were designed in Autodesk Fusion 360, they were printed in PLA and,
simultaneously, the meshes were used to construct a model in Unified Robot Description
Format (URDF).
Before the URDF file was written, the meshes were centered and their centers of mass were
edited using Blender, so that the inertias could be as accurate as possible. A small Python
script by Stavrinos (2020) was used to generate accurate inertia tensors for each part, using
their mass, mesh file and the scale at which they were used. This was crucial to achieve
behaviour in simulation that was as close to reality as possible.
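The same kind of computation can be sketched with the trimesh library; this is a hypothetical
sketch, not the script by Stavrinos (2020), and the file name, scale and mass are placeholders:

```python
import trimesh

# Load a part's mesh, apply the scale it is used at in the URDF,
# and derive the inertia tensor from its measured mass.
mesh = trimesh.load("arm.stl")       # placeholder file name
mesh.apply_scale(0.001)              # e.g. convert mm to m, if needed
mesh.density = 0.05 / mesh.volume    # set density so the total mass is 0.05 kg

print(mesh.center_mass)              # centre of mass in mesh coordinates
print(mesh.moment_inertia)           # 3x3 inertia tensor about the centre of mass
```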
The URDF file contained descriptions of the links, joints and transmissions, and the Gazebo
descriptions for each link and joint, sensor plugins and control plugins. Each link description
contained the inertial, visual, and collision properties of the link (Mishra, 2021).
IMAGE
The joint descriptions referred to the relationship between a parent and a child link, with
dynamics such as damping and friction, and the effort and velocity limits to make the
simulated joints behave realistically.
IMAGE
The transmission element in the URDF file described the relationship between a joint and its
actuator, for example, to describe what kind of interface it had, which in our case was an
effort joint interface, which controls the effort applied at each joint rather than the position of
the joint to get it to its desired angle. The transmission element is necessary to use ROS to
control the joints through the ros_control plugin (Mishra et al., 2021).
IMAGE
As for the Gazebo element, there was an element present to describe each link and joint, and
even the robot itself, for Gazebo. The link descriptions converted the visual colours to Gazebo
format and the STL files to DAE files for better textures, and added sensor plugins, such as a
contact sensor. Furthermore, they provided the option of turning off self-collisions to debug
the code in case there were errors or abnormalities. One part present in the simulation but
absent in the physical robot was a contact sensor at the end of each leg, so that contact with
the ground could be logged to correctly judge the gait. Joint descriptions for Gazebo described
proper damping dynamics for each joint and had the option to add an implicit spring damper
for more realistic movements. Apart from these plugins and descriptions, control plugins
were added as well, such as the aforementioned ros_control plugin to control the actuators of
the robot using ROS, an IMU plugin to receive the pose of the robot and a
p3d_base_controller to receive odometry information for the robot (Mishra et al., 2021).
After that, xacro was used to clean up the file, using local variables where needed and macros
for repeated elements.
The repository contained a total of 10 subdirectories: config, stl, src, models, urdf, rviz, srv,
msg, launch and results. The config folder contained two files, audax_control.yaml and
audax_trajectory_control.yaml, and the former initialised joint position controllers for each
joint along with PID values for the controller; similarly, the latter initialised joint trajectory
controllers for each joint along with PID values for each controller.
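With these controllers loaded, ros_control exposes one command topic per joint by
convention; the sketch below shows how a single joint could be commanded from Python.
The node, namespace and controller names are assumptions following the usual
ros_control naming scheme, not names verified against the actual repository:

```python
#!/usr/bin/env python
import rospy
from std_msgs.msg import Float64

# Each joint position controller subscribes to a <name>/command topic
# carrying the desired joint angle in radians.
rospy.init_node("audax_joint_demo")
pub = rospy.Publisher("/audax/shoulder_1_position_controller/command",
                      Float64, queue_size=1)

rospy.sleep(1.0)           # wait for the publisher connection to establish
pub.publish(Float64(0.3))  # move the joint towards 0.3 rad
```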
The directory named stl contained the STL files for each link of the robot described in the
URDF file, so that they could be imported into the virtual environment in Gazebo to display
accurate visual
and collision properties. The rviz folder contained a configuration file for viewing the Audax
model in RViz to see how the inputs would look in an isolated environment. The launch
directory contained the launch files needed to spawn a model of Audax into the environment,
initialise the controllers and start the TD3 algorithm. The directories src, models, srv, msg
and results are described below in the context of the TD3 model.
After a model of the robot was spawned successfully in Gazebo, the aforenamed TD3 model
was implemented to generate the optimal gait for Audax. A model implemented by Rahme
(2020) to generate an optimised gait for a bipedal robot, PLEN, was adapted for Audax. The
controllers and plugins used for PLEN were the same ones used for Audax. However, since
Audax has six legs, the computations for the simulation were significantly slower: PLEN's
Gazebo simulation ran at approximately 1.30 times real-time speed, whereas Audax ran at
0.40 times, which increased the training time required for TD3 with Audax.
Let us briefly describe the subdirectories in the repository which are concerned with the TD3
model. The directory src contained the code which implemented the modified Actor-Critic
models for ROS to interact with them. The msg folder contained the .msg file which
described the data values that the ROS nodes initialised from the launch files used, and
similarly, srv contained the service description for the ROS services used, such as
SpawnModel, DeleteModel, UnloadController and others. The data regarding the weights and
optimisation of the actor and critic models was stored in the models directory, and the
training outcomes in the results directory.
The compute reward function for the robot would decrease the reward under the following
conditions: (1) being dead at the end of the action, which would occur if the pitch or the
roll exceeded 90 degrees in either direction, meaning Audax had tipped over;
(2) straying too far along the y-axis, so that the motion would stay in a straight
line; (3) the height of the main body being too close to the ground; and (4) not having
moved significantly. The conditions which would increase the reward were as follows:
(1) for being alive at the end of the action i.e. not resulting in any of the aforementioned
conditions; (2) for achieving a positive velocity in x-axis; (3) for staying on the x-axis, thus
going in a straight line; (4) for staying upright with regards to roll and pitch; and (5) for
facing forward.
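A simplified sketch of such a reward function is shown below; the thresholds and weights
are illustrative placeholders, not the values from the repository's src directory:

```python
import numpy as np

def compute_reward(pose, prev_pose):
    """Illustrative reward shaping for a forward-walking hexapod.

    pose, prev_pose: dicts with x, y, z position (m) and
    roll, pitch, yaw orientation (rad) of the main body.
    """
    # (1) Dead if tipped over: pitch or roll beyond 90 degrees ends the episode
    if abs(pose["roll"]) > np.pi / 2 or abs(pose["pitch"]) > np.pi / 2:
        return -100.0, True

    reward = 1.0                                   # alive bonus
    reward += 50.0 * (pose["x"] - prev_pose["x"])  # forward progress along x
    reward -= 10.0 * abs(pose["y"])                # straying from the x-axis
    reward -= 5.0 * (abs(pose["roll"]) + abs(pose["pitch"]))  # staying upright
    reward -= 5.0 * abs(pose["yaw"])               # facing forward

    if pose["z"] < 0.05:                           # body too close to the ground
        reward -= 10.0
    return reward, False
```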
The training of the model took place for 48 hours in real time, and 24 hours in simulation
time, which led to the policy and the Q-values being optimised, and at the end, providing us
with a gait which was optimal for this model. After 319,999 iterations, the algorithm was
stopped and the policy was implemented in Gazebo to see the results. It showed the actions
3.4 Circuitry
Coming to the assembly of the robot, the Raspberry Pi 4, being the computing unit, was
connected to a power bank, and two servo HATs were stacked on top of each other to provide
32 channels, of which 18 were used to control the servos. The servos themselves were
powered through the servo HATs from a separate supply consisting of lithium-ion batteries,
which were the most appropriate choice given their high power-to-weight ratio. Each battery
was 3.7 V with 3400 mAh at full charge; two of them were connected in parallel to double
the capacity to 6800 mAh, and two such parallel units were connected in series to provide
7.4 V. Each servo HAT was powered by such a battery unit, and
the batteries were housed on the baseplate. As each leg consisted of 3 servo motors and there
were 6 legs, for a total of 18 servo motors, the servos were evenly divided amongst the two
servo HATs so that each HAT would be connected to 9 motors, controlling 3 legs. The circuit
diagram shown below illustrates the connection of the battery unit and the motors to each
HAT.
Subsequently, the two HATs were controlled by the Raspberry Pi, and the picture below
shows this arrangement. The HATs from Adafruit give the Raspberry Pi more channels to
easily control and power the servos individually without compromising the Pi's own power
source, a power bank with a steady 5 V output, as opposed to the roughly 7 V the servos
run on.
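Driving the servos from the Pi through the stacked HATs can be sketched with Adafruit's
ServoKit library. The I2C addresses shown are the PCA9685 default 0x40 and, with one
address jumper soldered, 0x41; the channel-to-leg mapping is illustrative:

```python
from adafruit_servokit import ServoKit

# One ServoKit instance per HAT; the second HAT is reachable at 0x41
# via its address jumper.
kit_a = ServoKit(channels=16, address=0x40)  # legs 1-3 (9 servos)
kit_b = ServoKit(channels=16, address=0x41)  # legs 4-6 (9 servos)

# Move the shoulder, elbow and wrist of the first leg to a neutral pose.
for channel, angle in [(0, 90), (1, 90), (2, 90)]:
    kit_a.servo[channel].angle = angle
```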
4 Evaluation
One of the initial problems faced was that although the overall centre of mass of the robot
was central, the centres of mass of the two front legs, which faced outwards at the front of
the robot, were causing it to fall forwards. Attaching counterweights on the other side was
considered to prevent the falling, but ultimately the front legs were reoriented instead.
Another problem that arose was that, since PLA is lightweight but not very strong, the
originally designed fasteners between the baseplate segments snapped off at the slightest
pressure. To counter this issue, the previously mentioned aluminium plates were used to
fasten them together, providing mechanical support as well, since they are more durable
than PLA.
Lastly, a problem that arose during testing was the absence of cable management, since
adjustments had to be made when setting the maxima and minima of the servos through
observation and trial and error. These maxima and minima were set so that no collisions
would happen even if odd values were transmitted to the Pi, but this required consistently
changing the way the servo horns were fastened to the servos. The cables were often a
hindrance; to solve that, it was decided that they would be zip-tied to the closest part of
the frame.
4.2 Testing
Before testing the prototype of Audax, a prototype of a single leg was built and tested to plot
current consumption and possible length of a single stride. The leg was clamped to hold the
shoulder joint and was connected to a DC supply of 5V. To measure the current, a magnetic
clamp was used to view the current consumption on an oscilloscope, and to measure the
stride, the contact point of the hand segment was marked where it touched the ground during
a single stride motion.
As the graph above shows, the average current consumption was around 0.865 A, with peaks
of 1.23 A when the movement began and a constant running current of nearly 0.5 A.
Moreover, the length of the stride seemed promising at 30 cm, meaning the robot could cover
considerable ground with each gait cycle.
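As a rough sanity check from these bench measurements (a naive estimate assuming all six
legs draw the single-leg average simultaneously at the 5 V test voltage):

$$P_{\text{leg}} = V \cdot I_{\text{avg}} = 5\,\text{V} \times 0.865\,\text{A} \approx 4.3\,\text{W}, \qquad P_{\text{robot}} \approx 6 \times 4.3\,\text{W} \approx 26\,\text{W}$$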
After assembling Audax from its parts and fastening all the joints, random joint position
inputs were given to test the functionality of the legs as a manual diagnostic run, and it was
verified that the wiring was not faulty. First, the trajectory optimisation algorithm developed
by Ayush Kumar Chaudary was implemented. A ROS node subscribing to the rostopic on
which the joint state messages were published was created and run on the Pi. Then,
using this subscriber code, the joint state values were mapped from the minima and maxima
of the simulation to the minima and maxima of each of the servos present to ensure that no
damage would occur if there were any abnormalities in the transmitted joint position values.
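The mapping itself is a linear rescaling with clamping; a minimal sketch follows, where the
limit values are placeholders for the per-servo limits found by trial and error as described
above:

```python
def map_to_servo(value, sim_min, sim_max, servo_min, servo_max):
    """Linearly map a simulated joint value into a servo's safe range,
    clamping so out-of-range inputs can never drive the servo past
    its mechanical limits."""
    t = (value - sim_min) / (sim_max - sim_min)
    t = min(max(t, 0.0), 1.0)  # clamp to [0, 1]
    return servo_min + t * (servo_max - servo_min)

# e.g. a simulated angle of -0.2 rad in [-1.57, 1.57] -> servo degrees in [20, 160]
angle_deg = map_to_servo(-0.2, -1.57, 1.57, 20.0, 160.0)
```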
The results are summarised in the videos and images below:
As can be seen in the videos and images, although the mechanical structure seemed to bear
the weight and the servos were able to move to their mapped position, the overall gait did not
seem to be very effective in moving forward. When the speed of transmission of values was
slowed, the movements tended to be rigid and uncoordinated, making the gait ineffective.
However, at higher speeds, the movements were more fluid and propelled the robot in a
forward direction. The adaptability of the gait from simulation to real-life was low, and it
seemed that traditional gaits such as the tripod gait would be more effective.
Next, the optimal policy provided by the TD3 model was implemented, and it yielded similar
results. The movements were uncoordinated and unstable, and did not produce a gait that
was effective in real life. Even in simulation, the optimal actions played in quick succession
in Gazebo produced fidgety movements, but as the policy selected by the algorithm for the
highest return after a simulation runtime of one day, it was the one implemented.
The gap between the two models and reality seemed large; there are limitations which help
explain this.
4.3 Limitations
The most prominent limitation was that of computing power and complexity. The TD3 model
ran for 68 hours in real time, yet due to the weak processing power of the GPU used, it only
reached 30 hours in simulation time. This limitation caused the policy to not be fully
optimised, since it only ran 319,999 iterations, whereas for Rahme (2020) it took 1,999,999
iterations to reach the policy which he used. This illustrates that without sufficient computing
power, it is difficult to train a fully optimised policy with TD3.
Apart from that, although lightweight and cheap, a better alternative to the PLA baseplate and
roofplate used would have been an aluminum plate, since it is durable, strong, and would be
in one part instead of three. One of the reasons the baseplate and roofplate had to be split in
thirds was that the 3D printer available was not large enough to print the whole thing as one
unit.
Lastly, the lack of a regulated testing environment made it difficult to properly document and
evaluate the results.
5 Conclusion
5.1 Objectives
The objective of this thesis was to incorporate the qualities of both hexapods and quadrupeds
together to build and control a versatile robot. Since most robots in the current market
gravitate towards designs like Spot from Boston Dynamics, the versatile class of hexapods is
being overlooked. The design proposed in this thesis pushes for a more dexterous and agile
robot, one that can adapt to any terrain and perform with minimal supervision. The results
and work from this thesis confirm the feasibility of such a robot and, albeit the control
algorithms implemented require more work to reach the optimal level, a platform for more
research has been established, with the groundwork and foundation for further research laid.
To ensure that the prototype developed can be used for long-term research, it is important to
reinforce the parts and change to a more durable material, such as aluminium. Using PLA
parts for the legs over a long time could cause warping and bending of the parts, so changing
them would extend the prototype's usable life. Further training would also improve the
agent, and by using a computer equipped with a more powerful GPU, the run time could be
reduced considerably. In the future, to allow autonomous control, space has been designated
for a Raspberry Pi camera module to support the planned computer vision capabilities.
Lastly, to employ fault-tolerant gaits, it is necessary to have sensors which indicate or monitor
whether a leg is functional, so that a new control algorithm can be deployed, such as a
gait for the 4 hind legs if the front two are non-functional, or for the 4 outer legs if the middle
two are being used for dexterous tasks.
References
Bo, J., Cheng, C., Wei, L., & Xiangyun, L. (2011). Design and Configuration of a Hexapod Walking Robot.
Dickson, B. (2020, December 15). TechTalks. https://bdtechtalks.com/2020/12/15/boston-dynamics-hyundai-acquisition/
Ding, X., Wang, Z., Rovetta, A., & Zhu, J. M. (2010). Locomotion Analysis of Hexapod Robot.
Fujimoto, S., van Hoof, H., & Meger, D. (2018). Addressing Function Approximation Error in Actor-Critic Methods.
Lima, P. U., & Ventura, R. (2012). Search and Rescue Robots: The Civil Protection Teams of the Future. https://doi.org/10.1109/est.2012.40
Meng, X., Wang, S., Cao, Z., & Zhang, L. (2016). A review of quadruped robots and environment perception. https://doi.org/10.1109/chicc.2016.7554355
Preumont, A., Alexandre, P., & Ghuys, D. (1991). Gait analysis and implementation of a six leg walking machine.
Silva, M. F., & Machado, J. T. (2007). A Historical Perspective of Legged Robots. Journal of Vibration and Control.
Tedeschi, F., & Carbone, G. (2014). Design Issues for Hexapod Walking Robots. Robotics.