
Abstract

The aim of this bachelor thesis is to elaborate on the development and implementation of a versatile hexapod robot that can navigate and explore terrains which are difficult for humans to reach, such as disaster sites or inaccessible cultural heritage areas. The goal of developing such a robot is to combine functional aspects of today's intelligent robots into a more adaptable and versatile robot that can autonomously perform missions without human supervision. A hexapod configuration was chosen for exploration and navigation, and the design allows the robot to change its gait to match the required energy consumption or movement efficiency, from walking in a tripod gait to switching to a quadruped configuration. To keep the robot modular and easily changeable, PLA was used to construct the parts of the body, and servo motors were used for actuation. For control, a Raspberry Pi was used, and the control policy was trained using reinforcement learning, specifically Twin Delayed Deep Deterministic Policy Gradient (TD3), so that the algorithm could find a suitable gait based on the criteria we provided, using ROS and Gazebo. A prototype of this robot was assembled, tested and analysed to gauge its functionality.

Keywords: Hexapod, CAD, Reinforcement Learning, DDPG, TD3, ROS, Gazebo.


1 Introduction

1.1 USAR and Exploration Robots

Curiosity can be regarded as humanity's most defining trait, and a fundamental instinct. From that curiosity has stemmed exploration, which humans have pursued for millennia, whether it be the exploration of space, of the oceans, or of our own archaeologically significant sites. Over the course of human history, we have developed and created many robots to lead our exploration, such as the Curiosity rover exploring Mars or Sentry exploring the oceans. Applications of these robots in our lives extend from small everyday conveniences in our homes to helping humankind in times of distress, such as disaster relief.

One quality that allows them to help us is that small robots have access to spaces that might be too confined for a person, carrying tiny cameras and sensors with them to detect survivors and map their locations. Similarly, aerial robots can provide a broad

view of the search and rescue scenario and autonomous robots can increase the amount of

information available to rescue teams (Lima & Ventura, 2012). Well known work on Urban

Search and Rescue (USAR) robots in the US has aided in the rescue operations of the World

Trade Center after the 9/11 attacks, and this technology has also been tested during

catastrophes such as Fukushima and Chernobyl, and natural disasters like Hurricane Katrina (Lima & Ventura, 2012). These robots come in diverse configurations, some being wheeled robots and others being legged robots.

1.2 Role of Legged Robots

In recent years, however, multi-legged walking robots have been broadly recognised for their potential to be effective and efficient means of transportation on irregular and difficult terrain. Compared with wheeled or tracked locomotion, legged

robots have a variety of advantages in adapting to scenarios that have difficult terrain, such as

the aforementioned search and rescue missions, or exploration of volcanoes, for example.

Legged robots, having discrete contact points with the ground surface, as compared to wheeled robots, which require continuous contact, are advantageous in an unstructured

environment, and thanks to the agility and flexibility from the degrees of freedom from the

leg mechanisms, the body trajectories are less affected by uneven ground (Bo et al., 2011). Among legged robots, a specific class, namely hexapods, stands out. Hexapods have six legs, which gives them more versatility compared to other legged robots (Ding et al., 2010). Moreover,

hexapods can turn on the spot easily, and they have been found to be the fastest among legged robots, as adding more than six legs was observed not to improve walking speed. Along with this, hexapods showed the highest static as well as dynamic stability

(Preumont et al., 1991). In addition to this, hexapods can modify their gait to accommodate

the loss of a limb or even use a limb for more complex tasks, such as pressing buttons (Ding

et al., 2010).

Apart from hexapod robots, another class of robots that have gained popularity over the last

decade is biomimetic robots, specifically quadrupeds. The most famous quadruped, Spot, has

recently been commercialised and sold nearly 400 units, at $75,000 each, amounting to

approximately 30 million dollars (Dickson, 2020). Apart from this indicator of popularity, a

lot of research has gone into quadrupeds, starting from the 1970s with the Hirose Fukushima

Robotics Lab working on their TITAN series for virtually 40 years. Moreover, we have iconic

robots like ANYmal, MIT Cheetah, and the Big Dog series. These robots performed well,

going up to speeds of 45 km/h (CHEETAH), starting, turning and stopping immediately, and

even performing complex movements such as somersaults and backflips (MIT Cheetah)

(Meng et al., 2016).


1.3 Control Policies

Control methods for robots can be divided into two broad categories, traditional and new. Traditional control methods use a robotics control pipeline involving six steps: observation; state estimation; modelling and prediction; planning; low-level control; and deployment. Newer methods use artificial intelligence to collapse the steps between observation and deployment into a single training stage, using techniques such as deep learning, reinforcement learning and machine learning (Vilches, 2017).

The former approach makes use of model-based properties such as kinematics and dynamics

so that a predetermined control policy can be implemented manually, such as a tripod gait, or

a wave gait. These gaits are usually not too adaptable to new terrain and need to be modified

by sensor data. However, the latter approach can be used to generate a gait which is optimal

for the robot and the conditions it is in using techniques like reinforcement learning, which

uses the model of the robot to simulate a series of inputs to each joint, rewarding or

penalising the robot based on its progress. For example, if the robot tips over, it is penalised,

but if it covers some distance in a straight line, it is rewarded. After training the robot this

way, an optimum gait is reached which has the highest reward. In contrast to other machine learning paradigms, the algorithm in reinforcement learning reacts to an environment instead of being task-driven (as in supervised learning) or data-driven (as in unsupervised learning) (Shaikh, 2017).

1.4 Thesis Objective

In this thesis, the objective is to assimilate qualities of both hexapods and quadrupeds, to

develop and build a prototype for a versatile robot that can perform dexterous tasks and adapt

to the given environment, and then evaluate the performance of such a robot based on widely

recognised tests. The applications of the robot would range from household assistance to
USAR and exploration of unseen and uneven terrain. This objective extends to choosing a

well-suited gait using reinforcement learning to allow smooth operability on any terrain and

to have an intelligent robot that would require minimal supervision.


2 Theoretical Background

2.1 History of Legged Robots

A substantial part of the terrain on Earth is inaccessible to exploration with only wheeled

locomotion. In particular, natural terrain has obstacles like large rocks, loose soil, deep

ravines and steep slopes which leave wheeled locomotion not only inefficient, but in many

cases, ineffective. In these types of landscape, legs have been proven suitable for locomotion, since their discrete contact with the ground allows small obstacles to be avoided and undesirable footholds to be rejected. Moreover, these mechanisms can climb over obstacles and step over ditches while staying level and stable, a feat that wheeled locomotion would

only achieve at a small scale. Along with that, the use of multiple degrees of freedom in the

leg joints gives legged vehicles the ability to change their heading without slippage. A unique

advantage that legged systems provide is that they have a higher tolerance for failure during

static stable locomotion as compared to that of a wheeled robot. For example, the impact of a

wheel failing in a wheeled vehicle is detrimental to its locomotion and mobility, and in

contrast, legged robots can possess redundant legs while still maintaining static balance as

well as continuing locomotion even with one or more damaged legs (Silva & Machado,

2007).

That being said, there are limitations which should be considered while evaluating a legged

robot. Although superior to wheeled locomotion, legged robots often have low speeds, can be

difficult to build and require more complicated control algorithms. Apart from this, their high

degrees of freedom can lead to a lot of consumption of energy as well (Silva & Machado,

2007).

Despite these shortcomings, legged robots have been researched as early as 1878, with the

analysis of gaits used by different mammals, and by the mid 1950s, different laboratories
were designing and building walking robots. After that came a flurry of walking robots, although rudimentary in design and implementation, such as the six-legged machine by the Space General Corporation for space travel in the early 1960s; the General Electric Quadruped in 1968, which was the first to use different gaits; the Phoney Pony in 1966, which was the first robot to move autonomously under computer control and electrical actuation; and many more (Silva & Machado, 2007).

Figure 1.

General Electric Quadruped (left) and Phoney Pony (right).

Note. Adapted from John Wiley and Sons

From there, research in legged robots has come a long way, giving rise to different classes of legged robots, namely monopods, bipeds, and so on, according to the number of limbs required to walk. Out of these classes, one stands out the most: the hexapod class.

2.2 Hexapods
Hexapods, defined as six-legged robots, have been used for exploration of remote locations

and hostile environments such as the seabed, in nuclear power stations, in space or on other

planets, and in search and rescue operations as well (Tedeschi & Carbone, 2014).

Referring to the aforementioned high failure tolerance, hexapods could even utilise one or

two limbs to perform a dextrous task, such as pushing a button or loading and unloading a

payload. They have greater mobility in natural surroundings than other legged robots and

have a lower impact on the terrain as well, which can be crucial in remote or hostile

environments. Additional features which merit the development of hexapods are that they can have variable geometry and that their static and dynamic stability is the highest among legged robots (Tedeschi & Carbone, 2014).

A successful early design traced back to the University of Rome in 1972 was constructed as a

computer-controlled walking machine with electric drives, and another design from Ohio

State University in 1977 was the OSU Hexapod. These robots could walk short distances and

even traverse obstacles, but they were heavy and lacked versatility and adaptability. By the

early 90s, many renowned organisations and universities such as Mobot Lab and TUM had

built hexapods which showed higher mobility and computing power, such as the Attila and

Hannibal robots, which possessed over 19 degrees of freedom, more than 60 sensory inputs, 8

microprocessors and real-time behaviour.

However, in the last two decades, there has been even more rapid development in control systems technology, which has led to hexapods being equipped with various sensors and

controlled with artificial intelligence systems to analyse the environment and motion of the

robots on complex surfaces. The robots of this generation were fast and agile, such as the

Biobot, which was built as a biomimetic robot modeled after the American cockroach;

Hamlet, constructed by the University of Canterbury, which had legs with three joints each to

study force and position control on uneven terrain; and lastly, RHex, which came from multi-university DARPA-funded efforts and uses six compliant legs that rotate like wheels. Other impressive robots included AQUA, the amphibious hexapod, which could

change from a walking gait to a swimming gait for inspecting shipwrecks and surveying the

seafloor in high current and turbid environments. These developments in hexapod robots

helped to popularise the utility of this class of robots (Tedeschi & Carbone, 2014).

Tedeschi and Carbone (2014) divided the design of the hexapod into three sections, namely:

robot body architecture; kinematic architecture of legs; and actuator.

For robot body architecture there are two options, either rectangular or hexagonal. Both have

their merits and demerits, and among the most prominent are that rectangular hexapods are

better suited for moving along a straight line, and that the feasibility of their gaits has been

more widely tested. On the other hand, hexagonal robots do not require a turning motion as

they do not have a front or rear as rectangular ones do. Another point of comparison was that

rectangular hexapods had a higher longitudinal stability margin but hexagonal ones had a

higher turning ability and stride. Coming to the second section, the kinematic architecture of

legs, there are many different options to consider depending on the type of workspace,

payload and terrain the robot would be facing. To navigate this section, there are three stages,

firstly, choosing whether the legs should be biomimetic or non-zoomorphically inspired.

Biomimetic options for the legs are inspired by reptiles, mammals or insectoids. Reptilian legs are placed on the sides of the body with the knees to the side of the base; mammalian legs are placed under the body and provide less support, but require less energy as well; and lastly, insectoid legs are on the sides of the body with the knees above

the base level of the body. In contrast to that, non-zoomorphically inspired legs are not

motivated by any naturally inspired gait and can be many different types, such as hybrid with

wheels at the end of the legs, or telescopic legs where the segments of the leg get smaller the

further you get from the body. After deciding on the type of leg, comes the next stage, the
orientation of the legs. There are three options here, which are frontal, where the legs are

perpendicular to the direction of movement, sagittal, where the legs are parallel to the

direction of movement, or circular, where the legs are present radially to the body, allowing

easier movement in any direction. Depending on the type of orientation chosen, the control

methods can be more or less complicated according to the application of the robot. The last

stage of this section is choosing the joint configuration of the legs: the knees pointing outwards, the knees pointing inwards, or all knees pointing to the same side. Of these three, the option seen most frequently throughout hexapod history is the knees pointing outwards, since it grants more dynamic and static stability and does not compromise the mobility of the robot as the other two configurations do (Tedeschi & Carbone, 2014).

The third and last section regarding hexapod design is the actuator, and can be divided into

two parts, the type of actuator and the arrangement of the actuator. The type of actuators used

in popular hexapods have ranged from electric motors, to pneumatic actuators. Electric

rotating motors are cheap, easy to control and easy to power as well, so they make good

actuators for smaller, lighter hexapods. Some hexapods have used linear motors in the past as

well, but their low movable range to weight ratio makes them undesirable to use. Pneumatic

actuators have low stiffness and an inaccurate response but offer a short response time. They do require a heavy air supply from bottles or compressors, which can make a robot bulkier than it

needs to be. A popular choice among heavier and larger hexapods is hydraulic actuation, since it has a high power-to-weight ratio and can supply high force; however, it needs an engine to drive the pump, so it is suitable only for large hexapods. There has been use

of unconventional actuators such as shape memory alloy, which contracts when heated, but

they have been most applicable to microrobots since they have a slow response time and

small operating forces.


Actuator arrangements are optimised to obtain maximum leg workspace and minimal

kinematic structure to save energy consumption. There are three popular approaches that are

used in hexapods, namely attaching the actuator at the joints, attaching the actuator using a

pulley and belt, and remote actuation. The first approach is the most straightforward and least

complex with respect to control algorithms. Using a pulley and belt, or a remote actuator where the actuator is placed at the end of the leg segment, usually makes the control algorithm more complex without providing any prominent merits. This section concludes the

mechanical design of the hexapod.

2.3 Gait Planning and Algorithms

The next step in the design and implementation of a hexapod robot is gait planning. A gait

can be defined as the sequence of periodic or aperiodic leg motions coordinated with a

sequence of body motions to move the robot in the desired direction and orientation from one

point to another. Traditional periodic gaits for hexapods are inspired by nature and animals,

and the most popular three gaits are the metachronal/wave gait, ripple gait and the tripod gait.

The following image shows how they work, the white representing the foot in contact with

the ground and the black representing the foot being off the ground.

[Figure: leg lift sequences for the wave, ripple and tripod gaits; white marks a foot in contact with the ground, black a foot in swing.]

As we move from the wave to ripple to tripod gait, the speed of the robot increases, but it is a

tradeoff with the dynamic stability of the robot. Since in the wave gait, one leg moves at a

time, the supporting polygon formed by the other five legs makes for a stable base of the

robot. Similarly, in the ripple gait, two legs move at a time and the other four form a

supporting polygon, and lastly in the tripod gait, three legs move at one time, and the other

three form the supporting polygon. A problem with periodic gaits is that they require

synchronization between the expected and actual times that the legs make contact with the
terrain, which can be problematic over uneven ground, where the leg phasing and stability are disturbed (Gurel, 2017).

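To make the tradeoff concrete, such periodic gaits can be parameterised by a duty factor (the fraction of the cycle each leg spends in stance) and a phase offset per leg. The following minimal sketch is our own simplified illustration, not code from the cited sources; the offsets shown are one common choice:

```python
# Minimal sketch of periodic hexapod gaits, parameterised by a duty factor
# (fraction of the cycle a leg spends in stance) and a phase offset per leg.
GAITS = {
    # name: (duty factor, phase offsets for legs 0..5)
    "wave":   (5 / 6, [0 / 6, 1 / 6, 2 / 6, 3 / 6, 4 / 6, 5 / 6]),  # one leg in swing
    "ripple": (4 / 6, [0 / 6, 1 / 6, 2 / 6, 3 / 6, 4 / 6, 5 / 6]),  # two legs in swing
    "tripod": (3 / 6, [0.0, 0.5, 0.0, 0.5, 0.0, 0.5]),              # alternating triples
}

def stance_legs(gait, phase):
    """Return, per leg, whether it is in stance at cycle phase in [0, 1)."""
    duty, offsets = GAITS[gait]
    return [((phase + off) % 1.0) < duty for off in offsets]

for gait in GAITS:
    print(gait)
    for step in range(6):
        legs = stance_legs(gait, step / 6)
        print("  " + "".join("#" if s else "." for s in legs))  # '#' = stance
```

Printing the stance patterns over one cycle shows five supporting legs for the wave gait, four for the ripple gait and three for the tripod gait, matching the supporting polygons described above.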

A solution to this is discussed by Ding et al. (2010): free gaits, which are aperiodic gaits

that are non-symmetric and terrain adaptive. Each leg has a flexible motion as a function of

the overall trajectory of the robot and there have been many free gaits developed over the

years. In recent years, however, simulation-based design has been used increasingly often to generate gaits. After adding the mechanical structure to a simulation environment,

kinematics and dynamics properties are added, and finally, actuator control. This approach is

a powerful alternative to the classic design methods as it gives more room for gait

optimization and allows testing in a virtual environment, allowing more freedom without

risking damage to any parts (Tedeschi & Carbone, 2014).

As for types of algorithms to generate an optimal gait, there are quite a few options, such as

genetic and evolutionary algorithms, or data driven machine learning algorithms. Our focus

will be on reinforcement learning, which is an effective method to tackle the gait generation

problem. The two main entities in reinforcement learning are the agent and the environment: the agent in this case is our hexapod robot, and the environment would be the virtual

environment it interacts with. At every step of the interaction, the agent would see an

observation of the state of the world and then it would decide on an action to take. As the

state changes, depending on our criteria, the reward given to the agent changes as well, and

the objective of the agent is to maximise the cumulative reward, called the return. Through

this process, the agent learns behaviours to achieve its goal.


To understand the algorithm we will use, some terminology is important to know. A state s is

the complete description of the state of the world and an observation o is a partial description.

In deep reinforcement learning, both are represented with a real valued vector, matrix or a

high order tensor, such as representing the state of the robot by its joint angles, velocity and

orientation with respect to the world frame. Our action space, which is a set of valid actions

in the environment, will be continuous, as opposed to discrete, since we have joint angles and

velocities that can be variable. Our policy, the function that takes the state as input and outputs the action, has many parameters, such as the weights of a neural network.

Lastly comes Q-learning, where the function Q gives us the expected future reward for each action in each state. Since our goal is to maximise the reward, we also need to maximise the Q-value function. In a complex environment, however, computing this function requires an intractable number of calculations over the many possible actions and states (Achiam, 2020).
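In symbols, following standard reinforcement learning notation (Achiam, 2020), the discounted return of a trajectory $\tau$ and the Q-function can be written as:

$$R(\tau) = \sum_{t=0}^{\infty} \gamma^{t} r_{t}, \qquad Q^{\pi}(s, a) = \mathbb{E}_{\tau \sim \pi}\left[ R(\tau) \mid s_{0} = s,\, a_{0} = a \right],$$

where $\gamma \in (0, 1)$ is the discount factor and the expectation is taken over trajectories generated by following the policy $\pi$.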

When there is an extremely high number of calculations to be done, deep Q-learning is used,

which uses a neural network to predict the Q-values as close to their Q-target as possible. The

goal of this neural network is to reduce the loss between the prediction and target over time

through stochastic gradient descent. However, deep Q-learning applies only to discrete action spaces, so it does not suit our continuous problem. In addition to this, there is the policy gradient, in which a second neural network approximates the policy by updating its weights; it takes the states as input and the actions as outputs. Since our goal is to increase the total return, we use gradient ascent to update the parameters, i.e. the weights of the policy.

As previously mentioned, deep Q-learning does not apply to our problem since we have a continuous action space, so we use the Actor-Critic model, which is actually

two models, the actor and the critic, working together. The actor is the policy which takes the

states as input and returns actions as output, and is updated through a deterministic policy

gradient algorithm. The critic is what takes the place of the deep Q learning neural network,

and takes both states and actions as inputs, outputting Q-values. This way, we use gradient descent on our critic model to bring the Q-values closer and closer to the target, and in turn, we use the Q-values to approximate the expected return, which is used to perform gradient ascent on the actor (Foy, 2019).

To summarise, according to Fujimoto et al. (2018), the policy, known as the actor, can be updated through the deterministic policy gradient algorithm:

$$\nabla_{\phi} J(\phi) = \mathbb{E}_{s \sim p_{\pi}}\!\left[ \nabla_{a} Q^{\pi}(s, a)\,\big|_{a = \pi(s)}\, \nabla_{\phi} \pi_{\phi}(s) \right]$$

where $Q^{\pi}(s, a) = \mathbb{E}_{s_i \sim p_{\pi},\, a_i \sim \pi}\left[ R_t \mid s, a \right]$, the expected return when performing action $a$ in state $s$ and following $\pi$ thereafter, is known as the critic or the value function.

Coming to the method we’ll be using, Twin Delayed Deep Deterministic Policy Gradient

(herein referred to as TD3), the Actor-Critic model is used. To reiterate, the actor is the policy

that takes the state as input and outputs action, and the critic takes states and actions as inputs

and outputs a Q-value. In TD3, to reduce the approximation errors of the Actor-Critic model, there are two critics instead of one (the "Twin" in TD3), and each critic proposes a different Q-value (Foy, 2019).


The first step in implementing a TD3 model is to initialise Experience Replay memory, which

allows the model to store and access past transitions from which it learns the Q-values. The

transitions are stored as (s, s′, a, r): the current state, the next state, the action and the reward. Next, a neural network is built for the actor model and one for the actor target as

well. Both the neural networks are initialised the same way, and the weights are updated over

the iterations to return optimal actions. Fujimoto et al. (2018) suggest a specific architecture

for the neural networks as follows:
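A minimal PyTorch sketch of this architecture, with the two hidden layers of 400 and 300 units used by Fujimoto et al. (2018), together with the Experience Replay memory described above, is given below (our own illustrative reconstruction, not the code used in this thesis):

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F


class ReplayBuffer:
    """Experience Replay memory: stores transitions and samples mini-batches."""

    def __init__(self, capacity=1_000_000):
        self.storage = deque(maxlen=capacity)

    def add(self, s, s2, a, r, done):
        self.storage.append((s, s2, a, r, done))

    def sample(self, batch_size):
        batch = random.sample(self.storage, batch_size)
        s, s2, a, r, d = map(list, zip(*batch))
        t = lambda x: torch.as_tensor(x, dtype=torch.float32)
        return t(s), t(s2), t(a), t(r).unsqueeze(1), t(d).unsqueeze(1)


class Actor(nn.Module):
    """Policy network: state -> action, squashed by tanh to the joint limits."""

    def __init__(self, state_dim, action_dim, max_action):
        super().__init__()
        self.l1 = nn.Linear(state_dim, 400)
        self.l2 = nn.Linear(400, 300)
        self.l3 = nn.Linear(300, action_dim)
        self.max_action = max_action

    def forward(self, state):
        x = F.relu(self.l1(state))
        x = F.relu(self.l2(x))
        return self.max_action * torch.tanh(self.l3(x))


class Critic(nn.Module):
    """Q-network: (state, action) -> scalar Q-value; TD3 uses two of these."""

    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.l1 = nn.Linear(state_dim + action_dim, 400)
        self.l2 = nn.Linear(400, 300)
        self.l3 = nn.Linear(300, 1)

    def forward(self, state, action):
        x = F.relu(self.l1(torch.cat([state, action], dim=1)))
        x = F.relu(self.l2(x))
        return self.l3(x)
```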

Next, two neural networks are built for the critic models, and two for the critic targets

following the aforementioned architecture. At this moment, we let our two actor neural

networks learn the policy and the four critic neural networks learn the Q-values.
Once the actor target is built, it outputs the actions for the critic targets, which in turn feed the critic models; finally, the actor model gets its weights updated through gradient ascent.

Moving on to the training process, a batch of transitions (s, s′, a, r) from memory goes through the following steps: from the next state s′, the actor target plays the next action a′, Gaussian noise is added to a′, and the values are clipped to a range supported by the environment. The Gaussian noise lets the agent explore its action space by perturbing the action. Then the two critic targets take the pair (s′, a′) as input and return two Q-values, Qt1(s′, a′) and Qt2(s′, a′), and the minimum of the two is kept as the approximated value of the next state.

Taking the minimum of the two prevents the approximation error of the classic Actor-Critic model from growing too large, which in turn stabilises the training process. This minimum Q-value is then used to compute the final target of the two critic models, by the formula:

$$y = r + \gamma \min_{i=1,2} Q_{t_i}(s', a')$$

where $\gamma$ is the discount factor, which determines how strongly the agent favours an immediate reward over a steadier, longer-term one. The two critic models then take the pair $(s, a)$ as input and return two Q-values, $Q_1(s, a)$ and $Q_2(s, a)$, which are compared with the target $y$ to compute a loss function as follows:

$$L(\theta_1, \theta_2) = \sum_{i=1,2} \big( y - Q_{\theta_i}(s, a) \big)^2$$

The resulting critic loss will be backpropagated and will update the critic model parameters to

reduce the loss function. This concludes the Q-learning part of the training process, and we

will move on to policy learning.


For context, the objective of the policy learning part of TD3 is to obtain the optimal

parameters for the actor model so that the agent can perform the optimal actions in each state

to maximise the expected return. The Q-values we obtained from the critic models have a

positive correlation with the expected return so we will use those to perform gradient ascent

by differentiating their output with respect to the weights of the actor model. While our actor

model’s weights update in the direction of a high Q-value, our agent will return better actions

to increase the Q-value of the state-action pair, and get the agent closer to the optimal return.

Every two iterations, we will update the actor model by performing gradient ascent on the

output of the first critic model and the weights of the actor target are updated with polyak

averaging. Polyak averaging updates the parameters of the actor target as a weighted sum of a very small coefficient τ (tau) times the parameters θ of the actor model and $(1 - \tau)$ times the target's own parameters:

$$\theta' \leftarrow \tau \theta + (1 - \tau)\, \theta'$$

This equation represents a slight transfer of weights from the actor model to the target and

thus the actor target gets closer to the model with each iteration, giving the model time to

learn from the target. Similarly, the critic targets' weights are updated with polyak averaging as well. Since we update the weights only every two iterations, the "Delayed" aspect of TD3 comes in, improving on classic DDPG (Foy, 2019).


[Figure: adapted from Kim et al. (2020).]
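Putting these steps together, one TD3 update can be sketched as a function (assuming the Actor, Critic and ReplayBuffer classes from the earlier sketch; the hyperparameter values are the defaults reported by Fujimoto et al. (2018)):

```python
import torch
import torch.nn.functional as F

GAMMA, TAU = 0.99, 0.005             # discount factor and polyak coefficient
POLICY_NOISE, NOISE_CLIP = 0.2, 0.5  # target policy smoothing noise
POLICY_DELAY = 2                     # actor and targets updated every 2nd step


def td3_update(it, buffer, actor, actor_t, critic1, critic2, critic1_t,
               critic2_t, actor_opt, critic_opt, max_action, batch_size=100):
    s, s2, a, r, done = buffer.sample(batch_size)

    with torch.no_grad():
        # Clipped Gaussian noise on the target action, then clamp to valid range.
        noise = (torch.randn_like(a) * POLICY_NOISE).clamp(-NOISE_CLIP, NOISE_CLIP)
        a2 = (actor_t(s2) + noise).clamp(-max_action, max_action)
        # Clipped double Q-learning: keep the smaller of the two target Q-values.
        q_min = torch.min(critic1_t(s2, a2), critic2_t(s2, a2))
        y = r + (1.0 - done) * GAMMA * q_min

    # Both critics regress towards the shared target y.
    critic_loss = F.mse_loss(critic1(s, a), y) + F.mse_loss(critic2(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    if it % POLICY_DELAY == 0:
        # Delayed actor update: gradient ascent on the first critic's output.
        actor_loss = -critic1(s, actor(s)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()
        # Polyak-average all target networks towards their models.
        for net, net_t in ((actor, actor_t), (critic1, critic1_t), (critic2, critic2_t)):
            for p, p_t in zip(net.parameters(), net_t.parameters()):
                p_t.data.mul_(1.0 - TAU).add_(TAU * p.data)
```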

Being a newer model, TD3 shows a lot of promise and improvements as compared to classic

DDPG or to the Actor-Critic Model, which is why it was chosen to generate the optimal gait

for our hexapod.


3 Methodology

3.1 Mechanical Design

For the mechanical design of the hexapod, the three sections of hexapod design described by Tedeschi & Carbone (2014) were worked through in turn. For the robot body architecture, the

rectangular form was chosen over the hexagonal one, since longitudinal stability

was more important as our hexapod, Audax, would incorporate computer vision in the future,

which would mean that a camera placed on the body would give it a “front” and “rear” side,

being more in line with a rectangular design. A hybrid design between reptilian and insectoid

legs was chosen for Audax, where the leg is composed of three segments which are nearly

identical in size, allowing for a larger workspace and higher fault tolerance, although it would

require higher torque from the joints. To provide this torque, servo motors were chosen as actuators, since they are easy to control and can generate high torque. Similarly, to keep the control algorithm from becoming too complex, the chosen actuator arrangement places the actuator directly at the joint.

As for the material, PLA was chosen, since the body was designed to be modular and easily

changeable and making more parts for the body would be a simple process. PLA is easy to

use with a 3D printer, is lightweight and relatively strong, and small aluminium plates were

added to strengthen the joints of the main body. The parts to be 3D-printed were designed in CAD software, Autodesk Fusion 360, using direct modelling to create a mesh and model of

the hexapod. The main body is composed of two subparts, the roofplate and the baseplate,

where the baseplate houses the power source and the joints where the legs connect to the

body, and the roofplate houses the computing unit, which is the raspberry pi 4 model B, along

with the servo HAT attachments to increase the number of channels to control all the servos
with the raspberry pi. Each subpart is then divided into thirds which are connected via screws

and small aluminium plates.

[Figure: CAD model of the main body, showing the baseplate and roofplate.]

The mounting holes in the baseplate are present for servos to fit in. The leg itself is divided

into three segments as well, namely the arm which is connected to the main body, then the

forearm, and finally the hand which comes into contact with the ground. The joints present

are the shoulder joint which connects the body and the arm, the elbow joint, which connects

the arm and the forearm, and finally the wrist joint, which connects the forearm and the hand.

The following image shows the labelled structure.

[Figure: labelled structure of the leg, showing the arm, forearm and hand segments and the shoulder, elbow and wrist joints.]

The length of the leg segments was chosen as a hybrid between a quadruped leg and a

hexapod leg. Quadruped legs have two segments that are identically shaped, and hexapod

legs have three segments that get longer as they move further away from the main body.

[Figures: comparison of quadruped and hexapod leg segment proportions.]

This design of the legs was chosen to have the possibility of an alternate quadruped gait if

two of the legs were to get damaged, or if the middle two legs were used to perform

dexterous tasks, such as loading or unloading a payload, pressing a button, or pushing a lever.

Since the longer leg segments allowed for a wider action space for the legs, it increased the

mobility of the legs as well. As mentioned previously, servo motors were chosen for the

actuators of the joint as they generated the high torques that were needed to support the main

body of the robot. At 6 V, each servo would generate 15.5 kg·cm of torque, which would be more than sufficient to support the approximately 3.5 kg main body with its weight divided among the six legs. Each joint

on the leg was fastened using metal screws to prevent any deformation, as opposed to nylon

screws. Moreover, the shoulder joints, where the legs joined the body, used servos that had a

metal servo horn instead of the standard plastic servo horn, since the weight of the body was
applied most at this joint. This measure was taken after deformation occurred in one of the

trial runs with the plastic servo horns.

[Figure: metal servo horn used at the shoulder joints.]

After the parts were designed in Autodesk Fusion 360, they were printed in PLA, and

simultaneously, the meshes were used to construct a model in Unified Robot Description

Format to perform simulations for optimal gait generation.

3.2 Kinematics and Dynamics Design for Simulation

Before the URDF file was written, the meshes were centered and their center of mass was

edited using Blender, so that the inertias could be as accurate as possible. A small Python script by Stavrinos (2020) was used to generate accurate inertia tensors for each part from its mass, its mesh file and the scale it was used at. This was crucial for the simulated behaviour to stay as close as possible to the ground truth.
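The idea behind such a script can be sketched with the trimesh library (a simplified illustration under our own assumptions; the actual script by Stavrinos (2020) may differ, and the file name and mass below are hypothetical):

```python
import trimesh

def inertia_for_link(stl_path, mass_kg, scale=0.001):
    """Inertia tensor (about the centre of mass) of a mesh, assuming uniform
    density derived from the part's measured mass. scale converts mm to m."""
    mesh = trimesh.load(stl_path)
    mesh.apply_scale(scale)
    mesh.density = mass_kg / mesh.volume
    return mesh.moment_inertia  # 3x3 matrix in kg*m^2

# Hypothetical part: print the values needed for a URDF <inertia> element.
I = inertia_for_link("forearm.stl", mass_kg=0.12)
print(f'ixx="{I[0, 0]:.6e}" iyy="{I[1, 1]:.6e}" izz="{I[2, 2]:.6e}" '
      f'ixy="{I[0, 1]:.6e}" ixz="{I[0, 2]:.6e}" iyz="{I[1, 2]:.6e}"')
```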

The URDF file contained descriptions of the links, joints and transmissions, along with Gazebo descriptions for each link and joint, sensor plugins and control plugins. Each link description contained the inertial, visual and collision properties of the link (Mishra et al., 2021).

[Listing: example link description from the URDF file.]

The joint descriptions captured the relationship between a parent and a child link, together with dynamics such as damping and friction, and effort and velocity limits, to make the simulation as realistic as possible (Mishra et al., 2021).

[Listing: example joint description from the URDF file.]

The transmission element in the URDF file described the relationship between a joint and its actuator, for example what kind of interface it had; in our case this was an effort joint interface, which controls the effort applied at each joint rather than directly setting the joint position to reach the desired angle. The transmission element is necessary for using ROS to

control the joints through the ros_control plugin (Mishra et al., 2021).

[Listing: example transmission element from the URDF file.]

As for the Gazebo elements, there was an element present to describe each link, each joint and even the robot itself for Gazebo. The link descriptions converted the visual colours to Gazebo format and the STL files to DAE files for better textures, and added sensor plugins, such as a contact sensor.

Furthermore, it provided the option of turning off self collisions to debug the code in case

there were errors or abnormalities. One part present in the simulation but absent in the robot

itself was a contact sensor at the end of each leg so that contact with the ground could be

logged into the data to correctly judge the gait. Joint descriptions for Gazebo described

proper damping dynamics for each joint and had the option to add an implicit spring damper

for more realistic movements. Apart from these plugins and descriptions, control plugins

were added as well, such as the aforementioned ros_control plugin to control the actuators of

the robot using ROS, an IMU plugin to receive the pose of the robot and a

p3d_base_controller to receive odometry information for the robot (Mishra et al., 2021).

After that, xacro was used to clean up the file, introducing local variables where needed, as well as mathematical formulas and calculations.

The repository contained a total of 10 subdirectories, config, stl, src, models, urdf, rviz, srv,

msg, launch and results. The config folder contained two files, audax_control.yaml and

audax_trajectory_control.yaml, and the former initialised joint position controllers for each

joint along with PID values for the controller; similarly, the latter initialised joint trajectory

controllers for each joint along with PID values for each controller.

The directory named stl contained stl files for each link of the robot described in the urdf file,

so that they could be imported to the virtual environment in gazebo to display accurate visual

and collision properties. The rviz folder contained a configuration file for viewing the Audax
model in rviz to see how the inputs would look in an isolated environment. The launch

directory contained the launch files needed to spawn a model of Audax into the environment,

initialise the controllers and start the TD3 algorithm. The directories src, models, srv, msg

and results were used for the TD3 algorithm itself.

3.3 Gait Generation

After a model of the robot was spawned successfully in Gazebo, the aforenamed TD3 model

was implemented to generate the optimal gait for Audax. A model implemented by Rahme

(2020) to generate an optimised gait for a bipedal robot, PLEN, was adapted for Audax. The

controllers and plugins used for PLEN were the same ones used for Audax. However, since Audax has six legs, the computations for the simulation were significantly slower: PLEN's Gazebo simulation ran at approximately 1.30 times real-time speed, whereas Audax's ran at 0.40 times, which increased the training time required for TD3 with Audax.

Let us briefly describe the subdirectories in the repository which are concerned with the TD3

model. The directory src contained the code which implemented the modified Actor-Critic

models for ROS to interact with them. The msg folder contained the .msg file which

described the data values that the ROS nodes initialised from the launch files used, and

similarly, srv contained the service description for the ROS services used, such as

SpawnModel, DeleteModel, UnloadController and others. The data regarding the weights and

optimisation of the actor and critic models was stored in the models directory and the results

directory stored the optimised policy.

The compute reward function for the robot would decrease the reward under the following conditions: (1) being dead at the end of the action, which would occur if the pitch or the roll exceeded 90 degrees in either direction, i.e. if Audax had tipped over; (2) if Audax strayed too far along the y-axis, to keep the motion in a straight line; (3) if the main body came too close to the ground; and (4) if the robot had not moved significantly. The conditions which would increase the reward were as follows:

(1) for being alive at the end of the action, i.e. not triggering any of the aforementioned conditions; (2) for achieving a positive velocity along the x-axis; (3) for staying on the x-axis, thus going in a straight line; (4) for staying upright with regard to roll and pitch; and (5) for facing forward.
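These criteria can be pictured with a simplified sketch; the thresholds and weights below are illustrative placeholders, not the values used in the actual implementation:

```python
import math

def compute_reward(pose, distance_moved, vx):
    """Simplified illustration of the reward shaping described above.
    pose carries roll, pitch, yaw (rad) and x, y, z (m) of the main body."""
    # Death condition: pitch or roll beyond 90 degrees, i.e. tipped over.
    dead = abs(pose.roll) > math.pi / 2 or abs(pose.pitch) > math.pi / 2

    reward = 1.0 if not dead else -100.0      # alive bonus / death penalty
    reward += 10.0 * vx                       # reward positive x-velocity
    reward -= 5.0 * abs(pose.y)               # penalise straying off the x-axis
    reward -= 2.0 * (abs(pose.roll) + abs(pose.pitch))  # stay upright
    reward -= 2.0 * abs(pose.yaw)             # keep facing forward
    if pose.z < 0.05:                         # body too close to the ground
        reward -= 10.0
    if distance_moved < 0.01:                 # no significant movement
        reward -= 1.0
    return reward, dead
```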

The training of the model took place for 48 hours in real time, and 24 hours in simulation

time, which led to the policy and the Q-values being optimised, and at the end, providing us

with a gait which was optimal for this model. After 319,999 iterations, the algorithm was

stopped and the policy was implemented in Gazebo to see the results. It showed the actions

that were chosen in succession for the highest reward.

3.4 Circuitry

Coming to the assembly of the robot, the Raspberry Pi 4, being the computing unit, was connected to a power bank, and two servo HATs were stacked on top of each other to provide 32 channels, of which 18 were used to control the servos. The servos themselves were powered through the servo HATs from a separate supply consisting of lithium-ion batteries, which were the most appropriate choice given their high power-to-weight ratio. Each battery was 3.7 V with 3400 mAh at full charge; two of them were connected in parallel, and in turn two such parallel pairs were connected in series, providing 7.4 V at double the capacity, 6800 mAh. Each servo HAT was powered by such a battery unit and

the batteries were housed on the baseplate. As each leg consisted of 3 servo motors, and there

were 6 legs, giving us a total of 18 servo motors, they were evenly divided amongst the two

servo HATs so that each would be connected to 9 motors, controlling 3 legs. The circuit
diagram shown below illustrates the connection of the battery unit and the motors to each

HAT.

Subsequently, the two HATs were controlled by the Raspberry Pi, and the picture below shows the construction of the control unit. The HATs from Adafruit give the Raspberry Pi more channels to easily control and power the servos individually without burdening the Pi's own power source, a power bank with a steady voltage of 5 V, as opposed to each servo running on approximately 7 V.
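With Adafruit's CircuitPython ServoKit library, addressing the two stacked HATs looks roughly as follows (a sketch only; the channel assignments and pulse-width limits are illustrative, and the second HAT is assumed to have its address jumper set to 0x41):

```python
from adafruit_servokit import ServoKit

# One ServoKit object per PCA9685-based HAT; the second HAT's I2C address
# is set to 0x41 via its solder jumper so both can share the same bus.
kit_a = ServoKit(channels=16, address=0x40)  # e.g. legs 1-3, 9 servos
kit_b = ServoKit(channels=16, address=0x41)  # e.g. legs 4-6, 9 servos

servo = kit_a.servo[0]                       # illustrative channel
servo.set_pulse_width_range(500, 2500)       # pulse limits in microseconds
servo.actuation_range = 180                  # degrees of travel
servo.angle = 90                             # command the mid position
```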
4 Evaluation

4.1 Initial Problems

One of the initial problems faced was that although the overall centre of mass of the robot was in the middle, the centres of mass of the two forward legs, facing outwards at the front of the robot, were causing it to fall forwards. Attaching counterweights on the other side was considered as a fix, but in the end the legs were turned to face inwards instead, eliminating this problem.

Another problem that arose was that since PLA was lightweight but not too strong, the

originally designed fasteners between the baseplate segments snapped off at the slightest

pressure. To counter this issue, the previously mentioned aluminium plates were used to

fasten them together, providing mechanical support as well, since they are more durable than

the PLA parts.

Lastly, a problem that arose during testing was the absence of cable management, since adjustments had to be made while setting the maxima and minima of the servos through observation and trial and error. These maxima and minima were set such that no collisions would happen even if odd values were transmitted to the Pi, but this required repeatedly changing the way the servo horns were fastened to the servos. The cables were often a hindrance; to solve this, they were zip-tied to the closest segment of the leg.

4.2 Testing
Before testing the prototype of Audax, a prototype of a single leg was built and tested to plot

current consumption and possible length of a single stride. The leg was clamped to hold the

shoulder joint and was connected to a DC supply of 5V. To measure the current, a magnetic

clamp was used to view the current consumption on an oscilloscope, and to measure the

stride, the contact point of the hand segment was marked where it touched the ground given a

range of 45 degrees at the shoulder joint.

As the graph above shows, the average current consumption was around 0.865A, with peaks

of 1.23A when the movement began and a constant running current of nearly 0.5A.

Moreover, the length of the stride seemed promising at 30 cm, meaning the robot could theoretically move at 0.3 m/s with traditional gaits alone, assuming roughly one stride per second.

After assembling Audax from its parts and fastening all the joints, random joint position

inputs were given to test the functionality of the legs as a manual diagnostic run and it was

assured that the wiring was not faulty. First, the trajectory optimisation algorithm developed

by Ayush Kumar Chaudary was implemented. A ROS node which would subscribe to the

rostopic on which the joint state messages were published was created and run on the Pi. Then

using this subscriber code, the joint state values were mapped from the minima and maxima

of the simulation to the minima and maxima of each of the servos present to ensure that no

damage would occur if there were any abnormalities in the transmitted joint position values.
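The mapping itself amounts to a clamped linear interpolation, roughly as follows (the limit values shown are placeholders, not the calibrated ones):

```python
def map_joint(value, sim_min, sim_max, servo_min, servo_max):
    """Clamp a simulated joint position, then linearly rescale it into the
    servo's safe angle range so stray values can never command a collision."""
    value = min(max(value, sim_min), sim_max)        # clamp to simulated range
    t = (value - sim_min) / (sim_max - sim_min)      # normalise to [0, 1]
    return servo_min + t * (servo_max - servo_min)   # rescale to servo range

# Hypothetical shoulder joint simulated in [-0.6, 0.6] rad, driven by a servo
# whose safe range was found by trial and error to be [35, 145] degrees:
angle = map_joint(0.3, -0.6, 0.6, 35.0, 145.0)       # -> 117.5 degrees
```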

The results are summarised in the following videos and the images below:

As it can be seen in the videos and images, although the mechanical structure seemed to bear

the weight and the servos were able to move to their mapped position, the overall gait did not

seem to be very effective in moving forward. When the speed of transmission of values was
slowed, the movements tended to be rigid and uncoordinated, making the gait ineffective.

However, at higher speeds, the movements were more fluid and propelled the robot in a

forward direction. The adaptability of the gait from simulation to real-life was low, and it

seemed that traditional gaits such as the tripod gait would be more effective.

Next, the optimal policy provided by the TD3 model was implemented and it yielded similar

results. The movements were uncoordinated, unstable, and did not seem to produce a gait that

was effective in real life. In simulation, the optimal actions executed in quick succession in Gazebo produced fidgety movements, but since the algorithm had selected them for the highest return after a simulation runtime of one day, this was the optimal policy that was chosen.

The gap between the two models and reality seemed large, and there are limitations which

point towards that.

4.3 Limitations

The most prominent limitation was that of computing power and complexity. The TD3 model ran for 68 hours in real time, yet due to the weak processing power of the GPU used, this amounted to only 30 hours in simulation time. This limitation left the policy not fully optimised, since training ran for only 319,999 iterations, whereas Rahme (2020) needed 1,999,999 iterations to reach the policy he used. This illustrates that without sufficient computing power, it is inefficient to run a compute-heavy method such as reinforcement learning with TD3.

Apart from that, although PLA is lightweight and cheap, a better alternative to the PLA baseplate and roofplate would have been an aluminium plate, since it is durable and strong, and would be

in one part instead of three. One of the reasons the baseplate and roofplate had to be split in
thirds was that the 3D printer available was not large enough to print the whole thing as one

unit.

Lastly, lack of a regulated testing environment made it difficult to properly document and

measure the metrics and qualities of the implemented gait.

5 Conclusion

5.1 Objectives

The objective of this thesis was to incorporate the qualities of both hexapods and quadrupeds

together to build and control a versatile robot. Since most robots on the current market gravitate towards designs like Spot from Boston Dynamics, the versatile class of hexapods is being overlooked. The design proposed in this thesis pushes for a more dexterous and agile robot, one that can adapt to any terrain and perform with minimal supervision. The results and work from this thesis confirm the feasibility of such a robot, and although the control algorithms implemented require more work to reach an optimal level, a platform for more research has been established. With the groundwork and foundation for further research laid

out, the gap between simulation and real-life can be bridged.

5.2 Future Work

5.2.1 Mechanical Design

To ensure that the prototype developed can be used for long term research, it is important to

reinforce the parts and change to a more durable material, such as aluminium. Using PLA

parts for the legs over a long time could cause warping and bending of the parts, so changing

to machined aluminium will be useful.

5.2.2 Control Algorithm


To get a better result from the TD3 algorithm, more constraints could be added to reward the

agent, and by using a computer equipped with a more powerful GPU, the run time could be

decreased to reach the optimal policy in a short amount of time.

5.2.3 Autonomy and Computer Vision

To allow autonomous control in the future, space has been designated for a Raspberry Pi camera to be installed to incorporate computer vision. By incorporating that, a whole new aspect of navigation using SLAM can be explored.

5.2.4. Fault Tolerant Gaits

Lastly, to employ fault tolerant gaits, it is necessary to have sensors which indicate or monitor

whether a leg is functional or not so that a new control algorithm can be deployed, such as a

gait for the 4 hind legs if the front two are non-functional, or the 4 outer legs if the middle

two are holding an object.


References

Bo, J., Cheng, C., Wei, L., & Xiangyun, L. (2011). Design and Configuration of a Hexapod

Walking Robot. 2011 Third International Conference on Measuring Technology and

Mechatronics Automation, 1. https://doi.org/10.1109/icmtma.2011.216

Dickson, B. (2020, December 15). The fate of Boston Dynamics. TechTalks.

https://bdtechtalks.com/2020/12/15/boston-dynamics-hyundai-acquisition/

Ding, X., Wang, Z., Rovetta, A., & Zhu, J. M. (2010). Locomotion Analysis of Hexapod

Robot. Climbing and Walking Robots, 291–310. https://doi.org/10.5772/8822

Lima, P. U., & Ventura, R. (2012). Search and Rescue Robots: The Civil Protection Teams of

the Future. 2012 Third International Conference on Emerging Security Technologies, 1.

https://doi.org/10.1109/est.2012.40

Meng, X., Wang, S., Cao, Z., & Zhang, L. (2016). A review of quadruped robots and

environment perception. 2016 35th Chinese Control Conference (CCC), 1–4.

https://doi.org/10.1109/chicc.2016.7554355

Preumont, A., Alexandre, P., & Ghuys, D. (1991). Gait analysis and implementation of a six

leg walking machine. Fifth International Conference on Advanced Robotics ’Robots in

Unstructured Environments, 941–945. https://doi.org/10.1109/icar.1991.240551

Silva, M. F., & Machado, J. T. (2007). A Historical Perspective of Legged Robots. Journal of

Vibration and Control, 13(9–10), 1447–1486. https://doi.org/10.1177/1077546307078276


Tedeschi, F., & Carbone, G. (2014). Design Issues for Hexapod Walking Robots. Robotics,

3(2), 181–206. https://doi.org/10.3390/robotics3020181
