
Human Model Simulation in Human-Robot Interaction
Nadia Hazar

Department of Computing Science, Simon Fraser University, CA


[email protected]

Abstract
As robots become more visible, more widely used, and more frequently interacted with in our everyday lives, ensuring transparency about their safety, their usage, and the purpose of their existence is one of the fundamental concerns of artificial intelligence (AI). In this paper I report my work with the NAOqi robot Pepper to build a Unity 3D simulation of Pepper's view of the world, that is, a model of what Pepper sees in the environment around it. With this knowledge, we can later work with, study, and debug programs running on Pepper more easily.

Key Terms: human-model simulation, human-robot interaction, robot transparency, human recognition, human tracking

Table of Contents
Abstract
1. Introduction
2. Connection to Pepper
3. Gathering Data
4. The Algorithm and The Code
5. Working with Unity 3D
6. Human Tracking
7. Results and Conclusion

1. Introduction
In the paper “What does the robot think? Transparency as a Fundamental Design Requirement for Intelligent Systems”, the authors explain the importance and the necessity of transparent robots. Two points from that article are especially relevant here:
1) Constructing mental models enables us to understand and predict the behavior of robots.
2) Robots should be self-explanatory so that we can be confident about what they are doing, especially when they interact with vulnerable groups such as children or the elderly.
Why does robot transparency matter? What can go wrong if we are not aware of what the robot sees? Essentially, a robot's actions are based on the input it is given, which can come from vision, hearing, other sensors, and much more. Knowing exactly what information the robot perceives therefore gives us oversight of, and control over, the machine; without that knowledge, we lose that control.
People often find robots intimidating on first encounter; however, the more you “get to know them”, the more natural and enjoyable it becomes to have them around. In some countries, such as South Korea, robots are so common that they have become an important part of society, performing tasks such as vacuuming malls, cleaning washrooms, greeting customers in stores and hotels, and much more. Imagine a robot crashing and bumping into people in one of those malls: to fix the issue, the first step is to see what the robot is seeing, the next is to figure out how it analyzes that input, and only then can the standard debugging procedure follow. A good understanding of a robot's vision is therefore a key element of human-robot interaction.
This is the main purpose and goal behind my research. I was able to slightly improve human-robot interaction with Pepper by modelling its view of the humans located in front of it. First, I became familiar with Pepper by performing some basic tasks, such as holding a conversation and asking it to perform actions like dancing and Tai Chi. I then connected Pepper to my computer through the Choregraphe software, which I describe in more detail later on. Once I could connect to Pepper easily from my computer, I could program it to do what I wanted, either in the terminal or in the software. Since I needed Pepper's visual input, I used the video recorder in Choregraphe. After gathering enough data, I had to come up with the right algorithm to analyze it. I asked myself questions such as “What am I going to do with these videos?” and “What is the purpose of the algorithm?”. Answering these questions gave me a clearer understanding of the algorithm, which is as follows (it is described in more detail, with a code sketch, in Section 4):
1) Get an input in the form of a video (.avi)
2) Recognize humans in the video using existing face detection classifiers
3) Using linear algebra, calculate the x, y, and z location of each human relative to Pepper
4) Output the calculated information in the form of a text file
I started coding the algorithm in Python. Once everything was working and the bugs were fixed, I could feed in the video recorded by Pepper and obtain, as the result, a text file of data consisting of three numbers (x, y, and z) for the human in each frame of the video. What could I do with these numbers? Again, I reminded myself of my main goal: to model Pepper's vision in the Unity 3D software. What I needed, therefore, were models in Unity; with those in place, I could make use of the data I had gathered.
Unity is a very powerful piece of software for making games. I was unfamiliar with it, so for a few weeks I took various tutorials and courses to learn its features. After that, I was ready to start creating my own model. The final result was a Pepper on the screen and a human model which, after pressing play, would move according to the original video taken of the human in real life.
Although my research was complete and successful with very little error, I thought about making it more robust. One way to do so would have been to add more detail to the human model, such as whether the person smiles or waves a hand. Another would be to handle more than one human in front of Pepper. I thought the latter would be more useful, so I did some more research into human tracking. After figuring out the basics, I wrote code that separates the data when more than one person is in the shot. The result was just as I had planned: I modelled the people who were walking, standing, or sitting in front of Pepper, and I also obtained information such as each person's distance from Pepper and their approximate height. The following sections describe each step taken to make the model work.

2. Connection to Pepper
To connect a computer to a robot, we need to know terms such as IP address and a few shell commands such as ssh. Every networked device has an IP address that is unique to it and is used to connect to the device wirelessly. It can usually be found in the device's settings, but in Pepper's case, pressing the button on its chest makes Pepper say its IP address aloud. ssh is a Linux command-line tool that handles connecting the computer to another device whose IP address we know. In the shell it looks like this:
ssh nao@[IP Address]
ssh then asks for the password of the nao device (Pepper), which has to be entered to grant the computer access to Pepper. Once connected, we can easily communicate with Pepper by installing and importing the qi package. qi is the package that exposes the commands used to program Pepper.
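As a quick illustration, a minimal Python sketch of talking to Pepper through qi might look like the following; it assumes NAOqi's default port 9559 and is not the exact code used in this project:

import qi

# Connect to the robot over the network (fill in Pepper's IP address).
session = qi.Session()
session.connect("tcp://[IP Address]:9559")

# Ask the text-to-speech service to say something as a simple connectivity check.
tts = session.service("ALTextToSpeech")
tts.say("Hello, I am connected.")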
Another way to connect to Pepper is through a piece of software called Choregraphe, shown in the screenshot below. The software is very simple and fun to use: drag a command box from the box library, connect it, and press play. This is mostly what I used, since it already has a built-in box for video recording. The video is saved on Pepper itself, so to take the video from Pepper's head and save it on the computer, we have to use the scp command in the shell.
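The copy itself looks much like the ssh command above; the path here is only a placeholder for wherever the video-recording box stores its output on the robot:
scp nao@[IP Address]:[path to recorded video].avi [local folder]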

3. Gathering Data
The input data of my main program was a video, taken from Pepper's eyes, of a human standing, walking, and generally moving in front of it. So I had various people (male and female, tall and short, children and adults) walk toward Pepper, away from it, and from side to side in front of it. Pepper's head has to be set to a static mode, because it otherwise moves around constantly, looking for sounds, movements, or any sign of humans. It is also important to record the angle of its head and body and set them the same in all the videos. Since I was going to calculate the position of the human in the video relative to Pepper, I had to know exactly where the human was standing, so I put markers on the ground to guide the person's walk. With this calibration, linear algebra can do all the work of calculating the position of the human from the pixels in the video. Once I had found a mapping, I tested it on another set of data: the distance from Pepper was within a 5 cm error, the height within 10 cm, and the horizontal position of the human within 5 cm. That was a good enough estimate for the rest of my work.
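Keeping the head still, as mentioned above, can be done through NAOqi's awareness and motion services; the following is one possible sketch, building on the session from Section 2, and the angles are arbitrary example values rather than the exact pose used here:

# Stop the head from tracking sounds and people, then hold a fixed pose.
awareness = session.service("ALBasicAwareness")
awareness.setEnabled(False)

motion = session.service("ALMotion")
motion.setStiffnesses("Head", 1.0)  # make sure the head holds its position
motion.setAngles(["HeadYaw", "HeadPitch"], [0.0, 0.0], 0.2)  # fixed, repeatable pose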
The following picture is a screenshot from the input video that Pepper recorded; the full video is modelled later on.

4. The Algorithm and The Code


The algorithm I planned to program consists of the following four parts:
1) Get an input in the form of a video (.avi)
2) Recognize humans in the video using existing face detection classifiers
3) Using linear algebra, calculate the x, y, and z location of each human relative to Pepper
4) Output the calculated information in the form of a text file
I chose to program the algorithm in Python because it is easy to write. The first thing to do is to open the video file and read it frame by frame. Each frame is converted to grayscale, which makes human faces easier to detect; both steps can be done with the OpenCV package. Using Haar cascade classifiers, I could detect any human face in the video. I also already had all the calculations for going from the size and position in pixels of the box around a face to the x, y, and z location of the human relative to Pepper. Lastly, I had to create a text file and write the information to it in an organized way.
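The following is a minimal sketch of that pipeline, not the exact code used in the project. The cascade file is OpenCV's bundled frontal-face model, the video file name is a placeholder, and the focal length and average face width used to estimate distance are assumed values rather than Pepper's calibrated camera parameters:

import cv2

FOCAL_LENGTH_PX = 600.0  # assumed focal length of the camera, in pixels
FACE_WIDTH_M = 0.16      # assumed average width of a human face, in metres

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture("pepper_recording.avi")  # placeholder file name

frame_idx = 0
with open("positions.txt", "w") as out:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        height, width = gray.shape
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        for (px, py, w, h) in faces:
            # Depth from the apparent face size (pinhole camera model), then
            # x and y from the face centre's offset from the image centre.
            z = FOCAL_LENGTH_PX * FACE_WIDTH_M / w
            x = (px + w / 2.0 - width / 2.0) * z / FOCAL_LENGTH_PX
            y = (py + h / 2.0 - height / 2.0) * z / FOCAL_LENGTH_PX
            out.write("{} {:.3f} {:.3f} {:.3f}\n".format(frame_idx, x, y, z))
        frame_idx += 1
cap.release()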

5. Working with Unity 3D


With the data I needed in hand, I could start the main part of the project: the visuals in Unity 3D. Unity comes with some built-in models, backgrounds, and actions, but for my research I made my own human models (female and male) in the MakeHuman software and imported them into Unity.
A Screenshot of Creating Human Models in MakeHuman Software

Unity lets you script any object however you want. This allowed me to script the human models to move in front of my modelled Pepper based on the data in the text file. Unity scripts are written in C#, so I had to learn that language alongside the research.
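For reference, the text file driving the models is assumed here to contain one line per frame with the frame index followed by the x, y, and z position, matching the sketch in Section 4; the numbers below only illustrate the format and are not real measurements:

0 0.12 -0.30 2.45
1 0.11 -0.30 2.41
2 0.10 -0.29 2.38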

6. Human Tracking
Everything was set and done, but I wanted to make the program a little more robust. My code worked only when one person was in front of Pepper, yet there are many cases in which multiple humans stand in front of it. Can we model all of them? This is where tracking comes into play. Tracking is a popular topic and is used almost everywhere in the programming world. How does it work? Essentially, it keeps track of each human and saves the information related to each one separately. There are several complications. First, the face detection classifier is not perfect and produces false detections in some frames, which have to be discarded. Second, when the humans are moving and switch places, it can be hard to keep the tracks assigned as they actually happened. Lastly, coding it is tricky in its own right: I had to keep comparing each detected point to the ones from the previous frames and group the points that are close to each other, so that they belong to one person, as sketched below. Once that was done, I incorporated it into Unity so that, for each human detected in the video, a human model is instantiated and moved according to the data.
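A minimal sketch of that nearest-neighbour grouping follows. It illustrates the idea rather than reproducing the project's exact code; the distance threshold is an assumed value, and the greedy one-pass assignment is a simplification:

import math

MAX_JUMP_M = 0.5  # assumed limit on how far one person can move between frames

def assign_to_tracks(frames):
    """frames: a list of frames, each a list of (x, y, z) detections.
    Returns a list of tracks, each a list of (frame_index, (x, y, z))."""
    tracks = []
    for frame_idx, detections in enumerate(frames):
        for point in detections:
            best, best_dist = None, MAX_JUMP_M
            for track in tracks:
                last = track[-1][1]
                dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(point, last)))
                if dist < best_dist:
                    best, best_dist = track, dist
            if best is None:
                tracks.append([(frame_idx, point)])  # a new person enters the scene
            else:
                best.append((frame_idx, point))      # continue an existing person's track
    return tracks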

7. Results and Conclusion


We are more in control now that we know what our robot is seeing, and we also know the position of each human, which can come in handy for further research. The final result of this work is a connection between Pepper and the computer: with the click of a button, we can see an animated model of the humans in front of Pepper, walking, standing still, or just moving around, and we can tell whether each of them is a child or an adult. We have gained more control over the robot, and that is the main goal of this research on human-robot interaction.
The following two screenshots show the simulation of a human walking close to Pepper.
