1. Introduction
When the magnitude of a gaze shift is too large, human beings change the orientation of their head or body to assist their eyes in tracking targets, because a saccade alone is insufficient to keep a target in the center region of the retina. Studies on body–head–eye coordinated gaze point tracking are still rare because the body–head–eye coordination mechanism of humans is prohibitively complex. Multiple researchers have investigated the eye–head coordination mechanism, the binocular coordination mechanism and bionic eye movement control. In addition, researchers have validated eye–head coordination models on eye–head systems. This work is significant for the development of intelligent robots for human–robot interaction. However, most of these methods are based on principles from neurology, and their further development and application may be limited by our incomplete understanding of the underlying human processes. In contrast, binocular coordination based on the 3D coordinates of an object is simple and practical, as verified in our previous paper [1].
When the fixation point shifts greatly, the head and eyes should move in coordination to accurately shift the gaze to the target. Multiple studies have built models of eye–head coordination based on the physiological characteristics of humans. For example, Kardamakis et al. [2] researched eye–head movement and gaze shifting; they sought the best balance between eye movement speed and movement duration and used an optimal control method to minimize the loss of motion. Freedman et al. [3] studied the physiological mechanism of coordinated eye–head movement but did not establish an engineering model. Nakashima et al. [4] proposed a method for gaze prediction that combines information on head direction with a saliency map. In another study [5], the authors presented a robotic head for social robots that attends to scene saliency with bio-inspired saccadic behaviors; scene saliency was determined by measuring low-level static scene information, motion and prior object knowledge. Law et al. [6] described a biologically constrained architecture for developmental learning of eye–head gaze control on an iCub robot. They also identified stages in the development of infant gaze control and proposed a framework of artificial constraints to shape the robot's learning in a similar manner. Other studies have investigated the mechanisms of eye–head movement for robots and achieved satisfactory performance [7,8].
Some application studies based on coordinated eye–head movement have been carried out in addition to the mechanism research. For example, Kuang et al. [9] developed a method for egocentric distance estimation based on the parallax that emerges during compensatory head–eye movements; the method was tested on a robotic platform equipped with an anthropomorphic neck and two binocular pan–tilt units. The model in Reference [10] is capable of reaching static targets placed at a starting distance of 1.2 m in approximately 250 control steps. Hülse et al. [11] introduced a computational framework that integrates robotic active vision and reaching. Essential elements of this framework are sensorimotor mappings that link three different computational domains relating to visual data, gaze control and reaching.
Some researchers have applied the combined movement of the eyes, head and body in mobile robots. In one study [12], large reorientations of the line of sight, involving combined rotations of the eyes, head, trunk and lower extremities, were executed either as fast single-step or as slow multiple-step gaze transfers. Daye et al. [13] proposed a novel approach for the control of linked systems with feedback loops for each part, in which the proximal parts had separate goals. In addition, an efficient and robust human tracker for a humanoid robot was implemented and experimentally evaluated in another study [14].
On the one hand, human eyes can obtain three-dimensional (3D) information about objects, which is useful for making decisions. Humans can shift their gaze stably and approach a target using the 3D information of the object. When the human gaze shifts to a moving target, the eyes first rotate toward the target, and then the head and even the body rotate if the target leaves the field of view [15]. Therefore, the eyes, head and body move in coordination to shift the gaze to the target with minimal energy expenditure. On the other hand, when a human approaches a target, the eyes, head and body rotate to face the target and the body moves toward it. The two movements are typically executed with the eyes, head and body acting in conjunction. A robot that can execute these two functions will be more intelligent. Such a robot would need to exploit the smooth pursuit of the eyes [16], coordinated eye–head movement [17], target detection and the combined movement of the eyes, head and robot body to carry out these two functions. Studies have achieved many positive results in these aspects.
Mobile robots can track and locate objects according to 3D information. Special sensors such as depth cameras and 3D laser scanners have been applied to obtain 3D information about the environment and the target. In one study [18], a nonholonomic under-actuated robot with bounded control was described that travels within a 3D region; a single sensor provided the value of an unknown scalar field at the current location of the robot. Nefti-Meziani et al. [19] presented the implementation of a stereo-vision system integrated into a humanoid robot. Low cost was one of the main aims of the vision system, avoiding expensive investment in hardware when used for 3D perception in robotics. Namavari et al. [20] presented an automatic system for the gauging and digitalization of 3D indoor environments; the configuration consisted of an autonomous mobile robot, a reliable 3D laser rangefinder and three elaborate software modules.
The main forms of motion of bionic eyes include saccade [1], smooth pursuit, vergence [21], the vestibulo–ocular reflex (VOR) [22] and the optokinetic reflex (OKR) [23]. Saccade and smooth pursuit are the two most important functions of the human eye. Saccade is used to move the eyes voluntarily from one point to another by rapid jumping, while smooth pursuit is applied to track moving targets. In addition, binocular coordination and eye–head coordination are highly important for realizing object tracking and gaze control.
It is of great significance for robots to be able to change their fixation point quickly. In control models, the saccade control system should be implemented as a position servo controller to shift the target to the center region of the retina and keep it there with minimum time consumption. Researchers have been studying the implementation of saccade on robots for the last twenty years. For example, in 1997, Bruske et al. [24] incorporated saccadic control into a binocular vision system by using the feedback error learning (FEL) strategy. In 2013, Wang et al. [25] designed an active vision system that can imitate saccade and other eye movements; the saccadic movements were implemented with an open-loop controller, which ensures faster saccadic eye movements than a closed-loop controller can provide. In 2015, Antonelli et al. [26] achieved saccadic movements on a robot head by using a model called the recurrent architecture (RA), in which the cerebellum is regarded as an adaptive element used to learn an internal model, while the brainstem is regarded as a fixed inverse model. The experimental results on the robot showed that this model is more accurate and less sensitive to the choice of the inverse model than the FEL model.
The smooth pursuit system acts as a velocity servo controller that rotates the eyes at the same angular rate as the target while keeping them oriented toward the desired position or region. In Robinson's model of smooth pursuit [27], the input is the velocity of the target's image across the retina; this velocity deviation is taken as the major stimulus for pursuit and is transformed into an eye velocity command. Based on Robinson's model, Brown [28] added a smooth predictor to accommodate time delays. Deno et al. [29] applied a dynamic neural network, which unified two apparently disparate models of smooth pursuit and dynamic element organization, to the smooth pursuit system; the dynamic neural network can compensate for delays from the sensory input to the motor response. Lunghi et al. [30] introduced a neural adaptive predictor that was previously trained to accomplish smooth pursuit; this model can explain a human's ability to compensate for the 130 ms physiological delay when following external targets with the eyes. Lee et al. [31] applied a bilateral OCS model on a robot head and established rudimentary prediction mechanisms for both slow and fast phases. Avni et al. [32] presented a framework for visual scanning and target tracking with a set of independent pan–tilt cameras based on model predictive control (MPC). In another study [33], the authors implemented smooth pursuit eye movement with prediction and learning, in addition to solving the problem of time delays in the visual pathways. Furthermore, some saccade and smooth pursuit models have been validated on bionic eye systems [34,35,36,37]. Santini et al. [34] showed that the oculomotor strategies by which humans scan visual scenes produce parallaxes that provide an accurate estimation of distance. Other studies have realized the coordinated control of eye and arm movements through configuration and training [35]. Song et al. [36] proposed a binocular control model for smooth pursuit derived from a neural pathway; in their smooth pursuit experiments, the maximum retinal error was less than 2.2°, which is sufficient to keep a target accurately in the field of view. In another study [37], an autonomous mobile manipulation system was developed based on a modified image-based visual servo (IBVS) controller.
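To make the distinction between the two servo types concrete, the following Python sketch contrasts a saccade-style position command with a smooth-pursuit-style velocity command. The gains, inputs and function names are illustrative assumptions, not elements of the cited models.

```python
# Illustrative contrast between a saccade (position servo) and smooth pursuit
# (velocity servo). Gains and inputs are arbitrary assumptions for clarity.

def saccade_command(target_angle_deg: float, eye_angle_deg: float) -> float:
    """Position servo: return a new eye angle that jumps onto the target."""
    retinal_position_error = target_angle_deg - eye_angle_deg
    return eye_angle_deg + retinal_position_error          # one rapid jump


def smooth_pursuit_command(target_velocity_dps: float,
                           eye_velocity_dps: float,
                           gain: float = 0.9) -> float:
    """Velocity servo: return an eye velocity that matches the target velocity."""
    retinal_slip = target_velocity_dps - eye_velocity_dps   # velocity deviation
    return eye_velocity_dps + gain * retinal_slip


if __name__ == "__main__":
    print(saccade_command(target_angle_deg=15.0, eye_angle_deg=0.0))   # 15.0
    print(smooth_pursuit_command(target_velocity_dps=10.0,
                                 eye_velocity_dps=6.0))                # 9.6
```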
The above-mentioned work is significant for the development of intelligent robots. However, there are some shortcomings. First, most of the existing methods are based on principles from neurology, and their further development and application may be limited by our incomplete understanding of human neurophysiology. Second, only two-dimensional (2D) image information is applied when gaze shifts to targets are implemented, while 3D information is ignored. Third, the studies of smooth pursuit [16], eye–head coordination [17], gaze shifting and approaching are independent and have not been integrated. Fourth, bionic eyes differ from human eyes; for example, some systems have two eyes that are fixed or move with only 1 DOF, whereas others use special cameras or a single camera. Fifth, the movements of the bionic eyes and head are performed separately, without coordination.
To overcome the shortcomings mentioned above to a certain extent, a novel control method that implements the gaze shift and approach of a robot according to 3D coordinates is proposed in this paper. A robot system equipped with bionic eyes, a head and a mobile robot is designed to help nurses deliver medicine in hospitals. In this system, both the head and each eye have 2 DOF (namely, tilt and pan [38]), and the mobile robot can rotate and move forward over the ground. When the robot's gaze shifts to the target, the 3D coordinates of the target are acquired by the bionic eyes and transferred to the eye coordination system, head coordination system and robot coordination system. The desired positions of the eyes, head and robot are calculated based on the 3D information of the target, and the eyes, head and mobile robot are then driven to the desired positions. When the robot approaches the target, the eyes, head and mobile robot first rotate toward the target and then move to the target. This method allows the robot to achieve the above-mentioned functions with minimal resource consumption and decouples the control of the eyes, head and mobile robot, which can improve the interactions between robots, human beings and the environment.
The rest of the paper is organized as follows. In Section 2, the robot system platform is introduced, and the control system is presented. In Section 3, the desired pose is discussed and calculated. Robot pose control is described in Section 4. The experimental results are given and discussed in Section 5; finally, conclusions are drawn in Section 6.
3. Desired Pose Calculation
When performing in situ gaze point tracking, the robot performs only pure rotation and does not move forward. When the robot approaches the target, it first turns to the target and then moves straight toward the target. Therefore, the calculation of the desired pose can be divided into two sub-problems: (1) desired pose calculation for in situ gaze point tracking and (2) desired pose calculation for approaching gaze point tracking.
The optimal observation position is used for the accurate acquisition of 3D coordinates. The 3D coordinate accuracy is related to the baseline, the time difference and image distortion. In the bionic eye platform, the baseline changes as the cameras move because the optical center is not coincident with the center of rotation. The 3D coordinate error of the target is smaller when the baseline of the two cameras is longer; therefore, it is necessary to keep the baseline unchanged. On the other hand, there is a time difference caused by imperfect synchronization between image acquisition and camera position acquisition. In addition, it is necessary to keep the target in the center areas of the two camera images to obtain accurate 3D coordinates of the target.
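To illustrate why a longer baseline and a closer observation distance reduce the error, the following sketch applies the standard stereo approximation δZ ≈ Z²·δd/(f·B), in which the depth error grows with the square of the distance Z and shrinks with the baseline B and focal length f. The focal length, disparity error and example values are assumptions for illustration, not parameters of this platform.

```python
# Standard stereo depth-error approximation: dZ ~ Z^2 * d_disp / (f * B).
# The numbers below are illustrative assumptions, not calibration values.

def depth_error(z_m: float, baseline_m: float,
                focal_px: float = 800.0, disparity_err_px: float = 0.5) -> float:
    """Approximate depth uncertainty (m) at range z_m for a given baseline."""
    return (z_m ** 2) * disparity_err_px / (focal_px * baseline_m)

if __name__ == "__main__":
    for z in (1.0, 2.0):
        for b in (0.10, 0.20):          # a longer baseline halves the error
            print(f"Z={z} m, B={b} m -> depth error ~ {depth_error(z, b):.4f} m")
```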
3.1. Optimal Observation Position of Eyes
In the desired pose of the robot, the most important aspect is the expected pose of the bionic eye [40]. With this quantity defined, the calculation of the desired pose of the robot system is greatly simplified; thus, we present an engineering definition of the desired pose of the bionic eye here.
As shown in Figure 5, lmi (lui, lvi) and rmi (rui, rvi) are the image coordinates of point eP in the left and right cameras at time i. lmo and rmo are the image centers of the left and right cameras, respectively. lP is the vertical point of eP along the line lOclZc, and rP is the vertical point of eP along the line rOcrZc. lΔm is the distance between lm and lmo, and rΔm is the distance between rm and rmo. Db is the baseline length. The pan angles of the left and right cameras in the optimal observation position are lθp and rθp, respectively, and the tilt angles of the left and right cameras in the optimal observation position are lθt and rθt, respectively. Pob (lθp, lθt, rθp, rθt) is the optimal observation position.
When the two eyeballs of the bionic eye move relative to each other, the 3D coordinates of the target obtained by the bionic eye can contain a large error. To characterize this error, we give a detailed analysis of its origins in Appendix A. Through this analysis, we obtain the following conclusions for reducing the measurement error of the bionic eye:
(1) Make the baseline Db long enough, and maintain as much of its length as possible during movement;
(2) Observe the target from as close as possible so that the depth error is as small as possible;
(3) During the movement of the bionic eye, control the two cameras so that they move at the same angular velocity;
(4) Keep the target symmetric in the left and right camera images, making lΔm and rΔm as equal as possible.
Based on these four guidelines, the motion strategy of the motors is designed, and the measurement accuracy of the target's 3D information can be effectively improved.
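As a rough illustration of how guideline (4) and the image-centering requirement could be checked per frame, a minimal sketch is given below; the pixel thresholds and function name are assumptions, since the actual conditions are those of Equation (2).

```python
# Minimal per-frame check of the observation guidelines (image centering and
# left/right symmetry). Thresholds are illustrative assumptions, not Eq. (2).

def observation_ok(l_offset_px: float, r_offset_px: float,
                   center_tol_px: float = 40.0,
                   symmetry_tol_px: float = 10.0) -> bool:
    """l_offset_px / r_offset_px: distances lΔm and rΔm of the target from the
    left/right image centers. Returns True if the pose is close to optimal."""
    centered = abs(l_offset_px) < center_tol_px and abs(r_offset_px) < center_tol_px
    symmetric = abs(abs(l_offset_px) - abs(r_offset_px)) < symmetry_tol_px
    return centered and symmetric

print(observation_ok(12.0, -14.0))   # True: centered and nearly symmetric
print(observation_ok(80.0, -5.0))    # False: off-center and asymmetric
```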
Based on these conclusions, we can define the optimal observation pose of the bionic eye so as to reduce the measurement error.
The conditions that the optimal observation position must meet are listed in Equation (2). When the target is very close to the eyes, the optimal observation position cannot be obtained because the image position of the target cannot be kept in the image center region. It is challenging to obtain the optimal solution of the observation position based on Equation (12). However, a suboptimal solution can be obtained by using a simplified calculation method: first, lθt and rθt are calculated in the case that lθp and rθp are equal to zero; then, lθp and rθp are calculated while lθt and rθt are kept equal to the calculated values. A trial-and-error method can then be used to obtain the optimal solution starting from the suboptimal solution.
where
3.2. Desired Pose Calculation for In Situ Gaze Point Tracking
When the range of target motion is large and the desired posture of the eyeball exceeds its reachable posture, the head and mobile robot move to keep the target in the center region of the image. In the robot system, eye movements consume the fewest resources and have little impact on the stability of the head and mobile robot during motion. Head rotation consumes more resources than eyeball rotation but fewer resources than trunk rotation; at the same time, the rotation of the head affects the stability of the eyeballs but has little impact on the stability of the trunk. Mobile robot rotation consumes the most resources and has a large impact on the stability of the head and eyeballs. When tracking the target, one needs only to keep the target in the center region of the binocular images. Therefore, when performing gaze point tracking, the movement mechanism of the head, eyes and mobile robot is designed according to the principles of minimal resource consumption and maximum system stability. When the eyeballs can perceive the 3D coordinates of the target in a reachable, optimal viewing posture, only the eyes are rotated; otherwise, the head is rotated. The head also has an attainable range of poses; when the desired pose exceeds this range, the mobile robot is turned so that the bionic eye always perceives the 3D coordinates of the target in the optimal viewing position. Let hγp and hγt be the angles between the head and the gaze point in the XhOhZh and YhOhZh planes, respectively. The range of binocular rotation is [−eθpmax, eθpmax] in the horizontal direction and [−eθtmax, eθtmax] in the vertical direction. The range of head rotation is [−hθpmax, hθpmax] in the horizontal direction and [−hθtmax, hθtmax] in the vertical direction. For convenience of calculation, the allowed angles between the head and the fixation point in the horizontal and vertical directions are designated as [−hγpmax, hγpmax] and [−hγtmax, hγtmax], respectively. When the angle between the head and the target exceeds the set threshold, the head needs to be rotated to the hθpq and hθtq positions in the horizontal and vertical directions, respectively. When the required head angle exceeds the angle that the head can attain, the angle that the mobile robot needs to compensate is wθp. In the in situ gaze point tracking task, the cart does not need to translate in the XwOwZw plane, so xw = 0 and zw = 0. Furthermore, according to the definition of the optimal observation pose of the bionic eye, the conditions that gfq should satisfy are
The desired pose needs to be calculated based on the 3D coordinates of the target. Therefore, to obtain the desired pose, it is necessary to acquire the 3D coordinates of the target according to the current pose of the robot.
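The allocation strategy described above, in which the eyes rotate first, the head rotates only when the eyes would leave their reachable range and the mobile robot rotates only when the head would leave its range, can be summarized by the following simplified sketch. The numeric limits are placeholders, and the actual desired angles are computed from the 3D geometry in Sections 3.2.2, 3.2.3 and 3.2.4.

```python
# Illustrative sketch of the eye/head/robot allocation strategy for in situ
# gaze point tracking. Angles are in degrees; the limits correspond to
# eθpmax and hθpmax in the text, but the numeric values are assumptions.

def allocate_pan(gaze_pan_deg: float,
                 eye_pan_max: float = 45.0,
                 head_pan_max: float = 30.0):
    """Split a desired horizontal gaze angle into eye, head and robot parts."""
    eye = max(-eye_pan_max, min(eye_pan_max, gaze_pan_deg))
    remainder = gaze_pan_deg - eye
    head = max(-head_pan_max, min(head_pan_max, remainder))
    robot = remainder - head           # compensated by the mobile robot
    return eye, head, robot

print(allocate_pan(20.0))    # (20.0, 0.0, 0.0): eyes alone are sufficient
print(allocate_pan(60.0))    # (45.0, 15.0, 0.0): head assists the eyes
print(allocate_pan(100.0))   # (45.0, 30.0, 25.0): robot rotation is required
```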
3.2.1. Three-Dimensional Coordinate Calculation
The mechanical structure and coordinate settings of the system are shown in Figure 6a, and the principle of binocular stereoscopic 3D perception is shown in Figure 6b. E is the eye coordinate system, El is the left motion module's end coordinate system, Er is the right motion module's end coordinate system, Bl is the left motion module's base coordinate system, Br is the right motion module's base coordinate system, Cl is the left camera coordinate system and Cr is the right camera coordinate system. In the initial position, El coincides with Bl, and Er coincides with Br. When the binocular system moves, the base coordinate systems do not change. lT represents the transformation matrix from the eye coordinate system E to the left motion module's base coordinate system Bl, rT represents the transformation matrix from E to Br, lTe represents the transformation matrix from Bl to El, rTe represents the transformation matrix from Br to Er, lTm represents the transformation matrix from the left motion module's end coordinate system to the left camera coordinate system and rTm represents the transformation matrix from the right motion module's end coordinate system to the right camera coordinate system. lTr represents the transformation matrix from the right camera coordinate system to the left camera coordinate system at the initial position.
The origin lOc of Cl lies at the optical center of the left camera, the lZc axis points in the direction of the object parallel to the optical axis of the camera, the lXc axis points horizontally to the right along the image plane and the lYc axis points vertically downward along the image plane. The origin rOc of Cr lies at the optical center of the right camera, rZc is aligned with the direction of the object parallel to the optical axis of the camera, rXc points horizontally to the right along the image plane and rYc points vertically downward along the image plane. El’s origin lOe is set at the intersection of the two rotation axes of the left motion module, lZe is perpendicular to the two rotation axes and points to the front of the platform, lXe coincides with the vertical rotation axis and lYe coincides with the horizontal rotation axis. Similarly, the origin rOe of the coordinate system Er is set at the intersection of the two rotation axes of the right motion module, rZe is perpendicular to the two rotation axes and points toward the front of the platform, rXe coincides with the vertical rotation axis and rYe coincides with the horizontal rotation axis.
The left motion module's base coordinate system Bl coincides with the eye coordinate system E; thus, lT is an identity matrix. To calculate the 3D coordinates of the feature points in real time from the camera pose, it is necessary to calculate rT. At the initial position of the system, the external parameters lTr of the left and right cameras are calibrated offline, as are the hand–eye parameters from the left and right motion modules to the camera coordinate systems.
When the system is in its initial configuration, the coordinates of point P in the eye coordinate system are Pe (xe, ye, ze). Its coordinates in Bl are lPe (lxe, lye, lze), and its coordinates lPc (lxc, lyc, lzc) in Cl are
The coordinates rPe (rxe, rye, rze) of point P in Br are
The coordinates rPc (rxc, ryc, rzc) of point P in Cr are
The point in Cr is transformed into Cl:
Based on Equations (6) and (9), rT can be obtained:
During the movement of the system, when the left motion module rotates by lθp and lθt in the horizontal and vertical directions, respectively, the transformation relationship between Bl and El is
The coordinates of point P in Cl are
The point lP1c (lx1c, ly1c) at which line PlOc intersects lZc = 1 is
The image coordinates of lP1c in the left camera are ml (ul, vl); (lx1c, ly1c) and (ul, vl) can be converted into each other using the parameters of the camera. According to the camera's internal parameter model, the following can be obtained:
where lMin is the internal parameter matrix of the left camera. The value of (lx1c, ly1c) can be obtained from the image coordinates of lP1c and the parameters of the left camera, and the following can be obtained by substituting (15) into (14):
During the motion of the system, when the right motion module rotates by rθp and rθt in the horizontal and vertical directions, respectively, the transformation relationship between Br and Er is
The coordinates of point P in Cr are
The point rP1c (rx1c, ry1c) at which line PrOc intersects rZc = 1 is
The image coordinates of rP1c in the right camera are mr (ur, vr); (rx1c, ry1c) and (ur, vr) can be converted into each other using the parameters of the camera. According to the camera's internal parameter model, the following can be obtained:
where rMin is the internal parameter matrix of the right camera. The value of (rx1c, ry1c) can be obtained from the image coordinates of rP1c and the parameters of the right camera, and the following can be obtained by substituting (21) into (20):
Four equations can be obtained from Equations (16) and (22) for xe, ye and ze, and the 3D coordinates of point Pe can be calculated by the least squares method.
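A minimal sketch of this least-squares triangulation is given below: each camera contributes a ray (optical center plus viewing direction) expressed in the common eye coordinate system, and the point minimizing the distance to both rays is recovered linearly. The ray construction here is generic; in the actual system, the rays follow from Equations (16) and (22) through the calibrated transforms and camera intrinsics.

```python
import numpy as np

def triangulate(origins, directions):
    """Least-squares intersection of camera rays.

    origins:    list of 3-vectors, optical centers in the eye frame.
    directions: list of 3-vectors, viewing directions (need not be unit length).
    Returns the 3D point closest to all rays.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        d = np.asarray(d, dtype=float)
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)   # projector onto the plane normal to d
        A += P
        b += P @ np.asarray(o, dtype=float)
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Two rays from cameras 0.1 m apart, both looking at the point (0.2, 0.0, 1.0).
left_origin,  left_dir  = np.array([-0.05, 0.0, 0.0]), np.array([0.25, 0.0, 1.0])
right_origin, right_dir = np.array([ 0.05, 0.0, 0.0]), np.array([0.15, 0.0, 1.0])
print(triangulate([left_origin, right_origin], [left_dir, right_dir]))
# -> approximately [0.2, 0.0, 1.0]
```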
The 3D coordinates Ph (xh, yh, zh) in the head coordinate system can be obtained by Equation (23). dx and dy are illustrated in Figure 4.
Let the angles through which the head has rotated at the current moment relative to the initial position be hθpi and hθti; the coordinates of the target in the robot coordinate system are
According to the 3D coordinates of the target in the head coordinate system, the angle between the target and Zh in the horizontal direction and the vertical direction can be obtained as follows:
When hγp and hγt exceed a set threshold, the head needs to rotate. To leave a certain margin for the rotation of the eyeball and for the convenience of calculation, the angles required for the head to rotate in the horizontal direction and the vertical direction are calculated by the principle shown in Figure 7a,b, respectively.
Figure 7a shows the calculation principle for the horizontal direction when the target's x coordinate in the head coordinate system is greater than zero. After the head is rotated to the desired horizontal angle, the target point lies on the lZe axis of the left motion module's end coordinate system, and the left motion module reaches the maximum rotatable threshold eθpmax.
Figure 7b shows the calculation principle for the vertical direction when the target's y coordinate in the head coordinate system is greater than dy. After the head is rotated to the desired vertical angle, the target point lies on the Ze axis of the eye coordinate system, and the eye reaches the maximum rotatable threshold eθtmax.
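A minimal sketch of this threshold test is given below, under the assumption that hγp and hγt are measured between the Zh axis and the projections of the target onto the XhOhZh and YhOhZh planes; the exact head rotation targets are those derived in Sections 3.2.2 and 3.2.3.

```python
import math

def head_target_angles(xh: float, yh: float, zh: float):
    """Angles (deg) between the head Zh axis and the target, assuming hγp is
    measured in the XhOhZh plane and hγt in the YhOhZh plane."""
    gamma_p = math.degrees(math.atan2(xh, zh))   # horizontal
    gamma_t = math.degrees(math.atan2(yh, zh))   # vertical
    return gamma_p, gamma_t

def head_must_rotate(xh, yh, zh, gamma_p_max=30.0, gamma_t_max=30.0):
    gp, gt = head_target_angles(xh, yh, zh)
    return abs(gp) > gamma_p_max or abs(gt) > gamma_t_max

print(head_target_angles(0.5, 0.1, 1.0))   # ~(26.6, 5.7): eyes alone suffice
print(head_must_rotate(0.8, 0.1, 1.0))     # True: horizontal angle ~38.7 > 30
```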
3.2.2. Horizontal Rotation Angle Calculation
Let the current angle of the head in the horizontal direction be hθpi. When the head is rotated in the horizontal direction to its desired angle, the 3D coordinates of the target in the new head coordinate system are
The coordinates of the target in the new eye coordinate system are
After turning, the left motion module reaches the maximum threshold eθpmax that can be rotated, so that
Simplifying Equation (30), we have
According to the triangular relationship,
The solution of Equation (33) is
Equation (35) has two solutions; therefore, we choose the solution in which the deviation e of Equation (36) is minimized:
When the obtained head angle is outside of the range [−hθpmax, hθpmax], the value of hθpq is
Finally, one can obtain the wθpq value:
Based on the same principle, when the x coordinate of the target in the head coordinate system is less than 0, the coordinates of the target in the right motion module's base coordinate system after the rotation are
After turning, the right motion module reaches −eθpmax, and the following can be obtained:
We simplify Equation (40) as follows:
Similarly, two solutions are obtained:
Select the solution in which the deviation e of Equation (44) is minimized:
Using Equations (37) and (38), hθpq and wθpq can be obtained.
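Since the closed-form expressions are given by Equations (35)–(38), the sketch below shows only the generic selection pattern: keep the candidate with the smallest deviation, clamp it to the head's range, and assign the remainder to the mobile robot. The candidate list and residual function are hypothetical stand-ins, and the split of the remainder onto the robot is an assumed reading of Equations (37) and (38).

```python
# Generic pattern for Section 3.2.2: among the candidate solutions of the
# trigonometric equation, keep the one with the smallest residual (deviation e),
# clamp it to the head's range, and assign the remainder to the mobile robot.

def pick_head_and_robot_pan(candidates, residual, h_pan_max=30.0):
    best = min(candidates, key=residual)              # smallest deviation e
    head_pan = max(-h_pan_max, min(h_pan_max, best))  # clamp (cf. Eq. (37))
    robot_pan = best - head_pan                       # remainder (assumed Eq. (38))
    return head_pan, robot_pan

# Hypothetical example: two candidate angles; the residual prefers the second.
cands = [12.0, 47.0]
res = lambda a: abs(a - 47.0)        # stand-in for the deviation of Eq. (36)
print(pick_head_and_robot_pan(cands, res))   # (30.0, 17.0)
```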
3.2.3. Vertical Rotation Angle Calculation
When the target's y coordinate in the head coordinate system is greater than dy, let the current angle of the head in the vertical direction be hθti. When the head is rotated in the vertical direction to its desired angle, the 3D coordinates of the target in the new head coordinate system are
Using Equation (29), the coordinates of the target in the eye coordinate system after the rotation can be calculated:
After rotation, the left and right motion modules reach the rotatable maximum value eθtmax in the vertical direction, so that
Simplifying Equation (48), we obtain
Equation (51) has two solutions; therefore, we choose the solution in which the deviation e of Equation (52) is minimized:
Similarly, when the target's y coordinate in the head coordinate system is less than dy, we have
When the obtained head angle is outside of the range [−hθtmax, hθtmax], the value of hθtq is
After hθpq, hθtq and wθpq are obtained, the coordinates of the target in the eye coordinate system after the mobile robot and the head are rotated are
The desired observation pose of the eye, characterized by lθtq, lθpq, rθtq and rθpq, can be obtained using the method described in the following section.
3.2.4. Calculation of the Desired Observation Poses of the Eye
According to Formula (2), lθtq = rθtq = θt, and lθpq = rθpq = θp.
The inverse of the hand–eye matrix of the left camera and left motion module end coordinate system is
The coordinates lPc (lxc, lyc, lzc) of the target in the left camera coordinate system satisfy the following relationship:
According to the pinhole imaging model, the imaging coordinates of the target point in the left camera are
Substituting Equation (61) into Equation (2), we obtain
Based on the same principle, the coordinates rPc (rxc, ryc, rzc) of the target in the right camera coordinate system are
The imaging coordinates of the target point in the right camera are
From Equations (2), (62) and (65), two equations related to θt and θp (see Appendix C for the complete equations) can be obtained. However, it is challenging to calculate the values of θt and θp directly from these two equations. To obtain a solution, we first compute a suboptimal observation pose and use it as the initial value; then, we use a trial-and-error method to obtain the optimal observation pose. When θt is calculated, let θp = 0; the solution of θt can then be obtained from Δvl = −Δvr. When θp is calculated, the solution of θp is obtained from Δul = −Δur. The resulting solution Pob (θt, θt, θp, θp) is a suboptimal observation pose. Based on the suboptimal observation pose, the trial-and-error method can be used to obtain the optimal solution with the smallest error. The range of θt is [−θtmax, θtmax], and the range of θp is [−θpmax, θpmax].
According to Equations (60) and (63), let θp be equal to 0 to obtain
The following result is also available:
The base coordinate system of the left motion module is the world coordinate system; therefore, lTw is an identity matrix. To simplify the calculation, we have
According to the calculation principle of Section 3.2.1, we have the following:
The solution to θt that keeps the target at the center of the two cameras needs to satisfy the following conditions:
Substituting the second equation of Equations (69) and (70) into Equation (72) and solving the equation, we have
where the coefficients are
According to the triangle relationship, we have
Replacing cos θt in Equation (73) with sin θt, we obtain the following:
where the coefficients are
Four solutions can be obtained using Equation (81). The optimal solution must be a real number, and the most suitable solution can be selected using the condition of Equation (72).
After θt is obtained, θp can be solved based on the obtained θt.
According to Equations (60) and (63), using the solution obtained in Section 3.2.2, we have
The following result is also available:
Since this quantity is known, for convenience of calculation, we set
The following results are obtained:
The solution to θp that keeps the target at the center of the two cameras needs to satisfy the following conditions:
Substituting the second equation of Equations (91) and (92) into Equation (94) and solving the resulting equation, we obtain
where
Replacing cos θp in Equation (73) with sin θp, we obtain
where
Four solutions can be obtained using Equation (102). The optimal solution must be a real number, and the most suitable solution can be selected using the condition of Equation (94). If none of the four solutions satisfies Equation (94), the position of the target is beyond the reach of the bionic eye; in this case, compensation through the head or torso is required, and the values obtained at this time are suboptimal solutions close to the optimal solution. θt and θp are the optimal solutions.
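The two-stage suboptimal computation followed by trial-and-error refinement can be sketched as a pair of one-dimensional root searches and a local search, as below. The functions dv_l, dv_r, du_l and du_r are toy stand-ins for the image offsets predicted by Equations (60)–(65); in the real system they come from the calibrated transforms and intrinsics, and the closed-form solutions of Equations (73)–(102) replace the bisection.

```python
# Sketch of the two-stage suboptimal solution plus trial-and-error refinement.
# dv_l/dv_r/du_l/du_r below are TOY models of the offsets of the target from
# the two image centers as functions of the common tilt/pan angles.
import numpy as np

def dv_l(tilt, pan): return 30.0 - 2.0 * tilt + 0.1 * pan   # toy left  Δv (px)
def dv_r(tilt, pan): return 28.0 - 1.9 * tilt - 0.1 * pan   # toy right Δv (px)
def du_l(tilt, pan): return 40.0 - 2.5 * pan + 0.1 * tilt   # toy left  Δu (px)
def du_r(tilt, pan): return -38.0 - 2.4 * pan - 0.1 * tilt  # toy right Δu (px)

def bisect(f, lo, hi, tol=1e-4):
    """1D root search; assumes f(lo) and f(hi) have opposite signs."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Stage 1: tilt with pan fixed at 0, from the condition Δv_l = -Δv_r.
tilt0 = bisect(lambda t: dv_l(t, 0.0) + dv_r(t, 0.0), -45.0, 45.0)
# Stage 2: pan with the tilt just found, from the condition Δu_l = -Δu_r.
pan0 = bisect(lambda p: du_l(tilt0, p) + du_r(tilt0, p), -45.0, 45.0)

# Trial-and-error refinement around the suboptimal pose (coarse grid search).
def error(t, p):
    return abs(dv_l(t, p) + dv_r(t, p)) + abs(du_l(t, p) + du_r(t, p))

grid = [(tilt0 + dt, pan0 + dp) for dt in np.arange(-1, 1.01, 0.1)
                                for dp in np.arange(-1, 1.01, 0.1)]
tilt_opt, pan_opt = min(grid, key=lambda tp: error(*tp))
print(round(tilt0, 2), round(pan0, 2), round(tilt_opt, 2), round(pan_opt, 2))
```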
Through the above steps, the desired observation pose can be calculated. The calculation steps for gfq can be summarized by the flow chart shown in Figure 8.
3.3. Desired Pose Calculation for Approaching Gaze Point Tracking
The mobile robot approaches the target in two steps: first, the robot and the head rotate in the horizontal direction until they face the target; second, the robot moves straight toward the target. The desired pose for the approaching motion should satisfy the following conditions: (1) the target should be on the Z axes of the robot and head coordinate systems, (2) the distance between the target and the robot should be less than the set threshold DT and (3) the eyes should be in the optimal observation position.
afq can be defined as
The desired rotation angle wθpq of the mobile robot is the same as the angle bγp between the robot and the target and can be obtained by
hθtq can be obtained using the method described in Section 3.2. The optimal observation pose described in Section 3.2.4 can be used to obtain lθtq, lθpq, rθtq and rθpq.
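Conditions (1) and (2) can be sketched as follows, assuming a robot frame with x to the right and z forward; the 0.6 m default threshold echoes the stop distance used in the experiments, and the function name is illustrative.

```python
import math

def approach_setpoint(x_w: float, z_w: float, d_threshold: float = 0.6):
    """Desired robot yaw (deg) toward the target and a stop flag.
    (x_w, z_w): target position in the robot's horizontal plane, z forward."""
    yaw_deg = math.degrees(math.atan2(x_w, z_w))     # bγp, used as wθpq
    distance = math.hypot(x_w, z_w)
    return yaw_deg, distance <= d_threshold

print(approach_setpoint(1.26, 1.8))   # ~(35.0, False): first turn toward target
print(approach_setpoint(0.0, 0.55))   # (0.0, True): within DT, stop moving
```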
4. Robot Pose Control
After the desired pose of the robot system is obtained, the control block diagram shown in Figure 9 is used to control the robot to move to the desired pose.
The desired pose is converted to the desired positions of the motors. Δθlt, Δθlp, Δθrt, Δθrp, Δθht and Δθhp are the deviations of the desired angles from the current angles of motors Mlu, Mld, Mru, Mrd, Mhu and Mhd, respectively. lθm and rθm are the angles through which each wheel of the moving robot needs to be rotated. During the in situ gaze point tracking process, the moving robot performs only rotation in place, and the angle of the robot's movement can be calculated according to the desired angle of the robot. When the robot rotates, the two wheels move in opposite directions at the same speed. Let the distance between the two wheels of the moving robot be Dr; when the robot rotates through an angle wθpq, the distance that each wheel needs to move is
The diameter of each wheel is dw, and the angle of rotation of each wheel is (counterclockwise is positive)
In the process of approaching the target, the moving robot follows a straight line, and the angle of rotation of each wheel is
The movement of the moving robot is achieved by controlling the rotation of each wheel. Each wheel is equipped with a DC brushless motor, and a DSP2000 controller is used to control the movement of the DC brushless motor. Position servo control is implemented in the DSP2000 controller.
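Because the wheel equations themselves are not reproduced above, the sketch below gives the standard differential-drive relations consistent with the description: for an in-place rotation through wθpq, each wheel travels (Dr/2)·wθpq in opposite directions, and its rotation angle follows from the wheel diameter dw; for the straight-line approach, both wheels rotate by the same angle. The wheel base and diameter used in the example are hypothetical.

```python
import math

def wheel_angles_for_rotation(yaw_deg: float, wheel_base_m: float, wheel_diam_m: float):
    """In-place rotation: wheels travel (Dr/2)*wθpq in opposite directions.
    Returns (left_deg, right_deg); counterclockwise positive, as in the text."""
    arc = (wheel_base_m / 2.0) * math.radians(yaw_deg)    # distance per wheel
    wheel_deg = math.degrees(arc / (wheel_diam_m / 2.0))  # arc / wheel radius
    return -wheel_deg, wheel_deg                          # opposite directions

def wheel_angles_for_straight(distance_m: float, wheel_diam_m: float):
    """Straight-line approach: both wheels rotate through the same angle."""
    wheel_deg = math.degrees(distance_m / (wheel_diam_m / 2.0))
    return wheel_deg, wheel_deg

# Hypothetical geometry: 0.4 m wheel base, 0.15 m wheel diameter.
print(wheel_angles_for_rotation(90.0, 0.4, 0.15))   # ~(-240, +240) degrees
print(wheel_angles_for_straight(1.0, 0.15))         # ~(764, 764) degrees
```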
In the robot system, the camera and lens weigh approximately 80 g, the mechanical parts that fix the camera weigh approximately 50 g and the motor that controls the vertical rotation of the camera (rotation around the horizontal axis) and the corresponding encoder weigh approximately 250 g. The mechanical parts that fix the vertical rotation motor and encoder weigh approximately 100 g. The radius of rotation of the camera in the vertical direction is approximately 1 cm, and the rotation in the horizontal direction (rotation about the vertical axis) has a radius of approximately 2 cm. Therefore, with a gravitational acceleration of 9.8 m/s², the torque required for the vertical rotation motor is approximately 0.013 N·m, and the torque required for the horizontal rotation motor is approximately 0.043 N·m. The vertical rotation motor is a 28BYG5401 stepping motor with a holding torque of 0.1 N·m and a positioning torque of 0.008 N·m; its driver is an HSM20403A. The horizontal rotation motor is a 57BYGH301 stepping motor with a holding torque of 1.5 N·m and a positioning torque of 0.07 N·m; its driver is an HSM20504A. The four stepping motors of the eyes have a step angle of 1.8° and are all driven with 25× microstepping, so the actual step angle of each motor is 0.072°, and the minimum pulse width that the driver can receive is 2.5 µs. Each stepper motor has a maximum angular velocity of 200°/s.
The head's vertical rotation motor is a 57BYGH401 stepping motor with a holding torque of 2.2 N·m and a positioning torque of 0.098 N·m; its driver is an HSM20504A. The head's horizontal rotation motor is an 86BYG350B three-phase AC stepping motor with a holding torque of 5 N·m, a positioning torque of 0.3 N·m and an HSM30860M driver. The step angle of the head motors after microstepping is also 0.072°. The head's vertical motor carries a load of approximately 5 kg with a radius of rotation of less than 1 cm, and the head's horizontal rotation motor carries a load of approximately 9.5 kg with a radius of rotation of approximately 5 cm. In the experiments, we found that the maximum pulse frequency that the head's horizontal rotation motor can receive is 0.6 kpps, corresponding to a maximum angular velocity of 43.2°/s.
5. Experiments and Discussion
Using the robot platform introduced in Section 2, experiments on in situ gaze point tracking and approaching gaze point tracking were performed.
Each camera has a resolution of 400 × 300 pixels. The range of rotation of each eye is [−45°, 45°], and the range of rotation of the head is [−30°, 30°].
dx and dy are 150 mm and 200 mm, respectively. The internal and external parameters, distortion parameters, initial position parameters and the left and right hand–eye parameters of the binocular system are calibrated as follows:
The experimental in situ gaze point tracking scene is shown in Figure 10, with a checkerboard used as the target. For in situ gaze point tracking, the target is held by a person; in the approaching gaze point tracking experiment, the target is fixed in front of the robot.
5.1. In Situ Gaze Point Tracking Experiment
In the in situ gaze experiment, the target moves at a low speed within a certain range, and the robot combines the movement of the eyes, the head and the mobile robot so that the binocular vision can always perceive the 3D coordinates of the target at the optimal observation posture. This experiment requires the robot to find the target and gaze at it. In the gaze point tracking process, binocular stereo vision is used to calculate the 3D coordinates of the target in the eye coordinate system in real time. Through the positional relationship between the eye and the head, the coordinates of the target in the eye coordinate system can be converted to the head coordinate system; similarly, the 3D coordinates of the target in the robot coordinate system can be obtained. From these 3D coordinates, the desired poses of the eyes, head and mobile robot are calculated according to the method proposed in this paper. Then, the cameras are driven to the desired positions by the stepping motors; after the desired positions are reached, the images and the motor position information are collected again, and the 3D coordinates of the target are recalculated.
In the experiment, the thresholds on the angles between the head and the target, hγpmax and hγtmax, are each 30°. The method described in Section 3 is used to calculate the desired pose of each joint of the robot based on the 3D coordinates of the target. During the experiment, the actual and desired image coordinates of the target in the binocular images, the actual and desired positions of the eye and head motors, the angles between the head, the robot and the target, and the coordinates of the target in the robot coordinate system are recorded.
Figure 11a,b show the u and v coordinates of the target on the left image, respectively, and Figure 11c,d show the u and v coordinates of the target on the right image, respectively. The desired image coordinates are recalculated based on the optimal observation position. Figure 11e–h show the positions of the tilt motor (Mlu) of the left eye, the pan motor (Mld) of the left eye, the tilt motor (Mru) of the right eye and the pan motor (Mrd) of the right eye, respectively. Figure 11i shows the position of the pan motor (Mhd) of the head. Since the target moves in the vertical direction with small amplitude, the motor Mhu does not rotate; because its behavior is similar to that of motor Mhd, only the result for motor Mhd is provided for the head. Figure 11j shows the angle deviations and rotations: T-h is the angle between the head and the target, T-r is the angle between the robot and the target, R-r is the angle of the robot's rotation from the origin location and T-o is the angle of the target relative to the origin location. Figure 11k shows the coordinates (wx, wz) of the target in the world coordinate system. Figure 11l shows the coordinates (ox, oz) of the target in the world coordinate system of the origin location.
As shown in Figure 11, the image coordinates of the target remain substantially within ±40 pixels of the center region of the left and right images in the x direction and within ±10 pixels of the center region in the y direction. Throughout the experiment, the target was rotated approximately 200° around the robot; the robot rotated approximately 140°, the head rotated 30° and the target could be kept in the center region of the binocular images. The motor position curves show that the motors' operating positions track the desired positions very well. The angle variation curves show that the changes in the angles between the target and the head and between the target and the robot are consistent with the robot's turning angle. The coordinates of the target in the robot coordinate system and in the world coordinate system of the initial position shown in Figure 11 are very close to the actual change in the target's position.
Through the above analysis, we can determine the following: (1) It is feasible to realize gaze point tracking of a robot based on 3D coordinates. (2) Using the combined movement of the head, eyes and mobile robot described in this paper, gaze point tracking of the target can be achieved while ensuring minimum resource consumption.
5.2. Approaching Gaze Point Tracking Experiment
The approaching gaze point tracking experimental scene is shown in Figure 12.
The robot approaches the target without obstacles and reaches the area in which it can operate on the target, so that the target can be grasped or observed carefully. In the approaching gaze experiment, the target is fixed at a position 2.2 m from the robot; when the robot moves to a position where the distance from the target to the robot is 0.6 m, the motion is stopped. The maximum speed of the mobile robot is 1 m/s. The experiment realizes the approaching movement toward the target in two steps: first, the head, the eyes and the mobile robot chassis are rotated so that the head and the mobile robot face the target and the eyes observe the target in the optimal observation posture; second, the robot is controlled to move linearly toward the target. During the movement, the angles of the head and the eyes are fine-tuned, and the 3D coordinates of the target are detected in real time until the z coordinate of the target in the robot coordinate system is less than the threshold set to stop the motion.
Figure 13 shows the results of the approaching gaze point tracking experiment. Figure 13a,b show the u and v coordinates of the target on the left image, respectively, and Figure 13c,d show the u and v coordinates of the target on the right image, respectively. The desired image coordinates are recalculated based on the optimal observation position. Figure 13e–h show the positions of the tilt motor (Mlu) of the left eye, the pan motor (Mld) of the left eye, the tilt motor (Mru) of the right eye and the pan motor (Mrd) of the right eye, respectively. Figure 13i shows the position of the pan motor (Mhd) of the head. Figure 13j shows the angle deviations and rotations: T-h is the angle between the head and the target, T-r is the angle between the robot and the target, R-r is the angle of the robot's rotation from the origin location and T-o is the angle of the target relative to the origin location. Figure 13k shows the coordinates (wx, wz) of the target in the world coordinate system. Figure 13l shows the robot's forward distance and the distance between the target and the robot.
The change in the image coordinate curves indicates that the coordinates of the target in the left and right images move from the initial position to the central region of the image and stabilize there during the approach process. In the first step, while turning toward the target, the target coordinates in the image fluctuate because the head motor rotates through a large angle and is accompanied by a certain vibration during the rotation; this could be avoided by using a system with better stability. The motor position curves in Figure 13 show that the motors track the desired pose well; prediction of the 3D coordinates is not used during the tracking process, so the tracking is accompanied by a one-cycle lag. The changes in angle in Figure 13 show that the robot system achieves the task of steering toward the target in the first few control cycles and then moves toward the target at a stable angle.
Figure 13a shows the change in the coordinates of the target in the robot coordinate system. When the robot rotates, fluctuations arise in the measured x coordinate, mainly due to the measurement error caused by the shaking of the system. The experimental results in Figure 13b show that the robot's movement toward the target is very consistent. During the approach process, the target can be kept within ±50 pixels of the desired position in the horizontal direction of the image and within ±20 pixels of the desired position in the vertical direction. The eye motors achieve fast tracking of the target within 1.5 s. The angle between the target and the head is reduced from 20° to 0°, and the angle between the target and the robot is reduced from 35° to 0°. The robot ultimately turns through approximately 34°, consistent with the approximately 34° change in the target's direction from the initial position.
Through the above analysis, it can be found that by using the combination of the head, the eye and the trunk in the present method, the approach toward the target can be achieved while ensuring that the robot is gazing at the target.
6. Conclusions
This study achieved gaze point tracking based on the 3D coordinates of the target. First, a robot experimental platform was designed: a head with two degrees of freedom was added to the bionic eye platform, with a mobile robot used as the carrier.
Based on the characteristics of the robot platform, this paper proposed a gaze point tracking method. To achieve in situ gaze point tracking, the combined movement of the eyes, head and trunk is designed according to the principles of minimum resource consumption and maximum system stability. Eye rotation consumes the least amount of resources and has minimal impact on the stability of the overall system during motion. Head rotation consumes more resources than eye rotation but fewer than trunk rotation; at the same time, the rotation of the head affects the stability of the eyes but only minimally affects the stability of the entire robotic system. The rotation of the trunk generally consumes the most resources and tends to affect the stability of both the head and the eyes. Therefore, when the eyes can observe the target in the optimal observation posture, only the eyes are rotated; otherwise, the head is rotated, and when the angle through which the head needs to move exceeds its threshold, the mobile robot rotates. When approaching gaze point tracking is performed, the robot and head first turn to face the target and then move straight toward the vicinity of the target. Based on the proposed gaze point tracking method, this paper provides an expected pose calculation method for the horizontal and vertical rotation angles.
Based on the experimental robot platform, a series of experiments was performed, and the effectiveness of the gaze point tracking method was verified. In future work, we will implement a practical medicine-delivery task in a hospital, carry out more detailed comparative experiments and provide further comparisons and discussions with respect to similar studies.