A System For Automatic Animation of Piano Performances
C(a, b) (1)
C(a, b) is the energy cost of maintaining a pose for any pair of adjacent instructed fingers a and b. Its value reflects the distance between the two neighboring fingers and how easily they can press the piano keys at that distance. For example, we expect C(1, 4) to be minimal when d(a, b) is 3-5, where d(a, b) denotes the distance between the two fingers pressing the piano keys, in units proportional to the width of a white key, because the thumb and ring finger are by default 3 fingers (corresponding to 3 keys on the piano) apart, so this is the most relaxed arrangement. The cost C(a, b) is a segmented function, set from the performance experience of piano players, that increases for larger or smaller separations, as illustrated in Figure 3.
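A minimal sketch of one possible segmented cost of this kind is given below. It assumes, hypothetically, a zero-cost "relaxed" band of separations with linear growth on either side; the band limits and slopes are placeholder parameters and are not the values plotted in Figure 3.

```python
def pair_cost(d, relaxed_lo, relaxed_hi, slope_narrow=1.0, slope_wide=1.0):
    """Illustrative segmented cost C(a, b) as a function of the separation d.

    d is measured in white-key widths. relaxed_lo/relaxed_hi bound the
    separations assumed to cost nothing; both slopes are hypothetical.
    """
    if d < relaxed_lo:
        # Fingers squeezed closer together than the relaxed band
        return slope_narrow * (relaxed_lo - d)
    if d > relaxed_hi:
        # Fingers stretched further apart than the relaxed band
        return slope_wide * (d - relaxed_hi)
    return 0.0


# Thumb and ring finger, with an assumed relaxed band of 3 to 5 keys
costs = [pair_cost(d, 3, 5) for d in (1, 4, 7)]
```

With these placeholder settings the cost is zero inside the band and grows by one unit per key outside it.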
Page 26 of 64
http://mc.manuscriptcentral.com/cavw - For Peer Review
Computer Animation and Virtual Worlds
3.1.3 Determining cost of hand motion
The hand motion cost from the current fingering choice i to the next choice j, corresponding to the edge $E_{i,j}$ connecting node $N_i$ to node $N_j$, arises from three individual costs:

$$C_{E_{i,j}} = C_f + C_c + C_r \tag{2}$$
$C_f$ is a constant, set from piano players' experience, that penalizes reusing a finger in the subsequent chord when it plays a different note than in the current chord. It is easier for a finger that was non-instructed in the first chord to play a new note in the second chord, because it does not have to be re-positioned after playing the first chord. This cost encourages consecutive chords to be played with different fingers.
$C_c$ represents the extra energy required for a finger to cross over other fingers. While playing a melody, the best fingering may involve the fingers crossing over the thumb to play the next note, or the thumb passing under the fingers, but this should be avoided when unnecessary. This value is linearly proportional to the number of fingers between the moving finger and the finger being crossed over.
$C_r$ penalizes the extra local movement of the fingers required to move from one note to another, based on the fact that larger note changes for each finger cause the wrist to move more. This value is linearly proportional to the sum of the distances that all fingers move from the current piano keys to the next ones.
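The three terms of Equation (2) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the unit weights, the dictionary representation of a fingering, and the caller-supplied list of crossings are all hypothetical, and the "number of fingers between" is read literally as the count of fingers strictly between the two finger indices.

```python
def edge_cost(cur, nxt, crossings=(), reuse_penalty=1.0,
              cross_weight=1.0, travel_weight=1.0):
    """Sketch of C_E = C_f + C_c + C_r for one trellis edge.

    cur, nxt:  dict mapping finger number (1..5) to key position, in
               white-key widths, for the current and next fingering.
    crossings: (moving_finger, crossed_finger) pairs, supplied by the caller.
    All three weights are placeholders, not values from the paper.
    """
    shared = [f for f in nxt if f in cur]
    # C_f: constant penalty for each finger reused on a different note
    c_f = reuse_penalty * sum(1 for f in shared if cur[f] != nxt[f])
    # C_c: proportional to the fingers lying between mover and crossed finger
    c_c = cross_weight * sum(abs(m - c) - 1 for m, c in crossings)
    # C_r: proportional to the total distance the reused fingers travel
    c_r = travel_weight * sum(abs(nxt[f] - cur[f]) for f in shared)
    return c_f + c_c + c_r


# Thumb passes under the middle finger: keys 0, 1, 2 played by fingers
# 1, 2, 3, then the thumb moves to key 3.
cost = edge_cost({1: 0, 2: 1, 3: 2}, {1: 3}, crossings=[(1, 3)])
```

In this example the reuse term contributes 1, the crossover term 1 (the index finger lies between thumb and middle finger) and the travel term 3.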
3.1.4 Finding shortest paths by Dijkstra's algorithm
Now that we have modeled the problem of fingering choice on a trellis graph and computed the costs of all the nodes and of all the edges connecting them, we use Dijkstra's algorithm to find the shortest path. Dijkstra's algorithm cannot be applied directly, because it accounts only for edge costs while our nodes have non-zero costs. Therefore we update the graph so that the node costs are incorporated into the edge costs, as follows:
1. Update the edge cost $C_{E_{i,j}}$ of $E_{i,j}$ as:

$$C_{E_{i,j}} = C_{E_{i,j}} + (C_{N_i} + C_{N_j})/2, \quad \forall\, \text{edges}(i, j) \tag{3}$$

2. Update the node weights to zero as:

$$C_{N_i} = 0, \quad \forall\, i \tag{4}$$
Now that all nodes have zero weight, we can use Dijkstra's algorithm to find the shortest path in the updated trellis graph. The node selected at each level of the graph gives the fingering choice of the instructed fingers for that chord.
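The folding of node costs into edge costs and the subsequent shortest-path search can be sketched as below. The zero-cost virtual source "S" and sink "T", the node names and all cost values are assumptions made for this example only; with such a source and sink, each real node's cost is counted once along any complete path, half on its incoming edge and half on its outgoing edge.

```python
import heapq


def fold_node_costs(node_cost, edge_cost):
    """Equation (3): add half of each endpoint's node cost to every edge."""
    return {(i, j): c + (node_cost[i] + node_cost[j]) / 2.0
            for (i, j), c in edge_cost.items()}


def dijkstra(edges, source, target):
    """Shortest path on a directed graph given as {(u, v): cost}."""
    adj = {}
    for (u, v), c in edges.items():
        adj.setdefault(u, []).append((v, c))
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, c in adj.get(u, []):
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Two chords with two candidate fingerings each (hypothetical costs)
node_cost = {"S": 0, "A1": 2, "A2": 1, "B1": 1, "B2": 3, "T": 0}
edge_cost = {("S", "A1"): 0, ("S", "A2"): 0,
             ("A1", "B1"): 1, ("A1", "B2"): 1,
             ("A2", "B1"): 4, ("A2", "B2"): 1,
             ("B1", "T"): 0, ("B2", "T"): 0}
total, path = dijkstra(fold_node_costs(node_cost, edge_cost), "S", "T")
```

Here the selected path passes through A1 and B1, whose combined node and edge costs (2 + 1 + 1) equal the folded path length.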
3.2 Placement of non-instructed fingers depending on future notes
After calculating the fingering for instructed fingers, we need to determine how to pre-position the non-instructed fingers so as to minimize the overall effort, which piano performance requires for smooth hand motion. This has not been addressed by any previous work, as it is essential only when generating three dimensional animation output, as our
system does. The positions of the non-instructed fingers for the current chord are chosen so that the fingers are easier to re-position for the next chord in which they will actually be used, thereby minimizing the energy cost of hand motion. The algorithm operates as follows:
1) We wish to determine the positions of the non-instructed fingers for chord i, and assume that the instructed fingers for all chords have already been positioned. We consider the next k ($1 \le k \le 4$) chords when positioning the non-instructed fingers at chord i, because there are at most 4 non-instructed fingers to pre-position. Note that the ability to read ahead varies with the level of the music, the player, and the player's familiarity with the piece; our method assumes that the player is familiar with the music and can therefore produce the most reasonable fingering for the current non-instructed fingers.
2) If j is an instructed finger in chord i + k, then j is positioned in chord i so that it does not have to move much when chord i + k is played.
3) Let a be the instructed finger in chord i closest to the j-th finger. There could be two such fingers, one on either side. For every such adjacent finger that exists, the position of finger j should be such that the distance d(j, a) between the j-th finger and the adjacent finger is within the range of comfortable playing.
4) When the above conditions are satisfied, the position of finger j has been determined for chord i based on chord i + k. Repeat this until the positions of all non-instructed fingers for the chord have been determined.
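The four steps above can be sketched roughly as follows. Everything concrete here is an assumption for illustration: the data layout (one dictionary of instructed fingers per chord), the right-hand orientation, the `comfort` function and its numeric range, and the choice to clamp the look-ahead target sequentially against the nearest instructed neighbor on each side.

```python
def comfort(j, a):
    """Hypothetical comfortable range of the signed offset key[j] - key[a]."""
    gap = j - a
    lo, hi = sorted((0.5 * gap, 2.0 * gap))
    return lo, hi


def preposition(chords, i, comfort_range, max_lookahead=4):
    """Place the non-instructed fingers of chord i using the next k chords.

    chords: list of dicts, finger number (1..5) -> key position, holding
            only the instructed fingers of each chord.
    Returns a dict with a position for every finger that is needed soon.
    """
    current = chords[i]
    placed = {}
    for j in range(1, 6):
        if j in current:
            continue  # instructed fingers are already positioned
        # Step 2: first upcoming chord (k <= max_lookahead) that uses finger j
        target = next((chords[i + k][j] for k in range(1, max_lookahead + 1)
                       if i + k < len(chords) and j in chords[i + k]), None)
        if target is None:
            continue  # not needed soon; keep the default position
        # Step 3: nearest instructed neighbor on each side, if it exists
        below = max((f for f in current if f < j), default=None)
        above = min((f for f in current if f > j), default=None)
        for a in (below, above):
            if a is not None:
                lo, hi = comfort_range(j, a)
                target = min(max(target, current[a] + lo), current[a] + hi)
        placed[j] = target
    return placed


# Thumb plays key 0 now; the middle finger is needed on key 6 next.
placed = preposition([{1: 0}, {3: 6}], 0, comfort)
```

In this example the middle finger cannot comfortably reach key 6 while the thumb holds key 0, so it is pre-positioned at the edge of its assumed comfortable range.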
When a non-instructed finger is the thumb, ring, or little finger, we update the finger position by -0.5 along the Z axis (move it to the left). If the new position is occupied
by another finger, we instead increase it by 1 (move it to the right). When the non-instructed finger is the index or middle finger, we update the finger position by 0.5 (move it to the right). If this new position is occupied by another finger, we decrease it by 1 (move it to the left). We move some fingers to the left first and others to the right first in order to generate a hand pose in which the fingers have minimal influence on each other, as indicated by [19]. Row 1 of Figure 4 shows that the default positioning of the non-instructed fingers is not always meaningful. Row 2 shows the hand pose after the correction has been made.
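The left-first and right-first offset rule can be sketched as below. The function name and the representation of occupied positions as a set are assumptions made for the example; finger numbers follow the paper's labeling (1 for the thumb to 5 for the little finger).

```python
LEFT_FIRST = {1, 4, 5}   # thumb, ring and little finger try the left first
RIGHT_FIRST = {2, 3}     # index and middle finger try the right first


def nudge(finger, position, occupied):
    """Offset a non-instructed finger by half a key along the Z axis.

    occupied is the set of Z positions already taken by other fingers.
    """
    step = -0.5 if finger in LEFT_FIRST else 0.5
    candidate = position + step
    if candidate in occupied:
        # Preferred side is taken: move by 1 in the opposite direction
        candidate -= 2 * step
    return candidate


free_left = nudge(1, 3.0, set())       # thumb moves left
blocked_left = nudge(1, 3.0, {2.5})    # left is taken, so it ends up right
blocked_right = nudge(2, 3.0, {3.5})   # index finger falls back to the left
```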
4 Finger and Hand Pose Calculation for Chord
For any given chord, there are 4 steps in simulating the hand motion, as shown in Figure 5: 1) calculate the hand pose, which includes the positions of the fingertips and the position and orientation of the wrist; 2) press down the piano keys; 3) hold the piano keys; 4) release the keys back to their original positions. Because the last three steps can be simulated using methods similar to the first, we focus on the first step. Algorithms to handle complex performance cases such as finger crossovers and arpeggios are also described.
Before going further, we first describe our hand model, shown in Figure 6. The five fingers in the hand model are labeled from Finger 1 for the thumb to Finger 5 for the little finger. There are 16 joints in total with 27 DOFs in this hand model: 6 DOFs for the wrist joint (labeled with a black circle), 1 DOF for the extension and flexion of each DIP (Distal Interphalangeal) joint and PIP (Proximal Interphalangeal) joint, 2 DOFs for the extension and
flexion, adduction and abduction of each MCP (Metacarpophalangeal) joint for Finger 2 to Finger 5, and 3 DOFs for the rotation of the thumb's finger base. Each finger is assigned an IK Handler running from the finger base to the fingertip. IK is used when the fingers strike and release the keys, and FK is used for the in-between movement between chords, with a blend used to switch smoothly between FK and IK.
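As a quick consistency check on the stated totals, the joints can be tallied as below. The grouping assumes that the thumb's two distal joints are counted among the 1-DOF DIP/PIP joints, which is one reading that reproduces the stated 16 joints and 27 DOFs; the text itself does not spell this out.

```python
# (joint group, number of joints, DOFs per joint)
HAND_JOINTS = [
    ("wrist", 1, 6),
    ("DIP and PIP, two per finger including the thumb (assumed)", 10, 1),
    ("MCP of Finger 2 to Finger 5", 4, 2),
    ("thumb finger base", 1, 3),
]

total_joints = sum(count for _, count, _ in HAND_JOINTS)
total_dofs = sum(count * dofs for _, count, dofs in HAND_JOINTS)
```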
Some important notation is defined as follows: $B_i$ refers to the base of finger i, and $T_i$ refers to the fingertip of finger i. $P$ denotes a joint position, and $\theta$ denotes a joint orientation. World axes and wrist local axes are defined as shown in Figure 7.
4.1 Initiate hand pose
The fingering of instructed and non-instructed fingers generated in Section 3 is used to decide the positions of the fingertips along the Z axis, which then serve as the basic parameters for evaluating the other position components of the fingertips and the position and orientation of the wrist. The finger base position $P_B$ is determined by the wrist position $P_W$ and orientation $\theta_W$ because the finger bases are fixed on the palm, but the fingertip positions $P_T$, the wrist position $P_W$ and the wrist orientation $\theta_W$ have to be calculated.
4.1.1 Initiate finger positions
$P_{T_i}(t)_z$ represents which key is pressed and is hence determined by the fingering method. $P_{T_i}(t)_x$ is determined by the wrist position and the piano key range occupied by the fingers. $P_{T_i}(t)_y$ is the height of the black or white key being pressed.
4.1.2 Initiate wrist position
The wrist position is determined based on the observation that when the fingers are more spread, the wrist moves forward along the X axis and has a lower position along the Y axis; along the Z axis, the wrist moves relatively closer to the little finger and farther from the thumb. Therefore we compute the wrist position as follows:
$P_W(t)_z$ is the weighted sum of the Z components of all fingertip positions. The thumb and little finger generally have much more influence in determining the wrist position along the Z axis, and therefore they have much larger weights than the other fingers.
$P_W(t)_x$ is determined by the instructed fingers' relative positions along the X axis in a standard pose, the influence of each finger (prioritized 1, 5, 2, 4 and 3 in decreasing order) and the allowable contact range on the keys pressed by these fingers. For example, if the instructed thumb has to move to a black key from a white key, then this will fully dictate the wrist movement, as the wrist will have to move less to accommodate the position of any other finger.
$P_W(t)_y$ is determined in a similar way to $P_W(t)_x$.
4.1.3 Initiate wrist orientation
The wrist orientation around the Y axis, $\theta_W(t)_y$, is computed as:

$$\theta_W(t)_y = \sum_{i=1}^{5} w_i\, \theta(P_{T_i}, P_W, t)_y \tag{5}$$
where $\theta(P_{T_i}, P_W, t)_y$ is the orientation around the Y axis of the ray from the wrist to
$T_i$ at time $t$. The $w_i$ for the five fingers are pre-computed weights, which reflect the dependence between the rotation of Joint $i$ and that of the wrist.
To obtain the five weights $w_i$, we set up a system of five equations based on Equation (5), one for each of five different chords. In each equation, the $w_i$ for i = 1, ..., 5 are the five unknowns, and the values of $\theta(P_{T_i}, P_W, t)_y$ for i = 1, ..., 5 are obtained from motion capture data containing the wrist and fingertip positions for each chord. Solving the system yields the five weights.
The result of this process is that $w_1$ is the smallest weight and $w_3$ is the largest, which means that the wrist orientation depends mainly on the positions of the non-thumb fingers on the palm, and in particular on the middle finger's position along the X and Z axes, because hand poses usually satisfy the condition that the middle fingertip, middle finger base and wrist are almost collinear.
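Solving the five-equation system for the weights can be sketched with plain Gauss-Jordan elimination, as below. The ray angles and the "true" weights are synthetic values invented so that the example can check itself; they are not motion capture data, and the paper does not state which solver was used.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("the chords do not give independent equations")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Row c holds the five ray angles (one per finger) measured for chord c.
ray_angles = [[6.0 if c == f else 1.0 for f in range(5)] for c in range(5)]
true_weights = [0.05, 0.2, 0.4, 0.2, 0.15]          # synthetic ground truth
wrist_angles = [sum(w * ang for w, ang in zip(true_weights, row))
                for row in ray_angles]               # Equation (5) per chord

weights = solve_linear_system(ray_angles, wrist_angles)
```

The recovered weights match the synthetic ones up to rounding error, with the smallest weight on the thumb and the largest on the middle finger by construction.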
We calculate $\theta_W(t)_z$ as follows:

$$\theta_W(t)_z = \theta_{W\max z} - \alpha\, w_D(t) \tag{6}$$
This equation implies that the wider the key range the fingers press on, the lower the wrist orientation around Z. This happens because a wider range for spreading the fingers is available when the wrist is closer to the keyboard surface than when it is far away. $\theta_{W\max z}$ is the largest angle of the wrist around the Z axis (also the initial orientation around the Z axis for the standard pose). $\alpha$ controls the largest wrist orientation around the Z axis; $w_D(t)$ is the rotation weight determined by the distribution of the 5 fingers at time $t$, and a larger finger extension
will have a larger $w_D(t)$. $\theta_W(t)_x$ is computed as:

$$\theta_W(t)_x = \arctan \frac{P_{B(\text{ring})}(t)_y - P_{B(\text{index})}(t)_y}{P_{B(\text{ring})}(t)_z - P_{B(\text{index})}(t)_z} \tag{7}$$
This equation keeps the hand nearly parallel with the piano face. Because the index and ring finger bases are fixed on the palm, they define a line in three dimensional space, and the projection of this line onto the plane perpendicular to the X axis can be used to evaluate the orientation of the wrist around the X axis. The parameter is evaluated based on the standard pose pressing on 5 neighboring keys with a distance of 4, and with a distance of 7, respectively, from the little finger to the thumb.
4.2 Crossover between thumb and other fingers
Crossover is common while playing various note sequences. This case is handled separately using the algorithm outlined in Figure 8, which illustrates the case where the thumb crosses under finger j.
After finger j presses down the corresponding key for chord i, the wrist is translated by a distance that is evaluated from the corresponding key postures extracted from motion capture data and depends on which finger is crossed: index, middle or ring. After the translation, the wrist is rotated such that
$$\theta(P_W, P_{B_j}, t)_y = \arctan \frac{P_{B_j}(t)_x - P_W(t)_x}{P_{B_j}(t)_z - P_W(t)_z} \tag{8}$$
$$\theta(P_W, P_{T_j}, t)_y = \arctan \frac{P_{T_j}(t)_x - P_W(t)_x}{P_{T_j}(t)_z - P_W(t)_z} \tag{9}$$

$$\theta(P_W, P_{B_j}, t)_y - \theta(P_W, P_{T_j}, t)_y \in [\theta_{\min}(j),\ \theta_{\max}(j)] \tag{10}$$
The joint angles of all fingers other than the thumb are kept the same, and their positions are transformed by the wrist rotation. After finger j releases its key and the thumb presses down its key, the wrist and fingertips are translated to the positions for chord i + 1.
The algorithm for fingers crossing over the thumb is similar to the case of the thumb crossing under other fingers.
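The rotation condition for the crossover can be checked numerically as sketched below. Positions are (x, y, z) tuples; the function names, sample coordinates and angle bounds are hypothetical. Note that `math.atan2` is used in place of a plain arctangent of the ratio: it agrees with it when the Z difference is positive and additionally avoids a division by zero, which is a substitution made here rather than something the paper states.

```python
import math


def ray_angle_y(origin, point):
    """Orientation around Y of the ray origin -> point, in the X-Z plane."""
    return math.atan2(point[0] - origin[0], point[2] - origin[2])


def crossover_ok(wrist, base_j, tip_j, theta_min, theta_max):
    """Check that the base-ray minus tip-ray angle lies in the allowed range."""
    diff = ray_angle_y(wrist, base_j) - ray_angle_y(wrist, tip_j)
    return theta_min <= diff <= theta_max


wrist = (0.0, 0.0, 0.0)
base_j = (1.0, 0.0, 1.0)   # ray at 45 degrees
tip_j = (2.0, 0.0, 1.0)    # ray at about 63.4 degrees
inside = crossover_ok(wrist, base_j, tip_j,
                      math.radians(-30), math.radians(0))
outside = crossover_ok(wrist, base_j, tip_j,
                       math.radians(0), math.radians(30))
```

In practice the wrist rotation would be adjusted until such a check passes for the finger being crossed.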
4.3 Arpeggio skill
In arpeggios, the notes of a chord are played in sequence rather than simultaneously. Sometimes successive notes are far apart from each other, such as when playing a typical chord with notes C3-G3-C4-E4 using the left-hand fingering 5-3-2-1. This is more difficult to handle than playing common chords, because more complex wrist motion is needed to make sure the instructed finger can reach the required piano key of the arpeggio in time while keeping the hand pose natural. The method in Section 4.1.2 is first used to compute the initial wrist position for the common chord that has the same notes as the required arpeggio, and then we shift the wrist position to satisfy the following geometric constraint:
$$\theta_{B_{i-1},T_{i-1}} - \delta_{i-1}(d(T_{i-1}, T_i)) = \theta_{B_i,T_i} - \delta_i(d(T_{i-1}, T_i)) + \theta_{W,B_{i-1}} - \theta_{W,B_i} \tag{11}$$
As shown in Figure 9, $T_i$ is the fingertip used to strike note i and $B_i$ is the corresponding finger base; $\theta_{B_i,T_i}$ is the orientation around the Y axis of the vector from finger base i to fingertip
i; $\theta_{W,B_i}$ is the orientation around the Y axis of the vector from the wrist to the finger base for note i; $\delta_i(d(T_{i-1}, T_i))$ is the rotation offset used for the finger to strike note i, based on the distance between the two neighboring instructed fingertips. This constraint determines the wrist position so that the two instructed fingertips can be positioned on the required neighboring piano keys, giving smooth hand motion from note i - 1 to note i.
5 Optimization of Key Hand Poses
For the most relaxed playing, piano theory requires the player to keep natural and precise poses while reducing unnecessary motion. Therefore a novel optimization method with geometric constraints is proposed to smooth the hand motion between the key hand poses for each chord. The following objective function finds the wrist pose sequence that minimizes the overall motion cost, determined by the wrist translation and rotation, over all the given n chords:
$$\min_{C_i}\Big(\sum_{i=2}^{n} C_i\Big) = \min_{P_i}\Big(\sum_{i=2}^{n}\big(\|P_i - P_{i-1}\| + w\,\|\theta_i - \theta_{i-1}\|\big)\Big) \tag{12}$$
where $C_i$ is the energy cost for chord i; $w$ is the weight between the translation and rotation components; and $\|P_i - P_{i-1}\|$ and $\|\theta_i - \theta_{i-1}\|$ are the wrist translation and rotation between two neighboring chords, respectively. Sequential Quadratic Programming (SQP) is used to solve for the minimum motion cost, subject to the following four constraints for each chord.
$$c_1:\ \theta_{J_{i,j}} \in [\theta_{J_{i,j}}^{\min},\ \theta_{J_{i,j}}^{\max}] \tag{13}$$
$$c_2:\ d(T_i, B_i) \in [d(T_i, B_i)_{\min},\ d(T_i, B_i)_{\max}] \tag{14}$$

$$c_3:\ P_i \in [P_i - \epsilon_1,\ P_i + \epsilon_2] \tag{15}$$

$$c_4:\ \theta_i \tag{16}$$
$c_1$ describes a reasonable rotation range constraint for Joint j of finger i, which is used to ensure that the finger has a natural pose.
$c_2$ describes the distance constraint between the fingertip and the finger base, so that the fingertip can reach the required piano key. $d(T_i, B_i)_{\min}$ and $d(T_i, B_i)_{\max}$ respectively denote the minimal and maximal distance between the tip and base of finger i.
$c_3$ is the translation constraint used to keep the optimized position $P_i$ for chord i close to its initial value. Because our system can usually generate a good initial pose for each chord, our optimization method uses these good initial poses to generate natural and energy-saving poses, allowing only a small range of wrist translation around the initial poses.
$c_4$ is used to generate a natural wrist orientation $\theta_i$ based on $P_i$ and the distribution of the 5 fingers, and is computed as in Section 4.1.3.
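The objective of Equation (12) can be evaluated for a candidate wrist pose sequence as sketched below. This only computes the cost that an optimizer would minimize; it does not perform the constrained optimization itself. Representing each orientation as a triple of angles and taking the Euclidean norm of their difference is an assumption made for this example.

```python
import math


def motion_cost(positions, orientations, w=1.0):
    """Sum of wrist translation plus w times wrist rotation between chords.

    positions, orientations: one (x, y, z) tuple per chord.
    """
    total = 0.0
    for i in range(1, len(positions)):
        translation = math.dist(positions[i], positions[i - 1])
        rotation = math.dist(orientations[i], orientations[i - 1])
        total += translation + w * rotation
    return total


positions = [(0, 0, 0), (3, 4, 0), (3, 4, 0)]
orientations = [(0, 0, 0), (0, 0, 0), (0, 1, 0)]
cost_default = motion_cost(positions, orientations)        # w = 1
cost_heavier = motion_cost(positions, orientations, w=2.0)
```

A constrained solver would vary the wrist positions within the bounds of $c_3$, recompute the orientations as in $c_4$, and reject candidates that violate $c_1$ or $c_2$.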
6 Simulation of Motion Curve
After generating the optimized natural key poses for the given notes, the following steps are used to construct a realistic motion curve between these key poses.
6.1 Wrist motion between chords
When the hand moves from one chord to another, it moves up and down during the motion, as shown in Figure 10. Motion capture data shows that the wrist is usually raised to its maximum height (along the Y axis) midway between two neighboring chords, and this component is calculated by:
$$P_W((t_i + t_{i-1})/2)_y = P_W(t_i)_y + P_W(t_{i-1})_y + V(i)\, D(i, i-1)\, H(P_W(t_i), P_W(t_{i-1})) \tag{17}$$
where $V$ is linearly determined by the volume of chord i; $D$ is linearly determined by the durations of chord i - 1 and chord i; and $H$ is the basic height, inversely proportional to the wrist translation between the two chords.
After determining the height at the middle position, motion capture data is used to interpolate the key frames between the middle position and the two key poses for the two chords along the Y axis, and motion capture data along the X and Z axes is used to generate the corresponding curve components. To generate more realistic hand motion, we use the following procedure: 1) Sample the hand motion from 20 performances of the same chords. 2) Manually segment the motion capture curves into up and down motions. 3) Normalize the motion capture clips to the same duration and average them to generate a reference curve. 4) Sample this reference curve and use it as the interpolation function when generating output motion.
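Steps 3 and 4 of this procedure, normalizing clips to a common duration and averaging them, can be sketched as below. Linear resampling and the tiny hand-made clips are assumptions made for illustration; the paper does not specify the resampling scheme, and real clips would come from the segmented motion capture data.

```python
def resample(clip, n):
    """Linearly resample a list of heights to n evenly spaced samples (n >= 2)."""
    if len(clip) == 1:
        return [clip[0]] * n
    out = []
    for k in range(n):
        pos = k * (len(clip) - 1) / (n - 1)   # fractional index into clip
        lo = int(pos)
        hi = min(lo + 1, len(clip) - 1)
        frac = pos - lo
        out.append(clip[lo] * (1 - frac) + clip[hi] * frac)
    return out


def reference_curve(clips, n=5):
    """Normalize every clip to n samples and average them pointwise."""
    resampled = [resample(clip, n) for clip in clips]
    return [sum(column) / len(column) for column in zip(*resampled)]


# Two hypothetical wrist-height clips of different durations
curve = reference_curve([[0, 1, 0], [0, 2, 4, 2, 0]], n=5)
```

The resulting curve can then be sampled again at the frame rate of the output animation and scaled to the mid-height computed for each pair of chords.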
6.2 Influence of instructed fingers on non-instructed fingers
While striking the piano keys, the rotation of the instructed fingers induces some rotation in the non-instructed fingers, as shown in Figure 11. Usually, the less skilled the player is, the more strongly the instructed fingers influence the non-instructed fingers. To simulate this influence, we define a dependence index Dep(i) for the movement of non-instructed finger i influenced by instructed finger j as:
$$\mathrm{Dep}(i) = \frac{\sum_{j} S_{ij}\,\lambda(i, j)}{I} \tag{18}$$
where $S_{ij}$ is the slope of the relative motion of the i-th finger during the j-th instructed movement [19], and $I$ is the number of instructed fingers. If finger i is an instructed finger, the dependence index is 0, because an instructed finger must press exactly on its piano key no matter where the other fingers are. If finger i is a non-instructed finger, the index is close to 1 when the neighboring instructed fingers have a strong influence on this finger, and close to 0 when they have little influence. Given a maximum movement range $Y_{\max}(i)$ along the Y axis, the position of the non-instructed finger along the Y axis due to the influence of the surrounding fingers is given by:
$$P(i)_y = Y_{\max}(i) \times \mathrm{Dep}(i). \tag{19}$$
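A rough sketch of how the dependence index might drive the induced finger height is shown below. It assumes that the index is the average of the slopes over the instructed fingers and is forced to zero for instructed fingers; the slope table and the movement ranges are invented numbers, not measurements from [19].

```python
def dependence_index(i, instructed, slope):
    """Average slope of finger i over the instructed fingers (assumed form).

    slope[(i, j)] is a hypothetical S_ij value between 0 and 1.
    """
    if i in instructed or not instructed:
        return 0.0  # instructed fingers must sit exactly on their keys
    return sum(slope[(i, j)] for j in instructed) / len(instructed)


def induced_height(i, instructed, slope, y_max):
    """Height of finger i along Y: maximum range scaled by the index."""
    return y_max[i] * dependence_index(i, instructed, slope)


instructed = [2]                               # the index finger strikes a key
slope = {(3, 2): 0.5, (4, 2): 0.25}            # made-up coupling slopes
y_max = {2: 2.0, 3: 2.0, 4: 2.0}               # made-up movement ranges
middle = induced_height(3, instructed, slope, y_max)
ring = induced_height(4, instructed, slope, y_max)
index = induced_height(2, instructed, slope, y_max)
```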
6.3 Wrist compensation
The action of pressing a key tends to induce an upward movement of the wrist due to the reaction force of the strike. While playing low-volume notes (for example, rotating only the finger base and/or wrist to strike the keys, which generates a low-volume sound), the wrist makes an up-down vertical response, and while playing high-volume notes (for example, rotating the upper arm and/or shoulder to strike the keys, which generates a much louder sound), the wrist makes a down-up-down response. Additional key frames, based on features relating the extracted motion capture data of the wrist to the corresponding sound, are inserted to achieve this, and the amount of compensation is scaled according to the note volume.
7 Results and Discussion
We present piano animations showing a range of different finger placements and motion styles.
7.1 Scales
In music, a scale is a sequence of notes in ascending or descending order that is used to conveniently represent part or all of a musical work, including melody and/or harmony [20]. Finger crossovers always occur when playing scales, and are generated realistically by our system. This feature is demonstrated in Figure 12 and in the first demo in the accompanying video, which includes a side-by-side comparison between the generated animation and real
playing.
7.2 Chords
A chord consists of a set of notes that are heard as a simultaneous sound, but that might not be played simultaneously. Generally, the notes are played at the same time, except in the special case of an arpeggio, where the notes are played quickly in sequence. An arpeggio can be handled as the simple case of individual notes being pressed. Below, we discuss the common chord, which is played simultaneously, and show that our algorithm generates realistic piano performance animation for complex chords.
Figure 13 shows that our system generates correct fingering of the instructed fingers for the first part (6 chords) of the musical notation of Bilder einer Ausstellung, composed by Modest Mussorgsky.
The generated fingering choice is found to be very feasible for the hand, and Figure 14 shows snapshots of hand poses for some complex chords in the chord demo, which are again natural and feasible to play.
We analyze the realism of our animation below, after the optimization step. In this example our optimization method improves the total motion cost (150.6) by about 22%, the translation (111.3 cm) by 10%, and the rotation (78.5 degrees) by 45%. Note that the default weight between translation and rotation is 1.
Figure 15 visualizes the three translation components and the one rotation component
along the Y axis (the other components are based on the same parameters and are the same) before and after optimization. These graphs illustrate that the optimization method yields a smoother key pose sequence with less wrist translation and rotation, and therefore a lower motion cost. Note that the 28 nodes in each curve correspond to the key poses for the 28 chords; the lines between nodes are used to better trace how each wrist component changes as the chord sequence progresses.
Figure 16 shows that, for the music with 28 chords, the animated wrist motion agrees well with the ground truth data.
7.3 A music piece
Finally, a music piece, Childhood Memory composed by Modest Mussorgsky, is used to generate a comprehensive demo showing all the features supported by our current system, including the simulation of relative finger rotation, wrist compensation, finger crossover and arpeggio skill. Some key snapshots are shown in the images below.
The first image in Figure 17 shows the instructed index finger for the next chord causing the relative rotation of the non-instructed ring finger; the second shows the index finger fully pressing down the key while the ring finger returns to the key surface; the third shows the wrist moving up due to wrist compensation after the index finger fully presses down the key (the wrist motion causes the joints of the other fingers to rotate a little while they keep contact with the piano keys).
8 Conclusion and Future Work
We have described a system that automatically generates three dimensional animation of piano playing, given an input piece of music. The graph-based approach determines a fingering which ensures that the character plays the piece in as relaxed a manner as possible, which is one of the fundamental principles of piano theory. A novel rule-based approach is proposed to pre-position non-instructed fingers such that it is convenient and easy to use them for playing succeeding notes. Initial key hand poses are determined based on the generated fingering and piano theory, and complex but often encountered cases such as finger crossovers and arpeggios are also handled. An optimization-based method operating on geometric constraints is proposed to generate smooth and natural key pose sequences for the hand. Motion capture data is then employed to further smooth the transitions between poses. We believe the resulting motion is realistic enough to be used directly as a tool for piano self-study. The comparison between the generated motion curves and the curves of raw motion capture data of real piano playing shows a good level of realism.
Our hand touch model, which allows goal locations for each finger and maintains exact timing constraints, could be extended to animate performances on other instruments with keys, such as woodwinds, brass and string instruments. Also, our approach may be beneficial for generating natural grasping with a more beautiful hand pose.
The first limitation of our system is that although it resolves collisions between the fingertips and the piano surface, it does not handle interpenetration between fingers or collisions with the sides of the black keys.
Secondly, although our system can generate plausible and reasonably realistic piano playing for standard music, it is not capable of generating emotional piano playing that reflects a personal understanding of the music and the player's performance background. This is the main future work we will be pursuing.
Thirdly, it might be beneficial to apply principles of machine learning to our system to learn the parameters for determining standard fingering more accurately in cases where multiple fingering sequences have the same optimal cost during instructed fingering generation.
Fourthly, we will enhance our system to generate animation for hand models of various sizes, to meet the requirements of piano students with different hand shapes, so that our system can be used as a good piano teaching tool.
Fifthly, sometimes a melody must be played continuously by two hands. Because our method generates fingering for each hand separately, it cannot generate fingering for a performance that requires planning the fingering of both hands simultaneously. Note that this problem is also unsolved in all of the previous work.
Finally, our solution does not consider the interdependent rotation of the joints within a finger, and does not directly simulate the influence of interdependence between fingers. Future work on this point would help to improve the finger motion during striking and releasing of the piano keys.
Figure 1: Flowchart of system implementation
Figure 2: A trellis graph where consecutive time-slices correspond to consecutive chords in
the piece. The weighted nodes represent the cost of hand poses for a fingering choice and
weighted edges represent the cost of hand motion between the hand poses.
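The trellis structure described in this caption can be illustrated with a minimal sketch. It assumes that the preferred fingering is the path through the trellis with the lowest sum of node (pose) and edge (motion) costs, found by a standard dynamic-programming pass. The function name `cheapest_fingering_path`, the list-based cost layout, and the numeric costs are hypothetical and are not taken from the system itself.

```python
from typing import List, Sequence, Tuple


def cheapest_fingering_path(
    node_costs: Sequence[Sequence[float]],
    edge_costs: Sequence[Sequence[Sequence[float]]],
) -> Tuple[float, List[int]]:
    """Return (total cost, chosen node index per time-slice).

    node_costs[t][i]    : pose cost of fingering choice i for chord t
    edge_costs[t][i][j] : motion cost from choice i at chord t
                          to choice j at chord t + 1
    (Hypothetical layout, chosen only for this illustration.)
    """
    best = list(node_costs[0])      # cheapest cost of reaching each node so far
    back: List[List[int]] = []      # back-pointers, one list per later time-slice
    for t in range(1, len(node_costs)):
        new_best, pointers = [], []
        for j, pose_cost in enumerate(node_costs[t]):
            # Pick the predecessor that minimises accumulated cost plus motion cost.
            prev = min(range(len(best)),
                       key=lambda i: best[i] + edge_costs[t - 1][i][j])
            new_best.append(best[prev] + edge_costs[t - 1][prev][j] + pose_cost)
            pointers.append(prev)
        best = new_best
        back.append(pointers)
    # Trace the cheapest final node back to the first time-slice.
    end = min(range(len(best)), key=lambda i: best[i])
    path = [end]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[end], path


# Three chords with two candidate fingerings each (made-up costs).
node_costs = [[1.0, 2.0], [1.0, 1.0], [3.0, 1.0]]
edge_costs = [
    [[0.0, 2.0], [1.0, 0.0]],
    [[0.0, 1.0], [2.0, 0.5]],
]
total, path = cheapest_fingering_path(node_costs, edge_costs)
print(total, path)
```

With these made-up costs the cheapest path uses the first fingering for the first two chords and the second fingering for the last one.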
Figure 3: Hand pose cost values for poses corresponding to different separations of the thumb
and ring fingers
Figure 4: Repositioning of non-instructed fingers, which are indicated by the red circles
around them
Figure 5: Data flow of simulation of piano performance
Figure 6: Important joints in our hand model
Figure 7: The axes in our model
Figure 8: Algorithm to handle finger crossover
Figure 9: Arpeggio Skill
Figure 10: Hand motion from one chord to another. The hand will reach its highest position
above the piano keyboard in the middle of the motion
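The property stated in this caption, that the hand is highest at the midpoint of the move, can be illustrated with a small sketch. The half-sine profile, the function name `lift_height`, and the `peak_height` parameter are assumptions made for illustration only; the caption does not say which curve the system uses.

```python
import math


def lift_height(progress: float, peak_height: float) -> float:
    """Height of the hand above the keyboard during a move between chords.

    progress    : fraction of the move completed, from 0.0 (start) to 1.0 (end)
    peak_height : maximum lift, reached at progress = 0.5 (hypothetical parameter)

    A half-sine arch is assumed here; any curve that starts and ends at zero
    and peaks at the midpoint would match the caption equally well.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must lie in [0, 1]")
    return peak_height * math.sin(math.pi * progress)


# Sample the profile at five evenly spaced points of the move.
samples = [round(lift_height(k / 4, 2.0), 3) for k in range(5)]
print(samples)
```

The profile is symmetric about the midpoint, so the hand rises and descends at the same rate.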
Figure 11: Rotation of instructed finger(s) influences the rotation of non-instructed
fingers. In this example, the rotation of the instructed middle finger influences the rotation
of the non-instructed ring and little fingers
Figure 12: Key poses of finger crossover while playing scales. The first row shows a
key-frame of the thumb crossing over the middle finger while playing the C-major scale and the
second row shows a key-frame of the thumb crossing over the ring finger while playing the
D-major scale, both from three perspectives. Note that the ring/middle finger firmly presses
down the keys, the fingers avoid collisions with black keys in C-major, the wrist maintains
a natural rotation and the thumb is positioned well on the key to play it after crossing over
Figure 13: Correct fingering generated for the first part of Bilder einer Ausstellung
Figure 14: Pose for (a) E4-G4-C5 (b) F4-F5 (c) #D4-G4-#A4-#D5
Figure 15: Motion comparison before and after optimization. The first three sub-figures
show that the motion curves along the three axis components are smoothed after optimization,
as the red curves show. The fourth sub-figure shows that the rotation of the wrist about the
vertical axis (the Y axis) is also smoothed. Together, the four sub-figures demonstrate that
the hand moves with less distance and rotation for the same music clip after optimization,
and therefore that the hand motion is smoother after the optimization routine.
Figure 16: Motion comparison between ground truth data and the optimization result. The
motion curves along the three translation components closely follow those of the
corresponding motion capture data for the same music clip.
Figure 17: Some key-poses while playing Childhood memory