
PAPERS

Categorization of Sound Attributes for Audio Quality Assessment – A Lexical Study

SARAH LE BAGOUSSE, MATHIEU PAQUIER, AES Member, AND CATHERINE COLOMES



University of Brest, Lab-STICC CNRS UMR 6285, 6 avenue Le Gorgeu – 29238 Brest, France
Orange Labs, Cesson Sévigné, France

In most present recommendations for audio quality assessment, only Basic Audio Quality
(BAQ) is evaluated. The assessment of other criteria could provide a good complement and
an explanation of the BAQ result. A broad range of elicitation methods has been used to
collect terms describing the perceived features of audio and spatial audio. Such work has
led to the development of a specialized vocabulary for audio, but the large number of terms
makes it difficult to take all potential parameters into account in audio assessments. Moreover,
the interpretation of definitions used for sound attributes may differ between listeners. The
present study, which is exclusively lexical (without sound listening), aimed to reduce the
number of sound attributes by classing them into categories. The ultimate goal is to integrate
these categories (in addition to the BAQ) into recommendations for audio quality assessment.
Experiments used two methods: (1) multidimensional scaling and (2) free categorization
followed by a cluster analysis. An agreement was found between the results of these two
methods and three categories were defined: timbre, space, and defects.

J. Audio Eng. Soc., Vol. 62, No. 11, 2014 November

0 INTRODUCTION

0.1 Context: Recommendations for Audio Evaluation

Currently, for the perceived quality evaluation of audio coding, two well-known subjective test methods are recommended by the International Telecommunication Union (ITU):

The ITU-R BS.1116 [1] should be used for “the subjective assessment of small impairments in audio systems, including multichannel sound systems.”

The ITU-R BS.1534 [2], also known as the “Multiple Stimuli with Hidden Reference and Anchor” (MUSHRA) methodology, is recommended for the assessment of intermediate audio quality and specified for numerous applications. Indeed, rapid developments in the use of the Internet for distribution and broadcast of audio material, where the data rate is limited, have led to a compromise in audio quality. Other applications that may contain intermediate audio quality include digital AM (i.e., digital radio mondiale: DRM), digital satellite broadcasting, commentary circuits in radio and TV, audio on demand services, and audio on dial-up lines [3–6].

The sole parameter assessed by these two methods is Basic Audio Quality (BAQ), defined as the “global attribute used to judge any and all detected differences between the reference and the object.” Although it is an essential attribute, BAQ is likely multidimensional, and it would be useful to evaluate other attributes through the use of listening tests to detect which areas of sound quality are impaired by audio coding. These two recommendations suggest other attributes for the evaluation of systems more complex than monophonic ones, but these are seldom used in practice. For stereophonic systems, recommendations allude to stereophonic image quality (related to the differences between the reference and the object in terms of sound image locations and the sensation of depth and reality of the audio event). For multichannel systems, two attributes are suggested: (i) Front image quality (localization of the frontal sound sources, including stereophonic image quality and losses in definition), and (ii) Impression of surround quality (related to spatial impression, ambiance, or special directional surround effects).

The goal of this study is to define categories of attributes, providing a complement to the single BAQ. With the rapid growth of multichannel codecs, microphone arrays, spatial audio reproducing systems, etc., it would be of particular interest to understand whether BAQ provided by a technology is related to a specific quality of spatial reproduction or to other properties of sound. In this regard, the ITU-R BS.1534 [2], states: “Although some studies have shown that stereophonic image quality can be impaired, sufficient
research has not yet been done to indicate whether a separate rating for stereophonic image quality as distinct from basic audio quality is warranted.”

0.2 Audio Attributes

Recently, studies have focused on choosing the right words to represent the listening experience in order to provide accurate terms to qualify sounds, and a broad array of methods has been used to elicit vocabulary suitable for audio and spatial audio applications. Some of the studies have been aimed toward finding what attributes are present and/or perceivable using verbal descriptors as a way of capturing attributes. Table 1 gives a list of terms from some of these studies, which are described below.

Table 1. Synthesis of several studies on sound quality attributes.

Authors                        Lists of quality attributes
Nakayama et al. [14]           Sensation of clearness, Sensation of fullness, Depth of the image.
Gabrielsson and Sjögren [18]   Clearness/Distinctness, Brightness/Darkness, Sharpness/Hardness – Softness, Fullness/Thinness, Feeling of space, Disturbing sounds, Nearness, Loudness.
Toole [16]                     Clarity definition, Brightness, Softness, Fullness, Pleasantness, Hiss-noise-distortion, Impression of distance-depth, Definition of sound image, Continuity of the sound stage, Fidelity, Abnormal effects, Reproduction of ambiance, spaciousness and reverberation, Perspective, Overall spatial rating.
Berg and Rumsey [17]           Localization, Preference, Envelopment, Presence, Naturalness, Source distance, Source width, Background noise level.
Koivuniemi and Zacharov [19]   Tone color, Richness, Hardness, Emphasis, Naturalness, Sense of direction, Sense of depth, Sense of space, Sense of movement, Penetration, Distance to events, Broadness.
Guastavino and Katz [15]       Coloration, Presence, Readability, Stability, Naturalness/Realism, Distance, Localization, Spatial distribution of sound, Spectral balance.
Lorho [20]                     Clarity, Richness, Sense of distance, Sense of direction, Sense of movement, Ratio of localization, Quality of echo, Sense of space, Amount of echo, Balance of space, Separability, Broadness, Distortion, Disruption, Tone color, Balance of sounds.
Choisel and Wickelmaier [9]    Clarity, Brightness, Spaciousness, Envelopment, Naturalness, Elevation, Width, Distance.

Berg and Rumsey [7] adapted the Repertory Grid Method [8] for audio evaluation. The principle is that subjects are shown to be more reliable when using their own language than that of others. In the elicitation part of this method stimuli were presented by triad, and each subject was asked which of the three sounds differed most from the others, then in what way the selected sound differed and in what way the other two were alike. A new triad was then presented and the same question asked. This continued until the subject stopped providing new answers. A grid was then constructed upon which subjects rated each of the stimuli according to each of the constructs elicited in the previous phase.

In their study Choisel and Wickelmaier [9] explained the disadvantages of direct elicitation methods, which rely on the assumption of a close correspondence between a sensation and its verbal descriptor. First, the elicitation of auditory attributes will be dependent upon the availability of an adequate label in the subject’s lexicon. Second, it cannot be ensured that the verbal expression provided by a listener is really related to an actual sensation. Perceptual Structure Analysis (PSA) was developed by Choisel and Wickelmaier based on a mathematical foundation of Formal Concept Analysis (FCA) [10] and Knowledge Space Theory (KST) [11]. The major advantage offered by this approach lies in the fact that it strictly separates the identification of auditory sensations from their labeling. Subjects were presented with triads of stimuli. After listening to a triad, a listener had to answer Yes or No to the question: “Do sounds ‘a’ and ‘b’ share a feature that ‘c’ lacks?” The subject was presented with all possible triplets, and based on the responses it was possible to extract the auditory features underlying the responses. Then, having analyzed the perceptual structures, the listener had to name them.

Multidimensional Scaling (MDS) transforms listener judgments (made without any reference) of similarity (between sounds, terms, visual objects, etc.) into distances represented in a multidimensional space. The resulting perceptual maps show the relative positioning of all objects. The last step of MDS is generally the scale interpretation (made by correlating physical or perceptual clues with MDS dimensions). For example MDS has been used in order to highlight the perceptual dimensions of automobile noise [12] and music timbre [13]. In [14], one- through eight-channel reproductions of multichannel popular music recordings were made in an anechoic chamber. Preference judgments of each reproduction and similarity judgments among them were made by ten listeners. As a result of MDS, it was found that this multichannel recording and reproduction were characterized by three sensory features: fullness, clearness, and depth of the image sources.

In [15], various types of source material (from urban soundscapes to musical passages) were played on three system configurations (1-D, 2-D, and 3-D loudspeaker arrays). In a first experiment, relevant criteria for sound quality were identified by linguistic analysis of spontaneous verbal descriptions. This exploratory study of verbal descriptors resulted in six parameters (presence, coloration, readability, timbre, localization, and stability of the image). In a second experiment, the three configurations were evaluated using scale judgments and free responses along these parameters over a wider range of auditory situations.

In [16], reproducibility of listeners’ assessment was studied as a function of their hearing thresholds and age. Listeners had to evaluate the quality of reproduction (mono
or stereo) of music excerpts played on loudspeakers, by assessing fifteen attributes (in two categories: spatial quality and sound quality).

In [17], listeners had to evaluate single instruments (5.0 recordings) played with a 5.0 setup or stereo or mono downmix. The aims were (i) to verify whether the subject group was able to make significant distinctions between the stimuli in the test using the attributes provided, (ii) to class these attributes based upon the subjects’ understanding, and (iii) to observe the relation between attributes (principal component analysis, cluster analysis, correlation).

In [18], listeners were provided with lists of 60 adjectives (chosen on the basis of results from questionnaires given to 40 sound engineers) and were asked to designate how well each adjective characterized the reproduction of mono audio excerpts (music, speech, and sounds from everyday life) played on loudspeakers, by writing a figure from 0 to 9 for each adjective. The same experiment was then done with other excerpts played on headphones, with a reduced list of 30 adjectives. Finally the adjective ratings were subjected to a principal component analysis that gave four factors for loudspeaker reproduction and five factors for headphone reproduction.

In [19], a technique for the development of a descriptive language was applied to spatial sound reproduction systems. Twelve direct attribute scales were developed. Eight of the scales represented spatial characteristics and four represented timbral characteristics. For each attribute scale a descriptor was developed for its positive and negative direction.

Lorho [20] used the same method in order to compare the perceptual characteristics of spatial enhancement systems for stereo headphone reproduction. Five musical programs were processed with different algorithms. Each of the subjects developed their own set of attributes in three hours and performed a comparative evaluation of the stimuli. Fifty stimuli were presented to listeners who developed the set of attributes describing and discriminating these stimuli in an analytical manner. The vocabulary development resulted in a set of 16 attributes describing the perceptually important auditory characteristics of the stimuli.

All these studies have yielded terms (given in Table 1) that a listener can use to qualify a sound.

It should be noted that this is a non-exhaustive list and that the definition of some terms may lead to confusion, as words used are not identical between the different studies [21]. Richness, for example, was described by Koivuniemi and Zacharov as the homogeneity of the timbre [19], but Lorho described it as a “combination of harmonics and dynamics perceived in a sample” [20]. Such differences can bias the results of listening tests.

Moreover, it is not possible to include all of the attributes listed in Table 1 in a single listening test (of the MUSHRA or BS.1116 type) because this would take an unreasonable length of time and be too complex for the assessors. By defining more general categories of sound attributes, such problems could be alleviated. Obviously, combining more than one factor in a single category reduces specificity. Nevertheless, the goal of a test like MUSHRA is not to have an exhaustive and highly precise description of the sound perception but rather to obtain an estimation of the overall quality of a sound, which a few categories of attributes could make explicit. The maximum number of categories to include in a test of quality assessment depends on the number of excerpts/codecs in this test, but we assume that in most cases five categories (in addition to the BAQ) would be a maximum.

0.3 Categorizations

Several categorizations have been proposed in the literature.

Letowski proposed the MURAL model (Fig. 1), in which the audio image is divided into two main attributes: timbre and spaciousness [22]. The categories grow successively as increasingly precise attributes are added.

Fig. 1. MURAL model [22].

Berg and Rumsey [23] suggested a generic model for audio quality that would include timbral, spatial, technical, and miscellaneous quality attributes.

Lorho [24] compared the perceptual characteristics of two subsets of algorithms representing different approaches to spatial enhancement for headphones, including stereo enhancement systems and virtual home theater systems for headphone reproduction. He used a verbal elicitation method, known as Individual Vocabulary Profiling, in which each subject developed their own descriptors; then he analyzed the collected data with a hierarchical clustering analysis and PCA correlation loadings. From these two analyses, he identified three important perceptual dimensions: low-frequency emphasis, spatial aspects, and timbral aspects of sound reproduction over headphones.

Zielinski et al. [25] showed that, for a given overall bandwidth, downmixing was less detrimental to the BAQ than band elimination of individual channels of multichannel recordings. In their perceptual study they used three categories: timbral fidelity, frontal spatial fidelity, and surround spatial fidelity.

The aim of the present study is to verify the validity of the categories described above with the objective of integrating the categories found into a recommendation for audio quality evaluation. Two methods, (1) an MDS and (2) a free categorization method followed by a cluster analysis, were each used to group the sound attributes into more general categories. The categorizations were based on words only without presenting sound sequences to the subjects.
Both tests were conducted in the French language. Finally, the results of the two tests were compared to define sound attribute categories.

1 SOUND ATTRIBUTES

We revised the attribute list prior to the trials, taking care to avoid terms that were antonyms (e.g., softness and hardness) in the different categories. The removal of these terms avoided bias and multiplication of categories. The attribute list (Table 1) was then submitted to 12 assessors. These subjects were considered audio experts because they regularly take part in audio assessment experiments (especially about audio on the Internet or dial-up lines). Subjects were told that the proposed attributes would be applied to applications like codec evaluation, broadcasts, services on demand, internet, etc. Attributes that were considered not to be relevant for evaluating audio quality by at least half of these assessors were removed from the list (note that the assessors made their judgment based on the attribute’s verbal descriptor only; no additional description of the attribute was available to them). Then, only attributes cited in at least two studies were kept. Attributes including readability, naturalness, penetration, etc., were therefore left out. The finalized list contained 28 attributes representing different aspects of perceived sound (Table 2). Although the selected terms were French ones (as the study was conducted with French speakers), their English equivalents are used throughout this paper for ease of readability. Some differences exist between the English and French terms. To minimize these differences, the translation was done by the three French authors (one of them was also a sound engineer) and a professional native translator specializing in scientific translations. However some nuances of meaning could cause slight differences between French and English categorization studies. Some of these are discussed in Sec. 5.1.

Table 2. List of sound attributes to be categorized.

SOUND ATTRIBUTES
Background noise/Bruit de fond
Brightness/Brillance
Clarity/Clarté
Coloration/Coloration
Depth/Profondeur
Disruption/Coupure
Distance/Distance
Distortion/Distorsion
Dynamics/Dynamique
Envelopment/Enveloppement
Equalization/Equalisation
Fidelity/Fidélité
Hardness/Dureté
Hiss/Sifflement
Homogeneity/Homogénéité
Hum/Bourdonnement
Immersion/Immersion
Localization/Localisation
Noise/Bruit
Realism/Réalisme
Reverberation/Réverbération
Richness/Richesse
Sharpness/Précision
Spatial distribution/Distribution spatiale
Spatialization/Spatialisation
Stability/Stabilité
Tone color/Couleur du timbre
Width/Largeur

2 TEST PANEL

Eighteen assessors without hearing loss participated in both of the study’s experiments, which were conducted around one week apart. To mitigate any influencing effects brought about by participation in the first experiment and to reduce any assessor bias on responses in the second experiment that assessors participated in, half of the panel started with “Multidimensional Scaling” and the other half with “Free Categorization.” The panel consisted of three women and fifteen men, all of whom had some experience with audio tests and worked in the audio or music industry. They could therefore be considered as experts as recommended by MUSHRA [2]. Most of the assessors were high-level musicians and a few were sound engineers or audio researchers. Ages ranged from 18 to 45 years old. These subjects currently participate as listeners in tests on sound quality (for codecs, sound reproduction systems, etc.). So, although they were not specifically “trained” to use descriptors, their professional backgrounds would suggest that they are acquainted with sound descriptors. The members of this panel did not participate as assessors at the attribute selection stage.

The experiments were conducted in a soundproof cabin even though they did not involve any actual listening.

3 EXPERIMENT A: CATEGORIZATION BY MULTIDIMENSIONAL SCALING

The first method used to categorize the sound attributes was an MDS [26, 27]. This method can reveal dimensions of perception and hidden meaning in the data [28] by measuring similarity/dissimilarity between variables.

3.1 Procedure

Assessors were presented with all possible pairs of sound attributes out of the 28 selected and were asked to assess their similarity. This was done via a computer interface created in Matlab using a continuous scale from 0 to 1, as it has been shown that semantic intervals between labels on a scale are unequal [25]. Consequently, the scale chosen for the present study had no labels, except at the endpoints, which were marked “very similar” and “very dissimilar” [12]. The scale provided for the assessment was given by 100 points between the two extremities. Each assessor judged the difference between a pair of terms by positioning a cursor on this scale and clicking “next” to move on to the next pair. The assessor could not then return to the previous pair. This was a lexical test only, without any sounds. The assessors based their responses on their
own interpretation of the attributes and were not allowed to check their definitions. They did not have access to reference materials such as dictionaries.

Before starting the main test, the assessors were given a five-minute familiarization phase with the interface and the attributes to be compared.

The test itself contained the 28 selected attributes in every possible pairwise combination: n(n – 1)/2 = 378 pairs were evaluated. Test duration was around one hour.

The values between 0 and 1 that each assessor had given to each pair of perceptual attributes were recorded in square symmetrical matrices (28 × 28).

Table 3. Weight of dimensions, considering a semantic space with 5 dimensions.

Dim.      1        2        3        4        5
Weight    0.1550   0.1040   0.0699   0.0530   0.0517

Fig. 2. Stress and RSQ curves as a function of the number of dimensions.

Fig. 3. Semantic space, dimensions 1 and 2.
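
The pairwise design of Sec. 3.1 can be sketched in a few lines to check the pair count and the matrix layout. This is only an illustration, not the authors' Matlab interface: the function names and the placeholder attribute labels are hypothetical, and filling the diagonal with 0.0 is an assumption.

```python
from itertools import combinations


def all_pairs(attributes):
    """Return every unordered pair of attributes: n * (n - 1) / 2 pairs."""
    return list(combinations(attributes, 2))


def to_symmetric_matrix(attributes, ratings):
    """Store one assessor's ratings (values in 0..1) in a square symmetric matrix.

    `ratings` maps an (attribute_a, attribute_b) pair to the value chosen on
    the scale. The diagonal is left at 0.0, which is an assumption.
    """
    index = {name: i for i, name in enumerate(attributes)}
    n = len(attributes)
    matrix = [[0.0] * n for _ in range(n)]
    for (a, b), value in ratings.items():
        i, j = index[a], index[b]
        matrix[i][j] = value
        matrix[j][i] = value  # symmetry: each pair is judged once
    return matrix


# Hypothetical placeholder labels standing in for the 28 attributes of Table 2.
attributes = [f"attribute_{k}" for k in range(1, 29)]
pairs = all_pairs(attributes)
print(len(pairs))  # 378
```

Each assessor then contributes one such 28 × 28 matrix, and the set of matrices is the input of the scaling analysis.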

3.2 Multidimensional Scaling Results

The values obtained for the pairs of attributes correspond to distance measurements between them, which can be used to construct a multidimensional geometric representation or so-called perceptual space [30] (or rather a semantic or lexical space in this application).

The Individual Difference Scaling (INDSCAL) model is a weighted MDS [31]. This model was selected because it takes into account inter-individual differences, i.e., the weight given to the dimensions by the subjects [32]. Non-metric MDS [33], in contrast to metric MDS, favors the order of closeness rather than exact values when constructing perceptual (or semantic) space [26]. The SPSS (Statistical Package for Social Science) software package can be used to run a non-metric INDSCAL procedure [34] that constructs solutions with between two and six dimensions.

Two main parameters from the analysis, stress and RSQ, allow us to determine the optimal number of dimensions to represent the semantic space. Through an iterative process, stress illustrates the differences between the distances in the graphic representation and the disparities in the observed distances by minimizing their difference. RSQ is the proportion of the variance in the data explained by the solution (squared correlation coefficient) [35].

The appropriate number of dimensions is found when the addition of one more dimension makes only a negligible difference to the stress and RSQ [36]. Based on this approach, the best number of dimensions seems to be five (stress = 0.20, RSQ = 0.43). Indeed, the curve of plotted RSQ values (Fig. 2) levels off after the fifth dimension and the same pattern can be seen, to a lesser extent, for stress.

Table 3 indicates the weight of each dimension in the space. Note that the first two dimensions have a greater weight than those that follow.

A dimension is a perceptual axis and all the sound attributes are distributed along it. Here, rather than interpreting these dimensions, the goal is to find a way to classify the sound attributes. Each sound attribute can only belong in one category. Figure 3 shows the distribution of the sound attributes according to the first two dimensions considering the spatial coordinates of all five.

Through simple inspection of the plots in Fig. 3 three groups of attributes can clearly be distinguished (we did not use a clustering algorithm at this point). Homogeneity and Equalization were included in group 2, although this classification appears to be less clear than for the other attributes. Fig. 4 shows the attributes according to the second and third dimensions.

Fig. 4. Semantic space, dimensions 2 and 3.
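
The two fit criteria used in Sec. 3.2 to choose the number of dimensions can be sketched as follows. This is a hedged illustration under stated assumptions: stress is taken to be Kruskal's stress formula 1 (the exact variant computed by the SPSS procedure is not given in the text), RSQ is computed as the squared Pearson correlation between disparities and map distances, and the function names, the tolerance, and the RSQ values in the usage lines are hypothetical and are not the study's data.

```python
import math


def stress1(distances, disparities):
    """Kruskal's stress formula 1 between map distances and disparities."""
    num = sum((d - h) ** 2 for d, h in zip(distances, disparities))
    den = sum(d ** 2 for d in distances)
    return math.sqrt(num / den)


def rsq(distances, disparities):
    """Squared Pearson correlation between map distances and disparities."""
    n = len(distances)
    mean_d = sum(distances) / n
    mean_h = sum(disparities) / n
    cov = sum((d - mean_d) * (h - mean_h) for d, h in zip(distances, disparities))
    var_d = sum((d - mean_d) ** 2 for d in distances)
    var_h = sum((h - mean_h) ** 2 for h in disparities)
    return cov ** 2 / (var_d * var_h)


def pick_dimensions(rsq_by_dim, tolerance):
    """Return the smallest dimensionality for which adding one more
    dimension improves RSQ by less than `tolerance`."""
    dims = sorted(rsq_by_dim)
    for current, following in zip(dims, dims[1:]):
        if rsq_by_dim[following] - rsq_by_dim[current] < tolerance:
            return current
    return dims[-1]


# Made-up RSQ values, for illustration only.
toy_rsq = {2: 0.50, 3: 0.70, 4: 0.72}
print(pick_dimensions(toy_rsq, tolerance=0.05))  # 3
```

The same leveling-off rule can be applied to the stress curve by looking at decreases instead of increases.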

Group 2 can be seen to split into two subgroups containing the following attributes:

- Richness, Brightness, Hardness, Tone color, Clarity, Coloration, and Equalization.
- Realism, Fidelity, Stability, Sharpness, Homogeneity, and Dynamics.

With the space given by dimensions 1 and 2 (Fig. 3), the terms Homogeneity and Equalization were close to the two groups 2 and 3. With the space given by dimensions 2 and 3, the attributes on the left side of Fig. 4 are those of group 3, and attributes on the right side are those of group 1 (center-right), and those of the two subgroups of group 2 (top-right and bottom-right). Homogeneity and Equalization are in the center of Fig. 4, and again, it is not precisely clear if they should belong in group 3 or in the subgroups of group 2. Nevertheless the two terms seem slightly closer to the subgroups of group 2. As a consequence they were classified in group 2, as listed in Table 4.

Table 4. Classification according to the MDS (group 2 can be split in two sub-groups).

Group 1             Group 2         Group 3
Background noise    Fidelity        Reverberation
Noise               Hardness        Localization
Hiss                Stability       Spatial distribution
Disruption          Equalization    Envelopment
Hum                 Brightness      Width
Distortion          Tone color      Distance
                    Dynamics        Immersion
                    Realism         Spatialization
                    Clarity         Depth
                    Coloration
                    Homogeneity
                    Sharpness
                    Richness

No further information on grouping was gleaned by analyzing dimensions 4 and 5. In sum, the MDS method allowed us to classify the list of sound attributes into three categories (one of which was divided into two sub-categories).

4 EXPERIMENT B: FREE CATEGORIZATION AND CLUSTER ANALYSIS

4.1 Procedure

For the second test, a free categorization method was used. The assessors used an interface that allowed them to drag and drop attributes from one column to another to freely make up their own categories (Fig. 5).

Fig. 5. Test interface for free categorization.

The list given contained the same 28 attributes as in the MDS experiment. Initially, the attributes were all listed in the first column of the interface. The only instruction provided was to make at least two categories and a maximum of five (no attribute could be left out). Finally, the assessors had to name the categories that they had created. As in the MDS experiment, the assessors did not have access to the attribute definitions and had to rely on their own judgment.

4.2 Cluster Analysis

MDS is an indirect method of categorization. The task required in the second phase of this study was to directly compose the categories, contrary to the previous phase, which was a measure of distance between pairs of attributes. Cluster analysis is a categorization method: a method of grouping objects or groups of objects by calculating distances between them. Thus, it is of interest to compare the results of the two tests with their different analyses.

In the present study Agglomerative Hierarchical Clustering (AHC) was used, in which elements fuse progressively with the point closest to them according to the chosen distance. After an AHC, a dendrogram is generated showing the groups formed. The applied AHC is a method developed by Ward, based on the calculation of inertia [37]. Inertia is the squared distance between the centers of gravity of the classes (a class can be a single attribute or a group of attributes). The objective is to minimize the total inertia in such a way that, when two classes are fused, the increase of the intraclass inertia is minimal (Huygens’ theorem). The distance formula is as follows:

\[
d^2(r, s) = \frac{n_r n_s}{n_r + n_s} \, \lVert \bar{x}_r - \bar{x}_s \rVert_2^2
\quad \text{with} \quad
\bar{x}_r = \frac{1}{n_r} \sum_{i=1}^{n_r} x_{r_i}
\]

where \(\lVert \cdot \rVert_2\) is the Euclidean distance, \(\bar{x}_r\) and \(\bar{x}_s\) are the centroids of clusters r and s, and \(n_r\) and \(n_s\) are the numbers of elements of clusters r and s, respectively.

Eighteen tables were built in this experiment: one for each assessor. Nine out of the 18 participants made four categories, five participants made five categories, and the remaining four participants made three categories, giving a total of 73 categories of attributes in all. A 28 × 73 matrix was built with the attributes along one side and the categories along the other. To fill this matrix, the value 1 was given when an attribute was included in a category and the value 0 when it was not (Fig. 6). The Ward method was then applied to this matrix.
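
The construction of the 0/1 matrix and the Ward merge criterion of Sec. 4.2 can be sketched as below. This is a minimal illustration under assumptions, not the software used in the study: the function names and the two-assessor toy data are hypothetical, and only the distance between two clusters is computed (a full AHC would repeatedly merge the pair of clusters with the smallest value).

```python
def membership_matrix(attributes, assessor_categories):
    """Build the attributes x categories 0/1 matrix.

    `assessor_categories` holds one entry per assessor; each entry is a list
    of categories, and each category is a set of attribute names.
    """
    columns = [cat for cats in assessor_categories for cat in cats]
    return [[1 if name in cat else 0 for cat in columns] for name in attributes]


def centroid(rows):
    """Mean vector of a cluster given as a list of row vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def ward_distance_sq(cluster_r, cluster_s):
    """d^2(r, s) = n_r * n_s / (n_r + n_s) * ||mean_r - mean_s||^2."""
    n_r, n_s = len(cluster_r), len(cluster_s)
    c_r, c_s = centroid(cluster_r), centroid(cluster_s)
    sq_norm = sum((a - b) ** 2 for a, b in zip(c_r, c_s))
    return n_r * n_s / (n_r + n_s) * sq_norm


# Category counts reported in the text: 9 assessors made 4 categories,
# 5 made 5, and 4 made 3.
total_categories = 9 * 4 + 5 * 5 + 4 * 3
print(total_categories)  # 73

# Hypothetical toy data: two assessors who both group Noise with Hiss.
toy_attributes = ["Noise", "Hiss", "Width"]
toy_answers = [
    [{"Noise", "Hiss"}, {"Width"}],
    [{"Noise", "Hiss"}, {"Width"}],
]
toy_matrix = membership_matrix(toy_attributes, toy_answers)
print(ward_distance_sq([toy_matrix[0]], [toy_matrix[1]]))  # 0.0
print(ward_distance_sq([toy_matrix[0]], [toy_matrix[2]]))  # 2.0
```

Attributes that assessors always placed together have identical rows and a Ward distance of zero, so they fuse first in the dendrogram.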

Fig. 6. Dendrogram with clustering results.

Fig. 7. Classification of attributes by cluster analysis.

4.3 Free Categorization Results

Fig. 7 is a simplified version of Fig. 6, showing the results of the cluster analysis for the grouping of the attributes. Three main groups were formed. Group 1 is divided into two subgroups, a and b.

Concerning the names of the groups, the dendrogram made it possible to associate the names that the assessors gave to the categories of attributes formed.

A name was chosen for each category based on the number of times it was used by the assessors. Thus, with nine uses, group 2 was named Defects. Group 3, which concerned the spatial aspects of sound, was named Space. Group 1 could be named Quality, considering the number of occurrences of this term. Group 1 was divided into two subgroups: group 1a, which was named Quality, and group 1b, which was named Timbre.

5 DISCUSSION

It should be noted that although half of the panel performed the two experiments in the opposite order to the other half, no effect of this order was detected. The two methods used, MDS and free categorization with cluster analysis, gave the same results. Categorizations composed in this study revealed three major categories of sound attributes, one of which was split into two subgroups. Thus, three or four categories can be taken into consideration. The free categorization made by assessors allowed names to be associated with categories: one was associated with Space, another related to the perceived Defects, and a final one concerned Quality. Quality was split into two subgroups. Thus, the four categories identified by the present study were:

• Defects: these are interfering elements or nuisances present in a sound, e.g., Noise, Distortion, Background noise, Hum, Hiss, Disruption, etc.;
• Space: refers to spatial impression-related characteristics, e.g., Depth, Reverberation, Width, Distance, Localization, Spatial distribution, Spatialization, Envelopment, Immersion, etc.;
• Timbre: deals with the sound color, e.g., Brightness, Tone color, Coloration, Clarity, Hardness, Equalization, Richness;
• Quality: is made up of Homogeneity, Stability, Sharpness, Realism, Fidelity, and Dynamics.

5.1 Quality Category

The case of Sharpness needs to be discussed. Looking at the attributes within the lower cluster in group 1a in Fig. 7, Fidelity and Realism are clustered with Sharpness. In the present study, Sharpness was translated from Précision (Fr.), which evokes something that is precise or rich in detail. Gabrielsson and Sjögren [18] described Sharpness with adjectives such as sharp, hard, shrill, screaming, pointed, clashing, as opposed to soft, mild, calm/quiet, dull, and subdued. This implies that the French assessors in the current study may have been dealing with a different attribute to that elicited by Gabrielsson and Sjögren. With another translation, Sharpness might have been categorized in another way.

In addition, looking at the remaining attributes within group 1a (Fig. 7), Dynamics, Stability, and Homogeneity all seem to express some dimension of quality. Group 1a seems to be less of a descriptive attribute category and rather more of an affective [38] or an attitudinal [7] attribute category. Hence, it could be influenced by the subject’s emotional state.

Quality includes attributes that do not seem homogeneous. The Quality category might be compared to the “miscellaneous quality” proposed by Berg and Rumsey that related to remaining properties [23]. They suggested four categories:

• Timbral quality: relating to the Tone color;
• Spatial quality: relating to the three-dimensional nature of the sound sources and environments;
• Technical quality: relating to Distortion, Hiss, Hum, etc.;
• Miscellaneous quality: relating to the remaining properties.

The three other groups in this previous paper were similar to the three categories named Timbre, Space, and Defects highlighted in the present work.

The categories found in the present study could be incorporated into listening tests and their usefulness verified


as assessment criteria for audio quality testing. It would be interesting to examine the value of a multi-criteria test and to measure the relative weight of each of these categories in an evaluation of overall quality.

In most recommendations of audio quality evaluation (like MUSHRA or BS.1116), the BAQ (Basic Audio Quality) is the main (and generally the only) attribute to be assessed. This basic audio quality does not mean the same thing as the category Quality identified in our present study. As a consequence, if sounds were assessed according to the categories Defects, Space, Timbre, and Quality in a MUSHRA-type test, in addition to classic BAQ assessment, the lexical proximity of the Quality category and the BAQ could be thought to create a bias. Moreover, attributes in the category Quality seem quite heterogeneous (as in the Miscellaneous quality proposed by Berg and Rumsey [23]).

5.2 Timbre and Space Categories

Timbre and Space categories are present in all categorizations made by previous authors [1, 22, 23]. To evaluate 5.1 surround sound systems, Zielinski et al. [29] evaluated timbral fidelity and split spatial aspects into two types: frontal spatial fidelity and surround spatial fidelity (based on attributes proposed by the ITU [1, 2]). These general attributes were then included in listening tests. The term Fidelity was used instead of Quality because the audio tests included an explicit reference. Rumsey et al. [39] and Marins et al. [40] thus showed that timbral fidelity had more influence than spatial fidelity on basic audio quality.

5.3 Defects Category

The Defects category has been little covered in past studies. The presence of this category could be due to the “codec” context of the present study. Nevertheless, this point is questionable. First, Berg and Rumsey [23] used this category, although their study did not concern this type of application. In the Lorho study on the evaluation of a spatial enhancement system [20], the panel divided the descriptive scales into three groups relating to localization, space, and timbre. Attributes in the localization group described simple geometric associations between the perceived sounds and the listener. The second group was composed of attributes relating to the space perceived by the listener, and the attributes of the third group concerned aspects of the sound samples relating to timbre (including Separability, Tone color, Richness, Distortion, Disruption, Clarity, and Balance of sounds). No explicit Defects group was identified. However, the attributes Distortion and Disruption were present in the category Timbre. It is worth noting that spatial enhancement is likely to create artifacts like those encountered with codecs. In another study on spatial enhancement for headphones, including stereo enhancement systems and virtual home theater systems for headphone reproduction [24], Lorho identified three important perceptual dimensions: low-frequency emphasis, spatial aspects, and timbral aspects of sound reproduction over headphones, but in fact found five categories, one containing mainly occurrences of terms about noise. This group was not kept because the three other categories were larger. Moreover, it is possible that low-frequency emphasis was specific to headphone enhancement.

Finally, in the study of Gabrielsson and Sjögren [18] about loudspeakers (stereo restitution), the 60 adjective ratings were examined by a principal component analysis, which gave four categories for loudspeaker reproduction, of which one was Disturbance noise. In conclusion, it can be thought that the presence of the category Defects in our experiments was not specific to the “codec” context of this previous study.

The non-presentation of sounds in this study makes two points of the experiment particularly crucial (and potentially responsible for the presence of the Defects category): first, the instructions for the categorization task and, second, the setting up of the attribute list. Regarding the instructions for categorization, no experimental context was given: with the MDS, subjects just had to provide a distance between proposed terms; with the free categorization, subjects had to group the terms on the basis of their semantic similarity. However, when setting up the list of 28 attributes, the subjects were informed that they had to exclude the terms that did not seem suitable to qualify a sound in the context of radio/broadcast/internet. This instruction was given because this study formed part of a larger project on evaluation (MUSHRA-type) of sounds of intermediate quality (codec, internet, broadcast, etc.). It is possible that this context led to the selection of some specific terms in the list of the 28 attributes and that these produced the Defects category. Were this the case, this category would be reliable in the context of the present study but less so for other applications (such as assessments of loudspeakers or multichannel recordings). Indeed, the list of 28 attributes contains terms like Hiss or Disruption, which could seem too specific and not always relevant due to their rarity in a study about the quality of loudspeakers or recordings. However, the Defects category also contains terms like Noise or Distortion, which would be reliable in every context (transducers, amplification, sound processing, etc.). In conclusion, the Defects category, found in the present context, could probably be extended to other types of evaluation.

Additionally, as the Defects category has rarely been assessed in past studies, it would be interesting to observe its weight with respect to the overall quality and compare this with the weight of the Timbre and Space attributes. Some other audio tests already offer four attributes linked to coding artifacts [41, 42]: Band limitation, Birdies, Temporal smearing, and Spatial distortions. These four types of impairment were evaluated and compared using the basic audio quality (BAQ). The results revealed that Band limitation and Temporal smearing can be classified as the attributes that most affect the overall quality, whereas Birdies and Spatial distortions affect the basic audio quality the least. However, broader categories like Timbre, Space, and Defects were not compared.

5.4 A Lexical Study

The present study is exclusively lexical: subjects did not listen to any sounds, in contrast to most studies about


elicitation or attribute categorization. An experiment with listening would have allowed listeners to categorize the attributes about sounds that the experimenter proposed. While it could be thought that this approach would have been more correct and precise, as listeners would have just listened to the sounds they had to qualify, such a method would only be reliable for the sounds proposed by the experimenter. In order to generalize from this type of study, many excerpts would need to be used, with a large number of versions (with bandwidth modifications, noise addition, spatial distortions, etc.). In the present study subjects were asked to group the attributes (using both a direct and an indirect method), without any sounds being played. This type of task, which is exclusively lexical (or semantic), is possibly not as correct as a categorization made with sounds played simultaneously; nevertheless, it enables another approach that could probably be applied more widely to the ensemble of sounds we hear in everyday life.

5.5 Benefits and Drawbacks of Categorization

The main planned application of the categorization carried out in this study is the integration of categories into audio quality assessment tests, especially in the context of codecs, loudspeakers, recordings, etc. Adding such information to the single BAQ (which is usually used in this type of test) could make explicit the global quality of a sound.

The use of a small number of attribute categories rather than numerous attributes offers two benefits. First, it makes it possible to keep the experimental duration reasonably short. Second, as a category is in essence very generic, the imprecise understanding of its meaning is less critical than in the case of a specialized attribute. For example, although everyone would be expected to have a good understanding of Timbre reproduction or Space reproduction, two individuals could have two different representations of the Clarity, Hardness, or Brightness of a sound.

However, the use of attribute categories could lead to some biases.

First, the use of a small number of attribute categories rather than numerous attributes reduces the information given by a test. For example, if a listener points out that the category Space is altered, this does not indicate whether the problem comes from a distortion of the frontal scene or a degradation of the reverberation.

The more generic the categories, the higher the risk that they will not be orthogonal. For example, an impoverishment of high frequencies of a source will lead to a degradation of its Timbre, but could also potentially modify its localization. In such a case, the categories Timbre and Space would be poorly assessed. Another example is that harmonic distortion would be expected to simultaneously degrade the Defects and Timbre categories. If a simple attribute Distortion was present it would be specifically designated. A given technical problem or a given physical alteration could have perceptual consequences for several categories.

The multidimensional property of categories also raises the problem of the choice of low anchor. This anchor is present in numerous quality assessment tests: a physical degradation is applied to a sequence to give the listener the worst possible quality (thus the worst assessment) in the test [1, 2]. This makes it possible to set the response range of the test and to reinforce its repeatability and consistency across listeners. For example, in the MUSHRA test with BAQ assessment alone, the anchor is commonly a 3.5 kHz low-pass filtered version of the signal. If the BAQ estimation is accompanied by an assessment of categories, some other anchors, specific to each category, could be created. However, as the categories are multidimensional, the choice of the anchor is problematic. For example, a monophonic reduction of a recording does not drastically lessen the Space category. In a MUSHRA test with an assessment of the Space category, Le Bagousse et al. created a low anchor (monophonic reduction and L-R channel inversion) dedicated to this category [43]. Listeners assessed this anchor in the middle of the range. If subjects had to judge the Localization attribute, the anchor would be expected to be efficient. Nevertheless, the category Space included such a large number of attributes in addition to Localization that the anchor proved to be inefficient. The difficulty of finding a spatial anchor to be assessed at the bottom of the response range has also been highlighted in other studies [44, 45].

To summarize, categorization presents some drawbacks (the multidimensionality of categories leads to imprecision, non-orthogonality between categories, problems with low anchors, etc.). Nevertheless, the use of a small number of attribute categories rather than numerous attributes can provide a better understanding and can drastically reduce the duration of a test.

6 CONCLUSION

This study enabled us to class a list of attributes into a smaller number of more general categories with the aim of using these as axes of perceived quality in a listening test. The two methods used in this study, MDS and free categorization with cluster analysis, led to the same results and correlated with previous studies on the clustering of sound attributes. Of the categories found, one is related to Space, a second is associated with Defects, and a third is split into the two subgroups Timbre and Quality.

7 REFERENCES

[1] ITU-R BS.1116, “Methods for the Subjective Assessment of Small Impairments in Audio Systems including Multichannel Sound Systems,” Technical report, International Telecommunication Union, Radio-communication Assembly (1997).
[2] ITU-R BS.1534, “Method for the Subjective Assessment of Intermediate Quality Level of Coding Systems,” Technical report, International Telecommunication Union, Radio-communication Assembly (2003).
[3] R&D White Paper WHP 022, “Digital Radio Mondiale (DRM): Compliance Testing and Specification Validation” (February 2002).


[4] J. Stott, “DRM - Key Technical Features,” EBU technical review (March 2001).
[5] EBU project group B/AIM, BPN 029, EBU Report on the Subjective Listening Tests of Some Commercial Internet Audio Codecs (October 2000).
[6] DAB/DAB+/DMB Receivers, worlddad.org.
[7] J. Berg and F. Rumsey, “Identification of Quality Attributes of Spatial Audio by Repertory Grid Technique,” J. Audio Eng. Soc., vol. 54, pp. 365–379 (2006 May).
[8] G. Kelly, The Psychology of Personal Constructs (Norton, New York, 1955).
[9] S. Choisel and F. Wickelmaier, “Extraction of Auditory Features and Elicitation of Attributes for the Assessment of Multichannel Reproduced Sound,” J. Audio Eng. Soc., vol. 54, pp. 815–826 (2006 Sep.).
[10] B. Ganter, R. Wille, and C. Franzke, Formal Concept Analysis: Mathematical Foundations (Springer-Verlag, Berlin-Heidelberg-New York, 1997).
[11] J. P. Doignon and J. C. Falmagne, Knowledge Spaces (Springer-Verlag, Berlin-Heidelberg-New York, 1998).
[12] P. Susini, S. McAdams, and S. Winsberg, “A Multidimensional Technique for Sound Quality Assessment,” Acta Acustica, vol. 85, pp. 650–656 (1999).
[13] J. M. Grey, “Multidimensional Perceptual Scaling of Music Timbres,” J. Acoust. Soc. Am., vol. 61, no. 6, pp. 122–135 (1977).
[14] T. Nakayama, T. Miura, O. Kosaka, M. Okamoto, and T. Shiga, “Subjective Assessment of Multichannel Reproduction,” J. Audio Eng. Soc., vol. 19, pp. 744–751 (1971 Oct.).
[15] C. Guastavino and B. F. G. Katz, “Perceptual Evaluation of Multi-Dimensional Spatial Audio Reproduction,” J. Acoust. Soc. Am., vol. 116, no. 2, pp. 1105–1115 (2004).
[16] F. Toole, “Subjective Measurements of Loudspeaker Sound Quality and Listener Performance,” J. Audio Eng. Soc., vol. 33, pp. 2–32 (1985 Jan./Feb.).
[17] J. Berg and F. Rumsey, “Verification and Correlation of Attributes Used for Describing the Spatial Quality of Reproduced Sound,” 19th AES International Conference on Surround Sound (2001 June), conference paper 1932.
[18] A. Gabrielsson and H. Sjögren, “Perceived Sound Quality of Sound Reproduction Systems,” J. Acoust. Soc. Am., vol. 65, pp. 1019–1033 (1979).
[19] K. Koivuniemi and N. Zacharov, “Unraveling the Perception of Spatial Sound Reproduction: Language Development, Verbal Protocol Analysis and Listener Training,” presented at the 111th Convention of the Audio Engineering Society (2001 Nov.), convention paper 5424.
[20] G. Lorho, “Individual Vocabulary Profiling of Spatial Enhancement Systems for Stereo Headphone Reproduction,” presented at the 118th Convention of the Audio Engineering Society (2005 Oct.), convention paper 6629.
[21] S. Le Bagousse, C. Colomes, and M. Paquier, “State of the Art on Subjective Assessment of Spatial Audio Quality,” 38th AES International Conference on Sound Quality Evaluation (2010 June), conference paper 5-3.
[22] T. Letowski, “Sound Quality Assessment: Concepts and Criteria,” presented at the 87th Convention of the Audio Engineering Society (1989 Oct.), convention paper 2825.
[23] J. Berg and F. Rumsey, “Systematic Evaluation of Perceived Spatial Quality,” the 24th AES International Conference on Multichannel Audio (2003 June), conference paper 43.
[24] G. Lorho, “Individual Vocabulary Profiling of Spatial Enhancement Systems for Stereo Headphone Reproduction,” presented at the 119th Convention of the Audio Engineering Society (2005 Oct.), convention paper 6629.
[25] S. K. Zielinski, F. Rumsey, R. Kassier, and S. Bech, “Comparison of Basic Audio Quality and Timbral and Spatial Fidelity Changes Caused by Limitation of Bandwidth and by Down-Mix Algorithms in 5.1 Surround Audio Systems,” J. Audio Eng. Soc., vol. 53, pp. 174–192 (2005 Mar.).
[26] I. Borg and P. J. F. Groenen, Modern Multidimensional Scaling: Theory and Applications (Springer, New York, 2005).
[27] J. B. Kruskal and M. Wish, Multidimensional Scaling (Sage, Beverly Hills, 1978).
[28] F. Rumsey, “Subjective Assessment of the Spatial Attributes of Reproduced Sound,” 15th AES International Conference on Audio, Acoustics & Small Spaces (1998 Oct.), conference paper 15-012.
[29] S. Zielinski, P. Brooks, and F. Rumsey, “On the Use of Graphic Scales in Modern Listening Tests,” presented at the 123rd Convention of the Audio Engineering Society (2007 Oct.), convention paper 7176.
[30] B. Yannou and P. Deshayes, Intelligence et innovation en conception de produits et services (Intelligence and Innovation in Conception of Products and Services) (L’Harmattan, Paris, 2006).
[31] J. D. Carroll and J. J. Chang, “Analyses of Individual Differences in Multidimensional Scaling via an n-Way Generalization of ‘Eckart-Young’ Decomposition,” Psychometrika, vol. 35, no. 3, pp. 283–319 (1970).
[32] Thierry Etame, “Conception de signaux de référence pour l’évaluation de la qualité perçue des codeurs de la parole et du son (Design of Reference Signals for the Perceived Quality Evaluation of Speech and Sound Codecs),” Ph.D. thesis, Université de Rennes 1 (2008).
[33] J. B. Kruskal, “Nonmetric Multidimensional Scaling: A Numerical Method,” Psychometrika, vol. 29, pp. 115–129 (1964).
[34] Y. Takane, F. W. Young, and J. De Leeuw, “Nonmetric Individual Differences Multidimensional Scaling: An Alternating Least Squares Method with Optimal Scaling Features,” Psychometrika, vol. 42, pp. 7–67 (1977).
[35] T. Naes and E. Risvik, Multivariate Analysis of Data in Sensory Science (Elsevier Science Ltd., Oxford, 1996).
[36] J. Tournois and P. Dickes, Pratique de l’échelonnement multidimensionnel: de l’observation à l’interprétation (Practice of Multidimensional Spreading Out: From Observation to Interpretation) (De Boeck-Wesmael, Brussels, 1993).
[37] J. H. Ward, “Hierarchical Grouping to Optimize an Objective Function,” J. Am. Stat. Assoc., vol. 58, no. 301, pp. 236–244 (1963).


[38] S. Bech and N. Zacharov, Perceptual Audio Evaluation—Theory, Method and Application (John Wiley & Sons, 2011).
[39] F. Rumsey, S. Zielinski, R. Kassier, and S. Bech, “On the Relative Importance of Spatial and Timbral Fidelities in Judgments of Degraded Multichannel Audio Quality,” J. Acoust. Soc. Am., vol. 118, pp. 968–976 (2005).
[40] P. Marins, F. Rumsey, and S. Zielinski, “Unravelling the Relationship between Basic Audio Quality and Fidelity Attributes in Low Bit-Rate Multichannel Audio Codecs,” presented at the 124th Convention of the Audio Engineering Society (2008 May), convention paper 7335.
[41] P. Marins, F. Rumsey, and S. Zielinski, “The Relationship between Selected Artifacts and Basic Audio Quality in Perceptual Audio Codecs,” presented at the 120th Convention of the Audio Engineering Society (2006 May), convention paper 6745.
[42] P. Marins, F. Rumsey, and S. Zielinski, “The Relationship between Basic Audio Quality and Selected Artifacts in Perceptual Audio Codecs—Part II: Validation Experiment,” presented at the 122nd Convention of the Audio Engineering Society (2007 May), convention paper 7079.
[43] S. Le Bagousse, “Élaboration d’une méthode de test pour l’évaluation subjective de la qualité des sons spatialisés (Development of a Test Method for the Subjective Evaluation of the Quality of Spatial Sounds),” Ph.D. thesis, Université de Bretagne Occidentale (2014).
[44] EBU Tech 3324, “EBU Evaluations of Multichannel Audio Codecs,” European Broadcasting Union, 2007.
[45] A. Mason, D. Marston, F. Kozamernik, and G. Stoll, “EBU Tests of Multichannel Audio Codecs,” presented at the 122nd Convention of the Audio Engineering Society (2007 May), convention paper 7052.

THE AUTHORS

Sarah Le Bagousse Mathieu Paquier Catherine Colomes

Sarah Le Bagousse was born in Ploemeur, France, in 1984. She received a Master’s degree in musical acoustics at the Ircam institute in 2009. Sound quality and spatial sound are her research interests. She received the Ph.D. degree in 2014 from the University of Brest in collaboration with Orange Labs.

Mathieu Paquier was born in Bordeaux, France, in 1974. He received the B.S. degree in sound engineering from the University of Brest in 1997, M.S. then Ph.D. degrees in acoustics from the University of Lyon in 2002. Since 2003 he has been an Assistant Professor at the University of Brest. He managed the Image & Sound department from 2003 to 2012. His research interests mainly include the evaluation of spatial sound and the perception of headphone restitution. Dr. Paquier is a member of the French Acoustical Society (Sound Perception group).

Catherine Colomes completed a doctorate thesis in psychoacoustics and signal processing. She currently works at Orange Labs as a research and development engineer in audio quality. Her main subjects of interest are objective and subjective audio quality assessment and new immersive sound technologies.
