Dialectometry-Based Classification of Central-Southern Italian Dialects

Antonio Sciarretta
April 2023

Abstract
This paper provides a new classification of Central-Southern Italian dialects us-
ing dialectometric methods. All varieties considered are analysed and cast in a
data set where homogeneous areas are evaluated according to a selected list of
phonetic features. Using numerical evaluation of these features and Manhattan
distance, a linguistic distance rule is defined. On this basis, the classification
problem is formulated as a clustering problem, and a k-means algorithm is used.
Additionally, an ad-hoc rule is set to identify transitional areas and silhouette
analysis is used to select the most appropriate number of clusters. While mean-
ingful results are obtained for each number of clusters, a nine-group classification
emerges as the most appropriate. As the results suggest, this classification is less
subjective, more precise, and more comprehensive than traditional ones based
on selected isoglosses.

1 Introduction
The standard classification of peninsular Italian dialects is that proposed by
G.B. Pellegrini [Pellegrini, 1977]. Within the Italo-Romance branch of the
Romance languages, the dialectal areas (systems) identified are: i) Tuscan,
ii) Central (Mediano), iii) Intermediate Southern, and iv) Extreme Southern.
Area i) largely corresponds to Tuscany. Area ii) comprises four subareas
(Central Marchigiano in Central Marche, Umbrian, Latian in Central-Northern
Latium, and Cicolano-Sabino-Aquilano between Latium and the Abruzzi). Area
iii) is further subdivided into five subareas (Southern Marchigiano-Abruzzese,
Molisano, Apulian, Southern Latian-Campanian, Lucanian-Northern Calabrian).
Area iv) comprises three subareas (Salentino, Central-Southern Calabrese,
and Sicilian). These subareas, largely inspired by the administrative regions
(Regioni) of Italy, are further subdivided into sub-sub-areas Ia, Ib, etc., often
corresponding to a provincial (Provincia) level.
In SIL International’s Ethnologue database [Eberhard et al., 2022], upon
which ISO 639-3 is based, Italian (ita) is based on Pellegrini’s Tuscan and Central,
Napoletano-Calabrese (nap) is based on Pellegrini’s Intermediate Southern,
and Sicilian (scn) on Pellegrini’s Extreme Southern. UNESCO’s endangered lan-
guages list [Moseley, 2010] and the Glottolog database [Hammarström et al., 2022]
adopt a virtually identical classification, albeit with slight differences in naming,
even including some of Pellegrini’s subareas.
While Pellegrini’s primary classification is largely based on phonetic and
morphological isoglosses (up to 33 for the whole of Italy), the subarea classifica-
tion in Central-Southern Italy, particularly in the Intermediate Southern area,
does not follow this approach – only three isoglosses are completely included
within the boundaries of the area in question and have virtually no effect on
the definition of subareas – but is rather grounded on administrative subdivi-
sions. For example, the boundaries between Molisano and Southern Latian-
Campanian, or between the latter and Apulian, resp., Lucanian-Northern Cal-
abrian, largely reflect the administrative boundaries between the corresponding
regions.
The goal of this work is to investigate to what extent modern dialectometry
confirms this standard classification. Dialectometry [Séguy, 1973, Goebl, 1982]
aims at providing an objective view of dialect variation through the use of quantitative
data analysis. In particular, dialectometric clustering has been applied
to several regions, including The Netherlands [Wieling & Nerbonne, 2011], Catalonia
[Valls et al., 2012], and English dialects [Wieling et al., 2013]. In Italy, relevant
examples mostly concern Tuscany [Montemagni & Wieling, 2016],
[Calamai et al., 2022].
In these works, various clustering techniques have been applied mainly on the
basis of distance matrices, although other examples exist [Syrjanen et al., 2016].
Distance matrices collect the linguistic distances between any pair of N sites or
areas. Linguistic distance has been defined in several different ways.
One common procedure consists in considering categorical lexical data, that
is, M entries in a linguistic atlas, which may have up to P variants each. A
distance between two sites is then defined by counting the number of pair-
wise variant mismatches for all features. An example is the Relative Difference
Value (RDV), initially used as a difference function for unequivocal outcomes
of features [Goebl, 2010] and later adapted to cover features with multiple pos-
sible outcomes [Pickl, 2014]. A slightly modified metric, the Weighted Iden-
tity/difference Value (WIV), can use weights to emphasise some particular fea-
tures [Goebl, 1982]. This approach has been extended to variables/features
other than lexical, i.e., phonological rules [Valls et al., 2012].
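The mismatch-counting idea described above can be sketched as follows. This is a simplified illustration, not the exact RDV definition from the cited works: the function name `relative_difference`, the use of `None` for missing atlas entries, and the toy word lists are all assumptions made for this example.

```python
def relative_difference(site_a, site_b):
    """Percentage of comparable features on which two sites disagree.

    Each site is a list of categorical variants, one per atlas entry.
    A value of None marks a missing entry (an assumption of this sketch);
    such entries are skipped rather than counted as mismatches.
    """
    compared = [(a, b) for a, b in zip(site_a, site_b)
                if a is not None and b is not None]
    if not compared:
        raise ValueError("no comparable features between the two sites")
    mismatches = sum(1 for a, b in compared if a != b)
    return 100.0 * mismatches / len(compared)


# Hypothetical toy data: four atlas entries, one missing at the first site.
site_1 = ["casa", "figlio", "pane", None]
site_2 = ["casa", "fijo", "pa", "vino"]
rdv = relative_difference(site_1, site_2)  # 2 mismatches out of 3 comparable entries
```

Repeating this computation for every pair of sites would fill an N × N distance matrix of the kind used by the clustering techniques mentioned above.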
Another approach considers individual word pronunciations, which are converted
into edit distances between strings of characters, typically using one particular
location as a reference. The most common edit distance used is the Levenshtein
distance [Levenshtein, 1966], which describes the cost (number of elementary
operations) of changing one string into another or, equivalently, the number of
character mismatches when the strings are suitably aligned. More refined methods
with variable substitution costs (weights) also exist, such as the PMI-based
Levenshtein distance [Wieling et al., 2014]. Once normalized by the length of the
alignment, the edit distances between m word pairs can then be aggregated
by taking their average [Heeringa, 2004], leading to the distance between two
varieties.
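As an illustration of the edit-distance approach, the sketch below computes a plain Levenshtein distance with unit costs and averages the normalized distances over word pairs. It rests on one simplifying assumption: each distance is divided by the length of the longer word rather than by the true alignment length, which would require backtracking through the alignment. The function names and sample transcriptions are hypothetical.

```python
def levenshtein(a, b):
    """Unit-cost Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution or match
        prev = curr
    return prev[-1]


def variety_distance(words_a, words_b):
    """Mean normalized edit distance over paired word lists of two varieties.

    Normalization by the longer word is a simplification of the
    alignment-length normalization described in the text.
    """
    if len(words_a) != len(words_b) or not words_a:
        raise ValueError("word lists must be non-empty and of equal length")
    total = 0.0
    for wa, wb in zip(words_a, words_b):
        longest = max(len(wa), len(wb))
        total += levenshtein(wa, wb) / longest if longest else 0.0
    return total / len(words_a)


# Hypothetical transcriptions of 'bread' and 'wine' in two varieties.
d = variety_distance(["pane", "vino"], ["pa", "vino"])  # (2/4 + 0/4) / 2
```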
Once a distance matrix is obtained, several analyses can be performed, the
basic ones being beam maps, honeycomb maps, and cluster analysis. Among
clustering techniques, hierarchical methods such as complete-linkage, UPGMA,
or Ward’s have been used most often [Goebl, 2008]. Partitional clustering has
been used somewhat less in dialectometry, although both k-means and k-medoid
clustering have been applied to different kinds of linguistic data [Hyvönen et al., 2007,
Burridge et al., 2019, Cheshire et al., 2011, Syrjanen et al., 2016].
Hierarchical clustering or k-medoid can be used directly once the distance
matrix is defined, since these methods only need the distances between
the sites. By contrast, k-means requires evaluating the distance between
actual sites and iteratively updated centroids, which do not correspond to any
site, and therefore prevents the use of pre-calculated distance matrices.
Dimensionality Reduction (DR) techniques try to reduce the number of vari-
ables while preserving the variation as much as possible. For instance, Bipar-
tite Spectral Graph Partitioning (BSGP) using Singular Value Decomposition
(SVD) has been used in [Wieling & Nerbonne, 2011, Montemagni & Wieling, 2016,
Wieling et al., 2013]. This technique uses a binary segment substitution matrix
(N × M ) with value Aij = 1 when segment substitution j occurs in variety i.
SVD is applied to produce a synthetic vector of size N + M , which is then pro-
cessed by k-means, in an attempt to simultaneously cluster sites and linguistic
features which give rise to the geographical clustering.
Other dimensionality reduction techniques such as Multidimensional Scaling,
Principal Component Analysis (PCA), or Factor Analysis [Pröll et al., 2014]
are usually used to discover latent clusters and dialect continua in
the data indirectly, e.g., by converting the distance matrix into an N × 3 matrix, then
attributing RGB values to rows and visualizing them on maps. However, these
DR techniques usually do not provide explicit clustering capability.
Recently, spatial Bayesian Clustering (BC) has been applied to linguistic
data by [Romano et al., 2022]. While hard clustering generates clear boundaries
between clusters and thus may fail to represent gradual variations in continuous
dialect data, in BC clustering is fuzzy: each point belongs to every cluster
with a certain probability. Bayesian clustering yields core regions where points
predominantly belong to a single cluster and gradual boundaries where points
belong to multiple clusters with almost equal probabilities.
The data used for clustering are generally the entries of linguistic atlases.
For the region under consideration, the web page of the Salzburg dialectometry team
[Goebl et al., 2019] provides a classification based on the AIS [Jaberg & Jud, 1987]
data and two hierarchical clustering algorithms. However, Central-Southern
Italian dialects are classified alongside other Italian dialects: even when setting
the number of clusters to the maximum value available (20), only four or five
groups emerge in the region considered. Moreover, the results change dramatically
depending on the corpus considered, which is probably due to the relatively
low number of sites (N less than 100) in the corpus.
In this work, we try to consider all Central-Southern Italian varieties, that is,
more than a thousand communes in nine regions: the Marches (south of the Esino
river), Umbria, Latium, Abruzzi, Molise, Campania, Apulia, Basilicata, and
Calabria. To obtain access to useful and homogeneous data, we select the L most
relevant phonetic features (chosen according to three guiding principles)
instead of trying to gather a vocabulary of word entries. Then we apply k-means
clustering to points in an abstract L-dimensional space. Each point
represents a group of varieties that are homogeneous according to the selected
phonetic features and can be represented as a string of numerical values that
describe the outcomes of those features. Thanks to the relatively low dimension
of the dataset (N × L), clustering can be performed directly with the k-means
algorithm, without the need for dimensionality reduction techniques. Distance
can be calculated between any two strings, including those not representative of
any variety, e.g., the k-means centroids. We adopt silhouette analysis to choose the most
appropriate number of clusters. Based on that, we propose a heuristic method
to define fuzzy or transitional areas across groups.

2 Method
Varieties are classified according to L = 18 phonetic traits, which are listed in
Table 1. These traits certainly represent a subset of the diatopic variation in
the area considered. Their choice has been made according to three guiding
principles:

• Being sufficiently compact in their areal distribution, thus avoiding the use
of possibly widespread but “darting” phenomena occurring here and there,
e.g., due to diachronic variation and the influence of standard Italian.
This criterion discarded, e.g., the propagation of /u/ in pre-tonic position
(Savoia & Baldi, 2016; Schirru, 2016) or the semivocalization of initial and
intervocalic /v/.
• Being sufficiently widespread, concerning at least two or three provinces. For
this reason, e.g., the palatalization of pre-tonic /a/, which concerns a possibly
compact but limited area in Molise (Iannacito, 2002), was discarded.
• Being sufficiently identifiable, i.e., occurring in at least half a dozen words
that can be retrieved in common speech, written texts, or in the scientific
literature. For this reason, e.g., the different outcomes of -TJ- or
-BJ- (Carosella, 2016), occurring in very few common words, have been
discarded.

Table 1: Set of phonetic features considered and their possible outcomes. For each trait ℓ, the header row gives the example glosses and the weight wℓ; the following rows give each outcome, its example forms, and its numerical value xℓ.

| ℓ | Phonetic trait – Outcomes | Examples | xℓ | wℓ |
|---|---|---|---|---|
| 1 | Metaphony, given /-U/ | ‘bed’ | | 0.5 |
| | Absent | ["lEt:o] | 0 | |
| | Raising-type | ["let:u] | 1 | |
| | Diphthongization-type | ["ljEt:@] | 2 | |
| | Monophthongization-type | ["lit:@] | 3 | |
| 2 | Metaphony, given /-I/ | ‘good’ (pl.) | | 0.5 |
| | Absent | ["bOno] | 0 | |
| | Raising-type | ["bonu] | 1 | |
| | Diphthongization-type | ["bwOn@] | 2 | |
| | Monophthongization-type | ["bun@] | 3 | |
| 3 | Vocalic differentiation by position | ‘thing’, ‘mouth’ | | 1 |
| | Absent | ["kOsa], ["vOk:a] | 0 | |
| | Present (central-southern origin) | ["kosa], ["vOk:a] | 1 | |
| | Present (northern origin) | ["kosa], ["vok:a] | -1 | |
| 4 | Word-final vowels | ‘house’, ‘heart’, ‘eight’, ‘wolf’ | | 1 |
| | Reduction of all (/@/) | ["kas@], ["kOr@], ["Ot:@], ["lup@] | 0 | |
| | Conservation of -a, reduction of others (/a/, /@/) | ["kasa], ["kOr@], ["Ot:@], ["lup@] | 1 | |
| | Conservation of three (/a/, /e/-/@/-/i/, /o/-/u/) or four vowels (with /i/ distinct from /e/-/@/) | ["kasa], ["kOre]-["kOri], ["Ot:u], ["lupu] | 2 | |
| | Conservation of all five vowels (/a/, /e/, /i/, /o/, /u/) | ["kasa], ["kOre], ["Ot:o], ["lupu] | 3 | |
| 5 | Alteration of -LL- | ‘horse’ | | 1 |
| | Absent (/ll/) | [ka"val:u] | 0 | |
| | Palatal (/j/, /L/, /J/) | [ka"vaj:u] | 1 | |
| | Occlusive (/dd/) and retroflex | [ka"vad:u] | 2 | |
| 6 | Metaphony of -A- | ‘hands’ | | 1 |
| | Absent | ["man@] | 0 | |
| | Present | ["min@] | 1 | |
| 7 | Some groups of consonants + L | ‘(it) rains’, ‘white’, ‘flower’ | | 1 |
| | Standard (/pj/, /bj/, /fj/) | ["pjov@], ["b:jang@], ["fjor@] | 0 | |
| | Alteration of /PL/ > /kj/ | ["cov@], ["b:jang@], ["fjor@] | 1 | |
| | Further alteration of /BL/ > /j/ | ["cov@], ["jang@], ["fjor@] | 2 | |
| | Further alteration of /FL/ > /S/, /x/ etc. | ["cov@], ["jang@], ["Sor@] | 3 | |
| 8 | Apocope of -no, -ne | ‘bread’, ‘wine’ | | 0.5 |
| | Absent | ["pane], ["vino] | 0 | |
| | Only -ne | ["pa], ["vino] | 1 | |
| | Both | ["pa], ["vi] | 2 | |
| 9 | Outcomes of -LJ- | ‘son’ | | 1 |
| | Palatal (/L/) | ["fiL@] | 0 | |
| | Approximant (/j/) | ["fij@] | 1 | |
| | Occlusive (/é/) | ["fié@] | 2 | |
| 10 | Aspiration of -F- | ‘coffee’ | | 1 |
| | Absent | [ka"fe] | 0 | |
| | Present | [ka"he] | 1 | |
| 11 | Rhotacization of -D- | ‘tooth’ | | 1 |
| | Absent | ["dEnd@] | 0 | |
| | Present | ["rEnd@] | 1 | |
| 12 | Degemination of -RR- and other geminates | ‘ground’ | | 1 |
| | Absent | ["tEr:a] | 0 | |
| | Present (of -rr-) | ["tEra] | 1 | |
| | Present (of -rr- and others) | ["tEra] | 2 | |
| 13 | Postnasal sonorization of stops and progressive assimilation in groups of /n/ + stops | ‘spring’, ‘when’ | | 1 |
| | Both present | ["fonde], ["kwan:o] | 0 | |
| | Only assimilation | ["fonte], ["kwan:o] | 1 | |
| | Both absent | ["fonte], ["kwando] | 2 | |
| 14 | “Florentine” Anaphonesis | ‘tongue’ | | 1 |
| | Absent | ["leNgwa] | 0 | |
| | Present | ["liNgwa] | 1 | |
| 15 | Some groups of consonants + J | ‘arm’, ‘to eat’, ‘to go out’ | | 1 |
| | Standard (/Ù/, /ñ/, /j/) | ["vraÙ:@], [ma"ñ:a], ["ji] | 0 | |
| | Alteration of /kj/ > /ţ/ | ["vraţ:@], [ma"ñ:a], ["ji] | 1 | |
| | Further alteration of /ngj/ > /nÃ/ | ["vraţ:@], [ma"nÃa], ["ji] | 2 | |
| | Further alteration of /j/ > /S/ | ["vraţ:@], [ma"nÃa], ["Si] | 3 | |
| 16 | Group R + J | ‘baker’ | | 1 |
| | Central-southern /r/ | [for"naro] | 0 | |
| | Tuscan /j/ | [for"najo] | 1 | |
| 17 | Group S + J | ‘kiss’ | | 1 |
| | Postalveolar (/S/) | ["vaS@] | 0 | |
| | Alveolar (/s/) | ["vas@] | 1 | |
| 18 | Tonic vowel system | ‘snow’, ‘month’, ‘cross’ | | 1 |
| | Common Romance | ["nev@], ["mes@], ["kroÙ@] | 0 | |
| | “Romanian” | ["nev@], ["mes@], ["kruÙ@] | 1 | |
| | “Sardinian” | ["niv@], ["mEs@], ["kruÙ@] | 2 | |
| | “Sicilian” | ["niv@], ["mis@], ["kruÙ@] | 3 | |

Traditionally, most of these features are associated with “isoglosses” that
have been used to define dialect groups or subgroups. For instance, phonetic
trait 4 is the definitory isogloss that separates the Central dialects from
Intermediate-Southern dialects in the classification of Pellegrini.
All varieties in the geographical space considered have been inspected and
attributed a numerical value for each trait. Traits that have just two outcomes
can generate either a digit 0 (in general, absence of that trait) or 1 (presence).
Traits with multiple (P) outcomes can generate digits ranging from 0 to P − 1,
where 0 is generally attributed to the “most standard” outcome, and the
digit increases with the degree of deviation from this standard. The numerical
values of each outcome are also listed in Table 1. In the case of intermediate,
simultaneous, or uncertain outcomes, fractional values have sometimes been
used.
As a result of this transcription, each dialect corresponds to a string of L
digits, $\{x_\ell\}_{\ell=1}^{L}$. Varieties that are geographically adjacent and share the same
string are considered equal and form a “homogeneous area” (HA) for the
purposes of this study. In the whole space, no fewer than N = 647 homogeneous
areas have been identified in this way: 111 in Latium, 101 in Calabria, 89 in the
Abruzzi, 83 in Campania, 79 in Basilicata, 76 in Apulia, 40 in Molise, 44 in the
Marches, 24 in Umbria. The localization of these areas is shown schematically
in Fig. 1. Their actual extension and the varieties included in each of them are
detailed in the companion web site.
Each homogeneous area represents one point in the data set used for the
classification. The metric used is the weighted Manhattan distance

$$D_{ij} = \sum_{\ell} w_\ell \, |x_{i\ell} - x_{j\ell}|, \qquad (1)$$

where $|\cdot|$ denotes the absolute value and $w$ is a vector of weights. In this study,
$w_\ell$ is always 1 except for $\ell \in \{1, 2, 8\}$, where $w_\ell = 0.5$ has been used (see Table
1). We note that this procedure is roughly equivalent to counting the isoglosses
between two different locations.
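A minimal sketch of the distance in Eq. (1) is given below. The weights follow the text (0.5 for traits 1, 2, and 8, and 1 otherwise); the names `WEIGHTS` and `linguistic_distance` and the two example strings are hypothetical and do not correspond to actual homogeneous areas.

```python
# Weights per trait, indexed 1..18 as in Table 1: 0.5 for traits 1, 2, 8; else 1.
WEIGHTS = [0.5 if trait in (1, 2, 8) else 1.0 for trait in range(1, 19)]


def linguistic_distance(x_i, x_j, weights=WEIGHTS):
    """Weighted Manhattan distance between two trait strings, as in Eq. (1)."""
    if not (len(x_i) == len(x_j) == len(weights)):
        raise ValueError("trait strings and weights must have the same length")
    return sum(w * abs(a - b) for w, a, b in zip(weights, x_i, x_j))


# Two made-up trait strings differing only in traits 1 and 4.
area_a = [0] * 18
area_b = list(area_a)
area_b[0] = 2   # trait 1: diphthongization-type metaphony (x = 2)
area_b[3] = 3   # trait 4: conservation of all five final vowels (x = 3)

d = linguistic_distance(area_a, area_b)  # 0.5 * 2 + 1 * 3
```

Because the function accepts any numeric strings, it also applies to fractional outcome values and to centroids that do not correspond to any real variety.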
Based on this metric, a k-means algorithm has been used to classify the
N L-dimensional points into K groups. This well-known algorithm tries to
attribute each point to one of the clusters by minimising the within-cluster sum
of Manhattan distances, that is,

$$\min \sum_{k=1}^{K} \sum_{x_i \in C_k} \sum_{\ell=1}^{L} w_\ell \, |x_{i\ell} - m_{k\ell}|, \qquad (2)$$

where the centroid $m_k$ is defined as the mean of the points belonging to cluster $k$
($C_k$),

$$m_k = \frac{1}{\mathrm{card}(C_k)} \sum_{x_j \in C_k} x_j. \qquad (3)$$
Figure 1: Localization of the homogeneous areas (circles). Each colour corresponds to one of the administrative regions. Boundaries between regions are drawn.

In practice, the algorithm proceeds iteratively. First, a set of K means is
randomly generated. Then, each point is attributed to the cluster with the
‘nearest’ mean. Next, the means are recalculated based on the points attributed
to each cluster. This process is repeated for T iterations. However, the algorithm
is not guaranteed to find the optimum, i.e., the clustering that minimizes the
objective in (2) [Russell & Norvig, 2020]. For this reason, the algorithm is run
R times, each time with a different (random) initialization of the means.
For each run, the objective is calculated, and finally the run with the minimal
objective is chosen as the result. For this study, the algorithm is parametrized
with T = 20 and R = 200K.
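The iterative procedure just described can be sketched in pure Python as follows. This is an illustrative sketch under stated assumptions, not the author's implementation: initial means are drawn from the data points themselves, an empty cluster keeps its previous mean, and the function names, the fixed seed, and the toy data are hypothetical. Centroids are means, as in Eq. (3), while assignment uses the weighted Manhattan distance.

```python
import random


def dist(x, m, w):
    """Weighted Manhattan distance between a point and a centroid."""
    return sum(wl * abs(a - b) for wl, a, b in zip(w, x, m))


def assign(points, means, w):
    """Index of the nearest mean for every point."""
    return [min(range(len(means)), key=lambda c: dist(p, means[c], w))
            for p in points]


def kmeans(points, k, w, iterations=20, restarts=10, seed=0):
    """Run k-means `restarts` times and keep the run with the lowest objective."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        # Assumption: initial means are K randomly chosen data points.
        means = [list(p) for p in rng.sample(points, k)]
        for _ in range(iterations):
            labels = assign(points, means, w)
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:  # an empty cluster keeps its previous mean
                    means[c] = [sum(col) / len(members) for col in zip(*members)]
        labels = assign(points, means, w)  # final assignment for final means
        objective = sum(dist(p, means[lab], w) for p, lab in zip(points, labels))
        if best is None or objective < best[2]:
            best = (labels, means, objective)
    return best


# Toy data: two well-separated groups in a 2-dimensional space.
toy = [[0, 0], [0, 1], [1, 0], [9, 9], [9, 10], [10, 9]]
labels, means, objective = kmeans(toy, k=2, w=[1.0, 1.0])
```

With the parametrization given in the text, one would call `kmeans(points, K, WEIGHTS, iterations=20, restarts=200 * K)`, where `points` and `WEIGHTS` stand for the 647 trait strings and the trait weights.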
To choose the optimal number of clusters K, silhouette analysis is used.
According to this method, a silhouette metric is defined as a function of the
number of clusters as

$$\sigma(K) = \left\langle \frac{b_i - a_i}{\max(a_i, b_i)} \right\rangle, \qquad (4)$$

where $\langle \cdot \rangle$ denotes the average over all points $i$, and

$$a_i = \frac{1}{\mathrm{card}(C_I) - 1} \sum_{j \in C_I,\, j \neq i} D_{ij}, \qquad b_i = \min_{J \neq I} \frac{1}{\mathrm{card}(C_J)} \sum_{j \in C_J} D_{ij} \qquad (5)$$

are the mean distance between point $i$ and all other points in the same cluster $C_I$
and the smallest mean distance of $i$ to all points in any other cluster, respectively.
The optimal number of clusters is chosen so as to maximise the silhouette. The
silhouette coefficient $SC = \max_K \sigma(K)$ summarizes the final result.
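The silhouette of Eqs. (4) and (5) can be computed from a pairwise distance matrix and a list of cluster labels, as in the sketch below. Two conventions are assumptions of this sketch and are not stated in the text: a point in a singleton cluster receives a score of 0, and a point with a = b = 0 also receives 0. The function name and the one-dimensional toy data are hypothetical.

```python
def silhouette(dist_matrix, labels):
    """Mean silhouette over all points; requires at least two clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, own_label in enumerate(labels):
        own = clusters[own_label]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a_i: mean distance to the other members of the point's own cluster
        a = sum(dist_matrix[i][j] for j in own if j != i) / (len(own) - 1)
        # b_i: smallest mean distance to the members of any other cluster
        b = min(sum(dist_matrix[i][j] for j in members) / len(members)
                for lab, members in clusters.items() if lab != own_label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy example: four points on a line, split into two obvious clusters.
xs = [0, 1, 10, 11]
D = [[abs(p - q) for q in xs] for p in xs]
sigma = silhouette(D, [0, 0, 1, 1])
```

Evaluating this function on the best partition for each K and taking the maximum would give the coefficient SC defined above.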
It is commonly held that membership of a particular dialectal group is not a
rigid attribute, and that transition bands exist instead. To capture this intuitive
behaviour quantitatively, we have used the following method. We compute
the distance between each HA and the centroids of all clusters,

$$D_{ik} = \sum_{\ell} w_\ell \, |x_{i\ell} - m_{k\ell}|. \qquad (6)$$

The lowest distance corresponds by definition to the cluster $k$ of which the
HA is a member. If the difference between the second lowest distance (say, with
cluster $h$) and the lowest distance is less than a specified fraction $\xi$ of the lowest
distance, then that HA is marked as a transitional area between cluster $k$ and
cluster $h$,

$$i \in C_{KH} \quad \text{if} \quad D_{ih} < D_{i\ell} \;\; \forall \ell \notin \{k, h\} \;\; \wedge \;\; D_{ih} < (1 + \xi) D_{ik}. \qquad (7)$$
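The second-best rule of Eq. (7) can be illustrated with a short function that takes the distances from one HA to all centroids. The function name, its return convention (a pair of cluster indices, with `None` when the area is not transitional), and the sample distances are assumptions of this sketch; the default ξ = 0.5 is the value used later in the paper.

```python
def transitional_cluster(distances, xi=0.5):
    """Return (k, h) if the area is transitional between clusters k and h,
    or (k, None) if it belongs to the core of cluster k.

    `distances` holds the distance from one HA to every cluster centroid.
    """
    if len(distances) < 2:
        raise ValueError("at least two clusters are required")
    order = sorted(range(len(distances)), key=distances.__getitem__)
    k, h = order[0], order[1]            # nearest and second-nearest centroid
    if distances[h] < (1 + xi) * distances[k]:
        return k, h
    return k, None


# Hypothetical distances from two areas to three centroids.
print(transitional_cluster([2.0, 2.5, 6.0]))  # 2.5 < 1.5 * 2.0, so transitional
print(transitional_cluster([2.0, 3.5, 6.0]))  # 3.5 >= 3.0, so core of cluster 0
```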

3 Data
Data for all varieties considered have been collected from multiple and diverse
sources, including material covering the phonetics of specific varieties (see Selected
Sources: Specific Varieties), larger areas or entire regions (see Selected
Sources: Larger Areas), comprehensive monographs (see Selected Sources:
Comprehensive Monographies) and linguistic atlases (see Selected Sources: Linguistic
Atlases) including acoustic atlases. Other speech material available on
the web, both ethnographic and spontaneous, has also provided data for cer-
tain dialectal traits. Good dialectal dictionaries, although often written by
non-professional researchers, have been found for many varieties. Dialectal lit-
erature (mostly, poetry) in specific varieties and collections covering broader
areas have been also perused, particularly for those traits that are unambiguous
when written. Many of these non-scholarly sources are listed in the compan-
ion website. Less canonically, many data have been obtained by inspecting,
searching, and sometimes querying dialect-oriented groups in social networks
such as Facebook. Older scholarly data have been systematically checked in the
(written) conversations found in these groups.
As a result, a database containing thousands of observations has been pre-
pared and is available to the readers upon request to the author. Based on that,
the strings for each variety have been constructed and the homogeneous areas
identified.
Inspection of the unclassified results already provides some useful insight. For
instance, it is possible to graphically represent on a map the distances from a
given HA, creating similarity maps as defined in [Goebl et al., 2019]. Moreover,
“isogloss maps” and “beam maps” have also been created. Examples of the
latter for all regions considered are shown in the companion website, where only
‘beams’ corresponding to distances D ≤ 1 are plotted, depicting the emergence
of dialectal continua. However, this analysis yields many small continua and a
large number of isolated areas (whose distance from all conterminous areas is
larger than 1), making a meaningful classification impossible. For this purpose,
the most useful analysis is that of clustering, which is presented in the next
section.

4 Results
Clustering runs with several values of K ranging from 2 to 11 have been performed and
inspected. For higher values of K the overall results become unstable and very
sensitive to the random initialization, and thus are not shown. Table 2 summarizes
the main divides (traditionally, the “isoglosses”) that characterize each new
partition, as well as the new groups that emerge from it.
The silhouette factor as a function of K is shown in Fig. 2. Values are given
as the mean of four series of runs plus/minus the standard deviation. When
the latter is small, the results are stable across different series
of runs. As can be observed, the factor σ generally decreases
with the number of clusters, and the confidence intervals of two consecutive
values of K generally do not overlap. However, three values of K emerge as
local maxima, namely, K = 2, 4, and 8. These partitions are all very stable,
as evidenced by a low or null variance of the silhouette coefficient. A fourth
cluster number, K = 10, has a mean silhouette factor that is close to that with
K = 9 and presents an overlap of the respective confidence intervals, meaning
that for some series of runs the silhouette could be higher with K = 10 than
with K = 9. However, partition K = 10 is also the least stable, with a large
variance due to competing clustering results.

Table 2: Clustering results as a function of the no. of clusters K. New groups in bold.

| K | Main new divide (w.r.t. K − 1) | Groups identified | σ |
|---|---|---|---|
| 2 | Salerno-Lucera-Vieste (SLV) | Northern space vs. Southern space | 0.397 ± 0.000 |
| 3 | Gaeta-Sora-Termoli (GST), Alento-Agri-Taranto-Brindisi | Northern, Central, Southern subspaces | 0.364 ± 0.001 |
| 4 | GST, SLV, Alento-Crati-Nardò-Brindisi | Northern, Campanian-Molisan, Apulian-Lucanian, Southern subspaces | 0.366 ± 0.000 |
| 5 | Sora-L’Aquila-S. Benedetto | Northern subspace, Abruzzese, Campanian-Molisan, Apulian-Lucanian, Southern subspaces | 0.336 ± 0.003 |
| 6 | Foggia-Potenza-Cassano | Northern subspace, Abruzzese, Campanian-Molisan, Apulian, Irpino-Lucanian, Southern subspace | 0.323 ± 0.003 |
| 7 | Pollino-Sila-Lamezia | Northern subspace, Abruzzese, Campanian-Molisan, Apulian, Irpino-Lucanian, Cosentino, Salentino-Calabrian | 0.322 ± 0.001 |
| 8 | Latina-Ancona | Perimedian, Median, Abruzzese, Campanian-Molisan, Apulian, Irpino-Lucanian, Cosentino, Salentino-Calabrian | 0.315 ± 0.002 |
| 9 | Irregular | Perimedian, Median, Abruzzese, Samnite, Neapolitan-Molisano, Apulian, Irpino-Lucanian, Cosentino, Salentino-Calabrian | 0.315 ± 0.002 |
| 10 | Irregular | As above, but Salentino split from Calabrian | 0.315 ± 0.005 |
| 11 | Irregular | As above, but Irpino-Lucanian split in two groups | 0.313 ± 0.001 |

Figure 2: Silhouette coefficient as a function of the number of groups. (Horizontal axis: K; vertical axis: silhouette coefficient.)

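Table 3: List of Province codes. Regional capital cities in bold.

| Region | Province | Code |
|---|---|---|
| THE MARCHES (Marche) | Ancona | AN |
| | Ascoli Piceno | AP |
| | Fermo | FM |
| | Macerata | MC |
| UMBRIA | Perugia | PG |
| | Terni | TR |
| LATIUM (Lazio) | Frosinone | FR |
| | Latina | LT |
| | Rieti | RI |
| | Roma | RM |
| | Viterbo | VT |
| ABRUZZI (Abruzzo) | L’Aquila | AQ |
| | Chieti | CH |
| | Pescara | PE |
| | Teramo | TE |
| MOLISE | Campobasso | CB |
| | Isernia | IS |
| CAMPANIA | Avellino | AV |
| | Benevento | BN |
| | Caserta | CE |
| | Napoli | NA |
| | Salerno | SA |
| APULIA (Puglia) | Bari | BA |
| | Barletta-Andria-Trani | BT |
| | Brindisi | BR |
| | Foggia | FG |
| | Lecce | LE |
| | Taranto | TA |
| BASILICATA | Matera | MT |
| | Potenza | PZ |
| CALABRIA | Cosenza | CS |
| | Catanzaro | CZ |
| | Crotone | KR |
| | Reggio di Calabria | RC |
| | Vibo Valentia | VV |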

The first optimal classification with K = 2 divides the overall space considered
into a Northern and a Southern space, separated by a line that resembles
the traditional Salerno-Lucera (actually, Salerno-Lucera (FG)-Vieste (FG),
SLV) isogloss bundle. For instance, around this line lies the northern limit of
KJ > /ţ:/ (see trait 15 in Table 1).
In the partition with K = 4 each of these subspaces splits in two. Thus,
a Northern subspace is separated from a Central-Northern subspace by a line
running from around Gaeta (LT) on the Tyrrhenian coast to around Termoli
(CB) on the Adriatic coast, with an elbow around Sora (FR). The Central-
Northern subspace is separated from a Central-Southern subspace by a SLV line,
although not exactly coincident with the previous one. Finally, the Central-
Southern subspace is separated from a Southern subspace by two lines, one
running from around the mouth of the Alento river (SA) on the Tyrrhenian
coast to the mouth of the Crati river (CS) on the Ionian coast, the other running
from around Nardò (LE) on the Ionian coast to around Brindisi on the Adriatic
coast.
The next optimal classification is with K = 8. Incidentally, this value of
K almost matches the number of administrative regions (Regioni) in the space
considered. This partition could be thus a promising basis for the definition of
more accurate “regional languages” in this half of Italy. The groups identified
by clustering with K = 8 are listed in Table 2 and detailed here from North to
South as:
1. “Perimedian” group, including provincial capitals AN, PG, VT, RM, LT,
and areas in Northern Marches, in Central-Western Umbria, in Western
Latium, besides a hamlet (frazione) in Basilicata (a Marchigiano colony).
2. “Median” group, including provincial capitals MC, FM, TR, RI, FR, AQ,
and areas in Central Marche, South-Eastern Umbria, Central Latium,
Western Abruzzi.
3. “Abruzzese” group, including provincial capitals AP, TE, PE, CH, and
areas in Southern Marche, Eastern Abruzzi, Southern Latium, besides a
few smaller areas in Molise.
4. “Campanian-Molisan” group, including provincial capitals IS, CB, BN,
NA, CE, SA, and areas in South-Eastern Latium, Molise, Northern-Central
Campania, besides some smaller areas in Northern Apulia and Basilicata
around and including the provincial capital PZ (Gallo-Italic colonies).
5. “Apulian” group, including provincial capitals FG, BT, BA, TA, MT,
BR, and areas in Northern-Central Apulia, South-Eastern Basilicata, North-Eastern
Calabria, besides some smaller areas in Central Campania.
6. “Irpino-Lucanian” group, including provincial capital AV and areas in
South-Eastern Campania, Western Basilicata, and North-Eastern Apulia.
7. “Cosentino” group, including provincial capital CS and areas in Southern
Campania (likely having a Greek substratum), Northern Calabria, besides
some smaller areas in Basilicata (most of them being or having been Gallo-
Italic colonies).
8. “Salentino-Calabrian” group, including provincial capitals LE, KR, VV,
CZ, RC, and areas in Southern Apulia and Central-Southern Calabria,
besides some smaller areas in Northern Calabria.
Figure 3 shows the attribution of each HA to one of the nine clusters, identified
by a colour. It must be noted that the k-means algorithm has no knowledge
of the spatial relationships between the HAs: each of them is
a “point” in an 18-dimensional space, and these points could be geographically
arranged in any arbitrary way. Nevertheless, the spatial consistency of the
results is striking, and the groups obtained clearly recall traditional regions and
dialectal groups. The actual boundaries between the eight groups can be traced
on a map as depicted in Fig. 4.
Boundary between groups 1–2 recalls a well-known isogloss, the Northern
limit of simultaneous /NT/ > /nd/ and /ND/ > /nn/ (see trait 13 in Table 1)
that traditionally separates the Central Italian dialects into a “Perimedian”
and a “Median” section (whence the naming of groups 1 and 2 used here).
Boundary 2–3 runs similarly to another definitory isogloss, the Northern limit
of [@] (see trait 4 in Table 1), which serves to separate Central from Southern
(“Neapolitan language”) dialects in traditional classifications. Boundary 3–4,
or the GST bundle introduced above, is similar to the Northern limit of PL
> /kj/ (see trait 7 in Table 1) or isogloss 21 in Pellegrini’s map. Boundary
between 4 on one side and 5, 6 on the other, is the SLV bundle discussed above.
Boundary between 5, 6 on one side and 7, 8 on the other, recalls the Northern
limit of non-standard tonic vowel systems (trait 18 in Table 1), which is different
from isogloss 25 (Southern limit of [@]) that is traditionally used to separate the
Intermediate Southern dialects from the Extreme Southern dialects (“Sicilian”
language). Finally, the North-South boundary between groups 6, 7 on one
side and 5, 8 on the other, matches almost perfectly a less used isogloss, i.e.,
the Western limit of LJ > /é/ (see trait 9 in Table 1), whereas in traditional
classifications the corresponding boundaries are purely administrative.

Figure 3: HA clustered in K = 8 groups: schematic representation.
Each colour corresponds to one group: blue (Perimedian), purple (Median),
pink (Abruzzese), red (Samnite), orange (Neapolitan-Molisano), green (Apulian),
yellow (Irpino-Lucanian), grey blue (Cosentino), light blue (Salentino-Calabrian).

Figure 4: HA clustered in K=9 groups: a linguistic map with actual group
boundaries. Colours of groups correspond to those of Fig. 3.

Figure 5: HA clustered in K=8 groups with second-best clusters: schematic
representation. Core clusters are identified by the left-half colour of the circles;
second-best clusters in transitional areas are identified by right-half colours.

Figures 5 (schematic view) and 6 (pictorial) show the transitional areas
identified with the method (7) of second-best clusters (with ξ = 0.5). These
results suggest the existence of such areas at the geographical boundary between
groups 1 and 2 (in Marche, Umbria, and Latium), 2 and 3 (in Marche, Abruzzi,
and Latium), 2 and 4 (in Latium), 3 and 4 (in Latium, Abruzzi, Molise, and
Apulia), 4 and 5 (in Apulia), 4 and 6 (in Campania), 5 and 6 (in Apulia and
Basilicata), 6 and 7 (in Basilicata), 5 and 7 (in Calabria), 7 and 8 (in Apulia
and Calabria), 7 and 8 (in Calabria). Again, these results are consistent with
geography in the sense that transitional areas are generally identified between
clusters that actually share a geographical border.
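
The idea of a second-best cluster can be illustrated with a minimal Python sketch. Rule (7) is defined earlier in the paper and is not reproduced here, so the criterion below is only an assumption made for illustration: an area is flagged as transitional when the Manhattan distance to its second-nearest cluster centre does not exceed (1 + ξ) times the distance to its nearest centre. The function names and the toy centres are hypothetical.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length trait vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def second_best(point, centres, xi=0.5):
    """Return (core cluster, second-best cluster or None) for one area.

    ASSUMED criterion (not taken from the paper): the runner-up centre
    counts as a second-best cluster when d2 <= (1 + xi) * d1, where d1
    and d2 are the distances to the nearest and second-nearest centres.
    Requires at least two centres.
    """
    order = sorted(range(len(centres)),
                   key=lambda j: manhattan(point, centres[j]))
    best, runner_up = order[0], order[1]
    d1 = manhattan(point, centres[best])
    d2 = manhattan(point, centres[runner_up])
    transitional = d2 <= (1 + xi) * d1
    return best, (runner_up if transitional else None)


# Toy example with two cluster centres in a 2-trait space.
toy_centres = [[0, 0], [10, 0]]
core_area = second_best([1, 0], toy_centres)      # clearly inside cluster 0
border_area = second_best([4.5, 0], toy_centres)  # close to both centres
```

With a criterion of this kind, a larger ξ widens the transitional belts and a smaller ξ narrows them, so the value ξ = 0.5 acts as a tolerance on how ambiguous an assignment must be before it is reported.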

Figure 6: HA clustered in K=8 groups with second-best clusters: a linguistic
map with actual group boundaries and transitional areas (hatched).

5 Conclusions
In this paper, we have presented a dialectometry-based study aimed at
classifying the Romance varieties of Central-Southern Italy. We have analysed the
thousands of varieties under study and carried out a massive pre-treatment of data
available from many sources. Instead of trying to gather a vocabulary of word
entries, we opted for a reduced data set, where each variety is characterized with
respect to 18 phonetic traits, including the isoglosses that have been
traditionally used by linguists to define dialectal groups. On this basis we have
identified 647 homogeneous areas, defined as groups of conterminous varieties
that share the same traits. As a result, we have obtained an operating data set
of 647 points in an 18-dimensional space, where we could define linguistic
distances. We have then formulated the problem as a clustering problem, that is,
finding the K clusters of those points that minimise the within-cluster linguistic
distance. We have used a k-means algorithm to cluster and an ad-hoc rule to
define second-best clusters and transitional areas. We have used silhouette
analysis to select the most appropriate number of clusters.
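
The clustering step summarised above can be illustrated with a minimal Python sketch, which is not the implementation used in this study. It assumes that each homogeneous area is a list of numerical trait values, that the linguistic distance is the plain, unweighted Manhattan distance, and that cluster centres are updated with the component-wise median (the standard minimiser of summed Manhattan distances); the actual weights, update rule, and initialisation may differ. All function names and the toy data are hypothetical.

```python
import random


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length trait vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def median(values):
    """Median of a non-empty sequence of numbers."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else 0.5 * (s[mid - 1] + s[mid])


def cluster_l1(points, k, n_iter=100, seed=0):
    """Lloyd-style clustering with L1 assignment and median update.

    ASSUMPTION: the median update is chosen here for illustration; it is
    one common way to pair a k-means-type iteration with Manhattan distance.
    """
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each area joins its nearest centre.
        new_labels = [
            min(range(k), key=lambda j: manhattan(p, centres[j]))
            for p in points
        ]
        # Update step: move each centre to the component-wise median
        # of its members (an empty cluster keeps its previous centre).
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centres[j] = [median(col) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels, centres


# Toy data: six "areas" described by three traits, forming two groups.
toy_points = [[0, 0, 0], [0, 1, 0], [1, 0, 0],
              [9, 9, 9], [9, 8, 9], [8, 9, 9]]
toy_labels, toy_centres = cluster_l1(toy_points, k=2)
```

Note that the sketch, like the algorithm described in the paper, receives no geographical information: only the trait vectors enter the distance.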
The results are geographically consistent, although the algorithms used have
no information about the actual geographical distance between areas or the
boundaries shared by them. The groups identified for various numbers K re-
semble but do not coincide with the regional varieties traditionally invoked. For
example, when the partition with K = 3 is compared with the traditional high-
level (Pellegrini’s areas) tripartite grouping into Central, Intermediate Southern,
and Extreme Southern, the results do not match unless the Central area includes
the Abruzzese.
The methods used suggest that clustering with 8 groups is the most appro-
priate choice. The dialectal groups identified (labelled as Perimedian, Median,
Abruzzese, Campanian-Molisan, Apulian, Irpino-Lucanian, Cosentino, and Salentino-
Calabrese) again do not coincide with the regional varieties (Pellegrini’s subar-
eas) traditionally invoked. The six geographic boundaries that can be roughly
traced between them (considering that the geographical representations of the
clusters are not perfectly connected in the topological sense) loosely run along
known isoglosses, which are all among the 18 traits considered. However, these
isoglosses are in no way favoured a priori; rather, the algorithm ‘naturally’
selects them in the optimization process, which in turn depends on the entire
set of traits considered. This contrasts with the traditional classification,
which is based on a mixture of fewer defining isoglosses and administrative or
historical boundaries.
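
The silhouette analysis used to compare different numbers of clusters can be sketched in the same spirit. The code below computes the standard mean silhouette coefficient under Manhattan distance for a given labelling; in practice one would compute it for each candidate K and compare the values. It is an illustrative sketch under the assumption of unweighted Manhattan distance, with hypothetical function names and toy data, and it is not the code used to produce the results reported here.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length trait vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def mean_silhouette(points, labels):
    """Mean silhouette coefficient; requires at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point itself contributes a distance of zero to the sum).
        a = sum(manhattan(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(manhattan(p, q) for q in other) / len(other)
            for m, other in clusters.items() if m != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: the same six points under a sensible and a scrambled labelling.
toy_points = [[0, 0, 0], [0, 1, 0], [1, 0, 0],
              [9, 9, 9], [9, 8, 9], [8, 9, 9]]
good = mean_silhouette(toy_points, [0, 0, 0, 1, 1, 1])
bad = mean_silhouette(toy_points, [0, 1, 0, 1, 0, 1])
```

Values close to 1 indicate compact, well-separated clusters, while values near or below 0 indicate that many points sit closer to another cluster than to their own.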
We conclude that a classification based on these grounds is less arbitrary than
traditional ones based on selected isoglosses as it considers multiple dialectal
traits on an equal footing. It is also less subjective since the partitioning is
made by an algorithm that tries to minimise a clearly defined objective function.
Another strength of the method is that it can be readily updated whenever new
data become available, varieties evolve, or corrections are made to the data set.

End Note
The author maintains a popularisation website (in Italian) at
http://www.asciatopo.altervista.org/dialettolog.html, which contains a detailed
description of the varieties and their arrangement in homogeneous areas, and
several linguistic maps.

References
[Burridge et al., 2019] Burridge, James, B. Vaux, M. Gnacik, & Y. Grudeva.
2019. Statistical physics of language maps in the USA. Phys. Rev. E.
99:032305.
[Calamai et al., 2022] Calamai, Silvia, D. Piccardi, & R. Nodari. 2022. Quan-
tifying folk perceptions of dialect boundaries. A case study from Tuscany
(Italy). Journal of Linguistic Geography 10(2), 87-111.
[Cheshire et al., 2011] Cheshire, J. A., P. Mateos, & P.A. Longley. 2011. Delin-
eating Europe’s cultural regions: population structure and surname clus-
tering. Hum. Biol. 83. 573–598.
[Eberhard et al. 2022] Eberhard, David M., G.F. Simons, & C.D. Fennig (eds.).
2022. Ethnologue: Languages of the World. Twenty-fifth edition. Dallas,
Texas: SIL International. Online version: http://www.ethnologue.com.
[Goebl, 1982] Goebl, Hans. 1982. Dialektometrie, vol. 157. Vienna: Verlag der
Österreichischen Akademie der Wissenschaften.

[Goebl, 2008] Goebl, Hans. 2008. Le Laboratoire de dialectométrie de
l’Université de Salzbourg. Un bref rapport de recherche. Zeitschrift für
französische Sprache und Literatur 118(1). 35–55.
[Goebl, 2010] Goebl, Hans. 2010. Dialectometry and quantitative mapping. In
A. Lameli, R. Kehrein, & S. Rabanus (eds.), Language and Space. An Inter-
national Handbook of Linguistic Variation, Volume 2: Language Mapping,
433–457. Berlin & New York: de Gruyter.
[Goebl, 2018] Goebl, Hans. 2018. Dialectometry. In Boberg, C, J. Nerbonne &
D. Watt (eds.), The handbook of dialectology, 123-142. Oxford: Wiley.
[Goebl et al., 2019] Goebl, Hans, Edgar Haimerl, Pavel Smečka, Bernhard
Castellazzi, & Yves Scherrer. 2019. Dialectometry AIS. Retrieved from
http://dialektkarten.ch/dmviewer/ais/index.en.html.
[Hammarström et al., 2022] Hammarström, Harald, R. Forkel, M. Haspelmath
& S. Bank. 2022. Glottolog 4.6. Leipzig: Max Planck Institute for
Evolutionary Anthropology. Online version: http://glottolog.org

[Heeringa, 2004] Heeringa, W.J. 2004. Measuring Dialect Pronunciation
Differences using Levenshtein Distance. Ph.D. dissertation. Groningen:
University of Groningen.
[Hyvönen et al., 2007] Hyvönen, Saara, Antti Leino, & Marko Salmenkivi.
2007. Multivariate analysis of Finnish dialect data—an overview of lexi-
cal variation. Literary and Linguistic Computing 22(3). 271–290.

[Jaberg & Jud, 1987] Jaberg, Karl & Jacob Jud. 1987. Atlante linguistico ed
etnografico dell’Italia e della Svizzera meridionale (AIS). Milano: Unicopli.
Retrieved from NavigAIS-web, https://navigais-web.pd.istc.cnr.it/
[Levenshtein, 1966] Levenshtein, Vladimir I. 1966. Binary codes capable of cor-
recting deletions, insertions, and reversals. Soviet Physics Doklady 10(8).
707–10.
[Montemagni & Wieling, 2016] Montemagni, Simonetta & Martijn Wieling.
2016. Tracking linguistic features underlying lexical variation patterns: A
case study on Tuscan dialects. In Côté, M.-H., R. Knooihuizen & J. Ner-
bonne (eds.). The future of dialects, 117-135. Berlin: Language Science
Press.
[Moseley, 2010] Moseley, Christopher (ed.). 2010. Atlas of the World’s Lan-
guages in Danger (3rd edition). UNESCO Publishing. Online version
https://unesdoc.unesco.org/ark:/48223/pf0000187026
[Nerbonne et al., 2021] Nerbonne, John, W.J. Heeringa, Jelena Prokić & Mar-
tijn Wieling. 2021. Dialectology for computational linguists. In Zampieri,
M. & P. Nakov (eds.), Similar languages, varieties, and dialects: A compu-
tational perspective, Studies in Natural language processing, 96-118.
[Pellegrini, 1977] Pellegrini, Giovan Battista. 1977. Carta dei dialetti d’Italia.
Pisa: Pacini.

[Pickl, 2014] Pickl, Simon, Aaron Spettl, Simon Pröll, Stephen Elspass, Werner
König, & Volker Schmidt. 2014. Linguistic Distances in Dialectometric In-
tensity Estimation. Journal of Linguistic Geography 2(1). 25-40.
[Pröll et al., 2014] Pröll, Simon, Pickl, Simon, & Spettl, Aaron. 2014. Latente
Strukturen in geolinguistischen Korpora. In Michael Elmentaler, Markus
Hundt, & Jürgen Erich Schmidt (eds.), Deutsche Dialekte. Konzepte,
Probleme, Handlungsfelder. Akten des, 4., vol. 4, 247–58. Stuttgart: Steiner.
[Romano et al., 2022] Romano, Noemi, P. Ranacher, S. Bachmann & Stéphane
Joost. 2022. Linguistic traits as heritable units? Spatial Bayesian cluster-
ing reveals Swiss German dialect regions. Journal of Linguistic Geography
10(1). 11-22.
[Russel & Norvig, 2020] Russell, Stuart & Peter Norvig. 2020. Artificial
Intelligence: A Modern Approach. Englewood Cliffs, NJ: Prentice Hall.
[Séguy, 1973] Séguy, Jean. 1973. La dialectométrie dans l’Atlas linguistique de
la Gascogne. France: Société de linguistique romane.
[Syrjanen et al., 2016] Syrjänen, Kaj, Terhi Honkola, Jyri Lehtinen, Antti Leino
& Outi Vesakoski. 2016. Applying Population genetic approaches within
languages. Language dynamics and change 6. 235-283.

[Valls et al., 2012] Valls, Esteve, Nerbonne, John, Prokić, Jelena, Wieling,
Martijn, Clua, Esteve, & Lloret, Maria-Rosa. 2012. Applying the Levenshtein
Distance to Catalan dialects: A brief comparison of two dialectometric
approaches. Verba: anuario galego de filoloxia 39. 35–61.
[Wieling & Nerbonne, 2011] Wieling, Martijn & John Nerbonne. 2011. Bipar-
tite spectral graph partitioning for clustering dialect varieties and detecting
their linguistic features. Computer Speech and Language 25. 700-715.
[Wieling et al., 2013] Wieling, Martijn, R.G. Shackleton & John Nerbonne.
2013. Analyzing phonetic variation in the traditional English dialects: Si-
multaneously clustering dialects and phonetic features. Literary and Lin-
guistic Computing 28(1). 31-41.
[Wieling et al., 2014] Wieling, Martijn, J. Bloem, K. Mignella,
M. Timmermeister, & John Nerbonne. 2014. Validating and using the PMI-based Leven-
shtein distance as a measure of foreign accent strength. Poster presented
at Methods in Dialectology XV, Groningen (The Netherlands).

Selected Sources
Specific Varieties
1. Abete, Giovanni. 2006. Sulla questione della sillaba superpesante: I dit-
tonghi discendenti in sillaba chiusa nel dialetto di Pozzuoli. In Savy, R. &
C. Crocco (eds.), Atti del II Convegno Nazionale dell’Associazione Italiana
di Scienze della Voce (AISV), 379-398. Torriana: EDK.
2. Abete, Giovanni. 2020. Nuove acquisizioni sul vocalismo marginale: il
dialetto di Calitri (AV). L’Italia dialettale 81. 311-339.
3. Avolio, Francesco & Antonio Romano. 2010. Ai margini dell’area Lausberg:
le varietà di Aliano e Alianello nei risultati di un’indagine dialettologica
e fonetica. In Iliescu, M., H. Siller-Runggaldier & P. Danler (eds.),
Actes du XXVe Congrès International de Linguistique et de Philologie
Romanes, vol. 4, 25-36. Berlin: De Gruyter.
4. Ceppaglia, Marco & Antonio Romano. 2018. Il dialetto di Martina Franca
da G. Grassi a G.G. Marangi: analisi fonetica descrittiva del vocalismo.
L’idomeneo 25. 241-250.
5. Colonna, Valentina & Antonio Romano. 2018. La variazione diatopica
nel micro-spazio dialettale leccese: il dialetto salentino delle frazioni di
Vernole. In Caramuscio, G. & A. Romano (eds.), Una d’arme, di lingua,
d’altare, di memorie, di sangue, di cor: omaggio a Luciano Graziuso.
Lecce: Grifo.

6. Costagliola, Angelica. 2007. Il vocalismo tonico di Lecce: analisi acustica
di un campione di parlanti differenziati per sesso ed età. Atti del I
Congresso Nazionale dell’Associazione Italiana di Scienze della Voce (AISV),
567-596.
7. D’Alessandro, Roberta & Marc van Oostendorp. 2014. La metafonia
fra fonologia e lessico: Il caso dell’ariellese. In Pescarini, Diego & Diana
Passino (eds.), Studi sui dialetti dell’Abruzzo. Quaderni di lavoro ASIT
17, 1-18.
8. D’Andrea, Federica, Carmela Lavecchia, Francesca V. Russo, Carminella
Scarfiello, Anna M. Tesoro & Francesco Villone. 2016. I dialetti: patrimoni
culturali locali nella lingua (rivista di progetti dottorali in corso).
Ianua. Revista Philologica Romanica 17. 133-168.
9. De Iacovo, Valentina. 2018. Il dialetto di Leporano (TA): un confronto tra
un’inchiesta dialettale recente e l’inchiesta della Carta dei Dialetti Italiani.
L’idomeneo 25. 215-222.

10. Ferrari-Bridgers, Franca. 2010. The Ripano dialect: Toward the end of
a mysterious linguistic island in the heart of Italy. In Millar, Robert Mc-
Coll (ed.). Marginal Dialects: Scotland, Ireland and Beyond. Aberdeen:
Forum for Research on the Languages of Scotland and Ireland. 106-130.
11. Gaglia, Sascha. 2007. Metaphonie im kampanischen Dialekt von Piedi-
monte Matese. Ph.D. dissertation. Konstanz: Universität Konstanz.
12. Hončová, Marketa. 2011. La scelta del verbo ausiliare nei dialetti di
Corropoli e Nereto. Ph.D. dissertation. Praha: Univerzita Karlova.
13. Iannacito, Roberta. 2002. L’assimilazione progressiva nel dialetto molisano
di Villa San Michele (IS). Italica 79(4). 509-524.
14. Iannacito-Provenzano, Roberta. 2005. Cenni sulla frase ipotetica in due
dialetti dell’Alto Molise. Forum Italicum: A Journal of Italian Studies
39(2). 498-519.
15. Idone, Alice & Giuseppina Silvestri. 2018. Verbicarese. Zurich:
Universität Zürich. Retrieved from http://www.dai.uzh.ch/new/#/public/overviews
16. Loporcaro, Michele & Dafne Pedrazzoli. 2016. Classi flessive del nome e
genere grammaticale nel dialetto di Agnone (Isernia). Revue de Linguis-
tique Romane (RLiR) 80(317). 73-100.

17. Lupini, Carmelo. 2004. Frangimenti e turbamenti di “a” tonica nella
Calabria centro-settentrionale: il dialetto di Acri. Atti del convegno su
Padula, la Calabria, l’Italia: nuovi orizzonti della ricerca paduliana (Roc-
cella Jonica, Italy, March).

18. Maggiore, Marco & Angelo Variano. 2015. Differenziazione vocalica per
posizione e differenziazione fonetica su base sessuale nella varietà di Zap-
poneta (FG). L’Italia dialettale 76. 83-104.
19. Mancarella, Giovan Battista. 2018. L’attività contadina narrata nel
dialetto di Sava. L’idomeneo 25. 133-146.
20. Paciaroni, Tania & Michele Loporcaro. 2007. Funzioni morfologiche
dell’opposizione fra -u e -o nei dialetti del maceratese. In Iliescu, M.,
H. Siller-Runggaldier & P. Danler (eds.), Actes du XXVe Congrès Inter-
national de Linguistique et de Philologie Romanes, 497-506. Innsbruck
(Austria).
21. Paciaroni, Tania. 2009. Coarticolazione e mutamento: una ricerca sul vo-
calismo atono finale nell’entroterra maceratese. In Schmid, S., M. Schwarzen-
bach & D. Studer (eds.), La dimensione temporale del parlato: Atti del 5°
convegno nazionale AISV, 177-194. Zürich: Phonetisches Laboratorium.
22. Paciaroni, Tania. 2012. Dialecte et italien standard à Macerata: du côté
du locuteur. In Léonard, J.L. & K.J. Avilés Gonzáles (eds.),
Documentation et revitalisation des “langues en danger”: épistémologie et praxis.
Paris: Houdiard. 327-369.
23. Passino, Diana & Diego Pescarini. 2018. Il sistema vocalico del dialetto
alto-meridionale di San Valentino in Abruzzo Citeriore con particolare
riferimento agli esiti di Ū. In Antonelli, Roberto, Martin Glessgen & Paul
Videsott (eds.), Proc. of the XXVIII Congresso internazionale di linguistica
e filologia romanza, 484-497. Roma.
24. Patrascu, Francesca-Alessandra. 2019. Dialetto e italiano regionale ad
Assisi. Mag. Phil. Dissertation. Vienna: Universität Wien.
25. Radtke, Edgar. 1997. Tortorella: eine bislang unbekannte galloitalienische
Sprachkolonie im Cilento. Zeitschrift für romanische Philologie 113(1).
82-108.
26. Romano, Antonio. 2010. Norma e variazione nel dialetto salentino di
Parabita. In Spedicato, M. (ed.), Scritti in memoria di Oronzo Parlangeli
a 40 anni dalla scomparsa (1969-2009), 237-268. Galatina: EdiPan.
27. Romano, Antonio. 2013. Il vocalismo del dialetto salentino di Galatone:
differenze d’apertura metafonetiche, tracce isolate di romanzo comune e
interferenze diasistematiche. In Romano, A. & M. Spedicato (eds.), Sub
voce Sallentinitas: Studi in onore di G.B. Mancarella, 247-276. Lecce:
Grifo.
28. Russo, Michela. 2010. Le origini della dittongazione spontanea nei dialetti
italiani meridionali dell’ovest (Ischia e Pozzuoli): Isocronia diacronica an-
tischürriana e quantificazioni isocroniche attuali. Zeitschrift für Romanis-
che Philologie 126(2). 304-349.

29. Schirru, Giancarlo. 2016. Propagginazione e flessione nominale in al-
cuni dialetti italiani centro-meridionali. Atti del Sodalizio Glottologico
Milanese 8-9. 121-130.
30. Sornicola, Rosanna. 1999. Alcune recenti ricerche sul parlato: le di-
namiche vocaliche di € nell’area flegrea e le loro implicazioni per una
teoria della variazione. In Dardano, M., A. Pelo & A. Stefinlongo (eds.),
Atti del colloquio internazionale di studi Scritto e parlato: metodi, testi e
contesti, 239-264. Roma.
31. Sornicola, Rosanna. 1999. La variazione dialettale nell’area costiera napo-
letana: il Progetto di un Archivio di testi dialettali parlati. In Marcato,
G. (ed.), Dialetti oggi, Atti del convegno Tra lingua, cultura, società: di-
alettologia sociologica (Sappada-Plodn 1-4 luglio 1999), 103-122. Padua:
Unipress.
32. Villone, Francesco. 2018. Tra micro e macro aree: elementi linguistici
ed extra-linguistici nel case study Avigliano alla luce dei dati dell’Atlante
Linguistico della Basilicata (A.L.Ba). In Sampino, G. & F. Scaglione (eds.),
Saperi umanistici nella contemporaneità, Atti del convegno internazionale
dei dottorandi (Palermo, Italy, 17-18 Sept. 2015), 161-177. Palermo:
Palumbo.
33. Vitolo, Giuseppe. 2005. L’ausiliare nei dialetti di Salerno, Cetara, Cas-
tiglione dei Genovesi e Salitto. Quaderni del Dipartimento di Linguistica
15, 238-254. Firenze: Università di Firenze.
34. Vitolo, Giuseppe. 2017. Fenomeni fonetici e morfo-sintattici del dialetto
campano di Pagani. Quaderni di Linguistica e Studi Orientali 3. 219-241.

Larger Areas
1. Abete, Giovanni. 2011. I processi di dittongazione nei dialetti dell’Italia
meridionale: un approccio sperimentale. Roma: Aracne.
2. Abete, Giovanni. 2017. Parole e cose della pastorizia in Alta Irpinia.
Napoli: Giannini.
3. Avolio, Francesco. 1995. Note sulla variabilità linguistica nell’Appennino
abruzzese. Nouvelles du Centre d’Etudes Francoprovençales René Willien,
Mélanges en souvenir de Marco Perron 31. 91-105.

4. Avolio, Francesco. 2010. I dialetti dell’area cassinese e dell’odierno basso
Lazio: alcune considerazioni. Quaderni Coldragonesi 1. 27-36.
5. Balducci, Sanzio. 1987. I dialetti. In Anselmi, S. (ed.), La provincia di
Ancona: storia di un territorio, 273-284. Bari: Laterza.

6. Cangemi, Francesco, Rachele Delucchi, Michele Loporcaro & Stephan
Schmid. 2010. Vocalismo finale atono “toscano” nei dialetti del Vallo
di Diano (Salerno). In Cutugno, F., P. Maturi, R. Savy, G. Abete, &
I. Alfano (eds.), Parlare con le persone, parlare alle macchine, 477-490.
Torriana: EDK.

7. Capotosto, Silvia. 2011. La palatalizzazione di -LL- e -L- nel quadro
linguistico mediano. Contributi di filologia dell’Italia mediana 25.
275-300.
8. Carosella, Maria. 1999. La metafonesi nei dialetti garganici nord-occidentali.
Quaderni del Dipartimento di Linguistica, Università di Firenze 9. 97-138.

9. Castagna, Raffaele (ed.). 2006. I dialetti d’Ischia nella tesi di laurea di
Ilse Freund elaborata dopo un soggiorno a Serrara Fontana (1929). La
rassegna d’Ischia 1.
10. Cimarra, Luigi & Francesco Petroselli. 2001. Proverbi e detti proverbiali
della Tuscia viterbese. Viterbo: Gruppo interdisciplinare per lo studio
della cultura tradizionale dell’Alto Lazio.
11. Corsi, Anna, Valentina Cardinale & Vincenzo Luciani. 2014. Dialetto e
poesia nei 33 comuni della provincia di Latina. Roma: Cofine.
12. Di Carlo, Miriam. 2015. Né toscani né romani: per una caratterizzazione
dei dialetti dell’area viterbese. In Marcato, G. (ed.), Dialetto parlato,
scritto, trasmesso, 345-352. Padua: CLEUP.
13. Egidi, Francesco. 1965. Dizionario dei dialetti piceni fra Tronto e Aso.
Montefiore dell’Aso.

14. Garrapa, Luigia. Vocali maschili e femminili fra Salento centrale e Salento
meridionale: problemi sincronici per un’analisi diacronica. Atti del I
Congresso Nazionale dell’Associazione Italiana di Scienze della Voce (Padova,
Italy, 2-4 Dec. 2004), 651-669.
15. Germani, Alfonso. 2014. Il tipo dialettale del Lazio meridionale: alcuni
fenomeni linguistici caratteristici. In Lucrările celui de-al XV-lea
Simpozion Internațional de Dialectologie, 135-165. Cluj-Napoca: Argonaut,
Scriptor.
16. Granatiero, Francesco. 2012. Vocabolario dei dialetti garganici. Foggia:
Grenzi.

17. Grimaldi, Mirko. 2001. Ancora sulla questione del vocalismo siciliano alla
luce di processi metafonetici scoperti nel Salento meridionale. In Quaderni
del Dipartimento di Linguistica 11, 69-105. Firenze: Università di Firenze.

18. Loporcaro, Michele. 2009. Opposizioni di caso nel pronome personale: i
dialetti del mezzogiorno in prospettiva romanza. In De Angelis, A. (ed.),
I dialetti italiani meridionali tra arcaismo e interferenza, Atti del Con-
vegno internazionale di Dialettologia (Messina, 4-6 June 2008), 207-235.
Palermo.
19. Maiden, Martin. 1987. New perspectives on the genesis of Italian metaphony.
Transactions of the Philological Society 85(1). 38-73.
20. Mancarella, Giovan Battista. 2015. Dialetti salentini. L’idomeneo 19.
147-156.
21. Melillo, Giacomo. 1926. I dialetti del Gargano. Pisa: Simoncini.
22. Memoli, Giovanna. Il patrimonio linguistico del Vallo di Diano. In Aro-
mando, G. (ed.), Per angusta ad augusta 1(2). 110-132.
23. Paggini, Valentina & Silvia Calamai. 2016. L’anafonesi in Toscana: il
contributo degli archivi sonori del passato. Atti del Convegno Associazione
Italiana Scienze della Voce, 155-168. Milano: Officinaventuno.
24. Paternostro, Luigi. 2019. Gli alti Bruzi e il loro linguaggio. Firenze:
Phasar.
25. Retaro, Valentina & Giovanni Abete. 2016. Sull’importanza delle aree
intermedie: I dialetti del Vallo di Lauro. In Antonelli, R., M. Glessgen &
P. Videsott (eds.), Atti del XXVIII Congresso internazionale di linguistica
e filologia romanza vol. 2, 957-968. Roma.
26. Romano, Antonio. 2015. Una selezione di carte linguistiche del Salento.
L’idomeneo 19. 43-56.
27. Romito, Luciano, Tiziana Turano, Michele Loporcaro & Antonio Mendi-
cino. 1996. Micro e macrofenomeni di centralizzazione nella variazione
diafisica: rilevanza dei dati fonetico-acustici per il quadro dialettologico
calabrese. Fonetica e fonologia degli stili dell’italiano parlato. 14-15.
28. Romito, Luciano, Vincenzo Galatà, Rosita Lio, Francesca Stillo. 2006. La
metafonia nei dialetti dell’area Lausberg: un’introspezione sulla natura
della sillaba. In Savy, R. & C. Crocco (eds.), Analisi prosodica: teorie,
modelli e sistemi di annotazione. Torriana: EDK.
29. Romito, Luciano & Daniela Gagliardi. 2007. La metafonia in alcuni centri
del Nord Calabria: verso una mappa regionale. In Romito, L., V. Galatà &
R. Lio (eds.), Atti del IV Convegno Nazionale dell’Associazione Italiana
di Scienze della Voce (Arcavacata di Rende, Italy, 30-5 Dec.), 423-436.
Torriana: EDK.
30. Russo, Michela. 2002. Metafonesi opaca e differenziazione vocalica nei
dialetti della Campania. Zeitschrift für romanische Philologie 118(2). 195-
223.

31. Savoia, Leonardo M. & Benedetta Baldi. 2016. Propagation and preserva-
tion of rounded back vowels in Lucanian and Apulian varieties. Quaderni
di Linguistica e Studi Orientali / Working Papers in Linguistics and Ori-
ental Studies 2. 11-58.
32. Trumper, John B. La valle del Savuto e la catena paolana: alcune osser-
vazioni storico-linguistiche, anche sulla ‘presenza longobarda’. Retrieved
from https://lingcal.wordpress.com/letture-varie/
33. Vecchia, Cesarina. 2017. La variazione fonetica degli esiti di -LL- in
Irpinia: Processi di rotacizzazione e di retroflessione nelle varietà dell’alta
valle del Calore. Ph.D. dissertation. Napoli: Università Federico II.

34. Vitolo, Giuseppe. 2012. Parlate campane: la selezione dell’ausiliare e il
sistema clitico. Roma: Aracne.

Comprehensive Monographs
1. Avolio, Francesco. 1995. Bommesprə: profilo linguistico dell’Italia
centro-meridionale. San Severo: Gerni.
2. Giammarco, Ernesto. 1968-1979. Dizionario abruzzese e molisano. Roma:
Ateneo.

3. Loporcaro, Michele. 2013. Profilo linguistico dei dialetti italiani. Bari:
Laterza.
4. Loporcaro, Michele. 2021. La Puglia e il Salento. Bologna: Il Mulino.
5. Loporcaro, Michele. 2016. Metaphony and diphthongization in Southern
Italy: Reconstructive implications for sound change in early Romance. In
Torres-Tamarit, F., K. Linke & M. van Oostendorp (eds.). Approaches to
metaphony in the languages of Italy, 55-87. Berlin: De Gruyter.
6. Moretti, Giovanni. 1975. Profilo dei dialetti italiani: Umbria. Pisa:
Pacini.

7. Piemontese, Pasquale (ed.). 1982. La parabola del figliuol prodigo nei
dialetti italiani: I dialetti del Molise. Bari: Università di Bari.
8. Rohlfs, Gerhard. 1966. Grammatica storica della lingua italiana e dei suoi
dialetti, vol. 1 Fonetica. Torino: Einaudi.
9. Vignuzzi, Ugo. 1994. Il volgare nell’Italia mediana. In Serianni, Luca
& Pietro Trifone (eds.), Storia della lingua italiana 3, 329-372. Torino:
Einaudi.

Linguistic Atlases
1. Vivaio acustico delle Lingue e dei Dialetti d’Italia (VIVALDI). 1998-2018.
Retrieved from https://www2.hu-berlin.de/vivaldi/index.php
2. Archivio di parlato – La tramontana e il sole. 2017. Torino: Laboratorio
fonetica sperimentale Arturo Genre (LFSAG). Retrieved from
https://www.lfsag.unito.it/ark/table_ita.html
3. Microcontact. Utrecht: Utrecht University. Retrieved from
https://microcontact.hum.uu.nl

No ratings yet
Plant Health Monitoring Using Digital Image Processing: By: Sivapriya.G
12 pages
K Means Clustering Project Updated Cleaned
No ratings yet
K Means Clustering Project Updated Cleaned
3 pages
Naive Bayes Algorithm
No ratings yet
Naive Bayes Algorithm
46 pages
Chapter 5
No ratings yet
Chapter 5
43 pages
Le 4
No ratings yet
Le 4
12 pages
Data Analysis and Data Science Task - 3
No ratings yet
Data Analysis and Data Science Task - 3
3 pages
An Unsupervised Machine Learning Algorithms - Comprehensive Review
No ratings yet
An Unsupervised Machine Learning Algorithms - Comprehensive Review
12 pages
CH 03 Regression Techniques
No ratings yet
CH 03 Regression Techniques
74 pages
Crime Rate Prediction Using Machine Learning and Data Mining
No ratings yet
Crime Rate Prediction Using Machine Learning and Data Mining
12 pages
Solution For DWDM Problems
No ratings yet
Solution For DWDM Problems
24 pages
Sathyabama: Register Number
No ratings yet
Sathyabama: Register Number
2 pages
Machine Learning For Microbiology
No ratings yet
Machine Learning For Microbiology
15 pages
Aiml QB With Ans - 075736
No ratings yet
Aiml QB With Ans - 075736
69 pages
Final2021 Evenning Sol
No ratings yet
Final2021 Evenning Sol
12 pages
Data Summarization A Survey - 0
No ratings yet
Data Summarization A Survey - 0
25 pages
Guidelines For Preparing The Presentation Slides For Intro To Data Science and AI Project
No ratings yet
Guidelines For Preparing The Presentation Slides For Intro To Data Science and AI Project
2 pages
Study On Holographic ImageRecognition Technology of Zooplankton
No ratings yet
Study On Holographic ImageRecognition Technology of Zooplankton
16 pages
Consensus Cluster Plus
No ratings yet
Consensus Cluster Plus
12 pages
Unit 3 Clustering
No ratings yet
Unit 3 Clustering
28 pages
Section 7.7 Transformation Hellinger
No ratings yet
Section 7.7 Transformation Hellinger
7 pages
04 LEC Data Science Kmeans
No ratings yet
04 LEC Data Science Kmeans
26 pages
Efficient Data Clustering With Link Approach
No ratings yet
Efficient Data Clustering With Link Approach
8 pages


