15

1965 International Conference on Computational Linguistics

MODELS OF LEXICAL DECAY
D. Kleinecke
TECHNICAL MILITARY PLANNING OPERATION
GENERAL ELECTRIC COMPANY
735 State Street (P.O. Drawer QQ)
SANTA BARBARA, CALIFORNIA 93102

Abstract
Lexical decay is the phenomenon underlying the dating techniques known as "glottochronology" and "lexicostatistics." Much of the controversial nature of work in this field is the result of extremely imprecise foundations and lack of attention to the underlying statistical and semantic models.

A satisfactory semantic model can be found in the concept of semantic atom. Notwithstanding a number of philosophical objections, the semantic atom is an operationally feasible support for a lexicon which is a semantic subset of all possible meanings and, at the same time, exhausts the vocabulary of a language. Lexical decay is the process by which the lexical item covering an atom is replaced by another lexical item.

Exponential lexical preservation is, in this model, directly analogous to decay phenomena in nuclear physics. Consistency requires that the decay process involved in exponentially preserved vocabularies be a Poisson process. This shows how to form test vocabularies for dating and proves that presently used vocabularies are not correctly formed.

Dialectation studies show that historically diverging populations must be modelled by correlated Poisson processes. Definitive statistical treatment of these questions is not possible at this time, but much desirable research can be indicated.
Introduction
This paper is an attempt to establish the method of dating by lexical decay upon an adequate theoretical foundation. The method discussed is that invented by Swadesh (1) over a decade ago and usually known as glottochronology or lexicostatistics. In the intervening years it has been widely applied, but often to the accompaniment of much confusion and controversy. It seems that much of the confusion can be removed by a rigorous treatment of the phenomenological model and careful application of statistics. The controversy can be removed only by the completion of a sufficient number of supporting studies. Rigorous formulation permits us to pinpoint what studies are needed and what conclusions are being sought.

Granting (as not everyone seems willing to do) that the basic fact of "uniform" lexical decay occurs, the problem to be attacked is that of correctly formulating models for lexical decay and of correctly deriving statistical consequences from these models. In what follows, we will construct a set of models which seem to fit the needs of the method of dating by lexical decay. Our approach is strictly pragmatic, that is, we construct the model we need without concerning ourselves about its a priori reasonableness. Later we try to assemble some arguments which justify the model. In no sense is this an approach from first principles.
The analogy between lexical decay and the decay phenomena of nuclear physics has been often noted and dismissed. In the present paper, we insist that this analogy is much more than an analogy; it is, on the first level, an identity. The only alternative to this hypothesis seems to be a kind of mystic faith that the decay occurs but without palpable manipulable principles. The burden of the proof that the identity is false lies with the doubter and we will make no further demonstration of its validity.
Decay phenomena in nuclear physics are governed by relatively simple, well understood principles. To apply these results to lexical decay we first establish the concepts of a semantic atom and a set of independent semantic atoms. The observed fact of exponential decay of vocabulary then is accounted for by assuming that the lexical item covering an atom decays according to a Poisson process. Generally speaking, the converse of this is also true, and only a Poisson process would produce exponential decay. From these considerations, we can draw many conclusions about how to and how not to construct test vocabularies for dating purposes.

With this model in hand, we can draw conclusions of a statistical nature. For example, we can develop formulas for the proper method of dating the split between three or more languages and for good estimators in more complex situations.

We can construct an imprecise heuristic model for the dynamic semantics underlying the Poisson process. So long as the first order theory is adequate, this is much in the nature of a curiosity. It seems, however, that first order theory is not adequate. Actually, such a conclusion is really premature because the kind of verification studies needed have not been made. Assuming the pessimistic conclusion, we have to construct second (or higher) order theories to account for the inadequacies of first order theory. At the moment, we have no useful results in this direction--the problem merges into the problem of dialectation. Probably the most important service we can render is to indicate exactly what kind of detailed studies are needed.
Semantic Atoms
It is very easy to raise objections of a philosophical nature to the concept of a semantic atom. In this paper we will simply ignore these objections and define semantic atom in an operational way. There are also operational difficulties, but these seem to be surmountable.

A semantic atom is a completely defined unit concept sufficiently specified to remove all ambiguity. For example, in an anthropological context, we might have "sun, as pointed at by a male anthropologist at high noon in the middle of summer on an average day among a group of young men with plenty to drink". The kind of subtleties needed to complete the definition reminds one of Korzybskian General Semantics, but the intention is not the same. We seek to remove ambiguity but we must have a non-unique concept--one that is always present.
Certainly there has been little use of semantic atoms anywhere in the past. Those interested in semantics for its own self will reject them as useless or meaningless; lexicographers deal in more generalized concepts. It would be hard to argue that they have general utility, but they are precisely what is needed for studying lexical decay.

Each semantic atom, in any speech at any time, is assumed to be covered by some lexical item. That is, there is some word whose meaning includes that of the atom. Thus, vocabularies can be formed over any set of semantic atoms by listing the covering lexical item for each atom. The kind of decay being studied is that where the covering lexical item is replaced by another item. The replaced word only rarely immediately disappears from the language as a whole, but it has disappeared from the semantic atom.
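The covering relation can be pictured as a simple table. The following is a minimal sketch, assuming hypothetical atom labels and two invented vocabulary snapshots (none of these names come from the paper): a vocabulary maps each semantic atom to its covering lexical item, and a decay is tallied wherever the cover differs between two snapshots taken over the same atoms.

```python
# Hypothetical illustration: a vocabulary maps each semantic atom
# (here just a label) to the lexical item that covers it.

def decayed_atoms(earlier, later):
    """Return the atoms whose covering lexical item differs between
    two vocabularies formed over the same set of semantic atoms."""
    if earlier.keys() != later.keys():
        raise ValueError("both vocabularies must cover the same atoms")
    return sorted(atom for atom in earlier if earlier[atom] != later[atom])

# Invented snapshots of one language at two dates (illustrative only).
older = {"atom_dog": "hound", "atom_sun": "sun", "atom_water": "water"}
newer = {"atom_dog": "dog", "atom_sun": "sun", "atom_water": "water"}

print(decayed_atoms(older, newer))  # ['atom_dog']
```

Note that the replaced word may well survive elsewhere in the language; the sketch only records that it no longer covers the atom in question.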
An independent set of semantic atoms is a set of atoms all of which differ among themselves enough to make the decay at any atom completely independent of that at any other atom. Thus, only one from sets of words, like numerals or pronouns, with habitual interrelations can appear in the set. Independent sets are useful because in them, problems of inter-atom correlations need not be considered.
Before passing on, we should say a few words as to the practical use of semantic atoms. There does not seem to be any doubt that the collectors of vocabularies want to work with semantic atoms--even if their results are completely unsuccessful. In an entry "dog = hund" they would like to say that there is a semantic atom and its cover in English is "dog", in German, "hund". The pitfalls of this sort of thing are well-known. Some care in defining atoms might make it feasible if we require not complete identity of the English and German semantics, but rather the existence of some concrete concept where both the English and German words are appropriate. Clearly this much weaker requirement will be easier to satisfy, so we adopt it.

We conclude that, with adequate precautions, semantic atoms can be operationally feasible even if true rigor is impossible. In the case of little-known languages, there is much more chance for error. We should encourage collectors of vocabularies to improve the precision of their definitions so that the atom in question can be identified.
Decay Process
We assume that lexical decay, for a set of independent semantic atoms, is a Poisson process. That is, it satisfies three conditions:
1. Each atom decays independently of all the other atoms.
2. Each atom decays independently of its history of earlier decay.
3. There is a constant $\lambda$ such that for each atom the probability of one decay in a short time interval $\Delta t$ is $\lambda \Delta t$, and the probability of more than one decay is negligible.

It is rather easy to deduce that for longer time intervals $t$, the probability of not decaying is $\exp(-\lambda t)$, and if there are $N$ atoms, the expected number of undecayed atoms after time $t$ is $N \exp(-\lambda t)$.
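The deduction can be checked numerically. The sketch below is an illustration only, not part of the paper: it splits a span t into many short intervals, gives each an independent decay probability of lambda times the interval length (the third condition), and compares the resulting survival probability with the exponential. The decay constant 1/5000 per year is the figure quoted later in the paper; the 1000-year span and the number of steps are arbitrary assumptions.

```python
import math

def survival_by_small_steps(lam, t, steps):
    """Probability of no decay in time t when each short interval
    dt = t/steps independently carries decay probability lam*dt."""
    dt = t / steps
    return (1.0 - lam * dt) ** steps

lam = 1.0 / 5000.0   # decay constant per year (figure quoted in the paper)
t = 1000.0           # years; an arbitrary illustrative span
approx = survival_by_small_steps(lam, t, steps=100_000)
exact = math.exp(-lam * t)
print(approx, exact)  # the two values agree closely
```

As the number of steps grows, the product of the per-interval survival probabilities converges to the exponential form.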
This formula is the usual formula for lexical decay. It should be pointed out that it was tested, statistically, in the first publication by Swadesh, and it failed to pass. The difficulty is probably due to the word list used, which is not an independent set of atoms.

If we examine the assumptions made so far, we see that any list of semantic atoms can be used if they are: (1) independent; and (2) assured of existence throughout the time in question. There is no satisfactory a priori basis for assuming that some kinds of semantic atoms decay at different rates than other kinds, and it is doubtful if enough historical evidence can be collected to make such a conclusion statistically significant.
The question whether $\lambda$ is a universal constant, a constant within any one language but possibly differing between languages, or a variable, is easier to discuss. So far, indications are that $\lambda$ is about equal to 1/5000 per year. Now this means that over the span of most historic evidence, $\exp(-\lambda t)$ will be greater than about 0.60. There is a great deal of scatter to be expected in the results because $N \exp(-\lambda t)$ is an expectation, not an exact prediction.
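To give a feeling for the size of this scatter, the following sketch tabulates the expected number of surviving covers together with a standard deviation. It assumes, beyond what the text states explicitly, that the number of survivors among N independent atoms is binomially distributed; the list size of 100 and the time spans are arbitrary illustrative choices.

```python
import math

def retention_summary(n_atoms, lam, t):
    """Expected number of surviving covers and its binomial standard
    deviation, assuming each atom survives independently with
    probability exp(-lam*t)."""
    p = math.exp(-lam * t)
    mean = n_atoms * p
    sd = math.sqrt(n_atoms * p * (1.0 - p))
    return p, mean, sd

lam = 1.0 / 5000.0  # decay constant per year
for years in (500, 1000, 2000, 2500):
    p, mean, sd = retention_summary(100, lam, years)
    print(f"{years:5d} yr  p={p:.3f}  expected={mean:.1f}  sd={sd:.1f}")
```

Under these assumptions a list of 100 atoms carries a standard deviation of several atoms, which is not small compared with the differences one hopes to date.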
There have been a number of studies of the exponent of exponential decay. All of them are too superficial to be conclusive (2). An adequate study in any one language would have to meet several criteria which make it into a major research effort. A set of independent semantic atoms must be selected--selected prior to detailed study--and no atoms, however difficult, dropped without complete explanations (3). Then the history of each atom must be traced through the historical record to locate the lexical item covering the atom at each point in time. In reporting the study, all of this should be fully documented in detail. Each instance of decay can then be recognized and tallied. Statistical tests should be applied to see whether or not the model is satisfied and to estimate $\lambda$. For example, if there are 100 semantic items, there should be about one decay every 50 years uniformly spread through time. These things can be checked statistically. We hope that scholars will undertake definitive studies of this type for as many cases as possible (4).
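The arithmetic behind the 50-year figure, and one simple way of estimating the decay constant from a tally, can be sketched as follows. The estimator shown (decays divided by atom-years of observation) is the standard maximum likelihood estimate for a Poisson process and is offered here as an assumption, not as the paper's prescribed procedure; the tally of 18 decays is invented for illustration.

```python
def estimate_lambda(n_decays, n_atoms, span_years):
    """Maximum likelihood estimate of the decay constant for a Poisson
    process: observed decays divided by total atom-years of exposure."""
    return n_decays / (n_atoms * span_years)

def mean_years_between_decays(n_atoms, lam):
    """Expected waiting time between successive decays anywhere in a
    list of n_atoms independent atoms."""
    return 1.0 / (n_atoms * lam)

# 100 atoms at 1/5000 per year: one decay about every 50 years.
print(f"{mean_years_between_decays(100, 1.0 / 5000.0):.1f}")

# Hypothetical tally: 18 decays among 100 atoms traced over 1000 years.
print(estimate_lambda(18, 100, 1000))
```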
Until the results of the kind of research just mentioned are available, the status of $\lambda$ is unsure. We anticipate it will be recognized as a universal constant.

There remains the problem of making a Poisson process a reasonable assumption. In other words, we need to describe some sort of mechanism which makes words slip off semantic atoms independently of how long they have been covering the atom, and at a constant rate per unit time, at least over short time intervals. Incidentally, since $\lambda$ is on the order of 1/5000 per year, 50 years is a short time interval. Since the speakers of normal languages are not historians, the independence from history seems easy to accept.

The constant rate is harder to accept. First of all we have to account for an identical figure in populations, literate and illiterate, and between a handful of speakers and half a billion speakers. The decay effect must be independent of the number of speakers, hence it must be operative at the level of the single isolated speaker. This is satisfactory since, by and large, the amount of speech reaching an individual does not seem to have changed much throughout history and does not vary much between cultures at the present day.
But why does a speaker decide to change an occasional lexical item--about 1% in his lifetime--and maintain the rest? The only hypothesis we have been able to construct is that all words are always under pressure--perhaps from several semantic "directions" at the same time. Most atoms resist change most of the time, but some set of accidents (all very real events at the sociological and psychological levels, but random accidents in our context) weakens a few, and the lexicon decays. In other words, there is a constant dynamic movement among secondary and incidental covers of the semantic atom which threaten the principal cover. Usually the threatening lexical items recede, but occasionally, in a random way, about once every five thousand years, the principal cover is displaced and a lexical decay occurs.

The hypothetical mechanism advanced to explain lexical decay can be checked against history by case studies of semantic atoms. Each atom should show time periods when the principal word was nearly displaced. During these periods it is difficult to decide whether the old word or a new word is the principal cover. Usually the new word will pass away again, but sometimes it will displace the old word. A very tentative guess based on a casual examination of one hundred current English words suggests there are about four very heavily threatened words per hundred. Since we can expect about one word to be decaying at this moment, we conclude that about three out of four times the old word survives. All of this needs to be verified or disproven in detailed studies.
Decay Statistics
The statistical consequences of the model--the first order model described above--need to be explored. We cannot handle all possible situations, but the following examples should provide an adequate demonstration of technique so that any other problems which occur can be solved in the same manner.

First, let us consider $N$ languages deviating independently from a common parent which is not known to us. The following discussion is a bit more cumbersome than some alternative approaches, but it generalizes more easily.
Let $\alpha$ be any set of the $N$ languages and let $P(\alpha)$ be the probability that the given semantic atom is covered by the original lexical item in exactly the languages of set $\alpha$. New covering words are assumed to be different in each of the innovating languages. $P(\alpha)$ is a function of time and satisfies the following differential equation:
$$\frac{d}{dt}P(\alpha) = -\sum_{i \in \alpha} \lambda P(\alpha) + \sum_{j \notin \alpha} \lambda P(\alpha \oplus j)$$
where $i$ and $j$ are languages, $\in$ and $\notin$ mean "belongs to" and "does not belong to" respectively, and $\alpha \oplus j$ is the union of $\alpha$ and the set containing only the language $j$.
Let $|\alpha|$ denote the number of languages in $\alpha$. If $|\alpha| = N$, the equation is easy to solve:
$$P(\alpha) = [\exp(-\lambda t)]^{N}, \qquad |\alpha| = N.$$
If a few cases--$|\alpha| = N-1$, $|\alpha| = N-2$, etc.--are solved, we are led to hypothesize that
$$P(\alpha) = [\exp(-\lambda t)]^{|\alpha|}\,[1 - \exp(-\lambda t)]^{N-|\alpha|}.$$
This can be proven by induction on $|\alpha|$ from $|\alpha| = N$ downward since $|\alpha \oplus j| = |\alpha| + 1$. Then
$$\frac{dP(\alpha)}{dt} = -|\alpha|\,\lambda P(\alpha) + \lambda\,(N-|\alpha|)\,[\exp(-\lambda t)]^{|\alpha|+1}\,[1-\exp(-\lambda t)]^{N-|\alpha|-1}$$
so that
$$\frac{d}{dt}\Big(P(\alpha)\,[\exp(\lambda t)]^{|\alpha|}\Big) = (N-|\alpha|)\,\lambda \exp(-\lambda t)\,[1-\exp(-\lambda t)]^{N-|\alpha|-1},$$
$$P(\alpha)\,[\exp(\lambda t)]^{|\alpha|} = [1-\exp(-\lambda t)]^{N-|\alpha|},$$
and the hypothesis is proven by induction.
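The closed form can also be checked numerically. The sketch below is an illustration, not the paper's procedure: it relies on the symmetry that P depends only on the number n of languages in the set, so that the sum over the languages outside the set contributes (N - n) equal terms. It integrates the resulting chain of equations with a plain Euler scheme and compares the result with the hypothesized solution. The step count, N = 4 and the 2000-year span are arbitrary assumptions.

```python
import math

def integrate_first_order(n_langs, lam, t_end, steps=20000):
    """Euler-integrate dP_n/dt = -n*lam*P_n + (n_langs-n)*lam*P_{n+1},
    where P_n is P(alpha) for any one set alpha with n languages.
    Initially all languages carry the original item, so P_N = 1."""
    p = [0.0] * (n_langs + 1)
    p[n_langs] = 1.0
    dt = t_end / steps
    for _ in range(steps):
        new_p = p[:]
        for n in range(n_langs + 1):
            gain = (n_langs - n) * lam * p[n + 1] if n < n_langs else 0.0
            new_p[n] = p[n] + dt * (-n * lam * p[n] + gain)
        p = new_p
    return p

def closed_form(n_langs, lam, t, n):
    """Hypothesized solution x**n * (1-x)**(N-n) with x = exp(-lam*t)."""
    x = math.exp(-lam * t)
    return x ** n * (1.0 - x) ** (n_langs - n)

N, lam, t = 4, 1.0 / 5000.0, 2000.0
numeric = integrate_first_order(N, lam, t)
worst = max(abs(numeric[n] - closed_form(N, lam, t, n)) for n in range(N + 1))
print(worst)  # small discretisation error
```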
Thus, $P(\alpha)$ depends only on the value of $|\alpha| = n$. We can recognize $P(n)$ for $n = 2, 3, \ldots, N$ but $P(0)$ and $P(1)$ cannot be distinguished, so we combine these into $P'$ which is obtained by
$$P' = 1 - \frac{N(N-1)}{2}P(2) - \cdots - \frac{N!}{n!\,(N-n)!}P(n) - \cdots - P(N)$$
$$= 1 - \Big(\big[(1-\exp(-\lambda t)) + \exp(-\lambda t)\big]^{N} - P(0) - N P(1)\Big),$$
since there are $N!/(n!\,(N-n)!)$ sets with $|\alpha| = n$. Thus,
$$P' = [1 - \exp(-\lambda t)]^{N-1}\,\big(1 + (N-1)\exp(-\lambda t)\big).$$
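As a quick consistency check, offered as an illustration only, the sketch below evaluates P' both ways: by subtracting the recognizable cases from one, and by the closed form. The value 0.7 for exp(-lambda t) and the choices of N are arbitrary.

```python
import math

def p_prime_by_sum(n_langs, x):
    """P' = 1 - sum over n >= 2 of C(N, n) * P(n), with
    P(n) = x**n * (1-x)**(N-n) and x = exp(-lambda*t)."""
    total = sum(math.comb(n_langs, n) * x ** n * (1.0 - x) ** (n_langs - n)
                for n in range(2, n_langs + 1))
    return 1.0 - total

def p_prime_closed(n_langs, x):
    """Closed form (1-x)**(N-1) * (1 + (N-1)*x)."""
    return (1.0 - x) ** (n_langs - 1) * (1.0 + (n_langs - 1) * x)

for n_langs in (2, 3, 5, 8):
    x = 0.7  # arbitrary illustrative retention probability
    print(n_langs, p_prime_by_sum(n_langs, x), p_prime_closed(n_langs, x))
```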
Now suppose that from $K$ semantic atoms we observe that $k_N$ atoms are covered by the same word in all languages, and $k_{N-1}$ in all but one, and so on to $k_2$, and there are $k'$ atoms differently covered in all languages. The probability of this occurring is
$$P(N)^{k_N}\,P(N-1)^{k_{N-1}} \cdots P(2)^{k_2}\,P'^{\,k'} = x^{(2k_2 + 3k_3 + \cdots + N k_N)}\,(1-x)^{NK - k' - (2k_2 + \cdots + N k_N)}\,\big(1 + (N-1)x\big)^{k'}$$
where $x = \exp(-\lambda t)$. A maximum likelihood estimate for $x$ seems to be the best single value we can assign to $x$. This is obtained by setting the (logarithmic) derivative of the probability to zero so that
$$0 = \frac{A - k'}{x} - \frac{NK - A}{1 - x} + \frac{(N-1)\,k'}{1 + (N-1)x}$$
where $A = k' + 2k_2 + 3k_3 + \cdots + N k_N$. Or
$$N(N-1)K x^2 - \big((N-1)A - NK + k'\big)\,x - (A - k') = 0.$$
If $N = 2$, $x^2 = k_2/K$, which is the well-known formula for the separation between two languages. For general $N$, $x$ is the solution of the quadratic equation given above. Note that the answer depends on the statistic $A$ which does not usually appear in discussions of lexical dating.
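The estimate is easy to compute. The following is a minimal sketch under stated assumptions: the function and argument names are hypothetical, the counts are invented, and the conversion to years uses the decay constant of 1/5000 per year quoted earlier. It solves the quadratic for x and takes the non-negative root.

```python
import math

def mle_retention(counts, k_prime, n_langs):
    """Solve N(N-1)K x^2 - ((N-1)A - NK + k')x - (A - k') = 0 for x.

    counts maps n (2..N) to k_n, the number of atoms whose cover is
    shared by exactly n languages; k_prime is the number of atoms with
    no shared cover."""
    big_k = sum(counts.values()) + k_prime
    big_a = k_prime + sum(n * k for n, k in counts.items())
    a = n_langs * (n_langs - 1) * big_k
    b = -((n_langs - 1) * big_a - n_langs * big_k + k_prime)
    c = -(big_a - k_prime)
    # a > 0 and c <= 0, so the larger root is the non-negative one.
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

def years_since_split(x, lam=1.0 / 5000.0):
    """Convert x = exp(-lam*t) back to a time t in years."""
    return -math.log(x) / lam

# Hypothetical counts for N = 2: 64 of 100 atoms share a cover.
x2 = mle_retention({2: 64}, k_prime=36, n_langs=2)
print(x2, years_since_split(x2))  # x = 0.8, since x^2 = 64/100
```

For N = 2 the coefficient of x vanishes and the function reduces to the familiar pairwise formula; for larger N the statistic A enters through the linear term.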
An even more general difference between this treatment and the usual treatment by pairs is found in the use made of the number of all the languages containing a certain lexical item as the cover of a semantic atom. This kind of count is almost never made in the literature on dating problems.
Another case which constantly recurs in practice is that of three languages; 1, 2 and 3, say. The pair 1 and 2 are more closely related than language 3 is to either 1 or 2. Suppose $t$ is the time from the common ancestor of 1, 2 and 3 to 3, and $t'$ the time from the common ancestor of 1 and 2 to 1 or 2. Let $x = \exp(-\lambda t)$, $x' = \exp(-\lambda t')$ so that $x/x'$ is the probability associated with the time from the common ancestor of 1, 2 and 3 to that of 1 and 2.

We might observe any of five situations concerning the cover of a semantic atom. It may be the same in all (1, 2, 3); or in any pair (1, 2), (1, 3) or (2, 3), or different in each. The probability of each of these events is
$$X_{123} = \frac{x}{x'}\,x'\,x'\,x = x^2 x',$$
$$X_{13} = X_{23} = \frac{x}{x'}\,x'\,(1-x')\,x = x^2(1-x'),$$
$$X_{12} = \Big(1 - \frac{x}{x'}\Big)\,x'^2 + x\,x'\,(1-x) = x'(x' - x^2),$$
$$X' = 1 - x^2 x' - 2x^2(1-x') - x'(x'-x^2) = 1 - 2x^2 - x'^2 + 2x^2 x' = (1-x')(1 + x' - 2x^2).$$
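These five probabilities can be tabulated directly. The sketch below is illustrative only: the function name, the dictionary keys and the sample times of 2000 and 800 years are assumptions. It evaluates the closed forms and confirms that they sum to one.

```python
import math

def three_language_probabilities(lam, t, t_prime):
    """Probabilities of the five observable cover patterns for three
    languages, where 1 and 2 split t_prime ago and 3 split t ago."""
    x = math.exp(-lam * t)
    xp = math.exp(-lam * t_prime)
    probs = {
        "123": x * x * xp,                       # same cover in all three
        "13": x * x * (1.0 - xp),                # shared by 1 and 3 only
        "23": x * x * (1.0 - xp),                # shared by 2 and 3 only
        "12": xp * (xp - x * x),                 # shared by 1 and 2 only
        "none": (1.0 - xp) * (1.0 + xp - 2.0 * x * x),  # all different
    }
    return probs

probs = three_language_probabilities(1.0 / 5000.0, t=2000.0, t_prime=800.0)
print(probs, sum(probs.values()))  # the five probabilities sum to 1
```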
Suppose $k_{123}$, $k_{13}$, $k_{23}$, $k_{12}$ and $k'$ of each of these are observed when $K$ atoms are considered. The total probability is
$$x^{2(k_{13}+k_{23}+k_{123})}\;x'^{\,(k_{12}+k_{123})}\;(1-x')^{k'+k_{13}+k_{23}}\;(x'-x^2)^{k_{12}}\;(1+x'-2x^2)^{k'}.$$
Maximum likelihood estimates for $x$ and $x'$ are gotten from the equations obtained by setting the (logarithmic) derivatives by $x$ and $x'$ to zero separately:
$$\frac{2(k_{13}+k_{23}+k_{123})}{x} = \frac{2k_{12}\,x}{x'-x^2} + \frac{4k'\,x}{1+x'-2x^2},$$
$$\frac{k_{12}+k_{123}}{x'} + \frac{k_{12}}{x'-x^2} + \frac{k'}{1+x'-2x^2} = \frac{k'+k_{13}+k_{23}}{1-x'}.$$
These equations are best solved numerically for given values of $k_{123}$, $k_{12}$, $k_{13}$, $k_{23}$ and $k'$.
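One simple numerical route, offered here as a hedged sketch and not as the method the author used, is to maximize the logarithm of the total probability directly instead of solving the two derivative equations. The code below uses a basic compass (pattern) search; the function names, the starting point, the step sizes and the sample counts are all assumptions made for illustration.

```python
import math

def log_likelihood(x, xp, k123, k12, k13, k23, k_none):
    """Logarithm of the total probability in terms of x and x'."""
    if not (0.0 < x < 1.0 and 0.0 < xp < 1.0 and xp > x * x):
        return -math.inf
    return (2 * (k13 + k23 + k123) * math.log(x)
            + (k12 + k123) * math.log(xp)
            + (k_none + k13 + k23) * math.log(1.0 - xp)
            + k12 * math.log(xp - x * x)
            + k_none * math.log(1.0 + xp - 2.0 * x * x))

def fit_three_languages(counts, step=0.25, tol=1e-9):
    """Compass search: try the eight neighbours at the current step
    size, move whenever the likelihood improves, otherwise halve the
    step. counts = (k123, k12, k13, k23, k_none)."""
    x, xp = 0.5, 0.5
    best = log_likelihood(x, xp, *counts)
    while step > tol:
        moved = False
        for dx in (-step, 0.0, step):
            for dxp in (-step, 0.0, step):
                value = log_likelihood(x + dx, xp + dxp, *counts)
                if value > best:
                    x, xp, best, moved = x + dx, xp + dxp, value, True
        if not moved:
            step *= 0.5
    return x, xp

# Invented counts, proportional to the probabilities at x=0.7, x'=0.85.
x_hat, xp_hat = fit_three_languages((4165, 3060, 735, 735, 1305))
print(x_hat, xp_hat)
```

With counts exactly proportional to the model probabilities the search should return the generating values; real tallies would of course scatter around them.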
The methodology is straightforward and there is no need to multiply examples. In every case we obtain new formulas based on maximum likelihood estimators. Another area in which these methods could also be utilized is in the construction of significance tests and confidence bands. With this basis, most of the machinery of modern statistics would be available for use.
Criticism of First Order Theory
As we explained in discussing semantic atoms, we feel there is no adequate observational data to which to apply these formulas for a conclusive test of their value. We have made a few experimental applications using the unsatisfactory data available in the literature.

Numerically, the time estimates we obtained, which we will not quote here, do not differ a great deal from those obtained by considering pairs alone. This is to be expected if the phenomena are at all consistent. The value in the formulas derived above lies in the fact that they correctly combine the data from several pairs.
The first-order method does have one very important difficulty which appears almost immediately if we try to treat more than three languages. This difficulty is in the family tree of the languages.

In the entire first-order development, we have implicitly used the concept of a tree. Languages go together as a "common ancestor" until some point in time when they divide and become two separate languages. The tree is the first-order model of dialectation--it is known to be inadequate, at least in many situations. In spite of a century or so of studies, we simply do not understand how dialectation occurs. More study is greatly needed, especially in the construction of higher-order models, but the problem lies outside the scope of this paper.
The difficulty with the tree arises in decay studies because only splitting is compatible with our statistical model. We have no alternative to constructing a family tree if we wish to apply the method outlined above. However, it seems to be easy to find examples which do not allow a tree to be constructed. Consider four languages; A, B, C and D. Suppose one semantic atom has the same cover in A and B, and another different cover in C and D. And at the same time, some other atom has one cover in A and C, and a different cover in B and D. We cannot fit this data into any family tree.
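The conflict in this example can be made mechanical. The sketch below applies the pairwise split-compatibility criterion, a standard test from phylogenetics that is swapped in here for illustration and is not part of the paper: two two-way divisions of the same languages can sit on a common tree only if at least one of the four cross-intersections is empty. It covers only atoms that divide the languages into exactly two groups.

```python
def splits_compatible(split_a, split_b):
    """Two two-way divisions of the same languages fit on a common tree
    only if at least one of the four cross-intersections is empty."""
    (a1, a2), (b1, b2) = split_a, split_b
    return any(not (p & q) for p in (a1, a2) for q in (b1, b2))

# The two atoms of the example in the text.
first_atom = ({"A", "B"}, {"C", "D"})    # one cover in A, B; another in C, D
second_atom = ({"A", "C"}, {"B", "D"})   # one cover in A, C; another in B, D
print(splits_compatible(first_atom, second_atom))  # False

# For contrast, a division that isolates a single language always fits.
lone_innovation = ({"A"}, {"B", "C", "D"})
print(splits_compatible(first_atom, lone_innovation))  # True
```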
A little more specifically in the Romance languages, we find that the same innovation with respect to Latin is shared by several or all the later languages. Some of this can be explained by the colloquial versus learned speech theory, but no family tree can be constructed to explain all the combinations of innovations. If we had an adequate explanation of the phenomena involved in these shared innovations, it is quite possible that we could assume Romance was the direct descendant of Imperial Latin without going back to Plautus or thereabouts, as seems to be required by the first order theory.

A tentative beginning in this direction can be made by a second-order theory based on the dynamic model of lexical influence.
Second-Order Lexical Decay
The imprecise model of semantic pressures we formed to explain lexical decay suggests the following second-order model.

For each semantic atom, we consider not only a covering lexical item as before, but also a potential covering item. The potential cover is the source of pressure against the cover. When the cover decays, it is replaced by the potential cover. Naturally we also assume that the potential cover decays and is replaced by a new potential cover. In the interest of simplicity and because we have no numerical data, we will assume both decays have the same constant $\lambda$.

First, let us consider a single language. The situation at an atom can be of four types: (I) both the original cover and potential cover remain; (II) the original cover remains, but the potential cover has decayed; (III) the original cover has decayed and the potential cover has replaced it; (IV) the cover is now neither the original nor the potential cover.

Let $P_I$ and $P_{II}$ be the probabilities of the first two situations.
Then

$$\frac{d}{dt} P_I = -2\lambda P_I,$$

$$\frac{d}{dt} P_{II} = -\lambda P_{II} + \lambda P_I;$$

so that

$$P_I = \exp(-2\lambda t),$$

$$P_{II} = \exp(-\lambda t)\,(1 - \exp(-\lambda t)).$$

The original cover remains in these two cases only, so that the
probability of its remaining is

$$P_I + P_{II} = \exp(-\lambda t),$$

which is exactly the same as in first-order theory.
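
The single-language result can be checked numerically by integrating the two rate equations and comparing with the closed forms. The following is a minimal sketch and not part of the original paper: the function names, the Runge-Kutta integrator, and the sample values `lam = 0.2` and `t_end = 5.0` are illustrative assumptions, with `lam` standing for the common decay constant.

```python
import math


def rate_equations(p1, p2, lam):
    """Right-hand sides of the rate equations for situations I and II."""
    return -2.0 * lam * p1, -lam * p2 + lam * p1


def integrate(lam, t_end, steps=1000):
    """Integrate from P_I(0) = 1, P_II(0) = 0 with the classical RK4 scheme."""
    h = t_end / steps
    p1, p2 = 1.0, 0.0
    for _ in range(steps):
        a1, a2 = rate_equations(p1, p2, lam)
        b1, b2 = rate_equations(p1 + 0.5 * h * a1, p2 + 0.5 * h * a2, lam)
        c1, c2 = rate_equations(p1 + 0.5 * h * b1, p2 + 0.5 * h * b2, lam)
        d1, d2 = rate_equations(p1 + h * c1, p2 + h * c2, lam)
        p1 += h * (a1 + 2.0 * b1 + 2.0 * c1 + d1) / 6.0
        p2 += h * (a2 + 2.0 * b2 + 2.0 * c2 + d2) / 6.0
    return p1, p2


def closed_form(lam, t):
    """Closed-form P_I and P_II as stated in the text."""
    decay = math.exp(-lam * t)
    return decay * decay, decay * (1.0 - decay)


if __name__ == "__main__":
    lam, t_end = 0.2, 5.0  # illustrative values only
    numeric = integrate(lam, t_end)
    exact = closed_form(lam, t_end)
    print("numeric:", numeric)
    print("exact:  ", exact)
    print("P_I + P_II:", sum(exact), "first-order:", math.exp(-lam * t_end))
```

The sum of the two probabilities reproduces the first-order retention curve, which is the point made in the text: a single language cannot reveal the second-order structure.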

When the second-order theory is applied to $N$ languages, the
results are quite complicated. We divide the languages into four sets
$\alpha$, $\beta$, $\gamma$, $\delta$ depending on which situation holds in the language; in set $\alpha$
situation I holds, and so on. Then we have the basic differential
equation

$$
\begin{aligned}
\frac{d}{dt} P(\alpha,\beta,\gamma,\delta)
 ={}& -\lambda\,\bigl(2|\alpha| + |\beta| + |\gamma|\bigr)\, P(\alpha,\beta,\gamma,\delta) \\
 & + \lambda \sum_{j \in \beta} P(\alpha \cup \{j\},\ \beta \setminus \{j\},\ \gamma,\ \delta)
   + \lambda \sum_{k \in \gamma} P(\alpha \cup \{k\},\ \beta,\ \gamma \setminus \{k\},\ \delta) \\
 & + \lambda \sum_{l \in \delta} P(\alpha,\ \beta \cup \{l\},\ \gamma,\ \delta \setminus \{l\})
   + \lambda \sum_{l \in \delta} P(\alpha,\ \beta,\ \gamma \cup \{l\},\ \delta \setminus \{l\})
\end{aligned}
$$

which has the solution

$$P(\alpha,\beta,\gamma,\delta) = [\exp(-\lambda t)]^{2|\alpha| + |\beta| + |\gamma|}\,[1 - \exp(-\lambda t)]^{|\beta| + |\gamma| + 2|\delta|}.$$

We have no way of recognizing the condition of the potential cover, so
sets $\alpha$ and $\beta$ should be combined into a set $\eta$ and

$$P(\eta,\gamma,\delta) = [\exp(-\lambda t)]^{|\eta| + |\gamma|}\,[1 - \exp(-\lambda t)]^{|\gamma| + 2|\delta|}.$$
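
The step that merges the two unobservable situations into one set can be verified by summing the four-set solution over every way of splitting the merged set. The sketch below is an illustration under stated assumptions, not the author's procedure: the function names are hypothetical, set sizes are passed as plain integers, and `lam` and `t` are arbitrary sample values.

```python
import math
from itertools import product


def joint_probability(n_alpha, n_beta, n_gamma, n_delta, lam, t):
    """Four-set solution, written in terms of the set sizes."""
    p = math.exp(-lam * t)
    q = 1.0 - p
    return p ** (2 * n_alpha + n_beta + n_gamma) * q ** (n_beta + n_gamma + 2 * n_delta)


def merged_probability(n_eta, n_gamma, n_delta, lam, t):
    """Three-set form after situations I and II are combined."""
    p = math.exp(-lam * t)
    q = 1.0 - p
    return p ** (n_eta + n_gamma) * q ** (n_gamma + 2 * n_delta)


def sum_over_hidden_split(n_eta, n_gamma, n_delta, lam, t):
    """Sum the four-set solution over all ways to split the merged set."""
    total = 0.0
    for n_beta in range(n_eta + 1):
        ways = math.comb(n_eta, n_beta)  # which languages have lost the potential cover
        total += ways * joint_probability(n_eta - n_beta, n_beta, n_gamma, n_delta, lam, t)
    return total


def total_probability(n_languages, lam, t):
    """Sum the four-set solution over every assignment of languages to situations."""
    total = 0.0
    for states in product("ABCD", repeat=n_languages):
        counts = [states.count(s) for s in "ABCD"]
        total += joint_probability(*counts, lam, t)
    return total


if __name__ == "__main__":
    lam, t = 0.2, 4.0  # illustrative values only
    print(sum_over_hidden_split(3, 1, 2, lam, t), merged_probability(3, 1, 2, lam, t))
    print(total_probability(3, lam, t))
```

The second check confirms that the four-set solution is a proper probability distribution over all assignments of the languages to the four situations.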

Before we can actually apply the maximum likelihood technique to
languages without known ancestors, we have to make some further
combinations, because sets with $|\eta| = 1$ cannot be distinguished from
those with $|\eta| = 0$, nor those with $|\gamma| = 1$ from those with $|\gamma| = 0$.
Moreover, we cannot distinguish original covers from potential covers,
so that two sets $\eta$ and $\gamma$ must be combined with the same sets in
the reverse order.

The general case is very complicated, so we restrict ourselves
to two languages. We then observe that the covers are either the
same or different. If they are the same, we have either $|\eta| = 2$ and
$|\gamma| = |\delta| = 0$, or $|\gamma| = 2$ and $|\eta| = |\delta| = 0$. Thus, the probability is

$$[\exp(-\lambda t)]^2 + [\exp(-\lambda t)]^2\,[1 - \exp(-\lambda t)]^2 = \exp(-2\lambda t)\,[1 + (1 - \exp(-\lambda t))^2],$$

which differs from the first-order theory by the term in the square
bracket.
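
To see how the two-language formula would be used for dating, the shared-cover probability can be inverted numerically for the time depth. This is a hedged sketch rather than a procedure given in the paper: the function names, the bisection solver, and the sample decay constant `lam = 0.2` are assumptions made for illustration.

```python
import math


def shared_cover_probability(p):
    """Second-order probability that two languages show the same cover,
    where p = exp(-lam * t)."""
    return p ** 2 * (1.0 + (1.0 - p) ** 2)


def first_order_probability(p):
    """First-order counterpart: both languages keep the original cover."""
    return p ** 2


def retention_from_shared_fraction(fraction, tol=1e-12):
    """Invert shared_cover_probability by bisection.

    The function rises monotonically from 0 to 1 as p runs over [0, 1],
    so a single root exists for any fraction strictly between 0 and 1.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if shared_cover_probability(mid) < fraction:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def separation_time(fraction, lam):
    """Time depth implied by an observed fraction of shared covers."""
    return -math.log(retention_from_shared_fraction(fraction)) / lam


if __name__ == "__main__":
    lam = 0.2  # illustrative decay constant, not an estimate from data
    for t in (1.0, 3.0, 10.0):
        p = math.exp(-lam * t)
        print(t, first_order_probability(p), shared_cover_probability(p))
```

Printing both probabilities side by side shows how far the bracketed correction moves the second-order value away from the first-order one at each time depth.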

The simplest case where the second-order theory is really
required is that of four languages. We will illustrate the results by one
expression. If $k_{22}$ words are covered by two items, each in two
languages, $k_4$ words by one item in all languages, $k_3$ by one item in
three languages, $k_2$ by one item in two languages, and $k_1$ by no
common items, then the expression to be solved for maximum likelihood
is

$$
\begin{aligned}
& \frac{4k_{22} + 4k_4 + 3k_3 + 2k_2}{p} - \frac{2k_{22} + k_3 + 3k_2 + 5k_1}{1 - p}
  - \frac{4k_4 (1 - p)^3}{2 - 4p + 6p^2 - 4p^3 + p^4} \\
& \quad + \frac{k_3 (3 - 4p + 4p^3)}{2 + 3p - 2p^2 + p^4}
  + \frac{k_2 (4 - 2p - 3p^2)}{2 + 4p - p^2 - p^3}
  + \frac{k_1 (5 + 6p + 9p^2)}{1 + 5p + 3p^2 + 3p^3} = 0
\end{aligned}
$$

where $p = \exp(-\lambda t)$.
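
One way to solve such an expression in practice is to maximize the corresponding log-likelihood numerically instead of finding the root by hand. The sketch below rests on an assumption that should be stated plainly: it reads the four-language expression as the derivative with respect to p of a log-likelihood built from powers of p and 1 - p and from the four polynomials that appear in the denominators. The function names, the grid-plus-golden-section search, and the sample counts are all hypothetical.

```python
import math


def log_likelihood(p, k22, k4, k3, k2, k1):
    """Log-likelihood whose p-derivative is taken to be the four-language expression."""
    d4 = 2 - 4 * p + 6 * p**2 - 4 * p**3 + p**4
    d3 = 2 + 3 * p - 2 * p**2 + p**4
    d2 = 2 + 4 * p - p**2 - p**3
    d1 = 1 + 5 * p + 3 * p**2 + 3 * p**3
    return ((4 * k22 + 4 * k4 + 3 * k3 + 2 * k2) * math.log(p)
            + (2 * k22 + k3 + 3 * k2 + 5 * k1) * math.log(1.0 - p)
            + k4 * math.log(d4) + k3 * math.log(d3)
            + k2 * math.log(d2) + k1 * math.log(d1))


def golden_section_max(f, lo, hi, tol=1e-10):
    """Golden-section search for a maximum of f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def estimate_retention(k22, k4, k3, k2, k1):
    """Coarse grid scan to bracket the best p, then golden-section refinement."""
    def f(p):
        return log_likelihood(p, k22, k4, k3, k2, k1)
    grid = [i / 1000.0 for i in range(1, 1000)]
    best = max(grid, key=f)
    lo = max(best - 0.001, 1e-9)
    hi = min(best + 0.001, 1.0 - 1e-9)
    return golden_section_max(f, lo, hi)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the code.
    print(estimate_retention(k22=5, k4=40, k3=20, k2=15, k1=20))
```

Given an estimate of p and an independently known decay constant, the time depth would follow from t = -ln(p) / lam. The grid scan is included because nothing in the text guarantees that the likelihood has a single interior maximum.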

This second-order theory is unsatisfactory, not only because
it leads to very complex formulas but also because it seems to be
qualitatively inadequate. The formula for splitting between two languages
is not greatly modified except for very long times, and the change
does not seem to be enough to account for data showing short times
of division. It is hard to tell whether the formula for several
languages, including the quantity $k_{22}$, is any help; so far we have no
striking results to quote from its use.

A second-order theory in which the potential cover decays at a
different rate than the original cover might correct some of these
defects, but we have no evidence upon which to estimate the decay
rate in this case. It is much more likely that a more elaborate mechanism
must be postulated, though it need not lead to more elaborate results. The
model must be based on a kind of dialectation study which seems to
be absent as yet from the literature.

Conclusion

We have derived a number of formulas relating to the estimation
of time depths by observations of lexical decay. The methods used
can be applied to obtain many more similar formulas as required in
studies of actual data.

All of these formulas are based on models of lexical decay using
the concept of semantic atoms and their lexical covers. Lexical decay
is identified with a change in lexical cover. If the semantic atoms
are sufficiently independent, the decay is a Poisson process.

Probably the most important practical conclusion is the result
that any set of semantic atoms can be used to evaluate lexical decay
provided the set is made up of atoms:

1. far enough removed in meaning from one another to assure
   independence,
2. which represent concepts assured to have been in existence
   throughout the time period being studied.

End Notes

(1) See Robert B. Lees, "The Basis of Glottochronology,"
    Language, 29, 113-27 (1953).
(2) There is no outstanding study of this problem. Attempts
    to "improve" the test vocabulary by limiting it to meanings
    which have behaved well in earlier studies are methodologically
    disastrous because they bias the value of $\lambda$.
(3) This requirement is also intended to remove bias from
    the estimate of $\lambda$.
(4) This is a matter of classical philological research
    independent of statistical syntheses made from the results.
