This document proposes models of lexical decay to establish a theoretical foundation for dating languages using lexical decay. It introduces the concepts of semantic atoms and independent semantic atoms to model lexical decay. Exponential decay of vocabulary is accounted for by assuming lexical items covering atoms decay according to a Poisson process. This allows conclusions about how to properly construct test vocabularies and date splits between languages. Higher order models may be needed to fully account for dynamics, but more verification studies are required first.
Computational Linguistics

MODELS OF LEXICAL DECAY

D. Kleinecke
Technical Military Planning Operation
General Electric Company
735 State Street (P.O. Drawer QQ)
Santa Barbara, California 93102

Abstract

Lexical decay is the phenomenon underlying the dating techniques known as "glottochronology" and "lexicostatistics." Much of the controversial nature of work in this field is the result of extremely imprecise foundations and lack of attention to the underlying statistical and semantic models. A satisfactory semantic model can be found in the concept of semantic atom. Notwithstanding a number of philosophical objections, the semantic atom is an operationally feasible support for a lexicon which is a semantic subset of all possible meanings and, at the same time, exhausts the vocabulary of a language. Lexical decay is the process by which the lexical item covering an atom is replaced by another lexical item. Exponential lexical preservation is, in this model, directly analogous to decay phenomena in nuclear physics. Consistency requires that the decay process involved in exponentially preserved vocabularies be a Poisson process. This shows how to form test vocabularies for dating and proves that presently used vocabularies are not correctly formed. Dialectation studies show that historically diverging populations must be modelled by correlated Poisson processes.
Definitive statistical treatment of these questions is not possible at this time, but much desirable research can be indicated.

Introduction

This paper is an attempt to establish the method of dating by lexical decay upon an adequate theoretical foundation. The method discussed is that invented by Swadesh (1) over a decade ago and usually known as glottochronology or lexicostatistics. In the intervening years it has been widely applied, but often to the accompaniment of much confusion and controversy. It seems that much of the confusion can be removed by a rigorous treatment of the phenomenological model and careful application of statistics. The controversy can be removed only by the completion of a sufficient number of supporting studies. Rigorous formulation permits us to pinpoint what studies are needed and what conclusions are being sought.

Granting (as not everyone seems willing to do) that the basic fact of "uniform" lexical decay occurs, the problem to be attacked is that of correctly formulating models for lexical decay and of correctly deriving statistical consequences from these models.
In what follows, we will construct a set of models which seem to fit the needs of the method of dating by lexical decay. Our approach is strictly pragmatic; that is, we construct the model we need without concerning ourselves about its a priori reasonableness. Later we try to assemble some arguments which justify the model. In no sense is this an approach from first principles.

The analogy between lexical decay and the decay phenomena of nuclear physics has often been noted and dismissed. In the present paper, we insist that this analogy is much more than an analogy; it is, on the first level, an identity. The only alternative to this hypothesis seems to be a kind of mystic faith that the decay occurs but without palpable, manipulable principles. The burden of proof that the identity is false lies with the doubter, and we will make no further demonstration of its validity.

Decay phenomena in nuclear physics are governed by relatively simple, well understood principles. To apply these results to lexical decay we first establish the concepts of a semantic atom and a set of independent semantic atoms. The observed fact of exponential decay of vocabulary is then accounted for by assuming that the lexical item covering an atom decays according to a Poisson process.
Generally speaking, the converse of this is also true, and only a Poisson process would produce exponential decay. From these considerations, we can draw many conclusions about how to and how not to construct test vocabularies for dating purposes.

With this model in hand, we can draw conclusions of a statistical nature. For example, we can develop formulas for the proper method of dating the split between three or more languages and for good estimators in more complex situations.

We can construct an imprecise heuristic model for the dynamic semantics underlying the Poisson process. So long as the first order theory is adequate, this is much in the nature of a curiosity. It seems, however, that first order theory is not adequate. Actually, such a conclusion is really premature because the kind of verification studies needed have not been made.

Assuming the pessimistic conclusion, we have to construct second (or higher) order theories to account for the inadequacies of first order theory. At the moment, we have no useful results in this direction--the problem merges into the problem of dialectation.

Probably the most important service we can render is to indicate exactly what kind of detailed studies are needed.

Semantic Atoms

It is very easy to raise objections of a philosophical nature to the concept of a semantic atom.
In this paper we will simply ignore these objections and define semantic atom in an operational way. There are also operational difficulties, but these seem to be surmountable.

A semantic atom is a completely defined unit concept sufficiently specified to remove all ambiguity. For example, in an anthropological context, we might have "sun, as pointed at by a male anthropologist at high noon in the middle of summer on an average day among a group of young men with plenty to drink". The kind of subtleties needed to complete the definition reminds one of Korzybskian General Semantics, but the intention is not the same. We seek to remove ambiguity but we must have a non-unique concept--one that is always present.

Certainly there has been little use of semantic atoms anywhere in the past. Those interested in semantics for its own sake will reject them as useless or meaningless; lexicographers deal in more generalized concepts. It would be hard to argue that they have general utility, but they are precisely what is needed for studying lexical decay.

Each semantic atom, in any speech at any time, is assumed to be covered by some lexical item. That is, there is some word whose meaning includes that of the atom. Thus, vocabularies can be formed over any set of semantic atoms by listing the covering lexical item for each atom.
The kind of decay being studied is that where the covering lexical item is replaced by another item. The replaced word only rarely disappears immediately from the language as a whole, but it has disappeared from the semantic atom.

An independent set of semantic atoms is a set of atoms all of which differ among themselves enough to make the decay at any atom completely independent of that at any other atom. Thus, only one from sets of words, like numerals or pronouns, with habitual interrelations can appear in the set. Independent sets are useful because in them, problems of inter-atom correlations need not be considered.

Before passing on, we should say a few words as to the practical use of semantic atoms. There does not seem to be any doubt that the collectors of vocabularies want to work with semantic atoms--even if their results are completely unsuccessful. In an entry "dog = hund" they would like to say that there is a semantic atom and its cover in English is "dog", in German, "hund". The pitfalls of this sort of thing are well-known. Some care in defining atoms might make it feasible if we require not complete identity of the English and German semantics, but rather the existence of some concrete concept where both the English and German words are appropriate.
Clearly this much weaker requirement will be easier to satisfy, so we adopt it.

We conclude that, with adequate precautions, semantic atoms can be operationally feasible even if true rigor is impossible. In the case of little-known languages, there is much more chance for error. We should encourage collectors of vocabularies to improve the precision of their definitions so that the atom in question can be identified.

Decay Process

We assume that lexical decay, for a set of independent semantic atoms, is a Poisson process. That is, it satisfies three conditions:

1. Each atom decays independently of all the other atoms.
2. Each atom decays independently of its history of earlier decay.
3. There is a constant $\lambda$ such that for each atom the probability of one decay in a short time interval $\Delta t$ is $\lambda \Delta t$, and the probability of more than one decay is negligible.

It is rather easy to deduce that for longer time intervals $t$, the probability of not decaying is $\exp(-\lambda t)$, and if there are $N$ atoms, the expected number of undecayed atoms after time $t$ is $N \exp(-\lambda t)$. This formula is the usual formula for lexical decay. It should be pointed out that it was tested, statistically, in the first publication by Swadesh, and it failed to pass. The difficulty is probably due to the word list used, which is not an independent set of atoms.
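The decay law above can be sketched numerically. The following Python fragment is only an illustration under stated assumptions: the function names are hypothetical, and the rate of 1/5000 per atom per year is used purely as an example value, not as an established constant.

```python
import math


def survival_probability(rate_per_year: float, years: float) -> float:
    """Probability that one atom still has its original cover after `years`."""
    return math.exp(-rate_per_year * years)


def expected_retained(n_atoms: int, rate_per_year: float, years: float) -> float:
    """Expected number of undecayed atoms: N * exp(-lambda * t)."""
    return n_atoms * survival_probability(rate_per_year, years)


def elapsed_years(retained_fraction: float, rate_per_year: float) -> float:
    """Invert the decay law: t = -ln(fraction) / lambda."""
    if retained_fraction <= 0.0 or retained_fraction > 1.0:
        raise ValueError("retained_fraction must lie in (0, 1]")
    return -math.log(retained_fraction) / rate_per_year


# Hypothetical inputs, chosen only for illustration.
ILLUSTRATIVE_RATE = 1.0 / 5000.0  # assumed decays per atom per year
print(expected_retained(100, ILLUSTRATIVE_RATE, 1000.0))
print(elapsed_years(0.60, ILLUSTRATIVE_RATE))
```

With this illustrative rate, a retained fraction of 0.60 corresponds to roughly 2,550 years. The value returned by `expected_retained` is an expectation only; an observed count will scatter around it.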
If we examine the assumptions made so far, we see that any list of semantic atoms can be used if they are: (1) independent; and (2) assured of existence throughout the time in question. There is no satisfactory a priori basis for assuming that some kinds of semantic atoms decay at different rates than other kinds, and it is doubtful if enough historical evidence can be collected to make such a conclusion statistically significant.

The question whether $\lambda$ is a universal constant, a constant within any one language but possibly differing between languages, or a variable, is easier to discuss. So far, indications are that $\lambda$ is about equal to 1/5000 per year. Now this means that over the span of most historic evidence, $\exp(-\lambda t)$ will be greater than about 0.60. There is a great deal of scatter to be expected in the results because $N \exp(-\lambda t)$ is an expectation, not an exact prediction.

There have been a number of studies of the exponent of exponential decay. All of them are too superficial to be conclusive (2). An adequate study in any one language would have to meet several criteria which make it into a major research effort. A set of independent semantic atoms must be selected--selected prior to detailed study--and no atoms, however difficult, dropped without complete explanations (3).
Then the history of each atom must be traced through the historical record to locate the lexical item covering the atom at each point in time. In reporting the study, all of this should be fully documented in detail. Each instance of decay can then be recognized and tallied. Statistical tests should be applied to see whether or not the model is satisfied and to estimate $\lambda$. For example, if there are 100 semantic items, there should be about one decay every 50 years uniformly spread through time. These things can be checked statistically. We hope that scholars will undertake definitive studies of this type for as many cases as possible (4).

Until the results of the kind of research just mentioned are available, the status of $\lambda$ is unsure. We anticipate it will be recognized as a universal constant.

There remains the problem of making a Poisson process a reasonable assumption. In other words, we need to describe some sort of mechanism which makes words slip off semantic atoms independently of how long they have been covering the atom, and at a constant rate per unit time, at least over short time intervals. Incidentally, since $\lambda$ is on the order of 1/5000 per year, 50 years is a short time interval.

Since the speakers of normal languages are not historians, the independence from history seems easy to accept. The constant rate is harder to accept.
First of all we have to account for an identical figure in populations, literate and illiterate, and between a handful of speakers and half a billion speakers. The decay effect must be independent of the number of speakers, hence it must be operative at the level of the single isolated speaker. This is satisfactory since, by and large, the amount of speech reaching an individual does not seem to have changed much throughout history and does not vary much between cultures at the present day.

But why does a speaker decide to change an occasional lexical item--about 1% in his lifetime--and maintain the rest? The only hypothesis we have been able to construct is that all words are always under pressure--perhaps from several semantic "directions" at the same time. Most atoms resist change most of the time, but some set of accidents (all very real events at the sociological and psychological levels, but random accidents in our context) weakens a few, and the lexicon decays. In other words, there is a constant dynamic movement among secondary and incidental covers of the semantic atom which threaten the principal cover. Usually the threatening lexical items recede, but occasionally, in a random way, about once every five thousand years, the principal cover is displaced and a lexical decay occurs.
The hypothetical mechanism advanced to explain lexical decay can be checked against history by case studies of semantic atoms. Each atom should show time periods when the principal word was nearly displaced. During these periods it is difficult to decide whether the old word or a new word is the principal cover. Usually the new word will pass away again, but sometimes it will displace the old word. A very tentative guess based on a casual examination of one hundred current English words suggests there are about four very heavily threatened words per hundred. Since we can expect about one word to be decaying at this moment, we conclude that about three out of four times the old word survives. All of this needs to be verified or disproven in detailed studies.

Decay Statistics

The statistical consequences of the model--the first order model described above--need to be explored. We cannot handle all possible situations, but the following examples should provide an adequate demonstration of technique so that any other problems which occur can be solved in the same manner.

First, let us consider $N$ languages deviating independently from a common parent which is not known to us. The following discussion is a bit more cumbersome than some alternative approaches, but it generalizes more easily.
Let $\alpha$ be any set of the $N$ languages and let $P(\alpha)$ be the probability that the given semantic atom is covered by the original lexical item in exactly the languages of set $\alpha$. New covering words are assumed to be different in each of the innovating languages. $P(\alpha)$ is a function of time and satisfies the following differential equation:

$$\frac{d}{dt}P(\alpha) = -\sum_{i \in \alpha} \lambda P(\alpha) + \sum_{j \notin \alpha} \lambda P(\alpha \oplus j),$$

where $i$ and $j$ are languages, $\in$ and $\notin$ mean "belongs to" and "does not belong to" respectively, and $\alpha \oplus j$ is the union of $\alpha$ and the set containing only the language $j$. Let $|\alpha|$ denote the number of languages in $\alpha$. If $|\alpha| = N$, the equation is easy to solve:

$$P(\alpha) = \exp(-\lambda t)^{|\alpha|}, \qquad |\alpha| = N.$$

If a few cases ($|\alpha| = N-1$, $|\alpha| = N-2$, etc.) are solved, we are led to hypothesize that

$$P(\alpha) = \exp(-\lambda t)^{|\alpha|}\,\bigl(1 - \exp(-\lambda t)\bigr)^{N-|\alpha|}.$$

This can be proven by induction on $|\alpha|$ from $|\alpha| = N$ downward, since $|\alpha \oplus j| = |\alpha| + 1$. Then

$$\frac{d}{dt}P(\alpha) = -|\alpha|\,\lambda P(\alpha) + \lambda\,(N-|\alpha|)\,\exp(-\lambda t)^{|\alpha|+1}\,\bigl(1-\exp(-\lambda t)\bigr)^{N-|\alpha|-1},$$

so that

$$\frac{d}{dt}\Bigl(P(\alpha)\,\exp(\lambda t)^{|\alpha|}\Bigr) = (N-|\alpha|)\,\lambda\,\exp(-\lambda t)\,\bigl(1-\exp(-\lambda t)\bigr)^{N-|\alpha|-1},$$

$$P(\alpha)\,\exp(\lambda t)^{|\alpha|} = \bigl(1 - \exp(-\lambda t)\bigr)^{N-|\alpha|},$$

and the hypothesis is proven by induction.

Thus, $P(\alpha)$ depends only on the value of $|\alpha| = n$. We can recognize $P(n)$ for $n = 2, 3, \ldots, N$, but $P(0)$ and $P(1)$ cannot be distinguished, so we combine these into $P'$, which is obtained by

$$P' = 1 - \frac{N(N-1)}{2}P(2) - \cdots - \frac{N!}{n!\,(N-n)!}P(n) - \cdots - P(N)$$

$$\phantom{P'} = 1 - \Bigl[\bigl((1-\exp(-\lambda t)) + \exp(-\lambda t)\bigr)^{N} - P(0) - N P(1)\Bigr],$$

since there are $N!/(n!\,(N-n)!)$ sets with $|\alpha| = n$. Thus,

$$P' = \bigl(1 - \exp(-\lambda t)\bigr)^{N-1}\,\bigl(1 + (N-1)\exp(-\lambda t)\bigr).$$

Now suppose that from $K$ semantic atoms we observe that $k_N$ atoms are covered by the same word in all languages, $k_{N-1}$ in all but one, and so on to $k_2$, and that there are $k'$ atoms differently covered in all languages. The probability of this occurring (up to a combinatorial factor that does not depend on $x$) is

$$P(N)^{k_N}\,P(N-1)^{k_{N-1}} \cdots P(2)^{k_2}\,P'^{\,k'} = x^{(2k_2 + 3k_3 + \cdots + N k_N)}\,(1-x)^{NK - k' - (2k_2 + \cdots + N k_N)}\,\bigl(1 + (N-1)x\bigr)^{k'},$$

where $x = \exp(-\lambda t)$.

A maximum likelihood estimate for $x$ seems to be the best single value we can assign to $x$. This is obtained by setting the (logarithmic) derivative of the probability to zero, so that

$$0 = \frac{A - k'}{x} - \frac{NK - A}{1-x} + \frac{(N-1)\,k'}{1 + (N-1)x},$$

where $A = k' + 2k_2 + 3k_3 + \cdots + N k_N$. Or

$$N(N-1)K\,x^2 - \bigl((N-1)A - NK + k'\bigr)\,x - (A - k') = 0.$$

If $N = 2$, $x^2 = k_2/K$, which is the well-known formula for the separation between two languages. For general $N$, $x$ is the solution of the quadratic equation given above. Note that the answer depends on the statistic $A$, which does not usually appear in discussions of lexical dating.

An even more general difference between this treatment and the usual treatment by pairs is found in the use made of the number of all the languages containing a certain lexical item as the cover of a semantic atom. This kind of count is almost never made in the literature on dating problems.
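The quadratic for the N-language estimate can be solved directly. The sketch below is a hedged Python illustration, not part of the original paper: the function name `mle_retention`, the dictionary layout of the counts, and the sample numbers are all assumptions introduced here.

```python
import math


def mle_retention(n_languages: int, shared_counts: dict, k_prime: int) -> float:
    """Maximum likelihood estimate of x = exp(-lambda * t) for N languages.

    shared_counts maps n (2..N) to k_n, the number of atoms whose cover is
    shared by exactly n languages; k_prime is the number of atoms covered
    differently in all languages.
    """
    N = n_languages
    K = k_prime + sum(shared_counts.values())  # total atoms examined
    if K == 0:
        raise ValueError("at least one atom must be observed")
    A = k_prime + sum(n * k for n, k in shared_counts.items())
    # Coefficients of N(N-1)K x^2 - ((N-1)A - NK + k') x - (A - k') = 0
    a = N * (N - 1) * K
    b = -((N - 1) * A - N * K + k_prime)
    c = -(A - k_prime)
    # a is positive and c is non-positive, so the '+' root is the
    # non-negative one.
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


# Hypothetical two-language tally: 64 shared covers out of 100 atoms.
print(mle_retention(2, {2: 64}, 36))
```

For two languages the result agrees with the familiar $x^2 = k_2/K$. Given an assumed rate $\lambda$, the time follows as $t = -\ln(x)/\lambda$.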
Another case which constantly recurs in practice is that of three languages; 1, 2 and 3, say. The pair 1 and 2 are more closely related than language 3 is to either 1 or 2. Suppose $t$ is the time from the common ancestor of 1, 2 and 3 to 3, and $t'$ the time from the common ancestor of 1 and 2 to 1 or 2. Let $x = \exp(-\lambda t)$, $x' = \exp(-\lambda t')$, so that $x/x'$ is the probability associated with the time from the common ancestor of 1, 2 and 3 to that of 1 and 2.

We might observe any of five situations concerning the cover of a semantic atom. It may be the same in all (1, 2, 3); or in any pair (1, 2), (1, 3) or (2, 3); or different in each. The probability of each of these events is

$$P_{123} = x \cdot \frac{x}{x'} \cdot x'^2 = x^2 x',$$

$$P_{13} = P_{23} = x \cdot \frac{x}{x'} \cdot x'(1-x') = x^2 (1-x'),$$

$$P_{12} = x'^2 \Bigl[\,1 - \frac{x}{x'} + \frac{x}{x'}(1-x)\Bigr] = x'(x' - x^2),$$

$$P' = 1 - x^2 x' - 2x^2(1-x') - x'(x'-x^2) = 1 - 2x^2 - x'^2 + 2x^2 x' = (1-x')(1 + x' - 2x^2).$$

Suppose $k_{123}$, $k_{12}$, $k_{13}$, $k_{23}$ and $k'$ of each of these are observed when $K$ atoms are considered. The total probability is

$$x^{2(k_{13}+k_{23}+k_{123})}\; x'^{(k_{12}+k_{123})}\; (1-x')^{k'+k_{13}+k_{23}}\; (x'-x^2)^{k_{12}}\; (1+x'-2x^2)^{k'}.$$

Maximum likelihood estimates for $x$ and $x'$ are obtained from the equations found by setting the (logarithmic) derivatives with respect to $x$ and $x'$ to zero separately.
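As a rough numerical companion to the three-language case, the Python sketch below evaluates the five event probabilities and maximizes the likelihood by a coarse grid search, rather than solving the derivative equations. The function names, the dictionary keys and the grid resolution are assumptions made here for illustration; a real study would use a proper optimizer.

```python
import math


def event_probabilities(x: float, x_prime: float) -> dict:
    """Probabilities of the five observable cover patterns."""
    p13 = x * x * (1.0 - x_prime)
    return {
        "123": x * x * x_prime,                 # same cover in all three
        "12": x_prime * (x_prime - x * x),      # shared by 1 and 2 only
        "13": p13,                              # shared by 1 and 3 only
        "23": p13,                              # shared by 2 and 3 only
        "none": (1.0 - x_prime) * (1.0 + x_prime - 2.0 * x * x),
    }


def log_likelihood(x: float, x_prime: float, counts: dict) -> float:
    """Log of the total probability, up to an additive constant."""
    probs = event_probabilities(x, x_prime)
    return sum(counts[key] * math.log(probs[key]) for key in probs)


def grid_mle(counts: dict, steps: int = 100) -> tuple:
    """Coarse grid search over 0 < x <= x' < 1 (since t' <= t)."""
    best = None
    for i in range(1, steps):
        x = i / steps
        for j in range(i, steps):
            x_prime = j / steps
            ll = log_likelihood(x, x_prime, counts)
            if best is None or ll > best[0]:
                best = (ll, x, x_prime)
    return best[1], best[2]


# Hypothetical tallies for 1000 atoms, invented for illustration.
example = {"123": 288, "12": 352, "13": 72, "23": 72, "none": 216}
print(grid_mle(example))
```

The grid is restricted to $x \le x'$ because the split between 1 and 2 cannot be older than the common ancestor of all three; on that region every probability is strictly positive, so the logarithms are defined.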
The resulting pair of equations is

$$\frac{2(k_{13}+k_{23}+k_{123})}{x} = \frac{2k_{12}\,x}{x'-x^2} + \frac{4k'x}{1+x'-2x^2},$$

$$\frac{k_{12}+k_{123}}{x'} + \frac{k_{12}}{x'-x^2} + \frac{k'}{1+x'-2x^2} = \frac{k'+k_{13}+k_{23}}{1-x'}.$$

These equations are best solved numerically for given values of $k_{123}$, $k_{12}$, $k_{13}$, $k_{23}$ and $k'$.

The methodology is straightforward and there is no need to multiply examples. In every case we obtain new formulas based on maximum likelihood estimators.

Another area in which these methods could also be utilized is in the construction of significance tests and confidence bands. With this basis, most of the machinery of modern statistics would be available for use.

Criticism of First Order Theory

As we explained in discussing semantic atoms, we feel there is no adequate observational data to which to apply these formulas for a conclusive test of their value. We have made a few experimental applications using the unsatisfactory data available in the literature. Numerically, the time estimates we obtained, which we will not quote here, do not differ a great deal from those obtained by considering pairs alone. This is to be expected if the phenomena are at all consistent. The value in the formulas derived above lies in the fact that they correctly combine the data from several pairs.

The first-order method does have one very important difficulty which appears almost immediately if we try to treat more than three languages. This difficulty is in the family tree of the languages.
In the entire first-order development, we have implicitly used the concept of a tree. Languages go together as a "common ancestor" until some point in time when they divide and become two separate languages. The tree is the first-order model of dialectation--it is known to be inadequate, at least in many situations. In spite of a century or so of studies, we simply do not understand how dialectation occurs. More study is greatly needed, especially in the construction of higher-order models, but the problem lies outside the scope of this paper.

The difficulty with the tree arises in decay studies because only splitting is compatible with our statistical model. We have no alternative to constructing a family tree if we wish to apply the method outlined above. However, it seems to be easy to find examples which do not allow a tree to be constructed. Consider four languages; A, B, C and D. Suppose one semantic atom has the same cover in A and B, and another different cover in C and D. And at the same time, some other atom has one cover in A and C, and a different cover in B and D. We cannot fit this data into any family tree.

A little more specifically, in the Romance languages, we find that the same innovation with respect to Latin is shared by several or all the later languages.
Some of these shared innovations can be explained by the colloquial versus learned speech theory, but no family tree can be constructed to explain all the combinations of innovations. If we had an adequate explanation of the phenomena involved in these shared innovations, it is quite possible that we could assume Romance was the direct descendant of Imperial Latin without going back to Plautus or thereabouts, as seems to be required by the first-order theory. A tentative beginning in this direction can be made by a second-order theory based on the dynamic model of lexical influence.

Second-Order Lexical Decay

The imprecise model of semantic pressures we formed to explain lexical decay suggests the following second-order model. For each semantic atom, we consider not only a covering lexical item as before, but also a potential covering item. The potential cover is the source of pressure against the cover. When the cover decays, it is replaced by the potential cover. Naturally we also assume that the potential cover decays and is replaced by a new potential cover. In the interest of simplicity and because we have no numerical data, we will assume both decays have the same constant $\lambda$.

First, let us consider a single language.
The situation at an atom can be of four types: (I) both the original cover and potential cover remain; (II) the original cover remains, but the potential cover has decayed; (III) the original cover has decayed and the potential cover has replaced it; (IV) the cover is now neither the original nor the potential cover. Let $P_I$ and $P_{II}$ be the probabilities of the first two situations. Then

$$
\frac{d}{dt} P_I = -2\lambda P_I, \qquad \frac{d}{dt} P_{II} = -\lambda P_{II} + \lambda P_I,
$$

so that

$$
P_I = \exp(-2\lambda t), \qquad P_{II} = \exp(-\lambda t)\,\bigl(1 - \exp(-\lambda t)\bigr).
$$

The original cover remains in these two cases only, so that the probability of it remaining is

$$
P_I + P_{II} = \exp(-\lambda t),
$$

which is exactly the same as in first-order theory.

When the second-order theory is applied to N languages, the results are quite complicated. We divide the languages into four sets $\alpha$, $\beta$, $\gamma$, $\delta$ depending on which situation holds in the language; in set $\alpha$, situation I holds, and so on. Then we have the basic differential equation

$$
\begin{aligned}
\frac{d}{dt} P(\alpha,\beta,\gamma,\delta) ={}& -\bigl(2|\alpha| + |\beta| + |\gamma|\bigr)\,\lambda\, P(\alpha,\beta,\gamma,\delta) \\
& + \lambda \sum_{j\in\beta} P(\alpha\oplus j,\ \beta - j,\ \gamma,\ \delta)
  + \lambda \sum_{k\in\gamma} P(\alpha\oplus k,\ \beta,\ \gamma - k,\ \delta) \\
& + \lambda \sum_{l\in\delta} \bigl[ P(\alpha,\ \beta\oplus l,\ \gamma,\ \delta - l) + P(\alpha,\ \beta,\ \gamma\oplus l,\ \delta - l) \bigr],
\end{aligned}
$$

where $\alpha\oplus j$ is the set $\alpha$ with language $j$ added and $\beta - j$ is the set $\beta$ with language $j$ removed. This equation has the solution

$$
P(\alpha,\beta,\gamma,\delta) = [\exp(-\lambda t)]^{\,2|\alpha| + |\beta| + |\gamma|}\ [1 - \exp(-\lambda t)]^{\,|\beta| + |\gamma| + 2|\delta|}.
$$

We have no way of recognizing the condition of the potential cover, so sets $\alpha$ and $\beta$ should be combined into a set $\eta$, and

$$
P(\eta,\gamma,\delta) = [\exp(-\lambda t)]^{\,|\eta| + |\gamma|}\ [1 - \exp(-\lambda t)]^{\,|\gamma| + 2|\delta|}.
$$

Before we can actually apply the maximum likelihood technique to languages without known ancestors, we have to make some further combinations, because sets with $|\eta| = 1$ cannot be distinguished from those with $|\eta| = 0$, or those with $|\gamma| = 1$ from those with $|\gamma| = 0$. Moreover, we cannot distinguish original covers from potential covers, so that two sets $\eta$ and $\gamma$ must be combined with the same sets in the reverse order.

The general case is very complicated, so we restrict ourselves to two languages. We then observe that the covers are either the same or different. If they are the same, we have either $|\eta| = 2$ and $|\gamma| = |\delta| = 0$, or $|\gamma| = 2$ and $|\eta| = |\delta| = 0$. Thus, the probability is

$$
[\exp(-\lambda t)]^2 + [\exp(-\lambda t)]^2\,[1 - \exp(-\lambda t)]^2 = \exp(-2\lambda t)\,\bigl[1 + (1 - \exp(-\lambda t))^2\bigr],
$$

which differs from the first-order theory by the term in the square bracket.

The simplest case where the second-order theory is really required is that of four languages. We will illustrate the results by one expression.
If $k_{22}$ words are covered by two items, each shared by two languages, $k_4$ words by one item in all languages, $k_3$ by one item in three languages, $k_2$ by one item in two languages, and $k'$ by no common items, then the expression to be solved for maximum likelihood is

$$
\begin{aligned}
& \frac{4k_{22} + 4k_4 + 3k_3 + 2k_2}{p} - \frac{2k_{22} + k_3 + 3k_2 + 5k'}{1 - p}
  - \frac{4k_4\,(1 - p)^3}{2 - 4p + 6p^2 - 4p^3 + p^4} \\
& \quad + \frac{k_3\,(3 - 4p + 4p^3)}{2 + 3p - 2p^2 + p^4}
  + \frac{k_2\,(4 - 2p - 3p^2)}{2 + 4p - p^2 - p^3}
  + \frac{k'\,(5 + 6p + 9p^2)}{1 + 5p + 3p^2 + 3p^3} = 0,
\end{aligned}
$$

where $p = \exp(-\lambda t)$.

This second-order theory is unsatisfactory, not only because it leads to very complex formulas, but also because it seems to be qualitatively inadequate. The formula for splitting between two languages is not greatly modified except for very long times, and the change does not seem to be enough to account for data showing short times of division. It is hard to tell whether the formula for several languages including the quantity $k_{22}$ is any help; so far we have no striking results to quote from its use. A second-order theory where the potential cover decayed at a different rate than the original cover might correct some of these defects, but we have no evidence upon which to estimate the decay rate in this case. It is quite likely that a more elaborate mechanism must be postulated, though it need not lead to more elaborate results. The model must be based on a kind of dialectation study which seems to be absent as yet from the literature.
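
As a numerical illustration of the remarks above, the following sketch evaluates the single-language situation probabilities and compares the two-language agreement probability $\exp(-2\lambda t)[1 + (1 - \exp(-\lambda t))^2]$ with its first-order counterpart $\exp(-2\lambda t)$. It is only a check of the closed forms, not a fitting procedure. The function names are hypothetical, the rate is set to an arbitrary unit value so that time is measured in units of $1/\lambda$, and the probabilities of situations III and IV are taken to be the per-language factors implied by the N-language solution.

```python
import math


def single_language_states(lam, t):
    """Probabilities of situations I-IV at one atom in one language.

    I and II are the solutions quoted in the text; III and IV are the
    per-language factors implied by the N-language solution.
    """
    p = math.exp(-lam * t)
    q = 1.0 - p
    return {"I": p * p, "II": p * q, "III": p * q, "IV": q * q}


def first_order_same(lam, t):
    """First-order probability that two languages still share a cover."""
    return math.exp(-2.0 * lam * t)


def second_order_same(lam, t):
    """Second-order probability: exp(-2*lam*t) * [1 + (1 - exp(-lam*t))**2]."""
    p = math.exp(-lam * t)
    return p * p * (1.0 + (1.0 - p) ** 2)


lam = 1.0  # arbitrary unit rate (an assumption), so t is in units of 1/lam
for t in (0.1, 0.5, 1.0, 2.0, 5.0):
    states = single_language_states(lam, t)
    retained = states["I"] + states["II"]  # equals exp(-lam * t)
    ratio = second_order_same(lam, t) / first_order_same(lam, t)
    print(f"lam*t={t:3.1f}  retained={retained:.4f}  second/first={ratio:.4f}")
```

The printed ratio is the square-bracket term. It stays close to 1 while $\lambda t$ is small and approaches 2 only as $\lambda t$ grows large, which is the sense in which the two-language formula is not greatly modified except for very long times.
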
Conclusion

We have derived a number of formulas relating to the estimation of time depths by observations of lexical decay. The methods used can be applied to obtain many more similar formulas as required in studies of actual data. All of these formulas are based on models of lexical decay using the concept of semantic atoms and their lexical covers. Lexical decay is identified with a change in lexical cover. If the semantic atoms are sufficiently independent, the decay is a Poisson process.

Probably the most important practical conclusion is the result that any set of semantic atoms can be used to evaluate lexical decay provided the set is made up of atoms:

1. far enough removed in meaning from one another to assure independence,
2. which represent concepts assured to have been in existence throughout the time period being studied.

End Notes

(1) See Robert B. Lees, "The Basis of Glottochronology," Language, 29, 113-27 (1953).
(2) There is no outstanding study of this problem.
(3) Attempts to "improve" the test vocabulary by limiting it to meanings which have behaved well in earlier studies are methodologically disastrous because they bias the value of $\lambda$.
(4) This requirement is also intended to remove bias from the estimate of $\lambda$. This is a matter of classical philological research independent of statistical syntheses made from the results.