Zipf's law

Latest revision as of 13:00, 22 August 2024 (Thu). Last edit by White Moth DXX (minor edit to the See also section; summary: disambiguation).

| Zipf's law | |
|---|---|
| PMF plot | [Figure: probability mass function of Zipf's law for N = 10, both axes logarithmic; the horizontal axis is the index k. The function is defined only for integer k; the connecting lines do not mean the function is continuous.] |
| CDF plot | [Figure: cumulative distribution function of Zipf's law for N = 10, both axes logarithmic; the horizontal axis is the index k. The function is defined only for integer k; the connecting lines do not mean the function is continuous.] |
| Parameters | $s > 0$ (real); $N \in \{1, 2, 3, \ldots\}$ (positive integer) |
| Support | $k \in \{1, 2, \ldots, N\}$ |
| Probability mass function | $\frac{1}{k^s H_{N,s}}$ |
| Cumulative distribution function | $\frac{H_{k,s}}{H_{N,s}}$ |
| Mean | $\frac{H_{N,s-1}}{H_{N,s}}$ |
| Mode | $1$ |
| Entropy | $\frac{s}{H_{N,s}}\sum_{k=1}^{N}\frac{\ln k}{k^s} + \ln H_{N,s}$ |
| Moment-generating function | $\frac{1}{H_{N,s}}\sum_{n=1}^{N}\frac{e^{nt}}{n^s}$ |
| Characteristic function | $\frac{1}{H_{N,s}}\sum_{n=1}^{N}\frac{e^{int}}{n^s}$ |

Here $H_{N,s} = \sum_{n=1}^{N} n^{-s}$ is the generalized harmonic number.
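Zipf's law (IPA: /ˈzɪf/) is an empirical law published in 1949 by George Kingsley Zipf, a linguist at Harvard University. It states that in a natural-language corpus, the frequency of a word is inversely proportional to its rank in the frequency table. Thus the most frequent word occurs about twice as often as the second most frequent word, which in turn occurs about twice as often as the fourth most frequent word. The law also serves as a reference point for power-law probability distributions in general.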
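The following is a minimal sketch of the finite Zipf distribution. It assumes the standard normalized form $p(k) = 1/(k^s H_{N,s})$ with $H_{N,s} = \sum_{n=1}^{N} n^{-s}$. The function names are illustrative only. With $s = 1$, the sketch reproduces the ratios stated above: rank 1 is twice as likely as rank 2, and rank 2 is twice as likely as rank 4.

```python
def generalized_harmonic(n_max: int, s: float) -> float:
    """H_{N,s} = sum over n = 1..N of 1 / n**s (the normalizing constant)."""
    return sum(1.0 / n ** s for n in range(1, n_max + 1))


def zipf_pmf(k: int, s: float, n_max: int) -> float:
    """Probability of rank k (1 <= k <= n_max) under a Zipf law with exponent s."""
    if not 1 <= k <= n_max:
        return 0.0  # outside the support {1, ..., N}
    return 1.0 / (k ** s * generalized_harmonic(n_max, s))


if __name__ == "__main__":
    s, n_max = 1.0, 10
    probs = [zipf_pmf(k, s, n_max) for k in range(1, n_max + 1)]
    print(f"sum of probabilities: {sum(probs):.6f}")
    print(f"p(1)/p(2) = {probs[0] / probs[1]:.3f}")
    print(f"p(2)/p(4) = {probs[1] / probs[3]:.3f}")
```

Recomputing $H_{N,s}$ on every call keeps the sketch short. For many lookups, the constant would normally be computed once and reused.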
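Examples
The simplest example of Zipf's law is the “1/f function”. Take a set of Zipf-distributed frequencies, ranked from most to least common. The second most common item occurs 1/2 as often as the most common one, and the third most common occurs 1/3 as often. In general, the n-th most common item occurs 1/n as often as the most common one. This pattern cannot hold exactly, however, because every item must occur a whole number of times: a word cannot occur 2.5 times.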
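The idealized 1/f pattern can be sketched by taking a hypothetical count for the most common item and dividing it by each rank. The starting count of 5 used here is an arbitrary example, and the helper names are hypothetical. The example shows why the ideal values have to be rounded: 5/2 = 2.5 is not a possible word count. Python's `round` sends exact halves to the nearest even integer, so 2.5 becomes 2.

```python
from fractions import Fraction


def ideal_zipf_counts(top_count: int, n_ranks: int) -> list[Fraction]:
    """Ideal 1/f counts: the item at rank n occurs top_count / n times (exact fractions)."""
    return [Fraction(top_count, n) for n in range(1, n_ranks + 1)]


def observable_counts(top_count: int, n_ranks: int) -> list[int]:
    """Real counts must be whole numbers, so round each ideal value.

    round() on a Fraction rounds exact halves to the nearest even integer.
    """
    return [round(c) for c in ideal_zipf_counts(top_count, n_ranks)]


if __name__ == "__main__":
    print([str(c) for c in ideal_zipf_counts(5, 4)])  # exact values 5, 5/2, 5/3, 5/4
    print(observable_counts(5, 4))                    # nearest whole numbers
```

Exact `Fraction` values keep the ideal ratios free of floating-point error, so the only deviation from 1/n comes from the rounding step.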
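In the Brown Corpus, “the”, “of” and “and” are the three most frequent words. They occur 69971, 36411 and 28852 times respectively, which is about 7%, 3.6% and 2.9% of the corpus's roughly one million words. Their ratio, about 6 : 3.1 : 2.5, is close to the 6 : 3 : 2 (that is, 1 : 1/2 : 1/3) predicted by Zipf's law. Just 135 vocabulary items account for half of the Brown Corpus.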
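The arithmetic behind these figures can be checked with a few lines of code. The counts are the ones quoted above. The corpus size of 1,000,000 words is the approximate figure used in the text, not an exact token count, and the function names are illustrative.

```python
# Counts of the three most frequent words in the Brown Corpus, as quoted in the text.
counts = {"the": 69971, "of": 36411, "and": 28852}
CORPUS_SIZE = 1_000_000  # approximate corpus size assumed in the text


def shares(word_counts: dict[str, int], total: int) -> dict[str, float]:
    """Percentage of the corpus taken up by each word."""
    return {w: 100.0 * c / total for w, c in word_counts.items()}


def ratios_vs_zipf(word_counts: dict[str, int]) -> list[tuple[str, float, float]]:
    """For each word: (word, count relative to the top word, Zipf's prediction 1/rank)."""
    ranked = sorted(word_counts.items(), key=lambda item: item[1], reverse=True)
    top = ranked[0][1]
    return [(w, c / top, 1 / rank) for rank, (w, c) in enumerate(ranked, start=1)]


if __name__ == "__main__":
    for w, pct in shares(counts, CORPUS_SIZE).items():
        print(f"{w}: {pct:.1f}%")
    for w, observed, predicted in ratios_vs_zipf(counts):
        print(f"{w}: observed {observed:.2f}, Zipf 1/r {predicted:.2f}")
```

The observed ratio for “of” (about 0.52) is very close to the predicted 1/2. The ratio for “and” (about 0.41) is above the predicted 1/3, which is one reason the law is described as approximate.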
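Zipf's law is an empirical law rather than a theoretical one. It can be observed in many rankings outside linguistics, such as city populations in different countries, company sizes and income rankings, but its underlying cause remains a matter of debate. The law is easy to check on a scatter plot whose axes are the logarithms (log) of rank and of frequency. For example, “the” would be plotted at x = log(1), y = log(69971). If the points lie close to a straight line, the data follow Zipf's law. For frequencies proportional to $1/k^s$, that line has slope −s, which is about −1 in the classic case.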
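The log–log check described above can be made quantitative with an ordinary least-squares fit of log(frequency) against log(rank). Under Zipf's law, the fitted slope should be close to −s. The sketch below uses only the standard library, and the function name is hypothetical. A straight-line fit like this is a quick illustrative check, not a rigorous statistical test for a power law.

```python
import math


def loglog_slope(frequencies: list[float]) -> float:
    """Least-squares slope of log(frequency) versus log(rank).

    `frequencies` must be sorted from most to least frequent, so the item at
    list index i has rank i + 1. Under Zipf's law the slope is close to -s.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


if __name__ == "__main__":
    # Brown Corpus counts for "the", "of", "and" quoted in the text.
    print(f"slope (3 words): {loglog_slope([69971, 36411, 28852]):.2f}")
    # Synthetic data that follow 1/k exactly give a slope of -1.
    print(f"slope (ideal):   {loglog_slope([1000 / k for k in range(1, 51)]):.2f}")
```

On the three Brown Corpus words, the fitted slope comes out at roughly −0.8, somewhat shallower than −1. Three points are far too few for a meaningful estimate. In practice, the whole frequency table would be used.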
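Phenomena that follow the law
- Frequency of words: English words or Chinese characters, both in a corpus as a whole and within a single article
- Frequency of web page visits
- Relationship between town population and town rank
- Incomes of the top 3% of earners
- Earthquake magnitudes
- Fragment sizes when a solid breaks apart
See also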
Further reading
Primary:
- George K. Zipf (1949) Human Behavior and the Principle of Least Effort. Addison-Wesley.
- George K. Zipf (1935) The Psychobiology of Language. Houghton-Mifflin. (See citations at http://citeseer.ist.psu.edu/context/64879/0)
Secondary:
- Lada Adamic. Zipf, Power-laws, and Pareto - a ranking tutorial. http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html (archived copy at the Internet Archive)
- Alexander Gelbukh and Grigori Sidorov (2001) "Zipf and Heaps Laws’ Coefficients Depend on Language" (archived copy at the Internet Archive). Proc. CICLing-2001, Conference on Intelligent Text Processing and Computational Linguistics, February 18–24, 2001, Mexico City. Lecture Notes in Computer Science N 2004, ISSN 0302-9743, ISBN 3-540-41687-0, Springer-Verlag: 332–335.
- Damián H. Zanette (2006) "Zipf's law and the creation of musical context," Musicae Scientiae 10: 3–18.
- Kali R. (2003) "The city as a giant component: a random graph approach to Zipf's law," Applied Economics Letters 10: 717–720(4)
- Gabaix, Xavier. Zipf's Law for Cities: An Explanation (PDF). Quarterly Journal of Economics. August 1999, 114 (3): 739–67 [accessed 2014-02-05]. ISSN 0033-5533. doi:10.1162/003355399556133. (Archived (PDF) from the original on 2021-02-24.)
- Axtell, Robert L; Zipf distribution of US firm sizes (archived copy at the Internet Archive), Science, 293, 5536, 1818, 2001, American Association for the Advancement of Science
External links
- Strogatz, Steven. Guest Column: Math and the City. The New York Times. 2009-05-29 [accessed 2009-05-29]. (Archived from the original on 2015-09-27.) An article on Zipf's law applied to city populations.
- Seeing Around Corners (Artificial societies turn up Zipf's law) (archived copy at the Internet Archive)
- PlanetMath article on Zipf's law (archived copy at the Internet Archive)
- Distributions de type "fractal parabolique" dans la Nature (French, with English summary) (archived copy at the Internet Archive)
- An analysis of income distribution (archived copy at the Internet Archive)
- Zipf List of French words
- Zipf list for English, French, Spanish, Italian, Swedish, Icelandic, Latin, Portuguese and Finnish from Gutenberg Project and online calculator to rank words in texts
- Citations and the Zipf–Mandelbrot's law (archived copy at the Internet Archive)
- Zipf's Law for U.S. Cities (archived copy at the Internet Archive) by Fiona Maclachlan, Wolfram Demonstrations Project.
- Weisstein, Eric W. Zipf's Law. MathWorld.
- Zipf's Law examples and modelling (1985)
- Complex systems: Unzipping Zipf's law (2011) (archived copy at the Internet Archive)
- Benford’s law, Zipf’s law, and the Pareto distribution (archived copy at the Internet Archive) by Terence Tao.