Readability: Difference between revisions

{{Reading}}
 
'''Readability''' is the ease with which a [[Reading (process)|reader]] can [[Understanding|understand]] a [[Writing|written text]]. The concept exists in both [[natural language]] and [[programming language]]s, though in different forms. In [[natural language]], the readability of text depends on its [[Content (media)|content]] (the complexity of its [[vocabulary]] and [[syntax]]) and its presentation (such as [[Typography|typographic]] aspects that affect [[legibility]], like [[font size]], [[line height]], [[Kerning|character spacing]], and [[line length]]).<ref>{{Cite web |date=8 May 2013 |title=Typographic Readability and Legibility |url=https://webdesign.tutsplus.com/articles/typographic-readability-and-legibility--webdesign-12211 |access-date=2020-08-17 |website=Web Design Envato Tuts+}}</ref> In [[Computer programming|programming]], things such as programmer [[Comment (computer programming)|comments]], choice of [[Control flow#Loops|loop]] structure, and [[Naming convention (programming)|choice of names]] can determine [[Computer programming#Readability of source code|the ease with which humans can read computer program code]].
 
Higher readability in a text reduces reading effort and increases reading speed for the general population of readers. For those who do not have high [[reading comprehension]], readability is necessary for understanding and applying a given text. Techniques for simplifying text are essential to communicating information to the intended audience.<ref>{{Cite journal |date=2023 |title=Text Simplification to Specific Readability Levels |journal=Mathematics |volume=11 |issue=9 |language=English |pages=2063 |doi=10.3390/math11092063|id={{ProQuest|2812618745}} |doi-access=free }}</ref> Whether writing code, news, or stories, every writer has a target audience and must adjust the readability level to suit it.
 
== Definition ==
Different definitions of readability exist from various sources. The term "readability" is '''inherently broad''' and can become confusing when examining all of the possible definitions.<ref name="DaleChall3">Dale, Edgar and Jeanne S. Chall. 1949. "The concept of readability." ''Elementary English'' 26:23.</ref> Readability is a concept that involves '''audience, content, quality, legibility,''' and can even involve the '''formatting''' and '''design structure''' of any given text.<ref name="Harris19953">Harris, Theodore L. and Richard E. Hodges, eds. 1995. ''The Literacy Dictionary, The Vocabulary of Reading and Writing.'' Newark, DE: International Reading Assn.</ref> Therefore, the definition can vary with the audience to whom a given type of content is presented. For example, a technical writer might focus on clear and concise language and on formatting that allows easy reading. In contrast, a scholarly journal would use sophisticated writing that appeals to and makes sense to its intended audience.
 
== Applications ==
Readability is essential to the '''clarity''' and '''accessibility''' of texts used in classrooms, work environments, and everyday life. The government also prioritizes readability through Plain Language Laws, which require important documents to be written at an 8th-grade level.<ref>{{Cite journal |last=Fusaro |first=Joseph A. |date=September 1988 |title=Applying statistical rigor to a validation study of the fry readability graph |url=http://dx.doi.org/10.1080/19388078809557957 |journal=Reading Research and Instruction |volume=28 |issue=1 |pages=44–48 |doi=10.1080/19388078809557957 |issn=0886-0246}}</ref>
 
Much research has focused on matching prose to reading skill, resulting in formulas for use in research, government, teaching, publishing, the military, medicine, and business.<ref name="Fryuse">Fry, E. B. 1986. ''Varied uses of readability measurement.'' Paper presented at the 31st Annual Meeting of the International Reading Association, Philadelphia, PA.</ref><ref name="Rabin">Rabin, A. T. 1988 "Determining difficulty levels of text written in languages other than English." In ''Readability: Its past, present, and future,'' eds. B. L. Zakaluk and S. J. Samuels. Newark, DE: International Reading Association.</ref>
 
===Readability and newspaper readership===
Several studies in the 1940s showed that even small increases in readability greatly increase readership in large-circulation newspapers.
 
In 1947, Donald Murphy of ''Wallace's Farmer'' used a split-run<ref name="Murphy2">Murphy, D. 1947. "How plain talk increases readership 45% to 60%." ''Printer's ink.'' 220:35–37.</ref> edition to study the effects of making text easier to read. He found that reducing from the 9th to the 6th-grade reading level increased readership by 43% for an article about 'nylon'. He also found a 60% increase in readership for an article on corn, with better responses from people under 35.<ref name="Murphy2" /> The result was a gain of 42,000 readers in a circulation of 275,000.
 
Wilbur Schramm, who directed the Communications Research program at the University of Illinois, interviewed 1,050 newspaper readers in 1947. He found that an easier reading style helps to determine how much of an article is read. This was called '''reading persistence, depth, or perseverance'''. He also found that people will read less of long articles than of short ones. For example, a story nine paragraphs long will lose 3 out of 10 readers by the fifth paragraph. In contrast, a shorter story will lose only 2 out of 10 readers.<ref name="Schramm">Schramm, W. 1947. "Measuring another dimension of newspaper readership." ''Journalism quarterly'' 24:293–306.</ref>
 
A study in 1947 by Melvin Lostutter showed that newspapers were generally written at a level five years above the ability of average American adult readers. The reading ease of newspaper articles was not found to have much connection with the education, experience, or personal interest of the journalists writing the stories. It instead had more to do with the convention and culture of the industry. Lostutter argued for more readability testing in newspaper writing. Improved readability must be a "conscious process somewhat independent of the education and experience of the staff writers."<ref name="Lostutter">Lostutter, M. 1947. "Some critical factors in newspaper readability." ''Journalism quarterly'' 24:307–314.</ref>
Both Rudolf Flesch and Robert Gunning worked extensively with newspapers and the wire services in improving readability. Mainly through their efforts in a few years, the readability of US newspapers went from the 16th to the 11th-grade level, where it remains today.
 
The two publications with the largest circulations, ''TV Guide'' (13 million) and ''Reader's Digest'' (12 million), are written at the 9th-grade level.<ref name="DuBay">DuBay, W. H. 2006. ''Smart language: Readers, Readability, and the Grading of Text''. Costa Mesa:Impact Information.</ref> The most popular novels are written at the 7th-grade level. This supports the fact that the average adult reads at the 9th-grade level. It also shows that, for recreation, people read texts that are two grades below their actual reading level.<ref name="KlareBuckKlareBuck3"/>
 
==Early research==
In the 1880s, English professor L. A. Sherman found that the English sentence was getting shorter. In [[Elizabethan]] times, the average sentence was 50 words long while in Sherman's modern time, it was 23 words long.
 
In 1921, Harry D. Kitson published ''The Mind of the Buyer'', one of the first books to apply psychology to marketing. Kitson's work showed that each type of reader bought and read their own type of text. On reading two newspapers and two magazines, he found that short sentence length and short [[word length]] were the best contributors to reading ease.<ref name="Kitson">Kitson, Harry D. 1921. ''The Mind of the Buyer.'' New York: Macmillan.</ref>
 
==Text leveling==
The earliest reading ease assessment is the subjective judgment termed '''text leveling'''. Formulas do not fully address the various content, purpose, design, visual input, and organization of a text.<ref name="Clay">Clay, M. 1991. ''Becoming literate: The construction of inner control.'' Portsmouth, NH: Heinneman.</ref><ref name="frylevel">Fry, E. B. 2002. "Text readability versus leveling." ''Reading Teacher'' 56 no. 23:286–292.</ref> Text leveling is commonly used to rank the reading ease of texts in areas where reading difficulties are easy to identify, such as books for young children. At higher levels, ranking reading ease becomes more difficult, as individual difficulties become harder to identify. This has led to better ways to assess reading ease.
 
==Vocabulary frequency lists==
In the 1920s, the scientific movement in education looked for tests to measure students' achievement to aid in curriculum development. Teachers and educators had long known that, to improve reading skill, readers—especially beginning readers—need reading material that closely matches their ability. University-based psychologists did much of the early research, which was later taken up by textbook publishers.<ref name="fry">Fry, Edward B. 2006. "Readability." ''Reading Hall of Fame Book.'' Newark, DE: International Reading Assn.</ref>
 
Educational psychologist [[Edward Thorndike]] of Columbia University noted that, in Russia and Germany, teachers used word frequency counts to match books to students. Word skill was the best sign of intellectual development, and the strongest predictor of reading ease. In 1921, Thorndike published ''Teachers Word Book'', which contained the [[word frequency|frequencies]] of 10,000 words.<ref>Thorndike E.L. 1921 ''The teacher's word book''. 1932 ''A teacher's word book of the twenty thousand words found most frequently and widely in general reading for children and young people''. 1944 (with J.E. Lorge) ''The teacher's word book of 30,000 words''.</ref> It made it easier for teachers to choose books that matched class reading skills. It also provided a basis for future research on reading ease.
 
Until computers came along, word frequency lists were the best aids for grading reading ease of texts.<ref name="KlareBuckKlareBuck3">Klare, G. R. and B. Buck. 1954. ''Know Your Reader: The scientific approach to readability.'' New York: Heritage House.</ref> In 1981 the World Book Encyclopedia listed the grade levels of 44,000 words.<ref name="livingword">Dale, E. and J. O'Rourke. 1981. ''The living word vocabulary: A national vocabulary inventory.'' World Book-Childcraft International.</ref> A popular strategy among educators today is "incidental vocabulary learning," which emphasizes efficient short-term vocabulary learning rather than drilling words and meanings in the hope that they will stick.<ref>{{Cite journal |last=He |first=Shumin |date=2023 |title=Exploration of Incidental Vocabulary Learning Strategies from Different Modes to Acquire Vocabulary |journal=The Educational Review, USA |volume=7 |issue=7 |language=English |pages=927–932 |doi=10.26855/er.2023.07.014|id={{ProQuest|2866467078}} |doi-access=free }}</ref> The incidental learning tactic is meant to help learners build comprehension and learning skills rather than memorize words. Through this strategy, students would ideally be able to navigate various levels of readability using context clues and comprehension.
 
==Early children's readability formulas==
In 1923, Bertha A. Lively and [[Sidney L. Pressey]] published the first reading ease formula. They were concerned that junior high school science textbooks had so many technical words that teachers would spend all class time explaining them. They argued that their formula would help to measure and reduce the "vocabulary burden" of textbooks. Their formula used five variable inputs and six constants. For each thousand words, it counted the number of unique words, the number of words not on the Thorndike list, and the median index number of the words found on the list. Manually, it took three hours to apply the formula to a book.<ref name="Lively">Lively, Bertha A. and S. L. Pressey. 1923. "A method for measuring the 'vocabulary burden' of textbooks." ''Educational administration and supervision'' 9:389–398.</ref>
 
In 1934, Edward Thorndike published his formula. He wrote that word skills can be increased if the teacher introduces new words and repeats them often.<ref name="Thorn2">Thorndike, E. 1934. "Improving the ability to read." ''Teachers college record'' 36:1–19, 123–44, 229–41. October, November, December.</ref> In 1931, W. W. Patty and W. I. Painter published a formula for measuring the vocabulary burden of textbooks. This was the last of the early formulas that used the Thorndike vocabulary-frequency list.<ref name="Patty">Patty, W. W. and W. I. Painter. 1931. "A technique for measuring the vocabulary burden of textbooks." ''Journal of educational research'' 24:127–134.</ref>
 
==Early adult readability formulas==
During the recession of the 1930s, the U.S. government invested in [[adult education]]. In 1931, [[Douglas Waples]] and [[Ralph W. Tyler|Ralph Tyler]] published ''What Adults Want to Read About.'' It was a two-year study of adult reading interests. Their book showed not only what people read but what they would like to read. They found that many readers lacked suitable reading materials: they would have liked to learn, but the reading materials were too hard for them.<ref name="Waples">Waples, D. and R. Tyler. 1931. ''What adults want to read about.'' Chicago: University of Chicago Press.</ref>
 
[[Lyman Bryson]] of [[Teachers College, Columbia University]] found that many adults had poor reading ability due to poor education. Even though [[college]]s had long tried to teach how to write in a clear and readable style, Bryson found that it was rare. He wrote that such language is the result of a "...[[discipline]] and artistry that few people who have ideas will take the trouble to achieve... If simple language were easy, many of our problems would have been solved long ago."<ref name="KlareBuckKlareBuck3"/> Bryson helped set up the Readability Laboratory at the college. Two of his students were Irving Lorge and [[Rudolf Flesch]].
 
In 1934, Ralph Ojemann investigated adult reading skills, factors that most directly affect reading ease, and causes of each level of difficulty. He did not invent a formula, but a method for assessing the difficulty of materials for [[parent education]]. He was the first to assess the validity of this method by using 16 magazine passages tested on actual readers. He evaluated 14 measurable and three reported factors that affect reading ease.
* Used vocabulary and sentence length in formulas to predict reading ease
 
==Popular readability formulas==

===The Flesch formulas===
{{Main|Flesch–Kincaid readability tests}}
In 1943, Rudolf Flesch published his PhD dissertation, ''Marks of a Readable Style'', which included a readability formula to predict the difficulty of adult reading material. Investigators in many fields began using it to improve communications. One of the variables it used was ''personal references,'' such as names and personal pronouns. Another variable was ''affixes''.<ref name="FleschStyle">Flesch, R. "Marks of a readable style." ''Columbia University contributions to education,'' no. 187. New York: Bureau of Publications, Teachers College, Columbia University.</ref>
In 1975, in a project sponsored by the U.S. Navy, the Reading Ease formula was recalculated to give a grade-level score. The new formula is now called the [[Flesch–Kincaid readability tests|Flesch–Kincaid grade-level]] formula.<ref name="Kincaid">Kincaid, J. P., R. P. Fishburne, R. L. Rogers, and B. S. Chissom. 1975. ''Derivation of new readability formulas (Automated Readability Index, Fog Count, and Flesch Reading Ease Formula) for Navy enlisted personnel.'' CNTECHTRA Research Branch Report 8-75.</ref> The Flesch–Kincaid formula is one of the most popular and heavily tested formulas. It correlates 0.91 with comprehension as measured by reading tests.<ref name="DuBay"/>
 
===The Dale–Chall formula===
{{Main|Dale–Chall readability formula}}
[[Edgar Dale]], a professor of education at Ohio State University, was one of the first critics of Thorndike's vocabulary-frequency lists. He claimed that they did not distinguish between the different meanings that many words have. He created two new lists of his own. One, his "short list" of 769 easy words, was used by Irving Lorge in his formula. The other was his "long list" of 3,000 easy words, which were understood by 80% of fourth-grade students. However, one has to extend the word lists by regular plurals of nouns, regular forms of the past tense of verbs, progressive forms of verbs etc. In 1948, he incorporated this list into a formula he developed with [[Jeanne Chall|Jeanne S. Chall]], who later founded the Harvard Reading Laboratory.
Raw score = 64 - 0.95 *(PDW) - 0.69 *(ASL)
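The raw-score expression above can be sketched in Python. This is a minimal illustration, not a reference implementation: it assumes PDW is the percentage of words not found on an easy-word list and ASL is the average sentence length in words, since the excerpt does not define either symbol. The tokenizer, the sentence splitter, and the tiny <code>easy_words</code> set are hypothetical stand-ins for the real 3,000-word list.

```python
import re


def dale_chall_raw_score(text, easy_words):
    """Raw score = 64 - 0.95 * PDW - 0.69 * ASL (PDW and ASL as assumed above)."""
    # Naive tokenization: runs of letters/apostrophes are words,
    # and ., ! or ? end a sentence. Real tools are more careful.
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")

    difficult = [w for w in words if w not in easy_words]
    pdw = 100.0 * len(difficult) / len(words)  # percentage of difficult words
    asl = len(words) / len(sentences)          # average sentence length
    return 64 - 0.95 * pdw - 0.69 * asl


# Hypothetical miniature easy-word list, for illustration only.
easy_words = {"the", "cat", "sat", "dog"}
score = dale_chall_raw_score("The cat sat. The dog ran.", easy_words)
```

In this sketch a higher raw score corresponds to easier text, because both difficult words and long sentences are subtracted from the constant.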
 
===The Gunning fog formula===
{{Main|Gunning fog index}}
In the 1940s, Robert Gunning helped bring readability research into the workplace. In 1944, he founded the first readability consulting firm dedicated to reducing the "fog" in newspapers and business writing. In 1952, he published ''The Technique of Clear Writing'' with his own Fog Index, a formula that correlates 0.91 with comprehension as measured by reading tests.<ref name="DuBay"/> The formula is one of the most reliable and simplest to apply:
:Grade level= 0.4 * ( (average sentence length) + (percentage of Hard Words) )
 
:Where: Hard Words = words with more than two syllables.<ref name="GunningGunning2">Gunning, R. 1952. ''The Technique of Clear Writing''. New York: McGraw–Hill.</ref>
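As a rough illustration, the Fog Index can be computed as follows. The formula and the definition of hard words come from the text above; the syllable counter is an assumption (a simple vowel-group heuristic, which miscounts some English words), as are the naive word and sentence splitters.

```python
import re


def count_syllables(word):
    """Heuristic (assumed, not Gunning's own method): count vowel groups."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def gunning_fog(text):
    """Grade level = 0.4 * (average sentence length + percentage of hard words)."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")

    # Hard words: more than two syllables, per the definition above.
    hard_words = [w for w in words if count_syllables(w) > 2]
    average_sentence_length = len(words) / len(sentences)
    percent_hard = 100.0 * len(hard_words) / len(words)
    return 0.4 * (average_sentence_length + percent_hard)


grade = gunning_fog("The cat sat on the mat. Readability is important.")
```

A dictionary-based syllable count would be more accurate than the heuristic used here; the structure of the calculation would stay the same.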
 
===Fry readability graph===
{{Main|Fry readability formula}}
In 1963, while teaching English teachers in Uganda, Edward Fry developed his [[Fry readability formula|Readability Graph]]. It became one of the most popular formulas and easiest to apply.<ref name="Fry">Fry, E. B. 1963. ''Teaching faster reading''. London: Cambridge University Press.</ref><ref name="Fry2">Fry, E. B. 1968. "A readability formula that saves time." '' Journal of reading '' 11:513–516.</ref> The Fry Graph correlates 0.86 with comprehension as measured by reading tests.<ref name="DuBay"/>
 
===McLaughlin's SMOG formula===
{{Main|SMOG}}
Harry McLaughlin determined that word length and sentence length should be multiplied rather than added as in other formulas. In 1969, he published his SMOG (Simple Measure of Gobbledygook) formula:
:Where: polysyllable count = number of words of more than two syllables in a sample of 30 sentences.<ref name="McLaughlin1969">McLaughlin, G. H. 1969. "SMOG grading-a new readability formula." ''Journal of reading'' 22:639–646.</ref>
 
The SMOG formula correlates 0.88 with comprehension as measured by reading tests.<ref name="DuBay"/> It is often recommended for use in healthcare.<ref name="Doak">Doak, C. C., L. G. Doak, and J. H. Root. 1996. ''Teaching patients with low literacy skills''. Philadelphia: J. B. Lippincott & Co.</ref>
 
===FORCAST formula<span class="anchor" id="The FORCAST formula"></span>===
In 1973, a study commissioned by the US military of the reading skills required for different military jobs produced the FORCAST formula. Unlike most other formulas, it uses only a vocabulary element, making it useful for texts without complete sentences. The formula satisfied requirements that it would be:
* Based on Army-job reading materials.
The FORCAST formula correlates 0.66 with comprehension as measured by reading tests.<ref name="DuBay"/>
 
=== The Golub Syntactic Density Score ===
The Golub Syntactic Density Score was developed by Lester Golub in 1974. It is among a smaller subset of readability formulas that concentrate on the syntactic features of a text. To calculate the reading level of a text, a sample of several hundred words is taken from the text. The number of words in the sample is counted, as are the number of T-units. A T-unit is defined as an independent clause and any dependent clauses attached to it. Other syntactical units are then counted and entered into the following table:
 
|}
 
===Measuring coherence and organization===
For centuries, teachers and educators have seen the importance of organization, coherence, and emphasis in good writing. Beginning in the 1970s, cognitive theorists began teaching that reading is really an act of thinking and organization. The reader constructs meaning by mixing new knowledge into existing knowledge. Because of the limits of the reading ease formulas, some research looked at ways to measure the content, organization, and coherence of text. Although this did not improve the reliability of the formulas, their efforts showed the importance of these variables in reading ease.
 
* Difficult concepts;<ref name="Chall"/>
* Idea density;<ref name="Dolch">Dolch. E. W. 1939. "Fact burden and reading difficulty." ''Elementary English review'' 16:135–138.</ref>
* Human interest;<ref name="GunningGunning2"/><ref name="Fleschwrite">{{cite book |last=Flesch |first=R. |author-link=Rudolf Flesch |year=1949 |title=The Art of Readable Writing |location=New York |publisher=Harper |oclc=318542}}</ref>
* Nominalization;<ref name="ColemanBlu">Coleman, E. B. and P. J. Blumenfeld. 1963. "Cloze scores of nominalization and their grammatical transformations using active verbs." ''Psychology reports'' 13:651–654.</ref>
* Active and passive voice;<ref name="Gough">Gough, P. B. 1965. "Grammatical transformations and the speed of understanding." ''Journal of verbal learning and verbal behavior'' 4:107–111.</ref><ref name="Coleman">Coleman, E. B. 1966. "Learning of prose written in four grammatical transformations." ''Journal of Applied Psychology'' 49:332–341.</ref><ref name="Clark">Clark, H. H. and S. E. Haviland. 1977. "Comprehension and the given-new contract." In ''Discourse production and comprehension,'' ed. R. O. Freedle. Norwood, NJ: Ablex Press, pp. 1–40.</ref><ref name="Hornby">Hornby, P. A. 1974. "Surface structure and presupposition." ''Journal of verbal learning and verbal behavior'' 13:530–538.</ref>
*Document age.<ref name="Jatowt">Jatowt, A. and K. Tanaka. 2012. "Longitudinal analysis of historical texts' readability." ''Proceedings of Joint Conference on Digital Libraries 2012'' 353-354</ref>
 
==Advanced readability formulas==

===The John Bormuth formulas===
John Bormuth of the University of Chicago looked at reading ease using the new [[Cloze test|Cloze deletion test]] developed by Wilson Taylor. His work supported earlier research including the degree of reading ease for each kind of reading. The best level for classroom "assisted reading" is a slightly difficult text that causes a "set to learn", and for which readers can correctly answer 50% of the questions of a multiple-choice test. The best level for unassisted reading is one for which readers can correctly answer 80% of the questions. These cutoff scores were later confirmed by Vygotsky<ref name="Vygotsky">Vygotsky, L. 1978. ''Mind in society.'' Cambridge, MA: Harvard University Press.</ref> and Chall and Conard.<ref name="ChallConard">Chall, J. S. and S. S. Conard. 1991. ''Should textbooks challenge students? The case for easier or harder textbooks.'' New York: Teachers College Press.</ref>
Among other things, Bormuth confirmed that vocabulary and sentence length are the best indicators of reading ease. He showed that the measures of reading ease worked as well for adults as for children. Adults find hard the same things that children of the same reading level find hard. He also developed several new measures of cutoff scores. One of the best known was the ''Mean Cloze Formula'', which was used in 1981 to produce the ''Degree of Reading Power'' system used by the College Entrance Examination Board.<ref name="Bormuth">Bormuth, J. R. 1966. "Readability: A new approach." ''Reading research quarterly'' 1:79–132.</ref><ref name="Bormuth2">Bormuth, J. R. 1969. ''Development of readability analysis'': Final Report, Project no 7-0052, Contract No. OEC-3-7-0070052-0326. Washington, D. C.: U. S. Office of Education, Bureau of Research, U. S. Department of Health, Education, and Welfare.</ref><ref name="Bormuth3">Bormuth, J. R. 1971. ''Development of standards of readability: Towards a rational criterion of passage performance.'' Washington, D. C.: U. S. Office of Education, Bureau of Research, U. S. Department of Health, Education, and Welfare.</ref>
 
===The Lexile framework===
In 1988, Jack Stenner and his associates at MetaMetrics, Inc. published a new system, the [[Lexile|Lexile Framework]], for assessing readability and matching students with appropriate texts.
 
The Lexile framework uses average sentence length, and average word frequency in the American Heritage Intermediate Corpus to predict a score on a 0–2000 scale. The AHI Corpus includes five million words from 1,045 published works often read by students in grades three to nine.{{Citation needed|date=February 2024}}<!--what's this trying to say?-->
 
The Lexile Book Database has more than 100,000 titles from more than 450 publishers. By knowing a student's Lexile score, a teacher can find books that match his or her reading level.<ref name="Stenner">Stenner, A. J., I Horabin, D. R. Smith, and R. Smith. 1988. ''The Lexile Framework.'' Durham, NC: Metametrics.</ref>
 
===ATOS Readability Formula for Books===
*Feedback and interaction with the teacher are the most important factors in reading.<ref name="atos">School Renaissance Institute. 2000. ''The ATOS readability formula for books and how it compares to other formulas.'' Madison, WI: School Renaissance Institute, Inc.</ref><ref name="Paul">Paul, T. 2003. ''Guided independent reading.'' Madison, WI: School Renaissance Institute, Inc. [http://www.renlearn.com/GIRP2008.pdf http://www.renlearn.com/GIRP2008.pdf]</ref>
 
===CohMetrix psycholinguistics measurements===
[[Coh-Metrix]] can be used in many different ways to investigate the cohesion of the explicit text and the coherence of the mental representation of the text. "Our definition of [[Cohesion (linguistics)|cohesion]] consists of characteristics of the explicit text that play some role in helping the reader mentally connect ideas in the text."<ref name="graesser2003">{{Citation | last1 = Graesser | first1 = A.C. | last2 = McNamara | first2 = D.S. | last3 = Louwerse | first3 = M.M. | editor-last = Sweet | editor-first = A.P. | editor2-last = Snow | editor2-first = C.E. | year = 2003 | title = What do readers need to learn in order to process coherence relations in narrative and expository text | work = Rethinking reading comprehension | publisher = Guilford Publications | publication-place = New York | pages = 82–98}}</ref> The definition of coherence is the subject of much debate. Theoretically, the coherence of a text is defined by the interaction between linguistic representations and knowledge representations. While coherence can be defined as characteristics of the text (i.e., aspects of cohesion) that are likely to contribute to the coherence of the mental representation, Coh-Metrix measurements provide indices of these cohesion characteristics.<ref name="graesser2003"/>
 
==Other formulas==
 
*[[Automated readability index]] (1967)
*[[Spache readability formula]] (1952)
 
==Artificial intelligence (AI) approach==
Unlike the traditional readability formulas, [[AI|artificial intelligence]] approaches to readability assessment (also known as '''Automatic Readability Assessment''') incorporate myriad linguistic features and construct statistical prediction models to predict text readability.<ref name="Text Readability Assessment for Sec">{{cite journal |last1=Xia |first1=Menglin |last2=Kochmar |first2=Ekaterina |last3=Briscoe |first3=Ted |date=June 2016 |title=Text Readability Assessment for Second Language Learners |url=https://www.aclweb.org/anthology/W16-0502 |journal=Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications |pages=12–22 |arxiv=1906.07580 |doi=10.18653/v1/W16-0502 |doi-access=free}}</ref><ref name="aclweb.org">{{cite journal |last1=Lee |first1=Bruce W. |last2=Lee |first2=Jason |title=LXPER Index 2.0: Improving Text Readability Assessment Model for L2 English Students in Korea |journal=Proceedings of the 6th Workshop on Natural Language Processing Techniques for Educational Applications |date=Dec 2020 |pages=20–24 |arxiv=2010.13374 |url=https://www.aclweb.org/anthology/2020.nlptea-1.3}}</ref> These approaches typically involve three components: (1) a training corpus of individual texts, (2) a set of linguistic features to be computed from each text, and (3) a [[machine learning]] model to predict the readability, using the computed linguistic feature values.<ref>{{cite journal |last1=Feng |first1=Lijun |last2=Jansche |first2=Martin |last3=Huernerfauth |first3=Matt |last4=Elhadad |first4=Noémie |title=A Comparison of Features for Automatic Readability Assessment |journal=Coling 2010: Posters |date=August 2010 |pages=276–284 |url=https://www.aclweb.org/anthology/C10-2032}}</ref><ref name="On Improving the Accuracy of Readab">{{cite journal |last1=Vajjala |first1=Sowmya |last2=Meurers |first2=Detmar |title=On Improving the Accuracy of Readability Classification using Insights from Second Language Acquisition |journal=Proceedings of the Seventh Workshop on Building Educational Applications Using NLP |date=June 2012 |pages=163–173 |url=https://www.aclweb.org/anthology/W12-2019}}</ref><ref name="aclweb.org"/>
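The corpus, feature, and model parts described above can be illustrated with a deliberately small Python sketch. Everything in it is an assumption made for illustration: the four-text corpus, the "easy" and "hard" labels, the two surface features, and the nearest-centroid classifier standing in for the far richer features and models used in the cited work.

```python
import re


def features(text):
    """Two toy linguistic features: average sentence length and average word length."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    return (len(words) / len(sentences),
            sum(len(w) for w in words) / len(words))


def train(corpus):
    """Average the feature vectors per label (a nearest-centroid model)."""
    grouped = {}
    for text, label in corpus:
        grouped.setdefault(label, []).append(features(text))
    return {label: tuple(sum(col) / len(col) for col in zip(*vectors))
            for label, vectors in grouped.items()}


def predict(model, text):
    """Return the label whose centroid is closest to the text's features."""
    f = features(text)
    return min(model,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(f, model[label])))


# 1. Training corpus (hypothetical, hand-labelled).
corpus = [
    ("The cat sat. The dog ran.", "easy"),
    ("I like my red hat. It is big.", "easy"),
    ("Institutional considerations frequently complicate administrative procedures considerably.", "hard"),
    ("Comprehensive documentation necessitates extraordinarily meticulous terminological standardization.", "hard"),
]
# 2. Features are computed inside train() and predict().
# 3. Model: per-label centroids.
model = train(corpus)
label = predict(model, "We see the sun. It is hot.")
```

A realistic system would replace each part: a leveled corpus with thousands of texts, dozens or hundreds of features, and a trained statistical model evaluated on held-out data.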
 
===Corpora===
Wei Xu ([[University of Pennsylvania]]), Chris Callison-Burch ([[University of Pennsylvania]]), and Courtney Napoles ([[Johns Hopkins University]]) introduced the [[Newsela]] corpus to the academic field in 2015.<ref>{{cite journal |last1=Xu |first1=Wei |last2=Callison-Burch |first2=Chris |last3=Napoles |first3=Courtney |title=Problems in Current Text Simplification Research: New Data Can Help |journal=Transactions of the Association for Computational Linguistics |date=2015 |volume=3 |pages=283–297|doi=10.1162/tacl_a_00139 |s2cid=17817489 |doi-access=free }}</ref> The corpus is a collection of thousands of news articles leveled to different reading complexities by professional editors at [[Newsela]]. The corpus was originally introduced for [[text simplification]] research, but has also been used for text readability assessment.<ref>{{cite journal |last1=Deutsch |first1=Tovly |last2=Jasbi |first2=Masoud |last3=Shieber |first3=Stuart |title=Linguistic Features for Readability Assessment |journal=Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications |date=July 2020 |pages=1–17 |doi=10.18653/v1/2020.bea-1.1 |arxiv=2006.00377 |url=https://www.aclweb.org/anthology/2020.bea-1.1|doi-access=free }}</ref>
 
===Linguistic features===
====Semantic or advanced semantic====
The study of how semantic and advanced semantic features influence text readability was pioneered by Bruce W. Lee during his studies at the [[University of Pennsylvania]] in 2021. While introducing his feature hybridization method, he also explored handcrafted advanced semantic features, which aim to measure the amount of knowledge contained in a given text.<ref>{{cite book |last1=Lee |first1=Bruce W. |last2=Jang |first2=Yoo Sung |last3=Lee |first3=Jason Hyung-Jong |chapter=Pushing on Text Readability Assessment: A Transformer Meets Handcrafted Linguistic Features |title=Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing |series=EMNLP '21 |date=November 2021 |pages=10669–10686 |doi=10.18653/v1/2021.emnlp-main.834 |s2cid=237940206 |chapter-url=https://aclanthology.org/2021.emnlp-main.834/|arxiv=2109.12258 }}</ref>
*Semantic Richness: <math>\sum_{i=1}^{n} p_i \cdot i</math>
*Semantic Clarity: <math>\frac{1}{n} \cdot \sum_{i=1}^{n} \left(\max(p) - p_{i}\right)</math>
Line 342 ⟶ 340:
where ''n'' is the count of discovered topics and ''p'' is the topic probability.
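
The two formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample probabilities are hypothetical, the probabilities are taken in the order given, and the Semantic Clarity sum is read as ranging over the differences between the largest topic probability and each individual probability.

```python
# Illustrative sketch only: function names and sample values are hypothetical.

def semantic_richness(topic_probs):
    """Sum of p_i * i for i = 1..n, using the probabilities in the order given."""
    return sum(p * i for i, p in enumerate(topic_probs, start=1))


def semantic_clarity(topic_probs):
    """Mean gap between the largest topic probability and each p_i.

    Assumption: the sum in the formula covers (max(p) - p_i) as a whole.
    """
    n = len(topic_probs)
    if n == 0:
        raise ValueError("at least one topic probability is required")
    top = max(topic_probs)
    return sum(top - p for p in topic_probs) / n


# Hypothetical topic distribution for a single text (n = 3 topics).
probs = [0.5, 0.3, 0.2]
print(semantic_richness(probs))
print(semantic_clarity(probs))
```

Under this reading, a text dominated by one topic has a larger Semantic Clarity value than a text whose probability is spread evenly across topics, for which the value is zero.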
 
====Lexico-semantic====
The type-token ratio is one of the features often used to capture lexical richness, a measure of vocabulary range and diversity. To measure the lexical difficulty of a word, its relative frequency in a representative corpus such as the [[Corpus of Contemporary American English]] (COCA) is often used. Some examples of lexico-semantic features used in readability assessment are listed below.<ref name="Computational assessment of text re"/>
*Average number of syllables per word
Line 351 ⟶ 349:
*Language model perplexity (comparing the text to generic or genre-specific models)
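
Two of the simpler lexico-semantic features, the type-token ratio and the average number of syllables per word, can be approximated as in the following sketch. The tokenizer and the vowel-group syllable counter are crude assumptions made for illustration, and all function names are hypothetical; real systems typically rely on a pronunciation dictionary and a proper tokenizer.

```python
import re

# Illustrative sketch only: the helper names and heuristics are assumptions.

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Number of distinct tokens (types) divided by the total number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def count_syllables(word):
    """Rough estimate: count groups of consecutive vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def average_syllables_per_word(tokens):
    """Mean of the estimated syllable counts over all tokens."""
    if not tokens:
        return 0.0
    return sum(count_syllables(t) for t in tokens) / len(tokens)


tokens = tokenize("The cat sat on the mat")
print(type_token_ratio(tokens))            # 5 types over 6 tokens
print(average_syllables_per_word(tokens))  # every word here counts as one syllable
```

The vowel-group heuristic miscounts words with silent vowels (for example, a final "e"), so its output is best treated as an approximation.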
 
In addition, Lijun Feng pioneered cognitively motivated features (mostly lexical) in 2009, during her [[doctorate|doctoral]] studies at the [[City University of New York]] (CUNY).<ref>{{cite book |last1=Feng |first1=Lijun |last2=Elhadad |first2=Noémie |last3=Huenerfauth |first3=Matt |chapter=Cognitively motivated features for readability assessment |title=EACL '09: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics |series=Eaclon - EACL '09 |date=March 2009 |pages=229–237 |doi=10.3115/1609067.1609092 |s2cid=13888774 |chapter-url=https://dl.acm.org/doi/10.5555/1609067.1609092|doi-access=free }}</ref> The cognitively motivated features were originally designed for adults with [[intellectual disability]], but were shown to improve readability assessment accuracy in general. Cognitively motivated features, in combination with a [[logistic regression]] model, can reduce the average error of the [[Flesch–Kincaid readability tests|Flesch–Kincaid grade level]] by more than 70%. The features introduced by Feng include:
*Number of [[lexical chain]]s in document
*Average number of unique entities per sentence
Line 367 ⟶ 365:
*Average number of verb phrases per sentence
 
== Using the readability formulas ==
The accuracy of readability formulas increases when finding the average readability of a large number of works. The tests generate a score based on characteristics such as [[statistical average]] word length (which is used as an unreliable proxy for [[Semantics|semantic]] difficulty; sometimes [[word frequency]] is taken into account) and sentence length (as an unreliable proxy for [[Syntax|syntactic]] complexity) of the work.
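
As an illustration of how such a score combines the two proxies, the following sketch computes the Flesch–Kincaid grade level from word, sentence, and syllable counts. The coefficients are those of the commonly published form of the formula, which this section does not itself state; the function name and the sample counts are hypothetical.

```python
# Illustrative sketch only: function name and sample counts are hypothetical.

def flesch_kincaid_grade(total_words, total_sentences, total_syllables):
    """Flesch-Kincaid grade level from raw counts.

    Average sentence length stands in for syntactic complexity and
    average syllables per word stands in for semantic difficulty.
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# A hypothetical passage of 100 words, 5 sentences, and 150 syllables.
print(round(flesch_kincaid_grade(100, 5, 150), 2))
```

Because the score depends only on these counts, shortening words and sentences lowers it whether or not the text actually becomes easier to understand.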
 
Most experts agree that simple readability formulas like [[Flesch–Kincaid readability tests|Flesch–Kincaid grade-level]] can be highly misleading.<ref name="KlareBuck3">Klare, G. R. and B. Buck. 1954. ''Know Your Reader: The scientific approach to readability.'' New York: Heritage House.</ref> Even though traditional features such as average sentence length correlate highly with reading difficulty, the measure of readability is much more complex. Data-driven [[artificial intelligence]] approaches (see above) have been studied to address this shortcoming.<ref name="Gunning2">Gunning, R. 1952. ''The Technique of Clear Writing''. New York: McGraw–Hill.</ref>
 
Writing experts have warned that an attempt to simplify the text only by changing the length of the words and sentences may result in text that is more difficult to read.<ref name="Fleschwrite2">{{cite book |last=Flesch |first=R. |title=The Art of Readable Writing |publisher=Harper |year=1949 |location=New York |oclc=318542 |author-link=Rudolf Flesch}}</ref> All the variables are tightly related. If one is changed, the others must also be adjusted, including approach, voice, person, tone, typography, design, and organization.
 
Writing for a class of readers other than one's own is very difficult. It takes training, method, and practice.<ref name="FleschArt2">Flesch, R. 1946. ''The art of plain talk.'' New York: Harper.</ref> Among those who are good at this are writers of novels and children's books. Writing experts advise that, besides using a formula, writers should observe all the norms of good writing, which are essential for writing readable texts. Writers should study the texts used by their audience and their reading habits.<ref name="FleschPlain2">Flesch, R. 1979. ''How to write in plain English: A book for lawyers and consumers''. New York: Harpers.</ref> This means that for a 5th-grade audience, the writer should study and learn good quality 5th-grade materials.<ref name="KlareEnglish2">Klare, G. R. 1980. ''How to write readable English.'' London: Hutchinson.</ref><ref name="Frywriting2">Fry, E. B. 1988. "Writeability: the principles of writing for increased comprehension." In ''Readability: Its past, present, and future'', eds. B. I. Zakaluk and S. J. Samuels. Newark, DE: International Reading Assn.</ref>
 
== Readability of Wikipedia ==
{{See also|Criticism_of_Wikipedia#Quality_of_writing|Help:How to write a readable article|user:Phlsph7/Readability}}{{See also|Health_information_on_Wikipedia#Readability}}
 
==See also==
Line 397 ⟶ 392:
* Manzo, A. V. and U. C. Manzo. 1995. ''Teaching children to be literate.'' Fort Worth: Harcourt Brace.
* Vacca, J. A., R. Vacca, and M. K. Gove. 1995. ''Reading and learning to read.'' New York: HarperCollins.
* Johns, Adrian. 2023. ''The Science of Reading: Information, Media, and Mind in Modern America.'' Chicago: University of Chicago Press.
 
==External links==
Line 403 ⟶ 399:
* [https://readable.io/ Readability Scoring Tool - Scores against many readability formulas at once - Readable.io]
* [http://www.joeswebtools.com/text/readability-tests/ Readability Tests - Joe's Web Tools]
* [https://www.usingenglish.com/resources/text-statistics/ Text Content Analysis Tool - UsingEnglish.com], free membership required
* [https://hellotools.org/en/check-and-measure-the-readability-of-a-website-text Check text readability online]
 
{{Readability tests}}