Random Writer

Random Writer is an algorithm or model for generating text automatically from the patterns or word sequences found in a source text. The basic idea is to use Markov chains to build new sentences that follow the structure of the sample text while producing a random order of words.


Random Writer

or Probabilistic Text Generation

A Nifty Assignment from
Joe Zachary
School of Computing
University of Utah
Random Writer
• Based on an idea by Claude Shannon (1948), popularized by A.K. Dewdney (1989)
• Generates random text based on the patterns in a source file
• Both fun and appropriate for CS 2 students
• Guess which one is actually from the text
– which means the other two are random
Tom Sawyer (nGram length == 6)
Huck started to act very intelligently on the back of his pocket behind, as usual on Sundays.

He was always dressed fitten for drinking some old empty hogsheads.

The men contemplated the treasure awhile in blissful silence.
Hamlet (nGram length == 5)
Ay me, what act, That roars so loud and thunders in the index?

Worse that a rat? Dead for a ducat, drugs fit that I bid you not?

Leave heart; for to our lord, it we show him, but skin and he, my lord, I have fat all not over thought, good my lord?
Alice in Wonderland (nGram length == 9)
This was not here before,' said the Dormouse again, and we won't talk about cats or dogs

'Let us get to the shore, and then I'll tell you my history, and you'll understand

'It IS a long tail, certainly,'
Niftiness
• Not a toy: it slurps up entire books
• Defies expectations: it turns out to be both straightforward and educational
• Entertaining: I (Joe Zachary) run a contest to find the funniest generated text
nGram length == 0
• The probability that ch is the next character to be produced equals the probability that ch occurs in the source file.
• Just pick any char from the text at random
• The result is quite unreadable:
rla bsht eS ststofo hhfosdsdewno
oe wee h .mr ae irii ela iad o r
te u t mnyto onmalysnce, ifu en c
fDwn oee iteo
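A minimal Java sketch of the "pick any char at random" step. The class and method names (UniformCharPicker, pick) are illustrative assumptions, not part of the assignment. Choosing a uniformly random position in the text returns each character with a probability equal to its share of the text, which is why no counting is needed.

```java
import java.util.Random;

// Hypothetical helper; names are made up for illustration.
class UniformCharPicker {
    // Returns the character at a uniformly random position in text.
    // A character that makes up 10% of the text comes back about 10% of the time.
    static char pick(String text, Random rng) {
        return text.charAt(rng.nextInt(text.length()));
    }

    public static void main(String[] args) {
        // Short stand-in for a whole book (assumption: text is not empty).
        String text = "We hold these truths to be self-evident";
        Random rng = new Random();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < 40; i++) {
            out.append(pick(text, rng));
        }
        System.out.println(out);
    }
}
```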
nGram length == 2
• Let the nGram be two characters, such as “in”, “nd”, or “he”
• The probability that ch is the next character to be produced equals the probability that ch follows those two characters in the source text

"Shand tucthiney m?" le ollds mind
Theybooure He, he s whit Pereg
lenigabo Jodind alllld ashanthe
ainofevids tre lin--p asto oun
Bigger nGram length
Let the nGram be the k most recently produced characters (k = 4 in this case). The probability that ch is the next character to be produced equals the probability that ch follows the nGram in the source text.

Mr. Welshman, but him awoke, the
balmy shore. I'll give him that
he couple overy because in the
slated snufflindeed structure's
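To make "the probability that ch follows the nGram" concrete, here is a small Java sketch that counts followers in a made-up sample string. The class name FollowerCounts, the method count, and the sample text are assumptions for illustration only. The probability of a character is its count divided by the total number of followers.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; names and sample text are illustrative assumptions.
class FollowerCounts {
    // Counts how often each character directly follows nGram in text.
    static Map<Character, Integer> count(String text, String nGram) {
        Map<Character, Integer> counts = new TreeMap<>();
        int index = text.indexOf(nGram);
        // Stop when there is no further occurrence, or when the occurrence
        // sits at the very end of the text and so has no follower.
        while (index >= 0 && index + nGram.length() < text.length()) {
            char next = text.charAt(index + nGram.length());
            counts.merge(next, 1, Integer::sum);
            index = text.indexOf(nGram, index + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Character, Integer> counts = count("the theme of the thesis", "the");
        int total = 0;
        for (int n : counts.values()) {
            total += n;
        }
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            System.out.printf("'%c' follows \"the\" with probability %d/%d%n",
                    e.getKey(), e.getValue(), total);
        }
    }
}
```

In this sample, "the" is followed by a space twice and by 'm' and 's' once each, so a space would be produced half of the time.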
Algorithm
• Read the entire book into one big StringBuffer
– use StringBuffer’s append(String s)
• Pick an initial nGram randomly from that one big String that holds the entire book
• For each “random” char to print:
– Make a List<Character> holding every char in the book that follows the current nGram
– Randomly pick a character ch from that List<Character>
– Print ch
– Remove the 1st char from the nGram, append ch
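The steps above could be sketched in Java roughly as follows. This is one possible reading of the slide, not the assignment's reference solution: the names RandomWriterSketch, followers, and generate are assumptions, the book is a two-line stand-in, and stopping early when the current nGram has no follower is a choice made here because the slide does not say what to do in that case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; class and method names are not from the assignment.
class RandomWriterSketch {
    // Lists every char in the book that directly follows an occurrence of nGram.
    static List<Character> followers(String book, String nGram) {
        List<Character> result = new ArrayList<>();
        int index = book.indexOf(nGram);
        while (index >= 0 && index + nGram.length() < book.length()) {
            result.add(book.charAt(index + nGram.length()));
            index = book.indexOf(nGram, index + 1);
        }
        return result;
    }

    // Produces up to count chars using nGrams of length k.
    // Assumption: book.length() >= k.
    static String generate(String book, int k, int count, Random rng) {
        // Pick the initial nGram at a random position in the book.
        int start = rng.nextInt(book.length() - k + 1);
        String nGram = book.substring(start, start + k);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < count; i++) {
            List<Character> choices = followers(book, nGram);
            if (choices.isEmpty()) {
                break; // assumption: stop if the nGram only occurs at the end of the book
            }
            char ch = choices.get(rng.nextInt(choices.size()));
            out.append(ch);
            // Remove the first char from the nGram and append ch.
            nGram = (nGram + ch).substring(1);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for reading a whole book: append lines to one StringBuffer.
        StringBuffer book = new StringBuffer();
        book.append("We hold these truths to be self-evident: ");
        book.append("that all men are created equal; that they ");
        System.out.println(generate(book.toString(), 2, 60, new Random()));
    }
}
```

Rebuilding the follower list on every iteration rescans the whole book each time. That matches the slide and keeps the code short, at the cost of speed on long texts.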
An Example: Print 1 char, nGram length == 2
• Given this current state of the system:
– The one big string:
We hold these truths to be self-evident: that all men are created equal; that they
– A random nGram: “th”
• For this one example loop iteration, do the following:
– build a new List of following chars: [e, s, a, a, e]
– pick a char to print (random, could only be e, s, or a): s
– change nGram (remove first char, add printed char): “hs”
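The iteration in this example can be checked with a few lines of Java. The class name FollowersExample and the helper followers are illustrative assumptions. The string and the nGram come from the slide, and the printed char s is fixed here to match the slide rather than drawn at random.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical check of the slide's example; names are illustrative.
class FollowersExample {
    // Lists every char in text that directly follows an occurrence of nGram.
    static List<Character> followers(String text, String nGram) {
        List<Character> result = new ArrayList<>();
        int index = text.indexOf(nGram);
        while (index >= 0 && index + nGram.length() < text.length()) {
            result.add(text.charAt(index + nGram.length()));
            index = text.indexOf(nGram, index + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "We hold these truths to be self-evident: "
                + "that all men are created equal; that they";
        String nGram = "th";
        System.out.println(followers(text, nGram)); // prints [e, s, a, a, e]

        char ch = 's';                       // the char picked on the slide
        nGram = (nGram + ch).substring(1);   // remove first char, add printed char
        System.out.println(nGram);           // prints hs
    }
}
```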
