Commit 5f9bf0d

Finishing assignment
1 parent 00732df commit 5f9bf0d

2 files changed: +266 -28 lines changed

Diff for: Assignments/EN/Assignment_5.ipynb

+266 -28
Columns: original file line number, diff line number, diff line change
@@ -1,7 +1,6 @@
11
{
22
"cells": [
33
{
4-
"attachments": {},
54
"cell_type": "markdown",
65
"metadata": {},
76
"source": [
@@ -17,15 +16,13 @@
1716
]
1817
},
1918
{
20-
"attachments": {},
2119
"cell_type": "markdown",
2220
"metadata": {},
2321
"source": [
2422
"***"
2523
]
2624
},
2725
{
28-
"attachments": {},
2926
"cell_type": "markdown",
3027
"metadata": {},
3128
"source": [
@@ -34,7 +31,7 @@
3431
},
3532
{
3633
"cell_type": "code",
37-
"execution_count": 5,
34+
"execution_count": 1,
3835
"metadata": {},
3936
"outputs": [
4037
{
@@ -67,30 +64,34 @@
6764
]
6865
},
6966
{
70-
"attachments": {},
7167
"cell_type": "markdown",
7268
"metadata": {},
7369
"source": [
7470
"***"
7571
]
7672
},
7773
{
78-
"attachments": {},
7974
"cell_type": "markdown",
8075
"metadata": {},
8176
"source": [
82-
"You will solve the following exercises using Pure Python.\n",
83-
"1. Count words in a text\n",
77+
"#### You will solve the following exercises using Pure Python.\n",
78+
"1. Count words in a text \n",
8479
"2. Sort a list of words in various ways \n",
85-
" • ascii order \n",
86-
"\"rhyming\" order \n",
87-
"3. Extract useful info from a dictionary\n",
88-
"4. Compute ngram statistics\n",
89-
"5. Make a Concordance"
80+
" • ascii order \n",
81+
"\"rhyming\" order \n",
82+
"3. Extract useful info for a dictionary \n",
83+
"4. Compute ngram statistics \n",
84+
"5. Make a Concordance "
85+
]
86+
},
87+
{
88+
"cell_type": "markdown",
89+
"metadata": {},
90+
"source": [
91+
"***"
9092
]
9193
},
9294
{
93-
"attachments": {},
9495
"cell_type": "markdown",
9596
"metadata": {},
9697
"source": [
@@ -105,7 +106,7 @@
105106
},
106107
{
107108
"cell_type": "code",
108-
"execution_count": null,
109+
"execution_count": 2,
109110
"metadata": {},
110111
"outputs": [],
111112
"source": [
@@ -114,7 +115,7 @@
114115
},
115116
{
116117
"cell_type": "code",
117-
"execution_count": null,
118+
"execution_count": 3,
118119
"metadata": {},
119120
"outputs": [],
120121
"source": [
@@ -123,7 +124,7 @@
123124
},
124125
{
125126
"cell_type": "code",
126-
"execution_count": null,
127+
"execution_count": 4,
127128
"metadata": {},
128129
"outputs": [],
129130
"source": [
@@ -132,30 +133,268 @@
132133
},
133134
{
134135
"cell_type": "code",
135-
"execution_count": null,
136+
"execution_count": 5,
136137
"metadata": {},
137138
"outputs": [],
138139
"source": [
139140
"# d)"
140141
]
141142
},
142143
{
143-
"attachments": {},
144144
"cell_type": "markdown",
145145
"metadata": {},
146146
"source": [
147147
"##### 2. Sorting and reversing lines of text\n",
148148
"\n",
149-
"a. Sort each line ignoring case\n",
150-
"• sort –n Numeric order\n",
151-
"• sort –r Reverse sort\n",
152-
"• sort –nr Reverse numeric sort"
149+
"a. Sort each line alphabetically ignoring case \n",
150+
"b. sort in numeric ([ascii](https://python-reference.readthedocs.io/en/latest/docs/str/ASCII.html)) order \n",
151+
"c. Alphabetically reverse sort (ignoring case) \n",
152+
"d. Reverse numeric ([ascii](https://python-reference.readthedocs.io/en/latest/docs/str/ASCII.html)) sort "
153+
]
154+
},
155+
{
156+
"cell_type": "code",
157+
"execution_count": 6,
158+
"metadata": {},
159+
"outputs": [],
160+
"source": [
161+
"# a)"
162+
]
163+
},
164+
{
165+
"cell_type": "code",
166+
"execution_count": 7,
167+
"metadata": {},
168+
"outputs": [],
169+
"source": [
170+
"# b)"
171+
]
172+
},
173+
{
174+
"cell_type": "code",
175+
"execution_count": 8,
176+
"metadata": {},
177+
"outputs": [],
178+
"source": [
179+
"# c)"
180+
]
181+
},
182+
{
183+
"cell_type": "code",
184+
"execution_count": 9,
185+
"metadata": {},
186+
"outputs": [],
187+
"source": [
188+
"# d)"
189+
]
190+
},
191+
{
192+
"cell_type": "markdown",
193+
"metadata": {},
194+
"source": [
195+
"##### 3. Sorting and reversing lines of text\n",
196+
"\n",
197+
"a. Find the 50 most common words \n",
198+
"b. Find the words in the NYT that end in \"zz\" "
199+
]
200+
},
201+
{
202+
"cell_type": "code",
203+
"execution_count": 10,
204+
"metadata": {},
205+
"outputs": [],
206+
"source": [
207+
"# a)"
208+
]
209+
},
210+
{
211+
"cell_type": "code",
212+
"execution_count": 11,
213+
"metadata": {},
214+
"outputs": [],
215+
"source": [
216+
"# b)"
217+
]
218+
},
219+
{
220+
"cell_type": "markdown",
221+
"metadata": {},
222+
"source": [
223+
"##### 4. Compute ngrams and other statistics\n",
224+
"\n",
225+
"a. Find the 10 most common bigrams \n",
226+
"b. Find the 10 most common trigrams \n",
227+
"c. Count the lines, the words, and the characters\n",
228+
"d. How many all uppercase words are there in this NYT file?\n",
229+
"e, How many 4-letter words?\n",
230+
"f. How many different words are there with no vowels\n",
231+
"g. What subtypes do they belong to?\n",
232+
"h. How many “1 syllable” words are there"
233+
]
234+
},
235+
{
236+
"cell_type": "code",
237+
"execution_count": 12,
238+
"metadata": {},
239+
"outputs": [],
240+
"source": [
241+
"# a)"
242+
]
243+
},
244+
{
245+
"cell_type": "code",
246+
"execution_count": 13,
247+
"metadata": {},
248+
"outputs": [],
249+
"source": [
250+
"# b)"
251+
]
252+
},
253+
{
254+
"cell_type": "code",
255+
"execution_count": 14,
256+
"metadata": {},
257+
"outputs": [],
258+
"source": [
259+
"# c)"
260+
]
261+
},
262+
{
263+
"cell_type": "code",
264+
"execution_count": 15,
265+
"metadata": {},
266+
"outputs": [],
267+
"source": [
268+
"# d)"
269+
]
270+
},
271+
{
272+
"cell_type": "code",
273+
"execution_count": 16,
274+
"metadata": {},
275+
"outputs": [],
276+
"source": [
277+
"# e)"
278+
]
279+
},
280+
{
281+
"cell_type": "code",
282+
"execution_count": 17,
283+
"metadata": {},
284+
"outputs": [],
285+
"source": [
286+
"# f)"
287+
]
288+
},
289+
{
290+
"cell_type": "code",
291+
"execution_count": 18,
292+
"metadata": {},
293+
"outputs": [],
294+
"source": [
295+
"# g)"
296+
]
297+
},
298+
{
299+
"cell_type": "code",
300+
"execution_count": 19,
301+
"metadata": {},
302+
"outputs": [],
303+
"source": [
304+
"# h)"
305+
]
306+
},
307+
{
308+
"cell_type": "markdown",
309+
"metadata": {},
310+
"source": [
311+
"##### 5. Make a Concordance\n",
312+
"\n",
313+
"a. Create a concordance display for an arbitrary word. See the example below \n",
314+
"\n",
315+
"![](../../Data/figs/Sample-concordance-lines-of-actually.png)"
316+
]
317+
},
318+
{
319+
"cell_type": "code",
320+
"execution_count": null,
321+
"metadata": {},
322+
"outputs": [],
323+
"source": [
324+
"# a)"
325+
]
326+
},
327+
{
328+
"cell_type": "markdown",
329+
"metadata": {},
330+
"source": [
331+
"***"
332+
]
333+
},
334+
{
335+
"cell_type": "markdown",
336+
"metadata": {},
337+
"source": [
338+
"##### Extra Credit – Secret Message\n",
339+
"+ The answers to the extra credit exercises will reveal a secret message. \n",
340+
"+ We will be working with the following text file for these exercises: \n",
341+
"[Link to Text](https://web.stanford.edu/class/cs124/lec/secret_ec.txt) "
342+
]
343+
},
344+
{
345+
"cell_type": "markdown",
346+
"metadata": {},
347+
"source": [
348+
"##### Extra Credit Exercise 1\n",
349+
"• Find the 2 most common words in secret_ec.txt containing the letter e. \n",
350+
"• Your answer will correspond to the first two words of the secret message. "
351+
]
352+
},
353+
{
354+
"cell_type": "code",
355+
"execution_count": null,
356+
"metadata": {},
357+
"outputs": [],
358+
"source": []
359+
},
360+
{
361+
"cell_type": "markdown",
362+
"metadata": {},
363+
"source": [
364+
"##### Extra Credit Exercise 2\n",
365+
"• Find the 2 most common bigrams in secret_ec.txt where the second word in the bigram ends with a consonant. \n",
366+
"• Your answer will correspond to the next four words of the secret message. "
367+
]
368+
},
369+
{
370+
"cell_type": "code",
371+
"execution_count": null,
372+
"metadata": {},
373+
"outputs": [],
374+
"source": []
375+
},
376+
{
377+
"cell_type": "markdown",
378+
"metadata": {},
379+
"source": [
380+
"##### Extra Credit Exercise 3\n",
381+
"• Find all 5-letter-long words that only appear once in secret_ec.txt. \n",
382+
"• Concatenate your result. This will be the final word of the secret message. \n",
383+
"\n",
384+
"What is the secret message? "
153385
]
386+
},
387+
{
388+
"cell_type": "code",
389+
"execution_count": null,
390+
"metadata": {},
391+
"outputs": [],
392+
"source": []
154393
}
155394
],
156395
"metadata": {
157396
"kernelspec": {
158-
"display_name": "Python 3",
397+
"display_name": "Python 3 (ipykernel)",
159398
"language": "python",
160399
"name": "python3"
161400
},
@@ -169,10 +408,9 @@
169408
"name": "python",
170409
"nbconvert_exporter": "python",
171410
"pygments_lexer": "ipython3",
172-
"version": "3.10.10"
173-
},
174-
"orig_nbformat": 4
411+
"version": "3.10.6"
412+
}
175413
},
176414
"nbformat": 4,
177-
"nbformat_minor": 2
415+
"nbformat_minor": 4
178416
}
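
The exercise cells in this diff are committed as placeholders (`# a)`, `# b)`, and so on). As a rough illustration of what exercises 2 and 4 ask for, the sketch below sorts words in case-insensitive and ASCII order and counts bigrams using only the standard library. The sample text and the helper names `tokenize` and `ngrams` are hypothetical and do not come from the notebook. Splitting on whitespace is an assumption, and so is reading "Pure Python" as permitting `collections.Counter`.

```python
from collections import Counter

# Hypothetical sample text; the NYT file used by the notebook is not part of this diff.
text = "the cat sat on the mat\nthe cat ate Zebra"

def tokenize(s):
    """Split on whitespace (an assumption; the assignment does not define a token)."""
    return s.split()

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

words = tokenize(text)

# Exercise 2: alphabetical (case-insensitive) vs. ASCII (code point) order, plus reverses.
alpha = sorted(words, key=str.lower)
ascii_order = sorted(words)  # uppercase letters sort before lowercase ones
alpha_rev = sorted(words, key=str.lower, reverse=True)
ascii_rev = sorted(words, reverse=True)

# Exercise 4: most common bigram, then line, word, and character counts.
bigram_counts = Counter(ngrams(words, 2))
top_bigrams = bigram_counts.most_common(1)
counts = (len(text.splitlines()), len(words), len(text))
```

Changing the `n` argument of `ngrams` to 3 gives trigrams, and `most_common(10)` returns the ten most frequent entries on a larger text.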

Diff for: Data/figs/Sample-concordance-lines-of-actually.png

111 KB
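
The image added here is the sample concordance display that exercise 5 points to. A minimal keyword-in-context sketch of that idea follows. The function name `concordance`, the `width` parameter, the 20-character left column, and the sample sentence are all hypothetical choices for illustration, not the assignment's reference solution.

```python
def concordance(tokens, target, width=3):
    """Return one line per hit, showing `width` tokens on each side of the target."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the keyword lines up in a column.
            lines.append(f"{left:>20}  {tok}  {right}")
    return lines

# Hypothetical sample sentence, tokenized on whitespace.
sample = "I did not actually see it but she actually did".split()
hits = concordance(sample, "actually")
for line in hits:
    print(line)
```

Right-aligning the left context is what produces the centered keyword column typical of concordance displays. A wider `width` shows more context per line.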