|
1 | 1 | {
|
2 | 2 | "cells": [
|
3 | 3 | {
|
4 |
| - "attachments": {}, |
5 | 4 | "cell_type": "markdown",
|
6 | 5 | "metadata": {},
|
7 | 6 | "source": [
|
|
17 | 16 | ]
|
18 | 17 | },
|
19 | 18 | {
|
20 |
| - "attachments": {}, |
21 | 19 | "cell_type": "markdown",
|
22 | 20 | "metadata": {},
|
23 | 21 | "source": [
|
24 | 22 | "***"
|
25 | 23 | ]
|
26 | 24 | },
|
27 | 25 | {
|
28 |
| - "attachments": {}, |
29 | 26 | "cell_type": "markdown",
|
30 | 27 | "metadata": {},
|
31 | 28 | "source": [
|
|
34 | 31 | },
|
35 | 32 | {
|
36 | 33 | "cell_type": "code",
|
37 |
| - "execution_count": 5, |
| 34 | + "execution_count": 1, |
38 | 35 | "metadata": {},
|
39 | 36 | "outputs": [
|
40 | 37 | {
|
|
67 | 64 | ]
|
68 | 65 | },
|
69 | 66 | {
|
70 |
| - "attachments": {}, |
71 | 67 | "cell_type": "markdown",
|
72 | 68 | "metadata": {},
|
73 | 69 | "source": [
|
74 | 70 | "***"
|
75 | 71 | ]
|
76 | 72 | },
|
77 | 73 | {
|
78 |
| - "attachments": {}, |
79 | 74 | "cell_type": "markdown",
|
80 | 75 | "metadata": {},
|
81 | 76 | "source": [
|
82 |
| - "You will solve the following exercises using Pure Python.\n", |
83 |
| - "1. Count words in a text\n", |
| 77 | + "#### You will solve the following exercises using Pure Python.\n", |
| 78 | + "1. Count words in a text \n", |
84 | 79 | "2. Sort a list of words in various ways \n",
|
85 |
| - " • ascii order \n", |
86 |
| - " • \"rhyming\" order \n", |
87 |
| - "3. Extract useful info from a dictionary\n", |
88 |
| - "4. Compute ngram statistics\n", |
89 |
| - "5. Make a Concordance" |
| 80 | + " • ascii order \n", |
| 81 | + " • \"rhyming\" order \n", |
| 82 | + "3. Extract useful info for a dictionary \n", |
| 83 | + "4. Compute ngram statistics \n", |
| 84 | + "5. Make a Concordance " |
| 85 | + ] |
| 86 | + }, |
| 87 | + { |
| 88 | + "cell_type": "markdown", |
| 89 | + "metadata": {}, |
| 90 | + "source": [ |
| 91 | + "***" |
90 | 92 | ]
|
91 | 93 | },
|
92 | 94 | {
|
93 |
| - "attachments": {}, |
94 | 95 | "cell_type": "markdown",
|
95 | 96 | "metadata": {},
|
96 | 97 | "source": [
|
|
105 | 106 | },
|
106 | 107 | {
|
107 | 108 | "cell_type": "code",
|
108 |
| - "execution_count": null, |
| 109 | + "execution_count": 2, |
109 | 110 | "metadata": {},
|
110 | 111 | "outputs": [],
|
111 | 112 | "source": [
|
|
114 | 115 | },
|
115 | 116 | {
|
116 | 117 | "cell_type": "code",
|
117 |
| - "execution_count": null, |
| 118 | + "execution_count": 3, |
118 | 119 | "metadata": {},
|
119 | 120 | "outputs": [],
|
120 | 121 | "source": [
|
|
123 | 124 | },
|
124 | 125 | {
|
125 | 126 | "cell_type": "code",
|
126 |
| - "execution_count": null, |
| 127 | + "execution_count": 4, |
127 | 128 | "metadata": {},
|
128 | 129 | "outputs": [],
|
129 | 130 | "source": [
|
|
132 | 133 | },
|
133 | 134 | {
|
134 | 135 | "cell_type": "code",
|
135 |
| - "execution_count": null, |
| 136 | + "execution_count": 5, |
136 | 137 | "metadata": {},
|
137 | 138 | "outputs": [],
|
138 | 139 | "source": [
|
139 | 140 | "# d)"
|
140 | 141 | ]
|
141 | 142 | },
|
142 | 143 | {
|
143 |
| - "attachments": {}, |
144 | 144 | "cell_type": "markdown",
|
145 | 145 | "metadata": {},
|
146 | 146 | "source": [
|
147 | 147 | "##### 2. Sorting and reversing lines of text\n",
|
148 | 148 | "\n",
|
149 |
| - "a. Sort each line ignoring case\n", |
150 |
| - "• sort –n Numeric order\n", |
151 |
| - "• sort –r Reverse sort\n", |
152 |
| - "• sort –nr Reverse numeric sort" |
| 149 | + "a. Sort each line alphabetically ignoring case \n", |
| 150 | + "b. Sort in numeric ([ascii](https://python-reference.readthedocs.io/en/latest/docs/str/ASCII.html)) order \n", |
| 151 | + "c. Alphabetically reverse sort (ignoring case) \n", |
| 152 | + "d. Reverse numeric ([ascii](https://python-reference.readthedocs.io/en/latest/docs/str/ASCII.html)) sort " |
| 153 | + ] |
| 154 | + }, |
| 155 | + { |
| 156 | + "cell_type": "code", |
| 157 | + "execution_count": 6, |
| 158 | + "metadata": {}, |
| 159 | + "outputs": [], |
| 160 | + "source": [ |
| 161 | + "# a)" |
| 162 | + ] |
| 163 | + }, |
| 164 | + { |
| 165 | + "cell_type": "code", |
| 166 | + "execution_count": 7, |
| 167 | + "metadata": {}, |
| 168 | + "outputs": [], |
| 169 | + "source": [ |
| 170 | + "# b)" |
| 171 | + ] |
| 172 | + }, |
| 173 | + { |
| 174 | + "cell_type": "code", |
| 175 | + "execution_count": 8, |
| 176 | + "metadata": {}, |
| 177 | + "outputs": [], |
| 178 | + "source": [ |
| 179 | + "# c)" |
| 180 | + ] |
| 181 | + }, |
| 182 | + { |
| 183 | + "cell_type": "code", |
| 184 | + "execution_count": 9, |
| 185 | + "metadata": {}, |
| 186 | + "outputs": [], |
| 187 | + "source": [ |
| 188 | + "# d)" |
| 189 | + ] |
| 190 | + }, |
| 191 | + { |
| 192 | + "cell_type": "markdown", |
| 193 | + "metadata": {}, |
| 194 | + "source": [ |
| 195 | + "##### 3. Extract useful info for a dictionary\n", |
| 196 | + "\n", |
| 197 | + "a. Find the 50 most common words \n", |
| 198 | + "b. Find the words in the NYT that end in \"zz\" " |
| 199 | + ] |
| 200 | + }, |
| 201 | + { |
| 202 | + "cell_type": "code", |
| 203 | + "execution_count": 10, |
| 204 | + "metadata": {}, |
| 205 | + "outputs": [], |
| 206 | + "source": [ |
| 207 | + "# a)" |
| 208 | + ] |
| 209 | + }, |
| 210 | + { |
| 211 | + "cell_type": "code", |
| 212 | + "execution_count": 11, |
| 213 | + "metadata": {}, |
| 214 | + "outputs": [], |
| 215 | + "source": [ |
| 216 | + "# b)" |
| 217 | + ] |
| 218 | + }, |
| 219 | + { |
| 220 | + "cell_type": "markdown", |
| 221 | + "metadata": {}, |
| 222 | + "source": [ |
| 223 | + "##### 4. Compute ngrams and other statistics\n", |
| 224 | + "\n", |
| 225 | + "a. Find the 10 most common bigrams \n", |
| 226 | + "b. Find the 10 most common trigrams \n", |
| 227 | + "c. Count the lines, the words, and the characters\n", |
| 228 | + "d. How many all uppercase words are there in this NYT file?\n", |
| 229 | + "e. How many 4-letter words?\n", |
| 230 | + "f. How many different words are there with no vowels?\n", |
| 231 | + "g. What subtypes do they belong to?\n", |
| 232 | + "h. How many “1 syllable” words are there?" |
| 233 | + ] |
| 234 | + }, |
| 235 | + { |
| 236 | + "cell_type": "code", |
| 237 | + "execution_count": 12, |
| 238 | + "metadata": {}, |
| 239 | + "outputs": [], |
| 240 | + "source": [ |
| 241 | + "# a)" |
| 242 | + ] |
| 243 | + }, |
| 244 | + { |
| 245 | + "cell_type": "code", |
| 246 | + "execution_count": 13, |
| 247 | + "metadata": {}, |
| 248 | + "outputs": [], |
| 249 | + "source": [ |
| 250 | + "# b)" |
| 251 | + ] |
| 252 | + }, |
| 253 | + { |
| 254 | + "cell_type": "code", |
| 255 | + "execution_count": 14, |
| 256 | + "metadata": {}, |
| 257 | + "outputs": [], |
| 258 | + "source": [ |
| 259 | + "# c)" |
| 260 | + ] |
| 261 | + }, |
| 262 | + { |
| 263 | + "cell_type": "code", |
| 264 | + "execution_count": 15, |
| 265 | + "metadata": {}, |
| 266 | + "outputs": [], |
| 267 | + "source": [ |
| 268 | + "# d)" |
| 269 | + ] |
| 270 | + }, |
| 271 | + { |
| 272 | + "cell_type": "code", |
| 273 | + "execution_count": 16, |
| 274 | + "metadata": {}, |
| 275 | + "outputs": [], |
| 276 | + "source": [ |
| 277 | + "# e)" |
| 278 | + ] |
| 279 | + }, |
| 280 | + { |
| 281 | + "cell_type": "code", |
| 282 | + "execution_count": 17, |
| 283 | + "metadata": {}, |
| 284 | + "outputs": [], |
| 285 | + "source": [ |
| 286 | + "# f)" |
| 287 | + ] |
| 288 | + }, |
| 289 | + { |
| 290 | + "cell_type": "code", |
| 291 | + "execution_count": 18, |
| 292 | + "metadata": {}, |
| 293 | + "outputs": [], |
| 294 | + "source": [ |
| 295 | + "# g)" |
| 296 | + ] |
| 297 | + }, |
| 298 | + { |
| 299 | + "cell_type": "code", |
| 300 | + "execution_count": 19, |
| 301 | + "metadata": {}, |
| 302 | + "outputs": [], |
| 303 | + "source": [ |
| 304 | + "# h)" |
| 305 | + ] |
| 306 | + }, |
| 307 | + { |
| 308 | + "cell_type": "markdown", |
| 309 | + "metadata": {}, |
| 310 | + "source": [ |
| 311 | + "##### 5. Make a Concordance\n", |
| 312 | + "\n", |
| 313 | + "a. Create a concordance display for an arbitrary word. See the example below \n", |
| 314 | + "\n", |
| 315 | + "" |
| 316 | + ] |
| 317 | + }, |
| 318 | + { |
| 319 | + "cell_type": "code", |
| 320 | + "execution_count": null, |
| 321 | + "metadata": {}, |
| 322 | + "outputs": [], |
| 323 | + "source": [ |
| 324 | + "# a)" |
| 325 | + ] |
| 326 | + }, |
| 327 | + { |
| 328 | + "cell_type": "markdown", |
| 329 | + "metadata": {}, |
| 330 | + "source": [ |
| 331 | + "***" |
| 332 | + ] |
| 333 | + }, |
| 334 | + { |
| 335 | + "cell_type": "markdown", |
| 336 | + "metadata": {}, |
| 337 | + "source": [ |
| 338 | + "##### Extra Credit – Secret Message\n", |
| 339 | + "+ The answers to the extra credit exercises will reveal a secret message. \n", |
| 340 | + "+ We will be working with the following text file for these exercises: \n", |
| 341 | + "[Link to Text](https://web.stanford.edu/class/cs124/lec/secret_ec.txt) " |
| 342 | + ] |
| 343 | + }, |
| 344 | + { |
| 345 | + "cell_type": "markdown", |
| 346 | + "metadata": {}, |
| 347 | + "source": [ |
| 348 | + "##### Extra Credit Exercise 1\n", |
| 349 | + "• Find the 2 most common words in secret_ec.txt containing the letter e. \n", |
| 350 | + "• Your answer will correspond to the first two words of the secret message. " |
| 351 | + ] |
| 352 | + }, |
| 353 | + { |
| 354 | + "cell_type": "code", |
| 355 | + "execution_count": null, |
| 356 | + "metadata": {}, |
| 357 | + "outputs": [], |
| 358 | + "source": [] |
| 359 | + }, |
| 360 | + { |
| 361 | + "cell_type": "markdown", |
| 362 | + "metadata": {}, |
| 363 | + "source": [ |
| 364 | + "##### Extra Credit Exercise 2\n", |
| 365 | + "• Find the 2 most common bigrams in secret_ec.txt where the second word in the bigram ends with a consonant. \n", |
| 366 | + "• Your answer will correspond to the next four words of the secret message. " |
| 367 | + ] |
| 368 | + }, |
| 369 | + { |
| 370 | + "cell_type": "code", |
| 371 | + "execution_count": null, |
| 372 | + "metadata": {}, |
| 373 | + "outputs": [], |
| 374 | + "source": [] |
| 375 | + }, |
| 376 | + { |
| 377 | + "cell_type": "markdown", |
| 378 | + "metadata": {}, |
| 379 | + "source": [ |
| 380 | + "##### Extra Credit Exercise 3\n", |
| 381 | + "• Find all 5-letter-long words that only appear once in secret_ec.txt. \n", |
| 382 | + "• Concatenate your result. This will be the final word of the secret message. \n", |
| 383 | + "\n", |
| 384 | + "What is the secret message? " |
153 | 385 | ]
|
| 386 | + }, |
| 387 | + { |
| 388 | + "cell_type": "code", |
| 389 | + "execution_count": null, |
| 390 | + "metadata": {}, |
| 391 | + "outputs": [], |
| 392 | + "source": [] |
154 | 393 | }
|
155 | 394 | ],
|
156 | 395 | "metadata": {
|
157 | 396 | "kernelspec": {
|
158 |
| - "display_name": "Python 3", |
| 397 | + "display_name": "Python 3 (ipykernel)", |
159 | 398 | "language": "python",
|
160 | 399 | "name": "python3"
|
161 | 400 | },
|
|
169 | 408 | "name": "python",
|
170 | 409 | "nbconvert_exporter": "python",
|
171 | 410 | "pygments_lexer": "ipython3",
|
172 |
| - "version": "3.10.10" |
173 |
| - }, |
174 |
| - "orig_nbformat": 4 |
| 411 | + "version": "3.10.6" |
| 412 | + } |
175 | 413 | },
|
176 | 414 | "nbformat": 4,
|
177 |
| - "nbformat_minor": 2 |
| 415 | + "nbformat_minor": 4 |
178 | 416 | }
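For reference, here is a minimal pure-Python sketch of how the word-counting, sorting, ngram, `wc`-style counting and concordance exercises in this notebook might be approached. It is an illustration under stated assumptions, not the course's reference solution. All helper names (`tokenize`, `count_words`, `rhyming_sort`, `sort_ignore_case`, `ngrams`, `wc`, `concordance`) are hypothetical. The tokenizer assumes a "word" is a run of ASCII letters or apostrophes. "Rhyming" order is taken to mean sorting by each word's reversed spelling.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a "word" is a run of ASCII letters or apostrophes.
    return re.findall(r"[A-Za-z']+", text)


def count_words(text):
    # Exercise 1: case-sensitive frequency of each token.
    return Counter(tokenize(text))


def ascii_sort(words):
    # Exercise 2: plain code-point (ASCII) order, so "B" sorts before "a".
    return sorted(words)


def rhyming_sort(words):
    # Exercise 2: "rhyming" order, i.e. sort by the reversed spelling,
    # so words that share an ending end up next to each other.
    return sorted(words, key=lambda w: w[::-1])


def sort_ignore_case(lines, reverse=False):
    # Section 2 a) and c): alphabetical order that ignores case.
    return sorted(lines, key=str.lower, reverse=reverse)


def ngrams(words, n):
    # Exercise 4: every run of n consecutive words, as tuples.
    return list(zip(*(words[i:] for i in range(n))))


def wc(text):
    # Section 4 c): line, whitespace-separated word, and character counts,
    # roughly what the Unix `wc` command reports.
    return len(text.splitlines()), len(text.split()), len(text)


def concordance(text, target, width=30):
    # Exercise 5: keyword-in-context lines. Newlines are collapsed first, and
    # the left context is right-justified so every hit starts in one column.
    flat = " ".join(text.split())
    pattern = r"\b" + re.escape(target) + r"\b"
    lines = []
    for m in re.finditer(pattern, flat, re.IGNORECASE):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        lines.append(f"{left:>{width}}{m.group()}{right}")
    return lines


if __name__ == "__main__":
    sample = "The cat sat on the mat.\nThe dog sat on the log.\n"
    words = tokenize(sample)
    print(count_words(sample).most_common(3))
    print(rhyming_sort(set(words)))
    print(Counter(ngrams(words, 2)).most_common(2))
    print(wc(sample))
    print("\n".join(concordance(sample, "sat", width=12)))
```

For example, `rhyming_sort(["cat", "bat", "dog", "hat"])` returns `["dog", "bat", "cat", "hat"]`, which groups the "-at" words together. Combining `ngrams` with `Counter(...).most_common(10)` is one way to get the "10 most common bigrams/trigrams" asked for in section 4.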
|