AllNotes WebTrack

CS50x is an introductory computer science course that teaches algorithmic thinking and problem-solving using languages like C, Python, and SQL, among others. Students are expected to complete nine problem sets and a final project, with no deadlines for submissions, but must achieve a satisfactory score for a verified certificate. The course emphasizes academic honesty and encourages collaboration within specified limits while fostering a community among students with varying levels of prior programming experience.

Uploaded by

audacityimpact

This is CS50x

OpenCourseWare

David J. Malan (https://cs.harvard.edu/malan/)

[email protected]
 (https://www.facebook.com/dmalan)  (https://github.com/dmalan)  (https://www.instagram.com/davidjmalan/) 
(https://www.linkedin.com/in/malan/)  (https://www.quora.com/profile/David-J-Malan)  (https://twitter.com/davidjmalan)

Syllabus
Introduction to the intellectual enterprises of computer science and the art of programming. This course teaches students how to think
algorithmically and solve problems efficiently. Topics include abstraction, algorithms, data structures, encapsulation, resource management,
security, and software engineering. Languages include C, Python, and SQL plus students’ choice of: HTML, CSS, and JavaScript (for web
development); Java or Swift (for mobile app development); or Lua (for game development). Problem sets inspired by the arts, humanities, social
sciences, and sciences. Course culminates in a final project. Designed for concentrators and non-concentrators alike, with or without prior
programming experience. Two-thirds of CS50 students have never taken CS before. Among the overarching goals of this course are to inspire
students to explore unfamiliar waters, without fear of failure, create an intensive, shared experience, accessible to all students, and build
community among students.

Expectations

You are expected to

 submit nine problem sets and

 submit a final project.

Website

https://cs50.edx.org/

Certificates

CS50x is free to take, and you are welcome to submit the course’s nine problem sets and final project for automated feedback. To be eligible for
a verified certificate (https://www.edx.org/verified-certificate) from edX, however, you must receive a satisfactory score (at least 70%) on each
problem you submit as part of one of the course’s nine problem sets as well as on the course’s final project.

Problems are evaluated along axes of correctness (as determined by a program called check50 ) and style (as determined by a program called
style50 ), with scores ordinarily computed as 3 × correctness + 1 × style.

Books

No books are required or recommended for this course. However, you might find the below books of interest. Realize that free, if not superior,
resources can be found on the course’s website.

C Programming Absolute Beginner’s Guide, Third Edition + Greg Perry, Dean Miller + Pearson Education, 2014 + ISBN 0-789-75198-4

Hacker’s Delight, Second Edition + Henry S. Warren Jr. + Pearson Education, 2013 + ISBN 0-321-84268-5

How Computers Work, Tenth Edition + Ron White + Que Publishing, 2014 + ISBN 0-7897-4984-X

Programming in C, Fourth Edition + Stephen G. Kochan + Pearson Education, 2015 + ISBN 0-321-77641-0

Lectures

The course’s lectures introduce each week’s concepts.

Walkthroughs

Integrated into problem sets are “walkthroughs,” videos that offer direction on where to begin and how to approach problems.

Problem Sets

Problem sets are programming assignments. CS50x does not have deadlines for problem sets. You are welcome to work on and submit them at
your own pace. To be eligible for a verified certificate from edX, however, you must submit (and receive a score of at least 70% on) all problem
sets by 31 December 2020.

Final Project

The climax of this course is its final project. The final project is your opportunity to take your newfound savvy with programming out for a spin
and develop your very own piece of software. So long as your project draws upon this course’s lessons, the nature of your project is entirely up
to you. You may implement your project in any language(s). You are welcome to utilize infrastructure other than the CS50 IDE. All that we ask is
that you build something of interest to you, that you solve an actual problem, that you impact your community, or that you change the world.
Strive to create something that outlives this course.

Inasmuch as software development is rarely a one-person effort, you are allowed an opportunity to collaborate with one or two classmates for
this final project. Needless to say, it is expected that every student in any such group contribute equally to the design and implementation of
that group’s project. Moreover, it is expected that the scope of a two- or three-person group’s project be, respectively, twice or thrice that of a
typical one-person project. A one-person project, mind you, should entail more time and effort than is required by each of the course’s problem
sets. Although no more than three students may design and implement a given project, you are welcome to solicit advice from others, so long
as you respect the course’s policy on academic honesty.

CS50x does not have a deadline for the final project. You are welcome to work on and submit it at your own pace. To be eligible for a verified
certificate from edX, however, you must submit (and receive a score of at least 70% on) it by 31 December 2020.

Academic Honesty

This course’s philosophy on academic honesty is best stated as “be reasonable.” The course recognizes that interactions with classmates and
others can facilitate mastery of the course’s material. However, there remains a line between enlisting the help of another and submitting the
work of another. This policy characterizes both sides of that line.

The essence of all work that you submit to this course must be your own. Collaboration on problem sets is not permitted except to the extent
that you may ask classmates and others for help so long as that help does not reduce to another doing your work for you. Generally speaking,
when asking for help, you may show your code to others, but you may not view theirs, so long as you and they respect this policy’s other
constraints. Collaboration on the course’s final project is permitted to the extent prescribed by its specification.

Below are rules of thumb that (inexhaustively) characterize acts that the course considers reasonable and not reasonable. If in doubt as to
whether some act is reasonable, do not commit it. If the course determines that you have committed an act that is not reasonable, you may be
deemed ineligible for a certificate. If you commit some act that is not reasonable but bring it to the attention of the course’s instructor within
72 hours, the course may reconsider that outcome.

Reasonable
 Communicating with classmates about problem sets’ problems in English (or some other spoken language).
 Discussing the course’s material with others in order to understand it better.
 Helping a classmate identify a bug in his or her code in person or online, as by viewing, compiling, or running his or her code, even on your
own computer.
 Incorporating a few lines of code that you find online or elsewhere into your own code, provided that those lines are not themselves
solutions to assigned problems and that you cite the lines’ origins.
 Sending or showing code that you’ve written to someone, possibly a classmate, so that he or she might help you identify and fix a bug.
 Sharing a few lines of your own code online so that others might help you identify and fix a bug.
 Turning to the web or elsewhere for instruction beyond the course’s own, for references, and for solutions to technical difficulties, but not
for outright solutions to problem sets’ problems or your own final project.
 Whiteboarding solutions to problem sets with others using diagrams or pseudocode but not actual code.
 Working with (and even paying) a tutor to help you with the course, provided the tutor does not do your work for you.

Not Reasonable
 Accessing a solution to some problem prior to (re-)submitting your own.
 Asking a classmate to see his or her solution to a problem set’s problem before (re-)submitting your own.
 Decompiling, deobfuscating, or disassembling the staff’s solutions to problem sets.
 Failing to cite (as with comments) the origins of code or techniques that you discover outside of the course’s own lessons and integrate
into your own work, even while respecting this policy’s other constraints.
 Giving or showing to a classmate a solution to a problem set’s problem when it is he or she, and not you, who is struggling to solve it.
 Paying or offering to pay an individual for work that you may submit as (part of) your own.
 Searching for or soliciting outright solutions to problem sets online or elsewhere.

 Splitting a problem set’s workload with another individual and combining your work.
 Submitting (after possibly modifying) the work of another individual beyond the few lines allowed herein.
 Submitting the same or similar work to this course that you have submitted or will submit to another.
 Viewing another’s solution to a problem set’s problem and basing your own solution on it.


Lecture 0
Welcome
What is computer science?
Binary
Representing data
Algorithms
Pseudocode
Scratch

Welcome
 When David was a first-year student, he was too intimidated to take any computer science courses. By the time he was a sophomore, he found the
courage to take the equivalent of CS50, but only pass/fail.
 In fact, two-thirds of CS50 students have never taken a CS course before.
 And importantly, too:
what ultimately matters in this course is not so much where you end up relative to your classmates but where you end up relative to
yourself when you began

What is computer science?

 Computer science is fundamentally problem-solving.
 We can think of problem-solving as the process of taking some input (details about our problem) and generating some output (the solution
to our problem). The “black box” in the middle is computer science.

 We need a way to represent inputs, such that we can store and work with information in a standard way.

Binary
 A computer, at the lowest level, stores data in binary, a numeral system in which there are just two digits, 0 and 1.
 When we first learned to count, we might have used one finger to represent one thing. That system is called unary. When we learned to
write numbers with the digits 0 through 9, we learned to use decimal.
 For example, we know the following represents one hundred and twenty-three.

1 2 3
 The 3 is in the ones column, the 2 is in the tens column, and the 1 is in the hundreds column.
 So 123 is 100×1 + 10×2 + 1×3 = 100 + 20 + 3 = 123.
 Each place for a digit represents a power of ten, since there are ten possible digits for each place.
 In binary, with just two digits, we have powers of two for each place value:

4 2 1
0 0 0

 This would still be equal to 0.

 Now if we change the binary value to, say, 0 1 1 , the decimal value would be 3.

4 2 1
0 1 1

 If we wanted to represent 8, we would need another digit:

8 4 2 1
1 0 0 0

 And binary makes sense for computers because we power them with electricity, which can be either on or off, so each bit only needs to be
on or off. In a computer, there are millions or billions of switches called transistors that can store electricity and represent a bit by being
“on” or “off”.
 With enough bits, or binary digits, computers can count to any number.
 8 bits make up one byte.

Representing data
 To represent letters, all we need to do is decide how numbers map to letters. Some humans, many years ago, collectively decided on a
standard mapping called ASCII (https://en.wikipedia.org/wiki/ASCII). The letter “A”, for example, is the number 65, and “B” is 66, and so on.
The mapping also includes punctuation and other symbols. Other characters, like letters with accent marks, and emoji, are part of a
standard called Unicode (https://en.wikipedia.org/wiki/Unicode) that uses more bits than ASCII to accommodate all these characters.
 When we receive an emoji, our computer is actually just receiving a decimal number like 128514 ( 11111011000000010 in binary, if
you can read that more easily) that it then maps to the image of the emoji.
 An image, too, is composed of many smaller square dots, or pixels, each of which can be represented in binary with a system called RGB,
with values for red, green, and blue light in each pixel. By mixing together different amounts of each color, we can represent millions of
colors:

 The red, green, and blue values are combined to get a light yellow color:

 We can see this in an emoji if we zoom in far enough:

 And computer programs know, based on the context of their code, whether the binary numbers should be interpreted as numbers, or letters,
or pixels.
 And videos are just many, many images displayed one after another, at some number of frames per second. Music, too, can be represented
by the notes being played, their duration, and their volume.

Algorithms
 So now we can represent inputs and outputs. The black box earlier will contain algorithms, step-by-step instructions for solving a problem:

 Let’s say we wanted to find a friend, Mike Smith, in a phone book.

 We could start by flipping through the book, one page at a time, until we find Mike Smith or reach the end of the book.
 We could also flip two pages at a time, but if we go too far, we’ll have to know to go back a page.
 But an even more efficient way would be to open the phone book to the middle, decide whether Mike will be in the left half or right
half of the book (because the book is alphabetized), and immediately throw away half of the problem. We can repeat this, dividing the
problem in half each time. With 1024 pages to start, we would only need 10 steps of dividing in half before we have just one page
remaining to check.
 In fact, we can represent the efficiency of each of those algorithms with a chart:

 Our first solution, one page at a time, is like the red line: our time to solve increases linearly as the size of the problem increases.
 The second solution, two pages at a time, is like the yellow line: our slope is less steep, but still linear.
 Our final solution is like the green line: logarithmic, since our time to solve rises more and more slowly as the size of the problem
increases. In other words, if the phone book went from 1000 to 2000 pages, we would need one more step to find Mike. If the size
doubled again from 2000 to 4000 pages, we would still only need one more step.

Pseudocode
 We can write pseudocode, an informal syntax that is just a more specific version of English (or other human language) that represents our
algorithm:

1  Pick up phone book
2  Open to middle of phone book
3  Look at page
4  If Smith is on page
5      Call Mike
6  Else if Smith is earlier in book
7      Open to middle of left half of book
8      Go back to line 3
9  Else if Smith is later in book
10     Open to middle of right half of book
11     Go back to line 3
12 Else
13     Quit

 Some of these lines start with verbs, or actions, like “Pick up”, “Open to”, “Look at”, “Call”, and “Quit”. We’ll start calling these functions.

 We also have branches that lead to different paths, like forks in the road, which we’ll call conditions: the lines that begin with “If”, “Else if”, and “Else”.

 And the questions that decide where we go are called Boolean expressions, which eventually result in a value of true or false, like “Smith is on page”.

 Finally, we have words that lead to cycles, where we can repeat parts of our program, called loops, like “Go back to line 3”.

Scratch
 We can write programs with the building blocks we just discovered:
 functions
 conditions
 Boolean expressions
 loops
 We’ll use a graphical programming language called Scratch (https://scratch.mit.edu/), where we’ll drag and drop blocks that contain
instructions.
 Later in our course, we’ll move on to textual programming languages like C, and Python, and JavaScript. All of these languages, including
Scratch, have more powerful features like:
 variables
 the ability to store values and change them
 threads
 the ability for our program to do multiple things at once
 events
 the ability to respond to changes in our program or inputs
 …
 The programming environment for Scratch looks like this:

 On the left, we have puzzle pieces that represent functions or variables, or other concepts, that we can drag and drop into our
instruction area in the center.
 On the right, we have a stage that will be shown by our program to a human, where we can add or change backgrounds, characters
(called sprites in Scratch), and more.

 We can drag a few blocks to make Scratch say “hello, world”:

 The “when green flag clicked” block is the start of our program, and below it we’ve snapped in a “say” block and typed in “hello,
world”.
 We can also drag in the “ask and wait” block, with a question like “What’s your name?”, and combine it with a “say” block for the answer:

 But we didn’t wait after we said “Hello” with the first block, so we can use the “say () for () seconds” block:

 We can use the “join” block to combine two phrases so Scratch can say “hello, David”:

 Notice that we can nest instructions and variables.

 In fact, the “say” block itself is like an algorithm, where we provided an input of “hello, world” and it produced the output of Scratch (the
cat) “saying” that phrase:

 The “ask” block, too, takes in an input (the question we want to ask), and produces the output of the “answer” block:

 We can then use the “answer” block along with our own text, “hello, “, as two inputs to the join algorithm …

 … which we pass as input again to the “say” block:

 We can try to make Scratch (the name of the cat) say meow:

 But when we click the green flag, we hear the meow sound over and over immediately. Our first bug, or mistake! We can add a block
to wait, so the meows sound more normal.

 We can have Scratch point towards the mouse and move towards it:

 We’ll look at a sheep that can count:

 Here, counter is a variable, the value of which we can set, use, and change.
 We can also have Scratch meow if we touch it with the mouse pointer:

 Alternatively, we can have Scratch roar if we do:

 Here, we have two different branches, or conditions, that will repeat forever. If the mouse is touching it, Scratch will “roar”, otherwise
it will just meow.

 We can make Scratch move back and forth on the screen with a few more blocks we can discover by looking around:

 We can even record our own sound to play.

 With two different “costumes,” or images of Scratch with its legs in different positions, we can even simulate an animated walking motion:

 We look at another program, bark, where we can use the space bar to mute a sea lion:

 We have a variable, muted , that’s false by default. And our program will constantly check if the space bar is pressed, and set muted
to false if it’s true , or true if not. This way, we can toggle whether the sound plays or not, since our other set of blocks for the
sea lion check the muted variable:

 With multiple sprites, or characters, we can have different sets of blocks for each of them:

 For one puppet, we have these blocks that say “Marco!”, and then a “broadcast event” block. This “event” is used for our two sprites to
communicate with each other, like sending a secret message. So our other puppet can just wait for this event to say “Polo!”:

 Now that we know some basics, we can think about the design, or quality of our programs. For example, we might want to have Scratch
cough three times by repeating some blocks:

 While this is correct, we can avoid repeating blocks with a loop:

 The next step is abstracting away some of our code into a function, or making it reusable in different ways. We can make a block called
“cough” and put some blocks inside it:

 Now, all of our sprites can use the same “cough” block, in as many places as we’d like.
 We can even put a number of times into our cough function, so we only need a single block to cough any number of times:

 We look at some examples and discuss how we might implement components of them with different sprites that follow the mouse cursor,
or cause something else to happen on the stage.
 Welcome aboard!


Lecture 1
C
hello, world
Compilers
String
Scratch blocks in C
Types, formats, operators
More examples
Screens
Memory, imprecision, and overflow

C
 Today we’ll learn a new language, C: a programming language that has all the features of Scratch and more, but perhaps a little less
friendly since it’s purely in text:

#include <stdio.h>

int main(void)
{
    printf("hello, world\n");
}

 Though the words are new, the ideas are exactly the same as the “when green flag clicked” and “say (hello, world)” blocks in Scratch:

 Though this code is cryptic, don’t forget that 2/3 of CS50 students have never taken CS before, so don’t be daunted! And though at first, to borrow a
phrase from MIT, trying to absorb all these new concepts may feel like drinking from a fire hose, be assured that by the end of the
semester we’ll be empowered by and experienced at learning and applying these concepts.
 We can compare a lot of the constructs in C to blocks we’ve already seen and used in Scratch. The syntax is far less important than the
principles, which we’ve already been introduced to.

hello, world
 The “when green flag clicked” block in Scratch starts the main program; clicking the green flag causes the right set of blocks underneath
to start. In C, the first line for the same is int main(void) , which we’ll learn more about over the coming weeks, followed by an open
curly brace { , and a closed curly brace } , wrapping everything that should be in our program.

int main(void)
{

}
 The “say (hello, world)” block is a function, and maps to printf("hello, world"); . In C, the function to print something to the screen is
printf , where f stands for “format”, meaning we can format the printed string in different ways. Then, we use parentheses to pass in
what we want to print. We have to use double quotes to surround our text so it’s understood as text, and finally, we add a semicolon ; to
end this line of code.
 To make our program work, we also need another line at the top, a header line #include <stdio.h> that defines the printf function that
we want to use. Somewhere there is a file on our computer, stdio.h , that includes the code that allows us to access the printf function,
and the #include line tells the computer to include that file with our program.
 To write our first program in Scratch, we opened Scratch’s website. Similarly, we’ll use the CS50 Sandbox (https://sandbox.cs50.io/) to start
writing and running code the same way. The CS50 Sandbox is a virtual, cloud-based environment with the libraries and tools already
installed for writing programs in various languages. At the top, there is a simple code editor, where we can type text. Below, we have a
terminal window, into which we can type commands:

 We’ll type our code from earlier into the top, after using the + sign to create a new file called hello.c :

 We end our program’s file with .c by convention, to indicate that it’s intended as a C program. Notice that our code is colorized, so that
certain things are more visible.

Compilers
 Once we save the code that we wrote, which is called source code, we need to convert it to machine code, binary instructions that the
computer understands directly.
 We use a program called a compiler to compile our source code into machine code.
 To do this, we use the Terminal panel, which has a command prompt. The $ at the left is a prompt, after which we can type commands.
 We type clang hello.c (where clang stands for “C languages”, a compiler written by a group of people). But before we press enter,
we click the folder icon on the top left of CS50 Sandbox. We see our file, hello.c . So we press enter in the terminal window, and see
that we have another file now, called a.out (short for “assembler output”). Inside that file is the code for our program, in binary. Now,
we can type ./a.out in the terminal prompt to run the program a.out in our current folder. We just wrote, compiled, and ran our
first program!

String
 But after we run our program, we see hello, world$ , with the new prompt on the same line as our output. It turns out that we need to
specify precisely that we need a new line after our program, so we can update our code to include a special newline character, \n :

#include <stdio.h>

int main(void)
{
    printf("hello, world\n");
}

 Now we need to remember to recompile our program with clang hello.c before we can run this new version.
 Line 2 of our program is intentionally blank since we want to start a new section of code, much like starting new paragraphs in essays. It’s
not strictly necessary for our program to run correctly, but it helps humans read longer programs more easily.
 We can change the name of our program from a.out to something else, too. We can pass command-line arguments, or additional options,
to programs in the terminal, depending on what the program is written to understand. For example, we can type clang -o hello
hello.c , and -o hello is telling the program clang to save the compiled output as just hello . Then, we can just run ./hello .

 In our command prompt, we can run other commands, like ls (list), which shows the files in our current folder:

$ ls
a.out* hello* hello.c

 The asterisk, * , indicates that those files are executable, or that they can be run by our computer.
 We can use the rm (remove) command to delete a file:

$ rm a.out
rm: remove regular file 'a.out'?

 We can type y or yes to confirm, and use ls again to see that it’s indeed gone forever.
 Now, let’s try to get input from the user, as we did in Scratch when we wanted to say “hello, David”:

string answer = get_string("What's your name?\n");

printf("hello, %s\n", answer);

 First, we need a string, or piece of text (specifically, zero or more characters in a sequence in double quotes, like "" , "ba" , or
"bananas"), that we can ask the user for, with the function get_string . We pass the prompt, or what we want to ask the user, to the
function with "What's your name?\n" inside the parentheses. On the left, we want to create a variable, answer , the value of which
will be what the user enters. (The equals sign = is setting the value from right to left.) Finally, the type of variable that we want is
string , so we specify that to the left of answer .
 Next, inside the printf function, we want the value of answer in what we print back out. We use a placeholder for our string
variable, %s , inside the phrase we want to print, like "hello, %s\n" , and then we give printf another argument, or option, to tell
it that we want the variable answer to be substituted.
 If we made a mistake, like writing printf("hello, world"\n); with the \n outside of the double quotes for our string, we’ll see errors
from our compiler:

$ clang -o hello hello.c

hello.c:5:26: error: expected ')'
    printf("hello, world"\n);
                         ^
hello.c:5:11: note: to match this '('
    printf("hello, world"\n);
          ^
1 error generated.

 The first line of the error tells us to look at hello.c , line 5, column 26, where the compiler expected a closing parenthesis, instead
of a backslash.
 To simplify things (at least for the beginning), we’ll include a library, or set of code, from CS50. The library provides us with the string
variable type, the get_string function, and more. We just have to write a line at the top to include the file cs50.h :

#include <cs50.h>
#include <stdio.h>

int main(void)
{
    string name = get_string("What's your name?\n");
    printf("hello, %s\n", name);
}

 So let’s make a new file, string.c , with this code:

#include <stdio.h>

int main(void)
{
    string name = get_string("What's your name?\n");
    printf("hello, %s\n", name);
}

 Now, if we try to compile that code, we get a lot of lines of errors. Sometimes, one mistake means that the compiler then starts
interpreting correct code incorrectly, generating more errors than there actually are. So we start with our first error:

$ clang -o string string.c

string.c:5:5: error: use of undeclared identifier 'string'; did you mean 'stdin'?
    string name = get_string("What's your name?\n");
    ^~~~~~
    stdin
/usr/include/stdio.h:135:25: note: 'stdin' declared here
extern struct _IO_FILE *stdin; /* Standard input stream. */

 We didn’t mean stdin (“standard in”) instead of string , so that error message wasn’t helpful. In fact, we need to include another file
that defines the type string (actually a training wheel from CS50, as we’ll find out in the coming weeks).
 So we can include another file, cs50.h , which also includes the function get_string , among others.

#include <cs50.h>
#include <stdio.h>

int main(void)
{
    string name = get_string("What's your name?\n");
    printf("hello, %s\n", name);
}

 Now, when we try to compile our program, we have just one error:

$ clang -o string string.c

/tmp/string-aca94d.o: In function `main':
string.c:(.text+0x19): undefined reference to `get_string'
clang-7: error: linker command failed with exit code 1 (use -v to see invocation)

 It turns out that we also have to tell our compiler to add our special CS50 library file, with clang -o string string.c -lcs50 , with
-l for “link”.
 We can even abstract this away and just type make string . We see that, by default in the CS50 Sandbox, make uses clang to compile
our code from string.c into string , with all the necessary arguments, or flags, passed in.

Scratch blocks in C
 The “set [counter] to (0)” block is creating a variable, and in C we would write int counter = 0; , where int specifies that the type of our
variable is an integer:

 “change [counter] by (1)” is counter = counter + 1; in C. (In C, the = isn’t like an equals sign in an equation, where we are saying
counter is the same as counter + 1 . Instead, = is an assignment operator that means, “copy the value on the right into the variable on
the left”.) And notice we don’t need to say int anymore, since we presume that we already specified previously that counter is an int ,
with some existing value. We can also say counter += 1; or counter++; both of which are “syntactic sugar”, or shortcuts that have the
same effect with fewer characters to type.

 A condition would map to:

if (x < y)
{
    printf("x is less than y\n");
}

 Notice that in C, we use { and } (as well as indentation) to indicate how lines of code should be nested.
 We can also have if-else conditions:

if (x < y)
{
    printf("x is less than y\n");
}
else
{
    printf("x is not less than y\n");
}

 Notice that lines of code that themselves are not some action ( if... , and the braces) don’t end in a semicolon.
  And even else if :

if (x < y)
{
printf("x is less than y\n");
}
else if (x > y)
{
printf("x is greater than y\n");
}
else if (x == y)
{
printf("x is equal to y\n");
}

 Notice that, to compare two values in C, we use == , two equals signs.

  And, logically, we don’t need the if (x == y) in the final condition, since that’s the only case remaining, and we can just say else .
 Loops can be written like the following:

while (true)
{
printf("hello, world\n");
}

 The while keyword also requires a condition, so we use true as the Boolean expression to ensure that our loop will run forever. Our
program will check whether the expression evaluates to true (which it always will in this case), and then run the lines inside the
curly braces. Then it will repeat that until the expression isn’t true anymore (which won’t change in this case).
 We could do something a certain number of times with while :

int i = 0;
while (i < 50)
{
printf("hello, world\n");
i++;
}

 We create a variable, i , and set it to 0. Then, while i < 50 , we run some lines of code, and we add 1 to i after each run.
  The curly braces around the two lines inside the while loop indicate that those lines will repeat, and we can add additional lines to
 our program after the loop if we wanted to.
 To do the same repetition, more commonly we can use the for keyword:

for (int i = 0; i < 50; i++)
{
    printf("hello, world\n");
}

  Again, first we create a variable named i and set it to 0. Then, we check that i < 50 every time we reach the top of the loop, before
 we run any of the code inside. If that expression is true, then we run the code inside. Finally, after we run the code inside, we use i++
 to add one to i , and the loop repeats.

Types, formats, operators

  There are other types we can use for our variables:
  bool , a Boolean expression of either true or false
  char , a single character like a or 2
  double , a floating-point value with even more digits
  float , a floating-point value, or real number with a decimal value
 int , integers up to a certain size, or number of bits
 long , integers with more bits, so they can count higher
 string , a string of characters
 And the CS50 library has corresponding functions to get input of various types:
 get_char
 get_double
 get_float
 get_int
 get_long
 get_string
 For printf , too, there are different placeholders for each type:
 %c for chars
  %f for floats, doubles
 %i for ints
 %li for longs
 %s for strings
 And there are some mathematical operators we can use:
 + for addition
 - for subtraction
 * for multiplication
 / for division
 % for remainder

More examples
 For each of these examples, you can click on the sandbox links to run and edit your own copies of them.
 In int.c , we get and print an integer:

#include <cs50.h>
#include <stdio.h>

int main(void)
{
int age = get_int("What's your age?\n");
int days = age * 365;
printf("You are at least %i days old.\n", days);
}

 Notice that we use %i to print an integer.

 We can now run make int and run our program with ./int .
 We can combine lines and remove the days variable with:

int age = get_int("What's your age?\n");
printf("You are at least %i days old.\n", age * 365);

 Or even combine everything in one line:

printf("You are at least %i days old.\n", get_int("What's your age?\n") * 365);

 Though, once a line is too long or complicated, it may be better to keep two or even three lines for readability.

  In float.c , we can get decimal numbers (called floating-point values in computers, because the decimal point can “float” between the
 digits, depending on the number):

#include <cs50.h>
#include <stdio.h>

int main(void)
{
float price = get_float("What's the price?\n");
printf("Your total is %f.\n", price * 1.0625);
}

 Now, if we compile and run our program, we’ll see a price printed out with tax.
 We can specify the number of digits printed after the decimal with a placeholder like %.2f for two digits after the decimal point.
 With parity.c , we can check if a number is even or odd:

#include <cs50.h>
#include <stdio.h>

int main(void)
{
int n = get_int("n: ");

if (n % 2 == 0)
{
printf("even\n");
}
else
{
printf("odd\n");
}
}

 With the % (modulo) operator, we can get the remainder of n after it’s divided by 2. If the remainder is 0, we know that n is even.
Otherwise, we know n is odd.
  And functions like get_int from the CS50 library do error-checking, where only input from the user that matches the type we want
 is accepted.
 In conditions.c , we turn the condition snippets from before into a program:

// Conditions and relational operators

#include <cs50.h>
#include <stdio.h>

int main(void)
{
    // Prompt user for x
    int x = get_int("x: ");

    // Prompt user for y
    int y = get_int("y: ");

    // Compare x and y
    if (x < y)
    {
        printf("x is less than y\n");
    }
    else if (x > y)
    {
        printf("x is greater than y\n");
    }
    else
    {
        printf("x is equal to y\n");
    }
}

  Lines that start with // are comments, or notes for humans that the compiler will ignore.
  For David to compile and run this program in his sandbox, he first needed to run cd src1 in the terminal. This changes the directory,
 or folder, to the one in which he saved all of the lecture’s source files. Then, he could run make conditions and ./conditions . With
 pwd , he can see that he’s in a src1 folder (inside other folders). And cd by itself, with no arguments, will take us back to our
 default folder in the sandbox.

  In agree.c , we can ask the user to confirm or deny something:

// Logical operators

#include <cs50.h>
#include <stdio.h>

int main(void)
{
// Prompt user to agree
char c = get_char("Do you agree?\n");

// Check whether agreed

if (c == 'Y' || c == 'y')
{
printf("Agreed.\n");
}
else if (c == 'N' || c == 'n')
{
printf("Not agreed.\n");
}
}

  We use two vertical bars, || , to indicate a logical “or”, where either expression can be true for the condition to be followed.
  And if none of the expressions are true, nothing will happen, since our program has no final else and doesn’t have a loop to ask again.
 Let’s implement the coughing program from week 0:

#include <stdio.h>

int main(void)
{
printf("cough\n");
printf("cough\n");
printf("cough\n");
}

 We could use a for loop:

#include <stdio.h>

int main(void)
{
for (int i = 0; i < 3; i++)
{
printf("cough\n");
}
}

  By convention, programmers tend to start counting at 0, and so i will have the values of 0 , 1 , and 2 before stopping, for a total
 of three iterations. We could also write for (int i = 1; i <= 3; i++) for the same final effect.
 We can move the printf line to its own function:

#include <stdio.h>

void cough(void);

int main(void)
{
    for (int i = 0; i < 3; i++)
    {
        cough();
    }
}

void cough(void)
{
    printf("cough\n");
}

 We declared a new function with void cough(void); , before our main function calls it. The C compiler reads our code from top to
bottom, so we need to tell it that the cough function exists, before we use it. Then, after our main function, we can implement the
cough function. This way, the compiler knows the function exists, and we can keep our main function close to the top.
 And our cough function doesn’t take any inputs, so we have cough(void) .

 We can abstract cough further:

#include <stdio.h>

void cough(int n);

int main(void)
{
cough(3);
}

void cough(int n)
{
for (int i = 0; i < n; i++)
{
printf("cough\n");
}
}

 Now, when we want to print “cough” any number of times, we can just call the same function. Notice that, with void cough(int n) ,
we indicate that the cough function takes as input an int , which we refer to as n . And inside cough , we use n in our for loop
to print “cough” the right number of times.
 Let’s look at positive.c :

#include <cs50.h>
#include <stdio.h>

int get_positive_int(void);

int main(void)
{
int i = get_positive_int();
printf("%i\n", i);
}

// Prompt user for positive integer

int get_positive_int(void)
{
int n;
do
{
n = get_int("%s", "Positive Integer: ");
}
while (n < 1);
return n;
}

  The CS50 library doesn’t have a get_positive_int function, but we can write one ourselves. Our function int
 get_positive_int(void) will prompt the user for an int and return that int , which our main function stores as i . In
 get_positive_int , we declare a variable, int n , without assigning a value to it yet. Then, we have a new construct, do ...
 while , which does something first, then checks a condition, and repeats until the condition is no longer true.
 Once the loop ends because we have an n that is not < 1 , we can return it with the return keyword. And back in our main
function, we can set int i to that value.

Screens
 We might want a program that prints part of a screen from a video game like Super Mario Bros. In mario0.c , we have:

// Prints a row of 4 question marks

#include <stdio.h>

int main(void)
{
    printf("????\n");
}

 We can ask the user for a number of question marks, and then print them, with mario2.c :

#include <cs50.h>
#include <stdio.h>

int main(void)
{
int n;
do
{
n = get_int("Width: ");
}
while (n < 1);
for (int i = 0; i < n; i++)
{
printf("?");
}
printf("\n");
}

 And we can print a two-dimensional set of blocks with mario8.c :

// Prints an n-by-n grid of bricks with a loop

#include <cs50.h>
#include <stdio.h>

int main(void)
{
int n;
do
{
n = get_int("Size: ");
}
while (n < 1);
for (int i = 0; i < n; i++)
{
for (int j = 0; j < n; j++)
{
printf("#");
}
printf("\n");
}
}

 Notice we have two nested loops, where the outer loop uses i to do everything inside n times, and the inner loop uses j , a
different variable, to do something n times for each of those times. In other words, the outer loop prints n “rows”, or lines, and the
inner loop prints n “columns”, or # characters, in each line.
 Other examples not covered in lecture are available under “Source Code” for Week 1.

Memory, imprecision, and overflow

  Our computer has memory, in hardware chips called RAM, random-access memory. Our programs use that RAM to store data as they run,
 but that memory is finite. So with a finite number of bits, we can’t represent all possible numbers (of which there are an infinite
 number). So our computer has a certain number of bits for each float and int, and has to round to the nearest value it can represent at a certain point.
  With floats.c , we can see what happens when we use floats:

#include <cs50.h>
#include <stdio.h>

int main(void)
{
    // Prompt user for x
    float x = get_float("x: ");

    // Prompt user for y
    float y = get_float("y: ");

    // Perform division
    printf("x / y = %.50f\n", x / y);
}

  With %.50f , we can specify the number of decimal places displayed.

 Hmm, now we get …

x: 1
y: 10
x / y = 0.10000000149011611938476562500000000000000000000000

  It turns out that this is called floating-point imprecision, where we don’t have enough bits to store all possible values, so the
 computer has to store the closest value it can to 1 divided by 10.
 We can see a similar problem in overflow.c :

#include <stdio.h>
#include <unistd.h>

int main(void)
{
for (int i = 1; ; i *= 2)
{
printf("%i\n", i);
sleep(1);
}
}

 In our for loop, we set i to 1 , and double it with *= 2 . (And we’ll keep doing this forever, so there’s no condition we check.)
 We also use the sleep function from unistd.h to let our program pause each time.
 Now, when we run this program, we see the number getting bigger and bigger, until:

1073741824
overflow.c:6:25: runtime error: signed integer overflow: 1073741824 * 2 cannot be represented in type 'int'
-2147483648
0
0
...

 It turns out, our program recognized that a signed integer (an integer with a positive or negative sign) couldn’t store that next value,
and printed an error. Then, since it tried to double it anyways, i became a negative number, and then 0.
  This problem is called integer overflow, where an integer can only be so big before it runs out of bits and “rolls over”. We can picture
 adding 1 to 999 in decimal. The last digit becomes 0, we carry the 1 so the next digit becomes 0, and we get 1000. But if we only had
 three digits, we would end up with 000 since there’s no place to put the final 1!
 The Y2K problem arose because many programs stored the calendar year with just two digits, like 98 for 1998, and 99 for 1999. But when
the year 2000 approached, the programs would have stored 00, leading to confusion between the years 1900 and 2000.
  A Boeing 787 airplane also had a bug where a counter in the generator overflowed after a certain number of days of continuous operation,
 since the number of seconds it had been running could no longer be stored in that counter.
 So, we’ve seen a few problems that can happen, but now understand why, and how to prevent them.
 With this week’s problem set, we’ll use the CS50 Lab, built on top of the CS50 Sandbox, to write some programs with walkthroughs to
guide us.


Lecture 2
Compiling
Debugging
help50 and printf
debug50
check50 and style50
Data Types
Memory
Arrays
Strings
Command-line arguments
Readability
Encryption

Compiling
  Last time, we learned to write our first program in C. We learned the syntax for the main function in our program, the printf function
 for printing to the terminal, how to create strings with double quotes, and how to include stdio.h for the printf function.
 Then, we compiled it with clang hello.c to be able to run ./a.out (the default name), and then clang -o hello hello.c (passing in a
command-line argument for the output’s name) to be able to run ./hello .
  If we wanted to use CS50’s library, via #include <cs50.h> , for strings and the get_string function, we also had to add a flag: clang -o
 hello hello.c -lcs50 . The -l flag links the cs50 file, which is already installed in the CS50 Sandbox, and includes prototypes, or
 definitions of strings and get_string (among more) that our program can then refer to and use.
 We write our source code in C, but need to compile it to machine code, in binary, before our computers can run it.
 clang is the compiler, and make is a utility that helps us run clang without having to indicate all the options manually.
 “Compiling” source code into machine code is actually made up of smaller steps:
 preprocessing
 compiling
 assembling
 linking
  Preprocessing involves looking at lines that start with a # , like #include , before everything else. For example, #include <cs50.h> will
 tell clang to look for that header file first, since it contains content that we want to include in our program. Then, clang will essentially
 copy the contents of those header files into our program, in place of the #include lines.
 For example …

#include <cs50.h>
#include <stdio.h>

int main(void)
{
string name = get_string("Name: ");
printf("hello, %s\n", name);
}

 … will be preprocessed into:

string get_string(string prompt);
int printf(const char *format, ...);

int main(void)
{
    string name = get_string("Name: ");
    printf("hello, %s\n", name);
}

 Compiling takes our source code, in C, and converts it to assembly code, which looks like this:

...
main: # @main
.cfi_startproc
# BB#0:
pushq %rbp
.Ltmp0:
.cfi_def_cfa_offset 16
.Ltmp1:
.cfi_offset %rbp, -16
movq %rsp, %rbp
.Ltmp2:
.cfi_def_cfa_register %rbp
subq $16, %rsp
xorl %eax, %eax
movl %eax, %edi
movabsq $.L.str, %rsi
movb $0, %al
callq get_string
movabsq $.L.str.1, %rdi
movq %rax, -8(%rbp)
movq -8(%rbp), %rsi
movb $0, %al
callq printf
...

  These instructions are lower-level and are closer to the binary instructions that a computer’s CPU can directly understand. They
 generally operate on bytes themselves, as opposed to abstractions like variable names.
 The next step is to take the assembly code and translate it to instructions in binary by assembling it. The instructions in binary are called
machine code, which a computer’s CPU can run directly.
  The last step is linking, where the contents of previously compiled libraries that we want to link, like cs50.c , are actually combined with
 the binary of our program. So we end up with one binary file, a.out or hello , that is the compiled version of hello.c , cs50.c , and
 printf.c .

Debugging
  Bugs are mistakes in programs that we didn’t intend to make. And debugging is the process of finding and fixing bugs.

help50 and printf

 Let’s say we wrote this program, buggy0.c :

int main(void)
{
printf("hello, world\n");
}

 We see an error (in red), when we try to make this program, that we are implicitly declaring library function 'printf' . We don’t
really understand this, so we can run help50 make buggy0 , which will tell us, at the end, that we might have forgotten to write
#include <stdio.h> , which contains printf .
 We can try this again with buggy1.c :

#include <stdio.h>

int main(void)
{
    string name = get_string("What's your name?\n");
    printf("hello, %s\n", name);
}

  We see a lot of errors, and even the first one doesn’t seem to make much sense. So we can again run help50 make buggy1 , which will
 hint to us that we need cs50.h since string isn’t defined.
 To clear the terminal window (so that we can see just the output of whatever we want to run next), we can press control + L , or type in
clear as a command to the terminal window.
 Let’s look at buggy2.c :

#include <stdio.h>

int main(void)
{
for (int i = 0; i <= 10; i++)
{
printf("#\n");
}
}

 Hmm, we intended to only see 10 # s, but there are 11. If we didn’t know what the problem is (since our program is compiling
without any errors, and we now have a logical error), we could add another print line to help us:

#include <stdio.h>

int main(void)
{
for (int i = 0; i <= 10; i++)
{
printf("i is now %i: ", i);
printf("#\n");
}
}

 Now, we see that i started at 0 and continued until it was 10, but we should have it stop once it’s at 10, with i < 10 instead of i
<= 10 .

debug50
 Today we’ll also take a look at CS50 IDE, which is like the CS50 Sandbox, but with more features. It is an online development environment,
with a code editor and a terminal window, but also tools for debugging and collaborating:

 In the CS50 IDE, we’ll have another tool, debug50 , to help us debug programs.
  We’ll open buggy2.c and try to make buggy2 . But we saved buggy2.c into a folder called src2 , so we need to run cd src2 to change
 our directory to the right one. And CS50 IDE’s terminal will remind us what directory we’re in, with a prompt like ~/src2/ $ . (The ~
 indicates the default, or home directory.)
 Instead of using printf , we can also debug our program interactively. We can add a breakpoint, or an indicator for a line of code where
the debugger should pause our program. For example, we can click to the left of line 5 of our code, and a red circle will appear:

 Now, if we run debug50 ./buggy2 , we’ll see the debugger panel open on the right:

 We see that the variable we made, i , is under the Local Variables section, and see that there’s a value of 0 .
  Our breakpoint has paused our program after line 5, to just before line 7, since it’s the first line of code that can run. To continue, we have a
few controls in the debugger panel. The blue triangle will continue our program until we reach another breakpoint or the end of our
program. The curved arrow to its right will “step over” the line, running it and pausing our program again immediately after.
 So, we’ll use the curved arrow to run the next line, and see what changes after. We’re at the printf line, and pressing the curved arrow
again, we see a single # printed to our terminal window. With another click of the arrow, we see the value of i on the right change to
1 . And we can keep clicking the arrow to watch our program run, one line at a time.
 To exit the debugger, we can press control + C to stop the program.
 We can save lots of time in the future by investing a little bit now to learn how to use debug50 !

check50 and style50

  We can run a command like check50 cs50/problems/hello , where check50 is a program that will follow instructions identified by the
argument cs50/problems/hello to upload, run, and test our program on CS50’s servers. This will check our program for correctness.
 When writing software in the real world, developers will generally write their own tests to ensure their code works as they expect,
especially as more features are added to the same code.
  style50 is another program that will check our code for aesthetic issues, such as whitespace, such that our code is more readable and
 maintainable. For example, we might be missing indentation. And the Style Guide (https://cs50.readthedocs.io/style/c/) will include more
 explanations for what we expect.
  We can even use rubber duck debugging, a method where we explain what we’re trying to do to a rubber duck, such that we realize what
 we’re trying to do and what we should fix.
 We also want to write our code with good design, where we not only solve the problem correctly but well, where we make reasonable
choices for how our program runs, and make tradeoffs between time, development cost, and memory.

Data Types
 In C, we have different types of variables we can use for storing data:
  bool 1 byte
  char 1 byte
  int 4 bytes
  float 4 bytes
  long 8 bytes
  double 8 bytes
  string ? bytes
  Each of these types takes up a certain number of bytes per variable we create, and the sizes above are what the sandbox, IDE, and most
 likely your computer use for each type in C.

Memory
  Inside our computers, we have chips called RAM, random-access memory, that store data for short-term use. We might save a program or
 file to our hard drive (or SSD) for long-term storage, but when we open it, it gets copied to RAM first. Though RAM is much smaller, and
 temporary (until the power is turned off), it is much faster.
 We can think of bytes, stored in RAM, as though they were in a grid:

 In reality, there are millions or billions of bytes per chip.

 In C, when we create a variable of type char , which will be sized one byte, it will physically be stored in one of those boxes in RAM. An
integer, with 4 bytes, will take up four of those boxes.
 And each of these boxes is labeled with some number, or address, from 0, to 1, to 2, and so on.

Arrays
 Let’s say we wanted to store three variables:

#include <stdio.h>

int main(void)
{
char c1 = 'H';
char c2 = 'I';
char c3 = '!';
printf("%c %c %c\n", c1, c2, c3);
}

 Notice that we use single quotes to indicate a literal character, and double quotes for multiple characters together in a string.
 We can compile and run this, to see H I ! .
 And we know characters are just numbers, so if we change our string formatting to be printf("%i %i %i\n", c1, c2, c3); , we can see
the numeric values of each char printed: 72 73 33 .
 We can explicitly convert, or cast, each character to an int before we use it, with (int) c1 , but our compiler can implicitly do that for
us.
  And in memory, we might have three boxes, labeled c1 , c2 , and c3 somehow, each of which represents a byte of binary with the
 value of each variable.
 Let’s look at scores0.c :

#include <cs50.h>
#include <stdio.h>

int main(void)
{
// Scores
int score1 = 72;
int score2 = 73;
int score3 = 33;

// Print average
printf("Average: %i\n", (score1 + score2 + score3) / 3);
}

 We can print the average of three numbers, but now we need to make one variable for every score we want to include, and we can’t
easily use them later.
 It turns out, in memory, we can store variables one after another, back-to-back. And in C, a list of variables stored, one after another in a
contiguous chunk of memory, is called an array.
 For example, we can use int scores[3]; to declare an array of 3 integers.
 And we can assign and use variables in an array with:

#include <cs50.h>
#include <stdio.h>

int main(void)
{
// Scores
int scores[3];
scores[0] = 72;
scores[1] = 73;
scores[2] = 33;

// Print average
printf("Average: %i\n", (scores[0] + scores[1] + scores[2]) / 3);
}

  Notice that arrays are zero-indexed, meaning that the first element, or value, has index 0.
  And we repeated the value 3, representing the length of our array, in two different places. So we can use a constant, or fixed value, to
 indicate it should always be the same in both places:

#include <cs50.h>
#include <stdio.h>

const int N = 3;

int main(void)
{
// Scores
int scores[N];
scores[0] = 72;
scores[1] = 73;
scores[2] = 33;

// Print average
printf("Average: %i\n", (scores[0] + scores[1] + scores[2]) / N);
}

 We can use the const keyword to tell the compiler that the value of N should never be changed by our program. And by convention,
we’ll place our declaration of the variable outside of the main function and capitalize its name, which isn’t necessary for the compiler
but shows other humans that this variable is a constant and makes it easy to see from the start.
 With an array, we can collect our scores in a loop, and access them later in a loop, too:

#include <cs50.h>
#include <stdio.h>

float average(int length, int array[]);

int main(void)
{
// Get number of scores
int n = get_int("Scores: ");

// Get scores
int scores[n];
for (int i = 0; i < n; i++)
{
scores[i] = get_int("Score %i: ", i + 1);
}

// Print average
printf("Average: %.1f\n", average(n, scores));
}

float average(int length, int array[])
{
int sum = 0;
for (int i = 0; i < length; i++)
{
sum += array[i];
}
return (float) sum / (float) length;
}

 First, we’ll ask the user for the number of scores they have, create an array with enough int s for the number of scores they have, and
use a loop to collect all the scores.
  Then we’ll write a helper function, average , to return a float , or a decimal value. We’ll pass in the length and an array of int s
 (which could be any size), and use another loop inside our helper function to add up the values into a sum. We use (float) to cast
 both sum and length into floats, so the result we get from dividing the two is also a float.
 Finally, when we print the result we get, we use %.1f to show just one place after the decimal.
 In memory, our array is now stored like this, where each value takes up not one but four bytes:

Strings
 Strings are actually just arrays of characters. If we had a string s , each character can be accessed with s[0] , s[1] , and so on.
 And it turns out that a string ends with a special character, ‘\0’, or a byte with all bits set to 0. This character is called the null character, or
null terminating character. So we actually need four bytes to store our string “HI!”:

 Now let’s see what four strings in an array might look like:

string names[4];
names[0] = "EMMA";
names[1] = "RODRIGO";
names[2] = "BRIAN";
names[3] = "DAVID";

printf("%s\n", names[0]);
printf("%c%c%c%c\n", names[0][0], names[0][1], names[0][2], names[0][3]);

  We can print the first value in names as a string, or we can get the first string, and get each individual character in that string by
 using [] again. (We can think of it as (names[0])[0] , though we don’t need the parentheses.)
  And though we know that the first name had four characters, printf probably used a loop to look at each character in the string,
 printing them one at a time until it reached the null character that marks the end of the string. And in fact, we can print names[0][4]
 as an int with %i , and see a 0 being printed.
 We can visualize each character with its own label in memory:

 We can try experimenting with string0.c :

#include <cs50.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
string s = get_string("Input: ");
printf("Output: ");
for (int i = 0; i < strlen(s); i++)
{
printf("%c", s[i]);
}
printf("\n");
}

  Here we use the length of the string in our loop’s condition, but first, we need a new library, string.h , for strlen , which tells us the length of a string.
  We could instead use the condition s[i] != '\0' , where we check the current character and only print it if it’s not the null character.
  We can improve the design of our program. string0 was a bit inefficient, since we check the length of the string, after each character is
 printed, in our condition. But since the length of the string doesn’t change, we can check the length of the string once:

#include <cs50.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    string s = get_string("Input: ");
    printf("Output:\n");
    for (int i = 0, n = strlen(s); i < n; i++)
    {
        printf("%c\n", s[i]);
    }
}

 Now, at the start of our loop, we initialize both an i and n variable, and remember the length of our string in n . Then, we can
check the values each time, without having to actually calculate the length of the string.
 And we did need to use a little more memory for n , but this saves us some time with not having to check the length of the string
each time.
 We can now combine what we’ve seen, to write a program that can capitalize letters:

#include <cs50.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
string s = get_string("Before: ");
printf("After: ");
for (int i = 0, n = strlen(s); i < n; i++)
{
if (s[i] >= 'a' && s[i] <= 'z')
{
printf("%c", s[i] - 32);
}
else
{
printf("%c", s[i]);
}
}
printf("\n");
}

 First, we get a string s . Then, for each character in the string, if it’s lowercase (its value is between that of a and z ), we convert it
to uppercase. Otherwise, we just print it.
 We can convert a lowercase letter to its uppercase equivalent, by subtracting the difference between their ASCII values. (We know that
lowercase letters have a higher ASCII value than uppercase letters, and the difference is conveniently the same between the same
letters, so we can subtract that difference to get an uppercase letter from a lowercase letter.)
  We can use the man pages (https://man.cs50.io/), or programmer’s manual, to find library functions that we can use to accomplish the
 same thing:

#include <cs50.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
string s = get_string("Before: ");
printf("After: ");
for (int i = 0, n = strlen(s); i < n; i++)
{
printf("%c", toupper(s[i]));
}
printf("\n");
}

 From searching the man pages, we see toupper() is a function, among others, from a library called ctype , that we can use.

Command-line arguments
  We’ve used programs like make and clang , which take in extra words after their name in the command line. It turns out that programs of
 our own can also take in command-line arguments.
 In argv.c , we change what our main function looks like:

#include <cs50.h>
#include <stdio.h>

int main(int argc, string argv[])
{
    if (argc == 2)
    {
        printf("hello, %s\n", argv[1]);
    }
    else
    {
        printf("hello, world\n");
    }
}

  argc and argv are two variables that our main function will now get, when our program is run from the command line. argc is
 the argument count, or number of arguments, and argv is an array of strings that are the arguments. And the first argument,
 argv[0] , is the name of our program (the first word typed, like ./hello ). In this example, we check if we have two arguments, and
 print out the second one if so.
 For example, if we run ./argv David , we’ll get hello, David printed, since we typed in David as the second word in our command.
 It turns out that we can indicate errors in our program by returning a value from our main function (as implied by the int before our
main function). By default, our main function returns 0 to indicate nothing went wrong, but we can write a program to return a
different value:

#include <cs50.h>
#include <stdio.h>

int main(int argc, string argv[])
{
if (argc != 2)
{
printf("missing command-line argument\n");
return 1;
}
printf("hello, %s\n", argv[1]);
return 0;
}

 The return value of main in our program is called an exit code.

  As we write more complex programs, error codes like this can help us determine what went wrong, even if it’s not visible or meaningful to
 the user.

Readability
 Now that we know how to work with strings in our programs, we can analyze paragraphs of text for their level of readability, based on
factors like how long and complicated the words and sentences are.

Encryption
 If we wanted to send a message to someone, we might want to encrypt, or somehow scramble that message so that it would be hard for
others to read. The original message, or input to our algorithm, is called plaintext, and the encrypted message, or output, is called
ciphertext.
 A message like HI! could be converted to ASCII, 72 73 33 . But anyone would be able to convert that back to letters.
 An encryption algorithm generally requires another input, in addition to the plaintext. A key is needed, and sometimes it is simply a
number that is kept secret. With the key, plaintext can be converted, via some algorithm, to ciphertext, and vice versa.
 For example, if we wanted to send a message like I L O V E Y O U, we can first convert it to ASCII: 73 76 79 86 69 89 79 85. Then,
we can encrypt it with a key of just 1 and a simple algorithm, where we just add the key to each value: 74 77 80 87 70 90 80 86.
Then, someone converting that ASCII back to text will see J M P W F Z P V. To decrypt this, someone will need to know the key.
 We’ll apply these concepts in our problem set!
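The add-the-key idea above can be sketched in C as follows. This is only an illustration: the function name encrypt is hypothetical, the key is assumed to be non-negative, and wrapping from Z back around to A is an added assumption (the notes simply add the key to each value).

```c
#include <ctype.h>

// Shift each uppercase letter in text forward by key positions, in place.
// Assumption: key is non-negative, and letters wrap from Z back to A.
// Characters that are not uppercase letters are left unchanged.
void encrypt(char *text, int key)
{
    for (int i = 0; text[i] != '\0'; i++)
    {
        if (isupper((unsigned char) text[i]))
        {
            text[i] = 'A' + (text[i] - 'A' + key) % 26;
        }
    }
}
```

Calling encrypt on a modifiable array holding I LOVE YOU with a key of 1 turns its letters into J MPWF ZPV, matching the values worked out above.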


Lecture 3
Searching
Big O
Linear search
Structs
Sorting
Selection sort
Recursion
Merge sort

Searching
 Last time, we talked about memory in a computer, or RAM, and how our data can be stored as individual variables or as arrays of many
items, or elements.
 We can think of an array with a number of items as a row of lockers, where a computer can only open one locker to look at an item, one at
a time.
 For example, if we want to check whether a number is in an array, with an algorithm that takes in an array as input and produces a boolean
as a result, we might:
 look in each locker, or at each element, one at a time, from the beginning to the end.
 This is called linear search, where we move in a line, since our array isn’t sorted.
 start in the middle and move left or right depending on what we’re looking for, if our array of items is sorted.
 This is called binary search, since we can divide our problem in two with each step, like what David did with the phone book in
week 0.
 We might write pseudocode for linear search with:

For i from 0 to n–1
    If i'th element is 50
        Return true
Return false

 We can label each of n lockers from 0 to n–1 , and check each of them in order.
 For binary search, our algorithm might look like:

If no items
    Return false
If middle item is 50
    Return true
Else if 50 < middle item
    Search left half
Else if 50 > middle item
    Search right half

 Eventually, we won’t have any parts of the array left (if the item we want wasn’t in it), so we can return false .
 Otherwise, we can search each half depending on the value of the middle item.
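The binary search pseudocode above might be sketched in C like this. It is a minimal iterative version rather than the lecture's own code: the name binary_search and its parameters are assumptions, and the array must already be sorted in ascending order.

```c
#include <stdbool.h>

// Return true if target appears in values[0..n-1], which must be sorted
// in ascending order. Each step halves the range still being searched.
bool binary_search(const int values[], int n, int target)
{
    int low = 0;
    int high = n - 1;

    // Once low passes high, there are no items left to check
    while (low <= high)
    {
        int middle = low + (high - low) / 2;
        if (values[middle] == target)
        {
            return true;
        }
        else if (target < values[middle])
        {
            high = middle - 1;   // Search left half
        }
        else
        {
            low = middle + 1;    // Search right half
        }
    }
    return false;
}
```

With the sorted array {4, 8, 15, 16, 23, 42}, searching for 23 finds it, and searching for 50 returns false once the range is empty.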

Big O

 In week 0, we saw different types of algorithms and their running times:

 The more formal way to describe this is with big O notation, which we can think of as “on the order of”. For example, if our algorithm is
linear search, it will take approximately O(n) steps, “on the order of n”. In fact, even an algorithm that looks at two items at a time and
takes n/2 steps has O(n). This is because, as n gets bigger and bigger, only the largest term, n, matters.
 Similarly, a logarithmic running time is O(log n), no matter what the base is, since this is just an approximation of what happens when n is
very large.
 There are some common running times:
 O(n^2)
 O(n log n)
 O(n)
 (linear search)
 O(log n)
 (binary search)
 O(1)
 Computer scientists might also use big Ω, big Omega notation, which is the lower bound of number of steps for our algorithm. (Big O is the
upper bound of number of steps, or the worst case, and typically what we care about more.) With linear search, for example, the worst case
is n steps, but the best case is 1 step since our item might happen to be the first item we check. The best case for binary search, too, is 1
since our item might be in the middle of the array.
 And we have a similar set of the most common big Ω running times:
 Ω(n^2)
 Ω(n log n)
 Ω(n)
 (counting the number of items)
 Ω(log n)
 Ω(1)
 (linear search, binary search)

Linear search
 Let’s take a look at numbers.c :

#include <cs50.h>
#include <stdio.h>

int main(void)
{
// An array of numbers
int numbers[] = {4, 8, 15, 16, 23, 42};

// Search for 50
for (int i = 0; i < 6; i++)
{
if (numbers[i] == 50)
{
printf("Found\n");
return 0;
}
}
printf("Not found\n");
return 1;
}

 Here we initialize an array with some values, and we check the items in the array one at a time, in order.
 And in each case, depending on whether the value was found or not, we can return an exit code of either 0 (for success) or 1 (for
failure).
 We can do the same for names:

#include <cs50.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
// An array of names
string names[] = {"EMMA", "RODRIGO", "BRIAN", "DAVID"};

// Search for EMMA

for (int i = 0; i < 4; i++)
{
if (strcmp(names[i], "EMMA") == 0)
{
printf("Found\n");
return 0;
}
}
printf("Not found\n");
return 1;
}

 We can’t compare strings directly, since they’re not a simple data type but rather an array of many characters, and we need to compare
them differently. Luckily, the string library has a strcmp function which compares strings for us and returns 0 if they’re the same,
so we can use that.
 Let’s try to implement a phone book with the same ideas:

#include <cs50.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
string names[] = {"EMMA", "RODRIGO", "BRIAN", "DAVID"};
string numbers[] = {"617–555–0100", "617–555–0101", "617–555–0102", "617–555–0103"};

for (int i = 0; i < 4; i++)

{
if (strcmp(names[i], "EMMA") == 0)
{
printf("Found %s\n", numbers[i]);
return 0;
}
}
printf("Not found\n");
return 1;
}

 We’ll use strings for phone numbers, since they might include formatting or be too long for a number.
 Now, if the name at a certain index in the names array matches who we’re looking for, we’ll print the phone number in the numbers
array, at the same index. But that means we need to be particularly careful to make sure that each number corresponds to the name at
each index, especially if we add or remove names and numbers.

Structs
 It turns out that we can make our own custom data types called structs:

#include <cs50.h>
#include <stdio.h>
#include <string.h>

typedef struct
{
string name;
string number;
}
person;

int main(void)
{
person people[4];

people[0].name = "EMMA";
people[0].number = "617–555–0100";

people[1].name = "RODRIGO";
people[1].number = "617–555–0101";

people[2].name = "BRIAN";
people[2].number = "617–555–0102";

people[3].name = "DAVID";
people[3].number = "617–555–0103";

// Search for EMMA

for (int i = 0; i < 4; i++)
{
if (strcmp(people[i].name, "EMMA") == 0)
{
printf("Found %s\n", people[i].number);
return 0;
}
}
printf("Not found\n");
return 1;
}

 We can think of structs as containers, inside of which are multiple other data types.
 Here, we create our own type with a struct called person , which will have a string called name and a string called number .
Then, we can create an array of these struct types and initialize the values inside each of them, using a new syntax, . , to access the
properties of each person .
 In our loop, we can now be more certain that the number corresponds to the name since they are from the same person element.

Sorting
 If our input is an unsorted list of numbers, there are many algorithms we could use to produce an output of a sorted list.
 With eight volunteers on the stage with the following numbers, we might consider swapping pairs of numbers next to each other as a first
step.
 Our volunteers start in the following random order:

6 3 8 5 2 7 4 1

 We look at the first two numbers, and swap them so they are in order:

6 3 8 5 2 7 4 1
– –
3 6 8 5 2 7 4 1

 The next pair, 6 and 8 , are in order, so we don’t need to swap them.
 The next pair, 8 and 5 , need to be swapped:

3 6 8 5 2 7 4 1
– –
3 6 5 8 2 7 4 1

 We continue until we reach the end of the list:

3 6 5 2 8 7 4 1
– –
3 6 5 2 7 8 4 1
– –
3 6 5 2 7 4 8 1
– –
3 6 5 2 7 4 1 8

 Our list isn’t sorted yet, but we’re slightly closer to the solution because the biggest value, 8 , has been shifted all the way to the right.
 We repeat this with another pass through the list:

3 6 5 2 7 4 1 8
– –
3 6 5 2 7 4 1 8
– –
3 5 6 2 7 4 1 8
– –
3 5 2 6 7 4 1 8
– –
3 5 2 6 7 4 1 8
– –
3 5 2 6 4 7 1 8
– –
3 5 2 6 4 1 7 8

 Note that we didn’t need to swap the 3 and 6, or the 6 and 7.

 Now, the next biggest value, 7 , moved all the way to the right. If we repeat this, more and more of the list becomes sorted, and pretty
quickly we have a fully sorted list.
 This algorithm is called bubble sort, where large values “bubble” to the right. The pseudocode for this might look like:

Repeat n–1 times
    For i from 0 to n–2
        If i'th and i+1'th elements out of order
            Swap them
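As a rough sketch, the pseudocode above could be translated to C like this. The function name bubble_sort and the choice to sort an array of int in ascending order are assumptions made for the example.

```c
// Sort values[0..n-1] in ascending order with bubble sort.
void bubble_sort(int values[], int n)
{
    // Repeat n-1 times
    for (int pass = 0; pass < n - 1; pass++)
    {
        // For i from 0 to n-2, compare each adjacent pair
        for (int i = 0; i < n - 1; i++)
        {
            // If the i'th and i+1'th elements are out of order, swap them
            if (values[i] > values[i + 1])
            {
                int tmp = values[i];
                values[i] = values[i + 1];
                values[i + 1] = tmp;
            }
        }
    }
}
```

Passing in the volunteers' order, 6 3 8 5 2 7 4 1, with n set to 8 leaves the array as 1 2 3 4 5 6 7 8.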

 Since we are comparing the i'th and i+1'th element, we only need to go up to n – 2 for i . Then, we swap the two elements if
they’re out of order.
 And we can stop after we’ve made n – 1 passes, since we know the largest n–1 elements will have bubbled to the right.
 We have n – 1 comparisons in the inner loop (i goes from 0 to n – 2), and n – 1 passes, so we get (n – 1)(n – 1) = n^2 – 2n + 1 steps total. But the largest factor, or dominant term, is n^2, as
n gets larger and larger, so we can say that bubble sort is O(n^2).
 We’ve seen running times like the following, and so even though binary search is much faster than linear search, it might not be worth the
one–time cost of sorting the list first, unless we do lots of searches over time:
 O(n^2)
 bubble sort
 O(n log n)
 O(n)
 linear search
 O(log n)
 binary search
 O(1)
 And Ω for bubble sort is still n^2, since we still check each pair of elements for n – 1 passes.

Selection sort
 We can take another approach with the same set of numbers:

6 3 8 5 2 7 4 1

 First, we’ll look at each number, and remember the smallest one we’ve seen. Then, we can swap it with the first number in our list, since
we know it’s the smallest:

6 3 8 5 2 7 4 1
– –
1 3 8 5 2 7 4 6

 Now we know at least the first element of our list is in the right place, so we can look for the smallest element among the rest, and swap
it with the next unsorted element (now the second element):

1 3 8 5 2 7 4 6
– –
1 2 8 5 3 7 4 6

 We can repeat this over and over, until we have a sorted list.
 This algorithm is called selection sort, and we might write pseudocode like this:

For i from 0 to n–1
    Find smallest item between i'th item and last item
    Swap smallest item with i'th item
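One way the selection sort pseudocode above might look in C is sketched below. The name selection_sort and the use of an int array are assumptions for illustration.

```c
// Sort values[0..n-1] in ascending order with selection sort.
void selection_sort(int values[], int n)
{
    for (int i = 0; i < n; i++)
    {
        // Find the smallest item between the i'th item and the last item
        int smallest = i;
        for (int j = i + 1; j < n; j++)
        {
            if (values[j] < values[smallest])
            {
                smallest = j;
            }
        }

        // Swap the smallest item with the i'th item
        int tmp = values[i];
        values[i] = values[smallest];
        values[smallest] = tmp;
    }
}
```

The inner loop always scans the whole unsorted remainder, even when the array is already in order, which is why the number of steps does not depend on the input's starting order.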

 With big O notation, we still have running time of O(n^2), since we were looking at roughly all n elements to find the smallest, and making n
passes to sort all the elements.
 More formally, we can use some formulas to show that the biggest factor is indeed n^2:

n + (n – 1) + (n – 2) + ... + 1
n(n + 1)/2
(n^2 + n)/2
n^2/2 + n/2
O(n^2)

 So it turns out that selection sort is fundamentally about the same as bubble sort in running time:
 O(n^2)
 bubble sort, selection sort
 O(n log n)
 O(n)
 linear search
 O(log n)
 binary search
 O(1)
 The best case, Ω, is also n^2.
 We can go back to bubble sort and change its algorithm to be something like this, which will allow us to stop early if all the elements are
sorted:

Repeat until no swaps
    For i from 0 to n–2
        If i'th and i+1'th elements out of order
            Swap them
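The "repeat until no swaps" version above could be sketched in C as follows. The function name is hypothetical, and returning the number of passes is an addition made only so the early stop can be observed.

```c
#include <stdbool.h>

// Bubble sort that stops as soon as a full pass makes no swaps.
// Returns the number of passes made (an extra, for illustration only).
int bubble_sort_early_exit(int values[], int n)
{
    int passes = 0;
    bool swapped;
    do
    {
        swapped = false;
        for (int i = 0; i < n - 1; i++)
        {
            if (values[i] > values[i + 1])
            {
                int tmp = values[i];
                values[i] = values[i + 1];
                values[i + 1] = tmp;
                swapped = true;
            }
        }
        passes++;
    }
    while (swapped);
    return passes;
}
```

On an array that is already sorted, the first pass makes no swaps, so the function returns after a single pass over the elements.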

 Now, we only need to look at each element once, so the best case is now Ω(n):
 Ω(n^2)
 selection sort
 Ω(n log n)
 Ω(n)
 bubble sort
 Ω(log n)
 Ω(1)
 linear search, binary search
 We look at a visualization online comparing sorting algorithms (https://www.cs.usfca.edu/~galles/visualization/ComparisonSort.html) with
animations for how the elements move within arrays for both bubble sort and selection sort.

Recursion
 Recall that in week 0, we had pseudocode for finding a name in a phone book, where we had lines telling us to “go back” and repeat some
steps:

1  Pick up phone book
2  Open to middle of phone book
3  Look at page
4  If Smith is on page
5      Call Mike
6  Else if Smith is earlier in book
7      Open to middle of left half of book
8      **Go back to line 3**
9  Else if Smith is later in book
10     Open to middle of right half of book
11     **Go back to line 3**
12 Else
13     Quit

 We could instead just repeat our entire algorithm on the half of the book we have left:

1  Pick up phone book
2  Open to middle of phone book
3  Look at page
4  If Smith is on page
5      Call Mike
6  Else if Smith is earlier in book
7      **Search left half of book**
8
9  Else if Smith is later in book
10     **Search right half of book**
11
12 Else
13     Quit

 This seems like a cyclical process that will never end, but we’re actually dividing the problem in half each time, and stopping once
there’s no more book left.
 Recursion occurs when a function or algorithm refers to itself, as in the new pseudocode above.
 In week 1, too, we implemented a “pyramid” of blocks in the following shape:

#
##
###
####

 And we might have had iterative code like this:

#include <cs50.h>
#include <stdio.h>

void draw(int h);

int main(void)
{
// Get height of pyramid
int height = get_int("Height: ");

// Draw pyramid
draw(height);
}

void draw(int h)
{
// Draw pyramid of height h
for (int i = 1; i <= h; i++)
{
for (int j = 1; j <= i; j++)
{
printf("#");
}
printf("\n");
}
}

 Here, we use for loops to print each block in each row.

 But notice that a pyramid of height 4 is actually a pyramid of height 3, with an extra row of 4 blocks added on. And a pyramid of height 3
is a pyramid of height 2, with an extra row of 3 blocks. A pyramid of height 2 is a pyramid of height 1, with an extra row of 2 blocks. And
finally, a pyramid of height 1 is just a pyramid of height 0, or nothing, with another row of a single block added on.
 With this idea in mind, we can write:

#include <cs50.h>
#include <stdio.h>

void draw(int h);

int main(void)
{
// Get height of pyramid
int height = get_int("Height: ");

// Draw pyramid
draw(height);
}

void draw(int h)
{
// If nothing to draw
if (h == 0)
{
return;
}

// Draw pyramid of height h - 1

draw(h - 1);

// Draw one more row of width h

for (int i = 0; i < h; i++)
{
printf("#");
}
printf("\n");
}

 Now, our draw function first calls itself recursively, drawing a pyramid of height h - 1. But even before that, we need to stop if h
is 0, since there won’t be anything left to draw.
 After, we draw the next row, or a row of width h .

Merge sort
 We can take the idea of recursion to sorting, with another algorithm called merge sort. The pseudocode might look like:

If only one item
    Return
Else
    Sort left half of items
    Sort right half of items
    Merge sorted halves
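A possible C sketch of this pseudocode follows. It is one of several reasonable ways to write merge sort, not the lecture's code: the names merge and merge_sort, the half-open index ranges, and the caller-supplied scratch array used while merging are all assumptions.

```c
#include <string.h>

// Merge the sorted runs values[low..middle-1] and values[middle..high-1],
// repeatedly taking the smaller item from the front of either run.
static void merge(int values[], int scratch[], int low, int middle, int high)
{
    int left = low;
    int right = middle;
    int out = low;

    while (left < middle && right < high)
    {
        if (values[left] <= values[right])
        {
            scratch[out++] = values[left++];
        }
        else
        {
            scratch[out++] = values[right++];
        }
    }
    while (left < middle)
    {
        scratch[out++] = values[left++];
    }
    while (right < high)
    {
        scratch[out++] = values[right++];
    }

    // Copy the merged run back into place
    memcpy(values + low, scratch + low, (high - low) * sizeof(int));
}

// Sort values[low..high-1]; scratch needs room for at least high items.
void merge_sort(int values[], int scratch[], int low, int high)
{
    // If only one item (or none), there is nothing to sort
    if (high - low <= 1)
    {
        return;
    }
    int middle = low + (high - low) / 2;
    merge_sort(values, scratch, low, middle);    // Sort left half
    merge_sort(values, scratch, middle, high);   // Sort right half
    merge(values, scratch, low, middle, high);   // Merge sorted halves
}
```

To sort an array of eight items, we would call merge_sort(values, scratch, 0, 8) with a second array of the same size as scratch space.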

 We’ll best see this in practice with an unsorted list:

7 4 5 2 6 3 8 1

 First, we’ll sort the left half (the first four elements):

7 4 5 2 | 6 3 8 1
– – – –

 Well, to sort that, we need to sort the left half of the left half first:

7 4 | 5 2 | 6 3 8 1
– –

 Now, we have just one item, 7, in the left half, and one item, 4, in the right half. So we’ll merge those together, by taking the smallest
item from each list first:

– – | 5 2 | 6 3 8 1
4 7

 And now we go back to the right half of the left half, and sort it:

– – | – – | 6 3 8 1
4 7 | 2 5

 Now, both halves of the left half are sorted, so we can merge the two of them together. We look at the start of each list, and take 2 since
it’s smaller than 4. Then, we take 4, since it’s now the smallest item at the front of both lists. Then, we take 5, and finally, 7, to get:

– – – – | 6 3 8 1
– – – –
2 4 5 7

 We now sort the right half the same way. First, the left half of the right half:

– – – – | – – | 8 1
– – – – | 3 6 |
2 4 5 7

 Then, the right half of the right half:

– – – – | – – | – –
– – – – | 3 6 | 1 8
2 4 5 7

 We can merge the right half together now:

– – – – | – – – –
– – – – | – – – –
2 4 5 7 | 1 3 6 8

 And finally, we can merge both halves of the whole list, following the same steps as before. Notice that we don’t need to check all the
elements of each half to find the smallest, since we know that each half is already sorted. Instead, we just take the smallest element of the
two at the start of each half:

– – – – | – – – –
– – – – | – – – –
2 4 5 7 | – 3 6 8
1

– – – – | – – – –
– – – – | – – – –
– 4 5 7 | – 3 6 8
1 2

– – – – | – – – –
– – – – | – – – –
– 4 5 7 | – – 6 8
1 2 3

– – – – | – – – –
– – – – | – – – –
– – 5 7 | – – 6 8
1 2 3 4

– – – – | – – – –
– – – – | – – – –
– – – 7 | – – 6 8
1 2 3 4 5

– – – – | – – – –
– – – – | – – – –
– – – 7 | – – – 8
1 2 3 4 5 6

– – – – | – – – –
– – – – | – – – –
– – – – | – – – 8
1 2 3 4 5 6 7

– – – – | – – – –
– – – – | – – – –
– – – – | – – – –
1 2 3 4 5 6 7 8

 It took a lot of steps, but it actually took fewer steps than the other algorithms we’ve seen so far. We broke our list in half each time, until
we were “sorting” eight lists with one element each:

7 | 4 | 5 | 2 | 6 | 3 | 8 | 1
4 7 | 2 5 | 3 6 | 1 8
2 4 5 7 | 1 3 6 8
1 2 3 4 5 6 7 8

 Since our algorithm divided the problem in half each time, there are on the order of log n levels of halving. And at each level, after we sorted each half (or
half of a half), we needed to merge together all the elements, with n steps since we had to look at each element once.
 So our total running time is O(n log n):
 O(n^2)
 bubble sort, selection sort
 O(n log n)
 merge sort
 O(n)
 linear search
 O(log n)
 binary search
 O(1)
 Since log n is greater than 1 but less than n, n log n is in between n (times 1) and n^2.
 The best case, Ω, is still n log n, since we still sort each half first and then merge them together:
 Ω(n^2)
 selection sort
 Ω(n log n)
 merge sort
 Ω(n)
 bubble sort
 Ω(log n)
 Ω(1)
 linear search, binary search
 Finally, there is another notation, Θ, Theta, which we use to describe running times of algorithms if the upper bound and lower bound are
the same. For example, merge sort has Θ(n log n) since the best and worst case both require the same number of steps. And selection sort
has Θ(n^2).
 We look at a final visualization (https://www.youtube.com/watch?v=ZZuD6iUe3Pc) of sorting algorithms with a larger number of inputs,
running at the same time.


Lecture 4
Hexadecimal
Pointers
string
Compare and copy
valgrind
Swap
Memory layout
get_int
Files
JPEG

Hexadecimal
 In week 0, we learned binary, a counting system with 0s and 1s.
 In week 2, we talked about memory and how each byte has an address, or identifier, so we can refer to where our variables are actually
stored.
 It turns out that, by convention, the addresses for memory use the counting system hexadecimal, where there are 16 digits, 0-9 and A-F.
 Recall that, in binary, each digit stood for a power of 2:

128 64 32 16 8 4 2 1
1 1 1 1 1 1 1 1

 With 8 bits, we can count up to 255.

 It turns out that, in hexadecimal, we can perfectly count up to 8 binary bits with just 2 digits:

16^1 16^0
F F

 Here, the F is a value of 15 in decimal, and each place is a power of 16, so the first F is 16^1 * 15 = 240, plus the second F with
the value of 16^0 * 15 = 15, for a total of 255.
 And 0A is the same as 10 in decimal, and 0F the same as 15. 10 in hexadecimal would be 16, and we would say it as “one zero in
hexadecimal” instead of “ten”, if we wanted to avoid confusion.
 The RGB color system also conventionally uses hexadecimal to describe the amount of each color. For example, 000000 in hexadecimal
means 0 of each red, green, and blue, for a color of black. And FF0000 would be 255, or the highest possible, amount of red. With
different values for each color, we can represent millions of different colors.
 In writing, we can also indicate a value is in hexadecimal by prefixing it with 0x, as in 0x10, where the value is equal to 16 in decimal,
as opposed to 10.
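The place-value arithmetic above can be sketched as a small C function. The name hex_to_decimal is hypothetical, and the sketch assumes its input contains only the digits 0-9 and the letters A-F (in either case), with no 0x prefix.

```c
#include <ctype.h>

// Convert a string of hexadecimal digits, like "FF", to its integer value.
// Assumption: hex contains only 0-9, A-F, or a-f, and the result fits in an int.
int hex_to_decimal(const char *hex)
{
    int value = 0;
    for (int i = 0; hex[i] != '\0'; i++)
    {
        int c = toupper((unsigned char) hex[i]);

        // '0'-'9' are worth 0-9, and 'A'-'F' are worth 10-15
        int digit = isdigit(c) ? c - '0' : c - 'A' + 10;

        // Moving one place to the left multiplies what we have so far by 16
        value = value * 16 + digit;
    }
    return value;
}
```

With this, FF gives 255, 0A gives 10, and 10 gives 16, matching the examples above.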

Pointers
 We might create a value n , and print it out:

#include <stdio.h>

int main(void)
{
int n = 50;
printf("%i\n", n);
}

 In our computer’s memory, there are now 4 bytes somewhere that have the binary value of 50, labeled n :

 It turns out that, with the billions of bytes in memory, those bytes for the variable n start at some unique address that might look like
0x12345678.
 In C, we can actually see the address with the & operator, which means “get the address of this variable”:

#include <stdio.h>

int main(void)
{
int n = 50;
printf("%p\n", &n);
}

 And in the CS50 IDE, we might see an address like 0x7ffe00b3adbc, where this is a specific location in the server’s memory.
 The address of a variable is called a pointer, which we can think of as a value that “points” to a location in memory. The * operator lets us
“go to” the location that a pointer is pointing to.
 For example, we can print *&n , where we “go to” the address of n , and that will print out the value of n , 50 , since that’s the value at
the address of n :

#include <stdio.h>

int main(void)
{
int n = 50;
printf("%i\n", *&n);
}

 We also have to use the * operator (in an unfortunately confusing way) to declare a variable that we want to be a pointer:

#include <stdio.h>

int main(void)
{
int n = 50;
int *p = &n;
printf("%p\n", p);
}

 Here, we use int *p to declare a variable, p , that has the type of * , a pointer, to a value of type int , an integer. Then, we can
print its value (something like 0x12345678 ), or print the value at its location with printf("%i\n", *p); .

 In our computer’s memory, the variables might look like this:

 We have a pointer, p , with the address of some variable.

 We can abstract away the actual value of the addresses now, since they’ll be different as we declare variables in our programs, and simply
think of p as “pointing at” some value:

 Let’s say we have a mailbox labeled “123”, with the number “50” inside it. The mailbox would be int n , since it stores an integer. We
might have another mailbox with the address “456”, inside of which is the value “123”, which is the address of our other mailbox. This
would be int *p , since it’s a pointer to an integer.
 With the ability to use pointers, we can create different data structures, or different ways to organize data in memory that we’ll see next
week.
 Many modern computer systems are “64-bit”, meaning that they use 64 bits to address memory, so a pointer will be 8 bytes, twice as big as
an integer of 4 bytes.

string
 We might have a variable string s for a name like EMMA , and be able to access each character with s[0] and so on:

 But it turns out that each character is stored in memory at a byte with some address, and s is actually just a pointer with the address
of the first character:

 And since s is just a pointer to the beginning, only the \0 indicates the end of the string.
 In fact, the CS50 Library defines a string with typedef char *string, which just says that we want to name a new type, string, as a
char *, or a pointer to a character.
 Let’s print out a string:

#include <cs50.h>
#include <stdio.h>

int main(void)
{
string s = "EMMA";
printf("%s\n", s);
}

 This is familiar, but we can just say:

#include <stdio.h>

int main(void)
{
char *s = "EMMA";
printf("%s\n", s);
}

 This will also print EMMA .

 With printf("%p\n", s); , we can print s as its value as a pointer, like 0x42ab52. (printf knows to go to the address and print the
entire string when we use %s and pass in s, even though s only points to the first character.)
 We can also try printf("%p\n", &s[0]); , which is the address of the first character of s, and it’s exactly the same as printing s. And
printing &s[1], &s[2], and &s[3] gets us the addresses that are the next characters in memory after &s[0], like 0x42ab53, 0x42ab54,
and 0x42ab55, exactly one byte after another.
 And finally, if we try to printf("%c\n", *s); , we get a single character E, since we’re going to the address contained in s, which has
the first character in the string.
 In fact, s[0], s[1], and s[2] actually map directly to *s, *(s+1), and *(s+2), since each of the next characters is just at the
address of the next byte.
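To illustrate that s[i] and *(s + i) mean the same thing, here is a sketch of counting a string's length with pointer arithmetic alone. The name string_length is hypothetical; in practice the standard strlen does this job.

```c
#include <stddef.h>

// Count the characters in s by walking forward one byte at a time
// until we reach the terminating \0.
size_t string_length(const char *s)
{
    size_t n = 0;

    // *(s + n) is the same character as s[n]
    while (*(s + n) != '\0')
    {
        n++;
    }
    return n;
}
```

For EMMA, the loop visits E, M, M, and A, then stops at the \0, so the result is 4.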

Compare and copy

 Let’s look at compare0 :

#include <cs50.h>
#include <stdio.h>

int main(void)
{
// Get two integers
int i = get_int("i: ");
int j = get_int("j: ");

// Compare integers
if (i == j)
{
printf("Same\n");
}
else
{
printf("Different\n");
}
}

 We can compile and run this, and our program works as we’d expect, with the same values of the two integers giving us “Same” and
different values “Different”.
 In compare1 , we see that the same string values are causing our program to print “Different”:

#include <cs50.h>
#include <stdio.h>

int main(void)
{
// Get two strings
string s = get_string("s: ");
string t = get_string("t: ");

// Compare strings' addresses

if (s == t)
{
printf("Same\n");
}
else
{
printf("Different\n");
}
}

 Given what we now know about strings, this makes sense because each “string” variable is pointing to a different location in memory,
where the first character of each string is stored. So even if the values of the strings are the same, this will always print “Different”.
 For example, our first string might be at address 0x123, our second might be at 0x456, and s will be 0x123 and t will be 0x456,
so those values will be different.
 And get_string, this whole time, has been returning just a char *, or a pointer to the first character of a string from the user.
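To compare the characters rather than the addresses, we can go back to strcmp from the string library, which returns 0 when two strings hold the same characters. A minimal sketch, where the helper name same_text is hypothetical:

```c
#include <stdbool.h>
#include <string.h>

// Return true if s and t contain the same characters,
// even when they are stored at different addresses.
bool same_text(const char *s, const char *t)
{
    return strcmp(s, t) == 0;
}
```

Replacing the condition s == t with same_text(s, t) (or with strcmp(s, t) == 0 directly) would make the program print “Same” when the user types the same text twice.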
 Now let’s try to copy a string:

#include <cs50.h>
#include <ctype.h>
#include <stdio.h>

int main(void)
{
string s = get_string("s: ");

string t = s;

t[0] = toupper(t[0]);

// Print string twice

printf("s: %s\n", s);
printf("t: %s\n", t);
}

 We get a string s, and copy the value of s into t. Then, we capitalize the first letter in t.
 But when we run our program, we see that both s and t are now capitalized.
 Since we set s and t to the same values, they’re actually pointers to the same character, and so we capitalized the same character!
 To actually make a copy of a string, we have to do a little more work:

#include <cs50.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
char *s = get_string("s: ");

char *t = malloc(strlen(s) + 1);

for (int i = 0, n = strlen(s); i < n + 1; i++)
{
t[i] = s[i];
}

t[0] = toupper(t[0]);

printf("s: %s\n", s);

printf("t: %s\n", t);
}

 We create a new variable, t, of the type char *, with char *t. Now, we want to point it to a new chunk of memory that’s large
enough to store the copy of the string. With malloc, we can allocate some number of bytes in memory (that aren’t already used to
store other values), and we pass in the number of bytes we’d like. We already know the length of s, so we add 1 to that for the
terminating null character. So, our final line of code is char *t = malloc(strlen(s) + 1); .
 Then, we copy each character, one at a time, and now we can capitalize just the first letter of t. And we use i < n + 1, since we
actually want to go up to n, to ensure we copy the terminating character in the string.
 We can actually also use the strcpy library function with strcpy(t, s) instead of our loop, to copy the string s into t . To be
clear, the concept of a “string” is from the C language and well-supported; the only training wheels from CS50 are the type string
instead of char * , and the get_string function.
 If we didn’t copy the null terminating character, \0 , and tried to print out our string t , printf will continue and print out the unknown,
or garbage, values that we have in memory, until it happens to reach a \0 , or crashes entirely, since our program might end up trying to
read memory that doesn’t belong to it!
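The copying steps in this section could be gathered into one helper, sketched below. The name capitalized_copy is hypothetical, and the check for malloc returning NULL (which it does when no memory is available) is an addition that the program above leaves out.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

// Return a newly allocated copy of s with its first letter capitalized,
// or NULL if memory could not be allocated. The caller should free the result.
char *capitalized_copy(const char *s)
{
    // One extra byte for the terminating \0
    char *t = malloc(strlen(s) + 1);
    if (t == NULL)
    {
        return NULL;
    }

    // strcpy copies every character, including the terminating \0
    strcpy(t, s);

    // Only capitalize if the string is not empty
    if (t[0] != '\0')
    {
        t[0] = toupper((unsigned char) t[0]);
    }
    return t;
}
```

Because t points to its own chunk of memory, capitalizing t[0] leaves the original string s unchanged.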

valgrind
 It turns out that, after we’re done with memory that we’ve allocated with malloc , we should call free (as in free(t) ), which tells our
computer that those bytes are no longer useful to our program, so those bytes in memory can be reused again.
 If we kept running our program and allocating memory with malloc , but never freed the memory after we were done using it, we would
have a memory leak, which will slow down our computer and use up more and more memory until our computer runs out.
 valgrind is a command-line tool that we can use to run our program and see if it has any memory leaks. We can run valgrind on our
program above with help50 valgrind ./copy and see, from the error message, that on line 10, we allocated memory that we never freed (or
“lost”).
 So at the end, we can add a line free(t), which won’t change how our program runs, but will leave us with no errors from valgrind.
 Let’s take a look at memory.c :

// http://valgrind.org/docs/manual/quick-start.html#quick-start.prepare

#include <stdlib.h>

void f(void)
{
int *x = malloc(10 * sizeof(int));
x[10] = 0;
}

int main(void)
{
f();
return 0;
}

 This is an example from valgrind’s documentation (valgrind is a real tool, while help50 was written specifically to help us in this
course).
 The function f allocates enough memory for 10 integers, and stores the address in a pointer called x. Then we try to set the 11th
value of x with x[10] to 0, which goes past the array of memory we’ve allocated for our program. This is called buffer overflow,
where we go past the boundaries of our buffer, or array, and into unknown memory.
 valgrind will also tell us there’s an “Invalid write of size 4” for line 8, where we are indeed trying to change the value of an integer (of size
4 bytes).
 And this whole time, the CS50 Library has been freeing memory it’s allocated in get_string, when our program finishes!
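A corrected version of f might look like the sketch below, which stays inside the allocated memory and frees it when done. Returning a status code and checking malloc for NULL are additions for this example, not part of valgrind's original.

```c
#include <stdlib.h>

// Allocate room for 10 integers, write to the last valid one, then free.
// Returns 0 on success, or 1 if the allocation failed.
int f(void)
{
    int *x = malloc(10 * sizeof(int));
    if (x == NULL)
    {
        return 1;
    }

    // Valid indices are 0 through 9, so x[9] is the last integer we own
    x[9] = 0;

    // Give the memory back so nothing is leaked
    free(x);
    return 0;
}
```

With both changes, there is no longer a write past the end of the allocation, and no memory is left allocated when the function returns.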

Swap
 We have two colored drinks, purple and green, each of which is in a cup. We want to swap the drinks between the two cups, but we can’t
do that without a third cup to pour one of the drinks into first.
 Now, let’s say we wanted to swap the values of two integers.

void swap(int a, int b)

{
int tmp = a;
a = b;
b = tmp;
}

 With a third variable to use as temporary storage space, we can do this pretty easily, by putting a into tmp, and then b into a, and
finally the original value of a, now in tmp, into b.
 But, if we tried to use that function in a program, we don’t see any changes:

#include <stdio.h>

void swap(int a, int b);

int main(void)
{
int x = 1;
int y = 2;

printf("x is %i, y is %i\n", x, y);

swap(x, y);
printf("x is %i, y is %i\n", x, y);
}

void swap(int a, int b)

{
int tmp = a;
a = b;
b = tmp;
}

 It turns out that the swap function gets its own variables, a and b, when they are passed in, that are copies of x and y, and so
changing those values doesn’t change x and y in the main function.

Memory layout
 Within our computer’s memory, the different types of data that need to be stored for our program are organized into different sections:

 The machine code section is our compiled program’s binary code. When we run our program, that code is loaded into the “top” of
memory.
 Globals are global variables we declare in our program or other shared variables that our entire program can access.
 The heap section is an empty area where malloc can get free memory from, for our program to use.
 The stack section is used by functions in our program as they are called. For example, our main function is at the very bottom of the
stack, and has the local variables x and y. The swap function, when it’s called, has its own frame, or slice, of memory that’s on top
of main’s, with the local variables a, b, and tmp:

 Once the function swap returns, the memory it was using is freed for the next function call, and we lose anything we did, other
than the return values, and our program goes back to the function that called swap .
 So by passing in the addresses of x and y from main to swap , we can actually change the values of x and y :

 By passing in the address of x and y , our swap function can actually work:

#include <stdio.h>

void swap(int *a, int *b);

int main(void)
{
    int x = 1;
    int y = 2;

    printf("x is %i, y is %i\n", x, y);

    // Pass the addresses of x and y, not copies of their values
    swap(&x, &y);
    printf("x is %i, y is %i\n", x, y);
}

void swap(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

 The addresses of x and y are passed in from main to swap , and we use the int *a syntax to declare that our swap function
takes in pointers. We save the value of x to tmp by following the pointer a , and then take the value of y by following the pointer
b , and store that to the location a is pointing to ( x ). Finally, we store the value of tmp to the location pointed to by b ( y ), and
we’re done.
 If we call malloc too many times, we will have a heap overflow, where we end up going past our heap. Or, if we have too many functions
being called, we will have a stack overflow, where our stack has too many frames of memory allocated as well. And these two types of
overflow are generally known as buffer overflows, after which our program (or entire computer) might crash.

get_int
 We can implement get_int ourselves with a C library function, scanf :

#include <stdio.h>

int main(void)
{
int x;
printf("x: ");
scanf("%i", &x);
printf("x: %i\n", x);
}

 scanf takes a format, %i , so the input is “scanned” for that format, and the address in memory where we want that input to go. But
scanf doesn’t have much error checking, so we might not get an integer.
 We can try to get a string the same way:

#include <stdio.h>

int main(void)
{
char *s = NULL;
printf("s: ");
scanf("%s", s);
printf("s: %s\n", s);
}

 But we haven’t actually allocated any memory for s (s is NULL, or not pointing to anything), so we might want to declare char s[5]
to allocate an array of 5 characters for our string. Then, s will be treated as a pointer in scanf and printf.
 Now, if the user types in a string of length 4 or less, our program will work safely. But if the user types in a longer string, scanf
might be trying to write past the end of our array into unknown memory, causing our program to crash.

Files
 With the ability to use pointers, we can also open files:

#include <cs50.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    // Open file in append mode
    FILE *file = fopen("phonebook.csv", "a");
    // fopen returns NULL if the file couldn't be opened
    if (file == NULL)
    {
        return 1;
    }

    // Get strings from user
    char *name = get_string("Name: ");
    char *number = get_string("Number: ");

    // Print (write) strings to file
    fprintf(file, "%s,%s\n", name, number);

    // Close file
    fclose(file);
}

 fopen is a new function we can use to open a file. It will return a pointer to a new type, FILE, that we can read from and write to.
The first argument is the name of the file, and the second argument is the mode we want to open the file in (r for read, w for write,
and a for append, or adding to).
 After we get some strings, we can use fprintf to print to a file.
 Finally, we close the file with fclose.
 Now we can create our own CSV files, files of comma-separated values (like a mini-spreadsheet), programmatically.

JPEG
 We can also write a program that opens a file and tells us if it’s a JPEG (image) file:

#include <stdio.h>

int main(int argc, char *argv[])

{
// Check usage
if (argc != 2)
{
return 1;
}

// Open file
FILE *file = fopen(argv[1], "r");
if (!file)
{
return 1;
}

// Read first three bytes

unsigned char bytes[3];
fread(bytes, 3, 1, file);

// Check first three bytes

if (bytes[0] == 0xff && bytes[1] == 0xd8 && bytes[2] == 0xff)
{
printf("Maybe\n");
}
else
{
printf("No\n");
}

// Close file
fclose(file);
}

 Now, if we run this program with ./jpeg brian.jpg, our program will try to open the file we specify (checking that we indeed get a
non-NULL file back), and read the first three bytes from the file with fread.
 We can compare the first three bytes (in hexadecimal) to the three bytes required to begin a JPEG file. If they’re the same, then our file
is likely to be a JPEG file (though other types of files may still begin with those bytes). But if they’re not the same, we know it’s
definitely not a JPEG file.
 We can use these abilities to read and write files, in particular images, and modify them by changing the bytes in them, in this week’s
problem set!


Lecture 5
Pointers
Resizing arrays
Data structures
Linked Lists
More data structures

Pointers
 Last time, we learned about pointers, malloc , and other useful tools for working with memory.
 Let’s review this snippet of code:

int main(void)
{
int *x;
int *y;

x = malloc(sizeof(int));

*x = 42;
*y = 13;
}

 Here, the first two lines of code in our main function are declaring two pointers, x and y. Then, we allocate enough memory for an
int with malloc, and store the address returned by malloc into x.
 With *x = 42;, we go to the address pointed to by x, and store the value 42 into that location.
 The final line, though, is buggy: we never set a value for y, so we don’t know where it points. Instead, we can write:

y = x;
*y = 13;

 And this will set y to point to the same location as x does, and then set that value to 13 .
 We take a look at a short clip, Pointer Fun with Binky (https://www.youtube.com/watch?v=3uLKjb973HU), which also explains this snippet
in an animated way!

Resizing arrays
 In week 2, we learned about arrays, where we could store the same kind of value in a list, side-by-side. But we need to declare the size of
arrays when we create them, and when we want to increase the size of the array, the memory surrounding it might be taken up by some
other data.
 One solution might be to allocate more memory in a larger area that’s free, and move our array there, where it has more space. But we’ll
need to copy our array, which becomes an operation with running time of O(n), since we need to copy each of n elements in an array.
 We might write a program like the following, to do this in code:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
// Here, we allocate enough memory to fit three integers, and our variable
// list will point to the first integer.
int *list = malloc(3 * sizeof(int));
// We should check that we allocated memory correctly, since malloc might
// fail to get us enough free memory.
if (list == NULL)
{
return 1;
}

// With this syntax, the compiler will do pointer arithmetic for us, and
// calculate the byte in memory that list[0], list[1], and list[2] maps to,
// since integers are 4 bytes large.
list[0] = 1;
list[1] = 2;
list[2] = 3;

// Now, if we want to resize our array to fit 4 integers, we'll try to allocate
// enough memory for them, and temporarily use tmp to point to the first:
int *tmp = malloc(4 * sizeof(int));
if (tmp == NULL)
{
return 1;
}

// Now, we copy integers from the old array into the new array ...
for (int i = 0; i < 3; i++)
{
tmp[i] = list[i];
}

// ... and add the fourth integer:

tmp[3] = 4;

// We should free the original memory for list, which is why we need a
// temporary variable to point to the new array ...
free(list);

// ... and now we can set our list variable to point to the new array that
// tmp points to:
list = tmp;

// Now, we can print the new array:

for (int i = 0; i < 4; i++)
{
printf("%i\n", list[i]);
}

// And finally, free the memory for the new array.

free(list);
}

 It turns out that there’s actually a helpful function, realloc , which will reallocate some memory:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
int *list = malloc(3 * sizeof(int));
if (list == NULL)
{
return 1;
}

list[0] = 1;
list[1] = 2;
list[2] = 3;

// Here, we give realloc our original array that list points to, and it will
// return a new address for a new array, with the old data copied over:
int *tmp = realloc(list, 4 * sizeof(int));
if (tmp == NULL)
{
return 1;
}
// Now, all we need to do is remember the location of the new array:
list = tmp;

list[3] = 4;

for (int i = 0; i < 4; i++)

{
printf("%i\n", list[i]);
}

free(list);
}

Data structures
 Data structures are programming constructs that allow us to store information in different layouts in our computer’s memory.
 To build a data structure, we’ll need some tools we’ve seen:
 struct to create custom data types
 . to access properties in a structure
 * to go to an address in memory pointed to by a pointer

Linked Lists
 With a linked list, we can store a list of values that can easily be grown by storing values in different parts of memory:

 This is different than an array since our values are no longer next to one another in memory.
 We can link our list together by allocating, for each element, enough memory for both the value we want to store, and the address of the
next element:

 By the way, NUL refers to \0 , a character that ends a string, and NULL refers to an address of all zeros, or a null pointer that we can
think of as pointing nowhere.
 Unlike with arrays, we can no longer randomly access elements in a linked list. For example, we can no longer access the 5th element
of the list by calculating where it is, in constant time. (Since we know arrays store elements back-to-back, we can add 1, or 4, or the size of
our element, to calculate addresses.) Instead, we have to follow each element’s pointer, one at a time. And we need to allocate about twice as
much memory (or more) as we needed before for each element, since each node stores a pointer alongside its value.
 In code, we might create our own struct called node (like a node from a graph in mathematics), and we need to store both an int and a
pointer to the next node called next :

typedef struct node

{
int number;
struct node *next;
}
node;

 We start this struct with typedef struct node so that we can refer to a node inside our struct.
 We can build a linked list in code starting with our struct. First, we’ll want to remember an empty list, so we can use the null pointer: node
*list = NULL; .
 To add an element, first we’ll need to allocate some memory for a node, and set its values:

node *n = malloc(sizeof(node));
// We want to make sure malloc succeeded in getting memory for us:
if (n != NULL)
{
// This is equivalent to (*n).number, where we first go to the node pointed
// to by n, and then set the number property. In C, we can also use this
// arrow notation:
n->number = 2;
// Then we need to store a pointer to the next node in our list, but the
// new node won't point to anything (for now):
n->next = NULL;
}

 Now our list can point to this node: list = n; :

 To add to the list, we’ll create a new node the same way, perhaps with the value 4. But now we need to update the pointer in our first node
to point to it.
 Since our list pointer points only to the first node (and we can’t be sure that the list only has one node), we need to “follow the
breadcrumbs” and follow each node’s next pointer:

// Create temporary pointer to what list is pointing to

node *tmp = list;
// As long as the node has a next pointer ...
while (tmp->next != NULL)
{
// ... set the temporary to the next node
tmp = tmp->next;
}
// Now, tmp points to the last node in our list, and we can update its next
// pointer to point to our new node:
tmp->next = n;

 If we want to insert a node to the front of our linked list, we would need to carefully update our node to point to the one following it,
before updating list. Otherwise, we’ll lose the rest of our list:

// Here, we're inserting a node into the front of the list, so we want its
// next pointer to point to the original list, before pointing the list to
// n:
n->next = list;
list = n;

 And to insert a node in the middle of our list, we can go through the list, following each element one at a time, comparing its values, and
changing the next pointers carefully as well.
 With some volunteers on the stage, we simulate a list, with each volunteer acting as the list variable or a node. As we insert nodes into
the list, we need a temporary pointer to follow the list, and make sure we don’t lose any parts of our list. Our linked list only points to the
first node in our list, so we can only look at one node at a time, but we can dynamically allocate more memory as we need to grow our list.
 Now, even if our linked list is sorted, the running time of searching it will be O(n), since we have to follow each node to check their values,
and we don’t know where the middle of our list will be.
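 Inserting into the middle of a sorted list, as described above, might be sketched as follows. This is only a sketch, written in Python (the language these notes turn to in Lecture 6) rather than C, and the Node class and the insert_sorted and to_list helpers are hypothetical names chosen for illustration, not code from the lecture:

```python
class Node:
    """One element of a singly linked list: a number plus a link to the next node."""

    def __init__(self, number, next=None):
        self.number = number
        self.next = next


def insert_sorted(head, number):
    """Insert number into an ascending list and return the (possibly new) head."""
    n = Node(number)

    # Empty list, or new value belongs at the front: point n at the old head first
    if head is None or number < head.number:
        n.next = head
        return n

    # Follow the next pointers until the following node is larger (or missing)
    tmp = head
    while tmp.next is not None and tmp.next.number <= number:
        tmp = tmp.next

    # Link the new node to the rest of the list before re-pointing tmp,
    # so that we never lose the nodes after it
    n.next = tmp.next
    tmp.next = n
    return head


def to_list(head):
    """Collect the numbers by walking the list, one node at a time."""
    values = []
    while head is not None:
        values.append(head.number)
        head = head.next
    return values


head = None
for value in [4, 1, 3, 2]:
    head = insert_sorted(head, value)
print(to_list(head))
```

Each insertion still has to walk the list from the front, so it takes O(n) steps in the worst case, consistent with the running time noted above.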
 We can combine all of our snippets of code into a complete program:

#include <stdio.h>
#include <stdlib.h>

// Represents a node
typedef struct node
{
int number;
struct node *next;
}
node;

int main(void)
{
// List of size 0, initially not pointing to anything
node *list = NULL;

// Add number to list

node *n = malloc(sizeof(node));
if (n == NULL)
{
return 1;
}
n->number = 1;
n->next = NULL;
// We create our first node, store the value 1 in it, and leave the next
// pointer to point to nothing. Then, our list variable can point to it.
list = n;

// Add number to list

n = malloc(sizeof(node));
if (n == NULL)
{
return 1;
}
n->number = 2;
n->next = NULL;
// Now, we go to the first node that list points to, and set the next pointer
// on it to point to our new node, adding it to the end of the list:
list->next = n;

// Add number to list

n = malloc(sizeof(node));
if (n == NULL)
{
return 1;
}
n->number = 3;
n->next = NULL;
// We can follow multiple nodes with this syntax, using the next pointer
// over and over, to add our third new node to the end of the list:
list->next->next = n;
// Normally, though, we would want a loop and a temporary variable to add
// a new node to our list.

// Print list
// Here we can iterate over all the nodes in our list with a temporary
// variable. First, we have a temporary pointer, tmp, that points to the
// list. Then, our condition for continuing is that tmp is not NULL, and
// finally, we update tmp to the next pointer of itself.
for (node *tmp = list; tmp != NULL; tmp = tmp->next)
{
// Within the node, we'll just print the number stored:
printf("%i\n", tmp->number);
}

// Free list
// Since we're freeing each node as we go along, we'll use a while loop
// and follow each node's next pointer before freeing it, but we'll see
// this in more detail in Problem Set 5.
while (list != NULL)
{
node *tmp = list->next;
free(list);
list = tmp;
}
}

More data structures

 A tree is another data structure where each node points to two other nodes, one to the left (with a smaller value) and one to the right
(with a larger value):

 Notice that there are now two dimensions to this data structure, where some nodes are on different “levels” than others. And we can
imagine implementing this with a more complex version of a node in a linked list, where each node has not one but two pointers, one
to the value in the “middle of the left half” and one to the value in the “middle of the right half”. And all elements to the left of a node
are smaller, and all elements to the right are greater.
 This is called a binary search tree: binary because each node has at most two children, or nodes it is pointing to, and a search tree because
it’s sorted in a way that allows us to search correctly.
 And like a linked list, we’ll want to keep a pointer to just the beginning of the list, but in this case we want to point to the root, or top
center node of the tree (the 4).
 Now, we can easily do binary search, and since each node is pointing to another, we can also insert nodes into the tree without moving all
of them around as we would have to in an array. Recursively searching this tree would look something like:

typedef struct node

{
int number;
struct node *left;
struct node *right;
} node;

// Here, *tree is a pointer to the root of our tree.

bool search(node *tree)
{
// We need a base case, if the current tree (or part of the tree) is NULL,
// to return false:
if (tree == NULL)
{
return false;
}
// Now, depending on if the number in the current node is bigger or smaller,
// we can just look at the left or right side of the tree:
else if (50 < tree->number)
{
return search(tree->left);
}
else if (50 > tree->number)
{
return search(tree->right);
}
// Otherwise, the number must be equal to what we're looking for:
else {
return true;
}
}

 The running time of searching a balanced tree is O(log n), and inserting nodes while keeping the tree balanced is also O(log n). By spending a bit
more memory and time to maintain the tree, we’ve now gained faster searching compared to a plain linked list.
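 The search above looks for one fixed value, 50. A variant that takes the value as a parameter might look like the sketch below. It is written in Python (introduced in Lecture 6) rather than C, and the TreeNode class and the target parameter are hypothetical names for illustration:

```python
class TreeNode:
    """A node with a number and optional left (smaller) and right (larger) children."""

    def __init__(self, number, left=None, right=None):
        self.number = number
        self.left = left
        self.right = right


def search(tree, target):
    # Base case: an empty tree (or subtree) cannot contain the target
    if tree is None:
        return False
    # Smaller values live to the left, larger values to the right
    if target < tree.number:
        return search(tree.left, target)
    if target > tree.number:
        return search(tree.right, target)
    # Otherwise the current node holds the target
    return True


# A small balanced tree with 4 at the root
root = TreeNode(4,
                TreeNode(2, TreeNode(1), TreeNode(3)),
                TreeNode(6, TreeNode(5), TreeNode(7)))
print(search(root, 5), search(root, 50))
```

Every recursive call discards half of a balanced tree, which is where the O(log n) running time comes from.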
 A data structure with almost a constant time search is a hash table, which is a combination of an array and a linked list. We have an array
of linked lists, and each linked list in the array has elements of a certain category. For example, in the real world we might have lots of
nametags, and we might sort them into 26 buckets, one labeled with each letter of the alphabet, so we can find nametags by looking in
just one bucket.

 We can implement this in a hash table with an array of 26 pointers, each of which points to a linked list for a letter of the alphabet:

 Since we have random access with arrays, we can add elements quickly, and also index quickly into a bucket.
 A bucket might have multiple matching values, so we’ll use a linked list to store all of them horizontally. (We call this a collision, when two
values hash to the same bucket.)
 This is called a hash table because we use a hash function, which takes some input and maps it to a bucket it should go in. In our example,
the hash function is just looking at the first letter of the name, so it might return 0 for “Albus” and 25 for “Zacharias”.
 But in the worst case, all the names might start with the same letter, so we might end up with the equivalent of a single linked list again.
We might look at the first two letters, and allocate enough buckets for 26*26 possible hashed values, or even the first three letters, and
now we’ll need 26*26*26 buckets. But we could still have a worst case where all our values start with the same three characters, so the
running time for search is O(n). In practice, though, we can get closer to O(1) if we have about as many buckets as possible values,
especially if we have an ideal hash function, where we can sort our inputs into unique buckets.
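 The first-letter hash table described above might be sketched like this. The sketch is in Python (introduced in Lecture 6), uses a plain Python list in place of each linked list, and assumes every name starts with a letter A-Z. The function names hash_name, insert, and lookup, and the sample names other than “Albus” and “Zacharias”, are made up for illustration:

```python
def hash_name(name):
    """Map a name to a bucket index 0-25 using its first letter (assumes A-Z)."""
    return ord(name[0].upper()) - ord("A")


# An array of 26 buckets; each bucket is a list that stands in for a linked list
table = [[] for _ in range(26)]


def insert(name):
    table[hash_name(name)].append(name)


def lookup(name):
    # Only one bucket needs to be searched, however many names are stored
    return name in table[hash_name(name)]


for name in ["Albus", "Zacharias", "Hermione", "Harry", "Hagrid"]:
    insert(name)

print(hash_name("Albus"), hash_name("Zacharias"))
print(table[hash_name("Harry")])
```

The three names starting with H collide in one bucket, so a lookup there still has to scan that bucket's list, which is the worst case the notes describe.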
 We can use another data structure called a trie (pronounced like “try”, and is short for “retrieval”):

 Imagine we want to store a dictionary of words efficiently, and be able to access each one in constant time. A trie is like a tree, but
each node is an array. Each array will have each letter, A-Z, stored. For each word, the first letter will point to an array, where the next
valid letter will point to another array, and so on, until we reach something indicating the end of a valid word. If our word isn’t in the
trie, then one of the arrays won’t have a pointer or terminating character for our word. Now, even if our data structure has lots of
words, the lookup time will be just the length of the word we’re looking for, and this might be a fixed maximum so we have O(1) for
searching and insertion. The cost for this, though, is 26 times as much memory as we need for each character.
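 A trie along these lines might be sketched as below, again in Python (introduced in Lecture 6) and assuming words contain only the letters A-Z. The TrieNode class, its is_word flag (standing in for “something indicating the end of a valid word”), and the helper names are hypothetical:

```python
class TrieNode:
    def __init__(self):
        # One slot per letter A-Z; None means no word continues with that letter
        self.children = [None] * 26
        # Marks that the letters followed so far spell a complete word
        self.is_word = False


def index(letter):
    """Map a letter to its slot 0-25 (assumes A-Z, either case)."""
    return ord(letter.upper()) - ord("A")


def insert(root, word):
    node = root
    for letter in word:
        i = index(letter)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.is_word = True


def contains(root, word):
    node = root
    # At most len(word) steps, regardless of how many words are stored
    for letter in word:
        node = node.children[index(letter)]
        if node is None:
            return False
    return node.is_word


root = TrieNode()
for word in ["cat", "car", "card"]:
    insert(root, word)
print(contains(root, "car"), contains(root, "ca"), contains(root, "dog"))
```

Here “ca” is a prefix of stored words but was never inserted itself, so its node exists without being marked as the end of a word.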
 There are even higher-level constructs, abstract data structures, where we use our building blocks of arrays, linked lists, hash tables, and
tries to implement a solution to some problem.
 For example, one abstract data structure is a queue, where we want to be able to add values and remove values in a first-in-first-out (FIFO)
way. To add a value we might enqueue it, and to remove a value we would dequeue it. And we can implement this with an array that we
resize as we add items, or a linked list where we append values to the end.
 An “opposite” data structure would be a stack, where items most recently added (pushed) are removed (popped) first, in a last-in-first-out
(LIFO) way. Our email inbox is a stack, where our most recent emails are at the top.
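 The difference between the two orderings can be seen in a short sketch. It uses Python (introduced in Lecture 6), with deque from the standard collections module as the queue and a plain list as the stack; these are just one convenient choice of building blocks, not the implementation the lecture has in mind:

```python
from collections import deque

# Queue: first in, first out
queue = deque()
queue.append("first")      # enqueue
queue.append("second")
queue.append("third")
served = queue.popleft()   # dequeue removes the oldest item

# Stack: last in, first out
stack = []
stack.append("first")      # push
stack.append("second")
stack.append("third")
top = stack.pop()          # pop removes the newest item

print(served, top)
```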

 Another example is a dictionary, where we can map keys to values, or strings to values, and we can implement one with a hash table
where a word comes with some other information (like its definition or meaning).
 We take a look at “Jack Learns the Facts About Queues and Stacks” (https://www.youtube.com/watch?v=2wM6_PuBIxY), an animation about
these data structures.


Lecture 6
Python Basics
Examples
More features
Files
New features

Python Basics
 Today we’ll learn a new programming language called Python. Remember that one of the overall goals of the course is not learning
any particular language, but learning how to program in general.
 Source code in Python looks a lot simpler than C, but is capable of solving problems in fields like data science. In fact, to print “hello,
world”, all we need to write is:

print("hello, world")

 Notice that, unlike in C, we don’t need to import a standard library, declare a main function, specify a newline in the print function,
or use semicolons.
 Python is an interpreted language, which means that we actually run another program (an interpreter) that reads our source code and runs
it top to bottom. For example, we can save the above as hello.py , and run the command python hello.py to run our code, without
having to compile it.
 We can get strings from a user:

answer = get_string("What's your name?\n")

print("hello, " + answer)

 We create a variable called answer, without specifying the type (the interpreter determines that from context for us), and we can
easily combine two strings with the + operator before we pass it into print.
 We can also pass in multiple arguments to print , with print("hello,", answer) , and it will automatically join them with spaces
for us too.
 print also accepts format strings like f"hello, {answer}" , which substitutes variables inside curly braces into a string.
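 Side by side, the three approaches above produce the same greeting. In this small sketch the name is hard-coded (an assumption, so that it runs without the CS50 library):

```python
answer = "Emma"

# Two ways to build the same string
joined = "hello, " + answer          # concatenation with +
formatted = f"hello, {answer}"       # format string (f-string)

print(joined)
print("hello,", answer)              # print joins its arguments with a space
print(formatted)
```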
 We can create variables with just counter = 0 . To increment a variable, we can use counter = counter + 1 or counter += 1 .
 Conditions look like:

if x < y:
    print("x is less than y")
elif x > y:
    print("x is greater than y")
else:
    print("x is equal to y")

 Unlike in C and JavaScript (whereby braces { } are used to indicate blocks of code), the exact indentation of each line is what
determines the level of nesting in Python.
 And instead of else if , we just say elif .
 Boolean expressions are slightly different, too:

while True:
    print("hello, world")

 We can write a loop with a variable:

i = 3
while i > 0:
    print("cough")
    i -= 1

 We can also use a for loop, where we can do something for each element in a list:

for i in [0, 1, 2]:
    print("cough")

 Lists in Python are like arrays in C, but they can grow and shrink easily with the interpreter managing the implementation and
memory for us.
 This for loop will set the variable i to the first element, 0, run, then to the second element, 1, run, and so on.
 And we can use a special function, range, to get some number of values, as in for i in range(3). This will give us 0, 1, and 2,
for a total of three values.
 In Python, there are many data types:
 bool , True or False
 float , real numbers
 int , integers
 str , strings
 range , sequence of numbers
 list , sequence of mutable values, that we can change or add or remove
 tuple , sequence of immutable values, that we can’t change
 dict , collection of key/value pairs, like a hash table
 set , collection of unique values
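 To see these types in action, a short sketch can put one value of each into a list and ask Python for its type. The sample values are arbitrary:

```python
values = [
    True,                      # bool
    3.14,                      # float
    42,                        # int
    "hello",                   # str
    range(3),                  # range: 0, 1, 2
    [72, 73, 33],              # list (mutable)
    (1, 2),                    # tuple (immutable)
    {"EMMA": "617-555-0100"},  # dict of key/value pairs
    {1, 2, 2, 3},              # set: the duplicate 2 is stored once
]

for value in values:
    print(type(value).__name__, value)

type_names = [type(value).__name__ for value in values]
```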
 docs.python.org (https://docs.python.org) is the official source of documentation, but Google and Stack Overflow will also have helpful
resources when we need to figure out how to do something in Python. In fact, programmers in the real world rarely know everything in the
documentation, but rather how to find what they need when they need it.

Examples
 We can blur an image with:

from PIL import Image, ImageFilter

before = Image.open("bridge.bmp")
after = before.filter(ImageFilter.BLUR)
after.save("out.bmp")

 In Python, we include other libraries with import, and here we’ll import the Image and ImageFilter names from the PIL library.
 It turns out, if we look for documentation for the PIL library, we can use the next three lines of code to open an image called
bridge.bmp, run a blur filter on it, and save it to a file called out.bmp.
 And we can run this with python blur.py after saving to a file called blur.py.
 We can implement a dictionary with:

words = set()

def check(word):
    if word.lower() in words:
        return True
    else:
        return False

def load(dictionary):
    file = open(dictionary, "r")
    for line in file:
        words.add(line.rstrip("\n"))
    file.close()
    return True

def size():
    return len(words)

def unload():
    return True

 First, we create a new set called words. Then, for check, we can just ask if word.lower() in words. For load, we open the file
and use words.add to add each line to our set. For size, we can use len to count the number of elements in our set, and finally,
for unload, we don’t have to do anything!
 It turns out, even though implementing a program in Python is simpler for us, the running time of our program in Python is slower than
our program in C since our interpreter has to do more work for us. So, depending on our goals, we’ll also have to consider the tradeoff of
human time of writing a program that’s more efficient, versus the running time of the program.
 In Python, we can also include the CS50 library, but our syntax will be:

from cs50 import get_string

 Notice that we specify the functions we want to use.

 Now we can get strings from a user:

from cs50 import get_string

s = get_string("What's your name?:\n")

print("hello, " + s)

 We can substitute expressions into our format strings, too:

from cs50 import get_int

age = get_int("What's your age?\n")

print(f"You are at least {age * 365} days old.")

 And we can demonstrate conditions:

from cs50 import get_int

x = get_int("x: ")
y = get_int("y: ")

if x < y:
    print("x is less than y")
elif x > y:
    print("x is greater than y")
else:
    print("x is equal to y")

 To check conditions, we can say:

from cs50 import get_string

s = get_string("Do you agree?\n")

if s == "Y" or s == "y":
    print("Agreed.")
elif s == "N" or s == "n":
    print("Not agreed.")

 Python doesn’t have chars, so we can check them as strings directly.

 We can also say if s in ["Y", "y"]: , or if s.lower() in ["y"]: . It turns out that strings in Python are like structs in C, where we
have not only variables but functions that we can call. For example, given a string s , we can call its lower function with
s.lower() to get the lowercase version of the string.
 We can improve versions of cough , too:

print("cough")
print("cough")
print("cough")

 We don’t need to declare a main function, so we just write the same line of code three times.
 But we can do better:

for i in range(3):
    cough()

def cough():
    print("cough")

 Notice that we don’t need to specify the return type of a new function, which we can define with def.
 But this causes an error when we try to run it: NameError: name 'cough' is not defined. It turns out that we need to define our
function before we use it, so we can either move our definition of cough to the top, or create a main function:

def main():
    for i in range(3):
        cough()

def cough():
    print("cough")

main()

 Now, by the time we actually call our main function, the cough function will have been read by our interpreter.
 Our functions can take inputs, too:

def main():
    cough(3)

def cough(n):
    for i in range(n):
        print("cough")

main()

 We can define a function to get a positive integer:

from cs50 import get_int

def main():
    i = get_positive_int()
    print(i)

def get_positive_int():
    while True:
        n = get_int("Positive Integer: ")
        if n > 0:
            break
    return n

main()

 Since there is no do-while loop in Python as there is in C, we have a while loop that will go on infinitely, but we use break to end
the loop as soon as n > 0. Then, our function will just return n.
 Notice that variables in Python have function scope by default, meaning that n can be initialized within a loop, but still be accessible
later in the function.
 We can print out a row of question marks on the screen:

for i in range(4):
    print("?", end="")
print()

 When we print each block, we don’t want the automatic new line, so we can pass a parameter, or named argument, to the print
function. Here, we say end="" to specify that nothing should be printed at the end of our string. Then, after we print our row, we can
call print to get a new line.
 We can also “multiply” a string and print that directly with: print("?" * 4) .
 We can print a column with a loop:

for i in range(3):
    print("#")

 And without a loop: print("#\n" * 3, end="") .

 We can implement nested loops:

for i in range(3):
    for j in range(3):
        print("#", end="")
    print()

 We don’t need to use the get_string function from the CS50 library, since we can use the input function built into Python to get a
string from the user. But if we want another type of data, like an integer, from the user, we’ll need to cast it with int() .
 But our program will crash if the string isn’t convertible to an integer, so we can use get_int, which will just ask again.
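 The “ask again” behavior can be approximated with input and int() alone, by catching the error that int() raises. The sketch below is not the CS50 library's implementation; get_int_from is a hypothetical helper, and its read parameter exists only so the example can simulate a user instead of waiting at the keyboard:

```python
def get_int_from(prompt, read=input):
    """Keep asking until the text read can be converted to an int."""
    while True:
        text = read(prompt)
        try:
            return int(text)
        except ValueError:
            # int() raises ValueError for text like "cat"; ask again
            continue


# Simulate a user who types "cat", then "4.5", then "42"
replies = iter(["cat", "4.5", "42"])
print(get_int_from("n: ", read=lambda prompt: next(replies)))
```

Called as get_int_from("n: ") with no second argument, it reads from the real keyboard via input.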
 In Python, trying to get an integer overflow actually won’t work:

from time import sleep

i = 1
while True:
    print(i)
    sleep(1)
    i *= 2

 We call the sleep function to pause our program for a second between each iteration.
 This will continue until the integer can no longer fit in your computer’s memory.
 Floating-point imprecision, too, can be prevented by libraries that can represent decimal numbers with as many bits as are needed.
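 As one illustration, Python's standard decimal module is a library of this kind (an assumption about which library is meant; the notes don't name one). A small sketch comparing it with ordinary floats, and showing an integer larger than a 64-bit value:

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 exactly
print(0.1 + 0.2 == 0.3)

# Decimal stores base-10 digits, so this sum is exact
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

# Python's int grows as needed instead of overflowing
print(2 ** 64)
```

Note that the Decimal values are created from strings; building them from floats like 0.1 would carry over the float's imprecision.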
 We can make a list:

scores = []
scores.append(72)
scores.append(73)
scores.append(33)

print(f"Average: {sum(scores) / len(scores)}")

 With append , we can add items to our list, using it like a linked list.
 We can also declare a list with some values like scores = [72, 73, 33] .
 We can iterate over each character in a string:

from cs50 import get_string

s = get_string("Input: ")
print("Output: ", end="")
for c in s:
    print(c, end="")
print()

 Python will get each character in the string for us.

 To make a string uppercase, too, we can just call s.upper() to get the uppercase version of the entire string, without having to iterate
over each character ourselves.

More features
 We can take command-line arguments with:

from sys import argv

for i in range(len(argv)):
    print(argv[i])

 Since argv is a list of strings, we can use len() to get its length, and range() for a range of values that we can use as an index for
each element in the list.
 But we can also let Python iterate over the list for us:

from sys import argv

for arg in argv:
    print(arg)

 We can return exit codes when our program exits, too:

from sys import argv, exit

if len(argv) != 2:
    print("missing command-line argument")
    exit(1)
print(f"hello, {argv[1]}")
exit(0)

 We import the exit function, and call it with the code we want our program to exit with.
 We can implement linear search by just checking each element in a list:

import sys

names = ["EMMA", "RODRIGO", "BRIAN", "DAVID"]

if "EMMA" in names:
    print("Found")
    sys.exit(0)
print("Not found")
sys.exit(1)

 If we have a dictionary, a set of key:value pairs, we can also check each key:

import sys

people = {
    "EMMA": "617-555-0100",
    "RODRIGO": "617-555-0101",
    "BRIAN": "617-555-0102",
    "DAVID": "617-555-0103"
}

if "EMMA" in people:
    print(f"Found {people['EMMA']}")
    sys.exit(0)
print("Not found")
sys.exit(1)

 Notice that we can get the value of a particular key in a dictionary with people['EMMA']. Here, we use single quotes (both single
and double quotes are allowed, as long as they match for a string) to differentiate the inner string from the outer string.
 And we declare dictionaries with curly braces, {} , and lists with brackets [] .
 In Python, we can compare strings directly with just == :

from cs50 import get_string

s = get_string("s: ")
t = get_string("t: ")

if s == t:
    print("Same")
else:
    print("Different")

 Copying strings, too, works without any extra work from us:

from cs50 import get_string

s = get_string("s: ")

t = s

t = t.capitalize()

print(f"s: {s}")
print(f"t: {t}")

 Swapping two variables can also be done by assigning both values at the same time:

x = 1
y = 2

print(f"x is {x}, y is {y}")

x, y = y, x
print(f"x is {x}, y is {y}")

Files
 Let’s open a CSV file:

import csv
from cs50 import get_string

file = open("phonebook.csv", "a")

name = get_string("Name: ")

number = get_string("Number: ")

writer = csv.writer(file)
writer.writerow((name, number))

file.close()

 It turns out that Python also has a csv package (library) that helps us work with CSV files, so after we open the file for appending, we
can call csv.writer to create a writer from the file and then writer.writerow to write a row. With the inner parentheses, we’re
creating a tuple with the values we want to write, so we’re actually passing in a single argument that has all the values for our row.
 We can use the with keyword, which will helpfully close the file for us:

...
with open("phonebook.csv", "a") as file:
    writer = csv.writer(file)
    writer.writerow((name, number))

New features
 A feature of Python that C does not have is regular expressions, or patterns against which we can match strings. For example, its syntax
includes:
 . , for any character
 .* , for 0 or more characters
 .+ , for 1 or more characters
 ? , for something optional
 ^ , for start of input
 $ , for end of input
 For example, we can match strings with:

import re
from cs50 import get_string

s = get_string("Do you agree?\n")

if re.search("^y(es)?$", s, re.IGNORECASE):
    print("Agreed.")
elif re.search("^no?$", s, re.IGNORECASE):
    print("Not agreed.")

 First, we need the re package, or library, for regular expressions.

 Then, for y or yes , we have the regular expression ^y(es)?$ . We want to make sure that the string starts with y , and optionally
has es immediately after the y , and then ends.
 Similarly, for n and no , we want our string to start, have the letter n , and optionally the letter o next, and then end. The regular
expression for that would be ^no?$ .
 We pass in another argument, re.IGNORECASE , to ignore the casing of the letters in the string.
 If neither regular expression matches, we wouldn’t print anything.
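 To see how the two patterns above behave without typing answers interactively, here is a minimal sketch that runs them over a few sample strings. The helper name classify and the sample inputs are assumptions made for illustration; they are not part of the lecture code.

```python
import re


def classify(answer):
    """Return "Agreed.", "Not agreed.", or None if neither pattern matches."""
    # ^y(es)?$ matches exactly "y" or "yes"; re.IGNORECASE also accepts "Y", "YES", ...
    if re.search("^y(es)?$", answer, re.IGNORECASE):
        return "Agreed."
    # ^no?$ matches exactly "n" or "no"
    elif re.search("^no?$", answer, re.IGNORECASE):
        return "Not agreed."
    # Neither pattern matched, so there is nothing to print
    return None


for sample in ["y", "YES", "no", "N", "yeah", "nope"]:
    print(sample, "->", classify(sample))
```

 Because both patterns are anchored with ^ and $ , inputs like yeah or nope do not match even though they start with the right letter.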
 On our own Mac or PC, we can open a terminal after installing Python, and use the microphone to convert our speech to text:

import speech_recognition

recognizer = speech_recognition.Recognizer()
with speech_recognition.Microphone() as source:
    print("Say something!")
    audio = recognizer.listen(source)

print("Google Speech Recognition thinks you said:")
print(recognizer.recognize_google(audio))

 It turns out that there’s another library we can download, called speech_recognition , that can listen to audio and convert it to a
string.
 And now, we can match on the audio to print something else:

...
words = recognizer.recognize_google(audio)

# Respond to speech
if "hello" in words:
    print("Hello to you too!")
elif "how are you" in words:
    print("I am well, thanks!")
elif "goodbye" in words:
    print("Goodbye to you too!")
else:
    print("Huh?")

 We can even use regular expressions, to match on part of a string:

...
words = recognizer.recognize_google(audio)

matches = re.search("my name is (.*)", words)

if matches:
    print(f"Hey, {matches[1]}.")
else:
    print("Hey, you.")

 Here, we can get all the characters after my name is with .* , and print it out.
 We run detect.py and faces.py (https://cdn.cs50.net/2019/fall/lectures/6/src6/6/faces/), which finds each face (or even a specific face) in a
photo.
 qr.py (https://cdn.cs50.net/2019/fall/lectures/6/src6/6/qr/) will also generate a QR code to a particular URL.


Lecture 7
Spreadsheets
SQL
IMDb
Multiple tables
Problems

Spreadsheets
 Most of us are familiar with spreadsheets: rows of data, where each column in a row holds a different piece of data related to the
others in that row.
 A database is an application that can store data, and we can think of Google Sheets as one such application.
 For example, we created a Google Form to ask students their favorite TV show and its genre. We look through the responses, and see
that the spreadsheet has three columns: “Timestamp”, “title”, and “genres”:

 We can download a CSV file from the spreadsheet with “File > Download”, upload it to our IDE, and see that it’s a text file with comma-
separated values matching the spreadsheet’s data.
 We’ll write favorites.py :

import csv

with open("CS50 2019 - Lecture 7 - Favorite TV Shows (Responses) - Form Responses 1.csv", "r") as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row["title"])

 We’re just going to open the file and make sure we can get the title of each row.
 Now we can use a dictionary to count the number of times we’ve seen each title, with the keys being the titles and the values for each key
an integer, tracking how many times we’ve seen that title:
import csv

counts = {}

with open("CS50 2019 - Lecture 7 - Favorite TV Shows (Responses) - Form Responses 1.csv", "r") as file:
    reader = csv.DictReader(file)

    for row in reader:
        title = row["title"]
        if title in counts:
            counts[title] += 1
        else:
            counts[title] = 1

for title, count in counts.items():
    print(title, count, sep=" | ")

 In each row, we can get the title with row["title"] .

 Here, if we’ve seen the title before (it’s in counts ), we can just add 1 to the value. Otherwise, we need to set the initial value to 1.
 Finally, we can print out our dictionary’s keys and values with a separator so it’s a bit easier to read.
 We can change the way we iterate to for title, count in sorted(counts.items()): , and we’ll see our dictionary sorted by the keys,
alphabetically.
 But we can also sort the key-value pairs in the dictionary by their values with:

def f(item):
    return item[1]

for title, count in sorted(counts.items(), key=f, reverse=True):

 We define a function, f , which just returns the value from the item in the dictionary with item[1] . The sorted function, in turn,
can use that as the key to sort the dictionary’s items. And we’ll also pass in reverse=True to sort from largest to smallest, instead of
smallest to largest.
 We can actually define our function in the same line, with this syntax:

for title, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):

 We pass in a lambda, or anonymous function, as the key, which takes in the item and returns item[1] .
 Finally, we can make all the titles lowercase with title = row["title"].lower() , so our counts can be a little more accurate even if the
names weren’t typed in the exact same way.
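 Putting the pieces above together, here is a minimal sketch of the counting and sorting steps. It uses a small hard-coded list of titles (an assumption, standing in for the CSV file) so that it runs on its own.

```python
# Hypothetical responses standing in for the rows of the CSV file
titles = ["The Office", "friends", "the office", "Friends", "THE OFFICE", "Veep"]

counts = {}
for title in titles:
    # Normalize the casing so "The Office" and "the office" count together
    title = title.lower()
    if title in counts:
        counts[title] += 1
    else:
        counts[title] = 1

# Sort the (title, count) pairs by count, largest first
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for title, count in ranked:
    print(title, count, sep=" | ")
```

 The lambda receives each (title, count) pair, so item[1] is the count that sorted compares.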

SQL
 We’ll look at a new program in our terminal window, sqlite3 , a command-line program that lets us use another language, SQL
(pronounced like “sequel”).
 We’ll run some commands to create a new database called favorites.db and import our CSV file into a table called “favorites”:

~/ $ sqlite3 favorites.db
SQLite version 3.22.0 2018-01-22 18:45:57
Enter ".help" for usage hints.
sqlite> .mode csv
sqlite> .import "CS50 2019 - Lecture 7 - Favorite TV Shows (Responses) - Form Responses 1.csv" favorites

 We see a favorites.db in our IDE after we run this, and now we can use SQL to interact with our data:

sqlite> SELECT title FROM favorites;

title
Dynasty
The Office
Blindspot
24
Friends
psych
Veep
Survivor
...

 We can even sort our results:

sqlite> SELECT title FROM favorites ORDER BY title;

title
/
24
9009
Adventure Time
Airplane Repo
Always Sunny
Ancient Aliens
...

 And get a count of the number of times each title appears:

sqlite> SELECT title, COUNT(title) FROM favorites GROUP BY title;

 We can even set the count of each title to a new variable, n , and order our results by that, in descending order. Then we can see the top
10 results with LIMIT 10 :
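 One way to write the query described above is sketched below, run through Python’s built-in sqlite3 module against an in-memory table. The sample rows are made up, and the exact query text (with AS n , ORDER BY n DESC , and LIMIT 10 ) is an assumption based on the description rather than a copy of the lecture’s command.

```python
import sqlite3

# In-memory database with a few hypothetical rows
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (title TEXT)")
db.executemany(
    "INSERT INTO favorites (title) VALUES (?)",
    [("The Office",), ("Friends",), ("The Office",),
     ("Veep",), ("The Office",), ("Friends",)],
)

# Name the count n, order by it from largest to smallest, keep the top 10
top = db.execute(
    "SELECT title, COUNT(title) AS n FROM favorites "
    "GROUP BY title ORDER BY n DESC LIMIT 10"
).fetchall()

for title, n in top:
    print(title, n, sep=" | ")
```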

 SQL is a language that lets us work with a relational database, an application that lets us store data and work with it more quickly than
with a CSV.
 With .schema , we can see how the format for the table for our data is created:

sqlite> .schema
CREATE TABLE favorites(
"Timestamp" TEXT,
"title" TEXT,
"genres" TEXT
);

 It turns out that, when working with data, we only need four operations:
 CREATE
 READ
 UPDATE
 DELETE
 In SQL, the commands to perform each of these operations are:
 INSERT
 SELECT
 UPDATE
 DELETE
 First, we’ll need to create a table with the CREATE TABLE table (column type, ...); command.
 SQL, too, has its own data types to optimize the amount of space used for storing data:
 BLOB , for “binary large object”, raw binary data that might represent files
 INTEGER
     smallint
     integer
     bigint
 NUMERIC
     boolean
     date
     datetime
     numeric(precision,scale) , which solves floating-point imprecision by using as many bits as needed, for each digit before and
    after the decimal point
     time
     timestamp
 REAL
     real , for floating-point values
     double precision , with more bits
 TEXT
     char(n) , for an exact number of characters
     varchar(n) , for a variable number of characters, up to a certain limit
     text
 SQLite is one database application that supports SQL, and there are many companies with server applications that support SQL, including
Oracle Database, MySQL, PostgreSQL, MariaDB, and Microsoft Access.
 After inserting values, we can use functions to perform calculations, too:
 AVG
 COUNT
 DISTINCT , for getting distinct values without duplicates
 MAX
 MIN
 …
 There are also other operations we can combine as needed:
 WHERE , matching on some strict condition
 LIKE , matching on substrings for text
 LIMIT
 GROUP BY
 ORDER BY
 JOIN , combining data from multiple tables
 We can update data with UPDATE table SET column=value WHERE condition; , which could include 0, 1, or more rows depending on our
condition. For example, we might say UPDATE favorites SET title = "The Office" WHERE title LIKE "%office%"; , and that will set all the
rows with the title containing “office” to be “The Office” so we can make them consistent.
 And we can remove matching rows with DELETE FROM table WHERE condition; , as in DELETE FROM favorites WHERE title = "Friends"; .
 We can even delete an entire table altogether with another command, DROP .
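 The UPDATE , DELETE , and DROP commands above can be tried safely on a throwaway table. This is a minimal sketch using Python’s built-in sqlite3 module and an in-memory database; the sample titles are made up, and the string literals use single quotes, which is the standard SQL form.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (title TEXT)")
db.executemany(
    "INSERT INTO favorites (title) VALUES (?)",
    [("the office",), ("Office",), ("Friends",), ("Veep",)],
)

# UPDATE: make every title containing "office" consistent
# (SQLite's LIKE ignores case for ASCII letters by default)
db.execute("UPDATE favorites SET title = 'The Office' WHERE title LIKE '%office%'")

# DELETE: remove the rows that match a condition
db.execute("DELETE FROM favorites WHERE title = 'Friends'")

remaining = [row[0] for row in db.execute("SELECT title FROM favorites ORDER BY title")]
print(remaining)

# DROP: remove the whole table, not just its rows
db.execute("DROP TABLE favorites")
tables = db.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)
```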

IMDb
 IMDb, or “Internet Movie Database”, has datasets available to download (https://www.imdb.com/interfaces/) as TSV, or tab-separated values,
files.
 For example, we can download title.basics.tsv.gz , which will contain basic data about titles:
 tconst , a unique identifier for each title, like tt4786824
 titleType , the type of the title, like tvSeries
 primaryTitle , the main title used, like The Crown
 startYear , the year a title was released, like 2016
 genres , a comma-separated list of genres, like Drama,History
 We take a look at title.basics.tsv after we’ve unzipped it, and we see that the first row indeed has the headers we expected and each
row has values separated by tabs. But the file has more than 6 million rows, so even searching for one value takes a moment.
 We’ll download the file into our IDE with wget , and then gunzip to unzip it. But our IDE doesn’t have enough space, so we’ll use our
Mac’s terminal instead.
 We’ll write import.py to read the file in:

import csv

# Open TSV file for reading
with open("title.basics.tsv", "r") as titles:

    # Since the file is a TSV file, we can use the CSV reader and change
    # the separator to a tab.
    reader = csv.DictReader(titles, delimiter="\t")

    # Open new CSV file for writing
    with open("shows0.csv", "w") as shows:

        # Create writer
        writer = csv.writer(shows)

        # Write header of the columns we want
        writer.writerow(["tconst", "primaryTitle", "startYear", "genres"])

        # Iterate over TSV file
        for row in reader:

            # If non-adult TV show
            if row["titleType"] == "tvSeries" and row["isAdult"] == "0":

                # Write row
                writer.writerow([row["tconst"], row["primaryTitle"], row["startYear"], row["genres"]])

 Now, we can open shows0.csv and see a smaller set of data. But it turns out, for some of the rows, startYear has a value of \N , and
that’s a special value from IMDb when they want to represent values that are missing. So we can filter out those values and convert the
startYear to an integer to filter for shows from 1970 onward:

...
# If year not missing (We need to escape the backslash too)
if row["startYear"] != "\\N":

    # If since 1970
    if int(row["startYear"]) >= 1970:

        # Write row
        writer.writerow([row["tconst"], row["primaryTitle"], row["startYear"], row["genres"]])

 We can write a program to search for a particular title:

import csv

# Prompt user for title
title = input("Title: ")

# Open CSV file (named `file` so it doesn't shadow the built-in input function)
with open("shows2.csv", "r") as file:

    # Create DictReader
    reader = csv.DictReader(file)

    # Iterate over CSV file
    for row in reader:

        # Search for title
        if title.lower() == row["primaryTitle"].lower():
            print(row["primaryTitle"], row["startYear"], row["genres"], sep=" | ")

 We can run this program and see our results, but we can see how SQL can do a better job.
 In Python, we can connect to a SQL database and read our file into it once, so we can make lots of queries without writing new programs
and without having to read the entire file each time.
 Let’s do this more easily with the CS50 library:

import cs50
import csv

# Create database by opening and closing an empty file first
open("shows3.db", "w").close()
db = cs50.SQL("sqlite:///shows3.db")

# Create table called `shows`, and specify the columns we want,
# all of which will be text except `startYear`
db.execute("CREATE TABLE shows (tconst TEXT, primaryTitle TEXT, startYear NUMERIC, genres TEXT)")

# Open TSV file
# https://datasets.imdbws.com/title.basics.tsv.gz
with open("title.basics.tsv", "r") as titles:

    # Create DictReader
    reader = csv.DictReader(titles, delimiter="\t")

    # Iterate over TSV file
    for row in reader:

        # If non-adult TV show
        if row["titleType"] == "tvSeries" and row["isAdult"] == "0":

            # If year not missing
            if row["startYear"] != "\\N":

                # If since 1970
                startYear = int(row["startYear"])
                if startYear >= 1970:

                    # Insert show by substituting values into each ? placeholder
                    db.execute("INSERT INTO shows (tconst, primaryTitle, startYear, genres) VALUES(?, ?, ?, ?)",
                               row["tconst"], row["primaryTitle"], startYear, row["genres"])

 Now we can run sqlite3 shows3.db and run commands like before, such as SELECT * FROM shows LIMIT 10; .
 With SELECT COUNT(*) FROM shows; we can see that there are more than 150,000 shows in our table, and with SELECT COUNT(*) FROM
shows WHERE startYear = 2019; , we see that there were more than 6000 this year.

Multiple tables
 But each of the rows will only have one column for genres, and the values are multiple genres put together. So we can go back to our
import program, and add another table:

import cs50
import csv

# Create database
open("shows4.db", "w").close()
db = cs50.SQL("sqlite:///shows4.db")

# Create tables
db.execute("CREATE TABLE shows (id INT, title TEXT, year NUMERIC, PRIMARY KEY(id))")

# The `genres` table will have a column called `show_id` that references
# the `shows` table above
db.execute("CREATE TABLE genres (show_id INT, genre TEXT, FOREIGN KEY(show_id) REFERENCES shows(id))")

# Open TSV file
# https://datasets.imdbws.com/title.basics.tsv.gz
with open("title.basics.tsv", "r") as titles:

    # Create DictReader
    reader = csv.DictReader(titles, delimiter="\t")

    # Iterate over TSV file
    for row in reader:

        # If non-adult TV show
        if row["titleType"] == "tvSeries" and row["isAdult"] == "0":

            # If year not missing
            if row["startYear"] != "\\N":

                # If since 1970
                startYear = int(row["startYear"])
                if startYear >= 1970:

                    # Trim prefix from tconst
                    id = int(row["tconst"][2:])

                    # Insert show
                    db.execute("INSERT INTO shows (id, title, year) VALUES(?, ?, ?)", id, row["primaryTitle"], startYear)

                    # Insert genres
                    if row["genres"] != "\\N":
                        for genre in row["genres"].split(","):
                            db.execute("INSERT INTO genres (show_id, genre) VALUES(?, ?)", id, genre)

 So now our shows table no longer has a genres column, but instead we have a genres table with each row representing a show
and an associated genre. Now, a particular show can have multiple genres we can search for, and we can get other data about the
show from the shows table given its ID.
 In fact, we can combine both tables with SELECT * FROM shows WHERE id IN (SELECT show_id FROM genres WHERE genre = "Comedy") AND
year = 2019; . We’re filtering our shows table down to the rows whose ID appears in the genres table with a value of “Comedy” for the
genre column, and that also have the value of 2019 for the year column.
 Our tables look like this:

 Since the ID in the genres table comes from the shows table, we call it show_id . And the arrow indicates that a single show ID might
have many matching rows in the genres table.
 We see that some datasets from IMDb, like title.principals.tsv , have only IDs for certain columns that we’ll have to look up in other
tables.

 By reading the descriptions for each table, we can see that all of the data can be used to construct these tables:

 Notice that, for example, a person’s name could also be copied to the stars or writers tables, but instead only the person_id is
used to link to the data in the people table. This way, we only need to update the name in one place if we need to make a change.
 We’ll open a database, shows.db , with these tables to look at some more examples.
 We’ll download a program called DB Browser for SQLite (https://sqlitebrowser.org/dl/), which will have a graphical user interface to browse
our tables and data. We can use the “Execute SQL” tab to run SQL directly in the program, too.
 We can run SELECT * FROM shows JOIN genres ON shows.id = genres.show_id; to join two tables by matching IDs in columns we specify.
Then we’ll get back a wider table, with columns from each of those two tables.
 We can take a person’s ID and find them in shows with SELECT * FROM stars WHERE person_id = 1122; , but we can do a query inside our
query with SELECT show_id FROM stars WHERE person_id = (SELECT id FROM people WHERE name = "Ellen DeGeneres"); .
 This gives us back the show_id , so to get the show data we can run: SELECT * FROM shows WHERE id IN (...); with ... being the
query above.
 We can get the same results with:

SELECT title FROM
people JOIN stars ON people.id = stars.person_id JOIN
shows ON stars.show_id = shows.id
WHERE name = "Ellen DeGeneres";

 We join the people table with the stars table, and then with the shows table by specifying columns that should match between
the tables, and then selecting just the title with a filter on the name.
 But now we can select other fields from our combined tables, too.
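 To check that the nested query and the JOIN version really return the same rows, here is a minimal sketch on a tiny in-memory database built with Python’s sqlite3 module. The table layout follows the notes, but the rows (such as Show A and Person X ) are made up for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER, title TEXT, PRIMARY KEY(id));
    CREATE TABLE people (id INTEGER, name TEXT, PRIMARY KEY(id));
    CREATE TABLE stars (show_id INTEGER, person_id INTEGER);
    INSERT INTO shows VALUES (1, 'Show A'), (2, 'Show B'), (3, 'Show C');
    INSERT INTO people VALUES (10, 'Person X'), (11, 'Person Y');
    INSERT INTO stars VALUES (1, 10), (3, 10), (2, 11);
""")

# Version 1: a query inside a query inside a query
nested = db.execute(
    "SELECT title FROM shows WHERE id IN "
    "(SELECT show_id FROM stars WHERE person_id = "
    "(SELECT id FROM people WHERE name = ?)) ORDER BY title",
    ("Person X",),
).fetchall()

# Version 2: join the three tables on their matching ID columns
joined = db.execute(
    "SELECT title FROM people "
    "JOIN stars ON people.id = stars.person_id "
    "JOIN shows ON stars.show_id = shows.id "
    "WHERE name = ? ORDER BY title",
    ("Person X",),
).fetchall()

print(nested)
print(joined)
```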
 It turns out that we can specify columns of our tables to be special types, such as:
 PRIMARY KEY , used as the primary identifier for a row
 FOREIGN KEY , which points to a row in another table
 UNIQUE , which means it has to be unique in this table
 INDEX , which asks our database to create an index to more quickly query based on this column. An index is a data structure like a tree,
which helps us search for values.
 We can create an index with CREATE INDEX person_index ON stars (person_id); . Then the person_id column will have an index called
person_index . With the right indexes, our join query is several hundred times faster.
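 One way to see whether SQLite actually uses an index is to ask it for its query plan. The sketch below, which assumes Python’s sqlite3 module and a made-up stars table, compares the plan for the same query before and after CREATE INDEX . The helper name plan is hypothetical, and the exact wording of the plan text varies between SQLite versions, so the code only checks whether the index name appears in it.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stars (show_id INTEGER, person_id INTEGER)")
db.executemany(
    "INSERT INTO stars VALUES (?, ?)",
    [(i, i % 100) for i in range(1000)],
)


def plan(query):
    """Return SQLite's description of how it would run the query."""
    rows = db.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    # The human-readable detail is the last column of each row
    return " ".join(row[-1] for row in rows)


query = "SELECT * FROM stars WHERE person_id = 42"

before = plan(query)
db.execute("CREATE INDEX person_index ON stars (person_id)")
after = plan(query)

print("before:", before)
print("after: ", after)
print("uses index:", "person_index" in after)
```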

Problems
 One problem with databases is race conditions, where the timing of two actions or events causes unexpected behavior.
 For example, consider two roommates and a shared fridge in their dorm. The first roommate comes home, and sees that there is no milk in
the fridge. So the first roommate leaves for the store to buy milk, and while they are at the store, the second roommate comes home, sees
that there is no milk, and leaves for another store to get milk. Later, there will be two jugs of milk in the fridge. By leaving a note, we can
solve this problem. We can even lock the fridge so that our roommate can’t check whether there is milk, until we’ve gotten back.
 This can happen in our database if we have something like this:

rows = db.execute("SELECT likes FROM posts WHERE id = ?", id)
likes = rows[0]["likes"]
db.execute("UPDATE posts SET likes = ? WHERE id = ?", likes + 1, id)

 First, we’re getting the number of likes on a post with a given ID. Then, we set the number of likes to that number plus one.
 But now if we have two different web servers both trying to add a like, they might both set it to the same value instead of actually
adding one each time. For example, if there are 2 likes, both servers will check the number of likes, see that there are 2, and set the
value to 3. One of the likes will then be lost.
 To solve this, we can use transactions, where a set of actions is guaranteed to happen together.
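 As a minimal sketch of the idea, the code below uses Python’s built-in sqlite3 module (an assumption; the lecture code uses the CS50 library) to add a like inside a transaction. It also lets the database compute likes + 1 itself in a single UPDATE , so there is no separate read step for another server to interleave with. The posts table and the add_like helper are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, likes INTEGER)")
db.execute("INSERT INTO posts (id, likes) VALUES (1, 2)")
db.commit()


def add_like(post_id):
    """Add one like to a post as a single, all-or-nothing step."""
    # Using the connection as a context manager commits the transaction
    # if the block succeeds and rolls it back if it raises an exception.
    with db:
        # The database reads and writes the value in one statement,
        # instead of our program reading it and writing it back later.
        db.execute("UPDATE posts SET likes = likes + 1 WHERE id = ?", (post_id,))


add_like(1)
add_like(1)

likes = db.execute("SELECT likes FROM posts WHERE id = ?", (1,)).fetchone()[0]
print(likes)
```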
 Another problem in SQL is called a SQL injection attack, where an adversary can execute their own commands on our database.
 For example, someone might try typing in [email protected]'-- as their email. If we have a SQL query that’s a formatted string (without
escaping, or substituting dangerous characters from, the input), such as f"SELECT * FROM users WHERE username = '{username}' AND
password = '{password}'" , then the query will end up being SELECT * FROM users WHERE username = '[email protected]'--' AND
password = '{password}' , which will actually select the row where username = '[email protected]' and turn the rest of the line into a
comment. To prevent this, we should use ? placeholders for our SQL library to automatically escape inputs from the user.
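 The difference between the two approaches can be seen on a throwaway database. This is a minimal sketch using Python’s sqlite3 module; the users table, the address alice@example.com , and both passwords are made up for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT, password TEXT)")
db.execute(
    "INSERT INTO users VALUES (?, ?)",
    ("alice@example.com", "correct-password"),
)

# Input from an adversary who does not know the password
username = "alice@example.com'--"
password = "wrong-guess"

# Unsafe: the input is pasted into the SQL text, so the quote closes the
# string early and -- turns the password check into a comment
unsafe_query = f"SELECT * FROM users WHERE username = '{username}' AND password = '{password}'"
unsafe_rows = db.execute(unsafe_query).fetchall()

# Safe: the ? placeholders pass the input as plain values, never as SQL
safe_rows = db.execute(
    "SELECT * FROM users WHERE username = ? AND password = ?",
    (username, password),
).fetchall()

print(len(unsafe_rows), "row(s) from the formatted string")
print(len(safe_rows), "row(s) from the placeholders")
```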


Lecture 8
A Look Back
Privacy

A Look Back
 Just a few weeks ago, two-thirds of us had never taken a CS course before. We started with making programs in Scratch, struggled through
using C to write loops and eventually implement more applicable algorithms, and finally took advantage of higher-level languages like
Python and its packages, and SQL, to solve even more interesting problems.
 In week 0, we said:
 what ultimately matters in this course is not so much where you end up relative to your classmates but where you end up relative to
yourself when you began
 And now we can look back to see how far we’ve come.
 Indeed, David’s own notes from when he took CS50 in 1996 include concepts like algorithms, functions, and arguments.
 To start solving problems with algorithms, we need to represent inputs and outputs. So we can use binary to represent data, whether that’s
numbers, letters, or pixels in images.
 We demonstrate binary search in a phone book by dividing the book in half each time.
 Precision and correctness are both critical in programming, since computers can’t infer “what we mean”. We demonstrate this with a
volunteer giving the audience instructions on how to draw an image. We see that abstractions (“draw a stick figure”) can be useful, but we
lose some precision when we use them.

Privacy
 Computer science, in essence, is about the processing and storage of information. But we need to also consider not just what we can do,
but whether we should do it.
 For example, we use passwords to protect many of our accounts and data, but the top 10 passwords are just:
1. 123456
2. 123456789
3. qwerty
4. password
5. 111111
6. 12345678
7. abc123
8. 1234567
9. password1
10. 12345
 But unfortunately, even a more complex password can be quickly guessed by modern computers. We can write a program in just a few
minutes, that will generate all possible PINs and check them. We can even open a dictionary file that has all English words, and iterate
over each of them.
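 To make the point concrete, here is a minimal sketch of how little code it takes to try every 4-digit PIN. The function name crack_pin , the check callback, and the secret value are all hypothetical; the check here is a simple comparison against a made-up value in the same program.

```python
from itertools import product
from string import digits


def crack_pin(check, length=4):
    """Try every PIN of the given length until check(pin) returns True."""
    # product(digits, repeat=4) yields ('0','0','0','0') up to ('9','9','9','9')
    for candidate in product(digits, repeat=length):
        pin = "".join(candidate)
        if check(pin):
            return pin
    return None


secret = "7294"  # hypothetical PIN, for illustration only
found = crack_pin(lambda guess: guess == secret)
print(found)
```

 With 10 choices for each of 4 digits, there are only 10,000 possibilities, which is why longer passwords with more kinds of characters are harder to guess this way.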
 Cookies are small pieces of data that websites store on our computers when we visit them, useful for identifying us such that we don’t
have to log in on every visit, but can also be used for advertising and tracking purposes.

 In Chrome, we can use View > Developer > Developer Tools to see the cookies that a particular site leaves under the “Network” tab:

 And on other websites, where Google’s ads might be embedded, Google can track us there, too, with the same cookie.
 And the request that our web browser sends to each site also includes a string called “user-agent”, which describes the version of the
browser we have.
 On the internet, too, we have unique IP addresses that identify us so that we can receive responses from servers.
 We also explored how we might recover “deleted” photos in a problem set, and services like Snapchat that promise to delete photos after
some time, may not actually remove the data.
 In fact, a “soft delete” might set a value of “deleted” to be “true” to hide it from us, but the rest of the data is still stored.
 Photos of ourselves on social media, too, can help someone else track us, what we do, and who we’re with.
 In Chrome’s Developer Tools again, we can run some code in a website that prompts us to share our location and then puts it on the
screen:

 We’ll now have the opportunity to explore one of four tracks: web programming, mobile app development for either iOS or Android, and
game development with Lua.
 With these new skills, we’ll be working on a final project of our own design, solving a problem in the real world that we’re interested in.
 We’ll have an overnight hackathon, focused on collaborating with classmates and staff on our final projects.
 Finally, we’ll have the CS50 Fair, where we’ll celebrate our final projects with friends and visitors.
 We give a big thanks to our staff, without whom this course would not be possible!


Web

What to Do

1. After watching Introduction, HTTP, HTML, CSS, JavaScript, and Homepage, submit Homepage.
2. After watching Flask, Databases, and Finance, submit Finance.

When to Do It

By 11:59pm on 31 December 2020.

How to Do It

 Source Code

Introduction

 In this track, we’ll write programs that can run on the internet. We’ll first learn about the basics of the internet and how it works, and then
dive into the languages of the internet, from HTML and CSS to JavaScript to frameworks in Python and SQL that can turn a webpage into
an application.

HTTP

 Computers talk to each other across the network by sending and receiving messages. At the most basic level, there are standard protocols,
or rules to follow, for sending and receiving messages. In the context of the internet, the standard protocol is TCP/IP, Transmission Control
Protocol and Internet Protocol. We can think of this at a high level as sending a letter in the mail, with an address for the recipient and the
address of the sender. On the internet, computers have IP addresses, usually in the format #.#.#.# , so our digital envelope might include
1.2.3.4 for the address of the computer we want to message, and our own address 5.6.7.8 , so that we can get a response.
 [2:16] With four numbers of one byte each, an IP address is 32 bits, which only allows us to count up to about 4 billion. It turns out that we
now have more devices than 32 bits will support, and so in addition to IPv4, the protocol with 32-bit addresses, we also have IPv6, a
protocol with 128-bit addresses.
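 The arithmetic above can be checked with a short sketch. It uses Python’s built-in ipaddress module to show that an IPv4 address is really one 32-bit number made of four bytes; the address 1.2.3.4 is the same example used in these notes.

```python
import ipaddress

# Number of distinct addresses with 32 bits (IPv4) and 128 bits (IPv6)
ipv4_count = 2 ** 32
ipv6_count = 2 ** 128

# An IPv4 address is four bytes, which together form one 32-bit integer
address = ipaddress.IPv4Address("1.2.3.4")
as_bytes = address.packed
as_int = int(address)  # 1 * 256**3 + 2 * 256**2 + 3 * 256 + 4

print(f"IPv4 addresses: {ipv4_count:,}")
print(f"IPv6 addresses: {ipv6_count:,}")
print(list(as_bytes), as_int)
```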
 [4:10] In addition to the address of the recipient, we also specify a port number, or a number assigned to a particular service or type of
message, like emails, webpages, or files. This way, the recipient computer can process incoming messages with the right program. So our
envelope might say 1.2.3.4:80 .
 [5:50] But when we visit a website, we probably type in something like example.com , and it turns out that there’s something called DNS,
Domain Name System, which maps domain names to IP addresses of the servers that can respond for that domain.
 [7:40] And we might notice URLs are of the form http://www.example.com , and HTTP is short for another protocol, Hypertext Transfer
Protocol, which essentially describes the format of the contents inside each digital envelope. The content of a request in HTTP might look
like:

GET / HTTP/1.1
Host: www.example.com
...

 The first parameter, GET , specifies the action we’re trying to do here, which is just getting something. The next one, / , stands
for the root, or the top-most directory. Finally, HTTP/1.1 is the version of the protocol we’re asking to use. We also specify the host, or the
website, since the same server might be able to handle multiple websites, and there’s also additional information in a request that is less
important.
 [10:15] The response we get back might look like:

HTTP/1.1 200 OK
Content-Type: text/html
...

 Here we get an HTTP status code of 200, which means “OK”, and then a line describing the type of content. HTML, Hypertext Markup
Language, is a format that webpages use to mark up content. Finally, we’ll get the actual data for the page.
 [11:40] Other common status codes include 404, for a page not found, and 500 for an internal server error, where the server itself had an
error trying to respond.
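 Python’s standard library already knows these status codes, so we can look up the standard phrase for each one. This is a minimal sketch using http.HTTPStatus ; the variable name phrases is just for illustration.

```python
from http import HTTPStatus

# Map each status code mentioned above to its standard reason phrase
phrases = {code: HTTPStatus(code).phrase for code in (200, 404, 500)}

for code, phrase in phrases.items():
    print(code, phrase)
```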
 [13:05] We can open Google Chrome, and open the Developer Tools panel. In the Network tab, we can load a site, and see lots of requests.
At the very top, we can see the original request for google.com , and we’ll see the Request Headers that we sent, and the Response
Headers we got back. In fact, the first response we got back was HTTP/1.1 301 Moved Permanently , to http://www.google.com , since by
convention URLs for websites start with www . Next, we get redirected to https://www.google.com , with the more secure, encrypted
version of HTTP. In this response, we finally get a 200 OK code and some content to load the page. Later, we’ll be writing our own server
programs that return these codes and content in response to requests from browsers.

HTML

 Now that our computers can communicate over the internet, we can take a closer look at the actual data we get back. In Chrome, we can
go to View > Developer > View Source, to see the HTML, Hypertext Markup Language, that makes up the text-based content of a webpage.
 [1:30] We’ll look at a simple HTML page, where we first declare to the browser the version and format of the page. Then, we have a tag,
<html> , which starts the HTML content. Generally, HTML is made up of lots of nested tags that map to a tree structure, with opening tags
and closing tags that determine the structure of the page. Next we have the <head> tag, which includes metadata, data about the page,
such as the <title> tag inside that defines what the title of the webpage will be, as displayed in the tab of the browser. After, we have
the <body> tag, which contains the visible content displayed by the browser.
 [6:00] In the CS50 IDE, we can start by writing this code in a file called index.html . And the CS50 IDE has a built-in server we can use. In
the terminal, we can run http-server , and there will be a URL for our IDE’s server that we can open. Then, we’ll see the files in our IDE,
and we can open index.html . We can change our file, save, and refresh to see what it looks like.
 [10:20] We take a look at an example where we use an <img> tag to display an image. Here, we add attributes, or additional parameters
to the tag, like src="cat.jpg" to indicate that the source of the image is a file called cat.jpg , and alt="" to indicate alternative text
for the image. And the <img> tag doesn’t have a closing tag, since it doesn’t make sense for there to be other tags inside the image.
 [13:30] We add links to go between pages with the <a> , or anchor, tag. Notice that we can have any text for any URL for our link, so we
should pay attention to the URL we end up at.
 [18:00] We can add additional elements, like paragraphs with the <p> tag, headings with <h1> or <h2> , or tables with <table> .
 [22:35] We’ll add aesthetic styling like borders and colors later, but we can think about HTML as describing the structure of the content of
our webpage.
 [22:55] We’ll add a <form> element with some <input> elements where we can get some information from the user. Finally, we can
redirect ourselves to Google’s search page for whatever we typed in, by using https://www.google.com/search . We noticed that
https://www.google.com/search?q=cats takes us to a search page for cats, and the ? indicates some HTTP GET parameters, where here
we have a q , or query, parameter, with the value cats . So our form can have an action that submits our text input with name="q" , to
https://www.google.com/search .
 [29:35] There are so many more HTML elements. We can likely find an HTML tag that lets us add a particular feature, just by searching
Google for relevant documentation.
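To see how a form's input becomes a GET parameter, we can build the same kind of URL by hand. The sketch below uses Python's standard urllib.parse module; the search_url helper is a hypothetical name, and the clean https://www.google.com/search address is the one the form's action is described as pointing to.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(query):
    """Build the URL a form with an input named q would request."""
    # urlencode escapes characters that are not safe in a URL,
    # for example turning a space into a plus sign.
    return "https://www.google.com/search?" + urlencode({"q": query})


url = search_url("cats")
print(url)

# Going the other way: everything after the ? is parsed back
# into parameter names and lists of values.
params = parse_qs(urlsplit(url).query)
print(params["q"])
```

The name on the left of the = is whatever we put in the input's name attribute, which is why name="q" matters in our form.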

CSS

 To style webpages, we’ll use another language, CSS, Cascading Style Sheets.
 [0:40] First, in our HTML, we’ll need to add a style attribute to a tag, and set the value to something like style="color: blue;" . The
key-value pairs in the style will change how the browser displays the element. In fact, we can add a style to the <body> , and all the
elements inside the body will inherit the style unless they specifically have a different style.
 [5:20] We can also change the alignment, like centering or right-aligning text, or the font size. We can add multiple properties by
separating them with semicolons.
 [8:40] We might have multiple elements of the same type, like <h1> , and we can add a common set of styles in the <head> element with
the <style> tag. In that tag, we can specify that all h1 elements share some set of styles.
 [14:00] If we want to apply the same styles to multiple types of elements, we can add classes, which we can think of as names, to any number
and type of element. We’ll do this by adding the class="title" attribute, with a class name of our choosing, to elements we want to style
the same way. Then, in our CSS we can select all elements with the class with .title .
 [18:25] We can create another class, and even give the same element multiple classes with class="title green" , and the styles for both
will apply.
 [20:40] We can include CSS in a separate file, like styles.css , so all of our webpages can share the same styles. We’ll use a new tag,
<link> , to link a file to our HTML page. And we can include many different CSS files, each of which has some subset of styles.
 [24:00] With CSS, we can also style tables in HTML by selecting the table , tr , and td elements. By looking at CSS documentation online,
we can figure out what styles will give us the border styles we want.
 [27:40] We can add padding, or spacing, within each table data cell. And we can select the first row by adding a class like header , or use a
special table header cell element <th> that we can select precisely.
 [31:05] It turns out that there are lots of CSS libraries, written by other people, that will include styles for common elements that can
quickly apply a theme or aesthetic to our HTML. Bootstrap is one such popular library, and its documentation will include a <link>
element we can add, such that our page will use Bootstrap’s CSS files. The documentation will also show us various components we can
use, and classes we can use to style them easily. A <div> element in HTML is like a generic container or section, so we’ll see that
commonly used for elements that don’t have a more semantic HTML tag.

JavaScript

 To build a more interactive website, we’ll need a programming language that will allow us to run code on the browser that changes how it
behaves with our webpage, beyond just the content and style. The language that we’ll use is JavaScript, a language that browsers can
interpret and run, with syntax similar to that of C.
 [0:35] We take a look at syntax for declaring and changing variables, conditions, loops, and functions.

 [5:00] A simple webpage has elements that we can represent as a graphical tree, where each nested element is a child of a node in the
tree. This is called the Document Object Model, and JavaScript can manipulate, or change this, without having to refresh the page.
 [7:15] We’ll add JavaScript to our page with a <script> tag inside our <head> tag. We can call a built-in function, alert() , to show an
alert on our page. After we save our file, we can run a server in our IDE with http-server , and see our page.
 [9:20] We can add a form, and have our form call a function and return false; to stop any default behavior after our function is called.
 [12:00] Our form can have a text field, and our JavaScript code can get its value. First, we need to add an ID to our element with an
attribute to the element, like id="name" . And in JavaScript, we can use document.querySelector('#name') to get that element by its id.
 [17:25] We can change our alert to display something else with a condition.
 [18:45] Instead of just reading the content of the DOM, we can also change the contents of elements by setting their innerHTML property,
after selecting them with document.querySelector .
 [22:00] We’ll look at another example that has a counter, or a variable that we can increment by pressing a button.
 [24:25] It turns out that we can even change these variables or call these functions in our browser, with View > Developer > Developer
Tools in Chrome. In the Console tab, we can type in JavaScript code, and it will run in our page. If our JavaScript code has errors, those
errors will also show up in the console.
 [26:00] We can dynamically change the style of the page. We’ll create three buttons, each with a unique id . And in our script tag, we’ll
select each button, and we’ll set their onclick property to a function that our browser will call when the button is clicked. We can create
an anonymous function, or a function with no name, directly with function() { ... } , instead of defining it separately first. And in our
function, we can select the body tag by type since there’s only one of them on our page, and set the style.backgroundColor property to
a color.
 [30:25] It turns out that we can’t add the onclick function in the beginning of our JavaScript code, since our browser interprets the code
from top to bottom, and our code can’t find the buttons. There are a few ways to solve this problem, but for now we can simply move our
script tag to the end of our body tag.
 [33:55] The onclick function is an event handler, or a function that is called when an event happens. There are many such events that we
can listen for, like a change to the selected option in a dropdown menu. We’ll look at another example, where we add onChange to a
<select> element. Here, inside our event handler function, we can use this.value to get the value of the option that was just
selected. We can think of this as a special variable that contains some kind of context for how a function is called. In this case, this is
the <select> element whose event triggered our event handler.
 [39:20] We can update our page periodically with window.setInterval , which calls a function for us at some interval of time. We’ll create
a function, blink() , that will change the body ’s visibility to be either visible or hidden .
 [43:10] We can also create a separate file like blink.js , where we only have our JavaScript code, and include it in our HTML file with
<script src="blink.js"></script> .
 [44:45] Finally, we can ask the browser to give the user’s location to our JavaScript code, with
navigator.geolocation.getCurrentPosition . The argument we pass in is a callback function, or a function that will be called by the
browser when the getCurrentPosition finishes running. Inside our function, we’ll just write the coordinates we get to the page.
 [47:05] With JavaScript, we can read and write to the DOM, and take advantage of even more features that browsers provide.

Homepage

 Our first assignment will be to create a homepage of our choice using HTML, CSS, and JavaScript.
 We’ll create four different pages in HTML, each linked to one another somehow. Recall that we can use the <a> tag, with the link to
another file in our IDE.
 We’ll also use at least five different CSS selectors, for five different types of elements, classes, or IDs. And we’ll want to use at least five
different properties overall to style our page, and documentation online will help us find what we’re looking for. We’ll also use the
Bootstrap library to style at least one of our components, so we don’t have to write the CSS ourselves for that.
 Finally, after we’ve written the content for our pages and styled them, we’ll use JavaScript to make our page interactive somehow, through
alerts, buttons, dropdowns, forms, intervals, or even more.
 Be as creative as you’d like!

Flask

 So far, we’ve learned how to write webpages that are saved as a file and returned by an HTTP server. But we can also have web servers, or
applications, that generate content dynamically before returning it as a response.
 [1:00] We’ll use a framework in Python called Flask, which allows us to write a web server with many features. We’ll create a new folder in
our IDE, called hello/ , and create a new file called application.py . By reading the documentation and experimenting, we can write our
first Flask application which returns something for the / route. And in our terminal, we can cd into our folder and run flask run ,
which will find our application.py file and run it. We’ll open the URL, and see our returned string.
 [4:10] We’ll add another route, /goodbye , and a function that returns different content. We can return any content we want in our routes.
 [6:00] It turns out that Flask allows us to use template files, or files with HTML that are like format strings, with some parts that are the
same every time, and some parts that will contain variables that we can substitute in. The render_template function in the Flask library
will allow us to use templates and plug in variables with double-curly-brace placeholders like {{ name }}.
 [10:35] We can generate a random number, for example, and display it each time our page is loaded. We can use control + c to stop our
server, and then restart it, to make sure any changes we make are reloaded. And once we load our page in the browser, we can view its
source to make sure that Flask substituted our variable as we expected.
 [13:25] We can add conditions to our templates, with if ... , so depending on the value of our variables, we can return different content
entirely.
 [16:25] We can even write a form that our server can accept, with another route that the form can submit to. Then, in that route, our server
can receive and use the form data. We write a form that has a name input, and write a route function that gets the input with
request.args.get() , and returns a template with the input substituted in.
 [21:30] We see an Internal Server Error, and in our terminal we see the error that request is not defined, and it turns out that we need to
import it from Flask. We try again, and see that the GET parameters in the URL change based on what we submit in the form.
 [24:00] We can add additional logic in our route to handle the case where name is empty, and return a different template.
 [26:00] It turns out that we can have templates for our templates, since many of our pages might have similar HTML code around their
content. We’ll create layout.html , and add a special block inside the <body> tag. Then, our other files like index.html can use the
template with extends "layout.html" , and only have the content block for the body .
 [30:35] And we can add additional blocks, like for content we would want to have inside a <style> tag in the page.
 [32:20] We’ll start writing a new application by creating a new folder called tasks , and creating an application.py file. Inside, we’ll
create routes for / to list tasks and /add to add a new task. We’ll create a templates folder with a layout.html as before, a tasks.html
showing a list of items, and an add.html that includes a simple form. We’ll have our routes render each of these templates, and set our
form to use a new method, POST , to send the form’s data back to the /add route. Our add() function can then either display the form
for a GET request, or create a new task for a POST request.
 [42:30] We can create a global variable, todos , to store a list of task names that we can display later. In our add() function, if we get a
POST request with some data, we’ll add the new task name to our list on the server, and redirect back to the default route, which will
show a list.
 [44:15] And in our tasks.html template, we can loop over our todos list variable with for todo in todos , and create a <li> element
with the contents set to each item.
 [48:00] We can also make sure that the task name is not empty, by adding some JavaScript code that only enables the submit button if the
input field’s value is not empty. Otherwise, we disable the submit button. We do this by adding an event handler to listen to the onkeyup
event for our task input, which is triggered by the browser every time the user presses a key and releases it.
 [52:40] But our task list goes away when we stop and start our web server, since we initialize our todos variable to an empty list each
time. Next, we’ll use a database with SQL to store and modify data.
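The task list application described above might look roughly like the following single-file sketch. It is not the lecture's actual code: the templates are inlined with render_template_string so that no templates folder is needed, the input name task is an assumption, and layout.html is left out for brevity. It assumes Flask is installed.

```python
from flask import Flask, redirect, render_template_string, request

app = Flask(__name__)

# Global list shared by every visitor and reset whenever the server restarts.
todos = []

# Inline stand-ins for tasks.html and add.html (assumed markup).
TASKS_PAGE = """
<ul>
{% for todo in todos %}
    <li>{{ todo }}</li>
{% endfor %}
</ul>
<a href="/add">Add a task</a>
"""

ADD_PAGE = """
<form action="/add" method="post">
    <input name="task" type="text">
    <input type="submit">
</form>
"""


@app.route("/")
def tasks():
    # List every task stored so far.
    return render_template_string(TASKS_PAGE, todos=todos)


@app.route("/add", methods=["GET", "POST"])
def add():
    # GET shows the form; POST stores the new task and goes back to the list.
    if request.method == "GET":
        return render_template_string(ADD_PAGE)
    task = request.form.get("task")
    if task:  # ignore empty submissions
        todos.append(task)
    return redirect("/")
```

Saved as application.py , this should start with flask run . Note that form data sent with POST is read from request.form , whereas GET parameters are read from request.args .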

Databases

 So far, we’ve learned how to write a server that can respond with webpages that are the same for every user. But there are websites where
we can log in, and it will show us information specific to us.
 Recall that cookies are small files that websites ask our browser to store on our computer, with some kind of identifier that our browser
shows the website the next time we go there, so the website knows who we are. This allows our server to have sessions, or data for users’
interactions with a website, specific to each of them.
 [1:20] We’ll look at the task list application we made last time. Since our task list was stored in a global variable in our server application,
everyone who visits our page will see the same list.
 [2:40] To solve this, we can use sessions from Flask, by importing and initializing their implementation. By doing so, our tasks() function
can look in the global session variable, and read, set, or update a todos key within it. Flask will take care of making sure that the global
session variable is actually specific to the user who made that request, by storing and checking some cookies.
 [7:30] If we want to store more complex data, it would make more sense to use a database instead of session objects. So we’ll create a new
application to store registration information, like names and emails.
 [9:25] We’ll make a new empty file, lecture.db , and run sqlite3 lecture.db to create a table and set column names and types for the
data we think we’ll need.
 [11:00] In sqlite3 , we can run queries to select or insert into the table to check that everything works. In our new Flask application, we’ll
import the SQL library from CS50 so we can work with our database more easily, and establish a connection to our lecture.db file. In our
/ route, we can run a SELECT query to get the rows from our registrants table, and pass them into our template. Our template will in
turn iterate over each row, and generate an <li> item with the values of each column in each row.

 [17:35] Once we have our index route, we can add more rows to our table with the sqlite3 prompt, and see our server return the new
data.
 [18:05] We can add a new route to our application that will insert new data, too. In our register() function, we can return a
register.html file with a form that has the inputs we need, and ensure that the form submits to our register route with the POST
method. Then, in our register route, we can check for a POST request, insert the data from the request into our table, and redirect to
the main route. In our SQL query, we’ll be careful to substitute our variables safely with the db.execute function, instead of combining
the strings ourselves, to avoid SQL injection attacks.
 [23:05] We’ll try out our application, and everything seems to be working as we expect. To improve the design of our server’s code, we’ll
factor out some common template code into layout.html , and create an apology.html page where we’ll tell the user an error message if
something in their form is blank.
 [28:40] Now we can write Flask applications to read and store data in a database, saving our data efficiently for the long term.
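The idea of substituting variables safely into a query can be sketched with Python's built-in sqlite3 module, used here in place of CS50's SQL library. The registrants table comes from the notes above; the exact column definitions, the in-memory database (standing in for lecture.db ), and the function names are assumptions.

```python
import sqlite3

# In-memory database as a stand-in for lecture.db.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE registrants ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
)


def register(name, email):
    """Insert a row; the ? placeholders keep user input out of the SQL text."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO registrants (name, email) VALUES (?, ?)",
            (name, email),
        )


def registrants():
    """Return every row as a (name, email) tuple, oldest first."""
    return db.execute(
        "SELECT name, email FROM registrants ORDER BY id"
    ).fetchall()


register("Alice", "alice@example.com")
# A name that contains SQL is stored as plain text, not executed.
register("Robert'); DROP TABLE registrants;--", "bobby@example.com")

for name, email in registrants():
    print(name, email)
```

Had we built the INSERT statement by joining strings ourselves, the second name could have changed the meaning of the query. With placeholders, the database treats it as data only.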

Finance

 We’ll take the concepts we’ve seen to create CS50 Finance, a virtual stock trading website with an account for users to register for, the
ability to get quotes for shares of stocks and to virtually buy or sell them. We’ll also have a history page for each account to see what we’ve
done in the past.
 [2:45] We look at the distribution code for CS50 Finance, or the code that we’ll all start off with. We have an application.py file that our
Flask app will run, with various configuration options, a connection to a database file finance.db , and a set of routes. This follows the MVC,
Model-View-Controller, pattern, which generally separates the concerns of data and how that’s stored (our database), the views that display
some amount of data (our templates), and controllers that control the logic of what is displayed when (our application.py routes).
 [4:45] Since we’re using a third-party API, or Application Programming Interface, some code that someone else wrote designed for us to
use, we’ll also need an API key to get stock information.
 [5:30] Notice that our routes also have a @login_required decorator, or extra attribute in Python to indicate that the function should
behave differently. Flask allows us to automatically redirect users to a login page, and we have the login functionality implemented in our
distribution code too. The /login route checks whether a matching user and password exists in our database (for a POST method, as
from the login form), or displays the login form for a GET method. And in our database, instead of storing the user’s raw password, which
is insecure since attackers who obtain it might try it against other websites, we store the hash of their password, which is sufficient for
verification but from which it is difficult to recover the original password.
 [14:30] After the login route we have logout , which just clears the session, and we have quote , register , and sell routes left to
implement.
 [15:10] We’ll implement:
 register so we can register for a new account
 quote so we can get a price quote for a stock
 buy to buy some shares of a stock
 index to show the stocks in our account
 sell to sell some shares of a stock

 history to show transactions in the past

 and a personal feature of our choice
 [15:55] We talk about the requirements for each of these routes, and how they might be implemented with conditions based on the
request’s method, and either display forms or perform some action after validating the request.
 [20:50] We have an existing finance.db database, and we can use sqlite3 finance.db to run queries that add columns or tables that we
might want to use to store additional data to support our routes.
 [23:00] index will query our database for a user’s stocks and their cash balance, along with using an API to get the current price of each
and displaying all this data with a template. sell , too, should have validation and update our data in the database.
 [25:25] Finally, we might need another table (in our database) to support our history page, and display the data for each user’s
transactions in a table (in our template).
 [26:25] And we’ll need to add a personal touch, whether that’s allowing users to change their password, add cash, or additional features.

Conclusion

 In this track, we learned about how computers communicate over the internet, structured web pages with HTML and styled them with CSS,
and added some interactivity with JavaScript. Then we learned how to write a web server application with Flask, that can dynamically
generate web pages and use a database to read and write data.


Homepage
Build a simple homepage using HTML, CSS, and JavaScript.

Background

The internet has enabled incredible things: we can use a search engine to research anything imaginable, communicate with friends and family
members around the globe, play games, take courses, and so much more. But it turns out that nearly all pages we may visit are built on three
core languages, each of which serves a slightly different purpose:

1. HTML, or HyperText Markup Language, which is used to describe the content of websites;
2. CSS, Cascading Style Sheets, which is used to describe the aesthetics of websites; and
3. JavaScript, which is used to make websites interactive and dynamic.

Create a simple homepage that introduces yourself, your favorite hobby or extracurricular, or anything else of interest to you.

Getting Started

Here’s how to download this problem’s “distribution code” (i.e., starter code) into your own CS50 IDE. Log into CS50 IDE (https://ide.cs50.io/)
and then, in a terminal window, execute each of the below.

1. Execute cd to ensure that you’re in ~/ (i.e., your home directory).

2. Execute mkdir pset8 to make (i.e., create) a directory called pset8 in your home directory.
3. Execute cd pset8 to change into (i.e., open) that directory.
4. Execute wget https://cdn.cs50.net/2019/fall/tracks/web/homepage/homepage.zip to download a (compressed) ZIP file with this
problem’s distribution.
5. Execute unzip homepage.zip to uncompress that file.
6. Execute rm homepage.zip followed by yes or y to delete that ZIP file.
7. Execute ls . You should see a directory called homepage , which was inside of that ZIP file.
8. Execute cd homepage to change into that directory.
9. Execute ls . You should see this problem’s distribution, including index.html and styles.css .
10. You can immediately start a server to view the site by running

$ http-server

in the terminal window and clicking on the link that appears.

Specification

Implement in your homepage directory a website that must:

 Contain at least four different .html pages, at least one of which is index.html (the main page of your website), and it should be
possible to get from any page on your website to any other page by following one or more hyperlinks.
 Use at least ten (10) distinct HTML tags besides <html> , <head> , <body> , and <title> . Using some tag (e.g., <p> ) multiple times still
counts as just one (1) of those ten!
 Integrate one or more features from Bootstrap into your site. Bootstrap is a popular library (that comes with lots of CSS classes and more)
via which you can beautify your site. See Bootstrap’s documentation (https://getbootstrap.com/docs/4.1/getting-started/introduction/) to
get started. To add Bootstrap to your site, it suffices to include

<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css">

in your pages’ <head> , below which you can also include

<link href="styles.css" rel="stylesheet">

to link your own CSS.

 Have at least one stylesheet file of your own creation, styles.css , which uses at least five (5) different CSS selectors (e.g. tag ( example ),
class ( .example ), or ID ( #example )), and within which you use a total of at least five (5) different CSS properties, such as font-size , or
margin ; and
 Integrate one or more features of JavaScript into your site to make your site more interactive. For example, you can use JavaScript to add
alerts, to have an effect at a recurring interval, or to add interactivity to buttons, dropdowns, or forms. Feel free to be creative!
 Ensure that your site looks nice on browsers both on mobile devices as well as laptops and desktops.
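If you'd like to check the ten-tag requirement mechanically, a short script along the following lines could count the distinct tags in a page. It is only a sketch built on Python's standard html.parser module; TagCollector and distinct_tags are hypothetical names, and the sample page is made up for illustration.

```python
from html.parser import HTMLParser

# Tags the specification says do not count toward the ten.
EXCLUDED = {"html", "head", "body", "title"}


class TagCollector(HTMLParser):
    """Collect the name of every start tag seen in a document."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        self.tags.add(tag)


def distinct_tags(html):
    """Return the sorted distinct tags in html, minus the excluded ones."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return sorted(collector.tags - EXCLUDED)


page = """<!DOCTYPE html>
<html>
    <head><title>hello</title></head>
    <body>
        <h1>Hi</h1>
        <p>One</p>
        <p>Two</p>
        <img alt="cat" src="cat.jpg">
    </body>
</html>"""

print(distinct_tags(page))
```

Because the tags are kept in a set, using <p> twice still counts once, matching the rule above. To check a real page, you could read the contents of index.html into a string and pass that to distinct_tags instead.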

Testing

If you want to view how your site looks while you work on it, there are two options:

1. Within CS50 IDE, navigate to your homepage directory (remember how?) and then execute

$ http-server

2. Within CS50 IDE, right-click (or Ctrl+click, on a Mac) on the homepage directory in the file tree at left. From the options that appear, select
Serve, which should open a new tab in your browser (it may take a second or two) with your site therein.

Recall also that by opening Developer Tools in Google Chrome, you can simulate visiting your page on a mobile device by clicking the phone-
shaped icon to the left of Elements in the developer tools window, or, once the Developer Tools tab has already been opened, by typing
Ctrl + Shift + M on a PC or Cmd + Shift + M on a Mac, rather than needing to visit your site on a mobile device separately!

Assessment

No check50 for this assignment! Instead, your site’s correctness will be assessed based on whether you meet the requirements of the
specification as outlined above, and whether your HTML is well-formed and valid. To ensure that your pages are, you can use the W3C
Markup Validation Service (https://validator.w3.org/#validate_by_input), copying and pasting your HTML directly into the provided text box. Take
care to eliminate any warnings or errors suggested by the validator before submitting!

Consider also:

 whether the aesthetics of your site are such that it is intuitive and straightforward for a user to navigate;
 whether your CSS has been factored out into a separate CSS file(s); and
 whether you have avoided repetition and redundancy by “cascading” style properties from parent tags.

Unfortunately, style50 does not support HTML files, and so it is incumbent upon you to indent and align your HTML tags cleanly. Know also that you
can create an HTML comment with:

<!-- Comment goes here -->

but commenting your HTML code is not as imperative as it is when commenting code in, say, C or Python. You can also comment your CSS, in
CSS files, with:

/* Comment goes here */

Hints
For fairly comprehensive guides on the languages introduced in this problem, check out the documentation for each on W3Schools.

 HTML (https://www.w3schools.com/html)
 CSS (https://www.w3schools.com/css)
 JavaScript (https://www.w3schools.com/js)

How to Submit

Execute the below, logging in with your GitHub username and password when prompted. For security, you’ll see asterisks ( * ) instead of the
actual characters in your password.

submit50 cs50/problems/2020/x/tracks/web/homepage


C$50 Finance
Implement a website via which users can “buy” and “sell” stocks, a la the below.

Background

If you’re not quite sure what it means to buy and sell stocks (i.e., shares of a company), head here
(https://www.investopedia.com/articles/basics/06/invest1000.asp) for a tutorial.

You’re about to implement C$50 Finance, a web app via which you can manage portfolios of stocks. Not only will this tool allow you to check
real stocks’ actual prices and portfolios’ values, it will also let you buy (okay, “buy”) and sell (okay, “sell”) stocks by querying IEX
(https://iextrading.com/developer/) for stocks’ prices.

Indeed, IEX lets you download stock quotes via their API (application programming interface) using URLs like
https://cloud-sse.iexapis.com/stable/stock/nflx/quote?token=API_KEY . Notice how Netflix’s symbol (NFLX) is embedded in this URL; that’s how IEX knows
whose data to return. That link won’t actually return any data because IEX requires you to use an API key (more about that in a bit), but if it did,
you’d see a response in JSON (JavaScript Object Notation) format like this:

{
"symbol": "NFLX",
"companyName": "Netflix, Inc.",
"primaryExchange": "NASDAQ",
"calculationPrice": "close",
"open": 317.49,
"openTime": 1564752600327,
"close": 318.83,
"closeTime": 1564776000616,
"high": 319.41,
"low": 311.8,
"latestPrice": 318.83,
"latestSource": "Close",
"latestTime": "August 2, 2019",
"latestUpdate": 1564776000616,
"latestVolume": 6232279,
"iexRealtimePrice": null,
"iexRealtimeSize": null,
"iexLastUpdated": null,
"delayedPrice": 318.83,
"delayedPriceTime": 1564776000616,
"extendedPrice": 319.37,
"extendedChange": 0.54,
"extendedChangePercent": 0.00169,
"extendedPriceTime": 1564876784244,
"previousClose": 319.5,
"previousVolume": 6563156,
"change": -0.67,
"changePercent": -0.0021,
"volume": 6232279,
"iexMarketPercent": null,
"iexVolume": null,
"avgTotalVolume": 7998833,
"iexBidPrice": null,
"iexBidSize": null,
"iexAskPrice": null,
"iexAskSize": null,
"marketCap": 139594933050,
"peRatio": 120.77,
"week52High": 386.79,
"week52Low": 231.23,
"ytdChange": 0.18907500000000002,
"lastTradeTime": 1564776000616
}

Notice how, between the curly braces, there’s a comma-separated list of key-value pairs, with a colon separating each key from its value.
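Because JSON's key-value pairs map naturally onto a Python dict , a response like the one above can be parsed with the standard json module. The sketch below hard-codes a few of the fields shown above rather than contacting IEX; the body and quote variable names are just for illustration.

```python
import json

# A few of the fields from the response above, as the text a server would send.
body = (
    '{"symbol": "NFLX", "companyName": "Netflix, Inc.", '
    '"latestPrice": 318.83, "iexRealtimePrice": null}'
)

# json.loads turns a JSON object into a dict and JSON null into None.
quote = json.loads(body)

print(quote["companyName"], quote["latestPrice"])
```

Each key is looked up by name, so the order of the pairs in the response does not matter.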

Let’s turn our attention now to this problem’s distribution code!

Distribution

Downloading

$ wget https://cdn.cs50.net/2019/fall/tracks/web/finance/finance.zip
$ unzip finance.zip
$ rm finance.zip
$ cd finance
$ ls
application.py helpers.py static/
finance.db requirements.txt templates/

Configuring
Before getting started on this assignment, we’ll need to register for an API key in order to be able to query IEX’s data. To do so, follow these
steps:

 Visit iexcloud.io/cloud-login#/register/ (https://iexcloud.io/cloud-login#/register/).

 Enter your email address and a password, and click “Create account”.
 On the next page, scroll down to choose the Start (free) plan.
 Once you’ve confirmed your account via a confirmation email, sign in to iexcloud.io (https://iexcloud.io/).
 Click API Tokens.
 Copy the key that appears under the Token column (it should begin with pk_ ).
 In a terminal window within CS50 IDE, execute:

$ export API_KEY=value

where value is that (pasted) value, without any space immediately before or after the = . You also may wish to paste that value in a text
document somewhere, in case you need it again later.
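A variable exported this way is visible to any program started from that same terminal window. As a sketch of how a Python program might read it (the get_api_key helper and its error message are hypothetical, not necessarily what the distribution code does):

```python
import os


def get_api_key(environ=os.environ):
    """Return the API key from the environment, or raise if it is missing."""
    key = environ.get("API_KEY")
    if not key:
        raise RuntimeError("API_KEY not set")
    return key
```

Calling get_api_key() with no arguments reads the real environment. Since export only affects the current terminal session, opening a new terminal window would mean exporting the key again.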

Running

Start Flask’s built-in web server (within finance/ ):

$ flask run

Visit the URL outputted by flask to see the distribution code in action. You won’t be able to log in or register, though, just yet!

Via CS50’s file browser, double-click finance.db in order to open it with phpLiteAdmin. Notice how finance.db comes with a table called
users . Take a look at its structure (i.e., schema). Notice how, by default, new users will receive $10,000 in cash. But there aren’t (yet!) any
users (i.e., rows) therein to browse. From here on out, if you’d prefer a command line, you’re welcome to use sqlite3 instead of phpLiteAdmin.

Understanding

application.py

Open up application.py . Atop the file are a bunch of imports, among them CS50’s SQL module and a few helper functions. More on those
soon.

After configuring Flask (http://flask.pocoo.org/), notice how this file disables caching of responses (provided you’re in debugging mode, which
you are by default on CS50 IDE), lest you make a change to some file but your browser not notice. Notice next how it configures Jinja
(http://jinja.pocoo.org/) with a custom “filter,” usd , a function (defined in helpers.py ) that will make it easier to format values as US dollars
(USD). It then further configures Flask to store sessions (http://flask.pocoo.org/docs/1.0/quickstart/#sessions) on the local filesystem (i.e., disk)
as opposed to storing them inside of (digitally signed) cookies, which is Flask’s default. The file then configures CS50’s SQL module to use
finance.db , a SQLite database whose contents we’ll soon see!

Thereafter are a whole bunch of routes, only two of which are fully implemented: login and logout . Read through the implementation of
login first. Notice how it uses db.execute (from CS50’s library) to query finance.db . And notice how it uses check_password_hash to
compare hashes of users’ passwords. Finally, notice how login “remembers” that a user is logged in by storing his or her user_id , an
INTEGER, in session . That way, any of this file’s routes can check which user, if any, is logged in. Meanwhile, notice how logout simply clears
session , effectively logging a user out.
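
To make that flow concrete, here's a rough sketch using only Python's standard library. It is not Werkzeug's implementation: hashlib.pbkdf2_hmac stands in for the real hashing helpers, a plain dict stands in for Flask's session , and the function names and sample password are hypothetical.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None):
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest


def check_password(password, salt, digest):
    """Re-hash the candidate password and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, digest)


session = {}  # stand-in for Flask's session (an assumption of this sketch)

# What gets stored at registration is the salt and digest, not the password.
salt, digest = hash_password("hunter2")

# Logging in: compare hashes, then "remember" the user via session.
if check_password("hunter2", salt, digest):
    session["user_id"] = 1
print(session)

# Logging out: forget the user.
session.clear()
print(session)
```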

Notice how most routes are “decorated” with @login_required (a function defined in helpers.py too). That decorator ensures that, if a user
tries to visit any of those routes, he or she will first be redirected to login so as to log in.

Notice too how most routes support GET and POST. Even so, most of them (for now!) simply return an “apology,” since they’re not yet
implemented.

helpers.py

Next take a look at helpers.py . Ah, there’s the implementation of apology . Notice how it ultimately renders a template, apology.html . It
also happens to define within itself another function, escape , that it simply uses to replace special characters in apologies. By defining
escape inside of apology , we’ve scoped the former to the latter alone; no other functions will be able (or need) to call it.
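
Here's a small sketch of that scoping idea. It returns a dict rather than rendering a template, the default value for code is an assumption, and the list of replacements is hypothetical (the real list in helpers.py may differ).

```python
def apology(message, code=400):
    """Return the top and bottom text that an apology page might show."""

    def escape(s):
        # Hypothetical replacements, applied in order.
        for old, new in [("-", "--"), (" ", "-"), ("_", "__"), ("?", "~q")]:
            s = s.replace(old, new)
        return s

    # escape is only visible here, inside apology.
    return {"top": code, "bottom": escape(message)}


print(apology("must provide username", 403))
```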

Next in the file is login_required . No worries if this one’s a bit cryptic, but if you’ve ever wondered how a function can return another
function, here’s an example!
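
In case a smaller example helps, here's a sketch of a decorator in the same spirit. It is not the distribution code: a plain dict stands in for Flask's session , and a string stands in for an actual redirect.

```python
from functools import wraps

session = {}  # stand-in for Flask's session (an assumption of this sketch)


def login_required(f):
    """Return a new function that runs f only if a user is logged in."""

    @wraps(f)
    def decorated_function(*args, **kwargs):
        if session.get("user_id") is None:
            return "redirect to /login"  # the real helper would redirect
        return f(*args, **kwargs)

    return decorated_function


@login_required
def index():
    return "portfolio"


print(index())          # nobody is logged in yet
session["user_id"] = 1
print(index())          # now the wrapped function runs
```

Note that login_required returns decorated_function itself rather than calling it; the @ syntax then rebinds index to that returned function.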

Thereafter is lookup , a function that, given a symbol (e.g., NFLX), returns a stock quote for a company in the form of a dict with three keys:
name , whose value is a str , the name of the company; price , whose value is a float ; and symbol , whose value is a str , a canonicalized
(uppercase) version of a stock’s symbol, irrespective of how that symbol was capitalized when passed into lookup .

Last in the file is usd , a short function that simply formats a float as USD (e.g., 1234.56 is formatted as $1,234.56 ).
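
A function like that can be quite short. Here's one way it might look, as a sketch rather than the distribution code itself:

```python
def usd(value):
    """Format value as USD, with a thousands separator and two decimals."""
    return f"${value:,.2f}"


print(usd(1234.56))  # $1,234.56
```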

requirements.txt

Next take a quick look at requirements.txt . That file simply prescribes the packages on which this app will depend.

static/

Glance too at static/ , inside of which is styles.css . That’s where some initial CSS lives. You’re welcome to alter it as you see fit.

templates/

Now look in templates/ . In login.html is, essentially, just an HTML form, stylized with Bootstrap (http://getbootstrap.com/). In
apology.html , meanwhile, is a template for an apology. Recall that apology in helpers.py took two arguments: message , which was passed
to render_template as the value of bottom , and, optionally, code , which was passed to render_template as the value of top . Notice in
apology.html how those values are ultimately used! And here’s why (https://github.com/jacebrowning/memegen). 0:-)

Last up is layout.html . It’s a bit bigger than usual, but that’s mostly because it comes with a fancy, mobile-friendly “navbar” (navigation bar),
also based on Bootstrap. Notice how it defines a block, main , inside of which templates (including apology.html and login.html ) shall go. It
also includes support for Flask’s message flashing (http://flask.pocoo.org/docs/1.0/patterns/flashing/) so that you can relay messages from one
route to another for the user to see.

Specification

register
Complete the implementation of register in such a way that it allows a user to register for an account via a form.

 Require that a user input a username, implemented as a text field whose name is username . Render an apology if the user’s input is blank
or the username already exists.
 Require that a user input a password, implemented as a text field whose name is password , and then that same password again,
implemented as a text field whose name is confirmation . Render an apology if either input is blank or the passwords do not match.
 Submit the user’s input via POST to /register .
 INSERT the new user into users , storing a hash of the user’s password, not the password itself. Hash the user’s password with
generate_password_hash (http://werkzeug.pocoo.org/docs/0.14/utils/#werkzeug.security.generate_password_hash).
 Odds are you’ll want to create a new template (e.g., register.html ) that’s quite similar to login.html .
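
The checks above can be sketched as a plain function, independent of Flask. The name validate_registration , its parameters, and the messages are all hypothetical; in the real route, the inputs would come from the submitted form and the existing usernames from users .

```python
def validate_registration(username, password, confirmation, existing_usernames):
    """Return an error message for invalid input, or None if all checks pass."""
    if not username:
        return "must provide username"
    if username in existing_usernames:
        return "username already exists"
    if not password or not confirmation:
        return "must provide password and confirmation"
    if password != confirmation:
        return "passwords do not match"
    return None


print(validate_registration("alice", "secret", "secret", {"bob"}))
print(validate_registration("bob", "secret", "secret", {"bob"}))
```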

Once you’ve implemented register correctly, you should be able to register for an account and log in (since login and logout already
work)! And you should be able to see your rows via phpLiteAdmin or sqlite3 .

quote
Complete the implementation of quote in such a way that it allows a user to look up a stock’s current price.

 Require that a user input a stock’s symbol, implemented as a text field whose name is symbol .
 Submit the user’s input via POST to /quote .
 Odds are you’ll want to create two new templates (e.g., quote.html and quoted.html ). When a user visits /quote via GET, render one of
those templates, inside of which should be an HTML form that submits to /quote via POST. In response to a POST, quote can render that
second template, embedding within it one or more values from lookup .

buy
Complete the implementation of buy in such a way that it enables a user to buy stocks.

 Require that a user input a stock’s symbol, implemented as a text field whose name is symbol . Render an apology if the input is blank or
the symbol does not exist (as per the return value of lookup ).
 Require that a user input a number of shares, implemented as a text field whose name is shares . Render an apology if the input is not a
positive integer.
 Submit the user’s input via POST to /buy .
 Odds are you’ll want to call lookup to look up a stock’s current price.
 Odds are you’ll want to SELECT how much cash the user currently has in users .
 Add one or more new tables to finance.db via which to keep track of the purchase. Store enough information so that you know who
bought what at what price and when.

 Use appropriate SQLite types.

 Define UNIQUE indexes on any fields that should be unique.
 Define (non- UNIQUE ) indexes on any fields via which you will search (as via SELECT with WHERE ).
 Render an apology, without completing a purchase, if the user cannot afford the number of shares at the current price.
 You don’t need to worry about race conditions (or use transactions).
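
Here's one possible shape for the above, sketched with Python's built-in sqlite3 module and an in-memory database rather than cs50.SQL and finance.db . The transactions table, its columns, the index names, and the buy function are all hypothetical; so are the columns of users , apart from the $10,000 default for cash.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Stand-in for the users table (columns are assumptions).
    CREATE TABLE users (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        username TEXT NOT NULL,
        hash TEXT NOT NULL,
        cash NUMERIC NOT NULL DEFAULT 10000.00
    );
    CREATE UNIQUE INDEX users_username ON users (username);

    -- Hypothetical table: who bought what, at what price, and when.
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id INTEGER NOT NULL,
        symbol TEXT NOT NULL,
        shares INTEGER NOT NULL,
        price NUMERIC NOT NULL,
        transacted TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    CREATE INDEX transactions_user_id ON transactions (user_id);
""")
db.execute("INSERT INTO users (username, hash) VALUES ('alice', 'placeholder')")


def buy(user_id, symbol, shares, price):
    """Record a purchase and deduct its cost; return False if unaffordable."""
    (cash,) = db.execute(
        "SELECT cash FROM users WHERE id = :id", {"id": user_id}
    ).fetchone()
    cost = shares * price
    if cost > cash:
        return False
    db.execute(
        "UPDATE users SET cash = cash - :cost WHERE id = :id",
        {"cost": cost, "id": user_id},
    )
    db.execute(
        "INSERT INTO transactions (user_id, symbol, shares, price) "
        "VALUES (:user_id, :symbol, :shares, :price)",
        {"user_id": user_id, "symbol": symbol, "shares": shares, "price": price},
    )
    return True


print(buy(1, "NFLX", 2, 300.0))     # affordable with $10,000
print(buy(1, "NFLX", 1000, 300.0))  # too expensive, so nothing is recorded
```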

Once you’ve implemented buy correctly, you should be able to see users’ purchases in your new table(s) via phpLiteAdmin or sqlite3 .

index
Complete the implementation of index in such a way that it displays an HTML table summarizing, for the user currently logged in, which
stocks the user owns, the numbers of shares owned, the current price of each stock, and the total value of each holding (i.e., shares times price).
Also display the user’s current cash balance along with a grand total (i.e., stocks’ total value plus cash).

 Odds are you’ll want to execute multiple SELECT s. Depending on how you implement your table(s), you might find GROUP BY
(https://www.google.com/search?q=SQLite+GROUP+BY), HAVING (https://www.google.com/search?q=SQLite+HAVING), SUM
(https://www.google.com/search?q=SQLite+SUM), and/or WHERE (https://www.google.com/search?q=SQLite+WHERE) of interest.
 Odds are you’ll want to call lookup for each stock.
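
As a sketch of how GROUP BY , HAVING , SUM , and WHERE might fit together, here's an example using Python's built-in sqlite3 module. The table layout, the convention of recording sales as negative shares, the sample rows, the prices, and the cash balance are all assumptions for illustration; the real app would call lookup for prices.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE transactions "
    "(user_id INTEGER, symbol TEXT, shares INTEGER, price NUMERIC)"
)
db.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        (1, "NFLX", 3, 300.0),
        (1, "AAPL", 2, 150.0),
        (1, "NFLX", -3, 310.0),  # a sale, recorded as negative shares
        (1, "AAPL", 1, 155.0),
    ],
)

# Net shares per symbol for one user, skipping symbols fully sold off.
rows = db.execute(
    "SELECT symbol, SUM(shares) FROM transactions "
    "WHERE user_id = :user_id GROUP BY symbol HAVING SUM(shares) > 0",
    {"user_id": 1},
).fetchall()

prices = {"AAPL": 160.0}  # hypothetical stand-in for lookup
cash = 9000.0             # hypothetical cash balance

holdings = [
    {"symbol": s, "shares": n, "price": prices[s], "total": n * prices[s]}
    for s, n in rows
]
grand_total = cash + sum(h["total"] for h in holdings)

print(holdings)
print(grand_total)
```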

sell
Complete the implementation of sell in such a way that it enables a user to sell shares of a stock (that he or she owns).

 Require that a user input a stock’s symbol, implemented as a select menu whose name is symbol . Render an apology if the user fails to
select a stock or if (somehow, once submitted) the user does not own any shares of that stock.
 Require that a user input a number of shares, implemented as a text field whose name is shares . Render an apology if the input is not a
positive integer or if the user does not own that many shares of the stock.
 Submit the user’s input via POST to /sell .
 You don’t need to worry about race conditions (or use transactions).

history
Complete the implementation of history in such a way that it displays an HTML table summarizing all of a user’s transactions ever, listing
row by row each and every buy and every sell.

 For each row, make clear whether a stock was bought or sold and include the stock’s symbol, the (purchase or sale) price, the number of
shares bought or sold, and the date and time at which the transaction occurred.
 You might need to alter the table you created for buy or supplement it with an additional table. Try to minimize redundancies.
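
One way to avoid a second table is to record sells as rows with negative shares, so that a single query yields the whole history. Here's a sketch of that idea with Python's built-in sqlite3 module; the table layout, the sign convention, and the sample rows (including their timestamps) are assumptions for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row  # lets rows be indexed by column name
db.execute(
    "CREATE TABLE transactions "
    "(user_id INTEGER, symbol TEXT, shares INTEGER, price NUMERIC, transacted TEXT)"
)
db.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
    [
        (1, "NFLX", 3, 300.0, "2020-01-02 10:00:00"),
        (1, "NFLX", -1, 310.0, "2020-01-03 11:30:00"),  # negative means sold
    ],
)


def history(user_id):
    """Return one dict per transaction, oldest first."""
    rows = db.execute(
        "SELECT symbol, shares, price, transacted FROM transactions "
        "WHERE user_id = :user_id ORDER BY transacted",
        {"user_id": user_id},
    )
    return [
        {
            "action": "bought" if row["shares"] > 0 else "sold",
            "symbol": row["symbol"],
            "shares": abs(row["shares"]),
            "price": row["price"],
            "transacted": row["transacted"],
        }
        for row in rows
    ]


for entry in history(1):
    print(entry)
```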

personal touch
Implement at least one personal touch of your choice:

 Allow users to change their passwords.

 Allow users to add additional cash to their account.
 Allow users to buy more shares or sell shares of stocks they already own via index itself, without having to type stocks’ symbols
manually.
 Require users’ passwords to have some number of letters, numbers, and/or symbols.
 Implement some other feature of comparable scope.
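
If you go the password-requirements route, the check itself might look something like this sketch. The function name, the minimum length of 8, and the exact rules are all hypothetical; pick whatever policy you like.

```python
import string


def password_problems(password, min_length=8):
    """Return a list of ways in which password falls short (empty if none)."""
    problems = []
    if len(password) < min_length:
        problems.append(f"must be at least {min_length} characters")
    if not any(c.isalpha() for c in password):
        problems.append("must contain a letter")
    if not any(c.isdigit() for c in password):
        problems.append("must contain a number")
    if not any(c in string.punctuation for c in password):
        problems.append("must contain a symbol")
    return problems


print(password_problems("abc"))
print(password_problems("abcd1234!"))
```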

Testing

Be sure to test your web app manually too, as by

 inputting alphabetical strings into forms when only numbers are expected,
 inputting zero or negative numbers into forms when only positive numbers are expected,
 inputting floating-point values into forms when only integers are expected,
 trying to spend more cash than a user has,
 trying to sell more shares than a user has,
 inputting an invalid stock symbol, and

 including potentially dangerous characters like ' and ; in SQL queries.
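
Several of the inputs above can be rejected by one small helper. Here's a sketch; the name parse_shares is hypothetical, and a regular expression is just one of several ways to insist on a positive integer.

```python
import re


def parse_shares(text):
    """Return text as a positive int, or None if it isn't one."""
    if text is None or not re.fullmatch(r"[0-9]+", text):
        return None
    shares = int(text)
    return shares if shares > 0 else None


# Try the kinds of inputs listed above.
for attempt in ["abc", "0", "-5", "1.5", "'; DROP TABLE users; --", "10"]:
    print(repr(attempt), "->", parse_shares(attempt))
```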

Staff’s Solution

You’re welcome to stylize your own app differently, but here’s what the staff’s solution looks like!

https://finance.cs50.net/

Feel free to register for an account and play around. Do not use a password that you use on other sites.

It is reasonable to look at the staff’s HTML and CSS.

Hints
 Within cs50.SQL is an execute method whose first argument should be a str of SQL. If that str contains named parameters to which
values should be bound, those values can be provided as additional named parameters to execute . See the implementation of login for
one such example. The return value of execute is as follows:

 If str is a SELECT , then execute returns a list of zero or more dict objects, inside of which are keys and values representing a
table’s fields and cells, respectively.
 If str is an INSERT , and the table into which data was inserted contains an autoincrementing PRIMARY KEY , then execute returns
the value of the newly inserted row’s primary key.
 If str is a DELETE or an UPDATE , then execute returns the number of rows deleted or updated by str .

If an INSERT or UPDATE would violate some constraint (e.g., a UNIQUE index), then execute returns None . In cases of error, execute raises
a RuntimeError .

 Recall that cs50.SQL will log to your terminal window any queries that you execute via execute (so that you can confirm whether they’re
as intended).
 Be sure to use named bind parameters (i.e., a paramstyle (https://www.python.org/dev/peps/pep-0249/#paramstyle) of named ) when
calling CS50’s execute method, a la WHERE name=:name . Do not use f-strings, format
(https://docs.python.org/3.6/library/functions.html#format), or + (i.e., concatenation), lest you risk a SQL injection attack.
 If (and only if) already comfortable with SQL, you’re welcome to use SQLAlchemy Core (http://docs.sqlalchemy.org/en/latest/index.html) or
Flask-SQLAlchemy (http://flask-sqlalchemy.pocoo.org/) (i.e., SQLAlchemy ORM (http://docs.sqlalchemy.org/en/latest/index.html)) instead of
cs50.SQL .
 You’re welcome to add additional static files to static/ .
 Odds are you’ll want to consult Jinja’s documentation (http://jinja.pocoo.org/docs/dev/) when implementing your templates.
 It is reasonable to ask others to try out (and try to trigger errors in) your site.
 You’re welcome to alter the aesthetics of the sites, as via
 https://bootswatch.com/,
 https://getbootstrap.com/docs/4.1/content/,
 https://getbootstrap.com/docs/4.1/components/, and/or
 https://memegen.link/.
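
To see why the hint about named bind parameters matters, here's a sketch using Python's built-in sqlite3 module, which happens to support the same :name placeholder style. It is not cs50.SQL , and the table and the malicious input are contrived for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE)")
db.execute("INSERT INTO users (username) VALUES (:username)", {"username": "alice"})

malicious = "nobody' OR '1'='1"

# Unsafe: an f-string lets the input rewrite the query's logic,
# so this matches every row even though no user is named "nobody".
unsafe = db.execute(
    f"SELECT * FROM users WHERE username = '{malicious}'"
).fetchall()

# Safe: the bound value is treated purely as data, never as SQL.
safe = db.execute(
    "SELECT * FROM users WHERE username = :username", {"username": malicious}
).fetchall()

print(unsafe)
print(safe)
```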

FAQs

ImportError: No module named ‘application’

By default, flask looks for a file called application.py in your current working directory (because we’ve configured the value of FLASK_APP ,
an environment variable, to be application.py ). If seeing this error, odds are you’ve run flask in the wrong directory!

OSError: [Errno 98] Address already in use

If, upon running flask , you see this error, odds are you (still) have flask running in another tab. Be sure to kill that other process, as with
ctrl-c, before starting flask again. If you haven’t any such other tab, execute fuser -k 8080/tcp to kill any processes that are (still) listening
on TCP port 8080.

How to Submit

Execute the below from within your finance directory, logging in with your GitHub username and password when prompted. For security,
you’ll see asterisks ( * ) instead of the actual characters in your password.

submit50 cs50/problems/2020/x/tracks/web/finance

Final Project
The climax of this course is its final project. The final project is your opportunity to take your newfound savvy with programming out for a spin
and develop your very own piece of software. So long as your project draws upon this course’s lessons, the nature of your project is entirely up
to you. You may implement your project in any language(s). You are welcome to utilize infrastructure other than the CS50 IDE. All that we ask is
that you build something of interest to you, that you solve an actual problem, that you impact your community, or that you change the world.
Strive to create something that outlives this course.

Ideas
 a web-based application using JavaScript, Python, and SQL, based in part on the web track’s distribution code
 an iOS app using Swift
 a game using Lua with LÖVE
 an Android app using Java
 a Chrome extension using JavaScript
 a command-line program using C
 a hardware-based application for which you program some device
 …

How to Submit

Step 1 of 2

Create a README.md text file that explains your project and save it in a new folder called project in your ~/ directory. Note that your project
source code itself does not need to be submitted, but this README.md file must.

Execute the below from within your ~/project directory, logging in with your GitHub username and password when prompted. For security,
you’ll see asterisks instead of the actual characters in your password.

submit50 cs50/problems/2020/x/project

Step 2 of 2

Submit a short video (that’s no more than 2 minutes in length) in which you present your project to the world, as with slides, screenshots,
voiceover, and/or live action. Your video should somehow include your project’s title, your name, your city and country, and any other details
that you’d like to convey to viewers. See https://www.howtogeek.com/205742/how-to-record-your-windows-mac-linux-android-or-ios-screen/
for tips on how to make a “screencast,” though you’re welcome to use an actual camera. Upload your video to YouTube (or, if blocked in your
country, a similar site) and take note of its URL; it’s fine to flag it as “unlisted,” but don’t flag it as “private.”

When ready to submit your video, submit this form (https://forms.cs50.io/9f20d498-c446-4d76-ab3c-8737d479016a)!

That’s it! Your project should be graded within a few minutes. If you don’t see any results in your gradebook, best to resubmit (running the
above submit50 command) with only your README.md file this time. No need to resubmit your form.

This was CS50x!
