Lecture6 0 1
DAVID J. MALAN: All right, this is CS50. And this is week 1, wherein we continue
programming, but we do it in a different language because recall last time, we
focused on this graphical language called Scratch. But we used Scratch not only
because it's sort of fun and accessible, but because it allows us to explore a lot
of these concepts here, namely functions, and conditionals, Boolean expressions,
loops, variables, and more. And so, indeed, even if today's syntax, as we
transition to this new language called C, feels a little bit cryptic, maybe a
little intimidating at first, and you don't quite see all of the meaning of the
symbols beyond the syntax itself, realize that the ideas are ultimately going to be
the same.
In fact, as we transition from what was last week-- a Hello World program that
looked a little something like this-- this week, of course, it's going to now look
a little more cryptic. It's going to look a little something like this. And now
even if you can't quite distinguish what all of the various symbols mean in this
code, it turns out that at the end of the day, it's indeed going to do what you
expect. It's just going to say, hello, world on the screen, just like we did in
Scratch.
So let's start to apply some terminology to these tokens first. So what we're about
to see, what we're about to write henceforth, we're going to start calling source
code. Code that you the human programmer write is just henceforth called source
code.
Doesn't matter if it's Scratch. Doesn't matter if it's C. Doesn't matter if it's
Python before long. Source code is the general term for really what you and I as
human programmers will ultimately write.
Of course, computers don't understand source code, it turns out. Computers don't
understand Scratch and puzzle pieces, per se, or C code like we're about to see.
They only understand this, which we called what last week?
DAVID J. MALAN: Yeah. So this is binary, zeros and ones. But really, it's just
information represented in binary. And in fact, the technical term now for patterns
of zeros and ones that a computer not only understands how to interpret as letters,
or numbers, or colors, or images, or more, but knows how to execute as well
henceforth is going to be called machine code to contrast it with source code.
So whereas you and I, the humans, write source code, it's the computer that
ultimately only understands machine code. And even though we won't get into the
details of exactly what pattern of symbols means what, you'll see that in this kind
of pattern of zeros and ones, there's going to be numbers. There's going to be
letters.
But there's also going to be instructions because, indeed, computers are really
good at doing things-- addition, subtraction, moving things in and out of memory.
And suffice it to say that the Macs, the PCs, the other computers of the world have
just decided as a society what certain patterns of zeros and ones mean when it
comes to operations as well-- so not just data, but instructions. But those
patterns are not something we're going to focus on in a class like this. We're
going to focus on the higher level software side of things, simply assuming that we
need to somehow output machine code.
So it turns out, then, that this problem we have to solve getting from source code
to machine code actually fits into the same paradigm as last time. But the input in
this case is going to be source code on the one hand. That's what you and I ideally
will write so that we don't have to write zeros and ones. But we need to somehow
output machine code because that's what your Macs, PCs, phones are actually going
to understand.
Well, it turns out there are special programs in life whose purpose is to do
exactly this conversion-- convert the source code you and I write to the machine
code that our phones and computers understand. And that type of program is going to
be called a compiler. So indeed today, we'll introduce you to another piece of
software. And these come in many forms. We'll use a popular one here that allows
you to convert source code in C to machine code in zeros and ones.
Now, you didn't have to do this with Scratch. In the world of Scratch, it was as
simple as clicking the green flag because essentially MIT did all of the heavy
lifting there figuring out how to convert these graphical puzzle pieces to the
underlying machine code. But now starting today, as we begin to study programming
and computer science proper, now that power moves to you.
And it's up to you now to do that kind of conversion. But thankfully, the fact that
these compilers exist means that you and I don't have to program in machine code
like our ancestors once upon a time did, be it virtually or with physical punch
cards, like pieces of paper with holes in them. You and I get to focus on our
keyboard, as such.
But it's not just going to be a matter today of writing code. It's going to be a
matter ultimately today onward of writing good code as well. And this is the kind
of thing that you don't just learn overnight.
It takes time. It takes practice. Just like writing an essay in any subject might
take time, and practice, and iteration over time. But in a programming class like
CS50, we're going to aspire to evaluate the quality of code along these three axes,
generally.
Is it correct, first and foremost? Does the code do what it's supposed to do? After
all, if it doesn't, well, what was the point of writing it in the first place? So
it goes without saying that you want code you write to be correct. And it's
obviously not always. Again, anytime your Mac, or PC, or phone has crashed, some
human somewhere wrote buggy code-- that is, code with mistakes. But code correctness is
going to be the first and foremost goal.
But then there's a more subjective goal, we'll see in time, a matter of design. And we saw
a little bit of this last week when I proposed that we could design even Scratch
programs better, maybe by using loops instead of just by copying and pasting the
same blocks again and again. So design is more subjective. It's more of a learned
art whereby two people might ultimately disagree as to which version of a program
is better designed.
But we'll give you building blocks and principles over the coming weeks so that you
can have a better sense for yourself if your own code is well designed. And why is
that valuable? Well, the better designed your code is, often the faster it's going
to run, the more maintainable it's going to be by you or colleagues if you're
working with others in the real world. So good design is a good thing. It helps you
communicate your ideas, just like in a typical English essay.
And then lastly, we'll talk this week onward about style. And this is really just
the aesthetics of your code. It turns out that computers often don't care how
sloppy your actual code is, where in the world of code, it turns out that you don't
really need to indent things in a beautiful way.
You don't need to paginate things like you might in an essay. The computer generally
does not care, but the human does. The teaching assistant does. You will care the
next day when you're just trying to understand what your code does. So we'll focus
lastly on style, the aesthetics of the code that you're writing.
So where are we going to write code? Where are we going to compile code? So for
this class, not only with C, but the other languages we use later in the term,
we're going to use a free text editor, that is, a program called Visual Studio Code,
AKA VS Code. It's super popular nowadays, not just for C, but for C++, and Python,
and Java, and any number of other languages. It's a text editor in the sense that
it lets you edit text. And that's all code is going to be.
Now, strictly speaking, you could write code on paper/pencil. In fact, in high
school if you took a class, you might have done that one or more times as an in-
class exercise. You can't run it on paper, of course, but you could write it,
certainly.
You could use something like Microsoft Word, or Notepad.exe, or Text Edit on the
Mac. But none of those programs are really designed to format the code in the best
way for you, nor are they designed to let you compile and run the code. So VS Code
is going to be a tool via which you can do all that and more-- write the code,
compile the code, run the code.
So that you all don't have to wrestle with stupid technical support headaches at
the beginning of the course by installing this software and that on your Macs or
PCs, we'll use a cloud-based version of VS Code at code.cs50.io. And that's going
to be the exact same tool. And the goal, then, is by the end of the semester to
migrate you off of that cloud-based environment to your own Mac and PC so that even
if CS50 is the only CS class you ever take, you're 100% equipped to continue
writing code after the class using not something that's even CS50-specific, but a
de facto industry standard, at least for some time.
So what's this program VS Code going to look like, be it on your Mac, PC, or
initially in your browser? It's going to look a little something like this. And
there's going to be several different regions to the screen. And pictured here is
that very same code I keep proposing as the simplest program you can write in C.
And what are these different regions of the screen?
Well, there's essentially these four here. So first, highlighted up top is going to
be one or more tabs where you're going to actually write code. So much like in
Google Docs or Microsoft Word, you can have tabs open with files.
Down here, though, is going to be an interface that many of you might not know.
This is what's called a terminal window. And a terminal window provides what's
generally called a Command Line Interface, or CLI. And this is in contrast with a
Graphical User Interface, or GUI.
Now, you and I, every day, are using GUIs on our phones, on our PCs. And a GUI is
literally graphical-- so menus, and buttons, and icons. And you generally use your
finger or a trackpad or a mouse or something like that to interact with it. But it
turns out that many programmers-- I dare say most programmers, at least over
time, come to prefer, not a GUI, but a CLI, a Command Line Interface where you
actually do everything somewhat arcanely via keyboard alone. Why?
Well, it turns out, there's just more features built in to most computers if you
can access them with a keyboard. It turns out, most of us can type faster than we
can point and click. And so that ends up being an efficiency gain over time.
So in time, you will get comfortable using this terminal window to do things like
compile your code or make your program as well as run it. So you won't be in the
habit initially of just double clicking icons like we do in our typical real world.
You'll do it the programmer's way. But it's not to the exclusion of adding icons,
and clickability, and more.
And then far away on the left is the so-called Activity Bar, and this is where you
just get a lot of traditional menus and buttons. So VS Code itself gives you both a
GUI and a CLI. But it's within the CLI, the terminal window, the bottom region of
the screen that we're actually going to type most of our commands. And in general
in class, I'm going to hide all of the graphical stuff that's just not of all that
much interest.
So with that said, let me actually change over to a live version of VS Code. And
I've indeed hidden the Activity Bar. I've indeed hidden the File Explorer. So
what I have here for visibility's sake is a really big area for writing code and a
really big terminal window at the bottom.
You'll see in the terminal window, there's a dollar sign. And this doesn't mean any
form of currency. This is just the standard symbol that represents, type commands
here. So the fact that there's just a dollar sign and a cursor means, eventually,
that's where I'm going to type commands. But first, I'm going to actually create
some code.
Notice that I've deliberately lowercased the whole file name. And these are just
conventions. You could use a capital H.
You kind of could use a capital C. But just don't do that. Follow best practices so
that it's consistent with what most everyone else would do.
When I hit Enter, I just get an empty tab, just like the screenshot a moment ago.
And it's in this tab where I can now write my very first program in C.
Unfortunately, it's not quite as user-friendly as Scratch where you drag and drop a
couple of puzzle pieces and, boom, it's done. So I'm going to do this from memory.
But this, too, will become familiar to you over time.
I'm going to include something called stdio.h. I'm going to type int main, void in
parentheses. On a new line, I'm going to insert some curly braces, as we'll call
them. And then I'm going to type printf, and then some parentheses, and then in
quotes, hello, comma, world, then a backslash, then a lowercase n, then a close
quote, and then a semicolon at the very end of the line.
So all I've done is recreate, just from memory, that very first program. In a
little bit, we'll make clear what most of this does. But for now, let's just
actually run this thing. And just like I clicked the green flag last week for the
first time, let's actually compile and run this program.
If it were your Mac or PC and Google, or Microsoft, or someone else had made the
software, at this point in the story, we'd be double clicking an icon. But we can't
do that yet. This is still source code.
So I'm going to click back down in my terminal window. Notice I have a second
dollar sign below the first, which just means it's ready for a second command. And
now the command via which to make this an actual program, to compile it from source
code to machine code, is going to be quite simply make and then the name of the
program I want to make.
Slight subtlety-- I'm omitting deliberately .c because the program I want to make,
I just want to call hello. Don't write make hello.c. Just write make hello.
And this program make is essentially our compiler. Technically speaking, it's a
program that automates the compilation of my program for me. But it is going to see
that I've typed the word hello. It's going to automatically look now for a file on
the hard drive called hello.c and convert it from source code in C to machine code
in zeros and ones.
So if I didn't make any typos, Enter, nothing seems to happen. And that's a good
thing. Almost always, if nothing gets outputted on the screen, you did good.
You didn't make any mistakes. You didn't get yelled at. There's no error messages.
So this is actually a good thing.
How do I now run this program? Well, notice I've got a third dollar sign, which
just means I'm ready for a third command. And now I'm going to go ahead and run
./hello. And this is admittedly a little weird that you have to do dot slash.
But for now just take on faith that this is how you run a program called hello in
your current folder, in your current directory in this cloud-based environment. All
right, crossing my fingers again, hitting Enter. And voila. My very first program
in C, hello, world.
And now let me go ahead and reveal the File Explorer that I proposed exists
earlier. I'm just going to use the keyboard shortcut to reveal that. And generally,
I keep it closed because I don't really need to constantly see what files are in my
account.
But you'll see now in the File Explorer, similar in spirit to a Mac or PC but
graphically a little different, here's my file, hello.c. It's highlighted because I
have that tab open. But now there's a second file here called just hello. That's
the name of my program.
So if you were on a Mac or PC, you would ideally double click that thing. You can't
do that in a command line environment. You have to run it down here.
But that's all we've done. We've created a file called hello.c, and then my
compiler made the program from that. Let me pause here and see if there's any
questions because that's a lot of magical phrases. Yeah?
Yeah. So if you're currently following along, playing along at home and you're
getting some kind of error message, part of today will be for me to deliberately
induce some of those error messages. For now let me just propose that if you
literally did what I did, you must have made a typo somewhere. And notice that it's
indeed standard io-- stdio.h. Maybe you typed studio.h?
OK, super common mistake, I could call you out. It is not studio.h. It is stdio.h--
so common. But this is exactly representative of the kind of stupid headaches
you're going to run into this week, probably for a few weeks, probably, honestly,
for a few years.
But you start to see past these stupid mistakes over time, and it just gets easier
and easier because the computer is going to be so regimented. It will only do what
you tell it to do. And if you say studio.h because it verbally sounds like that, it's
not going to know what the file is. So actually, thank you for tripping over that
so early. That's super common to happen. Yeah?
AUDIENCE: Yes.
DAVID J. MALAN: So why do I have two hello files? One is the one I created as the
human called hello.c, and it's pictured right here. But then when I ran make hello,
that process compiled my source code into machine code. So this second file just
called hello is the file that contains all of those zeros and ones that the server
actually understands. All right, so yeah, question?
DAVID J. MALAN: If you try clicking on the hello file, you'll see in this
environment that VS Code says, quote/unquote, the file is not displayed in the editor
because it is either binary-- AKA zeros and ones-- or uses an unsupported text
encoding. In this case, it's binary. It's zeros and ones.
Now, you could use software to see those zeros and ones. It won't be intellectually
enlightening to most any human. So VS Code just takes the choice of not showing it
to you at all. So that would be a common mistake too, clicking on a file you don't
intend. But the source code is indeed going to be editable by us.
All right, so I've written this program. It seems to magically work, at least with
some effort if you get every single keystroke right. Well, what is it that's going
on? And how is this working?
Well, first of all, notice that even without my highlighting things or choosing
buttons for menus, notice that it's already color coded. And yet, I wasn't
highlighting along the way in Google Docs style, changing the color, certainly.
Well, it turns out, what VS Code and most programming environments nowadays do for
you automatically is syntax highlighting.
So syntax highlighting is just this feature of typical text editors nowadays that
analyzes the code that you've typed. And when it notices certain types of
keystrokes, things that represent functions, or conditionals, or loops, or
variables-- a lot of the vocab from last week-- it just highlights it ever so
differently for you. So main, for instance, which we'll soon see, is in purple
here.
Int, and void, and include are in red. Hello, world is in blue. My parentheses are
in green.
This will totally vary by programmer too. In fact, if you do want to change these
colors for problem set 1 for your own environment, you can poke around VS Code
Settings via the gear icon. You can change to a different color theme.
Syntax highlighting isn't some specific color scheme like it is in Scratch. It just
generally is to each human their own preference. But that's all that's happening
here is this notion of syntax highlighting at the moment.
Well, what more is going on in this code, not when I run it, but rather as I write it?
Well, it looks a little something like this if I take away all of the colors. And
then just for discussion's sake, let me go ahead and color it a little more like
Scratch. Recall that our very first Scratch program that just said hello, world on
the screen had a green flag clicked icon-- puzzle piece, roughly in orange, and
then a purple say block beneath it.
So whereas this is the C version, if we rewind to last week, this was the same
program in Scratch. But what's happening now is exactly the same. So if you think
back to last week and you've got some function, like the say function in purple,
that might take one or more arguments, like inputs that influence what it says on
the screen.
And then functions, recall, can sometimes have side effects, like the speech bubble
appears on the screen. So last week when we used the say block and we passed in an
argument of hello, world at left, we got this visual side effect on the screen that
says now hello, world in the speech bubble. And that's exactly what just happened
in VS Code but much, much more textually.
And let's look a little closer now at the code itself. Let me wave my hand at the
equivalent of when green flag clicked part of my code, and let's focus only on the
say block in Scratch and the corresponding function in C. So if I step through this
and I wanted to convert what we did last week with the say block to C, I would
first use the print function-- although that's actually a bit of a white lie. It's
actually the printf function.
The f in printf means formatted. And it's just a function that allows you to format text on
the screen. There is no say function in C. There's a printf function. What MIT did
down the road years ago was they took what existed historically as printf, and they
simplified it for a broader audience by just calling it essentially say instead.
But notice that now if I want to convert the Scratch code at left to C code at
right, it's the same shape. So MIT deliberately used this white oval, if only
because it conjures this idea of having parentheses too. So on the right, if I want
to pass an argument or an input to the printf function, I use an open parenthesis
and a close parenthesis. In those parentheses, I then type whatever it is I want to
print on the screen-- in this case, hello, comma, world.
But notice I've deliberately left some room because you need some extra keystrokes
in the world of C. Any time you type out some text-- otherwise known as a string of
text, to use computer science jargon-- you need to quote it, in this case with
double quotes. Double quote at the left, double quote at the right. And notice too
I'm going to include some slightly cryptic symbol here too-- backslash n, which I
also typed and said verbally earlier, and then one last nuisance at the end of
this, which is a semicolon.
So suffice it to say, this is why we start with Scratch. This, drag and drop,
you're good to go. In a language like C, printf, parentheses, double quotes, the
text you want, backslash n, semicolon at the end.
There's just so much syntactic overhead. But at the end of the day, it's just a
function. And you'll get used to these nuisances like the parentheses, the quotes,
the semicolon, and the like.
But things can very easily go wrong, and it's very easy to make mistakes, even
with lines of code like this. So let me do this. Let me go back to VS Code where I
have the exact same code.
Notice that on line 5 is exactly that line of code. So this is the equivalent of
the say block. And let's consider what mistakes I may make early on or even now 20
years later after learning this that are quite common in general.
Suppose I forget the semicolon there. So easy to do. You will do this eventually.
Let's see what happens now when I go back to my terminal window and try to compile
my code again. Just to keep things tidy, I'm going to clear my screen. But that's
just for lecture's sake so that we can focus only on the most recent command.
But I'm going to go ahead now and rerun make hello. This will ensure that my
program is recompiled. And this is a manual process.
I changed my code. The zeros and ones on the hard drive have not changed. I need to
recompile it to output the latest machine code.
So here we go. I'm going to hit Enter, crossing my fingers as before. But again, I
removed the semicolon by accident. Oh, my god. There's more lines of errors now than
there are of actual code.
And this, too, takes some getting used to. The programs we're using were not
necessarily written with the least comfortable audience in mind but, really,
professional programmers back in the day. But through practice, and through
experience, and through mistakes, you'll start to notice patterns here too.
So here's what I typed. Make hello after the dollar sign prompt. Now I get yelled at as
follows, hello.c, colon, 5, colon, 29. Well, what's that referring to? I've screwed
up somewhere-- on line 5, on the 29th character on that line.
Generally, the specific character is not that useful unless you actually want to
count it out. But line 5 is a good clue. Why? It means I screwed up somewhere on
line 5 here.
All right. Well, what is the error? Expected a semicolon after expression. And this
error is actually pretty obvious now that I see it and I realize, oh, wait a
minute. All right, here's my line of code.
So what's the fix? Well, obviously, the fix is to go back up here, put the
semicolon there. And now if I recompile my code with make hello-- I won't clear my
screen just yet just to show you the difference-- now it just worked. So we're back
in business as before.
All right, let me pause here, though, and ask if there's any questions about what I
just did. These error messages will become frequent initially. Yeah?
DAVID J. MALAN: Really good question. Do you need a semicolon after every line or
just some? It turns out, just some. This is something you'll learn through
practice, through demonstrations and examples today.
Generally, you put a semicolon after a statement, so to speak. And this is the
technical term for this line of code. It's a statement. And think of it as it's the
code equivalent of an English sentence.
So the semicolon in code is like a period in English when you're done with that
particular thought. You don't need semicolons for now anywhere else. And we'll see
examples of where else you put them. But it usually is at the end of a line of code
that isn't purely syntactic like curly braces instead. Other questions on the
mistake I just fixed and created for myself? Yeah?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Correct. So line 5 is where the error is most likely. Character 29
means it's 29 characters that way.
And then it's actually, in this case, giving me a suggestion. The compiler won't
always know how to advise me, especially if I've made a real mess of my code. But
often, it will do its best to give you the answer like this. Yeah?
DAVID J. MALAN: Ah, so how come I first typed code, space, hello.c, and now I'm
typing make hello? Two different processes. So when I typed code, space, hello.c,
that was because I wanted to open VS Code and create a new file called hello.c.
It's like going to File, New in a Mac or PC.
Thereafter, though, once the file exists and is actually open here-- and it does
autosave, you don't need to hit Command-S or Control-S all the time-- I can now
compile it with make hello again and again. So theoretically, I should never need
to type code, space, hello.c again unless I want to create a brand-new file called
the same thing.
All right, so what about this other piece of syntax here? Let me clear my terminal
window here. You can also hit Control-L just to throw everything away just to clean
it up aesthetically. Suppose that I omit whatever this sequence of symbols is,
backslash n, since I'm not really sure at first glance why that's even there. Does
anyone want to conjecture, especially if you've never programmed before, what might
happen now if I recompile and rerun this version of the program? I left the
semicolon, but I took away the backslash n. Any instincts? All right, well-- yeah?
AUDIENCE: Will the next dollar sign appear straight after your hello, world?
DAVID J. MALAN: It will. The next dollar sign will appear right after my hello,
world. But what makes you think that?
So you don't need the backslash n. You do need the semicolon. But if you don't have
the backslash n, watch what happens when I do ./hello this time. Now, indeed, I see
hello, comma, world and then a weird dollar sign. And this is still a prompt.
I can still type commands at it, like clear, and everything gets cleaned up. But it
just looks kind of stupid. If I run it again here with ./hello, it's just not very
user friendly.
It is convention that when you're done running your program, you should ideally
clean things up, move the cursor to the next line for the user. And so the
backslash n is simply the special symbol, otherwise known as an escape sequence
that C knows means move the cursor to the next line. Other languages, Python
among them, use this same symbology as well.
Now, if I go back to the code here and, for instance, I try to do this differently.
Suppose I don't put the backslash n. I just hit Enter like a normal person would in
Google Docs or Microsoft Word. Let me go ahead and try compiling this program.
And this, you would hope, would work, right? You would hope this would print out
hello, world and then a blank line because I move the cursor to the next line. But
no. If I run make hello now and try to compile that, C does not like this.
Now I get a different error, still on line 5, this time starting at character 12--
error, missing terminating double quote character and then some other esoteric
stuff. And then this does not sound good-- fatal error this time, too many errors
emitted, stopping now. So I really screwed up here.
So why can't I do this? Just because. The humans who designed C decided that if you
have a string of text, it must stay on the same line. It can get really long.
It can soft wrap-- that is, without you hitting Enter. But you can't hit Enter to
create a new line. If you deliberately want a new line, you have to indeed use this
backslash n escape character.
So let me go ahead and do this. Let me put it back. Let me go back to my terminal
window. I'll clear the screen again.
Let me go ahead now and do make hello to recompile to that version-- ./hello. And
voila. We're back in business with hello.
All right, so now let's tease apart some other aspects of this code because there's
a lot going on just to get us to say hello, world on the screen. For today, we're
largely going to ignore this-- int main(void) and these curly braces here. We'll
come back to that before long as to why it's there.
But for now just think of int main(void) and these curly braces here as really
being the C equivalent of when green flag clicked. Why? You just need it there.
That's how you get your program going. And main is indeed going to be some special
function, but more on that another time.
But why do I have this line of code here? The correct spelling is indeed stdio.h,
S-T-D-I-O dot H. And they're angle brackets this time, so that's a little new.
There's a hash and then an include keyword. If you don't know what something is,
there's not really that much harm in just getting rid of it and see what happens.
So let me delete that line.
Let me go back to my terminal window, clear the screen, and then run make hello
again. And let's try compiling this program now without that first line. Why? I
don't understand it, so let's see what happens.
All right, here's yet another error, but let's see-- hello.c, line 5, character 5--
so it's pretty early on-- error, implicitly declaring library function printf with
type int and then dot, dot, dot. So implicitly declaring library function printf--
so this is very cryptic sounding. You'll get better at understanding phrases like
these.
But apparently, I do need the include line for stdio.h. But why? Based on this
symptom, what might your instinct be for what that first line of code is doing for
us in the first place? Why intuitively must it be there?
DAVID J. MALAN: Exactly. It's like importing a library so that you can do things
like print things out on the screen. Now, in Scratch, you didn't have to do this
for most of the puzzle pieces.
But you might recall that partway through week 0, I went to the Extensions button
at the bottom left of the Scratch screen, and I imported some extra puzzle pieces
for text to speech that gave us the creepy humanized voice that actually came out
of the cat's mouth. Well, that was like adding a library-- code that someone else
wrote. In that case, it was a third party. But I gave myself access to it. Same
here.
Turns out that you don't really get printf automatically in C. You have to include
a so-called header file that declares that function to exist. Now, the reason for
this historically is just efficiency. Back in the day when computers were much
slower and resource constrained, you didn't want to just give yourself access to the
entire kitchen sink of functionality. You wanted to include only the functions
you actually cared about.
Nowadays, it's a copy/paste step because you almost always want to print something
out on the screen, at least when writing programs like these. But these so-called
header files contain enough information about all of the functions in what's called
the Standard I/O Library. And standard I/O just means standard Input and Output.
And that's appropriate, right? Because printing is pretty basic output.
Turns out, there's other functions for getting input from the human's keyboard--
more on that in a bit. But any time you want to print something on the screen in C,
you indeed need to include this header file at the top of your code. And that's
going to essentially inform the compiler, hey, compiler, I want to use
functionality from the Standard I/O Library, including printf in this case. And if
you omit the header file by accident, it's just not going to work because it
doesn't know what printf is. It's some unrecognized symbol in that case. All right,
questions, then, about this line of code, this line of code here, or what these
header files are?
All right, you might wonder, well, how do you know what functions exist? How do you
know what files you might indeed want to include? Well, it turns out that C is a
many-year-old language, and it has ample documentation. A caveat is that its
documentation isn't necessarily all that user friendly. But what we have for the
course is a simplified version of the official documentation for C at this URL
here, manual.cs50.io.
So in the world of C, and other languages too, there are what are called manual
pages. And these are just text-based documentation that, honestly, is typically
written in a voice that you have to be an experienced programmer to understand some
of it. So what we've done in this version of the same documentation is we've
imported all of the original official documentation, but we've added "less
comfortable" translations in English for a lot of the functionality that you might
use in class just to help onboard you. So at the end of the day, you don't need
this documentation long term. But just to get started, we'll translate it into
terminology that you might appreciate from a teaching assistant, for instance, as
opposed to the original author of these documents.
And so, for instance, if you were interested in reading up on what functions exist
in the stdio.h file, well, you could go to a URL like this, or you could search for
it at manual.cs50.io. That would show you a list of all of the available functions
in that library, and printf indeed would be one of them. And then you could click
further on that, reaching a URL like this that's going to give you all of the
documentation for how to use printf. It turns out, you can do even more than just
printing out hello, world. And we'll scratch the surface of that today. But it
turns out that the documentation will always be your authoritative source
ultimately for questions like, what can I do, and how can I do it?
Meanwhile, it turns out that CS50 has its own library, accessible via a header
file called cs50.h. It turns out in C that output is actually pretty easy,
relatively speaking, once you get used to all the curly braces, parentheses, quote
marks, and the like. But input is a little more difficult.
And if you have programmed before, input's not that hard to do in Python. It's not
that hard to do in Java. It's more difficult to do in C. And we'll see why in a
couple of weeks.
But for the first couple of weeks of the class, we actually provide you with some
training wheels, of sorts, whereby we have a number of functions that are declared
in this file, cs50.h. Its documentation lives at a URL like this. And in a
moment, we'll use a few of these. You'll see that CS50 provides you with some
functions like get_char to get a single character from the user's keyboard,
get_int to get an integer from the user's keyboard, get_string to get a sequence of
text from the user's keyboard, and a bunch of others as well.
So let's actually use some of these functions, how about, by revisiting, really,
the second program we wrote in Scratch last time, which adds some input to the
output. So first version of Scratch was just hello, world. Said the same thing
every time you click the green flag.
Version 2, recall, though, did this. It asked the user, what's your name? And then
that somehow gave it back a return value, we called it. And we then joined hello
and that name to say something a little more interesting on the screen.
So what did that model look like? Same thing as before. We've got a function in the
middle where function is like the code implementation of our algorithm. That takes
in one or more arguments, like what is it you want to say on the screen ultimately?
And return value, in this case, is going to be actually a value that comes back.
So in the case of getting input, we can consider this ask block again, like last
week. The input to it is whatever words of English you want to ask the user. And
then it returns a value. And this was called, by default in MIT's world, answer. But
we'll see in C, you can call these return values anything you want ultimately in
variables.
But this is different from a side effect. A side effect is just something visual
often that happens on the screen, like the speech bubble or hello, world. A return
value is actually a value you get back from a function that you can use or reuse.
So how do we convert this Scratch block from last week to C code this week? Well,
if you want to ask the user for something like their name, you can do this. You use
a CS50 function called get_string. And you use the parentheses to represent "here
come the inputs" there too. You can then put the sentence you want to ask the
user-- quote/unquote, what's your name?
But you do indeed need the quotes literally in C. So I'll go ahead and add those as
well. Subtle, but I've deliberately included a space after the question mark, but
before the double quote, just so that the cursor moves one step over because, in
this case, we're not going to get a special speech box like we did in Scratch. It's
just going to leave the cursor where it is, so we'll see that, aesthetically, that
just moves the blinking cursor one space after the sentence on the screen.
All right, but the catch is with Scratch, we just automatically got back the answer
from the user in a special variable called answer. In C, you're going to have to be
a little more specific. In C, if you want to get back a return value from a
function like get_string, you have to use an equal sign and then the name of a
variable on the left.
But notice that this doesn't represent equality, per se. This is assignment in this
case. So in C, when you use a single equal sign, that means copy the value on the
right over to the value on the left-- from right to left. So what does this do for
us?
Well, if get_string is a function that prompts the user with, quote/unquote, what's
your name, and it has I claim a return value, that means it hands me back some
value. But it's up to me in C to do something with that value. So if I want to copy
that value into a variable that I can use and reuse, I use an equal sign, and I
invent on the left-hand side of that equal sign any variable name I want.
There are certain rules. There are certain conventions. But generally if you use a
single word with all lowercase, you're in good shape.
But C's a little more pedantic than that. And those of you who have programmed
before might not be used to this, for instance, in Python, which is a world we'll
get to in a few weeks. You also have to tell C what type of value you're storing.
So if I do want a string of text from the user-- so not an integer, not a single
character. I want a whole string of text, like a phrase, a sentence, a name, in
this case-- I have to tell C that this variable is of type string. So it's a little
wordy, but you get used to it.
And you just have to be precise. You're informing the computer what type of value
is going in this variable. All right, it's so close to being correct, but I have
omitted something that's annoyingly important still. What's missing still? Yeah?
AUDIENCE: Semicolon?
Let me go back to VS Code where I have version 0 of my code here. Let me go ahead
and include one other file at the top of hello.c, namely include cs50.h so that I
have access now to get_string and anything else I might want. Now let me go ahead
and add a line of code here inside of these curly braces.
And let me go ahead and do this-- string answer equals get_string, quote/unquote,
what's your name, question mark. I'm going to add an extra space before the double
quote. I'm going to indeed end my thought with a semicolon. And now let me
deliberately make a mistake, just to make a point here. Let me now try changing
hello, world to hello, comma, answer.
Now, perhaps, even though this is some new lines of code, you can see where I've
errored already. But let me try making this program now. So far, so good. So no
error messages. So that's a good thing.
Let me go ahead and run ./hello, and you'll see the prompt. What's your name,
question mark. And notice, the cursor is indeed one space to the right just because
I thought it would look prettier to put a little blank space there as opposed to
leaving it right after the question mark. Let me type my name. But even if you've
never programmed before, I have screwed up here. What are we going to see on the
screen when I hit Enter?
DAVID J. MALAN: Yeah. Hello, answer, most likely. Why? Because the computer is
going to take me literally.
And if I say, quote/unquote, hello, answer. That is the string of text followed by
a new line that's going to be outputted to the screen. So we need some way of
actually plugging answer into this line of code.
It's not quite as simple as Scratch where you could just grab a second say block
and drag and drop the variable there. We actually need a new syntax. And it's going
to look weird at first, but it is everywhere in software nowadays, especially in
the world of C and certain other languages.
So let me go ahead and propose that I solve it as follows. Well, back when we did
this in Scratch, remember that the most elegant solution was this here. We used the
say block still, which is going to be analogous to printf today. But I used the join
puzzle piece in Scratch to combine hello, comma, space, and then the name of the
human. So how do we translate this code to C?
Well, it's going to look a little different now. I'm going to start with printf
with some parentheses and a semicolon representing the say block. But how do I now
do this joining? This is where the puzzle pieces don't quite translate perfectly.
This would be the way to do this.
You put hello, comma, and then a placeholder. So this is what's known as a format
code in C, specifically for printf. And it just means this is a placeholder for a
string.
Again, a string is just text. So this means, hey, computer, print out literally,
hello, comma, space, and then not literally %s. %s is treated specially to mean
plug in some value here.
All right, so what else do I still need? Well, this is still some text, so I'm
still going to surround the whole thing with double quotes. I'm still going to
include my backslash n just to keep things tidy and move the cursor to the next
line. So the last step here in C is to somehow join the answer with that word
hello. And the way you do this is with printf, passing it not one argument, which
is what I keep doing. I keep passing it one string of text, quote/unquote.
I'm going to now add a comma and then the name of the value that I want printf to
go back and plug into that %s. And printf is just smart about this. If you have one
%s and one additional argument after a comma, it just does-- from left to right, it
plugs it in.
If you have two %s's and two variables after the comma, that's OK too. If you
separate them with commas, it'll plug the first into the first %s and the second
variable into the second %s. So it's just left to right, order of operations. It's
not as pretty or as simple as this, but this is how it's done in C.
All right, let me pause because this is a lot of symbology. Any questions on this
technique here? Yeah?
AUDIENCE: Why did you exclude the backslash n in the previous section?
DAVID J. MALAN: Yeah, a really good question. Why did I exclude the backslash n a
moment ago? Really, just my sense of aesthetics, if you will. No good reason beyond
that.
So if I look back at my code, you quite rightly notice that I didn't have a
backslash n there. That's just because, for whatever sense of style that I have, I
wanted the name to be typed right after the question.
I totally could have added a backslash n there instead of a space. That would have
just allowed me to type down here. Totally fine. Just wanted to show you something
different. Good catch. Yeah?
DAVID J. MALAN: Can I show an example with two %s's? Surely. So let me in VS Code
do this. Let me clear my terminal window to clean things up.
And let me do this. Instead of calling the variable answer all over the place, let
me call it first. And I'll ask two questions. What's your first name?
And now let me do string last equals get_string-- whoops, capitalization matters,
so let me fix my capital S there-- quote/unquote, What's your last name, question
mark, semicolon. And now we'll plug in one %s and a second %s. And now I'm going to
plug in first first and last last, coincidentally. And now I'm going to go back to
the terminal window.
Make hello-- crossing my fingers, all good-- ./hello. Here's my first question,
David. Here's my second question, Malan. And again?
Hello, David Malan. So it just inserts them left to right. All I was doing for
parity with Scratch though-- and let me go ahead and undo this again.
I'll go back to answer, like this. I'll go back to just asking for the person's
name. I'm going to delete mention of last. I'm going to delete mention of the
second %s. And now if I recompile this simpler version, oh, I did screw up-- didn't
intend it. What did I do wrong?
I didn't declare first. So indeed, intuitively, I want to just change that to that.
Let me now do make hello again, ./hello, type in just my first name this time. And
there it is-- hello, David. Questions, then, on this syntax with printf? Yeah?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Ah, the placeholder-- I'll zoom in-- is just a single percent then
an s. So inside of my string here is %s, and then I have a comma outside the
quotes, and then the name of the variable whose value I want to plug in for that
%s.
And now notice there are technically two commas inside of these parentheses on line
7. And yet, I claim that printf, at the moment, is only taking in two arguments.
Why are there then two commas but only two arguments? If there were two commas, you
would think there would be three arguments, right?
DAVID J. MALAN: Exactly. The comma in between the quotes is just an English thing.
It's separating the hello from the name. So that's why indeed it's not only in
quotes, that's also why programs like VS Code tend to syntax highlight it a little
differently just so that it jumps out as different to you, even though, in this
case, it's a little subtle-- a light blue versus white-- but indeed, it's trying
its best. Other questions now on this placeholder? Yeah?
AUDIENCE: If you wanted to put an exclamation point at the end, would you put a
comma after your answer variable, and would that put it [INAUDIBLE], or would you
have to add a new line?
DAVID J. MALAN: Ah, good question. If I wanted to add an exclamation point after
the name, would I have to add another placeholder and so forth? I could actually do
that much more simply. I can just put the exclamation point right after the %s.
I don't need an additional placeholder, per se.
If I zoom out now and run make hello again, ./hello, and type in just my name-- no
exclamation point-- now you'll see more excitedly, hello, comma, David. So printf
is smart. It will figure out where the %s is and then go and replace it.
Now, let me propose that a common thing in programming is that as soon as we make a
decision as how to design something, we often paint ourselves into a corner and
regret a decision. Can anyone think of a problem that arises from using %s as a
placeholder in this string to printf? What could go wrong if we're using percent in
this special way?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Yeah. If you literally want to say, for whatever weird reason, %s
on the screen-- or honestly, even just a single %. It turns out that a percent sign
is treated specially inside of printf strings. So what's the solution here?
There's different patterns of solutions to problems like these. But suppose you
wanted to say, I got 100%, for instance. Let me go ahead and change this
completely. So I got 100% on your test or whatever.
All right, let me go ahead and run make hello, Enter. All right, so invalid
conversion specifier. I mean, I have no idea what this means, but it's underlining
the percent sign as problematic.
Well, it turns out that humans years ago decided, ugh, all right, damn it. We
already used %. Well, two percent signs will mean one %, literally. So now if I
rerun make hello, aha, ./hello, I got 100%.
So there's going to be things like that, honestly, that you have to ask someone,
you have to Google, you have to look it up in the documentation. But there's always
a solution to those kinds of problems. And thankfully, they don't come up all that
often. Yeah? Oh, just pointing. Other questions? Yeah?
AUDIENCE: So if you have multiple [INAUDIBLE]
AUDIENCE: This is more of a clarification question. What exactly does the %s mean?
DAVID J. MALAN: It's just a placeholder. It's called a format code, and it just
means colloquially, plug in some value here. And printf-- the humans who wrote
printf decades ago decided to treat %s special. Why?
Just because. They needed some placeholder. They decided that, eh, no one's ever
going to really want to type %s. And if they do, they can just do %%s. So they
decided to implement printf in such a way that they have code that analyzes
whatever text comes in, looks for %s, and then somehow plugs in the subsequent
values into that placeholder. And just the-- ah, question? Sorry?
DAVID J. MALAN: Ah, so what if you wanted to do single characters, like initials,
like D M or D J M for first, middle, last, absolutely. And that, too, is a perfect
segue from the two of you to what, in general, are going to be called data types in
C.
So it turns out, in C, there's not only strings as text. And we'll see in more
detail over the next couple of weeks what a string really is underneath the hood.
But strings of text are not the only thing that programs can output. They can
indeed output single characters, as for initials.
They can output integers as well. Turns out that printf has different format codes
for all sorts of different data types. And just some of the data types we'll see in
the coming weeks will be this list here, which you'll notice it almost perfectly
lines up with the CS50 functions that I rattled off earlier, like get_char,
get_int, get_string.
The reason we called those functions that is because each of them is designed to
return to you a different type of value. We've used get_string already in this
example here. We'll soon see get_int, and we'll see opportunities to use others.
But these indeed are the menu of available data types plus others-- dot, dot, dot--
that you can use when writing a program in C.
That was a lot. Let's go ahead and take a five-minute break here. No cookies yet.
But in five minutes, we'll come back, dive into more detail. On our second break
today, we'll have cookies.
All right, we are back. And so if you have been playing along at home but hitting
some bumps in the road, that's totally normal. And indeed, the goals of lecture
generally will be to give you a sense, conceptually, of where we'll be going during
the course of the week. But it's indeed through the hands-on labs and problem sets
that you'll really have an opportunity at your own pace to work through some of
those same bumps in the road.
But for today, let me give you a few more building blocks. And these, too, will
translate from Scratch initially. Namely, conditionals. How, now, in C,
after knowing now how we can use functions-- at least get_string and printf-- and
we can use variables like the string I created earlier, how can I now add to the
mix things like decision making and conditionals at that?
Well, with conditionals in Scratch, we had this kind of syntax on the left. Here in
Scratch is how you might express if two variables, x and y, have this relationship.
If x is less than y, then say on the screen, x is less than y. Well, let me
translate that to the right now in C code.
So you use parentheses there. So it's similar to functions, where we use parentheses for
printf and parentheses for get_string, and this is just a weird inconsistency
stylistically. When using the keyword if, you should, as a matter of best practice,
put a space after the word if. When using a function like printf or get_string, you
shouldn't. Both will work, but you'll find that these are conventions stylistically
that most people adhere to-- so space when using an if here.
All right, now inside of the curly braces is where the actual code goes that you
want to execute conditionally. So if you want to print out x is less than y only if
x is actually less than y in C, you use this open curly brace-- which, up until
now, you've probably rarely used on your keyboard-- and the closed curly brace down
here. And those are hugging, if you will, the one or more lines of code underneath
the if-- very similar in spirit to how the orange block here hugs the purple puzzle
piece here.
So there's no graphics in C. It's all text. So you can think of those curly braces
as really representing the same idea.
As a side note, if you only have one line of code inside of the if condition, if
you will, you, strictly speaking, don't need the curly braces. But as a matter of
good style, do include them. It will make more obvious what your intent is.
How about in Scratch if you wanted to express this-- two ways in the road that you
might go, left or right, so to speak? Well, if x is less than y, I want to say, x
is less than y. Else, I want to say the opposite, x is not less than y in this
case. So I'm making a decision based on that Boolean expression.
In C, it's almost the same, but you're adding to the mix the keyword else-- so MIT
borrowed for Scratch the same keyword there-- and a second pair of curly braces,
open and close respectively. And you might guess now what goes inside of those.
Well, you print out x is less than y, or you print out x is not less than y.
All right, what if there is a three-way fork in the road? In Scratch, this actually
gets a little unwieldy graphically, if you will. But notice that in Scratch, this
is how we could express if x is less than y, say x is less than y. Else if x is
greater than y, say x is greater than y. Else if x equals y, then say x is equal to
y.
Now, minor inconsistency here. Just a little bit ago, I claimed, in C, that an
equal sign represents what operation?
AUDIENCE: Assignment.
DAVID J. MALAN: Assignment from right to left. Insofar as Scratch is really meant
for kids, and they didn't really want to get into the weeds of this kind of
semantic, equal sign in Scratch means equality. However, we're going to need to fix
this in C in just a moment. In C, equal sign means assignment right to left. In
Scratch, it literally means what you would expect.
All right, let's translate this code then to C. On the right, this code would
correspond really to this. And you can perhaps see, somewhat goofily, what the
solution was, not unlike the %% solution earlier when humans painted themselves
into one other corner. You say if, you say else if, and you say else if, and how
did we resolve the use of a single equal sign already? In C, when you want to
express equality-- is the thing on the left equal to the thing on the right-- you
literally use two equal signs right next to each other, no space in between them.
But now this code would be correct on both the left and the right, whether you're
doing this in Scratch or C respectively. But now we can nitpick our code,
specifically the design thereof. Logically, can anyone critique the design of this
code, either in Scratch or C? I feel like we could do better. How about in back?
AUDIENCE: The only option after it getting greater than or less than is
[INAUDIBLE].
DAVID J. MALAN: Perfect. Logically, it's got to be the case that x is less than y,
or x is greater than y, or by conclusion, it's got to be equal to y. So why are you
wasting my time or the computer's time asking a third question? You don't need to
ask this final else if because logically, as you note, it should go without saying.
So it's a minor tweak. You're doing extra work potentially in the cases where x
equals y. So we can just refine that. And just like in Scratch, you could just use
an else block, similarly in C, we could simplify this code to just an else, a sort
of catch-all logically that just handles the reality that, of course, that's going
to be the final situation instead.
All right, so we have this ability now to express conditionals with Boolean
expressions. Let's actually do something with this next here. So let me go back to
VS Code. I've closed hello.c, and I want to create a second file for the sake of
some demos now. Recall that you can create new files by typing code, space, and
then the name of the file you want to create.
For instance, I might do compare.c. I want to write a program that's going to start
comparing some values for demonstration's sake. But before I do that, let me just
show you by opening the File Explorer at left, this is similar in spirit to a Mac
or PC. You can go up here and click on an icon, and you can click on the plus icon,
and you'll get a blue box. And I can type in compare.c, and I can just manually
create it that way.
Notice that opens the tab even without my having typed code. So again, on the left,
you have a GUI, a Graphical User Interface, albeit a simplistic one. On the right
and at the bottom here, you have a command line interface, but they're one and the
same.
I'm going to hide the File Explorer just to make more room for code here. And let's
go ahead and do this. Let's write a program that compares two values that the human
inputs, but not strings this time. Let's use some actual integers.
All right, I'm going to go ahead and include the CS50 library's header file at
top-- cs50.h. I'm going to also include stdio.h. Why? One gives me user-friendly
input via get_string, get_int, and so forth. One gives me user-friendly output via
printf in the case of stdio.h.
Now I'm just going to blindly type this line of code, which we'll come back to in
future weeks. But for now, that's analogous to the when green flag clicked code in
Scratch. And now let's go ahead and do this.
Let me go ahead and get_int from the user and ask the user, What's x, question
mark. I'm not going to bother with a new line. I want to keep it all in one line,
just for aesthetics' sake.
But when I get back an int, just like I get back a string, I get back a return
value. So if I want to store the result of get_int somewhere, I had better put it
in a variable. And I can call the variable anything I want.
Previously, I used answer, or first, or last. Now I'm going to use x. But there's
still two things left to do here logically, even though we haven't technically done
this yet. What do I still need to do?
AUDIENCE: A semicolon.
DAVID J. MALAN: And the int at the beginning. You the programmer, starting today,
need to decide what you're going to be storing in your variables. And you just need
to tell the computer that so that it knows.
Now, as a teaser for languages like Python, more modern languages, turns out,
humans realized, well, gee, this is stupid. Why can't the computer just figure out
that I'm putting an int there? Why do I have to tell it proactively? So in some
languages nowadays, like Python, we'll get rid of some of this syntax, we'll get rid
of the semicolons. But for now we're looking at, really, the origins of how this
all worked.
All right, so I've done this one line ending with semicolon. Let me do one other.
And let me get a second int asking the user, What's y, question mark. So almost
identical but different responses from the user, hopefully.
And let me just ask simply if x is less than y, in parentheses, then some curly
braces, let me go ahead and print out, quote/unquote, x is less than y backslash n.
And now just as a side note-- I seem to be typing fast. Some of that is because VS
Code is helping me. Let me go back to this first line with the if, hit Enter.
And now I'm only on my keyboard going to type the open curly brace. This is a
feature of many text editors nowadays. It finishes part of your thought.
Why? Just to save yourself a keystroke to make sure you don't accidentally forget
the closing one. So you'll notice sometimes that things are happening that you
didn't type. It's just VS Code or future programs you use trying to be helpful for
you.
I'll go ahead and manually type out now printf x is less than y backslash n close
quote semicolon. So let me go ahead now and try to run this, and we'll see-- let's
see. So make-- not hello-- but make compare because this file is called compare.c,
hitting Enter.
No output is good because it means I haven't messed up. Let me ./compare instead of
./hello, Enter. What's x? How about 1? What's y? How about 2? x is less than y.
Well, let's try it again. And here, I'll save you some keystrokes too. Let me clear
my screen. Instead of constantly typing ./this and ./that, you can also use your
keyboard's arrow keys in VS Code to scroll back through time.
All right, let me go ahead, though, and rerun ./compare, Enter. Let's reverse it
this time-- 2 for x, 1 for y. And now, of course, there's no output. All right,
well, that's logically to be expected because we didn't have an else here.
So let's add that. Else-- now let's open my curly braces, letting VS Code do one of
them for me-- printf, quote/unquote, x is not less than y backslash n semicolon.
Let me go ahead and try this again-- ./compare, Enter. Again, 2 for x, 1 for y. And
we should see-- huh. What did I do wrong? Why am I not seeing any else output?
Yeah?
AUDIENCE: You changed your code, but you didn't rebuild. You need to compile it.
DAVID J. MALAN: Exactly. You got to get into the habit after you change your code
of recompiling it. Or otherwise, the zeros and ones in the server are the old ones
until you manually compile.
So let's fix this-- make compare, Enter. No error messages. That's good. ./compare,
2, 1. And now I get back the output. So x is not less than y.
How about if I go and add in the third condition? Well, we can do this either
efficiently or inefficiently. Let me go ahead and refine this. So else if x is
greater than y, let's literally say, x is greater than y. And now I could do else
if x equals equals y.
But I think we already claimed that that's unnecessarily inefficient. So let's just
have our catchall. And here I'm going to say, quote/unquote, x is equal to y
backslash n, close quote there.
So I think now with this code, we've handled all three scenarios. Let me go ahead
and recompile it properly-- make compare, ./compare. And now 1 and 2-- x is less
than y.
Let me run it again. 2 and 1-- x is greater than y. And lastly, 1 and 1, and x is
equal to y.
So for the most part, our code is getting longer. We're up to 21 lines of code,
though some of them are just single characters on the screen. Almost everything
else is the same. I'm using the CS50 library's header file for my get_int function,
stdio.h for my printf function, and the rest of this is just now new syntax for
conditionals as well. Questions, then, on this C implementation of just some basic
comparisons like this? Any questions? Yeah?
And there's generally automated tools within a company that help give feedback on
the code or stylize it as such. There are alternative styles than what we use in
the class. We deliberately keep and ask that you keep the curly braces on their own
line, if only because it rather resembles the hugging nature of Scratch's
blocks and just makes clear that they're balanced, opened and closed.
However, another common paradigm in some languages and with some programmers is to
do something like this on each of them. So you have the opening curly brace on the
same line as here. We do not recommend this. This is en vogue in the JavaScript
world and some others. But ultimately in the real world, it's up to each individual
programmer and/or the company they're working for, if applicable, to decide on
those things.
Well, let me do this. Let me create a new program, a third one called agree.c. So
I'm going to write code agree.c just to give myself a new tab.
I'm going to start, as always now, include cs50.h. Let's include stdio.h. And then
let me do my int main(void)-- which, again, for today's purposes, we'll take at
face value is just copy/paste.
And if I just want to get Y or N, for instance, instead of Yes or No, we can just
use a simpler variable here. How about just a char, a character, a single
character? So I can use get_char to ask the user, for instance, do you agree,
question mark. But as before, I need to store this somewhere.
So I don't want a string because that's a single char. I don't want an int. I just
want a char. And it's literally C-H-A-R. And then I can call this thing anything I
want.
It's conventional if you have a simple program with just a single variable and it's
of type char, call it c. If it's an int, call it i. If it's a string, call it s.
For now I'm just going to keep it simple and call it c.
And now I'm going to ask a question. So if c equals equals, how about,
quote/unquote, y, then let me go ahead and print out Agreed backslash n, as though
they agreed to my terms and conditions. Otherwise, let's see. Else if the character
equals equals, quote/unquote, n, then let me go ahead and print out, say, Not
agreed, as though they didn't, quote/unquote. And let's leave it at that, I think,
here initially.
Now, you'll notice one curiosity, one inconsistency perhaps. Does anyone want to
call it out, though it's somewhat subtle? I've done something ever so slightly
differently without explaining it yet. Do you see it?
AUDIENCE: The single quotation mark.
DAVID J. MALAN: Yeah. So I've suddenly used single quotation marks for my single
characters and double quotes for my actual strings of text. This is a necessity in
C. When you're dealing with strings, like strings of text, like someone's name, a
sentence, a paragraph, anything really more than one character, you typically use
double quotes.
And indeed, you must. When dealing with deliberately single characters, like I am
here for y or n, you must use single quotes instead. Why? Because that makes sure
that the computer knows that it's indeed a char and not a string. So double quotes
are for strings. Single quotes are for chars.
So with that said, let me go ahead and zoom out. Let me go ahead in my terminal
window run make agree, Enter. Seems to work OK so let me go ahead and do ./agree.
Let me go ahead now and type in y. Here we go. Enter. Huh.
Let me try that again. Rerun ./agree. How about no? Enter. Why is it not behaving
as I would have expected?
DAVID J. MALAN: Yeah, I kind of cheated there, and I hit the Caps Lock key just as
I started typing in input. Why? Because I deliberately wanted to type in uppercase
instead of lowercase, which is kind of reasonable. It's a little obnoxious if you
force the user to toggle their caps lock key on or off when you just need a simple
answer. That's not the best User Experience, or UX.
But it would work if I cooperated. Let me run this again without caps lock-- y
lowercase for yes. Ah, that worked. n lowercase for no. That worked. But how could
I get it to work for both? Well, how about this?
Let me go ahead and just add two possibilities. So else if c equals equals
quote/unquote capital Y, then also do printf agreed backslash n. And down here,
else if c equals equals single quote capital N, then go ahead and print out, again,
Not agreed.
This, I will claim now, is correct. And I'll do make agree real fast, ./agree. And
I'll use capital. It now works. I'll use capital. It again works.
But this is perhaps not the best design. Let me hide the terminal window and pull
this up on the screen all at once. Why might this arguably not be the best design,
even though it's correct? There's another term of art we can toss here, like
[SNIFFS] something smells kind of funky about this code.
This is an actual term of art. There's code smell here. Something smells a little
off. Why? What do you think?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Yeah. There's the same output again and again. I mean, I manually
typed it. But honestly, I might as well have just copied and pasted most of my
original code to do it again and again for the two capital letters. So if line 10
and 14 are the same and line 18 and 22 are the same, and then the rest of these if
and else ifs are almost the same, [SNIFFS] there's some code smell there.
It's not well designed. Why? Because if I want to change things now, just like last
week in Scratch, I might have to change my code in multiple places, and copy/paste
is never a good thing. And god forbid I want to add support for Yes and No as full
words, it's really going to get long.
So how can we solve this? Well, it turns out, we can combine some of these
thoughts. So let me try to improve the Yeses first. It turns out, if I delete that
clause, I can actually or things together. In Scratch, there's a couple of puzzle
pieces, if you didn't discover them, that literally have the word or and the word
and on them, which allow you to combine Boolean expressions. So that either this or
this is true, or this and this is true.
In C, you can't just say the word or. You instead use two vertical bars. Two
vertical bars together mean or, logically. And so I can say, c equals equals
quote/unquote capital Y, Agreed.
And now I can get rid of this code down here. And let me go ahead and say, vertical
bar twice c equals equals quote/unquote N in all caps. And now my program's roughly a
third smaller, which is good.
DAVID J. MALAN: A really good question. Is there not a function to just ignore the
case? Short answer, there is. And we'll see how to do that in, actually, just about
a week's time. And in other languages, there's even more ways to just canonicalize
the user's input, throwing away any space characters they might have accidentally
hit, forcing everything to lowercase.
In C, it's going to be a little more work on our part to do that. But in fact, as
early as next week, we'll see how we can do that. But for now we're comparing
indeed just these literal values. Other questions?
AUDIENCE: So we're assuming the user's putting in what they're suggesting. How do
you handle if they were to put in a number?
DAVID J. MALAN: Really good question. So we are assuming, with this program and all
of my last ones, that the human's cooperating and when I ask for their name, they
typed in David and not 123, or, in this case, they typed in a single character and
not a full word. So this is one of the features often of using a library.
So for instance, if I run agree again, and I say something like sure, Enter, it
rejects it altogether. Why? Because s, u, r, e is a string of characters. It's not
a single character.
Now, I could just say something like x, which is neither y nor n, of course. But it
tolerates that because it's a single character. But built into CS50's library is
some rejection of input that's not expected.
So if you use get_int and the user types in not the number 1 or 2 but cat, C-A-T,
it will just prompt them again, prompt them again. And this is where, too, if you
were to do this manually in C, you end up writing this much code just to check for
all of these errors. That's why we use these training wheels for a few weeks just
to make the code more robust. But in a few weeks' time, we'll take the liberty
away. And you'll see and understand how it's indeed doing all that.
All right, so how about this. Let's now transition to something a little more
Scratch-like, literally, by creating how about another program here called meow--
so meow.c. We won't have any audio capabilities for this one. We'll just rely on
printf.
And suppose that I wanted to write a program in C that just simulates a cat
meowing. So I don't need any user input just yet. So I'm just going to use stdio.h.
I'm going to do my usual int main(void) up here.
And then I'm just going to go ahead and do printf meow backslash n. And let's have
this cat meow three times, like last week. So I'm going to do meow, meow, meow.
Notice as an aside whenever you highlight the lines, you'll see little dots appear.
This is just a visual cue to you to let you figure out how many spaces you've
indented. VS Code, like a lot of editors, will automatically indent your code for
you. I've not been hitting the space bar four times every time.
I've not even been hitting Tab. However, in C, the convention is indeed to indent
lines where appropriate by four spaces-- so not three, not five. And these dots
help you see things so that they just line up as a matter of good style.
All right, so this program, I'm just going to stipulate right now, is indeed going
to work. Make meow-- which is kind of cute-- and now meow. There, three times.
Correct. It's meowing three times.
But of course, this is not well designed. It wasn't well designed in Scratch last
week. Why? What should I be doing differently? Yeah?
AUDIENCE: A loop?
DAVID J. MALAN: Yeah. It's a perfect opportunity for a loop. Why? Because if you
wanted to change maybe the capitalization of these words, or you wanted to change
the sound to like woof for a dog or something, you'd have to change it one, two,
three places. And that's just kind of stupid, right? In code, you should ideally
change things in one place. So how might I do that?
Well, we could introduce a loop, yes. But we're going to need another building
block as well that we had in Scratch, namely those things called variables. So
recall that a variable, like in algebra-- x, y, z, whatever-- can store a value for
you. And a variable in Scratch might have looked like this. You use this orange
puzzle piece to set a variable of any name, not just x, y, or z.
But you could call it something more descriptive, like counter, and you can set it
equal to some value. In C, the way to do this is similar in spirit to some of the
syntax we've seen thus far. You start by saying the name of the variable you want,
a single equal sign, and then the value you want to initialize it to, copying
therefore from right to left. Why? Because the equal sign denotes, again,
assignment from right to left.
This isn't enough though. You might have the intuition already. What's missing
probably from this line of code just to create a variable?
AUDIENCE: Int.
DAVID J. MALAN: So we need int to make sure the computer knows that this is indeed
an int. And then lastly, semicolon as well. And that now completes the thought. So
a little more annoying than Scratch, but we're starting to see patterns here. So
not every piece of syntax will be new.
All right, if you want to increment the counter by one, Scratch uses the verb
change, and they mean add the value to counter. So if I want to increment an
existing variable called counter, this syntax is a little more interesting. It
turns out the code looks like this, which almost seems like a paradox. How can
counter equal counter plus 1? That's not how math works. But again, a single equal
sign is assignment from right to left.
So this is saying, take whatever the value of counter is, add 1 to it, and copy
that value from right to left into counter itself. You still need the semicolon,
but I claim you do not need to mention the keyword int when updating an existing
variable. So only when you create a variable in C do you use the word string, or
the word int, or any of the others we'll eventually see-- only when creating it or
initializing it for the first time.
Thereafter if you want to change it, it just exists. It's the word you gave it. The
computer is smart enough to at least remember what type it is. So this line is now
complete.
Turns out, in code, as we'll see, it's pretty common to want to add things
together, increment things by one. So there's actually different syntax for the
same idea. The term of art here is syntactic sugar. There's often in code many ways
to do the same thing, even though, at the end of the day, they do exactly the same
functionality.
So for instance, if, after a few days of CS50, you find this a little tedious to
keep typing in some program, you can simplify it to just this. This is the
syntactic sugar. You can use plus equals and only mention the variable name once on
the left, and it just knows that means the previous thing. It's just slightly more
succinct.
This, too, is such a common thing to add 1 to a value. And it doesn't have to be 1.
But in this case, it is. But if it is indeed 1, you can further tighten the code up
to just do this, counter++. So any time in C you see ++, it means literally adding
1 to that particular variable.
There's other ways to do this in the other direction. If you want to subtract 1
from a variable, you can use any of the previous syntax using a minus sign instead
of plus, or you can more succinctly do counter--. This is the way a typical C
programmer would do this.
All right, so now that we have variables, let's go and solve the meowing with a loop. So
in Scratch, we saw loops like this. This, of course, had the cat meow three times.
How do we do this in C?
Now, this is where things get a little more involved code-wise. But if you
understand each and every line, we'll follow logically what's going on. So here, I
claim, is one way to implement a loop that iterates three times in C. And this is
kind of ridiculous, right?
We went from two super simple puzzle pieces like this to, my god, it's 1, 2, 3, 4,
5, 6 lines of code, all of which are pretty involved. So that escalated quickly.
But what's each line doing? And we'll see other ways to do this more simply.
So we're initializing a variable called counter to 3, just like before. Why? Well,
what does it mean to loop or to repeat something three times? Well, it's like doing
something three times, and then do it, and then count down, and then do it, and
then count down, and then do it, until you're all out of counts.
So this is declaring a variable called counter, setting it equal to 3. Then I'm
inducing a loop in C, which is similar in spirit to repeat 3, but you have to do
more of the math yourself. So I'm asking the question in parentheses, while counter
is greater than 0, what do I want to do? Well, per the indentation inside the curly
braces, I want to meow one time. And then, to be clear, what's this last line of
code doing? If counter starts off at 3, this makes it 2 by subtracting 1 from
it.
Then what happens? By nature of the loop, just like in Scratch, it knows to go back
and forth. Even though there's a nice, pretty arrow in Scratch, and there isn't
here, C knows to do this again, and again, and again, constantly asking this
question and then updating this value at the end.
So if I highlight just a few of these steps, the variable starts off at 3. And
actually, let me simplify too. I claimed earlier that when using single variables,
people very often just call it i for int, or c for char, or s for string unless you
have multiple variables. So let me tighten the code up. And this already makes it
look a little more tolerable.
Let me actually tighten it up further, add one more step. So now this is about as
tight, as succinct as you can make this code at the moment. So what's actually
going to happen here? Well, the first line of code executes, and that initializes i
to 3.
Then we check the condition. While i is greater than 0, is i greater than 0? Well,
per my three fingers, obviously. So we print out meow on the screen. Then we
subtract 1 from i, at which point now we have 2 as the value of i. Then the code
goes back to the condition.
So put another way-- came with some props here-- so suppose this bowl here is your
variable, and you initialize it to 3 with three stress balls, you can do something
three times, right? If I want to give out three stress balls-- here's your chance
for free stress balls without having to answer any questions. OK, there we go. So
here we go, subtracting 1 from my variable.
I'm left with two. Oh my god. All right, don't tell Sanders. [GRUNTS] Oh, I'm
sorry. Oh.
[LAUGHTER]
OK, that ended poorly. Apologies. All right. But now the educational point, though,
is that my variable has been decremented further to just have-- I'm not throwing
that far again.
I can't do this. Here we go. All right, here we go. And one final subtraction. And
now our variable is left empty.
So we had three stress balls there, and that's all a variable is. It's some kind of
storage. It's actually, of course, implemented in the computer's memory. But
metaphorically, it's really just a bowl with some values. And every time you add or, in
this case, subtract, you're just changing the value of that variable.
And then the code, meanwhile, of course, in parentheses, is just checking, is the
bowl empty? Is the bowl empty? Is the bowl empty? AKA, is i greater than 0 or not?
Any questions on how we've implemented loops in this way? And I owe you a stress
ball after class. Questions on loops?
All right, so it turns out, this is kind of ugly. And this really starts to take
the fun out of programming when you have to write out this sequence of steps. So it
turns out, there's other ways to do this. But first, let's see, logically, how else
you might express this because it's a little weird that we keep using zero.
So the one other way to do this would be to invert the logic. You could absolutely
start with your variable, call it i equal to 1. And then you could ask the
question, is i less than or equal to 3? And notice a bit of new syntax here.
On your typical keyboard, there is no less than or equal sign or greater than
or equal sign like you would write in math class with one over the other. And so in
C, you use two characters, less than followed by an equal sign or, if appropriate,
greater than followed by an equal sign. And that logically captures that idea.
So notice that I'm changing my questions. I'm initializing i to 1, and then I'm
going to increment it ultimately to 2 and then 3. But because I'm doing less than
or equal to, it's still going to go from 1, 2, 3. So that works too.
We could similarly do this yet another way. We could initialize i to 0, and then we
could say, well, i is less than 3 and keep incrementing it. And I showed this last
form is actually the most canonical.
DAVID J. MALAN: Yeah, it'll meow an extra-- a fourth time, in fact, total, right?
Because you'll start at 0, then 1, then 2, then 3. And less than or equal to 3--
sorry-- 3 will give you the fourth time. So we do want it indeed to be just a single
less than.
All right, so now that we have those options, let me just give you one other. And
this one takes a little more getting used to as well, but it's probably the more
common way to write this. Let me go ahead and propose that we implement this as
follows.
Let me go back to my code here. Let me go into my several printfs, getting rid of
all but one of them ultimately. And let's implement this in code. So let's do int i
gets 0. How about then while i is less than 3, then let's go ahead and say printf
quote/unquote meow backslash n. And then we have to do i minus minus
or plus plus?
DAVID J. MALAN: So plus plus because we're starting at 0 and going up to but not
through 3. So let me go ahead now and make meow after clearing my terminal, ./meow,
and it's still just as correct. But it's a little more-- it's a little better
designed. Why?
Because now if I want to change it from 3 to 30 times, for instance, I can change
it there. I can recompile my code. I can do ./meow, and done. I don't have to copy
and paste it 27 more times to get that effect. And I can even change what the word
is by changing it in just one location.
But it turns out, there's other ways to do this too. And let me propose that we
introduce you to what's called a for loop as well. So if you want to repeat
something three times, you can absolutely take the while loop approach that we just
saw, or you can do this. And this one takes a little more getting used to, but it
kind of consolidates into one line all of the same logic.
So notice, we have the keyword for here. And for is just a preposition in this case
that generally implies, here comes a loop. Inside of parentheses here is not just a
Boolean expression.
And this is where things get a little weird. There's three things-- to the left of
the semicolon, in the middle of the two semicolons, and to the right of the
semicolon. This is really the only other context we'll see semicolons, and it's
weird. Normally, it's been at the end of the line. Now it's two of them in the
middle of the line, but this is the way humans decided years ago to do it.
So what is this doing? Almost the same thing. It is going to initialize a variable
called i to 0. It's going to then check. If it's less than 3, it's then going to do
whatever's in the curly braces, and it's lastly going to increment i and repeat.
But now you repeat through those three other highlights. I check if i is less than
3. It is.
I print out meow. i gets incremented. I now check. Is i less than 3? No, it's not,
because 3 is not less than 3.
And so the whole thing stops. And whatever code is below this curly brace, if any,
starts executing instead. Just like in Scratch, you break out of the loop and the
puzzle pieces being hugged. Questions, then, about this alternative syntax for
loops, AKA, a for loop?
DAVID J. MALAN: Yeah. Can I explain again why it doesn't reset to 0? Honestly, just
because. This was the syntax they chose. This first part before the first semicolon
is only executed once just because. That's how it's designed. Everything else
cycles again and again.
And this is just an alternative syntax to using the slightly more lines of code. It
was, like, six lines of code using the while loop. Logically, it's the same thing.
Programmers, once they get more comfortable, tend to prefer this because it just
expresses all your same thoughts more succinctly. That's all. Yeah?
DAVID J. MALAN: OK. So let's just work this into my meow example. Let me go back to
the code here. And notice, indeed, if I highlight all these lines, I think we can
tighten this up.
Let me get rid of all of those and instead do for int i equals 0. And I'm saying
equals. Most programmers would say gets. So int i gets 0 means assignment-- the
word get.
Now I'm going to do i is less than 3 i++. Now in here I'm going to do my printf
quote/unquote meow backslash n. And so it's indeed a little tighter. I mean, two of
the lines are just curly braces.
There's really only two juicy lines of code now. Let me go ahead and do make
meow, ./meow. And again, we're back in business with three of them printing only.
All right, there's one last structure we should explore just because it's sometimes
useful. This was a forever block. And this would be a little weird in Scratch to
just say meow forever, or at least without waiting.
But there is indeed a forever block in Scratch, which means do the following,
forever. And I proposed, I think, verbally last week at least one example where
this is useful. Meowing forever, a little annoying. But can you think of common
cases where you might want to write code or use a program that loops forever? Yeah?
DAVID J. MALAN: Yeah, playing music. Like Spotify playlists, just repeating again
and again would be some kind of loop.
DAVID J. MALAN: Checking for input. So yeah, get_string is essentially just waiting
there forever for me to type in some input until I do.
DAVID J. MALAN: Checking the time and actually maintaining human time, like a wall
clock. Behind you? Is that the same?
DAVID J. MALAN: OK, checking the time. And one more? Detecting a key press too.
Like in Scratch, just waiting for some kind of event to happen, just like on a
phone or a browser. And so there's so many examples where you might want to do
something forever-- just so you've seen the corresponding C building block.
It's a little weird, but this is probably the most canonical way to do it in C. If
you want to print meow forever-- which would be a little crazy because it would
literally print and take over your computer printing forever meow-- you would
generally do it like this. Why? Well, a while loop expects in parentheses a Boolean
expression, and a Boolean expression is, again, a yes/no, a true/false question.
But if you want the answer to that question always to be yes-- or really, always to
be true, it turns out that in C and a lot of languages, you then just say true because
true-- T-R-U-E-- is never going to change magically to false. I mean, it's just a
special word in the programming language. So by saying while true, it just means do
the following forever.
Another common paradigm before true and false became commonplace would be to do
this instead-- change while 1. You might see in online examples and texts and the
like, while 1 is really the same thing. Any value that is 0 is generally
interpreted as false by a computer.
Any value that is 1 or any other non-zero value is generally interpreted as true.
And so this, too, would have the same effect, saying while true or while 1.
Generally speaking, while true is perhaps a little clearer these days.
Now, meowing forever is not a good thing. But suppose I did that by intent or by
accident. Well, let's try this.
So here I'll go into my code. I'm going to get rid of my for loop and change it to a
while loop with, how about, true. And in this case here, well, we'll keep it-- let's do
this. Make meow, Enter.
And you'll see this, use of undeclared identifier true. This is actually hinting at
my mention that the old way was 0 and 1. Nowadays, you could say true or false. But
true and false are themselves special words that you have to include. And it turns
out, if you want to use special Boolean values like this, there's another header
file we haven't seen called stdbool that essentially creates true and false as
keywords.
Alternatively, CS50 includes that same file. So what's more common in CS50 is to see
it like this. Now if I clear my terminal window and do make meow and then ./meow
and hit Enter, well, unfortunately, this isn't the best thing to do infinitely when
you're in the cloud using a browser. This is indeed a browser, just full-screened
here. This means I'm sending millions of meows over the internet to my computer
here.
So this will happen to you at some point, probably not with meow. But you'll lose
control over your terminal window. Why? Because you screwed up. And you have an
infinite loop.
You didn't really intend it. Or maybe you did. You were curious to see what
happens. What do you do? When does the meowing stop? What recourse do we have here?
Well, Control-C will be your friend.
Sometimes you have to hit it a bunch in a cloud environment. But Control-C for
cancel will interrupt a program that's running. And I promise that almost all of
you will at some point accidentally introduce an infinite loop because your math is
slightly off.
When in doubt, click in the terminal window and hit Control-C-- sometimes multiple
times-- and that will indeed cancel whatever is happening there. In this case, I
might have intended it. But sometimes it's not, in fact, intended.
All right, so we've been taking for granted this whole graphical user interface for
some time and, indeed, the commands that I'm typing and the buttons I'm
clicking. And let me just give you a better sense of what it is we are using
underneath the hood this whole time, namely an operating system called Linux. So I
keep alluding verbally, of course, to Macs and PCs because almost all of us are
running macOS or Windows on our desktops or laptops nowadays. But there's lots of
other operating systems out there, and one of the most popular ones is called Linux.
And Linux is very often used on servers nowadays-- companies that host email,
companies that host websites or apps, more generally. Certain computer scientists
or computer science students often like to brag that they run Linux just because
that's a thing. But it is really just an alternative to macOS or Windows that
provides you with both a GUI, if you want it, but also, and especially, a command line
environment.
Now, fun fact-- Windows and macOS do have terminal windows or the equivalent
thereof. And eventually, you might use it on your own Mac or PC to solve some
problem. But Linux is really known for, along with other operating systems, its
command line environment, which, again, I distinguished earlier from GUI as a
Command Line Interface, or CLI. And that refers, really, to the terminal window.
So if I go back to VS Code here, and let me, in fact, go ahead and close my tab and
focus entirely on the terminal window, this terminal window is really just your
command line interface to your very own server in the cloud. The term of art here
is you each will have your own container in the cloud, which is like your own
computer running somewhere on the internet with your own username and password to
which you have access and your own hard drive, if you will, your own home folder
that has all of your files for the class. And it's only accessible to you unless
you enable live sharing thereof.
So when you're typing commands here, it looks like you're typing them, of course,
on your own Mac or PC. But they're actually being sent over the browser to some
server in the cloud where you are controlling, really, your own account therein.
So it turns out that there are other commands that are worth knowing. And we'll
give you just a few of these today. And over the coming weeks you will have
opportunities to play with others as well.
But these are some of the basics. And they're all incredibly succinct because,
indeed, for things you're typing at the command line, humans generally have not
wanted to type out long commands. So a lot of these are abbreviations here.
Now, perhaps the most common one I'll start with first is ls, a lowercase l and a
lowercase s that stands for, succinctly, list. So if I go to my terminal window now
where up until now, I've only typed code, which is a VS Code thing for creating and
opening files, and make, which triggers the compilation of my code, what if I now
type ls? This will list all of the files in my current folder-- my hard drive in
the cloud, if you will.
So if I hit Enter, you'll see a whole bunch of results. Now, they're color coded
too. The white ones here end in .c. Those are the source code files I've written
during class today-- agree.c, compare.c, hello.c, and meow.c. And you can perhaps
guess, the green ones here that just by convention have an asterisk on the end to
denote that they're special represent what? One of the four others. Yeah?
DAVID J. MALAN: Yeah, the machine code. So those are my actual programs that are
identically named minus the .c extension. And the asterisk means that they're
executable. That is in the world of macOS or Windows, you would double click. But
in the world of a command line environment, that means you do ./ and then the name
without the asterisk to execute or run the code therein.
So it's just two different places to be. You're welcome to use whatever you're
comfortable with. But over time, you will naturally get more comfortable and capable
with the terminal window alone.
Well, what else is on this list here? Well, during the break, I saw that at
least one of you, for instance, had created a file called hello instead of hello.c.
So you were in a situation where you did this accidentally and hit Enter. And then
you went ahead and typed in all of your code like this.
And then down in your terminal window, you were trying to do make hello, Enter. And
this now didn't actually do anything. I can't-- I'm hitting-- I'm trying to run the
command. I got permission denied, as at least one of you did.
Now, why is that? Well, let's just do a quick check. If I do ls, I see now hello,
but hello has no asterisk next to it, which means it's not executable. That's my
code. Why? Well, notice the top of my tab confirms, oh, I screwed up. I didn't name
my file hello.c, which it just has to be.
So what do you do? Well, you could very hackishly copy this, create a new file,
paste it in. Or no, no, no. We know how to rename things now here because that's
one of our options.
Let me do this. Let me do mv for move, hello, and then hello.c, and hit Enter.
You'll see the tab closes because hello no longer exists.
But if I now type ls, you'll see, ah, there is hello.c. And if I open that file
now, whew, there's all of my same code. And now if I do make hello-- make hello--
now I do get an executable file, and order in the world is restored.
So mv, it's just a command not just for renaming, but it also turns out, eventually
for moving files as well. You can also create directories or folders. So for
instance, if I go into VS Code again, and suppose I hover over here and click not
on the plus file icon but plus folder, I can create a folder called, for instance,
pset1 for problem set 1 in the class. And you'll see now that it's empty because
all of my other files are in the default folder of my account.
But I could also go in there like this. And I could click on File, and now I can
create a new file called mario.c, which is one of the first problems, for instance.
But you'll notice now that mario.c is inside of the pset1 folder.
So if I zoom out and I type ls at my terminal window, I won't see mario.c anywhere.
But I do see a pset1 folder. And it's in light blue followed by a slash, which you
don't have to type. It just indicates that's a folder.
Now, I can visually at top left obviously see pset1 contains mario.c. But if I try
to do something like make mario here, no rule to make target mario. It just doesn't
seem to exist. And that's because you're in the wrong directory.
So if I want to actually change into that directory, I can do cd, space, pset1,
Enter. And now you'll see my prompt changes. And this is just a common convention,
but it's not the only one out there.
Now I still have a dollar sign, which indicates where I can type commands. But
before it, I see a reminder constantly what folder I'm in. And we put that there
deliberately, like a lot of Linux users do just to remind themselves where they are
because unlike macOS or Windows, where you have a nice, big window telling you
where you are, at the command line, you need to be reminded textually.
AUDIENCE: Mario.c
DAVID J. MALAN: Yeah, mario.c. And now if I want to open it-- if I want to actually
compile it, I can run make mario in this directory once I actually type out all of
the code. Rest assured that in problem sets and labs, we'll almost always--
certainly, in the first weeks of the class-- give you exactly the commands to type.
Odds are because it's new to many of you, you will accidentally type the wrong
commands.
No big deal. Just remember that you have different ways to solve these problems.
You've got the graphical File Explorer, which should feel a little more familiar.
But in time you'll start to know and, honestly, probably prefer commands like
these-- so cd for Change Directory, cp for copy a file, ls for list, mkdir to make
a directory-- create a new folder at the command line instead of with the button--
mv for move or rename, rm for--
AUDIENCE: Remove.
DAVID J. MALAN: Remove. So be careful with that one. rmdir, remove directory. And
there's dozens, hundreds of other commands. You won't need many of them, but we'll
start to scratch the surface all the more over time.
So with that said, we have some problems still to solve, but we promised cookies
today. So let's go ahead and take a 10-minute break. Cookies are now served in the
transept. And we'll be back here in 10.
All right, we are back. And up until now, each of the code examples in C we've done
have been designed to show one specific topic. But we thought we'd try to take a
step back and solve a more general problem and give you a sense of when given a
problem set, for instance, or just a programming problem more generally, where you
even begin and how you go about approaching it when it's not obvious what the point
of the exercise is.
So one of my favorite games from yesteryear is this one here, "Super Mario
Brothers" that has come in so many different forms since. But in this original two-
dimensional side scroller game, there was a lot of artwork like this. So for
instance, up here in the sky were four question marks. And we'll find that in C and
a lot of programming languages initially, it's a lot easier, a lot more accessible
to focus really on black and white type interactive programs textually as opposed
to full-fledged graphics and the like, but more on the more graphical acoustic type
of programs before long.
But for now let me go over and propose that we try to just implement in ASCII art--
ASCII, again, being the code that maps numbers to letters, at least for English,
into a textual version of these four question marks in the sky. So for this, let me
go over to VS Code. I'll create my own version of mario.c that will be different
from the one you're challenged with in problem set 1. Indeed, in problem set 1,
you'll be challenged to build a little something like this, albeit with hashtags
for ASCII art instead of graphics.
And in mario.c, I want to just solve this simple problem first. So it's all
involving output. So I'll do include stdio.h so I can use printf.
I'll do my int main(void)-- more on why we keep doing that in future weeks. And I'm
just going to do something simple initially, like 1, 2, 3, 4, backslash n. This is
about the simplest way I can implement four question marks in the sky like these
here using pure text like this.
So let me go ahead and do make mario, ./mario, and voila. We have those four
question marks. But we've seen, of course, that there are better ways to do this.
And if you wanted to generalize this to be five question marks, six, 60 different
question marks, loop was always the answer for not repeating ourselves. So maybe I
should rewrite this a little bit more flexibly and say something like this, for int
i gets 0, i less than 4, i++. And then inside of a for loop, now I can just do a
single question mark, but I don't think what I've just done is correct. Anyone
spot the aesthetic bug already? Yeah, why is this wrong if I want to print the same
thing? Yeah?
DAVID J. MALAN: Yeah. So I don't think I want a backslash n after every question
mark because the goal is, again, this row of question marks in the sky. So if I now
recompile this, make mario, ./mario, OK, it's almost there. But now I have that
regression to where the dollar sign is not on its own line. So I think I need a new
line, but I don't think I want it here because that was not going to end well.
Where do I want it instead? Any instinct? Yeah? Yeah, so outside the for loop. So
indeed, I can just go below line 8 and above line 9, creating a new one. And now
it's totally fine to just print a new line like that.
You don't have to print anything else with it. It's indeed a character unto itself.
So let's do make mario one last time, ./mario. OK, so now we're back in business
there.
Well, what if we wanted to do some other scene from "Mario," such as this one here
where there's a lot of vertical obstacles like these bricks here? If I wanted to
print out now a column of three bricks-- and I'll use hashtags for these instead of
anything graphical-- well, I think we're almost there, right? I think I can now--
it's almost maybe a little easier.
I can go back here, change the question mark to something that looks more like a
brick, like this hash symbol. And I think now I do want the new line character
because when I now do make mario, ./mario, OK, there's my wall of four. Oh, but
wait. I didn't want four. I wanted to be consistent just with this particular scene
here, so I just want three.
So I can still change it in one place. And here, again, is that paradigm. Even
whether you're using 4 or 3, if you get into the habit of starting counting from 0,
you go on up to but not through the value you want to count up to. So that's why
I'm using less than instead of less than or equal to there. So this would be the
common paradigm, though you could count it like we saw earlier in different ways.
But what if things escalate one level further? And when you're in the underground
version of "Super Mario Brothers," there's a lot of these underground obstructions,
including grids of bricks like this. And let me conjecture that if you slice this
up, it's roughly a 3 by 3 grid of bricks that all interlock prettily to give us
just one big, large brick like this.
So if I want to print out a 3 by 3 grid, now things are getting a little more
interesting because up until now, I've printed either one row horizontally or one
column vertically. But we haven't really seen any code where I'm printing or living
in two different dimensions like the game would imply.
But let me propose that we could do this. Let me go ahead and say, all right,
suppose I want to print a 3 by 3 grid of bricks. It's really that I want to print,
what, three rows of bricks. A grid is three rows. So if I take the high-level idea
and reduce it to something a little simpler, how do I do that?
Well, let me get rid of the printf for a moment as I did. And let me just stipulate
that this for loop, even though it doesn't do anything useful yet, will do
something how many times just by design? All right, three times. This for loop is
good to go. It will do something three times by just using i to do the counting.
All right, well, if I want to print out now a row of three bricks all on the same
line, that's pretty similar to what we did earlier when I just wanted to print out
four question marks in the sky. So we've seen a solution there. And I daresay we
can compose one into the other.
So if I want to print out a row of bricks, I could just do this, for int i gets 0, i
less than 3, i++, and then inside of this inner loop, if you will, let me print out
a single brick like this. And then I don't like where this is going, but I think
I've taken two ideas and I've combined them. But what might be problematic about
lines 5 and 7 at the moment? What might be bad here? Yeah, in back?
DAVID J. MALAN: Yeah, I'm using the same integer i, which I feel like could get me
into trouble. If I'm trying to count three things here, but then I'm hijacking this
variable and using it inside of the loop, I feel like I should avoid this collision
of names. And so what's a good alternative to i? Well, a programmer, if nesting
loops in this way, would pretty commonly go with j. You could certainly change this
to be rows and columns if you want more descriptive variables. But i and j is
pretty canonical.
So I'm going to go ahead and do this, j++ instead of i++ everywhere. And let me try
compiling this. So make mario, Enter, ./mario.
OK, so a couple of things are wrong here. This is not a 3 by 3 grid. But if you
count these things, how many did I indeed print at least? You can probably just
guess logically.
AUDIENCE: Nine.
DAVID J. MALAN: Yeah, there's nine hashes there. Unfortunately, they're all on the
same line instead of on three different lines. So where logically can I fix this?
I'm definitely printing all the bricks. They're just not on the right levels. Yeah?
AUDIENCE: If you put a new line at the first loop, then you'll get three separate
lines.
DAVID J. MALAN: Yeah. So put a new line after the first loop, this inner loop, if
you will, the nested loop, if you will. So let me go ahead and print out just a
backslash n here. And what's this doing? Well, I think that's going to solve it by
just moving the cursor to the next line after you've done one row.
So let me go ahead and do make mario, Enter, ./mario, and now we're in business. So
it's a very simplistic version of this same graphic, but I'm leveraging two
different ideas now-- or the same idea twice rather now. I'm using one loop to
control my cursor going row, by row, by row. But then within that loop, I'm doing
left to right, dot, dot, dot, dot, dot, with printing out each of these individual
bricks like this.
Now, there's a little sloppiness here still. If I want this to always be a square
just because that's what it looks like in the game, well, I could change it to be a
4 by 4 square by doing this or a 5 by 5 grid-- whoops-- by doing this. Why is this
perhaps not the best design to just keep changing the numbers when I want to change
the size? Where could this go awry? Yeah?
DAVID J. MALAN: Yeah. If it's always going to be a square and height is going to be
the same as width, I'm just inviting trouble here, right? Eventually, I'm going to
screw up. I'm going to change one but not the other. Then it's going to come out to
be a rectangle instead of a proper square. So I should probably solve this a little
differently.
So let me do that. At the top of my main function here, let me go ahead and give
myself a variable called maybe n for the number of bricks I want horizontally and
vertically. And I'll just initialize that to 3 initially.
And instead of putting 3 here, I'll literally just use n. But I'll do it in both
places so that now, henceforth, if I ever want to change this and change it to 4,
or 5, or anything else, I'm all done. It's better designed because there's a lower
probability of mistakes.
And instead of just declaring a simple variable like we did in Scratch, I can
further harden my code, so to speak, by declaring it to be a constant using the
keyword const. Now, this is just a feature of C and some other languages to protect
you against yourself by proactively saying, n is a constant, specifically the
number 5 or, previously, the number 3. You cannot accidentally write code elsewhere
that changes it.
The computer will throw an error and catch that error. So it's just a way of
programming a little more defensively. Some languages have this.
Some languages don't. But in general, it's a good practice. It makes your code
better designed because it's just less vulnerable to mistakes by you,
colleagues, or anyone else using the code.
So let me change this back to 3 just to be our default. But now I'm using n in both
places. And if I do make mario, ./mario, we're back to where we originally started.
But the code is a little better designed.
And let me note this too. All this time, I've been mentioning that correctness is
important. Design is important. There is also this matter of style. I've been very
deliberately writing pretty code, if you will-- not just the syntax highlighting,
which is automatic.
But notice that I keep indenting everything nicely. Any time I have curly braces,
like on lines 4 and 14, everything is indented one level. When I have additional
curly braces on lines 7 and 13, everything is nicely indented as well.
Technically speaking, the computer does not care about that kind of whitespace, so
to speak. And you could really make a mess of things like this because you have a
strange sense of style or just because you're being a little sloppy. But this code
is actually still correct.
Now, you'll often find that there's tools that can help you format your code for
you in a manner consistent with a course's or a company's style. But this is the
kind of muscle memory you'll want to develop over time too. Take these VS Code
suggestions as it's outputting lines of code for you because it's trying to format
your code in a readable way.
And, oh, my god, if and when you do have bugs in your code and things aren't even
indented properly, there's no way you the human are going to be able to wrap your
mind around what's happening and where. You're just making the problem harder for
yourself. So do get into this habit too of manifesting good style as well.
All right, well, let me propose that we don't only want a 3 by 3 grid. We want this
to be a little more dynamic. So suppose we moved away from a constant to just using
an integer called n. And let's ask the user for the size of this grid as by
prompting them with get_int, as we've done before. And I'll store it in n here.
And then I can go ahead and, more dynamically, run make mario to compile it--
whoops. Oh, I screwed up accidentally. What is it suggesting I do, albeit
cryptically?
DAVID J. MALAN: Yeah, I forgot to include the CS50 header file up top. And that's
why it doesn't know that get_int is, in fact, valid. So that's an easy fix.
I'm just going to go up here and include cs50.h. Now I'm going to clear my terminal
and rerun make mario. Now we're good-- ./mario.
And now notice I'm prompted for size. So if I type in 3, it's the same as before.
If I type in 10, it's even bigger, but it happens all now automatically.
But there are some things that we're not detecting. For instance, suppose I type in
cat. Well, that's handled by the get_int function, as I claimed earlier. That's one
of the features of using a library. You don't have to deal with erroneous input.
But we only designed a function called get_int to get you an integer. We don't know
if you want it to be positive, negative, zero, or some combination thereof. And
it's kind of weird to allow the user to type in negative 1 for the size of the grid
or negative 3 for the size of the grid. And indeed, your code does nothing, so at
least it's not crashing. But that's kind of stupid, right? It'd be nice to force
the user if they want a grid to give us a positive value.
So how could we do this? Well, I could go up here and I could say something like if
n is less than 1-- so if it's 0 or negative, which I don't want, what could I do?
Well, I could say, well, prompt the user again for the size. And now notice, I'm
not declaring n again because once it exists, you don't have to mention the data
type again. We said that earlier. But this is kind of stupid. Why?
Because now when you've given the user a second chance, OK, now maybe I'll do-- all
right, if this version of n is less than 1, well, let's just go and prompt the user
a third time. I mean, you can see where this is stupidly going. This can't be the
right solution to keep typing recursively the same thing again and again. Where
would it stop? You'd have to give them a finite number of chances or just make a
mess of your code.
DAVID J. MALAN: Yeah, so some kind of loop. We've seen a while loop. We've seen a
for loop, so maybe one of those. So let me try this. Let me delete this messiness
and just go back to the first question. And let me do this.
So while n is less than 1-- so while the number is not what we want-- let's just
prompt the user in a loop this time for the size again. Now, here too, this is
better because it's only two requests for information. But clearly, lines 6 and 9
are pretty much identical other than the int.
And if I went in and changed the size, if I add this, if I change the wording here,
change it to a different language, I have to change it in two places. That's bad.
Copy/paste, bad.
So what might be better? Well, it turns out, there's another paradigm in C that you
can use that gets around this problem, this duplication of code. It would be much
nicer if I just write the code once. And I can do that using a third type of loop
called a do while loop.
So it turns out, in C, you can do this. If you want to get the value of a variable
like n, first just create the variable without an initial value. So int n
semicolon means we don't know what value it has, yes. But that's OK. We're going to
add a value to it eventually.
Then I'm going to say this, do, literally. I'm going to open my curly braces. And
what do I want to do? I want to assign to n the return value of get_int, prompting
the user for size. Well, when do you want to do that? I want to do that while n is
less than 1.
And this code now achieves the exact same goal, but by never repeating myself. Why?
Well, notice on these lines of code now, I'm literally saying on line 6, give me a
variable called n of type integer. It doesn't have a value initially, but that's
fine. You can do that.
Line 7 says, do the following. What do you want to do? get_int, prompting the user
with the word size, and just store that value in n. But because C code runs top to
bottom, left to right, now it's reasonable on line 11 to ask that question, OK, is
the current value of n, which it definitely got on line 8, less than 1? And if the
user didn't cooperate-- they typed in 0, or negative 1, or negative 3-- what's
going to happen? It's going to go back up here and repeat, repeat, repeat
everything in the do while loop.
So a do while loop in C-- which is not something some other languages have. Python,
if you know it, does not have a do while loop. This is perhaps the cleanest way to
achieve this, even though it's a little weird that you have to declare your
variable, create your variable up top, and then check it down below.
But otherwise, it's similar to a while loop. It just flips the order in which
you're asking the question. Any questions on this construct? And do while, in
general, is super useful when you want to get input from the user and make sure it
meets certain requirements.
So all right, now that we have this building block after that interlude, how can
I go about cleaning up this code? And then let's conclude by taking a look at
things that our code can't do or can't do very well or correctly. Let me propose
that in a final version of Mario, let me just add what are called now some
comments.
So it turns out, in code in C, you can define what are called comments, which are
just notes to self. Some of you discovered these in Scratch. There's little yellow
sticky notes you can use to add citations or explanations.
At some point, the programmer should know what individual lines of code do. But
it's nice to be able to glance at this comment on line 6 that starts with two
slashes, and it gets grayed out because of syntax highlighting. It's not logic.
It's just a note to self. It generally gives me a little cheat sheet as to what the
following lines of code should be doing and/or why.
And then down here, well, there's a second block of code that's a bunch of lines.
But together, this just, what, prints grid of bricks. And so it's another comment
to myself that just makes it a little more understandable what these 20-some-odd
lines of code are doing by adding some English explanations thereof.
But now that I have these, wouldn't it be nice if I could abstract these pieces of
functionality away, this getting of the size and this printing of the grid? In
other words, suppose that you didn't know where to begin with this problem. And the
problem at hand were literally implement a program that prints a grid of bricks of
some variable size-- 3, or 4, or 5, or whatever the human types in. If you have
really no idea where to start, comments are actually a good way of getting started
because comments can be an approximation of what we called pseudocode last week.
Pseudocode is terse English that gets your point across, like for the phone book
searching like last time.
So if you didn't really know where to begin, you could do something like this. I
could, for instance, just say, Get size of grid as my first step and then Print
grid of bricks as my second step. And that's it for my program thus far.
And now I can even go this far. I could say, well, let's suppose that there's just
a function already that exists called get size. I could do something like this. I
could do int n equals get_size.
And now I just have to assume for the moment that some abstraction called get_size
exists. It doesn't. This does not come with the CS50 library. But I could invent
it, I bet.
How else might I proceed? Well, let's just assume for the moment that there's also
a function called print_grid that just prints a grid of that size n. So here too is
an abstraction. These puzzle pieces don't exist. These functions don't yet exist.
But in C, just like in Scratch, I can create my own functions.
How do I do that? Well, let me go down later in the file. And by convention, you
generally want to leave main at the top of your code. Why? Because it's the main
function, and it's just where the human eye is going to look to see what some file
of code does.
And let me do this. I want to create a function of my own called get_size whose
purpose in life is to get the size that the user wants. I want this function to
return an integer. And the syntax for doing that is this, right, similar to a
variable, the data type that this function returns. I don't need this function to
take any inputs.
And so I'm going to use a new keyword that we've actually been using thus far--
more on it another time-- just called void, which just means this get_size function
does not take any inputs. It does have an output. It outputs an int. And this is
just the weird order in which you write it. You write the output format, the name
of the function, and then the inputs, if any, inside of parentheses.
And now I can implement get_size. But I've already implemented get_size. Or at
least now at this point in the story, I at least know concretely what to do. And I
could figure out eventually, with some trial and error perhaps, all right, if I
declare a variable and I do the following n equals get_int, prompting the user for
size, and I keep doing that while n is less than 1, once that block of code is
done, here is a new keyword in C where you can return that value n.
So I keep referring to these values that some functions return as return values. In
C, there's literally a keyword called return that will hand back to any function
that uses that function the value in question. So in a nutshell, between lines 15
and 21 now, here is some code identical to our solution earlier that gets a value n
from the user that is positive. It's 1, or 2, or higher. It's not 0, or it's not
less than 1.
And as soon as we've got that value, we hand it back as a return value. Notice how
I'm using this function on line 7. Just like with get_int, just like with
get_string, I'm calling the function-- nothing in the parentheses in this case. But
then I'm using the assignment operator to copy whatever its return value is into my
variable n. And so now I have a function that didn't use to exist called get_size
that gets me a positive integer no matter what.
And now for the grid, how do I do this? How do I invent a function called
print_grid that takes a single argument, a number and prints a grid of that size?
Well, let's go down here. I'm going to write the name of this function print_grid.
This function just needs to print. It has a side effect, as we keep saying.
So I'm just going to say it has no return value. It's just void. It doesn't have an
output, per se. It's just an aesthetic side effect.
But it does take in an argument. An argument is an input, and the syntax for this
in C is to name the type of the input it takes and the name of the variable. And I
could call this anything I want. I'll call it size. I could call it n. And it's OK
to use the same variable in different functions, but I'll call it size just to be
distinct.
And then in this function, I'm just going to copy from memory the same code as
before. for int i gets 0, i less than size-- instead of 3-- i++, inside of this, for
int j gets 0, j is less than size, j++, and inside of that, print out with printf a
single hash, print out after that loop a single new line, and that's it.
Now, I did this fast, admittedly. But it's the same code that I wrote earlier. But
now, just like I did with Scratch, let me just arbitrarily hit Enter a bunch of
times to move the code out of sight, out of mind. Now I have abstractions. I have
puzzle pieces that now exist called get_size and print_grid, syntax for which takes
some getting used to, but they now just exist.
When I hadn't included the CS50 library, get_int didn't work. But that's not the issue
here because this is not from a library. I just invented this. C takes you
literally. And if you define these functions at the bottom of your file, they don't
exist on line 7 or 10.
So I could do this. I could, all right, fine, well, let me just highlight all of
this, cut to my clipboard, and paste it up here. This would solve the problem. I
could just move all of those functions at the top of my file.
That's annoying because now main is at the bottom of the file. It's going to take
longer to find it. That's not a clean solution.
So let me put it back where it was at the bottom. And let me do this. This is the
only time in CS50 and, really, in C programming where copy/paste is reasonable. If
you copy and paste the first line of code from each function and then end it with a
semicolon, you can tease the compiler by giving it just enough of a hint at the top
of the file that, OK, these functions don't exist till down later.
But here's a hint that they will exist. This is how you can convince the compiler
to trust you. So those other functions can still be lower in the file, below main.
But now when I do make mario-- oh, damn it. Oh, I said print instead of printf.
That's my bad-- printf.
So if I do make mario, ./mario, now I can type in 3, and we're back in business.
Now, this was a very heavy-handed and long way to get to a much more complicated
solution. But this solution, in some sense, is better designed. Why? Because now,
especially without the comments, I mean, look how short my code is.
My main function is literally two lines of code. Why? Well, I factored out the
juicy stuff into its own functions. And now, especially if I'm working with
colleagues or others, you could imagine splitting up large programs into smaller
parts, having different people implement different parts, so long as you all agree
in advance on what those inputs and those outputs actually are.
All right, so let's now consider what computers can do well and not so well. C
indeed supports a whole bunch of operators, mathematically, via which we can do
addition, and subtraction, multiplication, division, and even calculate the
remainder when you divide one number by another. In fact, why don't we go ahead and
use these in a very simple program and make our very own calculator?
So let me go over here to VS Code. Let me go ahead and create a new file called
calculator.c. And in this file, let's go ahead and first include a couple of now
familiar header files-- cs50.h as well as stdio.h.
Let's go ahead then and declare main with int main(void). And then inside of main,
let's do something relatively simple. Let's declare an int and call it x, and set
it equal to whatever the return value is of get_int, prompting the user for a value
for x.
Let's then give ourselves a second variable. We'll call it, say, y. Set that equal
to the return value of another call to get_int, prompting the user this time for
that value y. And then let's very simply go ahead at the very end and just print
out, say, the sum of x plus y, a super simple calculator.
So I'll use printf, quote/unquote, %i for integer, backslash n to give me the new
line. Then I'm going to go ahead and do x plus y to indeed print out the sum.
But it turns out that sometimes there are going to be limitations that we bump up
against. And let me get a little more ambitious here. Let me clear my terminal
window. And let me go ahead and rerun calculator again.
And this time, let's type, oh, 2 billion for x, and let's type in the same for y. And,
of course, now the answer of 2 billion plus 2 billion should, of course, be 4
billion. And yet, it's not.
So curiously, we see, of all things, a negative number here, which suggests that
somehow the plus operator doesn't quite work as well as we might like. Now, why
might this actually be?
Well, it turns out that inside of your computer is, of course, memory, or RAM,
Random Access Memory. And depending on the size of your computer and the type of
computer, it might very well look a little something like this-- a little circuit
board with these black little modules on it that actually contain all of the bytes
of your computer's memory.
Unfortunately, you and I only have a finite amount of this memory inside of our
computers, which means no matter how high we want to count, there's ultimately
going to be a limitation on how high we can count because we only have a finite
amount of memory. We don't have an infinite number of zeros and ones to play with.
We have to actually be bounded ultimately.
So what's the implication of this? Well, it turns out that computers typically use
as many as 32 bits, zeros or ones, to represent something like an integer, or in
C, an int. So for instance, the smallest number we could represent using 32 bits,
of course, would be zero-- 32 zeros like this here.
And the biggest number we could represent is by changing all of those zeros to
ones, which, in this case, will ideally give us a number that equals roughly 4
billion in total. It's actually 4,294,967,295 maximally if you set all 32 of those
bits to ones and then do out the actual math.
The catch, though, is that we humans and computers in general also sometimes want
to and need to be able to represent negative numbers. So if you want to represent
negative numbers as well as positive numbers and 0, you can't really just start
counting at 0 and go all the way up to roughly 4 billion. You've got to split the
difference and maybe allocate half of those patterns of zeros and ones to negative
numbers and the other half roughly to positive numbers.
So in fact, in practice, when you're using even as many as 32 bits, the highest
most computers could count, certainly in a program like this in C using an int,
would be roughly 2 billion. That is 2,147,483,647. But the flip side of that is
that we could also now, using different patterns of bits, represent negative
numbers as low as negative 2 billion, give or take.
But the implication then, of course, is that if we only have a finite number of
bits and can only count so high, at some point, we're going to run out of bits, so
to speak. In other words, we encounter what's generally known as integer overflow
where you want to use more bits than you have available. And as a result, you
overflow the available space.
What does this mean, in fact, in real terms? Well, let's suppose that you only have
three bits, but I'm going to gray out a fourth bit just to convey where we'd like
to put an additional bit ultimately. If this, of course, is 0, per week 0's
discussion, this is 1, 2, 3, 4, 5, 6, 7. Now, ideally, in binary, if you want to
add one more to this value 7, you're going to have to carry the 1 mathematically,
and that would ideally give 1000.
But if you don't have four bits and your computer is only sophisticated enough to
have three bits, not even 32, but three, the implication is that you're effectively
representing not 1000, but rather, 000. There's just no room to store that fourth
bit that I've grayed out here, which is to say that your integer might overflow.
And as soon as you get to 7, the next number once you add 1 is actually going to be
0, or worse, as we've seen here in my code, a negative value instead.
So what could we do to perhaps address this kind of concern? Well, C does not have
just integers or ints. It also has longs, which, as the name suggests, are just
longer integers, which means they have more bits available to them. So let me go
back into my code here. I'll clear the terminal window.
And let me go ahead and change my integers to literally long here, long here. I'm
going to have to change my function in CS50's library to be not get_int, but
get_long. And that's indeed another function we provide in the library. Let me
change this get_int to get_long as well.
I'll keep my variable names the same, but I do need to make one other change. It
turns out that printf also supports other format codes-- so not just %i for
integers or %s for strings, but also, for instance, %li for a long integer, as well
as %f for floating-point values with decimals.
So with that said, let's go ahead and change my printf line to be not %i, but %li.
Now let me go ahead and do make calculator again, Enter-- no apparent errors
now-- ./calculator. And 2 plus 2 still equals 4 as before.
But now if I do calculator again, and let's do 2 billion again as well as 2 billion
for y, previously, we overflowed the size of an integer and got some weird negative
number because the pattern was misinterpreted, if you will, as a negative number
instead. But a long, instead of using 32 bits, conventionally uses 64 bits, which
means we have more than enough spare bits to go when we add 2 billion plus 2
billion. And now, in fact, we get the correct answer of 4 billion, which does fit
inside of the size of a long.
Now, a long can count up quite high. And, in fact, it can count as high as this, 9
quintillion. And so that will give us quite a bit more runway. But, of course, it
too is ultimately going to be finite. So if you have numbers that need to go bigger
than that, you might still very well have a problem.
Now, there's another problem that we might run into as well. And we can see it in
the context of even this simple calculator. Computers also suffer from potentially
what's called truncation, where especially when you're doing math involving
floating-point values-- that is numbers with decimals-- you might accidentally
unknowingly truncate the value-- that is lose everything after the decimal point.
So in fact, let me go back to VS Code here. I'll clear my terminal window. And
let's still use longs, but let's go ahead and use division instead of addition
here.
So let me change this plus to a divide operator. Let me go ahead and recompile the
code down here with make calculator. Let me go ahead and run ./calculator, and let
me go ahead and do something like 1 for x and 3 for y.
And we'll see that-- well, wait a minute. 1 divided by 3, I learned, should be 1/3.
But in a floating-point value, that should be 0.33333, maybe with a little line over
it in grade school, but, really, an infinite number of threes. And yet, we seem to
have lost even one of those threes after the decimal point because the answer is
coming back here as just 0.
So why might that be? Well, if I know that two integers, when divided one by the
other, is supposed to give me a fraction, a floating-point value with a decimal
point, I can't continue to use integers or even, in this case, longs, which do not
have support for decimal points. So let me go ahead and change this format code
here from %li to %f, which is, again, going to represent a floating-point value
instead of a long integer or even an integer. And let me go ahead further and
define maybe a third variable, z, as a float itself.
So I'll give myself a variable z equals x divided by y. And now rather than print x
divided by y, let's just go ahead and print z. So now I'm operating in a world of
floating-point values because I know proactively that a long or an int divided by
another such value, if it's meant to have a fraction, needs to be stored in a
floating-point value, something with a decimal point.
Well, let me go down to my terminal window here and rerun make of calculator--
seems to work OK-- ./calculator, and let's do 1 divided by 3 again. And still here,
we see all zeros. So we do at least see a decimal point, so we've made some
progress thanks to the %f and the float. But it seems that we've already truncated
the value 1 divided by 3.
So how do we actually get around this issue? Well, if you the programmer know that
you're dealing in a world that's going to give you floating point values with
decimal points, you might very well need to use a feature known as
typecasting-- that is convert one data type to another by explicitly telling the
compiler that you want to do so.
Now, how do I do this? Well, let's go back to my code here. And if the issue
fundamentally is that C is still treating x and y as integers-- or technically,
longs with no decimal point-- and dividing one by the other, therefore has no room,
so to speak, for any numbers after a decimal point, why don't I proactively do
this?
Let me, using a slightly new syntax with parentheses, specify that I want to
convert x proactively from a long to a float. Let me specify proactively that I
want to convert y from a long to a float as well. And now let me go ahead and trust
that now z should be the result of dividing not a long by a long or an int by an int,
but rather, a float by a float.
Let me clear my terminal window, run make calculator again-- seems to work OK--
./calculator. And now 1, 3, and hopefully now we actually see that my code has
outputted 0.333333. And I think if we kept showing more numbers after the decimal
point, we'd theoretically see as many of those threes as we want.
But there is still one more catch. And especially when we're manipulating numbers
in this way in a computer using a finite amount of memory, another challenge we
might run up against-- besides integer overflow, besides truncation-- is what's known
as floating-point imprecision. Just as we can't represent as big of an integer as
we want using int or long alone because there is going to be an upper bound,
there's similarly going to be a bound on just how precise our numbers can be.
And indeed, let's go back to VS Code here. I'll clear my terminal window yet again.
And this time, let me use some slightly unlikely syntax to specify that I don't
want to see the default number of numbers after the decimal point, which %f gives
us automatically. Let's go ahead and show 20 digits after the
decimal point. And the weird syntax for this is to do not %f, but %.20f to indicate
that I want to see 20 digits, not the default, after the decimal point.
Let me rerun make calculator. Let me do ./calculator again. And let's do 1, let's
do 3. And now this is even weirder, right? From grade school, you presumably
learned that 1 divided by 3 is, of course, 1/3. But that should be 0.33333,
infinitely many times, or, on paper, with a little line over it.
But the computer is doing some weird approximation here. It's a whole bunch of 3's
and then 4326744079590. Well, what's really happening under the hood, well, again,
is this issue of floating-point imprecision. If you only have a finite number of
bits and, in turn, a finite amount of memory, the computer can really only be so
precise intuitively.
Now, how can we go about improving the situation? Well, there is one alternative.
Instead of using float, I can use something called a double, which, as the name
suggests, uses twice as many bits as a float. So instead of 32 typically, it will
use 64. And that's just like the difference between a long and an int, which gave
us more bits. But in this case, this will be used for more precision.
Let's go ahead and cast x to a double. Let's cast y to a double. And now let's go
ahead and, using the same format code-- %.20f is still OK for doubles. Let me do
make calculator. Let me do ./calculator. And now let me do 1 divided by 3. And we
still have some of that imprecision.
And it's even more of it if we looked at more than just 20 digits. But now we have
more threes after the decimal point. So it's at least more, and more, and more
precise, but it's not perfect. But it's at least more precise.
So these kinds of issues, then, are going to be necessary to keep in mind any time
you do something numerically, scientifically, at least with a language like C where
you're going to bump up against these real-world limitations of hardware and, in
turn, language.
Now, later in the semester, we'll transition to a language called Python. And
that's actually going to solve at least one of these problems for us by just
automatically giving us more bits, so to speak, as we need them, at least for
integers. But even the issue of floating-point imprecision is going to remain.
Now, just how real-world are these issues? Well, back in the year 1999, we got a
taste of this when the world realized in the years leading up to that date that it
might not have been the best idea to implement computers and software therein by
storing years using just two digits. Like, instead of storing 1999 to represent the
year 1999, a lot of computers, for reasons of space and cost, were in the habit of
cutting a corner and just using two digits to keep track of the year.
The problem with that is that if systems were not updated by the year 1999 to
support the year 2000, 2001, and so forth, then, just like before with integer
overflow, some computers might add 1 to the year in their memory, '99. It should be
the year 2000, but if they're only using two digits to represent years, they might
mistake the year-- as some systems may very well have-- for the year 1900 instead,
taking literally a big step backwards, if you will.
Now, you'd like to think that kind of issue is behind us, especially as we
understand all the more about the limitations of code and computing. But we're
actually going to run up against this very same type of issue again in just a few
years. On January 19 in the year 2038, we will have run out of bits in most
computers right now to keep track of time.
It turns out, years ago, humans decided to use a 32-bit integer to keep track of
how many seconds had elapsed over time. They chose a somewhat arbitrary date in the
past-- January 1, 1970-- And they just started counting seconds from there on out.
And so if a computer stores some number of seconds, that tells the computer how
many seconds have passed since that particular date, January 1, 1970.
Unfortunately, using a 32-bit integer, as we've seen, you can only count so high,
at which point, you overflow the size of that variable. And so potentially, if we
don't get ahead of this as humans, as a society, as computer scientists, on the
date January 19, 2038, that bit might flip over, thereby overflowing the size of
those integers, bringing us back computationally to December 13, 1901.
So this is to say now, with all of this computational ability and code comes a
responsibility to actually write correct code. Next week, we'll peel back some of
these layers. But for now, this was week 1, and best of luck on problem set 1.
[APPLAUSE]
[MUSIC PLAYING]
[MUSIC PLAYING] DAVID MALAN: All right. This is CS50, and this is week six, wherein
we finally transition from Scratch to C to, now, Python. And, indeed, this is going
to be somewhat of a unique experience in that, just like a few weeks past--
perhaps, for the first time-- and now, today, you're going to learn a new language.
But the goal isn't just to throw another fire hose of content and syntax and
whatnot at you, but rather, to really equip you all to actually teach yourself new
languages in the future.
And so, indeed, what we'll do today, what we'll do this coming week is prepare you
to stand on your own. And once Python is passe and the world has moved on to some
other language in some number of years, you'll be well equipped to figure out how
to wrap your mind around some new syntax, some new language and solve problems, as
well. Now, you recall, in week zero, this is where we started-- just saying hello
to the world.
And that quickly escalated just a week later in C to be something much, much more
cryptic. And if you've still struggled with some of the syntax, find yourself
checking your notes or your previous code, that's totally normal. And that's one of
the reasons why there are languages besides C out there-- among them, this language
called Python. Humans over the decades have realized, gee, that wasn't necessarily
the best design decision, or humans have realized, wow, you know what? Now that
computers have gotten faster with more memory and faster CPUs, we can actually do
more with our programming languages.
Thankfully, what you're about to see is "Hello, World!" for the third time, but
it's going to be literally this. None of the crazy syntax above or below, fewer
semicolons, if any, fewer curly braces. And, really, a lot of the distractions
get out of the way. So to get there, let's consider exactly how we've been
programming up until now. So you write a program in C and you've got, hopefully, no
syntax error, so you're ready to build it-- that is, compile it.
And so, you've run make, and then, you've run the program, like ./hello. Or if you
think back to week two, where we took a peek underneath the hood of what make is
doing, it's really running the actual compiler-- something called clang-- maybe
with some command-line arguments creating a program called hello. And then, you
could do ./hello. So, today, you're going to start doing something similar in
spirit, but fewer steps.
No longer will you have to compile your code and then run it, and then, maybe, fix
or change it, and then compile your code and run it, and then repeat, repeat. The
process of running your code is going to be distilled into just a single step. And
the way to think of this, for now, is that, whereas C is frequently used as,
indeed, a compiled language whereby you convert it first to 0s and 1s, Python's
going to let you speed things up whereby you, the human programmer, don't have to
compile it.
You're just going to run what's called an interpreter-- which, by design, is named
the exact same thing as the language itself-- and by running this program installed
in VS Code or, eventually, on your own Macs or PCs. This is just going to tell your
computer to interpret this code and figure out how to get down to that lower level
of 0s and 1s. But you don't have to compile the code yourself anymore.
So with that said, let's consider what the code is going to look like, side by
side. In fact, let's look back at some Scratch blocks, just like we did with C in
week one, and do some side by sides. Because even though some of the syntax this
week and beyond is going to be different, the ideas are truly going to be the same.
There's not all that much intellectually new just yet.
So whereas, in week zero, we might have said hello to the world with this purple
puzzle piece, today, of course-- or, rather, in week one, it looked like this in C.
But today, moving forward, it's going to, quite simply, look like this instead. And
if we go back and forth for just a moment, here, again, is the version in C,
noticing the very C-like characteristics. And just at a glance here, in Python, I
claim it's now this. What do you apparently need not worry about anymore? What's
gone?
So semi-colon is gone. And, indeed, you don't need those to finish most of your
thoughts anymore. Anything else?
AUDIENCE: Backslash n.
DAVID MALAN: So the backslash n is absent. And that's curious because we're still
going to get a new line, but we'll see that it's become the default. And this one's
a little more subtle, but now, the function is called print instead of printf. So
it's a little more familiar in that sense.
All right. So when it comes to using libraries-- that is, code that other people
have written-- in the past, we've done things like #include cs50.h to use CS50's
own header file or standard I/O or standard lib or string or any number of other
header files you have all used. Moving forward, we're going to give you, for this
first week, a similar CS50 library-- just very short-term training wheels that
we'll quickly take off because, in reality, it's a lot easier to do things in
Python, as we'll see. But the syntax for this, now, is going to be to import the
CS50 library in this way.
And when we have, now, this ability, we can actually start writing some code right
away. In fact, let me switch over to VS Code here. And just as in the past, I'll
create a new file. But instead of creating something called .c, I'm going to go
ahead and create my first program called hello.py, using code space hello dot py.
That, of course, gives me this new tab.
And let me actually, quite simply, do what I proposed-- print, quote unquote,
"Hello, world" without the \n, without the semicolon, without the f in print. And
now, let me go down to my terminal window. And I don't have to compile it. I don't
have to do dot slash. I, instead, run a program called python, whose purpose in
life is, now, to interpret my code top to bottom, left to right.
And if I run python of hello.py, crossing my fingers, as always-- voila. Now I have
printed out "hello, world." So we seem to have gotten the new line for free, in the
sense where it's automatically happening. The dollar sign isn't weirdly on the same
line, like it once was in week one. But that's just a minor detail here. If we
switch back to, now, some other capabilities-- well, indeed, with the CS50 library,
you can also not just import the library itself, but specific functions.
And you'll see that, temporarily, we're going to give you a helper function called
get_string, just like in C, that just makes it work exactly the same way as in C.
And we'll see a couple of other functions that will just make life easier,
initially. But, quickly, will we take those training wheels off so that nothing is,
indeed, CS50-specific. All right. Well, how about functions, more generally, in
Python? Let's do a whirlwind tour, if you will, much like we did in that first week
of C, comparing one to the other.
So back in our world of Scratch, one of the first programs we wrote was this one
here, whereby we ask the human their name. We then used the return value that was
automatically stored in this answer variable as a second argument to join so that
we could say "Hello, David" or "Hello, Carter." So this was back in week zero. In
week one, we converted it to this.
And here is a perfect example of things like escalating quickly. And, again, this
is why we start in Scratch. There's just so much distraction here to achieve the
same idea. But even today, we're going to chip away at some of that syntax. So, in
C, we had to declare the variable as a string, here. We, of course, had the
semicolon and more. Well, in Python, the comparable code, now, is going to look,
more simply, like this. So semicolon is, again, gone on both lines, for that
matter. So that's good.
DAVID MALAN: Yeah. So I didn't have to specifically say that answer is now a
string. And, indeed, Python is dynamically typed. And, in fact, it will infer from
context exactly what it is you are storing in that variable. Other details that
seem a little bit different? A little bit different. What else jumps out at you
here? I'll go back. This was the C version. And maybe focus, now, on the second
line because we've rather exhausted the first.
AUDIENCE: You don't need to worry about %s or percent anything. You just have the
variable after [? them. ?]
AUDIENCE: Concatenate.
That's going to concatenate Hello comma space, and then, David or Carter or
whatever the human has typed in. But it turns out, there's going to be different
ways to do this in Python. And we'll show you a few different ones. And here, too,
try not to get too hung up on or frustrated by all of the different ways you can
solve problems. Odds are, you're going to be picking up tips and techniques for
years to come if you continue programming. So let's just give you a few of the
possible ways.
So here's a second way you could print out hello comma David or hello comma Carter.
But what has changed? In the previous version, I used concatenation explicitly. And
the space here is important, grammatically, just so we get that in the final
phrase. Now, I'm proposing to get rid of that space to add a comma outside of the
double quotes, as well.
But if you think back to C, this probably just means that print, similar in spirit
to printf, can take not just one argument, but even two. And in fact, because of
this comma in the middle that's outside of the double quotes, it's hello comma, and
then, it will be automatically concatenated with-- even without using the plus, to
whatever the value of answer is. And by default, just for grammatical prettiness,
the print function always gives you a space for free in between each of the
multiple arguments you pass in.
We'll see how you can override that down the line. But, for now, that's just
another way to do it. Now, perhaps the better, if slightly cryptic way to do this--
or just the increasingly common way-- is, probably, the third version, which looks
a little weird, too. And, probably, the weirdness jumps out. We've suddenly
introduced these curly braces, which I promised were mostly gone. And they are.
But inside of this string here, I've done a curly brace, which might mean what?
Just intuitively. And here is an example of how you learn a new language. Just
infer, from context, how Python probably works. What might this mean? Yeah?
AUDIENCE: [INAUDIBLE]
DAVID MALAN: Yeah. So this is an indication, because the curly braces-- because
this was the way Python was designed-- that we want to plug in the value of answer,
not literally A-N-S-W-E-R. And the fancy word here is that the answer variable will
be interpolated-- that is, substituted with its actual value. But, but, but-- and
this is actually weird-looking; this was introduced a few years ago to Python. What
else did I have to change to make these curly braces work, apparently?
Yeah?
DAVID MALAN: Yeah. There's this weird f. And so, it's like part of printf. But now,
it's inside the parentheses there. This is just the way Python designed this. So a
few years ago, when they introduced what are called format strings or f-strings, you
literally prefix your quoted string with the letter f. And then, you can use
trickery like this, like putting curly braces so that the value will be substituted
automatically.
If you forget the f, you're going to literally see hello comma curly brace answer
closed curly brace. If you add the f, it's, indeed, interpolated. The value is
plugged in. All right. Questions on how we can just say hello to the world via
Python, in this case. Yeah?
AUDIENCE: [? The f. ?]
DAVID MALAN: Without the f? If you omit the f, you will literally see H-E-L-L-O
comma curly brace A-N-S-W-E-R closed curly brace. So, in fact, let's do this. Let
me go back to VS Code here, quickly. I've still got my file called hello.py open.
And let me go ahead and change this ever so slightly. So I'm going to go ahead
and-- let's say from cs50 import get_string. And that's just the new syntax I
propose using to import a function from someone else's library.
I'm going to now go ahead and ask the question-- let's go ahead and use get_string,
storing the result in answer. So get_string, quote unquote, "What's your name?" And
then, on this line, I'm going to deliberately make a mistake here, exactly to your
question. Let me just say hello comma answer, and just this. Now, even though
answer is a variable, Python's not going to be so presumptuous as to just plug in
the value of a variable called answer.
What it's going to do, of course, is-- if I type in my name-- whoops. I typed too
fast. Let me go ahead and rerun that again. If I run python with hello.py, type in
my name and hit Enter, I get hello comma answer. Well, let me do one better. Let me
apply these curly braces as before. Let me rerun python of hello.py. What's your
name? D-A-V-I-D.
And here's, again, the answer to your question. Now, we get, literally, the curly
braces. So the fix here, ultimately, is just going to be to add the f there, rerun
my program again with David. And now, hello comma David. So this is, admittedly, a
little more cryptic than the ones with the plus or the comma, but this is just
increasingly common. Why? Because you can read it left to right. It's nice and
convenient. It's less cryptic than the %s's.
So it's a new and improved version, if you will, of printf in C, based on decades
of experience of programmers doing things like this. Questions on printing in this
way? We're now on our way to programming in Python. Anything? All right. Well, what
more can we do with this language, here? Well, let me propose that we consider that
we have, for instance, a few other features that we can add to the mix, as well--
namely, let's say some data types, as well.
So let me flip back over here to the slides. And there's different data types
in Python, as we'll soon see. But they're not as explicit. As we already saw, by
using a string from get_string, you don't have to explicitly state what it is. But
you saw-- recall, in C-- all of these various data types. And then, in Python,
nicely enough, this list is about to get shorter.
And so, here is our list in C. Here is an abbreviated list in Python. So we're
still going to have strings, but they're going to be more succinctly called strs
now, S-T-R. We're still going to have ints for integers. We're still going to have
floats for floating point values. We're even going to have bools for true and
false. But what's missing, now, from the list is long and floats. And why is that?
Or rather, long and double.
We'll recall that, in C, those used more bits. Well, in Python, the smaller data
types, previously-- int and float, themselves-- just used more bits for you. And
so, you don't need to distinguish between small and large. You just use one data
type, and the language gives you a bigger range than before. It turns out, though,
there's going to be some other features, as well, of Python, and these data types--
one of which will be called range, another of which will be list.
So gone will be arrays. We'll actually use something literally called a list.
Tuples-- sort of x, y pairs for coordinates and things like that. Dicts for
dictionaries-- so we'll have built-in capabilities for storing keys and values
we'll see, and even a set. Mathematically, a set is a collection of values, but it
automatically gets rid of duplicates for you.
Let me go ahead and create a file called dictionary.py. Let me propose that I try
to implement, say-- problem set five-- our spell checker in Python instead of C and
achieve, ultimately, the same kind of behavior whereby I'll be able to spell check
a whole bunch of words. So this is jumping the gun a little bit because you're
about to see syntax we'll revisit over the course of today. But, for now, I've got a
new file called dictionary.py.
And let me begin to create some placeholders for functions. We'll see in just a bit
that, in Python, you can define a function called check, and that check function
can take a word as its input. And I'll come back to this in just a moment. In
Python, I can define a second function like load, which itself will take a whole
dictionary, just like in problem set five. And I'll go ahead and come back to the
implementation of this.
But one of the reasons for these higher-level languages like Python is that you can
stand on the shoulders of programmers before you and solve very common problems
much more quickly. So that you can focus on building your new app or your web
application or your own project to solve problems of interest to you. So at the
risk of crushing some spirits, let me propose that, in Python if you want a
dictionary for something like a spell checker, well, that's fine.
Go ahead and give yourself a variable, like words, to store all of those words and
just assign it equal to a dictionary-- or dict, for short, in Python. That will
give you a hash table. Now, it turns out, in speller recall, you don't need to
worry about words and definitions. It's just about spell-checking the words. So
strictly speaking, we don't need keys and values. We just need keys.
So I'm going to save myself a few more keystrokes by just saying that, technically,
in Python, using a set suffices. Again, a set is just a collection of values with
no duplicates. But they don't necessarily have keys and values. It's just one or
the other. But now that I have-- on line one, I claim the equivalent, in Python, of
a hash table, I can actually do something like this. Here's how I might implement
the check function in Python.
If the word passed into this function is in my variable called words, well, return
True. Else, go ahead and return False. Done. No, wait. You're thinking, if anything
at all, maybe we want to handle uppercase and lowercase instead of just lowercase.
Well, you know what? In Python, if you want to force a whole word to lowercase, you
don't have to iterate over it with a loop. You don't have to use any of those ctype
functions or anything. Just say word.lower(), and that will convert the whole thing
to lowercase for parity with the dictionary.
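The check function described above might look something like the following sketch. The three words placed in the set are a hypothetical stand-in for a loaded dictionary, just so the example runs on its own.

```python
# Hypothetical stand-in for the loaded dictionary; the real words would come from a file.
words = {"cat", "dog", "hello"}


def check(word):
    """Return True if word is in the dictionary, ignoring case."""
    if word.lower() in words:
        return True
    else:
        return False


print(check("Hello"))
print(check("helo"))
```

Because words is a set, the `in` test is a hash-table lookup rather than a scan over every word.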
All right. How about something like the load function in Python? Well, in Python,
you can open files just like in C. For instance, in Python, I might do open, the
dictionary argument in read mode, just like fopen in C. I might do something
like this. For each line in that file, let me go ahead and add, to my words
variable, that line. And then, let me go ahead and close that file. And I think I'm
done.
I'm just going to go ahead and return True, just because I think I'm already done.
Now, here, too, I could nitpick a little bit. Technically, if I'm reading in every
line from the file, every line in the dictionary ends with, technically, a
backslash n. But there's an easy way to get rid of that, just like you might see
with an alternative syntax. What I'm actually going to do is this. Let me grab from
the current line, the current word, by stripping off with right strip-- rstrip; a
function we'll, again, see-- that just gets rid of the trailing new line-- the
backslash n at the end of that line.
And what I really want to do, then, is add this word to that dictionary. Meanwhile,
if I want to figure out what the size is of my dictionary, well-- in C, you're
probably writing code to iterate over all of those lines, and you're just going to
count them up using a variable. Not so in Python. You can just return the length of
those words. And better still, in Python, you don't have to manage your own memory.
No more malloc. No more free. No more manual thinking about memory. The language
just deals with all of that for you.
So you know what? It suffices for me to just return True and claim that unloading
is done for me. And that's it. Again, whether you're in the middle of it or already
finished, this might, perhaps, induce some frustration, but also, enlightenment in
that this is why higher-level languages exist. You can build on top of the same
principles, the same ideas, with which you've been dealing, struggling even this
past week.
But you can now express yourself all the more succinctly. This one line implements
a hash table for you, and all of this, now, just uses that hash table in a simpler
way. Any questions, now, on this, keeping in mind that the point, nonetheless, of
speller in p-set 5 is to understand what's really going on underneath the hood and,
better still, to notice this.
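For reference, the load, size, and unload functions narrated above could be sketched roughly as follows. The temporary file and its three lines are invented for the demo so the example is self-contained; the lecture's version reads a real dictionary file instead.

```python
import os
import tempfile

words = set()


def load(dictionary):
    """Read one word per line from the file at path `dictionary` into words."""
    file = open(dictionary, "r")
    for line in file:
        word = line.rstrip()  # drop the trailing newline
        words.add(word)
    file.close()
    return True


def size():
    """Return the number of words loaded."""
    return len(words)


def unload():
    """Nothing to free by hand; Python manages the memory."""
    return True


# Demo with a tiny, made-up dictionary file (hypothetical contents).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("cat\ndog\ncat\n")
    path = tmp.name

load(path)
os.remove(path)
print(size())
```

The demo file lists "cat" twice, but the set keeps only one copy, so size reports two words.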
This might seem all rather amazing, but let me go ahead and do this. I've actually
got a couple of versions of speller written here, and I've got a version written in
C that I won't show the source code for. But I'm going to go ahead and make that
version of speller in C. And I'm going to go ahead here and, let's say, split my
window here for just a moment. And I'm going to go into a Python version of
speller, really, that I just wrote.
And on the left-hand side here, let me go ahead and run speller-- the version I
compiled in C-- using a big text like the Sherlock Holmes text, which has a whole
lot of words in it. And on the right-hand side, let me run python of speller.py,
which is a separate file I wrote in advance, just like we give you speller.c. And
I'll, similarly, run this on the Sherlock Holmes text.
And I'm going to do my best to hit Enter on the left and the right of my screen at
the same time. But we should see, hopefully, the same list of misspelled words and
the timings thereof. So here we go on the right. Here we go on the left. All right.
A race to see which one wins here. C is on the left. Python is on the right. OK.
Interesting. Hopefully, Python's close behind.
Note that some of this is internet delay. And so, it might not necessarily be a
crazy number of seconds. But if we measure, at a low level, how much time the CPU
spent executing my code, C took a total of 1.64
seconds. That was pretty fast, even though it took a moment more for all of the
bytes to come over the internet. The Python version, though, took what? 2.44
seconds.
So what might the inference be? One, maybe I'm just better at programming in C than
I am in Python, which is probably not true. But what else might you infer from this
example? Should we, maybe, give up on Python, stick with C? No? So what might be
going on here? Why is the Python version, that I claim is correct-- and I think the
numbers all line up, just not the times. Where is the trade-off here?
Well, here, again, is this design trade-off. Yeah?
DAVID MALAN: Yeah, exactly. In order to save the human programmer time, there's a
lot more features built into Python-- more functions, more automatic management of
memory and so forth-- and you have to pay a price. Someone else's code is doing all
of that work for you. But if they've written some number of lines of code, those
are just more lines of code that need to be executed for you, whereas here, the
computer is, at the risk of oversimplifying, only running my lines of code. So
there's just less overhead. And so, this is a perpetual trade-off.
Typically, when using a more user-friendly and more modern language, one of the
prices you might pay is performance. Now, there's a lot of smart computer
scientists in the world, though, trying to push back on those same trade-offs. And
so, these interpreters, like the command I ran, python, technically can--
especially if you run a program again and again-- actually, secretly, behind the
scenes, compile your code for you, down to 0s and 1s. And then, the second, the
third, the fourth time you run that program, it might very well be faster.
So this is a bit of a head fake here, in that I'm running them once and only once.
But we could get benefit over time if we kept running the Python version again and
again and, perhaps, fine-tune the performance. But, in general, there's going to be
this trade-off. Now, would you rather spend the 60 seconds I spent implementing a
spell checker or the 6 hours, 16 hours you might be spending, or have spent, implementing the
same in C? Probably the former. For productivity's sake, this is why we have these
additional languages.
Just for fun, let me flip over to another screen here and open up a version of
Python that's actually-- in just a second-- on my own Mac instead of the cloud so
that I can actually do something with graphics. So, here, I just have a black and
white terminal window on my very own Mac. And I've pre-installed Python, just like
we've done so for VS Code in the cloud for you.
Notice that I've got this photo of, perhaps, one of your favorite TV shows here,
with the cast of The Office. Notice all of the faces in this image here. And let me
propose that we try to find one face in the crowd, CSI-style, whereby we want to
find, perhaps, the Scranton Strangler, so to speak. And so, here is an example of
this guy's face. Now, how do we go about finding this specific face in the crowd?
Well, our human eyes, obviously, can pluck him out, especially if you're familiar
with the show. But let me go ahead and do this instead. Let me go ahead and propose
that we run code that I already wrote in advance here. This is a Python program
with more lines of code that we won't dwell on for today. But it's meant to
motivate what we can do. From PIL, as in the Pillow library, a Python imaging library, I
want to import some type of information, some feature called Image so that I can
manipulate images, not unlike our own problem set four.
And this is what's powerful in Python. You can just [MIMICS EXPLOSION] import
face_recognition as a library that someone else wrote. From there, I'm going to create a
variable called image. I'm going to use this face_recognition library's
load_image_file function. It's a little verbose, but it's similar in spirit to
fopen. And I'm going to open office.jpeg. I'm going to, then, declare a second
variable called face_locations, plural, because what I'm expecting to get back, per
the documentation for this library, is a list of all of the faces' locations that
are detected.
All right. Then, I'm going to iterate over each of those faces using a for loop,
that we'll see in more detail. I'm going to, then, infer what the top, right,
bottom, and left corners are of that face. And then, what I'm going to do here is
show that face alone, if I've detected the face in question. So let me go ahead,
here, and run detect.py. And we'll see not just the one face we're looking for.
But if I run Python of detect.py, it's going to do all of the analysis. I'll see a
big opening here, now, of all of the faces that were detected in this here program.
[CHUCKLES] OK, some better than others, I guess, if you zoom in on catching
someone. Typical Angela. If you want to, now, find that one face, I think we need
to train the software a bit more. So let me actually open up a second program
called recognize that's got more going on.
But let me, with a wave of a hand, point out that I'm now loading not only the
office.jpeg, but also toby.jpeg to train the algorithm to find that specific face.
And so, now, if I run this second version-- recognize.py-- with Python of
recognize.py-- hold my breath for just a moment; it's analyzing, presumably, all of
the faces-- you see the same, original photo.
But do you see one such face highlighted here? This version of the code found Toby,
highlighted him on the screen and, voila, we have face recognition. So for better
or for worse, this is what's happening, increasingly societally, nowadays. And
honestly, even though I didn't write the code live-- because it's a good dozen or
more lines of code-- it's not terribly many. And literally, all the authorities--
all we have to do is import face recognition and, voila, you have access. These
technologies are here already.
But let's consider, for just a moment-- how did we find Toby? How might that
library-- even though we're not going to look at its implementation details, how
does it find Toby and distinguish him from all of these other faces in the crowd?
What might it be doing, intuitively? Think back even to p-set four, what you,
yourselves, have access to, data-wise. Yeah?
DAVID MALAN: Yeah, exactly. And to summarize for the camera here, we have trained
the software, if you will, by giving it a photo of Toby's face. So, by looking for
the same or, really, similar pixels-- especially if it's a slightly different image
of Toby-- we can, perhaps, identify him in the crowd. And what really is a human
face? Well, at the end of the day, the computer only knows it as a pattern of bits
or, really, at a higher level, a pattern of pixels.
So maybe a human face is, perhaps, best defined, in general, as two eyes and a nose
and a mouth that, even though all of us look similar, structurally, odds are, the
measurement between the eyes and the nose and the width of the mouth, the skin tone
and all of these other physical characteristics are patterns that software could,
perhaps, detect and then look, statistically, through the image, looking for the
closest possible match to these various measurement shapes, colors and sizes and
the like.
And, indeed, that might be the intuition. But what's powerful here, again, is just
how easy and readily available this technology now is. All right. So with that
said, let's propose to consider what more we can do with Python itself, get back to
the fundamentals, so that you, yourselves can start to implement something along
those same lines. So besides having access to things like a get_string function,
the CS50 library provides a few other things, as well-- namely, in C, we had these.
But in Python, we're going to have fewer. In Python, our library, short-term, is
going to give you not only get_string, but also get_int and get_float. Why? It's
actually just annoying, as we'll soon see, to get back an integer or a float from a
user and just make sure that it's an int and a float and not a word like cat or
dog, or some string that's not actually a number.
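To see why getting an integer from a user is annoying to do by hand, here is a minimal sketch of what a get_int-style helper might do. This is not the CS50 library's actual implementation; the `read` parameter in particular is a hypothetical addition so that the example can simulate a user without a keyboard.

```python
def get_int(prompt, read=input):
    """Keep asking until the user types something int() can convert."""
    while True:
        try:
            return int(read(prompt))
        except ValueError:
            pass  # not a number (e.g. "cat"), so ask again


# Simulate a user who types "cat", then "dog", then "42".
answers = iter(["cat", "dog", "42"])
n = get_int("x: ", read=lambda prompt: next(answers))
print(n)
```

Called as plain get_int("x: "), the same function would fall back to the built-in input and prompt a real user.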
Well, we can import not just the specific function, get_string, but we can actually
import all of these functions one at a time, like this, as we'll soon see. Or you
can even, in Python, import specific functions from a file. One of you asked a
while back, when you include something like cs50.h or stdio.h, you're
actually getting all of the code in that file, which, potentially, can add bulk to
your own program or time.
In this case, when you import specific functions from Python, you can be a little
more narrowly precise as to what it is you want to have access to. All right. So,
with that said, let's go ahead and see what conditionals look like in Python. So on
the left-hand side again, here, we'll see Scratch. So it's just a contrived example
asking if x is less than y, then, say, x is less than y.
In C, it looked like this. In Python, now, it's going to look like this instead.
And here's before in C, and here's after. And just to call out a few of the obvious
differences, what has changed, in Python, for conditionals, it would seem? What's
the difference? Yeah.
DAVID MALAN: Yeah. So there's no more curly braces. And, indeed, you don't use
those. What appears to be taking their place, if you might infer? What seems to
have taken their place? What do you think?
AUDIENCE: [INAUDIBLE]
DAVID MALAN: So the colon at the start of this line, here. But also even more
important, now, is this indentation below it. So some of you, and we know this from
office hours, have a habit of indenting everything on the left, right? And it's
just this crazy mess to look at. Frustrating for you, surely. But C and Clang are
pretty tolerant when it comes to things like white space in a program.
Python, uh-uh. They realized, years ago, that-- let's help humans help themselves
and just require standard indentation. So four spaces would be the norm here. But
because it's indented below that colon, that, indeed, indicates that this, now, is
part of that condition. Something else has gone missing, versus C, in this
conditional. What else is a little simplified?
AUDIENCE: [INAUDIBLE]
DAVID MALAN: Yeah. So no more parentheses. You can still use them, especially when
you need to, logically, to do order of operations, like in math. But in this case,
if you just want to ask a simple question, like if x less than y, you can just do
it like that. How about when you have an if else? Well, this is almost the same,
here, with these same changes. In C, this looked like this. And it's starting to
get a bit bulky-- at least, if we use our curly braces in this way.
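Since the slides are not reproduced in this transcript, here is a small sketch of the conditional syntax being described. The values of x and y are chosen arbitrarily.

```python
x = 1
y = 2

# No parentheses around the condition and no curly braces; the colon and
# the four-space indentation mark what belongs to each branch.
if x < y:
    message = "x is less than y"
else:
    message = "x is not less than y"

print(message)
```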
AUDIENCE: [INAUDIBLE]
DAVID MALAN: Yeah. Instead of else if, it's elif. Why? [SIGHS] Apparently, else
space if was just too many keystrokes for humans to type, so they condensed it into
this way. Probably means it's a little more distinguishable, too, for the computer
between the if and the else, too. But just something to remember, now. It's,
indeed, elif and not else if. All right. So what about variables in Python? I've
used a couple of them already, but let's distill exactly how you define and declare
these things, as well.
So, in Scratch, if we wanted to create a variable called counter and set it equal,
initially, to 0, we would do something like this. In C, we'd specify that it's an int, use
the assignment operator, end the thought with a semicolon. In Python, it's just
simpler. You name the variable, use the assignment operator, as before, you set it
equal to some value, and that's it. You don't mention the type. You don't mention
the semicolon or anything more.
What if you want to change a variable, like counter, by 1-- that is, increment it by
1? You have a few different ways here. In C, we saw syntax like this, where you can
say counter equals counter plus 1, which, again, feels illogical. How can counter
equal counter plus 1? But, again, we read this code, really, right to left,
updating its value by 1. In Python, it's almost the same. You just get rid of the
semicolon. So that logic is there.
DAVID MALAN: Plus plus is no more, sadly, in Python. Just too many ways to do the
same thing, so they got rid of it in favor of just this syntax, here. So keep that
in mind, as well.
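A short sketch of the variable syntax just described. It assumes the shorthand being pointed to on the slide is the += operator, which Python shares with C.

```python
counter = 0            # no type and no semicolon

counter = counter + 1  # read right to left: compute counter + 1, then store it
counter += 1           # shorthand for the same update

# counter++ would be a syntax error in Python

print(counter)
```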
What about loops, when you want to do something in Python again and again? Well, in
Scratch, in week zero, here's how we meowed three times, specifically. In C, we had
a couple of ways of doing this. This was the more mechanical approach, where you
create a variable called i. You set it equal to 0. You then do while i is less than
3, the following. And then, you, yourself increment i again and again. Mechanical
in the sense that you have to implement all of these gears and make them turn
yourself, but this was a correct way to do that.
In Python, we can still achieve the same idea, but we don't need the int keyword.
We don't need any of the semicolons. We don't need the parentheses. We don't need
the curly braces. We can't use the plus plus, so maybe that's a minor step
backwards if you're a fan. But otherwise, the code, the logic is exactly the same.
But there's other ways to achieve this same idea.
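The mechanical while-loop version described above would look roughly like this in Python:

```python
i = 0
while i < 3:
    print("meow")
    i += 1  # no i++ in Python
```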
Recall that, in C, we could also do this. You could use a for loop, which does
exactly the same thing. Both are correct. Both are, arguably, well-designed. It's
to each their own when it comes to choosing between these. In Python, though, we're
going to have to think through how to do this. So you don't do the same for loop as
in C. The closest I could come up with is this, where you say for i-- or whatever
variable you want to do the counting-- in-- literally the preposition-- and then,
you use square brackets here.
And we've used square brackets before, in the context of arrays and things like
that. And the 0, 1, 2 looks like an array, in some sense, even though we've also
seen arrays with curly braces. But these square brackets, for now, denote a list.
Python does not have arrays. An array is that contiguous chunk of memory, back to
back to back, that you have to resize somehow by moving things around in memory, as
per two weeks ago.
In Python, though, you can just create a list like this using square brackets. And
better still, as we'll see, you can add or even remove things from that list down
the road. This, though, is not going to be very well-designed. This will work. This
will iterate in Python three times. But what might rub you the wrong way about this
design, even if you've never seen Python before? How does this example not end
well? Yeah?
DAVID MALAN: Yeah. If you're making a large list, you have to type out each one of
these numbers, like comma 3, comma 4, comma 5, comma, dot, dot, dot, 50 comma, dot,
dot, dot, 500. Like, surely, that's not the best solution, to have all of these
numbers on the screen, wrapping endlessly on the screen. So, in Python, another way
to do this would be to use a function called range, which, technically, is a data
type unto itself.
And this returns to you as many values as you ask for it. range takes some other
arguments, as well. But the simplest use case here is, if you want back the numbers
0, 1, and 2-- a total of three values-- you say, hey, Python, please give me a
range of three values. And by default, they start at 0 on up. But this is more
efficient than it would be to hard code the entire list at once.
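The two for-loop forms discussed above, side by side, as a small sketch:

```python
# Hard-coding the values in a list works, but does not scale.
for i in [0, 1, 2]:
    print("meow")

# range(3) yields the same values, 0 through 2, without typing them out.
for i in range(3):
    print("meow")

# Converting to a list shows exactly which values range(3) produces.
print(list(range(3)))
```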
And the best metaphor I could come up with is something like this. Here, for
instance, is a deck of cards. This is normal, human size, and there's presumably 52
cards here. So writing out 0 through 51 in code would be a little ridiculous for
the reasons you know. And it would just be very unwieldy and ugly and wrapping in
all of that. It would be the virtual equivalent of me handing you all of these
cards at once to just deal with.
And, right, they're not that big, but it's a lot of cards to hold on to. It
requires a lot of memory or physical storage, if you will. What range does,
metaphorically, is, if you ask me for three cards, I hand you them one at a time,
like this, so that, at any point in time, you only have one number in the
computer's memory until you're handed the next. The alternative-- the previous
version would be to hand you all three cards at once, or all 52 cards at once. But
in this case, range is just way more efficient.
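One hedged way to see the difference in memory is sys.getsizeof, which reports how many bytes an object itself occupies. The exact numbers vary by Python version and platform, and for a list the figure does not even include the integers it refers to, so treat this as an illustration rather than a precise measurement.

```python
import sys

as_list = list(range(1000))   # all 1,000 values exist in memory at once
as_range = range(1000)        # remembers only start, stop, and step

print(sys.getsizeof(as_list), sys.getsizeof(as_range))
```

Both can be looped over in the same way; only the list pays for holding every value up front.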
You can do range of 1,000. That's not going to give you a list of 1,000 values all
at once. It's going to give you 1,000 values one at a time, reducing memory
significantly in the computer itself. All right. So, besides this, what about doing
something forever? In Scratch, we could do this, literally, with a forever
block, which didn't quite exist in C. In C, we had to hack it together by saying
while true-- because true is, by definition, T-R-U-E, always true. So this just
deliberately induces an infinite loop for us.
In Python, the logic's going to be almost the same. And infinite loops in Python
tend to actually be even more common because you can always break out of them, as
you could in C. In Python, it looks like this. And this is slightly more subtle,
but gone are the curly braces. Gone are the parentheses. But ever so slight
difference, too? A capital T for True and it's going to be a capital F for False.
Stupid little differences. Eventually, you're going to mistype one or the other.
But these are the kinds of things to keep an eye out for and to start recognizing in
your mind's eye when you read code.
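A minimal sketch of the while True pattern, including the break that makes such loops practical. The stopping condition of three iterations is arbitrary, chosen only so the example terminates.

```python
count = 0
while True:          # capital T; lowercase true is not defined in Python
    print("meow")
    count += 1
    if count == 3:
        break        # leave the otherwise infinite loop

print(count)
```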
AUDIENCE: In the for loop, was i set to 0 once for [? every loop? ?]
DAVID MALAN: In the for loop, was i-- it was set to 0 on the first iteration, then
1 on the next, then 2 on the third. And the same thing for range. It just doesn't
use up as much memory all at once. Other questions, now, on any of these building
blocks of Python? All right. Well, let's go ahead and build something a little more
than hello. Let me propose that, over here, we implement, maybe, the simplest of
calculators here.
So let me go back to VS Code here, open my terminal window and open up, say, a file
called calculator.py. And in calculator.py, we'll have an opportunity to explore
some of these building blocks, but we'll allow things to escalate pretty quickly to
more interesting examples so that we can do the same thing, ultimately, as well.
And, in fact, let me go ahead and do this. Moreover, I've brought some code with me
in advance.
For instance, something called calculator0.c, from the first week of C. And let me
go ahead and split my window here, in fact, so that I can now do something like
this. Let me move this over here, here. Calculator.py. So now, I have, on the left
of my screen, calculator.c-- or calculator0.c because that's the first version I
made-- and calculator.py on the right.
Let me go ahead and implement, really, the same idea here. So on the right-hand
side, the analog of including cs50.h would be from cs50 import get_int if I want
to, indeed, use this function. Now, I'm going to go ahead and give myself a
variable x without defining its type. I'm going to use this get_int function and
I'm going to prompt the user for x, just like in C.
I'm, then, going to go ahead and prompt the user for another int, like y, here,
just like in C. And at the very end, I'm going to go ahead and do print x plus y.
And that's it. Now, granted, I have some comments in my C version of the code, just
to remind you of what each line is doing. But I've still distilled this into six
lines-- or, really, four if I get rid of the blank line.
So it's already, perhaps, a bit tighter here. It's tighter because something really
important, historically, is missing. What did I seem to omit altogether that we
haven't really highlighted yet? Yeah?
AUDIENCE: [INAUDIBLE]
DAVID MALAN: Yeah. The main function is gone. And in fact, maybe you took for
granted that it just worked a moment ago when I wrote hello, but I didn't have a
main function in hello, either. And this, too, is a feature of Python and a lot of
other languages, as well. Instead of having to adhere to these long-standing
traditions, if you just want to write code and get something done, fine. Just write
code and get something done without, necessarily, all of this same boilerplate.
Slight aesthetic bug. I put my space in the wrong place here. So that's a newbie
mistake. Let me fix that, aesthetically. Let me rerun python of calculator.py. Type
in 1. Type in 2. And, voila, there is now my same version again. But let me
propose, now, that we get rid of this training wheel. We don't want to keep taking
one step forward and then two steps back by adding these training wheels, so let me
instead do this.
DAVID MALAN: Exactly. The input function, by design, always returns a string of
text. After all, that's what the human typed in. And even though, yes, I typed the
number keys on the keyboard, it's still coming back as all text. Now, maybe we
should use like a get_int function. Well, that doesn't exist in Python. All you can
do is get textual input-- a string from the user. But we can convert one to the
other.
And so, a fix for this so that we don't accidentally concatenate-- that is, join x
plus y together-- would be to do something like this. Let me go back to my Python
code, here. And whereas, in C, we could previously do typecasting-- we can convert
one type to another-- that generally wasn't the case when you were doing something
complex, like a string to an int.
You could do a char to an int and vice versa. But for a string, recall, there was a
special function in C's standard library called atoi, like ASCII to integer. That's
the closest analog, here. And, in fact, the way to do this in Python would be to
use a function called int, which, indeed, is the name of the data type, too, even
though I have not yet had to type it. And I can convert the output of the input
function automatically from a string immediately to an int.
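The concatenation problem and its fix can be sketched without a keyboard by standing in fixed strings for what input would return. In the actual program, the conversion would wrap the call itself, as in int(input("x: ")).

```python
# Stand-ins for what input() would return if the user typed 1 and 2.
x_text = "1"
y_text = "2"

print(x_text + y_text)  # + on two strings joins them

x = int(x_text)         # convert each string to an int first
y = int(y_text)

print(x + y)            # + on two ints adds them
```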
All right. What if we do something slightly different, now, with our calculator?
Instead of addition, let's do division instead. So z equals x divided by y, thereby
giving me a third variable z. Let me go ahead and run python of calculator.py
again. I'll type in 1. I'll type in 3 this time. And what problem do you think
we're about to see? Or is it gone? What happened when I did this in C, albeit with
some slightly more cryptic syntax, when I divided one number, like 1 divided by 3?
DAVID MALAN: Yeah. So it would round down to the nearest integer, whereby you
experience truncation. So if you take an integer like 1, you divide it by another
integer like 3, that technically should be 0.33333, infinitely long. But in C,
recall, you truncate the value. If you divide an int by an int, you get back an
int, which means you get only the integer part, which was the 0.
Now, Python actually handles this for us and avoids the truncation. But it leaves
us, still, with one other problem here, which is going to be, for instance, not
necessarily visible at a glance. This looks correct. This has solved the problem in
C. So truncation does not happen. The integers are automatically converted to a
float-- a floating point value. But what other problem did we trip over, back in
week one?
What else got a little dicey when dealing with simple arithmetic? Anyone recall?
Well, the syntax in Python is a little different, but let me go ahead and do this.
It turns out, in Python, if you want to see more significant digits than what I'm
seeing here by default, which is a dozen or so, let me go ahead and print out z as
follows. Let me first print out a format string because I want to format z in an
interesting way. And notice, this would have no effect on the output. This is
just a format string that, for no compelling reason at the moment, is interpolating
z in those curly braces using an f-string or format string.
If I run this again with 1 and 3, we'll see, indeed, the exact same thing. But when
you use an f-string, you, indeed, have the ability to format that string more
precisely. Just like with %f in C, you could start to fine-tune how many
significant digits you see. In Python, you can do the same, but the
syntax is a little different. If you want the computer to interpolate z and show
you 50 significant digits-- that is, 50 numbers after the decimal point-- the syntax is
similar to C, but it's a little different.
You literally put a colon after the variable's name. dot 50 means show me the
decimal point and, then, 50 digits to the right, and the f just indicates please
treat this as a floating point value. So now, if I rerun python of calculator.py,
divide 1 by 3, unfortunately, Python has not solved all of the world's problems for
us. This, again, was an example of floating point imprecision. So that problem is
still latent.
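Putting the division and formatting steps together, a sketch with x and y hard-coded to 1 and 3 instead of read from the user:

```python
x = 1
y = 3

z = x / y    # / always produces a float in Python, so nothing is truncated
print(z)

print(x // y)  # // is integer division, which behaves like C's int / int here

# Ask for 50 digits after the decimal point; the later digits are not all 3s.
formatted = f"{z:.50f}"
print(formatted)
```

The digits stop being 3s partway through because a float has only a finite number of bits with which to approximate one third.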
So just because the world has advanced, doesn't necessarily mean that all of our
problems from C have gone away. There are solutions using third-party libraries for
scientific calculations and the like. But out of the box, floating point
imprecision is still an issue. Meanwhile, there was one other problem in C that we
ran into involving numbers, and that was this-- integer overflow. Recall that an
integer in C only took up, what, 32 bits typically, which meant you could count as
high as 4 billion or, maybe, if you're doing positive and negatives, as high as 2
billion, after which, weird things would happen.
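In Python, by contrast, an int is not fixed at 32 bits; as mentioned earlier, the language simply uses more bits as a value grows. A small sketch:

```python
# The largest value a signed 32-bit int can hold in C.
big = 2_147_483_647

print(big + 1)    # keeps counting instead of wrapping around
print(2 ** 100)   # far beyond 64 bits, still exact
```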
So we've taken a couple of steps forward, one step sideways. But, indeed, we have
solved some of our problems here. All right. Questions, now, on any of these
examples thus far? Question? All right. Well, how about another problem that we
encountered in C. Let's revisit it here in Python, as well. So let me go ahead and,
on the left-hand side here, let me open up a file called, say, compare3.c on the
left, and let me go ahead and create a new file on the right called compare.py.
Because recall that bad things happened when we needed to compare two values in C.
So on the left, here, is a reminder of what we once did in C, whereby, if we want
to compare values, we can get an int in C, store it in x. Get an int in C, store it
in y. We then have our familiar, conditional logic here, just printing out if x is
less than y or not. Well, we can certainly do the same thing, ultimately, in Python
by using some fairly familiar syntax. And let's just demonstrate this one quickly.
Let me go over here, too. I'll do from cs50 import get_int, even though I could do
this, instead, with the input function itself. x equals get_int, and I'll prompt
the user for that. y equals get_int, and I'll prompt the user for that. After that,
recall that I can say, without parentheses, if x is less than y, then print out,
without the f, "x is less than y." Then, I can go ahead and say else if x is
greater than y, I can print out, quote unquote, "x is greater than y."
If you'd like to interject now, what did I screw up? Anyone? Yeah?
AUDIENCE: Elif.
DAVID MALAN: Elif, right? So elif x is greater than y, else-- this part's the
same-- print "x is equal to y." There's no new logic going on here. But, at least
syntactically, it's a little cleaner. Indeed, this program is only 11 lines long,
albeit without any comments. Let me go ahead and run python of compare.py. Let's
see. Is 1 less than 2? Indeed. Let's run it again.
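The comparison logic just typed can be sketched as follows. To keep the example runnable without the cs50 library or a keyboard, it is wrapped in a hypothetical function named compare that returns the message instead of printing it directly; that wrapper is an assumption of this sketch, not part of compare.py.

```python
def compare(x, y):
    """Return the message compare.py would print for x and y."""
    if x < y:              # no parentheses needed around the condition
        return "x is less than y"
    elif x > y:            # elif, not "else if"
        return "x is greater than y"
    else:
        return "x is equal to y"


print(compare(1, 2))
print(compare(2, 1))
print(compare(1, 1))
```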
Is 2 less than 1? No, it's greater than. And let's, lastly, type in 1 and 1 twice.
x is equal to y. So we've got a pretty side-by-side, one-to-one conversion here.
Let's do something a little more interesting, then. In C, how about I open,
instead, something where we actually compared for a purpose? So if I open up, from
earlier in the course-- how about agree.c, which prompts the user to agree to
something or not?
And let me code up a new version here, called agree.py. And I'll do this on the
right-hand side, with agree.py. But in agree.c on the left-- notice that this is
how we did this yes-no thing in C-- we compared c, a character, equal to single
quotes 'Y' or equal to single quotes little 'y.' And then, the same thing for n.
Now, in Python, this one is actually going to be a little bit different, here.
Let me go ahead and, in the Python version of this, let me do something like this.
We'll use get_string. Actually, no. We'll just use input in this case. So let's do
s equals input. And we'll ask the user the same thing-- Do you agree, question
mark. Then, let's go ahead and say, if s equals equals-- how about Y? Huh. How do I
do this? Well, a few things.
Turns out, I'm going to do this-- s equals equals little y. Then, I'm going to go
ahead and print out "Agreed." And elif s equals equals capital N or s equals equals
lowercase n, I'm going to go ahead and print out "Not agreed." And I claim, for the
moment, that this is identical, now, to the program on the left in C. But what's
different? So we're still doing the same kind of logic, these equal equals for
comparing for equality.
But notice that, nicely enough, Python got rid of the two vertical bars, and it's
just literally the word "or." If you recall seeing ampersand ampersand to express a
logical and in C, [GRUNTS] you can just write, literally, the word "and." And so,
here's a hint of why Python tends to be pretty popular. People just like that it's
a little closer to English. There's a little less of the cryptic syntax here.
Now, this is correct, as this code will now work. But I've also used double quotes
instead of single quotes, and I also omitted, a few minutes ago, from my list of
data types in Python the word "char." In Python, there are no chars. There are no
individual characters. If you want to manipulate an individual character, you use a
string-- that is to say, a str-- of size 1.
Now, in Python, you can use single quotes or double quotes. I'm deliberately using
double quotes everywhere, just for consistency with how we treat strings in C. It's
pretty common, though, to use single quotes instead, if only because, on most
keyboards, you don't have to hold the Shift key anymore. Humans have really started
to optimize just how quickly they want to be able to code.
So using a single quote tends to be pretty popular in Python and other languages,
as well. They are fundamentally the same, single or double, unlike in C, where they
have meaning. So this is correct, I claim. And, in fact, let me run this real
quick. I'll open up my terminal window here. Let me get rid of the version in C and
run python of agree.py. And I'll type in Y. OK. I'll run it again and type in
little y. And I'll stipulate it's going to work for no, as well.
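A sketch of this first version of agree.py, assuming the answer arrives as a parameter rather than from input so the example runs unattended. The function name agree is hypothetical.

```python
def agree(s):
    # The word "or" replaces C's ||, and a single character is just a str of size 1.
    if s == "Y" or s == "y":
        return "Agreed."
    elif s == "N" or s == "n":
        return "Not agreed."
    return None  # anything else produces no message


print(agree("Y"))
print(agree("y"))
print(agree("n"))
```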
But this isn't necessarily the only way we can do this. There are other ways to
implement the same idea. And in fact, I can go about doing this instead. Let me go
back up to my code here. And we saw a hint of this earlier. We know that lists
exist in Python, and you can create them just by using square brackets. So what if
I simplify the code a little bit and just say if s is in the following list of
values-- capital Y or lowercase y. It's not all that different, logically, but it's
a little tighter. It's a little more compact.
So if I run agree.py again and type in capital Y or lowercase y, that still, now,
works. Well, I can tighten this up further if I want to add more features. Well,
what if I want to support not just big Y and little y, but how about "Yes" or "yes"
or, in case the user is yelling or someone who isn't good with CapsLock types in
"YES?" Wait a minute. But it could be weird. Do we want to support this or this?
This just gets really tedious, quickly, combinatorially, if you consider all of
these possible permutations. What would be smarter than doing something like this,
if you want to just be able to tolerate "yes" in any form of capitalization?
Logically, what would be nice?
AUDIENCE: Maybe, whatever the input is, you just transfer it over to all lowercase
or uppercase, and then redo it?
DAVID MALAN: Exactly. Super common paradigm. Why don't we just force the user's
input to all lowercase or all uppercase-- doesn't matter, so long as we're self-
consistent-- and just compare against all uppercase or all lowercase. And that will
get rid of all of the possible permutations, otherwise. Now, in C, we might have
done something like this. We might have simplified this whole list and just said--
let's say we'll do-- how about lowercase? So y or yes, and we'll just leave it at
that.
But we need to force, now, s to lowercase. Well, in C, we would have used the ctype
library. We would have done tolower and called that function, passing s in.
Although, not really because, in ctype, those operate on individual characters or
chars, not whole strings. We actually didn't see a function that could convert a
whole string in C to lowercase.
But in Python, we're going to benefit from some other feature, as well. It turns
out that Python supports what's called object-oriented programming. And we're only
going to scratch the surface of this in CS50. But if you take a higher-level
course in programming or CS, you explore this as a different paradigm. Up until
now, in C, we've been focusing on what's called, really, procedural programming.
You write procedures. You write functions, top to bottom, left to right.
And when you want to change some value, we were in the habit of using a procedure--
that is, a function. You would pass something, like a variable, into a function,
like toupper or tolower, and it would do its thing and hand you back a value. Well,
it turns out that it would be nicer, programming-wise, if some data types just had
built-in functionality. Why do we have our variables over here and all of our
helper functions, like toupper and tolower over here, such that we constantly have
to pass one into the other.
It would be nice to bake into our data type some built-in functionality so that you
can change variables using their own, default built-in functionality. And so,
Object-Oriented Programming, otherwise known as OOP, is a technique whereby certain
types of values, like a string-- AKA str-- not only have properties inside of
them-- attributes, just like a struct in C-- your data can also have functions
built into them, as well.
So, whereas in C, which is not object-oriented, you have structs. And structs can
only store data, like a name and a number when implementing a person. In Python,
you can, for instance, have not just a structure-- otherwise known as a class--
storing a name and a number. You can have a function call that person or email that
person or actual verbs or actions associated with that piece of data.
Now, in the context of strings, it turns out that strings come with a lot of useful
functionality. And in fact, at this URL here, which is in docs.python.org, which is
the official documentation for Python, you'll see a whole list of methods-- that
is, functions-- that come with strings that you can actually use to modify their
values. And what I mean by this is the following. If we go through the
documentation, poke around, it turns out that strings come with a function called
lower.
And if you want to use that function, you just have to use slightly different
syntax than in C. You do not do tolower, and you do not say, as I just did, lower
because this function is built into s itself. And just like in C, when you want to
go inside of a variable, like a structure, and access a piece of data inside of it,
like name or number, when you also have functions built into data types-- AKA
methods; a method is just a function that is built into a piece of data-- you can
do s dot lower open paren, closed paren in this case.
And I can do this down here, as well. If s.lower in, quote unquote, "n" or "no",
the whole thing, I can force this whole thing to lowercase. So the only difference
here, now, in object-oriented programming, is that, instead of constantly passing a value
into a function, you just access a function that's inside of the value. It just
works because of how the language itself is defined. And the only way you know
whether these functions exist is the documentation-- a class, a book, a website or
the like.
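The method-call version described above might look like this. As before, the hypothetical agree function takes the answer as a parameter so the sketch runs without a keyboard.

```python
def agree(s):
    # s.lower() calls the lower method built into the str itself
    if s.lower() in ["y", "yes"]:
        return "Agreed."
    elif s.lower() in ["n", "no"]:
        return "Not agreed."
    return None


print(agree("YES"))
print(agree("No"))
```

Any capitalization of "yes" or "no" now matches, because only the lowercase forms need to be listed.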
Questions, now, on this technique? All right. I claim this is correct. Now, even
though you've never programmed, most of you, in Python before, it's not super
well-designed. There's a subtle inefficiency, now, on lines 3 and 5 together. What's
dumb about how I've used lower, might you think? Yeah?
AUDIENCE: I feel like, using it twice, you'd just want another [? variable. ?]
DAVID MALAN: Yeah. If you're going to use the same function twice and ask the same
question, expecting the same answer, why are you calling the function itself twice?
Maybe we should just store the result in a variable. So we could do this in a
couple of different ways. We, for instance, could go up here and create another
variable called t and set that equal to s.lower. And then, we could just change
this to be t, here.
But honestly, I don't think we technically need another variable altogether, here.
I could just do something like this. Let's change the value of s to be the
lowercase version thereof. And so, now, I can quite simply refer to s again and
again like this, reusing that same value. Now, to be sure, I have now just lost the
user's original input. And if I care about that-- if they typed in all caps, I have
no idea anymore. So maybe I do want to use a separate variable, altogether.
But a takeaway here, too, is that strings in Python are technically what we'll call
immutable-- that is, they cannot be changed. This was not true in C. Once we gave
you arrays in week two or memory in week four, you could go to town on a string and
change any of the characters you want-- uppercasing, lowercasing, changing it,
shortening it and so forth. But in this case, this returns a copy of s, forced to
lowercase. It doesn't change the original string-- that is, the bytes in the
computer's memory.
When you assign it back to s, you're essentially forgetting about the old version
of s. But because Python does memory management for you-- there's no malloc,
there's no free-- Python automatically frees up the original bytes, like Y-E-S, and
hands them back to the operating system for you. All right. Questions, now, on this
technique? Questions on this?
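A short sketch of what immutability means in practice. The variable names original and lowered are chosen only for illustration.

```python
original = "YES"
lowered = original.lower()  # returns a new str; original is untouched

print(original)
print(lowered)

s = "YES"
s = s.lower()  # the name s now refers to the new, lowercase string
print(s)

# Trying to change a character in place is an error for a str.
try:
    original[0] = "y"
except TypeError as error:
    print("cannot modify a str in place:", error)
```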
In general, I'll call out-- the Python documentation will start to be your friend
because, in class, we'll only scratch the surface with some of these things. But in
docs.python.org, for instance, there's a whole reference of all of the built-in
functions that come with the language, as well as, for instance, those with a
string. All right. Before we take a break, let's go ahead and create something a
little familiar too based on our weeks here, in C. Let me propose that we revisit
those examples involving some meows.
So, for instance, when we had our cat meow back in the first week and, then, second
in C, we did something that was a little stupid at first whereby we created a file,
as I'll do here-- this time, called meow.py. And if I want a cat to meow three
times, I could run it once, like this, a little copy-paste. And now, python of
meow.py, and I'm done. Now, we've visited this example two times, at least, now in
Scratch and in C.
It's correct, I'll stipulate, but what's, obviously, poorly designed? What's the
fault here? Yeah?
DAVID MALAN: It should just be a loop, right? Why type it three times? Literally,
copying and pasting is almost always a bad thing-- except in C, when you have the
function prototypes that you need to borrow. But in this case, this is just
inefficient. So what could we do better here, in Python? Well, in Python, we could
probably change this in a few different ways. We could borrow some of the syntax we
proposed in slide form earlier, like give me a variable called i. Set it to 0, no
semicolon. While i is less than 3-- if I want to do this three times-- I can go
ahead and print out "meow."
And then, I can do i plus equals 1. And I think this would do the trick. Python of
meow.py, and we're back in business already. Well, if I wanted to change this to a
for loop, well, in Python, it would be a little tighter, but this would not be the
best approach. So for i in 0, 1, 2, I could just do print "meow", like this. And
that, too, would get the job done. But, to our discussion earlier, this would get
stupid pretty quickly if you had to keep enumerating all of these values. What did
we introduce instead?
The range function. Exactly. So that hands me back, way more efficiently, just the
values I want, indeed, one at a time. So even this, if I run it a third or fourth
time, we've got the same result. But now, let's transition to where we went with
this back in the day. How can we start to modularize this? It would be nice, I
claimed, if MIT had given us a meow function. Wouldn't it be nice if Python had
given us a meow function? Maybe less compelling in Python, but how can I build my
own function?
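The three loop variants discussed above, side by side, as they might appear in meow.py:

```python
# Version 1: a while loop with a counter
i = 0
while i < 3:
    print("meow")
    i += 1

# Version 2: a for loop over an explicit list of values
for i in [0, 1, 2]:
    print("meow")

# Version 3: a for loop over range, which supplies 0, 1, 2 one at a time
for i in range(3):
    print("meow")
```

Each version prints "meow" three times; only the third avoids both the manual counter and the hand-written list.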
Well, I did this briefly with the spell checker earlier, but let me go ahead and
propose that we could implement, now, our own version of this in Python as follows.
Let me go ahead and start fresh here and use the keyword def. So this did not exist
in C. You had the return value, the function name, the arguments. In Python, you
literally say def to define a function. You give it a name, like meow.
And now, I'm going to go ahead and, in this function, just print out meow. And this
lets me change it to anything else I want in the future. But for now, it's an
abstraction. And in fact, I can move it out of sight, out of mind-- just going to
hit Enter a bunch of times to pretend, now, it exists, but I don't care how it is
implemented. And up here, now, I can do something like this. For i in range of 3,
let me go ahead and not print "meow" anymore. Let me just call meow and tightening
up my code further.
Let's see. Python of meow.py. This is, I think, going to be the first time it does
not work correctly. OK. So here, we have, sadly, our first Python error. And let's
see. The syntax is going to be different from C or Clang's output. Traceback is the
term of art here. This is like a trace back of all of the lines of code that were
just executed or, really, functions you've called. The file name is uninteresting.
This is my codespace, specifically, but the file name is important here-- meow.py.
Our line 2 is the issue-- OK, I didn't get very far before I screwed up-- and then,
there's a name error. And you'll see, in Python, there's typically these
capitalized keywords that hint at what the issue is. It's something related to
names of variables. "meow" is not defined. All right. You're programming Python for
the first time. You've screwed up. You're following some online tutorial. You're
seeing this. Reason through it. Why might "meow" not be defined?
AUDIENCE: [INAUDIBLE]
DAVID MALAN: Maybe. Is it because "meow" is defined after? As smart as Python seems
to be, vis-a-vis C, they have some similar design characteristics. So let's try
that. So let me scroll all the way back down to where I moved this earlier. Let me
get rid of it-- way down there. I'll copy it to my clipboard. And let me just hack
something together. Let me just put it up here. And let's see if this works. So
now, let me clear my terminal, run python of meow.py.
OK. We're back in business. So that was actually really good intuition. Good
debugging technique, just reason through it. Now, this is contradicting what I
claimed back in week one, which was that the main part of your program, ideally,
should just be at the top of the file. Don't make me look for it. It's not a huge
deal with a four-line program, but if you've got 40 lines or 400 lines, you don't
want the juicy part of your program to be way down here, and all of these functions
way up here.
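The NameError seen a moment ago can be reproduced in a few lines. In this sketch the error is caught with try and except (a feature that comes up later) purely so the script keeps running and the message can be inspected.

```python
# Calling a function before its def statement has run raises NameError.
try:
    meow()  # meow is not defined yet at this point in the file
except NameError as error:
    message = str(error)
    print(message)


def meow():
    print("meow")


meow()  # works now that the def statement above has executed
```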
There's none of that hackish copying and pasting of the return type, the name and
the arguments to a function, like we needed in C. This is now OK instead, except
for one, minor detail. Let me go ahead and run python of meow.py. Hopefully, now,
I've solved this problem by having [GROANS] a main function.
But now, nothing has happened. All right. Even if you've never programmed in Python
before, what might explain this behavior, and how do I fix it? Again, when you're off
in the real world, learning some new language, all you have is deductive logic to
debug. Yeah?
DAVID MALAN: Right. So the solution, to be clear, in C was that we had to put the
prototype up here. Otherwise, we'd get an error message. In this case, I'm actually
not getting an error message. And, indeed, I'll claim that you don't need the
prototypes in Python. Just not necessary because that was annoying, if nothing
else. But what else might explain? Yeah, in the back?
AUDIENCE: [INAUDIBLE]
DAVID MALAN: Yeah. Maybe you have to call main itself. If main doesn't have some special
status in Python, maybe just because it exists isn't enough. And, indeed, if you
want to call main, the new convention is actually going to be-- as the very last
line of your program, typically-- to literally call main. It's a little stupid-
looking, but they made a design decision. And this is how, now, we work around it.
Python of meow.py. Now we're back in business.
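A sketch of the resulting layout, assuming two blank lines between functions so that main lands on line 1, meow on line 6, and the call to main on line 10:

```python
def main():  # defined here, but not yet called
    for i in range(3):
        meow()


def meow():  # defined below main, which is fine
    print("meow")


main()  # both functions exist by the time this line runs
```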
But now, logically, why does this work the way it does? Well, in this case-- top to
bottom-- line 1 is telling Python to define a function called main and, then,
define it as follows, lines 2 and 3. But it's not calling main yet. Line 6 is
telling Python how to define a function called meow, but it's not calling these
lines yet. Now, on line 10, you're telling Python, call main. And at that point,
Python has been trained, if you will, to know what main is on line 1, to know what
meow is on line 6.
And so, it's now perfectly OK for main to be above meow because you never called
them yet. You defined, defined, and then, you called. And that's the logic behind
this. Any questions, now, on the structure of this technique, here? Now, let's do
one more, then. Recall that the last thing we did in Scratch and in C was to,
actually, parameterize these same functions.
So suppose that you don't want main to be responsible for the loop here. You
instead want to, very simply, do something like "meow" three times and be done with
it. Well, in Python, it's going to be similar in spirit to C. But, again, we don't
need to keep mentioning data types. If you want "meow" to take some argument-- like
a number n-- you can just specify n as the name of that argument. Or you can call
it anything else, of course, that you want. You don't have to specify int or
anything else.
In your code, now, inside of meow, you can do something like for i in, let's say--
I definitely, now, can't do this because that would be weird, to start the list and
end it with n. So, if I can come back over here, what's the solution? How can I do
something n times?
AUDIENCE: [INAUDIBLE]
DAVID MALAN: Yeah. Using range. So range is nice because I can pass in, now, this
variable n. And now, I can meow-- whoops. Now I can print out, quote unquote,
"meow." So it's almost the same as in Scratch, almost the same as in C. But it's a
little simpler. And if, now, I run meow.py, I'll have the ability, now, to do this
here, as well. All right. Questions on any of this? Right now, we're taking this
stroll through week one. We're going to, momentarily, escalate things to look not
only at some of these basics, but also, other features, like we saw with face
recognition with the speller or the like.
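The parameterized version of meow described above might look like this:

```python
def main():
    meow(3)  # main no longer owns the loop


def meow(n):
    # n has no declared type; range(n) supplies 0 through n - 1
    for i in range(n):
        print("meow")


main()
```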
Because of how many of us are here, we have a huge amount of candy out in the
lobby. So why don't we go ahead and take a 10-minute break? And when we come back,
we'll do even fancier, more powerful things with Python in 10.
All right. So we are back. Among our goals, now, are to introduce a few more
building blocks so that we can solve more interesting problems at the end, much
like those that we began with. You'll recall, from a few weeks ago, we played with
this two-dimensional Super Mario world. And we tried to print a vertical column of
three or more bricks. Well, let me propose that we use this as an opportunity to,
now, tinker with some of Python's more useful, more user-friendly functionality, as
well.
So let me code a file called mario.py, and let's just print out the equivalent of
that vertical column. So it's of height 3. Each one is a hash, so let's do for i in
range of 3 initially, and let's just print out a single hash. And I think, now,
python of mario.py-- voila. We're in business, printing out just that same column
there. What if, though, we want to print a column of some variable height where the
user tells us how tall they want it to be?
Well, let me go up here, for instance and, instead, how about-- let's do this. How
about from cs50 import? How about the get_int function, as before? So it will deal
with making sure the user gives us an integer. And now, in the past, whenever we
wanted to get a number from a user, we've actually followed a certain paradigm. In
fact, if I open up here, for instance, how about mario1.c from a while back, you
might recall that we had code like this. And we specifically use the do while loop
in C whenever we want to get something from the user, maybe, again and again and
again, until they cooperate. At which point, we finally break out of the loop.
So it turns out, Python does have while loops, does have for loops, does not have
do while loops. And yet, pretty much any time you've gotten user input, you've
probably used this paradigm. So it turns out that the Python equivalent of this is
to do, similar in spirit, but using only a while loop. And a common paradigm in
Python, as I alluded to earlier, is to actually deliberately induce an infinite loop
while True-- capital T-- and then, do what you want to do, like get an int from the
user and prompt them for the height, for instance, in question.
And then, if you're sure that the user has given you what you want-- like n is
greater than 0, which is what I want, in this case, because I want a positive
integer; otherwise, there's nothing to print-- you literally just break out of the
loop. And so, we could actually use this technique in C. It's just not really done
in C. You could absolutely, in C, have done a while True loop with the parentheses,
lowercase true. You could break out of it, and so forth.
But in Python, this is the Python way. And this is actually a term of art. This way
in Python is pythonic. This is "the way everyone does it," quote unquote. Doesn't
mean you have to, but that's the way the cool Python programmers would implement an
idea like this-- trying to do something again and again and again until the user
actually cooperates. But all we've done is take away the do while loop. But still,
logically, we can implement the same idea.
Now, below this, let me go ahead and just print out, for i in range of n this
time-- because I want it to be variable and not 3. I can go ahead and print out the
hash-- let me go ahead and get rid of the C version here-- open my terminal window
and I'll run, again, Python of mario.py. I'll type in 3 and I get back those three
hashes. But if I, instead, type in 4, I now get four hashes instead.
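A sketch of the while True pattern in mario.py. So that it runs without the cs50 library or a keyboard, get_int is replaced here by a hypothetical stand-in that hands back canned answers; with the real library, the first few lines would simply be from cs50 import get_int.

```python
# Stand-in for cs50's get_int (an assumption of this sketch):
# it returns canned answers one at a time instead of prompting.
answers = iter([-1, 0, 4])


def get_int(prompt):
    return next(answers)


# Python has no do-while loop: loop forever, then break once the value is acceptable.
while True:
    n = get_int("Height: ")
    if n > 0:
        break

# Print a column of n hashes.
for i in range(n):
    print("#")
```

The canned answers -1 and 0 are rejected, and the loop ends on 4.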
So the takeaway here is, quite simply, that this would be the way, for instance, to
actually get back a value in Python that is consistent with some parameter, like
greater than 0. How about this? Let's actually practice what we preached a moment
ago with our meowing examples and factoring all this out. Let me go ahead and
define a main function, as before. Let me go ahead and assume, for the moment, that
a get_height function exists, which is not a thing in Python. I'm going to invent
it in just a moment.
And now, I'm going to go ahead and do something like this. for i in the range of
that height, well, let's go ahead and print out those hashes. So I'm assuming that
get_height exists. Let me go ahead and implement that abstraction, so define a
function, now, called get_height. It's not going to take any arguments in this
design. While True, I can go ahead and do the same thing as before-- assign a
variable n, the return value of get_int prompting the user for that height.
And then, if n is greater than 0, I can go ahead and break. But if I break here, I,
logically-- just like in C-- end up executing below the loop in question. But
there's nothing there. But if I want get_height to return the height, what should I
type here on line 14, logically? What do I want to return, to be clear?
AUDIENCE: [INAUDIBLE]
DAVID MALAN: Yeah. So I actually want to return n. And here's another curiosity of
Python, vis-a-vis C. There doesn't seem to be an issue of scope anymore, right? In
C, it was super important to not only declare your variables with the data types,
you also had to be mindful of where they exist-- inside of those curly braces. In
Python, it turns out you can be a little looser with things, for better or for
worse.
And so, on line 11, if I create a variable called n, it exists on line 11, 12 and
even 13, outside of the while loop. So to be clear, in C, with a while loop, we
would have ordinarily had not a colon. We would have had the curly brace, like here
and over here. And a week ago, I would have claimed that, in C, n does not exist
outside of the while loop, by nature of those curly braces.
Even though the curly braces are gone, Python actually allows you to use a variable
any time after you have assigned it a value. So slightly more powerful, as such.
However, I can tighten this up a little bit, logically. And this is true in C. I
don't really need to break out of the loop by using break. Recall that or know that
I can actually-- once I'm ready to go, I can just return the value I care about,
even inside of the loop. And that will have the side effect of breaking me out of
the loop and, also, breaking me out of and returning from the entire function.
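Both shapes of get_height can be sketched side by side. The function names with_break and with_return suffixes, and the canned get_int stand-in, are assumptions made so the example runs without the cs50 library or a keyboard.

```python
# Hypothetical stand-in for cs50's get_int, fed from canned answers.
answers = iter([-2, 0, 3, -1, 5])


def get_int(prompt):
    return next(answers)


def get_height_with_break():
    while True:
        n = get_int("Height: ")
        if n > 0:
            break
    # n was first assigned inside the loop but is still usable here
    return n


def get_height_with_return():
    while True:
        n = get_int("Height: ")
        if n > 0:
            return n  # leaves the loop and the function in one step


first = get_height_with_break()
second = get_height_with_return()
print(first)
print(second)
```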
So nothing too new here, in terms of C versus Python, except for this issue with
scope. And I, indeed, returned n at the bottom there, just to make clear that n
would still exist. So either of those are correct. Now, I just have a Python
program that I think is going to allow me to implement this same Mario idea. So
let's run python of mario.py. And-- OK, so nothing happened. Python of mario.py.
What did I do wrong?
AUDIENCE: [INAUDIBLE]
DAVID MALAN: Yeah, I have to call main. So, at the bottom of my code, I have to
call main here. And this is a stylistic detail that's been subtle. Generally
speaking, when you are writing in Python, there's not a CS50 style guide, per se.
There's actually a Python style guide that most people adhere to. And in this case,
double blank lines between functions is the norm. I'm doing that deliberately,
although it might, otherwise, not be obvious.
But now that I've called main on line 16, let's run mario.py once more. Aha. Now we
see it. Type in 3, and I'm back in business, printing out the values there. Yeah?
DAVID MALAN: Sure. Why do I need the if condition at all? Why can't I just return n
here, as by doing return n? Or if I really want to be succinct, I could technically
just do this. The only reason I added the if condition is because, if the user
types in negative 1, negative 2, I wanted to prompt them again and again. That's
all. But that would be totally acceptable, too, if you were OK with that result
instead.
Well, let me do one other thing here to point out why we are using get_int so
frequently. It's a new training wheel, albeit a temporary one. So let me go back to the
way it was a moment ago and let me propose, now, to take away get_int. I claimed
earlier that, if you're not using get_int, you can just use the input function
itself from Python. But that always returns a string, or a str. And so, recall that
you have to pass the output of the input function to the int function, either on the same
line or, if you prefer, on another line, instead.
But it turns out what I didn't do was show you what happens if you don't cooperate
with the program. So if I run python of mario.py now, works great, even without the
get_int function. And I can do it with 4. Still works great. But let me clear my
terminal and be difficult, now, as the user and type in "cat" for the height
instead. Enter. Now, we see one of those trace backs again.
This one is different. This isn't a name error, but, apparently, a value error. And
if I ignore the stuff I don't understand, I can see "invalid literal for int with
base 10-- "cat."" That's a super cryptic way of saying that C-A-T is not a number
in decimal notation. And so, I would seem to have to, somehow, handle this case.
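The error can be reproduced without typing anything by converting the string directly. In this sketch the exception is caught only so that its message can be stored and printed instead of ending the program.

```python
try:
    height = int("cat")  # "cat" is not a number in decimal notation
except ValueError as error:
    message = str(error)
    print(message)
```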
And if you want to be more curious, you'll see that this is, indeed, a traceback.
And C tends to do this, too, or the debugger would do this for you, too. You can
see all of the functions that have been called to get you to this point. So
apparently, my problem is, initially, in line 14. But line 14, if I keep scrolling,
is uninteresting. It's main. But line 14 leads me to execute line 2, which is,
indeed, in main. That leads me to execute line 9, which is in get_height.
And so, OK, here is the issue. So the closest line number to the error message is
the one that probably reveals the most. Line 9 is where my issue is. So I can't
just blindly ask the user for input and, then, convert it to an int if they're not
going to give me an int. Now, how do we deal with this? Well, back in problem set
two, you might recall validating that the user typed in a number and using a for
loop and the like.
Well, it turns out, there's a better way to do this in Python, and the semantics
are there. If you want to try to convert something to a number that might not
actually be a number, turns out, Python and certain other languages literally have
a keyword called try. And if only this existed for the past few weeks, I know. But
you can try to do the following with your code. What do I want to try to do?
Well, I want to try to execute those few lines, except if there's an error. So I
can say except if there's a value error-- specifically, the one I screwed up and
created a moment ago. And if there is a value error, I can print out an informative
message to the user, like "not an integer" or anything else. And what's happening
here, now, is literally this operative word, try.
Python is going to try to get input and try to convert it to an int, and it's going
to try to check if it's greater than 0 and then try to return it. Why? Three of
those lines are inside of, indented underneath the try block, except if something
goes wrong-- specifically, a value error happens. Then, it prints this. But it
doesn't return anything. And because I'm in a loop, that means it's going to do it
again and again and again until the human actually cooperates and gives me an
actual number.
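A minimal sketch of the loop just described might look like the following. The on-screen code isn't reproduced in the transcript, so the prompt text is a guess, and the `read` parameter is a hypothetical stand-in for `input`, added only so the example can run on canned answers instead of a live keyboard.

```python
def get_height(read=input):
    """Keep prompting until the user supplies a positive integer."""
    while True:
        try:
            # int() raises ValueError for text like "cat"
            n = int(read("Height: "))
            if n > 0:
                return n
        except ValueError:
            # Nothing is returned here, so the loop simply goes around again
            print("Not an integer")


# Simulate a user typing "cat", then "dog", then 5 (canned, hypothetical input)
answers = iter(["cat", "dog", "5"])
print(get_height(read=lambda prompt: next(answers)))
```

Called as plain `get_height()`, it would read from the keyboard via `input` instead.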
And so, this, too, is what the world would call pythonic. In Python, you don't,
necessarily, rigorously try to validate the user's input, make sure they haven't
screwed up. You honestly take a more lackadaisical approach and just try to do
something, but catch an error if it happens. So catch is also a term of art, even
though it's not a keyword here. Except if something happens, you handle it.
So you try and you handle it. It's best-effort programming, if you will. But this is
baked into the mindset of the Python programming community. So now, if I do python
of mario.py and I cooperate, works great as before. Try and succeed. 3 works. 4
works. If, though, I try and fail by typing in "cat," it doesn't crash, per se. It
doesn't show me an error. It shows me something more user-friendly, like "not an
integer." And then, I can try again with "dog." "Not an integer." I can try again
with 5. And now, it works.
So we won't, generally, have you write much in the way of these try-except blocks,
only because they get a little sophisticated quickly. But that is to reveal what
the get_int function is doing. This is why we give you the training wheels, so
that, when you want to get an int, you don't have to jump through all these
annoying hoops to do so. But that's all the library's really doing for you, is just
try and except.
You won't be left with any training wheels, ultimately. Questions, now, on getting
input and trying in this way? Anything at all? Yeah?
DAVID MALAN: Oh, could you put the condition outside of the try block? Short
answer, yes. And, in fact, I struggled with this last night when tweaking this
example to show the simplest version. I will disclaim that, really, I should only
be trying, literally, to do the fragile part. And then, down here, I should be
really doing what you're proposing, which is do the condition out here.
The problem is, though, that, logically, this gets messy quickly, right? Because
except if there's a value error, I want to print out "not an integer." I can't
compare n against 0, then, because n doesn't exist because there was an error. So
it turns out-- and I'll show you this; this is now the advanced version of Python--
there's actually an else keyword you can use in Python that does not accompany if
or elif. It accompanies try and except, which I think is weirdly confusing. A
different word would have been better.
But if you'd really prefer, I could have done this, instead. And this is one of
these design things where reasonable people will disagree. Generally speaking, you
should only try to do the one line that might very well fail. But honestly, this
looks stupid. No, it's just unnecessarily complicated. And so, my own preference
was actually the original, which was-- yeah, I'm trying a few extra lines that,
really, aren't going to fail, mathematically. But it's just tighter. It's cleaner
this way.
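For comparison, here is a sketch of the try/except/else variant being debated, with only the fragile conversion inside `try`. As before, the `read` parameter is a hypothetical stand-in for `input` so the example can run without a keyboard.

```python
def get_height(read=input):
    """Same behavior as before, but only int() sits inside the try block."""
    while True:
        try:
            # The one line that might actually fail
            n = int(read("Height: "))
        except ValueError:
            print("Not an integer")
        else:
            # Runs only when no exception was raised, so n is sure to exist
            if n > 0:
                return n


# Canned, hypothetical input: one bad value, then a good one
answers = iter(["cat", "4"])
print(get_height(read=lambda prompt: next(answers)))
```

Both versions behave the same from the user's point of view; the difference is only in how much code is allowed to sit under `try`.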
And here's, again, the sort of arguments you'll start to make yourself as you get
more comfortable with programming. You'll have an opinion. You'll disagree with
someone. And so long as you can back your argument up, it's pretty reasonable,
probably. All right. So how about we, now, take away some piece of magic that's
been here for a while. Let me go ahead and delete all of this here. And let me
propose that we revisit not that vertical column and the exceptions that might
result from getting input, but these horizontal question marks that we saw a while
ago.
So I want all of those question marks on the same line. And yet, I worry we're
about to see a challenge here because print, up until now, has been putting new
lines everywhere automatically, even without those backslash n's. Well, let me
propose that we do this. for i in the range of 4. If I want four question marks,
let me just print four question marks. Unfortunately, I don't think this is correct
yet. Let me run python of mario.py. And, of course, this gives me a column instead
of the row of question marks that I want.
So how do we do this? Well, it turns out, if you read the documentation for the
print function, it turns out that print, not surprisingly, perhaps, takes a lot of
different arguments, as well. And in fact, if you go to the documentation for it,
you'll see that it takes not just positional arguments-- that is, from left to
right, separated by commas. It turns out, Python supports a fancier feature
with arguments where you can pass the names of arguments to functions, too.
So what do I mean by this? If I go back to VS Code here and I've read the
documentation, it turns out that, yes, as before, you can pass multiple arguments
to print, like this. Hello comma David comma Malan, and that will just automatically
concatenate all three of those positional arguments together. They're positional in
the sense that they literally flow from left to right, separated by commas.
But if you don't want to just pass in values like that, you want to actually print
out, as I did before, a question mark. But you want to override the default
behavior of print by changing the line ending, you can actually do this. You can
use the name of an argument that you know exists from the documentation and set it
equal to some alternative value. And in fact, even though this looks cryptic, this
is how I would override the end of each line, to be quote, unquote.
That is nothing because, if you read the documentation, the default value for this
end argument-- does someone want to guess-- is-- is backslash n. So if you read the
documentation, you'll see that backslash n is the implied default for this end
argument. And so, if you want to change it, you just say end equals something else.
And so, here, I can change it to nothing and, now, rerun python of mario.py. And
now, they're all in the same line.
Now, it looks a little stupid because I made that week one mistake where I still
need to move the cursor to the next line. That's just a different problem. I'm just
going to go over here and print nothing. I don't even need to print backslash n
because, if print automatically gives you a backslash n, just call print with
nothing, and you'll get that for free. So let me rerun python of mario.py. And now,
it looks a little prettier at the prompt.
And to be super clear as to what's going on-- suppose I want to make an exclamation
here. I could change the backslash n default to an exclamation point, just for
kicks. And if I run python of mario.py again, now, I get this exclamation with
question marks and exclamation points, as well. So that's all that's going on here.
And this is what's called a named argument. It literally has a name that you can
specify when calling it in. And it's different from positional in that you're
literally using the name.
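Here is a small sketch of positional versus named arguments to `print`. The helper name `print_row` is made up for illustration; `end` is the real parameter name from the documentation for `print`.

```python
def print_row(n, end=""):
    """Print n question marks on one line, each followed by `end`."""
    for i in range(n):
        print("?", end=end)  # named argument overrides the default end="\n"
    print()                  # no arguments: just the default newline


# Positional arguments are printed left to right, separated by a space by default
print("hello", "David", "Malan")  # hello David Malan

print_row(4)       # ????
print_row(4, "!")  # ?!?!?!?!
```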
Let me propose something else, though. And this is why people like Python. There's
just cool ways to do things. That's a three-line, verbose way of printing out four
question marks. I could certainly take the shortcut and just do this. But that's
not really that interesting for anyone, especially if I want to do it a variable
number of times. But Python does let you do this.
If you want to multiply a character some number of times, not only can you use plus
for concatenation, you can use star or an asterisk for multiplication, if you
will-- that is, concatenation again and again and again. So if I just print out,
quote unquote, "?" times 4, that's actually going to be the tightest way, the most
succinct way I can print four question marks instead. And if I don't use 4, I use
n, where I get n from the user. Bang.
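As a quick sketch of that star operator on strings (the function name `row` is hypothetical, and `n` is passed in directly rather than read from the user):

```python
def row(n):
    """Return a string of n question marks using string repetition."""
    return "?" * n


print(row(4))  # ????
```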
Now, I've gotten rid of the for loop entirely, and I'm using the star operator to
manipulate it instead. And, to be super clear here, insofar as Python does not have
malloc or free or memory management that you have to do, guess what Python also
doesn't have. Anything on your minds in the past couple of weeks? Doesn't have--
AUDIENCE: Pointers.
DAVID MALAN: Pointers, yeah. So Python does not have pointers, which just means
that all of that happens for you automatically, underneath the hood, again, by way
of code that someone else wrote.
How about one more throwback with Mario? We've talked about, in week one, this two-
dimensional structure where it's like I claim 3 by 3-- a grid of bricks, if you
will. Well, how can we do this in Python? We can do this in a couple of ways, now.
Let me go back to my mario.py, and let me do something like for i in range of--
we'll just do 3, even though I know, now, I could use get_int or I could use input
and int.
And if I want to do something two-dimensionally, just like in C, you can nest your
for loops. So maybe I could do for j in range of 3. And then, in here, I could
print out a hash symbol. And then, let's see if that gives me 9 total. So if I've
got a nested loop like this, python of mario.py hopefully gives me a grid. No, it
gave me a column of 9. Why, logically, even though I've got my row and my columns?
Yeah.
AUDIENCE: [INAUDIBLE]
DAVID MALAN: Yeah, the line ending. So in my row, I can't let print just keep
adding new line, adding new line. So I just have to override this here and let me
not screw up like before. Let me print one at the end of the whole row, just to
move the cursor down. And I think, now, together, we've got our 3 by 3. Of course,
we could tighten this up further. If I don't like the nested loop, I probably could
go in here and just print out, for instance, a brick times 3. Or I could change the
3 to a variable if I've gotten it from the user.
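Both versions of the grid can be sketched roughly as follows. The function names are invented for illustration, and the size is passed as a parameter rather than hard-coded as 3.

```python
def print_grid(size):
    """Nested loops: inner loop prints one row, outer loop repeats the rows."""
    for i in range(size):
        for j in range(size):
            print("#", end="")  # stay on the same line within a row
        print()                 # move the cursor down after each row


def print_grid_short(size):
    """Same grid, using string repetition instead of the inner loop."""
    for i in range(size):
        print("#" * size)


print_grid(3)
print_grid_short(3)
```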
So I can tighten this up further. So, again, just different ways to solve the same
problem and, again, evidence of why a lot of people like Python. There's just some
more pleasant ways to solve problems without getting into the weeds, constantly, of
doing things, like with for loops and while loops endlessly. All right. Well, how
about some other building blocks?
Lists are going to be so incredibly useful in Python, just as arrays were in C. But
arrays are annoying because you have to manage the memory yourself. You have to know in
advance how big they are or you have to use pointers and malloc or realloc to
resize them. Oh my god. The past two weeks have been painful, in that sense. But
Python does this all for free for you. In fact, there's a whole bunch of functions
that come with Python that involve lists, and they'll allow us, ultimately, to do
things again and again and again within the same data structure.
And, for instance, we'll be able to get the length of a list. You don't have to
remember it yourself in a variable. You can just ask Python how many elements are
in this list. And with this, I think we can solve some old problems, too. So let me
go back here, to VS Code. Let me close mario and give us a new program called
scores.py. And rather than show the C and the Python now, let's just focus on
Python.
And in scores.c way back when, we just averaged three test scores or something like
that-- 72, 73, and 33-- a few weeks ago. So if I want to create a list in this
Python version of 72, 73, 33, I just use my square bracket notation. C let you use
curly braces if you know the values in advance, but Python's just this. And now, if
I want to compute the average-- in C, recall, I did something with a loop. I added
all the values together. I, then, divide it by the total number of values just like
you would in grade school, and that gave me the average.
Well, Python comes with a lot of super handy functions-- not just length, but
others, as well. And so, in fact, if you want to compute the average, you can take
the sum of all of those scores and divide it by the length of all of those scores.
So Python comes with length, comes with sum. You can just pass in a whole list of
any size and let it deal with that problem for you. So if I want to, now, print out
this average, I can print out Average colon-- and then, I'll plug in my average
variable for interpolation. Let me make this an f-string so that it gets formatted,
and let me just run python of scores.py.
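Put together, that version of scores.py would plausibly be just three lines. This is a sketch reconstructed from the description, not a copy of the on-screen code.

```python
# A list of known values, written with square brackets
scores = [72, 73, 33]

# sum() and len() are built in, so no loop is needed
average = sum(scores) / len(scores)

# f-string interpolation plugs the variable into the output (roughly 59.33)
print(f"Average: {average}")
```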
And there is my average. It's rounding weird because we're still vulnerable to some
floating point imprecision, but at least I didn't need loops and I didn't have to
write all this darn code just to do something that Excel and Google Spreadsheets
can just do like that. Well, Python is closer to those kinds of tools, but more
powerful in that you can manipulate the data yourself. How about, though, if I want
to get a bunch of scores manually from the user and, then, sum them together. Well,
let's combine a few ideas here. How about this?
First, let me go ahead and import the get_int function from the CS50 library, just
so we don't have to deal with try and except or all of that. And let me go ahead
and give myself an empty list. And this is powerful. In C, [SIGHS] there's no point
to an empty array because, if you create an empty array with square bracket
notation, it's not useful for anything. But in Python, you can create it empty
because Python will grow and shrink the list for you automatically, as you add
things to it.
So if I want to get three scores from the user, I could do something like this--
for i in range of 3. And then, I can grab a variable called "score" or anything. I
could call get_int, prompt the human for the score that they want to type in. And
then, once they do, I can do this. Thinking back to our object-oriented programming
capability now, I could do scores.append, and I can append that score to it.
And you would only know this from having read the documentation, heard it in class,
in a book or whatnot, but it turns out that, just like strings have functions like
lower built into them, lists have functions like append built into them that just
literally appends to the end of the list for you, and Python will grow or shrink it
as needed. No more malloc or realloc or the like.
So this just appends that score to the scores list, and then again and again and
again. So the array starts at-- sorry, the list starts at size 0, then grows to 1
then 2 then 3 without you having to do anything else. And so, now, down here, I can
compute an average with the sum of those scores divided by the length of the total
number of scores. And to be clear, length is the total number of elements in the
list. Doesn't matter how big the values themselves are.
Now I can go ahead and print out an f-string with something like Average colon
average in curly braces. And if I run python of scores.py-- I'll type in, just for
the sake of discussion, the three values, I still get the same answer. But that
would have been painful to do in C unless you committed, in advance, to a fixed
size array-- which we already decided, weeks ago, was annoying-- or you grew it
dynamically using malloc or realloc or the like.
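A sketch of the growing-list idea, under one loud assumption: the `read_score` parameter is a hypothetical stand-in for the CS50 library's `get_int`, so that the example runs with only the standard library and on canned values.

```python
def collect_scores(count, read_score):
    """Ask for `count` scores and return them in a list."""
    scores = []                      # start with an empty list
    for i in range(count):
        score = read_score("Score: ")
        scores.append(score)         # the list grows by one each time
    return scores


# Canned values standing in for what the human would type
typed = iter([72, 73, 33])
scores = collect_scores(3, lambda prompt: next(typed))

average = sum(scores) / len(scores)
print(f"Average: {average}")
```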
All right. What else can I do? Well, there's some nice things you might as well
know exist. Instead of scores.append, you can do slight fanciness like this. If you
want to append something to a list, you can actually do plus equals, and then put
that thing in a temporary list of its own and just use what is essentially
concatenation-- but not concatenation of strings, but concatenation of lists. So
this new line 6 appends to the scores list-- this tiny, little list I'm
temporarily creating with just the current new score.
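In isolation, that list concatenation looks something like this, with the values hard-coded here purely for illustration:

```python
scores = []
for score in [72, 73, 33]:
    # Wrap the single score in a one-element list, then concatenate
    scores += [score]

print(scores)  # [72, 73, 33]
```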
So just another piece of syntax that's worth seeing that allows you to do something
like that, as well. All right. Well, how about we go back to strings for a moment?
And all of these examples, as always, are on the course's website afterward.
Suppose we want to do something like converting characters to uppercase. Well, to
be clear, I could do something like this. Let me create a program called
uppercase.py.
Let me prompt the user for a before string as by using the input function or
get_string, which is almost the same. And I'll prompt the user for a string
beforehand. Then, let me go ahead and print out, how about, the keyword "After,"
and then end the new line with nothing, just so that I can see "Before" on one line
and "After" on the next line. And then, let me do this-- and here's where Python
gets pleasant, too, with loops-- for c in before-- print c.upper end equals quote,
unquote. And then, I'll print this here.
All right. That was fast, but let's try to infer what's going on. So line 1 just
gets input from the user, stores it in a variable called before. Line two literally
just prints "After" but doesn't move the cursor to the next line. What it, then,
does is this. And, in C, this was a little more annoying. You needed a for loop
with i. You needed array notation with the square brackets. But, Python, if you say
for variable in string-- so for c, for character, in string, Python is going to
automatically assign c to the first letter that the user types in.
Then, on the next iteration, the second letter, the third letter, and the fourth.
So you don't need any square bracket notation, you just use c, and Python will do
it for you and just hand you back, one at a time, each of the letters that the user
has typed in. So if I go back over here and I run, for instance, python of
uppercase.py and I'll type in, how about, "david" in all lowercase and hit Enter,
you'll now see that it's all uppercase instead by iterating over it, indeed, one
character at a time.
But we already know, thanks to object-oriented programming, strings themselves have
the functionality built in to not just uppercase single characters, but the whole
string. So, honestly, this was a bit of a silly exercise. I don't need to use a
loop anymore, like in C. And so, some of the habits you've only just developed in
recent weeks, it's time to start breaking them when they're not necessary.
I can create a variable called after, set it equal to before.upper-- which, indeed,
exists, just like dot lower exists. And then, what I can go ahead and print out is,
for instance-- let's get rid of this print line here and do it at the end-- "After"
and print the value of that variable. So now, if I rerun uppercase.py, type in
"david" in all lowercase, I can just uppercase the whole thing all at once because,
again, in Python, you don't have to operate on characters individually.
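Both approaches, side by side, might be sketched like this. The string is hard-coded as a stand-in for what `input` would return, and the exact prompt and label text are assumptions.

```python
before = "david"  # stand-in for input("Before: ")

# Version 1: iterate over the string one character at a time
print("After:  ", end="")
for c in before:
    print(c.upper(), end="")
print()

# Version 2: uppercase the whole string at once
after = before.upper()
print(f"After:  {after}")
```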
Questions on any of these tricks up until now? No? All right. How about a few other
techniques that we saw in C that we'll bring back, now, in Python. So it turns out,
in Python, there are other libraries you can use, too, that unlock even more
functionality. So, in C, if you wanted command line arguments, you just change the
signature for main to be, instead of void, int argc comma string argv, open
brackets for an array or char star, eventually.
Well, it turns out, in Python, that, if you want to access command line arguments,
it's a little simpler, but they're tucked away in a library-- otherwise known as a
module-- called sys, the system module. Now, this is similar, in spirit, to the
CS50 library, and that's got a bunch of functionality built in. But this one comes
with Python itself. So if I want to create a program like greet.py, in VS Code,
here, let me go ahead and do this.
From the sys library, let's import argv. And that's just a thing that exists. It's
not built into main because there is no main, per se, anymore. So it's tucked away
in that library. And now, I can do something like this. If the length of argv
equals equals 2, well, let's go ahead and print out something friendly, like hello
comma argv bracket 1, and then, close quotes. Else, if the length of argv is not
equal to 2, let's just go ahead and print out hello, world.
Now, at a glance, this might look a little cryptic, but it's identical to what we
did a few weeks ago. When I run this, python of greet.py, with no arguments, it
just says "hello, world." But if I, instead, add a command line argument, like my
first name and hit Enter, now, the length of argv is no longer 1. It's going to be
2. And so, it prints out "Hello, David" instead.
So the takeaway here is that, whereas in C, argv technically contained the name of
your program, like ./hello or ./greet, and then everything the human typed.
Python's a little different in that, because we're using the interpreter in this
way-- technically, when you run python of greet.py, the length of argv is only 1.
It contains only greet.py, so the name of the file. It does not unnecessarily
contain Python itself because what's the point of that being there, omnipresently?
It does contain the number of words that the human typed after Python itself. So
argv is length 1 here. argv is length 2 here. And that's why, when it did equal 2,
I saw "Hello, David" instead of the default "Hello, world." So same ability to
access command line arguments, add these kinds of inputs to your functions, but you
have to unlock it by way of using argv instead, in this way.
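A rough sketch of greet.py follows. Wrapping the logic in a function named `greeting` is an assumption made here so the behavior can be checked with hand-built lists; the lecture's version works on `argv` directly at the top of the file.

```python
import sys


def greeting(argv):
    """Return a greeting based on a list shaped like sys.argv."""
    if len(argv) == 2:
        # argv[0] is the script name, argv[1] is the first word after it
        return f"hello, {argv[1]}"
    else:
        return "hello, world"


print(greeting(sys.argv))                # depends on how the script is run
print(greeting(["greet.py"]))            # hello, world
print(greeting(["greet.py", "David"]))   # hello, David
```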
If you want to see all of the words, you could do something like this. Just as-- if
we combine ideas, here-- for i in range of, how about, length of argv. Then, I can
do this-- print argv bracket i. All right. A little cryptic, but line 3 is just a
for loop iterating over the range of length of argv. So if the human types in two
words, the length of argv will be 2. So this is just a way of saying iterate over
all of the words in argv, printing them one at a time.
So python of greet.py, Enter just prints out the name of the program. python of
greet.py with David prints out greet.py and, then, David. I can keep running it
though with more words, and they'll each get printed one at a time. But what's
nice, too, about Python-- and this is the point of this exercise-- honestly, this
looks pretty cryptic. This is not very pleasant to look at. If you just want to
iterate over every word in a list, which argv is, watch what I can do.
I can do for arg or any variable name in argv. Let me just, now, print out that
argument. I could keep calling it i, but i seems weird when it's not a number. So
I'm changing to arg as a word, instead. If I now do python of greet.py, it does
this. If I do python of greet.py, David, it does that again. David Malan, it does
that again. So this is, again, why Python is just very appealing. You want to do
something this many times, iterate over a list? Just say it, and it reads a little
more like English. And there's even other fanciness, too, if I may.
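The two loops compare like this. A hard-coded list stands in for `sys.argv` here so the example behaves the same however it is run.

```python
argv = ["greet.py", "David", "Malan"]  # stand-in for sys.argv

# Index-based loop, closer to how it would be written in C
for i in range(len(argv)):
    print(argv[i])

# Iterating over the list directly
for arg in argv:
    print(arg)
```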
It's a little stupid that I keep seeing the name of the program, greet.py, so it'd
be nice if I could remove that. Python also supports what are called slices of
arrays-- sorry, slices of lists. Even I get the terminology confused. If argv is a
list, then it's going to print out everything in it. But if I want a slice of it
that starts at location 1 all the way to the end, you can use this funky syntax in
between the square brackets, which we've not seen yet, that's going to start at
item 1 and go all the way to the end.
And so, this is a nice, clever way of slicing off, if you will, the very first
element because now, when I run greet.py, David Malan, I should only see David and
Malan. If I only want one element, I could do 1 to 2. If I want all of them, I
could do 0 onward. I could give myself just one of them in this way. So you can
play with the start value and the end value in this way, to slice and dice these
lists in different ways.
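A few slices, again on a hard-coded list standing in for `sys.argv`:

```python
argv = ["greet.py", "David", "Malan"]  # stand-in for sys.argv

print(argv[1:])   # ['David', 'Malan']  (from item 1 to the end)
print(argv[1:2])  # ['David']           (item 1 up to, not including, item 2)
print(argv[0:])   # ['greet.py', 'David', 'Malan']  (everything)

# Slicing off the program name before looping
for arg in argv[1:]:
    print(arg)
```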
That would have been a pain in C, just because we didn't really have the built-in
support for manipulating arrays as cleanly as this. All right. Just so you've seen
it, too-- though, this one is less exciting to see live-- if I go ahead and create
a quick program here, it turns out, there's something else in the sys library, the
ability to exit programs-- either exiting with status code 1 or 0, as we've been
doing any time something goes right or wrong.
So, for instance, let me whip up a quick program that just says, if the length of
sys.argv does not equal 2, then let's yell at the user and say you're missing a
command-line argument. And let's, then, call
sys.exit(1). Else, let's go ahead and, logically, just print a formatted string
that says hello-- as before-- sys.argv 1. Now, things look different all of a
sudden, but I'm doing something deliberately.
First, let's see what this does. So, on line 1, I'm importing not argv,
specifically. I'm importing the whole sys library, and we'll see why in a second.
Well, it turns out that the sys library has not only the argv list, it also has a
function called exit, which I'd like to be able to use, as well. So it turns out
that, if you import a whole library in this way, that's fine. But you have to refer
to the things inside of it by using that same library's name and a dot to namespace
it, so to speak.
So here, I'm just saying, if the user does not type in two words, yell at them with
missing command line argument, and then, exit with 1. Just like in C, when you do
exit 1, just means something went wrong. Otherwise, print out hello to this. And
this is starting to look cryptic, but it's just a combination of ideas. The curly
braces means interpolate this value, plug it in here. sys.argv is just the verbose
way of saying go into the sys library and get the argv variable therein. And
bracket 1, of course, just like arrays in C, is just the second element at the
prompt.
So when I run this version, now-- python of exit.py-- with no arguments, I get
yelled at in this way. If, however, I type in two arguments total-- the name of the
file and my own name-- now, I get greeted with hello, David. And it's the same idea as
before. This was a very low-level technique, but same thing here. If you do echo
dollar sign question mark Enter, you'll see the exit code of your program.
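Here is a sketch of exit.py's logic. Putting it in a function called `main` that takes a list is an assumption made for this example, so that both cases can be exercised in one run; `sys.exit` works by raising `SystemExit`, which the demo loop catches in order to show the status code.

```python
import sys


def main(argv):
    """Greet the user, or exit with status 1 if the argument is missing."""
    if len(argv) != 2:
        print("Missing command-line argument")
        sys.exit(1)  # nonzero status signals that something went wrong
    print(f"hello, {argv[1]}")


# Demonstrate both cases with hand-built lists instead of the real sys.argv
for args in (["exit.py"], ["exit.py", "David"]):
    try:
        main(args)
    except SystemExit as e:
        print("exit status:", e.code)
```

In an actual script, the call would simply be `main(sys.argv)` with no try/except around it, so that the status reaches the shell.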
Let me go ahead and create a program called names.py that's just going to be an
opportunity to, maybe, search over a whole bunch of names. Let me go ahead and
import sys, just so I have access to exit. And let me go ahead and create a
variable called names that's going to be a list with a whole bunch of names. How
about here? Charlie and Fred and George and Ginny and Percy and, lastly, Ron. So a
whole bunch of names here.
And it'd be a little annoying to implement code that iterates over that, from left
to right, in C, searching for one of those names. In fact, what name? Well, let's
go ahead and ask the user to input the name that they want to search for so that we
can tell them if the name is there or not. And we could do this, similar to C, in
Python, doing something like this. So for n in names, where n is just a variable to
iterate over each name-- if the name I'm looking for equals the current name in the
list-- AKA n-- well, let's print out something friendly, like "Found." And then,
let's do sys.exit 0 to indicate that we found whoever that is.
Otherwise, if we get all the way to the bottom here, outside of this loop, let's
just print "Not found" because we haven't exited yet. And then, let's just exit
with 1. Just to be clear, I can continue importing all of sys, or I could do from
sys import exit, and then, I could get rid of sys dot everywhere else. But
sometimes, it's helpful to know exactly where functions came from. So this, too, is
just a matter of style, in this case. All right. So let's go ahead and run this.
python of names.py, and let's look for Ron, all the way at the end. All right. He's
found. And let's search for someone outside of the family here, like Hermione. Not
found. OK. So it seems to be working in this way. But I've essentially implemented
what algorithm? What algorithm would this seem to be, per line 7 and 8 to 9 and 10?
AUDIENCE: Linear.
DAVID MALAN: Yeah. So it's just linear search. It's a loop, even though the syntax
is a little more succinct today, and it's just iterating over the whole thing.
Well, honestly, we've seen an even more terse way to do this in Python. And this,
again, is what makes it a more pleasant language, sometimes. Why don't I just do
this? Instead of iterating one at a time, why don't I just say this? Let me go
ahead and change my condition to just be-- how about if the name we're looking for
is in the names list, we're done. We found it.
Use the in preposition that we've seen a couple of times, now, that itself asks
the question, is something in something else? And Python will take care of linear
search for us. And it's going to work exactly the same if I do python of names.py,
search for Ron. It's still going to find him and it's still going to do it
linearly, in this case. But I don't have to write all of the lower-level code
myself, in this case. Questions, now, on any of this? The code's just getting
shorter and shorter. No? What about-- let's see. What else might we have here?
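The two searches can be sketched as follows. One deliberate difference from the lecture's version: these hypothetical functions return the status (0 for found, 1 for not found) rather than calling `sys.exit`, so that both can be run in the same program.

```python
names = ["Charlie", "Fred", "George", "Ginny", "Percy", "Ron"]


def search_loop(name):
    """Linear search written out by hand."""
    for n in names:
        if name == n:
            return 0  # found
    return 1          # reached the end without finding it


def search_in(name):
    """Same linear search, delegated to Python's in operator."""
    return 0 if name in names else 1


for who in ("Ron", "Hermione"):
    print(who, "Found" if search_in(who) == 0 else "Not found")
```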
How about this? Let's go ahead and implement that phonebook that we started,
metaphorically, with in the beginning of the course. Let's code up a program called
phonebook.py. And in this case, let's go ahead and let's create a dictionary this
time. Recall that a dictionary is a little something that implements something like
this-- a two-column table that's got keys and values, words and definitions, names
and numbers. And let's focus on the last of those, names and numbers, in this case.
Well, I claimed earlier that Python has built-in support for dictionaries-- dict
objects-- that you can create with one line. I didn't need it for speller because a
set is sufficient when you only want one of the keys or the values, not both. But
now, I want some names and numbers. So it turns out, in Python, you can create an
empty dictionary by saying dict open parenthesis, closed. And that just gives you,
essentially, a chart that looks like this, with nothing in it.
Or there's more succinct syntax. You can, alternatively, do this, with two curly
braces, instead. And, in fact, I've been using a shortcut all this time. When I had
a list, earlier, where my variable was called scores, and I did this, that was
actually the shorthand version of this-- hey, Python, give me an empty list. So
there's different syntax for achieving the same goal. In this case, if I want a
dictionary for people, I can either do this or, more commonly, just two curly
braces, like that.
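The long and short forms are equivalent, as this small sketch shows:

```python
# Two ways to get an empty dictionary
people = dict()
people = {}

# Two ways to get an empty list
scores = list()
scores = []

print(dict() == {}, list() == [])  # True True
```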
All right. Well, what do I want to put in this? Well, let me actually put some
things in this. And I'm going to just move my closed curly brace to a new line. If
I want to implement this idea of keys and values, the way you do this in Python is
key colon value comma. Key colon value. So you'd implement it more in code. So, for
instance, if I want Carter to be the first key in my phone book and I want his
number to be +1-617-495-1000, I can put that as the corresponding value.
The colon is in between. Both are strings, or strs, so I've quoted both
deliberately. If I want to add myself, I can put a comma. And then, just to keep
things pretty, I'm moving the cursor to the next line. But that's not strictly
required, aesthetically. It's just good style. And here, I might do +1-949-468-
2750. And now, I have a dictionary that, essentially, has two rows, here-- Carter
and his number and David and his number, as well. And if I kept adding to this,
this chart would just get longer and longer.
Suppose I want to search for one of our numbers. Well, let's prompt the user for
the name, for whose number you want to search by getting string. Or you know what?
We don't need this CS50 library. Let's just use input and prompt the user for a
name. And now, we can use this super terse syntax and just say if name in people,
print the formatted string number colon and-- here, we can do this-- people bracket
name.
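A rough sketch of the program as described so far might look like the following. The lookup helper is hypothetical, added only so the example runs without prompting; in the lecture, the name comes straight from input, and the exact prompt and output wording here are assumptions.

```python
# A dictionary mapping names (keys) to phone numbers (values).
people = {
    "Carter": "+1-617-495-1000",
    "David": "+1-949-468-2750",
}


def lookup(name):
    """Return the line the program would print, or None if name is absent."""
    if name in people:
        return f"Number: {people[name]}"
    return None


# In the lecture, the name comes from input() rather than being hard-coded.
print(lookup("Carter"))
```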
OK. So this is getting cool quickly, confusingly. So let me run this. python of
phonebook.py. Let's type in Carter. And, indeed, I see his number. Let's run it
again with David, and I see my number here. So what's going on? Well, it turns out
that a dictionary is very similar, in spirit, to a list. It's actually very
similar, in spirit, to an array in C. But instead of being limited to keys that are
numbers, like bracket 0, bracket 1, bracket 2, you can actually use words. And
that's all I'm doing here on line 8.
If I want to check for the name Carter, which is currently in this variable called
name, I can index into my people dictionary using not a number, but using,
literally, a string-- the name Carter or David or anything else. To make this
clearer, too, notice that I'm, at the moment, using this format string, which is
adding some undue complexity. But I could clarify this, perhaps, further as this.
I could give myself another variable called number, set it equal to the people
dictionary, indexing into it using the current name. And now, I can shorten this to
make it clearer that all I'm doing is printing the value of that. And, in fact, I
can do this even more cryptically. This would be weird to do, but if I only ever
want to show David's phone number and never Carter's, I can literally, quote
unquote, "index into" the people dictionary because, now, when I run this, even if
I type Carter, I'm going to get back my number instead.
But that's all that's happening if I undo that, because that's now a bug. But I
index into it using the value of name. Dictionaries are just so wonderfully
convenient because, now, you can associate anything with anything else but not
using numbers, but entire key words, instead. So here's how, if, in speller, we
gave you not just words, but hundreds of thousands of definitions, as well, you
could essentially store them as this. And then, when the human wants to look up a
definition in a proper dictionary, not just for spell checking, you could index
into the dictionary using square brackets and get back the definition in English,
as well.
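To make that concrete, here is a small sketch of a dictionary of definitions. The two entries are hypothetical placeholders, not taken from any real dictionary file.

```python
# Hypothetical entries; a real dictionary file would supply these.
definitions = {
    "cat": "a small domesticated feline",
    "dog": "a domesticated canine",
}

word = "cat"
if word in definitions:
    print(definitions[word])

# Indexing with a missing key raises KeyError, so .get() with a
# default is a common way to handle words that are not present.
print(definitions.get("bird", "no definition found"))
```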
AUDIENCE: Is the way this code does, as presented, saying that Python has
[INAUDIBLE]?
DAVID MALAN: A really good question. So, to summarize, how is Python finding that
name within that dictionary? This is where, honestly, speller in p-set 5 is what
Python's all about. So you have struggled, are struggling with implementing your
own spell checker and implementing your own hash table. And recall that, per last
week, the goal of a hash table is to, ideally, get constant time access: not
something linear, which is slow, and even better than something logarithmic, like
log base 2 of n.
So Python and the really smart people who invented it, they have written the code
that does its best to give you constant time searches of dictionaries. And they're
not always going to succeed, just as you, in your own problem set, are probably
going to have some collisions once in a while and start to have chains or linked
lists of words. But this is where, again, you defer to someone else, someone
smarter than you, someone with more time than you to solve these problems for you.
And if you read Python's documentation, you'll see that it doesn't guarantee
constant time, but it's going to, ideally, optimize the data structure for you to
get as fast as possible. And of all of the data structures like a dictionary, a
hash table is, really, like the Swiss army knife of computing because it just lets
you associate something with something else. And even though we keep focusing on
names and numbers, that's a really powerful thing because it's more powerful than
lists and arrays, which are only numbers and something else. Now, you can have any
sorts of relationships, instead.
All right. Let me show a few other examples before we culminate with some more
powerful techniques in Python, thanks to libraries. How about this problem we
encountered in week 4, which was this. Let me code up a program called, again,
compare.py here but, this time, compare two strings and not numbers. So let me, for
instance, get one string from the user called s.
Just for the sake of discussion, let me get another string from the user called t
so that we can actually do some comparison here. And if s equals equals t, let's go
ahead and print out that they're the same. Else, let's go ahead and print out that
they're different. So this is very similar to what we did in week 4. But in week 4,
recall we did this specifically because we had encountered a problem.
For instance, if I run-- whoops. If I run-- what's going on? [INAUDIBLE] Come on.
Oh. OK. Wow, OK. Long day. All right. If I run the proper command, python of
compare.py, then let's go ahead and type in something like "cat" in all lowercase,
"cat" in all lowercase. And they're the same. If, though, I do this again with
"dog" and "dog," they're the same. And, of course, "cat" and "dog," they're
different. But does anyone recall, from two weeks ago, when I typed in my name
twice, both identically capitalized?
What did it say? That they were, in fact, different. And why was that? Why were two
strings in C different, even though I typed literally the same thing? Two different
places in memory. So each string might look the same, aesthetically, but, of
course, was stored elsewhere in memory. And yet, Python appears to be using the
equality operator-- equals equals-- like you and I would expect, as humans--
actually comparing for us char by char in each of those strings for actual
equality.
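As a minimal sketch of that comparison: the compare helper and the exact words "Same" and "Different" are assumptions, chosen so the example runs without prompting for input.

```python
def compare(s, t):
    """Compare two strings by their characters, as == does in Python."""
    if s == t:
        return "Same"
    return "Different"


print(compare("cat", "cat"))
print(compare("cat", "dog"))
```

In the lecture's version, s and t each come from input, and the result is printed directly inside the if and else branches.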
So this is a feature of Python, in that it's just easier to do. And why? Well, this
derives from the reality that, in Python, there are no pointers anymore. There's no
underlying memory management. It's not up to you, now, to worry about those lower-
level details. The language itself takes care of that for you. And so, similarly,
if I do this and don't ask the user for two strings, but just one, and then, I do
something like this. How about give myself a second variable t, set it equal to
s.capitalize, which, note, is not the same as upper; capitalize, by design, per
Python's documentation, will only capitalize the first letter for you-- I can now
print out, say, two fstrings here-- what the value of s is and, then, let me print
out, with another fstring, what the value of t is.
And recall that, in C, this was a problem because if you capitalize s and store it
in t, we accidentally capitalized both s and t. But in this case, in Python, when I
actually run this and type in "cat" in all lowercase, the original s is unchanged
because, when I use capitalize on line 3, this is, indeed, capitalizing s. But it's
returning a copy of the result. It cannot change s itself because, again, for that
technical term, s is immutable. Strings, once they exist, cannot be changed
themselves. But you can return copies, modified or mutated copies, of those same
strings.
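Here is a short sketch of that behavior, with the input hard-coded as "cat" so the example runs without a prompt. The try block at the end is an addition for illustration: it shows what happens if you attempt to change a string in place.

```python
s = "cat"
t = s.capitalize()  # returns a new string; s itself is untouched

print(f"s: {s}")
print(f"t: {t}")

# Strings are immutable: assigning to a character raises TypeError.
try:
    s[0] = "C"
except TypeError as error:
    print(f"cannot modify s in place: {error}")
```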
So, in short, all of those headaches we encountered in week 4 are now solved,
really, in the way you might expect. And here's another one that we dwelled on in
week 4, with the colored liquid in glasses. Let me code up a program called
swap.py. And in swap.py, let me set x equal to 1, y equal to 2. And then, let me
just print out an fstring here. So how about x is this comma y is that. And then,
let me do that twice, just for the sake of demonstration.
And in here, recall that we had to create a swap function. But then, we had to pass
it in by reference with the ampersand. And oh my god, that was peak complexity in
C. Well, if you want to swap x and y in Python, you could do x comma y equals y
comma x. And now, python of swap.py. And there we go. All of that's handled for
you. It's like a shell game without even a temporary variable in mind.
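The swap just described can be sketched as follows; the wording of the printed line is an approximation of what is typed in the lecture.

```python
x = 1
y = 2

print(f"x is {x}, y is {y}")
x, y = y, x  # the right side is packed into a tuple, then unpacked
print(f"x is {x}, y is {y}")
```

No temporary variable appears in the code because Python evaluates the whole right-hand side before assigning anything on the left.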
So what more can we do here? How about a few final building blocks? And these
relate, now, to files from week 4. Suppose that I want to save some names and
numbers in a CSV file-- Comma Separated Values, which is like a very lightweight
spreadsheet. Well, first, let me create a phonebook.csv file that just has name
comma number as the first row there.
But after that, I'm going to go ahead, now, and code up a phonebook.py program that
actually allows me to add things to this phonebook. So let me split my screen here
so that we can see the old and the new. And down here, in my code for phonebook.py,
in this new and improved version, I'm going to actually import a whole other
library, this one called CSV. And here, too, people in data science, especially,
and the like really like being able to manipulate files and data that might very
well be stored in spreadsheets or CSVs-- Comma Separated Values, which we saw
briefly in week 4.
After this, I want to get, maybe, a name from the user. So let's prompt the user
for some input for their name. And then, let's prompt the user for a number, as
well, using input prompting for number. All right. And now, this is a little
cryptic, and you'd only know this from the documentation. But if you want to write
rows to a CSV file that you can, then, view in Excel or the like, you can do this--
give me a variable called writer-- but I could call it anything I want.
Let me use a csv.writer function that comes with this CSV library, passing in the
file. This is like saying, hey, Python, treat this open file as a CSV file so that
things are separated with commas and nicely formatted in rows and columns. Now, I'm
going to do this-- use that writer to write a row. Well, what do I want to write? I
want to write a short list-- namely, the current name and the current number-- to
that file, but I don't want to use fprintf and %s and all of that stuff that we
might have had in the past. And now, I just want to close the file.
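A minimal sketch of those steps follows. The add_entry function is a hypothetical wrapper so the example can run without prompting; the transcript does not spell out the call that opens the file, so append mode ("a") and newline="" are assumptions here.

```python
import csv


def add_entry(path, name, number):
    """Append one row (name, number) to the CSV file at path."""
    # Append mode ("a") is an assumption; the csv docs recommend newline="".
    file = open(path, "a", newline="")
    writer = csv.writer(file)       # treat the open file as a CSV
    writer.writerow([name, number])  # a short list becomes one row
    file.close()
```

Calling add_entry("phonebook.csv", "David", "+1-949-468-2750") would then add one line to the file.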
Let me reopen my terminal. Let me run python of phonebook.py, and let me type in
David and then +1-949-468-2750 and, crossing my fingers, watching the actual CSV at
top-left. My code has just added me to the file. And if I were to run it again, for
instance, with Carter and +1-617-495-1000, crossing my fingers again-- we've
updated the file. And it turns out, there's code now, via which I can even read
that file. But I can, first, tighten this up, just so you've seen it.
It turns out, in Python, it's so common to open files and close them. Humans make
mistakes, and they often forget to close files, which might, then, end up using
more memory than you intend. So you can, alternatively, do this in Python so that
you don't have to worry about closing files. You can use this keyword instead. You
can say with the opening of this file as a variable called file do all of the
following underneath.
So I'm indenting most of my code. I'm using this new, Python-specific keyword
called with. And this is just a matter of saying, with the following opening of
the file, do those next four lines of code, and then, automatically close it for me
at the end of the indentation. It's a minor optimization, but this, again, is the
pythonic way to do things, instead. How else might I do this, too?
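The with form might be sketched like this. As before, add_entry is a hypothetical wrapper, and append mode and newline="" are assumptions rather than details taken from the transcript.

```python
import csv


def add_entry(path, name, number):
    """Append one row to the CSV file, closing it automatically."""
    with open(path, "a", newline="") as file:
        writer = csv.writer(file)
        writer.writerow([name, number])
    # Leaving the indented block closes the file, even after an error.
```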
Well, it turns out that the code I've written here-- on line 9, especially-- is a
little fragile. If any human opens this spreadsheet-- the CSV file in Excel, Google
Spreadsheets, Apple Numbers-- and maybe moves the columns around just because,
maybe, they're fussing. They saved it, and they don't realize they've, now, changed
my assumptions. I don't want to, necessarily, write name and number always in that
order because what if someone screws up and flips those two columns by literally
dragging and dropping?
So it turns out that, instead of using a list here, we can use another feature of
this library, as follows. Instead of using a writer, there's something called a
dictionary writer or dict writer that takes the same argument as input-- the file
that's opened. But now, the one difference here is that you need to tell this
dictionary writer that your field names are name and number. And let me close the
CSV here.
Name and number are the names of the fields, the columns in this CSV file. And when
it comes time to write a new row, the syntax here is going to be a little uglier,
but it's just a dictionary. The name I want to write to the dictionary is going to
be whatever name the human typed in. The number that I want to write to the CSV
file is going to be whatever the number the human typed in. But what's different,
now, about this code is, by simply using a dictionary writer here instead of the
generic writer, now, the columns can be in this order or this order or any order.
And the dictionary writer is going to figure out, based on the first line of text
in that CSV, where to put name, where to put number. So if you flip them, no big
deal. It's going to notice, oh, wait, the columns changed. And it's going to insert
the columns correctly. So just, again, another more powerful feature that lets you
focus on real work, as opposed to actually getting tied up in the weeds of writing
code like this, otherwise.
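Here is a hedged sketch of the dictionary-writer idea. One caveat: csv.DictWriter orders columns by the fieldnames it is given and does not inspect the file itself, so this sketch reads the header row first. That extra step is an addition, not part of the lecture's code, and it assumes the file already exists with a header containing exactly name and number.

```python
import csv


def add_entry(path, name, number):
    """Append a row, matching whatever column order the file's header uses."""
    # Read the existing header so the column order follows the file.
    with open(path, newline="") as file:
        fieldnames = next(csv.reader(file))

    with open(path, "a", newline="") as file:
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        # Keys are matched to columns by name, not by position.
        writer.writerow({"name": name, "number": number})
```

If someone has flipped the columns so that the header reads number,name, the new row is written in that same order.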
Questions on this one, as well? But what we will do, now, is come full circle to
some of the more sophisticated examples with which we began, and I'm going to go
back over to my own Mac laptop here, where I've got my own terminal window up and
running, and I was just going to introduce a couple of final libraries that really
speak to just how powerful Python can be and how quickly you can get up and
running.
To be fair, you can't necessarily do all of these things in the cloud, like in
Codespaces, because you need access to your own speakers or microphone or the like. So
that's why I'm doing it on my own Mac, here. But let me go ahead and open up a
program called speech.py. And I'm not using VS Code here. I'm using a program
called VI that's entirely terminal window based. But it's going to allow me, for
instance, to import the Python text to speech version 3 library.
I'm going to give myself a variable called engine that's going to be set equal to
the Python text to speech 3 library's init method, which is just going to
initialize this library that relates to text to speech. I'm going to, then, use the
engine's say function to say something like, how about, hello comma world. And
then, as my last line, I'm going to say engine.runAndWait, capitalized as such, to
tell my program, now, to run that speech and wait until it's done.
All right. I'm going to save this file. I'm going to run python of speech.py. And
I'm going to cross my fingers, as always, and--
DAVID MALAN: All right. So now, I have a program that's actually synthesizing
speech using a library like this. How can I, now, modify this to be a little more
interesting? Well, how about this? Let me go ahead and prompt the user for their
name, like we've done several times here, using Python's built-in input function.
And now, let me go ahead and use a format string in conjunction with this library,
interpolating the value of name there.
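A sketch of speech.py along these lines follows. It assumes the third-party pyttsx3 package is installed and that the machine has audio output. The greeting and speak helpers are hypothetical, and the import sits inside speak only so the rest of the example runs without the package.

```python
def greeting(name):
    """Build the sentence to be spoken."""
    return f"hello, {name}"


def speak(text):
    """Speak text aloud; requires the third-party pyttsx3 package."""
    import pyttsx3  # imported here so the rest runs without the package

    engine = pyttsx3.init()  # initialize the text-to-speech engine
    engine.say(text)         # queue up the text
    engine.runAndWait()      # speak it and block until finished
```

Calling speak(greeting(input("What's your name? "))) would reproduce the demo; the prompt wording is an assumption.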
DAVID MALAN: OK. It's a weird choice of inflection, but we're starting to
synthesize voice, not unlike Siri or Google Assistant or Alexa or the like. Now, we
can, maybe, do something a little more advanced, too. In addition to synthesizing
speech in this way, we could synthesize, for instance, an actual graphic. Let me go
ahead, now, and do something like this. Let me create a program called qr.py.
I'm going to go ahead and import a library called OS, which gives you access to
operating system related functionality in Python. I'm going to import a library
I've pre-installed called qrcode, which is a two-dimensional barcode that you might
have seen in the real world. I'm going to go ahead and create an image variable
using this qrcode library's make function, which, per its documentation, takes a
URL, like one of CS50's own videos.
And then, lastly, let's just go ahead and open with the command open qr.png on my
Mac so that, hopefully, this just automatically opens. All right. I'm going to go
ahead and just double-check my syntax here so that I haven't made any mistakes. I'm
going to go ahead and run python of qr.py. Enter. That opens up this. Let me go
ahead and zoom in. If you've got a phone handy and you'd like to scan this code
here, whether in person or online-- I apologize. You won't appreciate it.
Amazing! OK. And, lastly, let me go back into our speech example here, create a
final ending here in our final moments. And how about we just say something like
"This was CS50," like this. Let's go ahead, here. Fix my capitalization, just for
tidiness. Let's get rid of the name. And now, with our final flourish and your
introduction to Python equipped-- here we go--
[APPLAUSE]
[MUSIC PLAYING]
[MUSIC PLAYING] DAVID J. MALAN: All right. This is CS50, Harvard University's
introduction to the intellectual enterprises of computer science and the arts of
programming. My name is David Malan, and I actually took this course myself, back
in 1996. I was a sophomore at the time. I was actually concentrating in government,
because a year prior, as a first year, I'd come into Harvard thinking that I liked
history and constitutional law and similar classes in high school. And so when I
got here, I rather gravitated toward that which was familiar. I figured, if I liked
and if I were good at that particular subject in high school, then that's
presumably who I'm supposed to be here.
But it wasn't until sophomore year that I got up the nerve to step foot in the CS50
classroom, and even then, it was only out of curiosity. Like I had no intention of
studying computer science or even taking CS50 when I got to campus. But people were
talking about it, and there was a lot of beware. And it was perhaps for the
initiated only, and I didn't really know, ultimately, what computer science was.
But for me, the light bulb went off. I found that, contrary to what I'd seen in
high school, where I saw friends of mine like programming away in the computer lab,
heads down, antisocially just doing whatever it was they were doing, it really
wasn't that, once I got to this particular class and this particular place. It was
much more about problem solving more generally and just learning how to express
yourself in code, in different languages. So that you can actually solve problems
of interest to you. Even if you have no intention of being a computer scientist or
an engineer, but just want to be able to solve problems, analyze data, do
interesting things, in the arts, humanities, social sciences, physical sciences, or
really any other field.
And indeed, this particular path led me to computer science, but the hope for CS50
more generally is that, indeed, you just find your way to applying principles that
you'll learn over the coming months to whatever field is of interest to you. With
that said, it was definitely a lot of work and not without its frustrations for me.
But there was no better feeling than like banging your head proverbially against the
wall for some number of hours, even days, trying to fix a bug, a mistake in your
code. And then, oh my God, the rush of emotion, of accomplishment, of pride, of
exhaustion when you finally solve some problem that's really been weighing on you.
It's just so incredibly gratifying but also empowering.
Because unlike a lot of fields, like computer science was built by humans
themselves. And so if a human built this, surely, you, another human, can
understand it as well. And so even though there's going to be some distractions
along the way, you're going to see what looks incredibly cryptic, if you've never
programmed before. Over time and with practice, everything just starts to make more
sense. And with time and with practice, you just get better at this particular
field.
And indeed, really, the key to success in programming in general is just to allow
yourself enough time. And so at least, thankfully, I quickly got into the habit of
starting early in the week, for instance, when writing actual code. Why? Because
you're going to run up against a wall. You're not going to see some bug.
Something's not going to jump out at you, and that's fine. That's when you call it
a day, take a break, move onto something else, and then just come back to it. And
that's what keeps programming fun for me, even all of these years later, whether
it's teaching or actually applying it.
But there's, down the road, a history of an MIT hack, and it looked a little
something like this, in yesteryear. And there was a little sign from the MIT students
when they made this hack. On the wall, it says, getting an education from MIT is
like drinking from a fire hose, which indeed they have connected to what should
have been otherwise just a water fountain. And that's going to be what it feels
like, sometimes, not just in computer science per se, but just an unfamiliar field.
If you're not from STEM, if you're not from CS, that's fine. But so much of it,
ultimately, is going to be absorbed by you and going to be within your grasp by
term's end. So just keep in mind, that's very much the intent, but you'll be amazed
what you're able to create, to accomplish, just three or so months hence. Indeed,
2/3 of you, contrary to what you might think or assume, have never taken a CS
class before. So it's absolutely not the case that the person to the left or the
right surely must know more than you. Indeed, it's quite the opposite.
And as you'll see in the coming weeks, as you write your own code and solve your
own problems, what ultimately matters in this course is not so much where you end
up relative to your classmates but where you end up relative to yourself when you
began. And it really is all about that delta, whether you've programmed or not,
just getting something out of a class like this. And if it does take time, and if
you do feel those frustrations, but you simultaneously eventually feel that sense
of accomplishment, that just means it's all working. And indeed, hopefully, all the
more worthwhile and gratifying, ultimately, as a result.
So what are we going to do in the coming weeks? So here we are in week zero. We'll
soon see why computers and computer scientists start counting, if you will, from 0.
But week 0 is one in which we explore computational thinking, thinking like a
computer, and starting to clean up your thought processes. Getting you to think, to
solve problems more methodically, and then ultimately, translating that into code.
And some of you might recognize this environment here, a.k.a. Scratch,
coincidentally also from MIT. You might have used it in grade school. We'll use it
today and a little bit this weekend in the course's first homework assignment or
problem set. But not so much to play around in a way that you might have if you did
use it in yesteryears, but to explore ideas of computer science and programming
that we're going to use and reuse every week hereafter as well.
Thereafter, we're going to transition just next week to week one, so to speak.
Whereby, we'll introduce you to a more traditional language, a lower level
language, an older language called C. And in C, you're going to use your keyboard,
not so much your mouse and pointing and clicking, but you're going to write code
that soon is going to look a little something like this. And if you've programmed
before, you can probably glean what this is going to do. If you've never programmed
before, which is the case for most of you, this too will soon make sense.
But this is the most canonical program that most any programmer ever writes called
Hello, World, and indeed, that and all of the surrounding syntax above and below
just that sentence Hello, World, will soon make all the more sense.
We'll also take a look thereafter at bugs. A bug is a mistake in a program. Here is
an actual bug in an actual computer in yesteryear, but we'll teach you how to debug
programs, find your own mistakes, find others' mistakes, and improve that code as
well. We'll transition then to algorithms, step-by-step instructions for solving
some problems, which we'll touch on today too. And if you picture here, this is
actually a pretty representative problem. Odds are, you haven't had to deal with
something like this, but it's representative sorting, for instance.
If you think of each of these small bars as being a small number, each of the
bigger bars as being a bigger number, you might wonder, well, how could you as a
human sort all of these bars, like get all the short bars over here, all the big
bars over there? Well, odds are, if you're like me, you would probably kind of
eyeball it, and if you could physically interact, you might just start grabbing the
smallest elements first, put them over on the left. Maybe grab the biggest
elements, put them over on the right.
But what's your algorithm there? Like how would you teach someone younger than you,
who's never done that before, how to do it? How would you compel your Mac or PC or
phone to do something like that? You can't just wave your hand, and say, oh, figure
it out. Move things around. You have to express yourself more methodically. So
we'll translate even ideas like this into code too. And that's what the Googles and
others of the world are doing constantly, as they sort and organize the world's
information.
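One way to express the "grab the smallest first" idea methodically is sketched below in Python. The lecture does not name an algorithm here; this approach is usually called selection sort, and the function name and sample numbers are assumptions made for illustration.

```python
def selection_sort(numbers):
    """Return a sorted copy by repeatedly picking the smallest remaining value."""
    remaining = list(numbers)  # copy so the caller's list is unchanged
    result = []
    while remaining:
        smallest = min(remaining)   # find the shortest "bar" still left
        remaining.remove(smallest)  # removes the first occurrence only
        result.append(smallest)     # place it next, from the left
    return result


print(selection_sort([5, 2, 7, 4, 1, 6, 3, 0]))
```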
We'll use metaphors along the way, if it helps. We'll talk about your computer's
memory as being like a postal address. Like every mailbox in the world has some
form of postal address, street, city, state, country, and the like, and it turns
out, that's how your Mac, your PC, and your phone also work. You've got a whole
bunch of memory, like the picture before, but you can think of it really as
individual mailboxes.
And you can put anything you want in those mailboxes, and you can go to a mailbox
to grab information that's from it. So at the end of the day, that's really all
your computer is doing with information. It's just organizing it, not into
mailboxes per se, but a term you probably know called bytes, for instance, instead.
We'll talk about problems that arise even nowadays. In fact, most of you are
familiar with your Mac, PC, even phone like spontaneously rebooting sometimes,
crashing, the little annoying spinning beach ball or hourglass icon that happens.
Like what is with that? Well, those are just bugs in programs that humans at Apple
and Google and Microsoft and others, they screwed up, and they wrote buggy code.
And your computer, when it encounters those mistakes, doesn't know what to do. And
so 9 times out of 10, so to speak, it just crashes or freezes or the like, but that
kind of stuff will make more sense.
So even the real world will make sense, and pictured here are some lower level
terms we'll eventually get to mid-semester. But generally speaking, when something
is going this way, as per this arrow, and something is going this way, as per this
arrow, like that does not end well. And that often is what happens when your
computer crashes. Someone's using memory up here, but someone else is using memory
down here, and then they're not really talking left hand and right hand. So that is
just a high level overview of some of the problems we'll encounter, but we'll focus
too on data, ultimately.
So pictured here is something fairly technical called a hash table. It's an amalgam
of something we're going to soon call an array and also something we call a linked
list. And these are just fancy terms for describing how you can organize
information even more flexibly than just putting individual values in mailboxes.
Like how could you build structures, like actual data structures so to speak, two-
dimensional structures at that?
And so what you're seeing here is a glimpse, as some of you might have recognized,
of some Harry Potter universe names, but they're organized somewhat alphabetically.
And notice, that any time there's multiple people with a name that starts with H,
like Hermione, Harry, and Hagrid, well, they can't all fit in that mailbox, if each
of these squares along the left is that same mailbox. So you have to chain them
together.
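The chaining idea can be sketched in Python as follows. This is only an illustration under some assumptions: ordinary Python lists stand in for the linked lists, every name is assumed to start with an ASCII letter, and "Ron" is an extra sample name not mentioned above.

```python
def build_table(names):
    """Bucket names by first letter; each bucket is a chain (a Python list)."""
    table = [[] for _ in range(26)]  # one bucket per letter A-Z
    for name in names:
        index = ord(name[0].upper()) - ord("A")  # "A" -> 0, "H" -> 7, ...
        table[index].append(name)  # names sharing a letter share a chain
    return table


table = build_table(["Hermione", "Harry", "Hagrid", "Ron"])
print(table[ord("H") - ord("A")])  # ['Hermione', 'Harry', 'Hagrid']
```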
Well, you'll learn how to do that in code. So that even if you get more data than
you expect, if your business is booming, and you're some web-based business, how do
you keep adding and adding information to your software to actually keep up with
it? But this, again, is what code's going to soon look like, as soon as next week,
in week one, this here being C, but we'll transition in a few weeks to a more
modern, higher level language, so to speak, called Python.
Indeed, the course very deliberately, back in my day and now this, introduces you
first to C, which funny enough, many people don't tend to program in certainly
every day. I use C, generally, September, October, November, December, when
teaching CS50 itself. But it's everywhere, nonetheless.
In fact, even today's other languages, with which you might be familiar, like
Python and Java and yet others still, you see this same primitive language
underneath the hood, because it's so darn fast. And as you'll learn over the coming
weeks, it really gives you access to and an understanding of what's going on
conceptually down here. So that thereafter, after CS50, when you're writing code,
you can think at a very high level what's actually going on. So in fact, in just a
few weeks, what looks like this in C is going to look instead like this in Python.
And you'll better understand what's going on underneath the hood, and odds are,
after this class, you'll reach for a language, like Python more frequently than C,
but you're going to benefit from that bottom-up understanding thereof.
Thereafter and towards term's end, we'll introduce you to a few other ideas, like
where do you put large amounts of data? In things called databases, not things like
spreadsheets, like here, but actual databases, where, using those same kinds of data
structures, you lay things out in an interesting way in memory. Thereafter, we'll
transition to a very familiar environment that you and I use every day, the web.
Like the web has become rather the User Interface, or UI, that we use everywhere,
on the laptops, desktops, and even mobile devices, nowadays.
Well, pictured here is a language called HTML. It's not a programming language.
It's a markup language, and some of you might have made home pages or portfolios in
the past. But you'll understand what's going on here, but more powerfully, you'll
understand how the computer sees that same kind of code, builds up a hierarchical
family tree-type structure in memory. And then you can manipulate that tree with
code to actually add more and more information, chat messages, anything on the
screen that you like.
And finally, we'll tie all of this together by introducing what are called
frameworks and libraries, third-party code that makes it a lot easier to solve
problems of interest to you. And so in particular, here, this is the very first web
app that I myself made back in like 1997. I was part of the first-year intramural
sports program, not as an athlete but as the programmer, and I was teaching myself
how to build web applications. I only knew C and maybe a little bit of something
else at the time.
But this became, for Harvard at least, the very first website for the first-year
intramural sports program, and it wasn't just a static website with links and
images and the like. It was interactive. You could register for sports. We could
input exactly who was in a tournament bracket or the like, and it could actually
automatically keep track of this data. So there too, after just three months of a
class like this, you'll go from writing quite simply this week and next Hello,
World to building things like this for whether it's web, mobile, or other platforms
as well, if you so choose.
But we'll get you off of the course's infrastructure, by the end of the term. You
won't be using any toy environments along the way. We'll empower you, ultimately,
to write code after CS50, especially if this is the only CS class you ever take, on
your own Mac or PC, using the same software, but not the cloud-based version
thereof. But all of this software is itself free and can be used by you powerfully
after the course's own end.
But along the way, as you may know, there is this tradition within the class,
particularly in healthy times, of a number of events that really bring people
together, not just collaboratively and academically, but to just solve problems and
generally engage with each other as well. Coming up first, CS50 Puzzle Day, which
is meant to be not jigsaw puzzles but logic puzzles that require no prior
experience with computer science or programming. But it's just an opportunity to
quietly work on a packet of puzzles with some number of friends for prizes and
more.
Later in the semester, once you tackle your final projects, the capstone of the
course, where we don't give you a homework to write, you yourself come up with
something to build. We'll get together generally around 7:00 PM in the evening,
wrap up around 7:00 AM, if you so choose. And it's an evening, a 12-hour
opportunity to collaborate with classmates on your very own final project, in a
large space on campus, that ends-- if you're awake with us-- at 5:00 AM. We can hop
on some CS50 shuttles and go down the road for some pancakes at IHOP around 6:00.
Of course-- of course, this is 6:00, 7:00 AM at that point, but it's an opportunity
finally to lead into what's called the CS50 fair, which is an end of semester
celebration, an exhibition, of everything that you'll accomplish over the coming
months. And in fact, pictured here are some of your predecessors in healthy times.
The CS50 fair allows you to come with your laptop or phone and exhibits of
students, faculty, and staff across campus put together something in person and on
video that people can delight in seeing, as you exhibit what it is you created and
what you learned over the course of the several weeks. And ultimately, a chance to
just share and inspire others as well. And you'll all walk home, ultimately, with
your own I took CS50 T-shirts saying as much as well.
So with that high level overview of the course, I propose that we begin to take a
look at what computer science itself is and what it is we're going to be doing over
the next several weeks at this lower level [INAUDIBLE] too. So what is computer
science? Right? If you're maybe like me or knew people like my friends in high
school, you probably assume that it means programming. And that's absolutely a big
part of it for a lot of people, because with code, you can write, you can express
ideas, and solve actual problems, especially involving data.
But computer science itself is really the study of information, if you will. How do
you represent it, and how do you actually process it? And in that sense,
computational thinking is just the application of ideas from computer science, a
course like this, to problems of interest to you, again, in the arts, humanities,
sciences, social sciences, whatever the domain of interest is to you.
So with that, if computer science is all about information and with it the solving
of problems, well, what does it actually mean to solve a problem? Let's see if we
can't propose a model into which all of the lessons learned will ultimately follow.
And I'd propose that this is problem solving.
You've got some input, which is like the problem you want to solve. The goal is to
solve it. So that's the so-called output, and then somewhere in here, the
proverbial black box, is some kind of secret sauce that gets the work done. And in
the coming months, we'll have to decide, well, how are we going to represent these
inputs and outputs, and really, how do we code up? How do we write solutions for
what it is that's solving the problem of interest to us?
Yeah. So binary, so binary is indeed the system that computers somehow use. So in
this case, bi implying two, and so computers have two digits, it turns out, at
their disposal. And in fact, if you've ever heard the technical term bit, which is
like a smaller version of a byte-- more on that soon. Well, a binary digit is the
origin of that term "Bit," because if you get rid of some of the letters, and are
left from binary digit with just B-I-T, thus is a bit.
A bit is just a 0 or a 1. That's two digits, one more than you might have on your own
finger, and of course, it's fewer though than you and I have. You and I typically
use, as humans, the decimal system. Dec meaning 10, because you and I generally use
0 through 9.
So on the one hand-- another pun intended-- you've got unary. Computers use binary.
We humans generally think and talk in terms of decimal. But at the end of the day,
these are fundamentally going to be the same thing, which is to say that it's all
pretty accessible to us. Even if you're not a computer person, I daresay you're
about to be.
And so by human convention, let's just assume that if you were a computer, be it a
laptop, desktop, phone, or the like, and you want to represent the number 0, you
know what, you just keep the light switch off. You keep a light bulb off. If by
contrast, you're that same computer, and you want to represent the number 1, you
take that same switch, that same light bulb, and just turn it on. So a light bulb
that's on represents a 1, and a light bulb that's off represents a 0.
So why is this relevant to computers? Well, at the end of the day, you and I are
charging our laptops or phones at night. So there's some physical resource being
replenished there, whether you're on battery or some power cord.
And so inside of a computer are just thousands, millions of tiny little switches,
nowadays. You can think of them metaphorically as light bulbs, but they don't
actually shine light. But there are tiny, tiny little switches, and those switches,
if you've ever heard the term, are just called transistors.
So how might I do this? Well, let me go ahead and propose that I just grab one of
our own light bulbs here on stage. This one is off. So for instance, if this were
miniaturized inside of your Mac, PC, or phone, this would be a transistor, and
indeed, here's the little switch on the bottom.
And if your computer wants to represent a 0, it just leaves the switch off, and the
light is not shining. If you want to represent a 1, well now, I've counted as high
as 1, because the switch is now on. I've grabbed a little electricity. I'm holding
on to it inside of the computer, and so now I see that this is a 1.
All right, but unfortunately, with just one switch, one light bulb, I can only
count from 0 to 1. How do I count out higher, might you think, intuitively?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Yeah, so more light bulbs. So let me do this. Let me just grab
something to put these on, so I can use a few of them at a time. And let me propose
that here, instead of having just one light bulb, let me give myself maybe three in
total.
So all of them are initially off, and if you think of this in miniature form, in
your mind's eye, this is like a computer with three transistors. Three switches
representing now the number you and I know as 0. Why? They're just all off.
So how does a computer go about representing the number 1? Well, it turns on one of
these light bulbs. And how does the computer represent the number 2? Well, you
might think, if I may, you just turn on a second light bulb.
And if you might think, how does a computer represent 3? You just turn on the third
light bulb. And so as such, with three bits, a computer would seem to be able to
count from 0 on up to 1, 2, 3.
But it turns out, if I'm a little smarter here, I can actually count higher than
that. Why? Well, I'm just considering the combination of bulbs being on here. What
if I do something like this? This is still 0, I will claim, but what if I propose
now that this will be how a computer represents 1-- on, off, off.
This, though, will be how the computer represents 2. Notice, I didn't turn on the
same two. I'm just turning on the one in the middle. This I now claim will be how a
computer represents 3. This is going to be-- in just a second-- how a computer
represents the number we know as 4, and yet, I'm still only using three bulbs.
Why, though, do these represent the numbers we know as 0 through 7? Well, let me go
ahead, and maybe let's do this. Instead of just considering there to be light
bulbs, let's assign some special significance to each of them, based on where it
is. And maybe for this, could we get maybe three volunteers, three volunteers?
OK. You're being volunteered. OK. Come on up. If you want to go over to the stage
there. Yeah. You want to come on up as well, and over here as well. So there are
some stairs on either end. Maybe a round of applause for our first volunteers of
the term.
[APPLAUSE]
All right. So you want to be our number 1, and if you want to go ahead and stand
roughly right here. How about do you want to be number 2?
AUDIENCE: Yeah.
DAVID J. MALAN: Come on over right to the right of here, and you'll be number 4, it
turns out. If you want to come over here, on this end, let's give you all a moment
to introduce yourselves briefly to your classmates, if you'd like.
DAVID J. MALAN: Welcome. All right. So so glad to have all three of you up here.
Thank you.
[APPLAUSE]
Let me propose now that we'd like you three to represent how about the number 0.
And I claim now that if each of you now represents a switch, you have fancier light
bulbs now. One is a 1. One is a 2. One is a 4, but each of you still just has a
switch on the bottom, in fact, of your plastic devices.
I claim these three volunteers are representing the number 0. Let me ask you all
now, how might you represent the number 1? How should you cooperate here? OK. So we
would have on, off, off, which I think matches what I did with my three light bulbs
as well, if you want to go and turn yours off.
How might you three represent the number 2? OK, so off, on, off now, from right to
left. How would you three represent the number 3? Ah, so that's why my two light
bulbs went on at the end.
How would you three represent the number 4? Perfect. Number 5, number 6, and number
7? All right, and give us one more. How would you represent 8?
AUDIENCE: We can't.
DAVID J. MALAN: OK. You can't. How about then one more volunteer, one more
volunteer? OK. Come on up. All right. What's your name?
DAVID J. MALAN: All right, [? and Moin, ?] you're going to be number 8, and if now
you all-- actually, let's make this how would you represent number 8, all
collectively, as 4 bits or four switches? OK, 8, and now lastly, give me 15.
Everyone's awkwardly doing arithmetic in their head, oh, using unary. Yeah. Is that
everyone-- Yes. OK. Round of applause. OK. Thank you all.
If you want to leave your numbers over here, we have a CS50 stress ball for you,
but thank you for volunteering. You can turn those numbers off and leave them over
here. So thank you.
And now why were they those patterns, and why did I very deliberately have our
volunteers line up in that way? Well, I wanted them using base-2, a.k.a. binary,
but with binary there comes certain rules. And even if you're not familiar with
binary beyond that it exists and relates somehow to computers, it's actually pretty
much identical to the system you and I use every day, known as base 10, a.k.a.
decimal.
So let's consider, if you will, by rewinding to primary school for just a moment,
like how decimal works. And you'll see that even if you're not a computer person,
you actually are. You just have to tweak your mental model ever so slightly.
So here is the number that you're probably viewing as 123, but why is that? Well,
it's not really 123. This is just a pattern of three symbols on the screen, 1, 2,
3, and your mind is rapidly assigning mathematical meaning to them, 123, but why is
that? Well, if you're like me, you probably learned back in the day, when you have
a three-digit number like this, the rightmost digit is in the 1's place, the
middle digit is in the 10's place, the leftmost digit is in the 100's place, and why
is that relevant? Well, if you then quickly do some mental math, as you and I just
do instantly nowadays, that just means 100 times 1 plus 10 times 2 plus 1 times 3,
of course, 100 plus 20 plus 3 gives us the number you and I know as 123.
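As a minimal sketch of that place-value arithmetic, assuming Python for illustration (the function name `place_value_terms` is made up here, not something from the lecture), each digit can be multiplied by the power of 10 for its column:

```python
def place_value_terms(digits, base=10):
    """Return each digit's contribution to the total, leftmost digit first."""
    n = len(digits)
    # The digit at index i sits in the base ** (n - 1 - i) column.
    return [int(d, base) * base ** (n - 1 - i) for i, d in enumerate(digits)]


terms = place_value_terms("123")
print(terms)       # [100, 20, 3]
print(sum(terms))  # 123
```

The same function works for other bases by changing `base`, which is the only thing that differs between decimal and binary.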
But beyond that, how do we get to just two digits instead of as many as 9 in the
decimal system? Well, let's generalize this. In the decimal system, you and I know,
if we've got three digits represented by these hashes here, yes, it's the 1's
place, 10's place, 100's place, and if we keep going 1,000's, 10,000's, and so
forth, but why is that?
Well, base terminology is now a little more germane. That's technically the 10 to
the 0th column, the 10 to the 1, 10 to the 2. So these are powers of 10, where 10
is your base.
Computers just simplify things a little bit, because computers, at the end of the
day, only have access to electricity, on or off. They don't have access to 10
different types of electricity, just 2, on or off, if you will. Well, they just use
a different base.
And the rightmost digit would be in the so-called 2 to the 0th column. Then the middle
digit is 2 to the 1. The leftmost is 2 to the 2, a.k.a. 1's place, 2's place, 4's
place, and as we kept going, 8, and if we keep going, 16, 32, 64, 128, and so
forth, but the idea is fundamentally the same.
So why is this how the computer represents the number you and I know as 0? Well,
off, off, off, from right to left or in this case left to right, is just 0. Why?
Because that's 4 times 0 plus 2 times 0 plus 1 times 0 is, of course, 0. This is
why 001 represents 1. This is why 010 represents 2 and 3 and 4 and 5 and 6 and 7 on
up.
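The counting that the volunteers did can be sketched in a few lines, assuming Python for illustration. The helper `bits_to_number` is a hypothetical name; it adds up the 1's, 2's, and 4's places for every bulb that is on:

```python
def bits_to_number(bits):
    """Add up the place value of every bit that is on ('1')."""
    total = 0
    # Walk from the rightmost bit, whose place value is 2 ** 0 = 1.
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** position
    return total


# Every pattern of three bulbs, from off-off-off to on-on-on.
for n in range(8):
    pattern = format(n, "03b")
    print(pattern, bits_to_number(pattern))  # prints 000 0 through 111 7
```

Passing a four-character pattern such as "1000" gives 8, which is why a fourth volunteer was needed.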
And why did we need a 4th bit to represent 8? Well, we kind of needed to carry the
1, so to speak, using our familiar human terminology. But for that we need a 4th
bit, another transistor, and this now represents the number 8. And that's why we
ended with on-- from left to right-- off, off, off.
So I keep saying on and off, or the light bulb is on or off, but really, I just
mean 1 or 0. And so computers and we humans think of things digitally as just being
0's and 1's, but mechanically, you can think of it indeed as these light bulbs.
Now, a bit is not very useful. Even 3 bits, 4 bits, not that useful. You can count
to 7 or 15. Generally speaking, bytes are a more useful unit of measure.
And is anyone familiar with how many bits are in a byte? Yeah. So 8 bits are in a byte. You
can think of it as an octet equivalently. In some contexts, there are nuances
there, but think of a byte as just being 8 bits, and that's just a more useful
measure.
So what does this mean in real terms? So if you've ever downloaded like a music
file or a photograph or a video, those are measured in bytes. Probably not small
numbers of bytes, probably kilobytes for thousands of bytes, megabytes for millions
of bytes, gigabytes for billions of bytes, especially for video. That just means
you have a lot of patterns of 8 bits, some combination of 0's and 1's on your
computer's hard drive.
Here then, with a byte of bits, 8 bits, is how a computer would typically represent
the number 0. And if that same computer uses all 8 of its bits, its full byte, and
changes them all to 1-- anyone who's quick with math or has seen this before, how high
can a computer count with 8 bits, or 1 byte?
[INTERPOSING VOICES]
Yeah, 255. Why is that? Well, we're not going to turn this into a constant math
exercise. Indeed, after today, we're not really going to think about or talk about
bits at this low level. But this is the 1's place, 2's, 4's, 8s, 16, 32, 64, 128,
and if I do all of that math from left to right, that indeed gives me 255. It
ignores how we might represent negative numbers, but perhaps more on those some
other day.
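A quick check of that sum, sketched in Python under the same assumption of non-negative numbers only (the variable names are illustrative):

```python
# Place values of the 8 bits in a byte: 1, 2, 4, 8, 16, 32, 64, 128.
place_values = [2 ** i for i in range(8)]

# With every bit on, the byte's value is the sum of all eight places.
largest = sum(place_values)

print(place_values)        # [1, 2, 4, 8, 16, 32, 64, 128]
print(largest)             # 255
print(int("11111111", 2))  # 255
```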
But computers, of course, do so much more than numbers and math and all this low
level stuff. We send text messages, write documents, emails, and the like. So how
might a computer represent something like the letter A? I claim, at the end of the
day, your Mac, your PC, your phone just has lots of transistors, lots of switches
that it can use in units of 8, in units of bytes.
How, though, if it's already using those patterns of 0's and 1's apparently to
represent numbers from 0 on up, how do you go about representing letters of the
alphabet, might you think? Yeah? OK. So we could assign a number to every letter.
OK. So let me just conjecture, well, let's just call A 0, for simplicity, B 1, C 2,
and now let me play devil's advocate. OK, how do I now represent 0 or 1 or 2?
Well, we've maybe created a problem for ourselves, if now we have to steal some
numbers to represent letters. We kind of have to pick a lane, but there's a
solution to that too that we'll see. And it turns out the world is not quite as
simple as A being 0. A typically is represented, by computers everywhere, phones
everywhere, with the number 65, the decimal number 65. Using 8 bits, if we turn
some of the 0's to 1's, let me just stipulate, you can represent the letter A using
8 bits, by turning certain ones on and certain ones off, but we will try not to
focus on that binary level too much.
So if A is 65, it turns out that B is going to be 66, and C is going to be 67, and
so forth, and so where does that get us? Well, it turns out there's a whole system
that maps numbers to letters. And here, as I alluded to verbally a moment ago, is
the pattern of 0's and 1's via which you'd represent 65.
And just quick check here, we won't constantly do math: 1's place, that's easy, 2's,
4's, 8's, 16's, 32's, 64's place. So 64 plus 1 gives us 65. So once I do that, how
do I get to all of the others?
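That quick check can be reproduced with a short sketch, assuming Python for illustration. It writes 65 as 8 bits and lists which places are switched on:

```python
# Write the decimal number 65 as a pattern of 8 bits.
pattern = format(65, "08b")
print(pattern)  # 01000001

# Collect the place value of every bit that is on, leftmost bit first.
on_places = [2 ** (7 - i) for i, bit in enumerate(pattern) if bit == "1"]
print(on_places)       # [64, 1]
print(sum(on_places))  # 65
```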
Well, it turns out a bunch of Americans years ago came up with this ASCII, the
American Standard Code for Information Interchange. Now, what does that mean? Well,
it's just an acronym describing what really you proposed, a mapping between numbers
and letters, not quite as simple as 0, 1, 2. Starts at 65, 66, 67 for capital
letters, but here are most of the letters in use today, at least with this system.
So this is just a big chart from online, and you'll see in the middle of this
chart, here, here's my 65, A. Here's my 66, B, C, and let's see, 72 is H, 73 is I,
and so forth. So there's a mapping, at least for English, between all of these
numbers and all of these letters. And if we focus here, those are the beginning of
our uppercase alphabet.
So suppose then that today, tomorrow, you receive a text message from someone, and
underneath the hood, now that you're a computer person, you figure out a way to see
what pattern of 0's and 1's was sent. In this case, it's wireless as opposed to
wired, but it's still some pattern of 0's and 1's. And your phone is turning some
switches of its own on and off to represent that message from a friend.
Suppose that the three patterns you received were these three bytes, from left to
right, spelling out a three-letter word. Well, if we do out the math, 1's place,
2's place, and so forth-- I'll spoil it for you. Suppose that you received a text
message that doesn't literally say 72, 73, 33, but you've received a pattern of 8
plus 8 plus 8, 24 bits that if you do out the math represent the decimal numbers 72,
73, 33. Anyone recall what message you might have received from the green and white
charts? Yeah?
AUDIENCE: Hi.
DAVID J. MALAN: Hi. Yes, hi is the message, but 72, 73 gives us H and I. What's 33?
Any guesses to 33? Yeah, over here.
Yeah. So it's an exclamation point. How would you know that? Well, you really do
need some kind of cheat sheet, a.k.a. ASCII in this case. And if we look
elsewhere-- let me highlight the left of the chart-- you can see that next to 33 in
decimal is indeed the exclamation point.
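The decoding of that text message can be sketched as follows, assuming Python for illustration. The three bit patterns below are simply 72, 73, and 33 written out in binary, and the built-in `chr` plays the role of the ASCII cheat sheet:

```python
# Three bytes as they might arrive, one 8-bit pattern per character.
received = ["01001000", "01001001", "00100001"]

# Do out the math: convert each pattern from binary to a decimal number.
numbers = [int(bits, 2) for bits in received]
print(numbers)  # [72, 73, 33]

# Look each number up in the ASCII mapping.
message = "".join(chr(n) for n in numbers)
print(message)  # HI!
```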
So back in the day, a bunch of humans got in a room, decided that, hey, when we
start building PCs and later Macs and phones, we all just have to agree on this
form of representation of letters of the English alphabet, in this case. We just
need to agree on this mapping.
But somewhat curiously, notice this. It turns out that, once you paint yourself
into this corner and start using 65 for A, 66 for B, well, how do you represent 65
the number and 66 the number, if you want to do math or use Excel or something like
that? Does anyone see the solution, perhaps? How do you represent the number 1 in
ASCII? Yeah, in the middle?
AUDIENCE: [INAUDIBLE].
DAVID J. MALAN: Yeah. So this is getting a little maybe inception or something, but
you could represent numbers with other numbers. And so if you want to represent the
number you and I know as 1, like when you type it on your keyboard, turns out the
computer stores that as the decimal number 49. If you hit 2 on your keyboard, the
computer is not storing 2, per se. It's storing the decimal number 50.
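As a small sketch of that mapping of numbers to numbers, assuming Python for illustration, the built-in `ord` reports the number stored for each digit key (the dictionary name `codes` is just for this example):

```python
# The number a computer stores for each digit character typed on a keyboard.
codes = {key: ord(key) for key in "0123456789"}

print(codes["1"])  # 49
print(codes["2"])  # 50

# The character "1" and the number 1 are stored differently.
print(ord("1") == 1)  # False
```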
Now, thankfully, the paradox stops there. We just have a mapping now of numbers to
numbers. But really, at the end of the day-- and you're going to learn this when we
start writing code in that other language, C, next week-- it's just context-
dependent, at the end of the day.
Inside of your Mac, PC, and phone, there's just all of these permutations of bits,
all of these patterns of 0's and 1's. And generally speaking, when you open up a
text message that you've received from someone, it's 0's and 1's. But obviously, if
it's a text message, the whole point of text messages is to send text, and so those
patterns of 0's and 1's, by default, will typically be interpreted as letters of
the alphabet.
So you won't see 0's and 1's. You won't see decimal numbers. You'll see the English
message that your friend intended.
By contrast, if you open up something like Excel, that same pattern of 0's and
1's might indeed work out to be 72, 73, 33. You might see cells in your spreadsheet
with literally those three numbers. Why? Because spreadsheets are all about numbers
and number crunching and math, in many cases.
If by contrast, you open up Photoshop and try to look at that same pattern of 0's
and 1's, it's not going to be 72, 73, 33. It's not going to be 0's and 1's. It's
not going to be hi. It's going to be some color of the rainbow. You're going to use
those patterns of 0's and 1's, it turns out too, to represent colors.
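A rough sketch of that context dependence, assuming Python for illustration: the same three bytes are read as text, as numbers, and as a color. Treating the bytes as amounts of red, green, and blue, and writing the result as a hex color code, is an assumption made for this example:

```python
# One pattern of 24 bits, stored as three bytes.
data = bytes([72, 73, 33])

# A messaging program interprets the bytes as letters.
as_text = data.decode("ascii")
print(as_text)  # HI!

# A spreadsheet interprets the same bytes as numbers.
as_numbers = list(data)
print(as_numbers)  # [72, 73, 33]

# A graphics program might interpret them as one color (assumed: red, green, blue).
red, green, blue = data
as_color = f"#{red:02x}{green:02x}{blue:02x}"
print(as_color)  # #484921
```

Nothing about the bytes themselves changes between the three readings; only the program's interpretation does.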
And indeed, so long as you and I just agree, as humans long have, what these
patterns are going to be, all of our systems, many of our systems nowadays are
indeed interoperable. But I'm being very biased here, and indeed, the A in ASCII
is very American-centric. What do you not see in this chart? If you speak any other
language than English, odds are, you're not seeing characters you know and love and
need every day to type or send messages.
Well, there's a huge character set that's not supported here, whether it's accented
characters or a lot of Asian alphabets. You have many more symbols than can fit
even on this screen here. And so humans kind of painted themselves into a corner,
early on, or really, Americans did. But on a typical keyboard, US English keyboard,
yeah, you have A's and B's and C's, uppercase and lowercase, but you also have
accented characters here.
And nowadays, not sure if this is maybe necessary, but nowadays, you have other
characters on your keyboard, like these. And these are a playful incarnation of
what's actually a technical solution to this problem. If I claim for the moment
that ASCII historically used 7 bits to represent letters-- and let's just round
that up to a byte-- 8 bits to represent letters, ASCII can represent as many as
255, or really 256, total characters.
Why 256? Well, if you have them all 0, that's 0, and the highest number I claimed a
moment ago was 255. So that's 256 total possibilities. That's not many letters.
It's fine for English, but not a lot of human languages.
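The counting argument can be sketched in a couple of lines, assuming Python for illustration (`patterns` is a hypothetical helper name):

```python
def patterns(bits):
    """Number of distinct patterns of 0's and 1's that fit in this many bits."""
    return 2 ** bits


for bits in (7, 8):
    # The highest number is one less than the count, because counting starts at 0.
    print(bits, patterns(bits), patterns(bits) - 1)
# 7 128 127
# 8 256 255
```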
So what might the intuitive solution be, if you want to represent accented
characters, Asian characters, emoji, even like these, which are just keys on a
keyboard nowadays? What's the intuitive solution, if a byte's too few? Yeah?
And the solution then to ASCII is what we'll call Unicode. So Unicode is also just
a mapping of numbers to letters but in many different languages. And indeed, the
Unicode Consortium is a bunch of people from all different companies-- a lot of
different companies and countries and cultures whose mission, as an organization,
is to capture digitally all forms of human language in this case. And to ensure
that especially smaller demographics of humans speaking lesser-known languages are
nonetheless represented and preserved digitally using some mapping of these 0's and
1's.
It turns out, though, if you start using 32 bits, as many as 32 bits, to represent
characters on a keyboard, that's 4 billion possible permutations of 0's and 1's.
That's way more than we need for most human languages. So there's a little bit of
room in there for some of those more playful things, like those emoji.
So for instance, suppose you got a text message with this pattern of 0's and 1's.
Or if we do out the math, suppose you receive a text message that, if you do out
the math in decimal, is 4,036,991,106. Anyone know what emoji you're looking at? It
would be weird if you do, but what is this?
Well, it turns out that, as of this past year, this is the most popular emoji to be
sent by many measures, Face with Tears of Joy. So that is the pattern that a bunch
of humans in the Unicode Consortium decided would represent this. But you'll
notice, many of you might have iPhones, some of you might have Android devices too,
and sometimes, these don't actually look quite the same. This happens to be the
current version of Face with Tears of Joy on iOS.
On Android, it tends to look a little something more like this, and here is kind of
a curiosity. Even though you and I look at these things and they look like images,
they're not images. They're characters, at least as we've defined them now in
Unicode. And iOS and Android and Windows and Facebook and other companies and apps,
nowadays, really just have different fonts, if you will.
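The decimal number quoted for Face with Tears of Joy can be checked with a short sketch, assuming Python for illustration. It relies on the UTF-8 encoding of the character, a detail the lecture does not name: the emoji's four UTF-8 bytes, read as one 32-bit number, give that value.

```python
import unicodedata

emoji = "\U0001F602"
print(unicodedata.name(emoji))  # FACE WITH TEARS OF JOY

# Encode the character as bytes (assumption: UTF-8, which uses 4 bytes here).
encoded = emoji.encode("utf-8")
print(len(encoded))  # 4

# Read those 4 bytes as a single number, most significant byte first.
as_decimal = int.from_bytes(encoded, "big")
print(as_decimal)  # 4036991106

# The same value as a pattern of 32 bits.
bit_pattern = format(as_decimal, "032b")
print(bit_pattern)
```

How the character is drawn is left to whatever font the device has, which is why the same number looks different on different phones.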
So just like fonts with English and other languages can give you different
characters with serifs or not, emoji are themselves, yes, drawings that someone
made, but they're really just a font. And so that same pattern of 0's and 1's might
just render slightly differently on someone's phone or another. If you've ever
gotten like an icon on your phone that's broken, and you've been sent an emoji, but
it's like a square or something arbitrary and not sensible, it might just mean that
you have not updated to the latest version of iOS or Android, which just updates
the font of supported emoji. Because those folks at Unicode, pretty much every year
nowadays, are adding more and more emoji to that particular character set.
Now, I went down the rabbit hole of figuring out the other day just which are the
most popular emoji these days. On Twitter specifically, this past year, the most
popular emoji, by contrast, was Loudly Crying Face. I don't know if that says more
about 2021 or about Twitter, but you'll see different trends, certainly, in how
these are used.
But even humans themselves didn't necessarily think two steps ahead, and now a lot
of the emoji are the default yellow color. But there's a lot of emoji that aren't
these cartoon characters, but they're meant to represent humans in various
professions or gestures or the like. And nowadays too, you've probably noticed on
your phone and Macs and PCs, there are different skin tones that you can assign to
certain emojis. If it's supported by the company and by Unicode, you can actually
touch and hold on a certain emoji, and then you can choose the appropriate skin
tone to represent yourself or someone else. And that then modifies the display.
Well, let's just think for a moment here, how did Apple and Google and Microsoft
and others go about implementing support for emoji with different skin tones? How
could you do this? If you want to represent some smiling emoji but in five, in this
case, different skin tones, you could come up with what? Five different patterns
that are identical, structurally, except for the skin tone used in places in that
image.
But that's a little inefficient to just do copy, paste, paste, paste, paste, and
change the color in Photoshop, if you will. That's going to use more bits, more
information than you might need to. How else, if you now start to think a little
bit more like a computer scientist, if at the end of the day, all you have are 0's
and 1's, how else could you implement skin tones, might you think? Yeah?
AUDIENCE: RGB.
DAVID J. MALAN: OK. So RGB, we'll come to that in just a moment. That stands for
Red, Green, Blue. That's one way. In this case, though, I'm seeking an alternative
to just using five different patterns of 0's and 1's to represent the same emoji
but different skin tones. So not quite RGB. Yeah?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: OK. So store one copy of the emoji and then store different
variants of the color that you want to assign to that emoji. Yeah. So this is
actually an example of-- do you want to elaborate?
DAVID J. MALAN: OK. So you can use a loop to actually output these things. More on
that in a moment. Let me go down this road for just a moment. This would be in some
sense a better design, if you will, but why? Yeah?
AUDIENCE: A filter?
DAVID J. MALAN: OK. So filter, if we think of in the Instagram sense. You can
change the color of something, and that could be related here too.
Let me spoil. I think if we go down this one particular road, the way the Unicode
folks decided to do this some years ago was that the first byte or bytes that you
receive via text or email just represent like the structure of the emoji, the
default yellow version, thereof. But if it's immediately followed by a certain
pattern of bits that these humans standardized to represent each of these different
shades of skin tone, then the phone, the Mac, the PC will change that default
color, yellow in most cases, to whatever the more apt human tone is.
So you just use twice as many bits, but you don't use five times as many bits. So
what do I mean? You don't have five completely distinct patterns, per se. For each
of these possible variants, you have a representation of just the emoji itself,
structurally, and then reusable patterns for those five skin tones.
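A minimal sketch of that design, assuming Python for illustration. The thumbs-up emoji is an arbitrary choice for this example; the five skin tone modifiers are the Unicode code points U+1F3FB through U+1F3FF, and the byte counts assume UTF-8:

```python
base = "\U0001F44D"  # thumbs up, default yellow version

# The five reusable skin tone modifiers, U+1F3FB through U+1F3FF.
modifiers = [chr(code) for code in range(0x1F3FB, 0x1F400)]
print(len(modifiers))  # 5

# A toned emoji is the base character immediately followed by one modifier.
toned = base + modifiers[2]

print(len(base.encode("utf-8")))   # 4 bytes for the emoji alone
print(len(toned.encode("utf-8")))  # 8 bytes: twice as many, not five times
```

The same five modifiers can follow any emoji that supports them, so no emoji needs five separate patterns of its own.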
Unfortunately, that wasn't quite versatile enough for other features that were in
the pipeline, and nowadays too, there's a double meaning now to representation.
Emojis tended to focus on certain professions, and early on too, certain
professions were associated with certain genders and vice versa. And you couldn't
necessarily be one gender or another, in a certain profession or another. There
were these combinatorics that just weren't possible.
But nowadays, as you might have seen, you can have couples in love for instance
that actually look a little more like three emojis, but just kind of combined into
one. And indeed, this is just one key press on your phone, and you can combine
different emoji on the left and then the right with the emoji in the middle. And so
it turns out how computers nowadays represent these patterns are one set of bits
for the character on the left, one set of bits for a character on the right, one
set of bits for whatever emoji you want in the middle. And then you assemble more
complicated compositions of emoji by just reusing those same patterns and bits and
bits.
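That composition can be sketched as follows, assuming Python for illustration. One detail not mentioned in the lecture: Unicode glues the pieces together with an invisible character called the zero-width joiner, U+200D. The specific couple shown is one example sequence.

```python
ZWJ = "\u200d"  # zero-width joiner, the invisible glue between the pieces

woman = "\U0001F469"
heart = "\u2764\ufe0f"  # heart, plus a selector asking for the emoji style
man = "\U0001F468"

# One visible emoji on screen, assembled from existing characters.
couple = ZWJ.join([woman, heart, man])

code_points = [f"U+{ord(ch):04X}" for ch in couple]
print(code_points)
# ['U+1F469', 'U+200D', 'U+2764', 'U+FE0F', 'U+200D', 'U+1F468']
```

Swapping `woman` or `man` for another person character reuses the same building blocks, so no new pattern has to be invented for each combination.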
The Unicode folks don't have to come up with a whole new representation for some
very specific incarnation. They can create one for person, for male, for female,
for other characters that you might want to display, and reuse those same patterns
of 0's and 1's. And so here, you see the imperfection of, or lack of foresight of,
humans for building a system early on that was entirely American-centric, no
characters, emoji, or the like, that's evolved too. And so that's an important
detail in computing, nowadays. It too is evolving, and the languages you're about
to learn in the coming days, those too are evolving as well.
And new features are getting added, and even programming languages have version
numbers. You might have a different version of an app on your phone. Programming
languages too have different versions as well. Questions then thus far on how
information is represented using ASCII or Unicode or anything in between?
Yeah?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Oh, good question. So to recap, why can't you just-- well, let me
summarize that as why can't you similarly use different patterns to change the
context of what these patterns of bits represent, whether it's a number or a letter
or a graphic? In actuality, that's kind of what's happening underneath the hood.
It's not standardized in quite the same way, but starting next week, when we
transition from Scratch to C, you'll learn about types, data types. Where the onus
initially is going to be on you, the programmer, to tell the program whether or not
this pattern of bits should be interpreted as a number or as a letter or as a color
or something else. Nowadays, though, and toward the end of the semester, you'll use
languages, like Python, where the computer just figures it out for you by context,
which makes it even easier and faster to program as well.
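A small Python sketch of the idea that the same bits mean different things depending
on how the programmer asks for them to be interpreted. The particular bit pattern is
just an assumed example.

```python
# Sketch: one pattern of eight bits, two interpretations.
bits = "01000001"

as_number = int(bits, 2)    # interpret the bits as a number
as_letter = chr(as_number)  # interpret that same value as a character

print(as_number)  # 65
print(as_letter)  # A
```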
Other questions on Unicode, ASCII or the like? All right. Well, how about just a
few other forms of information? RGB was called out earlier, Red, Green, Blue. How
do images get represented in computers?
Well, in fact, it's typically an assembly of some amount of red, some amount of
green, some amount of blue, but there are other representations. If you're a
graphic designer, you might know them, but RGB is still pretty common. What does
this mean? This means to represent every dot on your phone or every dot on your TV
or your laptop or desktop, there is a number representing how much red that dot
should show, a number representing how much green, and a number representing how
much blue it should show, red, green, blue, respectively.
So for instance, if a dot on your screen were using these three numbers, these
three values or bytes, 72, 73, 33, in a text message or email, that would be
interpreted as I claimed "Hi!" But in Photoshop or in some graphical program, that
same pattern would be interpreted as let's call it a medium amount of red, a medium
amount of green, and a little bit of blue. And why medium and little?
Turns out, each of these is a byte. The smallest value you can have in a byte, we
said, is 0. The largest value you can have in a byte is 255, so I'm just kind of
spitballing here. This is like medium, medium, and a low amount of red, green,
blue, specifically. Those three colors, like wavelengths of light, are combined in
such a way that you would have this dot on the screen, a sort of murky shade of
yellow or brown.
That is how a computer would store precisely that color, and in fact, we've seen
this color. When you type in Face with Tears of Joy, generally, on your screen, it
looks like this, typically much smaller. But let's zoom in, or let's zoom in a
little more. What are you starting to see, if you know the term?
AUDIENCE: Pixels.
DAVID J. MALAN: So pixels, it's getting very pixelated. A pixel is just a dot on
the screen, and if you really zoom in on it, you can literally see all of the dots
that compose an emoji, in this case on iOS, in the font that Apple is using to
represent this particular pattern of 0's and 1's. So one of those yellow dots-- and
there's many of them that all kind of blend together here-- each dot on the screen I
claim is 3 bytes. How much red, green, blue for this dot? How much red green blue
for this dot? How much red green blue for this dot?
And you'll notice too, that when it gets to be sort of brownish here, the dots
really stand out. The 3 values, the 3 bytes, a.k.a. 24 bits, are just slightly
different. And so underneath the hood, this is why images, photographs that you
take or GIFs that you download, get so darn big, potentially, because you have a
number representing every dot on the screen.
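To put a number on "so darn big," here is a back-of-the-envelope Python sketch. The
1920 by 1080 image size is a hypothetical example, and the calculation assumes no
compression at all.

```python
# Sketch: size of an uncompressed image at 3 bytes (24 bits) per pixel.
BYTES_PER_PIXEL = 3  # one byte each for red, green, and blue


def raw_image_bytes(width, height):
    """Return the bytes needed to store every pixel with no compression."""
    return width * height * BYTES_PER_PIXEL


print(raw_image_bytes(1920, 1080))  # 6220800, roughly 6 million bytes
```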
Well, if this I claim is indeed how images are typically represented, using patterns
of bits that are assigned to some amount of red, green, or blue, how do you get
video? What is a video, if at the end of the day, all we have are 0's and 1's?
What's a video, perhaps? Yeah? Let's go here, way in back. Yeah. Pixels really
changing values over time. And do you want to confirm or deny, the hand that went
up here?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Yeah, or equivalently, a sequence of images that, over time, are
changing on the screen. So both of those are valid interpretations, and just for
fun, if you grew up with these picture books, you might remember a little something
like this. If we could dim the lights?
[MUSIC PLAYING]
So that's the old school analog way to implement a video, in the sense that that
artist wrote out like hundreds of pieces of paper, with almost identical images,
but where the ink from their pencil or pen was slightly moving. And if you digitize
that, such that each of those strokes are represented with dots instead, that's
really what you're seeing is a sequence of all of these images flying across the
screen. And if we dive into the real world, if you've ever watched a film, a
Hollywood movie is typically 24 FPS, Frames Per Second. That really means you're
seeing 24 images per second, or on TV or in soap operas, it's often 30 frames per
second. That makes things look a little more smooth.
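Treating video as nothing more than a sequence of images, a rough Python sketch of
the raw data rate looks like this. The frame size is a hypothetical example, and
real video formats compress heavily, so actual files are far smaller.

```python
# Sketch: uncompressed video as frames per second times bytes per frame.
BYTES_PER_PIXEL = 3


def raw_video_bytes_per_second(width, height, frames_per_second):
    """Return bytes per second if every frame were stored as a full image."""
    bytes_per_frame = width * height * BYTES_PER_PIXEL
    return bytes_per_frame * frames_per_second


print(raw_video_bytes_per_second(1920, 1080, 24))  # 149299200
print(raw_video_bytes_per_second(1920, 1080, 30))  # 186624000
```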
So it's not actual motion picture, if you will. It's sequences of pictures, and
your brain and mind are interpolating that, oh, this is smooth movement, even
though we're just seeing a lot of pictures really fast. Now, that gets really big,
and we'll talk later in the semester how you can compress information, so that
you're not using way more bits than you actually need to. And there's fancy
algorithms that folks have developed, but at the end of the day, that's really all
a video might be is a sequence of images. Conversely, if you want to represent the
music that accompanies that or something else, if any of you play an instrument and
can read sheet music, how could you digitize this?
[PIANO MUSIC]
Like how could you represent musical notes in a computer? You and I hear them when
we play files, but what's really going on underneath the hood? Any musicians, piano
players? Anyone? Yeah?
DAVID J. MALAN: OK, so Hertz value, so some frequency. So sound is some frequency,
and it's kind of hitting your eardrum. And that's what makes it sound low or high
or somewhere in between.
So maybe we could assign, just like there's letters A through G here, maybe we
could assign specific frequency values, which are just going to be numbers measured
in something called Hertz, something per second. And maybe we could have a few
other numbers for each of these notes, not just the note or the frequency. Maybe,
we could represent the loudness of it, like how hard or how softly a human might
equivalently press it. Maybe a third number, like duration, like how long is their
finger on the keyboard?
So you can imagine quantifying something like music that in the real world is
perfectly continuous as something more discrete, by representing each note over
time as just some sequence of values. And there's so many different ways to do
this, MIDI, if you've heard, mp3's, AAC. Almost all of the file extensions you see
on your Mac or PC, if you see them at all, ultimately just mean there's a different
form of representation for, in this case, something like sound.
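One way to picture "a few numbers per note" is the Python sketch below. The Note
class and its field names are hypothetical. The frequency formula is the usual
equal-temperament convention (note number 69 is the A at 440 Hz), which the lecture
does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Note:
    """A hypothetical discrete representation of one musical note."""
    frequency_hz: float  # how low or high the note sounds
    loudness: int        # how hard the key was pressed (assumed scale 0 to 127)
    duration_s: float    # how long the finger stayed on the key


def midi_to_hz(note_number):
    """Convert a MIDI-style note number to a frequency in Hertz."""
    return 440.0 * 2 ** ((note_number - 69) / 12)


middle_c = Note(frequency_hz=midi_to_hz(60), loudness=64, duration_s=0.5)
print(round(middle_c.frequency_hz, 2))  # 261.63
```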
So let me just stipulate, there are these and many more ways to represent inputs
and outputs, and thankfully, humans have standardized a lot of this. They don't
always agree, and this is why we have different file formats for Apple Numbers and
Microsoft Excel and Google Spreadsheets and stupid incompatibilities like that. But
generally speaking, humans have standardized how we represent the inputs and
outputs to and from problems.
But let's now focus on this black box, so to speak, in the middle, this
abstraction. So abstraction is technically a term that you'll see all over the
place in computer science, and really, problem solving, that just refers to the
simplification of something, so that you don't focus on the lower level
implementation details. You really just focus on the high level goals or the
process itself.
Therefore, your car, if you have a license and have driven or have been in a car, a
car, so far as you're concerned, is probably an abstraction. Most of us, if you're
like me, probably don't really know or care how the engine works and all the parts
that are moving. To you, it's just a way of getting from point A to point B. It's
an abstraction, but someone, hopefully the mechanic, does know those lower level
implementation details.
If you had to understand how a car works every time you want to go to school or to
the store, it's probably going to be a pretty slow process. You just want to think
and operate at this higher level of abstraction, and we're going to do this all the
time when writing code and solving problems. So what then is in this black box,
this abstraction at the moment? Well, generally, it's what a computer scientist
would call an algorithm, step-by-step instructions for solving some problem.
Now, let's consider the implementation details, that is to say how you might solve
certain problems, and let's take an old-school example but in modern form.
This icon, if you have an iPhone, is, of course for your contacts application. And
if you've got a whole bunch of family members or friends or colleagues in your
phonebook, you have some kind of contacts pictured here, and it's alphabetized
typically by first name and last name. And odds are, you and I are in the habit, if
they're not already a favorite, of like clicking on Search and then using
autocomplete.
And what happens when you start typing autocomplete? Well if you type in the letter
H, you'll see only, presumably, Hagrid, Harry, Hermione, and so forth. If you type
in H-A that shows you only Hagrid and Harry, and it all happens super fast.
So how is that happening? Well, typically, you could just start at the top and look
to the bottom, searching for all of the H's or all of the H-A's, but for larger
data sets that's going to get slow. For the Googles of the world, that's going to
get really slow. And even on our phones when you have hundreds, thousands of
contacts, eventually, even that kind of approach, that step-by-step algorithm,
might be slow.
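The top-to-bottom approach to autocomplete might be sketched in Python like this.
The function name and the extra contacts beyond Hagrid, Harry, and Hermione are
made up for illustration. Note that it looks at every contact, which is exactly why
it slows down as the list grows.

```python
def autocomplete(contacts, prefix):
    """Scan every contact from top to bottom, keeping names that start with prefix."""
    matches = []
    for name in contacts:
        if name.lower().startswith(prefix.lower()):
            matches.append(name)
    return matches


contacts = ["Draco", "Hagrid", "Harry", "Hermione", "Ron"]
print(autocomplete(contacts, "H"))   # ['Hagrid', 'Harry', 'Hermione']
print(autocomplete(contacts, "HA"))  # ['Hagrid', 'Harry']
```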
So how might we go about searching for someone in a phonebook like this, like say
John Harvard? Well, here's an old school incarnation of this, and odds are, you
might not have had occasion to even physically use this thing, nowadays. And in
fact, this is a bit of a white lie, because this is the yellow pages, which means
this is a book of companies not people. But this is all you can find, and at that,
it's even hard to find this. But this is the same thing in analog form, physical
form.
So if I wanted to search for someone like John Harvard, how could I do that? Well,
I could start on page 1, and I could start searching for page 2, page 3, page 4,
page 5. A little hard to do physically, especially since no one's used this phone
book in a lot of years. But is this algorithm correct, turning page by page, very
inelegantly? Is this correct?
Will I find John Harvard, if he's in here? All right. So yes. This is a little
stupidly tedious, because if there's like 1,000 pages, he might be a few hundred
pages into this, but it's correct. At some point, I will find him, and if he's on
the page, I'll be able to call. Why? Because presumably, the names are alphabetized
in here, and there's no cheat sheet on the edge.
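The page-by-page algorithm can be sketched in Python as below, under the simplifying
assumption that each "page" holds a single name. The function name and sample names
are hypothetical.

```python
def search_page_by_page(pages, person):
    """Turn one page at a time; return (page index, pages turned)."""
    steps = 0
    for number, name in enumerate(pages):
        steps += 1
        if name == person:
            return number, steps
    return None, steps  # reached the end without finding the person


pages = ["Albus", "Draco", "Hagrid", "Harry", "Hermione", "Ron"]
print(search_page_by_page(pages, "Harry"))    # (3, 4)
print(search_page_by_page(pages, "Neville"))  # (None, 6)
```

It is correct, since no page is skipped, but in the worst case it turns every page.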
So I have to search for John Harvard from left to right, searching for H, if it's
alphabetized by last name. Well, what would be marginally better? Well, how about
two pages at a time? It's hard to do with a 20-year-old phone book, where the pages
are grown together, but 2, 4, 6, 8, 10, 12. This algorithm, is this correct?
AUDIENCE: No.
DAVID J. MALAN: Yeah. So I'm skipping every other page. So if I don't consider
that, and I find myself in like the I section or the J section, well, I might
accidentally conclude, nope, I haven't found John Harvard yet, just because I
skipped him, because it was sandwiched between two pages. Now, I can fix this, I
think, if I do hit the I section, well, let me just double back one page, just in
case he was in that last page. So it's recoverable, but it's almost twice as fast,
minus that hiccup there.
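Here is a hedged Python sketch of the two-pages-at-a-time idea, including the fix of
doubling back one page. As before, one name per page is a simplification, and the
names are hypothetical.

```python
def search_two_at_a_time(pages, person):
    """Skip every other page, doubling back once on overshoot.

    pages must be sorted. Returns (page index, pages looked at).
    """
    steps = 0
    i = 0
    while i < len(pages):
        steps += 1
        if pages[i] == person:
            return i, steps
        if pages[i] > person:
            # Went too far: check the one page that was skipped.
            steps += 1
            if i > 0 and pages[i - 1] == person:
                return i - 1, steps
            return None, steps
        i += 2
    # Ran off the end; with an even page count the last page was skipped.
    if pages and len(pages) % 2 == 0:
        steps += 1
        if pages[-1] == person:
            return len(pages) - 1, steps
    return None, steps


pages = ["Albus", "Draco", "Hagrid", "Harry", "Hermione", "Ron"]
print(search_two_at_a_time(pages, "Harry"))   # (3, 4)
print(search_two_at_a_time(pages, "Hagrid"))  # (2, 2)
```

Without the doubling-back branch, the search for "Harry" would wrongly report that
he is not in the book.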
But what most of us would do, and what your phones are doing, albeit digitally, is
they open up roughly to the middle of the phonebook. And they look down, and they
say, oh, I'm in roughly the M section. So I'm roughly halfway through the 1,000-
page phonebook. But what do I now know about John Harvard? Where is he, to my left
or to my right?
All right. So alphabetically, he's presumably to my left, and so here I can, both
metaphorically and physically, tear the problem in half. You don't need to be
impressed. It's really easy down the spine that way, but I know that John Harvard
is to the left here. But now I can throw, unnecessarily dramatically, half of the
problem out of the way, and what do I now know? I've gone from 1,000 pages to like 500.
I can repeat roughly the same algorithm. Go to the half of this, and so this time,
I went back a little too far. I'm in now the E section. So what do I know? Is John
Harvard to my left or to my right?
To my right, so I can, again, tear the problem in half. Throw this half away, and
now I'm really flying. I'm doing it verbally slowly, but that went from 1,000 pages
to 500 to now 250. And now I can do it again, 125. I do it again, roughly like 62,
and keep doing it again and again and again, until I get left with, hopefully, just
one single page or in this case an ad for, ironically, a mechanic.
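The repeated tearing in half can be traced with a short Python sketch. Rounding down
with integer division is an assumption, since the lecture only speaks of "roughly"
half each time.

```python
def halvings(pages):
    """Return the sequence of remaining page counts as the book is torn in half."""
    sizes = [pages]
    while pages > 1:
        pages = pages // 2  # keep only one half, rounding down
        sizes.append(pages)
    return sizes


sizes = halvings(1000)
print(sizes)           # [1000, 500, 250, 125, 62, 31, 15, 7, 3, 1]
print(len(sizes) - 1)  # 9 tears to get from 1,000 pages down to one
```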
OK, so what is the implication for our performance? Well, let's just do this sort
of in the abstract, if you will, if that first algorithm were to be plotted just
quickly on a chart without even numbers. Here's my x-axis, size of problem on the
x-axis. So the bigger the problem, the farther out that way. Time to solve the
problem. The higher you go up on the y-axis, the more time you're taking to solve
it.
How would we draw the running time, the amount of time taken to run that first
algorithm? Well, it's going to be a straight line. Why? Because if you add one more
page next year because more people move to Cambridge, you're going to add one more
page turn potentially, so one more second, one more unit of time. So it's a
straight line. And we'll abstract it away as "n." If there's n pages in the phone
book, the number of steps along this line is essentially n.
The second algorithm, wherein I was doing two pages at a time, was twice as fast,
but it's still a straight line. And in fact, let me just draw some dotted lines
here. If the phone book is this big, with my first algorithm, it might take this
many units of time, this many steps, this many page turns. But with that second
algorithm, notice that the intersection is much lower on the yellow line than on
the red.
So n over 2 means there's half as many pages here, if n is the number of pages. So
indeed, that algorithm-- the second one-- is twice as fast minus the little hiccup
that I have to double back one page. But that's not a big deal if I'm still doing
things twice as fast. But the third algorithm looks fundamentally different. It
looks like this. Logarithms, if you recall from high school or prior-- if you
don't, that's fine too.
It's just a fundamentally different function, a different shape. And notice that
the green line is going up and up and up but a much slower rate of increase, which
means crazy things are possible. If two towns in Massachusetts, like Cambridge and
Allston, across the river, merge next year, for instance, in terms of their phone
book, their phone book just got twice as big.
For the first algorithm, that's going to take me twice as many steps to go through.
With the second algorithm, it's going to take me half as many extra steps, going
two at a time. But the third algorithm, that I ended with, tearing things again and
again, dividing and conquering, if you will, in half and in half and in half, how
many more steps will my third algorithm take if Cambridge and Allston merge into a
phone book that's twice as big?
DAVID J. MALAN: Just one more step, right? No big deal. You just take a really big
bite out of the problem once you decide if John Harvard is to the left or to the
right. And so you've made much faster progress. And so this, in essence, is what
your computer, your phone is probably doing underneath the hood when searching for
Harry or Hermione or Hagrid or anyone else because it's that much faster,
especially when you have large data.
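A rough Python sketch comparing worst-case step counts for the three algorithms as
the phone book doubles. The function names are hypothetical, and the counts are
approximations of n, n/2, and log base 2 of n rather than exact measurements.

```python
def steps_one_at_a_time(n):
    """Worst case: turn every one of the n pages."""
    return n


def steps_two_at_a_time(n):
    """Worst case: about half the pages, plus one to double back."""
    return n // 2 + 1


def steps_halving(n):
    """Count how many times n pages can be halved before one page is left."""
    steps = 0
    while n > 1:
        n = n // 2
        steps += 1
    return steps


for n in (1000, 2000):
    print(n, steps_one_at_a_time(n), steps_two_at_a_time(n), steps_halving(n))
# 1000 1000 501 9
# 2000 2000 1001 10
```

Doubling the book adds 1,000 steps to the first algorithm, 500 to the second, and
just one to the third.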
If you don't have that many contacts, it probably doesn't matter if you search from
top to bottom or more in the form of this divide-and-conquer algorithm. But if
you're the Googles of the world or you're analyzing large data sets, indeed, this
is going to add up quite quickly. So where do we go with this? Well, we're going to
introduce next something called pseudocode. How can I translate what I did verbally
there, sort of intuitively, to actual code?
Well, this won't be Scratch. This won't be C or Python just yet. It's just going to
be an English-like syntax. And this is how many programmers would start solving a
problem. They don't start typing out code in C or Python or the like. They use
English or whatever their human language is to jot down an outline for their ideas.
My first step, really, was picking up the phone book. My second step was opening to
the middle of the phone book.
My third step was somewhat different-- look at the page, because why? My fourth
step was if person I'm looking for is on the page, I then do what? It never
happened in my example, but I call the person. So I'm done. Else if the person is
earlier in the book alphabetically, as John Harvard was in the case of my H, then I
should open to the middle of the left half of the phone book. And then I should go
back to step three.
Step 3 is look at the page, thereby repeating the same process again and again.
Step 9, though, might be else if the person is later in the book, then let's go
ahead and open to the middle of the right half of the book and then go back to line
3. Else there's a fourth scenario we should probably consider, lest my search
process freeze or crash or give me one of those spinning beach balls with a bug.
Yeah--
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Yeah, what if John Harvard isn't in the phone book? I'd prefer that
my algorithm, my phone not just reboot or freeze. I should handle that with some
kind of catch all. Else, so to speak, let's just quit the program. So there's well-
defined behavior for every possible scenario of the four. Now, let's call out a few
of these salient terms. It turns out, if I highlight in yellow here, there's a
pattern to what I've been doing here.
These are all of my English verbs. And in a moment, we're going to start calling
those verbs "functions." When you program or write code and you want the program or
the computer to do something for you, some action or verb, we're going to refer to
those actions or verbs as these things called "functions," like those here. By
contrast, I've just highlighted, instead, my "if," my "else if," my "else if," and
"else." Those we're going to call "conditionals."
And the last characteristic here is this here. Someone called this out earlier, in
fact. These lines, 8 and 11, are now highlighted and represent what? What might we
call these in code if you've done-- yeah, so these are loops, some kind of cycles
that result in my doing the same thing again and again, but there's a key detail
with this algorithm in pseudocode. Even though it's telling me to go back to line
3, why is this algorithm eventually going to stop?
Why do I not constantly keep looking for John Harvard forever by nature of these
loops telling me to keep going back to line 3? Good. Eventually, he'll be on the
page or, to your point earlier, he won't be at all and we're out of pages, and so
we just quit. And that's the key about going to the left half or the right half. It
doesn't matter if you do the same thing again and again.
You're not going to get stuck in a so-called infinite loop so long as you keep
dividing the problem and shrinking it into something smaller, smaller, smaller.
Eventually, there's going to be no problem left to solve. So even if you don't
think of yourself as a computer person, even if you've never written code, what
you'll find in the coming days is that these ideas that we've just kind of
harnessed from real life are at your fingertips already.
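The phone book pseudocode walked through above could be translated into Python
roughly as follows. This is a sketch, not the course's own code: it assumes one name
per page, uses a hypothetical function name, and replaces "go back to line 3" with a
while loop.

```python
def search(book, person):
    """Divide-and-conquer search of a sorted book; return the page index or None."""
    left, right = 0, len(book) - 1
    while left <= right:                  # there are still pages to consider
        middle = (left + right) // 2      # open to the middle of what remains
        page = book[middle]               # look at the page
        if page == person:
            return middle                 # person is on the page: "call" them
        elif person < page:
            right = middle - 1            # person is earlier: keep the left half
        else:
            left = middle + 1             # person is later: keep the right half
    return None                           # no pages left: quit


book = ["Albus", "Draco", "Hagrid", "Harry", "Hermione", "Ron"]
print(search(book, "Harry"))    # 3
print(search(book, "Neville"))  # None
```

The loop cannot run forever because every pass discards at least the middle page, so
the range between left and right keeps shrinking until the person is found or nothing
is left.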
And a lot of the process of learning to code is just going to be some bumps in the
road because you can't quite see the new syntax in a familiar way. But you'll find
that the ideas, in fact, are going to be more familiar than you might otherwise
think. And so we'll see in a bit-- and we'll take a break in a moment to take a
breather-- that you will see these same ideas in a moment in the context of
Scratch, an actual programming language via which we'll drag and drop puzzle pieces
to make actual code work.
We'll see some variants of these ideas, things called "arguments" and "return
values" and "variables." But we'll ultimately convert it into this somehow. Anyone
want to wager what this program will do if fed to your Mac or PC or phone? Here's
just a massive pattern of zeros and ones.
AUDIENCE: [INAUDIBLE]
But at the end of the day, you're never going to be writing the 0's and 1's
yourselves, though our ancestors, once upon a time, did in some form. We'll be
using a much higher-level language, like this in C, or better yet, in just a
moment, like in Scratch, like this.
And indeed, this is why today we focus on and begin with Scratch, this graphical
programming language, so we have a way of expressing ourselves with functions,
conditionals, loops, and more, but in a way that doesn't have stupid parentheses
and curly braces and all of these visual distractions in the way and we'll translate
that thereafter to this lower-level language. But for now, that was a lot.
That was definitely a fire hose. Let's go ahead and take a 10-minute break. Feel
free to get up or stay here. And we'll resume in a bit with some actual code.
[VIDEO PLAYBACK]
[THUNDER RUMBLING]
- (SINGING) Hi. Hi. We're your Weather Girls, and have we got news for you. You
better listen. Get ready, all you lonely girls. And leave those umbrellas at home.
All right. Humidity is rising. Hmm, rising. Barometer's getting low. How low, girl?
According to all sources-- what sources, now-- the street's the place to go. We
better hurry up. 'Cause tonight, for the first time, just about half-past 10:00--
half-past 10:00-- for the first time in history, it's going to start raining men--
start raining men!
It's raining men, hallelujah! It's raining men, amen! I'm going to go out. I'm
going to let myself get absolutely soaking wet! It's raining men, hallelujah!
[END PLAYBACK]
[APPLAUSE]
So this then is Scratch, a graphical programming language from our friends down the
road at MIT's Media Lab that indeed some of you might have used in grade school or
the like, for playing and writing code and the like, but you maybe didn't
necessarily think about how some of these primitives ultimately worked.
And in fact, everything you've done-- if you've used Scratch before-- and
everything you'll see today is going to apply to all of the weeks to come, as we
explore these things called "functions" and "loops" and "conditionals," "Boolean
expressions" and more. With Scratch, because it's so graphical and animated-
congruent, can you create animations, like this one, interactive art, and software
more generally.
But you'll do so by dragging and dropping puzzle pieces that only lock together if
it makes logical sense to do so. And what you won't have to deal with, in this first
week of class, is curly braces, parentheses, all of the weird symbology that you
might recall seeing, when we just wanted to say, "Hello, world." Now, this
particular program, "Raining Men," was written by a former CS50 teaching fellow,
Andrew Berry, who's actually now the general manager of the Cleveland Browns, the
American football team.
And so these are just some of the programs that some of your predecessors in the
class have created. And you'll see, in the remainder of class here, a couple of
others as well, and more in the course's first assignment, namely, problem set
zero. So how do we get there? Well, first a quick tour of what it is we're going to
do. This, in Scratch, is perhaps the simplest program you can write.
And even if you've never seen Scratch or any programming language before, you can
probably guess that this just says, on the screen somehow, "Hello, World." But what
you don't have to do is type esoteric commands and weird syntax, those curly braces
and parentheses I keep alluding to. You just drag this yellow puzzle piece. You
drag this purple puzzle piece. Let them magnetically lock together, so to speak.
Click a button and boom. With those same building blocks and several others, can
you make exactly the sorts of things that Andrew brought to life as well. So here's
what we're about to see. At Scratch.MIT.edu is a cloud-based programming
environment on MIT servers. You can also download it offline on your own Mac or PC.
And it gives you an interface like this.
On the left-hand side of the screen, you'll see a blocks palette. These puzzle
pieces, a.k.a. blocks, come in different colors which rather categorize them. So
pictured here, for instance, in blue, are a whole bunch of motion-related blocks.
So Andrew used a whole bunch of those to have the singer and the men moving around
on the screen in synchronicity with the song that was playing in the background.
Meanwhile, in the middle of this interface is going to be the code area. And this
is where Andrew, and soon you, will drag and drop some of those puzzle pieces and
other colors as well and lock them together to get your character-- soon to be
invented-- to do something on the screen. Indeed, at the bottom right here, will
you see, ultimately, a sprite area, where a sprite is a technical term for like a
character in a video game or a programming environment like this.
By default, historically, Scratch is the cat, the mascot, if you will, for this
programming environment. And so here we see, by default, just one sprite selected
because on the top right of the screen is the stage for that sprite. And you can
click and zoom in to make it full screen. But this is the world in which Scratch--
by default, the cat-- will live. But you can change Scratch's costume so that it
looks like a singer or the man falling from the sky or the like or anything else,
either creating the art yourself or importing some of the things that come with it
or elsewhere online.
So what is this world that Scratch rather lives in? Well, generally speaking, we
won't have to care too much about numbers because we'll be able to ask questions,
like interactive ones, like is Scratch the cat, or any character otherwise,
touching the edge of the screen, touching something else? But Scratch does exist in
this two-dimensional coordinate-system world. So when the cat or any character is
dead center in the middle, that would be xy location 0,0, if you will.
Meanwhile, over here is 240 pixels, or dots, all the way to the right. So this
would be 240,0, where y is 0 because it's right on that midline. So it's neither
above nor below. Over here to the left, of course, would be -240 and 0. Above the cat
would be x equals 0, because it's right on that vertical midline, and 180. And then
down here, as you might guess, would be 0, negative 180.
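The coordinate system just described can be modeled with a small Python sketch. The
function names are hypothetical, and treating a sprite as a single point is a
simplification; Scratch itself works with the sprite's whole image.

```python
# Sketch: the stage as a grid centered on (0, 0).
X_MAX = 240  # rightmost x; the leftmost is -240
Y_MAX = 180  # topmost y; the bottommost is -180


def on_stage(x, y):
    """Return True if the point (x, y) lies within the stage."""
    return -X_MAX <= x <= X_MAX and -Y_MAX <= y <= Y_MAX


def touching_edge(x, y):
    """Return True if an on-stage point sits on one of the four edges."""
    return on_stage(x, y) and (abs(x) == X_MAX or abs(y) == Y_MAX)


print(on_stage(0, 0))         # True, dead center
print(touching_edge(240, 0))  # True, far right on the midline
print(on_stage(0, -181))      # False, just below the bottom edge
```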
Generally speaking, we don't have to care about those precise pixel coordinates.
But it's helpful, ultimately, if you do want the cat to move up, down, left, or
right. Having some sense of direction according to the x-axis and y-axis as well
can help you express your ideas, ultimately. So what might some of those ideas be?
Well, let's do this. I'm going to go ahead and create, on Scratch.MIT.edu, just an
empty screen like this one here.
And so this is the exact same interface. But now I'm in my browser, full screen, so
that I can start writing some code. And let's get that cat to say something
actually on the screen. Now, this takes a little bit of practice. But honestly,
just by scrolling through these puzzle pieces can you quickly get a sense of what's
possible, not just categorically, but specifically. And I'll jump around because
I've done this, of course, before.
But I'm going to go to events, in yellow, first. And I'm going to drag and drop
this first block, called when Green Flag clicked. And I've zoomed in there just to
make it a little more legible. And notice that the shape of this Green Flag just so
happens to mirror this Green Flag here at top, next to this red Stop Sign, of
sorts. And the Green Flag is going to mean go and the red Stop Sign is going to
mean stop, to start or stop our program.
Next week, you're going to be writing a textual command at your keyboard to do the
exact same idea. But for now, it's a button. So when Green Flag clicked, what do I
want Scratch to do? Well, how about we have Scratch just initially say something
like, "Hello, world," which indeed, historically, is the first program that most
any programmer might write. So anything related to what the cat looks like it's
doing is actually going to be under looks, here in purple.
So I'm going to drag over Say "hello." And you'll notice something curious and
different about this purple block. It says, of course, "Say" in purple. But then
there's this white oval and some text that, by default, is "hello" because MIT just
decided that, by default, the placeholder will be "hello." But anytime you see this
white oval, it's an opportunity to provide an input into the function called Say.
And so here I'm borrowing terminology from before. Problem solving, again, is all
about inputs producing outputs. And in between there is some algorithm. In a
moment, we're going to start referring to algorithms quite frequently as
"functions." Why? Because a function is the implementation of some algorithm. So let me
override the default with, "Hello, world." I'll zoom out. And now if I go to the
top right of the screen and click the Green Flag, we'll see, hopefully, my very
first program in code.
Now, it wasn't a huge lift, right? It only was a matter of dragging and dropping
puzzle pieces. But what has now happened? Well, it turns out that two things have
happened. When I, the human, clicked on that Green Flag, I triggered, what we're
going to start calling now, an "event." An event is generally something graphical
or interactive that just happens in a computer program.
You and I trigger events on our phones all day long. Whenever you tap or drag or
long press or pinch or any of those gestures in vogue nowadays on phones, you are
triggering events. And people at Apple and Google and elsewhere have written code
that listens for those events and do something when that event happens. That's what
I just did. When Green Flag is clicked, I want something to happen, namely, I want
this purple function, this verb, this action called Say, to do something.
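The idea of code that listens for an event and reacts to it can be sketched in
Python. Everything here (the handlers table, the when decorator, the trigger
function, and the event names) is hypothetical and is not how Scratch is actually
implemented.

```python
# Sketch: a tiny event system. Handlers are registered under an event name
# and run only when that event is triggered.
handlers = {}


def when(event_name):
    """Register the decorated function to run when event_name happens."""
    def register(func):
        handlers.setdefault(event_name, []).append(func)
        return func
    return register


def trigger(event_name):
    """Run every handler registered for event_name; return their results."""
    return [handler() for handler in handlers.get(event_name, [])]


@when("green flag clicked")
def greet():
    return "Hello, world"


print(trigger("green flag clicked"))  # ['Hello, world']
print(trigger("space key pressed"))   # [] because nothing is listening
```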
What do I want it to do? I want it to say what this input is. And I'm going to
introduce another vocabulary term. The white ovals here are, yes, inputs, very
generically. But in a programmer's terminology, they're called "arguments,"
otherwise known as "parameters." And that just means an input to a function that
modifies its behavior in some way. When I click Stop, that's just another event.
And that one is just built into Scratch. Scratch knows that when you click the
red Stop Sign, everything should just stop automatically. I don't have to write
code to support that feature. So that's all fine and good, "Hello, world." But if I
keep doing stop and start and stop and start, it's going to do the same thing again
and again. And it's really not that interesting, at the end of the day, maybe
gratifying once, but it'd be nice if this were a little more interactive.
So it turns out that we can do that too. But we need a different mental model
instead. So in this case here, when we think about this function, Say, and this
input, "Hello, world," this actually maps pretty cleanly to this model from earlier
that I proposed is problem solving, is computer science, if you will. The input to
the current problem is going to be in white here, "Hello, world."
The algorithm is the "say" algorithm. Now, I don't know how MIT got it to print out
the little, pretty speech bubble on the screen. But they wrote those underlying
low-level implementation details. And they gave me and you a purple function,
called Say, that just does that for you. You and I don't have to reinvent that
wheel. The output of Say is another technical term, now, called a "side effect."
A side effect is usually something visual that happens, like as a side effect of
you calling a function. And so the side effect here is that the cat has this speech
bubble magically appear, inside of which is "Hello, world." So we have an input. We
have an output. We have an algorithm. But now we're talking about these ideas in
the context of programming. So now the input is an "argument."
The algorithm is a "function." And the output, in this case, is a "side effect"--
terminology that you'll just hear more and more. And it'll eventually sink in, but
not to worry if the terminology doesn't come naturally early on. So what more might
I do with this? Let me go back to Scratch here and make this maybe perhaps more
interactive and actually get the cat to say something a little more dynamically.
So instead of "Hello, world," why don't we get him to say hello to me or to you or
anyone else? So let me do this. Let me go under, say-- let me get rid of this
first. And you'll notice this neat trick. As soon as you start dragging on a block,
if it gets close to it, it kind of goes gray, and it can be magnetically snapped
together. You don't have to do it very precisely. Conversely, if I want to get rid
of a puzzle piece, I can just drag it anywhere on the left, let go, and that
deletes it.
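For readers who think more easily in text, the vocabulary introduced above (argument, function, side effect) can be sketched roughly in Python. This is only an illustration, not how Scratch itself is built: `say` and `speech_bubbles` are hypothetical names, and printing stands in for the cat's speech bubble.

```python
# Hypothetical stand-in for Scratch's purple Say block.
speech_bubbles = []  # records what has been "said" so the effect can be inspected


def say(message):
    """Show a message. The visible output is a side effect, not a return value."""
    speech_bubbles.append(message)
    print(message)


say("Hello, world")  # "Hello, world" is the argument passed to the function
```

Calling `say` hands nothing back to the caller; all it does is change what appears on the screen, which is what makes the output a side effect.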
Or you can Right-click or Control-click and a little menu will let you delete it as
well. Well, let me do this instead. Under Sensing, which I know is there because
I've done this before, are a whole bunch of things related to Sensing, whereby the
cat can kind of feel out its world, in some sense. It can do things like ask this
question, "Am I touching the mouse pointer?"-- like the user's cursor.
"Am I touching a specific color that you can override to be anything you want?" "Is
the distance to the mouse pointer some specific value?" But for now, I'm going to
focus on this, this blue puzzle piece that asks a question, which itself is this
white oval that I can apparently change, and then it's going to wait for a
response. But this puzzle piece is a little different. It's a little special. It
comes with a freebie.
It comes with what we're going to call, technically, a "return value." So some
functions don't just do something on the screen. They hand you back, so to speak, a
value that you can do anything that you want with. Nothing happens immediately
unless you do something with that so-called return value. So let me go ahead and
drag this thing over here, ask, "What's your name?" And I'll use the default
question.
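A return value can be sketched in Python along these lines. The function name `ask` and its `read` parameter are assumptions made for illustration; a canned reply replaces the keyboard so that the sketch runs unattended.

```python
def ask(question, read=input):
    """Print a question, wait for a reply, and hand the reply back to the caller."""
    print(question)
    return read()


# A canned reply stands in for typing, so no keyboard input is needed here.
answer = ask("What's your name?", read=lambda: "David")

# Nothing has been greeted yet: the return value simply sits in `answer`
# until the program decides to do something with it.
print(answer)
```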
That seems a reasonable place to start. I'm not going to override that default. And
now let me go ahead and zoom out. Let me go back to Looks. Let me go to Say. And
let me just form the English sentence I want. So let me zoom in here and type in
"hello," maybe comma, space. I could do "David," but that's obviously not right
because I'm asking for a name, and then I'm like, in advance, hard-coding my name.
That's not what I want.
I just want, "hello," comma. And now let me zoom out and grab one more Say block.
Let me maybe Say here. OK, I don't want to say, "Hello, hello." I don't want to
just type in my own name because, again, then what's the point of asking the user
for their name? But notice this. If I go back to the sensing block, this is where
that oval that's blue, called Answer, is useful.
This will be the so-called "return value" of that function. So I'm just going to go
ahead and do this and drag and drop. Even though it's not the right size, it is the
right shape. And so Scratch will be smart about it and grow to fill that puzzle
piece for you. Let me zoom out now. And now let me click the Green Flag. You'll see
that Scratch is indeed prompting me with a speech bubble, "What's your name?"
Notice the little text box below the cat is asking, what's your name? So I'm going
to type in D-A-V-I-D and hit Enter. Or I can click the blue check. Enter. OK, it's
a little weird. I wanted him to say, "hello," not just my name. So let me Stop. Let
me start it again. All right, hello, what's your name? D-A-V-I-D. Enter. Huh-- kind
of rude. Why is there this bug? Like, I wanted to say, "Hello, David," not just
"David." And yet twice it has failed to do so. Yeah--
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: It just seems to have just said, "David." So all right, how can I fix this? Well,
here's where you start to poke around and think about how you might solve this. Let
me go back under Looks. Maybe there's a smarter way to do this. Maybe I could do--
OK, I could do this. How about instead of just Say "hello," there's apparently
another puzzle piece where I can time it so I can maybe slow things down a little
bit.
So let me do this. Let me throw away all of this. Let me drag a Say "hello" for 2
seconds. Let me drag another Say "hello" for 2 seconds. Let me change the first one
to, indeed, "hello" comma. And then let me go back to Sensing. Let me grab that
same answer because I threw it away a second ago, and I'll just change it. I don't
even have to delete "hello." I can just overwrite it like this.
So now I think we'll kind of pump the brakes and see things more slowly. Let me
Stop. Let me start. D-A-V-I-D, Enter. Hello, David. OK, so it's better, like it
seems to be working. I think your hypothesis was right, just looks kind of stupid,
right? Like, the fact that it's saying Hello--
[PAUSE]
--David, like we can do better. And like, literally every piece of software on your
phone or Mac or PC is better than that. It adds words together in the user
interfaces you and I are familiar with. So let's go a little more fishing here. Let
me throw away these. Let me go back to Looks and just get the simpler Say. I want
this to say, "Hello" comma name, where name comes from that Answer return value.
So how can I do this? Well, let me go under Operators, which we haven't been to
before. There's a lot of stuff in here. Some of it's mathematically related,
adding, subtracting, and so forth. You can generate random numbers which might be
useful. And if I keep scrolling down, there's this Join "apple" and "banana." But
that's just placeholder text. You can join one piece of text with another piece of
text, by default "apple" and "banana."
But let's change it to "hello" and my name. So this, too, wrong size but right
shape. So let me let it snap into place. Let me go ahead now and do "hello" comma.
And now I think I just want to go grab that Answer return value. Let me drag the
same oval as before, clobber-- that is, overwrite-- banana. So now I'm kind of
composing functions. The output of one function, Join, is going to be the input of
another function, Say.
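The nesting described here, where one function's output becomes another's input, might look like this in Python. The names `join` and `say` mirror the Scratch blocks but are hypothetical helpers, and `answer` is hard-coded in place of the Ask block's return value.

```python
def join(first, second):
    """Return the two pieces of text glued together, like the Join block."""
    return first + second


def say(message):
    """Stand-in for the Say block: printing replaces the speech bubble."""
    print(message)


answer = "David"  # assumed here; in Scratch this comes from the Answer oval

# The inner call runs first, and its return value is the argument to say.
say(join("hello, ", answer))
```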
So let's see what happens now that they're kind of stacked on top of each other or
nested, so to speak. Click the Green Flag, D-A-V-I-D. Enter. "Hello, David." All
right, that was pretty fast. Let's just do it once more. Stop. Start. Here we go,
D-A-V-I-D. Enter. OK. All right, it's not the most exciting program in the world.
But it's more correct. It's better designed just because that's what you would kind
of expect the software to do and not be some kind of lame user interface that's
just inserting random delays to just make it kind of work, like that's a
workaround, a hack, if you will.
But there's some cool things you can do with Scratch. And we won't really go down
the rabbit hole of all of the fun and family-friendly features that it has. But
there is one that's kind of cool here. Let me go into the Extensions button at the
bottom left of my screen. And this one's kind of cool. Let me go to Text to Speech.
And you'll notice that this one requires internet because it's cloud based.
But this just gave me some new puzzle pieces in a new category, Text to Speech. And
these green ones do exactly what they say. So let me do this. Let me zoom out
again. Let me keep the Join block. And I'm just going to temporarily toss it over
here. It's not going to delete itself because I didn't drag it over to the other
side. But I'm going to get rid of the Say block, in purple. I'm going to do the
Speak block here, in green, and let it snap into place.
And then I'm going to drag and drop this onto the input to Speak. And now, perhaps
a little more adorably, let's try this. Green Flag, what's your name? D-A-V-I-D.
Enter. And--
[LAUGHTER]
It's a little robotic. But at least now it has synthesized speech. And I've kind of
got my own, like, Siri or Google Assistant or Alexa thing going on here now, where
it's now recognized whatever text it is, and it's played it. Well, let's make this
an actual cat that doesn't talk in that weird human voice. Let me go ahead and get
rid of most of this stuff. And let's get the cat to actually meow, like a cat tends
to.
And let me go under the Sounds block. Now MIT gives you a few sounds for free
because it's designed around a cat, by default. And I'm going to go ahead and grab
this one, Play Sound Meow until done. And now-- and we heard a teaser for this
earlier in the crowd--
[MEOW]
It's a little piercing, admittedly. Maybe we can lower the volume a little bit
there. But notice, if I want the cat to meow a second time, I'll just click it
again.
[MEOW]
[MEOW]
OK.
[MEOW]
All right, so it's kind of cute now, right? So it's just meow-- OK, yes, echo,
echo. So it's meowing now every time I hit the Green Flag. Now, that's great, but
even a kid is probably going to--
[MEOW]
[MEOW]
--just meow, perhaps, like again and again, without having to keep--
[MEOW]
--hitting the button. So how might we do this? All right, well, if I want it to
meow multiple times, why don't I just, like, grab it another time and another time?
Alternatively, you can Right-click or Control-click a puzzle piece and just
duplicate it from a little menu that drops down. So here we go, three meows.
[MEOWING]
All right, that's not really a happy cat. It sounds maybe hungry. So can we slow
that down? Well, maybe. In fact, if I poke around, let me go under Control. It
looks like there's a Wait block. Wait 1 Second, by default. And notice, Scratch
will be pretty accommodating. If you just hover in between blocks, it will grow to
fill that too. So I could change it to 1 or 2 or anything, seconds. I'll just leave
it at the default for now, 1. And now I'll go ahead and do this.
[MEOWING]
OK, so cuter and less hungry and just more friendly. But this isn't the best
design. It is correct. And let's use that as a term of art. Correct means the code
does what you want it to do. I want the cat to meow three times slowly. And it did.
So I'd wager this is correct. But it's not the best design. And this is where
things get more subjective, right? Like, you could write accurate sentences in an
essay for an English class, but otherwise, it's just completely a mess.
Like, your arguments here and there, and you don't say anything wrong, but you
don't say it well. In the context of code, we can do better than this. And
Copy/Paste or repeating yourself again and again tends to be bad practice. Why?
Suppose that you want to change the Wait to 2 seconds instead of 1. It's admittedly
not a big deal. Fine, I click there, I change it to 2. I click there, I change it
to 2.
But what if you meow 5 times, 10 times? Now I have to change the Wait, like, in 5,
10 different places. Like, that's just stupid. It's taking unnecessary human time,
and you're going to screw up eventually, especially if your program is getting
longer. You're going to miss one of the inputs. You're going to leave the number
wrong. And that's a bug.
So just based on what you've seen already or if you've programmed before, which a few
of you have, what's the term of art here that will solve this? How can we design
this better?
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: I heard it here. Yeah, so a loop-- a loop, some kind of cycle that
says, do that again. Do that again-- not infinitely many times, necessarily, but
some finite number. Well, you can perhaps see a spoiler on the screen. Under the
same orange Control category is a Repeat block. And by default, it's proposing 10.
But we can change that. So let me do this. I'm going to throw away most of this
Copy/Paste as redundant.
I'm going to detach this temporarily just to make room for something else. And I'm
going to drag a Repeat block over here and let that snap into place. And I'm going
to change it for now, just to be 3, for consistency. And this is the correct shape
even though it's too small, but Scratch will accommodate that for us. And now--
same output but arguably better designed.
Why? Because if I want to change the number of meows, I change it in one place, no
Copy/Paste messiness. If I want to change the waiting, one place. I don't have to
change it in multiple places and hope I don't screw up. So let me hit the Green Flag.
[MEOWING]
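The same design point can be sketched in Python: with a loop, the count and the wait each live in exactly one place. This is a rough illustration only; `meow_script` is a hypothetical helper that lists the steps as text instead of playing audio or pausing.

```python
TIMES = 3         # how many meows; changed in one place
WAIT_SECONDS = 1  # pause after each meow; also changed in one place


def meow_script(times, wait_seconds):
    """Return the steps a Repeat block would run, in order."""
    steps = []
    for _ in range(times):
        steps.append("play sound Meow until done")
        steps.append(f"wait {wait_seconds} seconds")
    return steps


for step in meow_script(TIMES, WAIT_SECONDS):
    print(step)
```

Changing `TIMES` to 10 or `WAIT_SECONDS` to 2 requires editing one line each, no matter how many meows there are.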
All right, so-- nice. Now, it would have been nice if MIT had just given us a meow
block that just automates all of this for us. Let me wager, they gave us the
low-level implementation details. They gave us the Play Sound Meow. But I had to
implement a decent number of blocks just to get a cat to meow again and again. I
feel like we should have gotten that for free from MIT.
Well, they don't have to be the only ones that invent blocks for us to use. You can
write your own functions, your own verbs or actions. So how can we do this? Let's
make our own puzzle piece, called Meow, that uses this code but creates it in such
a way that it's reusable elsewhere. So let me do this. Under my blocks in pink
here, I'm going to go ahead and click, literally, Make a Block.
Now, here's an interface by which I can give the block a name. M-E-O-W will be the
name of this block. And I'm just going to go ahead and quickly click OK. That just
gives me a very generic, pink puzzle piece that starts with the word Define because
Scratch is asking me to define, that is, implement or create, this new puzzle piece
for me. Well, what does it mean to Meow?
I'm going to claim that it means to do these two steps, to play the sound meow and
then just wait for 1 second. But what's powerful about this idea is look at this up
top. Now that I've made a block, it exists in Scratch. MIT didn't need to create
this for me. I created it for myself and even you, if we end up sharing code. So I
can now drag Meow up in here. And what's nice about Meow is that it itself is, yes, a
function, but it's also an abstraction.
Like, never again do I or even you need to worry or care about what it means to
meow or implement it. I can sort of drag it out of the way. I didn't delete it--
drag it out of the way. Out of sight, out of mind. Why? Because my code is now even
better designed, in some sense, because it's more readable. What is it doing? When
the Green Flag is clicked, repeat 3 times Meow.
It just says what it means. And so it's a lot easier to read it, and it's a lot
easier to think about it, especially if you're using Meow in other projects too.
Now, let me go ahead and click Play.
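Defining a custom block corresponds roughly to defining a function. In this Python sketch, `meow`, `when_green_flag_clicked`, and the `played` list are hypothetical names; a list entry stands in for the sound, and the one-second wait is only noted in a comment so the sketch finishes instantly.

```python
played = []  # log of sounds, so the result can be checked without audio


def meow():
    """One meow. Callers no longer need to know the steps inside."""
    played.append("meow")  # stands in for Play Sound Meow Until Done
    # A fuller version would also pause here, e.g. with time.sleep(1).


def when_green_flag_clicked():
    for _ in range(3):  # Repeat 3
        meow()


when_green_flag_clicked()
print(played)
```

The top-level code now reads almost like the English description: repeat three times, meow.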
[MEOW]
Same thing.
[MEOW]
[MEOW]
But I can make this custom puzzle piece, this own function of mine, Meow, even more
powerful. Let me kind of rewind a bit and go to my Meow puzzle piece. And I am
going to Control-click or Right-click on my pink puzzle piece. And I'm going to
edit it. So I kind of regret making Meow so simple. Wouldn't it be nice if Meow
took an input, a.k.a. an argument, that tells Meow how many times to meow.
Then I can get rid of that loop and just tell Meow how many meows I actually want.
So I'm going to click on another button here called, literally, Add an Input. And
it's going to have a placeholder here. So I'm just going to put a placeholder there.
I keep using "n" for number, which is a go-to in computer science terms. And I'm
going to add some descriptive text just so that it's a little more
self-explanatory.
I'm just going to say Meow n Times. But there's only one oval. Times is just going
to be explanatory text. And now notice what has happened. Now my puzzle piece takes
an input, a.k.a. an argument, that will tell that function to meow some number of
times. But it's not just going to work magically. I need to implement that lower
level detail. So let me zoom out. I have to remind myself what this function was.
So I'm going to drag it higher up just so they're on the screen at the same time.
I'm going to go ahead now and temporarily move this over here. I'm going to
temporarily detach this over here. Why? Because what I think I want to do is move
my loop into the function itself, move the Play and the Wait into the loop. But I
don't want to hardcode 3. Notice that n here is its own oval. I can drag a copy of n
and just let it go there.
So now I have a new version of Meow that takes an argument, n, that tells Meow how
many times to meow. And now let me, again, drag this out of sight, out of mind,
because who cares how I implemented it? Once it's implemented, it's sort of done.
Now my program is even better designed, in some sense. Why? Because now it really
just says what it means. There's no loop. There's no repeat, no implementation
details.
When Green Flag Clicked, Meow 3 Times. And so functions indeed let you implement
algorithms, like they're just code that does something for you. But they're also
themselves abstractions. Why? Because once a function exists, it has a name. And
you can think about it in that term. And you can use it by its name. You don't have
to care or remember how the function itself was built, whether it's by you or even
MIT. So again, here I'll click the Green Flag. It's the same thing.
[MEOWING]
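A version of the function that takes the number of meows as an argument could be sketched in Python as follows. The names are hypothetical, and returning a list of strings stands in for actually playing the sound and waiting.

```python
def meow(n):
    """Meow n times and return the sounds made, so the result can be checked."""
    sounds = []
    for _ in range(n):  # the loop now lives inside the function
        sounds.append("meow")  # stands in for Play Sound Meow, then Wait
    return sounds


def when_green_flag_clicked():
    # No loop and no implementation details here: it just says what it means.
    return meow(3)


print(when_green_flag_clicked())
```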
So still correct, but better and better designed. And so any time, from here on out,
with Scratch, or soon C, and eventually Python, when you find yourself doing
anything resembling Copy/Paste or again and again grabbing the same code, probably
an opportunity to say, wait a minute. Let me refactor this, so to speak, that is,
rip out the code that seems to be repeated again and again and put it in its own
function so you can give it a descriptive name and use and reuse it. Any questions
just yet on now saying or these loops or these functions that we're using? Yeah--
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: How did I make it so it meows three times? So I originally only had
a puzzle piece called Meow. And I decided to improve it. So I held down Control and
I Right-clicked or Control-clicked on the pink puzzle piece at top left. And I
clicked Edit. And that brought back the original interface that lets me add some
arguments to the puzzle piece itself. And I clicked Add an Input on the left here.
And then I clicked on Add a Label over here. So that just lets you customize it
even further. All right, so we've done this. Let's add one of those other
primitives too to do something optionally. So how about we make the cat meow only
if it's being petted by a human, as by moving the mouse to hover over the cat, like
a human would pet a cat? Well, let me go ahead and throw away the meowing for now.
And let me simplify it by just using a sound. I'm going to go ahead and do this.
I'm going to go ahead and have a Control block that says If, because I want to
implement the idea of if the cursor is touching the cat, then play sound meow. Or I
could use my same pink puzzle piece. But I'm going to throw that away and focus
only now on the sounds. And I'm going to do this.
If touching mouse pointer-- so I need to sense something about the world. And we
saw this earlier-- so If Touching Mouse Pointer. So notice this shape here, way too
big. But it is the right shape. So if I hover just right, it'll snap into place.
And this now, in blue, is my Boolean expression, a yes/no question, true or false.
"If" is a conditional. And what do I want to do? Well, if the cat is touching the
mouse pointer, I want to go ahead and play sound meow until done.
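In Python, a conditional guarding an action with a Boolean expression might be sketched like this. Everything named here is an assumption for illustration: the cat is treated as a simple rectangle, `CAT_BOX` holds made-up coordinates, and returning "meow" stands in for playing the sound.

```python
CAT_BOX = (-50, -50, 50, 50)  # hypothetical bounds: left, bottom, right, top


def touching_mouse_pointer(cat_box, cursor):
    """Boolean expression: is the cursor inside the cat's bounding box?"""
    left, bottom, right, top = cat_box
    x, y = cursor
    return left <= x <= right and bottom <= y <= top


def on_green_flag(cursor):
    """Check the condition once and meow only if it is true."""
    if touching_mouse_pointer(CAT_BOX, cursor):
        return "meow"
    return None


print(on_green_flag((0, 0)))      # cursor over the cat
print(on_green_flag((200, 150)))  # cursor elsewhere on the stage
```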
So let's do this. I'm going to hit Green Flag, click. Now nothing's happened yet
because it's a conditional, right? It's only supposed to do something if I'm
touching the cat. Let me move the cursor over to the cat. And-- wait for it. Hmm--
another bug. Why is the cat not meowing even though I very explicitly said, If
Touching Mouse Pointer, Meow? Yeah, in the middle--
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Yeah, this is-- again, my computer's just so darn fast, like yours.
I click the Green Flag, it asks the question, am I touching the mouse pointer?
Well, no, because my cursor was up there, not touching the cat. It's too late. The
cat's out of the bag. And so we have to instead solve this by some other means. How
can we fix this? How do we fix that sort of race? Yeah--
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Yeah, so why don't we just keep asking the question until I
eventually am or am not actually petting the cat? So let me detach this
temporarily. Let me go under Control. Let me go under-- instead of repeat some
finite number of times, let's just do it forever. So sometimes loops that do work
forever are a good thing. The clock on your phone, that's in a loop forever because
you want it to always tell time and not stop at the end of the day.
So sometimes you do want code to loop forever, as in this case. So let me go ahead
and drag and drop it there. Let me, again, click the Green Flag. Nothing's
happening yet. But notice, the program is still running. And so if I move my
cursor, move my cursor, move my cursor, and--
[MEOWING]
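The difference between asking the question once and asking it in a loop can be sketched in Python. This is a simulation under stated assumptions: the cursor's path is a made-up list of positions, the cat is a made-up rectangle, and the loop ends when the recorded positions run out, whereas Scratch's Forever block never ends on its own.

```python
CAT_BOX = (-50, -50, 50, 50)  # hypothetical bounds: left, bottom, right, top


def touching(cursor):
    """True if the cursor position falls inside the cat's bounding box."""
    x, y = cursor
    left, bottom, right, top = CAT_BOX
    return left <= x <= right and bottom <= y <= top


def check_once(cursor_positions):
    """Ask the question a single time, at the instant the program starts."""
    return ["meow"] if touching(cursor_positions[0]) else []


def check_forever(cursor_positions):
    """Ask the question again on every pass of the loop."""
    sounds = []
    for cursor in cursor_positions:
        if touching(cursor):
            sounds.append("meow")
    return sounds


# The cursor starts up near the Green Flag and then moves onto the cat.
path = [(200, 170), (120, 90), (40, 20), (0, 0)]
print(check_once(path))
print(check_forever(path))
```

The single check misses the cat because the cursor is still far away when the program starts; the repeated check meows on every pass once the cursor arrives, which is also why the cat meows continuously while being petted.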
OK, so maybe we could add some Waiting. But the cat does not want to be pet, in
this case. But it's indeed conditional. So there we have an incarnation in Scratch
of doing something conditionally. Now, we can make this really cool, really fast,
if you will. Let me stop this version. Let me go ahead and do this. Let me go ahead
and throw all of this away. Let me go into my little Extensions bucket over here.
And let me do Video Sensing, since most laptops or phones these days have cameras.
And there, indeed, I am, with Sanders behind me. And let me do this. When Video
Motion-- and let me get out of the way. When Video Motion is Greater Than some
value. So 10 is the default. This is just a number that measures how much motion
there is or isn't. So small number is like no motion.
Big number is lots of motion. So I'm going to choose 50, somewhat arbitrarily
here-- so 50. This is not normal to program off to the side. But I'm now going to
say this. When Video Motion is Greater Than 50, go ahead and Play Sound Meow like this. So the
cat is still in that world. I'm going to stop the program and rerun it. So here we
go, Green Flag. And now here comes-- all right, this is a little creepy, the way
I'm petting the cat, but-- and-- [SIGH]
[MEOWING]
OK.
[LAUGHTER]
There we go. OK, so 50 was too big of a number. I have to pet the cat faster.
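The effect of the threshold can be illustrated with a small Python sketch. The motion readings are invented numbers, and `meows_for` is a hypothetical helper, not part of Scratch's Video Sensing extension.

```python
def meows_for(motion_readings, threshold):
    """Return one "meow" for each reading that exceeds the threshold."""
    return ["meow" for reading in motion_readings if reading > threshold]


readings = [3, 12, 8, 35, 61, 20]  # made-up motion values from the camera

print(len(meows_for(readings, 50)))  # high threshold: only vigorous motion counts
print(len(meows_for(readings, 10)))  # low threshold: gentler motion counts too
```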
[MEOWING]
[LAUGHTER]
Yeah, so--
[MEOW]
OK, so you can make things even more interactive in this way by just assembling
different puzzle pieces. And honestly, there are so many different puzzle pieces in
here. We're not going to even scratch the surface of a lot of them. But they
generally just do what they say. And indeed, when you see on the screen here this
palette of puzzle pieces, really a lot of programming, especially early on, when
learning a language, is just trying different things, trying and failing.
And if it doesn't work quite right, look for an alternative solution there too, as
even I just had to do a moment ago. Well, let's go ahead and use, actually, how
about another example of something a predecessor of yours made? Let me go ahead and
grab a program I opened in advance here called Whack-A-Mole. Might we get a brave
volunteer to come up, who is willing to whack a mole with their head, virtually?
Maybe-- OK, let's see, how about in the way back? You want to come on down? All right,
come on down. Sure, a round of applause for our volunteer.
[APPLAUSE]
AUDIENCE: Hi there.
[APPLAUSE]
DAVID J. MALAN: All right, so same idea here-- I'll take the mic back. You'll have to stand in
front of the camera. In just a moment, you're going to have to position your head
in a box that your classmate from yesteryear created.
[MUSIC PLAYING]
AUDIENCE: OK.
[LAUGHTER]
[APPLAUSE]
DAVID J. MALAN: So notice how using some fairly simple primitives, things do get interesting pretty
fast. And how was that implemented? Well, there were probably at least four
sprites. So you're not confined to just one cat. You can create more and more
sprites, change what they look like. So they actually look like a mole, in this
case. There's probably some conditionals in there, some loops for 30 seconds.
That's checking if Josh's head's movement is exceeding some value over this way or
over this way, then increment something called a variable. We'll see those too.
Just like in algebra you might have x and y and z, storing values like numbers, so
can computer programs have variables called x or y or z, or more descriptively
called Score, as in this case at top right, or another variable called Countdown,
typically one word in code, but in this case two words, that just store some value.
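A variable such as Score, and the act of incrementing it, might be sketched in Python like this. The list of detections is invented for illustration; the actual game's logic is not shown in the lecture.

```python
score = 0  # a variable with a descriptive name, much like x or y in algebra

# Made-up detections: True means a mole was whacked on that check.
detections = [True, False, True, True]

for whacked in detections:
    if whacked:
        score = score + 1  # increment: add 1 to the value already stored

print(score)
```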
So there's probably some math going on in there whereby the author of this program
just is incrementing, that is, adding 1 every time it detected that a mole
had been whacked, in this case, with movement. So back in the day, I, myself,
actually implemented my very first program in Scratch when I was a graduate
student, actually, at MIT-- cross-registered at MIT, taking a class from MIT's
Media Lab, specifically, the Lifelong Kindergarten group, which is the group that
created Scratch, itself.
And the program I wrote all those years ago and still rather cling to is a little
something here called Oscartime, that I thought I'd play just a quick excerpt of
myself here. So in this case, consider, as the music starts playing, how this
program, which is much more sophisticated, certainly, than the earliest "Say hello"
examples we just did might also be implemented. Let me go ahead now and click the
Green Flag.
DAVID J. MALAN: OK, so some trash is moving, presumably in some kind of loop from
the top. If I'm touching the mouse cursor, it follows me. If I hover over the trash
can, it responds. If I let go, in some kind of loop, Oscar pops out, creates a
variable with the current score. And it happens again.
OSCAR THE GROUCH: (SINGING) It's all full of holes, and the laces are torn. A gift
from my mother the day I was born. I love it because it's trash. Oh, I--
OSCAR THE GROUCH: (SINGING) I have here some newspaper, 13 months old.
DAVID J. MALAN: So more and more sprites are suddenly appearing. And notice, that
each time they're appearing from a different part of the screen. That's an
illusion, perhaps, too, that-- pick a random number between x and y. So you can
actually pick some range of values to have the game constantly changing. And
indeed, I'm going to go ahead and click Stop, since I spent like eight hours plus,
years ago, making this.
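Picking a random starting position within a range can be sketched in Python with the standard `random` module. The function name is hypothetical; the bounds assume Scratch's stage, whose x coordinates run from -240 to 240.

```python
import random

STAGE_LEFT, STAGE_RIGHT = -240, 240  # x range of Scratch's stage


def random_start_x():
    """Pick a random x coordinate along the top of the stage."""
    return random.randint(STAGE_LEFT, STAGE_RIGHT)  # both ends are inclusive


# Each piece of trash would begin falling from a different spot.
for _ in range(3):
    print(random_start_x())
```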
And I can never listen to the song again, not that I should be anyway at this point
in my life. But this song is synchronized then with a lot of the actions that are
happening. And ultimately, there's just a lot of building blocks. But I didn't sit
down and implement Oscartime, as I called it, all at once. I really did take baby
steps, so to speak. And I figured out, well, how could I decompose this vision I
had at the time to create this game ultimately?
And how do I bite off maybe the easiest parts first? And honestly, the first thing
I did was I found this image, and I just dragged and dropped it into Scratch-- OK,
done-- like, lamppost is installed. It doesn't do anything. It's not interactive.
But I at least set the stage, so to speak, for the program. Then what else might I
have done? Well, let me do this. Let me go ahead and open up in another editor here
an early incarnation of Oscartime by doing this.
Let me go into Oscartime here. Let me full screen this. And here you have-- let me
hide the trash for just a moment-- is what I might call the second version of my
program, wherein, at the top right of the stage here, I had the lamppost, which I
just dragged and dropped and got going, but then I added an actual sprite. And it
has to be a sprite if you want it to do things interactively.
The lamppost-- not a sprite. It's just an image, a costume, if you will, for the
whole stage itself, a backdrop. But this thing is indeed a sprite because it needs
to respond to code and events, like dragging and dropping. So what might I have
done early on with that code? Well, maybe the first version would have been
something like this, whereby my very first version of Oscartime might have said
something like, oh, this.
How about, let me control the program as before-- or, rather, events. When the
Green Flag is clicked, what do I want to do? Well, I want to go ahead and forever
do something like this. Forever-- so I want the lid to open up if I touch it. So if
the cursor gets near the lid, I want the lid to open up. And then if I move away, I
want it to close. So how can I do that? I want an If, but I don't want just one
question, I really want two, a fork in the road that goes left or right, so to
speak.
And let me grab this puzzle piece here, as I did long ago. So notice, it grows to
fill. What's the question I want to ask? Well, under Sensing, I'm going to go ahead
here and say If this trashcan is Touching the Mouse Pointer-- what do I want to do?
Well, I want to change what the trashcan looks like. And this part, I did in
advance of class. If you go up here to Costumes, this is where all the graphical
stuff happens.
And you'll see that I imported a whole bunch of different costumes that
effectively, much like a video, when you play them quickly, creates the illusion of
movement, some animation. But it's really just dot, dot, dot, dot, dot-- different
images showing on the screen. Well, some of these costumes are called like Oscar1,
Oscar2. Oscar1 is closed. Oscar2 is open. So let's just deal with those first.
So if I'm touching the mouse pointer, let me go under-- how about Looks? And we
didn't use this before, but there's this block, Switch Costume to Something Else.
I'm going to drag and drop this inside of the If. And notice it's a little bit
indented. I'm going to change it not to Oscar8, but Oscar2. Otherwise, If Not
Touching the Mouse Pointer-- this is the other direction in the fork in the road--
let's go ahead and switch the costume back to what I described as Oscar1.
So let me run this program. And not much of interest is happening yet. But notice,
if I move the cursor up, down-- but how is that working? It's just changing the
costume that's being overlaid on the sprite. So it looks like interactivity, but
you are really just changing the aesthetics. And we humans are just kind of
assuming, oh, it's opening up. Well, no, it's just changing a costume.
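The two-way fork described above, If and Else, can be sketched in Python. The costume names Oscar1 and Oscar2 come from the lecture; the function name and the hover sequence are assumptions made for illustration.

```python
def costume_for(touching_mouse_pointer):
    """Pick the costume name for the trashcan sprite."""
    if touching_mouse_pointer:
        return "Oscar2"  # lid open
    else:
        return "Oscar1"  # lid closed


# Made-up sequence: the cursor approaches, hovers over the can, then leaves.
hovering = [False, True, True, False]
costumes = [costume_for(h) for h in hovering]
print(costumes)
```

Nothing in the code "opens" anything; it only chooses which image to show, and the viewer supplies the illusion.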
So here's the difference. The high-level abstraction-- trashcan opening. The
lower-level implementation detail-- costume changing, creating that illusion. And if I
want it to look prettier, I could just have many other costumes and go boom, boom,
boom, boom, boom to create more frames per second, if you will. So I need to do one
other thing. Maybe if I accidentally leave the trashcan open, let me make one
change here.
Let me make sure that the very first thing I do when the Green Flag is clicked, is
always start with the trashcan closed because otherwise, you might accidentally
leave it open. So this gets me into some default state. So now it's always closed
until I manually hover over it instead. Well, what might I have done next? Well, if
I wanted to introduce something like the trash, I need a second sprite.
And here, in advance, I grabbed the image already. Let me pretend that this never
happened. Let me drag this away here. And now I have nothing in my code area for
this piece of trash. But it is the second sprite. And all I did was I clicked on
the little cat plus icon here, created a second sprite. I named it trash. I added a
costume for it. Sort of the aesthetic stuff, I did in advance.
But here I'll do now the code. How do I want to do this? Well, how about when the
Green Flag is clicked, for the trash can, I want the trash can in parallel to do--
I want the trash, the piece of trash, to do its own thing. So what I want it to do
is maybe let's do Motion, how about? And let's go to a specific coordinate. Now,
there's a lot of options here. There's Turning, Go to a Random Position, Go to x,y,
Glide, more elegantly.
If you think back to that coordinate system, 0,0 is in the middle. 240 is straight
above it. All right, now, after I do that, what do I want to do? Well, how about I
control this thing by forever falling. Now, how do I make the trash move? We
haven't seen this puzzle piece yet. But under Motion, the very first thing is
called Move Some Number of Steps. By default, it's 10. But we'll do it more simply.
Let me go ahead and move-- oh, sorry. Move is going to move it in whatever
direction it's facing. I only want it to move down. So here, even I'm getting
confused as to how many different ways there are to do things. What I think I want
to do is this. Let me only change my y-axis as follows. So here's another puzzle
piece called Change y. So again, y is the vertical.
So let me just change y by one pixel downward at a time, so -1, one pixel at a time.
So it's kind of slow. And I think now-- I think that's it. Let me hit Stop. Notice
that my trashcan is still going to be interactive. I haven't changed or deleted
that code. I've just added now code for my piece of trash. If I click the Green
Flag, notice that-- after I enable it-- let me start that again.
I had it hidden before class. But let me enable it now-- Green Flag, notice it
starts dead center, at x equals 0, y equals 240, and it's dropping one pixel at a
time. If that seems a little boring, we can change it to -10 pixels at a time and,
boom, it's done. So that's how you might change the speed of a program. But I'm
going to leave it more simply as -1. And honestly, it would be nice if it doesn't
always start from the top.
Otherwise, this game is not going to be very interactive. I'm literally going to be
grabbing the trash from the same place every time. So why don't I, instead, Stop
this. Let me go under Operators, and let's pick a random number. So let me change
the hardcoded-- the manually inputted-- 0, and let's make x be somewhere between 0,
so in the middle and all the way over to-- what was it-- oh, I got my numbers
wrong-- 240 and my y will be 180.
Sorry, I got my x and my y confused. So let me play this again. And now we have a
game that's more like games you might have played growing up or even now, like
there's some randomness to it. So the CPU, so to speak, is doing something more
interesting. Let me run it again. Now it's a little to the left. Let me run it
again. Now it's a little more to the left. Again-- now it's back to the right. So
randomness just makes games more interesting.
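A minimal Python sketch of the same idea, assuming the corrected numbers (x somewhere between 0 and 240, y of 180 at the top of the stage) and a fall of one pixel per pass through the loop. The names `random_start` and `fall` are made up for illustration; they are not Scratch blocks.

```python
import random

STAGE_MAX_X = 240  # right edge of the stage, per the corrected numbers
STAGE_TOP_Y = 180  # top edge of the stage


def random_start(rng: random.Random) -> tuple[int, int]:
    """Start at the top of the stage with a random x between 0 and 240."""
    return rng.randint(0, STAGE_MAX_X), STAGE_TOP_Y


def fall(position: tuple[int, int], speed: int = 1) -> tuple[int, int]:
    """One pass through the forever loop: change y by -speed."""
    x, y = position
    return x, y - speed


rng = random.Random()
position = random_start(rng)
for _ in range(3):  # a real game would loop forever
    position = fall(position)
    print(position)
```

Passing `speed=10` instead of the default corresponds to changing y by -10, which is the faster version shown briefly above.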
And this is why when you play any video game, if different things are happening,
there's probably just some randomness. And it's quantized as just a simple number.
Now, I think I just need one final flourish here, if I may. Let me go ahead and add
this. How about Events-- or rather-- yes, Events. When Green Flag is clicked, I can
do multiple things within the same sprite. They don't all have to be attached to
the same one.
Let me go ahead and forever go ahead and do something else. How about, Whenever the
Trash is-- how about-- Touching the Trash Can-- so Forever If-- let's see, I need a
Sensing block. So how about, Is Touching-- not the Mouse Pointer, this time, but
Touching Oscar himself there. Now let's see what happens. All right, so let's go
ahead and click the Green Flag.
Now I go down over here and let go. OK, I kind of want it to go into the trash can.
How do I make it go into the trash can? How can we take this high-level idea, put
trash into the trash can, and make it seem to disappear? Logically, what could we
do? Yeah--
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: OK, so when it touches it, let's have it disappear. So I could hide
it. Or honestly, if the game is going to be ongoing, like it was, letting me drop
more and more trash, let me just have it go ahead and pick a new random location.
So let me do this. Let me go ahead and Copy this puzzle piece up here and
Duplicate. And I don't want the whole thing, sorry. Let me get rid of this. Let me
just do this.
Let me go back to some random location at the top. So now notice what happens. If I
click and drag on it-- here it goes-- and I let go, it looks like it's going into
the trash can because it snaps back up to some random location. Now, the only thing
I'm not doing really is keeping track of any kind of score. And it turns out, if I
full screen this, it's not going to be draggable, by default.
So just as a corner case, so to speak, something that you might trip over
otherwise, let me go ahead and under, let's see, Sensing, it turns out I also need
this for the piece of trash. There's this way of setting, in Scratch, a sprite to
be draggable or not draggable. I need to explicitly make it draggable so that when
I do full screen this thing now, it still remains draggable and someone like myself
can play it again and again.
Well, how about we supplement this with one final flourish? Why don't we keep track
now of the user score? So how about, when the user actually drags the piece of
trash to the trash can, let me go under Variables here, where, in advance, I've
already made myself a variable called Score. I could have called it x or y or z or
ABC, but that's not very descriptive. In programming, you typically give things a
more descriptive English, or some other language, name.
So I called this one Score. So how do I want to do this in my Score? Well, let me
go ahead and initially set this game score to 0 at the very top of one of these
scripts-- one of these programs up here. And then any time my piece of trash is
touching Oscar, let's not just jump to the top, let's change the score by 1 up
here. So now notice, If Touching Oscar, Change the Score-- that is, Add 1 to the
Score-- and then Pick a new Random location.
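That scoring logic can be sketched in Python as follows. This is an approximation under assumptions: `handle_touch` is a hypothetical name, the touching test is passed in as a boolean rather than sensed, and the respawn range reuses the 0 to 240 and 180 values mentioned earlier.

```python
import random


def handle_touch(touching_oscar: bool, score: int,
                 position: tuple[int, int],
                 rng: random.Random) -> tuple[int, tuple[int, int]]:
    """If the trash is touching Oscar, add 1 to the score and respawn at the top."""
    if touching_oscar:
        score += 1                             # Change Score by 1
        position = (rng.randint(0, 240), 180)  # pick a new random location
    return score, position


rng = random.Random()
score = 0                                      # Set Score to 0 at the start
position = (120, 40)
score, position = handle_touch(True, score, position, rng)
print(score, position)
```

When the trash is not touching Oscar, both the score and the position come back unchanged, so the function is safe to call on every pass through a forever loop.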
And now Green Flag-- let's do this slowly. Here it goes. The trashcan opens. I let
go. And now notice, at the top left of my program, notice the score is now 2.
Notice the score, if I do this again, is about to become 3. And so here we have
building blocks, literally, of making this program better and better and better.
And so, indeed, that's how you generally approach solving any problem with code, be
it in Scratch or C or Python or some other.
You take this vision you might have or some vision you've been assigned in a
homework assignment and try to break it down into these constituent parts and just
pluck off the easy ones first. Put the lamp post there first, and at least feel
like you're making some progress. Then pluck off something like the trash can, and
just make it do a little thing. And it doesn't have to be in some same order here.
I could have done this in a million different ways. But figure out what the small
pieces are that, ultimately, like a few of the problems we've solved today,
assemble into a greater solution there too. So that you have now a mental model for
these types of blocks and others, let's return for a moment to this. We saw a
moment ago that when I started saying, "Hello, David," and nesting those puzzle
pieces, we had a whole different paradigm altogether.
My input for that second version of, "Hello, world," was to now pass in, for
instance, "What's Your Name?" into my function, called Ask. That gave me not a side
effect, but what I called, again, a return value, called Answer, by default, in
Scratch. And now notice and recall, when I had that same output become the input to
my next block, it looked a little something like this-- Say.
So how does this type of block and this nesting, this stacking of blocks, fit into
the same mental model? Well, same idea-- my input for that part of the story is now
taking in not one input but two-- two arguments-- "hello" and the answer from
before. The function, in this case, is that new block called Join. The output
thereof is, "Hello, David," which itself became-- if we sort of animate this-- the
input to my final function, which indeed was still Say.
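In a text-based language, that nesting is ordinary function composition. Here is a small Python sketch, where `join` and `say` are hypothetical stand-ins for the Scratch blocks and the name "David" stands in for whatever Ask would return.

```python
def join(first: str, second: str) -> str:
    """Take two arguments and return them combined, like the Join block."""
    return first + second


def say(message: str) -> None:
    """Produce a side effect (text on the screen), like the Say block."""
    print(message)


answer = "David"              # stand-in for the return value of Ask
say(join("Hello, ", answer))  # the output of join becomes the input to say
```

The inner call runs first, and its return value is handed to the outer call, just as the Join block sits inside the Say block.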
And this is only to say-- no pun intended-- that almost everything that you do with
these puzzle pieces, be it in the context of Oscartime or the mole whacking or even
just something simple like, "Hello, world," will ultimately fit into that
relatively simple mental model there. Now, I thought we'd end by taking a look at
just a couple of final examples. These ones, too, made by some of your
predecessors.
And for this, I thought we would not write code together, but read it instead. And
so allow me to open up one other example here that will show us a few different
versions of a program that a predecessor made. Give me just a moment here. And
we'll see how we might build up to something even more interactive. And in just a
moment, we'll see something they called Ivy's Hardest Game, focused here on these
particular mechanics.
So here is version 0, so to speak, of this program, wherein the goal was to create
a game where you have to get out of some kind of maze. And you have to get out, in
this case, the Harvard crest from this maze. Let me go ahead and just hit Play on
this Green Flag so you can see what the first building block for this program might
have been. Notice that my hand here is actually on the Arrow keys on my keyboard.
And it seems that by moving up, down, left, or right, this little crest on the
screen responds in exactly that way. Now, let's hypothesize for just a moment. Even
though we've not done anything quite like this before, how might this code be
implemented? How do you get a sprite, be it a cat or a crest, to respond to keys on
a keyboard-- might you think intuitively? Yeah--
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Yeah, there could be something sensing what key you're pressing on.
And if you do it again in a forever loop, you'll just constantly be listening for
keystrokes. And this is how, like, every piece of software nowadays works. It's
constantly waiting for your phone to be tapped or something to be typed on the
screen. So let me go ahead and look inside of this existing program here.
And there's more going on, but we'll take a quick glance at what's actually going on.
Well, up here at top left, notice, we just have Go To x Equals 0 and y Equals 0.
That means put the Harvard crest dead center in the middle of the stage. Then we
have Forever two functions that we made in advance as custom functions-- Listen for
Keyboard, Feel for Walls.
So it's doing two things at once. It's forever listening for the keyboard-- up,
down, left, right-- and feeling for the walls, in the sense that if I get too far
to the left, I don't want it to keep moving past that black wall. And if it moves
too far to the right, I don't want it to blow through that wall either. So it's
going to do two things constantly, listening for keyboard and feeling for walls, so
to speak.
And how are those implemented? Well, this one's a bit long. But on the left here is
Listen for Keyboard. So this pink puzzle piece, Listen for Keyboard, first checks
If the Key Up Arrow is Pressed, question mark, Boolean expression in a conditional,
Change y By 1. That means, move it up 1. Else If the Key Down Arrow is Pressed,
then Change y by -1, and similar for Left Arrow, similar for Right Arrow.
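A hedged Python sketch of that conditional chain follows. The set of pressed key names is a hypothetical substitute for Scratch's key sensing, and the left and right cases (changing x by -1 and by 1) are inferred from "similar for Left Arrow, similar for Right Arrow" rather than read off the student's code.

```python
def listen_for_keyboard(pressed: set, x: int, y: int) -> tuple[int, int]:
    """Move the sprite one pixel according to which arrow key is pressed."""
    if "up" in pressed:
        y += 1   # Change y by 1
    elif "down" in pressed:
        y -= 1   # Change y by -1
    elif "left" in pressed:
        x -= 1   # assumed: Change x by -1
    elif "right" in pressed:
        x += 1   # assumed: Change x by 1
    return x, y


x, y = 0, 0                                   # Go To x = 0, y = 0
for keys in ({"up"}, {"up"}, {"right"}, set()):
    x, y = listen_for_keyboard(keys, x, y)    # a real game would loop forever
    print(x, y)
```

As in the Scratch version, the function itself contains no loop; the repetition comes from wherever it is called.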
And even though there's not a loop in this pink function, there is where I'm using
it. So it's constantly being asked again and again. How about feeling for walls?
Well, over here to the right-- it's a little cut off-- but here you have, If
Touching Left Wall, Change x by 1. So if you hit the wall, it's too late. You're
kind of blowing through it already. So I want to move it back one pixel so it's no
longer touching that wall.
Similarly, if it's touching the right wall, I want to back it up one pixel so it's
no longer touching that wall. So it's kind of like bouncing off ever so slightly so
that it doesn't slip through that actual wall. And what are those walls? Well,
notice down here, it's just a simple sprite with a black line that I've oriented
vertically instead of horizontally. And that's just so that I can ask questions of
these other two sprites.
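The back-up-one-pixel trick can be sketched in Python like this. The wall positions are invented for the example, and "touching" is approximated by comparing x coordinates, whereas Scratch checks whether the sprites overlap.

```python
LEFT_WALL_X = -100   # hypothetical wall positions, not from the demo
RIGHT_WALL_X = 100


def feel_for_walls(x: int) -> int:
    """If the sprite has reached a wall, nudge it back one pixel."""
    if x <= LEFT_WALL_X:      # touching the left wall
        x += 1                # Change x by 1
    elif x >= RIGHT_WALL_X:   # touching the right wall
        x -= 1                # Change x by -1
    return x


x = -98
for _ in range(5):            # hold the left arrow for five passes
    x -= 1                    # the keyboard moves the sprite left...
    x = feel_for_walls(x)     # ...and the wall check undoes it on contact
    print(x)
```

Because the check runs right after every move, the sprite can touch the wall for an instant but never ends a pass past it.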
Now, that gives me that form of interactivity. What more can I now do? Well, what
if we make things a little more interactive here? Let me go ahead and see inside
version 1, our second. And let me propose what's going to happen here. Well, how
might we add a little something like Yale into the mix? Well, what's Yale going to
do when I hit the Green Flag now based on this code? Any hunches? Here is the code
for my Yale sprite. Yeah--
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: If Touching the Left Wall Or-- notice the green block-- Touching the Right Wall,
then just Turn around 180 Degrees. And indeed, if you think this through logically,
that just means you're bouncing this way and this way by just flipping yourself
around 180 degrees for just this Yale sprite. So if I go ahead and zoom in on this
and click the Green Flag, I can still move up and down.
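A small Python sketch of that bounce, under assumptions: the sprite is assumed to move one step per pass in the direction it faces, the wall positions are made up, and turning 180 degrees is modeled by flipping the sign of a direction value. The name `yale_step` is hypothetical.

```python
def yale_step(x: int, direction: int,
              left_wall: int = -100, right_wall: int = 100) -> tuple[int, int]:
    """Move one step, then turn around if touching either wall.

    direction is +1 when facing right and -1 when facing left.
    """
    x += direction                          # assumed: move one step forward
    if x <= left_wall or x >= right_wall:   # Touching Left Wall OR Right Wall
        direction = -direction              # Turn 180 degrees
    return x, direction


x, direction = 98, 1
for _ in range(4):                          # a real game would loop forever
    x, direction = yale_step(x, direction)
    print(x, direction)
```

The single `or` condition is what the green block expresses: either wall triggers the same reaction, so one branch covers both.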
But Yale is just kind of doing this all day long, back and forth and back and
forth, forever. Nothing bad happens if I try to go through it. But we could add
that, certainly, to the mix. In fact, let's add one final feature before we play
this particular game. And let me go ahead and open up the final version of these
building blocks that adds MIT to the mix. So here is MIT.
Someone want to explain what this code does? And this is what we're doing. This
itself is a skill. Reading someone else's code and understanding it is half of
programming, besides writing. Yeah--
AUDIENCE: [INAUDIBLE]
DAVID J. MALAN: Yeah, it's chasing down the Harvard logo outline. So this is
apparently the name of the costume that this student made, Harvard logo outline.
And apparently, it goes to a random position first. But then it forever points to
Harvard. So no matter where I'm moving it, up, down, left, or right, MIT is being a
little more strategic than Yale, bouncing back and forth like this.
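One way to sketch "point toward the target, then step" in Python is shown below. This is an illustration, not the student's code: the lecture only confirms that MIT forever points to Harvard, so the one-step move, the `chase_step` name, and the coordinates are all assumptions.

```python
import math


def chase_step(chaser: tuple[float, float], target: tuple[float, float],
               speed: float = 1.0) -> tuple[float, float]:
    """Point toward the target and move up to `speed` units closer."""
    dx = target[0] - chaser[0]
    dy = target[1] - chaser[1]
    distance = math.hypot(dx, dy)
    if distance <= speed:
        return target  # close enough: land on the target, do not overshoot
    # Scale the direction vector so the step has length `speed`.
    return (chaser[0] + speed * dx / distance,
            chaser[1] + speed * dy / distance)


mit = (0.0, 0.0)
harvard = (3.0, 4.0)
for _ in range(3):       # a real game would loop forever
    mit = chase_step(mit, harvard)
    print(mit)
```

The early return when the chaser is within one step is a design choice for this sketch; without it, a chaser can hop back and forth across a stationary target.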
So let's go ahead and play this one in full screen. And here we have a Green Flag.
So if I move up, MIT, rather strategically, is following me no matter where I go.
All right, so still, nothing bad happens. But now it's struggling, right? It's
going up, down, up, down. It's trying to follow me even though I'm not moving. So
we need some final flourishes. And so I think, for this, we need perhaps one final
volunteer.
After this, cake awaits for everyone outside, as is an end of first lecture CS50
tradition. Would you like to come up and be our volunteer?
[APPLAUSE]
All right. And so this will be the actual version but written by one of your
predecessors that I'll full screen here. It's going to stitch together all of these
same primitives and more, but add the notion of scores and lives so that there's
actually a goal, which in this case is to move the Harvard crest to constantly
pursue the character on the right-hand side so that your sprite touches that one.
Would you like to introduce yourself?
DAVID J. MALAN: All right, wonderful. Welcome aboard. And here we come with some
instructions and final flourish if we want to keep the lights up but perhaps
increase the music.
[MUSIC PLAYING]
DAVID J. MALAN: Notice he is using the up, down, left, right. But there's many more
walls now. First level's pretty easy. But now Yale's in the mix, bouncing back and
forth. Again, pretty easy. Now there's two Yales at slightly different positions.
MIT is coming soon. But first, we have three Yales.
MC HAMMER: (SINGING) As such. And this is a beat, uh, you can't touch.
[APPLAUSE]
MC HAMMER: (SINGING) I told you, homeboy, you can't touch this. Yeah, that's how it
look when you know you can't touch this. Look at my eyes, man, you can't touch
this. Yo, let me bust the funky lyrics. you can't touch this. Fresh new kicks and
pants, you got to like that. Now, you know you want to dance. So move out of your
seat and get a fly--
DAVID J. MALAN: You got to go quick.
MC HAMMER: (SINGING) Catch this beat while it's rolling. Hold on. Pump a little
bit, and let them know what's going on, like that, like that. Cold on a mission, so
fall on back. Let them know that you're too much and this a beat, uh, they can't
touch. Yo, I told you. You can't touch this.
[APPLAUSE]
Yo, sound the bell. School's in, sucker. You can't touch this. Give me a song, a
rhythm. Making them sweat, that's what I'm giving them. Now they know, you talk
about the Hammer, you talking about a show that's hyped and tight. Singers are
sweating, so pass them a wipe or a tape to learn. What's it going to take--
MC HAMMER: (SINGING) Legit. Either work hard or you might as well quit. That's word
because you know--
[APPLAUSE]
Congrats. All right, that's it for CS50. Welcome. Cake is now served. We'll see you
next time.
[PROJECTOR CLICKING]
[MUSIC PLAYING]