cs50 Cybersecurity Lecture1-720p MBR-en
From some faces, some of you have seen this movie around the holidays,
but now, I've taken it off the screen and we'll move on now
with some actual algorithms.
If you'd like to come back on replay and actually see what the answer is,
we'll, of course, leave it on-demand.
So what are some of the actual algorithms
used nowadays for encryption that are best practices?
This rotational cipher that I described earlier, Caesar's simple one,
is not to be recommended.
It's wonderful for demonstration sake and discussion's sake,
but it's not something you should be using in practice unless, for instance,
you're in, say, middle school trying to send a message on a piece of paper
through your classroom of classmates and worried
that the teacher might intercept it and the teacher probably
doesn't have the instinct to or the care to actually
brute force their way through it and figure out what the key is.
But that's the level of security you're getting with something
like that rotational cipher.
But in the real world, with our phones and desktops and laptops today,
generally used are AES or triple DES, both of which
are popular algorithms that have been vetted by the world
and are very commonly used as secret key encryption ciphers
or symmetric key encryption ciphers, which, to be clear,
require that both the sender and the receiver
know and use the exact same key.
And for our purposes today, let me just stipulate
that the mathematics of these two and other algorithms
are much more sophisticated and documented in textbooks,
and, therefore, make it much harder for the adversary
to figure out, as by trying 25 different keys, what the actual key in use
might be.
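To make the "25 different keys" remark concrete, here is a minimal sketch of brute-forcing a rotational cipher. The function names `rotate` and `brute_force` are made up for this illustration and are not from the lecture; the sketch only handles the 26 unaccented English letters.

```python
def rotate(text: str, key: int) -> str:
    """Shift each English letter by `key` places, wrapping around the alphabet."""
    result = []
    for ch in text:
        if "A" <= ch <= "Z":
            result.append(chr((ord(ch) - ord("A") + key) % 26 + ord("A")))
        elif "a" <= ch <= "z":
            result.append(chr((ord(ch) - ord("a") + key) % 26 + ord("a")))
        else:
            result.append(ch)  # leave spaces, digits, punctuation untouched
    return "".join(result)


def brute_force(ciphertext: str) -> dict[int, str]:
    """Undo every possible key (1 through 25) and return all candidate plaintexts."""
    return {key: rotate(ciphertext, -key) for key in range(1, 26)}


ciphertext = rotate("HELLO", 1)
candidates = brute_force(ciphertext)
for key, guess in candidates.items():
    print(key, guess)
```

An adversary only has to read 25 lines of output to spot the one that looks like English, which is why this cipher offers so little protection.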
Questions now about secret key cryptography or any of the primitives
we've just discussed?
STUDENT: So is it possible that if someone hacks the-- like
gets to know about the hash value-- the hash function of a company that it
is using, he might be able to use the hash values
and use-- like find a reverse function and then get the passwords for that?
DAVID J. MALAN: A good question.
I wouldn't worry mathematically about someone reversing the hash functions,
if only because with all of the ones that are in popular use
today in modern systems, there are a lot of smart mathematicians, computer
scientists, professionals who have vetted, if not proven mathematically,
that these things work as expected.
However, if the passwords that have been hashed are relatively easy to guess,
or if the adversary just gets lucky with whatever technique
they are using, it is absolutely possible to find at least a password,
an input that maps to that hash value, but often
not without significant effort.
And so generally, a company does not want to,
should not try to keep proprietary or secret what
hash function they're using, what encryption algorithm they're using.
If anything, I dare say, it should be reassuring
to the public if and when companies are using best practices and de facto
standards, all of these algorithms are designed
to keep secret not the algorithm itself, which literally can be found
in like university textbooks nowadays and on Wikipedia and beyond,
but rather, to keep secret the thing that's designed to be secret,
which is the key.
And now, if you're using too small of a key like I did originally,
well, then you're just using the algorithm poorly, perhaps.
But so long as you're adhering to best practices
and picking a really big, recommended-sized key,
then things mathematically should be trustworthy.
STUDENT: For an attacker, rather than like basically cracking a hash
or cracking an algorithm, wouldn't it be easier
to just try and access the basic server database
and access the hash function like generated code?
So rather, access how the specific algorithm works.
That way, they can basically just reverse-engineer it?
DAVID J. MALAN: Everything you described is possible.
However, I would push back on this assumption
that the company should try to keep its hash algorithm secure or hidden.
You should trust in the mathematics of what
we're discussing today, both in the context of hashes
and in the context of encryption.
And I've pulled back up on the screen here the number of possible hashes
that exist when using one of the most modern standards for hashing passwords.
This is such a big number--
I dare say, I don't remember how many atoms are in the universe,
but I'm going to guess it's fewer than this, maybe.
The idea is, intuitively, that if the search space of possible hash values
or the search space of possible keys is so darn big,
both you and I, not to speak darkly, are going
to be dead before the attacker actually figures out what
that password or that hash actually is.
So that's generally the presumption.
Most of what we do today in terms of security all boils down
to probabilities and trying to drive the probability of being exploited way,
way, way down, even though, if your password is still 00000000,
doesn't matter if there's this many or more possibilities if the adversary
tries that one first.
So keeping algorithms secret, keeping ciphers secret
is generally not best practice.
You should be trusting that the math and the probabilities
will protect your data if you are using these algorithms correctly.
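A minimal sketch of why a weak password undoes a huge search space: the attacker does not reverse the hash, they just hash likely guesses and compare. SHA-256 is used here purely as an assumption for illustration (the lecture does not name the function on screen, and real systems should use a salted, deliberately slow password-hashing scheme); the guess list and function names are hypothetical.

```python
import hashlib


def sha256_hex(password: str) -> str:
    """Hash a password with SHA-256 and return the hex digest."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def guess_password(target_hash: str, candidates: list[str]):
    """Hash each candidate in order; return (password, attempts) on a match, else None."""
    for attempts, guess in enumerate(candidates, start=1):
        if sha256_hex(guess) == target_hash:
            return guess, attempts
    return None


# Hypothetical scenario: the adversary has obtained this hash value.
stolen_hash = sha256_hex("00000000")

# A short, made-up list of guesses an adversary might try first.
common_guesses = ["00000000", "password", "12345678", "qwertyui"]

result = guess_password(stolen_hash, common_guesses)
print(result)
```

The size of the hash space never comes into play here: the match is found on the very first attempt because the password itself was easy to guess.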
And how about one more question before we resume?
STUDENT: How does a cipher work with words?
Not numbers, but with words, how does it work?
How can we cipher-- or encrypt, like, our letters with words, not numbers?
How can that work?
DAVID J. MALAN: OK, so if your key is a word and not a number,
let me first say that generally when it comes to encryption,
the keys are not words.
These are not passwords, they're not meant to be used in quite the same way.
These keys are generally generated by the computer for you,
and so as such, they're just random numbers for the most part.
With that said, even if it is a word like apple, there are ways--
and you would learn this in a class like CS50
itself-- to convert a word to the underlying numeric representation.
There's a system called ASCII or Unicode.
So capital A is actually the number 65 in most systems.
Capital B is the number 66.
But we can go one level deeper.
There's actually a pattern of 0's and 1's that represent A's and B's and C's
and so forth, so we can convert everything in the world of computers
to numbers.
And for that, let me encourage you to take CS50x online.
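As a small illustration of that conversion, here is a sketch that turns the word apple into its numeric codes and then into patterns of 0's and 1's. It uses Python's built-in `ord` and `format`; the variable names are just for this example.

```python
word = "apple"

# Each character has an underlying number (its ASCII / Unicode code point).
codes = [ord(ch) for ch in word]

# Each number, in turn, can be written as a pattern of eight 0's and 1's.
bits = [format(code, "08b") for code in codes]

print(ord("A"), ord("B"))  # 65 66
print(codes)               # [97, 112, 112, 108, 101]
print(bits)
```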
So that, then, is secret key cryptography
or symmetric key cryptography, but it doesn't solve all of our problems,
because I've taken for granted throughout this whole discussion
that the sender and the receiver have a shared secret between them.
Whether it's a simple key like 1 or 2 or 13--
hopefully not 26-- or hopefully some much bigger value.
But there's kind of a chicken and the egg problem there,
so to speak, in English whereby how do you actually establish
a shared secret between parties A and B if A and B have never talked before,
in fact?
So for instance, if you're visiting Amazon.com
for the first time, a popular e-commerce website, or gmail.com for your email,
ideally, and you probably know this already
from just living in the real world nowadays,
ideally you want that connection to Amazon or Gmail to be encrypted,
to be scrambled in some way.
Why?
Well, you don't want your password being stolen by someone.
You don't want your credit card number being intercepted by someone.
You don't want your personal emails being read by other people.
So it stands to reason that encryption is generally a good thing.
And you've seen this, perhaps, in the URL bar
via something called HTTPS where the S literally is meant to mean Secure.
But odds are, you don't know anyone personally at amazon.com
and you don't know anyone personally at gmail.com.
So what key are you going to use to communicate securely
with these websites, not to mention new websites that don't even exist today
but might come online tomorrow, how do you establish a shared secret
with someone else?
So that's a fundamental gotcha or caveat with symmetric key
or secret key encryption, is that it assumes
that you have a shared secret between you and the other person.
But the chicken and the egg scenario comes
in whereby the only way to establish a shared secret
would be to send it to the other person securely,
but if you can't communicate securely, you can't even
send them the secret you want to use.
So you're caught in this deadlock.
Thankfully, thanks to math, there are ways
that we can solve this, too, via not symmetric key cryptography,
but public key cryptography, otherwise known as asymmetric key cryptography.
And among the algorithms here might be these, something called Diffie-Hellman,
MQV, RSA, and others as well.
And I dare say, on this list, maybe RSA is among the most well-known.
It's perhaps an acronym you've actually seen in the wild.
Now what do we mean by public key cryptography,
or more specifically, public key encryption?
Well, in the world of public key encryption,
or asymmetric key encryption, the asymmetry
is implying that you actually don't use one key between the two people
A and B. You actually use two keys.
In the world of public key encryption, everyone in the world
has both a public key and a private key.
And these two are just really big numbers.
There is a mathematical relationship between these numbers, the public key
and the private key, but that's a relationship
that your phone or your laptop or your desktop
figures out when generating these values for you.
So unlike our previous discussion of passwords, which you and I as humans do
choose and memorize or store in our password managers,
when it comes to keys, these are generally,
in the world of public key cryptography, generated for you.
And as the name suggests, the whole purpose of these keys
is to tell the whole world, if you want, what your public key is.
It is not in any way secret.
You can literally email it out, you can put it in the signature of every email,
you can post it on your website, on social media.
The whole point of the public key is to make it, indeed, public.
But, suffice it to say, the private key should be kept secret by you,
private by you on your own device.
That should never be shared with anyone else.
But the cool thing about public key cryptography and the mathematics
underlying it is that if you share your public key
with someone else on the internet, they can use that public key
to encrypt a message and then send it to you over email
or chat or any other technology.
And if you had to guess, what is the only key
in the world that can decrypt a message that has
been encrypted with your public key?
The only key in the world that can decrypt
a message that has been encrypted with your public key is your private key.
That's what the mathematical relationship ultimately does for you.
So, pictorially here, if this is our algorithm that
implements this idea of public key encryption,
let's see what the inputs and outputs should be.
If the goal is to send a message to you and you
have shared with the world your public key, whoever is sending you
this message uses your public key, their plaintext message, and out of that
comes ciphertext.
That, then, is how asymmetric key encryption works.
Meanwhile, when you receive that message,
you can use your own private key and the ciphertext you've just
received to get back the plaintext.
And this is what we mean by asymmetric.
Unlike secret key cryptography or symmetric key cryptography where
you're using the same key back and forth, plus 1 or minus 1
in the case of the rotational cipher, with asymmetric encryption,
you are using one key for the encryption process and another key for the decryption
process.
So that's what's fundamentally different.
RSA is one of the most popular algorithms for this.
The browsers you probably use every day are probably
using some variant of RSA underneath the hood.
We won't get into great detail about the mathematics,
but one of the most important details about RSA
is that it relies on really big prime numbers.
In fact, in a nutshell, what happens with RSA is your computer or your phone
chooses a really big prime number called p.
It then chooses a really big other prime number called q.
Then it multiplies them together to get a new value, we'll call it n.
And it uses that value n in the resulting mathematics
that the algorithm's authors came up with, dot-dot-dot.
The presumption here is that when you take a really big prime number
and multiply it against a really big other prime number,
it is really hard to figure out from the product of those numbers
what the original p and q were.
And if you're a little hazy on prime numbers,
it's a number that can be only-- that can only be divided by itself and 1.
And indeed, we can use those, coming up with two big ones,
multiply them together in order to get this value n that is subsequently
used in the rest of the mathematics.
What are the rest of those mathematics?
In essence, this.
And these will be the scariest-looking formulas you perhaps
see over the course of this class.
The value n I just described is used to divide values.
Ultimately, if you're unfamiliar with mod here, this means, in this context,
to take the remainder of some value.
So what are we doing?
Here is a quick summary of how encryption and decryption work
with RSA.
If you have some message m that you want to send to another person
and you have come up with somehow, via the dot-dot-dot process
earlier that I alluded to, you've come up with your own public key e there.
Well then, someone can take their message,
encrypt it by raising that message to the power of e, the exponent of e,
and then divide it, divide it, divide it, divide it by n
and figure out what the remainder is when dividing by n.
That then gives you a value called c for ciphertext.
When you then receive that message c, you can use your private key,
known here as d, and you raise the ciphertext,
its numeric value, to the power of d-- that is, the exponent in d, and you
divide, divide, divide by n in order to figure out
that remainder, which will give you back the original message.
Now that is a significant oversimplification of what's going on,
but that's the essence of the algorithm.
It has to do with picking two very large prime numbers,
multiplying them together to get that value n,
and then using n as well as other values that, dot-dot-dot, are generated
by the algorithm for you, e and d, in order to encrypt and decrypt messages
ultimately.
And this is what's generally known as modular arithmetic.
It involves lots of division and division and division
in order to come up with these remainders,
but ultimately, it is a very secure way to asymmetrically share information
without having to agree on one shared key in advance,
but rather, using a public and a private key instead.
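The formulas just described, c = m^e mod n and m = c^d mod n, can be tried out with deliberately tiny numbers. This is a toy sketch only: the primes 61 and 53 and the exponent 17 are assumptions chosen to keep the arithmetic readable, real keys use primes hundreds of digits long, and the step that derives d (the "dot-dot-dot" the lecture skips) is the standard textbook one, included here just so the example runs. It needs Python 3.8 or later for `pow(e, -1, phi)`.

```python
# Toy RSA with tiny primes (real keys use enormous primes).
p, q = 61, 53
n = p * q                   # the value n shared by both keys
phi = (p - 1) * (q - 1)     # textbook helper value, not covered in the lecture

e = 17                      # public exponent (assumed; must share no factor with phi)
d = pow(e, -1, phi)         # private exponent: the inverse of e modulo phi


def encrypt(m: int) -> int:
    """c = m^e mod n, using the recipient's public key (e, n)."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """m = c^d mod n, using the recipient's private key (d, n)."""
    return pow(c, d, n)


m = 65                      # the message as a number, e.g. the code for capital A
c = encrypt(m)
print(n, d, c, decrypt(c))  # 3233 2753 2790 65
```

Python's three-argument `pow` takes the remainder at every step, so it never has to compute the full, astronomically large power before dividing by n.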
Now there are other techniques that come with this world
of public key cryptography, and another technique is that of key exchange.
So by contrast, if you do actually want to establish
some kind of shared secret, there are alternative algorithms
that different humans have invented over the years.
So there are alternatives to one algorithm or another,
and one of these alternatives is actually
called Diffie-Hellman, named after another pair of authors here.
So here is the essence of the mathematics for this algorithm,
the goal of which is indeed key exchange.
To figure out, using fancy mathematics, how both A and B can come up
with the same value that they can then use as a shared secret,
but without anyone who intercepts any of their messages
being able to figure out what is that shared value, that shared secret.
So what's the essence of the math here?
Well, you first pick a value g, which is called a generator.
It can be as simple as the number 2.
And you pick a big prime number, call it p here.
And those are agreed-upon in advance.
Meanwhile, person A, say Alice, picks her own private key, lowercase a,
which is another really big number, and then she does this math:
g to the power of a mod p, which gives her a value, capital A.
And again, mod refers to taking the remainder of some value.
Meanwhile, B, or Bob, still uses the same g, still uses the same p,
picks his own private key, lowercase b, and raises g to the power of b modulo p,
and that gives him back this value capital
B, whereas Alice had capital A. Then, turns out that Alice and Bob can
send those values across the internet--
capital A one way, capital B the other way, and thanks to some fancy modular arithmetic
here, too, Alice can take Bob's capital B value and raise it
to the power of her private key a, mod p, which effectively gives you
g to the power of a times b mod p.
Bob, meanwhile, can take Alice's capital A value that was sent to him,
raise it to the power of his private key b, and then mod p.
So calculate the remainder with respect to p.
The end result, and it's totally fine if these mathematics
are uncomfortable for you or whoo!
Just know that, thanks to some basic principles of mathematics,
this results in both Alice and Bob having the exact same value--
we'll call it s for shared secret--
even though the value never went across the internet in its entirety.
Alice sent part of it this way, Bob sent part of it this way,
but because Alice and Bob held on to private values, the little A
and the little B, they kept that to themselves, they're
able to do these mathematics that ensure that they both came up
with the same value even though you or I, if we intercepted
any one of those messages, we could not figure out what it is.
And now that they have a shared secret s,
they can use that using any of those other symmetric
ciphers we talked about earlier.
AES I put on the board briefly, triple DES I put on the board briefly.
Heck, we could even use this in a rotational cipher
if we really wanted to, but not, indeed, best practice.
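Here is a minimal sketch of that exchange with toy numbers. The values p = 23, g = 5 and the two private keys are assumptions picked so the arithmetic can be checked by hand; in practice p and the private keys are enormous. Lowercase `a` and `b` are the private keys that never leave Alice's and Bob's devices; uppercase `A` and `B` are the values that travel across the internet.

```python
# Agreed upon in advance and not secret.
p = 23   # a (tiny) prime
g = 5    # the generator

# Private keys: chosen by each party and never transmitted.
a = 6    # Alice's private key
b = 15   # Bob's private key

# Public values: these are what actually cross the internet.
A = pow(g, a, p)   # Alice sends A to Bob
B = pow(g, b, p)   # Bob sends B to Alice

# Each side combines the other's public value with its own private key.
s_alice = pow(B, a, p)
s_bob = pow(A, b, p)

print(A, B, s_alice, s_bob)  # 8 19 2 2
```

Both sides end up with g to the power of a times b mod p, yet an eavesdropper who sees only p, g, A, and B never sees a or b.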
So again, don't worry so much about focusing on the mathematics,
but if you were to take a higher-level class in theoretical computer science,
these are intellectual rabbit holes that you could go down to better understand
how the software works.
And now to my comments earlier about not trying
to invent your own cryptographic functions,
this is the kind of reason why.
This is the degree of sophistication that you and I take
for granted in our phones, our laptops, and desktops
that have been vetted by industry and academics alike.
Generally best practice is to rely on standards
that have been tried and tested rather than
try to come up with your own creative cryptosystem, so to speak,
that may very well have faults that you yourself do not know.
And the icing on the cake is that this is ultimately, if curious as
to the underlying mathematics, what value ultimately
Alice and Bob are both calculating, g to the power a times b mod p.
But more on that in a higher-level mathematics course if indeed
of interest.
How about one final building block that you
get from this world of public key cryptography,
and this is one that's going to be increasingly omnipresent,
I do think, in our world, especially as we move away
from very archaic paper-pencil signatures
that you might write with a pen on a paper,
and rather, moving to what we'll call digital signatures as well.
It turns out that once you're comfortable with the idea
of public key cryptography generally involving a public key
and a private key, the first of which is literally public,
you can share it with the world; the second of which is meant to be private,
kept only to you.
And if you can take at face value my claim
that through appropriate mathematics, there's
a relationship possible between these two numbers,
that whereas one can encrypt data, the other can decrypt,
even if you don't care to get into the specifics of the mathematics,
but you just agree that, OK, that sounds reasonable to me,
that that math can work, we can now use that building block
of a public key and a private key to solve other problems as well.
Not just encrypt messages from point A to point B
and back, but rather, to sign information, sign documents,
even, and say, yes, this was signed by David or someone else.
So how does this work?
In the world of digital signatures, here's
a few more acronyms of algorithms that are commonly
used even though we'll continue to simplify them in our discussion.
DSA, ECDSA, RSA, and others can be used to give you
the ability to sign documents or other pieces of information digitally.
So what does it mean to sign something digitally?
It's not at all like this with a unique signature,
it's all mathematics involved.
So, here, then, might be our algorithm for digitally signing
some document or piece of information.
And I claim that the input to this process is a message.
A letter that you've written, a contract that you want to sign,
something that you want to put your digital signature on.
And the output of this message initially is going to be a hash.
So we can use any number of hash functions
we talked about earlier that take as input an arbitrary length
input, like a message, a document, an essay, a contract,
and produce as output a fixed length hash value.
So we've seen that and we've stipulated that is indeed
possible, similar in spirit to our password discussion earlier.
You can even do it for larger inputs than passwords.
You can do it for entire documents as well.
Once you have that hash, here's how you digitally sign the document.
You use your private key, you pass that as input, as well as the hash value
you just computed a moment ago into the digital signature algorithm,
and the output of that process is a signature.
So if you think about this intuitively, what are we doing?
Well, we're taking an arbitrary-sized document.
Maybe it's a letter that you've written, maybe it's
a contract that you've written that you need to sign that might be short
or it might be really long.
Here's where the value of cryptographic hash functions comes in.
Recall that a cryptographic hash function, by definition,
takes an arbitrary-sized input and reduces it to a fixed-sized output.
So it doesn't matter how big the original
was, you can distill it into a distinct representation that's shorter.
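A quick sketch of that fixed-size property, using SHA-256 from Python's standard `hashlib` module as one example of a cryptographic hash function (the lecture does not say which function a signature scheme would use; the inputs are made up).

```python
import hashlib

short_message = b"hi"
long_message = b"This contract goes on and on. " * 10_000

short_hash = hashlib.sha256(short_message).hexdigest()
long_hash = hashlib.sha256(long_message).hexdigest()

# Very different input sizes, identical output size.
print(len(short_message), len(short_hash))
print(len(long_message), len(long_hash))
```

Whatever the length of the input, the digest is 256 bits, which prints as 64 hexadecimal characters.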
So, per this diagram, if you take that hash value
and you encrypt it with your private key, what we say
is that the output of that process, which
is just a really big number or some sequence of weird-looking text,
is your digital signature.
Now this is a little weird because what we're doing now
is the opposite of public key encryption.
With public key encryption, remember, someone else
used your public key to encrypt a message to you
and you used your private key to decrypt it.
But in the case of digital signatures, the story gets flipped upside-down.
You use your private key and a hash of your message
to digitally sign your document and the output of that is a signature-- again,
a number or some string of text.
And you send that signature to the recipient saying, this
is my digital signature, you can verify it now if you so choose.
And they should.
So that invites the question, well, how does the recipient
verify your digital signature?
How do they know that this weird-looking sequence of characters or numbers
actually was signed by you?
Well, recall that you have not only a private key, but a public key as well.
And that public key is accessible to everyone, including that recipient.
And so, what happens is this.
When that recipient gets your document and your digital signature,
so to speak, they probably want to and should verify the digital signature
to confirm that, yes, you signed off on that document or contract.
So what does that box look like?
Well, they have received not only the document itself, the so-called message,
they've also received your digital signature.
So you've sent them two things.
And the digital signature, you can think of it like a human signature,
but it's, of course, a big number or a string of text.
But they've sent you two things-- the document and that signature.
So what do you do?
You take the document you've received and you run it
through the exact same publicly available hash
function, because the document might be long,
so you want to collapse it into a short hash representation
thereof, just like our use of passwords.
So that you can just do easily, no private information involved.
But then what do you do?
You then take the public key of the person who signed this document, you
take the signature that they claim is their signature,
and you decrypt their signature with their public key.
That should output the exact same hash that you just calculated.
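The signing and verifying boxes can be sketched with the same kind of toy RSA numbers. Everything specific here is an assumption for illustration: the tiny primes, the exponent 17, and the helper `toy_hash`, which shrinks a SHA-256 digest to fit under the tiny modulus (something no real scheme would do). Real signature algorithms also add padding and other safeguards that this sketch omits. It needs Python 3.8 or later.

```python
import hashlib

# Toy key pair (real keys are vastly larger).
p, q = 61, 53
n = p * q
e = 17                               # public key exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private key exponent


def toy_hash(message: bytes) -> int:
    """Hash the message, then reduce it below n so the toy math works."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") % n


def sign(message: bytes) -> int:
    """Signer: 'encrypt' the hash with the private key."""
    return pow(toy_hash(message), d, n)


def verify(message: bytes, signature: int) -> bool:
    """Recipient: 'decrypt' the signature with the public key and compare hashes."""
    return pow(signature, e, n) == toy_hash(message)


document = b"I agree to the contract."
signature = sign(document)

print(verify(document, signature))            # True
print(verify(document, (signature + 1) % n))  # False: an altered signature fails
```

With a realistically sized modulus, changing even one character of the document would also make verification fail; with this toy modulus there are only 3,233 possible hash values, so accidental matches are possible.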
So to summarize, the message itself, the document in this story, is public.
It's not encrypted, it's not something you really worry about being private.
What you really care about in this story is
that it was signed by a specific person.
So if that message, that document is available to both the sender
and the receiver, both of them do this first process of hashing the message,
hashing the document just to get some succinct representation thereof.
So it's not this big, it's this big.
Makes the math quicker and easier.
However, what the recipient does is upon receiving not only that message, which
they just hashed, but also your claimed digital signature,
they try to decrypt your signature using your public key.
And here, too, just as the private key can
reverse the encryption done by a public key,
so can the public key reverse the encryption done by a private key.
So if the recipient mathematically gets the exact same hash
after decrypting what you sent them, it must be the case
mathematically that the only person in the world who
could have signed this document is, in fact, you
because they have your public key.
And maybe some third party, some registry,
some company has said, yes, that is David Malan's public key,
you can trust that.
And so, if David Malan's private key has not been compromised,
you can trust that any signature that you can decrypt with my public key
must have been encrypted with my private key.
And it takes a while, I think, for these ideas, and certainly the mathematics
to sink in, but for now, if you just trust
that there's two big numbers in the world, one public, one private,
there's a mathematical relationship between them such that one can reverse
the effects of the other in either direction,
we humans can use this now not only to secure
our messages per our discussion of encryption,
we can also use it to authenticate messages
and attest, yes, this came from David Malan or did not.
And unlike a human signature on a piece of paper
that can obviously just be photographed, duplicated, traced over,
the security of digital signatures relies on keeping your private key private,
and that notion does not exist in the world of human signatures,
and so in that sense, digital signatures are objectively better
than our old-form human ones.
Questions now?
And I know that's a lot, and it's OK if it didn't all go down at once.
Questions on digital signatures, public key encryption or decryption,
or anything prior?
STUDENT: Would these public and private keys be attributed to, what,
your IP address?
DAVID J. MALAN: A good question.
To what are they attributed?
Not to your IP address typically.
They are typically stored in a registry, like a central registry that
knows that this is Vlad's public key, this is David's public key and so
forth.
And it relies on a system of trust and transitivity.
So if you trust this third party company that is storing all of our public keys,
then you can trust whoever it is "they" are, in turn, trusting.
Or it can be more distributed.
Your public key can literally be distributed
in the footer of your emails.
It can be posted on your website.
It can be on your LinkedIn profile or the like.
And so long as other people in the world trust
your emails or your website or LinkedIn, they
can trust that that is, in fact, your public key.
So different ways to implement that system of trust.
Other questions?
STUDENT: Hashing uses a mathematical function and encryption uses
a mathematical function plus a key.
Like the Caesar Cipher basically uses the simple function plus the key.
Is that analogy correct?
DAVID J. MALAN: Yes, that is correct.
And if it helps you-- this is an oversimplification,
but it's generally helpful, I think, to think of hashing as one-way.
So you can only convert a value to a hash value but not the opposite.
But encryption is like two-way--
it's reversible hashing, so to speak.
The output still looks weird and random, but you can undo the process.
And one way to think about this is in the world of hashing,
because I claim that you can take like an infinite domain,
like any possible message you want to send, and convert it
to a finite range--
for instance, all A-words could be a hash value of 1,
all B-words could have a hash value of 2.
That simple example already captures the reality
that if you only have the hash values 1, 2,
I have no idea what the original input is.
And it doesn't matter how hard I try, I'm never going to figure it out
because it could be apple or avocado or something else that starts with A.
So hashing in that sense, one-way hashing throws away information such
that it's not recoverable.
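That first-letter example is simple enough to write down directly. The function name `first_letter_hash` is made up for this sketch, and it assumes each word starts with one of the 26 English letters.

```python
def first_letter_hash(word: str) -> int:
    """Map a word to 1..26 using only its first letter (a -> 1, b -> 2, ...)."""
    return ord(word[0].lower()) - ord("a") + 1


for word in ["apple", "avocado", "banana"]:
    print(word, first_letter_hash(word))
```

Because apple and avocado both map to 1, nobody holding only the value 1 can tell which word went in: the rest of the word has been thrown away.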
But encryption does the opposite.
It would be pretty useless if encryption threw away information
because the whole point of encryption is to secure messages and information
we want to send.
So encryption is reversible; hashing, in general, is not.
And, as you know, the key, no pun intended, to encryption
is necessary so that you can reverse the process in a way that
remains secret to other people.
How about one more question, and then we'll take a short break
and then we'll come back and wrap up.
STUDENT: Is there any possibility to spoof the signatures?
DAVID J. MALAN: Short answer, no.
Like so long as you are using a standard that we believe
to be correct and not compromised, so long as your private key has not
been stolen by someone or no one's taken it off of your phone or your computer,
they should not-- it should not be possible to forge it.
The probability is so, so, so low, it should be the least of your concerns
is the idea.
Now it turns out, there is yet one other application
of this world of public key cryptography that solves a problem from last time.
Recall that we ended our first class on a note of emphasizing
that passwords and password managers can improve our security if used properly,
but there's another technology that's becoming increasingly available.
And it's colloquially called passkeys.
Or more technically, it's an implementation
of a standard called web authentication.
And it turns out that these passkeys, which
are available on certain platforms and certain websites and evermore
will be available soon quite shortly, they, too,
rely on public and private keys as follows.
And thankfully now, as fancy as the mathematics
we're alluding to today sound, there really are only two ways
to use these public and private keys--
to either encrypt with one and decrypt with the other or vice versa.
So we have just a fairly basic building block
that we can use in one direction or another.
So how do passkeys work?
In the near-future, as you will find, when
you go to certain websites or applications,
you probably will not be prompted as frequently to type in a username
and pick a password, which is to say, you
don't have to generate a hard-to-guess password,
you don't have to memorize a hard-to-guess password.
You don't have to even store a hard-to-guess password in a password
manager because passkeys eliminate passwords.
It moves us more toward a world of passwordless accounts.
Now how can that be?
Because up until now, we've been using usernames and passwords
to authenticate ourselves.
Well, it turns out, we humans have been getting really good at this math,
even if it doesn't feel like it today, we've
been getting really good at using mathematics
to solve these problems as well.
So imagine the following scenario.
When you go to a website in the future or app,
rather than being prompted to create a username and password,
you'll just be prompted to create a passkey.
What that means is your laptop or desktop or phone will probably
prompt you with some form of factor.
They'll ask you for your fingerprint or they'll ask you for a scan of your face
or maybe a PIN code, a short number that you type in just
to demonstrate with high probability that you
are authorized to be using this device and creating this account.
What then will your device and the website do?
Your device will generate a public key and a private key
just for that one website or app.
Your device will send the public key to that new website, along with your user
ID or username, some identifying information
so that they know you're David or someone else.
But you don't send a password.
You only send to the website or app your public key.
And you keep private, within your browser
or some other piece of software, your corresponding private key.
And to be clear, this public-private key pair is used only for this one website.
You'll do this repeatedly, but automatically
for every other website in the world in this model.
So what happens when you don't just register for that website, which
you've just done, but you want to log into it tomorrow,
next week, or next year?
Well, assuming you still have that same device
or you're using some kind of cloud service
that synchronizes all of your passkeys, your public and private keys,
across devices--
so you haven't lost these passkeys, here's
how you would log in to the website tomorrow, next week, or next year.
When you visit, the website would send you a challenge,
and a challenge is like some little message.
It's like a number or a word or a phrase.
It's some piece of randomly-generated data
that the website wants you to digitally sign.
Well, how do you digitally sign information?
I proposed earlier that you can use your private key
and pass that key and that challenge, which is just a random input given
to you by the website, into your digital signature algorithm, this black box.
And the output of that, as before, is your signature.
And what does your device do?
It sends that signature for that challenge to the website.
And if you followed along earlier well enough,
you might now realize where we're going with this.
How does the website now verify that that is, in fact, your signature?
That this did come from David's device and not some adversary online?
The website, because it stored, yesterday,
last week, last year, your public key, it
will use your public key to decrypt your signature
using the same algorithm to get back hopefully the same challenge value.
And if the output of this verification process
matches the challenge the website sent you a second before,
it must be the case mathematically that you
are, in fact, who you claim to be because it
was your device that registered for this website a day, a week,
a year ago as well.
So again, if we trust in the mathematics here
and we trust that these algorithms allow us to encrypt information and decrypt
it using a public key and private key, or conversely,
a private key and public key, we can, with very, very high confidence,
probabilistically say, yes, this is David Malan,
I'm going to allow him back into this account.
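To make that challenge-and-signature exchange concrete, here is a minimal sketch in Python. It is a toy, not how real passkeys are implemented: it assumes textbook RSA with tiny, well-known example primes (61 and 53), and the names `sign`, `verify`, and `digest` are hypothetical helpers invented for this illustration. Real devices use vetted libraries and far larger keys.

```python
import hashlib
import secrets

# Toy RSA key pair built from the tiny textbook primes 61 and 53.
# These numbers are assumptions for illustration only and offer no real security.
N = 61 * 53          # modulus, 3233
PUBLIC_E = 17        # public exponent: registered with the website
PRIVATE_D = 2753     # private exponent: never leaves the device


def digest(challenge: bytes) -> int:
    """Hash the challenge and reduce it so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(challenge).digest(), "big") % N


def sign(challenge: bytes, private_d: int) -> int:
    """Device side: sign the challenge with the private key."""
    return pow(digest(challenge), private_d, N)


def verify(challenge: bytes, signature: int, public_e: int) -> bool:
    """Website side: undo the signature with the public key and compare."""
    return pow(signature, public_e, N) == digest(challenge)


# Login: the website sends a fresh random challenge, the device signs it.
challenge = secrets.token_bytes(16)
signature = sign(challenge, PRIVATE_D)

print(verify(challenge, signature, PUBLIC_E))            # genuine device
print(verify(challenge, (signature + 1) % N, PUBLIC_E))  # altered signature
```

Notice that the website only ever holds the public exponent and the challenge it generated itself, so there is no password on the server for an adversary to steal.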
So what's the implication of this passwordless world that
uses passkeys, or web authentication more technically?
It means that we're getting out of the business, potentially,
as a society of having to remember dozens
or hundreds or thousands of different passwords for all of our accounts.
It does require, though, that we don't lose the device or the devices that
registered for these websites or apps, but again, increasingly,
the world is providing cloud services, whether it's with Apple or Microsoft
or Google or others, that presumably can synchronize
your passkeys across devices. And we'll conclude ultimately today
by talking about how they can be synchronized securely, even
without Google and Microsoft and Apple knowing what your own passkeys are, so
long as they provide us with a certain technical guarantee.
So the upside of this is we can move away from passwords,
and you can even share these passkeys with other people if you so choose.
The catch is, right now, they're not omnipresently
available on every website out there.
It's probably going to take some time for the world to come on board,
but I do dare say, in the coming weeks, months, and years,
you will see passkeys increasingly offered to you.
And so indeed, the next time you visit a website
that asks you, hey, do you want to register with your fingerprint
or with your face or with a PIN code?
And you're never even asked for a password, odds are,
it's using this passkey technology instead.
Well, let's go ahead and take one more five-minute break here,
and when we come back, we'll talk about securing data
as it's moving back and forth and sitting on our own systems.
All right, so we are back.
And allow me to claim that we now have a bunch of ways
to hash data and also encrypt data and also now, decrypt data.
So how can we use these building blocks to solve
some other perhaps familiar problems?
Well, there's this notion of encryption in transit,
which is a fancy way of saying that you and I probably prefer nowadays
that our data be encrypted whenever it's traveling from point A
to point B. Whether that point B is Amazon.com, Gmail.com,
WhatsApp, or any other service that we're communicating with,
we ideally want no one in between us-- some machine in the middle, so
to speak, to be able to get at that same data.
Because in particular, what you should be worried about
is a scenario like this where if Alice is trying to communicate with Bob,
you might worry that there's some eavesdropper, so to speak,
named Eve between Alice and Bob.
And maybe this is via wires nowadays on the internet.
Maybe it's somehow wirelessly.
Maybe Eve actually represents a company that Alice and Bob
are communicating through, like Gmail or Outlook or the like.
So encryption in transit, though, is important to distinguish
from other forms of encryption.
In particular here, Alice might very well
have an encrypted connection not to an eavesdropper, per se, but just
a third party like Gmail.
So assume that Eve here is Gmail.
And meanwhile, Bob, when checking his email account,
has an encrypted connection to Eve as well, which, in this story now,
is Gmail.
So Alice has a secure connection to Gmail and Bob
has a secure connection to Gmail as well,
but that does not mean necessarily that Alice has a secure connection to Bob.
Security does not really work through transitivity, so to speak.
This might very well mean that the data is only
encrypted while in transit from A to E and from B to E,
but that doesn't mean that Eve, or Gmail in this story,
can't be reading all of Alice's and Bob's emails.
And indeed, that is technically possible on Google's end.
They, of course, run all of the servers that your Gmail accounts might be on.
There's nothing technically probably stopping them
from reading anything and everything.
Now hopefully they have policies.
Hopefully very few humans actually have the privileges or the authorization
to even do anything close to that.
But technically speaking, just because Alice has a secure connection to Gmail
and Bob has a secure connection to Gmail,
that doesn't mean that their communications will
be encrypted entirely between A and B. And there are lots of examples of this
as well.
Zoom, for instance, when it comes to video conferencing,
you might have an encrypted connection to Zoom,
I might have an encrypted connection to Zoom.
That does not necessarily mean that Zoom couldn't be Eve in this story
listening and watching everything that we're saying while video conferencing
as well.
So encryption in transit is good in that it at least keeps random people out
of the picture because they don't have access to these encrypted channels,
but if there is this third party, this machine in the middle
or company in the middle, even they might have access to data that we
do not want them to have access to.
So what, then, is a stronger alternative?
Increasingly possible, increasingly available, and something you as a user
should be looking for with greater frequency is what
we would call end-to-end encryption.
This is a stronger guarantee whereby you can
trust that Alice's connection to Bob is, in fact, secure
even if-- not pictured here, there are 1, 2, 3, 4 machines in the middle,
companies in the middle, eavesdroppers in the middle.
If you use encryption properly end-to-end,
you can ensure that the only thing Eve or Google or Zoom can see
is just your ciphertext, the seemingly random strings of text
or 0's and 1's that represent your encrypted data, but without your key,
they have no idea what that data actually is.
So end-to-end encryption isn't necessarily in most
companies' best interest.
Why?
Well, companies like Gmail tend to presumably mine our data,
whether it's for advertising purposes or otherwise.
And so it's sometimes in companies' interest to have access to your data
to keep it secure on their servers, but still
in a way that they have access to it.
Now that might be not comfortable for you.
And so there are alternatives.
For instance, iMessage for Apple users and WhatsApp
internationally are known in particular for offering end-to-end encryption
which, if implemented truthfully and technically correctly,
should guarantee that even though your messages might
be going through WhatsApp servers, no employee at WhatsApp
can actually see your messages because they're encrypted
all the way from A to B, even though it's
going through a potential eavesdropper.
But that depends on exactly what form of encryption you're using,
and if it's not end-to-end, it might only
be encrypted in transit such that Eve, that eavesdropper,
might indeed have access to the data.
So as to how you can use end-to-end encryption,
it's an option that a service must provide to you in this case
or you must choose services that offer it.
It's not necessarily something that's always available,
but it is increasingly available in different software.
So let's now consider a fairly mundane operation,
but one that has implications for these same technologies and solutions.
That is, deleting a file, be it on your Mac or your PC
or your phone or some other device.
Now where is data stored in your devices?
Well generally, it might be in a device like this,
a large, somewhat older but large hard drive that
can store lots and lots of files and folders,
or perhaps something smaller known as a solid state
drive that might store information entirely digitally
without any moving parts.
And even smaller might be something like this
that you carry around like a USB stick, and they are even smaller nowadays,
too, that similarly stores some data digitally.
Now how do we go about deleting files from a computer or any
of these devices?
Well, you typically click it and drag it somewhere, or maybe you right-click it
or maybe you tap and drag it to some trash or the like.
There's any number of user interface mechanisms for deleting files,
but let's consider for our purposes what happens underneath the hood.
So let me stipulate that your hard drive, your solid state
drive, your USB stick just contains ultimately
a whole bunch of 0's and 1's, and those 0's and 1's represent your files
and folders.
So when you go about deleting a file, by dragging it
to the Recycle Bin on Windows, or dragging it to the Trash
Can on macOS, what actually happens?
Well, it turns out, not anything at all, really.
When you recycle a file on Windows or when you trash a file on macOS,
it doesn't actually get deleted in the sense that you and I might expect.
By delete it, I mean it's gone.
I don't want to be able to find it anywhere.
OK, wait a minute, though.
Of course, we all know by now, at least on computers,
you at least have to empty the Recycle Bin or empty the Trash Can.
So OK, maybe I missed that step.
But even then, contrary to what you might expect,
emptying the Recycle Bin, emptying the Trash Can also does not generally
delete the data.
And here's where I'd, again, emphasize, wait a minute,
when I delete a file, I want it gone, removed from my computer altogether.
But what macOS and Windows and operating systems in general tend to do instead,
when you even empty the Recycle Bin or Trash Can,
they don't actually get rid of the file, per se, they just forget where it is.
Somewhere in the computer's memory, there's
like a spreadsheet of sorts, some kind of database or table
with at least two columns, one of which has the name of your file
or the location of your file, the other of which
has some kind of reference to which 0's and 1's on your actual computer
implement that specific file.
Maybe these 0's and 1's are for one file, these 0's and 1's are
for another file, and so forth.
So somewhere, your computer is keeping track of what
is where physically on your computer.
But when you delete a file by emptying the Trash or Recycle Bin,
the computer just, eh, forgets where it is.
And more importantly, it frees up the space so it can be used later.
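As a rough sketch of that "spreadsheet" idea, here is a small Python model. Everything in it is hypothetical and made up for illustration: a 32-byte `bytearray` stands in for the disk, a dictionary called `file_table` stands in for the table of names and locations, and real file systems are considerably more involved.

```python
# A pretend 32-byte disk plus the "spreadsheet" that maps names to locations.
disk = bytearray(32)
file_table = {}  # name -> (start offset, length)


def write_file(name: str, data: bytes, start: int) -> None:
    """Store the bytes on the disk and record where they live."""
    disk[start:start + len(data)] = data
    file_table[name] = (start, len(data))


def delete_file(name: str) -> None:
    """Ordinary deletion: forget where the file is, leave its bytes alone."""
    del file_table[name]


write_file("secret.txt", b"my password", start=4)
delete_file("secret.txt")

print("secret.txt" in file_table)     # False: the computer has forgotten the file
print(b"my password" in bytes(disk))  # True: the bytes are still on the disk

# A later file reuses part of the freed space and overwrites only some of it.
write_file("new.txt", b"hello", start=4)
print(bytes(disk[4:15]))  # b'hellossword': remnants of the old file remain
```

The last line is the point: even after new data lands in the freed space, part of the old file can still be sitting there.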
So what do I mean by that?
Well, suppose I do go ahead and delete a file
and empty the Recycle Bin or Trash Can, and suppose
that these yellow 0's and 1's represent the file that I no longer care about.
Well, what's actually going to happen underneath the hood, so to speak,
of the computer?
Well eventually, some of those yellow 0's and 1's might just
get reused for other files.
In other words, these 0's and 1's highlighted in yellow
represent a file that used to be there, but is not.
That is equivalent to saying some other file can now use those same
0's and 1's.
And so here's some random 0's and 1's that maybe overwrite some of the file,
but not all of it.
Notice, there's still a bunch of yellow 0's and 1's here
in my depiction of my computer.
So it turns out that over time, yes, your file will probably
get actually deleted.
What do I mean by that?
Eventually those 0's and 1's will be repurposed, changed from 1 to 0,
changed from 0 to 1 such that your file, for all intents and purposes,
is actually gone, because it's been repurposed, that space, altogether.
But notice, at least at this point in time,
and shortly after you delete a file, even if you've created or downloaded
new files, there might still be parts of your files
around, which means that sensitive Word document or Excel file or images
that you had on your computer, there might still be remnants of them,
just a few lines from any of those.
So you should realize that deleting a file doesn't really get rid of it
in the way you might expect or hope.
To do that, you need somewhat better practices.
Now what do I mean by this?
Secure deletion is another beast altogether.
And typically when we delete files, they're not deleted securely.
They're not deleted typically in a way that you would hope.
So secure deletion does what you might really hope for, get rid of this file
altogether.
So if we go back to the original contents of my computer
with all of these here 0's and 1's, and suppose
that I want to delete this file here at the top of the screen,
in an extreme ideal world, those 0's and 1's would just be gone.
Like that's pretty darn secure.
Those bits, those 0's and 1's, they don't even exist anymore.
Now this is probably not the best way to securely delete information
because if I just got rid of those 0's and 1's somehow, like my hard drive
is getting like literally smaller and smaller
in terms of how much stuff I can put on it if I don't have as many bits
or 0's and 1's available.
So that's probably not the best long-term solution
because it's expensive.
It's like getting rid of some of my capacity.
So we don't actually do that, but how might we securely delete a file?
I don't think we want to just wait and hope that those 0's and 1's eventually
get reused by the system because we might still
be left with some remnants which might not be ideal.
So what we can do when securely deleting a file is something like this--
change all of the 0's and 1's that we don't care about anymore or want,
change them all to 0's.
And this will effectively securely delete the file
because now the 1's that were previously there
that represented some piece of information are just completely gone.
Or equivalently, I could change them all to 1's.
Or I could even change it to random 0's and 1's.
The point is, to securely delete a file, you
should change all of the 0's and 1's to at least some other pattern
so that the file is effectively gone.
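That overwrite-then-forget idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the `disk` is a hypothetical in-memory `bytearray`, `file_table` and `secure_delete` are names invented here, and overwriting bytes in memory says nothing about what a physical drive does underneath.

```python
import secrets

# Hypothetical in-memory "disk" with one file recorded in a lookup table.
disk = bytearray(b"....my password.....")
file_table = {"secret.txt": (4, 11)}  # name -> (start offset, length)


def secure_delete(name: str, pattern: str = "zeros") -> None:
    """Overwrite the file's bytes first, then forget where the file was."""
    start, length = file_table[name]
    if pattern == "zeros":
        disk[start:start + length] = bytes(length)                # all 0 bits
    elif pattern == "ones":
        disk[start:start + length] = b"\xff" * length             # all 1 bits
    else:
        disk[start:start + length] = secrets.token_bytes(length)  # random bits
    del file_table[name]


secure_delete("secret.txt")
print(b"my password" in bytes(disk))  # False: the old contents are gone
print(bytes(disk))
```

Whichever pattern you pick, the step that matters is that the overwrite happens before the location is forgotten.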
Now how can you use this to your benefit?
Well, some operating systems nowadays support
what's called full-disk encryption, and this is good for a number of reasons.
One, you can enable a feature called full-disk encryption,
which is actually a specific incarnation of an idea known as encryption at rest.
Encryption in transit refers, of course, to your data going back and forth
from point A to point B. Encryption at rest
means it's just sitting there on your device, in your pocket, or on your lap
or on your desktop, sitting unused, maybe on or off.
So when it comes to full-disk encryption or encryption at rest,
you ideally want all of your data somehow encrypted on your Mac,
on your PC, on your phone.
And only when you log in with your password or maybe
your fingerprint or your face should that data be decrypted automatically,
which can happen pretty darn fast nowadays with modern hardware,
so that you can actually
use it and interact with that device.
So why is this advantageous?
Well, one, if your device gets stolen, so long
as you're not logged into it, so long as it's locked,
so long as the lid is closed, so long as it's unplugged or any other number
of scenarios, at least if someone takes your laptop from the table in Starbucks
or the cafe, well, hopefully, if you have
a good password or good biometrics, they're
not going to be able to get any of your data.
They can maybe delete all of your data and they can
sell your computer, they can use your computer, but they probably,
if you're practicing best practices, don't have access
to the data that's on the system.
Why?
Because it's completely encrypted at rest and they don't know your password,
they don't have your fingerprint, they don't have your face,
they should not be able to decrypt that data.
So in other words, if this is my unencrypted data,
the way I want it and need it when I'm using my computer,
full-disk encryption, at rest, would change my entire computer
to look random.
These are random 0's and 1's now that I generated by using,
for instance, my password or my fingerprint or my face.
And this is what your hard drive or your solid state drive
should look like when the lid is closed, when the power is off.
When you are logged out of it, it should be random 0's and 1's.
And the upside of this now is that, again,
if it's stolen while in this state, there's no data to be used
by the adversary because it looks like random 0's and 1's.
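Here is a toy Python sketch of that idea of turning a password into random-looking data and back again. It is not how any real operating system implements full-disk encryption: the XOR keystream built from SHA-256 is an assumption made purely so the example runs with the standard library, and the salt, password, and helper names are all hypothetical.

```python
import hashlib


def derive_key(password: str, salt: bytes) -> bytes:
    """Stretch the password into a 32-byte key (PBKDF2 from the standard library)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 of the key plus a counter, repeated as needed."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR is its own inverse, so this both encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


salt = b"example-salt"  # assumption: a real system stores a random salt
disk = b"tax return 2023, family photos, saved messages"

at_rest = xor_with_key(disk, derive_key("correct horse battery", salt))
print(at_rest.hex())  # what a thief sees: random-looking bytes

unlocked = xor_with_key(at_rest, derive_key("correct horse battery", salt))
wrong = xor_with_key(at_rest, derive_key("guess123", salt))
print(unlocked == disk)  # True
print(wrong == disk)     # False
```

With the right password the original data comes straight back, and with a wrong guess the result is just more random-looking bytes.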
Better yet, if you deliberately want to get rid of the device
because you want to trade it in for resale value,
because you want to donate it to someone else,
because you want to sell it to someone online,
when using full-disk encryption, the upside
is that so long as you had a really hard-to-guess password, your data is,
for all intents and purposes, securely deleted already.
Because unless the new buyer figures out or knows
your password or has your same fingerprint or has your same face,
they're not going to be able to access any of your data anyway.
And this is important nowadays because it turns out, with modern hardware,
even if you might want to change all of the 0's and 1's to all 0's or all 1's
or all random data, it turns out that today's hardware can fail over time.
So even little USB sticks or solid state drives over time can kind of wear out.
But they're smart enough, thanks to software
known as firmware inside of it, as soon as the device realizes, wait a minute,
those bits over there aren't working properly anymore,
the device might not let you change them to all 0's or all 1's or random 0's
and 1's anymore.
It might just leave them as is forever.
Which is to say, it's even more important to start
using full-disk encryption, encryption at rest,
when you first get a device because that way,
you can trust that even if parts of the device degrade over time,
all of the data that's there and has been there
was at least encrypted with one of your passwords or one of your biometrics
in the past.
So this is the kind of feature to look for in your Mac, your PC, or your phone
to ensure that it is somehow enabled.
Thankfully, once you log back in with your password,
it goes back to the original data and you can use it.
Of course, then, an implication of this best practice
is that if you forget your laptop's or your phone's
or your desktop's password, or your fingerprint somehow changes,
or your face sufficiently changes, you might be locked out
of all of your data, too, but again, that's
just another example of this trade-off between usability and security as well.
Now a downside, an evil side to full-disk encryption
is ransomware, which is how adversaries are monetizing attacks.
It's not uncommon nowadays for hackers, for adversaries,
when they get into a system, whether it's your laptop
or, for instance, a corporate network, or in some cases, hospital
systems or a city's own computer networks, to not try to do any damage
or just do something like spam or cryptocurrency mining,
but to actually encrypt all of the data on these systems they somehow
accessed online.
Why?
Well, if they encrypt all of the data they can then ask for a ransom
and say, listen, if you don't give me this many bitcoins,
I'm not going to give you the key that I used to encrypt your data.
And if you poke around online, there have been many examples of this,
unfortunately, where hackers have gotten into systems that were not
very well-protected, all of the data therein was encrypted,
and this is an opportunity for the adversaries
to try to extort, say, financial gain from a situation
by then only handing you the keys, if ever, once you've actually paid up.
And there, too, there's the risk, as in any ransom scenario,
where who even knows if they're going to give you the proper key in the end,
but this is increasingly a concern for municipalities, for companies,
for universities, and the like.
So just as we have some upsides here, there,
too, is this trade-off in what you can do.
And lastly, we thought we'd end on a note about the future
because this is a topic that will come up
and has come up over time, this topic of quantum computing.
So for those less familiar, we've been talking a lot
about bits, 0's and 1's today, and at the end
of the day that's how today's computer systems are implemented.
Patterns of 0's and 1's to represent numbers and letters and colors
and videos and sounds and everything.
We've been discussing today data more generally.
Now typically, in our world now, a bit, a binary digit, can either be a 0
or it can be a 1, as per the diagram we had on the screen in these examples.
Either a 0 or a 1.
In the world of quantum computing, thanks to some very fancy physics
and quantum mechanics in particular, it is possible,
it seems, physically, for us to implement the idea of bits a little bit
differently using quantum techniques.
And there's this idea of not just a bit, but a quantum bit or qubit whose power
derives from the reality that physically, you
can implement a qubit in such a way that it is representing both a 0
and a 1 at the exact same time.
So it can be not in just one state, so to speak,
one condition at once, but two states at once.
And if you have two qubits, they can be in four states at once.
If you have three, they can be in eight states at once.
If you have 32 of them, they can be in 4 billion states at once.
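The arithmetic behind those numbers is just repeated doubling, which a few lines of Python can confirm. The function name `num_states` is made up for this illustration, and it only counts combinations; it does not simulate a quantum computer.

```python
def num_states(qubits: int) -> int:
    """Each additional qubit doubles the number of states: 2 to the power n."""
    return 2 ** qubits


for qubits in (1, 2, 3, 32):
    print(qubits, "->", num_states(qubits))
# 1 -> 2
# 2 -> 4
# 3 -> 8
# 32 -> 4294967296
```

So "4 billion" is a rounding of 4,294,967,296.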
Now what's the implication of this?
Well, when we talk about cryptography, when
we talk about hashing, when we talk about just very large numbers
and trying to figure out via brute force or some other mechanism
what some input to a function was, if you have exponentially more computing
capabilities by being able to do not one or two
things at a time with individual bits, but two or four or eight or 4
billion things at once, it stands to reason
that if adversaries have access to quantum computing before you
and I do, then all of the security you and I now rely on
and that we've talked about today could suddenly become insecure.
Because we're trusting right now that it's just
going to take the adversary a lot, a lot,
a lot of time, maybe money, maybe resources,
maybe risk to attack our accounts.
But if they have exponentially more resources than you and me,
then our data really is at risk.
And all of the mathematics we've been trusting needs to be hardened instead.
Now hopefully you and I will have access to quantum computing at the same time
as or ideally before all of these adversaries,
so hopefully our algorithms for securing information
will continue to evolve along with these technologies.
So this isn't necessarily something you need to worry about for now.
Indeed, I think after today, we have more than enough to worry about.
So for today, that's all.
We'll see you next time.