Lec 20
Prof.D.Mukhopadhyay
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
Module No # 01
Lecture No # 20
Stream Ciphers
In today's class, we will continue with stream ciphers. Yesterday, you saw some small
examples to understand the problem. The problem I was interested in was this: given a
key stream sequence, can I reconstruct the corresponding LFSR? In today's class, we will
essentially talk about two topics. We will discuss linear complexity, which I think I
started with and will continue, and we will mainly concentrate on the Berlekamp-Massey
algorithm, which is used to solve this problem. It comes from a phenomenal and seminal
paper, so we will discuss it. It is quite important.
I hope you understand why we are studying this. Consider a very simple stream cipher.
You have a key stream K, and you XOR it with M_i to obtain the corresponding cipher text
C_i. In attack models such as the chosen-plaintext or known-plaintext models, where you
know M_i and the corresponding C_i, you know the value of K. But the point is, given this
value of K, can you also find the initial secret key with which the cipher was started?
K is the key stream, not the actual key. There is an internal algorithm here, and there
is a secret key which is provided as its input. That secret key is the actual secret of
the cipher, and the observer is essentially observing the key stream. From there, he is
interested in reconstructing the LFSR and also the key.
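As a small illustration of why a known plaintext gives away the key stream, here is a minimal Python sketch. The bit values and the helper name `xor_bits` are made up for this example; the only assumption is that the cipher combines message and key stream bit by bit with XOR.

```python
def xor_bits(a, b):
    """Bitwise XOR of two equal-length lists of bits."""
    return [x ^ y for x, y in zip(a, b)]

# Hypothetical example data (not from the lecture).
message = [1, 0, 1, 1, 0, 0, 1, 0]
key_stream = [0, 0, 1, 1, 1, 0, 1, 1]

cipher = xor_bits(message, key_stream)   # C_i = M_i XOR K_i
recovered = xor_bits(message, cipher)    # K_i = M_i XOR C_i
```

The observer who knows `message` and `cipher` recovers the key stream exactly; what remains is the harder question of recovering the LFSR behind it.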
We are considering a single LFSR system and trying to understand whether a single LFSR,
which has nice pseudorandom properties, can be reconstructed using the Berlekamp-Massey
algorithm. If you remember the problem, I can write all these key bits as linear
combinations of the internal secrets. How many secret values are there? For a length-N
LFSR, there are actually 2N secrets. You do not know the initial seed, which gives you N
bits, and you also do not know the connection polynomial, which gives another N. Another
thing you do not know is the length of the LFSR. You could try to write all of this as a
system of linear equations and solve it, but that is an unwieldy process.
(Refer Slide Time: 03:13)
The Berlekamp-Massey algorithm gives a very simple and elegant technique to solve this
problem. With this motivation, we will study it. First of all, this is a recap of the
LFSR structure. You see that we have S_{j-1} down to S_{j-L}. Please pay attention,
because a certain amount of formalism is required to understand the basic principle of
the algorithm, and there is some amount of involved mathematics. So we have S_{j-1},
S_{j-2}, up to S_{j-L+1} and S_{j-L}. How many flip-flops are there? There are L
flip-flops, or L storage elements. The feedback S_j is a linear combination of these
values. Whether you take a particular value into the feedback or not depends upon a
control bit, and there are L control bits here. You multiply each control bit with the
corresponding stored value and XOR that with the previous terms.
So you know how to read this equation. It says that S_j = Σ_{i=1}^{L} C_i S_{j-i}, and
j starts from L. Why? Because before that you are just streaming out the seed, so the
feedback does not come into effect before j = L. From j = L onwards, each value has to
be a linear combination of the previous L values. This is a very important equation to
keep in mind: S_j = Σ_{i=1}^{L} C_i S_{j-i} for j ≥ L. To remember it, it is just the
coefficients multiplied with the previous L values of the LFSR.
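The recurrence can be sketched in a few lines of Python. This is only an illustration over GF(2); the function name `lfsr_sequence` and the choice of coefficients and seed below are assumptions made for the example.

```python
def lfsr_sequence(coeffs, seed, n):
    """Generate n bits of an LFSR over GF(2).

    coeffs is [C_1, ..., C_L] and seed is [S_0, ..., S_{L-1}].
    For j >= L, S_j is the XOR over i = 1..L of C_i AND S_{j-i}.
    """
    L = len(coeffs)
    s = list(seed[:n])           # the first L bits are just the seed
    for j in range(L, n):
        bit = 0
        for i in range(1, L + 1):
            bit ^= coeffs[i - 1] & s[j - i]
        s.append(bit)
    return s

# Illustrative choice: C(D) = 1 + D + D^3, so C_1 = 1, C_2 = 0, C_3 = 1.
bits = lfsr_sequence([1, 0, 1], [0, 0, 1], 7)
```

The control bits decide which of the previous L values enter the XOR, exactly as in the equation above.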
This is actually trivially true. The length just has to be greater than or equal to 0.
The interesting problem is therefore the case when N is greater than L. In that case, we
will argue by contradiction. Let us negate the statement of the theorem, which says that
L' ≥ N + 1 - L. I therefore assume that L' is less than N + 1 - L, which I can write as
L' ≤ N - L.
This is by contradiction: we have just negated the statement of the theorem. What we
will see is that if this is true, then we end up with something wrong, that is, we
violate our starting point. What are the starting points? Let us see. Case 1 is L
greater than or equal to N, for which the theorem is trivially true. For case 2, let us
consider two LFSRs, with coefficients C_1 to C_L and C'_1 to C'_{L'} respectively. We
have assumed that L' ≤ N - L by negating the theorem statement. It is clear that the
first LFSR generates the sequence up to S_{N-1}. Therefore, I can write its equality for
j from L to N - 1, but its sum at j = N is not equal to S_N. Why? Because it does not
generate S_N, the (N+1)-th value in the sequence. The second LFSR generates all the
values, so its equality holds from j = L' and continues till N. These two equations are
clear to us; they simply restate where we started, that is, the hypothesis of the
theorem. The first LFSR generates the values till S_{N-1} but not S_N, hence the
inequality, while the second one generates up to S_N as well. Is it okay?
(Refer Slide Time: 10:58)
What we will do is essentially this: start with the first LFSR's coefficients and
consider Σ_{i=1}^{L} C_i S_{N-i}. That is, I am calculating the feedback value when j is
equal to N. I will evaluate this and show that, if our assumption is true, it is
actually equal to S_N. That contradicts my starting point. You understand the idea? So,
how do I start to prove this? You can see that if I just write Σ_{i=1}^{L} C_i S_{N-i},
(Refer Slide Time: 12:23)
We can actually substitute for S_{N-i} using the second LFSR. Why can we substitute? Do
you understand this? In order to substitute, note that the index has to lie in the valid
range of j, otherwise the definition does not hold. So consider this sum,
Σ_{i=1}^{L} C_i S_{N-i}, and note the two extreme ends of the sequence it uses: it
starts with S_{N-1} and continues till S_{N-L}. We have assumed that N - L is greater
than or equal to L'.
Since N - L ≥ L', the sequence S_{N-L} to S_{N-1} is actually a subset of S_{L'} to
S_{N-1}. Why? Because L' is no larger than N - L.
If I consider the sequence from S_{L'} to S_{N-1}, then S_{L'} has an index no larger
than that of S_{N-L}, because our contradicted assumption was L' ≤ N - L. So this is a
bigger sequence and it contains the other one.
So each S_{N-i} becomes Σ_{k=1}^{L'} C'_k S_{N-i-k}. Till this part, is it clear? Now,
you can interchange the two sigmas: bring this one here and that one there. If you
consider the inner sigma, it is Σ_{i=1}^{L} C_i S_{N-k-i}. Here I can use the first
LFSR's equation and write S_{N-k} instead. Note that you can do this because S_{N-L'} to
S_{N-1} is a subset of S_L to S_{N-1}, as L' ≤ N - L. What you are left with is
Σ_{k=1}^{L'} C'_k S_{N-k}, and that is nothing but S_N.
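Written out in one place, the chain of substitutions looks as follows. This is a sketch under the assumptions of the proof: arithmetic is over GF(2), C_i are the coefficients of the first LFSR (length L), C'_k those of the second (length L'), and L' ≤ N - L.

```latex
\begin{aligned}
\sum_{i=1}^{L} C_i\, S_{N-i}
  &= \sum_{i=1}^{L} C_i \sum_{k=1}^{L'} C'_k\, S_{N-i-k}
     && \text{(second LFSR; valid since } N-i \ge N-L \ge L')\\
  &= \sum_{k=1}^{L'} C'_k \sum_{i=1}^{L} C_i\, S_{N-k-i}
     && \text{(interchange the two sums)}\\
  &= \sum_{k=1}^{L'} C'_k\, S_{N-k}
     && \text{(first LFSR; valid since } L \le N-k \le N-1)\\
  &= S_N
     && \text{(second LFSR at } j = N).
\end{aligned}
```

The left-hand side was assumed to differ from S_N, so the assumption L' ≤ N - L cannot hold.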
Therefore, what you have contradicted is the starting point, that is, that the first
LFSR cannot generate the digit S_N. Where did we go wrong? We assumed L' ≤ N - L, which
was wrong. This proves the theorem: L' is not less than or equal to N - L, but is
greater than or equal to N - L + 1, that is, N + 1 - L. Did you understand the
principle? You can always go back and look at the proof in detail, but this is the idea:
suppose there is a particular LFSR which is unable to generate the sequence S_0 to S_N,
but generates the sequence S_0 to S_{N-1}.
(Refer Slide Time: 17:19)
Then you need to add on to that length. If you add on to that length and the length
becomes L', where the previous length was L, then there is a definite relation between
L' and L, and that is what we have proved in this theorem. Now you can better understand
the linear complexity problem. What is linear complexity? As I defined it, L_N(S) is the
minimum length over all the LFSRs that generate the sequence S_0 to S_{N-1}. So, clearly
you can see that L_N(S) will be less than or equal to N. Why?
(( ))
Yes. If I am considering S_0 to S_{N-1}, any N-stage LFSR can definitely generate it,
with the sequence itself as the seed. The question is whether I can do with less than
that. Another thing to note: the slide says L_N(S) must be monotonically decreasing, but
that is a mistake. L_N(S) is monotonically non-decreasing with increasing N. Why is it
true? Because an LFSR that generates S_0 to S_N also generates S_0 to S_{N-1}; otherwise
you straight away contradict the initial hypothesis. So, we will start with certain
conventions. We are trying to develop a recursive construction, and there are always
some starting points for any recursive algorithm. These are the starting points: the
all-0 sequence is generated by the LFSR whose length is L = 0, and if S_0 to S_{N-1} are
all 0 and S_N is equal to 1, the length which is required is N + 1.
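These conventions can be checked on tiny sequences by exhaustive search. The sketch below is only a brute-force check, not the Berlekamp-Massey algorithm; the function names are chosen here for illustration, and it is practical only for very short sequences.

```python
from itertools import product

def generates(coeffs, s):
    """True if the LFSR with coefficients C_1..C_L reproduces s from its first L bits."""
    L = len(coeffs)
    for j in range(L, len(s)):
        bit = 0
        for i in range(1, L + 1):
            bit ^= coeffs[i - 1] & s[j - i]
        if bit != s[j]:
            return False
    return True

def linear_complexity_bruteforce(s):
    """Smallest L such that some length-L LFSR generates s.

    The loop always returns by L = len(s), because an LFSR as long as the
    sequence simply holds the whole sequence as its seed.
    """
    for L in range(len(s) + 1):
        if any(generates(c, s) for c in product([0, 1], repeat=L)):
            return L
```

For L = 0 the only candidate has no feedback at all, so it matches exactly the all-0 sequences, which is the first convention.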
This we saw in our example. If you remember 0 0 1: 0 0 could have been generated by a
length-0 LFSR, but 0 0 1 needed a three-stage LFSR. That is exactly what this says.
These are the conventions. Now let us consider another lemma. The lemma is: if some LFSR
of length L_N(S) generates the sequence S_0 to S_{N-1} but not the sequence S_0 to
S_{N-1}, S_N, that is, S_N is not being generated, then the linear complexity
L_{N+1}(S) is greater than or equal to max(L_N(S), N + 1 - L_N(S)). This is actually
quite trivial. We know the linear complexity is monotonic, so L_{N+1}(S) has to be
greater than or equal to L_N(S). We also proved the previous result, that
L_{N+1}(S) ≥ N + 1 - L_N(S). Since it is greater than or equal to both, it is greater
than or equal to whichever is the maximum. So L_{N+1}(S) ≥ max(L_N(S), N + 1 - L_N(S)).
This is an interesting lemma, which will help us to recursively compute the linear
complexity.
(Refer Slide Time: 21:24)
So note when you require an update: if N + 1 - L_N(S) is greater than L_N(S), then you
are required to add on to the length. When this happens, it means that N + 1 is greater
than 2 L_N(S), so I can write this as N ≥ 2 L_N(S). For the rest of the cases, updating
is not required. That is, when N is less than 2 L_N(S), updating is not necessary.
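The length bookkeeping can be sketched as a tiny helper. It assumes the bound of the lemma is attained whenever the current LFSR fails on S_N, which is what the lecture goes on to prove; the name `updated_length` is chosen here purely for illustration.

```python
def updated_length(L, N, discrepancy):
    """Linear complexity after seeing S_N, given complexity L for S_0..S_{N-1}.

    With a zero discrepancy nothing changes. Otherwise the new length is
    max(L, N + 1 - L), which exceeds L exactly when N >= 2 * L.
    """
    if discrepancy == 0:
        return L
    return max(L, N + 1 - L)
```

For instance, with L = 3 the length stays 3 at N = 3 (since 3 is less than 6) but would grow at N = 7.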
So the length gets updated only in certain cases: whenever there is an update, it
happens only when N ≥ 2 L_N(S). Do you understand why? Because of the monotonicity
again. Now we are more or less ready to understand the Berlekamp-Massey algorithm. It is
a recursive algorithm which produces one of the LFSRs of length L_N(S), the minimum
length, which generates the sequence S_0, S_1, ..., S_{N-1}, for any integer value of N.
For this, let us look back at the connection polynomial.
So, you had C(D) = 1 + C_1 D + ... + C_L D^L. This was my connection polynomial, which
has degree at most L in the indeterminate D. The second starting convention follows from
the previous one: C(D) is equal to 1 for the LFSR of length L = 0. This is just a
convention. For a given sequence S, I will write
C^(N)(D) = 1 + C_1^(N) D + ... + C_{L_N(S)}^(N) D^{L_N(S)}.
This is just a rewriting of the previous connection polynomial. The only thing is that I
have specified that the length is equal to L_N(S), and the coefficients, if you note
carefully, are denoted C_1, C_2, C_3 and so on till C_{L_N(S)}, but at the top I have
written N as a superscript to indicate that this is a generator of the sequence till
S_{N-1}, that is, from S_0 to S_{N-1}. So, what are we interested in calculating?
C^(N+1)(D). Why? Because C^(N+1)(D) will give us the coefficients which generate the
sequence S_0, S_1, ..., S_N. Now, you see that…
Before going to the next thing, we will try to understand certain things. We will prove
a recursive construction of this polynomial C^(N)(D), and at the same time we will
revisit lemma one. Till now we have proved an inequality, that is,
L_{N+1}(S) ≥ max(L_N(S), N + 1 - L_N(S)). Actually, that greater-than-or-equal-to should
be replaced by equality. In order to prove that, we will develop one proof which merges
the two. We will prove the equality by induction, and at the same time give you a
proposal for a polynomial which will be a candidate for C^(N+1)(D).
This is a constructive way of proving. It is not an existential proof but a constructive
proof: we will not only prove existence but also show you how the polynomial is
constructed. In order to do that, we will define something called the discrepancy, which
we have already seen. If you remember the previous example, we had 0 0 1 1. The prefix
0 0 1 can be constructed quite trivially. How? You can take a three-stage LFSR, simply a
shift register with feedback, load 0, 0 and 1, and stream out 0 0 1. What is the
corresponding polynomial? C(D) = 1 + D^3. But consider the next 1 in the sequence. What
gets fed back? It is actually a 0. You want a 1 here, but with this particular LFSR you
are getting back 0. That means there is a discrepancy. This is the idea of the
discrepancy: the XOR between what you actually get back and what you want. The moment
you see that you get back 0 and what you want is 1, you know that you have to make
certain modifications in this structure. I will do a simple modification: I will
introduce an XOR here and take this feedback.
In that case, this LFSR will generate the sequence 0 0 1 1. If you remember, previously
we actually solved the equations and found this. You get 0 0 1, which is quite trivial,
and the next value is an XOR between 0 and 1, so you get back 1. Till this point is it
fine? What is the updated polynomial now? If I write it in terms of C^(N)(D), the
earlier one was C^(3)(D) and this is C^(4)(D). What is C^(4)(D) here? 1 + D + D^3. So
you are able to construct the polynomial, and when you run the algorithm you take care
of the fact that there is a discrepancy at this point and modify the polynomial
accordingly. When there is no discrepancy, no modification is required; you can just go
ahead with the previous polynomial, the previous construction of the LFSR.
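The discrepancy for this small example can be computed directly. In the sketch below the connection polynomial is stored as a coefficient list `[1, C_1, ..., C_L]`; that representation and the function name are assumptions made for the illustration.

```python
def discrepancy(conn, s, n):
    """Return S_n XOR (sum over i = 1..L of C_i S_{n-i}), with conn = [1, C_1, ..., C_L]."""
    d = s[n]
    for i in range(1, len(conn)):
        d ^= conn[i] & s[n - i]
    return d

s = [0, 0, 1, 1]
d_old = discrepancy([1, 0, 0, 1], s, 3)  # C(D) = 1 + D^3 feeds back S_0 = 0
d_new = discrepancy([1, 1, 0, 1], s, 3)  # C(D) = 1 + D + D^3 feeds back S_2 XOR S_0 = 1
```

A discrepancy of 1 means the polynomial has to be modified; a discrepancy of 0 means the current LFSR already produces the wanted bit.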
Yes.
(( ))
Therefore, you can immediately understand that the discrepancy for all these values till
m - 1 was equal to 0, but when j was equal to m, the discrepancy was a non-zero value.
You can understand that L_{m+1}(S) is equal to L_n(S). This we have already discussed.
By the induction hypothesis, L_n(S) is equal to max(L_m(S), m + 1 - L_m(S)). You can
apply this and note that L_m(S) is actually less than L_n(S). Since L_m(S) is less than
L_n(S), L_n(S) is equal to m + 1 - L_m(S). Please remember this value: L_n(S) was equal
to m + 1 - L_m(S). After this, we will give you a proposal for the next candidate. Till
now, we have assumed that we know C^(m)(D), and we would like to find C^(n+1)(D), which
generates the sequence till S_n.
That is the idea. But let us look at this more closely to understand it. What is the
degree of the candidate C(D) = C^(n)(D) + D^(n-m) C^(m)(D)? It is at most the maximum of
the degrees of the two terms. What is the degree of the first? At most L_n(S). And the
degree of the second? C^(m)(D) has degree at most L_m(S), but it is multiplied by
D^(n-m), so it is n - m + L_m(S). Note that n - m + L_m(S) can be written as
n + 1 - L_n(S). Why? Because of the previous equality, L_n(S) = m + 1 - L_m(S). If you
substitute this value, you obtain it. Therefore C(D) is an allowable connection
polynomial for an LFSR of length max(L_n(S), n + 1 - L_n(S)), because its degree is
within this length. Now only one thing remains to prove: that C(D) does the correction.
Because if C(D) does the correction, then what you have proved is that the next length
is equal to the maximum of L_n(S) and n + 1 - L_n(S).
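The degree bookkeeping can be summarised in two lines. The sketch assumes the GF(2) candidate C(D) = C^{(n)}(D) + D^{n-m} C^{(m)}(D), which is the same update used in the worked table later in the lecture, and uses the equality L_n(S) = m + 1 - L_m(S).

```latex
\begin{aligned}
\deg C(D) &\le \max\bigl\{\, L_n(S),\; (n-m) + L_m(S) \,\bigr\},\\
(n-m) + L_m(S) &= n - \bigl(m - L_m(S)\bigr) = n - \bigl(L_n(S) - 1\bigr) = n + 1 - L_n(S).
\end{aligned}
```

So the candidate fits in an LFSR of length max(L_n(S), n + 1 - L_n(S)), which matches the lower bound from the lemma.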
So, you have proved that L_{n+1}(S) is equal to max(L_n(S), n + 1 - L_n(S)), and the
induction goes through. But till now we have not proved one thing: that C(D) actually
does the correction, which means it generates the sequence digit S_n and at the same
time does not disturb the previous digits. In order to understand this, let us look at
the coefficients of C(D) in terms of those of C^(n)(D).
How can I write it? The coefficient C_1 will be equal to C_1^(n), and so on till
C_{n-m}. C_{n-m} will be C_{n-m}^(n) XOR 1, because the D^(n-m) term contributes a
coefficient of 1. Then C_{n-m+1} will be equal to C_{n-m+1}^(n) XOR C_1^(m), and so on.
So after a point there is an XOR in every coefficient. Now, note that to find the
discrepancy I shall be interested in computing this value:
S_j XOR Σ_{i=1}^{L} C_i S_{j-i}.
I need to ensure that this is equal to 0 for j running from L to n, for all the values.
Note that I can write this in two separate parts, because each coefficient C_i is an XOR
of a coefficient of C^(n)(D) with a coefficient of D^(n-m) C^(m)(D). One gives you the
first part and the other gives you the second part. So S_j XOR Σ_{i=1}^{L} C_i S_{j-i}
is equal to S_j XOR Σ_i C_i^(n) S_{j-i}, which is one part, XORed with the other part,
which is S_{j-n+m} plus the rest of the second sigma. I hope you have understood why I
am writing j - n + m: at this point I am writing out the term for the coefficient
C_{n-m}, and the corresponding sequence value is S_{j-(n-m)}, that is, S_{j-n+m}. Then
the rest of the sigma comes in, with C_i^(m) multiplying S_{j-n+m-i}. Note that this i
runs from 1 to L_m(S), and in the first part i runs from 1 to L_n(S). For all the
previous values, that is, for j running from L to n - 1, the first part is 0, because
the LFSR C^(n)(D) was properly generating all these values. But what about the second
part? The suffix here is j - n + m. When j runs from L to n - 1, substitute the largest
value, n - 1, and you get n - 1 - n + m, that is, m - 1.
Therefore, the second part only involves the first m digits of the sequence, which its
LFSR C^(m)(D) was properly generating, so this value is 0. This is a 0 and this is a 0,
so you get 0 for j running from L to n - 1. And what about when j is equal to n? The
first part gives a 1 because of the discrepancy of C^(n)(D) at S_n, and the second also
gives a 1 because of the discrepancy of C^(m)(D) at S_m, and both discrepancies cancel
out. Therefore, we say that C(D) actually generates the entire sequence S_0 to S_n. This
is not so trivial a proof, so please go back and look at it again and try to work with
some small examples.
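The splitting argument can be written compactly as follows. This is a sketch over GF(2), assuming the candidate C(D) = C^{(n)}(D) + D^{n-m} C^{(m)}(D) of length L = max(L_n(S), n + 1 - L_n(S)), with both earlier discrepancies (at S_n and at S_m) equal to 1.

```latex
\begin{aligned}
S_j + \sum_{i=1}^{L} C_i S_{j-i}
 &= \Bigl(S_j + \sum_{i=1}^{L_n(S)} C_i^{(n)} S_{j-i}\Bigr)
  + \Bigl(S_{j-n+m} + \sum_{i=1}^{L_m(S)} C_i^{(m)} S_{j-n+m-i}\Bigr) \\
 &= \begin{cases}
      0 + 0 = 0, & L \le j \le n-1,\\
      1 + 1 = 0, & j = n.
    \end{cases}
\end{aligned}
```

For j between L and n - 1, the index j - n + m stays between L_m(S) and m - 1, which is exactly the range where C^{(m)}(D) was generating correctly.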
The length changes only when L is less than or equal to n by 2. I described why it is
so; only at that time do you need to update the value of L, otherwise L is fine. You can
work through the details, but I will give you some examples. Let me tell you one thing:
until and unless you work it out by hand, it will not be so clear. So please go back and
work with some toy examples. Consider, for example, this sequence of periodicity 20. You
can plot the variation of the linear complexity with n, calculating it using the
Berlekamp-Massey algorithm. This is called the linear complexity profile, and it will
look like this.
If this is the line corresponding to L = n/2, then whenever there is a discrepancy and L
is less than or equal to n/2, the modification of L takes place. Otherwise, when L is
greater than n/2, modification of L is not required. You get the step function at those
places. So, you can look into this a little more closely.
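A profile like this can be computed with a few lines of Python. The sketch below is one compact way of writing Berlekamp-Massey over GF(2) that records L after every bit; the period-20 sequence on the slide is not reproduced in the transcript, so the usage line uses the 8-bit sequence 0 0 1 1 1 0 1 1 from this lecture instead.

```python
def linear_complexity_profile(s):
    """Return the list of linear complexities of s[:1], s[:2], ..., s over GF(2)."""
    c, b = [1], [1]        # current polynomial, and the one before the last length change
    L, m = 0, -1           # current length, and position of the last length change
    profile = []
    for n in range(len(s)):
        d = s[n]                                   # discrepancy at position n
        for i in range(1, L + 1):
            d ^= c[i] & s[n - i]
        if d:
            t = list(c)
            shift = n - m
            c = c + [0] * (shift + len(b) - len(c))  # make room for D^shift * b
            for i, bi in enumerate(b):
                c[i + shift] ^= bi
            if n >= 2 * L:                         # the length jumps only in this case
                L, m, b = n + 1 - L, n, t
        profile.append(L)
    return profile

profile = linear_complexity_profile([0, 0, 1, 1, 1, 0, 1, 1])
```

Plotting `profile` against n gives the staircase shape, with jumps only at positions where n is at least twice the current length.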
To obtain certain interesting properties, I will conclude with the example with which I
started. This is the sequence. Yesterday we saw that a four-stage LFSR was unable to
generate it. So, in order to solve the problem, let us apply the Berlekamp-Massey
algorithm. To do that, you lay the computation out in the form of a table like this,
whose columns are the variables S_N, d, T(D), C(D), L, m, B(D) and N, and we will start
operating on this table. The sequence which you wish to generate is 0 0 1 1 1 0 1 1. You
start with C(D) equal to 1, which was my convention, and your L value is 0. You
initialize m with -1, and B(D) is the polynomial which generated the sequence before the
last length change; this was my C^(m)(D) in the analysis. N equal to 0 means that so far
you have generated nothing. This is just the initialization of the algorithm. What you
get next is a 0. What is the discrepancy? It is 0, so you do not need to do anything,
and everything is kept intact. Next you again get a 0, you have a discrepancy of 0, and
you do not do anything. But the next thing you get is a 1. The moment you get 1, your
discrepancy is equal to 1, because you are feeding back 0 and what you want is 1, so the
XOR is 1. So we need to update C(D). What you do is XOR the previous value of C(D), that
is 1, with B(D) multiplied by D to the power of N minus m.
What is your N here? 2. And your m value is -1, so N minus m is 2 plus 1, that is 3, and
you get 1 + D^3. You see that 1 + D^3, which has length 3, should be able to generate
the sequence 0 0 1, and this we have already seen in our hand exercise. What is the new
value of m? It is 2. So m is the sequence length before the length got changed. The
length got changed here from 0 to 3, and what you had generated previously was till N
equal to 2. Therefore m is equal to that previous value, in this case 2, and the
corresponding polynomial B(D) is 1. Now you have generated 0 0 1, and N is 3. The next
thing you get is 1, and you again find a discrepancy here, which means that you need to
change the value of C(D).
So you take N and m: 3 minus 2 is 1. You take 1 + D^3 and XOR it with the previous value
of B(D) multiplied by D. You get 1 + D + D^3. You see that you are not updating the
length. Can you tell me why? Yes, because of that inequality: L is 3, which is greater
than N by 2. So in this case you do not need to update the length, and you go ahead. It
works fine for the next few values, because the discrepancy is 0 and no other updating
is required for these stages. But at this point we were unable to construct the sequence
with a 3-stage LFSR, and you see that the discrepancy in this case is 1. Why? Because
with 1 + D + D^3 you XOR this value and this value of the previous sequence, 1 XOR 1, so
you are actually feeding back 0, but what you want is a 1. The discrepancy is equal to
1, so you need to modify C(D). You take 1 + D + D^3 and XOR it with the previous value
of B(D) multiplied by a power of D. What is the previous value of B(D)? It is 1,
multiplied by D to the power of 7 minus 2, that is, D^5. So you get
1 + D + D^3 + D^5, and L equal to 5. You have modified the length again, because 3 is
smaller than 7 by 2, and the new length is 7 + 1 - 3 = 5. A 5-stage LFSR with this
polynomial is able to generate the entire sequence.
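The table can be reproduced with a short program. The following is a minimal sketch of Berlekamp-Massey over GF(2) that follows the variable names of the table (C, B, L, m, N, d, T); the helper `poly_str` and the tuple layout of each trace row are choices made here for illustration, not something taken from the slide.

```python
def poly_str(p):
    """Render a GF(2) polynomial given as a coefficient list [p_0, p_1, ...]."""
    terms = ["1" if i == 0 else ("D" if i == 1 else f"D^{i}")
             for i, bit in enumerate(p) if bit]
    return " + ".join(terms) if terms else "0"

def berlekamp_massey(s):
    """Return (C, L, rows): connection polynomial, length, and one trace row per bit."""
    C, B = [1], [1]      # conventions: C(D) = 1 and B(D) = 1 to begin with
    L, m = 0, -1
    rows = []
    for N, bit in enumerate(s):
        d = bit                                   # discrepancy: wanted bit XOR feedback
        for i in range(1, L + 1):
            d ^= C[i] & s[N - i]
        if d:
            T = list(C)                           # keep the old C(D) in case L changes
            shift = N - m
            C = C + [0] * (shift + len(B) - len(C))
            for i, b in enumerate(B):
                C[i + shift] ^= b                 # C(D) = C(D) + D^(N-m) B(D)
            if N >= 2 * L:                        # length change
                L, m, B = N + 1 - L, N, T
        rows.append((N, bit, d, poly_str(C), L, m, poly_str(B)))
    return C, L, rows

C, L, rows = berlekamp_massey([0, 0, 1, 1, 1, 0, 1, 1])
for row in rows:
    print(row)
```

Each printed row shows the state after processing S_N, so it can be compared line by line with a hand-computed table.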
Therefore, by using the Berlekamp-Massey algorithm, you are able to calculate the
minimal length of the LFSR which will generate the sequence, and also find the
corresponding polynomial. You can see from the algorithm statement that its complexity
is O(N^2), so this is a quadratic algorithm for synthesizing the LFSR. It is quite
efficient. The references that I have followed are as follows. These are standard
references, but I suggest you go back and read this paper. It is freely downloadable; it
is titled "Shift-Register Synthesis and BCH Decoding". Just concentrate on the first
part of the paper. It gives you a description, but it actually works in prime fields,
and I have specialized the proof for GF(2); I mean not generalized, made more specific.
This is a classic paper from the IEEE Transactions on Information Theory. So, what you
have learnt from this example is that a single LFSR system is not good.
If you have a single LFSR-based stream cipher, then you can do a known-plaintext attack.
You can obtain the key stream, and from that you can reconstruct the LFSR. You can know
everything about the LFSR. Given a 2N-bit sequence, you can construct the entire thing;
therefore, you need a multi-LFSR system. In the next classes, we will continue with
stream ciphers and try to understand better constructions of stream ciphers, using LFSRs
and also without LFSRs.