Distant Integer Signals: Identity, Maths Club of IISER Kolkata
You may, as an extra, add notes wherever applicable. If you make a nice
observation that is not part of the paper, please include it in your notes
to get bonus credit.
You may not show Rough Work. But if you use an unfamiliar result,
please give a detailed proof for it.
• More than one solution to a problem is always welcome, as long as the
solutions are not redundant. In that case, please write Alt. Soln.
before every alternate solution to the same problem.
• Do not spend too much time on Bonus questions if you haven’t completed
the paper. They are meant to be hard and might not have a set
answer - it is up to you to figure out a way to find them. But if you
do, you are rewarded!
Prerequisites
Integer Signals
Signals are ubiquitous and extremely important in everyday life. They can
range from low frequency remote control signals for your television to high
energy Gamma ray bursts from distant galaxies. There can be various kinds
of signals - time varying electromagnetic waves or encoded messages to name
a few.
In our problem, we shall look at signals in the form of integer sequences and
try to figure out some quirky results.
The set of integers is unbounded, and while one cannot generally characterize
any limits to signal strength - modelling it via an infinite set seems unreason-
able. Historically, several such varying signals had been classified into small,
discrete sets. A famous example is Ptolemy’s classification of apparent stellar
magnitudes in Hipparchus’ catalogue into six different classes.
In our case, we would prefer to classify integer signals with help of integer
bases (radices).
What is the order of change of this number if we increment B by 1? (3)
A dime a dozen
You are now invested in these signals - so you decide to measure longer
intervals with the detector. You now come across a new kind - it measures a
patch with
1 Here, a count means a digit in the signal. A sequence of counts forms a signal.
2 Note that, from a practical standpoint, a string of zeroes means no observation. This
is why we can ignore any preceding zeroes in a signal (and in some cases, trailing zeroes).
Definition 4. A positive non-decreasing signal in base B is a positive
integer sequence $(a_0, a_1, a_2, \ldots)$, where $a_i \in \mathbb{N}$, $0 \le a_i < B$ for all $i$, and $a_i \le a_j$
whenever $i < j$.
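As a quick sanity check of Definition 4, the sketch below tests the condition directly on a finite signal and counts such signals by brute force. It assumes that "positive" means every count is at least 1; the names `is_non_decreasing` and `count_non_decreasing` and the `min_count` parameter are illustrative and not part of the paper.

```python
from itertools import product


def is_non_decreasing(signal, base, min_count=1):
    """Check Definition 4 for a finite signal.

    Assumption: "positive" is read as every count >= min_count (default 1).
    """
    in_range = all(min_count <= a < base for a in signal)
    ordered = all(a <= b for a, b in zip(signal, signal[1:]))
    return in_range and ordered


def count_non_decreasing(length, base, min_count=1):
    """Brute-force count of non-decreasing signals of a given length."""
    digits = range(min_count, base)
    return sum(
        is_non_decreasing(s, base, min_count)
        for s in product(digits, repeat=length)
    )
```

Setting `min_count=0` switches to the non-negative reading of the definition. The enumeration grows like (base - min_count)^length, so it is only meant for checking small cases by hand.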
Figure 1: A general uni-antimodal signal in base 10. Note that it is positive
non-decreasing up to a point, whereupon it drops and another non-negative,
non-decreasing signal is produced.
(Bonus) How many m-antimodal base B signals of length n are there?
A matter of chance
Assume that signal counts (including 0 count) are i.i.d. uniform random, i.e.
each of them has an equal probability of being chosen.
Indeed, assuming that every count is equally likely makes things much easier
than they seem to be. But in reality, this is hardly the case. It is akin to
looking at a dark patch of the sky and expecting something to show up. Given
that higher counts mean stronger signals, the chance of a count showing up
decreases as the count value increases.
Let us set a probability distribution for the signal counts. To do this, we will
implicitly assume that the true counts are in [0, ∞) while the discretization
is only to aid inference. Consider the distribution $\chi^2_2$, whose pdf is given by
$$p_{\chi^2_2}(x) = \frac{1}{2} e^{-x/2}, \quad x \in [0, \infty)$$
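The density above integrates in closed form to the CDF 1 - e^(-x/2). The sketch below checks this numerically with a simple midpoint rule; the function names and the step count are illustrative choices, not part of the paper.

```python
import math


def chi2_2_pdf(x):
    """Density of the chi-square distribution with 2 degrees of freedom."""
    return 0.5 * math.exp(-x / 2.0) if x >= 0 else 0.0


def chi2_2_cdf(x):
    """Closed-form CDF, obtained by integrating the density from 0 to x."""
    return 1.0 - math.exp(-x / 2.0) if x >= 0 else 0.0


def integrate_pdf(lo, hi, steps=100_000):
    """Midpoint-rule integral of the density over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(chi2_2_pdf(lo + (k + 0.5) * width) for k in range(steps))
```

Comparing `integrate_pdf(0, x)` with `chi2_2_cdf(x)` for a few values of x is a cheap way to confirm that any interval probability can be read off as a difference of two exponentials.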
where j can take values from {0, 1, 2, . . . , B − 1}.
These intervals are chosen a priori such that $P(\text{count} = j) = p_j > P(\text{count} = i) = p_i$
whenever $j < i$ (condition A). This ensures that a count of 0 is
the most likely, a count of 1 is less likely, and so on all the way up to B − 1,
which has a very small probability of showing up.
An extreme case of this would be one where you have an infinite base B,
which means that the signal counts can take values in the set N. Given the
nature of $\chi^2_2$, we can create the increasing sequence as an infinite arithmetic
progression without violating condition A (you can check that explicitly or
just see it intuitively). You will now see that despite all the intervals being
the same, the probability of occurrence varies a lot for higher counts.
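To see condition A at work for the infinite base, the sketch below assumes that count j corresponds to the interval [jr, (j+1)r) of the arithmetic progression, so that P(count = j) = e^(-jr/2) - e^(-(j+1)r/2). The function names and the cut-off `max_count` are illustrative, and the check is only numerical over finitely many counts.

```python
import math


def count_probability(j, r):
    """P(count = j), assuming count j covers the interval [j*r, (j+1)*r)."""
    return math.exp(-j * r / 2.0) - math.exp(-(j + 1) * r / 2.0)


def satisfies_condition_a(r, max_count=50):
    """Check p_0 > p_1 > ... > p_max_count for the equal-width partition."""
    probs = [count_probability(j, r) for j in range(max_count + 1)]
    return all(p > q for p, q in zip(probs, probs[1:]))
```

Under this assumption consecutive probabilities differ by the constant factor e^(-r/2), which is why equal-width intervals still give strictly decreasing probabilities. For very large r or `max_count` the values underflow to zero in floating point, so the numerical check should be kept to moderate ranges.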
Well, well, well. You have now successfully found a way to categorize signals
into a discrete set based on their strengths and frequencies. Now that you
have the discrete probability distribution (p0 , p1 , p2 , . . . , pB−1 ) for each of the
counts,
Can you assign a probability distribution for the sample space of all n
length signals in base B? Use the aforementioned discrete distribution
for the counts.
You have successfully completed two steps to success. Step 1 was mostly
built around computing abundances of certain kinds of integer signals, while
Step 2 focussed more on the chance of detecting them. Now a final challenge
awaits you before we bid goodbye to each other this time.
No noise in here
As with most cases in reality, trends are seldom ideal. There are several
mathematical models that relate some quantity X with some other quantity
Y through a linear relationship. However when taking data, you see that not
all points perfectly agree with this definition no matter how carefully you
take data. There is some inherent noise attached to each macro-system - and
this is something we always want to minimize.
So when you observe distant signals - those that feature strong counts fre-
quently, your interest might be drawn. Might they be the product of some
distant event that is of great importance? You wish to unveil it all, but you
lack the machinery - so you start one step at a time.
Consider the chi-square distribution with 2 degrees of freedom $\chi^2_2$ that we
defined earlier. For the sake of simplicity, we shall assume that the base B
is infinite and that the sequence of intervals used for the count is given by
(x0 = 0, r, 2r, 3r, . . .).
Given some noise from $\chi^2_2$ with the discretization according to the above
sequence, find the integer value for the first moment a.k.a the expectation
E(X), where X ∼ discrete(p0 , p1 , p2 , . . .). Note that the expectation of a
discrete random variable X is given by
$$E(X) = \sum_x x \, p(x)$$
where the sum is over all the values that the random variable can assume
and p(x) is the value of the PMF p at the outcome x. (15)
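A numerical check can help when working on this problem. The sketch below evaluates a truncated version of the sum, assuming count j covers [jr, (j+1)r) so that p_j = (1 - q) q^j with q = e^(-r/2). The name `discretized_expectation` and the truncation length are illustrative. The result can be compared against 1/(e^(r/2) - 1), the coefficient that appears in the cross-correlation identity later in the paper.

```python
import math


def discretized_expectation(r, terms=10_000):
    """Truncated sum of j * P(count = j) for the partition (0, r, 2r, ...).

    Assumption: count j covers [j*r, (j+1)*r), so p_j = (1 - q) * q**j.
    """
    q = math.exp(-r / 2.0)
    return sum(j * (1.0 - q) * q**j for j in range(terms))
```

For example, r = 2 ln 2 gives q = 1/2, for which the truncated sum is numerically 1. For very small r the default truncation may need to be increased.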
Therefore, every detected signal might have some noise attached to it. How
much does this change the base signal? Let us try to take a more elaborate
look.
Definition 5. Consider an n length signal in base B = ∞. Define it as
u = (u0 , u1 , u2 , . . . , un−1 ).
We would now want to quantify how similar these two signals are, under the
consideration that the most similar two signals can get is when they both
are constant and equal to each other.
Can you construct some way(s) in which we can figure out the similarity
of two signals? (20)
The problem above is very important, because it will let you form a numerical
model for an idea - and look for its pros and cons. We, on the other hand,
will consider something well posed in literature - cross correlation.
While all $u_i$ are fixed integers, the same cannot be said for the $\epsilon_i$, as they are
random integers drawn from the discrete distribution discrete$(p_0, p_1, p_2, \ldots)$,
where the $p_i$ are defined as the probability of getting the count $i$, measured
on the partition $(0, r, 2r, 3r, \ldots)$ of the support of $\chi^2_2$. You might want to go
back and review your previous results if you have missed them.
Using the properties of the Expectation function, and assuming that u is
a deterministic variable and ϵi ∼ discrete(p0 , p1 , p2 , . . .) random variables,
show that
$$E_{u \star v}[0] = \sum_{i=0}^{n-1} u_i^2 + \frac{1}{e^{r/2} - 1} \sum_{i=0}^{n-1} u_i$$
(10)
Hence compute the noise factor $N_{u,v} = E_{u \star v}[0] - u \star u[0]$, and mention a
way in which $N_{u,v}$ can be lowered. (5)
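The identity can be explored numerically. The sketch below rests on two assumptions that are not spelled out in the surviving text: that the noisy signal is v_i = u_i + ϵ_i, and that the zero-lag cross correlation is the sum of the products u_i v_i. All function names are illustrative. Noise counts are sampled as floor(X / r) with X exponential of mean 2, which is the same distribution as chi-square with 2 degrees of freedom.

```python
import math
import random


def sample_noise(r, rng):
    """Draw one noise count: floor(X / r) with X ~ chi-square (2 d.o.f.)."""
    x = rng.expovariate(0.5)  # exponential with rate 1/2, i.e. mean 2
    return int(x // r)


def zero_lag_correlation(u, v):
    """Assumed definition: sum of u_i * v_i."""
    return sum(a * b for a, b in zip(u, v))


def expected_zero_lag(u, r):
    """Right-hand side of the identity for E_{u*v}[0]."""
    mean_noise = 1.0 / (math.exp(r / 2.0) - 1.0)
    return sum(a * a for a in u) + mean_noise * sum(u)


def noise_factor(u, r):
    """N_{u,v} = E_{u*v}[0] - u*u[0]."""
    return expected_zero_lag(u, r) - zero_lag_correlation(u, u)


def simulate_zero_lag(u, r, trials=20_000, seed=0):
    """Monte Carlo estimate of E_{u*v}[0], assuming v_i = u_i + noise."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        v = [a + sample_noise(r, rng) for a in u]
        total += zero_lag_correlation(u, v)
    return total / trials
```

Comparing `simulate_zero_lag(u, r)` with `expected_zero_lag(u, r)` for a short signal such as `[1, 2, 3]`, and evaluating `noise_factor` for a few values of r, gives a feel for how the interval width enters the noise level.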
It must be astonishing to see that the way in which we determine what value
of count to assign given a particular strength of a signal in [0, ∞) contributes
significantly to the noise level. While you might be really excited to diminish
the noise factor by an appropriate transformation - hold your horses!
Recall what the intervals [0, r), [r, 2r), [2r, 3r), . . . stand for? Yes, they stand
for the probability that a random signal count has value 0, 1, 2, . . .. So if we
were to widen the interval [0, r) to our heart’s content, eventually we’ll only
get zeroes as our signal counts, and all hope of distinguishing strong signals
from weaker noise (due to less probability) would go out of the window!
Surely, that is not what we want...
To get to a middle ground, we need to set a threshold for really strong and
improbable signals - which should almost always hint at a dedicated event.
For our simple B = ∞ case, let this count be b. So any signal with a count
above b would be classified as extraordinary - i.e. it may not be produced by
random noise. To do that, we demand that (condition B)
$$\int_0^{br+1} p_{\chi^2_2}(x) \, dx = 0.95$$
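Since the CDF of the distribution is 1 - e^(-x/2), condition B can be solved in closed form. The sketch below reads the upper limit literally as br + 1, which is an assumption about the typeset formula; the function names are illustrative.

```python
import math


def chi2_2_quantile(prob):
    """Inverse CDF of chi-square with 2 d.o.f.: solves 1 - exp(-x/2) = prob."""
    return -2.0 * math.log(1.0 - prob)


def threshold_count(r, prob=0.95):
    """Real-valued b solving condition B, assuming the upper limit is b*r + 1."""
    return (chi2_2_quantile(prob) - 1.0) / r
```

The value returned is generally not an integer. Whether to round it up or down to obtain a usable count threshold is a modelling choice that the text does not fix.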
Finally we are ready to tackle growing signals with this framework. These
signals start at low values, impersonating noise. But after a while they go to
counts larger than b and remain increasing, before they drop again to zero,
where random noise takes over. We have already seen that the fraction of
increasing signals of length n goes to zero as B → ∞. We will now try to see
how the noise factor varies as the sequence length increases. All assumptions
are carried over from before.
This relates how quickly the sequence $u = (u_i)_i$ grows with how strongly the
threshold value b affects the noise factor. Computing the ratio $N_{u,v} / u \star u[0]$
will give a more robust idea about the optimum level at which a growth
function will cause minimization of noise. While we let you ponder on this
problem for a while, we shall quickly take our leave, but not before setting a
final problem -