Distant Integer Signals: Identity, Maths Club of IISER Kolkata

1) The document provides an overview of integer signals, which are defined as sequences of integers. It introduces concepts like integer bases, non-negative and positive non-decreasing signals, and uni-antimodal signals. 2) It poses problems related to counting the number of possible signals of a given length and base, and calculating the fraction of non-decreasing signals. 3) It also discusses modeling integer signals as random variables and defining probability distributions over their counts. In particular, it considers a chi-squared distribution for the counts.

Uploaded by

Sabarno Saha

Distant integer signals

Identity, Maths Club of IISER Kolkata

March 31, 2023

How the paper works

This paper will help you build a mathematical framework to solve a
problem at hand. The process is divided into small, simple steps and there
might not be a unique way of arriving at the final result; your creativity
matters. Sometimes, you may be asked to raise pertinent questions about the
problem at hand. Answers are only required to the problems in coloured
boxes.

You may, as an extra, add notes wherever applicable. If you make a nice
observation that is not part of the paper, please include it in your notes
to get bonus credit.

You may not show Rough Work. But if you use an unfamiliar result,
please give a detailed proof for it.

General tips for better assessment


• Problems are generally linked to one another in a chronological manner.
Solving them in that fashion is recommended.

• More than one solution to a problem is always welcome, as long as the
solutions are not redundant. In that case, please write Alt. Soln.
before every alternate solution to the same problem.

• LaTeX submissions (as PDF) are preferred to handwritten or MS Word
docx files. If a docx is submitted, the math notation should be proper -
you might want to use the ‘Equation tool’ in MS Word. Handwritten
solutions should be legible, otherwise they won’t be checked.

• Do not spend too much time on Bonus questions if you haven’t completed
the paper. They are meant to be hard and might not have a set
answer - it is upon you to figure out a way to find them. But - if you
do, you are rewarded!

Prerequisites

Necessity. Elementary Combinatorics

Necessity. Naïve set theory.

Necessity. Knowledge of elementary properties of natural numbers (like
ordering and base).

Necessity. Knowledge of standard probability distributions

Necessity. Common sense.

Cosmetic. Impressive knowledge of Olympiad problems.

Integer Signals

Signals are ubiquitous and extremely important in everyday life. They can
range from low frequency remote control signals for your television to high
energy Gamma ray bursts from distant galaxies. There can be various kinds
of signals - time varying electromagnetic waves or encoded messages to name
a few.

In our problem, we shall look at signals in the form of integer sequences and
try to figure out some quirky results.

Definition 1. An integer signal henceforth is defined as a sequence of
integers $(a_0, a_1, a_2, \ldots)$ where $a_i \in \mathbb{Z}$ for all $i$.

Finite and infinite integer sequences have the obvious definitions.

The set of integers is unbounded, and while one cannot generally characterize
any limits to signal strength, modelling it via an infinite set seems
unreasonable. Historically, several such varying signals had been classified
into small, discrete sets. A famous example is Ptolemy’s classification of
apparent stellar magnitudes in Hipparchus’ catalogue into six different classes.

In our case, we would prefer to classify integer signals with the help of
integer bases (radices).

Definition 2. An integer base (or radix) is a collection of digits which
are used to represent any integer. All integer bases are formed from
the digits $0, 1, \ldots, B-1$, where $B$ is the number of distinct digits in
the base-$B$ system (so the largest digit is $B-1$). For example, in decimal
$B = 10$, while in binary $B = 2$.

Definition 3. Under a B-base system, a non-negative integer sequence is
defined as a sequence $(a_0, a_1, a_2, \ldots)$ where $a_i \in \mathbb{N} \cup \{0\}$
and $0 \le a_i < B$ for all $i$.

A non-negative integer sequence (subsequently called a signal) in base $B$ can
have a finite length $n \in \mathbb{N}$. We will see later that we can choose to
look at signals of a certain finite length.
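
For very small bases and lengths, the signals can simply be listed, which is a handy way to test a conjectured count before proving it. The following is a minimal Python sketch and not part of the paper; the helper name `signals` is an assumption made for illustration.

```python
from itertools import product

def signals(base, length):
    """Return every base-`base` signal of the given length as a tuple of counts.

    Hypothetical helper for illustration: a count is any digit 0, 1, ..., base - 1.
    """
    return list(product(range(base), repeat=length))

# All binary signals of length 2, in lexicographic order.
print(signals(2, 2))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Listing grows quickly with both parameters, so this is only practical for checking small cases.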

How many distinct signals of length $n$ are possible? (3)

What is the order of change of this number if we increment B by 1? (3)

A dime a dozen

It is undeniable that there can be a whole lot of signals of length $n$ in base
$B$, obviously increasing in both $n$ and $B$. While a general signal might have
a bunch of haphazard counts[1], it may carry a lot of information about the
underlying event producing it.

Consider distant integer signals arriving at your detector. In general, there
would be nothing too specific about incoming random noise but this time,
you see a peculiar oddity. When you start getting a positive count[2], you
notice that the signal is non-decreasing until it suddenly stops. This is an
interesting observation, since it might hint at some coherent event out there.

Amongst all possible base $B$ signals of length $n$, what fraction is formed
from positive non-decreasing ones? (5)

You are now invested in these signals, so you decide to measure longer
intervals with the detector. You now come across a new kind: the detector
measures a patch with

1. A non-decreasing positive signal of a certain length.

2. A subsequent non-decreasing non-negative signal of some length
following a strict drop.

3. Neither of the two signals is null (i.e. of length 0).

[1] Here, a count means a digit in the signal. A sequence of counts forms a signal.
[2] Note that from a practical standpoint, a string of zeroes means no observation. This
is why we can ignore any preceding zeroes in a signal (and in some cases, trailing zeroes).

Definition 4. A positive non-decreasing signal in base $B$ is a positive
integer sequence $(a_0, a_1, a_2, \ldots)$, where $a_i \in \mathbb{N}$, $1 \le a_i < B$
for all $i$, and $a_i \le a_j$ whenever $i < j$.

A non-negative non-decreasing signal is an obvious extension.
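
The two definitions can be turned into direct membership checks. This is a small Python sketch for experimenting with examples; the function names are assumptions for illustration and do not come from the paper.

```python
def is_nondecreasing(signal):
    """True when a_i <= a_j for all i < j (comparing neighbours suffices)."""
    return all(a <= b for a, b in zip(signal, signal[1:]))

def is_positive_nondecreasing(signal, base):
    """Definition 4: every count lies in 1..base-1 and the counts never drop."""
    return all(1 <= a < base for a in signal) and is_nondecreasing(signal)

def is_nonnegative_nondecreasing(signal, base):
    """The extension: counts may also be 0."""
    return all(0 <= a < base for a in signal) and is_nondecreasing(signal)

print(is_positive_nondecreasing((1, 1, 4, 7), 10))   # True
print(is_positive_nondecreasing((0, 2, 3), 10))      # False: contains a 0
print(is_nonnegative_nondecreasing((0, 2, 3), 10))   # True
print(is_positive_nondecreasing((2, 5, 3), 10))      # False: drops from 5 to 3
```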

We call the right concatenation of a non-negative non-decreasing signal to a
positive non-decreasing signal (with a strict drop in count at the
concatenation) a uni-antimodal signal. The name comes from the fact that it has a
unique local minimum.

Figure 1: A general uni-antimodal signal in base 10. Note that it is positive
non-decreasing up to a point, whereupon it drops and another non-negative,
non-decreasing signal is produced.
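
A uni-antimodal signal is exactly a signal with one strict drop whose counts before the drop are all positive, which makes it easy to test by machine. Below is a hedged Python sketch; `is_uni_antimodal` and `count_uni_antimodal` are hypothetical helper names, and the brute-force count is meant only for checking small cases against a formula you derive.

```python
from itertools import product

def is_uni_antimodal(signal, base):
    """One strict drop; the part before the drop is positive; counts in 0..base-1."""
    if any(not (0 <= a < base) for a in signal):
        return False
    # Positions i where the signal strictly drops from a_i to a_{i+1}.
    drops = [i for i in range(len(signal) - 1) if signal[i] > signal[i + 1]]
    if len(drops) != 1:
        return False
    k = drops[0] + 1  # length of the first (positive non-decreasing) part
    return all(a >= 1 for a in signal[:k])

def count_uni_antimodal(base, length):
    """Brute-force count over all base**length signals (small inputs only)."""
    return sum(is_uni_antimodal(s, base) for s in product(range(base), repeat=length))

print(is_uni_antimodal((1, 3, 5, 2, 2, 7), 10))  # True
print(is_uni_antimodal((1, 2, 3), 10))           # False: no drop
print(count_uni_antimodal(2, 3))                 # 3: (1,0,0), (1,0,1), (1,1,0)
```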

How many uni-antimodal signals of length $n$ in base $B$ are there?
(Hint: Consider the first signal to be of length $k$, the latter to be of
length $n-k$, and sum the possibilities for all $1 \le k \le n-1$.) (40)

Also determine whether their growth is polynomial, exponential or
logarithmic as $n$ increases. (5)

These can be generalized to $m$-antimodal signals, where an initial positive
non-decreasing signal is followed by $m$ non-negative non-decreasing signals,
with a strict drop at $m$ points, each corresponding to an antimode (local
minimum).

(Bonus) How many m-antimodal base B signals of length n are there?

A matter of chance

While you marvel at these signals, there is always a lingering concern. Do
these really hint at something special, or is it just random noise? There are
definitely several ways to resolve this - but let’s move ahead in small steps.

Assume that the signal counts (including the count 0) are i.i.d. uniform
random variables, i.e. each count value has an equal probability of being chosen.

What probability distribution would you assign to the sample space of
base $B$ sequences of length $n$? (10)

(Hint: Try to find the probability of occurrence of an arbitrary signal.
This gives you the Probability Mass Function (PMF), which characterizes
the distribution.)

Indeed, assuming that every count is equally likely makes things much easier
than they seem to be. But in reality, this is hardly the case. It is akin to
looking at a dark patch of the sky and expecting something to show up. Given
that higher counts mean stronger signals, the chance of a count showing up
decreases as the count value increases.

Let us set a probability distribution for the signal counts. To do this, we will
implicitly assume that the true counts are in $[0, \infty)$ while the discretization
is only to aid inference. Consider the distribution $\chi^2_2$, whose pdf is given by

$$p_{\chi^2_2}(x) = \frac{1}{2} e^{-x/2}, \quad x \in [0, \infty)$$

Next we find a strictly increasing sequence $(x_0 = 0, x_1, x_2, \ldots, x_B = \infty)$,
with $x_i < \infty$ for $i < B$, such that

$$P(\text{count} = j) = p_j = \int_{x_j}^{x_{j+1}} p_{\chi^2_2}(x)\,dx$$

where $j$ can take values from $\{0, 1, 2, \ldots, B-1\}$.

These intervals are chosen a priori such that $P(\text{count} = j) = p_j >
P(\text{count} = i) = p_i$ whenever $j < i$ (condition A). This is to ensure that a
count of 0 is the most likely, a count of 1 less likely, and so on all the way up
to $B-1$, which has a very small probability of showing up.
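
Integrating the pdf gives the cumulative distribution function $F(x) = 1 - e^{-x/2}$, so each interval probability is a difference of two values of $F$. The Python sketch below uses this to test a candidate sequence against condition A; the function names and the two example sequences are assumptions chosen for illustration, not sequences proposed by the paper.

```python
import math

def interval_probs(breakpoints):
    """p_j = F(x_{j+1}) - F(x_j), with F(x) = 1 - exp(-x/2) the chi^2_2 CDF.

    `breakpoints` is the increasing sequence x_0 = 0, x_1, ..., x_B = math.inf.
    """
    def cdf(x):
        return 1.0 - math.exp(-x / 2.0)  # math.exp(-inf) is 0.0, so cdf(inf) = 1
    return [cdf(b) - cdf(a) for a, b in zip(breakpoints, breakpoints[1:])]

def satisfies_condition_a(breakpoints):
    """Condition A: the interval probabilities are strictly decreasing in j."""
    p = interval_probs(breakpoints)
    return all(p[j] > p[j + 1] for j in range(len(p) - 1))

# Two candidate sequences for base 3 (three intervals).
print(satisfies_condition_a([0, 1, 3, math.inf]))  # True
print(satisfies_condition_a([0, 1, 2, math.inf]))  # False: the last interval is too heavy
```

The second example shows why the choice is not automatic: the final interval extends to infinity, so it can collect more probability than the interval before it.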

Show that for condition A to hold, we would need the following to be
true for all $j \in \{0, 1, 2, \ldots, B-2\}$

$$\frac{x_{j+1} - x_j}{2} > \ln 2 - e^{-\frac{x_{j+2} - x_j}{2}}$$

(20)

Can you build such a strictly increasing sequence $(x_0 = 0, x_1, \ldots, x_{10} = \infty)$
for base 10 signals so that condition A is satisfied? (10)

An extreme case of this would be one where you have an infinite base $B$,
which means that the signal counts can take values in the set $\mathbb{N} \cup \{0\}$.
Given the nature of $\chi^2_2$, we can create the increasing sequence as an infinite
arithmetic progression without violating condition A (you can check that
explicitly or just see it intuitively). You will now see that despite all the
intervals having the same width, the probability of occurrence varies a lot for
higher counts.

For the sequence $(x_0 = 0, x_1, x_2, \ldots)$ being an arithmetic progression
with $B = \infty$, how does the ratio $p_j / p_{j+1}$ vary? Can you find an intuitive
explanation for it? (10)

Well, well, well. You have now successfully found a way to categorize signals
into a discrete set based on their strengths and frequencies. Now that you
have the discrete probability distribution $(p_0, p_1, p_2, \ldots, p_{B-1})$ for each
of the counts,

Can you assign a probability distribution for the sample space of all
signals of length $n$ in base $B$? Use the aforementioned discrete distribution
for the counts.

Is there a formal name for such distributions? (15)

You have successfully completed two steps to success. Step 1 was mostly
built around computing abundances of certain kinds of integer signals, while
Step 2 focussed more on the chance of detecting them. Now a final challenge
awaits you before we bid goodbye to each other this time.

No noise in here

As with most cases in reality, trends are seldom ideal. There are several
mathematical models that relate some quantity X with some other quantity
Y through a linear relationship. However when taking data, you see that not
all points perfectly agree with this definition no matter how carefully you
take data. There is some inherent noise attached to each macro-system - and
this is something we always want to minimize.

Signals come with noise as well. In information processing, people often
encode information in noisy channels so that it is nearly impossible to retrieve
the encoded information without knowing the key. In case of distant signals,
you would really want to separate the event from the background noise -
fortunately it is possible if the noise follows certain conditions. The literature
for denoising signals is immense; we will just take a look at the grassroots
framework.
Can you give an example of a scenario in Mathematics where random
noise is used in a model? (5)

So when you observe distant signals that frequently feature strong counts,
your interest might be drawn. Might they be the product of some
distant event that is of great importance? You wish to unveil it all, but you
lack the machinery - so you start one step at a time.

Consider the chi-square distribution with 2 degrees of freedom $\chi^2_2$ that we
defined earlier. For the sake of simplicity, we shall assume that the base $B$
is infinite and that the sequence of interval endpoints used for the count is
given by $(x_0 = 0, r, 2r, 3r, \ldots)$.

Find $p_0$, and $p_j$ in terms of $p_0$. (5)

Given some noise from $\chi^2_2$ with the discretization according to the above
sequence, find the value of the first moment, a.k.a. the expectation
$E(X)$, where $X \sim \text{discrete}(p_0, p_1, p_2, \ldots)$. Note that the expectation
of a discrete random variable $X$ is given by

$$E(X) = \sum_x x\, p(x)$$

where the sum is over all the values that the random variable can assume
and $p(x)$ is the value of the PMF $p$ at the outcome $x$. (15)
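
Before working out the sum by hand, it can help to estimate it numerically. The Python sketch below builds the discretized PMF on the partition $0, r, 2r, \ldots$ from the $\chi^2_2$ CDF $F(x) = 1 - e^{-x/2}$ and evaluates a truncated version of the expectation sum. The names `discretized_pmf` and `expectation`, the choice $r = 1$ and the cutoff of 200 terms are all assumptions for illustration.

```python
import math

def discretized_pmf(r, terms):
    """p_j for j = 0..terms-1 on the partition 0, r, 2r, ... of [0, inf).

    Uses F(x) = 1 - exp(-x/2); `terms` truncates the infinite base.
    """
    return [math.exp(-j * r / 2) - math.exp(-(j + 1) * r / 2) for j in range(terms)]

def expectation(pmf):
    """E(X) = sum_x x p(x) for outcomes x = 0, 1, 2, ... with probabilities pmf[x]."""
    return sum(x * p for x, p in enumerate(pmf))

pmf = discretized_pmf(r=1.0, terms=200)
print(sum(pmf))          # very close to 1: the truncated tail is negligible
print(expectation(pmf))  # numerical estimate of E(X) for r = 1
```

A numerical value of this kind is a check on a derived closed form, not a substitute for the derivation the problem asks for.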

Therefore, every detected signal might have some noise attached to it. How
much does this change the base signal? Let us try to take a more elaborate
look.
Definition 5. Consider an $n$ length signal in base $B = \infty$. Define it as
$u = (u_0, u_1, u_2, \ldots, u_{n-1})$.

Every count $u_i$ has an inherent noise $\epsilon_i$ attached to it, so define the
noise-added signal as

$$v = (u_0 + \epsilon_0, u_1 + \epsilon_1, u_2 + \epsilon_2, \ldots, u_{n-1} + \epsilon_{n-1}).$$
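
To get a feel for Definition 5, one can simulate a noise-added signal. The pdf $\frac{1}{2} e^{-x/2}$ is that of an exponential distribution with rate $1/2$, so a noise count can be drawn by sampling such a variable and recording which interval $[jr, (j+1)r)$ it falls in. The Python sketch below does this; the function names, the seed, the example signal and the value $r = 1$ are assumptions made for illustration.

```python
import random

def sample_noise(r, rng):
    """Draw one noise count: sample x from chi^2_2, return j with j*r <= x < (j+1)*r."""
    x = rng.expovariate(0.5)  # exponential with rate 1/2 (mean 2), i.e. chi^2_2
    return int(x // r)

def add_noise(u, r, rng):
    """Return the noise-added signal v = (u_0 + eps_0, ..., u_{n-1} + eps_{n-1})."""
    return [u_i + sample_noise(r, rng) for u_i in u]

rng = random.Random(2023)  # fixed seed so that a run is reproducible
u = [1, 2, 3, 5, 8]
v = add_noise(u, r=1.0, rng=rng)
print(u)
print(v)
```

Since the noise counts are never negative, every entry of `v` is at least the matching entry of `u`.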

We would now want to quantify how similar these two signals are, under the
consideration that the most similar two signals can get is when they both
are constant and equal to each other.

Can you construct some way(s) in which we can figure out the similarity
of two signals? (20)

The problem above is very important, because it will let you form a numerical
model for an idea and look for its pros and cons. We, on the other hand,
will consider something well established in the literature: cross-correlation.

Definition 6. For two discrete time functions $f, g$, their cross correlation
function is defined as

$$f \star g[n] \triangleq \sum_{m=-\infty}^{\infty} \overline{f(m)}\, g(m+n)$$

where $\overline{x}$ stands for the complex conjugate of $x$.

Since we are dealing with integer signals, conjugation won’t be necessary.
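
For finite integer signals the definition reduces to a short loop. The sketch below is one possible Python rendering; it assumes (this is not stated in the paper) that a finite signal is treated as zero outside its index range, and the name `cross_correlation` is hypothetical.

```python
def cross_correlation(f, g, lag):
    """Compute sum_m f[m] * g[m + lag], treating entries outside either list as 0.

    Conjugation is omitted because the signals are real (integer) valued.
    """
    total = 0
    for m, f_m in enumerate(f):
        k = m + lag
        if 0 <= k < len(g):
            total += f_m * g[k]
    return total

f = [1, 2, 3]
g = [4, 5, 6]
print(cross_correlation(f, g, 0))   # 32 = 1*4 + 2*5 + 3*6
print(cross_correlation(f, g, 1))   # 17 = 1*5 + 2*6
print(cross_correlation(f, g, -1))  # 23 = 2*4 + 3*5
```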

For simplicity, we will consider the rank-wise cross-correlation, i.e. $f \star g[0]$.
It shouldn’t be too hard to see that with the above mentioned definitions,

$$u \star v[0] = \sum_{m=0}^{n-1} u_m (u_m + \epsilon_m)$$

While all $u_i$ are fixed integers, the same cannot be said for the $\epsilon_i$, as they
are random integers drawn from the discrete distribution
$\text{discrete}(p_0, p_1, p_2, \ldots)$, where the $p_i$ are defined as the probability of
getting the count $i$, measured on the partition $(0, r, 2r, 3r, \ldots)$ of the support
of $\chi^2_2$. You might want to go back and review your previous results if you have
missed it.

Definition 7. For i.i.d random variables $X, Y$ (with distributions stationary
in time) with signal instances $(x_0, x_1, \ldots, x_{n-1})$ and $(y_0, y_1, \ldots, y_{n-1})$,
define the expected cross-correlation function as

$$E_{X \star Y}[0] = E\left(\sum_{m=0}^{n-1} x_m y_m\right) = \sum_{m=0}^{n-1} E(x_m y_m) = \sum_{m=0}^{n-1} E(x_m) E(y_m)$$

The i.i.d (independent and identically distributed) condition is crucial to
make the above simplification to the expression.

Using the properties of the Expectation function, and assuming that $u$ is
a deterministic variable and $\epsilon_i \sim \text{discrete}(p_0, p_1, p_2, \ldots)$ are random
variables, show that

$$E_{u \star v}[0] = \sum_{i=0}^{n-1} u_i^2 + \frac{1}{e^{r/2} - 1} \sum_{i=0}^{n-1} u_i$$

(10)

Hence compute the noise factor $N_{u,v} = E_{u \star v}[0] - u \star u[0]$, and mention a
way in which $N_{u,v}$ can be lowered. (5)
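
The stated expression for the expected cross-correlation can be sanity-checked by simulation before attempting the proof. The Python sketch below compares the formula with a Monte Carlo average over many noisy copies of a fixed signal. The function names, the example signal, $r = 1$, the seed and the number of trials are all assumptions for illustration, and agreement within sampling error is only evidence, not a proof.

```python
import math
import random

def expected_cross_correlation(u, r):
    """Expression stated in the text: sum u_i^2 + (sum u_i) / (e^{r/2} - 1)."""
    return sum(x * x for x in u) + sum(u) / (math.exp(r / 2) - 1)

def simulated_cross_correlation(u, r, trials, rng):
    """Average of the rank-wise cross-correlation of u with `trials` noisy copies."""
    total = 0
    for _ in range(trials):
        for u_m in u:
            eps = int(rng.expovariate(0.5) // r)  # discretized chi^2_2 noise count
            total += u_m * (u_m + eps)
    return total / trials

u = [1, 2, 3]
rng = random.Random(0)
print(expected_cross_correlation(u, r=1.0))
print(simulated_cross_correlation(u, r=1.0, trials=20000, rng=rng))
```

Repeating the run with a larger `r` gives a quick numerical impression of how the interval width influences the gap between the two signals' correlations.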

It must be astonishing to see that the way in which we determine what value
of count to assign given a particular strength of a signal in [0, ∞) contributes
significantly to the noise level. While you might be really excited to diminish
the noise factor by an appropriate transformation - hold your horses!

Recall what the intervals [0, r), [r, 2r), [2r, 3r), . . . stand for? Yes, they stand
for the probability that a random signal count has value 0, 1, 2, . . .. So if we
were to widen the interval [0, r) to our heart’s content, eventually we’ll only
get zeroes as our signal counts, and all hope of distinguishing strong signals
from weaker noise (due to less probability) would go out of the window!
Surely, that is not what we want...

To get to a middle ground, we need to set a threshold for really strong and
improbable signals, which should almost always hint at a dedicated event.
For our simple $B = \infty$ case, let this count be $b$. So any signal with a count
above $b$ would be classified extraordinary, i.e. it may not be produced by
random noise. To do that, we demand that (condition B)

$$\int_0^{br+1} p_{\chi^2_2}(x)\,dx = 0.95$$

Condition B will give you $r$ as a function of $b$. Use that to compute
$E_{u \star v}[0]$ again and show how the noise factor scales with $b$ as compared
to $r$ in the last case. (5)

Finally we are ready to tackle growing signals with this framework. These
signals start at low values, impersonating noise. But after a while they go to
counts larger than b and remain increasing, before they drop again to zero,
where random noise takes over. We have already seen that the fraction of
increasing signals of length $n$ is small: it tends to $1/n!$ as $B \to \infty$, and
to zero as $n$ grows. We will now try to see
how the noise factor varies as the sequence length increases. All assumptions
are carried over from before.

Suppose that the sequence count is given by a slowly, strictly growing
function $f(n)$, where $f(i) = u_i$. Suppose that as $i$ increases from 0
to $n$, the average value $\frac{1}{i} \sum_{j=0}^{i} f(j)$ increases linearly, following a
linear function $\phi(i)$. If $\exists k$ such that $\phi(k) > b$ but $\phi(k-1) < b$, then
compute the expression $N_{u,v}$ when the sequence length is $k$. For $b \gg 1, k$
show that the noise factor is already quadratic in $b$. If the growth of the
function is slower, with $k \approx b$, then show that the noise factor is cubic
in $b$. (50)

This relates how quickly the sequence $u = (u_i)_i$ grows with how strongly the
threshold value $b$ affects the noise factor. Computing the ratio $N_{u,v} / u \star u[0]$
will give a more robust idea about the optimum level at which a growth
function will cause minimization of noise. While we let you ponder on this
problem for a while, we shall quickly take our leave, but not before setting a
final problem:

(Bonus) Can you do a similar analysis for uni-antimodal sequences?
