IOAA 2024 Data Analysis
3 Hours
Instructions
1. The Data Analysis competition will be 3 hours in duration and is marked out of a total of 150 points.
2. There are Detailed Worksheets for carrying out detailed work / rough work. On each of the
Detailed Worksheets, please fill in:
   - Student Code
   - Question no.
   - Page no. and total number of pages.
3. Start each problem on a new page of the Detailed Worksheets. Please write only on the printed side
of the sheet. Do not use the reverse side. If you have written something on any sheet that you do not
want to be marked, cross it out.
4. Graph Paper is required for your solutions. On each Graph Paper sheet, please fill in:
   - Student Code
   - Question no.
   - Graph no. and total number of graph paper sheets used.
5. There is a summary Answer Sheet with your student ID code for your final answers.
6. Please remember that the graders may not understand your language. As far as possible, write your
solutions only using mathematical expressions and numbers. If it is necessary to explain something in
words, please use short phrases (if possible in English).
7. You are not allowed to leave your exam desk without permission. If you need any assistance
(malfunctioning calculator, need to visit a restroom, need more Detailed Worksheets, etc.), please put
up your hand to signal the invigilator.
8. The beginning and end of the competition will be indicated by a long sound signal. Additionally, there
will be a short sound signal fifteen minutes before the end of the competition (before the final long
sound signal).
9. At the end of the competition you must stop writing immediately. Sort and put your Summary
Answer Sheets, Graph Papers, and Detailed Worksheets in one stack. Put all other papers in another
stack. You are not allowed to take any sheet of paper out of the examination area.
10. Wait at your table until your envelope is collected. Once all envelopes are collected, your student
guide will escort you out of the competition room.
The following tables containing object positions and magnitudes from Stripe 82 were
downloaded for analysis. However, due to a file system corruption on the computer, the file
names were scrambled, and now you cannot tell which table belongs to which survey.
Tables 1 and 2 appear next to each other below, with an identification number for each
source, its equatorial coordinates, and its magnitude in the g-band (m_g) with its error (err m_g).
(a) (5 points) From these tables, which survey (SDSS or DES) is Table 1 and which is Table 2?
Assume that both surveys are equivalent regarding detector response, exposure times, and
site characteristics.
(b) (35 points) Using the data in the tables, plot the magnitude (m_g) on the x-axis (linear scale)
and the error in magnitude (err m_g) on the y-axis (logarithmic scale) using the semi-log paper
marked as Graph 1. Estimate the angular coefficient A (slope) and linear coefficient B (y-axis
intercept) for each dataset. There is no need to calculate the associated errors.
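The graphical estimate of A and B can be cross-checked numerically. The sketch below assumes the straight line on the semi-log paper has the form log10(err m_g) = A * m_g + B (a base-10 logarithm is an assumption suggested by the semi-log axis, not something the problem states). The function name `fit_semilog` and the sample values are hypothetical and are not taken from Table 1 or Table 2.

```python
import math


def fit_semilog(mags, errs):
    """Ordinary least-squares fit of log10(err) = A * mag + B.

    Returns (A, B). Assumes all errors are positive.
    """
    ys = [math.log10(e) for e in errs]
    n = len(mags)
    mean_x = sum(mags) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in mags)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(mags, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustrative values, NOT data from the exam tables.
example_mags = [18.0, 20.0, 22.0]
example_errs = [0.001, 0.01, 0.1]
A, B = fit_semilog(example_mags, example_errs)
print(f"A = {A:.3f}, B = {B:.3f}")
```

In the exam itself the coefficients are read off a hand-drawn line, so a numerical fit of this kind is only a consistency check.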
(c) (5 points) The signal-to-noise ratio (S/N) is approximately the inverse of the error in the
magnitude, S/N ≈ 1/(err m_g). Using the linear fit calculated in the previous part, what is
(d) (15 points) An object in Table 1 that is within 1 arcsecond of an object in Table 2 can be
considered to be the same object. By looking at the RA and Dec of the objects in both tables,
identify the objects in common and write down a new table with the matching IDs, ID_1 and
ID_2.
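The 1-arcsecond matching criterion can be sketched in code as follows. This is a minimal illustration that assumes RA and Dec are given in decimal degrees and uses the small-angle approximation with a cos(Dec) correction on the RA difference. The function names and the sample rows are hypothetical, and RA wrap-around at 0°/360° is not handled.

```python
import math

ARCSEC_PER_DEG = 3600.0


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Small-angle separation in arcseconds; all inputs in decimal degrees."""
    mean_dec = math.radians(0.5 * (dec1 + dec2))
    d_ra = (ra1 - ra2) * math.cos(mean_dec)  # RA difference shrinks with cos(Dec)
    d_dec = dec1 - dec2
    return math.hypot(d_ra, d_dec) * ARCSEC_PER_DEG


def cross_match(table1, table2, radius_arcsec=1.0):
    """Return (id1, id2) pairs whose separation is within radius_arcsec.

    Each table is a list of (id, ra_deg, dec_deg) tuples.
    """
    matches = []
    for id1, ra1, dec1 in table1:
        for id2, ra2, dec2 in table2:
            if separation_arcsec(ra1, dec1, ra2, dec2) <= radius_arcsec:
                matches.append((id1, id2))
    return matches


# Made-up rows for illustration only, NOT data from the exam tables.
sample1 = [(1, 10.0, 0.0), (2, 10.01, 0.5)]
sample2 = [(101, 10.0002, 0.0001), (102, 11.0, 0.5)]
pairs = cross_match(sample1, sample2)
print(pairs)
```

The brute-force double loop is adequate for tables of the size that fit on an exam sheet; larger catalogues would normally use a spatial index instead.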
(e) (15 points) Using the matched table from part (d), plot the g-band magnitude of each
survey against the other, Table 1 on the x-axis and Table 2 on the y-axis, using the millimetre (linear)
paper marked as Graph 2. Draw error bars for each point in both the horizontal and vertical
directions, using twice the value of err m_g (known as a 2σ uncertainty). From your graph, identify
the stars that would be suitable for photometric calibration between the two surveys and
write down their corresponding IDs from Table 1.
Table 1: ID_1 | RA | Dec | m_g | err m_g
Table 2: ID_2 | RA | Dec | m_g | err m_g
(a) (25 points) Calculate the distance (in parsecs) of each globular cluster from the Sun as well
as their Cartesian coordinates (x, y, z). The x-axis points to the Galactic Centre and the y-axis
points in the direction of galactic rotation. The system is right-handed.
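The coordinate convention described above can be illustrated with a short sketch. It assumes each cluster's position is available as galactic longitude l and latitude b in degrees, together with a distance in parsecs that has already been computed. These input columns are an assumption, since the data table is not reproduced here, and the function name is hypothetical. With x towards the Galactic Centre (l = 0°) and y towards galactic rotation (l = 90°), a right-handed system has z towards the north galactic pole.

```python
import math


def galactic_to_cartesian(l_deg, b_deg, distance_pc):
    """Convert galactic (l, b, distance) to heliocentric Cartesian (x, y, z).

    x points to the Galactic Centre, y in the direction of galactic
    rotation, and z completes the right-handed system.
    """
    l = math.radians(l_deg)
    b = math.radians(b_deg)
    x = distance_pc * math.cos(b) * math.cos(l)
    y = distance_pc * math.cos(b) * math.sin(l)
    z = distance_pc * math.sin(b)
    return x, y, z


# Made-up position for illustration only, NOT a cluster from the exam data.
x, y, z = galactic_to_cartesian(30.0, 45.0, 1000.0)
print(f"x = {x:.1f} pc, y = {y:.1f} pc, z = {z:.1f} pc")
```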
(b) (15 points) From the given data, estimate the distance from the Sun to the centre of the
distribution of globular clusters and its uncertainty.
(c) (30 points) To test the validity of Shapley's hypothesis that globular clusters are
symmetrically distributed around the Galactic Centre, make histograms with five bins (i.e. sort
the data and divide them into five equally-sized intervals) for each of the distributions in the
x, y, and z directions. Mark the value of the quartiles (Q_1, Q_2, Q_3) of the three distributions.
Hint: The three quartiles divide the sorted sample into four sections, each containing 25% of
the data, with the second and third sections representing the interquartile range.
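The binning and the quartiles can be sketched as below. This is an illustration under two assumptions: "five equally-sized intervals" is read as five bins of equal width between the minimum and maximum value, and the quartiles use linear interpolation between sorted data points (the "inclusive" method of Python's `statistics.quantiles`). Other quartile conventions give slightly different values for small samples. The function names and the sample list are hypothetical.

```python
import statistics


def quartiles(values):
    """Return (Q1, Q2, Q3) using linear interpolation on the sorted data."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3


def histogram_counts(values, n_bins=5):
    """Count values in n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; bins have zero width")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value falls into the last bin rather than a sixth one.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Made-up sample for illustration only, NOT cluster coordinates from the exam.
sample = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
q1, q2, q3 = quartiles(sample)
counts = histogram_counts(sample)
print(q1, q2, q3, counts)
```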
(d) (5 points) Using the quartiles, calculate the symmetry factor value for the three
distributions as given by:
Φ_x = |Q_{1,x} + Q_{3,x} − 2Q_{2,x}| / (Q_{3,x} − Q_{1,x}),
Φ_y = |Q_{1,y} + Q_{3,y} − 2Q_{2,y}| / (Q_{3,y} − Q_{1,y}),
Φ_z = |Q_{1,z} + Q_{3,z} − 2Q_{2,z}| / (Q_{3,z} − Q_{1,z})
Classify the three distributions in the x, y, and z directions based on their calculated
symmetry factor values, according to the table shown below. Hence, on the answer sheet,
write True (T) if the analysed sample follows Shapley’s hypothesis or False (F) otherwise.
0.0 ≤ Φ ≤ 0.1    symmetrical
0.1 < Φ ≤ 0.2    quasi-symmetrical
Φ > 0.2          asymmetrical
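The symmetry factor and the classification thresholds above translate directly into a few lines of code. The sketch below is illustrative only. The function names are hypothetical and the quartile values in the example are made up, not results from the exam data.

```python
def symmetry_factor(q1, q2, q3):
    """Phi = |Q1 + Q3 - 2*Q2| / (Q3 - Q1); requires Q3 > Q1."""
    return abs(q1 + q3 - 2.0 * q2) / (q3 - q1)


def classify(phi):
    """Map a symmetry factor to the classes given in the table."""
    if phi <= 0.1:
        return "symmetrical"
    if phi <= 0.2:
        return "quasi-symmetrical"
    return "asymmetrical"


# Made-up quartiles for illustration only.
phi_example = symmetry_factor(0.0, 4.3, 10.0)
print(f"Phi = {phi_example:.2f} -> {classify(phi_example)}")
```

A median exactly midway between Q_1 and Q_3 gives Φ = 0, and Φ grows as the median moves towards either quartile.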