Challenge Remote

remote challenge

Uploaded by

Anas Jamshed

Challenge

Welcome to our ions.bio Data Challenge! All tasks should be analyzed using Python,
which is our language of choice. In total, you should not spend more than 3 hours on
the exercise, although a complete solution may take much longer. There is no right or
wrong answer. For us, it is only relevant to see how you approach such a task, how
creative you may get during the data analysis, and mainly how you communicate the
results. There is no need to polish the code or results. Most important is that you have
fun and enjoy solving the problems!

If you feel that you cannot answer a task due to specific domain knowledge limitations,
feel free to skip this task or send an email to [email protected] for additional information.

Good luck, and we look forward to seeing your innovative solutions and insights!

Tasks:
Task 1: Truck company
Task 2: Mass spectrometer control

Task 1: Truck company
Mega Truck is a logistics company specializing in shipping services, operating with a
fleet of 10 trucks that regularly travel to designated destinations. The roundtrip
distances for these routes are detailed in routes.csv. Over time, you need to gather
data on the routes each truck took. Unfortunately, all records of the trucks' trips were
lost. However, we do know that each truck’s trip consisted of completing 8 routes.

Each truck is fitted with a tracking device that logs the mileage every time the engine
is turned off, creating a new entry. This means that a log entry is recorded each time
a truck returns to the home base. Additional entries may also be logged, such as
during cargo unloading, refueling or bio-breaks.

You can find the mileage logs in logs.csv.

There are receipts that correspond to one of the trucks. These receipts allowed us to
determine that truck's destinations, as recorded in sample_trip.csv.

1) Can you identify the truck for which we have the recorded routes in the
sample_trip.csv?
2) Can you reconstruct the trips for each of the trucks?
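
One possible starting point for both questions, sketched under assumptions: because a log entry is written every time a truck returns to the home base, the odometer reading after each completed route should appear somewhere in that truck's log. The running total of the roundtrip distances, added to the starting mileage, can therefore be checked against the logged values. The column names of routes.csv, logs.csv and sample_trip.csv are not given here, so the sketch uses plain lists with made-up numbers; the helper `fits_log` and the tolerance `tol` are hypothetical.

```python
from bisect import bisect_left

def fits_log(log_mileage, route_distances, tol=0.5):
    """Return True if every home-base return implied by the routes is logged.

    log_mileage: sorted odometer readings of one truck (first entry = start).
    route_distances: roundtrip distances in the order they were driven.
    tol: allowed absolute mismatch (an assumed value, tune it to the data).
    """
    expected = log_mileage[0]
    for distance in route_distances:
        expected += distance
        # Look at the logged readings just below and just above the expected value.
        i = bisect_left(log_mileage, expected)
        candidates = log_mileage[max(i - 1, 0):i + 1]
        if not any(abs(reading - expected) <= tol for reading in candidates):
            return False
    return True

# Made-up example data, not taken from logs.csv or sample_trip.csv.
truck_a = [1000.0, 1040.0, 1120.0, 1150.0, 1270.0]  # includes extra stops
truck_b = [5000.0, 5090.0, 5200.0, 5260.0]
sample_routes = [120.0, 150.0]

matches = {
    name: fits_log(log, sample_routes)
    for name, log in {"truck_a": truck_a, "truck_b": truck_b}.items()
}
print(matches)
```

For the second question, the same check could serve as a pruning rule in a search over route combinations: a partial trip is extended only while its next running total still matches a log entry. Whether that search is fast enough depends on the number of routes and log entries.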

Task 2: Mass spectrometer control
In the past, we analyzed a blood sample with our mass spectrometer and found 10,000
peptides. For each peptide, we acquired its charge, its mass, its fragments, and its
time. In the file library.csv, every row corresponds to a peptide that we measured.
The total measurement time was 120 minutes. After applying some special algorithms,
we could identify the peptides and marked some as important peptides that are
relevant for a certain disease.

Next, we want to measure another blood sample. This time we want to apply special
instrument settings so that whenever we measure one of the important peptides, we
adjust our instrument so that we can measure with maximum performance.

For the sake of simulation, we added the file measurement.csv – which contains all
data points that will be measured during this measurement run. While measuring, we
will get one datapoint after another over the time course of 120 minutes (note that the
rows are sorted by time). Below is some Python code that would simulate the
instrument streaming the data. Also note that the values of our recent measurements
are not identical to the expected values due to measurement inaccuracy and noise.
Particularly for the time, we expect some drift over the measurement. Typically there
is a fixed offset between measurements as well as a variable shift over time.
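
The drift description above can be read as a simple model: observed time = library time + offset + slope × library time. That the variable shift is linear is an assumption, and the real drift may be more complex. The sketch below fits this model by ordinary least squares to matched (library, observed) time pairs. The data is synthetic, and `fit_linear_drift` and `expected_time` are hypothetical helper names.

```python
def fit_linear_drift(library_times, observed_times):
    """Least-squares fit of drift = offset + slope * library_time.

    The drift of a matched pair is observed_time - library_time.
    Returns (offset, slope).
    """
    drifts = [obs - lib for lib, obs in zip(library_times, observed_times)]
    n = len(library_times)
    mean_t = sum(library_times) / n
    mean_d = sum(drifts) / n
    var_t = sum((t - mean_t) ** 2 for t in library_times)
    cov_td = sum((t - mean_t) * (d - mean_d)
                 for t, d in zip(library_times, drifts))
    slope = cov_td / var_t
    offset = mean_d - slope * mean_t
    return offset, slope

def expected_time(library_time, offset, slope):
    """Predict where a library peptide should appear in the new run."""
    return library_time + offset + slope * library_time

# Synthetic pairs: 0.5 min fixed offset plus a 1% shift, no noise.
lib = [10.0, 30.0, 60.0, 90.0, 110.0]
obs = [t + 0.5 + 0.01 * t for t in lib]

offset, slope = fit_linear_drift(lib, obs)
print(round(offset, 3), round(slope, 3))
```

Plotting the per-pair drift (observed minus library time) against the library time, for example as a scatter plot, is one way to show the drift. During a live run, only pairs from measurements already seen are available, so such a fit would have to be refreshed as the acquisition progresses.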

Our goal is now to incorporate some logic (e.g., the function is_important()) that checks
if a measurement is an important one from the library so that we can trigger the do()
function and apply our special instrument settings.

import pandas as pd

library = pd.read_csv('library.csv')
sample = pd.read_csv('measurement.csv')

def instrument_gen():
    # Simulate the instrument: yield one measurement (row) at a time.
    for i in range(len(sample)):
        yield sample.iloc[i]

instrument = instrument_gen()

def do():
    # Placeholder: apply the special instrument settings.
    pass

def is_important(measurement):
    # Placeholder: decide whether the measurement is an important peptide.
    pass

while True:
    try:
        measurement = next(instrument)
        # if the measurement is important do stuff
        if is_important(measurement):
            do()
    except StopIteration:
        break
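
A minimal sketch of what is_important() could look like, assuming a measurement matches a library peptide when the charge is equal, the mass agrees within a relative tolerance, and the time falls inside a window around the drift-corrected library time. The field names (`charge`, `mass`, `time`, `important`), the tolerances and the drift parameters are all assumptions for illustration; the real columns in library.csv and measurement.csv may differ, and the fragments are ignored here.

```python
# Hypothetical library rows; the real column names in library.csv may differ.
LIBRARY = [
    {"charge": 2, "mass": 800.40, "time": 12.0, "important": True},
    {"charge": 2, "mass": 800.90, "time": 55.0, "important": False},
    {"charge": 3, "mass": 1200.60, "time": 80.0, "important": True},
]

MASS_TOL_PPM = 20.0   # assumed mass tolerance in parts per million
TIME_TOL_MIN = 1.0    # assumed time window in minutes
TIME_OFFSET = 0.5     # assumed drift correction: a fixed offset ...
TIME_SLOPE = 0.01     # ... plus a shift that grows linearly with time

# Only important entries need to be searched while the instrument streams.
IMPORTANT = [row for row in LIBRARY if row["important"]]

def is_important(measurement):
    """Return True if the measurement matches an important library peptide."""
    for row in IMPORTANT:
        if measurement["charge"] != row["charge"]:
            continue
        mass_error_ppm = abs(measurement["mass"] - row["mass"]) / row["mass"] * 1e6
        if mass_error_ppm > MASS_TOL_PPM:
            continue
        expected = row["time"] + TIME_OFFSET + TIME_SLOPE * row["time"]
        if abs(measurement["time"] - expected) <= TIME_TOL_MIN:
            return True
    return False

# Made-up stream of measurements, standing in for measurement.csv.
stream = [
    {"charge": 2, "mass": 800.41, "time": 12.7},   # close to an important peptide
    {"charge": 2, "mass": 800.90, "time": 55.9},   # matches a non-important one
    {"charge": 3, "mass": 1200.60, "time": 40.0},  # right mass, wrong time
]
flags = [is_important(m) for m in stream]
print(flags)
```

Because the function only reads fields by name, it would also accept the pandas rows yielded by instrument_gen(), provided the column names are adjusted. A linear scan over all important peptides is the simplest choice; for a larger library, sorting the important entries by time and searching only the current time window would likely be faster.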

1) Can you come up with a solution that is capable of detecting the important
measurements during the acquisition?
2) The time drift will be potentially key for looking up the measurements in the
library. Can you visualize the observed drift in time?
