lab1-dataAlgorithms copy
• then asks the user for 5 integer numbers and, for each number, answers YES if the
number is in I, NO otherwise
Provide two possible implementations: one simply using a list for I and a linear search,
the other using a partitioned list for I, that is, a list of lists, each sub-list being a
partition. In this second case, you can use a Python dictionary in which each couple (key,
value) represents one partition: the key is the label/index of the partition, while the value is
the list of values in the partition. How many partitions should you use? How could you
distribute the 900,000 numbers over the partitions? Could a hashing function based on the
modulo operator help? Recall that H(n) = n mod m, where n and m are positive integers,
returns a number between 0 and m-1 for any n.
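The two lookup strategies can be sketched as follows. This is only one possible reading of the exercise: the function names, the value range of the random numbers, and the choice m = 1000 are assumptions made for illustration, and the interactive part (asking the user for 5 numbers) is left out.

```python
import random

def build_partitions(values, m):
    """Distribute values over m partitions using H(n) = n mod m.

    Returns a dictionary: key = partition label (0..m-1),
    value = list of the numbers that hash to that label.
    """
    partitions = {label: [] for label in range(m)}
    for n in values:
        partitions[n % m].append(n)
    return partitions

def linear_contains(values, x):
    """Plain linear search: scans the list until x is found."""
    for n in values:
        if n == x:
            return True
    return False

def partitioned_contains(partitions, m, x):
    """Search only the partition that x would have been placed in."""
    return linear_contains(partitions[x % m], x)

if __name__ == "__main__":
    random.seed(0)
    # Assumption: 900,000 random positive integers; the range is arbitrary.
    I = [random.randint(1, 10_000_000) for _ in range(900_000)]
    m = 1000  # assumption: with m = 1000, each partition holds about 900 values
    parts = build_partitions(I, m)
    for x in (I[0], I[-1], 0):
        answer = "YES" if partitioned_contains(parts, m, x) else "NO"
        print(x, answer)
```

With the plain list, a lookup may scan all 900,000 values; with m partitions it scans only about 900,000 / m values, provided the numbers spread evenly over the residues modulo m.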
- compute the sum of values v for each k value; for instance, the couple (3, S) should
be returned, where $S = \sum_{(3,v) \in C} v$
- how could you improve execution time and memory consumption? could sorting
(k,v) couples on k help? If so, use a technique you know to sort C on k, and then
provide a Python function groupSortedAndSum(C) where C is now assumed to be
sorted on k. Then observe/estimate the cost of sorting + grouping and summing. Do
you notice any improvements?
- could partitioning based on the modulo operator help in lowering execution time? If
so, provide a Python program relying on modulo-based partitioning, by re-using
groupAndSum(C) in each part. Does execution time improve? Do you believe that
with partitioning, a parallel approach in which parts are processed in parallel would
be even better?
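The three variants discussed in the list above can be sketched as follows. The body of groupAndSum is an assumption (a dictionary-based accumulation), since its definition is not given here; partitionedGroupAndSum and the parameter m are hypothetical names, and the keys k are assumed to be integers so that k mod m is defined.

```python
def groupAndSum(C):
    """Assumed baseline: accumulate one running sum per key in a dictionary."""
    sums = {}
    for k, v in C:
        sums[k] = sums.get(k, 0) + v
    return list(sums.items())

def groupSortedAndSum(C):
    """C is assumed to be sorted on k.

    Equal keys are adjacent, so a single pass suffices and only the
    current key and its running sum are kept besides the result.
    """
    result = []
    have_key = False
    current_key, running = None, 0
    for k, v in C:
        if have_key and k == current_key:
            running += v
        else:
            if have_key:
                result.append((current_key, running))
            current_key, running, have_key = k, v, True
    if have_key:
        result.append((current_key, running))
    return result

def partitionedGroupAndSum(C, m):
    """Split C into m parts with k mod m, then reuse groupAndSum on each part.

    A given key always lands in the same part, so the per-part results
    can simply be concatenated.
    """
    parts = {label: [] for label in range(m)}
    for k, v in C:
        parts[k % m].append((k, v))  # assumes integer keys
    result = []
    for part in parts.values():
        result.extend(groupAndSum(part))
    return result

if __name__ == "__main__":
    C = [(3, 1), (1, 10), (3, 4), (2, 5), (1, 2)]
    print(groupAndSum(C))
    print(groupSortedAndSum(sorted(C, key=lambda kv: kv[0])))
    print(partitionedGroupAndSum(C, 2))
```

Because each part contains a disjoint set of keys, the parts are independent of one another, which is what would make a parallel version possible.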
n-way merge-sort. Design and implement in Python an algorithm for n-way merge-sort of
n lists of integers. We assume that each of the n lists is sorted in ascending order.
M2 MIAGE Lab Session
The algorithm is encapsulated in a function, which we name n-way-ms, taking as input the
list of sorted integer lists.
For instance, if L = [ [2, 3.5, 3.7, 4.5, 6, 7], [2.6, 3.6, 6.6, 9], [3, 7, 10] ], then the
expected result is [2, 2.6, 3, 3.5, 3.6, 3.7, 4.5, 6, 6.6, 7, 7, 9, 10].
You are not allowed to use any sort operation: the n-way-ms algorithm is
required to produce its output by means of a linear scan of all the lists in the input list.
When coding the algorithm, it is worth using the pop(i) method of lists. For instance, l.pop(0)
returns the first value of the list and removes it from the list itself.
Question: can you estimate the cost of the algorithm? Do you believe more efficient
versions exist?
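A minimal sketch of the merge, under a few assumptions: the function is called n_way_ms here because n-way-ms is not a valid Python identifier, the input lists are copied so that pop(0) does not empty the caller's data, and ties are resolved in favour of the earliest list.

```python
def n_way_ms(lists):
    """Merge n lists, each sorted in ascending order, without any sort call."""
    # Copy the non-empty lists so the caller's data is left untouched.
    remaining = [list(l) for l in lists if l]
    merged = []
    while remaining:
        # Find the list whose first element is the smallest.
        best = 0
        for i in range(1, len(remaining)):
            if remaining[i][0] < remaining[best][0]:
                best = i
        # Move that element to the output.
        merged.append(remaining[best].pop(0))
        # Drop the list once it is exhausted.
        if not remaining[best]:
            remaining.pop(best)
    return merged

if __name__ == "__main__":
    L = [[2, 3.5, 3.7, 4.5, 6, 7], [2.6, 3.6, 6.6, 9], [3, 7, 10]]
    print(n_way_ms(L))
    # [2, 2.6, 3, 3.5, 3.6, 3.7, 4.5, 6, 6.6, 7, 7, 9, 10]
```

If N is the total number of elements, this version compares up to n list heads for each output element, so it performs on the order of N·n comparisons; in addition, pop(0) on a Python list shifts the remaining elements, which adds its own cost. Keeping the heads in a min-heap would reduce the comparisons to roughly N·log n; the standard library's heapq.merge is built on that idea.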
Remark: besides being interesting per se, n-way merge-sort is important to know
because this kind of data manipulation is behind one of the main steps of shuffle-and-sort
in MapReduce-based systems.
Counting words. Write a Python function that takes a list T of strings and returns a list of
couples (w, c), where w is a word occurring in the strings of T and c is the number of
occurrences of w. For instance, if
T=["aa bb cc", "bb aa gg", "aa gg"] then the list returned is [("aa",3), ("bb",2), ("cc",1), ("gg",2)]
In case T is a big collection of text lines, would the techniques previously explored help
improve execution time?
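One possible sketch of the word count, assuming that words are separated by whitespace and that the function name count_words is free to choose:

```python
def count_words(T):
    """Return a list of (word, count) couples for the strings in T."""
    counts = {}
    for line in T:
        # Assumption: words are whitespace-separated tokens.
        for w in line.split():
            counts[w] = counts.get(w, 0) + 1
    return list(counts.items())

if __name__ == "__main__":
    T = ["aa bb cc", "bb aa gg", "aa gg"]
    print(count_words(T))
    # [('aa', 3), ('bb', 2), ('cc', 1), ('gg', 2)]
```

This is the same group-and-sum pattern as before, with each word acting as the key k and each occurrence contributing the value 1, so partitioning the words (for example on a hash of the word) would apply here as well.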