
M2 MIAGE Lab Session

Exercise sheet: understanding the main principles behind SQL and MapReduce via Python programming

Searching via partitioning. Write a Python program that

• generates a collection I of 900,000 random integers.

• then asks the user for 5 integers and, for each one, answers YES if the
number is in I, NO otherwise.

Provide two possible implementations: one simply using a list for I and a linear search,
the other using a partitioned list for I, that is, a list of lists, each sub-list being a
partition. In this second case, you can use a Python dictionary, in which a couple (key,
value) represents one partition: the key is the label/index of the partition, while the value is
the list of values in the partition. How many partitions? How could you distribute the
900,000 numbers over the partitions? Could a hashing function based on the modulo
operator help? Recall that H(n) = n mod m, where n and m are positive integers, returns a
number between 0 and m-1 for any n.
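A minimal sketch of the partitioned approach is given below. The function names and the choice of m = 100 partitions are illustrative assumptions, not prescribed by the exercise; the point is that a lookup only needs to scan the single partition selected by H(n) = n mod m.

```python
import random

def build_partitions(values, m=100):
    """Distribute values into m partitions keyed by H(n) = n mod m."""
    parts = {i: [] for i in range(m)}
    for n in values:
        parts[n % m].append(n)
    return parts

def contains(parts, n, m=100):
    """Only the partition with key n mod m needs to be searched."""
    return n in parts[n % m]

# illustrative value range; the exercise only fixes the collection size
I = [random.randint(0, 1_000_000) for _ in range(900_000)]
parts = build_partitions(I)
```

On average each partition holds 900,000 / m values, so a membership test scans roughly 1/m of the data compared with a linear search over a flat list.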

Grouping and aggregation. Write a Python program that:

- generates a collection C of 30,000,000 couples (k, v), where k is a random integer
between 1 and 7 and v is a random integer between 1 and 1000.

- computes the sum of the values v for each k value; for instance, the couple (3, S) should
be produced, where S = Σ(3,v)∈C v

- first provide a naive solution, by means of a function groupAndSum(C), where C is
the collection of couples (k, v). Please do not use any form of dictionary (or nested
list). You can use a (flat) list to keep track of already processed key values.
Then estimate the cost, in terms of execution time, of groupAndSum(C); if n is the
number of couples, is the execution time linear in n, quadratic (n²), or
other?

- how could you improve execution time and memory consumption? Could sorting
the (k, v) couples on k help? If so, use a technique you know to sort C on k, and then
provide a Python function groupSortedAndSum(C), where C is now assumed to be
sorted on k. Then observe/estimate the cost of sorting + grouping and summing. Do
you notice any improvements?

- could partitioning based on the modulo operator help in lowering execution time? If
so, provide a Python program relying on modulo-based partitioning, re-using
groupAndSum(C) on each part. Does execution time improve? Do you believe that
with partitioning, a parallel approach in which the parts are processed in parallel would
be even better?
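A sketch of the sorted variant follows. Since C is sorted on k, all couples sharing a key are adjacent, so one linear pass suffices; a running sum is flushed whenever the key changes. The collection size is scaled down to 100,000 here purely to keep the demo quick; the exercise asks for 30,000,000.

```python
import random

def groupSortedAndSum(C):
    """Single linear pass over C, assumed sorted on k."""
    result = []
    cur_k, cur_sum = None, 0
    for k, v in C:
        if k != cur_k:
            if cur_k is not None:
                result.append((cur_k, cur_sum))  # flush previous group
            cur_k, cur_sum = k, 0
        cur_sum += v
    if cur_k is not None:
        result.append((cur_k, cur_sum))          # flush last group
    return result

C = sorted((random.randint(1, 7), random.randint(1, 1000))
           for _ in range(100_000))
totals = groupSortedAndSum(C)
```

The grouping pass is O(n); the overall cost is then dominated by the O(n log n) sort, which is still far better than a quadratic scan when the list of already-seen keys must be searched for every couple.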

n-way merge-sort. Design and implement in Python an algorithm for n-way merge-sort of
n lists of numbers. We assume that each of the n lists is sorted in ascending order.
The algorithm is encapsulated into a function, which we name n-way-ms, taking as input the
list of sorted lists.

For instance, if L = [ [2, 3.5, 3.7, 4.5, 6, 7], [2.6, 3.6, 6.6, 9], [3, 7, 10] ]

then n-way-ms(L) returns

[ 2, 2.6, 3, 3.5, 3.6, 3.7, 4.5, 6, 6.6, 7, 7, 9, 10]

Consider that you are not allowed to use any sort operation; the n-way-ms algorithm is
required to produce its output by means of a single linear scan over all the lists in the input list.

In coding the algorithm, it is worth using the pop(i) method for lists. For instance, l.pop(0)
returns the first value of the list and discards it from the list itself.

Question: can you estimate the cost of the algorithm? Do you believe more efficient
versions exist?

Remark: besides being interesting per se, n-way merge-sort is important to know
because this kind of data manipulation lies behind one of the main steps of shuffle-and-sort
in MapReduce-based systems.
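One possible sketch, following the pop(0) hint above (the name is written n_way_ms because hyphens are not valid in Python identifiers): at each step, pick the list whose current head is smallest, pop that head, and drop any list that becomes empty.

```python
def n_way_ms(L):
    """Merge n lists, each sorted ascending, into one sorted list."""
    lists = [l[:] for l in L if l]   # work on copies; skip empty inputs
    out = []
    while lists:
        # index of the list whose current head is smallest
        i = min(range(len(lists)), key=lambda j: lists[j][0])
        out.append(lists[i].pop(0))  # pop(0) as suggested above
        if not lists[i]:
            del lists[i]             # drop a list once exhausted
    return out

L = [[2, 3.5, 3.7, 4.5, 6, 7], [2.6, 3.6, 6.6, 9], [3, 7, 10]]
print(n_way_ms(L))  # [2, 2.6, 3, 3.5, 3.6, 3.7, 4.5, 6, 6.6, 7, 7, 9, 10]
```

With N total elements across n lists, each output element costs one scan of up to n heads (plus the linear shift inside pop(0)), so this version is O(N·n); a min-heap over the heads would bring the head selection down to O(log n) per element.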

Counting words. Write a Python function that takes a list T of strings and returns a list of
couples (w, c), where w is a word occurring in the strings of T and c is the number of
occurrences of w. For instance, if

T = ["aa bb cc", "bb aa gg", "aa gg"], then the list returned is [("aa", 3), ("bb", 2), ("cc", 1), ("gg", 2)]

In case T is a big collection of text lines, would techniques previously explored help for
improving execution time?
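A straightforward dictionary-based sketch is shown below (the function name count_words is an illustrative choice). Each line is split into words and a per-word counter is updated, mirroring the map (emit word) and reduce (sum counts) phases discussed above.

```python
def count_words(T):
    """Return a list of (word, count) couples for the strings in T."""
    counts = {}
    for line in T:
        for w in line.split():
            counts[w] = counts.get(w, 0) + 1  # increment this word's count
    return list(counts.items())

T = ["aa bb cc", "bb aa gg", "aa gg"]
result = count_words(T)
```

For a very large T, the modulo-based partitioning idea from the grouping exercise applies here as well: words can be hashed into parts that are counted independently, possibly in parallel.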