IC152 Lab Assignment 6

This document provides instructions for Assignment 6 which involves analyzing correlation between variables, creating scatter plots, finding word frequencies in text, and writing the results to a CSV file. It includes 3 problems: 1) analyzing correlation between variable sets, creating scatter plots, and finding correlation coefficients; 2) finding word frequencies in a text file and writing the results to a CSV; and 3) an extra problem to apply the word frequency analysis to a text in another language. Students are instructed to submit a compressed folder of their Python files with a name indicating their student ID.

Uploaded by

Badal Gupta

IC152: Assignment 6

Correlation, Dictionary, and Writing CSVs

2 questions

A PLAGIARISM CHECK WILL BE DONE FROM THIS ASSIGNMENT. MAKE SURE YOUR CODE SUBMITTED ON LMS IS NOT COPIED FROM OTHER GROUPS.

You must keep filenames as mentioned in the assignment.

Important: If you copy the assignment or any of its parts from others, or
share it with others, our plagiarism software will catch it and you will be
awarded 0 marks for the whole assignment or an F grade for the course.

If you are solving this assignment in the A11’s PC Lab: keep Fn + F9
pressed while your machine starts (hold it down continuously; do not press
it repeatedly), and then select the second option, labelled “ubuntu”.
Please check that you are able to log in to Moodle; if not, shift to another
machine.

Problem 0: Command Line Arguments on Terminal and Writing csv.

- Open samplescript.py and try to understand its contents.
- In Python, you can ‘import sys’ and then use ‘fname1 = sys.argv[1]’
for the first input file name, ‘fname2 = sys.argv[2]’ for the second
input file name, and so on. You will need this in both problem 1 and
problem 2.
- The remaining part of the code writes a list of dictionaries to a csv
named ‘sampleScriptOutput.csv’, which you will need in problem
2. Follow the next 5 steps to run “samplescript.py” on the terminal
with command line arguments.

1. Open “Properties” for your assignment folder.

2. Copy the path shown under the attribute “Parent folder:”.

3. Click on “Activities” in top right of Ubuntu (Linux) and type “terminal”.
A terminal window should appear.

4. Write “cd ” (cd followed by a space) and then paste the copied path
using the “Shift + Ctrl + V” keys.

5. Now you can “cd Assignment6/” and then try “python3
samplescript.py problem1b_i_xy problem1b_ii_xy” and see what it
does (note: sampleScriptOutput.csv will be created in
Assignment6/).

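
The command line handling described in Problem 0 can be sketched as below. This is only an illustration and not the contents of samplescript.py; the helper names `get_input_files` and `usage_message` and the wording of the usage text are assumptions made for this example.

```python
import sys


def get_input_files(argv):
    """Return every argument after the script name (argv[0])."""
    return argv[1:]


def usage_message(script_name):
    """Build the usage text shown when no input file is given (hypothetical wording)."""
    return f"Usage: python3 {script_name} <input_file1> [<input_file2> ...]"


fnames = get_input_files(sys.argv)
if not fnames:
    # No input file was supplied, so remind the user how to run the script.
    print(usage_message(sys.argv[0]))
else:
    for index, fname in enumerate(fnames, start=1):
        print(f"Input file {index}: {fname}")
```

Slicing `sys.argv[1:]` rather than reading `sys.argv[1]`, `sys.argv[2]`, ... one by one lets the same script accept any number of input files.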
Problem 1: Correlation

Consider two vectors: X = [x1-μx, x2-μx, …, xn-μx] and Y = [y1-μy, y2-μy, …, yn-μy],
where μx is the mean of the xi’s and μy is the mean of the yi’s. The cosine
of the angle between these two vectors is given by the ratio of their dot
product to the product of their magnitudes, i.e. the ratio of X.Y to |X||Y|.

Interestingly, correlation is a statistical method that measures the
similarity of the variation between two random vectors. The correlation
coefficient r (a value between -1 and +1, similar to the cosine) between two
vectors can be calculated with the help of the given formula:

where n is the sample size, and xi and yi are the sample points with index i.
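
The formula itself is not reproduced in this copy of the assignment. Assuming it is the standard Pearson correlation coefficient, which is consistent with the definitions of n, xi and yi given here, one common way to write it is:

```latex
r = \frac{n\sum_{i=1}^{n} x_i y_i \;-\; \sum_{i=1}^{n} x_i \,\sum_{i=1}^{n} y_i}
         {\sqrt{n\sum_{i=1}^{n} x_i^{2} - \Bigl(\sum_{i=1}^{n} x_i\Bigr)^{2}}\;
          \sqrt{n\sum_{i=1}^{n} y_i^{2} - \Bigl(\sum_{i=1}^{n} y_i\Bigr)^{2}}}
```

If the formula handed out in the lab differs in form, use that one; the form above is an assumption.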
a. Prove that the ratio of X.Y to |X||Y| is equal to r given in the above
equation (X and Y are the vectors defined before). Show your proof to
the lab instructor/TAs for evaluation. 5 marks
b. When one variable increases as the other increases, the correlation
(r) is positive. If one decreases as the other increases, r is negative.
Complete absence of correlation is represented by r = 0. Draw scatter
plots of the variables (x and y) given in the following files: 15 marks
i. problem1b_i_xy
ii. problem1b_ii_xy
iii. problem1b_iii_xy
- Each file has two lines. The first line has the character ‘x’ followed
by the values of xi’s. Similarly the second line has the character ‘y’
followed by values of yi’s. You should ignore x and y written in the
file and use remaining values in each line for plotting.
- Although there are a fixed number of points in the above mentioned
files, the code should be generic to work for any number of points in
the text file.
- The code should also handle corner cases, e.g. when a file has
characters instead of numbers.
- Save the file as problem1b.py. It should take input files as three
arguments, exact usage:
####################################################
python3 problem1b.py problem1b_i_xy problem1b_ii_xy problem1b_iii_xy
####################################################
- In Python, you can ‘import sys’ and then use ‘file_name1 =
sys.argv[1]’ for the first input file name, ‘file_name2 = sys.argv[2]’
for the second input file name, and so on.
- Executing the above command (between #s) in the linux terminal
must save the images with the same names as input files but with
the .png extension. E.g., problem1b_i_xy.png, problem1b_ii_xy.png, and
problem1b_iii_xy.png for inputs mentioned in the terminal command
above.
- Your python file/code should work for any number of input files.
- The python code/file should prompt the user with the usage
instruction if the user forgets to provide any input file.
c. Write the code to find correlation between the different cases of
two variables (x and y) as given in part b and use the equation for r
(mentioned above). 10 marks
- The code file name must be problem1c.py.
- Executing problem1c.py with the following usage in the Linux terminal
should write the different values of r, separated by spaces, on one
line of a file.
####################################################
python3 problem1c.py problem1b_i_xy problem1b_ii_xy problem1b_iii_xy
####################################################
- The output file name must be Output1c.txt
- Although there are a fixed number of points in the given input
files, the code should be generic to work for any number of
points in the text files.
- The code should also handle corner cases, e.g. when a file has
characters instead of numbers.
- Your python file/code should work for any number of input
files.
- The python code/file should prompt the user with the usage
instruction if the user forgets to provide any input file.
- Analyze the numerical value and scatter plots of the variables
(i.e. if correlation is positive, y should increase with increase in
x). Tell your observations to the instructor/TAs.
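
The parsing and correlation steps of Problem 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not a reference solution: it assumes the values on each line are separated by whitespace (the assignment does not state the separator), it computes r as the cosine between the mean-centred vectors X and Y (which part a shows is equal to r), and the function names `parse_xy_lines` and `correlation` are made up for this example. Reading the lines from a file and writing Output1c.txt are left out.

```python
import math


def parse_xy_lines(lines):
    """Parse lines of the form 'x v1 v2 ...' and 'y v1 v2 ...'.

    Assumption: values are separated by whitespace. Raises ValueError when
    a value is not a number or when x and y have different lengths.
    """
    data = {}
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        label, values = tokens[0], tokens[1:]
        # float() raises ValueError for text such as 'abc'
        data[label] = [float(value) for value in values]
    if "x" not in data or "y" not in data:
        raise ValueError("expected one line starting with 'x' and one with 'y'")
    if len(data["x"]) != len(data["y"]):
        raise ValueError("x and y must have the same number of values")
    return data["x"], data["y"]


def correlation(xs, ys):
    """Return the cosine of the angle between the mean-centred vectors."""
    n = len(xs)
    if n == 0:
        raise ValueError("no data points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    X = [x - mean_x for x in xs]
    Y = [y - mean_y for y in ys]
    dot = sum(a * b for a, b in zip(X, Y))
    norm = math.sqrt(sum(a * a for a in X)) * math.sqrt(sum(b * b for b in Y))
    if norm == 0:
        raise ValueError("r is undefined when x or y is constant")
    return dot / norm


# Made-up sample data in the same two-line layout as the input files.
xs, ys = parse_xy_lines(["x 1 2 3 4", "y 2 4 6 8"])
print(correlation(xs, ys))
```

Because the lists are built from whatever values the line contains, the same code works for any number of points, and a `ValueError` gives the calling script one place to catch bad input and print a message.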

Problem 2: Dictionary and Writing a csv file

Language models are used to complete sentences and correct
recognized text in different AI applications. This question forms the basis
for language models, where not just the words in a language but also their
context and frequency are important. 20 marks
- Read the problem2Input file in your Python code and find the
frequency of each word in the file using a dictionary. This text is
from Stanford’s Large Movie Review Dataset v1.0 [1].
- Sort the keys (words) and values (frequencies) in the dictionary in
descending order of the values/frequencies.
- Write the words (in descending order of frequencies) in the first
column of the csv file. Write the corresponding frequencies in the
second column.
- You can use the following code to write a dictionary to a file:
#######################code starts here####################
import csv

# list-of-dictionaries format required by csv.DictWriter
myDict = [{'word': 'a', 'frequency': 1000},
          {'word': 'the', 'frequency': 700},
          {'word': 'me', 'frequency': 20}]

# code to write the above list of dictionaries to a csv
# (newline='' is what the csv module recommends when opening the file)
with open('problem2Output.csv', 'w', newline='') as csvop:
    # creating dictionary writer object
    writerObj = csv.DictWriter(csvop, fieldnames=['word', 'frequency'])

    # write fieldnames as the header row, then the data rows
    writerObj.writeheader()
    writerObj.writerows(myDict)
#######################code ends here####################

- You will need to convert your dictionary to a list of dictionaries as
required by the above code.
- The program should run for any type of input text file. If the file is
empty, the program should prompt the user to give a file with
text.
- 'problem2Output.csv' must be the name of the output file.
- The program should show the usage to the user if no input is given.
- The file name should be problem2.py, and it should take the input
file as an argument, similar to the previous question:
###################################################
python3 problem2.py problem2Input
###################################################
After the csv file is created, open it and look at the 10 most frequent
words. Would you have guessed them without solving this
problem? Share your observations with the instructor/TAs.

Extra Problem: Try your problem 2 code with the text from a Wikipedia article
in a language you know. Try to guess the top 5 words in the language
before you start coding, or before you open the csv file after coding (save
the csv as extraProblemOutput.csv).

Create a folder containing your Python files, named with your roll
number followed by “_assignment6” (don’t use inverted commas in the folder
name), compress the folder into a .zip file, and submit it on Moodle.

Make sure that you delete all your files from the lab PC/Laptop, and shut
it down before you leave.

References:
[1] Maas, Andrew, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew
Y. Ng, and Christopher Potts. "Learning Word Vectors for Sentiment
Analysis." In Proceedings of the 49th Annual Meeting of the Association for
Computational Linguistics: Human Language Technologies, pp. 142-150.
2011.
