
Idsup A1


SIKSHA ‘O’ ANUSANDHAN

DEEMED TO BE UNIVERSITY

Admission Batch: 2021 Session: 2023-24

Laboratory Record

Introduction to Data Science using Python (CSE 3054)

Submitted by

Name: Priyanshu Rout

Registration No.: 2141018124

Branch: Computer Science and Engineering

Semester: 6th Section: 22

Department of Computer Science & Engineering


Faculty of Engineering & Technology (ITER)
Jagamohan Nagar, Jagamara, Bhubaneswar, Odisha - 751030
INDEX

Sl. No.   Name of Program        Page No.   Remarks

01        Minor Assignment - 1   1-15



1. An anonymous dataset containing each user’s salary (in dollars) and tenure as a
data scientist (in years) is given.
salaries_and_tenures = [(83000, 8.7), (88000, 8.1), (48000, 0.7),
(76000, 6), (69000, 6.5), (76000, 7.5), (60000, 2.5), (83000, 10),
(48000, 1.9), (63000, 4.2)]
Find the average salary for each tenure. Then assign each tenure to a bucket with the
message “less than two”, “between two and five” or “more than five”, group together
the salaries corresponding to each bucket, and compute the average salary for each
group.

Program :-

from collections import defaultdict

salaries_and_tenures = [(83000, 8.7), (88000, 8.1),
                        (48000, 0.7), (76000, 6),
                        (69000, 6.5), (76000, 7.5),
                        (60000, 2.5), (83000, 10),
                        (48000, 1.9), (63000, 4.2)]

# Group salaries by exact tenure, then average each group
salary_per_tenure = defaultdict(list)
for salary, tenure in salaries_and_tenures:
    salary_per_tenure[tenure].append(salary)

average_salary_by_tenure = {
    tenure: sum(salaries) / len(salaries)
    for tenure, salaries in salary_per_tenure.items()
}
print(average_salary_by_tenure)

def tenure_bucket(tenure):
    # Map a tenure in years to the message naming its bucket
    if tenure < 2:
        return "less than two"
    elif tenure < 5:
        return "between two and five"
    else:
        return "more than five"

# Group salaries by bucket, then average each bucket
salary_by_tenure_bucket = defaultdict(list)
for salary, tenure in salaries_and_tenures:
    bucket = tenure_bucket(tenure)
    salary_by_tenure_bucket[bucket].append(salary)

average_salary_by_bucket = {
    bucket: sum(salaries) / len(salaries)
    for bucket, salaries in salary_by_tenure_bucket.items()
}

print(average_salary_by_bucket)


OUTPUT: -
{8.7: 83000.0, 8.1: 88000.0, 0.7: 48000.0, 6: 76000.0, 6.5: 69000.0,
7.5: 76000.0, 2.5: 60000.0, 10: 83000.0, 1.9: 48000.0, 4.2: 63000.0}
{'more than five': 79166.66666666667, 'less than two': 48000.0,
'between two and five': 61500.0}
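
The exercise also asks for a message for each bucket. The following is a minimal sketch, assuming the same data and bucket boundaries as the program above; the helper name describe_buckets and the wording of the messages are illustrative and not part of the original record.

```python
from collections import defaultdict

salaries_and_tenures = [(83000, 8.7), (88000, 8.1), (48000, 0.7),
                        (76000, 6), (69000, 6.5), (76000, 7.5),
                        (60000, 2.5), (83000, 10), (48000, 1.9),
                        (63000, 4.2)]

def tenure_bucket(tenure):
    # Same boundaries as the recorded program
    if tenure < 2:
        return "less than two"
    elif tenure < 5:
        return "between two and five"
    return "more than five"

def describe_buckets(pairs):
    """Return one message per bucket with its average salary (hypothetical helper)."""
    grouped = defaultdict(list)
    for salary, tenure in pairs:
        grouped[tenure_bucket(tenure)].append(salary)
    return {
        bucket: f"Tenure {bucket} years: average salary {sum(s) / len(s):.2f}"
        for bucket, s in grouped.items()
    }

messages = describe_buckets(salaries_and_tenures)
for message in messages.values():
    print(message)
```

Rounding to two decimals is a presentation choice made here; the recorded output keeps full floating-point precision.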

2. For the above data there seems to be a correspondence between years of experience
and paid accounts.
Users with very few and very many years of experience tend to pay; users with
average amounts of experience don’t. Find out the condition for this correspondence
and print it.

Program :-

salaries_and_tenures = [(83000, 8.7), (88000, 8.1),
                        (48000, 0.7), (76000, 6),
                        (69000, 6.5), (76000, 7.5),
                        (60000, 2.5), (83000, 10),
                        (48000, 1.9), (63000, 4.2)]

def predict_paid_or_unpaid(years_experience):
    # Users with very few or very many years of experience tend to pay
    if years_experience < 3.0:
        return "paid"
    elif years_experience < 8.5:
        return "unpaid"
    else:
        return "paid"

# Map each tenure to its predicted account type
res = {tenure: predict_paid_or_unpaid(tenure)
       for salary, tenure in salaries_and_tenures}

print(res)


OUTPUT: -
{8.7: 'paid', 8.1: 'unpaid', 0.7: 'paid', 6: 'unpaid', 6.5: 'unpaid',
7.5: 'unpaid', 2.5: 'paid', 10: 'paid', 1.9: 'paid', 4.2: 'unpaid'}

3. Write a Python script to generate random passwords (alphanumeric). Ask users to
enter the length of password and number of passwords they want to generate and then
save all the generated passwords as a text file named “MyPasswords.txt”.

Program :-

import random
import string

# Alphanumeric alphabet: each upper-case letter, lower-case letter and digit appears once
characters = string.ascii_letters + string.digits

number = int(input('Number of passwords - '))
length = int(input('password length? - '))

# Open the file once in append mode and write one password per line
with open('MyPasswords.txt', 'a') as file:
    for i in range(number):
        password = ''.join(random.choice(characters) for j in range(length))
        print(password)
        file.write(password + '\n')


OUTPUT: -
Number of passwords - 4
password length? - 8
nDOL17rB
QU5XWHKq
dWDKDAj2
vftmoigI
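
The random module is not intended for security-sensitive values, so a variant built on the standard-library secrets module may be preferable for real passwords. This is a sketch of that swapped-in technique, not the method used in the record; the function names generate_passwords and save_passwords are hypothetical.

```python
import secrets
import string

# Letters and digits, each appearing once
ALPHABET = string.ascii_letters + string.digits

def generate_passwords(count, length):
    """Return `count` random alphanumeric passwords of `length` characters each."""
    return ["".join(secrets.choice(ALPHABET) for _ in range(length))
            for _ in range(count)]

def save_passwords(passwords, path="MyPasswords.txt"):
    """Write one password per line, replacing any existing file at `path`."""
    with open(path, "w") as f:
        f.write("\n".join(passwords) + "\n")

passwords = generate_passwords(4, 8)
for password in passwords:
    print(password)
```

Calling save_passwords(passwords) would then store them in MyPasswords.txt; unlike the recorded program, this sketch overwrites the file instead of appending to it.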

4. Given a file named “MyText.txt” containing several lines/paragraphs, find all unique
characters (ignore space, comma, full stop, brackets, quotes, etc.) present in the file.
Capital and small letters are counted as the same. Find the frequency (fi) of all characters in
the file and print the output as follows. The character “a” is present times in the
document. The character “t” is present times in the document.

Program :-

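# Read every line of the file into a list
f = open("MyText.txt", "r")
lines = f.readlines()
f.close()

# str(lines) is the printed form of the list, so its brackets, quotes and
# escape sequences are also counted among the unique characters below
string = str(lines).lower()
uniquechar = ''.join(set(string))
print(len(uniquechar))

# Print the count of each alphabetic character
for i in range(len(uniquechar)):
    if uniquechar[i].isalpha():
        print(uniquechar[i], '=', end='')
        print(string.count(uniquechar[i]))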

OUTPUT: -
11
h =1
w =1
o =2
l =3
e =1
r =1
d =1
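
The recorded program counts characters in the printed form of the line list, which can include brackets, quotes and escape sequences. The following is a minimal sketch of an alternative that counts only letters and digits in the raw text and prints the sentence format the exercise asks for. The function names and the sample text are assumptions made for illustration; for the real task the contents of MyText.txt would be passed in.

```python
from collections import Counter

def char_frequencies(text):
    """Count letters and digits in `text`, treating upper and lower case as the same."""
    return Counter(ch for ch in text.lower() if ch.isalnum())

def report(text):
    """Return one sentence per character, sorted by character."""
    freq = char_frequencies(text)
    return [f'The character "{ch}" is present {n} times in the document.'
            for ch, n in sorted(freq.items())]

# Sample text used only for illustration
sample = "Hello, World!"
for line in report(sample):
    print(line)
```

With a file, the call would look like report(open("MyText.txt").read()), assuming the file exists in the working directory.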


5. Use the above program as a function and use it to write another function to compare
the contents of two files “MyText1.txt” and “MyText2.txt”.
a. The output must give the following information.
File MyText1 contains more (or less or equal) characters than MyText2.
b. The output must be printed in the following format depending on the content of the files.
File MyText1 contains more (or less or equal) unique characters than MyText2.
c. The frequency of each character must be summarized.
The frequency of character “x” in file MyText1 is more than (or less than or equal to)
that in MyText2.
d. The relative frequency of each character must also be summarized.
The relative frequency of character “x” in file MyText1 is more than (or less than or
equal to) that in MyText2.
The input files should be nonempty.

Program :-

import re


def readFile(filename):
    """Return a dict mapping each lowercase letter or digit in the file to its count."""
    my_dict = {}
    with open(filename, 'r') as f:
        for line in f:
            for char in re.findall("[0-9a-zA-Z]", line):
                ch = char.lower()
                my_dict[ch] = my_dict.get(ch, 0) + 1
    return my_dict


def compareWord(my_dict1, my_dict2):
    # Characters that occur only in the second file
    x = set(my_dict2) - set(my_dict1)
    print(x)
    for key, val in my_dict1.items():
        if key not in my_dict2 or val > my_dict2[key]:
            print("The char " + key + " is more in MyText1.txt than MyText2.txt")
        elif val < my_dict2[key]:
            print("The char " + key + " is less in MyText1.txt than MyText2.txt")
        else:
            print("The char " + key + " is equal in MyText1.txt and MyText2.txt")
    for key in x:
        print("The char " + key + " is less in MyText1.txt than MyText2.txt")


def printFreq(my_dict):
    for key, val in my_dict.items():
        print("The character \"" + str(key) + "\" is present " + str(val) + " times in the document")


if __name__ == '__main__':
    my_dict = readFile('MyText1.txt')
    print(my_dict)
    my_dict2 = readFile('MyText2.txt')
    print(my_dict2)
    compareWord(my_dict, my_dict2)

OUTPUT: -
{'h': 3, 'i': 5, 'a': 3, 'm': 1, 'r': 1, 's': 1, 't': 1}
{'h': 3, 'e': 2, 'y': 1, 'a': 2, 'w': 2, 't': 1, 's': 1, 'u': 1, 'p': 1, 'l': 3, 'o': 2, 'r': 1, 'd': 1}
{'l', 'w', 'p', 'd', 'u', 'y', 'o', 'e'}
The char h is equal in MyText1.txt and MyText2.txt
The char i is more in MyText1.txt than MyText2.txt
The char a is more in MyText1.txt than MyText2.txt
The char m is more in MyText1.txt than MyText2.txt
The char r is equal in MyText1.txt and MyText2.txt
The char s is equal in MyText1.txt and MyText2.txt
The char t is equal in MyText1.txt and MyText2.txt
The char l is less in MyText1.txt than MyText2.txt
The char w is less in MyText1.txt than MyText2.txt
The char p is less in MyText1.txt than MyText2.txt
The char d is less in MyText1.txt than MyText2.txt
The char u is less in MyText1.txt than MyText2.txt
The char y is less in MyText1.txt than MyText2.txt
The char o is less in MyText1.txt than MyText2.txt
The char e is less in MyText1.txt than MyText2.txt
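
The recorded output covers the per-character frequencies of part c. Parts a, b and d (total characters, unique characters and relative frequency) could be handled along the following lines. This is a sketch under stated assumptions: the helper names are hypothetical, the texts are passed in as strings, and the message wording is adapted from the exercise.

```python
from collections import Counter

def char_counts(text):
    """Count letters and digits, ignoring case."""
    return Counter(ch for ch in text.lower() if ch.isalnum())

def relation(a, b):
    """Return 'more', 'less' or 'equal' for a compared with b."""
    return "more" if a > b else "less" if a < b else "equal"

def compare_texts(text1, text2):
    """Summarise totals, unique characters and relative frequencies (hypothetical helper)."""
    c1, c2 = char_counts(text1), char_counts(text2)
    total1, total2 = sum(c1.values()), sum(c2.values())
    if total1 == 0 or total2 == 0:
        raise ValueError("input files should be nonempty")
    lines = [
        f"File MyText1 contains {relation(total1, total2)} characters compared with MyText2.",
        f"File MyText1 contains {relation(len(c1), len(c2))} unique characters compared with MyText2.",
    ]
    for ch in sorted(set(c1) | set(c2)):
        # Compare c1/total1 with c2/total2 by cross-multiplying to avoid float rounding
        rel = relation(c1[ch] * total2, c2[ch] * total1)
        lines.append(f'The relative frequency of character "{ch}" in MyText1 '
                     f'is {rel} compared with MyText2.')
    return lines

# Sample strings for illustration only
for line in compare_texts("aab", "abbb"):
    print(line)
```

Relative frequency is taken here as a character's count divided by the total number of counted characters in its file, which is one reasonable reading of the exercise.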


6. Read a list named StringList1 containing strings from the keyboard. Generate a list
MStringList1 that contains all items of StringList1 that are repeated twice or a greater
number of times and print this list. By observing the outcome of MStringList1, perform
the following tasks:
a. Check whether an item of MStringList1 occurs an even or an odd number of times
in StringList1.
b. Remove the ith (i ≥ 2) occurrence of a given word in StringList1.

Program :-

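import ast

def check(lst):
    # Collect every distinct item that occurs at least twice
    lst1 = []
    set1 = set(lst)
    for i in set1:
        if lst.count(i) >= 2:
            lst1.append(i)
    print('MStringList1', lst1)
    print(list(set1))

# literal_eval parses a typed list literal without executing arbitrary code
lst = ast.literal_eval(input("Enter the list"))
check(lst)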

OUTPUT: -
Enter the list[7,8,9,25,85]
MStringList1 []
[7, 8, 9, 85, 25]
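
Tasks a and b are not exercised by the recorded run. The following sketch shows one way they could be approached; the function names and the sample list are assumptions made for illustration.

```python
from collections import Counter

def repeated_items(items):
    """Items that occur at least twice, in order of first appearance."""
    counts = Counter(items)
    return [item for item in counts if counts[item] >= 2]

def parity_of_repeats(items):
    """Map each repeated item to 'even' or 'odd' according to its count."""
    counts = Counter(items)
    return {item: "even" if counts[item] % 2 == 0 else "odd"
            for item in repeated_items(items)}

def remove_ith_occurrence(items, word, i):
    """Return a copy of `items` without the i-th (1-based) occurrence of `word`."""
    seen = 0
    result = []
    for item in items:
        if item == word:
            seen += 1
            if seen == i:
                continue  # skip only this occurrence
        result.append(item)
    return result

string_list1 = ["a", "b", "a", "c", "b", "a"]  # sample data for illustration only
print(repeated_items(string_list1))
print(parity_of_repeats(string_list1))
print(remove_ith_occurrence(string_list1, "a", 2))
```

If the word occurs fewer than i times, remove_ith_occurrence returns the list unchanged.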

7. From the file “MyText.txt”, count the frequencies of the various letters (convert upper
case into lower case) and plot the result as a bar chart with the x-axis being the letter
and the y-axis the corresponding frequency.

Program :-

import matplotlib.pyplot as plt

# Read the whole file and fold upper case into lower case before counting
with open("MyText.txt", "r") as f:
    filestr = f.read().lower()

# Count only alphabetic characters
freq = {}
for ch in filestr:
    if ch.isalpha():
        freq[ch] = freq.get(ch, 0) + 1

# Plot letters on the x-axis (alphabetical order) against their frequencies
xaxis = sorted(freq)
yaxis = [freq[ch] for ch in xaxis]
plt.bar(range(len(xaxis)), yaxis, tick_label=xaxis)
plt.xlabel('Letter')
plt.ylabel('Frequency')
plt.show()

OUTPUT: -


8. Use the following data to plot the number of applicants per year as a scatter plot.
year = [2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012]
no_application_per_year = [921261, 929198, 1043739, 1186454, 1194938, 1304495,
1356805, 1282000, 479651]

Program :-

from matplotlib import pyplot as plt

year = [2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012]
no_application_per_year = [921261, 929198, 1043739, 1186454,
                           1194938, 1304495, 1356805, 1282000, 479651]

# Years go on the x-axis and applicant counts on the y-axis
plt.scatter(year, no_application_per_year)
plt.xlabel('year')
plt.ylabel('# of applicants')
plt.show()

OUTPUT: -


9. Plot x sin x, x^2 sin x, x^3 sin x and x^4 sin x in a single plot in the range x ∈ [−10, 10].

Program :-

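import matplotlib.pyplot as plt
import numpy as np

# 400 evenly spaced points give smooth curves and include both endpoints
x = np.linspace(-10, 10, 400)
y = x * np.sin(x)
y1 = x**2 * np.sin(x)
y2 = x**3 * np.sin(x)
y3 = x**4 * np.sin(x)

plt.plot(x, y, label='x sin x')
plt.plot(x, y1, label='x^2 sin x')
plt.plot(x, y2, label='x^3 sin x')
plt.plot(x, y3, label='x^4 sin x')
plt.legend()
plt.show()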

OUTPUT: -


10. Plot histogram for age of male and female in different plots for the following data of
male and female age.
male_age = [53,51,71,31,33,39,52,27,54,30,64,26,21,54,52,20,59,32]
female_age = [53,65,68,21,75,46,24,63,61,24,49,41,39,40,25,54,42,32,48,23,23]

Program :-

#Male
import matplotlib.pyplot as plt
male_age = [53,51,71,31,33,39,52,27,54,30,64,26,21,54,52,20,59,32]
plt.hist(male_age, edgecolor='black')
plt.xlabel('Age')
plt.ylabel('Person count')
plt.show()
OUTPUT: -

#Female
import matplotlib.pyplot as plt
female_age = [53,65,68,21,75,46,24,63,61,24,49,41,39,40,25,54,42,
              32,48,23,23]
plt.hist(female_age, edgecolor='black')
plt.xlabel('Age')
plt.ylabel('Person count')
plt.show()


OUTPUT: -

11. Plot the temperature extremes in a certain region of India for each month, starting in
January, which are given below (in degrees Celsius).
max: 17, 19, 21, 28, 33, 38, 37, 37, 31, 23, 19, 18
min: -62, -59, -56, -46, -32, -18, -9, -13, -25, -46, -52, -58

Program :-

from matplotlib import pyplot as plt

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
# Named max_temp and min_temp to avoid shadowing the built-in max() and min()
max_temp = [17, 19, 21, 28, 33, 38, 37, 37, 31, 23, 19, 18]
min_temp = [-62, -59, -56, -46, -32, -18, -9, -13, -25, -46, -52, -58]

plt.xlabel('Months')
plt.ylabel('Temperature (°C)')
plt.plot(x, max_temp, 'm-.', x, min_temp, 'c:')
plt.show()

OUTPUT: -


12. Write a Python program to find all numbers in a range (given by the user) that are
perfect squares and whose digits sum to less than 10.

Program :-

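import math

l = int(input("Enter lower bound "))
u = int(input("Enter upper bound "))

# Keep x when it is a perfect square (exact integer check) and its digits sum to less than 10
a = [x for x in range(l, u + 1)
     if x >= 0 and math.isqrt(x) ** 2 == x
     and sum(map(int, str(x))) < 10]
print(a)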

OUTPUT: -
Enter lower bound 20
Enter upper bound 40
[25, 36]
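
For wide ranges, the same result can be obtained by stepping through the integer square roots instead of testing every number. This is a sketch of that alternative approach; the function names are hypothetical and not part of the record.

```python
import math

def digit_sum(n):
    """Sum of the decimal digits of a non-negative integer."""
    return sum(int(d) for d in str(n))

def perfect_squares_with_small_digit_sum(lower, upper):
    """Perfect squares in [lower, upper] whose digit sum is below 10."""
    if upper < 0:
        return []
    lower = max(lower, 0)
    # Smallest root whose square is not below `lower`
    first_root = math.isqrt(lower)
    if first_root * first_root < lower:
        first_root += 1
    last_root = math.isqrt(upper)
    return [r * r for r in range(first_root, last_root + 1)
            if digit_sum(r * r) < 10]

print(perfect_squares_with_small_digit_sum(20, 40))  # [25, 36]
```

For the recorded bounds 20 and 40 this visits only the roots 5 and 6, whose squares 25 and 36 have digit sums 7 and 9.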

13. Plot a bar chart with axis labels for given data:
mentions = [500, 505]
years = [2017, 2018]
Do not give any extra condition for x-axis as well as y-axis. Now again plot the bar chart
for this data and start y-axis from 0.
State the difference between the two bar charts.

Program :-

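from matplotlib import pyplot as plt

mentions = [500, 505]
years = [2017, 2018]

# First chart: default axes, no extra conditions
plt.bar(years, mentions)
plt.xlabel("Year")
plt.ylabel("Mentions")
plt.show()

# Second chart: same data with the y-axis forced to start at 0
plt.bar(range(len(years)), mentions)
plt.ylim(bottom=0)
plt.xlabel("Year")
plt.ylabel("Mentions")
plt.xticks(range(len(years)), years)
plt.show()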


OUTPUT: -
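
The effect of where the y-axis starts can be checked with simple arithmetic. The sketch below assumes a hypothetical truncated axis that starts at 499; this value is chosen only for illustration and does not come from the record. The function name drawn_height_ratio is likewise an assumption.

```python
mentions = [500, 505]

def drawn_height_ratio(values, y_start):
    """Ratio of the second bar's drawn height to the first when the axis starts at y_start."""
    first, second = (v - y_start for v in values)
    return second / first

print(drawn_height_ratio(mentions, 0))    # 1.01
print(drawn_height_ratio(mentions, 499))  # 6.0
```

With the axis starting at 0 the second bar is drawn 1% taller than the first, which matches the data. With the axis starting at 499 it would be drawn six times taller, which exaggerates a 1% increase.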

14. Plot a scatter plot of the following data with unequal axes and then with equal axes.
Also state the difference between the two.
test_1_grades = [ 99, 90, 85, 97, 80]
test_2_grades = [100, 85, 60, 90, 70]

Program :-

#equal
from matplotlib import pyplot as plt
test_1_grades = [ 99, 90, 85, 97, 80]
test_2_grades = [100, 85, 60, 90, 70]
plt.scatter(test_1_grades, test_2_grades)
plt.xlabel("test 1 grade")
plt.ylabel("test 2 grade")
plt.axis("equal")
plt.show()


OUTPUT: -

#unequal
from matplotlib import pyplot as plt
test_1_grades = [ 99, 90, 85, 97, 80]
test_2_grades = [100, 85, 60, 90, 70]

plt.scatter(test_1_grades, test_2_grades)
plt.xlabel("test 1 grade")
plt.ylabel("test 2 grade")

plt.show()

OUTPUT: -
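
The difference between the two plots comes from how far each set of grades spreads. A minimal sketch that computes the spreads without plotting; the helper name spread is an assumption.

```python
test_1_grades = [99, 90, 85, 97, 80]
test_2_grades = [100, 85, 60, 90, 70]

def spread(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)

print(spread(test_1_grades))  # 19
print(spread(test_2_grades))  # 40
```

With default (unequal) axes, each axis is stretched to fit its own data range, so the two tests appear to vary by a similar amount. With plt.axis("equal"), one unit has the same length on both axes, so the plot shows that test 2 grades vary about twice as much as test 1 grades.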
