
FODS Using Python Practical File

The document contains details of 10 experiments conducted to learn Python programming and data science concepts. Each experiment has the aim, program code, and output. The experiments cover topics like installing Python, using loops and functions, random story generation, reading/writing CSV files, NumPy, statistics, linear regression, and recommender systems. Overall, the document outlines a set of Python programming assignments designed to teach fundamental and advanced data science techniques.

Uploaded by

Tripti Gaur

SUBMITTED TO: Ms. Anshita Dhoot
NAME: Ujjawal Gaur
Enrollment No.: 00625511922

Sr. No.   Name of Experiment
1.   Introduction and installation of Python and Python IDEs for data science (Spyder-Anaconda, Jupyter Notebook, etc.)
2.   Design a Python program to generate and print a list except for the first 5 elements, where the values are squares of numbers between 1 and 30.
3.   Design a Python program to understand the working of loops.
4.   Design a Python function to find the Max of three numbers.
5.   Design a Python program for creating a random story generator.
6.   Create a synthetic dataset (.csv/.xlsx) to work upon and design a Python program to read and print that data.
7.   Design a Python program using NumPy library functions.
8.   Perform Statistics and Data Visualization in Python.
9.   Design a Python program to implement Linear Regression.
10.  Design a Python program to create a recommender system.

▪ Experiment 1
Introduction to Python for Data Science:
Python is a popular programming language for data science due to its simplicity, versatility,
and wide range of libraries and tools designed for data analysis, manipulation, and
visualization. It has gained immense popularity in the data science community because of
its ease of use, strong community support, and ability to handle data-intensive tasks
efficiently.
Python provides various libraries and frameworks like NumPy, pandas, Matplotlib, Seaborn,
Scikit-Learn, TensorFlow, and PyTorch that make it a powerful tool for data analysis, machine
learning, and deep learning tasks. This introduction focuses on the installation of
Python and some commonly used Integrated Development Environments (IDEs) for data
science.

Installation of Python:
To get started with Python for data science, you'll need to install Python on your computer. Here's a
step-by-step guide for installing Python:
1. Download Python: Visit the official Python website at https://www.python.org/downloads/ and
choose the version that is appropriate for your operating system. Python 3.x is recommended for
data science.

2. Run the installer: After downloading the Python installer, run it. During the installation
process, make sure to check the box that says "Add Python to PATH". This option allows you to run
Python from the command line without specifying the full path to the Python executable.

3. Install Python: Follow the on-screen instructions to complete the installation. Python will be
installed in the default location on your system.

4. Verify installation: Open a command prompt (Windows) or terminal (macOS and Linux) and type
python --version or python3 --version to verify that Python has been installed successfully. You
should see the version number displayed.
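
The verification step can also be done from inside the interpreter. The following is a minimal sketch using only the standard sys module; the variable name is_python3 is an illustrative choice and is not part of the original steps.

```python
import sys

# Full version string of the interpreter that is running this script
print(sys.version)

# Path of the Python executable in use; helps confirm which installation is active
print(sys.executable)

# version_info compares like a tuple, so this checks for any Python 3.x release
is_python3 = sys.version_info >= (3,)
print("Running Python 3:", is_python3)
```

If the printed path is not the installation you expected, the PATH setting from step 2 is the first thing to check.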

Installing Python IDEs for Data Science:


Python can be used with various Integrated Development Environments (IDEs) that provide a
user-friendly interface, code editing, debugging, and project management tools tailored for
data science tasks. Here are some popular Python IDEs for data science:

Jupyter Notebook: Jupyter Notebook is a web-based interactive computing environment widely used in
data science. It allows you to create and share documents containing live code, equations, visualizations, and
narrative text. You can install Jupyter Notebook using pip:
➢ pip install notebook
To start a Jupyter Notebook session, run jupyter notebook in your terminal.
Anaconda: Anaconda is a data science platform that comes with Python and a suite of pre-installed data
science packages and tools. It's particularly useful for managing package dependencies. You can download
Anaconda from the official website: https://www.anaconda.com/products/distribution
Spyder: Spyder is an open-source IDE designed for scientific computing and data analysis. It comes
bundled with the Anaconda distribution, but you can also install it separately using pip:
➢ pip install spyder

Choose an IDE that suits your workflow and preferences. Each has its strengths and can be a great choice for
data science tasks.
In conclusion, Python is a powerful language for data science, and installing an appropriate IDE can greatly
enhance your productivity. Whether you choose Jupyter Notebook, Spyder (for example through Anaconda), or
another editor such as PyCharm or VS Code, a well-configured environment is crucial for effective data
analysis and modelling.

▪ Experiment 2
Aim: Design a Python program to generate and print a list except for the first 5 elements, where the values
are squares of numbers between 1 and 30.

Program:
l = list()
for i in range(1, 31):
    l.append(i**2)  # squares of 1 to 30
print(l[5:])  # skip the first 5 elements

Output:
[36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729,
784, 841, 900]
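
The same list can be built more compactly with a list comprehension. This is only an alternative sketch; the names squares and remaining are illustrative and do not appear in the program above.

```python
# Build the squares of 1..30 in a single expression
squares = [i**2 for i in range(1, 31)]

# Slicing from index 5 drops the first five squares (1, 4, 9, 16, 25)
remaining = squares[5:]
print(remaining)
```

The slice leaves 25 values, starting at 36 and ending at 900, which matches the output above.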

▪ Experiment 3
Aim: Design a Python program to understand the working of loops.

Program:
# for loop: range(1, 6) yields 1, 2, 3, 4, 5
for i in range(1, 6):
    print(i)

# while loop: repeats as long as the condition holds
count = 1
while count <= 5:
    print(count)
    count += 1

Output:
1
2
3
4
5
1
2
3
4
5
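
Loops can also be steered with continue and break. The sketch below is an additional illustration and not part of the original experiment; the list name odd_numbers is hypothetical.

```python
odd_numbers = []
for i in range(1, 11):
    if i % 2 == 0:
        continue  # skip even numbers and move on to the next iteration
    if i > 7:
        break     # leave the loop entirely once an odd i exceeds 7
    odd_numbers.append(i)

print(odd_numbers)  # [1, 3, 5, 7]
```

Here continue skips a single iteration, while break ends the loop before range(1, 11) is exhausted.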

▪ Experiment 4
Aim: Design a Python program to find the maximum of three numbers.

Program:
num1 = float(input("Enter the first number: "))
num2 = float(input("Enter the second number: "))
num3 = float(input("Enter the third number: "))

# Compare each number against the other two
if num1 >= num2 and num1 >= num3:
    print('Maximum is num1:', num1)
elif num2 >= num1 and num2 >= num3:
    print('Maximum is num2:', num2)
else:
    print('Maximum is num3:', num3)

Output:
Enter the first number: 9
Enter the second number: 15
Enter the third number: 2
Maximum is num2: 15.0
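
The experiment index describes this task as a function, so the same comparison can be wrapped in one. This is a minimal sketch; the function name max_of_three is an assumption and does not appear in the original program.

```python
def max_of_three(a, b, c):
    """Return the largest of three numbers."""
    largest = a
    if b > largest:
        largest = b
    if c > largest:
        largest = c
    return largest


print(max_of_three(9, 15, 2))  # 15
```

Python's built-in max(9, 15, 2) gives the same answer; writing the function by hand just makes the comparisons explicit.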

▪ Experiment 5
Aim: Design a Python program for creating a random story generator.

Program:
import random

characters = ["Alok", "Rajat", "Chaitanya", "Dhruv", "Neeraj"]
settings = [
    "in a magical forest", "on a deserted island", "in a bustling city",
    "in a faraway kingdom", "in an abandoned building", "under the ocean",
]
actions = [
    "discovered a hidden treasure", "went on an epic adventure",
    "solved a mysterious puzzle", "made a new friend", "broke a wall",
    "opened a secret concealed door",
]
moods = ["happy", "sad", "excited", "nervous", "curious", "anxious", "confused", "depressed"]
endings = [
    "and they lived happily ever after.",
    "and they all learned an important lesson.",
    "but it was the beginning of a new journey.",
    "and all of it exploded.",
    "and it didn't end well. ...",
    "and they ran away in spite of what happened.",
]

# Pick one element at random from each list
character = random.choice(characters)
setting = random.choice(settings)
action = random.choice(actions)
mood = random.choice(moods)
ending = random.choice(endings)

story = f"{character} {action} {setting}. They were feeling {mood}, {ending}"

print(story)

Output:
Rajat broke a wall in a bustling city. They were feeling anxious, but it was the beginning of a new journey.
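
Because the choices are random, each run prints a different story. When a repeatable result is needed, for example to reproduce an output in a report, a seeded generator can be used. The sketch below assumes a hypothetical helper make_story and shortened word lists; it is not the program above.

```python
import random


def make_story(seed):
    """Build a story from a seeded generator so the result is repeatable."""
    rng = random.Random(seed)  # independent generator with a fixed seed
    characters = ["Alok", "Rajat", "Chaitanya"]
    actions = ["made a new friend", "broke a wall"]
    settings = ["in a bustling city", "under the ocean"]
    return f"{rng.choice(characters)} {rng.choice(actions)} {rng.choice(settings)}."


first = make_story(42)
second = make_story(42)
print(first)
print(first == second)  # True: the same seed gives the same story
```

Using random.Random(seed) instead of random.seed() keeps the fixed seed local to the helper and leaves the global generator untouched.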

▪ Experiment 6
Aim: Create a synthetic dataset (.csv/.xlsx) to work upon and design a Python program to read and print
the data.

Program to create the dataset:


import pandas as pd
from faker import Faker
import random

fake = Faker()
data = []

# Generate 25 rows of fake records
for _ in range(25):
    registration_number = str(random.randint(100000000, 999999999))
    name = fake.name()
    gender = random.choice(['Male', 'Female'])
    email = fake.email()
    address = fake.address()
    country = fake.country()
    website = fake.url()
    job = fake.job()

    data.append([registration_number, name, gender, email, address, country, website, job])

df = pd.DataFrame(data, columns=['Registration Number', 'Name', 'Gender', 'Email', 'Address', 'Country', 'Website', 'Job'])

# Write the dataset to an Excel file without the row index
df.to_excel('sample_data3.xlsx', index=False)

Program to read the dataset:


import pandas as pd

dataframe1 = pd.read_excel('sample_data3.xlsx', usecols=[1,3])

dataframe1.set_index('Name', inplace=True)
print(dataframe1)

Output:
Name Email
Sarah Conley [email protected]
Stephen Murphy [email protected]
Brian Sullivan [email protected]
Jason Wilson [email protected]
Lauren Rowland [email protected]
Sydney Johnson [email protected]
Jamie Barron [email protected]
Richard Salazar [email protected]
Richard Mason [email protected]
Kaylee Martin [email protected]
Hayley Downs [email protected]
Jonathan Reid [email protected]
Jeanette Beasley [email protected]
Richard Turner [email protected]
Elizabeth Scott [email protected]
Daniel Robinson [email protected]
Carly Taylor [email protected]
Julian Hernandez [email protected]
James Phelps [email protected]
Donald Graves [email protected]
Joshua Ramsey [email protected]
Derrick Gonzalez [email protected]
Michelle Christian [email protected]
Robert Hill [email protected]
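
The aim also allows a .csv file. As a sketch of the same write-then-read round trip without pandas or Faker, the standard csv module can be used. The two records, the file name sample_data.csv, and the temporary directory below are made up for illustration.

```python
import csv
import os
import tempfile

# Two hypothetical records standing in for the generated dataset
rows = [
    {"Name": "Asha Rao", "Email": "asha@example.com"},
    {"Name": "Ravi Kumar", "Email": "ravi@example.com"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample_data.csv")

    # Write a header row followed by the records
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["Name", "Email"])
        writer.writeheader()
        writer.writerows(rows)

    # Read the file back; each row becomes a dict keyed by the header
    with open(path, newline="", encoding="utf-8") as f:
        loaded = list(csv.DictReader(f))

for row in loaded:
    print(row["Name"], row["Email"])
```

With pandas, the equivalent calls would be df.to_csv(path, index=False) and pd.read_csv(path).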

▪ Experiment 7
Aim: Design a Python program using NumPy library functions.

Program:
import numpy as np

# Two 4x4 arrays of random integers in the range 1 to 49
random_array_1 = np.random.randint(1, 50, size=(4, 4))
random_array_2 = np.random.randint(1, 50, size=(4, 4))
print("Random Array 1:")
print(random_array_1)
print("Random Array 2:")
print(random_array_2)

# For 2-D arrays, np.dot performs matrix multiplication
dot_product = np.dot(random_array_1, random_array_2)
print("\nDot product of above two arrays:")
print(dot_product)

mean = np.mean(dot_product)
print("\nMean of the dot product:", mean)

std_dev = np.std(dot_product)
print("Standard Deviation: ", std_dev)

# flatten() returns a 1-D copy; reshape(8, 2) rearranges the 16 values into 8 rows
print('Flattening the first array:', random_array_1.flatten())
print('Rearranging the second array:\n', random_array_2.reshape(8, 2))

Output:
Random Array 1:
[[20 44 44 8]
[ 1 4 21 16]
[14 16 13 6]
[47 41 27 20]]
Random Array 2:
[[47 7 39 24]
[23 21 11 10]
[41 29 6 40]
[13 25 32 25]]

Dot product of above two arrays:
[[3860 2540 1784 2880]
[1208 1100 721 1304]
[1637 961 992 1166]
[4519 2473 3086 3118]]

Mean of the dot product: 2084.3125
Standard Deviation: 1116.2817587167453
Flattening the first array: [20 44 44 8 1 4 21 16 14 16 13 6 47 41 27 20]
Rearranging the second array:
[[47 7]
[39 24]
[23 21]
[11 10]
[41 29]
[ 6 40]
[13 25]
[32 25]]
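
For 2-D arrays, np.dot computes a matrix product, which is different from element-wise multiplication with the * operator. The sketch below uses two small hand-written arrays (not the random arrays above) so that the results can be checked by hand.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Matrix product: each entry is a row of a multiplied into a column of b
matrix_product = np.dot(a, b)
print(matrix_product)  # [[19 22] [43 50]]

# Element-wise product: entries at the same position are multiplied
elementwise = a * b
print(elementwise)  # [[ 5 12] [21 32]]
```

For example, the top-left entry of the matrix product is 1*5 + 2*7 = 19, while the top-left entry of the element-wise product is 1*5 = 5.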

▪ Experiment 8
Aim: Perform Statistics and Data Visualization in python.

Program:
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("tips.csv")
plt.scatter(data['sex'], data['tip'])
plt.title("Gender Gap")

plt.xlabel('Gender')
plt.ylabel('Tip')

plt.show()

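Output:
[Scatter plot titled "Gender Gap" with Gender on the x-axis and Tip on the y-axis; image not reproduced here]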
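
The program above covers the visualization part of the aim. For the statistics part, a possible sketch with pandas is given below. Since tips.csv is not included here, the five rows are made-up values that only reuse the 'sex' and 'tip' column names from the program.

```python
import pandas as pd

# Hypothetical stand-in for tips.csv with the same two column names
tips_demo = pd.DataFrame({
    "sex": ["Female", "Male", "Male", "Female", "Male"],
    "tip": [1.0, 2.0, 3.0, 3.0, 4.0],
})

# Count, mean, standard deviation, min, quartiles and max of the tips
print(tips_demo["tip"].describe())

# Mean tip for each gender
mean_by_sex = tips_demo.groupby("sex")["tip"].mean()
print(mean_by_sex)
```

With the real file, replacing tips_demo by pd.read_csv("tips.csv") would give the corresponding figures for the full dataset.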

▪ Experiment 9
Aim: Design a Python program to implement Linear Regression.

Program:
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    n = np.size(x)  # number of observations/points

    m_x = np.mean(x)  # mean of x and y vector
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x

    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="b", marker="o", s=30)

    # predicted response vector
    y_pred = b[0] + b[1]*x

    # plotting the regression line
    plt.plot(x, y_pred, color="r")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))

    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()

Output:
Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
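
As a cross-check, the same coefficients can be obtained from NumPy's built-in least-squares fit. This is a verification sketch rather than part of the original program. np.polyfit with degree 1 returns the slope first and the intercept second, so the slope corresponds to b_1 and the intercept to b_0.

```python
import numpy as np

# Same observations as in the program above
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# Degree-1 polynomial fit: coefficients come back highest power first
slope, intercept = np.polyfit(x, y, 1)
print("intercept (b_0):", intercept)
print("slope (b_1):", slope)
```

The values agree with the hand-written formulas up to floating-point rounding: SS_xy = 96.5 and SS_xx = 82.5 give b_1 = 96.5 / 82.5 ≈ 1.1697, and b_0 = 6.5 - b_1 * 4.5 ≈ 1.2364.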

▪ Experiment 10
Aim: Design a Python program to create a recommender system.

Program:
import numpy as np
import pandas as pd
import sklearn
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

ratings = pd.read_csv("https://s3-us-west-2.amazonaws.com/recommender-tutorial/ratings.csv")
ratings.head()

movies = pd.read_csv("https://s3-us-west-2.amazonaws.com/recommender-tutorial/movies.csv")
movies_name = pd.read_csv("https://s3-us-west-2.amazonaws.com/recommender-tutorial/movies.csv", usecols=[0, 1])
movies.head()

n_ratings = len(ratings)
n_movies = len(ratings['movieId'].unique())
n_users = len(ratings['userId'].unique())

print(f"Number of ratings: {n_ratings}")


print(f"Number of unique movieId's: {n_movies}")
print(f"Number of unique users: {n_users}")
print(f"Average ratings per user: {round(n_ratings/n_users, 2)}")
print(f"Average ratings per movie: {round(n_ratings/n_movies, 2)}")

user_freq = ratings[['userId', 'movieId']].groupby('userId').count().reset_index()


user_freq.columns = ['userId', 'n_ratings']
user_freq.head()

# Find the lowest and highest rated movies
mean_rating = ratings.groupby('movieId')[['rating']].mean()
# Lowest rated movie
lowest_rated = mean_rating['rating'].idxmin()
movies.loc[movies['movieId'] == lowest_rated]
# Highest rated movie
highest_rated = mean_rating['rating'].idxmax()
movies.loc[movies['movieId'] == highest_rated]
# Show the ratings given to the highest rated movie
ratings[ratings['movieId'] == highest_rated]
# Show the ratings given to the lowest rated movie
ratings[ratings['movieId'] == lowest_rated]

# The movies above have very few ratings. We will use a Bayesian average,
# which needs the rating count and mean rating of each movie.
movie_stats = ratings.groupby('movieId')[['rating']].agg(['count', 'mean'])
movie_stats.columns = movie_stats.columns.droplevel()

# Now, we create the user-item matrix using a scipy csr matrix
from scipy.sparse import csr_matrix

def create_matrix(df):
    N = len(df['userId'].unique())
    M = len(df['movieId'].unique())

    # Map IDs to indices
    user_mapper = dict(zip(np.unique(df["userId"]), list(range(N))))
    movie_mapper = dict(zip(np.unique(df["movieId"]), list(range(M))))

    # Map indices to IDs
    user_inv_mapper = dict(zip(list(range(N)), np.unique(df["userId"])))
    movie_inv_mapper = dict(zip(list(range(M)), np.unique(df["movieId"])))

    user_index = [user_mapper[i] for i in df['userId']]
    movie_index = [movie_mapper[i] for i in df['movieId']]

    # Rows are movies and columns are users
    X = csr_matrix((df["rating"], (movie_index, user_index)), shape=(M, N))

    return X, user_mapper, movie_mapper, user_inv_mapper, movie_inv_mapper

X, user_mapper, movie_mapper, user_inv_mapper, movie_inv_mapper = create_matrix(ratings)

from sklearn.neighbors import NearestNeighbors

def find_similar_movies(movie_id, X, k, metric='cosine', show_distance=False):
    """Find similar movies using KNN."""
    neighbour_ids = []

    movie_ind = movie_mapper[movie_id]
    movie_vec = X[movie_ind]
    k += 1  # the query movie is returned as its own nearest neighbour
    kNN = NearestNeighbors(n_neighbors=k, algorithm="brute", metric=metric)
    kNN.fit(X)
    movie_vec = movie_vec.reshape(1, -1)
    # With show_distance=False, kneighbors returns only the array of indices;
    # the loop below relies on that
    neighbour = kNN.kneighbors(movie_vec, return_distance=show_distance)
    for i in range(0, k):
        n = neighbour.item(i)
        neighbour_ids.append(movie_inv_mapper[n])
    neighbour_ids.pop(0)  # drop the query movie itself
    return neighbour_ids

movie_titles = dict(zip(movies['movieId'], movies['title']))

movie_id = int(input("Enter the movie id of the movie you have seen: "))

similar_ids = find_similar_movies(movie_id, X, k=10)
movie_title = movie_titles[movie_id]

print(f"Since you watched {movie_title}")
for i in similar_ids:
    print(movie_titles[i])

Output:
Number of ratings: 100836
Number of unique movieId's: 9724
Number of unique users: 610
Average ratings per user: 165.3
Average ratings per movie: 10.37
Enter the movie id of the movie you have seen: 8
Since you watched Tom and Huck (1995)
Losing Isaiah (1995)
Jury Duty (1995)
Next Karate Kid, The (1994)
Babysitter, The (1995)
Son in Law (1993)
Lassie (1994)
Kiss of Death (1995)
Heavyweights (Heavy Weights) (1995)
Richie Rich (1994)
Little Rascals, The (1994)
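
The program mentions a Bayesian average but stops after computing the count and mean per movie. One common form of the Bayesian average is (C*m + sum of the movie's ratings) / (C + n), where n is the number of ratings of the movie, C is the average number of ratings per movie, and m is the average of the per-movie means. The sketch below applies this form to a small made-up ratings table; the data and the names ratings_demo and demo_stats are assumptions, not part of the original program.

```python
import pandas as pd

# Hypothetical ratings: movie 1 has six ratings, movie 2 one, movie 3 two
ratings_demo = pd.DataFrame({
    "movieId": [1] * 6 + [2] + [3, 3],
    "rating": [4.5] * 6 + [5.0] + [1.0, 2.0],
})

demo_stats = ratings_demo.groupby("movieId")["rating"].agg(["count", "mean"])

C = demo_stats["count"].mean()  # average number of ratings per movie (prior weight)
m = demo_stats["mean"].mean()   # average of the per-movie mean ratings (prior mean)

# count * mean equals the sum of a movie's ratings
demo_stats["bayesian_avg"] = (C * m + demo_stats["count"] * demo_stats["mean"]) / (C + demo_stats["count"])
print(demo_stats)
```

In this toy table, movie 2 has the highest plain mean (5.0 from a single rating), but its Bayesian average is pulled down to 4.0, so movie 1 with six ratings ranks first at about 4.22.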
