FODS Using Python Practical File
Sr. No.  Name of Experiment
1. Introduction and installation of Python and Python IDEs for data science (Spyder-Anaconda, Jupyter Notebook, etc.)
2. Design a Python program to generate and print a list except for the first 5 elements, where the values are squares of numbers between 1 and 30.
3. Design a Python program to understand the working of loops.
4. Design a Python function to find the max of three numbers.
5. Design a Python program for creating a random story generator.
6. Create a synthetic dataset (.csv/.xlsx) to work upon and design a Python program to read and print that data.
7. Design a Python program using NumPy library functions.
8. Perform statistics and data visualization in Python.
9. Design a Python program to implement linear regression.
10. Design a Python program to create a recommender system.
▪ Experiment 1
Introduction to Python for Data Science:
Python is a popular programming language for data science due to its simplicity, versatility, and a wide range of libraries and tools specifically designed for data analysis, manipulation, and visualization. It has gained immense popularity in the data science community because of its ease of use, strong community support, and the ability to handle data-intensive tasks efficiently.
Python provides various libraries and frameworks like NumPy, pandas, Matplotlib, Seaborn,
Scikit-Learn, TensorFlow, and PyTorch that make it a powerful tool for data analysis, machine
learning, and deep learning tasks. In this introduction, we'll focus on the installation of
Python and some commonly used Integrated Development Environments (IDEs) for data
science.
Installation of Python:
To get started with Python for data science, you'll need to install Python on your computer. Here's a step-by-
step guide for installing Python:
1. Download Python: Visit the official Python website at https://www.python.org/downloads/ and choose the version that is appropriate for your operating system. Python 3.x is recommended for data science.
2. Run the installer: After downloading the Python installer, run it. During the installation process, make sure to check the box that says "Add Python to PATH." This option allows you to run Python from the command line without specifying the full path to the Python executable.
3. Install Python: Follow the on-screen instructions to complete the installation. Python will be installed in the default location on your system.
4. Verify installation: Open a command prompt (Windows) or terminal (macOS and Linux) and type python --version or python3 --version to verify that Python has been installed successfully. You should see the version number displayed; the short check below can then confirm the core libraries.
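Once the interpreter works, a short script such as the following can confirm that the core data-science libraries are also available. This is a minimal sketch, assuming NumPy and pandas have already been installed (for example with pip install numpy pandas):

import sys
import numpy as np
import pandas as pd

# Report the interpreter and library versions to confirm the environment is usable
print("Python version:", sys.version)
print("NumPy version:", np.__version__)
print("pandas version:", pd.__version__)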
Jupyter Notebook: Jupyter Notebook is a web-based interactive computing environment widely used in
data science. It allows you to create and share documents containing live code, equations, visualizations, and
narrative text. You can install Jupyter Notebook using pip:
➢ pip install notebook
To start a Jupyter Notebook session, run jupyter notebook in your terminal.
Anaconda: Anaconda is a data science platform that comes with Python and a suite of pre-installed data
science packages and tools. It's particularly useful for managing package dependencies. You can download
Anaconda from the official website: https://www.anaconda.com/products/distribution
Spyder: Spyder is an open-source IDE designed for scientific computing and data analysis. It comes
bundled with the Anaconda distribution, but you can also install it separately using pip:
➢ pip install spyder
Choose an IDE that suits your workflow and preferences. Each has its strengths and can be a great choice for
data science tasks.
In conclusion, Python is a powerful language for data science, and installing an appropriate IDE can greatly enhance your productivity. Whether you choose Jupyter Notebook, Spyder (bundled with Anaconda), or another IDE such as PyCharm or VS Code, having a well-configured environment is crucial for effective data analysis and modelling.
▪ Experiment 2
Aim: Design a Python program to generate and print a list except for the first 5 elements, where the values
are squares of numbers between 1 and 30.
Program:
# Collect the squares of 1..30, then print all but the first 5 values
l = list()
for i in range(1, 31):
    l.append(i**2)
print(l[5:])
Output:
[36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729,
784, 841, 900]
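The same result can also be produced with a list comprehension and slicing; the following is just an alternative sketch, not part of the original program:

# Build the squares of 1..30 in one expression, then skip the first 5
squares = [i**2 for i in range(1, 31)]
print(squares[5:])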
▪ Experiment 3
Aim: Design a Python program to understand the working of loops.
Program:
# A for loop prints the numbers 1 to 5
for i in range(1, 6):
    print(i)

# A while loop prints the same numbers using an explicit counter
count = 1
while count <= 5:
    print(count)
    count += 1
Output:
1
2
3
4
5
1
2
3
4
5
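As a further illustration of how loop control works (an optional sketch, not part of the original program), break and continue change how a loop proceeds:

for i in range(1, 6):
    if i == 3:
        continue  # skip the rest of this iteration, so 3 is not printed
    if i == 5:
        break     # leave the loop entirely before 5 is printed
    print(i)      # prints 1, 2 and 4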
▪ Experiment 4
Aim: Design a Python program to find the maximum of three numbers.
Program:
num1 = float(input("Enter the first number: "))
num2 = float(input("Enter the second number: "))
num3 = float(input("Enter the third number: "))
numbers = {"num1": num1, "num2": num2, "num3": num3}
largest = max(numbers, key=numbers.get)  # name of the variable holding the largest value
print("Maximum is {}: {}".format(largest, numbers[largest]))
Output:
Enter the first number: 9
Enter the second number: 15
Enter the third number: 2
Maximum is num2: 15.0
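The comparison can also be written out explicitly with if/elif, which is closer to how the logic is usually introduced; this is an alternative sketch, not the original program:

if num1 >= num2 and num1 >= num3:
    print("Maximum is num1:", num1)
elif num2 >= num1 and num2 >= num3:
    print("Maximum is num2:", num2)
else:
    print("Maximum is num3:", num3)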
▪ Experiment 5
Aim: Design a Python program for creating a random story generator.
Program:
import random

# Word lists for the generator; the values are illustrative, chosen to be consistent with the sample output below
characters = ["Rajat", "Aisha", "Karan"]
settings = ["in a bustling city", "in a quiet village", "on a remote island"]
actions = ["broke a wall", "found a hidden map", "built a strange machine"]
moods = ["anxious", "excited", "curious"]
endings = ["it was the beginning of a new journey.", "they never looked back.", "nothing was ever the same again."]

character = random.choice(characters)
setting = random.choice(settings)
action = random.choice(actions)
mood = random.choice(moods)
ending = random.choice(endings)

story = "{} {} {}. They were feeling {}, but {}".format(character, action, setting, mood, ending)
print(story)
Output:
Rajat broke a wall in a bustling city. They were feeling anxious, but it was the beginning of a new journey.
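For repeatable output while testing, the random choices can be fixed with a seed; this is an optional addition, not part of the original program:

import random
random.seed(42)  # any fixed integer makes the subsequent random.choice calls repeatable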
▪ Experiment 6
Aim: Create a synthetic dataset (.csv/.xlsx) to work upon and design a Python program to read and print the data.
Program:
import random
import pandas as pd
from faker import Faker

fake = Faker()
data = []
for _ in range(25):
    registration_number = str(random.randint(100000000, 999999999))
    name = fake.name()
    gender = random.choice(['Male', 'Female'])
    email = fake.email()
    address = fake.address()
    company_name = fake.country()
    website = fake.url()
    job = fake.job()
    data.append([registration_number, name, gender, email, address, company_name, website, job])

# Save the synthetic records to an Excel file
df = pd.DataFrame(data, columns=['Registration Number', 'Name', 'Gender', 'Email', 'Address', 'Company', 'Website', 'Job'])
df.to_excel('sample_data3.xlsx', index=False)

# Read the file back and print the Name and Email columns
dataframe1 = pd.read_excel('sample_data3.xlsx', usecols=['Name', 'Email'])
dataframe1.set_index('Name', inplace=True)
print(dataframe1)
Output:
Name Email
Sarah Conley [email protected]
Stephen Murphy [email protected]
Brian Sullivan [email protected]
Jason Wilson [email protected]
Lauren Rowland [email protected]
Sydney Johnson [email protected]
Jamie Barron [email protected]
Richard Salazar [email protected]
Richard Mason [email protected]
Kaylee Martin [email protected]
Hayley Downs [email protected]
Jonathan Reid [email protected]
Jeanette Beasley [email protected]
Richard Turner [email protected]
Elizabeth Scott [email protected]
Daniel Robinson [email protected]
Carly Taylor [email protected]
Julian Hernandez [email protected]
James Phelps [email protected]
Donald Graves [email protected]
Joshua Ramsey [email protected]
Derrick Gonzalez [email protected]
Michelle Christian [email protected]
Robert Hill [email protected]
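The same synthetic data could equally be written to and read back from a .csv file; a brief sketch, assuming df was built as in the program above:

df.to_csv('sample_data3.csv', index=False)
print(pd.read_csv('sample_data3.csv', usecols=['Name', 'Email']).head())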
▪ Experiment 7
Aim: Design a Python program using NumPy library functions.
Program:
import numpy as np

# Two 4x4 arrays of random integers (the output below shows one sample run)
arr1 = np.random.randint(1, 50, size=(4, 4))
arr2 = np.random.randint(1, 50, size=(4, 4))
print("Random Array 1:\n", arr1)
print("Random Array 2:\n", arr2)

dot_product = np.dot(arr1, arr2)  # matrix (dot) product of the two arrays
mean = np.mean(dot_product)
print("\nMean of the dot product:", mean)
std_dev = np.std(dot_product)
print("Standard Deviation: ", std_dev)
Output:
Random Array 1:
[[20 44 44 8]
[ 1 4 21 16]
[14 16 13 6]
[47 41 27 20]]
Random Array 2:
[[47 7 39 24]
[23 21 11 10]
[41 29 6 40]
[13 25 32 25]]
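Other NumPy summary functions could extend the experiment in the same style; an illustrative sketch, assuming dot_product from the program above:

print("Median:", np.median(dot_product))
print("Min:", dot_product.min(), "Max:", dot_product.max())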
▪ Experiment 8
Aim: Perform statistics and data visualization in Python.
Program:
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("tips.csv")
plt.scatter(data['sex'], data['tip'])
plt.title("Gender Gap")
plt.xlabel('Gender')
plt.ylabel('Tip')
plt.show()
Output: a scatter plot titled "Gender Gap", with Gender on the x-axis and Tip on the y-axis.
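The aim also mentions statistics; a minimal sketch of summary statistics, assuming the same tips.csv with a numeric 'tip' column and a categorical 'sex' column, could accompany the plot:

# Basic descriptive statistics for the tip column, overall and by gender
print(data['tip'].describe())
print(data.groupby('sex')['tip'].mean())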
▪ Experiment 9
Aim: Design a Python program to implement Linear Regression.
Program:
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations and means of x and y
    n = np.size(x)
    m_x = np.mean(x)
    m_y = np.mean(y)
    # cross-deviation and deviation about x
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x
    # regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x
    return (b_0, b_1)

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    # plotting the observations and the fitted regression line, with labels
    plt.scatter(x, y)
    plt.plot(x, b[0] + b[1] * x, color="g")
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

if __name__ == "__main__":
    main()
Output:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
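An equivalent fit can be obtained with scikit-learn's off-the-shelf estimator; the sketch below is an alternative to, not part of, the original program and should recover the same coefficients:

import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape(-1, 1)  # sklearn expects a 2-D feature array
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
model = LinearRegression().fit(x, y)
print("b_0 =", model.intercept_, "b_1 =", model.coef_[0])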
▪ Experiment 10
Aim: Design a Python program to create a recommender system.
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors
import warnings

warnings.simplefilter(action='ignore', category=FutureWarning)

# MovieLens-style ratings and movie metadata from the recommender-tutorial dataset
ratings = pd.read_csv("https://s3-us-west-2.amazonaws.com/recommender-tutorial/ratings.csv")
movies = pd.read_csv("https://s3-us-west-2.amazonaws.com/recommender-tutorial/movies.csv")
movies_name = pd.read_csv("https://s3-us-west-2.amazonaws.com/recommender-tutorial/movies.csv", usecols=[0, 1])

n_ratings = len(ratings)
n_movies = len(ratings['movieId'].unique())
n_users = len(ratings['userId'].unique())
print("Number of ratings:", n_ratings)
print("Number of unique movieId's:", n_movies)
print("Number of unique users:", n_users)
print("Average ratings per user:", round(n_ratings / n_users, 2))
print("Average ratings per movie:", round(n_ratings / n_movies, 2))

# Many movies have very few ratings, so a Bayesian average could be used to rank them fairly
movie_stats = ratings.groupby('movieId')[['rating']].agg(['count', 'mean'])
movie_stats.columns = movie_stats.columns.droplevel()

def create_matrix(df):
    # Build a sparse movie-by-user rating matrix plus id <-> index mappers
    N = len(df['userId'].unique())
    M = len(df['movieId'].unique())
    user_mapper = dict(zip(np.unique(df["userId"]), list(range(N))))
    movie_mapper = dict(zip(np.unique(df["movieId"]), list(range(M))))
    user_inv_mapper = dict(zip(list(range(N)), np.unique(df["userId"])))
    movie_inv_mapper = dict(zip(list(range(M)), np.unique(df["movieId"])))
    user_index = [user_mapper[i] for i in df['userId']]
    movie_index = [movie_mapper[i] for i in df['movieId']]
    X = csr_matrix((df["rating"], (movie_index, user_index)), shape=(M, N))
    return X, user_mapper, movie_mapper, user_inv_mapper, movie_inv_mapper

X, user_mapper, movie_mapper, user_inv_mapper, movie_inv_mapper = create_matrix(ratings)

def find_similar_movies(movie_id, X, k, metric='cosine', show_distance=False):
    # Return the ids of the k movies most similar to movie_id using item-item kNN
    neighbour_ids = []
    movie_ind = movie_mapper[movie_id]
    movie_vec = X[movie_ind]
    k += 1  # the movie is its own nearest neighbour, so fetch one extra
    kNN = NearestNeighbors(n_neighbors=k, algorithm="brute", metric=metric)
    kNN.fit(X)
    movie_vec = movie_vec.reshape(1, -1)
    neighbour = kNN.kneighbors(movie_vec, return_distance=show_distance)
    for i in range(0, k):
        n = neighbour.item(i)
        neighbour_ids.append(movie_inv_mapper[n])
    neighbour_ids.pop(0)  # drop the movie itself from the recommendations
    return neighbour_ids

movie_titles = dict(zip(movies['movieId'], movies['title']))
movie_id = int(input("Enter the movie id of the movie you have seen: "))
similar_ids = find_similar_movies(movie_id, X, k=10)
print("Since you watched", movie_titles[movie_id])
for i in similar_ids:
    print(movie_titles[i])
Output:
Number of ratings: 100836
Number of unique movieId's: 9724
Number of unique users: 610
Average ratings per user: 165.3
Average ratings per movie: 10.37
Enter the movie id of the movie you have seen: 8
Since you watched Tom and Huck (1995)
Losing Isaiah (1995)
Jury Duty (1995)
Next Karate Kid, The (1994)
Babysitter, The (1995)
Son in Law (1993)
Lassie (1994)
Kiss of Death (1995)
Heavyweights (Heavy Weights) (1995)
Richie Rich (1994)
Little Rascals, The (1994)
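Once the program has been run, find_similar_movies can be called directly for any other movieId present in the dataset; a brief usage sketch (movieId 1 corresponds to Toy Story (1995) in this MovieLens data):

# Recommend five titles similar to movieId 1
for similar_id in find_similar_movies(1, X, k=5):
    print(movie_titles[similar_id])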