Mkce Python Lab Manual
PYTHON PROGRAMMING
FOR DATA SCIENCE
2023-2024
STUDENT NAME
SUBJECT NAME/CODE
SEMESTER/YEAR
INDEX
S.NO  DATE  EXPERIMENT  PAGE NO  MARKS  SIGNATURE
5.  Comparing Statistical Measures: NumPy vs Pandas
6.  Comparing Statistical Measures: NumPy vs Pandas using dataset
7.  Linear Regression Trendline Plotter Using Matplotlib
AIM :
To create a Python program that reads three integers from the user and displays
them in sorted order.
ALGORITHM :
1. Read three integers from the user.
2. Use the min and max functions to find the smallest and largest values.
PROGRAM :
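A minimal sketch of the algorithm above. The function name sort_three is illustrative, and the three values are hard-coded rather than read with input() so that the sketch runs unattended. Recovering the middle value by subtracting the smallest and largest from the total is an assumed way of completing the listed steps.

```python
def sort_three(a, b, c):
    """Return the three integers in ascending order using min() and max()."""
    smallest = min(a, b, c)
    largest = max(a, b, c)
    # The middle value is whatever remains once the two extremes are removed.
    middle = a + b + c - smallest - largest
    return smallest, middle, largest


# Sample values (assumed); replace with int(input(...)) calls to read from the user.
first, second, third = 12, 5, 9
ordered = sort_three(first, second, third)
print("The numbers in sorted order are:", *ordered)
```

The subtraction trick also handles repeated values, since the sum still accounts for every input exactly once.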
OUTPUT:
RESULT :
AIM:
Create a program that determines whether a given letter of the alphabet is a vowel
or a consonant.
ALGORITHM :
1. Check if the entered character is alphabetical using the isalpha() method.
4. Check if the entered character is 'a', 'e', 'i', 'o', or 'u'. If yes, display a message
indicating it's a vowel.
5. Check if the entered character is 'y'. If yes, display a message indicating it's
sometimes a vowel and sometimes a consonant.
6. If the entered character is not a vowel or 'y', display a message indicating it's
a consonant.
PROGRAM :
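A minimal sketch of the steps above. The function name classify_letter and the returned strings are illustrative; converting the letter to lower case before comparing is an assumption, since the algorithm only lists lower-case vowels. Sample letters are hard-coded in place of input() so the sketch runs unattended.

```python
def classify_letter(ch):
    """Classify a single character following the steps of the algorithm."""
    # Reject anything that is not exactly one alphabetical character.
    if len(ch) != 1 or not ch.isalpha():
        return "invalid"
    ch = ch.lower()  # assumed: treat upper- and lower-case letters alike
    if ch in "aeiou":
        return "vowel"
    if ch == "y":
        return "sometimes a vowel, sometimes a consonant"
    return "consonant"


# Sample letters (assumed); replace with input(...) to read one from the user.
for letter in ["a", "y", "k", "7"]:
    print(letter, "->", classify_letter(letter))
```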
OUTPUT:
RESULT :
The program correctly identifies whether the entered letter is a vowel, consonant or
‘y’ based on the given conditions.
EXP NO DATE
3 Generate a Random Password Using Function
AIM :
To create a Python function that generates a random password within the specified
criteria, and to display the generated password in the main program.
ALGORITHM :
3. Use a loop to generate random characters for the password based on the
random length.
7. In the main program, call the generate_password() function and display the
generated password.
PROGRAM :
import random
def generate_password():
    password_length = random.randint(7, 10)
    password = ''
    for x in range(password_length):
        password += chr(random.randint(33, 126))
    return password
if __name__ == "__main__":
    password = generate_password()
    print("Generated Password:", password)
OUTPUT:
Generated Password: 7WWIeK\3U\
RESULT :
AIM :
The aim is to read student information from a CSV file, display the first five rows
of the data frame, calculate the average age of the students, and filter out students
with grades above a certain threshold.
ALGORITHM :
1. Import the necessary libraries, including Pandas.
2. Read the CSV file containing student information into a Pandas DataFrame.
3. Display the first five rows of the DataFrame using the head() method.
4. Calculate the average age of the students by computing the mean of the 'Age'
column.
6. Filter out the students with grades above the threshold using boolean
indexing.
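PROGRAM :
import pandas as pd
# Read the student records into a DataFrame
df = pd.read_csv('lab-4.csv')
# Display the first five rows
print(df.head())
# Mean of the 'Age' column
average_age = df['Age'].mean()
print("\nAverage age of the students:", average_age)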
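The filtering step described in the aim and algorithm can be sketched as follows. The column name Grade, the threshold value, and the student records are all assumptions made for illustration, and a small in-memory DataFrame stands in for the CSV file.

```python
import pandas as pd

# Made-up records standing in for the contents of the CSV file.
students = pd.DataFrame({
    "Name": ["Asha", "Bala", "Chitra", "Dinesh"],
    "Age": [19, 20, 21, 20],
    "Grade": [82, 67, 91, 74],
})

threshold = 75  # hypothetical cut-off

# The comparison yields a boolean Series; indexing with it keeps the True rows.
above_threshold = students[students["Grade"] > threshold]
print(above_threshold)
```

Boolean indexing returns a new DataFrame and leaves the original untouched, so the same data can be filtered again with a different threshold.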
RESULT :
Thus the program for file handling using pandas dataframe has been successfully
executed.
EXP NO DATE
5 Comparing Statistical Measures: NumPy vs
Pandas
AIM :
The aim of this program is to demonstrate how to generate a NumPy array of
random numbers, convert the same array into a Pandas DataFrame with
appropriate column names, and then calculate the mean, median, and standard
deviation of the data using both NumPy and Pandas functions.
ALGORITHM :
1. Import the necessary libraries: NumPy and Pandas.
3. Convert the NumPy array into a Pandas DataFrame with appropriate column
names.
4. Use NumPy and Pandas functions to calculate the mean, median, and
standard deviation of the data.
PROGRAM :
import numpy as np
import pandas as pd
np_array = np.random.rand(5, 3)
column_names = ['Column_1', 'Column_2', 'Column_3']
df = pd.DataFrame(np_array, columns=column_names)
mean_np = np.mean(np_array)
median_np = np.median(np_array)
std_np = np.std(np_array)
mean_pd = df.mean()
median_pd = df.median()
std_pd = df.std()
print("NumPy Mean:")
print(mean_np)
print("\nPandas Mean:")
print(mean_pd)
print("\nNumPy Median:")
print(median_np)
print("\nPandas Median:")
print(median_pd)
print("\nNumPy Standard Deviation:")
print(std_np)
print("\nPandas Standard Deviation:")
print(std_pd)
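The two libraries do not compute the same quantities by default, which is why the NumPy results above are single numbers and the Pandas results are per-column. NumPy reduces over every element and uses the population standard deviation (ddof=0), while Pandas reduces each column separately and uses the sample standard deviation (ddof=1). A small sketch with illustrative values shows how to make them agree:

```python
import numpy as np
import pandas as pd

values = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]])  # illustrative data
frame = pd.DataFrame(values, columns=["A", "B"])

# NumPy default: one statistic over all six elements.
overall_mean = np.mean(values)

# Matching Pandas: reduce along axis 0 (per column) and use ddof=1 for std.
np_column_means = np.mean(values, axis=0)
np_column_std = np.std(values, axis=0, ddof=1)

print(overall_mean)
print(np.allclose(np_column_means, frame.mean().to_numpy()))
print(np.allclose(np_column_std, frame.std().to_numpy()))
```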
OUTPUT:
NumPy Mean:
0.5107690724628491
Pandas Mean:
Column_1 0.630971
Column_2 0.559101
Column_3 0.518407
dtype: float64
NumPy Median:
0.5644428322384722
Pandas Median:
Column_1 0.676098
Column_2 0.595252
Column_3 0.481189
dtype: float64
RESULT :
AIM :
The aim of this program is to demonstrate how to generate a NumPy array with
random numbers from the dataset, convert the same array into a Pandas DataFrame
with appropriate column names, and then calculate the mean, median, and standard
deviation of the data using both NumPy and Pandas functions.
ALGORITHM :
1. Import the necessary libraries: NumPy and Pandas.
3. Convert the NumPy array into a Pandas DataFrame with appropriate column
names.
4. Use NumPy and Pandas functions to calculate the mean, median, and
standard deviation of the data.
PROGRAM :
import numpy as np
import pandas as pd
np_array=np.random.rand(3,5)
df = pd.read_csv('lab-ex-6.csv')
column_names = ['Tamil','English','Maths','Science','Social']
df = pd.DataFrame(np_array, columns=column_names)
mean_np = np.mean(np_array)
median_np = np.median(np_array)
std_np = np.std(np_array)
mean_pd = df.mean()
median_pd = df.median()
std_pd = df.std()
print("NumPy Mean:")
print(mean_np)
print("\nPandas Mean:")
print(mean_pd)
print("\nNumPy Median:")
print(median_np)
print("\nPandas Median:")
print(median_pd)
print("\nNumPy Standard Deviation:")
print(std_np)
print("\nPandas Standard Deviation:")
print(std_pd)
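Note that the program above reads lab-ex-6.csv but then builds the DataFrame from random numbers, so the values in the file never reach the statistics. A sketch of a dataset-driven variant follows. It assumes the file contains the five subject columns named in the program, and an in-memory CSV with made-up marks stands in for the real file.

```python
import io

import numpy as np
import pandas as pd

# Made-up marks standing in for the contents of the CSV file.
csv_text = """Tamil,English,Maths,Science,Social
78,85,92,88,75
65,70,58,62,80
90,88,95,91,85
"""
column_names = ['Tamil', 'English', 'Maths', 'Science', 'Social']

df = pd.read_csv(io.StringIO(csv_text))
# Take the NumPy array from the dataset itself instead of np.random.rand().
np_array = df[column_names].to_numpy()

print("NumPy mean per subject:", np.mean(np_array, axis=0))
print("Pandas mean per subject:")
print(df[column_names].mean())
```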
OUTPUT:
NumPy Mean:
0.5937982966893395
Pandas Mean:
Tamil 0.217006
English 0.678402
Maths 0.585628
Science 0.613438
Social 0.874517
dtype: float64
NumPy Median:
0.6476541923679712
Pandas Median:
Tamil 0.266269
English 0.847279
Maths 0.518450
Science 0.647654
Social 0.900979
dtype: float64
NumPy Standard Deviation:
0.2741247946668009
Pandas Standard Deviation:
English 0.312321
Maths 0.199324
Science 0.256798
Social 0.081868
dtype: float64
RESULT :
AIM :
The aim of this program is to visually represent the relationship between two
variables in a dataset using a scatter plot. Additionally, the program will utilize
linear regression to add a trendline to the scatter plot, aiding in understanding the
underlying relationship between the variables.
ALGORITHM :
Import the necessary libraries: Matplotlib for plotting, scikit-learn for
linear regression, and NumPy for numerical operations.
Create a scatter plot using Matplotlib, with one variable on the x-axis and
the other variable on the y-axis.
PROGRAM :
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
# Create trendline
trendline = slope * x + intercept
plt.plot(x, trendline, color='red', label='Trendline')
# Show plot
plt.grid(True)
plt.show()
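The program fragment above uses x, slope, and intercept without showing where they come from. The sketch below shows one way to obtain them. It uses np.polyfit with degree 1 rather than the scikit-learn LinearRegression class that the program imports, and the data are synthetic.

```python
import numpy as np

# Synthetic, exactly linear data (assumed for illustration): y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# A degree-1 least-squares fit returns the slope first, then the intercept.
slope, intercept = np.polyfit(x, y, 1)
trendline = slope * x + intercept

print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
```

With real, noisy data the trendline will not pass through every point. It is the straight line that minimises the sum of squared vertical distances to the points.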
OUTPUT:
RESULT :
The scatter plot with the trendline provides a visual representation of the
relationship between the two variables in the dataset.
EXP NO DATE
8 Correlation Analysis using Seaborn
AIM :
The aim of this program is to generate a visual representation (heat map) of the
correlation matrix of a given dataset using Seaborn. The heat map will display the
pairwise correlations between variables in the dataset, with a customized colour
palette and annotations for better interpretation.
ALGORITHM :
PROGRAM :
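import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# df is assumed to be the dataset, already loaded into a DataFrame
correlation_matrix = df.corr(numeric_only=True)
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')  # palette chosen for illustration
plt.title('Correlation Heatmap')
plt.show()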
OUTPUT:
RESULT :
The resulting heat map provides a clear and visual representation of the correlation
structure within the dataset.
EXP NO DATE
9 Plotting Sinusoidal and Co-sinusoidal Trends with
Subplots
AIM :
The aim of the program is to create a figure with multiple subplots using
Matplotlib. Each subplot will display a line plot representing the trend of different
variables. Legends will be added to the plots, and the appearance of the subplots
will be customized.
ALGORITHM :
Import the necessary libraries, including Matplotlib and NumPy.
Define the data for the line plots (e.g., values for x-axis and y-axis for each
variable).
Customize the appearance of the subplots (e.g., set titles, labels, colors,
markers).
PROGRAM :
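import numpy as np
import matplotlib.pyplot as plt
# Sample x values; the range and number of points are chosen for illustration
x = np.linspace(0, 2 * np.pi, 100)
# Two subplots stacked vertically
fig, axs = plt.subplots(2, 1, figsize=(8, 6))
axs[0].plot(x, np.sin(x), color='blue', label='sin(x)')
axs[1].plot(x, np.cos(x), color='red', label='cos(x)')
axs[0].legend()
axs[1].legend()
axs[0].set_title('Sinusoidal Trend')
axs[0].set_xlabel('x')
axs[0].set_ylabel('sin(x)')
axs[1].set_title('Cosinusoidal Trend')
axs[1].set_xlabel('x')
axs[1].set_ylabel('cos(x)')
plt.tight_layout()
plt.show()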
OUTPUT:
RESULT :
The output will be a figure with two subplots, each displaying a line plot
representing the trend of different variables (sin(x) and cos(x)).
EXP NO DATE
10 Handling Missing Data in Pandas: Techniques for
Data Imputation and Cleaning
AIM :
The aim of the program is to handle missing data in a dataset using Pandas. This
involves implementing techniques such as dropping missing values, filling missing
values with mean or median, and forward/backward filling.
ALGORITHM :
Identify missing values in the dataset using methods like isna() or info().
PROGRAM :
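import pandas as pd
data = pd.read_csv('lab-10.csv')
# Count the missing values in each column
missing_values = data.isna().sum()
print("Missing Values:")
print(missing_values)
# Drop every row that contains a missing value
clean_data_dropna = data.dropna()
# Fill missing values with the column mean or median (numeric columns only)
clean_data_mean = data.fillna(data.mean(numeric_only=True))
clean_data_median = data.fillna(data.median(numeric_only=True))
# Forward fill and backward fill
clean_data_ffill = data.ffill()
clean_data_bfill = data.bfill()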
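The effect of each technique is easiest to see on a tiny column. The sketch below uses a made-up single-column DataFrame, not the lab dataset, and prints the result of each fill strategy.

```python
import numpy as np
import pandas as pd

# Made-up values with two gaps.
data = pd.DataFrame({"score": [10.0, np.nan, 30.0, np.nan, 50.0]})

dropped = data.dropna()                 # rows containing NaN are removed
mean_filled = data.fillna(data.mean())  # NaN replaced by the column mean
forward = data.ffill()                  # NaN replaced by the previous valid value
backward = data.bfill()                 # NaN replaced by the next valid value

print(mean_filled["score"].tolist())
print(forward["score"].tolist())
print(backward["score"].tolist())
```

Dropping rows shrinks the dataset, whereas the fill methods keep every row. Forward and backward filling are usually meaningful only when the row order carries information, as in a time series.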
OUTPUT:
RESULT :
Thus the program for handling missing data in a dataset using Pandas by
implementing techniques such as dropping missing values, filling missing values
with mean or median, and forward/backward filling has been executed
successfully.
EXP NO DATE
11 Generating Hashed Features for Categorical Variables
AIM :
The aim of the program is to perform feature hashing on a categorical variable
using either Pandas or Scikit-learn. This involves converting categorical variables
into a numerical format by applying a hash function.
ALGORITHM :
Optionally, add the hashed features to the original dataset or create a new
DataFrame with hashed features.
PROGRAM :
1) Using Pandas
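import pandas as pd
import hashlib
data = pd.read_csv('lab-11.csv')
# categorical_column must hold the name of the column to hash
data['hashed_feature'] = data[categorical_column].apply(
    lambda x: hashlib.sha1(str(x).encode('utf-8')).hexdigest())
print("Modified Dataset with Hashed Feature:")
print(data.head())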
2) Using scikit-learn
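import pandas as pd
from sklearn.feature_extraction import FeatureHasher
data = pd.read_csv('lab-11.csv')
hasher = FeatureHasher(input_type='string')
# categorical_column must hold the name of the column to hash.
# FeatureHasher expects each sample to be an iterable of strings,
# so every value is wrapped in a one-element list.
samples = [[str(value)] for value in data[categorical_column]]
hashed_features = hasher.fit_transform(samples)
hashed_data = pd.DataFrame(hashed_features.toarray())
print("Modified Dataset with Hashed Feature:")
print(hashed_data.head())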
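A SHA-1 hex digest is still a string, so on its own it does not give a numerical feature. One common way to obtain a number is to reduce the digest modulo a fixed number of buckets. The sketch below illustrates that idea with the standard library only; the function name hash_bucket, the bucket count, and the sample labels are assumptions for illustration.

```python
import hashlib


def hash_bucket(value, n_buckets=8):
    """Map a category label to an integer bucket in the range [0, n_buckets)."""
    digest = hashlib.sha1(str(value).encode("utf-8")).hexdigest()
    # Interpret the hex digest as an integer, then fold it into the bucket range.
    return int(digest, 16) % n_buckets


# Sample labels (assumed); the repeated label must land in the same bucket.
categories = ["group A", "group B", "group C", "group B"]
buckets = [hash_bucket(c) for c in categories]
print(buckets)
```

Because the number of buckets is fixed, two different categories can share a bucket. A larger bucket count reduces such collisions at the cost of more features.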
OUTPUT:
1) Using Pandas
Modified Dataset with Hashed Feature:
gender race/ethnicity parental level of education lunch \
0 female group B bachelor's degree standard
1 female group C some college standard
2 female group B master's degree standard
3 male group A associate's degree free/reduced
4 male group C some college standard
hashed_feature
0 1f1362ea41d1bc65be321c0a378a20159f9a26d0
1 b37f6ddcefad7e8657837d3177f9ef2462f98acf
2 08a35293e09f508494096c1c1b3819edb9df50db
3 98fbc42faedc02492397cb5962ea3a3ffc0a9243
4 450ddec8dd206c2e2ab1aeeaa90e85e51753b8b7
2) Using scikit-learn
Modified Dataset with Hashed Feature:
0 1 2 3 4 5 6 7 \
0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
RESULT :
The output will be the modified dataset containing hashed features for the
categorical variable. It will include the original dataset with an additional column
representing the hashed feature(s).
EXP NO DATE
12 Exploring Different Types of Joins between Two
DataFrames
AIM :
The aim of the program is to merge two datasets based on a common column using
Pandas. This involves performing an inner join, left join, and right join between the
datasets.
ALGORITHM :
Import the necessary libraries, including Pandas.
Perform an inner join between the datasets using the pd.merge() function.
Perform a left join between the datasets using the pd.merge() function with
the how='left' parameter.
Perform a right join between the datasets using the pd.merge() function with
the how='right' parameter.
PROGRAM :
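import pandas as pd
df1 = pd.read_csv('lab-12(df1).csv')
df2 = pd.read_csv('lab-12(df2).csv')
common_column = 'Close'
# Merge the two datasets on the common column with each join type
inner_join = pd.merge(df1, df2, on=common_column)
left_join = pd.merge(df1, df2, on=common_column, how='left')
right_join = pd.merge(df1, df2, on=common_column, how='right')
print("Inner Join:")
print(inner_join.head())
print("\nLeft Join:")
print(left_join.head())
print("\nRight Join:")
print(right_join.head())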
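The difference between the three join types is easiest to see on two tiny tables whose keys only partly overlap. The DataFrames and column names below are made up for illustration and are unrelated to the lab CSV files.

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

inner = pd.merge(left, right, on="key")                  # keys present in both tables
left_j = pd.merge(left, right, on="key", how="left")     # every key from the left table
right_j = pd.merge(left, right, on="key", how="right")   # every key from the right table

print(inner)
print(left_j)
print(right_j)
```

Rows without a match on the other side are kept by the left and right joins, with NaN in the columns that come from the table where the key is missing.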
OUTPUT:
RESULT:
The output will be the merged datasets for each type of join. Each merged dataset
will contain rows from the original datasets based on the specified join type (inner,
left, or right), along with columns from both datasets.
EXP NO DATE
13 Comparison of Numerical Variable Distributions
Across Categories Using Box Plots
AIM :
The aim of this program is to create a box plot using Matplotlib or Seaborn to
visualize the distribution of a numerical variable across different categories, and
to add appropriate labels and a title to the plot.
ALGORITHM :
PROGRAM :
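A minimal sketch of such a box plot using Matplotlib. The column names category and value and the data are made up for illustration; a real run would read them from the lab dataset instead.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for three hypothetical categories.
df = pd.DataFrame({
    "category": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "value": [12, 15, 14, 22, 25, 21, 9, 11, 30],
})

# Build one list of values per category, in sorted category order.
labels = sorted(df["category"].unique())
groups = [df.loc[df["category"] == label, "value"].tolist() for label in labels]

fig, ax = plt.subplots(figsize=(8, 5))
ax.boxplot(groups)  # boxes are drawn at positions 1, 2, 3, ...
ax.set_xticks(range(1, len(labels) + 1))
ax.set_xticklabels(labels)
ax.set_title("Distribution of value by category")
ax.set_xlabel("Category")
ax.set_ylabel("Value")
# Call plt.show() to display the figure interactively.
```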
OUTPUT:
RESULT :
Thus the box plot program has been executed successfully. The plot carries
appropriate labels for the x-axis and y-axis and a title, making it easy to
interpret the distribution of the numerical variable across different categories.
EXP NO DATE
14 Bar Chart of Category Frequency Distribution
AIM :
The aim of this program is to visualize the frequency distribution of
categories present in a dataset using a bar chart.
ALGORITHM :
Import necessary libraries: Pandas, Matplotlib
Read the dataset from a CSV file using pandas
Compute the frequency counts of each category in the dataset.
Create a bar chart to visualize the frequency distribution of categories
Set the title of the plot and label the x and y axes
Rotate the x-axis labels to prevent overcrowding
Display the plot
PROGRAM :
import pandas as pd
import matplotlib.pyplot as plt
# Load the dataset
df = pd.read_csv('lab-14.csv')
category_counts = df['Category'].value_counts()
# Plotting the bar chart
plt.figure(figsize=(10, 6))
category_counts.plot(kind='bar')
plt.title('Frequency Distribution of Categories')
plt.xlabel('Category')
plt.ylabel('Frequency')
plt.xticks(rotation=45)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
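The counting step that feeds the bar chart can be checked on its own. The sketch below applies value_counts() to a small made-up Series rather than to the Category column of the lab dataset.

```python
import pandas as pd

# Made-up category labels.
categories = pd.Series(
    ["Books", "Toys", "Books", "Games", "Books", "Toys"], name="Category"
)

# value_counts() returns the frequency of each label, most frequent first.
counts = categories.value_counts()
print(counts)
```

Because the result is sorted by frequency, the bars in the chart appear from the most common category to the least common.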
OUTPUT:
RESULT :
Thus, the resulting chart shows the frequency distribution of categories in the
dataset.
EXP NO DATE
15 Correlation Analysis using Scatter Plot
AIM :
The aim of this Python program is to calculate the correlation coefficient
between two numerical variables from a CSV dataset and visualize their
correlation using a scatter plot.
ALGORITHM :
Import the necessary libraries: pandas for data manipulation and
matplotlib.pyplot for plotting.
Load the dataset from a CSV file into a pandas DataFrame.
Extract two numerical variables from the DataFrame.
Calculate the correlation coefficient between the two variables using the
corr() function.
Print the calculated correlation coefficient.
Visualize the correlation using a scatter plot.
Display the scatter plot.
PROGRAM :
import pandas as pd
import matplotlib.pyplot as plt
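The remaining steps of the algorithm can be sketched as follows. The column names variable1 and variable2 follow the output shown in this experiment, but the values are made up and a small in-memory DataFrame stands in for the CSV file, so the coefficient printed here will differ from the one in the output.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values standing in for two numerical columns of the dataset.
df = pd.DataFrame({
    "variable1": [1, 2, 3, 4, 5],
    "variable2": [2, 4, 5, 4, 5],
})

# Series.corr() computes the Pearson correlation coefficient by default.
correlation = df["variable1"].corr(df["variable2"])
print("Correlation Coefficient between variable1 and variable2:", correlation)

# Scatter plot of the two variables.
fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(df["variable1"], df["variable2"])
ax.set_title("Scatter Plot of variable1 vs variable2")
ax.set_xlabel("variable1")
ax.set_ylabel("variable2")
ax.grid(True)
# Call plt.show() to display the figure interactively.
```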
OUTPUT:
Correlation Coefficient between variable1 and variable2: 0.9222499938509208
RESULT :
This Python program for Correlation analysis using scatter plot has been
executed successfully.