
Mkce Python Lab Manual

The document discusses a Python programming course for data science. It includes an index listing experiments students will complete related to data analysis and machine learning techniques using Python libraries like Pandas and NumPy.

Department of Artificial Intelligence and Data Science

PYTHON PROGRAMMING
FOR DATA SCIENCE
2023 -2024

STUDENT NAME

SUBJECT NAME/CODE

SEMESTER/YEAR
INDEX

S.NO  DATE  EXPERIMENT  PAGE NO  MARKS  SIGNATURE

1. Sort the given three integers in the ascending order
2. Finding whether a given letter is a vowel or a consonant
3. Generate a Random Password Using Function
4. File handling using Pandas Dataframe
5. Comparing Statistical Measures: NumPy vs Pandas
6. Comparing Statistical Measures: NumPy vs Pandas using dataset
7. Linear Regression Trendline Plotter Using Matplotlib
8. Correlation Analysis using Seaborn
9. Plotting Sinusoidal and Co-sinusoidal Trends with Subplots
10. Handling Missing Data in Pandas: Techniques for Data Imputation and Cleaning
11. Generating Hashed Features for Categorical Variables
12. Exploring Different Types of Joins between Two DataFrames
13. Comparison of Numerical Variable Distributions Across Categories Using Box Plots
14. Bar Chart of Category Frequency Distribution
15. Correlation Analysis using Scatter Plot
EXP NO DATE
1 Sort the given three integers in the ascending order

AIM :
To create a Python program that reads three integers from the user and displays
them in sorted order.

ALGORITHM :
1. Read three integers from the user.

2. Use the min and max functions to find the smallest and largest values.

3. Compute the sum of all three values.

4. Calculate the middle value by subtracting the minimum and maximum
values from the sum.

5. Display the three integers in sorted order.
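
The min/max/sum steps above can be sketched as a small function, which makes the idea easy to check without typing input each time. The function name sort_three is a hypothetical helper chosen for illustration and is not part of the lab program.

```python
def sort_three(a, b, c):
    """Return (smallest, middle, largest) using the min/max/sum approach."""
    smallest = min(a, b, c)
    largest = max(a, b, c)
    # The three values add up to smallest + middle + largest,
    # so the middle value is whatever remains after removing the extremes.
    middle = (a + b + c) - smallest - largest
    return smallest, middle, largest


if __name__ == "__main__":
    print("Integers in sorted order:", *sort_three(7, 2, 5))
```

The subtraction also works when two or all three values are equal, because min and max then simply return the repeated value.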

PROGRAM :

# Read the three integers from the user
num1 = int(input("Enter the first integer: "))
num2 = int(input("Enter the second integer: "))
num3 = int(input("Enter the third integer: "))

# Find the extremes, then derive the middle value from the sum
smallest = min(num1, num2, num3)
largest = max(num1, num2, num3)
total_sum = num1 + num2 + num3
middle = total_sum - smallest - largest

print("Integers in sorted order: ", smallest, middle, largest)

OUTPUT:

Enter the first integer:


Enter the second integer:
Enter the third integer:
Integers in sorted order:

RESULT :

The program displays the integers in sorted order.


EXP NO DATE
2 Finding whether a given letter is a vowel or a consonant

AIM:
Create a program that determines whether a given letter of the alphabet is a vowel
or a consonant.

ALGORITHM :
1. Read a character from the user and convert it to lowercase.

2. Check whether the entered character is alphabetical using the isalpha() method.

3. If the character is 'a', 'e', 'i', 'o', or 'u', display a message indicating it's
a vowel.

4. If the character is 'y', display a message indicating it's sometimes a vowel
and sometimes a consonant.

5. If the character is any other letter, display a message indicating it's
a consonant.

6. If the character is not a letter, use the isdigit() method to check whether it
is a number and display a matching message; otherwise report that it is neither
a letter nor a number.
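
The decision steps above can be sketched as a function that returns a label instead of printing, which makes each branch easy to test. The function name classify_character and the extra "invalid" branch for input that is not exactly one character are assumptions added for illustration; the lab program does not check the input length.

```python
VOWELS = {"a", "e", "i", "o", "u"}


def classify_character(text):
    """Classify a single character as vowel, consonant, 'y', number, or other."""
    ch = text.strip().lower()
    if len(ch) != 1:
        # Assumption: anything other than exactly one character is rejected.
        return "invalid"
    if ch.isalpha():
        if ch in VOWELS:
            return "vowel"
        if ch == "y":
            return "sometimes vowel, sometimes consonant"
        # Note: isalpha() is also True for letters outside the English alphabet.
        return "consonant"
    if ch.isdigit():
        return "number"
    return "other"


if __name__ == "__main__":
    for sample in ["A", "s", "y", "7", "?"]:
        print(repr(sample), "->", classify_character(sample))
```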

PROGRAM :

# Read the input and normalise it to lowercase
character = input("Enter a letter of the alphabet or a number: ").lower()

if character.isalpha():
    if character in ['a', 'e', 'i', 'o', 'u']:
        print("The entered character is a vowel.")
    elif character == 'y':
        print("Sometimes y is a vowel, and sometimes y is a consonant.")
    else:
        print("The entered character is a consonant.")
elif character.isdigit():
    print("The entered character is a number.")
else:
    print("The entered character is not a letter or a number.")

OUTPUT:

Enter a letter of the alphabet or a number: a


The entered character is a vowel.

Enter a letter of the alphabet or a number: s


The entered character is a consonant.

Enter a letter of the alphabet or a number: y


Sometimes y is a vowel, and sometimes y is a consonant.

RESULT :
The program correctly identifies whether the entered letter is a vowel, consonant or
‘y’ based on the given conditions.
EXP NO DATE
3 Generate a Random Password Using Function

AIM :
To create a Python function that generates a random password within the specified
criteria and display the generated password in the main program.

ALGORITHM :

1. Define a function generate_password() that takes no parameters.

2. Generate a random length for the password between 7 and 10 characters.

3. Use a loop to generate random characters for the password based on the
random length.

4. Each character should be randomly selected from positions 33 to 126 in the
ASCII table.

5. Concatenate the random characters to form the password.

6. Return the generated password from the function.

7. In the main program, call the generate_password() function and display the
generated password.
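
Python's random module is documented as unsuitable for security purposes, so a password meant for real use would normally draw from the secrets module instead. The sketch below follows the same steps with that substitution; the function name generate_secure_password and its parameters are hypothetical and chosen only for this illustration.

```python
import secrets

# Printable, non-space ASCII characters: code points 33 to 126 inclusive.
ALPHABET = [chr(code) for code in range(33, 127)]


def generate_secure_password(min_length=7, max_length=10):
    """Build a password of random length from the ASCII range 33-126."""
    # randbelow(n) returns an integer in 0..n-1, so the length
    # falls between min_length and max_length inclusive.
    length = min_length + secrets.randbelow(max_length - min_length + 1)
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print("Generated Password:", generate_secure_password())
```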
PROGRAM :

import random

def generate_password():
    # Pick a length between 7 and 10 characters (both inclusive)
    password_length = random.randint(7, 10)
    password = ''
    for x in range(password_length):
        # Append one character from ASCII positions 33 to 126
        password += chr(random.randint(33, 126))
    return password

if __name__ == "__main__":
    password = generate_password()
    print("Generated Password:", password)

OUTPUT:
Generated Password: 7WWIeK\3U\

RESULT :

The program successfully generates a random password with a length between 7
and 10 characters, with each character randomly selected from positions 33 to 126
in the ASCII table.
EXP NO DATE
4 File handling using Pandas Dataframe

AIM :
The aim is to read student information from a CSV file, display the first five rows
of the data frame, calculate the average age of the students, and filter out students
with grades above a certain threshold.

ALGORITHM :
1. Import the necessary libraries, including Pandas.

2. Read the CSV file containing student information into a Pandas DataFrame.

3. Display the first five rows of the DataFrame using the head() method.

4. Calculate the average age of the students by computing the mean of the 'Age'
column.

5. Prompt the user to enter a grade threshold.

6. Filter out the students with grades above the threshold using boolean
indexing.

7. Display the filtered DataFrame.
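
To try the steps above without the lab's CSV file, the sketch below reads a tiny in-memory CSV instead. The rows, the helper name students_at_or_below, and the threshold of 65 are made up for illustration; only the column names Age and Grade follow the lab program. The mask keeps grades less than or equal to the threshold, which is what removes the students above it.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the lab's CSV file.
CSV_TEXT = """ID,Name,Age,Grade
1,Student A,20,85
2,Student B,22,40
3,Student C,21,65
"""


def students_at_or_below(df, threshold):
    """Keep rows whose Grade does not exceed the threshold (boolean indexing)."""
    return df[df["Grade"] <= threshold]


df = pd.read_csv(io.StringIO(CSV_TEXT))
average_age = df["Age"].mean()
filtered = students_at_or_below(df, 65)

print("Average age:", average_age)
print(filtered)
```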


PROGRAM :

import pandas as pd

# Load the student records
df = pd.read_csv('lab-4.csv')

print("First five rows of the DataFrame:")
print(df.head())

average_age = df['Age'].mean()
print("\nAverage age of the students:", average_age)

threshold = float(input("\nEnter the grade threshold: "))

# Keep only the students whose grade does not exceed the threshold
filtered_df = df[df['Grade'] <= threshold]

print("\nStudents with grades less than or equal to the threshold:")
print(filtered_df)
OUTPUT:

First five rows of the DataFrame:
         ID          Name   Age  Grade
0       NaN           NaN   NaN    NaN
1  111111.0      John Doe  23.0   90.0
2  111112.0    Jane Smith  12.0   70.0
3  111113.0  Sarah Thomas  23.0   45.0
4  111114.0   Frank Brown  18.0   80.0

Average age of the students: 21.285714285714285

Enter the grade threshold: 30

Students with grades less than or equal to the threshold:
          ID            Name   Age  Grade
5   111115.0      Mike Davis  19.0   20.0
8   111118.0      Fred Clark  26.0   23.0
9   111119.0       Bob Lopez  20.0   12.0
11  111121.0  Ferik Anderson  24.0   23.0
13  111123.0    Feliz antony  21.0   25.0

RESULT :
Thus the program for file handling using pandas dataframe has been successfully
executed.
EXP NO DATE
5 Comparing Statistical Measures: NumPy vs
Pandas

AIM :
The aim of this program is to generate a NumPy array of random numbers, convert
the same array into a Pandas DataFrame with appropriate column names, and then
calculate the mean, median, and standard deviation of the data using both NumPy
and Pandas functions.

ALGORITHM :
1. Import the necessary libraries: NumPy and Pandas.

2. Generate a NumPy array with random numbers using the numpy.random.rand()
function.

3. Convert the NumPy array into a Pandas DataFrame with appropriate column
names.

4. Use NumPy and Pandas functions to calculate the mean, median, and
standard deviation of the data.

5. Print the results.
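
When comparing the two libraries it helps to know that their defaults differ: np.mean, np.median and np.std reduce over the whole array unless an axis is given, and np.std uses the population formula (ddof=0), whereas DataFrame.mean, median and std work column by column and std uses the sample formula (ddof=1). The sketch below, on a small hand-made array rather than random data, shows one way to make the NumPy results line up with the Pandas ones.

```python
import numpy as np
import pandas as pd

# Small fixed array so the numbers are easy to verify by hand.
values = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]])
df = pd.DataFrame(values, columns=["Column_1", "Column_2"])

# NumPy defaults: one number for the whole array, population std (ddof=0).
np_std_all = np.std(values)

# Matching the Pandas defaults: per column (axis=0), sample std (ddof=1).
np_mean_cols = np.mean(values, axis=0)
np_median_cols = np.median(values, axis=0)
np_std_cols = np.std(values, axis=0, ddof=1)

print("NumPy std over all values:", np_std_all)
print("NumPy std per column (ddof=1):", np_std_cols)
print("Pandas std per column:")
print(df.std())
```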

PROGRAM :

import numpy as np
import pandas as pd

np_array = np.random.rand(5, 3)
column_names = ['Column_1', 'Column_2', 'Column_3']
df = pd.DataFrame(np_array, columns=column_names)

mean_np = np.mean(np_array)

median_np = np.median(np_array)

std_np = np.std(np_array)

mean_pd = df.mean()
median_pd = df.median()
std_pd = df.std()

print("NumPy Mean:")
print(mean_np)
print("\nPandas Mean:")
print(mean_pd)
print("\nNumPy Median:")
print(median_np)
print("\nPandas Median:")
print(median_pd)
print("\nNumPy Standard Deviation:")
print(std_np)
print("\nPandas Standard Deviation:")
print(std_pd)
OUTPUT:
NumPy Mean:
0.5107690724628491

Pandas Mean:
Column_1 0.630971
Column_2 0.559101
Column_3 0.518407
dtype: float64

NumPy Median:
0.5644428322384722

Pandas Median:
Column_1 0.676098
Column_2 0.595252
Column_3 0.481189
dtype: float64

NumPy Standard Deviation:
0.21253242293540204

Pandas Standard Deviation:
Column_1 0.147784
Column_2 0.085243
Column_3 0.276785
dtype: float64

RESULT :

The program successfully demonstrates the statistical comparison of NumPy vs
Pandas.
EXP NO DATE
6 Comparing Statistical Measures: NumPy vs
Pandas using dataset

AIM :
The aim of this program is to demonstrate how to generate a NumPy array with
random numbers from the dataset, convert the same array into a Pandas DataFrame
with appropriate column names, and then calculate the mean, median, and standard
deviation of the data using both NumPy and Pandas functions.

ALGORITHM :
1. Import the necessary libraries: NumPy and Pandas.

2. Create a NumPy array holding the data and load the dataset into a DataFrame.

3. Convert the NumPy array into a Pandas DataFrame with appropriate column
names.

4. Use NumPy and Pandas functions to calculate the mean, median, and
standard deviation of the data.

5. Print the results.
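
A minimal sketch of computing the same statistics from a dataset's own values rather than from a random array. The marks below are invented and read from an in-memory CSV so the example runs on its own; only the subject column names are taken from the lab program. DataFrame.to_numpy() supplies the NumPy array, and axis=0 with ddof=1 makes the NumPy results comparable with the Pandas defaults.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical marks; the real lab CSV is not reproduced here.
CSV_TEXT = """Tamil,English,Maths,Science,Social
80,70,90,85,75
60,65,70,75,80
70,75,80,65,85
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Take the NumPy array from the DataFrame so both libraries see the same data.
np_array = df.to_numpy(dtype=float)

col_mean_np = np.mean(np_array, axis=0)
col_median_np = np.median(np_array, axis=0)
col_std_np = np.std(np_array, axis=0, ddof=1)  # sample std, as in df.std()

print("NumPy column means:", col_mean_np)
print("Pandas column means:")
print(df.mean())
```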

PROGRAM :
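import numpy as np
import pandas as pd

# Random 3x5 array: three rows, one column per subject
np_array = np.random.rand(3, 5)

# The CSV is read here, but df is reassigned below, so the statistics
# and the recorded output are computed from the random array
df = pd.read_csv('lab-ex-6.csv')

column_names = ['Tamil','English','Maths','Science','Social']
df = pd.DataFrame(np_array, columns=column_names)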
mean_np = np.mean(np_array)
median_np = np.median(np_array)
std_np = np.std(np_array)

mean_pd = df.mean()
median_pd = df.median()
std_pd = df.std()

print("NumPy Mean:")
print(mean_np)
print("\nPandas Mean:")
print(mean_pd)
print("\nNumPy Median:")
print(median_np)
print("\nPandas Median:")
print(median_pd)
print("\nNumPy Standard Deviation:")
print(std_np)
print("\nPandas Standard Deviation:")
print(std_pd)

OUTPUT:
NumPy Mean:
0.5937982966893395

Pandas Mean:
Tamil      0.217006
English    0.678402
Maths      0.585628
Science    0.613438
Social     0.874517
dtype: float64

NumPy Median:
0.6476541923679712

Pandas Median:
Tamil      0.266269
English    0.847279
Maths      0.518450
Science    0.647654
Social     0.900979
dtype: float64

NumPy Standard Deviation:
0.2741247946668009

Pandas Standard Deviation:
Tamil      0.105272
English    0.312321
Maths      0.199324
Science    0.256798
Social     0.081868
dtype: float64

RESULT :

The program successfully demonstrates the statistical comparison of NumPy vs
Pandas.
EXP NO DATE
7 Linear Regression Trendline Plotter Using
Matplotlib

AIM :
The aim of this program is to visually represent the relationship between two
variables in a dataset using a scatter plot. Additionally, the program will utilize
linear regression to add a trendline to the scatter plot, aiding in understanding the
underlying relationship between the variables.

ALGORITHM :
• Import the necessary libraries: Matplotlib for plotting, scikit-learn for
linear regression, Pandas for loading the dataset, and NumPy for numerical
operations.

• Load or generate the dataset containing two variables.

• Create a scatter plot using Matplotlib, with one variable on the x-axis and
the other variable on the y-axis.

• Use linear regression to fit a trendline to the scatter plot data.

• Plot the trendline on the scatter plot.

• Display the scatter plot with the trendline.
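
The fitting step can also be sketched without scikit-learn: numpy.polyfit with degree 1 returns the slope and intercept of the least-squares line. This is an alternative to the LinearRegression model, shown on made-up points that lie exactly on y = 2x + 1 so the result is easy to verify; the helper name fit_trendline is hypothetical.

```python
import numpy as np


def fit_trendline(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    # For deg=1, polyfit returns the coefficients highest power first.
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


# Made-up points lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

slope, intercept = fit_trendline(x, y)
trendline = slope * x + intercept  # y-values of the fitted line at each x

print("slope:", slope, "intercept:", intercept)
```

The trendline array can be passed straight to plt.plot(x, trendline) on top of the scatter plot.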

PROGRAM :
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression

# Load the dataset (replace the file and column names with your own)
df = pd.read_csv('lab-7.csv')
x = df['User ID'].values.reshape(-1, 1)  # scikit-learn expects a 2-D feature array
y = df['Post ID']

# Create scatter plot
plt.scatter(x, y, color='blue', label='Data Points')

# Fit linear regression model
model = LinearRegression()
model.fit(x, y)

# Get slope and intercept of the fitted line
slope = model.coef_[0]
intercept = model.intercept_

# Create trendline
trendline = slope * x + intercept
plt.plot(x, trendline, color='red', label='Trendline')

# Add labels and legend
plt.xlabel('User ID')
plt.ylabel('Post ID')
plt.title('Scatter Plot with Trendline')
plt.legend()

# Show plot
plt.grid(True)
plt.show()
OUTPUT:

RESULT :
The scatter plot with the trendline provides a visual representation of the
relationship between the two variables in the dataset.
EXP NO DATE
8 Correlation Analysis using Seaborn

AIM :
The aim of this program is to generate a visual representation (heat map) of the
correlation matrix of a given dataset using Seaborn. The heat map will display the
pairwise correlations between variables in the dataset, with a customized colour
palette and annotations for better interpretation.

ALGORITHM :

1. Import necessary libraries: Seaborn, Pandas (for loading dataset).


2. Load the dataset into a Pandas DataFrame.
3. Compute the correlation matrix using Pandas DataFrame's .corr() method.
4. Use Seaborn's heatmap() function to plot the correlation matrix as a
heatmap.
5. Customize the color palette and add annotations to the heatmap for better
visualization.
6. Display the heatmap.
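
Before plotting, it can help to look at what .corr() actually returns. The sketch below uses a small made-up DataFrame (not the lab dataset) in which one column rises with another and one falls, so the expected coefficients of +1 and -1 are easy to recognise. The resulting matrix is what would be passed to Seaborn's heatmap().

```python
import pandas as pd

# Made-up data: b rises with a, c falls as a rises.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 4, 3, 2, 1],
})

# Square matrix of pairwise Pearson correlations, labelled by column name.
corr = df.corr()
print(corr.round(2))
```

Each cell holds the correlation between the row's column and the column's column, so the diagonal is always 1 and the matrix is symmetric.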

PROGRAM :

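import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('/content/Diabetes CSV - Lab exp 7.csv')

# Pairwise correlations between the numeric columns
# (the numeric_only argument requires pandas 1.5 or later)
correlation_matrix = df.corr(numeric_only=True)

# annot=True writes each coefficient into its cell
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()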

OUTPUT:

RESULT :

The resulting heat map provides a clear and visual representation of the correlation
structure within the dataset.
EXP NO DATE
9 Plotting Sinusoidal and Co-sinusoidal Trends with
Subplots

AIM :
The aim of the program is to create a figure with multiple subplots using
Matplotlib. Each subplot will display a line plot representing the trend of different
variables. Legends will be added to the plots, and the appearance of the subplots
will be customized.

ALGORITHM :
• Import the necessary libraries, including Matplotlib and NumPy.

• Define the data for the line plots (e.g., values for x-axis and y-axis for each
variable).

• Create a figure with multiple subplots using plt.subplots().

• Plot each line on its own subplot by calling plot() on the corresponding Axes
object.

• Add a legend to each subplot by calling legend() on its Axes object.

• Customize the appearance of the subplots (e.g., set titles, labels, colors,
markers).

• Show the plot using plt.show().

PROGRAM :

import matplotlib.pyplot as plt
import numpy as np

# 100 evenly spaced x-values between 0 and 10
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)

# Two subplots stacked vertically; axs holds one Axes object per subplot
fig, axs = plt.subplots(2)

axs[0].plot(x, y1, label='sin(x)', color='blue')
axs[1].plot(x, y2, label='cos(x)', color='red')

axs[0].legend()
axs[1].legend()

axs[0].set_title('Sinusoidal Trend')
axs[0].set_xlabel('x')
axs[0].set_ylabel('sin(x)')
axs[1].set_title('Cosinusoidal Trend')
axs[1].set_xlabel('x')
axs[1].set_ylabel('cos(x)')

# Keep the lower title from overlapping the upper x-axis label
plt.tight_layout()
plt.show()
OUTPUT:

RESULT :
The output will be a figure with two subplots, each displaying a line plot
representing the trend of different variables (sin(x) and cos(x)).
EXP NO DATE
10 Handling Missing Data in Pandas: Techniques for
Data Imputation and Cleaning

AIM :
The aim of the program is to handle missing data in a dataset using Pandas. This
involves implementing techniques such as dropping missing values, filling missing
values with mean or median, and forward/backward filling.

ALGORITHM :

• Import the necessary libraries, including Pandas.

• Load the dataset into a Pandas DataFrame.

• Identify missing values in the dataset using methods like isna() or info().

• Implement techniques to handle missing data:

  - Dropping missing values using dropna() method.

  - Filling missing values with mean or median using fillna() method.

  - Forward filling missing values using ffill() method.

  - Backward filling missing values using bfill() method.

• Display the modified dataset after handling missing data.
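
The effect of each technique listed above is easiest to see on a column small enough to check by hand. The sketch below uses a made-up single-column DataFrame with two missing values; the column name score is hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up column with two missing entries.
data = pd.DataFrame({"score": [10.0, np.nan, 30.0, np.nan, 50.0]})

print("Missing values per column:")
print(data.isna().sum())

dropped = data.dropna()                                      # rows with NaN removed
mean_filled = data.fillna(data.mean(numeric_only=True))      # NaN -> column mean (30)
median_filled = data.fillna(data.median(numeric_only=True))  # NaN -> column median (30)
forward = data.ffill()                                       # NaN -> previous valid value
backward = data.bfill()                                      # NaN -> next valid value

print(forward["score"].tolist())
print(backward["score"].tolist())
```

Forward and backward filling only make sense when the row order carries meaning, for example in time series; a leading NaN cannot be forward filled and a trailing NaN cannot be backward filled.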

PROGRAM :

import pandas as pd

data = pd.read_csv('lab-10.csv')

# Count the missing values in each column
missing_values = data.isna().sum()
print("Missing Values:")
print(missing_values)

# Drop every row that contains at least one missing value
clean_data_dropna = data.dropna()

# Fill numeric columns with their mean / median
# (numeric_only=True skips text columns, which have no mean or median)
clean_data_mean = data.fillna(data.mean(numeric_only=True))
clean_data_median = data.fillna(data.median(numeric_only=True))

# Propagate the previous / next valid value into the gaps
clean_data_ffill = data.ffill()
clean_data_bfill = data.bfill()

print("\nCleaned Data (Dropped missing values):")
print(clean_data_dropna.head())

print("\nCleaned Data (Filled with mean):")
print(clean_data_mean.head())

print("\nCleaned Data (Filled with median):")
print(clean_data_median.head())

print("\nCleaned Data (Forward filled):")
print(clean_data_ffill.head())

print("\nCleaned Data (Backward filled):")
print(clean_data_bfill.head())

OUTPUT:
RESULT :

Thus the program for handling missing data in a dataset using Pandas by
implementing techniques such as dropping missing values, filling missing values
with mean or median, and forward/backward filling has been executed
successfully.
EXP NO DATE
11 Generating Hashed Features for Categorical Variables

AIM :
The aim of the program is to perform feature hashing on a categorical variable
using either Pandas or Scikit-learn. This involves converting categorical variables
into a numerical format by applying a hash function.

ALGORITHM :

• Import the necessary libraries, including Pandas or Scikit-learn.

• Load the dataset containing categorical variables into a DataFrame.

• Identify the categorical variable(s) that need to be hashed.

• Apply a hash function to convert the categorical variable(s) into numerical
format.

• Optionally, add the hashed features to the original dataset or create a new
DataFrame with hashed features.

• Display the modified dataset with hashed features.
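
The core idea of feature hashing, mapping each category to one of a fixed number of numeric buckets, can be sketched with the standard library alone. The helper name hash_bucket, the bucket count of 8, and the sample categories are assumptions for illustration. scikit-learn's FeatureHasher uses its own hash function, so its column indices will not match the ones produced here.

```python
import hashlib


def hash_bucket(value, n_buckets=8):
    """Map a categorical value to an integer bucket in the range 0..n_buckets-1."""
    digest = hashlib.sha1(str(value).encode("utf-8")).hexdigest()
    # Interpret the hexadecimal digest as an integer, then fold it into the range.
    return int(digest, 16) % n_buckets


# Made-up categories; equal values always land in the same bucket.
categories = ["group A", "group B", "group A", "group C"]
buckets = [hash_bucket(c) for c in categories]
print(buckets)
```

Because the number of buckets is fixed in advance, different categories can collide in the same bucket; a larger n_buckets makes collisions less likely at the cost of more features.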

PROGRAM :

1) Using Pandas

import pandas as pd
import hashlib

data = pd.read_csv('lab-11.csv')
categorical_column = 'Categorical Variable'

# Replace each value by the hexadecimal SHA-1 digest of its string form
data['hashed_feature'] = data[categorical_column].apply(
    lambda x: hashlib.sha1(str(x).encode('utf-8')).hexdigest())

print("Modified Dataset with Hashed Feature:")
print(data.head())

2) Using scikit-learn

from sklearn.feature_extraction import FeatureHasher
import pandas as pd

data = pd.read_csv('lab-11.csv')
categorical_column = 'Categorical Variable'

# With input_type='string', each sample must be an iterable of strings
data[categorical_column] = data[categorical_column].apply(lambda x: [str(x)])

# The default n_features is 2**20, hence the 1048576 columns in the output
hasher = FeatureHasher(input_type='string')
hashed_features = hasher.fit_transform(data[categorical_column])

# toarray() converts the sparse result into a dense array for display
hashed_data = pd.DataFrame(hashed_features.toarray())

print("Modified Dataset with Hashed Feature:")
print(hashed_data.head())
OUTPUT:

1) Using Pandas
Modified Dataset with Hashed Feature:
gender race/ethnicity parental level of education lunch \
0 female group B bachelor's degree standard
1 female group C some college standard
2 female group B master's degree standard
3 male group A associate's degree free/reduced
4 male group C some college standard

test preparation course math score reading score Categorical Variable \


0 none 72 72 74
1 completed 69 90 88
2 none 90 95 93
3 none 47 57 44
4 none 76 78 75

hashed_feature
0 1f1362ea41d1bc65be321c0a378a20159f9a26d0
1 b37f6ddcefad7e8657837d3177f9ef2462f98acf
2 08a35293e09f508494096c1c1b3819edb9df50db
3 98fbc42faedc02492397cb5962ea3a3ffc0a9243
4 450ddec8dd206c2e2ab1aeeaa90e85e51753b8b7

2) Using scikit-learn
Modified Dataset with Hashed Feature:
0 1 2 3 4 5 6 7 \
0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

8 9 ... 1048566 1048567 1048568 1048569 1048570 \


0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
1 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0

1048571 1048572 1048573 1048574 1048575


0 0.0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0

[5 rows x 1048576 columns]

RESULT :
The output will be the modified dataset containing hashed features for the
categorical variable. It will include the original dataset with an additional column
representing the hashed feature(s).
EXP NO DATE
12 Exploring Different Types of Joins between Two
DataFrames

AIM :
The aim of the program is to merge two datasets based on a common column using
Pandas. This involves performing an inner join, left join, and right join between the
datasets.

ALGORITHM :
• Import the necessary libraries, including Pandas.

• Load the two datasets into separate DataFrames.

• Identify the common column(s) on which the datasets will be merged.

• Perform an inner join between the datasets using the pd.merge() function.

• Perform a left join between the datasets using the pd.merge() function with
the how='left' parameter.

• Perform a right join between the datasets using the pd.merge() function with
the how='right' parameter.

• Display the merged datasets for each type of join.
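
The difference between the three joins is clearest on two tiny tables whose keys only partly overlap. The DataFrames and column names below are made up for illustration and stand in for the lab's CSV files.

```python
import pandas as pd

# Keys 2 and 3 appear in both tables; 1 only on the left, 4 only on the right.
left = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

inner = pd.merge(left, right, on="key", how="inner")       # keys in both tables
left_join = pd.merge(left, right, on="key", how="left")    # every key from left
right_join = pd.merge(left, right, on="key", how="right")  # every key from right

print("Inner Join:")
print(inner)
print("\nLeft Join:")
print(left_join)
print("\nRight Join:")
print(right_join)
```

Rows without a match on the other side are kept by the left and right joins, with NaN in the columns that come from the table lacking that key.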

PROGRAM :

import pandas as pd

df1 = pd.read_csv('lab-12(df1).csv')
df2 = pd.read_csv('lab-12(df2).csv')

# Column present in both DataFrames, used as the join key
common_column = 'Close'

inner_join = pd.merge(df1, df2, on=common_column, how='inner')
left_join = pd.merge(df1, df2, on=common_column, how='left')
right_join = pd.merge(df1, df2, on=common_column, how='right')

print("Inner Join:")
print(inner_join.head())

print("\nLeft Join:")
print(left_join.head())

print("\nRight Join:")
print(right_join.head())
OUTPUT:

RESULT:
The output will be the merged datasets for each type of join. Each merged dataset
will contain rows from the original datasets based on the specified join type (inner,
left, or right), along with columns from both datasets.
EXP NO DATE
13 Comparison of Numerical Variable Distributions
Across Categories Using Box Plots

AIM :

The aim of this program is to create a box plot using Matplotlib or Seaborn to
visualize the distribution of a numerical variable across different categories and
add appropriate labels and titles to the plot.
ALGORITHM :

• Import necessary libraries.
• Read your data into a pandas DataFrame or any other data structure.
• Group your data by the categorical variable.
• Use Matplotlib or Seaborn to create a box plot.
• Pass the grouped data to the box plot function.
• Specify the x-axis as the categorical variable and the y-axis as the numerical
variable.
• Add appropriate labels to the x-axis and y-axis.
• Add a title to the plot.
• Display the plot.
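
The grouping step can be sketched with pandas on its own. Grouping by the categorical column and asking for the 25th, 50th and 75th percentiles gives, for each category, the quartiles that a box plot draws as the box edges and the centre line. The DataFrame and its column names below are made up for illustration.

```python
import pandas as pd

# Made-up data: five values in each of two categories.
df = pd.DataFrame({
    "category": ["A"] * 5 + ["B"] * 5,
    "value": [1, 2, 3, 4, 5, 10, 20, 30, 40, 50],
})

# One row per category, one column per requested quantile.
summary = (
    df.groupby("category")["value"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
summary.columns = ["q1", "median", "q3"]
print(summary)

# Lists of values per category, in the shape plt.boxplot() accepts.
grouped_values = [group["value"].tolist() for _, group in df.groupby("category")]
```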

PROGRAM :

import matplotlib.pyplot as plt
import numpy as np

# Fixed seed so the random samples are reproducible
np.random.seed(10)
data = {
    'Category A': np.random.normal(loc=0, scale=1, size=100),
    'Category B': np.random.normal(loc=1, scale=1.5, size=100),
    'Category C': np.random.normal(loc=-1, scale=0.5, size=100)
}
plt.figure(figsize=(8, 6))
# One box per category; Matplotlib 3.9 and later name this parameter tick_labels
plt.boxplot(list(data.values()), labels=list(data.keys()))
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Box Plot of Numerical Variable Across Categories')
plt.grid(True)
plt.show()

OUTPUT:

RESULT :

Thus the program has been executed successfully, producing a box plot with
appropriate labels for the x-axis, y-axis, and title, making it easy to interpret
the distribution of the numerical variable across different categories.
EXP NO DATE
14 Bar Chart of Category Frequency Distribution

AIM :
The aim of this program is to visualize the frequency distribution of
categories present in a dataset using a bar chart.
ALGORITHM :
• Import necessary libraries: Pandas, Matplotlib.
• Read the dataset from a CSV file using pandas.
• Compute the frequency counts of each category in the dataset.
• Create a bar chart to visualize the frequency distribution of categories.
• Set the title of the plot and label the x and y axes.
• Rotate the x-axis labels to prevent overcrowding.
• Display the plot.
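
The counting step is done by value_counts(), which returns one entry per distinct category, sorted from most to least frequent. A minimal sketch on a made-up Series (not the lab dataset):

```python
import pandas as pd

# Made-up category labels.
categories = pd.Series(["fruit", "dairy", "fruit", "grain", "fruit", "dairy"])

# Index = category name, value = number of occurrences, most frequent first.
counts = categories.value_counts()
print(counts)
```

Calling counts.plot(kind='bar') on this result draws one bar per category, with the bar height equal to the count.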

PROGRAM :
import pandas as pd
import matplotlib.pyplot as plt
# Load the dataset
df = pd.read_csv('lab-14.csv')
category_counts = df['Category'].value_counts()
# Plotting the bar chart
plt.figure(figsize=(10, 6))
category_counts.plot(kind='bar')
plt.title('Frequency Distribution of Categories')
plt.xlabel('Category')
plt.ylabel('Frequency')
plt.xticks(rotation=45)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

OUTPUT:

RESULT :
Thus, the resulting chart shows the frequency distribution of categories in the
dataset.
EXP NO DATE
15 Correlation Analysis using Scatter Plot

AIM :
The aim of this Python program is to calculate the correlation coefficient
between two numerical variables from a CSV dataset and visualize their
correlation using a scatter plot.
ALGORITHM :
• Import the necessary libraries: pandas for data manipulation and
matplotlib.pyplot for plotting.
• Load the dataset from a CSV file into a pandas DataFrame.
• Extract two numerical variables from the DataFrame.
• Calculate the correlation coefficient between the two variables using the
corr() function.
• Print the calculated correlation coefficient.
• Visualize the correlation using a scatter plot.
• Display the scatter plot.
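
By default, Series.corr() computes the Pearson correlation coefficient. The sketch below works it out by hand from the deviations about the mean and compares it with the pandas result. The numbers and the helper name pearson are made up for illustration.

```python
import numpy as np
import pandas as pd


def pearson(x, y):
    """Pearson correlation: co-deviation divided by the product of the spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Made-up values standing in for two numerical columns.
likes = pd.Series([10, 20, 30, 40, 50])
comments = pd.Series([1, 3, 2, 5, 6])

manual = pearson(likes, comments)
builtin = likes.corr(comments)
print("Manual:", manual)
print("pandas:", builtin)
```

A coefficient near +1 means the points in the scatter plot lie close to a rising straight line, near -1 a falling one, and near 0 no linear relationship.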

PROGRAM :
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset


df = pd.read_csv('lab-15.csv')
# Select the two numerical variables for correlation analysis
variable1 = df['Likes/Reactions']#var-1
variable2 = df['Comments']#var-2
# Calculate the correlation coefficient
correlation_coefficient = variable1.corr(variable2)

# Print the correlation coefficient


print("Correlation Coefficient between variable1 and variable2:",
correlation_coefficient)

# Visualize the correlation using a scatter plot


plt.figure(figsize=(8, 6))
plt.scatter(variable1, variable2, color='blue', alpha=0.5)
plt.title('Scatter Plot of variable1 vs variable2')
plt.xlabel('Likes/Reactions')
plt.ylabel('Comments')
plt.grid(True)
plt.show()

OUTPUT:
Correlation Coefficient between variable1 and variable2: 0.9222499938509208
RESULT :
This Python program for Correlation analysis using scatter plot has been
executed successfully.
