Mkce Python Lab Manual

The document discusses a Python programming course for data science. It includes an index listing experiments students will complete related to data analysis and machine learning techniques using Python libraries like Pandas and NumPy.

Uploaded by

ARUNAGIRINATHAN K


Department of Artificial Intelligence and Data Science

PYTHON PROGRAMMING FOR DATA SCIENCE
2023-2024

STUDENT NAME

SUBJECT NAME/CODE

SEMESTER/YEAR
INDEX

S.NO   DATE   EXPERIMENT   PAGE NO   MARKS   SIGNATURE

1. Sort the given three integers in the ascending order.
2. Finding whether a given letter is a vowel or a consonant
3. Generate a Random Password Using Function
4. File handling using Pandas Data frame
5. Comparing Statistical Measures: NumPy vs Pandas
6. Comparing Statistical Measures: NumPy vs Pandas using dataset
7. Linear Regression Trendline Plotter Using Matplotlib
8. Correlation Analysis using Seaborn
9. Plotting Sinusoidal and Co-sinusoidal Trends with Subplots
10. Handling Missing Data in Pandas: Techniques for Data Imputation and Cleaning
11. Generating Hashed Features for Categorical Variables
12. Exploring Different Types of Joins between Two Data Frames
13. Comparison of Numerical Variable Distributions Across Categories Using Box Plots
14. Bar Chart of Category Frequency Distribution
15. Correlation Analysis using Scatter Plot

EXP NO: 1    DATE:
Sort the given three integers in the ascending order.

AIM :
To create a Python program that reads three integers from the user and displays
them in sorted order.

ALGORITHM :
1. Read three integers from the user.
2. Use the min and max functions to find the smallest and largest values.
3. Compute the sum of all three values.
4. Calculate the middle value by subtracting the minimum and maximum values from the sum.
5. Display the three integers in sorted order.

PROGRAM :

num1 = int(input("Enter the first integer: "))
num2 = int(input("Enter the second integer: "))
num3 = int(input("Enter the third integer: "))

smallest = min(num1, num2, num3)
largest = max(num1, num2, num3)
total_sum = num1 + num2 + num3
# The middle value is whatever remains after removing the smallest and largest
middle = total_sum - smallest - largest

print("Integers in sorted order:", smallest, middle, largest)

OUTPUT:

Enter the first integer:

Enter the second integer:
Enter the third integer:
Integers in sorted order:

RESULT :

The program displays the integers in sorted order.
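
The min/max/sum approach can be checked against Python's built-in sorted() function. The sketch below wraps the same logic in a function so it can be tested without keyboard input; the function name sort_three and the sample values are assumptions made for illustration, not part of the original program.

```python
def sort_three(a, b, c):
    """Return three integers in ascending order using min, max and the sum."""
    smallest = min(a, b, c)
    largest = max(a, b, c)
    # Removing the smallest and largest from the total leaves the middle value
    middle = (a + b + c) - smallest - largest
    return smallest, middle, largest


# Sample inputs chosen to include duplicates and negative numbers
samples = [(3, 1, 2), (5, 5, 1), (-4, 0, -9), (7, 7, 7)]
for values in samples:
    # sorted() is the reference result the arithmetic approach must match
    assert list(sort_three(*values)) == sorted(values)
    print(values, "->", sort_three(*values))
```

The subtraction trick still works when two or all three values are equal, because the duplicate is simply what remains after the smallest and largest are removed.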

EXP NO: 2    DATE:
Finding whether a given letter is a vowel or a consonant

AIM:
Create a program that determines whether a given letter of the alphabet is a vowel
or a consonant.

ALGORITHM :
1. Read a character from the user and convert it to lowercase.
2. Check if the entered character is alphabetical using the isalpha() method.
3. If the character is 'a', 'e', 'i', 'o', or 'u', display a message indicating it's a vowel.
4. If the character is 'y', display a message indicating it's sometimes a vowel and sometimes a consonant.
5. If the character is any other letter, display a message indicating it's a consonant.
6. If the character is not alphabetical, check whether it is a digit using the isdigit() method and display a message indicating whether it is a number or neither a letter nor a number.

PROGRAM :

character = input("Enter a letter of the alphabet or a number: ").lower()

if character.isalpha():
    if character in ['a', 'e', 'i', 'o', 'u']:
        print("The entered character is a vowel.")
    elif character == 'y':
        print("Sometimes y is a vowel, and sometimes y is a consonant.")
    else:
        print("The entered character is a consonant.")
elif character.isdigit():
    print("The entered character is a number.")
else:
    print("The entered character is not a letter or a number.")

OUTPUT:

Enter a letter of the alphabet or a number: a

The entered character is a vowel.

Enter a letter of the alphabet or a number: s

The entered character is a consonant.

Enter a letter of the alphabet or a number: y

Sometimes y is a vowel, and sometimes y is a consonant.

RESULT :
The program correctly identifies whether the entered letter is a vowel, consonant or
‘y’ based on the given conditions.
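
One case the program does not cover is input longer than one character: a string such as "ab" passes isalpha() and is reported as a consonant. The sketch below shows one possible way to guard against that by moving the checks into a function; the name classify_character and its return labels are hypothetical and chosen only for this illustration.

```python
VOWELS = set("aeiou")


def classify_character(text):
    """Classify a single character; reject empty or multi-character input."""
    if len(text) != 1:
        return "invalid"
    ch = text.lower()
    if ch.isalpha():
        if ch in VOWELS:
            return "vowel"
        if ch == "y":
            return "sometimes vowel"
        return "consonant"
    if ch.isdigit():
        return "number"
    return "other"


for sample in ["A", "s", "y", "7", "?", "ab", ""]:
    print(repr(sample), "->", classify_character(sample))
```

Returning a label instead of printing inside the function keeps the decision logic separate from the messages shown to the user.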
EXP NO: 3    DATE:
Generate a Random Password Using Function

AIM :
To create a Python function that generates a random password within the specified
criteria and display the generated password in the main program.

ALGORITHM :

1. Define a function generate_password() that takes no parameters.
2. Generate a random length for the password between 7 and 10 characters.
3. Use a loop to generate random characters for the password based on the random length.
4. Each character should be randomly selected from positions 33 to 126 in the ASCII table.
5. Concatenate the random characters to form the password.
6. Return the generated password from the function.
7. In the main program, call the generate_password() function and display the generated password.

PROGRAM :

import random

def generate_password():
    password_length = random.randint(7, 10)
    password = ''
    for x in range(password_length):
        password += chr(random.randint(33, 126))
    return password

if __name__ == "__main__":
    password = generate_password()
    print("Generated Password:", password)

OUTPUT:
Generated Password: 7WWIeK\3U\

RESULT :

The program successfully generates a random password with a length between 7

and 10 characters, with each character randomly selected from positions 33 to 126
in the ASCII table.
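
The random module is intended for simulation rather than security, and the Python documentation recommends the standard secrets module for passwords. The following is a minimal sketch of the same algorithm using secrets; the function name generate_password_secure and its parameters are assumptions introduced here, not part of the lab program.

```python
import secrets


def generate_password_secure(min_length=7, max_length=10):
    """Build a password from ASCII positions 33..126 using the secrets module."""
    # randbelow(n) returns an int in [0, n), so shift it into [min_length, max_length]
    length = min_length + secrets.randbelow(max_length - min_length + 1)
    # randbelow(94) covers the 94 positions from 33 to 126 inclusive
    return "".join(chr(33 + secrets.randbelow(94)) for _ in range(length))


password = generate_password_secure()
print("Generated Password:", password)
```

Building the string with join() also avoids repeated string concatenation inside the loop.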
EXP NO: 4    DATE:
File handling using Pandas Dataframe

AIM :
The aim is to read student information from a CSV file, display the first five rows
of the data frame, calculate the average age of the students, and filter out students
with grades above a certain threshold, keeping those at or below it.

ALGORITHM :
1. Import the necessary libraries, including Pandas.
2. Read the CSV file containing student information into a Pandas DataFrame.
3. Display the first five rows of the DataFrame using the head() method.
4. Calculate the average age of the students by computing the mean of the 'Age' column.
5. Prompt the user to enter a grade threshold.
6. Filter out the students with grades above the threshold using boolean indexing, keeping the rows whose 'Grade' is less than or equal to the threshold.
7. Display the filtered DataFrame.

PROGRAM :

import pandas as pd

df = pd.read_csv('lab-4.csv')

print("First five rows of the DataFrame:")
print(df.head())

average_age = df['Age'].mean()
print("\nAverage age of the students:", average_age)

threshold = float(input("\nEnter the grade threshold: "))

# Keep the rows whose grade is at or below the threshold
filtered_df = df[df['Grade'] <= threshold]

print("\nStudents with grades less than or equal to the threshold:")
print(filtered_df)
OUTPUT:

First five rows of the DataFrame:
         ID          Name   Age  Grade
0       NaN           NaN   NaN    NaN
1  111111.0      John Doe  23.0   90.0
2  111112.0    Jane Smith  12.0   70.0
3  111113.0  Sarah Thomas  23.0   45.0
4  111114.0   Frank Brown  18.0   80.0

Average age of the students: 21.285714285714285

Enter the grade threshold: 30

Students with grades less than or equal to the threshold:
          ID            Name   Age  Grade
5   111115.0      Mike Davis  19.0   20.0
8   111118.0      Fred Clark  26.0   23.0
9   111119.0       Bob Lopez  20.0   12.0
11  111121.0  Ferik Anderson  24.0   23.0
13  111123.0    Feliz antony  21.0   25.0

RESULT :
Thus the program for file handling using pandas dataframe has been successfully
executed.
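
The row of NaN values at index 0 in the output suggests that the CSV file contains a line with only separators. The sketch below reproduces that situation with a small in-memory file so the effect on mean() and on boolean indexing can be seen without lab-4.csv; the names and marks are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for lab-4.csv; names and values are invented for this sketch.
csv_text = """ID,Name,Age,Grade
,,,
1,Asha,20,85
2,Ravi,22,40
3,Meena,21,25
"""

# A line of bare commas is parsed as a row whose fields are all missing (NaN)
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())

# mean() skips NaN by default, so the empty row does not affect the average
average_age = df["Age"].mean()
print("Average age:", average_age)

threshold = 50.0
# Comparisons with NaN evaluate to False, so the empty row is dropped by the mask
filtered_df = df[df["Grade"] <= threshold]
print(filtered_df)
```

Because a missing value forces the ID column to floating point, the identifiers print with a trailing .0, as they do in the output above.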
EXP NO: 5    DATE:
Comparing Statistical Measures: NumPy vs Pandas

AIM :
The aim of this program is to demonstrate how to generate a NumPy array with
random numbers, convert the same array into a Pandas DataFrame with
appropriate column names and then calculate the mean, median, and standard
deviation of the data using both NumPy and Pandas functions.

ALGORITHM :
1. Import the necessary libraries: NumPy and Pandas.
2. Generate a NumPy array with random numbers using the numpy.random.rand() function.
3. Convert the NumPy array into a Pandas DataFrame with appropriate column names.
4. Use NumPy and Pandas functions to calculate the mean, median, and standard deviation of the data.
5. Print the results.

PROGRAM :

import numpy as np
import pandas as pd

np_array = np.random.rand(5, 3)
column_names = ['Column_1', 'Column_2', 'Column_3']
df = pd.DataFrame(np_array, columns=column_names)

mean_np = np.mean(np_array)

median_np = np.median(np_array)

std_np = np.std(np_array)

mean_pd = df.mean()
median_pd = df.median()
std_pd = df.std()

print("NumPy Mean:")
print(mean_np)
print("\nPandas Mean:")
print(mean_pd)
print("\nNumPy Median:")
print(median_np)
print("\nPandas Median:")
print(median_pd)
print("\nNumPy Standard Deviation:")
print(std_np)
print("\nPandas Standard Deviation:")
print(std_pd)
OUTPUT:
NumPy Mean:
0.5107690724628491

Pandas Mean:
Column_1 0.630971
Column_2 0.559101
Column_3 0.518407
dtype: float64

NumPy Median:
0.5644428322384722

Pandas Median:
Column_1 0.676098
Column_2 0.595252
Column_3 0.481189
dtype: float64

NumPy Standard Deviation:
0.21253242293540204

Pandas Standard Deviation:
Column_1 0.147784
Column_2 0.085243
Column_3 0.276785
dtype: float64

RESULT :

The program successfully demonstrates the comparison of statistical measures
computed with NumPy and Pandas.
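
In the output above, NumPy reports one number per measure while Pandas reports one per column. This is because np.mean(), np.median() and np.std() reduce over the whole array unless an axis is given, and because np.std() uses the population formula (ddof=0) while DataFrame.std() uses the sample formula (ddof=1). The sketch below is one way to make the two libraries agree; the fixed seed is an assumption added so the run is repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
np_array = rng.random((5, 3))
df = pd.DataFrame(np_array, columns=["Column_1", "Column_2", "Column_3"])

# axis=0 reduces down the rows, giving one value per column like DataFrame.mean()
mean_np = np.mean(np_array, axis=0)
median_np = np.median(np_array, axis=0)
# ddof=1 selects the sample standard deviation, which is the Pandas default
std_np = np.std(np_array, axis=0, ddof=1)

print("Means agree:  ", np.allclose(mean_np, df.mean().to_numpy()))
print("Medians agree:", np.allclose(median_np, df.median().to_numpy()))
print("Stds agree:   ", np.allclose(std_np, df.std().to_numpy()))
```

With matching axis and ddof settings the three comparisons print True.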
EXP NO: 6    DATE:
Comparing Statistical Measures: NumPy vs Pandas using dataset

AIM :
The aim of this program is to demonstrate how to generate a NumPy array with
random numbers from the dataset, convert the same array into a Pandas DataFrame
with appropriate column names and then calculate the mean, median, and standard
deviation of the data using both NumPy and Pandas functions.

ALGORITHM :
1. Import the necessary libraries: NumPy and Pandas.

2. Generate a NumPy array with dataset using dataframe creation.

3. Convert the NumPy array into a Pandas DataFrame with appropriate column
names.

4. Use NumPy and Pandas functions to calculate the mean, median, and
standard deviation of the data.

5. Print the results.

PROGRAM :
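import numpy as np
import pandas as pd

np_array = np.random.rand(3, 5)
# Note: the DataFrame read from the CSV file is replaced two lines below by one
# built from np_array, so the statistics describe the random values.
df = pd.read_csv('lab-ex-6.csv')

column_names = ['Tamil', 'English', 'Maths', 'Science', 'Social']
df = pd.DataFrame(np_array, columns=column_names)
mean_np = np.mean(np_array)
median_np = np.median(np_array)
std_np = np.std(np_array)

mean_pd = df.mean()
median_pd = df.median()
std_pd = df.std()

# Print each measure, NumPy first and then Pandas
results = [
    ("Mean", mean_np, mean_pd),
    ("Median", median_np, median_pd),
    ("Standard Deviation", std_np, std_pd),
]
for name, np_value, pd_value in results:
    print(f"NumPy {name}:\n{np_value}\n")
    print(f"Pandas {name}:\n{pd_value}\n")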

OUTPUT:
NumPy Mean:
0.5937982966893395

Pandas Mean:
Tamil      0.217006
English    0.678402
Maths      0.585628
Science    0.613438
Social     0.874517
dtype: float64

NumPy Median:
0.6476541923679712

Pandas Median:
Tamil      0.266269
English    0.847279
Maths      0.518450
Science    0.647654
Social     0.900979
dtype: float64

NumPy Standard Deviation:
0.2741247946668009

Pandas Standard Deviation:
Tamil      0.105272
English    0.312321
Maths      0.199324
Science    0.256798
Social     0.081868
dtype: float64

RESULT :

The program successfully demonstrates the comparison of statistical measures
computed with NumPy and Pandas.
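
If the statistics are meant to describe the marks stored in the CSV file rather than random numbers, the NumPy array can be taken from the loaded DataFrame. The sketch below assumes a file whose header row holds the five subject names; the in-memory text and the marks in it are invented stand-ins for lab-ex-6.csv.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for lab-ex-6.csv; the marks are invented for this sketch.
csv_text = """Tamil,English,Maths,Science,Social
78,65,90,82,70
85,72,60,75,88
66,80,95,68,74
"""
df = pd.read_csv(io.StringIO(csv_text))

# Take the NumPy array from the loaded data instead of generating random numbers
np_array = df.to_numpy()

print("NumPy mean per subject:   ", np.mean(np_array, axis=0))
print("Pandas mean per subject:  ", df.mean().to_numpy())
print("NumPy median per subject: ", np.median(np_array, axis=0))
print("Pandas median per subject:", df.median().to_numpy())
```

Here both libraries work on the same values, so the per-subject results can be compared directly.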
EXP NO: 7    DATE:
Linear Regression Trendline Plotter Using Matplotlib

AIM :
The aim of this program is to visually represent the relationship between two
variables in a dataset using a scatter plot. Additionally, the program will utilize
linear regression to add a trendline to the scatter plot, aiding in understanding the
underlying relationship between the variables.

ALGORITHM :
1. Import the necessary libraries: Matplotlib for plotting, scikit-learn for linear regression, Pandas for loading the dataset, and NumPy for numerical operations.
2. Load or generate the dataset containing two variables.
3. Create a scatter plot using Matplotlib, with one variable on the x-axis and the other variable on the y-axis.
4. Use linear regression to fit a trendline to the scatter plot data.
5. Plot the trendline on the scatter plot.
6. Display the scatter plot with the trendline.

PROGRAM :
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression

# Load the dataset (replace the file and column names with your own)

df = pd.read_csv('lab-7.csv')
x = df['User ID'].values.reshape(-1, 1)
y = df['Post ID']

# Create scatter plot

plt.scatter(x, y, color='blue', label='Data Points')

# Fit linear regression model

model = LinearRegression()
model.fit(x, y)

# Get slope and intercept of the fitted line

slope = model.coef_[0]
intercept = model.intercept_

# Create trendline
trendline = slope * x + intercept
plt.plot(x, trendline, color='red', label='Trendline')

# Add labels and legend

plt.xlabel('User ID')
plt.ylabel('Post ID')
plt.title('Scatter Plot with Trendline')
plt.legend()

# Show plot
plt.grid(True)
plt.show()
OUTPUT:

RESULT :
The scatter plot with the trendline provides a visual representation of the
relationship between the two variables in the dataset.
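
The slope and intercept of a straight-line fit can also be obtained with NumPy alone. The sketch below uses np.polyfit() with degree 1 as an alternative to LinearRegression and cross-checks the result against the closed-form least-squares formulas; the sample points are invented and do not come from lab-7.csv.

```python
import numpy as np

# Invented sample points lying near y = 2x + 1; not taken from lab-7.csv
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# A degree-1 polynomial fit is an ordinary least-squares straight line;
# polyfit returns the coefficients highest power first: slope, then intercept
slope, intercept = np.polyfit(x, y, 1)
trendline = slope * x + intercept

# Cross-check against the closed-form least-squares formulas
slope_check = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept_check = y.mean() - slope_check * x.mean()

print("slope:", slope, "intercept:", intercept)
print(np.isclose(slope, slope_check), np.isclose(intercept, intercept_check))
```

The resulting trendline array can be passed to plt.plot(x, trendline) in the same way as in the program above.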
EXP NO: 8    DATE:
Correlation Analysis using Seaborn

AIM :
The aim of this program is to generate a visual representation (heat map) of the
correlation matrix of a given dataset using Seaborn. The heat map will display the
pairwise correlations between variables in the dataset, with a customized colour
palette and annotations for better interpretation.

ALGORITHM :

1. Import necessary libraries: Seaborn, Pandas (for loading dataset), and Matplotlib (for the title and for displaying the plot).

2. Load the dataset into a Pandas DataFrame.
3. Compute the correlation matrix using Pandas DataFrame's .corr() method.
4. Use Seaborn's heatmap() function to plot the correlation matrix as a
heatmap.
5. Customize the color palette and add annotations to the heatmap for better
visualization.
6. Display the heatmap.

PROGRAM :

import seaborn as sns

import pandas as pd

import matplotlib.pyplot as plt

df = pd.read_csv('/content/Diabetes CSV - Lab exp 7.csv')

correlation_matrix = df.corr()

sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')

plt.title('Correlation Heatmap')

plt.show()

OUTPUT:

RESULT :

The resulting heat map provides a clear and visual representation of the correlation
structure within the dataset.
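
The corr() method works on numeric data. If a dataset also has text columns, then depending on the Pandas version they are either ignored or cause an error, so it can help to select the numeric columns first. The sketch below shows that step on a small invented DataFrame; the column names are assumptions and are not taken from the diabetes file.

```python
import pandas as pd

# Invented example data; the 'label' column is deliberately non-numeric
df = pd.DataFrame({
    "glucose": [85, 90, 120, 140, 160],
    "bmi": [22.0, 24.5, 28.0, 31.0, 35.5],
    "label": ["no", "no", "yes", "yes", "yes"],
})

# Keep only numeric columns before computing pairwise Pearson correlations
numeric_df = df.select_dtypes(include="number")
correlation_matrix = numeric_df.corr()
print(correlation_matrix.round(3))
```

The resulting matrix is symmetric with ones on the diagonal, and it can be passed to sns.heatmap() as in the program above.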
EXP NO: 9    DATE:
Plotting Sinusoidal and Co-sinusoidal Trends with Subplots

AIM :
The aim of the program is to create a figure with multiple subplots using
Matplotlib. Each subplot will display a line plot representing the trend of different
variables. Legends will be added to the plots, and the appearance of the subplots
will be customized.

ALGORITHM :
1. Import the necessary libraries, including Matplotlib and NumPy.
2. Define the data for the line plots (e.g., values for x-axis and y-axis for each variable).
3. Create a figure with multiple subplots using plt.subplots().
4. Plot each line plot on a separate subplot using the plot() method of each Axes object.
5. Add legends to the plots using the legend() method of each Axes object.
6. Customize the appearance of the subplots (e.g., set titles, labels, colors, markers).
7. Show the plot using plt.show().

PROGRAM :

import matplotlib.pyplot as plt

import numpy as np
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)

fig, axs = plt.subplots(2)

axs[0].plot(x, y1, label='sin(x)', color='blue')

axs[1].plot(x, y2, label='cos(x)', color='red')

axs[0].legend()
axs[1].legend()

axs[0].set_title('Sinusoidal Trend')
axs[0].set_xlabel('x')
axs[0].set_ylabel('sin(x)')
axs[1].set_title('Cosinusoidal Trend')
axs[1].set_xlabel('x')
axs[1].set_ylabel('cos(x)')

plt.tight_layout()  # keep the titles and axis labels of the two subplots from overlapping
plt.show()
OUTPUT:

RESULT :
The output will be a figure with two subplots, each displaying a line plot
representing the trend of different variables (sin(x) and cos(x)).
EXP NO: 10    DATE:
Handling Missing Data in Pandas: Techniques for Data Imputation and Cleaning

AIM :
The aim of the program is to handle missing data in a dataset using Pandas. This
involves implementing techniques such as dropping missing values, filling missing
values with mean or median, and forward/backward filling.

ALGORITHM :
1. Import the necessary libraries, including Pandas.
2. Load the dataset into a Pandas DataFrame.
3. Identify missing values in the dataset using methods like isna() or info().
4. Implement techniques to handle missing data:
   a. Dropping missing values using dropna() method.
   b. Filling missing values with mean or median using fillna() method.
   c. Forward filling missing values using ffill() method.
   d. Backward filling missing values using bfill() method.
5. Display the modified dataset after handling missing data.

PROGRAM :

import pandas as pd

data = pd.read_csv('lab-10.csv')

missing_values = data.isna().sum()
print("Missing Values:")
print(missing_values)

clean_data_dropna = data.dropna()
# numeric_only=True restricts the mean and median to the numeric columns
clean_data_mean = data.fillna(data.mean(numeric_only=True))
clean_data_median = data.fillna(data.median(numeric_only=True))
clean_data_ffill = data.ffill()
clean_data_bfill = data.bfill()

print("\nCleaned Data (Dropped missing values):")

print(clean_data_dropna.head())

print("\nCleaned Data (Filled with mean):")

print(clean_data_mean.head())

print("\nCleaned Data (Filled with median):")

print(clean_data_median.head())
print("\nCleaned Data (Forward filled):")
print(clean_data_ffill.head())

print("\nCleaned Data (Backward filled):")

print(clean_data_bfill.head())

OUTPUT:
RESULT :

Thus the program for handling missing data in a dataset using Pandas by
implementing techniques such as dropping missing values, filling missing values
with mean or median, and forward/backward filling has been executed
successfully.
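
The effect of each technique is easiest to see on a very small table. The sketch below applies dropna(), fillna(), ffill() and bfill() to an invented DataFrame with one numeric and one text column; the column names and values are assumptions for illustration and are not taken from lab-10.csv.

```python
import numpy as np
import pandas as pd

# Invented data with gaps; not taken from lab-10.csv
data = pd.DataFrame({
    "temperature": [20.0, np.nan, 24.0, np.nan, 30.0],
    "city": ["A", "B", None, "D", "E"],
})

print(data.isna().sum())                 # missing values per column

dropped = data.dropna()                  # keeps only fully populated rows
mean_filled = data.fillna(data.mean(numeric_only=True))  # numeric columns only
forward = data.ffill()                   # copy the previous valid value down
backward = data.bfill()                  # copy the next valid value up

print(dropped)
print(mean_filled)
print(forward)
print(backward)
```

Filling with the mean only touches the numeric column, so the missing city stays missing, whereas forward and backward filling also repair the text column.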
EXP NO: 11    DATE:
Generating Hashed Features for Categorical Variables

AIM :
The aim of the program is to perform feature hashing on a categorical variable
using either Pandas or Scikit-learn. This involves converting categorical variables
into a numerical format by applying a hash function.

ALGORITHM :
1. Import the necessary libraries, including Pandas or Scikit-learn.
2. Load the dataset containing categorical variables into a DataFrame.
3. Identify the categorical variable(s) that need to be hashed.
4. Apply a hash function to convert the categorical variable(s) into numerical format.
5. Optionally, add the hashed features to the original dataset or create a new DataFrame with hashed features.
6. Display the modified dataset with hashed features.

PROGRAM :

1) Using Pandas

import pandas as pd
import hashlib

data = pd.read_csv('lab-11.csv')
categorical_column = 'Categorical Variable'

# SHA-1 hex digest of each category value, stored as a new column
data['hashed_feature'] = data[categorical_column].apply(
    lambda x: hashlib.sha1(str(x).encode('utf-8')).hexdigest())

print("Modified Dataset with Hashed Feature:")
print(data.head())

2) Using scikit-learn

from sklearn.feature_extraction import FeatureHasher
import pandas as pd

data = pd.read_csv('lab-11.csv')
categorical_column = 'Categorical Variable'

# With input_type='string', FeatureHasher expects an iterable of strings per row
data[categorical_column] = data[categorical_column].apply(lambda x: [str(x)])

# n_features defaults to 2**20, which is why the output has 1048576 columns
hasher = FeatureHasher(input_type='string')
hashed_features = hasher.fit_transform(data[categorical_column])
hashed_data = pd.DataFrame(hashed_features.toarray())

print("Modified Dataset with Hashed Feature:")
print(hashed_data.head())
OUTPUT:

1) Using Pandas
Modified Dataset with Hashed Feature:
gender race/ethnicity parental level of education lunch \
0 female group B bachelor's degree standard
1 female group C some college standard
2 female group B master's degree standard
3 male group A associate's degree free/reduced
4 male group C some college standard

test preparation course math score reading score Categorical Variable \

0 none 72 72 74
1 completed 69 90 88
2 none 90 95 93
3 none 47 57 44
4 none 76 78 75

hashed_feature
0 1f1362ea41d1bc65be321c0a378a20159f9a26d0
1 b37f6ddcefad7e8657837d3177f9ef2462f98acf
2 08a35293e09f508494096c1c1b3819edb9df50db
3 98fbc42faedc02492397cb5962ea3a3ffc0a9243
4 450ddec8dd206c2e2ab1aeeaa90e85e51753b8b7

2) Using scikit-learn
Modified Dataset with Hashed Feature:
0 1 2 3 4 5 6 7 \
0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

8 9 ... 1048566 1048567 1048568 1048569 1048570 \

0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
1 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0

1048571 1048572 1048573 1048574 1048575

0 0.0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0

[5 rows x 1048576 columns]

RESULT :
The output is the dataset with hashed features for the categorical variable. The
Pandas variant shows the original dataset with an additional column holding the
SHA-1 hash of each value; the scikit-learn variant shows a separate DataFrame of
hashed feature columns.
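
A SHA-1 hex digest is still a string, so it is not yet a numerical feature. One common way to obtain a number is to read the digest as an integer and reduce it to a fixed number of buckets. The sketch below illustrates that idea with the standard library only; the function name hash_bucket and the choice of 8 buckets are assumptions made for illustration.

```python
import hashlib


def hash_bucket(value, n_buckets=8):
    """Map a category label to an integer bucket in [0, n_buckets)."""
    digest = hashlib.sha1(str(value).encode("utf-8")).hexdigest()
    # Interpret the hex digest as a base-16 integer, then fold it into a bucket
    return int(digest, 16) % n_buckets


categories = ["group A", "group B", "group C", "group B"]
buckets = [hash_bucket(c) for c in categories]
print(buckets)
```

Equal labels always land in the same bucket, while different labels may collide when the number of buckets is small. scikit-learn's FeatureHasher exposes the same trade-off through its n_features argument.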
EXP NO: 12    DATE:
Exploring Different Types of Joins between Two DataFrames

AIM :
The aim of the program is to merge two datasets based on a common column using
Pandas. This involves performing an inner join, left join, and right join between the
datasets.

ALGORITHM :
1. Import the necessary libraries, including Pandas.
2. Load the two datasets into separate DataFrames.
3. Identify the common column(s) on which the datasets will be merged.
4. Perform an inner join between the datasets using the pd.merge() function.
5. Perform a left join between the datasets using the pd.merge() function with the how='left' parameter.
6. Perform a right join between the datasets using the pd.merge() function with the how='right' parameter.
7. Display the merged datasets for each type of join.

PROGRAM :

import pandas as pd

df1 = pd.read_csv('lab-12(df1).csv')
df2 = pd.read_csv('lab-12(df2).csv')

common_column = 'Close'

inner_join = pd.merge(df1, df2, on=common_column, how='inner')

left_join = pd.merge(df1, df2, on=common_column, how='left')

right_join = pd.merge(df1, df2, on=common_column, how='right')

print("Inner Join:")
print(inner_join.head())

print("\nLeft Join:")
print(left_join.head())

print("\nRight Join:")
print(right_join.head())
OUTPUT:

RESULT:
The output will be the merged datasets for each type of join. Each merged dataset
will contain rows from the original datasets based on the specified join type (inner,
left, or right), along with columns from both datasets.
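
Which rows survive each join is easier to follow with two very small tables. The sketch below uses invented DataFrames that share a column named key (a hypothetical name) and prints the keys kept by each join type. It also adds an outer join with indicator=True, which is not part of the experiment, to show where each row came from.

```python
import pandas as pd

# Invented tables sharing a 'key' column
df1 = pd.DataFrame({"key": [1, 2, 3], "left_value": ["a", "b", "c"]})
df2 = pd.DataFrame({"key": [2, 3, 4], "right_value": ["x", "y", "z"]})

inner_join = pd.merge(df1, df2, on="key", how="inner")   # keys present in both
left_join = pd.merge(df1, df2, on="key", how="left")     # every key of df1
right_join = pd.merge(df1, df2, on="key", how="right")   # every key of df2
# indicator=True adds a '_merge' column naming the source of each row
outer_join = pd.merge(df1, df2, on="key", how="outer", indicator=True)

for name, frame in [("inner", inner_join), ("left", left_join),
                    ("right", right_join), ("outer", outer_join)]:
    print(name, "join keys:", frame["key"].tolist())
print(outer_join)
```

Rows without a match on the other side receive NaN in the columns that come from that side.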
EXP NO: 13    DATE:
Comparison of Numerical Variable Distributions Across Categories Using Box Plots

AIM :

The aim of this program is to create a box plot using Matplotlib or Seaborn to
visualize the distribution of a numerical variable across different categories and
add appropriate labels and titles to the plot.
ALGORITHM :

1. Import necessary libraries.
2. Read your data into a pandas DataFrame or any other data structure (the program below generates synthetic samples in a dictionary keyed by category).
3. Group your data by the categorical variable.
4. Use Matplotlib or Seaborn to create a box plot.
5. Pass the grouped data to the box plot function.
6. Specify the x-axis as the categorical variable and the y-axis as the numerical variable.
7. Add appropriate labels to the x-axis and y-axis.
8. Add a title to the plot.
9. Display the plot.

PROGRAM :

import matplotlib.pyplot as plt
import numpy as np

np.random.seed(10)
# Synthetic samples for three categories, each with its own mean and spread
data = {
    'Category A': np.random.normal(loc=0, scale=1, size=100),
    'Category B': np.random.normal(loc=1, scale=1.5, size=100),
    'Category C': np.random.normal(loc=-1, scale=0.5, size=100)
}

plt.figure(figsize=(8, 6))
plt.boxplot(list(data.values()), labels=list(data.keys()))
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Box Plot of Numerical Variable Across Categories')
plt.grid(True)
plt.show()

OUTPUT:

RESULT :

Thus the program for creating a box plot with appropriate labels for the x-axis,
y-axis, and title, making it easy to interpret the distribution of the numerical
variable across different categories, has been executed successfully.
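
The grouping step of the algorithm can be sketched with a long-format DataFrame, one row per observation, which is how such data usually arrives from a CSV file. The example below computes the quartiles that a box plot draws for each category; the column names category and value and the synthetic samples are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(10)
# Long-format table: one row per observation
df = pd.DataFrame({
    "category": np.repeat(["A", "B", "C"], 100),
    "value": np.concatenate([
        rng.normal(0, 1, 100),
        rng.normal(1, 1.5, 100),
        rng.normal(-1, 0.5, 100),
    ]),
})

# Quartiles per category: the box spans Q1 to Q3 and the line inside is the median
quartiles = df.groupby("category")["value"].quantile([0.25, 0.5, 0.75]).unstack()
quartiles["IQR"] = quartiles[0.75] - quartiles[0.25]
print(quartiles.round(2))
```

The interquartile range (IQR) in the last column is the height of each box, so a larger spread in the samples shows up as a taller box.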
EXP NO: 14    DATE:
Bar Chart of Category Frequency Distribution

AIM :
The aim of this program is to visualize the frequency distribution of
categories present in a dataset using a bar chart.
ALGORITHM :
1. Import necessary libraries: Pandas, Matplotlib.
2. Read the dataset from a CSV file using pandas.
3. Compute the frequency counts of each category in the dataset.
4. Create a bar chart to visualize the frequency distribution of categories.
5. Set the title of the plot and label the x and y axes.
6. Rotate the x-axis labels to prevent overcrowding.
7. Display the plot.

PROGRAM :
import pandas as pd
import matplotlib.pyplot as plt
# Load the dataset
df = pd.read_csv('lab-14.csv')
category_counts = df['Category'].value_counts()
# Plotting the bar chart
plt.figure(figsize=(10, 6))
category_counts.plot(kind='bar')
plt.title('Frequency Distribution of Categories')
plt.xlabel('Category')
plt.ylabel('Frequency')
plt.xticks(rotation=45)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

OUTPUT:

RESULT :
Thus, the resulting chart shows the frequency distribution of categories in the
dataset.
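
The counting step can be tried on its own before plotting. The sketch below applies value_counts() to a short invented Series and also shows the normalize=True option for relative frequencies; the category labels are assumptions and are not taken from lab-14.csv.

```python
import pandas as pd

# Invented category labels standing in for the 'Category' column
categories = pd.Series(["fruit", "dairy", "fruit", "grain", "fruit", "dairy"])

counts = categories.value_counts()                 # absolute frequencies, largest first
shares = categories.value_counts(normalize=True)   # relative frequencies summing to 1

print(counts)
print(shares.round(2))
```

Because value_counts() sorts by frequency in descending order, the bars of the chart appear from the most common category to the least common one.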
EXP NO: 15    DATE:
Correlation Analysis using Scatter Plot

AIM :
The aim of this Python program is to calculate the correlation coefficient
between two numerical variables from a CSV dataset and visualize their
correlation using a scatter plot.
ALGORITHM :
1. Import the necessary libraries: pandas for data manipulation and matplotlib.pyplot for plotting.
2. Load the dataset from a CSV file into a pandas DataFrame.
3. Extract two numerical variables from the DataFrame.
4. Calculate the correlation coefficient between the two variables using the corr() function.
5. Print the calculated correlation coefficient.
6. Visualize the correlation using a scatter plot.
7. Display the scatter plot.

PROGRAM :
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset

df = pd.read_csv('lab-15.csv')
# Select the two numerical variables for correlation analysis
variable1 = df['Likes/Reactions']#var-1
variable2 = df['Comments']#var-2
# Calculate the correlation coefficient
correlation_coefficient = variable1.corr(variable2)

# Print the correlation coefficient

print("Correlation Coefficient between variable1 and variable2:",
correlation_coefficient)

# Visualize the correlation using a scatter plot

plt.figure(figsize=(8, 6))
plt.scatter(variable1, variable2, color='blue', alpha=0.5)
plt.title('Scatter Plot of variable1 vs variable2')
plt.xlabel('Likes/Reactions')
plt.ylabel('Comments')
plt.grid(True)
plt.show()

OUTPUT:
Correlation Coefficient between variable1 and variable2: 0.9222499938509208
RESULT :
This Python program for Correlation analysis using scatter plot has been
executed successfully.
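
By default, Series.corr() computes the Pearson correlation coefficient. The sketch below works the same value out by hand from the centred data, as the sum of products divided by the square root of the product of the sums of squares, and compares it with the Pandas result; the sample values are invented and do not come from lab-15.csv.

```python
import numpy as np
import pandas as pd

# Invented values; not taken from lab-15.csv
likes = pd.Series([10, 25, 40, 55, 80])
comments = pd.Series([2, 4, 9, 11, 18])

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)), using centred values
dx = likes - likes.mean()
dy = comments - comments.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_pandas = likes.corr(comments)
print("Manual:", r_manual)
print("Pandas:", r_pandas)
```

A coefficient close to 1, such as the 0.92 reported in the output above, indicates a strong positive linear relationship between the two variables.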
