Dfs Manual

Uploaded by

hexagonsih

GOVERNMENT COLLEGE OF ENGINEERING, ERODE – 638 316

RECORD NOTE BOOK

Certified that this is the bonafide record of work done by Selvan / Selvi
_____________________________________ of SEVENTH Semester of B.E Electrical and
Electronics Engineering branch during the Academic Year 2024 – 2025 in the OCS353 – DATA
SCIENCE FUNDAMENTALS

Staff In-Charge Head of the Department

Submitted for the Anna University practical examination on __________________ at

Government College of Engineering, Erode – 638 316
Date: ________________________

Internal Examiner External Examiner

LIST OF EXPERIMENTS

EX. NO   DATE   EXPERIMENT NAME                                               PAGE NO   MARKS   SIGN
1               Working with NumPy                                            3
2               Write a NumPy program to convert a Python dictionary to a     7
                NumPy ndarray
3               Working with Pandas DataFrame                                 9
4               Basic Plots using Matplotlib                                  17
5               Univariate Analysis                                           22
6               Using the diabetes data set from UCI, apply bivariate         26
                analysis
7               Statistical and Probability measures on the Iris data set     30
                (This program requires iris.csv file)
8               Supervised and Unsupervised learning with Python program      34
9               Apply and explore various plotting functions on any data set  37

Ex.no: 1
Date:

Working with NumPy

Aim
To write Python programs for working with NumPy

Program
i. NumPy program to create a null vector of size 10 and update sixth value to 11
Ans:
import numpy as np
vector = np.zeros(10)
vector[5] = 11
print(vector)
output:
[ 0. 0. 0. 0. 0. 11. 0. 0. 0. 0.]
ii. NumPy program to convert an array to a float type
import numpy as np
array = np.array([1, 2, 3, 4, 5])
float_array = array.astype(float)
print(float_array)
output:
[1. 2. 3. 4. 5.]
iii. NumPy program to create a 3 * 3 matrix with values ranging from 2 to 10
Ans:
import numpy as np
matrix = np.arange(2, 11).reshape(3, 3)
print(matrix)
output:
[[ 2 3 4]
[ 5 6 7]
[ 8 9 10]]

iv. Write a NumPy program to convert a list and tuple into arrays
Ans:
import numpy as np
lst = [1, 2, 3, 4]
tpl = (5, 6, 7, 8)
array_from_list = np.array(lst)
array_from_tuple = np.array(tpl)
print(array_from_list)
print(array_from_tuple)
output:
[1 2 3 4]
[5 6 7 8]
v. Write a NumPy program to convert the values of Centigrade degrees into Fahrenheit degrees
and vice versa. Values have to be stored into a NumPy array.
Ans
import numpy as np
centigrade = np.array([0, 20, 37, 100])
fahrenheit = (centigrade * 9/5) + 32
print("Centigrade to Fahrenheit:", fahrenheit)
fahrenheit_to_centigrade = (fahrenheit - 32) * 5/9
print("Fahrenheit to Centigrade:", fahrenheit_to_centigrade)
output:
Centigrade to Fahrenheit: [ 32. 68. 98.6 212. ]
Fahrenheit to Centigrade: [ 0. 20. 37. 100.]
vi. Write a NumPy program to perform the basic arithmetic operations
Ans:
import numpy as np
array1 = np.array([10, 20, 30, 40])
array2 = np.array([1, 2, 3, 4])
addition = np.add(array1, array2)
subtraction = np.subtract(array1, array2)
multiplication = np.multiply(array1, array2)
division = np.divide(array1, array2)

print("Addition:", addition)
print("Subtraction:", subtraction)
print("Multiplication:", multiplication)
print("Division:", division)
Output:
Addition: [11 22 33 44]
Subtraction: [ 9 18 27 36]
Multiplication: [ 10 40 90 160]
Division: [10. 10. 10. 10.]
vii. Write a NumPy program to transpose an array
Ans:
import numpy as np
array = np.array([[1, 2, 3], [4, 5, 6]])
transpose_array = np.transpose(array)
print("Original array:")
print(array)
print("Transposed array:")
print(transpose_array)
Output:
Original array:
[[1 2 3]
[4 5 6]]
Transposed array:
[[1 4]
[2 5]
[3 6]]
viii. Use NumPy, create an array with 5 dimensions and verify that it has 5 dimensions
Ans:
import numpy as np
array_5d = np.ones((2, 2, 2, 2, 2))
print("Number of dimensions:", array_5d.ndim)

Output:
Number of dimensions: 5
ix. Write a NumPy program to merge three given NumPy arrays of same shape
Ans:
import numpy as np
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
array3 = np.array([7, 8, 9])
merged_array = np.concatenate((array1, array2, array3))
print("Merged array:", merged_array)
output:
Merged array: [1 2 3 4 5 6 7 8 9]
x. Create two arrays of six elements, write a NumPy program to count the number of instances
of a value occurring in one array on the condition of another array.
Ans:
import numpy as np
array1 = np.array([1, 2, 3, 2, 4, 2])
array2 = np.array([5, 6, 7, 6, 8, 6])
value_to_count = 2
condition_value = 6
count = np.sum((array1 == value_to_count) & (array2 == condition_value))
print("Number of instances:", count)
output:
Number of instances: 3
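
The count in program x comes from combining two boolean masks element by element. The sketch below shows the intermediate mask and uses np.count_nonzero in place of np.sum; the second condition (count_relaxed) is an illustrative variation that is not part of the exercise.

```python
import numpy as np

array1 = np.array([1, 2, 3, 2, 4, 2])
array2 = np.array([5, 6, 7, 6, 8, 6])

# Element-wise AND: True only where both conditions hold at the same position
mask = (array1 == 2) & (array2 == 6)
print(mask)

# Counting the True entries gives the number of matching positions
count = np.count_nonzero(mask)
print("Number of instances:", count)

# Illustrative variation: any comparison can be combined in the same way
count_relaxed = np.count_nonzero((array1 >= 2) & (array2 < 8))
print("Relaxed condition:", count_relaxed)
```

The parentheses around each comparison are required, because & binds more tightly than == in Python.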

Result
Thus, the Python programs to work with NumPy have been executed successfully.

Ex.no: 2
Date:

Write a NumPy program to convert a Python dictionary to a NumPy ndarray.
Aim
To write a Python NumPy program to convert a Python dictionary to a NumPy ndarray.

Program:
import numpy as np
# Original dictionary
data_dict = {
'column0': {'a': 1, 'b': 0.0, 'c': 0.0, 'd': 2.0},
'column1': {'a': 3.0, 'b': 1, 'c': 0.0, 'd': -1.0},
'column2': {'a': 4, 'b': 1, 'c': 5.0, 'd': -1.0},
'column3': {'a': 3.0, 'b': -1.0, 'c': -1.0, 'd': -1.0}
}
# Convert the dictionary to a NumPy ndarray.
# .T makes each 'columnN' dictionary a column; without it each dictionary
# becomes a row, as in the sample output.
ndarray = np.array([list(col.values()) for col in data_dict.values()]).T
print("Original dictionary:")
print(data_dict)
print("Type:", type(data_dict))
print("ndarray:")
print(ndarray)
print("Type:", type(ndarray))

Sample output:
Original dictionary:
{'column0': {'a': 1, 'b': 0.0, 'c': 0.0, 'd': 2.0},
'column1': {'a': 3.0, 'b': 1, 'c': 0.0, 'd': -1.0},
'column2': {'a': 4, 'b': 1, 'c': 5.0, 'd': -1.0},
'column3': {'a': 3.0, 'b': -1.0, 'c': -1.0, 'd': -1.0}}
Type: <class 'dict'>
ndarray:
[[ 1. 0. 0. 2.]
[ 3. 1. 0. -1.]
[ 4. 1. 5. -1.]
[ 3. -1. -1. -1.]]
Type: <class 'numpy.ndarray'>

output:
Original dictionary:
{'column0': {'a': 1, 'b': 0.0, 'c': 0.0, 'd': 2.0}, 'column1': {'a': 3.0, 'b': 1, 'c': 0.0, 'd': -1.0}, 'column2': {'a':
4, 'b': 1, 'c': 5.0, 'd': -1.0}, 'column3': {'a': 3.0, 'b': -1.0, 'c': -1.0, 'd': -1.0}}
Type: <class 'dict'>
ndarray:
[[ 1. 3. 4. 3.]
[ 0. 1. 1. -1.]
[ 0. 0. 5. -1.]
[ 2. -1. -1. -1.]]
Type: <class 'numpy.ndarray'>
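
The list comprehension in the program relies on every inner dictionary listing its keys in the same order. A possible alternative, sketched here with a shortened two-column dictionary whose second entry is deliberately shuffled, is to let pandas align the values on their keys before converting to an ndarray. The use of pandas is an assumption of this sketch, not a requirement of the exercise.

```python
import pandas as pd

data_dict = {
    'column0': {'a': 1, 'b': 0.0, 'c': 0.0, 'd': 2.0},
    'column1': {'d': -1.0, 'c': 0.0, 'b': 1, 'a': 3.0},  # keys deliberately shuffled
}

# pd.DataFrame matches the inner dictionaries on their keys (a, b, c, d);
# sort_index fixes the row order regardless of insertion order
frame = pd.DataFrame(data_dict).sort_index()

# to_numpy returns a plain ndarray with one column per outer key
ndarray = frame.to_numpy(dtype=float)
print(frame)
print(ndarray)
print("Type:", type(ndarray))
```

Each row of the result corresponds to one inner key and each column to one outer key, even though the second dictionary was written in a different order.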

Result
Thus, python NumPy program to convert a python dictionary to a NumPy ndarray is executed
successfully.

Ex.no: 3
Date:

Working with Pandas DataFrame

Aim
To write python program to work with Pandas DataFrame.

Program
i. Create your own simple Pandas DataFrame and print its values.
import pandas as pd
# Creating a simple DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [24, 27, 22, 32],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}
df = pd.DataFrame(data)
# Printing DataFrame values
print("DataFrame values:")
print(df.values)
output:
DataFrame values:
[['Alice' 24 'New York']
['Bob' 27 'Los Angeles']
['Charlie' 22 'Chicago']
['David' 32 'Houston']]
ii. Perform appending, slicing, addition and deletion of rows with a pandas dataframe.
import pandas as pd
# Initial DataFrame
data = { 'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [24, 27, 22, 32],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data)

# 1. Append a new row
new_row = pd.DataFrame([{'Name': 'Eve', 'Age': 29, 'City': 'San Francisco'}])
df = pd.concat([df, new_row], ignore_index=True)
# 2. Slice rows (e.g., select rows 1 to 3)
sliced_df = df.iloc[1:4]
print("Sliced DataFrame (rows 1 to 3):")
print(sliced_df)
# 3. Add rows (concatenate with another DataFrame)
additional_data = pd.DataFrame({
'Name': ['Frank', 'Grace'],
'Age': [30, 25],
'City': ['Seattle', 'Austin']
})
df = pd.concat([df, additional_data], ignore_index=True)
# 4. Delete a row by index (e.g., delete row with index 2)
df = df.drop(index=2)
print("\nDataFrame after appending, adding, and deleting rows:")
print(df)
Output:
Sliced DataFrame (rows 1 to 3):
Name Age City
1 Bob 27 Los Angeles
2 Charlie 22 Chicago
3 David 32 Houston
DataFrame after appending, adding, and deleting rows:
Name Age City
0 Alice 24 New York
1 Bob 27 Los Angeles
3 David 32 Houston
4 Eve 29 San Francisco
5 Frank 30 Seattle
6 Grace 25 Austin
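
In the output above the row labels jump from 1 to 3 because drop removes a row without renumbering the rest. The sketch below, using a shortened version of the same data, shows how label-based and position-based access differ after a deletion and how reset_index(drop=True) restores a continuous index. It is one way to tidy the index, offered as an illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [24, 27, 22, 32]})
df = df.drop(index=2)                # remaining labels are 0, 1, 3

by_position = df.iloc[2]['Name']     # third row counted by position
by_label = df.loc[3, 'Name']         # row whose label is 3

# Renumber the rows 0..n-1 and discard the old labels
df_reset = df.reset_index(drop=True)
print(df_reset)
```

Both lookups return the same person here, but only because position 2 and label 3 happen to point at the same row after the deletion.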

iii. Using Pandas, Create a DataFrame with a list of dictionaries, row indices, and column
indices
Program
import pandas as pd
# List of dictionaries
data = [
{'Name': 'Alice', 'Age': 24, 'City': 'New York'},
{'Name': 'Bob', 'Age': 27, 'City': 'Los Angeles'},
{'Name': 'Charlie', 'Age': 22, 'City': 'Chicago'},
{'Name': 'David', 'Age': 32, 'City': 'Houston'}
]
# Specifying row indices and column order
df = pd.DataFrame(data, index=['row1', 'row2', 'row3', 'row4'], columns=['Name', 'Age', 'City'])
print("DataFrame with specified row and column indices:")
print(df)
Output:
DataFrame with specified row and column indices:
Name Age City
row1 Alice 24 New York
row2 Bob 27 Los Angeles
row3 Charlie 22 Chicago
row4 David 32 Houston
iv. Write a Pandas program to get the powers of array values element-wise.
Note: First array elements raised to powers from second array
Sample data:
{'X': [78, 85, 96, 80, 86], 'Y': [84, 94, 89, 83, 86], 'Z': [86, 97, 96, 72, 83]}
Expected Output:
X Y Z
0 78 84 86
1 85 94 97
2 96 89 96
3 80 83 72
4 86 86 83

Program
import pandas as pd
import numpy as np
# Sample data as a dictionary
data = {'X': [78, 85, 96, 80, 86], 'Y': [84, 94, 89, 83, 86], 'Z': [86, 97, 96, 72, 83]}
df = pd.DataFrame(data)
# Element-wise power: X raised to the power of Y
df['Power_X_Y'] = np.power(df['X'], df['Y'])
print("Original DataFrame:")
print(df[['X', 'Y', 'Z']])
print("\nDataFrame with element-wise power of X^Y:")
print(df[['X', 'Y', 'Z', 'Power_X_Y']])
Output:
Original DataFrame:
X Y Z
0 78 84 86
1 85 94 97
2 96 89 96
3 80 83 72
4 86 86 83
DataFrame with element-wise power of X^Y:
X Y Z Power_X_Y
0 78 84 86 0
1 85 94 97 4551265826121030281
2 96 89 96 0
3 80 83 72 0
4 86 86 83 0
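
The zeros in the Power_X_Y column are not true results. A value such as 78 raised to 84 is far larger than a 64-bit integer can hold, so NumPy's int64 arithmetic wraps around; an even base raised to a power of 64 or more is a multiple of 2^64 and wraps to exactly 0. The sketch below shows two possible ways around this, under the assumption that either an approximate float or an exact Python integer is acceptable for the exercise.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': [78, 85, 96, 80, 86], 'Y': [84, 94, 89, 83, 86]})

# Option 1: float64 keeps the magnitude, with roughly 15-16 significant digits
power_float = np.power(df['X'].astype(float), df['Y'].astype(float))

# Option 2: Python integers have arbitrary precision, so the result is exact
power_exact = [int(x) ** int(y) for x, y in zip(df['X'], df['Y'])]

print(power_float)
print("Digits in 78**84:", len(str(power_exact[0])))
```

Option 1 stays vectorised and is usually sufficient for plotting or comparison; option 2 is slower but loses no digits.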

v. Write a Pandas Program to get the numeric representation of an array by identifying distinct
values of a given column of a DataFrame
Sample output:
Original DataFrame:
Name Date_Of_Birth Age
0 Alberto Franco 17/05/2002 18.5
1 Gino Mcnell 16/02/1999 21.2
2 Ryan Parkes 25/09/1998 22.5
3 Eesha Hinton 11/05/2002 22.0
4 Gino Mcnell 15/09/1997 23.0
Numeric representation of an array by identifying distinct values:
[0 1 2 3 1]
Index(['Alberto Franco', 'Gino Mcnell', 'Ryan Parkes', 'Eesha Hinton'], dtype='object')
Program
import pandas as pd
# Sample DataFrame
data = {
'Name': ['Alberto Franco', 'Gino Mcnell', 'Ryan Parkes', 'Eesha Hinton', 'Gino Mcnell'],
'Date_Of_Birth': ['17/05/2002', '16/02/1999', '25/09/1998', '11/05/2002', '15/09/1997'],
'Age': [18.5, 21.2, 22.5, 22.0, 23.0]
}
df = pd.DataFrame(data)
# Getting the numeric representation of 'Name' column by identifying distinct values
df['Name_numeric'] = pd.factorize(df['Name'])[0]
print("Original DataFrame:")
print(df[['Name', 'Date_Of_Birth', 'Age']])
print("\nNumeric representation of an array by identifying distinct values:")
print(df['Name_numeric'].values)
print("\nUnique names with their numeric index mapping:")
print(pd.Index(df['Name'].unique()))

Output:
Original DataFrame:
Name Date_Of_Birth Age
0 Alberto Franco 17/05/2002 18.5
1 Gino Mcnell 16/02/1999 21.2
2 Ryan Parkes 25/09/1998 22.5
3 Eesha Hinton 11/05/2002 22.0
4 Gino Mcnell 15/09/1997 23.0
Numeric representation of an array by identifying distinct values:
[0 1 2 3 1]
Unique names with their numeric index mapping:
Index(['Alberto Franco', 'Gino Mcnell', 'Ryan Parkes', 'Eesha Hinton'], dtype='object')
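
pd.factorize returns two objects: the integer codes and the distinct values in order of first appearance. The program keeps only the codes and recomputes the distinct values with unique(). A minimal sketch that captures both at once and shows how the codes map back to the original names:

```python
import pandas as pd

names = pd.Series(['Alberto Franco', 'Gino Mcnell', 'Ryan Parkes',
                   'Eesha Hinton', 'Gino Mcnell'])

# codes[i] is the position of names[i] within uniques
codes, uniques = pd.factorize(names)
print(codes)
print(uniques)

# Indexing uniques with the codes rebuilds the original sequence of names
restored = uniques[codes]
print(list(restored))
```

Because 'Gino Mcnell' appears twice, both of its rows receive the same code.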

vi. Write a Pandas program to count the number of rows and columns of a DataFrame.
Sample python dictionary data and list labels:
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew',
'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
Expected Output:
Number of Rows: 10
Number of Columns: 4
Program
import pandas as pd
import numpy as np
exam_data = {
'name': ['BarathKumar', 'TamilSelvan', 'Dharshan', 'Saravanan', 'SudhanKumar', 'EsaiVani',
'KalaiVani', 'Rupriya', 'Abirami', 'Murugan'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']

}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
# Creating the DataFrame with row labels
df = pd.DataFrame(exam_data, index=labels)
# Counting rows and columns
num_rows = df.shape[0]
num_columns = df.shape[1]
print("Number of Rows:", num_rows)
print("Number of Columns:", num_columns)
Output:
Number of Rows: 10
Number of Columns: 4
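
df.shape counts every row, including those whose score is NaN. When the number of usable values matters, count() and isna() give the complementary figures. A short sketch using the score and attempts columns from the exercise data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
                   'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1]})

num_rows, num_columns = df.shape      # every row counts, NaN or not
non_missing = df['score'].count()     # only the non-NaN scores
missing = df['score'].isna().sum()    # rows where score is NaN

print("Rows:", num_rows, "Columns:", num_columns)
print("Scores present:", non_missing, "Scores missing:", missing)
```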
vii. Write a Pandas program to check a given column is present in a DataFrame or not
Sample data:
Original DataFrame
col1 col2 col3
0 1 4 7
1 2 5 8
2 3 6 12
3 4 9 1
4 7 5 11
Col4 is not present in DataFrame.
Col1 is present in DataFrame.
Program
import pandas as pd
data = {
'col1': [1, 2, 3, 4, 7],
'col2': [4, 5, 6, 9, 5],
'col3': [7, 8, 12, 1, 11]
}
df = pd.DataFrame(data)
def check_column_presence(df, column_name):
    # df.columns holds the column labels, so "in" tests for membership
    if column_name in df.columns:
        print(f"{column_name} is present in DataFrame.")
    else:
        print(f"{column_name} is not present in DataFrame.")
check_column_presence(df, 'col4')
check_column_presence(df, 'col1')
Output:
col4 is not present in DataFrame.
col1 is present in DataFrame.

Result
Thus, the python programs to work with Pandas DataFrame are executed successfully.

Ex.no: 4
Date:

Basic Plots using Matplotlib

Aim
To write python programs to plot basic plots using Matplotlib
i. Using the ‘concrete strength’ dataset, explore relationships between two continuous variables
with Scatterplots
Program
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
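# Set a random seed for reproducibility
np.random.seed(42)
# NOTE: the step that builds df is missing from this record. The lines below are
# an assumed stand-in: a synthetic dataset with the three columns plotted later.
# The ranges and the parabolic formula are illustrative, not measured data.
n_samples = 200
cement = np.random.uniform(100, 500, n_samples)
water = np.random.uniform(120, 250, n_samples)
strength = (70 - 0.0005 * (cement - 400) ** 2 - 0.1 * (water - 120)
            + np.random.normal(0, 3, n_samples))
df = pd.DataFrame({'Cement': cement, 'Water': water, 'Strength': strength})
# Save the DataFrame as a CSV file
file_path = '/content/concrete_strength_parabolic.csv'
df.to_csv(file_path, index=False)
print(f"Concrete strength dataset saved to {file_path}")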
# Plotting relationships between continuous variables
# Scatterplot between 'Cement' and 'Strength'
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='Cement', y='Strength', color='blue')
plt.title('Relationship between Cement and Concrete Strength')
plt.xlabel('Cement (kg/m³)')
plt.ylabel('Concrete Strength (MPa)')
plt.show()
# Scatterplot between 'Water' and 'Strength'
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='Water', y='Strength', color='green')
plt.title('Relationship between Water and Concrete Strength')
plt.xlabel('Water (kg/m³)')
plt.ylabel('Concrete Strength (MPa)')

plt.show()

Output:

ii. Draw a Scatter Plot for the following Pandas DataFrame with Team name and Rank Points
on the x and y axes:
['Australia', 2500], ['Bangladesh', 1000], ['England', 2000], ['India', 3000], ['Srilanka', 1500]
Program
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Create the DataFrame with team names and rank points
data = {
'Team': ['Australia', 'Bangladesh', 'England', 'India', 'Srilanka'],
'Rank Points': [2500, 1000, 2000, 3000, 1500]
}
df_teams = pd.DataFrame(data)
# Display the DataFrame
print("DataFrame:")
print(df_teams)
# Plotting the scatter plot
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df_teams, x='Team', y='Rank Points', color='Pink', s=100)
# Adding labels and title
plt.title('Scatter Plot of Team Rank Points')
plt.xlabel('Team')
plt.ylabel('Rank Points')
plt.show()

Output:

iii. Make a three-dimensional plot with 50 randomly generated data points for x, y, and z. Set the
point colour to red and the point size to 50.
Ans:
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
# Generating random data for x, y, and z axes
np.random.seed(42)
x = np.random.rand(50)
y = np.random.rand(50)
z = np.random.rand(50)
# Creating a 3D plot
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
# Plotting the points with specified color and size

ax.scatter(x, y, z, color='red', s=50)
# Adding labels for clarity
ax.set_xlabel('X Axis')
ax.set_ylabel('Y Axis')
ax.set_zlabel('Z Axis')
ax.set_title('3D Scatter Plot with Random Data Points')
plt.show()
Output:

Result
Thus, the Python programs to plot the basic plots using Matplotlib are executed successfully.

Ex.no: 5
Date:

Univariate Analysis
Aim
To write python programs to apply Univariate Analysis
Use the diabetes data set from Pima Indians Diabetes data set for performing the following:
Apply Univariate analysis:
a. Frequency
b. Mean
c. Median
d. Mode
e. Variance
f. Standard Deviation
g. Skewness and Kurtosis
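
Before applying these measures to the diabetes data, it can help to see each one on a sample small enough to check by hand. The sketch below uses a made-up eight-value series (not taken from the data set) whose mean is 5 and whose population variance is 4. It uses the pandas methods for every measure, whereas the program that follows uses NumPy and SciPy.

```python
import pandas as pd

# Made-up values for illustration; sum = 40, so the mean is 5
sample = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

frequency = sample.value_counts()   # a. how often each value occurs
mean = sample.mean()                # b. arithmetic average
median = sample.median()            # c. middle of the sorted values
mode = sample.mode().iloc[0]        # d. most frequent value
variance = sample.var(ddof=0)       # e. population variance (divide by n)
std_dev = sample.std(ddof=0)        # f. square root of the variance
skewness = sample.skew()            # g. positive when the right tail is longer
kurt = sample.kurt()                #    excess kurtosis (near 0 for a normal curve)

print(frequency)
print(mean, median, mode, variance, std_dev)
print(skewness, kurt)
```

The squared deviations from the mean are 9, 1, 1, 1, 0, 0, 4 and 16, which sum to 32; dividing by 8 gives the variance of 4 and a standard deviation of 2.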

Program
import pandas as pd
import numpy as np
from scipy import stats
# Load the dataset
file_path = "/content/diabetes.csv" # Replace with your actual file path if different
data = pd.read_csv(file_path)
# Filter data for Outcome = 0 and Outcome = 1
data_0 = data[data['Outcome'] == 0]
data_1 = data[data['Outcome'] == 1]
# Dictionary to store the results
analysis_results = {
"Outcome = 0": {
"Pregnancies Frequency": data_0["Pregnancies"].value_counts(),
"Glucose Mean": np.mean(data_0["Glucose"]),
"BloodPressure Median": np.median(data_0["BloodPressure"]),
"SkinThickness Mode": stats.mode(data_0["SkinThickness"])[0],
"Insulin Variance": np.var(data_0["Insulin"]),
"BMI Standard Deviation": np.std(data_0["BMI"]),

"DiabetesPedigreeFunction Skewness": stats.skew(data_0["DiabetesPedigreeFunction"]),
"Age Kurtosis": stats.kurtosis(data_0["Age"])
},

"Outcome = 1": {
"Pregnancies Frequency": data_1["Pregnancies"].value_counts(),
"Glucose Mean": np.mean(data_1["Glucose"]),
"BloodPressure Median": np.median(data_1["BloodPressure"]),
"SkinThickness Mode": stats.mode(data_1["SkinThickness"])[0],
"Insulin Variance": np.var(data_1["Insulin"]),
"BMI Standard Deviation": np.std(data_1["BMI"]),
"DiabetesPedigreeFunction Skewness": stats.skew(data_1["DiabetesPedigreeFunction"]),
"Age Kurtosis": stats.kurtosis(data_1["Age"])
}
}
# Display the analysis for both outcomes
for outcome, stats_dict in analysis_results.items():
    print(f"\nStatistical Analysis for {outcome}:")
    for stat_name, value in stats_dict.items():
        print(f"{stat_name}: {value}")

output:

Statistical Analysis for Outcome = 0:

Pregnancies Frequency: Pregnancies
1 106
2 84
0 73
3 48
4 45
5 36
6 34
7 20

8 16
10 14
9 10
13 5
12 5
11 4
Name: count, dtype: int64
Glucose Mean: 109.98
BloodPressure Median: 70.0
SkinThickness Mode: 0
Insulin Variance: 9754.796735999955
BMI Standard Deviation: 7.682161307861215
DiabetesPedigreeFunction Skewness: 2.00021791479704
Age Kurtosis: 1.9318725201269862
Statistical Analysis for Outcome = 1:

Pregnancies Frequency: Pregnancies

0 38
1 29
3 27
7 25
4 23
8 22
5 21
2 19
9 18
6 16
10 10
11 7
13 5
12 4
14 2

15 1
17 1
Name: count, dtype: int64
Glucose Mean: 141.25746268656715
BloodPressure Median: 74.0
SkinThickness Mode: 0
Insulin Variance: 19162.902149699297
BMI Standard Deviation: 7.249404266473003
DiabetesPedigreeFunction Skewness: 1.7127179440927176
Age Kurtosis: -0.36378456012609117

Result
Thus, the Python programs to apply Univariate Analysis are executed successfully.

Ex.no: 6
Date:

Use the diabetes data set from UCI data set for performing the following:
Apply Bivariate Analysis
Program:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
# Load the dataset
file_path = "/content/diabetes.csv" # Update with your actual path if needed
data = pd.read_csv(file_path)
# Display dataset info
print("Dataset Info:")
print(data.info())
print("\nDataset Head:")
print(data.head())
# Multiple Regression Analysis - Logistic Regression for 'Outcome' Prediction
# Define predictors and target variable
X = data.drop(columns=["Outcome"]) # Independent variables
y = data["Outcome"] # Dependent variable
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Fit logistic regression model
logistic_model = LogisticRegression(max_iter=200)
logistic_model.fit(X_train, y_train)
# Predict on the test set
y_pred = logistic_model.predict(X_test)
# Model Evaluation

accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
print("Logistic Regression Model Evaluation:")
print(f"Accuracy: {accuracy:.3f}")
print("Confusion Matrix:")
print(conf_matrix)
print("\nClassification Report:")
print(class_report)
# Logistic Regression Summary using StatsModels for detailed statistics
X_train_sm = sm.add_constant(X_train) # Adding constant for intercept in statsmodels
logit_model = sm.Logit(y_train, X_train_sm)
result = logit_model.fit()
print("\nLogistic Regression Analysis Summary (StatsModels):")
print(result.summary())

Output:
Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Pregnancies 768 non-null int64
1 Glucose 768 non-null int64
2 BloodPressure 768 non-null int64
3 SkinThickness 768 non-null int64
4 Insulin 768 non-null int64
5 BMI 768 non-null float64
6 DiabetesPedigreeFunction 768 non-null float64
7 Age 768 non-null int64
8 Outcome 768 non-null int64

dtypes: float64(2), int64(7)
memory usage: 54.1 KB
None
Dataset Head:
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \
0 6 148 72 35 0 33.6
1 1 85 66 29 0 26.6
2 8 183 64 0 0 23.3
3 1 89 66 23 94 28.1
4 0 137 40 35 168 43.1

DiabetesPedigreeFunction Age Outcome

0 0.627 50 1
1 0.351 31 0
2 0.672 32 1
3 0.167 21 0
4 2.288 33 1

Logistic Regression Model Evaluation:

Accuracy: 0.736
Confusion Matrix:
[[120 31]
[ 30 50]]

Classification Report:
              precision    recall  f1-score   support

           0       0.80      0.79      0.80       151
           1       0.62      0.62      0.62        80

    accuracy                           0.74       231
   macro avg       0.71      0.71      0.71       231
weighted avg       0.74      0.74      0.74       231

Optimization terminated successfully.

Current function value: 0.459388
Iterations 6

Logistic Regression Analysis Summary (StatsModels):

                           Logit Regression Results
==============================================================================
Dep. Variable:                Outcome   No. Observations:                  537
Model:                          Logit   Df Residuals:                      528
Method:                           MLE   Df Model:                            8
Date:                Sat, 26 Oct 2024   Pseudo R-squ.:                  0.2905
Time:                        17:33:50   Log-Likelihood:                -246.69
converged:                       True   LL-Null:                       -347.71
Covariance Type:            nonrobust   LLR p-value:                 2.378e-39
============================================================================================
                               coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------------
const                       -9.4451      0.915    -10.321      0.000     -11.239      -7.651
Pregnancies                  0.0580      0.039      1.477      0.140      -0.019       0.135
Glucose                      0.0359      0.005      7.714      0.000       0.027       0.045
BloodPressure               -0.0108      0.007     -1.584      0.113      -0.024       0.003
SkinThickness               -0.0015      0.008     -0.179      0.858      -0.018       0.015
Insulin                     -0.0010      0.001     -0.884      0.377      -0.003       0.001
BMI                          0.1090      0.019      5.740      0.000       0.072       0.146
DiabetesPedigreeFunction     0.4215      0.357      1.182      0.237      -0.278       1.120
Age                          0.0359      0.012      3.106      0.002       0.013       0.059
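
The evaluation figures in this output can be re-derived by hand, which is a useful check on how to read them. The sketch below recomputes accuracy, precision, recall and F1 for class 1 from the printed confusion matrix (scikit-learn puts actual classes in rows and predicted classes in columns), then converts the Glucose coefficient into an odds ratio. The variable names are illustrative.

```python
import math

# Confusion matrix from the output: rows = actual, columns = predicted
tn, fp = 120, 31
fn, tp = 30, 50

accuracy = (tp + tn) / (tp + tn + fp + fn)   # 170 / 231
precision_1 = tp / (tp + fp)                 # 50 / 81
recall_1 = tp / (tp + fn)                    # 50 / 80
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

# A logit coefficient is a change in log-odds per unit of the predictor;
# exponentiating it gives the multiplicative change in the odds
glucose_coef = 0.0359
odds_ratio_per_unit = math.exp(glucose_coef)
odds_ratio_per_10 = math.exp(10 * glucose_coef)

print(f"Accuracy: {accuracy:.3f}")
print(f"Class 1 precision/recall/F1: {precision_1:.2f} {recall_1:.2f} {f1_1:.2f}")
print(f"Odds ratio per 1 unit of Glucose: {odds_ratio_per_unit:.3f}")
print(f"Odds ratio per 10 units of Glucose: {odds_ratio_per_10:.3f}")
```

Read this way, each additional unit of Glucose multiplies the estimated odds of a positive outcome by a little under 1.04, holding the other predictors fixed.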

Result
Thus, the Python program to apply bivariate analysis to the diabetes data set from UCI is
executed successfully.

Ex.no: 7
Date:

Statistical and Probability measures on the Iris data set (This program
requires iris.csv file)
Aim
To write a python program to apply statistical and probability measures on any data set

Program
import pandas as pd
import matplotlib.pyplot as plt
# Load the Iris dataset from a text file, Excel file, or from the web
# 1. Reading data from a text file (CSV format); this is the option used here
df_web = pd.read_csv('/content/iris.csv')
# 2. Reading data from an Excel file
# Uncomment if you have iris.xlsx locally:
# df_web = pd.read_excel('path_to_your_file/iris.xlsx')
# 3. Reading data directly from a URL (web)
# Note: this file stores species as text, so df_web.corr() would then need numeric_only=True
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
# df_web = pd.read_csv(url, header=None, names=column_names)
# Displaying the first few rows to verify data load
print("First five rows of the Iris dataset:")
print(df_web.head())
# Descriptive Analytics on the Iris dataset
# 1. Basic information about the dataset
print("\nDataset Information:")
print(df_web.info())
# 2. Summary statistics
print("\nSummary Statistics:")
print(df_web.describe())
# 3. Checking for unique species
print("\nUnique Species in the dataset:")

print(df_web['species'].unique())
# 4. Count of each species
print("\nCount of each species:")
print(df_web['species'].value_counts())
# 5. Mean, median, and standard deviation of Sepal Length
print("\nMean Sepal Length:", df_web['sepal_length'].mean())
print("Median Sepal Length:", df_web['sepal_length'].median())
print("Standard Deviation of Sepal Length:", df_web['sepal_length'].std())
# 6. Correlation matrix to see relationships between variables
print("\nCorrelation Matrix:")
print(df_web.corr())
# 7. Grouping data by species and calculating mean values
print("\nMean values by species:")
print(df_web.groupby('species').mean())
# 8. Pair plot for visual analysis (requires an environment that can display plots)
import seaborn as sns
sns.pairplot(df_web, hue="species")
plt.show()
Output:
First five rows of the Iris dataset:
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 0
1 4.9 3.0 1.4 0.2 0
2 4.7 3.2 1.3 0.2 0
3 4.6 3.1 1.5 0.2 0
4 5.0 3.6 1.4 0.2 0
Dataset Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
# Column Non-Null Count Dtype

0 sepal_length 150 non-null float64
1 sepal_width 150 non-null float64
2 petal_length 150 non-null float64
3 petal_width 150 non-null float64
4 species 150 non-null int64
dtypes: float64(4), int64(1)
memory usage: 6.0 KB
None
Summary Statistics:
sepal_length sepal_width petal_length petal_width species
count 150.000000 150.000000 150.000000 150.000000 150.000000
mean 5.843333 3.057333 3.758000 1.199333 1.000000
std 0.828066 0.435866 1.765298 0.762238 0.819232
min 4.300000 2.000000 1.000000 0.100000 0.000000
25% 5.100000 2.800000 1.600000 0.300000 0.000000
50% 5.800000 3.000000 4.350000 1.300000 1.000000
75% 6.400000 3.300000 5.100000 1.800000 2.000000
max 7.900000 4.400000 6.900000 2.500000 2.000000
Unique Species in the dataset:
[0 1 2]
Count of each species:
species
0 50
1 50
2 50
Name: count, dtype: int64
Mean Sepal Length: 5.843333333333334
Median Sepal Length: 5.8
Standard Deviation of Sepal Length: 0.8280661279778629
Correlation Matrix:
sepal_length sepal_width petal_length petal_width species
sepal_length 1.000000 -0.117570 0.871754 0.817941 0.782561

sepal_width -0.117570 1.000000 -0.428440 -0.366126 -0.426658
petal_length 0.871754 -0.428440 1.000000 0.962865 0.949035
petal_width 0.817941 -0.366126 0.962865 1.000000 0.956547
species 0.782561 -0.426658 0.949035 0.956547 1.000000
Mean values by species:
sepal_length sepal_width petal_length petal_width
species
0 5.006 3.428 1.462 0.246
1 5.936 2.770 4.260 1.326
2 6.588 2.974 5.552 2.026
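
The program above covers the statistical measures; the probability side of the exercise can be approached through relative frequencies. The sketch below estimates a class probability, a marginal probability and a conditional probability. It uses a tiny made-up table rather than iris.csv, and the threshold of 6 is an arbitrary choice for illustration.

```python
import pandas as pd

# Toy measurements for illustration only; not taken from iris.csv
toy = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 6.4, 6.9, 5.5, 6.3, 5.8, 7.1],
    'species':      [0,   0,   1,   1,   1,   2,   2,   2],
})

# P(species = k): relative frequency of each class
p_species = toy['species'].value_counts(normalize=True).sort_index()

# P(sepal_length > 6): the mean of a boolean column is the share of True values
is_long = toy['sepal_length'] > 6
p_long = is_long.mean()

# P(sepal_length > 6 | species = k): the same share computed within each class
p_long_given_species = is_long.groupby(toy['species']).mean()

print(p_species)
print(p_long)
print(p_long_given_species)
```

The same three lines work on the real DataFrame by replacing toy with df_web.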

Result
Thus, the python program to apply statistical and probability measures on any data set is
executed successfully.

Ex.no: 8
Date:

Supervised and Unsupervised learning with python program

Aim
To implement Supervised and unsupervised learning with python program
Program
i. Supervised learning
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
url="https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names=['sepal-length','sepal-width','petal-length','petal-width','Class']
dataset=pd.read_csv(url,names=names)
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
from sklearn.preprocessing import LabelEncoder
from matplotlib.colors import ListedColormap
# Encode the class labels as integers
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)
# Use only the first two features for a 2D decision boundary plot
X_two_features = X[:, :2]
X_train_2D, X_test_2D, y_train_2D, y_test_2D = train_test_split(X_two_features, y_encoded,
test_size=0.20, random_state=42)
# Fit KNN model on 2D data
knn_2D = KNeighborsClassifier(n_neighbors=5)
knn_2D.fit(X_train_2D, y_train_2D)
# Create a mesh grid for plotting the decision boundary
h = .02

x_min, x_max = X_two_features[:, 0].min() - 1, X_two_features[:, 0].max() + 1
y_min, y_max = X_two_features[:, 1].min() - 1, X_two_features[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
# Predict the labels for each point in the mesh grid
Z = knn_2D.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
# Plot the decision boundary
plt.figure(figsize=(8, 6))
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])
plt.contourf(xx, yy, Z, cmap=cmap_light)
# Plot the original data points
plt.scatter(X_two_features[:, 0], X_two_features[:, 1], c=y_encoded, cmap=cmap_bold,
edgecolor='k', s=20)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.title("KNN Decision Boundary (2 features)")
plt.show()
output:
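
The supervised program fits the classifier and draws its decision regions but never scores it on the held-out split. A minimal sketch of that evaluation step follows. It assumes scikit-learn's bundled copy of Iris (load_iris) in place of the UCI download, uses all four features, and fixes random_state for repeatability; these are choices made for this sketch, not part of the record.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Bundled copy of the data set, so no download is needed
X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows; stratify keeps the class proportions equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)              # learn from the training split only

y_pred = knn.predict(X_test)           # predict labels for unseen rows
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Scoring on rows the model has not seen is what distinguishes an estimate of generalisation from a measure of memorisation.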

ii. Unsupervised learning Implementation of K-means Clustering Algorithm
Program:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
np.random.seed(0)
X = np.random.randn(200, 2) + np.array([2, 2])
X = np.vstack((X, np.random.randn(200, 2) + np.array([-2, -2])))
X = np.vstack((X, np.random.randn(200, 2) + np.array([2, -2])))
X = np.vstack((X, np.random.randn(200, 2) + np.array([-2, 2])))
kmeans = KMeans(n_clusters=4)
kmeans.fit(X)
plt.scatter(X[:,0], X[:,1], c=kmeans.labels_, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], marker='*', s=200,
color='black')
plt.show()
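
Because the points are generated around four known centres, the clustering can be checked numerically as well as visually. The sketch below regenerates comparable data and measures how far each true centre is from the nearest fitted centre. The random generator, n_init=10 and random_state=0 are assumptions added here so that repeated runs give the same result.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
true_centers = np.array([[2, 2], [-2, -2], [2, -2], [-2, 2]])

# 200 points with unit spread around each of the four centres
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(200, 2)) for c in true_centers])

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Pairwise distances between true centres (rows) and fitted centres (columns)
dists = np.linalg.norm(
    true_centers[:, None, :] - kmeans.cluster_centers_[None, :, :], axis=2)
nearest = dists.min(axis=1)            # distance to the closest fitted centre

print(kmeans.cluster_centers_)
print(nearest)
```

Small values in nearest indicate that the algorithm recovered the structure that was built into the data, without ever seeing the labels.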

OUTPUT

Result
Thus, the Python programs to implement supervised and unsupervised learning are executed
successfully.

Ex. No: 9
Date:

Apply and explore various plotting functions on any data set.

Aim
To apply and explore various plotting functions on any data set
i. Apply and explore various plotting functions on UCI data set for performing the following:
i. Normal Value
ii. Density and contour plots
iii. Three-dimensional plotting
Program
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from scipy.stats import norm

# Load the dataset

file_path = "/content/diabetes.csv" # Update path as needed
data = pd.read_csv(file_path)

# 1. Normal Value Distribution Plot

# Let's take 'Glucose' and 'BMI' as examples
plt.figure(figsize=(14, 6))

# Plot for Glucose

plt.subplot(1, 2, 1)
sns.histplot(data['Glucose'], kde=True, stat="density", line_kws={'linestyle':'--'}, color="skyblue")
x_vals = np.linspace(data['Glucose'].min(), data['Glucose'].max(), 100)
plt.plot(x_vals, norm.pdf(x_vals, data['Glucose'].mean(), data['Glucose'].std()), color="red",
linestyle="--")
plt.title("Normal Distribution of Glucose")
plt.xlabel("Glucose")

plt.ylabel("Density")

# Plot for BMI

plt.subplot(1, 2, 2)
sns.histplot(data['BMI'], kde=True, stat="density", line_kws={'linestyle':'--'}, color="orange")
x_vals = np.linspace(data['BMI'].min(), data['BMI'].max(), 100)
plt.plot(x_vals, norm.pdf(x_vals, data['BMI'].mean(), data['BMI'].std()), color="red", linestyle="--")
plt.title("Normal Distribution of BMI")
plt.xlabel("BMI")
plt.ylabel("Density")
plt.show()

# 2. Density and Contour Plots

# Using Glucose vs. Insulin for example
plt.figure(figsize=(8, 6))
sns.kdeplot(x=data['Glucose'], y=data['Insulin'], cmap="coolwarm", fill=True, thresh=0.05)
plt.title("Density and Contour Plot of Glucose vs Insulin")
plt.xlabel("Glucose")
plt.ylabel("Insulin")
plt.show()

# 3. Three-Dimensional Plotting
# 3D plot of Age, BMI, and Glucose colored by Outcome
fig = plt.figure(figsize=(10, 7))
ax = fig.add_subplot(111, projection='3d')

# Scatter plot
sc = ax.scatter(data['Age'], data['BMI'], data['Glucose'], c=data['Outcome'], cmap="viridis", s=50,
alpha=0.7)
ax.set_xlabel("Age")
ax.set_ylabel("BMI")
ax.set_zlabel("Glucose")
ax.set_title("3D Plot of Age, BMI, and Glucose")

plt.colorbar(sc, label="Outcome")
plt.show()
Output:
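
The first two plots in the program above rest on two computations: a normal curve evaluated with the sample mean and standard deviation, and a two-dimensional density estimate. The sketch below reproduces both numerically, without plotting. It assumes synthetic data in place of `diabetes.csv`: the arrays `glucose` and `bmi` are made-up stand-ins, not values from the real data set, and `scipy.stats.gaussian_kde` is used here as one way to obtain a 2D kernel density estimate of the kind `sns.kdeplot` draws.

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(0)
# Synthetic stand-ins for two numeric columns (assumed values, not real data)
glucose = rng.normal(loc=120.0, scale=30.0, size=500)
bmi = rng.normal(loc=32.0, scale=7.0, size=500)

# 1. Normal curve with the sample mean and standard deviation.
#    ddof=1 matches the default of pandas Series.std().
mu, sigma = glucose.mean(), glucose.std(ddof=1)
x_vals = np.linspace(glucose.min(), glucose.max(), 100)
pdf_vals = norm.pdf(x_vals, mu, sigma)
print("mean:", round(mu, 2), "std:", round(sigma, 2))

# 2. Two-dimensional kernel density estimate evaluated on a 40 x 40 grid.
#    gaussian_kde expects the data with shape (n_dimensions, n_samples).
kde = gaussian_kde(np.vstack([glucose, bmi]))
gx, gy = np.meshgrid(
    np.linspace(glucose.min(), glucose.max(), 40),
    np.linspace(bmi.min(), bmi.max(), 40),
)
density = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)
print("density grid shape:", density.shape)
```

The `density` array is what a contour plot visualises: passing `gx`, `gy`, and `density` to `plt.contourf` would draw filled contours of the estimate.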

ii. Apply and explore various plotting functions on UCI data set for performing the following:
i. Correlation and scatter plots
ii. Histograms
iii. Three-dimensional plotting
Program:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np

# Load the dataset

file_path = "/content/diabetes.csv"
data = pd.read_csv(file_path)

# i. Correlation and Scatter Plots

# Correlation Heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(data.corr(), annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Correlation Matrix")
plt.show()

# Pairplot for Scatter Plots (pairwise relationships between variables)

sns.pairplot(data, hue="Outcome", diag_kind="kde")
plt.suptitle("Scatter Plots for Pairwise Relationships", y=1.02)
plt.show()

# ii. Histograms
# Plot histograms for continuous variables
data.hist(bins=15, figsize=(15, 10), color="skyblue", edgecolor="black")
plt.suptitle("Histograms of Diabetes Dataset Features", y=0.95)
plt.show()

# iii. Three-Dimensional Plotting
# 3D plot of Age, BMI, and Glucose colored by Outcome
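fig = plt.figure(figsize=(10, 7))
ax = fig.add_subplot(111, projection='3d')

# Scatter plot of the three features, colored by the class label
sc = ax.scatter(data['Age'], data['BMI'], data['Glucose'], c=data['Outcome'], cmap="viridis", s=50,
alpha=0.7)
ax.set_xlabel("Age")
ax.set_ylabel("BMI")
ax.set_zlabel("Glucose")
ax.set_title("3D Plot of Age, BMI, and Glucose")
plt.colorbar(sc, label="Outcome")
plt.show()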

output:
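
The heatmap in the program above displays the matrix returned by `data.corr()`, which computes Pearson correlation coefficients by default. The sketch below checks one entry of such a matrix against the textbook formula. The data frame is synthetic (the column values are invented for illustration and are not taken from `diabetes.csv`), and the helper name `pearson` is an assumption of this sketch.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300
# Synthetic columns; Glucose is constructed to depend weakly on Age
age = rng.uniform(21, 80, n)
glucose = 90 + 0.5 * age + rng.normal(0, 15, n)
bmi = rng.normal(32, 6, n)
df = pd.DataFrame({"Age": age, "Glucose": glucose, "BMI": bmi})

corr = df.corr()  # Pearson correlation matrix (pandas default method)

def pearson(x, y):
    """Hypothetical helper: Pearson r from centered sums of products."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

manual = pearson(df["Age"].to_numpy(), df["Glucose"].to_numpy())
print(corr.round(2))
print("manual Age-Glucose r:", round(manual, 4))
```

Each coefficient lies between -1 and 1, and the diagonal is always 1 because every variable is perfectly correlated with itself.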

Result
Thus, the Python programs to apply and explore various plotting functions on a data set were written and executed.
