Panipat Institute of Engineering and Technology Samalkha
Department of Computer Science and Engineering
(Artificial Intelligence and Machine Learning)
Python Lab II (PC – CS – AI&ML – 218A)
Practical File
Submitted To: Ms. Gaurisha, Asst. Professor
Submitted by: Ankit Raj, B.Tech CSE (AI&ML), 4th sem (2823302)
Affiliated to:
Kurukshetra University, Kurukshetra, India
Index
Sr. No.  Aim
1.   Write a program to implement the basic Python libraries NumPy and SciPy.
2.   Write a program to implement the basic Python libraries Matplotlib, pandas, and scikit-learn.
3.   Write a program to create a sample from a population.
4.   Write a program to evaluate the mean, median, and mode of a dataset.
5.   Write a program to implement the Central Limit Theorem on a dataset.
6.   Write a program to implement measures of spread on a dataset.
7.   Write a program to differentiate between descriptive and inferential statistics.
8.   Write a program to implement PMF, PDF, and CDF.
9.   Write a program to implement different visualization techniques on a sample dataset.
10.  Write a program to implement different hypothesis tests on a sample dataset.
Program – 1
Aim: Write a program to implement the basic Python libraries NumPy and SciPy.
Code:
# Importing required libraries
import numpy as np
from scipy import linalg

# NumPy Basics
print("----- NumPy -----")

# Create a NumPy array
a = np.array([[1, 2], [3, 4]])
print("Array a:\n", a)

# Array addition
b = np.array([[5, 6], [7, 8]])
print("Array b:\n", b)
sum_array = a + b
print("Sum of a and b:\n", sum_array)

# Transpose
print("Transpose of a:\n", a.T)

# Dot product (matrix product for 2-D arrays)
dot_product = np.dot(a, b)
print("Dot product of a and b:\n", dot_product)

# Basic statistics
print("Mean of a:", np.mean(a))
print("Standard Deviation of a:", np.std(a))

# SciPy Basics
print("\n----- SciPy -----")

# Linear Algebra: Solving system of linear equations
# Example: 2x + 3y = 8 and 3x + 4y = 11
A = np.array([[2, 3], [3, 4]])
b = np.array([8, 11])
x = linalg.solve(A, b)
print("Solution of the system 2x + 3y = 8 and 3x + 4y = 11 is:\n x =", x[0], ", y =", x[1])
Output
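The solution found in Program 1 can be cross-checked by substituting it back into the two equations. The sketch below is a supplementary check, not part of the original program; the names `rhs`, `solution`, and `residual` are introduced here only for illustration.

```python
import numpy as np
from scipy import linalg

# Coefficient matrix and right-hand side for 2x + 3y = 8 and 3x + 4y = 11
A = np.array([[2, 3], [3, 4]])
rhs = np.array([8, 11])  # illustrative name; Program 1 reuses the name `b`

solution = linalg.solve(A, rhs)

# Substituting back: A @ solution should reproduce rhs up to rounding error
residual = A @ solution - rhs
print("Solution:", solution)
print("Residual:", residual)
print("Solution verified:", np.allclose(A @ solution, rhs))
```

Solving by hand gives x = 1 and y = 2, so the residual should be zero up to floating-point rounding.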
Program – 2
Aim: Write a program to implement the basic Python libraries Matplotlib, pandas, and scikit-learn.
Code:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Sample Data using Pandas
data = {
    'Hours_Studied': [1, 2, 3, 4, 5],
    'Marks_Scored': [10, 20, 30, 40, 50]
}
df = pd.DataFrame(data)
print("Data Table:\n", df)

# Plotting with Matplotlib
plt.scatter(df['Hours_Studied'], df['Marks_Scored'], color='Red')
plt.title('Study Hours vs Marks')
plt.xlabel('Hours Studied')
plt.ylabel('Marks Scored')
plt.grid(True)
plt.show()

# Simple Linear Regression with Scikit-learn
X = df[['Hours_Studied']]
y = df['Marks_Scored']
model = LinearRegression()
model.fit(X, y)

# Predict marks for 6 hours of study.
# The input is passed as a DataFrame with the same column name used in fit(),
# which avoids scikit-learn's feature-name warning for a plain list.
new_hours = pd.DataFrame({'Hours_Studied': [6]})
predicted = model.predict(new_hours)
print("\nPredicted marks for 6 hours of study:", predicted[0])
Output
Program – 3
Aim: Write a program to create a sample from a population.
Code:
import pandas as pd

# Create a sample dataset (a dictionary)
data = {
    'Name': ['Sukriti', 'Tejas', 'Aditya', 'Keshav', 'Diya', 'Kashvi', 'Jaishree', 'Wandy', 'Savidhi',
             'Rydhym'],
    'Age': [17, 20, 22, 16, 24, 23, 19, 25, 26, 21],
    'Score': [85, 78, 90, 88, 92, 70, 95, 80, 87, 75]
}

# Convert the dictionary to a pandas DataFrame
df = pd.DataFrame(data)

# Show the original dataset
print("Original Dataset:\n", df)

# Define the sample size (e.g., 4)
sample_size = 4

# Create a random sample from the dataset (without replacement by default)
sample_df = df.sample(n=sample_size)

# Print the random sample
print("\nRandom Sample from the Dataset:\n", sample_df)
Output
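Because Program 3 draws its sample without a fixed seed, each run prints different rows. A minimal sketch of a reproducible draw is given below, assuming the `random_state` parameter of `DataFrame.sample` is acceptable for the lab; the seed value 42 and the names `first_draw` and `second_draw` are arbitrary choices made for illustration.

```python
import pandas as pd

# Same population as in Program 3
df = pd.DataFrame({
    'Name': ['Sukriti', 'Tejas', 'Aditya', 'Keshav', 'Diya', 'Kashvi', 'Jaishree', 'Wandy', 'Savidhi',
             'Rydhym'],
    'Age': [17, 20, 22, 16, 24, 23, 19, 25, 26, 21],
    'Score': [85, 78, 90, 88, 92, 70, 95, 80, 87, 75]
})

# random_state fixes the seed, so the same rows are drawn on every run
# (42 is an arbitrary illustrative value)
first_draw = df.sample(n=4, random_state=42)
second_draw = df.sample(n=4, random_state=42)

print(first_draw)
print("Same rows both times:", first_draw.equals(second_draw))
```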
Program – 4
Aim: Write a program to evaluate the mean, median, and mode of a dataset.
Code:
import pandas as pd
from scipy import stats

# Sample dataset: Product names and their prices
data = {
    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Printer', 'Tablet', 'Webcam', 'Speaker',
                'Charger', 'Router'],
    'Price': [750, 25, 45, 200, 120, 300, 60, 85, 25, 120]
}

# Create DataFrame
df = pd.DataFrame(data)

# Display the dataset
print("Product Prices:\n")
print(df)

# Extract the 'Price' column
prices = df['Price']

# Calculate statistics
mean_price = prices.mean()
median_price = prices.median()
mode_price = stats.mode(prices, keepdims=True)[0][0]

# Display results
print("\nPrice Statistics-")
print(f"Mean Price : ₹{mean_price:.2f}")
print(f"Median Price : ₹{median_price}")
print(f"Mode Price : ₹{mode_price}")
Output
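In the Program 4 dataset, the prices 25 and 120 each occur twice, so the data have two modes, while `scipy.stats.mode` reports only one value when there is a tie. As a supplementary sketch (not part of the original program), pandas' `Series.mode` can list every tied value; the name `all_modes` is introduced here for illustration.

```python
import pandas as pd

# Prices from Program 4
prices = pd.Series([750, 25, 45, 200, 120, 300, 60, 85, 25, 120])

# Series.mode() returns all values that share the highest frequency, sorted
all_modes = prices.mode().tolist()
print("All modes:", all_modes)  # [25, 120]
```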
Program – 5
Aim: Write a program to implement the Central Limit Theorem on a dataset.
Code:
import pandas as pd
import matplotlib.pyplot as plt

# Create dataset
data = {
    'Product': ['Pen', 'Notebook', 'Bag', 'Bottle', 'Shoes', 'Cap', 'Watch', 'Book', 'Lamp', 'Mouse'],
    'Price': [10, 25, 45, 30, 120, 18, 250, 40, 35, 22]
}
df = pd.DataFrame(data)

# Plot original distribution
plt.hist(df['Price'], bins=10, color='skyblue', edgecolor='black')
plt.title("Original Price Distribution")
plt.xlabel("Price")
plt.ylabel("Frequency")
plt.grid(True)
plt.show()

# Central Limit Theorem using pandas only:
# draw 1000 samples of size 4 (with replacement) and record each sample mean
sample_means = []
sample_size = 4
for _ in range(1000):
    sample = df.sample(n=sample_size, replace=True)
    sample_mean = sample['Price'].mean()
    sample_means.append(sample_mean)

# Convert to Series
sample_means_series = pd.Series(sample_means)

# Plot sampling distribution
plt.hist(sample_means_series, bins=30, color='orange', edgecolor='black')
plt.title("Sampling Distribution of Mean (Sample Size = 4)")
plt.xlabel("Sample Mean Price")
plt.ylabel("Frequency")
plt.grid(True)
plt.show()
Output
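Besides the shape of the histogram, the Central Limit Theorem setup in Program 5 implies that the standard deviation of the sample means should be close to the population standard deviation divided by the square root of the sample size. The sketch below checks this numerically. It is a supplementary illustration that uses NumPy instead of the pandas-only loop, and the seed, the number of draws (20,000), and all variable names are assumptions made here.

```python
import numpy as np

# Prices from Program 5, treated as the whole population
prices = np.array([10, 25, 45, 30, 120, 18, 250, 40, 35, 22])

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
sample_size = 4
n_draws = 20_000

# Each row is one sample of size 4 drawn with replacement
samples = rng.choice(prices, size=(n_draws, sample_size), replace=True)
sample_means = samples.mean(axis=1)

# Standard error predicted by theory: sigma / sqrt(n), with population sigma (ddof=0)
theoretical_se = prices.std(ddof=0) / np.sqrt(sample_size)
observed_se = sample_means.std(ddof=0)

print(f"Population mean     : {prices.mean():.2f}")
print(f"Mean of sample means: {sample_means.mean():.2f}")
print(f"Theoretical SE      : {theoretical_se:.2f}")
print(f"Observed SE         : {observed_se:.2f}")
```

With only 10 skewed prices and a sample size of 4, the sampling distribution is still visibly skewed; a larger sample size would bring it closer to a normal shape.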
Program – 6
Aim: Write a program to implement measures of spread on a dataset.
Code:
import pandas as pd

# Sample dataset of employee ages
data = {'Employee_Ages': [25, 30, 22, 35, 28, 40, 27, 32, 31, 29]}
df = pd.DataFrame(data)

# Display the dataset
print("Dataset:\n", df)

# Calculate Range
age_range = df['Employee_Ages'].max() - df['Employee_Ages'].min()

# Calculate Variance (sample)
age_variance = df['Employee_Ages'].var()

# Calculate Standard Deviation (sample)
age_std_dev = df['Employee_Ages'].std()

# Calculate Interquartile Range (IQR)
Q1 = df['Employee_Ages'].quantile(0.25)
Q3 = df['Employee_Ages'].quantile(0.75)
age_iqr = Q3 - Q1

# Display the measures of spread
print("\nMeasures of Spread for Employee Ages:")
print(f"Range: {age_range}")
print(f"Variance: {age_variance:.2f}")
print(f"Standard Deviation: {age_std_dev:.2f}")
print(f"Interquartile Range (IQR): {age_iqr}")
Output
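Program 6 reports the sample variance and standard deviation, because pandas divides by n - 1 by default (`ddof=1`). If the ten ages were treated as the whole population, the divisor would be n instead. The following sketch, added as a supplementary illustration with variable names chosen here, compares the two.

```python
import pandas as pd

# Ages from Program 6
ages = pd.Series([25, 30, 22, 35, 28, 40, 27, 32, 31, 29])
n = len(ages)

sample_variance = ages.var()            # ddof=1 (default): divides by n - 1
population_variance = ages.var(ddof=0)  # ddof=0: divides by n

print(f"Sample variance     : {sample_variance:.2f}")
print(f"Population variance : {population_variance:.2f}")

# The two differ only by the factor (n - 1) / n
print("Ratio matches (n - 1) / n:",
      abs(population_variance - sample_variance * (n - 1) / n) < 1e-9)
```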
Program – 7
Aim: Write a program to differentiate between descriptive and inferential statistics.
Code:
import pandas as pd
from scipy.stats import chisquare

# Sample dataset: Observed counts of different fruit preferences in a survey
data = {
    'Fruit': ['Apple', 'Banana', 'Orange', 'Grapes', 'Mango'],
    'Observed_Count': [50, 30, 40, 20, 60]
}
df = pd.DataFrame(data)
print("Dataset:\n", df)

# --- Descriptive Statistics ---
# These summarize the observed data only.
print("\n--- Descriptive Statistics ---")
total_responses = df['Observed_Count'].sum()
percentages = (df['Observed_Count'] / total_responses) * 100
print(f"Total Responses: {total_responses}")
print("Percentage of each fruit preference:")
for fruit, pct in zip(df['Fruit'], percentages):
    print(f"{fruit}: {pct:.2f}%")

# --- Inferential Statistics ---
# These use the sample to draw a conclusion about the wider population.
print("\n--- Inferential Statistics ---")

# Suppose we expect equal preference for all fruits (null hypothesis)
expected_counts = [total_responses / len(df)] * len(df)

# Perform Chi-Square goodness of fit test
chi_stat, p_value = chisquare(f_obs=df['Observed_Count'], f_exp=expected_counts)
print(f"Chi-Square Statistic: {chi_stat:.2f}")
print(f"P-Value: {p_value:.4f}")
if p_value < 0.05:
    print("Result: Reject the null hypothesis. Preferences are not equally distributed.")
else:
    # Failing to reject H0 does not prove that preferences are equal
    print("Result: Fail to reject the null hypothesis. No significant evidence that preferences differ.")
Output
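The chi-square statistic used in Program 7 is the sum of (observed - expected)² / expected over all categories. As a supplementary sketch with names introduced here, it can be computed by hand and compared with the value returned by `scipy.stats.chisquare`, which assumes equal expected frequencies when `f_exp` is omitted.

```python
from scipy.stats import chisquare

# Observed fruit preference counts from Program 7
observed = [50, 30, 40, 20, 60]

# Under the null hypothesis every fruit is equally preferred: 200 / 5 = 40 each
expected_each = sum(observed) / len(observed)

# Manual chi-square statistic: sum of (O - E)^2 / E
manual_stat = sum((o - expected_each) ** 2 / expected_each for o in observed)

# SciPy's result for the same test (equal expected frequencies by default)
scipy_stat, p_value = chisquare(f_obs=observed)

print(f"Manual statistic: {manual_stat:.2f}")  # 25.00
print(f"SciPy statistic : {scipy_stat:.2f}")
print(f"P-value         : {p_value:.6f}")
```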
Program – 8
Aim: Write a program to implement PMF, PDF, and CDF.
Code:
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import binom, norm

# --- Helper function to create linspace without numpy ---
def linspace(start, stop, num):
    step = (stop - start) / (num - 1)
    return [start + step * i for i in range(num)]

# --- PMF: Binomial Distribution (Discrete) ---
n, p = 10, 0.5
df_pmf = pd.DataFrame({'Successes': list(range(n + 1))})
df_pmf['PMF'] = df_pmf['Successes'].apply(lambda x: binom.pmf(x, n, p))

# --- PDF & CDF: Normal Distribution (Continuous) ---
x_values = linspace(-4, 4, 1000)
df_pdf_cdf = pd.DataFrame({'x': x_values})
df_pdf_cdf['PDF'] = df_pdf_cdf['x'].apply(lambda x: norm.pdf(x, loc=0, scale=1))
df_pdf_cdf['CDF'] = df_pdf_cdf['x'].apply(lambda x: norm.cdf(x, loc=0, scale=1))

# --- Plotting ---
plt.figure(figsize=(12, 4))

# PMF plot
plt.subplot(1, 3, 1)
plt.stem(df_pmf['Successes'], df_pmf['PMF'], basefmt=" ")
plt.title('PMF - Binomial Distribution')
plt.xlabel('Number of Successes')
plt.ylabel('Probability')

# PDF plot
plt.subplot(1, 3, 2)
plt.plot(df_pdf_cdf['x'], df_pdf_cdf['PDF'])
plt.title('PDF - Normal Distribution')
plt.xlabel('x')
plt.ylabel('Density')

# CDF plot
plt.subplot(1, 3, 3)
plt.plot(df_pdf_cdf['x'], df_pdf_cdf['CDF'])
plt.title('CDF - Normal Distribution')
plt.xlabel('x')
plt.ylabel('Cumulative Probability')

plt.tight_layout()
plt.show()
Output
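The three functions in Program 8 are linked: a PMF sums to 1 over all outcomes, and a CDF accumulates probability up to a given point. The sketch below is a supplementary numerical check of those relationships for the same binomial (n = 10, p = 0.5) and standard normal distributions; the variable names are introduced here for illustration.

```python
from scipy.stats import binom, norm

n, p = 10, 0.5

# PMF values for 0..10 successes; together they should sum to 1
pmf_values = [binom.pmf(k, n, p) for k in range(n + 1)]
total_probability = sum(pmf_values)

# The discrete CDF at 5 is the running sum of the PMF from 0 through 5
cdf_from_pmf = sum(pmf_values[:6])
cdf_direct = binom.cdf(5, n, p)

# For a continuous variable, a difference of CDF values gives an interval probability
within_one_sd = norm.cdf(1) - norm.cdf(-1)

print(f"Sum of PMF values        : {total_probability:.6f}")
print(f"P(X <= 5) from PMF sum   : {cdf_from_pmf:.6f}")
print(f"P(X <= 5) from binom.cdf : {cdf_direct:.6f}")
print(f"P(-1 <= Z <= 1)          : {within_one_sd:.4f}")
```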
Program – 9
Aim: Write a program to implement different visualization techniques on a sample dataset.
Code:
import pandas as pd
import matplotlib.pyplot as plt

# Sample dataset: Monthly sales (in units) of 3 products
data = {
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'Product_A': [150, 180, 220, 210, 250, 300],
    'Product_B': [80, 120, 160, 150, 190, 230],
    'Product_C': [100, 110, 130, 120, 170, 200]
}
df = pd.DataFrame(data)
print("Sales Dataset:")
print(df)

# Set up the figure and subplots with better spacing
fig, axs = plt.subplots(2, 2, figsize=(14, 10))

# 1. Line Plot - Monthly sales trend for each product
axs[0, 0].plot(df['Month'], df['Product_A'], marker='o', label='Product A')
axs[0, 0].plot(df['Month'], df['Product_B'], marker='s', label='Product B')
axs[0, 0].plot(df['Month'], df['Product_C'], marker='^', label='Product C')
axs[0, 0].set_title('Monthly Sales Trend')
axs[0, 0].set_xlabel('Month')
axs[0, 0].set_ylabel('Units Sold')
axs[0, 0].legend()
axs[0, 0].grid(True)

# 2. Bar Plot - Total sales of each product over 6 months
total_sales = df[['Product_A', 'Product_B', 'Product_C']].sum()
axs[0, 1].bar(total_sales.index, total_sales.values, color=['skyblue', 'salmon', 'lightgreen'])
axs[0, 1].set_title('Total Sales by Product')
axs[0, 1].set_xlabel('Product')
axs[0, 1].set_ylabel('Total Units Sold')

# 3. Pie Chart - Sales proportion of products in June
june_sales = df[df['Month'] == 'Jun'][['Product_A', 'Product_B', 'Product_C']].iloc[0]
axs[1, 0].pie(june_sales, labels=june_sales.index, autopct='%1.1f%%', startangle=90,
              colors=['skyblue', 'salmon', 'lightgreen'])
axs[1, 0].set_title('Sales Proportion in June')

# 4. Histogram - Distribution of Product_A sales over months
axs[1, 1].hist(df['Product_A'], bins=5, color='violet', edgecolor='black')
axs[1, 1].set_title('Product A Sales Distribution')
axs[1, 1].set_xlabel('Units Sold')
axs[1, 1].set_ylabel('Frequency')

# Adjust layout to avoid overlapping
plt.tight_layout(pad=3.0)
plt.show()
Output
Program – 10
Aim: Write a program to implement different hypothesis tests on a sample dataset.
Code:
import pandas as pd
from scipy import stats

# Sample dataset: Marks of students in two classes
class_a_scores = [85, 78, 90, 88, 76, 95, 89, 92]
class_b_scores = [80, 75, 85, 70, 78, 82, 77, 74]

# Convert to DataFrame for clarity
df_scores = pd.DataFrame({
    'ClassA': class_a_scores,
    'ClassB': class_b_scores
})
print("Student Scores:\n", df_scores)

# 1. One-sample t-test for ClassA
print("\n1. One-sample t-test (H0: mean of ClassA = 80)")
t_stat, p_val = stats.ttest_1samp(class_a_scores, 80)
print(f"t-statistic = {t_stat:.4f}, p-value = {p_val:.4f}")
if p_val < 0.05:
    print("Reject H0: Mean of ClassA is significantly different from 80\n")
else:
    print("Fail to reject H0: No significant difference from mean = 80\n")

# 2. Two-sample t-test between ClassA and ClassB
print("2. Two-sample t-test (H0: mean of ClassA = mean of ClassB)")
t_stat2, p_val2 = stats.ttest_ind(class_a_scores, class_b_scores)
print(f"t-statistic = {t_stat2:.4f}, p-value = {p_val2:.4f}")
if p_val2 < 0.05:
    print("Reject H0: Means of ClassA and ClassB are significantly different\n")
else:
    print("Fail to reject H0: Means might be equal\n")

# 3. Chi-Square Test for Independence
print("3. Chi-Square Test for Independence (Gender vs Online Learning Preference)")
gender = ['Male', 'Male', 'Female', 'Female', 'Male', 'Female', 'Male', 'Female']
prefers_online = ['Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes']

# Create a contingency table using pandas
df_pref = pd.DataFrame({
    'Gender': gender,
    'Prefers_Online': prefers_online
})
contingency_table = pd.crosstab(df_pref['Gender'], df_pref['Prefers_Online'])
print("\nContingency Table:\n", contingency_table)

# Perform chi-square test
chi2, p_val3, dof, expected = stats.chi2_contingency(contingency_table)
print(f"\nchi-square = {chi2:.4f}, p-value = {p_val3:.4f}")
if p_val3 < 0.05:
    print("Reject H0: Gender and preference for online learning are related\n")
else:
    print("Fail to reject H0: No significant relation between gender and preference\n")
Output
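The one-sample t-statistic in Program 10 is (sample mean - hypothesized mean) / (sample standard deviation / √n). The sketch below, a supplementary illustration with names introduced here, computes it with the standard library and compares it with the value from `stats.ttest_1samp`.

```python
import math
import statistics
from scipy import stats

# ClassA marks from Program 10 and the hypothesized mean under H0
class_a_scores = [85, 78, 90, 88, 76, 95, 89, 92]
hypothesized_mean = 80

n = len(class_a_scores)
sample_mean = statistics.mean(class_a_scores)
sample_sd = statistics.stdev(class_a_scores)  # sample standard deviation (divides by n - 1)

# Manual t-statistic: distance of the sample mean from H0 in standard-error units
manual_t = (sample_mean - hypothesized_mean) / (sample_sd / math.sqrt(n))

# SciPy's result for the same test
scipy_t, p_val = stats.ttest_1samp(class_a_scores, hypothesized_mean)

print(f"Sample mean        : {sample_mean}")
print(f"Manual t-statistic : {manual_t:.4f}")
print(f"SciPy t-statistic  : {scipy_t:.4f}")
print(f"P-value            : {p_val:.4f}")
```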