Data Analytics All Practical
Lab File
Data Analytics Lab
(BADS651)
ACADEMIC SESSION 2024-25
SEM: VI
print(f"Average: {avg_value}")
print(f"Square Root of each number: {sqrt_values}")
print(f"Rounded values (to 2 decimal places): {rounded_values}")
1
# Call the function
perform_operations()
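For larger inputs the same computations can be vectorized with NumPy; a minimal sketch, assuming the same sample list as above:

import numpy as np

numbers = np.array([4, 9, 16, 25])  # same sample data as above (assumed)
print("Average:", numbers.mean())
print("Square roots (rounded to 2 places):", np.round(np.sqrt(numbers), 2))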
Output:
Program – 2
Aim: To perform data import/export (.CSV, .XLS, .TXT) operations using data
frames in Python.
Program:
import pandas as pd

# Correct file paths using raw strings (r"") or double backslashes (\\)
csv_path = r"D:\GL BAJAJ\DAata Analytics\customers-100.csv"
xls_path = r"D:\GL BAJAJ\DAata Analytics\customers-100.xlsx"  # assumed path, matching the CSV
txt_path = r"D:\GL BAJAJ\DAata Analytics\customers-100.txt"   # assumed path, matching the CSV

try:
    csv_data = pd.read_csv(csv_path)
    print("\nCSV Data:\n", csv_data.head())  # Show first 5 rows
except Exception as e:
    print("Error loading CSV file:", e)

try:
    xls_data = pd.read_excel(xls_path)  # requires openpyxl for .xlsx files
    print("\nExcel Data:\n", xls_data.head())  # Show first 5 rows
except Exception as e:
    print("Error loading Excel file:", e)

try:
    txt_data = pd.read_csv(txt_path, sep="\t", engine="python", on_bad_lines="skip")  # Tab-separated; skip malformed lines
    print("\nTXT Data:\n", txt_data.head())  # Show first 5 rows
except Exception as e:
    print("Error loading TXT file:", e)
Output:
Program – 3
Aim: To get the input matrix from the user and perform matrix addition, subtraction, multiplication, transpose, inverse, and division operations using the vector concept in Python.
Program:
import numpy as np

def get_matrix_input():
    rows = int(input("Enter the number of rows: "))
    matrix = []
    for i in range(rows):
        row = list(map(float, input(f"Enter elements for row {i+1} separated by space: ").split()))
        matrix.append(row)
    return np.array(matrix)
def main():
    print("Matrix Operations")

    # Get user input for two matrices
    print("Enter the first matrix:")
    matrix1 = get_matrix_input()
    print("Enter the second matrix:")
    matrix2 = get_matrix_input()

    try:
        # Matrix Addition
        matrix_addition = matrix1 + matrix2
        print("Matrix Addition:\n", matrix_addition)

        # Matrix Subtraction
        matrix_subtraction = matrix1 - matrix2
        print("Matrix Subtraction:\n", matrix_subtraction)

        # Matrix Multiplication
        matrix_multiplication = np.dot(matrix1, matrix2)
        print("Matrix Multiplication:\n", matrix_multiplication)

        # Matrix Transpose
        matrix_transpose = np.transpose(matrix1)
        print("Matrix Transpose:\n", matrix_transpose)

        # Matrix Inverse (first matrix must be square and non-singular)
        matrix_inverse = np.linalg.inv(matrix1)
        print("Matrix Inverse:\n", matrix_inverse)

        # Matrix Division (element-wise)
        matrix_division = matrix1 / matrix2
        print("Matrix Division:\n", matrix_division)
    except Exception as e:
        print("Error performing matrix operations:", e)

main()
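When "division" means solving the system A·X = B, np.linalg.solve is numerically preferable to forming an explicit inverse; a minimal sketch with hypothetical 2x2 inputs:

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])  # hypothetical coefficient matrix
B = np.array([[5.0], [10.0]])           # hypothetical right-hand side
X = np.linalg.solve(A, B)               # same result as inv(A) @ B, but more stable
print(X)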
Output:
Program – 4
Aim: To compute statistical measures (mean, median, mode, and standard deviation) for a data set in Python.
Program:
import statistics

data = [12, 15, 12, 18, 20, 15, 12]  # sample data set (assumed)

# Mean
mean_value = statistics.mean(data)
print(f"Mean: {mean_value}")
# Median
median_value = statistics.median(data)
print(f"Median: {median_value}")
# Mode
try:
mode_value = statistics.mode(data)
print(f"Mode: {mode_value}")
except statistics.StatisticsError:
print("Mode: No unique mode (multiple modes or no mode)")
# Standard Deviation
stdev_value = statistics.stdev(data)
print(f"Standard Deviation: {stdev_value}")
Output:
Program – 5
Aim: To perform data pre-processing operations in Python: i) handling missing data, ii) Min-Max normalization.
Program:
i) Handling Missing Data (forward fill) in Python:
import pandas as pd
import numpy as np

data = {"A": [1, 2, np.nan, 4], "B": [5, np.nan, 7, 8]}  # sample data with gaps (assumed)
df = pd.DataFrame(data)

# Forward-fill: propagate the last valid value downwards
df_fill_forward = df.ffill()
print(df_fill_forward)
Output:
ii) Min-Max Normalization in Python:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

data = {"Score": [20, 40, 60, 80, 100]}  # sample data (assumed)
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Scale every column into the [0, 1] range
scaler = MinMaxScaler()
print(pd.DataFrame(scaler.fit_transform(df), columns=df.columns))
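Min-Max scaling maps each column into [0, 1]; a common alternative is z-score standardization (zero mean, unit variance). A minimal sketch, reusing the df defined above:

from sklearn.preprocessing import StandardScaler
import pandas as pd

# Standardize to zero mean and unit variance instead of a fixed range
scaler = StandardScaler()
print(pd.DataFrame(scaler.fit_transform(df), columns=df.columns))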
Output:
Program – 6
Aim: To perform dimensionality reduction operation using PCA for Houses Data
Set.
Program:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Step 1: Load the data set (raw string avoids backslash-escape issues)
file_path = r"D:\GL BAJAJ\DAata Analytics\House price data .xlsx"
df = pd.read_excel(file_path, engine="openpyxl")  # Ensure openpyxl is installed

# Keep only numeric columns and drop rows with missing values (assumed cleaning step)
numeric_features = df.select_dtypes(include=[np.number]).dropna()

# Step 2: Standardize the Data (PCA works better with scaled data)
scaler = StandardScaler()
scaled_data = scaler.fit_transform(numeric_features)

# Step 3: Project onto the first two principal components
pca = PCA(n_components=2)
pca_result = pca.fit_transform(scaled_data)
plt.figure(figsize=(8, 5))
plt.scatter(pca_result[:, 0], pca_result[:, 1], c="blue", alpha=0.5)
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.title("PCA on House Prices Dataset")
plt.grid()
plt.show()
pca_full = PCA().fit(scaled_data)
cumulative_variance = np.cumsum(pca_full.explained_variance_ratio_) * 100
plt.figure(figsize=(8, 5))
plt.plot(range(1, len(cumulative_variance) + 1), cumulative_variance, marker="o", linestyle="--", color="red")
plt.xlabel("Number of Principal Components")
plt.ylabel("Cumulative Explained Variance (%)")
plt.grid()
plt.show()
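Instead of fixing the number of components in advance, scikit-learn's PCA also accepts a float n_components, keeping just enough components to reach that fraction of explained variance; a minimal sketch, assuming scaled_data from above:

# Keep the smallest number of components explaining at least 95% of the variance
pca_95 = PCA(n_components=0.95)
reduced_data = pca_95.fit_transform(scaled_data)
print("Components kept:", pca_95.n_components_)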
Output:
Program – 7
Aim: To build and evaluate a simple linear regression model in Python.
Program:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# 1. Prepare the dataset (for this example, let's generate some data)
np.random.seed(0)
X = 2 * np.random.rand(100, 1)  # Feature: 100 random values between 0 and 2
y = 4 + 3 * X + np.random.randn(100, 1)  # Target: linear trend plus noise (assumed)

# 2. Split the data and train the model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)

# 3. Make predictions and evaluate with R-squared
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
print(f"R-squared: {r2}")

# 4. Plot the fitted line over the test data
plt.scatter(X_test, y_test, label="Actual")
plt.plot(X_test, y_pred, color="red", label="Predicted")
plt.legend()
plt.show()
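Because the data above were generated (by assumption) from y = 4 + 3x plus noise, the fitted parameters can be checked against the generating values; a minimal sketch using the model object from above:

# The learned parameters should be close to the generating values (4 and 3)
print("Intercept:", model.intercept_)
print("Slope:", model.coef_)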
Output:
Program – 8
Aim: To perform the K-Means clustering operation and visualize the clusters for the Iris data set.
Program:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score
from scipy.stats import mode

# Step 1: Load the iris data set
iris = load_iris()
X = iris.data

# Step 2: Fit K-Means with 3 clusters (one per species)
kmeans = KMeans(n_clusters=3, random_state=0, n_init=10)
labels = kmeans.fit_predict(X)
print("Cluster Assignments:", labels)

# Step 3: Visualize the clusters on the first two features
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis")
plt.xlabel("Sepal length (cm)")
plt.ylabel("Sepal width (cm)")
plt.title("K-Means Clustering on the Iris Data Set")
plt.colorbar(label="Cluster")
plt.show()

# Step 4: Raw accuracy against the true species labels (cluster ids are arbitrary)
true_labels = iris.target  # Actual labels from dataset
print("Accuracy (approximate):", accuracy_score(true_labels, labels))

# Step 5: Map each cluster to its most common true label, then recompute accuracy
mapping = {}
for cluster in range(3):
    mask = (labels == cluster)  # Find all data points in this cluster
    if np.sum(mask) > 0:  # Ensure the mask is not empty
        most_common_label = mode(true_labels[mask], keepdims=True).mode[0]
        mapping[cluster] = most_common_label

mapped_clusters = np.array([mapping[c] for c in labels])

# Compute accuracy
accuracy = accuracy_score(true_labels, mapped_clusters)
print("Corrected Accuracy:", accuracy)
Output:
Cluster Assignments: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 0 0 0 2 0 0 0 0 0 0 0 0 2 0 0 0 0 2 0 0 0
0 2 2 2 0 0 0 0 0 0 0 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 2 2 2 2 0 2 0 2 2
0 2 0 0 2 2 2 2 0 2 0 2 0 2 2 0 0 2 2 2 2 2 0 0 2 2 2 0 2 2 2 0 2 2 2 0 2
2 0]
Program – 9
Aim: Write a Python script to diagnose a disease using KNN classification and plot the results.
Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

# Load dataset (Pima Indians Diabetes data set; filename assumed)
df = pd.read_csv("diabetes.csv")
X = df.drop('Outcome', axis=1)
y = df['Outcome']

# Normalize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.25, random_state=42)

# Use 5-fold cross-validation to pick the best K
k_range = range(1, 31)  # candidate K values (assumed range)
cv_scores = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X_train, y_train, cv=5, scoring='accuracy')
    cv_scores.append(scores.mean())

# Best k
best_k = k_range[cv_scores.index(max(cv_scores))]
print(f"Best K value: {best_k}")

# Refit with the best K and predict on the test set
knn_best = KNeighborsClassifier(n_neighbors=best_k)
knn_best.fit(X_train, y_train)
y_pred_knn = knn_best.predict(X_test)
# Evaluation: confusion matrix
cm = confusion_matrix(y_test, y_pred_knn)
plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['No Disease', 'Disease'],
            yticklabels=['No Disease', 'Disease'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix - KNN')
plt.tight_layout()
plt.show()
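Since cv_scores holds the cross-validated accuracy for every candidate K, plotting it makes the choice of best_k visible; a minimal sketch, assuming the variables defined above:

# Plot cross-validated accuracy against K
plt.figure(figsize=(6, 4))
plt.plot(list(k_range), cv_scores, marker='o')
plt.xlabel('K (number of neighbors)')
plt.ylabel('Cross-validated accuracy')
plt.title('Choosing K for KNN')
plt.show()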
Output:
Best K value: 7
🔍 Random Forest Accuracy: 0.73
Program – 10
Aim: To perform market basket analysis using Association Rules (Apriori).
Program:
# Note: mlxtend must be installed first (pip install mlxtend)
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Step 1: Sample transaction data (assumed)
dataset = [
    ['milk', 'bread', 'butter'],
    ['bread', 'butter'],
    ['milk', 'bread'],
    ['milk', 'bread', 'butter', 'eggs'],
    ['bread', 'eggs'],
]

# Step 2: Convert the list of transactions into a one-hot encoded DataFrame
te = TransactionEncoder()
te_data = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_data, columns=te.columns_)
print(df)

# Step 3: Mine frequent itemsets with minimum support 0.6
frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)
print("\n📦 Frequent Itemsets (Support >= 0.6):")
print(frequent_itemsets)

# Step 4: Derive Association Rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
print("\nAssociation Rules:")
print(rules)
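Rules with lift above 1 indicate that the antecedent and consequent occur together more often than chance would predict; a minimal sketch filtering the rules table from above:

# Keep only rules whose items co-occur more often than expected by chance
strong_rules = rules[rules['lift'] > 1.0]
print(strong_rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])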
Output: