Data Science Record - 05
EXP.NO: 01
PERFORM DATA EXPLORATION AND PREPROCESSING
DATE: 23.01.2025
AIM:
To write Python code that performs data exploration and preprocessing on the
uploaded dataset.
ALGORITHM:
Step 1: Start the program
Step 2: Import the necessary Python libraries
Step 3: Load the dataset from the current working directory
Step 4: Perform data exploration and data preprocessing for the loaded dataset
Step 5: Display the output
Step 6: Stop the program
CODE:
import pandas as pd

# Show every row and column when printing DataFrames
pd.set_option("display.max_rows", None, "display.max_columns", None, "display.width", None)

# Load the dataset
file_path = '/content/traffic_accidects.csv'
df = pd.read_csv(file_path)

print("First few rows of the dataset:")
print(df.head())
print("\nSummary Statistics:")
print(df.describe(include="all"))
print("\nMissing Values:")
print(df.isnull().sum())
# Fill missing numeric values
if 'Age' in df.columns:
    df['Age'] = df['Age'].fillna(df['Age'].median())
if 'Salary' in df.columns:
    df['Salary'] = df['Salary'].fillna(df['Salary'].mean())

# Parse dates; missing or unparseable values become NaT
if 'AccidentDate' in df.columns:
    df['AccidentDate'] = pd.to_datetime(df['AccidentDate'], errors='coerce')

# Encode gender as 0/1 (any other value becomes NaN)
if 'Gender' in df.columns:
    df['Gender'] = df['Gender'].map({'Male': 0, 'Female': 1})

# Drop rows that have no severity score
if 'SeverityScore' in df.columns:
    df = df.dropna(subset=['SeverityScore'])

# Derive the number of years since each accident
if 'AccidentDate' in df.columns:
    current_year = pd.Timestamp.now().year
    df['YearsSinceAccident'] = current_year - df['AccidentDate'].dt.year

# Remove Salary outliers using the 1.5 * IQR rule
if 'Salary' in df.columns:
    Q1 = df['Salary'].quantile(0.25)
    Q3 = df['Salary'].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    df = df[(df['Salary'] >= lower_bound) & (df['Salary'] <= upper_bound)]

print("\nCleaned Dataset:")
print(df.head().to_string())
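The 1.5 * IQR rule applied to Salary above can be checked on a small series. The sketch below is only an illustration: the salary values are made up and are not taken from the uploaded dataset.

```python
import pandas as pd

# Made-up salaries for illustration; 250000 is the intended outlier
salary = pd.Series([30000, 32000, 35000, 36000, 38000, 40000, 250000])

q1 = salary.quantile(0.25)
q3 = salary.quantile(0.75)
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Keep only the values inside [lower_bound, upper_bound]
kept = salary[(salary >= lower_bound) & (salary <= upper_bound)]
print(f"bounds: [{lower_bound}, {upper_bound}]")
print(kept.tolist())
```

With these values the quartiles are 33500 and 39000, so the bounds are 25250 and 47250 and only the 250000 entry is dropped.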
OUTPUT:
Particulars            Marks Allotted   Marks Awarded
Program / Simulation   40
Program Execution      30
Result                 20
Viva Voce              10
Total                  100
RESULT:
Thus, a program for data exploration and preprocessing has been successfully
executed.
EXP.NO: 02 (a)
OUTPUT:
EXP.NO: 02 (b)
IMPLEMENT LINEAR AND LOGISTIC REGRESSION
DATE: 30.01.2025
CODE:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
iris = load_iris()
X = iris.data[:, 0].reshape(-1, 1)
y = (iris.target == 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_descent(X, y, theta, lr=0.01, iters=1000):
    # Batch gradient descent on the mean logistic loss
    for _ in range(iters):
        theta -= lr * (X.T @ (sigmoid(X @ theta) - y)) / len(y)
    return theta
theta = np.zeros(X_train_poly.shape[1])
theta_optimal = gradient_descent(X_train_poly, y_train, theta)
predictions = sigmoid(X_test_poly @ theta_optimal) >= 0.5
accuracy = np.mean(predictions == y_test)
print(f"Accuracy: {accuracy * 100:.2f}%")
x_values = np.linspace(X_train.min(), X_train.max(), 100).reshape(-1, 1)
x_poly = poly.transform(x_values)
y_values = sigmoid(x_poly @ theta_optimal) >= 0.5
plt.scatter(X_train, y_train, color='blue', label='Training data')
plt.plot(x_values, y_values, color='red', label='Decision Boundary')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Setosa (1) vs Not Setosa (0)')
plt.title('Logistic Regression with Curved Decision Boundary')
plt.legend()
plt.grid(True)
plt.show()
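The update inside gradient_descent moves theta along X.T @ (sigmoid(X @ theta) - y) / len(y), which is the gradient of the mean logistic loss. A minimal sketch that compares this expression with a finite-difference estimate is given below. The helper names (log_loss, analytic_gradient, numeric_gradient) and the synthetic data are assumptions made for illustration and are not part of the record's program.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def log_loss(X, y, theta):
    # Mean negative log-likelihood of the logistic model
    p = sigmoid(X @ theta)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_gradient(X, y, theta):
    # Same expression as the update direction in gradient_descent
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

def numeric_gradient(X, y, theta, eps=1e-6):
    # Central finite differences, one coordinate at a time
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (log_loss(X, y, theta + step) - log_loss(X, y, theta - step)) / (2 * eps)
    return grad

# Synthetic data: a bias column plus two random features
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = (rng.random(50) < 0.5).astype(float)
theta = rng.normal(scale=0.1, size=3)

max_diff = np.max(np.abs(analytic_gradient(X, y, theta) - numeric_gradient(X, y, theta)))
print(f"max abs difference: {max_diff:.2e}")
```

A very small difference indicates that the analytic expression matches the loss being minimised.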
OUTPUT:
Particulars            Marks Allotted   Marks Awarded
Program / Simulation   40
Program Execution      30
Result                 20
Viva Voce              10
Total                  100
RESULT:
Thus, a program for linear and logistic regression has been successfully executed.
EXP.NO: 03
AIM:
To write Python code that implements a Naive Bayes classifier for classifying
data based on probability distributions.
ALGORITHM:
Step 1: Start the program
Step 2: Import the necessary Python libraries
Step 3: Load the dataset and preprocess the data
Step 4: Compute the prior probabilities and likelihood using Bayes' theorem
Step 5: Build the Naïve Bayes classifier and train it on the dataset
Step 6: Use the trained model to make predictions
Step 7: Display the output
Step 8: Stop the program
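Steps 4 to 6 can be sketched as follows. This is a minimal illustration, not the record's program: it assumes Gaussian likelihoods for each feature, and the function names (fit_gaussian_nb, predict_gaussian_nb) and the sample data are hypothetical.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    # Per-class prior, feature means and feature variances
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    means = {c: X[y == c].mean(axis=0) for c in classes}
    # Small constant keeps variances strictly positive (an assumption of this sketch)
    variances = {c: X[y == c].var(axis=0) + 1e-9 for c in classes}
    return classes, priors, means, variances

def predict_gaussian_nb(X, classes, priors, means, variances):
    # Score each class by log P(c) + sum_j log N(x_j | mean_cj, var_cj)
    scores = []
    for c in classes:
        log_likelihood = -0.5 * np.sum(
            np.log(2 * np.pi * variances[c]) + (X - means[c]) ** 2 / variances[c],
            axis=1,
        )
        scores.append(np.log(priors[c]) + log_likelihood)
    # Pick the class with the highest score for every row
    return classes[np.argmax(np.column_stack(scores), axis=1)]

# Made-up, well-separated two-class data
X = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]])
y = np.array([0, 0, 0, 1, 1, 1])

model = fit_gaussian_nb(X, y)
preds = predict_gaussian_nb(np.array([[1.1, 2.1], [5.1, 5.9]]), *model)
print(preds)
```

Working with log-probabilities avoids numerical underflow when the per-feature likelihoods are multiplied together.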
CODE:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
class NaiveBayesClassifier:
    def __init__(self):
        self.class_priors = {}
        self.means = {}
        self.variances = {}
        self.classes = None
Program / Simulation   40
Program Execution      30
Result                 20
Viva Voce              10
Total                  100
RESULT:
EXP.NO: 04
POWER BI
DATE: 13.03.2025
AIM:
ALGORITHM:
STEP 1: Load Data - Import dataset into Power BI using Get Data and
compute Experience.
BI AI Insights.
STEP 6: Evaluate and Visualize - Compute accuracy, generate confusion matrix, plot
ROC curve.
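The quantities named in Step 6 can be illustrated outside Power BI with a short Python sketch. This is not the Power BI workflow itself: the labels and scores are made up, and the 0.5 decision threshold is an assumption.

```python
import numpy as np

# Made-up true labels and predicted scores for illustration
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1])
y_pred = (y_score >= 0.5).astype(int)  # assumed threshold of 0.5

accuracy = np.mean(y_pred == y_true)

# Confusion matrix laid out as [[TN, FP], [FN, TP]]
tn = np.sum((y_true == 0) & (y_pred == 0))
fp = np.sum((y_true == 0) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))
tp = np.sum((y_true == 1) & (y_pred == 1))
confusion = np.array([[tn, fp], [fn, tp]])

# ROC points: lower the threshold one score at a time (scores here are distinct)
order = np.argsort(-y_score)
sorted_true = y_true[order]
tpr = np.concatenate([[0.0], np.cumsum(sorted_true) / sorted_true.sum()])
fpr = np.concatenate([[0.0], np.cumsum(1 - sorted_true) / (1 - sorted_true).sum()])

# Area under the ROC curve by the trapezoidal rule
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)

print(f"accuracy = {accuracy:.2f}")
print(confusion)
print(f"AUC = {auc:.4f}")
```

Plotting fpr against tpr gives the ROC curve; the closer the AUC is to 1, the better the scores separate the two classes.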
OUTPUT:
MARK ALLOCATION:
Program / Simulation   40
Program Execution      30
Result                 20
Viva Voce              10
Total                  100
RESULT:
Thus, the Zomato sales dataset has been successfully visualized using a Power BI
dashboard.