Python for Data Science IA 1 Programs
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def generate_dataset(n_samples=100):
    np.random.seed(42)
    X = 2 * np.random.rand(n_samples, 1)
    y = 3 * X + 4 + np.random.randn(n_samples, 1)
    return X, y

class SimpleLinearRegression:
    def __init__(self):
        self.slope = None
        self.intercept = None

    def fit(self, X, y):
        # Closed-form least-squares estimates for slope and intercept
        X_mean = np.mean(X)
        y_mean = np.mean(y)
        self.slope = np.sum((X - X_mean) * (y - y_mean)) / np.sum((X - X_mean) ** 2)
        self.intercept = y_mean - self.slope * X_mean

    def predict(self, X):
        return self.slope * X + self.intercept

if __name__ == "__main__":
    X, y = generate_dataset()
    dataset = pd.DataFrame({
        "X": X.flatten(),
        "y": y.flatten()
    })
    print("Dataset:")
    print(dataset)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                        random_state=42)
    model = SimpleLinearRegression()
    model.fit(X_train.flatten(), y_train.flatten())
    y_pred = model.predict(X_test.flatten())
    mse = mean_squared_error(y_test.flatten(), y_pred)
    print(f"Mean Squared Error: {mse:.4f}")
Explanation:
Step-by-step breakdown:
Step 1: Importing Libraries
o numpy: Used for generating synthetic data and performing numerical operations.
o pandas: Used to display the generated dataset as a table.
o matplotlib.pyplot: Used for visualizing the data and the regression line.
o train_test_split: Splits the dataset into training and testing sets.
o mean_squared_error: Computes the mean squared error between the true and predicted values, which is the evaluation metric for the model.
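For reference, the closed-form estimates computed in SimpleLinearRegression.fit can be cross-checked against scikit-learn's LinearRegression, which fits the same least-squares line. This is a minimal verification sketch, not part of the original program; the toy values of X and y below are chosen purely for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free toy data on the line y = 3x + 4 (illustrative values only)
X = np.array([[0.5], [1.0], [1.5], [2.0]])
y = 3 * X + 4

# Closed-form least-squares estimates, exactly as in SimpleLinearRegression.fit
x, t = X.flatten(), y.flatten()
slope = np.sum((x - x.mean()) * (t - t.mean())) / np.sum((x - x.mean()) ** 2)
intercept = t.mean() - slope * x.mean()

# scikit-learn recovers the same line
ref = LinearRegression().fit(X, y)
print(slope, intercept)                    # 3.0 4.0
print(ref.coef_[0][0], ref.intercept_[0])  # 3.0 4.0

Both pairs of numbers agree, confirming that the hand-derived formulas match the library's fit.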
KNN program
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

class KNNClassifier:
    def __init__(self, k=3):
        self.k = k
        self.X_train = None
        self.y_train = None

    def fit(self, X, y):
        # KNN is a lazy learner: fitting just stores the training data
        self.X_train = X
        self.y_train = y

    def predict(self, X):
        predictions = []
        for x in X:
            # Euclidean distance from x to every training sample
            distances = np.sqrt(np.sum((self.X_train - x) ** 2, axis=1))
            # Labels of the k nearest neighbours; majority vote decides the class
            k_indices = np.argsort(distances)[:self.k]
            k_labels = self.y_train[k_indices]
            predictions.append(np.bincount(k_labels).argmax())
        return np.array(predictions)

iris = load_iris()
iris_df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                       columns=iris['feature_names'] + ['target'])
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
knn = KNNClassifier(k=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
print("\nPredictions:")
for i, (true_label, pred_label) in enumerate(zip(y_test, y_pred)):
    status = "Correct" if true_label == pred_label else "Incorrect"
    print(f"Test Sample {i + 1}: True Label = {true_label}, "
          f"Predicted = {pred_label}, {status}")
Explanation:
Step 1: Importing Libraries
o numpy: Used for handling arrays and matrix operations.
o pandas: Used to display the Iris data as a labelled DataFrame.
o train_test_split: Splits the dataset into training and testing sets.
o accuracy_score: Compares the predicted labels with the true labels to compute the accuracy.
o load_iris: Loads the Iris dataset used to train and test the classifier.
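As a quick sanity check (not part of the original program), the same train/test split can be scored with scikit-learn's built-in KNeighborsClassifier; with the same k it should closely match the from-scratch classifier above.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Same data, same split, and same k as the program above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
ref = KNeighborsClassifier(n_neighbors=3)
ref.fit(X_train, y_train)
print(f"Reference accuracy: {accuracy_score(y_test, ref.predict(X_test)) * 100:.2f}%")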
K-Means Program
import numpy as np
import matplotlib.pyplot as plt

X = np.random.rand(200, 2)  # sample 2-D data (the original data-generation step is not shown)
k = 4
centroids, clusters = k_means(X, k)  # k_means is sketched after the explanation below
Explanation:
Step-by-step breakdown:
Step 1: Importing Libraries
o numpy: For handling numerical data and matrix operations.
o matplotlib.pyplot: For visualizing the resulting clusters and centroids.
Iteration:
1. Assigning Labels: For each data point, it computes the distance from
the point to each centroid and assigns the point to the nearest
centroid (i.e., the cluster).
2. Recalculating Centroids: After assigning labels to all points, it
recalculates the centroids by averaging the points within each
cluster.
3. Repeat: Steps 1 and 2 are repeated iteratively until the centroids no longer change (i.e., convergence is reached); see the sketch of k_means after this list.
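The k_means function called in the fragment above is not reproduced in this section. The following is a minimal sketch consistent with the three iteration steps just described; only the name k_means and the return pair (centroids, clusters) are taken from the call above, and everything else (initialization strategy, max_iters, seed) is an assumption.

import numpy as np

def k_means(X, k, max_iters=100, seed=42):
    rng = np.random.default_rng(seed)
    # Initialization: pick k distinct data points as the starting centroids (assumed strategy)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 1: assign each point to its nearest centroid (Euclidean distance)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        clusters = np.argmin(distances, axis=1)
        # Step 2: recompute each centroid as the mean of the points assigned to it
        new_centroids = np.array([
            X[clusters == j].mean(axis=0) if np.any(clusters == j) else centroids[j]
            for j in range(k)
        ])
        # Step 3: stop once the centroids no longer move (convergence)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, clusters

With matplotlib imported as in the fragment above, the result can then be visualized with plt.scatter(X[:, 0], X[:, 1], c=clusters).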
Naïve Bayes Program
import numpy as np
from sklearn.datasets import make_classification

class NaiveBayes:
    def __init__(self):
        self.class_probs = {}
        self.class_means = {}
        self.class_vars = {}

    def fit(self, X, y):
        self.classes = np.unique(y)
        for c in self.classes:
            self.class_probs[c] = np.mean(y == c)  # prior P(c)
        for c in self.classes:
            X_c = X[y == c]
            self.class_means[c] = np.mean(X_c, axis=0)
            self.class_vars[c] = np.var(X_c, axis=0)

    def predict(self, X):
        predictions = []
        for x in X:
            # Log-posterior (up to a constant) for each class, using Gaussian likelihoods
            posteriors = [
                np.log(self.class_probs[c])
                + np.sum(-0.5 * np.log(2 * np.pi * self.class_vars[c])
                         - (x - self.class_means[c]) ** 2 / (2 * self.class_vars[c]))
                for c in self.classes
            ]
            predictions.append(self.classes[np.argmax(posteriors)])
        return np.array(predictions)

# Synthetic dataset (the original generation call is not shown; these values are assumed)
X, y = make_classification(n_samples=200, n_features=4, n_informative=3,
                           n_redundant=0, random_state=42)
nb = NaiveBayes()
nb.fit(X, y)
predictions = nb.predict(X)
accuracy = np.mean(predictions == y)
print(f"Accuracy: {accuracy * 100:.2f}%")
Explanation:
Step-by-step breakdown:
Step 1: Importing Libraries
o numpy: Used for all numerical operations, computing the class priors, the per-feature means and variances, and the Gaussian log-likelihoods.
o make_classification: Generates a synthetic classification dataset with a chosen number of samples, features, and classes.
Step 2: Generating the Dataset
o make_classification() produces the feature matrix X and the target labels y on which the classifier is trained and evaluated.
Step 3: Initializing the Naive Bayes Model
o NaiveBayes() creates the classifier; its dictionaries will hold each class's prior probability (class_probs) and the per-feature means (class_means) and variances (class_vars) that parameterize the Gaussian likelihoods.
Step 4: Training the Model
o nb.fit(X, y) estimates, for each class c, the prior P(c) as the fraction of training samples belonging to that class, together with the mean and variance of every feature among those samples.
Step 5: Making Predictions
o nb.predict(X) scores each sample against every class by combining the class prior with the Gaussian likelihood of each feature, and returns the class with the highest posterior.
Step 6: Evaluating the Model
o accuracy is computed as np.mean(predictions == y), the fraction of predicted labels that match the true labels.
In practice, Naive Bayes estimates the probability of each class by assuming that the features are conditionally independent given the class. Gaussian Naive Bayes (used here) further assumes that, within each class, every feature is normally distributed, and it uses that class's per-feature mean and variance to compute the likelihood.
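Written out, the decision rule this paragraph describes is the standard Gaussian Naive Bayes formula (a textbook formulation, stated here for reference):

\hat{y} = \arg\max_{c} \; P(c) \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi\sigma_{c,i}^{2}}} \exp\!\left(-\frac{(x_i - \mu_{c,i})^{2}}{2\sigma_{c,i}^{2}}\right)

where \mu_{c,i} and \sigma_{c,i}^{2} are the mean and variance of feature i among the training samples of class c. In code, the product is evaluated as a sum of logarithms, as in the predict method above, to avoid numerical underflow.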