LAB RECORD
of
4CSPL2041 IML LAB
Submitted by:
Jeevith G L
[22BBTCS122]
Course In-charge:
Prof. Shana Aneevvan
Assistant Professor,
Dept. of CSE, SOET
Sl.No.  Experiment                                                                Page No.
4       Implement the FIND-S algorithm and verify that it successfully
        produces the trace for the EnjoySport example (Tom Mitchell reference)   10
5       Implement a decision tree algorithm for sales prediction/
        classification in the retail sector                                      12
6       Implement back propagation algorithm for stock prices prediction         15
7       Implement clustering algorithm for insurance fraud detection             18
8       Implement clustering algorithm for identifying cancerous data            21
9       Apply reinforcement learning and develop a game of your own              23
1. Students are advised to come to the laboratory at least 5 minutes before the starting time; those
who arrive more than 5 minutes late will not be allowed into the lab.
2. Plan your task properly well before the commencement of the session, and come prepared to the lab
with the synopsis / program / experiment details.
3. Students should enter the laboratory with:
a. Laboratory observation notes with all the details (Problem statement, Aim, Algorithm,
Procedure, Program, Expected Output, etc.) filled in for the lab session.
b. Laboratory record updated up to the last session's experiments, along with any other
materials (if any) needed in the lab.
c. Proper dress code and identity card.
4. Sign in the laboratory login register, write the TIME-IN, and occupy the computer system allotted to
you by the faculty.
5. Execute your task in the laboratory, record the results / output in the lab observation notebook,
and get it certified by the concerned faculty.
6. All students should be polite and cooperative with the laboratory staff, and must maintain
discipline and decency in the laboratory.
7. Computer labs are equipped with sophisticated, high-end branded systems, which should be
utilized properly.
8. Students / faculty must keep their mobile phones in SWITCHED OFF mode during lab
sessions. Misuse of the equipment or misbehavior with the staff and systems will attract severe
punishment.
9. Students must take the permission of the faculty in case of any urgency to go out; anybody found
loitering outside the lab / class without permission during working hours will be treated seriously and
punished appropriately.
10. Students should LOG OFF / SHUT DOWN the computer system before leaving the lab after
completing the task (experiment) in all aspects, and must ensure the system / seat is left in proper condition.
AIM:
To study the usage of the Python and R tools.
PROGRAM:
Python
1. Getting Started:
Install Python via python.org, or use a distribution like Anaconda for a bundled data science environment.
Use IDEs like PyCharm, Jupyter Notebook, or VS Code.
2. Core Concepts:
Learn the basics: data types, control structures, functions, and modules.
Explore libraries like NumPy (numerical computation), pandas (data manipulation), and matplotlib/seaborn (visualization).
3. Advanced Applications:
Machine learning: scikit-learn, TensorFlow, or PyTorch.
Web scraping: Beautiful Soup, Selenium.
Automation: Python scripts for task automation.
4. Data Visualization: Plotly, Dash.
5. Practice:
Platforms: Kaggle, HackerRank, or LeetCode.
Projects: Build small projects, such as data analysis scripts or visualization dashboards (a starter sketch follows this list).
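To make the core workflow concrete, here is a minimal sketch that exercises NumPy, pandas, and matplotlib together; the array values and column names are purely illustrative:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# NumPy: numerical computation on arrays (illustrative scores)
scores = np.array([72, 85, 90, 66, 78])
print("Mean score:", scores.mean())

# pandas: tabular data manipulation
df = pd.DataFrame({'Student': ['A', 'B', 'C', 'D', 'E'], 'Score': scores})
print(df.describe())

# matplotlib: quick visualization
df.plot(kind='bar', x='Student', y='Score', legend=False)
plt.ylabel('Score')
plt.show()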
R
1. Getting Started:
Download R and RStudio from CRAN and the RStudio website.
Familiarize yourself with the RStudio environment.
2. Core Concepts:
Data manipulation: Learn functions for vectors, matrices, and data frames.
Libraries: Master dplyr (data manipulation), ggplot2 (visualization), and tidyr (data tidying).
3. Advanced Applications:
Statistical analysis: Perform hypothesis testing, regression, and clustering.
Machine learning: Use packages like caret or mlr.
4. Visualization: Build interactive plots with shiny or plotly.
5. Practice:
Use datasets like iris, mtcars, or external datasets from Kaggle.
Analyze real-world data to develop actionable insights.
AIM:
To build and implement a classifier for the sales data.
PROGRAM:
import pandas as pd
import urllib.request
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
# Step 1: Load the sales dataset (the file name below is a placeholder; point it at the actual data source)
data = pd.read_csv("sales_data.csv")

# Step 2: Display first few rows and column names to understand the structure
print("Columns in the dataset:", data.columns)
print(data.head())
# Step 4: Handle missing values by replacing with the mean of numeric columns
data.fillna(data.select_dtypes(include=['number']).mean(), inplace=True)
# Check for one of the 'Normalized' columns (e.g., 'Normalized 1', 'Normalized 2', etc.)
high_sales_created = False
if 'Normalized 1' in data.columns:  # Replace this with the actual relevant column you choose
    threshold = data['Normalized 1'].mean()
    data['High_Sales'] = (data['Normalized 1'] > threshold).astype(int)
    high_sales_created = True
elif 'Normalized 2' in data.columns:
    threshold = data['Normalized 2'].mean()
    data['High_Sales'] = (data['Normalized 2'] > threshold).astype(int)
    high_sales_created = True
else:
    print("No normalized columns found for defining high sales.")
    exit()
# Step 8: Define the features (X) and target (y) only if 'High_Sales' was created
if high_sales_created:
    X = data.drop(columns=['High_Sales']).select_dtypes(include=['number'])  # Numeric features only
    y = data['High_Sales']  # Target

    # Step 9: Split the data into training and testing sets (80% train, 20% test)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Train a Random Forest classifier and predict on the test set
    model = RandomForestClassifier(random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # Accuracy
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy: {accuracy * 100:.2f}%")

    # Confusion Matrix
    print("Confusion Matrix:")
    print(confusion_matrix(y_test, y_pred))

    # Classification Report
    print("Classification Report:")
    print(classification_report(y_test, y_pred))
else:
    print("No valid 'High_Sales' column created. The model will not be trained.")
Output:
Result:
A classifier was successfully implemented on the sales data, accurately predicting sales categories based on
the given features.
AIM:
To develop a predictive model for predicting house prices.
PROGRAM:
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Load the California Housing dataset and assemble a DataFrame
housing = fetch_california_housing()
data = pd.DataFrame(housing.data, columns=housing.feature_names)
data['Price'] = housing.target

# Step 4: Split the dataset into features (X) and target (y)
X = data.drop(columns=['Price'])  # Features
y = data['Price']  # Target

# Step 5: Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a linear regression model and evaluate it on the test set
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R^2 Score:", r2_score(y_test, y_pred))
Output:
Result:
A predictive model was successfully developed, estimating house prices from features such as location,
median income, and house age.
AIM:
To implement the FIND-S algorithm and verify that it successfully produces the trace for the
EnjoySport example.
PROGRAM:
import pandas as pd

# EnjoySport training examples (the EnjoySport data from Tom Mitchell's Machine Learning)
data = {'Sky': ['Sunny', 'Sunny', 'Rainy', 'Sunny'],
        'AirTemp': ['Warm', 'Warm', 'Cold', 'Warm'],
        'Humidity': ['Normal', 'High', 'High', 'High'],
        'Wind': ['Strong', 'Strong', 'Strong', 'Strong'],
        'Water': ['Warm', 'Warm', 'Warm', 'Cool'],
        'Forecast': ['Same', 'Same', 'Change', 'Change'],
        'Enjoy Sport': ['Yes', 'Yes', 'No', 'Yes']}
df = pd.DataFrame(data)

# Filter the data to include only positive examples (Enjoy Sport == 'Yes')
positive_examples = df[df['Enjoy Sport'] == 'Yes']

# Step 1: Start with the first positive example as the hypothesis (the most specific hypothesis)
hypothesis = positive_examples.iloc[0, :-1].values.copy()

# Step 2: Iterate through the remaining positive examples and generalize the hypothesis
for index, example in positive_examples.iloc[1:].iterrows():
    for i in range(len(hypothesis)):
        # If the hypothesis disagrees with the current example, generalize that attribute to '?'
        if hypothesis[i] != example.iloc[i]:
            hypothesis[i] = '?'

print("Final hypothesis:", hypothesis)
Output:
Result:
The FIND-S algorithm was successfully implemented, identifying the most specific hypothesis for the
EnjoySport dataset by generalizing from positive examples.
AIM:
To implement a decision tree algorithm for sales prediction/classification in retail sector.
PROGRAM:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt

# Load the retail sales dataset (placeholder file name; replace with the actual source)
df = pd.read_csv("retail_sales.csv")

# Encode the categorical columns as integers
label_encoder = LabelEncoder()
df['Category'] = label_encoder.fit_transform(df['Category'])
df['Region'] = label_encoder.fit_transform(df['Region'])
df['Promotion'] = label_encoder.fit_transform(df['Promotion'])
df['Sales'] = label_encoder.fit_transform(df['Sales'])  # Target ('High'/'Low', encoded alphabetically: High=0, Low=1)
# Define the features (X) and target (y)
X = df[['Category', 'Region', 'Promotion']]
y = df['Sales']

# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the decision tree and predict on the test set
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Accuracy
print(f"Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")

# Confusion Matrix
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

# Classification Report
print("Classification Report:")
print(classification_report(y_test, y_pred))

# Visualize the tree (class names listed in encoded order: High=0, Low=1)
plt.figure(figsize=(12, 8))
plot_tree(model, feature_names=['Category', 'Region', 'Promotion'], class_names=['High', 'Low'],
          filled=True)
plt.show()
Output:
Result:
A decision tree algorithm was successfully implemented, effectively classifying and predicting sales trends in
the retail sector based on historical data.
AIM:
To implement back propagation algorithm for stock prices prediction.
PROGRAM:
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
import matplotlib.pyplot as plt
# Generate synthetic daily closing prices (placeholder data; replace with real quotes)
dates = pd.date_range('2020-01-01', periods=500, freq='D')
stock_prices = np.cumsum(np.random.randn(500)) + 100
# Create DataFrame
data = pd.DataFrame({'Date': dates, 'Close': stock_prices})

# Scale prices to [0, 1] for the network; keep the scaler for inverting later
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data[['Close']])

# Each sample holds `time_step` past prices; the target is the next day's price
def create_dataset(series, time_step):
    X = np.array([series[i:i + time_step, 0] for i in range(len(series) - time_step)])
    return X, series[time_step:, 0]

time_step = 60  # Use the last 60 days to predict the next day's price
X, y = create_dataset(scaled_data, time_step)

# Chronological split (no shuffling for time series), then an MLP fitted via backpropagation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
model = MLPRegressor(hidden_layer_sizes=(100, 50), max_iter=500, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Step 10: Inverse the scaling to get the actual price values
predictions = scaler.inverse_transform(predictions.reshape(-1, 1))
y_test_actual = scaler.inverse_transform(y_test.reshape(-1, 1))

# Plot actual vs. predicted prices
plt.plot(y_test_actual, label='Actual Price')
plt.plot(predictions, label='Predicted Price')
plt.legend()
plt.show()
Output:
Result:
The backpropagation-based MLP model effectively predicted stock prices using past data. The plotted results
showed a close alignment between actual and predicted values, demonstrating the model's capability to capture
stock price trends.
AIM:
To implement a clustering algorithm for insurance fraud detection.
PROGRAM:
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
n_samples = 1000
claim_amount = np.random.normal(2000, 500, n_samples) # Normal distribution for claim amount
age = np.random.normal(40, 10, n_samples) # Normal distribution for age
num_claims = np.random.randint(1, 10, n_samples) # Random number of claims
num_previous_frauds = np.random.randint(0, 2, n_samples) # 0 for non-fraudulent, 1 for fraudulent
claim_type = np.random.randint(0, 3, n_samples) # 0,1,2 representing different types of claims
# Simulate fraud data by adding some extreme claim amounts and a higher chance of previous fraud for fraudulent cases
fraudulent_indices = np.random.choice(n_samples, size=50, replace=False)
claim_amount[fraudulent_indices] = np.random.normal(10000, 5000, size=50)  # High fraudulent claim amounts
num_previous_frauds[fraudulent_indices] = 1  # Assign fraud flag to these rows
# Create a DataFrame
data = pd.DataFrame({
    'Claim Amount': claim_amount,
    'Age': age,
    'Num Claims': num_claims,
    'Previous Frauds': num_previous_frauds,
    'Claim Type': claim_type
})

# Standardize the features and cluster with K-Means (2 clusters: typical vs. suspicious claims)
X_scaled = StandardScaler().fit_transform(data)
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
data['Cluster'] = kmeans.fit_predict(X_scaled)

# Visualize the clusters on claim amount vs. age
plt.scatter(data['Claim Amount'], data['Age'], c=data['Cluster'], cmap='viridis')
plt.xlabel('Claim Amount')
plt.ylabel('Age')
plt.show()
# Optionally, you can check for outliers or anomalies in the larger claim amounts or fraudulent cases.
Output:
Result:
A clustering algorithm was successfully implemented, grouping insurance claims to detect potential fraud
patterns based on anomalies and similarities.
AIM:
To implement a clustering algorithm for identifying cancerous data.
PROGRAM:
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, confusion_matrix
# Step 1: Load the breast cancer dataset and standardize the features
cancer = load_breast_cancer()
X_scaled = StandardScaler().fit_transform(cancer.data)
y_true = cancer.target

# Step 2: Cluster into two groups (candidate cancerous / non-cancerous)
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
y_pred = kmeans.fit_predict(X_scaled)

# Compare the clusters against the true labels (cluster ids may be flipped relative to the labels)
print("Confusion Matrix:")
print(confusion_matrix(y_true, y_pred))
print("Classification Report:")
print(classification_report(y_true, y_pred))

# Step 6: Visualizing the Clusters (using first two features for simplicity)
plt.figure(figsize=(8, 6))
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=y_pred, cmap='viridis', label='Cluster')
plt.title('Breast Cancer Clustering with K-Means')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.colorbar(label='Cluster')
plt.show()
Output:
Result:
A clustering algorithm was successfully implemented, effectively distinguishing cancerous and
non-cancerous data based on feature similarities.
AIM:
To apply reinforcement learning and develop a game of your own.
PROGRAM:
import numpy as np
import random
import matplotlib.pyplot as plt

# Grid World game: the agent starts at (0, 0) and must reach the goal corner.
# Grid size, action set, and hyperparameters are assumed where the original listing was elided.
class GridWorld:
    def __init__(self, size=4):
        self.size = size
        self.goal = (size - 1, size - 1)
        self.position = (0, 0)

    def reset(self):
        self.position = (0, 0)  # Reset to initial position
        return self.position

    def step(self, action):
        # Actions: 0=up, 1=down, 2=left, 3=right, clipped at the grid borders
        moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
        new_position = (min(max(self.position[0] + moves[action][0], 0), self.size - 1),
                        min(max(self.position[1] + moves[action][1], 0), self.size - 1))
        self.position = new_position
        # Reward function: goal position has a reward of +1, else -0.1 for each step
        if self.position == self.goal:
            return self.position, 1  # Reward for reaching the goal
        else:
            return self.position, -0.1  # Small negative reward for each step

    def render(self):
        grid = np.full((self.size, self.size), " ")
        grid[self.goal] = "G"  # Goal
        grid[self.position] = "A"  # Agent
        print(grid)

# Tabular Q-learning agent with epsilon-greedy exploration
class QLearningAgent:
    def __init__(self, env, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.env, self.alpha, self.gamma, self.epsilon = env, alpha, gamma, epsilon
        self.q_table = np.zeros((env.size, env.size, 4))

    def choose_action(self, state):
        if random.random() < self.epsilon:
            return random.randint(0, 3)  # Explore
        return int(np.argmax(self.q_table[state]))  # Exploit

    def update_q_value(self, state, action, reward, next_state):
        best_next = np.max(self.q_table[next_state])
        self.q_table[state][action] += self.alpha * (reward + self.gamma * best_next
                                                     - self.q_table[state][action])

env = GridWorld()
agent = QLearningAgent(env)

def train_agent(epochs=1000):
    rewards = []
    for epoch in range(epochs):
        state = env.reset()
        total_reward, done = 0, False
        while not done:
            action = agent.choose_action(state)
            next_state, reward = env.step(action)
            agent.update_q_value(state, action, reward, next_state)
            total_reward += reward
            state = next_state
            if state == env.goal:
                done = True  # Episode ends when goal is reached
        rewards.append(total_reward)
        if epoch % 100 == 0:
            print(f"Epoch {epoch}: Total Reward = {total_reward}")
    return rewards

# Manually train the agent before testing (or load pre-trained Q-table)
train_agent(epochs=1000)

# Test: follow the greedy policy from the start state (step cap avoids infinite loops)
state, total_reward, steps = env.reset(), 0, 0
while state != env.goal and steps < 50:
    state, reward = env.step(int(np.argmax(agent.q_table[state])))
    total_reward += reward
    steps += 1
env.render()
print(f"Total reward after {steps} steps: {total_reward}")
Output:
Epoch 0: Total Reward = -0.30000000000000004
Epoch 100: Total Reward = -0.09999999999999987
Epoch 200: Total Reward = 0.10000000000000009
Epoch 300: Total Reward = -0.09999999999999987
Epoch 400: Total Reward = 1.1102230246251565e-16
Epoch 500: Total Reward = -0.19999999999999996
Epoch 600: Total Reward = 0.30000000000000004
Epoch 700: Total Reward = -0.19999999999999996
Epoch 800: Total Reward = 0.20000000000000007
Epoch 900: Total Reward = 0.10000000000000009
Epoch 0: Total Reward = -1.0000000000000004
Epoch 100: Total Reward = 0.10000000000000009
Epoch 200: Total Reward = -0.30000000000000004
Epoch 300: Total Reward = 0.20000000000000007
Epoch 400: Total Reward = 0.10000000000000009
Epoch 500: Total Reward = 0.30000000000000004
Epoch 600: Total Reward = 0.30000000000000004
Epoch 700: Total Reward = 0.20000000000000007
Result:
The final results after training and testing the reinforcement learning agent in the Grid World environment are
as follows:
During training, the agent gradually learned to navigate the grid, with total rewards fluctuating as it explored
different paths. Some episodes resulted in negative rewards due to inefficient movements, while others showed
improvement, reaching the goal with optimal steps.
After testing, the agent took 10 steps and achieved a total reward of -1.0, indicating that it may still need further
training or fine-tuning of hyperparameters to improve efficiency.
AIM:
To develop a traffic signal control system using reinforcement learning technique.
PROGRAM:
import numpy as np
import random
def reset(self):
    self.state = 0  # Reset state to Red
    return self.state
epochs = 100
steps_per_epoch = 10
# Update Q-table
agent.update_q_table(state, action, reward, next_state)
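The listing above is excerpted. A minimal self-contained sketch of the Q-learning loop it implies is given below; the environment dynamics, reward values (Red = 1, Yellow = 2, Green = 5), and hyperparameters are assumptions for illustration, not the original assignment's exact design:

import numpy as np
import random

# Environment: three signal states (0 = Red, 1 = Yellow, 2 = Green) and two actions
# (0 = hold the current phase, 1 = switch to the next phase); reward values are hypothetical
class TrafficSignalEnv:
    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0  # Reset state to Red
        return self.state

    def step(self, action):
        if action == 1:
            self.state = (self.state + 1) % 3  # Advance Red -> Yellow -> Green -> Red
        reward = {0: 1, 1: 2, 2: 5}[self.state]  # Green keeps traffic flowing best
        return self.state, reward

# Tabular Q-learning agent with epsilon-greedy exploration
class QLearningAgent:
    def __init__(self, n_states=3, n_actions=2, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.n_actions = n_actions
        self.q_table = np.zeros((n_states, n_actions))

    def choose_action(self, state):
        if random.random() < self.epsilon:
            return random.randint(0, self.n_actions - 1)  # Explore
        return int(np.argmax(self.q_table[state]))  # Exploit

    def update_q_table(self, state, action, reward, next_state):
        best_next = np.max(self.q_table[next_state])
        self.q_table[state, action] += self.alpha * (reward + self.gamma * best_next
                                                     - self.q_table[state, action])

env = TrafficSignalEnv()
agent = QLearningAgent()
epochs = 100
steps_per_epoch = 10

for epoch in range(epochs):
    state = env.reset()
    total_reward = 0
    for _ in range(steps_per_epoch):
        action = agent.choose_action(state)
        next_state, reward = env.step(action)
        # Update Q-table
        agent.update_q_table(state, action, reward, next_state)
        total_reward += reward
        state = next_state
    print(f"Epoch {epoch + 1}: Total Reward = {total_reward}")

print("Final Q-table:")
print(agent.q_table)

The Q-table printed at the end has one row per signal state and one column per action, matching the 3 x 2 table in the output below.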
Output:
Epoch 1: Total Reward = 41
Epoch 2: Total Reward = 30
Epoch 3: Total Reward = 27
Epoch 4: Total Reward = 27
Epoch 5: Total Reward = 30
Epoch 6: Total Reward = 41
Epoch 7: Total Reward = 10
Epoch 8: Total Reward = 27
Epoch 9: Total Reward = 30
Epoch 10: Total Reward = 50
Epoch 11: Total Reward = 30
Epoch 12: Total Reward = 30
Epoch 13: Total Reward = 21
Epoch 14: Total Reward = 10
Epoch 15: Total Reward = 21
Epoch 16: Total Reward = 47
Epoch 17: Total Reward = 21
Epoch 18: Total Reward = 10
Epoch 19: Total Reward = 50
Epoch 20: Total Reward = 41
Epoch 21: Total Reward = 30
Epoch 22: Total Reward = 27
Epoch 23: Total Reward = 27
Epoch 24: Total Reward = 30
Final Q-table:
[[43.28894396 50.26009451]
[38.81486928 45.50582136]
[38.16273778 46.17528699]]
Result:
A reinforcement learning-based traffic signal control system was successfully developed, optimizing signal
timings to reduce congestion and improve traffic flow.