DATA SCIENCE LAB MANUAL

MODULE-3

1. Train a regularized logistic regression classifier on the Iris dataset (https://archive.ics.uci.edu/ml/machine-learning-databases/iris/ or the inbuilt iris dataset) using sklearn. Train the model with the hyperparameter C = 1e4 and report the best classification accuracy.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a pipeline with StandardScaler and LogisticRegression with regularization
pipeline = make_pipeline(StandardScaler(), LogisticRegression(C=1e4, max_iter=1000))

# Train the model
pipeline.fit(X_train, y_train)

# Calculate the accuracy on the testing set
accuracy = pipeline.score(X_test, y_test)
print("Classification accuracy:", accuracy)
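In LogisticRegression, C is the inverse of the regularization strength, so C = 1e4 means very weak regularization. As a quick illustration (the alternative C values below are chosen arbitrarily for demonstration, not taken from the exercise), a small cross-validation sweep shows how C affects accuracy:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# Smaller C means stronger L2 regularization; these values are illustrative
for C in [0.01, 1.0, 1e4]:
    model = make_pipeline(StandardScaler(), LogisticRegression(C=C, max_iter=1000))
    scores = cross_val_score(model, X, y, cv=5)
    print(f"C={C}: mean 5-fold CV accuracy = {scores.mean():.3f}")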


2. Train an SVM classifier on the Iris dataset using sklearn. Try different kernels and the associated hyperparameters. Train the model with the following set of hyperparameters: RBF kernel, gamma = 0.5, one-vs-rest classifier, no feature normalization. Also try C = 0.01, 1, 10. For this set of hyperparameters, find the best classification accuracy along with the total number of support vectors on the test data.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Set of hyperparameters to try
hyperparameters = [
    {'kernel': 'rbf', 'gamma': 0.5, 'C': 0.01},
    {'kernel': 'rbf', 'gamma': 0.5, 'C': 1},
    {'kernel': 'rbf', 'gamma': 0.5, 'C': 10},
]

best_accuracy = 0
best_model = None
best_support_vectors = None

# Train SVM models with different hyperparameters and find the best accuracy
for params in hyperparameters:
    model = SVC(kernel=params['kernel'], gamma=params['gamma'], C=params['C'],
                decision_function_shape='ovr')
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)
    # n_support_ holds the number of support vectors per class; sum for the total
    support_vectors = model.n_support_.sum()
    print(f"For hyperparameters: {params}, Accuracy: {accuracy}, "
          f"Total Support Vectors: {support_vectors}")
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        best_model = model
        best_support_vectors = support_vectors

print("\nBest accuracy:", best_accuracy)
print("Total support vectors on test data:", best_support_vectors)


MODULE-4

Consider the following dataset. Write a program to demonstrate the working of the decision-tree-based ID3 algorithm. (sklearn's DecisionTreeClassifier implements an optimized CART algorithm; setting criterion='entropy' gives the information-gain splitting used by ID3.)
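The dataset, as encoded in the program below:

Price  Maintenance  Capacity  Airbag  Profitable
Low    Low          2         No      1
Low    Med          4         Yes     1
Low    Low          4         No      1
Low    Med          4         No      0
Low    High         4         No      0
Med    Med          4         No      0
Med    Med          4         Yes     1
Med    High         2         Yes     0
Med    High         5         No      1
High   Med          4         Yes     1
High   Med          2         Yes     1
High   High         2         Yes     0
High   High         5         Yes     1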
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd
from io import StringIO
from IPython.display import Image
import pydotplus

# Define the dataset
data = {
    'Price': ['Low', 'Low', 'Low', 'Low', 'Low', 'Med', 'Med', 'Med', 'Med',
              'High', 'High', 'High', 'High'],
    'Maintenance': ['Low', 'Med', 'Low', 'Med', 'High', 'Med', 'Med', 'High',
                    'High', 'Med', 'Med', 'High', 'High'],
    'Capacity': ['2', '4', '4', '4', '4', '4', '4', '2', '5', '4', '2', '2', '5'],
    'Airbag': ['No', 'Yes', 'No', 'No', 'No', 'No', 'Yes', 'Yes', 'No', 'Yes',
               'Yes', 'Yes', 'Yes'],
    'Profitable': [1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1]
}
df = pd.DataFrame(data)

# Convert categorical variables into numerical ones
df = pd.get_dummies(df, columns=['Price', 'Maintenance', 'Airbag'])
# Capacity is stored as strings; convert it to integers so sklearn can use it
df['Capacity'] = df['Capacity'].astype(int)

# Separate features and target variable
X = df.drop('Profitable', axis=1)
y = df['Profitable']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a decision tree classifier; criterion='entropy' uses information gain, as in ID3
clf = DecisionTreeClassifier(criterion='entropy')

# Train the classifier on the training data
clf.fit(X_train, y_train)

# Predict on the testing data
y_pred = clf.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Visualize the decision tree
dot_data = StringIO()
export_graphviz(clf, out_file=dot_data, filled=True, rounded=True,
                special_characters=True, feature_names=X.columns)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())
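If Graphviz and pydotplus are not available, sklearn's built-in export_text renders the same trained tree as plain text; a minimal sketch:

from sklearn.tree import export_text

# Plain-text rendering of the trained tree; no Graphviz dependency required
print(export_text(clf, feature_names=list(X.columns)))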

Consider the dataset spiral.txt (https://bit.ly/2Lm75Ly). The first two columns in the dataset correspond to the coordinates of each data point. The third column corresponds to the actual cluster label. Compute the Rand index for the following methods:

 K-means clustering

 Single-link hierarchical clustering

 Complete-link hierarchical clustering

Also visualize the dataset and determine which algorithm will be able to recover the true clusters.

import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score
import matplotlib.pyplot as plt

# Load the dataset
data = np.loadtxt("Spiral.txt", delimiter=",", skiprows=1)
X = data[:, :2]      # Features (the point coordinates)
y_true = data[:, 2]  # Actual cluster labels

# Visualize the dataset
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=y_true, cmap='viridis')
plt.title('True Clusters')
plt.xlabel('X1')
plt.ylabel('X2')
plt.show()

# K-means clustering
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans_clusters = kmeans.fit_predict(X)

# Single-link hierarchical clustering
single_link = AgglomerativeClustering(n_clusters=3, linkage='single')
single_link_clusters = single_link.fit_predict(X)

# Complete-link hierarchical clustering
complete_link = AgglomerativeClustering(n_clusters=3, linkage='complete')
complete_link_clusters = complete_link.fit_predict(X)

# Compute the (adjusted) Rand index for each method
rand_index_kmeans = adjusted_rand_score(y_true, kmeans_clusters)
rand_index_single_link = adjusted_rand_score(y_true, single_link_clusters)
rand_index_complete_link = adjusted_rand_score(y_true, complete_link_clusters)

print("Rand Index for K-means Clustering:", rand_index_kmeans)
print("Rand Index for Single-link Hierarchical Clustering:", rand_index_single_link)
print("Rand Index for Complete-link Hierarchical Clustering:", rand_index_complete_link)

# This code computes the Adjusted Rand Index for each clustering method and
# visualizes the true clusters. The Adjusted Rand Index ranges from -1 to 1,
# where 1 indicates perfect agreement with the true clusters; the method with
# the higher score is better at recovering them. Because the spiral clusters
# are intertwined and non-convex, single-link hierarchical clustering (which
# chains nearby points together) typically recovers them, while K-means and
# complete-link clustering do not.
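Strictly speaking, adjusted_rand_score computes the chance-adjusted Rand index. If the plain Rand index asked for in the question is required, sklearn (version 0.24 or later) also provides rand_score; a minimal sketch, reusing the cluster labels computed above:

from sklearn.metrics import rand_score

# Plain (unadjusted) Rand index, which ranges from 0 to 1
print("Plain Rand Index for K-means:", rand_score(y_true, kmeans_clusters))
print("Plain Rand Index for Single-link:", rand_score(y_true, single_link_clusters))
print("Plain Rand Index for Complete-link:", rand_score(y_true, complete_link_clusters))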

MODULE-5

Mini Project – Simple web scraping of social media

import requests
from bs4 import BeautifulSoup

# URL of the Instagram profile you want to scrape
url = 'https://www.instagram.com/openai/'

# Send a GET request to the URL
response = requests.get(url)
print(response.status_code)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Parse the HTML content of the page
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find all post elements (class names like 'v1Nh3' are Instagram-internal
    # and change often, so this selector may return nothing)
    posts = soup.find_all('div', class_='v1Nh3')

    # Extract data from each post
    for post in posts:
        # Extract post link
        post_link = post.find('a')['href']
        # Extract post image URL
        image_url = post.find('img')['src']
        print(f"Post Link: {post_link}")
        print(f"Image URL: {image_url}")
        print("------")
else:
    print("Failed to retrieve data from Instagram")
