Assignment 1

Uploaded by

Rupam Pakhira

Data Analytics, Autumn 2024

Assignment 1

Recently, former US President and 2024 Republican presidential candidate Donald Trump
was shot at an outdoor rally in Pennsylvania on a Saturday evening. In response,
the intelligence team conducted a nationwide random sampling of individuals to
identify potential suspects. The collected data includes various attributes such as age,
gender, and occupation, which are detailed in the provided dataframe (Synthetic Data).
The last column of the dataframe indicates the probability of each individual being
either a criminal or innocent, with two possible labels: <=0.5 and >0.5. For classification
purposes, these labels represent distinct classes: a probability greater than 0.5 suggests
a potential criminal, while a probability of 0.5 or less indicates innocence.

Given this dataset, perform the following tasks. You may handle missing values (if any)
according to a scheme of your choice.
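
One possible missing-value scheme is sketched below: numeric columns are filled with the column median and all other columns with the most frequent value. This is only an illustration of "a scheme of your choice"; the function name impute_missing is hypothetical and the assignment does not prescribe this approach.

```python
import pandas as pd


def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with gaps filled (median for numeric, mode otherwise)."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            mode = out[col].mode()
            if not mode.empty:  # an all-missing column has no mode; leave it as-is
                out[col] = out[col].fillna(mode.iloc[0])
    return out
```

Dropping incomplete rows with df.dropna() is a simpler alternative when only a few rows are affected.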

A. Perform an Exploratory Data Analysis (EDA) on the dataset. EDA may include
frequency distribution, univariate and multivariate correlation analysis, as well as basic
data visualization (remember the EDA tutorial on 16th August). You are required to
prepare a report (LaTeX template) in PDF that includes the Exploratory Data Analysis (EDA)
and the insights gained from each implementation.
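
A minimal sketch of the tabular part of such an EDA is given below, using pandas only. The helper name quick_eda and the label column name passed to it are assumptions; the real column names come from the provided dataframe.

```python
import pandas as pd


def quick_eda(df: pd.DataFrame, label_col: str) -> dict:
    """Collect frequency tables, summary statistics and correlations in one dict."""
    numeric = df.select_dtypes(include="number")
    categorical = df.select_dtypes(exclude="number").drop(
        columns=[label_col], errors="ignore"
    )
    return {
        # How many rows fall in each class (<=0.5 vs >0.5).
        "class_counts": df[label_col].value_counts(),
        # Univariate statistics for numeric attributes such as age.
        "numeric_summary": numeric.describe(),
        # Pairwise Pearson correlation between numeric attributes.
        "correlation": numeric.corr(),
        # Frequency distribution of each categorical attribute.
        "category_counts": {c: df[c].value_counts() for c in categorical.columns},
        # Share of each class within every category (rows sum to 1).
        "label_by_category": {
            c: pd.crosstab(df[c], df[label_col], normalize="index")
            for c in categorical.columns
        },
    }
```

Each table can then be turned into a histogram, bar chart or heatmap for the report.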

B. Implement a Naive Bayes model on the given dataset without relying on any machine
learning libraries (e.g., sklearn). Your task is to code the Naive Bayes algorithm from
scratch to classify individuals as either criminal or innocent. You may use basic packages
such as numpy, pandas, and math. No marks will be given if you use any other custom
classifier or ML libraries.
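
The sketch below shows one way a from-scratch Naive Bayes could look, using numpy only. It assumes a Gaussian likelihood and that every feature has already been encoded as a number; categorical attributes such as gender or occupation could instead be modelled with per-category frequency counts and Laplace smoothing. The class name GaussianNaiveBayes and the var_smoothing parameter are illustrative choices, not part of the assignment.

```python
import numpy as np


class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per (class, feature) pair."""

    def __init__(self, var_smoothing: float = 1e-9):
        # Small constant added to every variance to avoid division by zero.
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Per-class mean and variance of each feature: shape (n_classes, n_features).
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_])
        self.vars_ = self.vars_ + self.var_smoothing
        # Log prior of each class, estimated from class frequencies.
        self.log_priors_ = np.log(np.array([(y == c).mean() for c in self.classes_]))
        return self

    def _joint_log_likelihood(self, X):
        X = np.asarray(X, dtype=float)
        # Broadcast to shape (n_samples, n_classes, n_features).
        diff = X[:, None, :] - self.means_[None, :, :]
        log_pdf = -0.5 * (
            np.log(2.0 * np.pi * self.vars_)[None, :, :] + diff**2 / self.vars_[None, :, :]
        )
        # Naive independence assumption: sum the per-feature log densities.
        return self.log_priors_[None, :] + log_pdf.sum(axis=2)

    def predict(self, X):
        # Pick the class with the highest log posterior (up to a constant).
        return self.classes_[np.argmax(self._joint_log_likelihood(X), axis=1)]
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.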

C. Now, implement Naive Bayes, SVM, Decision Tree, and KNN using the sklearn module
to perform the classification task. Compare the performance of the sklearn Naive Bayes
implementation with your custom Naive Bayes implementation from part B.

D. Finally, implement an ensemble model by combining multiple classifiers, including
your custom Naive Bayes implementation (without sklearn), the sklearn version of SVM,
Decision Tree, and KNN. You must write the ensemble code from scratch, without
relying on any ensemble-related libraries.
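
One simple way to combine the classifiers is hard majority voting, sketched below without any ensemble library. The assignment does not say which combination rule to use, so majority voting is an assumption, as is the class name MajorityVoteEnsemble. Any object with fit and predict methods can be a member, including a custom Naive Bayes.

```python
from collections import Counter

import numpy as np


class MajorityVoteEnsemble:
    """Hard-voting ensemble over models that expose fit() and predict()."""

    def __init__(self, models):
        self.models = list(models)

    def fit(self, X, y):
        # Train every member on the same data.
        for model in self.models:
            model.fit(X, y)
        return self

    def predict(self, X):
        # One array of predictions per model, regrouped into one tuple per sample.
        all_preds = [np.asarray(model.predict(X)) for model in self.models]
        votes_per_sample = zip(*all_preds)
        # most_common keeps first-seen order among equal counts, so a tie is
        # resolved in favour of the tied label voted by the earliest model.
        return np.array(
            [Counter(votes).most_common(1)[0][0] for votes in votes_per_sample]
        )
```

With four members and two classes a 2-2 tie is possible, so the order of the models passed to the constructor matters; weighting votes by validation accuracy is one alternative.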

Before evaluating the performance, we will first conduct a code plagiarism check. The
performance of each implementation will be assessed using our private test dataset. For
evaluation, we will use accuracy and F1 score metrics. Additionally, we will measure the
running time of each implementation.
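
For self-checking before submission, the three measurements named above can be sketched as follows. The choice of ">0.5" as the positive class for F1 is an assumption, since the positive class is not stated, and the helper names accuracy, f1 and timed are illustrative.

```python
import time

import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float((y_true == y_pred).mean())


def f1(y_true, y_pred, positive=">0.5"):
    """F1 = 2*TP / (2*TP + FP + FN) for the chosen positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(((y_pred == positive) & (y_true == positive)).sum())
    fp = int(((y_pred == positive) & (y_true != positive)).sum())
    fn = int(((y_pred != positive) & (y_true == positive)).sum())
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, timed(model.fit, X_train, y_train) returns the fitted model together with its training time.
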

For this assignment, each team must submit a single Python file (.py) along with a Colab
notebook link. Each cell in the notebook should have a clearly labelled heading to guide
execution. Afterwards, create a plot to visualize the performance comparison of all
your implementations, including the ensemble model. Add this plot and associated
discussion in the report prepared for part A.
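
A grouped bar chart is one reasonable form for this comparison plot. The sketch below assumes the results are held in a dict that maps each model name to its "accuracy" and "f1" values; that layout, the function name plot_comparison and the default output path are all hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # render to a file; no display is needed
import matplotlib.pyplot as plt
import numpy as np


def plot_comparison(scores, path="comparison.png"):
    """Save a grouped bar chart of accuracy and F1 per model and return the path."""
    names = list(scores)
    acc = [scores[name]["accuracy"] for name in names]
    f1s = [scores[name]["f1"] for name in names]

    x = np.arange(len(names))
    width = 0.4
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(x - width / 2, acc, width, label="Accuracy")
    ax.bar(x + width / 2, f1s, width, label="F1 score")
    ax.set_xticks(x)
    ax.set_xticklabels(names, rotation=20, ha="right")
    ax.set_ylim(0, 1)
    ax.set_ylabel("Score")
    ax.legend()
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path
```

The saved image can be included directly in the LaTeX report; running times are on a different scale and are easier to read in a separate chart.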

We’ll use MS Teams to accept the submissions. Only one member from each team should
submit the assignment deliverables.
