DSML Problem Statements

This document contains 25 problem statements related to data science and machine learning concepts. The problems involve performing tasks like data wrangling, exploratory data analysis, predictive modeling, and clustering using Python and various datasets. The datasets mentioned include Titanic, House Prices, Iris Flowers, COVID Vaccination data for India, and more. The problems cover concepts like data cleaning, feature engineering, visualization, decision trees, k-means clustering, evaluating classification models and more.

Uploaded by

Mangesh Pawar

Bansilal Ramnath Agarwal Charitable Trust’s

Vishwakarma Institute of Information Technology, Pune-48

(An Autonomous Institute Affiliated to Savitribai Phule Pune University)

Department of Computer Engineering

Data Science and Machine Learning Problem Statements

Suggested List of Assignments

1. Perform the following operations using Python on a data set: read data
from different formats (such as csv and xls), index and select data, sort
data, describe the attributes of the data, and check the data type of each
column. (Use the Titanic dataset.)

2. Perform the following operations using Python on the Telecom_Churn
dataset. Compute and display summary statistics for each feature available
in the dataset, using a separate command for each statistic (e.g. minimum
value, maximum value, mean, range, standard deviation, variance and
percentiles).
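
One possible approach is sketched below with pandas. The small DataFrame and its column names (`account_length`, `day_minutes`) are hypothetical stand-ins, since the actual Telecom_Churn file and its columns are not given here. Each statistic is computed with its own command, as the problem asks.

```python
import pandas as pd

# Hypothetical stand-in for the Telecom_Churn data (column names are assumptions).
df = pd.DataFrame({
    "account_length": [128, 107, 137, 84, 75],
    "day_minutes": [265.1, 161.6, 243.4, 299.4, 166.7],
})

# Restrict the statistics to numeric features.
numeric = df.select_dtypes(include="number")

minimum = numeric.min()
maximum = numeric.max()
mean = numeric.mean()
value_range = maximum - minimum
std_dev = numeric.std()    # sample standard deviation (ddof=1 by default)
variance = numeric.var()   # sample variance (ddof=1 by default)
percentiles = numeric.quantile([0.25, 0.50, 0.75])

print(minimum, maximum, mean, value_range, std_dev, variance, percentiles, sep="\n\n")
```

With the real dataset, the DataFrame would instead come from something like `pd.read_csv(...)` on the provided file.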

3. Perform the following operations using Python on the House_Price
Prediction dataset. Compute the standard deviation, variance and
percentiles of each feature, using separate commands. Create a histogram
for each feature in the dataset to illustrate the feature distributions.

4. Write a program to do the following: A dataset collected in a cosmetics
shop, showing details of customers and whether or not they responded to a
special offer to buy a new lipstick, is shown in the table below. (Implement
step by step using commands; don't use a library.) Use this dataset to
build a decision tree, with Buys as the target variable, to help in buying
lipsticks in the future. Find the root node of the decision tree.

5. Write a program to do the following: A dataset collected in a cosmetics
shop, showing details of customers and whether or not they responded to a
special offer to buy a new lipstick, is shown in the table below. (Use
library commands.) According to the decision tree you have made from the
previous training data set, what is the decision for the test data:
[Age < 21, Income = Low, Gender = Female, Marital Status = Married]?

6. Write a program to do the following: A dataset collected in a cosmetics
shop, showing details of customers and whether or not they responded to a
special offer to buy a new lipstick, is shown in the table below. (Use
library commands.) According to the decision tree you have made from the
previous training data set, what is the decision for the test data:
[Age > 35, Income = Medium, Gender = Female, Marital Status = Married]?

7. Write a program to do the following: A dataset collected in a cosmetics
shop, showing details of customers and whether or not they responded to a
special offer to buy a new lipstick, is shown in the table below. (Use
library commands.) According to the decision tree you have made from the
previous training data set, what is the decision for the test data:
[Age > 35, Income = Medium, Gender = Female, Marital Status = Married]?

8. Write a program to do the following: A dataset collected in a cosmetics
shop, showing details of customers and whether or not they responded to a
special offer to buy a new lipstick, is shown in the table below. (Use
library commands.) According to the decision tree you have made from the
previous training data set, what is the decision for the test data:
[Age = 21-35, Income = Low, Gender = Male, Marital Status = Married]?

9. Write a program to do the following: You are given a collection of 8
points: P1=[0.1,0.6] P2=[0.15,0.71] P3=[0.08,0.9] P4=[0.16,0.85]
P5=[0.2,0.3] P6=[0.25,0.5] P7=[0.24,0.1] P8=[0.3,0.2]. Perform k-means
clustering with initial centroids m1=P1 (cluster #1, C1) and m2=P8
(cluster #2, C2). Answer the following: 1] Which cluster does P6
belong to? 2] What is the population of the cluster around m2? 3] What
are the updated values of m1 and m2?
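
A minimal sketch of one assignment-and-update pass over these points follows, using only the standard library. The function name `kmeans_step` and the cluster labels `"C1"`/`"C2"` are illustrative choices, and the problem does not state whether a single pass or iteration to convergence is intended, so this shows a single pass.

```python
from math import dist

points = {
    "P1": (0.1, 0.6), "P2": (0.15, 0.71), "P3": (0.08, 0.9), "P4": (0.16, 0.85),
    "P5": (0.2, 0.3), "P6": (0.25, 0.5), "P7": (0.24, 0.1), "P8": (0.3, 0.2),
}
centroids = {"C1": points["P1"], "C2": points["P8"]}


def kmeans_step(points, centroids):
    """Assign each point to its nearest centroid, then recompute the means."""
    # Assignment step: nearest centroid by Euclidean distance.
    assignment = {
        name: min(centroids, key=lambda c: dist(p, centroids[c]))
        for name, p in points.items()
    }
    # Update step: each centroid becomes the mean of its members.
    updated = {}
    for label, old in centroids.items():
        members = [points[n] for n, a in assignment.items() if a == label]
        if members:
            updated[label] = tuple(sum(axis) / len(members) for axis in zip(*members))
        else:
            updated[label] = old  # keep the old centroid if the cluster is empty
    return assignment, updated


assignment, updated = kmeans_step(points, centroids)
print("P6 belongs to:", assignment["P6"])
print("Population around m2:", sum(a == "C2" for a in assignment.values()))
print("Updated centroids:", updated)
```

Calling `kmeans_step` again with `updated` as the centroids performs the next pass.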

10. Write a program to do the following: You are given a collection of 8
points: P1=[2, 10] P2=[2, 5] P3=[8, 4] P4=[5, 8] P5=[7, 5] P6=[6, 4]
P7=[1, 2] P8=[4, 9]. Perform k-means clustering with initial centroids
m1=P1 (cluster #1, C1), m2=P4 (cluster #2, C2) and m3=P7 (cluster #3,
C3). Answer the following: 1] Which cluster does P6 belong to? 2] What is
the population of the cluster around m3? 3] What are the updated values
of m1, m2 and m3?

11. Use the Iris flower dataset and perform the following:

1. List the features and their types (e.g., numeric, nominal)
available in the dataset.

2. Create a histogram for each feature in the dataset to illustrate the
feature distributions.

12. Use the Iris flower dataset and perform the following:

1. Create a box plot for each feature in the dataset.

2. Discuss the distributions and identify outliers from the box plots.
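
Box plots conventionally flag points lying more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies that rule with the standard library; the sample values and the helper name `iqr_outliers` are hypothetical and are not taken from the Iris data.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # method="inclusive" interpolates linearly between data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements standing in for one feature column.
sample = [3.0, 3.1, 2.9, 3.2, 3.4, 2.8, 3.0, 4.4, 2.0, 3.1]
print(iqr_outliers(sample))
```

Running the same function on each feature column gives the candidate outliers to compare against the plotted whiskers.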

13. Use the covid_vaccine_statewise.csv dataset and perform the following
analytics.

a. Describe the dataset

b. Number of persons vaccinated state-wise for the first dose in India

c. Number of persons vaccinated state-wise for the second dose in India

14. Use the covid_vaccine_statewise.csv dataset and perform the following
analytics.

A. Describe the dataset.

B. Number of males vaccinated

C. Number of females vaccinated

15. Use the dataset 'titanic'. The dataset contains 891 rows and contains
information about the passengers who boarded the unfortunate Titanic
ship. Use the Seaborn library to see if we can find any patterns in the data.

16. Use the inbuilt dataset 'titanic'. The dataset contains 891 rows and
contains information about the passengers who boarded the unfortunate
Titanic ship. Write code to check how the price of the ticket (column
name: 'fare') for each passenger is distributed by plotting a histogram.

17. Compute Accuracy, Error rate, Precision and Recall for the following
confusion matrix (use the formula for each).

True Positives (TPs): 1 False Positives (FPs): 1

False Negatives (FNs): 8 True Negatives (TNs): 90
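
The four measures follow directly from the confusion-matrix counts. Below is a small sketch of the standard formulas; the function name `classification_metrics` is an illustrative choice, and the code does not guard against a zero denominator.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, error rate, precision and recall from the four counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,     # correct predictions / all predictions
        "error_rate": (fp + fn) / total,   # wrong predictions / all predictions
        "precision": tp / (tp + fp),       # true positives / predicted positives
        "recall": tp / (tp + fn),          # true positives / actual positives
    }


metrics = classification_metrics(tp=1, fp=1, fn=8, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that accuracy and error rate always sum to 1, which is a quick check on hand calculations.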

18. Use the House_Price prediction dataset. Provide summary statistics
(mean, median, minimum, maximum, standard deviation) of a quantitative
variable grouped by a categorical variable. For example, if the categorical
variable is age group and the quantitative variable is income, then provide
summary statistics of income grouped by age group.
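
The age-group and income example can be sketched with a pandas `groupby` followed by `agg`. The DataFrame below is made-up illustration data with hypothetical column names (`age_group`, `income`); it is not the House_Price dataset.

```python
import pandas as pd

# Made-up data: one categorical column and one quantitative column.
df = pd.DataFrame({
    "age_group": ["18-30", "18-30", "31-50", "31-50", "51+"],
    "income": [30000, 40000, 55000, 65000, 50000],
})

# One row per category, one column per summary statistic.
summary = df.groupby("age_group")["income"].agg(["mean", "median", "min", "max", "std"])
print(summary)
```

A group containing a single row has an undefined sample standard deviation, so pandas reports it as NaN.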

19. Write a Python program to display basic statistical details such as
percentiles, mean and standard deviation (use Python and pandas
commands) for the species ‘Iris-setosa’, ‘Iris-versicolor’ and
‘Iris-virginica’ of the iris.csv dataset.

20. Write a program to cluster a set of points using K-means for the IRIS
dataset. Consider K=3 clusters and use Euclidean distance as the
distance measure. Randomly initialize each cluster mean as one of the data
points. Run at least 10 iterations. After the iterations are over, print
the final cluster mean for each of the clusters.
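
The loop described here (random initial means taken from the data, Euclidean distance, a fixed number of iterations) might be sketched as follows with the standard library. The nine 2-D points are hypothetical stand-ins for the Iris measurements, and the `seed` parameter is an added assumption so that runs are repeatable.

```python
import random
from math import dist


def kmeans(points, k, iterations=10, seed=0):
    """Run a fixed number of K-means iterations and return the means and clusters."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # each initial mean is one of the data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, means[i]))
            clusters[nearest].append(p)
        # Update step: recompute each mean; an empty cluster keeps its old mean.
        means = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else m
            for c, m in zip(clusters, means)
        ]
    return means, clusters


# Hypothetical 2-D points standing in for the Iris feature vectors.
points = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0),
          (5.0, 5.2), (5.1, 4.9), (4.9, 5.0),
          (9.0, 1.0), (9.2, 1.1), (8.9, 0.8)]
means, clusters = kmeans(points, k=3)
for i, m in enumerate(means, start=1):
    print(f"Cluster {i} mean: {m}")
```

Because the initial means are random, different seeds can end in different final means; the same function handles other values of K by changing the `k` argument.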

21. Write a program to cluster a set of points using K-means for the IRIS
dataset. Consider K=4 clusters and use Euclidean distance as the
distance measure. Randomly initialize each cluster mean as one of the data
points. Run at least 10 iterations. After the iterations are over, print
the final cluster mean for each of the clusters.

22. Compute Accuracy, Error rate, Precision and Recall for the following
confusion matrix.

Actual class \ Predicted class   cancer = yes   cancer = no   Total
cancer = yes                     90             210           300
cancer = no                      140            9560          9700
Total                            230            9770          10000

23. With reference to Table , obtain the Frequency table for the
attribute age. From the frequency table you have obtained, calculate the
information gain of the frequency table while splitting on Age. (Use step
by step Python/Pandas commands)
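
Since the referenced table is not reproduced here, the sketch below uses a small hypothetical set of (Age, Buys) rows to show the steps: build the frequency table, compute the entropy before the split, then subtract the weighted entropy after splitting on Age. It uses only the standard library; the same steps could be mirrored with pandas.

```python
from collections import Counter
from math import log2

# Hypothetical (Age, Buys) rows; these are NOT the values from the original table.
rows = [("<21", "No"), ("<21", "No"), ("21-35", "Yes"), ("21-35", "Yes"),
        (">35", "Yes"), (">35", "No"), (">35", "Yes"), ("<21", "Yes")]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


# Frequency table: Age value -> count of each Buys label.
frequency = {}
for age, buys in rows:
    frequency.setdefault(age, Counter())[buys] += 1

# Entropy of the target before splitting.
parent = entropy([buys for _, buys in rows])

# Weighted average entropy of the subsets produced by splitting on Age.
weighted = sum(
    (sum(counts.values()) / len(rows)) * entropy(list(counts.elements()))
    for counts in frequency.values()
)

information_gain = parent - weighted
print(frequency)
print(f"Information gain on Age: {information_gain:.4f}")
```

Repeating the calculation for every attribute and picking the largest gain is one common way to choose the root node of a decision tree.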

24. Perform the following operations using Python on a suitable data set:
count the unique values of the data, check the format of each column,
convert a variable's data type (e.g. from long to short and vice versa),
identify missing values and fill in the missing values.
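
A short pandas sketch of these operations follows. The DataFrame and its column names (`age`, `city`, `visits`) are hypothetical, and filling with the mean or the most frequent value is only one of several reasonable choices.

```python
import pandas as pd

# Hypothetical data with a few missing entries.
df = pd.DataFrame({
    "age": [22.0, None, 35.0, 58.0],
    "city": ["Pune", "Mumbai", None, "Pune"],
    "visits": [3, 1, 4, 1],
})

# Count distinct non-missing values in a column.
unique_cities = df["city"].nunique()

# Inspect the data type of each column.
print(df.dtypes)

# Convert a column to a narrower integer type.
df["visits"] = df["visits"].astype("int32")

# Count missing values per column before filling.
missing_before = df.isna().sum()
print(missing_before)

# Fill numeric gaps with the column mean and categorical gaps with the mode.
df["age"] = df["age"].fillna(df["age"].mean())
df["city"] = df["city"].fillna(df["city"].mode()[0])
print(df)
```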

25. Perform Data Cleaning, Data transformation using Python on any data
set.

https://www.kaggle.com/sudalairajkumar/covid19-in-india?select=covid_vaccine_statewise.csv
