DSML Problem Statements
1. Perform the following operations using Python on a data set: read data
from different formats (such as csv, xls), index and select data, sort data,
describe the attributes of the data, and check the data type of each column.
(Use the Titanic dataset.)
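The operations listed in problem 1 could be sketched with pandas roughly as follows. This is a minimal illustration, not a model answer: the four-row table is a hypothetical excerpt with Titanic-style column names, read from an in-memory string so the sketch runs without the real file.

```python
import io
import pandas as pd

# Hypothetical four-row excerpt using Titanic-style column names.
csv_text = """PassengerId,Survived,Pclass,Name,Sex,Age,Fare
1,0,3,Braund,male,22,7.25
2,1,1,Cumings,female,38,71.2833
3,1,3,Heikkinen,female,26,7.925
4,1,1,Futrelle,female,35,53.1
"""

# read_csv accepts a path or any file-like object; for .xls/.xlsx files
# pd.read_excel plays the same role (it needs an Excel engine installed).
df = pd.read_csv(io.StringIO(csv_text))

# Indexing and selecting
ages = df["Age"]                                            # one column as a Series
first_two = df.iloc[:2]                                     # rows by position
first_class = df.loc[df["Pclass"] == 1, ["Name", "Fare"]]   # boolean mask + column labels

# Sorting
by_fare = df.sort_values("Fare", ascending=False)

# Describing attributes and checking data types
summary = df.describe()   # count, mean, std, min, quartiles, max of numeric columns
dtypes = df.dtypes

print(first_class)
print(by_fare[["Name", "Fare"]])
print(summary)
print(dtypes)
```

With the real dataset, the `io.StringIO(csv_text)` argument would be replaced by the path to the downloaded file.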
1. List the features and their types (e.g., numeric, nominal)
available in the dataset.
2. Create a histogram for each feature in the
dataset to illustrate the feature distributions.
15. Use the dataset 'titanic'. The dataset contains 891 rows and contains
information about the passengers who boarded the unfortunate Titanic
ship. Use the Seaborn library to see if we can find any patterns in the data.
16. Use the built-in dataset 'titanic'. The dataset contains 891 rows and
contains information about the passengers who boarded the unfortunate
Titanic ship. Write code to check how the price of the ticket (column
name: 'fare') for each passenger is distributed by plotting a histogram.
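A histogram of fares might be drawn along these lines. The sketch uses matplotlib on a short hypothetical list of fares so that it runs offline; with Seaborn, `sns.load_dataset("titanic")` (which downloads the data) followed by `sns.histplot(data=df, x="fare")` would be the closer match to the problem statement.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt

# Hypothetical fares standing in for the 'fare' column of the dataset.
fares = [7.25, 71.28, 7.93, 53.10, 8.05, 8.46, 51.86, 21.08, 11.13, 30.07]

fig, ax = plt.subplots()
# hist returns the count per bin and the bin edges, which is handy for checking the plot
counts, bin_edges, _ = ax.hist(fares, bins=5, edgecolor="black")
ax.set_xlabel("fare")
ax.set_ylabel("number of passengers")
ax.set_title("Distribution of ticket fares")

# Call plt.show() (interactive backend) or fig.savefig("fare_hist.png") to view the figure.
print(counts)
print(bin_edges)
```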
17. Compute Accuracy, Error rate, Precision, and Recall for the following
confusion matrix (use the formula for each).
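The four measures follow directly from the cells of a two-class confusion matrix. The sketch below writes out each formula; the counts passed in at the end are hypothetical placeholders, not the values of the matrix referred to in the problem.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Return accuracy, error rate, precision and recall for a 2x2 confusion matrix.

    tp: actual yes, predicted yes    fn: actual yes, predicted no
    fp: actual no,  predicted yes    tn: actual no,  predicted no
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,    # correct predictions / all predictions
        "error_rate": (fp + fn) / total,  # wrong predictions / all predictions (= 1 - accuracy)
        "precision": tp / (tp + fp),      # of the predicted positives, how many are truly positive
        "recall": tp / (tp + fn),         # of the actual positives, how many were found
    }

# Hypothetical counts, for illustration only.
metrics = confusion_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```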
19. Write a Python program to display basic statistical details such as
percentile, mean, and standard deviation (use Python and pandas
commands) for the species ‘Iris-setosa’, ‘Iris-versicolor’ and ‘Iris-virginica’
of the iris.csv dataset.
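Per-species statistics can be obtained by grouping on the species column. The following sketch assumes column names `sepal_length` and `species` and uses a six-row illustrative table in place of iris.csv; the real file may name its columns differently.

```python
import pandas as pd

# Illustrative stand-in for iris.csv; column names are assumptions.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "species": ["Iris-setosa", "Iris-setosa",
                "Iris-versicolor", "Iris-versicolor",
                "Iris-virginica", "Iris-virginica"],
})

# describe() reports count, mean, std, min, 25%/50%/75% percentiles and max per group
stats = iris.groupby("species")["sepal_length"].describe()

# Any other percentile can be requested explicitly, e.g. the 90th
p90 = iris.groupby("species")["sepal_length"].quantile(0.9)

print(stats)
print(p90)
```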
20. Write a program to cluster a set of points using K-means for the IRIS
dataset. Use K = 3 clusters and Euclidean distance as the
distance measure. Initialize each cluster mean to a randomly chosen data
point. Run at least 10 iterations. After the iterations are over, print the
final cluster means for each of the clusters.
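The procedure described above (random data points as initial means, Euclidean distance, a fixed number of iterations) can be sketched in plain Python as below. The function name `kmeans`, the `seed` parameter, and the six sample rows are assumptions made for illustration; the rows only mimic the four-measurement layout of the Iris data and would be replaced by the full dataset.

```python
import math
import random

def kmeans(points, k, iterations=10, seed=0):
    """Plain K-means: returns the k cluster means after a fixed number of iterations."""
    rng = random.Random(seed)
    # Initialise each cluster mean as a randomly chosen, distinct data point.
    means = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assignment step: put every point into the cluster of its nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, means[j]))
            clusters[nearest].append(p)
        # Update step: move each mean to the centroid of its members.
        for j, members in enumerate(clusters):
            if members:  # keep the old mean if a cluster ends up empty
                means[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return means

# Illustrative rows: (sepal length, sepal width, petal length, petal width)
points = [
    (5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2),
    (7.0, 3.2, 4.7, 1.4), (6.4, 3.2, 4.5, 1.5),
    (6.3, 3.3, 6.0, 2.5), (5.8, 2.7, 5.1, 1.9),
]

means = kmeans(points, k=3, iterations=10, seed=0)
for i, m in enumerate(means):
    print(f"cluster {i}: {[round(c, 3) for c in m]}")
```

Passing `k=4` to the same function covers the K = 4 variant of the problem. Because the initial means are random, different seeds can lead to different final means.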
21. Write a program to cluster a set of points using K-means for the IRIS
dataset. Use K = 4 clusters and Euclidean distance as the
distance measure. Initialize each cluster mean to a randomly chosen data
point. Run at least 10 iterations. After the iterations are over, print the
final cluster means for each of the clusters.
22. Compute Accuracy, Error rate, Precision, and Recall for the following
confusion matrix.
Actual class \ Predicted class | cancer = yes | cancer = no | Total
23. With reference to Table , obtain the frequency table for the
attribute age. From the frequency table you have obtained, calculate the
information gain obtained by splitting on age. (Use step-by-step
Python/Pandas commands.)
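The steps could look roughly like this: build the frequency table with `pd.crosstab`, compute the entropy of the class column, then subtract the weighted entropy of each age group. The 14-row table, its column names `age` and `label`, and the helper `entropy` are hypothetical stand-ins, since the table referred to in the problem is not reproduced here.

```python
import math
import pandas as pd

# Hypothetical data: age group and a yes/no class label.
df = pd.DataFrame({
    "age": ["youth"] * 5 + ["middle"] * 4 + ["senior"] * 5,
    "label": ["yes", "yes", "no", "no", "no"]
             + ["yes"] * 4
             + ["yes", "yes", "yes", "no", "no"],
})

# Step 1: frequency table of class label per age group
freq = pd.crosstab(df["age"], df["label"])

def entropy(counts):
    """Shannon entropy (in bits) of a collection of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Step 2: entropy of the whole data set before splitting
parent_entropy = entropy(df["label"].value_counts())

# Step 3: entropy after splitting on age, weighted by group size
weighted_entropy = sum(
    (row.sum() / len(df)) * entropy(row) for _, row in freq.iterrows()
)

# Step 4: information gain = entropy before split - weighted entropy after split
gain = parent_entropy - weighted_entropy

print(freq)
print(f"parent entropy: {parent_entropy:.4f}")
print(f"weighted entropy after split: {weighted_entropy:.4f}")
print(f"information gain: {gain:.4f}")
```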
24. Perform the following operations using Python on a suitable data set:
count the unique values of the data, report the format of each column,
convert a variable's data type (e.g. from long to short, or vice versa),
identify missing values, and fill in the missing values.
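One way these operations might look in pandas is shown below. The small table and its column names (`state`, `doses`, `sites`) are hypothetical, and filling with the column mean or a placeholder string is only one of several reasonable strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical table with one missing value in each of two columns.
df = pd.DataFrame({
    "state": ["Kerala", "Goa", "Kerala", None],
    "doses": [120.0, np.nan, 80.0, 100.0],
    "sites": [3, 5, 4, 6],
})

# Counting unique values (missing values are not counted)
unique_counts = df.nunique()
state_counts = df["state"].value_counts()

# Format (dtype) of each column
formats = df.dtypes

# Converting a data type: 64-bit integer ("long") to 16-bit integer ("short")
df["sites"] = df["sites"].astype("int16")

# Identifying missing values: number of missing entries per column
missing = df.isna().sum()

# Filling missing values: column mean for numbers, a placeholder for text
df["doses"] = df["doses"].fillna(df["doses"].mean())
df["state"] = df["state"].fillna("Unknown")

print(unique_counts)
print(state_counts)
print(formats)
print(missing)
print(df)
```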
25. Perform data cleaning and data transformation using Python on any data
set.
https://www.kaggle.com/sudalairajkumar/covid19-in-india?select=covid_vaccine_statewise.csv