
TTDS Assignment 1 Fall 2023

The document provides instructions for 4 questions as part of an assignment on data science tools and techniques. Question 1 asks students to conduct a chi-square test to determine if there is independence between gender and soda brand preference based on survey results. Question 2 asks students to apply smoothing techniques to pre-process text data. Question 3 provides a small credit card approval dataset and asks students to construct a decision tree to predict approvals using information gain. Question 4 asks students to calculate the probability of approval according to Naive Bayes using the data from Question 3.

Uploaded by

hsaleem

NED UNIVERSITY OF ENGINEERING & TECHNOLOGY

MS – DATA SCIENCE
FALL 2023
TOOLS AND TECHNIQUES FOR DATA SCIENCE (CT-583)
Assignment 1
Max Marks: 10
Due Date: 25-Nov-2023

Question 1 [2.5 marks]

Suppose you are conducting a study to determine whether there is an association between gender and
preference for a particular brand of soda. You survey 100 people and ask them whether they prefer Brand
A or Brand B, as well as their gender. The results are as follows:
          Brand A   Brand B   Total
Male         30        20       50
Female       25        25       50
Total        55        45      100
Using the chi-square method, test the hypothesis that gender and brand preference are independent. Use a
significance level of 0.05.
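
The computation behind the test can be sketched as follows. This is a minimal illustration in plain Python, assuming the Pearson chi-square statistic without Yates' continuity correction (the assignment does not say whether a correction is expected). The variable names are illustrative, and the critical value 3.841 is the standard table value for 1 degree of freedom at the 0.05 level.

```python
# Observed counts from the survey table
# (rows: Male, Female; columns: Brand A, Brand B).
observed = [[30, 20],
            [25, 25]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: (row total * column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic (ASSUMPTION: no continuity correction).
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# Standard table value for df = 1 at the 0.05 significance level.
CRITICAL_VALUE = 3.841

reject_independence = chi_square > CRITICAL_VALUE
print(f"chi-square = {chi_square:.4f}, df = {degrees_of_freedom}")
print("reject H0" if reject_independence else "fail to reject H0")
```

Under these assumptions the statistic is well below the critical value, so the hypothesis of independence would not be rejected.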

Question 2 [2.5 marks]


Apply equi-depth (equal-frequency) binning, smoothing by bin means and by bin boundaries, to
pre-process the following data:

T, O, U, L, S, T, E, C, H, N, I, Q
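
One possible way to carry out the two smoothing methods is sketched below. The question gives neither a numeric encoding for the letters nor the number of bins, so both are assumptions here: each letter is mapped to its alphabet position (A = 1, ..., Z = 26), and the 12 sorted values are split into 3 bins of depth 4. Ties in boundary smoothing go to the lower boundary, which is also just a convention chosen for this sketch.

```python
letters = ["T", "O", "U", "L", "S", "T", "E", "C", "H", "N", "I", "Q"]

# ASSUMPTION: encode each letter by its position in the alphabet (A = 1).
values = sorted(ord(ch) - ord("A") + 1 for ch in letters)

# ASSUMPTION: 3 bins; the question does not fix the bin count.
NUM_BINS = 3
depth = len(values) // NUM_BINS  # equi-depth: same number of values per bin
bins = [values[i:i + depth] for i in range(0, len(values), depth)]


def smooth_by_means(bin_values):
    """Replace every value in the bin with the bin mean."""
    mean = sum(bin_values) / len(bin_values)
    return [mean] * len(bin_values)


def smooth_by_boundaries(bin_values):
    """Replace every value with the closer of the bin's min and max.

    Ties go to the lower boundary (a convention assumed for this sketch).
    """
    low, high = bin_values[0], bin_values[-1]
    return [low if v - low <= high - v else high for v in bin_values]


by_means = [smooth_by_means(b) for b in bins]
by_boundaries = [smooth_by_boundaries(b) for b in bins]

for original, means, bounds in zip(bins, by_means, by_boundaries):
    print(original, "-> means:", means, "| boundaries:", bounds)
```

A different encoding or bin count would change the numbers but not the procedure: sort, partition into equal-sized bins, then smooth within each bin.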

Question 3 [2.5 marks]


Suppose we have a dataset of 100 people who applied for a credit card. Each person is described by
four features: age (in years), income, employment status, and credit score. The target variable is
whether or not the person was approved for a credit card (Y for approved, N for not approved). Here's a
small subset of the data:
Age     Income   Employment   Credit Score   Approved
18-23   Low      No           Max            Y
24-35   Medium   Yes          Max            Y
36-50   High     Yes          Min            N
18-23   Low      Yes          Max            Y
24-35   Medium   No           Min            N

Construct a decision tree that predicts whether or not a person will be approved for a credit card using
information gain to determine which feature to split on at each node.
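
The root split can be chosen by computing the information gain of each feature over the five rows shown. The sketch below is a plain-Python illustration using base-2 entropy; the function and variable names are illustrative and not part of the assignment.

```python
from collections import Counter
from math import log2

FEATURES = ["Age", "Income", "Employment", "Credit Score"]

# The five rows from the table; the last field is the Approved label.
ROWS = [
    ("18-23", "Low", "No", "Max", "Y"),
    ("24-35", "Medium", "Yes", "Max", "Y"),
    ("36-50", "High", "Yes", "Min", "N"),
    ("18-23", "Low", "Yes", "Max", "Y"),
    ("24-35", "Medium", "No", "Min", "N"),
]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature_index):
    """Parent entropy minus the weighted entropy of the feature's partitions."""
    labels = [row[-1] for row in rows]
    partitions = {}
    for row in rows:
        partitions.setdefault(row[feature_index], []).append(row[-1])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


gains = {name: information_gain(ROWS, i) for i, name in enumerate(FEATURES)}
best_feature = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("split on:", best_feature)
```

On these five rows Credit Score separates the classes completely (Max gives Y, Min gives N), so a tree built this way needs only that one split.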

Question 4 [2.5 marks]


Using Naive Bayes on the scenario of Question 3, what is the probability that a person will be approved
for a credit card?
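
A minimal sketch of the Naive Bayes calculation on the five rows of Question 3 follows. The question does not specify which applicant to classify, so the query below (Age 18-23, Income Low, Employment Yes, Credit Score Max) is a hypothetical example. The sketch also assumes plain relative-frequency estimates with no Laplace smoothing.

```python
# The five rows from Question 3; the last field is the Approved label.
ROWS = [
    ("18-23", "Low", "No", "Max", "Y"),
    ("24-35", "Medium", "Yes", "Max", "Y"),
    ("36-50", "High", "Yes", "Min", "N"),
    ("18-23", "Low", "Yes", "Max", "Y"),
    ("24-35", "Medium", "No", "Min", "N"),
]


def naive_bayes_posterior(rows, query):
    """Return P(label | query), assuming features are independent given the label.

    ASSUMPTION: unsmoothed relative frequencies, so an unseen value
    gives a class a likelihood of zero.
    """
    scores = {}
    for label in sorted({row[-1] for row in rows}):
        class_rows = [row for row in rows if row[-1] == label]
        score = len(class_rows) / len(rows)  # prior P(label)
        for i, value in enumerate(query):
            matches = sum(1 for row in class_rows if row[i] == value)
            score *= matches / len(class_rows)  # P(feature_i = value | label)
        scores[label] = score
    total = sum(scores.values())
    if total == 0:
        raise ValueError("every class has zero likelihood; smoothing is needed")
    return {label: score / total for label, score in scores.items()}


# HYPOTHETICAL query: Age, Income, Employment, Credit Score.
query = ("18-23", "Low", "Yes", "Max")
posterior = naive_bayes_posterior(ROWS, query)
print(posterior)
```

Because Credit Score perfectly separates the two classes in this subset, one class always receives a zero likelihood without smoothing, so the unsmoothed posterior is either 0 or 1 for any query whose values appear in the table.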
