Qns Exam2
Qn 1
The marketing manager at Tom Thumb wants to investigate the sales data of the new products in its
branches during the last six months. She is interested in finding the important factors behind sales
units in order to optimize the store’s revenue. She collected the following data for each product ID.
Tom Thumb hires you as a data scientist to analyze this sales data set. When you open the data set,
you see the following structure:
Product ID | Sales unit (I) | Price (II)     | Month (III) | Likability (IV)
1          | High           | Regular price  | January     | 4.2
2          | High           | Promoted price | January     | 4.7
3          | Low            | Regular price  | March       | 3.1
4          | Average        | Promoted price | February    | 3.7
⋮          | ⋮              | ⋮              | ⋮           | ⋮
Now, which type of model is best suited to analyze sales unit as a function of price? Recall that
sales unit is either high, average, or low.
• Linear regression.
• Binary logit.
• Multinomial logit.
• A and C.
Qn 2
After receiving the above data set, you decide to talk to the IT group to see whether you can obtain more
data on sales in Tom Thumb’s branches during the last six months. The IT group provides you with the
following table:
Product ID | Sales unit (I) | Price (II) | Month (III) | Likability (IV)
The IT group tells you that High and Low sales mean that the sales unit is greater than 4000 units and
less than 2000 units, respectively; otherwise, the sale is coded as Average. The new table gives the
absolute sales units. They also tell you that Price is coded as Regular price when there is no discount
and as Promoted price otherwise. Unfortunately, they do not have the absolute prices due to a technical
issue. The Tom Thumb manager wants you to predict how many units of a product they can sell if they
always use Promoted prices. Now, which type of model can be considered to answer the Tom Thumb
manager's request?
• Linear regression.
• Multinomial Probit.
• Multinomial logit.
• A and C.
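The coding rule described by the IT group in Qn 2 can be sketched as follows. This is only an illustration of the thresholds stated in the question; the function name `code_sales_unit` and the example unit counts are hypothetical and do not come from the Tom Thumb data.

```python
def code_sales_unit(units: int) -> str:
    """Map absolute sales units to the three categories used in the first table.

    Thresholds follow the question text: High is strictly greater than 4000,
    Low is strictly less than 2000, everything else is Average.
    """
    if units > 4000:
        return "High"
    if units < 2000:
        return "Low"
    return "Average"


# Hypothetical unit counts, chosen to include both boundary values.
examples = [4500, 4000, 2000, 1500]
labels = [code_sales_unit(u) for u in examples]
print(labels)
```

Note that the boundary values 4000 and 2000 themselves fall into Average under this reading, because the question uses strict inequalities.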
Qn 3
An advertising firm wants to understand the relationship between the number of clicks on an online
advertising banner and the number of times the ad has been loaded on pages. Now, which types of model
can be considered to answer the above ___ issue?
• Linear Regression
• Binary Probit
• Binary Logit
• Ordered Logit
Qn 4
The above advertising firm also wants to understand which of two online advertising banners a subject
will click, based on the banners' different designs. Now, which type of model is best suited to answer
the above ___ issue?
• Linear Regression
• Binary Probit
• Binary Logit
• B and C
Qn 5
The above advertising firm wants to understand which of several online advertising banners a subject
will click, based on their different designs and sizes. The marketing team considered two types of
design (Normal vs. Fancy) and two types of size (Small vs. Large). In total, the marketing team provides
2 × 2 = 4 advertising banners. After collecting the data, which type of model can be used to analyze
the data?
• Linear Regression
• Multinomial Probit
• Multinomial Logit
• B and C
Qn 6
Amazon wants to know the effect of the observed average price of product i on customers' ratings of
it. On the Amazon website, each product can be rated 1, 2, 3, 4, or 5 stars. If Amazon wants to study
$\mathrm{Rating}_i = f(\mathrm{price}_i) + e$,
which type of model is most appropriate?
• Multinomial Probit
• Ordered Probit
• Multinomial Logit
• Linear Regression
Qn 7
Tax fraud is an important economic issue in every country. The IRS (the US tax authority) wants to
predict US income tax revenue over the next ten years based on its database of people's incomes in past
years. Clearly, every citizen prefers to pay lower income tax. If the IRS wants to study
where i denotes person i in the income database, which type of model is more appropriate to predict
US income tax revenue in the following years?
• Multinomial Probit, because the IIA property is violated: high-income people are similar to each
other.
• Tobit Regression
• Truncated Linear regression
• Truncated Linear regression wherein $\mathrm{income}_i^2$ must be included to capture the non-linear
trend, i.e., high-income people want to pay less tax
Qn 8
The Chicago police want to understand the important factors behind a high crime rate in a neighborhood
based on its demographics. They decide to use their past database, which records all crimes committed
in a neighborhood by adults over 18 years old, according to US criminal law. If the Chicago police
want to study
where i denotes neighborhood i in the crime database, which type of model is more appropriate to
predict the crime rate in Chicago?
• Linear regression
• Tobit Regression
• Truncated Regression
• Heckman Two-Steps Selection Model.
Qn 9
A large US bank is interested in predicting the loan default rate. A loan default means that a customer
decided not to pay back the full amount of the loan on time. The bank decides to use its database on
loan status over the past 20 years. It observes a dependent variable Default, which is 0 if the subject
did not pay back on time. Otherwise, it denotes the amount of the loan that was paid back. Let's
consider the following systematic relationship
where i denotes customer i in the database. Which type of model is more appropriate to predict
Default in the future?
• Truncated regression
• Tobit Regression
• Heckman Two-Steps Selection Model
• B and C
Qn 10
What is the definition of endogeneity in a regression model $Y = \beta_1 X_1 + \dots + \beta_p X_p + \varepsilon$?
• There is an unobserved variable Z which is correlated with at least one of our predictors $X_j$
• There is an unobserved variable Z which is correlated with all of our predictors $X_1, \dots, X_p$
• There exists an omitted variable Z which is correlated with Y
• There exists a very high level of correlation between at least two predictors $X_j$ and $X_{j'}$ such
that $\mathrm{corr}(X_j, X_{j'}) > 0.9$
Qn 11
A retailer wants to understand the relationship between the observed price in store and customers’
decision to “Buy” or “Not to Buy”. Using its customer dataset, the retailer ran a logistic analysis as
follows.
Qn 12
A retailer wants to understand the relationship between the observed price in store and customers’
decision to buy or not to buy. Using its customer dataset, the retailer ran a logistic analysis as
follows.
where $\Phi_\varepsilon$ denotes the cumulative distribution function of the normal distribution. The
estimation result is provided in the table below.
Qn 13
Let’s consider the following table. This table shows the result of a logistic model that was estimated
on a conjoint study about car design. In this study, designers show different combinations of cars based
on the following attributes: the number of seats, the cargo space, the engine, and the price of the car.
Moreover, the designers ask about the marital status of each participant. You can treat “marital status”
as a dummy variable that is 0 if the participant is married. Otherwise, “marital status” is 1, i.e., the
participant is either single, divorced, or separated. In the following table, you can assume that a car
with 2 seats, 2ft cargo, and a Gas engine has been chosen as the reference category for the categorical
variables seat, cargo, and engine, respectively. Precisely, the researchers estimated the following latent
utility model based on the multinomial logistic approach:
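$$
\begin{aligned}
U_{ij} ={}& \beta\,\mathrm{price}_j + \beta_{\text{4 seats}} I_{\text{4 seats}} + \beta_{\text{6 seats}} I_{\text{6 seats}} + \beta_{\text{3ft cargo}} I_{\text{3ft cargo}} + \beta_{\text{electrical}} I_{\text{electrical}} + \beta_{\text{hybrid}} I_{\text{hybrid}} \\
&+ I_{\text{marital status}} \times \left( \beta'\,\mathrm{price}_j + \beta'_{\text{4 seats}} I_{\text{4 seats}} + \beta'_{\text{6 seats}} I_{\text{6 seats}} + \beta'_{\text{3ft cargo}} I_{\text{3ft cargo}} + \beta'_{\text{electrical}} I_{\text{electrical}} + \beta'_{\text{hybrid}} I_{\text{hybrid}} \right) + \varepsilon
\end{aligned}
$$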
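To make the structure of this utility model concrete, here is a minimal Python sketch of its systematic part (the error term ε is left out). All coefficient values below are hypothetical placeholders chosen for illustration; they are not the estimates from the exam's table, and the function name `systematic_utility` is likewise an assumption.

```python
def systematic_utility(price, seats, cargo_ft, engine, marital_status, beta, beta_prime):
    """Systematic part of U_ij; marital_status is 0 if married, 1 otherwise."""
    # Attribute vector: price plus dummies relative to the reference car
    # (2 seats, 2ft cargo, Gas engine), as stated in the question.
    x = {
        "price": float(price),
        "4 seats": 1.0 if seats == 4 else 0.0,
        "6 seats": 1.0 if seats == 6 else 0.0,
        "3ft cargo": 1.0 if cargo_ft == 3 else 0.0,
        "electrical": 1.0 if engine == "electrical" else 0.0,
        "hybrid": 1.0 if engine == "hybrid" else 0.0,
    }
    main = sum(beta[k] * x[k] for k in x)
    # The primed coefficients only enter when marital_status equals 1.
    interaction = sum(beta_prime[k] * x[k] for k in x)
    return main + marital_status * interaction


# Hypothetical placeholder coefficients, NOT taken from the exam's table.
beta = {"price": -0.05, "4 seats": 0.8, "6 seats": 1.2,
        "3ft cargo": 0.3, "electrical": 0.1, "hybrid": 0.4}
beta_prime = {"price": -0.01, "4 seats": -0.5, "6 seats": -0.9,
              "3ft cargo": -0.1, "electrical": 0.2, "hybrid": 0.1}

u_married = systematic_utility(20, 4, 3, "hybrid", 0, beta, beta_prime)
u_unmarried = systematic_utility(20, 4, 3, "hybrid", 1, beta, beta_prime)
print(u_married, u_unmarried)
```

With the dummy coded this way, an unprimed coefficient describes married participants, and the sum of a coefficient and its primed counterpart describes unmarried participants.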
If the price of a car goes up by 20K in the market, based on the above table, which of the following
statements is true?
Qn 14
Based on the above table, which type of car is the most preferred choice for unmarried people?
Qn 15
Based on the above table, which type of car is the most preferred choice for married people?
Qn 16
• Married people receive a higher utility from a greater number of seats than unmarried
people.
• Married people receive a higher utility from a smaller number of seats than unmarried
people.
• Married people receive a higher utility from a greater number of seats than unmarried
people if the car has a hybrid engine.
• Since we only observe people's choices of alternatives and do not observe their utility,
none of the above can be chosen.
Qn 17
• Married people receive a higher utility from smaller cargo space than unmarried people.
• Married people receive a higher utility from larger cargo space than unmarried people.
• Married people receive a higher utility from larger cargo space than unmarried people if the
car has an electrical engine.
• Since we only observe people's choices of alternatives and do not observe their utility,
none of the above can be chosen.
Qn 18
We know that the above conjoint analysis was conducted in the southern states of the USA. According to
US culture, southern people marry at an early age. Based on the above table and questions 13-17, which
of the following statements is true?
• The manufacturer will maximize its market share if it provides only a car that has all desired
features for married people.
• The manufacturer will maximize its market share if it provides only a car that has all desired
features for unmarried people.
• The manufacturer will maximize its market share if it provides two cars to target all customers
(married and unmarried) by providing all desired features for each group of people.
• None of the above
Qn 19
ROC Model | Area (AUC) | Standard Error | 95% Wald confidence limits
• The Price of Huggies is significantly the best predictor of consumers’ choice in the diaper
product category.
• The Price of Huggies and the Display of Pampers are significantly the best predictors of
consumers’ choice in the diaper product category.
• The Price of Huggies should be better, on average, than the Display of Pampers at predicting
consumers’ choice in the diaper product category, since its AUC is larger.
• B and C
Qn 20
Let’s assume that you have evidence of an asymmetric switching pattern among alternatives in your data
set, i.e., there is ___ against the IIA property. What type of model should be used to capture the
asymmetric switching pattern that violates IIA?
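As a reminder of what the IIA property states, the following uses the standard multinomial logit choice probabilities. The notation $V_j$ for the systematic utility of alternative $j$ and $P(j)$ for its choice probability is an assumption introduced here and does not appear in the exam.

```latex
P(j) = \frac{e^{V_j}}{\sum_{m} e^{V_m}},
\qquad
\frac{P(j)}{P(k)} = \frac{e^{V_j}}{e^{V_k}} = e^{V_j - V_k}
```

The ratio of the choice probabilities of any two alternatives depends only on those two alternatives, so adding, removing, or changing a third alternative draws share from $j$ and $k$ proportionally. An asymmetric switching pattern in the data contradicts this proportional substitution.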