Questions

These are questions to judiciary

Uploaded by

ashutosh
Copyright
© All Rights Reserved

The next 8 questions ask you to analyze a dataset of student performance for 1,000 Freshmen
(1st-year students) at a specific SDUSD high school, where performance is measured across 11
courses taken by those students in the 2023-2024 school year.

Please use the scoresdat.csv dataset.

Lily is the district-wide Manager of Student Achievement and also your direct boss. She
would like to cluster students into cohorts in order to tailor the academic schedules
separately for each cohort.

However, Lily does not want to simply average the 11 test scores together, as certain
scores are likely highly correlated (eg, Math and Physics, English and Writing) and so a
simple average doesn't feel like the right approach to her. She is also wary that using 11
test scores in a clustering algorithm would mean that the algorithm operates in
11-dimensional space, which to her feels high.

You advise her that one option could be to use Principal Components Analysis on the set of
11 test scores to see if there is a lower-dimensional representation of the data that would
suffice.

Q7) Run PCA on the 11 test scores. How many principal components are required to
represent 90% of the variance in the data?
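
As a rough illustration of the mechanics behind Q7, the sketch below runs PCA with base R's prcomp() and reads off the cumulative share of variance. It uses simulated scores rather than scoresdat.csv, so the column names, the latent-skill structure, and the printed numbers are all assumptions and say nothing about the real data. Standardizing the scores (scale. = TRUE) is also a choice made here, not something the question specifies.

```r
# Simulated stand-in for scoresdat.csv: 1,000 students x 11 course scores.
# Column names and the correlation structure are made up for illustration.
set.seed(1)
n <- 1000
ability  <- matrix(rnorm(n * 3), n, 3)          # 3 hypothetical latent skills
loadings <- matrix(runif(3 * 11), 3, 11)        # how each course draws on them
scores   <- ability %*% loadings + matrix(rnorm(n * 11, sd = 0.5), n, 11)
colnames(scores) <- paste0("course", 1:11)

# PCA on centered, standardized scores (standardizing is an assumption)
pca <- prcomp(scores, center = TRUE, scale. = TRUE)

# Share of variance carried by each component, and the running total
var_share <- pca$sdev^2 / sum(pca$sdev^2)
cum_share <- cumsum(var_share)

# Smallest number of components whose cumulative share reaches 90%
n_pc_90 <- which(cum_share >= 0.90)[1]
print(round(cum_share, 3))
print(n_pc_90)
```

With the real data, the scores matrix would be replaced by the 11 score columns read from scoresdat.csv.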

Q8) Lily thanks you for your excellent work and asks you to use the k-means algorithm to
cluster the students, using the first 3 principal components as the "input data" to the
algorithm.

However, she is unsure of how many clusters might exist in this student population. Using
the "Elbow Method" with a Scree Plot, how many clusters best represent these data?
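
One common way to build the elbow plot is to fit k-means for a range of K and plot the total within-cluster sum of squares against K. The sketch below does this on simulated three-column data standing in for the first 3 principal components. The three made-up groups, the grid 1:10, and the nstart and iter.max settings are all assumptions.

```r
set.seed(2)
# Simulated stand-in for the first 3 principal components (3 made-up groups)
true_centers <- rbind(c(-3, 0, 0), c(3, 0, 0), c(0, 4, 0))
pcs <- true_centers[sample(1:3, 600, replace = TRUE), ] +
  matrix(rnorm(600 * 3), 600, 3)
colnames(pcs) <- c("PC1", "PC2", "PC3")

# Fit k-means for K = 1..10 and record the total within-cluster sum of squares
k_grid <- 1:10
wss <- sapply(k_grid, function(k) {
  kmeans(pcs, centers = k, nstart = 25, iter.max = 50)$tot.withinss
})

# The "elbow" is the K after which adding clusters stops helping much
plot(k_grid, wss, type = "b",
     xlab = "Number of clusters K",
     ylab = "Total within-cluster sum of squares")
```

Reading the elbow off the plot is a judgment call; the curve for the real data may be less clear-cut than for this simulated example.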

Q9) For the number of clusters selected in the last question, what is the total within-cluster
sum of squares?
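
For reference, kmeans() in base R reports this quantity as tot.withinss. The sketch below, on simulated two-group data (an assumption, not the student data), recomputes it by hand as the sum of squared distances from each point to its assigned cluster center.

```r
set.seed(3)
# Simulated two-group data; not the student scores
x <- rbind(matrix(rnorm(200, mean = 0), 100, 2),
           matrix(rnorm(200, mean = 5), 100, 2))
km <- kmeans(x, centers = 2, nstart = 25)

# By hand: squared distance of every point to the center of its own cluster
manual_wss <- sum((x - km$centers[km$cluster, ])^2)

print(km$tot.withinss)    # value reported by kmeans()
print(manual_wss)         # same value, recomputed
print(sum(km$withinss))   # also the sum of the per-cluster values
```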

Q10) Lily would like to visualize the result. Add the cluster assignment as a categorical (i.e.,
"factor") variable to the original dataset. Create a scatterplot of mathematics scores on the
horizontal axis and English scores on the vertical axis. Color the points according to their
cluster.

After looking at your plots, Lily remarks that some of the colors overlap, i.e., that there is not
a "hard boundary" between clusters. When she was a student and took MGT 100, the
in-class example had "hard boundaries" without overlap. She feels like something is
wrong. You assure her that everything is correct. Select the reason(s) that explain why this
"color overlap" happens in this particular case.
a) K is larger here than the in-class example
b) K-means was fit on 3 variables rather than 2
c) Students generally scored lower in Mathematics than in English
d) The plot uses "original" variables whereas K-means was fit on Principal Components
e) Student test-score data is fundamentally different than smartphone ownership data
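
The plotting steps described in Q10 can be sketched in base R as follows. The data frame is simulated, and the column names (math, physics, english, writing) and the choice of K = 3 are placeholders, not taken from scoresdat.csv or from the earlier answers.

```r
set.seed(4)
n <- 300
# Simulated stand-in for the original dataset; column names are hypothetical
skill <- rnorm(n)
dat <- data.frame(
  math    = 70 + 10 * skill + rnorm(n, sd = 5),
  physics = 68 + 10 * skill + rnorm(n, sd = 5),
  english = 75 + 8 * rnorm(n),
  writing = 74 + 8 * rnorm(n)
)

# Cluster on the first 3 principal components (K = 3 is an arbitrary choice)
pca <- prcomp(dat, center = TRUE, scale. = TRUE)
km  <- kmeans(pca$x[, 1:3], centers = 3, nstart = 25)

# Attach the cluster assignment to the original data as a factor
dat$cluster <- factor(km$cluster)

# Scatterplot on the original variables, colored by cluster
plot(dat$math, dat$english,
     col = as.integer(dat$cluster), pch = 19,
     xlab = "Mathematics score", ylab = "English score")
legend("topleft", legend = levels(dat$cluster),
       col = seq_along(levels(dat$cluster)), pch = 19, title = "Cluster")
```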

Question 11
Satisfied with your work, Lily takes your result to a Senior Councilwoman for
SDUSD, Xiaotong, in order to propose amending next year's academic schedule to
accommodate the K clusters you identified above. Xiaotong is curious about the
methods used.

Lily explains that "PCA and K-Means are both unsupervised algorithms that look
for structure in data." Is Lily's statement correct?

Yes, correct

No, incorrect

Question 12

Lily also mentions that "segmenting students into cohorts is both an art and a
science". Is Lily's statement correct?

Yes, correct

No, incorrect

Question 13

Xiaotong, being quite informed of advanced analytic methods, inquires about the
certainty of the result. Lily assures her that "K-means is guaranteed to find the
global minimum value for the within-cluster sum of squares." Is Lily's statement
correct?

Yes, correct

No, incorrect
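
One way to examine a claim like this empirically is to refit k-means many times from different random starting centers and compare the resulting total within-cluster sums of squares. The sketch below does so on simulated data; the four made-up groups, K = 4, and the 50 repetitions are assumptions. Whether the values differ from run to run depends on the data and the seed, so no particular output is claimed.

```r
set.seed(5)
# Simulated data: four made-up groups at the corners of a square
corners <- rbind(c(0, 0), c(0, 6), c(6, 0), c(6, 6))
x <- corners[rep(1:4, each = 50), ] + matrix(rnorm(400), 200, 2)

# 50 independent fits, each from a single random start
wss_single <- sapply(1:50, function(i) {
  kmeans(x, centers = 4, nstart = 1, iter.max = 50)$tot.withinss
})

# If every start reached the same solution, min and max would coincide
print(range(wss_single))
print(table(round(wss_single, 2)))
```

The nstart argument of kmeans() automates this kind of repetition by keeping the best of several random starts.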

Question 14

Xiaotong is persuaded by your results, as presented by Lily, and agrees to
re-work the 2024-2025 academic schedule for the Sophomore, Junior, and Senior
(i.e., 2nd-year, 3rd-year, and 4th-year) students at this high school to accommodate
the K cohorts of students that you have presented. However, Xiaotong is
uncertain what to do with the incoming Freshman students.

Explain both (1) how you, Lily, and Xiaotong can assign Sophomore, Junior, and
Senior students into the identified cohorts, and (2) why you will have difficulty
assigning incoming Freshman students into the identified cohorts.
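
One possible mechanic for placing a student into an existing cohort is to project that student's scores onto the already-fitted principal components and pick the nearest cluster center. The sketch below illustrates this on simulated scores. The helper assign_cohort() and the column names are hypothetical, and K = 3 is arbitrary. Note that the helper needs all 11 course scores for every student it assigns.

```r
set.seed(6)
# Simulated stand-in for the 11 course scores (column names are hypothetical)
old_scores <- matrix(rnorm(500 * 11, mean = 75, sd = 10), 500, 11,
                     dimnames = list(NULL, paste0("course", 1:11)))
pca <- prcomp(old_scores, center = TRUE, scale. = TRUE)
km  <- kmeans(pca$x[, 1:3], centers = 3, nstart = 25, iter.max = 100)

# Hypothetical helper: project scores onto the fitted PCs,
# then return the index of the nearest cluster center for each row
assign_cohort <- function(new_scores, pca, km, n_pc = 3) {
  new_pcs <- predict(pca, newdata = new_scores)[, 1:n_pc, drop = FALSE]
  d2 <- sapply(seq_len(nrow(km$centers)), function(j) {
    rowSums(sweep(new_pcs, 2, km$centers[j, ])^2)
  })
  d2 <- matrix(d2, nrow = nrow(new_pcs))   # keep matrix shape for one-row input
  max.col(-d2, ties.method = "first")      # smallest squared distance wins
}

# Applied to the students used for fitting, and to a single student
refit <- assign_cohort(old_scores, pca, km)
one   <- assign_cohort(old_scores[1, , drop = FALSE], pca, km)
print(table(refit, km$cluster))
print(one)
```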

While you were working for the SDUSD, your friend Aslan got hired at Rivian, a relatively
new firm that manufactures high-end electric vehicles in the SUV and Truck categories.
You sign a non-disclosure agreement with Rivian, enabling you to work alongside Aslan.

The next 10 questions ask you to help Aslan analyze conjoint data from consumers making
vehicle choices on a survey.

Please use the conjointdat.RData dataset. Use load() to get the data in R. The data are
ready to be used with the mlogit() command (ie, I already did the dfidx / mlogit.data thing so
you don't need to do that).
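
Since the name of the object stored inside conjointdat.RData is not stated here, it can help to know that load() returns the names of the objects it restores. The sketch below shows this with a throwaway file so that it runs on its own; the toy data frame and its columns are made up and are not the conjoint data.

```r
# Write a throwaway .RData file so the sketch is self-contained;
# with the real data, `path` would be "conjointdat.RData".
path <- file.path(tempdir(), "example.RData")
toy  <- data.frame(choice = c(TRUE, FALSE), price = c(50, 60))  # made-up columns
save(toy, file = path)
rm(toy)

# load() restores the saved objects and invisibly returns their names
loaded_names <- load(path)
print(loaded_names)

# Fetch the restored object by name without knowing the name in advance
conjoint <- get(loaded_names[1])
str(conjoint)
```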

Q16

Q17
Q18

Q19

Q20

Q21
Q22
Q23

Q24

Q25
Q26

Inspired by using "customer" analytic techniques in examples including combating
homelessness in Utah, you take a job with San Diego's local government to study adoption
of "clean" energy products such as solar panels.

The next 4 questions ask you to analyze a dataset of first-time adopters of solar panels for
residential homes.

Please use the sundat.csv dataset, which has the following 2 variables:

● month - an integer counter of the first 20 months of solar panel sales in San
Diego
● NFTAs - the number of first-time adopters of solar panels (in 1,000's)

Q28
Q29

Q33
