Data Science - Test Module
Greetings, future data scientists and professionals eager to dive into the world of data science. Whether
you're here to upskill, gain insights into potential career pathways, or develop foundational expertise,
we're thrilled to have you join us.
Please Note: Before you begin to fill out this document, kindly make a copy and rename it to ‘Your
Name_DS Testing’, e.g., “Ainee_DS Testing”.
Learning Outcomes
By the end of this course, learners will be able to:
- Conduct sound data analysis.
- Describe a given data set and assess its quality.
- Understand issues in data collection.
- Build data pipelines (collection, cleaning, EDA, modeling, evaluation, results) for “repeatable”
work.
- Become well-versed in tools and technologies for data analysis (e.g., pandas, scikit-learn).
- Understand the theory behind drawing inferences from data.
- Communicate results effectively.
For testing purposes: We will be sharing videos for one lesson of module 1, one quiz, and one data
assignment with you.
As you go through the content, here are some friendly reminders:
- Follow the sequence of the course material as outlined in this document. Try not to skip any
sections.
- Remember, you do not have to go through all the material in one sitting. You have 2 weeks to go
through the material. The data assignment will take around 5-6 hours, so you can spread it
over the two weeks and complete it in chunks.
- Feel free to pause, rewind, and replay the videos as often as you need.
- Attempt the quiz at the end and try to answer it independently. While answer keys will be
provided, these questions are designed for practice purposes and will not be graded.
- If you have any comments or feedback on your working document that you would like the
LUMSx team to view, don’t hesitate to send it in an email to us.
Thank you!
Data Assignment:
Click here to attempt the Data assignment.
Quiz:
Select ONE correct answer for each multiple-choice question.
1. A restaurant hygiene inspector for a chain with multiple locations randomly selects some of the
locations for a cleanliness check of their kitchens. The inspector checks every kitchen in the
locations that were chosen. What type of sample is this?
a. Cluster sampling
b. Stratified sampling
c. Convenience sampling
2. You have a DataFrame called quizScores with column names "1", "2", and "3". The DataFrame
contains 10 rows. What will be the result of the following line of code:
quizScores[["1"]][1]
Which of the following will return the names (“Menu Item”), prices (“Price”), and calories
(“Calories”) of all items with price below 400 and calories below 500?
a. Incorrect: The line of code will return an error and not a DataFrame (see explanation of the
error below).
b. Incorrect: The line of code will return an error and not a Series (see explanation of the
error below).
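The error behind the quizScores line in question 2 can be reproduced with a minimal sketch. The data values below are invented for illustration; only the name quizScores, the column names "1", "2", "3", and the 10-row size come from the question. Under those assumptions, the double brackets return a one-column DataFrame, and the trailing [1] then asks that DataFrame for a column labelled with the integer 1, which does not exist.

```python
import pandas as pd

# Hypothetical stand-in for quizScores: 10 rows, string column names "1", "2", "3".
# The scores themselves are made up for this example.
quizScores = pd.DataFrame({
    "1": range(10, 20),
    "2": range(20, 30),
    "3": range(30, 40),
})

# Double brackets pass a LIST of column names, so the result is a DataFrame.
subset = quizScores[["1"]]
print(type(subset).__name__)

# Indexing a DataFrame with [1] looks for a COLUMN labelled 1 (an integer).
# The only column in `subset` is the string "1", so pandas raises KeyError.
try:
    subset[1]
    raised = False
except KeyError:
    raised = True
print(raised)

# For comparison: single brackets return a Series, and [1] then looks up
# the row with index label 1 (the second row under the default index).
second_score = quizScores["1"][1]
print(second_score)
```

If the intent is to read a single value, the more explicit forms quizScores.loc[1, "1"] (by label) or quizScores.iloc[1, 0] (by position) avoid this ambiguity.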
a. Incorrect: Since loc is inclusive for the end index when selecting a range, the given code
selects columns "Menu Item" through "Review" for items meeting the criteria, but it includes
"Review", which wasn't asked for.
b. Correct: Since loc is inclusive for the end index when selecting a range, the given code
selects columns "Menu Item" through "Calories" for items meeting the criteria, fulfilling the
requirements.
c. Incorrect: Since iloc is exclusive for the end index when selecting a range, the given code
selects columns "Menu Item" through "Price" for items meeting the criteria, but does not
include "Calories", which was required.
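The difference between loc and iloc slicing described in these explanations can be sketched as follows. The menu table, its values, and the column order ("Menu Item", "Price", "Calories", "Review") are assumptions made for illustration; the original answer options are not reproduced here, so this is not necessarily the exact code from the quiz.

```python
import pandas as pd

# Hypothetical menu table; values and column order are assumed for this sketch.
menu = pd.DataFrame({
    "Menu Item": ["Soup", "Burger", "Salad", "Steak"],
    "Price": [250, 450, 300, 900],
    "Calories": [180, 700, 350, 800],
    "Review": ["good", "great", "ok", "great"],
})

# Boolean mask: price below 400 AND calories below 500.
cheap_and_light = (menu["Price"] < 400) & (menu["Calories"] < 500)

# loc slices by label and INCLUDES the end label, so "Calories" is kept.
by_label = menu.loc[cheap_and_light, "Menu Item":"Calories"]

# iloc slices by position and EXCLUDES the end position:
# 0:2 covers positions 0 and 1 only, so "Calories" (position 2) is dropped.
by_position = menu[cheap_and_light].iloc[:, 0:2]

print(list(by_label.columns))
print(list(by_position.columns))
print(list(by_label["Menu Item"]))
```

With iloc, the slice would need to be 0:3 to include the "Calories" column under this assumed column order.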