Data Science - Test Module

Uploaded by

Mukarram Ali


Welcome to our Introduction to Data Science Test Module.

Greetings, future data scientists and professionals eager to dive into the world of data science. Whether
you're here to upskill, gain insight into potential career pathways, or develop foundational expertise,
we're thrilled to have you join us.
Please note: Before you begin to fill out this document, kindly make a copy and rename it to "Your
Name_DS Testing", e.g. "Ainee_DS Testing".

About the Course

At LUMSx, we believe in empowering you, our learners, to navigate the ever-evolving field of Data Science.
We will lay the foundation with descriptive statistics and untangle the intricacies of data biases. We'll
then dive into the art of statistical inference and machine learning, where hypotheses are tested.
Together, we'll study models and algorithms, exploring concepts like regression and classification,
as we work through the fundamental challenges of learning. As our journey nears its end, we'll equip
ourselves with the tools and ethics needed to navigate scalable data collection and processing.

Learning Outcomes
By the end of this course, learners will be able to:
- Conduct sound data analysis.
- Describe a given data set and assess its quality.
- Understand issues in data collection.
- Build data pipelines (collection, cleaning, EDA, modeling, evaluation, results) for "repeatable" work.
- Become well-versed in tools and technologies for data analysis (e.g., Pandas, scikit-learn).
- Understand the theory behind drawing inferences from data.
- Communicate results effectively.

For testing purposes: We will be sharing videos for one lesson of module 1, one quiz, and one data
assignment with you.
As you go through the content, here are some friendly reminders:
- Follow the sequence of the course material as outlined in this document. Try not to skip any
sections.

- Remember, you do not have to go through all the material in one sitting. You have 2 weeks to
complete it. The data assignment will take around 5-6 hours, so you can spread it over the
two weeks and complete it in chunks.

- Feel free to rewind, pause, and replay the video if needed for repetition.
- Attempt the quiz at the end and try to answer it independently. While answer keys will be
provided, these questions are designed for practice purposes and will not be graded.
- If you have any comments or feedback on your working document that you would like the
LUMSx team to view, don’t hesitate to send it in an email to us.

Thank you!

Module-1: Descriptive Statistics, Data Acquisition, and Tools

What are some possible sources of bias when collecting data? How do I sift through my data using data science tools (e.g.,
Pandas)?

Lesson-4: Data Manipulation Using Pandas II

M1_L4_V1_Pandas Str Methods.mp4

M1_L4_V2_Pandas Sorting.mp4
M1_L4_V3_Pandas Groupby.mp4
M1_L4_V4_Groupby Other Features.mp4
M1_L4_V5_Pivot Tables.mp4
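
The video titles above name four Pandas topics: string methods, sorting, groupby, and pivot tables. As a rough orientation, and not a substitute for the videos, the sketch below touches each topic once. The `menu` data, its column names, and its values are invented for illustration and do not come from the course material.

```python
import pandas as pd

# Hypothetical menu data, invented for illustration only.
menu = pd.DataFrame({
    "Menu Item": ["chicken burger", "beef burger", "garden salad", "fries"],
    "Category": ["Main", "Main", "Side", "Side"],
    "Size": ["Regular", "Large", "Regular", "Large"],
    "Price": [350, 450, 250, 150],
})

# String methods: vectorised text operations through the .str accessor.
menu["Menu Item"] = menu["Menu Item"].str.title()
burgers = menu[menu["Menu Item"].str.contains("Burger")]

# Sorting: order rows by one column, highest price first.
by_price = menu.sort_values("Price", ascending=False)

# Groupby: average price per category.
mean_price = menu.groupby("Category")["Price"].mean()

# Pivot table: categories as rows, sizes as columns, mean price as values.
pivot = menu.pivot_table(index="Category", columns="Size",
                         values="Price", aggfunc="mean")
```

Each result (`burgers`, `by_price`, `mean_price`, `pivot`) is a new object; the only change made to `menu` itself is the title-cased "Menu Item" column.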

Data Assignment:
Click here to attempt the Data assignment.

Quiz:
Select ONE correct answer for each multiple choice question.

1. A restaurant hygiene inspector for a chain with multiple locations randomly selects some of the
chain's locations for a kitchen cleanliness check. The inspector checks every kitchen in the
locations that were chosen. What type of sample is this?
a. Cluster sampling
b. Stratified sampling
c. Convenience sampling
2. You have a DataFrame called quizScores with column names "1", "2", and "3". The DataFrame
contains 10 rows. What will be the result of the following line of code?

quizScores[["1"]][1]

a. A DataFrame with the second element of the column "1"
b. A Series with the second element of the column "1"
c. Neither, since this will throw a KeyError
3. What is the primary drawback of quota sampling?
a. It is time consuming
b. It can introduce bias
c. It requires a large sample size
4. In a study where you want to find the top career choices for university students in Pakistan, you
visit the top 3 most expensive universities in the country to gather your data. What kind of bias
could most likely be present in your data?
a. Selection Bias
b. Non-response Bias
c. None of the above
5. Consider the following DataFrame named menu

Which of the following will return the names ("Menu Item"), prices ("Price"), and calories
("Calories") of all items with price below 400 and calories below 500?

a. menu.loc[(menu["Price"]<400) & (menu["Calories"]<500), "Menu Item":"Review"]

b. menu.loc[(menu["Price"]<400) & (menu["Calories"]<500), "Menu Item":"Calories"]
c. menu.iloc[[1,2,5,6,7], 0:2]
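
For readers who want to experiment with the sampling schemes named in question 1, here is a minimal sketch in Pandas. The `kitchens` table, its column names, and the sample sizes are assumptions made up for this example, and `groupby(...).sample` needs pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical population: 6 locations, each with 4 kitchens (invented data).
kitchens = pd.DataFrame({
    "location": [loc for loc in "ABCDEF" for _ in range(4)],
    "kitchen_id": range(24),
})

# Cluster sampling: randomly choose whole groups, then keep every unit in them.
chosen = pd.Series(kitchens["location"].unique()).sample(n=2, random_state=0)
cluster_sample = kitchens[kitchens["location"].isin(chosen)]

# Stratified sampling: draw a random sample from within every group.
stratified_sample = kitchens.groupby("location").sample(n=1, random_state=0)

# Convenience sampling: take whatever is easiest to reach, here the first rows.
convenience_sample = kitchens.head(8)
```

The cluster sample covers only some locations but every kitchen in them, the stratified sample covers every location but only some kitchens in each, and the convenience sample involves no random selection at all.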

Answers and Explanations:

MCQ | Option | Correct/Incorrect | Explanation
--- | --- | --- | ---
1 | a | Correct | The inspector randomly selects some locations (clusters) and checks every kitchen within those selected locations, making it a cluster sampling method.
1 | b | Incorrect | Stratified sampling involves dividing the population into homogeneous groups (strata) and then randomly selecting samples from each group. This scenario does not involve dividing the population into strata or any random selection of samples from within them.
1 | c | Incorrect | Convenience sampling involves selecting the most readily available individuals or units as samples, rather than using random selection. This scenario does not involve convenience sampling, as the selection is random.
2 | a | Incorrect | The line of code will raise an error and will not return a DataFrame (see the explanation for option c).
2 | b | Incorrect | The line of code will raise an error and will not return a Series (see the explanation for option c).
2 | c | Correct | quizScores[["1"]] will return a DataFrame with only one column named "1". Then, quizScores[["1"]][1] tries to create a Series using values in a column named 1 from the returned DataFrame. However, no such column exists, as the one in the previously returned DataFrame has the name "1" (string) and not 1 (int). Thus, we get a KeyError.
3 | a | Incorrect | While sampling methods vary in time requirements, this is not the primary drawback of quota sampling.
3 | b | Correct | Quota sampling may lead to biased results because individuals are not randomly selected, but rather chosen based on predetermined characteristics.
3 | c | Incorrect | Quota sampling does not necessarily require a large sample size; it depends on the specific quotas set for the sample.
4 | a | Correct | By only sampling from the top 3 most expensive universities, you may not capture the perspectives and career aspirations of students from other socioeconomic backgrounds, leading to a biased representation of career choices.
4 | b | Incorrect | Non-response bias occurs when certain groups within the sample are more likely to respond to the survey than others, but it doesn't directly relate to the method of selecting the sample, as in this scenario.
4 | c | Incorrect | Selection bias is likely present due to the limited and specific selection of universities, which doesn't represent the entire population of university students in Pakistan.
5 | a | Incorrect | Since loc is inclusive for the end index when selecting a range, the given code selects columns "Menu Item" through "Review" for items meeting the criteria, but it includes "Review", which wasn't asked for.
5 | b | Correct | Since loc is inclusive for the end index when selecting a range, the given code selects columns "Menu Item" through "Calories" for items meeting the criteria, fulfilling the requirements.
5 | c | Incorrect | Since iloc is exclusive for the end index when selecting a range, the given code selects columns "Menu Item" through "Price" for items meeting the criteria, but does not include "Calories", which was required.
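
The indexing behaviour described in the explanations for questions 2 and 5 can be checked with a short sketch. Both DataFrames below are hypothetical stand-ins: the values are invented, and for `menu` only the relative order of the columns named in the answer options is taken from the quiz (the real table may contain other columns and rows).

```python
import pandas as pd

# Hypothetical stand-in for quizScores: string column labels, 10 rows.
quizScores = pd.DataFrame({"1": range(10), "2": range(10, 20), "3": range(20, 30)})

one_col = quizScores[["1"]]        # list of labels -> DataFrame with one column
try:
    one_col[1]                     # looks up a column labelled 1 (int), not a row
    raised = False
except KeyError:
    raised = True                  # only the string label "1" exists

second = quizScores["1"].iloc[1]   # one way to get the second element instead

# Hypothetical menu; values are invented for illustration.
menu = pd.DataFrame({
    "Menu Item": ["Burger", "Salad", "Fries"],
    "Price": [450, 250, 150],
    "Calories": [700, 300, 450],
    "Review": ["good", "ok", "good"],
})
mask = (menu["Price"] < 400) & (menu["Calories"] < 500)

by_label = menu.loc[mask, "Menu Item":"Calories"]   # label slice includes the end
by_position = menu.iloc[[1, 2], 0:2]                # position slice excludes the end
```

With this data, `by_label` keeps the "Menu Item", "Price", and "Calories" columns, while `by_position` stops before position 2 and so drops "Calories".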
