Data-Science-Assignments

The document outlines an assignment on the fundamentals of data science, covering key concepts such as the interdisciplinary nature of data science, the distinction between data science and machine learning, and the importance of soft skills. It also details the data science process through a book recommendation system, emphasizing stages like problem definition, data collection, and exploratory data analysis. Additionally, it classifies data attributes related to students and discusses the implications of misclassification in binary data.

Uploaded by

fazalabbas


Assignment 01

Fundamentals of Data Science

Total Marks: 30 (10 marks per question)

Question 1: The Big Picture of Data Science

Imagine you are explaining Data Science to a friend who thinks it’s just about
"coding and numbers."
● How would you describe Data Science as an interdisciplinary field?
Mention at least three core areas it combines and explain why each is
important.
● How is Data Science different from Machine Learning (ML)? Provide a
real-world example (e.g., predicting weather vs. analyzing climate trends) to
highlight the difference.
● Why are soft skills like storytelling and communication critical for a
Data Scientist? Give an example of how poor communication could lead to a
failed project.
Answer:

The Big Picture of Data Science

Data Science is not just "coding and numbers." Here's how you might break it down:

1. Data Science as an Interdisciplinary Field: Data Science is much more than just writing
code or crunching numbers. It's a mix of various disciplines working together to solve real-world
problems using data. Three core areas it combines are:

● Mathematics and Statistics: This is essential because data science relies on
mathematical models and statistical analysis to extract meaningful insights from data.
Whether it's finding patterns or making predictions, math and stats are at the heart of
every analysis.
● Computer Science: This is the backbone of data science. It's where coding comes in. A
Data Scientist needs to know how to handle, manipulate, and store large datasets, write
algorithms, and build efficient systems for processing data.
● Domain Knowledge: Data science is often applied to specific fields, such as healthcare,
finance, or marketing. Domain knowledge helps a Data Scientist understand the context
of the data and ask the right questions, ensuring that insights are relevant and actionable.

Each of these areas is crucial because together they allow Data Scientists to analyze data
properly, extract insights, and apply them in meaningful ways.

2. Data Science vs. Machine Learning (ML): While Data Science and Machine Learning are
related, they are not the same thing.

● Data Science involves the entire process of collecting, analyzing, and interpreting data to
make decisions or gain insights. It includes tasks like data cleaning, statistical analysis,
and data visualization.
● Machine Learning focuses on algorithms that learn patterns from data and make
predictions or decisions without being explicitly programmed for each case. ML is
one of the tools a Data Scientist uses.

For example:

● Predicting the weather (Machine Learning) involves training a model on historical
measurements such as temperature, pressure, and humidity so that it can forecast
tomorrow's conditions. The focus here is on learning from past data to make a specific
prediction.
● Analyzing climate trends (Data Science) involves gathering decades of climate records,
cleaning them, studying long-term patterns with statistical methods, visualizing the
results, and communicating the findings to support decisions. A predictive model may be
one component, but the work covers the whole process.

3. The Importance of Soft Skills (Storytelling and Communication): Soft skills are critical for
Data Scientists because they help transform complex data insights into clear, actionable
messages that non-experts can understand.

For example, poor communication could lead to a failed project if a Data Scientist identifies
important trends in a dataset but fails to communicate the findings effectively. If the insights are
presented in a confusing or overly technical way, stakeholders might not understand their
significance and fail to act on them. In contrast, a well-told story backed by data can inspire
action and drive business decisions.

In short, Data Science is a collaborative field that combines technical expertise and
communication skills to turn data into valuable insights for decision-making.

Question 2: The Data Science Process in Action

You are tasked with building a system to recommend books to users based on their
preferences.
● List and briefly explain the key stages of the Data Science process you
would follow for this project.
● Why is Exploratory Data Analysis (EDA) important before building the
model? Mention two specific tasks you’d perform during EDA (e.g.,
detecting outliers, checking data types).
● How would you evaluate the final model? Name one metric to assess its
performance.

Answer:

Key Stages of the Data Science Process:

1. Problem Definition:
Start by clearly understanding the problem at hand. In this case, the goal is to build
a system that recommends books to users according to their preferences. You need
to identify the type of recommendations (e.g., content-based, collaborative
filtering) and the required data to build the model.
2. Data Collection:
Gather relevant data, such as user ratings, book details (genres, authors, etc.), and
user profiles. This can come from databases, APIs, or publicly available datasets.
3. Data Cleaning and Preprocessing:
Clean the collected data to handle missing values, duplicates, and irrelevant
features. Normalize or scale the data if necessary, and convert categorical data
(like book genres) into numerical values using techniques like one-hot encoding.
4. Exploratory Data Analysis (EDA):
EDA is done to understand the data better. This step helps in identifying patterns,
correlations, and any anomalies. Visualizations like histograms and scatter plots
are useful for this.
5. Feature Engineering:
Create new features from the raw data that may enhance the model’s performance.
For example, combining user preferences or extracting metadata features from
book descriptions can add value.
6. Model Selection and Training:
Choose an appropriate model based on the problem type (e.g., collaborative
filtering, content-based filtering, or hybrid methods). Split the data into training
and testing sets and train the model using the training data.
7. Model Evaluation:
Assess the performance of the model using suitable metrics like precision, recall,
or RMSE (Root Mean Squared Error). Adjust model parameters as needed and
retest.
8. Deployment and Monitoring:
Once the model performs well, deploy it to a production environment where it can
recommend books to real users. Monitor its performance continuously and update
it as new data comes in.
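
Stages 3, 6, and 7 above can be sketched in a few lines. This is a minimal illustration, not a real recommender: the ratings are invented toy data, the helper names (one_hot, train_test_split, rmse) are hypothetical, and the "model" is only a baseline that predicts the average training rating, so that the RMSE calculation has something to score.

```python
import math
import random

# Hypothetical toy ratings: (user_id, book_id, genre, rating on a 1-5 scale).
ratings = [
    ("u1", "b1", "Fantasy", 5), ("u1", "b2", "Mystery", 3),
    ("u2", "b1", "Fantasy", 4), ("u2", "b3", "Science", 2),
    ("u3", "b2", "Mystery", 4), ("u3", "b3", "Science", 5),
    ("u4", "b1", "Fantasy", 3), ("u4", "b2", "Mystery", 2),
]

def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector, one slot per category."""
    return [1 if value == c else 0 for c in categories]

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Stage 3: convert the categorical genre column into numeric vectors.
genres = sorted({row[2] for row in ratings})
encoded = [one_hot(row[2], genres) for row in ratings]

# Stage 6: split the data, then "train" a baseline (the mean training rating).
train, test = train_test_split(ratings)
global_mean = sum(row[3] for row in train) / len(train)

# Stage 7: score the baseline on the held-out ratings.
test_rmse = rmse([row[3] for row in test], [global_mean] * len(test))
print(f"genres={genres} baseline RMSE={test_rmse:.3f}")
```

A lower RMSE means predicted ratings are closer to the ratings users actually gave. Any real model (collaborative or content-based) should at least beat this kind of baseline on the same test set.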

Why is Exploratory Data Analysis (EDA) Important Before Building the Model?

Exploratory Data Analysis (EDA) is critical because it helps you understand the structure
of your data, the relationships between different features, and any issues that could affect
the model's performance.

● Detecting Outliers: Outliers can distort model predictions. Identifying and
handling outliers helps ensure that the model doesn't place too much weight on
extreme values.
● Checking Data Types: Ensuring that the data types (numerical, categorical) are
correctly assigned allows proper processing and avoids errors during modeling.
For example, if a categorical feature is mistakenly treated as a numerical feature, it
could lead to inaccurate results.
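
The two EDA tasks above can be sketched as follows. This is one possible approach under stated assumptions: outliers are flagged with the common 1.5 × IQR rule (other rules exist), the sample values are invented, and the function names iqr_outliers and mixed_type_columns are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def mixed_type_columns(rows):
    """Return names of columns whose values do not all share one Python type."""
    types_by_column = {}
    for row in rows:
        for name, value in row.items():
            types_by_column.setdefault(name, set()).add(type(value).__name__)
    return sorted(name for name, types in types_by_column.items() if len(types) > 1)

# Hypothetical ratings on a 1-5 scale; 50 is assumed to be a data-entry error.
rating_values = [4, 5, 3, 4, 50, 2, 5, 4]
outliers = iqr_outliers(rating_values)

# Hypothetical rows where one rating was read in as text instead of a number.
rows = [
    {"rating": 4, "genre": "Fantasy"},
    {"rating": "5", "genre": "Mystery"},
    {"rating": 3, "genre": "Science"},
]
suspect_columns = mixed_type_columns(rows)

print("outliers:", outliers)
print("columns with mixed types:", suspect_columns)
```

Both checks only flag candidates. Whether a flagged value is an error to fix or a genuine extreme to keep is a judgment that depends on the data.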
Question 3: Understanding Data Attributes
A dataset contains information about students in a school, including:
● Height (in cm)
● Favorite Subject (Math, Science, Arts)
● Exam Pass/Fail Status (Yes/No)
● Student ID (e.g., S001, S002)
For each attribute above:
● Classify its type (Nominal, Binary, or Other) and justify your answer.
● Which attribute is asymmetric binary? Explain why it’s asymmetric with a
real-world consequence (e.g., how misclassifying a "Fail" as "Pass" could
impact students).
● Why can’t we calculate the "average" of Student ID? Relate your answer
to the properties of nominal attributes.

Answer 3: Understanding Data Attributes

Classification and Justification:

1. Height (in cm):
o Type: Other (Numeric, continuous)
o Justification: Height is a numerical value that can take any real
number within a range, making it a continuous variable. It’s used to
measure a quantity (the student's height).
2. Favorite Subject (Math, Science, Arts):
o Type: Nominal
o Justification: The favorite subject is a categorical variable with no
inherent order or ranking. Math, Science, and Arts are different
categories that represent preferences, but there’s no natural order
among them.
3. Exam Pass/Fail Status (Yes/No):
o Type: Binary
o Justification: This attribute has two possible values (Yes or No),
representing a binary outcome. It’s a categorical variable but with
only two categories, making it binary.
4. Student ID (e.g., S001, S002):
o Type: Nominal
o Justification: Student IDs are labels used to uniquely identify each
student. These IDs don’t have any mathematical significance and don't
follow a meaningful order. Each ID represents a unique individual, but
the numbers or characters are arbitrary, making them nominal.
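
The classification above determines which summary makes sense for each attribute. The sketch below is illustrative only: the three student records are invented, and the names AttributeType, SCHEMA, and summarize are hypothetical. It uses the mean for the numeric attribute, the proportion of "Yes" for the binary attribute, and the most frequent value (mode) for nominal attributes.

```python
from collections import Counter
from enum import Enum
import statistics

class AttributeType(Enum):
    NOMINAL = "nominal"
    BINARY = "binary"
    NUMERIC = "numeric"

# Attribute types as classified in the answer above.
SCHEMA = {
    "height_cm": AttributeType.NUMERIC,
    "favorite_subject": AttributeType.NOMINAL,
    "passed_exam": AttributeType.BINARY,
    "student_id": AttributeType.NOMINAL,
}

# Hypothetical records, invented for illustration.
students = [
    {"student_id": "S001", "height_cm": 152.5, "favorite_subject": "Math", "passed_exam": True},
    {"student_id": "S002", "height_cm": 160.0, "favorite_subject": "Arts", "passed_exam": False},
    {"student_id": "S003", "height_cm": 148.0, "favorite_subject": "Math", "passed_exam": True},
]

def summarize(column, attr_type, rows):
    """Pick a summary statistic that is meaningful for the attribute type."""
    values = [row[column] for row in rows]
    if attr_type is AttributeType.NUMERIC:
        return statistics.mean(values)            # averages are meaningful
    if attr_type is AttributeType.BINARY:
        return sum(values) / len(values)          # proportion of True ("Yes")
    return Counter(values).most_common(1)[0][0]   # nominal: most frequent label

summary = {name: summarize(name, kind, students) for name, kind in SCHEMA.items()}
print(summary)
```

For Student ID every label occurs once, so even the mode says nothing useful. The attribute identifies records rather than describing them.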

Asymmetric Binary Attribute:

● Attribute: Exam Pass/Fail Status (Yes/No)
● Explanation: A binary attribute is asymmetric when its two outcomes are not
equally important. Exam pass/fail status is asymmetric because a "Fail" carries
more weight than a "Pass": it is the outcome that calls for action. The
consequences of misclassification show this. If a student who has failed is
incorrectly classified as having passed, they might not receive the necessary
support or intervention to improve, which could affect their academic progress
and future opportunities. On the other hand, misclassifying a "Pass" as "Fail"
may lead to unnecessary intervention but won’t jeopardize the student’s future
as much as the reverse error.
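
One way to make this asymmetry concrete is to weight the two kinds of error differently instead of counting them equally. The sketch below is an assumption-laden illustration: the cost values (5 and 1) and the sample labels are invented, and misclassification_cost is a hypothetical helper.

```python
# Hypothetical costs: missing a failing student (a false "Pass") is assumed
# to be five times as costly as an unnecessary intervention (a false "Fail").
COST_FALSE_PASS = 5.0
COST_FALSE_FAIL = 1.0

def misclassification_cost(actual, predicted):
    """Count both error types and return (false_pass, false_fail, total_cost).

    Both inputs are lists of booleans where True means "passed".
    """
    false_pass = sum(1 for a, p in zip(actual, predicted) if not a and p)
    false_fail = sum(1 for a, p in zip(actual, predicted) if a and not p)
    total = false_pass * COST_FALSE_PASS + false_fail * COST_FALSE_FAIL
    return false_pass, false_fail, total

# Invented example labels for six students.
actual    = [True, True, False, False, True, False]
predicted = [True, False, True, False, True, True]

false_pass, false_fail, total_cost = misclassification_cost(actual, predicted)
error_rate = (false_pass + false_fail) / len(actual)

print(f"false passes={false_pass}, false fails={false_fail}")
print(f"plain error rate={error_rate:.2f}, weighted cost={total_cost}")
```

The plain error rate treats all three mistakes alike, while the weighted cost shows that the two false passes account for most of the harm.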

Why Can’t We Calculate the "Average" of Student ID?

● Reason: We cannot calculate the "average" of Student IDs because Student
ID is a nominal attribute.
● Explanation: Nominal attributes are categorical and do not have a
meaningful order or numerical relationship. An ID like "S001" represents a
unique student, but it doesn't have any inherent numerical meaning.
Averaging nominal data doesn't make sense because the IDs are simply
labels used for identification, not quantities that can be averaged or
calculated in any meaningful way.
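
A short check illustrates the point, using Python's standard statistics module and three invented IDs. Averaging the labels directly is rejected, and forcing an average out of the numeric part produces a number that identifies no student. The operations that do make sense for a nominal attribute are equality tests and counting.

```python
import statistics

# Hypothetical student IDs, invented for illustration.
student_ids = ["S001", "S002", "S004"]

# Averaging the labels directly fails: strings are not quantities.
try:
    statistics.mean(student_ids)
    average_is_defined = True
except TypeError:
    average_is_defined = False

# Stripping the "S" prefix yields numbers, but their mean (about 2.33)
# does not correspond to any student, so the result is meaningless.
numeric_parts = [int(student_id[1:]) for student_id in student_ids]
forced_mean = statistics.mean(numeric_parts)

# Meaningful operations on nominal data: comparing and counting labels.
same_student = student_ids[0] == "S001"
distinct_count = len(set(student_ids))

print("average defined for labels:", average_is_defined)
print("forced mean of numeric parts:", forced_mean)
print("distinct students:", distinct_count)
```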
