Chapter 1 - Intr To DS and Business Understanding

Uploaded by

azharzad1848
Copyright
© All Rights Reserved

Introduction to Data Science & Business Understanding

Chapter 1
Objectives
At the end of this chapter, students will be able to:
• Define the key term: data science
• Discuss the basic characteristics of data science projects
• Discuss the steps in a data science project
• Describe why business understanding is important in any data science project
• Understand what a problem domain is and its related requirements
• Identify the key business processes involved
• Define the problem statement and objectives
Overview of the topics covered
• Basics of Data Science
• Data Science Components
• Data Science Process
• Data Science Applications
• Data Science Tools
• Business Understanding
• Overview of problem statement and objectives
Introduction
What is Data Science?
• Data science is one of the most exciting emerging fields.
• Data science is the practice of analyzing raw data to discover hidden patterns.
• Various tools and techniques, such as machine learning and sophisticated algorithms, are used in this process.
• It can be applied to both structured and unstructured data.
• Data science analyzes data in more depth and detail than data analytics.
• Data scientists perform exploratory analysis of current or past data using sophisticated tools to uncover new insights and predict future events.
• Diagnostic analytics is used for discovery: determining what happened in the past and why.
• Data science is also used for predictive analytics: models that estimate the probability of a certain event occurring in the future.
• It is also used for prescriptive analytics: intelligent models capable of making their own decisions and learning within dynamic parameters.
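The difference between looking back at past data and predicting future events can be illustrated with a small Python sketch. The monthly sales figures are hypothetical, and the straight-line trend is only one simple predictive model chosen for illustration, not a method prescribed by this chapter.

```python
from statistics import mean

# Hypothetical monthly sales (month number -> sales); illustrative data only.
monthly_sales = {1: 100.0, 2: 110.0, 3: 120.0, 4: 130.0}

# Looking back: which month had the highest sales, and what was the average?
best_month = max(monthly_sales, key=monthly_sales.get)
average_sales = mean(monthly_sales.values())


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    return a, b


# Looking forward: a simple predictive model extrapolates the trend to month 5.
a, b = fit_line(list(monthly_sales), list(monthly_sales.values()))
forecast_month_5 = a + b * 5
print(best_month, average_sales, forecast_month_5)  # prints: 4 115.0 140.0
```

A prescriptive model would go one step further and recommend an action (for example, how much stock to order) based on such a forecast.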
Why do we need Data Science?
• Until recently, data was mostly structured and small in size, so it could be analyzed manually or with simple tools and algorithms.
• Today, technological developments produce more and more data, much of it semi-structured or completely unstructured (roughly 80% of data is estimated to be unstructured).
• Handling, processing and analyzing huge amounts of data requires complex, powerful and efficient algorithms and technology, which together are termed data science.
• With the help of data science, we can convert massive amounts of raw and unstructured data into meaningful insights.
Data Science Components
The main components of data science are given below:
• Statistics: Statistics is a way to collect and analyze large amounts of numerical data and find meaningful insights in it.
• Domain expertise: In data science, domain expertise binds the other components together. Domain expertise means specialized knowledge or skills in a particular area.
• Data engineering: Data engineering is the part of data science that involves acquiring, storing, retrieving and transforming the data. Data engineering also adds metadata (data about data) to the data.
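The data engineering tasks listed above (acquire, transform, store, retrieve, add metadata) can be sketched with Python's built-in sqlite3 module. The raw rows, table names and metadata keys are hypothetical; a real pipeline would read from files or other systems rather than a hard-coded list.

```python
import sqlite3
from datetime import datetime, timezone

# Acquire: hypothetical raw lines, e.g. read from a CSV export (illustrative only).
raw_rows = ["S001,85", "S002,72", "S003,91"]

# Transform: split each line and convert the mark from text to an integer.
records = [(sid, int(mark)) for sid, mark in (line.split(",") for line in raw_rows)]

# Store: an in-memory SQLite database stands in for a real data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (student_id TEXT PRIMARY KEY, mark INTEGER)")
conn.executemany("INSERT INTO marks VALUES (?, ?)", records)

# Metadata: data about the data (where it came from, when it was loaded, how many rows).
conn.execute("CREATE TABLE metadata (key TEXT PRIMARY KEY, value TEXT)")
metadata = {
    "source": "raw_rows (hypothetical CSV export)",
    "loaded_at": datetime.now(timezone.utc).isoformat(),
    "row_count": str(len(records)),
}
conn.executemany("INSERT INTO metadata VALUES (?, ?)", metadata.items())
conn.commit()

# Retrieve: query the stored data back out.
average_mark = conn.execute("SELECT AVG(mark) FROM marks").fetchone()[0]
print(f"average mark: {average_mark:.1f}")
```

The final query also touches the statistics component: even a simple average turns stored numbers into a summary a person can act on.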
Data Science Process
Step 1: Frame the problem
• The first thing you have to do before you solve a problem is to
define exactly what it is.
• You need to be able to translate data questions into something
actionable.
Step 2: Collect the raw data needed for your problem
• Once you have defined the problem, you will need data that gives you the insights required to solve it.
Step 3: Process the data for analysis
• Data can be quite messy, especially if it hasn’t been well-maintained.
• Check for the following errors:
• Missing values, for example some student marks are missing.
• Corrupted values, such as invalid entries.
• Time zone differences, for example your database may not take into account the different time zones of your users.
• Date range errors, for example dates that make no sense, such as a registration date before sales started.
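The checks listed above can be automated. The sketch below is a minimal Python example under stated assumptions: the record layout, the 0 to 100 valid range for marks and the SALES_START date are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical date on which sales (and registrations) started.
SALES_START = datetime(2023, 1, 1, tzinfo=timezone.utc)

# Illustrative student records: a mark out of 100 and a registration timestamp.
records = [
    {"id": 1, "mark": 78, "registered": "2023-03-01T10:00:00+04:00"},
    {"id": 2, "mark": None, "registered": "2023-03-02T09:00:00+00:00"},
    {"id": 3, "mark": 140, "registered": "2023-03-03T12:00:00+04:00"},
    {"id": 4, "mark": 65, "registered": "2022-11-20T08:00:00+04:00"},
    {"id": 5, "mark": 80, "registered": "2023-03-04T12:00:00"},
]


def check_record(rec):
    """Return a list of data-quality problems found in one record."""
    problems = []
    if rec["mark"] is None:
        problems.append("missing value")
    elif not 0 <= rec["mark"] <= 100:
        problems.append("corrupted value")
    registered = datetime.fromisoformat(rec["registered"])
    if registered.tzinfo is None:
        # Without a time zone we cannot reliably compare users in different regions.
        problems.append("no time zone")
    elif registered.astimezone(timezone.utc) < SALES_START:
        # Normalise to UTC, then flag dates before sales started.
        problems.append("date out of range")
    return problems


report = {rec["id"]: check_record(rec) for rec in records}
for rec_id, problems in report.items():
    print(rec_id, problems or "ok")
```

Converting every timestamp to UTC before comparing is what makes the date-range check independent of where each user registered.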
Step 4: Exploratory data analysis
• There are two main goals of exploratory data analysis.
• The first is to find out whether the data you have is suitable for answering your question:
• Is there enough data?
• Are there too many missing values?
• Am I missing certain variables, or do I need to collect more data to get those variables?
• The second goal of exploratory data analysis is to start developing a sketch of the solution.
• Apply your statistical, mathematical and technological knowledge.
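The first goal can be partly checked in code. This sketch assumes rows stored as Python dictionaries, a hypothetical list of variables the question needs, and arbitrary thresholds (30 rows, 20% missing) that you would set for your own problem.

```python
# Hypothetical dataset: one dict per row; None marks a missing value.
rows = [
    {"age": 21, "hours_studied": 5, "mark": 78},
    {"age": 22, "hours_studied": None, "mark": 65},
    {"age": None, "hours_studied": 3, "mark": 55},
    {"age": 20, "hours_studied": 8, "mark": 90},
]

REQUIRED = ["age", "hours_studied", "attendance", "mark"]  # variables the question needs (assumed)
MIN_ROWS = 30       # assumed minimum number of rows for a useful analysis
MAX_MISSING = 0.2   # assumed maximum tolerated fraction of missing values

# Is there enough data?
enough_data = len(rows) >= MIN_ROWS

# Am I missing certain variables altogether?
present = set().union(*(row.keys() for row in rows))
missing_variables = [col for col in REQUIRED if col not in present]

# Are there too many missing values in the variables we do have?
missing_fraction = {
    col: sum(row.get(col) is None for row in rows) / len(rows)
    for col in REQUIRED
    if col in present
}
too_many_missing = [col for col, frac in missing_fraction.items() if frac > MAX_MISSING]

print(enough_data, missing_variables, too_many_missing)
```

Here every check fails: too few rows, no attendance data at all, and a quarter of the age and hours values missing, which would be a signal to collect more data before modeling.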
Step 5: Formal modeling
• The formal modeling phase is where you write down specifically what questions you are asking and what parameters you are trying to estimate.
• Challenging your model and developing a formal framework are important for making sure that you can develop robust evidence for answering your question.
• It also helps you examine how sensitive your results are to different assumptions.
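As a minimal illustration, the question "how much does a student's mark change per extra hour of study?" can be written down as a single parameter, the slope of a straight-line model, and its sensitivity checked by refitting without one unusual observation. The data are invented, and dropping a point is only one of many possible sensitivity checks.

```python
from statistics import mean


def slope(xs, ys):
    """Least-squares slope of y on x: the parameter we want to estimate."""
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


# Hypothetical data: hours studied and marks; the last student is unusual.
hours = [1, 2, 3, 4, 10]
marks = [50, 55, 60, 65, 40]

estimate_all = slope(hours, marks)                # uses every observation
estimate_trimmed = slope(hours[:-1], marks[:-1])  # assumption: the unusual point is excluded

print(f"all data: {estimate_all:.2f} marks/hour, without last point: {estimate_trimmed:.2f}")
```

Here a single observation flips the sign of the estimate, which is exactly the kind of fragility this phase is meant to expose before any conclusions are drawn.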
Step 6: Interpretation
• By now you have probably done many different analyses and fit many different models, so you have many different pieces of information to think about.
• Part of the challenge of the interpretation phase is to assemble all of this information and weigh each piece of evidence.
• You need to judge which pieces are more reliable, which are more uncertain than others, and which are more important than others, to get a sense of the totality of evidence with respect to answering the question.
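One common way to weigh evidence by reliability, shown here as an illustration rather than a technique from this chapter, is inverse-variance weighting: each estimate counts in proportion to 1/SE², where SE is its standard error. It assumes the estimates measure the same quantity and are independent; the numbers below are hypothetical.

```python
from math import sqrt

# Hypothetical estimates of the same effect from different analyses:
# (source, estimate, standard error). A larger standard error means less reliable.
estimates = [
    ("model A", 5.0, 1.0),
    ("model B", 4.0, 2.0),
    ("survey", 6.0, 4.0),
]

# Weight each estimate by 1 / SE^2 so more certain evidence counts more.
weights = [1 / se ** 2 for _, _, se in estimates]
combined = sum(w * est for w, (_, est, _) in zip(weights, estimates)) / sum(weights)
combined_se = sqrt(1 / sum(weights))
print(f"combined estimate {combined:.2f} ± {combined_se:.2f}")
```

The combined value sits closest to model A because it is the most certain piece of evidence, while the noisy survey barely moves the result.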
Step 7 : Communication
• The last phase is the communication phase.
• A successful data science project will want to communicate its findings to some sort of audience.
• That audience may be internal to your organization or external, and it may be a large audience or just a few people.
Applications of Data Science
Image Recognition and Speech Recognition:
• Automatic image tagging suggestions on Facebook use image recognition algorithms.
• Voice assistants such as “Ok Google”, Siri and Cortana respond to voice commands using speech recognition algorithms.
Transport:
• The transport industry is also using data science technology to create self-driving cars.
Healthcare:
• Data science is being used for tumor detection, drug discovery,
medical image analysis, virtual medical bots, etc.

Recommendation Systems:
• Many companies, such as Amazon, Netflix and Google Play, use personalized recommendations (suggestions for similar products).
Risk Detection:
• Many finance companies and banks use data science to reduce risk and avoid losses while increasing customer satisfaction.
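A very small version of "suggestions for similar products" can be built from purchase histories alone. This is a sketch of one simple technique, item-to-item cosine similarity on who bought what; it is not how Amazon, Netflix or Google Play actually implement their systems, and the purchase data is hypothetical.

```python
from math import sqrt

# Hypothetical purchase history: customer -> set of products bought.
purchases = {
    "c1": {"laptop", "mouse", "keyboard"},
    "c2": {"laptop", "mouse"},
    "c3": {"phone", "case"},
    "c4": {"laptop", "keyboard"},
}


def buyers(product):
    """Customers who bought the given product."""
    return {c for c, items in purchases.items() if product in items}


def similarity(p, q):
    """Cosine similarity between two products' 0/1 'who bought it' vectors."""
    a, b = buyers(p), buyers(q)
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def similar_products(product, k=2):
    """Top-k other products most often bought by the same customers."""
    all_products = set().union(*purchases.values())
    scores = {q: similarity(product, q) for q in all_products if q != product}
    # Sort by descending score, breaking ties alphabetically for stable output.
    return sorted(scores, key=lambda q: (-scores[q], q))[:k]


print(similar_products("laptop"))  # prints: ['keyboard', 'mouse']
```

Customers who bought a laptop also tended to buy a keyboard or a mouse, so those are suggested; the phone and case share no buyers with the laptop and score zero.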
Crime Analysis:
• Data analytics can be used for crime analysis based on areas of frequent crime and historical patterns (predictive policing). It is also used to predict civil unrest in cities based on social media posts.

Sentiment Analysis:
• Sentiment analysis helps to rapidly gauge opinions about products and services on different social platforms.
Churn Prediction:
• Predicting which customers will churn, by analyzing historical transactions, helps a company retain its existing customers.
Data Analysis in Education:
• Each student can be tracked: how much they read the book, which pages they skip, how much they highlight, whether they take notes, etc.
Data Science Tools

The following are some of the tools used in data science:

• Data Analysis tools: R, Python, SAS, MATLAB, Excel,
RapidMiner.
• Data Warehousing: ETL, SQL, Hadoop, Informatica/Talend,
AWS Redshift
• Data Visualization tools: Python, R, Jupyter, Tableau, Cognos.
• Machine learning tools: Spark, Mahout, Azure ML Studio
Business Understanding
• Business understanding is the process of understanding how our data science project affects the business. This means we need to gain as much information about the business as possible to build our data science project. In a real business, we can always discuss with the business users to gain better insight.
Process involved
• Business understanding – What does the business need?
• Data understanding – What data do we have / need? Is it clean?
• Data preparation – How do we organize the data for modeling?
• Modeling – What modeling techniques should we apply?
• Evaluation – Which model best meets the business objectives?

Business Understanding
Business Understanding is a critical stage in the data science process that involves gaining a deep understanding of the problem domain and business requirements. It involves:
1. understanding the business problem domain,
2. studying the business processes involved,
3. defining the problem statement, and
4. specifying the objectives of the analysis.
Business Understanding (continued)
Let’s try to understand this through an example case study.

Case Study:
A retail company wants to improve its sales by identifying the key factors that affect sales.
Understand Problem Domain
• The first step in the Business Understanding stage is to
gain a clear understanding of the business problem and
the domain in which it exists.
• In this case study, we are dealing with a retail company
that wants to improve its sales.
• We need to understand the retail industry, the products
that the company sells, and the customers that it
serves. This can involve conducting research, reviewing
industry reports, and interviewing stakeholders to gain
a clear understanding of the business requirements.
Study the Business Processes Involved
• The next step is to study the business processes involved in the sales process.
• This can include understanding how the products are sourced, how they are marketed and promoted, how they are priced, and how they are sold.
• This can also involve reviewing the company's existing sales data to understand the patterns and trends that exist.
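Reviewing existing sales data for patterns and trends often starts with simple aggregation. The Python sketch below assumes a hypothetical transaction log of (date, product, units, unit price) and computes revenue per month and the month-over-month change; real data would usually come from the company's sales system.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction log: (date, product, units sold, unit price).
sales = [
    (date(2024, 1, 5), "shampoo", 3, 2.5),
    (date(2024, 1, 20), "soap", 10, 1.0),
    (date(2024, 2, 3), "shampoo", 5, 2.5),
    (date(2024, 2, 14), "soap", 4, 1.0),
    (date(2024, 3, 9), "shampoo", 8, 2.5),
]

# Total revenue per (year, month).
revenue_by_month = defaultdict(float)
for day, _product, units, price in sales:
    revenue_by_month[(day.year, day.month)] += units * price

# Month-over-month change, a first look at the trend.
months = sorted(revenue_by_month)
changes = {
    month: revenue_by_month[month] - revenue_by_month[prev]
    for prev, month in zip(months, months[1:])
}

for month in months:
    print(month, revenue_by_month[month], changes.get(month))
```

Grouping the same log by product instead of by month would show which items drive the trend.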
Define the Problem Statement
• Once we have a clear understanding of the business requirements and the processes involved, we can define the problem statement.
• In this case study, the problem statement is to identify the key factors that affect sales.
• We need to define this problem statement in a clear and concise manner so that we can develop a plan of action to address it.
Specify the Objectives
• The final step in the Business Understanding stage is to specify the objectives of the analysis.
• This involves defining the specific outcomes that we want to achieve through our analysis.
• In this case study, the objectives might include identifying the most profitable products, understanding the customer demographics that are most likely to make a purchase, and identifying the most effective marketing strategies.
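For the first of these objectives, a minimal sketch of ranking products by gross profit might look like this. The figures are hypothetical, and defining profit as (price - cost) * units while ignoring overheads such as rent and staff is an assumption made for illustration.

```python
# Hypothetical product figures: units sold, selling price and unit cost.
products = {
    "shampoo": {"units": 120, "price": 2.50, "cost": 1.50},
    "soap": {"units": 300, "price": 1.00, "cost": 0.70},
    "lotion": {"units": 50, "price": 6.00, "cost": 3.00},
}


def gross_profit(p):
    """(price - cost) * units; overheads are ignored (an assumption)."""
    return (p["price"] - p["cost"]) * p["units"]


# Most profitable first.
ranking = sorted(products, key=lambda name: gross_profit(products[name]), reverse=True)
for name in ranking:
    print(f"{name}: {gross_profit(products[name]):.2f}")
```

Soap sells the most units but ranks last, a reminder that the best-selling product is not necessarily the most profitable one.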
Summary
• To summarize, the Business Understanding stage is a
critical first step in any data science project. It involves
gaining a deep understanding of the problem domain
and business requirements, studying the business
processes involved, defining the problem statement,
and specifying the objectives of the analysis. Through
these steps, we can develop a clear plan of action that
will guide our analysis and help us to achieve our
desired outcomes.
Practice case study
Case Study: A bank wants to improve customer
satisfaction by identifying the factors that influence
customer loyalty.

Understand Problem Domain / Business Requirements
In this case study, we are dealing with a bank that wants to improve customer satisfaction. We need to understand the banking industry, the types of products and services that the bank offers, and the customers that it serves. We can conduct research and review industry reports to gain a clear understanding of the business requirements.
Study the Business Processes Involved
The next step is to study the business processes involved in the banking industry. This can include understanding how the bank acquires customers, how it processes transactions, how it manages customer accounts, and how it offers customer service. This can also involve reviewing the bank's existing customer data to understand the patterns and trends that exist.
Define the Problem Statement
In this case study, the problem statement is to identify the factors that influence customer loyalty.
Specify the Objectives
In this case study, the objectives might include:
1. identifying the most important factors that influence customer loyalty,
2. understanding the customer demographics that are most loyal, and
3. identifying the most effective customer service strategies.
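A rough first pass at the first objective is to measure how strongly each candidate factor moves together with a loyalty measure. The sketch below uses the Pearson correlation coefficient on invented survey data; the factor names and the 0 to 10 loyalty score are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical survey of six customers: a loyalty score (0-10) and three candidate factors.
loyalty = [9, 7, 8, 4, 3, 6]
factors = {
    "service_rating": [5, 4, 4, 2, 1, 3],
    "branch_distance_km": [2, 8, 5, 3, 6, 4],
    "products_held": [4, 3, 3, 1, 1, 2],
}

# Rank factors by the strength (absolute value) of their correlation with loyalty.
correlations = {name: pearson(values, loyalty) for name, values in factors.items()}
ranked = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)
for name in ranked:
    print(f"{name}: r = {correlations[name]:+.2f}")
```

Correlation only shows association, not cause: a factor that ranks highly here is a candidate for further modeling, not proof that it drives loyalty.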
Practice exercises:
Illustrate the steps of Business Understanding for the
following case study:

Case Study 1:
UTAS-Ibra wants to perform Student Performance
Prediction

Case Study 2:
A beverage company wants to launch a new health drink
