
Chapter 1 - Intro To Data Science

The document provides an overview of the Applied Data Science course for Semester 3 of 2021-22 at UTAS-Ibri. It discusses data science concepts, components, process, applications and tools. The objectives are to define key terms like data science, discuss characteristics of data science projects and the steps involved. The topics covered include basics of data science, components, process, applications and tools.

Uploaded by

Dr. Sanjay Gupta
Copyright © All Rights Reserved

UTAS-Ibri, Semester 3, 2021-22

ITSE415-Applied Data Science


CLO-1: Data Science Concepts
Disclaimer

• This presentation is mostly copied or derived from the course material
provided by the IT Specialization Committee as per the requirements of the
course. Students who wish to explore beyond the scope of this course
are encouraged to refer to the material given in the reference section of the CDP.
• The presenter takes full responsibility for any mistakes in the
presentation.
• The presenter gratefully acknowledges the IT Specialization Committee for
providing comprehensive ready-made material to cover the learning
outcomes.

5/23/2022 Introduction to Database – Chapter 1 2


Objectives

Learning objective
After studying this chapter, you should be able to:
• Define the key term: Data Science
• Discuss the basic characteristics of Data Science projects
• Discuss the steps in a Data Science project

Course Learning outcome covered


1. Understand data science concepts.



Overview of the topics covered

• Basics of Data Science
• Data Science Components
• Data Science Process
• Data Science Applications
• Data Science Tools



Introduction



What is Data Science?
• Data science is one of the most exciting emerging fields.
• Data science is a term given to the practice of analyzing raw data to discover any
hidden patterns.
• Various applications and tools such as machine learning and sophisticated
algorithms are all used in this process.
• It can be applied to both structured and unstructured data.
• Data science is a more in-depth, detailed way of analysing data than data
analytics.
• Data scientists employ exploratory analysis of current or past data using
sophisticated tools to uncover new insights and predict future events.
• Diagnostic analytics is used for discovery, that is, to determine what happened in
the past.
• Data science is also useful for predictive analytics: models that estimate the
probability of a certain event occurring in the future.
• It is also useful for prescriptive analytics: intelligent models capable of making their
own decisions and learning within dynamic parameters.


Why do we need Data Science?
• Until recently, data was mostly structured and small in size. It could be analyzed
either manually or with simple tools and algorithms.
• Today, due to technological developments, more and more data is produced. It
is often semi-structured or completely unstructured (by a commonly cited estimate,
about 80% of data is unstructured).
• Handling, processing and analyzing huge amounts of data requires complex,
powerful, and efficient algorithms and technology, collectively termed data science.
• With the help of data science technology, we can convert massive amounts of
raw and unstructured data into meaningful insights.



Data Science Components
The main components of Data Science are given below:
• Statistics: Statistics is a way to collect and analyze large amounts of numerical
data and to find meaningful insights in it.
• Domain Expertise: In data science, domain expertise binds data science together.
Domain expertise means specialized knowledge or skills in a particular area.
• Data engineering: Data engineering is the part of data science that involves
acquiring, storing, retrieving, and transforming the data. Data engineering also
includes attaching metadata (data about data) to the data.
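
The data engineering activities named above (acquire, transform, store, retrieve, and record metadata) can be sketched in a few lines. This is only an illustrative sketch using Python's built-in sqlite3 module; the student names, marks, table names and the source label are all made-up assumptions, not part of the course material.

```python
import sqlite3
from datetime import datetime, timezone

# Acquire: hypothetical raw records (names and marks are made up).
raw_rows = [("Aisha", "78"), ("Omar", "91"), ("Sara", "64")]

# Store: an in-memory database is enough for a demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (student TEXT, mark INTEGER)")
conn.execute("CREATE TABLE metadata (key TEXT, value TEXT)")

# Transform: convert the mark strings to integers before storing them.
conn.executemany(
    "INSERT INTO marks VALUES (?, ?)",
    [(name, int(mark)) for name, mark in raw_rows],
)

# Metadata: data about the data (assumed source label, load time, row count).
metadata = {
    "source": "hypothetical_marks_export",
    "loaded_at": datetime.now(timezone.utc).isoformat(),
    "row_count": str(len(raw_rows)),
}
conn.executemany("INSERT INTO metadata VALUES (?, ?)", list(metadata.items()))

# Retrieve: query the stored data.
top_students = conn.execute(
    "SELECT student FROM marks WHERE mark >= 75 ORDER BY mark DESC"
).fetchall()
print(top_students)
```

In a real project the storage layer would more likely be a data warehouse or a distributed file system, but the same four activities apply.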


Data Science Process

Step 1: Frame the problem


The first thing you have to do before you solve a problem is to define exactly what it is.
You need to be able to translate data questions into something actionable.

Step 2: Collect the raw data needed for your problem


Once you’ve defined the problem, you’ll need data to give you the insights needed to
turn the problem around with a solution.

Step 3: Process the data for analysis
• Data can be quite messy, especially if it hasn’t been well-maintained.
• Check for the following errors:
• Missing values, for example some student marks are missing.
• Corrupted values, such as invalid entries.
• Time zone differences, for example your database may not take into account the different time
zones of your users.
• Date range errors, for example dates that make no sense, such as a registration date
from before sales started.
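
As a rough illustration of these checks, the sketch below flags missing values, out-of-range (corrupted) values and impossible dates in a handful of records. The records, the 0 to 100 mark range and the SALES_START date are assumptions chosen for the example; time zone handling is left out to keep it short.

```python
from datetime import date

SALES_START = date(2021, 1, 1)  # assumed date on which sales began

# Hypothetical records; None marks a missing value.
records = [
    {"student": "A", "mark": 82, "registered": date(2021, 3, 5)},
    {"student": "B", "mark": None, "registered": date(2021, 4, 1)},
    {"student": "C", "mark": 140, "registered": date(2021, 2, 9)},
    {"student": "D", "mark": 67, "registered": date(2020, 11, 30)},
]


def find_problems(record):
    """Return a list of problem labels for one record."""
    problems = []
    mark = record["mark"]
    if mark is None:
        problems.append("missing mark")
    elif not 0 <= mark <= 100:  # assumed valid range for a mark
        problems.append("corrupted mark")
    if record["registered"] < SALES_START:
        problems.append("date before sales started")
    return problems


# Map each student to the problems found, then keep only problem-free records.
report = {r["student"]: find_problems(r) for r in records}
clean = [r for r in records if not report[r["student"]]]
print(report)
```

Whether a flagged record should be dropped, corrected or imputed depends on the problem being solved; the sketch only detects the issues.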
Step 4: Exploratory data analysis
There are two main goals to exploratory data analysis.

• The first is you want to know if the data that you have is suitable for answering the question
that you have.
• Is there enough data?
• Are there too many missing values?
• Am I missing certain variables or do I need to collect more data to get those
variables, etc?

• The second goal of exploratory data analysis is to start to develop a sketch of the solution.
• Apply your statistical, mathematical and technological knowledge.
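
A first pass at the suitability questions above (is there enough data, are too many values missing) could look like the following sketch. It uses only Python's standard statistics module, and the list of marks is a made-up example.

```python
import statistics

# Hypothetical marks; None marks a missing value.
marks = [82, 75, None, 91, 68, None, 77, 85]

# Separate the observed values and measure how much is missing.
present = [m for m in marks if m is not None]
missing_share = (len(marks) - len(present)) / len(marks)

# A few summary statistics give a first sketch of the data.
summary = {
    "n": len(present),
    "missing_share": missing_share,
    "mean": statistics.mean(present),
    "median": statistics.median(present),
    "stdev": statistics.stdev(present),
}
print(summary)
```

If the share of missing values is high or the sample is very small, that is a signal to go back to Step 2 and collect more data before modeling.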


Step 5: Formal modeling

• The formal modeling phase is the way to specifically write down what questions you’re
asking and what parameters you’re trying to estimate.

• Challenging your model and developing a formal framework is really important to making
sure that you can develop robust evidence for answering your question.

• It also helps you examine how sensitive your results are to different assumptions.
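
One way to make this concrete: write the question down as a model with named parameters, estimate them, then re-estimate under a different assumption and compare. The sketch below assumes a simple straight-line model (mark = intercept + slope × study hours) fitted by ordinary least squares; the model choice and the data are illustrative assumptions, not the method prescribed by the course.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: study hours and marks; the last point may be an outlier.
hours = [1, 2, 3, 4, 5, 6]
marks = [52, 55, 61, 64, 70, 95]

# Estimate the parameters with all points, then without the suspect point.
full_fit = fit_line(hours, marks)
trimmed_fit = fit_line(hours[:-1], marks[:-1])

print("all points:      intercept=%.2f slope=%.2f" % full_fit)
print("without outlier: intercept=%.2f slope=%.2f" % trimmed_fit)
```

If the estimated slope changes a lot when one point is removed, the conclusion is sensitive to the assumption that the point is valid, and that uncertainty should be carried into the interpretation step.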


Step 6: Interpretation

• By this point you have probably done many different analyses and fit many different models,
so you have many different pieces of information to think about.

• Part of the challenge of the interpretation phase is to assemble all of the information and
weigh each of the different pieces of evidence.

• You know which pieces are more reliable, which are more uncertain than others, and
which are more important than others. Use this to get a sense of the totality of evidence with
respect to answering the question.


Step 7: Communication

• The last phase is the communication phase.

• Any data science project that is successful will want to communicate its findings to some
sort of audience.

• That audience may be internal to your organization, it may be external, it may be to a large
audience or even just a few people.



Applications of Data Science
Image Recognition and Speech Recognition:
• Automatic image tagging suggestions on Facebook use an image recognition algorithm.

• Voice assistants such as "Ok Google", Siri and Cortana respond to voice commands using
a speech recognition algorithm.

Transport:
• The transport industry is also using data science technology to create self-driving cars.

Healthcare:
• Data science is being used for tumor detection, drug discovery, medical image analysis,
virtual medical bots, etc.

Recommendation Systems:
• Many companies like Amazon, Netflix, Google Play, etc., are using personalized
recommendations (suggestions for similar products).

Risk Detection:
• Most finance companies and banks use data science to reduce risk and losses while
increasing customer satisfaction.

Crime Analysis:
• Data analytics can be used for crime analysis based on areas of frequent crime and
historical patterns (predictive policing). It is also used to predict civil unrest in cities based on
social media posts.

Sentiment Analysis:
• Sentiment analysis helps to rapidly gauge opinions about products and services across
different social platforms.
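
A very small sketch of the idea: score each post by counting positive and negative words. The word lists and the posts are made-up assumptions, and this lexicon-counting approach is only one simple technique; production systems typically use trained models.

```python
# Hypothetical word lists; a real lexicon would be far larger.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def sentiment_score(text):
    """Positive-word count minus negative-word count for one post."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


# Hypothetical social media posts about a product.
posts = [
    "I love this phone, great battery!",
    "Terrible service and poor support.",
    "It arrived on Monday.",
]
scores = [sentiment_score(p) for p in posts]
print(scores)
```

A score above zero suggests a positive post, below zero a negative one, and zero a neutral or unclassifiable one.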

Churn Prediction:
• Analyzing historical transactions to predict which customers are likely to churn helps a
company retain its existing customers.
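
As a minimal sketch of working from historical transactions, the example below flags customers whose last purchase is older than a cut-off. This is a simple inactivity rule rather than a trained predictive model, and the transactions, the reference date and the 60-day threshold are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical transaction history: (customer_id, transaction_date).
transactions = [
    ("c1", date(2022, 1, 10)),
    ("c1", date(2022, 4, 28)),
    ("c2", date(2021, 11, 2)),
    ("c3", date(2022, 2, 15)),
    ("c3", date(2022, 3, 1)),
]
TODAY = date(2022, 5, 23)   # assumed reference date
INACTIVITY_LIMIT_DAYS = 60  # assumed threshold

# Find each customer's most recent transaction.
last_seen = {}
for customer, day in transactions:
    if customer not in last_seen or day > last_seen[customer]:
        last_seen[customer] = day

# Customers inactive for longer than the threshold are flagged as at risk.
at_risk = sorted(
    customer
    for customer, day in last_seen.items()
    if (TODAY - day).days > INACTIVITY_LIMIT_DAYS
)
print(at_risk)
```

A real churn model would usually combine recency with other features, such as purchase frequency and spend, and learn the decision rule from labeled history.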

Data Analysis in Education:


• Each student's reading can be tracked: how much of the book they read, which pages they
skip, how much they highlight, and whether they take notes.



Data Science Tools

Following are some of the tools required for data science:

• Data Analysis tools: R, Python, SAS, MATLAB, Excel, RapidMiner.

• Data Warehousing: ETL, SQL, Hadoop, Informatica/Talend, AWS Redshift

• Data Visualization tools: Python, R, Jupyter, Tableau, Cognos.

• Machine learning tools: Spark, Mahout, Azure ML studio.



References

1. https://datafloq.com/read/data-science-8-powerful-applications/7090
2. https://www.javatpoint.com/data-science
3. https://makemeanalyst.com/structure-data-science-project-different-phases-data-science-project/



Thanks!
