Chapter 2 Data Science
2.1. Overview of Data Science
Data is the oil of today's world. With the right tools, technologies, and
algorithms, we can use data and convert it into a distinctive
business advantage.
Data Science can help you detect fraud using advanced machine
learning algorithms.
This helps you prevent significant monetary losses.
Significant advantages of using Data Science (II)
Input: in this step, the input data is prepared in a convenient form for
processing.
The form depends on the processing machine.
For example, when electronic computers are used, the input data can be recorded on
any one of several types of storage media, such as a hard disk, CD, or flash disk.
Processing: in this step, the input data is changed to produce data in a more
useful form.
For example, interest can be calculated on a bank deposit, or a monthly sales
summary can be calculated from the sales orders.
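The two processing examples above can be sketched as small functions. This is a minimal illustration, assuming simple (non-compounding) interest and sales orders stored as hypothetical (date, amount) pairs; a real bank or sales system would use its own interest rules and data formats.

```python
from collections import defaultdict
from datetime import date


def simple_interest(principal: float, annual_rate: float, years: float) -> float:
    """Simple (non-compounding) interest: principal * rate * time (an assumption)."""
    return principal * annual_rate * years


def monthly_sales_summary(orders):
    """Sum order amounts per (year, month), turning raw orders into a summary."""
    totals = defaultdict(float)
    for order_date, amount in orders:
        totals[(order_date.year, order_date.month)] += amount
    return dict(totals)


# Hypothetical sales orders: (order date, amount)
orders = [
    (date(2024, 1, 5), 120.0),
    (date(2024, 1, 20), 80.0),
    (date(2024, 2, 3), 200.0),
]

print(simple_interest(1000.0, 0.05, 2))  # 100.0
print(monthly_sales_summary(orders))     # {(2024, 1): 200.0, (2024, 2): 200.0}
```

In both cases the input (a deposit or a list of orders) is changed into a smaller, more useful result.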
Data Processing Cycle (II)
Output: in this step, the result of the processing step is collected.
For example, output data may be payroll for employees.
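The full input, processing, and output cycle can be sketched end to end using the payroll example. This is a hedged illustration, not a real payroll system: the CSV layout, the employee names, and the rule "gross pay = hours × hourly rate" are all hypothetical.

```python
import csv
import io

# Input: raw text as it might be read from a storage medium (hypothetical format).
RAW_INPUT = """name,hours,hourly_rate
Alice,40,12.5
Bob,35,14.0
"""


def read_input(text):
    """Input step: parse raw CSV text into a convenient form (a list of dicts)."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {"name": r["name"], "hours": float(r["hours"]), "rate": float(r["hourly_rate"])}
        for r in rows
    ]


def process(records):
    """Processing step: compute gross pay as hours * hourly rate (assumed rule)."""
    return [(r["name"], r["hours"] * r["rate"]) for r in records]


def output(payroll):
    """Output step: format the payroll as printable lines."""
    return [f"{name}: {pay:.2f}" for name, pay in payroll]


for line in output(process(read_input(RAW_INPUT))):
    print(line)
# Alice: 500.00
# Bob: 490.00
```

Keeping each step in its own function mirrors the cycle. Any one step can be changed, for example reading input from a database instead of text, without touching the others.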
Data types and their representation
1. Data Acquisition
Data acquisition is the process of gathering, filtering, and cleaning data before it is put in a data
warehouse or any other storage solution on which data analysis can be carried
out.
Data acquisition is one of the major big data challenges in terms of infrastructure
requirements.
The infrastructure required to support the acquisition of big data must deliver
low, predictable latency in both capturing data and executing queries; be able
to handle very high transaction volumes, often in a distributed environment; and
support flexible and dynamic data structures.
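The gather, filter, and clean steps of data acquisition can be sketched on a small scale. This is a minimal, single-machine illustration only. The two sources and their fields are hypothetical, and an in-memory SQLite table stands in for a data warehouse. It does not address the distributed, high-volume infrastructure described above.

```python
import sqlite3

# Hypothetical raw records gathered from two sources; field names are illustrative.
source_a = [{"id": "1", "temp_c": " 21.5 "}, {"id": "2", "temp_c": ""}]
source_b = [{"id": "3", "temp_c": "19.0"}, {"id": "1", "temp_c": "21.5"}]


def gather(*sources):
    """Gathering: combine records from all sources into one list."""
    return [rec for src in sources for rec in src]


def filter_records(records):
    """Filtering: drop records with a missing measurement."""
    return [r for r in records if r["temp_c"].strip()]


def clean(records):
    """Cleaning: trim whitespace, convert types, and drop duplicate ids."""
    seen, cleaned = set(), []
    for r in records:
        rid = int(r["id"])
        if rid in seen:
            continue
        seen.add(rid)
        cleaned.append((rid, float(r["temp_c"].strip())))
    return cleaned


def load(rows):
    """Loading: store cleaned rows in an in-memory SQLite table (warehouse stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, temp_c REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()
    return conn


conn = load(clean(filter_records(gather(source_a, source_b))))
print(conn.execute("SELECT id, temp_c FROM readings ORDER BY id").fetchall())
# [(1, 21.5), (3, 19.0)]
```

Only records that survive filtering and cleaning reach storage, so later analysis runs on consistent data.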
2. Data Analysis
3. Data Curation
Data curation is the active management of data over its life cycle to ensure it meets
the necessary data quality requirements for its effective usage.
Data curation processes can be categorized into different activities
such as content creation, selection, classification, transformation,
validation, and preservation.
Data curation is performed by expert curators who are responsible for
improving the accessibility and quality of data.
Data curators (also known as scientific curators or data annotators)
hold the responsibility of ensuring that data are trustworthy,
discoverable, accessible, reusable, and fit for their purpose.
A key trend in the curation of big data is the use of community and
crowdsourcing approaches.
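One curation activity listed above, validation, can be sketched as checking each record against quality rules and separating records that pass from those that do not. This is a hedged illustration. The required fields and the accepted year range are hypothetical rules, because real quality requirements depend on how the data will be used.

```python
from dataclasses import dataclass, field

# Hypothetical quality rules; real curation requirements depend on the data's purpose.
REQUIRED_FIELDS = ("title", "source", "year")
YEAR_RANGE = (1900, 2100)


@dataclass
class CurationReport:
    valid: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (record, problems) pairs


def validate(record):
    """Return the problems found in one record (an empty list means it is valid)."""
    problems = [f"missing {f}" for f in REQUIRED_FIELDS if not record.get(f)]
    year = record.get("year")
    if isinstance(year, int) and not (YEAR_RANGE[0] <= year <= YEAR_RANGE[1]):
        problems.append("year out of range")
    return problems


def curate(records):
    """Validation step: keep records that meet the rules and report the rest."""
    report = CurationReport()
    for rec in records:
        problems = validate(rec)
        if problems:
            report.rejected.append((rec, problems))
        else:
            report.valid.append(rec)
    return report


records = [
    {"title": "Rainfall 2020", "source": "station-7", "year": 2020},
    {"title": "", "source": "station-9", "year": 2021},
    {"title": "Old survey", "source": "archive", "year": 1850},
]
report = curate(records)
print(len(report.valid), len(report.rejected))  # 1 2
for rec, problems in report.rejected:
    print(rec["source"], problems)
# station-9 ['missing title']
# archive ['year out of range']
```

Rejected records are kept together with their problems rather than silently dropped, so a curator can review and correct them.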
4. Data Storage
Areas of Application of Big Data
Smarter Healthcare, Multi-channel Sales, Telecom, Homeland Security, Trading Analytics,
Traffic Control, Search Quality, and Manufacturing.
Big Data vs Data Science