Unit 1
Content
Data, Information, Knowledge
“386” on its own is a piece of data.
Big Data
It is also data, but of huge size.
Examples
Stock market
Social Media
Sensors
Web
Types of Big Data
Structured
Unstructured
Semi-Structured
Structured Data
Data that can be stored, accessed, and processed in a fixed format.
Unstructured Data
Any data with an unknown form or structure is classified as
unstructured data.
Semi-Structured Data
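Semi-structured data is commonly described as data that carries its own tags or keys but does not follow a fixed table schema; JSON and XML are typical examples. The sketch below, using made-up records, shows roughly how the same kind of fact might look in each of the three types. All names and values are invented for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (hypothetical data).
structured = io.StringIO("id,name,amount\n1,Asha,386\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing keys, but records may differ in shape.
semi_structured = json.loads(
    '[{"id": 1, "name": "Asha", "amount": 386},'
    ' {"id": 2, "name": "Ravi", "tags": ["new", "mobile"]}]'
)

# Unstructured: free text with no predefined fields.
unstructured = "Asha paid 386 yesterday; Ravi signed up from his phone."

# A fixed format lets us address a field directly by name.
amount_from_table = int(rows[0]["amount"])

# With semi-structured data a key may be missing, so look it up defensively.
amounts_from_json = [rec.get("amount") for rec in semi_structured]
```

Getting the amount out of the unstructured sentence would need text processing of some kind, which is the practical difference between the three types.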
Characteristics Of Big Data
Volume
Variety
Velocity
Variability
Volume
12+ TBs of tweet data every day
25+ TBs of log data every day
? TBs of data every day
30 billion RFID tags today (1.3B in 2005)
4.6 billion camera phones worldwide
100s of millions of GPS-enabled devices sold annually
2+ billion people on the Web by end 2011
76 million smart meters in 2009… 200M by 2014
Variety
Sources of data about our customer:
Social media
Banking and finance
Gaming
Entertainment
Purchases
Our known history
Velocity
Mobile devices
(tracking all objects all the time)
Some Make It 4 V’s
Growth of Big Data
Big Data Analytics
It is the complete process of collecting, organizing, and analyzing
huge sets of data (big data) to identify patterns and to extract
other useful information for making decisions.
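As a toy illustration of that process, the sketch below collects a few hypothetical purchase events, organizes them by customer, and extracts a simple pattern (the top spender) that could feed a decision. The event data, field names, and the discount threshold are all invented for the example; real big data analytics would run on far larger data with distributed tools.

```python
from collections import defaultdict

# Collect: hypothetical purchase events (customer, category, amount).
events = [
    ("c1", "books", 20.0),
    ("c2", "games", 35.0),
    ("c1", "games", 15.0),
    ("c3", "books", 5.0),
    ("c2", "games", 40.0),
]

# Organize: group spending by customer.
spend_by_customer = defaultdict(float)
for customer, _category, amount in events:
    spend_by_customer[customer] += amount

# Analyze: find the pattern of interest, here the highest spender.
top_customer = max(spend_by_customer, key=spend_by_customer.get)

# Decide: a rule that acts on the extracted information (assumed threshold).
offer_discount = spend_by_customer[top_customer] > 50.0
```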
example
content
Web data
Structured data: generated by humans or machines in airline reservation systems, inventory control systems, ERP systems, etc.
Unstructured data: generated by humans and machines through word processing, email clients, and tools for viewing or editing media.
Non-Relational database
Data processing
In-Memory Processing
Reporting Layer(visualization)
Cost reduction.
etc
It is versatile data and is used for predictive analysis, which is
why it is considered the most popular kind of big data.
But companies now have newly evolving big data sources, such
as data from web browsers, mobile applications, and social media
sites.
Feedback Behaviour
3. Embedded Processes
Analytic tools will often translate the logic from a model into
SQL for the user, or the user can code an SQL script.
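A minimal sketch of what such a translation might look like, assuming a simple linear scoring model with made-up coefficients and a hypothetical `customers` table. The helper `model_to_sql` is illustrative only and not part of any particular analytic tool. The generated SQL runs inside the database (here an in-memory SQLite database as a stand-in), so the data does not have to be moved out for scoring.

```python
import sqlite3

# Hypothetical linear model: score = intercept + sum(coef * column).
intercept = 1.0
coefficients = {"age": 0.5, "purchases": 2.0}


def model_to_sql(table, intercept, coefficients):
    """Translate the model logic into a SQL SELECT statement."""
    # Column names come from a trusted dict here; never build SQL
    # from untrusted input this way.
    terms = " + ".join(f"{coef} * {col}" for col, coef in coefficients.items())
    return f"SELECT id, {intercept} + {terms} AS score FROM {table} ORDER BY id"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, age REAL, purchases REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 30.0, 2.0), (2, 40.0, 0.0)],
)

# The model is applied by the database engine, row by row.
sql = model_to_sql("customers", intercept, coefficients)
scores = conn.execute(sql).fetchall()
conn.close()
```

Here `sql` is the statement a tool would generate on the user's behalf; a user coding the script by hand would write the same `SELECT` directly.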
Grid Computing
Analytical Process
3. Resource pooling
4. Rapid elasticity
5. Measured service
Public clouds
Private clouds
No need to buy a high-capacity system and then risk leaving half of
the capacity unused.
Once access is granted, users load their data and start analyzing.
Security of data
Data in the sandbox will have a limited life, i.e. build the data
needed for a project and delete it as soon as the project is done.
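One way to picture the limited-life rule, sketched with SQLite as a stand-in for the real platform (the table and column names are hypothetical): the project builds its working table from production data, uses it, and drops it when the project ends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Production data (hypothetical) that the sandbox is allowed to read.
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("south", 20.0), ("north", 5.0)],
)

# Build only the data the project needs, in a sandbox table.
conn.execute(
    "CREATE TABLE sandbox_region_totals AS "
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)
totals = dict(conn.execute("SELECT region, total FROM sandbox_region_totals"))

# Project done: delete the sandbox data right away.
conn.execute("DROP TABLE sandbox_region_totals")
remaining = [
    name
    for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
]
conn.close()
```

After the drop, only the production table is left, so the sandbox no longer holds space for a finished project.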
Simplicity
Analysis vs Reporting
Production data and all of the sandbox data are within the
production system, so it is easy to link resources.
The sandbox will use both space and CPU resources (potentially
a lot of resources).
Probability
Elements of Inference
Bayesian approach
SVM
Purpose
Task
Output
Delivery Value
Canned Reports
Can be accessed within the tools.
They are static, with fixed metrics and dimensions.
Dashboards may include data from various data sources and are
also usually fairly static.