CH1 - Big Data Introduction-En
Jeff Reed (2017), Data Analytics: Applicable Data Analysis to Advance Any Business Using
the Power of Data Driven Analytics
Over 90% of all data in the world was created in the past 2 years
- Huge volumes
The challenges:
- Capturing, transporting, and moving the data
- Managing the data, the hardware involved, and the software
- Processing: managing and programming to provide insight into the data
- Storing: safeguarding and securing the data
• Astronomy
• Government
• Problem:
Manage several petabytes of data growing at 40-100% per year
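The 40-100% growth figure compounds quickly. A minimal sketch of the arithmetic, assuming the rate is annual and using a hypothetical starting volume of 5 PB (the slide says only "several petabytes"):

```python
import math

def project_volume(initial_pb, annual_growth, years):
    """Volume in PB at the end of each year, compounding `annual_growth` yearly."""
    volumes = []
    volume = initial_pb
    for _ in range(years):
        volume *= 1 + annual_growth
        volumes.append(volume)
    return volumes

def doubling_time(annual_growth):
    """Years needed for the volume to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)

# Hypothetical starting volume of 5 PB; rates are the two ends of the 40-100% range.
for rate in (0.40, 1.00):
    projection = [round(v, 1) for v in project_volume(5.0, rate, 3)]
    print(f"{rate:.0%}/year: {projection} PB, doubling every {doubling_time(rate):.2f} years")
```

At the low end the volume doubles roughly every two years; at the high end it doubles every year, so storage sized for today's volume is outgrown quickly.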
regulators
Understanding the fraud pattern months after the fact is only partially helpful
Fraud detection models need to evolve faster
• If only Visa could …
Reinvent how to detect the fraud patterns
Stop new fraud patterns before they can rack up significant losses
Solution
Revolutionize the speed of detection
Visa loaded two years of test records (73 billion transactions, amounting to
36 terabytes of data) into Hadoop. The processing time fell from one month
with traditional methods to a mere 13 minutes.
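The Visa figures also imply a few useful back-of-the-envelope numbers. A minimal sketch, assuming decimal units (1 TB = 10**12 bytes) and a 30-day month; the source states neither:

```python
# Figures from the Visa example; unit conventions are assumptions noted inline.
TRANSACTIONS = 73e9          # 73 billion transactions
DATA_BYTES = 36e12           # 36 TB, assuming decimal units (1 TB = 10**12 bytes)
OLD_MINUTES = 30 * 24 * 60   # "one month", assumed to be 30 days
NEW_MINUTES = 13             # processing time with Hadoop

bytes_per_txn = DATA_BYTES / TRANSACTIONS
speedup = OLD_MINUTES / NEW_MINUTES
txn_per_second = TRANSACTIONS / (NEW_MINUTES * 60)

print(f"Average record size: about {bytes_per_txn:.0f} bytes")
print(f"Speedup: about {speedup:,.0f}x")
print(f"Throughput: about {txn_per_second:,.0f} transactions per second")
```

This gives roughly 500 bytes per transaction and a speedup of over 3,000x. The throughput is an average over the whole job, assuming the 13 minutes covers every record.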
Example: The U.S. produces 1.2 billion clinical care documents each year.
These documents contain information about a patient’s medical history,
doctor’s visits, hospital visits, previous treatments, procedures, test results and
prescription medications.
Legacy systems used to gain insights from internally generated data face high
storage costs, long data loading times, and long administration and
processing times…
• Problem:
Traffic congestion has increased worldwide as a result of increased
urbanization and population growth, which reduces the efficiency of
transportation infrastructure and increases travel time and fuel consumption.
Retailers want to use “big data” to predict trends, prepare for demand, pinpoint
customers, optimize pricing and promotions, and monitor real-time analytics and
results by combining data from web browsing patterns, social media, industry …
The Impact of Big Data on The Retail Sector: Examples And Use-Cases
https://www.datapine.com/blog/big-data-in-retail-examples/
• Path analysis
• Connectivity analysis
• Community analysis
• Centrality analysis
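The first three kinds of analysis can be sketched on a small, hypothetical graph (for example, customers linked by shared transactions) in plain Python. Community analysis usually relies on dedicated algorithms such as Louvain and is left out here; at scale, libraries such as NetworkX or distributed graph engines would replace this hand-written code.

```python
from collections import deque

# Hypothetical undirected graph stored as adjacency lists.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B"],
    "E": ["F"],
    "F": ["E"],
}

def shortest_path(graph, start, goal):
    """Path analysis: breadth-first search for a path with the fewest hops."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None  # goal is unreachable from start

def connected_components(graph):
    """Connectivity analysis: groups of nodes that can reach one another."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(component))
    return components

def degree_centrality(graph):
    """Centrality analysis: node degree divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}

print(shortest_path(GRAPH, "A", "D"))
print(connected_components(GRAPH))
print(degree_centrality(GRAPH))
```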
networks and processes were limited to one factory, but the boundaries of
individual factories will most likely disappear in favor of interconnected
networks of multiple factories or even geographical regions.
Industry 4.0
0 Flat files
1 Relational databases (RDBMSs) - 1970s - OLTP (Online Transaction Processing)
2 Data warehouses - 1990s - OLAP (Online Analytical Processing) or
DSS (Decision Support Systems) workloads
3 Big data - 2000s - batch, with a movement towards real-time
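The difference between OLTP and OLAP workloads can be seen on a single table. A minimal sketch using Python's built-in sqlite3 module and a hypothetical `sales` table; a real deployment would typically keep the two workloads on separate systems (an operational database and a data warehouse):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP-style workload: many small transactions, each touching a single row.
orders = [("North", 120.0), ("South", 80.0), ("North", 45.5), ("East", 200.0)]
for region, amount in orders:
    with conn:  # commits on success, rolls back if the statement fails
        conn.execute("INSERT INTO sales (region, amount) VALUES (?, ?)", (region, amount))

# OLAP-style workload: one analytical query that scans and aggregates many rows.
report = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
for row in report:
    print(row)
```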
How?
V1: Volume
• The main attribute of big data is its huge volume, collected from several
sources.
• Traditional database technology cannot meet the demand for effective data
• The information explosion has opened a wider range of opportunities in the
modern era.
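One common response to the volume problem, and the model behind the Hadoop processing in the Visa example, is to split the data into partitions, process each partition independently (map), and merge the partial results (reduce). A single-process sketch of the idea on a hypothetical word-count task; on a real cluster the map tasks run on different machines and a shuffle phase routes keys to reducers:

```python
from collections import Counter
from functools import reduce

def map_partition(lines):
    """Map step: count words within one partition, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def reduce_counts(a, b):
    """Reduce step: merge two partial results."""
    return a + b

def word_count(lines, num_partitions=3):
    # Split the input into partitions, as a distributed file system would.
    partitions = [lines[i::num_partitions] for i in range(num_partitions)]
    partials = [map_partition(p) for p in partitions]
    return reduce(reduce_counts, partials, Counter())

docs = ["big data is big", "data needs new tools", "big tools"]
result = word_count(docs)
print(result)
```

Because each partition is processed without looking at the others, adding machines lets the same program handle more data.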
V2: Variety
• Big data is collected and created in various formats and sources. It includes
• One of the main objectives of big data is to collect all this unstructured data
and analyze it using the appropriate technology.
• Data variety helps derive insights from different sets of samples, users,
and demographics.
It brings different perspectives to the same information.
It also allows analyzing and understanding the impact of different forms
and sources of data collection from a ‘larger picture’ point of view.
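Combining varied sources usually starts by mapping each format into one common record shape. A minimal sketch with hypothetical customer-rating data in structured (CSV), semi-structured (JSON), and unstructured (free text) form; the regular expression is a deliberately naive stand-in for real text processing:

```python
import csv
import io
import json
import re

# Hypothetical inputs describing the same kind of fact in three formats.
CSV_DATA = "customer,rating\nalice,5\nbob,2\n"
JSON_DATA = '[{"user": {"name": "carol"}, "stars": 4}]'
TEXT_DATA = "dave rated the product 3 out of 5"

def from_csv(text):
    """Structured: fixed columns, read row by row."""
    return [{"customer": r["customer"], "rating": int(r["rating"]), "source": "csv"}
            for r in csv.DictReader(io.StringIO(text))]

def from_json(text):
    """Semi-structured: nested fields with their own names."""
    return [{"customer": item["user"]["name"], "rating": int(item["stars"]), "source": "json"}
            for item in json.loads(text)]

def from_text(text):
    """Unstructured: naive pattern matching; real systems would need NLP."""
    match = re.search(r"(\w+) rated the product (\d+)", text)
    if not match:
        return []
    return [{"customer": match.group(1), "rating": int(match.group(2)), "source": "text"}]

records = from_csv(CSV_DATA) + from_json(JSON_DATA) + from_text(TEXT_DATA)
average = sum(r["rating"] for r in records) / len(records)
print(records)
print(average)
```

Once every source shares the same shape, a single analysis (here, an average rating) can run over all of them at once.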
V3: Velocity
• Speed is one of the key drivers of business success. Fast turnaround …
extent.
• Low velocity can hinder a business's decision-making, even when the data
itself is of high quality.
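Acting on data while it is still fresh usually means computing results over a moving time window instead of over a stored batch. A minimal sketch of a sliding-window event counter, assuming events arrive in timestamp order; the class name, window size, and timestamps are hypothetical:

```python
from collections import deque

class SlidingWindowCounter:
    """Counts events in the last `window` seconds (hypothetical helper class)."""

    def __init__(self, window):
        self.window = window
        self.events = deque()  # timestamps, oldest on the left

    def add(self, timestamp):
        """Record one event and return how many events fall in the current window."""
        self.events.append(timestamp)
        # Evict events at or before `timestamp - window`; assumes in-order arrival.
        while self.events and self.events[0] <= timestamp - self.window:
            self.events.popleft()
        return len(self.events)

# Hypothetical stream of event timestamps in seconds, e.g. card transactions.
counter = SlidingWindowCounter(window=5.0)
counts = [counter.add(t) for t in [0, 1, 2, 10, 10.5]]
print(counts)
```

Each `add` runs in amortized constant time and memory grows only with the number of events inside the window, which is what lets this style of computation keep up with a fast stream.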
V4: Veracity
• The vast amounts of data are very likely to contain some ambiguity.
• Big data has to be filtered for clean and pertinent information if it is to
provide insights that help the company grow. The input data should be
properly prepared, conformed, verified, and made consistent in order to
support reliable judgments.
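The preparation steps named above (prepare, conform, verify, make consistent) can be sketched as a small filter over raw records. A minimal sketch; the field names and validation rules are hypothetical, and real pipelines would add schema checks, logging of rejected rows, and source-specific rules:

```python
def clean(records):
    """Keep only records that pass basic checks, normalized to one format."""
    seen_ids = set()
    cleaned = []
    for rec in records:
        # Verify: required fields are present.
        if not rec.get("id") or rec.get("amount") in (None, ""):
            continue
        # Verify: the amount parses as a number and is not negative.
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue
        if amount < 0:
            continue
        # Conform: one casing and no stray whitespace for categorical fields.
        country = str(rec.get("country", "")).strip().upper()
        # Make consistent: keep only the first record for each id.
        if rec["id"] in seen_ids:
            continue
        seen_ids.add(rec["id"])
        cleaned.append({"id": rec["id"], "amount": amount, "country": country})
    return cleaned

raw = [
    {"id": "t1", "amount": "19.99", "country": " tn "},
    {"id": "t1", "amount": "19.99", "country": "TN"},   # duplicate id
    {"id": "t2", "amount": "abc", "country": "FR"},     # unparseable amount
    {"id": "", "amount": "5", "country": "FR"},         # missing id
    {"id": "t3", "amount": "-4", "country": "fr"},      # negative amount
    {"id": "t4", "amount": "7.5", "country": "fr"},
]
result = clean(raw)
print(result)
```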
• Case 1: Facebook
• Case 2: Skype
• Case 4: Jumia
Organization (WHO) and the Centers for Disease Control and Prevention
(CDC), need to monitor and predict disease outbreaks to take timely preventive
actions. Traditional methods of disease surveillance may not provide real-time
insights.
1- What constraints does a big data solution have to face?
2- Propose an architecture for a big data solution.
3- What are the expected benefits?
https://bugssiufam.wixsite.com/bugados/single-post/2016/08/18/OLTPOnline-Transaction-Processing-e-OLAPOnline-Analytical-Processing
MUST FSB, Anis Ben Aicha 34
Annex: Data professions
Data engineer profile (skills)
Data Responsibilities:
Designing and building data pipelines
Developing and maintaining data storage solutions
Data cleaning and preparation
Building data processing tools and scripts
Monitoring and performance optimization
Data Responsibilities
Deployment and monitoring
Software engineering and automation
Data engineering and infrastructure