Basic Description of Big Data as a Distributed Processing and Storage Strategy
▪ @casjorge1967
Human power evolution
Agricultural society → Industrial society → Information society
Information society evolution
▪ 20 years ago: PC optional, Internet optional
▪ 15 years ago: PC mandatory, Internet optional
▪ Today: PC mandatory, Internet optional, Cloud optional
▪ In 5 years: PC mandatory, Internet mandatory, Cloud mandatory, Development optional
▪ In 10 years: PC mandatory, Internet mandatory, Cloud mandatory, Development mandatory
A new software definition
Good Ideas
• Processes • Automation • Classifying • Order
Cloud
• PaaS/FaaS • Code • IaaS • Big Data
Big Data
• Service • Mass passing • Economy of scale
Software: A new definition
▪ In his 2016 book “The Seventh Sense”, Joshua Cooper Ramo argues that access to
reliable translation algorithms is more important than the ability to speak
different languages. Being a polyglot will therefore become an archaic specialty,
since machines will take care of that. Power, in other words, lies not with those
who master more languages or speak better English, but with those who control the
systems and algorithms that produce better translations. For Ramo, what will
replace English will not be Chinese, Spanish, German, or Arabic, but algorithms
and protocols.
▪ As early as 1970, Joseph Weizenbaum declared that "anyone with a reasonably
ordered mind can become a good programmer, but it requires maturity to tolerate
the long time between an effort and something that shows success." The age of
connectivity will create a new caste. This caste does not involve training
millions of programmers. It is instead a group of people with strong technical
skills who design and control systems and protocols and who, with a historical
and political understanding, can influence collective thinking.
Automation Pros & Cons
▪ 45% of the activities people are currently paid to perform are affected, if
not displaced, by automation in one way or another. The study covered more than
800 occupations and 2,000 activities across 3 major capabilities (social,
cognitive, and physical). The conclusion, while not hopeless given that
automation frees human time for reinvention, is blunt: no activity is immune to
automation.
McKinsey, "Four Fundamentals of Workplace Automation", Nov 2015
▪ Quill (an intelligent narrative program) is capable of analyzing data,
generating natural language, and writing reports without the user even
suspecting that a machine is serving them.
1. Introduction
Provocative quotes about big data
“Any enterprise CEO really ought to be able to ask a question that involves
connecting data across the organization, be able to run a company effectively,
and especially to be able to respond to unexpected events. Most organizations
are missing this ability to connect all the data together.” – Tim Berners-Lee
“Information is the oil of the 21st century, and analytics is the combustion engine.”
▪“Without big data, you are blind and deaf in the middle of a freeway.”
▪“The world is one big data problem.” – Andrew McAfee
▪“In God we trust. All others must bring data.” – W. Edwards Deming
Big Data Features
▪ Big data becomes the feasible option where traditional technologies are not,
due to challenges in the 5 Vs.
▪ Variability is also considered a feature by some analysts, reinforcing the
concept of challenges in schemas that vary with time.
Big Data Quoted features
Volume: “There were 5 exabytes of information created between the dawn of
civilization through 2003, but that much information is now created every
2 days.” – Eric Schmidt, of Google, 2010
Marissa Mayer
Big Data: Prominent real use cases
Health retail: Walgreens
• At Walgreens, clinicians at in-store health clinics use big data to deliver
advanced analytics at the point of care, so they can better assess patient
conditions and provide recommendations that improve overall health and avoid
future medical costs. Over 7.5 billion medical events for 100 million people
power the big data system with information like demographics, enrollment,
diagnoses, procedures, and data from managed-care plans.
Airlines: Delta
• Delta has used big data to help with one of the most uncomfortable travel
situations that exists: lost baggage. With over 130 million bags checked per
year, the company held a lot of tracking data about bags and became the first
major airline to allow customers to track their bags from mobile devices. To
date, the app has been downloaded over 11 million times. It gives customers much
greater peace of mind while traveling and differentiates Delta as a
customer-centric company.
Automotive: Tesla
• Tesla excels at instrumenting vehicles with sensors and sending all the data
back to the mother ship for analysis, using an Apache Hadoop® cluster to collect
the data. The data is used to improve the company’s R&D, car performance, car
maintenance, and customer satisfaction. The company is notified if a car is not
functioning properly, and consumers can be advised to get it serviced.
Logistics: UPS
• UPS makes 16.9 million package and document deliveries every day and ships
over 4 billion items per year using almost 100,000 vehicles. One of its big data
applications is fleet optimization. On-truck telematics and advanced algorithms
help with routes, engine idle time, and predictive maintenance. Since starting
the program, the company has saved over 39 million gallons of fuel and avoided
driving 364 million miles.
Telecommunications: Sprint
• Sprint spoke about using big data analytics to improve quality and customer
experience while reducing network error rates and customer churn. They handle
tens of billions of transactions per day for 53 million users, and their big
data analytics put real-time intelligence into the network, driving a 90%
increase in capacity.
2. Concepts
Basic terminology
Understanding traditional technology helps to comprehend the paradigm shift
that Big Data has produced.
From traditional to Big Data concepts
Traditional/concepts
• Dataset: Collections of related data where each member shares the same attributes.
• Data analysis: Process of examining data to find facts, relations, patterns, insights or trends.
• BI: Process of gaining insight into an enterprise's workings to improve decision making.
• Advanced visual tools: User-friendly tools for descriptive, diagnostic, predictive & prescriptive analytics.
BigData/concepts: Analysis types
• Descriptive analysis: Responds to “what” questions. It is static and usually draws on OLTP systems.
• Diagnostic analysis: Responds to “why” questions. It is interactive and used in OLAP systems.
• Predictive analysis: Responds to “what-if” questions. It usually has its own visualization tool.
• Prescriptive analysis: Responds to “which” questions. It relies on the simulation of various scenarios.
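As a rough illustration of the four analysis types, the sketch below runs each one over a tiny, made-up sales dataset. The records, the function names, the linear-trend model, and the scenario uplift factors are all assumptions chosen for illustration; real tools use far richer techniques.

```python
from collections import defaultdict

# Hypothetical dataset: every record shares the same attributes.
SALES = [
    {"month": 1, "region": "north", "units": 100},
    {"month": 1, "region": "south", "units": 80},
    {"month": 2, "region": "north", "units": 110},
    {"month": 2, "region": "south", "units": 60},
    {"month": 3, "region": "north", "units": 120},
    {"month": 3, "region": "south", "units": 40},
]

def descriptive(records):
    """What happened? Total units sold per month."""
    totals = defaultdict(int)
    for r in records:
        totals[r["month"]] += r["units"]
    return dict(totals)

def diagnostic(records, first, last):
    """Why did it happen? Change in units per region between two months."""
    change = defaultdict(int)
    for r in records:
        if r["month"] == first:
            change[r["region"]] -= r["units"]
        elif r["month"] == last:
            change[r["region"]] += r["units"]
    return dict(change)

def predictive(totals, month):
    """What is likely next? Least-squares linear trend extrapolated to `month`."""
    xs = list(totals)
    ys = [totals[x] for x in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y + slope * (month - mean_x)

def prescriptive(forecast, scenarios):
    """Which option is best? Simulate each scenario and pick the best outcome."""
    outcomes = {name: forecast * uplift for name, uplift in scenarios.items()}
    return max(outcomes, key=outcomes.get), outcomes

totals = descriptive(SALES)
regional_change = diagnostic(SALES, first=1, last=3)
forecast = predictive(totals, month=4)
best_action, outcomes = prescriptive(forecast, {"discount": 1.2, "ads": 1.1})
```

Each step feeds the next: the descriptive totals show the decline, the diagnostic breakdown points to the region behind it, the forecast extends the trend, and the prescriptive step compares simulated actions.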
For more than 20 years, databases have revolved around relational database
systems based on ACID properties, a standard query language, and a single schema
for developing and maintaining enterprise solutions.
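To make the ACID idea concrete, here is a minimal sketch of atomicity using Python's built-in `sqlite3` module. The `accounts` table, the account names, and the amounts are hypothetical. The transfer credits the destination first, so a failed debit shows that the whole transaction is rolled back, including the credit.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        # The connection context manager commits on success and
        # rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "bob", "alice", 500)  # debit fails, credit is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After both calls the balances reflect only the successful transfer, which is the all-or-nothing behavior that the "A" in ACID refers to.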
Latest-generation internet businesses and trends: social networks, powerful
search engines, IoT, smart cities, cloud computing, data deluge, digital
transformation, lower costs, …
BIG DATA ENABLERS:
• Analytics & Data Science
• Digitization
• Affordable Technology & Commodity Hardware
• Social Media
• Hyper-Connected Communities & Devices
• Cloud Computing
• Business Intelligence evolution
Traditional concepts
CAP Theorem
• Consistency
• Availability
• Partition Tolerance
ACID
• Atomicity
• Consistency
• Isolation
• Durability
BASE
• Basically Available
• Soft state
• Eventual consistency
PROS / CONS: Limited and precise scope in structured data
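The BASE properties can be sketched with a toy replicated store. This is an illustrative model only: the `Replica` class, the version numbers, and the `anti_entropy` function are assumptions, and real systems use more careful conflict resolution than the simple last-write-wins rule shown here.

```python
class Replica:
    """One node holding key -> (version, value); a hypothetical toy model."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        # Last write wins: keep the entry with the highest version.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

def anti_entropy(replicas):
    """Background sync: push every replica's entries to all the others."""
    for source in replicas:
        for key, (version, value) in list(source.data.items()):
            for target in replicas:
                target.write(key, value, version)

replicas = [Replica() for _ in range(3)]

# Basically available: a single node accepts the write immediately.
replicas[0].write("cart", ["book"], version=1)

# Soft state: another node has not seen the write yet.
stale = replicas[2].read("cart")

# Eventual consistency: after synchronization all nodes agree.
anti_entropy(replicas)
converged = [r.read("cart") for r in replicas]
```

An ACID system would refuse to answer with stale data. A BASE system accepts the temporary disagreement in exchange for availability.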
When to use Big Data?
• Volume
• Velocity
• Variety
3. Big data mechanisms
Distributed processing & storage technologies
New approaches to solutions
Clusters in an inexpensive way (commodity HW and open source SW)
• Processing: Parallel, Distributed
• Storage: RDBMS, NoSQL, NewSQL
New Processing
Processing engine
• Responsible for processing data based on a predefined logic.
Resource Manager
• Schedules & prioritizes requests according to individual processing workloads.
Coordination engine
• Ensures operational consistency across all servers, supporting distributed
locks, queues, and asynchronous communications.
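The resource manager's role of scheduling and prioritizing requests can be sketched with a priority queue. The `ResourceManager` class, the slot model, and the job names below are hypothetical and do not reflect the design of any particular product.

```python
import heapq
import itertools

class ResourceManager:
    """Toy scheduler: lower priority number runs first; ties keep arrival order."""

    def __init__(self, slots):
        self.slots = slots                  # free processing slots in the cluster
        self._queue = []                    # heap of pending requests
        self._arrival = itertools.count()   # tie-breaker for equal priorities

    def submit(self, name, priority, slots_needed):
        heapq.heappush(
            self._queue, (priority, next(self._arrival), name, slots_needed)
        )

    def schedule(self):
        """Start queued requests in priority order while free slots allow.

        Requests that do not fit stay queued; smaller, lower-priority
        requests may start ahead of them (simple backfilling).
        """
        started, deferred = [], []
        while self._queue:
            item = heapq.heappop(self._queue)
            _, _, name, needed = item
            if needed <= self.slots:
                self.slots -= needed
                started.append(name)
            else:
                deferred.append(item)
        for item in deferred:
            heapq.heappush(self._queue, item)
        return started

rm = ResourceManager(slots=4)
rm.submit("nightly-batch", priority=5, slots_needed=3)
rm.submit("fraud-alerts", priority=1, slots_needed=2)
rm.submit("ad-hoc-query", priority=3, slots_needed=2)
started = rm.schedule()
```

Here the two higher-priority requests consume all four slots, so the batch job waits in the queue until capacity is released.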
4. NoSQL devices
Distributed storage technologies
New Storage Concepts
Big Data storage options:
• DFS: Distributed File Systems
• KVS: Key Value Systems
• Graph DB
• RDBMS
RDBMS
• Good fit for transactional workloads and generally runs on a single node. Does
not provide out-of-the-box redundancy or fault tolerance. Scales vertically.
Some proprietary solutions are distributed databases (DDB), but they rely on
shared storage, which is a single point of failure.
NewSQL
• Combines the ACID properties of RDBMS with the scalability and fault tolerance
of NoSQL. It supports SQL for data definition and data manipulation.
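In contrast with a single-node RDBMS, a key-value system typically spreads data across nodes and keeps copies for fault tolerance. The sketch below is a minimal, in-memory illustration of hash partitioning with replication. The `KeyValueCluster` class and its parameters are assumptions, not the API of any real product.

```python
import hashlib

class KeyValueCluster:
    """Toy key-value system: hash partitioning plus replication across nodes."""

    def __init__(self, node_count, replicas=2):
        self.nodes = [dict() for _ in range(node_count)]  # one dict per node
        self.replicas = replicas

    def owners(self, key):
        """Return the node indexes that store `key` (primary first)."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        first = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return [(first + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, value):
        for index in self.owners(key):
            self.nodes[index][key] = value

    def get(self, key, failed=()):
        """Read from the first owner that is not in the `failed` set."""
        for index in self.owners(key):
            if index not in failed:
                return self.nodes[index].get(key)
        raise RuntimeError("all replicas for this key are unavailable")

cluster = KeyValueCluster(node_count=4, replicas=2)
cluster.put("user:42", {"name": "Ana"})

primary = cluster.owners("user:42")[0]
value_after_failure = cluster.get("user:42", failed={primary})
copies = sum("user:42" in node for node in cluster.nodes)
```

Adding nodes increases capacity (horizontal scaling), and the second copy keeps the key readable when the primary node is down.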
5. Processing fundamentals
Basic algorithms for distributed processing
New approaches to solutions: Basic processing examples
Processing
• Batch: MapReduce job, BSP (Bulk Synchronous Parallel) job
• Real time: ESP (Event Stream Processing), CEP (Complex Event Processing)
ESP: Event Stream Processing
• Single source events
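A MapReduce job can be sketched in a few lines with the classic word-count example. This single-process version only mimics the map, shuffle, and reduce phases. The function names are illustrative, and a real engine would run the phases in parallel across a cluster.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)

def map_reduce(documents):
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

counts = map_reduce(["big data big clusters", "data moves to clusters"])
```

Because each document is mapped independently and each key is reduced independently, both phases can be distributed across many nodes.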
Full complex big data solution example
Simple macro steps for designing a Big Data solution
Mapping technologies
HADOOP WORLD
MECHANISM              PRODUCT
DFS                    HDFS
MAP REDUCE             APACHE MAP REDUCE, COUCHBASE, COUCHDB, INFINISPAN
COORDINATION ENGINE    ZOOKEEPER, CONSUL, DOOZERD, ETCD
QUERY ENGINE           HIVE, PIG
ANALYTICS ENGINE       MAHOUT
WORKFLOW ENGINE        OOZIE
DATA TRANSFER ENGINE   FLUME (EVENTS), SQOOP (RDBMS), SCRIBE (FILES)
                       NUODB, INNODB
Become a member of Meetup Big Data Colombia
Sign up at www.softy365.com
Thank you very much for your time
If you have any questions about this document, please don’t hesitate to contact me at:
▪ @casjorge1967