
Hello there!

Big Data
The next frontier for innovation, competition, and productivity
Definition of Big Data

• Big data refers to datasets whose size is beyond the ability of traditional data management software to capture, store, manage, and analyze.

• Big data refers to large, diverse information sets that grow at ever-increasing rates. It is commonly characterized by the 3 V's: volume, velocity, and variety.

• The definition can vary by sector, depending on what kinds of software tools are commonly available and what sizes of datasets are common in a particular industry.
There are four broad ways in which using big data can create value.

• Big data can unlock significant value by making information transparent and usable at a much higher frequency.

• As organizations create and store more transactional data in digital form, they can collect more accurate and detailed performance information on everything from product inventories to sick days, and therefore expose variability and boost performance.

• Big data allows narrower segmentation of customers and therefore much more precisely tailored products or services.

• Sophisticated analytics gradually improve decision-making.


Sentiment analysis and personalized marketing

• Always-connected, always-on customers help businesses gain insights about their target customers.

• Personalized marketing: Amazon keeps track of things I have been looking at, which allows them to personalize what they show me. This in turn leads to better-met consumer expectations and happier customers.

• Sentiment analysis gauges the feelings around events and products, analyzing the general opinion of a person or the public about a product.

• Businesses including Walmart and Target can leverage this technology and use this information to personalize their communication and to make better-informed decisions.
Saving lives with big data

• With rapid advances in genome sequencing technology, the life sciences industry is experiencing an enormous growth in biomedical big data.

• This biomedical data is being used by many applications in research and personalized medicine.

• With access to vast amounts of patient and population data, healthcare is enhancing treatments, performing more effective research on diseases like cancer and Alzheimer's, developing new drugs, and gaining critical insights into patterns within population health.

• Research in this area is enabling the development of methods that analyze large-scale data to produce solutions tailored to each individual, which are hypothesized to be more effective.

• Data from sensors, organizations, and people, such as Fitbit devices and mobile apps, is growing significantly.
Types of data

• Structured: The data is very easy to process. Examples are Excel files and regular DBMS systems like Oracle and SQL Server.

• Semi-structured: There is some structure, but not as rigid as structured data. Examples are JSON and HTML.

• Unstructured: There is no predefined structure; the data comes from heterogeneous sources containing a diverse combination of forms such as emails and free text.
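The three types above can be contrasted in a few lines of Python. The records here (names, email address) are invented for illustration:

```python
import json

# Structured: a fixed schema, like a row in a relational table.
row = {"id": 1, "name": "Alice", "age": 30}

# Semi-structured: JSON carries tags and nesting, but the shape is flexible.
doc = json.loads('{"id": 2, "name": "Bob", "contacts": {"email": "bob@example.com"}}')

# Unstructured: free text with no schema; any structure must be inferred.
email_body = "Hi team, the Q3 report is attached. Thanks, Bob"

# Structured and semi-structured fields can be addressed directly;
# the unstructured text cannot.
email = doc["contacts"]["email"]
```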
Characteristics of Big Data

Volume, Velocity, Variety, Veracity, and Valence
Volume (size)

• Volume refers to the vast amounts of data generated every second, minute, hour, and day in our digitized world, and relates to the sheer size of big data. The size and scale of storage for big data can be massive.

• This brings additional challenges such as networking, bandwidth, and the cost of storing data. As the volume increases, performance and cost start becoming a challenge.

• Businesses therefore need a holistic strategy for processing large-scale data to their benefit in the most cost-effective manner.

• In business, the goal is to turn this much data into some form of business advantage.
Variety (complexity)

• Variety refers to the ever-increasing different forms that data can come in, such as text, images, voice, and geospatial data.

• It refers to increased diversity, and therefore to increased complexity.

• A satellite image of wildfires from NASA is very different from tweets sent out by people who are seeing the fire spread.

• Email, for example, is a hybrid entity: structured headers attached to an unstructured body.

• Sometimes we also distinguish qualitative versus quantitative measures.
Velocity

• Velocity refers to the speed at which data is being generated, the pace at which it moves from one point to the next, and the rate at which it needs to be stored and analyzed.

• Being able to keep up with the velocity of big data and analyze it as it is generated can even impact the quality of human life.

• Sensors and smart devices monitoring the human body can detect abnormalities in real time and trigger immediate action, potentially saving lives.

• Streaming data gives information on what's going on right now. Streaming data has velocity, meaning it gets generated at various rates, and analyzing such data in real time gives the agility and adaptability needed to maximize the benefits you want to extract.
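The real-time monitoring idea above can be sketched as a rolling statistic computed as each reading arrives, rather than after the whole dataset is collected. The sensor values below are made up for illustration:

```python
from collections import deque

def rolling_mean(stream, window=3):
    """Yield the mean of the last `window` readings as each new one arrives."""
    buf = deque(maxlen=window)
    for reading in stream:
        buf.append(reading)
        yield sum(buf) / len(buf)

# Simulated heart-rate sensor stream (invented values).
readings = [72, 75, 71, 120, 118]
means = list(rolling_mean(readings))
# A sudden jump in the rolling mean could trigger an alert in real time.
```

The key design point is that `rolling_mean` is a generator: it emits a result per reading instead of waiting for the stream to end, which is what "analyzing data as it gets generated" requires.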
Veracity (quality) and Valence (connectedness)

• Veracity refers to the biases, noise, and abnormality in data, and to how correct the data is.

• Better yet, it refers to the often unmeasurable uncertainties in the truthfulness and trustworthiness of data.

• Businesses need to connect and correlate relationships, hierarchies, and multiple data linkages; otherwise, their data can quickly spiral out of control.

• Valence refers to the connectedness of big data, in the form of graphs.

• The most important aspect of valence is that data connectivity increases over time.

• An increase in valence can lead to denser, more complex networks of connections, which are harder to process and analyze.
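One common reading of valence is the fraction of data-item pairs that are actually connected, out of all pairs that could be connected (i.e. graph density). A minimal sketch, with an invented four-item graph:

```python
from itertools import combinations

def valence(items, edges):
    """Fraction of item pairs that are actually connected (graph density)."""
    possible = len(list(combinations(items, 2)))
    return len(edges) / possible if possible else 0.0

# Illustrative graph: 4 items, 2 connections out of 6 possible pairs.
items = ["a", "b", "c", "d"]
edges = {("a", "b"), ("b", "c")}
density = valence(items, edges)
```

As more connections form over time, `edges` grows while the item count stays fixed, so the valence value rises toward 1.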
Two ways you can develop a system in a big data environment

• Monolithic architecture (supercomputers). The drawbacks are that it requires a lot of tuning and has shortcomings in terms of performance.

• Distributed architecture, where the data is distributed across nodes and clusters. The framework which handles such clusters is Hadoop.
NoSQL Databases

NoSQL (commonly referred to as "Not Only SQL") represents a completely different framework of databases that allows for high-performance, agile processing of information at massive scale with flexible data models. NoSQL database technology stores information in JSON documents instead of the columns and rows used by relational databases.
Types of NoSQL Databases

• Key-value store: Riak, Redis
• Column-family store: Cassandra, HBase
• Document store: MongoDB, CouchDB
• Graph database: Neo4j, InfiniteGraph
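The key-value and document-store ideas can be sketched with a tiny in-memory class. This is not a real NoSQL client; stores such as Redis or MongoDB expose similar put/get semantics over the network, at massive scale:

```python
import json

class DocStore:
    """Toy document store: keys map to schemaless JSON documents."""

    def __init__(self):
        self._docs = {}

    def put(self, key, doc):
        # Serialize to a JSON document, as document databases do.
        self._docs[key] = json.dumps(doc)

    def get(self, key):
        return json.loads(self._docs[key])

store = DocStore()
store.put("user:1", {"name": "Alice", "tags": ["admin", "beta"]})
store.put("user:2", {"name": "Bob"})  # no fixed schema: fields can differ
```

Note the contrast with a relational table: the two documents under `user:1` and `user:2` carry different fields, and no schema change was needed.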
Hadoop

Introduction to Hadoop

• Hadoop is an open-source software framework used for storing data and running applications on clusters of commodity hardware, providing distributed storage and parallel processing of big data sets.

• The platform works by distributing big data and analytics jobs across the nodes of a computing cluster, breaking them down into smaller workloads that can be run in parallel.
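The "break jobs into smaller workloads" idea is Hadoop's MapReduce model. A single-process sketch of the map and reduce phases of the classic word-count job; in real Hadoop, the map tasks run in parallel on different cluster nodes and a shuffle step routes each word to one reducer:

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word occurrence."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def reduce_phase(pairs):
    """Shuffle + reduce: group the pairs by word and sum the counts."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Illustrative input split into "lines", as HDFS splits files into blocks.
lines = ["big data big ideas", "data beats opinion"]
word_counts = reduce_phase(map_phase(lines))
```

Because each line is mapped independently, the map phase parallelizes trivially, which is exactly why the pattern scales across commodity hardware.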
Hadoop Ecosystem

With so many frameworks and tools available, how do we learn what they do?
Is it a viable option?

As big data continues down its path of growth, there is no doubt that these innovative approaches, utilizing NoSQL database architecture and Hadoop software, will be central to allowing companies to reach their full potential with data. Additionally, this rapid advancement of data technology has sparked a rising demand to hire the next generation of technical talent who can build up this powerful infrastructure. The cost of the technology and the talent may not be cheap, but for all of the value that big data is capable of bringing to the table, companies are finding that it is a very worthy investment.
How to download Hadoop on your console

• Download and install VirtualBox.
• Download and install the Cloudera Virtual Machine (VM) image.
• Launch the Cloudera VM.

Hardware requirements: (A) 64-bit quad-core processor (VT-x or AMD-V support recommended); (B) 8 GB RAM; (C) 20 GB free disk space.
Conclusion

• The availability of big data, low-cost commodity hardware, and new information management and analytic software have produced a unique moment in the history of data analysis.

• The convergence of these trends means that, for the first time in history, we have the capabilities required to analyze astonishing data sets quickly and cost-effectively.

• These capabilities are neither theoretical nor trivial. They represent a genuine leap forward and a clear opportunity to realize enormous gains in terms of efficiency, productivity, revenue, and profitability.

• And what did we learn?
Thank you!
