Big Data Presentation
Big Data
The next frontier for innovation, competition, and productivity
Definition of Big Data
• This biomedical data is being used by many applications in research and personalized
medicine.
• With access to vast amounts of patient and population data, healthcare is enhancing
treatments, performing more effective research on diseases like cancer and Alzheimer’s,
developing new drugs, and gaining critical insights into patterns within population health.
• Research in this area is enabling the development of methods to analyze large-scale data
and develop solutions tailored to each individual, which are hence hypothesized to be more
effective.
• Data from sensors, organizations, and people (for example, Fitbit devices and mobile apps)
is growing significantly.
Types of data
• Structured: the data is very easy to process. Examples are Excel and regular DBMS
systems like Oracle and SQL Server.
• Semi-structured: there is some structure, but not as much as in structured data.
Examples are JSON and HTML.
• Unstructured: there is no structure, and the data comes from heterogeneous sources
containing a diverse combination of content in the form of mails and text.
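The difference between the three kinds of data can be illustrated with a minimal Python sketch. The sample values (names, ages, the email sentence) are invented for illustration, and the helper `summarize` is hypothetical; only the standard library is used.

```python
import csv
import io
import json

# Structured: fixed columns, like an Excel sheet or a relational table.
csv_text = "id,name,age\n1,Ana,34\n2,Ben,29\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing, but fields can vary from record to record.
json_text = '[{"id": 1, "name": "Ana", "tags": ["vip"]}, {"id": 2, "name": "Ben"}]'
records = json.loads(json_text)

# Unstructured: free text; any structure has to be inferred by the program.
email_body = "Hi team, the sensor on floor 2 reported 41 degrees at noon."
numbers = [int(tok) for tok in email_body.replace(".", " ").split() if tok.isdigit()]


def summarize():
    """Return what a program can see directly in each kind of data."""
    return {
        "structured_columns": list(rows[0].keys()),
        "semi_structured_keys": [sorted(r.keys()) for r in records],
        "unstructured_numbers": numbers,
    }
```

Every structured row has the same columns, the JSON records share only some keys, and the email text yields values only after ad hoc parsing.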
Characteristics of Big Data
• Volume
• Velocity
• Variety
• Veracity
• Valence
Volume (size)
• Volume refers to the vast amounts of data generated every
second, minute, hour, and day in our digitized world and
relates to the sheer size of big data. The size and the scale of
storage for big data can be massive.
• In business, the goal is to turn this much data into some form
of business advantage.
Variety (complexity)
• Variety refers to the ever-increasing number of different
forms that data can come in, such as text,
images, voice, and geospatial data.
develop a
and shortcomings in terms of performance
environment
nodes and clusters. The framework that
handles clusters is Hadoop.
NoSQL Databases
NoSQL (commonly referred to as "Not Only SQL") represents a completely different
framework of databases that allows for high-performance, agile processing of
information at massive scale with flexible data models. Many NoSQL databases, such
as document stores, keep information in JSON-like documents instead of the rows and
columns used by relational databases.
Types of NoSQL Databases
Key-value store
1) Riak
2) Redis
Document storage
1) MongoDB
2) CouchDB
Column-family store
1) Cassandra
2) HBase
Graph database
1) Neo4j
2) InfiniteGraph
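To make the four data models concrete, here is a toy sketch that imitates each one with plain in-memory Python structures. It is not the client API of any of the products listed; all keys, values, and the helper `neighbors` are hypothetical and exist only to show the shape of the data.

```python
# Key-value store (Riak, Redis style): an opaque value looked up by its key.
key_value = {"session:42": "user=ana;cart=3"}

# Document store (MongoDB, CouchDB style): nested JSON-like documents.
documents = {"u1": {"name": "Ana", "orders": [{"sku": "A7", "qty": 2}]}}

# Column-family store (Cassandra, HBase style): row key -> family -> columns.
column_family = {"u1": {"profile": {"name": "Ana"}, "stats": {"logins": 17}}}

# Graph database (Neo4j style): nodes plus labeled edges between them.
nodes = {"ana", "ben", "cy"}
edges = [("ana", "FOLLOWS", "ben"), ("ben", "FOLLOWS", "cy")]


def neighbors(node, label):
    """Return the nodes reached from `node` over edges carrying `label`."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]
```

The key-value store can only fetch a whole value by key, the document and column-family models allow reaching into nested fields, and the graph model is organized around relationships, which is what `neighbors("ana", "FOLLOWS")` traverses.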
Hadoop
Introduction to Hadoop
Hadoop is an open-source software framework used for storing data and running
applications on clusters of commodity hardware. It provides distributed storage and
parallel processing of big data sets.
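The idea of splitting data into blocks, processing the blocks in parallel, and merging the partial results can be sketched on a single machine. This is only an illustration of the pattern, assuming threads as a stand-in for cluster nodes; it does not use Hadoop or its APIs, and the function names and the default block size are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(lines, block_size):
    """Cut the input into fixed-size blocks, as distributed storage would."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def count_words(block):
    """Process one block independently of all the others."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts


def word_count(lines, block_size=2, workers=4):
    """Count words by processing blocks in parallel, then merging the results."""
    blocks = split_into_blocks(lines, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, blocks))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

Because each block is counted without reference to the others, the work could in principle be spread over as many workers as there are blocks; only the final merge needs to see all partial results.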
With so many frameworks and tools available, how do we learn what they do?
Is it a viable option?