
Big Data: Submitted By-Rajashree Rashmita Reg - No-1825209016 Mca 4 Sem

This document defines and discusses big data. It begins by explaining that big data refers to extremely large data sets that cannot be processed by traditional data processing applications. It then provides examples of big data sources like Facebook, YouTube, and Google. The document outlines some key characteristics of big data, such as its large volume, high velocity, variety of data types, and sometimes unreliable quality. It also discusses common tools used to analyze big data like Hadoop and the benefits and future growth of big data.

Uploaded by

Rajsree Rasmita

BIG DATA

Submitted by-Rajashree Rashmita


Reg.no-1825209016
MCA 4th sem
Content
 Introduction
 What is BIG DATA
 Examples of BIG DATA
 Characteristics of BIG DATA
 The structure of BIG DATA
 Why BIG DATA
 How it is different
 BIG DATA sources
 Tools used in BIG DATA
 Application of BIG DATA
 Risks of BIG DATA
 Benefits of BIG DATA
 How BIG DATA impacts on IT
 Future of BIG DATA
 Conclusion
Introduction
 BIG DATA is a term for data sets so large or complex that
traditional data processing applications are inadequate.
 Working with BIG DATA basically consists of capturing, storing,
sharing, transferring, analyzing, visualizing and querying data,
while protecting information privacy.
What is BIG DATA
Big Data is, quite literally, very big data: a collection of large datasets that cannot be
processed using traditional computing techniques. Big data is not merely data;
it has become a complete subject that involves various tools, techniques
and frameworks.
Examples of BIG DATA
Examples: Facebook, Flickr, YouTube, Twitter, Google and
Google News. Facebook reports 2.5 billion
content items, 105 terabytes of data each half
hour, and 300M photos and 4M videos posted per day.
Twitter has over 651 million users, generating over
6,000 tweets per second. 300 hours of video are
uploaded to YouTube every minute, with more than
1 trillion video views. Google has indexed 50 billion
pages and handles more than 2.4 million queries
every minute. Google News carries articles from over
10,000 sources in real time. More than 4.5 million
photos are uploaded to Flickr each day. It is estimated
that all the global data generated from the
Characteristics of BIG DATA
Volume
 The quantity of data generated and stored every second.
 Here we are talking about zettabytes or more.
 It is the task of big data tools to convert such data, for example
data held in Hadoop, into valuable information.
 Data is generated by machines, networks and human
interaction on systems like social media.
 The volume of data to be analyzed is massive.
Velocity
 The speed at which data is generated and, in some cases,
acted upon.
 The highest-velocity data normally streams directly into memory
rather than being written to disk.
 Some Internet of Things (IoT) applications require real-time
evaluation and action.
 E.g., almost 2.5 million queries are performed on Google.
 Around 20 million photos are viewed.
 Every minute we upload 100 hours of video to YouTube.
 300,000 tweets are sent.
 Every minute over 200 million emails are sent.
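The point that high-velocity data is processed in memory rather than written to disk first can be illustrated with a minimal sketch. It assumes a hypothetical stream of event timestamps (in seconds) and keeps only the most recent 60 seconds in memory. The class `SlidingWindowCounter` and its methods are illustrative names, not part of any streaming framework.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen in the last `window_seconds`, entirely in memory."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._timestamps = deque()  # event times, oldest first

    def add(self, timestamp: float) -> None:
        """Record one event; timestamps are assumed to arrive in order."""
        self._timestamps.append(timestamp)
        self._evict(timestamp)

    def count(self, now: float) -> int:
        """Return how many events fall inside the window ending at `now`."""
        self._evict(now)
        return len(self._timestamps)

    def _evict(self, now: float) -> None:
        # Drop events at or before the cutoff so memory use stays bounded.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()


counter = SlidingWindowCounter(window_seconds=60)
for t in [0, 10, 20, 65, 70]:  # hypothetical event times in seconds
    counter.add(t)
print(counter.count(now=70))  # the events at 20, 65 and 70 remain
```

Old events are discarded as new ones arrive, so the memory footprint depends on the window size and the event rate, not on the total amount of data ever seen.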
Variety
 BIG DATA is not just numbers, dates and strings. BIG
DATA is also 3D data, geospatial data, audio and video, and
unstructured text, including log files and social media.
 Traditional database systems were designed to address
smaller volumes of structured data, fewer updates or a
predictable, consistent data structure.
 BIG DATA includes different types of data.
Veracity
 It is an extension of the definition of BIG DATA that refers to
data quality and data value.
 The quality of captured data can vary greatly, affecting
the accuracy of analysis.
 Data quality is unreliable.
 Data comes from uncontrolled environments.
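One simple way to put a number on data quality is to measure what fraction of records pass a basic validity check. The sketch below is only an illustration: the record fields (`user`, `amount`) and the validity rules are assumptions made for the example, not taken from any real data set.

```python
def is_valid(record: dict) -> bool:
    """A record is valid if it has a non-empty user and a non-negative numeric amount."""
    user = record.get("user")
    amount = record.get("amount")
    if not isinstance(user, str) or not user.strip():
        return False
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        return False
    return amount >= 0


def quality_ratio(records: list) -> float:
    """Fraction of records that pass the validity check (0.0 for empty input)."""
    if not records:
        return 0.0
    return sum(is_valid(r) for r in records) / len(records)


records = [
    {"user": "alice", "amount": 12.5},
    {"user": "", "amount": 3},          # missing user
    {"user": "bob", "amount": "n/a"},   # amount is not a number
    {"user": "carol", "amount": 7},
]
print(quality_ratio(records))  # 2 of the 4 records are valid
```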
The Structure of big data
Nowadays, extended lists of 8 Vs and 10 Vs are also used.
Why BIG DATA
 Growth of BIG DATA is needed.
 Increase of storage capacities.
 Increase of processing power.
 Availability of data (different data types).
 Every day we create 2.5 quintillion bytes of data; 90% of the
data in the world today has been created in the last 2 years
alone.
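To put the 2.5 quintillion bytes per day figure into more familiar units, here is a short worked calculation. It assumes decimal (SI) units, where 1 TB is 10**12 bytes, 1 EB is 10**18 bytes and 1 ZB is 10**21 bytes.

```python
BYTES_PER_DAY = 2.5e18            # 2.5 quintillion bytes, the figure quoted above
SECONDS_PER_DAY = 24 * 60 * 60

exabytes_per_day = BYTES_PER_DAY / 1e18                         # 1 EB = 10**18 bytes
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 1e12   # 1 TB = 10**12 bytes
days_per_zettabyte = 1e21 / BYTES_PER_DAY                       # 1 ZB = 10**21 bytes

print(f"{exabytes_per_day:.1f} EB per day")
print(f"{terabytes_per_second:.1f} TB per second")
print(f"{days_per_zettabyte:.0f} days to accumulate 1 ZB")
```

At that rate, the daily total is 2.5 exabytes, or roughly 29 terabytes every second, and a zettabyte accumulates in about 400 days.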
How it is different
 Automatically generated by a machine (e.g. a sensor embedded
in an engine).
 Typically an entirely new source of data (e.g. use of the internet).
 Not designed to be friendly (e.g. text streams).
 May not have much value (need to focus on the important
part).
Tools used in BIG DATA
 Distributed servers / cloud (Amazon EC2): where processing is
hosted.
 Distributed storage (Amazon S3): where data is stored.
 Distributed processing (MapReduce): the programming model.
 High-performance schema-free database (MongoDB): where data is
stored and indexed.
 Analytic / semantic processing: operations are performed on
the data.
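The MapReduce programming model listed above can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle and reduce phases using a word count. It does not use Hadoop or any distributed runtime, and the function names are chosen for this example.

```python
from collections import defaultdict


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list) -> tuple:
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


documents = ["big data is big", "data is everywhere"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

In a real cluster, the map and reduce calls would run in parallel on different machines, with the framework handling the shuffle between them. The structure of the computation is the same.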
Continue…
 Hadoop: a free, Java-based programming framework
that supports the processing of large data sets in a distributed
computing environment.
 Facebook, LinkedIn, Twitter and eBay use Hadoop.
 HBase: a scalable, distributed database that supports
structured data storage for large tables.
 Hive: a data warehouse infrastructure that provides data
summarization and ad hoc querying.
Application of BIG DATA
Benefits of BIG DATA
 Real-time big data is not just a process for storing petabytes
or exabytes of data in a data warehouse; it is about the ability
to make better decisions and take meaningful actions at the
right time.
 Technologies like Hadoop give you the scale and flexibility
to store data before you know how you are going to
process it.
 Technologies such as MapReduce, Hive and Impala enable
you to run queries without changing the data structures
underneath.
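Storing data before deciding how to process it is often called schema-on-read. The sketch below illustrates it in plain Python: raw events are kept as JSON lines with no fixed schema, and a structure is imposed only when a query runs. The event fields and the function name are assumptions made for this example. It does not use the Hadoop, Hive or Impala APIs.

```python
import json

# Raw events are stored as-is, one JSON document per line, with no fixed schema.
raw_store = [
    '{"user": "alice", "action": "view", "page": "/home"}',
    '{"user": "bob", "action": "purchase", "amount": 30}',
    '{"user": "alice", "action": "purchase", "amount": 12, "coupon": "X1"}',
]


def total_purchases_by_user(lines):
    """Apply a schema at read time: only 'user', 'action' and 'amount' matter here."""
    totals = {}
    for line in lines:
        event = json.loads(line)
        if event.get("action") == "purchase":
            user = event["user"]
            totals[user] = totals.get(user, 0) + event.get("amount", 0)
    return totals


print(total_purchases_by_user(raw_store))
```

A different query could read the same stored lines and pick out other fields, such as `page` or `coupon`, without the stored data having to change.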
Continue….
 Our newest research finds that organizations are using big
data to target customer-centric outcomes, tap into internal
data and build a better information ecosystem.
 BIG DATA is already an important part of the $64 billion
database and data analytics market.
 It offers commercial opportunities of a scale comparable to
enterprise software in the late 1980s, the internet boom of
the 1990s, and the social media explosion of today.
How BIG DATA impacts on IT
 BIG DATA is a disruptive force, presenting both opportunities
and challenges to IT organizations.
 By 2015, 4.4 million IT jobs are expected in BIG DATA, 1.9 million
of them in the US itself.
 India will require a minimum of 1 lakh (100,000) data scientists in the
next couple of years, in addition to data analysts and data
managers, to support the BIG DATA space.
Future of BIG DATA
 $15 billion has been spent on software firms specializing only in
data management and analytics.
 This industry on its own is worth more than $100 billion and is
growing at almost 10% a year, which is roughly twice as fast as the
software business as a whole.
 In February 2012, the open source analyst firm Wikibon released
the first market forecast for BIG DATA, listing $5.1 billion
revenue in 2012 with growth to $53.4 billion in 2017.
 The McKinsey Global Institute estimates that data volume is
growing 40% per year, and will grow 44x between 2009 and
2020.
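The growth figures above can be cross-checked with a compound annual growth rate (CAGR) calculation. The helper `cagr` below is an illustrative function written for this check; the input numbers are the ones quoted in the bullets.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate needed to go from `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1


# Wikibon forecast: $5.1 billion in 2012 growing to $53.4 billion in 2017.
wikibon_rate = cagr(5.1, 53.4, 2017 - 2012)

# McKinsey estimate: 44x growth in data volume between 2009 and 2020.
implied_rate = cagr(1, 44, 2020 - 2009)

# Conversely, 40% growth per year compounded over the same 11 years.
compounded_factor = 1.40 ** (2020 - 2009)

print(f"Wikibon forecast implies about {wikibon_rate:.0%} growth per year")
print(f"44x over 11 years implies about {implied_rate:.0%} growth per year")
print(f"40% per year for 11 years gives about {compounded_factor:.0f}x")
```

The two McKinsey figures are roughly consistent with each other: 40% a year compounds to about 40x over 11 years, and 44x corresponds to about 41% a year. The Wikibon forecast implies a much faster rate of about 60% a year for the BIG DATA market itself.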
Conclusion
 BIG DATA is now a reality with huge profit potential.
 Tools and technologies are available as open source.
 Each one of us can benefit from working with BIG DATA
in its pure (dynamic) form or in its traditional (static) form.
THANK YOU
