Sybca Bigdata


 Introduction to Big Data

 What is Data?
The quantities, characters, or symbols on which operations are performed by a computer,
which may be stored and transmitted in the form of electrical signals and recorded on
magnetic, optical, or mechanical recording media.
 What is Big Data?
Big Data is data too, but at a much larger scale. The term describes a collection of
data that is huge in volume and still growing exponentially with time. In short, such
data is so large and complex that traditional data management tools cannot store or
process it efficiently.

 “Extremely large data sets that may be analyzed computationally to reveal patterns,
trends, and associations, especially relating to human behavior and interactions, are
known as Big Data.”
 Social Media
Statistics show that 500+ terabytes of new data are ingested into the databases of the
social media site Facebook every day. This data is mainly generated through photo and
video uploads, message exchanges, comments, etc.

 A single jet engine can generate 10+ terabytes of data in 30 minutes of flight time. With
many thousands of flights per day, data generation reaches many petabytes.
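
The jet-engine figure can be turned into a rough daily estimate. The sketch below takes the 10 TB per 30 minutes from the text; the flight count, the flight duration, and the one-engine-per-flight simplification are illustrative assumptions, not figures from the source. Decimal units are used (1 PB = 1,000 TB).

```python
# Rough estimate of daily jet-engine data volume.
TB_PER_HALF_HOUR = 10      # from the text: 10+ TB per 30 minutes of flight
FLIGHTS_PER_DAY = 5_000    # assumption, for illustration only
HOURS_PER_FLIGHT = 2       # assumption, for illustration only
ENGINES_PER_FLIGHT = 1     # assumption: count a single engine per flight


def daily_volume_tb(flights, hours_per_flight, engines=1, tb_per_half_hour=10):
    """Return the total data generated per day, in terabytes."""
    half_hours = hours_per_flight * 2
    return flights * engines * half_hours * tb_per_half_hour


total_tb = daily_volume_tb(
    FLIGHTS_PER_DAY, HOURS_PER_FLIGHT, ENGINES_PER_FLIGHT, TB_PER_HALF_HOUR
)
total_pb = total_tb / 1_000  # 1 PB = 1,000 TB (decimal units)

print(f"{total_tb:,} TB per day = {total_pb:,.0f} PB per day")
# prints: 200,000 TB per day = 200 PB per day
```

Even with these modest assumed inputs the total lands in the hundreds of petabytes, which is consistent with the "many petabytes" claim.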
 Characteristics Of Big Data
• The following are known as “Big Data Characteristics”.
1. Volume
2. Velocity
3. Variety
4. Veracity
1. Volume:
Volume means “how much data is generated”. Nowadays, organizations, people, and
systems generate or receive vast amounts of data, from terabytes (TB) to
petabytes (PB) to exabytes (EB) and more.
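
To make the unit ladder concrete, here is a minimal sketch that formats a raw byte count with the largest fitting unit. It assumes decimal prefixes (each unit is 1,000 times the previous one); the function name `human_size` is hypothetical and chosen only for this example.

```python
# Units in ascending order; each step is a factor of 1,000 (decimal prefixes).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def human_size(num_bytes):
    """Format a byte count using the largest unit that keeps the value below 1,000."""
    size = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that fits, or at the last unit available.
        if size < 1000 or unit == UNITS[-1]:
            return f"{size:g} {unit}"
        size /= 1000


print(human_size(10**12))      # prints: 1 TB
print(human_size(5 * 10**15))  # prints: 5 PB
print(human_size(2 * 10**18))  # prints: 2 EB
```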
2. Velocity:
Velocity means “how fast data is produced”. Nowadays, organizations, people, and
systems generate huge amounts of data at a very fast rate.

3. Variety:
Variety means “different forms of data”. Nowadays, organizations, people, and
systems generate huge amounts of data at a very fast rate and in different formats.
The different formats are discussed below under Types of Digital Data.
4. Veracity:
Veracity means “the quality, correctness, or accuracy of captured data”.
Of the four Vs, it is the most important for any Big Data solution, because
without correct data there is no use in storing large amounts of data at a
fast rate and in different formats. The data should deliver correct business
value.
 Types of Digital Data
1. Structured
2. Unstructured
3. Semi-structured

 Structured
 Any data that can be stored, accessed, and processed in a fixed format
is termed 'structured' data.
 Over time, computer science has had great success in developing techniques
for working with this kind of data (where the format is well known in advance)
and for deriving value from it.
 However, issues now arise when the size of such data grows to a huge extent,
with typical sizes in the range of multiple zettabytes.
 Do you know? Data stored in a relational database management system is
one example of 'structured' data.

• Examples Of Structured Data


An 'Employee' table in a database is an example of structured data:

Employee_ID  Employee_Name    Gender  Department  Salary_In_lacs
2365         Rajesh Kulkarni  Male    Finance     650000
3398         Pratibha Joshi   Female  Admin       650000
7465         Shushil Roy      Male    Admin       500000
7500         Shubhojit Das    Male    Finance     500000
7699         Priya Sane       Female  Finance     550000
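
Because the format is fixed in advance, structured data like the Employee table can be declared with a schema and queried directly. The following is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the column types and the choice of SQLite are assumptions made for illustration, and the rows are copied from the table above.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The fixed schema is what makes this data "structured".
conn.execute(
    """
    CREATE TABLE employee (
        employee_id    INTEGER PRIMARY KEY,
        employee_name  TEXT NOT NULL,
        gender         TEXT,
        department     TEXT,
        salary_in_lacs INTEGER
    )
    """
)

rows = [
    (2365, "Rajesh Kulkarni", "Male", "Finance", 650000),
    (3398, "Pratibha Joshi", "Female", "Admin", 650000),
    (7465, "Shushil Roy", "Male", "Admin", 500000),
    (7500, "Shubhojit Das", "Male", "Finance", 500000),
    (7699, "Priya Sane", "Female", "Finance", 550000),
]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", rows)

# Average salary per department, using the known column names.
avg_by_dept = conn.execute(
    "SELECT department, AVG(salary_in_lacs) FROM employee "
    "GROUP BY department ORDER BY department"
).fetchall()

for department, average in avg_by_dept:
    print(f"{department}: {average:.0f}")

conn.close()
```

The query works without inspecting any row first, which is the practical benefit of a format that is known in advance.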
 Unstructured
 Any data whose form or structure is unknown is classified as unstructured data.
 In addition to its huge size, unstructured data poses multiple challenges in
processing it to derive value.
 A typical example of unstructured data is a heterogeneous data source containing a
combination of simple text files, images, videos, etc.
 Nowadays organizations have a wealth of data available to them, but unfortunately they
don't know how to derive value from it since the data is in its raw, unstructured
form.
• Examples Of Unstructured Data
The output returned by 'Google Search'
 Semi-structured
 Semi-structured data can contain both forms of data.
 Semi-structured data appears structured in form, but it is not actually defined
by, for example, a table definition in a relational DBMS.
 An example of semi-structured data is data represented in an XML file.

 Examples Of Semi-structured Data


Personal data stored in an XML file:

<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
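
The tags in these records describe the data without a separate table definition, so a program can read the fields by name. Here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The enclosing `<records>` element is an assumption added for this example, since an XML document needs a single root and the source shows only the two `<rec>` fragments.

```python
import xml.etree.ElementTree as ET

# The two records from the text, wrapped in a hypothetical <records> root.
xml_text = """
<records>
  <rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
  <rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
</records>
""".strip()

root = ET.fromstring(xml_text)

# Read each field by its tag name; age is converted to an integer.
people = [
    {
        "name": rec.findtext("name"),
        "sex": rec.findtext("sex"),
        "age": int(rec.findtext("age")),
    }
    for rec in root.findall("rec")
]

for person in people:
    print(person["name"], person["sex"], person["age"])
```

Unlike the relational table, nothing here forces every record to carry the same fields, which is why such data is called semi-structured.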
