Big Data Basics

The document outlines the fundamentals of Big Data, including its five V's: Volume, Velocity, Variety, Veracity, and Value. It discusses different types of data, such as human-generated and machine-generated data, and various file systems like Google File System (GFS) and Hadoop Distributed File System (HDFS). Additionally, it highlights the importance of data analytics techniques in managing and analyzing large datasets.

Uploaded by

azamsyed811
Copyright
© All Rights Reserved

Big Data Basics

Outline
● Data Types and File Systems
● Five V’s of Big Data
● Big Data Analysis Techniques

Reference:
• Chapter 10, “Principles of Distributed Database Systems” by M. Tamer Özsu and Patrick Valduriez, 4th Ed., ISBN 978-3-030-26253-2
• Chapter 1, “Big Data Fundamentals: Concepts, Drivers & Techniques” by Thomas Erl, Wajid Khattak, and Paul Buhler, 1st Ed., ISBN-10: 0134291077
Dr. M. N. Sadat
Image source: https://medium.com/@get_excelsior
5 V’s: Volume

From 2020 to 2025, IDC forecasts new data creation to grow at a compound annual growth rate of 23%, resulting in approximately 175 ZB (1 ZB = 10^21 bytes) of data creation by 2025.
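The scale of these figures is easier to see with a little arithmetic. The sketch below converts 175 ZB to bytes and backs out the 2020 starting volume implied by the two quoted numbers. It assumes the 23% is a rate that compounds once per year over five years; the variable names are illustrative and not from the slides.

```python
# Decimal (SI) prefix: 1 zettabyte = 10**21 bytes.
ZB = 10**21

forecast_2025_zb = 175   # figure quoted on the slide
annual_growth = 0.23     # assumption: 23% compounds once per year
years = 5                # 2020 -> 2025

# Express the 2025 forecast in bytes.
forecast_2025_bytes = forecast_2025_zb * ZB

# Back out the 2020 volume implied by the growth rate and the 2025 figure.
implied_2020_zb = forecast_2025_zb / (1 + annual_growth) ** years

print(f"175 ZB = {forecast_2025_bytes:.3e} bytes")
print(f"Implied 2020 volume: {implied_2020_zb:.1f} ZB")
```

Under these assumptions, data volume grows by a factor of roughly 2.8 over the five years.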

Image sources: https://www.virtualb.it/


5 V’s: Velocity

5 V’s: Variety

Image source: https://www.bisok.com/analytics-and-business-intelligence/unstructured-data/

5 V’s: Veracity

5 V’s: Value

Types of Data

● Human-generated data
● Machine-generated data

Types of Data

● Structured data: stored in a tabular form
● Unstructured data
● Semi-structured data: JSON and sensor data are semi-structured
● Metadata
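The four categories can be contrasted with a small sketch. All sample values, field names, and the sensor scenario below are hypothetical and chosen only for illustration; only Python standard-library modules are used.

```python
import csv
import io
import json

# Structured: fixed columns, and every row has the same fields.
table_text = "sensor_id,temperature_c\ns1,21.5\ns2,19.0\n"
rows = list(csv.DictReader(io.StringIO(table_text)))

# Semi-structured: JSON is self-describing, and records may differ in shape.
json_text = (
    '[{"sensor_id": "s1", "temperature_c": 21.5},'
    ' {"sensor_id": "s2", "location": {"room": "B12"}}]'
)
records = json.loads(json_text)

# Unstructured: free text with no predefined fields.
note = "Sensor s2 was moved to room B12 after the last inspection."

# Metadata: data that describes other data.
metadata = {"source": "hypothetical sensor log", "record_count": len(records)}

print(rows[0]["temperature_c"])   # CSV values are read as strings
print(sorted(records[1].keys()))  # second record has a different shape
print(len(note.split()), "words")
print(metadata)
```

The tabular rows can be queried by column name directly, while the JSON records have to be inspected one by one because their fields vary, and the free-text note needs parsing before any field can be extracted at all.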
File Systems and Distributed File Systems
Examples:
● Google File System (GFS)
● Hadoop Distributed File System (HDFS)
● Network File System (NFS)
● Amazon S3
● GlusterFS
● Ceph

Google File System (GFS)

● GFS aims at providing performance, scalability, fault-tolerance, and availability
● Optimized for Google data-intensive applications such as search engine or data analysis
● Reliability and fault-tolerance provided via replication

Google File System (GFS)
● Characteristics of Google data-intensive applications:
○ Files are very large, typically several gigabytes, containing many objects such
as web documents.
○ Workloads consist mainly of read and append operations, while random
updates are rare. Read operations consist of large reads of bulk data (e.g., 1 MB)
and small random reads (e.g., a few KBs).
○ The append operations are also large, and there may be many concurrent
clients that append to the same file.
○ Because workloads consist mainly of large read and append operations, high
throughput is more important than low latency.
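The last point can be illustrated with a simple cost model: the time for one request is a fixed latency plus the data size divided by the sustained throughput. This is a rough sketch, not a model of GFS itself; the 10 ms latency, the 100 MB/s throughput, and the request sizes are hypothetical values picked for illustration.

```python
def transfer_time_s(size_bytes, latency_s, throughput_bytes_per_s):
    """Simple cost model: fixed per-request latency plus size / throughput."""
    return latency_s + size_bytes / throughput_bytes_per_s

KB = 10**3
MB = 10**6
GB = 10**9

latency_s = 0.010       # hypothetical: 10 ms per request
throughput = 100 * MB   # hypothetical: 100 MB/s sustained

small_read = transfer_time_s(4 * KB, latency_s, throughput)  # a few KB
bulk_read = transfer_time_s(1 * GB, latency_s, throughput)   # scan of a large file

# Fraction of each request's total time that is spent on latency.
latency_share_small = latency_s / small_read
latency_share_bulk = latency_s / bulk_read

print(f"small read: {small_read:.5f} s, latency share {latency_share_small:.1%}")
print(f"bulk read:  {bulk_read:.2f} s, latency share {latency_share_bulk:.2%}")
```

With these numbers, latency dominates the small read but is a negligible part of the bulk read, so a workload made up mostly of bulk reads and appends benefits far more from higher throughput than from lower latency.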

Google File System (GFS)

● Files are divided into large fixed-size partitions, called chunks (64 MB each)
● The cluster nodes consist of
i. GFS clients that provide the GFS interface to apps,
ii. chunk servers that store chunks, and
iii. a single GFS master that maintains file metadata such as namespace
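The division of labour described above can be sketched in a few lines: a byte offset maps to a chunk index by integer division, and the master only keeps a table from file names to chunk locations. `ToyMaster`, the server names, and the round-robin placement are hypothetical and are not GFS's actual API or placement policy. The sketch also assumes 64 MB means 64 × 2^20 bytes and stores each chunk once, whereas the slides note that GFS replicates data.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunk size (assumed to mean 64 * 2**20 bytes)

def chunk_index(offset):
    """Index of the chunk that contains the given byte offset."""
    return offset // CHUNK_SIZE

def chunk_count(file_size):
    """Number of chunks needed for a file (ceiling division)."""
    return -(-file_size // CHUNK_SIZE)

class ToyMaster:
    """Hypothetical stand-in for the master: metadata only, no file data."""

    def __init__(self):
        self.namespace = {}  # file name -> list of chunk server names

    def create(self, name, file_size, servers):
        # Illustrative round-robin placement, one copy per chunk.
        n = chunk_count(file_size)
        self.namespace[name] = [servers[i % len(servers)] for i in range(n)]

    def locate(self, name, offset):
        """Return (chunk index, chunk server) for a byte offset in a file."""
        idx = chunk_index(offset)
        return idx, self.namespace[name][idx]

MIB = 1024 * 1024
master = ToyMaster()
master.create("/logs/crawl.dat", 200 * MIB, ["cs-a", "cs-b", "cs-c"])
print(chunk_count(200 * MIB))                        # chunks needed for 200 MB
print(master.locate("/logs/crawl.dat", 130 * MIB))   # which chunk holds offset 130 MB
```

Because the chunk size is fixed, a client can compute the chunk index locally and only needs to ask the master where that chunk is stored, which keeps the single master's workload small.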

Data Analytics

Image source: https://www.pipelinersales.com/data-analytics/


Data Analytics

Image source: https://dashthis.com/
