Big Data: Introduction To Terms, Concepts and Tools

Big data is being generated from many sources at an increasing volume, velocity, and variety. It requires new techniques and technologies to capture, store, distribute, manage, and analyze this data. This document introduces some of the key concepts and tools in big data, including Hadoop for distributed storage and processing, different data store models, and how organizations are using big data analytics to solve problems.
BIG DATA

INTRODUCTION TO TERMS, CONCEPTS AND TOOLS

LEONEN . RIVERA . VILLAFLOR


BIG DATA

Big Data may well be the Next Big Thing in the IT world.

Big data burst upon the scene in the first decade of the 21st century.
BRIEF OVERVIEW
Big Data

○ Big data is being generated by everything around us at all times.
○ Every digital process and social media exchange produces it.
○ Systems, sensors and mobile devices transmit it.
○ Big data is arriving from multiple sources at an alarming velocity, volume and variety.
○ To extract meaningful value from big data, you need optimal processing power, analytics capabilities and skills.
BIG DATA

○ ‘Big Data’ is similar to ‘Small Data’, only bigger in size, and having bigger data requires different approaches.
○ It aims to solve new problems, or to solve old problems in a better way.
○ Big Data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques.
ORGANIZATIONS EMBRACING BIG DATA
CHARACTERISTICS OF BIG DATA

○ Volume
○ Velocity
○ Variety
Big Data Technologies

○ Operational
○ Analytical
Overview of Big Data stores

Data models
○ Key-value
○ Graph
○ Document
○ Column-family
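To make the four data models listed above more concrete, here is a minimal sketch that stores the same hypothetical customer and its orders in each model, using plain Python structures rather than any real database. The record, the key formats (such as `customer:42`) and the helper `products_bought` are made up for illustration.

```python
# The same hypothetical customer record in each of the four data models.
# Plain Python structures stand in for real data stores.

# Key-value: an opaque value looked up by a single key.
key_value = {
    "customer:42": '{"name": "Ana", "city": "Manila"}',
}

# Document: a self-describing, nested record; fields may vary per document.
document = {
    "_id": 42,
    "name": "Ana",
    "city": "Manila",
    "orders": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1}],
}

# Column-family: row key -> column family -> column -> value.
column_family = {
    "42": {
        "profile": {"name": "Ana", "city": "Manila"},
        "orders": {"A-1": 2, "B-7": 1},
    }
}

# Graph: nodes plus explicit, named relationships (edges) between them.
graph = {
    "nodes": {"customer:42": {"name": "Ana"}, "product:A-1": {}, "product:B-7": {}},
    "edges": [
        ("customer:42", "BOUGHT", "product:A-1"),
        ("customer:42", "BOUGHT", "product:B-7"),
    ],
}

def products_bought(g, customer):
    """Follow BOUGHT edges leaving a customer node."""
    return [dst for src, rel, dst in g["edges"] if src == customer and rel == "BOUGHT"]

print(products_bought(graph, "customer:42"))  # ['product:A-1', 'product:B-7']
```

Each model suits a different kind of question. A key-value store only returns the whole value for a key. A document can be queried by its nested fields. A column-family store can read just one family (for example `orders`). A graph makes relationships a first-class thing to traverse.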
Hadoop Distributed File System

○ Hadoop is ideal for storing large amounts of data, such as terabytes and petabytes, and uses HDFS as its storage system. HDFS lets you connect nodes (commodity personal computers) contained within clusters over which data files are distributed. You can then access and store the data files as one seamless file system. Access to data files is handled in a streaming manner, meaning that applications or commands are executed directly using the MapReduce processing model.
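The idea of files "distributed over a cluster" can be sketched roughly as follows. A file is cut into fixed-size blocks, and each block is copied to several nodes. The block size (128 MB) and replication factor (3) follow commonly cited HDFS defaults. The round-robin placement and the node names are simplifying assumptions, because real HDFS placement is rack-aware. This is not HDFS code.

```python
# Simplified sketch of splitting a file into blocks and replicating them.
# Assumptions: 128 MB blocks, 3 replicas, round-robin placement (real HDFS
# uses rack-aware placement); node names are hypothetical.
from dataclasses import dataclass

BLOCK_SIZE = 128 * 1024 * 1024  # bytes per block
REPLICATION = 3                 # copies kept of every block

@dataclass
class Block:
    index: int    # position of the block within the file
    offset: int   # byte offset where the block starts
    length: int   # bytes in this block (the last block may be shorter)
    nodes: list   # nodes holding a replica of this block

def place_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split a file of file_size bytes into blocks and assign replicas to nodes."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    blocks = []
    for index, offset in enumerate(range(0, file_size, block_size)):
        length = min(block_size, file_size - offset)
        # Rotate the starting node so replicas spread across the cluster.
        replicas = [nodes[(index + r) % len(nodes)] for r in range(replication)]
        blocks.append(Block(index, offset, length, replicas))
    return blocks

# A 1 GiB file plus one extra byte needs 8 full blocks and a 1-byte block.
blocks = place_blocks(1024 ** 3 + 1, ["node1", "node2", "node3", "node4"])
for b in blocks:
    print(b.index, b.length, b.nodes)
```

Because every block lives on more than one node, a single machine can fail without losing data. Processing can also be scheduled on whichever node already holds a block.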
Hadoop

○ Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware.
○ It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs.
Hadoop: How is it used?

○ The data processing framework is the tool used to work with the data itself. By default, this is the Java-based system known as MapReduce. You hear more about MapReduce than about the HDFS side of Hadoop mainly because it is the tool that actually gets data processed.
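The MapReduce model described above has three phases: map, shuffle, and reduce. The sketch below runs them in a single Python process using the classic word-count example. It does not use Hadoop's Java API. It only mirrors the phases that a Hadoop job would run across many machines.

```python
# In-process simulation of the MapReduce model using word count.
# Not Hadoop's API: each phase here would run distributed on a real cluster.
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle/sort: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key, values):
    """Reducer: combine the grouped values for one key."""
    return key, sum(values)

def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

print(word_count(["big data big ideas", "Data beats opinions"]))
# {'beats': 1, 'big': 2, 'data': 2, 'ideas': 1, 'opinions': 1}
```

The mapper and reducer only ever see one line or one key at a time. That independence is what allows the framework to spread them over a cluster.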
Selecting Big Data Stores

○ Choosing the correct data stores based on your data characteristics
○ Moving code to data
○ Implementing polyglot data store solutions
○ Aligning business goals to the appropriate data store
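"Moving code to data", one of the points listed above, can be illustrated with a small comparison. In the naive approach, every raw record is copied to one place and then aggregated. In the data-local approach, a small function runs on each node's own partition, and only the partial results travel. The partitions, node names and the JSON-size measure of "bytes moved" are all hypothetical stand-ins for real network transfer.

```python
# Hypothetical comparison: shipping raw data to the code versus shipping
# code to the data. Partitions and node names are made-up sample data.
import json

partitions = {
    "node1": [{"amount": 120}, {"amount": 80}],
    "node2": [{"amount": 200}],
    "node3": [{"amount": 50}, {"amount": 25}, {"amount": 25}],
}

def local_sum(records):
    """The 'code' shipped to each node: aggregate its local partition."""
    return sum(r["amount"] for r in records)

def move_data_to_code(parts):
    """Naive: transfer every raw record, then compute centrally."""
    shipped = [r for recs in parts.values() for r in recs]
    bytes_moved = len(json.dumps(shipped))  # rough proxy for network traffic
    return local_sum(shipped), bytes_moved

def move_code_to_data(parts):
    """Data-local: compute on each node, transfer only the partial sums."""
    partials = {node: local_sum(recs) for node, recs in parts.items()}
    bytes_moved = len(json.dumps(partials))
    return sum(partials.values()), bytes_moved

total_a, moved_a = move_data_to_code(partitions)
total_b, moved_b = move_code_to_data(partitions)
print(total_a == total_b, moved_a, moved_b)
```

Both approaches produce the same answer. Only the amount of data crossing the network differs, and that gap grows with the size of each partition.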
Processing Big Data

Integrating disparate data stores

○ Mapping data to the programming framework
○ Connecting and extracting data from storage
○ Transforming data for processing
○ Subdividing data in preparation for Hadoop MapReduce
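The preparation steps listed above (extracting, transforming, mapping to the framework, subdividing) can be sketched for two differently shaped sources: CSV text and JSON lines. The sketch normalizes both into one (key, value) form and then cuts the result into input splits. The field names (`customer_id`, `cust`, `total`) and the split size of 2 are assumptions chosen for the example.

```python
# Hypothetical sketch: integrate two disparate sources into (key, value)
# pairs and subdivide them into input splits for map tasks.
import csv
import io
import json

CSV_SOURCE = "customer_id,amount\n1,30\n2,45\n"
JSON_SOURCE = '{"cust": 1, "total": 20}\n{"cust": 3, "total": 10}\n'

def extract_csv(text):
    """Connect/extract: read rows and normalize field names and types."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"customer": int(row["customer_id"]), "amount": float(row["amount"])}

def extract_json_lines(text):
    """Same normalization for a JSON-lines source with different field names."""
    for line in text.splitlines():
        if line.strip():
            doc = json.loads(line)
            yield {"customer": int(doc["cust"]), "amount": float(doc["total"])}

def to_key_value(record):
    """Map a normalized record onto the (key, value) pairs MapReduce consumes."""
    return record["customer"], record["amount"]

def subdivide(items, split_size):
    """Cut the input into splits; each split would feed one map task."""
    return [items[i:i + split_size] for i in range(0, len(items), split_size)]

records = list(extract_csv(CSV_SOURCE)) + list(extract_json_lines(JSON_SOURCE))
pairs = [to_key_value(r) for r in records]
splits = subdivide(pairs, split_size=2)
print(splits)
```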
Processing Big Data

Employing Hadoop MapReduce

○ Creating the components of Hadoop MapReduce jobs
○ Distributing data processing across server farms
○ Executing Hadoop MapReduce jobs
○ Monitoring the progress of job flows
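The steps listed above include distributing work, executing it, and monitoring progress. These can be imitated on a single machine with Python's `concurrent.futures`, where worker threads stand in for cluster nodes. This is only an analogy. Real Hadoop jobs are scheduled by the cluster's resource manager and report progress through their own tooling, and nothing below is Hadoop's API. The sample splits are made up.

```python
# Single-machine analogy for distributing map tasks and monitoring a job.
# Worker threads stand in for cluster nodes; this is not Hadoop's API.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed

def map_task(split):
    """One map task: count words in its input split."""
    counts = Counter()
    for line in split:
        counts.update(line.lower().split())
    return counts

def run_job(splits, workers=3):
    """Submit one task per split, merge results, and report progress."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(map_task, s) for s in splits]
        for done, future in enumerate(as_completed(futures), start=1):
            total.update(future.result())  # reduce step: merge partial counts
            print(f"progress: {done}/{len(futures)} map tasks finished")
    return total

splits = [["big data"], ["data stores"], ["big big jobs"]]
print(run_job(splits).most_common(2))  # [('big', 3), ('data', 2)]
```

Tasks may finish in any order. The final counts do not depend on that order, because merging counters is order-independent.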
The Structure of Big Data

Structured
○ Most traditional data sources
Semi-structured
○ Many sources of big data
Unstructured
○ Video data
○ Audio data
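The three structures listed above differ in how much work is needed before analysis. The sketch below shows one small made-up sample of each. Structured rows share a fixed schema. Semi-structured records carry their own field names, which can vary from record to record. Unstructured content has no schema, so features must be extracted first. A simple word count stands in for that step here, whereas audio or video would need much heavier processing.

```python
# Hypothetical samples of structured, semi-structured and unstructured data.
import csv
import io
import json

# Structured: every row follows the same fixed schema (columns).
structured = "id,name,balance\n1,Ana,100\n2,Ben,250\n"
rows = list(csv.DictReader(io.StringIO(structured)))
schema = list(rows[0].keys())

# Semi-structured: self-describing records whose fields may differ.
semi_structured = [
    '{"user": "ana", "likes": 12}',
    '{"user": "ben", "likes": 3, "location": "Cebu"}',
]
docs = [json.loads(s) for s in semi_structured]
all_fields = sorted(set().union(*(d.keys() for d in docs)))

# Unstructured: raw content; a feature (word count) must be extracted first.
unstructured = b"Customer called about a late delivery and a damaged box."
word_count = len(unstructured.decode("utf-8").split())

print(schema, all_fields, word_count)
# ['id', 'name', 'balance'] ['likes', 'location', 'user'] 10
```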
The Structure of Big Data

Structured / Semi-structured / Unstructured

Example applications across this range:

○ Traditional Data Warehousing
○ Text Mining
○ Video Surveillance / Analysis
○ Demand Forecasting in Manufacturing
○ Disease Analysis on Electronic Health Records
○ Maintenance in Aerospace
○ Social Media Analysis
○ Claims and Tax Fraud in Public Sector
○ Credit Card Fraud Detection


WHY BIG DATA?

Growth of Big Data is driven by:

○ Increase in storage capacities
○ Increase in processing power
○ Availability of data (different data types)
○ Every day we create 2.5 quintillion bytes of data.
○ 90% of the data in the world today has been created in the last two years alone.
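For a sense of scale, the daily figure above can be converted into standard units. This assumes the short-scale quintillion (10^18), decimal SI prefixes, and, as a simplification, a rate that stays constant for a whole year:

```latex
\begin{aligned}
2.5\ \text{quintillion bytes} &= 2.5 \times 10^{18}\ \text{B} = 2.5\ \text{EB} \\
2.5 \times 10^{18}\ \tfrac{\text{B}}{\text{day}} \times 365\ \text{days} &= 9.125 \times 10^{20}\ \text{B} \approx 0.91\ \text{ZB per year}
\end{aligned}
```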
BIG DATA SOURCES

○ Users
○ Applications
○ Systems
○ Sensors

Data Generation Points

○ Mobile Devices
○ Microphones
○ Readers / Scanners
○ Science Facilities
○ Programs / Software
○ Social Media
○ Cameras
SUMMARY

○ Big Data
○ Organizations embracing Big Data
○ Characteristics of Big Data
○ Big Data Technologies
○ Big Data Stores
○ Processing Big Data
○ Structure of Big Data
○ Why Big Data
○ Big Data Sources