BigData Brief

Big data refers to large, complex datasets that cannot be processed by traditional data processing software. It comes from various sources such as smartphones, social media, and online transactions. The key aspects of big data are volume, velocity, and variety. Common big data tools include Hadoop, Spark, MapReduce, Hive, and Impala, which help address challenges like data volume, velocity, variety, and veracity. ETL extracts data from sources, transforms it, and loads it into a data warehouse, while ELT loads raw data and then transforms it within the data destination.


[ Big Data ]

What exactly is big data?


The definition of big data is: data that contains greater variety, arriving in increasing volumes, and with more velocity.

Put simply, big data consists of larger, more complex data sets, especially from new data sources. These data sets are so voluminous that traditional data processing software just can't manage them. But these massive volumes of data can be used to address business problems you wouldn't have been able to tackle before.

Big data Types:

• Structured data (e.g., relational database tables)
• Semi-structured data (e.g., JSON and XML files)
• Unstructured data (e.g., videos, photos, audio, and free text)

Big data use cases:


• Customer segmentation
• Marketing campaign optimization
• Product development
• Social media sentiment analysis
• Supply chain optimization
• Financial analysis
• Fraud detection
• Image and speech recognition
• Personalized medicine
• Energy management
• Cybersecurity
• Smart city planning
• Traffic analysis
Big data Technology Challenges:

• Challenge: Volume & Resilience
  Description: avoid the risk of data loss from machine failure in clusters of commodity machines.
  Solution: replicate segments of data across multiple machines; a master node keeps track of segment locations.

• Challenge: Volume & Velocity
  Description: avoid choking network bandwidth when moving large volumes of data.
  Solution: move the processing logic to where the data is stored, using parallel processing algorithms.

• Challenge: Variety
  Description: efficient storage of large and small data objects (documents, graphs, key-value pairs, etc.).
  Solution: NoSQL databases.

• Challenge: Velocity
  Description: monitoring streams too large to store.
  Solution: a fork-shaped (Lambda) architecture that processes data both as a stream and as a batch.
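
To make the Volume & Velocity solution concrete, here is a minimal parallel-processing sketch in PySpark (assumptions not in the brief: pyspark is installed and a local text file named lines.txt exists):

# Minimal parallel word count with PySpark. Spark partitions the file and
# ships this logic to the executors holding each partition, rather than
# moving the data to the code.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WordCount").getOrCreate()
sc = spark.sparkContext

counts = (
    sc.textFile("lines.txt")
      .flatMap(lambda line: line.split())   # split lines into words
      .map(lambda word: (word, 1))          # emit (word, 1) pairs
      .reduceByKey(lambda a, b: a + b)      # aggregate counts in parallel
)
print(counts.take(10))
spark.stop()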

Big data Ecosystem:


[ Q&A ]
1- What is Big Data, and where does it come from?
• Big data consists of larger, more complex data sets, so voluminous that traditional
data processing software just can't manage them. Big Data comprises unstructured and
structured data sets such as videos, photos, audio, websites, and multimedia content.

• It comes from various sources like:
  o Smartphones
  o Internet cookies
  o Social media posts
  o Online purchase transactions

2- What are the V’s in Big Data?

• Volume: the huge amount of data stored in data warehouses.
• Velocity: the pace at which data is being produced and processed in real time.
• Variety: Big Data comprises structured, unstructured, and semi-structured data collected
from varied sources.
• Veracity: how reliable the data is; it can be defined as the quality of the data analyzed.

3- What is the difference between Database and Big data?


There are major differences in:
• Size
Traditional data sets tend to be measured in gigabytes and terabytes.
Big data is usually measured in petabytes, exabytes, or zettabytes.
• Sources
Traditional data derives from enterprise-level sources like ERP and CRM systems.
Big data derives from a broader range of enterprise and non-enterprise sources.
• Organization
Traditional databases, such as SQL and Oracle databases, use a fixed schema that is static and preconfigured.
Big data uses a dynamic schema; in storage, big data is raw and unstructured.
• Architecture
Traditional data is typically managed using a centralized architecture.
Big data uses a distributed architecture.
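
As an illustration of the fixed vs. dynamic schema point, here is a small sketch using only the Python standard library (the table, field names, and sample records are invented for this example):

# Fixed schema (traditional database): structure is declared up front.
import sqlite3, json

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT, email TEXT)")
db.execute("INSERT INTO customers VALUES (1, 'Alice', 'alice@example.com')")
# A record with an extra, undeclared column would be rejected here.

# Dynamic schema (big data / NoSQL style): each record carries its own
# shape, and data is stored raw; structure is applied when it is read.
raw_events = [
    {"user": 1, "action": "click", "page": "/home"},
    {"user": 2, "action": "purchase", "amount": 19.99, "currency": "USD"},
]
stored = [json.dumps(event) for event in raw_events]  # keep as raw JSON
print(stored)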
4- What are the tools in big data?
What are (Hadoop / Spark / MapReduce / Hive / Impala / Kafka / ...)?
Big data tools and technologies are used to solve the big data challenges listed in the
table above:

• Hadoop: an open-source framework that combines distributed storage (HDFS, which
replicates data segments across commodity machines) with distributed processing.
• MapReduce: Hadoop's parallel processing model, which moves the processing logic to
the nodes where the data is stored.
• Spark: a general-purpose engine for large-scale data processing that runs largely in
memory and supports both batch and stream processing.
• Hive: a data warehouse layer that provides SQL-like querying over data stored in Hadoop.
• Impala: a massively parallel SQL query engine for interactive, low-latency queries on
Hadoop data.
• Kafka: a distributed event streaming platform for ingesting and processing high-velocity
data streams.
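
For a feel of how Hive- or Impala-style SQL looks in practice, here is a minimal Spark SQL sketch (assumptions: pyspark is installed; the sales data and view name are invented):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("SqlOnBigData").getOrCreate()

# A tiny DataFrame standing in for a large distributed table.
df = spark.createDataFrame(
    [("electronics", 120.0), ("books", 35.5), ("electronics", 80.0)],
    ["category", "amount"],
)
df.createOrReplaceTempView("sales")

# The same style of SQL you would submit to Hive or Impala:
spark.sql(
    "SELECT category, SUM(amount) AS total FROM sales GROUP BY category"
).show()
spark.stop()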

5- What is the difference between ETL & ELT?


• ETL stands for Extract, Transform, and Load, while ELT stands for Extract, Load, and Transform.
  o In ETL, data flows from the data source to staging to the data destination.
  o ELT lets the data destination do the transformation, eliminating the need for data staging.

• ETL can help with data privacy and compliance by cleansing sensitive data before loading it
into the data destination, while ELT can handle large volumes of unstructured data.
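
A minimal sketch of the two flows using only the Python standard library (the file name orders.csv, the field names, and the cleansing rule are all assumptions for illustration):

import csv, sqlite3

dest = sqlite3.connect(":memory:")  # stand-in for the data warehouse

with open("orders.csv", newline="") as f:
    rows = list(csv.DictReader(f))  # Extract from the source

# ETL: transform in staging (e.g., cleanse the email field) before loading.
clean = [(r["id"], r["email"].lower().strip())
         for r in rows if r.get("email")]
dest.execute("CREATE TABLE orders (id TEXT, email TEXT)")
dest.executemany("INSERT INTO orders VALUES (?, ?)", clean)  # Load

# ELT: load the raw data first, then transform inside the destination.
dest.execute("CREATE TABLE raw_orders (id TEXT, email TEXT)")
dest.executemany("INSERT INTO raw_orders VALUES (?, ?)",
                 [(r["id"], r.get("email", "")) for r in rows])
dest.execute("""CREATE TABLE orders_clean AS
                SELECT id, lower(trim(email)) AS email
                FROM raw_orders WHERE email <> ''""")
print(dest.execute("SELECT * FROM orders_clean").fetchall())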
