BigData Brief

Big data refers to large, complex datasets that cannot be processed by traditional data processing software. It comes from various sources such as smartphones, social media, and online transactions. The key aspects of big data are volume, velocity, and variety. Common big data tools include Hadoop, Spark, MapReduce, Hive, and Impala, which help address challenges like data volume, velocity, variety, and veracity. ETL extracts data from sources, transforms it, and loads it into a data warehouse, while ELT loads raw data and then transforms it within the data destination.


[ Big Data ]

What exactly is big data?


The definition of big data is: data that contains greater variety, arriving in increasing volumes, and with more velocity.

Put simply, big data consists of larger, more complex data sets, especially from new data sources. These data sets are so voluminous that traditional data processing software just can't manage them. But these massive volumes of data can be used to address business problems you wouldn't have been able to tackle before.

Big data Types:

• Structured data (e.g., relational database tables)
• Semi-structured data (e.g., JSON and XML files)
• Unstructured data (e.g., videos, photos, audio, and free text)

Big data use cases:


• Customer segmentation
• Marketing campaign optimization
• Product development
• Social media sentiment analysis
• Supply chain optimization
• Financial analysis
• Fraud detection
• Image and speech recognition
• Personalized medicine
• Energy management
• Cybersecurity
• Smart city planning
• Traffic analysis
Big data Technology Challenges:

• Challenge: Volume & Resilience
  Description: avoid the risk of data loss from machine failure in clusters of commodity machines.
  Solution: replicate segments of data across multiple machines; a master node keeps track of segment locations.

• Challenge: Volume & Velocity
  Description: avoid choking network bandwidth when moving large volumes of data.
  Solution: move the processing logic to where the data is stored, using parallel processing algorithms.

• Challenge: Variety
  Description: efficient storage of large and small data objects (documents, graphs, key-value pairs, etc.).
  Solution: NoSQL databases.

• Challenge: Velocity
  Description: monitoring streams too large to store.
  Solution: a fork-shaped (Lambda) architecture that processes data both as a stream and as a batch.
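
To make the Volume & Velocity solution concrete, here is a minimal parallel-processing sketch in PySpark (assumptions not in the brief: pyspark is installed and a local text file named lines.txt exists):

# Minimal parallel word count with PySpark. Spark partitions the file and
# ships this logic to the executors holding each partition, rather than
# moving the data to the code.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WordCount").getOrCreate()
sc = spark.sparkContext

counts = (
    sc.textFile("lines.txt")
      .flatMap(lambda line: line.split())   # split lines into words
      .map(lambda word: (word, 1))          # emit (word, 1) pairs
      .reduceByKey(lambda a, b: a + b)      # aggregate counts in parallel
)
print(counts.take(10))
spark.stop()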

Big data Ecosystem:


[ Q&A ]
1- What is Big Data, and where does it come from?
• Big data consists of larger, more complex data sets, so voluminous that traditional
data processing software just can't manage them. Big Data comprises unstructured and
structured data sets such as videos, photos, audio, websites, and multimedia content.

• It comes from various sources like:
  o Smartphones
  o Internet cookies
  o Social media posts
  o Online purchase transactions

2- What are the V’s in Big Data?

• Volume: the huge amount of data stored in data warehouses.
• Velocity: the pace at which data is being produced and processed in real time.
• Variety: Big Data comprises structured, unstructured, and semi-structured data collected
from varied sources.
• Veracity: how reliable the data is; it can be defined as the quality of the data analyzed.

3- What is the difference between Database and Big data?


There are major differences in:
• Size
Traditional data sets tend to be measured in gigabytes and terabytes.
Big data is usually measured in petabytes, exabytes, or zettabytes.
• Sources
Traditional data derives from enterprise-level sources like ERP and CRM systems.
Big data derives from a broader range of enterprise and non-enterprise sources.
• Organization
Traditional databases, such as SQL and Oracle databases, use a fixed schema that is static and preconfigured.
Big data uses a dynamic schema; in storage, big data is raw and unstructured.
• Architecture
Traditional data is typically managed using a centralized architecture.
Big data uses a distributed architecture.
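
As an illustration of the fixed vs. dynamic schema point, here is a small sketch using only the Python standard library (the table, field names, and sample records are invented for this example):

# Fixed schema (traditional database): structure is declared up front.
import sqlite3, json

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT, email TEXT)")
db.execute("INSERT INTO customers VALUES (1, 'Alice', 'alice@example.com')")
# A record with an extra, undeclared column would be rejected here.

# Dynamic schema (big data / NoSQL style): each record carries its own
# shape, and data is stored raw; structure is applied when it is read.
raw_events = [
    {"user": 1, "action": "click", "page": "/home"},
    {"user": 2, "action": "purchase", "amount": 19.99, "currency": "USD"},
]
stored = [json.dumps(event) for event in raw_events]  # keep as raw JSON
print(stored)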
4- What are the tools in big data?
What are (Hadoop / Spark / MapReduce / Hive / Impala / Kafka / ...)?
Big data tools and technologies are used to solve the big data challenges listed in the
table above:

• Hadoop: an open-source framework that combines distributed storage (HDFS, which
replicates data segments across commodity machines) with distributed processing.
• MapReduce: Hadoop's parallel processing model, which moves the processing logic to
the nodes where the data is stored.
• Spark: a general-purpose engine for large-scale data processing that runs largely in
memory and supports both batch and stream processing.
• Hive: a data warehouse layer that provides SQL-like querying over data stored in Hadoop.
• Impala: a massively parallel SQL query engine for interactive, low-latency queries on
Hadoop data.
• Kafka: a distributed event streaming platform for ingesting and processing high-velocity
data streams.
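
For a feel of how Hive- or Impala-style SQL looks in practice, here is a minimal Spark SQL sketch (assumptions: pyspark is installed; the sales data and view name are invented):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("SqlOnBigData").getOrCreate()

# A tiny DataFrame standing in for a large distributed table.
df = spark.createDataFrame(
    [("electronics", 120.0), ("books", 35.5), ("electronics", 80.0)],
    ["category", "amount"],
)
df.createOrReplaceTempView("sales")

# The same style of SQL you would submit to Hive or Impala:
spark.sql(
    "SELECT category, SUM(amount) AS total FROM sales GROUP BY category"
).show()
spark.stop()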

5- What is the difference between ETL & ELT?


• ETL stands for Extract, Transform, and Load, while ELT stands for Extract, Load, and Transform.
  o In ETL, data flows from the data source to staging to the data destination.
  o ELT lets the data destination do the transformation, eliminating the need for data staging.

• ETL can help with data privacy and compliance by cleansing sensitive data before loading it
into the data destination, while ELT can handle large volumes of unstructured data.
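
A minimal sketch of the two flows using only the Python standard library (the file name orders.csv, the field names, and the cleansing rule are all assumptions for illustration):

import csv, sqlite3

dest = sqlite3.connect(":memory:")  # stand-in for the data warehouse

with open("orders.csv", newline="") as f:
    rows = list(csv.DictReader(f))  # Extract from the source

# ETL: transform in staging (e.g., cleanse the email field) before loading.
clean = [(r["id"], r["email"].lower().strip())
         for r in rows if r.get("email")]
dest.execute("CREATE TABLE orders (id TEXT, email TEXT)")
dest.executemany("INSERT INTO orders VALUES (?, ?)", clean)  # Load

# ELT: load the raw data first, then transform inside the destination.
dest.execute("CREATE TABLE raw_orders (id TEXT, email TEXT)")
dest.executemany("INSERT INTO raw_orders VALUES (?, ?)",
                 [(r["id"], r.get("email", "")) for r in rows])
dest.execute("""CREATE TABLE orders_clean AS
                SELECT id, lower(trim(email)) AS email
                FROM raw_orders WHERE email <> ''""")
print(dest.execute("SELECT * FROM orders_clean").fetchall())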
