
BIG DATA TESTING

BIG DATA?

• It is a term for collections of data sets so large and complex that they become difficult to process using traditional data processing applications.

[Diagram: activities whose content volume exceeds normal processing capabilities fall into the "Big Data" range.]
"Big Data" Sources

• Social networking sites like Facebook, LinkedIn, Twitter etc.
• Mobile device data such as text messages, call data, app data etc.
• Internet transactions like e-Commerce websites, banking activities etc.
• Network device/sensor data like weather forecasting, temperature etc.


Traditional Data Processing

Need of RDBMS:
• Very quick in response
• Enables relations between data elements to be defined & managed
• A single database can be utilized for all applications

Limitations of traditional approach

• Data processing takes too long as the volume of data increases

• Not Scalable
OLTP & OLAP

[Diagram: business operations and processes run on OLTP systems, which manage master data and transactions; business strategy is supported by OLAP, where the data warehouse, data mining and analytics turn that data into information.]
Big Data Characteristics: 5 Vs

• Volume
• Velocity
• Variety
• Value
• Veracity
HADOOP

• Apache Hadoop is a framework that allows distributed processing of large datasets across clusters of commodity computers using a simple programming model (a word-count sketch follows this slide).

• It is an architecture that can scale with the huge volume, variety and speed requirements of big data by distributing the workload across commodity servers that process the data in parallel.

Goals of HDFS:
• Fast recovery from hardware failures
• Access to streaming data
• Accommodation of large data sets
• Portability
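
To make the "simple programming model" concrete, below is a minimal word-count job for Hadoop Streaming, written in Python as an illustrative sketch: the streaming-jar location and the HDFS paths in the header comment are placeholders, and production jobs are more commonly written against the native Java MapReduce API.

#!/usr/bin/env python3
"""Minimal word count for Hadoop Streaming (illustrative sketch).

Example invocation (jar location and HDFS paths are placeholders):
  hadoop jar hadoop-streaming.jar \
    -files wordcount.py \
    -mapper "wordcount.py map" -reducer "wordcount.py reduce" \
    -input /data/in -output /data/out
"""
import sys


def mapper():
    # Emit one "<word> TAB 1" pair per token; the framework sorts the
    # pairs by key before they reach the reducers.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")


def reducer():
    # Input arrives grouped by word; sum the counts for each word.
    current, total = None, 0
    for line in sys.stdin:
        word, _, count = line.rstrip("\n").partition("\t")
        if word != current and current is not None:
            print(f"{current}\t{total}")
            total = 0
        current = word
        total += int(count or 0)
    if current is not None:
        print(f"{current}\t{total}")


if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()

The same pair can be tested locally without a cluster by piping a text file through "python wordcount.py map | sort | python wordcount.py reduce".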
Phases in Big Data Testing

Test Entry Points:
• Data Staging Validation (a staging-check sketch follows this slide)
• Map reduce Validation
• Output Validation

[Diagram: data sources (RDBMS, MongoDB, social media data etc.) feed the source HADOOP layer, which the ETL process loads into the target data warehouse for BI.]
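
As a sketch of the first entry point, data staging validation checks that everything extracted from the source actually landed in HDFS. The snippet below compares a source row count against the number of staged records; the table name, the HDFS path and the use of the "hdfs dfs -cat" CLI through subprocess are illustrative assumptions, and a real suite would typically add checksum or column-level comparisons.

import subprocess


def source_row_count(cursor, table):
    # Row count in the source RDBMS; any DB-API cursor (pyodbc,
    # psycopg2, ...) can be passed in. The table name is a placeholder.
    cursor.execute(f"SELECT COUNT(*) FROM {table}")
    return cursor.fetchone()[0]


def staged_record_count(hdfs_dir):
    # Count newline-delimited records under an HDFS directory.
    # Assumes the hdfs CLI is on PATH; WebHDFS or a client library
    # could be used instead.
    result = subprocess.run(
        ["hdfs", "dfs", "-cat", f"{hdfs_dir}/*"],
        capture_output=True, check=True,
    )
    return result.stdout.count(b"\n")


def validate_staging(cursor, table, hdfs_dir):
    # Staging validation: record counts must match after ingestion.
    src = source_row_count(cursor, table)
    staged = staged_record_count(hdfs_dir)
    assert src == staged, f"count mismatch: source={src}, staged={staged}"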
Tools Used in Big Data Scenarios

• HDFS – for data storage
• Pig & Hive / Map reduce – for processing & transforming data
• Sqoop – for bulk transfer of data between RDBMS and HDFS
• Kafka – for real-time data streaming (a minimal producer sketch follows this slide)
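
For the streaming side, the short sketch below publishes one event to a Kafka topic with the kafka-python client; the broker address, topic name and payload fields are made-up values for illustration.

import json

from kafka import KafkaProducer  # kafka-python package; confluent-kafka is a common alternative

# Broker address and topic name are placeholders.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

# One event per reading; a downstream consumer (for example an ETL job
# landing data into HDFS) subscribes to the same topic.
producer.send("sensor-readings", {"device": "thermo-01", "temp_c": 21.4})
producer.flush()  # block until the broker acknowledges the message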


Automation Tools & Challenges in Big Data Testing

Automation tools:
• TestingWhiz – helps in verifying structured & unstructured data sets and schemas at different sources such as Hive, Map reduce, Sqoop & Pig
• QuerySurge – helps in end-to-end testing (a simple end-to-end comparison sketch follows this slide)

Challenges in Big Data testing:
• Large datasets & possible latency
• Automation tools may not be well equipped to handle unexpected challenges
• Performance testing

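In the spirit of the end-to-end checks that tools such as QuerySurge automate, here is a minimal hand-rolled comparison between the source system and the target warehouse. The query, the region/amount columns and the idea of passing in two DB-API cursors (for example pyodbc for the source and PyHive for the warehouse) are illustrative assumptions.

def grouped_totals(cursor, table):
    # Aggregate per group; table and column names are placeholders.
    cursor.execute(f"SELECT region, SUM(amount) FROM {table} GROUP BY region")
    return {region: float(total) for region, total in cursor.fetchall()}


def validate_end_to_end(source_cursor, warehouse_cursor,
                        source_table="sales", warehouse_table="dw_sales"):
    # End-to-end check: every group total in the source must match the
    # corresponding total in the warehouse after the ETL run.
    src = grouped_totals(source_cursor, source_table)
    dw = grouped_totals(warehouse_cursor, warehouse_table)
    mismatches = {
        key: (src.get(key), dw.get(key))
        for key in src.keys() | dw.keys()
        if src.get(key) != dw.get(key)
    }
    assert not mismatches, f"aggregate mismatches: {mismatches}"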