BI Testing Tutorial V1.0


B.I. Testing
...torture the data

By:

Nikhil Bajaj
(Bachelor of Engineering in Information Technology)
( B.I. tester in iGATE Patni )

B.I. Testing.torture the data

Version 1.0, August 2011

INDEX

Sr. no.  Topic
1        Challenges in the BI Testing field
2        Expectations from the IT Industry
         BI/DW
3        What is BI?
4        Business Intelligence and Data Warehouse
5        What is a Data Warehouse?
6        Generally how does the data flow in a Data Warehouse?
7        What is a Data Mart?
8        What is ETL?
         BI/DW Testing
9        What is the need to test a Data Warehouse?
10       Data Warehouse Testing and Database Testing
11       Type of testing done in a Data Warehouse project
12       Who all are involved in testing a data warehouse?
13       What are the phases undergone by the QA team?
14       How does the QA team prepare test cases?
15       Query Format
16       Example
17       What are the tools that a QA team may use?

Mail your queries to [email protected]

Challenges in the BI Testing field

There are many challenges to the development of the specialized skills required for BI testing:

1. Unwillingness on the part of DW developers
Any IT professional planning to build a career in this exciting field aims to be an expert ETL developer, OLAP specialist, dimensional data modeler or DW architect; DW tester doesn't even make the list of desirable roles. This is due to the false perception that only such roles carry premium rates in the job market and only such roles get to face the technical challenges associated with a BI project. This has left the BI project team with very few takers for the challenging and critical role of tester.

2. Lack of awareness
As a general practice, testers plan their careers so that they specialize in the tools involved in test execution (e.g., WinRunner, SilkTest) and test management (e.g., Quality Center), with very little effort to develop skills in the underlying technology. But a good understanding of ETL/OLAP tools and technologies is an essential skill for BI testing and, so far, testers have not developed a keen interest in this skill.

3. Absence of tools
The BI marketplace is flooded with many tools and vendors, each attempting to replace the other in the three layers of BI: database, ETL and OLAP. But there are no popular ETL/OLAP testing tools in the market that offer features for automated testing or functional testing.

4. Lack of standard approach/methodology
While standard methodologies exist for testing as a whole, there seems to be no industry-wide view on the suggested approach or methodology for BI testing. An ideal methodology should include a test strategy, a test plan and test cases that cover thorough testing of the various phases of data movement. Creating test cases and test data that provide adequate coverage of each of the phases is critical for ensuring comprehensive quality assurance (QA) of the DW.

Expectations from the IT Industry

Listed below are some initiatives that can provide the much-needed boost to the BI testing field:

1. Promote awareness within the DW community that BI testing is a challenging proposition requiring highly valued skills, thereby encouraging ETL and BI developers to assume these roles. Moreover, leading IT players with extensive experience in the DW/BI area should promote well-defined career options and career progression plans to the ETL/OLAP developers and conventional testers.
2. Invest in research to formalize methodologies covering the entire spectrum of DW/BI testing in full detail.
3. Invest in building assets, tools and job aids to strengthen this function and provide productivity gains.
4. Develop training courses and course content to cross-train ETL/OLAP developers in testing nuances and testers in DW and ETL/OLAP tools and technology concepts.
5. Build strong testing teams with complementary skills.

The topics covered in this document are arranged in response to the challenges listed above. Keeping these challenges and expectations in mind, we first look at what Business Intelligence is and why the term is so often used together with Data Warehouse. We then cover the Data Warehouse, the Data Mart and ETL, the difference between database testing and data warehouse testing, and finally the details of BI testing itself.

To test an object, we first need to understand what that object is. So let us start by understanding Business Intelligence, so that we can learn how to test it.

BI/DW:

What is BI?

BI is an abbreviation of Business Intelligence: bringing the right information at the right time to the right people in the right format.

Definition:
It is a broad category of applications and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions.

Explanation (What is BI about?):
The five key stages of Business Intelligence:
1. Data Sourcing
2. Data Analysis
3. Situation Awareness
4. Risk Assessment
5. Decision Support

1. Data sourcing
Business Intelligence is about extracting information from multiple sources of data. The data might be text documents (e.g. memos, reports or email messages), photographs and images, sounds, formatted tables, web pages and URL lists. The key to data sourcing is to obtain the information in electronic form. So typical sources of data might include scanners, digital cameras, database queries, web searches, computer file access, etc.

2. Data analysis
Business Intelligence is about synthesizing useful knowledge from collections of data. It is about estimating current trends, integrating and summarizing disparate information, validating models of understanding, and predicting missing information or future trends. This process of data analysis is also called data mining or knowledge discovery.

3. Situation awareness
Business Intelligence is about filtering out irrelevant information, and setting the remaining information in the context of the business and its environment. The user needs the key items of information relevant to his or her needs, and summaries that are syntheses of all the relevant data (market forces, government policy etc.). Situation awareness is the grasp of the context in which to understand and make decisions.

4. Risk assessment
Business Intelligence is about discovering what plausible actions might be taken, or decisions made, at different times. It is about helping you weigh up the current and future risk, cost or benefit of taking one action over another, or making one decision versus another. It is about inferring and summarizing your best options or choices.

5. Decision support
Business Intelligence is about using information wisely. It aims to provide you warning of important events, such as takeovers, market changes, and poor staff performance, so that you can take preventative steps. It seeks to help you analyze and make better business decisions, to improve sales or customer satisfaction or staff morale. It presents the information you need, when you need it.

Business Intelligence and Data Warehouse

Business intelligence is a term commonly associated with data warehousing. In fact, many of the tool vendors position their products as business intelligence software rather than data warehousing software. This is because BI applications often use data gathered from a data warehouse or a data mart. However, not all data warehouses are used for business intelligence, nor do all business intelligence applications require a data warehouse. In this document, we treat DW testing and BI testing as the same.

Difference:
Business intelligence usually refers to the information that is available for the enterprise to make decisions on. A data warehousing (or data mart) system is the backend, or the infrastructural component, for achieving business intelligence.

What is a Data Warehouse?

A data warehouse, abbreviated DW, is a collection of data designed to support management decision making. Data warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time. A data warehouse is a place where data is stored for archival, analysis and security purposes. Usually a data warehouse is either a single computer or many computers (servers) tied together to create one giant computer system.

Definition:
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process.

Explanation:
The important characteristics of a Data Warehouse:
1. Subject Oriented
2. Integrated
3. Time-variant
4. Non-volatile

1. Subject Oriented
It contains data that gives information about a particular subject instead of about a company's ongoing operations.

2. Integrated
It contains data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.

3. Time-variant
All data in the data warehouse is identified with a particular time period.

4. Non-volatile
Data is stable in a data warehouse. More data is added but data is never removed. This enables management to gain a consistent picture of the business.

Generally how does the data flow in a Data Warehouse?

[Diagram: data flow in a data warehouse. Labels shown: Source, .txt File (Flat File), Staging (1. Staging History, 2. Staging Error), ETL, Data Warehouse, Data Mart]

What is a Data Mart?

A data mart is a subset of the data warehouse. It is a repository of data that holds information on a specific business area, for example Sales. Data marts have the same definition as a data warehouse but have a limited audience and/or limited data content.

So now the question is: how do we move or copy the data from the everyday transactional database to the data warehouse? This is where ETL comes into play.

What is ETL?

ETL is short for Extract, Transform, Load: three database functions that are combined into one tool to pull data out of one database and place it into another database.

Definition:
ETL is a process used to collect data from various sources, transform the data depending on business rules/needs and load the data into a destination database.

Explanation:
The ETL process has 3 main steps, which are Extract, Transform and Load.

1. Extract
The first step in the ETL process is extracting the data from various sources. Each of the source systems may store its data in a completely different format from the rest. The sources are usually flat files or RDBMS, but almost any data storage can be used as a source for an ETL process.

2. Transform
Once the data has been extracted and converted into the expected format, it is time for the next step in the ETL process, which is transforming the data according to a set of business rules. The data transformation may include various operations including but not limited to filtering, sorting, aggregating, joining data, cleaning data, generating calculated data based on existing values, validating data, etc.

3. Load
The final ETL step involves loading the transformed data into the destination target, which might be a database or data warehouse.
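The three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how an ETL tool such as Informatica works internally: the flat-file content, the table name target_students and the two business rules (keep students with at least 60 percent, write names in upper case) are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a flat file with one row per student.
SOURCE_CSV = """roll_no,name,percentage
1,asha,72.5
2,ravi,58.0
3,meena,60.0
"""

def extract(text):
    """Extract: read the raw rows from the flat file."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: apply the assumed business rules (filter and upper-case)."""
    return [
        (int(r["roll_no"]), r["name"].upper(), float(r["percentage"]))
        for r in rows
        if float(r["percentage"]) >= 60
    ]

def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE target_students "
        "(roll_no INTEGER PRIMARY KEY, name TEXT, percentage REAL)"
    )
    conn.executemany("INSERT INTO target_students VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database standing in for the target
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT roll_no, name, percentage FROM target_students ORDER BY roll_no"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions mirrors what the tester has to verify: what was read, which rules were applied, and what finally landed in the target.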

BI/DW Testing:

The main difference between normal testing and testing a data warehouse is that the test cases for a data warehouse are largely written as SQL queries.

What is the need to test a Data Warehouse?
1. Data selection from multiple source systems, and the analysis that follows, pose a great challenge.
2. Volume and complexity of the data.
3. Inconsistent and redundant data in a data warehouse.
4. Loss of data during the ETL process.
5. Non-availability of a comprehensive test bed.
6. The data is critical for the business.

Data Warehouse Testing and Database Testing

All data warehouses are databases, but not all databases are data warehouses. A data warehouse is a database designed to facilitate querying and analysis. Often designed as OLAP (On-Line Analytical Processing) systems, these databases contain read-only data that can be queried and analyzed far more efficiently than regular OLTP application databases.

Testing a database and testing a data warehouse are more or less the same, except for the following points:
1. The ETL processes together build the DW, so ETL testing is the main component of DW testing.
2. Since a data warehouse is mainly used for reporting, its reporting functionality must be tested.
3. Data warehouses store a very large amount of data compared to transactional databases, so performance testing of a DW is also recommended. In transactional databases, with their smaller volumes, performance is usually less of a concern.
4. Data warehouses have to store historic data, and this feature has to be checked in DW testing. Transactional databases rarely keep historic data.

This document mainly focuses on the ETL testing part.

Type of testing done in a Data Warehouse project:

The type and number of tests performed on a data warehouse vary from project to project. Some of the common ones are:

1. Requirement testing: Requirement testing is conducted before any other level of testing. It verifies whether or not all the requirements provided in the specification are fulfilled.
2. ETL testing: In the ETL testing stage, we make sure that appropriate changes in the source system are captured properly and propagated correctly into the data warehouse.
3. Smoke Testing: A smoke test is a collection of written tests that are performed on a system before it is accepted for further testing. This is also known as a build verification test.
4. Functional Testing: In the functional testing stage, we make sure all the business requirements are fulfilled.
5. Unit Testing: Developers perform tests on their deliverables during and after their development process. The unit test is performed on individual components and is based on the developer's knowledge of what should be developed.
6. Integration Testing: Here we validate that data processed according to the business and functional requirements produces the correct number of transferred rows, and we verify the data load volumes.

7. Regression Testing: Validate that the system continues to function correctly after being changed. It is performed after a reported defect has been fixed by the developer.
8. End-to-end testing: In the end-to-end testing stage, we let the system run for a few days to simulate production situations.
9. System Testing: System testing is performed to prove that the system meets the functional specifications from an end-to-end perspective. We as a testing team verify that the data in the source system databases and the data in the target are consistent throughout the process. The QA environment should be a replica of production before the system test is run.
10. User Acceptance Testing: The objective of user acceptance testing is to certify that a release meets user expectations and is ready for production.

Who all are involved in testing a data warehouse?


1. Business Analysts gather and document requirements
2. QA Testers develop and execute test plans and test scripts
3. Infrastructure people set up test environments
4. Developers perform unit tests of their deliverables
5. DBAs test for performance and stress
6. Business Users perform functional tests including User Acceptance Tests (UAT)

QA, short for Quality Assurance, is any systematic process of checking whether a product or service being developed meets specified requirements. Many companies have a separate department devoted to quality assurance, known as the QA team.

What are the phases undergone by the QA team?

While implementing testing best practices, the QA team goes through the following phases in data warehouse testing:

1. Business understanding
   a. High Level Test Approach
   b. Test Estimation
   c. Review Business Specification
   d. Attend Business Specification and Technical Specification
   e. Walkthroughs
2. Test plan creation, review and walkthrough
3. Test case creation, review and walkthrough
4. Test Bed & Environment setup
5. Receiving test data file from the developers
6. Test predictions creation, review (setting up the expected results)
7. Test case execution and regression testing if required
   a. Comparing the predictions with the actual results by testing the business rules in the test environment.
   b. Displaying the comparison result in a separate worksheet.
8. Deployment
   a. Validating the business rules in the production environment.

How does the QA team prepare test cases?

This topic is very important for a test engineer who is responsible for writing the test cases. There are certain types of checks that can be done on the data under review:

1. Attribute check
2. Current Row check
3. Duplicate check
4. Original Key check
5. Reconciliation check
6. Relationship check

1. Attribute check
Attribute check means verifying that the data is moving correctly from the source table to the target table.

2. Current Row check
Current Row check means verifying that the current indicator is Y (the indicator for the latest record) for all the latest rows (those with the latest time stamp).

3. Duplicate check
Duplicate check means checking that there are no duplicate values in columns that are required to be unique.

4. Original Key check
Original Key check means checking that the NOT NULL columns have a value in every row.

5. Reconciliation check
Reconciliation check means verifying that the number of rows in the target and the number of rows coming from the source are the same.

6. Relationship check
Relationship check means checking that every key value in the child table that refers to the parent table (the foreign key) is present in the parent table.

Query Format

Given below are the formats for writing SQL queries to perform all types of checks.

1. Attribute check

    Select count(1)
    From (
        Select source table attributes
        From source table
        Where list of conditions
        Except
        Select corresponding target table attributes
        From target table
        Where list of conditions
    ) alias (alternate name)

Expected output: Count=0

In the above query, we first retrieve from the source table all the attributes that are mapped to the target, and then remove from this result every row that is also present in the target table. So the result count should be zero, meaning that every mapped source row is present in the target table with the same values, and the test case can be passed.
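The format above can be tried out on a small in-memory database. The sketch below uses Python's sqlite3 module; the tables src and tgt, their columns and the upper-case transformation are hypothetical. It also runs the same comparison in the reverse direction, which is an addition to the format above: Except only reports source rows missing from the target, so a stray extra row in the target goes unnoticed unless the query is also run the other way round.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER, name TEXT);
    CREATE TABLE tgt (id INTEGER, name TEXT);
    INSERT INTO src VALUES (1, 'asha'), (2, 'ravi');
    -- Seeded defect: the target holds one row that never existed in the source.
    INSERT INTO tgt VALUES (1, 'ASHA'), (2, 'RAVI'), (3, 'STRAY');
""")

def count_missing(conn, expected_sql, actual_sql):
    """Count rows returned by expected_sql that are absent from actual_sql."""
    sql = f"SELECT count(1) FROM ({expected_sql} EXCEPT {actual_sql}) a"
    return conn.execute(sql).fetchone()[0]

# Source side with the (assumed) transformation applied, and the target side as loaded.
expected = "SELECT id, upper(name) FROM src"
actual = "SELECT id, name FROM tgt"

source_not_in_target = count_missing(conn, expected, actual)  # the check as given above
target_not_in_source = count_missing(conn, actual, expected)  # the reverse direction
print(source_not_in_target, target_not_in_source)
```

Here the check as given passes with a count of zero, while the reverse direction reports the stray row.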

2. Current Row check

    Select count(1)
    From (
        Select records
        From table_1
        Where list of conditions (records with current time stamp but having indicator N)
    ) alias

Assumption:
Indicator for current record: Y
Indicator for old record: N

Expected output: Count=0

In the above query, we retrieve those records which have the current timestamp but whose indicator is still N. So if the result count is zero, it means that there are no records that are current but are marked as old, and the test case can be passed.
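One possible way to write the "current time stamp" condition is a correlated subquery that finds the latest timestamp for each key. The sketch below assumes a hypothetical dimension table cust_dim with columns cust_id, load_ts and current_ind, and stores timestamps as ISO date strings so that max() picks the latest one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cust_dim (cust_id INTEGER, load_ts TEXT, current_ind TEXT);
    INSERT INTO cust_dim VALUES
        (1, '2011-01-01', 'N'),
        (1, '2011-06-01', 'Y'),
        (2, '2011-03-01', 'N'),
        (2, '2011-07-01', 'N');  -- seeded defect: latest row for key 2 is marked N
""")

# Rows that carry the latest timestamp for their key but are not flagged Y.
CURRENT_ROW_CHECK = """
    SELECT count(1)
    FROM (
        SELECT d.cust_id
        FROM cust_dim d
        WHERE d.load_ts = (SELECT max(m.load_ts)
                           FROM cust_dim m
                           WHERE m.cust_id = d.cust_id)
          AND d.current_ind <> 'Y'
    ) a
"""

bad_rows = conn.execute(CURRENT_ROW_CHECK).fetchone()[0]
print(bad_rows)
```

With the seeded defect the count is non-zero, so the test case fails. A complementary query for older rows that are wrongly flagged Y can be written the same way.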

3. Duplicate check

    Select count(1)
    From (
        Select attribute_list_1
        From table_1
        Where list of conditions
        Group by attribute_list_1
        Having count(1) > 1
    ) alias

Expected output: Count=0

In the above query, we retrieve the attributes which are supposed to be unique and group by the same attributes. Each group then collects all the records that share the same values, so the count is greater than 1 for any value that is duplicated. When we count such groups and get zero, this shows that there are no duplicate values in the unique columns and the test case can be passed.
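When the count is not zero, the tester usually wants to know which values are duplicated. A small variation of the format above, sketched here with a hypothetical table tgt and a helper find_duplicates that is not part of any library, returns the offending keys together with how often each occurs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tgt (roll_no INTEGER, name TEXT);
    INSERT INTO tgt VALUES (1, 'ASHA'), (2, 'RAVI'), (2, 'RAVI'), (3, 'MEENA');
""")

def find_duplicates(conn, table, key_columns):
    """Return each duplicated key together with its number of occurrences.

    Table and column names are pasted into the SQL text, so they must come
    from the test case itself and never from untrusted input.
    """
    cols = ", ".join(key_columns)
    sql = (
        f"SELECT {cols}, count(1) FROM {table} "
        f"GROUP BY {cols} HAVING count(1) > 1"
    )
    return conn.execute(sql).fetchall()

duplicates = find_duplicates(conn, "tgt", ["roll_no"])
print(duplicates)
```

An empty list corresponds to the expected output Count=0.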

4. Original Key check

    Select count(1)
    From table
    Where list of conditions
    And (any of NOT NULL values are NULL)

Expected output: Count=0

In the above query, we retrieve all the records which have NULL in any of the NOT NULL columns and then take their count. If the count is zero, this means there are no such records and the test case can be passed.
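The condition "any of NOT NULL values are NULL" becomes a chain of IS NULL tests joined with OR. The sketch below builds that chain from a list of mandatory columns; the table tgt, its columns and the helper count_null_violations are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tgt (roll_no INTEGER, name TEXT, percent REAL);
    INSERT INTO tgt VALUES
        (1, 'ASHA', 72.5),
        (NULL, 'RAVI', 65.0),   -- seeded defect: key is missing
        (3, NULL, 61.0);        -- seeded defect: name is missing
""")

def count_null_violations(conn, table, required_columns):
    """Count rows in which at least one mandatory column is NULL."""
    condition = " OR ".join(f"{col} IS NULL" for col in required_columns)
    sql = f"SELECT count(1) FROM {table} WHERE {condition}"
    return conn.execute(sql).fetchone()[0]

violations = count_null_violations(conn, "tgt", ["roll_no", "name"])
print(violations)
```

The list of mandatory columns would normally be taken from the mapping document or the table definition.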

5. Reconciliation check

    Select count(*)
    From source table
    Where list of conditions

    Select count(*)
    From target table
    Where list of conditions

Expected output: Source count = Target count

In the above check, there are two queries, one fetching the count of records in the source table and the other fetching the count of records in the target table. If both counts are the same, this means that there is an equal number of records in source and target and the test case can be passed.
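Because this check consists of two separate queries, the comparison itself has to be done outside SQL. A minimal sketch, assuming hypothetical tables src and tgt and a load rule that keeps only rows with at least 60 percent:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER, percentage REAL);
    CREATE TABLE tgt (id INTEGER, percentage REAL);
    INSERT INTO src VALUES (1, 72.5), (2, 58.0), (3, 60.0);
    INSERT INTO tgt VALUES (1, 72.5), (3, 60.0);
""")

def reconcile(conn, source_sql, target_sql):
    """Run both count queries and report the counts and whether they agree."""
    source_count = conn.execute(source_sql).fetchone()[0]
    target_count = conn.execute(target_sql).fetchone()[0]
    return source_count, target_count, source_count == target_count

result = reconcile(
    conn,
    "SELECT count(*) FROM src WHERE percentage >= 60",
    "SELECT count(*) FROM tgt",
)
print(result)
```

Equal counts only show that the same number of rows arrived, not that they are the right rows, which is why this check is used together with the attribute check.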

6. Relationship check

    Select count(child_id)
    From (
        Select parent_attribute_to_be_checked parent_id,
               child_attribute_to_be_checked child_id
        From (
            Select distinct attributes from child table
            Left outer join
            Select distinct attributes from parent table
            On join conditions
        )
    )
    Where parent_id IS NULL

Expected output: Count=0

In the above query, we retrieve all the records in the child table which have no matching record in the parent table and then take their count. If the count is zero, this means that there are no such records and the test case can be passed. Checking a lookup condition is the most common example of this type of check.
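Written out in full, each side of the join becomes its own subquery with an alias. The sketch below uses hypothetical tables dept (parent) and emp (child) joined on dept_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY);
    CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, dept_id INTEGER);
    INSERT INTO dept VALUES (10), (20);
    INSERT INTO emp VALUES (1, 10), (2, 20), (3, 30);  -- dept 30 has no parent row
""")

# Child keys that find no partner in the parent come back with parent_id NULL.
RELATIONSHIP_CHECK = """
    SELECT count(j.child_id)
    FROM (
        SELECT c.dept_id AS child_id, p.dept_id AS parent_id
        FROM (SELECT DISTINCT dept_id FROM emp) c
        LEFT OUTER JOIN (SELECT DISTINCT dept_id FROM dept) p
          ON c.dept_id = p.dept_id
    ) j
    WHERE j.parent_id IS NULL
"""

orphans = conn.execute(RELATIONSHIP_CHECK).fetchone()[0]
print(orphans)
```

Since count(child_id) counts only non-NULL values, child rows whose key is itself NULL are not reported here; those are the business of the original key check.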

Example

The example below will make the above queries easier to understand. Consider a source table STUDENTS and a target table FRST_CLAS_STUDS. We have to test whether the transformations between these two tables, as given in the mapping document, are working properly. The table below shows the mapping between the two tables.

Source table   Source column   Target table       Target column    Transformation
STUDENTS       SR. NO
STUDENTS       NAME            FRST_CLAS_STUDS    NAME             Capitalize each letter
STUDENTS       ROLL_NO         FRST_CLAS_STUDS    ROLL_NO (P.K.)   It should be present in the source table
STUDENTS       PERCENTAGE      FRST_CLAS_STUDS    PERCENT          Direct mapping
STUDENTS       CLASS
STUDENTS       ADDRESS
STUDENTS       DOB

Attribute check:
In the target table, 3 columns are mapped from the source table, each with its own transformation. We have to test each attribute that is present in the target table, leaving aside the source attributes which are not mapped.

    Select count(*)
    From (
        Select upper(S.NAME), S.ROLL_NO, S.PERCENTAGE
        From STUDENTS S
        Where S.PERCENTAGE >= 60
        Except
        Select F.NAME, F.ROLL_NO, F.PERCENT
        From FRST_CLAS_STUDS F
    ) A;

Expected output: Count=0

Duplicate check:
In the target table, attribute ROLL_NO is the primary key, so it has to be unique. We have to test whether this attribute is unique or not.

    Select count(*)
    From (
        Select ROLL_NO
        From FRST_CLAS_STUDS
        Group by ROLL_NO
        Having count(*) > 1
    ) A;

Expected output: Count=0

Original key check:
In the target table, attribute ROLL_NO is the primary key, so it has to be NOT NULL. We have to test whether this attribute has a value in every record.

    Select count(*)
    From FRST_CLAS_STUDS
    Where ROLL_NO is NULL;

Expected output: Count=0

Reconciliation check:
We have to test whether the correct number of records has been moved from source to target.

    Select count(*)
    From STUDENTS
    Where PERCENTAGE >= 60;

    Select count(*)
    From FRST_CLAS_STUDS;

Expected output: count from 1st query = count from 2nd query

Relationship check:
In the target, the attribute ROLL_NO is derived from the attribute ROLL_NO in the source. So we have to check whether all roll numbers in the target are present in the source.

    Select count(A.CHILD_ROLL_NO)
    From (
        Select F.ROLL_NO As CHILD_ROLL_NO, S.ROLL_NO As PARENT_ROLL_NO
        From (Select distinct ROLL_NO From FRST_CLAS_STUDS) F
        Left outer join (Select distinct ROLL_NO From STUDENTS) S
          On F.ROLL_NO = S.ROLL_NO
    ) A
    Where A.PARENT_ROLL_NO is NULL;

Expected output: Count=0
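The five example checks can also be run together as a small suite. The sketch below is only one way of doing this: it loads a few made-up rows into an in-memory SQLite database, renames the source column SR. NO to SR_NO so that it is a plain identifier, leaves out the unmapped columns, writes the relationship query with explicit subqueries and a join condition, and expresses the reconciliation check as a difference of the two counts so that every check passes on 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENTS (SR_NO INTEGER, NAME TEXT, ROLL_NO INTEGER, PERCENTAGE REAL);
    CREATE TABLE FRST_CLAS_STUDS (NAME TEXT, ROLL_NO INTEGER, PERCENT REAL);
    INSERT INTO STUDENTS VALUES
        (1, 'asha', 101, 72.5), (2, 'ravi', 102, 58.0), (3, 'meena', 103, 60.0);
    INSERT INTO FRST_CLAS_STUDS VALUES ('ASHA', 101, 72.5), ('MEENA', 103, 60.0);
""")

# Every query returns a single number; a check passes when that number is 0.
CHECKS = {
    "attribute": """
        SELECT count(*) FROM (
            SELECT upper(S.NAME), S.ROLL_NO, S.PERCENTAGE
            FROM STUDENTS S WHERE S.PERCENTAGE >= 60
            EXCEPT
            SELECT F.NAME, F.ROLL_NO, F.PERCENT FROM FRST_CLAS_STUDS F
        ) A""",
    "duplicate": """
        SELECT count(*) FROM (
            SELECT ROLL_NO FROM FRST_CLAS_STUDS
            GROUP BY ROLL_NO HAVING count(*) > 1
        ) A""",
    "original_key": "SELECT count(*) FROM FRST_CLAS_STUDS WHERE ROLL_NO IS NULL",
    "reconciliation": """
        SELECT (SELECT count(*) FROM STUDENTS WHERE PERCENTAGE >= 60)
             - (SELECT count(*) FROM FRST_CLAS_STUDS)""",
    "relationship": """
        SELECT count(A.CHILD_ROLL_NO) FROM (
            SELECT F.ROLL_NO AS CHILD_ROLL_NO, S.ROLL_NO AS PARENT_ROLL_NO
            FROM (SELECT DISTINCT ROLL_NO FROM FRST_CLAS_STUDS) F
            LEFT OUTER JOIN (SELECT DISTINCT ROLL_NO FROM STUDENTS) S
              ON F.ROLL_NO = S.ROLL_NO
        ) A WHERE A.PARENT_ROLL_NO IS NULL""",
}

results = {name: conn.execute(sql).fetchone()[0] for name, sql in CHECKS.items()}
failed = [name for name, value in results.items() if value != 0]
print(results, failed)
```

Keeping the queries in one dictionary makes it easy to rerun the whole set after a defect fix, which is what regression testing of the load amounts to.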

What are the tools that a QA team may use?

1. Data access tools (e.g., TOAD, WinSQL) are used to analyze the content of tables and the results of loads.
2. ETL tools (e.g., Informatica, DataStage).
3. Test management tools (e.g., TestDirector, Quality Center) that maintain and track the requirements, test cases, defects and traceability matrix.

All the best for your future as a data warehouse or database tester!
