Testing the backend database means comparing the actual results with the expected results.
Database testing basically includes the following:
1) Data validity testing
2) Data integrity testing
3) Performance related to the database
4) Testing of procedures, triggers and functions
For data validity testing you should be good at SQL queries.
For data integrity testing you should know about referential integrity and the different constraints.
For performance-related testing you should have an idea about the table structure and design.
For testing procedures, triggers and functions you should be able to read and understand them.
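As a simple illustration, data validity testing often boils down to SQL that compares actual values in the target against the source. This is only a sketch; the schema, table and column names below are placeholders, not part of any particular system:
-- Rows whose email value in the target does not match the source (placeholder names)
SELECT s.customer_id, s.email AS source_email, t.email AS target_email
FROM source_db.customer s
JOIN target_db.customer t ON t.customer_id = s.customer_id
WHERE COALESCE(s.email, '') <> COALESCE(t.email, '');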
***************************************************************************
****
Database testing is normally done with a smaller volume of data on OLTP (Online Transaction Processing) type databases, while data warehouse testing is done with large volumes of data on OLAP (Online Analytical Processing) databases.
In database testing, data is normally injected consistently from uniform sources, while in data warehouse testing most of the data comes from different kinds of data sources which are not always consistent with one another.
We generally perform the full set of CRUD (Create, Read, Update and Delete) operations in database testing, while in data warehouse testing we mostly use read-only (SELECT) operations.
Normalized databases are used in DB testing, while denormalized databases are used in data warehouse testing.
There are a number of universal verifications that have to be carried out for any kind of data warehouse testing.
Below is the list of checks that are treated as essential for validation in ETL testing (a few sample queries follow this list):
Verify that data transformation from source to destination works as expected
Verify that expected data is added in target system
Verify that all DB fields and field data is loaded without any truncation
Verify data checksum for record count match
Verify that for rejected data proper error logs are generated with all details
Verify NULL value fields
Verify that duplicate data is not loaded
Verify data integrity
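The sample queries below sketch how a few of these checks might be written in SQL; the table and column names are placeholders and the exact syntax may vary by database:
-- Record count match between source and target
SELECT (SELECT COUNT(*) FROM stg.orders)     AS source_count,
       (SELECT COUNT(*) FROM dw.fact_orders) AS target_count;
-- Duplicate check on the target's business key
SELECT order_id, COUNT(*) AS cnt
FROM dw.fact_orders
GROUP BY order_id
HAVING COUNT(*) > 1;
-- NULL check on a mandatory field
SELECT COUNT(*) AS null_customer_keys
FROM dw.fact_orders
WHERE customer_key IS NULL;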
***************************************************************************
Extract
In this step we extract data from different internal and external sources, structured and/or
unstructured. Plain queries are sent to the source systems, using native connections, message
queuing, ODBC or OLE-DB middleware. The data will be put in a so-called Staging Area (SA), usually
with the same structure as the source. In some cases we want only the data that is new or has changed; in that case the queries will return only the changes. Some tools can do this automatically, providing a changed data capture (CDC) mechanism.
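A very simple form of incremental extraction can be expressed as a query that pulls only rows changed since the last load. This is a sketch under assumptions: the audit timestamp column and the control table used here are placeholders, not part of any particular tool:
-- Pull only rows changed since the last successful load (placeholder names)
INSERT INTO staging.customer
SELECT c.*
FROM source.customer c
WHERE c.last_update_ts > (SELECT MAX(load_ts)
                          FROM etl_control.load_log
                          WHERE table_name = 'customer');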
Transform
Once the data is available in the Staging Area, it is all on one platform and one database. So we can
easily join and union tables, filter and sort the data using specific attributes, pivot to another
structure and make business calculations. In this step of the ETL process, we can check data quality and cleanse the data if necessary. After all the data has been prepared, we can choose to implement slowly changing dimensions. In that case we want our analysis and reports to keep track of attributes that change over time, for example when a customer moves from one region to another.
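As a hedged sketch of a type 2 slowly changing dimension, the statements below first expire the current version of a changed customer row and then insert the new version. The dimension layout (start/end dates, current flag) and all names are assumptions made for illustration, and the statements must run in this order:
-- Expire the current dimension row for customers whose region has changed
UPDATE dim_customer
SET end_date = CURRENT_DATE, current_flag = 'N'
WHERE current_flag = 'Y'
  AND EXISTS (SELECT 1 FROM staging.customer s
              WHERE s.customer_id = dim_customer.customer_id
                AND s.region <> dim_customer.region);
-- Insert the new version for those customers
INSERT INTO dim_customer (customer_id, region, start_date, end_date, current_flag)
SELECT s.customer_id, s.region, CURRENT_DATE, DATE '9999-12-31', 'Y'
FROM staging.customer s
WHERE EXISTS (SELECT 1 FROM dim_customer d
              WHERE d.customer_id = s.customer_id
                AND d.current_flag = 'N'
                AND d.end_date = CURRENT_DATE);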
Load
Finally, data is loaded into a central warehouse, usually into fact and dimension tables. From there
the data can be combined, aggregated and loaded into datamarts or cubes as is deemed necessary.
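Loading and then aggregating into a datamart often comes down to plain INSERT ... SELECT statements; the fact, dimension and mart names below are illustrative placeholders only:
-- Load the fact table from the prepared staging data
INSERT INTO dw.fact_sales (date_key, product_key, store_key, quantity, amount)
SELECT s.date_key, s.product_key, s.store_key, s.quantity, s.amount
FROM staging.sales s;
-- Aggregate into a datamart table for reporting
INSERT INTO mart.monthly_sales (year_month, product_key, total_amount)
SELECT d.year_month, f.product_key, SUM(f.amount)
FROM dw.fact_sales f
JOIN dw.dim_date d ON d.date_key = f.date_key
GROUP BY d.year_month, f.product_key;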
***************************************************************************
****
What is BI used for?
Organizations use Business Intelligence to gain data-driven insights on anything related to business performance. It is used to understand and improve performance, to cut costs and to identify new business opportunities. This can include, among many other things:
Gathering Data
Gathering data is concerned with collecting or accessing data which can then be used to inform
decision making. Gathering data can come in many formats and basically refers to the automated
measurement and collection of performance data. For example, these can come from transactional
systems that keep logs of past transactions, point-of-sale systems, web site software, production
systems that measure and track quality, etc. A major challenge of gathering data is making sure that the relevant data is collected in the right way at the right time. If data quality is not controlled at the data-gathering stage, it can harm the entire BI effort that follows – always remember the old adage: garbage in, garbage out.
Storing Data
Storing Data is concerned with making sure the data is filed and stored in appropriate ways to ensure
it can be found and used for analysis and reporting. When storing data the same basic principles
apply that you would use to store physical goods – say books in a library – you are trying to find the
most logical structure that will allow you to easily find and use the data. The advantage of modern databases (often called data warehouses because of the large volumes of data) is that they allow multi-dimensional formats, so you can store the same data under different categories – also called data marts or data-warehouse access layers. As in the physical world, good data storage starts with
the needs and requirements of the end users and a clear understanding of what they want to use the
data for.
Analyzing Data
The next component of BI is analysing the data. Here we take the data that has been gathered and
inspect, transform or model it in order to gain new insights that will support our business decision
making. Data analysis comes in many different formats and approaches, both quantitative and qualitative. Analysis techniques include the use of statistical tools and data mining approaches, as well as visual analytics or even analysis of unstructured data such as text or pictures.
Providing Access
In order to support decision making the decision makers need to have access to the data. Access is
needed to perform analysis or to view the results of the analysis. The former is provided by the latest
software tools that allow end-users to perform data analysis while the latter is provided through
reporting, dashboard and scorecard applications.
***************************************************************************
****
What is Metadata?
Metadata is defined as data that describes other data. Metadata can be divided into two main types:
structural and descriptive.
Structural metadata describes the design and specification of data structures. This type of metadata describes the containers of data within a database.
Descriptive metadata describes instances of application data. This is the type of metadata that is
traditionally spoken of and described as “data about the data.”
Metadata makes it easier to retrieve, use, or manage information resources by providing users with information that adds context to the data they're working with. Metadata can describe information at any level of aggregation, including collections, single resources, or component parts of a single resource. Metadata can be embedded into a digital object or can be stored separately. Web pages contain metadata called metatags.
Metadata at the most basic level is simply defined as “data about data”. An item of metadata describes specific characteristics of an individual data item. In the database realm, metadata
is defined as, “data about data, through which the end-user data are integrated and
managed.” Metadata in a database typically store the relationships that link up numerous pieces of
data. “Metadata names these fields, describes the size of the fields, and may put restrictions on what
can go in the field (for example, numbers only).”
“Therefore, metadata is information about how data is extracted, and how it may be transformed. It
is also about indexing and creating pointers into data. Database design is all about defining metadata
schemas.” Metadata can be stored either internally, in the same file as the data, or externally, in a separate area. If the metadata is stored internally, it is kept together with the data, making it more easily accessible to view or change. However, this method creates high redundancy. If the metadata is stored externally, searches can become more efficient. There is no redundancy, but getting to this metadata may be a little more technical.
All the metadata is stored in a data dictionary or a system catalog. The data dictionary is most typically an external document, often created as a spreadsheet, that stores the conceptual design ideas for the database schema. The data dictionary also contains the general format that the data, and in effect the metadata, should take. Metadata is an essential aspect of database design; it allows for increased processing efficiency because it can help create pointers and indexes.
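In Teradata, for example, the system catalog can be queried through the DBC dictionary views. The sketch below assumes the DBC.ColumnsV view (the view names can vary slightly by release) and uses placeholder database and table names:
-- List the columns and their types for one table from the data dictionary
SELECT ColumnName, ColumnType, ColumnLength, Nullable
FROM DBC.ColumnsV
WHERE DatabaseName = 'my_database'   -- placeholder database
  AND TableName = 'customer'         -- placeholder table
ORDER BY ColumnId;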
***************************************************************************
****
Shared nothing architecture (SNA) is a distributed computing architecture that consists of multiple nodes such that each node has its own private memory, disks and input/output devices, independent of any other node in the network. Each node is self-sufficient and shares nothing across the network. Therefore, there are no points of contention across the system, since neither data nor system resources are shared. This type of architecture is highly scalable and has become quite popular, especially in the context of web development.
For instance, Google has implemented an SNA, which evidently enables it to scale web applications effectively by simply adding nodes to its network of servers without slowing down the system.
***************************************************************************
****
Symmetric Multiprocessing (SMP) is the processing of programs by multiple processors that share a common operating system and memory. SMP is also called "tightly coupled multiprocessing". A single copy of the operating system is in charge of all the processors running in an SMP system. An SMP configuration typically does not exceed 16 processors. SMP is better than MPP systems for online transaction processing, in which many users access the same database to perform a relatively simple set of common transactions. One main advantage of SMP is its ability to dynamically balance the workload among processors (and as a result serve more users at a faster rate).
Massively Parallel Processing (MPP) is the processing of programs by multiple processors that work on different parts of the program, with each processor having its own operating system and memory. These processors communicate with each other through message interfaces. There are cases in which up to 200 processors run for a single application. An interconnect arrangement of data paths allows messages to be sent between the processors working on a single application or product. The setup for MPP is more complicated than SMP: careful thought has to go into setting it up, and one should have good in-depth knowledge of how to partition the database among the processors and how to assign work to them. An MPP system is also called a loosely coupled system. MPP is considered better than SMP for applications that allow a number of databases to be searched in parallel.
***************************************************************************
****
SELECT HASHAMP()
**************************************************************************
To find the AMP number corresponding to the data in a table:
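SELECT HASHAMP() by itself returns the highest AMP number in the system (the number of AMPs minus one). To see which AMP holds each row of a table, the hashing functions can be combined as sketched below; emp is a placeholder table and emp_id its assumed primary index column:
-- AMP number for each row, based on the primary index value
SELECT emp_id, HASHAMP(HASHBUCKET(HASHROW(emp_id))) AS amp_no
FROM emp;
-- Row distribution per AMP for the table
SELECT HASHAMP(HASHBUCKET(HASHROW(emp_id))) AS amp_no, COUNT(*)
FROM emp
GROUP BY 1
ORDER BY 1;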
***************************************************************************
****
1. PRIMARY INDEX
UNIQUE PRIMARY INDEX[UPI]
NON-UNIQUE PRIMARY INDEX[NUPI]
2. SECONDARY INDEX
UNIQUE SECONDARY INDEX[USI]
NON-UNIQUE SECONDARY INDEX[NUSI]
4. HASH INDEX
5. JOIN INDEX
***************************************************************************
****
Indexes in Teradata are created at the column level. An index checks for duplicates based on the particular column(s) on which it is created.
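As a quick sketch of how primary and secondary indexes are declared (the table and column names are placeholders only):
-- Unique primary index (UPI) defined at table creation
CREATE TABLE emp (
  emp_id   INTEGER,
  dept_no  INTEGER,
  emp_name VARCHAR(50)
) UNIQUE PRIMARY INDEX (emp_id);
-- Unique secondary index (USI)
CREATE UNIQUE INDEX (emp_name) ON emp;
-- Non-unique secondary index (NUSI)
CREATE INDEX (dept_no) ON emp;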
***************************************************************************
****
How many types of tables are there in Teradata?
PERMANENT TABLES
TWO TYPES OF PERMANENT TABLES
SET TABLES [CREATED AS DEFAULT] – DOES NOT ACCEPT DUPLICATE RECORDS
MULTISET TABLES - ACCEPTS DUPLICATE RECORDS
TEMPORARY TABLES
THREE TYPES OF TEMPORARY TABLES
VOLATILE TABLES
GLOBAL TEMPORARY TABLES
DERIVED TABLES
Global Temporary Tables
They exist only for the duration of the SQL session in which they are used.
The contents of these tables are private to the session, and the system automatically drops the table contents at the end of that session.
The system saves the global temporary table definition permanently in the Data Dictionary.
The saved definition may be shared by multiple users and sessions, with each session getting its own instance of the table.
Volatile Tables
If you need a temporary table for a single use only, you can define a volatile table.
The definition of a volatile table resides in memory (RAM) but does not survive across a system restart.
It improves performance even more than using global temporary tables because the system does not
store the definitions of volatile tables in the Data Dictionary.
Access-rights checking is not necessary because only the creator can access the volatile table.
Derived Tables
A special type of temporary table is the derived table. It is specified within an SQL SELECT statement.
A derived table is obtained from one or more other tables as the result of a subquery.
The scope of a derived table is visible only at the level of the SELECT statement calling the subquery.
Using derived tables avoids having to use CREATE and DROP TABLE statements for storing retrieved information, and assists in coding more sophisticated, complex queries.
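The Teradata-style definitions below sketch each table type; all object names are placeholders:
-- MULTISET permanent table (accepts duplicate rows)
CREATE MULTISET TABLE sales_raw (
  sale_id INTEGER,
  amount  DECIMAL(10,2)
) PRIMARY INDEX (sale_id);
-- Volatile table: definition kept in memory, dropped at the end of the session
CREATE VOLATILE TABLE tmp_sales AS
  (SELECT sale_id, amount FROM sales_raw)
WITH DATA
ON COMMIT PRESERVE ROWS;
-- Global temporary table: definition stored permanently in the Data Dictionary
CREATE GLOBAL TEMPORARY TABLE gt_sales (
  sale_id INTEGER,
  amount  DECIMAL(10,2)
) ON COMMIT PRESERVE ROWS;
-- Derived table: exists only inside this SELECT
SELECT t.sale_id, t.amount
FROM (SELECT sale_id, amount FROM sales_raw WHERE amount > 100) t;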
***************************************************************************
****
- It is a temporary workspace which is used for processing rows for a given SQL statement.
- Spool space is assigned only to users.
- Once the SQL processing is complete, the spool is freed and given to some other query.
- Unused perm space is automatically available as spool.
***************************************************************************
****
4. Input/output bugs:-
Valid values not accepted
Invalid values accepted
5. Calculation bugs:-
Mathematical errors
Final output is wrong
9. H/W bugs:-
Device is not responding to the application
***************************************************************************
****
Types of ETL Testing:-
1) Constraint Testing:
In the constraint testing phase, the test engineer identifies whether the data is mapped from source to target or not.
The test engineer checks the following scenarios in the ETL testing process (a few sample checks appear after this list).
a) NOT NULL
b) UNIQUE
c) Primary Key
d) Foreign key
e) Check
f) Default
g) NULL
NOTE: Also check the order of the columns and that each source column maps to the correct target column.
14) Retesting:
Re-executing the failed test cases after the bug is fixed.
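As referenced under constraint testing above, a couple of sample constraint checks might look like this in SQL; the schema, table and column names are placeholders only:
-- NOT NULL check on a mandatory column
SELECT COUNT(*) AS null_keys
FROM target_db.customer
WHERE customer_id IS NULL;
-- Foreign key check: orders that reference a non-existent customer
SELECT o.order_id
FROM target_db.orders o
LEFT JOIN target_db.customer c ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL;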
**************************************************************************
*****
What is a secondary index? What are its uses?
A secondary index is an alternate path to the data. Secondary indexes are used to improve performance by allowing the user to avoid scanning the entire table during a query. A secondary index is like a primary index in that it allows the user to locate rows. Unlike a primary index, it has no influence on the way rows are distributed among AMPs. Secondary indexes are optional and can be created and dropped dynamically. Secondary indexes require separate subtables, which require extra I/O to maintain.
Compared to primary indexes, secondary indexes allow access to information in a table by alternate, less frequently used paths. Teradata automatically creates a secondary index subtable. The subtable will contain:
- The secondary index value
- The secondary index row ID
- The primary index row ID of the corresponding base table row
When a user writes an SQL query that has an SI in the WHERE clause, the parsing engine (PE) will hash the secondary index value. The output is the row hash of the SI. The PE creates a request containing the row hash and gives the request to the message passing layer (which includes the BYNET software and network). The message passing layer uses a portion of the row hash to point to a bucket in the hash map. That bucket contains an AMP number to which the PE's request will be sent. The AMP gets the request and accesses the secondary index subtable pertaining to the requested SI information. The AMP will check to see if the row hash exists in the subtable and double-check the subtable row against the actual secondary index value. Then, the AMP will create a request containing the primary index row ID and send it back to the message passing layer. This request is directed to the AMP with the base table row, and that AMP easily retrieves the data row.
Secondary indexes can be useful for:
Satisfying complex conditions
Processing aggregates
Value comparisons
Matching character combinations
Joining tables
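Because secondary indexes can be created and dropped dynamically, a common pattern is to add one for a known access path and drop it when it is no longer needed. As a sketch, assuming the placeholder emp table and the NUSI on dept_no from the earlier index example:
-- The optimizer can use the SI subtable for this predicate instead of a full-table scan
SELECT emp_id, emp_name
FROM emp
WHERE dept_no = 200;
-- Drop the secondary index when it is no longer useful
DROP INDEX (dept_no) ON emp;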
*********************************************************************
Write lock: Write locks enable users to modify data while locking out all other users except readers not concerned about data consistency (access-lock readers). Until a write lock is released, no new read or write locks are allowed.
Access lock: Access locks can be specified by users who are not concerned about data consistency. The use of an access lock allows for reading data while modifications are in process. Access locks are designed for decision support on large tables that are updated only by small single-row changes. Access locks are sometimes called "stale read" locks, i.e. you may get 'stale data' that hasn't been updated.
Locks may be applied at three levels:
Database – applies to all tables/views in the database
Table/view – applies to all rows in the table/view
Row hash – applies to all rows with the same row hash
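In Teradata the lock type and level can be influenced with the LOCKING modifier. A common decision-support pattern is the access ("stale read") lock, sketched below with placeholder table and column names:
-- Read a large table without blocking concurrent updates (table-level access lock)
LOCKING TABLE dw.fact_sales FOR ACCESS
SELECT store_key, SUM(amount)
FROM dw.fact_sales
GROUP BY store_key;
-- Request a row-hash-level access lock when reading a single row by primary index
LOCKING ROW FOR ACCESS
SELECT *
FROM dw.dim_store
WHERE store_key = 42;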