Teradata Architecture PDF Free
Teradata Architecture
LEVEL – LEARNER
Module 1: Teradata basics
Objectives:
After completing this chapter you will be able to answer the
questions below:
• What is Teradata?
• What are the unique features of Teradata?
• What are the Teradata components and their functions?
• What is Teradata Architecture?
Introduction to Teradata Database
• Parallel processing
– Each AMP holds a portion of the data, and the AMPs process their
portions in parallel
• Linear Scalability
– Double the AMPs and you double the speed
• Mature Optimizer
– The Parsing Engine (PE) contains the mature optimizer
• Automatic Data distribution
– Each table has a Primary Index whose value is hashed to distribute
the rows across the AMPs automatically
• Shared Nothing Architecture
– Each AMP has its own memory, CPU and disk, hence the name Shared
Nothing Architecture
• Single Data Store
– Teradata scalability allows all data to be on one system. This is the
Single Data Store
Teradata – Parallel Processing
BYNET
Teradata – Linear Scalability
Teradata Components
• Parsing engine (PE)
• BYNET (BanYan NETwork)
• AMP
• Disk
What is a Node?
• Two SMP nodes connected via the BYNETs are now one
Massively Parallel Processing (MPP) system.
Teradata Functional Overview
Parsing Engine Process Elements
• Session Control
– Manages session activities, such as logon, password validation, and
logoff.
– Recovers sessions following client or server failures.
• Parser
– Decomposes SQL into relational data management processing steps.
• Optimizer
– Determines the most efficient path to access data.
• Dispatcher
– Receives processing steps from the parser and sends them to the
appropriate AMPs via the BYNET.
– Monitors the completion of steps and handles errors encountered
during processing.
How does the PE build the best plan?
• AMPs are responsible for storing and retrieving rows from their
assigned disk (Vdisk).
• AMPs lock the tables and rows.
• AMPs sort rows and do all aggregation.
• AMPs handle all space management and space accounting.
• AMPs convert ASCII to EBCDIC when returning answer sets to the
mainframe.
• In Teradata 13, the number of AMP Worker Tasks (AWTs) per AMP is
increased for better performance.
All Teradata Tables are spread across ALL AMPS
Disk Array
• Each AMP Vproc is assigned to a virtual disk (Vdisk)
• A Vdisk may contain up to 119 GB of disk space
Module 2: RDBMS Overview
Objectives:
After completing this chapter you will be able to answer the
following questions:
• What is RDBMS?
• What is Logical/Relational Modeling?
• What is the relationship between primary and foreign keys?
• What are the advantages of Relational Modeling?
Introduction to RDBMS
Flexibility: Tables whose information has to be linked and extracted can be easily manipulated with
operators such as project and join, which return the information in the form in which it is desired.
Security: Security control and authorization can be implemented more easily by moving sensitive
attributes of a given table into a separate relation with its own authorization controls. If the
authorization requirements permit, a particular attribute can be joined back with the others to enable
full information retrieval.
Data Independence: Data independence is achieved more easily with the normalized structure used in
a relational database than with the more complicated tree or network structures.
Data Manipulation Language: Responding to a query with a language based on relational algebra and
relational calculus, e.g. SQL, is easy in the relational database approach. For data organized in other
structures, the query language becomes either complex or extremely limited in its capabilities.
Cater for future requirements: Because data is held in separate tables, it is simple to add records
that are not yet needed but may be in the future. For example, the city table could be expanded to
include every city and town in the country, even though no other records use them all yet. A flat
file database cannot do this.
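The join, project and "future requirements" points can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in relational database, not Teradata, and the table names, column names and rows are made up for the illustration.

```python
import sqlite3

# Hypothetical schema and data: names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (city_id INTEGER PRIMARY KEY, city_name TEXT);
    CREATE TABLE employee (
        emp_no    INTEGER PRIMARY KEY,
        last_name TEXT,
        city_id   INTEGER REFERENCES city(city_id)  -- foreign key to city
    );
    INSERT INTO city VALUES (1, 'Chennai'), (2, 'Pune'), (3, 'Kochi');
    INSERT INTO employee VALUES (1001, 'Smith', 1), (1002, 'Jones', 2);
""")

# Join links the two tables through the key; project keeps only two columns.
rows = conn.execute("""
    SELECT e.last_name, c.city_name
    FROM employee AS e JOIN city AS c ON e.city_id = c.city_id
    ORDER BY e.emp_no
""").fetchall()

# City 3 is already stored although no employee row references it yet.
unused = conn.execute("""
    SELECT city_name FROM city
    WHERE city_id NOT IN (SELECT city_id FROM employee)
""").fetchall()
```

Here the employee table's city_id is a foreign key that points at the primary key of the city table, which is what lets the join reassemble the full information.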
Module 3: Teradata Index
Objectives:
After completing this chapter you will be able to answer the
questions below:
• What is Primary Index?
• What is Secondary Index?
• How are data rows stored and retrieved?
Indexing
A table can have only one Primary Index, but you can combine
up to 64 columns to form one Multi-Column Primary Index.
Multi-Column Primary Index
• The Teradata Parsing Engine takes the Primary Index value of a row and
runs a math calculation called the Hash Formula on that Primary Index
column value.
• The Hash Formula produces a 32-bit Row Hash, which equates to an integer.
• The Row Hash points to a bucket in the Hash Map, and the bucket assigns
the row to an AMP.
32-bit row hash 00000000000000000000000000001101 = 13
• Every Teradata system has one Hash Map with about a million buckets.
Inside the buckets are AMP numbers.
Placing rows on AMP
• In this example, Emp_No 1001 (the Primary Index value) was hashed and the
output was a Row Hash of 13. Teradata counted over to bucket 13 in the
Hash Map, and it has the number one (1) inside that bucket. This means
that this row will go to AMP 1.
• Emp_No 1002 (the Primary Index value) was hashed and the output was a
Row Hash of 5. Teradata counted over to bucket 5 in the Hash Map, and it
has the number two (2) inside that bucket. This means that this row will
go to AMP 2.
• There is one Hashing Formula in Teradata, and it is consistent.
Review of Hashing process
• Hash the Primary Index Value for a row with the Hash
Formula.
• The output of the Hash Formula is a 32-bit Row Hash.
• Take the Row Hash and find its corresponding bucket in the
Hash Map.
• Send the row and its Row Hash to the AMP listed in the
Hash Map Bucket.
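The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: CRC-32 stands in for the Hash Formula (Teradata's actual formula is not given here), the Hash Map has 16 buckets instead of about a million, the bucket is picked with a simple modulo, and the system is assumed to have 4 AMPs.

```python
import zlib

NUM_AMPS = 4       # assumed system size
NUM_BUCKETS = 16   # tiny stand-in for the real Hash Map

# Hash Map: each bucket holds an AMP number (round-robin here, by assumption).
HASH_MAP = [bucket % NUM_AMPS for bucket in range(NUM_BUCKETS)]

def row_hash(pi_value) -> int:
    """Stand-in Hash Formula (CRC-32, not Teradata's): returns a 32-bit integer."""
    return zlib.crc32(repr(pi_value).encode("utf-8")) & 0xFFFFFFFF

def amp_for(pi_value) -> int:
    """Hash the Primary Index value, find its bucket, read the AMP number."""
    bucket = row_hash(pi_value) % NUM_BUCKETS   # assumed bucket selection
    return HASH_MAP[bucket]

# The same Primary Index value always lands on the same AMP.
placement = {emp_no: amp_for(emp_no) for emp_no in (1001, 1002, 1003, 1004)}
```

Because the function is deterministic, the same calculation that placed a row can later be repeated to find the AMP that holds it.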
Skew Factor
• NULL values in the Primary Index are a main reason for skew. A
table with a Unique Primary Index can have only one NULL value,
but a NUPI table can have many NULL values, and every NULL
value hashes to the same AMP.
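A rough way to see the effect is to count rows per AMP for two sets of Primary Index values. The sketch below is an assumption-laden illustration: CRC-32 modulo the AMP count replaces the Hash Formula and Hash Map, the system has 4 AMPs, and the "skew ratio" (busiest AMP divided by the average) is a measure chosen for this example, not a Teradata-defined metric.

```python
import zlib
from collections import Counter

NUM_AMPS = 4  # assumed system size

def amp_for(pi_value) -> int:
    """Stand-in for Hash Formula plus Hash Map lookup (CRC-32 modulo AMP count)."""
    return zlib.crc32(repr(pi_value).encode("utf-8")) % NUM_AMPS

def rows_per_amp(pi_values) -> Counter:
    """Count how many rows each AMP would receive."""
    return Counter(amp_for(value) for value in pi_values)

def skew_ratio(counts: Counter) -> float:
    """Busiest AMP's row count divided by the average (1.0 means perfectly even)."""
    average = sum(counts.values()) / NUM_AMPS
    return max(counts.values()) / average

unique_values = list(range(1000))               # behaves like a UPI
mostly_null = [None] * 900 + list(range(100))   # NUPI with many NULLs

even = skew_ratio(rows_per_amp(unique_values))
skewed = skew_ratio(rows_per_amp(mostly_null))
```

In the second data set all 900 NULL rows go to one AMP, so that AMP holds at least 3.6 times the average number of rows.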
Uniqueness Value
• Each AMP will place a Uniqueness Value after the row hash
to track duplicate values
• The Hash Formula is consistent so every Smith has the
same Row Hash and the same goes for each Jones and each
Patel. Therefore, duplicate values land on the same AMP.
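As a minimal sketch of this idea, the hypothetical Amp class below numbers the rows that share a Row Hash, so that each row gets a distinct (Row Hash, Uniqueness Value) pair. CRC-32 again stands in for the Hash Formula, and the class and its fields are invented for the illustration.

```python
import zlib
from collections import defaultdict

def row_hash(pi_value) -> int:
    """Stand-in Hash Formula (CRC-32, not Teradata's)."""
    return zlib.crc32(repr(pi_value).encode("utf-8")) & 0xFFFFFFFF

class Amp:
    """Hypothetical AMP that numbers the rows sharing a Row Hash."""

    def __init__(self):
        self._next_uniqueness = defaultdict(int)
        self.rows = {}  # (row hash, uniqueness value) -> row

    def insert(self, pi_value, row):
        """Store a row and return its (row hash, uniqueness value) pair."""
        h = row_hash(pi_value)
        self._next_uniqueness[h] += 1
        row_id = (h, self._next_uniqueness[h])
        self.rows[row_id] = row
        return row_id

amp = Amp()
ids = [amp.insert("Smith", {"last_name": "Smith", "emp_no": n}) for n in (1, 2, 3)]
```

All three Smith rows share one Row Hash, and the Uniqueness Values 1, 2 and 3 keep them apart.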
Plan:
1. The PE sees that the last name is the Primary Index.
2. It hashes 'Smith' and gets the Row Hash.
3. Row Hash = 7.
4. It counts over to bucket 7 in the Hash Map,
which says AMP 1.
5. It passes a message to AMP 1 through the
BYNET to retrieve the rows with Row Hash 7.
6. AMP 1 brings back all columns for Row Hash 7
('Smith').
Binary Search - Example
Emp_no is a USI.
The PE hashes 1004 and sees which AMP holds the row in its USI subtable (AMP 3).
The PE has the BYNET contact AMP 3, which retrieves the subtable row for 1004 (single AMP).
The AMP passes the real Row ID of the base table row (1,4) back up to the PE.
The PE uses the Row ID to find the base table row with another single-AMP retrieve.
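The two-step USI retrieval can be sketched as two dictionary lookups. Only Emp_no 1004, AMP 3 and the Row ID (1,4) come from the text; everything else is assumed for the illustration, including the reading of (1,4) as (Row Hash, Uniqueness Value) and the Hash Map entry that sends Row Hash 1 to AMP 2.

```python
# Hypothetical contents; only Emp_no 1004, AMP 3 and Row ID (1, 4) come from the text.
USI_SUBTABLE_AMP = {1004: 3}          # stand-in for hashing the USI value
usi_subtable = {3: {1004: (1, 4)}}    # AMP 3: USI value -> base-table Row ID
BASE_AMP_FOR_ROW_HASH = {1: 2}        # assumed Hash Map entry: Row Hash 1 -> AMP 2
base_table = {2: {(1, 4): {"emp_no": 1004}}}

def usi_lookup(emp_no):
    """Return the base row and the list of AMPs contacted, in order."""
    amps_touched = []

    subtable_amp = USI_SUBTABLE_AMP[emp_no]        # hash the USI value
    amps_touched.append(subtable_amp)
    row_id = usi_subtable[subtable_amp][emp_no]    # single-AMP subtable read

    base_amp = BASE_AMP_FOR_ROW_HASH[row_id[0]]    # Row ID locates the base AMP
    amps_touched.append(base_amp)
    row = base_table[base_amp][row_id]             # single-AMP base table read
    return row, amps_touched

row, amps_touched = usi_lookup(1004)
```

However many AMPs the system has, the lookup contacts at most two of them: one for the subtable row and one for the base row.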
• Syntax
First_name is a NUSI.
The PE orders every AMP to search for 'Kyle' in its NUSI subtable.
Each AMP simultaneously performs a binary search on its NUSI subtable.
If an AMP has 'Kyle', the PE orders it to retrieve the base row.
If there are 50 AMPs, then all 50 AMPs perform a binary search simultaneously, and
those that find 'Kyle' perform another binary search on the base table.
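A small sketch of the all-AMP search, using Python's bisect module for the binary search. The subtable contents are invented, the entries are assumed to be sorted by name so that a binary search applies, and the AMPs are visited one after another here although they would work in parallel.

```python
from bisect import bisect_left, bisect_right

# Hypothetical NUSI subtables: per AMP, a list of (first_name, base row key)
# kept sorted so that a binary search can be used.
nusi_subtables = {
    1: [("Amy", 11), ("Kyle", 12)],
    2: [("Bob", 21), ("Zoe", 22)],
    3: [("Kyle", 31), ("Kyle", 32), ("Max", 33)],
}

def search_amp(subtable, name):
    """Binary-search one AMP's sorted subtable; return matching base row keys."""
    names = [entry[0] for entry in subtable]
    lo, hi = bisect_left(names, name), bisect_right(names, name)
    return [key for _, key in subtable[lo:hi]]

def nusi_lookup(name):
    """All-AMP operation: every AMP searches its own subtable."""
    hits = {amp: search_amp(sub, name) for amp, sub in nusi_subtables.items()}
    return {amp: keys for amp, keys in hits.items() if keys}

kyle_rows = nusi_lookup("Kyle")
```

Every AMP takes part in the search, but only the AMPs that actually hold 'Kyle' (AMPs 1 and 3 in this made-up data) go on to read base rows.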
Objectives:
After completing this chapter, you will be able to answer the
following questions:
• What is a Teradata database and a Teradata user?
• How is space allocated to Teradata objects?
• What is the hierarchy of objects in a Teradata system?
Space
Syntax:
CREATE DATABASE new_db FROM existing_db
AS
PERMANENT = 20000000
,SPOOL = 50000000
,TEMPORARY = 20000000;
Syntax:
CREATE USER new_user FROM existing_user
AS
PERMANENT = 10000000
,PASSWORD = Acdmy
,SPOOL = 50000000
,TEMPORARY = 20000000;
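The FROM clause names the owner that gives up permanent space to the new object. The sketch below models only that bookkeeping for PERMANENT space. The SpaceOwner class and the owner's starting size of 100,000,000 bytes are assumptions made for the illustration, and spool and temporary space are left out.

```python
class SpaceOwner:
    """Hypothetical model of a database or user that owns PERMANENT space."""

    def __init__(self, name, perm):
        self.name = name
        self.perm = perm      # bytes of permanent space still owned
        self.children = []

    def create_child(self, name, perm):
        """Carve a child out of this owner's permanent space."""
        if perm > self.perm:
            raise ValueError(f"{self.name} has only {self.perm} bytes of perm space")
        self.perm -= perm
        child = SpaceOwner(name, perm)
        self.children.append(child)
        return child

existing_db = SpaceOwner("existing_db", perm=100_000_000)  # assumed starting size
new_db = existing_db.create_child("new_db", perm=20_000_000)
```

After the call, existing_db is left with 80,000,000 bytes and new_db sits below it in the hierarchy with 20,000,000 bytes of its own.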
Objectives
After completing this module you will be able to answer:
• How do locks prevent loss of data integrity?
• What are the types of locking provided by Teradata?
• What are FALLBACK tables?
Locks
Assume four SQL statements arrive for Employee_Table: the first two are SELECTs, the third is
an INSERT and the fourth is a SELECT.
Compatibility:
• A Read lock is compatible with other Read locks and with Access locks
• A Write lock is compatible only with Access locks
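The four-statement scenario can be walked through with a small compatibility table. The sketch rests on several assumptions that the text does not state: the Exclusive lock level and its row in the table, SELECT requesting a Read lock and INSERT a Write lock by default, and a first-come, first-served queue in which a request waits behind any earlier blocked request.

```python
# For each held lock, the set of requested locks that can be granted alongside it.
COMPATIBLE = {
    "ACCESS":    {"ACCESS", "READ", "WRITE"},
    "READ":      {"ACCESS", "READ"},
    "WRITE":     {"ACCESS"},
    "EXCLUSIVE": set(),
}

# Lock each statement type requests by default (assumed for this sketch).
DEFAULT_LOCK = {"SELECT": "READ", "INSERT": "WRITE"}

def schedule(statements):
    """Group statements, in arrival order, into waves that can hold locks together."""
    waves, current = [], []
    for stmt in statements:
        lock = DEFAULT_LOCK[stmt]
        if all(lock in COMPATIBLE[DEFAULT_LOCK[held]] for held in current):
            current.append(stmt)
        else:
            waves.append(current)  # a blocked request starts the next wave
            current = [stmt]
    if current:
        waves.append(current)
    return waves

waves = schedule(["SELECT", "SELECT", "INSERT", "SELECT"])
```

Under these assumptions the two leading SELECTs run together, the INSERT waits for them to finish, and the last SELECT waits for the INSERT.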
Cliques
Fallback tables
• One AMP down
– Data fully available
• Two or more AMPs down
– In different clusters
• Data fully available
– In the same cluster
• System halts
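The availability rule above reduces to one check: no cluster may have more than one AMP down. A minimal sketch, assuming a made-up layout of eight AMPs in two clusters of four:

```python
def data_available(clusters, down_amps):
    """Data stays available while every cluster has at most one AMP down."""
    down = set(down_amps)
    return all(len(down & set(cluster)) <= 1 for cluster in clusters)

# Hypothetical layout: eight AMPs in two clusters of four.
clusters = [[1, 2, 3, 4], [5, 6, 7, 8]]

one_down = data_available(clusters, [1])              # one AMP down
different_clusters = data_available(clusters, [1, 5]) # two down, different clusters
same_cluster = data_available(clusters, [1, 2])       # two down, same cluster
```

Only the last case returns False, which corresponds to the "system halts" outcome.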
• RAID 1 provides each AMP with two disks for storing data and two disks
for mirroring.
• The data disk and the mirror disk are called a mirrored pair.
• RAID 1 uses 50% of the disk space for mirroring, but it ensures 99%
uptime for customers.
• If a single disk goes down, it is easily replaced and Teradata isn't
even affected.
RAID
RAID 5 (Parity):
• For every 3 blocks of data, there is a parity block on a 4th disk.
• If a disk fails, any missing block may be reconstructed using the
other three disks.
• Reconstruction of a failed disk by the array controller takes longer
than with RAID 1.
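Reconstruction from parity can be shown with a short sketch. It assumes the parity block is the byte-wise XOR of the three data blocks, which is the usual way RAID 5 parity is computed but is not stated in the text, and the block contents are made up.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"AMP1", b"AMP2", b"AMP3"]   # three data blocks (made-up contents)
parity = xor_blocks(data)            # parity block stored on the fourth disk

# The disk holding data[1] fails: rebuild it from the two survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
```

Every missing block has to be recomputed from the three surviving disks, which is one way to see why a RAID 5 rebuild takes longer than copying from a RAID 1 mirror.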
Summary:
• RAID 1: Good Performance with disk failures. Higher cost in
terms of disk space
• RAID 5: Reduced Performance with disk failures. Lower cost in
terms of disk space
Disclaimer: Parts of the content of this course are based on the materials available from the
websites and books listed above. The materials that can be accessed from the linked sites
are not maintained by Cognizant Academy and we are not responsible for the contents
thereof. All trademarks, service marks, and trade names in this course are the marks of the
respective owner(s).