Database

Uploaded by shouryakaushik22234

=> What is Database

- A DB is a collection of records (a collection of rows, or a collection of JSON documents).

=> What is Transaction

- It is a collection of queries that are treated as one unit of work.
- In a SQL DB the data is split across different tables. When we want an operation to be applied to all or some of those tables as one unit, we use a transaction.
- Eg. money transfer: SELECT/check the current balance of the source account, UPDATE the source account after deducting the amount, UPDATE the target account after adding the money.

=> Transaction Life Span

- TA BEGIN
- TA COMMIT
- TA ROLLBACK
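The transfer example and the BEGIN / COMMIT / ROLLBACK life span can be sketched with Python's built-in sqlite3 module. SQLite stands in for a server DB here, and the `account` table, `transfer()` and `balances()` are hypothetical names used only for illustration. `isolation_level=None` stops the module from opening transactions on its own, so the three life-span statements are issued explicitly.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # isolation_level=None: the sqlite3 module will not open transactions
    # implicitly, so BEGIN / COMMIT / ROLLBACK below are fully under our control.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
    return conn


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    conn.execute("BEGIN")  # TA BEGIN
    try:
        # 1. Check the current balance of the source account.
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (src,)
        ).fetchone()
        if balance < amount:
            raise ValueError("insufficient funds")
        # 2. Deduct from the source account.
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        # 3. Add to the target account.
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("COMMIT")  # TA COMMIT: all three queries take effect together
        return True
    except Exception:
        conn.execute("ROLLBACK")  # TA ROLLBACK: none of them take effect
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall())
```

Here `transfer(conn, 1, 2, 30)` commits and leaves the balances at `{1: 70, 2: 80}`; a second call asking for 500 hits the insufficient-funds check, rolls back, and leaves the balances unchanged.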

=> Nature of Transactions

- Usually TAs are used to change and modify data.
- However, it is perfectly normal to have a read-only TA, eg. when you want to generate a report and need a consistent snapshot of the data as of the time the TA started.

- Any operation we do in a DB always happens inside a TA: some TAs are started explicitly by us, others implicitly by the DB (eg. a single statement run in autocommit mode).

=> What is ACID (Atomicity, Consistency, Isolation, Durability)

=> Atomicity
- Means a TA is one unit of work and cannot be split.
- All queries in a TA must succeed.
- If one query fails, all prior successful queries in the TA should roll back.
- If the DB goes down before the TA commits, all the successful queries in the TA should roll back: when the DB comes back up, it must undo those uncommitted changes.
- Lack of atomicity leads to inconsistency.
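A sketch of atomicity when a query inside the TA fails, again using sqlite3 as a stand-in (the table and `setup()` / `move()` are hypothetical names). The first UPDATE succeeds, the second violates a CHECK constraint, and the ROLLBACK undoes the first one as well.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:", isolation_level=None)
    # The CHECK constraint makes an overdraft fail inside the database.
    conn.execute(
        "CREATE TABLE account (id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
    return conn


def move(conn: sqlite3.Connection, amount: int) -> None:
    conn.execute("BEGIN")
    try:
        # Query 1: credit account 2 (always succeeds).
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = 2", (amount,))
        # Query 2: debit account 1; violates the CHECK if amount > current balance.
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = 1", (amount,))
        conn.execute("COMMIT")
    except sqlite3.IntegrityError:
        # SQLite aborts only the failing statement and keeps the TA open;
        # rolling back the whole TA also undoes the credit made by query 1.
        conn.execute("ROLLBACK")
        raise
```

`move(conn, 500)` raises `sqlite3.IntegrityError` and both balances stay at 100 and 50; `move(conn, 40)` commits both updates.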

=> Isolation
- A DB can have multiple TCP connections, and each might have its own TA in progress. Can my in-flight TA see changes made by other TAs?

=> Read phenomena

- Dirty Read -> You are in a TA, another TA writes something but has not committed yet, and you read that value. The other TA might never commit it (it may roll back), so the value you read may never have existed. That is a dirty read.
(A transaction reads data written by a concurrent uncommitted transaction.)

- Non-repeatable Read -> A transaction re-reads data it has previously read and finds that data has been modified by another transaction (that committed since the initial read).

- Phantom Read -> A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction.

- Lost Update -> You update a value in a TA but have not committed yet, and another concurrent TA also updates that same value, overwriting your change; when you read the value back, your write is gone. (Typical case: two TAs read the same value, each writes a new value computed from what it read, and the later write overwrites the earlier one.)
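The dirty read can be made concrete with a toy in-memory store. This is a deliberately simplified model, not how any real DB engine is built: committed values live in one dict, each TA's uncommitted writes in another, and the `level` argument decides whether a reader may see other TAs' uncommitted writes. `ToyStore` and its methods are hypothetical.

```python
class ToyStore:
    """Toy key-value store: committed data plus each TA's pending writes."""

    def __init__(self, data: dict):
        self.committed = dict(data)
        self.pending: dict = {}  # ta_id -> {key: value} not yet committed

    def write(self, ta: str, key: str, value) -> None:
        self.pending.setdefault(ta, {})[key] = value

    def read(self, ta: str, key: str, level: str):
        # A TA always sees its own uncommitted writes.
        if key in self.pending.get(ta, {}):
            return self.pending[ta][key]
        if level == "read uncommitted":
            # Dirty read: another TA's uncommitted value is visible.
            for other, writes in self.pending.items():
                if other != ta and key in writes:
                    return writes[key]
        # Read committed (and stricter): only committed data is visible.
        return self.committed[key]

    def commit(self, ta: str) -> None:
        self.committed.update(self.pending.pop(ta, {}))

    def rollback(self, ta: str) -> None:
        self.pending.pop(ta, None)
```

If T2 writes `x = 99` and T1 reads under "read uncommitted", T1 gets 99; T2 then rolls back, so T1 acted on a value that was never committed. Under "read committed" T1 would have read 10.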

Different isolation levels help prevent the above read phenomena.

- Isolation - Isolation levels for in-flight TAs (the SQL standard defines four levels of transaction isolation; the most strict is Serializable)
- Serializable -> Defined by the standard in a paragraph which says that any concurrent execution of a set of Serializable transactions is guaranteed to produce the same effect as running them one at a time in some order. It is usually the slowest.

(The other three levels are defined in terms of phenomena, resulting from interaction between concurrent transactions.)
- Read Uncommitted -> No isolation: any change from outside is visible to the TA, committed or not. All read phenomena may happen. (In Postgres, Read Uncommitted behaves like Read Committed.)
- Read Committed -> Each query in a TA only sees changes committed by other TAs, so in a long-running TA, if some other TA commits something, your TA can read it. All read phenomena can happen except dirty reads.
- Repeatable Read -> The TA makes sure that when a query reads a row, that row remains unchanged while the TA is running (RR means I read the same value throughout the TA). It does not get rid of phantom reads: they may happen, though not in Postgres.
- Snapshot -> (not one of the four SQL-standard levels) Each query in a TA only sees changes that had been committed up to the start of the TA. It's like a snapshot version of the DB at that moment. This gets rid of all the read phenomena listed above, though it is still not fully Serializable (eg. write skew can still happen).
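A toy multi-version store can show the difference between Read Committed and Snapshot for the non-repeatable and phantom read cases. It is a simplification under stated assumptions: a single integer counter plays the role of commit time, and `VersionedStore` / `Transaction` are hypothetical names, not any DB's API.

```python
class VersionedStore:
    """Toy multi-version store: every committed write is kept together with
    a commit counter, so readers can ask for 'latest' or 'as of a snapshot'."""

    def __init__(self):
        self.clock = 0       # incremented on every commit
        self.versions = {}   # key -> list of (commit_no, value), oldest first

    def commit_write(self, key, value):
        # Stands in for "another TA updated this key and committed".
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def read(self, key, as_of=None):
        # Newest committed value, optionally ignoring commits after `as_of`.
        visible = [v for ts, v in self.versions.get(key, [])
                   if as_of is None or ts <= as_of]
        return visible[-1] if visible else None


class Transaction:
    def __init__(self, store, level):
        self.store = store
        # A snapshot TA pins the commit counter at the moment it starts.
        self.snapshot = store.clock if level == "snapshot" else None

    def read(self, key):
        # Read committed: each query sees the latest committed version.
        # Snapshot: each query sees the DB as it was when the TA started.
        return self.store.read(key, as_of=self.snapshot)
```

After another TA commits `x = 20`, a Read Committed TA re-reading `x` gets 20 (non-repeatable read) while the Snapshot TA still gets 10; a row `y` committed later is visible to the Read Committed TA but not to the Snapshot TA (no phantom).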

- DB implementation of Isolation
- Each DBMS implements isolation levels differently.
- Pessimistic - Row-level locks, table locks, page locks to avoid lost updates.
- Optimistic - No locks; just track whether things changed and fail the TA if so.
- A lock-based Repeatable Read locks the rows it reads, which can be expensive if you read a lot of rows. Postgres implements RR as a snapshot, which is why you don't get phantom reads with Postgres in RR.
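The lost update and the optimistic approach can be sketched with a single row that carries a version number. This is a hypothetical single-threaded simulation (`Row`, `read_row`, `blind_write`, `optimistic_write` and `ConflictError` are illustrative names); a real DB performs the version check and the write atomically.

```python
class ConflictError(Exception):
    """Raised when an optimistic TA finds that its row changed underneath it."""


class Row:
    def __init__(self, value):
        self.value = value
        self.version = 0  # bumped on every successful write


def read_row(row):
    # A TA remembers the version it saw together with the value.
    return row.value, row.version


def blind_write(row, new_value):
    # No concurrency control at all: the last writer silently wins.
    row.value = new_value
    row.version += 1


def optimistic_write(row, new_value, seen_version):
    # Optimistic control: no lock was taken, so check before writing and
    # fail the TA if someone else wrote since we read.
    if row.version != seen_version:
        raise ConflictError("row changed since it was read; retry the TA")
    row.value = new_value
    row.version += 1
```

Two TAs both read 10 and each writes `value + 1`: with `blind_write` the result is 11, so one increment is lost. With `optimistic_write` the second TA gets `ConflictError`, re-reads and retries, and the result is 12. A pessimistic DB would instead lock the row on the first read (in Postgres, eg. `SELECT ... FOR UPDATE`), making the second TA wait.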

- POSTGRES ISOLATION LEVEL

(https://www.postgresql.org/docs/current/transaction-iso.html#XACT-READ-COMMITTED)
- Read Committed is the default isolation level in PostgreSQL. It means that if you are in a TA and read something, and during that time another TA updates and commits the record you read, then when you read again you get the new data committed by the other TA (rows it did not touch stay as before).

=> Consistency

- Data is in a consistent state when a transaction starts and when it ends.
- In the context of databases, Consistency means Correctness: under no circumstance should the data lose its correctness.
- Database systems let us define rules that each data record in the DB has to follow. These rules can be defined using constraints, cascades and triggers, eg. foreign keys, check constraints, ON DELETE CASCADE.
- There are 2 kinds of consistency: 1. Consistency in data 2. Consistency in reads
- Consistency in data:
  - Defined by the user: the user splits related data into different tables and defines rules like constraints and foreign keys, so the data stays consistent across all related tables.
  - Lack of atomicity leads to inconsistency.
  - Lack of isolation leads to inconsistent results.
- Consistency in reads:
  - If an update happens on a table and some other user reads that value right after the update but does not get the updated value, the read is inconsistent.
  - Eventual consistency means reads will eventually return the correct (latest) value. It applies to reads only: if the data itself gets corrupted, no amount of waiting makes it consistent again.
(https://www.udemy.com/course/database-engines-crash-course/learn/lecture/22485170#overview)
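The data-consistency rules above (foreign key, check constraint, ON DELETE CASCADE) can be tried out with sqlite3. SQLite is used only so the sketch runs without a server; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`. The `customer` / `orders` tables and the `violates()` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on (per connection).
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id) ON DELETE CASCADE,
    amount      INTEGER NOT NULL CHECK (amount > 0)
);
INSERT INTO customer VALUES (1, 'alice');
INSERT INTO orders VALUES (10, 1, 250);
""")


def violates(sql: str) -> bool:
    """Return True if the DB rejects the statement because of a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True
```

Inserting an order for a non-existent customer (`INSERT INTO orders VALUES (11, 99, 10)`) or with a negative amount makes `violates()` return `True`; deleting customer 1 also deletes its orders through the cascade.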

=> Durability

- Changes made by committed TAs must be persisted in durable non-volatile storage (eg. SSD or hard drive).
- Durability is a database feature that guarantees the recording of committed transactions even if the server crashes or loses power. However, durability adds significant database overhead, so Postgres can be configured with non-durable settings to run faster. With most of those settings durability is still guaranteed in case of a crash of the database software; only an abrupt operating system crash creates a risk of data loss or corruption.
- Durability techniques
  - WAL - Write ahead log
  - Asynchronous snapshot
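A toy version of the WAL idea: append the change to a log file and fsync it before applying it in memory, so that after a crash the committed changes can be replayed from the log. This is a hypothetical sketch (`ToyWAL` is an illustrative name); a real WAL logs page-level changes, uses checkpoints, and handles partially written records, none of which is modelled here.

```python
import json
import os
import tempfile


class ToyWAL:
    """Toy write-ahead log: a commit is appended to the log and fsync'ed
    before it is applied, so committed work survives losing memory."""

    def __init__(self, path: str):
        self.path = path
        self.data = {}  # the in-memory state; lost on a crash

    def commit(self, changes: dict) -> None:
        record = json.dumps({"changes": changes})
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(record + "\n")
            log.flush()
            os.fsync(log.fileno())  # on disk before we report "committed"
        self.data.update(changes)   # only now apply to the in-memory state

    @classmethod
    def recover(cls, path: str) -> "ToyWAL":
        # After a restart, rebuild the state by replaying every logged commit.
        wal = cls(path)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as log:
                for line in log:
                    wal.data.update(json.loads(line)["changes"])
        return wal
```

After two commits, throwing away the in-memory object and calling `ToyWAL.recover(path)` rebuilds `{"x": 3, "y": 2}` from the log alone.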

*** ACID BY PRACTICAL EXAMPLE => MOST IMPORTANT
(https://www.udemy.com/course/database-engines-crash-course/learn/lecture/24846622#overview) ***

=> Difference between Serializable and Repeatable Read
(https://www.udemy.com/course/database-engines-crash-course/learn/lecture/25758282#overview)

=> INTERNALS OF POSTGRES

=> Where and how Postgres stores the data on disk
(https://www.udemy.com/course/sql-and-postgresql/learn/lecture/22802643#overview)
Watch the whole playlist.

=> Heap or Heap File -> File that contains all the data (rows) of our table
=> Tuple or Item -> Individual row from the table
=> Block or Page -> The heap file is divided into many blocks or pages. Each page/block stores some number of rows; the page size is 8KB (in Postgres).

=> Row_ID -> Internal and system maintained

-> In certain DBs (MySQL InnoDB) it is the same as the PK, but other DBs like Postgres have a system column for it: ctid, the tuple id (page number + position of the row within that page).
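A sketch of a heap file split into pages, where each row is addressed by a tuple id of (page number, slot in page), similar in spirit to Postgres' ctid. The 3-rows-per-page limit and all names (`Page`, `HeapFile`, `ROWS_PER_PAGE`) are hypothetical, chosen to keep the example small; real pages are limited by bytes, not by row count.

```python
from dataclasses import dataclass, field

ROWS_PER_PAGE = 3  # hypothetical tiny page so the example stays readable


@dataclass
class Page:
    rows: list = field(default_factory=list)


@dataclass
class HeapFile:
    pages: list = field(default_factory=list)

    def insert(self, row) -> tuple:
        """Append a row to the last page (or a new one) and return its
        tuple id as (page_no, slot_no)."""
        if not self.pages or len(self.pages[-1].rows) == ROWS_PER_PAGE:
            self.pages.append(Page())
        page_no = len(self.pages) - 1
        self.pages[page_no].rows.append(row)
        return (page_no, len(self.pages[page_no].rows) - 1)

    def fetch(self, tid: tuple):
        # Read exactly one page, then pick the row from its slot.
        page_no, slot = tid
        return self.pages[page_no].rows[slot]
```

Inserting 7 rows fills pages 0 and 1 and starts page 2; `heap.fetch((1, 1))` returns the fifth row inserted.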

=> IO -> An IO operation (input/output) is a read (or write) request to the disk.

-> We try to minimise it as much as possible.
-> An IO can fetch 1 or more pages, depending on disk partitions and other factors.
-> An IO cannot read a single row; it reads a page with many rows in it, so you get the other rows for free.
-> You want to minimise the number of IOs as they are expensive.
-> Some IOs go to the OS cache and not to the disk.
-> Postgres relies heavily on the OS cache.

=> Heap / Hard-drive -> The heap is the data structure where the table is stored, with all its pages one after another.
-> This is where the actual data is stored, including everything.
-> Traversing the heap is expensive, as we need to read a lot of data to find what we want.
-> That is why we need indexes, which tell us exactly which part of the heap we need to read.

=> Index -> An index is another data structure, separate from the heap, that has pointers into the heap.
-> It holds part of the data and is used to quickly search for something.
-> You can index one column or more.
-> Once you find a value in the index, you go to the heap to fetch more information, since everything is there.
-> The index tells you exactly which page to fetch from the heap, instead of taking the hit of scanning every page of the heap.
-> The index is also stored as pages, and it costs IO to pull the entries of the index.
-> The smaller the index, the more of it fits in memory and the faster the search.
-> The most popular data structure for indexes is the B-tree.
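A toy cost model comparing a full heap scan with an index lookup, counting each heap page read as one IO. Assumptions: 100 rows per page, rows inserted in key order, and a Python dict standing in for the B-tree index (the IO to read the index itself is ignored). `build`, `seq_scan` and `index_lookup` are hypothetical names.

```python
ROWS_PER_PAGE = 100  # assumed page capacity


def build(n_rows):
    # Heap: rows (here just integer keys) grouped into fixed-size pages.
    heap = [list(range(start, min(start + ROWS_PER_PAGE, n_rows)))
            for start in range(0, n_rows, ROWS_PER_PAGE)]
    # Index: key -> (page_no, slot); a dict stands in for a B-tree here.
    index = {row: (p, s) for p, page in enumerate(heap) for s, row in enumerate(page)}
    return heap, index


def seq_scan(heap, key):
    pages_read = 0
    for page in heap:
        pages_read += 1  # every heap page costs one IO
        if key in page:
            return key, pages_read
    return None, pages_read


def index_lookup(heap, index, key):
    page_no, slot = index[key]        # index IO ignored in this toy
    return heap[page_no][slot], 1     # exactly one heap page read
```

For 100,000 rows (1,000 pages), finding key 99,999 costs 1,000 page reads with the sequential scan and a single heap page read via the index.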

=> INDEXES (Watch the Database Indexing YouTube video from the coding and concepts channel)

=> An index is a DS that you build on top of an existing table. It looks through your table, analyses and summarises it, so that it can create a shortcut to access the data in the table.
=> It is used to increase the performance of DB queries, so data can be fetched faster. Without an index, the DB has to iterate over each and every table row to find the requested data.
=> For PKs, indexes are created automatically, and for unique constraints an index is also created automatically; they do not show up in pgAdmin.

=> The DBMS creates data pages (generally 8KB, but this varies from DB to DB). Each page can store multiple rows.

=> Page -> Depending on the storage model (row vs column store), the rows are stored and read in logical pages.
-> The DB doesn't read a single row; it reads a page or more in a single IO, and we get a lot of rows in that IO.
-> Pages are fixed-size memory locations on disk.
-> Each page has a size (8KB in Postgres, 16KB in MySQL).

=> A single page is 8KB, but not all 8KB is used to store table data. Some bytes are used for the header and offsets; the rest is used for actual data.
eg. 8KB = 8192 bytes. Assume 96 bytes are assigned to the header, which stores metadata about the page like the page number and how much free space is available.
36 bytes are assigned to the offset array (footer); each entry of the array holds a pointer to the corresponding data record in the same page.
The remaining 8192 - 96 - 36 = 8060 bytes are for the actual data records. Now, assuming a row size of 125 bytes, a single page can hold 8060 / 125 = 64 rows (rounded down).
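The page arithmetic above, written out. The 96-byte header, 36-byte offset array and 125-byte row are the example's assumptions, not fixed values of any particular DB; the 1,000,000-row table is a hypothetical size.

```python
PAGE_SIZE = 8 * 1024   # 8KB = 8192 bytes
HEADER = 96            # bytes assumed for page metadata (page no, free space, ...)
OFFSET_ARRAY = 36      # bytes assumed for the offset (slot) array
ROW_SIZE = 125         # bytes per row, assumed fixed-size

usable = PAGE_SIZE - HEADER - OFFSET_ARRAY   # bytes left for row data
rows_per_page = usable // ROW_SIZE           # floor: a row is not split across pages

n_rows = 1_000_000                           # hypothetical table size
pages_needed = -(-n_rows // rows_per_page)   # ceiling division
```

This gives 8060 usable bytes and 64 rows per page, so 1,000,000 such rows need 15,625 pages.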

=> The DBMS creates and manages the data pages. For one table's data, it can create many data pages. These data pages ultimately get stored in data blocks in physical storage.
=> Data Block -> A data block is the minimum amount of data that can be read/written by an I/O operation.
-> It is managed by the underlying storage system, like the disk. A data block can range from 4KB to 32KB (a common size is 8KB).
-> So depending on the data block size, a block can hold 1 or many data pages.

=> The DBMS creates data pages, which get stored in data blocks, and different data pages may end up in different, scattered data blocks. The DBMS manages the mapping of data pages to their corresponding data blocks. Remember, the DBMS controls the data pages (like which rows go in which data page, or the sequence of pages) but has no control over the data blocks (data blocks can be scattered over the disk).
eg. DataPage1 => Data Block 1
DataPage2 => Data Block 1
DataPage3 => Data Block 2
DataPage4 => Data Block 3

=> CREATE INDEX index_name ON table_name(column_name); eg. CREATE INDEX employee_name ON employee(name);
=> DROP INDEX index_name;

=> B+ Tree
=> If a table has a million rows, a query can take up to O(N) to fetch data. Which data structure provides better time complexity? The B+ tree: it provides O(log N) for insertion, searching and deletion.
=> B+ trees are self-balancing trees. They maintain sorted data, and all leaves are at the same level.
=> An order-M B tree means each node can have at most M children and M-1 keys.
=> B trees and B+ trees are similar, except that in B+ trees only the leaf nodes hold the actual values (internal nodes only hold keys used for searching) and all leaf nodes are linked to each other.

=> The DBMS uses B+ trees to manage its data pages and the rows within pages. (Watch the concept and coding index video after 50 min)
-> The root and intermediate nodes hold values that are used for faster searching of data. Such a value might already have been deleted from the DB; it is only used to guide the search through the sorted tree.
-> Leaf nodes actually hold the indexed column values of the table.
-> With the help of the B+ tree, the DBMS decides which rows go to which data page, to efficiently manage/search the data.
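A minimal in-memory sketch of the B+ tree described above: order M (at most M children and M-1 keys per node), values only in leaves, and linked leaves for range scans. It is illustrative only: insert, search and range scan, no deletion, and nodes are Python objects rather than data pages. `Node` and `BPlusTree` are hypothetical names.

```python
import bisect


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []  # internal node: child nodes; leaf: row values
        self.next = None    # leaves only: link to the right sibling leaf


class BPlusTree:
    def __init__(self, order=4):
        assert order >= 3
        self.order = order  # max children per node, so max order-1 keys
        self.root = Node(leaf=True)

    def _find_leaf(self, key):
        node = self.root
        while not node.leaf:
            # Keys >= a separator live in the child to its right.
            node = node.children[bisect.bisect_right(node.keys, key)]
        return node

    def search(self, key):
        leaf = self._find_leaf(key)
        i = bisect.bisect_left(leaf.keys, key)
        if i < len(leaf.keys) and leaf.keys[i] == key:
            return leaf.children[i]
        return None

    def range(self, lo, hi):
        """Yield (key, value) for lo <= key <= hi by walking the leaf links."""
        leaf = self._find_leaf(lo)
        while leaf:
            for k, v in zip(leaf.keys, leaf.children):
                if k > hi:
                    return
                if k >= lo:
                    yield k, v
            leaf = leaf.next

    def insert(self, key, value):
        split = self._insert(self.root, key, value)
        if split:  # the root split, so the tree grows one level
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys = [sep]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key, value):
        if node.leaf:
            i = bisect.bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                node.children[i] = value  # existing key: overwrite
                return None
            node.keys.insert(i, key)
            node.children.insert(i, value)
        else:
            i = bisect.bisect_right(node.keys, key)
            split = self._insert(node.children[i], key, value)
            if split is None:
                return None
            sep, right = split
            node.keys.insert(i, sep)
            node.children.insert(i + 1, right)
        if len(node.keys) < self.order:  # still at most order-1 keys
            return None
        return self._split(node)

    def _split(self, node):
        mid = len(node.keys) // 2
        right = Node(leaf=node.leaf)
        if node.leaf:
            # Leaves keep every key; the first key of the right half is copied up.
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            right.children, node.children = node.children[mid:], node.children[:mid]
            right.next, node.next = node.next, right
            return right.keys[0], right
        # Internal nodes: the middle key moves up and is not kept below.
        sep = node.keys[mid]
        right.keys, node.keys = node.keys[mid + 1:], node.keys[:mid]
        right.children, node.children = node.children[mid + 1:], node.children[:mid + 1]
        return sep, right
```

After inserting 12 keys into an order-4 tree, `search(30)` walks root -> internal node -> leaf, and `range(15, 60)` finds the leaf for 15 once and then just follows the leaf links, which is why B+ trees suit range queries.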

=> Index Types (Watch the concept and coding index video after 1h 5min)
=> Clustered Index -> A table can have only one clustered index. It usually uses the primary key to organize the data within the table: the rows are stored in increasing order of the key, which is also the order in which the table is stored on disk.
In Postgres, clustering is not maintained automatically: the CLUSTER command physically reorders a table once by a chosen index, but later inserts and updates do not keep that order.
-> The order of rows inside the data pages matches the order of the index.
-> The offset array manages the pointers to the data in such a way that it keeps the indexed sort order.
-> If you have not manually provided a clustered index, the DBMS uses the PK as the cluster key (eg. MySQL InnoDB).
-> If there is no PK available, the DBMS creates an internal hidden column which is used as the clustered index (this column increases sequentially and is never null).
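A toy contrast between a clustered table (rows kept physically sorted by PK) and a plain heap (rows in insertion order), using one sorted Python list in place of sorted data pages. `ClusteredTable` and `HeapTable` are hypothetical names, and PKs are assumed unique.

```python
import bisect


class ClusteredTable:
    """Rows kept physically sorted by primary key (toy: one sorted list)."""

    def __init__(self):
        self.keys, self.rows = [], []

    def insert(self, pk, row):
        i = bisect.bisect_left(self.keys, pk)
        self.keys.insert(i, pk)  # a real DB may have to split a page here
        self.rows.insert(i, row)

    def range(self, lo, hi):
        # Matching rows are contiguous, so two binary searches bound them.
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_right(self.keys, hi)
        return self.rows[i:j]


class HeapTable:
    """Rows kept in insertion order; no ordering by key."""

    def __init__(self):
        self.rows = []

    def insert(self, pk, row):
        self.rows.append((pk, row))

    def range(self, lo, hi):
        # Matching rows can be anywhere, so every row has to be checked.
        return [row for pk, row in self.rows if lo <= pk <= hi]
```

Inserting PKs 30, 10, 20, 50, 40 leaves the clustered table's keys as `[10, 20, 30, 40, 50]`, and its range query for 20..40 reads one contiguous run; the heap returns the same rows but in insertion order, after checking every row.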

=> Non-Cluster Indexing -> Learn later about it

=> Indexing best practices (https://www.pgmustard.com/blog/indexing-best-practices-postgresql) READ LATER

=> Different table scans (https://www.udemy.com/course/database-engines-crash-course/learn/lecture/24171140#overview)

=> Sometimes the heap table is organised around a single index. This is called a clustered index or index-organised table.
- The PK is usually the clustered index unless otherwise specified.
- MySQL InnoDB always has a PK (the clustered index); other indexes point to the PK value.
- Postgres only has secondary indexes, and all indexes point directly to the row_id (ctid), which lives in the heap.
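A toy model of what a secondary index on `email` stores in the two designs above. The data and the `innodb_lookup` / `pg_lookup` helpers are hypothetical; dicts stand in for the B-trees and the heap, and each returned step is one structure visited.

```python
# MySQL InnoDB style: the table *is* the clustered index keyed by PK,
# and a secondary index maps email -> PK value.
innodb_clustered = {1: ("alice", "a@x.com"), 2: ("bob", "b@x.com")}
innodb_email_index = {"a@x.com": 1, "b@x.com": 2}

# Postgres style: rows sit in heap pages at a tuple id (page_no, slot),
# and every index, the PK index included, maps a value -> tuple id.
pg_heap = {(0, 1): (1, "alice", "a@x.com"), (0, 2): (2, "bob", "b@x.com")}
pg_email_index = {"a@x.com": (0, 1), "b@x.com": (0, 2)}


def innodb_lookup(email):
    steps = []
    pk = innodb_email_index[email]
    steps.append(f"secondary index -> pk {pk}")
    row = innodb_clustered[pk]
    steps.append("clustered index (by pk) -> row")
    return row, steps


def pg_lookup(email):
    steps = []
    tid = pg_email_index[email]
    steps.append(f"index -> tuple id {tid}")
    row = pg_heap[tid]
    steps.append("heap page -> row")
    return row, steps
```

The InnoDB path searches two B-trees (secondary, then clustered by PK); the Postgres path searches one index and then reads the heap page named by the tuple id.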

=> ROW VS COLUMN BASED storage DB (https://www.udemy.com/course/database-engines-crash-course/learn/lecture/23084542#overview)
Postgres is a row-based DB by default.
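A toy illustration of the two layouts, counting how many stored values must be read to sum one column. The three-row table and the helper names are hypothetical, and "values read" is a simplified stand-in for IO.

```python
COLUMNS = ("id", "name", "age")
rows = [(1, "alice", 30), (2, "bob", 25), (3, "carol", 35)]

# Row store: values of one record sit next to each other.
row_store = [v for row in rows for v in row]
# -> [1, 'alice', 30, 2, 'bob', 25, 3, 'carol', 35]

# Column store: values of one column sit next to each other.
column_store = {c: [row[i] for row in rows] for i, c in enumerate(COLUMNS)}
# -> {'id': [1, 2, 3], 'name': ['alice', 'bob', 'carol'], 'age': [30, 25, 35]}


def sum_age_row_store(store):
    # Whole records are read even though only `age` is needed.
    values_read, total = 0, 0
    for start in range(0, len(store), len(COLUMNS)):
        record = store[start:start + len(COLUMNS)]
        values_read += len(record)
        total += record[COLUMNS.index("age")]
    return total, values_read


def sum_age_column_store(store):
    ages = store["age"]  # only this one column is read
    return sum(ages), len(ages)
```

Both return a total of 90, but the row store reads 9 values and the column store only 3. The flip side: fetching one whole record is a single contiguous read in a row store but needs one read per column in a column store.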

***** What is PK and Secondary Key (https://www.udemy.com/course/database-engines-crash-course/learn/lecture/26775206#overview) *****
