DBMS Unit-6
File Organization
Sorted File Method – As the name suggests, in this method a new record is
always inserted in sorted (ascending or descending) order. Records may be
sorted on the primary key or on any other key.
Insertion of new record –
Pros –
•Fast and efficient method for large amounts of data.
•Simple design.
•Files can easily be stored on magnetic tape, i.e. a cheaper
storage mechanism.
Cons –
•Time is wasted because we cannot jump directly to the record
that is required; we have to move through the file in a
sequential manner.
•The sorted file method is inefficient because sorting the
records takes time and space.
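The sorted insertion described above can be sketched as follows. This is a minimal illustration that assumes the file is modelled as an in-memory Python list of keys; the function name insert_sorted and the sample keys are hypothetical, not part of the notes.

```python
import bisect

def insert_sorted(records, key):
    """Insert key into the already-sorted list 'records', keeping it sorted."""
    position = bisect.bisect_left(records, key)  # binary search for the slot
    records.insert(position, key)                # later records shift right
    return position

records = [10, 20, 40, 50]
insert_sorted(records, 30)
print(records)  # [10, 20, 30, 40, 50]
```

Finding the slot is cheap, but every later record has to shift to make room, which is the insertion cost listed under the cons.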
Heap File Organization –
1.Open Hashing – In the open hashing method, the next available data block is used to store the new record
instead of overwriting the older one. This method is also called linear probing. For example, D3 is a new
record that needs to be inserted, and the hash function generates the address 105, but that bucket is already full.
The system therefore searches for the next available data bucket, 123, and assigns D3 to it.
2.Closed Hashing – In the closed hashing method, a new data bucket is allocated with the same address and is linked
after the full data bucket. This method is also known as overflow chaining. For example, we have to insert a new
record D3 into the table. The static hash function generates the data bucket address 105, but this bucket is too full
to store the new data. In this case, a new data bucket is added at the end of data bucket 105 and is linked to it.
The new record D3 is then inserted into the new bucket.
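The two strategies above can be compared in a short sketch. It assumes a toy file of 8 slots for open hashing and a bucket capacity of 2 records for closed hashing; both sizes, the function names and the slot numbers are illustrative assumptions, not values from the notes.

```python
def insert_linear_probing(slots, address, record):
    """Open hashing: if the home slot is taken, scan forward to the next free one."""
    n = len(slots)
    for step in range(n):
        slot = (address + step) % n      # wrap around at the end of the file
        if slots[slot] is None:
            slots[slot] = record
            return slot
    raise RuntimeError("file is full")

def insert_overflow_chaining(buckets, address, record, capacity=2):
    """Closed hashing: link a new overflow bucket when the last one is full."""
    chain = buckets.setdefault(address, [[]])  # chain of buckets for this address
    if len(chain[-1]) >= capacity:
        chain.append([])                       # new bucket linked after the full one
    chain[-1].append(record)
    return len(chain) - 1                      # position of the bucket in the chain

# Open hashing: slots 5 and 6 are occupied, so D3 lands in slot 7.
open_file = [None] * 8
open_file[5], open_file[6] = "D1", "D2"
slot = insert_linear_probing(open_file, 5, "D3")

# Closed hashing: address 105 fills up, so D3 goes into a linked overflow bucket.
chained = {}
for record in ("D1", "D2", "D3"):
    insert_overflow_chaining(chained, 105, record)
print(slot, chained)  # 7 {105: [['D1', 'D2'], ['D3']]}
```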
•Quadratic Probing: Quadratic probing is very similar to open hashing (linear probing).
The only difference is that in linear probing the gap between the old and new bucket grows linearly,
whereas here a quadratic function is used to determine the new bucket address.
•Double Hashing: Double hashing is another method similar to linear probing. Here the
difference between successive buckets is fixed, as in linear probing, but this fixed difference is
calculated by a second hash function. That is why it is called double hashing.
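The three probing schemes differ only in how the i-th alternative address is computed. The sketch below uses common textbook formulas as an assumption: offset i for linear probing, i squared for quadratic probing, and a second hash of the form 1 + (key mod (size - 1)) for double hashing. The table size 11 and key 27 are arbitrary sample values.

```python
def linear_probe(home, i, size):
    """i-th address tried by linear probing: step grows by 1 each time."""
    return (home + i) % size

def quadratic_probe(home, i, size):
    """i-th address tried by quadratic probing: offset is i squared."""
    return (home + i * i) % size

def double_hash_probe(home, i, size, key):
    """i-th address tried by double hashing: fixed step from a second hash."""
    step = 1 + (key % (size - 1))   # assumed second hash; never zero
    return (home + i * step) % size

size, key = 11, 27
home = key % size                   # first hash gives the home address, 5
print([linear_probe(home, i, size) for i in range(4)])            # [5, 6, 7, 8]
print([quadratic_probe(home, i, size) for i in range(4)])         # [5, 6, 9, 3]
print([double_hash_probe(home, i, size, key) for i in range(4)])  # [5, 2, 10, 7]
```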
Dynamic Hashing –
The drawback of static hashing is that it does not expand or shrink dynamically as the size of the database grows
or shrinks. In dynamic hashing, data buckets are added or removed dynamically as the number of records
increases or decreases. Dynamic hashing is also known as extended (extendible) hashing. In dynamic hashing, the hash
function is made to produce a large number of values. For example, there are three data records D1, D2 and D3.
The hash function generates the addresses 1001, 0101 and 1010 respectively. This method of storing considers
only part of each address, in this case only the first bit, to store the data. So it tries to load the three records
at addresses 0 and 1.
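The example above can be traced in a few lines. This is a minimal sketch that assumes the hash addresses are held as bit strings and that a directory is simply a dictionary keyed by the leading bits; it shows how using more bits spreads the records, not a full bucket-splitting implementation. The names bucket_for and build_directory are hypothetical.

```python
def bucket_for(hash_bits, depth):
    """Use only the first 'depth' bits of the hash address."""
    return hash_bits[:depth]

def build_directory(hashed, depth):
    """Group records by the leading bits of their hash address."""
    directory = {}
    for record, bits in hashed.items():
        directory.setdefault(bucket_for(bits, depth), []).append(record)
    return directory

hashed = {"D1": "1001", "D2": "0101", "D3": "1010"}
print(build_directory(hashed, 1))  # {'1': ['D1', 'D3'], '0': ['D2']}
print(build_directory(hashed, 3))  # {'100': ['D1'], '010': ['D2'], '101': ['D3']}
```

With one bit, D1 and D3 share address 1. Considering more bits when a bucket overflows is what lets the structure grow without rehashing every record.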
In the above diagram, 56 is the root node, which is also called the main node of the tree.
The intermediate nodes contain only the addresses of leaf nodes. They do not contain any actual records. The leaf
nodes contain the actual records. All leaf nodes are at the same level, so the tree is balanced.
Pros –
hash = hashfunc(key)
index = hash % array_size
1. Linear Probing
2. Quadratic Probing
3. Double Hashing
A single jet engine can generate 10+ terabytes of data in 30 minutes of flight time. With many thousands of flights per day,
the data generated reaches many petabytes.
A NoSQL database is non-relational and is designed with web applications in mind, so it scales out better than
relational databases.
Features of NoSQL
Non-relational
•NoSQL databases never follow the relational model
•Never provide tables with flat fixed-column records
•Work with self-contained aggregates or BLOBs
•Do not require object-relational mapping or data normalization
•No complex features like query languages, query planners, referential integrity, joins, ACID
Schema-free
•NoSQL databases are either schema-free or have relaxed schemas
•Do not require any sort of definition of the schema of the data
•Allow heterogeneous structures of data in the same domain
Simple API
•Offers easy-to-use interfaces for storing and querying data
•APIs allow low-level data manipulation and selection methods
•Text-based protocols, mostly HTTP REST with JSON
•Mostly no standards-based NoSQL query language
•Web-enabled databases running as internet-facing services
Distributed
•Multiple NoSQL databases can be run in a distributed fashion
•Offers auto-scaling and fail-over capabilities
•ACID guarantees are often sacrificed for scalability and throughput
•Mostly no synchronous replication between distributed nodes; instead asynchronous multi-master replication,
peer-to-peer or HDFS replication
•Often provides only eventual consistency
•Shared-nothing architecture, which enables less coordination and higher distribution
The key-value store is one of the most basic kinds of NoSQL database. It is used like a collection, dictionary or
associative array. Key-value stores help the developer to store schema-less data. They work well for shopping cart
contents.
Redis, Dynamo and Riak are some examples of key-value store databases. Dynamo is described in Amazon's Dynamo
paper, and Riak is based on that paper.
Column-based
Column-oriented databases work on columns and are based on the BigTable paper by Google. Every column is treated separately.
The values of a single column are stored contiguously.
•Sparse Index
• The index record appears only for a few items in the data file. Each item points to a block as shown.
• To locate a record, we find the index record with the largest search key value less than or equal to the
search key value we are looking for.
• We start at the record pointed to by that index record and follow the pointers in the file (that
is, sequentially) until we find the desired record.
• Number of accesses required = log₂(n) + 1 (here n = number of blocks occupied by the index file)
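The lookup steps above can be sketched as follows. The sketch assumes one index entry per data block, holding the smallest key in that block, and models blocks as lists of (key, value) pairs; the function name sparse_lookup and the sample data are hypothetical.

```python
import bisect

def sparse_lookup(index_keys, blocks, search_key):
    """index_keys[i] is the smallest key stored in blocks[i]."""
    # Largest index entry whose key is <= search_key (binary search on the index).
    pos = bisect.bisect_right(index_keys, search_key) - 1
    if pos < 0:
        return None                      # smaller than every key in the file
    # Sequential scan inside the block that the index entry points to.
    for key, value in blocks[pos]:
        if key == search_key:
            return value
        if key > search_key:
            break                        # sorted block: the key cannot appear later
    return None

blocks = [[(10, "a"), (20, "b")], [(30, "c"), (40, "d")], [(50, "e"), (60, "f")]]
index_keys = [10, 30, 50]
print(sparse_lookup(index_keys, blocks, 40))  # d
print(sparse_lookup(index_keys, blocks, 35))  # None
```

The binary search over the index corresponds to the log₂(n) term, and the final block read is the +1.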
Hash File Organization
Indices are based on the values being distributed uniformly across a range of buckets. The bucket to which a
value is assigned is determined by a function called a hash function. There are primarily three methods of
indexing: clustered, non-clustered (secondary) and multilevel indexing.
Clustered Indexing
When more than two records are stored in the same file,
this type of storage is known as cluster indexing. Cluster
indexing reduces the cost of searching because multiple
records related to the same thing are stored in one place.
It also supports frequent joins of more than two
tables (records).
Primary Indexing
This is a type of clustered indexing in which the data is sorted according to the search key and the primary key
of the database table is used to create the index. It is the default form of indexing and induces sequential
file organization. As primary keys are unique and are stored in sorted order, searching is quite efficient.
Non-clustered or Secondary Indexing
A non-clustered index just tells us where the data lies, i.e. it gives us a list of virtual pointers or references to the
location where the data is actually stored. Data is not physically stored in the order of the index. Instead, the
leaf nodes of the index hold references to the data. An example is the contents page of a book: each entry gives us
the page number or location of the information stored. The actual data (the information on each page of the book) is
not organized, but we have an ordered reference (the contents page) to where the data actually lies. Only dense
ordering is possible in a non-clustered index; sparse ordering is not possible because the data is not physically
organized in index order.
Multilevel Indexing
As the database grows, its indices also grow. The index is meant to be kept in main memory, but a
single-level index may become too large to fit there, which leads to multiple disk accesses. Multilevel indexing
splits the main index into smaller blocks so that each one can be stored in a single block. The outer
blocks are divided into inner blocks, which in turn point to the data blocks. This can easily be stored in
main memory with less overhead.
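A two-level version of this idea can be sketched as follows. It assumes a small outer index that points to inner index blocks, which in turn point to data blocks; the layout, the names find_slot and multilevel_lookup, and the sample keys are illustrative assumptions rather than a description of any particular DBMS.

```python
import bisect

def find_slot(keys, search_key):
    """Position of the largest key <= search_key, or -1 if there is none."""
    return bisect.bisect_right(keys, search_key) - 1

def multilevel_lookup(outer_keys, inner_blocks, data_blocks, search_key):
    outer = find_slot(outer_keys, search_key)      # step 1: small outer index
    if outer < 0:
        return None
    inner_keys, block_ids = inner_blocks[outer]    # step 2: one inner index block
    inner = find_slot(inner_keys, search_key)
    if inner < 0:
        return None
    return data_blocks[block_ids[inner]].get(search_key)  # step 3: one data block

data_blocks = [{10: "a", 20: "b"}, {30: "c", 40: "d"},
               {50: "e", 60: "f"}, {70: "g", 80: "h"}]
inner_blocks = [([10, 30], [0, 1]), ([50, 70], [2, 3])]  # (keys, data block ids)
outer_keys = [10, 50]                                    # first key of each inner block

print(multilevel_lookup(outer_keys, inner_blocks, data_blocks, 60))  # f
```

Only the small outer index needs to stay resident; each lookup then touches one inner block and one data block.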
Advantages of Indexing
•Improved Query Performance: Indexing enables faster data retrieval from the database. By creating an index on a
column, the database can quickly find rows that match a specific value or set of values,
minimising the time it takes to perform a query.
•Efficient Data Access: Indexing can enhance data access efficiency by lowering the amount of disk I/O required
to retrieve data. The database can keep the data pages for frequently accessed columns in memory by
creating an index on those columns, decreasing the need to read from disk.
•Optimized Data Sorting: Indexing can also improve the performance of sorting operations. By creating an index
on the columns used for sorting, the database can avoid sorting the entire table and instead sort only the relevant
rows.
•Consistent Data Performance: Indexing helps ensure that the database performs consistently even as the
amount of data in the database grows. Without indexing, queries may take longer to run as the number of rows in
the table grows, while indexing maintains roughly consistent speed.
•By ensuring that only unique values are inserted into columns that have been indexed as unique, indexing can
also be utilized to ensure the integrity of data. This avoids storing duplicate data in the database, which might
lead to issues when performing queries or reports.
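The unique-index point above can be demonstrated with Python's built-in sqlite3 module. This is a minimal sketch; the table student, the index idx_student_roll and the helper try_insert are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE student (roll_no INTEGER, name TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_student_roll ON student (roll_no)")
conn.execute("INSERT INTO student VALUES (1, 'Asha')")

def try_insert(roll_no, name):
    """Return True if the row was stored, False if the unique index rejected it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?)", (roll_no, name))
        return True
    except sqlite3.IntegrityError:   # raised when roll_no already exists
        return False

print(try_insert(2, "Ravi"))   # True
print(try_insert(1, "Dupe"))   # False
```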
Disadvantages of Indexing
•Indexing requires more storage space to hold the index data structure, which can increase the total size of
the database.
•Increased database maintenance overhead: Indexes must be maintained as data is added, deleted, or modified
in the table, which raises database maintenance overhead.