CSC 211 Lecture Note
a) Static Hashing
In static hashing, the number of data buckets in memory is fixed and remains constant throughout. The
drawback of static hashing is that it does not expand or shrink dynamically as the size of the database
grows or shrinks.
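A minimal sketch of static hashing follows. The bucket count of five, the modulo hash function and the sample keys are assumptions made for illustration; the notes do not specify them.

```python
# Hypothetical static hash file: the number of buckets never changes.
NUM_BUCKETS = 5  # assumed value, fixed for the lifetime of the file


def bucket_address(key: int) -> int:
    """Map a record key to one of the fixed bucket addresses."""
    return key % NUM_BUCKETS


buckets = [[] for _ in range(NUM_BUCKETS)]
for key in (103, 104, 105, 110):
    buckets[bucket_address(key)].append(key)

# 105 and 110 share bucket 0 because the address space cannot grow.
print(buckets)
```

However many records are added, they are spread over the same five addresses, which is the drawback described above.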
b) Dynamic/Extended Hashing
In dynamic hashing, data buckets are added or removed dynamically as the number of records increases
or decreases. In dynamic hashing, the hash function is made to produce a large number of values.
BUCKET OVERFLOW
Bucket overflow occurs when a new record is inserted into the file but the data bucket address generated
by the hash function is not empty, that is, data already exists at that address. This is a critical situation
to handle: how can the data be inserted in this case? Several methods, such as open hashing, closed
hashing, quadratic probing and double hashing, are provided to overcome this situation.
Some commonly used methods include:
• Open Hashing – In this method, the next available data block is used to enter the new
record, instead of overwriting the older one.
Example:
D3 is a new record that needs to be inserted, and the hash function generates the address 105. But that
bucket is already full, so the system searches for the next available data bucket, 123, and assigns D3 to it.
Open Hashing
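The "next available bucket" strategy described above can be sketched as follows. The table size, the modulo hash function, the one-record-per-bucket limit and the sample keys are all assumptions. Many data-structures texts call this technique linear probing.

```python
# Sketch of the "next available bucket" strategy described above.
# Assumptions: five bucket slots, one record per bucket, modulo hash.
NUM_BUCKETS = 5
table = [None] * NUM_BUCKETS


def insert(key: int, record: str) -> int:
    """Store the record in the first free bucket at or after its home address."""
    home = key % NUM_BUCKETS
    for step in range(NUM_BUCKETS):
        address = (home + step) % NUM_BUCKETS
        if table[address] is None:
            table[address] = (key, record)
            return address
    raise RuntimeError("all buckets are full")


def search(key: int):
    """Probe from the home address until the key or an empty bucket is found."""
    home = key % NUM_BUCKETS
    for step in range(NUM_BUCKETS):
        entry = table[(home + step) % NUM_BUCKETS]
        if entry is None:
            return None
        if entry[0] == key:
            return entry[1]
    return None


insert(105, "D1")          # home bucket 0 is free
insert(106, "D2")          # home bucket 1 is free
slot = insert(110, "D3")   # buckets 0 and 1 are taken, so bucket 2 is used
print(slot, search(110))
```

The older records are never overwritten; the new record simply moves forward until a free bucket is found.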
• Closed Hashing – In this method, a new data bucket is allocated with the same address
and is linked after the full data bucket.
Example:
We want to insert a new record D3 into the table. The hash function generates the data bucket address
as 105, but this bucket is too full to store the new data. In this case, a new data bucket is added at the end
of the 105 data bucket and is linked to it. The new record D3 is inserted into the new bucket.
Closed Hashing
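The linked-bucket approach described above can be sketched as follows. The capacity of two records per bucket and the class and function names are assumptions made for illustration. Many data-structures texts call this technique overflow chaining.

```python
# Sketch of overflow chaining as described above.
# Assumptions: each bucket holds at most two records; names are illustrative.
BUCKET_CAPACITY = 2


class Bucket:
    def __init__(self):
        self.records = []      # records stored in this bucket
        self.overflow = None   # link to the next bucket in the chain, if any


def insert(bucket: Bucket, record: str) -> None:
    """Walk the chain and add the record to the first bucket with room."""
    while len(bucket.records) >= BUCKET_CAPACITY:
        if bucket.overflow is None:
            bucket.overflow = Bucket()   # allocate a new linked bucket
        bucket = bucket.overflow
    bucket.records.append(record)


file_buckets = {105: Bucket()}           # bucket address 105 from the example
for record in ("D1", "D2", "D3"):
    insert(file_buckets[105], record)

print(file_buckets[105].records, file_buckets[105].overflow.records)
```

D1 and D2 fill bucket 105, so D3 ends up in the newly linked overflow bucket.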
4. B+ Tree File Organization
A B+ tree, as the name suggests, uses a tree-like structure to store records in a file. It is an advanced
form of the indexed sequential access method. It uses the concept of key indexing, where the primary key is
used to sort the records. For each primary key, an index value is generated and mapped with the record.
In the above diagram, 56 is the root node, which is also called the main node of the tree. The intermediate
nodes hold only the addresses of the leaf nodes; they do not contain any actual records. The leaf nodes
hold the actual records.
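The split between index nodes and leaf nodes can be sketched as follows. This is a simplified, hand-built two-level tree, not a full B+ tree with insertion and node splitting. Only the root key 56 comes from the notes; every other key and the class names are made up.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys, records):
        self.keys = keys          # sorted primary keys
        self.records = records    # actual records, one per key
        self.next = None          # link to the next leaf in key order


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # separator keys only, no records
        self.children = children  # len(children) == len(keys) + 1


def search(node, key):
    """Follow separator keys down to a leaf, then look inside the leaf."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    if key in node.keys:
        return node.records[node.keys.index(key)]
    return None


def scan_all(leaf):
    """Read every key in order by following the leaf links."""
    keys = []
    while leaf is not None:
        keys.extend(leaf.keys)
        leaf = leaf.next
    return keys


# Hand-built tree with root key 56; all other keys are made up.
left = Leaf([12, 30], ["rec12", "rec30"])
right = Leaf([56, 70], ["rec56", "rec70"])
left.next = right
root = Internal([56], [left, right])

print(search(root, 70), search(root, 40), scan_all(left))
```

Every search ends at a leaf, and the links between leaves allow a sequential read of all records in key order.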
Advantages
Tree traversal is easier and faster.
Searching becomes easy as all records are stored only in leaf nodes and are arranged in a sorted,
sequentially linked list.
There is no restriction on B+ tree size. It may grow or shrink as the size of the data increases or decreases.
Disadvantages
Inefficient for static tables.
The Indexed Sequential Access Method (ISAM) is an advanced sequential file organization. In this method,
records are stored in the file using the primary key. An index value is generated for each primary key and
mapped with the record. This index contains the address of the record in the file.
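A minimal sketch of an index that maps primary keys to record addresses follows. The keys, the block addresses and the function names are made up for illustration. Because the index is sorted by primary key, the same structure also answers range queries.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data file: block address -> record (addresses are made up).
data_blocks = {"A1": "record 10", "A2": "record 20", "A3": "record 30", "A4": "record 40"}

# Index kept sorted by primary key; each entry points at a block address.
index_keys = [10, 20, 30, 40]
index_addresses = ["A1", "A2", "A3", "A4"]


def lookup(key: int):
    """Binary-search the index, then fetch the record at the stored address."""
    pos = bisect_left(index_keys, key)
    if pos < len(index_keys) and index_keys[pos] == key:
        return data_blocks[index_addresses[pos]]
    return None


def range_lookup(low: int, high: int):
    """Return records whose keys fall in [low, high], using the sorted index."""
    start = bisect_left(index_keys, low)
    end = bisect_right(index_keys, high)
    return [data_blocks[a] for a in index_addresses[start:end]]


print(lookup(30), range_lookup(15, 35))
```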
Advantages of ISAM
Since each record has the address of its data block in this method, searching for a record in a huge
database is quick and easy.
This method supports range retrieval and partial retrieval of records. Since the index is based on the
primary key values, we can retrieve the data for a given range of values.
Disadvantages of ISAM
This method requires extra space in the disk to store the index value.
When new records are inserted, the file must be reconstructed to maintain the sequence.
When the records of two or more tables are stored in the same file, the file is known as a cluster. Such files
have two or more tables in the same data block, and the key attributes which are used to map these tables
together are stored only once.
The cluster file organization is used when there is a frequent need to join the tables on the same
condition. These joins return only a few records from both tables. In the given example, we are retrieving
the records for particular departments only. This method can't be used to retrieve the records for the entire
department.
In this method, we can directly insert, update or delete any record. Data is sorted based on the key with
which searching is done. The cluster key is the key on which the tables are joined.
a) Indexed Clusters:
In an indexed cluster, records are grouped based on the cluster key and stored together. The above
EMPLOYEE and DEPARTMENT relationship is an example of an indexed cluster. Here, all the records are
grouped based on the cluster key DEP_ID.
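Grouping by the cluster key can be sketched as follows. The table and key names EMPLOYEE, DEPARTMENT and DEP_ID come from the notes; the rows themselves and the dictionary layout are made up for illustration.

```python
# Made-up rows; DEP_ID is the cluster key named in the notes.
departments = [(10, "Accounts"), (13, "Sales")]
employees = [(1, "Ada", 10), (2, "Ben", 13), (3, "Chi", 10)]

# One cluster per DEP_ID: the key is stored once, followed by the rows
# from both tables that share it.
clusters = {}
for dep_id, dep_name in departments:
    clusters[dep_id] = {"department": dep_name, "employees": []}
for emp_id, emp_name, dep_id in employees:
    clusters[dep_id]["employees"].append((emp_id, emp_name))

# Joining EMPLOYEE and DEPARTMENT for one department touches a single cluster.
print(clusters[10])
```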
b) Hash Clusters:
A hash cluster is similar to an indexed cluster. In a hash cluster, instead of storing the records based on the
cluster key itself, we generate a hash key value for the cluster key and store together the records with the
same hash key value.
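A minimal sketch of a hash cluster follows. The block count of four, the modulo hash function and the sample rows are assumptions; only the cluster key DEP_ID comes from the notes.

```python
from collections import defaultdict

NUM_CLUSTER_BLOCKS = 4  # assumed number of hash blocks

# Made-up rows; DEP_ID is the cluster key named in the notes.
departments = [(10, "Accounts"), (13, "Sales")]
employees = [(1, "Ada", 10), (2, "Ben", 13), (3, "Chi", 10)]


def cluster_hash(dep_id: int) -> int:
    """Hash the cluster key to a block number."""
    return dep_id % NUM_CLUSTER_BLOCKS


blocks = defaultdict(lambda: {"departments": [], "employees": []})
for dep_id, name in departments:
    blocks[cluster_hash(dep_id)]["departments"].append((dep_id, name))
for emp_id, name, dep_id in employees:
    blocks[cluster_hash(dep_id)]["employees"].append((emp_id, name, dep_id))

# The join for DEP_ID 10 reads one block instead of two separate files.
block = blocks[cluster_hash(10)]
joined = [(e[1], d[1]) for d in block["departments"] for e in block["employees"]
          if d[0] == e[2] == 10]
print(joined)
```

Different DEP_ID values can hash to the same block, which is why the join still filters on the key itself.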
The cluster file organization gives efficient results when there is a 1:M mapping between the tables.
This method reduces the cost of searching for various records in different files.
4.1 PREAMBLE
Information management can be defined as the process of collecting, storing, maintaining and managing
data and other types of information. Information management covers the guidelines and procedures
adopted by organizations in managing information and communicating it among different individuals,
departments and stakeholders.
Information Retrieval (IR) refers to the human-computer interaction (HCI) that happens when a user uses a
machine to search for information matching the user’s search query. It is all about retrieving information
that is stored in a database or computer and is related to the user’s needs. An example of information
retrieval is a user entering a query into a web search engine. The IR system assists users in finding the
information they require, but it does not explicitly return the answers to the question. Instead, it reports the
existence and location of documents that might contain the required information. Various methods and
techniques are used in information retrieval, and its results are commonly evaluated with measures such as
precision and recall.
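Precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. A minimal sketch, with made-up document identifiers and a hypothetical function name:

```python
def precision_recall(retrieved: set, relevant: set):
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up document identifiers for one query.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d7"}
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall about 0.67).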
Backup Procedures refer to storing a copy of the original data which can be used in case of data loss. Backup
is considered one of the approaches to data protection. Important data of the organization needs to be
kept and backed up efficiently in order to protect it. Backup can be achieved by storing a copy of the
original data or database separately on storage devices. Various types of backup are available, such as
full backup, incremental backup, local backup, mirror backup, etc. An example of a backup tool is
SnapManager, which makes a backup of everything in the database.
Types of backup include full backup, incremental backup, differential backup, mirror backup, full PC backup,
local backup, offsite backup and online backup.
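The difference between full, differential and incremental backups can be sketched as follows, using the usual definitions: a full backup copies everything, a differential backup copies what changed since the last full backup, and an incremental backup copies what changed since the last backup of any type. The file names and timestamps are made up.

```python
# Hypothetical file -> last-modified time (in arbitrary time units).
modified_at = {"a.db": 5, "b.db": 12, "c.db": 18}

LAST_FULL_BACKUP = 10      # assumed time of the most recent full backup
LAST_ANY_BACKUP = 15       # assumed time of the most recent backup of any type


def full_backup():
    """A full backup copies every file."""
    return sorted(modified_at)


def differential_backup():
    """A differential backup copies files changed since the last full backup."""
    return sorted(f for f, t in modified_at.items() if t > LAST_FULL_BACKUP)


def incremental_backup():
    """An incremental backup copies files changed since the last backup of any type."""
    return sorted(f for f, t in modified_at.items() if t > LAST_ANY_BACKUP)


print(full_backup(), differential_backup(), incremental_backup())
```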
Recovery
Recovery refers to restoring lost data after a failure by using recovery techniques. When a database fails
for any reason, there is a chance of data loss, so in that case the recovery process helps in improving the
reliability of the database.
An example of a recovery tool is SnapManager, which recovers the data to the last transaction.
Data Security refers to the protection of data against unauthorized access, modification, and destruction. With most
companies depending on information, it is important that it is protected. A range of technologies and actions,
such as antivirus software, firewalls, and backup and recovery systems, is used in data protection because breaches
in computer security can lead to loss of profit, loss of public trust and lawsuits.
Organizations should ensure that computer operations and other jobs are kept separate, so that it would take
the collusion of two or more employees to defraud the company.
Unauthorized employees and members of the public shouldn’t be allowed into secure areas such as computer
operations rooms.
Staff should be educated about security breaches and how they can be prevented, including raising the alarm
when necessary.
A. Password Protection
Password lists should not be stored in plain form; they should be encrypted and held in an irreversibly
transformed state.
All users must ensure that their passwords are kept confidential, are not written down, are not made up of easily
guessed words and are changed regularly, at least every 3 months.
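One way to hold passwords in an irreversibly transformed state is to store a salted hash rather than the password itself. The sketch below uses PBKDF2 from Python's standard library; the choice of PBKDF2, the iteration count and the function names are assumptions, not something these notes prescribe.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes = None):
    """Return (salt, digest); the digest cannot be reversed to the password."""
    if salt is None:
        salt = os.urandom(16)            # random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest


def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))   # matches
print(verify_password("wrong guess", salt, stored))     # does not match
```

Only the salt and the digest are stored, so anyone who obtains the password list cannot read the passwords directly.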
B. Data encryption
Data is vulnerable to wire-tapping while it is being transmitted over a network. One method of
preventing confidential data from being read by unauthorized hackers is to encrypt it, making it incomprehensible to
anyone who does not hold the ‘key’ to decode it.
There are many ways of encrypting data, often based on either transposition (where characters are switched
around) or substitution (where characters are replaced by other characters).
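The two ideas can be illustrated with toy ciphers. The shift of 3 and the pair-swapping rule are arbitrary choices made for illustration, and neither cipher is secure enough for real use.

```python
import string

ALPHABET = string.ascii_uppercase


def substitute(text: str, shift: int) -> str:
    """Substitution: replace each letter with the one `shift` places later."""
    result = []
    for ch in text:
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            result.append(ch)
    return "".join(result)


def transpose(text: str) -> str:
    """Transposition: swap each adjacent pair of characters."""
    chars = list(text)
    for i in range(0, len(chars) - 1, 2):
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


message = "ATTACK"
print(substitute(message, 3))                  # DWWDFN
print(substitute(substitute(message, 3), -3))  # ATTACK
print(transpose(message))                      # TAATKC
print(transpose(transpose(message)))           # ATTACK
```

In both cases the receiver needs the key (the shift amount or the swapping rule) to recover the original message.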
Cryptography serves three purposes:
It helps to identify authentic users.
It prevents alteration of the message.
It prevents unauthorized users from reading the message.
C. Access rights
Organizations can program their systems to allow access to particular data only from particular terminals,
and only at certain times of day. The terminal in the database administrator's office may be the only terminal
from which changes to the structure of a database may be made.
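A rule of this kind can be sketched as a lookup on the action, the terminal and the time of day. The action names, terminal names and permitted hours below are hypothetical.

```python
from datetime import time

# Hypothetical policy: which terminal may perform an action, and when.
ACCESS_RULES = {
    "alter_schema": {"terminals": {"DBA-OFFICE"}, "start": time(8, 0), "end": time(17, 0)},
    "read_reports": {"terminals": {"DBA-OFFICE", "FRONT-DESK"}, "start": time(6, 0), "end": time(22, 0)},
}


def is_allowed(action: str, terminal: str, now: time) -> bool:
    """Allow the action only from a listed terminal within the permitted hours."""
    rule = ACCESS_RULES.get(action)
    if rule is None:
        return False                      # unknown actions are denied
    return terminal in rule["terminals"] and rule["start"] <= now <= rule["end"]


print(is_allowed("alter_schema", "DBA-OFFICE", time(9, 30)))   # True
print(is_allowed("alter_schema", "FRONT-DESK", time(9, 30)))   # False
print(is_allowed("read_reports", "FRONT-DESK", time(23, 0)))   # False
```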
Biometric security is the use of biometric data for identification, access control, and authentication. Fingerprint
recognition, voice recognition and face recognition are biometric methods of identifying an
authorized user.
Data can still be destroyed no matter what precautions are taken, for example when there is a fire, a flood or
accidental destruction of data. A simple disk head crash can destroy a disk pack in a fraction of a second.
A backup facility that does not degrade the performance of the system should be readily available. Failing
to plan for computer failure can be disastrous; hence, for a computer scientist, backing up data is very
important.