
CSC 211 Lecture Note

Steps on how to solve CSC 211 problems

3. Hash File Organization


In hash file organization, records are not stored according to their position. Instead, each record is
stored in the data block whose address is generated by a hash function. The memory location where these
records are stored is called a data block or data bucket.
Data Bucket – Data buckets are the memory locations where the records are stored. These buckets are
also considered the unit of storage.
Hash Function – A hash function is a mathematical function used to generate the addresses where records
can be saved.
Hash Index – The prefix of an entire hash value is taken as a hash index. Every hash index has a depth
value that signifies how many bits are used for computing the hash function.
The diagram below depicts how a hash function works:

Hashing is divided into two subcategories:

a) Static Hashing
In static hashing, the number of data buckets in memory is fixed and remains constant throughout. The
drawback of static hashing is that it does not expand or shrink dynamically as the size of the database
grows or shrinks.
b) Dynamic/Extended Hashing
In dynamic hashing, data buckets are added or removed dynamically as the number of records increases
or decreases. In dynamic hashing, the hash function is made to produce a large number of values.
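
To make the static case concrete, here is a minimal sketch in Python. It assumes a fixed number of buckets (NUM_BUCKETS) and a simple modulo hash function; both are illustrative choices, not something the note prescribes.

```python
NUM_BUCKETS = 10  # assumption: fixed for the lifetime of the file (static hashing)

# One list per data bucket; the list index plays the role of the bucket address.
buckets = [[] for _ in range(NUM_BUCKETS)]

def bucket_address(key):
    """Hypothetical hash function: map a record key to a bucket address."""
    return key % NUM_BUCKETS

def insert(key, record):
    """Store the record in the bucket chosen by the hash function."""
    buckets[bucket_address(key)].append((key, record))

def find(key):
    """Look only in the bucket the hash function points to."""
    for stored_key, record in buckets[bucket_address(key)]:
        if stored_key == key:
            return record
    return None

insert(103, "D1")
insert(105, "D2")
print(bucket_address(105))  # 5
print(find(105))            # D2
```

Because NUM_BUCKETS never changes, the same key always maps to the same bucket, which is also why the structure cannot grow with the database.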
BUCKET OVERFLOW

Bucket overflow occurs when a new record is inserted into the file but the data bucket address generated
by the hash function is not empty, that is, data already exists at that address. How can the record be
inserted in this case? Several methods are available to handle this situation, such as open hashing,
closed hashing, quadratic probing and double hashing.
Some commonly used methods include:
• Open Hashing – In this method, the next available data block is used to store the new
record, instead of overwriting the older one. This mechanism is also known as linear probing.
Example:
D3 is a new record that needs to be inserted, and the hash function generates the address 105, but that
bucket is already full. The system therefore searches for the next available data bucket, 123, and assigns
D3 to it.

Open Hashing

• Closed Hashing – In this method, a new data bucket is allocated for the same address
and is linked after the full data bucket. This mechanism is also known as overflow chaining.

Example:
We want to insert a new record D3 into the table. The hash function generates the data bucket address
105, but this bucket is too full to store the new data. In this case, a new data bucket is added at the end
of the 105 data bucket and is linked to it. The new record D3 is inserted into the new bucket.

Closed Hashing
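
The two overflow methods can be sketched side by side. The code below uses "open" and "closed" hashing in the sense this note gives them (next available bucket versus a linked overflow bucket). The table size, the one-record bucket capacity and the modulo hash are assumptions made for illustration only.

```python
NUM_BUCKETS = 5
BUCKET_CAPACITY = 1  # assumption: each bucket holds a single record

def h(key):
    """Hypothetical hash function."""
    return key % NUM_BUCKETS

def insert_open(table, key, record):
    """Open hashing as described above: use the next bucket that has space."""
    start = h(key)
    for step in range(NUM_BUCKETS):
        slot = (start + step) % NUM_BUCKETS
        if table[slot] is None:
            table[slot] = (key, record)
            return slot
    raise RuntimeError("all buckets are full")

def insert_closed(table, key, record):
    """Closed hashing as described above: link an overflow bucket to the full one."""
    chain = table[h(key)]            # chain of buckets for this address
    if len(chain[-1]) >= BUCKET_CAPACITY:
        chain.append([])             # allocate and link a new overflow bucket
    chain[-1].append((key, record))
    return len(chain) - 1            # position of the bucket within the chain

open_table = [None] * NUM_BUCKETS
print(insert_open(open_table, 105, "D1"))  # 0
print(insert_open(open_table, 110, "D3"))  # 1 (bucket 0 was full)

closed_table = [[[]] for _ in range(NUM_BUCKETS)]
print(insert_closed(closed_table, 105, "D1"))  # 0
print(insert_closed(closed_table, 110, "D3"))  # 1 (overflow bucket linked to bucket 0)
```

In the first table D3 ends up at a different address from the one the hash function produced; in the second it stays under the same address but in a linked overflow bucket.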
4. B+ Tree File Organization

B+ Tree, as the name suggests, uses a tree-like structure to store records in a file. It is an advanced
form of the indexed sequential access method. It uses the concept of key indexing, where the primary key
is used to sort the records. For each primary key, an index value is generated and mapped to the
record. The index of a record is the address of that record in the file.

In the above diagram, 56 is the root node, which is also called the main node of the tree. The intermediate
nodes hold only the addresses of leaf nodes; they do not contain any actual records. The leaf nodes hold
the actual records.
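
A minimal sketch of this layout is given below. Only the root key 56 comes from the text; every other key, the record names and the class names are hypothetical. The tree is built by hand, so the sketch covers searching only, not insertion or node splitting.

```python
from bisect import bisect_right

class Leaf:
    """Leaf node: holds the actual records and a link to the next leaf."""
    def __init__(self, keys, records):
        self.keys = keys
        self.records = records
        self.next = None

class Internal:
    """Root or intermediate node: holds separator keys and child addresses only."""
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children  # always len(keys) + 1 children

def find_leaf(node, key):
    """Follow child addresses from the root down to the leaf that may hold key."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node

def search(root, key):
    leaf = find_leaf(root, key)
    if key in leaf.keys:
        return leaf.records[leaf.keys.index(key)]
    return None

def range_scan(root, low, high):
    """Find the first leaf, then walk the linked list of leaves in key order."""
    leaf, found = find_leaf(root, low), []
    while leaf is not None:
        for key, record in zip(leaf.keys, leaf.records):
            if key > high:
                return found
            if key >= low:
                found.append(record)
        leaf = leaf.next
    return found

# Hand-built example tree (all keys other than the root 56 are made up).
a, b = Leaf([10, 20], ["R10", "R20"]), Leaf([30, 40], ["R30", "R40"])
c, d = Leaf([56, 70], ["R56", "R70"]), Leaf([80, 95], ["R80", "R95"])
a.next, b.next, c.next = b, c, d
root = Internal([56], [Internal([30], [a, b]), Internal([80], [c, d])])

print(search(root, 70))          # R70
print(range_scan(root, 35, 75))  # ['R40', 'R56', 'R70']
```

The range scan shows why the linked leaves matter: once the first leaf is found, no further tree traversal is needed.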
Advantages
Tree traversal is easier and faster.
Searching is easy because all records are stored only in the leaf nodes, which form a sorted, sequentially
linked list.
There is no restriction on B+ tree size. It may grow or shrink as the size of the data increases or decreases.

Disadvantages
Inefficient for static tables.

5. Indexed Sequential Access Method (ISAM)

ISAM is an advanced form of sequential file organization. In this method, records are stored in the file
in primary key order. An index value is generated for each primary key and mapped to the record. This index
contains the address of the record in the file.
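
The idea can be sketched in a few lines of Python. The sample records are invented, and a list position stands in for the record's address on disk; both are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right

# Data file: records kept in primary-key order (hypothetical sample data).
data_file = [(101, "Ada"), (105, "Bola"), (110, "Chidi"), (120, "Dayo")]

# Index: primary key -> address of the record in the file.
index = {key: address for address, (key, _) in enumerate(data_file)}
index_keys = sorted(index)

def lookup(key):
    """Use the index to jump straight to the record's address."""
    address = index.get(key)
    return None if address is None else data_file[address]

def range_lookup(low, high):
    """Range retrieval: the sorted index gives the first and last addresses."""
    start = bisect_left(index_keys, low)
    end = bisect_right(index_keys, high)
    return data_file[start:end]

print(lookup(110))             # (110, 'Chidi')
print(range_lookup(102, 115))  # [(105, 'Bola'), (110, 'Chidi')]
```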
Advantages of ISAM
In this method, each record has the address of its data block, so searching for a record in a huge database
is quick and easy.
This method supports range retrieval and partial retrieval of records. Since the index is based on the
primary key values, we can retrieve the data for a given range of values.

Disadvantages of ISAM
This method requires extra disk space to store the index values.
When new records are inserted, the file must be reorganized to maintain the sequence.

6. Cluster File Organization

When records from two or more tables are stored in the same file, the arrangement is known as a cluster.
These files hold two or more tables in the same data block, and the key attributes used to map these tables
together are stored only once.
Cluster file organization is used when there is a frequent need to join the tables on the same
condition. These joins return only a few records from both tables. In the given example, we are retrieving
the records for particular departments only. This method cannot be used to retrieve the records for the
entire department.
In this method, we can directly insert, update or delete any record. Data is sorted based on the key with
which searching is done. The cluster key is the key on which the tables are joined.

Cluster file organization is of two types:

a) Indexed Clusters:
In an indexed cluster, records are grouped based on the cluster key and stored together. The above
EMPLOYEE and DEPARTMENT relationship is an example of an indexed cluster. Here, all the records are
grouped based on the cluster key, DEP_ID.

b) Hash Clusters:
A hash cluster is similar to an indexed cluster. However, instead of storing the records based on the
cluster key itself, we generate a hash key value for the cluster key and store together the records with
the same hash key value.
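
The difference between the two cluster types can be sketched as follows. DEP_ID is the cluster key named above; the sample rows, the function names and the modulo hash are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: (DEP_ID, name) and (EMP_ID, name, DEP_ID).
departments = [(10, "Math"), (14, "Physics")]
employees = [(1, "John", 14), (2, "Mary", 10), (3, "Tunde", 14)]

def build_indexed_cluster(departments, employees):
    """Group records by the cluster key itself; DEP_ID is stored once per group."""
    cluster = {dep_id: {"dept_name": name, "employees": []}
               for dep_id, name in departments}
    for emp_id, emp_name, dep_id in employees:
        cluster[dep_id]["employees"].append((emp_id, emp_name))
    return cluster

def build_hash_cluster(employees, num_buckets=4):
    """Group records by the hash of the cluster key instead of the key itself."""
    buckets = defaultdict(list)
    for emp in employees:
        buckets[emp[2] % num_buckets].append(emp)
    return dict(buckets)

indexed = build_indexed_cluster(departments, employees)
print(indexed[14]["employees"])  # [(1, 'John'), (3, 'Tunde')]

hashed = build_hash_cluster(employees)
print(sorted(hashed))            # [2]  (both 10 and 14 hash to bucket 2)
```

With four buckets, DEP_ID 10 and 14 happen to share a hash value, so their records are stored together even though the cluster keys differ.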

Advantages of Cluster File Organization

It gives efficient results when there is a 1:M mapping between the tables.
This method reduces the cost of searching for various records in different files.

Disadvantages of Cluster file organization


This method performs poorly for very large databases.
If there is any change in the joining condition, this method cannot be used.
This method is not suitable for tables with a 1:1 relationship.
When a record is deleted, the space it used needs to be released. Otherwise, the performance
of the database will slow down.
INFORMATION MANAGEMENT
Learning Objectives
In this chapter, you will learn:

• What is Information Management
• Information Retrieval
• Backup Procedures
• How to secure Data from Fraudulent use

4.1 PREAMBLE

Information management can be defined as the process of collecting, storing, maintaining and managing
data and other types of information. It covers the guidelines and procedures that organizations adopt
for managing information and communicating it among different individuals, departments and
stakeholders.

4.2 INFORMATION RETRIEVAL

Information Retrieval (IR) refers to the human-computer interaction (HCI) that happens when a user uses a
machine to search for information that matches the user's search query. It is all about retrieving information
that is stored in a database or computer and is related to the user's needs. An example of information
retrieval is a user entering a query into a web search engine. The IR system assists users in finding the
information they require, but it does not explicitly return the answer to the question. It informs the user
of the existence and location of documents that might contain the required information. Various methods and
techniques are used in information retrieval, and its effectiveness is commonly measured with precision and
recall.
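
Precision and recall can be computed directly from the set of documents a system returned and the set that was actually relevant. Precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that were retrieved. The document identifiers below are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: 4 documents returned, 3 documents actually relevant.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5"],
)
print(precision)         # 0.5
print(round(recall, 3))  # 0.667
```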

4.3 BACKUP PROCEDURES

Backup procedures refer to storing a copy of the original data that can be used in case of data loss. Backup
is considered one of the approaches to data protection. An organization's important data needs to be
kept and backed up efficiently in order to protect it. Backup can be achieved by storing a copy
of the original data or database separately on storage devices. An example of a backup tool is
SnapManager, which makes a backup of everything in the database.
Types of backup include full backup, incremental backup, differential backup, mirror backup, full PC backup,
local backup, offsite backup and online backup.
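
The difference between full, differential and incremental backups can be sketched with a toy model. Here each file has a last-modified time expressed as a plain integer; the file names, contents and times are invented, and real backup tools track changes in more robust ways.

```python
def full_backup(files):
    """Full backup: copy everything."""
    return dict(files)

def changed_since(files, modified_at, since):
    """Copy only the files modified after the given point in time."""
    return {name: data for name, data in files.items()
            if modified_at[name] > since}

# Hypothetical state of the system.
files = {"a.txt": "v2", "b.txt": "v1", "c.txt": "v3"}
modified_at = {"a.txt": 5, "b.txt": 1, "c.txt": 9}
last_full_backup = 3   # time of the most recent full backup
last_any_backup = 7    # time of the most recent backup of any type

# Differential: everything changed since the last FULL backup.
differential = changed_since(files, modified_at, last_full_backup)
# Incremental: everything changed since the last backup of ANY type.
incremental = changed_since(files, modified_at, last_any_backup)

print(sorted(full_backup(files)))  # ['a.txt', 'b.txt', 'c.txt']
print(sorted(differential))        # ['a.txt', 'c.txt']
print(sorted(incremental))         # ['c.txt']
```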

Recovery
Recovery refers to restoring lost data after a failure by using recovery techniques. When a database
fails for any reason, there is a chance of data loss, and in that case the recovery process helps to
improve the reliability of the database.
SnapManager is also an example of a recovery tool: it recovers the data up to the last
transaction.

Difference between Backup and Recovery


| Backup | Recovery |
|---|---|
| Backup refers to storing a copy of original data separately. | Recovery refers to restoring the lost data in case of failure. |
| Backup is a copy of data which is used to restore original data after a data loss/damage occurs. | Recovery is a process of retrieving lost, corrupted or damaged data to its original state. |
| Backup is the replication of data. | Recovery is the process to restore the database. |
| The primary goal of backup is just to keep one extra copy to refer to in case of original data loss. | The primary goal of recovery is to retrieve original data in case of original data failure. |
| It helps in improving data protection. | It helps in improving the reliability of the database. |
| Backup makes the recovery process easier. | Recovery has no role in data backup. |
| The cost of backup is affordable. | The cost of recovery is expensive. |

Data Security refers to the protection of data against unauthorized access, modification, and destruction. With most
companies depending on information, it is important that it is protected. A range of technologies and measures,
such as antivirus software, firewalls, and backup and recovery systems, is used in data protection, because breaches
in computer security can lead to loss of profit, loss of public trust and lawsuits.

Unauthorized Data Access


Malicious Mischief
Unauthorized Computer Access
Computer Viruses
Physical theft

4.4 HOW TO SECURE DATA FROM FRAUDULENT USE

How to secure Data from fraudulent use in an organization


Most organizations are at risk of losing data to hackers and to fraudulent employees within the organization. As such,
precautionary measures must be put in place to counteract these risks. These precautionary measures include the following:

Staff of an organization must be carefully vetted.

All passwords and authorizations for employees who have been sacked or who have handed in their
resignation must be cancelled immediately in the company's database.

The organization should ensure that computer operation and other jobs are carried out separately, so
that it would take the collusion of two or more employees to defraud the company.

Unauthorized employees and members of the public should not be allowed into secure areas such as computer
operations rooms.

Staff should be educated on security breaches and on how they can be prevented by raising the alarm when necessary.

Other techniques used in securing Data from fraudulent use include:

A. Password Protection

Password lists should not be stored in plain form, but should be encrypted and held in an irreversibly transformed
state.

User IDs and passwords

Rules issued by companies regarding passwords include the following.

Passwords must be at least 6 characters long.

Password display must be automatically suppressed on screen and on printed output.

Files containing passwords must be encrypted.

All users must ensure that their passwords are kept confidential, not written down, not made up of easily guessed
words, and changed regularly, at least every 3 months.
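
One common way to hold passwords in an "irreversibly transformed state" is salted hashing. The sketch below uses PBKDF2 from Python's standard library; the note does not name a specific technique, so the choice of PBKDF2, the iteration count and the function names are assumptions. The 6-character minimum is the rule stated above.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumption: illustrative work factor
MIN_LENGTH = 6        # rule from the note: at least 6 characters

def hash_password(password, salt=None):
    """Return (salt, digest); the original password cannot be read back from either."""
    if len(password) < MIN_LENGTH:
        raise ValueError("password must be at least 6 characters")
    salt = salt or os.urandom(16)  # random salt, stored alongside the digest
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, digest):
    """Re-run the same transformation and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("s3cret!")
print(verify_password("s3cret!", salt, digest))  # True
print(verify_password("guess12", salt, digest))  # False
```

Only the salt and the digest are written to the password file, so a stolen file does not reveal the passwords themselves.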

B. Data encryption

Data is vulnerable to wire-tapping while it is being transmitted over a network. One method of
preventing confidential data from being read by unauthorized hackers is to encrypt it, making it incomprehensible to
anyone who does not hold the 'key' to decode it.

There are many ways of encrypting data, often based on either transposition (where characters are switched
around) or substitution (where characters are replaced by other characters).
Cryptography serves three purposes:
It helps to identify authentic users.
It prevents alteration of the message.
It prevents unauthorized users from reading the message.
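
Transposition and substitution can each be illustrated with a toy cipher. The two functions below (a letter shift and a swap of adjacent characters) are chosen only to show the two ideas; they are assumptions for illustration and are far too weak to protect real data.

```python
def substitute(text, shift):
    """Substitution: replace each letter by the one `shift` places along the alphabet."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)  # leave digits, spaces and punctuation unchanged
    return "".join(out)

def transpose(text):
    """Transposition: keep the same characters but swap each adjacent pair."""
    chars = list(text)
    for i in range(0, len(chars) - 1, 2):
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

print(substitute("DATA", 3))   # GDWD
print(substitute("GDWD", -3))  # DATA  (the key, 3, is needed to decode)
print(transpose("DATA"))       # ADAT
print(transpose("ADAT"))       # DATA  (swapping again restores the text)
```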

C. Access rights
Organizations can program their systems to allow access to particular data only from particular terminals,
and only at certain times of day. For example, the terminal in the database administrator's office may be the
only terminal from which changes to the structure of a database can be made.
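
Such a rule can be expressed as a small lookup table. In this sketch the action names, terminal names and office hours are all hypothetical; a real system would take them from its own configuration.

```python
from datetime import time

# Hypothetical rules: which terminals may perform an action, and during which hours.
ACCESS_RULES = {
    "alter_schema": {"terminals": {"DBA-OFFICE"},
                     "start": time(8, 0), "end": time(17, 0)},
    "read_records": {"terminals": {"DBA-OFFICE", "FRONT-DESK"},
                     "start": time(8, 0), "end": time(17, 0)},
}

def is_allowed(action, terminal, at):
    """Allow the action only from a listed terminal and within the permitted hours."""
    rule = ACCESS_RULES.get(action)
    if rule is None:
        return False  # unknown actions are denied by default
    return terminal in rule["terminals"] and rule["start"] <= at <= rule["end"]

print(is_allowed("alter_schema", "DBA-OFFICE", time(10, 30)))  # True
print(is_allowed("alter_schema", "FRONT-DESK", time(10, 30)))  # False (wrong terminal)
print(is_allowed("read_records", "FRONT-DESK", time(22, 0)))   # False (outside hours)
```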

D. Biometric security measures

Biometric security is the use of biometric data for identification, access control, and authentication. Fingerprint
recognition techniques, voice recognition and face recognition are Biometric methods of identifying an
authorized user.

4.5 DISASTER PLANNING

Data can be destroyed, no matter what precautions are taken, when there is a fire, a flood or an
accidental destruction of data. A simple disk head crash can destroy a disk pack in a fraction of a second.
A backup facility that does not degrade the performance of the system should therefore be readily available.
The cost of failing to plan for computer failure can be disastrous; hence, for a computer scientist, backing
up data is very important.
