File Organization SS2 Data 2nd Term
Manages Data Integrity and Security: The DBA checks and manages data integrity
and restricts data from unauthorized use. The DBA also keeps an eye on the
relationships within the data to maintain its integrity.
Database Design: The DBA is responsible and accountable for the logical and
physical design, the external model design, and integrity and security controls.
Tuning Database Performance: If users cannot get data quickly and accurately,
the organization may lose business. By tuning SQL commands, the DBA can
improve the performance of the database.
Database security is the practice of protecting database data from unauthorized access,
theft, alteration, and destruction. Its aim is to protect sensitive information from many risks,
including hackers, malicious insiders, and natural disasters.
Organizations can implement database security safeguards using a variety of techniques.
These notes cover some of the most popular database security control mechanisms:
Authentication
Database Encryption
Database Backup
Access control
Inference control
Flow control
Authentication
Database authentication is the security control that verifies a user's login information
against the credentials stored in the database. The user can access the database only if the
login information supplied matches what is stored. In other words, the user must
authenticate before accessing the database.
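The check described above can be sketched in Python. This is only an illustration, not how any particular DBMS does it: the `users` dictionary stands in for a credentials table, the function names are made up for this example, and the salted PBKDF2 hash is an assumed choice so that plain passwords are never stored.

```python
import hashlib
import hmac
import os

# Hypothetical in-memory "credentials table": username -> (salt, password hash)
users = {}

def hash_password(password, salt):
    # Derive a hash from the password and salt (PBKDF2 with SHA-256)
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

def register(username, password):
    # Store a random salt and the hash, never the plain password
    salt = os.urandom(16)
    users[username] = (salt, hash_password(password, salt))

def authenticate(username, password):
    # Access is granted only if the supplied login matches the stored record
    record = users.get(username)
    if record is None:
        return False
    salt, stored_hash = record
    return hmac.compare_digest(stored_hash, hash_password(password, salt))

register("ada", "s3cret")
print(authenticate("ada", "s3cret"))  # correct password
print(authenticate("ada", "wrong"))   # wrong password
```

The first call succeeds because the hash of the supplied password matches the stored one; the second fails.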
Database Encryption
Encryption is one of the best methods for protecting your data from unauthorized access
while it is stored or transmitted over the internet.
Even if hackers get access to an encrypted database, they are unable to use the data until it
is decrypted.
Backup Database
Another kind of database security is backup, which is used to recover data in the event
of data loss, data corruption, hacking, or natural disasters. A backup process regularly copies
or archives the database to backup storage.
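As a small illustration of backup and recovery, the sketch below uses Python's built-in `sqlite3` module. SQLite and the `students` table are assumptions chosen for the example; a production DBMS would use its own backup tools and write to separate storage rather than to memory.

```python
import sqlite3

# Source database with some sample data (the table and rows are made up)
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany("INSERT INTO students VALUES (?, ?)", [(1, "Ada"), (2, "Bola")])
source.commit()

# Copy the whole database into a second connection acting as backup storage
backup = sqlite3.connect(":memory:")
source.backup(backup)

# Simulate data loss in the source database
source.execute("DELETE FROM students")
source.commit()

# The backup copy still holds the original records
restored = backup.execute("SELECT id, name FROM students ORDER BY id").fetchall()
print(restored)
```

After the delete, the source table is empty, but the records can still be read from the backup copy.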
Access Controls
Inference control
Flow control
In DBMS (Database Management Systems), "flow control" refers to the technique for
controlling the data flow between various system components. The purpose of flow control
is to guarantee the dependable, effective, and secure processing of data.
For organizations that keep sensitive or private information in their systems, database
security is a major concern. One form of database security applies statistical techniques to
identify anomalies or trends that might point to a security breach or an attempt to access
unauthorized data.
Conclusion
Identifying, reporting, and managing database security issues, audit trails, and
forensics.
File organization refers to the arrangement of data in a file or a database table. In computer
science, file organization is an important concept used in the design and implementation of
file systems and database management systems (DBMS). The way data is organized in a file
or table can have a significant impact on the efficiency and speed of data access and
retrieval.
A DBMS can use four types of file organization to store data: sequential, indexed, hash, and
clustered.
Sequential file organization is suitable for situations where data is accessed in a serial
manner, such as batch processing or generating reports. It is also used in situations where
data is not frequently updated, as adding or deleting a record can cause the entire file to be
rewritten.
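The behaviour of a sequential file can be sketched in Python as follows. The file name, the comma-separated record layout, and the function names are assumptions made for this example only.

```python
import os
import tempfile

def find_record(path, key):
    # Read the records one after another from the start until the key is found
    with open(path) as f:
        for line in f:
            record_key, value = line.rstrip("\n").split(",", 1)
            if record_key == key:
                return value
    return None

def delete_record(path, key):
    # Deleting a record means rewriting the whole file without it
    with open(path) as f:
        lines = f.readlines()
    kept = [line for line in lines if line.split(",", 1)[0] != key]
    with open(path, "w") as f:
        f.writelines(kept)

# Hypothetical data file with one record per line: key,name
path = os.path.join(tempfile.mkdtemp(), "students.txt")
with open(path, "w") as f:
    f.write("101,Ada\n102,Bola\n103,Chidi\n")

print(find_record(path, "102"))
delete_record(path, "102")
print(find_record(path, "102"))
```

Finding a record may require reading every record before it, and deleting one record rewrites the entire file, which is why this organization suits data that is read serially and rarely updated.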
In an indexed file organization, the index is created based on one or more fields of the data,
which are referred to as the index key. The index key is unique for each record, which allows
for fast and efficient retrieval of specific records based on the value of the index key.
Indexed file organization is suitable for situations where data needs to be accessed quickly
and efficiently, and where the data is not frequently updated. However, updating or inserting
new records can be slower and more complex, as the index needs to be updated as well.
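A minimal sketch of the idea, assuming a simple layout where each record is one line and the index is a Python dictionary that maps each key to the byte position of its record. The in-memory `data` buffer stands in for a file on disk.

```python
import io

data = io.BytesIO()   # stands in for the data file
index = {}            # index key -> byte offset of the record

# Writing a record also means updating the index, which is the extra cost of inserts
for key, name in [(101, "Ada"), (102, "Bola"), (103, "Chidi")]:
    index[key] = data.tell()
    data.write(f"{key},{name}\n".encode("utf-8"))

def lookup(key):
    # Use the index to jump straight to the record instead of scanning the file
    offset = index.get(key)
    if offset is None:
        return None
    data.seek(offset)
    line = data.readline().decode("utf-8").rstrip("\n")
    return line.split(",", 1)[1]

print(lookup(102))
```

The lookup reads only one record, no matter how many records come before it in the file.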
Hash file organization is a type of file organization where data is stored in a file or table
using a hash function. A hash function is a mathematical function that converts a key value
into a hash code, which is used to map the key to the location in the file or table where the
data is stored.
Hash File Organization
In a hash file organization, the hash function is used to determine the location in the file or
table where a record will be stored, based on the value of the record’s key. This makes it very
fast and efficient to retrieve records based on their key values.
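The sketch below illustrates the idea with a very simple hash function, the remainder after dividing the key by the number of buckets. The bucket count, the function names, and the sample records are assumptions for this example; real systems use more elaborate hash functions.

```python
NUM_BUCKETS = 4
buckets = [[] for _ in range(NUM_BUCKETS)]   # each bucket is a storage location

def bucket_for(key):
    # Hash function: converts the key into a bucket number
    return key % NUM_BUCKETS

def insert(key, value):
    buckets[bucket_for(key)].append((key, value))

def find(key):
    # Only the one bucket chosen by the hash function is searched
    for stored_key, value in buckets[bucket_for(key)]:
        if stored_key == key:
            return value
    return None

insert(101, "Ada")
insert(102, "Bola")
insert(105, "Chidi")   # 105 and 101 hash to the same bucket (a collision)

print(find(105))
```

Keys 101 and 105 land in the same bucket, so that bucket holds two records; the search still looks only inside that single bucket.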
Clustered file organization is a type of file organization where data is stored in a file or table
based on the values of one or more fields, called the clustering key. The clustering key
determines the physical order in which the data is stored on disk, and the data is typically
sorted in ascending or descending order based on the values of the clustering key.
Clustered File Organization
In a clustered file organization, all the records with the same clustering key values are stored
physically close to each other on disk, making it easy and efficient to retrieve them together.
This makes clustered file organization particularly useful for queries that retrieve a range of
values for the clustering key, as the system can read the entire range of data from disk
sequentially.
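A range query over clustered data can be sketched as follows. The list `records` is assumed to be already sorted by the clustering key, standing in for records stored in that order on disk; the names and values are made up.

```python
from bisect import bisect_left, bisect_right

# Records kept in ascending order of the clustering key (first field)
records = [(10, "Ada"), (20, "Bola"), (20, "Chidi"), (30, "Dayo"), (40, "Efe")]
keys = [key for key, _ in records]

def range_query(low, high):
    # Find where the range starts and ends, then read that stretch in one pass
    start = bisect_left(keys, low)
    end = bisect_right(keys, high)
    return records[start:end]

print(range_query(20, 30))
```

Because records with the same or neighbouring key values sit next to each other, the whole range comes back as one continuous slice.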
The choice of file organization depends on various factors, such as the type of data being
stored, the frequency and types of queries performed on the data, and the available
hardware resources. Therefore, choosing the right file organization is critical to the
performance and efficiency of a computer system.
FILE EXTENSION
A file extension is a suffix at the end of a computer file name. It usually consists of two to
four characters and helps identify the file type. Examples include .txt (plain text), .jpg (image),
and .zip (compressed archive).
A file extension, sometimes called a file suffix or a filename extension, is the character or
group of characters after the last period in a file name. Some common file
extensions include PNG, MP4, PDF, MP3, DOC, SVG, INI, DAT, EXE, and LOG.
The file extension helps an operating system, like Windows or macOS, determine which
program on your computer the file is associated with.
For example, the file notes.docx ends in docx, a file extension that's associated
with Microsoft Word on your computer. When you attempt to open this file, Windows sees
that the file ends in a DOCX extension, which it already knows should be opened by Word.
File extensions also often indicate the file type, or file format, of the file, but not always. Any
file's extension can be renamed, but that won't convert the file to another format or change
anything about the file other than this portion of its name.
File extensions and file formats are often spoken about interchangeably. This is usually okay,
but in reality, a file extension is just the characters that appear after the period, while the file
format speaks to the way in which the data in the file is organized.
For example, in the file name data.csv, the file extension is csv, indicating that this is a CSV
file. A computer user could rename that file to data.mp3, but that would not mean the file
could be played as audio on a smartphone. The file itself is still rows of text
(a CSV file), not a compressed musical recording (an MP3 file).
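The renaming example can be tried out with a short Python sketch. The file is created in a temporary folder, and the sample rows are made up for illustration.

```python
import os
import tempfile

# Create a small CSV file in a temporary folder
folder = tempfile.mkdtemp()
csv_path = os.path.join(folder, "data.csv")
with open(csv_path, "w") as f:
    f.write("name,score\nAda,90\n")

# Split the file name into its root and its extension
root, ext = os.path.splitext(csv_path)
print(ext)

# Rename the file so that it ends in .mp3
mp3_path = root + ".mp3"
os.rename(csv_path, mp3_path)

# The contents are unchanged: still rows of text, not audio
with open(mp3_path) as f:
    contents = f.read()
print(contents)
```

Only the name changes. The data inside the file is exactly what was written before the rename.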
INDEX FILE
An index file is created alongside a data file and contains pairs of primary key values and
pointers to corresponding data records. It is used to find records in the file by searching the
index and accessing the data file directly using pointers.
A Primary Index is an ordered file whose records are of fixed length with two fields. The first
field of the index is the primary key of the data file in an ordered manner, and the second
field of the ordered file contains a block pointer that points to the data block where a record
containing the key is available.
Working of Primary Indexing
In primary indexing, the data file is sorted or clustered based on the primary key, as
shown in the figure below.
An index file (also known as the index table) is created alongside the data file.
The index file contains pairs of primary key values and pointers to the corresponding
data records.
Each entry in the index file corresponds to a block or page in the data file.
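The working of a primary index can be sketched in Python as below. The layout is an assumption for illustration: the data file is modelled as a list of blocks sorted by primary key, and each index entry holds the first key of a block together with a pointer (the block number).

```python
from bisect import bisect_right

# Data file: blocks of records sorted by primary key (sample values)
blocks = [
    [(101, "Ada"), (102, "Bola")],
    [(201, "Chidi"), (202, "Dayo")],
    [(301, "Efe"), (302, "Femi")],
]

# Index file: one entry per block -> (first key in the block, block pointer)
index = [(block[0][0], block_no) for block_no, block in enumerate(blocks)]
index_keys = [key for key, _ in index]

def find(key):
    # Search the index for the last entry whose key is <= the search key
    position = bisect_right(index_keys, key) - 1
    if position < 0:
        return None
    _, block_no = index[position]
    # Follow the pointer and search only inside that block
    for record_key, value in blocks[block_no]:
        if record_key == key:
            return value
    return None

print(find(202))
```

The index is searched first, and then only one block of the data file is read.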
CLUSTERED INDEX
A clustered index in SQL is a type of index that determines the physical order of data in a
table. It defines the order in which the rows of a table are stored on disk or other storage
media.
Clustered indexing is a database indexing technique that physically arranges the
data in a table based on the values of the clustered index key. This means that the rows in
the table are stored on disk in the same order as the clustered index key. With a clustered
index, the database can retrieve data more efficiently because it does not have to scan the
entire table to find the data it needs. Instead, it can use the clustered index to quickly locate
the data, resulting in faster query execution times and improved overall performance.
Advantages
Improved Query Performance: A clustered index results in faster query
performance, as the data is stored in a way that makes it easier to retrieve the
desired information. Because the index is built on the clustered data, fewer
disk I/Os are required to retrieve the data.
Reduced Disk Space Usage: A clustered index reduces the amount of disk space
required to store the index. This is because the index contains only the information
necessary to retrieve the data, rather than storing a copy of the data itself.
Improved Data Retrieval: A clustered index can also improve the efficiency of data
retrieval operations. In a clustered index, the data is stored in a logical order, which
makes it easier to locate and retrieve the data. This can result in faster data retrieval
times, particularly for large databases.
Disadvantages
Limited to One Clustered Index: A table can have only one clustered index, as having
multiple clustered indexes would result in conflicting physical orderings of the data.
A nonclustered index in SQL Server is a separate structure from the table that contains a
copy of selected columns from the base table. It’s organized in a way that allows for fast
retrieval of data based on the indexed columns, without having to scan the entire table.
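The notes refer to SQL Server, but the idea of a separate index can be tried out with Python's built-in `sqlite3` module, used here only as a convenient stand-in. The table, the index name `idx_students_name`, and the sample rows are assumptions for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Ada", 90), (2, "Bola", 75), (3, "Chidi", 82)],
)

# A separate index structure built on the name column
conn.execute("CREATE INDEX idx_students_name ON students (name)")

# Ask the database how it plans to run a query that filters on name
query = "SELECT score FROM students WHERE name = ?"
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("Bola",)).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
print(plan_text)

# Run the query itself
score = conn.execute(query, ("Bola",)).fetchone()[0]
print(score)
```

The plan text names the index, which shows that the database looks the name up in the index rather than scanning every row of the table.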