Database
SQL performance tuning is the process of making SQL queries run faster and use fewer
resources on the server side. Well-written queries retrieve the information needed from
the database quickly and with less load on the server.
2. What is the focus of most performance tuning activities, and why does that focus exist?
Most performance tuning activities focus on minimizing the number of input/output (I/O)
operations, because these operations are much slower than reading data directly from the
data cache. Reducing I/O improves the efficiency, speed, and resource utilization of
computer systems, software applications, and databases. This focus exists to enhance user
experience, optimize resource allocation, and maximize the throughput of systems.
Database statistics can be gathered either manually by the Database Administrator (DBA)
or automatically by the Database Management System (DBMS) itself. Database
administrators can gather statistics manually using commands provided by the DBMS. For
example, many DBMS vendors support an ANALYZE command for collecting statistics,
although the exact syntax differs between database platforms. Many DBMSs also offer
built-in mechanisms for automatic collection of database statistics.
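As an illustration of manual statistics gathering, here is a minimal sketch using SQLite through Python's sqlite3 module. SQLite is only one of the products that offer an ANALYZE command, and the EMPLOYEE table, its columns, the index name EMP_AREA_NDX, and the sample data are assumptions made up for this example.

```python
import sqlite3

# Hypothetical table and data, used only to have something to analyze.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (EMP_NUM INTEGER PRIMARY KEY, EMP_AREACODE TEXT)"
)
conn.executemany(
    "INSERT INTO EMPLOYEE (EMP_AREACODE) VALUES (?)",
    [(code,) for code in ["615", "901", "931", "423"] * 250],
)
conn.execute("CREATE INDEX EMP_AREA_NDX ON EMPLOYEE (EMP_AREACODE)")

# Manual statistics gathering: SQLite's form of the ANALYZE command.
conn.execute("ANALYZE")

# SQLite keeps the gathered statistics in the sqlite_stat1 table,
# where the query optimizer can read them.
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
for row in stats:
    print(row)
```

Other products store and expose their statistics differently, so the sqlite_stat1 lookup applies to SQLite only.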
4. If indexes are so important, why not index every column in every table? (Include a brief
discussion of the role played by data sparsity).
Indexing every column in every table may seem like a simple way to improve query
performance, but it has major drawbacks. Although indexes are essential for good
performance, they carry storage, processing, and maintenance costs. There are several
reasons why indexing every column in every table is not common practice; some of them
are listed below:
Index maintenance overhead: Each index requires additional storage space to hold the
indexed key values and the pointers to the corresponding table rows. Index
maintenance involves updating the index every time an insert, update, or delete is
made to the base table. Indexing every column in every table will increase the
overhead associated with indexing operations, slow down data updates, and require
more storage.
Performance: If we create many indexes, the DBMS will take more time to evaluate
and choose between the different access paths available for a query. This leads to
slower query optimization and longer query times.
Storage overhead: Indexes use additional disk space, which can be significant,
especially for tables with many rows and columns. Indexing every column in every
table will increase the amount of index data to be managed, which leads to higher
storage costs and resource usage.
Data sparsity: Data sparsity refers to the number of different values in a column.
Data sparsity helps to determine whether indexing a particular column would be
beneficial or not. Columns with low sparsity, which have a limited number of distinct
values, may not benefit significantly from indexing. For example, a column like
marital status, with only a few distinct values, has low sparsity. On the other
hand, columns with high sparsity, such as an email address column whose values are
highly varied, can benefit from indexing because an index narrows a search to very
few rows. By analyzing the sparsity of each column, administrators can prioritize
indexing those with high sparsity. This also prevents unnecessary indexing of columns
with low sparsity, hence optimizing resource utilization within the database system.
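The sparsity comparison described above can be sketched by counting distinct values per column. This is a minimal illustration using SQLite through Python's sqlite3 module; the CUSTOMER table, its column names, and the generated data are assumptions, not part of the original exercise.

```python
import sqlite3

# Hypothetical table: one low-sparsity column and one high-sparsity column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMER ("
    "CUS_ID INTEGER PRIMARY KEY, MARITAL_STATUS TEXT, EMAIL TEXT)"
)
rows = [
    (["S", "M", "D", "W"][i % 4], f"user{i}@example.com") for i in range(1000)
]
conn.executemany(
    "INSERT INTO CUSTOMER (MARITAL_STATUS, EMAIL) VALUES (?, ?)", rows
)


def sparsity(column):
    """Return (distinct values, distinct/total ratio) for one column.

    The column name comes from the fixed tuple below, never from user input.
    """
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM CUSTOMER"
    ).fetchone()
    return distinct, distinct / total


results = {col: sparsity(col) for col in ("MARITAL_STATUS", "EMAIL")}
for col, (distinct, ratio) in results.items():
    print(col, distinct, ratio)
```

A ratio close to 1 (EMAIL here) marks a column as a reasonable index candidate, while a ratio close to 0 (MARITAL_STATUS) suggests an index would rarely pay for its upkeep.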
5. Most query optimization techniques are designed to make the optimizer’s work
easier. What factors should you keep in mind if you intend to write conditional
expressions in SQL code?
When writing conditional expressions in SQL code, the following points should be
considered to optimize query performance:
If there are multiple AND conditions, we should write the condition most likely to
be false first.
Avoid using the NOT logical operator whenever possible. For example, instead of
using NOT (Quantity > 100), we should use Quantity <= 100.
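The NOT rewrite above can be checked with a small sketch that runs both forms of the condition and compares the results. It uses SQLite through Python's sqlite3 module; the LINE table and its sample rows are assumptions made up for the check.

```python
import sqlite3

# Hypothetical table with a Quantity column, including one NULL value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LINE (LINE_ID INTEGER PRIMARY KEY, Quantity INTEGER)")
conn.executemany(
    "INSERT INTO LINE (Quantity) VALUES (?)",
    [(q,) for q in (5, 100, 101, 250, None)],
)

# Form to avoid: the condition is wrapped in NOT.
with_not = conn.execute(
    "SELECT LINE_ID FROM LINE WHERE NOT (Quantity > 100) ORDER BY LINE_ID"
).fetchall()

# Preferred form: the same condition written without NOT.
without_not = conn.execute(
    "SELECT LINE_ID FROM LINE WHERE Quantity <= 100 ORDER BY LINE_ID"
).fetchall()

print(with_not == without_not)
```

Both queries return the same rows, including for the NULL quantity, which neither condition selects.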
6. What does RAID stand for, and what are some commonly used RAID levels?
RAID stands for Redundant Array of Independent Disks. It is used in computer storage to
combine multiple physical disk drives into a single logical unit for the purpose of data
redundancy, performance enhancement, or both. RAID systems are designed to balance
between performance and fault tolerance by distributing data across multiple disks in
various configurations.
RAID systems support various RAID levels, each of which offers a different
configuration for data redundancy and performance enhancement. Some commonly used
RAID levels are:
RAID 0: It utilizes striping (data blocks are spread over separate drives) without
redundancy. It is used purely for performance improvement.
RAID 1: It utilizes mirroring for data redundancy. Here the same data blocks are written
to separate drives. It provides data redundancy and fault tolerance because each disk in
the array contains an identical copy of the data.
RAID 3: It utilizes striping with dedicated parity, where data is distributed across disks
and parity information is stored on a dedicated disk for fault tolerance. It offers good read
performance for sequential data access but may suffer in write performance due to parity
calculation.
RAID 5: It utilizes striping with distributed parity, providing both performance and fault
tolerance.
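The parity used by RAID 3 and RAID 5 is commonly computed as a bytewise XOR across the data blocks in a stripe. The following is a minimal sketch of that idea, assuming three data blocks of equal length; the block contents and the function name xor_blocks are made up for illustration, and real controllers work on much larger blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks, one per data disk in the stripe.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data_blocks)

# Simulate losing the second disk: XOR the surviving data blocks
# with the parity block to rebuild the missing block.
surviving = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(surviving)

print(rebuilt == data_blocks[1])
```

Because every write must also update the parity block, this calculation is the write overhead mentioned for the parity-based levels.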
SELECT EMP_SEX
FROM EMPLOYEE
WHERE EMP_SEX = 'F' AND EMP_AREACODE = '615'
ORDER BY EMP_LNAME, EMP_FNAME;
Here the query filters the EMP_SEX column for employees of a specific gender
(female). Most likely there are only two distinct values for the EMP_SEX column:
M (male) and F (female). So the data sparsity of the EMP_SEX column would be
low, as this column has only two possible values.
2. What indexes should you create? Write the required SQL commands.
To optimize the query performance for the given SQL statement, we should create the
following indexes:
SQL commands:
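The commands themselves are not given here. One plausible set, following the sparsity reasoning above, is an index on EMP_AREACODE (used in the filter and likely to have many distinct values) and a composite index on EMP_LNAME and EMP_FNAME (the two trailing columns of the query, assumed to be its sort columns), with no index on the low-sparsity EMP_SEX column. The sketch below wraps those two CREATE INDEX statements in a small sqlite3 script so they can be run. The index names EMP_NDX1 and EMP_NDX2, the EMP_NUM key column, and the column types are assumptions.

```python
import sqlite3

# Hypothetical EMPLOYEE table containing the columns used by the query.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE EMPLOYEE (
        EMP_NUM INTEGER PRIMARY KEY,
        EMP_LNAME TEXT,
        EMP_FNAME TEXT,
        EMP_AREACODE TEXT,
        EMP_SEX TEXT)"""
)

# Candidate index commands (index names are made up for this sketch).
index_commands = [
    # Filter column expected to have relatively high sparsity.
    "CREATE INDEX EMP_NDX1 ON EMPLOYEE (EMP_AREACODE)",
    # Composite index on the assumed sort columns.
    "CREATE INDEX EMP_NDX2 ON EMPLOYEE (EMP_LNAME, EMP_FNAME)",
]
for command in index_commands:
    conn.execute(command)

# List the indexes that now exist on the table.
index_names = sorted(
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'EMPLOYEE'"
    )
)
print(index_names)
```

Whether the optimizer actually uses these indexes depends on the DBMS and on the gathered statistics, so the choice is best confirmed against the query plan on the target system.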