Database and SQL Queries
On the other hand, primary keys make it easier to interpret your data
model. By seeing the primary keys of every table in an ENTITY-
RELATIONSHIP DIAGRAM (ERD), the programmer writing a query will
know how to access each table and how to join it with others.
Having a primary key in each table ensures that relationships can be
maintained between tables.
A null value can be applied to columns of any data type (as long as
the column supports null values).
Any SQL operation involving a null value will result in another null
value. The exception is aggregate functions such as SUM() and AVG(),
which simply ignore null values rather than propagating them.
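As a quick illustration, here is a minimal sketch using Python's built-in sqlite3 module as a convenient engine; the payment table and its values are invented for the example:

import sqlite3

# In-memory database with one nullable column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (amount INTEGER)")
conn.executemany("INSERT INTO payment VALUES (?)", [(100,), (None,), (50,)])

# An ordinary expression involving NULL yields NULL...
print(conn.execute("SELECT 100 + NULL").fetchone())                 # (None,)
# ...but the aggregate simply skips the NULL row.
print(conn.execute("SELECT SUM(amount) FROM payment").fetchone())   # (150,)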
#5: Atomicity
Sometimes, you may feel tempted to have a single column for complex
or compound data. For example, the table below stores complete
addresses and full names in single fields:
customer_no customer_name customer_address
Try to divide the information into logical parts; in our example, you
could create separate fields for first name and last name and for
address, city, state, and postal code. This principle is made explicit in
the first normal form, which we’ll discuss next.
#6: Normalization
When designing a schema, you must choose the appropriate data type
for each column of each table. You’ll choose a data type according to
the nature and format of the information expected to be stored in that
column.
If, for example, you are creating a column where telephone numbers
will be stored, you could associate a numeric data type to it (such as
INT) if it will only store numbers. But, if it must also store other
characters – such as parentheses, hyphens, or spaces – the data type
should be VARCHAR.
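The following minimal sketch pulls the two points together: each logical part of the name and address gets its own column, and the phone number is stored as VARCHAR because it may contain more than digits. The table and column names are invented for illustration, and Python's sqlite3 module is used only because it ships with the language.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_no   INTEGER PRIMARY KEY,
        first_name    VARCHAR(50),
        last_name     VARCHAR(50),
        address       VARCHAR(100),
        city          VARCHAR(50),
        state         VARCHAR(50),
        postal_code   VARCHAR(10),
        phone_number  VARCHAR(20)   -- may contain parentheses, hyphens, or spaces
    )
""")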
#8: Indexing
Indexes are data structures that make it easy for the database engine to
locate the row(s) that meet a given search criteria. Each index is
associated with a table and contains the values of one or more columns
of that table. Read our article WHAT IS A DATABASE INDEX? for more
information.
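As a small sketch (continuing the invented customer table from the earlier example), an index on last_name lets the engine locate matching rows without scanning the whole table:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, last_name VARCHAR(50))")

# The index stores the last_name values in a structure the engine can search quickly.
conn.execute("CREATE INDEX idx_customer_last_name ON customer (last_name)")

# Queries that filter on last_name can now use the index instead of a full scan.
rows = conn.execute(
    "SELECT customer_no FROM customer WHERE last_name = ?", ("Smith",)
).fetchall()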
#9: Schema Partitioning
Large schemas are difficult to read and manage when the totality of
their tables exceeds the dimensions of a medium-sized poster or a
couple of screens. At this point, partitioning the schema becomes
necessary so that the schema can be visualized by sections.
There are several things that you, as a designer, can do to minimize the
risks of unauthorized access to information. One of them is to provide
columns that support encrypted or hashed data. String encryption and
hashing techniques alter the length of character strings and the set of
characters that can be allowed. When you’re defining VARCHAR
columns to store data that can be encrypted or hashed, you must take
into account both the maximum length and the range of characters
they can have.
Good design practices for security and user authentication include never
storing passwords, even in encrypted form. Any encrypted data carries the
risk of being decrypted. For this reason, one-way (non-invertible) hash
functions are used to protect passwords: there is no way to use a hash
value to recover the original data. Instead of storing an encrypted
password, only the hash of that password is stored.
A hashed password, even though it does not allow the original password to
be recovered, serves as an authentication mechanism: if the hash of the
password entered during a login session matches the hash stored for the
user trying to log in, the password entered is accepted as correct.
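A minimal sketch of this idea in Python, using the standard-library hashlib module (the salt handling and iteration count are illustrative choices, not a complete authentication scheme):

import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes) -> bytes:
    # One-way hash: the stored value cannot be turned back into the password.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

# At registration: store only the salt and the hash, never the password itself.
salt = os.urandom(16)
stored_hash = hash_password("s3cret", salt)

# At login: hash the entered password and compare it with the stored hash.
entered = "s3cret"
print(hmac.compare_digest(hash_password(entered, salt), stored_hash))   # True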
For the second example, SIN and Course determine the date completed
(DateCompleted). This must also work for a composite PK.
Inference Rules
Armstrong’s axioms are a set of inference rules used to infer all the functional dependencies on a
relational database. They were developed by William W. Armstrong. The following describes what will
be used, in terms of notation, to explain these axioms.
Axiom of reflexivity
If the attributes in Y are a subset of the attributes in X, then X determines Y.
To fix this problem, we need to break the original table down into two
as follows:
Axiom of transitivity
If X determines Y and Y determines Z, then X determines Z.
The table below has information not directly related to the student; for
instance, ProgramID and ProgramName should have a table of their own.
ProgramName is not dependent on StudentNo; it is dependent on
ProgramID.
To fix this problem, we need to break this table into two: one to hold
information about the student and the other to hold information about
the program.
Union
This rule suggests that if two tables are separate, and the PK is the
same, you may want to consider putting them together. It states that if
X determines Y and X determines Z then X must also determine Y and Z
(see Figure 11.4).
Decomposition is the reverse of the Union rule. If you have a table that
appears to contain two entities that are determined by the same PK,
consider breaking them up into two tables. This rule states that if X
determines Y and Z, then X determines Y and X determines Z separately
(see Figure 11.5).
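Written in functional-dependency notation (X, Y, and Z are sets of attributes; augmentation is listed as well, since it belongs to the standard set of Armstrong's axioms even though it is not discussed above):

\text{Reflexivity: } Y \subseteq X \implies X \to Y
\text{Augmentation: } X \to Y \implies XZ \to YZ
\text{Transitivity: } X \to Y \text{ and } Y \to Z \implies X \to Z
\text{Union: } X \to Y \text{ and } X \to Z \implies X \to YZ
\text{Decomposition: } X \to YZ \implies X \to Y \text{ and } X \to Z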
Dependency Diagram
Normalization
Min-Max Normalization
Min-max normalization rescales each value into a fixed range (typically 0
to 1) using the column's minimum and maximum; because those extremes set
the scale, it is sensitive to outliers.
Z-Score Normalization
Z-score normalization is a strategy of normalizing data that avoids
this outlier issue by expressing each value as the number of standard
deviations it lies from the column's mean.
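In formulas, with x a raw value, x_min and x_max the column's minimum and maximum, and μ and σ its mean and standard deviation:

x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad\qquad z = \frac{x - \mu}{\sigma}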
Third Normal Form (3NF): 3NF builds on 2NF by requiring that all
non-key attributes are independent of each other. This means
that each column should be directly related to the primary key,
and not to any other columns in the same table.
Integrity Constraints
o Integrity constraints are a set of rules used to maintain the quality of information.
o Integrity constraints ensure that data insertion, updating, and other processes are
performed in such a way that data integrity is not affected.
o Thus, integrity constraints guard against accidental damage to the database.
1. Domain constraints
o Domain constraints define the valid set of values for an attribute.
Example:
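A minimal sketch of a domain constraint, assuming a hypothetical student table in which semester may only take the values 1 through 8 (Python's sqlite3 module is used only as a convenient engine):

import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK clause defines the valid set of values for the semester column.
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        semester   INTEGER CHECK (semester BETWEEN 1 AND 8)
    )
""")
conn.execute("INSERT INTO student VALUES (1, 3)")     # accepted
# conn.execute("INSERT INTO student VALUES (2, 12)")  # rejected: outside the valid domain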
2. Key constraints
o Keys are the attributes used to identify an entity uniquely within its
entity set.
o An entity set can have multiple keys, but one of them is chosen as the
primary key. A primary key must contain unique values and cannot
contain null values in the relational table.
Example:
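A minimal sketch of a key constraint on the same hypothetical student table: the primary key must be unique, so inserting a second row with the same student_id is rejected:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER NOT NULL PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Ana')")
# The next statement would raise sqlite3.IntegrityError, because the key value 1 already exists.
# conn.execute("INSERT INTO student VALUES (1, 'Bob')")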
What is SQL?
Structured query language (SQL) is a programming language for storing
and processing information in a relational database. A relational
database stores information in tabular form, with rows and columns
representing different data attributes and the various relationships
between the data values. You can use SQL statements to store, update,
remove, search, and retrieve information from the database. You can
also use SQL to maintain and optimize database performance.
SQL table
A SQL table is the basic element of a relational database. The SQL
database table consists of rows and columns. Database engineers
create relationships between multiple database tables to optimize data
storage space.
For example, the database engineer creates a SQL table for products in
a store:
Then the database engineer links the product table to the color table
with the Color ID:
Color ID   Color name
Color 1    Blue
Color 2    Red
SQL statements
For example, the following SQL statement uses a SQL INSERT command
to store Mattress Brand A, priced $499, into a table
named Mattress_table, with column names brand_name and cost:
INSERT INTO Mattress_table (brand_name, cost)
VALUES ('A', '499');
Stored procedures
Parser
Correctness
The parser verifies that the SQL statement conforms to SQL semantics,
or rules, that ensure the correctness of the query statement. For
example, the parser checks if the SQL command ends with a semi-
colon. If the semi-colon is missing, the parser returns an error.
Authorization
The parser also validates that the user running the query has the
necessary authorization to manipulate the respective data. For
example, only admin users might have the right to delete data.
Relational engine
Storage engine
Data definition language (DDL) refers to SQL commands that design the
database structure. Database engineers use DDL to create and modify
database objects based on the business requirements. For example, the
database engineer uses the CREATE command to create database
objects such as tables, views, and indexes.
What is MySQL?
What is NoSQL?
DATABASE ADMINISTRATION
Physical implementation of the data
The physical model describes the database in a specific working
environment that includes a specific database product, a specific
hardware and network configuration, and a specific level of data
update and retrieval activity. The physical implementation makes this
specification real. The implemented database contains objects (e.g.,
tables, views, indexes) that correspond to the objects in your physical
model.
Step 1. Select a server. The Data Use Analysis and Data Volume Analysis
models define the guidelines for choosing a server with adequate CPU
power and enough hard disk capacity to see you through the first few
years of operation. The Data Use Analysis model lets you visually map
the important processes that run on a database. You then calculate
average and maximum read and write operations. From this analysis,
you can see the kind of processing power you need, then translate that
into the CPU model and the RAM you need for prime performance.
Step 2. Create a database. You can use the Data Volume Analysis
model again to guide you in sizing the user data and transaction log file.
This model gives you a rough idea of space requirements, which you
can translate into initial database file sizes.
Step 3. Create the database objects. You have two options for creating
the tables, indexes, views, constraints, stored procedures, and triggers
that make up an operational database. First, you can use the physical
data model to guide you in writing SQL scripts that you can later
execute, or you can create the objects directly by using Enterprise
Manager's graphical and programming interface. Second, if you've used
CASE software such as Visio 2000 to help with the modeling, you can let
the CASE software generate the scripts for you.
Step 4. Load the data. How should you approach loading data into the
database? The answer depends on where the data is coming from (the
source) and how much data you need to load. If you don't have any
data to load when you first create the database (a highly unusual
situation), you need to concentrate only on the data-capture schemes
you plan to implement, such as data entry forms or automated capture
programs like those used in monitored environments such as
manufacturing sites and hospital intensive care units. Most likely, you'll
have to import data from comma-delimited flat files or transfer data
from other systems into your database. If you plan to import delimited
files, the bulk copy program (bcp) might be your best option. Bcp
creates minimal overhead and can quickly load data because it doesn't
generate a transaction log, so you don't have to worry about
transaction rollbacks, index updating, or constraint checking. But if you
need to import or transform (reorganize, restructure) data from other
database or nondatabase systems, you should use SQL Server's Data
Transformation Services.
Step 5. Create and run security scripts. Creating security scripts is,
unfortunately, a task that you have to perform manually. You can use
the security matrix from "The Security Matrix," March 2000, as a guide
to building your SQL Server security scripts. You can also set up security
through Enterprise Manager, then have Enterprise Manager generate
the scripts.
Step 6. Create backup and other utility tasks. Now that you've created
your database and loaded some data, you need to implement your
disaster-avoidance plan. You can use the SQL Server 7.0 Database
Maintenance Plan Wizard to help set up scheduled backups for all user
and system databases. You can also use the Maintenance Plan wizard
to help set up tasks to reorganize data and index pages, update
statistics, recover unused space from database files, check for database
integrity, and generate reports about utility job activity.
An index file is much smaller than the data file, and therefore searching
the index using a binary search can be carried out quickly. Multilevel
indexing goes one step further in the sense that it eliminates the need
for a binary search, by building indexes to the index itself. We will be
discussing these techniques later on in the chapter.
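A minimal sketch of the single-level index idea (the keys and block numbers are invented): the index is a small, sorted list of (first key in block, block address) pairs, so a binary search over the index finds which block of the data file has to be read:

import bisect

# Hypothetical sparse index: the first key stored in each block, and the block's address.
index_keys   = [100, 250, 480, 733]   # sorted
index_blocks = [0, 1, 2, 3]

def find_block(key):
    # Binary search over the small index instead of scanning the data file.
    i = bisect.bisect_right(index_keys, key) - 1
    return index_blocks[i] if i >= 0 else None

print(find_block(300))   # 1 -> block 1 holds the keys from 250 up to (but not including) 480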
Although the records may be of the same type, one or more of the fields
may be of varying length. For instance, students' names are of
different lengths.
The records are of the same type, but one or more of the fields
may be a repeating field with multiple values.
If one or more fields are optional, not all records (of the same
type) will have values for them.
File headers
To search for a record on disk, one or more blocks are transferred into
main memory buffers. Programs then search for the desired record or
records within the buffers, using the header information.
If the address of the block that contains the desired record is not
known, the programs have to carry out a linear search through the
blocks. Each block is loaded into a buffer and checked until either the
record is found or all the blocks have been searched unsuccessfully
(which means the required record is not in the file). This can be very
time-consuming for a large file. The goal of a good file organisation is to
locate the block that contains a desired record with a minimum number
of block transfers.
Operations on files
Find (or Locate): Searches for the first record satisfying a search
condition (a condition specifying the criteria that the desired
records must satisfy). Transfers the block containing that record
into a buffer (if it is not already in main memory). The record is
located in the buffer and becomes the current record (ready to be
processed).
Read (or Get): Copies the current record from the buffer to a
program variable. This command may also advance the current
record pointer to the next record in the file.
FindNext: Searches for the next record in the file that satisfies the
search condition. Transfers the block containing that record into a
buffer, and the record becomes the current record.
Delete: Deletes the current record and updates the file on disk to
reflect the change requested.
Modify: Modifies some field values for the current record and
updates the file on disk to reflect the modification.
Insert: Inserts a new record in the file by locating the block where
the record is to be inserted, transferring that block into a buffer,
writing the (new) record into the buffer, and writing the buffer to
the disk file to reflect the insertion.
FindAll: Locates all the records in the file that satisfy a search
condition.
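The following minimal Python sketch (an in-memory stand-in for the buffer and block machinery, with invented record fields) shows how a program might use Find, Read, and FindNext style operations:

class RecordFile:
    """Toy record-at-a-time file: records are dicts and 'current' tracks the current record."""

    def __init__(self, records):
        self.records = list(records)
        self.current = -1

    def find(self, condition):
        # Find: locate the first record satisfying the search condition.
        for i, record in enumerate(self.records):
            if condition(record):
                self.current = i
                return record
        return None

    def find_next(self, condition):
        # FindNext: locate the next record, after the current one, satisfying the condition.
        for i in range(self.current + 1, len(self.records)):
            if condition(self.records[i]):
                self.current = i
                return self.records[i]
        return None

    def read(self):
        # Read: copy the current record into a program variable.
        return None if self.current < 0 else dict(self.records[self.current])

students = RecordFile([{"name": "Ali", "year": 2}, {"name": "Bo", "year": 3}, {"name": "Cy", "year": 3}])
first_match = students.find(lambda r: r["year"] == 3)        # Bo becomes the current record
next_match  = students.find_next(lambda r: r["year"] == 3)   # Cy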
Lock-Based Protocols
Two Phase Locking Protocol
Timestamp-Based Protocols
Validation-Based Protocols
Lock-based Protocols
A lock-based protocol is a mechanism in which a transaction cannot read
or write a data item until it acquires an appropriate lock on it.
Lock-based protocols help to eliminate the concurrency problems that
arise between simultaneous transactions by preventing conflicting
operations from running on the same data item at the same time.
A lock is a data variable associated with a data item. The lock
signifies which operations can be performed on the data item.
Locks in DBMS help synchronize access to the database items by
concurrent transactions.
A shared lock is also called a read-only lock. With a shared lock, the
data item can be shared between transactions, because a transaction
holding only a shared lock is never permitted to update the data item.
For example, consider a case where two transactions are reading the
account balance of a person. The database lets them both read by
placing a shared lock on the balance. However, if another transaction
wants to update that account's balance, the shared lock prevents the
update until the reading process is over.
With an exclusive lock, a data item can be both read and written. An
exclusive lock cannot be held concurrently with any other lock on the
same data item. An exclusive lock (X-lock) is requested using the
lock-X instruction, and a transaction may unlock the data item after
finishing the write operation.
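A minimal, single-process sketch of a lock table that distinguishes shared and exclusive locks (this only illustrates the compatibility rules, not how a real DBMS lock manager is built; transaction IDs are plain strings):

class LockTable:
    def __init__(self):
        self.shared = {}      # data item -> set of transactions holding shared (S) locks
        self.exclusive = {}   # data item -> transaction holding the exclusive (X) lock

    def lock_s(self, txn, item):
        # A shared (read) lock is granted unless another transaction holds an X lock.
        if item in self.exclusive and self.exclusive[item] != txn:
            return False
        self.shared.setdefault(item, set()).add(txn)
        return True

    def lock_x(self, txn, item):
        # An exclusive (write) lock needs the item to be free of other S and X locks.
        other_readers = self.shared.get(item, set()) - {txn}
        other_writer = item in self.exclusive and self.exclusive[item] != txn
        if other_readers or other_writer:
            return False
        self.exclusive[item] = txn
        return True

locks = LockTable()
print(locks.lock_s("T1", "balance"))   # True  - T1 reads the balance
print(locks.lock_s("T2", "balance"))   # True  - T2 may read it at the same time
print(locks.lock_x("T3", "balance"))   # False - the update must wait until the readers finish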
4. Pre-claiming Locking
Starvation
Database security is the processes, tools, and controls that secure and
protect databases against accidental and intentional threats. The
objective of database security is to secure sensitive data and maintain
the confidentiality, availability, and integrity of the database. In
addition to protecting the data within the database, database security
protects the database management system and associated
applications, systems, physical and virtual servers, and network
infrastructure.
Network security
Access management
Threat protection
Information protection
Procedure
2. In the list of databases, click the database for which you want to
set configuration parameters.
5. Click Apply.
1. You can click Restore to original values at any time to restore all
of the listed parameters to their default values.
After the explosion of the internet, though, it became clear that there
were limitations to the traditional relational database. In particular, it
wasn’t easy to scale, it wasn’t built to function well in cloud
environments, and distributing it across multiple instances
required complex, manual work called sharding.
Active-passive
Active-active
Multi-active
Multi-active is the system for availability used by CockroachDB, which
attempts to offer a better alternative to active-passive and active-active
configurations.
Like active-active configurations, all replicas can handle both reads and
writes in a multi-active system. But unlike active-active, multi-active
systems eliminate the possibility of inconsistencies by using a
consensus replication system, where writes are only committed when a
majority of replicas confirm they’ve received the write.
There are multiple types of database audits, including, but not limited
to, the following:
Data Auditing - A data audit monitors and logs data access and
modifications. It allows you to trace who accessed the data and
what changes were made, including identifying individuals
responsible for adding, modifying, or deleting data. It also enables
tracking of when these changes are made.
A database audit can also help with business continuity by making sure
the database is available and accessible at all times. In addition, should
an issue occur where a database becomes corrupt or attacked, a
database audit can ensure that a disaster recovery plan is in place.
Analyze data on user login attempts and review access control settings,
including authentication methods.
Once your changes and updates are made, monitor the database
carefully to ensure no additional issues are discovered. As a best
practice, performing a database audit after making changes and
updates ensures the database is running properly.
The process of finding the desired information from the set of items
stored in the form of elements in the computer memory is referred to
as ‘searching in data structure’. These sets of items are in various
forms, such as an array, tree, graph, or linked list. Another way of
defining searching in the data structure is by locating the desired
element of specific characteristics in a collection of items.
Sequential search
Interval Search
Let’s get detailed insight into the linear search and binary search in the
data structure.
Linear Search
The linear search algorithm examines the elements of the array
sequentially. Its best-case execution time is one comparison, whereas
the worst case is n comparisons, where n is the total number of items
in the search array.
It is the simplest search algorithm in data structures: it checks each
item in the collection until it either matches the search element or
reaches the end of the data. When the data is unsorted, a linear search
algorithm is preferred.
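A minimal sketch of linear search in Python (returning the index of the first match, or -1 if the element is not present):

def linear_search(items, target):
    # Check each element in turn until the target is found or the list ends.
    for i, value in enumerate(items):
        if value == target:
            return i      # best case: found at the first position (1 comparison)
    return -1             # worst case: all n elements were checked

print(linear_search([7, 3, 9, 3], 9))   # 2
print(linear_search([7, 3, 9, 3], 5))   # -1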
Binary Search
Binary search works only on sorted data: it repeatedly compares the
search element with the middle item of the remaining range and discards
the half that cannot contain it, so roughly log2(n) comparisons are
needed in the worst case.
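A sketch of the standard iterative version, assuming the input list is already sorted:

def binary_search(sorted_items, target):
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid                # found
        if sorted_items[mid] < target:
            low = mid + 1             # discard the left half
        else:
            high = mid - 1            # discard the right half
    return -1                         # not present

print(binary_search([2, 5, 8, 12, 21], 12))   # 3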
Interpolation Search
It is an improved variant of the binary search algorithm and works on
the search element’s probing position. Similar to binary search
algorithms, it works efficiently only on sorted data collection.
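The usual probe-position estimate, for a sorted array arr and the current search range [lo, hi], is:

\text{pos} = lo + \left\lfloor \frac{(\text{key} - arr[lo]) \,(hi - lo)}{arr[hi] - arr[lo]} \right\rfloor

For keys that are roughly uniformly distributed, this brings the average number of probes down to about O(log log n).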
Hashing
Understanding Hashing
Hashing is essentially the same as a secret code for data. An input (or
key) is passed to a hash function, which converts it into a fixed-length
string of characters—typically a combination of integers and letters.
The generated hash is then used to search data structures, usually an
array, as an index or address to find the corresponding data.
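A minimal sketch of that mechanism: a fixed-size array of buckets, where the hash of the key (taken modulo the array size) chooses the index, and collisions are handled by chaining. Real hash tables also resize and use more careful collision policies; this only illustrates the address calculation.

class HashTable:
    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]   # array of chains

    def _index(self, key):
        # The hash function maps the key to an index (address) in the array.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value          # key already present: update it
                return
        bucket.append([key, value])      # otherwise append to the chain (handles collisions)

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

table = HashTable()
table.put("alice", 42)
print(table.get("alice"))   # 42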
Compromises in hashing
Hashing has trade-offs, even if it has constant-time access appeal. The
quality of the hash function determines how efficient hashing is; poorly
constructed methods can increase collisions and reduce performance.
Furthermore, overly complicated hash functions could introduce
computational costs.
Some dichotomic searches only have results at the leaves of the tree,
such as the Huffman tree used in Huffman coding, or the
implicit classification tree used in Twenty Questions. Other dichotomic
searches also have results in at least some internal nodes of the tree,
such as a dichotomic search table for Morse code. There is thus some
looseness in the definition. Though there may indeed be only two paths
from any node, there are thus three possibilities at each step: choose
one onward path or the other, or stop at this node.
Sorting Algorithms
Computational complexity
Stability
Bubble Sort
Selection Sort
Insertion Sort
Merge Sort
Quick Sort
Heap Sort
Radix Sort
Bubble Sort, as the name suggests, repeatedly steps through the list,
compares each pair of adjacent items and swaps them if they are in the
wrong order. The pass through the list is repeated until the list is
sorted.
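A sketch of bubble sort in Python, with the common early-exit optimization when a pass makes no swaps:

def bubble_sort(items):
    data = list(items)
    for end in range(len(data) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if data[i] > data[i + 1]:                        # adjacent pair in the wrong order
                data[i], data[i + 1] = data[i + 1], data[i]  # swap them
                swapped = True
        if not swapped:                                      # no swaps: the list is already sorted
            break
    return data

print(bubble_sort([5, 1, 4, 2, 8]))   # [1, 2, 4, 5, 8]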
Selection Sort
Insertion Sort
Search Algorithms