Advanced Databases and Mining
Query language
Source: Nebula Graph
Early query languages were extremely complicated, so only specially trained individuals could interact with electronic databases. Interfaces have since become far more user-friendly, making it possible for casual users to access database information.
The most popular query modes are the menu, the “fill-in-the-blank” technique, and the structured query. The menu is the best option for novices, as it only requires them to pick from a range of alternatives displayed on the screen. The fill-in-the-blank technique prompts the user to enter keywords as search statements. The structured query approach is very effective on relational databases. It has a formal, powerful syntax that is in fact a programming language, and it can accommodate logical operators. One implementation of the structured query approach, the Structured Query Language (SQL), has the form:
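As a minimal sketch of that form, a structured query combines SELECT, FROM, and WHERE clauses, with logical operators such as AND joining the conditions. The `employees` table, its columns, and its rows below are hypothetical and exist only for illustration; the query is run through Python's built-in `sqlite3` module against an in-memory database.

```python
import sqlite3

# Hypothetical table and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ada", "R&D", 120), ("Grace", "R&D", 95), ("Linus", "Ops", 110)],
)

# General shape: SELECT <fields> FROM <table> WHERE <condition> AND <condition>
query = """
    SELECT name, salary
    FROM employees
    WHERE dept = 'R&D' AND salary > 100
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Ada', 120)]
```

Only the row that satisfies both conditions is returned, which is what the logical operator AND expresses.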
Normalization Techniques at a Glance
Four common normalization
techniques may be useful:
scaling to a range
clipping
log scaling
z-score
The following charts show the effect of each normalization technique on the distribution of a raw feature (price); the raw distribution is shown on the left. The charts are based on the data set from the 1985 Ward's Automotive Yearbook, which is part of the UCI Machine Learning Repository as the Automobile Data Set.
Scaling to a range
Recall from MLCC that scaling means converting floating-point feature values from their natural range (for example, 100 to 900) into a standard range, usually 0 to 1 (or sometimes -1 to +1). Use the following simple formula to scale to a range:
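The usual min-max form of this formula is x' = (x - x_min) / (x_max - x_min), which maps the smallest value to 0 and the largest to 1. A minimal Python sketch follows; the function name `scale_to_range` and the sample values are assumptions chosen for illustration, and the optional `new_min`/`new_max` parameters generalize the formula to ranges such as -1 to +1.

```python
def scale_to_range(values, new_min=0.0, new_max=1.0):
    """Min-max scale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: the formula would divide by zero.
        raise ValueError("cannot scale a constant feature")
    span = new_max - new_min
    return [new_min + (v - lo) * span / (hi - lo) for v in values]


# Sample feature values in the natural range 100 to 900.
scaled = scale_to_range([100, 300, 500, 900])
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

Because the result depends on the observed minimum and maximum, a single extreme outlier can squeeze the remaining values into a small part of the target range.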
Figure 2. Comparing a raw distribution and its clipped version. In the raw distribution, nearly all values fall within the range 1 to 4, but a small percentage of values lie between 5 and 55. In the clipped distribution, all values originally above 4 now have the value 4.
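The capping described in the figure can be sketched in a few lines of Python. The helper name `clip` and the sample values are assumptions for illustration; the threshold of 4 mirrors the figure description.

```python
def clip(values, lower=None, upper=None):
    """Cap each value at the given bounds; a bound of None is ignored."""
    clipped = []
    for v in values:
        if upper is not None and v > upper:
            v = upper
        if lower is not None and v < lower:
            v = lower
        clipped.append(v)
    return clipped


# Mostly small values plus a few extreme ones.
raw = [1, 2, 3, 4, 5, 20, 55]
capped = clip(raw, upper=4)
print(capped)  # [1, 2, 3, 4, 4, 4, 4]
```

Values inside the bounds are left untouched; only the extremes are pulled back to the threshold.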
Log Scaling
Log scaling computes the log of
your values to compress a wide
range to a narrow range.
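A minimal sketch of log scaling, x' = log(x), is shown below. The function name `log_scale`, the choice of the natural logarithm, and the sample values are assumptions for illustration; the logarithm is only defined for positive inputs.

```python
import math


def log_scale(values):
    """Apply the natural log to each value (all values must be > 0)."""
    if any(v <= 0 for v in values):
        raise ValueError("log scaling requires strictly positive values")
    return [math.log(v) for v in values]


# Sample values spanning four orders of magnitude.
raw = [1, 10, 100, 1000, 10000]
scaled = log_scale(raw)
print([round(s, 2) for s in scaled])
```

The raw values span a range of nearly 10,000, while the scaled values span less than 10, and their ordering is preserved.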
Z-Score
Z-score is a variation of scaling that represents each value as the number of standard deviations it lies away from the mean. You would use z-score to ensure your feature distributions have mean = 0 and std = 1. It’s useful when there are a few outliers, but not so extreme that you need clipping.
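A z-score is computed as x' = (x - mean) / std. The sketch below uses Python's `statistics` module with the population standard deviation; the function name `z_scores` and the sample values are assumptions for illustration.

```python
import statistics


def z_scores(values):
    """Return (x - mean) / std for each x, using the population std."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("z-score is undefined for a constant feature")
    return [(v - mean) / std for v in values]


# Sample values with mean 5 and population standard deviation 2.
z = z_scores([2, 4, 4, 4, 5, 5, 7, 9])
print(z)
```

After the transformation the values have mean 0 and standard deviation 1, so a result of 2.0 reads directly as "two standard deviations above the mean".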
What is an “Isolation Level”?
Types of OLAP systems
OLAP systems typically fall into
one of three types:
Multidimensional OLAP
(MOLAP) is OLAP that indexes
directly into a multidimensional
database.
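To illustrate what "indexing directly into a multidimensional database" can mean, the sketch below stores a pre-aggregated measure in a small three-dimensional array addressed by dimension coordinates. The dimensions, member names, and sales figures are hypothetical, and real MOLAP engines use specialized storage; this is only a toy model of the idea.

```python
# Hypothetical dimensions and sales figures, invented for illustration.
products = ["widget", "gadget"]
regions = ["north", "south"]
quarters = ["Q1", "Q2"]

# Map each dimension member to its position along that axis.
p_idx = {name: i for i, name in enumerate(products)}
r_idx = {name: i for i, name in enumerate(regions)}
q_idx = {name: i for i, name in enumerate(quarters)}

# cube[product][region][quarter] holds the sales measure for that cell.
cube = [
    [[10, 12], [7, 9]],    # widget: north (Q1, Q2), south (Q1, Q2)
    [[20, 18], [15, 11]],  # gadget: north (Q1, Q2), south (Q1, Q2)
]


def lookup(product, region, quarter):
    """Fetch one cell by indexing directly with dimension coordinates."""
    return cube[p_idx[product]][r_idx[region]][q_idx[quarter]]


def total_for_product(product):
    """Roll up one product across all regions and quarters."""
    return sum(sum(row) for row in cube[p_idx[product]])


print(lookup("gadget", "south", "Q1"))  # 15
print(total_for_product("widget"))      # 38
```

A cell lookup needs no join or scan: the coordinates translate straight into array positions.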
SQL Injection
SQL injection is a code injection
technique attackers use to gain
unauthorized access to a
database by injecting malicious
SQL commands into web page
inputs.
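The sketch below shows how such an attack can work and one common defense, using Python's built-in `sqlite3` module. The `users` table, the credentials, and the login check are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical users table, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

# Input an attacker might type into a password field.
malicious = "' OR '1'='1"

# Vulnerable: the input is pasted into the SQL text, so the quote
# characters change the structure of the WHERE clause.
unsafe_sql = (
    "SELECT username FROM users "
    f"WHERE username = 'alice' AND password = '{malicious}'"
)
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: placeholders pass the input as data, never as SQL syntax.
safe_rows = conn.execute(
    "SELECT username FROM users WHERE username = ? AND password = ?",
    ("alice", malicious),
).fetchall()

print(unsafe_rows)  # [('alice',)]  login check bypassed
print(safe_rows)    # []            no matching user
```

In the vulnerable query the condition becomes `... AND password = '' OR '1'='1'`, which is true for every row. With placeholders, the whole input is compared as a literal password string and matches nothing.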