ETL Interview Prep
CTE stands for common table expression. A CTE is a temporary named result set that you can reference within a SELECT, INSERT, UPDATE, DELETE, or MERGE statement.
A CTE is defined within the execution scope of a single statement and lasts only for the duration of that statement. In other words, a CTE is like a view, except that it is not stored as an object in the database and is not available to later statements.
In practice, SQL Server does not store a CTE's rows separately: the optimizer expands the CTE definition into the statement that references it, much as it does with a view.
CTEs can simplify complex queries by breaking them down into smaller, more manageable pieces. For example, say you have a query that retrieves data from three different tables. With CTEs, you can define the logic for each table as its own CTE and then reference those CTEs in your main query. This can make your code more readable and easier to maintain.
In SQL Server, you create a CTE with the WITH clause. CREATE TABLE does not create a CTE; it creates a permanent or temporary table, which persists beyond a single statement.
CTEs also reduce repetition within a statement: once defined, a CTE can be referenced several times in the same statement. A CTE cannot be reused across separate queries, however, because it goes out of scope as soon as its statement finishes; for reusable logic, use a view or an inline table-valued function instead.
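The following is a minimal sketch of the WITH syntax for SQL Server. The table variable @orders and its columns are hypothetical sample data, not part of any schema in this document. It defines two CTEs, where the second reads from the first, and the outer SELECT references both.

```sql
-- Hypothetical sample data so the sketch runs on any SQL Server instance.
DECLARE @orders TABLE (
    order_id    INT,
    customer_id INT,
    amount      DECIMAL(10, 2)
);

INSERT INTO @orders (order_id, customer_id, amount)
VALUES (1, 100, 50.00), (2, 100, 70.00), (3, 200, 20.00), (4, 300, 90.00);
-- The semicolon above matters: a statement that precedes WITH must be terminated.

WITH customer_totals AS (
    -- First CTE: total amount per customer.
    SELECT customer_id, SUM(amount) AS total_amount
    FROM @orders
    GROUP BY customer_id
),
overall AS (
    -- Second CTE: builds on the first one.
    SELECT AVG(total_amount) AS avg_total
    FROM customer_totals
)
-- The outer query references both CTEs; they go out of scope after it runs.
SELECT c.customer_id, c.total_amount
FROM customer_totals AS c
CROSS JOIN overall AS o
WHERE c.total_amount > o.avg_total
ORDER BY c.customer_id;
```

With this sample data the per-customer totals are 120.00, 20.00, and 90.00, their average is about 76.67, and the query returns customers 100 and 300.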
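OFFSET and FETCH are options of the ORDER BY clause that limit the number of rows a query returns:
ORDER BY column_list [ASC | DESC]
OFFSET offset_row_count {ROW | ROWS}
FETCH {FIRST | NEXT} fetch_row_count {ROW | ROWS} ONLY
In this syntax, OFFSET specifies the number of rows to skip before the query starts returning rows, and FETCH specifies the number of rows to return after OFFSET has been processed. FETCH is optional, but it requires OFFSET, and both require ORDER BY.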
To skip the first 10 products and return the rest, you use the OFFSET clause as shown in
the following statement:
SELECT
product_name,
list_price
FROM
production.products
ORDER BY
list_price,
product_name
OFFSET 10 ROWS;
To skip the first 10 products and select the next 10 products, you use
both OFFSET and FETCH clauses as follows:
SELECT
product_name,
list_price
FROM
production.products
ORDER BY
list_price,
product_name
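OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY;
To get the top 10 most expensive products, you also use both the OFFSET and FETCH clauses: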
SELECT
product_name,
list_price
FROM
production.products
ORDER BY
list_price DESC,
product_name
OFFSET 0 ROWS
FETCH FIRST 10 ROWS ONLY;
In this example, the ORDER BY clause sorts the products by their list prices in descending
order. Then, the OFFSET clause skips zero rows and the FETCH clause fetches the first 10
products from the list.
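OFFSET and FETCH are commonly used for paging. The sketch below, assuming SQL Server 2012 or later, computes the number of rows to skip from a page number. The variables @page_number and @page_size and the table variable @products are hypothetical stand-ins for production.products.

```sql
-- Hypothetical sample data standing in for production.products.
DECLARE @products TABLE (product_name VARCHAR(50), list_price DECIMAL(10, 2));

INSERT INTO @products (product_name, list_price)
VALUES ('Bike A', 100.00), ('Bike B', 200.00), ('Bike C', 300.00),
       ('Bike D', 400.00), ('Bike E', 500.00), ('Bike F', 600.00),
       ('Bike G', 700.00);

-- Hypothetical paging parameters; page numbers start at 1.
DECLARE @page_number INT = 2;
DECLARE @page_size   INT = 3;

SELECT product_name, list_price
FROM @products
ORDER BY list_price, product_name             -- unique sort order keeps pages stable
OFFSET (@page_number - 1) * @page_size ROWS   -- OFFSET accepts an expression
FETCH NEXT @page_size ROWS ONLY;
```

With @page_number = 2 and @page_size = 3, the query skips three rows and returns Bike D, Bike E, and Bike F. Sorting on a unique combination of columns makes the order deterministic, so each row appears on exactly one page.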
The SELECT TOP clause allows you to limit the number of rows or percentage of rows
returned in a query result set.
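SELECT TOP (expression) [PERCENT]
[WITH TIES]
select_list
FROM
table_name
ORDER BY
column_name;
In this syntax:
expression: following the TOP keyword is an expression that specifies the number of rows to be returned. The expression is evaluated to a float value if PERCENT is used; otherwise, it is converted to a BIGINT value.
PERCENT: indicates that the query returns the first N percent of rows, where N is the result of the expression.
WITH TIES: returns additional rows whose ORDER BY values match those of the last row in the limited result set. Note that WITH TIES may cause more rows to be returned than you specify in the expression, and it can only be used together with ORDER BY.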
For example, if you want to return the most expensive product, you can use TOP 1.
However, if two or more products have the same price as the most expensive product,
TOP 1 returns only one of them and you miss the others.
To avoid this, you can use TOP 1 WITH TIES, which returns every product whose price
equals the highest price, not just the first one.
SQL Server SELECT TOP examples
The following example uses a constant value to return the top 10 most expensive
products.
SELECT TOP 10
product_name,
list_price
FROM
production.products
ORDER BY
list_price DESC;
The following example uses PERCENT to specify the number of products returned in the
result set. The production.products table has 321 rows, so one percent of 321 is a
fractional value (3.21). SQL Server rounds it up to the next whole number, which is four (4)
in this case.
SELECT TOP 1 PERCENT
product_name,
list_price
FROM
production.products
ORDER BY
list_price DESC;
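To see the rounding rule on a smaller scale, the sketch below applies TOP (25) PERCENT to ten hypothetical rows supplied with a table value constructor (VALUES), so it does not depend on production.products. Twenty-five percent of 10 rows is 2.5, which SQL Server rounds up to 3.

```sql
-- Ten hypothetical products supplied inline; no base table is needed.
SELECT TOP (25) PERCENT
    v.product_name,
    v.list_price
FROM (VALUES
    ('P1', 10.00), ('P2', 20.00), ('P3', 30.00), ('P4', 40.00), ('P5', 50.00),
    ('P6', 60.00), ('P7', 70.00), ('P8', 80.00), ('P9', 90.00), ('P10', 100.00)
) AS v (product_name, list_price)
ORDER BY v.list_price DESC;  -- 25% of 10 rows = 2.5, rounded up to 3 rows
```

The query returns P10, P9, and P8.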
Using TOP WITH TIES to include rows that match the values in the last row
The following statement returns the top three most expensive products, plus any products tied with the third one:
SELECT TOP 3 WITH TIES
product_name,
list_price
FROM
production.products
ORDER BY
list_price DESC;
In this example, the third most expensive product has a list price of 6499.99. Because the
statement uses TOP WITH TIES, it also returns three more products whose list prices are the
same as the third one.
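A small self-contained illustration of WITH TIES, using hypothetical rows in a VALUES list: two products share the second-highest price, so TOP (2) WITH TIES returns three rows.

```sql
-- Hypothetical prices: P2 and P3 tie for second place at 80.00.
SELECT TOP (2) WITH TIES
    v.product_name,
    v.list_price
FROM (VALUES
    ('P1', 100.00), ('P2', 80.00), ('P3', 80.00), ('P4', 50.00)
) AS v (product_name, list_price)
ORDER BY v.list_price DESC;  -- WITH TIES requires an ORDER BY clause
```

The query returns P1, P2, and P3. Without WITH TIES, TOP (2) would return P1 and only one of P2 or P3, and which one is not guaranteed because their ORDER BY values are equal.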
Nesting subquery
A subquery can be nested within another subquery. SQL Server supports up to 32 levels
of nesting. Consider the following example:
SELECT
product_name,
list_price
FROM
production.products
WHERE
list_price > (
SELECT
AVG (list_price)
FROM
production.products
WHERE
brand_id IN (
SELECT
brand_id
FROM
production.brands
WHERE
brand_name = 'Strider'
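OR brand_name = 'Trek'
)
)
ORDER BY
list_price;
Logically, this query works from the inside out: the innermost subquery returns the brand IDs of Strider and Trek, the middle subquery calculates the average list price of those brands' products, and the outer query returns the products whose list price is higher than that average.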
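Because nested subqueries get harder to read as they deepen, the same logic can be rewritten with CTEs, one per level. This is a sketch assuming the production.products and production.brands tables used in the query above; it should return the same rows as the nested version.

```sql
-- Assumes the same production.products and production.brands tables as above.
WITH selected_brands AS (
    -- Innermost level: IDs of the two brands of interest.
    SELECT brand_id
    FROM production.brands
    WHERE brand_name IN ('Strider', 'Trek')
),
brand_average AS (
    -- Middle level: average list price of those brands' products.
    SELECT AVG(list_price) AS avg_price
    FROM production.products
    WHERE brand_id IN (SELECT brand_id FROM selected_brands)
)
-- Outer level: products priced above that single average.
SELECT p.product_name, p.list_price
FROM production.products AS p
CROSS JOIN brand_average AS b
WHERE p.list_price > b.avg_price
ORDER BY p.list_price;
```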
You can use a subquery in many places:
In place of an expression
With IN or NOT IN
With ANY or ALL
With EXISTS or NOT EXISTS
In an UPDATE, DELETE, or INSERT statement
In the FROM clause
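When a subquery returns a single value, you can use it anywhere an expression is allowed. The following query returns each order's date together with the highest list price among the items in that order:
SELECT
order_id,
order_date,
(
SELECT
MAX (list_price)
FROM
sales.order_items i
WHERE
i.order_id = o.order_id
) AS max_list_price
FROM
sales.orders o;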
A subquery that is used with the IN operator returns a set of zero or more values. After
the subquery returns values, the outer query makes use of them.
The following query finds the names of all mountain bike and road bike products that
the bike store sells.
SELECT
product_id,
product_name
FROM
production.products
WHERE
category_id IN (
SELECT
category_id
FROM
production.categories
WHERE
category_name = 'Mountain Bikes'
OR category_name = 'Road Bikes'
);
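A subquery introduced with the ANY operator has the syntax scalar_expression comparison_operator ANY (subquery).
Assuming that the subquery returns a list of values v1, v2, …, vn, the ANY operator
returns TRUE if at least one comparison pair (scalar_expression, vi) evaluates to TRUE; otherwise,
it returns FALSE.
The following query finds the products whose list price is greater than or equal to the average list price of at least one brand:
SELECT
product_name,
list_price
FROM
production.products
WHERE
list_price >= ANY (
SELECT
AVG (list_price)
FROM
production.products
GROUP BY
brand_id
);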
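The ALL operator has the same syntax as ANY, scalar_expression comparison_operator ALL (subquery), but it returns TRUE only if every comparison pair (scalar_expression, vi) evaluates to TRUE; otherwise, it returns FALSE.
The following query finds the products whose list price is greater than or equal to every
average list price returned by the subquery:
SELECT
product_name,
list_price
FROM
production.products
WHERE
list_price >= ALL (
SELECT
AVG (list_price)
FROM
production.products
GROUP BY
brand_id
);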
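The difference between ANY and ALL is easiest to see with fixed values. In this sketch, the table variable @brand_avg holds three hypothetical averages (10, 20, and 30) in place of the per-brand averages computed by the GROUP BY subquery above.

```sql
-- Hypothetical per-brand averages.
DECLARE @brand_avg TABLE (avg_price DECIMAL(10, 2));
INSERT INTO @brand_avg (avg_price) VALUES (10.00), (20.00), (30.00);

-- >= ANY: true when the price is >= at least one average (here, >= the smallest).
SELECT v.price
FROM (VALUES (5.00), (15.00), (30.00)) AS v (price)
WHERE v.price >= ANY (SELECT avg_price FROM @brand_avg)
ORDER BY v.price;

-- >= ALL: true only when the price is >= every average (here, >= the largest).
SELECT v.price
FROM (VALUES (5.00), (15.00), (30.00)) AS v (price)
WHERE v.price >= ALL (SELECT avg_price FROM @brand_avg)
ORDER BY v.price;
```

The first query returns 15.00 and 30.00; the second returns only 30.00.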
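The EXISTS operator returns TRUE if the subquery returns at least one row. The following query finds the customers who bought products in 2017:
SELECT
customer_id,
first_name,
last_name,
city
FROM
sales.customers c
WHERE
EXISTS (
SELECT
customer_id
FROM
sales.orders o
WHERE
o.customer_id = c.customer_id
AND YEAR (order_date) = 2017
)
ORDER BY
first_name,
last_name;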
If you use NOT EXISTS instead of EXISTS, you can find the customers who did not buy
any products in 2017:
SELECT
customer_id,
first_name,
last_name,
city
FROM
sales.customers c
WHERE
NOT EXISTS (
SELECT
customer_id
FROM
sales.orders o
WHERE
o.customer_id = c.customer_id
AND YEAR (order_date) = 2017
)
ORDER BY
first_name,
last_name;
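A self-contained sketch of EXISTS and NOT EXISTS, using hypothetical @customers and @orders table variables in place of sales.customers and sales.orders. One deliberate difference from the queries above: the 2017 filter uses a date range instead of YEAR(order_date). Both select the same rows, but a range predicate on the bare column can use an index on order_date, while wrapping the column in YEAR() generally prevents an index seek.

```sql
-- Hypothetical sample data.
DECLARE @customers TABLE (customer_id INT, first_name VARCHAR(20));
DECLARE @orders    TABLE (order_id INT, customer_id INT, order_date DATE);

INSERT INTO @customers (customer_id, first_name)
VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');

INSERT INTO @orders (order_id, customer_id, order_date)
VALUES (10, 1, '2017-03-01'),   -- Ann ordered in 2017
       (11, 2, '2018-05-20');   -- Bob ordered, but not in 2017; Cid never ordered

-- EXISTS: customers with at least one 2017 order.
SELECT c.customer_id, c.first_name
FROM @customers AS c
WHERE EXISTS (
    SELECT 1  -- the select list is ignored; only whether a row exists matters
    FROM @orders AS o
    WHERE o.customer_id = c.customer_id
      AND o.order_date >= '2017-01-01'
      AND o.order_date <  '2018-01-01'
)
ORDER BY c.customer_id;

-- NOT EXISTS: customers with no 2017 order.
SELECT c.customer_id, c.first_name
FROM @customers AS c
WHERE NOT EXISTS (
    SELECT 1
    FROM @orders AS o
    WHERE o.customer_id = c.customer_id
      AND o.order_date >= '2017-01-01'
      AND o.order_date <  '2018-01-01'
)
ORDER BY c.customer_id;
```

The first query returns Ann; the second returns Bob and Cid.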
A clustered index stores data rows in a sorted structure based on its key values. Each
table has only one clustered index because data rows can be sorted in only one order. A
table that has a clustered index is called a clustered table.
Because the clustered index defines the order in which the data is stored, there can be
only a single clustered index per table. In an RDBMS, the primary key is usually what
creates the clustered index; in SQL Server, a PRIMARY KEY constraint creates a clustered
index on its column(s) by default, unless the table already has one or the constraint is
declared NONCLUSTERED.
A clustered index is like a dictionary, where the entries themselves are arranged in
alphabetical order.
Although a table can have only one clustered index, that index can span multiple
columns; an index on more than one column is called a composite index.
SQL Server CREATE CLUSTERED INDEX syntax
CREATE CLUSTERED INDEX index_name
ON schema_name.table_name (column_list);
In this syntax:
First, specify the name of the clustered index after the CREATE CLUSTERED
INDEX clause.
Second, specify the schema and table name on which you want to create the
index.
Third, list one or more columns included in the index.
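A minimal sketch of the syntax above, assuming SQL Server 2016 or later (for DROP TABLE IF EXISTS). The table dbo.production_events and the index name are hypothetical. The table is created without a primary key, so it starts as a heap, and the clustered index is composite, as described earlier.

```sql
-- Hypothetical demo table; drop it first so the script can be rerun.
DROP TABLE IF EXISTS dbo.production_events;

CREATE TABLE dbo.production_events (
    event_id    INT          NOT NULL,
    event_date  DATE         NOT NULL,
    description VARCHAR(100) NULL
);  -- no primary key, so the table is stored as a heap

-- Composite clustered index: rows are ordered by event_date, then event_id.
CREATE CLUSTERED INDEX ix_production_events_date_id
ON dbo.production_events (event_date, event_id);
```

Running a second CREATE CLUSTERED INDEX against the same table would fail, because a table can have only one clustered index.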
A non-clustered index is stored separately from the table's data rows, so it does not
change the order in which the rows are stored. This type of index improves the
performance of queries that search on columns other than the primary key (or other
clustering key), and a table can have many of them. A UNIQUE constraint, which adds a
unique key to a table, is enforced with a unique non-clustered index by default.
For example, a book can have more than one index: one at the beginning that lists the
contents unit by unit, and another at the back that lists terms in alphabetical order.
An index is a lookup structure associated with a table or view that the database uses
to speed up data retrieval. In an index, keys are stored in a structure (a B-tree) that
enables SQL Server to find the row or rows associated with the key values quickly and
efficiently. Indexes are created automatically when a PRIMARY KEY or UNIQUE constraint
is defined on the table. There are two types of index:
Clustered index: when a table is created with a PRIMARY KEY constraint, the database
engine creates a clustered index on the key by default. The data rows in the table or
view are sorted and stored based on the index key values.
Non-clustered index: when a table is created with a UNIQUE constraint, the database
engine creates a unique non-clustered index by default. A non-clustered index contains
the index key values, and each key value entry has a pointer to the data row that
contains the key value.
Performance comparison: data retrieval is generally faster with a clustered index than
with a non-clustered index, while data updates are generally faster with a non-clustered
index than with a clustered index.
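The CREATE NONCLUSTERED INDEX statement has the following syntax:
CREATE [NONCLUSTERED] INDEX index_name
ON table_name(column_list);
In this syntax: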
First, specify the name of the index after the CREATE NONCLUSTERED INDEX clause.
Note that the NONCLUSTERED keyword is optional.
Second, specify the table name on which you want to create the index and a list
of columns of that table as the index key columns.
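A sketch of the syntax above, again assuming SQL Server 2016 or later and a hypothetical table dbo.customers_demo. The primary key gets the clustered index by default, CREATE NONCLUSTERED INDEX adds an index on city, and the UNIQUE constraint on email is enforced by a unique nonclustered index that SQL Server creates automatically.

```sql
-- Hypothetical demo table; drop it first so the script can be rerun.
DROP TABLE IF EXISTS dbo.customers_demo;

CREATE TABLE dbo.customers_demo (
    customer_id INT          NOT NULL PRIMARY KEY,  -- clustered index by default
    email       VARCHAR(100) NOT NULL,
    city        VARCHAR(50)  NULL
);

-- Nonclustered index on a column that is not the primary key.
CREATE NONCLUSTERED INDEX ix_customers_demo_city
ON dbo.customers_demo (city);

-- UNIQUE constraint: enforced by an automatically created unique nonclustered index.
ALTER TABLE dbo.customers_demo
ADD CONSTRAINT uq_customers_demo_email UNIQUE (email);

-- The optimizer can use ix_customers_demo_city for this filter.
SELECT customer_id, city
FROM dbo.customers_demo
WHERE city = 'Austin';
```

Because each nonclustered index entry on a clustered table also carries the clustering key (customer_id here), the index on city covers this query without a lookup into the base table.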
Oracle Correlated Subquery
Unlike a regular subquery, which is evaluated once for the whole outer query, a correlated
subquery uses values from the outer query. As a result, a correlated subquery may be
evaluated once for each row selected by the outer query, which is why a query that uses a
correlated subquery can be slow.
Let's look at some examples of correlated subqueries to better understand how they
work.
The following query finds all products whose list price is above average for their
category.
SELECT
product_id,
product_name,
list_price
FROM
products p
WHERE
list_price > (
SELECT
AVG( list_price )
FROM
products
WHERE
category_id = p.category_id
);
In the above query, the outer query is:
SELECT
product_id,
product_name,
list_price
FROM
products p
WHERE
list_price >
The correlated subquery is:
SELECT
AVG( list_price )
FROM
products
WHERE
category_id = p.category_id
For each product in the products table, Oracle conceptually executes the correlated
subquery to calculate the average list price of that product's category.
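A common alternative to this correlated subquery is the analytic function AVG() OVER (PARTITION BY ...), which computes every category's average in one pass over the table. This is a swapped-in technique, not the approach described above. The sample rows are hypothetical and are built from DUAL so the sketch runs on Oracle without the products table. Analytic functions cannot appear in a WHERE clause, so the average is computed in an inline view and filtered outside it.

```sql
WITH products AS (
    -- Hypothetical sample rows standing in for the products table.
    SELECT 1 AS product_id, 'A' AS product_name, 1 AS category_id, 100 AS list_price FROM dual UNION ALL
    SELECT 2, 'B', 1, 300 FROM dual UNION ALL
    SELECT 3, 'C', 2, 50  FROM dual UNION ALL
    SELECT 4, 'D', 2, 70  FROM dual
)
SELECT product_id, product_name, list_price
FROM (
    -- Each row carries its category's average alongside its own price.
    SELECT p.*,
           AVG(list_price) OVER (PARTITION BY category_id) AS category_avg
    FROM products p
)
WHERE list_price > category_avg
ORDER BY product_id;
```

Category 1 averages 200 and category 2 averages 60, so the query returns products 2 (B) and 4 (D).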
The following query returns all products and the average standard cost based on the
product category:
SELECT
product_id,
product_name,
standard_cost,
ROUND(
(
SELECT
AVG( standard_cost )
FROM
products
WHERE
category_id = p.category_id
),
2
) avg_standard_cost
FROM
products p
ORDER BY
product_name;
For each product in the products table, Oracle executes the correlated subquery to
calculate the average standard cost of the product's category.
Note that the query uses the ROUND() function to round the average standard cost to two
decimal places.
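A correlated subquery is often used with the EXISTS or NOT EXISTS operator. The following query returns all customers who have not placed any orders:
SELECT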
customer_id,
name
FROM
customers
WHERE
NOT EXISTS (
SELECT
*
FROM
orders
WHERE
orders.customer_id = customers.customer_id
)
ORDER BY
name;
Oracle EXISTS
The Oracle EXISTS operator is a Boolean operator that returns either true or false.
The EXISTS operator is often used with a subquery to test for the existence of rows:
SELECT
*
FROM
table_name
WHERE
EXISTS(subquery);
The EXISTS operator returns true if the subquery returns any rows; otherwise, it returns
false. In addition, the EXISTS operator stops processing the subquery as soon as the
subquery returns the first row.