1) union and union all
----------------------

UNION and UNION ALL are both set operations in SQL used to combine the result sets
of two or more SELECT statements. The key difference is how they handle duplicate
rows.

union:-
------
It removes duplicate rows from the combined result set.
Every SELECT must return the same number of columns, in the same order, with
compatible data types.

union all:-
-----------
It keeps every row, including duplicates, so it avoids the de-duplication work that
UNION has to do.
The same column-count, order, and data-type rules apply.
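
The difference is easy to see on a small data set. The sketch below is only an illustration: it runs the two operators through Python's built-in sqlite3 module against an in-memory database, and the tables a and b and their contents are hypothetical.

```python
import sqlite3

# Hypothetical tables a and b, each with one integer column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (n INTEGER);
    CREATE TABLE b (n INTEGER);
    INSERT INTO a VALUES (1), (2), (2);
    INSERT INTO b VALUES (2), (3);
""")

# UNION removes duplicate rows from the combined result.
union_rows = conn.execute(
    "SELECT n FROM a UNION SELECT n FROM b ORDER BY n"
).fetchall()

# UNION ALL keeps every row from both inputs.
union_all_rows = conn.execute(
    "SELECT n FROM a UNION ALL SELECT n FROM b ORDER BY n"
).fetchall()

print(union_rows)      # [(1,), (2,), (3,)]
print(union_all_rows)  # [(1,), (2,), (2,), (2,), (3,)]
```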

2) remove duplicates
--------------------

To remove duplicates from a query result, you can use the DISTINCT keyword in a
SELECT statement or the GROUP BY clause. Neither changes the rows stored in the
table. To delete duplicate rows from the table itself, number the rows with
ROW_NUMBER() and delete the extras.

By using distinct:
------------------
SELECT DISTINCT column1, column2, ...
FROM your_table;

By using group by:
------------------
SELECT column1, column2, ...
FROM your_table
GROUP BY column1, column2, ...;

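using row_number():-
--------------------
Number the rows inside each group of duplicates and delete every row after the
first. Deleting through a CTE like this is SQL Server syntax; other databases need
a different DELETE form.
WITH CTE AS (
SELECT column1, column2, ...,
ROW_NUMBER() OVER (PARTITION BY column1, column2, ... ORDER BY (SELECT NULL)) AS rn
FROM your_table
)
DELETE FROM CTE WHERE rn > 1;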
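
For databases that cannot delete through a CTE, one common substitute is to keep a single row per duplicate group by its row identifier. The sketch below is a swapped-in technique, not the CTE form above: it uses SQLite's implicit rowid through Python's sqlite3 module, and the two-column your_table and its data are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (column1 TEXT, column2 INTEGER);
    INSERT INTO your_table VALUES ('a', 1), ('a', 1), ('b', 2), ('a', 1);
""")

# Keep the lowest rowid in each duplicate group and delete the rest.
conn.execute("""
    DELETE FROM your_table
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM your_table GROUP BY column1, column2
    )
""")

remaining = conn.execute(
    "SELECT column1, column2 FROM your_table ORDER BY column1"
).fetchall()
print(remaining)  # [('a', 1), ('b', 2)]
```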

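find third highest salary:
--------------------------
row_number:
-----------
ROW_NUMBER() ignores ties, so this returns the third row, which is not necessarily
the third distinct salary.
WITH RankedSalaries AS (SELECT employee_id, salary, ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num FROM your_table)
SELECT employee_id, salary FROM RankedSalaries WHERE row_num = 3;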

rank:
-----

WITH RankedSalaries AS (
SELECT
employee_id,
salary,
RANK() OVER (ORDER BY salary DESC) AS ranking
FROM your_table
)
SELECT
employee_id,
salary
FROM RankedSalaries
WHERE ranking = 3;

dense_rank:
----------
WITH RankedSalaries AS (
SELECT
employee_id,
salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_ranking
FROM your_table
)
SELECT
employee_id,
salary
FROM RankedSalaries
WHERE dense_ranking = 3;

normalization
-------------
Normalization is a database design technique that organizes data in a relational
database to reduce redundancy and undesirable dependencies. The goal of
normalization is to eliminate insert, update, and delete anomalies.
There are different normal forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF), each representing
a stricter level of normalization. Here's a brief overview of the first three
normal forms:

First Normal Form (1NF):
------------------------
Eliminate repeating groups of columns from the same table.
Create a separate table for each group of related data and identify a primary key
for each table.
Ensure that each column contains atomic (indivisible) values.

Second Normal Form (2NF):
-------------------------
Meet the requirements of 1NF.
Remove partial dependencies by moving columns that depend on only part of a
composite primary key into separate tables.
Ensure that every non-prime attribute (a column not part of the primary key) is
fully functionally dependent on the primary key.

Third Normal Form (3NF):
------------------------
Meet the requirements of 2NF.
Remove transitive dependencies by moving columns that depend on another non-prime
attribute into separate tables.
Ensure that every non-prime attribute depends directly on the primary key.

difference between rank and dense_rank, and how to calculate the 3rd highest salary
using dense_rank/rank
-----------------------------------------------------------------------------------
RANK() and DENSE_RANK() are both window functions in SQL that assign a rank to each
row based on the values in one or more columns. However, they differ in how they
handle ties (rows with equal values).
1 rank():
---------
Tied rows share a rank, and the ranks that follow are skipped (1, 2, 2, 4).
RANK() OVER (ORDER BY salary DESC)

2 dense_rank():
---------------
Tied rows share a rank, and the next rank follows without a gap (1, 2, 2, 3).
DENSE_RANK() OVER (ORDER BY salary DESC)

calculate 3rd highest salary:
-----------------------------
WITH RankedSalaries AS (
SELECT
employee_id,
salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_ranking
FROM your_table
)
SELECT
employee_id,
salary
FROM RankedSalaries
WHERE dense_ranking = 3;
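
How the two functions treat ties decides whether "rank 3" exists at all. The following is a small sketch, assuming Python's sqlite3 module with a SQLite build that supports window functions (3.25 or later); the five sample salaries are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (employee_id INTEGER, salary INTEGER);
    INSERT INTO your_table VALUES
        (1, 900), (2, 800), (3, 800), (4, 700), (5, 600);
""")

ranked = conn.execute("""
    SELECT employee_id, salary,
           RANK()       OVER (ORDER BY salary DESC) AS ranking,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_ranking
    FROM your_table
    ORDER BY salary DESC, employee_id
""").fetchall()
for row in ranked:
    print(row)
# (1, 900, 1, 1)
# (2, 800, 2, 2)
# (3, 800, 2, 2)
# (4, 700, 4, 3)   RANK skips 3 after the tie, DENSE_RANK does not
# (5, 600, 5, 4)

# Filtering on 3 finds nothing with RANK, and the 3rd distinct salary with DENSE_RANK.
third_by_rank = [r[:2] for r in ranked if r[2] == 3]
third_by_dense = [r[:2] for r in ranked if r[3] == 3]
print(third_by_rank)   # []
print(third_by_dense)  # [(4, 700)]
```

This is why DENSE_RANK() is usually the safer choice when the question asks for the nth highest distinct salary.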

write a sql query to retrieve the employee details from the employee table such that
the results are returned in alternating values of male and female??

To retrieve employee details in alternating male and female order, number the rows
within each gender with the ROW_NUMBER() window function, then sort by that number
and by gender:
WITH NumberedEmployees AS (
SELECT
employee_id,
employee_name,
gender,
ROW_NUMBER() OVER (PARTITION BY gender ORDER BY employee_id) AS row_num
FROM employee
)
SELECT
employee_id,
employee_name,
gender
FROM NumberedEmployees
ORDER BY row_num, gender;
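
To check the ordering, the query can be run on a few rows. This is a sketch under assumptions: it uses Python's sqlite3 module (SQLite 3.25 or later for window functions), and the five employees and the 'F'/'M' codes are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER, employee_name TEXT, gender TEXT);
    INSERT INTO employee VALUES
        (1, 'Asha', 'F'), (2, 'Ben', 'M'), (3, 'Carl', 'M'),
        (4, 'Dina', 'F'), (5, 'Evan', 'M');
""")

alternating = conn.execute("""
    WITH NumberedEmployees AS (
        SELECT employee_id, employee_name, gender,
               ROW_NUMBER() OVER (PARTITION BY gender ORDER BY employee_id) AS row_num
        FROM employee
    )
    SELECT employee_id, gender
    FROM NumberedEmployees
    ORDER BY row_num, gender
""").fetchall()

print(alternating)
# [(1, 'F'), (2, 'M'), (4, 'F'), (3, 'M'), (5, 'M')]
```

The genders alternate until one group runs out; the remaining rows of the larger group then follow one after another.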

what is scd2 in sql?

SCD2 (Slowly Changing Dimension Type 2) is a technique used in data warehousing and
database design to keep the full history of changes to dimension data.
In an SCD2 scenario, when an attribute of a dimension changes (such as a product
name, customer address, or employee job title), the system doesn't overwrite the
existing record. Instead, it creates a new record with the updated information and
keeps the old one as history.

The key characteristics of SCD2 are:

Effective Date Range:
Each record in the dimension table has an effective date range, indicating the
period during which that version of the data is valid.

Historical Tracking:
Historical changes are maintained in the dimension table, allowing you to query the
database to see what the data looked like at specific points in time.

New Records for Changes:
Instead of updating the existing record, a new record is added with the updated
information, and the original record is marked as inactive.
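
A change under SCD2 can be sketched as two statements: close the current version, then insert the new one. Everything named below (the dim_customer table, its columns, the is_current flag, and the apply_scd2_change helper) is hypothetical and chosen only for illustration; the code uses Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,  -- surrogate key
        customer_id  INTEGER,              -- natural (business) key
        address      TEXT,
        valid_from   TEXT,
        valid_to     TEXT,                 -- NULL while the row is current
        is_current   INTEGER
    );
    INSERT INTO dim_customer (customer_id, address, valid_from, valid_to, is_current)
    VALUES (42, 'Old Street 1', '2023-01-01', NULL, 1);
""")

def apply_scd2_change(customer_id, new_address, change_date):
    """Close the current version of the row and insert a new current version."""
    conn.execute(
        "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1",
        (change_date, customer_id),
    )
    conn.execute(
        "INSERT INTO dim_customer (customer_id, address, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, new_address, change_date),
    )

apply_scd2_change(42, 'New Avenue 9', '2024-06-01')

history = conn.execute(
    "SELECT address, valid_from, valid_to, is_current "
    "FROM dim_customer WHERE customer_id = 42 ORDER BY valid_from"
).fetchall()
print(history)
# [('Old Street 1', '2023-01-01', '2024-06-01', 0), ('New Avenue 9', '2024-06-01', None, 1)]

# Point-in-time lookup: which address was valid on 2023-07-15?
as_of = conn.execute(
    "SELECT address FROM dim_customer WHERE customer_id = 42 "
    "AND valid_from <= ? AND (valid_to IS NULL OR valid_to > ?)",
    ('2023-07-15', '2023-07-15'),
).fetchone()
print(as_of)  # ('Old Street 1',)
```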

how to find duplicates in a table??
-----------------------------------
by using group by and having:
-----------------------------
SELECT column_name, COUNT(*)
FROM your_table
GROUP BY column_name
HAVING COUNT(*) > 1;

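by using selfjoin:
------------------
This needs a unique id column to tell the copies apart. DISTINCT stops a row from
being listed once for every other copy it matches.
SELECT DISTINCT t1.*
FROM your_table t1
JOIN your_table t2 ON t1.column_name = t2.column_name AND t1.id <> t2.id;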
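
The two approaches answer slightly different questions: GROUP BY reports which values are duplicated and how often, while the self join returns the duplicated rows themselves. A minimal sketch with Python's sqlite3 module, using a hypothetical your_table with an id primary key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (id INTEGER PRIMARY KEY, column_name TEXT);
    INSERT INTO your_table (column_name) VALUES ('x'), ('y'), ('x'), ('z'), ('x');
""")

# Which values occur more than once, and how many times?
dup_counts = conn.execute("""
    SELECT column_name, COUNT(*)
    FROM your_table
    GROUP BY column_name
    HAVING COUNT(*) > 1
""").fetchall()
print(dup_counts)  # [('x', 3)]

# Which rows are duplicates? DISTINCT avoids listing a row once per matching copy.
dup_rows = conn.execute("""
    SELECT DISTINCT t1.id, t1.column_name
    FROM your_table t1
    JOIN your_table t2 ON t1.column_name = t2.column_name AND t1.id <> t2.id
    ORDER BY t1.id
""").fetchall()
print(dup_rows)  # [(1, 'x'), (3, 'x'), (5, 'x')]
```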

how to get the distinct records out of a table without using the distinct keyword??
-----------------------------------------------------------------------------------

If you want distinct records without the DISTINCT keyword, group by every column
you select, so that each unique combination of values is returned once.
SELECT column1, column2, column3
FROM your_table
GROUP BY column1, column2, column3;

how to fetch the latest record out of an incremental table??
------------------------------------------------------------
for mysql / postgresql:
-----------------------
SELECT *
FROM your_table
ORDER BY timestamp_column DESC
LIMIT 1;

(or)
for sql server:
---------------
SELECT TOP 1 *
FROM your_table
ORDER BY timestamp_column DESC;

data lake vs delta lake??
-------------------------
Delta Lake is a technology built on top of Apache Spark for managing data lakes. It
provides ACID transactions, scalable metadata handling, and schema enforcement for
data lakes, making it easier to manage and govern large volumes of data in a
reliable and consistent manner.

Data Lake:
----------
Concept:
--------
A data lake is a centralized repository that allows businesses to store all their
structured and unstructured data at any scale.

Characteristics:
----------------
Supports diverse data types and formats.
Scales horizontally to handle massive amounts of data.

Schema Flexibility:
------------------
Typically has schema-on-read, meaning the schema is applied when the data is
queried rather than when it is ingested.

Delta Lake:
---------
Concept:
--------
Delta Lake is an open-source storage layer that brings ACID transactions to Apache
Spark and big data workloads.

ACID Transactions:
-----------------
Provides ACID transactions, ensuring atomicity, consistency, isolation, and
durability.

Schema Enforcement and Evolution:
---------------------------------
Allows you to enforce a schema on write, ensuring that the data written to Delta
Lake adheres to a predefined schema, while still letting that schema evolve over
time.

Time Travel:
-------------
Provides time travel capabilities, allowing you to query data at different points
in time, view changes, and roll back to previous versions.

Optimizations:
--------------
Implements optimizations like data skipping, indexing, and caching to improve query
performance.

how to print the even rows from a table??
-----------------------------------------
WITH EvenNumberedRows AS (
SELECT
your_columns,
ROW_NUMBER() OVER (ORDER BY some_column) AS row_num
FROM your_table
)
SELECT
your_columns
FROM EvenNumberedRows
WHERE row_num % 2 = 0;
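
"Even rows" only has a meaning once an order is fixed, which is why the query numbers the rows first. A small sketch with Python's sqlite3 module (SQLite 3.25 or later); the label column and the six sample rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (some_column INTEGER, label TEXT);
    INSERT INTO your_table VALUES
        (10, 'a'), (20, 'b'), (30, 'c'), (40, 'd'), (50, 'e'), (60, 'f');
""")

even_rows = conn.execute("""
    WITH NumberedRows AS (
        SELECT label, ROW_NUMBER() OVER (ORDER BY some_column) AS row_num
        FROM your_table
    )
    SELECT label FROM NumberedRows
    WHERE row_num % 2 = 0
    ORDER BY row_num
""").fetchall()

print(even_rows)  # [('b',), ('d',), ('f',)]
```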

fact table and dimension table??
--------------------------------
In a data warehouse or a database designed for business intelligence, fact tables
and dimension tables are the key components used to organize and store data in a
structured manner.

Fact Table:
-----------
A fact table stores quantitative data (facts) about a business process.
Fact tables often contain foreign keys that link to dimension tables.
They usually hold numerical values that can be aggregated.
ex:- A Sales Fact Table might have columns such as order_date, product_id,
customer_id, quantity_sold, and revenue.

Dimension Table:
----------------
A dimension table stores descriptive attributes related to the business process.
Dimension tables typically have a primary key that is used as a foreign key in the
fact table.
Dimension tables are often smaller than fact tables.
ex:- A Product Dimension Table might have columns such as product_id, product_name,
category, and manufacturer.

Star Schema and Snowflake Schema
--------------------------------
In a star schema, fact tables are directly connected to dimension tables, forming a
star-like structure.
In a snowflake schema, dimension tables may be normalized into sub-dimensions,
creating a snowflake-like structure.

surrogate key
-------------
A surrogate key is a system-generated unique identifier assigned to each row in a
database table to serve as its primary key. It carries no business meaning.
A surrogate key must be unique within the table, ensuring that each record has a
distinct identifier.
Surrogate keys are stable and do not change over time, even if the natural key or
other attributes of the record change.
They are commonly implemented as integers, but they can also be GUIDs.
They simplify joins and relationships between tables.
use in datawarehouse:
---------------------
Surrogate keys are often used in data warehousing and business intelligence
environments to facilitate data integration and dimensional modeling.

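query for highest earner in employee??
--------------------------------------
SELECT MAX(salary) AS maximum FROM employees;

(or)
by using subquery
-----------------
MAX() alone returns only the salary value. To return the employee row(s) earning
that salary, compare each row against it in a subquery:

SELECT *
FROM employees
WHERE salary = (SELECT MAX(salary) FROM employees);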

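Rolling sum in sql??
--------------------
To calculate a rolling (running) sum in SQL, use the window function SUM() with an
ORDER BY inside the OVER() clause. For a fixed-size window, add a frame such as
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW.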
SELECT
your_column,
SUM(your_column) OVER (ORDER BY your_order_column) AS rolling_sum
FROM your_table;
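
The effect of the window frame is easiest to see side by side. The sketch below is an illustration only: it uses Python's sqlite3 module (SQLite 3.25 or later), and the day and amount columns are hypothetical stand-ins for your_order_column and your_column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (day INTEGER, amount INTEGER);
    INSERT INTO your_table VALUES (1, 10), (2, 20), (3, 30), (4, 40);
""")

rolling = conn.execute("""
    SELECT day,
           amount,
           -- No frame given: running total from the first row to the current row.
           SUM(amount) OVER (ORDER BY day) AS running_total,
           -- Explicit frame: previous row plus the current row only.
           SUM(amount) OVER (
               ORDER BY day ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
           ) AS two_row_sum
    FROM your_table
    ORDER BY day
""").fetchall()

for row in rolling:
    print(row)
# (1, 10, 10, 10)
# (2, 20, 30, 30)
# (3, 30, 60, 50)
# (4, 40, 100, 70)
```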
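
dept wise highest salary??
--------------------------
To find the highest earner in each department, you can use the ROW_NUMBER() window
function. ROW_NUMBER() returns exactly one row per department; use RANK() instead if
tied top earners should all be returned.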

WITH RankedSalaries AS (
SELECT
employee_id,
department_id,
salary,
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank_in_dept
FROM employees
)
SELECT
employee_id,
department_id,
salary
FROM RankedSalaries
WHERE rank_in_dept = 1;
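
Whether ties matter depends on the question being asked. The sketch below compares ROW_NUMBER() with RANK() on a department where two people share the top salary. It assumes Python's sqlite3 module (SQLite 3.25 or later); the sample rows and the top_per_department helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, department_id INTEGER, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 10, 500), (2, 10, 500), (3, 20, 300), (4, 20, 200);
""")

def top_per_department(ranking_function):
    # The function name is interpolated into the SQL text, so only fixed names are allowed.
    if ranking_function not in ("ROW_NUMBER", "RANK"):
        raise ValueError("unsupported ranking function")
    return conn.execute(f"""
        WITH RankedSalaries AS (
            SELECT employee_id, department_id, salary,
                   {ranking_function}() OVER (
                       PARTITION BY department_id ORDER BY salary DESC
                   ) AS rank_in_dept
            FROM employees
        )
        SELECT employee_id, department_id, salary
        FROM RankedSalaries
        WHERE rank_in_dept = 1
        ORDER BY department_id, employee_id
    """).fetchall()

one_per_dept = top_per_department("ROW_NUMBER")  # one row per department, tie broken arbitrarily
all_top_earners = top_per_department("RANK")     # every employee tied for the top salary

print(len(one_per_dept))  # 2
print(all_top_earners)    # [(1, 10, 500), (2, 10, 500), (3, 20, 300)]
```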

get nth highest salary from employee ??
---------------------------------------
WITH RankedSalaries AS (
SELECT
EmployeeID,
Salary,
ROW_NUMBER() OVER (ORDER BY Salary DESC) AS salary_rank
FROM Employee
)
SELECT
EmployeeID,
Salary
FROM RankedSalaries
WHERE salary_rank = N; -- replace N with the position you need, e.g. 3

Pivot table:
------------
A pivot table is a data processing technique used in databases and spreadsheet
software to transform data from a long format (rows with multiple entries) into a
wide format (one column per category value).
Some databases, such as SQL Server and Oracle, provide a PIVOT operator for this.
The example below uses SQL Server syntax.

SELECT *
FROM (
SELECT Product, Month, Revenue
FROM Sales
) AS PivotSource
PIVOT (
SUM(Revenue)
FOR Month IN ([January], [February], [March])
) AS PivotTable;
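
Where no PIVOT operator is available, the same long-to-wide reshaping is commonly written with conditional aggregation. The sketch below swaps in that technique (it is not the PIVOT operator) and runs it on SQLite through Python's sqlite3 module; the Sales rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (Product TEXT, Month TEXT, Revenue INTEGER);
    INSERT INTO Sales VALUES
        ('Coffee', 'January', 100), ('Coffee', 'February', 150),
        ('Tea', 'January', 80), ('Coffee', 'January', 20), ('Tea', 'March', 60);
""")

# One output column per month; CASE yields NULL for other months, which SUM ignores.
pivoted = conn.execute("""
    SELECT Product,
           SUM(CASE WHEN Month = 'January'  THEN Revenue END) AS January,
           SUM(CASE WHEN Month = 'February' THEN Revenue END) AS February,
           SUM(CASE WHEN Month = 'March'    THEN Revenue END) AS March
    FROM Sales
    GROUP BY Product
    ORDER BY Product
""").fetchall()

print(pivoted)
# [('Coffee', 120, 150, None), ('Tea', 80, None, 60)]
```

A month with no sales for a product comes back as NULL (None in Python) rather than 0.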

window functions:
-----------------
Window functions, also known as windowing or analytic functions, are a category of
SQL functions that perform calculations across a set of rows related to the current
row, without collapsing those rows into a single output row.
ROW_NUMBER():
------------
Assigns a unique sequential integer to each row within the result set based on a
specified order.
SELECT
column1,
column2,
ROW_NUMBER() OVER (ORDER BY column1) AS row_num
FROM your_table;

RANK() and DENSE_RANK():
------------------------
RANK(): Assigns a rank to each row based on the specified order, with equal values
receiving the same rank and a gap left after each tie.
DENSE_RANK(): Similar to RANK(), but without gaps in the rank values after ties.
SELECT
column1,
column2,
RANK() OVER (ORDER BY column1) AS ranking,
DENSE_RANK() OVER (ORDER BY column1) AS dense_ranking
FROM your_table;

find the new unique user count on each day, skipping the users who have come on a
prior day??
-----------------------------------------------------------------------------------

You can use window functions such as LAG() together with COUNT(). LAG() returns NULL
for a user's first activity row, so counting only those rows gives the number of
first-time users per day.

WITH UserActivityRanked AS (
SELECT
user_id,
activity_date,
LAG(activity_date) OVER (PARTITION BY user_id ORDER BY activity_date) AS prior_activity_date
FROM user_activity
)
SELECT
activity_date,
COUNT(DISTINCT user_id) AS new_unique_users
FROM UserActivityRanked
WHERE prior_activity_date IS NULL
GROUP BY activity_date
ORDER BY activity_date;
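
A user counts as new on the day of their first activity, which is the row where LAG() has no earlier date to return. The sketch below keeps only those rows and counts them per day. It assumes Python's sqlite3 module (SQLite 3.25 or later), and the activity rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_activity (user_id INTEGER, activity_date TEXT);
    INSERT INTO user_activity VALUES
        (1, '2024-01-01'), (2, '2024-01-01'),
        (1, '2024-01-02'), (3, '2024-01-02'),
        (2, '2024-01-03'), (3, '2024-01-03'), (4, '2024-01-03');
""")

new_users_per_day = conn.execute("""
    WITH UserActivityRanked AS (
        SELECT user_id, activity_date,
               LAG(activity_date) OVER (
                   PARTITION BY user_id ORDER BY activity_date
               ) AS prior_activity_date
        FROM user_activity
    )
    SELECT activity_date, COUNT(DISTINCT user_id) AS new_unique_users
    FROM UserActivityRanked
    WHERE prior_activity_date IS NULL   -- first appearance of the user
    GROUP BY activity_date
    ORDER BY activity_date
""").fetchall()

print(new_users_per_day)
# [('2024-01-01', 2), ('2024-01-02', 1), ('2024-01-03', 1)]
```

Users 1 and 2 are new on the first day, user 3 on the second, and user 4 on the third; returning users are not counted again.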

Designing a data warehouse for a small cafe involves understanding the data you
need to analyze and making decisions about how to structure that data efficiently.
In the context of a cafe, you might have data related to sales, customers,
products, and time. The dimension tables are created first so that the foreign
keys in the fact table can reference them.
DimensionTable:product
----------------------
CREATE TABLE Products (
ProductID INT PRIMARY KEY,
ProductName VARCHAR(255),
Category VARCHAR(50)
);
DimensionTable:customers
------------------------
CREATE TABLE Customers (
CustomerID INT PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50),
Email VARCHAR(255)
);
DimensionTable:time
-------------------
CREATE TABLE Time (
SaleDate DATE PRIMARY KEY,
DayOfWeek VARCHAR(10),
Month VARCHAR(10),
Year INT
);
FactTable:sales
---------------
CREATE TABLE Sales (
SaleID INT PRIMARY KEY,
ProductID INT,
CustomerID INT,
SaleDate DATE,
QuantitySold INT,
TotalAmount DECIMAL(10, 2),
CONSTRAINT fk_product FOREIGN KEY (ProductID) REFERENCES Products(ProductID),
CONSTRAINT fk_customer FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID),
CONSTRAINT fk_time FOREIGN KEY (SaleDate) REFERENCES Time(SaleDate)
);
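
A typical question for this schema is revenue by month and product category, answered by joining the fact table to its dimensions. The sketch below is a trimmed version of the schema (the Customers table is left out for brevity) loaded into SQLite through Python's sqlite3 module; all rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductID INT PRIMARY KEY, ProductName TEXT, Category TEXT);
    CREATE TABLE Time (SaleDate DATE PRIMARY KEY, DayOfWeek TEXT, Month TEXT, Year INT);
    CREATE TABLE Sales (
        SaleID INT PRIMARY KEY,
        ProductID INT REFERENCES Products(ProductID),
        SaleDate DATE REFERENCES Time(SaleDate),
        QuantitySold INT,
        TotalAmount DECIMAL(10, 2)
    );
    INSERT INTO Products VALUES
        (1, 'Espresso', 'Drinks'), (2, 'Latte', 'Drinks'), (3, 'Muffin', 'Food');
    INSERT INTO Time VALUES
        ('2024-01-05', 'Friday', 'January', 2024),
        ('2024-02-10', 'Saturday', 'February', 2024);
    INSERT INTO Sales VALUES
        (1, 1, '2024-01-05', 2, 6.00), (2, 2, '2024-01-05', 1, 4.50),
        (3, 3, '2024-01-05', 3, 9.00), (4, 2, '2024-02-10', 2, 9.00);
""")

# Facts are aggregated; dimensions supply the labels to group by.
revenue = conn.execute("""
    SELECT t.Month, p.Category, SUM(s.TotalAmount) AS revenue
    FROM Sales s
    JOIN Products p ON p.ProductID = s.ProductID
    JOIN Time t ON t.SaleDate = s.SaleDate
    GROUP BY t.Month, p.Category
    ORDER BY t.Month, p.Category
""").fetchall()

for month, category, total in revenue:
    print(month, category, total)
```

Months sort alphabetically here because Month is stored as text; a month-number column in the Time dimension would give calendar order.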

how to fetch 5 records without using in built functions??
---------------------------------------------------------
Each database has its own way of limiting the number of rows returned: Oracle
provides the ROWNUM pseudocolumn, SQL Server the TOP clause, and MySQL the LIMIT
clause. Without an ORDER BY, which 5 rows come back is not guaranteed.
using in oracle:
----------------
SELECT *
FROM your_table
WHERE ROWNUM <= 5;

using in microsoft sql server:
------------------------------
SELECT TOP 5 *
FROM your_table;

for mysql:
--------
SELECT *
FROM your_table
LIMIT 5;

how to fetch the 3rd highest sales in the last year from the current date?
--------------------------------------------------------------------------

To fetch the 3rd highest sale in the last year from the current date, filter on the
date range, sort by amount in descending order, and skip the first two rows with
LIMIT ... OFFSET (MySQL syntax shown):

SELECT amount
FROM sales
WHERE sale_date BETWEEN CURRENT_DATE - INTERVAL 1 YEAR AND CURRENT_DATE
ORDER BY amount DESC
LIMIT 1 OFFSET 2;

String split function:
----------------------
In SQL, the method for splitting a string into multiple parts varies across
database systems. Some databases provide a built-in function for string splitting;
others have no native support, and you need to combine other functions to get the
same result. The example below uses SQL Server's STRING_SPLIT function (available
from SQL Server 2016).

DECLARE @inputString NVARCHAR(MAX) = 'apple,orange,banana';

SELECT value AS SplitValue
FROM STRING_SPLIT(@inputString, ',');

query to fetch the second max salary from table without using analytical functions??
-----------------------------------------------------------------------------------
If you want to fetch the second-highest salary without using analytical functions
like ROW_NUMBER() or DENSE_RANK(), or a LIMIT clause, you can use a subquery with
the MAX() function: take the highest salary that is lower than the overall maximum.

SELECT MAX(salary) AS second_highest_salary
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
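
Two edge cases are worth checking: a tie for the top salary, and a table with no second distinct salary. The sketch below runs the same subquery through Python's sqlite3 module; the sample salaries are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, salary INTEGER);
    INSERT INTO employees VALUES (1, 500), (2, 500), (3, 400), (4, 300);
""")

QUERY = """
    SELECT MAX(salary) AS second_highest_salary
    FROM employees
    WHERE salary < (SELECT MAX(salary) FROM employees)
"""

# The tie at 500 is skipped, so the second-highest distinct salary is returned.
second_highest = conn.execute(QUERY).fetchone()[0]
print(second_highest)  # 400

# With only one distinct salary left, the query returns NULL.
conn.execute("DELETE FROM employees WHERE salary < 500")
no_second = conn.execute(QUERY).fetchone()[0]
print(no_second)  # None
```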

explain semi-additive facts:
----------------------------
Facts (measures) in a data warehouse are classified by how they can be aggregated.
Additive facts can be summed across every dimension. Semi-additive facts can be
summed across some dimensions but not others. Non-additive facts cannot be summed
across any dimension.
Semi-additive:
--------------
For example, consider a "Balance" fact in a banking data warehouse. You can sum
balances across customers or branches, but not across time: adding an account's
January and February month-end balances does not give a meaningful number.
Non-additive:
-------------
These are facts that cannot be aggregated along any dimension.
For example, consider a "Percentage" fact. It makes no sense to add percentages
across any dimension.
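
The balance example can be made concrete with month-end snapshots. In the sketch below, summing across accounts on one date is meaningful, summing across dates is not, and the usual alternative across time is to take the latest snapshot. The balances table and its values are hypothetical; the code uses Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE balances (snapshot_date TEXT, account TEXT, balance INTEGER);
    INSERT INTO balances VALUES
        ('2024-01-31', 'A', 100), ('2024-01-31', 'B', 50),
        ('2024-02-29', 'A', 120), ('2024-02-29', 'B', 30);
""")

# Additive across accounts: total money held on each snapshot date.
per_date = conn.execute("""
    SELECT snapshot_date, SUM(balance)
    FROM balances GROUP BY snapshot_date ORDER BY snapshot_date
""").fetchall()
print(per_date)  # [('2024-01-31', 150), ('2024-02-29', 150)]

# Not additive across time: this counts the same money twice.
summed_over_time = conn.execute("SELECT SUM(balance) FROM balances").fetchone()[0]
print(summed_over_time)  # 300

# Across time, use the most recent snapshot instead of a sum.
latest_total = conn.execute("""
    SELECT SUM(balance) FROM balances
    WHERE snapshot_date = (SELECT MAX(snapshot_date) FROM balances)
""").fetchone()[0]
print(latest_total)  # 150
```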

how to recreate the checkpoint if the checkpoint log is deleted, and from which
point onward will the package run??
-----------------------------------------------------------------------------------

In SQL Server Integration Services (SSIS), a checkpoint file is used to save the
package execution status so that if the package fails, it can restart from the
point of failure instead of rerunning the tasks that already succeeded. If the
checkpoint file is deleted, the package starts from the beginning. A new checkpoint
file is created during that run as long as checkpoints are still enabled on the
package (the SaveCheckpoints, CheckpointFileName, and CheckpointUsage properties).

changing configuration of package at runtime??
----------------------------------------------
Changing the configuration of an Integration Services (SSIS) package at runtime can
be done using parameters and expressions. Parameters allow you to set values that
can be modified at runtime, and expressions enable dynamic assignment of values
based on conditions or calculations.
