1) Union and Union All
--------------------
UNION and UNION ALL are both set operations in SQL used to combine the result sets
of two or more SELECT statements. However, they have a key difference in how they
handle duplicate rows.
UNION:
------
It removes duplicate rows from the combined result set.
Each SELECT must return the same number of columns, in the same order, with
compatible data types.
UNION ALL:
----------
It keeps every row, including duplicates, so it is usually faster than UNION
because no de-duplication step is needed.
The same column-count, order, and data-type rules apply.
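A minimal sketch of the difference, assuming Python's built-in sqlite3 module as a stand-in database. The tables a and b and their contents are made up for illustration.

```python
import sqlite3

# In-memory database with two small, hypothetical tables that share one value.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (name TEXT);
    CREATE TABLE b (name TEXT);
    INSERT INTO a VALUES ('Ann'), ('Bob');
    INSERT INTO b VALUES ('Bob'), ('Cara');
""")

# UNION removes the duplicate 'Bob'.
union_rows = conn.execute(
    "SELECT name FROM a UNION SELECT name FROM b ORDER BY name"
).fetchall()

# UNION ALL keeps both copies of 'Bob'.
union_all_rows = conn.execute(
    "SELECT name FROM a UNION ALL SELECT name FROM b ORDER BY name"
).fetchall()

print(union_rows)
print(union_all_rows)
```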
2) Remove duplicates
--------------------
To return rows without duplicates, you can use the DISTINCT keyword in a SELECT
statement or the GROUP BY clause. To delete the duplicate rows from the table
itself, you can number the rows with ROW_NUMBER() and delete the extras. The method
you choose depends on your specific requirements.
By using distinct:
------------------
SELECT DISTINCT column1, column2, ...
FROM your_table;
By using GROUP BY:
------------------
SELECT column1, column2, ...
FROM your_table
GROUP BY column1, column2, ...;
Using ROW_NUMBER():
-------------------
-- SQL Server syntax: deleting through the CTE deletes the underlying table rows.
WITH CTE AS (
    SELECT column1, column2, ...,
           ROW_NUMBER() OVER (PARTITION BY column1, column2, ...
                              ORDER BY (SELECT NULL)) AS rn
    FROM your_table
)
DELETE FROM CTE WHERE rn > 1;
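Not every database allows a DELETE through a CTE. As a rough sketch of the same ROW_NUMBER() idea on SQLite (via Python's built-in sqlite3 module), the extra rows can be identified by SQLite's implicit rowid instead. The sample rows and the two-column table are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (column1 TEXT, column2 INTEGER);
    INSERT INTO your_table VALUES
        ('a', 1), ('a', 1), ('b', 2), ('a', 1), ('b', 3);
""")

# Number the rows inside each duplicate group, then delete every row
# except the first one of its group (rn = 1), matched by rowid.
conn.execute("""
    DELETE FROM your_table
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY column1, column2 ORDER BY rowid
                   ) AS rn
            FROM your_table
        )
        WHERE rn > 1
    )
""")

remaining = conn.execute(
    "SELECT column1, column2 FROM your_table ORDER BY column1, column2"
).fetchall()
print(remaining)
```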
Using RANK() (3rd highest salary):
----------------------------------
WITH RankedSalaries AS (
SELECT
employee_id,
salary,
RANK() OVER (ORDER BY salary DESC) AS ranking
FROM your_table
)
SELECT
employee_id,
salary
FROM RankedSalaries
WHERE ranking = 3;
Using DENSE_RANK() (3rd highest salary):
----------------------------------------
WITH RankedSalaries AS (
SELECT
employee_id,
salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_ranking
FROM your_table
)
SELECT
employee_id,
salary
FROM RankedSalaries
WHERE dense_ranking = 3;
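Normalization
-------------
Normalization is a database design technique that organizes data in a
relational database to reduce redundancy and undesirable dependencies. The goal of
normalization is to eliminate insert, update, and delete anomalies.
There are different normal forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF), each representing
a stricter level of normalization. Here's a brief overview of the first three
normal forms:
1NF: every column holds atomic (single) values and there are no repeating groups.
2NF: the table is in 1NF and every non-key column depends on the whole primary key,
not on just part of a composite key.
3NF: the table is in 2NF and no non-key column depends on another non-key column
(no transitive dependencies).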
Difference between RANK and DENSE_RANK, and calculating the 3rd highest salary
using DENSE_RANK/RANK
-----------------------------------------------------------------------------------
RANK() and DENSE_RANK() are both window functions in SQL that assign a rank to each
row based on the values in one or more columns. However, they differ in how they
handle ties (rows with equal values).
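1 RANK():
---------
Tied rows get the same rank, and the ranks that follow are skipped (1, 2, 2, 4).
RANK() OVER (ORDER BY salary DESC)
2 DENSE_RANK():
---------------
Tied rows get the same rank, and the next rank follows without a gap (1, 2, 2, 3).
This makes DENSE_RANK() the safer choice for "Nth highest salary" queries.
DENSE_RANK() OVER (ORDER BY salary DESC)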
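To make the tie handling concrete, here is a small sketch that runs both functions side by side using Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The five salaries are invented so that two employees tie for second place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (employee_id INTEGER, salary INTEGER);
    INSERT INTO your_table VALUES
        (1, 100), (2, 90), (3, 90), (4, 80), (5, 70);
""")

# Each row: (employee_id, salary, ranking, dense_ranking)
rows = conn.execute("""
    SELECT employee_id,
           salary,
           RANK()       OVER (ORDER BY salary DESC) AS ranking,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_ranking
    FROM your_table
    ORDER BY salary DESC, employee_id
""").fetchall()

# Filtering on rank 3: RANK() skipped 3 because of the tie, DENSE_RANK() did not.
third_by_rank = [(r[0], r[1]) for r in rows if r[2] == 3]
third_by_dense_rank = [(r[0], r[1]) for r in rows if r[3] == 3]

for row in rows:
    print(row)
print(third_by_rank, third_by_dense_rank)
```

With this data the RANK() filter finds no row at all, while the DENSE_RANK() filter returns the employee with the third distinct salary.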
Write a SQL query to retrieve the employee details from the employee table such that
the results are returned in alternating values of male and female.
-----------------------------------------------------------------------------------
To do this, number the rows within each gender with the ROW_NUMBER() window function
and then order the result by that number:
WITH NumberedEmployees AS (
SELECT
employee_id,
employee_name,
gender,
ROW_NUMBER() OVER (PARTITION BY gender ORDER BY employee_id) AS row_num
FROM employee
)
SELECT
employee_id,
employee_name,
gender
FROM NumberedEmployees
ORDER BY row_num, gender;
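A quick way to check the alternating order is to run the query on a few sample rows. This sketch assumes Python's built-in sqlite3 module and an invented six-row employee table with equal numbers of each gender.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER, employee_name TEXT, gender TEXT);
    INSERT INTO employee VALUES
        (1, 'Ravi',  'Male'),
        (2, 'Arjun', 'Male'),
        (3, 'Meena', 'Female'),
        (4, 'Divya', 'Female'),
        (5, 'Kiran', 'Male'),
        (6, 'Latha', 'Female');
""")

result = conn.execute("""
    WITH NumberedEmployees AS (
        SELECT employee_id,
               employee_name,
               gender,
               ROW_NUMBER() OVER (PARTITION BY gender ORDER BY employee_id) AS row_num
        FROM employee
    )
    SELECT employee_id, employee_name, gender
    FROM NumberedEmployees
    ORDER BY row_num, gender
""").fetchall()

genders = [row[2] for row in result]
print(genders)
```

If one gender has more rows than the other, the surplus rows simply appear together at the end of the result.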
Slowly changing dimension (Type 2):
-----------------------------------
Historical changes are maintained in the dimension table, allowing you to query the
database to see what the data looked like at specific points in time.
New Records for Changes:
Instead of updating the existing record, a new record is added with the updated
information, and the original record is marked as inactive.
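The "new record for each change" approach described above can be sketched as follows, using Python's built-in sqlite3 module. The dim_customer table, its columns (valid_from, valid_to, is_active), and the helper functions are all hypothetical names chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,  -- surrogate key
        customer_id  INTEGER,              -- natural (business) key
        city         TEXT,
        valid_from   TEXT,
        valid_to     TEXT,                 -- NULL while the row is current
        is_active    INTEGER
    )
""")

def apply_change(customer_id, city, change_date):
    """Close the current row (if any) and add a new active row."""
    conn.execute(
        "UPDATE dim_customer SET is_active = 0, valid_to = ? "
        "WHERE customer_id = ? AND is_active = 1",
        (change_date, customer_id),
    )
    conn.execute(
        "INSERT INTO dim_customer "
        "(customer_id, city, valid_from, valid_to, is_active) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, city, change_date),
    )

def city_as_of(customer_id, as_of_date):
    """Return the city that was valid on the given ISO date."""
    row = conn.execute(
        "SELECT city FROM dim_customer "
        "WHERE customer_id = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?)",
        (customer_id, as_of_date, as_of_date),
    ).fetchone()
    return row[0] if row else None

apply_change(42, "Pune", "2023-01-01")
apply_change(42, "Delhi", "2024-06-01")

history = conn.execute(
    "SELECT city, is_active FROM dim_customer ORDER BY customer_key"
).fetchall()
print(history)
print(city_as_of(42, "2023-12-31"), city_as_of(42, "2024-06-01"))
```

Storing dates as ISO strings keeps the comparison simple here; a real warehouse would normally use a proper date type.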
By using self join (returns the rows that have a duplicate):
------------------------------------------------------------
SELECT t1.*
FROM your_table t1
JOIN your_table t2 ON t1.column_name = t2.column_name AND t1.id <> t2.id;
How to get the distinct records out of a table without using the DISTINCT keyword?
-----------------------------------------------------------------------------------
If you want to get distinct records from a table without using the DISTINCT
keyword, you can achieve this using other techniques such as GROUP BY.
SELECT column1, MIN(column2) AS column2, MIN(column3) AS column3
FROM your_table
GROUP BY column1;
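(or) to return only the most recent row (SQL Server syntax; this yields a single
row, not a de-duplicated set):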
SELECT TOP 1 *
FROM your_table
ORDER BY timestamp_column DESC;
Data Lake:
----------
Concept:
--------
A data lake is a centralized repository that allows businesses to store all their
structured and unstructured data at any scale.
Characteristics:
-----------------
Supports diverse data types and formats.
Scales horizontally to handle massive amounts of data.
Schema Flexibility:
------------------
Typically has schema-on-read, meaning the schema is applied when the data is
queried rather than when it is ingested.
Delta Lake:
---------
Concept:
--------
Delta Lake is an open-source storage layer that brings ACID transactions to Apache
Spark and big data workloads.
ACID Transactions:
-----------------
Provides ACID transactions, ensuring atomicity, consistency, isolation, and
durability.
Time Travel:
-------------
Provides time travel capabilities, allowing you to query data at different points
in time, view changes, and roll back to previous versions.
Optimizations:
--------------
Implements optimizations like data skipping, indexing, and caching to improve query
performance.
surrogate key
--------------
A surrogate key is a unique identifier assigned to a record or row in a database
table to serve as its primary key. It has no business meaning of its own.
A surrogate key must be unique within the table, ensuring that each record has a
distinct identifier.
Surrogate keys are usually stable and do not change over time, even if the values
of other attributes in the record change.
Commonly implemented as integers, but they can also be GUIDs.
Ensures a stable identifier even if natural keys change.
Facilitates joins and relationships between tables.
Use in data warehouse:
----------------------
Surrogate keys are often used in data warehousing and business intelligence
environments to facilitate data integration and dimensional modeling.
Highest salary by using a subquery:
-----------------------------------
SELECT *
FROM employees
WHERE salary = (SELECT MAX(salary) FROM employees);
Highest salary in each department by using ROW_NUMBER():
--------------------------------------------------------
WITH RankedSalaries AS (
    SELECT
        employee_id,
        department_id,
        salary,
        ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank_in_dept
    FROM employees
)
SELECT
    employee_id,
    department_id,
    salary
FROM RankedSalaries
WHERE rank_in_dept = 1;
Pivot table:
-----------
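A pivot table is a data processing technique used in databases and spreadsheet
software to transform data from a long format (rows with multiple entries) into a
wide format (one column per category value).
In SQL Server and Oracle, the PIVOT operator is used to perform pivoting. The query
here uses SQL Server syntax; databases without PIVOT can get the same result
with conditional aggregation, for example SUM(CASE WHEN ... THEN ... END).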
SELECT *
FROM (
SELECT Product, Month, Revenue
FROM Sales
) AS PivotSource
PIVOT (
SUM(Revenue)
FOR Month IN ([January], [February], [March])
) AS PivotTable;
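For databases that have no PIVOT operator, the same long-to-wide reshaping can be approximated with conditional aggregation. The sketch below swaps in that technique and runs it on SQLite through Python's built-in sqlite3 module; the Sales rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (Product TEXT, Month TEXT, Revenue INTEGER);
    INSERT INTO Sales VALUES
        ('Coffee', 'January',  100),
        ('Coffee', 'January',   50),
        ('Coffee', 'February',  80),
        ('Tea',    'January',   30),
        ('Tea',    'March',     40);
""")

# One SUM(CASE ...) per target column; months with no sales come back as NULL,
# which Python shows as None.
pivoted = conn.execute("""
    SELECT Product,
           SUM(CASE WHEN Month = 'January'  THEN Revenue END) AS January,
           SUM(CASE WHEN Month = 'February' THEN Revenue END) AS February,
           SUM(CASE WHEN Month = 'March'    THEN Revenue END) AS March
    FROM Sales
    GROUP BY Product
    ORDER BY Product
""").fetchall()

for row in pivoted:
    print(row)
```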
window functions:
-----------------
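Window functions, also known as windowing or analytic functions, are a category of
SQL functions that perform calculations across a set of rows related to the current
row (the window) without collapsing those rows into a single output row.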
ROW_NUMBER():
------------
Assigns a unique integer to each row within the result set based on a specified
order.
SELECT
column1,
column2,
ROW_NUMBER() OVER (ORDER BY column1) AS row_num
FROM your_table;
RANK():
-------
Assigns a rank to each row based on the specified order, with equal values
receiving the same rank.
DENSE_RANK():
-------------
Similar to RANK(), but without gaps between rank values after tied rows.
SELECT
column1,
column2,
RANK() OVER (ORDER BY column1) AS ranking,
DENSE_RANK() OVER (ORDER BY column1) AS dense_ranking
FROM your_table;
Find the count of new unique users on each day, skipping users who have already
come on a prior day.
-----------------------------------------------------------------------------------
WITH UserActivityRanked AS (
    SELECT
        user_id,
        activity_date,
        LAG(activity_date) OVER (PARTITION BY user_id ORDER BY activity_date) AS prior_activity_date
    FROM user_activity
)
SELECT
    activity_date,
    COUNT(DISTINCT user_id) AS new_unique_users
FROM UserActivityRanked
-- A NULL prior date means this row is the user's first recorded activity.
WHERE prior_activity_date IS NULL
GROUP BY activity_date
ORDER BY activity_date;
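As a sanity check, here is a small sketch that counts each user only on the first day they appear, run on SQLite through Python's built-in sqlite3 module. The user_activity rows are made up, including one repeated visit on the same day and one day with returning users only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_activity (user_id INTEGER, activity_date TEXT);
    INSERT INTO user_activity VALUES
        (1, '2024-01-01'),
        (1, '2024-01-01'),  -- same user, second visit on the same day
        (2, '2024-01-01'),
        (1, '2024-01-02'),  -- returning user, must not be counted again
        (3, '2024-01-02'),
        (2, '2024-01-03');  -- returning user only, so this day has no new users
""")

new_users_per_day = conn.execute("""
    WITH UserActivityRanked AS (
        SELECT user_id,
               activity_date,
               LAG(activity_date) OVER (
                   PARTITION BY user_id ORDER BY activity_date
               ) AS prior_activity_date
        FROM user_activity
    )
    SELECT activity_date, COUNT(DISTINCT user_id) AS new_unique_users
    FROM UserActivityRanked
    WHERE prior_activity_date IS NULL   -- first activity of each user
    GROUP BY activity_date
    ORDER BY activity_date
""").fetchall()

print(new_users_per_day)
```

Days on which only returning users were active do not appear in the result at all; join against a calendar table if a zero row is needed for them.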
Designing a data warehouse for a small cafe involves understanding the data you
need to analyze and making decisions about how to structure that data efficiently.
In the context of a cafe, you might have data related to sales, customers,
products, and time.
Fact table: Sales
-----------------
-- Create the Products and Customers dimension tables first, because Sales
-- references them through foreign keys.
CREATE TABLE Sales (
    SaleID INT PRIMARY KEY,
    ProductID INT,
    CustomerID INT,
    SaleDate DATE,
    QuantitySold INT,
    TotalAmount DECIMAL(10, 2),
    CONSTRAINT fk_product FOREIGN KEY (ProductID) REFERENCES Products(ProductID),
    CONSTRAINT fk_customer FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
Dimension table: Products
-------------------------
CREATE TABLE Products (
    ProductID INT PRIMARY KEY,
    ProductName VARCHAR(255),
    Category VARCHAR(50)
);
Dimension table: Customers
--------------------------
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    Email VARCHAR(255)
);
Dimension table: Time
---------------------
CREATE TABLE Time (
    SaleDate DATE PRIMARY KEY,
    DayOfWeek VARCHAR(10),
    Month VARCHAR(10),
    Year INT
);
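To see how the fact and dimension tables work together, here is a rough sketch that loads a few rows and totals revenue per product category. It assumes Python's built-in sqlite3 module as the database, omits the Time table for brevity, and uses invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Dimension tables are created before the fact table that references them.
conn.executescript("""
    CREATE TABLE Products (
        ProductID INT PRIMARY KEY,
        ProductName VARCHAR(255),
        Category VARCHAR(50)
    );
    CREATE TABLE Customers (
        CustomerID INT PRIMARY KEY,
        FirstName VARCHAR(50),
        LastName VARCHAR(50),
        Email VARCHAR(255)
    );
    CREATE TABLE Sales (
        SaleID INT PRIMARY KEY,
        ProductID INT,
        CustomerID INT,
        SaleDate DATE,
        QuantitySold INT,
        TotalAmount DECIMAL(10, 2),
        CONSTRAINT fk_product FOREIGN KEY (ProductID) REFERENCES Products(ProductID),
        CONSTRAINT fk_customer FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
    );
    INSERT INTO Products VALUES
        (1, 'Latte', 'Beverage'), (2, 'Tea', 'Beverage'), (3, 'Muffin', 'Food');
    INSERT INTO Customers VALUES (1, 'Asha', 'Rao', 'asha@example.com');
    INSERT INTO Sales VALUES
        (1, 1, 1, '2024-01-05', 2, 7.5),
        (2, 2, 1, '2024-01-05', 1, 3.0),
        (3, 3, 1, '2024-01-06', 1, 4.0);
""")

# Typical star-schema query: join the fact table to a dimension and aggregate.
revenue_by_category = conn.execute("""
    SELECT p.Category, SUM(s.TotalAmount) AS revenue
    FROM Sales AS s
    JOIN Products AS p ON p.ProductID = s.ProductID
    GROUP BY p.Category
    ORDER BY p.Category
""").fetchall()

print(revenue_by_category)
```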
Fetching the first 5 rows in Microsoft SQL Server:
--------------------------------------------------
SELECT TOP 5 *
FROM your_table;
In MySQL:
---------
SELECT *
FROM your_table
LIMIT 5;
How to fetch the 3rd highest sales in the last year from the current date?
--------------------------------------------------------------------------
To fetch the 3rd highest sales in the last year from the current date, you can use
the ORDER BY clause in conjunction with LIMIT and OFFSET (MySQL syntax). If equal
amounts should count as one position, select DISTINCT amount instead:
SELECT amount
FROM sales
WHERE sale_date BETWEEN CURRENT_DATE - INTERVAL 1 YEAR AND CURRENT_DATE
ORDER BY amount DESC
LIMIT 1 OFFSET 2;
Query to fetch the second max salary from a table without using analytical
functions?
-----------------------------------------------------------------------------------
If you want to fetch the second-highest salary without using analytical functions
such as ROW_NUMBER() or DENSE_RANK() (and without LIMIT), you can use a subquery
with the MAX() function: take the maximum of all salaries that are lower than the
overall maximum.
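One common way to write the MAX() subquery is shown in this sketch, which runs on SQLite through Python's built-in sqlite3 module. The employees table and its salaries are assumptions for illustration, with a tie at the top to show that ties do not affect the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, salary INTEGER);
    INSERT INTO employees VALUES (1, 100), (2, 100), (3, 90), (4, 80);
""")

# The inner query finds the highest salary; the outer query finds the
# highest salary that is strictly below it.
second_max = conn.execute("""
    SELECT MAX(salary)
    FROM employees
    WHERE salary < (SELECT MAX(salary) FROM employees)
""").fetchone()[0]

print(second_max)
```

If every row has the same salary, there is no second-highest value and the query returns NULL.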
How to create the checkpoint if the checkpoint log is deleted, and from which point
onward will the package run?
-----------------------------------------------------------------------------------
In SQL Server Integration Services (SSIS), a checkpoint file (.ckp) is used to save
the package execution status so that if the package fails, it can restart from the
task that failed instead of rerunning the tasks that already completed. If the
checkpoint file is deleted, there is no saved restart point: with CheckpointUsage
set to IfExists the package runs from the beginning, and with CheckpointUsage set to
Always the package fails because the file is missing. A new checkpoint file is
created during the next run as long as SaveCheckpoints is True and
CheckpointFileName points to a valid path.