PostgreSQL - DISTINCT ON expression
Last Updated: 11 Oct, 2024
The DISTINCT ON clause in PostgreSQL allows us to retrieve unique rows based on specific columns, offering more flexibility than the standard DISTINCT clause. DISTINCT ON allows us to specify which row to keep for each unique value, based on an ORDER BY clause.
This is particularly useful for selecting the most recent or highest values in grouped data. In this article, we’ll explore the PostgreSQL DISTINCT ON syntax and walk through practical examples.
What is the PostgreSQL DISTINCT ON Clause?
- The DISTINCT ON clause in PostgreSQL allows us to retrieve one row for each unique value of one or more columns in a table.
- However, unlike the standard DISTINCT clause, which treats rows as duplicates only when every selected column is identical, DISTINCT ON compares just the columns we list and gives us more control.
- It enables us to determine which row to retain by arranging the rows in a particular order through the ORDER BY clause.
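To see the difference in practice, the following sketch runs both clauses over the same rows. The temporary table demo_scores and its data are hypothetical, chosen so that one row is an exact duplicate and another differs from it in only one column.

```sql
-- Hypothetical demo data: the first two rows are exact duplicates.
CREATE TEMP TABLE demo_scores (
    name    TEXT    NOT NULL,
    subject TEXT    NOT NULL,
    score   INTEGER NOT NULL
);

INSERT INTO demo_scores (name, subject, score)
VALUES
    ('Alice', 'Math',    90),
    ('Alice', 'Math',    90),
    ('Alice', 'Physics', 92),
    ('Bob',   'Math',    85);

-- DISTINCT compares entire rows: only the exact duplicate disappears (3 rows).
SELECT DISTINCT name, subject, score
FROM demo_scores
ORDER BY name, subject;

-- DISTINCT ON (name) compares only the name column: one row per student (2 rows).
-- "score DESC" makes the kept row the student's highest score.
SELECT DISTINCT ON (name) name, subject, score
FROM demo_scores
ORDER BY name, score DESC;
```

The first query keeps both of Alice's distinct rows, while the second keeps only her Physics row because it sorts first within her group.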
Syntax
SELECT DISTINCT ON (column1, column2, ...) column1, column2, ..., other_columns
FROM table_name
ORDER BY column1, column2, ..., sort_column [ASC | DESC];
Explanation:
- DISTINCT ON (column1, column2, ...): This part tells PostgreSQL to return the first row for each unique combination of the specified columns.
- ORDER BY: The ORDER BY clause is crucial because it determines which row from each group of duplicates is kept. It must begin with the DISTINCT ON columns; any sort columns listed after them decide which row comes first in each group. Without an ORDER BY, the row returned for each group is unpredictable.
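DISTINCT ON also accepts several expressions, in which case each unique combination of them forms a group. The sketch below assumes a hypothetical exam_attempts table in which a student can sit the same subject more than once, and keeps the best attempt for each (name, subject) pair.

```sql
-- Hypothetical table: a student may attempt the same subject several times.
CREATE TEMP TABLE exam_attempts (
    name    TEXT    NOT NULL,
    subject TEXT    NOT NULL,
    attempt INTEGER NOT NULL,
    score   INTEGER NOT NULL
);

INSERT INTO exam_attempts (name, subject, attempt, score)
VALUES
    ('Alice', 'Math',    1, 78),
    ('Alice', 'Math',    2, 90),
    ('Alice', 'Physics', 1, 92),
    ('Bob',   'Math',    1, 85),
    ('Bob',   'Math',    2, 80);

-- One row per (name, subject) pair. The leading ORDER BY columns match the
-- DISTINCT ON list; "score DESC" then decides which attempt is kept.
SELECT DISTINCT ON (name, subject) name, subject, attempt, score
FROM exam_attempts
ORDER BY name, subject, score DESC;
```

This returns three rows: Alice's second Math attempt (90), her only Physics attempt (92) and Bob's first Math attempt (85).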
Key Features of PostgreSQL DISTINCT ON
- Allows fetching the first row for each unique value (or combination of values) of the specified columns.
- Works with the ORDER BY clause to determine which row to keep in case of duplicates.
- Enables retrieving data in a more controlled manner than the standard DISTINCT.
Examples of Using PostgreSQL DISTINCT ON
Let’s explore some examples to understand how DISTINCT ON works in real-world scenarios.
Example 1: Retrieve Highest Score for Each Student
First, create a table student_scores to store students' scores in various subjects.
CREATE TABLE student_scores (
id SERIAL PRIMARY KEY,
name VARCHAR(50) NOT NULL,
subject VARCHAR(50) NOT NULL,
score INTEGER NOT NULL
);
Next, insert some sample data:
INSERT INTO student_scores (name, subject, score)
VALUES
('Alice', 'Math', 90),
('Bob', 'Math', 85),
('Alice', 'Physics', 92),
('Bob', 'Physics', 88),
('Charlie', 'Math', 95),
('Charlie', 'Physics', 90);
Now, let’s retrieve the highest score for each student across all subjects:
SELECT DISTINCT ON (name) name, subject, score
FROM student_scores
ORDER BY name, score DESC;
Output:
| name | subject | score |
|---|---|---|
| Alice | Physics | 92 |
| Bob | Physics | 88 |
| Charlie | Math | 95 |
Explanation: In this query, the DISTINCT ON (name) clause ensures that we get one row for each student. The ORDER BY name, score DESC clause sorts each student's rows from highest to lowest score, so the first row kept for each student is the one with the highest score.
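One caveat for this example: if a student had two subjects with the same top score, ORDER BY name, score DESC would not say which of them comes first, so PostgreSQL could return either. The minimal sketch below uses a hypothetical tie_scores table with a deliberate tie to show how an extra sort column makes the choice deterministic.

```sql
-- Hypothetical data: Bob scores 88 in two subjects, so "score DESC" alone
-- does not decide which of his rows comes first.
CREATE TEMP TABLE tie_scores (
    name    TEXT    NOT NULL,
    subject TEXT    NOT NULL,
    score   INTEGER NOT NULL
);

INSERT INTO tie_scores (name, subject, score)
VALUES
    ('Bob', 'Physics',   88),
    ('Bob', 'Chemistry', 88),
    ('Bob', 'Math',      85);

-- Adding "subject" as a final sort key breaks the tie alphabetically,
-- so the same row is returned on every run.
SELECT DISTINCT ON (name) name, subject, score
FROM tie_scores
ORDER BY name, score DESC, subject;
```

Here Bob's Chemistry row is returned, because Chemistry sorts before Physics.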
Example 2: Log Data – Latest Request by URL
Suppose we have a log table that records URLs and the duration of each request:
CREATE TABLE logs (
id SERIAL PRIMARY KEY,
url VARCHAR(255) NOT NULL,
request_duration INTEGER NOT NULL,
timestamp TIMESTAMP NOT NULL
);
Insert some data:
INSERT INTO logs (url, request_duration, timestamp)
VALUES
('/home', 120, '2024-01-01 10:00:00'),
('/about', 95, '2024-01-01 11:00:00'),
('/home', 110, '2024-01-01 12:00:00'),
('/contact', 105, '2024-01-01 10:30:00'),
('/about', 100, '2024-01-01 12:30:00');
To retrieve the most recent request duration for each URL, use:
SELECT DISTINCT ON (url) url, request_duration, timestamp
FROM logs
ORDER BY url, timestamp DESC;
Output:
| url | request_duration | timestamp |
|---|---|---|
| /about | 100 | 2024-01-01 12:30:00 |
| /contact | 105 | 2024-01-01 10:30:00 |
| /home | 110 | 2024-01-01 12:00:00 |
Explanation: Here, DISTINCT ON (url) returns the most recent request for each URL, because ORDER BY url, timestamp DESC places the latest row for each URL first.
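DISTINCT ON is a PostgreSQL extension rather than part of the SQL standard. For comparison, the following sketch computes the same "latest request per URL" result with the ROW_NUMBER() window function. So that it can run on its own, it copies the Example 2 data into a hypothetical temporary table named demo_logs.

```sql
-- Hypothetical copy of the Example 2 data.
CREATE TEMP TABLE demo_logs (
    url              VARCHAR(255) NOT NULL,
    request_duration INTEGER NOT NULL,
    timestamp        TIMESTAMP NOT NULL
);

INSERT INTO demo_logs (url, request_duration, timestamp)
VALUES
    ('/home',    120, '2024-01-01 10:00:00'),
    ('/about',    95, '2024-01-01 11:00:00'),
    ('/home',    110, '2024-01-01 12:00:00'),
    ('/contact', 105, '2024-01-01 10:30:00'),
    ('/about',   100, '2024-01-01 12:30:00');

-- Number the rows of each URL from newest (rn = 1) to oldest,
-- then keep only the newest row per URL.
SELECT url, request_duration, timestamp
FROM (
    SELECT url, request_duration, timestamp,
           ROW_NUMBER() OVER (PARTITION BY url ORDER BY timestamp DESC) AS rn
    FROM demo_logs
) AS ranked
WHERE rn = 1
ORDER BY url;
```

This should return the same three rows as the DISTINCT ON query above; in PostgreSQL, DISTINCT ON is simply the more compact way to write it.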
Important Points about PostgreSQL DISTINCT ON expression
- The PostgreSQL DISTINCT ON expression returns only the first row of each set of rows where the given expression has the same value, effectively removing duplicates based on the specified columns.
- Which row counts as "first" in each group is decided entirely by the ordering specified in the ORDER BY clause.
- The DISTINCT ON expressions must match the leftmost expressions in the ORDER BY clause; otherwise PostgreSQL rejects the query with an error. If the query has no ORDER BY at all, the row kept from each group is unpredictable.
- Unlike the DISTINCT clause, which treats rows as duplicates only when all selected columns match, DISTINCT ON compares only the listed expressions and lets us specify which duplicate row to keep.
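The ordering rule has a practical consequence: a DISTINCT ON query cannot use a different column as its primary sort key. A common workaround, sketched below, is to apply DISTINCT ON in a subquery and re-sort in the outer query. The temporary table recent_logs is a hypothetical copy of the Example 2 data.

```sql
-- Hypothetical copy of the Example 2 data.
CREATE TEMP TABLE recent_logs (
    url              VARCHAR(255) NOT NULL,
    request_duration INTEGER NOT NULL,
    timestamp        TIMESTAMP NOT NULL
);

INSERT INTO recent_logs (url, request_duration, timestamp)
VALUES
    ('/home',    120, '2024-01-01 10:00:00'),
    ('/about',    95, '2024-01-01 11:00:00'),
    ('/home',    110, '2024-01-01 12:00:00'),
    ('/contact', 105, '2024-01-01 10:30:00'),
    ('/about',   100, '2024-01-01 12:30:00');

-- Rejected by PostgreSQL, because ORDER BY does not start with url:
--   SELECT DISTINCT ON (url) url, request_duration, timestamp
--   FROM recent_logs
--   ORDER BY timestamp DESC;
--   ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY expressions

-- Workaround: keep the latest row per URL in a subquery,
-- then sort the surviving rows by time in the outer query.
SELECT url, request_duration, timestamp
FROM (
    SELECT DISTINCT ON (url) url, request_duration, timestamp
    FROM recent_logs
    ORDER BY url, timestamp DESC
) AS latest
ORDER BY timestamp DESC;
```

This lists the same three rows as Example 2, newest first: /about (12:30), /home (12:00), then /contact (10:30).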
Conclusion
Overall, the PostgreSQL DISTINCT ON clause helps you get unique rows based on specific columns while giving you control over which row to keep. By using the ORDER BY clause, you can decide which entry, such as the highest score or the most recent log, should be shown. This makes it a concise tool for "first row per group" queries in PostgreSQL.