PostgreSQL - DISTINCT ON expression
Last Updated: 11 Oct, 2024
The DISTINCT ON clause in PostgreSQL returns one row for each distinct value (or combination of values) of the expressions we list, which makes it more flexible than the standard DISTINCT clause. Combined with an ORDER BY clause, DISTINCT ON lets us choose which row is kept for each distinct value.
This is particularly useful for selecting the most recent or highest value within each group of rows. In this article, we’ll explore the PostgreSQL DISTINCT ON syntax and work through examples.
What is the PostgreSQL DISTINCT ON Clause?
- The DISTINCT ON clause in PostgreSQL returns one row for each distinct value of one or more columns (or expressions) in a table.
- The standard DISTINCT clause only removes rows that are identical across every selected column. DISTINCT ON compares just the listed columns, so the remaining columns of the rows in a group may differ.
- The ORDER BY clause determines which row is retained from each group: the first row in sort order.
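To make the contrast with plain DISTINCT concrete, here is a minimal sketch. The rows are hypothetical and supplied inline through a VALUES list, so no table is needed; the CTE name scores is an assumption made for this illustration only.

```sql
-- Plain DISTINCT compares whole rows: only the exact duplicate is dropped,
-- so two rows for Alice remain (Math 90 and Physics 92).
WITH scores (name, subject, score) AS (
    VALUES
        ('Alice', 'Math', 90),
        ('Alice', 'Math', 90),      -- exact duplicate of the row above
        ('Alice', 'Physics', 92)
)
SELECT DISTINCT name, subject, score
FROM scores;

-- DISTINCT ON (name) compares only the name column: one row per name is kept,
-- and ORDER BY makes that row the one with the highest score (Physics, 92).
WITH scores (name, subject, score) AS (
    VALUES
        ('Alice', 'Math', 90),
        ('Alice', 'Math', 90),
        ('Alice', 'Physics', 92)
)
SELECT DISTINCT ON (name) name, subject, score
FROM scores
ORDER BY name, score DESC;
```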
Syntax
SELECT DISTINCT ON (column1, column2, ...) column1, column2, ...
FROM table_name
ORDER BY column1, column2, ...;
Explanation:
- DISTINCT ON (column1, column2, ...): This part tells PostgreSQL to return the first row for each unique combination of the specified columns.
- ORDER BY: The ORDER BY clause determines which row from each group of duplicates is kept. It must begin with the DISTINCT ON expressions; any further sort expressions decide which row comes first within each group. Without ORDER BY, the row returned for each group is unpredictable.
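The syntax also accepts more than one column inside DISTINCT ON. The following sketch uses hypothetical inline rows (the CTE name attempts is an assumption) and keeps the best score for each (name, subject) pair.

```sql
-- One row per (name, subject) combination, keeping the highest score.
WITH attempts (name, subject, score) AS (
    VALUES
        ('Alice', 'Math', 70),
        ('Alice', 'Math', 90),
        ('Alice', 'Physics', 92),
        ('Bob', 'Math', 85)
)
SELECT DISTINCT ON (name, subject) name, subject, score
FROM attempts
-- ORDER BY starts with the DISTINCT ON columns; score DESC then decides
-- which row comes first within each (name, subject) group.
ORDER BY name, subject, score DESC;
```

With these rows the query returns three rows: Alice's Math score of 90, Alice's Physics score of 92, and Bob's Math score of 85.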
Key Features of PostgreSQL DISTINCT ON
- Returns the first row for each distinct value of the specified columns.
- Works with the ORDER BY clause to determine which row to keep in case of duplicates.
- Gives more control over the result than the standard DISTINCT.
Examples of Using PostgreSQL DISTINCT ON
Let’s explore some examples to understand how DISTINCT ON works in real-world scenarios.
Example 1: Retrieve Highest Score for Each Student
First, create a table student_scores to store students' scores in various subjects.
CREATE TABLE student_scores (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    subject VARCHAR(50) NOT NULL,
    score INTEGER NOT NULL
);
Next, insert some sample data:
INSERT INTO student_scores (name, subject, score)
VALUES
    ('Alice', 'Math', 90),
    ('Bob', 'Math', 85),
    ('Alice', 'Physics', 92),
    ('Bob', 'Physics', 88),
    ('Charlie', 'Math', 95),
    ('Charlie', 'Physics', 90);
Now, let’s retrieve the highest score for each student in any subject:
SELECT DISTINCT ON (name) name, subject, score
FROM student_scores
ORDER BY name, score DESC;
Output:
name | subject | score |
---|---|---|
Alice | Physics | 92 |
Bob | Physics | 88 |
Charlie | Math | 95 |
Explanation: In this query, the DISTINCT ON (name) clause ensures that we get one row for each student. The ORDER BY clause sorts the rows by name and then by score in descending order, so the first row in each student's group, which is the one kept, holds that student's highest score.
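For comparison, the same "top row per group" result can be sketched with the standard ROW_NUMBER() window function. This is an alternative formulation, not part of the DISTINCT ON syntax; it assumes the student_scores table and sample data created above, and the aliases ranked and rn are names chosen for this illustration.

```sql
-- Number each student's rows from highest to lowest score,
-- then keep only the row numbered 1 in each group.
SELECT name, subject, score
FROM (
    SELECT name,
           subject,
           score,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY score DESC) AS rn
    FROM student_scores
) AS ranked
WHERE rn = 1
ORDER BY name;
```

On the sample data this returns the same three rows as the DISTINCT ON query. The window-function form is more portable across databases, while DISTINCT ON is shorter to write in PostgreSQL.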
Example 2: Log Data – Latest Request by URL
Suppose we have a log table that records URLs and the duration of each request:
CREATE TABLE logs (
    id SERIAL PRIMARY KEY,
    url VARCHAR(255) NOT NULL,
    request_duration INTEGER NOT NULL,
    timestamp TIMESTAMP NOT NULL
);
Insert some data:
INSERT INTO logs (url, request_duration, timestamp)
VALUES
    ('/home', 120, '2024-01-01 10:00:00'),
    ('/about', 95, '2024-01-01 11:00:00'),
    ('/home', 110, '2024-01-01 12:00:00'),
    ('/contact', 105, '2024-01-01 10:30:00'),
    ('/about', 100, '2024-01-01 12:30:00');
To retrieve the most recent request duration for each URL, use:
SELECT DISTINCT ON (url) url, request_duration, timestamp
FROM logs
ORDER BY url, timestamp DESC;
Output:
url | request_duration | timestamp |
---|---|---|
/about | 100 | 2024-01-01 12:30:00 |
/contact | 105 | 2024-01-01 10:30:00 |
/home | 110 | 2024-01-01 12:00:00 |
Explanation: Here, DISTINCT ON (url) returns the most recent request for each URL because the ORDER BY url, timestamp DESC clause places the newest row first within each URL's group.
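Because ORDER BY has to start with the DISTINCT ON column, the output above is sorted by url. If the final result should be sorted differently, for example newest request first, one common approach is to wrap the query in a subquery. The sketch below assumes the logs table and sample data from this example; the alias latest is a name chosen for this illustration.

```sql
-- Inner query: pick the most recent row for each URL.
-- Outer query: re-sort those rows by recency instead of by URL.
SELECT url, request_duration, timestamp
FROM (
    SELECT DISTINCT ON (url) url, request_duration, timestamp
    FROM logs
    ORDER BY url, timestamp DESC
) AS latest
ORDER BY timestamp DESC;
```

With the sample data, the rows come back in the order /about, /home, /contact.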
Important Points about PostgreSQL DISTINCT ON expression
- The PostgreSQL DISTINCT ON expression returns only the first row of each set of rows where the given expressions have the same value, effectively removing duplicates based on the specified columns.
- Which row counts as the "first row" of each group depends on the ordering specified in the ORDER BY clause; without ORDER BY, the choice is unpredictable.
- The DISTINCT ON expressions must match the leftmost expressions in the ORDER BY clause; otherwise PostgreSQL rejects the query with an error.
- Unlike the DISTINCT clause, which only removes rows that are duplicates across all selected columns, DISTINCT ON allows more fine-grained control over which row is kept.
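Two of these points can be sketched against the student_scores table from Example 1 (assumed to exist with the sample data above). The first statement is expected to fail, so it is best run interactively rather than inside a transaction. The second adds id as a tie-breaker, an illustrative choice that makes the kept row deterministic when two rows for the same student have the same score.

```sql
-- Fails: ORDER BY does not start with the DISTINCT ON expression.
SELECT DISTINCT ON (name) name, subject, score
FROM student_scores
ORDER BY score DESC;
-- ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY expressions

-- Works: ORDER BY starts with name, then sorts by score, then by id.
-- If a student has two rows with the same top score, the lower id wins.
SELECT DISTINCT ON (name) id, name, subject, score
FROM student_scores
ORDER BY name, score DESC, id;
```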
Conclusion
Overall, the PostgreSQL DISTINCT ON clause helps you get unique rows based on specific columns while giving you control over which row to keep. By using the ORDER BY clause, you can decide which entry, such as the highest score or the most recent log, should be shown. This makes it a concise way to write "first row per group" queries in PostgreSQL.