How to Calculate Percentiles in PostgreSQL

Aggregate functions in databases operate on a set of rows and return a single summary value. PERCENTILE_CONT is an ordered-set aggregate function in PostgreSQL that computes a percentile assuming a continuous distribution, interpolating between adjacent values when needed. PERCENTILE_DISC is also an ordered-set aggregate function; it computes a percentile for a discrete distribution and always returns a value that actually exists in the data.

This guide will explain how to calculate percentiles in PostgreSQL.

How to Calculate Percentile/Median in PostgreSQL?

The PERCENTILE_CONT function calculates a percentile based on a continuous distribution of the column values in a table. Because it interpolates between neighboring values, the result may not be equal to any specific value stored in the table. The PERCENTILE_DISC function instead returns the first value in the sort order whose position equals or exceeds the requested fraction, so its result is always one of the existing values. The median is simply the 50th percentile, obtained with a fraction of 0.5.
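The difference between the two functions can be seen on a small inline data set. The following is a minimal sketch; the alias names `t`, `v`, `median_cont`, and `median_disc` are made up for illustration and are not part of the sales example in this guide:

```sql
-- Four sample values: 10, 20, 30, 40 (illustrative data only)
SELECT
 PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY v) AS median_cont,
 PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY v) AS median_disc
 FROM (VALUES (10), (20), (30), (40)) AS t(v);
-- median_cont = 25 (interpolated between 20 and 30)
-- median_disc = 20 (first existing value at or past the 50% position)
```

With an even number of rows, the continuous median falls between the two middle values, while the discrete median picks one of the stored values.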

Syntax

The syntax for using PERCENTILE_CONT on a PostgreSQL table is as follows:

SELECT
 PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY column_name),
 PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY column_name),
 PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY column_name),
 PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY column_name)
 FROM table_name;

Here:

- The query applies several percentile functions to the same column, each sorting the column's values with the ORDER BY clause.
- The WITHIN GROUP clause specifies how the input rows are ordered before the aggregate reduces them to a single value.
- The user can apply a single PERCENTILE_CONT function with any fraction between 0 and 1, or several functions as shown above.
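As an alternative to repeating the function, PostgreSQL also accepts an array of fractions and returns an array of results. The sketch below assumes an inline series of the integers 1 to 5 in place of a real table; the aliases `g` and `quartiles` are hypothetical:

```sql
-- Compute the 25th, 50th, and 75th percentiles in one call
SELECT
 PERCENTILE_CONT(ARRAY[0.25, 0.5, 0.75])
  WITHIN GROUP (ORDER BY g) AS quartiles
 FROM generate_series(1, 5) AS g;
-- quartiles = {2,3,4}
```

The array form keeps the query short when many percentiles are needed, at the cost of returning one array column instead of separate columns.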

The following is the syntax for using the PERCENTILE_DISC function in PostgreSQL:

SELECT
 PERCENTILE_DISC(0.50) WITHIN GROUP (ORDER BY column_name)
 FROM table_name;

The above query is nearly identical to the PERCENTILE_CONT query. The only difference is the function name, PERCENTILE_DISC, which also takes the percentile fraction as its parameter.

Example 1: PERCENTILE_CONT in PostgreSQL

Use the following query to get the data from the sales table, sorted with the ORDER BY clause on the sale column:

SELECT * FROM sales
 ORDER BY sale;

The data is displayed in ascending order by the sale column:

img

Use the following code to get the 25th percentile of the sale column:

SELECT PERCENTILE_CONT(0.25) 
 WITHIN GROUP(ORDER BY sale) 
 FROM sales;

The above query computes the 25th percentile of the sale column in the sales table. Because the function is an aggregate, the whole column is reduced to a single percentile value:

img

The following query uses multiple PERCENTILE_CONT functions in a single statement to get several percentiles of the sale column:

SELECT 
 PERCENTILE_CONT(0.25) WITHIN GROUP(ORDER BY sale),
 PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY sale),
 PERCENTILE_CONT(0.75) WITHIN GROUP(ORDER BY sale)
 FROM sales;

The following screenshot displays 3 percentiles which are the 25th, 50th, and 75th percentiles:

img
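Because PERCENTILE_CONT is an aggregate, it can also be combined with GROUP BY to get one percentile per group. The sketch below is self-contained and uses made-up inline data; the `region` column and the `sales_sample` alias are hypothetical and are not part of the sales table shown in this guide:

```sql
-- Median sale per region, computed on illustrative inline data
SELECT
 region,
 PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sale) AS median_sale
 FROM (VALUES
  ('east', 100), ('east', 200), ('east', 300),
  ('west', 50), ('west', 150)
 ) AS sales_sample(region, sale)
 GROUP BY region
 ORDER BY region;
-- east | 200
-- west | 100 (interpolated between 50 and 150)
```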

Example 2: PERCENTILE_DISC in PostgreSQL

The following example displays the use of the PERCENTILE_DISC function in PostgreSQL on the sales table:

SELECT * FROM sales
 ORDER BY sale;

The above query displays the data from the sales table which is ordered by the sale column:

img

The following query returns the 50th discrete percentile (the median) of the sale column:

SELECT PERCENTILE_DISC(0.5) 
 WITHIN GROUP(ORDER BY sale) 
 FROM sales;

The following screenshot displays the 50th discrete percentile from the sale column:

img
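Since PERCENTILE_DISC always returns an existing value rather than an interpolated one, it can be applied to any sortable type, not only numbers. A minimal sketch with made-up text values (the aliases `t`, `grade`, and `middle_grade` are hypothetical):

```sql
-- The discrete median of three text values is the middle one in sort order
SELECT
 PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY grade) AS middle_grade
 FROM (VALUES ('a'), ('b'), ('c')) AS t(grade);
-- middle_grade = b
```

PERCENTILE_CONT would reject this input, because interpolation is only defined for types such as double precision and interval.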

That’s all about calculating percentiles in PostgreSQL.

Conclusion

To calculate the median or other percentiles in PostgreSQL, the PERCENTILE_CONT and PERCENTILE_DISC functions can be used. Both are ordered-set aggregate functions, and they evaluate continuous and discrete distributions of the data respectively. The user can also apply multiple PERCENTILE_CONT and PERCENTILE_DISC functions in a single query to find several percentiles at once, including the median (the 50th percentile). This guide demonstrated the process of calculating percentiles in the PostgreSQL database using multiple examples.