Performance Troubleshooting
Table Statistics
The pg_stat_all_tables view shows statistics about accesses to all tables inside the database.
If a table is not analyzed, the PostgreSQL query planner may generate inefficient execution plans for queries against it, which degrades the performance of the PostgreSQL server.
We can use the following query to check the last date and time that statistics of tables were updated:
- last_analyze: the last time this table was manually analyzed.
- last_autoanalyze: the last time this table was analyzed by the autovacuum daemon.
SELECT
schemaname,
relname,
DATE_TRUNC('minute', last_analyze) last_analyze,
DATE_TRUNC('minute', last_autoanalyze) last_autoanalyze
FROM
pg_catalog.pg_stat_all_tables
WHERE
schemaname = 'public'
ORDER BY
last_analyze DESC NULLS FIRST,
last_autoanalyze DESC NULLS FIRST;
Sample query output:
schemaname|relname |last_analyze|last_autoanalyze |
----------+------------------+------------+-----------------------------+
public |language | | |
public |rental_by_category| | |
public |payment | | |
public |category | | |
public |store | |2024-09-22 09:52:00.000 +0700|
public |customer | |2024-09-22 09:52:00.000 +0700|
public |payment_p2022_04 | |2024-09-22 09:52:00.000 +0700|
public |city | |2024-09-22 09:52:00.000 +0700|
public |address | |2024-09-22 09:52:00.000 +0700|
We need to ensure that tables are analyzed regularly. To analyze the table manually, we can use this query:
ANALYZE <table_name>;
We can query the same data from the pg_stat_user_tables and pg_stat_sys_tables views; they are filtered to show only user-defined and system tables, respectively.
Connection Statistics
Monitoring database connections is another way to troubleshoot performance issues. It helps us understand the server's workload and tune the server configuration based on usage patterns.
Connection Summary
The pg_stat_activity and pg_settings views provide the database information we can query. The query below shows the total connections, non-idle connections, max connections, and the connection utilization percentage. By analyzing these numbers, we can identify whether the database has a connection bottleneck and make the necessary configuration adjustments.
WITH connection_state AS (
SELECT
COUNT(1) AS total_connections,
SUM(CASE WHEN state <> 'idle' THEN 1 ELSE 0 END) AS non_idle_connections
FROM pg_catalog.pg_stat_activity
),
connection_setting AS (
SELECT setting AS max_connections
FROM pg_catalog.pg_settings
WHERE name = 'max_connections'
)
SELECT
connection_state.total_connections,
connection_state.non_idle_connections,
connection_setting.max_connections,
ROUND(
(100 * connection_state.total_connections::numeric /
connection_setting.max_connections::numeric), 2
) AS utilization_percentage
FROM
connection_state,
connection_setting;
Sample query output:
total_connections|non_idle_connections|max_connections|utilization_percentage|
-----------------+--------------------+---------------+----------------------+
8 |1 |100 |8.00 |
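If the utilization percentage stays close to 100%, the connection limit may be a bottleneck. Below is a minimal sketch of raising it; the value 200 is only an example, and a connection pooler such as PgBouncer is often the better long-term fix:

```sql
-- Raise the connection limit; the new value only takes effect after a server restart.
ALTER SYSTEM SET max_connections = 200;

-- Verify whether the change is still waiting for a restart.
SELECT name, setting, pending_restart
FROM pg_catalog.pg_settings
WHERE name = 'max_connections';
```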
Non-idle Connections
From the pg_stat_activity view, we can also query how non-idle connections are distributed across queries in each database.
SELECT
datname AS database_name,
substr(query, 1, 200) truncated_query,
count(1) AS non_idle_connections
FROM
pg_catalog.pg_stat_activity
WHERE
state <> 'idle'
GROUP BY
database_name,
truncated_query
ORDER BY
non_idle_connections DESC;
Sample query output:
database_name|truncated_query       |non_idle_connections|
-------------+----------------------+--------------------+
awesome_db   |select * from table_b |                   2|
postgres     |select * from table_a |                   1|
We can also query the pg_stat_activity view to find long-running queries, which are good candidates for optimization. Long-running queries can cause spikes in CPU utilization.
The query below returns non-idle sessions that have been running for more than 10 seconds, sorted by runtime in descending order.
SELECT
now() - query_start AS runtime,
pid process_id,
datname database_name,
client_addr,
client_hostname,
query
FROM
pg_catalog.pg_stat_activity
WHERE
state <> 'idle'
AND now() - query_start > '10 seconds'::interval
ORDER BY
runtime DESC;
Sample query output:
runtime |process_id|database_name|client_addr|client_hostname|query |
----------------+----------+-------------+-----------+---------------+---------------+
00:16:23 | 101|postgres |172.18.0.1 | |select * from table_a |
If a database query takes too long to complete, we can terminate it manually using the query below.
SELECT pg_terminate_backend(<process_id>);
The pg_terminate_backend function returns true or false depending on whether the termination signal was successfully sent to the backend process. Terminating a backend ends its session abruptly and rolls back any open transaction, so we should first try pg_cancel_backend, which only requests the backend process to cancel its current query. Running pg_terminate_backend or pg_cancel_backend requires sufficient administrative privileges or a role with the pg_signal_backend privilege.
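Putting the two functions together, a cautious escalation looks like this (with <process_id> taken from the pg_stat_activity query above):

```sql
-- Step 1: ask the backend to cancel only its current query.
SELECT pg_cancel_backend(<process_id>);

-- Step 2: if the session is still stuck, terminate the whole backend.
SELECT pg_terminate_backend(<process_id>);
```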
Frequently Executing Queries
Finding frequently executed queries is another optimization method. Based on these statistics, we can review our application logic and, if necessary, change it to reduce the number of queries.
The pg_stat_statements view tracks statistics for the planning and execution of all SQL statements executed by the database server.
Below is a query that shows the number of runs per second for each query in each database.
WITH
a AS (SELECT dbid, queryid, query, calls FROM pg_stat_statements),
b AS (SELECT dbid, queryid, query, calls FROM pg_stat_statements, pg_catalog.pg_sleep(1))
SELECT
pd.datname AS database_name,
a.query AS executed_query,
sum(b.calls - a.calls) AS runs_per_second
FROM a, b, pg_catalog.pg_database pd
WHERE
a.dbid = b.dbid
AND a.queryid = b.queryid
AND pd.oid = b.dbid
GROUP BY
database_name,
executed_query
ORDER BY
runs_per_second DESC;
Sample query output:
database_name|executed_query       |runs_per_second|
-------------+---------------------+---------------+
postgres     |SELECT * from table_b|            100|
postgres     |SELECT * from table_a|             10|
If pg_stat_statements is not available by default, we can enable it with the following commands. Note that changing shared_preload_libraries requires a server restart before the extension starts collecting data:
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';
-- Restart the PostgreSQL server, then create the extension:
CREATE EXTENSION pg_stat_statements;
-- Check the shared preload libraries
SELECT * FROM pg_file_settings WHERE name = 'shared_preload_libraries';
CPU-bound SQL Queries
CPU-bound queries are queries that primarily consume CPU resources rather than waiting on I/O operations. The database server spends most of its time executing those queries on the CPU.
Some characteristics of a CPU-bound query:
- High CPU utilization: database server CPU usage spikes while the query executes.
- Long execution time: the query takes a long time to complete due to extensive data processing.
- Complex operations: the query includes complex aggregations, sorting, or calculations.
- Limited I/O: the query does not perform many I/O operations.
Some downsides of a CPU-bound query:
- Performance degradation: the query slows down the database server and affects the execution of other queries.
- Reduced throughput: the database server can process fewer queries concurrently.
- High latency: consumers may see delays when accessing data or completing transactions.
The pg_stat_statements view can be used to obtain CPU usage statistics for executed queries.
SELECT
pss.userid,
pss.dbid,
pd.datname AS database_name,
round((pss.total_exec_time + pss.total_plan_time)::numeric, 2) AS total_time,
pss.calls,
round((pss.mean_exec_time + pss.mean_plan_time)::numeric, 2) AS mean,
round((100 * (pss.total_exec_time + pss.total_plan_time) / sum((pss.total_exec_time + pss.total_plan_time)::numeric) OVER ())::numeric, 2) AS cpu_usage,
pss.query
FROM
pg_stat_statements pss,
pg_catalog.pg_database pd
WHERE
pd.oid = pss.dbid
ORDER BY
(pss.total_exec_time + pss.total_plan_time) DESC
LIMIT 50;
Sample query output (numbers are for demonstration only):
userid|dbid|database_name|total_time|calls|mean   |cpu_usage|query                |
------+----+-------------+----------+-----+-------+---------+---------------------+
    10|   5|postgres     |   5007.57|  500|1001.51|    98.81|select * from table_a|
    10|   5|postgres     |     25.08| 9937|   0.68|     0.49|select * from table_b|
The first query has a high mean execution time, while the second has a high number of calls. We can use this information to decide where to focus tuning efforts on the database server.
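When tuning iteratively, it can help to clear the accumulated statistics so that the next measurement reflects only the recent workload. This is a sketch; resetting requires appropriate privileges (superuser by default):

```sql
-- Discard all statistics gathered so far by pg_stat_statements.
SELECT pg_stat_statements_reset();
```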
Query Analysis
PostgreSQL generates a query plan for each query it receives. We can use the EXPLAIN command to see that plan.
EXPLAIN <query>;
The sample output below was produced with the ANALYZE and BUFFERS options, which execute the query and also report actual run times and buffer usage:
EXPLAIN (ANALYZE, BUFFERS) <query>;
Sample query output:
Sort (cost=291.79..293.15 rows=544 width=224) (actual time=60.754..60.760 rows=69 loops=1)
Sort Key: n.nspname, p.proname, (pg_get_function_arguments(p.oid))
Sort Method: quicksort Memory: 38kB
Buffers: shared hit=97
-> Hash Join (cost=1.08..223.93 rows=544 width=224) (actual time=11.679..60.696 rows=69 loops=1)
Hash Cond: (p.pronamespace = n.oid)
Buffers: shared hit=97
-> Seq Scan on pg_proc p (cost=0.00..210.17 rows=1087 width=73) (actual time=0.067..59.669 rows=3320 loops=1)
Filter: pg_function_is_visible(oid)
Rows Removed by Filter: 12
Buffers: shared hit=96
-> Hash (cost=1.06..1.06 rows=2 width=68) (actual time=0.011..0.011 rows=2 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1
-> Seq Scan on pg_namespace n (cost=0.00..1.06 rows=2 width=68) (actual time=0.004..0.006 rows=2 loops=1)
Filter: ((nspname <> 'pg_catalog'::name) AND (nspname <> 'information_schema'::name))
Rows Removed by Filter: 2
Buffers: shared hit=1
Planning:
Buffers: shared hit=4
Planning Time: 0.288 ms
Execution Time: 60.802 ms
The output of the EXPLAIN command is a tree of plan nodes. Each node has one line in the plan tree, showing the basic node type and the estimated cost that the query planner computed for executing that node.
I usually use the online tool https://explain.depesz.com to parse the EXPLAIN output into a more readable format and use it to find performance issues in database queries.
We may have a dedicated article on this topic, but let's start with the basics. Stay tuned!
Bloated Indexes
In PostgreSQL, if your application has many frequent INSERT, DELETE, or UPDATE operations, the indexes may become bloated. Index bloat occurs when an index contains outdated or stale entries, taking up more disk space than necessary. This makes the index less efficient, leading to slower processing time.
Normally, deleted or stale index entries are not physically removed; they are kept until the VACUUM command is executed. This command cleans up dead entries and reclaims disk space.
To run VACUUM for a specific table and its indexes, use the following command:
VACUUM <table_name>;
PostgreSQL can execute the VACUUM and ANALYZE commands automatically. You can refer to this link for more information: https://www.postgresql.org/docs/current/routine-vacuuming.html#AUTOVACUUM
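Note that plain VACUUM makes dead index entries reusable but does not shrink an index that is already bloated; rebuilding the index does. A sketch for PostgreSQL 12 and later, where the index name is hypothetical:

```sql
-- Rebuild a single bloated index without blocking concurrent writes.
REINDEX INDEX CONCURRENTLY idx_payment_customer_id;

-- Or rebuild every index of a table at once.
REINDEX TABLE CONCURRENTLY payment;
```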
We can use the pg_stat_all_tables view to query information about tuples and the VACUUM process for each table inside the database server.
SELECT
schemaname,
relname,
n_tup_ins,
n_tup_upd,
n_tup_del,
n_live_tup,
n_dead_tup,
date_trunc('minute', last_vacuum) last_vacuum,
date_trunc('minute', last_autovacuum) last_autovacuum
FROM
pg_catalog.pg_stat_all_tables
WHERE
schemaname = 'public'
ORDER BY
n_dead_tup DESC;
- n_tup_ins: the number of rows inserted.
- n_tup_upd: the number of rows updated.
- n_tup_del: the number of rows deleted.
- n_live_tup: the estimated number of live rows.
- n_dead_tup: the estimated number of dead rows.
- last_vacuum: the last time the table was vacuumed manually.
- last_autovacuum: the last time the table was vacuumed by the autovacuum daemon.
Sample query output:
schemaname|relname |n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|last_vacuum|last_autovacuum |
----------+------------------+---------+---------+---------+----------+----------+-----------+-----------------------------+
public |address | 603| 0| 0| 603| 41| | |
public |payment_p2022_05 | 2677| 0| 0| 2677| 1312| |2024-09-22 09:52:00.000 +0700|
public |film_actor | 5462| 0| 0| 5462| 0| |2024-09-22 09:52:00.000 +0700|
public |payment_p2022_07 | 2334| 0| 0| 2334| 0| |2024-09-22 09:52:00.000 +0700|
Dead tuples are rows that have been marked as deleted but not yet removed from the table. A high n_dead_tup value indicates table bloat.
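A quick way to rank bloat candidates is to compute the dead-tuple ratio per table from the same view. This is a sketch; the ratio is only an estimate, because the tuple counts themselves are estimates:

```sql
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup,
ROUND(100 * n_dead_tup::numeric
      / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_tuple_pct
FROM pg_catalog.pg_stat_all_tables
WHERE n_dead_tup > 0
ORDER BY dead_tuple_pct DESC NULLS LAST;
```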