Performance Troubleshooting
Table Statistics
The pg_stat_all_tables view shows statistics about accesses to all tables inside the database.
If a table is not analyzed, the PostgreSQL query planner may generate inefficient execution plans for queries against it, which degrades the performance of the PostgreSQL server.
We can use the following query to check the last date and time that statistics of tables were updated:
- last_analyze: the last time this table was manually analyzed.
- last_autoanalyze: the last time this table was analyzed by the autovacuum daemon.
SELECT
schemaname,
relname,
DATE_TRUNC('minute', last_analyze) last_analyze,
DATE_TRUNC('minute', last_autoanalyze) last_autoanalyze
FROM
pg_catalog.pg_stat_all_tables
WHERE
schemaname = 'public'
ORDER BY
last_analyze DESC NULLS FIRST,
last_autoanalyze DESC NULLS FIRST;
Sample query output:
schemaname|relname |last_analyze|last_autoanalyze |
----------+------------------+------------+-----------------------------+
public |language | | |
public |rental_by_category| | |
public |payment | | |
public |category | | |
public |store | |2024-09-22 09:52:00.000 +0700|
public |customer | |2024-09-22 09:52:00.000 +0700|
public |payment_p2022_04 | |2024-09-22 09:52:00.000 +0700|
public |city | |2024-09-22 09:52:00.000 +0700|
public |address | |2024-09-22 09:52:00.000 +0700|
We need to ensure that tables are analyzed regularly. To analyze the table manually, we can use this query:
ANALYZE <table_name>;
We can query the same data from the pg_stat_user_tables and pg_stat_sys_tables views; they are filtered to show only user-defined and system tables, respectively.
Connection Statistics
Monitoring database connections is another way to troubleshoot performance issues. It helps us understand the server's workload and tune the server configuration based on usage patterns.
Connection Summary
The pg_stat_activity and pg_settings views provide the database information we can query. The query below shows the total connections, non-idle connections, max connections, and the connection utilization percentage. By analyzing these numbers, we can identify whether the database has a connection bottleneck and make the necessary configuration adjustments.
WITH connection_state AS (
SELECT
COUNT(1) AS total_connections,
SUM(CASE WHEN state <> 'idle' THEN 1 ELSE 0 END) AS non_idle_connections
FROM pg_catalog.pg_stat_activity
),
connection_setting AS (
SELECT setting AS max_connections
FROM pg_catalog.pg_settings
WHERE name = 'max_connections'
)
SELECT
connection_state.total_connections,
connection_state.non_idle_connections,
connection_setting.max_connections,
ROUND(
(100 * connection_state.total_connections::numeric /
connection_setting.max_connections::numeric), 2
) AS utilization_percentage
FROM
connection_state,
connection_setting;
Sample query output:
total_connections|non_idle_connections|max_connections|utilization_percentage|
-----------------+--------------------+---------------+----------------------+
8 |1 |100 |8.00 |
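If the utilization percentage stays close to 100%, the connection limit may be a bottleneck. Below is a minimal sketch of raising it; the value 200 is only an example, and a connection pooler such as PgBouncer is often the better long-term fix:

```sql
-- Raise the connection limit; the new value only takes effect after a server restart.
ALTER SYSTEM SET max_connections = 200;

-- Verify whether the change is still waiting for a restart.
SELECT name, setting, pending_restart
FROM pg_catalog.pg_settings
WHERE name = 'max_connections';
```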
Non-idle Connections
From the pg_stat_activity view, we can also query how non-idle connections are distributed across queries in each database.
SELECT
datname AS database_name,
substr(query, 1, 200) truncated_query,
count(1) AS non_idle_connections
FROM
pg_catalog.pg_stat_activity
WHERE
state <> 'idle'
GROUP BY
database_name,
truncated_query
ORDER BY
non_idle_connections DESC;
Sample query output:
database_name|truncated_query       |non_idle_connections|
-------------+----------------------+--------------------+
awesome_db   |select * from table_b |                   2|
postgres     |select * from table_a |                   1|
We can also query the pg_stat_activity view to find long-running queries, which are good candidates for optimization. Long-running queries can cause spikes in CPU utilization.
The query below returns non-idle sessions that have been running for more than 10 seconds, sorted by runtime in descending order.
SELECT
now() - query_start AS runtime,
pid process_id,
datname database_name,
client_addr,
client_hostname,
query
FROM
pg_catalog.pg_stat_activity
WHERE
state <> 'idle'
AND now() - query_start > '10 seconds'::interval
ORDER BY
runtime DESC;
Sample query output:
runtime |process_id|database_name|client_addr|client_hostname|query |
----------------+----------+-------------+-----------+---------------+---------------+
00:16:23 | 101|postgres |172.18.0.1 | |select * from table_a |
If a database query takes too long to complete, we can terminate it manually using the query below.
SELECT pg_terminate_backend(<process_id>);
The pg_terminate_backend function returns true or false depending on whether the termination signal was successfully sent to the backend process. Terminating a backend ends its session abruptly and rolls back any open transaction, so we should first try pg_cancel_backend, which only requests the backend process to cancel its current query. Running pg_terminate_backend or pg_cancel_backend requires sufficient administrative privileges or a role with the pg_signal_backend privilege.
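Putting the two functions together, a cautious escalation looks like this (with <process_id> taken from the pg_stat_activity query above):

```sql
-- Step 1: ask the backend to cancel only its current query.
SELECT pg_cancel_backend(<process_id>);

-- Step 2: if the session is still stuck, terminate the whole backend.
SELECT pg_terminate_backend(<process_id>);
```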
Frequently Executing Queries
Finding frequently executed queries is another optimization method. Based on these statistics, we can review our application logic and, if necessary, change it to reduce the number of queries.
The pg_stat_statements view tracks statistics for the planning and execution of all SQL statements executed by the database server.
Below is a query that shows the number of runs per second for each query in each database.
WITH
a AS (SELECT dbid, queryid, query, calls FROM pg_stat_statements),
b AS (SELECT dbid, queryid, query, calls FROM pg_stat_statements, pg_catalog.pg_sleep(1))
SELECT
pd.datname AS database_name,
a.query AS executed_query,
sum(b.calls - a.calls) AS runs_per_second
FROM a, b, pg_catalog.pg_database pd
WHERE
a.dbid = b.dbid
AND a.queryid = b.queryid
AND pd.oid = b.dbid
GROUP BY
database_name,
executed_query
ORDER BY
runs_per_second DESC;
Sample query output:
database_name|executed_query       |runs_per_second|
-------------+---------------------+---------------+
postgres     |SELECT * from table_b|            100|
postgres     |SELECT * from table_a|             10|
If pg_stat_statements is not available by default, we can enable it with the following commands. Note that changing shared_preload_libraries requires a server restart before the extension starts collecting data:
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';
-- Restart the PostgreSQL server, then create the extension:
CREATE EXTENSION pg_stat_statements;
-- Check the shared preload libraries
SELECT * FROM pg_file_settings WHERE name = 'shared_preload_libraries';
CPU-bound SQL Queries
CPU-bound queries are queries that primarily consume CPU resources rather than waiting on I/O operations. The database server spends most of its time executing those queries on the CPU.
Some characteristics of a CPU-bound query:
- High CPU utilization: database server CPU usage spikes while the query executes.
- Long execution time: the query takes a long time to complete due to extensive data processing.
- Complex operations: the query includes complex aggregations, sorting, or calculations.
- Limited I/O: the query does not perform many I/O operations.
Some downsides of a CPU-bound query:
- Performance degradation: the query slows down the database server and affects the execution of other queries.
- Reduced throughput: the database server can process fewer queries concurrently.
- High latency: consumers may see delays when accessing data or completing transactions.
The pg_stat_statements view can be used to obtain CPU usage statistics for executed queries.
SELECT
pss.userid,
pss.dbid,
pd.datname AS database_name,
round((pss.total_exec_time + pss.total_plan_time)::numeric, 2) AS total_time,
pss.calls,
round((pss.mean_exec_time + pss.mean_plan_time)::numeric, 2) AS mean,
round((100 * (pss.total_exec_time + pss.total_plan_time) / sum((pss.total_exec_time + pss.total_plan_time)::numeric) OVER ())::numeric, 2) AS cpu_usage,
pss.query
FROM
pg_stat_statements pss,
pg_catalog.pg_database pd
WHERE
pd.oid = pss.dbid
ORDER BY
(pss.total_exec_time + pss.total_plan_time) DESC
LIMIT 50;
Sample query output (numbers are for demonstration only):
userid|dbid|database_name|total_time|calls|mean   |cpu_usage|query                |
------+----+-------------+----------+-----+-------+---------+---------------------+
    10|   5|postgres     |   5007.57|  500|1001.51|    98.81|select * from table_a|
    10|   5|postgres     |     25.08| 9937|   0.68|     0.49|select * from table_b|
The first query has a high mean execution time, while the second has a high number of calls. We can use this information to decide where to focus tuning efforts on the database server.
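When tuning iteratively, it can help to clear the accumulated statistics so that the next measurement reflects only the recent workload. This is a sketch; resetting requires appropriate privileges (superuser by default):

```sql
-- Discard all statistics gathered so far by pg_stat_statements.
SELECT pg_stat_statements_reset();
```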
Query Analysis
PostgreSQL generates a query plan for each query it receives. We can use the EXPLAIN command to see that plan.
EXPLAIN <query>;
The sample output below was produced with the ANALYZE and BUFFERS options, which execute the query and also report actual run times and buffer usage:
EXPLAIN (ANALYZE, BUFFERS) <query>;
Sample query output:
Sort (cost=291.79..293.15 rows=544 width=224) (actual time=60.754..60.760 rows=69 loops=1)
Sort Key: n.nspname, p.proname, (pg_get_function_arguments(p.oid))
Sort Method: quicksort Memory: 38kB
Buffers: shared hit=97
-> Hash Join (cost=1.08..223.93 rows=544 width=224) (actual time=11.679..60.696 rows=69 loops=1)
Hash Cond: (p.pronamespace = n.oid)
Buffers: shared hit=97
-> Seq Scan on pg_proc p (cost=0.00..210.17 rows=1087 width=73) (actual time=0.067..59.669 rows=3320 loops=1)
Filter: pg_function_is_visible(oid)
Rows Removed by Filter: 12
Buffers: shared hit=96
-> Hash (cost=1.06..1.06 rows=2 width=68) (actual time=0.011..0.011 rows=2 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1
-> Seq Scan on pg_namespace n (cost=0.00..1.06 rows=2 width=68) (actual time=0.004..0.006 rows=2 loops=1)
Filter: ((nspname <> 'pg_catalog'::name) AND (nspname <> 'information_schema'::name))
Rows Removed by Filter: 2
Buffers: shared hit=1
Planning:
Buffers: shared hit=4
Planning Time: 0.288 ms
Execution Time: 60.802 ms
The output of the EXPLAIN command is a tree of plan nodes. Each node has one line in the plan tree, showing the basic node type and the estimated cost that the query planner computed for executing that node.
I usually use the online tool https://explain.depesz.com to parse the EXPLAIN output into a more readable format and use it to find performance issues in database queries.
We may have a dedicated article on this topic, but let's start with the basics. Stay tuned!
Bloated Indexes
In PostgreSQL, if your application has many frequent INSERT, DELETE, or UPDATE operations, the indexes may become bloated. Index bloat occurs when an index contains outdated or stale entries, taking up more disk space than necessary. This makes the index less efficient, leading to slower processing time.
Normally, deleted or stale index entries are not physically removed; they are kept until the VACUUM command is executed. This command cleans up dead entries and reclaims disk space.
To run VACUUM for a specific table and its indexes, use the following command:
VACUUM <table_name>;
PostgreSQL can execute the VACUUM and ANALYZE commands automatically. You can refer to this link for more information: https://www.postgresql.org/docs/current/routine-vacuuming.html#AUTOVACUUM
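Note that plain VACUUM makes dead index entries reusable but does not shrink an index that is already bloated; rebuilding the index does. A sketch for PostgreSQL 12 and later, where the index name is hypothetical:

```sql
-- Rebuild a single bloated index without blocking concurrent writes.
REINDEX INDEX CONCURRENTLY idx_payment_customer_id;

-- Or rebuild every index of a table at once.
REINDEX TABLE CONCURRENTLY payment;
```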
We can use the pg_stat_all_tables view to query information about tuples and the VACUUM process for each table inside the database server.
SELECT
schemaname,
relname,
n_tup_ins,
n_tup_upd,
n_tup_del,
n_live_tup,
n_dead_tup,
date_trunc('minute', last_vacuum) last_vacuum,
date_trunc('minute', last_autovacuum) last_autovacuum
FROM
pg_catalog.pg_stat_all_tables
WHERE
schemaname = 'public'
ORDER BY
n_dead_tup DESC;
- n_tup_ins: the number of rows inserted.
- n_tup_upd: the number of rows updated.
- n_tup_del: the number of rows deleted.
- n_live_tup: the estimated number of live rows.
- n_dead_tup: the estimated number of dead rows.
- last_vacuum: the last time the table was vacuumed manually.
- last_autovacuum: the last time the table was vacuumed by the autovacuum daemon.
Sample query output:
schemaname|relname |n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|last_vacuum|last_autovacuum |
----------+------------------+---------+---------+---------+----------+----------+-----------+-----------------------------+
public |address | 603| 0| 0| 603| 41| | |
public |payment_p2022_05 | 2677| 0| 0| 2677| 1312| |2024-09-22 09:52:00.000 +0700|
public |film_actor | 5462| 0| 0| 5462| 0| |2024-09-22 09:52:00.000 +0700|
public |payment_p2022_07 | 2334| 0| 0| 2334| 0| |2024-09-22 09:52:00.000 +0700|
Dead tuples are rows that have been marked as deleted but not yet removed from the table. A high n_dead_tup value indicates table bloat.
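A quick way to rank bloat candidates is to compute the dead-tuple ratio per table from the same view. This is a sketch; the ratio is only an estimate, because the tuple counts themselves are estimates:

```sql
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup,
ROUND(100 * n_dead_tup::numeric
      / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_tuple_pct
FROM pg_catalog.pg_stat_all_tables
WHERE n_dead_tup > 0
ORDER BY dead_tuple_pct DESC NULLS LAST;
```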