This is a test table from an Oracle to Postgres migration. A dozen or so columns have been removed to keep the test case readable. I did not design this table, but the customer is fixing it (adding a proper primary key, changing columns to boolean and integer where appropriate, etc.).
Table "public.costcenter"
      Column       |            Type             |     Modifiers
-------------------+-----------------------------+--------------------
 costcenterid      | numeric                     | not null
 costcenterno      | character varying(100)      | not null
 amount            | numeric(18,2)               |
 closed            | boolean                     | not null
 enteruser         | integer                     |
 phonelines        | numeric                     |
 cckeyid           | character varying(100)      |
 country           | character varying(2)        |
 level4            | character varying(100)      |
 level5            | character varying(100)      |
 level6            | character varying(100)      |
 level7            | character varying(100)      |
 level8            | character varying(100)      |
 level9            | character varying(100)      |
 latitude          | character varying(20)       |
 longitude         | character varying(20)       |
 lastactiondate    | timestamp without time zone |
 lastactionuserid  | numeric                     |
 countryid         | numeric(3,0)                |
 stateid           | numeric(4,0)                |
Indexes:
"costcenter_idx" PRIMARY KEY, btree (costcenterid) "enteruser_cost" btree (costcenterid, enteruser)
General settings
- PostgreSQL 9.5.4
- Linux (because there is only Linux)
- random_page_cost and seq_page_cost both set to 1.0 (see the example after this list)
- Standard (100) default_statistics_target
- 8392 rows
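The cost settings listed above could be applied at the session level as shown below; how they were actually set for these tests is not shown, and ALTER SYSTEM or postgresql.conf would work just as well:

-- Sketch: session-level version of the planner settings used for these tests
SET random_page_cost = 1.0;
SET seq_page_cost = 1.0;
SHOW default_statistics_target;  -- expected to report 100, the default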
Distribution of columns costcenterid, enteruser:
 costcenterid | enteruser
--------------+-----------
          109 |        43
            3 |        46
           12 |       623
         1287 |        44
         6936 |         2
            3 |       848
            2 |         1
           40 |        50
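The query that produced this distribution is not shown; a per-enteruser row count, which lines up with the row counts in the plans below, could be gathered with something like:

-- Sketch: row count per enteruser value (assumed query, not from the original output)
SELECT count(costcenterid) AS costcenterid, enteruser
FROM costcenter
GROUP BY enteruser;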
Query Plan Examples:
cs=# explain analyze select count(*) from costcenter where enteruser = 43;
                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=181.99..182.00 rows=1 width=0) (actual time=0.299..0.299 rows=1 loops=1)
   ->  Index Only Scan using enteruser_cost on costcenter  (cost=0.29..181.72 rows=109 width=0) (actual time=0.150..0.289 rows=109 loops=1)
         Index Cond: (enteruser = 43)
         Heap Fetches: 109
 Planning time: 0.095 ms
 Execution time: 0.319 ms

cs=# explain analyze select count(*) from costcenter where enteruser = 2;
                                                     QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=290.24..290.25 rows=1 width=0) (actual time=2.052..2.052 rows=1 loops=1)
   ->  Seq Scan on costcenter  (cost=0.00..272.90 rows=6936 width=0) (actual time=0.008..1.717 rows=6936 loops=1)
         Filter: (enteruser = 2)
         Rows Removed by Filter: 1456
 Planning time: 0.062 ms
 Execution time: 2.079 ms

cs=# set enable_seqscan to false;
cs=# explain analyze select count(*) from costcenter where enteruser = 2;
                                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=352.71..352.72 rows=1 width=0) (actual time=3.355..3.355 rows=1 loops=1)
   ->  Index Only Scan using enteruser_cost on costcenter  (cost=0.29..335.37 rows=6936 width=0) (actual time=0.020..2.982 rows=6936 loops=1)
         Index Cond: (enteruser = 2)
         Heap Fetches: 6936
 Planning time: 0.082 ms
 Execution time: 3.388 ms
As you can see, with a proper data distribution the planner will choose to use the second column within an index. It will also choose the right plan (a sequential scan) when scanning the table sequentially is cheaper than using the index. Keep in mind, however, that whether using the second column of a multicolumn index as the effective index for that column is the most efficient choice depends very much on how much data is being accessed and on how that data is distributed.
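If queries that filter only on enteruser turn out to be frequent and the multicolumn index is not efficient enough for them, the obvious follow-up test is a dedicated single-column index. The statements below are a sketch (the index name is made up for illustration), and enable_seqscan is reset first so the planner is again free to pick whichever plan is cheapest:

-- Sketch: compare against a hypothetical dedicated index on enteruser
RESET enable_seqscan;
CREATE INDEX enteruser_only_idx ON costcenter (enteruser);  -- hypothetical name
EXPLAIN ANALYZE SELECT count(*) FROM costcenter WHERE enteruser = 43;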