I'm executing a rather simple query on a full planet dump of OSM using Postgres 9.4. What I want to do is fetching all ways which belong to the A8 autobahn in Germany. In a preparation step, I've created multipolygons for all administrative boundary relations and stored them in the table polygons so I can do a more easy spatial intersection test. To allow for a fast query processing, I also created an index for the 'ref' hstore tags:
CREATE INDEX idx_ways_tags_ref ON planet_20141222.ways USING btree (lower(tags->'ref'));
Additionally, I have already obtained the id of the administrative boundary of Germany by a previous query (result id = 51477).
My db schema is the normal API 0.6 schema, the data was imported via the dump approach into Postgres (using the pgsnapshot_schema_0.6*.sql scripts which come with osmosis). VACUUM ANALYZE was also performed for all tables.
The problematic query looks like this:
SELECT DISTINCT wy.id FROM planet_20141222.ways wy, planet_20141222.polygons py WHERE py.id=51477 AND ST_Intersects(py.geom,wy.linestring) AND ((wy.tags->'highway') is not null) AND (lower(wy.tags->'ref') like lower('A8'));
The runtime of this query is terrible because Postgres prefers the costly ST_Intersects() test over the cheap (and highly selective) index on 'ref'. When removing the intersection test, the query returns in some milliseconds.
What can I do so that Postgres first evaluates the parts of the query where an index exists instead of testing each way in the entire planet for an intersection with Germany?
My current solution is to split the SQL query in two separate queries. The first does the index-supported tag tests and the second does the spatial intersection test. I suppose that Postgres can do better, but how?
a) the OSM 0.6 import scripts create the following indexes on the ways table:
CREATE INDEX idx_ways_bbox ON ways USING gist (bbox); CREATE INDEX idx_ways_linestring ON ways USING gist (linestring);
b) Additionally, I created another index on polygons:
CREATE INDEX polygons_geom_tags on polygons using gist(geom, tags);
c) The EXPLAIN ANALYZE output of the query without ST_Intersects() looks like this:
"Index Scan using ways_tags_ref on ways (cost=0.57..4767.61 rows=1268 width=467) (actual time=0.064..0.267 rows=60 loops=1)" " Index Cond: (lower((tags -> 'ref'::text)) = 'a8'::text)" " Filter: (((tags -> 'highway'::text) IS NOT NULL) AND (lower((tags -> 'ref'::text)) ~~ 'a8'::text))" " Rows Removed by Filter: 5" "Total runtime: 0.300 ms"
The runtime of the query with ST_Intersects() is more than 15 minutes, so I cancelled it.