I'm creating a DB from scratch and I'm trying to create queries with performance in mind. For test purpose I filled my tables with test data.
My query needs to join 3 tables: 2 with millions of rows and the third with hundred thousands rows.
How can I know my query is the fastest possible? Here the result of explain:
From the explain I noticed that the query is using exactly all the Indexes I created for it, so I should be happy and it should be fast, but it's taking 3 seconds (when I execute it after MySQL reboot), and for a real time application this is way too much.
Basically I have two question:
- Is there a way to "matematically" detect if the query is best performing. i.e. check the number of row scanned
- How can I test queries speed? I try with "select sql_no_cache ..." or with "FLUSH QUERY CACHE;" or "RESET QUERY CACHE;" between queries, but the result is always that queries seems faster after the first time. I noticed that my query takes long time only the first time after the MySQL reboot (lets say 3 seconds), than also if I change some parameter (and I don't reboot the MySQL) is faster (lets say 0.6 seconds).
Thanks very much for your help
Best How To :
Except for trivial queries, there is no way to know if you have the optimal query & indexes. Nor can you get a metric for how well designed the schema and application are.
3 seconds on a cold system for a 3-way
JOIN with "Rows" of 409, 45, 1 is not too bad. Will 0.3 seconds be ok on a warm system? Won't the system be running warm most of the time?
The "Query Cache" has one purpose, but it not useful for most production systems. It remembers the output from each SELECT; then if the identical query comes along before that table(s) changes, it will very rapidly (~1ms) return the saved resultset. In a typical production system the table(s) is changing too often for this to be useful.
Either turn the QC off (_size=0 and _type=OFF), or use
SELECT SQL_NO_CACHE... when timing queries.
"Using index" is good, but it does not prove anything. If you would like to discuss that particular query, please provide
SHOW CREATE TABLE and the
SELECT. We may be able to suggest better datatypes, better indexes, rearrangement of the schema, reformulation of the
SELECT, or even a redesign of your application.
I see "geo" -- if you are using latitude and longitude, then there is a non-trivial, non-obvious, but much faster way to "find the nearest", but it involves just about everything I mentioned in the the previous paragraph.
Back to your mathematical question -- If you are comparing two
SELECTs, here are two imprecise ways to compare their performance:
- Multiple the
Rows column (409*45*1) in the
EXPLAIN for each
SELECT; see which has the smaller product.
FLUSH STATUS; SELECT ...; SHOW SESSION STATUS LIKE 'Handler%';; see which
SELECT has the smaller values.