Sunday, June 28, 2015

MySQL query optimiser: better performance from a subquery with partitions than a LEFT JOIN with an index

Using MySQL version 5.6.14-enterprise-commercial-advanced-log on Ubuntu 12.04 LTS, I encounter the following behaviour when querying data from these tables:

CREATE TABLE `a` (
          `id` varchar(32) DEFAULT NULL,
          `request_time` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00'
        ) ENGINE=MyISAM DEFAULT CHARSET=latin1
        /*!50100 PARTITION BY RANGE (UNIX_TIMESTAMP(request_time))
        (PARTITION p15062116 VALUES LESS THAN (1434895200) ENGINE = MyISAM,
         PARTITION p15062117 VALUES LESS THAN (1434898800) ENGINE = MyISAM,
         PARTITION p15062118 VALUES LESS THAN (1434902400) ENGINE = MyISAM,
        ...
        PARTITION rest VALUES LESS THAN MAXVALUE ENGINE = MyISAM) */

CREATE TABLE `b` (
          `id` varchar(50) NOT NULL,
          `start_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
          `item` int(11) DEFAULT NULL,
          `item2` int(11) DEFAULT NULL,
          PRIMARY KEY (`id`,`start_time`)
        ) ENGINE=MyISAM DEFAULT CHARSET=utf8
        /*!50100 PARTITION BY RANGE (UNIX_TIMESTAMP(start_time))
        (PARTITION p15062516 VALUES LESS THAN (1435240800) ENGINE = MyISAM,
         PARTITION p15062517 VALUES LESS THAN (1435244400) ENGINE = MyISAM
        ....
        PARTITION rest VALUES LESS THAN MAXVALUE ENGINE = MyISAM) */

This query runs in about 1 second:

SELECT SQL_NO_CACHE count(*)
FROM a
LEFT JOIN (SELECT * FROM b
           WHERE start_time BETWEEN '2015-06-28 10:00:00' AND '2015-06-28 11:00:00') c
  USING (id)
WHERE request_time BETWEEN '2015-06-28 10:00:00' AND '2015-06-28 11:00:00';

EXPLAIN output:

+----+-------------+------------+------+---------------+-------------+---------+------+--------+-------------+
| id | select_type | table      | type | possible_keys | key         | key_len | ref  | rows   | Extra       |
+----+-------------+------------+------+---------------+-------------+---------+------+--------+-------------+
|  1 | PRIMARY     | a   | ALL  | NULL          | NULL        | NULL    | NULL | 336972 | Using where |
|  1 | PRIMARY     | <derived2> | ref  | <auto_key0>   | <auto_key0> | 152     | func |     10 | Using where |
|  2 | DERIVED     | b  | ALL  | NULL          | NULL        | NULL    | NULL |  39508 | Using where |
+----+-------------+------------+------+---------------+-------------+---------+------+--------+-------------+
3 rows in set (0.00 sec)

This query runs in about 30 seconds:

SELECT SQL_NO_CACHE count(*)
FROM a
LEFT JOIN b USING (id)
WHERE request_time BETWEEN '2015-06-28 10:00:00' AND '2015-06-28 11:00:00'
  AND start_time BETWEEN '2015-06-28 10:00:00' AND '2015-06-28 11:00:00';

EXPLAIN output:

+----+-------------+-----------+------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type | table     | type | possible_keys | key     | key_len | ref  | rows   | Extra                    |
+----+-------------+-----------+------+---------------+---------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | a  | ALL  | NULL          | NULL    | NULL    | NULL | 336972 | Using where              |
|  1 | SIMPLE      | b | ref  | PRIMARY       | PRIMARY | 152     | func |    395 | Using where; Using index |
+----+-------------+-----------+------+---------------+---------+---------+------+--------+--------------------------+
2 rows in set (12.17 sec)

I expected similar or better results from the second query, given that it can use the `id` index and the one-hour partition. Both tables hold roughly 1,000,000 records.
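For reference, MySQL 5.6 can report which partitions a query actually touches via `EXPLAIN PARTITIONS` (the statement below is just the slow query with that keyword added; I have not included its output here):

```sql
-- Show the `partitions` column in the plan, to verify whether
-- the BETWEEN predicates actually prune partitions on a and b.
EXPLAIN PARTITIONS
SELECT SQL_NO_CACHE count(*)
FROM a
LEFT JOIN b USING (id)
WHERE request_time BETWEEN '2015-06-28 10:00:00' AND '2015-06-28 11:00:00'
  AND start_time BETWEEN '2015-06-28 10:00:00' AND '2015-06-28 11:00:00';
```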

Can you explain why the first query is so much more efficient than the second one?

Can we refactor the first query so that it becomes a view or an otherwise reusable query, instead of rewriting the subquery for each join?
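To make the question concrete, this is the kind of refactor I have in mind (an untested sketch; the view name `b_window` is mine, and the one-hour window has to be hard-coded because a view cannot take the time range as a parameter, which is part of what I'm asking about):

```sql
-- Hypothetical reusable wrapper around the fast subquery.
CREATE VIEW b_window AS
  SELECT * FROM b
  WHERE start_time BETWEEN '2015-06-28 10:00:00' AND '2015-06-28 11:00:00';

-- The fast query, rewritten against the view.
SELECT SQL_NO_CACHE count(*)
FROM a
LEFT JOIN b_window c USING (id)
WHERE request_time BETWEEN '2015-06-28 10:00:00' AND '2015-06-28 11:00:00';
```

I don't know whether the optimiser would still materialise the view the way it materialises the inline derived table, which is why I'm asking.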

Thanks
