when using mysql version 5.6.14-enterprise-commercial-advanced-log on an ubuntu 12.04 LTS, I encounter the following behaviour when querying data from these tables:
CREATE TABLE `a` (
`id` varchar(32) DEFAULT NULL,
`request_time` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
) ENGINE=MyISAM DEFAULT CHARSET=latin1
!50100 PARTITION BY RANGE (UNIX_TIMESTAMP(request_time))
(PARTITION p15062116 VALUES LESS THAN (1434895200) ENGINE = MyISAM,
PARTITION p15062117 VALUES LESS THAN (1434898800) ENGINE = MyISAM,
PARTITION p15062118 VALUES LESS THAN (1434902400) ENGINE = MyISAM,
...
PARTITION rest VALUES LESS THAN MAXVALUE ENGINE = MyISAM)
CREATE TABLE `b` (
`id` varchar(50) NOT NULL,
`start_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`item` int(11) DEFAULT NULL,
`item2` int(11) DEFAULT NULL,
PRIMARY KEY (`id`,`start_time`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
!50100 PARTITION BY RANGE (UNIX_TIMESTAMP(start_time))
(PARTITION p15062516 VALUES LESS THAN (1435240800) ENGINE = MyISAM,
PARTITION p15062517 VALUES LESS THAN (1435244400) ENGINE = MyISAM
....
PARTITION rest VALUES LESS THAN MAXVALUE ENGINE = MyISAM)
using this query results in a 1 second runtime:
SELECT SQL_NO_CACHE count(*) FROM a left join (select * from b where start_time between '2015-06-28 10:00:00' and '2015-06-28 11:00:00') c using(id) where request_time between '2015-06-28 10:00:00' and '2015-06-28 11:00:00';
explain output:
+----+-------------+------------+------+---------------+-------------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+-------------+---------+------+--------+-------------+
| 1 | PRIMARY | a | ALL | NULL | NULL | NULL | NULL | 336972 | Using where |
| 1 | PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 152 | func | 10 | Using where |
| 2 | DERIVED | b | ALL | NULL | NULL | NULL | NULL | 39508 | Using where |
+----+-------------+------------+------+---------------+-------------+---------+------+--------+-------------+
3 rows in set (0.00 sec)
and using this query results in 30 seconds runtime:
SELECT SQL_NO_CACHE count(*) FROM a left join b using(id) where request_time between '2015-06-28 10:00:00' and '2015-06-28 11:00:00' and start_time between '2015-06-28 10:00:00' and '2015-06-28 11:00:00';
explain output:
+----+-------------+-----------+------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+---------+---------+------+--------+--------------------------+
| 1 | SIMPLE | a | ALL | NULL | NULL | NULL | NULL | 336972 | Using where |
| 1 | SIMPLE | b | ref | PRIMARY | PRIMARY | 152 | func | 395 | Using where; Using index |
+----+-------------+-----------+------+---------------+---------+---------+------+--------+--------------------------+
2 rows in set (12.17 sec)
I expected similar or better results from the second query based on the usage of the id index and the 1 hour partition. both tables have 1000000~ records each.
can you explain why is the first query much more efficient than the second one?
can we refactor the first query so that it might become a view or a reusable query instead of rebuilding the subquery for each join?
thanks
Aucun commentaire:
Enregistrer un commentaire