15.8.10.1 Configuring Persistent Optimizer Statistics Parameters

Contents

The persistent optimizer statistics feature improves plan stability by storing statistics to disk and making them persistent across server restarts so that the optimizer is more likely to make consistent choices each time for a given query.

Optimizer statistics are persisted to disk when innodb_stats_persistent=ON or when individual tables are created or altered with STATS_PERSISTENT=1. innodb_stats_persistent is enabled by default.

Formerly, optimizer statistics were cleared on each server restart and after some other operations, and recomputed on the next table access. Consequently, different estimates could be produced when recalculating statistics, leading to different choices in query execution plans and thus variations in query performance.

Persistent statistics are stored in the mysql.innodb_table_stats and mysql.innodb_index_stats tables, as described in Section 15.8.10.1.5, “InnoDB Persistent Statistics Tables”.

To revert to using non-persistent optimizer statistics, you can modify tables using an ALTER TABLE tbl_name STATS_PERSISTENT=0 statement. For related information, see Section 15.8.10.2, “Configuring Non-Persistent Optimizer Statistics Parameters”

15.8.10.1.1 Configuring Automatic Statistics Calculation for Persistent Optimizer Statistics

The innodb_stats_auto_recalc configuration option, which is enabled by default, determines whether statistics are calculated automatically whenever a table undergoes substantial changes (to more than 10% of the rows). You can also configure automatic statistics recalculation for individual tables using a STATS_AUTO_RECALC clause in a CREATE TABLE or ALTER TABLE statement. innodb_stats_auto_recalc is enabled by default.

Because of the asynchronous nature of automatic statistics recalculation (which occurs in the background), statistics may not be recalculated instantly after running a DML operation that affects more than 10% of a table, even when innodb_stats_auto_recalc is enabled. In some cases, statistics recalculation may be delayed by a few seconds. If up-to-date statistics are required immediately after changing significant portions of a table, run ANALYZE TABLE to initiate a synchronous (foreground) recalculation of statistics.

If innodb_stats_auto_recalc is disabled, ensure the accuracy of optimizer statistics by issuing the ANALYZE TABLE statement for each applicable table after making substantial changes to indexed columns. You might run this statement in your setup scripts after representative data has been loaded into the table, and run it periodically after DML operations significantly change the contents of indexed columns, or on a schedule at times of low activity. When a new index is added to an existing table, or a column is added or dropped, index statistics are calculated and added to the innodb_index_stats table regardless of the value of innodb_stats_auto_recalc.

Caution

To ensure statistics are gathered when a new index is created, either enable the innodb_stats_auto_recalc option, or run ANALYZE TABLE after creating each new index when the persistent statistics mode is enabled.

15.8.10.1.2 Configuring Optimizer Statistics Parameters for Individual Tables

innodb_stats_persistent, innodb_stats_auto_recalc, and innodb_stats_persistent_sample_pages are global configuration options. To override these system-wide settings and configure optimizer statistics parameters for individual tables, you can define STATS_PERSISTENT, STATS_AUTO_RECALC, and STATS_SAMPLE_PAGES clauses in CREATE TABLE or ALTER TABLE statements.

STATS_PERSISTENT specifies whether to enable persistent statistics for an InnoDB table. The value DEFAULT causes the persistent statistics setting for the table to be determined by the innodb_stats_persistent configuration option. The value 1 enables persistent statistics for the table, while the value 0 turns off this feature. After enabling persistent statistics through a CREATE TABLE or ALTER TABLE statement, issue an ANALYZE TABLE statement to calculate the statistics, after loading representative data into the table.
STATS_AUTO_RECALC specifies whether to automatically recalculate persistent statistics for an InnoDB table. The value DEFAULT causes the persistent statistics setting for the table to be determined by the innodb_stats_auto_recalc configuration option. The value 1 causes statistics to be recalculated when 10% of the data in the table has changed. The value 0 prevents automatic recalculation for this table; with this setting, issue an ANALYZE TABLE statement to recalculate the statistics after making substantial changes to the table.
STATS_SAMPLE_PAGES specifies the number of index pages to sample when estimating cardinality and other statistics for an indexed column, such as those calculated by ANALYZE TABLE.

All three clauses are specified in the following CREATE TABLE example:

CREATE TABLE `t1` (
`id` int(8) NOT NULL auto_increment,
`data` varchar(255),
`date` datetime,
PRIMARY KEY (`id`),
INDEX `DATE_IX` (`date`)
) ENGINE=InnoDB,
STATS_PERSISTENT=1,
STATS_AUTO_RECALC=1,
STATS_SAMPLE_PAGES=25;

Contents Haut

15.8.10.1.3 Configuring the Number of Sampled Pages for InnoDB Optimizer Statistics

The MySQL query optimizer uses estimated statistics about key distributions to choose the indexes for an execution plan, based on the relative selectivity of the index. Operations such as ANALYZE TABLE cause InnoDB to sample random pages from each index on a table to estimate the cardinality of the index. (This technique is known as random dives.)

To give you control over the quality of the statistics estimate (and thus better information for the query optimizer), you can change the number of sampled pages using the parameter innodb_stats_persistent_sample_pages, which can be set at runtime.

innodb_stats_persistent_sample_pages has a default value of 20. As a general guideline, consider modifying this parameter when encountering the following issues:

Statistics are not accurate enough and the optimizer chooses suboptimal plans, as shown by EXPLAIN output. The accuracy of statistics can be checked by comparing the actual cardinality of an index (as returned by running SELECT DISTINCT on the index columns) with the estimates provided in the mysql.innodb_index_stats persistent statistics table.
If it is determined that statistics are not accurate enough, the value of innodb_stats_persistent_sample_pages should be increased until the statistics estimates are sufficiently accurate. Increasing innodb_stats_persistent_sample_pages too much, however, could cause ANALYZE TABLE to run slowly.
ANALYZE TABLE is too slow. In this case innodb_stats_persistent_sample_pages should be decreased until ANALYZE TABLE execution time is acceptable. Decreasing the value too much, however, could lead to the first problem of inaccurate statistics and suboptimal query execution plans.
If a balance cannot be achieved between accurate statistics and ANALYZE TABLE execution time, consider decreasing the number of indexed columns in the table or limiting the number of partitions to reduce ANALYZE TABLE complexity. The number of columns in the table's primary key is also important to consider, as primary key columns are appended to each nonunique index.
For related information, see Section 15.8.10.3, “Estimating ANALYZE TABLE Complexity for InnoDB Tables”.

Contents Haut

15.8.10.1.4 Including Delete-marked Records in Persistent Statistics Calculations

By default, InnoDB reads uncommitted data when calculating statistics. In the case of an uncommitted transaction that deletes rows from a table, InnoDB excludes records that are delete-marked when calculating row estimates and index statistics, which can lead to non-optimal execution plans for other transactions that are operating on the table concurrently using a transaction isolation level other than READ UNCOMMITTED. To avoid this scenario, innodb_stats_include_delete_marked can be enabled to ensure that InnoDB includes delete-marked records when calculating persistent optimizer statistics.

When innodb_stats_include_delete_marked is enabled, ANALYZE TABLE considers delete-marked records when recalculating statistics.

innodb_stats_include_delete_marked is a global setting that affects all InnoDB tables, and it is only applicable to persistent optimizer statistics.

Contents Haut

15.8.10.1.5 InnoDB Persistent Statistics Tables

The persistent statistics feature relies on the internally managed tables in the mysql database, named innodb_table_stats and innodb_index_stats. These tables are set up automatically in all install, upgrade, and build-from-source procedures.

Table 15.7 Columns of innodb_table_stats

Column name	Description
`database_name`	Database name
`table_name`	Table name, partition name, or subpartition name
`last_update`	A timestamp indicating the last time that `InnoDB` updated this row
`n_rows`	The number of rows in the table
`clustered_index_size`	The size of the primary index, in pages
`sum_of_other_index_sizes`	The total size of other (non-primary) indexes, in pages

Table 15.8 Columns of innodb_index_stats

Column name	Description
`database_name`	Database name
`table_name`	Table name, partition name, or subpartition name
`index_name`	Index name
`last_update`	A timestamp indicating the last time that `InnoDB` updated this row
`stat_name`	The name of the statistic, whose value is reported in the `stat_value` column
`stat_value`	The value of the statistic that is named in `stat_name` column
`sample_size`	The number of pages sampled for the estimate provided in the `stat_value` column
`stat_description`	Description of the statistic that is named in the `stat_name` column

Both the innodb_table_stats and innodb_index_stats tables include a last_update column showing when InnoDB last updated index statistics, as shown in the following example:

mysql> SELECT * FROM innodb_table_stats \G
*************************** 1. row ***************************
database_name: sakila
table_name: actor
last_update: 2014-05-28 16:16:44
n_rows: 200
clustered_index_size: 1
sum_of_other_index_sizes: 1
...

mysql> SELECT * FROM innodb_index_stats \G
*************************** 1. row ***************************
database_name: sakila
table_name: actor
index_name: PRIMARY
last_update: 2014-05-28 16:16:44
stat_name: n_diff_pfx01
stat_value: 200
sample_size: 1
...

The innodb_table_stats and innodb_index_stats tables are ordinary tables and can be updated manually. The ability to update statistics manually makes it possible to force a specific query optimization plan or test alternative plans without modifying the database. If you manually update statistics, issue the FLUSH TABLE tbl_name command to make MySQL reload the updated statistics.

Persistent statistics are considered local information, because they relate to the server instance. The innodb_table_stats and innodb_index_stats tables are therefore not replicated when automatic statistics recalculation takes place. If you run ANALYZE TABLE to initiate a synchronous recalculation of statistics, this statement is replicated (unless you suppressed logging for it), and recalculation takes place on the replication slaves.

Contents Haut

15.8.10.1.6 InnoDB Persistent Statistics Tables Example

The innodb_table_stats table contains one row per table. The data collected is demonstrated in the following example.

Table t1 contains a primary index (columns a, b) secondary index (columns c, d), and unique index (columns e,f):

CREATE TABLE t1 (
a INT, b INT, c INT, d INT, e INT, f INT,
PRIMARY KEY (a, b), KEY i1 (c, d), UNIQUE KEY i2uniq (e, f)
) ENGINE=INNODB;

After inserting five rows of sample data, the table appears as follows:

mysql> SELECT * FROM t1;
+---+---+------+------+------+------+
| a | b | c | d | e | f |
+---+---+------+------+------+------+
| 1 | 1 | 10 | 11 | 100 | 101 |
| 1 | 2 | 10 | 11 | 200 | 102 |
| 1 | 3 | 10 | 11 | 100 | 103 |
| 1 | 4 | 10 | 12 | 200 | 104 |
| 1 | 5 | 10 | 12 | 100 | 105 |
+---+---+------+------+------+------+

To immediately update statistics, run ANALYZE TABLE (if innodb_stats_auto_recalc is enabled, statistics are updated automatically within a few seconds assuming that the 10% threshold for changed table rows is reached):

mysql> ANALYZE TABLE t1;
+---------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.t1 | analyze | status | OK |
+---------+---------+----------+----------+

Table statistics for table t1 show the last time InnoDB updated the table statistics (2014-03-14 14:36:34), the number of rows in the table (5), the clustered index size (1 page), and the combined size of the other indexes (2 pages).

mysql> SELECT * FROM mysql.innodb_table_stats WHERE table_name like 't1'\G
*************************** 1. row ***************************
database_name: test
table_name: t1
last_update: 2014-03-14 14:36:34
n_rows: 5
clustered_index_size: 1
sum_of_other_index_sizes: 2

The innodb_index_stats table contains multiple rows for each index. Each row in the innodb_index_stats table provides data related to a particular index statistic which is named in the stat_name column and described in the stat_description column. For example:

mysql> SELECT index_name, stat_name, stat_value, stat_description
FROM mysql.innodb_index_stats WHERE table_name like 't1';
+------------+--------------+------------+-----------------------------------+
| index_name | stat_name | stat_value | stat_description |
+------------+--------------+------------+-----------------------------------+
| PRIMARY | n_diff_pfx01 | 1 | a |
| PRIMARY | n_diff_pfx02 | 5 | a,b |
| PRIMARY | n_leaf_pages | 1 | Number of leaf pages in the index |
| PRIMARY | size | 1 | Number of pages in the index |
| i1 | n_diff_pfx01 | 1 | c |
| i1 | n_diff_pfx02 | 2 | c,d |
| i1 | n_diff_pfx03 | 2 | c,d,a |
| i1 | n_diff_pfx04 | 5 | c,d,a,b |
| i1 | n_leaf_pages | 1 | Number of leaf pages in the index |
| i1 | size | 1 | Number of pages in the index |
| i2uniq | n_diff_pfx01 | 2 | e |
| i2uniq | n_diff_pfx02 | 5 | e,f |
| i2uniq | n_leaf_pages | 1 | Number of leaf pages in the index |
| i2uniq | size | 1 | Number of pages in the index |
+------------+--------------+------------+-----------------------------------+

The stat_name column shows the following types of statistics:

size: Where stat_name=size, the stat_value column displays the total number of pages in the index.
n_leaf_pages: Where stat_name=n_leaf_pages, the stat_value column displays the number of leaf pages in the index.
n_diff_pfxNN: Where stat_name=n_diff_pfx01, the stat_value column displays the number of distinct values in the first column of the index. Where stat_name=n_diff_pfx02, the stat_value column displays the number of distinct values in the first two columns of the index, and so on. Additionally, where stat_name=n_diff_pfxNN, the stat_description column shows a comma separated list of the index columns that are counted.

To further illustrate the n_diff_pfxNN statistic, which provides cardinality data, consider once again the t1 table example that was introduced previously. As shown below, the t1 table is created with a primary index (columns a, b), a secondary index (columns c, d), and a unique index (columns e, f):

CREATE TABLE t1 (
a INT, b INT, c INT, d INT, e INT, f INT,
PRIMARY KEY (a, b), KEY i1 (c, d), UNIQUE KEY i2uniq (e, f)
) ENGINE=INNODB;

After inserting five rows of sample data, the table appears as follows:

mysql> SELECT * FROM t1;
+---+---+------+------+------+------+
| a | b | c | d | e | f |
+---+---+------+------+------+------+
| 1 | 1 | 10 | 11 | 100 | 101 |
| 1 | 2 | 10 | 11 | 200 | 102 |
| 1 | 3 | 10 | 11 | 100 | 103 |
| 1 | 4 | 10 | 12 | 200 | 104 |
| 1 | 5 | 10 | 12 | 100 | 105 |
+---+---+------+------+------+------+

When you query the index_name, stat_name, stat_value, and stat_description where stat_name LIKE 'n_diff%', the following result set is returned:

mysql> SELECT index_name, stat_name, stat_value, stat_description
FROM mysql.innodb_index_stats
WHERE table_name like 't1' AND stat_name LIKE 'n_diff%';
+------------+--------------+------------+------------------+
| index_name | stat_name | stat_value | stat_description |
+------------+--------------+------------+------------------+
| PRIMARY | n_diff_pfx01 | 1 | a |
| PRIMARY | n_diff_pfx02 | 5 | a,b |
| i1 | n_diff_pfx01 | 1 | c |
| i1 | n_diff_pfx02 | 2 | c,d |
| i1 | n_diff_pfx03 | 2 | c,d,a |
| i1 | n_diff_pfx04 | 5 | c,d,a,b |
| i2uniq | n_diff_pfx01 | 2 | e |
| i2uniq | n_diff_pfx02 | 5 | e,f |
+------------+--------------+------------+------------------+

For the PRIMARY index, there are two n_diff% rows. The number of rows is equal to the number of columns in the index.

Note

For nonunique indexes, InnoDB appends the columns of the primary key.

Where index_name=PRIMARY and stat_name=n_diff_pfx01, the stat_value is 1, which indicates that there is a single distinct value in the first column of the index (column a). The number of distinct values in column a is confirmed by viewing the data in column a in table t1, in which there is a single distinct value (1). The counted column (a) is shown in the stat_description column of the result set.
Where index_name=PRIMARY and stat_name=n_diff_pfx02, the stat_value is 5, which indicates that there are five distinct values in the two columns of the index (a,b). The number of distinct values in columns a and b is confirmed by viewing the data in columns a and b in table t1, in which there are five distinct values: (1,1), (1,2), (1,3), (1,4) and (1,5). The counted columns (a,b) are shown in the stat_description column of the result set.

For the secondary index (i1), there are four n_diff% rows. Only two columns are defined for the secondary index (c,d) but there are four n_diff% rows for the secondary index because InnoDB suffixes all nonunique indexes with the primary key. As a result, there are four n_diff% rows instead of two to account for the both the secondary index columns (c,d) and the primary key columns (a,b).

Where index_name=i1 and stat_name=n_diff_pfx01, the stat_value is 1, which indicates that there is a single distinct value in the first column of the index (column c). The number of distinct values in column c is confirmed by viewing the data in column c in table t1, in which there is a single distinct value: (10). The counted column (c) is shown in the stat_description column of the result set.
Where index_name=i1 and stat_name=n_diff_pfx02, the stat_value is 2, which indicates that there are two distinct values in the first two columns of the index (c,d). The number of distinct values in columns c an d is confirmed by viewing the data in columns c and d in table t1, in which there are two distinct values: (10,11) and (10,12). The counted columns (c,d) are shown in the stat_description column of the result set.
Where index_name=i1 and stat_name=n_diff_pfx03, the stat_value is 2, which indicates that there are two distinct values in the first three columns of the index (c,d,a). The number of distinct values in columns c, d, and a is confirmed by viewing the data in column c, d, and a in table t1, in which there are two distinct values: (10,11,1) and (10,12,1). The counted columns (c,d,a) are shown in the stat_description column of the result set.
Where index_name=i1 and stat_name=n_diff_pfx04, the stat_value is 5, which indicates that there are five distinct values in the four columns of the index (c,d,a,b). The number of distinct values in columns c, d, a and b is confirmed by viewing the data in columns c, d, a, and b in table t1, in which there are five distinct values: (10,11,1,1), (10,11,1,2), (10,11,1,3), (10,12,1,4) and (10,12,1,5). The counted columns (c,d,a,b) are shown in the stat_description column of the result set.

For the unique index (i2uniq), there are two n_diff% rows.

Where index_name=i2uniq and stat_name=n_diff_pfx01, the stat_value is 2, which indicates that there are two distinct values in the first column of the index (column e). The number of distinct values in column e is confirmed by viewing the data in column e in table t1, in which there are two distinct values: (100) and (200). The counted column (e) is shown in the stat_description column of the result set.
Where index_name=i2uniq and stat_name=n_diff_pfx02, the stat_value is 5, which indicates that there are five distinct values in the two columns of the index (e,f). The number of distinct values in columns e and f is confirmed by viewing the data in columns e and f in table t1, in which there are five distinct values: (100,101), (200,102), (100,103), (200,104) and (100,105). The counted columns (e,f) are shown in the stat_description column of the result set.

Contents Haut

15.8.10.1.7 Retrieving Index Size Using the innodb_index_stats Table

The size of indexes for tables, partitions, or subpartitions can be retrieved using the innodb_index_stats table. In the following example, index sizes are retrieved for table t1. For a definition of table t1 and corresponding index statistics, see Section 15.8.10.1.6, “InnoDB Persistent Statistics Tables Example”.

mysql> SELECT SUM(stat_value) pages, index_name,
SUM(stat_value)*@@innodb_page_size size
FROM mysql.innodb_index_stats WHERE table_name='t1'
AND stat_name = 'size' GROUP BY index_name;
+-------+------------+-------+
| pages | index_name | size |
+-------+------------+-------+
| 1 | PRIMARY | 16384 |
| 1 | i1 | 16384 |
| 1 | i2uniq | 16384 |
+-------+------------+-------+

For partitions or subpartitions, the same query with a modified WHERE clause can be used to retrieve index sizes. For example, the following query retrieves index sizes for partitions of table t1:

mysql> SELECT SUM(stat_value) pages, index_name,
SUM(stat_value)*@@innodb_page_size size
FROM mysql.innodb_index_stats WHERE table_name like 't1#P%'
AND stat_name = 'size' GROUP BY index_name;

Find a PHP function

Document created the 26/06/2006, last modified the 26/10/2018
Source of the printed document:https://www.gaudry.be/en/mysql-rf-innodb-persistent-stats.html

The infobrol is a personal site whose content is my sole responsibility. The text is available under CreativeCommons license (BY-NC-SA). More info on the terms of use and the author.

References

Manuel MySQL : https://dev.mysql.com/

These references and links indicate documents consulted during the writing of this page, or which may provide additional information, but the authors of these sources can not be held responsible for the content of this page.
The author of this site is solely responsible for the way in which the various concepts, and the freedoms that are taken with the reference works, are presented here. Remember that you must cross multiple source information to reduce the risk of errors.