MySQL: Index Application

2024-07-12

Table of contents

Index Application

What indexes does MySQL have?

What is the difference between a normal index and a unique index? Which one has better update performance?

How to set the primary key index of the clustered index? Question: What happens if you don't set it?

What kind of fields do we generally choose to create indexes?

Are more indexes better?

How to optimize indexes? (Covering index optimization, preventing index failure, primary key increment, prefix index optimization)

After the index is created, will it be used during query? (Index failure, optimizer selects execution plan based on cost)

If I define a date field of varchar type and there is a data of '20230922', if there is an index on this date field, then if I query the where condition as where time=20230922 without single quotes, will the index still be hit? Why?

Has the latest version of MySQL solved any cases of index failure? (Function index: the value calculated by the function can also be indexed, index jump scanning mechanism (leftmost prefix))

What is the leftmost matching principle?

What should I pay attention to when establishing a joint index? (The index with the highest degree of distinction is placed on the leftmost side, and the index is not used after the leftmost matching principle and range query)

Leftmost matching principle query order

What is index pushdown? Added in MySQL 5.6 to optimize data query

Where a>1 and b=2 and c <3 How to create an index?

Will the (A,B,C) joint index select * from tbn where a=? and b in (?,?) and c>? follow the index?

where a>100 and b=100 and c=123 order by d How to create a joint index?

select id, name from XX where age > 10 and name like'xx%', there is a joint index (name, age), let's talk about the query process


Index Application

What indexes does MySQL have?

MySQL has primary key indexes, unique indexes, normal indexes, prefix indexes, and joint (composite) indexes.

Primary key index: every InnoDB table is organized around a primary key index, and the values of its columns cannot be NULL. For example, the id field of a table is usually the primary key index.

Unique index: guarantees that every value in the indexed column is unique, but NULL values are allowed.

Normal index: for a field that is queried frequently, we can create a normal index on that field. If several fields are queried together, consider creating a joint index so that queries can take advantage of index covering.

Prefix index: for long text or string fields, such as article titles or product names, we can index only the leading part of the field, that is, create a prefix index to reduce the storage space of the index.
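As a rough illustration of the index types listed above, the following sketch declares one of each on a single table. The table and column names (t_article, slug, and so on) are hypothetical and not taken from the original article, and the prefix length of 20 is an arbitrary choice.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE t_article (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    slug       VARCHAR(64)     NOT NULL,
    author_id  BIGINT UNSIGNED NOT NULL,
    category   VARCHAR(32)     NOT NULL,
    title      VARCHAR(255)    NOT NULL,
    created_at DATETIME        NOT NULL,
    PRIMARY KEY (id),                                 -- primary key index
    UNIQUE KEY uk_slug (slug),                        -- unique index
    KEY idx_author (author_id),                       -- normal index
    KEY idx_category_created (category, created_at),  -- joint index
    KEY idx_title_prefix (title(20))                  -- prefix index on the first 20 characters
) ENGINE = InnoDB;
```

Running SHOW INDEX FROM t_article afterwards lists each index with its columns and, for the prefix index, the indexed length in the Sub_part column.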

What is the difference between a normal index and a unique index? Which one has better update performance?

  • When querying for a single value, a unique index may be slightly faster because it can stop searching after finding the first match.

  • For insert and update operations, a normal index may be slightly faster because it does not require a uniqueness check.

  1. Values in a normal index column can repeat, but values in a unique index column must be unique. Inserting a duplicate value into a unique index fails with an error because of the uniqueness constraint.

  2. I think the update performance of a normal index is better. When a normal index is updated and the target data page is not in memory, the change is recorded in the change buffer and the update is complete, with no uniqueness check.

  3. A unique index, however, has to enforce its uniqueness constraint. If the target data page is not in memory, it must be read from disk into memory to check for a conflict, which involves random disk I/O.

  4. Because normal indexes can use the change buffer, they need fewer random disk accesses, so they update faster than unique indexes.
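The difference in constraint behavior from point 1 can be sketched as follows. The table t_account and its columns are hypothetical names chosen for this example; the change buffer itself is internal to InnoDB and is not visible in these statements.

```sql
-- Hypothetical table: email has a unique index, nickname a normal index.
CREATE TABLE t_account (
    id       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    email    VARCHAR(128) NOT NULL,
    nickname VARCHAR(64)  NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uk_email (email),
    KEY idx_nickname (nickname)
) ENGINE = InnoDB;

INSERT INTO t_account (email, nickname) VALUES ('a@example.com', 'sam');

-- Allowed: a normal index accepts repeated values.
INSERT INTO t_account (email, nickname) VALUES ('b@example.com', 'sam');

-- Rejected with a duplicate-entry error (MySQL error 1062) because of uk_email.
INSERT INTO t_account (email, nickname) VALUES ('a@example.com', 'alex');
```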

How is the primary key of the clustered index chosen? Follow-up question: what happens if you don't set one?

When InnoDB creates a clustered index, it selects different columns as indexes according to different scenarios:

  1. If there is a primary key, the primary key will be used as the index key of the clustered index by default.

  2. If there is no primary key, the first unique column that contains no NULL values is chosen as the index key of the clustered index.

  3. If neither exists, InnoDB automatically generates an implicit auto-increment row ID column as the index key of the clustered index.
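The three cases can be sketched with three table definitions. The table names are hypothetical, and the sketch assumes default server settings (in particular, that the server does not force every table to have a primary key).

```sql
-- Case 1: an explicit primary key; the clustered index is built on id.
CREATE TABLE t_with_pk (
    id   BIGINT      NOT NULL,
    code VARCHAR(32) NOT NULL,
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Case 2: no primary key, but a UNIQUE index on a NOT NULL column;
-- InnoDB uses uk_code as the clustered index.
CREATE TABLE t_unique_only (
    code VARCHAR(32) NOT NULL,
    note VARCHAR(64),
    UNIQUE KEY uk_code (code)
) ENGINE = InnoDB;

-- Case 3: neither; InnoDB clusters the rows on a hidden row ID.
CREATE TABLE t_no_key (
    note VARCHAR(64)
) ENGINE = InnoDB;
```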

What kind of fields do we generally choose to create indexes?

Scenarios where indexing is applicable:

  1. Fields with uniqueness restrictions, such as product code

  2. Fields often used in WHERE conditions, which speeds up queries on the whole table. If the query condition involves more than one field, a joint index can be created.

  3. Fields often used in GROUP BY and ORDER BY. There is then no need to sort again at query time, because the records in the B+ tree are already kept in index order.

Scenarios not suitable for indexing:

  1. Fields not used in WHERE, GROUP BY, or ORDER BY. The value of an index is fast positioning. If a field is never used for positioning, there is usually no need to index it, because the index takes up physical space.

  2. Fields with low discrimination. For example, a gender field holds only male and female; if the two values are evenly distributed in the table, searching for either one returns about half of the rows. In such cases it is better not to create the index, because MySQL also has a query optimizer: when it finds that a value appears in a high percentage of the rows, it generally ignores the index and performs a full table scan.

  3. Frequently updated fields. For example, do not index the user balance in an e-commerce project: because the indexed field is modified frequently, the index has to be maintained just as often to keep the B+ tree ordered, which hurts database performance.

  4. Unordered values (such as ID card numbers or UUIDs) are not recommended as an index. When primary key values arrive in no particular order, leaf nodes split frequently, which fragments disk storage.

  • Small tables: when a table holds little data, or the query has to scan most of the table anyway, the optimizer may choose a full table scan instead of using an index. In this case the cost of maintaining the index may outweigh the performance it brings.
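One way to judge the "discrimination" mentioned above is to compare the number of distinct values in a column with the number of rows. The sketch below assumes a hypothetical table t_user_profile with a gender column and a phone column; a ratio near 1 suggests a good index candidate, and a ratio near 0 suggests a poor one.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE t_user_profile (
    id     BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    gender TINYINT     NOT NULL,
    phone  VARCHAR(20) NOT NULL,
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Distinct values divided by total rows, per column.
SELECT COUNT(DISTINCT gender) / COUNT(*) AS gender_selectivity,
       COUNT(DISTINCT phone)  / COUNT(*) AS phone_selectivity
FROM t_user_profile;
```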

Are more indexes better?

No. Indexes improve query efficiency, but every additional index is another B+ tree that needs storage space, and the larger the table, the more space each index occupies.

The more indexes there are, the lower the write performance of the database, because every insert, delete, or update on the table has to maintain the order of every B+ tree index.

How to optimize indexes? (Covering index optimization, preventing index failure, auto-increment primary key, prefix index optimization)

I have used these optimization methods:

  1. For SQL that queries only a few fields, we can create a joint index on those fields so that the query is served by a covering index, which avoids table returns and saves a large number of I/O operations.

  2. The primary key should preferably be an increasing value. The index stores data in key order, so if the primary key is random, inserts may cause page splits. Page splits leave pages partly empty (fragmentation), so the index structure is less compact and queries become less efficient.

  3. We should avoid writing SQL that causes index failure. For example, do not use left or both-sided fuzzy matching on an indexed column; do not apply calculations, functions, or type conversions to an indexed column; and use joint indexes correctly by following the leftmost matching principle. In a WHERE clause, if the column before OR is indexed but the column after OR is not, the index will not be used.

  • Not-equal (<>) and NOT operators: these often prevent index use, because they tend to match most rows and the query falls back to a full table scan.

  • OR operator: if the conditions on the two sides of OR involve different indexes, those indexes may not be used.

    • OR only requires that either condition be met, which makes the query harder to optimize; in most cases the database engine does not use several indexes at once to serve such a query.

  4. For large string columns, we can consider a prefix index, which indexes only the leading part of the column, to save index storage space and improve query performance.

  5. Indexed columns are best declared NOT NULL, for two reasons:

    1. NULL values in an indexed column complicate the optimizer's index selection and make operations such as count harder to optimize.

    2. NULL is a meaningless value but still takes up physical space: the row format uses at least 1 byte to store the NULL value list.
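The covering index idea from point 1 can be sketched as follows. The table t_order, its columns, and the index name are hypothetical; the comments describe what EXPLAIN typically reports, but the optimizer's actual choice depends on the data and the MySQL version.

```sql
-- Hypothetical table with a joint index on (user_id, status).
CREATE TABLE t_order (
    id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id BIGINT UNSIGNED NOT NULL,
    status  TINYINT         NOT NULL,
    remark  VARCHAR(255),
    PRIMARY KEY (id),
    KEY idx_user_status (user_id, status)
) ENGINE = InnoDB;

-- Every selected column (id, user_id, status) is stored in idx_user_status,
-- so the query can be answered from the index alone. When that happens,
-- the Extra column of EXPLAIN usually contains "Using index".
EXPLAIN SELECT id, user_id, status FROM t_order WHERE user_id = 42;

-- remark is not in the index, so each matching entry needs a table return.
EXPLAIN SELECT * FROM t_order WHERE user_id = 42;
```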

After creating an index, will it be used during a query? (Index failure; the optimizer chooses an execution plan based on cost)

Not necessarily.

  1. Even if a query filters on an indexed column, the index may not be used.

    1. For example, when the query applies left fuzzy matching, expression calculation, a function, or an implicit type conversion to the indexed field, it cannot use the index and falls back to a full table scan.

    2. Likewise, when we query through a joint index without following the leftmost matching principle, the index fails.

  2. The optimizer chooses the access method based on cost. When a query could use a secondary index, the optimizer compares the cost of returning to the table with the cost of a full table scan. If returning to the table is too expensive, it skips the index and scans the full table.
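The failure cases in point 1 can be compared side by side with EXPLAIN. This is only a sketch: t_member and its columns are hypothetical, and on a very small or empty table the optimizer may pick a full scan for every query regardless of how it is written.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE t_member (
    id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    name  VARCHAR(64) NOT NULL,
    score INT         NOT NULL,
    city  VARCHAR(32) NOT NULL,
    age   INT         NOT NULL,
    PRIMARY KEY (id),
    KEY idx_name (name),
    KEY idx_score (score),
    KEY idx_city_age (city, age)
) ENGINE = InnoDB;

-- Left fuzzy match: idx_name cannot be used to position the search.
EXPLAIN SELECT * FROM t_member WHERE name LIKE '%son';
-- Right fuzzy match: idx_name can be used.
EXPLAIN SELECT * FROM t_member WHERE name LIKE 'son%';

-- Expression on the indexed column: idx_score cannot be used.
EXPLAIN SELECT * FROM t_member WHERE score + 1 = 90;
-- Same condition with the arithmetic moved to the constant side.
EXPLAIN SELECT * FROM t_member WHERE score = 89;

-- Function on the indexed column: idx_name cannot be used.
EXPLAIN SELECT * FROM t_member WHERE LENGTH(name) = 5;

-- Leftmost column (city) missing: idx_city_age cannot position the search.
EXPLAIN SELECT * FROM t_member WHERE age = 30;
```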

If I define a date field of varchar type and there is a data of '20230922', if there is an index on this date field, then if I query the where condition as where time=20230922 without single quotes, will the index still be hit? Why?

The index will not be hit.

When MySQL compares a string with a number, it performs an implicit type conversion that converts the string to a number, and this conversion is effectively a function call. In this query the date field is a string, so the implicit conversion is applied to the indexed date field itself, and applying a function to an indexed column makes the index unusable.

For an integer index column such as id, the value is stored directly in the index without any function calculation. If a query compares id with a string, the conversion is applied to the constant rather than to id, so the match is a simple comparison of integer values and the index is still used.
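The two directions of implicit conversion can be checked with a small sketch. The table t_event and the column event_date are hypothetical stand-ins for the varchar date field in the question.

```sql
-- Hypothetical table: event_date is a VARCHAR column with an index.
CREATE TABLE t_event (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    event_date VARCHAR(8) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_event_date (event_date)
) ENGINE = InnoDB;

INSERT INTO t_event (event_date) VALUES ('20230922');

-- Number literal: the string column is converted for the comparison,
-- so idx_event_date cannot be used to look up the value.
EXPLAIN SELECT * FROM t_event WHERE event_date = 20230922;

-- String literal: no conversion on the column, the index can be used.
EXPLAIN SELECT * FROM t_event WHERE event_date = '20230922';

-- Integer column compared with a string: the constant is converted,
-- not the column, so the primary key can still be used.
EXPLAIN SELECT * FROM t_event WHERE id = '1';
```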

What cases of index failure has the latest version of MySQL resolved? (Functional index: values computed by a function can also be indexed; index skip scan for the leftmost prefix)

MySQL 8.0 added functional indexes, a new feature that lets an index store the value computed by a function, which solves the problem of index failure when a function is applied to an indexed column.

Another new feature is index skip scan. In 5.7 and earlier, a joint index could not be used when the leftmost matching principle was not met. With the index skip scan introduced in 8.0, the joint index can still be used in some cases even if the leftmost column is missing from the query conditions.
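Both features can be tried with a sketch like the one below, assuming MySQL 8.0.13 or later. The table t_login and its columns are hypothetical. Whether the optimizer actually picks a skip scan depends on the data distribution; when it does, the Extra column of EXPLAIN reports "Using index for skip scan".

```sql
-- Hypothetical table with a joint index whose leftmost column has few values.
CREATE TABLE t_login (
    id       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    username VARCHAR(64) NOT NULL,
    gender   TINYINT     NOT NULL,
    age      INT         NOT NULL,
    PRIMARY KEY (id),
    KEY idx_gender_age (gender, age)
) ENGINE = InnoDB;

-- Functional index: note the extra pair of parentheses around the expression.
ALTER TABLE t_login ADD INDEX idx_username_len ((LENGTH(username)));

-- The same expression in the WHERE clause can now use idx_username_len.
EXPLAIN SELECT * FROM t_login WHERE LENGTH(username) = 6;

-- The leftmost column (gender) is missing from the condition;
-- this is a candidate for index skip scan on idx_gender_age.
EXPLAIN SELECT gender, age FROM t_login WHERE age > 40;
```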

What is the leftmost matching principle?

Suppose there is an (a, b, c) joint index. It is stored sorted by a first, then by b where a is equal, then by c where b is equal. Because of this ordering, a joint index follows the leftmost matching principle. The specific rules are:

  1. MySQL matches the query conditions against a joint index starting from the leftmost index column and continuing from left to right. If the query conditions skip a column, none of the columns to its right can use the index.

  2. When a column in the query condition is used in a range query, that column can use the joint index, but the columns after it cannot.

Therefore, when using a joint index, we must follow the leftmost matching principle; otherwise some of the index columns cannot be used.
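The rules can be explored with EXPLAIN on a small table. This is a sketch: t_abc is a hypothetical table, the extra column d is there only so that SELECT * is not covered by the index, and the comments describe the expected matching rather than guaranteed optimizer output.

```sql
-- Hypothetical table with a joint index on (a, b, c).
CREATE TABLE t_abc (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    a  INT NOT NULL,
    b  INT NOT NULL,
    c  INT NOT NULL,
    d  INT NOT NULL,
    PRIMARY KEY (id),
    KEY idx_a_b_c (a, b, c)
) ENGINE = InnoDB;

-- a, b, and c can all be matched.
EXPLAIN SELECT * FROM t_abc WHERE a = 1 AND b = 2 AND c = 3;

-- b is skipped: only a is used to position the search.
EXPLAIN SELECT * FROM t_abc WHERE a = 1 AND c = 3;

-- The leftmost column a is missing: the index cannot position the search.
EXPLAIN SELECT * FROM t_abc WHERE b = 2 AND c = 3;

-- Range on b: a and b are matched, c is not used for positioning.
EXPLAIN SELECT * FROM t_abc WHERE a = 1 AND b > 2 AND c = 3;

-- The order of conditions in WHERE does not matter.
EXPLAIN SELECT * FROM t_abc WHERE c = 3 AND a = 1 AND b = 2;
```

With three NOT NULL INT columns, the key_len column of EXPLAIN gives a rough reading of how many index columns were matched: 4 bytes per column.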

What should I pay attention to when creating a joint index? (Put the most distinctive field on the far left; follow the leftmost matching principle; fields after a range query do not use the index)

  1. Put the most distinctive fields on the far left of the joint index, which improves the filtering effect of the index. For example, fields such as UUID are well suited to being indexed or to being placed at the front of the joint index columns.

  2. If a field with low discrimination is placed on the left side of a joint index, the query optimizer may choose to scan the entire table instead of using the index.

  3. Under the leftmost matching principle, matching stops at a range query (such as > or <). That is, the field in the range query can use the joint index, but the fields after it cannot. However, for the four range queries >=, <=, BETWEEN, and LIKE prefix matching, matching does not stop.

    1. In MySQL, BETWEEN includes the value1 and value2 boundary values, like >= and <=.

    2. Reference link: https://zhuanlan.zhihu.com/p/573138586

Leftmost matching principle query order

 

select * from T where c=1 and a=2 and b=3;

a, b, and c can all use the index, because the order of the fields in the WHERE clause does not matter: the MySQL optimizer reorders the conditions for us so that the query still complies with the leftmost matching principle.

What is index pushdown? Added in MySQL 5.6 to optimize data queries

Index pushdown reduces the table return operations of secondary index queries and improves query efficiency, because it hands part of the work that the server layer is responsible for to the storage engine layer.

  • Without index condition pushdown, the storage engine retrieves rows through the index and returns them to MySQL Server, and MySQL Server then evaluates the filter conditions.

  • With index condition pushdown, if some of the conditions are on indexed columns, MySQL Server pushes those conditions down to the storage engine. The storage engine checks whether each index entry meets the pushed-down conditions, and only for entries that do will it retrieve the row and return it to MySQL Server.

Index condition pushdown reduces the number of times the storage engine accesses the base table, and also the number of times MySQL Server receives data from the storage engine.

 

select * from t_user where age > 20 and reward = 100000;
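To see pushdown at work on the query above, the sketch below assumes that t_user has a joint index on (age, reward); the column list and the index name are assumptions for illustration. With pushdown enabled, the Extra column of EXPLAIN typically shows "Using index condition" when the optimizer uses the index, and the optimizer_switch flag lets you compare the plan with the feature turned off.

```sql
-- Assumed definition of t_user; only the table name comes from the query above.
CREATE TABLE t_user (
    id     BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    name   VARCHAR(64) NOT NULL,
    age    INT         NOT NULL,
    reward INT         NOT NULL,
    PRIMARY KEY (id),
    KEY idx_age_reward (age, reward)
) ENGINE = InnoDB;

-- age > 20 positions the scan; reward = 100000 can be checked on the
-- index entry inside the storage engine before any table return.
EXPLAIN SELECT * FROM t_user WHERE age > 20 AND reward = 100000;

-- Turn pushdown off for the current session and compare the plan.
SET optimizer_switch = 'index_condition_pushdown=off';
EXPLAIN SELECT * FROM t_user WHERE age > 20 AND reward = 100000;

-- Restore the default.
SET optimizer_switch = 'index_condition_pushdown=on';
```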

where a > 1 and b = 2 and c < 3: how to create an index?

  1. With a joint index of (a, b, c), (a, c, b), (a, b), or (a, c), only a can use the index.

  2. With a joint index of (c, a, b), (c, b, a), (c, a), or (c, b), only c can use the index.

  3. With a (b, a) joint index, both b and a can use the index.

  4. With a (b, c) joint index, both b and c can use the index.

  5. With a (b, a, c) joint index, both b and a can use the index, and compared with the (b, a) joint index it has one more advantage: the c field can use index pushdown, which reduces the number of table returns.

  6. With a (b, c, a) joint index, both b and c can use the index, and compared with the (b, c) joint index it has one more advantage: the a field can use index pushdown, which reduces the number of table returns.
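Option 5 can be written out as a sketch. The table t_range and the column payload are hypothetical; payload exists only so that SELECT * needs a table return.

```sql
-- Hypothetical table with the (b, a, c) joint index from option 5.
CREATE TABLE t_range (
    id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    a       INT NOT NULL,
    b       INT NOT NULL,
    c       INT NOT NULL,
    payload VARCHAR(64),
    PRIMARY KEY (id),
    KEY idx_b_a_c (b, a, c)
) ENGINE = InnoDB;

-- b = 2 and a > 1 position the scan; c < 3 is a candidate for
-- index condition pushdown because c is part of the same index.
EXPLAIN SELECT * FROM t_range WHERE a > 1 AND b = 2 AND c < 3;
```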

With an (A, B, C) joint index, will select * from tbn where a=? and b in (?,?) and c>? use the index?

This query will use the joint index (A, B, C). The conditions are all on the index columns A, B, and C, which is the ideal usage scenario.

  1. A = ?: this condition is an exact match, and MySQL uses the index to locate the records that satisfy A = ?.

  2. B IN (?, ?): this condition lets column B take two possible values. MySQL uses the index to find all records that match A = ? and whose B is either of the two values.

  3. C > ?: this condition is a range query. On top of the A and B filters, MySQL continues to use the index to find records whose C value is greater than the specified value.
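A sketch for checking this with EXPLAIN follows. The table name tbn comes from the question, while the column types, the extra column d, and the literal values are assumptions. If key_len covers all three columns (12 bytes for three NOT NULL INT columns), A, B, and C were all used to narrow the scan.

```sql
-- Assumed definition of tbn; only the name and the (a, b, c) index come from the question.
CREATE TABLE tbn (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    a  INT NOT NULL,
    b  INT NOT NULL,
    c  INT NOT NULL,
    d  VARCHAR(64),
    PRIMARY KEY (id),
    KEY idx_a_b_c (a, b, c)
) ENGINE = InnoDB;

-- Equality on a, a small IN list on b, and a range on c.
EXPLAIN SELECT * FROM tbn WHERE a = 1 AND b IN (2, 3) AND c > 4;
```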

where a > 100 and b = 100 and c = 123 order by d: how to build the joint index?

I think a joint index in the order (b, c, d, a) is better. Both b and c can then use the index, and d can use the index order to avoid a filesort (extra sorting). Although the last field, a, cannot be used to position the search (a is not in order there), it can use index pushdown to reduce the number of table returns.
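The suggested index can be tried with the sketch below. The table t_sorted and the column payload are hypothetical. If the plan works as described, the Extra column of EXPLAIN should not contain "Using filesort", although the optimizer's choice still depends on the data.

```sql
-- Hypothetical table with the (b, c, d, a) joint index.
CREATE TABLE t_sorted (
    id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    a       INT NOT NULL,
    b       INT NOT NULL,
    c       INT NOT NULL,
    d       INT NOT NULL,
    payload VARCHAR(64),
    PRIMARY KEY (id),
    KEY idx_b_c_d_a (b, c, d, a)
) ENGINE = InnoDB;

-- For fixed b and c, index entries are already ordered by d.
EXPLAIN SELECT * FROM t_sorted
WHERE a > 100 AND b = 100 AND c = 123
ORDER BY d;
```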

select id, name from XX where age > 10 and name like 'xx%', with a joint index (name, age): let's talk about the query process

The joint index is ordered by name first and then by age: entries are sorted by name, and by age where the names are equal. The optimizer therefore matches name first. Here name is a right fuzzy query, which does not invalidate the index, so this SQL can use the joint index.

Specifically, only name is used to position the search. After the right fuzzy query on name, the age values are no longer in order, so age cannot be used for positioning, but age can use index pushdown.

The fields finally selected are id and name. Both can be found in the joint index, so there is no need to return to the table. This is a covering index query.

The right fuzzy query on name is a range query, so the fields after it cannot be used to position the search.
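The walkthrough above can be checked with a sketch like this one. The table name t_person stands in for XX, and the address column is an assumption added so that the table has a column outside the index.

```sql
-- Hypothetical stand-in for table XX, with the (name, age) joint index.
CREATE TABLE t_person (
    id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    name    VARCHAR(64) NOT NULL,
    age     INT         NOT NULL,
    address VARCHAR(128),
    PRIMARY KEY (id),
    KEY idx_name_age (name, age)
) ENGINE = InnoDB;

-- id and name are both stored in idx_name_age (a secondary index entry
-- also carries the primary key), so no table return is needed.
EXPLAIN SELECT id, name FROM t_person WHERE age > 10 AND name LIKE 'xx%';
```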