2024-07-12
Generally speaking, MySQL can be divided into two layers: the server layer and the storage engine layer.
mysql -h$ip -P$port -u$user -p
The mysql in the connection command is a client tool used to establish a connection with the server. After completing the classic TCP handshake, the connector starts to authenticate your identity, using the username and password you entered.
If the client is silent for too long, the connector will automatically disconnect it. This time is controlled by the parameter wait_timeout, and the default value is 8 hours.
If the client sends a request again after the connection is disconnected, it will receive an error message: Lost connection to MySQL server during query
If you want to continue at this time, you need to reconnect and then execute the request again.
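To make this concrete, here is a minimal sketch of connecting and recovering from a dropped connection, using the mysql-connector-python driver; the host, credentials, and query are placeholder assumptions, not values from this article.

import mysql.connector

# Placeholder connection parameters; substitute your own.
cnx = mysql.connector.connect(host="127.0.0.1", port=3306,
                              user="app", password="secret")
cur = cnx.cursor()
try:
    cur.execute("select 1")
    print(cur.fetchall())
except mysql.connector.Error:
    # The connector dropped us (for example, wait_timeout elapsed);
    # the only remedy is to reconnect and retry the request.
    cnx.reconnect(attempts=3, delay=1)
    cur = cnx.cursor()
    cur.execute("select 1")
    print(cur.fetchall())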
In the database, a long connection means that after a successful connection, if the client continues to have requests, the same connection will be used all the time. A short connection means that the connection is disconnected after a few queries are executed each time, and a new connection is established for the next query.
The process of establishing a connection is relatively complex, so my advice is to minimize connection-establishing actions during use; in other words, try to use long connections.
However, after switching everything to long connections, you may find that the memory occupied by MySQL sometimes grows very quickly. This is because the memory MySQL uses temporarily during execution is managed inside the connection object, and these resources are released only when the connection is disconnected. So if long connections accumulate, memory usage may grow too large and the process may be forcibly killed by the system (OOM); judging from the symptoms, MySQL appears to have restarted abnormally.
How to solve this problem? You can consider the following two solutions.
1. Disconnect long connections periodically. After a connection has been in use for a while, or after the program judges that a memory-heavy query has been executed, disconnect it, and reconnect later when a query is needed.
2. If you are using MySQL 5.7 or newer, you can execute mysql_reset_connection after each fairly large operation to reinitialize the connection resources, as sketched below. This process does not require reconnecting or redoing permission verification, but it restores the connection to the state it was in when it was just created.
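As an illustration of the second option, here is a minimal sketch using mysql-connector-python, which exposes this reset as reset_session(); the credentials are placeholders and the "reset after each big query" trigger is a simplification of whatever policy your application would actually use.

import mysql.connector

# Placeholder credentials.
cnx = mysql.connector.connect(host="127.0.0.1", user="app", password="secret")

def run_big_query(sql):
    cur = cnx.cursor()
    cur.execute(sql)
    rows = cur.fetchall()
    cur.close()
    # After a memory-heavy operation, re-initialize the session state
    # without reconnecting or redoing permission verification.
    cnx.reset_session()
    return rows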
Once the connection is established, you can start executing statements, which brings us to the next step: the query cache. When MySQL receives a query request, it first checks the query cache to see whether this exact statement has been executed before. Previously executed statements and their results may be cached directly in memory as key-value pairs: the key is the query statement and the value is the query result. If your query finds its key in this cache, the value is returned to the client immediately.
If the statement is not in the query cache, it will continue to the subsequent execution phase. After the execution is completed, the execution result will be stored in the query cache. You can see that if the query hits the cache, MySQL does not need to perform the subsequent complex operations and can directly return the result, which is very efficient.
But most of the time I would recommend that you not use the query cache. Why? Because the query cache often does more harm than good.
The query cache is invalidated very frequently. As long as a table is updated, all query caches on this table will be cleared. Therefore, it is very likely that you have saved the results with great effort, but before you use them, they are all cleared by an update. For databases with heavy update pressure, the query cache hit rate will be very low. Unless your business has a static table that is updated only once in a long time, such as a system configuration table, then the queries on this table are suitable for query caching.
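A toy sketch of why this hurts: model the cache as a map from statement text to result, wiped wholesale whenever the underlying table changes. The table name and bookkeeping here are made up for illustration; real MySQL tracked this internally.

# Toy model of the (pre-8.0) query cache and its invalidation.
cache = {}                        # SQL text -> cached result
queries_on = {}                   # table -> cached statements that read it

def query(sql, table, run_statement):
    if sql in cache:              # hit: skip parse/optimize/execute entirely
        return cache[sql]
    result = run_statement(sql)   # miss: really execute, then remember
    cache[sql] = result
    queries_on.setdefault(table, set()).add(sql)
    return result

def on_update(table):
    # One update wipes every cached query on the table, so results saved
    # "with great effort" are gone before anyone reuses them.
    for sql in queries_on.get(table, set()):
        cache.pop(sql, None)
    queries_on[table] = set()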
Fortunately, MySQL also provides this "on-demand use" method. You can set the parameter query_cache_type to DEMAND, so that the query cache is not used for the default SQL statements. For the statements that you are sure to use the query cache, you can explicitly specify it with SQL_CACHE, like the following statement:
select SQL_CACHE * from T where ID=10;
One thing to be aware of: MySQL 8.0 removes the query cache function entirely, which means the feature is no longer available from version 8.0 onward.
If the query cache is not hit, the statement actually starts to execute. First, MySQL needs to know what you want to do, so it must parse the SQL statement.
I don't know if you still remember the article "Kong Yiji". The innkeeper had a chalk board, which was used to record the guests' credit records. If there were not many people who bought on credit, he could write the customer names and accounts on the board. But if there were many people who bought on credit, the chalk board would sometimes not be able to record all the accounts. At this time, the innkeeper must have a special account book to record the credit accounts.
If someone wants to buy on credit or pay back a debt, the shopkeeper generally has two choices: one is to take out the account book directly and add to or subtract from this person's credit total; the other is to jot the account down on the blackboard first and reconcile the account book after closing time.
When business is booming and the counter is busy, the shopkeeper will definitely choose the latter, because the former is too troublesome. First he has to find this person's total credit record; just imagine dozens of pages of densely packed entries. To find one name the shopkeeper may have to put on his reading glasses and search slowly, then take out the abacus to calculate, and finally write the result back into the account book. The whole process is troublesome; by comparison, it is much more convenient to record on the blackboard first. Without the blackboard's help, the shopkeeper would have to leaf through the account book for every single entry, and the efficiency would be unbearably low.
MySQL has the same problem. If every update operation had to be written into the data file on disk, and the disk had to find the corresponding record first and then update it, the whole process would have a high IO cost and a high search cost. To solve this, the designers of MySQL used an idea much like the shopkeeper's blackboard to improve update efficiency.
The cooperation between the blackboard and the account book is exactly what is often called WAL technology in MySQL. WAL stands for Write-Ahead Logging, and its key point is: write the log first, write the disk later; in other words, write on the blackboard first, and copy into the account book when things are not busy.
Specifically, when a record needs to be updated, the InnoDB engine first writes the record to the redo log (the blackboard) and updates it in memory; at this point the update is considered complete. Later, at an appropriate time, often when the system is relatively idle, InnoDB writes the operation record into the data file on disk, much like what the shopkeeper does after closing time.
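Here is a minimal Python sketch of the write-ahead idea, under the simplifying assumptions of one log file and an in-memory page map; real InnoDB is far more involved, so treat the names and structure as illustrative only.

import os

log = open("redo.log", "ab")      # the "blackboard": sequential, cheap to append
pages = {}                        # in-memory copies of data pages
dirty = set()                     # pages changed in memory but not yet on disk

def update(page_id, new_value):
    # 1. Append the change to the log and force it to stable storage;
    #    a sequential append plus fsync is much cheaper than random I/O.
    log.write(f"{page_id}={new_value}\n".encode())
    log.flush()
    os.fsync(log.fileno())
    # 2. Apply the change in memory only; the update now counts as done.
    pages[page_id] = new_value
    dirty.add(page_id)

def flush_when_idle():
    # 3. Later, write dirty pages back to the data files (the "account book").
    for page_id in list(dirty):
        ...  # write pages[page_id] to its place in the data file
        dirty.discard(page_id)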
If there are not many credit sales today, the shopkeeper can sort them out after closing. But if there are a lot of credit sales on a certain day and the blackboard is full, what should he do? At this time, the shopkeeper has to put down his work, update some of the credit sales records on the blackboard to the account book, and then erase these records from the blackboard to make room for new accounts.
Similarly, InnoDB's redo log has a fixed size. For example, it can be configured as a group of 4 files, each 1GB in size, so this "blackboard" can record 4GB of operations in total. It is written from the beginning, and when the end is reached it wraps around to the beginning and continues in a loop, as shown in the figure below.
Write pos is the current record position, which moves backwards while writing, and returns to the beginning of file 0 after writing to the end of file 3. Checkpoint is the current position to be erased, which also moves backwards and loops. Before erasing a record, the record must be updated to the data file.
The space between write pos and checkpoint is the empty part of the "blackboard" that can still record new operations. If write pos catches up with checkpoint, the "blackboard" is full, and no new updates can be accepted; MySQL has to stop and erase some records to push the checkpoint forward before it can continue.
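The free-space arithmetic can be sketched as a ring buffer. The 4 x 1GB sizing follows the example above; the flush granularity and everything else here is an invented simplification.

# Circular redo log: 4 files x 1GB, addressed as one ring.
TOTAL = 4 * 1024**3

write_pos = 0      # where the next log record goes
checkpoint = 0     # oldest record whose page change is not yet in the data file

def free_space():
    # Bytes still writable before write pos would catch up with checkpoint.
    return (checkpoint - write_pos - 1) % TOTAL

def append(nbytes):
    global write_pos
    while free_space() < nbytes:
        advance_checkpoint()          # must stop and make room first
    write_pos = (write_pos + nbytes) % TOTAL

def advance_checkpoint():
    global checkpoint
    # Before a record is erased, its page change must be written to the
    # data file; only then may the checkpoint move past it.
    checkpoint = (checkpoint + 4096) % TOTAL  # assumed 4KB flush granularity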
With the redo log, InnoDB can guarantee that even if the database restarts abnormally, previously committed records are not lost. This capability is called crash-safe.
To understand the concept of crash-safe, think of the example of credit records we mentioned earlier. As long as the credit records are recorded on the blackboard or written in the account book, even if the shopkeeper forgets about it later, such as suddenly closing down for a few days, he can still clarify the credit accounts through the data in the account book and blackboard after resuming business.
As we mentioned earlier, MySQL as a whole is divided into two parts: the server layer, which mainly handles the functional aspects of MySQL, and the engine layer, which is responsible for the specifics of storage. The redo log is a log specific to the InnoDB engine, and the server layer also has its own log, called the binlog (archive log).
I'm sure you'll ask, why are there two logs?
Because at first MySQL had no InnoDB engine. MySQL's built-in engine was MyISAM, but MyISAM has no crash-safe capability, and the binlog can only be used for archiving. InnoDB was introduced into MySQL by another company in the form of a plug-in, and since relying on the binlog alone provides no crash-safe capability, InnoDB uses its own log system, the redo log, to achieve crash safety.
There are three differences between these two logs.
1. The redo log is specific to the InnoDB engine; the binlog is implemented in the server layer and can be used by all engines.
2. The redo log is a physical log, recording "what change was made on which data page"; the binlog is a logical log, recording the original logic of the statement, for example "add 1 to the c field of the row whose ID is 2".
3. The redo log is written in a circle, in a fixed amount of space that gets reused; the binlog is appended to, meaning a new file is started when the current one reaches a certain size, and earlier logs are never overwritten.
With a conceptual understanding of the two logs, let's look at the internal flow of the executor and the InnoDB engine when executing a simple update statement, say: update T set c=c+1 where ID=2;
Here is the execution flow chart of this update statement. The light-colored boxes in the figure indicate steps executed inside InnoDB, and the dark-colored boxes indicate steps executed in the executor.
Update statement execution process
You may have noticed that the last three steps seem a bit convoluted. The writing of the redo log is divided into two steps: prepare and commit. This is the "two-phase commit".
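The dance between the executor and the engine, plus the crash-recovery rule it enables, can be modeled with a toy sketch; every name and structure below is invented for illustration and is not real MySQL internals.

# Toy model of two-phase commit between redo log and binlog.
redo_log = {}   # txn_id -> "prepare" or "commit"
binlog = set()  # txn_ids fully recorded in the binlog

def execute_update(txn_id, apply_change):
    apply_change()                 # engine updates the page in memory
    redo_log[txn_id] = "prepare"   # step 1: redo log written, prepare state
    binlog.add(txn_id)             # step 2: server layer writes the binlog
    redo_log[txn_id] = "commit"    # step 3: redo log switched to commit state

def recover(txn_id):
    # Crash-recovery rule: a prepared transaction survives only if its
    # binlog record is complete, so the two logs always end up agreeing.
    state = redo_log.get(txn_id)
    if state == "commit" or (state == "prepare" and txn_id in binlog):
        return "replay"
    return "rollback"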
Why is "two-phase commit" needed? It is to keep the contents of the two logs logically consistent. To explain this, we have to start from the question at the beginning of the article: how can the database be restored to its state at any second within the past half month?
As we have said before, binlog will record all logical operations and adopt the form of "append write". If your DBA promises that the recovery can be done within half a month, then the backup system will definitely save all binlogs in the last half a month, and the system will regularly back up the entire database. The "regular" here depends on the importance of the system, which can be once a day or once a week.
When you need to restore to a specified second, for example if at 2 pm one day you discover that a table was accidentally dropped at noon and its data needs to be recovered, you can do this: first, find the most recent full backup and restore it onto a temporary instance; then, starting from the backup's time point, replay the backed-up binlogs on the temporary instance, stopping just before the moment the table was dropped. The temporary instance now contains the lost data, which you can take out and put back into the production instance as needed.
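The replay step can be sketched by driving the real mysqlbinlog tool from Python; the binlog file name, stop time, and host below are placeholders you would replace with your own.

import subprocess

# Decode binlog events up to just before the accidental drop (placeholder time).
binlog_dump = subprocess.Popen(
    ["mysqlbinlog", "--stop-datetime=2024-07-12 11:59:00", "binlog.000042"],
    stdout=subprocess.PIPE,
)
# Pipe the decoded statements into the temporary instance (placeholder host).
subprocess.run(["mysql", "-h", "tmp-host", "-u", "root", "-p"],
               stdin=binlog_dump.stdout, check=True)
binlog_dump.stdout.close()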
OK, now that we have finished talking about the data recovery process, let's talk about why logs require "two-phase commit". Here we can use proof by contradiction to explain.
Since redo log and binlog are two independent logics, if two-phase commit is not used, either redo log is written first and then binlog is written, or the order is reversed. Let's see what problems these two methods may cause.
Let's still use the previous update statement as an example. Assume that the value of field c in the row with ID=2 is currently 0, and that during execution of the update, a crash happens after the first log is written but before the second log is written. What happens?
1. Write redo log first, then binlog. Suppose the crash happens when the redo log is written but the binlog is not. After recovery, the redo log restores the change, so c is 1 in this instance; but the binlog has no record of the statement, so any instance later rebuilt from the binlog will have c equal to 0, different from the original.
2. Write binlog first, then redo log. If the crash comes after the binlog is written, the transaction is invalid after recovery because the redo log was never written, so c is still 0 in this instance; but the binlog already contains the update, so an instance rebuilt from the binlog will have c equal to 1, again different from the original.
You may say, isn't this probability very low? Usually there is no scenario where you need to restore the temporary library all the time?
Actually, no. This process is needed not only to restore data after an erroneous operation. When you need to scale out, that is, build more replica databases to increase the system's read capacity, the common practice now is full backup plus binlog application, and the "inconsistency" described above would then show up online as inconsistency between the primary and its replicas.
Simply put, both the redo log and the binlog can be used to indicate a transaction's commit status, and two-phase commit is what keeps these two states logically consistent.