
Some Practices for Single-Machine MySQL Database Optimization

Database optimization covers a lot of ground. It can be divided into two stages according to the data volume to be supported: single-machine optimization, and splitting into multiple databases and tables (sharding). A single machine can generally handle up to about 5 million rows or 10 GB of data; beyond that, sharding has to be considered. In addition, interviews at large companies often start from the single-machine database and work step by step toward sharding, with many database optimization questions interspersed along the way. This article attempts to describe some practices of single-machine database optimization. The database is MySQL; corrections are welcome if anything is unreasonable.

1. Table structure optimization

When starting an application, the design of the database table structure often affects the performance of the application later, especially the performance after the user volume increases. Therefore, table structure optimization is an important step.

1.1 Character set

Generally speaking, choose UTF-8. Although GBK uses less storage space than UTF-8 when storing Chinese, UTF-8 is compatible with every language, and there is no need to sacrifice that scalability for a little storage space. Converting from GBK to UTF-8 later is very costly and requires a data migration, whereas storage space can simply be bought by adding disks.

1.2 Primary key

When using MySQL's InnoDB, the underlying storage model is a B+ tree that uses the primary key as the clustered index and the inserted rows as leaf nodes; a row can be found quickly by walking the tree from the primary key. Therefore, every table should be given a primary key, preferably an auto-incrementing one. With an auto-incrementing primary key, new rows are appended to the leaf nodes of the underlying B+ tree in primary key order; because the keys arrive in order, the insert almost never needs to move existing data, so insertion is very efficient. If the primary key is not auto-incrementing, its value is roughly random on each insert, and a large amount of data may have to be moved to preserve the B+ tree's properties, adding unnecessary overhead.

1.3 Fields

1.3.1 Fields that are indexed must be given NOT NULL constraints, and default values must be set.

1.3.2 Do not use float or double to store decimals; to prevent precision loss, use decimal.

1.3.3 Do not use text/blob columns to store large amounts of data. Reading and writing large text causes relatively heavy I/O and occupies MySQL's cache, which greatly reduces throughput under high concurrency. Store large text in a dedicated file storage system and keep only its access address in MySQL; for example, blog articles can live in files, with only each file's relative path stored in MySQL.

1.3.4 The length of a varchar column should not exceed 8K.

1.3.5 Use datetime for time columns, not timestamp. Although datetime occupies 8 bytes while timestamp occupies only 4, timestamp must be non-null and is time-zone sensitive.

1.3.6 It is recommended to add gmt_create and gmt_modified fields to every table, recording when a row was created and last modified. These fields make troubleshooting much easier.
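Pulling the field guidelines above together, a minimal table sketch might look like this. The table and column names are illustrative, and the DEFAULT CURRENT_TIMESTAMP / ON UPDATE CURRENT_TIMESTAMP clauses on datetime columns assume MySQL 5.6+:

```sql
-- Illustrative table following the guidelines above: auto-increment
-- primary key, NOT NULL indexed column with a default, decimal for
-- money, datetime for time, plus gmt_create / gmt_modified.
CREATE TABLE user_order (
    id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_name    VARCHAR(64)     NOT NULL DEFAULT '',
    amount       DECIMAL(10, 2)  NOT NULL DEFAULT 0.00,  -- not float/double
    gmt_create   DATETIME        NOT NULL DEFAULT CURRENT_TIMESTAMP,
    gmt_modified DATETIME        NOT NULL DEFAULT CURRENT_TIMESTAMP
                                 ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY (id),
    KEY idx_user_name (user_name)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
```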

1.4 Index creation

1.4.1 At this stage the business is not yet well understood, so it is best not to add indexes blindly; add ordinary indexes only on fields that will definitely be used for lookups.

1.4.2 A single-column InnoDB index should not exceed 767 bytes in length; if it does, only the first 255 bytes are used, as a prefix index.

1.4.3 In an InnoDB composite index, each column's indexed length should likewise not exceed 767 bytes, and the total should not exceed 3072 bytes.
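Where a long column would exceed these limits, a prefix index can also be declared explicitly. A sketch, with an assumed article table and an illustrative prefix length:

```sql
-- A varchar(255) column under utf8mb4 can take up to 4 bytes per
-- character, far beyond the 767-byte single-column limit; indexing
-- only a prefix of the column keeps the index within bounds.
ALTER TABLE article ADD KEY idx_title (title(64));  -- first 64 characters
```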

2. SQL optimization

Generally speaking, SQL statements come in only a few kinds: basic CRUD operations, pagination queries, range queries, fuzzy search, and multi-table joins.

2.1 Basic queries

Queries should generally go through an index. If no index applies, first consider modifying the query to add an indexed field. If the business scenario makes that field unusable, look at the query's call volume: if it is large, say 100,000+ calls per day, a new index needs to be added; if it is small, say around 100 calls per day, the query can be left as is. In addition, use select * as little as possible: list in the SQL only the fields you actually use, and do not fetch unnecessary fields, which wastes I/O and memory.
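A minimal sketch of the last point, with illustrative table and column names:

```sql
-- Wasteful: fetches every column, including ones the caller ignores.
SELECT * FROM A WHERE name = 'john';

-- Better: list only the columns actually used.
SELECT id, name FROM A WHERE name = 'john';
```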

2.2 High-performance pagination

limit m,n actually scans the first m+n rows and then returns n rows starting from row m. The deeper the pagination goes, the larger m becomes and the lower the performance. For example:

select * from A limit 100000,10 is very inefficient; it is recommended to change it to the following version:

select id,name,age from A where id >=(select id from A limit 100000,1) limit 10
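When pages are reached by scrolling forward rather than jumping to an arbitrary offset, another common variant (not in the original text) is keyset pagination, which assumes an auto-increment id and that the client remembers the last id of the previous page:

```sql
-- Keyset ("seek") pagination: no offset scan at all.
-- 100010 stands in for the last id seen on the previous page.
SELECT id, name, age
FROM A
WHERE id > 100010
ORDER BY id
LIMIT 10;
```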

2.3 Range queries

Range queries include between, greater-than, less-than, and in. MySQL limits the number of values in an in list: with few values the index can be used, but with many values the query degenerates into a full table scan. Range conditions such as between, greater-than, and less-than stop the index from being used for the columns that follow them, so place them after the conditions that can use the index.

2.4 Fuzzy queries (like)

A condition such as like '%name%' cannot use an index and amounts to a full table scan. With a small data volume this is not much of a problem, but with a large one performance drops sharply. For large data volumes it is recommended to replace such fuzzy searches with a search engine and, where necessary, to put an indexable condition in front of the fuzzy one.
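A sketch of the difference, with illustrative names (city_id stands in for whatever indexable condition the business can provide):

```sql
-- Cannot use an index on name (leading wildcard): full table scan.
SELECT id, name FROM A WHERE name LIKE '%john%';

-- A prefix match can use an index on name.
SELECT id, name FROM A WHERE name LIKE 'john%';

-- If a contains-match is unavoidable, narrow the scanned rows first
-- with an indexable equality condition.
SELECT id, name FROM A WHERE city_id = 42 AND name LIKE '%john%';
```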

2.5 Multi-table joins

Both subqueries and joins can retrieve data across multiple tables, but subqueries perform poorly, and it is recommended to rewrite them as joins. MySQL's join uses the Nested Loop Join algorithm: it probes the next table using the result set of the previous one. For example, if the previous table's result set has 100 rows and the next table has 100,000 rows, up to 100 × 100,000 row combinations may need to be filtered to get the final result. Therefore, join small result sets against large tables, and create indexes on the join columns. If an index cannot be created, set a sufficiently large join buffer size. If none of these techniques solves the performance problem caused by the join, it is better not to join at all: split the single join query into two simple queries. In addition, a join should involve at most three tables; beyond that, performance is generally very poor and the SQL should be split.
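A sketch of splitting a join into two simple queries, with illustrative tables; the application code stitches the two result sets together:

```sql
-- One join ...
SELECT o.id, o.amount, u.name
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.status = 1;

-- ... split into two simple queries; the application collects the
-- user_id values from the first result and feeds them to the second.
SELECT id, amount, user_id FROM orders WHERE status = 1;
SELECT id, name FROM users WHERE id IN (/* user_ids from above */);
```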

3. Database connection pool optimization

The database connection pool is essentially a cache, and a means of withstanding high concurrency. Optimizing it mainly means tuning its parameters. We generally use the DBCP connection pool; its parameters are as follows:

3.1 initialSize

The number of initial connections. "Initial" refers to the first call to getConnection, not to application startup. It can be set to the historical average concurrency.

3.2 minIdle

The minimum number of idle connections to retain. DBCP starts a background thread to recycle idle connections, and when it does so it keeps minIdle of them. Generally set it to 5; if concurrency is really low, it can be set to 1.

3.3 maxIdle

The maximum number of idle connections to retain, set according to the business's peak concurrency. For example, if the concurrency peak is 20, these connections are not recycled immediately after the peak passes; if another peak arrives shortly afterwards, the pool can reuse the idle connections instead of frequently creating and closing connections.

3.4 maxActive

The maximum number of active connections, set according to the highest acceptable concurrency. For example, if a single machine can tolerate at most 100 concurrent requests, set maxActive to 100; at most 100 requests will then be served simultaneously, and excess requests are discarded after the maximum wait time. This value must be set, to fend off malicious concurrency attacks and protect the database.

3.5 maxWait

The maximum time to wait for a connection. Set it short, for example 3 s, so that requests fail fast: a thread waiting for a connection cannot be released, and a single machine has only a limited number of threads. If this value is set too long, say the 60 s often suggested online, the thread cannot be released for 60 s; once such requests pile up, the application has fewer and fewer available threads and the service becomes unavailable.

3.6 minEvictableIdleTimeMillis

How long a connection may remain idle before it is recycled; the default is 30 minutes.

3.7 validationQuery

A simple SQL statement used to check whether a connection is still valid; it is recommended to set it.

3.8 testOnBorrow

Validates a connection when it is borrowed. Not recommended to enable: it severely affects performance.

3.9 testOnReturn

Validates a connection when it is returned. Not recommended to enable: it severely affects performance.

3.10 testWhileIdle

When enabled, the background eviction thread periodically runs validateObject on idle connections and clears any that turn out to be invalid. This does not affect performance and is recommended.

3.11 numTestsPerEvictionRun

The number of connections checked on each eviction run. It is recommended to set it equal to maxActive, so that every connection is effectively checked each time.

3.12 Preheating the connection pool

It is recommended to warm up the connection pool at application startup by running a few simple SQL queries before serving external traffic, so that the pool is filled with the necessary number of connections.
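Pulling the parameters above into one place, a DBCP 1.x configuration sketch might look like the following. The values are illustrative and must be tuned to the actual concurrency; timeBetweenEvictionRunsMillis is an extra DBCP parameter, not discussed above, that the eviction thread needs in order to run at all.

```properties
initialSize=10
minIdle=5
maxIdle=20
maxActive=100
maxWait=3000
minEvictableIdleTimeMillis=1800000
# the eviction thread only runs if this interval is set (> 0)
timeBetweenEvictionRunsMillis=60000
validationQuery=SELECT 1
testOnBorrow=false
testOnReturn=false
testWhileIdle=true
numTestsPerEvictionRun=100
```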

4. Index optimization

Once the data volume grows to a certain point, SQL optimization alone can no longer improve performance, and it is time to bring out the big gun: indexes. There are three levels of indexing, and mastering these three is generally sufficient. In addition, the selectivity of the indexed fields should be considered.

4.1 First-level index

Create indexes on the columns in the where conditions. A single column needs only an ordinary index; multiple columns need a composite index, and the leftmost-prefix principle must be observed when creating one.
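A sketch of the leftmost-prefix rule, with illustrative column names:

```sql
-- Composite index on (a, b):
ALTER TABLE A ADD KEY idx_a_b (a, b);

-- Can use the index (leftmost prefix present):
SELECT * FROM A WHERE a = 1;
SELECT * FROM A WHERE a = 1 AND b = 2;

-- Cannot use the index (leftmost column missing):
SELECT * FROM A WHERE b = 2;
```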

4.2 Second-level index

If a field appears in order by or group by, consider creating an index on it. Because indexes are naturally ordered, this avoids the sort that order by and group by would otherwise perform, improving performance.

4.3 Third-level index

If the two methods above are still not enough, add the queried fields themselves to the index, producing what is called a covering index. This saves one I/O: MySQL normally searches the secondary index first, obtains the primary key from it, and then fetches the full row through the primary key (clustered) index. If everything the query needs is already in the secondary index, that last step is unnecessary. Of course, building indexes this way is rather extreme and not suitable for general scenarios.
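A sketch of a covering index, with illustrative names:

```sql
-- The query touches only name and age:
SELECT name, age FROM A WHERE name = 'john';

-- A composite index containing every referenced column lets MySQL
-- answer from the index alone, skipping the clustered-index lookup:
ALTER TABLE A ADD KEY idx_name_age (name, age);
```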

4.4 Index selectivity

When choosing index columns, prefer fields with high selectivity. What is high selectivity? It means that the amount of data found through the field is small: searching for a person by name usually returns very few rows, while searching by gender may bring back half the table. Name is therefore a high-selectivity field, and gender a low-selectivity one.

5. Historical data archiving

When the data grows by about 5 million rows per year and indexes can no longer help, the usual response is to consider splitting databases and tables. But if the business is not growing explosively and the data is merely increasing slowly, there is no need for a technique as complex as sharding; instead, archive historical data. We target data that has completed its life cycle, for example data older than 6 months. A quartz scheduled job can run at midnight, find the data from more than 6 months ago, and move it to a remote HBase server. Of course, an interface for querying the archived data should also be provided for future needs.
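The text archives to HBase via a quartz job; as a simpler sketch of the same idea, archiving into a MySQL history table (the table names and the 6-month cutoff are illustrative) could look like this:

```sql
-- Copy rows that finished their life cycle 6+ months ago ...
INSERT INTO orders_history
SELECT * FROM orders
WHERE gmt_create < DATE_SUB(NOW(), INTERVAL 6 MONTH);

-- ... then remove them from the live table. In practice both
-- statements should run in batches to avoid long locks.
DELETE FROM orders
WHERE gmt_create < DATE_SUB(NOW(), INTERVAL 6 MONTH);
```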


