7 Advanced SQL Optimization Tips You Should Know

Aug 18 2024 sql 12 minutes read (About 1783 words)

Medium Link: 7 Advanced SQL Optimization Tips You Should Know | by Wesley Wei | Aug, 2024 | Programmer’s Career
Author: Wesley Wei – Medium

Preface

In the last post I shared 8 Essential SQL Optimization Tips. This article shares 7 Advanced SQL Optimization Tips, which I hope you will find helpful.

1 Using Join Queries Instead of Subqueries

In MySQL, when we need to query data from two or more tables, there are usually two ways to implement this: subqueries and join queries.

Bad Example: A subquery example:

select * from `order` where user_id in (select id from user where status=1);

A subquery can be written with the in keyword, which tests whether a value falls within the result set of another select statement. Conceptually, the innermost query runs first and the outer query then runs against its result.

Subqueries are simple and well structured, and they work fine when only a few tables are involved. The drawback is that MySQL may need to create a temporary table to execute the subquery and drop it once the query completes, which adds overhead.

Good Example: A join query example:

select o.* from `order` o inner join user u on o.user_id = u.id where u.status=1;

2 Avoid Too Many Join Tables

According to Alibaba's developer manual, a single query should join no more than 3 tables.

Bad Example:

select a.name,b.name,c.name,d.name from a inner join b on a.id = b.a_id inner join c on c.b_id = b.id inner join d on d.c_id = c.id inner join e on e.d_id = d.id inner join f on f.e_id = e.id inner join g on g.f_id = f.id

With too many joins, index selection becomes much more complex for MySQL, and it can easily pick the wrong index. Moreover, if no index is hit, MySQL falls back to a nested loop join, which compares rows from the two tables pair by pair and has a complexity of n^2.

Good Example:

select a.name,b.name,c.name,a.d_name from a inner join b on a.id = b.a_id inner join c on c.b_id = b.id

If the business scenario needs data from a few more tables, we can add redundant fields to the tables we already join. For example, adding a d_name field to table a stores the value we would otherwise have to fetch from table d.
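A minimal sketch of adding such a redundant field might look like the following. It assumes the tables are linked as in the bad example (a to b to c to d), that each row of a maps to a single row of d, and that d has a name column; the varchar(50) length is also an assumption.

```sql
-- Add the redundant column to table a (type and length are assumptions).
alter table a add column d_name varchar(50) not null default '';

-- One-off backfill: copy d.name into a.d_name along the existing join path.
update a
inner join b on b.a_id = a.id
inner join c on c.b_id = b.id
inner join d on d.c_id = c.id
set a.d_name = d.name;
```

The trade-off is that a.d_name has to be kept in sync whenever d.name changes, so this suits fields that rarely change.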

That said, I have also seen ERP systems with low concurrency but complex business logic that need to join 10 or more tables to retrieve their data.

Conclusion: The number of join tables should be determined based on the actual situation of our system. We should try to use as few joins as possible.

3 Notes on Joining

When we are dealing with multiple tables and need to query them, we usually use the join keyword.

The most commonly used join types are left join and inner join.

left join: Returns the intersection of two tables plus all remaining rows from the left table.
inner join: Returns only the rows that have matching values in both tables.

Here is an example of using inner join:

select o.id, o.code, u.name from `order` o inner join user u on o.user_id = u.id where u.status=1;

If we use inner join to join two tables, MySQL will automatically choose the smaller table as the driving table, so there won’t be a significant performance issue.

Here is an example of using left join:

select o.id, o.code, u.name from `order` o left join user u on o.user_id = u.id where u.status=1;

If we use left join to join two tables, MySQL will by default use the left table as the driving table. If the left table has a lot of data, this may cause performance issues.

We should be especially careful when using left join and make sure that the left table is smaller than the right table. If possible, try to use inner join instead of left join.
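One detail of the left join example is worth a closer look. Because the where clause filters on a column of the right table (u.status), orders without a matching user are discarded, so the query returns the same rows as the inner join version. A sketch of the two placements of the condition, using the same tables and columns as the examples above:

```sql
-- Filter in WHERE: orders with no matching user have u.status = NULL,
-- fail the condition, and are dropped. Same rows as the inner join query.
select o.id, o.code, u.name
from `order` o
left join user u on o.user_id = u.id
where u.status = 1;

-- Filter in ON: every order is kept; u.name is NULL when the order
-- has no user with status = 1.
select o.id, o.code, u.name
from `order` o
left join user u on o.user_id = u.id and u.status = 1;
```

If the first behaviour is what we want, writing it as an inner join states the intent directly and leaves MySQL free to pick the driving table.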

4 Controlling Index Quantity

It is well known that indexes can significantly improve the performance of SQL queries, but more indexes are not always better.

When new rows are added to a table, the corresponding index entries must be created as well, which requires additional storage space and has some performance overhead.

According to Alibaba’s developer manual, the number of indexes on a single table should be controlled to 5 or less, and each index should have no more than 5 fields.

MySQL uses a B+ tree structure to store indexes, and every insert, update, or delete operation also has to update the B+ tree of each affected index. With too many indexes, this maintenance adds significant overhead.

So, what if our table has too many indexes, exceeding 5?

This question requires careful consideration. If your system has low concurrency and a small data volume, you can go above 5 as long as you don’t exceed it by much.

However, for high-concurrency systems, please strictly follow the rule that a single table should not have more than 5 indexes.

So, how do we optimize the quantity of indexes in high-concurrency systems?

We can create composite indexes instead of individual indexes and delete unnecessary individual indexes.

We can also migrate some query functionality to other types of data stores, such as Elasticsearch or HBase, so that we only need to create a few key indexes on our business table.
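As a rough sketch of consolidating indexes, suppose the order table has two single-column indexes named idx_user_id and idx_status on hypothetical columns user_id and status (the index names and the status column are assumptions for illustration):

```sql
-- List the indexes currently defined on the table.
show index from `order`;

-- One composite index serves queries that filter on user_id alone
-- (leftmost prefix) as well as on user_id and status together.
alter table `order` add index idx_user_id_status (user_id, status);

-- The single-column index on user_id is now redundant and can be dropped.
alter table `order` drop index idx_user_id;
```

Whether idx_status can also go depends on the workload: a query that filters only on status cannot use the composite index efficiently, because status is not its leftmost column.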

5 Choosing Reasonable Field Types

The char type represents a fixed-length string, which occupies a fixed amount of storage space and may waste storage space.

alter table `order` add column code char(20) NOT NULL;

The varchar type represents a variable-length string, which adjusts its storage space according to the actual data length and does not waste storage space.

alter table `order` add column code varchar(20) NOT NULL;

If it’s a fixed-length field, such as a user’s phone number, which is usually 11 characters long, you can define it as a char type with a length of 11.
However, if it’s an enterprise name field, defining it as a char type would be problematic.

If the length is defined too long, such as 200 characters, but the actual enterprise name is only 50 characters long, the space for the other 150 characters is wasted.
If the length is defined too short, such as 50 characters, but the actual enterprise name has 100 characters, the value does not fit and, in strict SQL mode, the insert fails with an error.

Therefore, it’s recommended to change the enterprise name field to a varchar type, which can store variable-length strings and save storage space. Additionally, searching for data within a relatively small field is more efficient.

When choosing field types, we should follow these principles:

Use numeric types instead of string types whenever possible, because character processing is generally slower than numeric processing.
Try to use smaller types, such as bit for boolean values and tinyint for enumeration values.
Use char type for fixed-length strings.
Use varchar type for variable-length strings.
Use decimal type for monetary fields to avoid precision loss issues.
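Put together, these principles might lead to a table definition like the sketch below. The table and all column names are hypothetical and only serve to show one type choice per principle.

```sql
create table enterprise_account (
    id              bigint unsigned not null auto_increment primary key, -- numeric key instead of a string key
    phone           char(11)        not null,               -- fixed-length string
    enterprise_name varchar(200)    not null,               -- variable-length string
    is_active       bit(1)          not null default b'1',  -- boolean value
    account_type    tinyint         not null default 0,     -- small enumeration
    balance         decimal(12, 2)  not null default 0.00   -- money, exact arithmetic
);
```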

There are many more principles, but they won’t be listed here.

6 Improving the Efficiency of Group By

We often use the group by keyword in our business scenarios, mainly for grouping and deduplication.
It is often used together with having, which filters the data after grouping according to certain conditions.

Bad Example:

select user_id, user_name from `order` group by user_id, user_name having user_id <= 200;

This type of query has poor performance because it first groups all orders by user and only then filters out the users with IDs greater than 200.
Grouping is a relatively time-consuming operation. Why not filter out unnecessary data before grouping?

Good Example:

select user_id, user_name from `order` where user_id <= 200 group by user_id, user_name;

By applying the where condition before grouping, we discard the rows we don’t need up front and improve the efficiency of the query.

In fact, this is a thought process that is not limited to optimizing group by. When our SQL statements perform some time-consuming operations, we should try to reduce the scope of the data as much as possible before performing those operations. This can help improve the overall performance of the SQL statement.

7 Index Optimization

In SQL optimization, there is a very important aspect: “Index Optimization”.

The same SQL statement can perform very differently depending on whether it uses an index. Therefore, index optimization is considered the top priority in SQL optimization.

The first step of index optimization is to check whether the SQL statement uses an index.

So, how do we view whether a SQL statement uses an index?

We can use the explain command to view MySQL’s execution plan.

For example:

explain select * from `order` where code='002';

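The result contains one row for each table the query reads. Through columns such as type, possible_keys, key, key_len, rows, and Extra, we can judge whether an index is used and which one.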

Frankly, if a SQL statement does not use an index even though one exists, the most likely reason is that the index has become invalid, meaning the way the query is written prevents MySQL from using it.

Below are some common reasons for index invalidation:
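A few widely known cases can be sketched as follows. This is not necessarily the author's full list, and it assumes that code is a varchar column of the order table with a hypothetical index idx_code on it.

```sql
-- 1. A function applied to the indexed column: the index on code
--    generally cannot be used for the lookup.
explain select * from `order` where left(code, 2) = '00';
--    Rewritten as a prefix match, which can use the index:
explain select * from `order` where code like '00%';

-- 2. A leading wildcard: the index cannot narrow down the matching range.
explain select * from `order` where code like '%002';

-- 3. Implicit type conversion: comparing a string column with a number
--    makes MySQL convert the column values, so the index is not used.
explain select * from `order` where code = 2;
--    Quoting the literal keeps the comparison string-to-string:
explain select * from `order` where code = '2';
```

Comparing the type and key columns of each pair of plans is a quick way to confirm whether the rewritten form actually picks up the index on your own data.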

If it’s not one of these reasons, then we need to further investigate other possible causes.

Additionally, have you ever encountered a situation where the same SQL statement uses different indexes depending on the input parameters?

Yes, sometimes MySQL will choose the wrong index.
In such cases, we can use force index to force the query to use a specific index.
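For instance, assuming the order table has an index on code named idx_code (a hypothetical name), the hint goes right after the table name:

```sql
-- Ask MySQL to use idx_code for this query instead of its own choice.
select * from `order` force index (idx_code) where code = '002';

-- Check with explain that the key column now shows idx_code.
explain select * from `order` force index (idx_code) where code = '002';
```

A forced index stays in the statement even after the data distribution changes, so it is worth re-checking such hints from time to time.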

More Series Articles about Programming Skill:

https://wesley-wei.medium.com/list/programming-skills-81166d8d7555

And I’m Wesley, delighted to share knowledge from the world of programming.

See you in the next article. 👋