1) During the initiation of a project, there are many other important tasks competing for attention just to get the project moving.
2) In the middle of the project, you will find that you have only a few months left to close the new tasks and the accumulated TODOs from the previously finished components.
3) When nearing UAT, you will show a part of the completed product to the customer, and they will jump in with new suggestions. In the meantime you hand over your work of art :) to the QA engineers. QA engineers are born critics, so they will give you a hard time, and your project managers will be staring at you when you report the bug count in your daily meeting.
4) UAT will be the place for negotiations, so you will be given a chance to decorate your system with some more features in a short period of time :)
So it is no wonder we feel that spending time on performance tuning means nothing to us. But I believe we should be aware of the performance tuning techniques that can be applied at the different stages of project development. A few simple techniques can be followed from the start of the project. Others will not do you much good early on, because they require information that only becomes available after the system is fully integrated. So we had better get to know some of these techniques and when to use them.
I have divided the Performance tuning techniques into 3 categories based on different stages of the project.
1) Project development initiation
i) Choose good data types
ii) Choose effective storage engines for your requirement
2) During the internal releases
i) Try PROCEDURE ANALYSE and fine-tune the data type definitions
ii) Query tuning
This can help you avoid latency bugs in your system
3) Preproduction tests in production environment
i) Introduce indexes based on the currently available queries in the system
ii) Can try partitioning the tables based on your query logic
This will help you to deliver what you might have promised under your non-functional requirements section
Let me take a closer look at these techniques.
1.0 Project development initiation
1.1 Make sure that you use appropriate data types. If you choose a string data type for an attribute which holds only numerical values, you will need additional processing to convert the string to a number before performing calculations. You can avoid this by collecting as many details as possible about an attribute before doing the database design.
1.2 When you choose an appropriate length, you can save a lot of disk space. Your hardware is expensive, and unnecessary space grows as your data grows. Use an appropriate character length: using a CHAR where a VARCHAR is more suitable can lead to space being used unnecessarily.
1.3 Use NOT NULL. NULL is a different creature to MySQL, so it is processed in a different way; a nullable column can cause integrity issues and add unnecessary processing. So whenever possible, go for a NOT NULL constraint on a column.
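As a minimal sketch of these three points together (the table and column names here are hypothetical, not from any real schema):

```sql
-- Numeric values stored as numbers, lengths sized to the data,
-- and NOT NULL enforced wherever the attribute allows it.
CREATE TABLE customer (
  id           INT UNSIGNED     NOT NULL AUTO_INCREMENT,
  country_code CHAR(2)          NOT NULL,  -- truly fixed length: CHAR fits
  name         VARCHAR(100)     NOT NULL,  -- variable length: VARCHAR saves space
  age          TINYINT UNSIGNED NOT NULL,  -- 1 byte, no string-to-number conversion needed
  PRIMARY KEY (id)
);
```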
1.4 When you are trying to choose an appropriate storage engine, you need two kinds of information.
i) What storage engines are available in MySQL and what they are good at
ii) The nature of the usage of your tables
Let me give a brief introduction on the Storage Engines in MySQL.
MyISAM (Table level locking)
- Favourable when you have a lot of reads and fewer writes
- Read locks can be shared but not the write locks
- So only one write at a time to one table
- When we are not fan of transaction handling :)
- Good choice for a Reporting summary database, which is updated less frequently (scheduled summary updates)
- Quite harmful in high-TPS write systems
InnoDb (Row level locking)
- Suitable when you have more writes compared to reads
- Because of row-level locking, writes that target different rows can proceed at the same time
- Supports Transaction handling
- Uses more memory, and performs really well when plenty of memory is allocated: more data is buffered in memory, which provides faster access than the hard disk
- innodb_buffer_pool_size – needs to be tuned based on the available memory capacity; less disk I/O results in faster responses
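As a sketch of that tuning (the 8 GB figure is an assumption for a dedicated database server; the right value depends entirely on your own memory capacity and workload):

```ini
[mysqld]
# Assumed: dedicated DB host with 8 GB RAM, so a large share
# of it can be handed to the InnoDB buffer pool.
innodb_buffer_pool_size = 5G
```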
Archive (Less storage space)
- Supports only SELECTs and INSERTs
- So it is only suitable for storing data that is never changed but can be appended to
- Ideal for keeping the audit data of the systems, which produce high volume of audit data
- Requires less storage space than even packed MyISAM tables, so it is a good choice to save storage.
Memory (Table stored in memory)
- Uses hash indexes, which make access faster
- When you shutdown the server, data in memory tables will vanish (Oops..)
- Used when temporary data needs to be kept in memory for faster access
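Putting the four engines side by side, a sketch of matching each table to its usage pattern might look like this (all table names and columns are hypothetical):

```sql
-- Read-mostly reporting data, refreshed by scheduled batch updates
CREATE TABLE report_summary (report_day DATE NOT NULL, total INT NOT NULL)
  ENGINE=MyISAM;

-- Transactional, write-heavy table
CREATE TABLE orders (id INT UNSIGNED NOT NULL PRIMARY KEY,
                     amount DECIMAL(10,2) NOT NULL)
  ENGINE=InnoDB;

-- Append-only audit trail: INSERT and SELECT only, compact on disk
CREATE TABLE audit_log (logged_at DATETIME NOT NULL,
                        detail VARCHAR(255) NOT NULL)
  ENGINE=ARCHIVE;

-- Volatile lookup data; vanishes on server restart
CREATE TABLE session_cache (token CHAR(32) NOT NULL, user_id INT NOT NULL)
  ENGINE=MEMORY;
```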
2.0 During the internal releases
2.1 PROCEDURE ANALYSE is one of the useful suggestion features provided by MySQL. To use this tool, though, you need to have all the possible varieties of data in your system. So it is best to use this feature after you have built the system to a level where you can populate it with a variety of data from automated test cases.
SELECT capital, continent FROM Country PROCEDURE ANALYSE()\G
- Use it to minimize field sizes, which is important because MySQL often allocates memory based on the maximum field size
- Use it to determine if a field should allow NULL
- Make sure you have exercised all the system functionalities (including error scenarios), so that the data covers the most likely values
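As a sketch of acting on its output (the suggested type below is a hypothetical outcome; the real suggestion depends entirely on your data):

```sql
-- Ask MySQL for suggestions on a single column (world sample database)
SELECT Continent FROM Country PROCEDURE ANALYSE()\G

-- Suppose Optimal_fieldtype comes back as a small ENUM with NOT NULL;
-- the column definition could then be tightened accordingly:
ALTER TABLE Country
  MODIFY Continent ENUM('Asia','Europe','Africa','North America',
                        'South America','Oceania','Antarctica') NOT NULL;
```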
3.0 Preproduction tests in production environment
3.1 Indexes are a broad subject on their own. Depending on the situation, they can work like magic and make a big difference. So it is always good to follow the possible indexing techniques to speed up your queries.
- Can speed up locating rows – index files are loaded into memory, which gives faster access to them
- Can speed up sorting rows – order by queries can be faster
- Can speed up join queries – when join columns are indexed, join queries will show a bigger improvement
Choose columns for indexing
- Columns from where conditions
- Columns from order by
- Columns used in the join conditions
- When the data stored in the column is evenly distributed
- Don't index columns which do not have a good variety of values, e.g. Yes/No valued columns; the optimizer would prefer a full table scan over the index files
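The column choices above can be sketched against a single hypothetical query (the `orders` and `customer` tables here are assumptions for illustration):

```sql
-- Query to support:
--   SELECT o.id, c.name
--   FROM orders o JOIN customer c ON o.customer_id = c.id
--   WHERE o.created_at >= '2012-01-01'
--   ORDER BY o.created_at;

CREATE INDEX idx_orders_created  ON orders (created_at);  -- serves WHERE and ORDER BY
CREATE INDEX idx_orders_customer ON orders (customer_id); -- join column
-- c.id is the primary key, so the other side of the join is already indexed
```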
Problems with indexes
- Don't over-index; the extra indexes waste memory for no reason
- Slower writes: the time taken to write data increases, because index integrity must be maintained
- Don't index columns which have highly duplicated values, e.g. a Gender (Male/Female) kind of column; the optimizer would prefer a full table scan over the index files
3.2 Don't forget to use EXPLAIN to see whether your indexes are being used by MySQL while executing your queries.
The EXPLAIN command shows MySQL's query execution plan.
create index idx_city_name on city(name(3));
mysql> explain select * from world.city where name like 'Rub%';
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | city | range | idx_city_name | idx_city_name | 3 | NULL | 1 | Using where |
select_type – Describes the type of SELECT being performed
type – Describes the access strategy deployed by MySQL to get the data in the table or index in this row
possible_keys – Provides the available indexes in the table
key – Displays the actual key chosen to perform the data access
key_len – Provides the length, in bytes, of the key chosen
rows – Displays the number of rows that MySQL expects to find
3.3 Partitioning refers to the physical way MySQL stores a table's data.
A partitioning rule on a MySQL table stores the table as multiple segments. From the application's point of view, it is the same good old table, but it helps to improve performance in the background.
It improves performance in the following ways.
Parallel partition scans
- Enables concurrent access to the different partitions
- The query analyzer can leave unneeded partitions out of the execution plan, so unwanted partition scans are eliminated during data retrieval.
Partitions can be of 2 types.
Range or list
Based on the values of a particular column
- Suitable for archiving and removing data partition by partition; database maintenance will be faster
- Depends on the data of the column (e.g. the created_date column of a table)
Hash or Key
Suitable for data which is distributed evenly
- Depends on the storage engine
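A sketch of the range case, using a hypothetical audit table partitioned by the year of its created_date column:

```sql
-- Queries filtering on created_date only scan the matching partitions,
-- and whole years can be archived away cheaply.
CREATE TABLE audit_log (
  id           BIGINT UNSIGNED NOT NULL,
  created_date DATE            NOT NULL,
  detail       VARCHAR(255)    NOT NULL
)
PARTITION BY RANGE (YEAR(created_date)) (
  PARTITION p2010 VALUES LESS THAN (2011),
  PARTITION p2011 VALUES LESS THAN (2012),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Fast maintenance: remove an entire year in one statement
ALTER TABLE audit_log DROP PARTITION p2010;
```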
I believe the techniques above are listed in the order in which they could be applied.