You may wish to generate type 1 UUIDs if you need to produce massive numbers of UUIDs at a time (such as an archiving process), but a sequential UUID type 4 should be fine in most situations. Conclusionĭon't worry about the cost of switching to using UUIDs in your tables as performance can remain good, even as your table grows to millions of rows in size. It took about a minute to insert 5 million fake logs which I consider an extreme workload. Using type 1 UUIDs is still slower than using an auto-incrementing integer, but I feel that it is worth the cost. Watch your CPU utilization whilst the tests are running and you will see what I mean. This is because it takes noticeably longer for the code to generate these UUIDs which get fed into the queries. We would expect the "pure" type 4 UUID scenario to take longer than the others because the queries took so much longer to complete, however you can see that even the sequential type 4 takes noticeably longer than using type 1. Not only does this account for long the queries took to run, but also how long it might have taken the software to generate the UUIDs necessary to create the query. The chart below shows how long each of the tests took to run (inserting 5 million fake logs to an empty table). Using type 1 (time based and thus sequential) or sequential type 4 results in practically the same performance no matter what the table size, which is good to have. This is because the records are not being just appended, but have to be inserted into the body of the table data each time. This chart shows the same results, but without UUID type 4 - random, so that you can more clearly see the performance difference between the other 3 scenarios.Īs you can see, using a "native" type 4 will kill performance if your table grows to a large size. Please note that it just records the time that it took for the query to be sent off and completed, and not how long it took for my code to create the query as we are just measuring database performance here. The chart below shows how long each of the insert queries (of 100,000 fake logs) took to make in seconds. You can also download the results spreadsheet that has my test results and was used to generate the graphs in the body of this tutorial. I published the code on Github if you would like to run the tests yourself, tweak them, or find any possible flaws with my logic. Scenario 4 - using UUID type 4 (still random but manipulated to be sequential).Scenario 3 - using UUID type 4 (random).Scenario 2 - using UUID type 1 (time based).Scenario 1 - using auto-incrementing unsigned integer.I decided to write some code to benchmark the batch insertion of millions of fake logs to see how much impact some various scenarios would have: So for a sha512, the field would be BINARY (64) since a sha512 encoding is 128 characters long.You may be concerned about the performance impact of switching from an auto-incrementing integer to a UUID. If you want to use this technique with any hex string, always do length / 2 for the field length. SELECT HEX(field_binary) AS field_binary FROM `test_table` If the positions are always different you could store that info in a second field.įull example : CREATE TABLE `test_table` ( Now in your programming language, re-insert the dashes at the positions 9, 14, 19 and 24 to match your original UUID. SELECT HEX(FieldBin) AS FieldBin FROM Table Since a single BINARY in MySQL is 8 bits in size, BINARY(16) is the size of a UUID (8*16 = 128). Now it's 32 chars (like an MD5 hash, which this also works with). So having UUID as PRIMARY KEY makes the index bigger which cannot be fit into the memory Inserts are random and the data is scattered. InnoDB stores data in the PRIMARY KEY order and all the secondary keys also contain PRIMARY KEY. Since UUID is 128 bits and is written as hexadecimal, it's very easy to speed up and store the UUID.įirst, in your programming language remove the dashesįrom 110E8400-E29B-11D4-A716-446655440000 to 110E8400E29B11D4A716446655440000. UUID has 36 characters which make it bulky. In fact, MySQL is faster using binary than anything else when indexes are required. If I understand correctly, you're using UUIDs in your primary column? People will say that a regular (integer) primary key will be faster, but there's another way using MySQL's dark side.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |