Article X4PB CodeSOD: Collated Performance

CodeSOD: Collated Performance

by
Remy Porter
from The Daily WTF on (#X4PB)

Eliza had a co-worker who had a problem. There were users with names like "Reni(C)". Other users, using standard keyboards, wanted to search for "Reni(C)", but couldn't be bothered to figure out how to type that accent, so they just searched for "Rene".

The co-worker came up with this solution:

SELECT * FROM table WHERE UPPER(trim(translate(first_name, 'iiiii+iiiiiiTMiii"i-iiAi(C)iiiiiiiii^1ii1/4iiii(R)ii', 'EEEEEEAAAACUUUOOIIYeeeeaaaacuuuuooiiy'))) LIKE (translate('FIRSTNAMESEARCHSTRING%', 'iiiii+iiiiiiTMiii"i-iiAi(C)iiiiiiiii^1ii1/4iiii(R)ii', 'EEEEEEAAAACUUUOOIIYeeeeaaaacuuuuooiiy')) AND UPPER(trim(translate(last_name, 'iiiii+iiiiiiTMiii"i-iiAi(C)iiiiiiiii^1ii1/4iiii(R)ii', 'EEEEEEAAAACUUUOOIIYeeeeaaaacuuuuooiiy'))) LIKE (translate('LASTNAMESEARCHSTRING%', 'iiiii+iiiiiiTMiii"i-iiAi(C)iiiiiiiii^1ii1/4iiii(R)ii', 'EEEEEEAAAACUUUOOIIYeeeeaaaacuuuuooiiy'))

Eliza noticed it because this query was extremely slow- and no surprise. translate is unavoidably a character by character operation. It was taking 30 seconds to search their database with a nine character last name that didn't even contain any of the replaced characters. And of course, in most collations and character sets, i and upper('i(C)') are going to be the same character, making half the replacements completely unnecessary.

Speaking of collations, they're a tool that's been standardized to make this really complicated problem of determining which characters are the same, or what order is "alphabetical" easy for programmers to solve. Every RDBMS supports them, and by specifying a collation. You can set the collation either on the table, the column, the individual query you're running against, or for the session of the query, using the standard SQL command COLLATE.

Eliza added some code to issue a COLLATE utf8_general_ci; command before running the query, removed the translates, and watched the query execute in 1/30th of a second, instead of 30 seconds.

inedo50.png[Advertisement] Use NuGet or npm? Check out ProGet, the easy-to-use package repository that lets you host and manage your own personal or enterprise-wide NuGet feeds and npm repositories. It's got an impressively-featured free edition, too! TheDailyWtf?d=yIl2AUoC8zALqZrjv_ekUs
External Content
Source RSS or Atom Feed
Feed Location http://syndication.thedailywtf.com/TheDailyWtf
Feed Title The Daily WTF
Feed Link http://thedailywtf.com/
Reply 0 comments