CodeSOD: ImAlNumb?

by Remy Porter, from The Daily WTF (#4Q3F0)

I think it's fair to say that C, as a language, has never had a particularly great story for working with text. Individual characters are okay, but strings are a nightmare. The need to support Unicode has only made that story a little more fraught, especially as older code now suddenly needs to support extended characters. And by "older" I mean "wchar.h was added in 1995, which is practically yesterday in C time".

Lexie inherited some older code. It was not designed to support Unicode, which is certainly a problem in 2019, and it's the problem Lexie was tasked with fixing. But it had an... interesting approach to deciding if a character was alphanumeric.

Now, if we limit ourselves to ASCII, there are a variety of ways we could do this check. We could convert the character to a number and do a simple range check: characters 48-57 are numeric, and 65-90 and 97-122 cover the alphabetic characters. But that's a conditional expression with six comparison operations! So maybe we should be more clever. There is a built-in library function, isalnum, which might be more optimized, and is available on Lexie's platform. But we're dedicated to really doing some serious premature optimization, so there has to be a better way.
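For reference, the two "boring" options described above might look something like the sketch below. The function names is_alnum_ascii and is_alnum_libc are made up for illustration and do not come from Lexie's codebase, and the character constants assume an ASCII execution character set.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: the six-comparison ASCII range check. */
static bool is_alnum_ascii(unsigned char c)
{
    return (c >= '0' && c <= '9')     /* 48-57  */
        || (c >= 'A' && c <= 'Z')     /* 65-90  */
        || (c >= 'a' && c <= 'z');    /* 97-122 */
}

/* Hypothetical wrapper around the standard isalnum.
 * The cast matters: passing a negative char value to isalnum
 * is undefined behavior. */
static bool is_alnum_libc(char c)
{
    return isalnum((unsigned char)c) != 0;
}
```

In the default "C" locale the two should agree on every ASCII character. isalnum is locale-dependent, though, so under other locales it may accept characters that the range check rejects.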

bool isalnumCache[256] ={false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false,false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, true, true, true, true, true, true, true, true, true, true, false, false, false, false, false, false,false, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, false, false, false, false, false,false, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, false, false, false, false,false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false,false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false,false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false,false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false};

This is a lookup table. Convert your character to an integer, and then use it to index the array. This is fast. It's also error-prone, and this block does incorrectly identify a non-alphanumeric character as alphanumeric. It also 100% fails if you are dealing with wchar_t, whose values can run far past the table's last index of 255, which is how Lexie ended up looking at this block in the first place.
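If you really did want a table, one way to avoid typing 256 literals by hand is to fill it from isalnum at startup, and to audit any hand-typed table against the library. The sketch below is only an illustration: alnum_table, init_alnum_table, first_mismatch, and is_alnum_wide are hypothetical names, not anything from Lexie's code, and the results assume the default "C" locale. For wide characters it simply defers to the standard iswalnum rather than indexing a 256-entry array.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical generated replacement for a hand-typed table. */
static bool alnum_table[256];

/* Fill the table from isalnum. Values 0..255 are all valid
 * arguments to isalnum on platforms with 8-bit chars. */
static void init_alnum_table(void)
{
    for (int i = 0; i < 256; i++) {
        alnum_table[i] = isalnum(i) != 0;
    }
}

/* Return the first index where a 256-entry table disagrees
 * with isalnum, or -1 if the table matches everywhere. */
static int first_mismatch(const bool table[256])
{
    for (int i = 0; i < 256; i++) {
        if (table[i] != (isalnum(i) != 0)) {
            return i;
        }
    }
    return -1;
}

/* Wide characters: no table at all, just the standard
 * wide-character classifier from <wctype.h>. */
static bool is_alnum_wide(wchar_t wc)
{
    return iswalnum((wint_t)wc) != 0;
}
```

Running something like first_mismatch over a hand-typed table is a cheap way to catch an off-by-one in a long run of true values. Whether iswalnum accepts a given non-ASCII letter depends on the active locale, so this is a starting point rather than a complete Unicode answer.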
