CodeSOD: And It's Collated

by
Remy Porter
from The Daily WTF on (#1B0XE)

As anyone who's ever written a C-style char * string knows, strings are much more complicated than they look. This is even more true in this modern era of Unicode and character encodings and multilingual applications. How does "í" compare to "a" or "i"?

John Moore's company sent some code to a contracting firm. They needed to strip off any diacritics and unusual characters when comparing strings, so that "í" and "i" were treated as the same character when searching, a not uncommon problem. In Java, there's a special family of classes inheriting from Collator which can be used to solve exactly that problem. Now, most developers aren't deeply familiar with these, so seeing a contractor turn in a more "home brewed" approach is hardly surprising.
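For reference, here is a minimal sketch of the Collator route, assuming plain Java and an English locale. The class name AccentInsensitiveCompare and the helper sameBaseLetters are made up for illustration; only the java.text.Collator calls are real API. At PRIMARY strength the collator ignores accent and case differences, so only the base letters are compared.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class, not part of the submitted code.
final class AccentInsensitiveCompare {

    // Returns true when the two strings differ at most by accents or case.
    static boolean sameBaseLetters(String a, String b) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        // PRIMARY strength: only base-letter differences count.
        collator.setStrength(Collator.PRIMARY);
        // Decompose accented characters so base letter and mark are handled separately.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // "r\u00e9sum\u00e9" is "resume" with acute accents on both e's.
        System.out.println(sameBaseLetters("r\u00e9sum\u00e9", "resume"));
        System.out.println(sameBaseLetters("a", "b"));
    }
}
```

Note that this compares strings without producing a stripped copy, which is usually all a search needs.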

This approach goes above and beyond. It starts out bad, but not horrible: convert the string to its character codes, and then look at each one. What follows is a textbook example of why you don't write gigantic if-blocks using magic numbers as boundary conditions, including this gem:

    if (index == 292 && index == 294) {
        resultCharacterIndexes.add(68);//H
    }

Yes, they only convert to a 68 when the index character is both 292 and 294 at the same time, which may Ĥ and Ħ you. And while the comment claims 68 is the letter "H", 68 is actually "D"; "H" is 72.
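To make the problem concrete, here is a small Java sketch. The class and method names are invented for illustration, and the "presumably intended" version is a guess at what the contractor meant, not something the source confirms.

```java
// Hypothetical demo class contrasting the condition as written with a guessed fix.
final class HBranchCheck {

    // The condition exactly as submitted: a single value can never equal both numbers.
    static boolean asWritten(int index) {
        return index == 292 && index == 294;
    }

    // A guess at the intent: match either code point (U+0124 or U+0126).
    static boolean presumablyIntended(int index) {
        return index == 292 || index == 294;
    }

    public static void main(String[] args) {
        System.out.println(asWritten(292));           // false
        System.out.println(presumablyIntended(292));  // true
        System.out.println((char) 68);                // D
        System.out.println((char) 72);                // H
    }
}
```

The same never-true pattern guards the "D" branch (270 and 272) in the full method.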

For bonus points, they use the input parameter as their storage point for the return variable, which is fine, I suppose, but return source doesn't exactly sound like it's returning an altered reference.

protected String replaceCharacters(String source) {
    if (!String.isEmpty(source)) {
        List<Integer> resultCharacterIndexes = new List<Integer>();
        List<Integer> characterIndexes = source.getChars();
        for (Integer index : characterIndexes) {
            if (index >= 192 && index <= 198) {
                resultCharacterIndexes.add(65);//A
            } else if (index == 199 || index == 262 || index == 264 || index == 266 || index == 268) {
                resultCharacterIndexes.add(67);//C
            }
            if (index == 270 && index == 272) {
                resultCharacterIndexes.add(68);//D
            } else if (index >= 200 && index <= 203 || index == 208 || index == 274 || index == 276 || index == 278 || index == 280 || index == 282) {
                resultCharacterIndexes.add(69);//E
            } else if (index == 284 || index == 286 || index == 288 || index == 290) {
                resultCharacterIndexes.add(71);//G
            }
            if (index == 292 && index == 294) {
                resultCharacterIndexes.add(68);//H
            } else if (index >= 204 && index <= 207 || index == 296 || index == 298 || index == 300 || index == 302 || index == 304) {
                resultCharacterIndexes.add(73);//I
            } else if (index == 308) {
                resultCharacterIndexes.add(74);//J
            } else if (index == 310) {
                resultCharacterIndexes.add(75);//K
            } else if (index == 313 || index == 315 || index == 317 || index == 319 || index == 321) {
                resultCharacterIndexes.add(76);//L
            } else if (index == 209 || index == 323 || index == 325 || index == 327) {
                resultCharacterIndexes.add(78);//N
            } else if (index >= 210 && index <= 216 || index == 332 || index == 334 || index == 336) {
                resultCharacterIndexes.add(79);//O
            } else if (index == 340 || index == 342 || index == 344) {
                resultCharacterIndexes.add(82);//R
            } else if (index == 346 || index == 348 || index == 350 || index == 352) {
                resultCharacterIndexes.add(83);//S
            } else if (index == 354 || index == 356 || index == 358) {
                resultCharacterIndexes.add(84);//T
            } else if (index >= 217 && index <= 220 || index == 360 || index == 362 || index == 364 || index == 366 || index == 368 || index == 370) {
                resultCharacterIndexes.add(85);//U
            } else if (index == 372) {
                resultCharacterIndexes.add(87);//W
            } else if (index == 221 || index == 374 || index == 376) {
                resultCharacterIndexes.add(89);//Y
            } else if (index == 377 || index == 379 || index == 381) {
                resultCharacterIndexes.add(90);//Z
            } else if (index >= 224 && index <= 230 || index == 257 || index == 259 || index == 261) {
                resultCharacterIndexes.add(97);//a
            } else if (index == 231 || index == 263 || index == 265 || index == 267 || index == 269) {
                resultCharacterIndexes.add(99);//c
            } else if (index == 271 || index == 273) {
                resultCharacterIndexes.add(100);//d
            } else if (index >= 232 && index <= 235 || index == 240 || index == 275 || index == 277 || index == 279 || index == 281 || index == 283) {
                resultCharacterIndexes.add(101);//e
            } else if (index == 285 || index == 287 || index == 289 || index == 291) {
                resultCharacterIndexes.add(103);//g
            } else if (index == 293 || index == 295) {
                resultCharacterIndexes.add(104);//h
            } else if (index >= 236 && index <= 239 || index == 297 || index == 299 || index == 301 || index == 303) {
                resultCharacterIndexes.add(105);//i
            } else if (index == 309) {
                resultCharacterIndexes.add(106);//j
            } else if (index == 311 || index == 312) {
                resultCharacterIndexes.add(107);//k
            } else if (index == 314 || index == 316 || index == 318 || index == 320 || index == 322) {
                resultCharacterIndexes.add(108);//l
            } else if (index == 241) {
                resultCharacterIndexes.add(110);//n
            } else if (index >= 242 && index <= 248 || index == 333 || index == 335 || index == 337) {
                resultCharacterIndexes.add(111);//o
            } else if (index == 341 || index == 343 || index == 345) {
                resultCharacterIndexes.add(114);//r
            } else if (index == 223 || index == 347 || index == 349 || index == 351 || index == 353) {
                resultCharacterIndexes.add(115);//s
            } else if (index == 355 || index == 357 || index == 359) {
                resultCharacterIndexes.add(116);//t
            } else if (index >= 249 && index <= 252 || index == 361 || index == 363 || index == 365 || index == 367 || index == 369 || index == 371) {
                resultCharacterIndexes.add(117);//u
            } else if (index == 373) {
                resultCharacterIndexes.add(119);//w
            } else if (index == 253 || index == 255 || index == 375) {
                resultCharacterIndexes.add(121);//y
            } else if (index == 378 || index == 380 || index == 382) {
                resultCharacterIndexes.add(122);//z
            } else {
                resultCharacterIndexes.add(index);
            }
        }
        source = String.fromCharArray(resultCharacterIndexes);
    }
    return source;
}
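For contrast, if the goal really is to produce a stripped copy of the string rather than just compare, a rough Java sketch using java.text.Normalizer might look like the following. The class and method names are hypothetical. It is not a drop-in replacement: the submitted method relies on calls such as String.fromCharArray that are not part of java.lang.String, and it is not behavior-identical either, since letters with no canonical decomposition (Ħ or Ł, for example) pass through unchanged here.

```java
import java.text.Normalizer;

// Hypothetical standalone sketch, assuming plain Java.
final class DiacriticStripper {

    // Decomposes each character into base letter plus combining marks (NFD),
    // then removes the marks, leaving only the base letters.
    static String stripDiacritics(String source) {
        if (source == null || source.isEmpty()) {
            return source;
        }
        String decomposed = Normalizer.normalize(source, Normalizer.Form.NFD);
        // \p{M} matches any Unicode combining mark.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // U+0124 is H with circumflex, U+00E9 is e with acute.
        System.out.println(stripDiacritics("\u0124\u00e9llo"));
    }
}
```

No magic numbers are needed, because the Unicode decomposition data already records which base letter each accented character belongs to.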