Article 1Z46F CodeSOD: Keeping it Regular

by Remy Porter
from The Daily WTF on (#1Z46F)

Regular expressions are like one of those multi-tools: they're a knife, they're a screwdriver, they're pliers, and there's a pair of tweezers stuck in the handle. We can use them to do anything.

For example, Linda inherited a site that counted up and down votes, like Reddit, implemented in CoffeeScript. Instead of using variables or extracting the text from the DOM, this code brought regular expressions to bear.

    updateBookmarkCounter = (upOrDown) ->
      counterSpan = $('.bookmark .counter')
      spanHtml = counterSpan.html()
      count = spanHtml.match(/\d/).first().toInteger()
      newCount = if (upOrDown == 'down') then (count - 1) else (count + 1)
      newCount = 0 if newCount < 1
      counterSpan.html(spanHtml.replace(/\d/, newCount))
      updateUserBookmarkCount upOrDown

There's a glitch in this code, and no, it's not that this code exists in the first place. Think about what happens once the number of votes reaches 10.
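To watch the glitch happen without a browser, here is a rough JavaScript rendering of the same logic that works on plain strings instead of the DOM. The function names `bumpBuggy` and `bumpFixed` are made up for this sketch, and `[0]` plus `parseInt` stand in for the original's `.first().toInteger()` helpers, which are assumed to come from some utility library. The only difference between the two versions is `\d` versus `\d+`.

```javascript
// Same logic as the CoffeeScript above, minus jQuery: take the counter's HTML
// as a string and return the updated string.
function bumpBuggy(spanHtml, upOrDown) {
  // /\d/ matches exactly ONE digit, so "10" is read as 1.
  const count = parseInt(spanHtml.match(/\d/)[0], 10);
  let newCount = upOrDown === 'down' ? count - 1 : count + 1;
  if (newCount < 1) newCount = 0;
  // ...and only that one digit is replaced, leaving the rest behind.
  return spanHtml.replace(/\d/, String(newCount));
}

// One possible repair, if the regex has to stay: match the whole run of digits.
function bumpFixed(spanHtml, upOrDown) {
  const count = parseInt(spanHtml.match(/\d+/)[0], 10);
  let newCount = upOrDown === 'down' ? count - 1 : count + 1;
  if (newCount < 1) newCount = 0;
  return spanHtml.replace(/\d+/, String(newCount));
}

console.log(bumpBuggy('9 votes', 'up'));  // "10 votes" (so far, so good)
console.log(bumpBuggy('10 votes', 'up')); // "20 votes" (1 became 2, the 0 stayed)
console.log(bumpFixed('10 votes', 'up')); // "11 votes"
```

Keeping the count in a variable, and only rendering it into the page, would sidestep the problem entirely.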

Okay, maybe not the best use of regexes. What about sanitizing inputs? That seems like a textbook use case. Alexander T's co-worker found a very clever way to convert any input into a floating point number. Any input.

    function convertToFloat(value) {
        if(typeof value == "number") return value;
        return parseFloat(value.replace(/\D/g, ''));
    }

Values like "D12.3" convert seamlessly to 123, since \D strips the decimal point right along with the letter.
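A few more inputs show how much information that replace throws away. The snippet below repeats the original function and, for contrast, adds `parseStrictFloat`, a hypothetical stricter helper invented for this sketch that leans on the built-in `Number()` conversion and returns `NaN` for anything that is not a number. Whether `NaN` is the right answer for bad input depends on the application, so treat it as one option rather than the fix.

```javascript
// The original, unchanged: delete every non-digit, then parse what is left.
function convertToFloat(value) {
  if (typeof value == "number") return value;
  return parseFloat(value.replace(/\D/g, ''));
}

// Hypothetical alternative: let Number() decide, and reject empty input
// explicitly because Number("") would otherwise come back as 0.
function parseStrictFloat(value) {
  if (typeof value === "number") return value;
  const trimmed = String(value).trim();
  if (trimmed === "") return NaN;
  return Number(trimmed);
}

console.log(convertToFloat("D12.3"));   // 123 (decimal point gone)
console.log(convertToFloat("-5"));      // 5   (sign gone)
console.log(convertToFloat("1e3"));     // 13  (exponent marker gone)
console.log(parseStrictFloat("D12.3")); // NaN
console.log(parseStrictFloat("12.3"));  // 12.3
console.log(parseStrictFloat("-5"));    // -5
```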

You know what else regexes can do? Parse things! Not just HTML and XML, but anything. Like, for example, parsing out an INI file. Kate found this.

    $ini = array();
    preg_match_all('/\[(?<sections>[^\]\r\n]+)\][\s\r\n]*(?<values>([^\r\n\[=]+(=[^\r\n]+)?[\s\r\n]*)+)?/i', file_get_contents($iniFile), $ini);
    foreach($ini['sections'] as $i=>$section) {
        $ini[$section] = array();
        $values = $sections['values'][$i];
        preg_match_all('/[\s]*(?<names>[^\r\n\=]+)(=(?<values>[^\r\n]+))?[\r\n]*/i',$values,$ini);
        foreach($ini['names'] as $j=>$name) {
            $name = trim($name);
            if($name && !preg_match("/[#;]/", $name)){
                $value = trim($ini['values'][$j]);
                if(!preg_match("/[#;]/", $value))
                    $ini[$section][$name] = $value;
            }
        }
    }

She was able to replace that entire block with a single line: $ini = parse_ini_file($iniFile, true); parse_ini_file is a built-in library function in PHP, and passing true as the second argument groups the settings by section.
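For readers who don't speak PHP, the sectioned result is just a nested map: section name, then key, then value. The JavaScript sketch below is only meant to show that shape. `parseIniSketch` and the sample text are invented for illustration, and the function makes no attempt to match PHP's quoting, type conversion, or error handling. In PHP itself, the built-in is still the answer.

```javascript
// Rough, line-by-line INI reader (illustration only, not a parse_ini_file clone).
function parseIniSketch(text) {
  const result = {};
  let section = null;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and whole-line comments.
    if (line === "" || line.startsWith(";") || line.startsWith("#")) continue;
    // "[name]" opens a new section.
    const header = line.match(/^\[([^\]]+)\]$/);
    if (header) {
      section = header[1];
      result[section] = {};
      continue;
    }
    // "key = value" inside a section; anything else is silently ignored here.
    const eq = line.indexOf("=");
    if (eq === -1 || section === null) continue;
    result[section][line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return result;
}

// Made-up sample input.
const sampleIni = "[database]\nhost = localhost\nport = 5432\n; a comment\n[cache]\nenabled = true\n";
const parsedIni = parseIniSketch(sampleIni);
console.log(parsedIni.database.host); // "localhost"
console.log(parsedIni.cache.enabled); // "true" (kept as a string in this sketch)
```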

Speaking of parsing, R.J. L. works for a company that runs printed documents through optical character recognition, and then uses regular expressions to identify interesting parts of the document to store in the database. These regular expressions are written in a custom, in-house regex language that is almost, but not quite, PCRE. With their own regular expression language, tasks that might be inelegant or complicated in traditional languages become simple. For example, this regex finds the document identifier on any page.

([:-.,;/\\(]{0,2}(( [C|c][P|p][K,<|k,<][0-9]{11})||([:#.$",'#-/|][C|c][P|p][K,<|k,<][0-9]{11} )||( [C|c][P|p][K,<|k,<][0-9]{11}[:.$",'#-/|l\\])||([:.$",'#-/|][C|c][P|p][K,<|k,<][0-9]{11}[:.$",'#-/|l\\])||(01[A|a|C|c|D|d|E|e|R|r][0-9]{7} )||([:#.$",'#-/|]01[A|a|C|c|D|d|E|e|R|r][0-9]{7})||(01[A|a|C|c|D|d|E|e|R|r][0-9]{7}[:#.$",'#-/|l\\])||([:#.$",'#-/|]01[A|a|C|c|D|d|E|e|R|r][0-9]{7}[:#.$",'#-/|l\\])||( 02[A|a|B|b|C|c|D|d|E|e|F|f][0-9]{7})||([:#.$",'#-/|]02[A|a|B|b|C|c|D|d|E|e|F|f][0-9]{7} )||( 02[A|a|B|b|C|c|D|d|E|e|F|f][0-9]{7}[:#.$",'#-/|l\\])||([:#-/|]02[A|a|B|b|C|c|D|d|E|e|F|f][0-9]{7}[:#.$",'#-/|l\\])||( 04[C|c|D|d|F|f|V|v][0-9]{7})||([:#.$",'#-/|]04[C|c|D|d|F|f|V|v][0-9]{7} )||( 04[C|c|D|d|F|f|V|v][0-9]{7}[:#.$",'#-/|l\\])||([:#.$",'#-/|]04[C|c|D|d|F|f|V|v][0-9]{7}[:#.$",'#-/|l\\])||(05[M|m|A|a][0-9]{7} )||([:#.$",'#-/|]05[M|m|A|a][0-9]{7} )||( 05[M|m|A|a][0-9]{7}[:#.$",'#-/|l\\])||([:#.$",'#-/|]05[M|m|A|a][0-9]{7}[:#.$",'#-/|l\\])||(06[B|b|C|c|G|g|H|h|J|j|K|k|L|l|M|m|S|s|U|u|Y|y][0-9]{7})||([:#.$",'#-/|]06[B|b|C|c|G|g|H|h|J|j|K|k|L|l|M|m|S|s|U|u|Y|y][0-9]{7} )||( 06[B|b|C|c|G|g|H|h|J|j|K|k|L|l|M|m|S|s|U|u|Y|y][0-9]{7}[:#.$",'#-/|l\\])||([:#.$",'#-/|]06[B|b|C|c|G|g|H|h|J|j|K|k|L|l|M|m|S|s|U|u|Y|y][0-9]{7}[:#.$",'#-/|l\\])||( 07[U|u][0-9]{7} )||([:#.$",'#-/|]07[U|u][0-9]{7} )||( 07[U|u][0-9]{7}[:#.$",'#-/|l\\])||([:#.$",'#-/|]07[U|u][0-9]{7}[:#.$",'#-/|l\\])||( 08[A|a][0-9]{7})||([:#.$",'#-/|]08[A|a][0-9]{7} )||( 08[A|a][0-9]{7}[:#.$",'#-/|l\\])||([:#.$",'#-/|]08[A|a][0-9]{7}[:#.$",'#-/|l\\])||( 09[A|a|B|b|C|c|D|d|F|f][0-9]{7})||([:#.$",'#-/|]09[A|a|B|b|C|c|D|d|F|f][0-9]{7} )||( 09[A|a|B|b|C|c|D|d|F|f][0-9]{7}[:#.$",'#-/|l\\])||([:#.$",'#-/|]09[A|a|B|b|C|c|D|d|F|f][0-9]{7}[:#.$",'#-/|l\\])||( 10[M|m|F|f][0-9]{7} )||([:#.$",'#-/|]10[M|m|F|f][0-9]{7} )||( 10[M|m|F|f][0-9]{7}[:#.$",'#-/|l\\])||([:#.$",'#-/|]10[M|m|F|f][0-9]{7}[:#.$",'#-/|l\\])||( 13[A|a][0-9]{7} )||([:#.$",'#-/|]13[A|a][0-9]{7} )||( 
13[A|a][0-9]{7}[:#.$",'#-/|l\\])||([:#.$",'#-/|]13[A|a][0-9]{7}[:#.$",'#-/|l\\])||( 14[A|a][0-9]{7})||([:#.$",'#-/|]14[A|a][0-9]{7} )||(14[A|a][0-9]{7}[:#.$",'#-/|l\\])||([:#.$",'#-/|]14[A|a][0-9]{7})||(15[D|d|E|e|R|r|T|t][0-9]{7} )||([:#.$",'#-/|]15[D|d|E|e|R|r|T|t][0-9]{7} )||( 15[D|d|E|e|R|r|T|t][0-9]{7}[:#.$",'#-/|l\\])||([:#.$",'#-/|]15[D|d|E|e|R|r|T|t][0-9]{7}[:#.$",'#-/|l\\])||( 17[A|a|E|e|L|l|M|m|P|p|S|s|U|u|W|w][0-9]{7})||([:#.$",'#-/|]17[A|a|E|e|L|l|M|m|P|p|S|s|U|u|W|w][0-9]{7} )||( 17[A|a|E|e|L|l|M|m|P|p|S|s|U|u|W|w][0-9]{7}[:#.$",'#-/|l\\])||([:#.$",'#-/|]17[A|a|E|e|L|l|M|m|P|p|S|s|U|u|W|w][0-9]{7}[:#.$",'#-/|l\\])||( 18[A|a][0-9]{7})||([:#.$",'#-/|]18[A|a][0-9]{7} )||( 18[A|a][0-9]{7}[:#.$",'#-/|l\\])||([:#.$",'#-/|]18[A|a][0-9]{7}[:#.$",'#-/|l\\])||( 21[A|a|C|c|D|d][0-9]{7})||([:#.$",'#-/|]21[A|a|C|c|D|d][0-9]{7} )||( 21[A|a|C|c|D|d][0-9]{7}[:#.$",'#-/|l\\])||([:#.$",'#-/|]21[A|a|C|c|D|d][0-9]{7}[:#.$",'#-/|l\\])||(23[A|a|B|b|C|c|D|d|L|l|M|m][0-9]{7})||([:#.$",'#-/|]23[A|a|B|b|C|c|D|d|L|l|M|m][0-9]{7} )||(23[A|a|B|b|C|c|D|d|L|l|M|m][0-9]{7}[:#.$",'#-/|l\\])||([:#.$",'#-/|]23[A|a|B|b|C|c|D|d|L|l|M|m][0-9]{7}[:#.$",'#-/|l\\]) ||( 24[A|a|B|b|C|c|F|f|K|k|M|m|T|t][0-9]{7})||([:#.$",'#-/|]24[A|a|B|b|C|c|F|f|K|k|M|m|T|t][0-9]{7} )||( 24[A|a|B|b|C|c|F|f|K|k|M|m|T|t][0-9]{7}[:#.$",'#-/|l\\])||([:#.$",'#-/|]24[A|a|B|b|C|c|F|f|K|k|M|m|T|t][0-9]{7}[:#.$",'#-/|l\\]) ||( 25[A|a][0-9]{7})||([:#.$",'#-/|]25[A|a][0-9]{7} )||( 25[A|a][0-9]{7}[:#.$",'#-/|l\\])||([:#.$",'#-/|]25[A|a][0-9]{7}[:#.$",'#-/|l\\])||( 32[A|a|F|f|H|h|X|x|Y|y|Z|z][0-9]{7})||([:#.$",'#-/|]32[A|a|F|f|H|h|X|x|Y|y|Z|z][0-9]{7} )||( 32[A|a|F|f|H|h|X|x|Y|y|Z|z][0-9]{7}[:#.$",'#-/|l\\])||([:#.$",'#-/|]32[A|a|F|f|H|h|X|x|Y|y|Z|z][0-9]{7}[:#.$",'#-/|l\\])||( 34[A|a][0-9]{7} )||([:#.$",'#-/|]34[A|a][0-9]{7} )||(34[A|a][0-9]{7}[:#.$",'#-/|l\\])||([:#.$",'#-/|]34[A|a][0-9]{7}[:#.$",'#-/|l\\])||( 35[A|a|B|R|r|S|s|T|t|U|u][0-9]{7})||([:#.$",'#-/|]35[A|a|B|R|r|S|s|T|t|U|u][0-9]{7} )||( 
35[A|a|B|R|r|S|s|T|t|U|u][0-9]{7}[:#.$",'#-/|l\\])||([:#.$",'#-/|]35[A|a|B|R|r|S|s|T|t|U|u][0-9]{7}[:#.$",'#-/|l\\])||( 39[C|c|P|p][0-9]{7} )||([:#.$",'#-/|]39[C|c|P|p][0-9]{7} )||( 39[C|c|P|p][0-9]{7}[:#.$",'#-/|l\\])||([:#.$",'#-/|]39[C|c|P|p][0-9]{7}[:#.$",'#-/|l\\])||( 40[A|a|C|c|D|d|S|s][0-9]{7})||([:#.$",'#-/|]40[A|a|C|c|D|d|S|s][0-9]{7} )||( 40[A|a|C|c|D|d|S|s][0-9]{7}[:#.$",'#-/|l\\])||([:#.$",'#-/|]40[A|a|C|c|D|d|S|s][0-9]{7}[:#.$",'#-/|l\\])||(46[A|a|B|b][0-9]{7} )||([:#.$",'#-/|]46[A|a|B|b][0-9]{7} )||( 46[A|a|B|b][0-9]{7}[:#.$",'#-/|l\\])||([:#.$",'#-/|]46[A|a|B|b][0-9]{7}[:#.$",'#-/|l\\]) ||(01[A|a|C|c|D|d|E|e|R|r][0-9]{9} )||([:#.$",'#-/|]01[A|a|C|c|D|d|E|e|R|r][0-9]{9})||(01[A|a|C|c|D|d|E|e|R|r][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]01[A|a|C|c|D|d|E|e|R|r][0-9]{9}[:#.$",'#-/|l\\])||( 02[A|a|B|b|C|c|D|d|E|e|F|f][0-9]{9} )||([:#.$",'#-/|]02[A|a|B|b|C|c|D|d|E|e|F|f][0-9]{9} )||( 02[A|a|B|b|C|c|D|d|E|e|F|f][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]02[A|a|B|b|C|c|D|d|E|e|F|f][0-9]{9}[:#.$",'#-/|l\\]) ||( 04[C|c|D|d|F|f|V|v][0-9]{9})||([:#.$",'#-/|]04[C|c|D|d|F|f|V|v][0-9]{9} )||( 04[C|c|D|d|F|f|V|v][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]04[C|c|D|d|F|f|V|v][0-9]{9}[:#.$",'#-/|l\\])||(05[M|m|A|a][0-9]{9} )||([:#.$",'#-/|]05[M|m|A|a][0-9]{9} )||( 05[M|m|A|a][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]05[M|m|A|a][0-9]{9}[:#.$",'#-/|l\\])||(06[B|b|C|c|G|g|H|h|J|j|K|k|L|l|M|m|S|s|U|u|Y|y][0-9]{9})||([:#.$",'#-/|]06[B|b|C|c|G|g|H|h|J|j|K|k|L|l|M|m|S|s|U|u|Y|y][0-9]{9} )||( 06[B|b|C|c|G|g|H|h|J|j|K|k|L|l|M|m|S|s|U|u|Y|y][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]06[B|b|C|c|G|g|H|h|J|j|K|k|L|l|M|m|S|s|U|u|Y|y][0-9]{9}[:#.$",'#-/|l\\])||( 07[U|u][0-9]{9} )||([:#.$",'#-/|]07[U|u][0-9]{9} )||( 07[U|u][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]07[U|u][0-9]{9}[:#.$",'#-/|l\\])||( 08[A|a][0-9]{9})||([:#.$",'#-/|]08[A|a][0-9]{9} )||( 08[A|a][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]08[A|a][0-9]{9}[:#.$",'#-/|l\\])||( 09[A|a|B|b|C|c|D|d|F|f][0-9]{9} 
)||([:#.$",'#-/|]09[A|a|B|b|C|c|D|d|F|f][0-9]{9} )||( 09[A|a|B|b|C|c|D|d|F|f][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]09[A|a|B|b|C|c|D|d|F|f][0-9]{9}[:#.$",'#-/|l\\])||( 10[M|m|F|f][0-9]{9} )||([:#.$",'#-/|]10[M|m|F|f][0-9]{9} )||( 10[M|m|F|f][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]10[M|m|F|f][0-9]{9}[:#.$",'#-/|l\\])||(13[A|a][0-9]{9} )||([:#.$",'#-/|]13[A|a][0-9]{9} )||( 13[A|a][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]13[A|a][0-9]{9}[:#.$",'#-/|l\\])||( 14[A|a][0-9]{9} )||([:#.$",'#-/|]14[A|a][0-9]{9} )||( 14[A|a][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]14[A|a][0-9]{9}[:#.$",'#-/|l\\])|| ( 15[D|d|E|e|R|r|T|t][0-9]{9})||([:#.$",'#-/|]15[D|d|E|e|R|r|T|t][0-9]{9} )||( 15[D|d|E|e|R|r|T|t][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]15[D|d|E|e|R|r|T|t][0-9]{9}[:#.$",'#-/|l\\])||(17[A|a|E|e|L|l|M|m|P|p|S|s|U|u|W|w][0-9]{9})||([:#.$",'#-/|]17[A|a|E|e|L|l|M|m|P|p|S|s|U|u|W|w][0-9]{9} )||( 17[A|a|E|e|L|l|M|m|P|p|S|s|U|u|W|w][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]17[A|a|E|e|L|l|M|m|P|p|S|s|U|u|W|w][0-9]{9}[:#.$",'#-/|l\\])||( 18[A|a][0-9]{9})||([:#.$",'#-/|]18[A|a][0-9]{9} )||(18[A|a][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]18[A|a][0-9]{9}[:#.$",'#-/|l\\])||( 21[A|a|C|c|D|d][0-9]{9} )||([:#.$",'#-/|]21[A|a|C|c|D|d][0-9]{9} )||( 21[A|a|C|c|D|d][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]21[A|a|C|c|D|d][0-9]{9}[:#.$",'#-/|l\\])||( 23[A|a|B|b|C|c|D|d|L|l|M|m][0-9]{9})||([:#.$",'#-/|]23[A|a|B|b|C|c|D|d|L|l|M|m][0-9]{9} )||( 23[A|a|B|b|C|c|D|d|L|l|M|m][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]23[A|a|B|b|C|c|D|d|L|l|M|m][0-9]{9}[:#.$",'#-/|l\\])||( 24[A|a|B|b|C|c|F|f|K|k|M|m|T|t][0-9]{9})||([:#.$",'#-/|]24[A|a|B|b|C|c|F|f|K|k|M|m|T|t][0-9]{9} )||( 24[A|a|B|b|C|c|F|f|K|k|M|m|T|t][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]24[A|a|B|b|C|c|F|f|K|k|M|m|T|t][0-9]{9}[:#.$",'#-/|l\\])||( 25[A|a][0-9]{9})||([:#.$",'#-/|]25[A|a][0-9]{9} )||(25[A|a][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]25[A|a][0-9]{9}[:#.$",'#-/|l\\])||( 
32[A|a|F|f|H|h|X|x|Y|y|Z|z][0-9]{9})||([:#.$",'#-/|]32[A|a|F|f|H|h|X|x|Y|y|Z|z][0-9]{9} )||( 32[A|a|F|f|H|h|X|x|Y|y|Z|z][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]32[A|a|F|f|H|h|X|x|Y|y|Z|z][0-9]{9}[:#.$",'#-/|l\\])||( 34[A|a][0-9]{9} )||([:#.$",'#-/|]34[A|a][0-9]{9} )||( 34[A|a][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]34[A|a][0-9]{9}[:#.$",'#-/|l\\])||(35[A|a|B|b|R|r|S|s|T|t|U|u][0-9]{9})||([:#.$",'#-/|]35[A|a|B|b|R|r|S|s|T|t|U|u][0-9]{9} )||( 35[A|a|B|b|R|r|S|s|T|t|U|u][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]35[A|a|B|b|R|r|S|s|T|t|U|u][0-9]{9}[:#.$",'#-/|l\\])||( 39[C|c|P|p][0-9]{9} )||([:#.$",'#-/|]39[C|c|P|p][0-9]{9})||( 39[C|c|P|p][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]39[C|c|P|p][0-9]{9}[:#.$",'#-/|l\\]) ||( 40[A|a|C|c|D|d|S|s][0-9]{9})||([:#.$",'#-/|]40[A|a|C|c|D|d|S|s][0-9]{9} )||( 40[A|a|C|c|D|d|S|s][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]40[A|a|C|c|D|d|S|s][0-9]{9}[:#.$",'#-/|l\\])||(46[A|a|B|b][0-9]{9} )||([:#.$",'#-/|]46[A|a|B|b][0-9]{9} )||( 46[A|a|B|b][0-9]{9}[:#.$",'#-/|l\\])||([:#.$",'#-/|]46[A|a|B|b][0-9]{9}[:#.$",'#-/|l\\]))[-.,;:Il|/\\]{0,2} )

Simplicity itself.
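For comparison, here is roughly what the same idea might look like in an ordinary regex engine. This is a guess at the intent, not a translation: it assumes the identifiers are either "CPK" plus 11 digits, or a two-digit prefix, one letter from a per-prefix list, and then 7 or 9 digits. The table holds only four of the prefixes that appear in the original, `buildIdPattern` and `findDocumentId` are names made up for the sketch, and the original's delimiter handling (including what look like allowances for OCR misreads) is approximated with lookarounds. In PCRE-style syntax a `|` inside a character class is a literal pipe, so `[C|c]` matches "C", "c", or "|"; a case-insensitive flag does the job the author presumably wanted.

```javascript
// Prefix -> allowed letters, copied from a few branches of the pattern above.
// The real list is much longer; this is deliberately partial.
const LETTERS_BY_PREFIX = {
  "01": "ACDER",
  "02": "ABCDEF",
  "04": "CDFV",
  "07": "U",
};

// Build one pattern from the table instead of spelling out every combination.
function buildIdPattern(table) {
  const branches = Object.entries(table).map(
    ([prefix, letters]) => `${prefix}[${letters}]`
  );
  const source =
    "(?<![0-9A-Za-z])" +                 // not glued to a preceding letter or digit
    "(?:CPK[0-9]{11}|(?:" + branches.join("|") + ")(?:[0-9]{9}|[0-9]{7}))" +
    "(?![0-9])";                         // not followed by more digits
  return new RegExp(source, "i");        // "i" replaces every [A|a] pair
}

const idPattern = buildIdPattern(LETTERS_BY_PREFIX);

// Returns the first identifier found in the text, or null.
function findDocumentId(text) {
  const match = idPattern.exec(text);
  return match ? match[0] : null;
}

console.log(findDocumentId("Ref: 01a1234567, page 2")); // "01a1234567"
console.log(findDocumentId("see CPK12345678901."));     // "CPK12345678901"
console.log(findDocumentId("03A1234567"));              // null (03 is not in the table)
```

Adding a new prefix then becomes a one-line change to the table rather than four new alternatives in the pattern.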
