Article 2PGM1 CodeSOD: Documented Concerns

CodeSOD: Documented Concerns

by
Remy Porter
from The Daily WTF on (#2PGM1)

There's a lot of debate about how much a developer should rely on comments. Clear code should, well, be clear, and thus not need comments. On the other hand, code that's clear the minute you write it might not be as clear six months later when you forget what it was for. On the other, other hand, sometimes a crime has been committed and we need the comments for a confession.

Austin S confesses his crime.

 private static readonly HtmlElementFlag brokenFormFlag; private static readonly AtomicInteger currentConfiguredVersion = (AtomicInteger)HTML_PROCESSING_VERSION; private static readonly ReaderWriterLockSlim configLock = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion); static HtmlLogic() { brokenFormFlag = HtmlNode.ElementsFlags["form"]; //fix stupidity ConfigureFor(HTML_PROCESSING_VERSION); } private static void ConfigureFor(int version) { if (version >= 8) { //form elements are NOT EMPTY!!!! HtmlNode.ElementsFlags.Remove("form"); } else { //Need to handle broken things properly... HtmlNode.ElementsFlags["form"] = brokenFormFlag; } currentConfiguredVersion.Value = version; } /// <summary> /// A horrible abomination you need to call whenever you want /// HtmlAgilityPack to load html or set inner html without breaking /// things... See top of method for reasons. /// </summary> /// <param name="action">Action to perform while in the version lock</param> /// <param name="version">Version to lock for</param> private static void ExecuteWithVersion(Action action, int version) { // (Note: you should only need to use this when importing/parsing // html.I do not believe that you need it when exporting html) /** Comment author = Austin S. Using CDATA elements to keep * the IDE sane. * * So, you may ask, "Why do we have this unholy mother of * abominations in our code?" That is an excellent question, allow * me to explain why. To ensure that both the Java and C# versions * handle html text the same despite using different html dom * libraries, we now sanitize the initial html with HtmlTidy. This * results in both libraries being handed the same (except for the * "generator" meta attribute that we don't care about) <b>VALID</b> * html that contains NO ERRORS. * Ever since version 0 (with BoolOption3 set * to true I think), we used the HtmlAgilityPack library to parse * the html we are reading and save it out. Ever since then, it * has had a certain bug. Allow us to consider the following * fragment of html (sans the CDATA wrapper):<![CDATA[<div id="p-search" role="search"> <h3><label for="searchInput">Search</label></h3> <form action="https://en.wikipedia.org/w/index.php" id="searchform" name="searchform"> <div id="simpleSearch"> <input type="search" name="search" placeholder="Search Wikipedia" title="Search Wikipedia [alt-shift-f]" accesskey="f" id="searchInput" tabindex="1" autocomplete="off"> <input type="hidden" value="Special:Search" name="title"> <input type="submit" name="go" value="Go" title="Go to a page with this exact name if it exists" id="searchButton" class="searchButton"> </div> </form></div>]]> * "What is so special about this fragment?" you ask. Simple, it * has a form element. When HtmlAgilityPack parses this fragment, * we get something like this (comments from me):<![CDATA[Element: div Text:\n Element: h3 Element: label Text: Search Text: \n Element: form //why is it empty? Text: \n Element: div Text: \n Element: input Element: input Element: input Text: </form> //<---WTF(udge)?!?!?!?!?!]]> * As you can see, <![CDATA[</form>]]> is interpreted as text. * This means that [PRODUCT_NAME] WILL NOT TAG IT AWAY and WILL BE * VISIBLE TO THE USER FOR TRANSLATION. The Java library does not * have this issue. The simplest solution to this problem is to * not have the Java version support html, which is sheer lunacy * and is not permitted for a number of reasons, such as "Sorry, * you need to use the Windows version to translate any html." * Luckily, this can be fixed by modifying a <b>static</b> variable * in the library like so:<code> * <see cref="HtmlNode.ElementsFlags"/>.Remove("form"); * </code> * But, as a comment on the StackOverflow answer I got this from * (http://stackoverflow.com/a/4237224/1366594) mentions, this * is a breaking change, and the variable is <b>STATIC</b>. If * you don't know what that means, it means that this setting will * be read on all threads, regardless of their origin. This * wouldn't be an issue if it weren't for the fact that we need to * be able to do the following two things at the same time: * 1. Keep the code backwards compatible so users can save out old * Files. * 2. Support concurrent execution since the user may be saving out * more than one file at a time and/or the File Handler that * calls this logic will be doing so on multiple threads. * It wouldn't be so complex if we only had to handle one of these. * Our second "simple" fix would be to drop support for one of * those two things, but we can't drop backwards compatibility * support in the C# version because that would be stupid and we * would have angry customers, and we can't drop support for * concurrent saving since that is also utter lunacy (try saving * out 100 html files at once...) * So, we need to move to our next approach, which is this * abomination. Luckily, I only think we need to do this when * parsing html. We create a method that wraps all html "parsing" * calls so we can ensure that the html is parsed with the correct * settings. Then we have this issue: concurrency. To fix this * issue, we use an <see cref="AtomicInteger"/> (since I am more * used to that than using a volatile int that is changed via * calls to Interlocked) to keep track of the current configured * version, and a <see cref="ReaderWriterLockSlim"/> to make sure * we don't try to run logic for an old version at the same time * we are running logic for a new version. The new versions use * the "Read" lock, and the old versions use the "Write" lock. * This permits us to execute multiple actions with the "current" * configuration, but has the downside of only letting one action * with the old versions run at a time. The best way to work * around this is to implement some sort of concurrent multi lock * thing that only permits locks of the same key to run at the same * time, but I haven't found any such thing, and creating our own * will surely end with someone going mad. * So now we need to pass in a version number with any call that * parses html, so are we done, right? WRONG! * Recently I implemented some version checking on saving out so * we don't try to save any documents that are too "new", as in * the user hasn't updated their copy of PRODUCT_NAME and try to save * out a file they got from a co-worker who did update their copy. * If we allowed them to do so, the result would likely be invalid * output, and would result in their customers being unhappy with * the result since it is useless. If the customer of our customer * does not catch this before deploying the translation into the * wild, it would make them look like a fool, so we cannot allow it. * We store the opening code's primary version in IntOption1 and * the secondary version (usually the html opener's version) in * IntOption2 if applicable. We would rename these variables, * but some users insist on using OLD_PRODUCT_NAME (which is no longer * receiving updates and also doesn't have any version checking * on saving out). This is a pain, but there currently isn't much * that can be done about it, so I digress. * Due to this stupid breaking change thing, EVERY call to parse * some html into a <see cref="HtmlAgilityPack.HtmlDocument"/> or * a <see cref="HtmlNode"/> needs to be wrapped with this method * (<see cref="ExecuteWithVersion(Action, int)"/>) with an int * version passed into it. That version number needs to be stored * somewhere and is usually stored in IntOption2 (or IntOption1 if * this was an html file we opened). Then we encounter a new issue. * The new issue is that the file adapters for .po and .properties * use some html logic and use IntOption2 for other things, so I * had to increment the main version numbers on them and move * those values to IntOption3. Finally, at the end of this method * (<see cref="ExecuteWithVersion(Action, int)"/>) if the current * settings are not set to the current version and no other threads * still want to use the old version, we set the config back to * the current version because we all know that some poor sap will * forget to wrap his html parsing call in this method, or we might * do html parsing elsewhere where we cannot reach this method. * Hopefully, we do not need to do any more insanity... */ bool currentVer = version >= 8; try { if (currentVer) { configLock.EnterReadLock(); } else { //sadly, this will make it so old html files can only be //"saved" one at a time, but there isn't much that can //be done about that without breaking out evil voodo. configLock.EnterWriteLock(); } if (currentConfiguredVersion.Value != version) { //we need to make sure only one thread makes changes lock (configLock) { if (currentConfiguredVersion.Value != version) { ConfigureFor(version); } } } action(); } finally { if (currentVer) { configLock.ExitReadLock(); } else { configLock.ExitWriteLock(); if (configLock.CurrentReadCount < 1 && configLock.WaitingReadCount < 1 && configLock.WaitingWriteCount < 1) { try { //reset config configLock.EnterWriteLock(); lock (configLock) { if (currentConfiguredVersion.Value != HTML_PROCESSING_VERSION) { ConfigureFor(HTML_PROCESSING_VERSION); } } } finally { configLock.ExitWriteLock(); } } } } }

Honestly, saying much more would be gilding the lily.

release50.png[Advertisement] Release!is a light card game about software and the people who make it. Play with 2-5 people, or up to 10 with two copies - only $9.95 shipped! TheDailyWtf?d=yIl2AUoC8zA2ykEZzeVA_8
External Content
Source RSS or Atom Feed
Feed Location http://syndication.thedailywtf.com/TheDailyWtf
Feed Title The Daily WTF
Feed Link http://thedailywtf.com/
Reply 0 comments