CodeSOD: File Type Detection

by Remy Porter from The Daily WTF on (#68Y0P)

Discerning the type of data stored in a file is frequently a challenge. We've come up with all sorts of ways to do it, such as embedding magic bytes at the start of a file, relying on file extensions, attaching MIME type information where possible, and, frequently, just hoping for the best. Ivan was working on a Python system that needed to handle XML data. Someone wanted to make sure that the XML data was actually XML, and not some other file format.

def is_xml(str): return str.startswith("<")

Any string of text that starts with < is clearly an XML file. This certainly won't give any false positives. If we assume they at least trimmed whitespace before calling it, I think we can be fairly confident there won't be any false negatives. Though if there's some way to produce a valid XML document whose first non-whitespace character isn't a <, I'd be curious to see it.
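To make the false positives and false negatives concrete, here is a small Python sketch. It reuses the article's is_xml (with the parameter renamed so it no longer shadows the built-in str) and compares it against a hypothetical is_well_formed_xml helper that simply asks the standard library's xml.etree.ElementTree parser whether the text parses. The helper and the sample strings are made up for illustration; this is not the code from Ivan's system.

```python
import xml.etree.ElementTree as ET


def is_xml(text: str) -> bool:
    # The check from the article; only the parameter name is changed
    # (the original shadows the built-in `str`).
    return text.startswith("<")


def is_well_formed_xml(text: str) -> bool:
    # Hypothetical stricter check: let a real XML parser decide whether the
    # text is well-formed. This does NOT validate against any schema or DTD.
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Illustrative inputs only.
samples = {
    "xml": '<?xml version="1.0"?><root><item/></root>',
    "html": "<html><br></html>",                 # unclosed <br>: not well-formed XML
    "less_than": "< 3 is a heart, not a document",
    "indented_xml": "   <root/>",                # leading whitespace, never trimmed
}

for name, text in samples.items():
    print(name, is_xml(text), is_well_formed_xml(text))
# xml True True
# html True False
# less_than True False
# indented_xml False True
```

The two middle samples sail through is_xml despite not being XML, and the last one is rejected purely because nobody stripped the leading whitespace. Actually parsing is far more expensive than checking one character, and Python's documentation warns that its xml modules are not secure against maliciously constructed data, so untrusted input may call for a hardened parser such as defusedxml.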

The real question is: what if this check actually succeeds at filtering out a large number of invalid files? If the check is basically useless, that's a WTF. If it's actually valuable, that's a bigger WTF.
