Why “a caret, euro, trademark” â€™ in a file?

by John, from John D. Cook (#6HSPX)

Why might you see a2122.png in the middle of an otherwise intelligible file? The reason is very similar to the reason you might see , which I explained in the previous post. You might want to read that post first if you're not familiar with Unicode and character encodings.

It all has to do with an encoding error, probably. Not necessarily, since, for example, I deliberately put â€™ in the opening sentence. But assuming it is an error, it's likely an encoding error.

But it's the opposite of the error. The occurs when non- UTF-8 text has been declared (or implicitly interpreted as) Unicode. In particular, you can run into this error if text encoded in ISO 8859-1 is interpreted as as UTF-8.

The â€™ sequence is usually the opposite: UTF-8 encoded text is being interpreted as Windows-1252 (a.k.a. CP-1252) encoded text. In particular, a right single quote (U+2019) encoded in UTF-8 has been interpreted as the Windows-1252 text â€™. The UTF-8 encoding of U+2019 is the three bytes E2 80 99, and Windows-1252 reads those bytes as â, €, and ™ respectively.
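
The same mix-up is easy to reproduce. A minimal sketch in Python, assuming the standard `utf-8` and `cp1252` codecs (the variable names are just for illustration):

```python
quote = "\u2019"  # right single quotation mark

# UTF-8 encodes U+2019 as three bytes.
utf8_bytes = quote.encode("utf-8")
print(utf8_bytes.hex(" "))  # e2 80 99

# Reading those three bytes as Windows-1252 yields three separate characters.
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)                                # â€™
print([f"U+{ord(c):04X}" for c in mojibake])   # ['U+00E2', 'U+20AC', 'U+2122']
```

Each of the three bytes happens to be a valid Windows-1252 character, which is why no error is raised.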

Windows-1252 is a superset of IDO 8859-1, the error resulting in could also be described as a Windows-1252 error. So a means Windows-1252 text has been interpreted as UTF-8, and a2122.png means UTF-8 has been interpreted as Windows-1252. In the former case there is an invalid character. In the latter case all the characters are valid, though they're not the characters you were supposed to see.
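
The relationship between the two single-byte encodings can be checked byte by byte. This is a rough sketch, assuming Python's `latin-1` and `cp1252` codecs faithfully represent ISO 8859-1 and Windows-1252. Strictly speaking the two differ in the range 0x80 to 0x9F, where ISO 8859-1 has control codes and Windows-1252 has printable characters such as € and ™.

```python
# Bytes 0xA0 through 0xFF decode to the same characters under both encodings.
same_upper = all(
    bytes([b]).decode("latin-1") == bytes([b]).decode("cp1252")
    for b in range(0xA0, 0x100)
)
print(same_upper)  # True

# Byte 0x80 is where they part ways: a control code versus the euro sign.
as_latin1 = b"\x80".decode("latin-1")
as_cp1252 = b"\x80".decode("cp1252")
print(repr(as_latin1))  # '\x80'
print(as_cp1252)        # €
```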

You can fix the error by making your content and your declared encoding match. Or remove the offending character, replacing the curly single quote (U+2019) with a plain ASCII apostrophe (').
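
Both fixes can be sketched in Python. This assumes the garbled text really is UTF-8 that was decoded as Windows-1252; if that assumption is wrong, the round trip may raise `UnicodeEncodeError` or `UnicodeDecodeError` rather than repair anything. The sample string is made up for illustration.

```python
garbled = "It\u00e2\u20ac\u2122s"  # displays as "Itâ€™s"

# Undo the wrong decoding: recover the original bytes via Windows-1252,
# then decode them as the UTF-8 they actually were.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)  # It’s

# Alternatively, sidestep the issue by using a plain ASCII apostrophe.
ascii_only = repaired.replace("\u2019", "'")
print(ascii_only)  # It's
```

The second option trades typography for robustness: an ASCII apostrophe is the same single byte in UTF-8, ISO 8859-1, and Windows-1252.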

You can find more details in this Stack Overflow post.

The post Why “a caret, euro, trademark” â€™ in a file? first appeared on John D. Cook.