Understanding surrogate pairs: why some Windows filenames can’t be read

by Thom Holwerda, from OSnews (#6VJ5A)

Windows was an early adopter of Unicode, and its file APIs have used UTF-16 internally since Windows 2000. (They used UCS-2 in the Windows 95 era, when the Unicode standard was only a draft on paper, but that's another topic.) Using UTF-16 means that filenames, text strings, and other data are stored as sequences of 16-bit units. For Windows, a properly formed surrogate pair is perfectly acceptable. However, issues arise when string manipulation produces isolated or malformed surrogates. Such errors can lead to unreadable filenames and display glitches, even though the operating system itself can still execute the files correctly. But we can create them deliberately as well, which we can see below.

Zafer Balkan
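
To make the distinction between a well-formed pair and an isolated surrogate concrete, here is a minimal Python sketch. It is not taken from the linked article: the helper names `to_utf16_units` and `find_lone_surrogates` and both sample filenames are made up for illustration, and it only inspects strings in memory rather than touching the Windows file APIs.

```python
def to_utf16_units(text: str) -> list[int]:
    """Return the UTF-16 code units for text, keeping lone surrogates intact."""
    # "surrogatepass" lets the codec emit isolated surrogates instead of raising.
    raw = text.encode("utf-16-le", errors="surrogatepass")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def find_lone_surrogates(units: list[int]) -> list[int]:
    """Return indices of surrogate code units that are not part of a valid pair."""
    lone = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:  # high (leading) surrogate
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2  # well-formed pair: skip both units
                continue
            lone.append(i)  # high surrogate with no low surrogate after it
        elif 0xDC00 <= unit <= 0xDFFF:  # low (trailing) surrogate with no leader
            lone.append(i)
        i += 1
    return lone


# Hypothetical filenames, used only for this illustration.
good_name = "report_\U0001F600.txt"  # U+1F600 needs a surrogate pair in UTF-16
bad_name = "report_\ud83d.txt"       # deliberately isolated high surrogate

good_units = to_utf16_units(good_name)
bad_units = to_utf16_units(bad_name)

print([hex(u) for u in good_units[7:9]])  # ['0xd83d', '0xde00']
print(find_lone_surrogates(good_units))   # []
print(find_lone_surrogates(bad_units))    # [7]
```

The second name is the kind of string that a strict encoder rejects: calling `bad_name.encode("utf-8")` raises `UnicodeEncodeError`, which is one plausible way a tool ends up unable to read or display such a filename.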

What a wild ride and an odd corner case. I wonder what kind of odd and fun shenanigans this could be used for.
