Understanding surrogate pairs: why some Windows filenames can’t be read
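Windows was an early adopter of Unicode, and since Windows 2000 its file APIs have used UTF-16 internally. (Earlier Windows NT releases used UCS-2, a design that dates from when the Unicode standard was still largely a draft, but that's another topic.) Using UTF-16 means that filenames, text strings, and other data are stored as sequences of 16-bit code units. A character outside the Basic Multilingual Plane, such as most emoji, does not fit in one unit, so it is stored as a surrogate pair: a high surrogate (U+D800 to U+DBFF) followed by a low surrogate (U+DC00 to U+DFFF). For Windows, a properly formed surrogate pair is perfectly acceptable. However, issues arise when string manipulation produces isolated or malformed surrogates, for example when a name is truncated between the two halves of a pair. Such errors can make filenames unreadable and cause display glitches, even though the operating system itself can still open and execute the files. The reason is that a lone surrogate has no valid representation in UTF-8 or in well-formed UTF-16, so programs that convert or validate the name either fail or substitute a replacement character. We can also create such names deliberately, as shown below.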
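To make the mechanics concrete, here is a minimal sketch in Python. It assumes that Python `str` values are a fair stand-in for Win32 `WCHAR` buffers and calls no Windows API. The helpers `to_surrogate_pair`, `utf16_units`, `truncate_utf16`, and `lone_surrogate_positions` are hypothetical names chosen for illustration. The sketch shows how one supplementary character becomes two code units, how naive truncation by code-unit count leaves a lone high surrogate behind, and why that name then fails a strict conversion to UTF-8.

```python
# Minimal sketch of UTF-16 surrogate pairs and lone surrogates.
# Python str values stand in for Win32 WCHAR buffers; no Windows API is called.
# All helper names below are hypothetical, chosen for illustration.


def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF needs a surrogate pair")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low (trail) surrogate
    return high, low


def utf16_units(text: str) -> list[int]:
    """The 16-bit code units a UTF-16 API would see for this string."""
    # "surrogatepass" lets lone surrogates through instead of raising.
    data = text.encode("utf-16-le", errors="surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def truncate_utf16(text: str, max_units: int) -> str:
    """Naive truncation by code-unit count: it can cut a pair in half."""
    data = text.encode("utf-16-le", errors="surrogatepass")[: 2 * max_units]
    return data.decode("utf-16-le", errors="surrogatepass")


def lone_surrogate_positions(units: list[int]) -> list[int]:
    """Indices of surrogate code units that lack a valid partner."""
    bad, i = [], 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:                              # high surrogate
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2                                         # well-formed pair
                continue
            bad.append(i)                                      # high without a low
        elif 0xDC00 <= u <= 0xDFFF:                            # low without a high
            bad.append(i)
        i += 1
    return bad


name = "report\U0001F600.txt"                         # U+1F600 lies outside the BMP
print([hex(u) for u in to_surrogate_pair(0x1F600)])   # ['0xd83d', '0xde00']
print(len(name), len(utf16_units(name)))              # 11 code points, 12 code units

broken = truncate_utf16(name, 7)                      # keeps only the high surrogate
print(lone_surrogate_positions(utf16_units(broken)))  # [6]

# Raw UTF-16 storage round-trips, much as Windows can keep such a name on disk...
raw = broken.encode("utf-16-le", errors="surrogatepass")
assert raw.decode("utf-16-le", errors="surrogatepass") == broken

# ...but strict conversion to UTF-8 fails, and lossy conversion hides the damage.
try:
    broken.encode("utf-8")
except UnicodeEncodeError as exc:
    print("UTF-8 conversion failed:", exc.reason)     # surrogates not allowed
print(broken.encode("utf-8", errors="replace"))        # b'report?'
```

Truncating at 8 units instead of 7 keeps the pair intact, which is why a length limit that counts code units has to check whether the last kept unit is a high surrogate before cutting.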
Zafer Balkan
What a wild ride and an odd corner case. I wonder what kind of odd and fun shenanigans this could be used for.