Article 60EQP Letter-like Unicode symbols

Letter-like Unicode symbols

by
John
from John D. Cook on (#60EQP)

Unicode provides a way to distinguish symbols that look alike but have different meanings.

We can illustrate this with the following Python code.

 import unicodedata as u for pair in [('K', ''), ('', ''), ('', '')]: for c in pair: print(format(ord(c), '4X'), u.bidirectional(c), u.name(c))

This produces

 4B L LATIN CAPITAL LETTER K 212A L KELVIN SIGN 3A9 L GREEK CAPITAL LETTER OMEGA 2126 L OHM SIGN 2135 L ALEF SYMBOL 5D0 R HEBREW LETTER ALEF

Even though K and look similar, the former is a Roman letter and the latter is a symbol for temperature in Kelvin. Similarly, and are semantically different even though they look alike.

Or rather, they probably look similar. A font may or may not use different glyphs for different code points. The editor I'm using to write this post uses a font that makes no difference between ohm and omega. The letter K and the Kelvin symbol are slightly different if I look very closely. The two alefs appear substantially different.

Note that the mathematical symbol alef is a left-to-right character and the Hebrew latter alef is a right-to-left character. The former could be useful to tell a word processor This isn't really a Hebrew letter; it's a symbol that looks like a Hebrew letter. Don't change the direction of my text or switch any language-sensitive features like spell checking."

These letter-like symbols can be used to provide semantic information, but they can also be used to deceive. For example, a malicious web site could change a K in a URL to a Kelvin sign.

Related postsThe post Letter-like Unicode symbols first appeared on John D. Cook.
External Content
Source RSS or Atom Feed
Feed Location http://feeds.feedburner.com/TheEndeavour?format=xml
Feed Title John D. Cook
Feed Link https://www.johndcook.com/blog
Reply 0 comments