Letter-like Unicode symbols
Unicode provides a way to distinguish symbols that look alike but have different meanings.
We can illustrate this with the following Python code.
import unicodedata as u for pair in [('K', ''), ('', ''), ('', '')]: for c in pair: print(format(ord(c), '4X'), u.bidirectional(c), u.name(c))
This produces
4B L LATIN CAPITAL LETTER K 212A L KELVIN SIGN 3A9 L GREEK CAPITAL LETTER OMEGA 2126 L OHM SIGN 2135 L ALEF SYMBOL 5D0 R HEBREW LETTER ALEF
Even though K and look similar, the former is a Roman letter and the latter is a symbol for temperature in Kelvin. Similarly, and are semantically different even though they look alike.
Or rather, they probably look similar. A font may or may not use different glyphs for different code points. The editor I'm using to write this post uses a font that makes no difference between ohm and omega. The letter K and the Kelvin symbol are slightly different if I look very closely. The two alefs appear substantially different.
Note that the mathematical symbol alef is a left-to-right character and the Hebrew latter alef is a right-to-left character. The former could be useful to tell a word processor This isn't really a Hebrew letter; it's a symbol that looks like a Hebrew letter. Don't change the direction of my text or switch any language-sensitive features like spell checking."
These letter-like symbols can be used to provide semantic information, but they can also be used to deceive. For example, a malicious web site could change a K in a URL to a Kelvin sign.
Related postsThe post Letter-like Unicode symbols first appeared on John D. Cook.