Accessing characters by name

from John D. Cook on 2020-12-19 01:50 (#5BVCN)

You can sometimes make code more readable by using names for characters rather than the characters themselves or their code points. Both Python and Perl, for example, let you refer to a character by putting its standard Unicode name inside \N{}.

For instance, \N{SNOWMAN} refers to Unicode character U+2603, shown at the top of the post [1]. It's also kinda hard to read ☃, and not many people would read \u2603 and immediately think "Ah yes, U+2603, the snowman."
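As a quick check, here is a minimal Python sketch showing that the name, the \u escape, and the numeric code point all denote the same one-character string. The variable names are only for illustration.

```python
# Three ways to write the same character in Python.
by_name = "\N{SNOWMAN}"   # standard Unicode name
by_code = "\u2603"        # four-digit hex escape
by_chr = chr(0x2603)      # code point as an integer

print(by_name == by_code == by_chr)  # True
print(f"U+{ord(by_name):04X}")       # U+2603
```

The sketch prints the code point rather than the character itself, since how the snowman renders depends on the terminal.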

A few days ago I wrote about how to get one-liners to work on Windows and Linux. Shells and programming languages have different ways of quoting and escaping special characters, and sometimes these ways interfere with each other.

I said that one way to get around problems with literal quotes inside a quoted string is to use character codes for quotes. This may be overkill, but it works. For example,

 perl -e 'print qq{\x27hello\x27\n}'

and

 python -c "print('\x27hello\x27\n')"

both print 'hello', including the single quotes.

One problem with this is that you may not remember that U+0027 is a single quote. And even if you have that code point memorized [2], someone else reading your code might not.

The official Unicode name for a single quote is APOSTROPHE. So the Python one-liner above could be written

 python -c "print('\N{APOSTROPHE}hello\N{APOSTROPHE}\n')"

This is kinda artificial in a one-liner because such tiny programs optimize for brevity rather than readability. But in an ordinary program rather than on the command line, using character names could make code easier to read.
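To illustrate the point about ordinary programs, here is a small sketch. The helper functions and constant names are hypothetical, made up for this example; only the Unicode character names are standard.

```python
# Hypothetical helper: wrap text in typographic quotation marks.
LEFT_QUOTE = "\N{LEFT DOUBLE QUOTATION MARK}"    # U+201C
RIGHT_QUOTE = "\N{RIGHT DOUBLE QUOTATION MARK}"  # U+201D


def curly_quote(text: str) -> str:
    """Return text surrounded by curly double quotes."""
    return f"{LEFT_QUOTE}{text}{RIGHT_QUOTE}"


def curly_quote_terse(text: str) -> str:
    """Same behavior, written with bare code point escapes."""
    return f"\u201c{text}\u201d"


print(curly_quote("hello") == curly_quote_terse("hello"))  # True
```

Both functions return the same string, but a reviewer can check the first one without looking anything up.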

So how do you find out the name of a Unicode character? The names are standard, independent of any programming language, so you can look them up in any Unicode reference.

A programming language that lets you use Unicode names probably also has a way to let you look up Unicode names. For example, in Python you can use unicodedata.name.

 >>> from unicodedata import name
 >>> name('π')
 'GREEK SMALL LETTER PI'
 >>> name("\u05d0") # א
 'HEBREW LETTER ALEF'
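Python's unicodedata module also goes in the other direction. The sketch below uses unicodedata.lookup, the inverse of name, and shows what happens for a character that has no name. The "<unnamed>" placeholder is an arbitrary choice for this example.

```python
import unicodedata

# name() maps a character to its Unicode name; lookup() goes the other way.
ch = unicodedata.lookup("GREEK SMALL LETTER PI")
print(f"U+{ord(ch):04X}")    # U+03C0
print(unicodedata.name(ch))  # GREEK SMALL LETTER PI

# Control characters such as newline have no name. name() raises
# ValueError for them unless you pass a default as the second argument.
print(unicodedata.name("\n", "<unnamed>"))  # <unnamed>
```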

In Perl you could write

 use charnames q{ :full };
 print charnames::viacode(0x22b4); # ⊴

which prints "NORMAL SUBGROUP OF OR EQUAL TO", illustrating that Unicode names can be quite long.

[1] How this renders varies greatly from platform to platform. The original post showed screenshots from Windows with Firefox, iPad with Firefox, and iPad with Inoreader.

[2] Who memorizes Unicode code points?! Well, I've memorized a few landmarks. For example, I memorized where the first letters of the Latin, Greek, and Hebrew alphabets are, so in a pinch I can figure out the rest of the letters.
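The landmark idea can be sketched in Python as follows. The specific starting points are my assumption about which landmarks are meant: lowercase a, lowercase alpha, and alef.

```python
from unicodedata import name

# Assumed landmarks: first lowercase Latin letter, first lowercase
# Greek letter, and first Hebrew letter.
LANDMARKS = {"Latin": 0x0061, "Greek": 0x03B1, "Hebrew": 0x05D0}

# Count forward from each landmark and print the names found there.
for script, start in LANDMARKS.items():
    for offset in range(3):
        cp = start + offset
        print(f"{script}: U+{cp:04X} {name(chr(cp))}")
```

Counting forward works for the first few letters, but the blocks are not always in strict alphabet order. For example, Greek final sigma (U+03C2) sits between rho and sigma, so it is worth confirming a guess with name.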

The post Accessing characters by name first appeared on John D. Cook.

Source	RSS or Atom Feed
Feed Location	http://feeds.feedburner.com/TheEndeavour?format=xml
Feed Title	John D. Cook
Feed Link	https://www.johndcook.com/blog