Russian transliteration hack
I mentioned in the previous post that I had been poking around in HTML entities and noticed symbols for Fourier transforms and such. I also noticed HTML entities for Cyrillic letters. These entities have the form
"&" + transliteration + "cy;".
For example, the Cyrillic letter П is based on the Greek letter Π, its closest English counterpart is P, and its HTML entity is &Pcy;.
The Cyrillic letter Р has HTML entity &Rcy; and not &Pcy; because, although it looks like an English P, it sounds more like an English R.
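You can check this naming pattern with Python's standard library. The html.entities.html5 dictionary (available since Python 3.3) maps HTML5 entity names, including the trailing semicolon, to the characters they denote. The short sketch below looks up a few Cyrillic entities and prints each character with its code point.

```python
from html.entities import html5

# Each Cyrillic entity name is a Latin transliteration followed by "cy;".
# The html5 dictionary keys include the trailing semicolon.
for name in ["Pcy;", "Rcy;", "CHcy;", "softcy;"]:
    ch = html5[name]
    print(f"&{name} -> {ch} (U+{ord(ch):04X})")
```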
Just as a hack, I decided to write code to transliterate Russian text by converting letters to their HTML entities, then chopping off the initial & and the final cy;.
I don't speak Russian, but according to Google Translate, the Russian translation of "Hello world" is "Привет, мир."
Here's my hello-world program for transliterating Russian.
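from bs4.dammit import EntitySubstitution

escaper = EntitySubstitution()

def transliterate(ch):
    # Convert the character to its HTML entity, e.g. "П" -> "&Pcy;",
    # then drop the leading "&" ...
    entity = escaper.substitute_html(ch)[1:]
    # ... and the trailing "cy;"
    return entity[:-3]

a = [transliterate(c) for c in "Привет, мир."]
print(" ".join(a))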
This prints
P r i v ie t m i r
Here's what I get trying to transliterate Chebyshev's native name, Пафнутий Львович Чебышёв.
P a f n u t i j L soft v o v i ch CH ie b y sh io v
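I put a space between letters because some letters transliterate to more than one character. Without spaces, the soft sign's "soft" in "L soft v o v i ch" would run into the letters around it.
This was just a fun hack. Here's what I'd get using software intended for transliteration.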
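If you'd rather not depend on Beautiful Soup, here is a minimal sketch of the same hack using only Python's standard library, assuming html.entities.html5 contains the Cyrillic "cy;" entities (it does in Python 3.3 and later). The names STEMS and transliterate_words are hypothetical, not from the post. Instead of chopping punctuation down to an empty string, this version passes non-Cyrillic characters through unchanged and works word by word, so the boundaries between words survive.

```python
from html.entities import html5

# Hypothetical reverse map: Cyrillic character -> entity stem, e.g. "П" -> "P".
# Only single-character entities in the Cyrillic block U+0400..U+04FF
# whose names end in "cy;" are kept.
STEMS = {}
for name, ch in html5.items():
    if name.endswith("cy;") and len(ch) == 1 and "\u0400" <= ch <= "\u04ff":
        STEMS[ch] = name[:-3]  # strip the trailing "cy;"

def transliterate_words(text):
    """Return one string per word, with letters separated by spaces.

    Characters without a Cyrillic entity (punctuation, digits, Latin
    letters) are passed through unchanged.
    """
    return [" ".join(STEMS.get(ch, ch) for ch in word) for word in text.split()]

# Separate words with " / " so word boundaries stay visible.
print(" / ".join(transliterate_words("Пафнутий Львович Чебышёв")))
```

Joining the words with a plain space instead reproduces the output shown above.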
import unidecode

for x in ["Привет, мир", "Пафнутий Львович Чебышёв"]:
    print(unidecode.unidecode(x))
This produces
Privet, mir
Pafnutii L'vovich Chebyshiov
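The results are similar, though unidecode writes е as e rather than ie, й as i rather than j, and ь as an apostrophe rather than "soft."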
The post Russian transliteration hack first appeared on John D. Cook.