Russian transliteration hack


from John D. Cook on 2023-07-11 14:51 (#6CWFQ)

I mentioned in the previous post that I had been poking around in HTML entities and noticed symbols for Fourier transforms and such. I also noticed HTML entities for Cyrillic letters. These entities have the form

& + transliteration + cy;.

For example, the Cyrillic letter П is based on the Greek letter Π. Its closest English counterpart is P, and its HTML entity is &Pcy;.

The Cyrillic letter Р has HTML entity &Rcy; and not &Pcy; because although it looks like an English P, it sounds more like an English R.
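As a quick check of the naming pattern, here is a minimal sketch that decodes a few of these entities with Python's standard `html.unescape`. The three entities in the list are chosen for illustration only.

```python
import html

# html.unescape decodes named HTML5 character references,
# including the Cyrillic ones that end in "cy;".
for entity in ["&Pcy;", "&Rcy;", "&softcy;"]:
    ch = html.unescape(entity)
    print(entity, ch, f"U+{ord(ch):04X}")
```

The code point shows which letter is which, even when the Cyrillic Р and the Latin P look identical on screen.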

Just as a hack, I decided to write code to transliterate Russian text by converting letters to their HTML entities, then chopping off the initial & and the final cy;.

I don't speak Russian, but according to Google Translate, the Russian translation of "Hello world" is "Привет, мир."

Here's my hello-world program for transliterating Russian.

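    from bs4.dammit import EntitySubstitution

    escaper = EntitySubstitution()

    def transliterate(ch):
        # Turn the character into its HTML entity, e.g. "&Pcy;",
        # then drop the leading "&" and the trailing "cy;".
        entity = escaper.substitute_html(ch)[1:]
        return entity[:-3]

    a = [transliterate(c) for c in "Привет, мир."]
    print(" ".join(a))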

This prints

P r i v ie t m i r

Here's what I get trying to transliterate Chebyshev's native name, Пафнутий Львович Чебышёв.

P a f n u t i j L soft v o v i ch CH ie b y sh io v

I put a space between letters because of possible outputs like "soft v" above.
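The same chop-the-entity idea can also be sketched with only the standard library, by inverting the `html.entities.html5` table instead of going through Beautiful Soup. This is a variant of the hack rather than the code used above, and the names `CYRILLIC_STEMS` and `transliterate_text` are made up for this illustration. It assumes every entity of interest ends in `cy;` and maps to a single character in the basic Cyrillic block.

```python
from html.entities import html5

# Map each Cyrillic character to its entity name minus the trailing "cy;".
# Example: the html5 table has "Pcy;" -> "П", so we store "П" -> "P".
CYRILLIC_STEMS = {
    char: name[:-3]
    for name, char in html5.items()
    if name.endswith("cy;") and len(char) == 1 and "\u0400" <= char <= "\u04ff"
}

def transliterate_text(text):
    # Keep only characters that have a Cyrillic entity; spaces and
    # punctuation are dropped. Stems are space-separated because some
    # are longer than one letter, such as "soft" or "sh".
    return " ".join(CYRILLIC_STEMS[ch] for ch in text if ch in CYRILLIC_STEMS)

print(transliterate_text("Привет, мир."))
print(transliterate_text("Пафнутий Львович Чебышёв"))
```

Unlike the per-character version above, this one skips spaces and punctuation entirely, so word boundaries are lost.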

This was just a fun hack. Here's what I'd get if I used software intended to be used for transliteration.

    import unidecode

    for x in ["Привет, мир", "Пафнутий Львович Чебышёв"]:
        print(unidecode.unidecode(x))

This produces

Privet, mir
Pafnutii L'vovich Chebyshiov

The results are similar.

The post Russian transliteration hack first appeared on John D. Cook.
