Article 6CWFQ Russian transliteration hack


by
John
from John D. Cook on (#6CWFQ)

I mentioned in the previous post that I had been poking around in HTML entities and noticed symbols for Fourier transforms and such. I also noticed HTML entities for Cyrillic letters. These entities have the form

& + transliteration + cy;.

For example, the Cyrillic letter П is based on the Greek letter Π and its closest English counterpart is P, and its HTML entity is &Pcy;.

The Cyrillic letter Р has HTML entity &Rcy; and not &Pcy; because although it looks like an English P, it sounds more like an English R.
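To see the naming pattern from the other direction, Python's standard library function html.unescape decodes HTML5 named entities back into characters. The entity names below are just a few chosen for illustration; this is a quick check of the "& + transliteration + cy;" form, not part of the original hack.

```python
import html

# Each Cyrillic entity is "&" + transliteration + "cy;".
# html.unescape decodes HTML5 named entities back to characters.
entities = ["&Pcy;", "&Rcy;", "&iecy;", "&softcy;"]
decoded = {e: html.unescape(e) for e in entities}

for entity, letter in decoded.items():
    print(entity, letter)
# prints:
# &Pcy; П
# &Rcy; Р
# &iecy; е
# &softcy; ь
```

Note that some transliterations are longer than one letter, such as "ie" for е and "soft" for the soft sign ь.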

Just as a hack, I decided to write code to transliterate Russian text by converting letters to their HTML entities, then chopping off the initial & and the final cy;.

I don't speak Russian, but according to Google Translate, the Russian translation of "Hello world" is "Привет, мир."

Here's my hello-world program for transliterating Russian.

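    from bs4.dammit import EntitySubstitution

    # Replaces characters that have HTML named entities, e.g. "П" -> "&Pcy;"
    escaper = EntitySubstitution()

    def transliterate(ch):
        # Chop off the initial "&" ...
        entity = escaper.substitute_html(ch)[1:]
        # ... and the final "cy;". Characters with no entity,
        # such as spaces and punctuation, slice down to "".
        return entity[:-3]

    a = [transliterate(c) for c in "Привет, мир."]
    print(" ".join(a))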

This prints

P r i v ie t m i r

Here's what I get trying to transliterate Chebyshev's native name, Пафнутий Львович Чебышёв.

P a f n u t i j L soft v o v i ch CH ie b y sh io v

I put a space between letters because of possible outputs like "soft v" above.
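For comparison, here is a sketch of the same hack with no third-party dependency. It assumes Python's built-in html.entities.html5 table in place of Beautiful Soup's EntitySubstitution, and it only strips names that actually end in "cy;", so a non-Cyrillic character such as é (entity &eacute;) is skipped rather than chopped into a fragment like "eacu". It also drops the empty results from spaces and punctuation before joining, so the letters come out single-spaced.

```python
from html.entities import html5

# html5 maps entity names to characters, e.g. "Pcy;" -> "П".
# Invert it, keeping only the Cyrillic "...cy;" names.
CYRILLIC_ENTITY = {
    char: name for name, char in html5.items() if name.endswith("cy;")
}

def transliterate(ch):
    # Chop the trailing "cy;" off the entity name ("Pcy;" -> "P").
    # Characters without a "...cy;" entity map to "".
    name = CYRILLIC_ENTITY.get(ch)
    return name[:-3] if name else ""

def transliterate_text(text):
    # Space-separate the letters, skipping empty results.
    return " ".join(t for t in map(transliterate, text) if t)

print(transliterate_text("Привет, мир."))
# P r i v ie t m i r
print(transliterate_text("Пафнутий Львович Чебышёв"))
# P a f n u t i j L soft v o v i ch CH ie b y sh io v
```

Like the original, this still loses word boundaries; it only makes the character-to-entity step explicit.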

This was just a fun hack. Here's what I get with software intended for transliteration.

    import unidecode

    for x in ["Привет, мир", "Пафнутий Львович Чебышёв"]:
        print(unidecode.unidecode(x))

This produces

Privet, mir
Pafnutii L'vovich Chebyshiov

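The results are similar. The main differences are that unidecode writes е as e rather than ie, й as i rather than j, and ь as an apostrophe rather than soft.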

The post Russian transliteration hack first appeared on John D. Cook.
External Content
Source RSS or Atom Feed
Feed Location http://feeds.feedburner.com/TheEndeavour?format=xml
Feed Title John D. Cook
Feed Link https://www.johndcook.com/blog