
Bidirectional text

by John, from John D. Cook (#58V8J)

This post will take a look at simple bidirectional text, such as a bit of English inside an Arabic document, or a few words of Hebrew inside a French document. If you want to explore the subject in all its complexity, see Unicode Standard Annex 9.

You may not need to do anything special to display bidirectional text. For example, when I typed the following sentence, I just typed the letters in logical order.

The first letter of the Hebrew alphabet is אלף.

For the last word, I typed א (alef), then ל (lamed), then ף (final pe). When I entered the ל, the editor put it on the left side of the א, and when I entered the ף, the editor put it to the left of the ל. The characters are stored in memory in the same sequence that I typed them, though they are displayed in the order appropriate for each language.
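To see the logical (storage) order for yourself, here is a minimal Python sketch. It assumes the Hebrew word is the three-letter spelling of alef, namely U+05D0, U+05DC, U+05E3, and it writes the letters as escapes so that your editor's own bidirectional rendering cannot confuse matters. The variable names are mine, chosen for illustration.

```python
import unicodedata

# Assumption: the word is alef, lamed, final pe, in the order typed.
word = "\u05D0\u05DC\u05E3"

# Collect (index, code point, name, bidi class) in storage order.
info = [
    (i, f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))
    for i, ch in enumerate(word)
]

for row in info:
    print(*row)
```

Index 0 is the alef, the first letter typed, even though a renderer draws it at the right end of the word. Each letter reports bidirectional class R, which is what tells the display engine to lay the run out from right to left.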

You can change the default display ordering of characters by inserting control characters. For example, I typed

The [U+202E]quick brown fox[U+202C] jumped.

and the text displays [1] as

The quick brown fox jumped.

The Unicode character U+202E, known as RLO for "right-to-left override," tells the browser to display the following letters from right to left. Then the character U+202C, known as PDF for "pop directional formatting," exits that mode, returning to left-to-right [2]. If we copy the first sentence into a text file and open it with a hex editor we can see the control characters, circled in red.
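If you would rather not hunt for a way to type invisible characters, you can build the same sentence in code. The following is a sketch in Python; the constant names RLO and PDF are just labels I picked to match the abbreviations above.

```python
import unicodedata

RLO = "\u202E"  # right-to-left override
PDF = "\u202C"  # pop directional formatting

# The control characters sit in the string in logical order,
# exactly where they were "typed".
sentence = f"The {RLO}quick brown fox{PDF} jumped."

for ch in (RLO, PDF):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))

# Stripping the two controls recovers the plain sentence.
visible = sentence.replace(RLO, "").replace(PDF, "")
print(visible)
```

The two controls count toward the length of the string even though they take up no space on screen, so `len(sentence)` is two more than `len(visible)`.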

bidi.png

I saved the file in UTF-16 encoding to make the characters easy to see: each quartet of hex characters represents a Unicode character. UTF-8 encoding is more common and more compact.
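You can reproduce something like the hex dump without a hex editor. This sketch encodes the sentence both ways in Python. I am assuming big-endian UTF-16 with no byte order mark so that the quartets read naturally; a file saved by an editor may be little-endian or start with a BOM, so the bytes on disk could differ from what is printed here.

```python
text = "The \u202Equick brown fox\u202C jumped."

utf16 = text.encode("utf-16-be")  # assumption: big-endian, no BOM
utf8 = text.encode("utf-8")

# One four-hex-digit group per 16-bit code unit.
units = [utf16[i:i + 2].hex() for i in range(0, len(utf16), 2)]
print(" ".join(units))

# In UTF-8 the ASCII letters take one byte each and each control takes three.
print(len(utf16), "bytes in UTF-16,", len(utf8), "bytes in UTF-8")
print(utf8.hex(" "))
```

In the UTF-16 output the controls show up as the groups 202e and 202c. In UTF-8 they are spread over three bytes each, e2 80 ae and e2 80 ac, which is why they are harder to spot by eye.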

If for some reason you wanted to force Hebrew to display from left to right, you could insert U+202D, known as LRO for "left-to-right override." The character to exit this mode is PDF, U+202C, as before.
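Wrapping text in an override is easy to do programmatically. Here is a small Python sketch; `force_ltr` is a hypothetical helper name of my own, not a library function, and the sample word is assumed to be alef, lamed, final pe written as escapes.

```python
LRO = "\u202D"  # left-to-right override
PDF = "\u202C"  # pop directional formatting

def force_ltr(text: str) -> str:
    """Wrap text so a bidi-aware renderer draws it left to right."""
    return f"{LRO}{text}{PDF}"

# Assumption: sample Hebrew word, stored in logical order.
hebrew = "\u05D0\u05DC\u05E3"
wrapped = force_ltr(hebrew)

print([f"U+{ord(ch):04X}" for ch in wrapped])
```

The Hebrew letters themselves are untouched and stay in the same order in memory. Only the two surrounding controls change how a renderer lays them out.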

Here's a bit of Hebrew written left-to-right:

Written left-to-right: .

And here's what it looks like in a hex editor:

bidi3.png


[1] This should look the same as

The xof nworb kciuq jumped.

though in this footnote I typed the letters in the order they appear: xof ...

If for some reason the text in the body of the post displays in normal order, not as in this note, then something has gone wrong with your browser's rendering.
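The comparison in this footnote can be produced mechanically. The sketch below reverses the overridden phrase in Python to get the string typed in visual order. Note this is only an approximation of what a renderer does, and it works here because the phrase is plain Latin letters and spaces; the variable names are mine.

```python
phrase = "quick brown fox"

# Letters in the order they appear on screen under the override.
reversed_phrase = phrase[::-1]

footnote_text = f"The {reversed_phrase} jumped."
print(footnote_text)
```

Unlike the sentence in the body of the post, this string contains no control characters: the letters really are stored in reversed order.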

[2] So in addition to Portable Document Format and Probability Density Function, PDF can stand for Pop Directional Formatting. Here "pop" is being used in the computer science sense of popping a stack.

The post Bidirectional text first appeared on John D. Cook.
