Upper case, lower case, title case

by John, from John D. Cook

Converting text to all upper case or all lower case is a fairly common task.

One way to convert text to upper case would be to use the tr utility to replace the letters a through z with the letters A through Z. For example,

    $ echo Now is the time | tr '[a-z]' '[A-Z]'
    NOW IS THE TIME

You could convert to lower case by reversing the arguments to tr.

The approach above works if your text consists of only unadorned Roman letters. But it wouldn't work, for example, if you gave it a jalapeño:

    $ echo jalapeño | tr '[a-z]' '[A-Z]'
    JALAPEñO

Using the character classes [:lower:] and [:upper:] won't help either.
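
To see why the tr approach stops at plain Roman letters, here is a minimal Python sketch of the same idea: a fixed table that maps a through z to A through Z and leaves every other character alone. The names ascii_upper and ascii_lower are made up for illustration, and this only mimics the tr commands above rather than reproducing how tr is implemented.

```python
import string

# Fixed translation tables covering only the 26 unadorned Roman letters,
# analogous to tr '[a-z]' '[A-Z]' and the same call with arguments reversed.
TO_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)
TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_upper(text: str) -> str:
    """Upper-case a-z only; any other character passes through unchanged."""
    return text.translate(TO_UPPER)


def ascii_lower(text: str) -> str:
    """Lower-case A-Z only; any other character passes through unchanged."""
    return text.translate(TO_LOWER)


if __name__ == "__main__":
    print(ascii_upper("Now is the time"))  # NOW IS THE TIME
    print(ascii_lower("Now is the time"))  # now is the time
    print(ascii_upper("jalapeño"))         # JALAPEñO
```

The ñ is simply not in the table, so it comes through untouched, just as in the tr output.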

Tussling with Unicode

One alternative would be to use the uc command from the Unicode::Tussle package [1] I mentioned a few days ago. There's also a lc counterpart, and a tc for title case. These utilities handle far more than Roman letters.

    $ echo jalapeño | uc
    JALAPEÑO
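
For comparison, Python's built-in string methods also apply Unicode case mappings. The sketch below is only an analogue of uc, lc, and tc. It does not show how those utilities are implemented, and the two may well disagree on edge cases.

```python
# Python's str methods use Unicode case mappings, so they are not
# limited to the letters a-z the way a fixed ASCII table is.
text = "jalapeño"

print(text.upper())                # JALAPEÑO
print(text.upper().lower())        # jalapeño
print("now is the time".title())   # Now Is The Time
```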

Unicode capitalization rules are a black hole, but we'll just look at one example and turn around quickly before we cross the event horizon.

Suppose you want to send all the letters in the Greek word to upper case.

 $ echo  | uc 

Greek has two lower case forms of sigma: ς at the end of a word and σ everywhere else. But there's only one upper case sigma, so both get mapped to Σ. This means that if we convert the text to upper case and then to lower case, we won't end up exactly where we started.

 $ echo  | uc | lc 

Note that the lc program chose σ as the lower case of Σ and didn't take into account that it was at the end of a word.
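
The round trip can be reproduced in Python by lowering one character at a time, which, like the lc behavior described above, ignores where a letter sits in the word. This is a sketch, not a description of how lc works internally. The word σοφός and the helper name lower_per_char are illustrative choices of mine, not taken from the text.

```python
def lower_per_char(text: str) -> str:
    """Lower-case each character in isolation, with no word context."""
    return "".join(ch.lower() for ch in text)


word = "σοφός"              # illustrative word: σ at the start, ς at the end
upper = word.upper()        # both sigma forms become Σ
round_trip = lower_per_char(upper)

print(upper)                # ΣΟΦΌΣ
print(round_trip)           # σοφόσ
print(round_trip == word)   # False: the final ς came back as σ
```

Python's own str.lower(), applied to the whole string rather than character by character, does take position into account and returns ς at the end of the word, which is why the sketch lowers each character separately.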

[1] "Tussle" is an acronym for Tom [Christiansen]'s Unicode Scripts So Life is Easier.

The post Upper case, lower case, title case first appeared on John D. Cook.