Sorting Roman numerals

by John from John D. Cook on (#6X6CM)

This morning I wrote about the frequencies of names for popes and kings. This involved sorting strings with Roman numerals since it's common for popes and kings to have Roman numerals after their names.

Something that surprised me was that sorting Roman numerals alphabetically roughly sorts them in numerical order, especially for small numbers. It's not perfect. For example, IX comes before V in alphabetical order.
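A minimal Python sketch makes the IX example concrete. The helper `to_roman` is a name introduced here for illustration, not code from the original post, and it assumes standard subtractive notation for 1 through 3999.

```python
def to_roman(n: int) -> str:
    """Convert an integer in 1..3999 to a Roman numeral (subtractive notation)."""
    pairs = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    out = []
    for value, symbol in pairs:
        # Take as many copies of this symbol as fit, then carry on with the rest.
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


numerals = [to_roman(k) for k in range(1, 11)]   # I through X
alphabetical = sorted(numerals)                  # plain string sort
print(alphabetical)
# ['I', 'II', 'III', 'IV', 'IX', 'V', 'VI', 'VII', 'VIII', 'X']
```

Only IX is out of place: it shares the prefix I with the numerals for 1 through 4, so it sorts ahead of everything that starts with V.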

Everyone who has done much work with data will have run into the problem of a column of numbers being sorted alphabetically rather than numerically. For example, "10" comes between "1" and "2" even though 10 comes after 1 and 2.
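The same effect is easy to reproduce in Python. This is only a sketch with made-up data: the numbers 1 through 12 stored as strings, sorted first as text and then with a numeric key.

```python
labels = [str(k) for k in range(1, 13)]   # "1" through "12", stored as text

as_strings = sorted(labels)               # lexicographic comparison
as_numbers = sorted(labels, key=int)      # compare by numeric value instead

print(as_strings)
# ['1', '10', '11', '12', '2', '3', '4', '5', '6', '7', '8', '9']
print(as_numbers)
# ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12']
```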

So you can't sort numerals, Roman or Arabic, as strings and expect them to appear in numerical order. But Roman numbers come close when you're sorting small numbers, such as I through XXIII for popes named John or I through VIII for kings of England named Henry.

To illustrate this, I plotted how well string sort order correlates with numeric order for Roman and Arabic numbers, for the sequence 1 ... n for increasing values of n. I measured correlation using Spearman's rank-order correlation. I tried Kendall's tau as well and got similar results.
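The calculation can be sketched without any libraries. This is not necessarily how the plots were produced; it is an illustration that assumes the labels are distinct, so there are no tied ranks and the closed form rho = 1 - 6 * sum(d^2) / (n (n^2 - 1)) applies. The names `to_roman` and `spearman_vs_numeric` are hypothetical helpers introduced here.

```python
def to_roman(n: int) -> str:
    """Convert an integer in 1..3999 to a Roman numeral (subtractive notation)."""
    pairs = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    out = []
    for value, symbol in pairs:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def spearman_vs_numeric(labels: list[str]) -> float:
    """Spearman correlation between numeric order and string sort order.

    labels[i] is the written form of the number i + 1, so the numeric rank
    of labels[i] is simply i. Assumes len(labels) >= 2 and distinct labels.
    """
    n = len(labels)
    # Indices of the labels in the order a string sort would put them.
    order = sorted(range(n), key=lambda i: labels[i])
    string_rank = [0] * n
    for position, i in enumerate(order):
        string_rank[i] = position
    # Sum of squared rank differences, then the no-ties Spearman formula.
    d2 = sum((i - string_rank[i]) ** 2 for i in range(n))
    return 1 - 6 * d2 / (n * (n * n - 1))


for n in (8, 23, 38, 100):
    roman = spearman_vs_numeric([to_roman(k) for k in range(1, n + 1)])
    arabic = spearman_vs_numeric([str(k) for k in range(1, n + 1)])
    print(f"n={n:4d}  roman={roman:.4f}  arabic={arabic:.4f}")
```

A value of 1 means the string sort reproduces numeric order exactly, which is what happens for I through VIII.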

spearman_roman1.png

Alphabetical order and numerical order for Roman numerals agree pretty well up to XXXVIII, with just a few numbers out of place, namely IX, XIX, and XXIX. But alphabetical order and numerical order diverge quite a bit for Arabic numerals when all the numbers between 10 and 19 come before 2.
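The claim about IX, XIX, and XXIX can be checked directly. The sketch below sorts 1 through 38 by their Roman spelling and confirms that everything except those three numbers is already in numerical order. As before, `to_roman` is an illustrative helper rather than code from the post.

```python
def to_roman(n: int) -> str:
    """Convert an integer in 1..3999 to a Roman numeral (subtractive notation)."""
    pairs = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    out = []
    for value, symbol in pairs:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


n = 38
# The integers 1..n arranged in the order their Roman spellings sort as strings.
by_string = sorted(range(1, n + 1), key=to_roman)

misplaced = {9, 19, 29}  # IX, XIX, XXIX
remaining = [k for k in by_string if k not in misplaced]

# True if dropping the three misplaced numbers leaves a numerically sorted list.
print(remaining == sorted(remaining))

# Show which numeral each misplaced one jumps ahead of.
for k in sorted(misplaced):
    i = by_string.index(k)
    print(to_roman(k), "sorts just before", to_roman(by_string[i + 1]))
```

Each of the three lands immediately before the corresponding numeral ending in V, four places earlier than its numerical position.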

As you go further out, alphabetical order and numerical order diverge for both writing systems, but especially for Roman numerals.

spearman_roman2.png

The post Sorting Roman numerals first appeared on John D. Cook.