Article 6XDY0 UK-To-US English Converter Produced Amazing Mistakes

by janrinok from SoylentNews on (#6XDY0)

Arthur T Knackerbracket has processed the following story:

This week, meet a reader we'll Regomize as "Colin" who told us about his time working as a front-end developer for an education company that decided the time was right to expand from the UK to the US.

"Suddenly we needed to localize thousands of online articles, lessons, and other documents into American English."

Inconveniently, all that content was static HTML. "There was no CMS, no database, nothing I could harness on the server side," Colin lamented to Who, Me?

After due consideration, Colin and his team decided to use regular expressions to do the job.

"Our system combined tackling spelling swaps like changing 'ae' to 'e' in words like 'archaeology' and word/phrase swaps so that British terms like 'post' were changed to the American 'mail.'" Colin knew this could go pear-shaped if the system changed a term like "post-modern" to "mail-modern," so compound words were exempt.

As Colin and his workmates considered all the necessary changes, they realized they needed a lot of rules.
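To make the approach concrete, here is a minimal Python sketch of regex-based swaps of the kind Colin describes. The rule tables (SPELLING_RULES, WORD_SWAPS) and the helper names are hypothetical. The article does not publish the real rules or say which language they were written in. The lookarounds in word_pattern implement the compound-word exemption, so "post-modern" is left alone. Nothing, however, stops a case-insensitive rule from firing inside a proper name.

```python
import re

# Hypothetical rules for illustration; the real rule set is not published.
# Spelling rule: 'archae' -> 'arche' in the archaeology family of words.
SPELLING_RULES = [
    (re.compile(r"\b(arch)ae(?=olog)", re.IGNORECASE), r"\1e"),
]

# Word swaps: British term -> American term (singular forms only).
WORD_SWAPS = {"post": "mail", "garden": "yard", "van": "truck"}


def match_case(original: str, replacement: str) -> str:
    """Give the replacement the capitalisation of the matched word."""
    if original.isupper():
        return replacement.upper()
    if original[:1].isupper():
        return replacement[:1].upper() + replacement[1:]
    return replacement


def word_pattern(word: str) -> re.Pattern:
    # Reject a word character or hyphen on either side, so 'poster' and
    # hyphenated compounds such as 'post-modern' are left alone.
    return re.compile(rf"(?<![\w-]){re.escape(word)}(?![\w-])", re.IGNORECASE)


def localize(text: str) -> str:
    for pattern, replacement in SPELLING_RULES:
        text = pattern.sub(replacement, text)
    for uk, us in WORD_SWAPS.items():
        text = word_pattern(uk).sub(lambda m, us=us: match_case(m.group(0), us), text)
    return text


print(localize("Post the archaeology notes, not the post-modern essay."))
# Mail the archeology notes, not the post-modern essay.
```

Run against "Vincent Van Gogh lived in the Garden of Eden.", this sketch returns "Vincent Truck Gogh lived in the Yard of Eden.", which is the same class of bug the lessons later showed.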

"The fact it was running the replacements directly on the body HTML, and causing lots of page repaints, meant we had to build a REST API to cache which rules ran and didn't run for each page, so as to not cause slowdown by running unnecessary rules," he explained.

Which worked well until it didn't.

"One day we got a call asking why a lesson about famous artists referred to the great painter 'Vincent Truck Gogh.'"

Readers are doubtless familiar with Vincent van Gogh and with the different names for midsize vehicles on each side of the North Atlantic.

That was just the start. Next came complaints about a religious studies lesson that explained how Adam and Eve lived in the "Yard of Eden" - not the garden. Another religion class mentioned sinister-sounding "Easter hoods" instead of the daintier "Easter bonnets."

Colin figured out that the word swaps he had coded failed to consider cases where the system should just skip a word altogether. A van, after all, is a truck if you're American.

"In the end, we managed to get the system to be context-aware, so that certain swaps could be suppressed if the article contained a certain trigger word which suggested it shouldn't run, and the problems went away. But it was a very entertaining bug to be involved with!"

Processed by jelizondo

Read more of this story at SoylentNews.
