Comment 8C Re: UTF-8




UTF-8 (Score: 1)

by on 2014-02-20 08:02 (#38)

The "Content-Type" header has been "text/html; charset=utf-8" (along with the equivalent meta tag) since the first day.

As for posting comments, I'm currently being "overly safe" and only allowing characters that can be typed on a US keyboard (minus the ampersand). The plan is to eventually loosen the rules a bit to cover most Western languages and useful symbols (like the euro, pound, and yen).
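A rough sketch of what that kind of allowlist could look like. This is not the site's actual filter; it assumes "US keyboard" means printable ASCII plus space and newline, and the names ALLOWED, disallowed_chars, and is_allowed are made up for illustration.

```python
import string

# Assumption: "typeable on a US keyboard" = ASCII letters, digits,
# punctuation, space, and newline. The ampersand is removed explicitly.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " \n") - {"&"}


def disallowed_chars(comment):
    """Return the distinct characters outside the allowlist, in order of first appearance."""
    seen = []
    for ch in comment:
        if ch not in ALLOWED and ch not in seen:
            seen.append(ch)
    return seen


def is_allowed(comment):
    """True if every character in the comment is on the allowlist."""
    return not disallowed_chars(comment)
```

Loosening the rules later would then just mean adding characters (say, the euro, pound, and yen signs) to ALLOWED rather than rewriting the check.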

The full Unicode set is just a huge potential source of abuse: non-printing characters, right-to-left switching, and CJK characters. Within minutes of Soylent News adding it, for example, people were posting pages of braille and other crap.

Re: UTF-8 (Score: 1)

by on 2014-02-28 19:34 (#8C)

Hi. In the past I wrote some code, a set of magical regular expressions, that complies with the Internationalized Domain Names (IDN) version 2 standard and only allows UTF-8 letters and punctuation (no symbols).

If you want, just tell me where to send this.
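For anyone curious what a "letters and punctuation, no symbols" rule might look like, here is a minimal sketch. It is not the regular expressions mentioned above and makes no claim to IDN compliance; it simply assumes the rule maps onto Unicode general categories L* (letters) and P* (punctuation) plus a plain space, and the function names are hypothetical.

```python
import unicodedata


def is_letter_or_punctuation(ch):
    """True for Unicode letters (L*), punctuation (P*), and the plain space."""
    # Assumption: "letters and punctuation" means general categories L* and P*.
    category = unicodedata.category(ch)
    return category.startswith(("L", "P")) or ch == " "


def rejected_chars(text):
    """Return every character in text that is not a letter, punctuation, or space."""
    return [ch for ch in text if not is_letter_or_punctuation(ch)]
```

Under these assumptions, braille patterns and currency signs are rejected as symbols, and right-to-left override characters are rejected as format characters. CJK ideographs count as letters and pass, and combining marks are rejected, so input would probably need NFC normalization first.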
