
TECH: Unicode vs The Rest Of The World (Again)

From: Danny Wier <dawiertx@...>
Date: Friday, April 30, 2004, 21:12
From: "Paul Bennett" <paul-bennett@...>

> I think it is the only way (certainly for long s). Unfortunately, many
> people around here haven't yet bothered to get Unicode mail clients
> (including the very small number for whom this is still a technical
> impossibility for one reason or another). It's quite an amusing situation
> that so many linguists would deny themselves the biggest boon to online
> linguistics since the invention of e-mail, IMO.
>
> More worrying than the mere adoption rate is that the List Server itself
> is **severely** broken when it comes to UTF-8 (and presumably any other
> full 8-bit encoding). It takes byte values (inside message bodies, I don't
> know about inside attachments) 128 thru 149 and subtracts 128 from them,
> leaving you with multi-byte UTF sequences that at best point to the wrong
> character and at worst form a broken character that is unprintable.
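To make the byte mangling described above concrete, here is a minimal Python sketch. It assumes the behavior is exactly as reported (every body byte from 128 through 149 has 128 subtracted); the function name simulate_list_server is hypothetical and this is not the list server's actual code.

```python
def simulate_list_server(raw: bytes) -> bytes:
    """Hypothetical model of the reported bug: subtract 128 from
    every byte whose value is in the range 128..149 inclusive."""
    return bytes(b - 128 if 128 <= b <= 149 else b for b in raw)


# IPA esh (U+0283) is encoded in UTF-8 as the two bytes CA 83.
esh = "\u0283".encode("utf-8")

# The continuation byte 0x83 (131) falls in the affected range and
# becomes 0x03, so the two-byte sequence is no longer valid UTF-8.
mangled = simulate_list_server(esh)

try:
    mangled.decode("utf-8")
    survived = True
except UnicodeDecodeError:
    survived = False

# Plain ASCII never reaches the affected range, so it passes through.
ascii_ok = simulate_list_server(b"X-SAMPA: S") == b"X-SAMPA: S"
```

Under this model, any character whose UTF-8 encoding contains a byte in the 128..149 range is damaged, while ASCII text such as X-SAMPA is untouched.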
Welcome to the Unicode Empire. Resistance is futile. ;)

Seriously, I propose the following policy on Unicode (and non-ASCII
encodings in general):

1) Give a spoiler warning at the top of your post or in the Subject: line
saying "Warning: Unicode" or something like that.

2) Only use Unicode when necessary, that is, when you need a character
outside of Latin-1; try to stick to the WGL4 character set if possible.
Otherwise, if you get a Unicode-encoded message from CONLANG and reply to
the list, convert to ISO or Windows Western European before you send.

3) Hebrew, Arabic, Hangul and Chinese-Japanese-Korean characters are okay,
but don't expect everyone to be able to read them. We don't all have
Windows 2000/XP.

4) Offer an X-SAMPA alternative, ESPECIALLY if you use anything in the IPA
area of Unicode.

5) Don't use any non-ASCII at all in the Subject: line.

Also, a few folks here can't read anything beyond ASCII, not even 8-bit
Latin-1.
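As a rough illustration of points 2 and 5, the following Python sketch picks the narrowest encoding that can represent a message and checks that a Subject: line is pure ASCII. The function names smallest_charset and subject_is_ascii are hypothetical, and the sketch only tries ISO-8859-1 (Latin-1), not Windows Western European or the full WGL4 set.

```python
def smallest_charset(text: str) -> str:
    """Return the narrowest of us-ascii, iso-8859-1, utf-8 that can
    encode the whole text (hypothetical helper for illustration)."""
    for charset in ("us-ascii", "iso-8859-1"):
        try:
            text.encode(charset)
            return charset
        except UnicodeEncodeError:
            continue
    return "utf-8"


def subject_is_ascii(subject: str) -> bool:
    """True if the Subject: line contains only ASCII characters."""
    try:
        subject.encode("us-ascii")
        return True
    except UnicodeEncodeError:
        return False


# A body that needs only Latin-1 can be sent without Unicode.
body_charset = smallest_charset("caf\u00e9")

# A body with IPA needs UTF-8, so a "Warning: Unicode" note and an
# X-SAMPA alternative would be appropriate under the policy above.
ipa_charset = smallest_charset("[\u0283]")
```

A mail client or script could run such a check before sending and only switch to UTF-8 when the result is "utf-8".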