I may regret this as I'm not an internationalisation guru, but there seems to be some misunderstanding in the whole discussion here.
We need to keep Unicode *representations* (i.e. a string as a sequence of code points) separate from *encodings* (i.e. how those sequences of code points are stored in files or memory).
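To make that distinction concrete, here's a quick Python sketch (just an illustration, not anything specific to the tools under discussion): the *same* code point turns into *different* bytes depending on the encoding chosen.

```python
# One character, one code point - but different bytes per encoding.
s = "\u00e9"                  # "é", a single code point: U+00E9

print(ord(s))                 # the code point value: 233
print(s.encode("utf-8"))      # b'\xc3\xa9'  - two bytes in UTF-8
print(s.encode("utf-16-le"))  # b'\xe9\x00'  - two bytes, little-endian UTF-16
print(s.encode("latin-1"))    # b'\xe9'      - one byte in Latin-1
```

The representation (U+00E9) never changes; only the on-disk/in-memory byte sequence does.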
As far as encodings go, UTF-8 is a perfectly workable encoding (and a very widely used one) - Linux and Windows can both read and write files encoded this way. The same should be true of UTF-16 files, provided they start with the correct byte-order mark (BOM) so the reader knows which endianness to expect... (I'm not sure how different OSs are evidence of endian problems - did Windows put a non-standard header on UTF-16 files at some point?)
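The BOM point is easy to demonstrate - again a Python sketch with made-up sample text, nothing from the thread itself. The same string, prefixed with the appropriate BOM, decodes correctly regardless of which byte order was used to write it:

```python
import codecs

s = "hi"

# The same text written in both byte orders, each with its matching BOM.
little_endian = codecs.BOM_UTF16_LE + s.encode("utf-16-le")  # starts FF FE
big_endian    = codecs.BOM_UTF16_BE + s.encode("utf-16-be")  # starts FE FF

# The generic "utf-16" decoder reads the BOM and picks the right byte order.
print(little_endian.decode("utf-16"))  # hi
print(big_endian.decode("utf-16"))     # hi
```

Without the BOM, a reader has to guess the byte order - which is presumably where the cross-platform trouble people are seeing comes from.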
For those who are interested, there's a very readable explanation here, called "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)"
http://www.joelonsoftware.com/articles/Unicode.html
Cheers,
Martin
(hoping he hasn't made anything worse... :)
--
Martin Thompson CEng MIET
TRW Conekt, Stratford Road, Solihull, B90 4GW, UK
+44 (0)121-627-3569 : martin.j.thompson@trw.com
http://www.conekt.co.uk/
Conekt is a trading division of TRW Limited
Registered in England, No. 872948
Registered Office Address: Stratford Road, Solihull B90 4AX

Received on Thu Aug 11 01:01:12 2011
This archive was generated by hypermail 2.1.8 : Thu Aug 11 2011 - 01:01:32 PDT