I retro-fitted Unicode (UTF-8) to an existing language last year. My own
view is that every modern language should support UTF-8 natively,
and that it's easy if you build it in from the start. Retro-fitting can
be a real problem, though. Some issues:
- I was surprised by the comment about the difficulty of supporting
strings; this was the only thing that was trivial
- If the compiler has a (f)lex lexer, then you've got a problem, since
lex matches single bytes (7-bit ASCII in classic lex, 8-bit in flex) and
knows nothing about multi-byte characters. However, you can work around
this without too much trouble
- Not an issue for VHDL, but my biggest problem was getting UTF-8 into a
C-like preprocessor. This was most of the work
- I had other minor problems with symbol tables, and so on, but these
were easy to fix. They would be more difficult if your compiler is
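written in C and relies heavily on NUL-terminated strings. You can
probably fix all these problems by requiring input to be "modified
UTF-8" (which encodes U+0000 as the two bytes 0xC0 0x80, so no zero
byte ever appears inside a string) rather than standard UTF-8. I think
this is quite common: Java, for example, uses it internally.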
- There's no problem with legacy back-end tools: you just output a
translated version of your identifier. You can't put arbitrary UTF-8
characters into an extended identifier, so this wouldn't be particularly
clean, but it's better than nothing [as an aside, extended identifiers
would of course be redundant with UTF-8 support]
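- You have to think about what 'line terminators' and 'spaces' actually
are - Unicode has an extended set of these (NEL, U+2028 LINE SEPARATOR,
U+2029 PARAGRAPH SEPARATOR, no-break space, and so on)
- You'd need to support input files that start with Microsoft's 3-byte
"UTF-8 BOM" (the bytes EF BB BF)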
- One plus is that VHDL has no significant printf-style support, so you
don't have to worry about the tricky issue of display widths, aligning
text, and so on
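To make the "modified UTF-8" point above concrete, here is a minimal Python sketch of the NUL rule only. The function names are made up for illustration, and the second rule that Java's modified UTF-8 applies (encoding characters above U+FFFF as surrogate pairs) is deliberately left out.

```python
def to_modified_utf8(text: str) -> bytes:
    """Encode text so that the result never contains a 0x00 byte.

    Only the NUL rule of modified UTF-8 is applied: U+0000 becomes the
    two-byte sequence 0xC0 0x80. Everything else is standard UTF-8.
    """
    # In standard UTF-8 a 0x00 byte can only come from U+0000 itself,
    # so a plain byte replacement is enough.
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def from_modified_utf8(data: bytes) -> str:
    """Reverse to_modified_utf8: map 0xC0 0x80 back to U+0000."""
    # 0xC0 never occurs in valid standard UTF-8, so this cannot
    # corrupt any other character.
    return data.replace(b"\xc0\x80", b"\x00").decode("utf-8")
```

With input in this form, C code that treats 0x00 as the end of a string keeps working, because an embedded U+0000 no longer produces a zero byte.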
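The "translated version of your identifier" for legacy back-end tools could look something like the sketch below. The escape format (`_uXXXX_`) and the function name are assumptions chosen for illustration, not what any particular compiler emits, and the result is not checked against the identifier rules of any specific back-end tool.

```python
def mangle_identifier(name: str) -> str:
    """Translate a Unicode identifier into a pure-ASCII one.

    ASCII letters and digits pass through unchanged. Every other
    character, including '_', is written as _uXXXX_ (lowercase hex code
    point), so two different source names never mangle to the same
    output.
    """
    out = []
    for ch in name:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append(f"_u{ord(ch):04x}_")
    return "".join(out)


print(mangle_identifier("größe"))  # gr_u00f6__u00df_e
```

Escaping the underscore as well costs some readability, but it keeps the mapping reversible, which helps when a back-end tool's messages have to be mapped back to source names.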
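The BOM and line-terminator points can be sketched as follows. Which characters a language treats as line terminators and spaces is a design decision; the set used here (CR LF, LF, CR, VT, FF, NEL, U+2028, U+2029, and the Unicode "Zs" category plus tab) is an assumption for illustration, and the function names are hypothetical.

```python
import re
import unicodedata

UTF8_BOM = b"\xef\xbb\xbf"

# CR LF is listed first so that it counts as a single terminator.
_LINE_TERMINATOR = re.compile(r"\r\n|[\n\r\x0b\x0c\x85\u2028\u2029]")


def decode_source(raw: bytes) -> str:
    """Decode a UTF-8 source file, skipping a leading BOM if present."""
    if raw.startswith(UTF8_BOM):
        raw = raw[len(UTF8_BOM):]
    return raw.decode("utf-8")


def split_lines(text: str) -> list:
    """Split on the assumed terminator set, dropping the terminators."""
    return _LINE_TERMINATOR.split(text)


def is_space(ch: str) -> bool:
    """Treat tab and any Unicode space separator (category Zs) as a space."""
    return ch == "\t" or unicodedata.category(ch) == "Zs"
```

Python's built-in "utf-8-sig" codec also skips a leading BOM when decoding; the explicit version is shown only to make the three bytes visible.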
I think there were other issues, but I can't recall them right now. It
was a lot of work for a group of users most of whom were probably happy
to write in Latin characters anyway. Having said that, it's difficult to
justify not having Unicode support.
-Evan
Received on Wed Aug 10 02:30:06 2011