On 10/08/11 9:29 PM, Evan Lavelle wrote:
> I think there were other issues, but I can't think what right now. It was a
> lot of work, for a group of users most of whom were probably happy to write
> in Latin characters anyway. Having said that, it's difficult to justify not
> having Unicode support.
There can be implementation-dependent mechanisms for specifying library
locations and name mapping, and also analysis dependencies, that can
require string handling. We're talking about a subset of the standard C
library (for null-terminated strings).
I was thinking of doing strings with 32-bit chars, where smaller values
are zero-extended (the code point sits in the least significant bits,
zeros above). Unicode should parse cleanly and uniquely, allowing a
traversal to map multibyte chars to 32-bit (unsigned) chars.
The big problem, as you've pointed out, is needing a lexer replacement for
those domain-specific specification languages (e.g. lex or flex) that are
ASCII-constrained. I've written a version of my VHDL-93 lexer in C that is
lex/flex interface-compliant, which I was planning on updating to
VHDL-2008 and which would allow string format conversion. You'd require
conversion back to Unicode for outputs (your own printfs).
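The output direction is the mirror image; a minimal sketch, again with no
range or surrogate checking:

    #include <stddef.h>
    #include <stdint.h>

    /* Re-encode a 32-bit char as UTF-8 into buf (room for 4 bytes).
       Returns bytes written.  Sketch only: assumes c is a valid
       code point. */
    size_t char32_to_utf8(uint32_t c, unsigned char *buf)
    {
        if (c < 0x80) {
            buf[0] = (unsigned char)c;
            return 1;
        } else if (c < 0x800) {
            buf[0] = (unsigned char)(0xC0 | (c >> 6));
            buf[1] = (unsigned char)(0x80 | (c & 0x3F));
            return 2;
        } else if (c < 0x10000) {
            buf[0] = (unsigned char)(0xE0 | (c >> 12));
            buf[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (c & 0x3F));
            return 3;
        } else {
            buf[0] = (unsigned char)(0xF0 | (c >> 18));
            buf[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            buf[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            buf[3] = (unsigned char)(0x80 | (c & 0x3F));
            return 4;
        }
    }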
I'd imagine there are classes of ancillary tools that would have problems
with Unicode and would need updating. Anything that peers into analyzed
libraries comes to mind. Another class would be waveform viewers and 1029.1
(WAVES, still in effect, isn't it?) and the like. You either keep names as
Unicode byte streams and modify the tools to handle them, or you've
modified them for N-byte chars and have to modify the string handling.
Foreign procedures/functions would need a common view of strings, or
translation across the boundary; they'd need to be Unicode-compliant at
the least.
All of a sudden we're seeing bigger ripple effects.
Through careful analysis I had optimized a hashing algorithm that
maximized symbol-table leveling and could also swallow extended
identifiers. The hashes might need to get bigger (currently 12 bits); the
higher the nn in UTF-nn, the more bits you need, I think.
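To make that concrete (this is not my tuned algorithm, just an
FNV-1a-style stand-in folded over 32-bit chars and masked to the table
width):

    #include <stdint.h>

    #define HASH_BITS 12    /* current table width; might need to grow */

    /* Hash a NUL-terminated string of 32-bit chars, folded down to
       HASH_BITS.  Placeholder for the real, tuned algorithm. */
    uint32_t sym_hash(const uint32_t *s)
    {
        uint32_t h = 2166136261u;               /* FNV offset basis */
        for (; *s != 0; s++) {
            h ^= *s;
            h *= 16777619u;                     /* FNV prime */
        }
        return h & ((1u << HASH_BITS) - 1u);
    }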
The lexer and symbol table effort isn't the largest part of a new tool
effort, but it might be prohibitive for retrofitting, particularly when
you consider the debugging environments vendors might have implemented.
Essentially you'd need to separate data from program control. Semantic
error handling would need to be considered: character counts (see the
sketch below), string format conversion, object name output. Table-driven
parsers generally have the ability to have code hung on them arbitrarily;
parsers wouldn't necessarily survive unscathed.
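As a small example of the character-count point above (hypothetical
helper): once names are UTF-8 byte streams, byte length and character
count disagree, so error messages and column tracking have to count code
points, e.g. by skipping continuation bytes:

    #include <stddef.h>

    /* Count code points in a NUL-terminated UTF-8 string by skipping
       continuation bytes (10xxxxxx).  strlen() would give bytes. */
    size_t utf8_strlen(const unsigned char *s)
    {
        size_t n = 0;
        for (; *s != '\0'; s++)
            if ((*s & 0xC0) != 0x80)
                n++;
        return n;
    }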
If you look at the number of comments on Martin's three blog posts, there
aren't enough to be statistically significant. I got two replies asking
after users of an open-source tool to which I have contributor access.
I haven't heard whether Unicode is actually going into Ada 2012 either.
Ada is mainly established tools and would have better developer support
than VHDL has. (Consider that I waved IR1045 around to make the point.)
I don't think we've proven a need for Unicode, not based on a sample of 10
or so total. This, as one of Martin's comments said, is a marketing
problem, which implies asking enough customers to get a real indication.
I'd imagine this sort of thing is why new languages are defined: to
reduce the amount of baggage dragged into the future. I'd also worry
about vendor resistance to becoming compliant, given the size of the
effort. We already see enough lag in 2002/2008 compliance. A standard
should lead where people are willing to follow or are heading anyway; in
this case that's people using VHDL, not other languages.