Re: [vhdl-200x] VHDL support for Unicode

From: Ernst Christen <christen.1858@comcast.net>
Date: Wed Aug 10 2011 - 14:08:37 PDT
There is another level of Unicode support, not mentioned in Martin's twiki page: support in comments only. This would make it possible for model writers to annotate their models in their native language, while the simulatable/synthesizable part of the description stays within ISO 8859-1. Depending on what an implementation does with comments, this may reduce the implementation problem to one that can be handled entirely by the lexer. The issue here is to understand what the real requirements are; so far I believe we've only discussed solutions and their issues.
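To illustrate why comments are the easy case: a VHDL comment runs from "--" to the end of the line, and UTF-8 never uses the line-ending byte values inside a multi-byte sequence, so a byte-oriented lexer could skip comment text without decoding it at all. A hypothetical sketch in C:

#include <stdio.h>

/* Hypothetical sketch: skip the body of a VHDL "--" comment without
 * decoding it. UTF-8 never uses the bytes 0x0A (LF) or 0x0D (CR)
 * inside a multi-byte sequence, so scanning byte-by-byte for the end
 * of the line is safe even if the comment contains arbitrary
 * Unicode text. */
static void skip_comment(FILE *src)
{
    int c;
    while ((c = getc(src)) != EOF && c != '\n' && c != '\r')
        ;  /* comment bytes pass through unexamined */
}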
 
One issue to keep in mind, and I believe this will have to be a strong requirement, is portability of a description. Based on my experience this rules out pure Unicode files because of the wide-character issue Evan mentioned. Additionally, if more than one byte is used per character, the endianness of the computer matters. The Microsoft signature in a Unicode file (the byte order mark) is definitely Windows-specific (Linux tools don't recognize it), although it comes in both a big-endian and a little-endian version. This seems to suggest that an encoding which expands only those characters whose Unicode representation needs more bytes than a "regular" character is preferable to a pure Unicode file. As far as I know, the most portable "regular" character set is still based on using a single 8-bit byte per character, and the most portable encoding for Unicode is UTF-8.
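For what it's worth, the signature itself is just a few fixed bytes at the start of the file, so recognizing it is easy; the portability problem is that tools disagree on whether it should be there at all. A sketch in C (the byte values are fixed by the Unicode standard; everything else is made up for illustration):

#include <string.h>

/* Sketch: recognize a Unicode signature (BOM) at the start of a file.
 * The UTF-32 patterns must be tested before the UTF-16 ones, since
 * they share a two-byte prefix. */
const char *detect_bom(const unsigned char *buf, size_t n)
{
    if (n >= 3 && memcmp(buf, "\xEF\xBB\xBF", 3) == 0)     return "UTF-8";
    if (n >= 4 && memcmp(buf, "\xFF\xFE\x00\x00", 4) == 0) return "UTF-32LE";
    if (n >= 4 && memcmp(buf, "\x00\x00\xFE\xFF", 4) == 0) return "UTF-32BE";
    if (n >= 2 && memcmp(buf, "\xFF\xFE", 2) == 0)         return "UTF-16LE";
    if (n >= 2 && memcmp(buf, "\xFE\xFF", 2) == 0)         return "UTF-16BE";
    return 0;  /* no signature: could be UTF-8, ISO 8859-1, ... */
}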
 
Ernst Christen
 
On Wed, 10 Aug 2011 14:50:39 +0100, Evan Lavelle <eml-vhdl-200x@cyconix.com> wrote:
 
On 10/08/2011 13:17, David G. Koontz wrote:

> I was thinking of doing strings with 32-bit chars, where you zero-pad smaller
> values from the LSB.  Unicode should parse cleanly and uniquely, allowing a
> traversal to map multi-byte chars to 32-bit (unsigned) chars.

I did initially go for UTF-32, but it turned out to be just too much
work to change all the string stuff for wide-char handling, and to find
all the dependencies. The killer was that MS and Unix have different
ideas on what a 'wide' char is (16- vs. 32-bit). UTF-8 turned out to be
much easier, and seems to be what most other people have gone for.
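Part of what makes UTF-8 easier is that the lead byte alone gives the
sequence length, so a decoder fits in a few lines. A rough sketch only,
not the actual code from any tool:

#include <stdint.h>

/* Rough sketch: decode one UTF-8 sequence from a NUL-terminated
 * buffer into a 32-bit code point. Returns the number of bytes
 * consumed, or 0 for a malformed sequence. Overlong and surrogate
 * checks are omitted for brevity. */
static int utf8_decode(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) {                     /* one byte: plain ASCII */
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80
                              && (s[2] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12)
            | ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0 && (s[1] & 0xC0) == 0x80
          && (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x07) << 18)
            | ((uint32_t)(s[1] & 0x3F) << 12)
            | ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
        return 4;
    }
    return 0;  /* invalid lead byte or truncated sequence */
}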

> Semantic
> error handling would need to be considered

I'd forgotten that. If you advertise a language as Unicode-compliant,
people expect it to be fully internationalised. There's not much point
typing Sanskrit identifiers and getting English error messages. I had to
strip out several hundred warning and error messages and make the whole
messaging system table-driven, so that messages could be replaced with
local versions. That was a *lot* of work, and difficult to test.
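The basic shape is a catalog of message templates keyed by ID, with one
table per locale, so translators only ever touch the tables. A
simplified sketch; the IDs and strings are invented for illustration:

#include <stdio.h>

/* Simplified sketch of a table-driven message system: the reporting
 * code refers to messages by ID, and each locale supplies its own
 * table of templates. */
enum msg_id { MSG_UNDECLARED, MSG_TYPE_MISMATCH, MSG_COUNT };

static const char *msgs_en[MSG_COUNT] = {
    [MSG_UNDECLARED]    = "identifier '%s' is not declared",
    [MSG_TYPE_MISMATCH] = "type mismatch in assignment to '%s'",
};

/* Switching locale means pointing at a different table; the
 * reporting code never changes. */
static const char **current_msgs = msgs_en;

static void report(enum msg_id id, const char *ident)
{
    fprintf(stderr, "error: ");
    fprintf(stderr, current_msgs[id], ident);
    fputc('\n', stderr);
}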

> If you look at the number of comments on Martin's three blog posts, there
> aren't enough to be statistically significant.  I got two replies when I
> asked after users of an open-source tool to which I have contributor access.

No, but you have to consider that the people who might be interested
probably aren't reading c.l.vhdl or English blog posts. Even so, I'd
still put this way down the list of requirements.

-Evan
