Expressions in Bit String Literals

Proposal Editing Information

Requirement Summary

Add an expression option for the bit length of a bit string literal, in order to support structured programming.

Related and/or Competing Issues:


Use Model

This code example exhibits better code structuring:

  constant vector_width_jc : positive := 6;
  subtype  vector_rst     is natural range vector_width_jc - 1 downto 0;
  subtype  vector_uvst    is unsigned(vector_rst);

  signal   vector_uvs      : vector_uvst;
  constant vector_uvc      : vector_uvst := (vector_width_jc)UX"3C";


  vector_uvs <= (vector_width_jc)UX"2F";


The existing BNF is as follows:

  base_specifier ::= B | O | X | UB | UO | UX | SB | SO | SX | D
  bit_string_literal ::= [ integer ] base_specifier " [ bit_value ] "

and the suggestion is to enhance it as shown:

  constrained_bit_string_literal ::= [ (constraint_expression) ] bit_string_literal

where the expression resolves to integer and replaces the integer bit width defined in the 'bit string literal' lexeme (as per David Koontz's comments below).


General Comments

This proposal was instigated from this reflector thread:

Comment from David Koontz:

Bit string literals are lexical elements.

15.8 Bit string literals

A bit string literal has a value that is a string literal. The string literal is formed from the bit value by first obtaining a simplified bit value, consisting of the bit value with underline characters removed, and then obtaining an expanded bit value. Finally, the string literal value is obtained by adjusting the expanded bit value, if required.
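The first two steps quoted above (simplified bit value, then expanded bit value) can be sketched roughly as follows. This is a minimal Python model, not LRM text: the function name `expand_bit_value` is hypothetical, only the B, O and X families of base specifiers are handled (D is left out), and rejecting digits that are out of range for the base is an assumption of the sketch. The final length adjustment is deliberately not modelled here.

```python
import string

# Bits contributed by each digit for the B, O and X families of base specifiers.
BITS_PER_DIGIT = {"B": 1, "O": 3, "X": 4}


def expand_bit_value(base_specifier: str, bit_value: str) -> str:
    """Return the expanded bit value of a B/O/X-family literal (no length adjustment)."""
    family = base_specifier.upper()[-1]          # "UX" -> "X", "SB" -> "B"
    if family not in BITS_PER_DIGIT:
        raise ValueError(f"unsupported base specifier: {base_specifier}")
    width = BITS_PER_DIGIT[family]

    # Step 1: simplified bit value, i.e. underline characters removed.
    simplified = bit_value.replace("_", "")

    # Step 2: expanded bit value, one group of `width` characters per input character.
    expanded = []
    for ch in simplified:
        if ch in string.hexdigits:
            digit = int(ch, 16)
            if digit >= 2 ** width:
                # Assumption of this sketch: a digit too large for the base is rejected.
                raise ValueError(f"digit {ch!r} is not valid for base {family}")
            expanded.append(format(digit, f"0{width}b"))
        else:
            # Non-digit characters such as 'Z' or '-' are replicated per bit position.
            expanded.append(ch * width)
    return "".join(expanded)
```

For instance, `expand_bit_value("UX", "2F")` gives `"00101111"`, and `expand_bit_value("X", "Z_F")` gives `"ZZZZ1111"`.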

Annex I (informative) Glossary

literal: A value that is directly specified in the description of a design. A literal can be a bit string literal, enumeration literal, numeric literal, string literal, or the literal null. (9.3.2)

A string literal value that requires evaluation of an expression isn't directly specified when it requires evaluation and indirect access as in your example to attributes of a named entity. From a historic point of view lexical analysis provides lexical elements that are completely known (directly specified) at time of lexical analysis which provides both a lexical identifier and a value. The string literal value can't be determined without knowing the length.

I recently came across a book Compiler Construction for Digital Computers by David Gries, Cornell University, 1971, John Wiley & Sons, ISBN 0-471-32776-X which addresses the subject of lexer (or scanner) and parser (syntax analyzer) separation. In Chapter 3, The Scanner, section 3.1, Introduction:

One may justly ask why the lexical analysis cannot be
incorporated into the syntactic analysis. After all, we can use
BNF to describe the syntax of symbols. For example, FORTRAN
identifiers can be described by

(3.1.1) <identifier> ::= letter {letter | digit}5

There are several good reasons for separating lexical from
syntactical analysis:

1. A large portion of compile-time is spent in scanning
characters. Separation allows us to concentrate solely on
reducing this time. One way is to program part or all of
the scanner in assembly language, and this is easier if the
separation is made. (For example, on the IBM 360, one can
execute a single TRT instruction to sequence through 1 to
256 characters looking for a non-blank character.) Of
course, we don't recommend using assembly language unless
the compiler is really going to be used often.

2. The syntax of symbols can be described by very simple
grammars. If we separate scanning from syntax recognition,
we can develop efficient parsing techniques which are
particularly well suited for these grammars. Moreover, we
can then develop automatic methods for constructing scanners
which use these efficient techniques.

3. Since the scanner returns a symbol instead of a character,
the syntax analyzer actually gets more information about
what to do at each step. Moreover, some of the context
checks necessary in order to parse symbols are easier to
perform in an ad hoc manner than in a formal syntax
analyzer. For example, it is easy to recognize what is
meant by the FORTRAN statement DO10I =... by determining
whether a , or ( occurs first after the equal sign.

4. Development of high-level languages requires attention to
both lexical and syntactic properties. Separation of the
two allows us to investigate them independently.

5. Often one has two different hardware representations for the
same language. For example, in some ALGOL implementations,
reserved words are surrounded by quote characters, and
blanks have essentially no meaning -- they are completely
ignored. In other implementations, reserved words cannot be
used as identifiers, and adjacent reserved words and/or
identifiers must be separated by at least one blank. The
hardware representation on paper tape, cards, and on-line
typewriters may be totally different.

Separation allows us to write one syntactic analyzer and
several scanners (which are simpler and easier to write) --
one for each source program representation and/or input
device. Each scanner translates the symbols into the same
internal form used by the syntactic analyzer.

A scanner may be programmed as a separate pass which performs a
complete lexical analysis of the source program and which gives
to the syntax analyzer: a table containing the source program in
an internal symbol form. Alternatively, it can be a subroutine
SCAN called by the syntax analyzer whenever the syntax analyzer
needs a new symbol (Figure 3.1). When called, SCAN recognizes
the next source program symbol and passes it to the syntax
analyzer. This alternative is generally better, because the
whole internal source program need not be constructed and kept
in memory. We will assume that the scanner is to be implemented
in this manner in the rest of this chapter.

This is the environment spawning Ada and its derivative VHDL. Personally I write lexers that can be operated either as a pipeline element (separate pass) or as an on-demand subroutine. It allows independent testing as well as flexibility for things like syntax highlighting.

The present form in VHDL-2008 without an intervening space allows a length specified bit string literal to be treated as a single lexical element, while left and right parentheses are delimiters. Delimiters separate lexical elements in the syntax. Using them in a lexical element would require the ability to look ahead or to backtrack over lexical elements for purposes of disambiguation, something likely incompatible with some number of existing lexical analyzer implementations. Likewise, today's lexical analyzers don't require access to a symbol table to delimit lexemes.

While this all looks like a good argument for a preprocessor there are three other candidate constructs in VHDL, qualified expressions, type conversions and function calls. Qualified expressions are out because the string literal may not be dimensionally identical with the type mark. Type conversions do not allow operation on string literals.

-- DavidKoontz - 2013-01-25

-- Email reflector comment from: PeterFlake (Fri Jan 25 2013 - 03:12:44 PST)

I support the proposal, but I think the comments about compiler construction are out of date. 1971 is before compiler construction tools were available. Some tools have separate descriptions of scanner and parser e.g. flex and bison, and some tools have a single description e.g. antlr. Nevertheless, it is still a good idea to make the language easy to parse, and so I agree with the proposed parentheses.

-- Brent Hayhoe Question

If I understand correctly:

  • a compiler has two separate modes (at least) - lexical analysis and syntactical analysis.
  • The existing VHDL2008 syntax for a bit string literal defines a purely lexical value.
  • Adding the expression part breaks this model making the compile function more complex.

This may explain why when I first encountered this VHDL2008 enhancement, my thought was "good, but not good enough". Do you think the 'integer' syntax for the bit length was intentional, in order to minimize the compiler design impact for 2008 implementations?

-- DavidKoontz - 2013-01-25 RE:Question

In all cases it's prudent to ask why changes were made the way they were.

I liked your (Daniel Kho's) idea, but asked myself why the present restriction was implemented. It would appear it comes down to the amount of work to implement, and when someone working for a tool vendor is in support of it, it either implies support for other agendas or competitive advantage because they can implement faster or with less work. One strong historic principle in the VHDL specification was that it was implementation neutral.

The impact of making this one little change might affect the structure of the majority of existing VHDL analyzer tools, implying changes to support a narrowing of implementation methods as Peter Flake implies. When VHDL87 was released the LRM had something close to 60 percent in common with the Ada-83 specification, albeit with section reordering to present the new stuff first. Just about all the changes to VHDL since have been relatively neutral to that model of commonality.

A case in point might be extended identifiers, which other than including white space characters SPACE and NBSP through inclusion of the full set of graphic characters (in lieu of say first character of a word capitalization) could be handled by extending the input character set. In all likelihood this preserved hashing schemes for indexing symbol table as well as keyword recognition.

The protect tool directive doesn't need to imply a tool directive lexical element, where a protected portion of a design unit or units are substituted for an encrypted text. This allows the use of a multiple pass model, while a preprocessor is just as capable of performing the substitution (a pipeline model). There's a hint of a desire for greater security lest the wily hacker intercept unencrypted and precious 'Eye Pee' (24.1.6 "A decryption tool shall not display or store in any form accessible to the user or other tools any parts of decrypted portions of a VHDL description, decrypted keys, or decrypted digests"). This case actually demonstrates something about the vendor championing the feature's implementation.

Most telling should be that expressions are not lexical elements or contained in lexical elements historically in VHDL (nor Ada). Including an expression in a lexical element shouldn't be done cavalierly. Carrying around baggage of the past or otherwise not providing competitive advantage to particular vendor's implementations should give pause to why new languages (such as SystemVerilog) come about.

There's also an implication that a lexical element other than delimited comments will fit on a single input line. This isn't necessarily true for an expression contained within a lexical element. Comments aren't necessary for syntactical analysis and can be discarded in a lot of implementations. Delimited comments generally don't affect lexer structure while multi line expressions might, requiring the lexer to track input line number for back tracking.

In VHDL-93 there was an implied lexical element evaluation order that went (SEPARATOR), BIT_STRING_LITERAL, IDENTIFIER, ABSTRACT_LITERAL, CHARACTER_LITERAL, COMMENT, DELIMITER, STRING_LITERAL, EXTENDED_IDENTIFIER which alleviated the need for backtracking in a lexer without regard to implementation tools (and some of these can be moved in order it's first character dependent). Introducing integer into bit_string_literal ::= [ integer ] base_specifier " [ bit_value ] ", requires lookahead to determine whether or not an abstract literal is separated from a following identifier (base specifier) and is actually manageable in just about any implementation, it's single character based, and lexers are character oriented.
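The bounded, character-level lookahead described above can be sketched as follows. This is only an illustration in Python: the function name `scan_numeric_start` and the token kind names are hypothetical, and the sketch assumes a plain run of decimal digits (no underscores, based literals, or real literals). After the digits, the scanner peeks at no more than two specifier letters and one quotation mark to decide between an abstract literal and a length-prefixed bit string literal.

```python
# Two-letter specifiers are listed first so that "UX" is not mistaken for "X".
BASE_SPECIFIERS = ("UB", "UO", "UX", "SB", "SO", "SX", "B", "O", "X", "D")


def scan_numeric_start(text: str, pos: int = 0):
    """Scan a lexical element starting at a decimal digit.

    Returns a (kind, lexeme, next_pos) triple.
    """
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1

    # Bounded lookahead: a base specifier immediately followed by a quotation mark.
    for spec in BASE_SPECIFIERS:
        stop = end + len(spec)
        if text[end:stop].upper() == spec and text[stop:stop + 1] == '"':
            close = text.find('"', stop + 1)
            if close == -1:
                raise ValueError("unterminated bit string literal")
            return ("BIT_STRING_LITERAL", text[pos:close + 1], close + 1)

    # No specifier follows, so the digits form an abstract literal on their own.
    return ("ABSTRACT_LITERAL", text[pos:end], end)
```

No symbol table and no backtracking over earlier lexical elements are needed here, which is the property the parenthesized-expression form would lose if it were folded into the lexical element.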

Lookahead for integer expression in bit_string_literal ::= [ integer | ( integer_expression ) ] base_specifier " [ bit_value ] " means parsing integer expressions and disambiguating base_specifier from various identifiers by encountering the following run on string literal. The issue is with complexity of expressions, which are the single most complex thing in VHDL, capable of recursion including parenthesis delimited sub expressions.

In general we don't burden languages with the ability to implement a separate lexer with potentially large amounts of retrace. This breaks the model totally. Mind you, one could try to limit expression complexity here and defer its syntactic meaning to the parser. It's still contraindicated to parse VHDL expressions or subsets in a lexer.

There's a model here where despite the ability of modern versions of YACC to mix character literals (as integers) with integer tokens, lexers operate in the character domain and parsers operate in the symbol domain (including declared identifiers). That's the basic premise of 'lexical elements'. We handle these kinds of conflicts by defining a lexical element (the smallest unit of a language) in terms of its token value (typically integer) and a union between string value (lexeme - the string representing the lexical element) and abstract (numerical) value.

This is how we could defer expression evaluation to the parser (where declared names are recognized), re-parsing the string value. I'd imagine there are some VHDL lexers that defer abstract literal conversion for instance. In this case we'd have bit string literals of two classes, those that can be "directly specified in the description" (literal, glossary) and those that are deferred.

Like I said, I like the idea, just not the idea of modifying the definition of lexical element. My preferred solution would be to implement a new syntax feature:

constrained_bit_string_literal ::= [ (constraint_expression) ] bit_string_literal

Where the literal string value of the bit string literal (possibly previously length specified with a prefix integer) could be abrogated or lengthened, the constraint expression evaluating as an integer and matching the array length of the enclosing left hand side. This implies a single token look ahead on encountering a parenthesis enclosed expression that evaluates to an integer.

This wouldn't limit lexical analyzer implementation.
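A rough sketch of how the syntax analyzer might recognize this form with a single token of lookahead is given below. It is an assumption-laden illustration in Python, not a proposed implementation: tokens are hypothetical `(kind, text)` tuples as a lexer might deliver them, the constraint expression is restricted to integers, declared names, `+` and `-`, and expressions are evaluated while parsing for brevity. The bit string literal token itself is passed through untouched.

```python
def parse_primary(tokens, pos, symbols):
    """Parse one primary from (kind, text) tuples; return (node, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, text = tokens[pos]
    if kind == "BIT_STRING_LITERAL":
        return ("bit_string", None, text), pos + 1      # no constraint given
    if kind == "INTEGER":
        return ("int", int(text)), pos + 1
    if kind == "NAME":
        return ("int", symbols[text]), pos + 1          # symbol lookup happens in the parser
    if (kind, text) == ("DELIM", "("):
        inner, pos = parse_expression(tokens, pos + 1, symbols)
        if pos >= len(tokens) or tokens[pos] != ("DELIM", ")"):
            raise SyntaxError("expected ')'")
        pos += 1
        # Single-token lookahead: a bit string literal right after ")" turns the
        # parenthesized integer expression into a constraint on that literal.
        if (inner[0] == "int" and pos < len(tokens)
                and tokens[pos][0] == "BIT_STRING_LITERAL"):
            return ("bit_string", inner[1], tokens[pos][1]), pos + 1
        return inner, pos
    raise SyntaxError(f"unexpected token: {text!r}")


def parse_expression(tokens, pos, symbols):
    """Parse primary { (+|-) primary }, folding integer values as it goes."""
    node, pos = parse_primary(tokens, pos, symbols)
    while pos < len(tokens) and tokens[pos] in (("DELIM", "+"), ("DELIM", "-")):
        op = tokens[pos][1]
        rhs, pos = parse_primary(tokens, pos + 1, symbols)
        if node[0] != "int" or rhs[0] != "int":
            raise TypeError("only integer operands are supported in this sketch")
        node = ("int", node[1] + rhs[1] if op == "+" else node[1] - rhs[1])
    return node, pos
```

With `symbols = {"vector_width_jc": 6}`, the token sequence for `(vector_width_jc)UX"2F"` yields the node `("bit_string", 6, 'UX"2F"')`, leaving the adjustment of the string value to a later semantic step.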

-- Brent Hayhoe RE:Question

I'm all in favour of making implementation easier, as long as the outcome for the user is the same. It would appear that you are just suggesting a slightly different BNF structure with some 'abrogation' wording in the LRM regarding the integer bit length. Have I understood you correctly David?

And I presume this would still cause problems:

constrained_bit_string_literal ::= [ ( integer_expression ) ] base_specifier " [ bit_value ] "

-- DavidKoontz - 2013-01-30

The way I expressed it was intentional. I'm proposing leaving the lexical element bit_string_literal alone, because the above BNF declaration is simply changing the name from bit_string_literal to constrained_bit_string_literal while adding an integer expression.

The idea is to change nothing in the bit_string_literal lexical element as it stands today, merely associating an expression which has semantic constraints with it optionally in the syntax analyzer.

-- DavidKoontz - 2013-01-30 RE: abrogation language

When I wrote 'abrogates or extends' the intent was to show that the string could be either shortened or lengthened. Because the bit_string_literal value is the lexeme (string corresponding to the lexical element in the design file description) modified in length by the width integer, abrogation isn't the correct way to describe what the constraint expression does. In practical terms it either extends the length (width) further, has no effect or shortens the length of the value returned with the bit_string_literal token. It doesn't treat as nonexistent the width integer. Rather it modifies the string literal equivalent by length further.

15.8 Bit string literals describes the process of producing the equivalent string literal:

A bit string literal has a value that is a string literal. The string literal is formed from the bit value by first obtaining a simplified bit value, consisting of the bit value with underline characters removed, and then obtaining an expanded bit value. Finally, the string literal value is obtained by adjusting the expanded bit value, if required.

Action in the syntactical analyzer modifies the value of the string literal value, but lexical elements are viewed as indivisible, the building blocks we use to describe syntax. The constraint expression would either expand or contract the expanded bit value based on the constraint expression integer result either being greater than or less than the string literal length of the expanded bit value.

There's also the notion here of doing this to the end of the string literal instead of the head. There are three cases for an integer: positive, zero, negative. What do we do when the expression returns 0? (Either don't modify or produce null string). Returns less than zero? (Could we use this to modify the other end of the string, performing left justification instead of right?). Restrict to positive?

It behooves us to be concise.

-- Brent Hayhoe RE:Question

I think we may be talking at cross purposes here. I think that you are talking about the 'bit string literal' syntax as a lexeme and the integer constraint will actually modify the lexeme by extending its length (8 bits > 16 bits), leave it the same (8 bits > 8 bits), or reduce it (16 bits > 8 bits) - the integer part of the lexeme is replaced. Whereas I was considering it from a final resultant bit string length, and that the existing constraint being in the lexeme will be overridden.

I think it comes down to the different ways a compiler designer thinks to that of a user. Point taken.

Your last point should (I think) be dealt with by constraining the value to 'Natural' rather than pure 'Integer'. Whatever way it is resolved, values less than one should always return a null string, and the modifying constraint should always take precedence. The value constrains to an absolute bit length result of zero or positive number of bits.
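One possible reading of this resolution, applied to an already expanded bit value, is sketched below in Python. The function name `constrain` is hypothetical. Several points are assumptions rather than agreed behaviour: a negative constraint is rejected (as a Natural range violation would be), padding is done on the left with '0' or with the leftmost character for signed literals, and truncation keeps the rightmost characters silently, since the thread does not settle whether discarding significant characters should be an error.

```python
def constrain(expanded: str, length: int, signed: bool = False) -> str:
    """Adjust an expanded bit value to exactly `length` characters."""
    if length < 0:
        # Assumption: the constraint is of subtype Natural, so negatives are rejected.
        raise ValueError("constraint must not be negative")
    if length == 0:
        return ""                                   # zero yields the null string
    # Pad on the left: sign-extend signed values, zero-extend everything else.
    pad = expanded[0] if signed and expanded else "0"
    padded = pad * (length - len(expanded)) + expanded
    # Keep the rightmost `length` characters (a no-op when nothing is too long).
    return padded[-length:]
```

For the 8-character expanded value of `UX"2F"`, `constrain("00101111", 6)` gives `"101111"` and `constrain("00101111", 16)` gives `"0000000000101111"`.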

-- DanielKho - 2013-01-29 RE:Question

Having a similar viewpoint as Brent, I'm also in favour of any LRM change that makes it easier for tool vendors to implement. The BNF structure may vary, but the resulting effect from user's perspective is the same. Myself being more of a tool user rather than a tool writer/implementor, I however do agree with David that considerations be made from a tool vendor perspective. Hopefully an agreement can be reached on a solution which is simple enough for both the tool users and tool vendors.

I'm fine with "constrained_bit_string_literal ::= [ (constraint_expression) ] bit_string_literal" or any suitable variation of it.

-- KevinJennings - 2013-04-05

This is simply an example of a type conversion, in this case from a bit string into a vector. The proposed method as shown is error prone since one needs to define constants and subtypes and make sure that the proper width is used to specify the size of the target.

Functionally, all that is being done in the example is to perform a type conversion from bit string to unsigned. The cleanest, least ambiguous way to do this is the following:

vector_uvs <= to_unsigned(UX"2F");

The problem of course is that the above line of code will not compile today because the 'to_unsigned' function requires the length of the target as a parameter. This could be overcome though and the above code would be legitimate if the FunctionKnowsVectorSize proposal is implemented instead.

However, even if that proposal is not implemented, how are the following three substantially different?

1. vector_uvs <= to_unsigned(16#2F#, vector_uvs'length); (Legitimate code in all versions of VHDL)

2. vector_uvs <= (vector_width_jc) UX"2F" (Original proposal)

3. vector_uvs <= (vector_uvs'length) UX"2F" (Improvement on the proposal...the convention of using 'length guarantees that things are the right size)

#1 has the advantage that it is explicitly making clear that the argument is being converted to an unsigned.

#2 has the advantage of less typing, but one has to guess about the type of the RHS.

#3 is no different than #2 but makes it less error prone since it removes the dependency on an independent object that really shouldn't be independent (i.e. 'vector_width_jc')

4/15/2013: Removed objection based on type checking. No objections, but still think that FunctionKnowsVectorSize is simpler and less error prone.

-- DavidKoontz - 2013-04-07 Re: RE: Question #2, #3

#2 and #3 both appear to be valid constrained bit string literals. You could note that 9.3.2 (Operands, Literals) tells us:

String and bit string literals are representations of one-dimensional arrays of characters. The type of a string or bit string literal shall be determinable solely from the context in which the literal appears, excluding the literal itself but using the fact that the type of the literal shall be a one-dimensional array of a character type. The lexical structure of string and bit string literals is defined in Clause 15.

In other words, the type of string literals and the value of bit string literals are determinable from the environment. Any place a string literal's type is determined from its environment, so can a bit string literal's.

The semantic constraint mechanism is intended only to edit the length of the string value of a bit string literal. There is no intention to abandon type checking.

The interpretation of a constrained bit string literal can cross the lexer/parser boundary, because lexical analysis has not historically been used to evaluate expressions, which can require symbol lookup. Therefore either an attribute must be passed with the lexical element bit string literal to the parser to indicate Signed (S) versus Unsigned (U), or rendering of a bit string literal into its equivalent string literal value must be deferred until after semantic analysis has determined whether or not the bit string literal is preceded by a constraint expression.

Also by extension the same mechanism that could be used to determine length from environment you propose in FunctionKnowsVectorSize should it prove workable could be used in place of a constrained bit string literal, also only requiring bit literal string length specification in assembling strings or array values akin to use of a length specifier in a printf statement in C. There may be places where the length is required to be determined from bit string literal directly such as in aggregates or length deferred arguments to subprograms.

-- Brent Hayhoe -2013-04-19 Re: RE: Question #1

I think #1 is incorrect syntax and should be:

1. vector_uvs <= to_unsigned(16#2F#, vector_uvs'length); -- OOPS! Corrected -- KevinJennings - 2013-04-20

However, I hadn't noticed before, but there is an overloaded function in 'Numeric_Std' that allows:

1a. vector_uvs <= to_unsigned(16#2F#, vector_uvs);

which does the same thing, just saves a bit of typing.
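For comparison with the bit string forms, the effect of the `to_unsigned` calls in #1 and 1a can be modelled roughly as below. This is a Python stand-in, not the Numeric_Std source: the function returns the bits as a string, and keeping only the low-order bits when the value does not fit is an assumption about how the package behaves in that case.

```python
def to_unsigned(value: int, size: int) -> str:
    """Model of converting a natural number to a `size`-bit unsigned vector."""
    if value < 0 or size < 0:
        raise ValueError("value and size must not be negative")
    if size == 0:
        return ""                                   # null vector
    # Assumption: a value that does not fit is reduced to its low-order bits.
    return format(value % (2 ** size), f"0{size}b")
```

With a 6-bit target, `to_unsigned(0x2F, 6)` gives `"101111"`, the same six bits that the `(vector_width_jc)UX"2F"` form in the use model is meant to produce.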

-- KevinJennings - 2013-04-20

For the originator and supporters, my question remains though: How are the three (now four counting the '1a' example) substantially different? My take is that they are all substantially the same. They are simply different ways of doing something all with the same basic limitations, none is an improvement over current LRM. They all suffer from the same drawback that in order to avoid a latent design defect, one needs to manually make sure that the left hand side is used as a parameter on the right hand side.

Allowing the use of bit string literals rather than numeric literals isn't that important in my opinion. I would also consider it somewhat of a detriment in that the bit string literal clearly is meant to be interpreted as a number so why not simply use a number in the first place which allows use of currently standardized functions? The text one would type to represent that number is practically identical and the 'numeric part' is identical. What is the advantage?

Follow ups to that would be:

- How is the proposal an improvement over what currently exists? My take is that it is not an improvement, just different. While 'just different' can be OK, then it would come down to a value judgment on whether the perceived benefit is worth the collective cost.

- What is the advantage of the use of a bit string literal rather than a numeric literal such as 47 or 16#2F#? My take is that there is at best a two-keystroke difference between typing "2F" and 16#2F#.

-- Brent Hayhoe -2013-04-23 Bit String Literals

So as defined in the 2008 LRM section 15.8:

The 'string' part of a bit string literal can be any graphical character with the underscore character having a special interpretation.

They are used to assign values to objects of bit-string vector types, e.g. BIT_VECTOR, STD_ULOGIC_VECTOR.

They interpret numerical octal, hexadecimal and decimal string values and convert them to bit values (signed or unsigned).

The underscore character can be used to delimit blocks of characters for readability improvement. It is stripped out before the bit string literal is interpreted.

However, they are not just used for assigning numerical values. For example, in the case of STD_ULOGIC_VECTOR types, they can set bits as 'X', 'W', 'Z' etc.

Up until VHDL-2008 the assigned vector had to be a multiple of 4 bits for hex assignments and 3 bits for octal assignments.

So, at least two advantages: delimiting using underscores for readability, and the assigning of simulation bit meta-values other than '1' or '0'.

-- TristanGingold - 2016-05-15

For the syntax: I think we should not allow both a constraint and a size, i.e. things like (length)12b"01"

Either enforce it with the grammar or with a rule. With the grammar, the new rules could be:

  unsized_bit_string_literal ::= base_specifier " [ bit_value ] " (Like VHDL-2002, maybe get a better word instead of unsized)
  bit_string_literal ::= [ integer ] unsized_bit_string_literal
  constrained_bit_string_literal ::= ( expression ) unsized_bit_string_literal

For the static semantic: The type of the length expression should be any integer type (like the parameter of 'Pos). I see no reason to require Integer.

I also see no reason to require the expression to be constant. Either require it to be statically constant but that would be too strict and useless, or remove any requirement about static.

Also this new constrained bit string literal would be as static as the expression.

Finally, it is an error if the expression value is less than 0.

Based on this slightly more formal description of this feature, I see no difficulties to implement it. This is more or less an implicit call to RESIZE, so I wonder if that feature is really interesting: it simply makes the notation slightly shorter and thus less readable.

Why not simply resize unsized bit string literals automatically when they are on the RHS of an assignment?


Add your signature here to indicate your support for the proposal

-- Brent Hayhoe -2013-01-24

-- PeterFlake - 2013-01-25 (as per reflector Email)

-- DanielKho - 2013-01-29

Topic revision: r20 - 2016-05-15 - 06:32:35 - TristanGingold