RE: [vhdl-200x-ft] Re: [vhdl-200x] IP Protection and Encryption Donation

From: Deepak Pant <dpant@cadence.com>
Date: Mon May 31 2004 - 11:34:14 PDT

Here is some more information from various sources on sixel, uuencode, and

base64 encoding. BASE64 seems to be the better option. On the other hand, it may

make sense to add an encoding_method pragma keyword.

 

Comparison:

==========

 

Range of printable characters

-----------------------------

UUENCODE: 0x20-0x5F

BASE64: 0x2B,0x2F,0x3D,0x30-0x39,0x41-0x5A,0x61-0x7A

SIXEL: 0x3F-0x7E

 

Problem in translating from ASCII to EBCDIC

-------------------------------------------

UUENCODE: YES

BASE64: NO <----------------- NOTE

SIXEL: YES

 

Max length of line in output

----------------------------

UUENCODE: 60

BASE64: 76

SIXEL: ??

 

 

 

BASE64 uses a character set that is represented identically in all versions

of ISO 646, including US ASCII, and all characters in the subset are also

represented identically in all versions of EBCDIC. This is not true for

uuencode and SIXEL, as not all ASCII symbols have a corresponding EBCDIC symbol.
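
The portability claim can be spot-checked with a short Python sketch. It assumes the three alphabets are exactly the ranges listed in the comparison above, and it compares only two EBCDIC code pages that ship with Python (cp037 and cp500), so it illustrates the problem rather than proving the general statement. The helper name `unstable` is made up for this example.

```python
import string

# Alphabets as listed in the comparison above (an assumption of this sketch).
BASE64 = string.ascii_letters + string.digits + "+/="
UUENCODE = "".join(chr(c) for c in range(0x20, 0x60))  # 0x20-0x5F
SIXEL = "".join(chr(c) for c in range(0x3F, 0x7F))     # 0x3F-0x7E


def unstable(alphabet, codecs=("cp037", "cp500")):
    """Return the characters whose EBCDIC byte differs between the code pages."""
    return sorted(
        ch for ch in alphabet
        if len({ch.encode(codec) for codec in codecs}) > 1
    )


print("base64:  ", unstable(BASE64))
print("uuencode:", unstable(UUENCODE))
print("sixel:   ", unstable(SIXEL))
```

An empty list means every character of that alphabet has the same byte value in both code pages; any character listed would be at risk on an ASCII <-> EBCDIC gateway that assumes the wrong code page.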

 

 

Mimencode (BASE64) is intended to be a replacement for uuencode for mail and

news use. The reason is simple: uuencode doesn't work very well in a number of

circumstances and ways. In particular, uuencode uses characters that

don't translate well across all mail gateways (particularly ASCII <-> EBCDIC

gateways). Also, uuencode is not standard -- there are several variants

floating around, encoding and decoding things in different and incompatible

ways, with no "standard" on which to base an implementation.

 

 

 

Encoding techniques: UUENCODE / BASE64 / SIXEL

==============================================

 

UUENCODE

--------

Uuencode started out as a Unix program for encoding binary data as ASCII. It was

originally designed for UNIX systems, and the name stands for "UNIX-to-UNIX

encode". Nowadays versions of the uuencode and uudecode programs exist for

almost all operating systems. Uuencode is used for sending binary files by

e-mail, posting to Usenet, etc.

 

Uuencoded data starts with a line of the form:

begin <mode> <decode_pathname>

 

<mode> is the permission mode of the source file

<decode_pathname> is the name to be used when recreating the binary data

 

Uuencode repeatedly takes in a group of three bytes, adding trailing zeros if

there are fewer than three bytes left. These 24 bits are split into four groups

of six which are treated as numbers between 0 and 63. Decimal 32 is added to

each number and they are output as ASCII characters which will lie in the range

32 (space) to 32+63 = 95 (underscore). ASCII characters greater than 95 may also

be used; however, only the six right-most bits are relevant.
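
The three-bytes-to-four-characters step can be written out as a minimal Python sketch. The function names `uu_group` and `uu_value` are hypothetical, and the decoding rule (subtract 32, then keep the low six bits) is the convention used by common uudecode implementations rather than something stated in this message.

```python
def uu_group(chunk: bytes) -> str:
    """Encode up to three bytes as four uuencode characters."""
    if not 1 <= len(chunk) <= 3:
        raise ValueError("expected one to three bytes")
    padded = chunk.ljust(3, b"\x00")      # trailing zeros for a short last group
    word = int.from_bytes(padded, "big")  # the 24-bit group
    sixes = [(word >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(chr(32 + value) for value in sixes)


def uu_value(char: str) -> int:
    """Decode one character; only six bits survive, so codes above 95 wrap."""
    return (ord(char) - 32) & 0x3F


print(uu_group(bytes([155, 162, 233])))
print([uu_value(c) for c in "FZ+I"])
```

With this decoding rule a backquote (ASCII 96) yields the same 6-bit value as a space, which is how characters greater than 95 can stand in for the usual ones.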

 

Each group of sixty output characters (corresponding to 45 input bytes) is

output as a separate line preceded by an 'M' (ASCII code 77 = 32+45). At the end

of the input, if there are N input bytes left after the last group of 45 and

N>0, then their encoding is preceded by the character whose code is 32+N.

Finally, a line containing just a single space is output, followed by one

containing just "end".

 

Despite using this limited range of characters, there are still some problems

encountered when uuencoded data passes through certain old computers. The worst

offenders are computers using non-ASCII character sets such as EBCDIC.

 

Example,

 

Let us assume we have 3 bytes of input: 155, 162 and 233.

The corresponding bit stream is 100110111010001011101001, which in turn

corresponds to the 6-bit values: 100110(38), 111010(58), 001011(11), 101001(41).

 

Now we add decimal 32 to these 6-bit chunks:

1000110(38+32=70), 1011010(58+32=90), 0101011(11+32=43), 1001001(41+32=73)

 

We look up these numbers in the ASCII table and get:

70 = F

90 = Z

43 = +

73 = I

 

The line encodes 3 input bytes, and 3 plus 32 makes 35, which translates to '#'

(just as 'M' = 32+45 marks a full line of 45 input bytes). So the full line of

uuencoded data is #FZ+I. Overall, the uuencoded data will look like

 

begin 664 /tmp/tempfile

#FZ+I

 

end

 

It should be noted that uuencode utilities on Unix differ in some details from

the description above; many, for example, output a backquote (ASCII 96) in place

of a space for the 6-bit value 0. For more details, please refer to the man page

of uuencode.
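
For comparison, here is a sketch of the complete framing built on Python's standard binascii module. The wrapper `uuencode_text` and its default mode are assumptions of this example; `binascii.b2a_uu` encodes at most 45 bytes per call and sets the leading length character from the number of input bytes in the line.

```python
import binascii


def uuencode_text(data: bytes, name: str, mode: int = 0o664) -> str:
    """Return data as uuencoded text with begin/end framing (illustrative)."""
    lines = [f"begin {mode:o} {name}"]
    for start in range(0, len(data), 45):          # 45 input bytes per line
        line = binascii.b2a_uu(data[start:start + 45])
        lines.append(line.decode("ascii").rstrip("\n"))
    lines.append(" ")                              # zero-length line: 32 + 0
    lines.append("end")
    return "\n".join(lines) + "\n"


print(uuencode_text(bytes([155, 162, 233]), "/tmp/tempfile"))
print(binascii.a2b_uu(b"#FZ+I\n"))                 # decodes back to the 3 bytes
```

A full line produced this way is 61 characters long: the 'M' length character followed by 60 encoded characters.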

 

-------------------------------------------------------------------------------

 

BASE64

------

base64 is a data encoding scheme whereby binary-encoded data is converted to

printable ASCII characters. It is defined as a MIME content transfer encoding

for use in Internet e-mail. The only characters used are the upper- and

lower-case Roman alphabet characters (A-Z, a-z), the numerals (0-9), and the "+"

and "/" symbols, with the "=" symbol as a special suffix code.

 

Full specifications for base64 are contained in RFC 1421 and RFC 2045.

 

A 65-character subset of US-ASCII is used, enabling 6 bits to be represented

per printable character. (The extra 65th character, "=", is used to signify a

special processing function.)

 

      NOTE: This subset has the important property that it is

      represented identically in all versions of ISO 646, including US

      ASCII, and all characters in the subset are also represented

      identically in all versions of EBCDIC. Other popular encodings,

      such as the encoding used by the uuencode utility and the base85

      encoding specified as part of Level 2 PostScript, do not share

      these properties, and thus do not fulfill the portability

      requirements a binary transport encoding for mail must meet.

 

 

Base64 encoding takes three bytes, each consisting of eight bits, and represents

them as four printable characters in the ASCII standard. It does that in

essentially two steps.

 

The first step is to convert three bytes to four numbers of six bits.

Each character in the ASCII standard consists of seven bits. Base64 only uses 6

bits (corresponding to 2^6 = 64 characters) to ensure encoded data is printable

and human-readable. None of the special characters available in ASCII are used.

The 64 characters (hence the name Base64) are 10 digits, 26 lowercase

characters, 26 uppercase characters as well as '+' and '/'.

 

Example,

 

Let us assume we have 3 bytes of input: 155, 162 and 233.

The corresponding bit stream is 100110111010001011101001, which in turn

corresponds to the 6-bit values: 100110(38), 111010(58), 001011(11), 101001(41).

 

These numbers are converted to ASCII characters in the second step using the

Base64 encoding table.

 

Representing input in binary,

     155 -> 10011011

     162 -> 10100010

     233 -> 11101001

 

Corresponding 6-bit values,

     100110 -> 38

     111010 -> 58

     001011 -> 11

     101001 -> 41

 

ASCII characters found using BASE64 lookup table (at the end of Base64 desc.),

     38 -> m

     58 -> 6

     11 -> L

     41 -> p
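
The two steps of the example can be reproduced with a minimal Python sketch. The names `ALPHABET` and `b64_group` are made up for this illustration; the alphabet string is built in the order of the Base64 table (A-Z, a-z, 0-9, '+', '/'), so the index of each character is its 6-bit value.

```python
import string

# Index in this string is the 6-bit value, matching the Base64 alphabet table.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def b64_group(three: bytes) -> str:
    """Encode exactly three bytes as four Base64 characters."""
    if len(three) != 3:
        raise ValueError("expected exactly three bytes")
    word = int.from_bytes(three, "big")                           # step 1: 24 bits
    sixes = [(word >> shift) & 0x3F for shift in (18, 12, 6, 0)]  # four 6-bit values
    return "".join(ALPHABET[value] for value in sixes)            # step 2: lookup


print(b64_group(bytes([155, 162, 233])))
```

Handling of a final group shorter than three bytes is deliberately left out of this sketch.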

 

This two-step process is applied to the whole sequence of bytes that are

encoded. To ensure the encoded data can be properly printed and does not exceed

any mail server's line length limit, newline characters are inserted to keep

line lengths to at most 76 characters. Newline characters in the input are

encoded like all other data.

 

Special processing is performed if fewer than 24 bits are available at the end

of the data being encoded. A full encoding quantum is always completed at the

end of a body. When fewer than 24 input bits are available in an input group,

zero bits are added (on the right) to form an integral number of 6-bit groups.

Padding at the end of the data is performed using the '=' character.
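
Padding and line wrapping can be observed with Python's standard base64 module. This is only a sketch using the same three example bytes; `base64.encodebytes` is the standard-library call that inserts a newline after every 76 output characters.

```python
import base64

data = bytes([155, 162, 233])

full = base64.b64encode(data)          # 24 bits available: no padding
two = base64.b64encode(data[:2])       # 16 bits: padded with one '='
one = base64.b64encode(data[:1])       # 8 bits: padded with '=='
newline = base64.b64encode(b"\n")      # a newline in the input is just data

# 100 input bytes need more than one output line.
wrapped = base64.encodebytes(bytes(100)).decode("ascii")
longest = max(len(line) for line in wrapped.splitlines())

print(full, two, one, newline, longest)
```

In each short case the output is still a complete group of four characters, which is what "a full encoding quantum is always completed" amounts to in practice.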

 

 

                            Table 1: The Base64 Alphabet

 

      Value Encoding  Value Encoding  Value Encoding  Value Encoding
          0 A            17 R            34 i            51 z
          1 B            18 S            35 j            52 0
          2 C            19 T            36 k            53 1
          3 D            20 U            37 l            54 2
          4 E            21 V            38 m            55 3
          5 F            22 W            39 n            56 4
          6 G            23 X            40 o            57 5
          7 H            24 Y            41 p            58 6
          8 I            25 Z            42 q            59 7
          9 J            26 a            43 r            60 8
         10 K            27 b            44 s            61 9
         11 L            28 c            45 t            62 +
         12 M            29 d            46 u            63 /
         13 N            30 e            47 v
         14 O            31 f            48 w         (pad) =
         15 P            32 g            49 x
         16 Q            33 h            50 y

 

-------------------------------------------------------------------------------

 

SIXEL

-----

SIXEL is used by DEC's VT220 terminals and for sending bitmap data to printers.

It represents the bitmap of a character in a soft font using 6-bit values

(called "sixels"). Since there are only 64 six-bit numbers, they can be readily

mapped to printable characters. SIXEL uses hex 3F to 7E, roughly the upper half

of the 7-bit ASCII range. To convert a sixel to printable ASCII, add 0x3F to the

6-bit number (as we added decimal 32 in uuencode).
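
The offset rule is a one-liner in Python. This sketch only applies the add-0x3F mapping to the same four 6-bit values used in the earlier examples, for comparison with uuencode and base64; it is not a sixel graphics encoder, and the name `sixel_char` is hypothetical.

```python
def sixel_char(value: int) -> str:
    """Map a 6-bit value (0-63) to its printable character by adding 0x3F."""
    if not 0 <= value <= 63:
        raise ValueError("a sixel holds exactly six bits")
    return chr(0x3F + value)


values = [38, 58, 11, 41]          # the 6-bit values from the examples above
print("".join(sixel_char(v) for v in values))
print(sixel_char(0), sixel_char(63))
```

The lowest and highest values land on '?' (0x3F) and '~' (0x7E), matching the range given in the comparison at the top.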

 

-------------------------------------------------------------------------------
Received on Mon May 31 11:34:19 2004
