Here is some more information from various sources on sixel / uuencode / base64
encoding. BASE64 seems to be the better option. On the other hand, it may make
sense to add an encoding_method pragma keyword.
Comparison:
==========
Range of printable characters
-----------------------------
UUENCODE: 0x20-0x5F
BASE64: 0x2B,0x2F,0x3D,0x30-0x39,0x41-0x5A,0x61-0x7A
SIXEL: 0x3F-0x7E
Problem in translating from ASCII to EBCDIC
-------------------------------------------
UUENCODE: YES
BASE64: NO <----------------- NOTE
SIXEL: YES
Max length of line in output
----------------------------
UUENCODE: 60
BASE64: 76
SIXEL: ??
BASE64 uses a character set that is represented identically in all versions
of ISO 646, including US ASCII, and all characters in the subset are also
represented identically in all versions of EBCDIC. This is not true for
Uuencode and SIXEL, as not all of the ASCII symbols they use have a
corresponding symbol in every version of EBCDIC.
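The portability claim can be checked roughly with a short Python sketch. It
assumes that Python's cp037 and cp500 codecs are acceptable stand-ins for two
EBCDIC variants (the text above names no specific code pages), and the helper
name unstable_chars is made up for illustration.

```python
import string

# Character sets taken from the comparison above.
UUENCODE_CHARS = [chr(c) for c in range(0x20, 0x60)]   # 0x20-0x5F
SIXEL_CHARS = [chr(c) for c in range(0x3F, 0x7F)]      # 0x3F-0x7E
BASE64_CHARS = list(string.ascii_uppercase + string.ascii_lowercase
                    + string.digits + "+/=")


def unstable_chars(chars, codec_a="cp037", codec_b="cp500"):
    """Return the characters whose byte value differs between two codecs.

    cp037 and cp500 are only two sample EBCDIC code pages (an assumption);
    other variants may disagree on other characters.
    """
    return [ch for ch in chars if ch.encode(codec_a) != ch.encode(codec_b)]


if __name__ == "__main__":
    for name, chars in [("UUENCODE", UUENCODE_CHARS),
                        ("BASE64", BASE64_CHARS),
                        ("SIXEL", SIXEL_CHARS)]:
        print(name, unstable_chars(chars))
```

With these two code pages the BASE64 list comes out empty, while the UUENCODE
and SIXEL lists contain punctuation such as the square brackets.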
Mimencode (BASE64) is intended to be a replacement for uuencode for mail and
news use. The reason is simple: uuencode doesn't work well in a number of
circumstances. In particular, uuencode uses characters that
don't translate well across all mail gateways (particularly ASCII <-> EBCDIC
gateways). Also, uuencode is not standard -- there are several variants
floating around, encoding and decoding things in different and incompatible
ways, with no "standard" on which to base an implementation.
Encoding techniques: UUENCODE / BASE64 / SIXEL
==============================================
UUENCODE
--------
Uuencode started out as a Unix program for encoding binary data as ASCII. It was
originally designed for UNIX systems, and the name stands for "UNIX-to-UNIX
encode". Nowadays versions of the Uuencode and Uudecode programs exist for
almost all operating systems. Uuencode is used for sending binary files by
e-mail, posting to Usenet, etc.
Uuencoded data starts with a line of the form:
begin <mode> <decode_pathname>
<mode> is the permission mode of the source file, in octal
<decode_pathname> is the name to be used when recreating the binary data
Uuencode repeatedly takes in a group of three bytes, adding trailing zeros if
there are fewer than three bytes left. These 24 bits are split into four groups
of six which are treated as numbers between 0 and 63. Decimal 32 is added to
each number and they are output as ASCII characters which will lie in the range
32 (space) to 32+63 = 95 (underscore). ASCII characters greater than 95 may also
be used; however, only the six right-most bits are relevant.
Each group of sixty output characters (corresponding to 45 input bytes) is
output as a separate line preceded by an 'M' (ASCII code 77 = 32+45). At the end
of the input, if there are N input bytes left after the last group of 45 and
N>0, then their encoded characters will be preceded by the character whose code
is 32+N. In other words, the leading character always encodes the number of
input bytes on the line, not the number of output characters. Finally, a line
containing just a single space (a backquote in some variants) is output,
followed by one containing just "end".
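The steps above can be sketched in Python as follows. This is a minimal
illustration, not the code of any particular uuencode program; the function
names are made up, and the "begin" and "end" lines are left out.

```python
def uu_encode_line(chunk: bytes) -> str:
    """Encode 1 to 45 input bytes as one uuencoded line (without newline)."""
    if not 0 < len(chunk) <= 45:
        raise ValueError("a uuencoded data line carries 1 to 45 input bytes")
    # Leading character: 32 + number of input bytes on this line.
    out = [chr(32 + len(chunk))]
    # Add trailing zero bytes so the length is a multiple of three.
    padded = chunk + b"\x00" * (-len(chunk) % 3)
    for i in range(0, len(padded), 3):
        # Join three bytes into one 24-bit number.
        group = (padded[i] << 16) | (padded[i + 1] << 8) | padded[i + 2]
        # Split into four 6-bit values, most significant first, and add 32.
        for shift in (18, 12, 6, 0):
            out.append(chr(32 + ((group >> shift) & 0x3F)))
    return "".join(out)


def uu_encode(data: bytes) -> list:
    """Split data into 45-byte chunks and encode each chunk as one line."""
    return [uu_encode_line(data[i:i + 45]) for i in range(0, len(data), 45)]


if __name__ == "__main__":
    print(uu_encode_line(b"Cat"))  # prints #0V%T
```

A full 45-byte chunk produces a line that starts with 'M' and is 61 characters
long: the length character plus sixty data characters.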
Despite using this limited range of characters, there are still some problems
encountered when uuencoded data passes through certain old computers. The worst
offenders are computers using non-ASCII character sets such as EBCDIC.
Example,
Let us assume we have 3 bytes of input: 155, 162 and 233.
The corresponding bit stream is 100110111010001011101001, which in turn
corresponds to the 6-bit values: 100110(38), 111010(58), 001011(11), 101001(41).
Now we add decimal 32 to these 6-bit chunks:
1000110(38+32=70), 1011010(58+32=90), 101011(11+32=43), 1001001(41+32=73)
We look up these numbers in the ASCII table and get:
70 = F
90 = Z
43 = +
73 = I
FZ+I encodes 3 input bytes; 3 plus 32 makes 35, which translates to '#'. So the
full line of uuencoded data is #FZ+I. Overall, uuencoded data will look like
begin 664 /tmp/tempfile
#FZ+I
end
It should be noted that the uuencode utility on Unix has a different method for
converting 3 bytes of input data into four 6-bit chunks. For more details,
please refer to the man page of uuencode.
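Decoding reverses these steps. The sketch below is a minimal line decoder under
the same assumptions as the description above; the function name is made up,
and Python's binascii.b2a_uu is used only to produce a reference line for the
three example bytes.

```python
import binascii


def uu_decode_line(line: str) -> bytes:
    """Decode one uuencoded data line back to the original bytes."""
    # The first character gives the number of decoded bytes (code minus 32).
    # Masking with 0x3F after the subtraction keeps only six bits, so
    # characters above 95 (such as a backquote, 96) are accepted as well.
    length = (ord(line[0]) - 32) & 0x3F
    values = [(ord(ch) - 32) & 0x3F for ch in line[1:]]
    out = bytearray()
    for i in range(0, len(values) // 4 * 4, 4):
        a, b, c, d = values[i:i + 4]
        # Rebuild the 24-bit group from four 6-bit values.
        group = (a << 18) | (b << 12) | (c << 6) | d
        out += group.to_bytes(3, "big")
    # Drop the zero bytes that were added to fill the last group.
    return bytes(out[:length])


if __name__ == "__main__":
    data = bytes([155, 162, 233])
    line = binascii.b2a_uu(data).decode("ascii").rstrip("\n")
    print(repr(line), list(uu_decode_line(line)))
```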
-------------------------------------------------------------------------------
BASE64
------
base64 is a data encoding scheme whereby binary-encoded data is converted to
printable ASCII characters. It is defined as a MIME content transfer encoding
for use in Internet e-mail. The only characters used are the upper- and
lower-case Roman alphabet characters (A-Z, a-z), the numerals (0-9), and the "+"
and "/" symbols, with the "=" symbol as a special suffix code.
Full specifications for base64 are contained in RFC 1421 and RFC 2045.
A 65-character subset of US-ASCII is used, enabling 6 bits to be represented
per printable character. (The extra 65th character, "=", is used to signify a
special processing function.)
NOTE: This subset has the important property that it is
represented identically in all versions of ISO 646, including US
ASCII, and all characters in the subset are also represented
identically in all versions of EBCDIC. Other popular encodings,
such as the encoding used by the uuencode utility and the base85
encoding specified as part of Level 2 PostScript, do not share
these properties, and thus do not fulfill the portability
requirements a binary transport encoding for mail must meet.
Base64 encoding takes three bytes, each consisting of eight bits, and represents
them as four printable characters in the ASCII standard. It does that in
essentially two steps.
The first step is to convert three bytes to four numbers of six bits.
Each character in the ASCII standard consists of seven bits. Base64 only uses 6
bits (corresponding to 2^6 = 64 characters) to ensure encoded data is printable.
None of the special characters available in ASCII are used. The 64 characters
(hence the name Base64) are 10 digits, 26 lowercase characters, 26 uppercase
characters as well as '+' and '/'.
Example,
Let us assume we have 3 bytes of input: 155, 162 and 233.
The corresponding bit stream is 100110111010001011101001, which in turn
corresponds to the 6-bit values: 100110(38), 111010(58), 001011(11), 101001(41).
These numbers are converted to ASCII characters in the second step using the
Base64 encoding table.
Representing input in binary,
155 -> 10011011
162 -> 10100010
233 -> 11101001
Corresponding 6-bit values,
100110 -> 38
111010 -> 58
001011 -> 11
101001 -> 41
ASCII characters found using BASE64 lookup table (at the end of Base64 desc.),
38 -> m
58 -> 6
11 -> L
41 -> p
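The same two steps can be written as a short Python sketch. The names
B64_ALPHABET and b64_encode_triple are made up for illustration; the standard
library's base64 module is used only as a cross-check.

```python
import base64
import string

# The 64-character alphabet: index = 6-bit value, entry = output character.
B64_ALPHABET = (string.ascii_uppercase + string.ascii_lowercase
                + string.digits + "+/")


def b64_encode_triple(b0: int, b1: int, b2: int) -> str:
    """Encode exactly three bytes as four Base64 characters."""
    # Step 1: three 8-bit bytes -> four 6-bit numbers.
    group = (b0 << 16) | (b1 << 8) | b2
    sextets = [(group >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # Step 2: look each number up in the alphabet.
    return "".join(B64_ALPHABET[s] for s in sextets)


if __name__ == "__main__":
    print(b64_encode_triple(155, 162, 233))  # prints m6Lp
    # Cross-check against the standard library.
    print(base64.b64encode(bytes([155, 162, 233])).decode("ascii"))
```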
This two-step process is applied to the whole sequence of bytes that are
encoded. To ensure the encoded data can be properly printed and does not exceed
any mail server's line length limit, line breaks are inserted into the output
so that no encoded line is longer than 76 characters. These inserted line breaks
are not part of the data and are ignored when decoding; newline characters that
occur in the input are encoded like all other data.
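A small Python sketch of the line-length rule, using the standard base64 module
for the encoding itself. The helper name b64_wrapped and the constant MAX_LINE
are made up for illustration.

```python
import base64

MAX_LINE = 76  # maximum encoded line length, as stated above


def b64_wrapped(data: bytes, width: int = MAX_LINE) -> str:
    """Base64-encode data and break the output into lines of at most width."""
    encoded = base64.b64encode(data).decode("ascii")
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    return "\n".join(lines)


if __name__ == "__main__":
    data = bytes(range(256))
    text = b64_wrapped(data)
    print(max(len(line) for line in text.splitlines()))
    # In its default (non-validating) mode the decoder skips the line breaks.
    print(base64.b64decode(text) == data)
    # A newline byte in the input is encoded like any other byte.
    print(base64.b64encode(b"\n").decode("ascii"))
```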
Special processing is performed if fewer than 24 bits are available at the end
of the data being encoded. A full encoding quantum is always completed at the
end of a body. When fewer than 24 input bits are available in an input group,
zero bits are added (on the right) to form an integral number of 6-bit groups.
Padding at the end of the data is performed using the '=' character: a final
group of 8 bits yields two characters followed by "==", and a final group of
16 bits yields three characters followed by "=".
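The padding rule can be illustrated with a small hand-written encoder. This is
a sketch for explanation only (the names are made up, and line wrapping is left
out); in practice the standard library's base64.b64encode does the same job.

```python
import string

B64_ALPHABET = (string.ascii_uppercase + string.ascii_lowercase
                + string.digits + "+/")


def b64_encode(data: bytes) -> str:
    """Base64-encode data, padding the final group with '=' as needed."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1 or 2 bytes short of a full group
        # Add zero bits on the right to fill the 24-bit group.
        group = int.from_bytes(chunk + b"\x00" * missing, "big")
        chars = [B64_ALPHABET[(group >> shift) & 0x3F]
                 for shift in (18, 12, 6, 0)]
        # Characters that would hold only padding bits are replaced by '='.
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)


if __name__ == "__main__":
    for sample in (b"M", b"Ma", b"Man"):
        print(sample, b64_encode(sample))
```

For the inputs "M", "Ma" and "Man" this gives TQ==, TWE= and TWFu.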
Table 1: The Base64 Alphabet
Value Encoding  Value Encoding  Value Encoding  Value Encoding
    0 A            17 R            34 i            51 z
    1 B            18 S            35 j            52 0
    2 C            19 T            36 k            53 1
    3 D            20 U            37 l            54 2
    4 E            21 V            38 m            55 3
    5 F            22 W            39 n            56 4
    6 G            23 X            40 o            57 5
    7 H            24 Y            41 p            58 6
    8 I            25 Z            42 q            59 7
    9 J            26 a            43 r            60 8
   10 K            27 b            44 s            61 9
   11 L            28 c            45 t            62 +
   12 M            29 d            46 u            63 /
   13 N            30 e            47 v
   14 O            31 f            48 w         (pad) =
   15 P            32 g            49 x
   16 Q            33 h            50 y
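Read in the other direction, the table gives a decoder. The following Python
sketch builds the reverse lookup and decodes text in groups of four characters;
the names are made up for illustration, and error handling is kept minimal.

```python
import string

B64_ALPHABET = (string.ascii_uppercase + string.ascii_lowercase
                + string.digits + "+/")
# Reverse lookup: character -> 6-bit value.
B64_VALUES = {ch: value for value, ch in enumerate(B64_ALPHABET)}


def b64_decode(text: str) -> bytes:
    """Decode Base64 text, skipping line breaks and honoring '=' padding."""
    # Keep only alphabet characters and '='; this drops inserted line breaks.
    chars = [ch for ch in text if ch in B64_VALUES or ch == "="]
    if len(chars) % 4:
        raise ValueError("Base64 input must come in groups of four characters")
    out = bytearray()
    for i in range(0, len(chars), 4):
        quad = chars[i:i + 4]
        pad = quad.count("=")
        group = 0
        for ch in quad:
            # '=' contributes zero bits; they are cut off again below.
            group = (group << 6) | B64_VALUES.get(ch, 0)
        out += group.to_bytes(3, "big")[:3 - pad]
    return bytes(out)


if __name__ == "__main__":
    print(list(b64_decode("m6Lp")))  # prints [155, 162, 233]
```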
-------------------------------------------------------------------------------
SIXEL
-----
SIXEL is used by DEC's VT220 terminals and to send bitmap data to printers.
SIXEL is used to represent the bitmap of a character in a soft font. It simply
uses 6-bit values (called "sixels"). Since there are only 64 six-bit numbers,
they can be readily mapped to printable characters. It uses the upper part (hex
3F to 7E) of the lower ASCII range. To convert a "sixel" to printable ASCII, add
0x3F to the 6-bit number (as we added decimal 32 in Uuencode).
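The mapping itself is a one-line offset, sketched here in Python. The function
names are made up, and only the value-to-character step described above is
covered, not the rest of the terminal or printer protocol.

```python
SIXEL_OFFSET = 0x3F  # added to each 6-bit value, as described above


def to_sixel(values) -> str:
    """Map 6-bit values (0-63) to printable characters 0x3F-0x7E."""
    for v in values:
        if not 0 <= v <= 63:
            raise ValueError("sixel values must be in the range 0-63")
    return "".join(chr(v + SIXEL_OFFSET) for v in values)


def from_sixel(text: str) -> list:
    """Map printable sixel characters back to their 6-bit values."""
    return [ord(ch) - SIXEL_OFFSET for ch in text]


if __name__ == "__main__":
    # The four 6-bit values from the Uuencode and Base64 examples.
    print(to_sixel([38, 58, 11, 41]))  # prints eyJh
```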
-------------------------------------------------------------------------------
Received on Mon May 31 11:34:19 2004