Skip to content

XML output incorrectly base64-encodes binary data #310

@lewissbaker

Description

@lewissbaker

The function base64_encode, which is used by the --xml command-line option of zbarcam/zbarimg to encode binary data in a <![CDATA[ section, has a bug in it that gives incorrect base64-encoded strings for some binary QR code data.

For example, when you try to evaluate the following code:

const char src[3] = { '\x3a', '\xc2', '\x98'};
char dst[5];
base64_encode(dst, src, 3);
dst[4] = '\0';
printf("%s\n", dst);

The output you get is "//+Y".
However, the actual output should be "OsKY".

This is because of the use of signed char types when performing the bit-shifting of groups of 3 bytes into an unsigned integer.
See

zbar/zbar/symbol.c

Lines 311 to 326 in a549566

unsigned base64_encode(char *dst, const char *src, unsigned srclen)
{
static const char alphabet[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
char *start = dst;
int nline = 19;
for (; srclen; srclen -= 3) {
unsigned int buf = *(src++) << 16;
if (srclen > 1)
buf |= *(src++) << 8;
if (srclen > 2)
buf |= *(src++);
*(dst++) = alphabet[(buf >> 18) & 0x3f];
*(dst++) = alphabet[(buf >> 12) & 0x3f];
*(dst++) = (srclen > 1) ? alphabet[(buf >> 6) & 0x3f] : '=';
*(dst++) = (srclen > 2) ? alphabet[buf & 0x3f] : '=';

If either the second or third byte in the group has the high-bit set then on platforms that have a signed char type, the expression *(src++) << 8 performs signed integer promotion of the char value to an int, which sign-extends the top-most bit into the upper 24 bits and then shifts this value left by 8, leaving the upper 16 bits still set to 1. Thus when it is bit-wise or'ed into buf, it overwrites the bits already shifted into bits 16-23 by the statement unsigned int buf = *(src++) << 16;

To fix this issue, you need to first cast the char values to unsigned char values before shifting them, so that the promotion to int when shifting does not sign-extend the most-significant bit.

For example:

 unsigned base64_encode(char *dst, const char *src, unsigned srclen)
 {
     static const char alphabet[] =
 	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
     char *start = dst;
     int nline	= 19;
     for (; srclen; srclen -= 3) {
-        unsigned int buf = *(src++) << 16;
+        unsigned int buf = ((unsigned char)*(src++)) << 16;
         if (srclen > 1)
-            buf |= *(src++) << 8;
+            buf |= ((unsigned char)*(src++)) << 8;
         if (srclen > 2)
-            buf |= *(src++);
+            buf |= ((unsigned char)*(src++));
         *(dst++) = alphabet[(buf >> 18) & 0x3f];
         *(dst++) = alphabet[(buf >> 12) & 0x3f];
         *(dst++) = (srclen > 1) ? alphabet[(buf >> 6) & 0x3f] : '=';
         *(dst++) = (srclen > 2) ? alphabet[buf & 0x3f] : '=';
         if (srclen < 3)
             break;
         if (!--nline) {
             *(dst++) = '\n';
             nline    = 19;
         }
     }
     *(dst++) = '\n';
     *(dst++) = '\0';
     return (dst - start - 1);
 }

An alternative might be to change the type of the src parameter to be const unsigned char* to represent bytes, rather than using the signed const char* array.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions