Related Vulnerabilities: CVE-2009-0153 CVE-2008-1036

Source

Debian Bug report logs - #534590
does not properly handle invalid byte sequences during Unicode conversion

Package: icu; Maintainer for icu is Laszlo Boszormenyi (GCS) <gcs@debian.org>;

Reported by: Kees Cook <kees@debian.org>

Date: Thu, 25 Jun 2009 15:42:01 UTC

Severity: normal

Tags: security

Found in version 3.8.1-3+lenny1

Fixed in versions icu/3.8.1-3+lenny2, icu/3.6-2etch4

Done: Jay Berkenbilt <qjb@debian.org>

Bug is archived. No further changes may be made.

Toggle useless messages

View this report as an mbox folder, status mbox, maintainer mbox

Report forwarded to debian-bugs-dist@lists.debian.org, Jay Berkenbilt <qjb@debian.org>:
Bug#534590; Package icu. (Thu, 25 Jun 2009 15:42:04 GMT) (full text, mbox, link).

Acknowledgement sent to Kees Cook <kees@debian.org>:
New Bug report received and forwarded. Copy sent to Jay Berkenbilt <qjb@debian.org>. (Thu, 25 Jun 2009 15:42:04 GMT) (full text, mbox, link).

Message #5 received at submit@bugs.debian.org (full text, mbox, reply):

Package: icu
Version: 3.8.1-3+lenny1
Severity: normal
Tags: security

Hi!

There is a security issue with the stable release of icu (it was fixed in
4.0.1, IIUC):

http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2009-0153

"International Components for Unicode (ICU) 4.0, 3.6, and other 3.x
versions, as used in Apple Mac OS X 10.5 before 10.5.7, iPhone OS 1.0
through 2.2.1, iPhone OS for iPod touch 1.1 through 2.2.1, Fedora 9 and 10,
and possibly other operating systems, does not properly handle invalid byte
sequences during Unicode conversion, which might allow remote attackers to
conduct cross-site scripting (XSS) attacks."

More details are here:
https://bugzilla.redhat.com/show_bug.cgi?id=503071

Thanks!

-Kees

-- 
Kees Cook                                            @debian.org

Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#534590; Package icu. (Mon, 24 Aug 2009 16:21:03 GMT) (full text, mbox, link).

Acknowledgement sent to Jay Berkenbilt <qjb@debian.org>:
Extra info received and forwarded to list. (Mon, 24 Aug 2009 16:21:03 GMT) (full text, mbox, link).

Message #10 received at 534590@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]

The reporter of this ICU security bug that impacts oldstable and stable
but not testing or unstable was kind enough to refer to the Red Hat
bugzilla entry for this.  The people at Red Hat have backported the
security fix to 3.6 (our oldstable) and 3.8 (our stable) versions of ICU
(which appear in RHEL5 and Fedora 9).  I have grabbed their SRPMs for
the patched versions and extracted the patches that apply to the 3.6 and
3.8 versions.  Attached here are the patches directly from those source
RPMs, not modified in any way or tested for debian.  I can integrate
these into the debian packages prepare uploads for stable security and
oldstable security, or I can defer to the security team to do the
integration.  Just let me know.  It may be several days before I have a
chance to work on it, but I have prepared stable security uploads for my
packages before.  I am grateful to Red Hat for doing the work of
backporting to the older ICU versions.

-- 
Jay Berkenbilt <qjb@debian.org>

[icu-3.8-CVE-2009-0153.patch (text/x-patch, inline)]

diff -ru icu.6175/source/common/ucnv2022.c icu/source/common/ucnv2022.c
--- icu.6175/source/common/ucnv2022.c	2009-06-11 13:44:44.000000000 +0100
+++ icu/source/common/ucnv2022.c	2009-06-11 18:09:48.000000000 +0100
@@ -1973,6 +1973,7 @@
         mySourceChar = args->converter->toUBytes[0];
         args->converter->toULength = 0;
         cs = (StateEnum)pToU2022State->cs[pToU2022State->g];
+        targetUniChar = missingCharMarker;
         goto getTrailByte;
     }
 
@@ -2102,18 +2103,45 @@
                 default:
                     /* G0 DBCS */
                     if(mySource < mySourceLimit) {
+                        int leadIsOk, trailIsOk;
                         char trailByte;
 getTrailByte:
-                        trailByte = *mySource++;
-                        if(cs == JISX208) {
-                            _2022ToSJIS((uint8_t)mySourceChar, (uint8_t)trailByte, tempBuf);
-                        } else {
-                            tempBuf[0] = (char)mySourceChar;
-                            tempBuf[1] = trailByte;
-                        }
-                        mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
-                        targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE);
-                    } else {
+                        trailByte = *mySource;
+                        /*
+                         * Ticket 5691: consistent illegal sequences:
+                         * - We include at least the first byte in the illegal sequence.
+                         * - If any of the non-initial bytes could be the start of a character,
+                         *   we stop the illegal sequence before the first one of those.
+                         *
+                         * In ISO-2022 DBCS, if both bytes are valid or both bytes are outside
+                         * the 21..7e range, then we treat them as a pair.
+                         * Otherwise (valid lead byte + illegal trail byte, or vice versa)
+                         * we report only the first byte as the illegal sequence.
+                         */
+                        leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                        trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21);
+                        if (leadIsOk == trailIsOk) {
+                            ++mySource;
+                            uint32_t tmpSourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
+                            if (leadIsOk) {
+                                if(cs == JISX208) {
+                                    _2022ToSJIS((uint8_t)mySourceChar, (uint8_t)trailByte, tempBuf);
+                                    mySourceChar = tmpSourceChar;
+                                } else {
+                                    /* Copy before we modify tmpSourceChar so toUnicodeCallback() sees the correct bytes. */
+                                    mySourceChar = tmpSourceChar;
+                                    if (cs == KSC5601) {
+                                        tmpSourceChar += 0x8080;  /* = _2022ToGR94DBCS(tmpSourceChar) */
+                                    }
+                                    tempBuf[0] = (char)(tmpSourceChar >> 8);
+                                    tempBuf[1] = (char)(tmpSourceChar);
+                                }
+                                targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE);
+                            } else {
+                                mySourceChar = tmpSourceChar;
+                            }
+                          }
+                      } else {
                         args->converter->toUBytes[0] = (uint8_t)mySourceChar;
                         args->converter->toULength = 1;
                         goto endloop;
@@ -2254,7 +2282,12 @@
             }
             /* only DBCS or SBCS characters are expected*/
             /* DB characters with high bit set to 1 are expected */
-            if(length > 2 || length==0 ||(((targetByteUnit & 0x8080) != 0x8080)&& length==2)){
+            if( length > 2 || length==0 ||
+                (length == 1 && targetByteUnit > 0x7f) ||
+                (length == 2 &&
+                    ((uint16_t)(targetByteUnit - 0xa1a1) > (0xfefe - 0xa1a1) ||
+                    (uint8_t)(targetByteUnit - 0xa1) > (0xfe - 0xa1)))
+            ) {
                 targetByteUnit=missingCharMarker;
             }
             if (targetByteUnit != missingCharMarker){
@@ -2583,17 +2616,36 @@
             myData->isEmptySegment = FALSE;	/* Any invalid char errors will be detected separately, so just reset this */
             if(myData->toU2022State.g == 1) {
                 if(mySource < mySourceLimit) {
+                    int leadIsOk, trailIsOk;
                     char trailByte;
 getTrailByte:
-                    trailByte = *mySource++;
-                    tempBuf[0] = (char)(mySourceChar + 0x80);
-                    tempBuf[1] = (char)(trailByte + 0x80);
-                    mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
-                    if((mySourceChar & 0x8080) == 0) {
-                        targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, tempBuf, 2, useFallback);
+                    targetUniChar = missingCharMarker;
+                    trailByte = *mySource;
+                    /*
+                     * Ticket 5691: consistent illegal sequences:
+                     * - We include at least the first byte in the illegal sequence.
+                     * - If any of the non-initial bytes could be the start of a character,
+                     *   we stop the illegal sequence before the first one of those.
+                     *
+                     * In ISO-2022 DBCS, if both bytes are valid or both bytes are outside
+                     * the 21..7e range, then we treat them as a pair.
+                     * Otherwise (valid lead byte + illegal trail byte, or vice versa)
+                     * we report only the first byte as the illegal sequence.
+                     */
+                    leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                    trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21);
+                    if (leadIsOk == trailIsOk) {
+                        ++mySource;
+                        if (leadIsOk) {
+                            tempBuf[0] = (char)(mySourceChar + 0x80);
+                            tempBuf[1] = (char)(trailByte + 0x80);
+                            targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, tempBuf, 2, useFallback);
+                        } else {
+                            leadIsOk = TRUE; /* TODO: remove */
+                        }
+                        mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
                     } else {
-                        /* illegal bytes > 0x7f */
-                        targetUniChar = missingCharMarker;
+                        trailIsOk = TRUE; /* TODO: remove */
                     }
                 } else {
                     args->converter->toUBytes[0] = (uint8_t)mySourceChar;
@@ -2601,8 +2653,10 @@
                     break;
                 }
             }
-            else{
+            else if(mySourceChar <= 0x7f) {
                 targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, mySource - 1, 1, useFallback);
+            } else {
+                targetUniChar = 0xffff;
             }
             if(targetUniChar < 0xfffe){
                 if(args->offsets) {
@@ -3099,6 +3153,7 @@
         /* continue with a partial double-byte character */
         mySourceChar = args->converter->toUBytes[0];
         args->converter->toULength = 0;
+        targetUniChar = missingCharMarker;
         goto getTrailByte;
     }
 
@@ -3178,29 +3233,48 @@
                         UConverterSharedData *cnv;
                         StateEnum tempState;
                         int32_t tempBufLen;
+                        int leadIsOk, trailIsOk;
                         char trailByte;
 getTrailByte:
-                        trailByte = *mySource++;
-                        tempState = (StateEnum)pToU2022State->cs[pToU2022State->g];
-                        if(tempState > CNS_11643_0) {
-                            cnv = myData->myConverterArray[CNS_11643];
-                            tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0));
-                            tempBuf[1] = (char) (mySourceChar);
-                            tempBuf[2] = trailByte;
-                            tempBufLen = 3;
-
-                        }else{
-                            cnv = myData->myConverterArray[tempState];
-                            tempBuf[0] = (char) (mySourceChar);
-                            tempBuf[1] = trailByte;
-                            tempBufLen = 2;
+                        trailByte = *mySource;
+                        /*
+                         * Ticket 5691: consistent illegal sequences:
+                         * - We include at least the first byte in the illegal sequence.
+                         * - If any of the non-initial bytes could be the start of a character,
+                         *   we stop the illegal sequence before the first one of those.
+                         *
+                         * In ISO-2022 DBCS, if both bytes are valid or both bytes are outside
+                         * the 21..7e range, then we treat them as a pair.
+                         * Otherwise (valid lead byte + illegal trail byte, or vice versa)
+                         * we report only the first byte as the illegal sequence.
+                         */
+                        leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                        trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21);
+                        if (leadIsOk == trailIsOk) {
+                            ++mySource;
+                            if (leadIsOk) {
+                                tempState = (StateEnum)pToU2022State->cs[pToU2022State->g];
+                                if(tempState >= CNS_11643_0) {
+                                    cnv = myData->myConverterArray[CNS_11643];
+                                    tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0));
+                                    tempBuf[1] = (char) (mySourceChar);
+                                    tempBuf[2] = trailByte;
+                                    tempBufLen = 3;
+
+                                }else{
+                                    cnv = myData->myConverterArray[tempState];
+                                    tempBuf[0] = (char) (mySourceChar);
+                                    tempBuf[1] = trailByte;
+                                    tempBufLen = 2;
+                                }
+                                targetUniChar = ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE);
+                            }
+                            mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
                         }
-                        mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
                         if(pToU2022State->g>=2) {
                             /* return from a single-shift state to the previous one */
                             pToU2022State->g=pToU2022State->prevG;
                         }
-                        targetUniChar = ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE);
                     } else {
                         args->converter->toUBytes[0] = (uint8_t)mySourceChar;
                         args->converter->toULength = 1;
diff -ru icu.6175/source/common/ucnvhz.c icu/source/common/ucnvhz.c
--- icu.6175/source/common/ucnvhz.c	2009-06-11 13:44:44.000000000 +0100
+++ icu/source/common/ucnvhz.c	2009-06-11 18:05:36.000000000 +0100
@@ -215,19 +215,35 @@
                 }
                 else{
                     /* trail byte */
+                    int leadIsOk, trailIsOk;
                     uint32_t leadByte = args->converter->toUnicodeStatus & 0xff;
-                    if( (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21) &&
-                        (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21)
-                    ) {
-                        tempBuf[0] = (char) (leadByte+0x80) ;
-                        tempBuf[1] = (char) (mySourceChar+0x80);
-                        targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->gbConverter->sharedData,
-                            tempBuf, 2, args->converter->useFallback);
+                    targetUniChar = 0xffff;
+                    /*
+                     * Ticket 5691: consistent illegal sequences:
+                     * - We include at least the first byte in the illegal sequence.
+                     * - If any of the non-initial bytes could be the start of a character,
+                     *   we stop the illegal sequence before the first one of those.
+                     *
+                     * In HZ DBCS, if both bytes are valid or both bytes are outside
+                     * the 21..7d/7e range, then we treat them as a pair.
+                     * Otherwise (valid lead byte + illegal trail byte, or vice versa)
+                     * we report only the first byte as the illegal sequence.
+                     */
+                    leadIsOk = (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21);
+                    trailIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                    if (leadIsOk == trailIsOk) {
+                        if (leadIsOk) {
+                            tempBuf[0] = (char) (leadByte+0x80) ;
+                            tempBuf[1] = (char) (mySourceChar+0x80);
+                            targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->gbConverter->sharedData,
+                                tempBuf, 2, args->converter->useFallback);
+                        }
+                        /* add another bit so that the code below writes 2 bytes in case of error */
+                        mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar;
                     } else {
-                        targetUniChar = 0xffff;
+                        --mySource;
+                        mySourceChar = (int32_t)leadByte;
                     }
-                    /* add another bit so that the code below writes 2 bytes in case of error */
-                    mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar;
                     args->converter->toUnicodeStatus =0x00;
                 }
             }
diff -ru icu.6175/source/common/ucnvmbcs.c icu/source/common/ucnvmbcs.c
--- icu.6175/source/common/ucnvmbcs.c	2009-06-11 13:44:44.000000000 +0100
+++ icu/source/common/ucnvmbcs.c	2009-06-11 18:05:36.000000000 +0100
@@ -1,7 +1,7 @@
 /*
 ******************************************************************************
 *
-*   Copyright (C) 2000-2007, International Business Machines
+*   Copyright (C) 2000-2008, International Business Machines
 *   Corporation and others.  All Rights Reserved.
 *
 ******************************************************************************
@@ -1791,6 +1791,65 @@
     pArgs->offsets=offsets;
 }
 
+static UBool
+hasValidTrailBytes(const int32_t (*stateTable)[256], uint8_t state) {
+    const int32_t *row=stateTable[state];
+    int32_t b, entry;
+    /* First test for final entries in this state for some commonly valid byte values. */
+    entry=row[0xa1];
+    if( !MBCS_ENTRY_IS_TRANSITION(entry) &&
+        MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL
+    ) {
+        return TRUE;
+    }
+    entry=row[0x41];
+    if( !MBCS_ENTRY_IS_TRANSITION(entry) &&
+        MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL
+    ) {
+        return TRUE;
+    }
+    /* Then test for final entries in this state. */
+    for(b=0; b<=0xff; ++b) {
+        entry=row[b];
+        if( !MBCS_ENTRY_IS_TRANSITION(entry) &&
+            MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL
+        ) {
+            return TRUE;
+        }
+    }
+    /* Then recurse for transition entries. */
+    for(b=0; b<=0xff; ++b) {
+        entry=row[b];
+        if( MBCS_ENTRY_IS_TRANSITION(entry) &&
+            hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry))
+        ) {
+            return TRUE;
+        }
+    }
+    return FALSE;
+}
+
+/*
+ * Is byte b a single/lead byte in this state?
+ * Recurse for transition states, because here we don't want to say that
+ * b is a lead byte if all byte sequences that start with b are illegal.
+ */
+static UBool
+isSingleOrLead(const int32_t (*stateTable)[256], uint8_t state, UBool isDBCSOnly, uint8_t b) {
+    const int32_t *row=stateTable[state];
+    int32_t entry=row[b];
+    if(MBCS_ENTRY_IS_TRANSITION(entry)) {   /* lead byte */
+        return hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry));
+    } else {
+        uint8_t action=(uint8_t)(MBCS_ENTRY_FINAL_ACTION(entry));
+        if(action==MBCS_STATE_CHANGE_ONLY && isDBCSOnly) {
+            return FALSE;   /* SI/SO are illegal for DBCS-only conversion */
+        } else {
+            return action!=MBCS_STATE_ILLEGAL;
+        }
+    }
+}
+
 U_CFUNC void
 ucnv_MBCSToUnicodeWithOffsets(UConverterToUnicodeArgs *pArgs,
                           UErrorCode *pErrorCode) {
@@ -2146,6 +2205,34 @@
             sourceIndex=nextSourceIndex;
         } else if(U_FAILURE(*pErrorCode)) {
             /* callback(illegal) */
+            if(byteIndex>1) {
+                /*
+                 * Ticket 5691: consistent illegal sequences:
+                 * - We include at least the first byte in the illegal sequence.
+                 * - If any of the non-initial bytes could be the start of a character,
+                 *   we stop the illegal sequence before the first one of those.
+                 */
+                UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0);
+                int8_t i;
+                for(i=1;
+                    i<byteIndex && !isSingleOrLead(stateTable, state, isDBCSOnly, bytes[i]);
+                    ++i) {}
+                if(i<byteIndex) {
+                    /* Back out some bytes. */
+                    int8_t backOutDistance=byteIndex-i;
+                    int32_t bytesFromThisBuffer=(int32_t)(source-(const uint8_t *)pArgs->source);
+                    byteIndex=i;  /* length of reported illegal byte sequence */
+                    if(backOutDistance<=bytesFromThisBuffer) {
+                        source-=backOutDistance;
+                    } else {
+                        /* Back out bytes from the previous buffer: Need to replay them. */
+                        cnv->preToULength=(int8_t)(bytesFromThisBuffer-backOutDistance);
+                        /* preToULength is negative! */
+                        uprv_memcpy(cnv->preToU, bytes+i, -cnv->preToULength);
+                        source=(const uint8_t *)pArgs->source;
+                    }
+                }
+            }
             break;
         } else /* unassigned sequences indicated with byteIndex>0 */ {
             /* try an extension mapping */
@@ -2156,6 +2243,7 @@
                               &offsets, sourceIndex,
                               pArgs->flush,
                               pErrorCode);
+            /* TODO: nextSourceIndex+=diff instead of nextSourceIndex+diff ?? */
             sourceIndex=nextSourceIndex+(int32_t)(source-(const uint8_t *)pArgs->source);
 
             if(U_FAILURE(*pErrorCode)) {
@@ -2447,15 +2535,37 @@
 
     if(c<0) {
         if(U_SUCCESS(*pErrorCode) && source==sourceLimit && lastSource<source) {
-            *pErrorCode=U_TRUNCATED_CHAR_FOUND;
-        }
-        if(U_FAILURE(*pErrorCode)) {
             /* incomplete character byte sequence */
             uint8_t *bytes=cnv->toUBytes;
             cnv->toULength=(int8_t)(source-lastSource);
             do {
                 *bytes++=*lastSource++;
             } while(lastSource<source);
+            *pErrorCode=U_TRUNCATED_CHAR_FOUND;
+        } else if(U_FAILURE(*pErrorCode)) {
+            /* callback(illegal) */
+            /*
+             * Ticket 5691: consistent illegal sequences:
+             * - We include at least the first byte in the illegal sequence.
+             * - If any of the non-initial bytes could be the start of a character,
+             *   we stop the illegal sequence before the first one of those.
+             */
+            UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0);
+            uint8_t *bytes=cnv->toUBytes;
+            *bytes++=*lastSource++;     /* first byte */
+            if(lastSource==source) {
+                cnv->toULength=1;
+            } else /* lastSource<source: multi-byte character */ {
+                int8_t i;
+                for(i=1;
+                    lastSource<source && !isSingleOrLead(stateTable, state, isDBCSOnly, *lastSource);
+                    ++i
+                ) {
+                    *bytes++=*lastSource++;
+                }
+                cnv->toULength=i;
+                source=lastSource;
+            }
         } else {
             /* no output because of empty input or only state changes */
             *pErrorCode=U_INDEX_OUTOFBOUNDS_ERROR;
diff -ru icu.6175/source/test/cintltst/nccbtst.c icu/source/test/cintltst/nccbtst.c
--- icu.6175/source/test/cintltst/nccbtst.c	2009-06-11 13:44:44.000000000 +0100
+++ icu/source/test/cintltst/nccbtst.c	2009-06-11 18:05:36.000000000 +0100
@@ -2497,13 +2497,13 @@
 
 
     static const uint8_t text943[] = {
-        0x82, 0xa9, 0x82, 0x20, /*0xc8,*/  0x61, 0x8a, 0xbf, 0x8e, 0x9a };
-    static const UChar toUnicode943sub[] = { 0x304b, 0xfffd, /*0xff88,*/ 0x0061, 0x6f22,  0x5b57};
-    static const UChar toUnicode943skip[]= { 0x304b, /*0xff88,*/ 0x0061, 0x6f22,  0x5b57};
+        0x82, 0xa9, 0x82, 0x20, 0x61, 0x8a, 0xbf, 0x8e, 0x9a };
+    static const UChar toUnicode943sub[] = { 0x304b, 0x1a, 0x20, 0x0061, 0x6f22,  0x5b57 };
+    static const UChar toUnicode943skip[]= { 0x304b, 0x20, 0x0061, 0x6f22,  0x5b57 };
     static const UChar toUnicode943stop[]= { 0x304b};
 
-    static const int32_t  fromIBM943Offssub[]  = {0, 2, 4, 5, 7};
-    static const int32_t  fromIBM943Offsskip[] = { 0, 4, 5, 7};
+    static const int32_t  fromIBM943Offssub[]  = { 0, 2, 3, 4, 5, 7 };
+    static const int32_t  fromIBM943Offsskip[] = { 0, 3, 4, 5, 7 };
     static const int32_t  fromIBM943Offsstop[] = { 0};
 
     gInBufferSize = inputsize;
@@ -2537,9 +2537,9 @@
 {
     static const uint8_t sampleText[] = {
         0x82, 0xa9, 0x61, 0x62, 0x63 , 0x82,
-        0xff, /*0x82, 0xa9,*/ 0x32, 0x33};
-    static const UChar toUnicode943sub[] = {0x304b, 0x0061, 0x0062, 0x0063,  0xfffd,/*0x304b,*/ 0x0032, 0x0033};
-    static const int32_t  fromIBM943Offssub[]  = {0, 2, 3, 4, 5, 7, 8};
+        0xff, 0x32, 0x33};
+    static const UChar toUnicode943sub[] = { 0x304b, 0x0061, 0x0062, 0x0063, 0x1a, 0x1a, 0x0032, 0x0033 };
+    static const int32_t fromIBM943Offssub[] = { 0, 2, 3, 4, 5, 6, 7, 8 };
     /*checking illegal value for ibm-943 with substitute*/ 
     gInBufferSize = inputsize;
     gOutBufferSize = outputsize;
diff -ru icu.6175/source/test/cintltst/nucnvtst.c icu/source/test/cintltst/nucnvtst.c
--- icu.6175/source/test/cintltst/nucnvtst.c	2009-06-11 13:44:44.000000000 +0100
+++ icu/source/test/cintltst/nucnvtst.c	2009-06-11 18:05:36.000000000 +0100
@@ -2608,7 +2608,7 @@
     TestNextUCharError(cnv, source, source, U_INDEX_OUTOFBOUNDS_ERROR, "sourceLimit <= source");
     /*Test for the condition where there is an invalid character*/
     {
-        static const uint8_t source2[]={0xa1, 0x01};
+        static const uint8_t source2[]={0xa1, 0x80};
         TestNextUCharError(cnv, (const char*)source2, (const char*)source2+sizeof(source2), U_ZERO_ERROR, "an invalid character");
     }
     /*Test for the condition where we have a truncated char*/
@@ -3901,11 +3901,11 @@
 TestISO_2022_KR() {
     /* test input */
     static const uint16_t in[]={
-                    0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F66,0x9F67,0x9F6A,0x000A,0x000D
-                   ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC02,0xAC04
+                    0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F67,0x9F6A,0x000A,0x000D
+                   ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC04
                    ,0xAC07,0xAC08,0xAC09,0x0025,0x0026,0x0027,0x000A,0x000D,0x0028,0x0029
                    ,0x002A,0x002B,0x002C,0x002D,0x002E,0x53C3,0x53C8,0x53C9,0x53CA,0x53CB
-                   ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53DF,0x53E1,0x53E2
+                   ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53E1,0x53E2
                    ,0x53E3,0x53E4,0x000A,0x000D};
     const UChar* uSource;
     const UChar* uSourceLimit;
diff -ru icu.6175/source/test/testdata/conversion.txt icu/source/test/testdata/conversion.txt
--- icu.6175/source/test/testdata/conversion.txt	2009-06-11 13:44:44.000000000 +0100
+++ icu/source/test/testdata/conversion.txt	2009-06-11 18:05:36.000000000 +0100
@@ -48,12 +48,83 @@
     toUnicode {
       Headers { "charset", "bytes", "unicode", "offsets", "flush", "fallbacks", "errorCode", "callback", "invalidChars" }
       Cases {
+        // Test ticket 5691: consistent illegal sequences
+        // Unfortunately, we cannot use the Shift-JIS examples from the ticket
+        // comments because our Shift-JIS table is Windows-compatible and
+        // therefore has no illegal single bytes. Same for GBK.
+        // Instead, we use the stricter GB 18030 also for 2-byte examples.
+        // The byte sequences are generally slightly different from the ticket
+        // comment, simply using assigned characters rather than just
+        // theoretically valid sequences.
+        {
+          "gb18030",
+          :bin{ 618140813c81ff7a },
+          "a\u4e02\\x81<\\x81\\xFFz",
+          :intvector{ 0,1,3,3,3,3,4,5,5,5,5,5,5,5,5,7 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "EUC-JP",
+          :bin{ 618fb0a98fb03c8f3cb0a97a },
+          "a\u4e28\\x8F\\xB0<\\x8F<\u9022z",
+          :intvector{ 0,1,4,4,4,4,5,5,5,5,6,7,7,7,7,8,9,11 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "gb18030",
+          :bin{ 618130fc318130fc8181303c3e813cfc817a },
+          "a5ed\\x810\u9f07\\x810<>\\x81<\u9f07z",
+          :intvector{ 0,1,5,5,5,5,6,7,9,9,9,9,10,11,12,13,13,13,13,14,15,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "UTF-8",
+          :bin{ 61f1808182f180813cf18081fff180ff3cf1ff3c3e7a },
+          "a\U00040042\\xF1\\x80\\x81<\\xF1\\x80\\x81\\xFF\\xF1\\x80\\xFF<\\xF1\\xFF<>z",
+          :intvector{ 0,1,1,5,5,5,5,5,5,5,5,5,5,5,5,8,9,9,9,9,9,9,9,9,9,9,9,9,12,12,12,12,13,13,13,13,13,13,13,13,15,15,15,15,16,17,17,17,17,18,18,18,18,19,20,21 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-JP-2",
+          :bin{ 1b24424141af4142affe41431b2842 },
+          "\u758f\\xAF\u758e\\xAF\\xFE\u790e",
+          :intvector{ 3,5,5,5,5,6,8,8,8,8,8,8,8,8,10 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ibm-25546",
+          :bin{ 411b242943420e4141af4142affe41430f5a },
+          "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-KR",
+          :bin{ 411b242943420e4141af4142affe41430f5a },
+          "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-CN",
+          :bin{ 411b242941420e4141af4142affe41430f5a },
+          "AB\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "HZ",
+          :bin{ 417e7b4141af4142affe41437e7d5a },
+          "A\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z",
+          :intvector{ 0,3,5,5,5,5,6,8,8,8,8,8,8,8,8,10,14 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
         // test that HZ limits its byte values to lead bytes 21..7d and trail bytes 21..7e
         {
           "HZ",
           :bin{ 7e7b21212120217e217f772100007e217e7d207e7e807e0a2b },
-          "\u3000\ufffd\u3013\ufffd\u9ccc\ufffd\ufffd ~\ufffd+",
-          :intvector{ 2,4,6,8,10,12,14,18,19,21,24 },
+          "\u3000\ufffd\ufffd\u3013\ufffd\ufffd\u9ccc\ufffd\ufffd ~\ufffd+",
+          :intvector{ 2,4,5,6,8,9,10,12,14,18,19,21,24 },
           :int{1}, :int{1}, "", "?", :bin{""}
         }
         // improve coverage of ISO-2022-JP converter with hardcoded JIS X 0201 and
@@ -61,8 +132,8 @@
         {
           "ISO-2022-JP",
           :bin{ 1b284a7d7e801b2442306c20217f7e21202160217f22202225227f5f211b2842 },
-          "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\u25b2\ufffd\u6f3e",
-          :intvector{ 3,4,5,9,11,13,15,17,19,21,23,25,27 },
+          "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\ufffd\ufffd\u25b2\ufffd\ufffd\u6f3e",
+          :intvector{ 3,4,5,9,11,12,13,14,16,17,19,20,21,22,23,25,26,27 },
           :int{1}, :int{1}, "", "?", :bin{""}
         }
         // improve coverage of unrolled loops in ucnvmbcs.c/ucnv_MBCSSingleToBMPWithOffsets()

[icu-3.6-CVE-2009-0153.patch (text/x-patch, inline)]

diff -ru icu.6175/source/common/ucnv2022.c icu/source/common/ucnv2022.c
--- icu.6175/source/common/ucnv2022.c	2009-06-02 15:47:31.000000000 +0100
+++ icu/source/common/ucnv2022.c	2009-06-02 16:03:15.000000000 +0100
@@ -754,6 +754,7 @@
     UConverterDataISO2022* myData2022 = ((UConverterDataISO2022*)_this->extraInfo);
     uint32_t key = myData2022->key;
     int32_t offset = 0;
+    int8_t initialToULength = _this->toULength;
     char c;
 
     value = VALID_NON_TERMINAL_2022;
@@ -806,7 +807,6 @@
         return;
     } else if (value == INVALID_2022 ) {
         *err = U_ILLEGAL_ESCAPE_SEQUENCE;
-        return;
     } else /* value == VALID_TERMINAL_2022 */ {
         switch(var){
 #ifdef U_ENABLE_GENERIC_ISO_2022
@@ -938,6 +938,35 @@
     }
     if(U_SUCCESS(*err)) {
         _this->toULength = 0;
+    } else if(*err==U_ILLEGAL_ESCAPE_SEQUENCE) {
+        if(_this->toULength>1) {
+            /*
+             * Ticket 5691: consistent illegal sequences:
+             * - We include at least the first byte (ESC) in the illegal sequence.
+             * - If any of the non-initial bytes could be the start of a character,
+             *   we stop the illegal sequence before the first one of those.
+             *   In escape sequences, all following bytes are "printable", that is,
+             *   unless they are completely illegal (>7f in SBCS, outside 21..7e in DBCS),
+             *   they are valid single/lead bytes.
+             *   For simplicity, we always only report the initial ESC byte as the
+             *   illegal sequence and back out all other bytes we looked at.
+             */
+            /* Back out some bytes. */
+            int8_t backOutDistance=_this->toULength-1;
+            int8_t bytesFromThisBuffer=_this->toULength-initialToULength;
+            if(backOutDistance<=bytesFromThisBuffer) {
+                /* same as initialToULength<=1 */
+                *source-=backOutDistance;
+            } else {
+                /* Back out bytes from the previous buffer: Need to replay them. */
+                _this->preToULength=(int8_t)(bytesFromThisBuffer-backOutDistance);
+                /* same as -(initialToULength-1) */
+                /* preToULength is negative! */
+                uprv_memcpy(_this->preToU, _this->toUBytes+1, -_this->preToULength);
+                *source-=bytesFromThisBuffer;
+            }
+            _this->toULength=1;
+        }
     } else if(*err==U_UNSUPPORTED_ESCAPE_SEQUENCE) {
         _this->toUCallbackReason = UCNV_UNASSIGNED;
     }
@@ -1973,6 +2002,7 @@
         mySourceChar = args->converter->toUBytes[0];
         args->converter->toULength = 0;
         cs = (StateEnum)pToU2022State->cs[pToU2022State->g];
+        targetUniChar = missingCharMarker;
         goto getTrailByte;
     }
 
@@ -2102,17 +2132,44 @@
                 default:
                     /* G0 DBCS */
                     if(mySource < mySourceLimit) {
-                        char trailByte;
+                        int leadIsOk, trailIsOk;
+                        uint8_t trailByte;
 getTrailByte:
-                        trailByte = *mySource++;
-                        if(cs == JISX208) {
-                            _2022ToSJIS((uint8_t)mySourceChar, (uint8_t)trailByte, tempBuf);
-                        } else {
-                            tempBuf[0] = (char)mySourceChar;
-                            tempBuf[1] = trailByte;
+                        trailByte = (uint8_t)*mySource;
+                        /*
+                         * Ticket 5691: consistent illegal sequences:
+                         * - We include at least the first byte in the illegal sequence.
+                         * - If any of the non-initial bytes could be the start of a character,
+                         *   we stop the illegal sequence before the first one of those.
+                         *
+                         * In ISO-2022 DBCS, if the second byte is in the 21..7e range or is
+                         * an ESC/SO/SI, we report only the first byte as the illegal sequence.
+                         * Otherwise we convert or report the pair of bytes.
+                         */
+                        leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                        trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21);
+                        if (leadIsOk && trailIsOk) {
+                            ++mySource;
+                            uint32_t tmpSourceChar = (mySourceChar << 8) | trailByte;
+                            if(cs == JISX208) {
+                                _2022ToSJIS((uint8_t)mySourceChar, trailByte, tempBuf);
+                                mySourceChar = tmpSourceChar;
+                            } else {
+                                /* Copy before we modify tmpSourceChar so toUnicodeCallback() sees the correct bytes. */
+                                mySourceChar = tmpSourceChar;
+                                if (cs == KSC5601) {
+                                    tmpSourceChar += 0x8080;  /* = _2022ToGR94DBCS(tmpSourceChar) */
+                                }
+                                tempBuf[0] = (char)(tmpSourceChar >> 8);
+                                tempBuf[1] = (char)(tmpSourceChar);
+                            }
+                            targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE);
+                        } else if (!(trailIsOk || IS_2022_CONTROL(trailByte))) {
+                            /* report a pair of illegal bytes if the second byte is not a DBCS starter */
+                            ++mySource;
+                            /* add another bit so that the code below writes 2 bytes in case of error */
+                            mySourceChar = 0x10000 | (mySourceChar << 8) | trailByte;
                         }
-                        mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
-                        targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE);
                     } else {
                         args->converter->toUBytes[0] = (uint8_t)mySourceChar;
                         args->converter->toULength = 1;
@@ -2254,7 +2311,12 @@
             }
             /* only DBCS or SBCS characters are expected*/
             /* DB characters with high bit set to 1 are expected */
-            if(length > 2 || length==0 ||(((targetByteUnit & 0x8080) != 0x8080)&& length==2)){
+            if( length > 2 || length==0 ||
+                (length == 1 && targetByteUnit > 0x7f) ||
+                (length == 2 &&
+                    ((uint16_t)(targetByteUnit - 0xa1a1) > (0xfefe - 0xa1a1) ||
+                    (uint8_t)(targetByteUnit - 0xa1) > (0xfe - 0xa1)))
+            ) {
                 targetByteUnit=missingCharMarker;
             }
             if (targetByteUnit != missingCharMarker){
@@ -2583,17 +2645,34 @@
             myData->isEmptySegment = FALSE;	/* Any invalid char errors will be detected separately, so just reset this */
             if(myData->toU2022State.g == 1) {
                 if(mySource < mySourceLimit) {
-                    char trailByte;
+                    int leadIsOk, trailIsOk;
+                    uint8_t trailByte;
 getTrailByte:
-                    trailByte = *mySource++;
-                    tempBuf[0] = (char)(mySourceChar + 0x80);
-                    tempBuf[1] = (char)(trailByte + 0x80);
-                    mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
-                    if((mySourceChar & 0x8080) == 0) {
+                    targetUniChar = missingCharMarker;
+                    trailByte = (uint8_t)*mySource;
+                    /*
+                     * Ticket 5691: consistent illegal sequences:
+                     * - We include at least the first byte in the illegal sequence.
+                     * - If any of the non-initial bytes could be the start of a character,
+                     *   we stop the illegal sequence before the first one of those.
+                     *
+                     * In ISO-2022 DBCS, if the second byte is in the 21..7e range or is
+                     * an ESC/SO/SI, we report only the first byte as the illegal sequence.
+                     * Otherwise we convert or report the pair of bytes.
+                     */
+                    leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                    trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21);
+                    if (leadIsOk && trailIsOk) {
+                        ++mySource;
+                        tempBuf[0] = (char)(mySourceChar + 0x80);
+                        tempBuf[1] = (char)(trailByte + 0x80);
                         targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, tempBuf, 2, useFallback);
-                    } else {
-                        /* illegal bytes > 0x7f */
-                        targetUniChar = missingCharMarker;
+                        mySourceChar = (mySourceChar << 8) | trailByte;
+                    } else if (!(trailIsOk || IS_2022_CONTROL(trailByte))) {
+                        /* report a pair of illegal bytes if the second byte is not a DBCS starter */
+                        ++mySource;
+                        /* add another bit so that the code below writes 2 bytes in case of error */
+                        mySourceChar = 0x10000 | (mySourceChar << 8) | trailByte;
                     }
                 } else {
                     args->converter->toUBytes[0] = (uint8_t)mySourceChar;
@@ -2601,8 +2680,10 @@
                     break;
                 }
             }
-            else{
+            else if(mySourceChar <= 0x7f) {
                 targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, mySource - 1, 1, useFallback);
+            } else {
+                targetUniChar = 0xffff;
             }
             if(targetUniChar < 0xfffe){
                 if(args->offsets) {
@@ -3099,6 +3180,7 @@
         /* continue with a partial double-byte character */
         mySourceChar = args->converter->toUBytes[0];
         args->converter->toULength = 0;
+        targetUniChar = missingCharMarker;
         goto getTrailByte;
     }
 
@@ -3178,29 +3260,50 @@
                         UConverterSharedData *cnv;
                         StateEnum tempState;
                         int32_t tempBufLen;
-                        char trailByte;
+                        int leadIsOk, trailIsOk;
+                        uint8_t trailByte;
 getTrailByte:
-                        trailByte = *mySource++;
-                        tempState = (StateEnum)pToU2022State->cs[pToU2022State->g];
-                        if(tempState > CNS_11643_0) {
-                            cnv = myData->myConverterArray[CNS_11643];
-                            tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0));
-                            tempBuf[1] = (char) (mySourceChar);
-                            tempBuf[2] = trailByte;
-                            tempBufLen = 3;
-
-                        }else{
-                            cnv = myData->myConverterArray[tempState];
-                            tempBuf[0] = (char) (mySourceChar);
-                            tempBuf[1] = trailByte;
-                            tempBufLen = 2;
+                        trailByte = (uint8_t)*mySource;
+                        /*
+                         * Ticket 5691: consistent illegal sequences:
+                         * - We include at least the first byte in the illegal sequence.
+                         * - If any of the non-initial bytes could be the start of a character,
+                         *   we stop the illegal sequence before the first one of those.
+                         *
+                         * In ISO-2022 DBCS, if the second byte is in the 21..7e range or is
+                         * an ESC/SO/SI, we report only the first byte as the illegal sequence.
+                         * Otherwise we convert or report the pair of bytes.
+                         */
+                        leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                        trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21);
+                        if (leadIsOk && trailIsOk) {
+                            ++mySource;
+                            tempState = (StateEnum)pToU2022State->cs[pToU2022State->g];
+                            if(tempState >= CNS_11643_0) {
+                                cnv = myData->myConverterArray[CNS_11643];
+                                tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0));
+                                tempBuf[1] = (char) (mySourceChar);
+                                tempBuf[2] = (char) trailByte;
+                                tempBufLen = 3;
+
+                            }else{
+                                cnv = myData->myConverterArray[tempState];
+                                tempBuf[0] = (char) (mySourceChar);
+                                tempBuf[1] = (char) trailByte;
+                                tempBufLen = 2;
+                            }
+                            targetUniChar = ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE);
+                            mySourceChar = (mySourceChar << 8) | trailByte;
+                        } else if (!(trailIsOk || IS_2022_CONTROL(trailByte))) {
+                            /* report a pair of illegal bytes if the second byte is not a DBCS starter */
+                            ++mySource;
+                            /* add another bit so that the code below writes 2 bytes in case of error */
+                            mySourceChar = 0x10000 | (mySourceChar << 8) | trailByte;
                         }
-                        mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
                         if(pToU2022State->g>=2) {
                             /* return from a single-shift state to the previous one */
                             pToU2022State->g=pToU2022State->prevG;
                         }
-                        targetUniChar = ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE);
                     } else {
                         args->converter->toUBytes[0] = (uint8_t)mySourceChar;
                         args->converter->toULength = 1;
diff -ru icu.6175/source/common/ucnvhz.c icu/source/common/ucnvhz.c
--- icu.6175/source/common/ucnvhz.c	2009-06-02 15:47:31.000000000 +0100
+++ icu/source/common/ucnvhz.c	2009-06-02 15:57:18.000000000 +0100
@@ -196,10 +196,30 @@
                      /* if the first byte is equal to TILDE and the trail byte
                      * is not a valid byte then it is an error condition
                      */
-                    mySourceChar = 0x7e00 | mySourceChar;
-                    targetUniChar = 0xffff;
+                    /*
+                     * Ticket 5691: consistent illegal sequences:
+                     * - We include at least the first byte in the illegal sequence.
+                     * - If any of the non-initial bytes could be the start of a character,
+                     *   we stop the illegal sequence before the first one of those.
+                     */
                     myData->isEmptySegment = FALSE; /* different error here, reset this to avoid spurious future error */
-                    break;
+                    *err = U_ILLEGAL_ESCAPE_SEQUENCE;
+                    args->converter->toUBytes[0] = UCNV_TILDE;
+                    if( myData->isStateDBCS ?
+                            (0x21 <= mySourceChar && mySourceChar <= 0x7e) :
+                            mySourceChar <= 0x7f
+                    ) {
+                        /* The current byte could be the start of a character: Back it out. */
+                        args->converter->toULength = 1;
+                        --mySource;
+                    } else {
+                        /* Include the current byte in the illegal sequence. */
+                        args->converter->toUBytes[1] = mySourceChar;
+                        args->converter->toULength = 2;
+                    }
+                    args->target = myTarget;
+                    args->source = mySource;
+                    return;
                 }
             } else if(myData->isStateDBCS) {
                 if(args->converter->toUnicodeStatus == 0x00){
@@ -215,19 +235,36 @@
                 }
                 else{
                     /* trail byte */
+                    int leadIsOk, trailIsOk;
                     uint32_t leadByte = args->converter->toUnicodeStatus & 0xff;
-                    if( (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21) &&
-                        (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21)
-                    ) {
+                    targetUniChar = 0xffff;
+                    /*
+                     * Ticket 5691: consistent illegal sequences:
+                     * - We include at least the first byte in the illegal sequence.
+                     * - If any of the non-initial bytes could be the start of a character,
+                     *   we stop the illegal sequence before the first one of those.
+                     *
+                     * In HZ DBCS, if the second byte is in the 21..7e range,
+                     * we report only the first byte as the illegal sequence.
+                     * Otherwise we convert or report the pair of bytes.
+                     */
+                    leadIsOk = (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21);
+                    trailIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                    if (leadIsOk && trailIsOk) {
                         tempBuf[0] = (char) (leadByte+0x80) ;
                         tempBuf[1] = (char) (mySourceChar+0x80);
                         targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->gbConverter->sharedData,
                             tempBuf, 2, args->converter->useFallback);
+                        mySourceChar= (leadByte << 8) | mySourceChar;
+                    } else if (trailIsOk) {
+                        /* report a single illegal byte and continue with the following DBCS starter byte */
+                        --mySource;
+                        mySourceChar = (int32_t)leadByte;
                     } else {
-                        targetUniChar = 0xffff;
+                        /* report a pair of illegal bytes if the second byte is not a DBCS starter */
+                        /* add another bit so that the code below writes 2 bytes in case of error */
+                        mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar;
                     }
-                    /* add another bit so that the code below writes 2 bytes in case of error */
-                    mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar;
                     args->converter->toUnicodeStatus =0x00;
                 }
             }
diff -ru icu.6175/source/common/ucnvmbcs.c icu/source/common/ucnvmbcs.c
--- icu.6175/source/common/ucnvmbcs.c	2009-06-02 15:47:31.000000000 +0100
+++ icu/source/common/ucnvmbcs.c	2009-06-02 15:56:07.000000000 +0100
@@ -1697,6 +1697,65 @@
     pArgs->offsets=offsets;
 }
 
+static UBool
+hasValidTrailBytes(const int32_t (*stateTable)[256], uint8_t state) {
+    const int32_t *row=stateTable[state];
+    int32_t b, entry;
+    /* First test for final entries in this state for some commonly valid byte values. */
+    entry=row[0xa1];
+    if( !MBCS_ENTRY_IS_TRANSITION(entry) &&
+        MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL
+    ) {
+        return TRUE;
+    }
+    entry=row[0x41];
+    if( !MBCS_ENTRY_IS_TRANSITION(entry) &&
+        MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL
+    ) {
+        return TRUE;
+    }
+    /* Then test for final entries in this state. */
+    for(b=0; b<=0xff; ++b) {
+        entry=row[b];
+        if( !MBCS_ENTRY_IS_TRANSITION(entry) &&
+            MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL
+        ) {
+            return TRUE;
+        }
+    }
+    /* Then recurse for transition entries. */
+    for(b=0; b<=0xff; ++b) {
+        entry=row[b];
+        if( MBCS_ENTRY_IS_TRANSITION(entry) &&
+            hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry))
+        ) {
+            return TRUE;
+        }
+    }
+    return FALSE;
+}
+
+/*
+ * Is byte b a single/lead byte in this state?
+ * Recurse for transition states, because here we don't want to say that
+ * b is a lead byte if all byte sequences that start with b are illegal.
+ */
+static UBool
+isSingleOrLead(const int32_t (*stateTable)[256], uint8_t state, UBool isDBCSOnly, uint8_t b) {
+    const int32_t *row=stateTable[state];
+    int32_t entry=row[b];
+    if(MBCS_ENTRY_IS_TRANSITION(entry)) {   /* lead byte */
+        return hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry));
+    } else {
+        uint8_t action=(uint8_t)(MBCS_ENTRY_FINAL_ACTION(entry));
+        if(action==MBCS_STATE_CHANGE_ONLY && isDBCSOnly) {
+            return FALSE;   /* SI/SO are illegal for DBCS-only conversion */
+        } else {
+            return action!=MBCS_STATE_ILLEGAL;
+        }
+    }
+}
+
 U_CFUNC void
 ucnv_MBCSToUnicodeWithOffsets(UConverterToUnicodeArgs *pArgs,
                           UErrorCode *pErrorCode) {
@@ -2052,6 +2111,34 @@
             sourceIndex=nextSourceIndex;
         } else if(U_FAILURE(*pErrorCode)) {
             /* callback(illegal) */
+            if(byteIndex>1) {
+                /*
+                 * Ticket 5691: consistent illegal sequences:
+                 * - We include at least the first byte in the illegal sequence.
+                 * - If any of the non-initial bytes could be the start of a character,
+                 *   we stop the illegal sequence before the first one of those.
+                 */
+                UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0);
+                int8_t i;
+                for(i=1;
+                    i<byteIndex && !isSingleOrLead(stateTable, state, isDBCSOnly, bytes[i]);
+                    ++i) {}
+                if(i<byteIndex) {
+                    /* Back out some bytes. */
+                    int8_t backOutDistance=byteIndex-i;
+                    int32_t bytesFromThisBuffer=(int32_t)(source-(const uint8_t *)pArgs->source);
+                    byteIndex=i;  /* length of reported illegal byte sequence */
+                    if(backOutDistance<=bytesFromThisBuffer) {
+                        source-=backOutDistance;
+                    } else {
+                        /* Back out bytes from the previous buffer: Need to replay them. */
+                        cnv->preToULength=(int8_t)(bytesFromThisBuffer-backOutDistance);
+                        /* preToULength is negative! */
+                        uprv_memcpy(cnv->preToU, bytes+i, -cnv->preToULength);
+                        source=(const uint8_t *)pArgs->source;
+                    }
+                }
+            }
             break;
         } else /* unassigned sequences indicated with byteIndex>0 */ {
             /* try an extension mapping */
@@ -2062,7 +2149,7 @@
                               &offsets, sourceIndex,
                               pArgs->flush,
                               pErrorCode);
-            sourceIndex=nextSourceIndex+(int32_t)(source-(const uint8_t *)pArgs->source);
+            sourceIndex=nextSourceIndex+=(int32_t)(source-(const uint8_t *)pArgs->source);
 
             if(U_FAILURE(*pErrorCode)) {
                 /* not mappable or buffer overflow */
@@ -2353,15 +2440,37 @@
 
     if(c<0) {
         if(U_SUCCESS(*pErrorCode) && source==sourceLimit && lastSource<source) {
-            *pErrorCode=U_TRUNCATED_CHAR_FOUND;
-        }
-        if(U_FAILURE(*pErrorCode)) {
             /* incomplete character byte sequence */
             uint8_t *bytes=cnv->toUBytes;
             cnv->toULength=(int8_t)(source-lastSource);
             do {
                 *bytes++=*lastSource++;
             } while(lastSource<source);
+            *pErrorCode=U_TRUNCATED_CHAR_FOUND;
+        } else if(U_FAILURE(*pErrorCode)) {
+            /* callback(illegal) */
+            /*
+             * Ticket 5691: consistent illegal sequences:
+             * - We include at least the first byte in the illegal sequence.
+             * - If any of the non-initial bytes could be the start of a character,
+             *   we stop the illegal sequence before the first one of those.
+             */
+            UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0);
+            uint8_t *bytes=cnv->toUBytes;
+            *bytes++=*lastSource++;     /* first byte */
+            if(lastSource==source) {
+                cnv->toULength=1;
+            } else /* lastSource<source: multi-byte character */ {
+                int8_t i;
+                for(i=1;
+                    lastSource<source && !isSingleOrLead(stateTable, state, isDBCSOnly, *lastSource);
+                    ++i
+                ) {
+                    *bytes++=*lastSource++;
+                }
+                cnv->toULength=i;
+                source=lastSource;
+            }
         } else {
             /* no output because of empty input or only state changes */
             *pErrorCode=U_INDEX_OUTOFBOUNDS_ERROR;
diff -ru icu.6175/source/test/cintltst/nccbtst.c icu/source/test/cintltst/nccbtst.c
--- icu.6175/source/test/cintltst/nccbtst.c	2009-06-02 15:47:18.000000000 +0100
+++ icu/source/test/cintltst/nccbtst.c	2009-06-02 15:47:38.000000000 +0100
@@ -2497,13 +2497,13 @@
 
 
     static const uint8_t text943[] = {
-        0x82, 0xa9, 0x82, 0x20, /*0xc8,*/  0x61, 0x8a, 0xbf, 0x8e, 0x9a };
-    static const UChar toUnicode943sub[] = { 0x304b, 0xfffd, /*0xff88,*/ 0x0061, 0x6f22,  0x5b57};
-    static const UChar toUnicode943skip[]= { 0x304b, /*0xff88,*/ 0x0061, 0x6f22,  0x5b57};
+        0x82, 0xa9, 0x82, 0x20, 0x61, 0x8a, 0xbf, 0x8e, 0x9a };
+    static const UChar toUnicode943sub[] = { 0x304b, 0x1a, 0x20, 0x0061, 0x6f22,  0x5b57 };
+    static const UChar toUnicode943skip[]= { 0x304b, 0x20, 0x0061, 0x6f22,  0x5b57 };
     static const UChar toUnicode943stop[]= { 0x304b};
 
-    static const int32_t  fromIBM943Offssub[]  = {0, 2, 4, 5, 7};
-    static const int32_t  fromIBM943Offsskip[] = { 0, 4, 5, 7};
+    static const int32_t  fromIBM943Offssub[]  = { 0, 2, 3, 4, 5, 7 };
+    static const int32_t  fromIBM943Offsskip[] = { 0, 3, 4, 5, 7 };
     static const int32_t  fromIBM943Offsstop[] = { 0};
 
     gInBufferSize = inputsize;
@@ -2537,9 +2537,9 @@
 {
     static const uint8_t sampleText[] = {
         0x82, 0xa9, 0x61, 0x62, 0x63 , 0x82,
-        0xff, /*0x82, 0xa9,*/ 0x32, 0x33};
-    static const UChar toUnicode943sub[] = {0x304b, 0x0061, 0x0062, 0x0063,  0xfffd,/*0x304b,*/ 0x0032, 0x0033};
-    static const int32_t  fromIBM943Offssub[]  = {0, 2, 3, 4, 5, 7, 8};
+        0xff, 0x32, 0x33};
+    static const UChar toUnicode943sub[] = { 0x304b, 0x0061, 0x0062, 0x0063, 0x1a, 0x1a, 0x0032, 0x0033 };
+    static const int32_t fromIBM943Offssub[] = { 0, 2, 3, 4, 5, 6, 7, 8 };
     /*checking illegal value for ibm-943 with substitute*/ 
     gInBufferSize = inputsize;
     gOutBufferSize = outputsize;
diff -ru icu.6175/source/test/cintltst/nucnvtst.c icu/source/test/cintltst/nucnvtst.c
--- icu.6175/source/test/cintltst/nucnvtst.c	2009-06-02 15:47:18.000000000 +0100
+++ icu/source/test/cintltst/nucnvtst.c	2009-06-02 15:47:38.000000000 +0100
@@ -2606,7 +2606,7 @@
     TestNextUCharError(cnv, source, source, U_INDEX_OUTOFBOUNDS_ERROR, "sourceLimit <= source");
     /*Test for the condition where there is an invalid character*/
     {
-        static const uint8_t source2[]={0xa1, 0x01};
+        static const uint8_t source2[]={0xa1, 0x80};
         TestNextUCharError(cnv, (const char*)source2, (const char*)source2+sizeof(source2), U_ZERO_ERROR, "an invalid character");
     }
     /*Test for the condition where we have a truncated char*/
@@ -3899,11 +3899,11 @@
 TestISO_2022_KR() {
     /* test input */
     static const uint16_t in[]={
-                    0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F66,0x9F67,0x9F6A,0x000A,0x000D
-                   ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC02,0xAC04
+                    0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F67,0x9F6A,0x000A,0x000D
+                   ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC04
                    ,0xAC07,0xAC08,0xAC09,0x0025,0x0026,0x0027,0x000A,0x000D,0x0028,0x0029
                    ,0x002A,0x002B,0x002C,0x002D,0x002E,0x53C3,0x53C8,0x53C9,0x53CA,0x53CB
-                   ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53DF,0x53E1,0x53E2
+                   ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53E1,0x53E2
                    ,0x53E3,0x53E4,0x000A,0x000D};
     const UChar* uSource;
     const UChar* uSourceLimit;
diff -ru icu.6175/source/test/testdata/conversion.txt icu/source/test/testdata/conversion.txt
--- icu.6175/source/test/testdata/conversion.txt	2009-06-02 15:47:18.000000000 +0100
+++ icu/source/test/testdata/conversion.txt	2009-06-02 15:57:41.000000000 +0100
@@ -48,12 +48,144 @@
     toUnicode {
       Headers { "charset", "bytes", "unicode", "offsets", "flush", "fallbacks", "errorCode", "callback", "invalidChars" }
       Cases {
+        // Test ticket 5691: consistent illegal sequences
+        // The following test cases are for illegal character byte sequences.
+        //
+        // Unfortunately, we cannot use the Shift-JIS examples from the ticket
+        // comments because our Shift-JIS table is Windows-compatible and
+        // therefore has no illegal single bytes. Same for GBK.
+        // Instead, we use the stricter GB 18030 also for 2-byte examples.
+        // The byte sequences are generally slightly different from the ticket
+        // comment, simply using assigned characters rather than just
+        // theoretically valid sequences.
+        {
+          "gb18030",
+          :bin{ 618140813c81ff7a },
+          "a\u4e02\\x81<\\x81\\xFFz",
+          :intvector{ 0,1,3,3,3,3,4,5,5,5,5,5,5,5,5,7 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "EUC-JP",
+          :bin{ 618fb0a98fb03c8f3cb0a97a },
+          "a\u4e28\\x8F\\xB0<\\x8F<\u9022z",
+          :intvector{ 0,1,4,4,4,4,5,5,5,5,6,7,7,7,7,8,9,11 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "gb18030",
+          :bin{ 618130fc318130fc8181303c3e813cfc817a },
+          "a5ed\\x810\u9f07\\x810<>\\x81<\u9f07z",
+          :intvector{ 0,1,5,5,5,5,6,7,9,9,9,9,10,11,12,13,13,13,13,14,15,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "UTF-8",
+          :bin{ 61f1808182f180813cf18081fff180ff3cf1ff3c3e7a },
+          "a\U00040042\\xF1\\x80\\x81<\\xF1\\x80\\x81\\xFF\\xF1\\x80\\xFF<\\xF1\\xFF<>z",
+          :intvector{ 0,1,1,5,5,5,5,5,5,5,5,5,5,5,5,8,9,9,9,9,9,9,9,9,9,9,9,9,12,12,12,12,13,13,13,13,13,13,13,13,15,15,15,15,16,17,17,17,17,18,18,18,18,19,20,21 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-JP",
+          :bin{ 1b24424141af4142affe41431b2842 },
+          "\u758f\\xAF\u758e\\xAF\\xFE\u790e",
+          :intvector{ 3,5,5,5,5,6,8,8,8,8,8,8,8,8,10 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ibm-25546",
+          :bin{ 411b242943420e4141af4142affe41430f5a },
+          "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-KR",
+          :bin{ 411b242943420e4141af4142affe41430f5a },
+          "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-CN",
+          :bin{ 411b242941420e4141af4142affe41430f5a },
+          "AB\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "HZ",
+          :bin{ 417e7b4141af4142affe41437e7d5a },
+          "A\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z",
+          :intvector{ 0,3,5,5,5,5,6,8,8,8,8,8,8,8,8,10,14 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        // Test ticket 5691: consistent illegal sequences
+        // The following test cases are for illegal escape/designator/shift sequences.
+        //
+        // ISO-2022-JP and -CN with illegal escape sequences.
+        {
+          "ISO-2022-JP",
+          :bin{ 611b24201b244241411b283f1b28427a },
+          "a\\x1B$ \u758f\\x1B\u2538z",
+          :intvector{ 0,1,1,1,1,2,3,7,9,9,9,9,10,15 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-CN",
+          :bin{ 611b2429201b2429410e41410f7a },
+          "a\\x1B$) \u4eaez",
+          :intvector{ 0,1,1,1,1,2,3,4,10,13 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        // Test ticket 5691: ISO-2022-JP-2 with illegal single-shift SS2 and SS3 sequences.
+        // The first ESC N comes before its designator sequence, the last sequence is ESC+space.
+        {
+          "ISO-2022-JP-2",
+          :bin{ 4e1b4e4e1b2e414e1b4e4e4e1b204e },
+          "N\\x1BNNN\xceN\\x1B N",
+          :intvector{ 0,1,1,1,1,2,3,7,10,11,12,12,12,12,13,14 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-CN-EXT",
+          :bin{ 4e1b4e4e1b242a484e1b4e4e4e4e1b204e },
+          "N\\x1BNNN\u8f0eN\\x1B N",
+          :intvector{ 0,1,1,1,1,2,3,8,11,13,14,14,14,14,15,16 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-CN-EXT",
+          :bin{ 4f1b4f4f1b242b494f1b4f4f4f4f1b204f },
+          "O\\x1BOOO\u492bO\\x1B O",
+          :intvector{ 0,1,1,1,1,2,3,8,11,13,14,14,14,14,15,16 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        // Test ticket 5691: HZ with illegal tilde sequences.
+        {
+          "HZ",
+          :bin{ 417e20427e21437e80447e7b41417e207e41427e7f41437e7d5a },
+          "A\\x7E B\\x7E!C\\x7E\\x80D\u4eae\\x7E\\x20\\x7E\u8c05\\x7E\\x7F\u64a9Z",
+          :intvector{ 0,1,1,1,1,2,3,4,4,4,4,5,6,7,7,7,7,7,7,7,7,9,                          // SBCS
+                      12,14,14,14,14,14,14,14,14,16,16,16,16,17,19,19,19,19,19,19,19,19,21, // DBCS
+                      25 },                                                                 // SBCS
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        // Test ticket 5691: Example from Peter Edberg.
+        {
+          "ISO-2022-JP",
+          :bin{ 1b244230212f7e742630801b284a621b2458631b2842648061 },
+          "\u4e9c\ufffd\u7199\ufffdb\ufffd$Xcd\ufffda",
+          :intvector{ 3,5,7,9,14,15,16,17,18,22,23,24 },
+          :int{1}, :int{0}, "", "?", :bin{""}
+        }
         // test that HZ limits its byte values to lead bytes 21..7d and trail bytes 21..7e
         {
           "HZ",
-          :bin{ 7e7b21212120217e217f772100007e217e7d207e7e807e0a2b },
-          "\u3000\ufffd\u3013\ufffd\u9ccc\ufffd\ufffd ~\ufffd+",
-          :intvector{ 2,4,6,8,10,12,14,18,19,21,24 },
+          :bin{ 7e7b21212120217e217f772100007e217e7e7d207e7e807e0a2b },
+          "\u3000\ufffd\u3013\ufffd\u9ccc\ufffd\ufffd\u3013 ~\ufffd+",
+          :intvector{ 2,4,6,8,10,12,14,15,19,20,22,25 },
           :int{1}, :int{1}, "", "?", :bin{""}
         }
         // improve coverage of ISO-2022-JP converter with hardcoded JIS X 0201 and
@@ -61,8 +193,8 @@
         {
           "ISO-2022-JP",
           :bin{ 1b284a7d7e801b2442306c20217f7e21202160217f22202225227f5f211b2842 },
-          "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\u25b2\ufffd\u6f3e",
-          :intvector{ 3,4,5,9,11,13,15,17,19,21,23,25,27 },
+          "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\u25b2\ufffd\u6f3e",
+          :intvector{ 3,4,5,9,11,12,14,16,17,19,21,23,25,27 },
           :int{1}, :int{1}, "", "?", :bin{""}
         }
         // improve coverage of unrolled loops in ucnvmbcs.c/ucnv_MBCSSingleToBMPWithOffsets()
@@ -341,7 +473,7 @@
         {
           "ISO-2022-CN-EXT",
           :bin{ 411b4e2121 }, "\x41", :intvector{ 0 },
-          :int{1}, :int{1}, "illesc", ".", :bin{ 1b4e }
+          :int{1}, :int{1}, "illesc", ".", :bin{ 1b }
         }
         // G3 designator: recognized, but not supported for -CN (only for -CN-EXT)
         {

Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#534590; Package icu. (Tue, 25 Aug 2009 20:21:03 GMT) (full text, mbox, link).

Acknowledgement sent to Jay Berkenbilt <qjb@debian.org>:
Extra info received and forwarded to list. (Tue, 25 Aug 2009 20:21:03 GMT) (full text, mbox, link).

Message #15 received at 534590@bugs.debian.org (full text, mbox, reply):

(Sorry if this is a duplicate -- I sent this yesterday but it still
hasn't shown up on the bug in the BTS, so I don't know whether it got
out.)

The reporter of this ICU security bug that impacts oldstable and stable
but not testing or unstable was kind enough to refer to the Red Hat
bugzilla entry for this.  The people at Red Hat have backported the
security fix to 3.6 (our oldstable) and 3.8 (our stable) versions of ICU
(which appear in RHEL5 and Fedora 9).  I have grabbed their SRPMs for
the patched versions and extracted the patches that apply to the 3.6 and
3.8 versions.  Attached here are the patches directly from those source
RPMs, not modified in any way or tested for debian.  I can integrate
these into the debian packages prepare uploads for stable security and
oldstable security, or I can defer to the security team to do the
integration.  Just let me know.  It may be several days before I have a
chance to work on it, but I have prepared stable security uploads for my
packages before.  I am grateful to Red Hat for doing the work of
backporting to the older ICU versions.

-- 
Jay Berkenbilt <qjb@debian.org>

[2. text/x-patch; icu-3.8-CVE-2009-0153.patch]

diff -ru icu.6175/source/common/ucnv2022.c icu/source/common/ucnv2022.c
--- icu.6175/source/common/ucnv2022.c	2009-06-11 13:44:44.000000000 +0100
+++ icu/source/common/ucnv2022.c	2009-06-11 18:09:48.000000000 +0100
@@ -1973,6 +1973,7 @@
         mySourceChar = args->converter->toUBytes[0];
         args->converter->toULength = 0;
         cs = (StateEnum)pToU2022State->cs[pToU2022State->g];
+        targetUniChar = missingCharMarker;
         goto getTrailByte;
     }
 
@@ -2102,18 +2103,45 @@
                 default:
                     /* G0 DBCS */
                     if(mySource < mySourceLimit) {
+                        int leadIsOk, trailIsOk;
                         char trailByte;
 getTrailByte:
-                        trailByte = *mySource++;
-                        if(cs == JISX208) {
-                            _2022ToSJIS((uint8_t)mySourceChar, (uint8_t)trailByte, tempBuf);
-                        } else {
-                            tempBuf[0] = (char)mySourceChar;
-                            tempBuf[1] = trailByte;
-                        }
-                        mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
-                        targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE);
-                    } else {
+                        trailByte = *mySource;
+                        /*
+                         * Ticket 5691: consistent illegal sequences:
+                         * - We include at least the first byte in the illegal sequence.
+                         * - If any of the non-initial bytes could be the start of a character,
+                         *   we stop the illegal sequence before the first one of those.
+                         *
+                         * In ISO-2022 DBCS, if both bytes are valid or both bytes are outside
+                         * the 21..7e range, then we treat them as a pair.
+                         * Otherwise (valid lead byte + illegal trail byte, or vice versa)
+                         * we report only the first byte as the illegal sequence.
+                         */
+                        leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                        trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21);
+                        if (leadIsOk == trailIsOk) {
+                            ++mySource;
+                            uint32_t tmpSourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
+                            if (leadIsOk) {
+                                if(cs == JISX208) {
+                                    _2022ToSJIS((uint8_t)mySourceChar, (uint8_t)trailByte, tempBuf);
+                                    mySourceChar = tmpSourceChar;
+                                } else {
+                                    /* Copy before we modify tmpSourceChar so toUnicodeCallback() sees the correct bytes. */
+                                    mySourceChar = tmpSourceChar;
+                                    if (cs == KSC5601) {
+                                        tmpSourceChar += 0x8080;  /* = _2022ToGR94DBCS(tmpSourceChar) */
+                                    }
+                                    tempBuf[0] = (char)(tmpSourceChar >> 8);
+                                    tempBuf[1] = (char)(tmpSourceChar);
+                                }
+                                targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE);
+                            } else {
+                                mySourceChar = tmpSourceChar;
+                            }
+                          }
+                      } else {
                         args->converter->toUBytes[0] = (uint8_t)mySourceChar;
                         args->converter->toULength = 1;
                         goto endloop;
@@ -2254,7 +2282,12 @@
             }
             /* only DBCS or SBCS characters are expected*/
             /* DB characters with high bit set to 1 are expected */
-            if(length > 2 || length==0 ||(((targetByteUnit & 0x8080) != 0x8080)&& length==2)){
+            if( length > 2 || length==0 ||
+                (length == 1 && targetByteUnit > 0x7f) ||
+                (length == 2 &&
+                    ((uint16_t)(targetByteUnit - 0xa1a1) > (0xfefe - 0xa1a1) ||
+                    (uint8_t)(targetByteUnit - 0xa1) > (0xfe - 0xa1)))
+            ) {
                 targetByteUnit=missingCharMarker;
             }
             if (targetByteUnit != missingCharMarker){
@@ -2583,17 +2616,36 @@
             myData->isEmptySegment = FALSE;	/* Any invalid char errors will be detected separately, so just reset this */
             if(myData->toU2022State.g == 1) {
                 if(mySource < mySourceLimit) {
+                    int leadIsOk, trailIsOk;
                     char trailByte;
 getTrailByte:
-                    trailByte = *mySource++;
-                    tempBuf[0] = (char)(mySourceChar + 0x80);
-                    tempBuf[1] = (char)(trailByte + 0x80);
-                    mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
-                    if((mySourceChar & 0x8080) == 0) {
-                        targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, tempBuf, 2, useFallback);
+                    targetUniChar = missingCharMarker;
+                    trailByte = *mySource;
+                    /*
+                     * Ticket 5691: consistent illegal sequences:
+                     * - We include at least the first byte in the illegal sequence.
+                     * - If any of the non-initial bytes could be the start of a character,
+                     *   we stop the illegal sequence before the first one of those.
+                     *
+                     * In ISO-2022 DBCS, if both bytes are valid or both bytes are outside
+                     * the 21..7e range, then we treat them as a pair.
+                     * Otherwise (valid lead byte + illegal trail byte, or vice versa)
+                     * we report only the first byte as the illegal sequence.
+                     */
+                    leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                    trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21);
+                    if (leadIsOk == trailIsOk) {
+                        ++mySource;
+                        if (leadIsOk) {
+                            tempBuf[0] = (char)(mySourceChar + 0x80);
+                            tempBuf[1] = (char)(trailByte + 0x80);
+                            targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, tempBuf, 2, useFallback);
+                        } else {
+                            leadIsOk = TRUE; /* TODO: remove */
+                        }
+                        mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
                     } else {
-                        /* illegal bytes > 0x7f */
-                        targetUniChar = missingCharMarker;
+                        trailIsOk = TRUE; /* TODO: remove */
                     }
                 } else {
                     args->converter->toUBytes[0] = (uint8_t)mySourceChar;
@@ -2601,8 +2653,10 @@
                     break;
                 }
             }
-            else{
+            else if(mySourceChar <= 0x7f) {
                 targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, mySource - 1, 1, useFallback);
+            } else {
+                targetUniChar = 0xffff;
             }
             if(targetUniChar < 0xfffe){
                 if(args->offsets) {
@@ -3099,6 +3153,7 @@
         /* continue with a partial double-byte character */
         mySourceChar = args->converter->toUBytes[0];
         args->converter->toULength = 0;
+        targetUniChar = missingCharMarker;
         goto getTrailByte;
     }
 
@@ -3178,29 +3233,48 @@
                         UConverterSharedData *cnv;
                         StateEnum tempState;
                         int32_t tempBufLen;
+                        int leadIsOk, trailIsOk;
                         char trailByte;
 getTrailByte:
-                        trailByte = *mySource++;
-                        tempState = (StateEnum)pToU2022State->cs[pToU2022State->g];
-                        if(tempState > CNS_11643_0) {
-                            cnv = myData->myConverterArray[CNS_11643];
-                            tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0));
-                            tempBuf[1] = (char) (mySourceChar);
-                            tempBuf[2] = trailByte;
-                            tempBufLen = 3;
-
-                        }else{
-                            cnv = myData->myConverterArray[tempState];
-                            tempBuf[0] = (char) (mySourceChar);
-                            tempBuf[1] = trailByte;
-                            tempBufLen = 2;
+                        trailByte = *mySource;
+                        /*
+                         * Ticket 5691: consistent illegal sequences:
+                         * - We include at least the first byte in the illegal sequence.
+                         * - If any of the non-initial bytes could be the start of a character,
+                         *   we stop the illegal sequence before the first one of those.
+                         *
+                         * In ISO-2022 DBCS, if both bytes are valid or both bytes are outside
+                         * the 21..7e range, then we treat them as a pair.
+                         * Otherwise (valid lead byte + illegal trail byte, or vice versa)
+                         * we report only the first byte as the illegal sequence.
+                         */
+                        leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                        trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21);
+                        if (leadIsOk == trailIsOk) {
+                            ++mySource;
+                            if (leadIsOk) {
+                                tempState = (StateEnum)pToU2022State->cs[pToU2022State->g];
+                                if(tempState >= CNS_11643_0) {
+                                    cnv = myData->myConverterArray[CNS_11643];
+                                    tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0));
+                                    tempBuf[1] = (char) (mySourceChar);
+                                    tempBuf[2] = trailByte;
+                                    tempBufLen = 3;
+
+                                }else{
+                                    cnv = myData->myConverterArray[tempState];
+                                    tempBuf[0] = (char) (mySourceChar);
+                                    tempBuf[1] = trailByte;
+                                    tempBufLen = 2;
+                                }
+                                targetUniChar = ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE);
+                            }
+                            mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
                         }
-                        mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
                         if(pToU2022State->g>=2) {
                             /* return from a single-shift state to the previous one */
                             pToU2022State->g=pToU2022State->prevG;
                         }
-                        targetUniChar = ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE);
                     } else {
                         args->converter->toUBytes[0] = (uint8_t)mySourceChar;
                         args->converter->toULength = 1;
diff -ru icu.6175/source/common/ucnvhz.c icu/source/common/ucnvhz.c
--- icu.6175/source/common/ucnvhz.c	2009-06-11 13:44:44.000000000 +0100
+++ icu/source/common/ucnvhz.c	2009-06-11 18:05:36.000000000 +0100
@@ -215,19 +215,35 @@
                 }
                 else{
                     /* trail byte */
+                    int leadIsOk, trailIsOk;
                     uint32_t leadByte = args->converter->toUnicodeStatus & 0xff;
-                    if( (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21) &&
-                        (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21)
-                    ) {
-                        tempBuf[0] = (char) (leadByte+0x80) ;
-                        tempBuf[1] = (char) (mySourceChar+0x80);
-                        targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->gbConverter->sharedData,
-                            tempBuf, 2, args->converter->useFallback);
+                    targetUniChar = 0xffff;
+                    /*
+                     * Ticket 5691: consistent illegal sequences:
+                     * - We include at least the first byte in the illegal sequence.
+                     * - If any of the non-initial bytes could be the start of a character,
+                     *   we stop the illegal sequence before the first one of those.
+                     *
+                     * In HZ DBCS, if both bytes are valid or both bytes are outside
+                     * the 21..7d/7e range, then we treat them as a pair.
+                     * Otherwise (valid lead byte + illegal trail byte, or vice versa)
+                     * we report only the first byte as the illegal sequence.
+                     */
+                    leadIsOk = (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21);
+                    trailIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                    if (leadIsOk == trailIsOk) {
+                        if (leadIsOk) {
+                            tempBuf[0] = (char) (leadByte+0x80) ;
+                            tempBuf[1] = (char) (mySourceChar+0x80);
+                            targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->gbConverter->sharedData,
+                                tempBuf, 2, args->converter->useFallback);
+                        }
+                        /* add another bit so that the code below writes 2 bytes in case of error */
+                        mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar;
                     } else {
-                        targetUniChar = 0xffff;
+                        --mySource;
+                        mySourceChar = (int32_t)leadByte;
                     }
-                    /* add another bit so that the code below writes 2 bytes in case of error */
-                    mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar;
                     args->converter->toUnicodeStatus =0x00;
                 }
             }
diff -ru icu.6175/source/common/ucnvmbcs.c icu/source/common/ucnvmbcs.c
--- icu.6175/source/common/ucnvmbcs.c	2009-06-11 13:44:44.000000000 +0100
+++ icu/source/common/ucnvmbcs.c	2009-06-11 18:05:36.000000000 +0100
@@ -1,7 +1,7 @@
 /*
 ******************************************************************************
 *
-*   Copyright (C) 2000-2007, International Business Machines
+*   Copyright (C) 2000-2008, International Business Machines
 *   Corporation and others.  All Rights Reserved.
 *
 ******************************************************************************
@@ -1791,6 +1791,65 @@
     pArgs->offsets=offsets;
 }
 
+static UBool
+hasValidTrailBytes(const int32_t (*stateTable)[256], uint8_t state) {
+    const int32_t *row=stateTable[state];
+    int32_t b, entry;
+    /* First test for final entries in this state for some commonly valid byte values. */
+    entry=row[0xa1];
+    if( !MBCS_ENTRY_IS_TRANSITION(entry) &&
+        MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL
+    ) {
+        return TRUE;
+    }
+    entry=row[0x41];
+    if( !MBCS_ENTRY_IS_TRANSITION(entry) &&
+        MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL
+    ) {
+        return TRUE;
+    }
+    /* Then test for final entries in this state. */
+    for(b=0; b<=0xff; ++b) {
+        entry=row[b];
+        if( !MBCS_ENTRY_IS_TRANSITION(entry) &&
+            MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL
+        ) {
+            return TRUE;
+        }
+    }
+    /* Then recurse for transition entries. */
+    for(b=0; b<=0xff; ++b) {
+        entry=row[b];
+        if( MBCS_ENTRY_IS_TRANSITION(entry) &&
+            hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry))
+        ) {
+            return TRUE;
+        }
+    }
+    return FALSE;
+}
+
+/*
+ * Is byte b a single/lead byte in this state?
+ * Recurse for transition states, because here we don't want to say that
+ * b is a lead byte if all byte sequences that start with b are illegal.
+ */
+static UBool
+isSingleOrLead(const int32_t (*stateTable)[256], uint8_t state, UBool isDBCSOnly, uint8_t b) {
+    const int32_t *row=stateTable[state];
+    int32_t entry=row[b];
+    if(MBCS_ENTRY_IS_TRANSITION(entry)) {   /* lead byte */
+        return hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry));
+    } else {
+        uint8_t action=(uint8_t)(MBCS_ENTRY_FINAL_ACTION(entry));
+        if(action==MBCS_STATE_CHANGE_ONLY && isDBCSOnly) {
+            return FALSE;   /* SI/SO are illegal for DBCS-only conversion */
+        } else {
+            return action!=MBCS_STATE_ILLEGAL;
+        }
+    }
+}
+
 U_CFUNC void
 ucnv_MBCSToUnicodeWithOffsets(UConverterToUnicodeArgs *pArgs,
                           UErrorCode *pErrorCode) {
@@ -2146,6 +2205,34 @@
             sourceIndex=nextSourceIndex;
         } else if(U_FAILURE(*pErrorCode)) {
             /* callback(illegal) */
+            if(byteIndex>1) {
+                /*
+                 * Ticket 5691: consistent illegal sequences:
+                 * - We include at least the first byte in the illegal sequence.
+                 * - If any of the non-initial bytes could be the start of a character,
+                 *   we stop the illegal sequence before the first one of those.
+                 */
+                UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0);
+                int8_t i;
+                for(i=1;
+                    i<byteIndex && !isSingleOrLead(stateTable, state, isDBCSOnly, bytes[i]);
+                    ++i) {}
+                if(i<byteIndex) {
+                    /* Back out some bytes. */
+                    int8_t backOutDistance=byteIndex-i;
+                    int32_t bytesFromThisBuffer=(int32_t)(source-(const uint8_t *)pArgs->source);
+                    byteIndex=i;  /* length of reported illegal byte sequence */
+                    if(backOutDistance<=bytesFromThisBuffer) {
+                        source-=backOutDistance;
+                    } else {
+                        /* Back out bytes from the previous buffer: Need to replay them. */
+                        cnv->preToULength=(int8_t)(bytesFromThisBuffer-backOutDistance);
+                        /* preToULength is negative! */
+                        uprv_memcpy(cnv->preToU, bytes+i, -cnv->preToULength);
+                        source=(const uint8_t *)pArgs->source;
+                    }
+                }
+            }
             break;
         } else /* unassigned sequences indicated with byteIndex>0 */ {
             /* try an extension mapping */
@@ -2156,6 +2243,7 @@
                               &offsets, sourceIndex,
                               pArgs->flush,
                               pErrorCode);
+            /* TODO: nextSourceIndex+=diff instead of nextSourceIndex+diff ?? */
             sourceIndex=nextSourceIndex+(int32_t)(source-(const uint8_t *)pArgs->source);
 
             if(U_FAILURE(*pErrorCode)) {
@@ -2447,15 +2535,37 @@
 
     if(c<0) {
         if(U_SUCCESS(*pErrorCode) && source==sourceLimit && lastSource<source) {
-            *pErrorCode=U_TRUNCATED_CHAR_FOUND;
-        }
-        if(U_FAILURE(*pErrorCode)) {
             /* incomplete character byte sequence */
             uint8_t *bytes=cnv->toUBytes;
             cnv->toULength=(int8_t)(source-lastSource);
             do {
                 *bytes++=*lastSource++;
             } while(lastSource<source);
+            *pErrorCode=U_TRUNCATED_CHAR_FOUND;
+        } else if(U_FAILURE(*pErrorCode)) {
+            /* callback(illegal) */
+            /*
+             * Ticket 5691: consistent illegal sequences:
+             * - We include at least the first byte in the illegal sequence.
+             * - If any of the non-initial bytes could be the start of a character,
+             *   we stop the illegal sequence before the first one of those.
+             */
+            UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0);
+            uint8_t *bytes=cnv->toUBytes;
+            *bytes++=*lastSource++;     /* first byte */
+            if(lastSource==source) {
+                cnv->toULength=1;
+            } else /* lastSource<source: multi-byte character */ {
+                int8_t i;
+                for(i=1;
+                    lastSource<source && !isSingleOrLead(stateTable, state, isDBCSOnly, *lastSource);
+                    ++i
+                ) {
+                    *bytes++=*lastSource++;
+                }
+                cnv->toULength=i;
+                source=lastSource;
+            }
         } else {
             /* no output because of empty input or only state changes */
             *pErrorCode=U_INDEX_OUTOFBOUNDS_ERROR;
diff -ru icu.6175/source/test/cintltst/nccbtst.c icu/source/test/cintltst/nccbtst.c
--- icu.6175/source/test/cintltst/nccbtst.c	2009-06-11 13:44:44.000000000 +0100
+++ icu/source/test/cintltst/nccbtst.c	2009-06-11 18:05:36.000000000 +0100
@@ -2497,13 +2497,13 @@
 
 
     static const uint8_t text943[] = {
-        0x82, 0xa9, 0x82, 0x20, /*0xc8,*/  0x61, 0x8a, 0xbf, 0x8e, 0x9a };
-    static const UChar toUnicode943sub[] = { 0x304b, 0xfffd, /*0xff88,*/ 0x0061, 0x6f22,  0x5b57};
-    static const UChar toUnicode943skip[]= { 0x304b, /*0xff88,*/ 0x0061, 0x6f22,  0x5b57};
+        0x82, 0xa9, 0x82, 0x20, 0x61, 0x8a, 0xbf, 0x8e, 0x9a };
+    static const UChar toUnicode943sub[] = { 0x304b, 0x1a, 0x20, 0x0061, 0x6f22,  0x5b57 };
+    static const UChar toUnicode943skip[]= { 0x304b, 0x20, 0x0061, 0x6f22,  0x5b57 };
     static const UChar toUnicode943stop[]= { 0x304b};
 
-    static const int32_t  fromIBM943Offssub[]  = {0, 2, 4, 5, 7};
-    static const int32_t  fromIBM943Offsskip[] = { 0, 4, 5, 7};
+    static const int32_t  fromIBM943Offssub[]  = { 0, 2, 3, 4, 5, 7 };
+    static const int32_t  fromIBM943Offsskip[] = { 0, 3, 4, 5, 7 };
     static const int32_t  fromIBM943Offsstop[] = { 0};
 
     gInBufferSize = inputsize;
@@ -2537,9 +2537,9 @@
 {
     static const uint8_t sampleText[] = {
         0x82, 0xa9, 0x61, 0x62, 0x63 , 0x82,
-        0xff, /*0x82, 0xa9,*/ 0x32, 0x33};
-    static const UChar toUnicode943sub[] = {0x304b, 0x0061, 0x0062, 0x0063,  0xfffd,/*0x304b,*/ 0x0032, 0x0033};
-    static const int32_t  fromIBM943Offssub[]  = {0, 2, 3, 4, 5, 7, 8};
+        0xff, 0x32, 0x33};
+    static const UChar toUnicode943sub[] = { 0x304b, 0x0061, 0x0062, 0x0063, 0x1a, 0x1a, 0x0032, 0x0033 };
+    static const int32_t fromIBM943Offssub[] = { 0, 2, 3, 4, 5, 6, 7, 8 };
     /*checking illegal value for ibm-943 with substitute*/ 
     gInBufferSize = inputsize;
     gOutBufferSize = outputsize;
diff -ru icu.6175/source/test/cintltst/nucnvtst.c icu/source/test/cintltst/nucnvtst.c
--- icu.6175/source/test/cintltst/nucnvtst.c	2009-06-11 13:44:44.000000000 +0100
+++ icu/source/test/cintltst/nucnvtst.c	2009-06-11 18:05:36.000000000 +0100
@@ -2608,7 +2608,7 @@
     TestNextUCharError(cnv, source, source, U_INDEX_OUTOFBOUNDS_ERROR, "sourceLimit <= source");
     /*Test for the condition where there is an invalid character*/
     {
-        static const uint8_t source2[]={0xa1, 0x01};
+        static const uint8_t source2[]={0xa1, 0x80};
         TestNextUCharError(cnv, (const char*)source2, (const char*)source2+sizeof(source2), U_ZERO_ERROR, "an invalid character");
     }
     /*Test for the condition where we have a truncated char*/
@@ -3901,11 +3901,11 @@
 TestISO_2022_KR() {
     /* test input */
     static const uint16_t in[]={
-                    0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F66,0x9F67,0x9F6A,0x000A,0x000D
-                   ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC02,0xAC04
+                    0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F67,0x9F6A,0x000A,0x000D
+                   ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC04
                    ,0xAC07,0xAC08,0xAC09,0x0025,0x0026,0x0027,0x000A,0x000D,0x0028,0x0029
                    ,0x002A,0x002B,0x002C,0x002D,0x002E,0x53C3,0x53C8,0x53C9,0x53CA,0x53CB
-                   ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53DF,0x53E1,0x53E2
+                   ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53E1,0x53E2
                    ,0x53E3,0x53E4,0x000A,0x000D};
     const UChar* uSource;
     const UChar* uSourceLimit;
diff -ru icu.6175/source/test/testdata/conversion.txt icu/source/test/testdata/conversion.txt
--- icu.6175/source/test/testdata/conversion.txt	2009-06-11 13:44:44.000000000 +0100
+++ icu/source/test/testdata/conversion.txt	2009-06-11 18:05:36.000000000 +0100
@@ -48,12 +48,83 @@
     toUnicode {
       Headers { "charset", "bytes", "unicode", "offsets", "flush", "fallbacks", "errorCode", "callback", "invalidChars" }
       Cases {
+        // Test ticket 5691: consistent illegal sequences
+        // Unfortunately, we cannot use the Shift-JIS examples from the ticket
+        // comments because our Shift-JIS table is Windows-compatible and
+        // therefore has no illegal single bytes. Same for GBK.
+        // Instead, we use the stricter GB 18030 also for 2-byte examples.
+        // The byte sequences are generally slightly different from the ticket
+        // comment, simply using assigned characters rather than just
+        // theoretically valid sequences.
+        {
+          "gb18030",
+          :bin{ 618140813c81ff7a },
+          "a\u4e02\\x81<\\x81\\xFFz",
+          :intvector{ 0,1,3,3,3,3,4,5,5,5,5,5,5,5,5,7 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "EUC-JP",
+          :bin{ 618fb0a98fb03c8f3cb0a97a },
+          "a\u4e28\\x8F\\xB0<\\x8F<\u9022z",
+          :intvector{ 0,1,4,4,4,4,5,5,5,5,6,7,7,7,7,8,9,11 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "gb18030",
+          :bin{ 618130fc318130fc8181303c3e813cfc817a },
+          "a5ed\\x810\u9f07\\x810<>\\x81<\u9f07z",
+          :intvector{ 0,1,5,5,5,5,6,7,9,9,9,9,10,11,12,13,13,13,13,14,15,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "UTF-8",
+          :bin{ 61f1808182f180813cf18081fff180ff3cf1ff3c3e7a },
+          "a\U00040042\\xF1\\x80\\x81<\\xF1\\x80\\x81\\xFF\\xF1\\x80\\xFF<\\xF1\\xFF<>z",
+          :intvector{ 0,1,1,5,5,5,5,5,5,5,5,5,5,5,5,8,9,9,9,9,9,9,9,9,9,9,9,9,12,12,12,12,13,13,13,13,13,13,13,13,15,15,15,15,16,17,17,17,17,18,18,18,18,19,20,21 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-JP-2",
+          :bin{ 1b24424141af4142affe41431b2842 },
+          "\u758f\\xAF\u758e\\xAF\\xFE\u790e",
+          :intvector{ 3,5,5,5,5,6,8,8,8,8,8,8,8,8,10 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ibm-25546",
+          :bin{ 411b242943420e4141af4142affe41430f5a },
+          "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-KR",
+          :bin{ 411b242943420e4141af4142affe41430f5a },
+          "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-CN",
+          :bin{ 411b242941420e4141af4142affe41430f5a },
+          "AB\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "HZ",
+          :bin{ 417e7b4141af4142affe41437e7d5a },
+          "A\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z",
+          :intvector{ 0,3,5,5,5,5,6,8,8,8,8,8,8,8,8,10,14 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
         // test that HZ limits its byte values to lead bytes 21..7d and trail bytes 21..7e
         {
           "HZ",
           :bin{ 7e7b21212120217e217f772100007e217e7d207e7e807e0a2b },
-          "\u3000\ufffd\u3013\ufffd\u9ccc\ufffd\ufffd ~\ufffd+",
-          :intvector{ 2,4,6,8,10,12,14,18,19,21,24 },
+          "\u3000\ufffd\ufffd\u3013\ufffd\ufffd\u9ccc\ufffd\ufffd ~\ufffd+",
+          :intvector{ 2,4,5,6,8,9,10,12,14,18,19,21,24 },
           :int{1}, :int{1}, "", "?", :bin{""}
         }
         // improve coverage of ISO-2022-JP converter with hardcoded JIS X 0201 and
@@ -61,8 +132,8 @@
         {
           "ISO-2022-JP",
           :bin{ 1b284a7d7e801b2442306c20217f7e21202160217f22202225227f5f211b2842 },
-          "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\u25b2\ufffd\u6f3e",
-          :intvector{ 3,4,5,9,11,13,15,17,19,21,23,25,27 },
+          "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\ufffd\ufffd\u25b2\ufffd\ufffd\u6f3e",
+          :intvector{ 3,4,5,9,11,12,13,14,16,17,19,20,21,22,23,25,26,27 },
           :int{1}, :int{1}, "", "?", :bin{""}
         }
         // improve coverage of unrolled loops in ucnvmbcs.c/ucnv_MBCSSingleToBMPWithOffsets()
[3. text/x-patch; icu-3.6-CVE-2009-0153.patch]

diff -ru icu.6175/source/common/ucnv2022.c icu/source/common/ucnv2022.c
--- icu.6175/source/common/ucnv2022.c	2009-06-02 15:47:31.000000000 +0100
+++ icu/source/common/ucnv2022.c	2009-06-02 16:03:15.000000000 +0100
@@ -754,6 +754,7 @@
     UConverterDataISO2022* myData2022 = ((UConverterDataISO2022*)_this->extraInfo);
     uint32_t key = myData2022->key;
     int32_t offset = 0;
+    int8_t initialToULength = _this->toULength;
     char c;
 
     value = VALID_NON_TERMINAL_2022;
@@ -806,7 +807,6 @@
         return;
     } else if (value == INVALID_2022 ) {
         *err = U_ILLEGAL_ESCAPE_SEQUENCE;
-        return;
     } else /* value == VALID_TERMINAL_2022 */ {
         switch(var){
 #ifdef U_ENABLE_GENERIC_ISO_2022
@@ -938,6 +938,35 @@
     }
     if(U_SUCCESS(*err)) {
         _this->toULength = 0;
+    } else if(*err==U_ILLEGAL_ESCAPE_SEQUENCE) {
+        if(_this->toULength>1) {
+            /*
+             * Ticket 5691: consistent illegal sequences:
+             * - We include at least the first byte (ESC) in the illegal sequence.
+             * - If any of the non-initial bytes could be the start of a character,
+             *   we stop the illegal sequence before the first one of those.
+             *   In escape sequences, all following bytes are "printable", that is,
+             *   unless they are completely illegal (>7f in SBCS, outside 21..7e in DBCS),
+             *   they are valid single/lead bytes.
+             *   For simplicity, we always only report the initial ESC byte as the
+             *   illegal sequence and back out all other bytes we looked at.
+             */
+            /* Back out some bytes. */
+            int8_t backOutDistance=_this->toULength-1;
+            int8_t bytesFromThisBuffer=_this->toULength-initialToULength;
+            if(backOutDistance<=bytesFromThisBuffer) {
+                /* same as initialToULength<=1 */
+                *source-=backOutDistance;
+            } else {
+                /* Back out bytes from the previous buffer: Need to replay them. */
+                _this->preToULength=(int8_t)(bytesFromThisBuffer-backOutDistance);
+                /* same as -(initialToULength-1) */
+                /* preToULength is negative! */
+                uprv_memcpy(_this->preToU, _this->toUBytes+1, -_this->preToULength);
+                *source-=bytesFromThisBuffer;
+            }
+            _this->toULength=1;
+        }
     } else if(*err==U_UNSUPPORTED_ESCAPE_SEQUENCE) {
         _this->toUCallbackReason = UCNV_UNASSIGNED;
     }
@@ -1973,6 +2002,7 @@
         mySourceChar = args->converter->toUBytes[0];
         args->converter->toULength = 0;
         cs = (StateEnum)pToU2022State->cs[pToU2022State->g];
+        targetUniChar = missingCharMarker;
         goto getTrailByte;
     }
 
@@ -2102,17 +2132,44 @@
                 default:
                     /* G0 DBCS */
                     if(mySource < mySourceLimit) {
-                        char trailByte;
+                        int leadIsOk, trailIsOk;
+                        uint8_t trailByte;
 getTrailByte:
-                        trailByte = *mySource++;
-                        if(cs == JISX208) {
-                            _2022ToSJIS((uint8_t)mySourceChar, (uint8_t)trailByte, tempBuf);
-                        } else {
-                            tempBuf[0] = (char)mySourceChar;
-                            tempBuf[1] = trailByte;
+                        trailByte = (uint8_t)*mySource;
+                        /*
+                         * Ticket 5691: consistent illegal sequences:
+                         * - We include at least the first byte in the illegal sequence.
+                         * - If any of the non-initial bytes could be the start of a character,
+                         *   we stop the illegal sequence before the first one of those.
+                         *
+                         * In ISO-2022 DBCS, if the second byte is in the 21..7e range or is
+                         * an ESC/SO/SI, we report only the first byte as the illegal sequence.
+                         * Otherwise we convert or report the pair of bytes.
+                         */
+                        leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                        trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21);
+                        if (leadIsOk && trailIsOk) {
+                            ++mySource;
+                            uint32_t tmpSourceChar = (mySourceChar << 8) | trailByte;
+                            if(cs == JISX208) {
+                                _2022ToSJIS((uint8_t)mySourceChar, trailByte, tempBuf);
+                                mySourceChar = tmpSourceChar;
+                            } else {
+                                /* Copy before we modify tmpSourceChar so toUnicodeCallback() sees the correct bytes. */
+                                mySourceChar = tmpSourceChar;
+                                if (cs == KSC5601) {
+                                    tmpSourceChar += 0x8080;  /* = _2022ToGR94DBCS(tmpSourceChar) */
+                                }
+                                tempBuf[0] = (char)(tmpSourceChar >> 8);
+                                tempBuf[1] = (char)(tmpSourceChar);
+                            }
+                            targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE);
+                        } else if (!(trailIsOk || IS_2022_CONTROL(trailByte))) {
+                            /* report a pair of illegal bytes if the second byte is not a DBCS starter */
+                            ++mySource;
+                            /* add another bit so that the code below writes 2 bytes in case of error */
+                            mySourceChar = 0x10000 | (mySourceChar << 8) | trailByte;
                         }
-                        mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
-                        targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE);
                     } else {
                         args->converter->toUBytes[0] = (uint8_t)mySourceChar;
                         args->converter->toULength = 1;
@@ -2254,7 +2311,12 @@
             }
             /* only DBCS or SBCS characters are expected*/
             /* DB characters with high bit set to 1 are expected */
-            if(length > 2 || length==0 ||(((targetByteUnit & 0x8080) != 0x8080)&& length==2)){
+            if( length > 2 || length==0 ||
+                (length == 1 && targetByteUnit > 0x7f) ||
+                (length == 2 &&
+                    ((uint16_t)(targetByteUnit - 0xa1a1) > (0xfefe - 0xa1a1) ||
+                    (uint8_t)(targetByteUnit - 0xa1) > (0xfe - 0xa1)))
+            ) {
                 targetByteUnit=missingCharMarker;
             }
             if (targetByteUnit != missingCharMarker){
@@ -2583,17 +2645,34 @@
             myData->isEmptySegment = FALSE;	/* Any invalid char errors will be detected separately, so just reset this */
             if(myData->toU2022State.g == 1) {
                 if(mySource < mySourceLimit) {
-                    char trailByte;
+                    int leadIsOk, trailIsOk;
+                    uint8_t trailByte;
 getTrailByte:
-                    trailByte = *mySource++;
-                    tempBuf[0] = (char)(mySourceChar + 0x80);
-                    tempBuf[1] = (char)(trailByte + 0x80);
-                    mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
-                    if((mySourceChar & 0x8080) == 0) {
+                    targetUniChar = missingCharMarker;
+                    trailByte = (uint8_t)*mySource;
+                    /*
+                     * Ticket 5691: consistent illegal sequences:
+                     * - We include at least the first byte in the illegal sequence.
+                     * - If any of the non-initial bytes could be the start of a character,
+                     *   we stop the illegal sequence before the first one of those.
+                     *
+                     * In ISO-2022 DBCS, if the second byte is in the 21..7e range or is
+                     * an ESC/SO/SI, we report only the first byte as the illegal sequence.
+                     * Otherwise we convert or report the pair of bytes.
+                     */
+                    leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                    trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21);
+                    if (leadIsOk && trailIsOk) {
+                        ++mySource;
+                        tempBuf[0] = (char)(mySourceChar + 0x80);
+                        tempBuf[1] = (char)(trailByte + 0x80);
                         targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, tempBuf, 2, useFallback);
-                    } else {
-                        /* illegal bytes > 0x7f */
-                        targetUniChar = missingCharMarker;
+                        mySourceChar = (mySourceChar << 8) | trailByte;
+                    } else if (!(trailIsOk || IS_2022_CONTROL(trailByte))) {
+                        /* report a pair of illegal bytes if the second byte is not a DBCS starter */
+                        ++mySource;
+                        /* add another bit so that the code below writes 2 bytes in case of error */
+                        mySourceChar = 0x10000 | (mySourceChar << 8) | trailByte;
                     }
                 } else {
                     args->converter->toUBytes[0] = (uint8_t)mySourceChar;
@@ -2601,8 +2680,10 @@
                     break;
                 }
             }
-            else{
+            else if(mySourceChar <= 0x7f) {
                 targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, mySource - 1, 1, useFallback);
+            } else {
+                targetUniChar = 0xffff;
             }
             if(targetUniChar < 0xfffe){
                 if(args->offsets) {
@@ -3099,6 +3180,7 @@
         /* continue with a partial double-byte character */
         mySourceChar = args->converter->toUBytes[0];
         args->converter->toULength = 0;
+        targetUniChar = missingCharMarker;
         goto getTrailByte;
     }
 
@@ -3178,29 +3260,50 @@
                         UConverterSharedData *cnv;
                         StateEnum tempState;
                         int32_t tempBufLen;
-                        char trailByte;
+                        int leadIsOk, trailIsOk;
+                        uint8_t trailByte;
 getTrailByte:
-                        trailByte = *mySource++;
-                        tempState = (StateEnum)pToU2022State->cs[pToU2022State->g];
-                        if(tempState > CNS_11643_0) {
-                            cnv = myData->myConverterArray[CNS_11643];
-                            tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0));
-                            tempBuf[1] = (char) (mySourceChar);
-                            tempBuf[2] = trailByte;
-                            tempBufLen = 3;
-
-                        }else{
-                            cnv = myData->myConverterArray[tempState];
-                            tempBuf[0] = (char) (mySourceChar);
-                            tempBuf[1] = trailByte;
-                            tempBufLen = 2;
+                        trailByte = (uint8_t)*mySource;
+                        /*
+                         * Ticket 5691: consistent illegal sequences:
+                         * - We include at least the first byte in the illegal sequence.
+                         * - If any of the non-initial bytes could be the start of a character,
+                         *   we stop the illegal sequence before the first one of those.
+                         *
+                         * In ISO-2022 DBCS, if the second byte is in the 21..7e range or is
+                         * an ESC/SO/SI, we report only the first byte as the illegal sequence.
+                         * Otherwise we convert or report the pair of bytes.
+                         */
+                        leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                        trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21);
+                        if (leadIsOk && trailIsOk) {
+                            ++mySource;
+                            tempState = (StateEnum)pToU2022State->cs[pToU2022State->g];
+                            if(tempState >= CNS_11643_0) {
+                                cnv = myData->myConverterArray[CNS_11643];
+                                tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0));
+                                tempBuf[1] = (char) (mySourceChar);
+                                tempBuf[2] = (char) trailByte;
+                                tempBufLen = 3;
+
+                            }else{
+                                cnv = myData->myConverterArray[tempState];
+                                tempBuf[0] = (char) (mySourceChar);
+                                tempBuf[1] = (char) trailByte;
+                                tempBufLen = 2;
+                            }
+                            targetUniChar = ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE);
+                            mySourceChar = (mySourceChar << 8) | trailByte;
+                        } else if (!(trailIsOk || IS_2022_CONTROL(trailByte))) {
+                            /* report a pair of illegal bytes if the second byte is not a DBCS starter */
+                            ++mySource;
+                            /* add another bit so that the code below writes 2 bytes in case of error */
+                            mySourceChar = 0x10000 | (mySourceChar << 8) | trailByte;
                         }
-                        mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
                         if(pToU2022State->g>=2) {
                             /* return from a single-shift state to the previous one */
                             pToU2022State->g=pToU2022State->prevG;
                         }
-                        targetUniChar = ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE);
                     } else {
                         args->converter->toUBytes[0] = (uint8_t)mySourceChar;
                         args->converter->toULength = 1;
diff -ru icu.6175/source/common/ucnvhz.c icu/source/common/ucnvhz.c
--- icu.6175/source/common/ucnvhz.c	2009-06-02 15:47:31.000000000 +0100
+++ icu/source/common/ucnvhz.c	2009-06-02 15:57:18.000000000 +0100
@@ -196,10 +196,30 @@
                      /* if the first byte is equal to TILDE and the trail byte
                      * is not a valid byte then it is an error condition
                      */
-                    mySourceChar = 0x7e00 | mySourceChar;
-                    targetUniChar = 0xffff;
+                    /*
+                     * Ticket 5691: consistent illegal sequences:
+                     * - We include at least the first byte in the illegal sequence.
+                     * - If any of the non-initial bytes could be the start of a character,
+                     *   we stop the illegal sequence before the first one of those.
+                     */
                     myData->isEmptySegment = FALSE; /* different error here, reset this to avoid spurious future error */
-                    break;
+                    *err = U_ILLEGAL_ESCAPE_SEQUENCE;
+                    args->converter->toUBytes[0] = UCNV_TILDE;
+                    if( myData->isStateDBCS ?
+                            (0x21 <= mySourceChar && mySourceChar <= 0x7e) :
+                            mySourceChar <= 0x7f
+                    ) {
+                        /* The current byte could be the start of a character: Back it out. */
+                        args->converter->toULength = 1;
+                        --mySource;
+                    } else {
+                        /* Include the current byte in the illegal sequence. */
+                        args->converter->toUBytes[1] = mySourceChar;
+                        args->converter->toULength = 2;
+                    }
+                    args->target = myTarget;
+                    args->source = mySource;
+                    return;
                 }
             } else if(myData->isStateDBCS) {
                 if(args->converter->toUnicodeStatus == 0x00){
@@ -215,19 +235,36 @@
                 }
                 else{
                     /* trail byte */
+                    int leadIsOk, trailIsOk;
                     uint32_t leadByte = args->converter->toUnicodeStatus & 0xff;
-                    if( (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21) &&
-                        (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21)
-                    ) {
+                    targetUniChar = 0xffff;
+                    /*
+                     * Ticket 5691: consistent illegal sequences:
+                     * - We include at least the first byte in the illegal sequence.
+                     * - If any of the non-initial bytes could be the start of a character,
+                     *   we stop the illegal sequence before the first one of those.
+                     *
+                     * In HZ DBCS, if the second byte is in the 21..7e range,
+                     * we report only the first byte as the illegal sequence.
+                     * Otherwise we convert or report the pair of bytes.
+                     */
+                    leadIsOk = (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21);
+                    trailIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                    if (leadIsOk && trailIsOk) {
                         tempBuf[0] = (char) (leadByte+0x80) ;
                         tempBuf[1] = (char) (mySourceChar+0x80);
                         targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->gbConverter->sharedData,
                             tempBuf, 2, args->converter->useFallback);
+                        mySourceChar= (leadByte << 8) | mySourceChar;
+                    } else if (trailIsOk) {
+                        /* report a single illegal byte and continue with the following DBCS starter byte */
+                        --mySource;
+                        mySourceChar = (int32_t)leadByte;
                     } else {
-                        targetUniChar = 0xffff;
+                        /* report a pair of illegal bytes if the second byte is not a DBCS starter */
+                        /* add another bit so that the code below writes 2 bytes in case of error */
+                        mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar;
                     }
-                    /* add another bit so that the code below writes 2 bytes in case of error */
-                    mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar;
                     args->converter->toUnicodeStatus =0x00;
                 }
             }
diff -ru icu.6175/source/common/ucnvmbcs.c icu/source/common/ucnvmbcs.c
--- icu.6175/source/common/ucnvmbcs.c	2009-06-02 15:47:31.000000000 +0100
+++ icu/source/common/ucnvmbcs.c	2009-06-02 15:56:07.000000000 +0100
@@ -1697,6 +1697,65 @@
     pArgs->offsets=offsets;
 }
 
+static UBool
+hasValidTrailBytes(const int32_t (*stateTable)[256], uint8_t state) {
+    const int32_t *row=stateTable[state];
+    int32_t b, entry;
+    /* First test for final entries in this state for some commonly valid byte values. */
+    entry=row[0xa1];
+    if( !MBCS_ENTRY_IS_TRANSITION(entry) &&
+        MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL
+    ) {
+        return TRUE;
+    }
+    entry=row[0x41];
+    if( !MBCS_ENTRY_IS_TRANSITION(entry) &&
+        MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL
+    ) {
+        return TRUE;
+    }
+    /* Then test for final entries in this state. */
+    for(b=0; b<=0xff; ++b) {
+        entry=row[b];
+        if( !MBCS_ENTRY_IS_TRANSITION(entry) &&
+            MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL
+        ) {
+            return TRUE;
+        }
+    }
+    /* Then recurse for transition entries. */
+    for(b=0; b<=0xff; ++b) {
+        entry=row[b];
+        if( MBCS_ENTRY_IS_TRANSITION(entry) &&
+            hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry))
+        ) {
+            return TRUE;
+        }
+    }
+    return FALSE;
+}
+
+/*
+ * Is byte b a single/lead byte in this state?
+ * Recurse for transition states, because here we don't want to say that
+ * b is a lead byte if all byte sequences that start with b are illegal.
+ */
+static UBool
+isSingleOrLead(const int32_t (*stateTable)[256], uint8_t state, UBool isDBCSOnly, uint8_t b) {
+    const int32_t *row=stateTable[state];
+    int32_t entry=row[b];
+    if(MBCS_ENTRY_IS_TRANSITION(entry)) {   /* lead byte */
+        return hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry));
+    } else {
+        uint8_t action=(uint8_t)(MBCS_ENTRY_FINAL_ACTION(entry));
+        if(action==MBCS_STATE_CHANGE_ONLY && isDBCSOnly) {
+            return FALSE;   /* SI/SO are illegal for DBCS-only conversion */
+        } else {
+            return action!=MBCS_STATE_ILLEGAL;
+        }
+    }
+}
+
 U_CFUNC void
 ucnv_MBCSToUnicodeWithOffsets(UConverterToUnicodeArgs *pArgs,
                           UErrorCode *pErrorCode) {
@@ -2052,6 +2111,34 @@
             sourceIndex=nextSourceIndex;
         } else if(U_FAILURE(*pErrorCode)) {
             /* callback(illegal) */
+            if(byteIndex>1) {
+                /*
+                 * Ticket 5691: consistent illegal sequences:
+                 * - We include at least the first byte in the illegal sequence.
+                 * - If any of the non-initial bytes could be the start of a character,
+                 *   we stop the illegal sequence before the first one of those.
+                 */
+                UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0);
+                int8_t i;
+                for(i=1;
+                    i<byteIndex && !isSingleOrLead(stateTable, state, isDBCSOnly, bytes[i]);
+                    ++i) {}
+                if(i<byteIndex) {
+                    /* Back out some bytes. */
+                    int8_t backOutDistance=byteIndex-i;
+                    int32_t bytesFromThisBuffer=(int32_t)(source-(const uint8_t *)pArgs->source);
+                    byteIndex=i;  /* length of reported illegal byte sequence */
+                    if(backOutDistance<=bytesFromThisBuffer) {
+                        source-=backOutDistance;
+                    } else {
+                        /* Back out bytes from the previous buffer: Need to replay them. */
+                        cnv->preToULength=(int8_t)(bytesFromThisBuffer-backOutDistance);
+                        /* preToULength is negative! */
+                        uprv_memcpy(cnv->preToU, bytes+i, -cnv->preToULength);
+                        source=(const uint8_t *)pArgs->source;
+                    }
+                }
+            }
             break;
         } else /* unassigned sequences indicated with byteIndex>0 */ {
             /* try an extension mapping */
@@ -2062,7 +2149,7 @@
                               &offsets, sourceIndex,
                               pArgs->flush,
                               pErrorCode);
-            sourceIndex=nextSourceIndex+(int32_t)(source-(const uint8_t *)pArgs->source);
+            sourceIndex=nextSourceIndex+=(int32_t)(source-(const uint8_t *)pArgs->source);
 
             if(U_FAILURE(*pErrorCode)) {
                 /* not mappable or buffer overflow */
@@ -2353,15 +2440,37 @@
 
     if(c<0) {
         if(U_SUCCESS(*pErrorCode) && source==sourceLimit && lastSource<source) {
-            *pErrorCode=U_TRUNCATED_CHAR_FOUND;
-        }
-        if(U_FAILURE(*pErrorCode)) {
             /* incomplete character byte sequence */
             uint8_t *bytes=cnv->toUBytes;
             cnv->toULength=(int8_t)(source-lastSource);
             do {
                 *bytes++=*lastSource++;
             } while(lastSource<source);
+            *pErrorCode=U_TRUNCATED_CHAR_FOUND;
+        } else if(U_FAILURE(*pErrorCode)) {
+            /* callback(illegal) */
+            /*
+             * Ticket 5691: consistent illegal sequences:
+             * - We include at least the first byte in the illegal sequence.
+             * - If any of the non-initial bytes could be the start of a character,
+             *   we stop the illegal sequence before the first one of those.
+             */
+            UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0);
+            uint8_t *bytes=cnv->toUBytes;
+            *bytes++=*lastSource++;     /* first byte */
+            if(lastSource==source) {
+                cnv->toULength=1;
+            } else /* lastSource<source: multi-byte character */ {
+                int8_t i;
+                for(i=1;
+                    lastSource<source && !isSingleOrLead(stateTable, state, isDBCSOnly, *lastSource);
+                    ++i
+                ) {
+                    *bytes++=*lastSource++;
+                }
+                cnv->toULength=i;
+                source=lastSource;
+            }
         } else {
             /* no output because of empty input or only state changes */
             *pErrorCode=U_INDEX_OUTOFBOUNDS_ERROR;
diff -ru icu.6175/source/test/cintltst/nccbtst.c icu/source/test/cintltst/nccbtst.c
--- icu.6175/source/test/cintltst/nccbtst.c	2009-06-02 15:47:18.000000000 +0100
+++ icu/source/test/cintltst/nccbtst.c	2009-06-02 15:47:38.000000000 +0100
@@ -2497,13 +2497,13 @@
 
 
     static const uint8_t text943[] = {
-        0x82, 0xa9, 0x82, 0x20, /*0xc8,*/  0x61, 0x8a, 0xbf, 0x8e, 0x9a };
-    static const UChar toUnicode943sub[] = { 0x304b, 0xfffd, /*0xff88,*/ 0x0061, 0x6f22,  0x5b57};
-    static const UChar toUnicode943skip[]= { 0x304b, /*0xff88,*/ 0x0061, 0x6f22,  0x5b57};
+        0x82, 0xa9, 0x82, 0x20, 0x61, 0x8a, 0xbf, 0x8e, 0x9a };
+    static const UChar toUnicode943sub[] = { 0x304b, 0x1a, 0x20, 0x0061, 0x6f22,  0x5b57 };
+    static const UChar toUnicode943skip[]= { 0x304b, 0x20, 0x0061, 0x6f22,  0x5b57 };
     static const UChar toUnicode943stop[]= { 0x304b};
 
-    static const int32_t  fromIBM943Offssub[]  = {0, 2, 4, 5, 7};
-    static const int32_t  fromIBM943Offsskip[] = { 0, 4, 5, 7};
+    static const int32_t  fromIBM943Offssub[]  = { 0, 2, 3, 4, 5, 7 };
+    static const int32_t  fromIBM943Offsskip[] = { 0, 3, 4, 5, 7 };
     static const int32_t  fromIBM943Offsstop[] = { 0};
 
     gInBufferSize = inputsize;
@@ -2537,9 +2537,9 @@
 {
     static const uint8_t sampleText[] = {
         0x82, 0xa9, 0x61, 0x62, 0x63 , 0x82,
-        0xff, /*0x82, 0xa9,*/ 0x32, 0x33};
-    static const UChar toUnicode943sub[] = {0x304b, 0x0061, 0x0062, 0x0063,  0xfffd,/*0x304b,*/ 0x0032, 0x0033};
-    static const int32_t  fromIBM943Offssub[]  = {0, 2, 3, 4, 5, 7, 8};
+        0xff, 0x32, 0x33};
+    static const UChar toUnicode943sub[] = { 0x304b, 0x0061, 0x0062, 0x0063, 0x1a, 0x1a, 0x0032, 0x0033 };
+    static const int32_t fromIBM943Offssub[] = { 0, 2, 3, 4, 5, 6, 7, 8 };
     /*checking illegal value for ibm-943 with substitute*/ 
     gInBufferSize = inputsize;
     gOutBufferSize = outputsize;
diff -ru icu.6175/source/test/cintltst/nucnvtst.c icu/source/test/cintltst/nucnvtst.c
--- icu.6175/source/test/cintltst/nucnvtst.c	2009-06-02 15:47:18.000000000 +0100
+++ icu/source/test/cintltst/nucnvtst.c	2009-06-02 15:47:38.000000000 +0100
@@ -2606,7 +2606,7 @@
     TestNextUCharError(cnv, source, source, U_INDEX_OUTOFBOUNDS_ERROR, "sourceLimit <= source");
     /*Test for the condition where there is an invalid character*/
     {
-        static const uint8_t source2[]={0xa1, 0x01};
+        static const uint8_t source2[]={0xa1, 0x80};
         TestNextUCharError(cnv, (const char*)source2, (const char*)source2+sizeof(source2), U_ZERO_ERROR, "an invalid character");
     }
     /*Test for the condition where we have a truncated char*/
@@ -3899,11 +3899,11 @@
 TestISO_2022_KR() {
     /* test input */
     static const uint16_t in[]={
-                    0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F66,0x9F67,0x9F6A,0x000A,0x000D
-                   ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC02,0xAC04
+                    0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F67,0x9F6A,0x000A,0x000D
+                   ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC04
                    ,0xAC07,0xAC08,0xAC09,0x0025,0x0026,0x0027,0x000A,0x000D,0x0028,0x0029
                    ,0x002A,0x002B,0x002C,0x002D,0x002E,0x53C3,0x53C8,0x53C9,0x53CA,0x53CB
-                   ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53DF,0x53E1,0x53E2
+                   ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53E1,0x53E2
                    ,0x53E3,0x53E4,0x000A,0x000D};
     const UChar* uSource;
     const UChar* uSourceLimit;
diff -ru icu.6175/source/test/testdata/conversion.txt icu/source/test/testdata/conversion.txt
--- icu.6175/source/test/testdata/conversion.txt	2009-06-02 15:47:18.000000000 +0100
+++ icu/source/test/testdata/conversion.txt	2009-06-02 15:57:41.000000000 +0100
@@ -48,12 +48,144 @@
     toUnicode {
       Headers { "charset", "bytes", "unicode", "offsets", "flush", "fallbacks", "errorCode", "callback", "invalidChars" }
       Cases {
+        // Test ticket 5691: consistent illegal sequences
+        // The following test cases are for illegal character byte sequences.
+        //
+        // Unfortunately, we cannot use the Shift-JIS examples from the ticket
+        // comments because our Shift-JIS table is Windows-compatible and
+        // therefore has no illegal single bytes. Same for GBK.
+        // Instead, we use the stricter GB 18030 also for 2-byte examples.
+        // The byte sequences are generally slightly different from the ticket
+        // comment, simply using assigned characters rather than just
+        // theoretically valid sequences.
+        {
+          "gb18030",
+          :bin{ 618140813c81ff7a },
+          "a\u4e02\\x81<\\x81\\xFFz",
+          :intvector{ 0,1,3,3,3,3,4,5,5,5,5,5,5,5,5,7 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "EUC-JP",
+          :bin{ 618fb0a98fb03c8f3cb0a97a },
+          "a\u4e28\\x8F\\xB0<\\x8F<\u9022z",
+          :intvector{ 0,1,4,4,4,4,5,5,5,5,6,7,7,7,7,8,9,11 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "gb18030",
+          :bin{ 618130fc318130fc8181303c3e813cfc817a },
+          "a5ed\\x810\u9f07\\x810<>\\x81<\u9f07z",
+          :intvector{ 0,1,5,5,5,5,6,7,9,9,9,9,10,11,12,13,13,13,13,14,15,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "UTF-8",
+          :bin{ 61f1808182f180813cf18081fff180ff3cf1ff3c3e7a },
+          "a\U00040042\\xF1\\x80\\x81<\\xF1\\x80\\x81\\xFF\\xF1\\x80\\xFF<\\xF1\\xFF<>z",
+          :intvector{ 0,1,1,5,5,5,5,5,5,5,5,5,5,5,5,8,9,9,9,9,9,9,9,9,9,9,9,9,12,12,12,12,13,13,13,13,13,13,13,13,15,15,15,15,16,17,17,17,17,18,18,18,18,19,20,21 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-JP",
+          :bin{ 1b24424141af4142affe41431b2842 },
+          "\u758f\\xAF\u758e\\xAF\\xFE\u790e",
+          :intvector{ 3,5,5,5,5,6,8,8,8,8,8,8,8,8,10 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ibm-25546",
+          :bin{ 411b242943420e4141af4142affe41430f5a },
+          "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-KR",
+          :bin{ 411b242943420e4141af4142affe41430f5a },
+          "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-CN",
+          :bin{ 411b242941420e4141af4142affe41430f5a },
+          "AB\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "HZ",
+          :bin{ 417e7b4141af4142affe41437e7d5a },
+          "A\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z",
+          :intvector{ 0,3,5,5,5,5,6,8,8,8,8,8,8,8,8,10,14 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        // Test ticket 5691: consistent illegal sequences
+        // The following test cases are for illegal escape/designator/shift sequences.
+        //
+        // ISO-2022-JP and -CN with illegal escape sequences.
+        {
+          "ISO-2022-JP",
+          :bin{ 611b24201b244241411b283f1b28427a },
+          "a\\x1B$ \u758f\\x1B\u2538z",
+          :intvector{ 0,1,1,1,1,2,3,7,9,9,9,9,10,15 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-CN",
+          :bin{ 611b2429201b2429410e41410f7a },
+          "a\\x1B$) \u4eaez",
+          :intvector{ 0,1,1,1,1,2,3,4,10,13 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        // Test ticket 5691: ISO-2022-JP-2 with illegal single-shift SS2 and SS3 sequences.
+        // The first ESC N comes before its designator sequence, the last sequence is ESC+space.
+        {
+          "ISO-2022-JP-2",
+          :bin{ 4e1b4e4e1b2e414e1b4e4e4e1b204e },
+          "N\\x1BNNN\xceN\\x1B N",
+          :intvector{ 0,1,1,1,1,2,3,7,10,11,12,12,12,12,13,14 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-CN-EXT",
+          :bin{ 4e1b4e4e1b242a484e1b4e4e4e4e1b204e },
+          "N\\x1BNNN\u8f0eN\\x1B N",
+          :intvector{ 0,1,1,1,1,2,3,8,11,13,14,14,14,14,15,16 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-CN-EXT",
+          :bin{ 4f1b4f4f1b242b494f1b4f4f4f4f1b204f },
+          "O\\x1BOOO\u492bO\\x1B O",
+          :intvector{ 0,1,1,1,1,2,3,8,11,13,14,14,14,14,15,16 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        // Test ticket 5691: HZ with illegal tilde sequences.
+        {
+          "HZ",
+          :bin{ 417e20427e21437e80447e7b41417e207e41427e7f41437e7d5a },
+          "A\\x7E B\\x7E!C\\x7E\\x80D\u4eae\\x7E\\x20\\x7E\u8c05\\x7E\\x7F\u64a9Z",
+          :intvector{ 0,1,1,1,1,2,3,4,4,4,4,5,6,7,7,7,7,7,7,7,7,9,                          // SBCS
+                      12,14,14,14,14,14,14,14,14,16,16,16,16,17,19,19,19,19,19,19,19,19,21, // DBCS
+                      25 },                                                                 // SBCS
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        // Test ticket 5691: Example from Peter Edberg.
+        {
+          "ISO-2022-JP",
+          :bin{ 1b244230212f7e742630801b284a621b2458631b2842648061 },
+          "\u4e9c\ufffd\u7199\ufffdb\ufffd$Xcd\ufffda",
+          :intvector{ 3,5,7,9,14,15,16,17,18,22,23,24 },
+          :int{1}, :int{0}, "", "?", :bin{""}
+        }
         // test that HZ limits its byte values to lead bytes 21..7d and trail bytes 21..7e
         {
           "HZ",
-          :bin{ 7e7b21212120217e217f772100007e217e7d207e7e807e0a2b },
-          "\u3000\ufffd\u3013\ufffd\u9ccc\ufffd\ufffd ~\ufffd+",
-          :intvector{ 2,4,6,8,10,12,14,18,19,21,24 },
+          :bin{ 7e7b21212120217e217f772100007e217e7e7d207e7e807e0a2b },
+          "\u3000\ufffd\u3013\ufffd\u9ccc\ufffd\ufffd\u3013 ~\ufffd+",
+          :intvector{ 2,4,6,8,10,12,14,15,19,20,22,25 },
           :int{1}, :int{1}, "", "?", :bin{""}
         }
         // improve coverage of ISO-2022-JP converter with hardcoded JIS X 0201 and
@@ -61,8 +193,8 @@
         {
           "ISO-2022-JP",
           :bin{ 1b284a7d7e801b2442306c20217f7e21202160217f22202225227f5f211b2842 },
-          "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\u25b2\ufffd\u6f3e",
-          :intvector{ 3,4,5,9,11,13,15,17,19,21,23,25,27 },
+          "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\u25b2\ufffd\u6f3e",
+          :intvector{ 3,4,5,9,11,12,14,16,17,19,21,23,25,27 },
           :int{1}, :int{1}, "", "?", :bin{""}
         }
         // improve coverage of unrolled loops in ucnvmbcs.c/ucnv_MBCSSingleToBMPWithOffsets()
@@ -341,7 +473,7 @@
         {
           "ISO-2022-CN-EXT",
           :bin{ 411b4e2121 }, "\x41", :intvector{ 0 },
-          :int{1}, :int{1}, "illesc", ".", :bin{ 1b4e }
+          :int{1}, :int{1}, "illesc", ".", :bin{ 1b }
         }
         // G3 designator: recognized, but not supported for -CN (only for -CN-EXT)
         {

Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#534590; Package icu. (Tue, 25 Aug 2009 20:30:08 GMT) (full text, mbox, link).

Acknowledgement sent to Jay Berkenbilt <qjb@debian.org>:
Extra info received and forwarded to list. (Tue, 25 Aug 2009 20:30:08 GMT) (full text, mbox, link).

Message #20 received at 534590@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]

(Trying one more time to post properly to bug report; original message
also copied to team@security.debian.org.)

The reporter of this ICU security bug that impacts oldstable and stable
but not testing or unstable was kind enough to refer to the Red Hat
bugzilla entry for this.  The people at Red Hat have backported the
security fix to 3.6 (our oldstable) and 3.8 (our stable) versions of ICU
(which appear in RHEL5 and Fedora 9).  I have grabbed their SRPMs for
the patched versions and extracted the patches that apply to the 3.6 and
3.8 versions.  Attached here are the patches directly from those source
RPMs, not modified in any way or tested for debian.  I can integrate
these into the debian packages prepare uploads for stable security and
oldstable security, or I can defer to the security team to do the
integration.  Just let me know.  It may be several days before I have a
chance to work on it, but I have prepared stable security uploads for my
packages before.  I am grateful to Red Hat for doing the work of
backporting to the older ICU versions.

-- 
Jay Berkenbilt <qjb@debian.org>

[icu-3.8-CVE-2009-0153.patch (text/x-patch, inline)]

diff -ru icu.6175/source/common/ucnv2022.c icu/source/common/ucnv2022.c
--- icu.6175/source/common/ucnv2022.c	2009-06-11 13:44:44.000000000 +0100
+++ icu/source/common/ucnv2022.c	2009-06-11 18:09:48.000000000 +0100
@@ -1973,6 +1973,7 @@
         mySourceChar = args->converter->toUBytes[0];
         args->converter->toULength = 0;
         cs = (StateEnum)pToU2022State->cs[pToU2022State->g];
+        targetUniChar = missingCharMarker;
         goto getTrailByte;
     }
 
@@ -2102,18 +2103,45 @@
                 default:
                     /* G0 DBCS */
                     if(mySource < mySourceLimit) {
+                        int leadIsOk, trailIsOk;
                         char trailByte;
 getTrailByte:
-                        trailByte = *mySource++;
-                        if(cs == JISX208) {
-                            _2022ToSJIS((uint8_t)mySourceChar, (uint8_t)trailByte, tempBuf);
-                        } else {
-                            tempBuf[0] = (char)mySourceChar;
-                            tempBuf[1] = trailByte;
-                        }
-                        mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
-                        targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE);
-                    } else {
+                        trailByte = *mySource;
+                        /*
+                         * Ticket 5691: consistent illegal sequences:
+                         * - We include at least the first byte in the illegal sequence.
+                         * - If any of the non-initial bytes could be the start of a character,
+                         *   we stop the illegal sequence before the first one of those.
+                         *
+                         * In ISO-2022 DBCS, if both bytes are valid or both bytes are outside
+                         * the 21..7e range, then we treat them as a pair.
+                         * Otherwise (valid lead byte + illegal trail byte, or vice versa)
+                         * we report only the first byte as the illegal sequence.
+                         */
+                        leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                        trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21);
+                        if (leadIsOk == trailIsOk) {
+                            ++mySource;
+                            uint32_t tmpSourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
+                            if (leadIsOk) {
+                                if(cs == JISX208) {
+                                    _2022ToSJIS((uint8_t)mySourceChar, (uint8_t)trailByte, tempBuf);
+                                    mySourceChar = tmpSourceChar;
+                                } else {
+                                    /* Copy before we modify tmpSourceChar so toUnicodeCallback() sees the correct bytes. */
+                                    mySourceChar = tmpSourceChar;
+                                    if (cs == KSC5601) {
+                                        tmpSourceChar += 0x8080;  /* = _2022ToGR94DBCS(tmpSourceChar) */
+                                    }
+                                    tempBuf[0] = (char)(tmpSourceChar >> 8);
+                                    tempBuf[1] = (char)(tmpSourceChar);
+                                }
+                                targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE);
+                            } else {
+                                mySourceChar = tmpSourceChar;
+                            }
+                          }
+                      } else {
                         args->converter->toUBytes[0] = (uint8_t)mySourceChar;
                         args->converter->toULength = 1;
                         goto endloop;
@@ -2254,7 +2282,12 @@
             }
             /* only DBCS or SBCS characters are expected*/
             /* DB characters with high bit set to 1 are expected */
-            if(length > 2 || length==0 ||(((targetByteUnit & 0x8080) != 0x8080)&& length==2)){
+            if( length > 2 || length==0 ||
+                (length == 1 && targetByteUnit > 0x7f) ||
+                (length == 2 &&
+                    ((uint16_t)(targetByteUnit - 0xa1a1) > (0xfefe - 0xa1a1) ||
+                    (uint8_t)(targetByteUnit - 0xa1) > (0xfe - 0xa1)))
+            ) {
                 targetByteUnit=missingCharMarker;
             }
             if (targetByteUnit != missingCharMarker){
@@ -2583,17 +2616,36 @@
             myData->isEmptySegment = FALSE;	/* Any invalid char errors will be detected separately, so just reset this */
             if(myData->toU2022State.g == 1) {
                 if(mySource < mySourceLimit) {
+                    int leadIsOk, trailIsOk;
                     char trailByte;
 getTrailByte:
-                    trailByte = *mySource++;
-                    tempBuf[0] = (char)(mySourceChar + 0x80);
-                    tempBuf[1] = (char)(trailByte + 0x80);
-                    mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
-                    if((mySourceChar & 0x8080) == 0) {
-                        targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, tempBuf, 2, useFallback);
+                    targetUniChar = missingCharMarker;
+                    trailByte = *mySource;
+                    /*
+                     * Ticket 5691: consistent illegal sequences:
+                     * - We include at least the first byte in the illegal sequence.
+                     * - If any of the non-initial bytes could be the start of a character,
+                     *   we stop the illegal sequence before the first one of those.
+                     *
+                     * In ISO-2022 DBCS, if both bytes are valid or both bytes are outside
+                     * the 21..7e range, then we treat them as a pair.
+                     * Otherwise (valid lead byte + illegal trail byte, or vice versa)
+                     * we report only the first byte as the illegal sequence.
+                     */
+                    leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                    trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21);
+                    if (leadIsOk == trailIsOk) {
+                        ++mySource;
+                        if (leadIsOk) {
+                            tempBuf[0] = (char)(mySourceChar + 0x80);
+                            tempBuf[1] = (char)(trailByte + 0x80);
+                            targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, tempBuf, 2, useFallback);
+                        } else {
+                            leadIsOk = TRUE; /* TODO: remove */
+                        }
+                        mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
                     } else {
-                        /* illegal bytes > 0x7f */
-                        targetUniChar = missingCharMarker;
+                        trailIsOk = TRUE; /* TODO: remove */
                     }
                 } else {
                     args->converter->toUBytes[0] = (uint8_t)mySourceChar;
@@ -2601,8 +2653,10 @@
                     break;
                 }
             }
-            else{
+            else if(mySourceChar <= 0x7f) {
                 targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, mySource - 1, 1, useFallback);
+            } else {
+                targetUniChar = 0xffff;
             }
             if(targetUniChar < 0xfffe){
                 if(args->offsets) {
@@ -3099,6 +3153,7 @@
         /* continue with a partial double-byte character */
         mySourceChar = args->converter->toUBytes[0];
         args->converter->toULength = 0;
+        targetUniChar = missingCharMarker;
         goto getTrailByte;
     }
 
@@ -3178,29 +3233,48 @@
                         UConverterSharedData *cnv;
                         StateEnum tempState;
                         int32_t tempBufLen;
+                        int leadIsOk, trailIsOk;
                         char trailByte;
 getTrailByte:
-                        trailByte = *mySource++;
-                        tempState = (StateEnum)pToU2022State->cs[pToU2022State->g];
-                        if(tempState > CNS_11643_0) {
-                            cnv = myData->myConverterArray[CNS_11643];
-                            tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0));
-                            tempBuf[1] = (char) (mySourceChar);
-                            tempBuf[2] = trailByte;
-                            tempBufLen = 3;
-
-                        }else{
-                            cnv = myData->myConverterArray[tempState];
-                            tempBuf[0] = (char) (mySourceChar);
-                            tempBuf[1] = trailByte;
-                            tempBufLen = 2;
+                        trailByte = *mySource;
+                        /*
+                         * Ticket 5691: consistent illegal sequences:
+                         * - We include at least the first byte in the illegal sequence.
+                         * - If any of the non-initial bytes could be the start of a character,
+                         *   we stop the illegal sequence before the first one of those.
+                         *
+                         * In ISO-2022 DBCS, if both bytes are valid or both bytes are outside
+                         * the 21..7e range, then we treat them as a pair.
+                         * Otherwise (valid lead byte + illegal trail byte, or vice versa)
+                         * we report only the first byte as the illegal sequence.
+                         */
+                        leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                        trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21);
+                        if (leadIsOk == trailIsOk) {
+                            ++mySource;
+                            if (leadIsOk) {
+                                tempState = (StateEnum)pToU2022State->cs[pToU2022State->g];
+                                if(tempState >= CNS_11643_0) {
+                                    cnv = myData->myConverterArray[CNS_11643];
+                                    tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0));
+                                    tempBuf[1] = (char) (mySourceChar);
+                                    tempBuf[2] = trailByte;
+                                    tempBufLen = 3;
+
+                                }else{
+                                    cnv = myData->myConverterArray[tempState];
+                                    tempBuf[0] = (char) (mySourceChar);
+                                    tempBuf[1] = trailByte;
+                                    tempBufLen = 2;
+                                }
+                                targetUniChar = ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE);
+                            }
+                            mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
                         }
-                        mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
                         if(pToU2022State->g>=2) {
                             /* return from a single-shift state to the previous one */
                             pToU2022State->g=pToU2022State->prevG;
                         }
-                        targetUniChar = ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE);
                     } else {
                         args->converter->toUBytes[0] = (uint8_t)mySourceChar;
                         args->converter->toULength = 1;
diff -ru icu.6175/source/common/ucnvhz.c icu/source/common/ucnvhz.c
--- icu.6175/source/common/ucnvhz.c	2009-06-11 13:44:44.000000000 +0100
+++ icu/source/common/ucnvhz.c	2009-06-11 18:05:36.000000000 +0100
@@ -215,19 +215,35 @@
                 }
                 else{
                     /* trail byte */
+                    int leadIsOk, trailIsOk;
                     uint32_t leadByte = args->converter->toUnicodeStatus & 0xff;
-                    if( (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21) &&
-                        (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21)
-                    ) {
-                        tempBuf[0] = (char) (leadByte+0x80) ;
-                        tempBuf[1] = (char) (mySourceChar+0x80);
-                        targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->gbConverter->sharedData,
-                            tempBuf, 2, args->converter->useFallback);
+                    targetUniChar = 0xffff;
+                    /*
+                     * Ticket 5691: consistent illegal sequences:
+                     * - We include at least the first byte in the illegal sequence.
+                     * - If any of the non-initial bytes could be the start of a character,
+                     *   we stop the illegal sequence before the first one of those.
+                     *
+                     * In HZ DBCS, if both bytes are valid or both bytes are outside
+                     * the 21..7d/7e range, then we treat them as a pair.
+                     * Otherwise (valid lead byte + illegal trail byte, or vice versa)
+                     * we report only the first byte as the illegal sequence.
+                     */
+                    leadIsOk = (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21);
+                    trailIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                    if (leadIsOk == trailIsOk) {
+                        if (leadIsOk) {
+                            tempBuf[0] = (char) (leadByte+0x80) ;
+                            tempBuf[1] = (char) (mySourceChar+0x80);
+                            targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->gbConverter->sharedData,
+                                tempBuf, 2, args->converter->useFallback);
+                        }
+                        /* add another bit so that the code below writes 2 bytes in case of error */
+                        mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar;
                     } else {
-                        targetUniChar = 0xffff;
+                        --mySource;
+                        mySourceChar = (int32_t)leadByte;
                     }
-                    /* add another bit so that the code below writes 2 bytes in case of error */
-                    mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar;
                     args->converter->toUnicodeStatus =0x00;
                 }
             }
diff -ru icu.6175/source/common/ucnvmbcs.c icu/source/common/ucnvmbcs.c
--- icu.6175/source/common/ucnvmbcs.c	2009-06-11 13:44:44.000000000 +0100
+++ icu/source/common/ucnvmbcs.c	2009-06-11 18:05:36.000000000 +0100
@@ -1,7 +1,7 @@
 /*
 ******************************************************************************
 *
-*   Copyright (C) 2000-2007, International Business Machines
+*   Copyright (C) 2000-2008, International Business Machines
 *   Corporation and others.  All Rights Reserved.
 *
 ******************************************************************************
@@ -1791,6 +1791,65 @@
     pArgs->offsets=offsets;
 }
 
+static UBool
+hasValidTrailBytes(const int32_t (*stateTable)[256], uint8_t state) {
+    const int32_t *row=stateTable[state];
+    int32_t b, entry;
+    /* First test for final entries in this state for some commonly valid byte values. */
+    entry=row[0xa1];
+    if( !MBCS_ENTRY_IS_TRANSITION(entry) &&
+        MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL
+    ) {
+        return TRUE;
+    }
+    entry=row[0x41];
+    if( !MBCS_ENTRY_IS_TRANSITION(entry) &&
+        MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL
+    ) {
+        return TRUE;
+    }
+    /* Then test for final entries in this state. */
+    for(b=0; b<=0xff; ++b) {
+        entry=row[b];
+        if( !MBCS_ENTRY_IS_TRANSITION(entry) &&
+            MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL
+        ) {
+            return TRUE;
+        }
+    }
+    /* Then recurse for transition entries. */
+    for(b=0; b<=0xff; ++b) {
+        entry=row[b];
+        if( MBCS_ENTRY_IS_TRANSITION(entry) &&
+            hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry))
+        ) {
+            return TRUE;
+        }
+    }
+    return FALSE;
+}
+
+/*
+ * Is byte b a single/lead byte in this state?
+ * Recurse for transition states, because here we don't want to say that
+ * b is a lead byte if all byte sequences that start with b are illegal.
+ */
+static UBool
+isSingleOrLead(const int32_t (*stateTable)[256], uint8_t state, UBool isDBCSOnly, uint8_t b) {
+    const int32_t *row=stateTable[state];
+    int32_t entry=row[b];
+    if(MBCS_ENTRY_IS_TRANSITION(entry)) {   /* lead byte */
+        return hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry));
+    } else {
+        uint8_t action=(uint8_t)(MBCS_ENTRY_FINAL_ACTION(entry));
+        if(action==MBCS_STATE_CHANGE_ONLY && isDBCSOnly) {
+            return FALSE;   /* SI/SO are illegal for DBCS-only conversion */
+        } else {
+            return action!=MBCS_STATE_ILLEGAL;
+        }
+    }
+}
+
 U_CFUNC void
 ucnv_MBCSToUnicodeWithOffsets(UConverterToUnicodeArgs *pArgs,
                           UErrorCode *pErrorCode) {
@@ -2146,6 +2205,34 @@
             sourceIndex=nextSourceIndex;
         } else if(U_FAILURE(*pErrorCode)) {
             /* callback(illegal) */
+            if(byteIndex>1) {
+                /*
+                 * Ticket 5691: consistent illegal sequences:
+                 * - We include at least the first byte in the illegal sequence.
+                 * - If any of the non-initial bytes could be the start of a character,
+                 *   we stop the illegal sequence before the first one of those.
+                 */
+                UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0);
+                int8_t i;
+                for(i=1;
+                    i<byteIndex && !isSingleOrLead(stateTable, state, isDBCSOnly, bytes[i]);
+                    ++i) {}
+                if(i<byteIndex) {
+                    /* Back out some bytes. */
+                    int8_t backOutDistance=byteIndex-i;
+                    int32_t bytesFromThisBuffer=(int32_t)(source-(const uint8_t *)pArgs->source);
+                    byteIndex=i;  /* length of reported illegal byte sequence */
+                    if(backOutDistance<=bytesFromThisBuffer) {
+                        source-=backOutDistance;
+                    } else {
+                        /* Back out bytes from the previous buffer: Need to replay them. */
+                        cnv->preToULength=(int8_t)(bytesFromThisBuffer-backOutDistance);
+                        /* preToULength is negative! */
+                        uprv_memcpy(cnv->preToU, bytes+i, -cnv->preToULength);
+                        source=(const uint8_t *)pArgs->source;
+                    }
+                }
+            }
             break;
         } else /* unassigned sequences indicated with byteIndex>0 */ {
             /* try an extension mapping */
@@ -2156,6 +2243,7 @@
                               &offsets, sourceIndex,
                               pArgs->flush,
                               pErrorCode);
+            /* TODO: nextSourceIndex+=diff instead of nextSourceIndex+diff ?? */
             sourceIndex=nextSourceIndex+(int32_t)(source-(const uint8_t *)pArgs->source);
 
             if(U_FAILURE(*pErrorCode)) {
@@ -2447,15 +2535,37 @@
 
     if(c<0) {
         if(U_SUCCESS(*pErrorCode) && source==sourceLimit && lastSource<source) {
-            *pErrorCode=U_TRUNCATED_CHAR_FOUND;
-        }
-        if(U_FAILURE(*pErrorCode)) {
             /* incomplete character byte sequence */
             uint8_t *bytes=cnv->toUBytes;
             cnv->toULength=(int8_t)(source-lastSource);
             do {
                 *bytes++=*lastSource++;
             } while(lastSource<source);
+            *pErrorCode=U_TRUNCATED_CHAR_FOUND;
+        } else if(U_FAILURE(*pErrorCode)) {
+            /* callback(illegal) */
+            /*
+             * Ticket 5691: consistent illegal sequences:
+             * - We include at least the first byte in the illegal sequence.
+             * - If any of the non-initial bytes could be the start of a character,
+             *   we stop the illegal sequence before the first one of those.
+             */
+            UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0);
+            uint8_t *bytes=cnv->toUBytes;
+            *bytes++=*lastSource++;     /* first byte */
+            if(lastSource==source) {
+                cnv->toULength=1;
+            } else /* lastSource<source: multi-byte character */ {
+                int8_t i;
+                for(i=1;
+                    lastSource<source && !isSingleOrLead(stateTable, state, isDBCSOnly, *lastSource);
+                    ++i
+                ) {
+                    *bytes++=*lastSource++;
+                }
+                cnv->toULength=i;
+                source=lastSource;
+            }
         } else {
             /* no output because of empty input or only state changes */
             *pErrorCode=U_INDEX_OUTOFBOUNDS_ERROR;
diff -ru icu.6175/source/test/cintltst/nccbtst.c icu/source/test/cintltst/nccbtst.c
--- icu.6175/source/test/cintltst/nccbtst.c	2009-06-11 13:44:44.000000000 +0100
+++ icu/source/test/cintltst/nccbtst.c	2009-06-11 18:05:36.000000000 +0100
@@ -2497,13 +2497,13 @@
 
 
     static const uint8_t text943[] = {
-        0x82, 0xa9, 0x82, 0x20, /*0xc8,*/  0x61, 0x8a, 0xbf, 0x8e, 0x9a };
-    static const UChar toUnicode943sub[] = { 0x304b, 0xfffd, /*0xff88,*/ 0x0061, 0x6f22,  0x5b57};
-    static const UChar toUnicode943skip[]= { 0x304b, /*0xff88,*/ 0x0061, 0x6f22,  0x5b57};
+        0x82, 0xa9, 0x82, 0x20, 0x61, 0x8a, 0xbf, 0x8e, 0x9a };
+    static const UChar toUnicode943sub[] = { 0x304b, 0x1a, 0x20, 0x0061, 0x6f22,  0x5b57 };
+    static const UChar toUnicode943skip[]= { 0x304b, 0x20, 0x0061, 0x6f22,  0x5b57 };
     static const UChar toUnicode943stop[]= { 0x304b};
 
-    static const int32_t  fromIBM943Offssub[]  = {0, 2, 4, 5, 7};
-    static const int32_t  fromIBM943Offsskip[] = { 0, 4, 5, 7};
+    static const int32_t  fromIBM943Offssub[]  = { 0, 2, 3, 4, 5, 7 };
+    static const int32_t  fromIBM943Offsskip[] = { 0, 3, 4, 5, 7 };
     static const int32_t  fromIBM943Offsstop[] = { 0};
 
     gInBufferSize = inputsize;
@@ -2537,9 +2537,9 @@
 {
     static const uint8_t sampleText[] = {
         0x82, 0xa9, 0x61, 0x62, 0x63 , 0x82,
-        0xff, /*0x82, 0xa9,*/ 0x32, 0x33};
-    static const UChar toUnicode943sub[] = {0x304b, 0x0061, 0x0062, 0x0063,  0xfffd,/*0x304b,*/ 0x0032, 0x0033};
-    static const int32_t  fromIBM943Offssub[]  = {0, 2, 3, 4, 5, 7, 8};
+        0xff, 0x32, 0x33};
+    static const UChar toUnicode943sub[] = { 0x304b, 0x0061, 0x0062, 0x0063, 0x1a, 0x1a, 0x0032, 0x0033 };
+    static const int32_t fromIBM943Offssub[] = { 0, 2, 3, 4, 5, 6, 7, 8 };
     /*checking illegal value for ibm-943 with substitute*/ 
     gInBufferSize = inputsize;
     gOutBufferSize = outputsize;
diff -ru icu.6175/source/test/cintltst/nucnvtst.c icu/source/test/cintltst/nucnvtst.c
--- icu.6175/source/test/cintltst/nucnvtst.c	2009-06-11 13:44:44.000000000 +0100
+++ icu/source/test/cintltst/nucnvtst.c	2009-06-11 18:05:36.000000000 +0100
@@ -2608,7 +2608,7 @@
     TestNextUCharError(cnv, source, source, U_INDEX_OUTOFBOUNDS_ERROR, "sourceLimit <= source");
     /*Test for the condition where there is an invalid character*/
     {
-        static const uint8_t source2[]={0xa1, 0x01};
+        static const uint8_t source2[]={0xa1, 0x80};
         TestNextUCharError(cnv, (const char*)source2, (const char*)source2+sizeof(source2), U_ZERO_ERROR, "an invalid character");
     }
     /*Test for the condition where we have a truncated char*/
@@ -3901,11 +3901,11 @@
 TestISO_2022_KR() {
     /* test input */
     static const uint16_t in[]={
-                    0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F66,0x9F67,0x9F6A,0x000A,0x000D
-                   ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC02,0xAC04
+                    0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F67,0x9F6A,0x000A,0x000D
+                   ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC04
                    ,0xAC07,0xAC08,0xAC09,0x0025,0x0026,0x0027,0x000A,0x000D,0x0028,0x0029
                    ,0x002A,0x002B,0x002C,0x002D,0x002E,0x53C3,0x53C8,0x53C9,0x53CA,0x53CB
-                   ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53DF,0x53E1,0x53E2
+                   ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53E1,0x53E2
                    ,0x53E3,0x53E4,0x000A,0x000D};
     const UChar* uSource;
     const UChar* uSourceLimit;
diff -ru icu.6175/source/test/testdata/conversion.txt icu/source/test/testdata/conversion.txt
--- icu.6175/source/test/testdata/conversion.txt	2009-06-11 13:44:44.000000000 +0100
+++ icu/source/test/testdata/conversion.txt	2009-06-11 18:05:36.000000000 +0100
@@ -48,12 +48,83 @@
     toUnicode {
       Headers { "charset", "bytes", "unicode", "offsets", "flush", "fallbacks", "errorCode", "callback", "invalidChars" }
       Cases {
+        // Test ticket 5691: consistent illegal sequences
+        // Unfortunately, we cannot use the Shift-JIS examples from the ticket
+        // comments because our Shift-JIS table is Windows-compatible and
+        // therefore has no illegal single bytes. Same for GBK.
+        // Instead, we use the stricter GB 18030 also for 2-byte examples.
+        // The byte sequences are generally slightly different from the ticket
+        // comment, simply using assigned characters rather than just
+        // theoretically valid sequences.
+        {
+          "gb18030",
+          :bin{ 618140813c81ff7a },
+          "a\u4e02\\x81<\\x81\\xFFz",
+          :intvector{ 0,1,3,3,3,3,4,5,5,5,5,5,5,5,5,7 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "EUC-JP",
+          :bin{ 618fb0a98fb03c8f3cb0a97a },
+          "a\u4e28\\x8F\\xB0<\\x8F<\u9022z",
+          :intvector{ 0,1,4,4,4,4,5,5,5,5,6,7,7,7,7,8,9,11 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "gb18030",
+          :bin{ 618130fc318130fc8181303c3e813cfc817a },
+          "a5ed\\x810\u9f07\\x810<>\\x81<\u9f07z",
+          :intvector{ 0,1,5,5,5,5,6,7,9,9,9,9,10,11,12,13,13,13,13,14,15,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "UTF-8",
+          :bin{ 61f1808182f180813cf18081fff180ff3cf1ff3c3e7a },
+          "a\U00040042\\xF1\\x80\\x81<\\xF1\\x80\\x81\\xFF\\xF1\\x80\\xFF<\\xF1\\xFF<>z",
+          :intvector{ 0,1,1,5,5,5,5,5,5,5,5,5,5,5,5,8,9,9,9,9,9,9,9,9,9,9,9,9,12,12,12,12,13,13,13,13,13,13,13,13,15,15,15,15,16,17,17,17,17,18,18,18,18,19,20,21 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-JP-2",
+          :bin{ 1b24424141af4142affe41431b2842 },
+          "\u758f\\xAF\u758e\\xAF\\xFE\u790e",
+          :intvector{ 3,5,5,5,5,6,8,8,8,8,8,8,8,8,10 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ibm-25546",
+          :bin{ 411b242943420e4141af4142affe41430f5a },
+          "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-KR",
+          :bin{ 411b242943420e4141af4142affe41430f5a },
+          "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-CN",
+          :bin{ 411b242941420e4141af4142affe41430f5a },
+          "AB\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "HZ",
+          :bin{ 417e7b4141af4142affe41437e7d5a },
+          "A\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z",
+          :intvector{ 0,3,5,5,5,5,6,8,8,8,8,8,8,8,8,10,14 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
         // test that HZ limits its byte values to lead bytes 21..7d and trail bytes 21..7e
         {
           "HZ",
           :bin{ 7e7b21212120217e217f772100007e217e7d207e7e807e0a2b },
-          "\u3000\ufffd\u3013\ufffd\u9ccc\ufffd\ufffd ~\ufffd+",
-          :intvector{ 2,4,6,8,10,12,14,18,19,21,24 },
+          "\u3000\ufffd\ufffd\u3013\ufffd\ufffd\u9ccc\ufffd\ufffd ~\ufffd+",
+          :intvector{ 2,4,5,6,8,9,10,12,14,18,19,21,24 },
           :int{1}, :int{1}, "", "?", :bin{""}
         }
         // improve coverage of ISO-2022-JP converter with hardcoded JIS X 0201 and
@@ -61,8 +132,8 @@
         {
           "ISO-2022-JP",
           :bin{ 1b284a7d7e801b2442306c20217f7e21202160217f22202225227f5f211b2842 },
-          "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\u25b2\ufffd\u6f3e",
-          :intvector{ 3,4,5,9,11,13,15,17,19,21,23,25,27 },
+          "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\ufffd\ufffd\u25b2\ufffd\ufffd\u6f3e",
+          :intvector{ 3,4,5,9,11,12,13,14,16,17,19,20,21,22,23,25,26,27 },
           :int{1}, :int{1}, "", "?", :bin{""}
         }
         // improve coverage of unrolled loops in ucnvmbcs.c/ucnv_MBCSSingleToBMPWithOffsets()

[icu-3.6-CVE-2009-0153.patch (text/x-patch, inline)]

diff -ru icu.6175/source/common/ucnv2022.c icu/source/common/ucnv2022.c
--- icu.6175/source/common/ucnv2022.c	2009-06-02 15:47:31.000000000 +0100
+++ icu/source/common/ucnv2022.c	2009-06-02 16:03:15.000000000 +0100
@@ -754,6 +754,7 @@
     UConverterDataISO2022* myData2022 = ((UConverterDataISO2022*)_this->extraInfo);
     uint32_t key = myData2022->key;
     int32_t offset = 0;
+    int8_t initialToULength = _this->toULength;
     char c;
 
     value = VALID_NON_TERMINAL_2022;
@@ -806,7 +807,6 @@
         return;
     } else if (value == INVALID_2022 ) {
         *err = U_ILLEGAL_ESCAPE_SEQUENCE;
-        return;
     } else /* value == VALID_TERMINAL_2022 */ {
         switch(var){
 #ifdef U_ENABLE_GENERIC_ISO_2022
@@ -938,6 +938,35 @@
     }
     if(U_SUCCESS(*err)) {
         _this->toULength = 0;
+    } else if(*err==U_ILLEGAL_ESCAPE_SEQUENCE) {
+        if(_this->toULength>1) {
+            /*
+             * Ticket 5691: consistent illegal sequences:
+             * - We include at least the first byte (ESC) in the illegal sequence.
+             * - If any of the non-initial bytes could be the start of a character,
+             *   we stop the illegal sequence before the first one of those.
+             *   In escape sequences, all following bytes are "printable", that is,
+             *   unless they are completely illegal (>7f in SBCS, outside 21..7e in DBCS),
+             *   they are valid single/lead bytes.
+             *   For simplicity, we always only report the initial ESC byte as the
+             *   illegal sequence and back out all other bytes we looked at.
+             */
+            /* Back out some bytes. */
+            int8_t backOutDistance=_this->toULength-1;
+            int8_t bytesFromThisBuffer=_this->toULength-initialToULength;
+            if(backOutDistance<=bytesFromThisBuffer) {
+                /* same as initialToULength<=1 */
+                *source-=backOutDistance;
+            } else {
+                /* Back out bytes from the previous buffer: Need to replay them. */
+                _this->preToULength=(int8_t)(bytesFromThisBuffer-backOutDistance);
+                /* same as -(initialToULength-1) */
+                /* preToULength is negative! */
+                uprv_memcpy(_this->preToU, _this->toUBytes+1, -_this->preToULength);
+                *source-=bytesFromThisBuffer;
+            }
+            _this->toULength=1;
+        }
     } else if(*err==U_UNSUPPORTED_ESCAPE_SEQUENCE) {
         _this->toUCallbackReason = UCNV_UNASSIGNED;
     }
@@ -1973,6 +2002,7 @@
         mySourceChar = args->converter->toUBytes[0];
         args->converter->toULength = 0;
         cs = (StateEnum)pToU2022State->cs[pToU2022State->g];
+        targetUniChar = missingCharMarker;
         goto getTrailByte;
     }
 
@@ -2102,17 +2132,44 @@
                 default:
                     /* G0 DBCS */
                     if(mySource < mySourceLimit) {
-                        char trailByte;
+                        int leadIsOk, trailIsOk;
+                        uint8_t trailByte;
 getTrailByte:
-                        trailByte = *mySource++;
-                        if(cs == JISX208) {
-                            _2022ToSJIS((uint8_t)mySourceChar, (uint8_t)trailByte, tempBuf);
-                        } else {
-                            tempBuf[0] = (char)mySourceChar;
-                            tempBuf[1] = trailByte;
+                        trailByte = (uint8_t)*mySource;
+                        /*
+                         * Ticket 5691: consistent illegal sequences:
+                         * - We include at least the first byte in the illegal sequence.
+                         * - If any of the non-initial bytes could be the start of a character,
+                         *   we stop the illegal sequence before the first one of those.
+                         *
+                         * In ISO-2022 DBCS, if the second byte is in the 21..7e range or is
+                         * an ESC/SO/SI, we report only the first byte as the illegal sequence.
+                         * Otherwise we convert or report the pair of bytes.
+                         */
+                        leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                        trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21);
+                        if (leadIsOk && trailIsOk) {
+                            ++mySource;
+                            uint32_t tmpSourceChar = (mySourceChar << 8) | trailByte;
+                            if(cs == JISX208) {
+                                _2022ToSJIS((uint8_t)mySourceChar, trailByte, tempBuf);
+                                mySourceChar = tmpSourceChar;
+                            } else {
+                                /* Copy before we modify tmpSourceChar so toUnicodeCallback() sees the correct bytes. */
+                                mySourceChar = tmpSourceChar;
+                                if (cs == KSC5601) {
+                                    tmpSourceChar += 0x8080;  /* = _2022ToGR94DBCS(tmpSourceChar) */
+                                }
+                                tempBuf[0] = (char)(tmpSourceChar >> 8);
+                                tempBuf[1] = (char)(tmpSourceChar);
+                            }
+                            targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE);
+                        } else if (!(trailIsOk || IS_2022_CONTROL(trailByte))) {
+                            /* report a pair of illegal bytes if the second byte is not a DBCS starter */
+                            ++mySource;
+                            /* add another bit so that the code below writes 2 bytes in case of error */
+                            mySourceChar = 0x10000 | (mySourceChar << 8) | trailByte;
                         }
-                        mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
-                        targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE);
                     } else {
                         args->converter->toUBytes[0] = (uint8_t)mySourceChar;
                         args->converter->toULength = 1;
@@ -2254,7 +2311,12 @@
             }
             /* only DBCS or SBCS characters are expected*/
             /* DB characters with high bit set to 1 are expected */
-            if(length > 2 || length==0 ||(((targetByteUnit & 0x8080) != 0x8080)&& length==2)){
+            if( length > 2 || length==0 ||
+                (length == 1 && targetByteUnit > 0x7f) ||
+                (length == 2 &&
+                    ((uint16_t)(targetByteUnit - 0xa1a1) > (0xfefe - 0xa1a1) ||
+                    (uint8_t)(targetByteUnit - 0xa1) > (0xfe - 0xa1)))
+            ) {
                 targetByteUnit=missingCharMarker;
             }
             if (targetByteUnit != missingCharMarker){
@@ -2583,17 +2645,34 @@
             myData->isEmptySegment = FALSE;	/* Any invalid char errors will be detected separately, so just reset this */
             if(myData->toU2022State.g == 1) {
                 if(mySource < mySourceLimit) {
-                    char trailByte;
+                    int leadIsOk, trailIsOk;
+                    uint8_t trailByte;
 getTrailByte:
-                    trailByte = *mySource++;
-                    tempBuf[0] = (char)(mySourceChar + 0x80);
-                    tempBuf[1] = (char)(trailByte + 0x80);
-                    mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
-                    if((mySourceChar & 0x8080) == 0) {
+                    targetUniChar = missingCharMarker;
+                    trailByte = (uint8_t)*mySource;
+                    /*
+                     * Ticket 5691: consistent illegal sequences:
+                     * - We include at least the first byte in the illegal sequence.
+                     * - If any of the non-initial bytes could be the start of a character,
+                     *   we stop the illegal sequence before the first one of those.
+                     *
+                     * In ISO-2022 DBCS, if the second byte is in the 21..7e range or is
+                     * an ESC/SO/SI, we report only the first byte as the illegal sequence.
+                     * Otherwise we convert or report the pair of bytes.
+                     */
+                    leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                    trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21);
+                    if (leadIsOk && trailIsOk) {
+                        ++mySource;
+                        tempBuf[0] = (char)(mySourceChar + 0x80);
+                        tempBuf[1] = (char)(trailByte + 0x80);
                         targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, tempBuf, 2, useFallback);
-                    } else {
-                        /* illegal bytes > 0x7f */
-                        targetUniChar = missingCharMarker;
+                        mySourceChar = (mySourceChar << 8) | trailByte;
+                    } else if (!(trailIsOk || IS_2022_CONTROL(trailByte))) {
+                        /* report a pair of illegal bytes if the second byte is not a DBCS starter */
+                        ++mySource;
+                        /* add another bit so that the code below writes 2 bytes in case of error */
+                        mySourceChar = 0x10000 | (mySourceChar << 8) | trailByte;
                     }
                 } else {
                     args->converter->toUBytes[0] = (uint8_t)mySourceChar;
@@ -2601,8 +2680,10 @@
                     break;
                 }
             }
-            else{
+            else if(mySourceChar <= 0x7f) {
                 targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, mySource - 1, 1, useFallback);
+            } else {
+                targetUniChar = 0xffff;
             }
             if(targetUniChar < 0xfffe){
                 if(args->offsets) {
@@ -3099,6 +3180,7 @@
         /* continue with a partial double-byte character */
         mySourceChar = args->converter->toUBytes[0];
         args->converter->toULength = 0;
+        targetUniChar = missingCharMarker;
         goto getTrailByte;
     }
 
@@ -3178,29 +3260,50 @@
                         UConverterSharedData *cnv;
                         StateEnum tempState;
                         int32_t tempBufLen;
-                        char trailByte;
+                        int leadIsOk, trailIsOk;
+                        uint8_t trailByte;
 getTrailByte:
-                        trailByte = *mySource++;
-                        tempState = (StateEnum)pToU2022State->cs[pToU2022State->g];
-                        if(tempState > CNS_11643_0) {
-                            cnv = myData->myConverterArray[CNS_11643];
-                            tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0));
-                            tempBuf[1] = (char) (mySourceChar);
-                            tempBuf[2] = trailByte;
-                            tempBufLen = 3;
-
-                        }else{
-                            cnv = myData->myConverterArray[tempState];
-                            tempBuf[0] = (char) (mySourceChar);
-                            tempBuf[1] = trailByte;
-                            tempBufLen = 2;
+                        trailByte = (uint8_t)*mySource;
+                        /*
+                         * Ticket 5691: consistent illegal sequences:
+                         * - We include at least the first byte in the illegal sequence.
+                         * - If any of the non-initial bytes could be the start of a character,
+                         *   we stop the illegal sequence before the first one of those.
+                         *
+                         * In ISO-2022 DBCS, if the second byte is in the 21..7e range or is
+                         * an ESC/SO/SI, we report only the first byte as the illegal sequence.
+                         * Otherwise we convert or report the pair of bytes.
+                         */
+                        leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                        trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21);
+                        if (leadIsOk && trailIsOk) {
+                            ++mySource;
+                            tempState = (StateEnum)pToU2022State->cs[pToU2022State->g];
+                            if(tempState >= CNS_11643_0) {
+                                cnv = myData->myConverterArray[CNS_11643];
+                                tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0));
+                                tempBuf[1] = (char) (mySourceChar);
+                                tempBuf[2] = (char) trailByte;
+                                tempBufLen = 3;
+
+                            }else{
+                                cnv = myData->myConverterArray[tempState];
+                                tempBuf[0] = (char) (mySourceChar);
+                                tempBuf[1] = (char) trailByte;
+                                tempBufLen = 2;
+                            }
+                            targetUniChar = ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE);
+                            mySourceChar = (mySourceChar << 8) | trailByte;
+                        } else if (!(trailIsOk || IS_2022_CONTROL(trailByte))) {
+                            /* report a pair of illegal bytes if the second byte is not a DBCS starter */
+                            ++mySource;
+                            /* add another bit so that the code below writes 2 bytes in case of error */
+                            mySourceChar = 0x10000 | (mySourceChar << 8) | trailByte;
                         }
-                        mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
                         if(pToU2022State->g>=2) {
                             /* return from a single-shift state to the previous one */
                             pToU2022State->g=pToU2022State->prevG;
                         }
-                        targetUniChar = ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE);
                     } else {
                         args->converter->toUBytes[0] = (uint8_t)mySourceChar;
                         args->converter->toULength = 1;
diff -ru icu.6175/source/common/ucnvhz.c icu/source/common/ucnvhz.c
--- icu.6175/source/common/ucnvhz.c	2009-06-02 15:47:31.000000000 +0100
+++ icu/source/common/ucnvhz.c	2009-06-02 15:57:18.000000000 +0100
@@ -196,10 +196,30 @@
                      /* if the first byte is equal to TILDE and the trail byte
                      * is not a valid byte then it is an error condition
                      */
-                    mySourceChar = 0x7e00 | mySourceChar;
-                    targetUniChar = 0xffff;
+                    /*
+                     * Ticket 5691: consistent illegal sequences:
+                     * - We include at least the first byte in the illegal sequence.
+                     * - If any of the non-initial bytes could be the start of a character,
+                     *   we stop the illegal sequence before the first one of those.
+                     */
                     myData->isEmptySegment = FALSE; /* different error here, reset this to avoid spurious future error */
-                    break;
+                    *err = U_ILLEGAL_ESCAPE_SEQUENCE;
+                    args->converter->toUBytes[0] = UCNV_TILDE;
+                    if( myData->isStateDBCS ?
+                            (0x21 <= mySourceChar && mySourceChar <= 0x7e) :
+                            mySourceChar <= 0x7f
+                    ) {
+                        /* The current byte could be the start of a character: Back it out. */
+                        args->converter->toULength = 1;
+                        --mySource;
+                    } else {
+                        /* Include the current byte in the illegal sequence. */
+                        args->converter->toUBytes[1] = mySourceChar;
+                        args->converter->toULength = 2;
+                    }
+                    args->target = myTarget;
+                    args->source = mySource;
+                    return;
                 }
             } else if(myData->isStateDBCS) {
                 if(args->converter->toUnicodeStatus == 0x00){
@@ -215,19 +235,36 @@
                 }
                 else{
                     /* trail byte */
+                    int leadIsOk, trailIsOk;
                     uint32_t leadByte = args->converter->toUnicodeStatus & 0xff;
-                    if( (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21) &&
-                        (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21)
-                    ) {
+                    targetUniChar = 0xffff;
+                    /*
+                     * Ticket 5691: consistent illegal sequences:
+                     * - We include at least the first byte in the illegal sequence.
+                     * - If any of the non-initial bytes could be the start of a character,
+                     *   we stop the illegal sequence before the first one of those.
+                     *
+                     * In HZ DBCS, if the second byte is in the 21..7e range,
+                     * we report only the first byte as the illegal sequence.
+                     * Otherwise we convert or report the pair of bytes.
+                     */
+                    leadIsOk = (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21);
+                    trailIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                    if (leadIsOk && trailIsOk) {
                         tempBuf[0] = (char) (leadByte+0x80) ;
                         tempBuf[1] = (char) (mySourceChar+0x80);
                         targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->gbConverter->sharedData,
                             tempBuf, 2, args->converter->useFallback);
+                        mySourceChar= (leadByte << 8) | mySourceChar;
+                    } else if (trailIsOk) {
+                        /* report a single illegal byte and continue with the following DBCS starter byte */
+                        --mySource;
+                        mySourceChar = (int32_t)leadByte;
                     } else {
-                        targetUniChar = 0xffff;
+                        /* report a pair of illegal bytes if the second byte is not a DBCS starter */
+                        /* add another bit so that the code below writes 2 bytes in case of error */
+                        mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar;
                     }
-                    /* add another bit so that the code below writes 2 bytes in case of error */
-                    mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar;
                     args->converter->toUnicodeStatus =0x00;
                 }
             }
diff -ru icu.6175/source/common/ucnvmbcs.c icu/source/common/ucnvmbcs.c
--- icu.6175/source/common/ucnvmbcs.c	2009-06-02 15:47:31.000000000 +0100
+++ icu/source/common/ucnvmbcs.c	2009-06-02 15:56:07.000000000 +0100
@@ -1697,6 +1697,65 @@
     pArgs->offsets=offsets;
 }
 
+static UBool
+hasValidTrailBytes(const int32_t (*stateTable)[256], uint8_t state) {
+    const int32_t *row=stateTable[state];
+    int32_t b, entry;
+    /* First test for final entries in this state for some commonly valid byte values. */
+    entry=row[0xa1];
+    if( !MBCS_ENTRY_IS_TRANSITION(entry) &&
+        MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL
+    ) {
+        return TRUE;
+    }
+    entry=row[0x41];
+    if( !MBCS_ENTRY_IS_TRANSITION(entry) &&
+        MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL
+    ) {
+        return TRUE;
+    }
+    /* Then test for final entries in this state. */
+    for(b=0; b<=0xff; ++b) {
+        entry=row[b];
+        if( !MBCS_ENTRY_IS_TRANSITION(entry) &&
+            MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL
+        ) {
+            return TRUE;
+        }
+    }
+    /* Then recurse for transition entries. */
+    for(b=0; b<=0xff; ++b) {
+        entry=row[b];
+        if( MBCS_ENTRY_IS_TRANSITION(entry) &&
+            hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry))
+        ) {
+            return TRUE;
+        }
+    }
+    return FALSE;
+}
+
+/*
+ * Is byte b a single/lead byte in this state?
+ * Recurse for transition states, because here we don't want to say that
+ * b is a lead byte if all byte sequences that start with b are illegal.
+ */
+static UBool
+isSingleOrLead(const int32_t (*stateTable)[256], uint8_t state, UBool isDBCSOnly, uint8_t b) {
+    const int32_t *row=stateTable[state];
+    int32_t entry=row[b];
+    if(MBCS_ENTRY_IS_TRANSITION(entry)) {   /* lead byte */
+        return hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry));
+    } else {
+        uint8_t action=(uint8_t)(MBCS_ENTRY_FINAL_ACTION(entry));
+        if(action==MBCS_STATE_CHANGE_ONLY && isDBCSOnly) {
+            return FALSE;   /* SI/SO are illegal for DBCS-only conversion */
+        } else {
+            return action!=MBCS_STATE_ILLEGAL;
+        }
+    }
+}
+
 U_CFUNC void
 ucnv_MBCSToUnicodeWithOffsets(UConverterToUnicodeArgs *pArgs,
                           UErrorCode *pErrorCode) {
@@ -2052,6 +2111,34 @@
             sourceIndex=nextSourceIndex;
         } else if(U_FAILURE(*pErrorCode)) {
             /* callback(illegal) */
+            if(byteIndex>1) {
+                /*
+                 * Ticket 5691: consistent illegal sequences:
+                 * - We include at least the first byte in the illegal sequence.
+                 * - If any of the non-initial bytes could be the start of a character,
+                 *   we stop the illegal sequence before the first one of those.
+                 */
+                UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0);
+                int8_t i;
+                for(i=1;
+                    i<byteIndex && !isSingleOrLead(stateTable, state, isDBCSOnly, bytes[i]);
+                    ++i) {}
+                if(i<byteIndex) {
+                    /* Back out some bytes. */
+                    int8_t backOutDistance=byteIndex-i;
+                    int32_t bytesFromThisBuffer=(int32_t)(source-(const uint8_t *)pArgs->source);
+                    byteIndex=i;  /* length of reported illegal byte sequence */
+                    if(backOutDistance<=bytesFromThisBuffer) {
+                        source-=backOutDistance;
+                    } else {
+                        /* Back out bytes from the previous buffer: Need to replay them. */
+                        cnv->preToULength=(int8_t)(bytesFromThisBuffer-backOutDistance);
+                        /* preToULength is negative! */
+                        uprv_memcpy(cnv->preToU, bytes+i, -cnv->preToULength);
+                        source=(const uint8_t *)pArgs->source;
+                    }
+                }
+            }
             break;
         } else /* unassigned sequences indicated with byteIndex>0 */ {
             /* try an extension mapping */
@@ -2062,7 +2149,7 @@
                               &offsets, sourceIndex,
                               pArgs->flush,
                               pErrorCode);
-            sourceIndex=nextSourceIndex+(int32_t)(source-(const uint8_t *)pArgs->source);
+            sourceIndex=nextSourceIndex+=(int32_t)(source-(const uint8_t *)pArgs->source);
 
             if(U_FAILURE(*pErrorCode)) {
                 /* not mappable or buffer overflow */
@@ -2353,15 +2440,37 @@
 
     if(c<0) {
         if(U_SUCCESS(*pErrorCode) && source==sourceLimit && lastSource<source) {
-            *pErrorCode=U_TRUNCATED_CHAR_FOUND;
-        }
-        if(U_FAILURE(*pErrorCode)) {
             /* incomplete character byte sequence */
             uint8_t *bytes=cnv->toUBytes;
             cnv->toULength=(int8_t)(source-lastSource);
             do {
                 *bytes++=*lastSource++;
             } while(lastSource<source);
+            *pErrorCode=U_TRUNCATED_CHAR_FOUND;
+        } else if(U_FAILURE(*pErrorCode)) {
+            /* callback(illegal) */
+            /*
+             * Ticket 5691: consistent illegal sequences:
+             * - We include at least the first byte in the illegal sequence.
+             * - If any of the non-initial bytes could be the start of a character,
+             *   we stop the illegal sequence before the first one of those.
+             */
+            UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0);
+            uint8_t *bytes=cnv->toUBytes;
+            *bytes++=*lastSource++;     /* first byte */
+            if(lastSource==source) {
+                cnv->toULength=1;
+            } else /* lastSource<source: multi-byte character */ {
+                int8_t i;
+                for(i=1;
+                    lastSource<source && !isSingleOrLead(stateTable, state, isDBCSOnly, *lastSource);
+                    ++i
+                ) {
+                    *bytes++=*lastSource++;
+                }
+                cnv->toULength=i;
+                source=lastSource;
+            }
         } else {
             /* no output because of empty input or only state changes */
             *pErrorCode=U_INDEX_OUTOFBOUNDS_ERROR;
diff -ru icu.6175/source/test/cintltst/nccbtst.c icu/source/test/cintltst/nccbtst.c
--- icu.6175/source/test/cintltst/nccbtst.c	2009-06-02 15:47:18.000000000 +0100
+++ icu/source/test/cintltst/nccbtst.c	2009-06-02 15:47:38.000000000 +0100
@@ -2497,13 +2497,13 @@
 
 
     static const uint8_t text943[] = {
-        0x82, 0xa9, 0x82, 0x20, /*0xc8,*/  0x61, 0x8a, 0xbf, 0x8e, 0x9a };
-    static const UChar toUnicode943sub[] = { 0x304b, 0xfffd, /*0xff88,*/ 0x0061, 0x6f22,  0x5b57};
-    static const UChar toUnicode943skip[]= { 0x304b, /*0xff88,*/ 0x0061, 0x6f22,  0x5b57};
+        0x82, 0xa9, 0x82, 0x20, 0x61, 0x8a, 0xbf, 0x8e, 0x9a };
+    static const UChar toUnicode943sub[] = { 0x304b, 0x1a, 0x20, 0x0061, 0x6f22,  0x5b57 };
+    static const UChar toUnicode943skip[]= { 0x304b, 0x20, 0x0061, 0x6f22,  0x5b57 };
     static const UChar toUnicode943stop[]= { 0x304b};
 
-    static const int32_t  fromIBM943Offssub[]  = {0, 2, 4, 5, 7};
-    static const int32_t  fromIBM943Offsskip[] = { 0, 4, 5, 7};
+    static const int32_t  fromIBM943Offssub[]  = { 0, 2, 3, 4, 5, 7 };
+    static const int32_t  fromIBM943Offsskip[] = { 0, 3, 4, 5, 7 };
     static const int32_t  fromIBM943Offsstop[] = { 0};
 
     gInBufferSize = inputsize;
@@ -2537,9 +2537,9 @@
 {
     static const uint8_t sampleText[] = {
         0x82, 0xa9, 0x61, 0x62, 0x63 , 0x82,
-        0xff, /*0x82, 0xa9,*/ 0x32, 0x33};
-    static const UChar toUnicode943sub[] = {0x304b, 0x0061, 0x0062, 0x0063,  0xfffd,/*0x304b,*/ 0x0032, 0x0033};
-    static const int32_t  fromIBM943Offssub[]  = {0, 2, 3, 4, 5, 7, 8};
+        0xff, 0x32, 0x33};
+    static const UChar toUnicode943sub[] = { 0x304b, 0x0061, 0x0062, 0x0063, 0x1a, 0x1a, 0x0032, 0x0033 };
+    static const int32_t fromIBM943Offssub[] = { 0, 2, 3, 4, 5, 6, 7, 8 };
     /*checking illegal value for ibm-943 with substitute*/ 
     gInBufferSize = inputsize;
     gOutBufferSize = outputsize;
diff -ru icu.6175/source/test/cintltst/nucnvtst.c icu/source/test/cintltst/nucnvtst.c
--- icu.6175/source/test/cintltst/nucnvtst.c	2009-06-02 15:47:18.000000000 +0100
+++ icu/source/test/cintltst/nucnvtst.c	2009-06-02 15:47:38.000000000 +0100
@@ -2606,7 +2606,7 @@
     TestNextUCharError(cnv, source, source, U_INDEX_OUTOFBOUNDS_ERROR, "sourceLimit <= source");
     /*Test for the condition where there is an invalid character*/
     {
-        static const uint8_t source2[]={0xa1, 0x01};
+        static const uint8_t source2[]={0xa1, 0x80};
         TestNextUCharError(cnv, (const char*)source2, (const char*)source2+sizeof(source2), U_ZERO_ERROR, "an invalid character");
     }
     /*Test for the condition where we have a truncated char*/
@@ -3899,11 +3899,11 @@
 TestISO_2022_KR() {
     /* test input */
     static const uint16_t in[]={
-                    0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F66,0x9F67,0x9F6A,0x000A,0x000D
-                   ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC02,0xAC04
+                    0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F67,0x9F6A,0x000A,0x000D
+                   ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC04
                    ,0xAC07,0xAC08,0xAC09,0x0025,0x0026,0x0027,0x000A,0x000D,0x0028,0x0029
                    ,0x002A,0x002B,0x002C,0x002D,0x002E,0x53C3,0x53C8,0x53C9,0x53CA,0x53CB
-                   ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53DF,0x53E1,0x53E2
+                   ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53E1,0x53E2
                    ,0x53E3,0x53E4,0x000A,0x000D};
     const UChar* uSource;
     const UChar* uSourceLimit;
diff -ru icu.6175/source/test/testdata/conversion.txt icu/source/test/testdata/conversion.txt
--- icu.6175/source/test/testdata/conversion.txt	2009-06-02 15:47:18.000000000 +0100
+++ icu/source/test/testdata/conversion.txt	2009-06-02 15:57:41.000000000 +0100
@@ -48,12 +48,144 @@
     toUnicode {
       Headers { "charset", "bytes", "unicode", "offsets", "flush", "fallbacks", "errorCode", "callback", "invalidChars" }
       Cases {
+        // Test ticket 5691: consistent illegal sequences
+        // The following test cases are for illegal character byte sequences.
+        //
+        // Unfortunately, we cannot use the Shift-JIS examples from the ticket
+        // comments because our Shift-JIS table is Windows-compatible and
+        // therefore has no illegal single bytes. Same for GBK.
+        // Instead, we use the stricter GB 18030 also for 2-byte examples.
+        // The byte sequences are generally slightly different from the ticket
+        // comment, simply using assigned characters rather than just
+        // theoretically valid sequences.
+        {
+          "gb18030",
+          :bin{ 618140813c81ff7a },
+          "a\u4e02\\x81<\\x81\\xFFz",
+          :intvector{ 0,1,3,3,3,3,4,5,5,5,5,5,5,5,5,7 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "EUC-JP",
+          :bin{ 618fb0a98fb03c8f3cb0a97a },
+          "a\u4e28\\x8F\\xB0<\\x8F<\u9022z",
+          :intvector{ 0,1,4,4,4,4,5,5,5,5,6,7,7,7,7,8,9,11 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "gb18030",
+          :bin{ 618130fc318130fc8181303c3e813cfc817a },
+          "a5ed\\x810\u9f07\\x810<>\\x81<\u9f07z",
+          :intvector{ 0,1,5,5,5,5,6,7,9,9,9,9,10,11,12,13,13,13,13,14,15,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "UTF-8",
+          :bin{ 61f1808182f180813cf18081fff180ff3cf1ff3c3e7a },
+          "a\U00040042\\xF1\\x80\\x81<\\xF1\\x80\\x81\\xFF\\xF1\\x80\\xFF<\\xF1\\xFF<>z",
+          :intvector{ 0,1,1,5,5,5,5,5,5,5,5,5,5,5,5,8,9,9,9,9,9,9,9,9,9,9,9,9,12,12,12,12,13,13,13,13,13,13,13,13,15,15,15,15,16,17,17,17,17,18,18,18,18,19,20,21 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-JP",
+          :bin{ 1b24424141af4142affe41431b2842 },
+          "\u758f\\xAF\u758e\\xAF\\xFE\u790e",
+          :intvector{ 3,5,5,5,5,6,8,8,8,8,8,8,8,8,10 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ibm-25546",
+          :bin{ 411b242943420e4141af4142affe41430f5a },
+          "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-KR",
+          :bin{ 411b242943420e4141af4142affe41430f5a },
+          "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-CN",
+          :bin{ 411b242941420e4141af4142affe41430f5a },
+          "AB\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "HZ",
+          :bin{ 417e7b4141af4142affe41437e7d5a },
+          "A\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z",
+          :intvector{ 0,3,5,5,5,5,6,8,8,8,8,8,8,8,8,10,14 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        // Test ticket 5691: consistent illegal sequences
+        // The following test cases are for illegal escape/designator/shift sequences.
+        //
+        // ISO-2022-JP and -CN with illegal escape sequences.
+        {
+          "ISO-2022-JP",
+          :bin{ 611b24201b244241411b283f1b28427a },
+          "a\\x1B$ \u758f\\x1B\u2538z",
+          :intvector{ 0,1,1,1,1,2,3,7,9,9,9,9,10,15 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-CN",
+          :bin{ 611b2429201b2429410e41410f7a },
+          "a\\x1B$) \u4eaez",
+          :intvector{ 0,1,1,1,1,2,3,4,10,13 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        // Test ticket 5691: ISO-2022-JP-2 with illegal single-shift SS2 and SS3 sequences.
+        // The first ESC N comes before its designator sequence, the last sequence is ESC+space.
+        {
+          "ISO-2022-JP-2",
+          :bin{ 4e1b4e4e1b2e414e1b4e4e4e1b204e },
+          "N\\x1BNNN\xceN\\x1B N",
+          :intvector{ 0,1,1,1,1,2,3,7,10,11,12,12,12,12,13,14 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-CN-EXT",
+          :bin{ 4e1b4e4e1b242a484e1b4e4e4e4e1b204e },
+          "N\\x1BNNN\u8f0eN\\x1B N",
+          :intvector{ 0,1,1,1,1,2,3,8,11,13,14,14,14,14,15,16 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-CN-EXT",
+          :bin{ 4f1b4f4f1b242b494f1b4f4f4f4f1b204f },
+          "O\\x1BOOO\u492bO\\x1B O",
+          :intvector{ 0,1,1,1,1,2,3,8,11,13,14,14,14,14,15,16 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        // Test ticket 5691: HZ with illegal tilde sequences.
+        {
+          "HZ",
+          :bin{ 417e20427e21437e80447e7b41417e207e41427e7f41437e7d5a },
+          "A\\x7E B\\x7E!C\\x7E\\x80D\u4eae\\x7E\\x20\\x7E\u8c05\\x7E\\x7F\u64a9Z",
+          :intvector{ 0,1,1,1,1,2,3,4,4,4,4,5,6,7,7,7,7,7,7,7,7,9,                          // SBCS
+                      12,14,14,14,14,14,14,14,14,16,16,16,16,17,19,19,19,19,19,19,19,19,21, // DBCS
+                      25 },                                                                 // SBCS
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        // Test ticket 5691: Example from Peter Edberg.
+        {
+          "ISO-2022-JP",
+          :bin{ 1b244230212f7e742630801b284a621b2458631b2842648061 },
+          "\u4e9c\ufffd\u7199\ufffdb\ufffd$Xcd\ufffda",
+          :intvector{ 3,5,7,9,14,15,16,17,18,22,23,24 },
+          :int{1}, :int{0}, "", "?", :bin{""}
+        }
         // test that HZ limits its byte values to lead bytes 21..7d and trail bytes 21..7e
         {
           "HZ",
-          :bin{ 7e7b21212120217e217f772100007e217e7d207e7e807e0a2b },
-          "\u3000\ufffd\u3013\ufffd\u9ccc\ufffd\ufffd ~\ufffd+",
-          :intvector{ 2,4,6,8,10,12,14,18,19,21,24 },
+          :bin{ 7e7b21212120217e217f772100007e217e7e7d207e7e807e0a2b },
+          "\u3000\ufffd\u3013\ufffd\u9ccc\ufffd\ufffd\u3013 ~\ufffd+",
+          :intvector{ 2,4,6,8,10,12,14,15,19,20,22,25 },
           :int{1}, :int{1}, "", "?", :bin{""}
         }
         // improve coverage of ISO-2022-JP converter with hardcoded JIS X 0201 and
@@ -61,8 +193,8 @@
         {
           "ISO-2022-JP",
           :bin{ 1b284a7d7e801b2442306c20217f7e21202160217f22202225227f5f211b2842 },
-          "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\u25b2\ufffd\u6f3e",
-          :intvector{ 3,4,5,9,11,13,15,17,19,21,23,25,27 },
+          "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\u25b2\ufffd\u6f3e",
+          :intvector{ 3,4,5,9,11,12,14,16,17,19,21,23,25,27 },
           :int{1}, :int{1}, "", "?", :bin{""}
         }
         // improve coverage of unrolled loops in ucnvmbcs.c/ucnv_MBCSSingleToBMPWithOffsets()
@@ -341,7 +473,7 @@
         {
           "ISO-2022-CN-EXT",
           :bin{ 411b4e2121 }, "\x41", :intvector{ 0 },
-          :int{1}, :int{1}, "illesc", ".", :bin{ 1b4e }
+          :int{1}, :int{1}, "illesc", ".", :bin{ 1b }
         }
         // G3 designator: recognized, but not supported for -CN (only for -CN-EXT)
         {

Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#534590; Package icu. (Tue, 25 Aug 2009 20:36:03 GMT) (full text, mbox, link).

Acknowledgement sent to Jay Berkenbilt <qjb@debian.org>:
Extra info received and forwarded to list. (Tue, 25 Aug 2009 20:36:03 GMT) (full text, mbox, link).

Message #25 received at 534590@bugs.debian.org (full text, mbox, reply):

Okay, bad brain day.  Now I've posted the same patches three times.
Sorry.  Must have been looking at the wrong bug report or something.

Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#534590; Package icu. (Tue, 08 Sep 2009 01:45:08 GMT) (full text, mbox, link).

Acknowledgement sent to Jay Berkenbilt <qjb@debian.org>:
Extra info received and forwarded to list. (Tue, 08 Sep 2009 01:45:08 GMT) (full text, mbox, link).

Message #30 received at 534590@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]

I have backported CVE-2009-0153 into debian's stable and old-stable ICU
packages based on Red Hat's backporting.

Backporting CVE-2009-0153 using Red Hat's patches was a bit tricky since
their backport depended on several earlier patches, some of which we
had, and some of which we didn't, but I managed to figure out the patch
dependencies.  This involved pulling in a few additional patches of
theirs and reworking one we already had for an earlier issue.  Basically
I replaced our patch for CVE-2008-1036 with Red Hat's after comparing
the two patches and determining that they differed pretty much only by
offsets.  3.6 was slightly more difficult than 3.8.1, but after figuring
out the patch dependencies, they were quite similar.  3.6 required one
patch beyond what 3.8.1 required.

I don't have etch or lenny build environments, but I built the packages
on squeeze and manually ran the test suites on the built results.  In
both cases, the test suites pass, so I think we can have pretty high
confidence that this didn't break anything and correctly applied the
patches.  My confidence is somewhat higher with the 3.8 version than the
3.6 version, but it's pretty high in both cases.

I'm attaching a tarfile containing _source.changes, .diff.gz, and .dsc
files for icu_3.6-2etch4 and icu_3.8.1-3+lenny2.  I've signed the
changes and dsc files, but obviously feel free to make any changes
required before uploading (needless to say).

-- 
Jay Berkenbilt <qjb@debian.org>

[icu-CVE-2009-0153-backports.tar.gz (application/octet-stream, attachment)]

[Message part 3 (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Jay Berkenbilt <qjb@debian.org>:
Bug#534590; Package icu. (Tue, 08 Sep 2009 20:06:03 GMT) (full text, mbox, link).

Acknowledgement sent to Moritz Muehlenhoff <jmm@inutil.org>:
Extra info received and forwarded to list. Copy sent to Jay Berkenbilt <qjb@debian.org>. (Tue, 08 Sep 2009 20:06:04 GMT) (full text, mbox, link).

Message #35 received at 534590@bugs.debian.org (full text, mbox, reply):

On Mon, Sep 07, 2009 at 09:34:40PM -0400, Jay Berkenbilt wrote:
> 
> I have backported CVE-2009-0153 into debian's stable and old-stable ICU
> packages based on Red Hat's backporting.
> 
> Backporting CVE-2009-0153 using Red Hat's patches was a bit tricky since
> their backport depended on several earlier patches, some of which we
> had, and some of which we didn't, but I managed to figure out the patch
> dependencies.  This involved pulling in a few additional patches of
> theirs and reworking one we already had for an earlier issue.  Basically
> I replaced our patch for CVE-2008-1036 with Red Hat's after comparing
> the two patches and determining that they differed pretty much only by
> offsets.  3.6 was slightly more difficult than 3.8.1, but after figuring
> out the patch dependencies, they were quite similar.  3.6 required one
> patch beyond what 3.8.1 required.
> 
> I don't have etch or lenny build environments, but I built the packages
> on squeeze and manually ran the test suites on the built results.  In
> both cases, the test suites pass, so I think we can have pretty high
> confidence that this didn't break anything and correctly applied the
> patches.  My confidence is somewhat higher with the 3.8 version than the
> 3.6 version, but it's pretty high in both cases.
> 
> I'm attaching a tarfile containing _source.changes, .diff.gz, and .dsc
> files for icu_3.6-2etch4 and icu_3.8.1-3+lenny2.  I've signed the
> changes and dsc files, but obviously feel free to make any changes
> required before uploading (needless to say).

Thanks, I'll review, build, test and upload. What testing do you
propose? Any application using ICU, which is especially good for
testing or is there even a test suite?

Cheers,
        Moritz

Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#534590; Package icu. (Tue, 08 Sep 2009 20:24:04 GMT) (full text, mbox, link).

Acknowledgement sent to Jay Berkenbilt <qjb@debian.org>:
Extra info received and forwarded to list. (Tue, 08 Sep 2009 20:24:04 GMT) (full text, mbox, link).

Message #40 received at 534590@bugs.debian.org (full text, mbox, reply):

Moritz Muehlenhoff <jmm@inutil.org> wrote:

> On Mon, Sep 07, 2009 at 09:34:40PM -0400, Jay Berkenbilt wrote:

>> I have backported CVE-2009-0153 into debian's stable and old-stable ICU
>> packages based on Red Hat's backporting.

>> . . .

>> I don't have etch or lenny build environments, but I built the packages
>> on squeeze and manually ran the test suites on the built results.  In
>> both cases, the test suites pass, so I think we can have pretty high
>> confidence that this didn't break anything and correctly applied the
>> patches.  My confidence is somewhat higher with the 3.8 version than the
>> 3.6 version, but it's pretty high in both cases.

>> . . .
>
> Thanks, I'll review, build, test and upload. What testing do you
> propose? Any application using ICU, which is especially good for
> testing or is there even a test suite?

The ICU packages themselves have test suites, which I ran in both
cases.  The test suites are not run by default by debian/rules because
they take a long time and, in some case in the past, I had builds fail
because of timeouts in autobuilders on slower platforms like the now
defunct m68k and arm ports.  Maybe I should consider turning this back
on.

In any case, after building the package, just cd to
build-area/icu-x.y.z/icu/source and run make check.

As for other testing, openoffice.org uses ICU, so opening up various
translations in openoffice.org is probably a reasonable test.  Based on
past experience, I generally find the ICU test suite to be pretty
trustworthy.

When I test ICU myself before uploading to unstable, I also run the test
suite for some of my own applications that link with xerces-c and use
that to read UTF-8 encoded XML, since xerces-c uses ICU, but that's a
weaker test case than running ICU's test suite.

Hmm.  It occurs to me that Red Hat's 3.6 packages include a patch to
roll back ABI changes introduced by other patches.  I'll check into that
and get back to you momentarily to see whether it's necessary to apply
all or part of it.  In the mean time, I'd definitely suggest using
openoffice.org at least to make sure everything appears to work.  I
don't have a specific test case that exercises the bug that this patch
is designed to fix.  Maybe there's one mentioned in the bug report or
one of its links.

I'll reply again about the ABI issue.

-- 
Jay Berkenbilt <qjb@debian.org>

Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#534590; Package icu. (Tue, 08 Sep 2009 20:33:19 GMT) (full text, mbox, link).

Acknowledgement sent to Jay Berkenbilt <qjb@debian.org>:
Extra info received and forwarded to list. (Tue, 08 Sep 2009 20:33:19 GMT) (full text, mbox, link).

Message #45 received at 534590@bugs.debian.org (full text, mbox, reply):

Stand by....I'm going to give you an updated diff.gz and dsc for 3.6
that includes the ABI rollback patch.  This appears not to be necessary
for 3.8.1.

Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#534590; Package icu. (Tue, 08 Sep 2009 20:39:08 GMT) (full text, mbox, link).

Acknowledgement sent to Jay Berkenbilt <qjb@debian.org>:
Extra info received and forwarded to list. (Tue, 08 Sep 2009 20:39:08 GMT) (full text, mbox, link).

Message #50 received at 534590@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]

Okay, here are updated files for 3.6.  These also pass their own test
suite.  (Run their test suite from build-tree/icu-3.6/icu/source, not
build-area as I said in my earlier message.)  You should test with these
instead of the ones I sent yesterday.  This includes the ABI rollback
patch.  It also builds fine under squeeze and passes its test suite.
The 3.8.1 files I sent should be okay, but you should obviously test
them as well.  Sorry about the omission.

--Jay

[icu-CVE-2009-0153-backports-3.6-updated.tar.gz (application/x-gzip, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, Jay Berkenbilt <qjb@debian.org>:
Bug#534590; Package icu. (Tue, 08 Sep 2009 20:42:03 GMT) (full text, mbox, link).

Message #55 received at 534590@bugs.debian.org (full text, mbox, reply):

On Tue, Sep 08, 2009 at 04:26:48PM -0400, Jay Berkenbilt wrote:
> 
> Okay, here are updated files for 3.6.  These also pass their own test
> suite.  (Run their test suite from build-tree/icu-3.6/icu/source, not
> build-area as I said in my earlier message.)  You should test with these
> instead of the ones I sent yesterday.  This includes the ABI rollback
> patch.  It also builds fine under squeeze and passes its test suite.
> The 3.8.1 files I sent should be okay, but you should obviously test
> them as well.  Sorry about the omission.

Thanks, I'll look into it tomorrow.

Cheers,
        Moritz

Reply sent to Jay Berkenbilt <qjb@debian.org>:
You have taken responsibility. (Thu, 17 Sep 2009 02:30:03 GMT) (full text, mbox, link).

Notification sent to Kees Cook <kees@debian.org>:
Bug acknowledged by developer. (Thu, 17 Sep 2009 02:30:03 GMT) (full text, mbox, link).

Message #60 received at 534590-close@bugs.debian.org (full text, mbox, reply):

Source: icu
Source-Version: 3.8.1-3+lenny2

We believe that the bug you reported is fixed in the latest version of
icu, which is due to be installed in the Debian FTP archive:

icu-doc_3.8.1-3+lenny2_all.deb
  to pool/main/i/icu/icu-doc_3.8.1-3+lenny2_all.deb
icu_3.8.1-3+lenny2.diff.gz
  to pool/main/i/icu/icu_3.8.1-3+lenny2.diff.gz
icu_3.8.1-3+lenny2.dsc
  to pool/main/i/icu/icu_3.8.1-3+lenny2.dsc
libicu-dev_3.8.1-3+lenny2_i386.deb
  to pool/main/i/icu/libicu-dev_3.8.1-3+lenny2_i386.deb
libicu38-dbg_3.8.1-3+lenny2_i386.deb
  to pool/main/i/icu/libicu38-dbg_3.8.1-3+lenny2_i386.deb
libicu38_3.8.1-3+lenny2_i386.deb
  to pool/main/i/icu/libicu38_3.8.1-3+lenny2_i386.deb



A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 534590@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Jay Berkenbilt <qjb@debian.org> (supplier of updated icu package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 1.8
Date: Mon, 07 Sep 2009 20:00:39 -0400
Source: icu
Binary: libicu38 libicu38-dbg libicu-dev lib32icu38 lib32icu-dev icu-doc
Architecture: source all i386
Version: 3.8.1-3+lenny2
Distribution: stable-security
Urgency: high
Maintainer: Jay Berkenbilt <qjb@debian.org>
Changed-By: Jay Berkenbilt <qjb@debian.org>
Description: 
 icu-doc    - API documentation for ICU classes and functions
 lib32icu-dev - Development files for International Components for Unicode (32-bi
 lib32icu38 - International Components for Unicode (32-bit)
 libicu-dev - Development files for International Components for Unicode
 libicu38   - International Components for Unicode
 libicu38-dbg - International Components for Unicode
Closes: 534590
Changes: 
 icu (3.8.1-3+lenny2) stable-security; urgency=high
 .
   * Apply patch CVE-2009-0153.patch to fix problem handling invalid byte
     sequences during Unicode conversion.  Thanks to Red Hat for
     backporting the patch to ICU version 3.8.1.  Applying this patch to
     the debian package required pulling in three additional Red Hat
     patches for tickets 5797, 6001, and 6002 in ICU's issue tracking
     system as well as adjusting offsets in CVE-2008-1036.patch.  (Closes:
     #534590)
Checksums-Sha1: 
 9e19a04c83c78507a17a4b63fc3cd888dff1f0fe 1298 icu_3.8.1-3+lenny2.dsc
 380e3dffcf780226e3c48c5a1c075513554f5083 41943 icu_3.8.1-3+lenny2.diff.gz
 fd1b481cf86ffe12f003bbb88202a089b7f7ab90 3659700 icu-doc_3.8.1-3+lenny2_all.deb
 61be29d34827455982116937e8722928c3af625e 5918780 libicu38_3.8.1-3+lenny2_i386.deb
 08f530b306a9dd05dcb27cabb8e49cc82089817c 2278340 libicu38-dbg_3.8.1-3+lenny2_i386.deb
 75862e07dfc12a6bd37aab9ca0b7bcd7d09effa9 6975168 libicu-dev_3.8.1-3+lenny2_i386.deb
Checksums-Sha256: 
 47afc1ac4ae7d37f8f26d894de75082a71a6653b6336eca853ec165a9e6b4d90 1298 icu_3.8.1-3+lenny2.dsc
 ba7999f8ff453a10163edffcaf5f7f9ca946100b4f83da47c68616860ee1cb4f 41943 icu_3.8.1-3+lenny2.diff.gz
 ad8ded72ac122d6609b88e66cc0f75c2e06f079466cd8287e8ef9d9c11c5259d 3659700 icu-doc_3.8.1-3+lenny2_all.deb
 6450aab55d8ea87cf14a46f804a0c5ebdce82890cd4c51b67026db4820bb467d 5918780 libicu38_3.8.1-3+lenny2_i386.deb
 5a5c53fda354063f299a9c5d06d2a0d97c096b9d7707ece07987c574162aaf4a 2278340 libicu38-dbg_3.8.1-3+lenny2_i386.deb
 942a3560e7d1c4a8641cb1a11283ce72c99b5473366acffea5244efe77f365a0 6975168 libicu-dev_3.8.1-3+lenny2_i386.deb
Files: 
 e0528ce00964025af9b2f940f588664a 1298 libs optional icu_3.8.1-3+lenny2.dsc
 57d76fe9884c543a634bfd44425a42c6 41943 libs optional icu_3.8.1-3+lenny2.diff.gz
 69882d02e07863b195b7e9b798bdeff2 3659700 doc optional icu-doc_3.8.1-3+lenny2_all.deb
 a471bd785fecadc4a7acd91be38a1bca 5918780 libs optional libicu38_3.8.1-3+lenny2_i386.deb
 b95d691813f7d32d7bc1a8aa96ddcd94 2278340 libs extra libicu38-dbg_3.8.1-3+lenny2_i386.deb
 e5c844c5ce908655075dd49c57182b3f 6975168 libdevel optional libicu-dev_3.8.1-3+lenny2_i386.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAkqvzHkACgkQXm3vHE4uylps0gCcCkfaGXyWtJrz8fqno3S/XXoM
zfwAoNAmgYLnJww5PNtseCjH1TVNeV0q
=1o3y
-----END PGP SIGNATURE-----

Reply sent to Jay Berkenbilt <qjb@debian.org>:
You have taken responsibility. (Mon, 05 Oct 2009 01:57:03 GMT) (full text, mbox, link).

Notification sent to Kees Cook <kees@debian.org>:
Bug acknowledged by developer. (Mon, 05 Oct 2009 01:57:03 GMT) (full text, mbox, link).

Message #65 received at 534590-close@bugs.debian.org (full text, mbox, reply):

Source: icu
Source-Version: 3.6-2etch4

We believe that the bug you reported is fixed in the latest version of
icu, which is due to be installed in the Debian FTP archive:

icu-doc_3.6-2etch4_all.deb
  to pool/main/i/icu/icu-doc_3.6-2etch4_all.deb
icu_3.6-2etch4.diff.gz
  to pool/main/i/icu/icu_3.6-2etch4.diff.gz
icu_3.6-2etch4.dsc
  to pool/main/i/icu/icu_3.6-2etch4.dsc
libicu36-dev_3.6-2etch4_i386.deb
  to pool/main/i/icu/libicu36-dev_3.6-2etch4_i386.deb
libicu36_3.6-2etch4_i386.deb
  to pool/main/i/icu/libicu36_3.6-2etch4_i386.deb



A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 534590@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Jay Berkenbilt <qjb@debian.org> (supplier of updated icu package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 1.7
Date: Mon, 07 Sep 2009 20:21:59 -0400
Source: icu
Binary: libicu36-dev libicu36 icu-doc
Architecture: source all i386
Version: 3.6-2etch4
Distribution: oldstable-security
Urgency: high
Maintainer: Jay Berkenbilt <qjb@debian.org>
Changed-By: Jay Berkenbilt <qjb@debian.org>
Description: 
 icu-doc    - API documentation for ICU classes and functions
 libicu36   - International Components for Unicode (libraries)
 libicu36-dev - International Components for Unicode (development files)
Closes: 534590
Changes: 
 icu (3.6-2etch4) oldstable-security; urgency=high
 .
   * Apply patch CVE-2009-0153.patch to fix problem handling invalid byte
     sequences during Unicode conversion.  Thanks to Red Hat for
     backporting the patch to ICU version 3.6.  Applying this patch to the
     debian package required pulling in three additional Red Hat patches
     for tickets 5483, 5797, 6001, and 6002 in ICU's issue tracking system
     as well as adjusting offsets in CVE-2008-1036.patch.  (Closes:
     #534590)
Files: 
 8b600075600533ce08c9801ffa571a19 592 libs optional icu_3.6-2etch4.dsc
 601af38fe10a27e08e40985c409bc6c4 45190 libs optional icu_3.6-2etch4.diff.gz
 8bf16fb7db375fb14de7082bcb814733 3239572 doc optional icu-doc_3.6-2etch4_all.deb
 f5d9e50ecb224df9ae4f0c7057097f54 5470148 libs optional libicu36_3.6-2etch4_i386.deb
 d8e1c31e6f1d238353340a9b82da1ed8 6466444 libdevel optional libicu36-dev_3.6-2etch4_i386.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFKr8TrXm3vHE4uyloRAtbEAJ9FHPzNYtHX8cuG3Xf8mpD1+bP39wCgndMb
pIx4vAu9EHFoerZ+8wRn4Rs=
=xhfn
-----END PGP SIGNATURE-----

Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Mon, 02 Nov 2009 07:41:42 GMT) (full text, mbox, link).

Send a report that this bug log contains spam.

Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Wed Jun 19 13:02:21 2019; Machine Name: beach

Debian Bug tracking system

Debbugs is free software and licensed under the terms of the GNU Public License version 2. The current version can be obtained from https://bugs.debian.org/debbugs-source/.

does not properly handle invalid byte sequences during Unicode conversion

Debian Bug report logs - #534590
does not properly handle invalid byte sequences during Unicode conversion

Vulmon Search

About

Products

Connect

does not properly handle invalid byte sequences during Unicode conversion

Debian Bug report logs - #534590 does not properly handle invalid byte sequences during Unicode conversion

Vulmon Search

About

Products

Connect

Debian Bug report logs - #534590
does not properly handle invalid byte sequences during Unicode conversion