Package: icu; Maintainer for icu is Laszlo Boszormenyi (GCS) <gcs@debian.org>;
Reported by: Kees Cook <kees@debian.org>
Date: Thu, 25 Jun 2009 15:42:01 UTC
Severity: normal
Tags: security
Found in version 3.8.1-3+lenny1
Fixed in versions icu/3.8.1-3+lenny2, icu/3.6-2etch4
Done: Jay Berkenbilt <qjb@debian.org>
Bug is archived. No further changes may be made.
Report forwarded to debian-bugs-dist@lists.debian.org, Jay Berkenbilt <qjb@debian.org>: Bug#534590; Package icu. (Thu, 25 Jun 2009 15:42:04 GMT)
Acknowledgement sent to Kees Cook <kees@debian.org>: New Bug report received and forwarded. Copy sent to Jay Berkenbilt <qjb@debian.org>. (Thu, 25 Jun 2009 15:42:04 GMT)
Message #5 received at submit@bugs.debian.org:
Package: icu
Version: 3.8.1-3+lenny1
Severity: normal
Tags: security

Hi!

There is a security issue with the stable release of icu (it was fixed in 4.0.1, IIUC):
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2009-0153

"International Components for Unicode (ICU) 4.0, 3.6, and other 3.x versions, as used in Apple Mac OS X 10.5 before 10.5.7, iPhone OS 1.0 through 2.2.1, iPhone OS for iPod touch 1.1 through 2.2.1, Fedora 9 and 10, and possibly other operating systems, does not properly handle invalid byte sequences during Unicode conversion, which might allow remote attackers to conduct cross-site scripting (XSS) attacks."

More details are here:
https://bugzilla.redhat.com/show_bug.cgi?id=503071

Thanks!

-Kees

--
Kees Cook @debian.org
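For readers unfamiliar with this class of bug, here is a minimal sketch of how "not properly handling invalid byte sequences" can matter. It uses a toy double-byte encoding invented for this note (it is not an ICU table, and `toy_decode`, `is_lead` and `is_trail` are hypothetical names). With `swallow_trail` set, an invalid trail byte is consumed together with its lead byte, so a following ASCII delimiter such as a quote silently disappears. With it cleared, only the lead byte is reported as illegal and the next byte is read again as a fresh character. This appears to be the same kind of change the attached patches make to the ibm-943 test expectations, where the bytes `0x82, 0x20` go from a single substitution character to a substitution character followed by U+0020.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy encoding (an assumption for illustration, not a real ICU table):
 * 0x00..0x7f are single-byte ASCII, 0x81..0xfe are lead bytes that must be
 * followed by a trail byte in 0x80..0xfe. */
static int is_lead(uint8_t b)  { return b >= 0x81 && b <= 0xfe; }
static int is_trail(uint8_t b) { return b >= 0x80 && b <= 0xfe; }

/* Decode src into out as a NUL-terminated string and return its length.
 * Valid double-byte characters are written as '#', illegal sequences as '?'.
 * swallow_trail != 0: an illegal lead byte always drags the next byte with it.
 * swallow_trail == 0: the next byte is dropped only if it could not start a
 *                     character on its own. */
size_t toy_decode(const uint8_t *src, size_t n, int swallow_trail,
                  char *out, size_t cap) {
    size_t i = 0, w = 0;
    assert(src != NULL && out != NULL && cap > 0);
    while (i < n && w + 1 < cap) {
        uint8_t b = src[i];
        if (b < 0x80) {
            out[w++] = (char)b;            /* plain ASCII */
            i += 1;
        } else if (is_lead(b) && i + 1 < n && is_trail(src[i + 1])) {
            out[w++] = '#';                /* well-formed pair */
            i += 2;
        } else {
            out[w++] = '?';                /* illegal sequence */
            i += 1;
            if (i < n && is_lead(b)) {
                uint8_t next = src[i];
                int next_can_start = next < 0x80 || is_lead(next);
                if (swallow_trail || !next_can_start) {
                    i += 1;                /* include next byte in the illegal sequence */
                }
            }
        }
    }
    out[w] = '\0';
    return w;
}
```

Feeding the four bytes `'a', 0x82, '"', 'b'` through this sketch gives `a?b` when the trail byte is swallowed and `a?"b` when it is backed out. Markup that relied on that quote to close an attribute would be parsed differently in the two cases.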
Information forwarded to debian-bugs-dist@lists.debian.org: Bug#534590; Package icu. (Mon, 24 Aug 2009 16:21:03 GMT)
Acknowledgement sent to Jay Berkenbilt <qjb@debian.org>: Extra info received and forwarded to list. (Mon, 24 Aug 2009 16:21:03 GMT)
Message #10 received at 534590@bugs.debian.org:
[Message part 1 (text/plain, inline)]
The reporter of this ICU security bug, which impacts oldstable and stable but not testing or unstable, was kind enough to refer to the Red Hat bugzilla entry for it. The people at Red Hat have backported the security fix to the 3.6 (our oldstable) and 3.8 (our stable) versions of ICU (which appear in RHEL5 and Fedora 9). I have grabbed their SRPMs for the patched versions and extracted the patches that apply to the 3.6 and 3.8 versions. Attached here are the patches directly from those source RPMs, not modified in any way or tested for Debian.

I can integrate these into the Debian packages and prepare uploads for stable security and oldstable security, or I can defer to the security team to do the integration. Just let me know. It may be several days before I have a chance to work on it, but I have prepared stable security uploads for my packages before.

I am grateful to Red Hat for doing the work of backporting to the older ICU versions.

--
Jay Berkenbilt <qjb@debian.org>
[icu-3.8-CVE-2009-0153.patch (text/x-patch, inline)]
diff -ru icu.6175/source/common/ucnv2022.c icu/source/common/ucnv2022.c --- icu.6175/source/common/ucnv2022.c 2009-06-11 13:44:44.000000000 +0100 +++ icu/source/common/ucnv2022.c 2009-06-11 18:09:48.000000000 +0100 @@ -1973,6 +1973,7 @@ mySourceChar = args->converter->toUBytes[0]; args->converter->toULength = 0; cs = (StateEnum)pToU2022State->cs[pToU2022State->g]; + targetUniChar = missingCharMarker; goto getTrailByte; } @@ -2102,18 +2103,45 @@ default: /* G0 DBCS */ if(mySource < mySourceLimit) { + int leadIsOk, trailIsOk; char trailByte; getTrailByte: - trailByte = *mySource++; - if(cs == JISX208) { - _2022ToSJIS((uint8_t)mySourceChar, (uint8_t)trailByte, tempBuf); - } else { - tempBuf[0] = (char)mySourceChar; - tempBuf[1] = trailByte; - } - mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); - targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE); - } else { + trailByte = *mySource; + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. + * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + * + * In ISO-2022 DBCS, if both bytes are valid or both bytes are outside + * the 21..7e range, then we treat them as a pair. + * Otherwise (valid lead byte + illegal trail byte, or vice versa) + * we report only the first byte as the illegal sequence. + */ + leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21); + trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21); + if (leadIsOk == trailIsOk) { + ++mySource; + uint32_t tmpSourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); + if (leadIsOk) { + if(cs == JISX208) { + _2022ToSJIS((uint8_t)mySourceChar, (uint8_t)trailByte, tempBuf); + mySourceChar = tmpSourceChar; + } else { + /* Copy before we modify tmpSourceChar so toUnicodeCallback() sees the correct bytes. 
*/ + mySourceChar = tmpSourceChar; + if (cs == KSC5601) { + tmpSourceChar += 0x8080; /* = _2022ToGR94DBCS(tmpSourceChar) */ + } + tempBuf[0] = (char)(tmpSourceChar >> 8); + tempBuf[1] = (char)(tmpSourceChar); + } + targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE); + } else { + mySourceChar = tmpSourceChar; + } + } + } else { args->converter->toUBytes[0] = (uint8_t)mySourceChar; args->converter->toULength = 1; goto endloop; @@ -2254,7 +2282,12 @@ } /* only DBCS or SBCS characters are expected*/ /* DB characters with high bit set to 1 are expected */ - if(length > 2 || length==0 ||(((targetByteUnit & 0x8080) != 0x8080)&& length==2)){ + if( length > 2 || length==0 || + (length == 1 && targetByteUnit > 0x7f) || + (length == 2 && + ((uint16_t)(targetByteUnit - 0xa1a1) > (0xfefe - 0xa1a1) || + (uint8_t)(targetByteUnit - 0xa1) > (0xfe - 0xa1))) + ) { targetByteUnit=missingCharMarker; } if (targetByteUnit != missingCharMarker){ @@ -2583,17 +2616,36 @@ myData->isEmptySegment = FALSE; /* Any invalid char errors will be detected separately, so just reset this */ if(myData->toU2022State.g == 1) { if(mySource < mySourceLimit) { + int leadIsOk, trailIsOk; char trailByte; getTrailByte: - trailByte = *mySource++; - tempBuf[0] = (char)(mySourceChar + 0x80); - tempBuf[1] = (char)(trailByte + 0x80); - mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); - if((mySourceChar & 0x8080) == 0) { - targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, tempBuf, 2, useFallback); + targetUniChar = missingCharMarker; + trailByte = *mySource; + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. + * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + * + * In ISO-2022 DBCS, if both bytes are valid or both bytes are outside + * the 21..7e range, then we treat them as a pair. 
+ * Otherwise (valid lead byte + illegal trail byte, or vice versa) + * we report only the first byte as the illegal sequence. + */ + leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21); + trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21); + if (leadIsOk == trailIsOk) { + ++mySource; + if (leadIsOk) { + tempBuf[0] = (char)(mySourceChar + 0x80); + tempBuf[1] = (char)(trailByte + 0x80); + targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, tempBuf, 2, useFallback); + } else { + leadIsOk = TRUE; /* TODO: remove */ + } + mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); } else { - /* illegal bytes > 0x7f */ - targetUniChar = missingCharMarker; + trailIsOk = TRUE; /* TODO: remove */ } } else { args->converter->toUBytes[0] = (uint8_t)mySourceChar; @@ -2601,8 +2653,10 @@ break; } } - else{ + else if(mySourceChar <= 0x7f) { targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, mySource - 1, 1, useFallback); + } else { + targetUniChar = 0xffff; } if(targetUniChar < 0xfffe){ if(args->offsets) { @@ -3099,6 +3153,7 @@ /* continue with a partial double-byte character */ mySourceChar = args->converter->toUBytes[0]; args->converter->toULength = 0; + targetUniChar = missingCharMarker; goto getTrailByte; } @@ -3178,29 +3233,48 @@ UConverterSharedData *cnv; StateEnum tempState; int32_t tempBufLen; + int leadIsOk, trailIsOk; char trailByte; getTrailByte: - trailByte = *mySource++; - tempState = (StateEnum)pToU2022State->cs[pToU2022State->g]; - if(tempState > CNS_11643_0) { - cnv = myData->myConverterArray[CNS_11643]; - tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0)); - tempBuf[1] = (char) (mySourceChar); - tempBuf[2] = trailByte; - tempBufLen = 3; - - }else{ - cnv = myData->myConverterArray[tempState]; - tempBuf[0] = (char) (mySourceChar); - tempBuf[1] = trailByte; - tempBufLen = 2; + trailByte = *mySource; + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. 
+ * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + * + * In ISO-2022 DBCS, if both bytes are valid or both bytes are outside + * the 21..7e range, then we treat them as a pair. + * Otherwise (valid lead byte + illegal trail byte, or vice versa) + * we report only the first byte as the illegal sequence. + */ + leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21); + trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21); + if (leadIsOk == trailIsOk) { + ++mySource; + if (leadIsOk) { + tempState = (StateEnum)pToU2022State->cs[pToU2022State->g]; + if(tempState >= CNS_11643_0) { + cnv = myData->myConverterArray[CNS_11643]; + tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0)); + tempBuf[1] = (char) (mySourceChar); + tempBuf[2] = trailByte; + tempBufLen = 3; + + }else{ + cnv = myData->myConverterArray[tempState]; + tempBuf[0] = (char) (mySourceChar); + tempBuf[1] = trailByte; + tempBufLen = 2; + } + targetUniChar = ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE); + } + mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); } - mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); if(pToU2022State->g>=2) { /* return from a single-shift state to the previous one */ pToU2022State->g=pToU2022State->prevG; } - targetUniChar = ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE); } else { args->converter->toUBytes[0] = (uint8_t)mySourceChar; args->converter->toULength = 1; diff -ru icu.6175/source/common/ucnvhz.c icu/source/common/ucnvhz.c --- icu.6175/source/common/ucnvhz.c 2009-06-11 13:44:44.000000000 +0100 +++ icu/source/common/ucnvhz.c 2009-06-11 18:05:36.000000000 +0100 @@ -215,19 +215,35 @@ } else{ /* trail byte */ + int leadIsOk, trailIsOk; uint32_t leadByte = args->converter->toUnicodeStatus & 0xff; - if( (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21) && - (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21) - ) { - tempBuf[0] = (char) (leadByte+0x80) ; 
- tempBuf[1] = (char) (mySourceChar+0x80); - targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->gbConverter->sharedData, - tempBuf, 2, args->converter->useFallback); + targetUniChar = 0xffff; + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. + * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + * + * In HZ DBCS, if both bytes are valid or both bytes are outside + * the 21..7d/7e range, then we treat them as a pair. + * Otherwise (valid lead byte + illegal trail byte, or vice versa) + * we report only the first byte as the illegal sequence. + */ + leadIsOk = (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21); + trailIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21); + if (leadIsOk == trailIsOk) { + if (leadIsOk) { + tempBuf[0] = (char) (leadByte+0x80) ; + tempBuf[1] = (char) (mySourceChar+0x80); + targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->gbConverter->sharedData, + tempBuf, 2, args->converter->useFallback); + } + /* add another bit so that the code below writes 2 bytes in case of error */ + mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar; } else { - targetUniChar = 0xffff; + --mySource; + mySourceChar = (int32_t)leadByte; } - /* add another bit so that the code below writes 2 bytes in case of error */ - mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar; args->converter->toUnicodeStatus =0x00; } } diff -ru icu.6175/source/common/ucnvmbcs.c icu/source/common/ucnvmbcs.c --- icu.6175/source/common/ucnvmbcs.c 2009-06-11 13:44:44.000000000 +0100 +++ icu/source/common/ucnvmbcs.c 2009-06-11 18:05:36.000000000 +0100 @@ -1,7 +1,7 @@ /* ****************************************************************************** * -* Copyright (C) 2000-2007, International Business Machines +* Copyright (C) 2000-2008, International Business Machines * Corporation and others. All Rights Reserved. 
* ****************************************************************************** @@ -1791,6 +1791,65 @@ pArgs->offsets=offsets; } +static UBool +hasValidTrailBytes(const int32_t (*stateTable)[256], uint8_t state) { + const int32_t *row=stateTable[state]; + int32_t b, entry; + /* First test for final entries in this state for some commonly valid byte values. */ + entry=row[0xa1]; + if( !MBCS_ENTRY_IS_TRANSITION(entry) && + MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL + ) { + return TRUE; + } + entry=row[0x41]; + if( !MBCS_ENTRY_IS_TRANSITION(entry) && + MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL + ) { + return TRUE; + } + /* Then test for final entries in this state. */ + for(b=0; b<=0xff; ++b) { + entry=row[b]; + if( !MBCS_ENTRY_IS_TRANSITION(entry) && + MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL + ) { + return TRUE; + } + } + /* Then recurse for transition entries. */ + for(b=0; b<=0xff; ++b) { + entry=row[b]; + if( MBCS_ENTRY_IS_TRANSITION(entry) && + hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry)) + ) { + return TRUE; + } + } + return FALSE; +} + +/* + * Is byte b a single/lead byte in this state? + * Recurse for transition states, because here we don't want to say that + * b is a lead byte if all byte sequences that start with b are illegal. 
+ */ +static UBool +isSingleOrLead(const int32_t (*stateTable)[256], uint8_t state, UBool isDBCSOnly, uint8_t b) { + const int32_t *row=stateTable[state]; + int32_t entry=row[b]; + if(MBCS_ENTRY_IS_TRANSITION(entry)) { /* lead byte */ + return hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry)); + } else { + uint8_t action=(uint8_t)(MBCS_ENTRY_FINAL_ACTION(entry)); + if(action==MBCS_STATE_CHANGE_ONLY && isDBCSOnly) { + return FALSE; /* SI/SO are illegal for DBCS-only conversion */ + } else { + return action!=MBCS_STATE_ILLEGAL; + } + } +} + U_CFUNC void ucnv_MBCSToUnicodeWithOffsets(UConverterToUnicodeArgs *pArgs, UErrorCode *pErrorCode) { @@ -2146,6 +2205,34 @@ sourceIndex=nextSourceIndex; } else if(U_FAILURE(*pErrorCode)) { /* callback(illegal) */ + if(byteIndex>1) { + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. + * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + */ + UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0); + int8_t i; + for(i=1; + i<byteIndex && !isSingleOrLead(stateTable, state, isDBCSOnly, bytes[i]); + ++i) {} + if(i<byteIndex) { + /* Back out some bytes. */ + int8_t backOutDistance=byteIndex-i; + int32_t bytesFromThisBuffer=(int32_t)(source-(const uint8_t *)pArgs->source); + byteIndex=i; /* length of reported illegal byte sequence */ + if(backOutDistance<=bytesFromThisBuffer) { + source-=backOutDistance; + } else { + /* Back out bytes from the previous buffer: Need to replay them. */ + cnv->preToULength=(int8_t)(bytesFromThisBuffer-backOutDistance); + /* preToULength is negative! 
*/ + uprv_memcpy(cnv->preToU, bytes+i, -cnv->preToULength); + source=(const uint8_t *)pArgs->source; + } + } + } break; } else /* unassigned sequences indicated with byteIndex>0 */ { /* try an extension mapping */ @@ -2156,6 +2243,7 @@ &offsets, sourceIndex, pArgs->flush, pErrorCode); + /* TODO: nextSourceIndex+=diff instead of nextSourceIndex+diff ?? */ sourceIndex=nextSourceIndex+(int32_t)(source-(const uint8_t *)pArgs->source); if(U_FAILURE(*pErrorCode)) { @@ -2447,15 +2535,37 @@ if(c<0) { if(U_SUCCESS(*pErrorCode) && source==sourceLimit && lastSource<source) { - *pErrorCode=U_TRUNCATED_CHAR_FOUND; - } - if(U_FAILURE(*pErrorCode)) { /* incomplete character byte sequence */ uint8_t *bytes=cnv->toUBytes; cnv->toULength=(int8_t)(source-lastSource); do { *bytes++=*lastSource++; } while(lastSource<source); + *pErrorCode=U_TRUNCATED_CHAR_FOUND; + } else if(U_FAILURE(*pErrorCode)) { + /* callback(illegal) */ + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. + * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. 
+ */ + UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0); + uint8_t *bytes=cnv->toUBytes; + *bytes++=*lastSource++; /* first byte */ + if(lastSource==source) { + cnv->toULength=1; + } else /* lastSource<source: multi-byte character */ { + int8_t i; + for(i=1; + lastSource<source && !isSingleOrLead(stateTable, state, isDBCSOnly, *lastSource); + ++i + ) { + *bytes++=*lastSource++; + } + cnv->toULength=i; + source=lastSource; + } } else { /* no output because of empty input or only state changes */ *pErrorCode=U_INDEX_OUTOFBOUNDS_ERROR; diff -ru icu.6175/source/test/cintltst/nccbtst.c icu/source/test/cintltst/nccbtst.c --- icu.6175/source/test/cintltst/nccbtst.c 2009-06-11 13:44:44.000000000 +0100 +++ icu/source/test/cintltst/nccbtst.c 2009-06-11 18:05:36.000000000 +0100 @@ -2497,13 +2497,13 @@ static const uint8_t text943[] = { - 0x82, 0xa9, 0x82, 0x20, /*0xc8,*/ 0x61, 0x8a, 0xbf, 0x8e, 0x9a }; - static const UChar toUnicode943sub[] = { 0x304b, 0xfffd, /*0xff88,*/ 0x0061, 0x6f22, 0x5b57}; - static const UChar toUnicode943skip[]= { 0x304b, /*0xff88,*/ 0x0061, 0x6f22, 0x5b57}; + 0x82, 0xa9, 0x82, 0x20, 0x61, 0x8a, 0xbf, 0x8e, 0x9a }; + static const UChar toUnicode943sub[] = { 0x304b, 0x1a, 0x20, 0x0061, 0x6f22, 0x5b57 }; + static const UChar toUnicode943skip[]= { 0x304b, 0x20, 0x0061, 0x6f22, 0x5b57 }; static const UChar toUnicode943stop[]= { 0x304b}; - static const int32_t fromIBM943Offssub[] = {0, 2, 4, 5, 7}; - static const int32_t fromIBM943Offsskip[] = { 0, 4, 5, 7}; + static const int32_t fromIBM943Offssub[] = { 0, 2, 3, 4, 5, 7 }; + static const int32_t fromIBM943Offsskip[] = { 0, 3, 4, 5, 7 }; static const int32_t fromIBM943Offsstop[] = { 0}; gInBufferSize = inputsize; @@ -2537,9 +2537,9 @@ { static const uint8_t sampleText[] = { 0x82, 0xa9, 0x61, 0x62, 0x63 , 0x82, - 0xff, /*0x82, 0xa9,*/ 0x32, 0x33}; - static const UChar toUnicode943sub[] = {0x304b, 0x0061, 0x0062, 0x0063, 0xfffd,/*0x304b,*/ 0x0032, 0x0033}; - static const int32_t 
fromIBM943Offssub[] = {0, 2, 3, 4, 5, 7, 8}; + 0xff, 0x32, 0x33}; + static const UChar toUnicode943sub[] = { 0x304b, 0x0061, 0x0062, 0x0063, 0x1a, 0x1a, 0x0032, 0x0033 }; + static const int32_t fromIBM943Offssub[] = { 0, 2, 3, 4, 5, 6, 7, 8 }; /*checking illegal value for ibm-943 with substitute*/ gInBufferSize = inputsize; gOutBufferSize = outputsize; diff -ru icu.6175/source/test/cintltst/nucnvtst.c icu/source/test/cintltst/nucnvtst.c --- icu.6175/source/test/cintltst/nucnvtst.c 2009-06-11 13:44:44.000000000 +0100 +++ icu/source/test/cintltst/nucnvtst.c 2009-06-11 18:05:36.000000000 +0100 @@ -2608,7 +2608,7 @@ TestNextUCharError(cnv, source, source, U_INDEX_OUTOFBOUNDS_ERROR, "sourceLimit <= source"); /*Test for the condition where there is an invalid character*/ { - static const uint8_t source2[]={0xa1, 0x01}; + static const uint8_t source2[]={0xa1, 0x80}; TestNextUCharError(cnv, (const char*)source2, (const char*)source2+sizeof(source2), U_ZERO_ERROR, "an invalid character"); } /*Test for the condition where we have a truncated char*/ @@ -3901,11 +3901,11 @@ TestISO_2022_KR() { /* test input */ static const uint16_t in[]={ - 0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F66,0x9F67,0x9F6A,0x000A,0x000D - ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC02,0xAC04 + 0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F67,0x9F6A,0x000A,0x000D + ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC04 ,0xAC07,0xAC08,0xAC09,0x0025,0x0026,0x0027,0x000A,0x000D,0x0028,0x0029 ,0x002A,0x002B,0x002C,0x002D,0x002E,0x53C3,0x53C8,0x53C9,0x53CA,0x53CB - ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53DF,0x53E1,0x53E2 + ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53E1,0x53E2 ,0x53E3,0x53E4,0x000A,0x000D}; const UChar* uSource; const UChar* uSourceLimit; diff -ru icu.6175/source/test/testdata/conversion.txt icu/source/test/testdata/conversion.txt --- icu.6175/source/test/testdata/conversion.txt 2009-06-11 13:44:44.000000000 +0100 +++ 
icu/source/test/testdata/conversion.txt 2009-06-11 18:05:36.000000000 +0100 @@ -48,12 +48,83 @@ toUnicode { Headers { "charset", "bytes", "unicode", "offsets", "flush", "fallbacks", "errorCode", "callback", "invalidChars" } Cases { + // Test ticket 5691: consistent illegal sequences + // Unfortunately, we cannot use the Shift-JIS examples from the ticket + // comments because our Shift-JIS table is Windows-compatible and + // therefore has no illegal single bytes. Same for GBK. + // Instead, we use the stricter GB 18030 also for 2-byte examples. + // The byte sequences are generally slightly different from the ticket + // comment, simply using assigned characters rather than just + // theoretically valid sequences. + { + "gb18030", + :bin{ 618140813c81ff7a }, + "a\u4e02\\x81<\\x81\\xFFz", + :intvector{ 0,1,3,3,3,3,4,5,5,5,5,5,5,5,5,7 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "EUC-JP", + :bin{ 618fb0a98fb03c8f3cb0a97a }, + "a\u4e28\\x8F\\xB0<\\x8F<\u9022z", + :intvector{ 0,1,4,4,4,4,5,5,5,5,6,7,7,7,7,8,9,11 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "gb18030", + :bin{ 618130fc318130fc8181303c3e813cfc817a }, + "a5ed\\x810\u9f07\\x810<>\\x81<\u9f07z", + :intvector{ 0,1,5,5,5,5,6,7,9,9,9,9,10,11,12,13,13,13,13,14,15,17 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "UTF-8", + :bin{ 61f1808182f180813cf18081fff180ff3cf1ff3c3e7a }, + "a\U00040042\\xF1\\x80\\x81<\\xF1\\x80\\x81\\xFF\\xF1\\x80\\xFF<\\xF1\\xFF<>z", + :intvector{ 0,1,1,5,5,5,5,5,5,5,5,5,5,5,5,8,9,9,9,9,9,9,9,9,9,9,9,9,12,12,12,12,13,13,13,13,13,13,13,13,15,15,15,15,16,17,17,17,17,18,18,18,18,19,20,21 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "ISO-2022-JP-2", + :bin{ 1b24424141af4142affe41431b2842 }, + "\u758f\\xAF\u758e\\xAF\\xFE\u790e", + :intvector{ 3,5,5,5,5,6,8,8,8,8,8,8,8,8,10 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "ibm-25546", + :bin{ 411b242943420e4141af4142affe41430f5a }, + "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ", + :intvector{ 
0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "ISO-2022-KR", + :bin{ 411b242943420e4141af4142affe41430f5a }, + "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ", + :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "ISO-2022-CN", + :bin{ 411b242941420e4141af4142affe41430f5a }, + "AB\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z", + :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "HZ", + :bin{ 417e7b4141af4142affe41437e7d5a }, + "A\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z", + :intvector{ 0,3,5,5,5,5,6,8,8,8,8,8,8,8,8,10,14 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } // test that HZ limits its byte values to lead bytes 21..7d and trail bytes 21..7e { "HZ", :bin{ 7e7b21212120217e217f772100007e217e7d207e7e807e0a2b }, - "\u3000\ufffd\u3013\ufffd\u9ccc\ufffd\ufffd ~\ufffd+", - :intvector{ 2,4,6,8,10,12,14,18,19,21,24 }, + "\u3000\ufffd\ufffd\u3013\ufffd\ufffd\u9ccc\ufffd\ufffd ~\ufffd+", + :intvector{ 2,4,5,6,8,9,10,12,14,18,19,21,24 }, :int{1}, :int{1}, "", "?", :bin{""} } // improve coverage of ISO-2022-JP converter with hardcoded JIS X 0201 and @@ -61,8 +132,8 @@ { "ISO-2022-JP", :bin{ 1b284a7d7e801b2442306c20217f7e21202160217f22202225227f5f211b2842 }, - "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\u25b2\ufffd\u6f3e", - :intvector{ 3,4,5,9,11,13,15,17,19,21,23,25,27 }, + "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\ufffd\ufffd\u25b2\ufffd\ufffd\u6f3e", + :intvector{ 3,4,5,9,11,12,13,14,16,17,19,20,21,22,23,25,26,27 }, :int{1}, :int{1}, "", "?", :bin{""} } // improve coverage of unrolled loops in ucnvmbcs.c/ucnv_MBCSSingleToBMPWithOffsets()
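The "Ticket 5691" comment repeated through the 3.8 patch above describes one decision rule for ISO-2022 DBCS byte pairs. The following is a minimal standalone sketch of that rule only, assuming the reading given in the comment. `PairAction`, `in_21_7e` and `classify_pair` are hypothetical names for this note and do not exist in ICU, and the real converter code also handles state, offsets and the table lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; none of these exist in ICU. */
typedef enum {
    PAIR_CONVERT,      /* both bytes in 21..7e: look the pair up in the table */
    PAIR_ILLEGAL,      /* both bytes outside 21..7e: report both as illegal */
    LEAD_ONLY_ILLEGAL  /* mixed: report only the first byte, re-read the second */
} PairAction;

/* Same range test as in the patch: unsigned wraparound lets a single
 * comparison check 0x21 <= b <= 0x7e. */
static int in_21_7e(uint8_t b) {
    return (uint8_t)(b - 0x21) <= (0x7e - 0x21);
}

/* Classify a lead/trail pair and store how many bytes belong to the
 * illegal sequence (0 when the pair should simply be converted; a real
 * converter could still find the pair unassigned in its table). */
PairAction classify_pair(uint8_t lead, uint8_t trail, size_t *illegal_len) {
    int leadIsOk = in_21_7e(lead);
    int trailIsOk = in_21_7e(trail);
    assert(illegal_len != NULL);
    if (leadIsOk == trailIsOk) {
        *illegal_len = leadIsOk ? 0 : 2;
        return leadIsOk ? PAIR_CONVERT : PAIR_ILLEGAL;
    }
    *illegal_len = 1;
    return LEAD_ONLY_ILLEGAL;
}
```

This matches the ISO-2022-JP-2 case added to conversion.txt: in the byte run `41 41 af 41 42 af fe 41 43`, the pair `41 41` converts, the lone `af` before `41 42` is reported by itself, and `af fe` is reported as a two-byte illegal sequence.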
[icu-3.6-CVE-2009-0153.patch (text/x-patch, inline)]
diff -ru icu.6175/source/common/ucnv2022.c icu/source/common/ucnv2022.c --- icu.6175/source/common/ucnv2022.c 2009-06-02 15:47:31.000000000 +0100 +++ icu/source/common/ucnv2022.c 2009-06-02 16:03:15.000000000 +0100 @@ -754,6 +754,7 @@ UConverterDataISO2022* myData2022 = ((UConverterDataISO2022*)_this->extraInfo); uint32_t key = myData2022->key; int32_t offset = 0; + int8_t initialToULength = _this->toULength; char c; value = VALID_NON_TERMINAL_2022; @@ -806,7 +807,6 @@ return; } else if (value == INVALID_2022 ) { *err = U_ILLEGAL_ESCAPE_SEQUENCE; - return; } else /* value == VALID_TERMINAL_2022 */ { switch(var){ #ifdef U_ENABLE_GENERIC_ISO_2022 @@ -938,6 +938,35 @@ } if(U_SUCCESS(*err)) { _this->toULength = 0; + } else if(*err==U_ILLEGAL_ESCAPE_SEQUENCE) { + if(_this->toULength>1) { + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte (ESC) in the illegal sequence. + * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + * In escape sequences, all following bytes are "printable", that is, + * unless they are completely illegal (>7f in SBCS, outside 21..7e in DBCS), + * they are valid single/lead bytes. + * For simplicity, we always only report the initial ESC byte as the + * illegal sequence and back out all other bytes we looked at. + */ + /* Back out some bytes. */ + int8_t backOutDistance=_this->toULength-1; + int8_t bytesFromThisBuffer=_this->toULength-initialToULength; + if(backOutDistance<=bytesFromThisBuffer) { + /* same as initialToULength<=1 */ + *source-=backOutDistance; + } else { + /* Back out bytes from the previous buffer: Need to replay them. */ + _this->preToULength=(int8_t)(bytesFromThisBuffer-backOutDistance); + /* same as -(initialToULength-1) */ + /* preToULength is negative! 
*/ + uprv_memcpy(_this->preToU, _this->toUBytes+1, -_this->preToULength); + *source-=bytesFromThisBuffer; + } + _this->toULength=1; + } } else if(*err==U_UNSUPPORTED_ESCAPE_SEQUENCE) { _this->toUCallbackReason = UCNV_UNASSIGNED; } @@ -1973,6 +2002,7 @@ mySourceChar = args->converter->toUBytes[0]; args->converter->toULength = 0; cs = (StateEnum)pToU2022State->cs[pToU2022State->g]; + targetUniChar = missingCharMarker; goto getTrailByte; } @@ -2102,17 +2132,44 @@ default: /* G0 DBCS */ if(mySource < mySourceLimit) { - char trailByte; + int leadIsOk, trailIsOk; + uint8_t trailByte; getTrailByte: - trailByte = *mySource++; - if(cs == JISX208) { - _2022ToSJIS((uint8_t)mySourceChar, (uint8_t)trailByte, tempBuf); - } else { - tempBuf[0] = (char)mySourceChar; - tempBuf[1] = trailByte; + trailByte = (uint8_t)*mySource; + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. + * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + * + * In ISO-2022 DBCS, if the second byte is in the 21..7e range or is + * an ESC/SO/SI, we report only the first byte as the illegal sequence. + * Otherwise we convert or report the pair of bytes. + */ + leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21); + trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21); + if (leadIsOk && trailIsOk) { + ++mySource; + uint32_t tmpSourceChar = (mySourceChar << 8) | trailByte; + if(cs == JISX208) { + _2022ToSJIS((uint8_t)mySourceChar, trailByte, tempBuf); + mySourceChar = tmpSourceChar; + } else { + /* Copy before we modify tmpSourceChar so toUnicodeCallback() sees the correct bytes. 
*/ + mySourceChar = tmpSourceChar; + if (cs == KSC5601) { + tmpSourceChar += 0x8080; /* = _2022ToGR94DBCS(tmpSourceChar) */ + } + tempBuf[0] = (char)(tmpSourceChar >> 8); + tempBuf[1] = (char)(tmpSourceChar); + } + targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE); + } else if (!(trailIsOk || IS_2022_CONTROL(trailByte))) { + /* report a pair of illegal bytes if the second byte is not a DBCS starter */ + ++mySource; + /* add another bit so that the code below writes 2 bytes in case of error */ + mySourceChar = 0x10000 | (mySourceChar << 8) | trailByte; } - mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); - targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE); } else { args->converter->toUBytes[0] = (uint8_t)mySourceChar; args->converter->toULength = 1; @@ -2254,7 +2311,12 @@ } /* only DBCS or SBCS characters are expected*/ /* DB characters with high bit set to 1 are expected */ - if(length > 2 || length==0 ||(((targetByteUnit & 0x8080) != 0x8080)&& length==2)){ + if( length > 2 || length==0 || + (length == 1 && targetByteUnit > 0x7f) || + (length == 2 && + ((uint16_t)(targetByteUnit - 0xa1a1) > (0xfefe - 0xa1a1) || + (uint8_t)(targetByteUnit - 0xa1) > (0xfe - 0xa1))) + ) { targetByteUnit=missingCharMarker; } if (targetByteUnit != missingCharMarker){ @@ -2583,17 +2645,34 @@ myData->isEmptySegment = FALSE; /* Any invalid char errors will be detected separately, so just reset this */ if(myData->toU2022State.g == 1) { if(mySource < mySourceLimit) { - char trailByte; + int leadIsOk, trailIsOk; + uint8_t trailByte; getTrailByte: - trailByte = *mySource++; - tempBuf[0] = (char)(mySourceChar + 0x80); - tempBuf[1] = (char)(trailByte + 0x80); - mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); - if((mySourceChar & 0x8080) == 0) { + targetUniChar = missingCharMarker; + trailByte = (uint8_t)*mySource; + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least 
the first byte in the illegal sequence. + * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + * + * In ISO-2022 DBCS, if the second byte is in the 21..7e range or is + * an ESC/SO/SI, we report only the first byte as the illegal sequence. + * Otherwise we convert or report the pair of bytes. + */ + leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21); + trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21); + if (leadIsOk && trailIsOk) { + ++mySource; + tempBuf[0] = (char)(mySourceChar + 0x80); + tempBuf[1] = (char)(trailByte + 0x80); targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, tempBuf, 2, useFallback); - } else { - /* illegal bytes > 0x7f */ - targetUniChar = missingCharMarker; + mySourceChar = (mySourceChar << 8) | trailByte; + } else if (!(trailIsOk || IS_2022_CONTROL(trailByte))) { + /* report a pair of illegal bytes if the second byte is not a DBCS starter */ + ++mySource; + /* add another bit so that the code below writes 2 bytes in case of error */ + mySourceChar = 0x10000 | (mySourceChar << 8) | trailByte; } } else { args->converter->toUBytes[0] = (uint8_t)mySourceChar; @@ -2601,8 +2680,10 @@ break; } } - else{ + else if(mySourceChar <= 0x7f) { targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, mySource - 1, 1, useFallback); + } else { + targetUniChar = 0xffff; } if(targetUniChar < 0xfffe){ if(args->offsets) { @@ -3099,6 +3180,7 @@ /* continue with a partial double-byte character */ mySourceChar = args->converter->toUBytes[0]; args->converter->toULength = 0; + targetUniChar = missingCharMarker; goto getTrailByte; } @@ -3178,29 +3260,50 @@ UConverterSharedData *cnv; StateEnum tempState; int32_t tempBufLen; - char trailByte; + int leadIsOk, trailIsOk; + uint8_t trailByte; getTrailByte: - trailByte = *mySource++; - tempState = (StateEnum)pToU2022State->cs[pToU2022State->g]; - if(tempState > CNS_11643_0) { - cnv = myData->myConverterArray[CNS_11643]; - 
tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0)); - tempBuf[1] = (char) (mySourceChar); - tempBuf[2] = trailByte; - tempBufLen = 3; - - }else{ - cnv = myData->myConverterArray[tempState]; - tempBuf[0] = (char) (mySourceChar); - tempBuf[1] = trailByte; - tempBufLen = 2; + trailByte = (uint8_t)*mySource; + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. + * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + * + * In ISO-2022 DBCS, if the second byte is in the 21..7e range or is + * an ESC/SO/SI, we report only the first byte as the illegal sequence. + * Otherwise we convert or report the pair of bytes. + */ + leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21); + trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21); + if (leadIsOk && trailIsOk) { + ++mySource; + tempState = (StateEnum)pToU2022State->cs[pToU2022State->g]; + if(tempState >= CNS_11643_0) { + cnv = myData->myConverterArray[CNS_11643]; + tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0)); + tempBuf[1] = (char) (mySourceChar); + tempBuf[2] = (char) trailByte; + tempBufLen = 3; + + }else{ + cnv = myData->myConverterArray[tempState]; + tempBuf[0] = (char) (mySourceChar); + tempBuf[1] = (char) trailByte; + tempBufLen = 2; + } + targetUniChar = ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE); + mySourceChar = (mySourceChar << 8) | trailByte; + } else if (!(trailIsOk || IS_2022_CONTROL(trailByte))) { + /* report a pair of illegal bytes if the second byte is not a DBCS starter */ + ++mySource; + /* add another bit so that the code below writes 2 bytes in case of error */ + mySourceChar = 0x10000 | (mySourceChar << 8) | trailByte; } - mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); if(pToU2022State->g>=2) { /* return from a single-shift state to the previous one */ pToU2022State->g=pToU2022State->prevG; } - targetUniChar = 
ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE); } else { args->converter->toUBytes[0] = (uint8_t)mySourceChar; args->converter->toULength = 1; diff -ru icu.6175/source/common/ucnvhz.c icu/source/common/ucnvhz.c --- icu.6175/source/common/ucnvhz.c 2009-06-02 15:47:31.000000000 +0100 +++ icu/source/common/ucnvhz.c 2009-06-02 15:57:18.000000000 +0100 @@ -196,10 +196,30 @@ /* if the first byte is equal to TILDE and the trail byte * is not a valid byte then it is an error condition */ - mySourceChar = 0x7e00 | mySourceChar; - targetUniChar = 0xffff; + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. + * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + */ myData->isEmptySegment = FALSE; /* different error here, reset this to avoid spurious future error */ - break; + *err = U_ILLEGAL_ESCAPE_SEQUENCE; + args->converter->toUBytes[0] = UCNV_TILDE; + if( myData->isStateDBCS ? + (0x21 <= mySourceChar && mySourceChar <= 0x7e) : + mySourceChar <= 0x7f + ) { + /* The current byte could be the start of a character: Back it out. */ + args->converter->toULength = 1; + --mySource; + } else { + /* Include the current byte in the illegal sequence. */ + args->converter->toUBytes[1] = mySourceChar; + args->converter->toULength = 2; + } + args->target = myTarget; + args->source = mySource; + return; } } else if(myData->isStateDBCS) { if(args->converter->toUnicodeStatus == 0x00){ @@ -215,19 +235,36 @@ } else{ /* trail byte */ + int leadIsOk, trailIsOk; uint32_t leadByte = args->converter->toUnicodeStatus & 0xff; - if( (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21) && - (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21) - ) { + targetUniChar = 0xffff; + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. 
+ * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + * + * In HZ DBCS, if the second byte is in the 21..7e range, + * we report only the first byte as the illegal sequence. + * Otherwise we convert or report the pair of bytes. + */ + leadIsOk = (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21); + trailIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21); + if (leadIsOk && trailIsOk) { tempBuf[0] = (char) (leadByte+0x80) ; tempBuf[1] = (char) (mySourceChar+0x80); targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->gbConverter->sharedData, tempBuf, 2, args->converter->useFallback); + mySourceChar= (leadByte << 8) | mySourceChar; + } else if (trailIsOk) { + /* report a single illegal byte and continue with the following DBCS starter byte */ + --mySource; + mySourceChar = (int32_t)leadByte; } else { - targetUniChar = 0xffff; + /* report a pair of illegal bytes if the second byte is not a DBCS starter */ + /* add another bit so that the code below writes 2 bytes in case of error */ + mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar; } - /* add another bit so that the code below writes 2 bytes in case of error */ - mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar; args->converter->toUnicodeStatus =0x00; } } diff -ru icu.6175/source/common/ucnvmbcs.c icu/source/common/ucnvmbcs.c --- icu.6175/source/common/ucnvmbcs.c 2009-06-02 15:47:31.000000000 +0100 +++ icu/source/common/ucnvmbcs.c 2009-06-02 15:56:07.000000000 +0100 @@ -1697,6 +1697,65 @@ pArgs->offsets=offsets; } +static UBool +hasValidTrailBytes(const int32_t (*stateTable)[256], uint8_t state) { + const int32_t *row=stateTable[state]; + int32_t b, entry; + /* First test for final entries in this state for some commonly valid byte values. 
*/ + entry=row[0xa1]; + if( !MBCS_ENTRY_IS_TRANSITION(entry) && + MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL + ) { + return TRUE; + } + entry=row[0x41]; + if( !MBCS_ENTRY_IS_TRANSITION(entry) && + MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL + ) { + return TRUE; + } + /* Then test for final entries in this state. */ + for(b=0; b<=0xff; ++b) { + entry=row[b]; + if( !MBCS_ENTRY_IS_TRANSITION(entry) && + MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL + ) { + return TRUE; + } + } + /* Then recurse for transition entries. */ + for(b=0; b<=0xff; ++b) { + entry=row[b]; + if( MBCS_ENTRY_IS_TRANSITION(entry) && + hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry)) + ) { + return TRUE; + } + } + return FALSE; +} + +/* + * Is byte b a single/lead byte in this state? + * Recurse for transition states, because here we don't want to say that + * b is a lead byte if all byte sequences that start with b are illegal. + */ +static UBool +isSingleOrLead(const int32_t (*stateTable)[256], uint8_t state, UBool isDBCSOnly, uint8_t b) { + const int32_t *row=stateTable[state]; + int32_t entry=row[b]; + if(MBCS_ENTRY_IS_TRANSITION(entry)) { /* lead byte */ + return hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry)); + } else { + uint8_t action=(uint8_t)(MBCS_ENTRY_FINAL_ACTION(entry)); + if(action==MBCS_STATE_CHANGE_ONLY && isDBCSOnly) { + return FALSE; /* SI/SO are illegal for DBCS-only conversion */ + } else { + return action!=MBCS_STATE_ILLEGAL; + } + } +} + U_CFUNC void ucnv_MBCSToUnicodeWithOffsets(UConverterToUnicodeArgs *pArgs, UErrorCode *pErrorCode) { @@ -2052,6 +2111,34 @@ sourceIndex=nextSourceIndex; } else if(U_FAILURE(*pErrorCode)) { /* callback(illegal) */ + if(byteIndex>1) { + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. 
+ * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + */ + UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0); + int8_t i; + for(i=1; + i<byteIndex && !isSingleOrLead(stateTable, state, isDBCSOnly, bytes[i]); + ++i) {} + if(i<byteIndex) { + /* Back out some bytes. */ + int8_t backOutDistance=byteIndex-i; + int32_t bytesFromThisBuffer=(int32_t)(source-(const uint8_t *)pArgs->source); + byteIndex=i; /* length of reported illegal byte sequence */ + if(backOutDistance<=bytesFromThisBuffer) { + source-=backOutDistance; + } else { + /* Back out bytes from the previous buffer: Need to replay them. */ + cnv->preToULength=(int8_t)(bytesFromThisBuffer-backOutDistance); + /* preToULength is negative! */ + uprv_memcpy(cnv->preToU, bytes+i, -cnv->preToULength); + source=(const uint8_t *)pArgs->source; + } + } + } break; } else /* unassigned sequences indicated with byteIndex>0 */ { /* try an extension mapping */ @@ -2062,7 +2149,7 @@ &offsets, sourceIndex, pArgs->flush, pErrorCode); - sourceIndex=nextSourceIndex+(int32_t)(source-(const uint8_t *)pArgs->source); + sourceIndex=nextSourceIndex+=(int32_t)(source-(const uint8_t *)pArgs->source); if(U_FAILURE(*pErrorCode)) { /* not mappable or buffer overflow */ @@ -2353,15 +2440,37 @@ if(c<0) { if(U_SUCCESS(*pErrorCode) && source==sourceLimit && lastSource<source) { - *pErrorCode=U_TRUNCATED_CHAR_FOUND; - } - if(U_FAILURE(*pErrorCode)) { /* incomplete character byte sequence */ uint8_t *bytes=cnv->toUBytes; cnv->toULength=(int8_t)(source-lastSource); do { *bytes++=*lastSource++; } while(lastSource<source); + *pErrorCode=U_TRUNCATED_CHAR_FOUND; + } else if(U_FAILURE(*pErrorCode)) { + /* callback(illegal) */ + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. 
+ * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + */ + UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0); + uint8_t *bytes=cnv->toUBytes; + *bytes++=*lastSource++; /* first byte */ + if(lastSource==source) { + cnv->toULength=1; + } else /* lastSource<source: multi-byte character */ { + int8_t i; + for(i=1; + lastSource<source && !isSingleOrLead(stateTable, state, isDBCSOnly, *lastSource); + ++i + ) { + *bytes++=*lastSource++; + } + cnv->toULength=i; + source=lastSource; + } } else { /* no output because of empty input or only state changes */ *pErrorCode=U_INDEX_OUTOFBOUNDS_ERROR; diff -ru icu.6175/source/test/cintltst/nccbtst.c icu/source/test/cintltst/nccbtst.c --- icu.6175/source/test/cintltst/nccbtst.c 2009-06-02 15:47:18.000000000 +0100 +++ icu/source/test/cintltst/nccbtst.c 2009-06-02 15:47:38.000000000 +0100 @@ -2497,13 +2497,13 @@ static const uint8_t text943[] = { - 0x82, 0xa9, 0x82, 0x20, /*0xc8,*/ 0x61, 0x8a, 0xbf, 0x8e, 0x9a }; - static const UChar toUnicode943sub[] = { 0x304b, 0xfffd, /*0xff88,*/ 0x0061, 0x6f22, 0x5b57}; - static const UChar toUnicode943skip[]= { 0x304b, /*0xff88,*/ 0x0061, 0x6f22, 0x5b57}; + 0x82, 0xa9, 0x82, 0x20, 0x61, 0x8a, 0xbf, 0x8e, 0x9a }; + static const UChar toUnicode943sub[] = { 0x304b, 0x1a, 0x20, 0x0061, 0x6f22, 0x5b57 }; + static const UChar toUnicode943skip[]= { 0x304b, 0x20, 0x0061, 0x6f22, 0x5b57 }; static const UChar toUnicode943stop[]= { 0x304b}; - static const int32_t fromIBM943Offssub[] = {0, 2, 4, 5, 7}; - static const int32_t fromIBM943Offsskip[] = { 0, 4, 5, 7}; + static const int32_t fromIBM943Offssub[] = { 0, 2, 3, 4, 5, 7 }; + static const int32_t fromIBM943Offsskip[] = { 0, 3, 4, 5, 7 }; static const int32_t fromIBM943Offsstop[] = { 0}; gInBufferSize = inputsize; @@ -2537,9 +2537,9 @@ { static const uint8_t sampleText[] = { 0x82, 0xa9, 0x61, 0x62, 0x63 , 0x82, - 0xff, /*0x82, 0xa9,*/ 0x32, 0x33}; - static 
const UChar toUnicode943sub[] = {0x304b, 0x0061, 0x0062, 0x0063, 0xfffd,/*0x304b,*/ 0x0032, 0x0033}; - static const int32_t fromIBM943Offssub[] = {0, 2, 3, 4, 5, 7, 8}; + 0xff, 0x32, 0x33}; + static const UChar toUnicode943sub[] = { 0x304b, 0x0061, 0x0062, 0x0063, 0x1a, 0x1a, 0x0032, 0x0033 }; + static const int32_t fromIBM943Offssub[] = { 0, 2, 3, 4, 5, 6, 7, 8 }; /*checking illegal value for ibm-943 with substitute*/ gInBufferSize = inputsize; gOutBufferSize = outputsize; diff -ru icu.6175/source/test/cintltst/nucnvtst.c icu/source/test/cintltst/nucnvtst.c --- icu.6175/source/test/cintltst/nucnvtst.c 2009-06-02 15:47:18.000000000 +0100 +++ icu/source/test/cintltst/nucnvtst.c 2009-06-02 15:47:38.000000000 +0100 @@ -2606,7 +2606,7 @@ TestNextUCharError(cnv, source, source, U_INDEX_OUTOFBOUNDS_ERROR, "sourceLimit <= source"); /*Test for the condition where there is an invalid character*/ { - static const uint8_t source2[]={0xa1, 0x01}; + static const uint8_t source2[]={0xa1, 0x80}; TestNextUCharError(cnv, (const char*)source2, (const char*)source2+sizeof(source2), U_ZERO_ERROR, "an invalid character"); } /*Test for the condition where we have a truncated char*/ @@ -3899,11 +3899,11 @@ TestISO_2022_KR() { /* test input */ static const uint16_t in[]={ - 0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F66,0x9F67,0x9F6A,0x000A,0x000D - ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC02,0xAC04 + 0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F67,0x9F6A,0x000A,0x000D + ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC04 ,0xAC07,0xAC08,0xAC09,0x0025,0x0026,0x0027,0x000A,0x000D,0x0028,0x0029 ,0x002A,0x002B,0x002C,0x002D,0x002E,0x53C3,0x53C8,0x53C9,0x53CA,0x53CB - ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53DF,0x53E1,0x53E2 + ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53E1,0x53E2 ,0x53E3,0x53E4,0x000A,0x000D}; const UChar* uSource; const UChar* uSourceLimit; diff -ru icu.6175/source/test/testdata/conversion.txt icu/source/test/testdata/conversion.txt 
--- icu.6175/source/test/testdata/conversion.txt 2009-06-02 15:47:18.000000000 +0100 +++ icu/source/test/testdata/conversion.txt 2009-06-02 15:57:41.000000000 +0100 @@ -48,12 +48,144 @@ toUnicode { Headers { "charset", "bytes", "unicode", "offsets", "flush", "fallbacks", "errorCode", "callback", "invalidChars" } Cases { + // Test ticket 5691: consistent illegal sequences + // The following test cases are for illegal character byte sequences. + // + // Unfortunately, we cannot use the Shift-JIS examples from the ticket + // comments because our Shift-JIS table is Windows-compatible and + // therefore has no illegal single bytes. Same for GBK. + // Instead, we use the stricter GB 18030 also for 2-byte examples. + // The byte sequences are generally slightly different from the ticket + // comment, simply using assigned characters rather than just + // theoretically valid sequences. + { + "gb18030", + :bin{ 618140813c81ff7a }, + "a\u4e02\\x81<\\x81\\xFFz", + :intvector{ 0,1,3,3,3,3,4,5,5,5,5,5,5,5,5,7 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "EUC-JP", + :bin{ 618fb0a98fb03c8f3cb0a97a }, + "a\u4e28\\x8F\\xB0<\\x8F<\u9022z", + :intvector{ 0,1,4,4,4,4,5,5,5,5,6,7,7,7,7,8,9,11 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "gb18030", + :bin{ 618130fc318130fc8181303c3e813cfc817a }, + "a5ed\\x810\u9f07\\x810<>\\x81<\u9f07z", + :intvector{ 0,1,5,5,5,5,6,7,9,9,9,9,10,11,12,13,13,13,13,14,15,17 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "UTF-8", + :bin{ 61f1808182f180813cf18081fff180ff3cf1ff3c3e7a }, + "a\U00040042\\xF1\\x80\\x81<\\xF1\\x80\\x81\\xFF\\xF1\\x80\\xFF<\\xF1\\xFF<>z", + :intvector{ 0,1,1,5,5,5,5,5,5,5,5,5,5,5,5,8,9,9,9,9,9,9,9,9,9,9,9,9,12,12,12,12,13,13,13,13,13,13,13,13,15,15,15,15,16,17,17,17,17,18,18,18,18,19,20,21 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "ISO-2022-JP", + :bin{ 1b24424141af4142affe41431b2842 }, + "\u758f\\xAF\u758e\\xAF\\xFE\u790e", + :intvector{ 3,5,5,5,5,6,8,8,8,8,8,8,8,8,10 }, + :int{1}, :int{0}, "", "&C", 
:bin{""} + } + { + "ibm-25546", + :bin{ 411b242943420e4141af4142affe41430f5a }, + "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ", + :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "ISO-2022-KR", + :bin{ 411b242943420e4141af4142affe41430f5a }, + "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ", + :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "ISO-2022-CN", + :bin{ 411b242941420e4141af4142affe41430f5a }, + "AB\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z", + :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "HZ", + :bin{ 417e7b4141af4142affe41437e7d5a }, + "A\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z", + :intvector{ 0,3,5,5,5,5,6,8,8,8,8,8,8,8,8,10,14 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + // Test ticket 5691: consistent illegal sequences + // The following test cases are for illegal escape/designator/shift sequences. + // + // ISO-2022-JP and -CN with illegal escape sequences. + { + "ISO-2022-JP", + :bin{ 611b24201b244241411b283f1b28427a }, + "a\\x1B$ \u758f\\x1B\u2538z", + :intvector{ 0,1,1,1,1,2,3,7,9,9,9,9,10,15 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "ISO-2022-CN", + :bin{ 611b2429201b2429410e41410f7a }, + "a\\x1B$) \u4eaez", + :intvector{ 0,1,1,1,1,2,3,4,10,13 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + // Test ticket 5691: ISO-2022-JP-2 with illegal single-shift SS2 and SS3 sequences. + // The first ESC N comes before its designator sequence, the last sequence is ESC+space. 
+ { + "ISO-2022-JP-2", + :bin{ 4e1b4e4e1b2e414e1b4e4e4e1b204e }, + "N\\x1BNNN\xceN\\x1B N", + :intvector{ 0,1,1,1,1,2,3,7,10,11,12,12,12,12,13,14 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "ISO-2022-CN-EXT", + :bin{ 4e1b4e4e1b242a484e1b4e4e4e4e1b204e }, + "N\\x1BNNN\u8f0eN\\x1B N", + :intvector{ 0,1,1,1,1,2,3,8,11,13,14,14,14,14,15,16 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "ISO-2022-CN-EXT", + :bin{ 4f1b4f4f1b242b494f1b4f4f4f4f1b204f }, + "O\\x1BOOO\u492bO\\x1B O", + :intvector{ 0,1,1,1,1,2,3,8,11,13,14,14,14,14,15,16 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + // Test ticket 5691: HZ with illegal tilde sequences. + { + "HZ", + :bin{ 417e20427e21437e80447e7b41417e207e41427e7f41437e7d5a }, + "A\\x7E B\\x7E!C\\x7E\\x80D\u4eae\\x7E\\x20\\x7E\u8c05\\x7E\\x7F\u64a9Z", + :intvector{ 0,1,1,1,1,2,3,4,4,4,4,5,6,7,7,7,7,7,7,7,7,9, // SBCS + 12,14,14,14,14,14,14,14,14,16,16,16,16,17,19,19,19,19,19,19,19,19,21, // DBCS + 25 }, // SBCS + :int{1}, :int{0}, "", "&C", :bin{""} + } + // Test ticket 5691: Example from Peter Edberg. 
+ { + "ISO-2022-JP", + :bin{ 1b244230212f7e742630801b284a621b2458631b2842648061 }, + "\u4e9c\ufffd\u7199\ufffdb\ufffd$Xcd\ufffda", + :intvector{ 3,5,7,9,14,15,16,17,18,22,23,24 }, + :int{1}, :int{0}, "", "?", :bin{""} + } // test that HZ limits its byte values to lead bytes 21..7d and trail bytes 21..7e { "HZ", - :bin{ 7e7b21212120217e217f772100007e217e7d207e7e807e0a2b }, - "\u3000\ufffd\u3013\ufffd\u9ccc\ufffd\ufffd ~\ufffd+", - :intvector{ 2,4,6,8,10,12,14,18,19,21,24 }, + :bin{ 7e7b21212120217e217f772100007e217e7e7d207e7e807e0a2b }, + "\u3000\ufffd\u3013\ufffd\u9ccc\ufffd\ufffd\u3013 ~\ufffd+", + :intvector{ 2,4,6,8,10,12,14,15,19,20,22,25 }, :int{1}, :int{1}, "", "?", :bin{""} } // improve coverage of ISO-2022-JP converter with hardcoded JIS X 0201 and @@ -61,8 +193,8 @@ { "ISO-2022-JP", :bin{ 1b284a7d7e801b2442306c20217f7e21202160217f22202225227f5f211b2842 }, - "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\u25b2\ufffd\u6f3e", - :intvector{ 3,4,5,9,11,13,15,17,19,21,23,25,27 }, + "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\u25b2\ufffd\u6f3e", + :intvector{ 3,4,5,9,11,12,14,16,17,19,21,23,25,27 }, :int{1}, :int{1}, "", "?", :bin{""} } // improve coverage of unrolled loops in ucnvmbcs.c/ucnv_MBCSSingleToBMPWithOffsets() @@ -341,7 +473,7 @@ { "ISO-2022-CN-EXT", :bin{ 411b4e2121 }, "\x41", :intvector{ 0 }, - :int{1}, :int{1}, "illesc", ".", :bin{ 1b4e } + :int{1}, :int{1}, "illesc", ".", :bin{ 1b } } // G3 designator: recognized, but not supported for -CN (only for -CN-EXT) {
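For readers following the patch above, the ISO-2022 DBCS rule described in its "Ticket 5691" comments can be sketched in isolation. This is only a minimal illustration, not code taken from ICU: the PairAction enum and the functions classify_dbcs_pair, in_21_7e and is_2022_control are hypothetical names, and is_2022_control assumes that the patch's IS_2022_CONTROL macro matches exactly ESC (0x1b), SO (0x0e) and SI (0x0f), as the patch comment suggests.

```c
#include <stdint.h>

/* Hypothetical outcome labels; the patch expresses these implicitly
 * through mySource and mySourceChar rather than through an enum. */
typedef enum {
    PAIR_CONVERT,       /* both bytes in 21..7e: look the pair up */
    PAIR_ILLEGAL_BOTH,  /* report both bytes as one illegal sequence */
    PAIR_ILLEGAL_FIRST  /* report only the first byte; the second byte
                           is re-read as the start of the next unit */
} PairAction;

/* Assumption: stands in for the patch's IS_2022_CONTROL (ESC, SO, SI). */
static int is_2022_control(uint8_t b) {
    return b == 0x1b || b == 0x0e || b == 0x0f;
}

/* Same range test as the patch: the uint8_t cast makes bytes below 0x21
 * wrap around to large values, so one comparison covers both bounds. */
static int in_21_7e(uint8_t b) {
    return (uint8_t)(b - 0x21) <= (0x7e - 0x21);
}

static PairAction classify_dbcs_pair(uint8_t lead, uint8_t trail) {
    int leadIsOk = in_21_7e(lead);
    int trailIsOk = in_21_7e(trail);
    if (leadIsOk && trailIsOk) {
        return PAIR_CONVERT;
    }
    if (!(trailIsOk || is_2022_control(trail))) {
        /* the second byte cannot start a character or an escape */
        return PAIR_ILLEGAL_BOTH;
    }
    /* the second byte could start something: stop before it */
    return PAIR_ILLEGAL_FIRST;
}
```

Applied to the DBCS bytes 41 41 af 41 42 af fe 41 43 from the ISO-2022-KR test case in conversion.txt, this classification converts 41 41, reports af alone, converts 41 42, reports af fe as a pair, and converts 41 43, which is consistent with the expected output listed for that test.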
Information forwarded to debian-bugs-dist@lists.debian.org: Bug#534590; Package icu. (Tue, 25 Aug 2009 20:21:03 GMT)
Acknowledgement sent to Jay Berkenbilt <qjb@debian.org>: Extra info received and forwarded to list. (Tue, 25 Aug 2009 20:21:03 GMT)
Message #15 received at 534590@bugs.debian.org:
(Sorry if this is a duplicate -- I sent this yesterday but it still hasn't shown up on the bug in the BTS, so I don't know whether it got out.)

The reporter of this ICU security bug, which impacts oldstable and stable but not testing or unstable, was kind enough to refer to the Red Hat bugzilla entry for it. The people at Red Hat have backported the security fix to the 3.6 (our oldstable) and 3.8 (our stable) versions of ICU (which appear in RHEL5 and Fedora 9). I have grabbed their SRPMs for the patched versions and extracted the patches that apply to the 3.6 and 3.8 versions. Attached here are the patches directly from those source RPMs, not modified in any way or tested for Debian.

I can integrate these into the Debian packages and prepare uploads for stable security and oldstable security, or I can defer to the security team to do the integration. Just let me know. It may be several days before I have a chance to work on it, but I have prepared stable security uploads for my packages before.

I am grateful to Red Hat for doing the work of backporting to the older ICU versions.

--
Jay Berkenbilt <qjb@debian.org>

[2.
text/x-patch; icu-3.8-CVE-2009-0153.patch] diff -ru icu.6175/source/common/ucnv2022.c icu/source/common/ucnv2022.c --- icu.6175/source/common/ucnv2022.c 2009-06-11 13:44:44.000000000 +0100 +++ icu/source/common/ucnv2022.c 2009-06-11 18:09:48.000000000 +0100 @@ -1973,6 +1973,7 @@ mySourceChar = args->converter->toUBytes[0]; args->converter->toULength = 0; cs = (StateEnum)pToU2022State->cs[pToU2022State->g]; + targetUniChar = missingCharMarker; goto getTrailByte; } @@ -2102,18 +2103,45 @@ default: /* G0 DBCS */ if(mySource < mySourceLimit) { + int leadIsOk, trailIsOk; char trailByte; getTrailByte: - trailByte = *mySource++; - if(cs == JISX208) { - _2022ToSJIS((uint8_t)mySourceChar, (uint8_t)trailByte, tempBuf); - } else { - tempBuf[0] = (char)mySourceChar; - tempBuf[1] = trailByte; - } - mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); - targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE); - } else { + trailByte = *mySource; + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. + * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + * + * In ISO-2022 DBCS, if both bytes are valid or both bytes are outside + * the 21..7e range, then we treat them as a pair. + * Otherwise (valid lead byte + illegal trail byte, or vice versa) + * we report only the first byte as the illegal sequence. + */ + leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21); + trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21); + if (leadIsOk == trailIsOk) { + ++mySource; + uint32_t tmpSourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); + if (leadIsOk) { + if(cs == JISX208) { + _2022ToSJIS((uint8_t)mySourceChar, (uint8_t)trailByte, tempBuf); + mySourceChar = tmpSourceChar; + } else { + /* Copy before we modify tmpSourceChar so toUnicodeCallback() sees the correct bytes. 
*/ + mySourceChar = tmpSourceChar; + if (cs == KSC5601) { + tmpSourceChar += 0x8080; /* = _2022ToGR94DBCS(tmpSourceChar) */ + } + tempBuf[0] = (char)(tmpSourceChar >> 8); + tempBuf[1] = (char)(tmpSourceChar); + } + targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE); + } else { + mySourceChar = tmpSourceChar; + } + } + } else { args->converter->toUBytes[0] = (uint8_t)mySourceChar; args->converter->toULength = 1; goto endloop; @@ -2254,7 +2282,12 @@ } /* only DBCS or SBCS characters are expected*/ /* DB characters with high bit set to 1 are expected */ - if(length > 2 || length==0 ||(((targetByteUnit & 0x8080) != 0x8080)&& length==2)){ + if( length > 2 || length==0 || + (length == 1 && targetByteUnit > 0x7f) || + (length == 2 && + ((uint16_t)(targetByteUnit - 0xa1a1) > (0xfefe - 0xa1a1) || + (uint8_t)(targetByteUnit - 0xa1) > (0xfe - 0xa1))) + ) { targetByteUnit=missingCharMarker; } if (targetByteUnit != missingCharMarker){ @@ -2583,17 +2616,36 @@ myData->isEmptySegment = FALSE; /* Any invalid char errors will be detected separately, so just reset this */ if(myData->toU2022State.g == 1) { if(mySource < mySourceLimit) { + int leadIsOk, trailIsOk; char trailByte; getTrailByte: - trailByte = *mySource++; - tempBuf[0] = (char)(mySourceChar + 0x80); - tempBuf[1] = (char)(trailByte + 0x80); - mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); - if((mySourceChar & 0x8080) == 0) { - targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, tempBuf, 2, useFallback); + targetUniChar = missingCharMarker; + trailByte = *mySource; + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. + * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + * + * In ISO-2022 DBCS, if both bytes are valid or both bytes are outside + * the 21..7e range, then we treat them as a pair. 
+ * Otherwise (valid lead byte + illegal trail byte, or vice versa) + * we report only the first byte as the illegal sequence. + */ + leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21); + trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21); + if (leadIsOk == trailIsOk) { + ++mySource; + if (leadIsOk) { + tempBuf[0] = (char)(mySourceChar + 0x80); + tempBuf[1] = (char)(trailByte + 0x80); + targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, tempBuf, 2, useFallback); + } else { + leadIsOk = TRUE; /* TODO: remove */ + } + mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); } else { - /* illegal bytes > 0x7f */ - targetUniChar = missingCharMarker; + trailIsOk = TRUE; /* TODO: remove */ } } else { args->converter->toUBytes[0] = (uint8_t)mySourceChar; @@ -2601,8 +2653,10 @@ break; } } - else{ + else if(mySourceChar <= 0x7f) { targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, mySource - 1, 1, useFallback); + } else { + targetUniChar = 0xffff; } if(targetUniChar < 0xfffe){ if(args->offsets) { @@ -3099,6 +3153,7 @@ /* continue with a partial double-byte character */ mySourceChar = args->converter->toUBytes[0]; args->converter->toULength = 0; + targetUniChar = missingCharMarker; goto getTrailByte; } @@ -3178,29 +3233,48 @@ UConverterSharedData *cnv; StateEnum tempState; int32_t tempBufLen; + int leadIsOk, trailIsOk; char trailByte; getTrailByte: - trailByte = *mySource++; - tempState = (StateEnum)pToU2022State->cs[pToU2022State->g]; - if(tempState > CNS_11643_0) { - cnv = myData->myConverterArray[CNS_11643]; - tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0)); - tempBuf[1] = (char) (mySourceChar); - tempBuf[2] = trailByte; - tempBufLen = 3; - - }else{ - cnv = myData->myConverterArray[tempState]; - tempBuf[0] = (char) (mySourceChar); - tempBuf[1] = trailByte; - tempBufLen = 2; + trailByte = *mySource; + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. 
+ * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + * + * In ISO-2022 DBCS, if both bytes are valid or both bytes are outside + * the 21..7e range, then we treat them as a pair. + * Otherwise (valid lead byte + illegal trail byte, or vice versa) + * we report only the first byte as the illegal sequence. + */ + leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21); + trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21); + if (leadIsOk == trailIsOk) { + ++mySource; + if (leadIsOk) { + tempState = (StateEnum)pToU2022State->cs[pToU2022State->g]; + if(tempState >= CNS_11643_0) { + cnv = myData->myConverterArray[CNS_11643]; + tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0)); + tempBuf[1] = (char) (mySourceChar); + tempBuf[2] = trailByte; + tempBufLen = 3; + + }else{ + cnv = myData->myConverterArray[tempState]; + tempBuf[0] = (char) (mySourceChar); + tempBuf[1] = trailByte; + tempBufLen = 2; + } + targetUniChar = ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE); + } + mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); } - mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); if(pToU2022State->g>=2) { /* return from a single-shift state to the previous one */ pToU2022State->g=pToU2022State->prevG; } - targetUniChar = ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE); } else { args->converter->toUBytes[0] = (uint8_t)mySourceChar; args->converter->toULength = 1; diff -ru icu.6175/source/common/ucnvhz.c icu/source/common/ucnvhz.c --- icu.6175/source/common/ucnvhz.c 2009-06-11 13:44:44.000000000 +0100 +++ icu/source/common/ucnvhz.c 2009-06-11 18:05:36.000000000 +0100 @@ -215,19 +215,35 @@ } else{ /* trail byte */ + int leadIsOk, trailIsOk; uint32_t leadByte = args->converter->toUnicodeStatus & 0xff; - if( (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21) && - (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21) - ) { - tempBuf[0] = (char) (leadByte+0x80) ; 
- tempBuf[1] = (char) (mySourceChar+0x80); - targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->gbConverter->sharedData, - tempBuf, 2, args->converter->useFallback); + targetUniChar = 0xffff; + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. + * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + * + * In HZ DBCS, if both bytes are valid or both bytes are outside + * the 21..7d/7e range, then we treat them as a pair. + * Otherwise (valid lead byte + illegal trail byte, or vice versa) + * we report only the first byte as the illegal sequence. + */ + leadIsOk = (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21); + trailIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21); + if (leadIsOk == trailIsOk) { + if (leadIsOk) { + tempBuf[0] = (char) (leadByte+0x80) ; + tempBuf[1] = (char) (mySourceChar+0x80); + targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->gbConverter->sharedData, + tempBuf, 2, args->converter->useFallback); + } + /* add another bit so that the code below writes 2 bytes in case of error */ + mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar; } else { - targetUniChar = 0xffff; + --mySource; + mySourceChar = (int32_t)leadByte; } - /* add another bit so that the code below writes 2 bytes in case of error */ - mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar; args->converter->toUnicodeStatus =0x00; } } diff -ru icu.6175/source/common/ucnvmbcs.c icu/source/common/ucnvmbcs.c --- icu.6175/source/common/ucnvmbcs.c 2009-06-11 13:44:44.000000000 +0100 +++ icu/source/common/ucnvmbcs.c 2009-06-11 18:05:36.000000000 +0100 @@ -1,7 +1,7 @@ /* ****************************************************************************** * -* Copyright (C) 2000-2007, International Business Machines +* Copyright (C) 2000-2008, International Business Machines * Corporation and others. All Rights Reserved. 
* ****************************************************************************** @@ -1791,6 +1791,65 @@ pArgs->offsets=offsets; } +static UBool +hasValidTrailBytes(const int32_t (*stateTable)[256], uint8_t state) { + const int32_t *row=stateTable[state]; + int32_t b, entry; + /* First test for final entries in this state for some commonly valid byte values. */ + entry=row[0xa1]; + if( !MBCS_ENTRY_IS_TRANSITION(entry) && + MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL + ) { + return TRUE; + } + entry=row[0x41]; + if( !MBCS_ENTRY_IS_TRANSITION(entry) && + MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL + ) { + return TRUE; + } + /* Then test for final entries in this state. */ + for(b=0; b<=0xff; ++b) { + entry=row[b]; + if( !MBCS_ENTRY_IS_TRANSITION(entry) && + MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL + ) { + return TRUE; + } + } + /* Then recurse for transition entries. */ + for(b=0; b<=0xff; ++b) { + entry=row[b]; + if( MBCS_ENTRY_IS_TRANSITION(entry) && + hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry)) + ) { + return TRUE; + } + } + return FALSE; +} + +/* + * Is byte b a single/lead byte in this state? + * Recurse for transition states, because here we don't want to say that + * b is a lead byte if all byte sequences that start with b are illegal. 
+ */
+static UBool
+isSingleOrLead(const int32_t (*stateTable)[256], uint8_t state, UBool isDBCSOnly, uint8_t b) {
+    const int32_t *row=stateTable[state];
+    int32_t entry=row[b];
+    if(MBCS_ENTRY_IS_TRANSITION(entry)) { /* lead byte */
+        return hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry));
+    } else {
+        uint8_t action=(uint8_t)(MBCS_ENTRY_FINAL_ACTION(entry));
+        if(action==MBCS_STATE_CHANGE_ONLY && isDBCSOnly) {
+            return FALSE; /* SI/SO are illegal for DBCS-only conversion */
+        } else {
+            return action!=MBCS_STATE_ILLEGAL;
+        }
+    }
+}
+
 U_CFUNC void
 ucnv_MBCSToUnicodeWithOffsets(UConverterToUnicodeArgs *pArgs,
                           UErrorCode *pErrorCode) {
@@ -2146,6 +2205,34 @@
                 sourceIndex=nextSourceIndex;
             } else if(U_FAILURE(*pErrorCode)) {
                 /* callback(illegal) */
+                if(byteIndex>1) {
+                    /*
+                     * Ticket 5691: consistent illegal sequences:
+                     * - We include at least the first byte in the illegal sequence.
+                     * - If any of the non-initial bytes could be the start of a character,
+                     *   we stop the illegal sequence before the first one of those.
+                     */
+                    UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0);
+                    int8_t i;
+                    for(i=1;
+                        i<byteIndex && !isSingleOrLead(stateTable, state, isDBCSOnly, bytes[i]);
+                        ++i) {}
+                    if(i<byteIndex) {
+                        /* Back out some bytes. */
+                        int8_t backOutDistance=byteIndex-i;
+                        int32_t bytesFromThisBuffer=(int32_t)(source-(const uint8_t *)pArgs->source);
+                        byteIndex=i; /* length of reported illegal byte sequence */
+                        if(backOutDistance<=bytesFromThisBuffer) {
+                            source-=backOutDistance;
+                        } else {
+                            /* Back out bytes from the previous buffer: Need to replay them. */
+                            cnv->preToULength=(int8_t)(bytesFromThisBuffer-backOutDistance);
+                            /* preToULength is negative! */
+                            uprv_memcpy(cnv->preToU, bytes+i, -cnv->preToULength);
+                            source=(const uint8_t *)pArgs->source;
+                        }
+                    }
+                }
                 break;
             } else /* unassigned sequences indicated with byteIndex>0 */ {
                 /* try an extension mapping */
@@ -2156,6 +2243,7 @@
                                   &offsets, sourceIndex,
                                   pArgs->flush,
                                   pErrorCode);
+                /* TODO: nextSourceIndex+=diff instead of nextSourceIndex+diff ?? */
                 sourceIndex=nextSourceIndex+(int32_t)(source-(const uint8_t *)pArgs->source);
                 if(U_FAILURE(*pErrorCode)) {
@@ -2447,15 +2535,37 @@
     if(c<0) {
         if(U_SUCCESS(*pErrorCode) && source==sourceLimit && lastSource<source) {
-            *pErrorCode=U_TRUNCATED_CHAR_FOUND;
-        }
-        if(U_FAILURE(*pErrorCode)) {
             /* incomplete character byte sequence */
             uint8_t *bytes=cnv->toUBytes;
             cnv->toULength=(int8_t)(source-lastSource);
             do {
                 *bytes++=*lastSource++;
             } while(lastSource<source);
+            *pErrorCode=U_TRUNCATED_CHAR_FOUND;
+        } else if(U_FAILURE(*pErrorCode)) {
+            /* callback(illegal) */
+            /*
+             * Ticket 5691: consistent illegal sequences:
+             * - We include at least the first byte in the illegal sequence.
+             * - If any of the non-initial bytes could be the start of a character,
+             *   we stop the illegal sequence before the first one of those.
+             */
+            UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0);
+            uint8_t *bytes=cnv->toUBytes;
+            *bytes++=*lastSource++; /* first byte */
+            if(lastSource==source) {
+                cnv->toULength=1;
+            } else /* lastSource<source: multi-byte character */ {
+                int8_t i;
+                for(i=1;
+                    lastSource<source && !isSingleOrLead(stateTable, state, isDBCSOnly, *lastSource);
+                    ++i
+                ) {
+                    *bytes++=*lastSource++;
+                }
+                cnv->toULength=i;
+                source=lastSource;
+            }
         } else {
             /* no output because of empty input or only state changes */
             *pErrorCode=U_INDEX_OUTOFBOUNDS_ERROR;
diff -ru icu.6175/source/test/cintltst/nccbtst.c icu/source/test/cintltst/nccbtst.c
--- icu.6175/source/test/cintltst/nccbtst.c 2009-06-11 13:44:44.000000000 +0100
+++ icu/source/test/cintltst/nccbtst.c 2009-06-11 18:05:36.000000000 +0100
@@ -2497,13 +2497,13 @@
     static const uint8_t text943[] = {
-        0x82, 0xa9, 0x82, 0x20, /*0xc8,*/ 0x61, 0x8a, 0xbf, 0x8e, 0x9a };
-    static const UChar toUnicode943sub[] = { 0x304b, 0xfffd, /*0xff88,*/ 0x0061, 0x6f22, 0x5b57};
-    static const UChar toUnicode943skip[]= { 0x304b, /*0xff88,*/ 0x0061, 0x6f22, 0x5b57};
+        0x82, 0xa9, 0x82, 0x20, 0x61, 0x8a, 0xbf, 0x8e, 0x9a };
+    static const UChar toUnicode943sub[] = { 0x304b, 0x1a, 0x20, 0x0061, 0x6f22, 0x5b57 };
+    static const UChar toUnicode943skip[]= { 0x304b, 0x20, 0x0061, 0x6f22, 0x5b57 };
     static const UChar toUnicode943stop[]= { 0x304b};
-    static const int32_t fromIBM943Offssub[] = {0, 2, 4, 5, 7};
-    static const int32_t fromIBM943Offsskip[] = { 0, 4, 5, 7};
+    static const int32_t fromIBM943Offssub[] = { 0, 2, 3, 4, 5, 7 };
+    static const int32_t fromIBM943Offsskip[] = { 0, 3, 4, 5, 7 };
     static const int32_t fromIBM943Offsstop[] = { 0};
     gInBufferSize = inputsize;
@@ -2537,9 +2537,9 @@
     {
         static const uint8_t sampleText[] = {
             0x82, 0xa9, 0x61, 0x62, 0x63 , 0x82,
-            0xff, /*0x82, 0xa9,*/ 0x32, 0x33};
-        static const UChar toUnicode943sub[] = {0x304b, 0x0061, 0x0062, 0x0063, 0xfffd,/*0x304b,*/ 0x0032, 0x0033};
-        static const int32_t fromIBM943Offssub[] = {0, 2, 3, 4, 5, 7, 8};
+            0xff, 0x32, 0x33};
+        static const UChar toUnicode943sub[] = { 0x304b, 0x0061, 0x0062, 0x0063, 0x1a, 0x1a, 0x0032, 0x0033 };
+        static const int32_t fromIBM943Offssub[] = { 0, 2, 3, 4, 5, 6, 7, 8 };
         /*checking illegal value for ibm-943 with substitute*/
         gInBufferSize = inputsize;
         gOutBufferSize = outputsize;
diff -ru icu.6175/source/test/cintltst/nucnvtst.c icu/source/test/cintltst/nucnvtst.c
--- icu.6175/source/test/cintltst/nucnvtst.c 2009-06-11 13:44:44.000000000 +0100
+++ icu/source/test/cintltst/nucnvtst.c 2009-06-11 18:05:36.000000000 +0100
@@ -2608,7 +2608,7 @@
     TestNextUCharError(cnv, source, source, U_INDEX_OUTOFBOUNDS_ERROR, "sourceLimit <= source");
     /*Test for the condition where there is an invalid character*/
     {
-        static const uint8_t source2[]={0xa1, 0x01};
+        static const uint8_t source2[]={0xa1, 0x80};
         TestNextUCharError(cnv, (const char*)source2, (const char*)source2+sizeof(source2), U_ZERO_ERROR, "an invalid character");
     }
     /*Test for the condition where we have a truncated char*/
@@ -3901,11 +3901,11 @@
 TestISO_2022_KR() {
     /* test input */
     static const uint16_t in[]={
-        0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F66,0x9F67,0x9F6A,0x000A,0x000D
-        ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC02,0xAC04
+        0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F67,0x9F6A,0x000A,0x000D
+        ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC04
         ,0xAC07,0xAC08,0xAC09,0x0025,0x0026,0x0027,0x000A,0x000D,0x0028,0x0029
         ,0x002A,0x002B,0x002C,0x002D,0x002E,0x53C3,0x53C8,0x53C9,0x53CA,0x53CB
-        ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53DF,0x53E1,0x53E2
+        ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53E1,0x53E2
         ,0x53E3,0x53E4,0x000A,0x000D};
     const UChar* uSource;
     const UChar* uSourceLimit;
diff -ru icu.6175/source/test/testdata/conversion.txt icu/source/test/testdata/conversion.txt
--- icu.6175/source/test/testdata/conversion.txt 2009-06-11 13:44:44.000000000 +0100
+++ icu/source/test/testdata/conversion.txt 2009-06-11 18:05:36.000000000 +0100
@@ -48,12 +48,83 @@
     toUnicode {
       Headers { "charset", "bytes", "unicode", "offsets", "flush", "fallbacks", "errorCode", "callback", "invalidChars" }
       Cases {
+        // Test ticket 5691: consistent illegal sequences
+        // Unfortunately, we cannot use the Shift-JIS examples from the ticket
+        // comments because our Shift-JIS table is Windows-compatible and
+        // therefore has no illegal single bytes. Same for GBK.
+        // Instead, we use the stricter GB 18030 also for 2-byte examples.
+        // The byte sequences are generally slightly different from the ticket
+        // comment, simply using assigned characters rather than just
+        // theoretically valid sequences.
+        {
+          "gb18030",
+          :bin{ 618140813c81ff7a },
+          "a\u4e02\\x81<\\x81\\xFFz",
+          :intvector{ 0,1,3,3,3,3,4,5,5,5,5,5,5,5,5,7 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "EUC-JP",
+          :bin{ 618fb0a98fb03c8f3cb0a97a },
+          "a\u4e28\\x8F\\xB0<\\x8F<\u9022z",
+          :intvector{ 0,1,4,4,4,4,5,5,5,5,6,7,7,7,7,8,9,11 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "gb18030",
+          :bin{ 618130fc318130fc8181303c3e813cfc817a },
+          "a5ed\\x810\u9f07\\x810<>\\x81<\u9f07z",
+          :intvector{ 0,1,5,5,5,5,6,7,9,9,9,9,10,11,12,13,13,13,13,14,15,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "UTF-8",
+          :bin{ 61f1808182f180813cf18081fff180ff3cf1ff3c3e7a },
+          "a\U00040042\\xF1\\x80\\x81<\\xF1\\x80\\x81\\xFF\\xF1\\x80\\xFF<\\xF1\\xFF<>z",
+          :intvector{ 0,1,1,5,5,5,5,5,5,5,5,5,5,5,5,8,9,9,9,9,9,9,9,9,9,9,9,9,12,12,12,12,13,13,13,13,13,13,13,13,15,15,15,15,16,17,17,17,17,18,18,18,18,19,20,21 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-JP-2",
+          :bin{ 1b24424141af4142affe41431b2842 },
+          "\u758f\\xAF\u758e\\xAF\\xFE\u790e",
+          :intvector{ 3,5,5,5,5,6,8,8,8,8,8,8,8,8,10 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ibm-25546",
+          :bin{ 411b242943420e4141af4142affe41430f5a },
+          "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-KR",
+          :bin{ 411b242943420e4141af4142affe41430f5a },
+          "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-CN",
+          :bin{ 411b242941420e4141af4142affe41430f5a },
+          "AB\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "HZ",
+          :bin{ 417e7b4141af4142affe41437e7d5a },
+          "A\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z",
+          :intvector{ 0,3,5,5,5,5,6,8,8,8,8,8,8,8,8,10,14 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
         // test that HZ limits its byte values to lead bytes 21..7d and trail bytes 21..7e
         {
           "HZ",
           :bin{ 7e7b21212120217e217f772100007e217e7d207e7e807e0a2b },
-          "\u3000\ufffd\u3013\ufffd\u9ccc\ufffd\ufffd ~\ufffd+",
-          :intvector{ 2,4,6,8,10,12,14,18,19,21,24 },
+          "\u3000\ufffd\ufffd\u3013\ufffd\ufffd\u9ccc\ufffd\ufffd ~\ufffd+",
+          :intvector{ 2,4,5,6,8,9,10,12,14,18,19,21,24 },
           :int{1}, :int{1}, "", "?", :bin{""}
         }
         // improve coverage of ISO-2022-JP converter with hardcoded JIS X 0201 and
@@ -61,8 +132,8 @@
         {
           "ISO-2022-JP",
           :bin{ 1b284a7d7e801b2442306c20217f7e21202160217f22202225227f5f211b2842 },
-          "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\u25b2\ufffd\u6f3e",
-          :intvector{ 3,4,5,9,11,13,15,17,19,21,23,25,27 },
+          "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\ufffd\ufffd\u25b2\ufffd\ufffd\u6f3e",
+          :intvector{ 3,4,5,9,11,12,13,14,16,17,19,20,21,22,23,25,26,27 },
           :int{1}, :int{1}, "", "?", :bin{""}
         }
         // improve coverage of unrolled loops in ucnvmbcs.c/ucnv_MBCSSingleToBMPWithOffsets()
[3.
text/x-patch; icu-3.6-CVE-2009-0153.patch]
diff -ru icu.6175/source/common/ucnv2022.c icu/source/common/ucnv2022.c
--- icu.6175/source/common/ucnv2022.c 2009-06-02 15:47:31.000000000 +0100
+++ icu/source/common/ucnv2022.c 2009-06-02 16:03:15.000000000 +0100
@@ -754,6 +754,7 @@
     UConverterDataISO2022* myData2022 = ((UConverterDataISO2022*)_this->extraInfo);
     uint32_t key = myData2022->key;
     int32_t offset = 0;
+    int8_t initialToULength = _this->toULength;
     char c;
     value = VALID_NON_TERMINAL_2022;
@@ -806,7 +807,6 @@
         return;
     } else if (value == INVALID_2022 ) {
         *err = U_ILLEGAL_ESCAPE_SEQUENCE;
-        return;
     } else /* value == VALID_TERMINAL_2022 */ {
         switch(var){
 #ifdef U_ENABLE_GENERIC_ISO_2022
@@ -938,6 +938,35 @@
     }
     if(U_SUCCESS(*err)) {
         _this->toULength = 0;
+    } else if(*err==U_ILLEGAL_ESCAPE_SEQUENCE) {
+        if(_this->toULength>1) {
+            /*
+             * Ticket 5691: consistent illegal sequences:
+             * - We include at least the first byte (ESC) in the illegal sequence.
+             * - If any of the non-initial bytes could be the start of a character,
+             *   we stop the illegal sequence before the first one of those.
+             *   In escape sequences, all following bytes are "printable", that is,
+             *   unless they are completely illegal (>7f in SBCS, outside 21..7e in DBCS),
+             *   they are valid single/lead bytes.
+             *   For simplicity, we always only report the initial ESC byte as the
+             *   illegal sequence and back out all other bytes we looked at.
+             */
+            /* Back out some bytes. */
+            int8_t backOutDistance=_this->toULength-1;
+            int8_t bytesFromThisBuffer=_this->toULength-initialToULength;
+            if(backOutDistance<=bytesFromThisBuffer) {
+                /* same as initialToULength<=1 */
+                *source-=backOutDistance;
+            } else {
+                /* Back out bytes from the previous buffer: Need to replay them. */
+                _this->preToULength=(int8_t)(bytesFromThisBuffer-backOutDistance);
+                /* same as -(initialToULength-1) */
+                /* preToULength is negative! */
+                uprv_memcpy(_this->preToU, _this->toUBytes+1, -_this->preToULength);
+                *source-=bytesFromThisBuffer;
+            }
+            _this->toULength=1;
+        }
     } else if(*err==U_UNSUPPORTED_ESCAPE_SEQUENCE) {
         _this->toUCallbackReason = UCNV_UNASSIGNED;
     }
@@ -1973,6 +2002,7 @@
         mySourceChar = args->converter->toUBytes[0];
         args->converter->toULength = 0;
         cs = (StateEnum)pToU2022State->cs[pToU2022State->g];
+        targetUniChar = missingCharMarker;
         goto getTrailByte;
     }
@@ -2102,17 +2132,44 @@
                 default:
                     /* G0 DBCS */
                     if(mySource < mySourceLimit) {
-                        char trailByte;
+                        int leadIsOk, trailIsOk;
+                        uint8_t trailByte;
 getTrailByte:
-                        trailByte = *mySource++;
-                        if(cs == JISX208) {
-                            _2022ToSJIS((uint8_t)mySourceChar, (uint8_t)trailByte, tempBuf);
-                        } else {
-                            tempBuf[0] = (char)mySourceChar;
-                            tempBuf[1] = trailByte;
+                        trailByte = (uint8_t)*mySource;
+                        /*
+                         * Ticket 5691: consistent illegal sequences:
+                         * - We include at least the first byte in the illegal sequence.
+                         * - If any of the non-initial bytes could be the start of a character,
+                         *   we stop the illegal sequence before the first one of those.
+                         *
+                         * In ISO-2022 DBCS, if the second byte is in the 21..7e range or is
+                         * an ESC/SO/SI, we report only the first byte as the illegal sequence.
+                         * Otherwise we convert or report the pair of bytes.
+                         */
+                        leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                        trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21);
+                        if (leadIsOk && trailIsOk) {
+                            ++mySource;
+                            uint32_t tmpSourceChar = (mySourceChar << 8) | trailByte;
+                            if(cs == JISX208) {
+                                _2022ToSJIS((uint8_t)mySourceChar, trailByte, tempBuf);
+                                mySourceChar = tmpSourceChar;
+                            } else {
+                                /* Copy before we modify tmpSourceChar so toUnicodeCallback() sees the correct bytes. */
+                                mySourceChar = tmpSourceChar;
+                                if (cs == KSC5601) {
+                                    tmpSourceChar += 0x8080; /* = _2022ToGR94DBCS(tmpSourceChar) */
+                                }
+                                tempBuf[0] = (char)(tmpSourceChar >> 8);
+                                tempBuf[1] = (char)(tmpSourceChar);
+                            }
+                            targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE);
+                        } else if (!(trailIsOk || IS_2022_CONTROL(trailByte))) {
+                            /* report a pair of illegal bytes if the second byte is not a DBCS starter */
+                            ++mySource;
+                            /* add another bit so that the code below writes 2 bytes in case of error */
+                            mySourceChar = 0x10000 | (mySourceChar << 8) | trailByte;
                         }
-                        mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
-                        targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE);
                     } else {
                         args->converter->toUBytes[0] = (uint8_t)mySourceChar;
                         args->converter->toULength = 1;
@@ -2254,7 +2311,12 @@
                 }
                 /* only DBCS or SBCS characters are expected*/
                 /* DB characters with high bit set to 1 are expected */
-                if(length > 2 || length==0 ||(((targetByteUnit & 0x8080) != 0x8080)&& length==2)){
+                if( length > 2 || length==0 ||
+                    (length == 1 && targetByteUnit > 0x7f) ||
+                    (length == 2 &&
+                        ((uint16_t)(targetByteUnit - 0xa1a1) > (0xfefe - 0xa1a1) ||
+                        (uint8_t)(targetByteUnit - 0xa1) > (0xfe - 0xa1)))
+                ) {
                     targetByteUnit=missingCharMarker;
                 }
                 if (targetByteUnit != missingCharMarker){
@@ -2583,17 +2645,34 @@
             myData->isEmptySegment = FALSE; /* Any invalid char errors will be detected separately, so just reset this */
             if(myData->toU2022State.g == 1) {
                 if(mySource < mySourceLimit) {
-                    char trailByte;
+                    int leadIsOk, trailIsOk;
+                    uint8_t trailByte;
 getTrailByte:
-                    trailByte = *mySource++;
-                    tempBuf[0] = (char)(mySourceChar + 0x80);
-                    tempBuf[1] = (char)(trailByte + 0x80);
-                    mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
-                    if((mySourceChar & 0x8080) == 0) {
+                    targetUniChar = missingCharMarker;
+                    trailByte = (uint8_t)*mySource;
+                    /*
+                     * Ticket 5691: consistent illegal sequences:
+                     * - We include at least
the first byte in the illegal sequence.
+                     * - If any of the non-initial bytes could be the start of a character,
+                     *   we stop the illegal sequence before the first one of those.
+                     *
+                     * In ISO-2022 DBCS, if the second byte is in the 21..7e range or is
+                     * an ESC/SO/SI, we report only the first byte as the illegal sequence.
+                     * Otherwise we convert or report the pair of bytes.
+                     */
+                    leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                    trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21);
+                    if (leadIsOk && trailIsOk) {
+                        ++mySource;
+                        tempBuf[0] = (char)(mySourceChar + 0x80);
+                        tempBuf[1] = (char)(trailByte + 0x80);
                         targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, tempBuf, 2, useFallback);
-                    } else {
-                        /* illegal bytes > 0x7f */
-                        targetUniChar = missingCharMarker;
+                        mySourceChar = (mySourceChar << 8) | trailByte;
+                    } else if (!(trailIsOk || IS_2022_CONTROL(trailByte))) {
+                        /* report a pair of illegal bytes if the second byte is not a DBCS starter */
+                        ++mySource;
+                        /* add another bit so that the code below writes 2 bytes in case of error */
+                        mySourceChar = 0x10000 | (mySourceChar << 8) | trailByte;
                     }
                 } else {
                     args->converter->toUBytes[0] = (uint8_t)mySourceChar;
@@ -2601,8 +2680,10 @@
                     break;
                 }
             }
-            else{
+            else if(mySourceChar <= 0x7f) {
                 targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, mySource - 1, 1, useFallback);
+            } else {
+                targetUniChar = 0xffff;
             }
             if(targetUniChar < 0xfffe){
                 if(args->offsets) {
@@ -3099,6 +3180,7 @@
         /* continue with a partial double-byte character */
         mySourceChar = args->converter->toUBytes[0];
         args->converter->toULength = 0;
+        targetUniChar = missingCharMarker;
         goto getTrailByte;
     }
@@ -3178,29 +3260,50 @@
                     UConverterSharedData *cnv;
                     StateEnum tempState;
                     int32_t tempBufLen;
-                    char trailByte;
+                    int leadIsOk, trailIsOk;
+                    uint8_t trailByte;
 getTrailByte:
-                    trailByte = *mySource++;
-                    tempState = (StateEnum)pToU2022State->cs[pToU2022State->g];
-                    if(tempState > CNS_11643_0) {
-                        cnv = myData->myConverterArray[CNS_11643];
-                        tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0));
-                        tempBuf[1] = (char) (mySourceChar);
-                        tempBuf[2] = trailByte;
-                        tempBufLen = 3;
-
-                    }else{
-                        cnv = myData->myConverterArray[tempState];
-                        tempBuf[0] = (char) (mySourceChar);
-                        tempBuf[1] = trailByte;
-                        tempBufLen = 2;
+                    trailByte = (uint8_t)*mySource;
+                    /*
+                     * Ticket 5691: consistent illegal sequences:
+                     * - We include at least the first byte in the illegal sequence.
+                     * - If any of the non-initial bytes could be the start of a character,
+                     *   we stop the illegal sequence before the first one of those.
+                     *
+                     * In ISO-2022 DBCS, if the second byte is in the 21..7e range or is
+                     * an ESC/SO/SI, we report only the first byte as the illegal sequence.
+                     * Otherwise we convert or report the pair of bytes.
+                     */
+                    leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                    trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21);
+                    if (leadIsOk && trailIsOk) {
+                        ++mySource;
+                        tempState = (StateEnum)pToU2022State->cs[pToU2022State->g];
+                        if(tempState >= CNS_11643_0) {
+                            cnv = myData->myConverterArray[CNS_11643];
+                            tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0));
+                            tempBuf[1] = (char) (mySourceChar);
+                            tempBuf[2] = (char) trailByte;
+                            tempBufLen = 3;
+
+                        }else{
+                            cnv = myData->myConverterArray[tempState];
+                            tempBuf[0] = (char) (mySourceChar);
+                            tempBuf[1] = (char) trailByte;
+                            tempBufLen = 2;
+                        }
+                        targetUniChar = ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE);
+                        mySourceChar = (mySourceChar << 8) | trailByte;
+                    } else if (!(trailIsOk || IS_2022_CONTROL(trailByte))) {
+                        /* report a pair of illegal bytes if the second byte is not a DBCS starter */
+                        ++mySource;
+                        /* add another bit so that the code below writes 2 bytes in case of error */
+                        mySourceChar = 0x10000 | (mySourceChar << 8) | trailByte;
                     }
-                    mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte);
                     if(pToU2022State->g>=2) {
                         /* return from a single-shift state to the previous one */
                         pToU2022State->g=pToU2022State->prevG;
                     }
-                    targetUniChar = ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE);
                 } else {
                     args->converter->toUBytes[0] = (uint8_t)mySourceChar;
                     args->converter->toULength = 1;
diff -ru icu.6175/source/common/ucnvhz.c icu/source/common/ucnvhz.c
--- icu.6175/source/common/ucnvhz.c 2009-06-02 15:47:31.000000000 +0100
+++ icu/source/common/ucnvhz.c 2009-06-02 15:57:18.000000000 +0100
@@ -196,10 +196,30 @@
                     /* if the first byte is equal to TILDE and the trail byte
                      * is not a valid byte then it is an error condition
                      */
-                    mySourceChar = 0x7e00 | mySourceChar;
-                    targetUniChar = 0xffff;
+                    /*
+                     * Ticket 5691: consistent illegal sequences:
+                     * - We include at least the first byte in the illegal sequence.
+                     * - If any of the non-initial bytes could be the start of a character,
+                     *   we stop the illegal sequence before the first one of those.
+                     */
                     myData->isEmptySegment = FALSE; /* different error here, reset this to avoid spurious future error */
-                    break;
+                    *err = U_ILLEGAL_ESCAPE_SEQUENCE;
+                    args->converter->toUBytes[0] = UCNV_TILDE;
+                    if( myData->isStateDBCS ?
+                            (0x21 <= mySourceChar && mySourceChar <= 0x7e) :
+                            mySourceChar <= 0x7f
+                    ) {
+                        /* The current byte could be the start of a character: Back it out. */
+                        args->converter->toULength = 1;
+                        --mySource;
+                    } else {
+                        /* Include the current byte in the illegal sequence. */
+                        args->converter->toUBytes[1] = mySourceChar;
+                        args->converter->toULength = 2;
+                    }
+                    args->target = myTarget;
+                    args->source = mySource;
+                    return;
                 }
             } else if(myData->isStateDBCS) {
                 if(args->converter->toUnicodeStatus == 0x00){
@@ -215,19 +235,36 @@
                 }
                 else{
                     /* trail byte */
+                    int leadIsOk, trailIsOk;
                     uint32_t leadByte = args->converter->toUnicodeStatus & 0xff;
-                    if( (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21) &&
-                        (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21)
-                    ) {
+                    targetUniChar = 0xffff;
+                    /*
+                     * Ticket 5691: consistent illegal sequences:
+                     * - We include at least the first byte in the illegal sequence.
+                     * - If any of the non-initial bytes could be the start of a character,
+                     *   we stop the illegal sequence before the first one of those.
+                     *
+                     * In HZ DBCS, if the second byte is in the 21..7e range,
+                     * we report only the first byte as the illegal sequence.
+                     * Otherwise we convert or report the pair of bytes.
+                     */
+                    leadIsOk = (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21);
+                    trailIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21);
+                    if (leadIsOk && trailIsOk) {
                         tempBuf[0] = (char) (leadByte+0x80) ;
                         tempBuf[1] = (char) (mySourceChar+0x80);
                         targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->gbConverter->sharedData,
                             tempBuf, 2, args->converter->useFallback);
+                        mySourceChar= (leadByte << 8) | mySourceChar;
+                    } else if (trailIsOk) {
+                        /* report a single illegal byte and continue with the following DBCS starter byte */
+                        --mySource;
+                        mySourceChar = (int32_t)leadByte;
                     } else {
-                        targetUniChar = 0xffff;
+                        /* report a pair of illegal bytes if the second byte is not a DBCS starter */
+                        /* add another bit so that the code below writes 2 bytes in case of error */
+                        mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar;
                     }
-                    /* add another bit so that the code below writes 2 bytes in case of error */
-                    mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar;
                     args->converter->toUnicodeStatus =0x00;
                 }
             }
diff -ru icu.6175/source/common/ucnvmbcs.c icu/source/common/ucnvmbcs.c
--- icu.6175/source/common/ucnvmbcs.c 2009-06-02 15:47:31.000000000 +0100
+++ icu/source/common/ucnvmbcs.c 2009-06-02 15:56:07.000000000 +0100
@@ -1697,6 +1697,65 @@
     pArgs->offsets=offsets;
 }
+static UBool
+hasValidTrailBytes(const int32_t (*stateTable)[256], uint8_t state) {
+    const int32_t *row=stateTable[state];
+    int32_t b, entry;
+    /* First test for final entries in this state for some commonly valid byte values. */
+    entry=row[0xa1];
+    if( !MBCS_ENTRY_IS_TRANSITION(entry) &&
+        MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL
+    ) {
+        return TRUE;
+    }
+    entry=row[0x41];
+    if( !MBCS_ENTRY_IS_TRANSITION(entry) &&
+        MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL
+    ) {
+        return TRUE;
+    }
+    /* Then test for final entries in this state. */
+    for(b=0; b<=0xff; ++b) {
+        entry=row[b];
+        if( !MBCS_ENTRY_IS_TRANSITION(entry) &&
+            MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL
+        ) {
+            return TRUE;
+        }
+    }
+    /* Then recurse for transition entries. */
+    for(b=0; b<=0xff; ++b) {
+        entry=row[b];
+        if( MBCS_ENTRY_IS_TRANSITION(entry) &&
+            hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry))
+        ) {
+            return TRUE;
+        }
+    }
+    return FALSE;
+}
+
+/*
+ * Is byte b a single/lead byte in this state?
+ * Recurse for transition states, because here we don't want to say that
+ * b is a lead byte if all byte sequences that start with b are illegal.
+ */
+static UBool
+isSingleOrLead(const int32_t (*stateTable)[256], uint8_t state, UBool isDBCSOnly, uint8_t b) {
+    const int32_t *row=stateTable[state];
+    int32_t entry=row[b];
+    if(MBCS_ENTRY_IS_TRANSITION(entry)) { /* lead byte */
+        return hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry));
+    } else {
+        uint8_t action=(uint8_t)(MBCS_ENTRY_FINAL_ACTION(entry));
+        if(action==MBCS_STATE_CHANGE_ONLY && isDBCSOnly) {
+            return FALSE; /* SI/SO are illegal for DBCS-only conversion */
+        } else {
+            return action!=MBCS_STATE_ILLEGAL;
+        }
+    }
+}
+
 U_CFUNC void
 ucnv_MBCSToUnicodeWithOffsets(UConverterToUnicodeArgs *pArgs,
                           UErrorCode *pErrorCode) {
@@ -2052,6 +2111,34 @@
                 sourceIndex=nextSourceIndex;
             } else if(U_FAILURE(*pErrorCode)) {
                 /* callback(illegal) */
+                if(byteIndex>1) {
+                    /*
+                     * Ticket 5691: consistent illegal sequences:
+                     * - We include at least the first byte in the illegal sequence.
+                     * - If any of the non-initial bytes could be the start of a character,
+                     *   we stop the illegal sequence before the first one of those.
+                     */
+                    UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0);
+                    int8_t i;
+                    for(i=1;
+                        i<byteIndex && !isSingleOrLead(stateTable, state, isDBCSOnly, bytes[i]);
+                        ++i) {}
+                    if(i<byteIndex) {
+                        /* Back out some bytes. */
+                        int8_t backOutDistance=byteIndex-i;
+                        int32_t bytesFromThisBuffer=(int32_t)(source-(const uint8_t *)pArgs->source);
+                        byteIndex=i; /* length of reported illegal byte sequence */
+                        if(backOutDistance<=bytesFromThisBuffer) {
+                            source-=backOutDistance;
+                        } else {
+                            /* Back out bytes from the previous buffer: Need to replay them. */
+                            cnv->preToULength=(int8_t)(bytesFromThisBuffer-backOutDistance);
+                            /* preToULength is negative! */
+                            uprv_memcpy(cnv->preToU, bytes+i, -cnv->preToULength);
+                            source=(const uint8_t *)pArgs->source;
+                        }
+                    }
+                }
                 break;
             } else /* unassigned sequences indicated with byteIndex>0 */ {
                 /* try an extension mapping */
@@ -2062,7 +2149,7 @@
                                   &offsets, sourceIndex,
                                   pArgs->flush,
                                   pErrorCode);
-                sourceIndex=nextSourceIndex+(int32_t)(source-(const uint8_t *)pArgs->source);
+                sourceIndex=nextSourceIndex+=(int32_t)(source-(const uint8_t *)pArgs->source);
                 if(U_FAILURE(*pErrorCode)) {
                     /* not mappable or buffer overflow */
@@ -2353,15 +2440,37 @@
     if(c<0) {
         if(U_SUCCESS(*pErrorCode) && source==sourceLimit && lastSource<source) {
-            *pErrorCode=U_TRUNCATED_CHAR_FOUND;
-        }
-        if(U_FAILURE(*pErrorCode)) {
             /* incomplete character byte sequence */
             uint8_t *bytes=cnv->toUBytes;
             cnv->toULength=(int8_t)(source-lastSource);
             do {
                 *bytes++=*lastSource++;
             } while(lastSource<source);
+            *pErrorCode=U_TRUNCATED_CHAR_FOUND;
+        } else if(U_FAILURE(*pErrorCode)) {
+            /* callback(illegal) */
+            /*
+             * Ticket 5691: consistent illegal sequences:
+             * - We include at least the first byte in the illegal sequence.
+             * - If any of the non-initial bytes could be the start of a character,
+             *   we stop the illegal sequence before the first one of those.
+             */
+            UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0);
+            uint8_t *bytes=cnv->toUBytes;
+            *bytes++=*lastSource++; /* first byte */
+            if(lastSource==source) {
+                cnv->toULength=1;
+            } else /* lastSource<source: multi-byte character */ {
+                int8_t i;
+                for(i=1;
+                    lastSource<source && !isSingleOrLead(stateTable, state, isDBCSOnly, *lastSource);
+                    ++i
+                ) {
+                    *bytes++=*lastSource++;
+                }
+                cnv->toULength=i;
+                source=lastSource;
+            }
         } else {
             /* no output because of empty input or only state changes */
             *pErrorCode=U_INDEX_OUTOFBOUNDS_ERROR;
diff -ru icu.6175/source/test/cintltst/nccbtst.c icu/source/test/cintltst/nccbtst.c
--- icu.6175/source/test/cintltst/nccbtst.c 2009-06-02 15:47:18.000000000 +0100
+++ icu/source/test/cintltst/nccbtst.c 2009-06-02 15:47:38.000000000 +0100
@@ -2497,13 +2497,13 @@
     static const uint8_t text943[] = {
-        0x82, 0xa9, 0x82, 0x20, /*0xc8,*/ 0x61, 0x8a, 0xbf, 0x8e, 0x9a };
-    static const UChar toUnicode943sub[] = { 0x304b, 0xfffd, /*0xff88,*/ 0x0061, 0x6f22, 0x5b57};
-    static const UChar toUnicode943skip[]= { 0x304b, /*0xff88,*/ 0x0061, 0x6f22, 0x5b57};
+        0x82, 0xa9, 0x82, 0x20, 0x61, 0x8a, 0xbf, 0x8e, 0x9a };
+    static const UChar toUnicode943sub[] = { 0x304b, 0x1a, 0x20, 0x0061, 0x6f22, 0x5b57 };
+    static const UChar toUnicode943skip[]= { 0x304b, 0x20, 0x0061, 0x6f22, 0x5b57 };
     static const UChar toUnicode943stop[]= { 0x304b};
-    static const int32_t fromIBM943Offssub[] = {0, 2, 4, 5, 7};
-    static const int32_t fromIBM943Offsskip[] = { 0, 4, 5, 7};
+    static const int32_t fromIBM943Offssub[] = { 0, 2, 3, 4, 5, 7 };
+    static const int32_t fromIBM943Offsskip[] = { 0, 3, 4, 5, 7 };
     static const int32_t fromIBM943Offsstop[] = { 0};
     gInBufferSize = inputsize;
@@ -2537,9 +2537,9 @@
     {
         static const uint8_t sampleText[] = {
             0x82, 0xa9, 0x61, 0x62, 0x63 , 0x82,
-            0xff, /*0x82, 0xa9,*/ 0x32, 0x33};
-        static const UChar toUnicode943sub[] = {0x304b, 0x0061, 0x0062, 0x0063, 0xfffd,/*0x304b,*/ 0x0032, 0x0033};
-        static const int32_t fromIBM943Offssub[] = {0, 2, 3, 4, 5, 7, 8};
+            0xff, 0x32, 0x33};
+        static const UChar toUnicode943sub[] = { 0x304b, 0x0061, 0x0062, 0x0063, 0x1a, 0x1a, 0x0032, 0x0033 };
+        static const int32_t fromIBM943Offssub[] = { 0, 2, 3, 4, 5, 6, 7, 8 };
         /*checking illegal value for ibm-943 with substitute*/
         gInBufferSize = inputsize;
         gOutBufferSize = outputsize;
diff -ru icu.6175/source/test/cintltst/nucnvtst.c icu/source/test/cintltst/nucnvtst.c
--- icu.6175/source/test/cintltst/nucnvtst.c 2009-06-02 15:47:18.000000000 +0100
+++ icu/source/test/cintltst/nucnvtst.c 2009-06-02 15:47:38.000000000 +0100
@@ -2606,7 +2606,7 @@
     TestNextUCharError(cnv, source, source, U_INDEX_OUTOFBOUNDS_ERROR, "sourceLimit <= source");
     /*Test for the condition where there is an invalid character*/
     {
-        static const uint8_t source2[]={0xa1, 0x01};
+        static const uint8_t source2[]={0xa1, 0x80};
         TestNextUCharError(cnv, (const char*)source2, (const char*)source2+sizeof(source2), U_ZERO_ERROR, "an invalid character");
     }
     /*Test for the condition where we have a truncated char*/
@@ -3899,11 +3899,11 @@
 TestISO_2022_KR() {
     /* test input */
     static const uint16_t in[]={
-        0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F66,0x9F67,0x9F6A,0x000A,0x000D
-        ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC02,0xAC04
+        0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F67,0x9F6A,0x000A,0x000D
+        ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC04
         ,0xAC07,0xAC08,0xAC09,0x0025,0x0026,0x0027,0x000A,0x000D,0x0028,0x0029
         ,0x002A,0x002B,0x002C,0x002D,0x002E,0x53C3,0x53C8,0x53C9,0x53CA,0x53CB
-        ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53DF,0x53E1,0x53E2
+        ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53E1,0x53E2
         ,0x53E3,0x53E4,0x000A,0x000D};
     const UChar* uSource;
     const UChar* uSourceLimit;
diff -ru icu.6175/source/test/testdata/conversion.txt icu/source/test/testdata/conversion.txt
--- icu.6175/source/test/testdata/conversion.txt 2009-06-02 15:47:18.000000000 +0100
+++ icu/source/test/testdata/conversion.txt 2009-06-02 15:57:41.000000000 +0100
@@ -48,12 +48,144 @@
     toUnicode {
       Headers { "charset", "bytes", "unicode", "offsets", "flush", "fallbacks", "errorCode", "callback", "invalidChars" }
       Cases {
+        // Test ticket 5691: consistent illegal sequences
+        // The following test cases are for illegal character byte sequences.
+        //
+        // Unfortunately, we cannot use the Shift-JIS examples from the ticket
+        // comments because our Shift-JIS table is Windows-compatible and
+        // therefore has no illegal single bytes. Same for GBK.
+        // Instead, we use the stricter GB 18030 also for 2-byte examples.
+        // The byte sequences are generally slightly different from the ticket
+        // comment, simply using assigned characters rather than just
+        // theoretically valid sequences.
+        {
+          "gb18030",
+          :bin{ 618140813c81ff7a },
+          "a\u4e02\\x81<\\x81\\xFFz",
+          :intvector{ 0,1,3,3,3,3,4,5,5,5,5,5,5,5,5,7 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "EUC-JP",
+          :bin{ 618fb0a98fb03c8f3cb0a97a },
+          "a\u4e28\\x8F\\xB0<\\x8F<\u9022z",
+          :intvector{ 0,1,4,4,4,4,5,5,5,5,6,7,7,7,7,8,9,11 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "gb18030",
+          :bin{ 618130fc318130fc8181303c3e813cfc817a },
+          "a5ed\\x810\u9f07\\x810<>\\x81<\u9f07z",
+          :intvector{ 0,1,5,5,5,5,6,7,9,9,9,9,10,11,12,13,13,13,13,14,15,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "UTF-8",
+          :bin{ 61f1808182f180813cf18081fff180ff3cf1ff3c3e7a },
+          "a\U00040042\\xF1\\x80\\x81<\\xF1\\x80\\x81\\xFF\\xF1\\x80\\xFF<\\xF1\\xFF<>z",
+          :intvector{ 0,1,1,5,5,5,5,5,5,5,5,5,5,5,5,8,9,9,9,9,9,9,9,9,9,9,9,9,12,12,12,12,13,13,13,13,13,13,13,13,15,15,15,15,16,17,17,17,17,18,18,18,18,19,20,21 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-JP",
+          :bin{ 1b24424141af4142affe41431b2842 },
+          "\u758f\\xAF\u758e\\xAF\\xFE\u790e",
+          :intvector{ 3,5,5,5,5,6,8,8,8,8,8,8,8,8,10 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ibm-25546",
+          :bin{ 411b242943420e4141af4142affe41430f5a },
+          "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-KR",
+          :bin{ 411b242943420e4141af4142affe41430f5a },
+          "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-CN",
+          :bin{ 411b242941420e4141af4142affe41430f5a },
+          "AB\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z",
+          :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "HZ",
+          :bin{ 417e7b4141af4142affe41437e7d5a },
+          "A\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z",
+          :intvector{ 0,3,5,5,5,5,6,8,8,8,8,8,8,8,8,10,14 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        // Test ticket 5691: consistent illegal sequences
+        // The following test cases are for illegal escape/designator/shift sequences.
+        //
+        // ISO-2022-JP and -CN with illegal escape sequences.
+        {
+          "ISO-2022-JP",
+          :bin{ 611b24201b244241411b283f1b28427a },
+          "a\\x1B$ \u758f\\x1B\u2538z",
+          :intvector{ 0,1,1,1,1,2,3,7,9,9,9,9,10,15 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        {
+          "ISO-2022-CN",
+          :bin{ 611b2429201b2429410e41410f7a },
+          "a\\x1B$) \u4eaez",
+          :intvector{ 0,1,1,1,1,2,3,4,10,13 },
+          :int{1}, :int{0}, "", "&C", :bin{""}
+        }
+        // Test ticket 5691: ISO-2022-JP-2 with illegal single-shift SS2 and SS3 sequences.
+        // The first ESC N comes before its designator sequence, the last sequence is ESC+space.
+ { + "ISO-2022-JP-2", + :bin{ 4e1b4e4e1b2e414e1b4e4e4e1b204e }, + "N\\x1BNNN\xceN\\x1B N", + :intvector{ 0,1,1,1,1,2,3,7,10,11,12,12,12,12,13,14 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "ISO-2022-CN-EXT", + :bin{ 4e1b4e4e1b242a484e1b4e4e4e4e1b204e }, + "N\\x1BNNN\u8f0eN\\x1B N", + :intvector{ 0,1,1,1,1,2,3,8,11,13,14,14,14,14,15,16 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "ISO-2022-CN-EXT", + :bin{ 4f1b4f4f1b242b494f1b4f4f4f4f1b204f }, + "O\\x1BOOO\u492bO\\x1B O", + :intvector{ 0,1,1,1,1,2,3,8,11,13,14,14,14,14,15,16 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + // Test ticket 5691: HZ with illegal tilde sequences. + { + "HZ", + :bin{ 417e20427e21437e80447e7b41417e207e41427e7f41437e7d5a }, + "A\\x7E B\\x7E!C\\x7E\\x80D\u4eae\\x7E\\x20\\x7E\u8c05\\x7E\\x7F\u64a9Z", + :intvector{ 0,1,1,1,1,2,3,4,4,4,4,5,6,7,7,7,7,7,7,7,7,9, // SBCS + 12,14,14,14,14,14,14,14,14,16,16,16,16,17,19,19,19,19,19,19,19,19,21, // DBCS + 25 }, // SBCS + :int{1}, :int{0}, "", "&C", :bin{""} + } + // Test ticket 5691: Example from Peter Edberg. 
+ { + "ISO-2022-JP", + :bin{ 1b244230212f7e742630801b284a621b2458631b2842648061 }, + "\u4e9c\ufffd\u7199\ufffdb\ufffd$Xcd\ufffda", + :intvector{ 3,5,7,9,14,15,16,17,18,22,23,24 }, + :int{1}, :int{0}, "", "?", :bin{""} + } // test that HZ limits its byte values to lead bytes 21..7d and trail bytes 21..7e { "HZ", - :bin{ 7e7b21212120217e217f772100007e217e7d207e7e807e0a2b }, - "\u3000\ufffd\u3013\ufffd\u9ccc\ufffd\ufffd ~\ufffd+", - :intvector{ 2,4,6,8,10,12,14,18,19,21,24 }, + :bin{ 7e7b21212120217e217f772100007e217e7e7d207e7e807e0a2b }, + "\u3000\ufffd\u3013\ufffd\u9ccc\ufffd\ufffd\u3013 ~\ufffd+", + :intvector{ 2,4,6,8,10,12,14,15,19,20,22,25 }, :int{1}, :int{1}, "", "?", :bin{""} } // improve coverage of ISO-2022-JP converter with hardcoded JIS X 0201 and @@ -61,8 +193,8 @@ { "ISO-2022-JP", :bin{ 1b284a7d7e801b2442306c20217f7e21202160217f22202225227f5f211b2842 }, - "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\u25b2\ufffd\u6f3e", - :intvector{ 3,4,5,9,11,13,15,17,19,21,23,25,27 }, + "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\u25b2\ufffd\u6f3e", + :intvector{ 3,4,5,9,11,12,14,16,17,19,21,23,25,27 }, :int{1}, :int{1}, "", "?", :bin{""} } // improve coverage of unrolled loops in ucnvmbcs.c/ucnv_MBCSSingleToBMPWithOffsets() @@ -341,7 +473,7 @@ { "ISO-2022-CN-EXT", :bin{ 411b4e2121 }, "\x41", :intvector{ 0 }, - :int{1}, :int{1}, "illesc", ".", :bin{ 1b4e } + :int{1}, :int{1}, "illesc", ".", :bin{ 1b } } // G3 designator: recognized, but not supported for -CN (only for -CN-EXT) {
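A note on reading the expected strings in the test cases above. The cases that use the "&C" callback appear to render each illegal byte as a four-character \xNN escape, and every character of the escape carries the offset of the first byte of the illegal sequence. For example, bytes 81 ff at offset 5 in the first gb18030 case become "\x81\xFF" with eight offsets of 5. Below is a minimal sketch of that bookkeeping in plain C. The function name escape_illegal_bytes is hypothetical and is not part of ICU; it only mimics the output format seen in conversion.txt.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not an ICU function): write each byte of an illegal
 * sequence as a 4-character "\xNN" escape into out, and record the same
 * source offset for every output character.
 * out must hold 4*n+1 chars; offsets must hold 4*n entries.
 * Returns the number of characters written (excluding the terminating NUL).
 */
static size_t
escape_illegal_bytes(const uint8_t *bytes, size_t n, int32_t offset,
                     char *out, int32_t *offsets) {
    static const char hex[] = "0123456789ABCDEF";
    size_t i, j, k = 0;
    for (i = 0; i < n; ++i) {
        out[k]     = '\\';
        out[k + 1] = 'x';
        out[k + 2] = hex[bytes[i] >> 4];
        out[k + 3] = hex[bytes[i] & 0x0f];
        for (j = 0; j < 4; ++j) {
            offsets[k + j] = offset;  /* all 4 chars point at the sequence start */
        }
        k += 4;
    }
    out[k] = '\0';
    return k;
}
```

This explains why the :intvector entries repeat in groups of four (or eight, for a two-byte illegal sequence) wherever the expected string contains an escape.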
Information forwarded to debian-bugs-dist@lists.debian.org: Bug#534590; Package icu. (Tue, 25 Aug 2009 20:30:08 GMT)
Acknowledgement sent to Jay Berkenbilt <qjb@debian.org>: Extra info received and forwarded to list. (Tue, 25 Aug 2009 20:30:08 GMT)
Message #20 received at 534590@bugs.debian.org:
(Trying one more time to post properly to the bug report; the original message was also copied to team@security.debian.org.)

The reporter of this ICU security bug, which impacts oldstable and stable but not testing or unstable, was kind enough to refer to the Red Hat Bugzilla entry for it. The people at Red Hat have backported the security fix to the 3.6 (our oldstable) and 3.8 (our stable) versions of ICU (which appear in RHEL5 and Fedora 9). I have grabbed their SRPMs for the patched versions and extracted the patches that apply to the 3.6 and 3.8 versions. Attached here are the patches directly from those source RPMs, not modified in any way or tested for Debian.

I can integrate these into the Debian packages and prepare uploads for stable security and oldstable security, or I can defer to the security team to do the integration. Just let me know. It may be several days before I have a chance to work on it, but I have prepared stable security uploads for my packages before.

I am grateful to Red Hat for doing the work of backporting to the older ICU versions.

-- 
Jay Berkenbilt <qjb@debian.org>
[icu-3.8-CVE-2009-0153.patch (text/x-patch, inline)]
diff -ru icu.6175/source/common/ucnv2022.c icu/source/common/ucnv2022.c --- icu.6175/source/common/ucnv2022.c 2009-06-11 13:44:44.000000000 +0100 +++ icu/source/common/ucnv2022.c 2009-06-11 18:09:48.000000000 +0100 @@ -1973,6 +1973,7 @@ mySourceChar = args->converter->toUBytes[0]; args->converter->toULength = 0; cs = (StateEnum)pToU2022State->cs[pToU2022State->g]; + targetUniChar = missingCharMarker; goto getTrailByte; } @@ -2102,18 +2103,45 @@ default: /* G0 DBCS */ if(mySource < mySourceLimit) { + int leadIsOk, trailIsOk; char trailByte; getTrailByte: - trailByte = *mySource++; - if(cs == JISX208) { - _2022ToSJIS((uint8_t)mySourceChar, (uint8_t)trailByte, tempBuf); - } else { - tempBuf[0] = (char)mySourceChar; - tempBuf[1] = trailByte; - } - mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); - targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE); - } else { + trailByte = *mySource; + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. + * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + * + * In ISO-2022 DBCS, if both bytes are valid or both bytes are outside + * the 21..7e range, then we treat them as a pair. + * Otherwise (valid lead byte + illegal trail byte, or vice versa) + * we report only the first byte as the illegal sequence. + */ + leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21); + trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21); + if (leadIsOk == trailIsOk) { + ++mySource; + uint32_t tmpSourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); + if (leadIsOk) { + if(cs == JISX208) { + _2022ToSJIS((uint8_t)mySourceChar, (uint8_t)trailByte, tempBuf); + mySourceChar = tmpSourceChar; + } else { + /* Copy before we modify tmpSourceChar so toUnicodeCallback() sees the correct bytes. 
*/ + mySourceChar = tmpSourceChar; + if (cs == KSC5601) { + tmpSourceChar += 0x8080; /* = _2022ToGR94DBCS(tmpSourceChar) */ + } + tempBuf[0] = (char)(tmpSourceChar >> 8); + tempBuf[1] = (char)(tmpSourceChar); + } + targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE); + } else { + mySourceChar = tmpSourceChar; + } + } + } else { args->converter->toUBytes[0] = (uint8_t)mySourceChar; args->converter->toULength = 1; goto endloop; @@ -2254,7 +2282,12 @@ } /* only DBCS or SBCS characters are expected*/ /* DB characters with high bit set to 1 are expected */ - if(length > 2 || length==0 ||(((targetByteUnit & 0x8080) != 0x8080)&& length==2)){ + if( length > 2 || length==0 || + (length == 1 && targetByteUnit > 0x7f) || + (length == 2 && + ((uint16_t)(targetByteUnit - 0xa1a1) > (0xfefe - 0xa1a1) || + (uint8_t)(targetByteUnit - 0xa1) > (0xfe - 0xa1))) + ) { targetByteUnit=missingCharMarker; } if (targetByteUnit != missingCharMarker){ @@ -2583,17 +2616,36 @@ myData->isEmptySegment = FALSE; /* Any invalid char errors will be detected separately, so just reset this */ if(myData->toU2022State.g == 1) { if(mySource < mySourceLimit) { + int leadIsOk, trailIsOk; char trailByte; getTrailByte: - trailByte = *mySource++; - tempBuf[0] = (char)(mySourceChar + 0x80); - tempBuf[1] = (char)(trailByte + 0x80); - mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); - if((mySourceChar & 0x8080) == 0) { - targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, tempBuf, 2, useFallback); + targetUniChar = missingCharMarker; + trailByte = *mySource; + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. + * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + * + * In ISO-2022 DBCS, if both bytes are valid or both bytes are outside + * the 21..7e range, then we treat them as a pair. 
+ * Otherwise (valid lead byte + illegal trail byte, or vice versa) + * we report only the first byte as the illegal sequence. + */ + leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21); + trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21); + if (leadIsOk == trailIsOk) { + ++mySource; + if (leadIsOk) { + tempBuf[0] = (char)(mySourceChar + 0x80); + tempBuf[1] = (char)(trailByte + 0x80); + targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, tempBuf, 2, useFallback); + } else { + leadIsOk = TRUE; /* TODO: remove */ + } + mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); } else { - /* illegal bytes > 0x7f */ - targetUniChar = missingCharMarker; + trailIsOk = TRUE; /* TODO: remove */ } } else { args->converter->toUBytes[0] = (uint8_t)mySourceChar; @@ -2601,8 +2653,10 @@ break; } } - else{ + else if(mySourceChar <= 0x7f) { targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, mySource - 1, 1, useFallback); + } else { + targetUniChar = 0xffff; } if(targetUniChar < 0xfffe){ if(args->offsets) { @@ -3099,6 +3153,7 @@ /* continue with a partial double-byte character */ mySourceChar = args->converter->toUBytes[0]; args->converter->toULength = 0; + targetUniChar = missingCharMarker; goto getTrailByte; } @@ -3178,29 +3233,48 @@ UConverterSharedData *cnv; StateEnum tempState; int32_t tempBufLen; + int leadIsOk, trailIsOk; char trailByte; getTrailByte: - trailByte = *mySource++; - tempState = (StateEnum)pToU2022State->cs[pToU2022State->g]; - if(tempState > CNS_11643_0) { - cnv = myData->myConverterArray[CNS_11643]; - tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0)); - tempBuf[1] = (char) (mySourceChar); - tempBuf[2] = trailByte; - tempBufLen = 3; - - }else{ - cnv = myData->myConverterArray[tempState]; - tempBuf[0] = (char) (mySourceChar); - tempBuf[1] = trailByte; - tempBufLen = 2; + trailByte = *mySource; + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. 
+ * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + * + * In ISO-2022 DBCS, if both bytes are valid or both bytes are outside + * the 21..7e range, then we treat them as a pair. + * Otherwise (valid lead byte + illegal trail byte, or vice versa) + * we report only the first byte as the illegal sequence. + */ + leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21); + trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21); + if (leadIsOk == trailIsOk) { + ++mySource; + if (leadIsOk) { + tempState = (StateEnum)pToU2022State->cs[pToU2022State->g]; + if(tempState >= CNS_11643_0) { + cnv = myData->myConverterArray[CNS_11643]; + tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0)); + tempBuf[1] = (char) (mySourceChar); + tempBuf[2] = trailByte; + tempBufLen = 3; + + }else{ + cnv = myData->myConverterArray[tempState]; + tempBuf[0] = (char) (mySourceChar); + tempBuf[1] = trailByte; + tempBufLen = 2; + } + targetUniChar = ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE); + } + mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); } - mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); if(pToU2022State->g>=2) { /* return from a single-shift state to the previous one */ pToU2022State->g=pToU2022State->prevG; } - targetUniChar = ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE); } else { args->converter->toUBytes[0] = (uint8_t)mySourceChar; args->converter->toULength = 1; diff -ru icu.6175/source/common/ucnvhz.c icu/source/common/ucnvhz.c --- icu.6175/source/common/ucnvhz.c 2009-06-11 13:44:44.000000000 +0100 +++ icu/source/common/ucnvhz.c 2009-06-11 18:05:36.000000000 +0100 @@ -215,19 +215,35 @@ } else{ /* trail byte */ + int leadIsOk, trailIsOk; uint32_t leadByte = args->converter->toUnicodeStatus & 0xff; - if( (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21) && - (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21) - ) { - tempBuf[0] = (char) (leadByte+0x80) ; 
- tempBuf[1] = (char) (mySourceChar+0x80); - targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->gbConverter->sharedData, - tempBuf, 2, args->converter->useFallback); + targetUniChar = 0xffff; + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. + * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + * + * In HZ DBCS, if both bytes are valid or both bytes are outside + * the 21..7d/7e range, then we treat them as a pair. + * Otherwise (valid lead byte + illegal trail byte, or vice versa) + * we report only the first byte as the illegal sequence. + */ + leadIsOk = (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21); + trailIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21); + if (leadIsOk == trailIsOk) { + if (leadIsOk) { + tempBuf[0] = (char) (leadByte+0x80) ; + tempBuf[1] = (char) (mySourceChar+0x80); + targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->gbConverter->sharedData, + tempBuf, 2, args->converter->useFallback); + } + /* add another bit so that the code below writes 2 bytes in case of error */ + mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar; } else { - targetUniChar = 0xffff; + --mySource; + mySourceChar = (int32_t)leadByte; } - /* add another bit so that the code below writes 2 bytes in case of error */ - mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar; args->converter->toUnicodeStatus =0x00; } } diff -ru icu.6175/source/common/ucnvmbcs.c icu/source/common/ucnvmbcs.c --- icu.6175/source/common/ucnvmbcs.c 2009-06-11 13:44:44.000000000 +0100 +++ icu/source/common/ucnvmbcs.c 2009-06-11 18:05:36.000000000 +0100 @@ -1,7 +1,7 @@ /* ****************************************************************************** * -* Copyright (C) 2000-2007, International Business Machines +* Copyright (C) 2000-2008, International Business Machines * Corporation and others. All Rights Reserved. 
* ****************************************************************************** @@ -1791,6 +1791,65 @@ pArgs->offsets=offsets; } +static UBool +hasValidTrailBytes(const int32_t (*stateTable)[256], uint8_t state) { + const int32_t *row=stateTable[state]; + int32_t b, entry; + /* First test for final entries in this state for some commonly valid byte values. */ + entry=row[0xa1]; + if( !MBCS_ENTRY_IS_TRANSITION(entry) && + MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL + ) { + return TRUE; + } + entry=row[0x41]; + if( !MBCS_ENTRY_IS_TRANSITION(entry) && + MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL + ) { + return TRUE; + } + /* Then test for final entries in this state. */ + for(b=0; b<=0xff; ++b) { + entry=row[b]; + if( !MBCS_ENTRY_IS_TRANSITION(entry) && + MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL + ) { + return TRUE; + } + } + /* Then recurse for transition entries. */ + for(b=0; b<=0xff; ++b) { + entry=row[b]; + if( MBCS_ENTRY_IS_TRANSITION(entry) && + hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry)) + ) { + return TRUE; + } + } + return FALSE; +} + +/* + * Is byte b a single/lead byte in this state? + * Recurse for transition states, because here we don't want to say that + * b is a lead byte if all byte sequences that start with b are illegal. 
+ */ +static UBool +isSingleOrLead(const int32_t (*stateTable)[256], uint8_t state, UBool isDBCSOnly, uint8_t b) { + const int32_t *row=stateTable[state]; + int32_t entry=row[b]; + if(MBCS_ENTRY_IS_TRANSITION(entry)) { /* lead byte */ + return hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry)); + } else { + uint8_t action=(uint8_t)(MBCS_ENTRY_FINAL_ACTION(entry)); + if(action==MBCS_STATE_CHANGE_ONLY && isDBCSOnly) { + return FALSE; /* SI/SO are illegal for DBCS-only conversion */ + } else { + return action!=MBCS_STATE_ILLEGAL; + } + } +} + U_CFUNC void ucnv_MBCSToUnicodeWithOffsets(UConverterToUnicodeArgs *pArgs, UErrorCode *pErrorCode) { @@ -2146,6 +2205,34 @@ sourceIndex=nextSourceIndex; } else if(U_FAILURE(*pErrorCode)) { /* callback(illegal) */ + if(byteIndex>1) { + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. + * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + */ + UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0); + int8_t i; + for(i=1; + i<byteIndex && !isSingleOrLead(stateTable, state, isDBCSOnly, bytes[i]); + ++i) {} + if(i<byteIndex) { + /* Back out some bytes. */ + int8_t backOutDistance=byteIndex-i; + int32_t bytesFromThisBuffer=(int32_t)(source-(const uint8_t *)pArgs->source); + byteIndex=i; /* length of reported illegal byte sequence */ + if(backOutDistance<=bytesFromThisBuffer) { + source-=backOutDistance; + } else { + /* Back out bytes from the previous buffer: Need to replay them. */ + cnv->preToULength=(int8_t)(bytesFromThisBuffer-backOutDistance); + /* preToULength is negative! 
*/ + uprv_memcpy(cnv->preToU, bytes+i, -cnv->preToULength); + source=(const uint8_t *)pArgs->source; + } + } + } break; } else /* unassigned sequences indicated with byteIndex>0 */ { /* try an extension mapping */ @@ -2156,6 +2243,7 @@ &offsets, sourceIndex, pArgs->flush, pErrorCode); + /* TODO: nextSourceIndex+=diff instead of nextSourceIndex+diff ?? */ sourceIndex=nextSourceIndex+(int32_t)(source-(const uint8_t *)pArgs->source); if(U_FAILURE(*pErrorCode)) { @@ -2447,15 +2535,37 @@ if(c<0) { if(U_SUCCESS(*pErrorCode) && source==sourceLimit && lastSource<source) { - *pErrorCode=U_TRUNCATED_CHAR_FOUND; - } - if(U_FAILURE(*pErrorCode)) { /* incomplete character byte sequence */ uint8_t *bytes=cnv->toUBytes; cnv->toULength=(int8_t)(source-lastSource); do { *bytes++=*lastSource++; } while(lastSource<source); + *pErrorCode=U_TRUNCATED_CHAR_FOUND; + } else if(U_FAILURE(*pErrorCode)) { + /* callback(illegal) */ + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. + * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. 
+ */ + UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0); + uint8_t *bytes=cnv->toUBytes; + *bytes++=*lastSource++; /* first byte */ + if(lastSource==source) { + cnv->toULength=1; + } else /* lastSource<source: multi-byte character */ { + int8_t i; + for(i=1; + lastSource<source && !isSingleOrLead(stateTable, state, isDBCSOnly, *lastSource); + ++i + ) { + *bytes++=*lastSource++; + } + cnv->toULength=i; + source=lastSource; + } } else { /* no output because of empty input or only state changes */ *pErrorCode=U_INDEX_OUTOFBOUNDS_ERROR; diff -ru icu.6175/source/test/cintltst/nccbtst.c icu/source/test/cintltst/nccbtst.c --- icu.6175/source/test/cintltst/nccbtst.c 2009-06-11 13:44:44.000000000 +0100 +++ icu/source/test/cintltst/nccbtst.c 2009-06-11 18:05:36.000000000 +0100 @@ -2497,13 +2497,13 @@ static const uint8_t text943[] = { - 0x82, 0xa9, 0x82, 0x20, /*0xc8,*/ 0x61, 0x8a, 0xbf, 0x8e, 0x9a }; - static const UChar toUnicode943sub[] = { 0x304b, 0xfffd, /*0xff88,*/ 0x0061, 0x6f22, 0x5b57}; - static const UChar toUnicode943skip[]= { 0x304b, /*0xff88,*/ 0x0061, 0x6f22, 0x5b57}; + 0x82, 0xa9, 0x82, 0x20, 0x61, 0x8a, 0xbf, 0x8e, 0x9a }; + static const UChar toUnicode943sub[] = { 0x304b, 0x1a, 0x20, 0x0061, 0x6f22, 0x5b57 }; + static const UChar toUnicode943skip[]= { 0x304b, 0x20, 0x0061, 0x6f22, 0x5b57 }; static const UChar toUnicode943stop[]= { 0x304b}; - static const int32_t fromIBM943Offssub[] = {0, 2, 4, 5, 7}; - static const int32_t fromIBM943Offsskip[] = { 0, 4, 5, 7}; + static const int32_t fromIBM943Offssub[] = { 0, 2, 3, 4, 5, 7 }; + static const int32_t fromIBM943Offsskip[] = { 0, 3, 4, 5, 7 }; static const int32_t fromIBM943Offsstop[] = { 0}; gInBufferSize = inputsize; @@ -2537,9 +2537,9 @@ { static const uint8_t sampleText[] = { 0x82, 0xa9, 0x61, 0x62, 0x63 , 0x82, - 0xff, /*0x82, 0xa9,*/ 0x32, 0x33}; - static const UChar toUnicode943sub[] = {0x304b, 0x0061, 0x0062, 0x0063, 0xfffd,/*0x304b,*/ 0x0032, 0x0033}; - static const int32_t 
fromIBM943Offssub[] = {0, 2, 3, 4, 5, 7, 8}; + 0xff, 0x32, 0x33}; + static const UChar toUnicode943sub[] = { 0x304b, 0x0061, 0x0062, 0x0063, 0x1a, 0x1a, 0x0032, 0x0033 }; + static const int32_t fromIBM943Offssub[] = { 0, 2, 3, 4, 5, 6, 7, 8 }; /*checking illegal value for ibm-943 with substitute*/ gInBufferSize = inputsize; gOutBufferSize = outputsize; diff -ru icu.6175/source/test/cintltst/nucnvtst.c icu/source/test/cintltst/nucnvtst.c --- icu.6175/source/test/cintltst/nucnvtst.c 2009-06-11 13:44:44.000000000 +0100 +++ icu/source/test/cintltst/nucnvtst.c 2009-06-11 18:05:36.000000000 +0100 @@ -2608,7 +2608,7 @@ TestNextUCharError(cnv, source, source, U_INDEX_OUTOFBOUNDS_ERROR, "sourceLimit <= source"); /*Test for the condition where there is an invalid character*/ { - static const uint8_t source2[]={0xa1, 0x01}; + static const uint8_t source2[]={0xa1, 0x80}; TestNextUCharError(cnv, (const char*)source2, (const char*)source2+sizeof(source2), U_ZERO_ERROR, "an invalid character"); } /*Test for the condition where we have a truncated char*/ @@ -3901,11 +3901,11 @@ TestISO_2022_KR() { /* test input */ static const uint16_t in[]={ - 0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F66,0x9F67,0x9F6A,0x000A,0x000D - ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC02,0xAC04 + 0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F67,0x9F6A,0x000A,0x000D + ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC04 ,0xAC07,0xAC08,0xAC09,0x0025,0x0026,0x0027,0x000A,0x000D,0x0028,0x0029 ,0x002A,0x002B,0x002C,0x002D,0x002E,0x53C3,0x53C8,0x53C9,0x53CA,0x53CB - ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53DF,0x53E1,0x53E2 + ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53E1,0x53E2 ,0x53E3,0x53E4,0x000A,0x000D}; const UChar* uSource; const UChar* uSourceLimit; diff -ru icu.6175/source/test/testdata/conversion.txt icu/source/test/testdata/conversion.txt --- icu.6175/source/test/testdata/conversion.txt 2009-06-11 13:44:44.000000000 +0100 +++ 
icu/source/test/testdata/conversion.txt 2009-06-11 18:05:36.000000000 +0100 @@ -48,12 +48,83 @@ toUnicode { Headers { "charset", "bytes", "unicode", "offsets", "flush", "fallbacks", "errorCode", "callback", "invalidChars" } Cases { + // Test ticket 5691: consistent illegal sequences + // Unfortunately, we cannot use the Shift-JIS examples from the ticket + // comments because our Shift-JIS table is Windows-compatible and + // therefore has no illegal single bytes. Same for GBK. + // Instead, we use the stricter GB 18030 also for 2-byte examples. + // The byte sequences are generally slightly different from the ticket + // comment, simply using assigned characters rather than just + // theoretically valid sequences. + { + "gb18030", + :bin{ 618140813c81ff7a }, + "a\u4e02\\x81<\\x81\\xFFz", + :intvector{ 0,1,3,3,3,3,4,5,5,5,5,5,5,5,5,7 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "EUC-JP", + :bin{ 618fb0a98fb03c8f3cb0a97a }, + "a\u4e28\\x8F\\xB0<\\x8F<\u9022z", + :intvector{ 0,1,4,4,4,4,5,5,5,5,6,7,7,7,7,8,9,11 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "gb18030", + :bin{ 618130fc318130fc8181303c3e813cfc817a }, + "a5ed\\x810\u9f07\\x810<>\\x81<\u9f07z", + :intvector{ 0,1,5,5,5,5,6,7,9,9,9,9,10,11,12,13,13,13,13,14,15,17 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "UTF-8", + :bin{ 61f1808182f180813cf18081fff180ff3cf1ff3c3e7a }, + "a\U00040042\\xF1\\x80\\x81<\\xF1\\x80\\x81\\xFF\\xF1\\x80\\xFF<\\xF1\\xFF<>z", + :intvector{ 0,1,1,5,5,5,5,5,5,5,5,5,5,5,5,8,9,9,9,9,9,9,9,9,9,9,9,9,12,12,12,12,13,13,13,13,13,13,13,13,15,15,15,15,16,17,17,17,17,18,18,18,18,19,20,21 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "ISO-2022-JP-2", + :bin{ 1b24424141af4142affe41431b2842 }, + "\u758f\\xAF\u758e\\xAF\\xFE\u790e", + :intvector{ 3,5,5,5,5,6,8,8,8,8,8,8,8,8,10 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "ibm-25546", + :bin{ 411b242943420e4141af4142affe41430f5a }, + "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ", + :intvector{ 
0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "ISO-2022-KR", + :bin{ 411b242943420e4141af4142affe41430f5a }, + "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ", + :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "ISO-2022-CN", + :bin{ 411b242941420e4141af4142affe41430f5a }, + "AB\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z", + :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "HZ", + :bin{ 417e7b4141af4142affe41437e7d5a }, + "A\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z", + :intvector{ 0,3,5,5,5,5,6,8,8,8,8,8,8,8,8,10,14 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } // test that HZ limits its byte values to lead bytes 21..7d and trail bytes 21..7e { "HZ", :bin{ 7e7b21212120217e217f772100007e217e7d207e7e807e0a2b }, - "\u3000\ufffd\u3013\ufffd\u9ccc\ufffd\ufffd ~\ufffd+", - :intvector{ 2,4,6,8,10,12,14,18,19,21,24 }, + "\u3000\ufffd\ufffd\u3013\ufffd\ufffd\u9ccc\ufffd\ufffd ~\ufffd+", + :intvector{ 2,4,5,6,8,9,10,12,14,18,19,21,24 }, :int{1}, :int{1}, "", "?", :bin{""} } // improve coverage of ISO-2022-JP converter with hardcoded JIS X 0201 and @@ -61,8 +132,8 @@ { "ISO-2022-JP", :bin{ 1b284a7d7e801b2442306c20217f7e21202160217f22202225227f5f211b2842 }, - "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\u25b2\ufffd\u6f3e", - :intvector{ 3,4,5,9,11,13,15,17,19,21,23,25,27 }, + "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\ufffd\ufffd\u25b2\ufffd\ufffd\u6f3e", + :intvector{ 3,4,5,9,11,12,13,14,16,17,19,20,21,22,23,25,26,27 }, :int{1}, :int{1}, "", "?", :bin{""} } // improve coverage of unrolled loops in ucnvmbcs.c/ucnv_MBCSSingleToBMPWithOffsets()
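The ISO-2022 DBCS rule that the 3.8 patch above adds in ucnv2022.c (leadIsOk == trailIsOk) can be isolated as a small sketch. This is an illustration only, under the assumption that both bytes are already available; the helper names in_dbcs_range and dbcs_unit_length are hypothetical and do not appear in ICU. It covers only the 3.8 variant of the rule, and the real converter also has to handle buffer boundaries, shift states and the actual table lookup.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if b lies in the ISO-2022 DBCS range 0x21..0x7e.
 * The unsigned subtraction wraps for b < 0x21, so one comparison suffices. */
static int
in_dbcs_range(uint8_t b) {
    return (uint8_t)(b - 0x21) <= (0x7e - 0x21);
}

/*
 * Hypothetical helper: how many bytes form one unit (a character or an
 * illegal sequence) when a DBCS lead byte is followed by trail.
 *  - both bytes in range:     2 (look the pair up in the table)
 *  - both bytes out of range: 2 (report the pair as illegal)
 *  - exactly one in range:    1 (report only the first byte as illegal,
 *                                so the second can start a new character)
 */
static int
dbcs_unit_length(uint8_t lead, uint8_t trail) {
    int leadIsOk = in_dbcs_range(lead);
    int trailIsOk = in_dbcs_range(trail);
    return (leadIsOk == trailIsOk) ? 2 : 1;
}
```

Applied to the DBCS bytes of the ISO-2022-JP-2 test case (41 41 af 41 42 af fe 41 43), this gives units of 2, 1, 2, 2 and 2 bytes, which lines up with the expected string: a character, "\xAF", a character, "\xAF\xFE", and a character.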
[icu-3.6-CVE-2009-0153.patch (text/x-patch, inline)]
diff -ru icu.6175/source/common/ucnv2022.c icu/source/common/ucnv2022.c --- icu.6175/source/common/ucnv2022.c 2009-06-02 15:47:31.000000000 +0100 +++ icu/source/common/ucnv2022.c 2009-06-02 16:03:15.000000000 +0100 @@ -754,6 +754,7 @@ UConverterDataISO2022* myData2022 = ((UConverterDataISO2022*)_this->extraInfo); uint32_t key = myData2022->key; int32_t offset = 0; + int8_t initialToULength = _this->toULength; char c; value = VALID_NON_TERMINAL_2022; @@ -806,7 +807,6 @@ return; } else if (value == INVALID_2022 ) { *err = U_ILLEGAL_ESCAPE_SEQUENCE; - return; } else /* value == VALID_TERMINAL_2022 */ { switch(var){ #ifdef U_ENABLE_GENERIC_ISO_2022 @@ -938,6 +938,35 @@ } if(U_SUCCESS(*err)) { _this->toULength = 0; + } else if(*err==U_ILLEGAL_ESCAPE_SEQUENCE) { + if(_this->toULength>1) { + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte (ESC) in the illegal sequence. + * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + * In escape sequences, all following bytes are "printable", that is, + * unless they are completely illegal (>7f in SBCS, outside 21..7e in DBCS), + * they are valid single/lead bytes. + * For simplicity, we always only report the initial ESC byte as the + * illegal sequence and back out all other bytes we looked at. + */ + /* Back out some bytes. */ + int8_t backOutDistance=_this->toULength-1; + int8_t bytesFromThisBuffer=_this->toULength-initialToULength; + if(backOutDistance<=bytesFromThisBuffer) { + /* same as initialToULength<=1 */ + *source-=backOutDistance; + } else { + /* Back out bytes from the previous buffer: Need to replay them. */ + _this->preToULength=(int8_t)(bytesFromThisBuffer-backOutDistance); + /* same as -(initialToULength-1) */ + /* preToULength is negative! 
*/ + uprv_memcpy(_this->preToU, _this->toUBytes+1, -_this->preToULength); + *source-=bytesFromThisBuffer; + } + _this->toULength=1; + } } else if(*err==U_UNSUPPORTED_ESCAPE_SEQUENCE) { _this->toUCallbackReason = UCNV_UNASSIGNED; } @@ -1973,6 +2002,7 @@ mySourceChar = args->converter->toUBytes[0]; args->converter->toULength = 0; cs = (StateEnum)pToU2022State->cs[pToU2022State->g]; + targetUniChar = missingCharMarker; goto getTrailByte; } @@ -2102,17 +2132,44 @@ default: /* G0 DBCS */ if(mySource < mySourceLimit) { - char trailByte; + int leadIsOk, trailIsOk; + uint8_t trailByte; getTrailByte: - trailByte = *mySource++; - if(cs == JISX208) { - _2022ToSJIS((uint8_t)mySourceChar, (uint8_t)trailByte, tempBuf); - } else { - tempBuf[0] = (char)mySourceChar; - tempBuf[1] = trailByte; + trailByte = (uint8_t)*mySource; + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. + * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + * + * In ISO-2022 DBCS, if the second byte is in the 21..7e range or is + * an ESC/SO/SI, we report only the first byte as the illegal sequence. + * Otherwise we convert or report the pair of bytes. + */ + leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21); + trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21); + if (leadIsOk && trailIsOk) { + ++mySource; + uint32_t tmpSourceChar = (mySourceChar << 8) | trailByte; + if(cs == JISX208) { + _2022ToSJIS((uint8_t)mySourceChar, trailByte, tempBuf); + mySourceChar = tmpSourceChar; + } else { + /* Copy before we modify tmpSourceChar so toUnicodeCallback() sees the correct bytes. 
*/ + mySourceChar = tmpSourceChar; + if (cs == KSC5601) { + tmpSourceChar += 0x8080; /* = _2022ToGR94DBCS(tmpSourceChar) */ + } + tempBuf[0] = (char)(tmpSourceChar >> 8); + tempBuf[1] = (char)(tmpSourceChar); + } + targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE); + } else if (!(trailIsOk || IS_2022_CONTROL(trailByte))) { + /* report a pair of illegal bytes if the second byte is not a DBCS starter */ + ++mySource; + /* add another bit so that the code below writes 2 bytes in case of error */ + mySourceChar = 0x10000 | (mySourceChar << 8) | trailByte; } - mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); - targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->myConverterArray[cs], tempBuf, 2, FALSE); } else { args->converter->toUBytes[0] = (uint8_t)mySourceChar; args->converter->toULength = 1; @@ -2254,7 +2311,12 @@ } /* only DBCS or SBCS characters are expected*/ /* DB characters with high bit set to 1 are expected */ - if(length > 2 || length==0 ||(((targetByteUnit & 0x8080) != 0x8080)&& length==2)){ + if( length > 2 || length==0 || + (length == 1 && targetByteUnit > 0x7f) || + (length == 2 && + ((uint16_t)(targetByteUnit - 0xa1a1) > (0xfefe - 0xa1a1) || + (uint8_t)(targetByteUnit - 0xa1) > (0xfe - 0xa1))) + ) { targetByteUnit=missingCharMarker; } if (targetByteUnit != missingCharMarker){ @@ -2583,17 +2645,34 @@ myData->isEmptySegment = FALSE; /* Any invalid char errors will be detected separately, so just reset this */ if(myData->toU2022State.g == 1) { if(mySource < mySourceLimit) { - char trailByte; + int leadIsOk, trailIsOk; + uint8_t trailByte; getTrailByte: - trailByte = *mySource++; - tempBuf[0] = (char)(mySourceChar + 0x80); - tempBuf[1] = (char)(trailByte + 0x80); - mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); - if((mySourceChar & 0x8080) == 0) { + targetUniChar = missingCharMarker; + trailByte = (uint8_t)*mySource; + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least 
the first byte in the illegal sequence. + * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + * + * In ISO-2022 DBCS, if the second byte is in the 21..7e range or is + * an ESC/SO/SI, we report only the first byte as the illegal sequence. + * Otherwise we convert or report the pair of bytes. + */ + leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21); + trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21); + if (leadIsOk && trailIsOk) { + ++mySource; + tempBuf[0] = (char)(mySourceChar + 0x80); + tempBuf[1] = (char)(trailByte + 0x80); targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, tempBuf, 2, useFallback); - } else { - /* illegal bytes > 0x7f */ - targetUniChar = missingCharMarker; + mySourceChar = (mySourceChar << 8) | trailByte; + } else if (!(trailIsOk || IS_2022_CONTROL(trailByte))) { + /* report a pair of illegal bytes if the second byte is not a DBCS starter */ + ++mySource; + /* add another bit so that the code below writes 2 bytes in case of error */ + mySourceChar = 0x10000 | (mySourceChar << 8) | trailByte; } } else { args->converter->toUBytes[0] = (uint8_t)mySourceChar; @@ -2601,8 +2680,10 @@ break; } } - else{ + else if(mySourceChar <= 0x7f) { targetUniChar = ucnv_MBCSSimpleGetNextUChar(sharedData, mySource - 1, 1, useFallback); + } else { + targetUniChar = 0xffff; } if(targetUniChar < 0xfffe){ if(args->offsets) { @@ -3099,6 +3180,7 @@ /* continue with a partial double-byte character */ mySourceChar = args->converter->toUBytes[0]; args->converter->toULength = 0; + targetUniChar = missingCharMarker; goto getTrailByte; } @@ -3178,29 +3260,50 @@ UConverterSharedData *cnv; StateEnum tempState; int32_t tempBufLen; - char trailByte; + int leadIsOk, trailIsOk; + uint8_t trailByte; getTrailByte: - trailByte = *mySource++; - tempState = (StateEnum)pToU2022State->cs[pToU2022State->g]; - if(tempState > CNS_11643_0) { - cnv = myData->myConverterArray[CNS_11643]; - 
tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0)); - tempBuf[1] = (char) (mySourceChar); - tempBuf[2] = trailByte; - tempBufLen = 3; - - }else{ - cnv = myData->myConverterArray[tempState]; - tempBuf[0] = (char) (mySourceChar); - tempBuf[1] = trailByte; - tempBufLen = 2; + trailByte = (uint8_t)*mySource; + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. + * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + * + * In ISO-2022 DBCS, if the second byte is in the 21..7e range or is + * an ESC/SO/SI, we report only the first byte as the illegal sequence. + * Otherwise we convert or report the pair of bytes. + */ + leadIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21); + trailIsOk = (uint8_t)(trailByte - 0x21) <= (0x7e - 0x21); + if (leadIsOk && trailIsOk) { + ++mySource; + tempState = (StateEnum)pToU2022State->cs[pToU2022State->g]; + if(tempState >= CNS_11643_0) { + cnv = myData->myConverterArray[CNS_11643]; + tempBuf[0] = (char) (0x80+(tempState-CNS_11643_0)); + tempBuf[1] = (char) (mySourceChar); + tempBuf[2] = (char) trailByte; + tempBufLen = 3; + + }else{ + cnv = myData->myConverterArray[tempState]; + tempBuf[0] = (char) (mySourceChar); + tempBuf[1] = (char) trailByte; + tempBufLen = 2; + } + targetUniChar = ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE); + mySourceChar = (mySourceChar << 8) | trailByte; + } else if (!(trailIsOk || IS_2022_CONTROL(trailByte))) { + /* report a pair of illegal bytes if the second byte is not a DBCS starter */ + ++mySource; + /* add another bit so that the code below writes 2 bytes in case of error */ + mySourceChar = 0x10000 | (mySourceChar << 8) | trailByte; } - mySourceChar = (mySourceChar << 8) | (uint8_t)(trailByte); if(pToU2022State->g>=2) { /* return from a single-shift state to the previous one */ pToU2022State->g=pToU2022State->prevG; } - targetUniChar = 
ucnv_MBCSSimpleGetNextUChar(cnv, tempBuf, tempBufLen, FALSE); } else { args->converter->toUBytes[0] = (uint8_t)mySourceChar; args->converter->toULength = 1; diff -ru icu.6175/source/common/ucnvhz.c icu/source/common/ucnvhz.c --- icu.6175/source/common/ucnvhz.c 2009-06-02 15:47:31.000000000 +0100 +++ icu/source/common/ucnvhz.c 2009-06-02 15:57:18.000000000 +0100 @@ -196,10 +196,30 @@ /* if the first byte is equal to TILDE and the trail byte * is not a valid byte then it is an error condition */ - mySourceChar = 0x7e00 | mySourceChar; - targetUniChar = 0xffff; + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. + * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + */ myData->isEmptySegment = FALSE; /* different error here, reset this to avoid spurious future error */ - break; + *err = U_ILLEGAL_ESCAPE_SEQUENCE; + args->converter->toUBytes[0] = UCNV_TILDE; + if( myData->isStateDBCS ? + (0x21 <= mySourceChar && mySourceChar <= 0x7e) : + mySourceChar <= 0x7f + ) { + /* The current byte could be the start of a character: Back it out. */ + args->converter->toULength = 1; + --mySource; + } else { + /* Include the current byte in the illegal sequence. */ + args->converter->toUBytes[1] = mySourceChar; + args->converter->toULength = 2; + } + args->target = myTarget; + args->source = mySource; + return; } } else if(myData->isStateDBCS) { if(args->converter->toUnicodeStatus == 0x00){ @@ -215,19 +235,36 @@ } else{ /* trail byte */ + int leadIsOk, trailIsOk; uint32_t leadByte = args->converter->toUnicodeStatus & 0xff; - if( (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21) && - (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21) - ) { + targetUniChar = 0xffff; + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. 
+ * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + * + * In HZ DBCS, if the second byte is in the 21..7e range, + * we report only the first byte as the illegal sequence. + * Otherwise we convert or report the pair of bytes. + */ + leadIsOk = (uint8_t)(leadByte - 0x21) <= (0x7d - 0x21); + trailIsOk = (uint8_t)(mySourceChar - 0x21) <= (0x7e - 0x21); + if (leadIsOk && trailIsOk) { tempBuf[0] = (char) (leadByte+0x80) ; tempBuf[1] = (char) (mySourceChar+0x80); targetUniChar = ucnv_MBCSSimpleGetNextUChar(myData->gbConverter->sharedData, tempBuf, 2, args->converter->useFallback); + mySourceChar= (leadByte << 8) | mySourceChar; + } else if (trailIsOk) { + /* report a single illegal byte and continue with the following DBCS starter byte */ + --mySource; + mySourceChar = (int32_t)leadByte; } else { - targetUniChar = 0xffff; + /* report a pair of illegal bytes if the second byte is not a DBCS starter */ + /* add another bit so that the code below writes 2 bytes in case of error */ + mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar; } - /* add another bit so that the code below writes 2 bytes in case of error */ - mySourceChar= 0x10000 | (leadByte << 8) | mySourceChar; args->converter->toUnicodeStatus =0x00; } } diff -ru icu.6175/source/common/ucnvmbcs.c icu/source/common/ucnvmbcs.c --- icu.6175/source/common/ucnvmbcs.c 2009-06-02 15:47:31.000000000 +0100 +++ icu/source/common/ucnvmbcs.c 2009-06-02 15:56:07.000000000 +0100 @@ -1697,6 +1697,65 @@ pArgs->offsets=offsets; } +static UBool +hasValidTrailBytes(const int32_t (*stateTable)[256], uint8_t state) { + const int32_t *row=stateTable[state]; + int32_t b, entry; + /* First test for final entries in this state for some commonly valid byte values. 
*/ + entry=row[0xa1]; + if( !MBCS_ENTRY_IS_TRANSITION(entry) && + MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL + ) { + return TRUE; + } + entry=row[0x41]; + if( !MBCS_ENTRY_IS_TRANSITION(entry) && + MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL + ) { + return TRUE; + } + /* Then test for final entries in this state. */ + for(b=0; b<=0xff; ++b) { + entry=row[b]; + if( !MBCS_ENTRY_IS_TRANSITION(entry) && + MBCS_ENTRY_FINAL_ACTION(entry)!=MBCS_STATE_ILLEGAL + ) { + return TRUE; + } + } + /* Then recurse for transition entries. */ + for(b=0; b<=0xff; ++b) { + entry=row[b]; + if( MBCS_ENTRY_IS_TRANSITION(entry) && + hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry)) + ) { + return TRUE; + } + } + return FALSE; +} + +/* + * Is byte b a single/lead byte in this state? + * Recurse for transition states, because here we don't want to say that + * b is a lead byte if all byte sequences that start with b are illegal. + */ +static UBool +isSingleOrLead(const int32_t (*stateTable)[256], uint8_t state, UBool isDBCSOnly, uint8_t b) { + const int32_t *row=stateTable[state]; + int32_t entry=row[b]; + if(MBCS_ENTRY_IS_TRANSITION(entry)) { /* lead byte */ + return hasValidTrailBytes(stateTable, (uint8_t)MBCS_ENTRY_TRANSITION_STATE(entry)); + } else { + uint8_t action=(uint8_t)(MBCS_ENTRY_FINAL_ACTION(entry)); + if(action==MBCS_STATE_CHANGE_ONLY && isDBCSOnly) { + return FALSE; /* SI/SO are illegal for DBCS-only conversion */ + } else { + return action!=MBCS_STATE_ILLEGAL; + } + } +} + U_CFUNC void ucnv_MBCSToUnicodeWithOffsets(UConverterToUnicodeArgs *pArgs, UErrorCode *pErrorCode) { @@ -2052,6 +2111,34 @@ sourceIndex=nextSourceIndex; } else if(U_FAILURE(*pErrorCode)) { /* callback(illegal) */ + if(byteIndex>1) { + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. 
+ * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + */ + UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0); + int8_t i; + for(i=1; + i<byteIndex && !isSingleOrLead(stateTable, state, isDBCSOnly, bytes[i]); + ++i) {} + if(i<byteIndex) { + /* Back out some bytes. */ + int8_t backOutDistance=byteIndex-i; + int32_t bytesFromThisBuffer=(int32_t)(source-(const uint8_t *)pArgs->source); + byteIndex=i; /* length of reported illegal byte sequence */ + if(backOutDistance<=bytesFromThisBuffer) { + source-=backOutDistance; + } else { + /* Back out bytes from the previous buffer: Need to replay them. */ + cnv->preToULength=(int8_t)(bytesFromThisBuffer-backOutDistance); + /* preToULength is negative! */ + uprv_memcpy(cnv->preToU, bytes+i, -cnv->preToULength); + source=(const uint8_t *)pArgs->source; + } + } + } break; } else /* unassigned sequences indicated with byteIndex>0 */ { /* try an extension mapping */ @@ -2062,7 +2149,7 @@ &offsets, sourceIndex, pArgs->flush, pErrorCode); - sourceIndex=nextSourceIndex+(int32_t)(source-(const uint8_t *)pArgs->source); + sourceIndex=nextSourceIndex+=(int32_t)(source-(const uint8_t *)pArgs->source); if(U_FAILURE(*pErrorCode)) { /* not mappable or buffer overflow */ @@ -2353,15 +2440,37 @@ if(c<0) { if(U_SUCCESS(*pErrorCode) && source==sourceLimit && lastSource<source) { - *pErrorCode=U_TRUNCATED_CHAR_FOUND; - } - if(U_FAILURE(*pErrorCode)) { /* incomplete character byte sequence */ uint8_t *bytes=cnv->toUBytes; cnv->toULength=(int8_t)(source-lastSource); do { *bytes++=*lastSource++; } while(lastSource<source); + *pErrorCode=U_TRUNCATED_CHAR_FOUND; + } else if(U_FAILURE(*pErrorCode)) { + /* callback(illegal) */ + /* + * Ticket 5691: consistent illegal sequences: + * - We include at least the first byte in the illegal sequence. 
+ * - If any of the non-initial bytes could be the start of a character, + * we stop the illegal sequence before the first one of those. + */ + UBool isDBCSOnly=(UBool)(cnv->sharedData->mbcs.dbcsOnlyState!=0); + uint8_t *bytes=cnv->toUBytes; + *bytes++=*lastSource++; /* first byte */ + if(lastSource==source) { + cnv->toULength=1; + } else /* lastSource<source: multi-byte character */ { + int8_t i; + for(i=1; + lastSource<source && !isSingleOrLead(stateTable, state, isDBCSOnly, *lastSource); + ++i + ) { + *bytes++=*lastSource++; + } + cnv->toULength=i; + source=lastSource; + } } else { /* no output because of empty input or only state changes */ *pErrorCode=U_INDEX_OUTOFBOUNDS_ERROR; diff -ru icu.6175/source/test/cintltst/nccbtst.c icu/source/test/cintltst/nccbtst.c --- icu.6175/source/test/cintltst/nccbtst.c 2009-06-02 15:47:18.000000000 +0100 +++ icu/source/test/cintltst/nccbtst.c 2009-06-02 15:47:38.000000000 +0100 @@ -2497,13 +2497,13 @@ static const uint8_t text943[] = { - 0x82, 0xa9, 0x82, 0x20, /*0xc8,*/ 0x61, 0x8a, 0xbf, 0x8e, 0x9a }; - static const UChar toUnicode943sub[] = { 0x304b, 0xfffd, /*0xff88,*/ 0x0061, 0x6f22, 0x5b57}; - static const UChar toUnicode943skip[]= { 0x304b, /*0xff88,*/ 0x0061, 0x6f22, 0x5b57}; + 0x82, 0xa9, 0x82, 0x20, 0x61, 0x8a, 0xbf, 0x8e, 0x9a }; + static const UChar toUnicode943sub[] = { 0x304b, 0x1a, 0x20, 0x0061, 0x6f22, 0x5b57 }; + static const UChar toUnicode943skip[]= { 0x304b, 0x20, 0x0061, 0x6f22, 0x5b57 }; static const UChar toUnicode943stop[]= { 0x304b}; - static const int32_t fromIBM943Offssub[] = {0, 2, 4, 5, 7}; - static const int32_t fromIBM943Offsskip[] = { 0, 4, 5, 7}; + static const int32_t fromIBM943Offssub[] = { 0, 2, 3, 4, 5, 7 }; + static const int32_t fromIBM943Offsskip[] = { 0, 3, 4, 5, 7 }; static const int32_t fromIBM943Offsstop[] = { 0}; gInBufferSize = inputsize; @@ -2537,9 +2537,9 @@ { static const uint8_t sampleText[] = { 0x82, 0xa9, 0x61, 0x62, 0x63 , 0x82, - 0xff, /*0x82, 0xa9,*/ 0x32, 0x33}; - static 
const UChar toUnicode943sub[] = {0x304b, 0x0061, 0x0062, 0x0063, 0xfffd,/*0x304b,*/ 0x0032, 0x0033}; - static const int32_t fromIBM943Offssub[] = {0, 2, 3, 4, 5, 7, 8}; + 0xff, 0x32, 0x33}; + static const UChar toUnicode943sub[] = { 0x304b, 0x0061, 0x0062, 0x0063, 0x1a, 0x1a, 0x0032, 0x0033 }; + static const int32_t fromIBM943Offssub[] = { 0, 2, 3, 4, 5, 6, 7, 8 }; /*checking illegal value for ibm-943 with substitute*/ gInBufferSize = inputsize; gOutBufferSize = outputsize; diff -ru icu.6175/source/test/cintltst/nucnvtst.c icu/source/test/cintltst/nucnvtst.c --- icu.6175/source/test/cintltst/nucnvtst.c 2009-06-02 15:47:18.000000000 +0100 +++ icu/source/test/cintltst/nucnvtst.c 2009-06-02 15:47:38.000000000 +0100 @@ -2606,7 +2606,7 @@ TestNextUCharError(cnv, source, source, U_INDEX_OUTOFBOUNDS_ERROR, "sourceLimit <= source"); /*Test for the condition where there is an invalid character*/ { - static const uint8_t source2[]={0xa1, 0x01}; + static const uint8_t source2[]={0xa1, 0x80}; TestNextUCharError(cnv, (const char*)source2, (const char*)source2+sizeof(source2), U_ZERO_ERROR, "an invalid character"); } /*Test for the condition where we have a truncated char*/ @@ -3899,11 +3899,11 @@ TestISO_2022_KR() { /* test input */ static const uint16_t in[]={ - 0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F66,0x9F67,0x9F6A,0x000A,0x000D - ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC02,0xAC04 + 0x9F4B,0x9F4E,0x9F52,0x9F5F,0x9F61,0x9F67,0x9F6A,0x000A,0x000D + ,0x9F6C,0x9F77,0x9F8D,0x9F90,0x9F95,0x9F9C,0xAC00,0xAC01,0xAC04 ,0xAC07,0xAC08,0xAC09,0x0025,0x0026,0x0027,0x000A,0x000D,0x0028,0x0029 ,0x002A,0x002B,0x002C,0x002D,0x002E,0x53C3,0x53C8,0x53C9,0x53CA,0x53CB - ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53DF,0x53E1,0x53E2 + ,0x53CD,0x53D4,0x53D6,0x53D7,0x53DB,0x000A,0x000D,0x53E1,0x53E2 ,0x53E3,0x53E4,0x000A,0x000D}; const UChar* uSource; const UChar* uSourceLimit; diff -ru icu.6175/source/test/testdata/conversion.txt icu/source/test/testdata/conversion.txt 
--- icu.6175/source/test/testdata/conversion.txt 2009-06-02 15:47:18.000000000 +0100 +++ icu/source/test/testdata/conversion.txt 2009-06-02 15:57:41.000000000 +0100 @@ -48,12 +48,144 @@ toUnicode { Headers { "charset", "bytes", "unicode", "offsets", "flush", "fallbacks", "errorCode", "callback", "invalidChars" } Cases { + // Test ticket 5691: consistent illegal sequences + // The following test cases are for illegal character byte sequences. + // + // Unfortunately, we cannot use the Shift-JIS examples from the ticket + // comments because our Shift-JIS table is Windows-compatible and + // therefore has no illegal single bytes. Same for GBK. + // Instead, we use the stricter GB 18030 also for 2-byte examples. + // The byte sequences are generally slightly different from the ticket + // comment, simply using assigned characters rather than just + // theoretically valid sequences. + { + "gb18030", + :bin{ 618140813c81ff7a }, + "a\u4e02\\x81<\\x81\\xFFz", + :intvector{ 0,1,3,3,3,3,4,5,5,5,5,5,5,5,5,7 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "EUC-JP", + :bin{ 618fb0a98fb03c8f3cb0a97a }, + "a\u4e28\\x8F\\xB0<\\x8F<\u9022z", + :intvector{ 0,1,4,4,4,4,5,5,5,5,6,7,7,7,7,8,9,11 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "gb18030", + :bin{ 618130fc318130fc8181303c3e813cfc817a }, + "a5ed\\x810\u9f07\\x810<>\\x81<\u9f07z", + :intvector{ 0,1,5,5,5,5,6,7,9,9,9,9,10,11,12,13,13,13,13,14,15,17 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "UTF-8", + :bin{ 61f1808182f180813cf18081fff180ff3cf1ff3c3e7a }, + "a\U00040042\\xF1\\x80\\x81<\\xF1\\x80\\x81\\xFF\\xF1\\x80\\xFF<\\xF1\\xFF<>z", + :intvector{ 0,1,1,5,5,5,5,5,5,5,5,5,5,5,5,8,9,9,9,9,9,9,9,9,9,9,9,9,12,12,12,12,13,13,13,13,13,13,13,13,15,15,15,15,16,17,17,17,17,18,18,18,18,19,20,21 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "ISO-2022-JP", + :bin{ 1b24424141af4142affe41431b2842 }, + "\u758f\\xAF\u758e\\xAF\\xFE\u790e", + :intvector{ 3,5,5,5,5,6,8,8,8,8,8,8,8,8,10 }, + :int{1}, :int{0}, "", "&C", 
:bin{""} + } + { + "ibm-25546", + :bin{ 411b242943420e4141af4142affe41430f5a }, + "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ", + :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "ISO-2022-KR", + :bin{ 411b242943420e4141af4142affe41430f5a }, + "AB\uc88b\\xAF\uc88c\\xAF\\xFE\uc88dZ", + :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "ISO-2022-CN", + :bin{ 411b242941420e4141af4142affe41430f5a }, + "AB\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z", + :intvector{ 0,5,7,9,9,9,9,10,12,12,12,12,12,12,12,12,14,17 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "HZ", + :bin{ 417e7b4141af4142affe41437e7d5a }, + "A\u4eae\\xAF\u8c05\\xAF\\xFE\u64a9Z", + :intvector{ 0,3,5,5,5,5,6,8,8,8,8,8,8,8,8,10,14 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + // Test ticket 5691: consistent illegal sequences + // The following test cases are for illegal escape/designator/shift sequences. + // + // ISO-2022-JP and -CN with illegal escape sequences. + { + "ISO-2022-JP", + :bin{ 611b24201b244241411b283f1b28427a }, + "a\\x1B$ \u758f\\x1B\u2538z", + :intvector{ 0,1,1,1,1,2,3,7,9,9,9,9,10,15 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "ISO-2022-CN", + :bin{ 611b2429201b2429410e41410f7a }, + "a\\x1B$) \u4eaez", + :intvector{ 0,1,1,1,1,2,3,4,10,13 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + // Test ticket 5691: ISO-2022-JP-2 with illegal single-shift SS2 and SS3 sequences. + // The first ESC N comes before its designator sequence, the last sequence is ESC+space. 
+ { + "ISO-2022-JP-2", + :bin{ 4e1b4e4e1b2e414e1b4e4e4e1b204e }, + "N\\x1BNNN\xceN\\x1B N", + :intvector{ 0,1,1,1,1,2,3,7,10,11,12,12,12,12,13,14 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "ISO-2022-CN-EXT", + :bin{ 4e1b4e4e1b242a484e1b4e4e4e4e1b204e }, + "N\\x1BNNN\u8f0eN\\x1B N", + :intvector{ 0,1,1,1,1,2,3,8,11,13,14,14,14,14,15,16 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + { + "ISO-2022-CN-EXT", + :bin{ 4f1b4f4f1b242b494f1b4f4f4f4f1b204f }, + "O\\x1BOOO\u492bO\\x1B O", + :intvector{ 0,1,1,1,1,2,3,8,11,13,14,14,14,14,15,16 }, + :int{1}, :int{0}, "", "&C", :bin{""} + } + // Test ticket 5691: HZ with illegal tilde sequences. + { + "HZ", + :bin{ 417e20427e21437e80447e7b41417e207e41427e7f41437e7d5a }, + "A\\x7E B\\x7E!C\\x7E\\x80D\u4eae\\x7E\\x20\\x7E\u8c05\\x7E\\x7F\u64a9Z", + :intvector{ 0,1,1,1,1,2,3,4,4,4,4,5,6,7,7,7,7,7,7,7,7,9, // SBCS + 12,14,14,14,14,14,14,14,14,16,16,16,16,17,19,19,19,19,19,19,19,19,21, // DBCS + 25 }, // SBCS + :int{1}, :int{0}, "", "&C", :bin{""} + } + // Test ticket 5691: Example from Peter Edberg. 
+ { + "ISO-2022-JP", + :bin{ 1b244230212f7e742630801b284a621b2458631b2842648061 }, + "\u4e9c\ufffd\u7199\ufffdb\ufffd$Xcd\ufffda", + :intvector{ 3,5,7,9,14,15,16,17,18,22,23,24 }, + :int{1}, :int{0}, "", "?", :bin{""} + } // test that HZ limits its byte values to lead bytes 21..7d and trail bytes 21..7e { "HZ", - :bin{ 7e7b21212120217e217f772100007e217e7d207e7e807e0a2b }, - "\u3000\ufffd\u3013\ufffd\u9ccc\ufffd\ufffd ~\ufffd+", - :intvector{ 2,4,6,8,10,12,14,18,19,21,24 }, + :bin{ 7e7b21212120217e217f772100007e217e7e7d207e7e807e0a2b }, + "\u3000\ufffd\u3013\ufffd\u9ccc\ufffd\ufffd\u3013 ~\ufffd+", + :intvector{ 2,4,6,8,10,12,14,15,19,20,22,25 }, :int{1}, :int{1}, "", "?", :bin{""} } // improve coverage of ISO-2022-JP converter with hardcoded JIS X 0201 and @@ -61,8 +193,8 @@ { "ISO-2022-JP", :bin{ 1b284a7d7e801b2442306c20217f7e21202160217f22202225227f5f211b2842 }, - "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\u25b2\ufffd\u6f3e", - :intvector{ 3,4,5,9,11,13,15,17,19,21,23,25,27 }, + "}\u203e\ufffd\u4e00\ufffd\ufffd\ufffd\ufffd\xf7\ufffd\ufffd\u25b2\ufffd\u6f3e", + :intvector{ 3,4,5,9,11,12,14,16,17,19,21,23,25,27 }, :int{1}, :int{1}, "", "?", :bin{""} } // improve coverage of unrolled loops in ucnvmbcs.c/ucnv_MBCSSingleToBMPWithOffsets() @@ -341,7 +473,7 @@ { "ISO-2022-CN-EXT", :bin{ 411b4e2121 }, "\x41", :intvector{ 0 }, - :int{1}, :int{1}, "illesc", ".", :bin{ 1b4e } + :int{1}, :int{1}, "illesc", ".", :bin{ 1b } } // G3 designator: recognized, but not supported for -CN (only for -CN-EXT) {
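The "Ticket 5691: consistent illegal sequences" rule that the patch comments repeat for the ISO-2022 DBCS paths can be sketched in isolation. What follows is a minimal standalone illustration, not ICU code: the names `in_range_21_7e`, `is_2022_control`, `classify_dbcs_pair` and the three result constants are hypothetical, and the control-byte test is assumed to cover only ESC, SO and SI, as the patch comment describes.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if b lies in 0x21..0x7e.
 * Same unsigned-wraparound idiom as the patch: a byte below 0x21
 * wraps to a large value after the subtraction and fails the test. */
static int in_range_21_7e(uint8_t b) {
    return (uint8_t)(b - 0x21) <= (0x7e - 0x21);
}

/* Assumption: only ESC (0x1b), SO (0x0e) and SI (0x0f) count as controls. */
static int is_2022_control(uint8_t b) {
    return b == 0x1b || b == 0x0e || b == 0x0f;
}

/* Hypothetical result codes for this sketch. */
enum {
    PAIR_CONVERT      = 0, /* both bytes plausible: look the pair up    */
    ILLEGAL_ONE_BYTE  = 1, /* report only the lead byte, keep the trail */
    ILLEGAL_TWO_BYTES = 2  /* report lead and trail together            */
};

/* Decide how many bytes of a two-byte candidate are consumed as illegal.
 * The trail byte is consumed only if it could not start a new character
 * or escape/shift sequence on its own. */
static int classify_dbcs_pair(uint8_t lead, uint8_t trail) {
    int leadIsOk  = in_range_21_7e(lead);
    int trailIsOk = in_range_21_7e(trail);
    if (leadIsOk && trailIsOk) {
        return PAIR_CONVERT;
    }
    if (!(trailIsOk || is_2022_control(trail))) {
        return ILLEGAL_TWO_BYTES;
    }
    return ILLEGAL_ONE_BYTE;
}
```

Under these assumptions, `classify_dbcs_pair(0xaf, 0x41)` yields `ILLEGAL_ONE_BYTE` and `classify_dbcs_pair(0xaf, 0xfe)` yields `ILLEGAL_TWO_BYTES`. That appears consistent with the ISO-2022-JP case in conversion.txt above, where `af` before `4142` is reported alone and `affe` is reported as a pair.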
Information forwarded to debian-bugs-dist@lists.debian.org: Bug#534590; Package icu. (Tue, 25 Aug 2009 20:36:03 GMT)
Acknowledgement sent to Jay Berkenbilt <qjb@debian.org>: Extra info received and forwarded to list. (Tue, 25 Aug 2009 20:36:03 GMT)
Message #25 received at 534590@bugs.debian.org:
Okay, bad brain day. Now I've posted the same patches three times. Sorry. Must have been looking at the wrong bug report or something.
Information forwarded to debian-bugs-dist@lists.debian.org: Bug#534590; Package icu. (Tue, 08 Sep 2009 01:45:08 GMT)
Acknowledgement sent to Jay Berkenbilt <qjb@debian.org>: Extra info received and forwarded to list. (Tue, 08 Sep 2009 01:45:08 GMT)
Message #30 received at 534590@bugs.debian.org:
I have backported CVE-2009-0153 into debian's stable and old-stable ICU packages based on Red Hat's backporting.

Backporting CVE-2009-0153 using Red Hat's patches was a bit tricky since their backport depended on several earlier patches, some of which we had, and some of which we didn't, but I managed to figure out the patch dependencies. This involved pulling in a few additional patches of theirs and reworking one we already had for an earlier issue. Basically I replaced our patch for CVE-2008-1036 with Red Hat's after comparing the two patches and determining that they differed pretty much only by offsets. 3.6 was slightly more difficult than 3.8.1, but after figuring out the patch dependencies, they were quite similar. 3.6 required one patch beyond what 3.8.1 required.

I don't have etch or lenny build environments, but I built the packages on squeeze and manually ran the test suites on the built results. In both cases, the test suites pass, so I think we can have pretty high confidence that this didn't break anything and correctly applied the patches. My confidence is somewhat higher with the 3.8 version than the 3.6 version, but it's pretty high in both cases.

I'm attaching a tarfile containing _source.changes, .diff.gz, and .dsc files for icu_3.6-2etch4 and icu_3.8.1-3+lenny2. I've signed the changes and dsc files, but obviously feel free to make any changes required before uploading (needless to say).

--
Jay Berkenbilt <qjb@debian.org>
[icu-CVE-2009-0153-backports.tar.gz (application/octet-stream, attachment)]
Information forwarded to debian-bugs-dist@lists.debian.org, Jay Berkenbilt <qjb@debian.org>: Bug#534590; Package icu. (Tue, 08 Sep 2009 20:06:03 GMT)
Acknowledgement sent to Moritz Muehlenhoff <jmm@inutil.org>: Extra info received and forwarded to list. Copy sent to Jay Berkenbilt <qjb@debian.org>. (Tue, 08 Sep 2009 20:06:04 GMT)
Message #35 received at 534590@bugs.debian.org:
On Mon, Sep 07, 2009 at 09:34:40PM -0400, Jay Berkenbilt wrote:
>
> I have backported CVE-2009-0153 into debian's stable and old-stable ICU
> packages based on Red Hat's backporting.
>
> Backporting CVE-2009-0153 using Red Hat's patches was a bit tricky since
> their backport depended on several earlier patches, some of which we
> had, and some of which we didn't, but I managed to figure out the patch
> dependencies. This involved pulling in a few additional patches of
> theirs and reworking one we already had for an earlier issue. Basically
> I replaced our patch for CVE-2008-1036 with Red Hat's after comparing
> the two patches and determining that they differed pretty much only by
> offsets. 3.6 was slightly more difficult than 3.8.1, but after figuring
> out the patch dependencies, they were quite similar. 3.6 required one
> patch beyond what 3.8.1 required.
>
> I don't have etch or lenny build environments, but I built the packages
> on squeeze and manually ran the test suites on the built results. In
> both cases, the test suites pass, so I think we can have pretty high
> confidence that this didn't break anything and correctly applied the
> patches. My confidence is somewhat higher with the 3.8 version than the
> 3.6 version, but it's pretty high in both cases.
>
> I'm attaching a tarfile containing _source.changes, .diff.gz, and .dsc
> files for icu_3.6-2etch4 and icu_3.8.1-3+lenny2. I've signed the
> changes and dsc files, but obviously feel free to make any changes
> required before uploading (needless to say).

Thanks, I'll review, build, test and upload. What testing do you propose? Any application using ICU, which is especially good for testing or is there even a test suite?

Cheers,
Moritz
Information forwarded to debian-bugs-dist@lists.debian.org: Bug#534590; Package icu. (Tue, 08 Sep 2009 20:24:04 GMT)
Acknowledgement sent to Jay Berkenbilt <qjb@debian.org>: Extra info received and forwarded to list. (Tue, 08 Sep 2009 20:24:04 GMT)
Message #40 received at 534590@bugs.debian.org:
Moritz Muehlenhoff <jmm@inutil.org> wrote:

> On Mon, Sep 07, 2009 at 09:34:40PM -0400, Jay Berkenbilt wrote:
>> I have backported CVE-2009-0153 into debian's stable and old-stable ICU
>> packages based on Red Hat's backporting.
>> . . .
>> I don't have etch or lenny build environments, but I built the packages
>> on squeeze and manually ran the test suites on the built results. In
>> both cases, the test suites pass, so I think we can have pretty high
>> confidence that this didn't break anything and correctly applied the
>> patches. My confidence is somewhat higher with the 3.8 version than the
>> 3.6 version, but it's pretty high in both cases.
>> . . .
>
> Thanks, I'll review, build, test and upload. What testing do you
> propose? Any application using ICU, which is especially good for
> testing or is there even a test suite?

The ICU packages themselves have test suites, which I ran in both cases. The test suites are not run by default by debian/rules because they take a long time and, in some cases in the past, I had builds fail because of timeouts in autobuilders on slower platforms like the now defunct m68k and arm ports. Maybe I should consider turning this back on. In any case, after building the package, just cd to build-area/icu-x.y.z/icu/source and run make check.

As for other testing, openoffice.org uses ICU, so opening up various translations in openoffice.org is probably a reasonable test. Based on past experience, I generally find the ICU test suite to be pretty trustworthy. When I test ICU myself before uploading to unstable, I also run the test suite for some of my own applications that link with xerces-c and use that to read UTF-8 encoded XML, since xerces-c uses ICU, but that's a weaker test case than running ICU's test suite.

Hmm. It occurs to me that Red Hat's 3.6 packages include a patch to roll back ABI changes introduced by other patches. I'll check into that and get back to you momentarily to see whether it's necessary to apply all or part of it. In the meantime, I'd definitely suggest using openoffice.org at least to make sure everything appears to work.

I don't have a specific test case that exercises the bug that this patch is designed to fix. Maybe there's one mentioned in the bug report or one of its links.

I'll reply again about the ABI issue.

--
Jay Berkenbilt <qjb@debian.org>
Information forwarded to debian-bugs-dist@lists.debian.org: Bug#534590; Package icu. (Tue, 08 Sep 2009 20:33:19 GMT)
Acknowledgement sent to Jay Berkenbilt <qjb@debian.org>: Extra info received and forwarded to list. (Tue, 08 Sep 2009 20:33:19 GMT)
Message #45 received at 534590@bugs.debian.org:
Stand by....I'm going to give you an updated diff.gz and dsc for 3.6 that includes the ABI rollback patch. This appears not to be necessary for 3.8.1.
Information forwarded to debian-bugs-dist@lists.debian.org: Bug#534590; Package icu. (Tue, 08 Sep 2009 20:39:08 GMT)
Acknowledgement sent to Jay Berkenbilt <qjb@debian.org>: Extra info received and forwarded to list. (Tue, 08 Sep 2009 20:39:08 GMT)
Message #50 received at 534590@bugs.debian.org:
Okay, here are updated files for 3.6. These also pass their own test suite. (Run their test suite from build-tree/icu-3.6/icu/source, not build-area as I said in my earlier message.) You should test with these instead of the ones I sent yesterday. This includes the ABI rollback patch. It also builds fine under squeeze and passes its test suite. The 3.8.1 files I sent should be okay, but you should obviously test them as well. Sorry about the omission. --Jay
[icu-CVE-2009-0153-backports-3.6-updated.tar.gz (application/x-gzip, attachment)]
Information forwarded to debian-bugs-dist@lists.debian.org, Jay Berkenbilt <qjb@debian.org>: Bug#534590; Package icu. (Tue, 08 Sep 2009 20:42:03 GMT)
Acknowledgement sent to Moritz Muehlenhoff <jmm@inutil.org>: Extra info received and forwarded to list. Copy sent to Jay Berkenbilt <qjb@debian.org>. (Tue, 08 Sep 2009 20:42:03 GMT)
Message #55 received at 534590@bugs.debian.org:
On Tue, Sep 08, 2009 at 04:26:48PM -0400, Jay Berkenbilt wrote:
>
> Okay, here are updated files for 3.6. These also pass their own test
> suite. (Run their test suite from build-tree/icu-3.6/icu/source, not
> build-area as I said in my earlier message.) You should test with these
> instead of the ones I sent yesterday. This includes the ABI rollback
> patch. It also builds fine under squeeze and passes its test suite.
> The 3.8.1 files I sent should be okay, but you should obviously test
> them as well. Sorry about the omission.

Thanks, I'll look into it tomorrow.

Cheers,
Moritz
Reply sent to Jay Berkenbilt <qjb@debian.org>: You have taken responsibility. (Thu, 17 Sep 2009 02:30:03 GMT)
Notification sent to Kees Cook <kees@debian.org>: Bug acknowledged by developer. (Thu, 17 Sep 2009 02:30:03 GMT)
Message #60 received at 534590-close@bugs.debian.org:
Source: icu
Source-Version: 3.8.1-3+lenny2

We believe that the bug you reported is fixed in the latest version of
icu, which is due to be installed in the Debian FTP archive:

icu-doc_3.8.1-3+lenny2_all.deb
  to pool/main/i/icu/icu-doc_3.8.1-3+lenny2_all.deb
icu_3.8.1-3+lenny2.diff.gz
  to pool/main/i/icu/icu_3.8.1-3+lenny2.diff.gz
icu_3.8.1-3+lenny2.dsc
  to pool/main/i/icu/icu_3.8.1-3+lenny2.dsc
libicu-dev_3.8.1-3+lenny2_i386.deb
  to pool/main/i/icu/libicu-dev_3.8.1-3+lenny2_i386.deb
libicu38-dbg_3.8.1-3+lenny2_i386.deb
  to pool/main/i/icu/libicu38-dbg_3.8.1-3+lenny2_i386.deb
libicu38_3.8.1-3+lenny2_i386.deb
  to pool/main/i/icu/libicu38_3.8.1-3+lenny2_i386.deb

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed. If you
have further comments please address them to 534590@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Jay Berkenbilt <qjb@debian.org> (supplier of updated icu package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@debian.org)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 1.8
Date: Mon, 07 Sep 2009 20:00:39 -0400
Source: icu
Binary: libicu38 libicu38-dbg libicu-dev lib32icu38 lib32icu-dev icu-doc
Architecture: source all i386
Version: 3.8.1-3+lenny2
Distribution: stable-security
Urgency: high
Maintainer: Jay Berkenbilt <qjb@debian.org>
Changed-By: Jay Berkenbilt <qjb@debian.org>
Description:
 icu-doc - API documentation for ICU classes and functions
 lib32icu-dev - Development files for International Components for Unicode (32-bi
 lib32icu38 - International Components for Unicode (32-bit)
 libicu-dev - Development files for International Components for Unicode
 libicu38 - International Components for Unicode
 libicu38-dbg - International Components for Unicode
Closes: 534590
Changes:
 icu (3.8.1-3+lenny2) stable-security; urgency=high
 .
   * Apply patch CVE-2009-0153.patch to fix problem handling invalid byte
     sequences during Unicode conversion. Thanks to Red Hat for
     backporting the patch to ICU version 3.8.1. Applying this patch to
     the debian package required pulling in three additional Red Hat
     patches for tickets 5797, 6001, and 6002 in ICU's issue tracking
     system as well as adjusting offsets in CVE-2008-1036.patch.
     (Closes: #534590)
Checksums-Sha1:
 9e19a04c83c78507a17a4b63fc3cd888dff1f0fe 1298 icu_3.8.1-3+lenny2.dsc
 380e3dffcf780226e3c48c5a1c075513554f5083 41943 icu_3.8.1-3+lenny2.diff.gz
 fd1b481cf86ffe12f003bbb88202a089b7f7ab90 3659700 icu-doc_3.8.1-3+lenny2_all.deb
 61be29d34827455982116937e8722928c3af625e 5918780 libicu38_3.8.1-3+lenny2_i386.deb
 08f530b306a9dd05dcb27cabb8e49cc82089817c 2278340 libicu38-dbg_3.8.1-3+lenny2_i386.deb
 75862e07dfc12a6bd37aab9ca0b7bcd7d09effa9 6975168 libicu-dev_3.8.1-3+lenny2_i386.deb
Checksums-Sha256:
 47afc1ac4ae7d37f8f26d894de75082a71a6653b6336eca853ec165a9e6b4d90 1298 icu_3.8.1-3+lenny2.dsc
 ba7999f8ff453a10163edffcaf5f7f9ca946100b4f83da47c68616860ee1cb4f 41943 icu_3.8.1-3+lenny2.diff.gz
 ad8ded72ac122d6609b88e66cc0f75c2e06f079466cd8287e8ef9d9c11c5259d 3659700 icu-doc_3.8.1-3+lenny2_all.deb
 6450aab55d8ea87cf14a46f804a0c5ebdce82890cd4c51b67026db4820bb467d 5918780 libicu38_3.8.1-3+lenny2_i386.deb
 5a5c53fda354063f299a9c5d06d2a0d97c096b9d7707ece07987c574162aaf4a 2278340 libicu38-dbg_3.8.1-3+lenny2_i386.deb
 942a3560e7d1c4a8641cb1a11283ce72c99b5473366acffea5244efe77f365a0 6975168 libicu-dev_3.8.1-3+lenny2_i386.deb
Files:
 e0528ce00964025af9b2f940f588664a 1298 libs optional icu_3.8.1-3+lenny2.dsc
 57d76fe9884c543a634bfd44425a42c6 41943 libs optional icu_3.8.1-3+lenny2.diff.gz
 69882d02e07863b195b7e9b798bdeff2 3659700 doc optional icu-doc_3.8.1-3+lenny2_all.deb
 a471bd785fecadc4a7acd91be38a1bca 5918780 libs optional libicu38_3.8.1-3+lenny2_i386.deb
 b95d691813f7d32d7bc1a8aa96ddcd94 2278340 libs extra libicu38-dbg_3.8.1-3+lenny2_i386.deb
 e5c844c5ce908655075dd49c57182b3f 6975168 libdevel optional libicu-dev_3.8.1-3+lenny2_i386.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAkqvzHkACgkQXm3vHE4uylps0gCcCkfaGXyWtJrz8fqno3S/XXoM
zfwAoNAmgYLnJww5PNtseCjH1TVNeV0q
=1o3y
-----END PGP SIGNATURE-----
Reply sent to Jay Berkenbilt <qjb@debian.org>: You have taken responsibility. (Mon, 05 Oct 2009 01:57:03 GMT)
Notification sent to Kees Cook <kees@debian.org>: Bug acknowledged by developer. (Mon, 05 Oct 2009 01:57:03 GMT)
Message #65 received at 534590-close@bugs.debian.org:
Source: icu
Source-Version: 3.6-2etch4

We believe that the bug you reported is fixed in the latest version of
icu, which is due to be installed in the Debian FTP archive:

icu-doc_3.6-2etch4_all.deb
  to pool/main/i/icu/icu-doc_3.6-2etch4_all.deb
icu_3.6-2etch4.diff.gz
  to pool/main/i/icu/icu_3.6-2etch4.diff.gz
icu_3.6-2etch4.dsc
  to pool/main/i/icu/icu_3.6-2etch4.dsc
libicu36-dev_3.6-2etch4_i386.deb
  to pool/main/i/icu/libicu36-dev_3.6-2etch4_i386.deb
libicu36_3.6-2etch4_i386.deb
  to pool/main/i/icu/libicu36_3.6-2etch4_i386.deb

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed. If you
have further comments please address them to 534590@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Jay Berkenbilt <qjb@debian.org> (supplier of updated icu package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@debian.org)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 1.7
Date: Mon, 07 Sep 2009 20:21:59 -0400
Source: icu
Binary: libicu36-dev libicu36 icu-doc
Architecture: source all i386
Version: 3.6-2etch4
Distribution: oldstable-security
Urgency: high
Maintainer: Jay Berkenbilt <qjb@debian.org>
Changed-By: Jay Berkenbilt <qjb@debian.org>
Description:
 icu-doc - API documentation for ICU classes and functions
 libicu36 - International Components for Unicode (libraries)
 libicu36-dev - International Components for Unicode (development files)
Closes: 534590
Changes:
 icu (3.6-2etch4) oldstable-security; urgency=high
 .
   * Apply patch CVE-2009-0153.patch to fix problem handling invalid byte
     sequences during Unicode conversion. Thanks to Red Hat for
     backporting the patch to ICU version 3.6. Applying this patch to
     the debian package required pulling in three additional Red Hat
     patches for tickets 5483, 5797, 6001, and 6002 in ICU's issue
     tracking system as well as adjusting offsets in CVE-2008-1036.patch.
     (Closes: #534590)
Files:
 8b600075600533ce08c9801ffa571a19 592 libs optional icu_3.6-2etch4.dsc
 601af38fe10a27e08e40985c409bc6c4 45190 libs optional icu_3.6-2etch4.diff.gz
 8bf16fb7db375fb14de7082bcb814733 3239572 doc optional icu-doc_3.6-2etch4_all.deb
 f5d9e50ecb224df9ae4f0c7057097f54 5470148 libs optional libicu36_3.6-2etch4_i386.deb
 d8e1c31e6f1d238353340a9b82da1ed8 6466444 libdevel optional libicu36-dev_3.6-2etch4_i386.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFKr8TrXm3vHE4uyloRAtbEAJ9FHPzNYtHX8cuG3Xf8mpD1+bP39wCgndMb
pIx4vAu9EHFoerZ+8wRn4Rs=
=xhfn
-----END PGP SIGNATURE-----
Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Mon, 02 Nov 2009 07:41:42 GMT)