
Title:

Base64 Strings Mishandled During Serialization


Status/Resolution/Reason: Closed/Withdrawn/Duplicate

Reporter/Name(from Bugbase): John Nelson / John Nelson (John Nelson)

Created: 10/10/2014

Components: Language, Serialization

Versions: 11.0

Failure Type: Incorrect w/Workaround

Found In Build/Fixed In Build: CF11_Final /

Priority/Frequency: Normal / Most users will encounter

Locale/System: ALL / Win 2008 Server R2 64 bit

Vote Count: 15

Duplicate ID:	CF-3941059

Problem Description: When encoding binary data (in this case an image) as Base64 and adding it to a struct which is then serialized into JSON (I have not tested XML serialization), some combinations of characters are altered which should not be, resulting in invalid Base64 data.  Specifically, "u+" is changed to "\u" which would be invalid Base64 since "\" is not an included character.  I'm assuming this is happening because "u+" is the common prefix for Unicode characters and "\u" would allow the browser to interpret it as such.  However, in a Base64 string, there are no Unicode characters and changing the string at best makes it unusable, at worst it would corrupt the data into something very different from what was expected.

This issue does not occur every time as "u+" may not occur in every Base64 string, but it has happened with a number of images I have encoded and sent using our REST web service.

Steps to Reproduce:  

Encode a binary object as Base64 using either binaryEncode() or toBase64().  
Insert the resulting string into a struct.
Serialize the struct into JSON using serializeJSON().
Deserialize and attempt to decode the Base64 string.

Actual Result:
Attempting to decode the produced string results in an error which reports that it is not valid Base64.

Example Actual Result (excerpt): PFpA4ntd/j58gtZuyiJ\u9DdFEgv7lOKdVty6ynw

Expected Result:
Attempting to decode the produced string should result in the creation of the same binary data as what was encoded (an image, file, etc.).

Example Expected Result (excerpt): PFpA4ntd/j58gtZuyiJu+9DdFEgv7lOKdVty6ynw
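
For comparison, the round trip the reporter expects can be sketched in Python. This is only a neutral illustration, not ColdFusion code: it assumes a standards-following JSON library (Python's json module here) and reuses the excerpt above.

```python
import json

# Excerpt from the expected result above; it contains the sequence "u+9DdF".
excerpt = "PFpA4ntd/j58gtZuyiJu+9DdFEgv7lOKdVty6ynw"

# Serialize a struct-like dict, then parse it back.
serialized = json.dumps({"val": excerpt})
restored = json.loads(serialized)["val"]

# Every character is plain ASCII, so a conforming serializer has nothing to escape
# and the string comes back unchanged.
print(serialized)
print(restored == excerpt)
```

The point of the sketch is the invariant, not the library: whatever a serializer writes, parsing its own output should give back the original string.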

Any Workarounds:
I was able to work around the issue by changing the "+" and "/" characters in the Base64 with other characters, in this case: "@^$%" and "*&%^", respectively.
After serializing the struct, I replaced those strings of characters with the appropriate Base64 characters and returned that instead of the direct result of the serializeJSON().
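
A rough sketch of this workaround, written in Python purely for illustration. The function faulty_serialize is a hypothetical stand-in that imitates the reported symptom; it is not ColdFusion's actual implementation. The two placeholder tokens are the ones named in the report.

```python
import base64
import json
import re

PLUS_TOKEN = "@^$%"   # placeholder for "+", as given in the report
SLASH_TOKEN = "*&%^"  # placeholder for "/", as given in the report

def faulty_serialize(struct):
    # Hypothetical imitation of the bug: "u+XXXX" or "U+XXXX" becomes "\uXXXX".
    return re.sub(r"[uU]\+([0-9A-Fa-f]{4})", r"\\u\1", json.dumps(struct))

def serialize_with_workaround(b64_text):
    # Hide "+" and "/" so the serializer never sees a "u+" sequence.
    masked = b64_text.replace("+", PLUS_TOKEN).replace("/", SLASH_TOKEN)
    serialized = faulty_serialize({"val": masked})
    # Put the real Base64 characters back into the serialized text.
    return serialized.replace(PLUS_TOKEN, "+").replace(SLASH_TOKEN, "/")

excerpt = "PFpA4ntd/j58gtZuyiJu+9DdFEgv7lOKdVty6ynw"
broken = json.loads(faulty_serialize({"val": excerpt}))["val"]
fixed = json.loads(serialize_with_workaround(excerpt))["val"]

print(broken == excerpt)   # the unprotected round trip alters the string
print(fixed == excerpt)    # the masked round trip preserves it
print(len(base64.b64decode(fixed, validate=True)))
```

The tokens work because none of their characters belong to the Base64 alphabet, so they cannot collide with real data.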

----------------------------- Additional Watson Details -----------------------------

Watson Bug ID:	3837347

External Customer Info:
External Company:  
External Customer Name: js.nelson
External Customer Email:  
External Test Config: My Hardware and Environment details:

Windows Server 2008 R2

IIS 7.5.7600.16385

64-bit ColdFusion 11 Update 1



Running on a VMWare Virtual Machine:

Intel Xeon E2-2680 v2 @ 2.8GHz (2 processors)

8 GB RAM

64-bit OS

Attachments:

October 22, 2014 00:00:00: 1_TestImage.jpg
March 10, 2015 00:00:00: 2_Stryker_model_MW3.png
March 18, 2015 00:00:00: 3_jsonFail.zip

Comments:

js.nelson,
I can observe in general that things are working with some images and fail with others (especially larger images).
But I can see that this works when dealing with images with "u+", in at least one case.
Perhaps you can share some data (an image perhaps, smaller the better) which on decoding would result in an error? We can then compare notes, when processing the same data.
I have not been able to find the position where the mismatch begins, because an out-of-memory error results when processing large images.
I used CF11 release build on Win 7x64 for the test.

Thanks.

Test code:
<cfscript>
	// Pick one sample file; the commented-out lines are alternative test images.
	//base_file = "#expandpath("./")#cf_app.jpg";
	//base_file = "#expandpath("./")#com-object-properties.png";
	base_file = "#expandpath("./")#btn-default-medium-bg.gif";

	// Read the file and encode it as Base64.
	bin_img = fileReadBinary(base_file);
	b64_img = BinaryEncode(bin_img, "Base64");

	// Round-trip the Base64 string through JSON inside a struct.
	b64_stu = structNew();
	b64_stu.val = b64_img;

	srl_b64_stu = serializeJSON(b64_stu);
	dsrl = deserializeJSON(srl_b64_stu);

	org_str = toString(b64_img);
	dcd_str = toString(dsrl.val);

	// Compare growing prefixes and stop at the first mismatch.
	// compare() is case-sensitive; NEQ is not, and case matters in Base64.
	for(c=1; c <= Len(dcd_str); c++)
	{
		org_substr = mid(org_str, 1, c);
		dcd_substr = mid(dcd_str, 1, c);
		if(compare(org_substr, dcd_substr) NEQ 0)
		{
			// Concatenate with &; + is the arithmetic operator in CFML.
			writeOutput("diff pos: " & c & "<br>");
			writeOutput("org_substr: " & org_substr & "<br>");
			writeOutput("dcd_substr: " & dcd_substr & "<br>");
			abort;
		}
		else
			writeOutput( "substr" & c & "matched" & "substr:" & org_substr & "<br>" );
	}

	// Decode the round-tripped string; this throws if it is no longer valid Base64.
	dcd_img = BinaryDecode(dsrl.val, "Base64");

	if(compare(toString(bin_img), toString(dcd_img)) EQ 0)
		writeOutput("The decoded string matches the original.");
	else
		writeOutput("The decoded string does NOT match the original.");
</cfscript>

Comment by Piyush K.

10687 | October 22, 2014 06:29:33 AM GMT

I ran your code with the test image that I just attached and it did indeed fail:

diff pos: 775
org_substr: /9j/4AAQSkZJRgABAgAAAQABAAD/2wBDAAEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/2wBDAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/wAARCAHgAoADASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAABgAEBQcICQoBCwMC/8QAfhAAAAMDCAYIAwUDBAYOERALAQMEAAURBhMUITFBYfAHUXGBkaECCCMkscHR4RU08QkWM0RUEiVkFzVDdAoiJjJTYycoNjdFRlJVZYKEhZSVGEJHSFZmcnWSlqKkpbXF1eIpODlXWGJnd3iGiKa3uNbmGRpop6i0tsLG0tf/xAAeAQABBAIDAQAAAAAAAAAAAAAGBAUHCAAJAQMKAv/EAE4RAAEBBQQHBgQEAwUGBwABBQERAgQFITEABkFRAwcSFGFxgQgVIpGh8CSxwdEJMuHxExZCJTRS0uIXI2KCkqImNTZEcrLCGCczRUZU/9oADAMBAAIRAxEAPwD1su06Z/AuqGz3vDBidH21l3nqtjfHlGpo9zoz52f1eeqseGuOxiA4mZ/A1X47wv8ARtYlwYI+CEB9fERBWop9VrjLmWxp6+LDpmBPmmOdMLR50wR2B++ENt+7V4NHnf4AjiA7boePFpA4ntd/j58gtZuyiJu
dcd_substr: /9j/4AAQSkZJRgABAgAAAQABAAD/2wBDAAEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/2wBDAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/wAARCAHgAoADASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAABgAEBQcICQoBCwMC/8QAfhAAAAMDCAYIAwUDBAYOERALAQMEAAURBhMUITFBYfAHUXGBkaECCCMkscHR4RU08QkWM0RUEiVkFzVDdAoiJjJTYycoNjdFRlJVZYKEhZSVGEJHSFZmcnWSlqKkpbXF1eIpODlXWGJnd3iGiKa3uNbmGRpop6i0tsLG0tf/xAAeAQABBAIDAQAAAAAAAAAAAAAGBAUHCAAJAQMKAv/EAE4RAAEBBQQHBgQEAwUGBwABBQERAgQFITEABkFRAwcSFGFxgQgVIpGh8CSxwdEJMuHxExZCJTRS0uIXI2KCkqImNTZEcrLCGCczRUZU/9oADAMBAAIRAxEAPwD1su06Z/AuqGz3vDBidH21l3nqtjfHlGpo9zoz52f1eeqseGuOxiA4mZ/A1X47wv8ARtYlwYI+CEB9fERBWop9VrjLmWxp6+LDpmBPmmOdMLR50wR2B++ENt+7V4NHnf4AjiA7boePFpA4ntd/j58gtZuyiJ?

The character at 776 is a +.  So, it's a situation where there is a u+.  The full unicode code for that character is "u+9DdF" which happens to be the next piece of the string.  When ColdFusion 11 encodes that, it converts the "u+" notation to the JavaScript "\u" notation which obviously alters the string, corrupting the image data.

Comment by External U.

10688 | October 22, 2014 10:45:49 AM GMT

It does not happen consistently because it would need to be a "u+" followed by 4 "HEX" characters (0-9, A-F).  I don't know how frequently that type of string would occur when encoding an image, but it does happen sometimes.  The thing that I find especially confusing is why CF 10 handles this image just fine, but CF 11 does not.
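
One way to check whether a given Base64 string is at risk is to scan it for that pattern. The Python sketch below is an illustration only; the regular expression is an assumption built from this thread ("u+" followed by four hex characters, plus the uppercase "U+" variant that a later comment reports as affected too), and find_risky_sequences is a hypothetical helper name.

```python
import re

# Assumed trigger pattern: "u+" or "U+" followed by exactly four hex digits.
RISKY = re.compile(r"[uU]\+[0-9A-Fa-f]{4}")

def find_risky_sequences(b64_text):
    """Return (offset, matched text) pairs for every sequence that looks like U+XXXX."""
    return [(m.start(), m.group()) for m in RISKY.finditer(b64_text)]

# Excerpt taken from the problem description of this ticket.
hits = find_risky_sequences("PFpA4ntd/j58gtZuyiJu+9DdFEgv7lOKdVty6ynw")
print(hits)
```

An empty result means the string contains nothing that resembles the U+XXXX notation, which would explain why many images serialize without any problem.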

Comment by External U.

10689 | October 22, 2014 10:51:40 AM GMT

Same error with CF 10 Update 14 or CF 10 Update 15. With CF 10 Update 13 the error doesn't occur.

Vote by External U.

10709 | February 05, 2015 03:00:50 AM GMT

The u+abcd encoding issue is causing JSON serialization to break in my mobile app.  I don't understand why you don't provide a mechanism to bypass that additional level of encoding so the return strings are not altered.

Vote by External U.

10710 | February 06, 2015 10:44:49 AM GMT

Base64 Strings Mishandled During Serialization

Vote by External U.

10711 | February 24, 2015 03:56:35 AM GMT

We need that for production environments of middle sized companies. Implementing workarounds (if any exist) is no option. This is a blocker indeed!

Vote by External U.

10712 | February 24, 2015 03:57:16 AM GMT

Plz fix this, the workaround is very annoying.

Vote by External U.

10713 | February 24, 2015 04:00:36 AM GMT

Base64 Strings Mishandled During Serialization

Vote by External U.

10714 | February 24, 2015 04:03:22 AM GMT

I can also replicate this with a large png image.  Please fix!

Running: hf-10-00015

Vote by External U.

10715 | March 10, 2015 04:54:58 AM GMT

Same problem trying to pass json to elasticsearch. Errors on "\u".

Vote by External U.

10716 | March 17, 2015 10:47:39 AM GMT

Per an email exchange with Elishia Dvorak <edvorak@adobe.com>, I have attached a zip file with code you can use to test this issue:

Here is the text of my email:

From: Gerry Gurevich <gerry.gurevich@gmail.com>
Date: Wed, Mar 11, 2015 at 9:12 PM
Subject: Bug ID CF-3837347
To: elishia@adobe.com

https://bugbase.adobe.com/index.cfm?event=bug&id=CF-3837347

Thanks for the response today.  I do not find the suggested workaround to actually work in my case.  I ended up with my own workaround that is computationally intensive and potentially prone to failure.  I've provided enough data in the attached zip file for Adobe engineers to work with.  You'll note that close inspection of the stringComparison.txt file shows that there is no quick easy workaround after serialization. If you search for \u in the second string, you will find that it replaces both lower case and uppercase variants from the first string of U+ and u+.  Turning it back means that you need to know the original case.
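
The loss of case information can be illustrated with a small Python sketch. The function simulate_faulty_escape is a hypothetical imitation of the symptom described here, not ColdFusion's actual code.

```python
import re

def simulate_faulty_escape(text):
    # Hypothetical imitation of the reported behaviour:
    # rewrite "u+XXXX" or "U+XXXX" (four hex digits) as "\uXXXX".
    return re.sub(r"[uU]\+([0-9A-Fa-f]{4})", r"\\u\1", text)

lower = simulate_faulty_escape("iJu+9DdFEg")
upper = simulate_faulty_escape("iJU+9DdFEg")

# Two different inputs collapse to the same output, so a search-and-replace
# on "\u" afterwards cannot know which case to restore.
print(lower)
print(lower == upper)
```

Because the mapping is many-to-one, any repair applied after serialization has to guess, and in Base64 a wrong guess changes the decoded bytes.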

My workaround was to leave this field out of the serialized json and then do a series of replacements to inject a json formatted string into each record.  That works well enough for a small data set. But these are large strings and that is a lot of string manipulation which could easily overflow the available memory in the JVM without taking extraordinary measures.

According to the ticket, this worked in CF10HF13 and no longer works in CF10HF15.  I can't verify that it ever worked since this is new functionality for me.  But it is clearly broken now.  I'd like to see the priority elevated.  

Again, thanks for your time.  I hope to hear something on this status soon.

Comment by External U.

10690 | March 18, 2015 02:48:23 AM GMT

I logged another bug (I didn't find this bug report until just now.) when our application broke in the same way when we updated CF10: #CF-3941059

The problem as I see it is that the fix in #CF-3561029 is broken.
SerializeJSON("xU+a600x") should result in "xU+a600x".
SerializeJSON("xꘀx") should result in "x\ua600x".
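
For reference, this is how a conforming serializer treats the two cases. The sketch uses Python's json module as a stand-in; it is an illustration, not ColdFusion output.

```python
import json

# Eight ASCII characters that merely look like Unicode notation: nothing to escape.
ascii_only = json.dumps("xU+a600x")

# One non-ASCII character (code point A600) between two x's: escaping it is allowed.
with_vai = json.dumps("x\ua600x")

print(ascii_only)
print(with_vai)
```

The escape is driven by the character actually present in the string, never by text that happens to spell out a code point.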

Comment by External U.

10691 | March 18, 2015 04:36:17 AM GMT

This is also a bug in ColdFusion 10 Update 15 that just started hitting us.

Vote by External U.

10717 | March 20, 2015 07:54:50 AM GMT

We were attempting to serialize a query that had been base 64 encoded that contained a MS word smart quote. As of Update 12 of CF10, we had no issue, right after the upgrade, this particular functionality started bombing in production.

Comment by External U.

10692 | March 20, 2015 07:57:01 AM GMT

As Jonas Meller has mentioned, we have changed the way a Unicode char gets serialized, wherein U+xxxx gets converted to \uxxxx (fixed in #CF-3561029). We are doing it as per the spec. The spec can be referred to at http://www.ietf.org/rfc/rfc4627.txt (section 2.5)

Any character may be escaped.  If the character is in the Basic
   Multilingual Plane (U+0000 through U+FFFF), then it may be
   represented as a six-character sequence: a reverse solidus, followed
   by the lowercase letter u, followed by four hexadecimal digits that
   encode the character's code point.  The hexadecimal letters A though
   F can be upper or lowercase.  So, for example, a string containing
   only a single reverse solidus character may be represented as
   "\u005C".

Justin, we tried to make the serialization proper but it broke your functionality. Let me know how we can solve your issue?

Comment by Awdhesh K.

10693 | March 30, 2015 04:35:02 AM GMT

No, you are not following the spec.

"U+a600" is a string consisting of 6 characters:
* Latin Capital Letter U (U+0055)
* Plus Sign (U+002B)
* Latin Small Letter A (U+0061)
* Digit Six (U+0036)
* Digit Zero (U+0030)
* Digit Zero (U+0030)
It could potentially be encoded as (but please don't do this): "\u0055\u002b\u0061\u0036\u0030\u0030".

"ꘀ" is a string consisting of 1 character:
* Vai Syllable Je (U+A600)
If encoded the result is "\ua600".
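
The distinction can be checked mechanically. The Python sketch below is an illustration only; it lists the code points of the six-character string and shows that the two escaped forms decode to different things.

```python
import json

literal = "U+a600"

# Six separate ASCII characters, each with its own code point.
code_points = [f"U+{ord(ch):04X}" for ch in literal]
print(len(literal), code_points)

# Escaping every character is legal JSON and still decodes to the same six characters.
escaped_all = '"\\u0055\\u002b\\u0061\\u0036\\u0030\\u0030"'
print(json.loads(escaped_all) == literal)

# The six-character escape \ua600 decodes to one character, code point A600.
single = json.loads('"\\ua600"')
print(len(single), f"U+{ord(single):04X}")
```

So "\ua600" is a faithful encoding of the single character only; using it for the six-character text changes the data.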

Comment by External U.

10694 | March 30, 2015 05:49:12 AM GMT

Awdhesh,

How does your response relate to the issues we see with the PNG file attached to this bug? It seems to me that there are no Unicode characters but rather sequences of codes that appear to be Unicode. Perhaps I'm not understanding the underlying data in the PNG file. Maybe there really are Unicode characters. I have provided very good resources in the jsonFail.zip file attached to this ticket.

Can you explain how this should be handled if the current CF code is working as expected?

Comment by External U.

10695 | March 30, 2015 09:59:30 AM GMT

We are hitting this with a major client. It's time to fix such a simple bug!

Vote by External U.

10718 | May 11, 2015 01:56:55 PM GMT

We are having similar serialization issues using the couchbase provider for cachebox, which serializes complex data types for putting into Couchbase. We are having issues both on a CF11 server in testing as well as our CF10 servers that have update 14 (update 13 seems to work okay). The data being cached is a query returned from a MSSQL storedproc call so the only way to deal with the issue is to modify the query in some way to change what is being serialized, like wrapping it in a query-of-query. A really bad option to have to use for production server, so we are stuck on update 13 unless this is fixed.

Vote by External U.

10719 | May 11, 2015 02:01:35 PM GMT

I have encountered this when using caching (cachebox) - which serializes db and other objects under the hood. I have a case with a specific stored procedure if you need additional cases for replication. I can provide an MDF file to attach to an MSSQL db server complete with an SP. If you instantiate the SP using CFSTOREDPROC and produce a result using cfprocresult - and then serialize it, it will. Using the SP inside a cfquery or using Q of a Q to change the nature of the result set (adds a bunch of metadata) "fixes" the problem.

Comment by External U.

10696 | May 11, 2015 02:06:29 PM GMT

FYI for Adobe engineers - this problem does NOT occur on CF10 update 13, but it DOES occur on CF 10 update 14. Update 14 might be a good spot to look.

Comment by External U.

10697 | May 11, 2015 02:07:32 PM GMT

I just read some of these notes. Come on Adobe folks you can seriously say "well we know it doesn't work but we are just doing it per the spec." That's not acceptable is it?

If I serialize ANY object via base64 I should be able to take the string and DE-serialize it right? I don't really "care" how it's encoded - per the spec, not per the spec etc. I only care that I can serialize an object, store it, and get it back. That's how a lot of cache mechanisms work these days.

Clearly this is broken - let's take a closer look and FIX it before we throw up our hands and refer back to "the spec".  Thanks guys.

Comment by External U.

10698 | May 11, 2015 02:21:29 PM GMT

See my comment on the other ticket (https://bugbase.adobe.com/index.cfm?event=bug&id=CF-3941059). You've misread the spec, Awdhesh.

Pls just sort it out.

-- 
Adam

Comment by External U.

10699 | May 12, 2015 12:22:57 PM GMT

This is a very frustrating bug. If CF serializes something it should be able to de-serialize what it serialized.

Vote by External U.

10720 | May 12, 2015 12:25:44 PM GMT

We are running into this too after recently upgrading to ColdFusion 11. It seems this has affected many users. Does Adobe have any plan for when this will be fixed?

Comment by External U.

10700 | May 29, 2015 12:32:27 AM GMT

When will Adobe finally fix this bug?

Comment by External U.

10701 | June 15, 2015 05:21:38 AM GMT

We cannot send pdf files via rest services due to jsonserialize doing bad things to the base64 string.

Is there a target date on the delivery of the resolution?

Vote by External U.

10721 | June 18, 2015 09:57:17 AM GMT

Please explain in 25 characters or more how this bug impacts productivity and why you are adding a vote.

Vote by External U.

10722 | July 30, 2015 11:59:26 AM GMT

This is extremely frustrating for us, because Adobe is making a habit of ONLY fixing bugs on CF11 now, and yet WON'T fix a serious bug like this such that we could upgrade even if we wanted to.

Comment by External U.

10702 | July 30, 2015 11:59:52 AM GMT

We are having this same issue on a project I am working on.

Vote by External U.

10723 | July 30, 2015 12:05:48 PM GMT

In case Adobe has forgotten ColdFusion 10 is supported until 5/16/2017 (https://www.adobe.com/support/products/enterprise/eol/eol_matrix.html#63). This means that Adobe is supposed to be fixing bugs in ColdFusion 10 until 5/16/2017. The last time I checked we are still in 2015 so there's little excuse here other than complete disregard for the customer as to why Adobe is refusing to fix this and other bugs in ColdFusion 10.

Comment by External U.

10703 | July 30, 2015 01:58:04 PM GMT

Wil - this isn't even fixed on CF11 yet!! It's ridiculous that they would break something that affects so many things, and then leave it unfixed this long.

Comment by External U.

10704 | August 11, 2015 12:32:08 PM GMT

@Mary - This bug is a duplicate of CF-3941059. We are evaluating a possible fix for CF-3941059.

Comment by Vamseekrishna N.

10705 | August 18, 2015 10:09:06 PM GMT

Just out of curiosity, why did this ticket get marked as a duplicate when it was submitted before the other one?

Comment by External U.

10706 | August 19, 2015 06:42:28 AM GMT

The fix for CF-3941059 will be made available in the next update.

Comment by Vamseekrishna N.

10707 | August 20, 2015 03:22:18 AM GMT

This bug is being withdrawn as a duplicate, as a similar issue has been raised as part of bug #CF-3941059.
Hence, the fix for the same will be available in the next update for ColdFusion 10 & 11.

Thanks!

Comment by S P.

10708 | October 26, 2015 03:48:11 AM GMT

tracker issue : CF-3837347
