tracker issue | What iT iS dESign studios

Title:

Source code file should not need special charset encoding instructions for them to compile properly

Status/Resolution/Reason: Closed/Withdrawn/CannotReproduce

Reporter/Name(from Bugbase): Adam Cameron / Adam Cameron (Adam Cameron)

Created: 10/07/2012

Components: General Server

Versions: 10.0

Failure Type: Data Corruption

Found In Build/Fixed In Build: Final /

Priority/Frequency: Critical / Most users will encounter

Locale/System: English / Win All

Vote Count: 2

Problem Description:
If a source code file uses (for example) UTF-8 encoding, then one needs to TELL the CF compiler this, with a <cfprocessingdirective> tag.

Steps to Reproduce:
Put some non-ASCII UTF-8 text into a CFM file, compile it, and inspect the results. For example:
<!---<cfprocessingdirective pageencoding="UTF-8">--->
<cfset message = "हैलो दुनिया">
<cfoutput>#message#</cfoutput>

Actual Result:
à¤¹à¥ˆà¤²à¥‹ à¤¦à¥?à¤¨à¤¿à¤¯à¤¾

Expected Result:
हैलो दुनिया
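
The Actual Result above is consistent with UTF-8 bytes being decoded as Windows-1252. A minimal sketch in Python (not ColdFusion's own code) reproduces the same mis-decoding. The Hindi test string is an assumption, inferred from the byte pattern of the garbled output.

```python
# Illustrative only: this is not ColdFusion's code, just the same decoding mistake.
text = "हैलो दुनिया"  # ASSUMED original test string ("Hello world" in Hindi)

# What an editor writes to disk when the file is saved as UTF-8.
utf8_bytes = text.encode("utf-8")

# Decode as Windows-1252, a common Windows default. Byte 0x81 has no
# Windows-1252 mapping, so it is replaced and then shown as "?".
misread = utf8_bytes.decode("cp1252", errors="replace").replace("\ufffd", "?")

# Decoding with the encoding the file was really saved in recovers the text.
correct = utf8_bytes.decode("utf-8")

print(misread)
print(correct)
```

Each Devanagari character takes three bytes in UTF-8, which is why every one of them turns into three Windows-1252 characters.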

Any Workarounds:
Don't care.  Should just work.  Any text processor worth its salt (even NOTEPAD) can just work it out for itself.  CF should be able to as well.

----------------------------- Additional Watson Details -----------------------------

Watson Bug ID:	3342141

External Customer Info:
External Company:  
External Customer Name: Adam Cameron.
External Customer Email:  
External Test Config: My Hardware and Environment details:

Attachments:

Comments:

Hi Adam,

I see the correct result, but it's b/c I use the JVM arg: -Dfile.encoding=UTF-8 to patch-up that UTF-8 hole.  (I say 'patch-up' just b/c I want UTF-8 everywhere, by default)

Basically, CF reads URL and Form input as UTF-8 by default.  And it encodes output as UTF-8 by default.  BUT, it reads files in the OS's default encoding by default.  =(

If CF can't include the -Dfile.encoding=UTF-8 JVM arg by default, then this would probably be a good CF Admin setting.  I'd kinda like the CF Admin's JVM page to have a set of checkboxes for the most common args that people add (rather than having to remember them).

Additionally, it'd be nice if CF had a charsetDetect() function which tried to guess the encoding of a string and of a text file.
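
As a rough idea of what such a function might do, here is a minimal sketch in Python. The name guess_encoding and the cp1252 fallback are hypothetical, and this is not an existing CF function. It checks for a byte-order mark, then tries a strict UTF-8 decode, and otherwise falls back to a legacy code page.

```python
import codecs


def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Hypothetical helper: guess a text encoding from raw bytes."""
    # A byte-order mark is the strongest hint available.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Non-ASCII text that happens to be valid UTF-8 is rarely an accident,
    # so a successful strict decode is treated as UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: anything else is in the legacy fallback code page.
        return fallback


print(guess_encoding("हैलो दुनिया".encode("utf-8")))
print(guess_encoding("café".encode("cp1252")))
```

This is only a heuristic: it cannot tell one single-byte code page from another, and it does not handle UTF-32.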

Thanks,
-Aaron

Comment by External U.

17696 | October 08, 2012 12:31:06 PM GMT

Very valid, I'm having to publish code with cfprocessingdirective because of that now, as I don't wanna have to change the JVM args on all my instances.

Vote by External U.

17698 | November 05, 2012 06:26:26 AM GMT

As someone in a hosted environment, I can't change the JVM args to get around this.  I run a manga/anime review site and a personal site which regularly uses Japanese romaji characters. This bug has caused me so many headaches trying to figure out why my characters suddenly went haywire even though I had all the proper encodings set for display and in the database.  Please fix this, hopefully in both 10 and 9.

Vote by External U.

17699 | June 04, 2013 09:00:13 PM GMT

Unable to observe the issue with an unpatched CF10 (Windows Server 2008/IIS), with no encoding arguments in the jvm.config file.
@Adam, have you saved that test CFM file in UTF-8 format? Can we take a look at your jvm.config file?

Comment by Piyush K.

17697 | July 24, 2013 03:56:20 AM GMT

tracker issue : CF-3342141