tracker issue : CF-3924625

Title:

[ANeff] ER for: canonicalizeURL(inputString, restrictMultiple, restrictMixed[, throwOnError=false])


Status/Resolution/Reason: To Fix//

Reporter/Name(from Bugbase): Aaron Neff / Aaron Neff (Aaron Neff)

Created: 01/21/2015

Components: Language

Versions: 11.0

Failure Type: Enhancement Request

Found In Build/Fixed In Build: CF11_Final /

Priority/Frequency: Trivial / Unknown

Locale/System: English / Win All

Vote Count: 3

canonicalize(myURL) is broken b/c it incorrectly interprets some query string parameters as character entities and converts them to symbols.

Example:

writeOutput(canonicalize("http://www.domain.com/?foo=bar&pid=product_id", true, true, false));

returns: http://www.domain.com/?foo=barπd=product_id  (note the ampersand is gone and there's a Pi symbol between 'r' and 'd')

Thus, URLs are a special case and a URL-specific canonicalizeURL() function is needed that takes same parameters as canonicalize(). Example:

<cfscript>
  // Canonicalizes a URL b/c canonicalize() converts, for example, &pi to the Pi symbol in the query string ?foo=bar&pid=product_id
  function udfCanonicalizeURL(required string inputString, required boolean restrictMultiple, required boolean restrictMixed, boolean throwOnError=false) {
	  var canonicalizedURL="";
	  ARGUMENTS.inputString = trim(ARGUMENTS.inputString);
	  if(isValid("url", ARGUMENTS.inputString)) {//note: has a bug per #3924581
		  var pattern = "([^?##]*)?(\?([^##]*))?(##(.*))?";//parses the URL into schemeHostPath, querystring and fragment
		  var parsedURL = reFind(pattern, ARGUMENTS.inputString, 1, true);
		  if(parsedURL.len[2]) {//2=schemeHostPath 
			  canonicalizedURL &= canonicalize(mid(ARGUMENTS.inputString, parsedURL.pos[2], parsedURL.len[2]), ARGUMENTS.restrictMultiple, ARGUMENTS.restrictMixed, ARGUMENTS.throwOnError);
			  if(parsedURL.len[4]) {//4=querystring
				  var qs = mid(ARGUMENTS.inputString, parsedURL.pos[4], parsedURL.len[4]);
				  var canonicalizedQS="";
				  var qsPairs = reMatch("[\&;]?[^\&;]+", qs);
				  for(var qsPair in qsPairs) {
					  var qsPairNoDelim = listLast(qsPair, "&;");
					  canonicalizedQS &= ((reFind("^[\&;].*", qsPair)?left(qsPair, 1):'') & canonicalize(listFirst(qsPairNoDelim, "="), ARGUMENTS.restrictMultiple, ARGUMENTS.restrictMixed, ARGUMENTS.throwOnError));
					  var qsValueStartPos = find("=", qsPairNoDelim);
					  if(qsValueStartPos and (len(qsPairNoDelim) gt qsValueStartPos)) {
						canonicalizedQS &= ('=' & canonicalize(right(qsPairNoDelim, len(qsPairNoDelim) - qsValueStartPos), ARGUMENTS.restrictMultiple, ARGUMENTS.restrictMixed, ARGUMENTS.throwOnError));
					  }
				  }
				  if(len(canonicalizedQS)) {
					  canonicalizedURL &= ('?' & canonicalizedQS);
				  }
			  }
			  if(parsedURL.len[6]) {//6=fragment
				  canonicalizedURL &= ("##" & canonicalize(mid(ARGUMENTS.inputString, parsedURL.pos[6], parsedURL.len[6]), ARGUMENTS.restrictMultiple, ARGUMENTS.restrictMixed, ARGUMENTS.throwOnError));
			  }
		  }
	  } else if(ARGUMENTS.throwOnError) {
		  throw(message = "URL is not valid");
	  }
	  return canonicalizedURL;
  }
  theURL = "http://www.domain.com/?foo=bar&pid=product_id";
  writeOutput(canonicalize(theURL, true, true, false) & '<br> ' & udfCanonicalizeURL(theURL, true, true, false));
</cfscript>

----------------------------- Additional Watson Details -----------------------------

Watson Bug ID:	3924625

Reason:	BugVerified

External Customer Info:
External Company:  
External Customer Name: Aaron
External Customer Email:

Attachments:

  1. January 21, 2015 00:00:00: 1_3924625.cfm

Comments:

Related ticket: CF-3861951
Comment by External U.
8874 | January 21, 2015 03:54:12 AM GMT
Attached code as CF-3924625.cfm
Comment by External U.
8875 | January 21, 2015 03:56:02 AM GMT
+1 ////////////////////////////////
Vote by External U.
8879 | February 04, 2015 09:34:50 AM GMT
This is 25+ chars so that I can upvote this issue.
Vote by External U.
8880 | October 08, 2015 03:52:39 PM GMT
Note: urlDecode() cannot be deprecated until this ticket is fixed b/c the ESAPI functions do not offer a fully-compatible replacement for urlDecode(). Example:
<cfscript>
  queryString = "foo=bar&timestamp=2016%2d04%2d07T18%3a18%3a41Z";
  writeOutput(urlDecode(queryString));//returns "foo=bar&timestamp=2016-04-07T18:18:41Z" (good)
  writeOutput(decodeFromURL(queryString));//throws: org.owasp.esapi.errors.IntrusionException: Input validation failure (b/c "&times" is treated as character entity "×")
  writeOutput(decodeFromURL("foo=bar&timestamp"));//returns "foo=bar×tamp"
  writeOutput(canonicalize(queryString, true, true));//returns empty string ""
</cfscript>
Current workaround: use urlDecode() or a UDF
Proposed workaround: add canonicalizeURL(inputString, restrictMultiple, restrictMixed[, throwOnError=false]) to the language, as described in this ticket's description.
Thanks!, -Aaron
Comment by External U.
8876 | April 07, 2016 01:32:03 PM GMT
Hi Adobe, Can this please be added in Aether? We need a built-in function that can canonicalize URLs properly, by canonicalizing each query string parameter's name and value individually. Please see the udfCanonicalizeURL() example function in this ticket's description. Thanks!, -Aaron
Comment by Aaron N.
8877 | September 30, 2017 09:46:07 PM GMT
This is being evaluated for CF Aether.
Comment by Vamseekrishna N.
8878 | October 03, 2017 05:16:30 AM GMT
Hi Adobe, Do you realize you basically already wrote the requested code, albeit in Java, when coding the fix for CF-3861951 (which, in turn, was a fix for improperly-fixed CF-3080158)? All that needs to be done is _surfacing_ that already-written code as a canonicalizeURL() BIF.
History on this issue: CF10 temporarily removed the default cfform action to resolve an XSS issue. Its removal, however, caused URL parameters to vanish when passed into a CF AJAX container, after submitting a form w/in the container (see CF-3080158). In CF-3080158, you improperly fixed the issue by encoding the entire query string (even though I'd warned that each name and value would need to be encoded individually). In CF-3861951 (see my full repro code in my '12/06/2014 06:15:40 GMT' comment), you fixed the issue by handling each URL parameter's name & value individually. And _that_ functionality is what this ticket here (CF-3924625) is requesting to be surfaced as a BIF.
Basically: to surface what's already been written, so that we can also properly canonicalize our query strings simply by using canonicalizeURL(). I hope this could still be considered for the next CF version (CF2020?). Thanks!, -Aaron
Comment by Aaron N.
30355 | February 21, 2019 07:30:22 AM GMT
+1
Vote by Matthew P.
30951 | June 24, 2019 12:31:31 PM GMT
Hi Adobe, Can this please be added in CF2020? Please see my earlier comment. I believe you've already written the functionality, but have not yet exposed it as a BIF. Thanks!, -Aaron
Comment by Aaron N.
31302 | September 07, 2019 08:22:44 AM GMT