tracker issue : CF-4202643

Title:

[ANeff] Bug for: cfsearch term highlighting is an absolute mess to get working correctly

Status/Resolution/Reason: To Fix//Investigate

Reporter/Name(from Bugbase): Aaron Neff / ()

Created: 05/30/2018

Components: Text Search, Solr

Versions: 2018

Failure Type: Enhancement Request

Found In Build/Fixed In Build: 2018.0.01.308605 (PreRelease) /

Priority/Frequency: Normal / Some users will encounter

Locale/System: / Platforms All

Vote Count: 0

Issue: cfsearch term highlighting is an absolute mess to get working correctly

Steps to Reproduce:

Application.cfc
-----------
component {
  THIS.name = "MyApp"

  public boolean function onRequestStart() {
    if(URL.keyExists("reinit")) {
      cfcollection(action="list", name="allCollections")
      if(!allCollections.valueArray("name").findNoCase("MyCollection")) {
        cfcollection(action="create", collection="MyCollection")
      }
      myQuery = queryNew("collectionKey,collectionTitle,collectionBody", "varchar,varchar,varchar", [[1,"Term highlighting with Adobe ColdFusion 2018 Release","By default, Solr highlights searched terms in the summary content. Not even the title (even tho solrconfig.xml defaults to 'summary title'), just the summary! This is pointless because that summary is just the first 100 or so characters of content. To correct this bogus behavior you must do many steps: 1) On CF Admin's collections page, click on the collection and then click the 'Enable' Term Highlighting button, 2) That page says the collection will need reindexed, but it doesn't go ahead and re-index for you, 3) On CF Admin's Collections page, see there is no 'Re-index' button, but there is an 'Index' button, 4) Click the 'Index' button, 5) Under 'Index Collection', click the 'Submit' button, 6) See error 'Please enter a valid Directory Path for this collection.', 7) Laugh in annoyance, 8) Run cfindex(action='refresh', ..), 9) See Term Highlighting still only highlights the useless Summary, 10) Think to yourself: 'Should I also click the Reload button on the Collectons page?', 11) Yes, LOL, do that, 12) NOW see Term Highlighting finally highlights terms in the document. Note: Clicking CF Admin's 'Reload' alone doesn't resolve the issue; you must also do cfindex(action='refresh', ..) Doh!! Now stop wasting your time w/ CF's shenanigans and backwards behavior and go do something useful with your time :)"]])
      cfindex(action="refresh", collection="MyCollection", query="myQuery", key="collectionKey", title="collectionTitle", body="collectionBody")
    }
    return true;
  }
}

index.cfm
-----------
<cfscript>
  cfparam(name="URL.type", default="standard")
  cfparam(name="URL.criteria", default="")
  cfsearch(name="cfsearchResult", collection="MyCollection", type=URL.type, criteria=URL.criteria, contextpassages="1")
  writeDump(cfsearchResult)
</cfscript>

1) Run above app w/ ?reinit URL parameter to create/index "MyCollection" collection
2) Run above app w/ ?criteria=button and see cfdump shows empty Context
3) On CF Admin's Collections page, click the collection and click 'Enable' Term Highlighting. See the page says collection must be re-indexed, but it doesn't go ahead and re-index for you.
4) On CF Admin's Collections page, click the "Index" button (there is no "Re-index" button).
5) Under "Index Collection", click "Submit" and see error "Please enter a valid Directory Path for this collection." and get annoyed :)
6) Run above app w/ ?reinit URL parameter to re-index "MyCollection" collection
7) Run above app w/ ?criteria=button and see cfdump shows empty Context
8) On CF Admin's Collections page, click the "Reload Collection" button
9) Run above app w/ ?criteria=button and see cfdump finally shows Context content!

Note: steps #6 and #8 can be flipped and result is the same

Question: WHY!?!?!?!? That summary content is always the first 100 characters, or so, of content. It's USELESS for Term Highlighting. 

Suggestion: Just make 'content title' be the default! And remove all the above nonsense/unnecessary steps.

Especially since those on shared hosting don't even have CF Admin access, and so they're stuck w/ that useless Summary term highlighting :/

Attachments:

Comments:

Related ticket: CF-4185364
Comment by Aaron N.
28949 | May 30, 2018 08:47:21 AM GMT
Typo: "Just make 'content title' be the default!" should've been "Just make 'contents title' be the default!" ('contents' instead of 'content')
Comment by Aaron N.
28950 | May 30, 2018 08:49:54 AM GMT
Hi Aaron, I have some input and some follow-up questions. Regarding your points #4 and #5: the collection losing its directory and other settings once it has been created and indexed is a known issue. We have a pre-existing bug (CF-4202648) for that. A Re-index button wouldn't make sense then, would it?
"That summary content is always the first 100 characters, or so, of content. It's USELESS for Term Highlighting." That seems to be the Solr default for the SUMMARY field. I'm not sure at this point whether the length can be changed in any of the Solr config files. One can use the contents of the CONTEXT field for term highlighting, and you can increase the number of characters returned in this field with the "contextbytes" attribute.
"Just make 'content title' be the default!" By 'contents title', do you mean the TITLE field in the returned search query? Would it not make more sense to make CONTEXT the default for term highlighting? After all, TITLE does not contain the searched term, just a short title (in your example: "Term highlighting with Adobe ColdFusion 2018 Release").
Comment by Piyush K.
28986 | June 06, 2018 05:44:31 AM GMT
Hi Piyush, Sorry for the confusion. I meant that the Solr default is useless: <str name="hl.fl">summary title </str>. CF should ship with the useful one: <str name="hl.fl">contents title</str>. IMO, there's no valid use case for the former, only for the latter. Yes? Thanks!, -Aaron
Comment by Aaron N.
29050 | June 15, 2018 06:08:36 AM GMT
Hi Piyush, If you agree w/ the proposal in my previous comment, then the rest of what I said in the ticket's description becomes irrelevant. Basically, developers shouldn't have to go out of their way to get a useful context highlighting setting. Additionally, developers on shared hosting cannot even change that setting. CF would give a better customer experience if it shipped w/ <str name="hl.fl">contents title</str>. Why Solr even defaults to that <str name="hl.fl">summary title </str> setting is beyond me. It's not useful at all! :) Thanks!, -Aaron P.S. Just a note: I cannot see CF-4202648? I think it's not public?
Comment by Aaron N.
29051 | June 15, 2018 06:18:18 AM GMT
Hi Piyush and Pavan, Please ignore the above suggestions. New suggestion: cfindex(highlighting="contents,title", ..)
Result:
1) solrconfig.xml would have `<str name="hl.fl">contents title</str>` (for Standard and Dismax)
2) schema.xml would have `stored="true"` (for the contents field and the title field)
Thoughts? Thanks!, -Aaron
Comment by Aaron N.
29052 | June 15, 2018 07:51:14 AM GMT
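For reference, the two configuration changes proposed in the comment above might look roughly like the fragments below. This is a sketch only: the `hl.fl` value and the `stored="true"` attribute come from this ticket, while the surrounding `<lst name="defaults">` wrapper and the `type` and `indexed` attribute values are assumptions about how the collection's generated files are laid out. Check them against the actual files before relying on them.

```xml
<!-- solrconfig.xml (sketch): highlight the full contents and the title
     instead of the summary. The enclosing <lst name="defaults"> element
     is an assumption about where hl.fl sits in the request handler. -->
<lst name="defaults">
  <str name="hl.fl">contents title</str>
</lst>

<!-- schema.xml (sketch): a field must be stored for Solr to highlight it.
     The type and indexed values here are assumptions; only stored="true"
     is taken from the proposal in this ticket. -->
<field name="contents" type="text" indexed="true" stored="true"/>
<field name="title" type="text" indexed="true" stored="true"/>
```

Per the reproduction steps in the description, a change like this would presumably still need a re-index and a collection reload before it takes effect.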
Sorry, I meant cfcollection(highlighting="contents,title", ..) b/c cfcollection creates solrconfig.xml and schema.xml. Thanks!, -Aaron
Comment by Aaron N.
29053 | June 15, 2018 08:16:08 AM GMT
The reason I was suggesting cfcollection(highlighting="contents,title", ..) is b/c changing CF to default to <str name="hl.fl">contents title</str> would mean everyone's indexes would be larger, even if they didn't plan to use highlighting. Right? For me, I wouldn't care =P But maybe others would? Thanks!, -Aaron
Comment by Aaron N.
29054 | June 15, 2018 08:20:22 AM GMT
Hi Adobe, I got an email saying: ----------- Issue - https://tracker.adobe.com/#/view/CF-4202643 Reason Code updated from 'NeedMoreInfo' to 'PRNeedInfo' ----------- No idea what that's supposed to mean..? Thanks!, -Aaron
Comment by Aaron N.
29508 | August 16, 2018 09:11:43 AM GMT
Filed another bug (4203288) for the "content highlighting" issue that was discovered while investigating this bug, and closed that one. With this ER, the user is basically asking for runtime control to enable or disable content highlighting, possibly through new attributes on the cfindex tag. Sounds reasonable. Can we take it up?
Comment by Piyush K.
29574 | August 21, 2018 11:40:41 AM GMT