
Title:

Exceptions and CFDUMP stall scheduler threads


Status/Resolution/Reason: Closed/Withdrawn/AsDesigned

Reporter/Name(from Bugbase): Kyle Thompson / Kyle Thompson (Kyle Thompson)

Created: 03/06/2016

Components: Scheduler

Versions: 2016,11.0

Failure Type: Non Functioning

Found In Build/Fixed In Build: CF11_Final /

Priority/Frequency: Critical / Most users will encounter

Locale/System: English / Win 2012 Server x64

Vote Count: 0

Problem Description: When a session ends, a scheduler thread cleans it up. If the onSessionEnd function throws an error or uses cfdump, the scheduler thread stalls. I have not been able to reproduce this in CF2016, but it would be worth investigating in each supported version of ColdFusion. Fortunately, if this happens to all of the available scheduler threads, the scheduler seems to recover by creating new threads. The main issue is that the mail spooling thread also runs in the scheduler; when this happens, the mail spool thread gets stuck and does not recover.

Steps to Reproduce: Execute the attached code repeatedly so that a large number of sessions are created and destroyed, which triggers the bug.

Attached is the code to reproduce the issue and a full thread dump.

----------------------------- Additional Watson Details -----------------------------

Watson Bug ID:	4125306

External Customer Info:
External Company:  
External Customer Name: Kyle Thompson
External Customer Email:  
External Test Config: My Hardware and Environment details: Windows Server 2012, CF11u8. Happens on both virtualized and physical hardware.

Attachments:

March 06, 2016 00:00:00: 1_data.zip
April 07, 2016 00:00:00: 2_data2.zip

Comments:

I should probably rename this: long-running stacks seem to break it, and even sleeps could break it.

Comment by External U.

4399 | March 08, 2016 04:23:43 PM GMT

Hi Kyle,
 
I tried on the CF11HF7 and CF10HF18 builds. I am unable to reproduce it with the code you sent. Can you provide more information about your environment so that I can reproduce it? Also, please specify the update you are on.

Thanks,
Suchika

Comment by Suchika S.

4400 | April 07, 2016 07:36:00 AM GMT

I was able to reproduce it with the latest builds, however, you have to make several requests and deadlock the scheduler threads. I've seen this happen in our shared hosting environment. Every time it happened, it was the same site causing the problem in the same place. I can't provide the thread dump of the server on the bug tracker for customer privacy reasons.

I know this is going to be hard to reproduce exactly, and the circumstances in which it happens are very niche. I had the customer disable their error checking, and the server returned to normal and hasn't had a problem since. Their application code is very strange: their Application.cfc inherits an ApplicationProxy.cfc and just calls functions in the application proxy. The problem with this is the scope is being type-cast to a struct which is what is causing the on application. I can't make a reproduction case that uses this problem, but calling cfdump explicitly demonstrates the situation.

I modified it slightly, but it does pretty much the same thing as the original one with fewer cfdumps. If you look at the log snippet in log.txt in data2.zip, you will see that the mail scheduler, which should run every 15 seconds, wasn't processed for 2 minutes. I had 3 concurrent requests running; once I stopped them, the sessions eventually cleaned themselves up. Depending on how large the allowed stack size in the JVM is and how large the struct or object being reflected against is, the stall can last longer than what I was able to produce. The customer's site has a session timeout of 1 day. Most web crawlers do not support cookies and generate a new session on every request they make.

One resolution I can propose is to move "system" threads / tasks to their own scheduler so that "user" code cannot interfere with them. Alternatively, cleanup tasks can have their own thread pool.

Comment by External U.

4401 | April 07, 2016 08:55:21 AM GMT
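
To illustrate the proposal in the comment above (moving "system" tasks to their own scheduler), here is a minimal Java sketch. It is not ColdFusion's scheduler code: the class name, method name, pool sizes, and the 500 ms timeout are all assumptions chosen for the example. It blocks every worker of a "user" pool, standing in for stalled onSessionEnd cleanup, and then checks whether a "mail spool" task can still run.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: names and sizes are illustrative, not ColdFusion internals.
class SchedulerIsolationSketch {

    /**
     * Blocks every worker of the "user" pool, then submits a "mail spool" task.
     * Returns true if the mail task finished within 500 ms.
     * When isolated is false, the mail task shares the blocked pool.
     */
    static boolean mailTaskRuns(boolean isolated) {
        final int workers = 2; // assumed pool size
        ExecutorService userPool = Executors.newFixedThreadPool(workers);
        ExecutorService systemPool = isolated ? Executors.newSingleThreadExecutor() : userPool;
        CountDownLatch started = new CountDownLatch(workers);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < workers; i++) {
                // Stand-in for a session cleanup task stuck in user code.
                userPool.submit(() -> {
                    started.countDown();
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            started.await(); // every user-pool worker is now occupied
            Future<String> mail = systemPool.submit(() -> "mail spooled");
            mail.get(500, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // the mail task never got a thread in time
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // unblock the stuck tasks so the pools can exit
            userPool.shutdown();
            systemPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared pool, mail task ran:    " + mailTaskRuns(false));
        System.out.println("isolated pools, mail task ran: " + mailTaskRuns(true));
    }
}
```

With a shared pool the mail task should time out while the workers are blocked. With a dedicated pool it should complete, because no user task can occupy its thread.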

It is not a best practice to use cfdump on a production site, because the cfdump tag performs poorly due to its use of reflection.

When we took a thread dump, most of the threads were waiting in cfdump code (to be precise, in Java's reflection code). cfoutput can be used instead of cfdump.

Comment by Uday O.

4402 | March 15, 2017 07:51:31 AM GMT
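
The point above about reflection cost, and the earlier remark that the stall grows with the size of the struct being reflected against, can be sketched in Java. This is only an analogy, not how cfdump is implemented: the `Node` class, the `build` helper, and the counting method are hypothetical names invented for the example. The sketch walks an object graph through `java.lang.reflect.Field` and counts the reflective reads, which grow with the number of objects in the graph.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical sketch: an analogy for a reflective dump, not cfdump's actual code.
class ReflectiveDumpSketch {

    /** Illustrative data type: a binary tree node with three instance fields. */
    static class Node {
        String name;
        Node left;
        Node right;

        Node(String name, Node left, Node right) {
            this.name = name;
            this.left = left;
            this.right = right;
        }
    }

    /** Builds a complete binary tree with the given number of levels. */
    static Node build(int depth) {
        if (depth == 0) {
            return null;
        }
        return new Node("n" + depth, build(depth - 1), build(depth - 1));
    }

    /** Counts the reflective field reads needed to walk the graph under root. */
    static int countReads(Object root) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        return visit(root, seen);
    }

    private static int visit(Object obj, Set<Object> seen) {
        // Only descend into Node instances; skip nulls, leaves, and repeats.
        if (!(obj instanceof Node) || !seen.add(obj)) {
            return 0;
        }
        int reads = 0;
        for (Field field : obj.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers())) {
                continue;
            }
            try {
                field.setAccessible(true);
                Object value = field.get(obj); // one reflective read
                reads++;
                reads += visit(value, seen);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return reads;
    }

    public static void main(String[] args) {
        for (int depth : new int[] {1, 3, 10}) {
            System.out.println("depth " + depth + ": " + countReads(build(depth)) + " reflective reads");
        }
    }
}
```

Writing out only the specific values needed touches only those values, which is roughly the trade-off behind preferring cfoutput to cfdump.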

Closing the bug. As Uday suggested, "cfdump" should not be used in production code.

Comment by Suchika S.

4403 | January 29, 2018 07:08:29 AM GMT

tracker issue : CF-4125306