tracker issue : CF-4038865

Title:

Error Executing Database Query. Timed out trying to establish connection


Status/Resolution/Reason: Closed/Withdrawn/CannotReproduce

Reporter/Name(from Bugbase): Leith Tussing / Leith Tussing (Leith Tussing)

Created: 08/18/2015

Components: Database

Versions: 11.0

Failure Type:

Found In Build/Fixed In Build: CF11_Final /

Priority/Frequency: Normal / Few users will encounter

Locale/System: English / Win 2008 Server R2 64 bit

Vote Count: 0

Problem Description: We get intermittent "Error Executing Database Query. Timed out trying to establish connection" errors from applications on our production server. We had not seen these on our development server, but once the applications were moved to the live server they started happening. Our .NET applications run from the same IIS server and connect to the same MS-SQL 2014 SP1 server. The connection issue typically happens once for a website; you can then reload the page and everything works as expected.

Steps to Reproduce: Not consistently reproducible. After forcing the development server to use a -XX:MaxMetaspaceSize=192 value and encrypting all DB connections, including the client variables DB, we could make it happen more frequently. It seems to happen most often when the client variable store DB connection times out and is closed along with that particular website's datasources. We also recycle the CF services nightly; if they are left online, it seems to happen at a much higher rate.

Actual Result: The CF websites seem to hang for 30 seconds before the connection times out, and you get a CF error for the query that was trying to run: "Error Message: Error Executing Database Query. Error Executing Database Query. Timed out trying to establish connection". The full trace is attached as a text file.


Expected Result: All connections connect.

Any Workarounds: None found yet

----------------------------- Additional Watson Details -----------------------------

Watson Bug ID:	4038865

External Customer Info:
External Company:  
External Customer Name: Leith
External Customer Email:  
External Test Config: My Hardware and Environment details:



Windows 2008 R2 64bit Hyper-V VM

IIS 7.5 website connectors are installed per website and not ALL since not all websites use CF (3 websites use CF and 4 others use .NET)

ColdFusion 11 with Patch 5 (connectors rebuilt using latest version)

-Xms2048m -Xmx8192m

Java JRE 1.8.0_51

MS-SQL 2014 SP1 with SSL communication enforced (using a domain signed certificate)

All datasources have "EncryptionMethod=SSL;ValidateServerCertificate=false" to enable encryption but ignore the domain signed certificate 

Client Variables set to MS-SQL DB as default (connection also encrypted)

Attachments:

  1. August 19, 2015 00:00:00: 1_CF11_StackTrace.txt
  2. August 26, 2015 00:00:00: 2_2015.08.25_0924.zip

Comments:

We've tried raising these values as recommended to resolve the issue, but it did not work: -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=1024m. We've tried increasing the Tomcat connection and session settings as documented in the tuning document, and that did not work. We've tried removing the encryption from the client variables database connection, and that did not work. We've tried adding the domain root certificate to the Java JRE to ensure it can resolve, and that did not work. We've thought about increasing the datasource timeout to see if that masks the issue, but that has not been implemented yet.
Comment by External U.
6224 | August 18, 2015 01:33:47 PM GMT
I was working on the production server implementing a monitoring process to detect these occurrences and actually caused one to happen while the Windows Task Manager was open. I saw the coldfusion.exe process spike to 50% CPU usage (4 cores, so 2 full CPUs' worth); at the time it was consuming 1.5 GB of memory. It dropped to below 6 MB of memory and then slowly started climbing back up to about 120 MB. While it was doing this the system gave me the typical DB connection timeout.
Comment by External U.
6225 | August 18, 2015 03:02:47 PM GMT
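For anyone wanting to build a similar monitoring process, here is a minimal Java sketch of a probe that times a single connection attempt and flags it as slow or failed. The class name, method name, and threshold are hypothetical, and this is not the reporter's actual monitor. The connection attempt is passed in as a Callable, so the demo uses a sleep as a stand-in; a real probe would pass something like a DriverManager.getConnection call and close the returned connection.

```java
import java.util.concurrent.Callable;

// Hypothetical probe: times one attempt and reports whether it was slow or failed.
final class ConnectionProbeSketch {

    static final class Result {
        final long elapsedMillis;
        final boolean failed;
        final boolean slow;

        Result(long elapsedMillis, boolean failed, boolean slow) {
            this.elapsedMillis = elapsedMillis;
            this.failed = failed;
            this.slow = slow;
        }
    }

    // Runs the attempt once; any exception (for example a connect timeout)
    // is recorded as a failure instead of being rethrown.
    static Result probe(Callable<?> attempt, long slowThresholdMillis) {
        long start = System.nanoTime();
        boolean failed = false;
        try {
            attempt.call();
        } catch (Exception e) {
            failed = true;
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        return new Result(elapsedMillis, failed, elapsedMillis >= slowThresholdMillis);
    }

    public static void main(String[] args) {
        // Stand-in for a real connection attempt (assumption: no database here).
        Result r = probe(() -> { Thread.sleep(50); return null; }, 20);
        System.out.println("elapsed=" + r.elapsedMillis + "ms failed=" + r.failed + " slow=" + r.slow);
    }
}
```

Run on a schedule with a threshold well under 30 seconds, a probe like this could log the time of each slow or failed attempt so it can be matched against thread dumps.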
I was finally able to catch both a thread dump and a heap dump of the failure happening. When it happens the system typically has a bunch of BLOCKED and WAITING threads sitting around for 30 seconds. All of the other threads that are blocked do nothing, which stops all of the other websites on the CF server from processing. The dump contains the following files; the BLOCK happened at 0923 in the files.
2015.08.25_0924_HeapDump.hprof - Heap Dump mid BLOCK
2015.08.25_0924_Monitor.JPG - VisualVM view of the Monitor tab showing the massive CPU spike, Thread jump, and Heap dump @ 0923
2015.08.25_0924_PSSurvivorSpace.JPG - VisualVM view of the Heap tab showing the PS Survivor Space plummet at 0923
2015.08.25_0924_ThreadDump.tdump - Thread Dump mid BLOCK
2015.08.25_0924_ThreadDump.txt - Thread Dump mid BLOCK (text file)
2015.08.25_0924_Threads.JPG - VisualVM view of the Threads tab showing the WAITING and BLOCKED threads starting at 0923 and lasting for exactly 30 seconds
Comment by External U.
6226 | August 25, 2015 12:46:16 PM GMT
The dump is in the uploaded 2015.08.25_0924.zip file.
Comment by External U.
6227 | August 25, 2015 12:46:45 PM GMT
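The BLOCKED and WAITING thread pile-up described in the thread dump comments can also be counted programmatically. This is a minimal sketch using the standard java.lang.management API; the class and method names are hypothetical. It only inspects the JVM it runs in, so to look at ColdFusion it would have to run inside the ColdFusion JVM or be adapted to a remote JMX connection. A tool such as VisualVM or jstack gives the same information without any code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper: summarises live threads of the current JVM by state.
final class ThreadStateSummarySketch {

    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // false/false: skip lock details, only basic thread info is needed here.
        ThreadInfo[] infos = bean.dumpAllThreads(false, false);
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // thread ended while the dump was being taken
            }
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Thread.State, Integer> counts = countByState();
        System.out.println("BLOCKED=" + counts.getOrDefault(Thread.State.BLOCKED, 0)
                + " WAITING=" + counts.getOrDefault(Thread.State.WAITING, 0)
                + " TIMED_WAITING=" + counts.getOrDefault(Thread.State.TIMED_WAITING, 0));
    }
}
```

Logging these counts every few seconds would show a jump in BLOCKED threads at the moment of a hang, similar to what the VisualVM Threads tab showed at 0923.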
After countless attempts to stop this from happening, the solution we finally found was to run the Microsoft JDBC drivers v4.2 inside of ColdFusion 11. Once we stopped using all Adobe JDBC drivers to connect to our MS-SQL 2014 server, the server stopped being unstable. If even one active DB is using the built-in Adobe MS-SQL driver, the system in both our development and production environments will eventually behave as documented previously. https://www.microsoft.com/en-us/download/details.aspx?id=11774
Comment by External U.
6228 | September 02, 2015 09:36:16 AM GMT
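As an illustration of the driver switch described above, here is a small Java sketch that assembles a connection URL for the Microsoft JDBC driver. The host, database name, and helper class are placeholders, not the reporter's actual settings. The encrypt=true and trustServerCertificate=true properties are the Microsoft driver's rough counterpart to the "EncryptionMethod=SSL;ValidateServerCertificate=false" settings listed in the environment details, and loginTimeout is in seconds.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a Microsoft JDBC driver URL for SQL Server.
final class SqlServerUrlSketch {

    static String buildUrl(String host, int port, String database, int loginTimeoutSeconds) {
        // LinkedHashMap keeps the properties in a predictable order.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("databaseName", database);
        props.put("encrypt", "true");                 // encrypt the connection
        props.put("trustServerCertificate", "true");  // skip certificate validation
        props.put("loginTimeout", Integer.toString(loginTimeoutSeconds));

        StringBuilder url = new StringBuilder("jdbc:sqlserver://")
                .append(host).append(':').append(port);
        for (Map.Entry<String, String> e : props.entrySet()) {
            url.append(';').append(e.getKey()).append('=').append(e.getValue());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Placeholder host and database names (assumptions for illustration).
        System.out.println(buildUrl("sqlhost.example.local", 1433, "appdb", 30));
    }
}
```

In the ColdFusion Administrator a URL like this would typically go into a datasource of driver type "Other", with com.microsoft.sqlserver.jdbc.SQLServerDriver as the driver class and the Microsoft driver jar placed on the ColdFusion classpath.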
Hi Leith, Please provide the following information: 1. Are you able to repro this issue consistently? 2. Settings Summary (details of the datasource being used) 3. Sample Code
Comment by Nimit S.
6229 | September 13, 2015 11:45:01 AM GMT
1 - On our production server, which gets more load, it was happening multiple times a day. The longer the CF11 services were allowed to run, the more often it happened. We were recycling the CF11 services nightly, which reduced the rate of occurrence; if we did not, the rate would increase as the services stayed online. On our development server, which also cycles the CF11 services nightly, we could cause it to happen sometimes, but it took more effort. 2 - I can provide the PDF output of the Settings Summary of the live system, but I would prefer to provide it directly via email and not attach it to the case where others can see it. I highlighted two different datasources in it as examples. One shows the new active Microsoft JDBC based version, and the one with "_ADOBE" on the end of the name is the old datasource we were using before with the built-in MS-SQL driver. We store the cv_store in the DB too, which is also one of those datasources. 3 - I can provide sample code, but like the settings I would prefer to provide it directly via email to someone and not attach it to the case.
Comment by External U.
6230 | September 14, 2015 09:35:19 AM GMT
Hi Leith, I understand your concern. You can send an email to nimsharm@adobe.com NOTE: Please change the extension from .zip to .adobe
Comment by Nimit S.
6231 | September 15, 2015 01:07:17 AM GMT
Email sent. Thank you.
Comment by External U.
6232 | September 18, 2015 08:01:11 AM GMT
Hi Leith, Thanks for sharing all the information, but it has a lot of dependencies. Would it be possible to share a repro case without any dependencies?
Comment by Nimit S.
6233 | September 29, 2015 05:32:25 AM GMT
Hi Leith, Can you please provide a repro case without any dependencies involved?
Comment by Nimit S.
6234 | October 14, 2015 11:06:32 PM GMT
Hi Leith, I would really appreciate it if you could provide a repro case without any dependencies involved.
Comment by Nimit S.
6235 | November 05, 2015 07:43:54 AM GMT
Sorry we've been busy, I will work on seeing if I can get a developer to provide something to me.
Comment by External U.
6236 | November 09, 2015 08:47:50 AM GMT
Thanks Leith.
Comment by Nimit S.
6237 | November 10, 2015 06:40:17 AM GMT
Leith, Is there any update on this issue?
Comment by Nimit S.
6238 | December 08, 2015 08:48:42 AM GMT
Leith, We have not received any update on this bug, so we are closing it. Please feel free to reach out to ColdFusion support at cfsup@adobe.com if you are still facing this issue.
Comment by Nimit S.
6239 | December 20, 2015 12:34:16 AM GMT