Title:
Bug 73802:Can we please get some reporting (in the debug output, or via logging) of how much time a given request spends in cflock waits? We don't hear call about this, but I would argue it's because few unders
| View in TrackerStatus/Resolution/Reason: To Fix//
Reporter/Name(from Bugbase): Charlie Arehart / charlie arehart (Charlie Arehart)
Created: 11/26/2008
Components: Debugging, General
Versions: 9.0
Failure Type: Unspecified
Found In Build/Fixed In Build: 0000 /
Priority/Frequency: Normal / Unknown
Locale/System: English / Platforms All
Vote Count: 20
Problem:
Can we please get some reporting (in the debug output, or via logging) of how much time a given request spends in cflock waits? We don't hear call about this, but I would argue it's because few understand what cflock waits are about.
I frequently help folks who have poor performing pages that are due to waits for attaining CFLOCKs. We can only determine it by timing the code with the locks there or not (and no, for readers who may suggest that throwontimeout=no is the answer, it's not. More on that in a moment).
All I'm proposing is a CF Admin setting on the debugging settings page (like the option of whether to track query times) which would report then at end of page how long the request spent in lock waits. This should be easy: the CFLOCK tag already has a timeout attribute, so CF pays attention to how long it waits to obtain the lock (and no, for readers who think that controls how long the code within the lock should be allowed to take, it does not.)
I'd just want to then see a list in the debugging output of the time spent. It would be great to detail it by CFLOCK (just as query time is shown per cfquery). But I would especially like to see it totalled (I'd like to see CFQUERY times totalled also, but that will be a separate request).
I would also request separately that we have a report in the debug output of how many cflock timeouts were encountered. Again, one may think that a cflock timeout throws an error, but that's controllable with throwontimeout=no, so we really do need to track this. (And anyway, one could catch the timeout in try/catch and handle it otherwise, so a timeout may not create an error.) Knowing how often one has timeouts can be as valuable as knowing the total time spent in lock waits.
I would also argue we also need to be shown how many timeouts are in fact being ignored by the throwontimeout=no. That attribute/value combination is just devastating to those who don't understand it. So many who use it don't realize that it means that the code within the cflock WILL NOT BE EXECUTED if the lock is not obtained. So many people don't understand locking at all, so this confusion just exacerbates the problem of ignoring a timeout when it does happen.
And besides the debugging output, it would also be useful to be able to have this information logged, for admins to manage this when coders don't think to. Just like we have an option to log pages that exceed a given timeout, I might propose we also have an option to log pages whose total time in lock waits exceeds a given time. I realize some may blanch at this, because they think that the current "page too long" log report is enough, but I disagree. Some people accept that some pages take a long time due to the work they need to do, but I'm trying here to highlight problems that are OUTSIDE the scope of what the page is "expected" to be doing. My whole point is that many don't realize how much lock wait contention is adding time burden to their pages. We have no exposure of it.
And besides that logging, it might be nice to have an option to log the details per request, just like we have the option for cftrace stuff to go to a log in addition to the debugging output, for those who do not choose to enable debugging output but might still want to see the details per request.
Finally, I would ask that the Server Monitor and Admin api also both reflect this sort of info, just as they track query time in a page. Again, Admins sometimes are the ones who need to help developer understand the problem. But this can't be only in the monitor, because developers do need to see this info, just like query and trace timing, whether in the debug output or in a log.
And all this is all the more important to be tracked in other than the debugging output, because locking problems tend to happen only in production with multiple users. People don't tend to enable debugging output in production (and shouldn't, even if limited to certain ip addresses, for a variety of reasons that have long been discussed.) So having this info logged or in the Admin API would be all the more important.
PS If you really want to make this powerful, and you do track total lock time per request, it would be helpful to break down the total time in lock waits between the kind of locks (session, application, server, named lock, or the new "request" type added in CF8.01 for work with CFTHREAD), so that one may choose to focus on problems due to a certain kind of cflock wait (separate from each specific lock type).
Method:
Result:
----------------------------- Additional Watson Details -----------------------------
Watson Bug ID: 3036835
External Customer Info:
External Company:
External Customer Name: charlie arehart
External Customer Email: 03D0090C44723473992015D5
External Test Config: 11/26/2008
Attachments:
Comments: