In working with CF 10 on a site that uses Sybase as a backend database server one of our tasks was to convert various "inline" queries over to stored procedures. In some cases these queries used the "maxrow" attribute to limit the number of rows returned to the driver. Personally I usually revert to LIMIT or TOP (or whatever the DB Server syntax provides) for that purpose. With maxrows the DB server works just as hard (my best guess) and the driver simply counts the number of rows sent to the buffer and limits it there. In other words, I have always suspected that maxrows limits the number of rows sent from the DB Server and not the number of rows actually produced by the query. Still there are situations where it hardly matters in either case - and maxrows has a purpose there I guess.
So when it came time to convert our queries over to stored procedures - and with a requirement to change as little as possible on the DB Server - we dutifully added the maxrow attribute to our cfprocresult tag like so:
Recently one of our systems started misbehaving. In this system we had 2 ColdFusion 8 servers connecting to a single MSSQL 2005 server. All the hardware was quite good - plenty of RAM, Fast disks, moderate traffic etc. The system had been in place for some time. But (and isn't this often the case) we moved a new design in place with some changes to the query code and suddenly our well-behaved system started acting like a sugar-laden toddler in the cereal aisle.
Watching the "running request" counter for ColdFusion I noticed that it was slowly accumulating requests. When that happens (threads slowing building up over time) you usually have to prepare for some frustrating troubleshooting ahead. When a server is "crashing" you can often pinpoint the error. Crashing servers tend to suddenly fill up running requests and the request queue and the log files will generally have some clues occurring right around that time. But this was different. In this case the request count climbed slowly and was seemingly random. And these threads did not show up in the list of "active requests" in the CF monitor either. Aha! I thought. This is my old networking issue! You might not remember this but a few years ago we discovered that auto-synching ports can sometimes cause phantom connections to hang on a DB intensive application (see this post).
But a quick checkup of network settings showed that this was not the case. Network connectivity was excellent and both DB and the 2 servers were connected through the same Cisco switch. So it was on to the database. Why the DB? Why not scour through JVM settings and fiddle with CF request settings? For one thing, 80% of the time it's not CF or the DB but some combination of the 2 (bad query writing, resource constrained DB, drivers etc). In this case the 2 common denominators were the database and the new code - but I believed the DB was our lowest hanging fruit.
Sure enough a check of the database showed processor usage that did not look normal. Wait a minute Muse... don't you have any baseline numbers for that assumption? Nope, not at this point. I'm letting my experience guide me. When you have 4 cores and 2 of them are at a flat lined 50% you generally know something is wrong. In fact a quick check of the accumulating requests on CF showed a 20-25% per thread correlation. In other words, each of my hanging threads was using 20% or so of one core on the SQL server. Once it was hung that thread continued to use 20% of one SQL server core in perpetuity until CF was restarted.
The funny thing was that under regular load the DB processors was extremely underutilized until one of these threads was produced. The DB processors would stay at between 1 and 5 percent most of the time - practically idle. When one of these "special threads" came along, one proc out of the 4 of would "jump up" to 20 or 25 percent but the rest would idle along as before.
We tried a great many things. We patched and hot fixed, shrunk and optimized files, added and removed indexing etc - all of which was helpful and necessary, but none of which permanently "fixed" our problem). Finally, I was looking at the "activity monitor" in MSSQL05. The activity monitor "process info" view shows a list of connections along with some extra data, process ID, database, status etc. If you double click on an item in the row you will see the currently running query or task. You have to sort of "get lucky" to see it since most of them fly by pretty fast.
In any case I was watching this view (refreshing every 10 seconds) while there were no hanging threads and suddenly I saw something that made me scratch my head. A process ID was duplicated about 3 or 4 times. Each of the duplications had a "wait time" and a "wait type" of CXPACKET. So this process ID was spawning multiple threads under a single ID. And the wait time made me think that this might be our offending process. Looking at the processor utilization I noticed that sure enough, I had a 20% utilization on one core. Going back to my CF servers my suspicions were confirmed. We had a hanging thread on one of the servers - so this CXPACKET thing required some more investigation.
First however, I thought I might try to mitigate the problem from within the activity monitor by killing off this process ID. If I was successful I would have a new mitigation technique that would not involve any potential user disruption, with the exception of whoever was running the query that was locking up these threads (and they were probably tapping their fingers on the desk waiting anyway). So I tried the "kill process" button from the activity monitor, but I had to kill all of them individually and I couldn't catch them all before they re-spawned - or maybe I'm just too old. Turning to SQL Studio I ran the query KILL 55 (where 55 was the process ID in question). That did the trick and it was indeed a magic bullet. As soon as I "killed" that process ID - all the sub-processes were terminated as well. My CF server dropped the hanging thread and SQL Processor usage went back down to normal.
Ok so now what? I could hire a temp to sit in front of the activity monitor all day and kill off any process ID with a CXPACKET Wait type that correlated to a CF hanging thread. I could probably write a complicated SQL script to find these threads and terminate them (I kind of liked that idea actually). In the end I chose to do a little research into CXPACKET wait types. I was fortunate to stumble onto this post by Pinal Dave. It turns out that a CXPACKET wait is related to parallelism. Now parallelism is how MS SQL chops up the work load of a query and makes full uses of your processors to get the work done. Much like cfthread splits work out and then joins it back together, SQL splits the work out and then an "organizing" thread "waits" for all the individual threads to complete. Once they have all completed it assembles the data for return to the client. Make sure and read the full article as well as the comment by Jonathan Kehayias at the bottom - excellent stuff!
In any case my SQL server was suffering from not being able to reassemble threads from this division of labor. I'm not sure why that might be (I have some ideas) but the long and short of it is that attempts at parallelism for query execution were causing hanging Jrun threads on my CF server. Following Pinal's guide (and a couple of MS resource pages) I tried setting the max degree to 2 and the threshold to 20, 25, 30... looking for a "sweet spot" where most of my queries would execute without parallelism, reserving it for the report or aggregation queries in the admin section of this site. Unfortunately that didn't work. The issue here was likely a specific query with some new joins in it that was always going to trigger parallelism and ofen fail to complete - causing our hanging thread issue.
Finally, I set the "max degree" to 1. Doing this meant that there would be a 1 to 1 relationship between threads, process IDs and queries. In other words, a given query would never use more than one core execution thread. Now you might think this is problematic because it doesn't make full use of SQL's tuning engine. Technically you are right. I would only say that in a typical web application the query traffic generally consists of dozens of very short queries where parallelism would actually add additional time to the process. So in a typical web application you lose very little by minimizing the degree of parallelism. And indeed that appeared to be the case in our web application. Our CXPACKET waits, hanging threads and egregious processor usage all went away and things have been functioning smoothly since then.
The Muse knows his readers well. Some of you want to hammer me about not fixing the real problem - that specific query in the code. Not to worry. Using SQL's performance dashboard we teased out the worst offenders and set our ColdFusion developers to analyzing the code. But I suspect the version of SQL or something about the hardware, hyperthreading or NUMA to be a more likely culprit. I have never seen SQL's execution planner cause a problem when it turns to parallel execution before. Still - it's always a good idea to fine tune that query code.
First, let me make sure I do not get off on the wrong foot here. I am most explicitly not saying that you should be using Access on a production server with heavy traffic as a regular application data source. In my opinion Access should be limited to the smallest use possible on web servers. It's fine if you have a small intranet application to keep track of equipment or handle tickets among 4 or 5 people - although why you wouldn't at least move to MySQL or MSSQL express is puzzling. However (and the Muse can't be clearer than this) if you are using Access to backstop a high traffic ecommerce site or content management for an important customer then shame on you. It's a problem you will have to solve - maybe not today, maybe not tomorrow... but someday and for the rest of your product life cycle.
Having said all that you might remember my post on setting up an Access Proxy. The problem I was solving in that case was quite specific. I was using Access as an export format for an enterprise application. Users would run a very large report and then have the choice to export it to Excel or Access for offline viewing or mining. As far as I'm concerned this is an appropriate use of Access - as an offline data mining tool. My process basically copied the file to a unique location and used the Access Passthrough method to populate it, zipped it up and sent it to the users browser.
When I moved to 64bit ColdFusion I was forced to come up with an elaborate proxy service to insure that the process continued to function as usual. This was because I could not figure out how to get Sequelink (those funky "ODBC Server" and "ODBC Agent" services that install with ColdFusion) to talk to ColdFusion any more in the 64bit environment. Recently however, Brent Fry delivered this excellent post as well as provided me with some additional info for the windows 2003 server platform. So I now have a new process that is capable of synching up Sequelink with ColdFusion.
There's nothing wrong with Brent's post of course, but I have a desire to codify this in my own words. Plus I know I'll need it again and if I don't write it down for myself I forget where I found it the last time (no offense to Brent and cfexecute.com and with all credit due!). With that in mind, here's the Muse version of Access on 64bit ColdFusion with a little "under the hood" stuff thrown in.
Query caching is one of those underutilized features of ColdFusion that can exponentially speed up your application. It is also one of those misunderstood features that, when used incorrectly can be very disappointing. Let me say at the start that the Muse believes you should use query caching. If you don't believe I'm a fan then check out my post titled, Good Developers Practice Safe Query Caching. It's not a panacea, but it definitely has it's uses. Almost every application has some queries that can be cached - and saving round trips to the database is the holy grail of application tuning. But in this post we want to talk about naughty developers who cache irresponsibly... developers who do not understand the nature of the FIFO buffer.Read More
First let me say that query caching on a CF server is not a panacea. There are many ways to improve performance and there are many techniques for caching. All techniques have trade-offs. Still, there are instances where caching will save you time and money - and those are both things in short supply. More to the point, query caching is so easy to implement that it can be done for an entire site or application in a relatively short time - as long as you follow some simple rules and take some precautions (please people - use "safe caching").Read More
I have one more tip as a follow up to my previous post on Migrating Between MySQL and MSSQL. It has to do with the dreaded "data trunction error". If you have used MSSQL you may have seen this error crop up from time to time. It is a common error and very easy to remedy. The error occurs when you have a character field with a length that is too short for the size of the string you are trying to insert. Check out this example....Read More
I recently did an emergency stint of troubleshooting for a site owner (a designer who owned a complex ColdFusion site) who was hit with the HTML injection issue on his site. He had done a good deal of work on his own and cleaned up the HTML as best he could. He was busy moving the sites to a more secure environment (a better hosting company, no more FTP, intrusion detection and solid VPN support). He had managed to travel a long way down the migration path before he ran into trouble. His new environment used MSSQL and his old environment used MySQL.
Now I love MSSQL and I think it is a wonderful choice (price notwithstanding), but had he contacted me before he decided to go this route I would have suggested that he stick with MySQL for the sake of compatibility. Unfortunately he had already "flipped the switch" before I got there and so there was a lot of "on the fly" changes to make just to get his site working correctly again. One of the biggest issues had to do with his choice for migrating the actual data. He had chosen to use an export tool to move the MySQL data into an Microsoft Access file. He then used Microsoft Access "upsize" wizard to send the data to the MSSQL server. The biggest flaw with this approach is that it resulted in missing dates which were not translated correctly from MySQL to Access to MSSQL. So we had to re-export the data in to SQL dumps, modify them and then run them against MSSQL.
The date problem is not a typical incompatibility with MSSQL, but there are several we ran into that we had to account for. Here they are in random order:Read More
It's not often I see an error that I have simply never seen before. Here's one that some of you may recognize. It happened on a query using SELECT * against a fairly limited table on an MSSQL 2005 database. There wasn't anything unusual about the query (except for the bad form of using the asterisk). Here's the error that was thrown: