This post originated from an RSS feed registered with .NET Buzz by Sam Gentile.
Original Post: ES, Distributed Transactions do work on Win2K3, Oracle bug outlined
Feed Title: Sam Gentile's Blog
Feed URL: http://samgentile.com/blog/Rss.aspx
Feed Description: .NET and Software Development from an experienced perspective - .NET/CLR, Rotor, Interop, MC+/C++, COM+, ES, Mac OS X, Extreme Programming and More!
Last week an alien took over my blog. This alternate Sam Gentile brashly said Distributed Transactions do not work on Win2K3. Imagine that! Now that I have gotten control of my blog back by vanquishing the alien, the real Sam Gentile is here to say that of course ES Distributed Transactions do work on Win2K3 and it was silly to say otherwise. They are in use all over the world in all sorts of systems.
In the case of our particular problems with Oracle, the issue turns out to be Oracle Services for MTS (OraMts) calling IResourceManagerFactory::Create with the same GUID when the second connection is opened. On Windows Server 2003, if the transaction timeout is less than 90 seconds, you will also experience heap corruption. Here is the sequence of events when the timeout is less than 90 seconds:

1. The user opens a connection to the first database.
1.1. OraMts registers with MSDTC by calling IResourceManagerFactory::Create with a GUID; let's call it guid1. This succeeds.
2. The user opens a connection to the second database.
2.1. OraMts tries to register with MSDTC by calling IResourceManagerFactory::Create with the same guid1. This particular call retries for 90 seconds before returning with "failed to enlist". The root issue is that the same GUID (guid1) is used to register a second resource manager (RM) with DTC.
2.2. While one OraMts thread is waiting for IResourceManagerFactory::Create to return, the transaction timeout (less than 90 seconds) expires, and MSDTC generates an abort message.
2.3. The OraMts code that handles the abort message calls IDispenserManager::RegisterDispenser with an ANSI string instead of a Unicode string, causing the heap corruption.
2.4. Finally, the thread that called IResourceManagerFactory::Create returns and logs "Failed to enlist", but the heap is already corrupted at this point, which is why the transaction ID shows up modified.
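To make the race in steps 1 through 2.4 concrete, here is a toy, deterministic replay in Python. Everything in it is a hypothetical stand-in: `ToyDtc`, `create_rm`, and `second_enlistment` are not real MSDTC or OraMts APIs, both `Create` calls are assumed to start at t=0, and the only figure taken from the post is the 90-second retry window. It shows why the timeout threshold matters: with a transaction timeout under 90 seconds, the abort fires while the duplicate-GUID `Create` call is still blocked.

```python
from dataclasses import dataclass, field

# Figure taken from the post: a duplicate-GUID IResourceManagerFactory::Create
# call keeps retrying for about 90 seconds before it gives up.
CREATE_RETRY_WINDOW_S = 90.0


@dataclass
class ToyDtc:
    """Hypothetical stand-in for MSDTC's resource-manager (RM) registry."""
    rm_guids: set = field(default_factory=set)

    def create_rm(self, guid: str, now: float) -> tuple[bool, float]:
        """Model Create: return (success, time at which the call returns)."""
        if guid in self.rm_guids:
            # Duplicate GUID: the call blocks for the whole retry window, then fails.
            return False, now + CREATE_RETRY_WINDOW_S
        self.rm_guids.add(guid)
        return True, now


def second_enlistment(tx_timeout_s: float, reuse_guid: bool) -> list[str]:
    """Replay steps 1-2.4 for one transaction spanning two databases."""
    dtc = ToyDtc()
    log = []

    ok, t = dtc.create_rm("guid1", now=0.0)          # step 1.1
    log.append(f"t={t:g}s Create(guid1) for DB1 -> ok={ok}")

    guid = "guid1" if reuse_guid else "guid2"        # the OraMts bug reuses guid1
    ok, t = dtc.create_rm(guid, now=0.0)             # step 2.1 (simplified start time)
    if not ok and tx_timeout_s < t:
        # Step 2.2: the transaction timeout fires while Create is still blocked.
        log.append(f"t={tx_timeout_s:g}s DTC aborts transaction (timeout)")
        # Step 2.3: in the real bug, this abort path is where the heap gets corrupted.
        log.append(f"t={tx_timeout_s:g}s abort handler runs")
    log.append(f"t={t:g}s Create({guid}) for DB2 -> ok={ok}")  # step 2.4
    return log


if __name__ == "__main__":
    for line in second_enlistment(tx_timeout_s=30, reuse_guid=True):
        print(line)
```

Running it with `tx_timeout_s=120` drops the two abort lines, which mirrors the post's point that the heap corruption only appears when the timeout is shorter than the 90-second retry; the second enlistment still fails either way because the GUID is reused.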
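Step 2.3 is a classic narrow/wide string mismatch. The sketch below, again a hypothetical Python model rather than Windows code, scans a byte buffer the way a callee expecting a NUL-terminated UTF-16LE string would. The dispenser name `OraMts` and the neighboring heap bytes are made up for illustration. The post does not say exactly how the mismatch corrupted the heap; this only shows how an ANSI argument makes the callee misjudge the string's contents and length, reading past the end of the caller's buffer.

```python
def read_wide_cstring(memory: bytes, start: int = 0) -> tuple[str, int]:
    """Scan memory the way a callee expecting a NUL-terminated UTF-16LE
    (wide) string would: two bytes at a time until a 2-byte zero terminator.

    Returns the decoded text and the number of bytes consumed, terminator
    included.
    """
    end = start
    while memory[end:end + 2] != b"\x00\x00":
        if end + 2 > len(memory):
            # Real native code has no such guard; it would keep reading
            # into whatever memory follows the buffer.
            raise IndexError("ran off the end of the simulated heap")
        end += 2
    text = memory[start:end].decode("utf-16-le", errors="replace")
    return text, end + 2 - start


DISPENSER_NAME = "OraMts"  # hypothetical name, for illustration only

# What the callee expects: UTF-16LE text plus a 2-byte terminator.
wide_arg = DISPENSER_NAME.encode("utf-16-le") + b"\x00\x00"
# What the buggy caller passes: 1 byte per character plus a 1-byte terminator.
ansi_arg = DISPENSER_NAME.encode("ascii") + b"\x00"
# Made-up bytes that happen to sit right after the argument on the heap.
neighbor = b"HEAP\x00\x00\x00"

if __name__ == "__main__":
    print(read_wide_cstring(wide_arg + neighbor), "buffer size:", len(wide_arg))
    print(read_wide_cstring(ansi_arg + neighbor), "buffer size:", len(ansi_arg))
```

With the wide argument the scan stops exactly at the end of its own 14-byte buffer; with the ANSI argument it decodes garbage and consumes bytes well beyond the 7-byte buffer, because a single-byte NUL is not a valid wide terminator.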
The problem, of course, manifested as distributed transactions being unable to enlist, but in a combination of systems like this the cause could be anything, and in this case it was a bug in Oracle's implementation. Many thanks to my teammate Robert, who did the detective work of digging into and isolating the problem, and to Florin Lazar and his team at Microsoft for finding the cause (and explaining it above).