A Hospital Really, Really Badly Needs a Database Doctor

As we all know, a good doctor keeps up with medical and technological breakthroughs, but at one hospital, the same can't be said for the information technology staff.

I know it's getting hard for me to keep up with the fast-changing world of technology. Yet there are IT pros who refuse to change and stay current with what is happening in the industry.

One such example kept the IT staff at a hospital from upgrading a mission-critical system, and the result was catastrophic. The company's roots go back decades to a single hospital in a small town, and it hired people straight out of school who eventually progressed to become the senior managers of the IT department.

However, as usually happens in the business world, the company aggressively bought up competitors' hospitals and became a large enterprise with facilities across multiple states. The IT managers, for God knows what reason, didn't keep their knowledge current with the changing times. They kept the attitude that their department, skills, and knowledge were fine with only minimal or intermittent upgrades, which held them back from implementing even the most basic industry-standard practices.

Times are a Changing

Take a dose of HIPAA and ARRA, add some Obamacare, then throw in the exponential growth of technology in business over the last 20 years alone. Mix all of that with a hefty dose of stubbornness, and you have a good recipe for disaster.

The most critical system within a hospital is the patient record system. The vendor had been warning that it was going to drop all support for the product and strongly recommended an upgrade to the most current version, but the IT managers decided to stay with the outdated solution instead.

The database behind the patient record system had grown well beyond what the solution was designed to handle. To make matters worse, the database server wasn't given enough resources to keep up with the demand placed on it. The virtual machine ran on an old version of a common virtualization platform, and updates to the OS and to the database software itself were simply skipped. The IT staff's standing excuse?

Updates break stuff!

the server team manager would often complain. And performance tuning?

Database indexing? What's that?

The result was a patient record system that now ran at glacial speeds.
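For anyone who shares the manager's confusion: an index is the difference between scanning every row and jumping straight to the rows you need. Here is a minimal, hypothetical sketch using Python's built-in sqlite3 module; the table and column names are made up for illustration, not taken from the hospital's actual schema.

    # Hypothetical demo: how one index changes a lookup from a table scan
    # to an index search. Uses only the standard-library sqlite3 module.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # A toy stand-in for a patient record table (names are invented).
    cur.execute(
        "CREATE TABLE visits (id INTEGER PRIMARY KEY, patient_id INTEGER, notes TEXT)"
    )
    cur.executemany(
        "INSERT INTO visits (patient_id, notes) VALUES (?, ?)",
        [(i % 5000, "chart entry") for i in range(100_000)],
    )

    # Without an index, the query plan shows a full scan of the table.
    print(cur.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM visits WHERE patient_id = 42"
    ).fetchall())  # plan detail reads something like: SCAN visits

    # The one line of routine maintenance this team never did:
    cur.execute("CREATE INDEX idx_visits_patient ON visits (patient_id)")

    # With the index, the same query becomes an index search.
    print(cur.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM visits WHERE patient_id = 42"
    ).fetchall())  # plan detail reads something like: SEARCH visits USING INDEX idx_visits_patient

    conn.close()

On a table with millions of patient records, that difference is exactly the gap between "glacial" and "instant."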

However, changes in the law finally forced the server team to upgrade the patient record system. The manager scheduled the change for the coming weekend, which somehow came as surprising news to the rest of the team.

Question: Any RFCs put in for this major change?
Reply: What’s an RFC?
Question: Any notification sent out to the user community?
Reply: Well, yeah. On Saturday morning, ten minutes before the upgrade, we'll tell everyone to get out of the system for “a few minutes.”

As you can imagine, this did not go very well.

And Here Goes Nothing

Saturday came, and there were no major issues with the upgrade process itself, besides the fact that it took six hours to migrate to the new database schema. The application servers were updated in about an hour, and the system was brought back up. The “few minutes” kept stretching on and on, until finally an email went out telling users they could start using the patient record system again.

But there was a slight-to-major problem, depending on how you look at it. The new version of the software read and wrote far more data than the older versions. As a result, the database transaction queue went from barely keeping up to thousands of transactions behind in less than an hour. How did they try to fix this? They took the system down for another hour to let the database server catch up, while making no functional changes to the server configuration whatsoever.

When it came back up, you guessed it, the same thing happened again. The next step was to get on a conference call with the vendor and ask for assistance. The vendor asked about the database server's specs and suggested that it might not be fast enough for the workload being requested of it.

You can guess where this is going. The server team's manager yelled red-faced at the speakerphone:

No, it has been working fine for years! All we did was install this fucking upgrade, and now our server is crashing! It’s not the database server! You broke this fucking piece of shit and you are going to fucking fix it!

The vendor proceeded to help with whatever troubleshooting they could. In the meantime, none of the company's hospitals could pull up any patient record data, and medical personnel had to go back to charting by hand on paper. As the patients stacked up, patience with the IT department wore very thin.

Days of troubleshooting followed, all of it on systems other than the database server, which was the actual problem, because the manager absolutely refused to believe anything could be wrong with it. At one point he even floated the idea of rolling the whole thing back to the way it was before the upgrade started. There was just one major flaw with that plan: no backup of the system had been taken before the upgrade happened.

A Small Victory of Sorts

Finally, after the system had been down for quite a few days and every other possible solution had been tried, the IT manager agreed to give a new database server a try. Once it was racked, set up, tested, and ready for action, the system was brought back up in no time and ran without any issues. The problem was finally solved.

I wish I could have ended this story on a happier note, with the server manager being fired, but that's not how it worked out at all. When the manager reported what happened to the board of directors, he managed to throw the vendor under the bus. He even got a round of congratulations for his handling of the crisis.

The sad thing is that this was a completely preventable catastrophe if only some basic disaster recovery practices had been followed, instead of the IT department assuming they already knew everything they needed to know about their field. The lesson here is to back up, back up, back up! And don't do anything major unless you have a foolproof plan for backing out if something goes wrong.
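For what it's worth, that safety net doesn't have to be elaborate. Here is a minimal sketch of a pre-change backup step, assuming a PostgreSQL back end and the standard pg_dump utility; the host, database name, and paths are placeholders for illustration, not details from the actual incident.

    # Hypothetical pre-change backup step (placeholder host, database, paths).
    # Takes a timestamped dump with pg_dump and refuses to proceed if the
    # dump is missing or empty -- the backout plan this upgrade never had.
    import subprocess
    import sys
    from datetime import datetime
    from pathlib import Path

    DB_HOST = "db.example.internal"   # placeholder
    DB_NAME = "patient_records"       # placeholder
    BACKUP_DIR = Path("/backups")     # placeholder

    def take_backup() -> Path:
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        dump_file = BACKUP_DIR / f"{DB_NAME}_{stamp}.dump"
        subprocess.run(
            ["pg_dump", "-h", DB_HOST, "-Fc", "-f", str(dump_file), DB_NAME],
            check=True,  # stop immediately if pg_dump fails
        )
        if not dump_file.exists() or dump_file.stat().st_size == 0:
            sys.exit("Backup file is missing or empty -- aborting the upgrade.")
        return dump_file

    if __name__ == "__main__":
        backup = take_backup()
        print(f"Backup written to {backup}; safe to start the upgrade.")

Ten minutes of scripting up front would have turned "days of downtime across every hospital" into "roll back and try again next weekend."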
