I had a really great day at work.
My colleague and I managed to win in foosball and give the other team a bright and shining egg. I love when that happens.
The weird part is a lot of not so great stuff actually happened today. But we came together as a team and fixed it. I love when you're able to feel like you're actually working together - and not just people working within the same vicinity.
So what happened? A little after 2PM a lot of our systems started crashing. I am responsible for most of our web-based applications. Every single one I’m responsible for (on our SharePoint and .NET farms) was unresponsive. Our automatic phone systems were down as well. Between them, these outages affected all 2,000 of our employees.
I’m in application support, in case you’re wondering. I know a bunch of stuff about the different applications and I help develop new ones and maintain the ones we’ve got.
I don’t know a whole lot about servers. I don’t have any hard technical education, with the exception of a Java programming course.
I have access to every server that is important for the applications I support. And when the applications became non-responsive, I logged on to the web-front-end servers in the SharePoint farm and there were no issues I could see. The load-balancing tool told me that they were online and working just fine.
So when everything seems to be working, what do you do? You reboot everything you can get your hands on.
This was done by the server-team.
But it didn’t work.
Now at this point about an hour had passed. I’d given notice about the error to the relevant people in the business. And I had called users from our different locations in the country and established that it was indeed all users across the board who were unable to access the web applications.
Not sure what to do, I logged onto the application servers and started checking the event logs. It turned out there were a lot of error messages regarding one of the SQL servers in the SharePoint farm.
On the SQL server, there were some informational messages about a failed Windows update in the event log.
I gave this information to the server team, and it confirmed their suspicion that it was an issue with a group policy that had affected the firewall of the SQL servers. This had also been the issue with the Solidus server responsible for the phone application.
It meant that they now knew with certainty what was wrong. And how to fix it.
And it sounds so fucking obvious when you write it out like this. Like of course, this is what was wrong. But I feel proud as hell, because I actually helped solve a server-related issue.
Usually when our systems crash I fidget around a bit but I’m never able to actually contribute anything that the server-team isn’t already on top of. So either I’ve gotten better at this. Or they’ve gotten slower.
Anyway. I kick ass at my job. And I had sushi and champagne for dinner.
All in all a pretty good Friday.
I hope you all get to have days like this too, once in a while. They’re nice.
3 comments:
Sorry but cod over rice is not the same thing as sushi.
That comment made me google cod and cold rice. Which led me to the term Shirako.
Shirako is a Japanese delicacy. It's cod sperm. Or rather, the sacs that contain the sperm.
http://www.hewdge.com/wp-content/uploads/shirako.jpg
... can I please unlearn this bit of information?
So, they applied the GPO to the servers by accident? The fuck?