Aufait made a mistake! Also Google!!
Though the year 2009 started on a positive note, last week turned out to be a week of mistakes. First it happened at Aufait, and then at Google. Both were human typographical errors. The impact was high, and both were solved within half an hour.
Aufait’s Mistake
It was on Wednesday, 28th January 2009, that Aufait made a mistake. The Customer Support System we built for a major airline client had been live and running successfully since 1st January. We were happy that the system was being used quite heavily with no major bugs reported. The Customer Relations Department had actually stopped all manual complaint handling and was using our system to manage complaints entirely.
Minor bugs reported
After two weeks of usage, some minor bugs were reported. They concerned the notification mails that the system automatically generated and sent to its users. If a complaint was not closed within 10 days of submission, the system sent mail to all associated users, irrespective of their individual task status. This created problems for some users, who complained that they were getting unwanted notification mails. Since the system was used heavily, more and more users raised this issue, and we were asked to correct the implementation as a high priority.
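To make the difference between the old rule and the intended one concrete, here is a minimal Python sketch. The post does not show the system's code, so the `Complaint` class, its fields and both function names are hypothetical, and treating a complaint as overdue once strictly more than 10 days have passed is an assumption about the exact boundary.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, List

# Hypothetical model: the real system's schema is not described in the post.
OVERDUE_AFTER = timedelta(days=10)  # assumed boundary: strictly more than 10 days


@dataclass
class Complaint:
    complaint_id: int
    submitted: date
    closed: bool
    # Maps each associated user's email to True if *their own* task is still pending.
    task_pending: Dict[str, bool] = field(default_factory=dict)


def is_overdue(c: Complaint, today: date) -> bool:
    return not c.closed and today - c.submitted > OVERDUE_AFTER


def recipients_original(c: Complaint, today: date) -> List[str]:
    """Behaviour users complained about: every associated user is mailed."""
    return sorted(c.task_pending) if is_overdue(c, today) else []


def recipients_fixed(c: Complaint, today: date) -> List[str]:
    """Intended behaviour: only users whose own task is still pending are mailed."""
    if not is_overdue(c, today):
        return []
    return sorted(user for user, pending in c.task_pending.items() if pending)


if __name__ == "__main__":
    c = Complaint(1, date(2009, 1, 5), False,
                  {"agent@example.com": True, "supervisor@example.com": False})
    print(recipients_original(c, date(2009, 1, 20)))  # both users
    print(recipients_fixed(c, date(2009, 1, 20)))     # only agent@example.com
```

Keeping the per-user task status next to the complaint lets the fixed rule filter recipients without touching the overdue check itself.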
The quick fix
We fixed the issue quickly, as it was urgent. The modified deployment files with the fixes were released to the client by the end of Tuesday, the 27th.
It all happened the next day. We were resting after lunch in the afternoon when I suddenly noticed MS Outlook alerting me to 3 mail notifications from the system. I wondered how these mails had reached me, as I was not a user of the system. I quickly logged into the system with a username and password given by the client. When we checked the user activity log, we noticed that a particular supervisor’s email address had been replaced by mine. By this time, though, the notification mails had started flowing into my Inbox unstoppably: 25 mails per minute, to be exact. We were wondering what was happening. It was then that I noticed a mail from the IC Department mentioning that the deployment files given the previous day had been deployed just a few minutes earlier, after which the application had started sending continuous mails to all users with some pending task in the system. We called the client and asked them to restore the previous version as an immediate measure.
I quickly logged into the system and removed the SMTP server details used for sending mails. The flow stopped.
The cause
We analyzed the cause, and what we found was surprising. When we modified the query that fetches users with pending tasks, an extra space had somehow crept in. It generated an error, and the system behaved as if it were stuck in an infinite loop.
That one extra space resulted in nonstop mails being sent to users for around 30 minutes!!!
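The post does not show the faulty query or the code around it, so what follows is only a hypothetical illustration of one way a single stray space can turn into a mail storm. In this Python sketch (standard-library `sqlite3`; the `tasks` table, its columns and `notify_tick` are invented for the example), each mail goes out first and the task is marked as notified afterwards. To keep the example short, the typo is placed in that marking statement rather than in the selection query itself, which is an assumption. The stray space in `task _id` makes the statement a syntax error; because the error is swallowed, the task is never marked and every scheduler tick mails the same user again.

```python
import sqlite3
from typing import List

SELECT_PENDING = ("SELECT task_id, email FROM tasks "
                  "WHERE status = 'pending' AND notified = 0")
# Hypothetical typo: a stray space splits the column name "task_id".
MARK_BUGGY = "UPDATE tasks SET notified = 1 WHERE task _id = ?"
MARK_FIXED = "UPDATE tasks SET notified = 1 WHERE task_id = ?"


def make_db() -> sqlite3.Connection:
    """In-memory stand-in for the real database (schema is invented)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (task_id INTEGER PRIMARY KEY, email TEXT, "
                 "status TEXT, notified INTEGER NOT NULL DEFAULT 0)")
    conn.executemany("INSERT INTO tasks (email, status) VALUES (?, ?)",
                     [("agent@example.com", "pending"), ("clerk@example.com", "closed")])
    conn.commit()
    return conn


def notify_tick(conn: sqlite3.Connection, outbox: List[str], mark_sql: str) -> None:
    """One run of a hypothetical scheduled notification job."""
    for task_id, email in conn.execute(SELECT_PENDING).fetchall():
        outbox.append(email)  # stands in for actually sending the mail
        try:
            conn.execute(mark_sql, (task_id,))
        except sqlite3.OperationalError:
            # The syntax error is swallowed, so the task stays unmarked and is
            # picked up again on the next tick: the "infinite loop".
            pass
    conn.commit()


if __name__ == "__main__":
    conn, outbox = make_db(), []
    for _ in range(3):  # three scheduler ticks
        notify_tick(conn, outbox, MARK_BUGGY)
    print(outbox)  # ['agent@example.com', 'agent@example.com', 'agent@example.com']
```

With `MARK_FIXED`, the first tick marks the task and later ticks send nothing. The broader lesson the sketch is meant to show is that silently swallowing an error in the bookkeeping step is what turns a one-character typo into repeated mails.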
The Google Error
It was on Saturday, 31st January, around 8:30 PM IST, that Google followed us by making a mistake. The Google home page is the most popular on the internet, with the overall site receiving millions of queries each day. It is the most common homepage and accounts for almost four out of every five internet searches. As part of its searches, Google automatically identifies sites that may carry viruses and harmful software, but on Saturday every site in the search results carried the warning ‘This site may harm your computer’. The error left millions of visitors puzzled.

Google's search results on 31 Jan 2009 (Source: Telegraph UK)
The glitch, which prevented internet users from clicking straight through to search results, was fixed within 30 minutes, although users of Google’s email service Gmail have since reported finding genuine messages mistakenly sent to spam folders. The errors prompted panic among web surfers, who at first feared that Google had suffered some kind of major failure with serious implications.
Google’s Marissa Mayer, VP, Search Products & User Experience, explained on her blog that Google’s sudden false flagging of all websites as containing malware was, “very simply, human error.” Google maintains a list of websites known to contain malicious software, which it updates both automatically and manually, and it shows a warning message when one of those sites appears in a Google search. To stay on top of malicious software, Google teams up with the non-profit StopBadware to monitor and maintain its malware list. But on Saturday, StopBadware’s site was temporarily down as millions of users tried to access it to figure out the Google problem.
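The general idea of a warning driven by a list of known-bad sites can be sketched in a few lines of Python. This is a simplification under stated assumptions, not Google’s implementation: the list here is a hypothetical set of host names, and `annotate_results` is an invented helper that attaches the warning text quoted above to any result whose host is on the list.

```python
from typing import List, Set, Tuple
from urllib.parse import urlsplit

WARNING = "This site may harm your computer"


def annotate_results(results: List[str], bad_hosts: Set[str]) -> List[Tuple[str, str]]:
    """Pair each result URL with the warning text if its host is on the (hypothetical) list."""
    annotated = []
    for url in results:
        host = (urlsplit(url).hostname or "").lower()  # hostname is None for malformed URLs
        annotated.append((url, WARNING if host in bad_hosts else ""))
    return annotated


if __name__ == "__main__":
    results = ["http://malware.example/download", "https://www.example.org/"]
    for url, warning in annotate_results(results, {"malware.example"}):
        print(url, "->", warning or "(no warning)")
```

On a normal day only the listed host is flagged; every other result passes through untouched.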
Google’s mistake
“Unfortunately, the URL of ‘/’ was mistakenly checked in as a value to the file,” Marissa wrote. Since ‘/’ expands to all URLs, the warning message appeared in every search result. She added, “We will carefully investigate this incident and put more robust file checks in place to prevent it from happening again.”
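Mayer’s post does not describe the list’s file format or how entries are matched, so the following Python sketch assumes, purely for illustration, that each entry is a URL pattern that flags a result whenever it occurs as a substring of the result’s URL. Under that assumption an entry of ‘/’ matches every URL. The hypothetical `validate_entries` function shows one kind of file check that could catch such an entry before release: reject any entry that is very short or that would flag a sample of URLs known to be safe. The names and the `min_len` threshold are invented for the example.

```python
from typing import Iterable, List


def is_flagged(url: str, entries: Iterable[str]) -> bool:
    """Assumed matching rule: an entry flags a URL if it occurs anywhere in it."""
    return any(entry in url for entry in entries)


def validate_entries(entries: List[str], known_good: List[str], min_len: int = 4) -> List[str]:
    """Return the entries that should block release of the list file."""
    rejected = []
    for entry in entries:
        too_broad = len(entry.strip()) < min_len  # catches '/', '' and similar
        hits_good = any(entry in url for url in known_good)  # would flag a safe site
        if too_broad or hits_good:
            rejected.append(entry)
    return rejected


if __name__ == "__main__":
    entries = ["malware.example/", "/"]  # '/' is the mistaken value
    known_good = ["https://www.example.org/", "http://news.example.com/today"]
    print([is_flagged(u, entries) for u in known_good])  # [True, True]
    print(validate_entries(entries, known_good))         # ['/']
```

Checking candidate entries against known-good URLs is deliberately independent of how the matching works internally, so it would still catch an over-broad entry even if the matching rule differed from the one assumed here.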
Good one!! Trying to justify an error with an error? Or trying to say, with “To err is human”, that software is not yet engineering but a human activity? Or is it a mild disclaimer to customers?
@Bijith
An error cannot be justified anyway. I was trying to put the second point forward. Even after years of evolution, software engineering is still prone to simple human errors. Even an IT giant like Google is not spared. Doesn’t that call for some serious thinking?
Even after all the roaring, this is still going on. All an organization can do is introduce “process” to make humans work like machines. The limitation is in handling basic human instincts, such as the tendency to skip processes given an opportunity. Overcoming that can come only from maturity and experience, through which a person comes to believe in the process and the collective wisdom behind it, not from enforcement.