Two bungling engineers and a faulty cable brought down DBS, Singapore’s biggest bank, taking out all of its ATMs and internet banking for about 7 hours last month, on 5th July.
Or so went the narrative painted by the Straits Times two days ago, on Thursday: it was all due to human error. (The printed edition of the Straits Times carries a far lengthier account than the gimped version online.) The big headline inside the paper on page four was this: “It was definitely a human error”.
Really? Is that the best narrative to explain why the system crashed that day? Everything was due to “human error”, and two “bungling” IBM engineers were to blame?
If Singapore’s biggest bank could so easily be brought down by “human errors”, then I find it genuinely shocking. Surely IBM’s 10-year S$1.2 billion outsourcing contract — about S$120 million per year to maintain the IT infrastructure — details a stringent process for disaster recovery?
Don’t DBS and IBM have SLAs that spell out how IT failures should be recovered from, with a detailed escalation process? And seriously, a single misplugged cable can bring down your entire storage system? I don’t buy this at all. You’re not talking about a start-up servicing a bank; you’re talking about a maintenance contract worth over a billion dollars.
Thus, the main point is not “human error” (a totally wrongheaded slant that ST took, in my opinion) but the fact that DBS’s business process broke down along the way. Yes, human error may have started this, but the recovery process then failed to kick in.
The blow-by-blow account of how engineers triggered this failure is not interesting. What would have been interesting is how and why DBS’s disaster recovery process failed to kick in.