Introduction to Cyber Security - Preparing for Incident Response and Recovery

Just what is an "incident"? I'll define it as a serious degradation of service. That might be the result of a significant hardware or software failure, a catastrophic natural or civil disaster, or a targeted attack. However it happened, such an incident will noticeably reduce the level of function of your internal or external resources - a "service outage" - or cause the exposure and theft of private data.

As soon as you realize that something's going wrong, you'll need to identify and characterize the event. That'll mean figuring out exactly what's happening, which specific systems are affected, and whether law enforcement officials should be notified.

Your next goal will be mitigation: quick action to contain the damage. If your networks and servers have been infiltrated, either stop any unauthorized processes or - especially if your resources are being used for ongoing criminal or expensive operations - just shut everything down.
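As a rough illustration of the "stop any unauthorized processes" step, here's a minimal sketch that compares a snapshot of running processes against an allowlist of expected names. The `ProcessInfo` record, the `find_unauthorized` helper, and the sample process names are all hypothetical. In practice the snapshot would come from your operating system's own process tools, and matching on names is only a first-pass filter, since an attacker can rename a binary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessInfo:
    """One row of a process snapshot (hypothetical shape, for illustration)."""
    pid: int
    name: str
    user: str


def find_unauthorized(snapshot, allowed_names):
    """Return the processes whose names are not on the allowlist."""
    allowed = set(allowed_names)
    return [proc for proc in snapshot if proc.name not in allowed]


# Sample data invented for this example.
snapshot = [
    ProcessInfo(101, "nginx", "www-data"),
    ProcessInfo(202, "postgres", "postgres"),
    ProcessInfo(303, "xmrig", "www-data"),
]
suspects = find_unauthorized(snapshot, ["nginx", "postgres", "sshd"])
for proc in suspects:
    print(f"Review before stopping: pid={proc.pid} name={proc.name} user={proc.user}")
```

The sketch only reports suspects rather than killing anything. Keeping a human in the loop for the actual shutdown decision is a deliberate choice here, because stopping the wrong process in the middle of an incident can make the outage worse.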

Once the initial panic subsides, you'll need to identify the immediate cause of the problem and neutralize it. Simply firing things up again without having removed the underlying issue would be like rebooting a malware-infected laptop or smartphone and hoping the problem has somehow magically solved itself. You already know that won't help.

It could be possible to surgically remove the malware, vulnerability, or compromised account that caused the trouble. Or you might just have to wipe your systems clean all the way down to the hardware layer. It might even end up being faster and cheaper to securely dispose of your drives and start all over with a set of brand new, clean replacements.

Once you're confident that you're on top of the core problem, you can begin the recovery. That'll include restoring all systems, applications, network configurations, and backed-up data to their original pre-incident state. And, of course, that'll also involve a lot of testing to make sure it's all working the way it should. Remember how long it took until you had everything right when you first deployed your application? Well, this time, it'll take at least as much work.
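One small piece of that testing can be automated: checking that restored files match what you had before the incident. The sketch below assumes you kept a manifest of SHA-256 hashes from a known-good state. The manifest format, the `verify_restore` helper, and the file names are hypothetical, and the demo runs against throwaway files in a temporary directory.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large backups don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(root, manifest):
    """Return relative paths that are missing or whose hash differs from the manifest."""
    problems = []
    for rel_path, expected in manifest.items():
        target = Path(root) / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return sorted(problems)


# Demo with throwaway files: one restored correctly, one altered.
with tempfile.TemporaryDirectory() as root:
    good = Path(root) / "app.conf"
    good.write_bytes(b"port=8080\n")
    manifest = {
        "app.conf": sha256_of(good),
        "data.db": hashlib.sha256(b"original").hexdigest(),
    }
    (Path(root) / "data.db").write_bytes(b"tampered")
    failed = verify_restore(root, manifest)
    print("Files needing attention:", failed)
```

A matching hash only tells you a file is byte-for-byte what the manifest recorded. It says nothing about whether the application behaves correctly, so this complements functional testing rather than replacing it.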

Don't forget to document everything. What worked and what didn't work? Which steps in your recovery plan will need updating? Which stakeholders need to know the details? Are you mandated to report the incident to law enforcement and financial agencies? Should you tell your customers and, perhaps, provide them with guidance and support for managing any problems they may face as a result of the incident?

The better your documentation, the easier this will be for you the next time. And, eventually, there will be a next time.
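Documentation is easier to reuse when it's captured in a consistent shape while the incident is still unfolding. Here's a minimal sketch of a timeline log that records what was done, in which phase, and how it turned out, one JSON object per line. The field names, the `log_entry` and `to_json_lines` helpers, and the sample entries are assumptions made for illustration, not a standard format.

```python
import json
from datetime import datetime, timezone


def log_entry(phase, action, outcome, when=None):
    """Build one timeline entry as a plain dict, timestamped in UTC."""
    when = when or datetime.now(timezone.utc)
    return {
        "time": when.isoformat(),
        "phase": phase,
        "action": action,
        "outcome": outcome,
    }


def to_json_lines(entries):
    """Serialize entries as one JSON object per line, easy to append and search."""
    return "\n".join(json.dumps(entry, sort_keys=True) for entry in entries)


# Sample entries invented for this example, with fixed timestamps.
start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
timeline = [
    log_entry("mitigation", "Isolated web server from network", "worked", start),
    log_entry("recovery", "Restored database from nightly backup", "failed: backup incomplete", start),
]
report = to_json_lines(timeline)
print(report)
```

Recording failures alongside successes is the point: the entries marked as failed are the ones that tell you which steps in the recovery plan need updating.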

Finally, when all the shouting has died down, you'll want to carefully figure out what went wrong. Sure, those foreign hackers were the immediate cause of the mess, but what system, training, or cultural failures allowed it to happen? Then - and most important of all - what concrete and practical steps can you take right now to fix those problems?

 
