Jeremy Wyatt, Operations Director at Fantastic Cloud Services Limited, explores some of the issues organisations most commonly face when it comes to Backup and Disaster Recovery.
Backup and, especially, Disaster Recovery are two areas which generally do not receive the same time and attention as day-to-day services. It’s essential that they work, yet businesses rely on scheduled testing, or on the results of restores they have actually had to perform, for peace of mind that the process and solution work. But does this ensure the business has the complete knowledge and processes in place for when the worst happens? Any outage will have a financial impact on the business, and in most cases that impact is significantly greater than the cost of thorough planning and testing.
Today the majority of businesses have a backup solution in place: backup reports are received daily, and a team within the business is responsible for resolving issues and maintaining the backup infrastructure. The core reason for having the solution, even though it may only be used occasionally, is to restore critical deleted files, folders, databases, services, or perhaps entire servers. The same goes for disaster recovery: there will be a plan in place that is carried out on a regular basis and will have been tested.
The underlying issue is that these recovery activities take place in controlled, planned situations, which means both your Backup and DR solutions look capable of delivering what the business requires. But what about when it really matters, when the situation is no longer controlled and the primary systems are offline due to an issue? As this is not a common scenario, it’s difficult to learn lessons, correct oversights and hone the recovery process.
FCS focus only on Backup and Disaster Recovery, which means we can take a standpoint that is independent and not influenced by other areas of your IT solutions. Here we share a few insights into the issues we regularly encounter:
- Backup and DR Documentation isn’t available when primary systems are offline.
- Before the recovery process can be started, the backup solution needs to be brought online at the recovery site, and the process to do this has not been fully documented.
- Primary Backup repositories which were used to perform Disaster Recovery testing have become affected and are unavailable.
– There is a backup copy offsite (air-gapped) at another DC or in the Cloud, but recovering from these offsite or cloud data repositories has not been tested. The process and the Recovery Time Objective (RTO) are unknown. Equally, is the latest recovery point available offsite, or is the offsite data older than the Recovery Point Objective (RPO) allows?
– Businesses have moved backup data to Cloud repositories, but is it the most cost-effective option, especially when looking at long-term costs and the ability to restore from these repositories? With the push to Cloud technologies, Tape is becoming an often-overlooked medium for backup data and is sometimes classed as outdated technology. However, it’s a very cost-effective, high-capacity solution that has improved significantly over the years and, more importantly, is off the network, so it can be hardened further against ransomware. It’s also potentially quicker to restore from than downloading data from offsite or cloud backup repositories.
- An entire site recovery to a new location is required, but re-pointing services such as external and internal DNS, email and remote access has never been tested or documented.
– The credentials and information needed to repoint external DNS are unavailable when primary systems are down.
– Internal DNS is fundamental for the majority of applications and infrastructure. When restoring to new, freshly prepared infrastructure, DNS needs to be set up prior to any recoveries. Recovery software, VMware and Hyper-V all rely heavily on DNS working.
- There has been a partial site recovery to a separate location, but updating DNS entries and correcting routing and networking have never been tested.
- Local Administrator passwords are either unknown or unavailable. Active Directory passwords are known, but if Active Directory is offline or the recovered server cannot communicate with the domain controller, the Local Administrator account will be required.
- The Directory Services Restore Mode (DSRM) Administrator password is not known and is required to recover Active Directory.
- Password vaults are part of the production infrastructure and are unavailable, so credentials for core services are unknown.
- Cloud Hosting, IaaS, PaaS or SaaS solutions are not protected sufficiently. It’s important to remember that these providers take on responsibility for application uptime and the underlying infrastructure. It’s the customer’s responsibility to manage and protect their business data. The fundamental 3-2-1 Backup Rule (three copies of the data, on two different types of media, with one copy offsite) is a time-proven rule that is still relevant today.
- Backup and DR systems require ownership, with time spent on planning, testing and maintenance. Not enough time and resource is set aside to maintain the solution and fix errors quickly, and errors in backups lead to failed recoveries. Equally, not enough time and resource is assigned to testing and planning DR.
- Lastly, the recovery times the board expects and the recovery times the IT team can deliver can be quite different.
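Several of the points above (the 3-2-1 rule, the age of the offsite recovery point, and how long a restore from an offsite or cloud repository might take) can be checked with simple arithmetic before a disaster happens. The following is a minimal sketch only: the `BackupCopy` model, the function names and the figures in the usage section are hypothetical illustrations, not part of any backup product. It also assumes that production data counts as one of the three copies and that media types are counted across the backup copies.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BackupCopy:
    """One backup copy (hypothetical model, not a vendor API)."""
    name: str
    media: str                      # e.g. "disk", "tape", "cloud"
    offsite: bool                   # held away from the primary site
    latest_restore_point: datetime


def meets_3_2_1(copies: list[BackupCopy]) -> bool:
    """3 copies of the data, on 2 types of media, with 1 copy offsite."""
    total_copies = 1 + len(copies)            # assumption: production counts as one copy
    media_types = {c.media for c in copies}   # assumption: counted across backup copies
    has_offsite = any(c.offsite for c in copies)
    return total_copies >= 3 and len(media_types) >= 2 and has_offsite


def stale_offsite_copies(copies: list[BackupCopy], rpo: timedelta,
                         now: datetime) -> list[str]:
    """Names of offsite copies whose newest restore point is older than the RPO."""
    return [c.name for c in copies
            if c.offsite and now - c.latest_restore_point > rpo]


def estimated_restore_hours(data_gb: float, throughput_mbps: float) -> float:
    """Transfer time only; ignores rebuild, verification and DNS work."""
    megabits = data_gb * 8 * 1000             # decimal gigabytes to megabits
    return megabits / throughput_mbps / 3600  # seconds to hours


if __name__ == "__main__":
    # Illustrative figures only.
    now = datetime(2024, 1, 10, 9, 0)
    copies = [
        BackupCopy("primary-repo", "disk", False, now - timedelta(hours=4)),
        BackupCopy("cloud-copy", "cloud", True, now - timedelta(days=3)),
    ]
    print("3-2-1 satisfied:", meets_3_2_1(copies))
    print("Offsite copies outside RPO:",
          stale_offsite_copies(copies, rpo=timedelta(hours=24), now=now))
    print("Estimated download hours:",
          round(estimated_restore_hours(10_000, 1_000), 1))
```

A check like this does not replace a real restore test from the offsite repository, but it can highlight a gap between what the board expects and what the IT team can deliver.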
These key areas may seem obvious to some, but it’s surprising how often they are encountered. These valuable lessons, and the skills to address them, come only from our experience of assisting customers with their Backup and Disaster Recovery requirements and of recovering systems during real disasters, when time is critical.
FCS offer services ranging from consultation, managed service and 3rd line support to license only. Please feel free to get in touch for a no-obligation chat.
Please note: This is a commercial profile