Disaster Recovery In The Cloud For Your Housing Group

Over 40 mission-critical workloads protected using Azure Site Recovery, and over 4 terabytes of file storage protected using Azure File Sync. Fully automated failover, recovery and start-up scripts protect services such as housing and repairs systems, BizTalk, Windows infrastructure and more.
 
The team at A4S Cloud Solutions has delivered a project, based on Microsoft Azure Site Recovery, to protect a large North West housing association’s IT infrastructure and business applications so that the organisation can survive the complete loss of the services hosted in its data centre.
 
The client (Your Housing Group Limited) maintains an estate of over 28,000 properties across the North West, Yorkshire and the Midlands. It employs over 1,000 staff spread across these regions, and the loss of a data centre was a key risk it wished to mitigate.
 
The Need
 
Although Your Housing Group’s IT infrastructure and applications were highly resilient, they were all hosted in a single data centre and were therefore exposed to the loss of that site.
 
The client decided it needed to replicate its most critical workloads to a geographically separate location, where they could function as normal and continue to serve users in the unlikely event of a data centre loss.
 
Some of the client’s key requirements were:
  • Protect and replicate the required file shares, files and folders hosted on file servers.
  • Provide a client-agnostic, Windows terminal server-based solution that mimics the client’s corporate Windows 10 environment.
  • Perform routine, scheduled backups of the stated applications when running in ‘DR mode’.
  • Host and protect SQL database servers.
  • Host a replica of the client’s on-premises Active Directory to provide ‘same sign-on’ with the same username and password.
  • Integrate with Azure Active Directory to provide single sign-on to Microsoft Office 365 services.
  • Be scripted and automated as far as possible.
  • Where needed, allow servers to ‘scale/burst’ to a larger machine size to provide the required performance.
  • Provide reporting on host replication metrics.
  • Allow routine disaster recovery tests without impacting the live production environment.
  • Provide 24/7 web-based and email-based monitoring and alerting.
 
The Disaster Recovery Solution
 
The solution is based on a suite of cloud-based technologies to enable replication, failover, access, and failback when required:
  • Azure Site Recovery with custom automation to provide replication and failover capabilities.
  • ExpressRoute to carry replication traffic for failover and failback.
  • Remote Desktop Services to allow access when running in disaster recovery mode.
  • Windows 10 IoT to provide a consistent end-user experience that can connect to both the production and disaster recovery environments.
  • Azure File Sync to protect file storage in the cloud.

The client’s environment was reviewed and reconfigured where necessary to ensure it would continue to function after failing over to the disaster recovery solution. This included areas such as mass file storage, Distributed File System (DFS) services and the client’s interfacing engine.
 
Microsoft Azure Site Recovery (ASR), a cloud-based disaster recovery service, is the backbone of the solution. It can be deployed in three scenarios: on-premises to Azure, Azure to Azure, or on-premises to a secondary on-premises site. ASR is a cost-effective replication and failover service: when no failover is running, the client incurs only ASR licensing and storage costs.
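As a rough way to picture that cost profile, the monthly cost of protection can be written as a simple relation. This is an illustrative model only: the symbols are ours, and no unit prices are given because they vary by region and agreement.

```latex
C_{\text{month}} \approx N \cdot c_{\text{ASR}} + S \cdot c_{\text{storage}} + C_{\text{failover}}
```

Here N is the number of protected servers, c_ASR the per-server ASR charge, S the replicated storage consumed and c_storage its unit price. C_failover stands for the compute and related charges for running failed-over machines, which is zero except during a test or an actual failover.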
 
 
ASR uses an on-premises proxy to replicate workloads efficiently from the client’s hypervisor. Replication can be monitored and reported on at any time, and it can run in either direction: from on-premises to the cloud for protection, or from the cloud back to on-premises for failback.
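Because replication can run in either direction, it can help to picture a protected workload’s lifecycle as a small state machine. The Python sketch below is a simplified, illustrative model only: it is not the ASR API, it omits steps such as committing a failover, and the names `ProtectedWorkload`, `reprotect` and `housing-app-01` are hypothetical. It shows that a test failover leaves production untouched, that a real failover pauses protection until the workload is reprotected in the reverse direction, and that failback is a second failover back to the original site.

```python
from enum import Enum
from typing import Optional


class Site(Enum):
    ON_PREMISES = "on-premises"
    AZURE = "Azure"


class ProtectedWorkload:
    """Simplified model of replication direction through failover and failback.

    Illustrative only: this is not the Azure Site Recovery API.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.active_site = Site.ON_PREMISES  # where production currently runs
        # Where changes are replicated to; None means the workload is unprotected.
        self.replicating_to: Optional[Site] = Site.AZURE

    @staticmethod
    def _other(site: Site) -> Site:
        return Site.AZURE if site is Site.ON_PREMISES else Site.ON_PREMISES

    def test_failover(self) -> str:
        # A test failover starts an isolated copy at the replica site;
        # production and ongoing replication are left unchanged.
        if self.replicating_to is None:
            raise RuntimeError(f"{self.name} is not protected")
        return f"test copy of {self.name} started in {self.replicating_to.value}"

    def failover(self) -> None:
        # Production moves to the replica site; protection stops until reprotect().
        if self.replicating_to is None:
            raise RuntimeError(f"{self.name} is not protected; reprotect first")
        self.active_site = self.replicating_to
        self.replicating_to = None

    def reprotect(self) -> None:
        # Start replicating in the reverse direction, back towards the site we left.
        self.replicating_to = self._other(self.active_site)


# Usage: test, fail over to Azure, then fail back (hypothetical workload name).
wl = ProtectedWorkload("housing-app-01")
print(wl.test_failover())  # production stays on-premises
wl.failover()    # now running in Azure, unprotected
wl.reprotect()   # replicating Azure -> on-premises
wl.failover()    # failback: running on-premises again
wl.reprotect()   # replicating on-premises -> Azure once more
```

Refusing to fail over an unprotected workload reflects the point of the model: after any failover, protection has to be re-established before the next one.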
 
In an actual data centre failover, the solution moves workloads from on-premises into the client’s Azure environment in a controlled and safe manner. Users then access them remotely through Microsoft Remote Desktop Services from laptops or thin clients.
 
Azure Site Recovery also lets organisations automate the failover process through recovery plans, giving very granular control over each step.
 
Alongside the ASR implementation, the client’s interfacing solutions were improved so they could be supported more effectively in a disaster recovery scenario. Your Housing Group’s thin client environment was also updated to allow better control during a disaster, so that thin clients can be reconfigured en masse to connect to the new Azure location instead of the on-premises VDI environment.
 
The client’s infrastructure was also upgraded and reconfigured to survive a disaster. Services such as DHCP were re-engineered so they could be recovered at a different location, domain services were updated so that authentication and name resolution would continue, and file services were protected using a combination of Azure File Sync and DFS Replication (DFS-R).
 
Finally, the file storage environment was reviewed and a selective approach to protecting file services was adopted: Azure File Sync was used to protect only the files needed for core business functions after a data centre failure. This saves bandwidth and lowers costs compared with protecting the entire file-serving estate.
 
Workload Discovery & Protection
 
The team at A4S Cloud Solutions worked with the client on a highly complex set of project activities that included an Azure Site Recovery build, an Azure-based application delivery environment, the re-engineering of over 300 interfaces, a new thin client environment and changes to the client’s wide area network, to ensure its application workloads would continue to function in a DR scenario.
 
Each workload was inspected individually, and the necessary changes were planned and implemented jointly by A4S and client staff. Once the changes were made, each workload was moved into an Azure test environment so that the changes could be tested and the necessary assurance obtained.
 
The recovery point objectives achieved across 40 servers were typically between 3 and 10 minutes, depending on each server’s data churn rate. This is made possible by highly efficient, continuous replication between the on-premises ASR infrastructure and the Azure-based ASR service.
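An RPO figure like this is easiest to read as a time difference: if a failure happened now, how much recent change would be lost? The Python sketch below computes that as the gap between the failure time and the newest recovery point taken at or before it. It is illustrative arithmetic under assumed inputs, not how ASR itself calculates or reports RPO; the function names and timestamps (on an arbitrary date) are hypothetical. A server with high data churn has more changes to ship, so its newest available recovery point tends to lag further behind.

```python
from datetime import datetime, timedelta
from typing import Iterable


def data_loss_window(recovery_points: Iterable[datetime], failure_time: datetime) -> timedelta:
    """Time between the failure and the newest recovery point taken at or before it.

    This is the amount of recent change that would be lost if the workload
    failed over to that recovery point.
    """
    eligible = [rp for rp in recovery_points if rp <= failure_time]
    if not eligible:
        raise ValueError("no recovery point exists at or before the failure time")
    return failure_time - max(eligible)


def meets_rpo(recovery_points: Iterable[datetime], failure_time: datetime, target: timedelta) -> bool:
    """True if the achievable data loss is within the target RPO."""
    return data_loss_window(recovery_points, failure_time) <= target


def at(hour: int, minute: int) -> datetime:
    """Hypothetical timestamps on an arbitrary day."""
    return datetime(2024, 1, 1, hour, minute)


# A low-churn server keeps recent points; a high-churn server lags behind.
failure = at(9, 15)
low_churn = [at(9, 0), at(9, 4), at(9, 11)]
high_churn = [at(8, 55), at(9, 3)]

print(data_loss_window(low_churn, failure))   # 0:04:00
print(data_loss_window(high_churn, failure))  # 0:12:00
print(meets_rpo(high_churn, failure, timedelta(minutes=10)))  # False
```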
 
To make failover as smooth as possible, A4S augmented the ASR recovery plans with automated actions such as disabling services, allocating public IP addresses and assigning network security groups. Scripts also scale up key servers and start Azure Backup routines to protect the failed-over systems.
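The recovery-plan automation described above amounts to running ordered groups of actions and stopping when one fails. The Python sketch below models that idea with stub actions; it does not call Azure, and `RecoveryPlan`, `Step` and `placeholder` are hypothetical names. In a real ASR recovery plan, the equivalent work would be Azure Automation runbooks or manual actions attached before or after a group of machines starts; the group numbers and ordering here are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    group: int                  # lower groups run first, like recovery-plan groups
    name: str
    action: Callable[[], None]  # stand-in for a real script or runbook


@dataclass
class RecoveryPlan:
    steps: List[Step] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def add(self, group: int, name: str, action: Callable[[], None]) -> None:
        self.steps.append(Step(group, name, action))

    def run(self) -> bool:
        """Run every step in group order; stop at the first failure."""
        # sorted() is stable, so steps in the same group keep the order they were added.
        for step in sorted(self.steps, key=lambda s: s.group):
            try:
                step.action()
            except Exception as exc:
                self.log.append(f"FAILED group {step.group}: {step.name} ({exc})")
                return False  # halt so an operator can investigate before continuing
            self.log.append(f"ok group {step.group}: {step.name}")
        return True


def placeholder() -> None:
    """Hypothetical stand-in for an automation script; does nothing here."""


# Usage: the actions described above, in an illustrative order.
plan = RecoveryPlan()
plan.add(1, "disable selected services", placeholder)
plan.add(2, "allocate public IP addresses", placeholder)
plan.add(2, "assign network security groups", placeholder)
plan.add(3, "scale up key servers", placeholder)
plan.add(4, "start Azure Backup for failed-over systems", placeholder)
ok = plan.run()
print("\n".join(plan.log))
```

Stopping at the first failure, rather than carrying on, keeps the failover controlled: a half-configured environment is surfaced to an operator instead of being brought into service.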
 
Access in the Event of a Disaster
 
The client used an on-premises VDI solution accessed through thin clients, which would need reconfiguring in a disaster to connect to the recovered workloads in Azure. Because the previous thin client solution could not be reconfigured quickly or easily, A4S delivered a new thin client solution using Windows 10 IoT, fully managed through Microsoft infrastructure. To find out more about this fully integrated solution, please click here.
 
Testing and Assurance
 
To complete the project and provide the necessary assurance, A4S and Your Housing Group (YHG) teams performed an end-to-end test failover. All infrastructure operations were proven before the application teams confirmed that the business applications and data performed correctly.
 
ASR allows a test failover without any impact on live services or existing protection levels, so risky changes and downtime were avoided entirely.
 
Team Knowledge Building & Empowerment
 
Significant time and effort were spent on service transition. The most effective knowledge and skills sharing happened during the test failovers themselves, with client and supplier working through the end-to-end failover process together.
As test failovers continued, the client needed less and less assistance from our team until it became fully self-sufficient in performing the failover process.
 
Positive Use of The A4S Online Project Management Tool
 
The A4S Service Design and Transition services were delivered using our online collaboration platform for complete transparency and ease of communication. To find out more about our approach, please click here.
 
Our online project management platform provides the following benefits:
  • Real-time sharing of risks and issues.
  • Real-time budget burndown reporting.
  • Instant view of progress against schedule.
  • All communications and uploads hosted in one location.
  • Instant reporting of major issues for handling.
  • Download your project status at any time.

The Major Benefits of This Solution Include
  • Complete protection for critical systems in the event of a data centre loss.
  • End-to-end failover now takes less than 6 hours, including the failover itself and infrastructure, application and data testing.
  • RPOs for replication are typically between 3 and 10 minutes.
  • ASR protection costs are extremely low: a small monthly charge per server plus some storage cost, with further charges incurred only during testing or an actual failover.
  • Users get a familiar desktop experience through Microsoft Remote Desktop Services.
  • The thin client environment is now fully managed and can easily be reconfigured or updated.
  • The solution can also serve as a test environment, using point-in-time clones of on-premises or Azure-based server workloads for isolated testing.

Lessons Learned
 
A client’s environment must be inspected very carefully to ensure it will function in the target disaster recovery environment; it may need a range of modifications and supporting automation for services to function as normal.
 
In YHG’s case, significant changes were made to the interfacing environment, focusing mainly on file storage and on ensuring connectivity would survive a network change. Storage dependencies on the main file server were also removed to ensure good replication and failover performance.
 
YHG’s BizTalk environment was also re-engineered to support disaster recovery. At the time of writing, Microsoft’s documentation lacked clarity, which made the process more difficult. Eventually a simpler approach to BizTalk disaster recovery was taken, as the complexity of the Microsoft-recommended approach was felt to bring more challenges than benefits.
 
From a supplier perspective, we felt that an up-front, highly detailed inspection brought major benefits and contributed to the success of the project.
 
Finally, both supplier and client found that being able to run workloads in the disaster recovery environment without affecting production was key to success, because it removed time pressure from both IT and business users. The same capability will be used in future to run routine disaster recovery drills with little or no service disruption.
 
Conclusion
 
After a complex project and close collaboration between client and supplier, Your Housing Group now has a disaster recovery solution that protects its most critical workloads from the risk of data centre failure.
 
The journey of discovery, reconfiguration and implementation also supported the client’s roadmap to move more services into the cloud and improved their knowledge and understanding of Azure services.
 
Extensive, detailed testing proved that the solution gives the client and its auditors the assurance they need that the business can continue to function with its most critical applications and data available.
 
Most evident was the close teamwork between A4S and key members of a wide range of YHG teams, including application delivery, infrastructure, Dynamics 365 and interfacing. This has been a fantastic team effort!


Statement by YHG IT Director Darren Halliwell

“We felt the time was right to upgrade and enhance our existing IT disaster recovery solutions to protect and provide continuity of service for our Organisation.”

“Finding a trusted partner with a demonstrable solution we could buy into was our number one priority for this project.”

“A4S were able to discover and document our environment in a highly detailed manner which inspired confidence from the outset. They proposed a design to our team that included some significant changes to our applications and interfaces that would ensure they would function in the cloud when needed.”

“This was key to us as our strategy and technology roadmap is to shift more and more services to the cloud.”

“Through extensive testing, the solution gave us a high level of assurance alongside excellent levels of performance and a capability to provide multiple in-day recovery points.”

“Working closely with the key SMEs in both the design and testing phases provided the assurance that they and the senior business sponsors were looking for to confirm that the design and implementation was a robust, scalable and fit-for-purpose IT DR solution for the organisation.”

“We are really pleased with the outcome delivered by A4S, it’s been a quality and timely delivery from a professional team who are passionate about delivering and meeting their customer’s needs.”
 

Statement by A4S Managing Director Jason Birchall

“A4S are extremely grateful to have had the opportunity to deliver such a mission-critical disaster recovery solution for our client using Microsoft Azure services. Our technical expertise and experience of housing sector systems made this possible. Our relationship with the client and their IT teams has grown stronger throughout this deployment, as there were many shared lessons learned, long nights and difficult challenges.”

“Working with the YHG IT team is always an enjoyable and engaging process. They have highly skilled team members who were made available during this project; they were engaged throughout and always quick to respond with any urgently needed information and changes.”

“Our aim was to ensure the client’s IT infrastructure was configured to best practice and in a way that would continue to function in a completely new cloud environment. Failover is meant to be smooth and automated, replication intervals need to be extremely low, and the environment needs to feel familiar to the client’s internal users. We feel we have achieved those objectives, which was made possible by the continued support and cooperation of our client.”

“A4S would once again like to thank the YHG IT teams for the opportunity to be part of this project, and we look forward to further opportunities to work together in the future.”

Thanks for reading!

The A4S Team!

Would you like to know more about our solutions and services?

Feel free to email me at [email protected], or fill in a form with your questions!