With the advent of low-cost, high-speed Wide Area Network (WAN) connections, many organisations are realising the cost benefits and service improvements that come from integrating services across multiple sites. However, anyone who has implemented a site-to-site private WAN will know that WANs are not as resilient as LANs and are subject to occasional service interruptions due to equipment failure, re-convergence due to cloud traffic optimisation and (most often) maintenance in the WAN cloud.
This article describes some of the issues encountered in integrating services over a WAN and how they can be overcome.
CASE STUDY: AMBULANCE NHS TRUST
As an example, we will look at an Ambulance NHS trust that provides Computer Aided Ambulance Dispatch (CAAD) services from three separate 999 control rooms across the region in which it operates. The CAAD application is a client/server application, with Citrix sessions between client and server running over the local LAN.
Prior to integration, the three control rooms acted as separate autonomous entities. The Ambulance trust wished to improve service and benefit from the staff and infrastructure savings by integrating the three autonomous control rooms into a single “virtual” control room.
Step 1: Private WAN connectivity
The diagram above shows how the three sites can be interconnected via a high-speed private WAN with 100 Mbit/s connections at each site. This model is perfectly functional, with CAAD clients at each site talking to servers at SITE B, or to backup servers at SITE C in the event of a server failure at SITE B.
The weakness of this design is that if there is a WAN outage, a number of CAAD clients will be isolated from the servers and become unusable for the duration of the outage. For an Ambulance trust providing critical services to the public, even short outages are unacceptable.
Step 2: Diverse private Wide Area Network connectivity
A solution to the problem described in step 1 is to provision two WAN connections at each site, as shown in the diagram above. The WAN tails (where the WAN service enters the building) must be diversely routed to minimise the risk of common failure. More importantly, the two WAN connections should be provided by different suppliers, and care must be taken to ensure that there are no common paths in the cloud between the two WANs.
Adding this diversity to the design ensures that if one WAN connection fails, another is available to accept the traffic. It is, of course, possible that both WANs will fail at the same time, but the likelihood is drastically reduced, provided the two services share no common point of failure.
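The reduction in risk can be illustrated with a little arithmetic. The sketch below assumes that the two WANs fail independently (which is exactly what diverse routing and separate suppliers aim to achieve) and uses an availability of 99.9% per WAN. That figure is an assumption chosen purely for illustration and is not taken from the case study.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(a1: float, a2: float) -> float:
    """Probability that at least one of two independent links is up."""
    return 1.0 - (1.0 - a1) * (1.0 - a2)


def expected_downtime_hours(availability: float) -> float:
    """Expected hours per year with no usable link."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Assumed availability of each WAN (illustrative, not a measured value).
single_wan = 0.999
dual_wan = combined_availability(single_wan, single_wan)

print(f"Single WAN: {expected_downtime_hours(single_wan):.2f} hours/year")
print(f"Dual WAN:   {expected_downtime_hours(dual_wan):.4f} hours/year")
```

If the two WANs share a common path, the independence assumption no longer holds and the real-world improvement will be smaller than this calculation suggests.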
As we stated at the beginning of this article, WAN service interruptions can occur due to equipment failure, re-convergence events and scheduled maintenance work. However, if we look more closely at the failure mechanism, we will see that when these events occur, the switchover between WAN 1 (primary) and WAN 2 (secondary) cannot be instantaneous. It takes a small amount of time to detect the failure of WAN 1 and to re-route traffic over WAN 2. During this period no data is transmitted or received at the site. This period may be only a few seconds, but it will cause a screen freeze on any CAAD client that is running from a remote server, and it impacts overall service delivery at an Ambulance trust.
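To make the failover gap concrete, here is a minimal sketch of a primary/secondary switchover. All of the timings (one packet every 100 ms, 1.5 seconds to detect the failure, 0.5 seconds to re-route) are hypothetical values chosen for illustration; real figures depend on the routing protocol and equipment in use.

```python
def packets_lost_in_failover(send_times_ms, fail_at_ms, detect_ms, reroute_ms):
    """Return the send times of packets lost while traffic is still
    directed at the failed primary WAN."""
    switched_at_ms = fail_at_ms + detect_ms + reroute_ms
    return [t for t in send_times_ms if fail_at_ms <= t < switched_at_ms]


# One packet every 100 ms for 10 seconds (hypothetical traffic pattern).
send_times = list(range(0, 10_000, 100))

lost = packets_lost_in_failover(
    send_times,
    fail_at_ms=2_000,   # WAN 1 fails two seconds in
    detect_ms=1_500,    # assumed time to notice the failure
    reroute_ms=500,     # assumed time to move traffic to WAN 2
)

print(f"{len(lost)} of {len(send_times)} packets lost during switchover")
```

Even with both WANs in place, every packet sent during the detection and re-routing window is lost, which is what the user experiences as a frozen screen.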
Step 3: Packet level resilience
Happily, there are now a small number of products available which can manage WAN traffic on a packet-by-packet basis and ensure that, as long as any path is available, each packet is delivered across the WAN with the lowest available latency.
In our case study, a pair of T750 Talari units (www.talari.com) was installed as a high availability pair at each site in order to manage CAAD traffic across the network. The Talari units duplicate critical traffic at the source site and send one copy over WAN 1 and one over WAN 2. The Talari units at the destination site then accept the first copy of each packet to arrive and discard the duplicate.
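The general idea of duplicating packets and keeping the first arrival can be sketched as follows. This is not Talari's implementation: the sequence numbers, the `Receiver` class, the `simulate` function and the latency figures are all hypothetical, introduced only to illustrate the technique.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Delivers the first copy of each sequence number, drops later copies."""
    seen: set = field(default_factory=set)
    delivered: list = field(default_factory=list)

    def on_arrival(self, seq: int, link: str) -> bool:
        if seq in self.seen:
            return False              # duplicate copy: discard
        # A real device would bound this set with a sliding window.
        self.seen.add(seq)
        self.delivered.append((seq, link))
        return True


def simulate(num_packets, latency_ms, lost, interval_ms=20):
    """Send every packet over every link, then replay arrivals in time order."""
    arrivals = []
    for seq in range(num_packets):
        sent_at = seq * interval_ms
        for link, latency in latency_ms.items():
            if (link, seq) not in lost:   # copies in `lost` never arrive
                arrivals.append((sent_at + latency, seq, link))
    receiver = Receiver()
    for _, seq, link in sorted(arrivals):
        receiver.on_arrival(seq, link)
    return receiver.delivered


# WAN1 is faster, but drops packets 2 and 3 during a short outage.
delivered = simulate(
    num_packets=5,
    latency_ms={"WAN1": 10, "WAN2": 25},
    lost={("WAN1", 2), ("WAN1", 3)},
)
for seq, link in delivered:
    print(f"packet {seq} delivered via {link}")
```

In this example every packet reaches the application: packets 2 and 3 simply arrive via the slower WAN2 copy, with no detection or switchover delay.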
The effect of this is to eliminate any “glitches” caused by outages on a single WAN. Every packet will be delivered as efficiently as possible.
The table above shows a summary of the performance of a Talari-backed dual-WAN solution at SITE A over a period of one week. Ingress and egress traffic over each WAN to both destination sites is evaluated separately.
- The Talari protected traffic had 100% uptime during this period as shown in the Uptime column enclosed by the green box.
- All of the individual WAN paths experienced some downtime during this period as shown in the Badtime column enclosed by the red box above.
- In fact, there was a major service outage lasting over 11 hours on WAN 1 between SITE A and SITE C.
- No CAAD screen freezes were experienced by the Ambulance trust during this period.
Complete Networks Ltd are the Europe-wide experts in designing and implementing highly resilient WAN solutions using Talari devices. For more information on improving your WAN performance, please contact firstname.lastname@example.org