Most regular users will have noticed that last night you were unable to access the forum for much of the evening.
Having contacted the host (whose own site was also down at the time), I have now received an explanation of the outage, which apparently stemmed from a large distributed denial-of-service attack on another company using the same data center. This was resolved as of 1 am this morning, and we have returned to normal operation without any loss of data.
There was also a very short period of downtime at 17.00 while I took a backup of the site as a precautionary measure, and there will be another very short period of downtime at 22.00 while the data center swaps over some hardware to help prevent a recurrence.
For those interested, a full transcript of the reply from the host is included below.
Dear Daniel Hutchinson (Canalworld),
First and foremost we would like to thank you for your patience yesterday and hope that you will accept our sincerest apologies for the prolonged network outage. We fully appreciate the impact this issue will have had on your business/website/email and would therefore like to reiterate that we will be doing everything in our power to ensure something of this nature does not occur again.
Preliminary analysis indicates that the original cause of the incident was a large DDoS attack targeted against one or more other tenants (not us) hosted within our datacentre facility. The attack was of such magnitude that the external IP transit and peering links were saturated, causing instability. This was then magnified by the sheer volume of traffic being directed from the core routers towards their internal switching platform.
Despite the datacentre policy of over-provisioning all switch and network link capacity, this volume of traffic was sufficient to disrupt the control protocols that normally ensure proper operation of the switches. The network engineers worked quickly to null-route the DDoS traffic; however, the induced instability within the switching network reintroduced problems and created others downstream within their customer switching networks.
In order to contain the situation and restore normal operation, datacentre network engineers took the major step of shutting down segments of the core switching network and re-establishing each tenant connection (us being one) from scratch. This was ultimately successful, but entailed a large amount of reconfiguration and testing work which unfortunately took time and caused further intermittent packet loss and downtime for the large majority of our customers.
As of approximately 1:00 AM (GMT) datacentre engineers were able to restore full service. They have since decided that the quickest and most risk-free option for restoring full resiliency and adding additional network capacity is to bring forward planned maintenance to replace their existing core switches. Their network engineers will be working with J-TAC (Juniper Technical Assistance Centre) to complete staging of the new switches today, and will carry out emergency maintenance tonight (4th January) to perform the upgrade.
We anticipate this maintenance will commence around 2200 (10:00 PM GMT), and downtime should not exceed five minutes. We will issue an update once the network engineers confirm the time, and an ALL-CLEAR update will be issued upon successful completion of this maintenance.
Once again, across the board we understand the negative impact that downtime can have on each and every customer. It is a nightmare for all parties involved and something that we take extremely seriously. Although what happened yesterday was extremely rare, we remain fully committed to doing everything possible to ensure that it does not happen again, including performance improvements for all customers which will help mitigate similar issues in the future. We will announce these changes separately from this notification.
On behalf of our entire team we would like to thank you for your continued business and wish you the best moving forward in this New Year.
UK Web.Solutions Direct Ltd