Following the outage last week I wanted to give you an update on what happened that evening and how we are focussed on continually improving the 48 network experience.
As I mentioned previously we have an ongoing investment programme on our platform and are coming to the end of a significant phase of that programme. There are still some final elements of this programme to be completed in January and February next year and unfortunately these are the elements that caused the problem last week. They also added to the length of time the system took to recover full service.
In accordance with best practice we had implemented a change freeze on the network in December and were confident that the work we had completed over 2013 had us in a good place over the Christmas period and into January, when we were due to recommence the upgrade programme. This turned out not to be the case, and it was the specific nodes which were next on the list for an upgrade that exacerbated the issues. It is particularly disappointing that you have experienced another outage like this given that we have in fact invested a lot of time and money into the network throughout 2013, but we are confident that we will see the benefit of these upgrades in 2014.
With regard to the event itself our systems first started reporting errors at 5:43pm on Wednesday 19th December. The errors indicated that there was a fault with the hardware that manages the throughput of traffic across our core platform. This immediately affected calls, texts and data. It also resulted in the failure of our website services so that you could not buy memberships or add-ons on our website either.
The issue was escalated immediately with our service providers. Due to the nature of the issue our engineers could not rectify it remotely and were sent to the data centre. Data centre engineers began investigating the issues before our own engineers arrived. The faulty hardware was fixed at around 8pm and the website services returned at that time, but unfortunately the rest of the network took a lot longer to recover. Restoring it was the key focus until we saw service resume at 1:25am that night. Our engineers worked hard throughout the evening to bring all the necessary systems back online as quickly as possible. Unfortunately this process can take some time due to the complexities involved in this work. Our engineers then remained onsite for the rest of the night to ensure the systems were stable.
We have now put in place a higher level of monitoring on the hardware impacted by this issue. We have already made some configuration changes to the hardware that manages the throughput of traffic across the platform and have replaced the specific hardware that was showing the errors.
We are committed to providing a better service to our customers and are confident that the final changes we will be making early next year will allow us to provide a significantly improved and robust service for our customers.
The purpose of this level of detail is to explain what happened, not to excuse it. We do know how frustrating and annoying these outages are and we can’t apologise enough for them.
There have been no issues with the network since service resumed on Thursday night. However, there may be local issues with the O2 radio network (their masts) in your area, which means you can't pick up a signal.
You have already rebooted your phone, so please try to search for networks manually, find 48 and then connect to it. If that doesn't work then there may be a local O2 mast issue in your area. If you ask a 48 agent and mention the area you are in, they should be able to tell you if there is an issue in your area and/or help you get connected again.
A fairly thorough explanation and I think many people appreciate that.
From your explanation, this outage appears to have been fully self-inflicted and not due to any third party, and it is welcome that you admit that.
It strikes me that as a low-cost service 48 has failed to invest in the key skills needed to adequately plan, test and implement the very improvements that are indeed welcome to every customer. This has led to a) a meltdown, and b) an inability to recover from that meltdown in the kind of time frame that this kind of critical communications service demands.
When 48 offers a service to the public, cheap or not, it invites ordinary people to depend on and rely on 48 for critically important support when they are involved in seriously important moments in their lives, and in crises. This time I was lucky and doing nothing at all important. But many, many people will have suffered appalling inconvenience and worry and stress and upset.
It is this that your senior management need to keep in mind when you are considering recruitment and investment in the skills needed for planning and monitoring.
Our reception was out for over 8 hours, and I had no way of contacting anyone and had to walk 3 miles home in the rain. Why should we not get compensation when we pay our membership every month and this company is supposed to be on the O2 reception lines?