Tenzix - Energy Failure in Portuguese Datacenters – Incident details

Energy Failure in Portuguese Datacenters

Resolved
Operational
Started 2 months ago
Lasted about 11 hours

Affected

Datacenters

Operational from 11:06 AM to 1:57 PM, Major outage from 1:57 PM to 2:58 PM, Operational from 1:57 PM to 2:58 PM, Major outage from 2:58 PM to 10:00 AM, Operational from 10:06 PM to 10:00 AM

LX1

Operational from 11:06 AM to 2:58 PM, Major outage from 2:58 PM to 10:00 AM, Operational from 10:06 PM to 10:00 AM

OPO1

Operational from 11:06 AM to 1:57 PM, Major outage from 1:57 PM to 10:00 AM, Operational from 10:06 PM to 10:00 AM

Tenzix Ecosystem

Operational from 11:06 AM to 3:21 PM, Major outage from 3:21 PM to 10:00 AM, Partial outage from 9:26 PM to 10:00 AM, Operational from 10:06 PM to 10:00 AM

tenzix.com

Operational from 11:06 AM to 3:21 PM, Major outage from 3:21 PM to 9:26 PM, Partial outage from 9:26 PM to 10:00 AM, Operational from 10:06 PM to 10:00 AM

dash.tenzix.com

Operational from 11:06 AM to 3:21 PM, Major outage from 3:21 PM to 10:00 AM, Operational from 10:06 PM to 10:00 AM

Updates
  • Update

    Yesterday, starting at 15:28 Lisbon time, all Tenzix services became unavailable due to a nationwide power outage.

    Even though the power outage started at 11:33, our backup energy systems picked up the workload and allowed us to continue operating normally.

    Unfortunately, at 15:28 one of the generators powering the data center failed, making the energy systems unstable.

    Afterward, a second unit began overheating from the extra load and also failed.

    Our team was on-site and off-site, monitoring the situation, updating all our customers and partners, and working to restore normal operations.

    At 17:45, our remote team rerouted our main website to a new server, allowing us to better notify customers who were unaware of what was going on.

    Even though the power grid started to come back online prior to the service restoration, we took precautions by checking all services to prevent damage.

    Everything is now operational and all systems are restored to full capacity.

    After this incident, we've decided to credit our users, even though we are below our annual SLA. The credit will be given to users who open a ticket requesting it from our sales team.

    We're sorry for this incident, which was beyond our control.

    We are already working with our data center operator to determine what can be done better on our side next time to prevent such drastic downtime.

    For starters, our website will now be routed through two different countries to prevent communication issues like this one. All our email services went down because we relied on a single location.

    As such, we're moving some of our services to new geo-redundant solutions.

    As for DNS services for customers, we will be adding another external location for redundancy. 

    Thanks for your continued trust and for making us want to improve for you.

  • Resolved

    All services are now online. We will be moving our website back so that it returns to its normal operation.

    Please open a ticket to claim your SLA credit. Credits will be issued starting tomorrow evening.

    We will also be posting an announcement about this incident, explaining what happened in more detail than on our status page.

    Thanks for your understanding.

  • Update

    Power has been fully restored at our Porto datacenter.

    Our systems are currently recovering.

  • Monitoring

    All of the generators in our LX1 datacenter have run out of diesel.

    We're waiting for a refill. Our technicians are already on site working to re-establish all services.

  • Update

    We are continuing to work on a fix for this incident. Unfortunately, our LX1 datacenter is now being affected as well.

    We will keep updating this incident as we receive new information.

  • Update

    Due to this incident, some additional packet loss is being detected. We're working on restoring all our infrastructure to normal values.

  • Identified

    We are currently facing a major power grid failure affecting both of our Portuguese datacenters.

    We are currently running on our backup energy systems. We will update once everything is restored.