Hosted Email Service Degradation

Incident Report for Enom

Postmortem

Wed, Nov 13, 2024 8:00pm - Tue, Nov 19, 4:00pm EST

Description

On Nov 13, 8:00am EST, we had a hardware failure that caused IMAP and Webmail services to fail on one of our mailstores.

The failure and the normal inbound flow of requests resulted in an unexpected increase in the number of in-flight requests, which caused performance issues on the surviving members of the cluster. The net result was a degraded service level that did not self-remedy after the failover as expected.

This left a subset of users unable to access their mailboxes during the ongoing maintenance.

The increase in in-flight requests as the peak hours approached further compounded the issue, affecting the load on other platform components. This resulted in issues that affected normal access to IMAP and Webmail services.

Root Causes Found

The cause was a hardware failure, compounded by increased workload demands on the surviving network elements.

Solution

To recover from the peak backlog, we had to throttle connections and re-enable service gradually so that the cluster could stabilize.
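
For illustration only, throttling connections and then restoring service in stages can be sketched as a cap on concurrent connections that starts low and is raised step by step. This is a minimal sketch under stated assumptions and does not represent the actual implementation used during this incident: the class name `RampedThrottle`, its parameters, and the step-wise ramp are all hypothetical.

```python
import threading


class RampedThrottle:
    """Hypothetical sketch: cap concurrent connections, raise the cap in steps."""

    def __init__(self, start_limit: int, max_limit: int, step: int) -> None:
        if not 0 < start_limit <= max_limit or step <= 0:
            raise ValueError("need 0 < start_limit <= max_limit and step > 0")
        self._limit = start_limit      # current cap on concurrent connections
        self._max_limit = max_limit    # cap once service is fully restored
        self._step = step              # how much each ramp-up raises the cap
        self._active = 0               # connections currently admitted
        self._lock = threading.Lock()

    @property
    def limit(self) -> int:
        with self._lock:
            return self._limit

    def try_acquire(self) -> bool:
        """Admit a connection if under the current cap; otherwise reject it."""
        with self._lock:
            if self._active >= self._limit:
                return False
            self._active += 1
            return True

    def release(self) -> None:
        """Mark one admitted connection as finished."""
        with self._lock:
            if self._active > 0:
                self._active -= 1

    def ramp_up(self) -> int:
        """Raise the cap by one step, never above max_limit; return the new cap."""
        with self._lock:
            self._limit = min(self._limit + self._step, self._max_limit)
            return self._limit
```

In this sketch, an operator or a health check would call `ramp_up()` only after the cluster has stayed stable at the current cap, so that rejected clients retry later instead of adding to the in-flight backlog.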

Post Mortem

To help prevent a recurrence of issues like this, we have modified the way in which users are assigned to different platform components. We have also begun transferring user mailboxes to other elements to better balance resource requirements across the platform.

Posted Nov 19, 2024 - 13:56 PST

Resolved

This incident has been resolved.

Posted Nov 19, 2024 - 13:52 PST

Monitoring

A fix has been implemented, and we will continue to monitor to make sure there are no other issues.

Posted Nov 19, 2024 - 13:33 PST

Investigating

We are investigating an issue that is preventing some users on our mail system from logging in to Webmail and from making POP and IMAP connections. Our engineering team has been engaged.

We will provide an update once we have additional information.

Posted Nov 19, 2024 - 12:16 PST

This incident affected: Basic Email Service.