Amazon Web Services Outage: What Happened and What’s Down Now

Amazon’s cloud service experienced a brief outage this week, interrupting service across parts of the U.S. East Coast and affecting several high-profile sites, including Netflix, Reddit and Foursquare.

Some European customers also faced disruption after lightning struck a data center in Dublin, Ireland.

The affected Amazon Web Services (AWS) systems were restored within about half an hour, but the event evoked memories of a much larger outage the company suffered in April that lasted two days.

In a statement, an Amazon spokesperson described the incident as an unusual power event that temporarily impacted storage servers: “Due to the scale of the power disruption, a large number of EBS servers lost power and require manual operations before volumes can be restored. Restoring these volumes requires that we make an extra copy of all data, which has consumed most spare capacity and slowed our recovery process. We’ve been able to restore EC2 instances without attached EBS volumes, as well as some EC2 instances with attached EBS volumes. We are in the process of installing additional capacity in order to support this process both by adding available capacity currently onsite and by moving capacity from other availability zones to the affected zone.”

The spokesperson added that the disruption began with a transformer-related electrical event that knocked out some of the backup generators: “Normally, upon dropping the utility power provided by the transformer, electrical load would be seamlessly picked up by backup generators…The transient electric deviation caused by the explosion was large enough that it propagated to a portion of the phase control system that synchronises the backup generator plant, disabling some of them.”

The outage quickly trended on Twitter as users raised concerns about the reliability and safety of hosting critical data and services in the cloud, especially if similar disruptions were to occur in the future.

While this incident was resolved swiftly, it underscores ongoing challenges for cloud providers and their customers. High availability depends on redundant infrastructure, rapid detection and recovery processes, and sufficient spare capacity to handle unexpected loads. When multiple systems fail at once, as with storage volumes that require manual intervention, the time to restore full service can increase significantly.

For businesses that rely heavily on cloud platforms, the event highlights several practical considerations: implement multi-region or multi-zone redundancy to reduce single points of failure; maintain clear disaster recovery plans and regular backup verification; and monitor service-level announcements and health dashboards to stay informed during incidents. Organizations can also review their architecture to reduce dependencies on specific storage types or configurations that might require prolonged manual recovery.
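
As a rough illustration of the multi-zone idea, the Python sketch below picks the first zone whose health check passes and skips any that fail. It is a minimal sketch, not a description of how AWS or any of the affected sites handle failover; the function name `first_healthy`, the zone hostnames and the health-check callback are all hypothetical.

```python
from typing import Callable, Iterable, Optional


def first_healthy(
    endpoints: Iterable[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first endpoint whose health check passes, or None.

    `is_healthy` is a caller-supplied check (hypothetical here); in practice
    it might issue an HTTP request with a short timeout.
    """
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # A health check that raises is treated like an unhealthy zone.
            continue
    return None


if __name__ == "__main__":
    # Hypothetical zone hostnames, listed in order of preference.
    zones = [
        "zone-a.example.internal",
        "zone-b.example.internal",
        "zone-c.example.internal",
    ]
    # Simulate an outage in the preferred zone.
    down = {"zone-a.example.internal"}
    print(first_healthy(zones, lambda zone: zone not in down))
```

A check like this only helps if the data behind each zone is kept in sync, which is where the backup and recovery planning mentioned above comes in.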

From a user perspective, intermittent outages of major services are a reminder to plan for temporary access issues. Content caching, local backups of critical data, and contingency processes for customer-facing systems can help soften the impact of short-term outages.
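
One way to picture the caching approach is a small wrapper that keeps serving the last good copy of a response when the upstream service stops answering. The Python sketch below is an assumption-laden example rather than a production cache: the class name `StaleOnErrorCache`, the `fetch` callback and the 60-second default lifetime are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class StaleOnErrorCache:
    """Cache that falls back to an expired entry when the fetch fails."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch        # hypothetical upstream call
        self._ttl = ttl_seconds    # how long an entry counts as fresh
        self._clock = clock        # injectable so the cache can be tested
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        cached = self._store.get(key)
        # Fresh entry: answer from the cache without calling upstream.
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        try:
            value = self._fetch(key)
        except Exception:
            # Upstream is failing: serve the stale copy if one exists.
            if cached is not None:
                return cached[1]
            raise
        self._store[key] = (now, value)
        return value
```

Serving stale data is a trade-off: it keeps a page available during a short outage, but it is only appropriate for content where a slightly out-of-date answer is acceptable.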

Amazon has stated it is adding capacity both onsite and from other availability zones to aid recovery and help prevent similar slowdowns in restoration. As cloud adoption continues to grow, both providers and customers will likely keep investing in resilience measures to limit the scope and duration of future disruptions.