Current status: All our ducks are in a row.

What is this site? We monitor the status of coursera.org and our class sites, and we update here whenever there are interruptions in service.

Saturday, May 11, 2013

2013-05-11: [Follow up] Site-wide Interruptions


Beginning around 23:22 UTC on May 11, our database hosted on Amazon Web Services unexpectedly entered a "failed connection" state. As our database caches expired, parts of the site failed one after another, ultimately leading to a site-wide outage. We restored all class.coursera.org functionality at 00:00 UTC (May 12). At the same time we restored www.coursera.org functionality for all users except those using Chrome. Chrome users continued to have trouble browsing courses on www.coursera.org until 00:23 UTC, because an experimental optimization of page-loading speed took additional time to fix.

Based on early reports, the database outage appears to have been caused by a bug in the way Amazon Web Services interacts with MySQL, the software powering our databases. Specifically, we believe the problem involved the MySQL binary log (binlog), a log that helps ensure updates to the database are not lost. Engineers at Coursera and Amazon are continuing to investigate the root cause of the outage.

Any downtime is unacceptable to us, and we will be working closely with Amazon to prevent a similar issue from occurring again. We sincerely apologize for the inconvenience this caused the Coursera community.

2013-05-11: [Resolved] Site-wide Interruptions

Normal access to our site was restored as of 5:25 pm PDT. Thank you for your patience.

2013-05-11: Site-wide Interruptions

We are currently experiencing issues with our servers that are causing site-wide interruptions. We are working with Amazon to resolve the issue. Please be patient as we investigate the problem.