Resolved -
Start time : 02/09/2024 10:30 UTC
End time : 26/09/2024 15:30 UTC
Root cause : A change in the underlying metrics infrastructure caused a latency increase.
We would like to inform you that the incident has been resolved and the situation is back to normal.
We thank you for your understanding and patience throughout this incident.
Oct 1, 09:19 UTC
Monitoring -
A change in the underlying metrics infrastructure on August 21 increased latency for metrics calls under certain conditions. This caused the September 2 event, during which all such calls failed for a period of hours. Other parts of the system treated calls with excessive latency as errors, which has caused an error rate of about 3% on metrics endpoints since then.
Increasing the latency tolerance in some parts of the system allows us to mitigate the impact and reach an error rate of almost 0%. We continue to monitor and work on improvements to return to nominal behavior. Please be advised that further details will be provided on Monday 30/09/2024.
Sep 27, 13:48 UTC
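For illustration only, the mechanism described in the update above (slow calls being counted as errors, and a higher latency tolerance lowering the error rate) can be sketched as follows. This is a minimal sketch under stated assumptions, not the actual implementation: the function name `error_rate`, the latency values, and both thresholds are hypothetical and do not come from the affected system.

```python
def error_rate(latencies_ms, timeout_ms):
    """Return the fraction of calls counted as errors because they took longer than timeout_ms."""
    if not latencies_ms:
        return 0.0
    too_slow = sum(1 for latency in latencies_ms if latency > timeout_ms)
    return too_slow / len(latencies_ms)


# Hypothetical sample: 97 fast calls and 3 slow calls (values are made up).
latencies_ms = [120] * 97 + [900] * 3

# With a strict (hypothetical) tolerance, the slow calls are reported as errors.
strict_rate = error_rate(latencies_ms, timeout_ms=500)

# With a higher (hypothetical) tolerance, the same calls are accepted.
tolerant_rate = error_rate(latencies_ms, timeout_ms=1000)

print(f"strict: {strict_rate:.0%}, tolerant: {tolerant_rate:.0%}")
```

In this sketch the calls themselves do not get faster; only the threshold at which a slow call is classified as an error changes, which is why work on the underlying latency continues after the mitigation.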
Update -
The incident is still ongoing. We would like to assure you that we are doing our utmost to resolve this situation as quickly as possible.
We will inform you as soon as the situation evolves or the incident is resolved.
Sep 16, 09:25 UTC
Update -
The incident is still ongoing. We would like to assure you that we are doing our utmost to resolve this situation as quickly as possible.
We will inform you as soon as the situation evolves or the incident is resolved.
Sep 6, 10:08 UTC
Identified -
Ongoing actions : Errors have again been observed since 05/09/2024 08:00 UTC.
Our providers are currently working on a fix for this issue.
Sep 5, 15:29 UTC
Update -
Ongoing actions : Errors were observed again on 04/09/2024 between 01:00 UTC and 02:00 UTC, and again between 07:00 UTC and 08:00 UTC.
Our teams are currently monitoring the situation.
Sep 4, 08:46 UTC
Monitoring -
Ongoing actions : No further impact has been observed since 02/09/2024 09:30 UTC.
Our teams are currently monitoring the situation.
Sep 3, 13:05 UTC
Investigating -
Service impact : Errors are being observed again on API calls to metrics endpoints.
Customers may receive error 500 and error 200 responses.
Database service functionality is not impacted and should continue to work as expected.
Ongoing actions : Our technical teams are currently working with our partner to resolve the issue.
Updates will be posted as significant progress is made.
Sep 3, 07:57 UTC
Monitoring -
Ongoing actions : No further impact has been observed since 02/09/2024 20:00 UTC.
Our teams are currently monitoring the situation.
Sep 3, 07:17 UTC
Update -
The incident is still ongoing. We would like to assure you that our providers are doing their utmost to resolve this situation as quickly as possible.
We will inform you as soon as the situation evolves or the incident is resolved.
Thank you for your understanding.
Sep 2, 17:32 UTC
Update -
Our providers are continuing to work on a fix for this issue.
Sep 2, 12:54 UTC
Identified -
The issue has been identified and a fix is being implemented.
Sep 2, 12:33 UTC
Investigating -
Start time : 02/09/2024 10:30 UTC
Service impact : API calls to metrics endpoints are erroring. Database service functionality is not impacted and should continue to work as expected.
Ongoing actions : Investigating
Our providers are working on the issue.
Updates will be posted as significant progress is made.
Sep 2, 12:33 UTC