OVHcloud Public Cloud Status

[Global][Public Cloud] - Databases API instabilities

Incident Report for Public Cloud

Resolved

Start time : 02/09/2024 10:30 UTC
End time : 26/09/2024 15:30 UTC
Root cause : A change in the underlying metrics infrastructure caused a latency increase.
We would like to inform you that the incident has been resolved and the situation is back to normal.

We thank you for your understanding and patience throughout this incident.
Posted Oct 01, 2024 - 09:19 UTC

Monitoring

A change in the underlying metrics infrastructure on August 21 increased latency for metrics calls under certain conditions. This caused the September 2 event, during which all such calls failed for a period of hours. Other parts of the system treated calls whose latency was too high as errors, resulting in an error rate of about 3% on metrics endpoints since then.
Increasing the latency tolerance in some parts of the system has allowed us to mitigate the impact and reach an error rate of almost 0%. We continue to monitor and work on improvements to return to nominal behavior. Further details will be provided on Monday 30/09/2024.
Posted Sep 27, 2024 - 13:48 UTC
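
For illustration only, the effect of a latency threshold on the observed error rate can be sketched as follows. This is a minimal sketch, not the actual implementation: the function name `error_rate`, the threshold values and the latency figures are all hypothetical and were chosen only to reproduce the approximate rates mentioned above.

```python
def error_rate(latencies_ms, timeout_ms):
    """Fraction of calls counted as errors because they exceeded timeout_ms."""
    if not latencies_ms:
        return 0.0
    timed_out = sum(1 for latency in latencies_ms if latency > timeout_ms)
    return timed_out / len(latencies_ms)


# Hypothetical latencies (milliseconds) for 100 metrics calls: 97 fast, 3 slow.
latencies_ms = [120] * 97 + [2500] * 3

# With a strict (hypothetical) threshold, the slow calls are counted as errors.
strict = error_rate(latencies_ms, timeout_ms=1000)
# With a more tolerant (hypothetical) threshold, the same calls succeed.
relaxed = error_rate(latencies_ms, timeout_ms=5000)

print(f"strict: {strict:.0%}, relaxed: {relaxed:.0%}")
```

In this sketch the calls themselves do not change. Only the threshold above which a slow call is declared an error changes, which is why raising the tolerance lowers the reported error rate without restoring nominal latency.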

Update

The incident is still ongoing. We would like to assure you that we are doing our utmost to resolve this situation as quickly as possible.

As soon as the situation evolves or the incident is resolved, we will keep you informed.
Posted Sep 16, 2024 - 09:25 UTC

Update

The incident is still ongoing. We would like to assure you that we are doing our utmost to resolve this situation as quickly as possible.

As soon as the situation evolves or the incident is resolved, we will keep you informed.
Posted Sep 06, 2024 - 10:08 UTC

Identified

Ongoing actions : Errors have again been observed since 05/09/2024 08:00 UTC.
Our providers are currently working on a fix for this issue.
Posted Sep 05, 2024 - 15:29 UTC

Update

Ongoing actions : Errors were observed again on 04/09/2024 between 01:00 UTC and 02:00 UTC, and between 07:00 UTC and 08:00 UTC.
Our teams are currently monitoring the situation.
Posted Sep 04, 2024 - 08:46 UTC

Monitoring

Ongoing actions : No further impact has been observed since 02/09/2024 09:30 UTC.
Our teams are currently monitoring the situation.
Posted Sep 03, 2024 - 13:05 UTC

Investigating

Service impact : Errors are being observed again on API calls to metrics endpoints.
Customers may receive error 500 and error 200 responses.
Database service functionality is not impacted and should continue to work as expected.
Ongoing actions : Our technical teams are currently working with our partner to solve the issue.
An update will be posted as significant progress is made.
Posted Sep 03, 2024 - 07:57 UTC

Monitoring

Ongoing actions : No further impact has been observed since 02/09/2024 20:00 UTC.
Our teams are currently monitoring the situation.
Posted Sep 03, 2024 - 07:17 UTC

Update

The incident is still ongoing. We would like to assure you that our providers are doing their utmost to resolve this situation as quickly as possible.

As soon as the situation evolves or the incident is resolved, we will keep you informed.

Thank you for your understanding.
Posted Sep 02, 2024 - 17:32 UTC

Update

Our providers are continuing to work on a fix for this issue.
Posted Sep 02, 2024 - 12:54 UTC

Identified

The issue has been identified and a fix is being implemented.
Posted Sep 02, 2024 - 12:33 UTC

Investigating

Start time : 02/09/2024 10:30 UTC
Service impact : API calls to metrics endpoints are erroring. Database service functionality is not impacted and should continue to work as expected.
Ongoing actions : Investigating
Our providers are working on the issue.
An update will be posted as significant progress is made.
Posted Sep 02, 2024 - 12:33 UTC
This incident affected: Data & Analytics || Kafka (BHS, DE, GRA, SBG, UK, WAW), Databases || Cassandra (BHS, DE, SBG, UK, WAW), Data & Analytics || Grafana (BHS, DE, GRA, SBG, UK, WAW), Data & Analytics || Kafka Connect (BHS, DE, GRA, SBG, UK, WAW), Data & Analytics || Kafka MirrorMaker (BHS, DE, GRA, SBG, UK, WAW), Databases || MySQL (BHS, DE, GRA, SBG, UK, WAW), Data & Analytics || OpenSearch (BHS, DE, GRA, SBG, UK, WAW), Databases || PostgreSQL (BHS, DE, GRA, SBG, UK, WAW), Databases || Caching (BHS, DE, GRA, SBG, UK, WAW), Databases || M3 Aggregator (BHS, DE, GRA, SBG, UK, WAW), and Databases || M3DB (BHS, DE, GRA, SBG, UK, WAW).