On Saturday, 21 October 2023, at 08:13 UTC, a router in the Montreal PoP (Point of Presence) began broadcasting warnings.
Our technical teams began investigating the situation as soon as the alerts were received.
While we were troubleshooting, clients began to report issues between the BHS (Beauharnois) datacenter and external sources (such as their own external servers or third-party services).
The external network traffic going through the PoP and cascading down to BHS was degraded while our internal services were operational.
Since our monitoring relies essentially on internal agents and network traffic wasn't fully down, we didn’t immediately identify that select external requests could fail randomly. This slowed down the investigation, which we regret.
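To illustrate why a degraded path that is not fully down is hard to catch, here is a minimal sketch of an external probe that looks at the failure rate over repeated attempts rather than a single up/down check. It is not a description of our monitoring stack: the function names, the 5% threshold, and the idea of running it from a vantage point outside the network are all assumptions made for the example.

```python
import socket
from dataclasses import dataclass


@dataclass
class ProbeSummary:
    attempts: int
    failures: int

    @property
    def failure_rate(self) -> float:
        # Fraction of failed attempts; 0.0 when nothing was probed.
        return self.failures / self.attempts if self.attempts else 0.0


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refusals, and DNS failures
        return False


def summarize(results: list[bool]) -> ProbeSummary:
    """Aggregate individual probe results (True = success)."""
    return ProbeSummary(attempts=len(results), failures=results.count(False))


def classify(summary: ProbeSummary, degraded_threshold: float = 0.05) -> str:
    """Hypothetical classification: intermittent failures count as 'degraded'."""
    if summary.attempts == 0:
        return "unknown"
    if summary.failures == summary.attempts:
        return "down"
    if summary.failure_rate >= degraded_threshold:
        return "degraded"
    return "ok"
```

A caller would collect results with something like `[tcp_probe(host, 443) for _ in range(20)]` from an external machine and pass them to `summarize` and `classify`. A single successful probe would report the target as up, whereas the aggregated view flags it as degraded when some attempts fail.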
Due to the nature of the issue (Network), multiple impacts have been identified, such as:
VPN instabilities
Reaching OVHcloud public IPs
Timeout from external domains to OVHcloud
Servers up from inside OVHcloud's network but down from an external point of view
Intermittent connection issues to OVHcloud servers
Intermittent hostname resolution issues
Servers not answering external requests, or answering very slowly
Ping/traceroute not reaching OVHcloud
Packet loss between multiple network links and IPs
Around 14:30 UTC, we identified the faulty network device, which had an issue with its FIB (Forwarding Information Base). We immediately began the verification and isolation processes.
These actions resolved the issue.
Some additional time was needed before we could deem the incident fully resolved.
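For readers unfamiliar with the FIB, it is the table a router consults to pick the next hop for each packet, using the most specific matching prefix. The sketch below is purely illustrative and does not describe the actual failure on the device: the prefixes are documentation ranges, and the next-hop names and the bad entry are hypothetical. It only shows how a single incorrect entry can affect some destinations while others keep working.

```python
import ipaddress
from typing import Dict, Optional

Fib = Dict[ipaddress.IPv4Network, str]


def lookup(fib: Fib, destination: str) -> Optional[str]:
    """Longest-prefix match: return the next hop of the most specific prefix."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in fib.items():
        if addr in prefix and (best is None or prefix.prefixlen > best[0].prefixlen):
            best = (prefix, next_hop)
    return best[1] if best else None


# Hypothetical healthy table: a default route plus one more specific route.
healthy_fib: Fib = {
    ipaddress.ip_network("0.0.0.0/0"): "upstream-a",
    ipaddress.ip_network("198.51.100.0/24"): "peer-b",
}

# Same table with one hypothetical bad entry covering half of the /24.
faulty_fib: Fib = dict(healthy_fib)
faulty_fib[ipaddress.ip_network("198.51.100.128/25")] = "discard"
```

With these tables, `lookup(faulty_fib, "198.51.100.10")` still returns `"peer-b"`, while `lookup(faulty_fib, "198.51.100.200")` returns `"discard"`. Traffic to the second address is lost even though the device and most routes appear healthy.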
Post-incident investigation points to a third-party software malfunction.
The issue has been raised with the appropriate parties.
This incident will help us improve our action plans (a campaign is planned to check all our PoP devices).
We are sorry for any inconvenience caused by this issue.