Issue with outbound calls and connectivity

Incident Report for Level365

Postmortem

On the morning of August 14th, 2023, our internal monitoring detected that resource utilization for a particular service on one node in our Midwest data center had increased from 6% to 80%. We immediately engaged our engineering department and began investigating the cause. During the investigation, utilization rose further, from 80% to 100%. At that point the node became intermittently unavailable to clients, causing a partial service disruption. Restarting the offending service returned resource utilization to its 6% baseline and ended the disruption.

We apologize for this brief disruption in service. Post-incident troubleshooting indicated that the service in question suffered from a memory leak, which was triggered by an unrelated issue that has since been resolved. Engineering is working on a permanent fix for the memory leak in the offending service, and we will deploy it once it has been completed and tested.

Posted Aug 30, 2023 - 16:24 EDT

Resolved

This issue has been resolved. A postmortem outlining the root cause will be prepared.

Posted Aug 15, 2023 - 09:27 EDT

Monitoring

The impacted services have been restarted and we are no longer seeing the issue. We will continue to monitor and provide further updates.

Posted Aug 14, 2023 - 10:56 EDT

Investigating

We are currently investigating an issue with outbound calling and device connections.

Posted Aug 14, 2023 - 10:45 EDT

This incident affected: UCaaS (Core Services).