Delays in push notifications

Incident Report for Zulip Cloud

Postmortem

Push notifications in Zulip are sent by a background worker, which talks to Apple and Google servers to notify applications on users' mobile devices. On Monday, the rate at which the Zulip Cloud service generated push notifications exceeded the rate at which a single worker could send them, so a backlog of notifications built up. Other load on the system made the backlog worse, leading to delays of up to 10 minutes between when a push notification was triggered and when it was sent to users' mobile devices.

We have since split the work of delivering these notifications across multiple workers, which lets us process many more notifications in parallel.
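The backlog arithmetic behind this can be illustrated with a simple fluid model. This is a minimal sketch, not Zulip's actual worker code. It assumes notifications arrive at a steady rate, each worker sends at a fixed rate, and the queue is first-in, first-out. The function name and all rates below are hypothetical and were chosen only so that the single-worker case reproduces the 10-minute peak delay reported above. They are not measured Zulip Cloud figures.

```python
def backlog_delay_seconds(arrival_rate: float, per_worker_rate: float,
                          workers: int, duration_s: float,
                          initial_backlog: float = 0.0) -> float:
    """Hypothetical fluid model of a push-notification queue.

    Notifications arrive at `arrival_rate` per second, and `workers` workers
    together send `workers * per_worker_rate` per second. Whenever arrivals
    outpace capacity, the excess accumulates as a backlog. Returns how long
    (in seconds) a notification queued after `duration_s` seconds waits
    before being sent.
    """
    capacity = workers * per_worker_rate
    # The backlog grows (or shrinks) by the gap between arrivals and capacity,
    # and can never drop below zero.
    backlog = max(0.0, initial_backlog + (arrival_rate - capacity) * duration_s)
    # FIFO queue: a new notification waits behind the entire backlog.
    return backlog / capacity


# One worker at 100/s facing 120/s for 50 minutes: 60,000 queued, 600 s delay.
print(backlog_delay_seconds(120, 100, workers=1, duration_s=3000))
# Four workers starting from that backlog cut the delay to 108 s in one minute.
print(backlog_delay_seconds(120, 100, workers=4, duration_s=60,
                            initial_backlog=60000))
```

Under these assumed rates, any arrival rate above a single worker's throughput makes the delay grow for as long as the overload lasts. Adding workers turns that growth into a steady drain, which is why running the delivery work in parallel resolves the backlog rather than merely slowing its growth.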

Posted Dec 05, 2023 - 20:17 UTC

Resolved

This incident has been resolved.
Posted Dec 04, 2023 - 22:24 UTC

Monitoring

The delay in mobile push notifications is now down to 2.5 minutes, and we expect to clear the remaining backlog shortly. We will continue to monitor the situation.
Posted Dec 04, 2023 - 18:20 UTC

Update

We are working on a fix for the issue. Notifications are now delayed by 10 minutes.
Posted Dec 04, 2023 - 16:30 UTC

Identified

We are currently experiencing a backlog in our push notification service, which is causing delays of up to 5 minutes before devices receive push notifications from Zulip Cloud. Push notifications from self-hosted Zulip servers that use our push bouncer service are not currently affected.
Posted Dec 04, 2023 - 15:42 UTC
This incident affected: Mobile Push Notification Service.