Cluster startup failure

Incident Report for Qubole

Resolved

Outstanding cluster issues appear to be specific to the clusters' configuration. At this time the service interruption is resolved.
Posted Apr 29, 2021 - 06:05 PDT

Update

Devops believes it has identified the issue preventing some individual clusters from coming online. The team is monitoring to confirm that the change applied is a complete fix.
Posted Apr 28, 2021 - 07:42 PDT

Update

We are continuing to monitor for any further issues.
Posted Apr 25, 2021 - 22:53 PDT

Monitoring

A cluster engine restart has resolved this issue. Devops is resolving a few leftover cluster redirection issues manually.
Posted Apr 23, 2021 - 07:50 PDT

Identified

Tunnel server replacement uncovered an issue with the discovery server. The discovery server is being replaced and must be back online before clusters can be started.
Posted Apr 22, 2021 - 06:09 PDT

Investigating

We are aware of a tunnel server availability issue on gcp.qubole.com that may prevent clusters from starting. Devops is restarting the tunnel servers, and this incident will be updated once the restarts are complete.
Posted Apr 16, 2021 - 12:59 PDT
This incident affected: Cluster Operations.