Spark and Presto query failures

Incident Report for Qubole

Resolved

Devops expects the operational issues to be resolved. After restarting discovery, the team needed to add client nodes to handle the volume of traffic.
Posted Apr 27, 2021 - 13:45 PDT

Update

We are continuing to monitor for any further issues.
Posted Apr 25, 2021 - 22:51 PDT

Update

An additional incident of stalled operations was reported yesterday evening (4/24) and has since cleared. Devops is looking into the root cause of the stall so that a more permanent fix can be applied.
Posted Apr 25, 2021 - 04:47 PDT

Update

We are continuing to monitor for any further issues.
Posted Apr 25, 2021 - 04:44 PDT

Monitoring

Devops is monitoring its latest fix, which should resolve the issue. Additional information about the resolution will be added after monitoring.
Posted Apr 23, 2021 - 11:07 PDT

Update

We are continuing to investigate this issue.
Posted Apr 22, 2021 - 07:56 PDT

Investigating

Spark and Presto queries run on in.qubole.com may stall, returning a Pending or Queued status. Devops is investigating.
Posted Apr 22, 2021 - 05:52 PDT
This incident affected: QDS API, Command Processing, Qubole Scheduler, and Cluster Operations.