Accessibility and performance issue on us.qubole.com

Incident Report for Qubole

Resolved

The database issue has been resolved. Nodes have come back online, and the environment is accessible.
Posted Jun 15, 2021 - 16:31 PDT

Update

All nodes of the affected cluster have been restarted, but Devops has run into an issue bringing some database resources back online. Devops is focused on resolving this error, which is expected to restore functionality.
Posted Jun 15, 2021 - 14:16 PDT

Identified

Devops is continuing to work on this issue. While troubleshooting a problem on an internal cluster, the team has also confirmed that access to the interface has degraded, with requests frequently returning 404 or 502 errors.
Posted Jun 15, 2021 - 11:12 PDT

Monitoring

After the nginx services were restarted, the Qubole UI is now accessible on us.qubole.com.

The team continues to monitor and investigate the other identified issues, including Airflow and Notebooks. We will post further updates as soon as they are available.
Posted Jun 15, 2021 - 06:37 PDT

Update

After gathering the required logs for future analysis, the Devops team has restarted nginx on the webapp nodes. Following the restart, the webapp nodes are rejoining the load balancer.

The investigation continues, with the goal of resolving this issue as soon as possible.
Posted Jun 15, 2021 - 06:27 PDT

Identified

Devops has identified errors in multiple tiers of the us.qubole.com environment and is fixing the webapp nodes that are not connected to the ELB. The team is currently working toward resolving the issue.
Posted Jun 15, 2021 - 04:49 PDT

Investigating

us.qubole.com is currently experiencing degraded performance and occasionally returns 404 errors. At this time, failures appear to be partial and intermittent. Devops is investigating.
Posted Jun 15, 2021 - 02:24 PDT
This incident affected: us.qubole.com Environment (AWS) (Site Availability, QDS API, Command Processing, Qubole Scheduler, Cluster Operations, Notebooks).