Accessibility and performance issue on us.qubole.com
Incident Report for Qubole
Resolved
The database issue has been resolved, the nodes have come back online, and the environment is accessible.
Posted Jun 15, 2021 - 16:31 PDT
Update
All nodes of the problem cluster have been restarted, but Devops has run into an issue bringing some database resources back online. Devops is focused on resolving this error, which should restore functionality.
Posted Jun 15, 2021 - 14:16 PDT
Identified
Devops is continuing to work on this issue. While they troubleshoot a problem on an internal cluster, they are also confirming that access to the interface has degraded, frequently returning 404 or 502 errors.
Posted Jun 15, 2021 - 11:12 PDT
Monitoring
After restarting the nginx services, the Qubole UI is now accessible on us.qubole.com.

The team continues to monitor and investigate the other identified issues (including Airflow and Notebooks). We will post further updates as soon as we receive them.
Posted Jun 15, 2021 - 06:37 PDT
Update
The Devops team has restarted nginx on the webapp nodes after gathering the required logs for future analysis. Since the restart, the webapp nodes have been rejoining the load balancer.

The investigation continues, and the team is working to resolve this issue as early as possible.
Posted Jun 15, 2021 - 06:27 PDT
Identified
Devops has identified errors in different tiers of the us.qubole.com environment and is fixing webapp nodes that are not connected to the ELB. The team is currently working towards resolution of the issue.
Posted Jun 15, 2021 - 04:49 PDT
Investigating
us.qubole.com is currently experiencing degraded performance and occasionally returning 404 errors. At this time, failures appear to be partial and intermittent, and Devops is investigating.
Posted Jun 15, 2021 - 02:24 PDT
This incident affected: us.qubole.com Environment (AWS) (Site Availability, QDS API, Command Processing, Qubole Scheduler, Cluster Operations, Notebooks).