Accessibility and performance issue on
Incident Report for Qubole
DevOps has confirmed that they are no longer seeing errors on the DB. The application is able to connect to the DB and is up and working. Query history, cluster start and stop, and the read-only replica of the primary DB are all up and running.
Posted Jun 22, 2021 - 11:57 PDT
The RDS DB appears to have hit the maximum DB size limit for its instances. On AWS's recommendation, the DevOps team is upgrading the instances and is currently generating a backup of the older instances so that the backup can be restored on a newer instance with a higher configuration.
Posted Jun 22, 2021 - 08:00 PDT
DevOps is continuing to troubleshoot issues with the databases. The production rstore read replica DB is currently erroring out, which is causing various issues. DevOps is actively working on these issues and aims to resolve them as soon as possible.
Posted Jun 22, 2021 - 04:00 PDT
The DevOps team is continuing to move the DB table dump out of one of the impacted tables. Once the data dump is complete, DevOps will be able to free up some space, which will help resolve the issue.
Posted Jun 22, 2021 - 00:04 PDT
DevOps has identified a table space issue with production-rstore and has reached out to AWS support to resolve it. DevOps is currently moving out the old table to free up some space, as increasing the file system is a complex process at this stage.
Posted Jun 21, 2021 - 20:02 PDT
DevOps has identified an issue with the EKS cluster, and the logs show that the QDS APIs are failing. The team is still troubleshooting and, as a next step, is working on restarting the pods associated with the EKS cluster.
Posted Jun 21, 2021 - 16:05 PDT
The service is currently seeing some degraded performance and is returning errors during access. At this time, failures appear to be partial and intermittent, and DevOps is investigating.
Posted Jun 21, 2021 - 12:10 PDT
This incident affected: Environment (AWS) (Site Availability, QDS API, Command Processing, Qubole Scheduler, Cluster Operations, Notebooks).