All Systems Operational

About This Site

This site is Qubole's home for information on QDS system performance and availability.

api.qubole.com Environment (AWS) Operational
Site Availability Operational
QDS API Operational
Command Processing Operational
Qubole Scheduler Operational
Cluster Operations Operational
Notebooks Operational
us.qubole.com Environment (AWS) Operational
Site Availability Operational
QDS API Operational
Command Processing Operational
Qubole Scheduler Operational
Cluster Operations Operational
Notebooks Operational
Quantum Operational
wellness.qubole.com Environment (AWS) Operational
Site Availability Operational
QDS API Operational
Command Processing Operational
Qubole Scheduler Operational
Cluster Operations Operational
Notebooks Operational
gcp.qubole.com Environment (GCP) Operational
Site Availability Operational
QDS API Operational
Command Processing Operational
Qubole Scheduler Operational
Cluster Operations Operational
Notebooks Operational
gcp-eu.qubole.com Environment (GCP) - BETA Operational
Site Availability Operational
QDS API Operational
Command Processing Operational
Qubole Scheduler Operational
Cluster Operations Operational
Notebooks Operational
in.qubole.com Environment (AWS) Operational
Site Availability Operational
QDS API Operational
Command Processing Operational
Qubole Scheduler Operational
Cluster Operations Operational
Notebooks Operational
eu-central-1.qubole.com Environment (AWS) Operational
Site Availability Operational
QDS API Operational
Command Processing Operational
Qubole Scheduler Operational
Cluster Operations Operational
Notebooks Operational
oraclecloud.qubole.com Environment (Oracle) Operational
Site Availability Operational
QDS API Operational
Command Processing Operational
Qubole Scheduler Operational
Cluster Operations Operational
Notebooks Operational
Qubole Community & Support Portal Operational
Site Availability Operational
QDS API Operational
Command Processing Operational
Qubole Scheduler Operational
Cluster Operations Operational
Notebooks Operational
Site Availability Operational
QDS API Operational
Command Processing Operational
Qubole Scheduler Operational
Cluster Operations Operational
Notebooks Operational
Scheduled Maintenance
Update - Schedule update: The date and time for the maintenance have not changed. It is still scheduled for May 21, 2022 at 21:00 PDT. However, the projected window for the maintenance has been reduced. Maintenance is expected to complete within 2 hours.
May 19, 10:29 PDT
Update - UPDATE/CORRECTION: The maintenance window has been moved one week later to May 21st, at the same time.
May 11, 15:47 PDT
Update - We will be undergoing scheduled maintenance during this time.
May 11, 15:42 PDT
Update - We will be undergoing scheduled maintenance during this time.
May 11, 14:39 PDT
Scheduled - Important System Update Announcement

DATE CORRECTION/UPDATE: NOW MAY 21st

May 21st system maintenance on Qubole’s API environment

Below is a summary of the effort:

1. Qubole will perform maintenance on api.qubole.com on May 21st between 11PM and 4AM CST. This is necessary to convert api.qubole.com to a fully Virtual Private Cloud (VPC) environment.

2. This update will require no reconfiguration on the part of Qubole customers.

3. This update will improve stability and will be followed by an upgrade to R60. The R60 upgrade will be announced in a separate communication.

4. We acknowledge the impact that taking the environment down will have on our customers' business operations and have made every effort to minimize that impact.

5. Once the environment is converted, the Qubole team will test to ensure that all resources are available and operational. If any issues are found in testing, api.qubole.com will be rolled back to the previous state of the environment.
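
The maintenance times above are quoted in two time zones, so a quick conversion may help when planning around the window. The following is a minimal sketch, not an official schedule: it assumes that "CST" in the announcement means US Central local time, which is on daylight time (UTC-5) in May, and it uses fixed UTC offsets rather than a time zone database. The helper name `window` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets in effect on May 21, 2022 (US daylight time).
PDT = timezone(timedelta(hours=-7), "PDT")
# Assumption: the "CST" in the announcement refers to US Central local time.
CDT = timezone(timedelta(hours=-5), "CDT")


def window(start, duration_hours, tz):
    """Return (start, end) of a maintenance window expressed in tz."""
    end = start + timedelta(hours=duration_hours)
    return start.astimezone(tz), end.astimezone(tz)


start_pdt = datetime(2022, 5, 21, 21, 0, tzinfo=PDT)

# Original announcement: a 5-hour window (11PM to 4AM Central).
central_start, central_end = window(start_pdt, 5, CDT)
# Revised projection: completion within 2 hours of the start.
_, revised_end_pdt = window(start_pdt, 2, PDT)

fmt = "%b %d %H:%M %Z"
print("Original window:", central_start.strftime(fmt), "to", central_end.strftime(fmt))
print("Revised end:", revised_end_pdt.strftime(fmt))
```

Under these assumptions, 21:00 PDT corresponds to 23:00 Central, which matches the 11PM start quoted in item 1.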

May 2, 06:52 PDT
Past Incidents
May 19, 2022

No incidents reported today.

May 18, 2022

No incidents reported.

May 17, 2022

No incidents reported.

May 16, 2022
Resolved - The performance degradation on the API has been resolved. Customers should be able to execute their jobs and workloads now.
May 16, 16:18 PDT
Update - The performance degradation on the API has been resolved. Customers should be able to execute their jobs and workloads now.
May 16, 15:15 PDT
Identified - The available space in the rStore MySQL database is reaching its limit, causing degraded performance. To resolve the issue, we are working through the following steps:

1. The quickest resolution is for AWS to convert our filesystem on classic from 32-bit to 64-bit. This will expand the MySQL limit on data file size from 2TB to 64TB.

2. For AWS to complete this, we need to create a read replica for them to convert while the current rStore stays running.

3. Once the conversion is complete, we can cut over to the new instance on the 64-bit file system.

4. AWS estimates about 8 hrs for the conversion. The read replica is currently being created. Once it is available, we will have an estimate for resolving the performance degradation.
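
To put the two limits quoted in step 1 in perspective, here is a small arithmetic sketch. It rests on an assumption that is not stated in the update: that the 64TB figure corresponds to the InnoDB tablespace ceiling at the default 16 KiB page size with 32-bit page numbers. The 2TB figure is simply taken from the update as given.

```python
TIB = 2 ** 40  # bytes in one tebibyte

# Assumption: default InnoDB page size (16 KiB) and 32-bit page numbers,
# which is where the commonly quoted 64TB tablespace ceiling comes from.
PAGE_SIZE = 16 * 1024
MAX_PAGES = 2 ** 32


def as_tib(n_bytes):
    """Express a byte count in tebibytes."""
    return n_bytes / TIB


old_limit = 2 * TIB                # 2TB data file limit quoted in the update
new_limit = PAGE_SIZE * MAX_PAGES  # tablespace ceiling under the assumption above

print(f"old limit: {as_tib(old_limit):.0f} TiB")
print(f"new limit: {as_tib(new_limit):.0f} TiB")
print(f"headroom factor: {new_limit // old_limit}x")
```

If the assumption holds, the conversion raises the ceiling by a factor of 32.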

May 16, 12:04 PDT
Investigating - We are seeing a degradation in performance on the API and are currently looking into it. We will have an update within the next hour.
May 16, 10:09 PDT
May 15, 2022

No incidents reported.

May 14, 2022

No incidents reported.

May 13, 2022

No incidents reported.

May 12, 2022
Resolved - The issue with the rStore database has been resolved. Customers should be able to execute their jobs and workloads now.
May 12, 00:50 PDT
Monitoring - The issue with the rStore database has been resolved. Customers should be able to execute their jobs and workloads now.
May 11, 20:02 PDT
Update - We are still proceeding with the plan as outlined and on track to complete by 12:00 CST. We will update here if there are any changes.
May 11, 18:06 PDT
Update - After further investigation and work with AWS support, we have a new update and plan:
1. While working with AWS this afternoon, DevOps determined that a table reached the MySQL 2TB limit. This is a system table, so we cannot delete its data.
2. The cause is that multiple tables are writing to the same data file. Good practice would have been a separate data file for each table, which was not the case.
3. To fix this, they will:
- Back up a handful of tables whose data will be moved into their own files.
- Drop those tables and recreate them with their own data files.
- Restore the data to those tables, which should move it into their own data files and out of the data file with the 2TB limit, thus freeing space.
4. This should defragment the database and free up space while decreasing the size of the data file that is running into the limit.
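
The table-by-table move described in the steps above can be sketched as an ordered list of SQL statements. This is only an illustration under stated assumptions, not the procedure DevOps actually ran: the schema and table names are hypothetical, the helper `file_per_table_plan` is not part of any Qubole or MySQL tooling, and it copies each table into a new one before dropping the original instead of using a separate backup and restore. It relies on MySQL placing a newly created InnoDB table in its own data file when innodb_file_per_table is enabled.

```python
def file_per_table_plan(schema, tables):
    """Build ordered SQL statements that move tables into their own data files.

    Assumes innodb_file_per_table is already enabled on the server, so each
    newly created table is stored in its own file.
    """
    # First statement only checks the setting; it does not change it.
    plan = ["SHOW VARIABLES LIKE 'innodb_file_per_table';"]
    for table in tables:
        src = f"`{schema}`.`{table}`"
        new = f"`{schema}`.`{table}_new`"
        plan += [
            f"CREATE TABLE {new} LIKE {src};",          # new table, own data file
            f"INSERT INTO {new} SELECT * FROM {src};",  # copy rows out of the shared file
            f"DROP TABLE {src};",                       # release the old copy
            f"RENAME TABLE {new} TO {src};",            # restore the original name
        ]
    return plan


# Hypothetical schema and table names, for illustration only.
for statement in file_per_table_plan("rstore_db", ["example_a", "example_b"]):
    print(statement)
```

Generating the plan as text first makes it easy to review before anything is run against a production database.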

This will be a temporary measure to get back up and running. Testing and implementation should take the next 8 hrs or so, depending on the data load. We estimate that the work will be complete and the system back up by 12:00 CST. The long-term solution is to rebuild the entire database. That can be done offline and then cut over once it is ready, so no downtime would be involved. We have done similar updates in the other regions with no impact or downtime for customers.

May 11, 15:10 PDT
Update - As per the last update, we are still in the process of moving the data.
May 11, 13:04 PDT
Update - Latest Update:

What caused the outage

* The rStore database had a table that filled up and also caused the disk space to fill up, which caused the database to stop responding.
* Customers are not able to run jobs because of the unresponsive rStore database.

What has been done to resolve so far

* Increased memory and storage on the instance.
* The table was cleared, but the disk space was not reclaimed and is still full.
* Engaged AWS and determined that we cannot set the parameter for the table to autoscale because it has to be set upon creation.
* Created a new instance from the old database with increased storage and memory.

What’s Next

* The new MySQL database is in place, and setup is complete.
* Export data to S3 from the prior DB (in progress).
* Import data from the prior instance to the new instance.

The estimated time to complete the data load is 24 hrs due to the size of the MySQL database (1TB+). We are working with AWS to identify any methods to decrease data load time. We will provide updates here if there is any change to the timeline.

May 11, 08:08 PDT
Update - The task is currently under investigation.
- The current RDS DB (MySQL) instance is using a deprecated major version (5.6.39), and the tablespace still appears full even after applying innodb_file_per_table=1.
- The team is currently working to migrate the environment, along with the DB, to a supported version of MySQL.

We are continuing to investigate and will update accordingly.

May 11, 03:13 PDT
Update - Latest updates:
- Cleared the storage issues and the low memory on the longer-running tunnels.
- Updated the RDS memory from 5000 GB to 5500 GB on the production rstore RDS instance as well as the production rstore replica. This takes about 6 hours according to Amazon's documentation. We started at about 5PM CST, so the updated instance with the added memory should be up and running around 11PM CST.

After taking steps to free up storage, the issue still exists and the storage is not being released. We are continuing to investigate and will update accordingly.

May 10, 20:50 PDT
Update - We continue to work on clearing resources and expanding the limits in the rStore database. We should have an ETA shortly.
May 10, 18:15 PDT
Identified - We have identified a full table in the Rstore database that appears to be causing the issue. We are in the process of clearing that condition.
May 10, 14:37 PDT
Investigating - Several customers are experiencing issues when scheduling jobs. We are looking into the matter and will update shortly.
May 10, 12:39 PDT
May 11, 2022
May 10, 2022
May 9, 2022

No incidents reported.

May 8, 2022

No incidents reported.

May 7, 2022

No incidents reported.

May 6, 2022

No incidents reported.

May 5, 2022

No incidents reported.