Qubole continues to monitor the elevated AWS API error rates in the us-east-1 region. Availability Zone (AZ) performance remains sporadic. AWS has confirmed that existing instances are not affected, so existing clusters are generally operational. If a cluster fails to start, the cluster startup log will show which AZ is affected. If your cluster configuration lists several subnets, you can remove the private subnet for that AZ. Otherwise, replace it with a private subnet in a different AZ.
Attempts to upscale or downscale clusters, including acquiring Spot nodes, may encounter the same errors. If you are trying to downscale or terminate a cluster, you may need to retry the operation several times, either in the Qubole UI or via the API.
We are working with AWS to resolve this issue as quickly as possible. As of AWS's 8:23 AM PDT update, there is no definitive ETA for resolution. You can follow AWS's updates here: https://status.aws.amazon.com/