Configuring Cloud Storage
Managed Cloud Storage
To keep storage management and configuration simple, CloudTik does two things for you when you create a workspace for a specific cloud:
CloudTik creates a managed cloud storage for you (S3 for AWS, Data Lake Storage Gen 2 for Azure, GCS for GCP, OSS for Alibaba Cloud) to use without any configuration.
CloudTik creates roles for accessing the cloud storage in your account and assigns them to the cluster instances, so the instances gain access without any credential configuration.
This is convenient for most use cases. For users who need to perform advanced configuration, CloudTik provides the flexibility to do so.
AWS S3
By default, CloudTik creates a workspace-managed S3 bucket that works out of the box without any user configuration. The following applies only when you want to create or use your own storage and configuration.
Creating an S3 bucket
Every object in Amazon S3 is stored in a bucket. Before you can store data in Amazon S3, you must create a bucket.
Please refer to the S3 guide Creating buckets for instructions.
Configuring S3 in CloudTik
The name of the S3 bucket is used in the CloudTik S3 storage configuration.
# Cloud-provider specific configuration.
provider:
    type: aws
    region: us-west-2
    storage:
        # S3 configurations for storage
        aws_s3_storage:
            s3.bucket: your_s3_bucket
            s3.access.key.id: your_s3_access_key_id
            s3.secret.access.key: your_s3_secret_access_key
s3.access.key.id: your AWS Access Key ID.
s3.secret.access.key: your AWS Secret Access Key.
To find your AWS Access Key ID and AWS Secret Access Key, see the AWS guide Managing access keys.
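To avoid hardcoding credentials in a YAML file, the storage section can be generated from environment variables. The following is a minimal sketch, assuming the standard AWS environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set. The helper name build_s3_storage is hypothetical and not part of CloudTik.

```python
import os


def build_s3_storage(bucket, env=None):
    """Return a mapping shaped like the aws_s3_storage section above.

    Hypothetical helper, not part of CloudTik. Credentials are read from
    the standard AWS environment variables instead of being hardcoded.
    """
    env = os.environ if env is None else env
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return {
        "aws_s3_storage": {
            "s3.bucket": bucket,
            "s3.access.key.id": env["AWS_ACCESS_KEY_ID"],
            "s3.secret.access.key": env["AWS_SECRET_ACCESS_KEY"],
        }
    }
```

The returned dictionary can be placed under provider.storage and written out with the YAML library of your choice.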
Azure Storage
By default, CloudTik creates a workspace-managed storage account and a Data Lake Storage Gen 2 container that work out of the box without any user configuration. The following applies only when you want to create or use your own storage and configuration.
Creating Azure Storage
CloudTik supports both Azure Blob Storage and Azure Data Lake Storage Gen 2. For performance, we suggest using Azure Data Lake Storage Gen 2.
If you want to create your own Azure storage account and a storage container within the storage account, please refer to Creating an Azure storage account for instructions.
Configuring Azure Storage in CloudTik
The storage account name and the storage container name are used when configuring the Azure cluster YAML.
You will also need the Azure account access key, which grants access to the created Azure storage.
With these, you can fill out the azure_cloud_storage section of your cluster configuration YAML file.
# Cloud-provider specific configuration.
provider:
    type: azure
    location: westus
    subscription_id: your_subscription_id
    storage:
        azure_cloud_storage:
            # Choose cloud storage type: blob (Azure Blob Storage) or datalake (Azure Data Lake Storage Gen 2).
            azure.storage.type: datalake
            azure.storage.account: your_storage_account
            azure.container: your_container
            azure.account.key: your_account_key
subscription_id: the Subscription ID of your Azure account.
azure.storage.account: the name of the Azure storage account that contains your container.
azure.container: the name of the Azure storage container that you have created.
azure.account.key: your Azure account access key.
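Because azure.storage.type accepts only the two values named in the configuration comment, it can help to check the value before writing the file. The following is a minimal sketch. The helper name build_azure_storage is hypothetical and not part of CloudTik, and defaulting to datalake is an assumption that follows the performance suggestion above.

```python
# The two storage types named in the configuration comment.
VALID_AZURE_STORAGE_TYPES = ("blob", "datalake")


def build_azure_storage(account, container, account_key, storage_type="datalake"):
    """Return a mapping shaped like the azure_cloud_storage section.

    Hypothetical helper, not part of CloudTik. Rejects any storage type
    other than "blob" or "datalake".
    """
    if storage_type not in VALID_AZURE_STORAGE_TYPES:
        raise ValueError(
            "azure.storage.type must be one of %s, got %r"
            % (", ".join(VALID_AZURE_STORAGE_TYPES), storage_type)
        )
    return {
        "azure_cloud_storage": {
            "azure.storage.type": storage_type,
            "azure.storage.account": account,
            "azure.container": container,
            "azure.account.key": account_key,
        }
    }
```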
Google GCS
By default, CloudTik creates a workspace-managed GCS bucket that works out of the box without any user configuration. The following applies only when you want to create or use your own storage and configuration.
Creating GCS Bucket
If you want to use your own GCS bucket, you can create one by following the Creating buckets guide. The bucket name is used when configuring the GCP cluster YAML.
Configuring GCS in CloudTik
You can use the same login service account to gain access to the bucket or create a dedicated service account. Refer to Creating a service account if you need to create a service account.
To use the service account through the API, you need a service account key. Refer to Create and manage service account keys for details.
To control access to the bucket, refer to Google Cloud Storage: Use IAM permissions for instructions on adding a principal (the service account) and roles for the bucket resource. We suggest choosing the “Storage Admin” role to gain full access to the GCS bucket.
With the downloaded JSON key file, you can fill out the gcp_cloud_storage section of your cluster configuration YAML file.
# Cloud-provider specific configuration.
provider:
    type: gcp
    region: us-central1
    availability_zone: us-central1-a
    project_id: your_project_id
    storage:
        # GCS configurations for storage
        gcp_cloud_storage:
            gcs.bucket: your_gcs_bucket
            gcs.service.account.client.email: your_service_account_client_email
            gcs.service.account.private.key.id: your_service_account_private_key_id
            gcs.service.account.private.key: your_service_account_private_key
After you create a service account key, download the JSON key file and store it securely. The configuration values come from this file:
project_id: “project_id” in the JSON file.
gcs.service.account.client.email: “client_email” in the JSON file.
gcs.service.account.private.key.id: “private_key_id” in the JSON file.
gcs.service.account.private.key: “private_key” in the JSON file, in the format of -----BEGIN PRIVATE KEY-----\n......\n-----END PRIVATE KEY-----\n
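The field mapping above can be sketched in a few lines of Python. The function name gcs_settings_from_key is hypothetical and not part of CloudTik. It takes the already parsed key file (for example, the result of json.load) and returns the project ID together with a mapping shaped like the gcp_cloud_storage section.

```python
def gcs_settings_from_key(key, bucket):
    """Map a parsed service account JSON key to the settings listed above.

    Hypothetical helper, not part of CloudTik. `key` is the dictionary
    obtained by parsing the downloaded JSON key file.
    """
    storage = {
        "gcp_cloud_storage": {
            "gcs.bucket": bucket,
            "gcs.service.account.client.email": key["client_email"],
            "gcs.service.account.private.key.id": key["private_key_id"],
            # After JSON parsing, the \n escapes in the key are real
            # newline characters.
            "gcs.service.account.private.key": key["private_key"],
        }
    }
    return key["project_id"], storage
```

A typical call would parse the file first, for instance with `json.load(open(path))`, where `path` points to wherever you stored the key file.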
Alibaba Cloud OSS
By default, CloudTik creates a workspace-managed OSS bucket that works out of the box without any user configuration. The following applies only when you want to create or use your own storage and configuration.
Creating an OSS bucket
If you don’t want to use the managed OSS bucket, you can create your own bucket and configure CloudTik to use it.
Please refer to the OSS guide Creating buckets for instructions.
Configuring OSS in CloudTik
Once you have your OSS bucket, configure the OSS bucket name, the OSS access key ID, and the OSS access key secret in your cluster configuration file as follows.
# Cloud-provider specific configuration.
provider:
    type: aliyun
    region: cn-shanghai
    storage:
        # OSS configurations for storage
        aliyun_oss_storage:
            oss.bucket: your_oss_bucket
            oss.access.key.id: your_oss_access_key_id
            oss.access.key.secret: your_oss_access_key_secret
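All the templates on this page use placeholder values that start with your_, so a quick check can catch values that were never filled in. The following is a minimal sketch. The helper name find_placeholders is hypothetical and not part of CloudTik, and it assumes no real value of yours begins with your_.

```python
def find_placeholders(storage):
    """Return the keys whose values still look like template placeholders.

    Hypothetical helper, not part of CloudTik. `storage` is a flat mapping
    such as the contents of the aliyun_oss_storage section.
    """
    return sorted(
        key
        for key, value in storage.items()
        if isinstance(value, str) and value.startswith("your_")
    )
```

For example, passing the unmodified aliyun_oss_storage template returns all three of its keys, while a fully filled-in section returns an empty list.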