Creating Cluster
The cluster configuration is defined in a YAML file. CloudTik uses it to launch the head node, and the cluster controller running on the head node uses it to launch the worker nodes.
CloudTik provides example cluster configuration YAML files under its examples/cluster/ directory.
Follow the instructions below to customize your cluster configuration.
Execution Mode
Choose between host mode and container mode with the ‘enabled’ option under the ‘docker’ config.
The following config selects host mode:
# Turn on or off container by setting "enabled" to True or False.
docker:
    enabled: False
CloudTik runs in container mode by default unless it is explicitly disabled as above.
Container mode has a few other advanced options. For details, please refer to Advanced Configuring: Configuring Container Mode.
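For comparison, the fragment below is a minimal sketch of the opposite setting. It uses the same ‘enabled’ key and simply states the default explicitly, which can make the intended mode obvious to anyone reading the file.

```yaml
# Explicitly enable container mode. This matches the default behavior,
# so the block is optional and only documents the intent.
docker:
    enabled: True
```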
Controlling the Number of Workers
The ‘min_workers’ option sets the minimum number of worker nodes to launch; the default is 1. To override the default, set it under the worker node type as shown below, which sets the minimum number of worker nodes to 3.
On AWS or Azure:
available_node_types:
    worker.default:
        min_workers: 3
On GCP:
available_node_types:
    worker-default:
        min_workers: 3
Choosing Runtimes for Cluster
Without extra configuration, CloudTik creates a cluster with the Spark runtime by default. Use the ‘types’ key under ‘runtime’ to configure the list of runtimes for the cluster. When one runtime depends on another, CloudTik automatically configures it to use that runtime.
The following example will start a cluster with HDFS as default storage and Spark.
runtime:
    types: [hdfs, spark]
For more detailed information of each runtime, please refer to Reference: Runtimes
Using Bootstrap Commands to Customize Setup
To run customized setup commands, specify them as a list under ‘bootstrap_commands’ in the cluster config.
# Commands running after common setup commands
bootstrap_commands: ['your shell command 1', 'your shell command 2']
For example, if you want to add more integrations to your cluster during setup, such as the tools required to run TPC-DS, add the following bootstrap_commands section to your cluster YAML file. It downloads and runs a script that installs TPC-DS and the required packages on each node after all common setup commands finish.
bootstrap_commands:
    - wget -O ~/bootstrap-benchmark.sh https://raw.githubusercontent.com/oap-project/cloudtik/main/tools/benchmarks/spark/scripts/bootstrap-benchmark.sh &&
        bash ~/bootstrap-benchmark.sh --tpcds
CloudTik allows users to customize as many commands as CloudTik itself does. For more advanced command customization, please refer to Advanced Configuring: Using Custom Commands.
Mounting Files or Directories to Nodes
To mount files or directories on each node when the cluster starts up, add the following to the cluster configuration file.
# Files or directories to copy to the head and worker nodes. The format is a
# dictionary of REMOTE_PATH: LOCAL_PATH entries, e.g.
file_mounts: {
    # "/path1/on/remote/machine": "/path1/on/local/machine",
    # "/path2/on/remote/machine": "/path2/on/local/machine",
    "~/.ssh/id_rsa.pub": "~/.ssh/id_rsa.pub"
}
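The same REMOTE_PATH: LOCAL_PATH mapping can also be written in YAML block style, which tends to be easier to read once there are several entries. The sketch below is illustrative only: the script and data paths are hypothetical placeholders, not files that CloudTik provides.

```yaml
file_mounts:
    # Left side: path on the head and worker nodes.
    # Right side: path on the machine where you run CloudTik.
    "~/.ssh/id_rsa.pub": "~/.ssh/id_rsa.pub"
    # Hypothetical entries; replace with your own paths.
    "~/scripts/prepare-node.sh": "~/projects/my-cluster/prepare-node.sh"
    "~/data/lookup-tables": "~/projects/my-cluster/lookup-tables"
```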
Using Templates to Simplify Node Configuration
CloudTik provides a templating structure that lets users reuse standard configurations. A template defines typical or useful node configurations, such as the instance type and disk configuration. By inheriting from a template, you get these configurations by default. Use the keyword ‘from’ to specify the template you want to inherit from.
For example, the following line at the beginning of the cluster config declares inheritance from a template called ‘aws/standard’, which is defined as part of CloudTik in the python/cloudtik/templates folder.
from: aws/standard
For a list of the system templates defined by CloudTik, please refer to the python/cloudtik/templates directory under the cloudtik pip installation.
For more details on templates, please refer to Advanced Configuring: Using Templates.
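As a minimal sketch of how inheritance combines with your own settings, the fragment below inherits from ‘aws/standard’ and then sets the minimum number of workers. It assumes that values in your own file take precedence over the template's values, which is what "by default" above suggests; check Advanced Configuring: Using Templates for the exact merge behavior.

```yaml
# Inherit instance type, disk and other node settings from the template.
from: aws/standard

# Assumption: keys set here override the values inherited from the template.
available_node_types:
    worker.default:
        min_workers: 3
```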
Configuring Cluster Key
If you don’t specify cluster key information under the auth section of the configuration file, the cluster key is created automatically for AWS and GCP.
For Azure, you need to generate an RSA key pair manually (use ssh-keygen -t rsa -b 4096 to generate a new SSH key pair) and configure the private and public keys as follows:
auth:
    ssh_private_key: ~/.ssh/my_cluster_rsa_key
    ssh_public_key: ~/.ssh/my_cluster_rsa_key.pub
For more details on cluster key configuration, refer to Advanced Configuring: Understanding Cluster Key.
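Putting the options from the sections above together, a cluster configuration might look like the following sketch. It assumes an AWS cluster and uses only keys that appear on this page. It is deliberately incomplete: provider-specific and other required settings are not covered here, so start from one of the files under examples/cluster/ rather than from this fragment alone.

```yaml
# Partial sketch for AWS; not a complete cluster configuration.
from: aws/standard

# Run directly on the host instead of in a container.
docker:
    enabled: False

# Launch at least 3 worker nodes (use worker-default on GCP).
available_node_types:
    worker.default:
        min_workers: 3

# HDFS as default storage, plus Spark.
runtime:
    types: [hdfs, spark]

# Runs after the common setup commands finish.
bootstrap_commands:
    - wget -O ~/bootstrap-benchmark.sh https://raw.githubusercontent.com/oap-project/cloudtik/main/tools/benchmarks/spark/scripts/bootstrap-benchmark.sh &&
        bash ~/bootstrap-benchmark.sh --tpcds

# REMOTE_PATH: LOCAL_PATH
file_mounts:
    "~/.ssh/id_rsa.pub": "~/.ssh/id_rsa.pub"
```

No auth section is included because, as noted above, the cluster key is created automatically on AWS.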
Starting the Cluster
Once the cluster configuration is defined and CloudTik is installed, use the following command to create a cluster:
$ cloudtik start /path/to/your-cluster-config.yaml -y
After that, you can use CloudTik commands to check the status of the cluster or manage it.
Please refer to Managing Cluster for detailed instructions.