Managing Cluster
After a cluster is created, you can use CloudTik to manage the cluster and submit jobs.
Status and Information
Use the following commands to show various cluster information.
# Check cluster status
cloudtik status /path/to/your-cluster-config.yaml
# Show cluster summary information and useful links to connect to the cluster web UI
cloudtik info /path/to/your-cluster-config.yaml
# Show the head node IP of the cluster
cloudtik head-ip /path/to/your-cluster-config.yaml
# Show the worker node IPs of the cluster
cloudtik worker-ips /path/to/your-cluster-config.yaml
# Show the status of the processes running on the cluster nodes
cloudtik process-status /path/to/your-cluster-config.yaml
# Stream the cluster monitor log
cloudtik monitor /path/to/your-cluster-config.yaml
# Show debug status of cluster scaling
cloudtik debug-status /path/to/your-cluster-config.yaml
# Check if the cluster is healthy
cloudtik health-check /path/to/your-cluster-config.yaml
Here are examples of executing these CloudTik CLI commands on a GCP cluster.
$ cloudtik status /path/to/your-cluster-config.yaml
Total 2 nodes. 2 nodes are ready
+---------------------+----------+-----------+-------------+---------------+---------------+-----------------+
| node-id | node-ip | node-type | node-status | instance-type | public-ip | instance-status |
+---------------------+----------+-----------+-------------+---------------+---------------+-----------------+
| 491xxxxxxxxxxxxxxxxx| 10.0.x.x | head | up-to-date | n2-standard-4 | 23.xxx.xx.xxx | RUNNING |
| 812xxxxxxxxxxxxxxxx | 10.0.x.x | worker | up-to-date | n2-standard-4 | None | RUNNING |
+---------------------+----------+-----------+-------------+---------------+---------------+-----------------+
Show cluster summary information and useful links to connect to the cluster web UI.
$ cloudtik info /path/to/your-cluster-config.yaml
Cluster small is: RUNNING
1 worker(s) are running
Runtimes: prometheus, spark
The total worker CPUs: 4.
The total worker memory: 16.0GB.
Key information:
Cluster private key file: *.pem
Please keep the cluster private key file safe.
Useful commands:
Check cluster status with:
...
Show debug status of cluster scaling.
$ cloudtik debug-status /path/to/your-cluster-config.yaml
======== Cluster Scaler status: 2022-05-12 08:46:49.897707 ========
Node status
-------------------------------------------------------------------
Healthy:
1 head-default
1 worker-default
Pending:
(no pending nodes)
Recent failures:
(no failures)
Check if this cluster is healthy.
$ cloudtik health-check /path/to/your-cluster-config.yaml
Cluster is healthy.
Attach to Cluster Nodes
Connect to a terminal on the cluster head node.
cloudtik attach /path/to/your-cluster-config.yaml
You will then log in to the head node of the cluster via SSH, as below.
$ cloudtik attach /path/to/your-cluster-config.yaml
(base) ubuntu@cloudtik-example-head-a7xxxxxx-compute:~$
Log in to a worker node by specifying --node-ip as below.
$ cloudtik attach --node-ip 10.0.x.x /path/to/your-cluster-config.yaml
(base) ubuntu@cloudtik-example-worker-150xxxxx-compute:~$
Execute and Submit Jobs
Execute a command on the head node via SSH.
cloudtik exec /path/to/your-cluster-config.yaml [CMD]
For example, list the items under the remote user home directory as below.
$ cloudtik exec /path/to/your-cluster-config.yaml ls
anaconda3 cloudtik_bootstrap_config.yaml cloudtik_bootstrap_key.pem jupyter runtime
Execute a command on a specified worker node.
cloudtik exec --node-ip x.x.x.x /path/to/your-cluster-config.yaml [CMD]
Execute a command on all nodes.
cloudtik exec --all-nodes /path/to/your-cluster-config.yaml [CMD]
Submit a job to the cluster to run.
cloudtik submit [OPTIONS] /path/to/your-cluster-config.yaml SCRIPT [SCRIPT_ARGS]
Run TPC-DS on a Spark Cluster
Here is an example of how to use cloudtik submit to run TPC-DS on the created cluster.
1. Creating a cluster
To generate data and run TPC-DS on a cluster, some extra tools need to be installed on the nodes during cluster setup. We provide a script to simplify the installation of these dependencies. You only need to add the following bootstrap_commands to the cluster configuration file.
bootstrap_commands:
- wget -O ~/bootstrap-benchmark.sh https://raw.githubusercontent.com/oap-project/cloudtik/main/tools/benchmarks/spark/scripts/bootstrap-benchmark.sh &&
bash ~/bootstrap-benchmark.sh --tpcds
2. Generating data
We provide a data generation Scala script, which can be found at CloudTik's ./tools/benchmarks/spark/scripts/tpcds-datagen.scala, for generating data at different scales.
Execute the following command to submit and run the data generation script once all of the cluster's nodes are ready.
cloudtik submit /path/to/your-cluster-config.yaml $CLOUDTIK_HOME/tools/benchmarks/spark/scripts/tpcds-datagen.scala --conf spark.driver.scaleFactor=1 --conf spark.driver.fsdir="s3a://s3_bucket_name" --jars \$HOME/runtime/benchmark-tools/spark-sql-perf/target/scala-2.12/spark-sql-perf_2.12-0.5.1-SNAPSHOT.jar
$CLOUDTIK_HOME/tools/benchmarks/spark/scripts/tpcds-datagen.scala
is the script's location on your working node.
spark.driver.scaleFactor=1
generates 1 GB of data; change it as needed.
spark.driver.fsdir="s3a://s3_bucket_name"
specifies the cloud storage location; change it to the URI of your own bucket.
--jars \$HOME/runtime/benchmark-tools/spark-sql-perf/target/scala-2.12/spark-sql-perf_2.12-0.5.1-SNAPSHOT.jar
specifies the default path of the spark-sql-perf jar installed during cluster setup; leave it unchanged.
3. Running the TPC-DS power test
We provide a power test Scala script, which can be found at CloudTik's ./tools/benchmarks/spark/scripts/tpcds-power-test.scala, for running the TPC-DS power test on a CloudTik cluster.
Execute the following command to submit and run the power test script on the cluster. Here, spark.driver.iterations=1 sets the number of test iterations, and spark.sql.shuffle.partitions is set to the total worker CPU count reported by cloudtik head info --worker-cpus.
cloudtik submit /path/to/your-cluster-config.yaml $CLOUDTIK_HOME/tools/benchmarks/spark/scripts/tpcds-power-test.scala --conf spark.driver.scaleFactor=1 --conf spark.driver.fsdir="s3a://s3_bucket_name" --conf spark.sql.shuffle.partitions=\$(cloudtik head info --worker-cpus) --conf spark.driver.iterations=1 --jars \$HOME/runtime/benchmark-tools/spark-sql-perf/target/scala-2.12/spark-sql-perf_2.12-0.5.1-SNAPSHOT.jar
Manage Files
Upload files or directories to the cluster
Use the rsync-up command to upload files to the cluster.
cloudtik rsync-up /path/to/your-cluster-config.yaml [source] [target]
If the --node-ip parameter is not specified, files are uploaded to the head node by default. You can also upload files to all the nodes by specifying the --all-nodes parameter.
For the target path, if you want to refer to the remote user home, make sure you use single quotes for a path containing ~, so that ~ refers to the remote home, for example '~/runtime'. If double quotes are used with ~ in the target path, it will be replaced with the local user home instead of the remote user home.
If the target path is not an absolute path, it is relative to the remote working directory, which is the remote user home. For example, "runtime" refers to the same path as '~/runtime'.
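The quoting difference can be previewed locally in any POSIX shell; the sketch below is plain shell, nothing CloudTik-specific, and only shows when the local shell expands ~:

```shell
# Unquoted: the LOCAL shell expands ~ before the command ever sees it
echo ~/runtime
# Single-quoted: the literal string '~/runtime' is passed through unchanged,
# so it can later resolve against the remote user home on the cluster node
echo '~/runtime'
```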
Download files or directories from the cluster
Use the rsync-down command to download files from the cluster.
cloudtik rsync-down /path/to/your-cluster-config.yaml [source] [target]
Start or Stop Runtime Services
cloudtik runtime start /path/to/your-cluster-config.yaml
cloudtik runtime stop /path/to/your-cluster-config.yaml
Scale Up or Scale Down the Cluster
Scale the cluster up or down to a specific number of CPUs or workers.
Scale up the cluster by specifying --cpus as below.
$ cloudtik scale --cpus 8 /path/to/your-cluster-config.yaml
Are you sure that you want to scale cluster small to 8 worker CPUs? Confirm [y/N]: y
Shared connection to 23.xxx.xx.xxx closed.
$ cloudtik info /path/to/your-cluster-config.yaml
Cluster small is: RUNNING
2 worker(s) are running
Runtimes: prometheus, spark
The total worker CPUs: 8.
The total worker memory: 32.0GB.
...
Scale up the cluster by specifying --workers as below.
$ cloudtik scale --workers 3 /path/to/your-cluster-config.yaml
Are you sure that you want to scale cluster small to 3 workers? Confirm [y/N]: y
Shared connection to 23.xxx.xx.xxx closed.
$ cloudtik info /path/to/your-cluster-config.yaml
Cluster small is: RUNNING
3 worker(s) are running
Runtimes: prometheus, spark
The total worker CPUs: 12.
The total worker memory: 48.0GB.
Key information:
...
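The worker counts above follow from simple arithmetic: with n2-standard-4 workers (4 vCPUs each), a --cpus target is rounded up to whole workers. A minimal sketch of that ceiling division, assuming this rounding behavior:

```shell
# Assumption: workers needed = ceil(target_cpus / cpus_per_worker)
target_cpus=8
cpus_per_worker=4   # n2-standard-4 has 4 vCPUs
workers=$(( (target_cpus + cpus_per_worker - 1) / cpus_per_worker ))
echo "$workers"     # prints 2, matching the first scaling example
```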
Access the Web UI
Use the SOCKS5 proxy below to access the cluster web UIs from a local browser:
localhost:6001
Prometheus:
http://<head-internal-ip>:9090
YARN ResourceManager Web UI:
http://<head-internal-ip>:8088
Spark History Server Web UI:
http://<head-internal-ip>:18080
Jupyter Web UI:
http://<head-internal-ip>:8888, default password is 'cloudtik'
For more information about these commands, you can use cloudtik --help or cloudtik [command] --help to get detailed instructions.