gcloud alpha dataproc workflow-templates set-managed-cluster(1)
NAME
- gcloud alpha dataproc workflow-templates set-managed-cluster - set a managed cluster for the workflow template
SYNOPSIS
-
gcloud alpha dataproc workflow-templates set-managed-cluster (TEMPLATE : --region=REGION) [--no-address] [--bucket=BUCKET] [--cluster-name=CLUSTER_NAME] [--initialization-action-timeout=TIMEOUT; default="10m"] [--initialization-actions=CLOUD_STORAGE_URI,[...]] [--labels=[KEY=VALUE,...]] [--master-accelerator=[type=TYPE,[count=COUNT],...]] [--master-boot-disk-size=MASTER_BOOT_DISK_SIZE] [--master-boot-disk-type=MASTER_BOOT_DISK_TYPE] [--master-machine-type=MASTER_MACHINE_TYPE] [--master-min-cpu-platform=PLATFORM] [--max-idle=MAX_IDLE] [--metadata=KEY=VALUE,[KEY=VALUE,...]] [--num-master-local-ssds=NUM_MASTER_LOCAL_SSDS] [--num-masters=NUM_MASTERS] [--num-preemptible-worker-local-ssds=NUM_PREEMPTIBLE_WORKER_LOCAL_SSDS] [--num-worker-local-ssds=NUM_WORKER_LOCAL_SSDS] [--optional-components=[COMPONENT,...]] [--preemptible-worker-boot-disk-size=PREEMPTIBLE_WORKER_BOOT_DISK_SIZE] [--preemptible-worker-boot-disk-type=PREEMPTIBLE_WORKER_BOOT_DISK_TYPE] [--properties=[PREFIX:PROPERTY=VALUE,...]] [--scopes=SCOPE,[SCOPE,...]] [--service-account=SERVICE_ACCOUNT] [--tags=TAG,[TAG,...]] [--worker-accelerator=[type=TYPE,[count=COUNT],...]] [--worker-boot-disk-size=WORKER_BOOT_DISK_SIZE] [--worker-boot-disk-type=WORKER_BOOT_DISK_TYPE] [--worker-machine-type=WORKER_MACHINE_TYPE] [--worker-min-cpu-platform=PLATFORM] [--zone=ZONE, -z ZONE] [--expiration-time=EXPIRATION_TIME | --max-age=MAX_AGE] [--image=IMAGE | --image-version=VERSION] [--network=NETWORK | --subnet=SUBNET] [--single-node | --num-preemptible-workers=NUM_PREEMPTIBLE_WORKERS --num-workers=NUM_WORKERS] [GCLOUD_WIDE_FLAG ...]
DESCRIPTION
(ALPHA) Set a managed cluster for the workflow template.
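For example, a minimal invocation might look like the following; the template name, region, cluster name, and worker count shown here are placeholders, not values taken from this reference:
$ gcloud alpha dataproc workflow-templates set-managed-cluster \
      my-template --region=us-central1 \
      --cluster-name=my-managed-cluster --num-workers=2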
POSITIONAL ARGUMENTS
-
-
- Template resource - The name of the workflow template to set managed cluster. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways. To set the [project] attribute: provide the argument [TEMPLATE] on the command line with a fully specified name; provide the argument [--project] on the command line; set the property [core/project]. This must be specified.
-
- TEMPLATE
-
ID of the template or fully qualified identifier for the template. This
positional must be specified if any of the other arguments in this group are
specified.
- --region=REGION
-
The Cloud Dataproc region for the template. Each Cloud Dataproc region
constitutes an independent resource namespace constrained to deploying instances
into Google Compute Engine zones inside the region. The default value of
global is a special multi-region namespace which is capable of deploying
instances into all Google Compute Engine zones globally, and is disjoint from
other Cloud Dataproc regions. Overrides the default dataproc/region
property value for this command invocation.
FLAGS
-
- --no-address
-
If provided, the instances in the cluster will not be assigned external IP
addresses.
If omitted, the instances in the cluster will each be assigned an ephemeral external IP address.
Note: Dataproc VMs need access to the Dataproc API. This can be achieved without external IP addresses using Private Google Access (cloud.google.com/compute/docs/private-google-access).
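As an illustration, internal-only instances might be requested together with a subnet that has Private Google Access enabled; the template name, region, and subnet name below are placeholders:
$ gcloud alpha dataproc workflow-templates set-managed-cluster \
      my-template --region=us-central1 \
      --no-address --subnet=my-private-subnet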
- --bucket=BUCKET
-
The Google Cloud Storage bucket to use with the Google Cloud Storage connector.
A bucket is auto created when this parameter is not specified.
- --cluster-name=CLUSTER_NAME
-
The name of the managed Dataproc cluster. If unspecified, the workflow template
ID will be used.
- --initialization-action-timeout=TIMEOUT; default="10m"
-
The maximum duration of each initialization action. See $ gcloud topic datetimes
for information on duration formats.
- --initialization-actions=CLOUD_STORAGE_URI,[...]
-
A list of Google Cloud Storage URIs of executables to run on each node in the
cluster.
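For example, an initialization script could be supplied together with a shorter timeout; the Cloud Storage URI and duration here are illustrative:
$ gcloud alpha dataproc workflow-templates set-managed-cluster \
      my-template --region=us-central1 \
      --initialization-actions=gs://my-bucket/install-deps.sh \
      --initialization-action-timeout=5m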
- --labels=[KEY=VALUE,...]
-
List of label KEY=VALUE pairs to add.
Keys must start with a lowercase character and contain only hyphens (-), underscores (_), lowercase characters, and numbers. Values must contain only hyphens (-), underscores (_), lowercase characters, and numbers.
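A sketch of attaching labels to the managed cluster; the label keys and values are placeholders and must follow the character rules above:
$ gcloud alpha dataproc workflow-templates set-managed-cluster \
      my-template --region=us-central1 \
      --labels=env=dev,team=analytics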
- --master-accelerator=[type=TYPE,[count=COUNT],...]
-
Attaches accelerators (e.g. GPUs) to the master instance(s).
-
- type
-
The specific type (e.g. nvidia-tesla-k80 for NVIDIA Tesla K80) of accelerator
to attach to the instances. Use 'gcloud compute accelerator-types list' to
learn about all available accelerator types.
- count
-
The number of pieces of the accelerator to attach to each of the instances. The
default value is 1.
-
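As an illustration, a single K80 GPU might be attached to the master node; the template name, region, and count are placeholders, and accelerator availability depends on the zone:
$ gcloud alpha dataproc workflow-templates set-managed-cluster \
      my-template --region=us-central1 \
      --master-accelerator=type=nvidia-tesla-k80,count=1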
- --master-boot-disk-size=MASTER_BOOT_DISK_SIZE
-
The size of the boot disk. The value must be a whole number followed by a size
unit of KB for kilobyte, MB for megabyte, GB
for gigabyte, or TB for terabyte. For example, 10GB will
produce a 10 gigabyte disk. The minimum size a boot disk can have is 10 GB. Disk
size must be a multiple of 1 GB.
- --master-boot-disk-type=MASTER_BOOT_DISK_TYPE
-
The type of the boot disk. The value must be pd-standard or
pd-ssd.
- --master-machine-type=MASTER_MACHINE_TYPE
-
The type of machine to use for the master. Defaults to server-specified.
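For example, the master's machine type and boot disk might be configured together; the machine type, disk size, and disk type below are illustrative values:
$ gcloud alpha dataproc workflow-templates set-managed-cluster \
      my-template --region=us-central1 \
      --master-machine-type=n1-standard-4 \
      --master-boot-disk-size=100GB --master-boot-disk-type=pd-ssd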
- --master-min-cpu-platform=PLATFORM
-
When specified, the VM will be scheduled on a host with the specified CPU platform
or a newer one. To list the available CPU platforms in a given zone, run:
- $ gcloud beta compute zones describe ZONE
CPU platform selection is available only in selected zones; zones that allow CPU platform selection will have an availableCpuPlatforms field that contains the list of available CPU platforms for that zone.
You can find more information online: cloud.google.com/compute/docs/instances/specify-min-cpu-platform
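A hedged sketch of requesting a minimum CPU platform for the master; "Intel Skylake" is only an example and its availability varies by zone:
$ gcloud alpha dataproc workflow-templates set-managed-cluster \
      my-template --region=us-central1 \
      --master-min-cpu-platform="Intel Skylake"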
- --max-idle=MAX_IDLE
-
The duration before cluster is auto-deleted after last job completes, such as
"2h" or "1d". See $ gcloud topic datetimes for information on duration formats.
- --metadata=KEY=VALUE,[KEY=VALUE,...]
-
Metadata to be made available to the guest operating system running on the
instances.
- --num-master-local-ssds=NUM_MASTER_LOCAL_SSDS
-
The number of local SSDs to attach to the master in a cluster.
- --num-masters=NUM_MASTERS
-
The number of master nodes in the cluster.
Number of Masters    Cluster Mode
1                    Standard
3                    High Availability
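For example, a High Availability managed cluster might be requested as follows; the template name, region, and worker count are placeholders:
$ gcloud alpha dataproc workflow-templates set-managed-cluster \
      my-template --region=us-central1 \
      --num-masters=3 --num-workers=2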
- --num-preemptible-worker-local-ssds=NUM_PREEMPTIBLE_WORKER_LOCAL_SSDS
-
The number of local SSDs to attach to each preemptible worker in a cluster.
- --num-worker-local-ssds=NUM_WORKER_LOCAL_SSDS
-
The number of local SSDs to attach to each worker in a cluster.
- --optional-components=[COMPONENT,...]
-
List of optional components to be installed on cluster machines.
The following page documents the optional components that can be installed. cloud.google.com/dataproc/docs/concepts/configuring-clusters/optional-components
- --preemptible-worker-boot-disk-size=PREEMPTIBLE_WORKER_BOOT_DISK_SIZE
-
The size of the boot disk. The value must be a whole number followed by a size
unit of KB for kilobyte, MB for megabyte, GB
for gigabyte, or TB for terabyte. For example, 10GB will
produce a 10 gigabyte disk. The minimum size a boot disk can have is 10 GB. Disk
size must be a multiple of 1 GB.
- --preemptible-worker-boot-disk-type=PREEMPTIBLE_WORKER_BOOT_DISK_TYPE
-
The type of the boot disk. The value must be pd-standard or
pd-ssd.
- --properties=[PREFIX:PROPERTY=VALUE,...]
-
Specifies configuration properties for installed packages, such as Hadoop and
Spark.
Properties are mapped to configuration files by specifying a prefix, such as "core:io.serializations". The following are supported prefixes and their mappings:
Prefix              File                    Purpose of file
capacity-scheduler  capacity-scheduler.xml  Hadoop YARN Capacity Scheduler configuration
core                core-site.xml           Hadoop general configuration
distcp              distcp-default.xml      Hadoop Distributed Copy configuration
hadoop-env          hadoop-env.sh           Hadoop specific environment variables
hdfs                hdfs-site.xml           Hadoop HDFS configuration
hive                hive-site.xml           Hive configuration
mapred              mapred-site.xml         Hadoop MapReduce configuration
mapred-env          mapred-env.sh           Hadoop MapReduce specific environment variables
pig                 pig.properties          Pig configuration
spark               spark-defaults.conf     Spark configuration
spark-env           spark-env.sh            Spark specific environment variables
yarn                yarn-site.xml           Hadoop YARN configuration
yarn-env            yarn-env.sh             Hadoop YARN specific environment variables
See cloud.google.com/dataproc/docs/concepts/configuring-clusters/cluster-properties for more information.
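As an illustration, Spark and HDFS properties might be set with the prefixes from the table above; the property values here are placeholders:
$ gcloud alpha dataproc workflow-templates set-managed-cluster \
      my-template --region=us-central1 \
      --properties='spark:spark.executor.memory=4g,hdfs:dfs.replication=2'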
- --scopes=SCOPE,[SCOPE,...]
-
Specifies scopes for the node instances. Multiple SCOPEs can be specified,
separated by commas. Examples:
-
$ gcloud alpha dataproc workflow-templates set-managed-cluster \
example-cluster \
--scopes https://www.googleapis.com/auth/bigtable.admin
-
$ gcloud alpha dataproc workflow-templates set-managed-cluster \
example-cluster --scopes sqlservice,bigquery
A set of minimum scopes necessary for the cluster to function properly is always added, even if they are not explicitly specified.
If the --scopes flag is not specified, the following default scopes are also included:
- https://www.googleapis.com/auth/bigquery
- https://www.googleapis.com/auth/bigtable.admin.table
- https://www.googleapis.com/auth/bigtable.data
- https://www.googleapis.com/auth/devstorage.full_control
If you want to enable all scopes, use the 'cloud-platform' scope.
SCOPE can be either the full URI of the scope or an alias. Default scopes are assigned to all instances.
DEPRECATION WARNING: the https://www.googleapis.com/auth/sqlservice account scope and the sql alias do not provide SQL instance management capabilities and have been deprecated. Please use https://www.googleapis.com/auth/sqlservice.admin or sql-admin to manage your Google SQL Service instances.
- --service-account=SERVICE_ACCOUNT
-
The Google Cloud IAM service account to be authenticated as.
- --tags=TAG,[TAG,...]
-
Specifies a list of tags to apply to the instance. These tags allow network
firewall rules and routes to be applied to specified VM instances. See gcloud
compute firewall-rules create(1) for more details.
To read more about configuring network tags, read this guide: cloud.google.com/vpc/docs/add-remove-network-tags
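A sketch of applying network tags to the managed cluster's VMs; the tag names below are arbitrary placeholders chosen for illustration:
$ gcloud alpha dataproc workflow-templates set-managed-cluster \
      my-template --region=us-central1 \
      --tags=allow-internal,allow-health-checks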
To list instances with their respective status and tags, run:
-
$ gcloud compute instances list \
--format='table(name,status,tags.list())'
To list instances tagged with a specific tag, tag1, run:
- $ gcloud compute instances list --filter='tags:tag1'
- --worker-accelerator=[type=TYPE,[count=COUNT],...]
-
Attaches accelerators (e.g. GPUs) to the worker instance(s).
Note: No accelerators will be attached to preemptible workers, because preemptible VMs do not support accelerators.
-
- type
-
The specific type (e.g. nvidia-tesla-k80 for NVIDIA Tesla K80) of accelerator
to attach to the instances. Use 'gcloud compute accelerator-types list' to
learn about all available accelerator types.
- count
-
The number of pieces of the accelerator to attach to each of the instances. The
default value is 1.
-
- --worker-boot-disk-size=WORKER_BOOT_DISK_SIZE
-
The size of the boot disk. The value must be a whole number followed by a size
unit of KB for kilobyte, MB for megabyte, GB
for gigabyte, or TB for terabyte. For example, 10GB will
produce a 10 gigabyte disk. The minimum size a boot disk can have is 10 GB. Disk
size must be a multiple of 1 GB.
- --worker-boot-disk-type=WORKER_BOOT_DISK_TYPE
-
The type of the boot disk. The value must be pd-standard or
pd-ssd.
- --worker-machine-type=WORKER_MACHINE_TYPE
-
The type of machine to use for workers. Defaults to server-specified.
- --worker-min-cpu-platform=PLATFORM
-
When specified, the VM will be scheduled on a host with the specified CPU platform
or a newer one. To list the available CPU platforms in a given zone, run:
- $ gcloud beta compute zones describe ZONE
CPU platform selection is available only in selected zones; zones that allow CPU platform selection will have an availableCpuPlatforms field that contains the list of available CPU platforms for that zone.
You can find more information online: cloud.google.com/compute/docs/instances/specify-min-cpu-platform
- --zone=ZONE, -z ZONE
-
The compute zone (e.g. us-central1-a) for the cluster. If empty and --region
is set to a value other than global, the server will pick a zone in the
region. Overrides the default compute/zone property value for this command
invocation.
-
At most one of these may be specified:
-
- --expiration-time=EXPIRATION_TIME
-
The time when cluster will be auto-deleted, such as
"2017-08-29T18:52:51.142Z." See $ gcloud topic datetimes for information on
time formats.
- --max-age=MAX_AGE
-
The lifespan of the cluster before it is auto-deleted, such as "2h" or "1d".
See $ gcloud topic datetimes for information on duration formats.
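For example, idle-delete and max-age limits might be combined so the managed cluster is removed either after sitting idle or after a fixed lifespan, whichever comes first; the durations shown are placeholders:
$ gcloud alpha dataproc workflow-templates set-managed-cluster \
      my-template --region=us-central1 \
      --max-idle=30m --max-age=8h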
-
-
At most one of these may be specified:
-
- --image=IMAGE
-
The full custom image URI or the custom image name that will be used to create a
cluster.
- --image-version=VERSION
-
The image version to use for the cluster. Defaults to the latest version.
-
-
At most one of these may be specified:
-
- --network=NETWORK
-
The Compute Engine network that the VM instances of the cluster will be part of.
This is mutually exclusive with --subnet. If neither is specified, this
defaults to the "default" network.
- --subnet=SUBNET
-
Specifies the subnet that the cluster will be part of. This is mutually exclusive
with --network.
-
-
At most one of these may be specified:
-
- --single-node
-
Create a single node cluster.
A single node cluster has all master and worker components. It cannot have any separate worker nodes. If this flag is not specified, a cluster with separate workers is created.
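A minimal sketch of requesting a single-node managed cluster; the template name and region are placeholders, and --single-node cannot be combined with the worker-count flags below:
$ gcloud alpha dataproc workflow-templates set-managed-cluster \
      my-template --region=us-central1 --single-node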
-
Multi-node cluster flags
-
- --num-preemptible-workers=NUM_PREEMPTIBLE_WORKERS
-
The number of preemptible worker nodes in the cluster.
- --num-workers=NUM_WORKERS
-
The number of worker nodes in the cluster. Defaults to server-specified.
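As an illustration, a mix of standard and preemptible workers might be requested; the counts shown are placeholders:
$ gcloud alpha dataproc workflow-templates set-managed-cluster \
      my-template --region=us-central1 \
      --num-workers=2 --num-preemptible-workers=4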
-
-
GCLOUD WIDE FLAGS
These flags are available to all commands: --account, --configuration, --flags-file, --flatten, --format, --help, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity. Run $ gcloud help for details.
NOTES
This command is currently in ALPHA and may change without notice. If this command fails with API permission errors despite specifying the right project, you will have to apply for early access and have your projects registered on the API whitelist to use it. To do so, contact Support at cloud.google.com/support.
These variants are also available:
- $ gcloud dataproc workflow-templates set-managed-cluster
- $ gcloud beta dataproc workflow-templates set-managed-cluster