gcloud_alpha_dataproc_clusters_create (1)

NAME

: gcloud alpha dataproc clusters create - create a cluster

SYNOPSIS

gcloud alpha dataproc clusters create NAME [--no-address] [--async] [--bucket=BUCKET] [--initialization-action-timeout=TIMEOUT; default="10m"] [--initialization-actions=CLOUD_STORAGE_URI,[...]] [--labels=[KEY=VALUE,...]] [--master-accelerator=[type=TYPE,[count=COUNT],...]] [--master-boot-disk-size=MASTER_BOOT_DISK_SIZE] [--master-boot-disk-type=MASTER_BOOT_DISK_TYPE] [--master-machine-type=MASTER_MACHINE_TYPE] [--master-min-cpu-platform=PLATFORM] [--max-idle=MAX_IDLE] [--metadata=KEY=VALUE,[KEY=VALUE,...]] [--num-master-local-ssds=NUM_MASTER_LOCAL_SSDS] [--num-masters=NUM_MASTERS] [--num-preemptible-worker-local-ssds=NUM_PREEMPTIBLE_WORKER_LOCAL_SSDS] [--num-worker-local-ssds=NUM_WORKER_LOCAL_SSDS] [--optional-components=[COMPONENT,...]] [--preemptible-worker-boot-disk-size=PREEMPTIBLE_WORKER_BOOT_DISK_SIZE] [--preemptible-worker-boot-disk-type=PREEMPTIBLE_WORKER_BOOT_DISK_TYPE] [--properties=[PREFIX:PROPERTY=VALUE,...]] [--region=REGION] [--scopes=SCOPE,[SCOPE,...]] [--service-account=SERVICE_ACCOUNT] [--tags=TAG,[TAG,...]] [--worker-accelerator=[type=TYPE,[count=COUNT],...]] [--worker-boot-disk-size=WORKER_BOOT_DISK_SIZE] [--worker-boot-disk-type=WORKER_BOOT_DISK_TYPE] [--worker-machine-type=WORKER_MACHINE_TYPE] [--worker-min-cpu-platform=PLATFORM] [--zone=ZONE, -z ZONE] [--expiration-time=EXPIRATION_TIME | --max-age=MAX_AGE] [--gce-pd-kms-key=GCE_PD_KMS_KEY : --gce-pd-kms-key-keyring=GCE_PD_KMS_KEY_KEYRING --gce-pd-kms-key-location=GCE_PD_KMS_KEY_LOCATION --gce-pd-kms-key-project=GCE_PD_KMS_KEY_PROJECT] [--image=IMAGE | --image-version=VERSION] [--network=NETWORK | --subnet=SUBNET] [--single-node | --num-preemptible-workers=NUM_PREEMPTIBLE_WORKERS --num-workers=NUM_WORKERS] [GCLOUD_WIDE_FLAG ...]

DESCRIPTION

(ALPHA) Create a cluster.

POSITIONAL ARGUMENTS

NAME: The name of this cluster.

FLAGS

--no-address

If provided, the instances in the cluster will not be assigned external IP addresses.
If omitted, the instances in the cluster will each be assigned an ephemeral external IP address.
Note: Dataproc VMs need access to the Dataproc API. This can be achieved without external IP addresses using Private Google Access (cloud.google.com/compute/docs/private-google-access).
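
As a rough sketch of the note above, a cluster without external IP addresses could be requested on a subnet that already has Private Google Access enabled. The cluster name, region, and subnet name below are hypothetical placeholders, and the snippet only assembles and prints the command rather than running it.

```shell
# Hypothetical names: example-cluster, us-central1, example-subnet.
# Assumes Private Google Access is already enabled on example-subnet.
set -- gcloud alpha dataproc clusters create example-cluster \
  --region=us-central1 \
  --subnet=example-subnet \
  --no-address
cmd="$*"
# Print the command for review; run "$@" instead to execute it.
echo "$cmd"
```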

--async

Display information about the operation in progress, without waiting for the operation to complete.

--bucket=BUCKET

The Google Cloud Storage bucket to use with the Google Cloud Storage connector. A bucket is created automatically when this parameter is not specified.

--initialization-action-timeout=TIMEOUT; default="10m"

The maximum duration of each initialization action. See $ gcloud topic datetimes for information on duration formats.

--initialization-actions=CLOUD_STORAGE_URI,[...]

A list of Google Cloud Storage URIs of executables to run on each node in the cluster.
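
A minimal sketch of combining the two initialization flags, assuming two executable scripts have been uploaded to a bucket named example-bucket (the bucket and script names are hypothetical). The 5m timeout applies to each action separately and overrides the 10m default.

```shell
# Hypothetical bucket and script names; replace with your own gs:// URIs.
set -- gcloud alpha dataproc clusters create example-cluster \
  --initialization-actions=gs://example-bucket/install-deps.sh,gs://example-bucket/configure.sh \
  --initialization-action-timeout=5m
cmd="$*"
# Print the command for review; run "$@" instead to execute it.
echo "$cmd"
```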

--labels=[KEY=VALUE,...]

List of label KEY=VALUE pairs to add.
Keys must start with a lowercase character and contain only hyphens (-), underscores (_), lowercase characters, and numbers. Values must contain only hyphens (-), underscores (_), lowercase characters, and numbers.
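
The key and value rules above can be checked locally before calling gcloud. The helper below is an illustrative sketch, not part of gcloud: valid_label is a hypothetical name, and the patterns assume that "lowercase characters" means ASCII a-z.

```shell
# valid_label KEY=VALUE: succeeds if the pair follows the rules stated above.
# Assumption: only ASCII lowercase letters are treated as lowercase characters.
valid_label() {
  key=${1%%=*}     # text before the first "="
  value=${1#*=}    # text after the first "="
  printf '%s\n' "$key" | LC_ALL=C grep -Eq '^[a-z][a-z0-9_-]*$' &&
    printf '%s\n' "$value" | LC_ALL=C grep -Eq '^[a-z0-9_-]*$'
}

for pair in env=dev Team=data owner=alice_1; do
  if valid_label "$pair"; then
    echo "ok: $pair"
  else
    echo "rejected: $pair"
  fi
done
```

Pairs that pass can then be joined with commas and passed as --labels=env=dev,owner=alice_1.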

--master-accelerator=[type=TYPE,[count=COUNT],...]

Attaches accelerators (e.g. GPUs) to the master instance(s).

type: The specific type (e.g. nvidia-tesla-k80 for NVIDIA Tesla K80) of accelerator to attach to the instances. Use 'gcloud compute accelerator-types list' to learn about all available accelerator types.
count: The number of accelerators to attach to each of the instances. The default value is 1.
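
As an illustration of the type=TYPE,count=COUNT syntax, the sketch below attaches two accelerators to the master and relies on the default count of 1 for the workers (--worker-accelerator takes the same form). The cluster name and zone are placeholders, and the chosen zone is assumed to offer the nvidia-tesla-k80 type.

```shell
# Hypothetical cluster name; the zone is assumed to offer nvidia-tesla-k80.
set -- gcloud alpha dataproc clusters create example-cluster \
  --zone=us-central1-a \
  --master-accelerator=type=nvidia-tesla-k80,count=2 \
  --worker-accelerator=type=nvidia-tesla-k80
cmd="$*"
# Print the command for review; run "$@" instead to execute it.
echo "$cmd"
```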

--master-boot-disk-size=MASTER_BOOT_DISK_SIZE

The size of the boot disk. The value must be a whole number followed by a size unit of KB for kilobyte, MB for megabyte, GB for gigabyte, or TB for terabyte. For example, 10GB will produce a 10 gigabyte disk. The minimum size a boot disk can have is 10 GB. Disk size must be a multiple of 1 GB.

--master-boot-disk-type=MASTER_BOOT_DISK_TYPE

The type of the boot disk. The value must be pd-standard or pd-ssd.

--master-machine-type=MASTER_MACHINE_TYPE

The type of machine to use for the master. Defaults to server-specified.

--master-min-cpu-platform=PLATFORM

When specified, the VM will be scheduled on a host with the specified CPU architecture or a newer one. To list the available CPU platforms in a given zone, run:

: $ gcloud beta compute zones describe ZONE

CPU platform selection is available only in selected zones; zones that allow CPU platform selection will have an availableCpuPlatforms field that contains the list of available CPU platforms for that zone.
You can find more information online: cloud.google.com/compute/docs/instances/specify-min-cpu-platform

--max-idle=MAX_IDLE

The duration before the cluster is auto-deleted after the last job completes, such as "2h" or "1d". See $ gcloud topic datetimes for information on duration formats.

--metadata=KEY=VALUE,[KEY=VALUE,...]

Metadata to be made available to the guest operating system running on the instances.

--num-master-local-ssds=NUM_MASTER_LOCAL_SSDS

The number of local SSDs to attach to the master in a cluster.

--num-masters=NUM_MASTERS

The number of master nodes in the cluster.

Number of Masters	Cluster Mode
1	Standard
3	High Availability
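
A small sketch of how a script might map the master count to the cluster mode from the table before building the command. It treats only the two values listed in the table as acceptable, which is an assumption of this example, and the cluster name is a placeholder.

```shell
num_masters=3

# Map the count to the mode named in the table above.
case "$num_masters" in
  1) mode="Standard" ;;
  3) mode="High Availability" ;;
  *) echo "unsupported master count: $num_masters" >&2; exit 1 ;;
esac

set -- gcloud alpha dataproc clusters create example-cluster \
  --num-masters="$num_masters"
cmd="$*"
# Print the mode and command for review; run "$@" instead to execute it.
echo "$mode: $cmd"
```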

--num-preemptible-worker-local-ssds=NUM_PREEMPTIBLE_WORKER_LOCAL_SSDS

The number of local SSDs to attach to each preemptible worker in a cluster.

--num-worker-local-ssds=NUM_WORKER_LOCAL_SSDS

The number of local SSDs to attach to each worker in a cluster.

--optional-components=[COMPONENT,...]

List of optional components to be installed on cluster machines.
The following page documents the optional components that can be installed: cloud.google.com/dataproc/docs/concepts/configuring-clusters/optional-components

--preemptible-worker-boot-disk-size=PREEMPTIBLE_WORKER_BOOT_DISK_SIZE

--preemptible-worker-boot-disk-type=PREEMPTIBLE_WORKER_BOOT_DISK_TYPE

The type of the boot disk. The value must be pd-standard or pd-ssd.

--properties=[PREFIX:PROPERTY=VALUE,...]

Specifies configuration properties for installed packages, such as Hadoop and Spark.
Properties are mapped to configuration files by specifying a prefix, such as "core:io.serializations". The following are supported prefixes and their mappings:

Prefix	File	Purpose of file
capacity-scheduler	capacity-scheduler.xml	Hadoop YARN Capacity Scheduler configuration
core	core-site.xml	Hadoop general configuration
distcp	distcp-default.xml	Hadoop Distributed Copy configuration
hadoop-env	hadoop-env.sh	Hadoop specific environment variables
hdfs	hdfs-site.xml	Hadoop HDFS configuration
hive	hive-site.xml	Hive configuration
mapred	mapred-site.xml	Hadoop MapReduce configuration
mapred-env	mapred-env.sh	Hadoop MapReduce specific environment variables
pig	pig.properties	Pig configuration
spark	spark-defaults.conf	Spark configuration
spark-env	spark-env.sh	Spark specific environment variables
yarn	yarn-site.xml	Hadoop YARN configuration
yarn-env	yarn-env.sh	Hadoop YARN specific environment variables

See cloud.google.com/dataproc/docs/concepts/configuring-clusters/cluster-properties for more information.
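
To illustrate the PREFIX:PROPERTY=VALUE form, the sketch below sets one Spark property and one HDFS property. Per the table above, the spark prefix maps to spark-defaults.conf and the hdfs prefix to hdfs-site.xml. The property values are arbitrary examples and the cluster name is a placeholder.

```shell
# spark:spark.executor.memory=4g  -> written to spark-defaults.conf
# hdfs:dfs.replication=2          -> written to hdfs-site.xml
# The values are illustrative only.
set -- gcloud alpha dataproc clusters create example-cluster \
  --properties=spark:spark.executor.memory=4g,hdfs:dfs.replication=2
cmd="$*"
# Print the command for review; run "$@" instead to execute it.
echo "$cmd"
```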

--region=REGION

Cloud Dataproc region to use. Each Cloud Dataproc region constitutes an independent resource namespace constrained to deploying instances into Compute Engine zones inside the region. The default value of global is a special multi-region namespace which is capable of deploying instances into all Compute Engine zones globally, and is disjoint from other Cloud Dataproc regions. Overrides the default dataproc/region property value for this command invocation.

--scopes=SCOPE,[SCOPE,...]

Specifies scopes for the node instances. Multiple SCOPEs can be specified, separated by commas. Examples:

: $ gcloud alpha dataproc clusters create example-cluster \
--scopes https://www.googleapis.com/auth/bigtable.admin

: $ gcloud alpha dataproc clusters create example-cluster \
--scopes sqlservice,bigquery

The following minimum scopes are necessary for the cluster to function properly and are always added, even if not explicitly specified:

: www.googleapis.com/auth/devstorage.read_write
www.googleapis.com/auth/logging.write

If the --scopes flag is not specified, the following default scopes are also included:

: www.googleapis.com/auth/bigquery
www.googleapis.com/auth/bigtable.admin.table
www.googleapis.com/auth/bigtable.data
www.googleapis.com/auth/devstorage.full_control

If you want to enable all scopes, use the 'cloud-platform' scope.
SCOPE can be either the full URI of the scope or an alias. Default scopes are assigned to all instances. Available aliases are:

Alias	URI
bigquery	www.googleapis.com/auth/bigquery
cloud-platform	www.googleapis.com/auth/cloud-platform
cloud-source-repos	www.googleapis.com/auth/source.full_control
cloud-source-repos-ro	www.googleapis.com/auth/source.read_only
compute-ro	www.googleapis.com/auth/compute.readonly
compute-rw	www.googleapis.com/auth/compute
datastore	www.googleapis.com/auth/datastore
default	www.googleapis.com/auth/devstorage.read_only
	www.googleapis.com/auth/logging.write
	www.googleapis.com/auth/monitoring.write
	www.googleapis.com/auth/pubsub
	www.googleapis.com/auth/service.management.readonly
	www.googleapis.com/auth/servicecontrol
	www.googleapis.com/auth/trace.append
gke-default	www.googleapis.com/auth/devstorage.read_only
	www.googleapis.com/auth/logging.write
	www.googleapis.com/auth/monitoring
	www.googleapis.com/auth/service.management.readonly
	www.googleapis.com/auth/servicecontrol
	www.googleapis.com/auth/trace.append
logging-write	www.googleapis.com/auth/logging.write
monitoring	www.googleapis.com/auth/monitoring
monitoring-write	www.googleapis.com/auth/monitoring.write
pubsub	www.googleapis.com/auth/pubsub
service-control	www.googleapis.com/auth/servicecontrol
service-management	www.googleapis.com/auth/service.management.readonly
sql (deprecated)	www.googleapis.com/auth/sqlservice
sql-admin	www.googleapis.com/auth/sqlservice.admin
storage-full	www.googleapis.com/auth/devstorage.full_control
storage-ro	www.googleapis.com/auth/devstorage.read_only
storage-rw	www.googleapis.com/auth/devstorage.read_write
taskqueue	www.googleapis.com/auth/taskqueue
trace	www.googleapis.com/auth/trace.append
userinfo-email	www.googleapis.com/auth/userinfo.email

DEPRECATION WARNING: The www.googleapis.com/auth/sqlservice account scope and the sql alias do not provide SQL instance management capabilities and have been deprecated. Please use www.googleapis.com/auth/sqlservice.admin or sql-admin to manage your Google SQL Service instances.

--service-account=SERVICE_ACCOUNT

The Google Cloud IAM service account to be authenticated as.

--tags=TAG,[TAG,...]

Specifies a list of tags to apply to the instance. These tags allow network firewall rules and routes to be applied to specified VM instances. See gcloud compute firewall-rules create(1) for more details.
To read more about configuring network tags, read this guide: cloud.google.com/vpc/docs/add-remove-network-tags
To list instances with their respective status and tags, run:

: $ gcloud compute instances list \
--format='table(name,status,tags.list())'

To list instances tagged with a specific tag, tag1, run:

: $ gcloud compute instances list --filter='tags:tag1'

--worker-accelerator=[type=TYPE,[count=COUNT],...]

Attaches accelerators (e.g. GPUs) to the worker instance(s).
Note: No accelerators will be attached to preemptible workers, because preemptible VMs do not support accelerators.

type: The specific type (e.g. nvidia-tesla-k80 for NVIDIA Tesla K80) of accelerator to attach to the instances. Use 'gcloud compute accelerator-types list' to learn about all available accelerator types.
count: The number of accelerators to attach to each of the instances. The default value is 1.

--worker-boot-disk-size=WORKER_BOOT_DISK_SIZE

--worker-boot-disk-type=WORKER_BOOT_DISK_TYPE

The type of the boot disk. The value must be pd-standard or pd-ssd.

--worker-machine-type=WORKER_MACHINE_TYPE

The type of machine to use for workers. Defaults to server-specified.

--worker-min-cpu-platform=PLATFORM

When specified, the VM will be scheduled on a host with the specified CPU architecture or a newer one. To list the available CPU platforms in a given zone, run:

: $ gcloud beta compute zones describe ZONE

--zone=ZONE, -z ZONE

The compute zone (e.g. us-central1-a) for the cluster. If empty and --region is set to a value other than global, the server will pick a zone in the region. Overrides the default compute/zone property value for this command invocation.

At most one of these may be specified:

--expiration-time=EXPIRATION_TIME: The time when the cluster will be auto-deleted, such as "2017-08-29T18:52:51.142Z". See $ gcloud topic datetimes for information on time formats.
--max-age=MAX_AGE: The lifespan of the cluster before it is auto-deleted, such as "2h" or "1d". See $ gcloud topic datetimes for information on duration formats.
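
As a sketch of scheduled deletion, the command below combines --max-idle (described earlier) with --max-age, reusing the "2h" and "1d" sample durations from the text. Since --max-age and --expiration-time are mutually exclusive, only one of them appears. The cluster name is a placeholder.

```shell
# Delete the cluster after 2 hours of idleness, or after 1 day at the latest.
# --expiration-time is omitted because it cannot be combined with --max-age.
set -- gcloud alpha dataproc clusters create example-cluster \
  --max-idle=2h \
  --max-age=1d
cmd="$*"
# Print the command for review; run "$@" instead to execute it.
echo "$cmd"
```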

Key resource - The Cloud KMS (Key Management Service) cryptokey that will be used to protect the cluster. The 'Compute Engine Service Agent' service account must hold permission 'Cloud KMS CryptoKey Encrypter/Decrypter'. The arguments in this group can be used to specify the attributes of this resource.

--gce-pd-kms-key=GCE_PD_KMS_KEY: ID of the key or fully qualified identifier for the key. This flag must be specified if any of the other arguments in this group are specified.
--gce-pd-kms-key-keyring=GCE_PD_KMS_KEY_KEYRING: The KMS keyring of the key.
--gce-pd-kms-key-location=GCE_PD_KMS_KEY_LOCATION: The Cloud location for the key.
--gce-pd-kms-key-project=GCE_PD_KMS_KEY_PROJECT: The Cloud project for the key.
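
The two ways of naming the key can be sketched as follows. All project, location, keyring, and key names are hypothetical, and the fully qualified form shown assumes the standard Cloud KMS resource name layout (projects/.../locations/.../keyRings/.../cryptoKeys/...).

```shell
# Hypothetical key attributes.
project="example-project"
location="us-central1"
keyring="example-keyring"
key="example-key"

# Form 1: key ID plus the attribute flags from this group.
set -- gcloud alpha dataproc clusters create example-cluster \
  --gce-pd-kms-key="$key" \
  --gce-pd-kms-key-keyring="$keyring" \
  --gce-pd-kms-key-location="$location" \
  --gce-pd-kms-key-project="$project"
by_attributes="$*"

# Form 2: a single fully qualified identifier (assumed Cloud KMS resource name).
key_id="projects/$project/locations/$location/keyRings/$keyring/cryptoKeys/$key"
set -- gcloud alpha dataproc clusters create example-cluster \
  --gce-pd-kms-key="$key_id"
by_full_id="$*"

# Print both commands for review.
printf '%s\n' "$by_attributes" "$by_full_id"
```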

At most one of these may be specified:

--image=IMAGE: The full custom image URI or the custom image name that will be used to create a cluster.
--image-version=VERSION: The image version to use for the cluster. Defaults to the latest version.

At most one of these may be specified:

--network=NETWORK: The Compute Engine network that the VM instances of the cluster will be part of. This is mutually exclusive with --subnet. If neither is specified, this defaults to the "default" network.
--subnet=SUBNET: Specifies the subnet that the cluster will be part of. This is mutually exclusive with --network.

At most one of these may be specified:

--single-node

Create a single node cluster.
A single node cluster has all master and worker components. It cannot have any separate worker nodes. If this flag is not specified, a cluster with separate workers is created.

Multi-node cluster flags

--num-preemptible-workers=NUM_PREEMPTIBLE_WORKERS: The number of preemptible worker nodes in the cluster.
--num-workers=NUM_WORKERS: The number of worker nodes in the cluster. Defaults to server-specified.
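
The two mutually exclusive shapes can be sketched side by side: a single node cluster, and a multi-node cluster with explicit worker counts. The cluster names and worker counts are illustrative placeholders.

```shell
# Single node cluster: no worker flags may be given.
set -- gcloud alpha dataproc clusters create example-single \
  --single-node
single="$*"

# Multi-node cluster: --single-node is omitted and worker counts are set.
set -- gcloud alpha dataproc clusters create example-multi \
  --num-workers=2 \
  --num-preemptible-workers=2
multi="$*"

# Print both commands for review.
printf '%s\n' "$single" "$multi"
```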

GCLOUD WIDE FLAGS

These flags are available to all commands: --account, --configuration, --flags-file, --flatten, --format, --help, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity. Run $ gcloud help for details.

EXAMPLES

To create a cluster, run:

: $ gcloud alpha dataproc clusters create my-cluster

NOTES

This command is currently in ALPHA and may change without notice. If this command fails with API permission errors despite specifying the right project, you will have to apply for early access and have your projects registered on the API whitelist to use it. To do so, contact Support at cloud.google.com/support.

These variants are also available:

: $ gcloud dataproc clusters create

: $ gcloud beta dataproc clusters create

gcloud_alpha_dataproc_clusters_create • man page