gcloud_dataproc_clusters_create (1)
NAME
- gcloud dataproc clusters create - create a cluster
SYNOPSIS
-
gcloud dataproc clusters create NAME [--no-address] [--async] [--bucket=BUCKET] [--initialization-action-timeout=TIMEOUT; default="10m"] [--initialization-actions=CLOUD_STORAGE_URI,[...]] [--labels=[KEY=VALUE,...]] [--master-boot-disk-size=MASTER_BOOT_DISK_SIZE] [--master-boot-disk-type=MASTER_BOOT_DISK_TYPE] [--master-machine-type=MASTER_MACHINE_TYPE] [--metadata=KEY=VALUE,[KEY=VALUE,...]] [--num-master-local-ssds=NUM_MASTER_LOCAL_SSDS] [--num-masters=NUM_MASTERS] [--num-preemptible-worker-local-ssds=NUM_PREEMPTIBLE_WORKER_LOCAL_SSDS] [--num-worker-local-ssds=NUM_WORKER_LOCAL_SSDS] [--preemptible-worker-boot-disk-size=PREEMPTIBLE_WORKER_BOOT_DISK_SIZE] [--preemptible-worker-boot-disk-type=PREEMPTIBLE_WORKER_BOOT_DISK_TYPE] [--properties=[PREFIX:PROPERTY=VALUE,...]] [--region=REGION] [--scopes=SCOPE,[SCOPE,...]] [--service-account=SERVICE_ACCOUNT] [--tags=TAG,[TAG,...]] [--worker-boot-disk-size=WORKER_BOOT_DISK_SIZE] [--worker-boot-disk-type=WORKER_BOOT_DISK_TYPE] [--worker-machine-type=WORKER_MACHINE_TYPE] [--zone=ZONE, -z ZONE] [--gce-pd-kms-key=GCE_PD_KMS_KEY : --gce-pd-kms-key-keyring=GCE_PD_KMS_KEY_KEYRING --gce-pd-kms-key-location=GCE_PD_KMS_KEY_LOCATION --gce-pd-kms-key-project=GCE_PD_KMS_KEY_PROJECT] [--image=IMAGE | --image-version=VERSION] [--network=NETWORK | --subnet=SUBNET] [--single-node | --num-preemptible-workers=NUM_PREEMPTIBLE_WORKERS --num-workers=NUM_WORKERS] [GCLOUD_WIDE_FLAG ...]
DESCRIPTION
-
Create a cluster.
POSITIONAL ARGUMENTS
-
- NAME
-
The name of this cluster.
FLAGS
-
- --no-address
-
If provided, the instances in the cluster will not be assigned external IP
addresses.
If omitted, each instance in the cluster will be assigned an ephemeral external IP address.
Note: Dataproc VMs need access to the Dataproc API. This can be achieved without external IP addresses using Private Google Access (cloud.google.com/compute/docs/private-google-access).
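For example, to create a cluster whose VMs have only internal IP addresses (the cluster name is hypothetical, and the cluster's subnet must have Private Google Access enabled):
-
$ gcloud dataproc clusters create internal-cluster \
--no-address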
- --async
-
Display information about the operation in progress, without waiting for the
operation to complete.
- --bucket=BUCKET
-
The Google Cloud Storage bucket to use with the Google Cloud Storage connector.
A bucket is created automatically when this parameter is not specified.
- --initialization-action-timeout=TIMEOUT; default="10m"
-
The maximum duration of each initialization action. See $ gcloud topic datetimes
for information on duration formats.
- --initialization-actions=CLOUD_STORAGE_URI,[...]
-
A list of Google Cloud Storage URIs of executables to run on each node in the
cluster.
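For example, to run an installation script from Cloud Storage on every node, allowing more than the default 10 minutes per action (the bucket and script names are hypothetical):
-
$ gcloud dataproc clusters create example-cluster \
--initialization-actions gs://my-bucket/install-deps.sh \
--initialization-action-timeout 30m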
- --labels=[KEY=VALUE,...]
-
List of label KEY=VALUE pairs to add.
Keys must start with a lowercase character and contain only hyphens (-), underscores (_), lowercase characters, and numbers. Values must contain only hyphens (-), underscores (_), lowercase characters, and numbers.
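For example, to attach labels at creation time (the keys and values shown are illustrative):
-
$ gcloud dataproc clusters create example-cluster \
--labels env=prod,team=data-eng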
- --master-boot-disk-size=MASTER_BOOT_DISK_SIZE
-
The size of the boot disk. The value must be a whole number followed by a size
unit of KB for kilobyte, MB for megabyte, GB
for gigabyte, or TB for terabyte. For example, 10GB will
produce a 10 gigabyte disk. The minimum size a boot disk can have is 10 GB. Disk
size must be a multiple of 1 GB.
- --master-boot-disk-type=MASTER_BOOT_DISK_TYPE
-
The type of the boot disk. The value must be pd-standard or
pd-ssd.
- --master-machine-type=MASTER_MACHINE_TYPE
-
The type of machine to use for the master. Defaults to server-specified.
- --metadata=KEY=VALUE,[KEY=VALUE,...]
-
Metadata to be made available to the guest operating system running on the
instances.
- --num-master-local-ssds=NUM_MASTER_LOCAL_SSDS
-
The number of local SSDs to attach to the master in a cluster.
- --num-masters=NUM_MASTERS
-
The number of master nodes in the cluster.
Number of Masters  Cluster Mode
1                  Standard
3                  High Availability
- --num-preemptible-worker-local-ssds=NUM_PREEMPTIBLE_WORKER_LOCAL_SSDS
-
The number of local SSDs to attach to each preemptible worker in a cluster.
- --num-worker-local-ssds=NUM_WORKER_LOCAL_SSDS
-
The number of local SSDs to attach to each worker in a cluster.
- --preemptible-worker-boot-disk-size=PREEMPTIBLE_WORKER_BOOT_DISK_SIZE
-
The size of the boot disk. The value must be a whole number followed by a size
unit of KB for kilobyte, MB for megabyte, GB
for gigabyte, or TB for terabyte. For example, 10GB will
produce a 10 gigabyte disk. The minimum size a boot disk can have is 10 GB. Disk
size must be a multiple of 1 GB.
- --preemptible-worker-boot-disk-type=PREEMPTIBLE_WORKER_BOOT_DISK_TYPE
-
The type of the boot disk. The value must be pd-standard or
pd-ssd.
- --properties=[PREFIX:PROPERTY=VALUE,...]
-
Specifies configuration properties for installed packages, such as Hadoop and
Spark.
Properties are mapped to configuration files by specifying a prefix, such as "core:io.serializations". The following are supported prefixes and their mappings:
Prefix              File                    Purpose of file
capacity-scheduler  capacity-scheduler.xml  Hadoop YARN Capacity Scheduler configuration
core                core-site.xml           Hadoop general configuration
distcp              distcp-default.xml      Hadoop Distributed Copy configuration
hadoop-env          hadoop-env.sh           Hadoop specific environment variables
hdfs                hdfs-site.xml           Hadoop HDFS configuration
hive                hive-site.xml           Hive configuration
mapred              mapred-site.xml         Hadoop MapReduce configuration
mapred-env          mapred-env.sh           Hadoop MapReduce specific environment variables
pig                 pig.properties          Pig configuration
spark               spark-defaults.conf     Spark configuration
spark-env           spark-env.sh            Spark specific environment variables
yarn                yarn-site.xml           Hadoop YARN configuration
yarn-env            yarn-env.sh             Hadoop YARN specific environment variables
See cloud.google.com/dataproc/docs/concepts/configuring-clusters/cluster-properties for more information.
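For example, to override a Spark setting in spark-defaults.conf and an HDFS setting in hdfs-site.xml (the values shown are illustrative):
-
$ gcloud dataproc clusters create example-cluster \
--properties spark:spark.executor.memory=4g,hdfs:dfs.replication=2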
- --region=REGION
-
Cloud Dataproc region to use. Each Cloud Dataproc region constitutes an
independent resource namespace constrained to deploying instances into Compute
Engine zones inside the region. The default value of global is a special
multi-region namespace which is capable of deploying instances into all Compute
Engine zones globally, and is disjoint from other Cloud Dataproc regions.
Overrides the default dataproc/region property value for this command
invocation.
- --scopes=SCOPE,[SCOPE,...]
-
Specifies scopes for the node instances. Multiple SCOPEs can be specified,
separated by commas. Examples:
-
$ gcloud dataproc clusters create example-cluster \
--scopes www.googleapis.com/auth/bigtable.admin
-
$ gcloud dataproc clusters create example-cluster \
--scopes sqlservice,bigquery
The following minimum scopes are necessary for the cluster to function properly and are always added, even if not explicitly specified.
If the --scopes flag is not specified, the following default scopes are also included:
- www.googleapis.com/auth/bigquery
- www.googleapis.com/auth/bigtable.admin.table
- www.googleapis.com/auth/bigtable.data
- www.googleapis.com/auth/devstorage.full_control
If you want to enable all scopes, use the 'cloud-platform' scope.
SCOPE can be either the full URI of the scope or an alias; the default alias expands to the scopes assigned to all instances.
DEPRECATION WARNING: The www.googleapis.com/auth/sqlservice account scope and sql alias do not provide SQL instance management capabilities and have been deprecated. Please use www.googleapis.com/auth/sqlservice.admin or the sql-admin alias to manage your Google SQL Service instances.
- --service-account=SERVICE_ACCOUNT
-
The Google Cloud IAM service account to be authenticated as.
- --tags=TAG,[TAG,...]
-
Specifies a list of tags to apply to the instance. These tags allow network
firewall rules and routes to be applied to specified VM instances. See gcloud
compute firewall-rules create(1) for more details.
To read more about configuring network tags, read this guide: cloud.google.com/vpc/docs/add-remove-network-tags
To list instances with their respective status and tags, run:
-
$ gcloud compute instances list \
--format='table(name,status,tags.list())'
To list instances tagged with a specific tag, tag1, run:
- $ gcloud compute instances list --filter='tags:tag1'
- --worker-boot-disk-size=WORKER_BOOT_DISK_SIZE
-
The size of the boot disk. The value must be a whole number followed by a size
unit of KB for kilobyte, MB for megabyte, GB
for gigabyte, or TB for terabyte. For example, 10GB will
produce a 10 gigabyte disk. The minimum size a boot disk can have is 10 GB. Disk
size must be a multiple of 1 GB.
- --worker-boot-disk-type=WORKER_BOOT_DISK_TYPE
-
The type of the boot disk. The value must be pd-standard or
pd-ssd.
- --worker-machine-type=WORKER_MACHINE_TYPE
-
The type of machine to use for workers. Defaults to server-specified.
- --zone=ZONE, -z ZONE
-
The compute zone (e.g. us-central1-a) for the cluster. If empty and --region
is set to a value other than global, the server will pick a zone in the
region. Overrides the default compute/zone property value for this command
invocation.
-
Key resource - The Cloud KMS (Key Management Service) cryptokey that will be
used to protect the cluster. The 'Compute Engine Service Agent' service account
must hold permission 'Cloud KMS CryptoKey Encrypter/Decrypter'. The arguments in
this group can be used to specify the attributes of this resource.
-
- --gce-pd-kms-key=GCE_PD_KMS_KEY
-
ID of the key or fully qualified identifier for the key. This flag must be
specified if any of the other arguments in this group are specified.
- --gce-pd-kms-key-keyring=GCE_PD_KMS_KEY_KEYRING
-
The KMS keyring of the key.
- --gce-pd-kms-key-location=GCE_PD_KMS_KEY_LOCATION
-
The Cloud location for the key.
- --gce-pd-kms-key-project=GCE_PD_KMS_KEY_PROJECT
-
The Cloud project for the key.
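For example, the key may be supplied either as one fully qualified identifier or as a short ID together with the companion flags (the project, location, keyring, and key names are hypothetical):
-
$ gcloud dataproc clusters create example-cluster \
--gce-pd-kms-key projects/my-project/locations/global/keyRings/my-keyring/cryptoKeys/my-key
-
$ gcloud dataproc clusters create example-cluster \
--gce-pd-kms-key my-key \
--gce-pd-kms-key-project my-project \
--gce-pd-kms-key-location global \
--gce-pd-kms-key-keyring my-keyring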
-
-
At most one of these may be specified:
-
- --image=IMAGE
-
The full custom image URI or the custom image name that will be used to create a
cluster.
- --image-version=VERSION
-
The image version to use for the cluster. Defaults to the latest version.
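For example, to pin the cluster to a specific image version instead of the latest (the version shown is illustrative; valid values are listed in the Cloud Dataproc versioning documentation):
-
$ gcloud dataproc clusters create example-cluster \
--image-version 1.3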
-
-
At most one of these may be specified:
-
- --network=NETWORK
-
The Compute Engine network that the VM instances of the cluster will be part of.
This is mutually exclusive with --subnet. If neither is specified, this
defaults to the "default" network.
- --subnet=SUBNET
-
Specifies the subnet that the cluster will be part of. This is mutually exclusive
with --network.
-
-
At most one of these may be specified:
-
- --single-node
-
Create a single node cluster.
A single node cluster has all master and worker components. It cannot have any separate worker nodes. If this flag is not specified, a cluster with separate workers is created.
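For example, a small single-node cluster for experimentation might be created with (the cluster name and machine type are illustrative):
-
$ gcloud dataproc clusters create sandbox-cluster \
--single-node \
--master-machine-type n1-standard-4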
-
Multi-node cluster flags
-
- --num-preemptible-workers=NUM_PREEMPTIBLE_WORKERS
-
The number of preemptible worker nodes in the cluster.
- --num-workers=NUM_WORKERS
-
The number of worker nodes in the cluster. Defaults to server-specified.
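For example, to create a cluster with two standard workers and four additional preemptible workers (the cluster name is hypothetical):
-
$ gcloud dataproc clusters create example-cluster \
--num-workers 2 \
--num-preemptible-workers 4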
-
-
GCLOUD WIDE FLAGS
These flags are available to all commands: --account, --configuration, --flags-file, --flatten, --format, --help, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity. Run $ gcloud help for details.
EXAMPLES
To create a cluster, run:
- $ gcloud dataproc clusters create my-cluster
NOTES
These variants are also available:
- $ gcloud alpha dataproc clusters create
- $ gcloud beta dataproc clusters create