gcloud_ml-engine_jobs_submit_training (1)
NAME
- gcloud ml-engine jobs submit training - submit a Cloud Machine Learning training job
SYNOPSIS
-
gcloud ml-engine jobs submit training JOB --module-name=MODULE_NAME [--config=CONFIG] [--job-dir=JOB_DIR] [--labels=[KEY=VALUE,...]] [--package-path=PACKAGE_PATH] [--packages=[PACKAGE,...]] [--python-version=PYTHON_VERSION] [--region=REGION] [--runtime-version=RUNTIME_VERSION] [--scale-tier=SCALE_TIER] [--staging-bucket=STAGING_BUCKET] [--async | --stream-logs] [GCLOUD_WIDE_FLAG ...] [-- USER_ARGS ...]
DESCRIPTION
This creates temporary files and executes Python code staged by a user on Google Cloud Storage. Model code can either be specified with a path, e.g.:
-
$ gcloud ml-engine jobs submit training my_job \
    --module-name trainer.task \
    --staging-bucket gs://my-bucket \
    --package-path /my/code/path/trainer \
    --packages additional-dep1.tar.gz,dep2.whl
Or by specifying an already built package:
-
$ gcloud ml-engine jobs submit training my_job \
    --module-name trainer.task \
    --staging-bucket gs://my-bucket \
    --packages trainer-0.0.1.tar.gz,additional-dep1.tar.gz,dep2.whl
If --package-path=/my/code/path/trainer is specified and there is a setup.py file at /my/code/path/setup.py, the setup file will be invoked with sdist and the generated tar files will be uploaded to Cloud Storage. Otherwise, a temporary setup.py file will be generated for the build.
By default, this command runs asynchronously; it exits once the job is successfully submitted.
To follow the progress of your job, pass the --stream-logs flag. Note that even with the --stream-logs flag, the job continues to run after this command exits; to stop it, cancel it with gcloud ml-engine jobs cancel JOB_ID.
For more information, see: cloud.google.com/ml/docs/concepts/training-overview
POSITIONAL ARGUMENTS
-
- JOB
-
Name of the job.
- [-- USER_ARGS ...]
-
Additional user arguments to be forwarded to user code.
The '--' argument must be specified between gcloud-specific args on the left and USER_ARGS on the right.
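As a rough sketch of the '--' separator in use, the wrapper below places everything after the job name on the right-hand side of '--' so that it is forwarded to user code. The function name submit_with_user_args and the trainer flags --train-steps and --learning-rate are hypothetical; the remaining values are borrowed from the examples in DESCRIPTION.

```shell
# Hypothetical wrapper: submit a job and forward extra arguments to user code.
# $1 is the job name; all remaining arguments are placed after "--" so that
# gcloud forwards them to the trainer.task module instead of parsing them.
submit_with_user_args() {
    job_name=$1
    shift
    gcloud ml-engine jobs submit training "$job_name" \
        --module-name trainer.task \
        --staging-bucket gs://my-bucket \
        --package-path /my/code/path/trainer \
        -- "$@"
}

# Example call (the trainer flags are assumptions, not gcloud flags):
#   submit_with_user_args my_job --train-steps 1000 --learning-rate 0.01
```

Keeping the gcloud flags inside the function and the user arguments in "$@" makes it harder to put a trainer flag on the wrong side of the separator by accident.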
REQUIRED FLAGS
-
- --module-name=MODULE_NAME
-
Name of the module to run.
OPTIONAL FLAGS
-
- --config=CONFIG
-
Path to the job configuration file. This file should be a YAML document (JSON
also accepted) containing a Job resource as defined in the API (all fields are
optional): cloud.google.com/ml/reference/rest/v1/projects.jobs
EXAMPLES:
JSON:
-
{
  "jobId": "my_job",
  "labels": {
    "type": "prod",
    "owner": "alice"
  },
  "trainingInput": {
    "scaleTier": "BASIC",
    "packageUris": [
      "gs://my/package/path"
    ],
    "region": "us-east1"
  }
}
YAML:
-
jobId: my_job
labels:
  type: prod
  owner: alice
trainingInput:
  scaleTier: BASIC
  packageUris:
  - gs://my/package/path
  region: us-east1
-
If an option is specified both in the configuration file and via command-line
arguments, the command-line arguments override the configuration file.
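The precedence rule can be illustrated with a small sketch: the configuration file sets a region, and the command line sets a different one. The helper name submit_with_region_override and the region us-central1 are illustrative assumptions; the other values come from the examples in this page.

```shell
# Write a minimal job configuration (YAML) to a temporary file.
config_file=$(mktemp)
cat > "$config_file" <<'EOF'
trainingInput:
  scaleTier: BASIC
  region: us-east1
EOF

# Hypothetical helper: submit with the config file while overriding its region.
# Because command-line arguments override the configuration file, the job is
# submitted with --region us-central1 rather than trainingInput.region.
submit_with_region_override() {
    gcloud ml-engine jobs submit training my_job \
        --module-name trainer.task \
        --staging-bucket gs://my-bucket \
        --packages trainer-0.0.1.tar.gz \
        --config "$config_file" \
        --region us-central1
}
```

Calling submit_with_region_override submits the job; the scale tier still comes from the file because no --scale-tier flag is passed.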
-
- --job-dir=JOB_DIR
-
Google Cloud Storage path in which to store training outputs and other data
needed for training.
This path will be passed to your TensorFlow program as the --job_dir command-line arg. The benefit of specifying this field is that Cloud ML Engine will validate the path for use in training.
If packages must be uploaded and --staging-bucket is not provided, this path will be used instead.
- --labels=[KEY=VALUE,...]
-
List of label KEY=VALUE pairs to add.
Keys must start with a lowercase character and contain only hyphens (-), underscores (_), lowercase characters, and numbers. Values must contain only hyphens (-), underscores (_), lowercase characters, and numbers.
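A local pre-flight check can catch malformed labels before a job is submitted. The sketch below mirrors the rules above; the function names are hypothetical, and it is a simplification that accepts only ASCII lowercase letters and treats an empty value as valid, neither of which is stated by this page.

```shell
# Hypothetical pre-flight checks that mirror the label rules above.
# Simplification: only ASCII lowercase letters are accepted.
lower=abcdefghijklmnopqrstuvwxyz
allowed="${lower}0123456789_-"

# Succeeds if $1 starts with a lowercase letter and uses only allowed characters.
is_valid_label_key() {
    case $1 in
        [$lower]*) ;;                  # first character is a lowercase letter
        *) return 1 ;;                 # empty, or starts with something else
    esac
    case $1 in
        *[!$allowed]*) return 1 ;;     # contains a disallowed character
    esac
    return 0
}

# Succeeds if $1 uses only allowed characters (an empty value passes here).
is_valid_label_value() {
    case $1 in
        *[!$allowed]*) return 1 ;;
    esac
    return 0
}

# Example:
#   is_valid_label_key owner && is_valid_label_value alice && echo "label ok"
```

The letters are spelled out instead of written as a-z so that the check does not depend on the locale's collation order.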
- --package-path=PACKAGE_PATH
-
Path to a Python package to build. This should point to a directory containing
the Python source for the job. It will be built using setuptools (which
must be installed) using its parent directory as context. If the parent
directory contains a setup.py file, the build will use that; otherwise, it
will use a simple built-in one.
- --packages=[PACKAGE,...]
-
Path to Python archives used for training. These can be local paths (absolute or
relative), in which case they will be uploaded to the Cloud Storage bucket given
by --staging-bucket, or Cloud Storage URLs
(gs://bucket-name/path/to/package.tar.gz).
- --python-version=PYTHON_VERSION
-
Version of Python used during training. If not set, the default version is 2.7.
Python 3.5 is available when --runtime-version is set to 1.4 or above.
Python 2.7 works with all supported runtime versions.
- --region=REGION
-
Region of the machine learning training job to submit. If not specified, you may
be prompted to select a region.
To avoid prompting when this flag is omitted, you can set the compute/region property:
- $ gcloud config set compute/region REGION
A list of regions can be fetched by running:
- $ gcloud compute regions list
To unset the property, run:
- $ gcloud config unset compute/region
Alternatively, the region can be stored in the environment variable CLOUDSDK_COMPUTE_REGION.
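As a sketch of the environment-variable route, the helper below sets CLOUDSDK_COMPUTE_REGION for a single invocation instead of changing the stored compute/region property. The function name submit_in_region is hypothetical, and the flag values are taken from the examples in DESCRIPTION.

```shell
# Hypothetical helper: submit a job in the region given as $1.
# The variable assignment prefixes the gcloud command, so the region applies
# to that invocation and the stored compute/region property is not modified.
submit_in_region() {
    region=$1
    job_name=$2
    CLOUDSDK_COMPUTE_REGION=$region gcloud ml-engine jobs submit training "$job_name" \
        --module-name trainer.task \
        --staging-bucket gs://my-bucket \
        --packages trainer-0.0.1.tar.gz
}

# Example call:
#   submit_in_region us-east1 my_job
```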
- --runtime-version=RUNTIME_VERSION
-
Google Cloud ML Engine runtime version for this job. Defaults to a stable
version, which is defined in documentation along with the list of supported
versions:
cloud.google.com/ml-engine/docs/tensorflow/runtime-version-list
- --scale-tier=SCALE_TIER
-
Specifies the machine types and the number of replicas for workers and parameter
servers. SCALE_TIER must be one of:
-
- basic
- Single worker instance. This tier is suitable for learning how to use Cloud ML Engine, and for experimenting with new models using small datasets.
- basic-gpu
- Single worker instance with a GPU.
- basic-tpu
- Single worker instance with a Cloud TPU.
- custom
-
CUSTOM tier is not a set tier, but rather enables you to use your own cluster
specification. When you use this tier, set values to configure your processing
cluster according to these guidelines (using the --config flag):
-
- *
- You must set TrainingInput.masterType to specify the type of machine to use for your master node. This is the only required setting.
- *
- You may set TrainingInput.workerCount to specify the number of workers to use. If you specify one or more workers, you must also set TrainingInput.workerType to specify the type of machine to use for your worker nodes.
- *
- You may set TrainingInput.parameterServerCount to specify the number of parameter servers to use. If you specify one or more parameter servers, you must also set TrainingInput.parameterServerType to specify the type of machine to use for your parameter servers.
Note that all of your workers must use the same machine type, which can be different from your parameter server type and master type. Your parameter servers must likewise use the same machine type, which can be different from your worker type and master type.
-
- premium-1
- Large number of workers with many parameter servers.
- standard-1
- Many workers and a few parameter servers.
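For the custom tier described above, a cluster specification might be supplied through --config roughly as follows. This is a sketch under assumptions: the helper name submit_custom_tier is hypothetical, and the machine type names and replica counts are illustrative values that should be checked against the machine types available to your project.

```shell
# Write a custom cluster specification (YAML) to a temporary file.
# masterType is the only required setting; each count is paired with the
# matching machine type, as the guidelines above require.
# The machine type names and counts below are illustrative assumptions.
custom_config=$(mktemp)
cat > "$custom_config" <<'EOF'
trainingInput:
  masterType: complex_model_m
  workerCount: 4
  workerType: standard
  parameterServerCount: 2
  parameterServerType: large_model
EOF

# Hypothetical helper: submit a job that uses the custom tier with this file.
submit_custom_tier() {
    gcloud ml-engine jobs submit training my_job \
        --module-name trainer.task \
        --staging-bucket gs://my-bucket \
        --packages trainer-0.0.1.tar.gz \
        --scale-tier custom \
        --config "$custom_config"
}
```

All workers share one machine type and all parameter servers share another, which is why the file carries a single workerType and a single parameterServerType.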
-
-
- --staging-bucket=STAGING_BUCKET
-
Bucket in which to stage training archives.
Required only if a file upload is necessary (that is, other flags include local paths) and no other flags implicitly specify an upload path.
-
At most one of these may be specified:
-
- --async
-
(DEPRECATED) Display information about the operation in progress without waiting
for the operation to complete. Enabled by default and can be omitted; use
--stream-logs to run synchronously.
- --stream-logs
-
Block until job completion and stream the logs while the job runs.
Note that even if command execution is halted, the job will still run until cancelled with
- $ gcloud ml-engine jobs cancel JOB_ID
-
GCLOUD WIDE FLAGS
These flags are available to all commands: --account, --configuration, --flags-file, --flatten, --format, --help, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity. Run $ gcloud help for details.
NOTES
These variants are also available:
- $ gcloud alpha ml-engine jobs submit training
- $ gcloud beta ml-engine jobs submit training