gcloud_ml-engine_local_train (1)
NAME
gcloud ml-engine local train - run a Cloud ML Engine training job locally
SYNOPSIS
gcloud ml-engine local train --module-name=MODULE_NAME [--distributed] [--job-dir=JOB_DIR] [--package-path=PACKAGE_PATH] [--parameter-server-count=PARAMETER_SERVER_COUNT] [--start-port=START_PORT; default=27182] [--worker-count=WORKER_COUNT] [GCLOUD_WIDE_FLAG ...] [-- USER_ARGS ...]
DESCRIPTION
This command runs the specified module in an environment similar to that of a live Cloud ML Engine training job. This is especially useful when testing distributed models, because it lets you validate that your code interacts properly with the Cloud ML Engine cluster configuration. If your model expects a specific number of parameter servers or workers (that is, you expect to use the CUSTOM machine type), use the --parameter-server-count and --worker-count flags to specify the desired cluster configuration, just as you would in your cloud training job configuration:
$ gcloud ml-engine local train --module-name trainer.task \
--package-path /path/to/my/code/trainer \
--distributed \
--parameter-server-count 4 \
--worker-count 8
Unlike when submitting a training job, --package-path can be omitted here; in that case, the current working directory is used.
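As an illustration, the following sketch creates a minimal directory that --package-path (or, when the flag is omitted, the current working directory) could point at. It assumes the module name trainer.task from the example above; the scratch location and the contents of task.py are hypothetical placeholders. Because the parent directory contains no setup.py, the command would fall back to its simple built-in one.

```bash
#!/usr/bin/env bash
# Hypothetical layout for a package named "trainer" whose entry module is trainer.task.
set -euo pipefail

# Scratch parent directory; --package-path would point at "$root/trainer".
root="$(mktemp -d)"
mkdir -p "$root/trainer"

# __init__.py marks "trainer" as a Python package.
touch "$root/trainer/__init__.py"

# task.py is the module named by --module-name trainer.task (placeholder body).
cat > "$root/trainer/task.py" <<'EOF'
import sys

if __name__ == "__main__":
    # Echo whatever USER_ARGS were forwarded after "--".
    print("trainer.task received:", sys.argv[1:])
EOF

echo "package path: $root/trainer"
```

From the same shell session you could then run gcloud ml-engine local train --module-name trainer.task --package-path "$root/trainer".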
POSITIONAL ARGUMENTS
- [-- USER_ARGS ...]
Additional user arguments to be forwarded to user code. Any relative paths are
resolved relative to the parent directory of --package-path.
The '--' argument must be specified between gcloud-specific arguments on the left and USER_ARGS on the right.
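The separator rule can be seen in a dry-run sketch that assembles the command line without executing it. The flags --train-steps and --data, and the file data/train.csv, are hypothetical arguments that the user code would have to parse itself; gcloud only forwards them. Because --package-path is ./trainer here, its parent directory is the current directory, so data/train.csv would be resolved from there.

```bash
#!/usr/bin/env bash
# Dry run: build the argument vector instead of invoking gcloud.
set -euo pipefail

# gcloud-specific flags go to the left of "--".
gcloud_args=(ml-engine local train --module-name trainer.task --package-path ./trainer)

# Everything to the right of "--" is forwarded to the user code.
# --train-steps and --data are hypothetical flags parsed by trainer.task itself.
user_args=(--train-steps 1000 --data data/train.csv)

cmd=(gcloud "${gcloud_args[@]}" -- "${user_args[@]}")

# Print the command that would run; executing "${cmd[@]}" instead would run it.
printf '%q ' "${cmd[@]}"
printf '\n'
```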
REQUIRED FLAGS
- --module-name=MODULE_NAME
Name of the module to run.
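MODULE_NAME is a dotted Python module name rather than a file path. The helper below, module_to_path, is a hypothetical convenience (not part of gcloud) that shows how a name such as trainer.task maps to a file relative to the parent directory of --package-path, assuming ordinary Python module resolution as used by python -m.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: map a dotted module name to its source file,
# relative to the package's parent directory (plain Python module resolution).
module_to_path() {
  local module="$1"
  # Replace every "." with "/" and append the .py suffix.
  printf '%s.py\n' "${module//.//}"
}

module_to_path trainer.task   # prints trainer/task.py
```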
OPTIONAL FLAGS
- --distributed
Runs the provided code in distributed mode by providing cluster configurations
as environment variables to subprocesses.
- --job-dir=JOB_DIR
Google Cloud Storage path or local directory in which to store training outputs
and other data needed for training.
This path will be passed to your TensorFlow program as the --job_dir command-line arg. The benefit of specifying this field is that Cloud ML Engine will validate the path for use in training.
- --package-path=PACKAGE_PATH
Path to a Python package to build. This should point to a directory containing
the Python source for the job. The package is built with setuptools (which
must be installed), using its parent directory as the build context. If the
parent directory contains a setup.py file, the build uses it; otherwise, a
simple built-in one is used.
- --parameter-server-count=PARAMETER_SERVER_COUNT
Number of parameter servers with which to run. Ignored if --distributed is not
specified. Default: 2
- --start-port=START_PORT; default=27182
Start of the range of ports reserved by the local cluster. This command uses
a contiguous block of ports whose size is --parameter-server-count +
--worker-count + 1.
If --distributed is not specified, this flag is ignored.
- --worker-count=WORKER_COUNT
Number of workers with which to run. Ignored if --distributed is not
specified. Default: 2
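The port arithmetic described under --start-port can be checked with a short sketch. port_range is a hypothetical helper, not part of gcloud; its defaults mirror the documented values (start port 27182, two parameter servers, two workers), and it only computes the range rather than reserving anything.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the contiguous port block used with --distributed.
# Block size = parameter-server-count + worker-count + 1.
port_range() {
  local start="${1:-27182}" ps="${2:-2}" workers="${3:-2}"
  local size=$(( ps + workers + 1 ))
  printf '%d-%d\n' "$start" "$(( start + size - 1 ))"
}

port_range              # prints 27182-27186 (documented defaults: 5 ports)
port_range 30000 4 8    # prints 30000-30012 (4 + 8 + 1 = 13 ports)
```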
GCLOUD WIDE FLAGS
These flags are available to all commands: --account, --configuration, --flags-file, --flatten, --format, --help, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity. Run $ gcloud help for details.
NOTES
These variants are also available:
- $ gcloud alpha ml-engine local train
- $ gcloud beta ml-engine local train