pocketsphinx_batch (1)
NAME
pocketsphinx_batch - Run speech recognition in batch mode
SYNOPSIS
pocketsphinx_batch -hmm hmmdir -dict dictfile [ options ]...
DESCRIPTION
Run speech recognition over a list of utterances in batch mode. A list of arguments follows:
- -adchdr
- Size of audio file header in bytes (headers are ignored)
- -adcin
- Input is raw audio data
- -agc
- Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')
- -agcthresh
- Initial threshold for automatic gain control
- -allphone
- Perform phoneme decoding with phonetic lm
- -allphone_ci
- Perform phoneme decoding with phonetic lm and context-independent units only
- -alpha
- Preemphasis parameter
- -argfile
- Argument file giving extra arguments.
- -ascale
- Inverse of acoustic model scale for confidence score calculation
- -aw
- Inverse weight applied to acoustic scores.
- -backtrace
- Print results and backtraces to log file.
- -beam
- Beam width applied to every frame in Viterbi search (smaller values mean wider beam)
- -bestpath
- Run bestpath (Dijkstra) search over word lattice (3rd pass)
- -bestpathlw
- Language model probability weight for bestpath search
- -build_outdirs
- Create missing subdirectories in output directory
- -cepdir
- Input files directory (prefixed to filespecs in control file)
- -cepext
- Input files extension (suffixed to filespecs in control file)
- -ceplen
- Number of components in the input feature vector
- -cmn
- Cepstral mean normalization scheme ('current', 'prior', or 'none')
- -cmninit
- Initial values (comma-separated) for cepstral mean when 'prior' is used
- -compallsen
- Compute all senone scores in every frame (can be faster when there are many senones)
- -ctl
- Control file listing utterances to be processed
- -ctlcount
- No. of utterances to be processed (after skipping -ctloffset entries)
- -ctlincr
- Do every Nth line in the control file
- -ctloffset
- No. of utterances at the beginning of -ctl file to be skipped
- -ctm
- Recognition output in CTM file format (may require post-sorting)
- -debug
- Verbosity level for debugging messages
- -dict
- Main pronunciation dictionary (lexicon) input file
- -dictcase
- Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only)
- -dither
- Add 1/2-bit noise
- -doublebw
- Use double bandwidth filters (same center freq)
- -ds
- Frame GMM computation downsampling ratio
- -fdict
- Noise word pronunciation dictionary input file
- -feat
- Feature stream type, depends on the acoustic model
- -featparams
- File containing feature extraction parameters.
- -fillprob
- Filler word transition probability
- -frate
- Frame rate
- -fsg
- Sphinx format finite state grammar file
- -fsgctl
- Control file listing FSG file to use for each utterance
- -fsgdir
- Input directory for FSG files
- -fsgext
- Input filename extension for FSG files (including leading dot)
- -fsgusealtpron
- Add alternate pronunciations to FSG
- -fsgusefiller
- Insert filler words at each state.
- -fwdflat
- Run forward flat-lexicon search over word lattice (2nd pass)
- -fwdflatbeam
- Beam width applied to every frame in second-pass flat search
- -fwdflatefwid
- Minimum number of end frames for a word to be searched in fwdflat search
- -fwdflatlw
- Language model probability weight for flat lexicon (2nd pass) decoding
- -fwdflatsfwin
- Window of frames in lattice to search for successor words in fwdflat search
- -fwdflatwbeam
- Beam width applied to word exits in second-pass flat search
- -fwdtree
- Run forward lexicon-tree search (1st pass)
- -hmm
- Directory containing acoustic model files.
- -hyp
- Recognition output file name
- -hypseg
- Recognition output with segmentation file name
- -input_endian
- Endianness of input data, big or little, ignored if NIST or MS Wav
- -jsgf
- JSGF grammar file
- -keyphrase
- Keyphrase to spot
- -kws
- A file with keyphrases to spot, one per line
- -kws_delay
- Delay to wait for best detection score
- -kws_plp
- Phone loop probability for keyword spotting
- -kws_threshold
- Threshold for p(hyp)/p(alternatives) ratio
- -latsize
- Initial backpointer table size
- -lda
- File containing transformation matrix to be applied to features (single-stream features only)
- -ldadim
- Dimensionality of output of feature transformation (0 to use entire matrix)
- -lifter
- Length of sin-curve for liftering, or 0 for no liftering.
- -lm
- Word trigram language model input file
- -lmctl
- Specify a set of language models
The -hmm and -dict arguments are always required. Either -lm or -fsg is required, depending on whether you are using a statistical language model or a finite-state grammar. For batch-mode recognition, you must also specify a control file with -ctl. This is a simple text file containing one entry per line. Each entry is the name of an input file relative to the -cepdir directory, without the filename extension (which is given by the -cepext argument).
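As a minimal sketch of the setup described above, the following Python snippet writes a control file from a list of audio paths and assembles a matching argument list. The helper names (write_control_file, build_command), all paths, and the .wav extension are illustrative assumptions and not part of pocketsphinx. Only the option names come from the argument list, and passing "yes" to -adcin assumes the usual yes/no convention for boolean options.

```python
import os
import tempfile


def write_control_file(audio_paths, cepdir, cepext, ctl_path):
    """Write one entry per line: path relative to cepdir, extension removed."""
    entries = []
    for path in audio_paths:
        rel = os.path.relpath(path, cepdir)
        if not rel.endswith(cepext):
            raise ValueError(f"{path} does not end with {cepext!r}")
        # Strip the extension; it is supplied separately through -cepext.
        entries.append(rel[: len(rel) - len(cepext)])
    with open(ctl_path, "w", encoding="utf-8") as ctl:
        ctl.write("\n".join(entries) + "\n")
    return entries


def build_command(hmmdir, dictfile, lmfile, ctl_path, cepdir, cepext, hyp_path):
    """Assemble an argument list for a statistical-language-model run."""
    return [
        "pocketsphinx_batch",
        "-hmm", hmmdir,      # always required
        "-dict", dictfile,   # always required
        "-lm", lmfile,       # use -fsg instead for a finite-state grammar
        "-ctl", ctl_path,
        "-cepdir", cepdir,
        "-cepext", cepext,
        "-adcin", "yes",     # assumption: input is raw audio, not feature files
        "-hyp", hyp_path,
    ]


if __name__ == "__main__":
    # Hypothetical file names, used only to show the shape of the output.
    with tempfile.TemporaryDirectory() as tmp:
        ctl_path = os.path.join(tmp, "test.fileids")
        write_control_file(
            [os.path.join("wav", "a", "utt1.wav"), os.path.join("wav", "utt2.wav")],
            "wav", ".wav", ctl_path,
        )
        print(" ".join(build_command(
            "model/hmm", "model/words.dict", "model/words.lm",
            ctl_path, "wav", ".wav", "test.hyp",
        )))
```

The argument list can be passed to subprocess.run once the model, dictionary, and audio files actually exist. The sketch only builds it and never invokes the decoder.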
If you are using acoustic feature files as input (see sphinx_fe(1) for information on how to generate these), you can also specify a subpart of a file, using the following format:
- FILENAME START-FRAME END-FRAME UTTERANCE-ID
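To make the entry format concrete, here is a small Python sketch that parses one control-file line into its fields. The names CtlEntry and parse_ctl_line are hypothetical. The sketch accepts only the two shapes described here (a bare file name, or all four fields), which may be stricter than what pocketsphinx_batch itself accepts.

```python
from typing import NamedTuple, Optional


class CtlEntry(NamedTuple):
    filename: str
    start_frame: Optional[int]
    end_frame: Optional[int]
    utt_id: Optional[str]


def parse_ctl_line(line: str) -> CtlEntry:
    """Parse 'FILENAME' or 'FILENAME START-FRAME END-FRAME UTTERANCE-ID'."""
    fields = line.split()
    if len(fields) == 1:
        # Whole-file entry: no frame range and no explicit utterance ID.
        return CtlEntry(fields[0], None, None, None)
    if len(fields) == 4:
        name, start, end, utt_id = fields
        return CtlEntry(name, int(start), int(end), utt_id)
    raise ValueError(f"unsupported control-file entry: {line!r}")


if __name__ == "__main__":
    # Hypothetical entries, for illustration only.
    print(parse_ctl_line("meeting01 0 250 meeting01_seg1"))
    print(parse_ctl_line("meeting02"))
```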