Julius Book
Recognition Engine
Julius
rev. 3.2
(2001/12/03)
Julius
Julius is high-performance, continuous speech recognition software based on word N-grams. It
can perform sentence-level recognition with vocabularies in the tens of thousands of words.
Julius achieves high-speed speech recognition on a typical desktop PC, running at near real
time with a recognition rate above 90% on a 20,000-word vocabulary dictation task.
The best feature of the Julius system is that it is multipurpose: by recombining the pronunciation
dictionary, language model, and acoustic model, one can build various task-specific systems. The
Julius code is also open source, so the system can be recompiled for other platforms or altered
to fit one's specific needs.
Platforms currently supported include Linux, Solaris and other versions of Unix, and Windows.
There are two Windows versions, a Microsoft SAPI 5.0 compatible version and a Windows DLL
version.
This documentation relates to the Unix version of Julius. For documentation on the Windows
versions see the Julius for SAPI README (CSRC CD-ROM) (Japanese), or the Julius for SAPI
homepage (Kyoto University) (Japanese).
Contacts/Links
Developers
System Structure and Features
System Structure
The structure of the Julius speech recognition system is shown in the diagram below:
Main Features
Rev.3.2 New Features
All new features in version 3.2 are off by default. If you do not explicitly enable these options at run time, Julius behaves as in previous revisions.
In the first pass the input is segmented by short pauses; the second pass sequentially decodes these
segments and slots the results in appropriately. In the first pass, when a short pause (sp) has the
maximum likelihood at a certain point in time, a break is placed at that point and the second pass is
executed on that utterance segment. When this occurs, word constraints are preserved, as the context
within an utterance segment may continue over to the next utterance.
Using this feature, one input file containing multiple sentences can be decoded, so it is not necessary
to pre-process long speech files into sentence-length utterances; files of any length can be used.
*At this time this option cannot be used with microphone input, only with input files (speech files,
feature parameter files).
This option is by default off. Use the compile time option "--enable-sp-segment" to enable it.
A method for the high-speed calculation of Gaussian mixture likelihoods has been included. For each
frame, monophone likelihoods are calculated first, and triphone likelihoods are then calculated only
for the states with high monophone likelihoods.
The default is off. The startup option "-gshmm" can be used to select the monophone model to use for
mixture selection. "-gsnum" is used to set the number of mixtures to test (From all monophone states only
the best N are selected). The monophone model used for GMS is created from a conventional
monophone model using the attached tool mkgshmm.
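As a sketch, the GMS workflow might look like the following (the file names monophone.hmmdefs and main.jconf are placeholders, and -gsnum 24 is an illustrative value, not a documented default):

```shell
# Convert a conventional monophone model into a GMS model (hypothetical paths).
mkgshmm monophone.hmmdefs > monophone.gshmm

# Run Julius with GMS enabled, testing the best 24 monophone states per frame.
julius -C main.jconf -gshmm monophone.gshmm -gsnum 24
```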
Transparent word processing (Julius)
Transparent word processing has been implemented for N-gram calculations. Specified
transparent words can be skipped within a contextual frame. Transparent words can be defined within the
recognition dictionary by placing "{}" rather than "[]" in the second column. These words will be treated as
filler words.
From Julian version 3.2 it is possible to include a word insertion penalty. For the first pass use the
"-penalty1" option and use "-penalty2" for the second pass.
Files can undergo the same speech segmentation preprocessing that is used for microphone
input. Use the startup option "-pausesegment".
You can alter the segmentation parameters using the options below.
-lv: Input threshold level (0-32767)
-zc: Zero crossing threshold (Counts per second)
-headmargin: Length of silence at the start of the file (msec)
-tailmargin: Length of silence at the end of the file (msec)
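For example, the segmentation parameters might be combined as follows (the jconf name and all values here are illustrative, not recommended settings):

```shell
# File input with pause segmentation, a lower trigger level, a relaxed
# zero-crossing threshold, and longer silence margins (illustrative values).
julius -C main.jconf -pausesegment -lv 2000 -zc 50 -headmargin 400 -tailmargin 400
```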
The speech input module uses threaded processing by default.
For microphone input, if the input is less than 50 frames (0.5 sec), CMN will not be performed.
Sampling rates other than 16kHz may be used (including microphone input).
High and low pass filter cutoff frequencies can be set.
Acoustic analysis parameters can be set.
-smpFreq: Sample frequency (Hz)
-smpPeriod: Sample Period (ns)
-fsize: Frame size (Number of samples)
-fshift: Frame shift (Number of Samples)
-hipass: High pass cutoff frequency (Hz)
-lopass: Low pass cutoff frequency (Hz)
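As a hedged example, the analysis parameters for a hypothetical 12kHz acoustic model might be set like this (all values must match the model's training conditions; these numbers are placeholders only):

```shell
# 12kHz input: 25ms frames (300 samples), 10ms shift (120 samples),
# band-limited between 250Hz and 5500Hz (illustrative values only).
julius -C main.jconf -smpFreq 12000 -fsize 300 -fshift 120 -hipass 250 -lopass 5500
```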
Other
To perform recognition on a number of files a filelist listing the names of the files can be used.
Julius can then be run with the option "-filelist filename".
In the case that there are errors within the recognition dictionary, the system can be forced to
ignore these and perform recognition with the "-forcedict" option.
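A minimal batch-recognition session might look like this (file names are placeholders):

```shell
# List the audio files to recognize, one per line (hypothetical names).
cat > files.txt <<EOF
data/sample1.wav
data/sample2.wav
EOF

# Recognize them all in one run, ignoring any dictionary errors.
julius -C main.jconf -filelist files.txt -forcedict
```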
For PTM/triphone models, it is no longer necessary to define monophones and biphones in the
HMMList. For interword triphones, when the relevant triphones are not defined in HMMList, the
likelihoods of all triphones that match the current context are calculated and those with the
maximum likelihood are used.
Various small bug fixes and improvements in implementation (especially in the speech input and
acoustic likelihood calculation areas).
Platform
OS
The Unix version of Julius/Julian Rev.3.2 will run on the following platforms.
Rev.3.1 was tested on the following platforms. Rev.3.2 should also run on these platforms with no
problems.
PC Linux 2.2.x
FreeBSD 3.2-RELEASE
Sun Solaris 2.5.1
Sun Solaris 2.6
SGI IRIX 6.3
DEC Digital UNIX 3.2
Digital UNIX 4.0
Sun SunOS 4.1.3
Machine Spec
For microphone input a soundcard that can record at 16bits and the appropriate driver are required.
Installation
Here an explanation of the installation procedure for the Unix version of Julius is given. For the
Windows version please refer to either the Julius for SAPI README (CSRC CD-ROM) (Japanese) or the
Julius for SAPI homepage (Kyoto University) (Japanese).
Quick Start
1. Preparation
By default Julius can only load RAW and WAV (no compression) audio files. However the libsndfile
library allows Julius to load various other audio format files, including AIFF, AU, NIST, SND, and
WAV(ADPCM) formats. At configuration time Julius automatically searches for the libsndfile library, which
should be installed in advance.
(If required)
Install the libsndfile library.
Next extract the Julius source package into the appropriate directory.
2. configure
The configure script gathers information on the OS type, CPU endianness, C compiler, required
libraries, microphone support, and other settings, and then automatically adapts the Makefile. First run
the configure script.
% ./configure
It is possible to set various options at configuration time. By default Julius is compiled with the optimal
settings with respect to speed. Other settings can be used as shown below.
% ./configure --enable-setup=standard
The default compiler is "gcc" and the default compiler options are "-O2 -g". Under Linux the default
options are "-O6 -fomit-frame-pointer". The compiler settings can be changed by setting the
appropriate environment variables, "CC" and "CFLAGS", before running the configure script.
The header search path and other preprocessor settings are set in the "CPPFLAGS" environment variable.
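For instance, to build with a different compiler and flags (sh/bash syntax; the compiler name and paths are examples to adapt to your own environment):

```shell
# Override the compiler, optimization flags, and header search path
# before running configure (example values).
CC=cc CFLAGS="-O2" CPPFLAGS="-I/usr/local/include" ./configure
```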
3. Compile
% make
This builds the julius, mkbingram, adinrec, adintool, and mkgshmm executables under their respective
directories.
4. Install
% make install
This installs the executables in /usr/local/bin/ and the online manuals in /usr/local/man/.
Documentation
Tutorial
An explanation of the basic functioning of Julius.
Microphone Input
An explanation of microphone input. OS specific setup, input level control etc.
Compile Time Options
An in depth explanation of compile time options.
File Types and Restrictions
Format specifications of models and audio files supported by Julius.
Triphones and HMMList
An explanation of physical and logical triphones and the format of the HMMList file.
Tutorial -from initialization to recognition-
This tutorial describes the basic functioning of the Unix version of Julius. Here file preparation, system
startup, and recognition are explained.
1. Preparation
2. Audio file recognition
3. Microphone Input recognition
1. Preparation
When context dependent acoustic HMM models are used (triphone, PTM etc.) the following file is also
required.
For detailed information on each of the file types refer to File Formats and Restrictions.
Simple acoustic and language models have been included in the following distributions:
This tutorial assumes that the user is using one of the CD-ROM distributions above.
The Run-time Configuration File (jconf)
The model files, recognition parameters, and input source can be set in the "jconf" configuration file. The
source archive contains a sample configuration file "Sample.jconf". Copy this file to the appropriate
directory and edit it to fit your requirements.
As described above users who are using a CD-ROM distribution with acoustic and language models
should use the "jconf" file provided on that CD-ROM.
It is assumed that the audio format is 16bit WAV (no compression) or RAW (big endian) format. The
sampling rate depends on the analysis conditions used to create the acoustic model, but generally
it is 16kHz.
Execute Julius from the command line. Use the "-C" option to select the "jconf" file to be used.
% julius -C jconf_filename
Each of the option statements within the jconf file is treated as a command line option. One can
override these by placing an option after the "-C jconf_filename" setting. For example, if we wish to use
audio files as the system input and this has not been defined in the jconf file, we can set this on the
command line.
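For example (assuming the jconf file is named main.jconf; the option here switches the input source to raw audio files):

```shell
# Options given after "-C" override the jconf file's settings.
julius -C main.jconf -input rawfile
```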
Once initialization has completed, the prompt below is displayed and the system waits for a response.
enter filename->
Enter a filename and Julius will perform recognition on that file.
The recognition process takes place in two passes. First 2-gram frame synchronous recognition is
performed on the input. An example of the output of this first pass is shown below.
input speechfile: ./test.raw
97995 samples (6.12 sec.)
### speech analysis (waveform -> MFCC)
length: 610 frames (6.10 sec.)
attach MFCC_E_D_Z->MFCC_E_N_D_Z
### Recognition: 1st pass (LR beam with 2-gram)
.................
pass1_best: SPECULATION IN TOKYO WAS AT THE N. COULD RISE BECAUSE
OF THE REALIGNMENT
pass1_best_wordseq: <s> SPECULATION IN TOKYO WAS AT THE N. COULD RISE
BECAUSE OF THE REALIGNMENT </s>
pass1_best_phonemeseq: silB | s p E k y x l e S I n | I n | t o k y o l w a s | @ t | D i |
E n | k U d | r Y z | b x k A z | A v I D i | r i x l Y n m I n t | silE
pass1_best_score: -14745.729492
pass1_best: First pass best hypothesis word sequence (interim result).
pass1_best_wordseq: Same as above but as a N-gram sequence.
pass1_best_phonemeseq: Best hypothesis phoneme sequence ("|" separates words)
pass1_best_score: The hypothesis score (Log-likelihood)
After the first pass finishes, the second pass is performed and a final recognition result is displayed.
The 2nd pass uses the interim results from the first pass and searches these results using a 3-gram stack
decoding technique.
The line beginning with "sentence1:" is the final recognition result. Once recognition is complete, the system returns to the filename
prompt.
The format of the recognition output can be altered. Using the "-progout" option at run time causes the
first pass interim results to be shown progressively, while the "-quiet" option gives the simple output shown
below.
Direct microphone input can also be recognized. Use the run-time option "-input mic" to use microphone
input.
After initialization the prompt below will appear and the system will wait for input.
When one starts to speak into the microphone, first pass recognition processing will begin. When one
stops speaking the system will switch to the second pass, the finalized recognition result will be displayed
and the system will return to the above prompt.
Note that the first utterance after startup will not be recognized properly.
If recognition starts prematurely due to environmental noise, or conversely if you speak and the
system does not begin recognition, adjust the microphone input level appropriately.
Microphone Input
Here, environment settings, restrictions, and other points of caution relating to microphone input with
Julius/Julian-3.2 are explained.
At present the only feature extraction method available within Julius/Julian is MFCC_E_D_N_Z
(the same features used in the IPA acoustic models). All
acoustic models that are to be used with Julius's direct microphone input must be in this format.
Acoustic models that use different feature formats, e.g. LPC, cannot be used with microphone input.
Care should also be taken when dealing with other formats (including HTK parameter files).
For each OS, details and warnings are as follows.
Linux
To specifically set the sound driver use the "--with-mictype=TYPE" option with configure. (TYPE can
be set to "oss" or "alsa").
Please note that at present, whether speech data is captured adequately depends largely on the
chipset and software driver being used. Commonly used "Sound Blaster Pro compatible" sound cards
have been found not to work with the Julius system. It has also been found, especially on laptop PCs,
that the quality of the sound card's 16bit capture capability is often very poor. In such cases
Julius's direct microphone input may not function adequately.
Under Linux, Julius does not perform sound mixing internally; an external tool such as xmixer should be
used to select the input device and set the input device volume.
Related Links:
Linux Sound-HOWTO
ALSA
OSS/Linux
Sun Solaris
The default device path is "/dev/audio". You can change the device path using the AUDIODEV variable
before compiling Julius.
After startup has completed, the audio input device will be automatically switched to the microphone
and the input volume will be set to 14.
Sun SunOS
Julius has been tested with SunOS 4.1.3. It is necessary to use the <multimedia/*> headers at compile
time.
The default device path is "/dev/audio". You can change the device path using the AUDIODEV variable
before compiling Julius.
After startup has completed, the audio input device will be automatically switched to the microphone
and the input volume will be set to 20.
SGI IRIX
Julius has been tested with IRIX 6.3. (It is also very likely to run under 5.x systems.)
After startup has completed, the audio input will be automatically set to microphone input, however the
input volume will not be set automatically. Use the 'apanel' command to set the volume manually.
FreeBSD
Julius has been tested with FreeBSD Release 3.2 using the 'snd' driver. Use the compile option
"--with-mictype=oss".
As audio mixing is not performed, an external tool must be used to select the input device (MIC/LINE)
and set the input volume.
Sound card and driver problems are similar to those in the Linux system, see the Linux section above.
It is not necessary to use any special configuration options at compile time to allow microphone input. The
"configure" script automatically detects the OS type used and includes the necessary libraries. You
should however check the final configure messages as shown below to make sure the appropriate sound
driver was detected correctly.
In the case that automatic detection fails, set the microphone driver type using "--with-mictype=TYPE",
where TYPE is one of oss, alsa, freebsd, sol2, sun4, or irix.
From version 3.1p1, the OSS API is used by default even if the installed driver is ALSA; in that case
Julius uses the driver's OSS emulation. The native ALSA API (beta version) can be selected with the
configuration option "--with-mictype=alsa".
Here microphone recognition with Julius is explained. In the case that an error occurs during startup
please refer to sections 5 and 6 below.
Microphone input can be selected at startup using the "-input mic" option. If you do this then after
initialization a prompt like that below will appear and the system will wait for a voice trigger. (Utterances
that occur before the prompt appears will be disregarded).
After confirmation of the prompt, face the microphone and speak. The microphone should be held about
15 cm from the mouth and speech should be as clear as possible.
Once the input is greater than a certain level, Julius begins first pass processing. Analysis proceeds
in parallel with the input. At the next long pause the first pass analysis is stopped, and the system
switches over to the second pass search. The system then outputs the final result and waits for the next
speech utterance. This process is then repeated.
! WARNING !
When using microphone input, the first utterance cannot be recognized properly.
Real time processing uses the previous input to calculate the current CMN parameters, thus for the
first input utterance CMN cannot be performed.
For the first utterance say "Mic Test" or any arbitrary utterance, and that input will be used to update
the CMN parameters. Normal recognition will be performed from the second utterance.
If the "-demo" option has been set at startup, then during the first pass real-time interim results will be
displayed.
5. Recording volume and trigger level adjustment
When recognition is not performing adequately, it may be necessary to reset the volume or speech
detection level.
When altering these settings, first set the microphone input volume, and then adjust the speech
detection level. (It is often best to use an external sound/mixing tool to check that the volume is
set correctly.)
If sensitivity is low and voice detection is not occurring correctly, or conversely if the voice
detection trigger is too sensitive to external noise, the trigger level should be altered. The speech
detection level is set at startup using the "-lv" option. The value ranges from 0 to 32767 (unsigned
short), with a default of 3000. Increase the speech detection level to decrease the sensitivity of the
trigger, and decrease the level to increase the sensitivity of the trigger.
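For instance (main.jconf is a placeholder; the levels shown are illustrative starting points, not recommendations):

```shell
# Noisy room: raise the trigger level to make detection less sensitive.
julius -C main.jconf -input mic -lv 5000

# Quiet speaker: lower the trigger level to make detection more sensitive.
julius -C main.jconf -input mic -lv 1500
```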
From Rev.2.2 a program that captures a single utterance from the microphone, adinrec, is included.
Using this, the quality of the captured speech can be checked.
% ./adinrec/adinrec myfile
When used as shown above, adinrec will record one utterance from the microphone to the file 'myfile'. As
'adinrec' uses the same voice capture routines as Julius, the quality of the recorded file will be the same
as that used for recognition.
If no options are set, the recorded file will be headerless 16kHz, monaural, signed 16bit big endian.
It can be played back as shown below (using the sox or aplay utilities):
spwave
wavesurfer
snd
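As a sketch, a record-and-playback check might look like this (option syntax varies between sox versions; the aplay line assumes ALSA is available):

```shell
# Capture one utterance with adinrec.
./adinrec/adinrec myfile

# Play the headerless file back with sox's "play" (16kHz, mono,
# signed 16bit, big endian; modern sox option syntax).
play -t raw -r 16000 -e signed -b 16 -B -c 1 myfile

# Or with ALSA's aplay.
aplay -t raw -f S16_BE -r 16000 -c 1 myfile
```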
Compile Time Options
Options that can be used with "configure" when compiling Julius-3.2 are given below.
"--enable-setup=..." can be used to set the recognition algorithm to one of the three presets below.
You can see what the current Julius executable algorithm settings are by using the "-version" option at
start up, as shown below.
% julius -version
###### booting up system
Julius rev.3.2 (fast)
built on jale at Sun Aug 12 20:58:44 JST 2001
- compiler: gcc -O6 -fomit-frame-pointer
- options: UNIGRAM_FACTORING LOWMEM2 PASS1_IWCD SCAN_BEAM
GPRUNE_DEFAULT_BEAM
1) Standard --enable-setup=standard
2) Fast --enable-setup=fast (default)
This setup gives the best balance between speed and precision. The Gaussian beam
pruning technique is used. No specified time limit is set.
The differences between the three setup options above are shown in the following table,
where an O indicates that the option is enabled.
By default Julius performs high-speed approximate calculations using 1-gram factoring and 1-best
approximation. Thus the first pass contains approximation errors and its results
are not optimal. (However, these errors are usually recovered in the second pass.)
In some cases only a 2-gram model may be available, or greater emphasis on the first pass precision
may be required. In these cases the settings below can be used.
With these settings, 2-gram factoring, interword triphone calculation, and word-pair approximation rather
than 1-best approximation are performed. These changes increase the computational cost
considerably, but first pass precision improves.
Option Details
* At this time this option can only be used with file based
input. Microphone input cannot use this option.
--enable-words-int Change the word ID type from unsigned short to int. This
increases the memory required by Julius, but enables a
vocabulary of greater than 65535 words to be used. At
present, system performance when using this option is not
guaranteed.
File Types and Restrictions
The file formats that Julius can use are given below.
(All files can also be read from their compressed gzip (.gz) state)
Details of the usage and format of each file type are given below.
Julius can only load HMM definition files written in HTK's HMM definition language. The
system automatically detects and loads the following three types of HMM acoustic models: monophone,
triphone, and tied-mixture based HMMs.
When using triphone models, an HMMList file (described below) is required in order to convert the
pronunciations in the dictionary to triphones. It lists the mapping between all triphones that may
occur in the dictionary and the physical triphone models.
Julius does not implement all the HMM features of HTK, only a subset.
Special care needs to be taken with state transitions.
Format
(More detailed information on each item below is given in section 7.9 of "The HTK Book for HTK V2.0")
Mixture Component weights and Stream weight vectors are optional. The variance vector is by
default diagonal. InvCover, LLTCover, XForm, and FULL keywords cannot be used. Duration
parameter vectors cannot be used.
If the HMM definition file does not meet the restrictions above, Julius will output an error message and
exit.
Size restrictions
There is no limit to the number of HMM definitions, conditional probabilities, or macros that can be used
within one definition file. The system is only limited by memory size.
From version 3.0 tied-mixture based models are supported. As well as conventional tied-mixture
models, using a single codebook, phonetic tied-mixture models with multiple phonetic codebooks may be
used. As with HTK a codebook may have any number of definitions. However normal mixture distribution
definitions cannot be used.
1. state-driven
Calculations for monophones and shared-transition triphones. Output likelihood calculations are
performed by state, and the cache also stores information by state.
2. mixture-driven
Calculations are performed by codebook (Gaussian distribution sets). For each frame, the
likelihoods of all Gaussian distributions within each codebook are calculated first, and all
distribution likelihoods for each codebook are cached. HMM state output probabilities are then
calculated and weighted using the cached data of the corresponding codebooks.
The acoustic model is deemed to be tied-mixture based if the <Tmix> directive is present within
hmmdefs; in that case Julius uses the mixture-driven calculation method, otherwise the state-driven
method is used. Acoustic models must therefore use the <Tmix> directive for Julius to perform
tied-mixture calculations. The <Tmix> directive is used in the same way as in HTK; refer to the
corresponding section of "The HTK Book" for more details on the <Tmix> directive.
The HMMList file is a dictionary of phoneme declarations of the form "e-i+q". It describes the mapping
between phonetic triphone names and the actual HMM definitions in hmmdefs. This file is necessary
when using triphone HMMs.
For information on the format of the HMMList file refer to Triphones and HMMList.
Format
When reading ARPA format files it is necessary to place the unknown language category (<UNK>) as
the first entry of the 1-grams. (In the case that the model has been built using the CMU SLM Tool Kit the
above restriction will be satisfied).
Words in the dictionary that do not appear in the N-gram model use the probability of the unknown
word category (<UNK>).
When 2-gram and reverse 3-gram models are loaded, if a reverse 3-gram tuple does not match any
2-gram context, that tuple will not be used; a warning message will appear and the language model
will continue to load.
Size restrictions
The vocabulary is limited to 65535 words.
If the "--enable-words-int" option is used at compile time, the limit is extended to 2^31 words. Note
that at present there is no guarantee that Julius will work on all systems when this option is
used.
Caution is needed when using this option, as all binary N-grams compiled under the original
configuration must be re-compiled.
Binary N-grams are created from 2-gram, and reverse 3-gram files (both in ARPA format). The utility
"mkbingram" is used to do this.
By pre-compiling the language model, the startup time of the system is greatly reduced. Also the runtime
size of the system is decreased.
Note that Julius's binary N-gram format is not compatible with the CMU SLM toolkit's binary N-gram format (.binlm).
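A typical conversion might look like this (file names are placeholders; the "-d" option loads the resulting binary N-gram):

```shell
# Compile ARPA 2-gram and reverse 3-gram files into one binary N-gram.
mkbingram 2gram.arpa rev3gram.arpa lm.bingram

# Load the binary N-gram at startup instead of the ARPA files.
julius -C main.jconf -d lm.bingram
```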
4. Word dictionary files (-v)
Julius's word dictionary format is very similar to HTK's format. The difference is that the second field
(Output Symbol) is not optional.
Format
Field one (word name)
It is necessary to use the same Japanese encoding system in the dictionary file and the word N-gram
language files so the appropriate entries can be matched.
Words that do not have N-gram entries are matched to the <UNK> N-gram. The probability used is the
<UNK> probability, corrected over all words that lack N-gram entries.
Field two (Output symbol)
This is sent to the output as the recognition result. The value must be surrounded by brackets
[ ]. If the symbol is [], nothing will be output.
Field three (HMM phoneme sequence)
The HMM phoneme sequence is described using monophones. (When using triphone HMMs,
intra-word context-dependent forms are automatically created when the dictionary is loaded.)
[Example]
(It is not necessary for words to be sorted alphabetically.)
ABANDONMENT [ABANDONMENT] xb@ndInmInt
ABBAS [ABBAS] @bxs
ABBAS [ABBAS] @bxz
ABBEY [ABBEY] @bi
ABBOTT [ABBOTT] @bxt
ABBOUND [ABBOUND] xbud
ABIDE [ABIDE] xbYd
ABILITIES [ABILITIES] xbIlItiz
ABILITY [ABILITY] xbIlIti
ABLAZE [ABLAZE] xblez
Size Limit
The vocabulary is limited to 65535 words. However, if the "--enable-words-int" option is used at
configuration time, the dictionary can be extended to 2^31 words. At present performance is not
guaranteed when using this option.
5. Microphone Input (-input mic)
The default maximum sample length is 20 seconds (320k samples). To increase this length increase the
size of MAXSPEECHLEN in the include/sent/speech.h header file and re-compile the system.
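As a sketch, the limit could be doubled to 40 seconds like this (assumes GNU sed, a 16kHz sampling rate, and the stated define name; you can equally edit the header by hand):

```shell
# Raise the input limit from 320k samples (20 sec) to 640k samples (40 sec).
sed -i 's/#define MAXSPEECHLEN .*/#define MAXSPEECHLEN 640000/' include/sent/speech.h

# Rebuild Julius so the new limit takes effect.
make clean && make
```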
At this point in time, Julius can only extract MFCC_E_D_N_Z feature parameters internally. Care should
be taken, as acoustic models based on any feature extraction method other than MFCC_E_D_N_Z cannot
be used with microphone input.
On loading, Julius automatically detects the two file types given below.
If the libsndfile library has been included at compile time then the file formats below may also be used for
file input. (Refer to the libsndfile documentation for further information) All data must be in 16kHz, 16bit
format. (Sound data cannot be resampled within Julius)
The libsndfile library can be downloaded from
http://www.zip.com.au/~erikd/libsndfile
Due to the nature of the Julius search algorithm, excessively long input will cause instability in the
second search pass. It is therefore desirable to punctuate sentences with short soundless pauses.
At this time Julius can only internally extract MFCC_E_D_N_Z feature parameters. Acoustic models
based on other features cannot be used for speech waveform file recognition; an external parameter
extraction tool must be used. This is described in more detail below.
Format
Feature parameter formats (base, kind, qualifier) can be used for recognition. However it is necessary
that the HMM acoustic model being used was trained using the same feature format. If any of the
necessary parameters are not present then an error will occur.
If for some reason the HMM type checking does not function properly, you can use the -notypecheck
option to turn off type checking.
It is necessary that the feature parameter format matches the original HMM training data feature format.
However if all the necessary parameters for the HMM are held within the given feature parameter file,
Julius will then automatically extract the appropriate parameters for recognition.
For example
If the parameter format below is used for training
MFCC_E_D_N_Z = MFCC(12)+ ∆MFCC(12)+ ∆Pow(1) (CMN) 25-dimension
Then for recognition you can also use feature parameter files other than MFCC_E_D_N_Z, such as
or
MFCC_E_D_A_Z = MFCC(12)+Pow(1)+ ∆MFCC(12)+ ∆Pow(1)
+ ∆∆MFCC(12) + ∆∆Pow(1) (CMN) 39-dimension
The parameter file needs to contain all of the parameters used for the original training of the HMM model;
extra data contained within the file will not be used.
If an exceptionally long sentence is input, the error below will occur.
The maximum sentence length can be extended by increasing the MAXSEQNUM definition in the
include/sent/speech.h header file.
Triphones and HMMList
This document describes how Julius handles context dependent phonemes models (triphones) and the
HMMList file.
When Julius is given a context dependent phoneme model (triphone), triphone declarations are created
from the phoneme declarations in the recognition dictionary, and these are mapped to the corresponding
HMMs.
To generate a triphone declaration from monophones, a phone "X" that follows a phone "L" and precedes
a phone "R" is declared in the form "L-X+R". Below is an example of the conversion of the word
"TRANSLATE" to triphone declarations.
TRANSLATE [TRANSLATE] t r @ n s l e t
|
TRANSLATE [TRANSLATE] t+r t-r+@ r-@+n @-n+s n-s+l s-l+e l-e+t e-t
In Julius phoneme declarations like this are created from the recognition dictionary and called "logical
triphones". The actual HMM names defined in hmmdefs are called "physical triphones".
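The conversion rule above can be sketched with a small awk script (the phoneme string is taken from the example; this is an illustration of the rule, not code used by Julius itself):

```shell
# Turn a monophone sequence into logical triphone declarations:
# each phone X with left neighbor L and right neighbor R becomes L-X+R.
echo "t r @ n s l e t" | awk '{
  for (i = 1; i <= NF; i++) {
    t = $i
    if (i > 1)  t = $(i-1) "-" t   # prepend left context
    if (i < NF) t = t "+" $(i+1)   # append right context
    printf "%s%s", t, (i < NF ? " " : "\n")
  }
}'
```

This prints the same triphone sequence shown in the "TRANSLATE" example above.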
2. HMMList File
The mapping between the logical triphones and physical triphones are specified in the HMMList file. The
HMMList file gives the mappings between all possible triphones and the HMM's that are defined in
hmmdefs. Below are details of the format.
1. Only one mapping for a logical triphone can be made per line.
2. The first column contains the logical triphone, the 2nd column defines the corresponding HMM
name in hmmdefs.
3. If the triphone uses the same name as that defined within hmmdefs then the 2nd column is empty.
4. All logical triphones that may occur must be defined.
5. If a triphone is mapped to itself then an error will occur.
Below is an example. Entries that have an empty 2nd column show that the triphone name relates to an
HMM that is directly defined within hmmdefs.
a-k
a-k+a
a-k+a: a-k+a
a-k+e
a-k+e: a-k+e
a-k+i
a-k+i: a-k+i
a-k+o
a-k+o: a-k+o
a-k+u
a-k+u: a-k+u
...
The actual mapping the system uses can be checked with the Julius run-time option "-check". After
initialization finishes, an input prompt will appear; enter a logical triphone name and information
relating to that triphone will be displayed.
Julius uses a different type of phoneme context in each pass. In the first pass, among the
intra-word triphones that match the current context, those with the maximum likelihoods are used. For
example, for the end boundary of the word "TRANSLATE", the likelihoods of the HMMs "e-t+a", "e-t+e",
"e-t+u" etc. are calculated, and those with the maximum likelihood are assigned.
In the second pass more precise inter-word contexts are calculated. When the following word is
expanded from a hypothesis, the search for the next model takes context into consideration.
4. Warnings When Creating HMMList Files
The assignments in the HMMList file override the HMM definitions in hmmdefs. In
other words, if a definition name in hmmdefs is the same as one used in a mapping in
HMMList, the mapping takes priority. For example, if hmmdefs contains the definition:
~h "s-I+z"
and HMMList contains the mapping:
s-I+z z-I+z
then the HMM "s-I+z" will never be used; "z-I+z" will be used in its place.
Man Pages
Online manuals:
NAME
SYNOPSIS
DESCRIPTION
Model Usage
Acoustic Models
Acoustic HMMs (Hidden Markov Models) are used.
Phoneme models (monophones), context-dependent
phoneme models (triphones), tied-mixture and
phonetic tied-mixture models can be used. When
using context-dependent models, inter-word
context is taken into consideration. Files
written in HTK's HMM definition language can be
used.
Language Model
The system uses 2-gram and reverse 3-gram
language models. Standard-format ARPA files
are used. Binary-format N-gram models built
with the bundled tool mkbingram can also be
used.
Speech Input
Search Algorithm
OPTIONS
The options below select the models used and set
system parameters. You can set these options on the
command line; however, it is recommended that you
collect them in a jconf settings file and load it at
run time with the "-C" option.
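For example, a jconf file might collect the model options like this (all file names here are placeholders, not files shipped with Julius):

```
# sample.jconf -- model and input settings (hypothetical file names)
-h     hmmdefs              # acoustic model (HTK format)
-hlist logicalTri           # HMMList mapping file
-nlr   word.2gram.arpa      # forward 2-gram (ARPA format)
-nrl   word.rev3gram.arpa   # reverse 3-gram (ARPA format)
-v     word.dict            # word dictionary
-input mic
```

Julius is then started with "% julius -C sample.jconf".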
Speech Input
-input {rawfile|mfcfile|mic|netaudio|adinserv}
Select the speech wave data input source.
(default: mfcfile)
For information on file formats refer to the Julius
documentation.
-NA server:unit
When using (-input netaudio) set the server name
and unit ID of the Datlink unit to connect to.
-filelist file
With (-input rawfile|mfcfile) perform
recognition on all files contained within the target
filelist.
-adport portnum
With (-input adinserv), the A-D server port number.
Speech segmentation
-pausesegment
-nopausesegment
Force speech segmentation (segment detection) ON / OFF.
(For mic/adinnet default = ON. For files, default = OFF)
-lv threslevel
Amplitude threshold (0 - 32767). When the amplitude
rises above this threshold it is treated as the
beginning of a speech segment; when it falls below,
as the end of the segment.
(default: 3000)
-headmargin msec
Margin at the start of the speech segment (msec).
(default: 300)
-tailmargin msec
Margin at the end of the speech segment (msec).
(default: 400)
-zc zerocrossnum
Zero crossing threshold. (default: 60)
-nostrip
Do not strip invalid "0" samples that some sound
devices insert at the start and end of recording.
The default is to remove them automatically.
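The interplay of the -lv and -zc thresholds can be pictured with a small sketch (illustrative only; the real detector works on fixed-length windows of input):

```python
# Sketch: count zero crossings whose amplitude also passes the
# -lv threshold; input is treated as speech when the count per
# window exceeds the -zc threshold.
def count_triggering_crossings(samples, level=3000):
    count = 0
    for prev, cur in zip(samples, samples[1:]):
        if (prev < 0) != (cur < 0) and abs(cur) >= level:
            count += 1
    return count

loud = [4000, -4000] * 40   # loud alternating signal
quiet = [100, -100] * 40    # low-level noise, same crossing rate
print(count_triggering_crossings(loud) > 60)   # True
print(count_triggering_crossings(quiet) > 60)  # False
```

Note how a signal that crosses zero often but stays quiet never triggers: both thresholds must be passed.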
Acoustic Analysis
-smpFreq frequency
Sampling frequency (Hz).
(default: 16000)
-smpPeriod period
Sampling period, in units of 100 ns as in HTK.
(default: 625 = 16 kHz)
-fsize sample
Analysis window size (in samples).
(default: 400 = 25 ms)
-fshift sample
Frame shift (in samples). (default: 160 = 10 ms)
-hipass frequency
Highpass filter cutoff frequency (Hz).
(default: -1 = disable)
-lopass frequency
Lowpass filter cutoff frequency (Hz)
(default: -1 = disable)
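The acoustic-analysis defaults are mutually consistent; a quick check of the arithmetic (assuming HTK's convention that the sampling period is expressed in 100 ns units):

```python
# Sketch: relate sample counts to durations at a given rate.
rate_hz = 16000
fsize, fshift = 400, 160
print(fsize / rate_hz * 1000)   # 25.0  (window length, ms)
print(fshift / rate_hz * 1000)  # 10.0  (frame shift, ms)
# HTK-style sampling period in 100 ns units:
print(1e7 / rate_hz)            # 625.0 (value for -smpPeriod)
```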
Language Model(N-gram)
-nlr 2gram_filename
2-gram language model filename (standard ARPA format)
-nrl rev_3gram_filename
Reverse 3-gram language model filename. This is
required for the second search pass. If this is
not defined then only the first pass will take
place.
-d bingram_filename
Use a binary language model built with
mkbingram(1). This is used in place of the "-nlr"
and "-nrl" options above, and allows Julius to
initialize rapidly.
-transp float
Insertion penalty for transparent words.
(default: 0.0)
Word Dictionary
-v dictionary_file
Word Dictionary File (Required)
-silhead {WORD|WORD[OUTSYM]|#num}
-siltail {WORD|WORD[OUTSYM]|#num}
Sentence start and end silence as defined in the
word dictionary.
(default: "<s>" / "</s>")
Example
Word_name <s>
Word_name[output_symbol] <s>[silB]
#Word_ID #14
-forcedict
Disregard dictionary errors.
(Skip word definitions with errors)
Acoustic Model(HMM)
-h hmmfilename
The name of the HMM definition file to use.
(Required)
-hlist HMMlistfilename
HMMList filename. Required when using triphone
based HMMS. Details are contained in the Julius
documentation.
This file provides a mapping between the logical
triphone names generated from the phonetic
representation in the dictionary and the HMM
definition names.
-force_ccd / -no_ccd
When using a triphone acoustic model, these options
control inter-word context dependency. If neither
option is set, the use of inter-word context
dependency is determined from the model definition
names.
If the "-force_ccd" option is set with something
other than a triphone model, there is no guarantee
that Julius will run.
-notypecheck
Do not check the input parameter type.
(default: Perform the check)
-iwcd1 {max|avg}
When using a triphone acoustic model set the
interword acoustic likelihood calculation method
used in the first pass.
max: The maximum identical context triphone value (default)
avg: The average identical context triphone value
-gprune {safe|heuristic|beam|none}
Set the Gaussian pruning technique to use.
(default: safe for the standard version, beam for
the high-speed version)
-gshmm hmmdefs
Set the Gaussian Mixture Selection monophone acoustic
model to use. A GMS monophone model is generated
from an ordinary monophone HMM model using the
attached program mkgshmm(1).
(default : none (do not use GMS))
-gsnum N
When using GMS, only perform triphone calculations
for the top N monophone states. (default: 24)
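The effect of -gsnum can be sketched as follows (illustrative only; state names and scores here are made up):

```python
import heapq

# Sketch of Gaussian Mixture Selection: cheap monophone state
# scores are computed first, and full triphone calculation is
# done only for states among the top N; the rest fall back to
# an approximate score.
def select_states(monophone_scores, n=24):
    # monophone_scores: {state_name: log_likelihood} for one frame
    return set(heapq.nlargest(n, monophone_scores,
                              key=monophone_scores.get))

scores = {"a.s2": -10.0, "k.s2": -35.0, "i.s2": -12.0, "u.s2": -40.0}
print(sorted(select_states(scores, n=2)))  # ['a.s2', 'i.s2']
```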
Short pause segmentation
-spdur
Set the short-pause (sp) threshold length used in
the first pass (in frames). If the number of frames
for which the sp unit has the maximum likelihood
exceeds this threshold, the first pass is
interrupted there and the second pass starts on
that segment. (default: 10)
-sepnum N
(Used with the configure option "--enable-lowmem2")
Number of high frequency words to separate from the
dictionary tree. (default: 150)
-1pass
Only perform the first pass search. This mode is
set automatically when no reverse 3-gram language
model has been specified (-nrl).
-realtime
-norealtime
Explicitly select whether real-time processing is
used in the first pass. For file input the default
is OFF (-norealtime); for microphone or NetAudio
network input the default is ON (-realtime). This
option affects how CMN is performed: when OFF, CMN
is calculated for each input independently; when
ON, the previous 5 seconds of input are always
used. See also -progout.
-n candidate_num
The search continues until "candidate_num" sentence
hypotheses have been found. These hypotheses are
re-sorted by score and the final result is displayed
(see the "-output" option). As Julius does not
strictly guarantee an optimal second-pass search,
the maximum-likelihood candidate is not always
found first.
-output N
Used with the "-n" option above. Output the top N
sentence hypotheses. (default: 1)
-sb score
Score envelope width. For each frame, do not scan
hypotheses whose score deviates from the highest
score by more than this envelope. This directly
affects the speed of the second-pass acoustic
likelihood calculations. (default: 80.0)
-s stack_size
The maximum number of hypotheses that can be stored
on the stack during the search. A larger value
gives more stable results but increases the amount
of memory required. (default: 500)
-m overflow_pop_times
Number of hypothesis expansions after which the
search is discontinued. If the number of expanded
hypotheses exceeds this threshold, the search is
abandoned at that point. The larger this value,
the longer the search can continue, but processing
time for search failures also increases.
(default: 2000)
-lookuprange nframe
When performing word expansion, the number of
frames before and after the current frame in which
to look for words to expand. This prevents the
omission of short words, but with a large value the
number of expanded hypotheses increases and the
system becomes slow. (default: 5)
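The -s option can be pictured as a bounded, score-ordered hypothesis stack (an illustrative sketch, not the actual decoder; the drop-worst overflow policy shown is an assumption):

```python
import heapq

# Sketch: a hypothesis stack capped at "-s stack_size" entries.
class HypoStack:
    def __init__(self, limit=500):
        self.limit = limit
        self.heap = []          # min-heap of (score, hypothesis)

    def push(self, score, hypo):
        heapq.heappush(self.heap, (score, hypo))
        if len(self.heap) > self.limit:
            heapq.heappop(self.heap)  # overflow: drop the worst

    def best(self):
        return max(self.heap)   # highest-scoring hypothesis

s = HypoStack(limit=2)
for score, hypo in [(-120.0, "a"), (-80.0, "a b"), (-200.0, "c")]:
    s.push(score, hypo)
print(s.best())  # (-80.0, 'a b')
```

With a larger limit fewer hypotheses are discarded, which is why a bigger -s value gives more stable results at the cost of memory.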
Forced alignment
-walign
Return the result of Viterbi alignment of the word
units from the recognition results.
-palign
Return the result of Viterbi alignment of the
phoneme units from the recognition results.
Message Output
-separatescore
Output the language and acoustic scores separately.
-progout
Gradually output the interim results from the
first pass at regular intervals.
-proginterval msec
Set the -progout output time interval (msec).
Other
-debug Display debug information.
-C jconffile
Load the jconf settings file. Run-time options set
in this file are applied.
-version
Display program name, compile time, and compile
time options.
-help
Display a brief overview of options.
EXAMPLES
SEE ALSO
DIAGNOSTICS
BUGS
AUTHORS
Rev.1.0 (1998/02/20)
Designed by Tatsuya Kawahara and Akinobu Lee
(Kyoto University)
Development by Akinobu Lee (Kyoto University)
Rev.1.1 (1998/04/14)
Rev.1.2 (1998/10/31)
Rev.2.0 (1999/02/20)
Rev.2.1 (1999/04/20)
Rev.2.2 (1999/10/04)
Rev.3.0 (2000/02/14)
Rev.3.1 (2000/05/11)
Development by Akinobu Lee (Kyoto University)
Rev.3.2 (2001/08/15)
Development mainly by Akinobu Lee
(Nara Institute of Science and Technology)
THANKS TO
NAME
SYNOPSIS
DESCRIPTION
OPTIONS
-freq frequency
Sampling frequency in Hz. (default: 16000)
-lv threslevel
Amplitude threshold (0 - 32767). When the amplitude
rises above this threshold it is treated as the
beginning of a speech segment; when it falls below,
as the end of the segment.
(default: 3000)
-zc zerocrossnum
Zero crossing threshold. (default: 60)
-margin msec
Margin to place at the beginning and end of the
speech segment (msec). (default: 300)
-nostrip
Do not strip invalid "0" samples that some sound
devices insert at the start and end of recording.
The default is automatic removal.
SEE ALSO
BUGS
LICENSE
NAME
SYNOPSIS
DESCRIPTION
USAGE
SEE ALSO
julius(1)
BUGS
LICENSE
NAME
SYNOPSIS
DESCRIPTION
EXAMPLE
Warning: Make sure that the GMS model is created from the
same corpus as the triphone or PTM model to be used.
Otherwise the GMS model will be mismatched, selection
errors will occur, and performance will degrade.
SEE ALSO
julius(1)
BUGS
LICENSE
Bibliography
1. A.Lee, T.Kawahara, and K.Shikano. Julius -- an open source real-time large vocabulary
recognition engine. In Proc. EUROSPEECH, pp.1691--1694, 2001.
2. T.Kawahara, A.Lee, T.Kobayashi, K.Takeda, N.Minematsu, S.Sagayama, K.Itou, A.Ito,
M.Yamamoto, A.Yamada, T.Utsuro, and K.Shikano. Free software toolkit for Japanese large
vocabulary continuous speech recognition. In Proc. ICSLP, Vol.4, pp.476--479, 2000.
3. T.Kawahara, T.Kobayashi, K.Takeda, N.Minematsu, K.Itou, M.Yamamoto, A.Yamada, T.Utsuro,
and K.Shikano. Japanese dictation toolkit -- plug-and-play framework for speech recognition
R&D --. In Proc. IEEE workshop on Automatic Speech Recognition and Understanding,
pp.393--396, 1999.
4. T.Kawahara, T.Kobayashi, K.Takeda, N.Minematsu, K.Itou, M.Yamamoto, T.Utsuro, and
K.Shikano. Sharable software repository for Japanese large vocabulary continuous speech
recognition. In Proc. ICSLP, pp.3257--3260, 1998.
Julius/Julian Release History
'98
02/20 Julius Rev.1.0 release
04/14 Julius Rev.1.1 release
07/30 Julian Rev.1.0 release
10/24 Julius Rev.1.2 release
'99
02/20 Julius Rev.2.0 release
03/01 Julian Rev.2.0 release
04/20 Julius/Julian Rev.2.1 release
10/04 Julius/Julian Rev.2.2 release
2000
02/14 Julius/Julian Rev.3.0 release
05/11 Julius/Julian Rev.3.1 release
06/23 Julius/Julian Rev.3.1p1 release
2001
02/27 Julius/Julian Rev.3.1p2 release
06/22 Julius/Julian Rev 3.2(beta) release
08/18 Julius/Julian Rev 3.2 release
The Julius user Mailing List is currently only available in Japanese. Please refer to the
Japanese documentation for information about this list.
Contact and Links
Links
Contacts
Contact: [email protected]
System Developer: Akinobu Lee ([email protected])
Grammar Based Continuous Speech Parser
Julian
rev. 3.2
(2001/08/18)
Julian
Julian is a continuous speech recognition parser based on a finite state grammar.
The word sequence most likely given the speech input and the grammar is computed,
and the result is displayed.
A two-pass A* search is used. In the first pass a beam search takes place, using the
degrees of freedom allowed by the grammar. In the second pass the results of the
first pass are used as heuristics, and high-precision recognition is performed with
an A* search.
Except for the grammar rules, most parts of the system are the same as those used in Julius.
The usage, acoustic models that can be used, and speech input settings etc. are the same as
Julius (for the same revision).
Maximum word limit of 65,535 words
Changes between Rev.2.2 and Rev.3.1:
Julian, System Structure
System Structure
A finite state grammar language model and an HMM acoustic model are used.
The input speech is processed using a two-pass search.
1. First Pass: Category restricted frame synchronous beam search. (A high speed
approximate search)
2. Second Pass: Grammar based N-best stack decoding. (High Precision)
Operating Environment and Installation
OS
Machine Spec
For a task of several hundred words, real-time processing is possible with a
monophone model in a process size under 10 MB. For a 5000-word task using a PTM
model, a process size on the order of 30 MB is required.
NAME
SYNOPSIS
DESCRIPTION
Model Usage
Acoustic Models
Acoustic HMMs (Hidden Markov Models) are used.
Phoneme models (monophones), context-dependent
phoneme models (triphones), tied-mixture and
phonetic tied-mixture models can be used. When
using context-dependent models, inter-word
context is taken into consideration. Files
written in HTK's HMM definition language are
used.
Language Model
For the task grammar, sentence structures are
written in BNF style in a grammar file, using
word categories as terminal symbols. A voca
file registers the pronunciation (phoneme
sequence) of every word in each category.
These files are converted with mkdfa.pl(1) into a
deterministic finite automaton file (.dfa) and a
dictionary file (.dict).
Speech Input
Search Algorithm
the first and second passes. For tied-mixture and phonetic
tied-mixture models, high-speed acoustic likelihood
calculations using Gaussian pruning are performed.
OPTIONS
Speech Input
-input {rawfile|mfcfile|mic|netaudio|adinserv}
Select the speech wave data input source.
(default: mfcfile)
For information on file formats refer to the Julius
documentation.
-NA server:unit
When using (-input netaudio) set the server name
and unit ID of the DatLink unit to connect to.
-filelist file
With (-input rawfile|mfcfile) perform
recognition on all files contained within the target
filelist.
-adport portnum
With (-input adinserv), the A-D server port number.
Speech segmentation
-pausesegment
-nopausesegment
Force speech segmentation (segment detection) ON / OFF.
(For mic, adinnet default = ON. For files, default = OFF)
-lv threslevel
Amplitude threshold (0 - 32767). When the amplitude
rises above this threshold it is treated as the
beginning of a speech segment; when it falls below,
as the end of the segment.
(default: 3000)
-headmargin msec
Margin at the start of the speech segment (msec).
(default: 300)
-tailmargin msec
Margin at the end of the speech segment (msec).
(default: 400)
-zc zerocrossnum
Zero crossing threshold. (default: 60)
-nostrip
Do not strip invalid "0" samples that some sound
devices insert at the start and end of recording.
The default is to remove them automatically.
Acoustic Analysis
-smpFreq frequency
Sampling frequency (Hz).
(default: 16000)
-smpPeriod period
Sampling period, in units of 100 ns as in HTK.
(default: 625 = 16 kHz)
-fsize sample
Analysis window size (in samples).
(default: 400 = 25 ms)
-fshift sample
Frame shift (in samples). (default: 160 = 10 ms)
-hipass frequency
Highpass filter cutoff frequency (Hz).
(default: -1 = disable)
-lopass frequency
Lowpass filter cutoff frequency (Hz).
(default: -1 = disable)
-penalty1 float
First pass word insertion penalty. (default: 0.0)
-penalty2 float
Second pass word insertion penalty.
(default: 0.0)
Recognition Dictionary
-v dictionary_file
Recognition Dictionary File (Required).
-silhead {WORD|WORD[OUTSYM]|#num}
-siltail {WORD|WORD[OUTSYM]|#num}
Sentence start and end silence as defined in the
word dictionary.
(default: "<s>" / "</s>")
Example
Word_name <s>
Word_name[output_symbol] <s>[silB]
#Word_ID #14
-forcedict
Disregard dictionary errors.
(Skip word definitions with errors)
Acoustic Model(HMM)
-h hmmfilename
The name of the HMM definition file to use.
(Required)
-hlist HMMlistfilename
HMMList filename. Required when using triphone
based HMMs. Details are contained in the Julius
documentation.
This file provides a mapping between the logical
triphone names generated from the phonetic
representation in the dictionary and the actual HMM
definition names (physical triphones).
-force_ccd / -no_ccd
When using a triphone acoustic model, these options
control inter-word context dependency. If neither
option is set, the use of inter-word context
dependency is determined from the model definition
names.
If the "-force_ccd" option is set with something
other than a triphone model, there is no guarantee
that Julian will run.
-notypecheck
Do not check the input parameter type.
(default: perform the check)
-iwcd1 {max|avg}
When using a triphone acoustic model set the
interword acoustic likelihood calculation method
used in the first pass.
max: The maximum same context triphone value (default)
avg: The average same context triphone value
-gprune {safe|heuristic|beam|none}
Set the Gaussian pruning technique to use.
(default: safe for the standard version, beam for
the high-speed version)
-gshmm hmmdefs
Set the Gaussian Mixture Selection monophone
model to use. A GMS monophone model is generated
from an ordinary monophone HMM model using the
attached program mkgshmm(1).
(default : none (do not use GMS))
-gsnum N
When using GMS, only perform triphone calculations
for the top N monophone states. (default: 24)
-b beam_width
First pass beam width (number of nodes).
(default: 400 (monophone), 800 (triphone,PTM),
1000 (triphone,PTM,engine=v2.1))
-1pass
Only perform the first pass search.
-realtime
-norealtime
Explicitly select whether real-time processing is
used in the first pass. For file input the default
is OFF (-norealtime); for microphone or NetAudio
network input the default is ON (-realtime). This
option affects how CMN is performed: when OFF, CMN
is calculated for each input independently; when
ON, the previous 5 seconds of input are always
used. See also -progout.
-n candidate_num
The search continues until "candidate_num" sentence
hypotheses have been found. These hypotheses are
re-sorted by score and the final result is displayed
(see the "-output" option). As Julian does not
strictly guarantee an optimal second-pass search,
the maximum-likelihood candidate is not always
found first. A larger candidate number means a
prolonged search must be performed, so processing
time also becomes large. (default: 1)
-output N
Used with the "-n" option above. Output the top N
sentence hypotheses. (default: 1)
-sb score
Score envelope width. For each frame, do not scan
hypotheses whose score deviates from the highest
score by more than this envelope. This directly
affects the speed of the second-pass acoustic
likelihood calculations. (default: 80.0)
-s stack_size
The maximum number of hypotheses that can be stored
on the stack during the search. A larger value
gives more stable results but increases the amount
of memory required. (default: 500)
-m overflow_pop_times
Number of hypothesis expansions after which the
search is discontinued. If the number of expanded
hypotheses exceeds this threshold, the search is
abandoned at that point. The larger this value,
the longer the search can continue, but processing
time for search failures also increases.
(default: 2000)
-lookuprange nframe
When performing word expansion, the number of
frames before and after the current frame in which
to look for words to expand. This prevents the
omission of short words, but with a large value the
number of expanded hypotheses increases and the
system becomes slow. (default: 5)
Forced alignment
-walign
Return the result of Viterbi alignment of the word
units from the recognition results.
-palign
Return the result of Viterbi alignment of the
phoneme units from the recognition results.
Message Output
-quiet
Omit the phoneme sequence and score; output only
the best word sequence hypothesis.
-progout
Gradually output the interim results from the
first pass at regular intervals.
-proginterval msec
Set the -progout output time interval (msec).
Other
-debug Display debug information.
-C jconffile
Load the jconf settings file. Run-time options set
in this file are applied.
-version
Display program name, compile time, and compile
time options.
-help
Display a brief overview of options.
EXAMPLES
SEE ALSO
DIAGNOSTICS
BUGS
AUTHORS
Rev.1.0 (1998/07/20)
Designed by Tatsuya Kawahara and Akinobu Lee
(Kyoto University)
Rev.2.0 (1999/02/20)
Rev.2.1 (1999/04/20)
Rev.2.2 (1999/10/04)
Rev.3.1 (2000/05/11)
Development by Akinobu Lee (Kyoto University)
Rev.3.2 (2001/08/15)
Development mainly by Akinobu Lee
(Nara Institute of Science and Technology)
THANKS TO
Writing Grammar For Julian
Here is an outline of how to write a grammar for use with Julian. For detailed
information refer to the separately distributed "Julian Grammar Kit" toolkit.
Task Grammar
Julian differs from Julius in that the sentence structures and words to be recognized
are written explicitly. Julian performs a maximum-likelihood search using only the
degrees of freedom allowed by the grammar, and outputs the word sequence that best
matches the input utterance.
In most systems the sentence structure and word list are written for the specific task. Here we call this
"task grammar".
The task grammar is described in two files: a BNF-style grammar file that describes
sentence structure and category rules, and a voca file that registers the word
declarations and pronunciations (phoneme sequences) for each category.
The format above could in principle describe context-free grammars; however, Julian
can only handle regular grammars, and this constraint is checked automatically at
compile time. Also, only left recursion can be handled. As an example, consider a
grammar that describes the sentences below.
"Make it white"
"Change it to red"
"Make it red"
"Quit"
The grammar file below can be written to describe these sentences. The symbol S is
fixed as the sentence start symbol. NS_B, NS_E, and NOISE correspond to silences
(pauses) at the start of, the end of, and within a sentence.
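The grammar file itself is not reproduced in this copy of the document. A grammar consistent with the four sentences and with the rule count reported by mkdfa.pl (a reconstruction, so it may differ in detail from the distributed sample) is:

```
S        : NS_B SENT NS_E
SENT     : CHGCLR_S
SENT     : QUIT_S
CHGCLR_S : MAKE_V OBJECT COLOR_N
CHGCLR_S : CHANGE_V OBJECT TO COLOR_N
QUIT_S   : QUIT_V
```

In this reconstruction the in-sentence NOISE category is not used; its placement in the distributed sample may differ.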
Symbols that never appear on the left-hand side of a rule are terminal symbols, in
other words word categories. The words belonging to each word category are
registered in the voca file.
% COLOR_N
RED rEd
WHITE wYt
GREEN grin
BLUE blu
% OBJECT
IT It
% TO
TO tu
% MAKE_V
MAKE mek
% CHANGE_V
CHANGE CenJ
% QUIT_V
QUIT kwIt
% NS_B
silB silB
% NS_E
silE silE
% NOISE
sp sp
Conversion to Julian Format
Convert the grammar and voca files into a deterministic finite automaton file (dfa) and dictionary file (dict)
using the compiler "mkdfa.pl".
(When the grammar file is sample.grammar, and the voca file is sample.voca)
% mkdfa.pl sample
sample.grammar has 6 rules
sample.voca has 9 categories and 12 words
---
Now parsing grammar file
Now modifying grammar to minimize states[1]
Now parsing vocabulary file
Now making nondeterministic finite automaton[10/10]
Now making deterministic finite automaton[10/10]
Now making triplet list[10/10]
---
-rw-r--r-- 1 foo users 134 Aug 17 17:50 sample.dfa
-rw-r--r-- 1 foo users 212 Aug 17 17:50 sample.dict
-rw-r--r-- 1 foo users 75 Aug 17 17:50 sample.term
Starting Julian
Apart from the grammar it is also necessary to select the acoustic model to be used. Search parameters
can also be set. These settings can be stored in a jconf setting file.
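Putting this together, a minimal jconf for running Julian on this task might look as follows (the acoustic model file names are placeholders, and the -dfa option name is assumed from standard Julian usage rather than taken from this document):

```
# julian.jconf -- settings for the sample task
-dfa   sample.dfa     # finite automaton from mkdfa.pl
-v     sample.dict    # word dictionary from mkdfa.pl
-h     hmmdefs        # acoustic model (placeholder name)
-hlist logicalTri     # HMMList file (placeholder name)
-input mic
```

Julian would then be started with "% julian -C julian.jconf".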