Piper Commands
There are only nine main commands that can be used in piper files, divided into five major functions, plus variable use.
- Setting a Collection Reader
- Adding an Annotator
- Setting a Variable Value
- Loading an external Piper File
- Specifying a Directory or Package
- Using Variable Value Substitution
| Command | Parameter 1 | Parameters 2-n | Description |
|---|---|---|---|
| reader | CR name | <name=value ...> | Set the collection reader for pipeline input data. |
| add | AE or CC name | <name=value ...> | Add AE/CC to pipeline. |
| addLast | AE or CC name | <name=value ...> | Add AE/CC to the end of pipeline. Useful if the pipeline is meant to be extended. |
| addDescription | AE or CC name | <value ...> | Add AE/CC to pipeline using its .createAnnotatorDescription method. |
| addLogged | AE or CC name | <name=value ...> | Add AE/CC to pipeline with Start/Finish logging. |
| set | name=value | <name=value ...> | Add global variable values. |
| cli | name=char | <name=char ...> | Add global variable values based upon command-line character option values. |
| load | Piper file path | | Load external piper file. |
| package | package path | | Add to known packages. Shortens load and add specifications. |
| // # ! | comment text | | Line Comment. |
You can find examples of all the above commands in the piper files included with cTAKES.
A Collection Reader is the first component in a Pipeline. It reads documents for the rest of the pipeline to process. A reader can read documents from files on disk, cells in a database, lines in a file, content on the internet, text typed at a prompt, and so forth.
The default reader in cTAKES is the File Tree Reader, which reads document texts from plaintext files in a directory tree. Document files can be organized in directories by Patient or have all files in a single directory. For instance, either of the two following directory structures will work for the File Tree Reader:
    myCorpusDirectory/
        patient_1/
            Doc_A.txt
            Doc_B.txt
            Doc_C.txt
        patient_2/
            Doc_ABC.txt
            Doc_XYZ.txt

    myCorpusDirectory/
        Doc_A.txt
        Doc_B.txt
        Doc_C.txt
        Doc_ABC.txt
        Doc_XYZ.txt
For each of the cases above, you would set your input directory to myCorpusDirectory.
You can do this using InputDirectory or -i.
See Setting a Variable Value below.
In the first directory example, cTAKES will take advantage of the subdirectories (patient_1, patient_2),
and during processing it will assign documents to the appropriate patient ID.
This can be very useful, as some Annotation Engines, including output writers,
can take advantage of the Patient ID. Without the subdirectories, all documents will belong to a DefaultPatient.
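The grouping described above can be pictured with a small Python sketch. This is only a toy model, not the File Tree Reader's code: the function name `patient_for` is hypothetical, and the sketch assumes the patient is named by the first subdirectory below the input directory.

```python
from pathlib import PurePosixPath

DEFAULT_PATIENT = "DefaultPatient"  # name taken from the text above


def patient_for(root, document):
    """Return the patient a document would be grouped under (toy model)."""
    relative = PurePosixPath(document).relative_to(PurePosixPath(root))
    # A file directly inside the root has exactly one path part: its own name.
    if len(relative.parts) == 1:
        return DEFAULT_PATIENT
    # Assumption: the first subdirectory below the root names the patient.
    return relative.parts[0]


print(patient_for("myCorpusDirectory", "myCorpusDirectory/patient_1/Doc_A.txt"))  # patient_1
print(patient_for("myCorpusDirectory", "myCorpusDirectory/Doc_A.txt"))  # DefaultPatient
```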
To use a collection reader for the pipeline other than the File Tree Reader,
use the piper command: reader, followed by the name of the collection reader.
For instance, to use a reader named MyDocumentReader, add this line to your piper file:
reader MyDocumentReader
If the default reader File Tree Reader fits your purposes,
you do not need to use the reader command.
An Annotator, also known as an Annotation Engine, is a pipeline component that performs some work on the document text, patient information, cohort information, or anything else that needs to be processed by the pipeline.
Many annotators come with cTAKES, but a custom annotator, depending upon its purpose, is generally easy to create.
There are four commands that will add components to a pipeline.
To add a component to the pipeline, use the piper command add.
This is the command primarily used to add components to a pipeline.
For instance, to add an annotator named MyAnnotatorA, add this line to your piper file:
add MyAnnotatorA
The placement of an add command in the piper file determines when the component will be executed when the pipeline is run.
Components are executed in the same order in which the add command appears in the piper file.
For instance, when the piper contains the lines:
add MyAnnotatorA
add MyAnnotatorB
add MyAnnotatorC
the execution order of annotators in the pipeline will be:
- MyAnnotatorA
- MyAnnotatorB
- MyAnnotatorC
However, with the lines:
add MyAnnotatorB
add MyAnnotatorA
add MyAnnotatorC
the execution order of annotators in the pipeline will be:
- MyAnnotatorB
- MyAnnotatorA
- MyAnnotatorC
addLast works exactly like the add command,
but it informs cTAKES that the annotator should be run at the end of the pipeline,
regardless of its placement in the piper file.
For instance, when the piper contains the lines:
addLast MyAnnotatorA
add MyAnnotatorB
add MyAnnotatorC
the execution order of annotators in the pipeline will be:
- MyAnnotatorB
- MyAnnotatorC
- MyAnnotatorA
When multiple addLast commands are used,
the annotators are added to the end of the pipeline in the same order in which their addLast commands appear in the piper file.
For instance, when the piper contains the lines:
addLast MyAnnotatorA
addLast MyAnnotatorB
add MyAnnotatorC
the execution order of annotators in the pipeline will be:
- MyAnnotatorC
- MyAnnotatorA
- MyAnnotatorB
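The ordering rules for add and addLast can be summarized with a short Python sketch. It is a toy model of the behavior described above, not how cTAKES builds a pipeline; the function name `execution_order` is hypothetical.

```python
def execution_order(piper_lines):
    """Return component names in the order a pipeline would run them (toy model)."""
    normal, last = [], []
    for line in piper_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # blank or incomplete line
        command, name = parts[0], parts[1]
        if command == "add":
            normal.append(name)
        elif command == "addLast":
            last.append(name)
    # Components added with addLast run after everything else, in file order.
    return normal + last


print(execution_order([
    "addLast MyAnnotatorA",
    "addLast MyAnnotatorB",
    "add MyAnnotatorC",
]))  # ['MyAnnotatorC', 'MyAnnotatorA', 'MyAnnotatorB']
```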
addDescription works like the add command, but it creates the annotator by executing
a Java method named createAnnotatorDescription(). This Java method exists in very few annotators.
It may be in older annotators, but even when it does exist it is not a preferred manner of creating the annotator.
createAnnotatorDescription() methods are deprecated and will be removed in a future version of cTAKES.
The addDescription command should be used only in rare instances.
When it is used, its basic syntax matches that of the add command.
addDescription MyAnnotatorA
Annotators are executed in the same order in which the addDescription command appears in the piper file.
addLogged works like the add command, but it will add log messages before and/or after an annotator
initializes and executes within the pipeline.
This may be useful with older annotators that do not themselves have logging.
However, almost all annotators in cTAKES have their own logging, and even those that do not will be upgraded in future
versions of cTAKES. The addLogged command will be useful only in rare instances.
When it is used, its basic syntax matches that of the add command.
addLogged MyAnnotatorA
Annotators are executed in the same order in which the addLogged command appears in the piper file.
Pipeline components can often use values to vary their manner of execution. Sometimes specification of these variable values is required, such as the input directory for the default collection reader File Tree Reader.
Values can be specified in three ways. The simplest is by
- Passing a Value to the Component

There are also two piper commands that facilitate the setting of variable values:
- set
- cli

One more method to set a value is by
- Using an Environment Variable
Once set, variable values will be used by pipeline components. In addition to variable values being passed explicitly to components, variable value substitution can be used within piper commands.
Values of variables can be passed directly to the component as a named parameter on the reader or add line.
For instance, to set the input directory to my_directories/input_directory for a reader named MyDocumentReader,
add this line to your piper file:
reader MyDocumentReader InputDirectory=my_directories/input_directory
In this example, the parameter name is InputDirectory, which is a standard parameter name for input directories.
The specification format for parameter values is simply parameterName=value.
Parameters can have different value types, such as text, integer, or a list of texts.
In the example above, the path of the input directory is passed as text.
Text values that require a space should be surrounded by double-quotes:
reader MyDocumentReader InputDirectory="my directories/input directory"
Specification of an integer is simple and requires no special formatting. For instance, to add an annotator MyAnnotatorA with an integer value for variable Y, use the following line:
add MyAnnotatorA Y=3
Specifying an array requires a comma between each array element. For instance, to add an annotator MyAnnotatorA with an array value for variable Z, use the following line:
add MyAnnotatorA Z=array,of,values
Note: do not use space characters before or after commas in an array. Text within an array that contains a space should be enclosed in double-quotes. For instance:
add MyAnnotatorA Z=array,of,"multiple values"
To pass a simple text value that contains commas, enclose the text in double-quotes.
For instance, the following line will pass a text value, not an array value.
add MyAnnotatorA X="not an array, just text"
Values can be passed using multiple parameters on the same line. For instance, to add an annotator MyAnnotatorA with a text value for variable X, an integer value for variable Y, and an array value for variable Z, the following line will work:
add MyAnnotatorA X=some_text Y=3 Z=array,of,values
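The quoting and comma rules above can be illustrated with a small Python sketch. It is a toy parser written for this explanation, not the parser cTAKES uses. The function names are hypothetical, and every value is kept as text: how a component converts a value such as 3 to an integer is not modeled here.

```python
def split_outside_quotes(text, separator):
    """Split text on separator, ignoring separators inside double quotes."""
    pieces, current, in_quotes = [], [], False
    for char in text:
        if char == '"':
            in_quotes = not in_quotes
            current.append(char)
        elif char == separator and not in_quotes:
            pieces.append("".join(current))
            current = []
        else:
            current.append(char)
    pieces.append("".join(current))
    return pieces


def parse_value(raw):
    """Turn one raw value into a str, or a list of str if it is an array."""
    elements = [piece.strip('"') for piece in split_outside_quotes(raw, ",")]
    return elements if len(elements) > 1 else elements[0]


def parse_parameters(line):
    """Parse 'command Name key=value ...' into (command, name, parameters)."""
    tokens = [token for token in split_outside_quotes(line, " ") if token]
    command, name = tokens[0], tokens[1]
    parameters = {}
    for token in tokens[2:]:
        key, _, raw = token.partition("=")
        parameters[key] = parse_value(raw)
    return command, name, parameters


print(parse_value('array,of,"multiple values"'))  # ['array', 'of', 'multiple values']
print(parse_value('"not an array, just text"'))   # not an array, just text
```

In this sketch a comma only separates array elements when it sits outside double quotes, which mirrors the difference between the Z and X examples above.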
Specifying more than one value on a single line can lead to long lines that are difficult to read. For instance:
add MyAnnotatorA X=some_text Y=3 Z=array,of,values Z1=array,of,"multiple values" X1="not an array, just text"
To increase readability, you can use the set command to specify variable values for all following components.
For instance, the following commands achieve the same result as the single long line above:
set Z=array,of,values
set Z1=array,of,"multiple values"
set X1="not an array, just text"
add MyAnnotatorA X=some_text Y=3
In addition to decreasing the length of single lines in a piper file, the set command can save space on multiple following lines.
This is because once a variable is set, that value is used by every following component.
For instance, suppose AnnotatorA, AnnotatorB, and AnnotatorC all use a variable named commonX, and you want to use the value
same_value for each. Instead of using the repetitive lines:
add AnnotatorA commonX=same_value
add AnnotatorB commonX=same_value
add AnnotatorC commonX=same_value
you can use the lines:
set commonX=same_value
add AnnotatorA
add AnnotatorB
add AnnotatorC
Variables set as parameters on a line will override values set by a preceding set command.
For instance, if AnnotatorA, AnnotatorB, and AnnotatorC all use a variable named commonX and you want to use the value
same_value for AnnotatorA and AnnotatorC, but you want to use the value different_value for AnnotatorB,
simply set the value specifically for AnnotatorB:
set commonX=same_value
add AnnotatorA
add AnnotatorB commonX=different_value
add AnnotatorC
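The interaction between set and per-line parameters can be modeled in a few lines of Python. This is a simplified sketch, not cTAKES code: the function name `resolve_parameters` is hypothetical, values are split on whitespace only (no quoting), and every component is shown receiving every global value, whereas a real component would only use the variables it declares.

```python
def resolve_parameters(piper_lines):
    """Return [(component, effective parameters)] for each add line (toy model)."""
    global_values = {}
    resolved = []
    for line in piper_lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        command, rest = tokens[0], tokens[1:]
        if command == "set":
            # set affects every component that follows it.
            global_values.update(pair.split("=", 1) for pair in rest)
        elif command == "add":
            name, pairs = rest[0], rest[1:]
            effective = dict(global_values)
            # Values given on the add line win over earlier set values.
            effective.update(pair.split("=", 1) for pair in pairs)
            resolved.append((name, effective))
    return resolved


for component, values in resolve_parameters([
    "set commonX=same_value",
    "add AnnotatorA",
    "add AnnotatorB commonX=different_value",
    "add AnnotatorC",
]):
    print(component, values)
```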
Similar to the add command, you can specify multiple values on a single line using set. For instance:
set X=some_text Y=3 Z=array,of,values
The command cli is an acronym for "command-line interface".
That full name gives a hint as to what it does.
cli allows you to specify the value of a variable on the command line of a terminal or in a shell script
by passing the value as a parameter to the command used to run the piper file. As a reference:
RunPiperFile -p MyPiper -m some_text
To use the cli command you need to know the name of the variable,
just as you would to pass the value to a component or to use the set command.
Decide upon the name of a parameter that you would like to pass on the command line.
Then you can use cli to inform cTAKES that the parameter names are equal.
For instance, to indicate that you want the piper parameter myParameter to be set using the parameter m
on the command line, you would use the line:
cli myParameter=m
Then, to set the value of myParameter to some_text on the command line (using the RunPiperFile script):
RunPiperFile -p MyPiper -m some_text
This will set the value of myParameter to some_text during cTAKES processing.
Note the use of a single dash before the parameter character, -m.
There are currently 3 reserved letters that cannot be used with cli: p, i, and o,
which stand for "PiperFile", "InputDirectory" and "OutputDirectory".
Any other lower-case character can be used with the cli command, and as cli is case-sensitive,
any upper-case character can also be used. Just as with the add and set commands,
multiple parameters can be set on a single line:
cli myParameter=m myOtherParameter=M
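The mapping that cli sets up between command-line characters and piper variables can be sketched in Python as follows. This is a toy model, not the cTAKES option parser: the function name `apply_cli` is hypothetical, and it assumes the command line is a simple sequence of option and value pairs.

```python
def apply_cli(cli_line, argv):
    """Map command-line options to piper variable values (toy model)."""
    # "cli myParameter=m myOtherParameter=M" maps m -> myParameter, M -> myOtherParameter.
    option_to_variable = {}
    for pair in cli_line.split()[1:]:
        variable, option = pair.split("=", 1)
        option_to_variable[option] = variable

    values = {}
    # Walk the arguments as option/value pairs, e.g. -m some_text -M other_text.
    for flag, value in zip(argv[0::2], argv[1::2]):
        option = flag.lstrip("-")
        # Lookup is case-sensitive, so m and M are different options.
        if option in option_to_variable:
            values[option_to_variable[option]] = value
    return values


print(apply_cli(
    "cli myParameter=m myOtherParameter=M",
    ["-p", "MyPiper", "-m", "some_text", "-M", "other_text"],
))  # {'myParameter': 'some_text', 'myOtherParameter': 'other_text'}
```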
Should a single character not suffice for a command-line parameter name, you can specify a word instead. There are two requirements that must be met for this to work.
- You must use the parameter prefix ++ instead of the normal prefix -.
- You must add the word-length ++ parameter after all character-length - parameters on the command line.

For instance, to set the component parameter myLongParameter on the command line with ++myParameter, use this line in the piper file:
cli myLongParameter=myParameter
and this on the command line:
RunPiperFile -p MyPiper ++myParameter some_text
Any system environment variable with the same name as a component's variable will be used,
unless it is overridden by a value specified with add, set, or cli.
For instance, if you have a system environment variable named myParameter,
cTAKES will use its value as it would any value specified in a piper file.
cTAKES has many pipelines, such as the Default Clinical Pipeline,
that are actually composed of smaller pipelines all working in order.
This is achieved using the load command, followed by the name of a piper file.
For example, from part of the DefaultFastPipeline
piper file:
// Add Chunkers
load ChunkerSubPipe
// Default fast dictionary lookup
load DictionarySubPipe
// Add Cleartk Entity Attribute annotators
load AttributeCleartkSubPipe
You can see that three other piper files are loaded:
- ChunkerSubPipe
- DictionarySubPipe
- AttributeCleartkSubPipe

Piper files are executed in the order that they are added to the pipeline with the load command.
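One way to picture load is as in-place expansion: each load line is replaced by the lines of the loaded piper file. The Python sketch below models that idea. It is an illustration, not the cTAKES loader; the function name `expand` and the piper names Main, SubPipeA, and SubPipeB are hypothetical, as are their contents.

```python
def expand(piper_name, piper_files):
    """Flatten a piper file by replacing each load line with the loaded file's lines."""
    expanded = []
    for line in piper_files[piper_name]:
        tokens = line.split()
        if tokens and tokens[0] == "load":
            # Loaded files may themselves load other files.
            expanded.extend(expand(tokens[1], piper_files))
        else:
            expanded.append(line)
    return expanded


# Hypothetical piper files, keyed by name.
piper_files = {
    "Main": ["add MyAnnotatorA", "load SubPipeA", "load SubPipeB"],
    "SubPipeA": ["add MyAnnotatorB"],
    "SubPipeB": ["add MyAnnotatorC"],
}

print(expand("Main", piper_files))
# ['add MyAnnotatorA', 'add MyAnnotatorB', 'add MyAnnotatorC']
```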
To find external resources like an Annotation Engine, piper file or resource file, you may need to specify a Java package or directory in which the resource resides. This can be done directly with the value, such as with the location of FinishedLogger in the DefaultFastPipeline:
// Log run time stats and completion
addLast util.log.FinishedLogger
In this case, cTAKES does not automatically know where to find FinishedLogger, but we tell it to look in util.log.
If we would like to use many Annotation Engines in util.log,
then we can specify it as a known location for cTAKES to search. This is done with the package command.
For instance, instead of the fully specified lines:
add myCustomLocation.forThings.MyAnnotatorA
add myCustomLocation.forThings.MyAnnotatorB
add myCustomLocation.forThings.MyAnnotatorC
you can use the package command and shorten them to:
package myCustomLocation.forThings
add MyAnnotatorA
add MyAnnotatorB
add MyAnnotatorC
In the second example, cTAKES has been informed that things might be located in myCustomLocation.forThings,
so it will search there if it cannot find them elsewhere.
The package command also works for directories and facilitates specification of things like piper files.
You can have multiple package specifications; they will all be searched.
Note: Behavior may be unpredictable if two resources with the same name are located in two different locations specified by package.
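The lookup that package enables can be approximated with a short Python sketch. It is only a mental model: the function name `resolve` and the set of known names are hypothetical. The sketch returns the first match in package order, which is an assumption of this model; as the note above says, cTAKES makes no such promise when names collide.

```python
def resolve(name, packages, known_names):
    """Find the full name for a component, trying each known package (toy model)."""
    # A name that is already fully specified is used as-is.
    if name in known_names:
        return name
    for package in packages:
        candidate = f"{package}.{name}"
        if candidate in known_names:
            return candidate
    return None  # nothing found in any known location


# Hypothetical set of components that exist on the classpath.
known_names = {
    "myCustomLocation.forThings.MyAnnotatorA",
    "util.log.FinishedLogger",
}

print(resolve("MyAnnotatorA", ["myCustomLocation.forThings"], known_names))
# myCustomLocation.forThings.MyAnnotatorA
```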
Many variables are automatically used by Collection Reader and Annotation Engine components,
such as InputDirectory with the File Tree Reader.
You can also use a variable anywhere (after it is set) within the piper file.
To do this, use the name of the variable preceded by a dollar sign ($).
For instance, you can use set or cli to set the value of a variable with the name usefulDirectory,
and reuse it in your piper file by passing a value to the component:
reader MyDocumentReader InputDirectory=$usefulDirectory
add MyAnnotatorA Y=$usefulDirectory
You can have characters before or after the variable, for instance:
reader MyDocumentReader InputDirectory=$usefulDirectory/myCorpusDirectory
add MyAnnotatorA Y=$usefulDirectory/myModelDirectory
add MyAnnotatorB Z=almost_named_like_$usefulDirectory
If you have a variable name with the beginning of another variable name, such as usefulDirectory and usefulDirectory_B,
the best fit will be used. For instance, in the following code, MyAnnotatorA will be passed the value of usefulDirectory_B,
not the value of usefulDirectory followed by "_B".
reader MyDocumentReader InputDirectory=$usefulDirectory
add MyAnnotatorA Y=$usefulDirectory_B
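The "best fit" rule can be illustrated with a Python sketch that tries the longest known variable name first. This is an assumption about how best fit could be implemented, not a description of the cTAKES code; the function name `substitute` and the example values /data and /models are hypothetical.

```python
def substitute(text, variables):
    """Replace $name occurrences, preferring the longest matching variable name."""
    result = []
    # Longer names first, so $usefulDirectory_B is not read as $usefulDirectory + "_B".
    names = sorted(variables, key=len, reverse=True)
    i = 0
    while i < len(text):
        if text[i] == "$":
            for name in names:
                if text.startswith(name, i + 1):
                    result.append(variables[name])
                    i += 1 + len(name)
                    break
            else:
                result.append("$")  # no known variable here; keep the character
                i += 1
        else:
            result.append(text[i])
            i += 1
    return "".join(result)


variables = {"usefulDirectory": "/data", "usefulDirectory_B": "/models"}
print(substitute("add MyAnnotatorA Y=$usefulDirectory_B", variables))
# add MyAnnotatorA Y=/models
print(substitute("reader MyDocumentReader InputDirectory=$usefulDirectory/myCorpusDirectory", variables))
# reader MyDocumentReader InputDirectory=/data/myCorpusDirectory
```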
Variable value substitution can be very powerful when combined with the cli command or system environment variables. Another use is passing variable values to processes started externally. A good example of variable use is in the ctakes-examples PbjWordFinder.piper:
// Start another instance of cTAKES, running the pipeline in PbjThirdStep.piper
// $OutputDirectory will substitute the value of this cTAKES pipeline's value for OutputDirectory.
// $ArtemisBroker will substitute the value of this cTAKES pipeline's value for ArtemisBroker.
add CtakesRunner Pipeline="-p PbjThirdStep -o $OutputDirectory -a $ArtemisBroker"
