Piper Commands

Sean Finan edited this page Dec 19, 2025 · 1 revision

There are only nine main commands that can be used in piper files, divided into five major functions, plus variable use.

Table of Piper Commands

Command        | Parameter 1     | Parameters 2-n   | Description
reader         | CR name         | <name=value ...> | Set the collection reader for pipeline input data.
add            | AE or CC name   | <name=value ...> | Add AE/CC to pipeline.
addLast        | AE or CC name   | <name=value ...> | Add AE/CC to the end of pipeline. Useful if the pipeline is meant to be extended.
addDescription | AE or CC name   | <value ...>      | Add AE/CC to pipeline using its .createAnnotatorDescription method.
addLogged      | AE or CC name   | <name=value ...> | Add AE/CC to pipeline with Start/Finish logging.
set            | name=value      | <name=value ...> | Add global variable values.
cli            | name=char       | <name=char ...>  | Add global variable values based upon command-line character option values.
load           | Piper file path |                  | Load external piper file.
package        | package path    |                  | Add to known packages. Shortens load and add specifications.
// # !         | comment text    |                  | Line Comment.

You can find examples of all the above commands in the piper files included with cTAKES.


Setting a Collection Reader

A Collection Reader is the first component in a Pipeline. It reads documents for the rest of the pipeline to process. A reader can read documents from files on disk, cells in a database, lines in a file, content on the internet, text typed at a prompt, and so forth.

The default reader in cTAKES is the File Tree Reader, which reads document texts from plaintext files in a directory tree. Document files can be organized in directories by Patient or have all files in a single directory. For instance, either of the two following directory structures will work for the File Tree Reader:

  myCorpusDirectory/
      patient_1/
          Doc_A.txt
          Doc_B.txt
          Doc_C.txt
      patient_2/
          Doc_ABC.txt
          Doc_XYZ.txt

  myCorpusDirectory/
      Doc_A.txt
      Doc_B.txt
      Doc_C.txt
      Doc_ABC.txt
      Doc_XYZ.txt

For each of the cases above, you would set your input directory to myCorpusDirectory. You can do this using InputDirectory or -i. See Setting a Variable Value below. In the first directory example, cTAKES will take advantage of the subdirectories (patient_1, patient_2), and during processing it will assign documents to the appropriate patient ID. This can be very useful, as some Annotation Engines, including output writers, can take advantage of the Patient ID. Without the subdirectories, all documents will belong to a DefaultPatient.

To use a collection reader for the pipeline other than the File Tree Reader, use the piper command: reader, followed by the name of the collection reader.
For instance, to use a reader named MyDocumentReader, add this line to your piper file:

reader MyDocumentReader

If the default reader File Tree Reader fits your purposes, you do not need to use the reader command.
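
As a minimal sketch of the default-reader case, the fragment below sets the input directory in the piper file itself. It assumes that the File Tree Reader picks up the InputDirectory variable as described above. The directory name is the one from the directory examples, and the command-line alternative shown in the comment uses the RunPiperFile script with a hypothetical piper file named MyPiper.

```piper
// No reader command: the default File Tree Reader is used.
// Assumption: the reader takes its input directory from InputDirectory.
set InputDirectory=myCorpusDirectory

// Alternative, on the command line instead of the set line above:
//   RunPiperFile -p MyPiper -i myCorpusDirectory
```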


Adding an Annotator

An Annotator, also known as an Annotation Engine, is a pipeline component that performs some work on the document text, patient information, cohort information, or anything else that needs to be processed by the pipeline.

Many annotators come with cTAKES, but a custom annotator, depending upon its purpose, is generally easy to create.

There are four commands that will add components to a pipeline.

add

To add a component to the pipeline, use the piper command add. This is the command primarily used to add components to a pipeline. For instance, to add an annotator named MyAnnotatorA, add this line to your piper file:

add MyAnnotatorA

The placement of an add command in the piper file determines when the component will be executed when the pipeline is run. Components are executed in the same order in which the add command appears in the piper file.
For instance, when the piper contains the lines:

add MyAnnotatorA
add MyAnnotatorB
add MyAnnotatorC

the execution order of annotators in the pipeline will be:

  1. MyAnnotatorA
  2. MyAnnotatorB
  3. MyAnnotatorC

However, with the lines:

add MyAnnotatorB
add MyAnnotatorA
add MyAnnotatorC

the execution order of annotators in the pipeline will be:

  1. MyAnnotatorB
  2. MyAnnotatorA
  3. MyAnnotatorC

addLast

addLast works exactly like the add command, but it informs cTAKES that the annotator should be run at the end of the pipeline, regardless of its placement in the piper file.
For instance, when the piper contains the lines:

addLast MyAnnotatorA
add MyAnnotatorB
add MyAnnotatorC

the execution order of annotators in the pipeline will be:

  1. MyAnnotatorB
  2. MyAnnotatorC
  3. MyAnnotatorA

When multiple addLast commands are used, an annotator is added to the end of the pipeline in the same order in which the addLast command appears in the piper file.
For instance, when the piper contains the lines:

addLast MyAnnotatorA
addLast MyAnnotatorB
add MyAnnotatorC

the execution order of annotators in the pipeline will be:

  1. MyAnnotatorC
  2. MyAnnotatorA
  3. MyAnnotatorB

addDescription

addDescription works like the add command, but it utilizes a special method of annotator creation by executing a java method named createAnnotatorDescription(). This java method exists in very few annotators.
It may be in older annotators, but even when it does exist it is not a preferred manner of creating the annotator. addDescription() methods are deprecated and will be removed in a future version of cTAKES. The addDescription command should be used only in rare instances.
When it is used, its basic syntax matches that of the add command.

addDescription MyAnnotatorA

Annotators are executed in the same order in which the addDescription command appears in the piper file.

addLogged

addLogged works like the add command, but it will add log messages before and/or after an annotator initializes and executes within the pipeline. This may be useful with older annotators that do not themselves have logging. However, almost all annotators in cTAKES have their own logging, and even those that do not will be upgraded in future versions of cTAKES. The addLogged command will be useful only in rare instances.
When it is used, its basic syntax matches that of the add command.

addLogged MyAnnotatorA

Annotators are executed in the same order in which the addLogged command appears in the piper file.
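
The four commands can be mixed in one piper file. The sketch below uses hypothetical component names, and the order listed in its comments is inferred from the rules described in this section rather than taken from an actual cTAKES run.

```piper
// Hypothetical components; MyWriter is meant to run after everything else.
addLast MyWriter
add MyAnnotatorA
addLogged MyAnnotatorB
add MyAnnotatorC

// Expected execution order, following the rules in this section:
//   1. MyAnnotatorA
//   2. MyAnnotatorB (with added log messages)
//   3. MyAnnotatorC
//   4. MyWriter
```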


Setting a Variable Value

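Pipeline components can often use values to vary their manner of execution. Sometimes specification of these variable values is required, such as the input directory for the default collection reader File Tree Reader.

Values can be specified in the ways described in the subsections below: by passing a value directly to a component, with the set command, with the cli command, or through a system environment variable.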

Once set, variable values will be used by pipeline components. In addition to variable values being passed explicitly to components, variable value substitution can be used within piper commands.

Passing a Value to the Component

Values of variables can be passed directly to the component as a named parameter on the reader or add line. For instance, to set the input directory to my_directories/input_directory for a reader named MyDocumentReader, add this line to your piper file:

reader MyDocumentReader InputDirectory=my_directories/input_directory

In this example, the parameter name is InputDirectory, which is a standard parameter name for input directories. The specification format for parameter values is simply parameterName=value.
Parameters can have different value types, such as text, integer, or a list of texts. In the example above, the path of the input directory is passed as text. Text values that require a space should be surrounded by double-quotes:

reader MyDocumentReader InputDirectory="my directories/input directory"

Specification of an integer is simple and requires no special formatting. For instance, to add an annotator MyAnnotatorA with an integer value for variable Y, use the following line:

add MyAnnotatorA Y=3

Specifying an array requires a comma between each array element. For instance, to add an annotator MyAnnotatorA with an array value for variable Z, use the following line:

add MyAnnotatorA Z=array,of,values

Note: do not use space characters before or after commas in an array. Text within an array that contains a space should be enclosed in double-quotes. For instance:

add MyAnnotatorA Z=array,of,"multiple values"

To pass a simple text value that contains commas, enclose the text in double-quotes.
For instance, the following line will pass a text value, not an array value.

add MyAnnotatorA X="not an array, just text"

Values can be passed using multiple parameters on the same line. For instance, to add an annotator MyAnnotatorA with a text value for variable X, an integer value for variable Y, and an array value for variable Z, the following line will work:

add MyAnnotatorA X=some_text Y=3 Z=array,of,values

set

Specifying more than one value on a single line can lead to long lines that are difficult to read. For instance:

add MyAnnotatorA X=some_text Y=3 Z=array,of,values Z1=array,of,"multiple values" X1="not an array, just text"

To increase readability, you can use the set command to specify variable values for all following components. For instance, the following commands achieve the same result as the command line above:

set Z=array,of,values
set Z1=array,of,"multiple values"
set X1="not an array, just text"
add MyAnnotatorA X=some_text Y=3

In addition to shortening individual lines in a piper file, the set command can save space on multiple following lines. This is because once a variable is set, that value is used by every following component. For instance, suppose AnnotatorA, AnnotatorB, and AnnotatorC all use a variable named commonX, and you want to use the value same_value for each. Instead of using the repetitive lines:

add AnnotatorA commonX=same_value
add AnnotatorB commonX=same_value
add AnnotatorC commonX=same_value

you can use the lines:

set commonX=same_value
add AnnotatorA
add AnnotatorB
add AnnotatorC

Variables set as parameters on a line will override values set by a preceding set command. For instance, if AnnotatorA, AnnotatorB, and AnnotatorC all use a variable named commonX and you want to use the value same_value for AnnotatorA and AnnotatorC, but you want to use the value different_value for AnnotatorB, simply set the value specifically for AnnotatorB:

set commonX=same_value
add AnnotatorA
add AnnotatorB commonX=different_value
add AnnotatorC

Similar to the add command, you can specify multiple values on a single line using set. For instance:

set X=some_text Y=3 Z=array,of,values
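
Because a set command applies to the components that follow it, its position in the piper file matters. The sketch below, using hypothetical annotator names, illustrates the behavior implied by the description above. It has not been checked against a cTAKES run.

```piper
// AnnotatorA is added before commonX is set, so this set command
// does not supply commonX to it.
add AnnotatorA

set commonX=same_value

// AnnotatorB and AnnotatorC follow the set command and use same_value.
add AnnotatorB
add AnnotatorC
```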

cli

The command name cli is an abbreviation of "command-line interface". That full name gives a hint as to what it does. cli allows you to specify the value of a variable on the command line of a terminal or in a shell script by passing the value as a parameter to the command used to run the piper file. For reference, such a command looks like:

RunPiperFile -p MyPiper -m some_text

To use the cli command you need to know the name of the variable, just as you would to pass the value to a component or to use the set command. Decide upon the name of a parameter that you would like to pass on the command line. Then use cli to inform cTAKES that the two names refer to the same value. For instance, to indicate that the piper parameter myParameter should be set using the parameter m on the command line, you would use the line:

cli myParameter=m

Then, to set the value of myParameter to some_text on the command line (using the RunPiperFile script):

RunPiperFile -p MyPiper -m some_text

This will set the value of myParameter to some_text during cTAKES processing. Note the use of a single dash before the parameter character, -m.

There are currently 3 reserved letters that cannot be used with cli: p, i, and o, which stand for "PiperFile", "InputDirectory" and "OutputDirectory". Any other lower-case character can be used with the cli command, and as cli is case-sensitive, any upper-case character can also be used. Just as with the add and set commands, multiple parameters can be set on a single line:

cli myParameter=m myOtherParameter=M

Should a single character not suffice for a command-line parameter name, you can specify a word instead. There are two requirements that must be met for this to work.

  1. You must use the parameter prefix ++ instead of the normal prefix -.
  2. You must add the word-length ++ parameter after all character-length - parameters on the command line.

For instance, to set the component parameter myLongParameter on the command line with ++myParameter, use this line in the piper file:

cli myLongParameter=myParameter

and this on the command line:

RunPiperFile -p MyPiper ++myParameter some_text
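
Character and word parameters can be combined. The following sketch puts both forms in one piper file and shows a matching command line in its comments. The variable names, the option names m and longOption, and the values are all hypothetical. Only the - and ++ prefixes and the ordering rule come from the description above.

```piper
// Map piper variables to command-line options.
// p, i and o are reserved, so they are not used as cli characters here.
cli myParameter=m
cli myLongParameter=longOption
add MyAnnotatorA

// Matching command line: the ++ parameter comes after all - parameters.
//   RunPiperFile -p MyPiper -i myCorpusDirectory -m some_text ++longOption other_text
```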

Using an Environment Variable

Any system environment variable with the same name as a component's variable will be used, unless it is overridden by a value specified with add, set, or cli. For instance, if you have a system environment variable named myParameter, cTAKES will use its value as it would any value specified in a piper file.
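
The sketch below illustrates the override rule stated above. It assumes a system environment variable named commonX with the hypothetical value env_value, and hypothetical annotators that all use a variable named commonX. The outcomes in the comments follow from the rule as written and have not been checked against a cTAKES run.

```piper
// Assumption: the system environment defines commonX=env_value.

// No explicit value: AnnotatorA uses env_value from the environment.
add AnnotatorA

// A value on the add line overrides the environment variable.
add AnnotatorB commonX=line_value

// A set command also overrides it for the components that follow.
set commonX=set_value
add AnnotatorC
```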


Loading an external Piper File

cTAKES has many pipelines, such as the Default Clinical Pipeline, that are actually composed of smaller pipelines all working in order. This is achieved using the load command, followed by the name of a piper file. For example, from part of the DefaultFastPipeline piper file:

// Add Chunkers
load ChunkerSubPipe

// Default fast dictionary lookup
load DictionarySubPipe

// Add Cleartk Entity Attribute annotators
load AttributeCleartkSubPipe

You can see that three other piper files are loaded: ChunkerSubPipe, DictionarySubPipe, and AttributeCleartkSubPipe.
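
Your own piper file can build on an existing pipeline in the same way. The sketch below assumes that DefaultFastPipeline can be loaded by name like the sub-pipelines above. MyAnnotatorA is a hypothetical custom annotator.

```piper
// Load the whole DefaultFastPipeline, then extend it.
load DefaultFastPipeline

// Hypothetical custom annotator, added after the loaded components.
add MyAnnotatorA
```

Components that the loaded piper file adds with addLast should still run after MyAnnotatorA, which is the situation the addLast command is meant for.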


Specifying a Directory or Package

To find external resources like an Annotation Engine, piper file or resource file, you may need to specify a Java package or directory in which the resource resides. This can be done directly with the value, such as with the location of FinishedLogger in the DefaultFastPipeline:

// Log run time stats and completion
addLast util.log.FinishedLogger

In this case, cTAKES does not automatically know where to find FinishedLogger, but we tell it to look in util.log. If we would like to use many Annotation Engines in util.log, then we can specify it as a known location for cTAKES to search. This is done with the package command. For instance, instead of repeating the full location on every line:

add myCustomLocation.forThings.MyAnnotatorA
add myCustomLocation.forThings.MyAnnotatorB
add myCustomLocation.forThings.MyAnnotatorC

you can use the package command once and then use the short names:

package myCustomLocation.forThings
add MyAnnotatorA
add MyAnnotatorB
add MyAnnotatorC

In the second example, cTAKES has been informed that things might be located in myCustomLocation.forThings, so it will search there if it cannot find them elsewhere. The package command also works for directories and facilitates specification of things like piper files. You can have multiple package specifications; they will all be searched.
Note: Behavior may be unpredictable if two resources with the same name are located in two different locations specified by package.
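
As a sketch of the directory case, the fragment below registers a directory of piper files and then loads one of them by its short name. The directory my_pipers/sub_pipes and the piper file MySubPipe are hypothetical, and the exact path format that cTAKES accepts is an assumption here.

```piper
// Hypothetical directory that holds custom piper files.
package my_pipers/sub_pipes

// MySubPipe is looked up in the known locations, including the one above.
load MySubPipe
```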


Using Variable Value Substitution

Many variables are automatically used by Collection Reader and Annotation Engine components, such as InputDirectory with the File Tree Reader. You can also use a variable anywhere (after it is set) within the piper file. Doing this is simple: use the name of the variable preceded by a dollar sign ($). For instance, you can use set or cli to set the value of a variable with the name usefulDirectory, and reuse it in your piper file by passing a value to the component:

reader MyDocumentReader InputDirectory=$usefulDirectory
add MyAnnotatorA Y=$usefulDirectory

You can have characters before or after the variable, for instance:

reader MyDocumentReader InputDirectory=$usefulDirectory/myCorpusDirectory
add MyAnnotatorA Y=$usefulDirectory/myModelDirectory
add MyAnnotatorB Z=almost_named_like_$usefulDirectory

If you have a variable name with the beginning of another variable name, such as usefulDirectory and usefulDirectory_B, the best fit will be used. For instance, in the following code, MyAnnotatorA will be passed the value of usefulDirectory_B, not the value of usefulDirectory followed by "_B".

reader MyDocumentReader InputDirectory=$usefulDirectory
add MyAnnotatorA Y=$usefulDirectory_B

Variable value substitution can be very powerful when combined with the cli command or system environment variables. Another use is passing variable values to processes started externally. A good example of variable use is in the ctakes-examples PbjWordFinder.piper:

// Start another instance of cTAKES, running the pipeline in PbjThirdStep.piper
// $OutputDirectory will substitute the value of this cTAKES pipeline's value for OutputDirectory.
// $ArtemisBroker will substitute the value of this cTAKES pipeline's value for ArtemisBroker.

add CtakesRunner Pipeline="-p PbjThirdStep -o $OutputDirectory -a $ArtemisBroker"
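
A simpler combination of cli and variable substitution can be sketched as follows. The option character u, the component names, and the path in the comment are hypothetical. The pattern itself only combines the cli command with the $ substitution described in this section.

```piper
// Hypothetical option u supplies a base directory at run time.
cli usefulDirectory=u

reader MyDocumentReader InputDirectory=$usefulDirectory/myCorpusDirectory
add MyAnnotatorA Y=$usefulDirectory/myModelDirectory

// Matching command line:
//   RunPiperFile -p MyPiper -u /path/to/base
```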
