Commit c8fa6bb (parent 670ad9f): "Add files via upload" (added code to github)

9 files changed (+2053, -6 lines)
README.md (74 additions, 6 deletions)
# SPTempRels

SPTempRels trains and evaluates a structured perceptron model for extracting temporal relations from clinical texts, in which events and temporal expressions are given. It can also be used to replicate the experiments done by [Leeuwenberg and Moens (EACL, 2017)](http://www.aclweb.org/anthology/E/E17/E17-1108.pdf). The paper contains a detailed description of the model. The conference slides can be found [here](https://github.com/tuur/SPTempRels/raw/master/SPTempRels-EACL2017-Slides.pdf). When using this code, please refer to the paper.

> Any questions? Feel free to send me an email at aleeuw15@umcutrecht.nl

## Reference

> In case of usage, please cite the corresponding publications.

```
@InProceedings{leeuwenberg2017structured:EACL,
  ...
}
```

### Requirements

* [Gurobi](https://www.gurobi.com): create an account, download Gurobi, and run its `setup.py`
* [Python 2.7](https://www.python.org/downloads/release/python-2711/)
* [Argparse](https://pypi.python.org/pypi/argparse)
* [Numpy](http://www.numpy.org/)
* [SciPy](https://www.scipy.org/)
* [Networkx](https://networkx.github.io)
* [Scikit-Learn](http://scikit-learn.org/stable/)
* [Pandas](http://pandas.pydata.org/)
When cTAKES output is not provided, the program backs off to the [Stanford POS tagger](http://nlp.stanford.edu/software/tagger.shtml) for POS features. For this reason, the Stanford POS Tagger folder (e.g. `stanford-postagger-2015-12-09`), the `stanford-postagger.jar`, and the `english-bidirectional-distsim.tagger` file need to be at the same level as `main.py`.
### Data

In the paper we used the [THYME](https://clear.colorado.edu/TemporalWiki/index.php/Main_Page) corpus sections as used for the [Clinical TempEval 2016](http://alt.qcri.org/semeval2016/task12/index.php?id=data) shared task. Training, development, and test data should therefore be provided in the Anafora XML format, in the folder structure indicated below, where the deepest level contains the text file `ID001_clinic_001` and the corresponding XML annotations `ID001_clinic_001.Temporal-Relation.gold.completed.xml`. Note that the Python calls below also refer to the top level of the THYME data as `$THYME`.

`$THYME`
* `Train`
  * `ID001_clinic_001`
    * `ID001_clinic_001`
    * `ID001_clinic_001.Temporal-Relation.gold.completed.xml`
  * ...
* `Dev`
  * ...
* `Test`
  * ...
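To make the expected layout concrete, here is a small illustrative Python sketch (not part of the repository) that builds the example `$THYME` folder structure from above in a temporary directory, using the example document name from this README:

```python
import os
import tempfile

# Illustrative only: recreate the example $THYME layout described above
# in a temporary directory (names are the README's example names, not real data).
root = tempfile.mkdtemp()
doc = 'ID001_clinic_001'
for split in ('Train', 'Dev', 'Test'):
    os.makedirs(os.path.join(root, split, doc))
# The raw text file and its Anafora annotation file sit inside the document folder.
open(os.path.join(root, 'Train', doc, doc), 'w').close()
open(os.path.join(root, 'Train', doc, doc + '.Temporal-Relation.gold.completed.xml'), 'w').close()
contents = sorted(os.listdir(os.path.join(root, 'Train', doc)))
print(contents)
```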
In our experiments we use POS and dependency parse features from the [cTAKES Clinical Pipeline](http://ctakes.apache.org/), so you need to provide the cTAKES output XML files as well. Here we assume these are in a directory called `$CTAKES_XML_FEATURES`. You can also call the program without the `-ctakes_out_dir` argument; it will then use the Stanford POS Tagger for POS tag features instead (and no dependency parse features). The folder structure of this directory is:

`$CTAKES_XML_FEATURES`
* `ID001_clinic_001.xml`
* ...

### Experiments: Leeuwenberg and Moens (2017)

To obtain the predictions from the experiments of Section 4 in the paper, you can use the example calls below. Each call will output the Anafora XML files to the directory `$SP_PREDICTIONS`. To get more information about the usage of the tool, run:
```
python main.py -h
```

#### SP
```sh
python main.py $THYME 1 0 32 MUL 1000 Test -averaging 1 -local_initialization 1 -negative_subsampling 'loss_augmented' -lowercase 1 -lr 1 -output_xml_dir $SP_PREDICTIONS -constraint_setting CC -ctakes_out_dir $CTAKES_XML_FEATURES -decreasing_lr 0
```

#### SP random
```sh
python main.py $THYME 1 0 32 MUL 1000 Test -averaging 1 -local_initialization 1 -negative_subsampling 'random' -lowercase 1 -lr 1 -output_xml_dir $SP_PREDICTIONS -constraint_setting CC -ctakes_out_dir $CTAKES_XML_FEATURES -decreasing_lr 0
```

#### SP + 𝒞 *
```sh
python main.py $THYME 1 0 32 MUL,Ctrans,Btrans,C_CBB,C_CAA,C_BBB,C_BAA 1000 Test -averaging 1 -local_initialization 1 -negative_subsampling 'loss_augmented' -lowercase 1 -lr 1 -output_xml_dir $SP_PREDICTIONS -constraint_setting CC -ctakes_out_dir $CTAKES_XML_FEATURES -decreasing_lr 0
```

#### SP + 𝚽sdr
```sh
python main.py $THYME 1 0 32 MUL 1000 Test -averaging 1 -local_initialization 1 -negative_subsampling 'loss_augmented' -lowercase 1 -lr 1 -output_xml_dir $SP_PREDICTIONS -constraint_setting CC -ctakes_out_dir $CTAKES_XML_FEATURES -decreasing_lr 0 -structured_features DCTR_bigrams,DCTR_trigrams
```

entities.py (104 additions, 0 deletions)

```python
from __future__ import print_function, division


class Entity(object):
    """An event or temporal expression annotation in a clinical text."""

    def __init__(self, type, id, string, spans, text_id, doctimerel=None, etree=None):
        self.type = type
        self.id = id
        self.string = string
        self.spans = spans
        self.text_id = text_id
        self.doctimerel = doctimerel
        self.phi = {}        # feature dictionary
        self.phi_v = None    # vectorized features
        self.tokens = None
        self.paragraph = None
        self.xmltree = etree
        self.embedding = None
        self.next_event = None
        self.next_entity = None
        self.attributes = {}

    def __str__(self):
        return str(self.string)

    def __hash__(self):
        return hash(self.id)

    def __eq__(self, other):
        return self.id == other.id

    def __ne__(self, other):
        return not (self == other)

    # Note: shadowed by the `type` attribute set in __init__.
    def type(self):
        return self.type

    def ID(self):
        return self.id

    def get_tokens(self):
        return self.tokens

    # Note: shadowed by the `text_id` attribute set in __init__.
    def text_id(self):
        return self.text_id

    def get_doctimerel(self):
        return self.doctimerel

    def get_span(self):
        return self.spans[0]

    def get_etree(self):
        return self.xmltree

    def get_doc_id(self):
        return self.id.split('@')[2]


class TLink(object):
    """A (candidate) temporal relation between two entities."""

    def __init__(self, e1, e2, tlink=None):
        self.e1 = e1
        self.e2 = e2
        self.tlink = tlink       # relation label, or None for candidates
        self.phi = {}
        self.phi_v = None
        self.tokens_ib = None    # tokens in between the two entities
        self.id = None

    def __str__(self):
        return str(self.e1) + '-' + str(self.e2)

    def ID(self):
        if not self.id:
            self.id = self.e1.ID() + '-' + self.e2.ID()
        return self.id

    def __hash__(self):
        return hash(self.ID())  # self.id is an attribute, not callable

    def __eq__(self, other):
        return self.ID() == other.ID()

    def __ne__(self, other):
        return not (self == other)

    def set_tokens_ib(self, tokens):
        self.tokens_ib = list(tokens)

    def get_tokens_ib(self):
        return self.tokens_ib

    def type(self):
        return self.e1.type + '-' + self.e2.type

    def get_tlink(self):
        return self.tlink

    def get_e1(self):
        return self.e1

    def get_e2(self):
        return self.e2
```
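As a usage illustration, the following standalone sketch mirrors the ID logic of `Entity` and `TLink` with simplified stand-in classes (the Anafora-style id string `5@e@ID001_clinic_001@gold` is an assumed example, not taken from the repository):

```python
# Simplified stand-ins mirroring the ID logic of entities.py (not the full classes).
class MiniEntity(object):
    def __init__(self, type, id):
        self.type = type
        self.id = id  # assumed Anafora-style id: "<n>@e@<doc_id>@gold"

    def ID(self):
        return self.id

    def get_doc_id(self):
        # Same logic as Entity.get_doc_id: third '@'-separated field.
        return self.id.split('@')[2]


class MiniTLink(object):
    def __init__(self, e1, e2, tlink=None):
        self.e1, self.e2, self.tlink = e1, e2, tlink
        self.id = None

    def ID(self):
        # Lazily composed pair id, as in TLink.ID.
        if not self.id:
            self.id = self.e1.ID() + '-' + self.e2.ID()
        return self.id


e1 = MiniEntity('EVENT', '5@e@ID001_clinic_001@gold')
e2 = MiniEntity('TIMEX3', '2@e@ID001_clinic_001@gold')
link = MiniTLink(e1, e2, tlink='CONTAINS')
print(link.ID())        # 5@e@ID001_clinic_001@gold-2@e@ID001_clinic_001@gold
print(e1.get_doc_id())  # ID001_clinic_001
```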

evaluation.py (60 additions, 0 deletions)

```python
from __future__ import print_function, division
from collections import Counter
import pandas as pd


class Evaluation:

    def __init__(self, Y_p, Y, name='', tasks='DCTR,TLINK'):
        self.name = name
        self.Y_p, self.Y = Y_p, Y
        print('\n---> EVALUATION:', self.name, '<---')
        if 'DCTR' in tasks.split(','):
            self.evaluate_e()
        if 'TLINK' in tasks.split(','):
            self.evaluate_ee()

    def pprint(self):
        return 'todo'

    def evaluate_e(self):
        print('\n*** Evaluating DOCTIMEREL ***')
        self.evaluate([yp[0] for yp in self.Y_p], [y[0] for y in self.Y])

    def evaluate_ee(self):
        print('\n*** Evaluating TLINKS ***')
        self.evaluate([yp[1] for yp in self.Y_p], [y[1] for y in self.Y])

    def evaluate(self, Yp, Y):
        # Internal evaluation; may differ from Clinical TempEval scores
        # (due to temporal closure and candidate generation)!
        Yp = [l for i in Yp for l in i]
        Y = [l for i in Y for l in i]
        labels = set(Y + Yp)
        print('Y:', set(Y), 'Yp', set(Yp))
        y_actu = pd.Series(Y, name='Actual')
        y_pred = pd.Series(Yp, name='Predicted')
        confusion = Counter(zip(Y, Yp))
        df_confusion = pd.crosstab(y_actu, y_pred, rownames=['Actual'], colnames=['Predicted'], margins=True)
        print('==CONFUSION MATRIX==')
        print(df_confusion)
        print('==PER LABEL EVALUATION==')
        print(' P\t R\t F\t')
        s_TP, s_FP, s_FN = 0, 0, 0
        for l in labels:
            TP = confusion[(l, l)] if (l, l) in confusion else 0
            FP = sum([confusion[(i, l)] for i in labels if (i, l) in confusion and l != i])
            FN = sum([confusion[(l, i)] for i in labels if (l, i) in confusion and l != i])
            print('TP', TP, 'FP', FP, 'FN', FN)
            precision = float(TP) / (TP + FP + 0.000001)
            recall = float(TP) / (TP + FN + 0.000001)
            fmeasure = (2 * precision * recall) / (precision + recall + 0.000001)
            print(round(precision, 4), '\t', round(recall, 4), '\t', round(fmeasure, 4), '\t', l)
            s_TP += TP
            s_FP += FP
            s_FN += FN
        s_prec = float(s_TP) / (s_TP + s_FP + 0.000001)
        s_recall = float(s_TP) / (s_TP + s_FN + 0.000001)
        s_fmeasure = (2 * s_prec * s_recall) / (s_prec + s_recall + 0.000001)
        print(round(s_prec, 4), '\t', round(s_recall, 4), '\t', round(s_fmeasure, 4), '\t', '**ALL**')
```
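The per-label scoring inside `Evaluation.evaluate()` can be sketched standalone on a toy label sequence (the labels below are illustrative; the small epsilon in the denominators avoids division by zero, as in the original):

```python
from collections import Counter

# Toy gold (Y) and predicted (Yp) label sequences (illustrative labels).
Y  = ['CONTAINS', 'BEFORE', 'CONTAINS', 'OVERLAP']
Yp = ['CONTAINS', 'CONTAINS', 'CONTAINS', 'OVERLAP']

confusion = Counter(zip(Y, Yp))  # (actual, predicted) -> count
labels = set(Y + Yp)
scores = {}
for l in labels:
    TP = confusion[(l, l)]
    FP = sum(confusion[(i, l)] for i in labels if i != l)  # predicted l, actual i
    FN = sum(confusion[(l, i)] for i in labels if i != l)  # actual l, predicted i
    precision = float(TP) / (TP + FP + 0.000001)
    recall = float(TP) / (TP + FN + 0.000001)
    fmeasure = (2 * precision * recall) / (precision + recall + 0.000001)
    scores[l] = (round(precision, 2), round(recall, 2), round(fmeasure, 2))

print(scores)
```

For example, `CONTAINS` above has two true positives and one false positive (the gold `BEFORE` predicted as `CONTAINS`), giving precision 0.67, recall 1.0, and F1 0.8.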
