Syntax-Enhanced Relation Extraction

Syntax-enhanced RE models with depedency / constituency analysis

⭐ This codebase provides scripts of applying syntax-enhanced models on biomedical RE datasets as described in the paper: https://academic.oup.com/database/article/doi/10.1093/database/baac070/6675625.

This is an improved version including replacing BioBERT with PubMedBERT and some other modifications.

There are four models either dependency-syntax-enhanced or constituency-syntax-enhanced:

CE-PubMedBERT (constituency)
CT-PubMedBERT (constituency)
Late-fusion (dependency)
MTS-PubMedBERT (dependency)

⭐ Pre-processing scripts are provided under the directory /preprocessing/. Note that we use PubMedBERT tokenizer for wordpiece-level tokenization before pre-processing (change PubMedBERT to other BERT variants on your need):

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract")

Refer to the README file in the preprocessing folder for subsequent steps (if you want to use your own corpus).

🔨 You can skip pre-processing and download data for syntax-enhanced models (BB-Rel, ChemProt, DrugProt) here:

You can also generate these data using our pre-processing scripts (refer to /preprocessing/).

Prerequisites

Python 3.8.5
Pytorch 1.13.0
Transformers 4.6.0
Stanza 1.3.0
Benepar 0.2.0

🔨 For biomedical use, we use the package CRAFT (by default) of Stanza. If needed, you need to change Stanza to a general version replacing the 11-th line in remove_markers.py and the 10-th line in dependency_parse.py with:

nlp = stanza.Pipeline(lang='en', processors='pos,lemma,tokenize,depparse', tokenize_pretokenized=True, logging_level='FATAL')

Name		Name	Last commit message	Last commit date
Latest commit History 67 Commits
data/wordpiece_level_files		data/wordpiece_level_files
preprocessing		preprocessing
slurm		slurm
README.md		README.md
loader.py		loader.py
main.py		main.py
model.py		model.py
opt.py		opt.py
pubmedbert_config.json		pubmedbert_config.json
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Syntax-Enhanced Relation Extraction

Prerequisites

About

Uh oh!

Releases

Packages

Languages

anfutang/syntax-enhanced-RE

Folders and files

Latest commit

History

Repository files navigation

Syntax-Enhanced Relation Extraction

Prerequisites

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages