Commit aea21e6

Update README
1 parent 9cb828f commit aea21e6

2 files changed, 53 additions and 26 deletions
README.md

Lines changed: 38 additions & 26 deletions
@@ -1,21 +1,22 @@
 # anaGo
-***anaGo*** is a state-of-the-art library for sequence labeling using Keras.
+***anaGo*** is a Keras implementation of sequence labeling.
 
-anaGo can performs named-entity recognition (NER), part-of-speech tagging (POS tagging), semantic role labeling (SRL) and so on for **many languages**.
-For example, **English Named-Entity Recognition** is shown in the following picture:
+anaGo can perform Named Entity Recognition (NER), Part-of-Speech tagging (POS tagging), semantic role labeling (SRL) and so on for **many languages**.
+For example, the following picture shows **Named Entity Recognition in English**:
 <img src="https://github.com/Hironsan/anago/blob/docs/docs/images/example.en2.png?raw=true">
 
-**Japanese Named-Entity Recognition** is shown in the following picture:
+The following picture shows **Named Entity Recognition in Japanese**:
 <img src="https://github.com/Hironsan/anago/blob/docs/docs/images/example.ja2.png?raw=true">
 
-Similarly, **you can solve your task for your language.**
+Similarly, **you can solve your task (NER, POS,...) for your language.**
+You don't have to define features.
 You have only to prepare input and output data. :)
 
-## Feature Support
-anaGo provide following features:
-* learning your own task without any knowledge.
-* defining your own model.
-* ~~(Not yet supported)downloading learned model for many tasks. (e.g. NER, POS Tagging, etc...)~~
+## anaGo Support Features
+anaGo supports following features:
+* training the model without any features.
+* defining the custom model.
+* downloading pre-trained models.
 
 
 ## Install
@@ -34,8 +35,8 @@ $ pip install -r requirements.txt
 ```
 
 ## Data and Word Vectors
-The data must be in the following format(tsv).
-We provide an example in train.txt:
+Training data takes a tsv format.
+The following text is an example of training data:
 
 ```
 EU B-ORG
@@ -52,7 +53,7 @@ Peter B-PER
 Blackburn I-PER
 ```
 
-You also need to download [GloVe vectors](https://nlp.stanford.edu/projects/glove/) and store it in *data/glove.6B* directory.
+anaGo supports pre-trained word embeddings like [GloVe vectors](https://nlp.stanford.edu/projects/glove/).
 
 ## Get Started
 ### Import
@@ -63,7 +64,7 @@ from anago.reader import load_data_and_labels
 ```
 
 ### Loading data
-After importing the modules, load training, validation and test dataset:
+After importing the modules, load [training, validation and test dataset](https://github.com/Hironsan/anago/blob/master/data/conll2003/en/ner/):
 ```python
 x_train, y_train = load_data_and_labels('train.txt')
 x_valid, y_valid = load_data_and_labels('valid.txt')
@@ -74,13 +75,13 @@ Now we are ready for training :)
 
 
 ### Training a model
-Let's train a model. For training a model, we can use train method:
+Let's train a model. To train a model, call `train` method:
 ```python
 model = anago.Sequence()
 model.train(x_train, y_train, x_valid, y_valid)
 ```
 
-If training is progressing normally, progress bar will be displayed as follows:
+If training is progressing normally, progress bar would be displayed:
 
 ```commandline
 ...
@@ -98,7 +99,7 @@ Epoch 5/15
 
 
 ### Evaluating a model
-To evaluate the trained model, we can use eval method:
+To evaluate the trained model, call `eval` method:
 
 ```python
 model.eval(x_test, y_test)
@@ -111,20 +112,21 @@ After evaluation, F1 value is output:
 
 ### Tagging a sentence
 Let's try tagging a sentence, "President Obama is speaking at the White House."
-We can do it as follows:
+To tag a sentence, call `analyze` method:
+
 ```python
 >>> words = 'President Obama is speaking at the White House.'.split()
 >>> model.analyze(words)
 {
 'words': [
-'President',
-'Obama',
-'is',
-'speaking',
-'at',
-'the',
-'White',
-'House.'
+'President',
+'Obama',
+'is',
+'speaking',
+'at',
+'the',
+'White',
+'House.'
 ],
 'entities': [
 {
@@ -145,6 +147,16 @@ We can do it as follows:
 }
 ```
 
+### Downloading pre-trained models
+To download a pre-trained model, call `download` function:
+```python
+from anago.utils import download
+
+dir_path = 'models'
+url = 'https://storage.googleapis.com/chakki/datasets/public/models.zip'
+download(url, dir_path)
+model = anago.Sequence.load(dir_path)
+```
 
 ## Reference
 This library uses bidirectional LSTM + CRF model based on
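
The README hunk above says training data takes a tsv format with one token and its tag per line. As a rough illustration of how such a file could be read, here is a minimal sketch. It assumes that blank lines separate sentences and that the tag is the last whitespace-separated field on each line. `read_tagged_sentences` is a hypothetical helper written for this note; it is not anaGo's `load_data_and_labels`, whose behavior this commit does not show.

```python
def read_tagged_sentences(lines):
    """Group 'token<whitespace>tag' lines into sentences.

    Hypothetical helper for illustration only; not part of anaGo.
    """
    sentences, labels = [], []
    words, tags = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # Assumption: a blank line closes the current sentence.
            if words:
                sentences.append(words)
                labels.append(tags)
                words, tags = [], []
            continue
        # Split on the last run of whitespace so the tag is the final field.
        # A line without a tag raises ValueError here.
        word, tag = line.rsplit(maxsplit=1)
        words.append(word)
        tags.append(tag)
    if words:
        # Flush the last sentence when the input has no trailing blank line.
        sentences.append(words)
        labels.append(tags)
    return sentences, labels


# Only lines that appear in the README sample are used here.
sample = "EU\tB-ORG\n\nPeter\tB-PER\nBlackburn\tI-PER\n"
x, y = read_tagged_sentences(sample.splitlines())
print(x)
print(y)
```

The two returned lists are parallel, one list of tokens and one list of tags per sentence, which mirrors the `x_train, y_train` pair shape used in the README.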

tests/wrapper_test.py

Lines changed: 15 additions & 0 deletions
@@ -6,6 +6,7 @@
 
 import anago
 from anago.reader import load_data_and_labels, load_glove
+from anago.utils import download
 
 get_path = lambda path: os.path.join(os.path.dirname(__file__), path)
 DATA_ROOT = get_path('../data/conll2003/en/ner')
@@ -91,3 +92,17 @@ def test_train_vocab_init(self):
         model = anago.Sequence(max_epoch=15, embeddings=self.embeddings, log_dir='logs')
         model.train(self.x_train, self.y_train, self.x_test, self.y_test, vocab_init=vocab)
         model.save(dir_path=self.dir_path)
+
+    def test_train_all(self):
+        x_train = np.r_[self.x_train, self.x_valid, self.x_test]
+        y_train = np.r_[self.y_train, self.y_valid, self.y_test]
+        model = anago.Sequence(max_epoch=15, embeddings=self.embeddings, log_dir='logs')
+        model.train(x_train, y_train, self.x_test, self.y_test)
+        model.save(dir_path=self.dir_path)
+
+    def test_download(self):
+        dir_path = 'test_dir'
+        url = 'https://storage.googleapis.com/chakki/datasets/public/models.zip'
+        download(url, dir_path)
+        model = anago.Sequence.load(dir_path)
+        model.eval(self.x_test, self.y_test)
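
`test_download` ends by calling `model.eval`, which per the README reports an F1 value. For NER that score is commonly computed over whole entity spans rather than single tags, so the `B-`/`I-` tags from the README sample first have to be grouped into spans. The following is a minimal sketch of that grouping step. `bio_to_spans` is a hypothetical helper for illustration; whether anaGo's `eval` works this way is an assumption, since the commit does not show its implementation.

```python
def bio_to_spans(tags):
    """Return (label, start, end) spans from BIO tags; end is exclusive.

    Hypothetical helper for illustration only; not part of anaGo.
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # The open span continues only on an 'I-' tag with the same label.
        inside = tag.startswith('I-') and label == tag[2:]
        if start is not None and not inside:
            spans.append((label, start, i))
            start, label = None, None
        # 'B-' always opens a span. A stray 'I-' with no open span is
        # treated leniently as a span start (a design choice of this sketch).
        if tag.startswith('B-') or (tag.startswith('I-') and start is None):
            start, label = i, tag[2:]
    if start is not None:
        # Close a span that runs to the end of the sentence.
        spans.append((label, start, len(tags)))
    return spans


# Made-up tag sequence using the tag names from the README sample.
tags = ['B-ORG', 'O', 'B-PER', 'I-PER']
spans = bio_to_spans(tags)
print(spans)  # [('ORG', 0, 1), ('PER', 2, 4)]
```

Comparing the span sets of predicted and gold tags then gives the precision and recall counts from which a span-level F1 can be computed.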
