 # anaGo
-***anaGo*** is a state-of-the-art library for sequence labeling using Keras.
+***anaGo*** is a Keras implementation of sequence labeling.

-anaGo can performs named-entity recognition (NER), part-of-speech tagging (POS tagging), semantic role labeling (SRL) and so on for **many languages**.
-For example, **English Named-Entity Recognition** is shown in the following picture:
+anaGo can perform Named Entity Recognition (NER), Part-of-Speech tagging (POS tagging), semantic role labeling (SRL) and so on for **many languages**.
+For example, the following picture shows **Named Entity Recognition in English**:
 <img src="https://github.com/Hironsan/anago/blob/docs/docs/images/example.en2.png?raw=true">

-**Japanese Named-Entity Recognition** is shown in the following picture:
+The following picture shows **Named Entity Recognition in Japanese**:
 <img src="https://github.com/Hironsan/anago/blob/docs/docs/images/example.ja2.png?raw=true">

-Similarly, **you can solve your task for your language.**
+Similarly, **you can solve your task (NER, POS tagging, ...) for your language.**
+You don't have to define any features.
 You have only to prepare input and output data. :)

-## Feature Support
-anaGo provide following features:
-* learning your own task without any knowledge.
-* defining your own model.
-* ~~(Not yet supported) downloading learned model for many tasks. (e.g. NER, POS Tagging, etc...)~~
+## Supported Features
+anaGo supports the following features:
+* training a model without defining features.
+* defining a custom model.
+* downloading pre-trained models.


 ## Install
@@ -34,8 +35,8 @@ $ pip install -r requirements.txt
 ```

 ## Data and Word Vectors
-The data must be in the following format(tsv).
-We provide an example in train.txt:
+Training data must be in TSV (tab-separated values) format.
+The following text is an example of training data:

 ```
 EU B-ORG
@@ -52,7 +53,7 @@ Peter B-PER
 Blackburn I-PER
 ```

-You also need to download [GloVe vectors](https://nlp.stanford.edu/projects/glove/) and store it in *data/glove.6B* directory.
+anaGo supports pre-trained word embeddings such as [GloVe vectors](https://nlp.stanford.edu/projects/glove/).

 ## Get Started
 ### Import
@@ -63,7 +64,7 @@ from anago.reader import load_data_and_labels
 ```

 ### Loading data
-After importing the modules, load training, validation and test dataset:
+After importing the modules, load the [training, validation, and test datasets](https://github.com/Hironsan/anago/blob/master/data/conll2003/en/ner/):
 ```python
 x_train, y_train = load_data_and_labels('train.txt')
 x_valid, y_valid = load_data_and_labels('valid.txt')
@@ -74,13 +75,13 @@ Now we are ready for training :)


 ### Training a model
-Let's train a model. For training a model, we can use train method:
+Let's train a model. To train a model, call the `train` method:
 ```python
 model = anago.Sequence()
 model.train(x_train, y_train, x_valid, y_valid)
 ```

-If training is progressing normally, progress bar will be displayed as follows:
+If training is progressing normally, a progress bar is displayed:

 ```commandline
 ...
@@ -98,7 +99,7 @@ Epoch 5/15


 ### Evaluating a model
-To evaluate the trained model, we can use eval method:
+To evaluate the trained model, call the `eval` method:

 ```python
 model.eval(x_test, y_test)
@@ -111,20 +112,21 @@ After evaluation, F1 value is output:

 ### Tagging a sentence
 Let's try tagging a sentence, "President Obama is speaking at the White House."
-We can do it as follows:
+To tag a sentence, call the `analyze` method:
+
 ```python
 >>> words = 'President Obama is speaking at the White House.'.split()
 >>> model.analyze(words)
 {
   'words': [
-    'President',
-    'Obama',
-    'is',
-    'speaking',
-    'at',
-    'the',
-    'White',
-    'House.'
+    'President',
+    'Obama',
+    'is',
+    'speaking',
+    'at',
+    'the',
+    'White',
+    'House.'
   ],
   'entities': [
     {
@@ -145,6 +147,16 @@ We can do it as follows:
 }
 ```

+### Downloading pre-trained models
+To download a pre-trained model, call the `download` function:
+```python
+from anago.utils import download
+
+dir_path = 'models'
+url = 'https://storage.googleapis.com/chakki/datasets/public/models.zip'
+download(url, dir_path)
+model = anago.Sequence.load(dir_path)
+```

 ## Reference
 This library uses bidirectional LSTM + CRF model based on
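
The TSV training-data format described in the "Data and Word Vectors" section (one token and its label per line) can be illustrated with a small reader. This is a minimal sketch, not anaGo's `load_data_and_labels`; it assumes exactly two tab-separated columns per line and that a blank line separates sentences, as in CoNLL-style files. The helper name `read_tsv` is hypothetical.

```python
def read_tsv(lines):
    """Group 'word<TAB>label' lines into sentences.

    Hypothetical helper for illustration; assumes two tab-separated
    columns per line and a blank line between sentences.
    """
    sents, labels = [], []
    words, tags = [], []
    for line in lines:
        line = line.rstrip('\r\n')
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if words:
                sents.append(words)
                labels.append(tags)
                words, tags = [], []
            continue
        word, tag = line.split('\t')
        words.append(word)
        tags.append(tag)
    # Flush the last sentence when the input does not end with a blank line.
    if words:
        sents.append(words)
        labels.append(tags)
    return sents, labels


# Tokens taken from the data example; the sentence break after 'EU' is assumed.
sample = ['EU\tB-ORG', '', 'Peter\tB-PER', 'Blackburn\tI-PER']
x, y = read_tsv(sample)
print(x)  # [['EU'], ['Peter', 'Blackburn']]
print(y)  # [['B-ORG'], ['B-PER', 'I-PER']]
```

Because it accepts any iterable of lines, the same function can read a file object directly, e.g. `with open('train.txt', encoding='utf-8') as f: x, y = read_tsv(f)`.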
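
The `entities` field returned by `analyze` appears to list the recognized entity spans, but how anaGo builds it is not shown in this diff. The sketch below is one common, independently written way to turn BIO tags into spans; the function name `bio_to_spans` and its `(type, begin, end, text)` tuple output are hypothetical, and the tags in the example are written by hand rather than produced by a model.

```python
def bio_to_spans(words, tags):
    """Convert BIO tags into (type, begin, end, text) spans; `end` is exclusive.

    Hypothetical helper for illustration; not anaGo's internal implementation.
    """
    spans = []
    start, ent_type = None, None
    # A trailing 'O' sentinel closes an entity that runs to the end of the sentence.
    for i, tag in enumerate(list(tags) + ['O']):
        # An entity ends at 'O', at a new 'B-', or at an 'I-' of a different type.
        if tag == 'O' or tag.startswith('B-') or (tag.startswith('I-') and tag[2:] != ent_type):
            if start is not None:
                spans.append((ent_type, start, i, ' '.join(words[start:i])))
                start, ent_type = None, None
            # A 'B-' tag, or an 'I-' tag with no matching open entity, starts a new span.
            if tag != 'O':
                start, ent_type = i, tag[2:]
    return spans


words = 'President Obama is speaking at the White House.'.split()
tags = ['O', 'B-PER', 'O', 'O', 'O', 'O', 'B-LOC', 'I-LOC']  # hand-written, not model output
print(bio_to_spans(words, tags))
# [('PER', 1, 2, 'Obama'), ('LOC', 6, 8, 'White House.')]
```

Treating a stray `I-` tag as the start of a new entity is a lenient design choice; a stricter decoder could drop such tokens instead.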