Code for the project
- Results
- References
/Basic          Implementations of BERT and ALBERT used as baselines
/DistillBert    Distillation of the models in /Basic into the smaller models listed below (a loss sketch follows the list)
/Preprocessor   Data preprocessing
/Postprocessor  Postprocessing
- Roberta_wwm
- Albert_zn
- TinyBert
- FastBert
- BiGRU
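
The models under /DistillBert are obtained by distilling the /Basic baselines into smaller students such as TinyBert, FastBert, and BiGRU. Below is a minimal sketch of the standard soft-label distillation objective, assuming PyTorch; the function name `distillation_loss` and the hyperparameters `temperature` and `alpha` are illustrative and not taken from this repository.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Blend the soft-label KL term with the hard-label cross entropy.

    Illustrative sketch only: argument names and defaults are assumptions,
    not this repository's API.
    """
    # Soft targets: teacher and student distributions at temperature T.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence scaled by T^2, following Hinton et al.'s formulation.
    soft_loss = F.kl_div(soft_student, soft_teacher,
                         reduction="batchmean") * temperature ** 2
    # Standard cross entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Example usage (teacher/student are assumed classification models):
# teacher_logits = teacher(input_ids, attention_mask).detach()
# student_logits = student(input_ids, attention_mask)
# loss = distillation_loss(student_logits, teacher_logits, labels)
```

The `temperature ** 2` factor keeps the gradient magnitude of the soft-label term comparable to the hard-label term when the temperature is raised.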

References
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
- TinyBERT: Distilling BERT for Natural Language Understanding
- FastBERT: a Self-distilling BERT with Adaptive Inference Time
- Transformer to CNN: Label-scarce distillation for efficient text classification
- Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
- Distilling Transformers into Simple Neural Networks with Unlabeled Transfer Data

