- A Recipe for Training Neural Networks, Andrej Karpathy.
- The Unreasonable Effectiveness of Recurrent Neural Networks, Andrej Karpathy.
- Recurrent Neural Network. Link1, Link2, Link3, Link4
- Chris Olah's blog Link
... (Need to list a lot) (TODO)
- Pennington et al, GloVe: Global Vectors for Word Representation Link
- Mikolov et al, Distributed Representations of Words and Phrases and their Compositionality Link
- Lample et al, Word Translation Without Parallel Data. Link
- Peters et al, ELMo: Deep contextualized word representations Link
- Radford et al, Improving language understanding by generative pre-training Link
- Devlin et al, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Link
- Lample et al, Word Translation Without Parallel Data. Link
- Xie et al, Neural Cross-Lingual Named Entity Recognition with Minimal Resources Link
- Ganin et al, Domain-Adversarial Training of Neural Networks. Link Slide1 Slide2
- Tzeng et al, Adversarial Discriminative Domain Adaptation. Link Slide
- Volpi et al, Adversarial Feature Augmentation for Unsupervised Domain Adaptation. Link Slide
- Goodfellow et al, Generative Adversarial Networks. Link Slide
- Arjovsky et al, Wasserstein GAN. Link
- Arjovsky et al, Towards Principled Methods for Training Generative Adversarial Networks. Link
- Salimans et al, Improved Techniques for Training GANs. Link
- Miyato et al, Spectral Normalization for Generative Adversarial Networks. Link
- Zhu et al, (Cycle-GAN) Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. Link
- Zhao et al, Learning Sleep Stages from Radio Signals: A Conditional Adversarial Architecture. Link Slide
- Zhang et al, Aspect-augmented Adversarial Networks for Domain Adaptation. Link Slide
- Chen et al, Adversarial Deep Averaging Networks for Cross-Lingual Sentiment Classification. Link
- Lample et al, Word Translation Without Parallel Data. Link
- Collobert et al, Natural Language Processing (Almost) from Scratch. Link
- Vinyals et al, Pointer Networks. Link
- Lample et al, Neural Architectures for Named Entity Recognition. Link
- Strubell et al, Fast and Accurate Entity Recognition with Iterated Dilated Convolutions, Link Slide
- Yasunaga et al, Robust Multilingual Part-of-Speech Tagging via Adversarial Training. Link
- Xie et al, Neural Cross-Lingual Named Entity Recognition with Minimal Resources Link
- Cho et al, Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. Link Slide
- Sutskever et al, Sequence to Sequence Learning with Neural Networks. Link Slide
- Bahdanau et al, Neural Machine Translation by Jointly Learning to Align and Translate. Link Slide
- Luong et al, Effective Approaches to Attention-based Neural Machine Translation. Link Slide
- Vaswani et al, Attention Is All You Need Link
- Wu et al, Pay Less Attention with Lightweight and Dynamic Convolutions Link
- Sennrich et al, Neural Machine Translation of Rare Words with Subword Units. Link
- Greff et al, LSTM: A Search Space Odyssey, Link
- Shen et al, Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks. Link
- Generative Models for Discrete Data (Chapter 3 of Kevin Murphy's book). Link