Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
Overview of NLP¶
Deep Learning for NLP: An Overview of Recent Trends. Chapter 8 of the book (Performance of Different Models on Different NLP Tasks) also summarizes the state-of-the-art methods for each sub-area of NLP.
Ten trends in Deep learning NLP
The Transformer architecture, first published in 2017, addresses the sequential-processing bottleneck of RNNs by allowing the whole input to be fed in at once. Each word gets its own embedding and all positions are processed simultaneously, which greatly reduces training time and makes training on much larger datasets practical.
Google's BERT and OpenAI's GPT-2 models are based on the Transformer.
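To make the parallel-processing point concrete: once recurrence is dropped, position information has to be injected explicitly. A minimal NumPy sketch of the sinusoidal positional encoding used in the original Transformer paper (the dimensions below are arbitrary toy values):

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encodings as in 'Attention Is All You Need'.

    Returns an array of shape (max_len, d_model) that is added to the token
    embeddings so the model can recover word order even though all positions
    are processed in parallel.
    """
    positions = np.arange(max_len)[:, None]               # (max_len, 1)
    dims = np.arange(d_model)[None, :]                     # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                       # (max_len, d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                  # even dims: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                  # odd dims: cosine
    return pe

# Example: add positional information to a batch of (random) token embeddings.
embeddings = np.random.randn(1, 20, 512)                   # (batch, seq_len, d_model)
embeddings = embeddings + positional_encoding(20, 512)
```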
Transformer-XL
Semantics vs Syntax¶
Coreferences¶
Machine Translation¶
Transformer
In machine translation, self-attention also contributes to impressive results. For example, a model named the Transformer was introduced in a paper with the rather bold title “Attention Is All You Need” [Vaswani, 2017]. As the title suggests, this model relies only on self-attention, without any RNNs. As a result it is highly parallelizable and requires less time to train, while establishing state-of-the-art results on WMT 2014.
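A minimal NumPy sketch of the scaled dot-product self-attention at the core of the Transformer; the projection matrices below are random stand-ins for learned parameters:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over x of shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v                       # queries, keys, values
    scores = q @ k.T / np.sqrt(q.shape[-1])                   # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                                        # each output mixes all positions

# Toy example: 5 tokens with d_model = 16; every position attends to every other
# position in a single matrix multiplication, which is why training parallelizes so well.
x = np.random.randn(5, 16)
w_q, w_k, w_v = (np.random.randn(16, 16) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)                 # (5, 16)
```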
Findings from a large-scale study of seq2seq NMT hyperparameters:
• Large embeddings with 2048 dimensions achieved the best results, but only by a small margin; even small 128-dimensional embeddings seem to have enough capacity to capture most of the necessary semantic information.
• LSTM cells consistently outperformed GRU cells.
• Bidirectional encoders with 2 to 4 layers performed best. Deeper encoders were significantly less stable to train, but show potential if they can be optimized well.
• Deep 4-layer decoders slightly outperformed shallower decoders. Residual connections were necessary to train decoders with 8 layers, and dense residual connections offer additional robustness.
• Parameterized additive attention yielded the overall best results.
• A well-tuned beam search with length penalty is crucial. Beam widths of 5 to 10 together with a length penalty of 1.0 seemed to work well (see the sketch below).
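To make the last bullet concrete, here is a small Python sketch of length-normalized beam rescoring using the GNMT-style penalty lp(Y) = ((5 + |Y|) / 6)^alpha; the hypotheses and scores are toy values:

```python
def length_penalty(length, alpha=1.0):
    """GNMT-style length penalty: lp(Y) = ((5 + |Y|) / 6) ** alpha."""
    return ((5.0 + length) / 6.0) ** alpha

def rescore(hypotheses, alpha=1.0):
    """Rank beam hypotheses by length-normalized log-probability.

    `hypotheses` is a list of (tokens, total_log_prob) pairs; without the
    penalty, beam search systematically prefers shorter outputs.
    """
    return sorted(hypotheses,
                  key=lambda h: h[1] / length_penalty(len(h[0]), alpha),
                  reverse=True)

# Toy example: on raw log-probs the shorter beam wins (-2.0 > -2.4),
# but after length normalization the longer one is ranked first.
beams = [(["a", "cat"], -2.0), (["a", "cat", "sat", "down"], -2.4)]
print(rescore(beams))
```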
seq2seq¶
Attention¶
The article Attention in NLP gives a very detailed summary of the development and applications of attention mechanisms in NLP.
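As one concrete instance of the mechanisms that article surveys, a minimal NumPy sketch of additive (Bahdanau-style) attention for a single decoder step; the weight matrices are random stand-ins for learned parameters:

```python
import numpy as np

def additive_attention(query, keys, values, w_q, w_k, v_a):
    """Bahdanau-style (additive) attention for one decoder step.

    score_i = v_a . tanh(W_q q + W_k k_i); the scores are softmaxed over the
    source positions and used to average the encoder states (values).
    """
    scores = np.tanh(query @ w_q + keys @ w_k) @ v_a       # (src_len,)
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()                      # attention distribution
    context = weights @ values                             # weighted sum of encoder states
    return context, weights

# Toy example: a decoder state of size 8 attending over 6 encoder states of size 8.
query = np.random.randn(8)
keys = values = np.random.randn(6, 8)
w_q = np.random.randn(8, 32)    # 32 = attention hidden size (arbitrary here)
w_k = np.random.randn(8, 32)
v_a = np.random.randn(32)
context, weights = additive_attention(query, keys, values, w_q, w_k, v_a)
```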
Libraries¶
spaCy is an industrial-strength Natural Language Processing (NLP) library.
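A minimal spaCy usage sketch, assuming the small English model has been installed with `python -m spacy download en_core_web_sm`:

```python
import spacy

# Load a small English pipeline (tokenizer, tagger, parser, NER).
nlp = spacy.load("en_core_web_sm")
doc = nlp("Google's BERT was released in 2018 and quickly became a strong NLP baseline.")

# Tokens with part-of-speech tags and dependency labels.
for token in doc:
    print(token.text, token.pos_, token.dep_)

# Named entities detected by the statistical NER component.
for ent in doc.ents:
    print(ent.text, ent.label_)
```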
transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides state-of-the-art general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBERT, XLNet, CTRL...) for Natural Language Understanding (NLU) and Natural Language Generation (NLG), with over 32 pretrained models in 100+ languages and deep interoperability between TensorFlow 2.0 and PyTorch.
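A minimal sketch of getting contextual embeddings from a pretrained BERT with the transformers library (PyTorch backend assumed; the first call downloads the weights):

```python
import torch
from transformers import BertModel, BertTokenizer

# Download (or load from cache) the pretrained tokenizer and weights.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

# Encode a sentence; [CLS] and [SEP] special tokens are added automatically.
input_ids = tokenizer.encode("Self-attention makes transformers easy to parallelize.",
                             return_tensors="pt")

with torch.no_grad():
    outputs = model(input_ids)

last_hidden_state = outputs[0]    # (batch, seq_len, 768) contextual embeddings
print(last_hidden_state.shape)
```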
tensor2tensor
Data¶
CoLA: The Corpus of Linguistic Acceptability
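A small sketch for loading CoLA with pandas; the file path and the four-column layout (source code, acceptability label, original annotation, sentence) reflect the public in-domain split as I understand it, and the column names are my own:

```python
import pandas as pd

# CoLA train/dev files are tab-separated with no header row; the four columns
# are (assumed here): source corpus code, acceptability label (0/1),
# the author's original annotation, and the sentence itself.
cols = ["source", "label", "original_annotation", "sentence"]
df = pd.read_csv("cola_public/raw/in_domain_train.tsv",
                 sep="\t", header=None, names=cols,
                 quoting=3)  # csv.QUOTE_NONE: sentences may contain unbalanced quotes

print(df["label"].value_counts())      # class balance: acceptable vs unacceptable
print(df[["label", "sentence"]].head())
```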
References¶
Vaswani et al., 2017. Attention Is All You Need. https://arxiv.org/abs/1706.03762