Language Model

An Empirical Study of Smoothing Techniques for Language Modeling http://www.aclweb.org/anthology/P96-1041

Scalable Modified Kneser-Ney Language Model Estimation https://kheafield.com/papers/edinburgh/estimate_paper.pdf

NNLM

A Neural Probabilistic Language Model http://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf

Word2Vec

Efficient Estimation of Word Representations in Vector Space https://arxiv.org/pdf/1301.3781.pdf

Seq2Seq

LSTM

Best Tutorial http://colah.github.io/posts/2015-08-Understanding-LSTMs/

RNNLM

Recurrent Neural Network Regularization https://arxiv.org/pdf/1409.2329.pdf

Gradient Descent

Best Survey http://ruder.io/optimizing-gradient-descent/index.html#minibatchgradientdescent

Hyperparameter Search

Random Search for Hyper-Parameter Optimization http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf

A good answer from Cross Validated (Stack Exchange) https://stats.stackexchange.com/questions/95495/guideline-to-select-the-hyperparameters-in-deep-learning