Deep Munich is a collaborative group of deep learning and neural network researchers in Munich. Our members represent:

Ask questions and discuss ideas in our forum: !forum/deep-munich

Sign up for our mailing list and stay up-to-date:

Join us at our weekly meeting (see below).


For questions, suggestions etc., please contact the .


Seminar in Neural Machine Translation

Thursdays 14:30 s.t., room C105

Center for Information and Language Processing
University of Munich
Oettingenstraße 67
80538 Munich


Reading list

Torch is an open source machine learning library, a scientific computing framework, and a script language based on the Lua programming language. It provides a wide range of algorithms for deep machine learning, and uses an extremely fast scripting language LuaJIT, and an underlying C implementation. ~ Wikipedia

Code bases for Torch

General NN Resources

Online textbooks

Video courses

Blogs & Articles



[1] Y. Kim, Y. Jernite, D. Sontag, and A. M. Rush, “Character-aware neural language models,” arXiv preprint arXiv:1508.06615, 2015.

[2] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, “LSTM: A search space odyssey,” arXiv preprint arXiv:1503.04069, 2015.

[3] R. Jozefowicz, W. Zaremba, and I. Sutskever, “An empirical exploration of recurrent network architectures,” in Proceedings of the 32nd International Conference on Machine Learning, 2015, pp. 2342–2350.

[4] K. M. Hermann, T. Kočiský, E. Grefenstette, L. Espeholt, W. Kay, M. Suleyman, and P. Blunsom, “Teaching machines to read and comprehend,” arXiv preprint arXiv:1506.03340, 2015.

[5] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Gated feedback recurrent neural networks,” arXiv preprint arXiv:1502.02367, 2015.

[6] H. Sak, A. Senior, and F. Beaufays, “Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition,” arXiv preprint arXiv:1402.1128, 2014.

[7] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, “Show and tell: A neural image caption generator,” arXiv preprint arXiv:1411.4555, 2014.

[8] W. Zaremba, I. Sutskever, and O. Vinyals, “Recurrent neural network regularization,” arXiv preprint arXiv:1409.2329, 2014.

[9] O. İrsoy and C. Cardie, “Modeling compositionality with multiplicative recurrent neural networks,” arXiv preprint arXiv:1412.6577, 2014.

[10] X. Chen, Y. Wang, X. Liu, M. J. Gales, and P. C. Woodland, “Efficient GPU-based training of recurrent neural network language models using spliced sentence bunch,” submitted to Proc. ISCA Interspeech, 2014.

[11] K. Cho, B. van Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” arXiv preprint arXiv:1406.1078, 2014.

[12] J. Mao, W. Xu, Y. Yang, J. Wang, and A. Yuille, “Deep captioning with multimodal recurrent neural networks (m-RNN),” arXiv preprint arXiv:1412.6632, 2014.

[13] T. Mikolov, A. Joulin, S. Chopra, M. Mathieu, and M. Ranzato, “Learning longer memory in recurrent neural networks,” arXiv preprint arXiv:1412.7753, 2014.

[14] Y. Shao, “Learning sparse recurrent neural networks in language modeling,” PhD thesis, The Ohio State University, 2014.

[15] C. Weng, D. Yu, S. Watanabe, and B.-H. F. Juang, “Recurrent deep neural networks for robust speech recognition,” in Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, 2014, pp. 5532–5536.

[16] R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training recurrent neural networks,” in ICML (3), 2013, vol. 28, pp. 1310–1318.

[17] A. Graves, A.-r. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” arXiv preprint arXiv:1303.5778, 2013.

[18] A. Graves, N. Jaitly, and A.-r. Mohamed, “Hybrid speech recognition with deep bidirectional LSTM,” in Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on, 2013, pp. 273–278.

[19] A. Graves, “Generating sequences with recurrent neural networks,” arXiv preprint arXiv:1308.0850, 2013.

[20] T. M. Breuel, A. Ul-Hasan, M. A. Al-Azawi, and F. Shafait, “High-performance OCR for printed English and Fraktur using LSTM networks,” in Document Analysis and Recognition (ICDAR), 2013 12th International Conference on, 2013, pp. 683–687.

[21] N. Kalchbrenner and P. Blunsom, “Recurrent convolutional neural networks for discourse compositionality,” arXiv preprint arXiv:1306.3584, 2013.

[22] M. Sundermeyer, I. Oparin, J.-L. Gauvain, B. Freiberg, R. Schlüter, and H. Ney, “Comparison of feedforward and recurrent neural network language models,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, 2013, pp. 8430–8434.

[23] Y. Shi, W.-Q. Zhang, J. Liu, and M. T. Johnson, “RNN language model with word clustering and class-based output layer,” EURASIP Journal on Audio, Speech, and Music Processing, vol. 2013, no. 1, pp. 1–7, 2013.

[24] T. Mikolov and G. Zweig, “Context dependent recurrent neural network language model,” in SLT, 2012, pp. 234–239.

[25] D. Monner and J. A. Reggia, “A generalized LSTM-like training algorithm for second-order recurrent neural networks,” Neural Networks, vol. 25, pp. 70–83, 2012.

[26] V. Frinken, F. Zamora-Martínez, S. E. Boquera, M. J. C. Bleda, A. Fischer, and H. Bunke, “Long-short term memory neural networks language modeling for handwriting recognition,” in ICPR, 2012, pp. 701–704.

[27] M. Sundermeyer, R. Schlüter, and H. Ney, “LSTM neural networks for language modeling,” in INTERSPEECH, 2012.

[28] I. Sutskever, J. Martens, and G. E. Hinton, “Generating text with recurrent neural networks,” in Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, pp. 1017–1024.

[29] J. Hammerton, “Named entity recognition with long short-term memory,” in Proceedings of CoNLL-2003, 2003, pp. 172–175.

[30] S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber, “Gradient flow in recurrent nets: The difficulty of learning long-term dependencies,” Citeseer, 2001.

[31] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[32] F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget: Continual prediction with LSTM,” Neural Computation, vol. 12, no. 10, pp. 2451–2471, 2000.

[33] F. A. Gers, N. N. Schraudolph, and J. Schmidhuber, “Learning precise timing with LSTM recurrent networks,” The Journal of Machine Learning Research, vol. 3, pp. 115–143, 2003.

[34] F. Gers, “Long short-term memory in recurrent neural networks,” Unpublished PhD dissertation, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, 2001.