Machine Learning, Spring 2019

Jacobs University Bremen, Spring 2019, Herbert Jaeger

Class sessions: Mondays 8:15-9:45 (Lecture Hall Res. III) and Wednesdays 8:15-9:45 (Lecture Hall Res. III)

Tutorial sessions: Tuesdays 17:15-18:30, West Hall 4.

TAs: Steven Abreu (s.abreu at jacobs-university.de) and Tianlin Liu (t.liu at jacobs-university.de)

Course description. Machine learning (ML) is all about algorithms which are fed with (large quantities of) real-world data, and which return a compressed "model" of the data. An example is the "world model" of a robot: the input data are sensor data streams, from which the robot learns a model of its environment -- needed, for instance, for navigation. Another example is a spoken language model: the input data are speech recordings, from which ML methods build a model of spoken English -- useful, for instance, in automated speech recognition systems. There is a large number of formalisms in which such models can be cast, and an equally large diversity of learning algorithms. However, there is a relatively small number of fundamental challenges which are common to all of these formalisms and algorithms: most notably, the "curse of dimensionality" and the almost deadly-dangerous problem of under- vs. overfitting. This lecture introduces such fundamental concepts and illustrates them with a choice of elementary model formalisms (linear classifiers and regressors, radial basis function networks, clustering, mixtures of Gaussians, Parzen windows). Furthermore, the course also provides a refresher of the requisite concepts from probability theory, statistics, and linear algebra.

Homework. There will be two kinds of homework, which are treated quite differently. A. Paper-and-pencil problems. These homeworks give an opportunity to exercise the theoretical concepts introduced in the lecture. They will not be checked or graded, and doing them is not mandatory. Instead, the problems will be discussed and solved step by step in the weekly tutorial sessions held by the TA. Model solutions will be put online a week after the problem sheets are issued. B. Programming miniprojects. The other type of homework comes in the form of small machine learning programming projects. Students work in teams of two; each team submits a single solution by email to the TA, consisting of the code and documentation (a typeset PDF, preferably generated with LaTeX; other word processing software is allowed). These miniproject homeworks will be graded. Programming can be done in Matlab or Python.

Grading and exams: The final course grade will be composed of programming homeworks (20%), quizzes (50%), and a final exam (30%). There will be three quizzes (written in class, 30 minutes); the best two of these will each account for 25% of the final grade, and the worst will be dropped. All quizzes and the final exam are open book.
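
To make the weighting concrete, here is a small Python sketch of this composition rule; the percentage weights are the ones stated above, while the example scores are purely hypothetical.

    # Sketch of the grade composition; weights are taken from the course rules,
    # the example scores are made up.
    def final_grade(miniprojects, quizzes, final_exam):
        # miniprojects, final_exam: scores in percent; quizzes: list of three quiz scores
        best_two = sorted(quizzes)[-2:]              # the worst quiz is dropped
        quiz_part = sum(0.25 * q for q in best_two)  # each remaining quiz counts 25%
        return 0.20 * miniprojects + quiz_part + 0.30 * final_exam

    print(final_grade(miniprojects=85, quizzes=[60, 90, 75], final_exam=80))
    # 0.20*85 + 0.25*90 + 0.25*75 + 0.30*80 = 82.25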

Quiz makeup rules: if a quiz is missed without excuse, it will be graded with 0 points. For quizzes that are medically excused according to the Jacobs rules (in particular, the medical excuse must be announced to me before the quiz), one makeup will be offered soon after the quiz. Non-medical excuses can be accepted and makeups arranged on a case-by-case basis. If the first makeup is likewise missed for medical reasons, similar rules apply for admission to a second makeup (the medical excuse must be announced to me before the makeup). The second makeup then consists of sitting the quiz in next year's edition of this course; alternatively, the student may opt to have the grade of the final exam also counted as the grade for the missed quiz.

The 2018 final exam for your private study and preparation is here. And the solutions are here.

Fully self-contained lecture notes are here (version 1.11, last update Mar 5; change: typo in Appendix A (eqn. 74) corrected).

Schedule (this will be filled in as we go along, in synchrony with reality)

Feb 6 Introduction
Feb 11 Introducing the TICS example. Continuous <-> discrete data transformations. Reading: Sections 2.1, 2.2 in the lecture notes
Feb 13 A quick recap of basic concepts from probability theory. Reading: Appendix A in the lecture notes Exercise sheet 1 | Solutions
Feb 18 The curse of dimensionality and the concept of manifolds in high-dimensional vector spaces. Reading: Section 2.3
Feb 20 The field of ML: overview and navigation guide. Reading: Section 3 Exercise sheet 2 | Solutions
Feb 25 Basics of pattern classification. A look in passing at decision trees. Optimal decision boundaries. Reading: Section 4 of LNs
Feb 27 Dimension reduction through vector quantization: K-means clustering. Reading: LN Section 5.1 Exercise sheet 3 | Solutions | Miniproject 1 The first programming miniproject - this will be graded! (An illustrative K-means sketch, not a model solution, is given below the schedule.)
Mar 4 Principal Component Analysis - principle. Reading: LN Section 5.2
Mar 6 PCA - mathematical properties, algorithm. Eigendigits etc. Reading: LN Section 5.3, 5.4, 5.5 Exercise sheet 4 (paper and pencil exercise, not to be returned, not graded) | Solutions ... and at noontime: first miniquiz. Time: 12:45-13:15 Location: CNLH
Mar 11 Linear regression, part 1. Reading: LN Sections 6.1 and 6.2, up to and including Equation 19.
Mar 13 Linear regression, part 2. Reading: LN Section 6, complete.
Mar 18 A probability refresher: expectation, variance, covariance. -- Training and testing errors. Reading: LN Appendix D and LN Section 7.1 Exercise sheet 5 | Solutions
Mar 20 The problem of overfitting. Reading: LN Section 7.2
Mar 25 Supervised learning: formal theory. Risk minimization through adapting model size. Reading: LN Sections 7.3, 7.4
Mar 27 Cross-validation: the key to everybody's success in ML. Exercise sheet 6 | Solutions | Miniproject 2 The second programming miniproject - this will be graded! --- Reading: LN Section 7.5 ... and at noontime: second miniquiz. Time: 12:45-13:15 Location: CNLH
Apr 1 Using regularization to fight overfitting. Ridge regression. Reading: LN Sections 7.6, 7.7
Apr 3 Why it's called the Bias-Variance Dilemma. Reading: LN Section 7.8 Exercise sheet 7 | Solutions
Apr 8 Neural networks: introduction. Historical forefather: the perceptron. No reading. Slides
Apr 10 A zoo of neural networks -- "connectionist" networks, associative memory networks, Boltzmann machines, spiking networks. No reading.
Apr 24 The multilayer perceptron: architecture. Universal approximation property. Reading: LN Section 8.2
Apr 29 Why deep is good.
May 6 Training MLPs: general scheme. Remarks on model optimization by gradient descent. Reading: LN Sections 8.3, 8.4
May 8 The backpropagation algorithm. A wonderful webpage! Exercise (fun) Nr. 8 | Reading: LN Section 8.5 ... and at noontime: third miniquiz. Time: 12:45-13:15 Location: CNLH
May 13 Introduction to recurrent neural networks. This material will not be queried in the final exam!
May 15 Reservoir computing. This material will not be queried in the final exam!
May 23 9:00-11:00, SSC Hall 3 and 4: Final exam
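
As announced in the Feb 27 entry above, here is a minimal, illustrative Python/NumPy sketch of K-means clustering: alternate between assigning every data point to its nearest codebook vector and recomputing each codebook vector as the mean of its assigned points. This is not a model solution for Miniproject 1; the random initialization, stopping criterion, and array shapes are assumptions of this sketch.

    import numpy as np

    def kmeans(X, K, n_iter=100, seed=0):
        # X: (N, d) array of data points; K: number of codebook vectors (cluster centers)
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=K, replace=False)]  # initialize with K data points
        for _ in range(n_iter):
            # assignment step: index of the nearest center for every data point
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # update step: each center becomes the mean of its assigned points
            new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                    else centers[k] for k in range(K)])
            if np.allclose(new_centers, centers):  # codebook has converged
                break
            centers = new_centers
        return centers, labels

    # toy usage: two well-separated Gaussian blobs in the plane
    X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 5.0])
    centers, labels = kmeans(X, K=2)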

 

References

The online lecture notes are self-contained, and no further literature is necessary for this course. However, if you want to study some topics in more depth, the following are recommended references.

Bishop, Christopher M.: Neural Networks for Pattern Recognition (Oxford Univ. Press, 1995). IRC: QA76.87 .B574 1995. A recommendable basic reference (beyond the online lecture notes).

Bishop, Christopher M.: Pattern Recognition and Machine Learning (Springer Verlag, 2006). Much more up-to-date and comprehensive than the previously mentioned Bishop book, but I dare say too thick and advanced for an undergraduate course (730 pages) -- more like a handbook for practitioners. To find your way into ML, the older, slimmer Bishop book will work better.

Michie, D., Spiegelhalter, D.J., Taylor, C.C.: Machine Learning, Neural and Statistical Classification (1994) Free and online at http://www.amsta.leeds.ac.uk/~charles/statlog/ and at the course resource repository. A transparently written book, concentrating on classification. Good backup reading. Thanks to Mantas for pointing this out!

Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edition (John Wiley, 2001). IRC: Q327 .D83 2001. Covers more than the Bishop book, more detailed and more mathematically oriented. A backup reference for the deep probers.

T. Hastie, R. Tibshirani, J. Friedman: The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer Verlag, 2001). IRC: Q325.75 .H37 2001. I found this book only recently and haven't studied it in detail, but it looks extremely well written, combining (statistical) maths with applications and principal methods of machine learning, and is full of illuminating color graphics. May become my favourite.

Farhang-Boroujeny, B.: Adaptive Filters, Theory and Applications (John Wiley, 1999). IRC: TK7872.F5 F37 1998. Some initial portions of this book describe online linear filtering with the LMS algorithm, which will possibly be covered in the course.

Goodfellow, I., Bengio, Y., Courville, A.: Deep learning. MIT Press, 2016. Legal online version available. The "bible" of deep learning. 

Mitchell, Tom M.: Machine Learning (McGraw-Hill, 1997). IRC: Q325.5 .M58 1997. More general and more comprehensive than the course; covers many branches of ML that are not treated here. Gives a good overview of the larger picture of ML.

Nabney, Ian T.: NETLAB: Algorithms for Pattern Recognition (Springer Verlag, 2001). IRC: TA1637 .N33 2002. A companion book to the Bishop book, concentrating on Matlab implementations of the main techniques described in the Bishop book. Matlab code is public and can be downloaded from http://www.aston.ac.uk/eas/research/groups/ncrg/resources/netlab/

Brownlee, J.: (author's own publication, online at the author's ML service portal). A decidedly user-friendly, hands-on intro to linear algebra, targeting ML usage, with Python exercises.