# Machine Learning, Spring 2018

#### Jacobs University Bremen, Spring 2018, Herbert Jaeger

**Class
sessions:**** **Wednesdays 8:15 – 9:30
Thursdays 9:45-11:00, lecture hall in Res. 1

**Tutorial
sessions: **Tuesdays 19:00 – 20:15, lecture hall in Res. 1

**TAs**: Tayyab Mateen (t.mateen at
jacobs-university.de) and Tianlin Liu (t.liu at
jacobs-university.de)

**Course
description. **Machine learning (ML) is all about algorithms
which are fed with (large quantities of) real-world data, and which
return a compressed model' of the data. An example is
the world model' of a robot: the input data are sensor data
streams,
from which the robot learns a model of its environment -- needed, for
instance, for navigation. Another example is a spoken language model:
the input data are speech recordings, from which ML methods build a
model of spoken English -- useful, for instance, in automated speech
recognition systems. There is a large number of formalisms in which
such
models can be cast, and an equally large diversity of learning
algorithms. However, there is a relatively small number of fundamental
challenges which are common to all of these formalisms and algorithms:
most notably, the "curse of dimensionality'' and the almost
deadly-dangerous problem of under- vs.\ overfitting. This lecture
introduces such fundamental concepts and illustrates them with a
choice of elementary model formalisms (linear classifiers and
regressors, radial basis function networks, clustering, mixtures of
Gaussians, Parzen windows). Furthermore, the course also provides a
refresher of the requisite concepts from probability theory,
statistics, and linear algebra.

**Homework. **There will be two kinds of homeworks,
which are treated quite differently. **A. Paper-and-pencil
problems**.
These homeworks give an opportunity to exercise the theoretical
concepts introduced in the lecture. These homeworks will not be
checked or graded, and doing them is not mandatory. Instead, the
problems will be discussed and show-solved in weekly tutorial sessions
held by the TAs. Model solutions will be put online a week after
issuing the problem
sheets. **B. Programming miniprojects.** The other type
of
homework comes in the form of small-sized machine learning
programming projects. Students work in teams of two, each team
submitting a single solution, by email to the TAs, consisting of the
code and a documentation (typeset pdf document, preferably generated
in Latex, other word processing software allowed). These miniproject
homeworks will be graded. Programming can be done in Matlab or
Python.

**Grading and exams:** Grading and exams:
The final course grade will be composed from programming homeworks
(20%), presence
sheets (10%), and quizzes/exams. There will be three miniquizzes
(written in class, 30 minutes), the best two of which will each
account to 20% of the final grade (worst will be dropped), and one
final exam, counting 30%. All quizzes and the final exam are open
book.

**Fully self-contained lecture notes** are
here
(Version 1.5, April 17, 2018. *Change w.r.t. previous version:
corrected a number of typos, also in formulas. Now checked up to (and
including) Section 7*).

**Schedule **(this
will be filled in synchrony with reality as we go along)

Feb 1 | Introduction: What this course aims to achieve |

Feb 7 | Introducing the TICS
example. Continuous -> discrete data transformations: binning and
clustering. Reading: Sections 2.1, 2.2 in the lecture
notes |

Feb 8 | Discrete -> continuous
data transformations. A quick recap of basic concepts from probability
theory. Reading: Appendix A in the lecture notes Exercise
sheet 1 |

Feb 14 | The curse of
dimensionality and the concept of manifolds in high-dimensional vector
spaces. Reading: Section 2.3 |

Feb 15 | Data manifolds, continued. The field of ML: overview and
navigation guide. Reading: Section 3 Exercise
sheet 2 |

Feb 21 | Basics of pattern
classification. A look in passing at decision trees. Reading:
Section 4 of LNs. |

Feb 22 | Optimal
decision boundaries. Reading: Section 4 Exercise
sheet 3 |

Feb 28 | Optimal decision
boundaries, finished. Dimension reduction by feature
extraction. Reading: Section 5 up to (excluding) 5.1 |

Mar 1 | K-means clustering. Reading:
Section 5.1. Exercise
sheet 4 |

Mar 6 | 19:00 CNLH: first
miniquiz |

Mar 7 | Principal
Component Analysis - principle Reading: LN Section 5.2 |

Mar 8 | PCA - mathematical properties,
algorithm. Reading: LN Section 5.3, 5.4. Exercise
sheet 5 |

Mar 14 | no
class |

Mar 15 | PCA - mathematical properties. Eigendigits. Exercise sheet 6 |

Mar 21 | Linear regresssion,
part 1. reading: LN Section 6.1, 6.2 up to (including) Equation
(19). |

Mar 22 | Linear regresssion,
part 2. reading: LN Section 6, complete. Exercise
sheet 7 |

Apr 4 | Discussion of
HW 5. Discussion of scientific writing ethics. The overfitting
problem. Reading: LN Section 7.1 |

Apr 5 | Bias-variance dilemma: examples. A quick refresher on the
concepts of random variables and expectation. |

Apr 11 | Bias-variance
dilemma: formalization. One remedy: model size
adaptation. Reading: LN Sections 7.3, 7.4 |

Apr 12 | Cross-validation. Exercise
sheet 9 Reading: LN Section 7.5 |

Apr 12 | 19:00 CNLH: second
miniquiz |

Apr 18 | Regularization, ridge regression, and why it is called
bias-variance dilemma. Reading: LN
Sections 7.6, 7.7, 7.8 |

Apr 19 | The
multilayer perceptron. Architecture. (Slides
from a 2008 Spring School course on neural networks, check out slides
44-84). Reading: LN Section 8.1 Exercise
sheet 10 |

Apr 25 | Universal approximation
properties. Training: task definition. Reading: LN
Section 8.2, 8.3 |

Apr 26 | Training MLPs:
general set-up. Optimizing empirical risk by gradient descent: general
principle. Reading: LN Section 8.3, 8.4 Exercise
sheet 11 (a substantial programming task, more than 2 weeks
processing time - yet, start early!) |

May 2 | Remaining sessions undocumented since webpage was hacked |

May 3 | |

May 9 | |

May 16 | |

May 31 | Final exam, 9:00-11:00, East Wing, IRC (pre-announcement, subject to confirmation) |

**References**

*The online lecture notes are
self-contained, and no further literature is necessary for this
course. However, if you want to study some topics in more depth, the
following are recommended references.*

Bishop, Christopher
M.: Neural Networks for Pattern Recognition (Oxford
Univ. Press, 1995.) IRC: QA76.87 .B574 1995* A recommendable basic
reference (beyond the online lecture notes) *

Bishop,
Christopher M.: Pattern Recognition and Machine Learning. Springer
Verlag, 2006 *Much
more up-to-date and comprehensive than the previously mentioned
Bishop book, but I dare say too thick and advanced for an
undergraduate course (730 pages) -- more like a handbook for
practicians. To find your way into ML, the older, slimmer Bishop book
will work better. *

Michie, D., Spiegelhalter, D.J.,
Taylor, C.C.: *Machine Learning, Neural and Statistical
Classification* (1994) Free and online at http://www.amsta.leeds.ac.uk/~charles/statlog/
and at the course
resource repository. *A transparently written book,
concentrating on classification. Good backup reading. Thanks to Mantas
for pointing this out!*

Duda, R.O., Hart, P.E., Stork,
D.G.: *Pattern Classification*, 2nd edition (John Wiley, 2001)
IRC: Q327 .D83 2001 *Covers more than the Bishop book, more
detailed and more mathematically oriented. Backup reference for the
deep probers*

T. Hastie, R. Tibshirani, J. Friedman,
*The Elements of Statistical Learning: Data Mining, Inference, and
Prediction*. Springer Verlag 2001. IRC: Q325.75 .H37
2001*
I have found this book only recently and haven't studied it in detail
–
looks extremely well written, combining (statistical) maths with
applications and principal methods of machine learning, full of
illuminating color graphics. May become my favourite. *

Farhang-Boroujeny, B.: *Adaptive Filters, Theory and
Applications* (John Wiley, 1999). IRC: TK7872.F5 F37 1998
*Some
initial portions of this book describe online linear filtering with
the
LMS algorithm, which will possibly be covered in the course*

Goodfellow, I., Bengio, Y., Courville, A.: Deep learning. MIT
Press, 2016. Legal online
version available. *The "bible" of deep
learning. *

Mitchell, Tom M.: *Machine
Learning* (McGraw-Hill, 1997) IRC: Q325.5 .M58 1997. *More
general and more comprehensive than the course, covers many branches
of
ML that are not treated in the course. Gives a good overview of the
larger picture of ML*

Nabney, Ian T.: *NETLAB:
Algorithms for Pattern Recognition* (Springer Verlag, 2001). IRC:
TA1637 .N33 2002. *A
companion book to the Bishop book, concentrating on Matlab
implementations of the main techniques described in the Bishop book.
Matlab code is public and can be downloaded from http://www.aston.ac.uk/eas/research/groups/ncrg/resources/netlab/
*

Brownlee, J.: (author's own publication, online at author's ML service portal
). * A decidedly user-friendly, hands-on intro to linear algebra, targetting ML usage, with Python exercises.*