Statistical and Computational Foundations of Machine Learning

Spring 2022

Sundays 5:00pm—7:30pm in Jacobs Academic Building, Room 473

- Instructor: Dávid Pál
- Email: david.pal@nyu.edu
- Virtual Zoom Office: https://nyu.zoom.us/my/dapal
- Ed Discussion Forum (for questions): https://edstem.org/us/courses/20105/discussion/
- Office Hours: Thursdays, 9:30am—11:00am

I do not have an office at NYU. I work from home. I come to NYU only to give lectures. If you need to contact me, please email me and/or arrange a Zoom or Google Hangouts call. Office hours are on Zoom.

In this course you will learn the theoretical foundations of machine
learning. We will explain how a machine learning algorithm can make accurate
predictions on unseen data. This property is called **generalization**. We
develop a statistical theory of generalization for prediction problems. Along
the way we analyze some popular machine learning methods (support vector
machines, boosting, perceptron, stochastic gradient descent, decision trees,
neural networks, k-nearest neighbor). Besides the generalization properties of
machine learning algorithms, we will also look at their computational complexity
and the computational hardness of the underlying optimization problems.

**The class will be completely theoretical. In lectures, I will give
definitions and prove theorems. In homeworks and exams, I will ask you to solve
mathematical problems and prove theorems. You will NOT do any coding, you will
NOT do any data analysis, and you will NOT build any machine learning
models.**

You must have taken courses in probability theory, linear algebra, and algorithms. You will benefit from prior exposure to machine learning (ML) (e.g. introductory ML course, neural network course, statistical inference course, practical experience). However, knowledge of ML is not a prerequisite. I will give a brief introduction to ML in the first lecture.

- Probability theory.
  - Random variables, discrete and continuous probability distributions, expectation, variance, correlation, covariance, independence, Bernoulli, binomial and Gaussian distributions, law of large numbers.
- Linear algebra.
  - Vector spaces, subspaces, inner products, Cauchy-Schwarz inequality, norms, triangle inequality, linear functions, matrices, vector-matrix multiplication, matrix multiplication, orthogonal matrices, eigenvalues, eigenvectors.
- Algorithms.
  - Arrays, dictionaries, sorting, searching, time and memory complexity, big O and big Omega notation, basics of NP-completeness.

I expect you to know one-dimensional calculus (limits, infimum, supremum, continuity, derivatives and integrals), mathematical notation (sets, functions, logical formulas), and elementary combinatorics (combinations, permutations, binomial coefficients). You must be comfortable doing mathematical proofs.

- Homeworks (40% of grade)
  - There will be 4 homework sets. You should spend a significant amount of time working on the homeworks. You can work in small groups and/or consult other materials or people. However, in your solution you have to list the members of your group, all the materials used, and all the people you consulted. Type your solutions in LaTeX, print them out, and hand them in in class.
- Midterm exam (30% of grade)
  - The midterm exam will take place during one of the lectures, roughly in the middle of the semester. The questions will have the same format as the homework problems. You can bring and use any amount of printed materials (notes, books, research papers). However, you cannot use any device connected to the internet (phones, tablets, computers).
- Final exam (40% of grade)
  - The final exam will take place during the final exam period, at the same time as the class. The format of the final exam will be exactly the same as the midterm exam.

Note that the total percentage adds up to 110%. That means you can make up lost points. If you want to do well in the course, make sure to submit solutions to all homework problems.
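For example (hypothetical scores), a student who earns 80% of the homework points, 70% on the midterm, and 85% on the final would receive

```latex
\underbrace{0.80 \times 40}_{\text{homeworks}}
+ \underbrace{0.70 \times 30}_{\text{midterm}}
+ \underbrace{0.85 \times 40}_{\text{final}}
= 32 + 21 + 34 = 87
```

points out of a nominal 100, even though no single component was close to perfect.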

There is one 2.5-hour class meeting per week, split by a 15-minute break. In accordance with NYU guidelines, you must wear a mask covering your nose and mouth whenever you are attending lecture.

Your solution must be turned in **both** by email **and** as a printed
hard copy in class by the due date. Send emails to david.pal@nyu.edu with the subject "Homework
#?". Use LaTeX and submit your solution as a PDF file. Use the following template.

| Homework | LaTeX source | Handout date | Due date | Graded by | Solution PDF | Solution LaTeX source |
|---|---|---|---|---|---|---|
| Homework #0 | LaTeX source | January 25 | February 1 | February 6 | | |
| Homework #1 | LaTeX source | February 6 | February 20 | February 27 | Solution PDF | Solution LaTeX source |
| Homework #2 | LaTeX source | February 20 | March 6 | March 13 | Solution PDF | Solution LaTeX source |
| Homework #3 | LaTeX source | March 27 | April 10 | April 17 | Solution PDF | Solution LaTeX source |
| Homework #4 | LaTeX source | April 10 | April 24 | May 1 | Solution PDF | Solution LaTeX source |

I am hoping to cover the topics listed below. Additional topics (decision trees, k-nearest neighbor, neural networks) might be added if time permits.

| Week | Date | Topics | Reading |
|---|---|---|---|
| Week 1 | January 30 | Generalization error, test error, optimal Bayes classifier | Chapters 1, 2, 3 |
| Week 2 | February 6 | Empirical Risk Minimization (ERM), overfitting, PAC model with a finite hypothesis class, Hoeffding's bound | Chapters 2, 3, 4; Appendix B |
| Week 3 | February 13 | Hoeffding's bound, PAC model, no-free-lunch theorem | Chapters 3, 4, 5; Appendix B |
| Week 4 | February 20 | VC dimension, Sauer's lemma, Radon's theorem, VC dimension of halfspaces, epsilon-nets, epsilon-approximations | Chapter 6 |
| Week 5 | February 27 | Epsilon-nets, epsilon-approximations, sample complexity upper and lower bounds | Chapters 6, 28 |
| Week 6 | March 6 | Measure theory, upper bound on the sample complexity for the agnostic PAC model | Chapters 6, 28 |
| Week 7 | March 13 | Midterm exam | |
| Week 8 | March 20 | Spring break (no lectures) | |
| Week 9 | March 27 | Non-uniform learning and computational complexity of learning | Chapters 7, 8 |
| Week 10 | April 3 | Online learning model, Halving algorithm, mistake bound, Perceptron, Winnow, online-to-batch conversion | Chapter 21 |
| Week 11 | April 10 | Hedge, Perceptron, online-to-batch conversion, boosting | Chapters 21, 10 |
| Week 12 | April 17 | Boosting, decision trees | Chapters 10, 18 |
| Week 13 | April 24 | Least squares, logistic regression, convex learning problems | Chapters 9, 12 |
| Week 14 | May 1 | Convex learning problems, stability, regularization | Chapters 12, 13 |
| Week 15 | May 8 | Regularization, stability, online gradient descent, stochastic gradient descent | Chapters 13, 14 |
| Week 16 | May 15 | Final exam | |


Collaboration is allowed on homework, but solutions must be written independently. If you use external sources (books, online documents), list them in your solutions.

*Discussion of high-level ideas for solving problem sets is allowed, but all solutions must be written up independently.* This reflects the reality that algorithms and machine learning research are rarely done alone. However, any researcher or practitioner must be able to communicate and work through the details of a solution individually. *Students must name any collaborators they discussed problems with at the top of each assignment (list them at the top of each problem separately).* **Do not write solutions in parallel with other students.** If problem ideas are discussed, solutions should be written at a later time, individually. I take this policy very seriously. Do not paraphrase, change variables, or in any way copy another student's solution.

Fair grading is important to me. If you feel that you were incorrectly docked points on an assignment or exam, please let me know - it is very possible I or the TAs misunderstood your work. Do not wait until the end of the semester to bring up grading issues! If you notice something off, please ask ASAP.

If you are struggling in the class, contact me as soon as possible so that we can discuss what’s not working and how we can work together to ensure you are mastering the material. It is difficult to address questions about performance at the end of the semester, or after final grades are submitted: by Tandon policy, no extra-credit or makeup work can be used to improve a student's grade once the semester closes. So if you are worried about your grade, seek help from me early.

The midterm and final will be open-book written exams. Each will be 2.5 hours long.

You can bring and use any amount of printed or hand-written material you want
(hand-written or printed notes, cheat sheets, textbooks, lecture notes, research
papers, etc.) All materials you use have to be in **paper** form. Use of
electronic devices will **not** be allowed. This includes computers,
laptops, tablets, e-readers, smartphones, smart-watches, headphones,
calculators, etc.

Any communication with anybody inside or outside the classroom during the
exam is strictly prohibited. Passing any materials or objects from one
student to another will **not** be allowed. If you would like to have a cheat
sheet or notes from a classmate, make sure to get a copy **before** the
exam.

Bring pens, pencils, and extra paper for writing. You can bring a ruler and
a compass if you want, but they will not be necessary or useful. I will bring
extra writing paper as well. Calculators of any kind will **not** be allowed.
You will have to do all your calculations on paper.

You can bring drinks and food to the exam.

- **Understanding Machine Learning: From Theory to Algorithms** by Shai Shalev-Shwartz and Shai Ben-David. This will be our primary textbook. A PDF of the textbook is freely available online.
- **Foundations of Machine Learning** by Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. This is a good alternative text. It covers fewer topics, but in slightly more depth.
- **Neural Network Learning: Theoretical Foundations** by Martin Anthony and Peter L. Bartlett. This is a comprehensive book on sample complexity bounds for the PAC model, the agnostic PAC model, and learning with real-valued losses.
- **An Introduction to Computational Learning Theory** by Michael J. Kearns and Umesh V. Vazirani. This book is a short and easy-to-read introduction to the PAC model.
- **Kernel Methods for Pattern Analysis** by John Shawe-Taylor and Nello Cristianini. This book is a comprehensive introduction to Support Vector Machines (SVMs) and other kernel methods.
- **Boosting: Foundations and Algorithms** by Robert E. Schapire and Yoav Freund. This book is about boosting and the theory behind it. The authors are the original inventors of boosting.

- **Learning Quickly When Irrelevant Attributes Abound: A New Linear-threshold Algorithm** by Nick Littlestone