601.765 Machine Learning: Linguistic & Sequence Modeling

Spring 2019




We will probably have four homeworks (plus a term project). Homeworks will be posted here.

Homeworks are to be submitted on Gradescope, following the Piazza instructions.

Old homeworks (from Spring 2018)

Last year’s homework assignments are listed here for reference. You can expect some similar assignments this year.

Course overview

Catalog description: This course surveys formal ingredients that are used to build structured models of character and word sequences. We will unpack recent deep learning architectures that consider various kinds of latent structure, and see how they draw on earlier work in structured prediction, dimensionality reduction, Bayesian nonparametrics, multi-task learning, etc. We will also examine a range of strategies used for inference and learning in these models. Students will be expected to read recent papers and carry out a research project. [Applications or Analysis]

Prerequisites: EN.600/601.465/665 or permission. Prior coursework in statistics or machine learning is recommended. Students may wish to prepare for their choice of research project by taking EN.601.382 Deep Learning Lab at the same time.


Requirements (details TBA)

Topic list

Setting the stage

  1. Overview
  2. Sequence labeling as a canonical problem

Classical methods

  1. Notation and statistical background
  2. Algorithmic background: Paths in graphs
  3. Classical sequence labeling models
  4. Graphical models and belief propagation
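To make the "paths in graphs" view of classical sequence labeling concrete, here is a minimal sketch of Viterbi decoding for an HMM-style tagger, where the best tag sequence is the highest-scoring path through a trellis. The tag set and log-space scores below are hypothetical toy values chosen for illustration, not course materials:

```python
def viterbi(obs, tags, start, trans, emit):
    """Return the highest-scoring tag sequence for obs (log-space scores)."""
    # best[i][t] = score of the best tag path for obs[:i+1] ending in tag t
    best = [{t: start[t] + emit[t][obs[0]] for t in tags}]
    back = [{}]
    for i in range(1, len(obs)):
        best.append({})
        back.append({})
        for t in tags:
            # Choose the predecessor tag that maximizes the path score.
            prev = max(tags, key=lambda p: best[i - 1][p] + trans[p][t])
            best[i][t] = best[i - 1][prev] + trans[prev][t] + emit[t][obs[i]]
            back[i][t] = prev
    # Trace the argmax path backward from the best final tag.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(obs) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

# Toy example (hypothetical parameters): two tags, two words.
tags = ["N", "V"]
start = {"N": -0.3, "V": -1.5}
trans = {"N": {"N": -1.0, "V": -0.4}, "V": {"N": -0.6, "V": -1.2}}
emit = {"N": {"fish": -0.5, "sleep": -2.0},
        "V": {"fish": -1.3, "sleep": -0.4}}
print(viterbi(["fish", "sleep"], tags, start, trans, emit))  # ['N', 'V']
```

The same dynamic program reappears later in the course with richer (e.g., neural) scoring functions in place of the hand-set transition and emission scores.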

Richer scoring functions

  1. Beyond dynamic programming: Approximation algorithms
  2. Feature / architecture engineering
  3. Neuralization
  4. Word embeddings
  5. Backprop and optimization methods
  6. Hyperparameter tuning (model selection)
  7. Deep generative models

Beyond sequence labeling

  1. Distributions over other discrete structures (trees, proofs)
  2. Transition systems for transduction and parsing
  3. Integration over hidden variables
  4. Reinforcement learning
  5. Continuous generalizations
  6. Exchangeability
  7. Hierarchical modeling

Other possible topics (time permitting)

  1. Lambek calculus / CCG / automata / other models of grammaticality
  2. Spectral learning
  3. Structure learning

Recurring themes

In some sense, the point of the course is to make explicit the collection of design choices you face when building probabilistic reasoning systems. Your choices will affect (1) how well your models fit the theory and the data, (2) the computational complexity of inference and training, and (3) the difficulty of implementing the system.