601.765 Machine Learning: Linguistic & Sequence Modeling
Spring 2019
Announcements
- The first day of class is Monday, Jan 28. See you there!
- Please fill out the questionnaire.
Administration
- Lectures: 3-4pm MWF, in Hackerman 320.
- Occasionally we may need to run 3-4:15pm instead, but any such change will be announced in advance.
- Instructor: Jason Eisner
- Office hours: 4-4:30pm after class, or by appointment.
- TA: Sabrina Mielke
- Office hours: Friday, 4-5pm, outside Hackerman 321
- Email: cs765-staff at cs.jhu.edu
- Discussion site: https://piazza.com/jhu/spring2019/601765/
- Class notes:
- Video lectures: via Blackboard
Homework
There are 4 homeworks:
- Homework 1: Slow general algorithms for sequence labeling
- Homework 2: Efficient finite-state methods
- Homework 3: Neural models
- Homework 4: A slightly different introduction to Deep Reinforcement Learning
Homeworks are to be submitted on Gradescope, following the instructions posted on Piazza.
Course overview
Catalog description: This course surveys formal ingredients that are used to build structured models of character and word sequences. We will unpack recent deep learning architectures that consider various kinds of latent structure, and see how they draw on earlier work in structured prediction, dimensionality reduction, Bayesian nonparametrics, multi-task learning, etc. We will also examine a range of strategies used for inference and learning in these models. Students will be expected to read recent papers and carry out a research project. [Applications or Analysis]
Prerequisites: EN.600/601.465/665 or permission. Prior coursework in statistics or machine learning is recommended. Students may wish to prepare for their choice of research project by taking EN.601.382 Deep Learning Lab at the same time.
Remarks:
- The focus of the class is on understanding the space of good options for designing probabilistic sequence models and computing with them. We will discuss the qualitative advantages and disadvantages of different options. Our goal is not to teach you exactly how today’s top-ranked system works, but rather to give you a toolbox for understanding and creating system designs.
- This class builds on the dynamic programming algorithms and log-linear models covered in NLP. We will primarily extend these to various neural (“log-nonlinear”) models, some of which still permit dynamic programming; a short illustrative sketch follows these remarks.
- As this is a graduate class, the lecture style will be a bit more improvisational than in NLP. The class is also still under development. We will probably only cover a subset of the topics on the syllabus.
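To make the connection to NLP concrete, the following is a minimal, illustrative sketch of Viterbi decoding for sequence labeling with a log-linear local score. It is not course code: the feature function, tag set, and weights are invented for illustration. Replacing score with a small neural network would give a “log-nonlinear” model of the kind discussed above, while the dynamic program itself stays the same.

    # Illustrative sketch (not course code): Viterbi decoding with a
    # log-linear local score.  Swapping `score` for a neural network
    # yields a "log-nonlinear" model; the dynamic program is unchanged.

    def features(prev_tag, tag, word):
        """Toy indicator features: one emission and one transition feature."""
        return {f"emit:{tag}:{word}": 1.0, f"trans:{prev_tag}:{tag}": 1.0}

    def score(weights, prev_tag, tag, word):
        """Log-linear local score: dot product of weights and features."""
        return sum(weights.get(name, 0.0) * value
                   for name, value in features(prev_tag, tag, word).items())

    def viterbi(words, tags, weights):
        """Return (total score, tag sequence) of the highest-scoring labeling."""
        # best[t] = (score of the best path ending in tag t, that path)
        best = {t: (score(weights, "<s>", t, words[0]), [t]) for t in tags}
        for word in words[1:]:
            new_best = {}
            for t in tags:
                candidates = [(best[p][0] + score(weights, p, t, word), best[p][1])
                              for p in tags]
                s, path = max(candidates, key=lambda c: c[0])
                new_best[t] = (s, path + [t])
            best = new_best
        return max(best.values(), key=lambda c: c[0])

    # Made-up example: should label "dogs bark" as N V.
    weights = {"emit:N:dogs": 2.0, "emit:V:bark": 2.0,
               "trans:<s>:N": 1.0, "trans:N:V": 1.0}
    print(viterbi(["dogs", "bark"], ["N", "V"], weights))

The same recursion computes marginals instead of best paths if max is replaced by (log-)sum, which is one of the maximization-vs.-summation choices discussed under "Recurring themes" below.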
Requirements (details TBA)
- Attending lectures
- Scribing? (i.e., drafting lecture notes)
- Reading papers?
- Homeworks
- Midterm exam?
- Final exam
- Final project
Topic list
Setting the stage
- Overview
- Sequence labeling as a canonical problem
Classical methods
- Notation and statistical background
- Algorithmic background: Paths in graphs
- Classical sequence labeling models
- Graphical models and belief propagation
Richer scoring functions
- Beyond dynamic programming: Approximation algorithms
- Feature / architecture engineering
- Neuralization
- Word embeddings
- Backprop and optimization methods
- Hyperparameter tuning (model selection)
- Deep generative models
Beyond sequence labeling
- Distributions over other discrete structures (trees, proofs, program runs)
- Transition systems for transduction and parsing
- Integration over hidden variables
- Reinforcement learning
- Continuous generalizations
- Kalman filters
- Poisson and Hawkes processes
- Gaussian processes
- Exchangeability
- Hierarchical modeling
- Types vs. tokens
- Infinite Gaussian mixture model
- Hierarchical Pitman-Yor language model
- Infinite HMM
Other possible topics (time permitting)
- Lambek calculus / CCG / automata / other models of grammaticality
- Spectral learning
- Structure learning
Recurring themes
In some sense, the point of the course is to explicitly show you the
collection of design choices that you face when building probabilistic
reasoning systems. Your choices will affect (1) how well your models
fit the theory and the data, (2) the computational complexity of inference
and training, and (3) the difficulty of implementing the system. A short worked illustration of two of these choices appears after the list below.
- Training objectives
- Joint vs. conditional
- Loss-infused training (train a policy) vs. loss-infused decoding (train a model)
- Forms of regularization
- Smooth vs. non-smooth objectives
- Convex vs. non-convex objectives
- End-to-end vs. pipelined training
- Inference objectives
- Maximization vs. summation; annealing
- Search and sampling
- Dual decomposition (for maximization)
- Variational approximation (for summation)
- Modeling schemes
- Global vs. local - and how local? (= lookahead vs. heuristics)
- Graph-based vs. transition-based (= subgraph features vs. history-based features)
- Tractable vs. faithful models
- Domain knowledge vs. generic architectures
- Types vs. tokens
- Model structure
- Weighting the training data
- Computational tricks of the trade and implementation know-how
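As a brief worked illustration of two of the choices above (the notation is generic, not taken from the course materials): joint training fits a distribution over inputs and outputs together, conditional training fits only the predictive distribution, and at inference time one can either maximize over outputs or sum over them.

    % Joint vs. conditional maximum-likelihood training objectives:
    \max_\theta \sum_i \log p_\theta\bigl(x^{(i)}, y^{(i)}\bigr)
        \quad\text{vs.}\quad
    \max_\theta \sum_i \log p_\theta\bigl(y^{(i)} \mid x^{(i)}\bigr)

    % Maximization vs. summation at inference time:
    \hat{y} = \arg\max_y \, p_\theta(y \mid x)
        \quad\text{vs.}\quad
    Z_\theta(x) = \sum_y \exp \mathrm{score}_\theta(x, y)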