Sequential Supervised Learning: General Methods for Sequence Labeling and Segmentation

by Thomas G. Dietterich, Oregon State University, USA

Many existing and emerging applications of machine learning and data mining involve the problem of labeling the elements of a sequence. Examples include information extraction from web pages, part-of-speech tagging in computational linguistics, protein and DNA sequence analysis, and computer intrusion detection. In all of these tasks, the training examples consist of pairs (X,Y), where X is a sequence of objects or events (x1, ..., xT), each described by a vector of features, and Y is a matching sequence of class labels (y1, ..., yT). Given a new sequence of objects X, the goal is to predict the corresponding sequence of labels Y. This is an example of "collective classification", in which each object xt is classified simultaneously with all of the other objects in the sequence. This talk will discuss practical, off-the-shelf machine learning methods for sequential supervised learning and describe our experience with applications in computational linguistics, information extraction, and bioinformatics.
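To make the (X,Y) formulation concrete, the sketch below shows one well-known way to reduce sequence labeling to ordinary classification: a sliding window, in which the features of each xt are concatenated with those of its neighbors and paired with the label yt. The abstract does not say which methods the talk covers, so this is an illustrative assumption and not necessarily the speaker's approach. The function names, the window half-width, and the padding vector are all hypothetical.

```python
from typing import List, Sequence, Tuple

FeatureVector = Tuple[float, ...]


def sliding_windows(
    xs: Sequence[FeatureVector], half_width: int, pad: FeatureVector
) -> List[FeatureVector]:
    """Return one flattened window of features per position t.

    The window for position t concatenates x[t - half_width] through
    x[t + half_width]; positions outside the sequence are filled with `pad`.
    """
    windows = []
    for t in range(len(xs)):
        window: List[float] = []
        for offset in range(-half_width, half_width + 1):
            i = t + offset
            window.extend(xs[i] if 0 <= i < len(xs) else pad)
        windows.append(tuple(window))
    return windows


def to_classification_examples(
    X: Sequence[FeatureVector],
    Y: Sequence[str],
    half_width: int = 1,
    pad: FeatureVector = (0.0,),
) -> List[Tuple[FeatureVector, str]]:
    """Turn one (X, Y) sequence pair into per-position (window, label) pairs."""
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same length T")
    return list(zip(sliding_windows(X, half_width, pad), Y))


# Toy sequence with T = 3 and a single feature per element (hypothetical data).
X = [(1.0,), (2.0,), (3.0,)]
Y = ["a", "b", "a"]
examples = to_classification_examples(X, Y)
for window, label in examples:
    print(window, label)
```

Each resulting (window, label) pair can be handed to any standard classifier. Note that this reduction predicts every yt independently, so it does not capture dependencies between neighboring labels, which is what distinguishes the collective setting described above.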

