Thursday, January 31, 2008 7:47 AM
JeffB
AdaBoost
Two of the more recent algorithms in the field of machine learning
are Support Vector Machines and AdaBoost. I think both are a little
over ten years old, dating from the mid-1990s in their current form.

The idea behind AdaBoost is quite simple: build a strong classifier
from a set of potentially weak classifiers. The weak classifiers are
chosen so that, when combined, they perform much better than any one
of them does alone.
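
To make the "combined they perform much better" point concrete, here is a toy sketch. The labels and the three weak classifiers are made up for illustration, and it uses a plain majority vote, which is a simplification: AdaBoost itself gives each classifier its own voting weight.

```python
# Labels for six hypothetical samples: +1 = has the feature, -1 = does not.
labels = [1, 1, 1, -1, -1, -1]

# Predictions of three hypothetical weak classifiers.
# Each one is wrong on two samples, but never on the same two.
weak = [
    [-1, -1, 1, -1, -1, -1],  # wrong on samples 0 and 1
    [1, 1, -1, 1, -1, -1],    # wrong on samples 2 and 3
    [1, 1, 1, -1, 1, 1],      # wrong on samples 4 and 5
]

def accuracy(pred, labels):
    """Fraction of samples where the prediction matches the label."""
    return sum(p == y for p, y in zip(pred, labels)) / len(labels)

# Simple (unweighted) majority vote over the three weak classifiers.
combined = [1 if sum(votes) > 0 else -1 for votes in zip(*weak)]

for h in weak:
    print("weak classifier accuracy:", accuracy(h, labels))
print("combined accuracy:", accuracy(combined, labels))
```

Each weak classifier gets 4 of 6 samples right, while the vote gets all 6, because the mistakes do not overlap. That is the property you want from the classifiers you pick.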

One of the classic applications of this technique is face detection.
Let's say you have several thousand different filters you can apply to
an image. You apply each filter to a set of training images, some of
which have the feature you are looking for and some of which don't.
Each filter outputs a real number for each image. You then choose a
threshold for each filter so that it classifies more than 50% of the
training images correctly, that is, better than chance. Then just pick
the filter with the highest accuracy.
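
The threshold-and-pick step described above might look roughly like this. It is a minimal sketch: the function names, the `polarity` flag (which side of the threshold counts as a positive), and the tiny response table are my own assumptions, not taken from any particular paper or library.

```python
def stump_predict(value, threshold, polarity):
    """Return +1 or -1; polarity picks which side of the threshold is positive."""
    return 1 if polarity * value >= polarity * threshold else -1

def best_filter(responses, labels):
    """Try every filter, threshold and polarity; keep the most accurate one.

    responses[j][i] is the real-valued output of filter j on training image i.
    labels[i] is +1 if image i has the feature, -1 otherwise.
    Returns (accuracy, filter index, threshold, polarity).
    """
    best = None
    for j, row in enumerate(responses):
        # Only the observed outputs need to be tried as thresholds.
        for threshold in sorted(set(row)):
            for polarity in (1, -1):
                wrong = sum(
                    stump_predict(v, threshold, polarity) != y
                    for v, y in zip(row, labels)
                )
                acc = 1 - wrong / len(labels)
                if best is None or acc > best[0]:
                    best = (acc, j, threshold, polarity)
    return best

# Hypothetical outputs of two filters on four training images.
responses = [
    [0.1, 0.4, 0.35, 0.8],  # filter 0: overlaps a little between classes
    [0.2, 0.3, 0.7, 0.9],   # filter 1: separates the classes cleanly
]
labels = [-1, -1, 1, 1]

acc, j, threshold, polarity = best_filter(responses, labels)
print(acc, j, threshold, polarity)
```

Trying both polarities is what guarantees each filter can do at least as well as chance: if a threshold gets fewer than half right one way round, flipping it gets more than half right.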

You repeat this as long as there are more filters to choose from, or
until a high enough accuracy is achieved. However, at each stage,
before choosing a new filter, we re-weight the data samples: those
that were correctly classified get a lower weight and those that were
incorrectly classified get a higher weight. Accuracy is then measured
against these weights, so as you progress you find filters that are
specific to the difficult cases.
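
Here is a rough sketch of that loop. It assumes the filters have already been thresholded into +1/-1 predictions, and it uses the weighting rule from Freund and Schapire's formulation (`alpha = 0.5 * ln((1 - eps) / eps)`), which this post does not spell out. The function names and the toy data are hypothetical.

```python
import math

def adaboost(preds, labels, n_rounds):
    """preds[j][i] is the +1/-1 prediction of weak classifier j on sample i.

    Returns a list of (classifier index, alpha) pairs in the order chosen.
    """
    n = len(labels)
    weights = [1.0 / n] * n   # start with every sample equally important
    scores = [0.0] * n        # running weighted vote of the chosen classifiers
    chosen = []
    for _ in range(n_rounds):
        # Weighted error of every candidate under the current weights.
        errors = [
            sum(w for w, p, y in zip(weights, row, labels) if p != y)
            for row in preds
        ]
        j = min(range(len(preds)), key=errors.__getitem__)
        eps = errors[j]
        if eps >= 0.5:         # nothing beats chance on the re-weighted data
            break
        eps = max(eps, 1e-10)  # avoid dividing by zero for a perfect classifier
        alpha = 0.5 * math.log((1 - eps) / eps)
        chosen.append((j, alpha))
        # Correct samples (y * p = +1) shrink, incorrect ones (y * p = -1) grow.
        weights = [
            w * math.exp(-alpha * y * p)
            for w, p, y in zip(weights, preds[j], labels)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
        scores = [s + alpha * p for s, p in zip(scores, preds[j])]
        if all((s > 0) == (y > 0) for s, y in zip(scores, labels)):
            break              # the combined classifier fits the training set
    return chosen

def predict(chosen, preds):
    """Sign of the alpha-weighted vote of the chosen classifiers."""
    n = len(preds[0])
    totals = [sum(alpha * preds[j][i] for j, alpha in chosen) for i in range(n)]
    return [1 if t > 0 else -1 for t in totals]

# Toy data: three weak classifiers, each wrong on a different pair of samples.
labels = [1, 1, 1, -1, -1, -1]
preds = [
    [-1, -1, 1, -1, -1, -1],
    [1, 1, -1, 1, -1, -1],
    [1, 1, 1, -1, 1, 1],
]
chosen = adaboost(preds, labels, n_rounds=3)
print(chosen)
print(predict(chosen, preds) == labels)
```

On this toy data the first classifier is picked in round one, its two mistakes are weighted up, and the later rounds pick the classifiers that get those samples right.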

There are some mathematically provable facts about error bounds, and
papers on how to choose the weights at each step, but this is the
basic idea. Probably the hardest aspect of this problem is how to
choose the set of filters or weak classifiers that you want to use.
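
On the weights and error bounds mentioned above, this is the usual textbook statement from Freund and Schapire's formulation, written out as a sketch. The symbols are my own notation, not taken from this post: labels are $y_i \in \{-1, +1\}$, $h_t$ is the weak classifier chosen at round $t$, and $w_{t,i}$ is the weight of sample $i$ at round $t$.

```latex
\begin{align*}
\epsilon_t &= \sum_{i \,:\, h_t(x_i) \neq y_i} w_{t,i}
  && \text{(weighted error of the chosen weak classifier)} \\
\alpha_t &= \tfrac{1}{2} \ln\frac{1 - \epsilon_t}{\epsilon_t}
  && \text{(its vote in the final classifier)} \\
w_{t+1,i} &= \frac{w_{t,i}\, e^{-\alpha_t y_i h_t(x_i)}}{Z_t}
  && \text{($Z_t$ rescales the weights to sum to 1)} \\
H(x) &= \operatorname{sign}\Big(\sum_{t=1}^{T} \alpha_t h_t(x)\Big) \\
\text{training error of } H &\le \prod_{t=1}^{T} 2\sqrt{\epsilon_t (1 - \epsilon_t)}
  \le \exp\Big(-2 \sum_{t=1}^{T} \gamma_t^2\Big),
  \qquad \gamma_t = \tfrac{1}{2} - \epsilon_t
\end{align*}
```

For example, a weak classifier with weighted error $\epsilon_t = 0.25$ gets $\alpha_t = \tfrac{1}{2}\ln 3 \approx 0.55$. As long as every round beats chance by some margin $\gamma_t > 0$, the bound on the training error shrinks exponentially with the number of rounds.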

Here is the classic paper on face detection using this technique.