Ensemble Methods combine multiple supervised learners (often the same kind of model, trained on different data or with different weights) to improve performance.
Bagging
“Bootstrap Aggregation”
Method to reduce over-fitting. Won't help if the existing model doesn't already over-fit. Uses the bootstrap technique, which creates random resamples of the training data (drawn with replacement).
* Repeat K times:
    Bootstrap an N'-size training set: sample N' data points from the original training set, with replacement
    Train a classifier on this set
* To test, run each classifier:
    For classification, use a voting method for the final prediction
    For regression, use a function of the classifier outputs, e.g., their average (a sketch of the full loop appears after this list)
Boosting (see below) can also be used to weight the base classifiers
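A minimal sketch of the bagging loop above, assuming scikit-learn decision trees as the base classifier (the names `K`, `N_prime`, and `bagged_predict` are illustrative, not from the notes):

```python
# Bagging sketch: K bootstrap resamples of size N', one tree per resample,
# majority vote at prediction time.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, random_state=0)

K = 25                # number of bootstrap rounds
N_prime = len(X)      # bootstrap sample size (commonly N' = N)

classifiers = []
for _ in range(K):
    # Sample N' points with replacement from the original training set
    idx = rng.integers(0, len(X), size=N_prime)
    classifiers.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

def bagged_predict(X_new):
    # Each classifier votes; the most common label wins
    votes = np.stack([clf.predict(X_new) for clf in classifiers])
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

print(bagged_predict(X[:5]), y[:5])
```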
Random Forests
Bagging applied to decision trees.
* Problem: bootstrapping alone doesn't work well for constructing forests from very large data sets
    Why: when the data set is large, the bootstrap samples look nearly identical, so the trees tend to do the same thing (they are highly correlated)
    Solution: also randomly subsample the features available for decision nodes
    Conventional approach is to make a new random subset of features available at each node construction (as sketched below)
    Can also apply a “block” restriction, limiting an entire tree to a fixed feature subset. This is more data-efficient, but less effective
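A sketch of the conventional per-node feature subsetting, assuming scikit-learn's RandomForestClassifier (the parameter values here are illustrative, not from the notes):

```python
# Random forest = bagged trees + a fresh random feature subset at every split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # number of bagged trees
    bootstrap=True,       # bootstrap over data points
    max_features="sqrt",  # random subset of features considered at each node
    random_state=0,
).fit(X, y)

print(forest.score(X, y))
```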
Boosting
Method to increase model complexity (reduce bias). Builds a weighted combination of weak learners.
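As a rough sketch (notation assumed, not from the notes), the final boosted predictor is a weighted sum of weak learners $h_m$ with learned weights $\alpha_m$:

$$F(x) = \sum_{m=1}^{M} \alpha_m \, h_m(x)$$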
Gradient Boosting
Boosting technique for regression. Called “gradient” boosting because, under squared-error loss, the error residuals are (up to a constant) the negative gradient of the MSE cost function.
* Learn a regression predictor
* Compute the error residual
* Learn to predict the residual
The error residual gives subsequent predictors a useful training target: it discourages them from paying attention to data that are already well-predicted. With each iteration, the variance of the error residual should decrease and it should become more uniform.
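A minimal sketch of this loop for regression under squared error, assuming scikit-learn regression trees as the base learners (`M`, `lr`, and the toy data are illustrative choices, not from the notes):

```python
# Gradient boosting sketch: repeatedly fit a small tree to the current residuals.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)

M = 50                           # boosting rounds
lr = 0.1                         # shrinkage applied to each new predictor
F = np.full_like(y, y.mean())    # initial constant predictor
trees = []

for _ in range(M):
    residual = y - F                          # residual = negative MSE gradient (up to a constant)
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    trees.append(tree)
    F += lr * tree.predict(X)                 # each learner corrects what is left over

def predict(X_new):
    out = np.full(len(X_new), y.mean())
    for tree in trees:
        out += lr * tree.predict(X_new)
    return out

print(np.mean((predict(X) - y) ** 2))         # training MSE shrinks as M grows
```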
AdaBoost