Supervised Learning Setup and Bias-Variance Trade-off

Let us recall that the goal of supervised learning is to find the function that best estimates the mapping from some inputs X to some outputs Y. This mapping is what we call the target function. The goal of this lecture is to measure the performance of a model. Keep in mind that when we train a machine learning model, we don’t just want it to memorize the training data; we want it to generalize to data it hasn’t seen before. To check this, we keep a held-out set that we call the test data, consisting of examples the model hasn’t seen. The goal is for the model to perform well on both the training data and the test data.

0. Setup

  In supervised learning, the output Y typically takes one of two forms:

  1. Multiclass classification: Y is a discrete label in {1, 2, 3, …, k}
  2. Regression: Y is a real number (R)
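To make the two output types concrete, here is a minimal sketch of a toy target function for each case. The function names and label choices are illustrative assumptions, not part of the lecture:

```python
def classify_sign(x):
    """Toy multiclass target: maps a real input to one of k=3 labels {1, 2, 3}."""
    if x < 0:
        return 1
    elif x == 0:
        return 2
    return 3


def regress_square(x):
    """Toy regression target: maps a real input to a real number."""
    return x * x
```

The point is only that a classifier's output lives in a finite label set, while a regressor's output ranges over the real numbers.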

I. Training Set, Validation Set and Test Set

  1. Validation set: the set used for hyperparameter tuning (finding the best hyperparameters). Examples of hyperparameters: for an ANN, the number of hidden units in each layer; for polynomial regression, the degree. The model sees this data but doesn’t learn from it: we use the validation results to decide on hyperparameters, so the validation set affects the model only indirectly, through the chosen hyperparameters. The validation set is also known as the dev set or development set.
  2. Test set: a set of examples used only to assess the final performance of the model.
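The three-way split above can be sketched in a few lines of plain Python. The 60/20/20 proportions and the function name are illustrative assumptions; any fractions that sum to at most 1 would work:

```python
import random


def split_dataset(examples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the examples, then split them into train/validation/test lists.

    Whatever is left after the train and validation fractions becomes the test set.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Example: split 100 examples 60/20/20.
train, val, test = split_dataset(list(range(100)))
```

Shuffling before splitting matters: if the data is ordered (say, by class), a naive contiguous split would give the three sets different distributions.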

II. Loss Function

Suppose you have a model and you want a metric that tells you how good it is, for example to present the model to a client, or to compare several models and choose the best one. This is where loss functions come into play: a loss function measures how far the model’s predictions are from the true outputs.
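As a sketch, here are two standard loss functions written in plain Python: mean squared error for regression, and cross-entropy (average negative log-likelihood of the true class) for classification. The function names are my own; the formulas are the usual textbook definitions:

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: average of the squared differences (regression loss)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_entropy(y_true, probs, eps=1e-12):
    """Average negative log-probability assigned to the true class.

    y_true: list of integer class indices.
    probs:  list of predicted probability vectors, one per example.
    eps clamps probabilities away from zero so log() stays finite.
    """
    return -sum(math.log(max(p[t], eps))
                for t, p in zip(y_true, probs)) / len(y_true)


# Example: predictions [1, 2, 5] against targets [1, 2, 3]
# give squared errors (0 + 0 + 4) averaged over 3 examples.
regression_loss = mse([1, 2, 3], [1, 2, 5])
```

A lower loss means better predictions, so comparing models reduces to comparing their losses on the same held-out data.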

III. Generalization Error

IV. Reduce overfitting

In this section, we will briefly present, without going into details, some of the ways used to reduce overfitting. Notice that I said reduce, rather than eliminate, overfitting: a good model will probably still overfit at least a little bit. Every algorithm has its own ways to reduce overfitting, but there are some common approaches:

  1. Simplify the model
  2. Use k-fold cross-validation for validation and hyperparameter tuning
  3. Dimensionality reduction and feature selection
  4. Early stopping, Regularization…
  5. Ensemble Learning
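To illustrate one item from the list, here is a minimal sketch of k-fold cross-validation: the data indices are partitioned into k folds, and each fold takes one turn as the validation set while the rest form the training set. The function name is illustrative:

```python
import random


def k_fold_indices(n, k, seed=0):
    """Partition indices 0..n-1 into k folds and yield (train_idx, val_idx) pairs.

    Each of the k rounds uses a different fold as the validation set,
    so every example is validated on exactly once.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # shuffle for random fold assignment
    folds = [idx[i::k] for i in range(k)]     # round-robin split into k folds
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train_idx, val_idx


# Example: 10 examples, 5 folds -> 5 rounds of 8 train / 2 validation indices.
splits = list(k_fold_indices(10, 5))
```

Averaging the validation loss over the k rounds gives a more stable performance estimate than a single split, which in turn makes hyperparameter choices less sensitive to how the data happened to be divided.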
