Bias, Variance, and Regularization in Linear Regression

How to troubleshoot errors in prediction: After training, the hypothesis may still produce large errors in its predictions. Can these errors be reduced further, and if so, how? There are several options: (a) getting more training examples, (b) trying a smaller set of features, (c) trying additional features, (d) trying polynomial features, and (e) increasing or decreasing the regularization parameter lambda. But we cannot pick any of these at random.

Evaluating a hypothesis: A hypothesis may have very low error on the training examples but still be inaccurate in prediction, possibly due to overfitting. To evaluate a hypothesis we can split the given dataset into two sets: a training set and a test set. We learn the weights by minimizing the error on the training set and then compute the test set error.
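The split-and-evaluate step above can be sketched as follows. This is a minimal illustration using NumPy only; the 70/30 split ratio and the synthetic data are assumptions, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = 1.5 * X[:, 0] + rng.normal(0, 0.5, size=100)

# Shuffle, then split into a training set and a test set (70/30 here).
idx = rng.permutation(len(X))
split = int(0.7 * len(X))
train, test = idx[:split], idx[split:]

# Fit the weights on the training set only (least squares with a bias column).
Xb = np.hstack([np.ones((len(X), 1)), X])
w = np.linalg.lstsq(Xb[train], y[train], rcond=None)[0]

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

train_error = mse(Xb[train] @ w, y[train])
test_error = mse(Xb[test] @ w, y[test])
```

A large gap between `train_error` and `test_error` is the symptom of overfitting that the split is designed to expose.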

Model selection: If the learning algorithm fits a training set very well, this does not mean it is a good hypothesis. To choose a model, we have to evaluate each candidate polynomial degree d, and doing so fairly requires a validation set.

Cross Validation Set: We introduce an important intermediate set, the cross validation set, which we can use to select d. The whole dataset can be split into a training set, a cross validation set, and a test set in a 60%/20%/20% manner, though this is not a hard and fast rule. We can then compute three separate errors, one for each set.

Using the Cross Validation Set: We first optimize the weights on the training set for each polynomial degree. We then find the polynomial degree d with the least error on the cross validation set. Finally, we estimate the generalization error using the test set. This way, the degree d has not been tuned using the test set.
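The three-way procedure above can be sketched as follows, again using NumPy only. The synthetic data, the 60/20/20 split, and the degree range 1-8 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
y = np.sin(2 * x) + rng.normal(0, 0.1, 200)

# 60% training, 20% cross validation, 20% test.
idx = rng.permutation(200)
tr, cv, te = idx[:120], idx[120:160], idx[160:]

def fit(deg, rows):
    return np.polyfit(x[rows], y[rows], deg)

def err(w, rows):
    return float(np.mean((np.polyval(w, x[rows]) - y[rows]) ** 2))

# Step 1: train weights for each degree on the training set.
degrees = range(1, 9)
weights = {d: fit(d, tr) for d in degrees}

# Step 2: pick the degree with the lowest cross validation error.
best_d = min(degrees, key=lambda d: err(weights[d], cv))

# Step 3: report the generalization error on the test set,
# which played no part in choosing d.
test_error = err(weights[best_d], te)
```

Because `te` is never consulted until the last line, `test_error` is an honest estimate of generalization error for the chosen degree.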

Bias vs. Variance: Large prediction errors are mainly due to either bias or variance. High bias corresponds to underfitting and high variance to overfitting; we need to find a balance between the two. The training error tends to decrease as we increase the degree of the polynomial, whereas the cross validation error first decreases up to a point and then increases as we increase d.

Regularization and Bias/Variance: Just as d contributes to bias or variance, so does the regularization parameter lambda. A large lambda heavily penalizes all the weights, giving high bias (underfitting). A small lambda gives high variance (overfitting). An intermediate lambda is just right.
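The effect of lambda on the weights can be sketched with a closed-form ridge-style regularized fit. This is an illustrative NumPy sketch; the data, the degree-8 feature map, and the lambda values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 60)
y = x ** 3 - x + rng.normal(0, 0.1, 60)

# Degree-8 polynomial features, column 0 being the bias term.
X = np.vander(x, 9, increasing=True)

def ridge_fit(lam):
    # Closed-form regularized least squares: (X'X + lam*P)^-1 X'y,
    # where P leaves the bias term unpenalized.
    P = lam * np.eye(9)
    P[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + P, X.T @ y)

# A small lambda leaves the weights free to grow (high variance);
# a large lambda shrinks them toward zero (high bias).
w_small = ridge_fit(1e-6)
w_large = ridge_fit(1e4)
```

Comparing the weight norms of `w_small` and `w_large` shows the shrinkage directly: the heavily regularized fit is pulled close to a constant function.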

Learning Curves: If a learning algorithm is suffering from high bias, getting more training data will not help much. If a learning algorithm is suffering from high variance, getting more training data is likely to help.
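A learning curve can be sketched by fitting a deliberately high-bias model (a straight line through quadratic data) on growing training sets and tracking both errors. The data and the sizes swept are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, 300)
y = x ** 2 + rng.normal(0, 0.1, 300)

cv = slice(200, 300)  # held-out cross validation points
train_errs, cv_errs = [], []
for m in (10, 50, 100, 200):
    w = np.polyfit(x[:m], y[:m], 1)  # degree-1 model: underfits a parabola
    train_errs.append(float(np.mean((np.polyval(w, x[:m]) - y[:m]) ** 2)))
    cv_errs.append(float(np.mean((np.polyval(w, x[cv]) - y[cv]) ** 2)))
```

In the high-bias regime both errors plateau at a similarly large value as m grows, which is exactly why adding more data does not help much here.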

Troubleshooting errors in prediction: (1) getting more training examples fixes high variance, (2) trying smaller sets of features fixes high variance, (3) adding features fixes high bias, (4) adding polynomial features fixes high bias, (5) decreasing lambda fixes high bias, and (6) increasing lambda fixes high variance.

Diagnosing Neural Networks: A neural network with fewer parameters is prone to underfitting. It is also computationally cheaper. A large neural network with more parameters is prone to overfitting. It is also computationally expensive. In this case you can use regularization (increase λ) to address the overfitting.

Bias-Variance trade-off: A complex model (high polynomial degree) leads to high variance and low bias. A simple model (low polynomial degree) leads to low variance and high bias.

Regularization Effects: A small value of lambda allows the model large variance, which leads to overfitting. A large value of lambda pulls the weight parameters toward zero, leading to large bias and underfitting.

Machine Learning System Design: There are different ways to approach a machine learning problem: (a) collect lots of data, (b) develop sophisticated features, and (c) develop algorithms that process the input in different ways.