Logistic Regression In Machine Learning

In logistic regression the hypothesis is a logistic function (most commonly the sigmoid function) applied to the linear sum of the inputs, whereas in linear regression the hypothesis is just the linear sum itself. The output of the sigmoid hypothesis always lies between 0 and 1, so it can be used for classification.
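As a minimal sketch of this hypothesis, the snippet below applies the sigmoid to a linear sum. The names sigmoid, hypothesis, theta and x are illustrative choices, and the input x is assumed to carry a leading 1 for the bias term.

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def hypothesis(theta, x):
    """Sigmoid of the linear sum theta[0]*x[0] + theta[1]*x[1] + ...

    Assumes x[0] == 1 so that theta[0] acts as the bias term.
    """
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

print(sigmoid(0.0))  # 0.5, the midpoint of the curve
```

Large positive linear sums push the output toward 1 and large negative sums push it toward 0, which is what makes the value usable as a class score.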

In classification the predicted value is discrete, unlike regression, in which the predicted values are continuous. In binary classification there are two possible values: 0 for the negative class and 1 for the positive class. If the hypothesis value is less than 0.5 the prediction is 0, and if it is 0.5 or greater (by the usual convention) the prediction is 1.

The hypothesis value is the estimated probability that the output is 1 for the given input. For example, if the hypothesis returns 0.65 for some input, the model estimates a 65% probability that the output is 1 (and so a 35% probability that it is 0).
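The thresholding rule can be sketched in a few lines. The function name predict and its threshold parameter are hypothetical, and treating a value of exactly 0.5 as class 1 is a convention assumed here.

```python
def predict(probability, threshold=0.5):
    """Turn a hypothesis value (probability of class 1) into a 0/1 label."""
    # Assumption: a value exactly at the threshold is assigned to class 1.
    return 1 if probability >= threshold else 0

print(predict(0.65))  # 1
print(predict(0.30))  # 0
```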

Decision Boundary

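The decision boundary is the curve that separates the region where we predict y=0 from the region where we predict y=1. It may be a straight line or, when higher-order features are used, some other curve such as a circle. The decision boundary is determined by the hypothesis function and its parameters: it is the set of points where the hypothesis equals 0.5.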
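To illustrate a circular boundary, the sketch below uses hand-picked (not learned) parameters on the features 1, x1, x2, x1^2 and x2^2. With theta = (-1, 0, 0, 1, 1) the linear sum is x1^2 + x2^2 - 1, so the boundary is the unit circle. Both the feature choice and the parameter values are assumptions made for the example.

```python
def predict_circle(x1, x2, theta=(-1.0, 0.0, 0.0, 1.0, 1.0)):
    """Predict 0 or 1 with a circular decision boundary.

    The features are (1, x1, x2, x1^2, x2^2). The sigmoid of z is at
    least 0.5 exactly when z >= 0, so the sign of z decides the class.
    """
    features = (1.0, x1, x2, x1 * x1, x2 * x2)
    z = sum(t * f for t, f in zip(theta, features))
    return 1 if z >= 0 else 0

print(predict_circle(0.0, 0.0))  # 0, inside the unit circle
print(predict_circle(2.0, 0.0))  # 1, outside the unit circle
```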

Cost Function

If we take the cost function for logistic regression to be the mean squared error, as in linear regression, it will have many local minima. That is, if we square the differences between the hypothesis values and the actual labels and take their mean, the result is a non-convex function of the parameters, because the hypothesis is a sigmoid. To avoid these local minima, the cost is instead defined in terms of the log of h(x), with two cases for binary classification: -log(h(x)) when y=1 and -log(1-h(x)) when y=0. Defining the cost in terms of logs guarantees that the cost function is convex, so gradient descent will not get stuck in local minima.
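A minimal sketch of the log-based cost, averaged over the training set, might look like the following. The names cost, X and y, the small clipping constant eps, and the toy data are assumptions for illustration. The two cases are combined in the usual single expression -y*log(h) - (1-y)*log(1-h).

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def cost(theta, X, y):
    """Average log cost over all training examples.

    X is a list of feature lists (each starting with 1 for the bias)
    and y is a list of 0/1 labels.
    """
    eps = 1e-15  # keeps h away from exactly 0 or 1 so log() stays finite
    total = 0.0
    for x, label in zip(X, y):
        h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        h = min(max(h, eps), 1.0 - eps)
        total += -label * math.log(h) - (1 - label) * math.log(1.0 - h)
    return total / len(X)

# Hypothetical toy data: with all-zero parameters every h(x) is 0.5.
X = [[1.0, 2.0], [1.0, 3.0]]
y = [0, 1]
print(cost([0.0, 0.0], X, y))  # log(2), roughly 0.693
```

The cost is close to 0 when the hypothesis agrees confidently with the label and grows without bound as a confident prediction turns out to be wrong.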

Gradient Descent

Gradient descent can be used to minimize the logistic regression cost function as well. There are also more advanced optimization methods, such as conjugate gradient, BFGS, and L-BFGS, which are often faster than classical gradient descent. These methods do not require choosing a learning rate by hand, which is an advantage over gradient descent, but they are more complex.
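As a rough sketch of plain batch gradient descent for logistic regression, the code below repeatedly moves the parameters against the gradient of the log cost. The learning rate alpha, the iteration count, and the tiny data set are all hypothetical values chosen for the example, not recommendations.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def hypothesis(theta, x):
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    """Batch gradient descent on the log cost, starting from all zeros."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        # Prediction error h(x) - y for every training example.
        errors = [hypothesis(theta, x) - label for x, label in zip(X, y)]
        # Partial derivative of the cost with respect to each parameter.
        grad = [sum(e * x[j] for e, x in zip(errors, X)) / m for j in range(n)]
        # Simultaneous update of all parameters.
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

# Hypothetical toy data: one feature plus a leading 1 for the bias term.
X_train = [[1.0, -1.5], [1.0, -0.5], [1.0, 0.5], [1.0, 1.5]]
y_train = [0, 0, 1, 1]
theta_fit = gradient_descent(X_train, y_train)
print([1 if hypothesis(theta_fit, x) >= 0.5 else 0 for x in X_train])
```

The learning rate alpha has to be chosen by hand here, which is exactly the tuning step that the more advanced optimizers avoid.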

Multiclass Classification

In multiclass classification there are more than two classes to predict. In binary logistic regression we have only two classes {0,1}, but here we have several classes or categories, say {0,1,...,n}. Multiclass classification can be done with the one-vs-all approach: we take one class as the positive class, group all the other classes into a single negative class, and train a binary logistic regression classifier. This is repeated once for every class (n+1 times for the classes {0,1,...,n}), and the final prediction is the class whose classifier returns the highest hypothesis value.
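The two pieces of one-vs-all can be sketched as follows: relabeling the data for one classifier, and picking the class with the highest hypothesis value. The parameter vectors in thetas are hand-picked for illustration rather than trained, and all function names are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def hypothesis(theta, x):
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def one_vs_all_labels(y, positive_class):
    """Relabel the data: 1 for the chosen class, 0 for every other class."""
    return [1 if label == positive_class else 0 for label in y]

def predict_class(thetas, x):
    """Return the index of the classifier with the highest hypothesis value."""
    scores = [hypothesis(theta, x) for theta in thetas]
    return max(range(len(scores)), key=lambda c: scores[c])

# Hypothetical parameters for three classes on the features (1, x1, x2).
thetas = [
    [1.0, -2.0, -2.0],   # class 0: points near the origin
    [-1.0, 2.0, -1.0],   # class 1: points with large x1
    [-1.0, -1.0, 2.0],   # class 2: points with large x2
]
print(one_vs_all_labels([0, 1, 2, 1], positive_class=1))  # [0, 1, 0, 1]
print(predict_class(thetas, [1.0, 3.0, 0.0]))             # 1
```

In practice each row of thetas would come from training a binary classifier on the relabeled data for that class.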

The Problem of Overfitting

The problem of overfitting arises when there are too many features: the hypothesis may fit the training set very well but fail to generalize to new examples. In this case the fitted curve passes through the training data almost perfectly. There are two main options to address overfitting. One is to reduce the number of features, and the other is regularization. In regularization we keep all the features but reduce the magnitudes of the parameters.
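One common way to reduce the magnitudes of the parameters is to add a squared (L2) penalty to the cost. The sketch below assumes that particular form, a hypothetical regularization strength lam, and the usual convention of leaving the bias parameter theta[0] unpenalized. None of these specifics come from the text above.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def regularized_cost(theta, X, y, lam):
    """Average log cost plus an L2 penalty on every parameter but the bias."""
    m = len(X)
    eps = 1e-15  # keeps log() finite when h is numerically 0 or 1
    total = 0.0
    for x, label in zip(X, y):
        h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        h = min(max(h, eps), 1.0 - eps)
        total += -label * math.log(h) - (1 - label) * math.log(1.0 - h)
    # Assumption: theta[0] (the bias) is excluded from the penalty.
    penalty = (lam / (2.0 * m)) * sum(t * t for t in theta[1:])
    return total / m + penalty

# Hypothetical toy data with a leading 1 for the bias term.
X = [[1.0, 2.0], [1.0, 3.0]]
y = [0, 1]
print(regularized_cost([0.0, 2.0], X, y, lam=0.0))
print(regularized_cost([0.0, 2.0], X, y, lam=1.0))  # larger by the penalty term
```

A larger lam makes large parameter values more expensive, so minimizing this cost pulls the parameters toward zero and tends to give a smoother fit.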
In the next post, we will discuss simple artificial neural networks for machine learning.