### Understanding Basics of Machine Learning - ML Algo, Linear Regression, Cost Function, Gradient Descent

We will cover the following machine learning basics:

- Hypothesis for linear regression: $h_\theta(x) = \theta_0 + \theta_1 x$.
- The cost function $J(\theta_0, \theta_1)$. To determine $\theta_0$ and $\theta_1$, our goal is to minimize $J$.
- To minimize $J$, one method is to take partial derivatives, equate them to zero, and solve (the normal equation); the other method to determine the parameters is the gradient descent method.
- Feature scaling. The idea of feature scaling is to bring all features onto a similar scale. One of the best methods of feature scaling is mean normalization. We generally declare convergence if $J$ decreases by less than $10^{-3}$ in one iteration.
- The normal equation as the other method to solve multivariate linear regression. It is generally recommended when the number of features is less than 1000; for larger values, gradient descent is more useful.
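The mean-normalization idea described above can be sketched as follows (a minimal sketch with NumPy; the housing data values are illustrative):

```python
import numpy as np

def mean_normalize(X):
    """Mean normalization: (x - mean) / (max - min), per feature."""
    mu = X.mean(axis=0)                   # per-feature mean
    rng = X.max(axis=0) - X.min(axis=0)   # per-feature range
    return (X - mu) / rng

# Two features on very different scales: house size (sq. ft.) and bedrooms.
X = np.array([[2104.0, 5.0],
              [1416.0, 3.0],
              [1534.0, 3.0],
              [ 852.0, 2.0]])
X_norm = mean_normalize(X)
# After scaling, every feature has mean 0 and lies roughly in [-1, 1].
```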

## The two basic definitions of machine learning

1st, by Arthur Samuel (1959): Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.

2nd, by Tom Mitchell (1998): A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.


## Machine Learning Algorithms: Supervised and unsupervised Learning

Supervised learning is when the right answers (output/target variables) are given along with the features (input variables) in the training data. In supervised learning we have regression and classification: regression is when the output is continuous-valued, and classification is when the output is discrete-valued.

Unsupervised learning is when the right answers (output/target variables) are not given. Clustering is an example of unsupervised learning; social network analysis, market segmentation, astronomical data analysis, and organizing computing clusters all come under clustering.


## Linear Regression with one variable

First we build a model for linear regression with one variable, taking housing price prediction as an example. In this example the size ($x$) of a house is taken as the input variable/feature, and the price ($y$) is taken as the output/target variable. The learning algorithm for this problem is trained with a training set. After training we get a hypothesis $h$, which takes the size of a house as input and gives the estimated price as output.

Representation of $h$:

$$h_\theta(x) = \theta_0 + \theta_1 x$$

where $\theta_0$ and $\theta_1$ are parameters.

How to choose the $\theta_i$'s: for this we have to study the cost function.
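As a minimal sketch, the hypothesis above in plain Python (the parameter values below are made up purely for illustration):

```python
def hypothesis(theta0, theta1, x):
    """h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# With illustrative parameters theta0 = 50, theta1 = 0.1 (price in $1000s),
# a 2104 sq. ft. house is estimated at 50 + 0.1 * 2104 = 260.4:
price = hypothesis(50.0, 0.1, 2104.0)
```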

## Cost Function

The cost function is:

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

where $\theta_0$ and $\theta_1$ are parameters, the summation runs over all training data, and $m$ is the total number of rows (examples) in the training set.

Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$.

To determine $\theta_0$ and $\theta_1$, our goal is to minimize $J$.

To minimize $J$, we take partial derivatives of $J$ w.r.t. $\theta_0$ and $\theta_1$, equate them to zero, and solve; the resulting formulae give the values of $\theta_0$ and $\theta_1$. This is the normal equation.

The other method to determine $\theta_0$ and $\theta_1$ is the gradient descent method.
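The cost function above can be written directly in Python (a sketch; the data values are illustrative):

```python
def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = (1 / 2m) * sum of squared prediction errors."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# For data generated by y = 1 + 2x, the cost at (theta0, theta1) = (1, 2)
# is exactly zero, and any other parameters give a larger cost.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]
print(cost(1.0, 2.0, xs, ys))  # 0.0
print(cost(0.0, 0.0, xs, ys) > 0)  # True
```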

## Gradient Descent Method

The update rule is:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \quad \text{(for } j = 0, 1 \text{, updated simultaneously)}$$

Here $\alpha$ is the learning rate. If $\alpha$ is too small, gradient descent can be slow. And if $\alpha$ is too large, gradient descent can overshoot the minimum; it may fail to converge or even diverge.

Procedure:

- Start with some $\theta_0$ and $\theta_1$.
- Keep changing $\theta_0$ and $\theta_1$ to reduce $J$ until we hopefully reach a minimum.
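The procedure above can be sketched for one-variable linear regression (plain Python; the data, learning rate, and iteration count are illustrative):

```python
def step(theta0, theta1, xs, ys, alpha):
    """One simultaneous gradient descent update for linear regression."""
    m = len(xs)
    errs = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    grad0 = sum(errs) / m                             # dJ / d theta0
    grad1 = sum(e * x for e, x in zip(errs, xs)) / m  # dJ / d theta1
    return theta0 - alpha * grad0, theta1 - alpha * grad1

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]       # generated by y = 1 + 2x
theta0, theta1 = 0.0, 0.0       # start with some theta0 and theta1
for _ in range(5000):
    theta0, theta1 = step(theta0, theta1, xs, ys, alpha=0.05)
# theta0, theta1 converge toward (1, 2)
```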

## Linear Regression with Multiple Variables (Multivariate Linear Regression)

Notation: $n$ is the total number of features, $m$ is the total number of training examples, and $x_j^{(i)}$ is the value of feature $j$ in the $i$-th training example. The hypothesis becomes $h_\theta(x) = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n$.

As with one variable, we generally declare convergence if $J(\theta)$ decreases by less than $10^{-3}$ in one iteration.

The other method to solve multivariate linear regression is the normal equation, $\theta = (X^T X)^{-1} X^T y$. It is generally recommended when the number of features is less than 1000; for larger values, gradient descent is more useful.
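A sketch of the normal equation with NumPy on toy data (illustrative values; `np.linalg.solve` is used instead of an explicit matrix inverse for numerical stability):

```python
import numpy as np

# Training data generated by y = 1 + 2x (one feature).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Design matrix: prepend a column of ones for the intercept term theta0.
X = np.column_stack([np.ones_like(x), x])

# Normal equation: theta = (X^T X)^{-1} X^T y, solved as a linear system.
theta = np.linalg.solve(X.T @ X, X.T @ y)
# theta is approximately [1, 2], recovering the intercept and slope.
```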
