Introduction to Machine Learning | Linear Regression

Machine Learning (ML) is the field of study that gives computers the ability to learn without being explicitly programmed. It is a subset of Artificial Intelligence (AI) concerned with systems that automatically learn and improve from experience. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

Types of Machine Learning Algorithms

Machine learning (ML) algorithms can be broadly categorized into two main types:

Supervised learning

Supervised learning algorithms are trained using labeled data, where the input comes with corresponding target values (labels). The model learns to predict these target values based on the input data.

Examples

  • Predicting House Prices: Using historical data of house features (size, location, etc.) and their prices to predict the price of new houses.
  • Email Spam Classifier: Using a labeled dataset of emails (spam or not spam) to classify new emails.

Unsupervised learning

Unsupervised learning algorithms find hidden patterns or intrinsic structures in data without pre-existing labels. The model tries to learn the patterns and structure from the data itself.

Examples

  • Finding Groups of Similar Customers: Clustering customers into different groups based on purchasing behavior to tailor marketing strategies.
  • Detecting Abnormal Server Access Patterns: Identifying unusual access patterns to detect potential security threats.

Key Machine Learning Problems

Supervised Learning

  • Classification: It is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known. Target values are discrete classes. An algorithm that implements classification, especially in a concrete implementation, is known as a classifier.
  • Regression: Regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the ‘outcome variable’) and one or more independent variables (often called ‘predictors’, or ‘features’). The most common form of regression analysis is linear regression. Regression analysis is primarily used for two conceptually distinct purposes. First, regression analysis is widely used for prediction and forecasting. Second, in some situations regression analysis can be used to infer causal relationships between the independent and dependent variables. Likewise, an algorithm that implements regression is called a regressor.

Unsupervised Learning

  • Clustering: Clustering involves grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups. Clustering is used in problems such as customer segmentation in marketing or grouping similar documents.
  • Anomaly Detection: Anomaly detection is the identification of rare items, events, or observations that raise suspicion by differing significantly from the majority of the data. Commonly used algorithms include Isolation Forest, One-Class SVM, and autoencoders. Applications include fraud detection in finance and identifying defective products in manufacturing.

Machine Learning Algorithms

Classification
  • Logistic regression
  • Multinomial logistic regression
  • Probit regression
  • Support vector machines
  • Linear discriminant analysis

Regression
  • Linear & multivariable linear regression
  • Polynomial regression
  • Stepwise regression

Each algorithm has its strengths and applications. Exploring different algorithms and tailoring them to your data and problem can lead to more effective solutions.

Linear Regression

Linear regression is a fundamental statistical method in machine learning that models the relationship between a dependent variable and one or more independent variables. It’s widely used for predictive analysis in various fields, such as finance, healthcare, and social sciences. This method is essential for developing a foundational understanding of more complex machine learning algorithms.

Linear regression uses a linear approach for modeling the relationship between a scalar response (y or dependent variable) and one or more explanatory variables (x or independent variables). The dependent variable (target value y) is the outcome we want to predict, while the independent variables (features (x)) are the factors or predictors used to make this prediction.

With a single independent variable, the approach is called Simple Linear Regression; with more than one independent variable (x1, x2, ...), it is known as Multiple Linear Regression.
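
Written out, the two forms of the model are:

Simple: y = \theta_0 + \theta_1 x

Multiple: y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n

where the \theta values are the model parameters, explained in the example below.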

In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data. Consider this example:

Example

In a linear regression model that predicts house prices based on house features, the relationship is modeled as:

Y = \theta_0 + \theta_1 x
  • Dependent Variable (Y): House Price (Predicted)
  • Independent Variable (x): Size of the house. (We could use multiple features, but for now let's consider only one.)
  • Model Parameters (\theta_0, \theta_1): These are the coefficients that the model estimates from the data, representing the relationship between the independent variable (x) and the dependent variable (Y).

Note: In Linear Regression, all independent variables (x) can only have a degree (exponent) of 1, meaning the relationship is linear.
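
As a purely illustrative calculation (the coefficient values here are made up for the example): suppose the model has learned \theta_0 = 50 and \theta_1 = 0.2, with the price measured in thousands of dollars and the size in square feet. For a house of 1,500 sq ft, the predicted price is:

Y = 50 + 0.2 \times 1500 = 350

i.e. roughly $350,000.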

Linear regression has many practical uses. Most applications fall into one of the following two broad categories:

  • If the goal is prediction, forecasting, or error reduction, linear regression can be used to fit a predictive model to an observed dataset of values of the response and explanatory variables.
  • If the goal is to explain variation in the response variable that can be attributed to variation in the explanatory variables, linear regression analysis can be applied to quantify the strength of the relationship between the response and the explanatory variables. [1]

Hypothesis function

In linear regression, the predicted values (Y) are the values obtained from the function h_\theta(x), referred to as the hypothesis function. It represents the model's predicted output based on the input features and the model parameters (coefficients).

h_\theta(x) = \theta_0 + \theta_1 x

where,

h_\theta(x) = prediction = hypothesis (dependent variable)

\theta_i = parameters

x = input/features (independent variable)
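
As a minimal sketch of this in Python/NumPy (the parameter values below are placeholders chosen for illustration, not values learned from data):

import numpy as np

def hypothesis(theta0, theta1, x):
    # h_theta(x) = theta0 + theta1 * x for a single feature
    return theta0 + theta1 * x

# Placeholder parameters (illustrative only): price in $1000s, size in sq ft
theta0, theta1 = 50.0, 0.2
sizes = np.array([1000.0, 1500.0, 2000.0])
print(hypothesis(theta0, theta1, sizes))   # [250. 350. 450.]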

Goal: choose \theta_i such that the hypothesis h_\theta(x) is close to the expected output y for the training examples.

So, how do we select the parameters \theta_i so that the hypothesis best fits the training data?

Cost Function (J)

In linear regression, the goal is to find the best-fit line that predicts the dependent variable (y) based on one or more independent variables (x).

To achieve this, we use a cost function, which is a measure of how well the model’s predictions match the actual data. The most commonly used cost function in linear regression is the Mean Squared Error (MSE).

Mean Squared Error (MSE)

The MSE cost function measures the squared differences between the predicted values (\hat{y_i}) and the actual values (y_i), averaged over the training examples. Mathematically, the linear regression cost function using MSE is expressed as:

J (\theta) = \frac{1}{2m} \sum_{i=1}^{m} (\hat{y_i} - y_i)^2

where,

  • J(\theta) is the cost function.
  • m is the number of training examples.
  • \hat{y_i} is the predicted value from the hypothesis function h_\theta(x) for the i^{th} training example.
  • y_i is the actual value for the i^{th} training example.
  • \theta represents the parameters of the linear regression model (the intercept \theta_0 and the slope \theta_1).

Overall, the cost function J(\theta) of linear regression is the Mean Squared Error (MSE) between the predicted values \hat{y} obtained using h_\theta(x) and the true values y, with an extra factor of 1/2. This 1/2 is a common convention: it cancels when the squared term is differentiated and does not change where the minimum is.
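
A minimal NumPy sketch of this cost function (the dataset below is a tiny made-up example, continuing the house-price illustration):

import numpy as np

def compute_cost(theta0, theta1, x, y):
    # J(theta) = (1 / 2m) * sum((y_hat - y)^2)
    m = len(y)
    y_hat = theta0 + theta1 * x           # predictions from the hypothesis
    return np.sum((y_hat - y) ** 2) / (2 * m)

# Tiny made-up dataset: sizes in sq ft, prices in $1000s
x = np.array([1000.0, 1500.0, 2000.0])
y = np.array([260.0, 340.0, 455.0])
print(compute_cost(50.0, 0.2, x, y))      # cost for the placeholder parameters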

Selecting Parameters (θs)

Now, how do we select the most suitable parameters \theta_i for our hypothesis function so that the predicted values are close to the actual values? To find the parameters \theta_i that minimize the cost function, we use an optimization algorithm called Gradient Descent.

Gradient Descent

Gradient Descent is an optimization algorithm used to update the parameters \theta_i of the linear regression model so as to minimize the cost function, thereby achieving the best-fit line. The core idea is to start with initial, often random, values of θ (the parameters) and iteratively update them to reduce the cost function J(\theta).

Gradient Descent equation (Parameters Update Rule)

\theta_{j} := \theta_{j} - \alpha \frac{\partial}{\partial \theta_{j}} J(\theta_{0}, \theta_{1})

Here,

  • j = 0, 1
  • \alpha = learning rate (alpha)
  • J (\theta_{0}, \theta_{1}) = cost function
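
For the MSE cost function defined above, the partial derivatives work out to:

\frac{\partial}{\partial \theta_{0}} J(\theta_{0}, \theta_{1}) = \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x_i) - y_i)

\frac{\partial}{\partial \theta_{1}} J(\theta_{0}, \theta_{1}) = \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x_i) - y_i) \, x_i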

We repeatedly update \theta_{j} (updating \theta_{0} and \theta_{1} simultaneously, using the old values on the right-hand side) until we converge to the minimum.
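
A minimal sketch of batch gradient descent for the one-feature model in Python/NumPy (function and variable names are my own; the starting values, learning rate, and iteration count are arbitrary illustrative choices):

import numpy as np

def gradient_descent(x, y, alpha=0.1, num_iters=1000):
    # Batch gradient descent for h_theta(x) = theta0 + theta1 * x
    m = len(y)
    theta0, theta1 = 0.0, 0.0                # arbitrary starting values
    for _ in range(num_iters):
        error = (theta0 + theta1 * x) - y    # h_theta(x_i) - y_i
        grad0 = np.sum(error) / m            # dJ / d(theta0)
        grad1 = np.sum(error * x) / m        # dJ / d(theta1)
        theta0 = theta0 - alpha * grad0      # simultaneous update of
        theta1 = theta1 - alpha * grad1      # both parameters
    return theta0, theta1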

Intuition behind equation

If you picture the cost function J(\theta) plotted against \theta as a bowl-shaped curve, the update always moves \theta towards the minimum (for a suitable learning rate). Where \theta lies to the left of the minimum, the slope (derivative) is negative, so subtracting \alpha times a negative number increases \theta and moves it towards the minimum; where the slope is positive, \theta decreases, again moving towards the minimum.

The value of alpha (the learning rate) controls how quickly gradient descent approaches the minimum: if \alpha is too small, convergence is slow; if \alpha is too large, the updates can overshoot the minimum and fail to converge, or even diverge.
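
Reusing the gradient_descent and compute_cost sketches above, a quick way to see this effect is to run the same toy data with different learning rates. (Here the house sizes are rescaled to thousands of square feet, a simple form of feature scaling, so that plain gradient descent behaves reasonably with moderate learning rates; without it a much smaller \alpha would be needed.)

# Sizes rescaled to 1000s of sq ft so gradient descent converges comfortably
x = np.array([1.0, 1.5, 2.0])
y = np.array([260.0, 340.0, 455.0])          # prices in $1000s

# Too small an alpha converges slowly; too large an alpha overshoots the
# minimum, and with this toy data the largest value below diverges
# (you may see overflow warnings).
for alpha in (0.001, 0.1, 1.0):
    t0, t1 = gradient_descent(x, y, alpha=alpha, num_iters=1000)
    print(f"alpha={alpha}: theta0={t0:.2f}, theta1={t1:.2f}, "
          f"cost={compute_cost(t0, t1, x, y):.2f}")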

This is called linear regression in one variable, i.e., when you have only one feature (input value). Next, we'll learn about linear regression with multiple variables and regularisation.

Footnotes
[1] Linear regression — Wikipedia
