From 555d91a38a217958f97c60bdcdcdcde77e7c38d6 Mon Sep 17 00:00:00 2001
From: Anshika Yadav <14anshika7yadav@gmail.com>
Date: Thu, 30 May 2024 20:28:50 +0530
Subject: [PATCH] Regression in Machine Learning: #15

---
 Regressions/Readme.md               |  56 +++
 Regressions/linear_regression.ipynb | 505 ++++++++++++++++++++++++++++
 2 files changed, 561 insertions(+)
 create mode 100644 Regressions/Readme.md
 create mode 100644 Regressions/linear_regression.ipynb

diff --git a/Regressions/Readme.md b/Regressions/Readme.md
new file mode 100644
index 000000000..645521afa
--- /dev/null
+++ b/Regressions/Readme.md
@@ -0,0 +1,56 @@
+# Regression
+
+Regression in machine learning refers to a type of supervised learning algorithm used to predict continuous values.
+Regression helps us understand the relationship between independent variables (features) and a dependent variable (target), and predict a continuous output based on the input data.
+
+## Types of Regression
+
+1. Linear Regression:
+   - The simplest form of regression, which assumes a linear relationship between the independent variables and the dependent variable.
+   - The dependent variable is continuous; the independent variable(s) can be continuous or discrete.
+   - It is represented by the equation `y = b*x + a + e`, where y is the dependent variable, x is the independent variable, a is the intercept, b is the slope of the line, and e is the error term.
+   - Example: Predicting house prices based on features such as area, number of bedrooms, number of bathrooms, etc.
+
+2. Polynomial Regression:
+   - Polynomial regression is an extension of linear regression where the relationship between the independent and dependent variables is modeled as an nth-degree polynomial.
+   - It is represented by the equation `y = b1*x + b2*x^2 + ... + bn*x^n + a + e`.
+   - Example: Predicting the height of a plant based on the age of the plant. The relationship may not be linear; it could be quadratic or cubic, requiring a polynomial regression model to capture it accurately.
+
+3. Logistic Regression:
+   - Despite its name, logistic regression is used for binary classification rather than regression. It models the probability that an instance belongs to a particular class.
+   - The logistic regression model uses the logistic function g(z), where z is the linear combination of the input features and their corresponding coefficients: `z = a + b1*X1 + b2*X2 + ... + bn*Xn`.
+     The logistic function is defined as `g(z) = 1/(1+e^(-z))`, and g(z) is the predicted probability.
+   - Logistic regression does not require a linear relationship between the independent variables and the dependent variable itself; instead, it assumes that the log-odds of the outcome are a linear function of the features.
+   - Example: Predicting whether an email is spam or not based on features like the sender's address, subject line, and content. Logistic regression can output the probability of an email being spam, enabling classification based on a threshold.
+
+4. Decision Tree Regression:
+   - Decision tree regression uses a decision tree to model the relationship between the independent variables and the target variable.
+   - A decision tree is a tree-like structure where each internal node represents a "test" on an attribute (a feature), each branch represents the outcome of the test, and each leaf node represents a class label (in classification) or a numerical value (in regression).
+   - A single decision tree is prone to overfitting and instability: small changes in the training data can produce a very different tree.
+   - Example: Predicting the price of a used car based on features such as mileage, age, brand, etc. A decision tree can split the data into segments based on these features and predict the price within each segment.
+
+5. Random Forest Regression:
+   - Random forest regression is an ensemble learning method that combines multiple decision trees to improve predictive performance and reduce overfitting.
+   - Example: Predicting the sales of a product based on various factors such as advertising expenditure, seasonality, competitor prices, etc. Random forest regression can capture complex relationships between these factors and the sales outcome.
+
+6. Ridge Regression:
+   - Ridge regression is a technique used when the data suffers from multicollinearity (independent variables are highly correlated).
+   - Ridge regression is a regularized version of linear regression that penalizes the sum of the squared coefficients (an L2 penalty) to prevent overfitting.
+   - Example: Predicting a person's salary based on various factors such as education, experience, location, etc. Ridge regression can help prevent overfitting if there are multicollinearity issues among the features.
+
+7. Lasso Regression:
+   - Like ridge regression, Lasso (Least Absolute Shrinkage and Selection Operator) penalizes large coefficients, but it uses the sum of their absolute values (an L1 penalty) rather than the sum of their squares.
+   - Example: Identifying significant features in a dataset containing numerous variables. Lasso regression can be useful for feature selection because it can shrink the coefficients of less important features to exactly zero.
+
+8. ElasticNet Regression:
+   - ElasticNet is a hybrid of the Lasso and Ridge regression techniques, combining both penalties.
+   - ElasticNet is useful when there are multiple correlated features.
+
+
+
+An implementation of linear regression is given in `linear_regression.ipynb`.
+The code finds the coefficients (intercept and slope) that best fit the data according to the least squares criterion.
+
+
+
+
\ No newline at end of file
diff --git a/Regressions/linear_regression.ipynb b/Regressions/linear_regression.ipynb
new file mode 100644
index 000000000..3aab585bb
--- /dev/null
+++ b/Regressions/linear_regression.ipynb
@@ -0,0 +1,505 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# import the libraries\n",
+    "import numpy as np\n",
+    "from sklearn.linear_model import LinearRegression\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# define our sample data\n",
+    "X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # Independent variable: a 1-dimensional array reshaped to a 2-dimensional array with one column using reshape(-1, 1).\n",
+    "Y = np.array([2, 4, 5, 4, 5])  # Dependent variable\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "
LinearRegression()
" + ], + "text/plain": [ + "LinearRegression()" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model = LinearRegression() # instance of the LinearRegression model.\n", + "model.fit(X, Y) #fit the model to our data\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "# extract the coefficients of the fitted linear regression model:\n", + "intercept = model.intercept_\n", + "slope = model.coef_[0]\n" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Intercept (a): 2.1999999999999993\n", + "Slope (b): 0.6000000000000003\n" + ] + } + ], + "source": [ + "print(\"Intercept (a):\", intercept)\n", + "print(\"Slope (b):\", slope)\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.0" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}