# linear regression

## Statistical Terms in Data Science and Regression Metrics

Data Science draws on a range of statistical concepts. In this notebook I cover some basic statistical terms and discuss the metrics used in Data Science for regression tasks. This notebook can also be viewed on GitHub.

#### 1. Statistical terms

Let’s look at some simple statistical terms in detail:

Mean ($\bar{x}$): The average of a set of values, computed as the sum of all values divided by the number of values $n$:

$\bar{x} = \frac{\sum_{i=1}^{n}x_i}{n}$

Variance ($\sigma^2$): Describes the spread of a distribution around its mean. For a set of $n$ values, the variance is the average squared deviation from the mean:

$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\big(x_i - \bar{x}\big)^2$

Standard Deviation ($\sigma$): The square root of the variance. Unlike the variance, it is expressed in the same units as the data:

$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(x_i - \bar{x}\big)^2}$
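The three formulas above can be checked with a few lines of NumPy. This is a minimal sketch: the sample values in `x` are made up purely for illustration and do not come from the notebook. Note that the formulas divide by $n$, which matches NumPy's default (`ddof=0`) for `np.var` and `np.std`.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.size

# Mean: sum of all values divided by the number of values.
mean = x.sum() / n

# Variance: average squared deviation from the mean (1/n convention).
variance = ((x - mean) ** 2).sum() / n

# Standard deviation: square root of the variance.
std_dev = np.sqrt(variance)

# NumPy's built-ins use the same 1/n convention by default (ddof=0).
assert np.isclose(mean, np.mean(x))
assert np.isclose(variance, np.var(x))
assert np.isclose(std_dev, np.std(x))

print(mean, variance, std_dev)
```

For a sample-based estimate that divides by $n - 1$ instead, pass `ddof=1` to `np.var` or `np.std`.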

## Programming Exercise 1: Linear Regression

I started working on the Machine Learning course by Andrew Ng. This blog post contains my solution to the linear regression exercise, using the gradient descent algorithm. It is also available as a Jupyter notebook on GitHub.

This exercise was done using NumPy library functions. I also used the scikit-learn library to demonstrate another way of plotting a linear regression.

```python
# Standard imports. Importing seaborn for styling.
# %matplotlib inline is a Jupyter/IPython magic; it only works inside a notebook.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import seaborn

seaborn.set_style("whitegrid")
```