Browsing Tag: scatter plot

Plotting Error Bars in Python using Matplotlib and Numpy Random

by admin-post, March 17, 2018, no comment

A friend of mine needed help plotting clusters with corresponding asymmetrical error bars. After helping with the problem, I decided to write a blog post about plotting error bars in Python. The notebook can also be viewed on GitHub.

Error Bars

Error bars are graphical representations of the error or uncertainty in data, and they help readers interpret it correctly. In scientific work, reporting errors is crucial to understanding the data. Error bars most often represent the range or the standard deviation of a dataset, and they help visualize how the data is spread around the mean value.

The Data

The data shown below is randomly generated for plotting purposes. This blog post is not about the correct statistical interpretation of error bars; it is written solely for demonstration purposes.

We will be using numpy for data generation. Let’s start by importing numpy.

In [1]:

# Importing numpy
import numpy as np
np.__version__

Out[1]:

'1.14.0'
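With numpy imported, the idea of asymmetrical error bars can be sketched roughly as follows. This is only an illustration, not the code from the original notebook: the cluster count, the seed, the normal distribution and the helper name `plot_error_bars` are all assumptions made for the example. The lower and upper errors are taken as the distance from each cluster mean to its minimum and maximum.

```python
import numpy as np

# Hypothetical demo data: 5 clusters with 30 random points each.
rng = np.random.RandomState(42)            # fixed seed so the sketch is repeatable
centers = np.arange(1, 6)                  # x position of each cluster
samples = rng.normal(loc=10.0, scale=2.0, size=(5, 30))

means = samples.mean(axis=1)

# Asymmetrical errors: distance from the mean down to the minimum
# and up to the maximum of each cluster.
lower = means - samples.min(axis=1)
upper = samples.max(axis=1) - means
yerr = np.vstack([lower, upper])           # shape (2, N): row 0 = lower, row 1 = upper


def plot_error_bars(x, y, yerr):
    """Draw the cluster means with asymmetrical error bars (hypothetical helper)."""
    import matplotlib.pyplot as plt        # imported here so the numbers above work without it

    fig, ax = plt.subplots()
    ax.errorbar(x, y, yerr=yerr, fmt='o', capsize=4)
    ax.set_xlabel('cluster')
    ax.set_ylabel('value')
    return fig, ax
```

Calling `plot_error_bars(centers, means, yerr)` draws one marker per cluster. Matplotlib's `errorbar` reads a `yerr` of shape (2, N) as separate lower and upper errors, which is what makes the bars asymmetrical.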

Machine Learning, Python

Statistical Terms in Data Science and Regression Metrics

by admin-post, February 14, 2018, no comment

Data Science draws on various statistical concepts. In this notebook I am going to cover some basic statistical terms and talk about the metrics used in Data Science for regression tasks. This notebook can also be viewed on GitHub.

1. Statistical terms

Let’s look at some simple statistical terms in detail:

Mean ( $\bar{x}$ ): The average. The mean is the sum of all values divided by the number of values:

$\bar{x} = \frac{\sum_{i=1}^{n}x_i}{n}$

Variance ( $\sigma^2$ ): Describes the spread of a distribution. For a set of $n$ values, the variance is the mean of the squared deviations from the mean:

$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\big(x_i - \bar{x}\big)^2$

Standard Deviation ( $\sigma$ ): The square root of the variance. It is expressed in the same units as the data it describes:

$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(x_i - \bar{x}\big)^2}$
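The three formulas above can be checked on a small example. The sketch below uses a made-up array of eight values (an assumption chosen only because the numbers come out round) and computes each quantity directly from its definition, then compares the results against numpy's built-in functions.

```python
import numpy as np

# Made-up sample values, chosen so the results are round numbers.
x = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mean = x.sum() / x.size                        # sum of values / number of values
variance = ((x - mean) ** 2).sum() / x.size    # mean squared deviation (1/n form)
std = np.sqrt(variance)                        # square root of the variance

print(mean, variance, std)                     # 5.0 4.0 2.0

# numpy's defaults use the same 1/n definitions.
assert np.isclose(mean, np.mean(x))
assert np.isclose(variance, np.var(x))
assert np.isclose(std, np.std(x))
```

Note that `np.var` and `np.std` divide by n by default, matching the formulas here; passing `ddof=1` gives the sample versions that divide by n - 1.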

Machine Learning, Python

Machine Learning – Programming Exercise 2: Logistic Regression

by admin-post, October 11, 2017, no comment

Programming Exercise 2: Logistic Regression

This blog post contains a solution to the logistic regression assignment from the Machine Learning course by Andrew Ng. The post is also available as a Jupyter notebook on GitHub.

In [1]:

# Standard imports. Importing seaborn for styling.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import seaborn
seaborn.set_style('whitegrid')

Machine Learning, Python

k-NN Nearest Neighbor Classifier

by admin-post, September 30, 2017, no comment

Nearest Neighbor Classification

k-Nearest Neighbors (k-NN) is one of the simplest machine learning algorithms. A new data point is classified according to the closest data points in the training set: the algorithm computes the Euclidean distance from the point of interest to every training point and assigns the class that is most common among the nearest ones. The number of neighbors considered, k, is a parameter we choose.

A lower k results in low bias and high variance. As k grows, the method becomes less flexible and the decision boundary gets closer to linear. A higher k results in high bias and low variance.
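The procedure can be sketched in a few lines of numpy. This is a minimal illustration, not scikit-learn's implementation: the function name `knn_predict` and the toy training points are made up for the example, and ties are not handled specially.

```python
import numpy as np
from collections import Counter


def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training point.
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Count the class labels of those neighbors and return the most common one.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up): two well-separated groups labeled 0 and 1.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                    [6.0, 6.0], [6.5, 7.0], [7.0, 6.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

label_a = knn_predict(X_train, y_train, np.array([1.2, 1.4]), k=3)  # near the first group
label_b = knn_predict(X_train, y_train, np.array([6.4, 6.2]), k=3)  # near the second group
```

With k=1 the prediction depends on a single neighbor, which is the low bias, high variance end of the trade-off. Setting k to the size of the training set always returns the majority class, which is the opposite extreme.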

A few links on the topic:

Scikit-learn Neighbors
Scikit-learn KNeighborsClassifier
kNN Tutorial from Kevin Zakka
sentdex ML tutorials on YouTube

This blog post is also available as a Jupyter notebook on GitHub.

Python

Data visualization with Python and Matplotlib / Scatter Plot – Part 3

by admin-post, July 9, 2017, no comment

Let’s look at what else can be done with the data from Part 1. We start with the CSV import mentioned in Part 2. Looking at the data, we see that values are available for the years 2010 and 2015, so we can analyze the change in suicide rates.

In [1]:

%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl

In [2]:

table = pd.read_excel('mergedData.xlsx')
table.head()

Out[2]:

	Country	2015_s	2010_s	2015_p	2013_p	2010_p	2013_d	suiAve	suiPerDeath	deaPerPop
0	Afghanistan	5.5	5.2	32526.6	30682.5	27962.2	7.7	5.35	0.694805	0.77
1	Albania	4.3	5.3	2896.7	2883.3	2901.9	9.4	4.80	0.510638	0.94
2	Algeria	3.1	3.4	39666.5	38186.1	36036.2	5.7	3.25	0.570175	0.57
3	Angola	20.5	20.7	25022.0	23448.2	21220.0	13.9	20.60	1.482014	1.39
4	Antigua and Barbuda	0.0	0.2	91.8	90.0	87.2	6.8	0.10	0.014706	0.68
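As a rough sketch of how the change between the two years could be computed, the snippet below rebuilds the five rows shown above and subtracts one column from the other. It assumes that `2015_s` and `2010_s` hold the suicide rates for 2015 and 2010; the `change` column name is made up for the example and is not taken from the original notebook.

```python
import pandas as pd

# The five rows shown above, limited to the two rate columns.
# Assumption: 2015_s and 2010_s are the suicide rates for those years.
table = pd.DataFrame({
    'Country': ['Afghanistan', 'Albania', 'Algeria', 'Angola', 'Antigua and Barbuda'],
    '2015_s': [5.5, 4.3, 3.1, 20.5, 0.0],
    '2010_s': [5.2, 5.3, 3.4, 20.7, 0.2],
})

# Positive values mean the rate went up between 2010 and 2015.
table['change'] = table['2015_s'] - table['2010_s']

# Countries from this small sample whose rate increased.
increased = table.loc[table['change'] > 0, 'Country'].tolist()
```

On the full `mergedData.xlsx` table the same subtraction would apply to every country, and the resulting column could then be used as one axis of a scatter plot.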

codeWithMax
