Nearest Neighbor Classification
k-Nearest Neighbors (k-NN) is one of the simplest machine learning algorithms. A new data point is classified by looking at the closest points in the training set: the algorithm computes the Euclidean distance from the point of interest to every training point and assigns the class most common among the k nearest neighbors, where k is a parameter we choose.
A small k results in low bias and high variance. As k grows, the method becomes less flexible and the decision boundary approaches linear, so a large k results in high bias and low variance.
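The idea above can be sketched in a few lines of pure Python (a minimal illustration, not a production implementation; the `knn_predict` helper and the toy dataset are made up for this example):

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    # Euclidean distance from x to every training point, paired with its label
    dists = [(math.dist(p, x), label) for p, label in zip(train_X, train_y)]
    # Keep the k nearest neighbors and vote by majority label
    k_nearest = sorted(dists, key=lambda d: d[0])[:k]
    return Counter(label for _, label in k_nearest).most_common(1)[0][0]

# Toy dataset: two well-separated clusters
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (2, 2), k=3))  # → a
```

In practice you would use `sklearn.neighbors.KNeighborsClassifier` (linked below), which adds efficient neighbor search structures and distance weighting, but the logic is the same.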
A few links on the topic:
- Scikit-learn Neighbors
- Scikit-learn KNeighborsClassifier
- kNN Tutorial from Kevin Zakka
- sentdex ML tutorials on YouTube
Also, this blog post is available as a Jupyter notebook on GitHub.