Linear Regression
The idea behind linear regression is least squares: minimize the sum of the squared errors between the hypothesis and the observed results, also known as the cost function J(θ).
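In symbols, for the single-feature case used here, the cost is exactly what the computeCost function below implements:

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)^2$$

where m is the number of training examples.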
%pylab inline
Populating the interactive namespace from numpy and matplotlib
Let’s use the sample data from exercise #1.
import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt(open("ex1data1.txt"), delimiter=",")
X = data[:, 0]
Y = data[:, 1]
plt.figure()
plt.scatter(X, Y)
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.show()
Here is the scalar cost function, computeCost.
def computeCost(theta_0, theta_1):
    return np.sum((theta_0 + theta_1 * X - Y) ** 2) / (2 * len(X))
print computeCost(0, 0)
32.0727338775
Based on trial and error, we find that, to minimize the cost function, θ₀ must lie in [-10, 10] and θ₁ must lie in [-1, 4]:
theta_0_space = np.concatenate([
    np.linspace(-10, -5, 400),
    np.linspace(-5, 5, 1000),
    np.linspace(5, 10, 400)])
theta_1_space = np.concatenate([
    np.linspace(-1, 0, 400),
    np.linspace(0, 2, 1200),
    np.linspace(2, 4, 400)])
Theta_0, Theta_1 = np.meshgrid(theta_0_space, theta_1_space)
rows, cols = np.shape(Theta_0)
Z = np.zeros(shape=(rows, cols))
for i in range(rows):
    for j in range(cols):
        Z[i, j] = computeCost(Theta_0[i, j], Theta_1[i, j])
Here is the contour of the J(θ):
plt.figure()
CS = plt.contour(Theta_0, Theta_1, Z,
                 levels=[5, 8, 16, 32, 64, 128, 256])
plt.xlabel('theta0')
plt.ylabel('theta1')
plt.title('The contour of the J(theta)')
plt.clabel(CS, inline=1, fontsize=10)
plt.show()
The minimum of J(θ) can be found near θ₀ ≈ -3.9 and θ₁ ≈ 1.2 (see the closed-form fit with scikit-learn at the end). The numerical approach is to use gradient descent. More concretely:
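Each iteration of batch gradient descent updates both parameters simultaneously; this is the rule the vectorized loop below implements (with x_0^{(i)} = 1 for the intercept term):

$$\theta_j \leftarrow \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right) x_j^{(i)}$$

In the code, V stacks a row of ones on top of X, so np.dot(theta, V) yields the vector of predictions and the whole update collapses into a single dot product: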
theta = np.array([0.0, 0.0])   # initial guess
alpha = 0.01                   # learning rate
m = len(X)
iters = 1000
V = np.array([np.ones_like(X), X])    # 2 x m matrix: a row of ones, then X
thetas = np.zeros(shape=(iters, 2))   # keep the trace of theta for plotting
for i in range(iters):
    thetas[i] = theta
    # batch update: theta <- theta - (alpha / m) * (predictions - Y) . V^T
    theta = theta - (alpha / m) * np.dot(np.dot(theta, V) - Y, V.transpose())
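As a quick sanity check (a minimal sketch reusing computeCost and the thetas trace from above), we can confirm that the cost keeps decreasing over the iterations:
costs = np.array([computeCost(t0, t1) for t0, t1 in thetas])
plt.figure()
plt.plot(costs)
plt.xlabel('iteration')
plt.ylabel('J(theta)')
plt.title('Cost over gradient descent iterations')
plt.show()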
We can also render the gradient descent trace on the contour plot.
plt.figure()
CS = plt.contour(Theta_0, Theta_1, Z,
                 levels=[5, 8, 16, 32, 64, 128, 256])
plt.xlabel('theta0')
plt.ylabel('theta1')
plt.title('The trace of gradient descent in J(theta)')
plt.clabel(CS, inline=1, fontsize=10)
plt.plot(thetas[:, 0], thetas[:, 1], c='r')
plt.show()
Let’s check how well the hypothesis fits the observed data:
B = np.linspace(0, 25, 100)
Xh = np.array([np.ones_like(B), B])
Yh = np.dot(theta, Xh)
plt.figure()
plt.scatter(X, Y)
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.plot(B, Yh, c='red')
plt.xlim([0, 25])
plt.show()
print 'theta: \n', theta
theta: [-3.24140214 1.1272942 ]
There is no reason to reinvent the wheel; we can leverage the power of Generalized Linear Models from scikit-learn:
from sklearn import linear_model
X = data[:, 0].reshape(-1, 1)   # scikit-learn expects a 2-D feature matrix
Y = data[:, 1]
clf = linear_model.LinearRegression()
clf.fit(X, Y)
print 'intercept: ', clf.intercept_
print 'Coefficients: ', clf.coef_
plt.figure()
plt.scatter(X, Y)
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.plot(X, clf.predict(X), color='red')
plt.xlim([0, 25])
plt.show()
intercept:  -3.89578087831
Coefficients:  [ 1.19303364]
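As a quick usage example (a minimal sketch, assuming the fitted clf from above), we can predict the profit for a hypothetical city of 70,000 people; note that the features are expressed in units of 10,000:
population = np.array([[7.0]])  # population of 70,000, in units of 10,000
print 'predicted profit (in $10,000s): ', clf.predict(population)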