Logistic regression- 9 mins
I though I would give some introduction on basic machine learning techniques and include some basic implementation with the explanations. I started with the topic below. Hope you will enjoy reading.
The main goal is to maximize the joint probability of the output label given the input data - in layman’s terms - make the predictions correct most of the time.
Since this equation gives:
and we want the predictions to represent the probability of a binary classification problem. In logistic regression we apply a sigmoid function to the regression term in order to achieve [0, 1] range and make it a normalized probability:
Where x is a data point and w, b are the parameters of our regression model. In order to maximize the joint probability we take its logarithmic mapping since the logarithm is a continuous and monotonic function therefore for a data and label pair (x, y) our loss function is:
We want to optimize this by stochastic gradient descent so we will need the gradients. For a dataset of
the computations are the following:
The i-th activation is:
And the i-th prediction:
The i-th loss therefore:
We still have to define the cost function which is the sum of the losses. If we do stochastic gradient descent we sample a batch of the dataset randomly and calculate the loss for
The derivatives are the following:
Since we know the chain rule we know that in order to get
Moving on the actual learnable parameters
Logistic regression in code with L2 weight regularization
Now the same in code:
import numpy as np from sklearn.datasets import load_breast_cancer from sklearn.metrics import accuracy_score
sklearn a default binary classification dataset can be loaded and a metric can be used to test accuracy.
X, y = load_breast_cancer(return_X_y=True) y.reshape(y.shape, 1) print('Data shape : ', X.shape, '\t labels shape : ', y.shape)
Data shape : (569, 30) labels shape : (569,)
n_samples = X.shape test_split = np.random.choice(n_samples, size=n_samples // 10, replace=False) assert np.unique(test_split).size == n_samples // 10, 'Test split size constraint failed' train_split = list(set(range(0, n_samples)) - set(test_split)) assert np.unique(train_split).size == n_samples - n_samples // 10, 'Train split size constraint failed' X_train, y_train, X_test, y_test = X[train_split], y[train_split], X[test_split], y[test_split] y_train = y_train.reshape(n_samples - n_samples // 10, 1) y_test = y_test.reshape(n_samples // 10, 1) X_train.shape, y_train.shape, X_test.shape, y_test.shape
((513, 30), (513, 1), (56, 30), (56, 1))
Here I implemented some custom splitting technique with
numpy only but the default
train_test_split method can be used from
sklearn as well. Normalizing the dataset is also important to ensure numerical stability.
train_std, train_mean = np.std(X_train, axis=0), np.mean(X_train, axis=0) test_std, test_mean = np.std(X_test, axis=0), np.mean(X_test, axis=0) X_train -= train_mean X_test -= test_mean X_train /= train_std X_test /= test_std X_train.shape, X_test.shape
((513, 30), (56, 30))
def sigmoid(x): return 1. / (1. + np.exp(-x)) sigmoid = np.vectorize(sigmoid) learning_rate = 5e-4 m = 750 max_iter = 5_000 # regularization alpha = 10e-3 w = np.random.randn(1, X_train.shape) b = np.random.randn() for _iter in range(max_iter): batch_indices = np.random.choice(X_train.shape, size = m, replace = True) batch = X_train[batch_indices].T _y = y_train[batch_indices].reshape(1, m) assert batch.shape == (X_train.shape, m), 'Batch shape constraint failed' activation = np.dot(w, batch) + b assert activation.shape == (1, m), 'Activation shape constraint failed' y_hat = sigmoid(activation) assert y_hat.shape == activation.shape, 'Prediction shape constraint failed' if _iter % 500 == 0: # Calculate current accuracy y_pred_test = np.dot(w, X_test.T) + b y_pred_test[y_pred_test >= 0.5] = 1 y_pred_test[y_pred_test < 0.5] = 0 print('Iter : %d | Current accuracy on test set %.2f%%: ' % (_iter, 100 * accuracy_score(y_test, y_pred_test.T))) da = y_hat - _y assert da.shape == activation.shape, 'Activation derivative shape constraint failed' dw = np.matmul(batch, da.T).T assert dw.shape == w.shape, 'Weights derivative shape constraint failed' db = np.sum(da) # Update parameters w -= learning_rate * ( dw + 2. * alpha * np.linalg.norm(dw) ) b -= learning_rate * ( db + 2. * alpha * np.abs(db) ) y_pred_test = np.dot(w, X_test.T) + b y_pred_test[y_pred_test >= 0.5] = 1 y_pred_test[y_pred_test < 0.5] = 0 print('Final accuracy : %.2f%%' % (100 * accuracy_score(y_test, y_pred_test.T)))
Iter : 0 | Current accuracy on test set 16.07%: Iter : 500 | Current accuracy on test set 98.21%: Iter : 1000 | Current accuracy on test set 96.43%: Iter : 1500 | Current accuracy on test set 96.43%: Iter : 2000 | Current accuracy on test set 96.43%: Iter : 2500 | Current accuracy on test set 98.21%: Iter : 3000 | Current accuracy on test set 98.21%: Iter : 3500 | Current accuracy on test set 98.21%: Iter : 4000 | Current accuracy on test set 98.21%: Iter : 4500 | Current accuracy on test set 98.21%: Final accuracy : 98.21%
Here the final accuracy is 98.21%. Which seems to be pretty amazing for such a simple algorithm.