Logistic regression


Intro

I thought I would give an introduction to some basic machine learning techniques and include simple implementations along with the explanations. I started with the topic below. I hope you enjoy reading.

Logistic regression

The main goal is to maximize the joint probability of the output labels given the input data - in layman's terms, to make the predictions correct most of the time.

A plain linear regression term can take any real value:

$$ w^\top x + b \in (-\infty, \infty) $$

but we want the predictions to represent the probability of a binary classification problem. In logistic regression we therefore apply the sigmoid function to the regression term in order to squash it into the (0, 1) range and make it a normalized probability:

$$ \hat{y} = \sigma(w^\top x + b) = \frac{1}{1 + e^{-(w^\top x + b)}} $$

where x is a data point and w, b are the parameters of our regression model. In order to maximize the joint probability we maximize its logarithm instead, since the logarithm is a continuous and monotonic function. For a data and label pair (x, y) our loss (the negative log-likelihood) is therefore:

$$ \mathcal{L}(\hat{y}, y) = -\big( y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \big) $$
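
This is just the negative logarithm of the Bernoulli likelihood of a single label (spelling out the intermediate step for clarity):

$$ P(y \mid x; w, b) = \hat{y}^{\,y} \, (1 - \hat{y})^{\,1 - y} $$

Taking the logarithm and flipping the sign gives exactly the loss above, and maximizing the joint probability of an independent dataset corresponds to minimizing the sum of these per-sample losses.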

We want to optimize this by stochastic gradient descent, so we will need the gradients. For a dataset of $N$ samples

$$ \{ (x^{(i)}, y^{(i)}) \}_{i=1}^{N} $$

the computations are the following.

The i-th activation is:

$$ a^{(i)} = w^\top x^{(i)} + b $$

And the i-th prediction:

$$ \hat{y}^{(i)} = \sigma\big(a^{(i)}\big) $$

The i-th loss is therefore:

$$ \mathcal{L}^{(i)} = -\big( y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \big) $$
We still have to define the cost function, which is the sum of the losses. If we do stochastic gradient descent we sample a batch of the dataset randomly and calculate the cost over $m$ samples:

$$ J = \sum_{i=1}^{m} \mathcal{L}^{(i)} $$

The derivatives are the following. Since we know the chain rule, in order to get

$$ \frac{\partial \mathcal{L}}{\partial a} $$

we need:

$$ \frac{\partial \mathcal{L}}{\partial a} = \frac{\partial \mathcal{L}}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial a} = \left( -\frac{y}{\hat{y}} + \frac{1 - y}{1 - \hat{y}} \right) \cdot \hat{y} \, (1 - \hat{y}) $$

Therefore:

$$ \frac{\partial \mathcal{L}}{\partial a} = \hat{y} - y $$

Moving on to the actual learnable parameters:

$$ \frac{\partial \mathcal{L}}{\partial w} = (\hat{y} - y)\, x, \qquad \frac{\partial \mathcal{L}}{\partial b} = \hat{y} - y $$
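
To double-check these formulas, a quick numerical gradient check can be run against a finite-difference approximation. This is a small standalone sketch (the helper names are made up for illustration, not part of the original post):

import numpy as np

def sigmoid(x):
    return 1. / (1. + np.exp(-x))

def loss(w, b, x, y):
    # binary cross-entropy for a single (x, y) pair
    y_hat = sigmoid(np.dot(w, x) + b)
    return -(y * np.log(y_hat) + (1. - y) * np.log(1. - y_hat))

rng = np.random.default_rng(0)
w, b = rng.standard_normal(3), 0.1
x, y = rng.standard_normal(3), 1.0

# analytic gradients derived above
y_hat = sigmoid(np.dot(w, x) + b)
dw_analytic = (y_hat - y) * x
db_analytic = y_hat - y

# central finite differences for dL/dw
eps = 1e-6
dw_numeric = np.zeros_like(w)
for i in range(w.size):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[i] += eps
    w_minus[i] -= eps
    dw_numeric[i] = (loss(w_plus, b, x, y) - loss(w_minus, b, x, y)) / (2. * eps)

print(np.allclose(dw_analytic, dw_numeric, atol=1e-6))  # expected to print True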

Logistic regression in code with L2 weight regularization
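
Before the code, it is worth writing down what the L2 penalty does (spelled out here since it is not part of the derivation above): the regularized cost adds the squared norm of the weights, which contributes an extra linear term to the weight gradient:

$$ J_{\text{reg}} = J + \alpha \, \lVert w \rVert_2^2, \qquad \frac{\partial J_{\text{reg}}}{\partial w} = \frac{\partial J}{\partial w} + 2 \alpha w $$

The bias is conventionally left out of the penalty.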

Now the same in code:

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score

Using sklearn, a standard binary classification dataset can be loaded and an accuracy metric can be used to evaluate the model.

X, y = load_breast_cancer(return_X_y=True)

print('Data shape : ', X.shape, '\t labels shape : ', y.shape)
Data shape :  (569, 30) 	 labels shape :  (569,)
n_samples = X.shape[0]

test_split = np.random.choice(n_samples, size=n_samples // 10, replace=False)
assert np.unique(test_split).size == n_samples // 10, 'Test split size constraint failed'

train_split = list(set(range(0, n_samples)) - set(test_split))
assert np.unique(train_split).size == n_samples - n_samples // 10, 'Train split size constraint failed'

X_train, y_train, X_test, y_test = X[train_split], y[train_split], X[test_split], y[test_split]
y_train = y_train.reshape(n_samples - n_samples // 10, 1)
y_test = y_test.reshape(n_samples // 10, 1)

X_train.shape, y_train.shape, X_test.shape, y_test.shape
((513, 30), (513, 1), (56, 30), (56, 1))

Here I implemented a custom splitting routine with numpy only, but sklearn's default train_test_split method could be used just as well (see the sketch below). Normalizing the dataset is also important to ensure numerical stability.
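
For reference, the sklearn equivalent of the split and normalization would look roughly like this (a sketch only; the rest of the post sticks to the manual numpy version):

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 10% held out for testing, mirroring the manual split above
X_train, X_test, y_train, y_test = train_test_split(X, y.reshape(-1, 1), test_size=0.1)

# fit the scaler on the training data and reuse its statistics for the test data
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)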

# per-feature mean and standard deviation (each split is standardized with its own statistics here)
train_std, train_mean = np.std(X_train, axis=0), np.mean(X_train, axis=0)
test_std, test_mean = np.std(X_test, axis=0), np.mean(X_test, axis=0)

# zero-mean, unit-variance scaling
X_train -= train_mean
X_test -= test_mean
X_train /= train_std
X_test /= test_std

X_train.shape, X_test.shape
((513, 30), (56, 30))
def sigmoid(x):
    # element-wise sigmoid; np.exp already broadcasts over arrays, so no np.vectorize is needed
    return 1. / (1. + np.exp(-x))
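
As a side note, np.exp(-x) can raise overflow warnings for large negative activations; scipy ships the same function in a numerically stable form, which could be swapped in if needed (optional, not used below):

from scipy.special import expit  # numerically stable sigmoid

# expit(x) == 1 / (1 + exp(-x)) but handles extreme activations without overflow warnings
print(expit([-1000., 0., 1000.]))  # -> [0.  0.5 1. ]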

learning_rate = 5e-4
m = 750          # batch size (sampled with replacement, so it can exceed the 513 training samples)
max_iter = 5_000

# L2 regularization strength
alpha = 1e-2

# random initialization of the weight vector (1 x n_features) and the bias
w = np.random.randn(1, X_train.shape[1])
b = np.random.randn()

for _iter in range(max_iter):
    # sample a batch with replacement and arrange it as (n_features, m)
    batch_indices = np.random.choice(X_train.shape[0], size=m, replace=True)
    batch = X_train[batch_indices].T
    _y = y_train[batch_indices].reshape(1, m)
    assert batch.shape == (X_train.shape[1], m), 'Batch shape constraint failed'
    # forward pass: linear activation followed by the sigmoid
    activation = np.dot(w, batch) + b
    assert activation.shape == (1, m), 'Activation shape constraint failed'
    y_hat = sigmoid(activation)
    assert y_hat.shape == activation.shape, 'Prediction shape constraint failed'
    
    if _iter % 500 == 0:
        # Calculate current accuracy on the test set (apply the sigmoid before thresholding at 0.5)
        y_pred_test = sigmoid(np.dot(w, X_test.T) + b)
        y_pred_test[y_pred_test >= 0.5] = 1
        y_pred_test[y_pred_test < 0.5] = 0
        print('Iter : %d | Current accuracy on test set %.2f%%: ' % (_iter, 
                                                                     100 * accuracy_score(y_test, y_pred_test.T)))
    
    # backward pass: dL/da = y_hat - y, dL/dw = x . da, dL/db = sum(da)
    da = y_hat - _y
    assert da.shape == activation.shape, 'Activation derivative shape constraint failed'
    dw = np.matmul(batch, da.T).T
    assert dw.shape == w.shape, 'Weights derivative shape constraint failed'
    db = np.sum(da)
    # Update parameters: the L2 penalty alpha * ||w||^2 adds 2 * alpha * w to the weight gradient
    w -= learning_rate * ( dw + 2. * alpha * w )
    # the bias is conventionally left unregularized
    b -= learning_rate * db
    
y_pred_test = sigmoid(np.dot(w, X_test.T) + b)
y_pred_test[y_pred_test >= 0.5] = 1
y_pred_test[y_pred_test < 0.5] = 0
print('Final accuracy : %.2f%%' % (100 * accuracy_score(y_test, y_pred_test.T)))
Iter : 0 | Current accuracy on test set 16.07%: 
Iter : 500 | Current accuracy on test set 98.21%: 
Iter : 1000 | Current accuracy on test set 96.43%: 
Iter : 1500 | Current accuracy on test set 96.43%: 
Iter : 2000 | Current accuracy on test set 96.43%: 
Iter : 2500 | Current accuracy on test set 98.21%: 
Iter : 3000 | Current accuracy on test set 98.21%: 
Iter : 3500 | Current accuracy on test set 98.21%: 
Iter : 4000 | Current accuracy on test set 98.21%: 
Iter : 4500 | Current accuracy on test set 98.21%: 
Final accuracy : 98.21%

Here the final accuracy is 98.21%, which seems pretty amazing for such a simple algorithm.
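
As a sanity check, one could compare against sklearn's own implementation on the same split (a quick sketch; the exact score will depend on the random split and regularization settings):

from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(max_iter=5_000).fit(X_train, y_train.ravel())
print('sklearn accuracy : %.2f%%' % (100 * clf.score(X_test, y_test.ravel())))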

@Regards, Alex
