# Logistic regression

## Intro

I thought I would give an introduction to some basic machine learning techniques, with simple implementations alongside the explanations. I started with the topic below. I hope you enjoy reading.

## Logistic regression

The main goal is to maximize the joint probability of the output labels given the input data, or in layman's terms, to make the predictions correct as often as possible.

Since the linear regression equation gives values on the whole real line:

$$a = w^\top x + b \in (-\infty, \infty)$$

and we want the predictions to represent the probability of a binary classification problem, in logistic regression we apply a sigmoid function to the regression term in order to achieve the $[0, 1]$ range and make it a normalized probability:

$$\hat{y} = \sigma(a) = \frac{1}{1 + e^{-(w^\top x + b)}}$$

where $x$ is a data point and $w$, $b$ are the parameters of our regression model. In order to maximize the joint probability we take its logarithmic mapping, since the logarithm is a continuous and monotonic function; maximizing the log-likelihood is equivalent to minimizing its negative, therefore for a data and label pair $(x, y)$ our loss function is:

$$\mathcal{L}(\hat{y}, y) = -\left( y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \right)$$
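As a quick numerical illustration of the prediction and loss for a single point (the data point and parameters below are made up purely for this example):

```python
import numpy as np

def sigmoid(a):
    # squashes any real activation into (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

def loss(y_hat, y):
    # negative log-likelihood of a single (x, y) pair
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# made-up data point and parameters, purely for illustration
x = np.array([0.5, -1.2])
w = np.array([2.0, 1.0])
b = 0.1

y_hat = sigmoid(np.dot(w, x) + b)  # activation is 2*0.5 + 1*(-1.2) + 0.1 = -0.1
print(float(y_hat))                # ~0.475, below 0.5, so the model leans to class 0
print(float(loss(y_hat, 1.0)))     # the loss is larger when the label disagrees
```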

We want to optimize this by stochastic gradient descent, so we will need the gradients. For a dataset of $N$ samples $(x_i, y_i)$ the computations are the following:

The $i$-th activation is:

$$a_i = w^\top x_i + b$$

And the $i$-th prediction:

$$\hat{y}_i = \sigma(a_i)$$

The $i$-th loss therefore:

$$\mathcal{L}_i = -\left( y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right)$$

We still have to define the cost function, which is the sum of the losses. If we do stochastic gradient descent we sample a batch of the dataset randomly and calculate the loss for $m$ samples:

$$J = \sum_{i=1}^{m} \mathcal{L}_i$$
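The batch cost can be computed with vectorized `numpy` operations; a small sketch, where the batch size, feature count, data, and parameters are all made up for illustration:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
m, d = 8, 3                          # made-up batch size and feature count
X = rng.normal(size=(m, d))          # random stand-in batch
y = rng.integers(0, 2, size=m).astype(float)
w, b = rng.normal(size=d), 0.0

a = X @ w + b                        # activations a_i for the whole batch at once
y_hat = sigmoid(a)                   # predictions in (0, 1)
cost = -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
print(cost > 0.0)                    # True: the cost is positive unless predictions are perfect
```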

The derivatives are the following. Since we know the chain rule, we know that in order to get $\frac{\partial \mathcal{L}_i}{\partial a_i}$ we need:

$$\frac{\partial \mathcal{L}_i}{\partial a_i} = \frac{\partial \mathcal{L}_i}{\partial \hat{y}_i} \cdot \frac{\partial \hat{y}_i}{\partial a_i}$$

Therefore, using $\frac{\partial \mathcal{L}_i}{\partial \hat{y}_i} = -\frac{y_i}{\hat{y}_i} + \frac{1 - y_i}{1 - \hat{y}_i}$ and $\frac{\partial \hat{y}_i}{\partial a_i} = \hat{y}_i (1 - \hat{y}_i)$:

$$\frac{\partial \mathcal{L}_i}{\partial a_i} = \hat{y}_i - y_i$$

Moving on to the actual learnable parameters, since $a_i = w^\top x_i + b$:

$$\frac{\partial J}{\partial w} = \sum_{i=1}^{m} (\hat{y}_i - y_i)\, x_i^\top, \qquad \frac{\partial J}{\partial b} = \sum_{i=1}^{m} (\hat{y}_i - y_i)$$
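A quick way to sanity-check the derivation is a finite-difference gradient check. The sketch below (with an arbitrary activation, label, and step size) compares the closed form $\hat{y} - y$ against a numerical derivative of the loss:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def loss(a, y):
    y_hat = sigmoid(a)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

a, y, eps = 0.3, 1.0, 1e-6             # arbitrary activation, label, and step size
numeric = (loss(a + eps, y) - loss(a - eps, y)) / (2 * eps)  # central difference
analytic = sigmoid(a) - y              # the closed form derived above
print(abs(numeric - analytic) < 1e-6)  # True
```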

## Logistic regression in code with L2 weight regularization

Now the same in code:

```
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
```

Using `sklearn`, a default binary classification dataset can be loaded and a metric can be used to test accuracy.

```
X, y = load_breast_cancer(return_X_y=True)
print('Data shape : ', X.shape, '\t labels shape : ', y.shape)
```

```
Data shape : (569, 30) labels shape : (569,)
```

```
n_samples = X.shape[0]
test_split = np.random.choice(n_samples, size=n_samples // 10, replace=False)
assert np.unique(test_split).size == n_samples // 10, 'Test split size constraint failed'
train_split = list(set(range(0, n_samples)) - set(test_split))
assert np.unique(train_split).size == n_samples - n_samples // 10, 'Train split size constraint failed'
X_train, y_train, X_test, y_test = X[train_split], y[train_split], X[test_split], y[test_split]
y_train = y_train.reshape(n_samples - n_samples // 10, 1)
y_test = y_test.reshape(n_samples // 10, 1)
X_train.shape, y_train.shape, X_test.shape, y_test.shape
```

```
((513, 30), (513, 1), (56, 30), (56, 1))
```

Here I implemented a custom splitting technique with `numpy` only, but the default `train_test_split` method from `sklearn` can be used as well. Normalizing the dataset is also important to ensure numerical stability.
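For reference, the equivalent split with `sklearn` might look like this (the `random_state` value is arbitrary; `test_size=0.1` mirrors the 10% hold-out above):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
# hold out roughly 10% of the 569 samples as a test set
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)
print(X_tr.shape, X_te.shape)  # (512, 30) (57, 30)
```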

```
train_std, train_mean = np.std(X_train, axis=0), np.mean(X_train, axis=0)
# Normalize both splits with the training statistics only, to avoid leaking
# information from the test set
X_train -= train_mean
X_test -= train_mean
X_train /= train_std
X_test /= train_std
X_train.shape, X_test.shape
```

```
((513, 30), (56, 30))
```
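The same normalization can be done with `sklearn`'s `StandardScaler`, which also fits its statistics on the training data only. A sketch with made-up stand-in data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(5.0, 2.0, size=(100, 3))  # made-up stand-in data
X_test = rng.normal(5.0, 2.0, size=(20, 3))

scaler = StandardScaler().fit(X_train)  # statistics come from the training data only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
print(np.allclose(X_train_s.mean(axis=0), 0.0))  # True: zero mean per feature
```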

```
def sigmoid(x):
    return 1. / (1. + np.exp(-x))

learning_rate = 5e-4
m = 750  # batch size; sampled with replacement, so it may exceed the dataset size
max_iter = 5_000
# L2 regularization strength
alpha = 10e-3
w = np.random.randn(1, X_train.shape[1])
b = np.random.randn()
for _iter in range(max_iter):
    batch_indices = np.random.choice(X_train.shape[0], size=m, replace=True)
    batch = X_train[batch_indices].T
    _y = y_train[batch_indices].reshape(1, m)
    assert batch.shape == (X_train.shape[1], m), 'Batch shape constraint failed'
    activation = np.dot(w, batch) + b
    assert activation.shape == (1, m), 'Activation shape constraint failed'
    y_hat = sigmoid(activation)
    assert y_hat.shape == activation.shape, 'Prediction shape constraint failed'
    if _iter % 500 == 0:
        # Calculate current accuracy: apply the sigmoid before thresholding at 0.5
        y_pred_test = sigmoid(np.dot(w, X_test.T) + b)
        y_pred_test[y_pred_test >= 0.5] = 1
        y_pred_test[y_pred_test < 0.5] = 0
        print('Iter : %d | Current accuracy on test set %.2f%%: ' % (_iter,
              100 * accuracy_score(y_test, y_pred_test.T)))
    da = y_hat - _y
    assert da.shape == activation.shape, 'Activation derivative shape constraint failed'
    dw = np.matmul(batch, da.T).T
    assert dw.shape == w.shape, 'Weights derivative shape constraint failed'
    db = np.sum(da)
    # Update parameters; the gradient of the L2 penalty alpha * ||w||^2 is
    # 2 * alpha * w (the bias is conventionally left unregularized)
    w -= learning_rate * (dw + 2. * alpha * w)
    b -= learning_rate * db
y_pred_test = sigmoid(np.dot(w, X_test.T) + b)
y_pred_test[y_pred_test >= 0.5] = 1
y_pred_test[y_pred_test < 0.5] = 0
print('Final accuracy : %.2f%%' % (100 * accuracy_score(y_test, y_pred_test.T)))
```

```
Iter : 0 | Current accuracy on test set 16.07%:
Iter : 500 | Current accuracy on test set 98.21%:
Iter : 1000 | Current accuracy on test set 96.43%:
Iter : 1500 | Current accuracy on test set 96.43%:
Iter : 2000 | Current accuracy on test set 96.43%:
Iter : 2500 | Current accuracy on test set 98.21%:
Iter : 3000 | Current accuracy on test set 98.21%:
Iter : 3500 | Current accuracy on test set 98.21%:
Iter : 4000 | Current accuracy on test set 98.21%:
Iter : 4500 | Current accuracy on test set 98.21%:
Final accuracy : 98.21%
```

Here the final accuracy is 98.21%, which is quite impressive for such a simple algorithm.
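As a sanity check, `sklearn`'s own `LogisticRegression` (which applies L2 regularization by default, controlled by the inverse strength `C`) reaches a similar accuracy on the same dataset. The split and hyperparameters below are illustrative, not a tuned setup:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)

scaler = StandardScaler().fit(X_tr)                  # fit on training data only
clf = LogisticRegression(C=1.0, max_iter=1000)       # C is the inverse of the L2 strength
clf.fit(scaler.transform(X_tr), y_tr)
print('Test accuracy : %.2f%%' % (100 * clf.score(scaler.transform(X_te), y_te)))
```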