Neural Networks Multi-Class Classification in Python

This complete guide to multi-class neural networks shows how to transform our data, define the model, evaluate it with k-fold cross-validation, compile and train a final model, and save it for later use. Later, we will reload the saved model to make predictions without the need to re-train.

Introduction

This is a multi-class classification problem, meaning that there are more than two classes to be predicted. In our case there are 7 categories.

You can download the source code from GitHub
If you would like to see how to code a neural network from scratch, check this article
Download the dataset we will be using from Kaggle
A very good article on multi class concepts which I reference below

This article will focus on:

            1. Import the classes and functions
            2. Train and save the model
            2.1 Load our data
            2.2 Prepare our features
            2.3 Split train and test data
            2.4 Hot Encoding Y
            2.5 Define The Neural Network Model
            2.6 Evaluate The Model with k-Fold Cross Validation
            2.7 Compile and evaluate model on training data
            2.8 Plot the learning curve
            2.9 Save the model
            3. Reload models from disk and predict
            3.1 Look at our files
            3.2 Reload the model
            3.4 Reload 5% random data
            3.5 Transform features
            3.6 Predict and check for accuracy
            4. Conclusion
          

1. Import Classes and functions

We can begin by importing all of the classes and functions we will need in this tutorial.

            from keras.models import Sequential
            from keras.layers import Dense
            from keras.wrappers.scikit_learn import KerasClassifier
            from keras.utils import np_utils
            from sklearn.model_selection import cross_val_score
            from sklearn.model_selection import KFold
            from sklearn.preprocessing import LabelEncoder
            from sklearn.pipeline import Pipeline
            from sklearn.preprocessing import MinMaxScaler
            from sklearn.model_selection import train_test_split
            from joblib import dump, load
            import pandas as pd
            import numpy as np
          

2. Train and save model

2.1 Load our data

Let's load our data into a dataframe.

              df = pd.read_csv('data/customertrain.csv')
              df = df.dropna()
              df = df.drop(['Segmentation','ID'], axis=1) # not needed
              df.head()
            

Figure 1: Results of loading the data

We can see that we have 8068 training examples, but we do have some things to sort out:

We will need to encode the categories from Y
Use dump to save the encoder for later use

              from sklearn.preprocessing import LabelEncoder
              from joblib import dump
              
              def prepareY(df):
                  # extract Y from the dataframe
                  Y = df['Var_1']
                  # encode class values as integers
                  yencoder = LabelEncoder()
                  yencoder.fit(Y)
                  # save the fitted encoder for later use
                  dump(yencoder, 'models/yencoder.joblib')
                  return yencoder.transform(Y)
              
              y = prepareY(df)
              # drop Y from the dataframe so only the features remain
              df = df.drop(['Var_1'], axis=1)
              pd.DataFrame(y).head()
            

Figure 2: Y has been encoded
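
To make the encoding step concrete, here is a minimal sketch of what an integer label encoding does, written with plain NumPy rather than LabelEncoder itself. The category strings are hypothetical and only there for illustration; the idea is that the distinct labels are sorted and each row is replaced by the position of its label in that sorted list.

```python
import numpy as np

# Hypothetical category labels, used only for illustration.
labels = np.array(["Cat_6", "Cat_4", "Cat_6", "Cat_1", "Cat_4"])

# np.unique returns the sorted distinct labels and, with return_inverse=True,
# the index of each original label within that sorted array.
classes, encoded = np.unique(labels, return_inverse=True)

print(classes)   # distinct labels in sorted order
print(encoded)   # one integer code per row

# Indexing `classes` with the codes recovers the original labels.
decoded = classes[encoded]
```

Because the mapping depends on the labels seen at fit time, the same fitted encoder has to be reused whenever new data is encoded, which is why the article saves it with dump.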

2.2 Prepare our features

We need to do a few things to our features, so we can work with them a little easier.

Let's convert our string fields to numbers using OrdinalEncoder
Use MinMaxScaler to scale our numbers so that they all fall in the range -1 to 1.

Get a list of our string and numeric columns.

              numerical_ix = df.select_dtypes(include=['int64',
                'float64']).columns
              categorical_ix = df.select_dtypes(include=['object',
                'bool']).columns
            

Use ColumnTransformer to encode our string columns and to scale the numeric columns. We will use dump to save the fitted column_trans object for later use.

              from sklearn.compose import ColumnTransformer
              from sklearn.preprocessing import OrdinalEncoder
              from sklearn.preprocessing import MinMaxScaler
              column_trans = ColumnTransformer([
                  ('cat', OrdinalEncoder(), categorical_ix),
                  ('num', MinMaxScaler(feature_range=(-1, 1)), numerical_ix)],
                  remainder='drop')
              column_trans.fit(df)
              dump(column_trans, "models/column_trans.joblib")
              X = column_trans.transform(df)
              pd.DataFrame(X).head()
            

Figure 3: Results after running column transformation
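
For readers who want to see what the scaling step does to a numeric column, the following is a rough sketch of min-max scaling to the range -1 to 1 in plain NumPy. The function name `min_max_scale` and the sample values are assumptions made for this example, and the sketch assumes the column is not constant (otherwise the division would fail).

```python
import numpy as np

def min_max_scale(column, feature_range=(-1.0, 1.0)):
    """Rescale a 1-D array linearly so its min and max map onto feature_range."""
    lo, hi = feature_range
    col_min, col_max = column.min(), column.max()
    # Map to [0, 1] first, then stretch and shift to [lo, hi].
    unit = (column - col_min) / (col_max - col_min)
    return unit * (hi - lo) + lo

ages = np.array([20.0, 35.0, 50.0, 80.0])  # hypothetical values
scaled = min_max_scale(ages)
print(scaled)
```

Note that the minimum and maximum are learned from the training data, so new data must be transformed with the saved transformer rather than a freshly fitted one.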

2.3 Split train and test data

              X_train, X_test, y_train, y_test = train_test_split(X, y,
                test_size=0.2, random_state=42)
            

2.4 Hot Encoding Y

The output variable contains seven different string values.

When modeling multi-class classification problems with neural networks, it is good practice to reshape the output attribute from a vector of class values into a matrix with one boolean column per class, where each row marks which class that instance belongs to.

This is called `one hot encoding` or creating dummy variables from a categorical variable.

For example, in this problem the seven encoded class values are [0,1,2,3,4,5,6]. We can turn this into a one-hot encoded binary matrix, with one row for each data instance, that would look as follows:

Figure 4: Results of hot encoding Y

              yhot = np_utils.to_categorical(y)
              yhot_train = np_utils.to_categorical(y_train)
              yhot_test = np_utils.to_categorical(y_test)
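
As a small illustration of the idea, here is a sketch of one-hot encoding in plain NumPy. It is not how to_categorical is implemented; the helper name `one_hot` and the example labels are assumptions for this example only.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) matrix with a single 1 per row."""
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype="float32")[labels]

y_example = np.array([0, 3, 6, 3])  # hypothetical encoded labels
encoded = one_hot(y_example, num_classes=7)
print(encoded)

# argmax along each row recovers the original integer labels.
recovered = encoded.argmax(axis=1)
```

The last line is the reverse operation, and it is the same trick used later in the article to turn the network's output probabilities back into class numbers.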
            

2.5 Define The Neural Network Model

So, now you are asking "What are reasonable sizes for the layers of our network?"

Input layer = set to the number of features, i.e. 8
Hidden layers = set to input_layer * 2 (i.e. 16)
Output layer = set to the number of labels in Y. In our case, this is 7 categories

The network topology of this neural network, with its two hidden layers, can be summarized as:

8 inputs -> [16 hidden nodes] -> [16 hidden nodes] -> 7 outputs
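
To get a feel for how small this network is, the sketch below counts its trainable parameters, assuming the fully connected layer sizes used in this article's model (8 inputs, two hidden layers of 16 units, 7 outputs). Each dense layer has one weight per input-output pair plus one bias per output unit; the helper name `dense_param_count` is made up for this example.

```python
# Layer sizes assumed from the article's model: 8 inputs, 16, 16, 7 outputs.
layer_sizes = [8, 16, 16, 7]

def dense_param_count(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# Pair each layer with the next one to get the (inputs, outputs) of every dense layer.
per_layer = [dense_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)  # [144, 272, 119] 535
```

If you build the model in Keras, calling model.summary() should report the same per-layer counts.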

Now create our model inside a function so we can use it in the KerasClassifier as well as later when we compile our model.

              # define baseline model
              def baseline_model():
                  # create model
                  model = Sequential()
                  # Rectified Linear Unit activation for the two hidden layers
                  model.add(Dense(16, input_dim=8, activation='relu'))
                  model.add(Dense(16, activation='relu'))
                  # Softmax for multi-class classification
                  model.add(Dense(7, activation='softmax'))
                  # Compile model
                  model.compile(loss='categorical_crossentropy', optimizer='adam',
                                metrics=['accuracy'])
                  return model
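
The model relies on a softmax output and a categorical cross-entropy loss. As a rough sketch of what these two pieces compute (in plain NumPy, not the Keras implementation), consider the following; the scores in `logits` are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1 along the last axis."""
    # Subtracting the row max does not change the result but avoids overflow in exp.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true_onehot, probs, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    probs = np.clip(probs, eps, 1.0)
    return float(-(y_true_onehot * np.log(probs)).sum(axis=-1).mean())

# Hypothetical raw scores for one sample and 7 classes.
logits = np.array([[2.0, 1.0, 0.1, 0.0, 0.0, 0.0, 0.0]])
probs = softmax(logits)
y_true = np.eye(7)[[0]]  # one-hot row saying the true class is 0
loss = categorical_crossentropy(y_true, probs)
print(probs.round(3), loss)
```

The loss gets smaller as the probability assigned to the true class approaches 1, which is what training pushes the network towards.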
            

We can now create our KerasClassifier for use in scikit-learn. We use mini-batches, as this tends to be the fastest way to train.

              cmodel = KerasClassifier(build_fn=baseline_model, epochs=200,
                batch_size=100, verbose=0)
            

2.6 Evaluate The Model with k-Fold Cross Validation

Now, let's evaluate the neural network model on all our data. First we define the model evaluation procedure. Here, we set the number of folds to 10 and shuffle the data before partitioning it:

kfold = KFold(n_splits=10, shuffle=True)

Now we can evaluate our model on our dataset (X and yhot) using a 10-fold cross-validation procedure (kfold).

              result = cross_val_score(cmodel, X, yhot, cv=kfold)
              print('Result: %.2f%% (%.2f%%)' % (result.mean()*100,
                result.std()*100))
            

After running above, you should see a result of around 67.64%.

Great, k-fold cross-validation has done its job: averaging over 10 folds gives us a reasonable estimate of the accuracy we can expect from this model on this dataset.
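
If k-fold cross-validation is new to you, the sketch below shows the splitting idea in plain NumPy: shuffle the row indices, cut them into k folds, and let each fold take one turn as the held-out set. This is only an illustration of the procedure, not scikit-learn's KFold; the function name `kfold_indices` and its parameters are assumptions for this example.

```python
import numpy as np

def kfold_indices(n_samples, n_splits=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is in exactly one test fold."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    folds = np.array_split(shuffled, n_splits)
    for i in range(n_splits):
        test_idx = folds[i]
        # All remaining folds together form the training set for this round.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx

splits = list(kfold_indices(n_samples=25, n_splits=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

A model is trained and scored once per split, and the mean and standard deviation of those scores are what cross_val_score lets us report.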

2.7 Compile and evaluate model on training data

Now that we are happy with our epochs and batch size, let's compile and fit a model we can use later.

              model = baseline_model()
              model.compile(loss='categorical_crossentropy',
                optimizer='adam', metrics=['accuracy'])
              history = model.fit(X_train, yhot_train, validation_split=0.33,
                epochs=200, batch_size=100, verbose=0)
            

2.8 Plot the learning curve

The plots are provided below. The history for the validation dataset is labeled test by convention as it is indeed a test dataset for the model.

We can also see that the model has not yet over-learned the training dataset, showing comparable accuracy on both datasets.

              import matplotlib.pyplot as plt
              # list all data in history
              print(history.history.keys())
              # summarize history for accuracy
              plt.plot(history.history['accuracy'])
              plt.plot(history.history['val_accuracy'])
              plt.title('model accuracy')
              plt.ylabel('accuracy')
              plt.xlabel('epoch')
              plt.legend(['train', 'test'], loc='upper left')
              plt.show()
              # summarize history for loss
              plt.plot(history.history['loss'])
              plt.plot(history.history['val_loss'])
              plt.title('model loss')
              plt.ylabel('loss')
              plt.xlabel('epoch')
              plt.legend(['train', 'test'], loc='upper left')
              plt.show()
            

Figure 5: Our learning curve is looking good. Could even reduce the epochs
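
One simple way to judge whether the epochs could be reduced is to look for the epoch where the validation loss was lowest. The sketch below does this on a hand-written dictionary shaped like history.history; all numbers are hypothetical and the helper `best_epoch` is an assumption for this example.

```python
def best_epoch(val_loss):
    """Return the 1-based epoch with the lowest validation loss."""
    return min(range(len(val_loss)), key=val_loss.__getitem__) + 1

# Hypothetical values, shaped like history.history returned by model.fit.
history_dict = {
    "loss":     [1.90, 1.60, 1.45, 1.38, 1.33, 1.30],
    "val_loss": [1.92, 1.65, 1.50, 1.47, 1.49, 1.53],
}

epoch = best_epoch(history_dict["val_loss"])
# A growing gap between validation and training loss hints at over-learning.
gap = history_dict["val_loss"][-1] - history_dict["loss"][-1]
print(epoch, round(gap, 2))
```

In this made-up example the validation loss bottoms out at epoch 4 while the training loss keeps falling, which is the pattern to watch for in the real plots.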

Let's run an evaluation on our test set and see how we hold up with new data. You should end up with an accuracy of 69.09%

              # evaluate the keras model
              _, accuracy = model.evaluate(X_test, yhot_test)
              print('Accuracy from evaluate: %.2f' % (accuracy*100))
            

Finally, for fun, let's make predictions on our test set and compute the accuracy ourselves. Again, you should end up with an accuracy of around 69%.

              predict_x = model.predict(X_test)
              # pick the class with the highest predicted probability
              pred = np.argmax(predict_x, axis=1)
              accuracy = (pred == y_test).mean() * 100
              print(f'Prediction Accuracy: {accuracy:f}')
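
To see the argmax step in isolation, here is a tiny worked example on made-up numbers. The probabilities and labels are hypothetical; the point is that predictions are compared against the integer labels, not the one-hot matrix.

```python
import numpy as np

# Hypothetical softmax outputs for 4 samples and 3 classes (each row sums to 1).
probs = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.30, 0.60],
    [0.20, 0.50, 0.30],
    [0.40, 0.35, 0.25],
])
y_true = np.array([0, 2, 1, 1])  # hypothetical integer labels

# The predicted class is the column with the highest probability in each row.
pred = np.argmax(probs, axis=1)
accuracy = (pred == y_true).mean() * 100
print(pred, accuracy)  # [0 2 1 0] 75.0
```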
            

2.9 Save the model

Now, let's save the model, so that later we can reload it and make predictions without the need to retrain. The model structure is converted to JSON format and written to models/customermodel.json. The network weights are written to model.h5 in the local directory.

              # serialize model structure to JSON
              model_json = model.to_json()
              with open("models/customermodel.json", "w") as json_file:
                  json_file.write(model_json)
              # serialize weights to HDF5
              model.save_weights("model.h5")
              print("Saved model to disk")
            

3. Reload models from disk and predict

3.1 Look at our files

The model and weight data is loaded from the saved files and a new model is created. It is important to compile the loaded model before it is used. This is so that predictions made using the model can use the appropriate efficient computation from the Keras backend.

The model is evaluated in the same way printing the same evaluation score.

ls -lt models

Figure 6: Saving our model, transformer and encoder

3.2 Reload the models

We will reload our model, simulating the case where we want to run a prediction a day or two later.

              from keras.models import model_from_json
              # load json and create model
              with open('models/customermodel.json', 'r') as json_file:
                  loaded_model_json = json_file.read()
              loaded_model = model_from_json(loaded_model_json)
              # load weights into new model
              loaded_model.load_weights('model.h5')
              print('Loaded model from disk')
              # compile the loaded model before using it
              loaded_model.compile(loss='categorical_crossentropy',
                  optimizer='adam', metrics=['accuracy'])
            

Now, let's reload our transformer

column_trans = load('models/column_trans.joblib')

3.4 Reload 5% random data

Reload our training data, but take a 5% random sample

              df = pd.read_csv('data/customertrain.csv')
              df = df.sample(frac=0.05)
              df.dropna(inplace=True)
              df = df.drop(['Segmentation', 'ID'], axis=1) # not needed
              df.info()
            

Now, when we reload Y, we first want to load our original encoder. Naturally, we cannot have new categories, else we will get an error at this point.

              def prepareYreload(df):
              
                yencoder = load("models/yencoder.joblib")
              
                return yencoder.transform(df["Var_1"])
              y = prepareYreload(df)
              df = df.drop(["Var_1"], axis=1)
              pd.DataFrame(y).head()
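
Since new categories would cause an error at this point, it can help to check for them before transforming. The following is a minimal sketch of such a check; `find_unseen_labels` is a hypothetical helper and the label values are made up.

```python
def find_unseen_labels(known_classes, new_labels):
    """Return the sorted labels in new_labels that the encoder was never fitted on."""
    return sorted(set(new_labels) - set(known_classes))

# Hypothetical values for illustration only.
known = ["Cat_1", "Cat_2", "Cat_3"]
incoming = ["Cat_2", "Cat_9", "Cat_1", "Cat_9"]

unseen = find_unseen_labels(known, incoming)
if unseen:
    print(f"Labels not seen during training: {unseen}")
```

With the reloaded encoder, the known classes would come from its classes_ attribute, for example find_unseen_labels(yencoder.classes_, df["Var_1"]).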
            

OK, let's have a look at our data before we transform it

df.head()

Figure 7: Features before we transform

              column_trans = load("models/column_trans.joblib")
              X = column_trans.transform(df)
              pd.DataFrame(X).head()
            

Figure 8: Features after we transform

3.6 Predict and check for accuracy

Now we predict on our random sample using the reloaded model. You should again end up with an accuracy of around 69%.

              predict_x = loaded_model.predict(X)
              # pick the class with the highest predicted probability
              pred = np.argmax(predict_x, axis=1)
              accuracy = (pred == y).mean() * 100
              print(f'Prediction Accuracy: {accuracy:f}')
            

4. Conclusion

In this article you discovered how to develop and evaluate a neural network using the Keras Python library for deep learning.

You learned:

How to load data and make it available to Keras.
How to prepare multi-class classification data for modeling using one hot encoding.
How to use Keras neural network models with scikit-learn.
How to define a neural network using Keras for multi-class classification.
How to evaluate a Keras neural network model using scikit-learn with k-fold cross validation

5. Sources

In writing this article I found https://machinelearningmastery.com very helpful, with a lot of concepts easily explained.