My First Deep Learning Model


In this notebook, I try to build my first deep learning model using TensorFlow and Keras.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.metrics import confusion_matrix

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
# Legacy scikit-learn wrapper; recent Keras releases no longer ship this module
from keras.wrappers.scikit_learn import KerasClassifier

df = pd.read_csv('WA_Fn-UseC_-HR-Employee-Attrition.csv')
df.shape
(1470, 35)
df.head()
   Age  Attrition  BusinessTravel     DailyRate  Department              DistanceFromHome  Education  EducationField  EmployeeCount  EmployeeNumber  ...  RelationshipSatisfaction  StandardHours  StockOptionLevel  TotalWorkingYears  TrainingTimesLastYear  WorkLifeBalance  YearsAtCompany  YearsInCurrentRole  YearsSinceLastPromotion  YearsWithCurrManager
0  41   Yes        Travel_Rarely      1102       Sales                   1                 2          Life Sciences   1              1               ...  1                         80             0                 8                  0                      1                6               4                   0                        5
1  49   No         Travel_Frequently  279        Research & Development  8                 1          Life Sciences   1              2               ...  4                         80             1                 10                 3                      3                10              7                   1                        7
2  37   Yes        Travel_Rarely      1373       Research & Development  2                 2          Other           1              4               ...  2                         80             0                 7                  3                      3                0               0                   0                        0
3  33   No         Travel_Frequently  1392       Research & Development  3                 4          Life Sciences   1              5               ...  3                         80             0                 8                  3                      3                8               7                   3                        0
4  27   No         Travel_Rarely      591        Research & Development  2                 1          Medical         1              7               ...  4                         80             1                 6                  3                      3                2               2                   2                        2

5 rows × 35 columns

df.isna().sum()
Age                         0
Attrition                   0
BusinessTravel              0
DailyRate                   0
Department                  0
DistanceFromHome            0
Education                   0
EducationField              0
EmployeeCount               0
EmployeeNumber              0
EnvironmentSatisfaction     0
Gender                      0
HourlyRate                  0
JobInvolvement              0
JobLevel                    0
JobRole                     0
JobSatisfaction             0
MaritalStatus               0
MonthlyIncome               0
MonthlyRate                 0
NumCompaniesWorked          0
Over18                      0
OverTime                    0
PercentSalaryHike           0
PerformanceRating           0
RelationshipSatisfaction    0
StandardHours               0
StockOptionLevel            0
TotalWorkingYears           0
TrainingTimesLastYear       0
WorkLifeBalance             0
YearsAtCompany              0
YearsInCurrentRole          0
YearsSinceLastPromotion     0
YearsWithCurrManager        0
dtype: int64

There are no missing values. As I said, the purpose of this notebook is to build a deep learning model, so this time I won't explore the data.

Dataset Splitting

X = pd.get_dummies(df.drop(columns = 'Attrition'))
y = df['Attrition'].map({'Yes' : 1, 'No' : 0})

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 101, stratify = y)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
((1176, 55), (294, 55), (1176,), (294,))
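
To make the `stratify = y` argument a bit more concrete, here is a minimal pure-Python sketch of what a stratified split does: it splits each class separately so that the train and test sets keep roughly the same class balance. The helper name `stratified_split` and the toy labels are my own assumptions for illustration; this is not scikit-learn's actual implementation.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size, seed):
    """Hypothetical helper: split indices so each class keeps its share."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)  # per-class test count
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels (made up): 80 employees who stayed (0), 20 who left (1)
labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels, test_size=0.2, seed=101)

train_pos_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_pos_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), train_pos_rate, test_pos_rate)
```

Without stratification, a random split of an imbalanced target like `Attrition` could leave the test set with noticeably more or fewer positive cases than the training set.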

Preprocessing

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
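
As a rough sketch of what the scaling step does per column: the mean and standard deviation are learned from the training data only and then reused on the test data, so no test information leaks into the fit. The functions `fit_scaler` and `transform` and the `test_age` values below are hypothetical and only for illustration; the `Age` and `StandardHours` values are the ones shown in `df.head()`.

```python
from statistics import fmean, pstdev

def fit_scaler(column):
    """Learn mean and population std from training values only."""
    mean = fmean(column)
    std = pstdev(column)
    # Guard against zero variance so we never divide by zero
    return mean, (std if std > 0 else 1.0)

def transform(column, mean, scale):
    return [(x - mean) / scale for x in column]

train_age = [41, 49, 37, 33, 27]   # Age values from df.head()
test_age = [30, 45]                # made-up "unseen" values

mean, scale = fit_scaler(train_age)
scaled_train = transform(train_age, mean, scale)
scaled_test = transform(test_age, mean, scale)   # reuses the training statistics

# A column that is constant in the rows shown above (StandardHours)
constant = [80, 80, 80, 80, 80]
c_mean, c_scale = fit_scaler(constant)
scaled_constant = transform(constant, c_mean, c_scale)
print(scaled_train, scaled_test, scaled_constant)
```

Columns that never vary, as `EmployeeCount` and `StandardHours` appear to in the preview, end up as all zeros after scaling and carry no signal for the model.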

Build the model

classifier = Sequential()
classifier.add(Dense(64, activation = 'relu', input_dim = 55))
classifier.add(Dropout(rate = 0.1))
classifier.add(Dense(1, activation = 'sigmoid'))
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

classifier.summary()
Model: "sequential_17"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_30 (Dense)             (None, 64)                3584      
_________________________________________________________________
dropout_14 (Dropout)         (None, 64)                0         
_________________________________________________________________
dense_31 (Dense)             (None, 1)                 65        
=================================================================
Total params: 3,649
Trainable params: 3,649
Non-trainable params: 0
_________________________________________________________________
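
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit, and dropout adds no parameters. A small sketch (the helper name `dense_params` is my own):

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(55, 64)   # 55 input features -> 64 hidden units
output = dense_params(64, 1)    # 64 hidden units -> 1 sigmoid output
total = hidden + output

print(hidden, output, total)    # 3584 65 3649
```
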
# Wrap model construction in a function so scikit-learn can build a fresh model for each fold
def make_model():
    classifier = Sequential()
    classifier.add(Dense(64, activation = 'relu', input_dim = 55))
    classifier.add(Dropout(rate = 0.1))
    classifier.add(Dense(1, activation = 'sigmoid'))
    classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
    return classifier

# `epochs` replaces the deprecated `nb_epoch` argument; still one epoch per fold
classifier = KerasClassifier(build_fn = make_model, batch_size = 10, epochs = 1)
acc = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10, n_jobs = -1)
acc.mean()
0.8495581686496735
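
For intuition about what `cv = 10` means for the 1,176 training rows, here is a small sketch of how a plain K-fold split divides them. I am assuming the usual rule that the first `n % k` folds get one extra sample; the stratified variant that scikit-learn uses for classifiers may differ by a row or two per fold. The helper name `fold_sizes` is my own.

```python
def fold_sizes(n_samples, n_splits):
    """Sizes of the validation folds for a plain K-fold split."""
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]

sizes = fold_sizes(1176, 10)
train_sizes = [1176 - s for s in sizes]   # rows used for fitting in each fold

print(sizes)         # validation rows per fold
print(train_sizes)   # training rows per fold
```

Each fold trains a fresh model on roughly 1,058 rows and scores it on the remaining 117 or 118, and `acc.mean()` averages those ten scores.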

Running Predictions

classifier.fit(X_train, y_train, epochs = 5)
Epoch 1/5
118/118 [==============================] - 1s 11ms/step - loss: 0.5126 - accuracy: 0.7662
Epoch 2/5
118/118 [==============================] - 1s 10ms/step - loss: 0.3574 - accuracy: 0.8656
Epoch 3/5
118/118 [==============================] - 1s 11ms/step - loss: 0.3186 - accuracy: 0.8801
Epoch 4/5
118/118 [==============================] - 1s 10ms/step - loss: 0.3015 - accuracy: 0.8861
Epoch 5/5
118/118 [==============================] - 1s 11ms/step - loss: 0.2864 - accuracy: 0.8946
<tensorflow.python.keras.callbacks.History at 0x7f6f087014d0>
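
The `118/118` in the training log is the number of batches per epoch. With 1,176 training rows and `batch_size=10`, that works out as follows (a quick arithmetic sketch, nothing Keras-specific):

```python
import math

n_train = 1176      # rows in X_train
batch_size = 10     # value passed to KerasClassifier

steps_per_epoch = math.ceil(n_train / batch_size)
full_batches, last_batch = divmod(n_train, batch_size)

print(steps_per_epoch)            # 118
print(full_batches, last_batch)   # 117 full batches, then one batch of 6
```
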
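# KerasClassifier.predict should already return class labels, so the threshold below just converts them to booleans
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
# Rows are the true classes, columns are the predicted classes
confusion_matrix(y_test, y_pred)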
array([[167,  80],
       [ 35,  12]])
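
To read the confusion matrix, it helps to turn it into a few summary numbers. The sketch below uses the four counts printed above and scikit-learn's layout (rows are true classes, columns are predicted classes); the variable names are my own.

```python
# Counts copied from the confusion matrix above
tn, fp = 167, 80    # true "No":  predicted No / predicted Yes
fn, tp = 35, 12     # true "Yes": predicted No / predicted Yes

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)       # of predicted leavers, how many really left
recall = tp / (tp + fn)          # of real leavers, how many were caught
baseline = (tn + fp) / total     # accuracy of always predicting "No"

print(f"accuracy:  {accuracy:.4f}")
print(f"precision: {precision:.4f}")
print(f"recall:    {recall:.4f}")
print(f"baseline:  {baseline:.4f}")
```

Taken at face value, these counts give a test accuracy of about 0.61, which is below the 0.84 you would get by always predicting "No", so accuracy alone is not a good yardstick for an imbalanced target like this one.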

Thank you!