My First Deep Learning Model


In this notebook, I build my first deep learning model with TensorFlow and Keras. The model predicts employee attrition (the Attrition column) from an HR dataset.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.metrics import confusion_matrix

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
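# keras.wrappers.scikit_learn only exists in older Keras 2.x releases; newer versions removed it,
# and the separate scikeras package provides a replacement KerasClassifier
from keras.wrappers.scikit_learn import KerasClassifier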

df = pd.read_csv('WA_Fn-UseC_-HR-Employee-Attrition.csv')
df.shape

(1470, 35)

df.head()

	Age	Attrition	BusinessTravel	DailyRate	Department	DistanceFromHome	Education	EducationField	EmployeeCount	EmployeeNumber	...	RelationshipSatisfaction	StandardHours	StockOptionLevel	TotalWorkingYears	TrainingTimesLastYear	WorkLifeBalance	YearsAtCompany	YearsInCurrentRole	YearsSinceLastPromotion	YearsWithCurrManager
0	41	Yes	Travel_Rarely	1102	Sales	1	2	Life Sciences	1	1	...	1	80	0	8	0	1	6	4	0	5
1	49	No	Travel_Frequently	279	Research & Development	8	1	Life Sciences	1	2	...	4	80	1	10	3	3	10	7	1	7
2	37	Yes	Travel_Rarely	1373	Research & Development	2	2	Other	1	4	...	2	80	0	7	3	3	0	0	0	0
3	33	No	Travel_Frequently	1392	Research & Development	3	4	Life Sciences	1	5	...	3	80	0	8	3	3	8	7	3	0
4	27	No	Travel_Rarely	591	Research & Development	2	1	Medical	1	7	...	4	80	1	6	3	3	2	2	2	2

5 rows × 35 columns

df.isna().sum()

Age                         0
Attrition                   0
BusinessTravel              0
DailyRate                   0
Department                  0
DistanceFromHome            0
Education                   0
EducationField              0
EmployeeCount               0
EmployeeNumber              0
EnvironmentSatisfaction     0
Gender                      0
HourlyRate                  0
JobInvolvement              0
JobLevel                    0
JobRole                     0
JobSatisfaction             0
MaritalStatus               0
MonthlyIncome               0
MonthlyRate                 0
NumCompaniesWorked          0
Over18                      0
OverTime                    0
PercentSalaryHike           0
PerformanceRating           0
RelationshipSatisfaction    0
StandardHours               0
StockOptionLevel            0
TotalWorkingYears           0
TrainingTimesLastYear       0
WorkLifeBalance             0
YearsAtCompany              0
YearsInCurrentRole          0
YearsSinceLastPromotion     0
YearsWithCurrManager        0
dtype: int64

There are no missing values. As mentioned above, the purpose of this notebook is to build a deep learning model, so I won't explore the data this time.

Dataset Splitting

X = pd.get_dummies(df.drop(columns = 'Attrition'))
y = df['Attrition'].map({'Yes' : 1, 'No' : 0})

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 101, stratify = y)
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((1176, 55), (294, 55), (1176,), (294,))
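The stratify = y argument keeps the share of "Yes" labels the same in the training and test splits, which matters because attrition is the minority class. A minimal sketch on a made-up 100-row frame (the column values and the 20% positive rate are hypothetical, not taken from the real CSV) shows get_dummies expanding a text column and the stratified split preserving the label ratio:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the HR CSV: 20 "Yes" and 80 "No" labels
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Age": rng.integers(18, 60, size=100),
    "OverTime": rng.choice(["Yes", "No"], size=100),
    "Attrition": ["Yes"] * 20 + ["No"] * 80,
})

# Same encoding steps as the notebook: one-hot the text columns, map the target to 0/1
X_toy = pd.get_dummies(toy.drop(columns="Attrition"))
y_toy = toy["Attrition"].map({"Yes": 1, "No": 0})

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=101, stratify=y_toy
)

print(list(X_toy.columns))       # ['Age', 'OverTime_No', 'OverTime_Yes']
print(y_tr.mean(), y_te.mean())  # 0.2 0.2
```

Both splits end up with exactly 20% positives, the same ratio as the full toy frame.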

Preprocessing

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
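Fitting the scaler on X_train only and reusing its statistics on X_test keeps information from the test set out of preprocessing. The small sketch below uses made-up numbers to check that the test row is scaled with the training mean and standard deviation rather than its own:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: three training rows, one test row, two features
X_tr = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_te = np.array([[4.0, 40.0]])

scaler = StandardScaler()
X_tr_s = scaler.fit_transform(X_tr)  # learns mean_ and scale_ from the training rows
X_te_s = scaler.transform(X_te)      # reuses those training statistics

# StandardScaler uses the population standard deviation (ddof=0), like np.std's default
expected = (X_te - X_tr.mean(axis=0)) / X_tr.std(axis=0)
print(np.allclose(X_te_s, expected))  # True
```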

Build the model

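classifier = Sequential()
classifier.add(Dense(64, activation = 'relu', input_dim = 55))  # 55 = number of columns after get_dummies
classifier.add(Dropout(rate = 0.1))                              # randomly drops 10% of hidden units while training
classifier.add(Dense(1, activation = 'sigmoid'))                 # probability that Attrition = Yes
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])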
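To make the layer stack concrete, here is a NumPy-only sketch of the computation this network performs on a batch of scaled rows. The weights are random placeholders (a trained model would have learned them), and the helper names forward, relu and sigmoid are made up for this illustration. The training branch mimics inverted dropout, which zeroes units and rescales the survivors; at prediction time dropout is switched off.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2, rate=0.1, training=False, rng=None):
    """Hypothetical forward pass: Dense(64, relu) -> Dropout(rate) -> Dense(1, sigmoid)."""
    h = relu(X @ W1 + b1)                     # hidden layer, shape (n, 64)
    if training:
        rng = rng if rng is not None else np.random.default_rng()
        keep = rng.random(h.shape) >= rate    # drop each unit with probability `rate`
        h = h * keep / (1.0 - rate)           # rescale survivors (inverted dropout)
    return sigmoid(h @ W2 + b2)               # output probabilities, shape (n, 1)

# Random placeholder weights for 55 input features
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 55))
W1, b1 = rng.normal(scale=0.1, size=(55, 64)), np.zeros(64)
W2, b2 = rng.normal(scale=0.1, size=(64, 1)), np.zeros(1)

p = forward(X, W1, b1, W2, b2)
print(p.shape)  # (4, 1)
```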

classifier.summary()

Model: "sequential_17"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_30 (Dense)             (None, 64)                3584      
_________________________________________________________________
dropout_14 (Dropout)         (None, 64)                0         
_________________________________________________________________
dense_31 (Dense)             (None, 1)                 65        
=================================================================
Total params: 3,649
Trainable params: 3,649
Non-trainable params: 0
_________________________________________________________________
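The Param # column follows from how a Dense layer is built: one weight per input-unit pair plus one bias per unit, while Dropout has nothing to learn. A quick arithmetic check (dense_params is a helper name made up for this sketch):

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(55, 64)  # first Dense layer on the 55 input features
output = dense_params(64, 1)   # single sigmoid output unit
dropout = 0                    # Dropout has no trainable weights

print(hidden, output, hidden + dropout + output)  # 3584 65 3649
```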

def make_model():
    # Builds and compiles a fresh copy of the model above; KerasClassifier calls this for every fit
    classifier = Sequential()
    classifier.add(Dense(64, activation = 'relu', input_dim = 55))
    classifier.add(Dropout(rate = 0.1))
    classifier.add(Dense(1, activation = 'sigmoid'))
    classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
    return classifier

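# batch_size and epochs are forwarded to the underlying model's fit(); nb_epoch is the old Keras 1 name for epochs
classifier = KerasClassifier(build_fn = make_model, batch_size = 10, epochs = 1)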

# 10-fold cross-validation on the training set; n_jobs = -1 runs the folds on all CPU cores
acc = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10, n_jobs = -1)

acc.mean()

0.8495581686496735
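An accuracy of about 0.85 is easier to judge next to a majority-class baseline, because only about 16% of employees have Attrition = Yes (47 of the 294 test rows in the confusion matrix at the end). scikit-learn's DummyClassifier provides such a baseline; the sketch below runs it on hypothetical labels with a similar imbalance (16 positives out of 100) rather than on the real data:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical imbalanced labels: 16 positives ("Yes") out of 100 rows
y_demo = np.array([1] * 16 + [0] * 84)
X_demo = np.zeros((100, 3))  # features are ignored by the dummy model

# Always predicts the most frequent class seen in training (here 0, "No")
baseline = DummyClassifier(strategy="most_frequent")
scores = cross_val_score(baseline, X_demo, y_demo, cv=10)
print(scores.mean())  # about 0.84
```

A model that never predicts "Yes" already scores close to the share of "No" labels, so accuracy alone says little about how well attrition cases are caught.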

Running Predictions

classifier.fit(X_train, y_train, epochs = 5)

Epoch 1/5
118/118 [==============================] - 1s 11ms/step - loss: 0.5126 - accuracy: 0.7662
Epoch 2/5
118/118 [==============================] - 1s 10ms/step - loss: 0.3574 - accuracy: 0.8656
Epoch 3/5
118/118 [==============================] - 1s 11ms/step - loss: 0.3186 - accuracy: 0.8801
Epoch 4/5
118/118 [==============================] - 1s 10ms/step - loss: 0.3015 - accuracy: 0.8861
Epoch 5/5
118/118 [==============================] - 1s 11ms/step - loss: 0.2864 - accuracy: 0.8946





<tensorflow.python.keras.callbacks.History at 0x7f6f087014d0>
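The loss printed for each epoch is binary cross-entropy, the loss chosen in compile(). For a label y and predicted probability p it averages -[y log p + (1 - y) log(1 - p)] over the samples. A plain NumPy version for intuition (the clipping constant eps is an assumption added here to avoid log(0)):

```python
import numpy as np

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy; eps clipping guards against log(0)."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

# A coin-flip prediction costs log(2), about 0.693, per sample
print(binary_crossentropy([1, 0], [0.5, 0.5]))
# Confident, correct predictions give a small loss: -log(0.9), about 0.105
print(binary_crossentropy([1, 0], [0.9, 0.1]))
```

The falling loss in the log above means the predicted probabilities are moving toward the true 0/1 labels on the training data.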

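y_pred = classifier.predict(X_test)
# Threshold at 0.5 to get boolean class labels (the wrapper's predict() already returns 0/1 labels, so this is a safeguard)
y_pred = (y_pred > 0.5)
confusion_matrix(y_test, y_pred)  # rows = actual (No, Yes), columns = predicted (No, Yes)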

array([[167,  80],
       [ 35,  12]])
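scikit-learn's confusion_matrix puts actual classes in rows and predicted classes in columns, ordered 0 (No) then 1 (Yes). Reading the matrix above that way and turning it into the usual summary numbers:

```python
import numpy as np

# Values copied from the confusion matrix above: rows = actual, columns = predicted
cm = np.array([[167, 80],
               [35, 12]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)       # of the predicted "Yes", how many actually left
recall = tp / (tp + fn)          # of the actual "Yes", how many were caught
baseline = (tn + fp) / cm.sum()  # accuracy of always predicting "No"

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} baseline={baseline:.3f}")
# accuracy=0.609 precision=0.130 recall=0.255 baseline=0.840
```

By this reading, the test accuracy is about 0.61, below the 0.84 an always-"No" predictor would reach on the same 294 rows.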

Thank you!


Stevanus Setiawan
