Training a Simple CNN with GradientTape

So far we have implemented a CNN with the Sequential API, the Functional API, and the Model Subclassing API.

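This time, instead of compiling the model with model.compile and training it with model.fit, we wire the loss function and the optimizer to the model ourselves and train the CNN with tf.GradientTape. As before, the dataset is Fashion_MNIST, and the CNN is the same model as before, built with the Model Subclassing API.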

The part that compiled the model with model.compile looked like this.

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

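Instead of calling model.compile, we choose the loss function and the optimizer as follows and apply them to the model during training. We also choose the metrics that measure the model's performance on the training and test datasets.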

# loss function and optimizer
get_loss = tf.keras.losses.SparseCategoricalCrossentropy()
opt = tf.keras.optimizers.Adam(learning_rate=0.001)

# metrics
mean_train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseTopKCategoricalAccuracy(name='train_accuracy')
mean_test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.SparseTopKCategoricalAccuracy(name='test_accuracy')
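The choice of accuracy metric matters here. The following is a minimal sketch, using made-up predictions that are not from the original run, of how SparseTopKCategoricalAccuracy (default k=5) differs from SparseCategoricalAccuracy, which is the top-1 metric that corresponds to metrics=['accuracy'] with sparse labels. The names top1, top5, y_true and y_pred are only for illustration.

```python
import numpy as np
import tensorflow as tf

# Hypothetical predictions for 2 samples over 10 classes (illustration only).
y_true = np.array([3, 7])
y_pred = np.array([
    # class 3 has the highest score -> correct for top-1 and top-5
    [0.02, 0.02, 0.02, 0.60, 0.10, 0.10, 0.05, 0.03, 0.03, 0.03],
    # class 7 has only the second-highest score -> wrong for top-1, correct for top-5
    [0.40, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.20, 0.05, 0.05],
], dtype=np.float32)

top1 = tf.keras.metrics.SparseCategoricalAccuracy()
top5 = tf.keras.metrics.SparseTopKCategoricalAccuracy()  # k defaults to 5

top1.update_state(y_true, y_pred)
top5.update_state(y_true, y_pred)

print('top-1 accuracy:', float(top1.result()))
print('top-5 accuracy:', float(top5.result()))
```

If the goal is to compare numbers with the earlier model.fit runs, swapping in SparseCategoricalAccuracy would be the closer match.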

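The loss function is cross entropy, and optimization uses the Adam optimizer with a learning rate of 0.001. Note that SparseTopKCategoricalAccuracy defaults to k=5, so these accuracy metrics count a prediction as correct when the true label is among the five highest-scoring classes, whereas metrics=['accuracy'] in model.compile measures top-1 accuracy. The part that trained the model looked like this.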

history = model.fit(x_train, y_train, epochs=10, validation_split=0.25)

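Instead of calling model.fit, we train the model with tf.GradientTape as follows, and handle the accuracy computation on the test dataset in the same place. Where model.fit held out 25% of the training data for validation, here the whole training set is used for training and the test dataset is used for evaluation.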

def train_step(images, labels):
  with tf.GradientTape() as tape:
    predictions = model(images)
    loss = get_loss(labels, predictions)
  grad = tape.gradient(loss, model.trainable_variables)
  opt.apply_gradients(zip(grad, model.trainable_variables))

  mean_train_loss(loss)
  train_accuracy(labels, predictions)

def test_step(images, labels):
  predictions = model(images)
  loss_t = get_loss(labels, predictions)

  mean_test_loss(loss_t)
  test_accuracy(labels, predictions)
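To see in isolation what the tape and the optimizer do inside train_step, here is a minimal sketch on a single-weight Dense layer. The layer, the input value, and the SGD optimizer are assumptions chosen so that the numbers can be checked by hand; they are not part of the CNN in this post.

```python
import tensorflow as tf

# A one-weight layer: y = w * x, with w initialised to 1.0 (illustrative setup).
layer = tf.keras.layers.Dense(1, use_bias=False, kernel_initializer='ones')
sgd = tf.keras.optimizers.SGD(learning_rate=0.1)
x = tf.constant([[2.0]])

with tf.GradientTape() as tape:
    y = layer(x)                      # y = 2w
    loss = tf.reduce_sum(y * y)       # loss = 4w^2

# d(loss)/dw = 8w, which is 8.0 at w = 1.0
grads = tape.gradient(loss, layer.trainable_variables)

# Plain SGD update: w <- w - 0.1 * 8.0 = 0.2
sgd.apply_gradients(zip(grads, layer.trainable_variables))

print('gradient:', grads[0].numpy())
print('updated weight:', layer.kernel.numpy())
```

The same two calls appear in train_step, only with the CNN's trainable variables and the Adam optimizer.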

train_step trains the network on the training dataset, and test_step computes the performance on the test dataset. GradientTape.gradient computes the gradients, and apply_gradients updates the network parameters. The batch size is set to 32.

batch_train = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(10000).batch(32)
batch_test = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)
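As a quick check of what such a pipeline yields, the sketch below runs the same from_tensor_slices, shuffle, and batch calls on a small dummy array (100 all-zero images, an assumption made only to keep the example fast) and collects the batch shapes.

```python
import numpy as np
import tensorflow as tf

# Dummy data standing in for Fashion_MNIST: 100 images of shape 28x28x1.
x_demo = np.zeros((100, 28, 28, 1), dtype=np.float32)
y_demo = np.arange(100) % 10

ds = tf.data.Dataset.from_tensor_slices((x_demo, y_demo)).shuffle(100).batch(32)

# 100 = 3 * 32 + 4, so the last batch is smaller than the others.
shapes = [tuple(images.shape) for images, labels in ds]
print(shapes)
```

The final partial batch is kept by default; batch(32, drop_remainder=True) would discard it.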

Training runs for 10 epochs, and the results are printed after each epoch.

EPOCHS = 10

train_loss_history = []
test_loss_history = []

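for epoch in range(EPOCHS):
  # Reset the metrics so that each epoch reports its own statistics
  # (on TensorFlow releases before 2.5 the method is reset_states())
  for metric in (mean_train_loss, train_accuracy, mean_test_loss, test_accuracy):
    metric.reset_state()
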
  for images, labels in batch_train:
    train_step(images, labels)

  for images, labels in batch_test:
    test_step(images, labels)

  train_loss_history.append(mean_train_loss.result().numpy())
  test_loss_history.append((mean_test_loss.result().numpy()))

  print('epoch: {}, loss: {}, accuracy: {}, test_loss: {}, test_acc: {}'.format(
    epoch+1,
    mean_train_loss.result(),
    train_accuracy.result(),
    mean_test_loss.result(),
    test_accuracy.result()
  ))
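Keras metric objects keep accumulating over every call until they are reset, so a metric that is never reset reports a running average over all epochs so far rather than a per-epoch value. A minimal sketch of that behaviour follows; the metric name demo_loss and the values are made up, and the getattr line is only there because the reset method is called reset_state in newer releases and reset_states in older TF 2.x versions.

```python
import tensorflow as tf

m = tf.keras.metrics.Mean(name='demo_loss')

# Pick whichever reset method this TensorFlow/Keras version provides.
reset = getattr(m, 'reset_state', None) or m.reset_states

# "Epoch 1": two loss values
m(tf.constant(1.0))
m(tf.constant(3.0))
first = float(m.result())      # mean of 1 and 3

# "Epoch 2" without a reset: the old values still count
m(tf.constant(5.0))
carried = float(m.result())    # mean of 1, 3 and 5

# "Epoch 2" after a reset: only the new value counts
reset()
m(tf.constant(5.0))
fresh = float(m.result())

print(first, carried, fresh)
```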

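The test accuracy comes out at about 99%. Because the metric is SparseTopKCategoricalAccuracy with its default k=5, this figure is top-5 accuracy, not top-1 accuracy.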

The following code takes 20 fashion images from the test dataset to check whether the CNN recognizes the items correctly.

labels = model(x_test)

fig = plt.figure(figsize=(10,10))
for i in range(20):
    subplot=fig.add_subplot(4,5,i+1)
    subplot.set_xticks([])
    subplot.set_yticks([])
    subplot.set_title('%d' % np.argmax(labels[i]))
    subplot.imshow(x_test[i].reshape(28,28), cmap='gray')

plt.show()

The resulting figure shows each image with its predicted label as the title. The fashion items are recognized correctly.

For reference, Fashion_MNIST has 10 labels: 0: T-shirt/top, 1: Trouser, 2: Pullover, 3: Dress, 4: Coat, 5: Sandal, 6: Shirt, 7: Sneaker, 8: Bag, 9: Ankle boot.
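If names are easier to read than numbers in the plot titles, a lookup list does the job. The sketch below uses the standard Fashion-MNIST class names and a made-up probability vector; class_names, probs and predicted_name are illustrative names, not part of the code in this post.

```python
import numpy as np

# Fashion-MNIST class names, indexed by label 0..9.
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Hypothetical softmax output for one image (illustration only).
probs = np.array([0.01, 0.01, 0.01, 0.01, 0.01, 0.02, 0.01, 0.02, 0.0, 0.9])

predicted_name = class_names[int(np.argmax(probs))]
print(predicted_name)
```

In the plotting loop, the title could then be set with class_names[np.argmax(labels[i])] instead of the raw index.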

The full code is as follows.

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np

# loading fashion_mnist data
fashion_mnist = tf.keras.datasets.fashion_mnist
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()


# adjusting to 0 ~ 1.0
x_train = x_train / 255.0
x_test = x_test / 255.0


# reshaping
x_train = x_train.reshape(-1,28,28,1)
x_test = x_test.reshape(-1,28,28,1)


class MyModel(tf.keras.Model):
  def __init__(self):
    super(MyModel, self).__init__()
    self.conv1 = tf.keras.layers.Conv2D(kernel_size=(3,3), filters=16, activation='relu')
    self.conv2 = tf.keras.layers.Conv2D(kernel_size=(3,3), filters=32, activation='relu')
    self.conv3 = tf.keras.layers.Conv2D(kernel_size=(3,3), filters=64, activation='relu')
    self.pool = tf.keras.layers.MaxPooling2D((2, 2))
    self.flatten = tf.keras.layers.Flatten()
    self.d1 = tf.keras.layers.Dense(32, activation='relu')
    self.d2 = tf.keras.layers.Dense(10, activation='softmax')

  def call(self, x):
    x = self.conv1(x)
    x = self.pool(x)
    x = self.conv2(x)
    x = self.pool(x)
    x = self.conv3(x)
    x = self.flatten(x)
    x = self.d1(x)
    return self.d2(x)

model = MyModel()

# instead of model.compile --------------------------------

# loss function and optimizer
get_loss = tf.keras.losses.SparseCategoricalCrossentropy()
opt = tf.keras.optimizers.Adam(learning_rate=0.001)

# metrics
mean_train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseTopKCategoricalAccuracy(name='train_accuracy')
mean_test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.SparseTopKCategoricalAccuracy(name='test_accuracy')
# -----------------------------------------------


# instead of model.fit -------------------------------------
def train_step(images, labels):
  with tf.GradientTape() as tape:
    predictions = model(images)
    loss = get_loss(labels, predictions)
  grad = tape.gradient(loss, model.trainable_variables)
  opt.apply_gradients(zip(grad, model.trainable_variables))

  mean_train_loss(loss)
  train_accuracy(labels, predictions)

def test_step(images, labels):
  predictions = model(images)
  loss_t = get_loss(labels, predictions)

  mean_test_loss(loss_t)
  test_accuracy(labels, predictions)

# ---------------------------------------------------

# batch
batch_train = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(10000).batch(32)
batch_test = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)

# train
EPOCHS = 10

train_loss_history = []
test_loss_history = []

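for epoch in range(EPOCHS):
  # Reset the metrics so that each epoch reports its own statistics
  # (on TensorFlow releases before 2.5 the method is reset_states())
  for metric in (mean_train_loss, train_accuracy, mean_test_loss, test_accuracy):
    metric.reset_state()
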
  for images, labels in batch_train:
    train_step(images, labels)

  for images, labels in batch_test:
    test_step(images, labels)

  train_loss_history.append(mean_train_loss.result().numpy())
  test_loss_history.append((mean_test_loss.result().numpy()))

  print('epoch: {}, loss: {}, accuracy: {}, test_loss: {}, test_acc: {}'.format(
    epoch+1,
    mean_train_loss.result(),
    train_accuracy.result(),
    mean_test_loss.result(),
    test_accuracy.result()
  ))

plt.plot(train_loss_history, "-", label="train")
plt.plot(test_loss_history, "--", label="test")
plt.legend()
plt.show()

# test view

labels = model(x_test)

fig = plt.figure(figsize=(10,10))
for i in range(20):
    subplot=fig.add_subplot(4,5,i+1)
    subplot.set_xticks([])
    subplot.set_yticks([])
    subplot.set_title('%d' % np.argmax(labels[i]))
    subplot.imshow(x_test[i].reshape(28,28), cmap='gray')

plt.show()
