Google - Trying Out the Machine Learning Library TensorFlow - Ain't it fun living in the real world

Trying the tutorials

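Google recently open-sourced its in-house machine learning library, so I gave it a try right away.
There are already quite a few blog posts and articles about it, so please bear with one more.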

There are basically two tutorials: one for machine learning beginners and one for experts. For now, I'd like to work through both.

First, installation

I already had pip installed, so I created a virtualenv environment for TensorFlow and then ran

pip install https://storage.googleapis.com/tensorflow/mac/tensorflow-0.5.0-py2-none-any.whl

and the installation was done!

To check that it was installed correctly, I tested it in IPython as shown below.

$ ipython
In [1]: import tensorflow as tf
In [2]: hello = tf.constant('Hello TensorFlow!')
In [3]: sess = tf.Session()
In [4]: print sess.run(hello)
Hello TensorFlow!
In [5]: a = tf.constant(10)
In [6]: b = tf.constant(32)
In [7]: print sess.run(a+b)
42

It works, it works.
By the way, when using an interactive shell such as IPython,
sess = tf.InteractiveSession()
is apparently the better choice.

MNIST For ML Beginners

Let's start with the beginner tutorial.
First of all, what is MNIST? In short, MNIST is a collection of images of handwritten digits. There are lots of them on Yann LeCun's site: http://yann.lecun.com/exdb/mnist/
[Figure: sample MNIST handwritten digit images]
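They look just like this.
What the tutorial does is feed a large amount of MNIST data to TensorFlow, predict which digit each image represents, compare the predictions with the correct answers, and measure the accuracy.
MNIST can be downloaded here: http://yann.lecun.com/exdb/mnist/
From there, download these four files:
train-images-idx3-ubyte.gz
train-labels-idx1-ubyte.gz
t10k-images-idx3-ubyte.gz
t10k-labels-idx1-ubyte.gz
Files with train in their names are the training data; the t10k files are the test data used to evaluate the trained model. The images files hold the binary image data itself, and the labels files hold the digit corresponding to each image.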
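As a side note, the idx3/idx1 in those file names refers to the IDX format described on the MNIST page: a big-endian header followed by raw bytes. Here is a minimal sketch of reading just the header with the standard library. The function names are my own, and read_idx_header is a hypothetical helper that assumes the files were saved without renaming; the tutorial itself uses input_data for all of this.

```python
import gzip
import struct

# Magic numbers from the IDX format description on the MNIST page.
IMAGE_MAGIC = 2051  # idx3-ubyte: image files
LABEL_MAGIC = 2049  # idx1-ubyte: label files


def parse_idx_header(raw):
    """Return (kind, count, rows, cols) from the first bytes of an IDX file."""
    # The header fields are big-endian unsigned 32-bit integers.
    magic, count = struct.unpack(">II", raw[:8])
    if magic == IMAGE_MAGIC:
        rows, cols = struct.unpack(">II", raw[8:16])
        return ("images", count, rows, cols)
    if magic == LABEL_MAGIC:
        # Label files have no rows/cols fields: one byte per label follows.
        return ("labels", count, None, None)
    raise ValueError("unexpected magic number: %d" % magic)


def read_idx_header(path):
    """Hypothetical helper: open a downloaded .gz file and parse its header."""
    with gzip.open(path, "rb") as f:
        return parse_idx_header(f.read(16))
```

Calling read_idx_header("train-images-idx3-ubyte.gz") should report the number of images and their size in pixels, which is a quick way to confirm a download is not corrupted.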
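Before playing with TensorFlow there are a few things to know, namely the concepts of Session and Operation. Put simply, you first build up a set of Operations, and a Session is what executes them: when you run a Session, the Operations you asked for are executed. Some Operations are value-like and some are function-like. They behave quite differently, but note that they all fall under the same Operation umbrella. Here are the three value-like ones that come up most often in this tutorial.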
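Variable: a variable. Its value can change within a Session.
constant: a constant. Its value does not change within a Session.
placeholder: a box(?). A box that receives argument-like values when you run the Session.
I think basically everything other than these is function-like.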
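To make the "build first, run later" idea concrete, here is a toy analogy in plain Python. To be clear, this is not how TensorFlow is implemented: the classes and the run function below are all made up for illustration, and they only mimic the roles of constant, placeholder, Variable and sess.run with feed_dict.

```python
class Constant(object):
    """Value-like node whose value never changes."""
    def __init__(self, value):
        self.value = value

    def evaluate(self, feed):
        return self.value


class Placeholder(object):
    """Empty box: its value must be supplied when the graph is run."""
    def evaluate(self, feed):
        return feed[self]  # raises KeyError if nothing was fed


class Variable(object):
    """Value-like node whose value can be updated between runs."""
    def __init__(self, initial):
        self.value = initial

    def assign(self, new_value):
        self.value = new_value

    def evaluate(self, feed):
        return self.value


class Add(object):
    """Function-like node: computes its result from other nodes."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, feed):
        return self.left.evaluate(feed) + self.right.evaluate(feed)


def run(node, feed_dict=None):
    """Stand-in for sess.run: evaluate a node using the fed values."""
    return node.evaluate(feed_dict or {})


a = Constant(10)
x = Placeholder()
w = Variable(1)
total = Add(Add(a, x), w)    # nothing is computed at this point

print(run(total, {x: 32}))   # 10 + 32 + 1 = 43
w.assign(2)                  # the Variable changes, the graph does not
print(run(total, {x: 32}))   # 10 + 32 + 2 = 44
```

The point of the analogy is only that defining total does no arithmetic; the work happens when run is called with values for the placeholder.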
Nothing really makes sense until you see the actual code, so here is the tutorial code.

# -*- coding: utf-8 -*-
from __future__ import absolute_import, unicode_literals
import input_data
import tensorflow as tf

### Load the MNIST (handwritten digit) data
mnist = input_data.read_data_sets("./MNIST_data/", one_hot=True)

### Setup
# A placeholder is a box for the "arguments" passed via feed_dict at sess.run time. None means that dimension (here, the batch size) can be any length.
# A Variable is, as the name says, a variable
x = tf.placeholder("float", [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
# The nn below presumably stands for Neural Network.
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder("float", [None, 10])
# Sum over all elements to get the cross-entropy
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
# Gradient descent (requires some math background)
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

### Initialize the Session
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

### Train for 1000 steps
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})  # values are passed to the placeholders here

### Show the result
print "~~~ Showing the result ~~~"
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))  # y is the prediction, y_ is the actual label
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})

This gets about 91% accuracy, yay. But according to Google, 91% is nothing special, and the Expert tutorial that comes next apparently reaches 99.2%. So on to that one.
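Before moving on, here is a rough plain-Python sketch of what the softmax, cross-entropy and accuracy lines in the beginner code compute, for my own understanding. The function names are mine, everything works on ordinary lists for one example at a time, and it is only meant to mirror the formulas, not TensorFlow's actual implementation (which works on whole batches).

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the max does not change the result but avoids overflow.
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true, y_pred):
    """-sum(y_ * log(y)) for a single example.

    Terms where the label is 0 contribute nothing, so they are skipped,
    which also avoids evaluating log(0).
    """
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predictions, labels):
    """Fraction of examples whose predicted class matches the label."""
    hits = [argmax(p) == argmax(l) for p, l in zip(predictions, labels)]
    return sum(hits) / float(len(hits))


probs = softmax([2.0, 1.0, 0.1])         # made-up scores for 3 classes
loss = cross_entropy([1, 0, 0], probs)   # one-hot label: class 0 is correct
print(accuracy([[0.9, 0.1], [0.2, 0.8]], [[1, 0], [1, 0]]))  # 0.5
```

With one-hot labels, the cross-entropy boils down to minus the log of the probability assigned to the correct class, so it shrinks toward 0 as the model becomes more confident in the right answer.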

Deep MNIST for Experts

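From here on, things like weights (which decide how much an input value is amplified or attenuated) and biases (an independent value added separately from the actual input, like the intercept b in ax+b) get more careful treatment. Anyway, on to the code.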

# -*-coding: utf-8 -*-
import input_data
import tensorflow as tf

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
x = tf.placeholder("float", shape=[None, 784])
x_image = tf.reshape(x, [-1,28,28,1])
y_ = tf.placeholder("float", shape=[None, 10])
sess = tf.InteractiveSession()

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

print "~~~ init ~~~"
sess.run(tf.initialize_all_variables())

print "~~~ learning 20000 times ~~~"
for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i%100 == 0:
        train_accuracy = accuracy.eval(feed_dict={
            x:batch[0], y_: batch[1], keep_prob: 1.0})
        print "step %d, training accuracy %g"%(i, train_accuracy)
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print "~~~ Showing the result ~~~"
print "test accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})

Output
~~~ learning 20000 times ~~~
step 0, training accuracy 0.02
step 100, training accuracy 0.86
step 200, training accuracy 0.94
step 300, training accuracy 0.92
step 400, training accuracy 0.96
step 500, training accuracy 0.92
.................
(rest omitted)

You can see the accuracy gradually going up.
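One detail in the expert code that puzzled me at first is where the 7 * 7 * 64 in W_fc1 comes from. Here is a small sketch that traces the image size through the layers, assuming what I understand about padding='SAME': the output size is the input size divided by the stride, rounded up. The function names are my own, and no TensorFlow is involved.

```python
import math


def same_padding_output(size, stride):
    """Output length along one dimension with padding='SAME'."""
    return int(math.ceil(size / float(stride)))


def trace_shapes(height=28, width=28):
    """Follow (height, width, channels) through the two conv/pool stages."""
    shapes = [("input", (height, width, 1))]
    for name, channels in (("1", 32), ("2", 64)):
        # conv2d uses stride 1, so the spatial size stays the same
        # and only the number of channels changes.
        height = same_padding_output(height, 1)
        width = same_padding_output(width, 1)
        shapes.append(("conv" + name, (height, width, channels)))
        # max_pool_2x2 uses stride 2, so the spatial size is halved.
        height = same_padding_output(height, 2)
        width = same_padding_output(width, 2)
        shapes.append(("pool" + name, (height, width, channels)))
    return shapes


for name, shape in trace_shapes():
    print("%-6s %s" % (name, shape))
```

The last row is pool2 with shape (7, 7, 64): 28 is halved to 14 and then to 7 by the two pooling layers, and the second convolution outputs 64 channels. Flattening that gives 7 * 7 * 64 = 3136 values per image, which is the input size of the fully connected layer.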

Summary

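With the tutorials so far, I think you can more or less follow what is going on. However, looking at the official docs, there are still plenty of hard-to-grasp Operations...
Hmm. I see. I'll keep studying (sweat)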

Finally, I'd like to wrap up by sharing my surprise that TensorFlow even comes with TensorBoard, a tool for visualizing the graph. Google is amazing.
[Figure: TensorBoard graph visualization screenshot]