TLDR

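I am trying to fit a simple neural network on MNIST. It works for a small debugging setup, but when I move it over to a subset of MNIST it trains very fast and the gradient drops close to 0 very quickly, yet it then outputs the same value for any given input and the final cost is quite high. I had been deliberately trying to overfit to make sure it actually works, but it will not do so on MNIST, which suggests a deep problem in the setup. I have checked my backpropagation implementation with gradient checking and it seems to match, so I am not sure where the error lies or what to work on now.

Many thanks for any help you can offer; I have been struggling to fix this!

Explanation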

I have been trying to build a neural network in NumPy, based on these explanations:

http://ufldl.stanford.edu/wiki/index.php/Neural_Networks
http://ufldl.stanford.edu/wiki/index.php/Backpropagation_Algorithm

Backpropagation seems to match gradient checking (in magnitude; note that the printed signs are opposite):

Backpropagation: [ 0.01168585, 0.06629858, -0.00112408, -0.00642625, -0.01339408,
    -0.07580145, 0.00285868, 0.01628148, 0.00365659, 0.0208475 ,
    0.11194151, 0.16696139, 0.10999967, 0.13873069, 0.13049299,
    -0.09012582, -0.1344335 , -0.08857648, -0.11168955, -0.10506167]
Gradient Checking: [-0.01168585 -0.06629858 0.00112408 0.00642625 0.01339408
    0.07580145 -0.00285868 -0.01628148 -0.00365659 -0.0208475
    -0.11194151 -0.16696139 -0.10999967 -0.13873069 -0.13049299
    0.09012582 0.1344335 0.08857648 0.11168955 0.10506167]

And when I train on this simple debug setup (the 2-5-2 network in cell In[312] of the code dump below):

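I get this lovely training curve:

[Figure: training curve (error vs. iterations) for the debug setup]

Admittedly this is clearly a dumbed-down, very easy function to fit. However, as soon as I bring it over to MNIST, with this setup: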

# Number of input, hidden and output nodes
# Input = 28 x 28 pixels
input_nodes=784
# Arbitrary number of hidden nodes, experiment to improve
hidden_nodes=200
# Output = one of the digits [0,1,2,3,4,5,6,7,8,9]
output_nodes=10

# Learning rate
learning_rate=0.4

# Regularisation parameter
lambd=0.0

When I run this setup with the code below for 100 iterations, it does seem to train at first, but then it flat-lines quite quickly and does not reach a very good model:

Initial ===== Cost (unregularised): 2.09203670985 /// Cost (regularised):  2.09203670985 Mean Gradient: 0.0321241229793 
Iteration 100 Cost (unregularised): 0.980999805477 /// Cost (regularised): 0.980999805477 Mean Gradient: -5.29639499854e-09 
TRAINED IN 26.45932364463806

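This then gives really poor test accuracy and predicts the same output for everything. Even when tested with all inputs set to 0.1 or all set to 0.9, I just get the same output (although exactly which digit it outputs varies with the initial random weights):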

Test accuracy: 8.92 
Targets 2 2 1 7 2 2 0 2 3 
Hypothesis 5 5 5 5 5 5 5 5 5
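
A quick arithmetic check on the plateau value may be useful here. With targets of 0.01 and 0.99 (as built in the code below) and the cost defined as the summed squared error divided by the number of examples, a network whose ten outputs all saturate near 0 would score 9 x 0.01^2 + 0.99^2 = 0.981 per example, which is consistent with the value the logged cost settles at. The sketch below recomputes that figure. The all-zero output matrix and the label values are assumptions for illustration, not something read from the trained network.

```python
import numpy as np

# Targets built the same way as in the post: 0.99 for the true digit, 0.01 elsewhere
output_nodes = 10
labels = np.array([5, 0, 4, 1, 9])  # hypothetical labels; any digits give the same cost
targets = (np.identity(output_nodes) * 0.98)[labels, :] + 0.01

# Assumption: every output unit has saturated at 0 for every example
h = np.zeros_like(targets)

# Same form as costFunc in the post: summed squared error divided by m
m = h.shape[0]
cost = np.sum((h - targets) ** 2) / m
print(cost)  # approximately 0.981
```

If the plateau really does correspond to saturated outputs, sigmoidGradient(z3) would be close to 0 there, which would also be consistent with the near-zero mean gradient in the log.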

And the training curve for MNIST:

[Figure: training curve (error vs. iterations) for the MNIST run]

Code dump:

# Import dependencies 
import numpy as np 
import time 
import csv 
import matplotlib.pyplot 
import random 
import math 

# Read in training data 
with open('MNIST/mnist_train_100.csv') as file: 
    train_data=np.array([list(map(int,line.strip().split(','))) for line in file.readlines()]) 


# In[197]: 

# Plot a sample of training data to visualise 
displayData(train_data[:,1:], 25) 


# In[198]: 

# Read in test data 
with open('MNIST/mnist_test.csv') as file: 
    test_data=np.array([list(map(int,line.strip().split(','))) for line in file.readlines()]) 

# Main neural network class 
class neuralNetwork: 
    # Define the architecture 
    def __init__(self, i, h, o, lr, lda): 
     # Number of nodes in each layer 
     self.i=i 
     self.h=h 
     self.o=o 
     # Learning rate 
     self.lr=lr 
     # Lambda for regularisation 
     self.lda=lda 

     # Randomly initialise the parameters, input-> hidden and hidden-> output 
     self.ih=np.random.normal(0.0,pow(self.h,-0.5),(self.h,self.i)) 
     self.ho=np.random.normal(0.0,pow(self.o,-0.5),(self.o,self.h)) 

    def predict(self, X): 
     # GET HYPOTHESIS ESTIMATES/ OUTPUTS 
     # Compute activation to hidden node. No bias column is appended to X; 
     # a constant 1 is added to every pre-activation instead. 
     # NOTE: backprop() and backpropIter() compute z2 and z3 without this +1, 
     # so prediction uses a different forward pass from training. 
     z2=np.dot(X,self.ih.T) + 1 
     a2=sigmoid(z2) 
     # Compute activation to output node (same constant +1 as above) 
     z3=np.dot(a2,self.ho.T) + 1 
     h=sigmoid(z3) 
     # Index of the largest output unit for each example 
     outputs=np.argmax(h.T,axis=0) 

     return outputs 

    def backprop (self, X, y): 
     try: 
      m = X.shape[0] 
     except: 
      m=1 

     # GET HYPOTHESIS ESTIMATES/ OUTPUTS 
     # Add bias node x(0)=1 for all training examples, X is now m x n+1 
     # Then compute activation to hidden node 
     z2=np.dot(X,self.ih.T) 
     #print(a1.shape) 
     a2=sigmoid(z2) 
     #print(ha) 
     # Add bias node h(0)=1 for all training examples, H is now m x h+1 
     # Then compute activation to output node 
     z3=np.dot(a2,self.ho.T) 
     h=sigmoid(z3) 

     # Compute error/ cost for this setup (unregularised and regularise) 
     costUn=self.costFunc(h,y) 
     costReg=self.costFuncReg(h,y) 

     # Output error term 
     d3=-(y-h)*sigmoidGradient(z3) 

     # Hidden error term 
     d2=np.dot(d3,self.ho)*sigmoidGradient(z2) 

     # Partial derivatives for weights 
     D2=np.dot(d3.T,a2) 
     D1=np.dot(d2.T,X) 

     # Partial derivatives of theta with regularisation 
     T2Grad=(D2/m)+(self.lda/m)*(self.ho) 
     T1Grad=(D1/m)+(self.lda/m)*(self.ih) 

     # Update weights 
     # Hidden layer (weights 1) 
     self.ih-=self.lr*(((D1)/m) + (self.lda/m)*self.ih) 
     # Output layer (weights 2) 
     self.ho-=self.lr*(((D2)/m) + (self.lda/m)*self.ho) 

     # Unroll gradients to one long vector 
     grad=np.concatenate(((T1Grad).ravel(),(T2Grad).ravel())) 

     return costReg, costUn, grad 

    def backpropIter (self, X, y): 
     try: 
      m = X.shape[0] 
     except: 
      m=1 

     # GET HYPOTHESIS ESTIMATES/ OUTPUTS 
     # Add bias node x(0)=1 for all training examples, X is now m x n+1 
     # Then compute activation to hidden node 
     z2=np.dot(X,self.ih.T) 
     #print(a1.shape) 
     a2=sigmoid(z2) 
     #print(ha) 
     # Add bias node h(0)=1 for all training examples, H is now m x h+1 
     # Then compute activation to output node 
     z3=np.dot(a2,self.ho.T) 
     h=sigmoid(z3) 

     # Compute error/ cost for this setup (unregularised and regularise) 
     costUn=self.costFunc(h,y) 
     costReg=self.costFuncReg(h,y) 

     gradW1=np.zeros(self.ih.shape) 
     gradW2=np.zeros(self.ho.shape) 
     for i in range(m): 
      delta3 = -(y[i,:]-h[i,:])*sigmoidGradient(z3[i,:]) 
      delta2 = np.dot(self.ho.T,delta3)*sigmoidGradient(z2[i,:]) 

      gradW2= gradW2 + np.outer(delta3,a2[i,:]) 
      gradW1 = gradW1 + np.outer(delta2,X[i,:]) 

     # Update weights 
     # Hidden layer (weights 1) 
     #self.ih-=self.lr*(((gradW1)/m) + (self.lda/m)*self.ih) 
     # Output layer (weights 2) 
     #self.ho-=self.lr*(((gradW2)/m) + (self.lda/m)*self.ho) 

     # Unroll gradients to one long vector 
     grad=np.concatenate(((gradW1).ravel(),(gradW2).ravel())) 

     return costUn, costReg, grad 

    def gradDesc(self, X, y): 
     # Backpropagate to get updates 
     cost,costreg,grad=self.backpropIter(X,y) 

     # Unroll parameters 
     deltaW1=np.reshape(grad[0:self.h*self.i],(self.h,self.i)) 
     deltaW2=np.reshape(grad[self.h*self.i:],(self.o,self.h)) 

     # m = no. training examples 
     m=X.shape[0] 
     #print (self.ih) 
     self.ih -= self.lr * ((deltaW1))#/m) + (self.lda * self.ih)) 
     self.ho -= self.lr * ((deltaW2))#/m) + (self.lda * self.ho)) 
     #print(deltaW1) 
     #print(self.ih) 
     return cost,costreg,grad 


    # Gradient checking to compute the gradient numerically to debug backpropagation 
    def gradCheck(self, X, y): 
     # Unroll theta 
     theta=np.concatenate(((self.ih).ravel(),(self.ho).ravel())) 
     # perturb will add and subtract epsilon, numgrad will store answers 
     perturb=np.zeros(len(theta)) 
     numgrad=np.zeros(len(theta)) 
     # epsilon, e is a small number 
     e = 0.00001 
     # Loop over all theta 
     for i in range(len(theta)): 
      # Perturb is zeros with one index being e 
      perturb[i]=e 
      loss1=self.costFuncGradientCheck(theta-perturb, X, y) 
      loss2=self.costFuncGradientCheck(theta+perturb, X, y) 
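      # Compute numerical gradient and update vectors 
      # NOTE: loss1 is the cost at theta-e and loss2 the cost at theta+e, so this is 
      # the negated central difference; (loss2-loss1)/(2*e) estimates dJ/dtheta 
      numgrad[i]=(loss1-loss2)/(2*e) 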
      perturb[i]=0 
     return numgrad 

    def costFuncGradientCheck(self,theta,X,y): 
     T1=np.reshape(theta[0:self.h*self.i],(self.h,self.i)) 
     T2=np.reshape(theta[self.h*self.i:],(self.o,self.h)) 
     m=X.shape[0] 
     # GET HYPOTHESIS ESTIMATES/ OUTPUTS 
     # Compute activation to hidden node 
     z2=np.dot(X,T1.T) 
     a2=sigmoid(z2) 
     # Compute activation to output node 
     z3=np.dot(a2,T2.T) 
     h=sigmoid(z3) 

     cost=self.costFunc(h, y) 
     return cost #+ ((self.lda/2)*(np.sum(pow(T1,2)) + np.sum(pow(T2,2)))) 

    def costFunc(self, h, y): 
     m=h.shape[0] 
     return np.sum(pow((h-y),2))/m 

    def costFuncReg(self, h, y): 
     cost=self.costFunc(h, y) 
     return cost #+ ((self.lda/2)*(np.sum(pow(self.ih,2)) + np.sum(pow(self.ho,2)))) 

# Helper functions to compute sigmoid and gradient for an input number or matrix 
def sigmoid(Z): 
    return np.divide(1,np.add(1,np.exp(-Z))) 
def sigmoidGradient(Z): 
    return sigmoid(Z)*(1-sigmoid(Z)) 

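# Pre-processing helper functions 
# Scale pixel values from 0-255 to 0.1-1.09 (0.99 scale plus 0.1 offset), as zero inputs kill the weight changes 
# np.asarray with dtype=float is used because np.asfarray was removed in NumPy 2.0 
def scaleDataVec(data): 
    return (np.asarray(data[1:], dtype=float)/255.0 * 0.99) + 0.1 

def scaleData(data): 
    return (np.asarray(data[:,1:], dtype=float)/255.0 * 0.99) + 0.1 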

# DISPLAY DATA 
# plot_data will be what to plot, num_ex must be a square number of how many examples to plot, random examples will then be plotted 
def displayData(plot_data, num_ex, rand=1): 
    if rand==0: 
     data=plot_data 
    else: 
     rand_indexes=random.sample(range(plot_data.shape[0]),num_ex) 
     data=plot_data[rand_indexes,:] 
    # Useful variables, m= no. train ex, n= no. features 
    m=data.shape[0] 
    n=data.shape[1] 
    # Shape for one example 
    example_width=math.ceil(math.sqrt(n)) 
    example_height=math.ceil(n/example_width) 
    # No. of items to display 
    display_rows=math.floor(math.sqrt(m)) 
    display_cols=math.ceil(m/display_rows) 
    # Padding between images 
    pad=1 
    # Setup blank display 
    display_array = -np.ones((pad + display_rows * (example_height + pad), (pad + display_cols * (example_width + pad)))) 
    curr_ex=0 
    for i in range(1,display_rows+1): 
     for j in range(1,display_cols+1): 
      if curr_ex>=m: 
       break 
      # Max value of this patch 
      max_val=max(abs(data[curr_ex, :])) 
      display_array[pad + (j-1) * (example_height + pad) : j*(example_height+1), pad + (i-1) * (example_width + pad) :       i*(example_width+1)] = data[curr_ex, :].reshape(example_height, example_width)/max_val 
      curr_ex+=1 

    matplotlib.pyplot.imshow(display_array, cmap='Greys', interpolation='None') 


# In[312]: 

a=neuralNetwork(2,5,2,0.5,0.0) 
print(a.backpropIter(np.array([[0.1,0.9],[0.2,0.8]]),np.array([[0,1],[0,1]]))) 
print(a.gradCheck(np.array([[0.1,0.9],[0.2,0.8]]),np.array([[0,1],[0,1]]))) 
D=[] 
C=[] 
for i in range(100): 
    c,b,d=a.gradDesc(np.array([[0.1,0.9],[0.2,0.8]]),np.array([[0,1],[0,1]])) 
    C.append(c) 
    D.append(np.mean(d)) 
    #print(c) 

print(a.predict(np.array([[0.1,0.9]]))) 
# Debugging plot 
matplotlib.pyplot.figure() 
matplotlib.pyplot.plot(C) 
matplotlib.pyplot.ylabel("Error") 
matplotlib.pyplot.xlabel("Iterations") 
matplotlib.pyplot.figure() 
matplotlib.pyplot.plot(D) 
matplotlib.pyplot.ylabel("Gradient") 
matplotlib.pyplot.xlabel("Iterations") 
#print(J) 


# In[313]: 

# Class instance 

# Number of input, hidden and output nodes 
# Input = 28 x 28 pixels 
input_nodes=784 
# Arbitrary number of hidden nodes, experiment to improve 
hidden_nodes=200 
# Output = one of the digits [0,1,2,3,4,5,6,7,8,9] 
output_nodes=10 

# Learning rate 
learning_rate=0.4 

# Regularisation parameter 
lambd=0.0 

# Create instance of Nnet class 
nn=neuralNetwork(input_nodes,hidden_nodes,output_nodes,learning_rate,lambd) 


# In[314]: 

time1=time.time() 
# Scale inputs 
inputs=scaleData(train_data) 
# 0.01-0.99 range as the sigmoid function can't reach 0 or 1, 0.01 for all except 0.99 for target 
targets=(np.identity(output_nodes)*0.98)[train_data[:,0],:]+0.01 
J=[] 
JR=[] 
Grad=[] 
iterations=100 
for i in range(iterations): 
    j,jr,grad=nn.gradDesc(inputs, targets) 
    grad=np.mean(grad) 
    if i == 0: 
     print("Initial ===== Cost (unregularised): ", j, "\t///", "Cost (regularised): ",jr," Mean Gradient: ",grad) 
    print("\r", end="") 
    print("Iteration ", i+1, "\tCost (unregularised): ", j, "\t///", "Cost (regularised): ", jr," Mean Gradient: ",grad,end="") 
    J.append(j) 
    JR.append(jr) 
    Grad.append(grad) 
time2 = time.time() 
print ("\nTRAINED IN ",time2-time1) 


# In[315]: 

# Debugging plot 
matplotlib.pyplot.figure() 
matplotlib.pyplot.plot(J) 
matplotlib.pyplot.plot(JR) 
matplotlib.pyplot.ylabel("Error") 
matplotlib.pyplot.xlabel("Iterations") 
matplotlib.pyplot.figure() 
matplotlib.pyplot.plot(Grad) 
matplotlib.pyplot.ylabel("Gradient") 
matplotlib.pyplot.xlabel("Iterations") 
#print(J) 


# In[316]: 

# Scale inputs 
inputs=scaleData(test_data) 
# Test targets are the raw digit labels (first CSV column), compared against the argmax from predict 
targets=test_data[:,0] 
h=nn.predict(inputs) 
score=[] 
targ=[] 
hyp=[] 
for i,line in enumerate(targets): 
    if line == h[i]: 
     score.append(1) 
    else: 
     score.append(0) 
    hyp.append(h[i]) 
    targ.append(line) 
print("Test accuracy: ", sum(score)/len(score)*100) 
indexes=random.sample(range(len(hyp)),9) 
print("Targets ",end="") 
for j in indexes: 
    print (targ[j]," ",end="") 
print("\nHypothesis ",end="") 
for j in indexes: 
    print (hyp[j]," ",end="") 
displayData(test_data[indexes, 1:], 9, rand=0) 


# In[277]: 

nn.predict(0.9*np.ones((784,)))

Edit

I was advised to try different learning rates, but unfortunately they all give similar results. Here are the plots for 30 iterations on the MNIST 100 subset:

[Figures: plots for the different learning rates]

Specifically, here are the figures each run starts and ends with:

Initial ===== Cost (unregularised): 4.07208963507 /// Cost (regularised): 4.07208963507 Mean Gradient: 0.0540251381858
Iteration 50 Cost (unregularised): 0.613310215166 /// Cost (regularised): 0.613310215166 Mean Gradient: -0.000133981500849

Initial ===== Cost (unregularised): 5.67535252616 /// Cost (regularised): 5.67535252616 Mean Gradient: 0.0644797515914
Iteration 50 Cost (unregularised): 0.381080434935 /// Cost (regularised): 0.381080434935 Mean Gradient: 0.000427866902699

Initial ===== Cost (unregularised): 3.54658422176 /// Cost (regularised): 3.54658422176 Mean Gradient: 0.0672211732868
Iteration 50 Cost (unregularised): 0.981 /// Cost (regularised): 0.981 Mean Gradient: 2.34515341943e-20

Initial ===== Cost (unregularised): 4.05269658215 /// Cost (regularised): 4.05269658215 Mean Gradient: 0.0469666696193
Iteration 50 Cost (unregularised): 0.980999999999 /// Cost (regularised): 0.980999999999 Mean Gradient: -1.0582706063e-14

Initial ===== Cost (unregularised): 2.40881492228 /// Cost (regularised): 2.40881492228 Mean Gradient: 0.0516056901574
Iteration 50 Cost (unregularised): 1.74539997258 /// Cost (regularised): 1.74539997258 Mean Gradient: 1.01955789614e-09

Initial ===== Cost (unregularised): 2.58498876008 /// Cost (regularised): 2.58498876008 Mean Gradient: 0.0388768685257
Iteration 3 Cost (unregularised): 1.72520399313 /// Cost (regularised): 1.72520399313 Mean Gradient: 0.0134040908157
Iteration 50 Cost (unregularised): 0.981 /// Cost (regularised): 0.981 Mean Gradient: -4.49319474346e-43

Initial ===== Cost (unregularised): 4.40141352357 /// Cost (regularised): 4.40141352357 Mean Gradient: 0.0689167742968
Iteration 50 Cost (unregularised): 0.981 /// Cost (regularised): 0.981 Mean Gradient: -1.01563966458e-22

A learning rate of 0.01 is quite low, but exploring learning rates in this region, the best result I got was only around 30-40%. That is a big improvement on the 8%, or even 0%, that I had seen previously, but it is not really what the network should be achieving!

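Edit 2

I have now finished and added a backpropagation function that is optimised for matrices rather than the iterative formula, so I can run it for a large number of epochs/iterations without it being painfully slow. The "backprop" function of the class matches gradient checking (actually it is 1/2 the size, but I think that is a problem in the gradient check, so I will leave it be since it should not matter proportionally, and I have tried adding divisions to fix it). With a large number of epochs I get much better accuracy, but there still seems to be a problem: when I previously programmed a slightly different style of simple 3-layer neural network as part of a book, on the same dataset CSVs, I got a much better training result. Here are some plots and data for the large-epoch runs.

It looks good, but we still have pretty poor test set accuracy, and this is after 2,500 runs through the dataset; I should be getting a good result with far fewer!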

Test accuracy: 61.150000000000006 
    Targets 6 9 8 2 2 2 4 3 8 
    Hypothesis 6 9 8 4 7 1 4 3 8
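
On the factor of one half mentioned in this edit: with the cost defined as sum((h - y)^2) / m, with no 1/2 in front, differentiating the square contributes a factor of 2, so an error term of (h - y) * sigmoidGradient(z3) divided by m is half the true gradient. The following is a minimal sketch of a central-difference check on a tiny network that illustrates this. The network sizes, the random data and the helper names (cost, analytic_grad, numeric_grad, rel_error) are assumptions made for illustration, not the code from the post.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W1, W2, X, y):
    # Same form as costFunc in the post: summed squared error divided by m
    h = sigmoid(sigmoid(X @ W1.T) @ W2.T)
    return np.sum((h - y) ** 2) / X.shape[0]

def analytic_grad(W1, W2, X, y, scale=2.0):
    # scale=2.0 is the derivative of (h - y)^2; scale=1.0 mimics the post's error term
    m = X.shape[0]
    a2 = sigmoid(X @ W1.T)
    h = sigmoid(a2 @ W2.T)
    d3 = (scale / m) * (h - y) * h * (1.0 - h)
    d2 = (d3 @ W2) * a2 * (1.0 - a2)
    return np.concatenate(((d2.T @ X).ravel(), (d3.T @ a2).ravel()))

def numeric_grad(W1, W2, X, y, eps=1e-5):
    theta = np.concatenate((W1.ravel(), W2.ravel()))
    grad = np.zeros_like(theta)
    n1 = W1.size
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        plus, minus = theta + step, theta - step
        j_plus = cost(plus[:n1].reshape(W1.shape), plus[n1:].reshape(W2.shape), X, y)
        j_minus = cost(minus[:n1].reshape(W1.shape), minus[n1:].reshape(W2.shape), X, y)
        # Central difference: cost at theta + eps minus cost at theta - eps
        grad[i] = (j_plus - j_minus) / (2 * eps)
    return grad

def rel_error(a, b):
    return np.linalg.norm(a - b) / np.linalg.norm(a + b)

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.5, (5, 3))   # hidden x input
W2 = rng.normal(0.0, 0.5, (2, 5))   # output x hidden
X = rng.uniform(0.1, 1.0, (4, 3))   # four made-up examples
y = np.array([[0.01, 0.99], [0.99, 0.01], [0.01, 0.99], [0.99, 0.01]])

g_numeric = numeric_grad(W1, W2, X, y)
rel_full = rel_error(analytic_grad(W1, W2, X, y, scale=2.0), g_numeric)
rel_half = rel_error(analytic_grad(W1, W2, X, y, scale=1.0), g_numeric)
print("relative error with the factor of 2:   ", rel_full)
print("relative error without the factor of 2:", rel_half)
```

A relative error near zero indicates agreement, while a gradient that is exactly half the numerical one gives a relative error of about 1/3 under this measure. For plain gradient descent a missing constant factor only rescales the effective learning rate.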

Edit 3: which dataset?

http://makeyourownneuralnetwork.blogspot.co.uk/2015/03/the-mnist-dataset-of-handwitten-digits.html?m=1

I tried train.csv and test.csv to use more data, but the results are no better and it just takes longer, so I have been using the subsets train_100 and test_10 while I debug.

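Edit 4

It seems to learn something after a very large number of epochs (around 14,000). Because the backprop function (not backpropIter) uses the whole dataset, each loop is effectively one epoch, and with a ridiculous number of epochs on the subset of 100 training and 10 test samples the test accuracy is quite good. However, with such a small sample this could easily be down to chance, and even then it is only 70%, which is not what you would aim for even on the small dataset. Still, it does suggest that the network is learning, and I am trying parameters very extensively to rule that out as the cause.

Source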

2017-02-09 olliejday

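Try a smaller learning rate or a higher regularisation parameter. – BlackBear

Thank you for the suggestion, I have updated the question to show plots for different learning rates! Unfortunately, it does not seem to have helped much. – olliejday

Without going through the whole code (it is not my own), this looks suspiciously like it is trying to map values to themselves (or something similar), and reducing the learning rate is just delaying the inevitable. It is getting pretty good at predicting 'x == x'. Is there anywhere you might accidentally be feeding the outputs in as input features? – roganjosh

Solved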

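I have solved my neural network. A brief description follows in case it helps anyone else. Thanks to everyone who helped with suggestions. Basically, I had implemented it with a fully matrix-based approach, i.e. backpropagation uses all the examples each time. I later tried implementing it with a vector approach, i.e. backpropagation with each example in turn. That was when I realised that the matrix approach does not update the parameters after each example, so one run through the data this way is NOT the same as running through each example in turn. Effectively the whole training set is backpropagated as one example. Hence my matrix implementation does work, but only after many iterations, which then ends up taking longer than the vector approach anyway! I have opened a new question to learn more about this specific part, but there we go: it just needed a LOT of iterations with the matrix approach, or a more gradual example-by-example approach.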
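
The difference described here, one parameter update per pass over the data versus one update per example, can be sketched on a toy problem. The example below is an illustration under assumptions, not the poster's code: it fits a single sigmoid unit to made-up data, and the names batch_epoch and per_example_epoch are hypothetical. Both functions visit every example once per epoch, but the first applies a single averaged update while the second applies m updates.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, X, y):
    # Summed squared error divided by the number of examples
    return np.sum((sigmoid(X @ w) - y) ** 2) / X.shape[0]

def batch_epoch(w, X, y, lr):
    # One update per epoch: gradient averaged over all m examples
    h = sigmoid(X @ w)
    grad = (2.0 / X.shape[0]) * (X.T @ ((h - y) * h * (1.0 - h)))
    return w - lr * grad

def per_example_epoch(w, X, y, lr):
    # m updates per epoch: the weights change after every single example
    for x_i, y_i in zip(X, y):
        h = sigmoid(x_i @ w)
        w = w - lr * 2.0 * (h - y_i) * h * (1.0 - h) * x_i
    return w

rng = np.random.default_rng(0)
X = rng.uniform(0.1, 1.0, (100, 5))              # made-up inputs
true_w = np.array([1.5, -2.0, 0.5, 3.0, -1.0])   # hypothetical generating weights
y = sigmoid(X @ true_w)

epochs = 20
w_batch = np.zeros(5)
w_sgd = np.zeros(5)
for _ in range(epochs):
    w_batch = batch_epoch(w_batch, X, y, lr=0.4)
    w_sgd = per_example_epoch(w_sgd, X, y, lr=0.4)

print("updates applied (batch):      ", epochs)
print("updates applied (per example):", epochs * X.shape[0])
print("cost after training (batch):      ", cost(w_batch, X, y))
print("cost after training (per example):", cost(w_sgd, X, y))
```

With the same number of epochs the per-example loop takes 100 times as many steps in this sketch, which is one way to read the observation above that the matrix version needed far more iterations. The exact costs depend on the made-up data and learning rate.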

Source

2017-02-14 18:09:47 olliejday

Debugging a Neural Network