Keras stateful LSTM multi-GPU error: Incompatible shapes: [2540] vs. batch size multiplied by nGPU

2017-11-10

I had the same problem, but when I tried to apply the same fix I got a different error. I am running on 5 GPUs, however. I read that you have to make sure your number of samples is divisible by the batch size times the number of GPUs, and I did that. I have scoured the internet for days and cannot find anything that solves the problem I am having. attributeTables[0] is a NumPy array of shape (35560, 700) and y is a NumPy array of shape (35560,). I am using Keras v2.0.9 and TensorFlow v1.1.0.

I also tried giving y the shape (35560, 1), but all that did was change the error from "Incompatible shapes: [2540] vs. [508]" to "Incompatible shapes: [2540,1] vs. [508,1]".

So the problem seems to be specific to the targets: the expected batch size gets multiplied somewhere in the middle of the process for the targets but not for the attributes (or at least that is not where the validation catches it), causing the mismatch. I am not sure.
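The two numbers in the error do line up with the batch-size search in my script below. A quick sanity check (assuming attributeTables[0] really has 35560 rows, as described above):

nGPU = 5
batchSize = 500
# same divisibility search as in the script below
while 35560 % (batchSize * nGPU) != 0:
    batchSize += 1
print(batchSize)          # 508  -> the smaller shape in the error
print(batchSize * nGPU)   # 2540 -> the larger shape in the error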

Here is the code in question and the error:

import numpy as np 
from keras.models import Sequential 
from keras.utils import multi_gpu_model 
from keras.layers import Dense 
from keras.layers import LSTM 
from sklearn.model_selection import train_test_split 
from sklearn.preprocessing import StandardScaler 
import matplotlib.pyplot as plt 

def baseline_model(): 
    # create model 
    print("Building Layers") 
    model = Sequential() 
    model.add(LSTM(700, batch_input_shape=(batchSize, X.shape[1], X.shape[2]), activation='tanh', return_sequences=False, stateful=True)) 
    model.add(Dense(1)) 
    print("Building Parallel model") 
    parallel_model = multi_gpu_model(model, gpus=nGPU) 
    # Compile model 
    #model.compile(loss='mean_squared_error', optimizer='adam') 
    print("Compiling Model") 
    parallel_model.compile(loss='mae', optimizer='adam', metrics=['accuracy']) 
    return parallel_model 

def buildModel(): 
    print("Bulding Model") 
    mlp = baseline_model() 
    print("Fitting Model") 
    return mlp.fit(X_train, y_train, epochs=1, batch_size=batchSize, shuffle=False, validation_data=(X_test, y_test)) 

print("Scaling") 
scaler = StandardScaler() 
X_Scaled = scaler.fit_transform(attributeTables[0]) 

print("Finding Batch Size") 
nGPU = 5 
batchSize = 500 
while len(X_Scaled) % (batchSize * nGPU) != 0: 
    batchSize += 1 

print("Filling Arrays") 
X = X_Scaled.reshape((X_Scaled.shape[0], X_Scaled.shape[1], 1)) 
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=.8) 


print("Calling buildModel()") 
model = buildModel() 

print("Ploting History") 
plt.plot(model.history['loss'], label='train') 
plt.plot(model.history['val_loss'], label='test') 
plt.legend() 
plt.show() 
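For reference, disabling the parallel wrapper and compiling the base model directly, which is the single-GPU variant that works for me (as I mention just before the output below), would look roughly like this; the function and variable names here are just for illustration, everything else is reused from the script above:

def baseline_model_single_gpu():
    # same layer stack as baseline_model(), but compiled without multi_gpu_model
    model = Sequential()
    model.add(LSTM(700, batch_input_shape=(batchSize, X.shape[1], X.shape[2]),
                   activation='tanh', return_sequences=False, stateful=True))
    model.add(Dense(1))
    model.compile(loss='mae', optimizer='adam', metrics=['accuracy'])
    return model

single_model = baseline_model_single_gpu()
single_model.fit(X_train, y_train, epochs=1, batch_size=batchSize, shuffle=False,
                 validation_data=(X_test, y_test))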

When I disable the parallel model and run it stateful on a single GPU, it works with no problem. Here is the full output:

Beginning OHLC Load 
Time took : 7.571000099182129 

Making gloabal copies 
Time took : 0.0 

Using TensorFlow backend. 
Scaling 
Finding Batch Size 
Filling Arrays 
Calling buildModel() 
Bulding Model 
Building Layers 
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_split.py:2010: FutureWarning: From version 0.21, test_size will always complement train_size unless both are specified. 
    FutureWarning) 
Building Parallel model 
Compiling Model 
Fitting Model 
Train on 28448 samples, validate on 7112 samples 
Epoch 1/1 
Traceback (most recent call last): 

    File "<ipython-input-2-74c49f05bfbc>", line 1, in <module> 
    runfile('C:/Users/BeeAndTurtle/Documents/Programming/Python/Kraken_API_Market_Prediction/predictor/test.py', wdir='C:/Users/BeeAndTurtle/Documents/Programming/Python/Kraken_API_Market_Prediction/predictor') 

    File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 710, in runfile 
    execfile(filename, namespace) 

    File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile 
    exec(compile(f.read(), filename, 'exec'), namespace) 

    File "C:/Users/BeeAndTurtle/Documents/Programming/Python/Kraken_API_Market_Prediction/predictor/test.py", line 77, in <module> 
    model = buildModel() 

    File "C:/Users/BeeAndTurtle/Documents/Programming/Python/Kraken_API_Market_Prediction/predictor/test.py", line 57, in buildModel 
    return mlp.fit(X_train, y_train, epochs=1, batch_size=batchSize, shuffle=False, validation_data=(X_test, y_test)) 

    File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py", line 1631, in fit 
    validation_steps=validation_steps) 

    File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py", line 1213, in _fit_loop 
    outs = f(ins_batch) 

    File "C:\ProgramData\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py", line 2332, in __call__ 
    **self.session_kwargs) 

    File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 778, in run 
    run_metadata_ptr) 

    File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 982, in _run 
    feed_dict_string, options, run_metadata) 

    File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1032, in _do_run 
    target_list, options, run_metadata) 

    File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1052, in _do_call 
    raise type(e)(node_def, op, message) 

InvalidArgumentError: Incompatible shapes: [2540,1] vs. [508,1] 
    [[Node: training/Adam/gradients/loss/concatenate_1_loss/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss/concatenate_1_loss/sub"], _device="/job:localhost/replica:0/task:0/gpu:0"](training/Adam/gradients/loss/concatenate_1_loss/sub_grad/Shape, training/Adam/gradients/loss/concatenate_1_loss/sub_grad/Shape_1)]] 
    [[Node: replica_1/sequential_1/dense_1/BiasAdd/_313 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:1", send_device_incarnation=1, tensor_name="edge_1355_replica_1/sequential_1/dense_1/BiasAdd", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]] 

Caused by op 'training/Adam/gradients/loss/concatenate_1_loss/sub_grad/BroadcastGradientArgs', defined at: 
    File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\ipython\start_kernel.py", line 245, in <module> 
    main() 
    File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\ipython\start_kernel.py", line 241, in main 
    kernel.start() 
    File "C:\ProgramData\Anaconda3\lib\site-packages\ipykernel\kernelapp.py", line 477, in start 
    ioloop.IOLoop.instance().start() 
    File "C:\ProgramData\Anaconda3\lib\site-packages\zmq\eventloop\ioloop.py", line 177, in start 
    super(ZMQIOLoop, self).start() 
    File "C:\ProgramData\Anaconda3\lib\site-packages\tornado\ioloop.py", line 832, in start 
    self._run_callback(self._callbacks.popleft()) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\tornado\ioloop.py", line 605, in _run_callback 
    ret = callback() 
    File "C:\ProgramData\Anaconda3\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper 
    return fn(*args, **kwargs) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 265, in enter_eventloop 
    self.eventloop(self) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\ipykernel\eventloops.py", line 106, in loop_qt5 
    return loop_qt4(kernel) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\ipykernel\eventloops.py", line 99, in loop_qt4 
    _loop_qt(kernel.app) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\ipykernel\eventloops.py", line 83, in _loop_qt 
    app.exec_() 
    File "C:\ProgramData\Anaconda3\lib\site-packages\ipykernel\eventloops.py", line 39, in process_stream_events 
    kernel.do_one_iteration() 
    File "C:\ProgramData\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 298, in do_one_iteration 
    stream.flush(zmq.POLLIN, 1) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 352, in flush 
    self._handle_recv() 
    File "C:\ProgramData\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 472, in _handle_recv 
    self._run_callback(callback, msg) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 414, in _run_callback 
    callback(*args, **kwargs) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper 
    return fn(*args, **kwargs) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 283, in dispatcher 
    return self.dispatch_shell(stream, msg) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 235, in dispatch_shell 
    handler(stream, idents, msg) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 399, in execute_request 
    user_expressions, allow_stdin) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\ipykernel\ipkernel.py", line 196, in do_execute 
    res = shell.run_cell(code, store_history=store_history, silent=silent) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\ipykernel\zmqshell.py", line 533, in run_cell 
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2698, in run_cell 
    interactivity=interactivity, compiler=compiler, result=result) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2808, in run_ast_nodes 
    if self.run_code(code, result): 
    File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2862, in run_code 
    exec(code_obj, self.user_global_ns, self.user_ns) 
    File "<ipython-input-2-74c49f05bfbc>", line 1, in <module> 
    runfile('C:/Users/BeeAndTurtle/Documents/Programming/Python/Kraken_API_Market_Prediction/predictor/test.py', wdir='C:/Users/BeeAndTurtle/Documents/Programming/Python/Kraken_API_Market_Prediction/predictor') 
    File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 710, in runfile 
    execfile(filename, namespace) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile 
    exec(compile(f.read(), filename, 'exec'), namespace) 
    File "C:/Users/BeeAndTurtle/Documents/Programming/Python/Kraken_API_Market_Prediction/predictor/test.py", line 77, in <module> 
    model = buildModel() 
    File "C:/Users/BeeAndTurtle/Documents/Programming/Python/Kraken_API_Market_Prediction/predictor/test.py", line 57, in buildModel 
    return mlp.fit(X_train, y_train, epochs=1, batch_size=batchSize, shuffle=False, validation_data=(X_test, y_test)) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py", line 1608, in fit 
    self._make_train_function() 
    File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py", line 990, in _make_train_function 
    loss=self.total_loss) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\keras\legacy\interfaces.py", line 87, in wrapper 
    return func(*args, **kwargs) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\keras\optimizers.py", line 415, in get_updates 
    grads = self.get_gradients(loss, params) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\keras\optimizers.py", line 73, in get_gradients 
    grads = K.gradients(loss, params) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py", line 2369, in gradients 
    return tf.gradients(loss, variables, colocate_gradients_with_ops=True) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 560, in gradients 
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads)) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 368, in _MaybeCompile 
    return grad_fn() # Exit early 
    File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 560, in <lambda> 
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads)) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\math_grad.py", line 609, in _SubGrad 
    rx, ry = gen_array_ops._broadcast_gradient_args(sx, sy) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 411, in _broadcast_gradient_args 
    name=name) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 768, in apply_op 
    op_def=op_def) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 2336, in create_op 
    original_op=self._default_original_op, op_def=op_def) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1228, in __init__ 
    self._traceback = _extract_stack() 

...which was originally created as op 'loss/concatenate_1_loss/sub', defined at: 
    File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\ipython\start_kernel.py", line 245, in <module> 
    main() 
[elided 27 identical lines from previous traceback] 
    File "C:/Users/BeeAndTurtle/Documents/Programming/Python/Kraken_API_Market_Prediction/predictor/test.py", line 77, in <module> 
    model = buildModel() 
    File "C:/Users/BeeAndTurtle/Documents/Programming/Python/Kraken_API_Market_Prediction/predictor/test.py", line 55, in buildModel 
    mlp = baseline_model() 
    File "C:/Users/BeeAndTurtle/Documents/Programming/Python/Kraken_API_Market_Prediction/predictor/test.py", line 29, in baseline_model 
    parallel_model.compile(loss='mae', optimizer='adam', metrics=['accuracy']) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py", line 860, in compile 
    sample_weight, mask) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py", line 460, in weighted 
    score_array = fn(y_true, y_pred) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\keras\losses.py", line 13, in mean_absolute_error 
    return K.mean(K.abs(y_pred - y_true), axis=-1) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\math_ops.py", line 821, in binary_op_wrapper 
    return func(x, y, name=name) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 2627, in _sub 
    result = _op_def_lib.apply_op("Sub", x=x, y=y, name=name) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 768, in apply_op 
    op_def=op_def) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 2336, in create_op 
    original_op=self._default_original_op, op_def=op_def) 
    File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1228, in __init__ 
    self._traceback = _extract_stack() 

InvalidArgumentError (see above for traceback): Incompatible shapes: [2540,1] vs. [508,1] 
    [[Node: training/Adam/gradients/loss/concatenate_1_loss/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss/concatenate_1_loss/sub"], _device="/job:localhost/replica:0/task:0/gpu:0"](training/Adam/gradients/loss/concatenate_1_loss/sub_grad/Shape, training/Adam/gradients/loss/concatenate_1_loss/sub_grad/Shape_1)]] 
    [[Node: replica_1/sequential_1/dense_1/BiasAdd/_313 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:1", send_device_incarnation=1, tensor_name="edge_1355_replica_1/sequential_1/dense_1/BiasAdd", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]] 

https://stackoverflow.com/questions/47190463/multi-gpu-model-lstm-with-stateful-on-keras-is-not-working/47200968#47200968

Answers


Daniel Möller's link was exactly right. Currently waiting for training to finish; I will post the results.


I just published stateful_multi_gpu, an experimental utility for handling stateful model training on multiple GPUs. I would be interested to know whether it is of use to you.

Also see my answer to the same question Daniel Moller mentioned.