0

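Azure ML Workbench Kubernetes deployment failed

I am trying to deploy a prediction web service to Azure in cluster mode, following the ML Workbench process in this tutorial (https://docs.microsoft.com/en-us/azure/machine-learning/preview/tutorial-classifying-iris-part-3#prepare-to-operationalize-locally). The model, the scoring script, and the schema are sent to the manifest, but the deployment then fails: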

Creating service.......................................................... Error occurred: {'Error': {'Code': 'KubernetesDeploymentFailed', 'Details': [{'Message': 'Back-off 40s restarting failed container=... pod=...', 'State': 'Failed', 'Id': ..., 'Code': 'CrashLoopBackOff'}], 'StatusCode': 400, 'Message': 'Kubernetes Deployment failed', 'OperationType' ..., 'ResourceLocation': '/api/subscriptions/...', 'CreatedTime': '2017-10-26T20:30:49.77362Z', 'EndTime': '2017-10-26T20:36:40.186369Z'}

Here are the ml service realtime logs. Something seems to be happening silently that causes the "WARN received SIGTERM indicating exit request":

C:\Users\userguy\Documents\azure_ml_workbench\projecto>az ml service logs realtime -i projecto 
2017-10-26 20:47:16,118 CRIT Supervisor running as root (no user in config file) 
2017-10-26 20:47:16,120 INFO supervisord started with pid 1 
2017-10-26 20:47:17,123 INFO spawned: 'rsyslog' with pid 9 
2017-10-26 20:47:17,124 INFO spawned: 'program_exit' with pid 10 
2017-10-26 20:47:17,124 INFO spawned: 'nginx' with pid 11 
2017-10-26 20:47:17,125 INFO spawned: 'gunicorn' with pid 12 
2017-10-26 20:47:18,160 INFO success: rsyslog entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) 
2017-10-26 20:47:18,160 INFO success: program_exit entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) 
2017-10-26 20:47:22,164 INFO success: nginx entered RUNNING state, process has stayed up for > than 5 seconds (startsecs) 
2017-10-26T20:47:22.519159Z, INFO, 00000000-0000-0000-0000-000000000000, , Starting gunicorn 19.6.0 
2017-10-26T20:47:22.520097Z, INFO, 00000000-0000-0000-0000-000000000000, , Listening at: http://127.0.0.1:9090 (12) 
2017-10-26T20:47:22.520375Z, INFO, 00000000-0000-0000-0000-000000000000, , Using worker: sync 
2017-10-26T20:47:22.521757Z, INFO, 00000000-0000-0000-0000-000000000000, , worker timeout is set to 300 
2017-10-26T20:47:22.522646Z, INFO, 00000000-0000-0000-0000-000000000000, , Booting worker with pid: 22 
2017-10-26 20:47:27,669 WARN received SIGTERM indicating exit request 
2017-10-26 20:47:27,669 INFO waiting for nginx, gunicorn, rsyslog, program_exit to die 
2017-10-26T20:47:27.669556Z, INFO, 00000000-0000-0000-0000-000000000000, , Handling signal: term 
2017-10-26 20:47:30,673 INFO waiting for nginx, gunicorn, rsyslog, program_exit to die 
2017-10-26 20:47:33,675 INFO waiting for nginx, gunicorn, rsyslog, program_exit to die 
Initializing logger 
2017-10-26T20:47:36.564469Z, INFO, 00000000-0000-0000-0000-000000000000, , Starting up app insights client 
2017-10-26T20:47:36.564991Z, INFO, 00000000-0000-0000-0000-000000000000, , Starting up request id generator 
2017-10-26T20:47:36.565316Z, INFO, 00000000-0000-0000-0000-000000000000, , Starting up app insight hooks 
2017-10-26T20:47:36.565642Z, INFO, 00000000-0000-0000-0000-000000000000, , Invoking user's init function 
2017-10-26 20:47:36.715933: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-26 20:47:36,716 INFO waiting for nginx, gunicorn, rsyslog, program_exit to die
2017-10-26 20:47:36.716376: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-26 20:47:36.716542: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-26 20:47:36.716703: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-26 20:47:36.716860: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
this is the init 
2017-10-26T20:47:37.551940Z, INFO, 00000000-0000-0000-0000-000000000000, , Users's init has completed successfully 
Using TensorFlow backend. 
2017-10-26T20:47:37.553751Z, INFO, 00000000-0000-0000-0000-000000000000, , Worker exiting (pid: 22) 
2017-10-26T20:47:37.885303Z, INFO, 00000000-0000-0000-0000-000000000000, , Shutting down: Master 
2017-10-26 20:47:37,885 WARN killing 'gunicorn' (12) with SIGKILL 
2017-10-26 20:47:37,886 INFO stopped: gunicorn (terminated by SIGKILL) 
2017-10-26 20:47:37,889 INFO stopped: nginx (exit status 0) 
2017-10-26 20:47:37,890 INFO stopped: program_exit (terminated by SIGTERM) 
2017-10-26 20:47:37,891 INFO stopped: rsyslog (exit status 0) 

Received 41 lines of log 

That is my best guess from the output. The rest of the scoring.py script does start: TensorFlow gets initialized and the "this is the init" print statement shows up in the log.
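To make the ordering in the log explicit, here is a minimal Python sketch that pulls three lines from the output above and measures the gaps between them. The helper names (parse_timestamp, find_event, seconds_between) are made up for illustration and are not part of the az ml CLI. The sketch only measures timing; it does not explain why the SIGTERM was sent.

```python
from datetime import datetime

# Three lines copied verbatim from the realtime log above.
LOG_LINES = [
    "2017-10-26T20:47:22.522646Z, INFO, 00000000-0000-0000-0000-000000000000, , Booting worker with pid: 22",
    "2017-10-26 20:47:27,669 WARN received SIGTERM indicating exit request",
    "2017-10-26T20:47:37.551940Z, INFO, 00000000-0000-0000-0000-000000000000, , Users's init has completed successfully",
]


def parse_timestamp(line):
    """Parse the leading timestamp; the log mixes two formats."""
    if line[10] == "T":
        # Web server format: 2017-10-26T20:47:22.522646Z, INFO, ...
        stamp = line.split(",", 1)[0]
        return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    # supervisord format: 2017-10-26 20:47:27,669 WARN ...
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


def find_event(lines, marker):
    """Return the timestamp of the first line containing marker."""
    for line in lines:
        if marker in line:
            return parse_timestamp(line)
    raise ValueError("marker not found: " + marker)


def seconds_between(lines, first_marker, second_marker):
    """Seconds elapsed from the first marker line to the second."""
    delta = find_event(lines, second_marker) - find_event(lines, first_marker)
    return delta.total_seconds()


if __name__ == "__main__":
    boot_to_term = seconds_between(LOG_LINES, "Booting worker", "received SIGTERM")
    term_to_init = seconds_between(LOG_LINES, "received SIGTERM", "init has completed")
    print("worker boot -> SIGTERM: %.1f s" % boot_to_term)
    print("SIGTERM -> init completed: %.1f s" % term_to_init)
```

Run against these three lines, it shows the SIGTERM arriving roughly 5 seconds after the gunicorn worker boots and roughly 10 seconds before the user init function reports completion, so the container is asked to exit while init is still running.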

http://127.0.0.1:63437 is accessible from my local machine, but the ui endpoint is blank.

Any ideas on how to get this running on the Azure cluster? I'm not familiar with how Kubernetes works, so any basic debugging guidance would be appreciated.

+0

Hi, can you view the Kubernetes dashboard by going to 127.0.0.1:63437/ui? Is it blank? – Ahmet

+0

It redirects to http://127.0.0.1:63437/api/v1/namespaces/kube-system/services/kubernetes-dashboard/proxy and the content there is blank. – user4446237

Answers

2

We found a bug in our system that could cause this issue. A fix was deployed last night. Please try again and let us know if the problem persists.

+0

I retried it. – user4446237