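Why is "Max jobs to run" not equal to the number of jobs specified when using GNU Parallel on a remote server?

I am trying to run many small serial jobs with GNU Parallel on a PBS cluster. Each compute node has 16 cores, and because I want to use multiple compute nodes I pass the -S $SERVERNAME option to GNU Parallel. What confuses me is that, with -S $SERVERNAME, the number of jobs started on the node does not equal the number of jobs I specified once I try to spawn more than 9 jobs. My observations are below.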

[fchen14@shelob001 ~]$ parallel --version
GNU parallel 20160922 
Copyright (C) 2007,2008,2009,2010,2011,2012,2013,2014,2015,2016 
Ole Tange and Free Software Foundation, Inc. 
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> 
This is free software: you are free to change and redistribute it. 
GNU parallel comes with no warranty. 

Web site: http://www.gnu.org/software/parallel 

When using programs that use GNU Parallel to process data for publication 
please cite as described in 'parallel --citation'. 
[fchen14@shelob001 ~]$ hostname # this shows my hostname
shelob001 

When I use GNU Parallel on the local host, without -S $SERVERNAME, there is no problem: I tried to spawn 10 jobs and GNU Parallel started 10 jobs.

[fchen14@shelob001 ~]$ parallel --progress echo ::: `seq 1 10`

Computers/CPU cores/Max jobs to run 
1:local/16/10 # 10 jobs spawned, no problem 

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete 
local:10/0/100%/0.0s 1 
local:9/1/100%/0.0s 2 
local:8/2/100%/0.0s 3 
local:7/3/100%/0.0s 4 
local:6/4/100%/0.0s 5 
local:5/5/100%/0.0s 6 
local:4/6/100%/0.0s 7 
local:3/7/100%/0.0s 8 
local:2/8/100%/0.0s 9 
local:1/9/100%/0.0s 10 
local:0/10/100%/0.0s 

When I use GNU Parallel with -S $SERVERNAME to spawn fewer than 10 jobs, there is still no problem:

[fchen14@shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 1`

Computers/CPU cores/Max jobs to run 
1:shelob001/16/1 # When the number of jobs is less than 10, no problem 

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete 
shelob001:1/0/100%/0.0s 1 
shelob001:0/1/100%/1.0s 

[fchen14@shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 8`

Computers/CPU cores/Max jobs to run 
1:shelob001/16/8 # When the number of jobs is less than 10, no problem 

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete 
shelob001:8/0/100%/0.0s 1 
shelob001:7/1/100%/1.0s 7 
shelob001:6/2/100%/0.5s 3 
shelob001:5/3/100%/0.3s 8 
shelob001:4/4/100%/0.2s 5 
shelob001:3/5/100%/0.2s 2 
shelob001:2/6/100%/0.2s 6 
shelob001:1/7/100%/0.1s 4 
shelob001:0/8/100%/0.1s 

[fchen14@shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 9`

Computers/CPU cores/Max jobs to run 
1:shelob001/16/9 # When the number of jobs is less than 10, no problem 

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete 
shelob001:9/0/100%/0.0s 1 
shelob001:8/1/100%/1.0s 5 
shelob001:7/2/100%/0.5s 8 
shelob001:6/3/100%/0.3s 2 
shelob001:5/4/100%/0.2s 6 
shelob001:4/5/100%/0.2s 9 
shelob001:3/6/100%/0.2s 3 
shelob001:2/7/100%/0.1s 4 
shelob001:1/8/100%/0.1s 7 
shelob001:0/9/100%/0.1s 

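What confuses me is what happens when I ask for 10 or more jobs: the number of jobs started is always one less than the number I requested. Here I want to spawn 10 jobs, but only 9 are started: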

[fchen14@shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 10` # I want to start 10 jobs

Computers/CPU cores/Max jobs to run 
1:shelob001/16/9 #why here "Max jobs to run" is 9? 

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete 
shelob001:9/0/100%/0.0s 2 
shelob001:9/1/100%/3.0s 1 
shelob001:8/2/100%/1.5s 7 
shelob001:7/3/100%/1.0s 4 
shelob001:6/4/100%/0.8s 9 
shelob001:5/5/100%/0.6s 8 
shelob001:4/6/100%/0.5s 3 
shelob001:3/7/100%/0.4s 5 
shelob001:2/8/100%/0.4s 6 
shelob001:1/9/100%/0.4s 10 
shelob001:0/10/100%/0.4s 

[fchen14@shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 11`

Computers/CPU cores/Max jobs to run 
1:shelob001/16/10 # it seems the jobs started is one less than I specified 

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete 
shelob001:10/0/100%/0.0s 1 
shelob001:10/1/100%/3.0s 2 
shelob001:9/2/100%/1.5s 8 
shelob001:8/3/100%/1.0s 3 
shelob001:7/4/100%/0.8s 4 
shelob001:6/5/100%/0.6s 5 
shelob001:5/6/100%/0.5s 7 
shelob001:4/7/100%/0.4s 10 
shelob001:3/8/100%/0.4s 9 
shelob001:2/9/100%/0.3s 6 
shelob001:1/10/100%/0.4s 11 
shelob001:0/11/100%/0.4s 
[fchen14@shelob001 ~]$

Checking the compute node with "top" confirms that only 9 CPUs are used when I run seq 1 10. I hope I have described the problem clearly. Can anyone point out a possible cause? Any suggestions are welcome.

Thank you very much!

Answers

It looks like you have found a bug. Workaround: -j +1
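For reference, -j +N in GNU Parallel adds N to the number of CPU cores when sizing the job slots. The sketch below works through the arithmetic one would expect from that, assuming the default is one slot per core and that "Max jobs to run" is the smaller of the slot count and the number of inputs (which matches the local runs in the question). The function name expected_max_jobs is a hypothetical helper for illustration, not part of GNU Parallel, and the numbers are expectations, not measurements.

```shell
#!/bin/sh
# Sketch only: expected "Max jobs to run" = min(job slots, number of inputs).
# expected_max_jobs is a hypothetical helper, not a GNU Parallel command.
expected_max_jobs() {
    cores=$1    # CPU cores reported for the host (16 for shelob001)
    extra=$2    # N from "-j +N"; 0 models the assumed default of one slot per core
    inputs=$3   # number of arguments given after :::
    slots=$((cores + extra))
    if [ "$inputs" -lt "$slots" ]; then
        echo "$inputs"
    else
        echo "$slots"
    fi
}

expected_max_jobs 16 0 10   # default -j, 10 inputs
expected_max_jobs 16 1 10   # -j +1, 10 inputs
expected_max_jobs 16 1 20   # -j +1, more inputs than slots
```

Under these assumptions both 10-input cases should report 10, so a header showing 9 falls short of the expected value with or without the workaround.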

Thank you, Ole, but it does not help.

[fchen14@shelob001 ~]$ parallel -j +1 -S 16/shelob001 --progress echo ::: `seq 1 10`

Computers/CPU cores/Max jobs to run
1:shelob001/16/9

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
shelob001:9/0/100%/0.0s 1
shelob001:9/1/100%/2.0s 2
shelob001:8/2/100%/1.0s 9
shelob001:7/3/100%/0.7s 7
shelob001:6/4/100%/0.5s 3
shelob001:5/5/100%/0.4s 8
shelob001:4/6/100%/0.3s 5
shelob001:3/7/100%/0.3s 4
shelob001:2/8/100%/0.2s 6
shelob001:1/9/100%/0.3s 10
shelob001:0/10/100%/0.3s
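The comparison made by eye in this thread (the "Max jobs to run" figure against the number of inputs) could also be scripted when trying other settings. This is a minimal sketch, assuming the --progress header has already been captured as text; max_jobs_from_header is a hypothetical helper name, and the sample lines fed to it are copied from the output in the question.

```shell
#!/bin/sh
# Sketch only: extract "Max jobs to run" from a --progress header.
# max_jobs_from_header is a hypothetical helper, not part of GNU Parallel.
max_jobs_from_header() {
    # Data lines look like "1:shelob001/16/9": index:host/CPU cores/max jobs.
    # Print the third "/"-separated field of the first such line,
    # dropping anything after the number (trailing spaces or notes).
    awk -F/ '$0 ~ "^[0-9]+:[^/]+/[0-9]+/[0-9]+" {
        sub(/[^0-9].*$/, "", $3)
        print $3
        exit
    }'
}

requested=10
observed=$(printf '%s\n' \
    'Computers/CPU cores/Max jobs to run' \
    '1:shelob001/16/9' | max_jobs_from_header)

if [ "$observed" -lt "$requested" ]; then
    echo "only $observed of $requested jobs get a slot"
fi
```

The per-job progress lines start with the host name instead of a number, so the pattern skips them and only the header data line is read.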