MPI deadlock with collective functions

I am writing a C++ program that uses the MPI library. It deadlocks, and only one node keeps working! I do not use any send or receive operations, only two collective functions (MPI_Allreduce and MPI_Bcast). I could understand a deadlock if one node were waiting for another node to send or receive something, but I really do not understand what causes this one.

void ParaStochSimulator::first_reacsimulator() { 
    SimulateSingleRun(); 
} 

double ParaStochSimulator::deterMinTau() {
    //calculate minimum tau for this process
    l_nLocalMinTau = calc_tau(); //min tau for each node
    MPI_Allreduce(&l_nLocalMinTau, &l_nGlobalMinTau, 1, MPI_DOUBLE, MPI_MIN, MPI_COMM_WORLD);
    //min tau for all nodes
    //check if I have the min value
    if (l_nLocalMinTau <= l_nGlobalMinTau && m_nCurrentTime < m_nOutputEndPoint) {
        FireTransition(m_nMinTransPos);
        CalculateAllHazardValues();
    }
    return l_nGlobalMinTau;
}

void ParaStochSimulator::SimulateSingleRun() {
    //prepare a run
    PrepareRun();
    while ((m_nCurrentTime < m_nOutputEndPoint) && IsSimulationRunning()) {
        deterMinTau();
        if (mnprocess_id == 0) { //master
            SimulateSingleStep();
            std::cout << "current time:*****" << m_nCurrentTime << std::endl;
            broad_casting(m_nMinTransPos);
            MPI_Bcast(&l_anMarking, l_nMinplacesPos.size(), MPI_DOUBLE, 0, MPI_COMM_WORLD);
            //std::cout << "size of mani place :" << l_nMinplacesPos.size() << std::endl;
        }
    }
    MPI_Bcast(&l_anMarking, l_nMinplacesPos.size(), MPI_DOUBLE, 0, MPI_COMM_WORLD);
    PostProcessRun();
}

Source

2017-05-05 Ramy Al-Anwar

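All processes run the loop, but only the master executes MPI_Bcast inside it; the other processes go straight on to the next iteration, enter deterMinTau, and execute MPI_Allreduce.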

This is a deadlock because the master node is waiting for all nodes to execute the broadcast, while all the other nodes are waiting for the master node to execute the reduce.
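To make the mismatch easier to see, here is a rough sketch that does not use MPI at all: it only lists, per rank, the order in which the collectives in the question's loop would be called, and finds the first position where two ranks disagree. The names `CallTrace`, `questionTrace` and `firstMismatch` are made up for this illustration, and the model assumes every rank runs the same number of iterations, which the real program does not guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One entry per collective call, in program order, for a single rank.
using CallTrace = std::vector<std::string>;

// Hypothetical model of the loop in the question for `iterations` passes.
// Assumption: every rank executes the same number of iterations.
CallTrace questionTrace(int rank, int iterations) {
    assert(iterations >= 0);
    CallTrace trace;
    for (int i = 0; i < iterations; ++i) {
        trace.push_back("MPI_Allreduce");  // deterMinTau(), called by all ranks
        if (rank == 0) {
            trace.push_back("MPI_Bcast");  // only inside the master branch
        }
    }
    trace.push_back("MPI_Bcast");          // after the loop, called by all ranks
    return trace;
}

// Index of the first call at which two ranks disagree, or -1 if the
// two traces are identical.
long firstMismatch(const CallTrace& a, const CallTrace& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) {
            return static_cast<long>(i);
        }
    }
    return a.size() == b.size() ? -1 : static_cast<long>(n);
}
```

Comparing `questionTrace(0, 2)` with `questionTrace(1, 2)` reports a mismatch at index 1: rank 0 is in MPI_Bcast while rank 1 is already in its second MPI_Allreduce, which is the situation described above. Two non-master ranks compare as identical.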

I think what you are looking for is:

void ParaStochSimulator::SimulateSingleRun() {
    //prepare a run
    PrepareRun();
    while ((m_nCurrentTime < m_nOutputEndPoint) && IsSimulationRunning()) {
        //All the nodes reduce tau at the same time
        deterMinTau();
        if (mnprocess_id == 0) { //master
            SimulateSingleStep();
            std::cout << "current time:*****" << m_nCurrentTime << std::endl;
            broad_casting(m_nMinTransPos);
            //Removed the master-only broadcast here
        }
        //All the nodes take part in the broadcast at every loop iteration
        MPI_Bcast(&l_anMarking, l_nMinplacesPos.size(), MPI_DOUBLE, 0, MPI_COMM_WORLD);
    }
    PostProcessRun();
}

Source

2017-05-05 14:56:12 Tezirg

Thank you for your help, but unfortunately I removed the broadcast from the master and it still deadlocks -_-
