Alpha/Beta가 내 MiniMax 알고리즘에 영향을주지 않는 이유는 무엇입니까?

먼저 약간 틀린 제목에 대해 유감스럽게 생각합니다. 나는 30 단어 길이로하고 싶지 않았습니다. 내가 구현 한 알파/베타 프 루닝은 TicTacToe 게임에 적용했을 때 평가 량을 크게 줄였습니다. 아래에서 직접 확인하십시오.Alpha/Beta가 내 MiniMax 알고리즘에 영향을주지 않는 이유는 무엇입니까?

평가 계수의 각 쌍은 입력과 동일한 게임 상태로 측정됩니다. 내가 작업했습니다 신경망을 재생 체커에 가지 치기를 구현하고자 할 때

문제

가 발생합니다. 처음부터이 모든 목표를 달성하기 위해 TicTacToe 게임을 사용하여 MiniMax + Alpha/Beta를 실험 해 보았습니다.

NN을 사용한 동일한 종류의 실험이 있습니다. 코드에 대한 지금

(체커 하나, 그들은 그래도 거의 동일합니다, 당신은 사용하여 ticTacToe 버전에서 엿봄을 갖고 싶어 알려주세요).

절대적으로 동일하므로 두 방법의 시작 부분을 한 번 붙여 넣기 만합니다. 두 기호가 약간 다르기 때문에 두 기호가 모두 표시됩니다.

코드를 명확하게 작성하기위한 작은 메모입니다.

보드는

이동 모두 들어있는 개체입니다 ... 게임이 승리 한 경우/등을 그려, 그것이 설정 조각 추적 가능한 이동 를 유지하는 객체입니다 이동과 관련된 정보, 내가 을 메서드의 첫 번째 줄로 복제하면 나는 주어진 클론을 만들고 생성자가 주어진 이동을 적용합니다.

private double miniMax(Board b, Move m, int depth) {

과 두 방법의

private double alphaBeta(Board b, Move m, int depth, double alpha, double beta) {

시작 :

Testboard clone = new Testboard(b, m); 
    // Making a clone of the board in order to 
    // avoid making changes to the original one 

    if (clone.isGameOver()) { 

     if (clone.getLoser() == null) 
      // It's a draw, evaluation = 0 
      return 0; 

     if (clone.getLoser() == Color.BLACK) 
      // White (Max) won, evaluation = 1 
      return 1; 

     // Black (Min) won, evaluation = -1 
     return -1; 
    } 

    if (depth == 0) 
     // Reached the end of the search, returning current Evaluation of the board 
     return getEvaluation(clone);

정기 최소 최대 연속 : 알파와

// If it's not game over 
    if (clone.getTurn() == Color.WHITE) { 

     // It's white's turn (Maxing player) 
     double max = -1; 
     for (Move move : clone.getMoves()) { 
      // For each children node (available moves) 
      // Their minimax value is calculated 
      double score = miniMax(clone, move, depth-1); 
      // Only the highest score is stored 
      if (score > max) 
       max = score; 
     } 
     // And is returned 
     return max; 
    } 

    // It's black's turn (Min player) 
    double min = 1; 
    for (Move move : clone.getMoves()) { 
     // For each children node (available moves) 
     // Their minimax value is calculated 
     double score = miniMax(clone, move, depth-1); 
     // Only the lowest score is stored 
     if (score < min) 
      min = score; 
    } 
    // And is returned 
    return min; 
}

최소 최대/베타 prunin g 계속 :

// If it's not game over 
    if (clone.getTurn() == Color.WHITE) { 

     // It's white's turn (Maxing player) 
     for (Move move : clone.getMoves()) { 

      // For each children node (available moves) 
      // Their minimax value is calculated     
      double score = alphaBeta(clone, move, depth-1, alpha, beta); 

      if (score > alpha) 
       // If this score is greater than alpha 
       // It is assigned to alpha as the new highest score 
       alpha = score; 
      if (alpha >= beta) 
       // The cycle is interrupted early if the value of alpha equals or is greater than beta 
       break; 
     } 
     // The alpha value is returned 
     return alpha; 
    } 

    // It's black's turn (Min player) 
    for (Move move : clone.getMoves()) { 

     // For each children node (available moves) 
     // Their minimax value is calculated    
     double score = alphaBeta(clone, move, depth-1, alpha, beta); 

     if (score < beta) 
      // If this score is lower than beta 
      // It is assigned to beta as the new lowest score 
      beta = score; 
     if (alpha >= beta) 
      // The cycle is interrupted early if the value of alpha equals or is greater than beta 
      break; 
    } 
    // The beta value is returned 
    return beta; 
}

나는 정직하게 붙어 그리고 내가 시도하고 무슨 일이 일어나고 있는지 알아 내기 위해 무엇을 할 수 있는지 모르겠어요. 나는 여러 다른 무작위로 생성 된 신경 네트워크에서 MiniMax + A/B를 시도했지만, 평가 횟수에 대해서는 개선 된 것을 본 적이 없다. 나는 여기 누군가가이 상황에 대해 약간의 빛을 비춰 줄 수 있기를 바랍니다. 감사합니다!

출처

2017-03-23 Daniel

하나의 이유는 이동 순서 일 수 있습니다. 단 하나의 이유 일 수도 있습니다. 좋은 움직임을 먼저 시도하면 더 많은 것들이 가지 치기 될 것입니다. – maraca

@maraca 안녕하십니까. 응답 해 주셔서 감사합니다. 움직임은 무작위로 정렬됩니다. 나는 많은 실험을했으며 알파/베타가 있거나없는 미니 맥스 평가의 수는 매번 동일합니다. 코드에서 더 중대한 실수가 있어야합니다. 평가의 수는 동일하지만 때로는 그렇지 않을 수도 있다고 생각하는 경우가 있습니다. – Daniel

평가의 깊이는 어디입니까? 보드의 선량을 측정하기위한 정상 상태 평가 기능은 무엇입니까? 검수원 나무는 깊을 수있다; tic-tac-toe가 아니라 Checker에 대한 평가 기능은 본질적으로 모든 보드에 0을 제공합니다. 따라서 가지 치기가 발생하지 않습니다. – tucuxi

감사의 말을 전 해주는 @maraca에게 감사의 말을 전합니다.

내가 게시 한 코드에는 아무런 문제가 없으며, 일단 검색이 심도 한계에 도달하면 사용하고있는 평가 기능에 문제가 있습니다.

본질적으로 무작위 값을 뱉어내는 아직 훈련받지 않은 신경망을 사용하고 있었는데, 이것은 MiniMax + A/B가 모든 노드를 통과하도록 강제했습니다. 가지 치기.

출처

2017-03-24 20:30:34 Daniel

Alpha/Beta가 내 MiniMax 알고리즘에 영향을주지 않는 이유는 무엇입니까?

답변

관련 문제