Pre-order traversal of the tictactoe search space does not generate all states

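I am trying to implement Q-learning for tictactoe. One of the steps is to enumerate every possible state of the tictactoe board to form a state-value table. I wrote a procedure that starts from the empty board and recursively generates all possible states. In doing so, I am implicitly performing a pre-order traversal of the search space tree. However, at the end of it all I get only 707 unique states, whereas the general consensus is that the number of legal states is around 5000.

Note: I am referring to the number of legal states. I know that if either player is allowed to keep playing after the game has ended (which is what I mean by illegal states), the number of states is close to 19,000.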

Code:

def generate_state_value_table(self, state, turn):
    winner = int(is_game_over(state)) #check if, for the current turn and state, game has finished and if so who won
    #print "\nWinner is ", winner
    #print "\nBoard at turn: ", turn
    #print_board(state)
    self.add_state(state, winner/2 + 0.5) #add the current state with the appropriate value to the state table
    open_cells = open_spots(state) #find the index (from 0 to total no. of cells) of all the empty cells in the board

    #check if there are any empty cells in the board
    if len(open_cells) > 0:
        for cell in open_cells:
            #pdb.set_trace()
            row, col = cell/len(state), cell % len(state)
            new_state = deepcopy(state) #make a copy of the current state

            #check which player's turn it is
            if turn % 2 == 0:
                new_state[row][col] = 1
            else:
                new_state[row][col] = -1

            #using a try block because recursive depth may be exceeded
            try:
                #check if the new state has not been generated somewhere else in the search tree
                if not self.check_duplicates(new_state):
                    self.generate_state_value_table(new_state, turn+1)
                else:
                    return
            except:
                #print "Recursive depth exceeded"
                exit()
    else:
        return

You can see the full code here if you like.

Edit: I tidied up the code in the link and added more comments here to make it clearer. Hope this helps.

Source

2016-11-20 SpiderWasp42

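So I finally solved the problem, and I am posting this answer for anyone who runs into a similar issue. The bug was in the way I handled duplicate states. If a newly generated state has already been generated somewhere else in the search tree, it should not be added to the state table again. My mistake was that, on finding a duplicate state, I cut the whole pre-order traversal short at that point instead of skipping just that one state.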
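The difference between returning and skipping on a duplicate can be seen in a small, self-contained sketch. It is written for Python 3 and is not the original class: the flat 9-tuple board, the `seen` set (standing in for `check_duplicates` and the state table), and the name `count_states` are all assumptions made for illustration. As in the question's code, there is no check for a winner here.

```python
def count_states(abort_on_duplicate):
    """Count distinct boards reached by a pre-order walk from the empty board.

    A board is a tuple of 9 cells: 0 = empty, 1 = first player, -1 = second.
    """
    seen = set()  # hypothetical stand-in for the state table

    def visit(state, turn):
        seen.add(state)
        mark = 1 if turn % 2 == 0 else -1
        for cell in range(9):
            if state[cell] != 0:
                continue  # cell already taken
            child = state[:cell] + (mark,) + state[cell + 1:]
            if child in seen:
                if abort_on_duplicate:
                    # Behaviour of the `else: return` branch: leaving the
                    # function here also abandons every remaining sibling.
                    return
                # Skipping only this child keeps the loop going.
                continue
            visit(child, turn + 1)

    visit((0,) * 9, 0)
    return len(seen)


if __name__ == "__main__":
    print("return on duplicate:", count_states(True))
    print("skip duplicate only:", count_states(False))
```

With `abort_on_duplicate=False` the walk reaches 6046 boards, the same count this answer reports once the else clause is removed. With `True` it reaches fewer, because every sibling after the first duplicate is silently dropped; the exact number depends on the order in which cells are tried.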

Simply put, removing the else clause from the code below gave me 6046 states:

#check if the new state has not been generated somewhere else in the search tree 
if not self.check_duplicates(new_state): 
    self.generate_state_value_table(new_state, turn+1) 
else: 
    return

Additionally, I stopped exploring any branch of the search tree once the state encountered had a clear winner. Concretely, I added the following code after self.add_state(state, winner/2 + 0.5):

#check if the winner returned is one of the players and go back to the previous state if so 
if winner != 0: 
    return

This gave me 5762 states, which is what I was looking for.
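For anyone who wants to cross-check their own counts, here is a minimal Python 3 sketch of the full enumeration with both fixes applied. It is not the original code: the flat 9-tuple board, the `LINES` table, and the helpers `winner_of` and `enumerate_states` are hypothetical names, and `winner_of` is only an assumed stand-in for `is_game_over`, which is not shown in this post.

```python
# The eight winning lines of a 3x3 board, as indices into a flat 9-tuple.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def winner_of(state):
    """Return 1 or -1 if that player has three in a row, otherwise 0."""
    for a, b, c in LINES:
        if state[a] != 0 and state[a] == state[b] == state[c]:
            return state[a]
    return 0


def enumerate_states(stop_at_win=True):
    """Return the set of boards reachable from the empty board."""
    seen = set()

    def visit(state, turn):
        seen.add(state)
        # Second fix: do not expand a board that already has a winner.
        if stop_at_win and winner_of(state) != 0:
            return
        mark = 1 if turn % 2 == 0 else -1
        for cell in range(9):
            if state[cell] != 0:
                continue
            child = state[:cell] + (mark,) + state[cell + 1:]
            # First fix: skip a duplicate child, but keep iterating.
            if child not in seen:
                visit(child, turn + 1)

    visit((0,) * 9, 0)
    return seen


if __name__ == "__main__":
    print(len(enumerate_states(stop_at_win=False)))
    print(len(enumerate_states(stop_at_win=True)))
```

Under these assumptions the sketch yields 6046 boards without the early stop and 5478 with it. The second figure need not match the 5762 above, since it depends on exactly which boards the game-over check treats as finished.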

Source

2016-11-21 02:49:37 SpiderWasp42
