I understand in normal CFR how a parent node computes the CFV, but I'm struggling to understand how this is done in the lookahead's vectorized implementation.
Specifically, in Lookahead:_compute_cfvs():
```lua
function Lookahead:_compute_cfvs()
  for d = self.depth, 2, -1 do
    local gp_layer_terminal_actions_count = self.terminal_actions_count[d-2]
    local ggp_layer_nonallin_bets_count = self.nonallinbets_count[d-3]

    self.cfvs_data[d][{{}, {}, {}, {1}, {}}]:cmul(self.empty_action_mask[d])
    self.cfvs_data[d][{{}, {}, {}, {2}, {}}]:cmul(self.empty_action_mask[d])

    self.placeholder_data[d]:copy(self.cfvs_data[d])

    --player indexing is swapped for cfvs
    self.placeholder_data[d][{{}, {}, {}, self.acting_player[d], {}}]:cmul(self.current_strategy_data[d])

    torch.sum(self.regrets_sum[d], self.placeholder_data[d], 1)

    --use a swap placeholder to change {{1,2,3}, {4,5,6}} into {{1,2}, {3,4}, {5,6}}
    local swap = self.swap_data[d-1]
    swap:copy(self.regrets_sum[d])
    self.cfvs_data[d-1][{{gp_layer_terminal_actions_count+1, -1}, {1, ggp_layer_nonallin_bets_count}, {}, {}, {}}]:copy(swap:transpose(2,3))
  end
end
```
So I understand we multiply empty (illegal) actions by the mask so their CFV is zero, but I'm lost about what the swap and transpose are doing, and how slicing the parent's cfvs_data copies things to the right place.
Can anyone explain more clearly?
Every node in layer d is indexed by [action_id, parent_id, gp_id]. When moving to layer d+1, action_id becomes the new parent_id, and [parent_id, gp_id] are combined into the new gp_id.
The new gp_id is computed as (gp_id - 1) * gp_nonallinbets_count + parent_id, which means parent_id is the faster-varying (inner) axis and gp_id the slower (outer) one. That is to say, [parent_id, gp_id] has to be transposed to [gp_id, parent_id] before it is flattened.
From layer d to layer d+1: [parent_id, gp_id] --(transpose)--> [gp_id, parent_id] --(change view)--> [gp_id*parent_id]
From layer d+1 to layer d: [gp_id] --(change view)--> [gp_id, parent_id] --(transpose)--> [parent_id, gp_id]
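To make the transpose and change-of-view steps concrete, here is a small sketch in plain Python (nested lists instead of Torch tensors, so it runs without any dependencies). The helper names `combine_index`, `transpose`, `flatten` and `unflatten` and the sizes are made up for illustration and are not part of the DeepStack code; only the index formula is taken from the explanation above.

```python
def combine_index(parent_id, gp_id, n_parents):
    """1-based flat index: (gp_id - 1) * n_parents + parent_id."""
    return (gp_id - 1) * n_parents + parent_id


def transpose(matrix):
    """Swap the two axes of a nested list."""
    return [list(row) for row in zip(*matrix)]


def flatten(matrix):
    """Row-major flatten, the analogue of changing the view to 1-D."""
    return [x for row in matrix for x in row]


def unflatten(flat, n_cols):
    """Inverse of flatten: split a flat list into rows of n_cols items."""
    return [flat[i:i + n_cols] for i in range(0, len(flat), n_cols)]


n_parents, n_gps = 3, 2  # hypothetical sizes

# layer_d[parent_id - 1][gp_id - 1] is a label for that (parent, gp) pair.
layer_d = [[f"p{p}g{g}" for g in range(1, n_gps + 1)]
           for p in range(1, n_parents + 1)]

# Layer d -> d+1: transpose to [gp_id, parent_id], then flatten.
flat = flatten(transpose(layer_d))

# Every pair lands at the position given by the index formula.
for p in range(1, n_parents + 1):
    for g in range(1, n_gps + 1):
        assert flat[combine_index(p, g, n_parents) - 1] == f"p{p}g{g}"

# Layer d+1 -> d: change the view back to [gp_id, parent_id], then transpose.
restored = transpose(unflatten(flat, n_parents))
assert restored == layer_d
```

Flattening without the transpose would put gp_id on the fast axis instead, so the values would land in the wrong slots of the parent layer. That is presumably why the original code goes through a swap buffer and `transpose(2,3)` rather than copying directly.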
cfvs_data[d][{{}, {}, {}, 1, {}}] stores the CFV data for “player 2” in the lookahead tree, and cfvs_data[d][{{}, {}, {}, 2, {}}] stores the CFV data for “player 1”. That is to say, the player indices are swapped. Note that “player 1” here is the first player in the lookahead tree and does not necessarily refer to player 1 in the real game.
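For the part of the loop before the swap, the following plain-Python sketch mirrors what the quoted code does at one parent node: zero out empty actions, multiply one player slot by the current strategy, and sum over the action axis. It is a simplification under stated assumptions: a single hand instead of a hand vector, 0-based slots, and a hypothetical function name `parent_cfvs`. The `weighted_slot` argument stands in for `self.acting_player[d]`.

```python
def parent_cfvs(child_cfvs, strategy, mask, weighted_slot):
    """Collapse per-action child values into one value per player slot.

    child_cfvs[a][slot]: value of action a for each of the two slots
    strategy[a]:         current strategy probability of action a
    mask[a]:             1 for a legal action, 0 for an empty one
    weighted_slot:       slot that is multiplied by the strategy
                         (stand-in for self.acting_player[d])
    """
    totals = [0.0, 0.0]
    for a, values in enumerate(child_cfvs):
        for slot in (0, 1):
            v = values[slot] * mask[a]   # like cmul(empty_action_mask)
            if slot == weighted_slot:
                v *= strategy[a]         # like cmul(current_strategy_data)
            totals[slot] += v            # like torch.sum over dimension 1
    return totals


# Three actions, the last one empty (masked out).
child_cfvs = [[1.0, 4.0], [2.0, 6.0], [9.0, 9.0]]
strategy = [0.25, 0.75, 0.0]
mask = [1, 1, 0]

result = parent_cfvs(child_cfvs, strategy, mask, weighted_slot=0)
print(result)  # [1.75, 10.0]
```

Only one slot is weighted by the strategy, and the other is summed as is. Because of the swapped player indexing described above, which slot `self.acting_player[d]` selects has to be read with that swap in mind.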