individual meeting 11.04.2012
Q: How to implement reusing knowledge?
– This is your research topic for the thesis. So come up with an idea by next time.
Q: Is my idea for adaptive UCT OK?
– C = K * childrange/totalrange is good, because the user can then define a domain-specific K
– there is no difference: you look at the range of rewards, not of actions (in HOO there is a difference)
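A minimal sketch of the adaptive constant from the answer above, plugged into the usual UCT score. The function names, the argument layout, and the fallback for a zero total range are my own assumptions, not something fixed in the meeting.

```python
import math


def adaptive_c(k, child_min, child_max, total_min, total_max):
    """C = K * childrange / totalrange, where both ranges are ranges of rewards."""
    total_range = total_max - total_min
    if total_range <= 0:
        # Assumption: with no observed spread yet, fall back to the plain K.
        return k
    return k * (child_max - child_min) / total_range


def uct_score(mean_reward, n_child, n_parent, c):
    """Standard UCT score: mean reward plus exploration bonus scaled by c."""
    if n_child < 1 or n_parent < 1:
        # Unplayed children are handled by the caller, not scored here.
        raise ValueError("uct_score needs at least one simulation")
    return mean_reward + c * math.sqrt(math.log(n_parent) / n_child)
```

Usage would look like `uct_score(mean, n_child, n_parent, adaptive_c(K, lo, hi, root_lo, root_hi))`, with K chosen per domain.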
Q: Do I have to play each child node once?
– No, not necessary. HOO has at some points nearly 50% unplayed leaves
only if you reach an unplayed child node do you have to play it
Q: How to handle “nSimulations” + “sumRewards” after a split for DELETION/REUSE/RECALL?
– for recall there is no problem
– for reuse (you have to check, but it should be the same -> nothing to change)
– for deletion: you could either delete the old expectedReward and “nSim” and update all nodes above
– but this costs a lot of performance (in return, your expected value is better than in usual UCT)
or you take the expected reward and “nSims” from your test-statistics
– this is what you should do
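A rough sketch of the two deletion options above, under my own reading of the notes. The `Node` class, its field names, and the idea that the "test-statistics" provide a sample count and a reward sum per new child are assumptions for illustration only.

```python
class Node:
    """Hypothetical tree node holding only the two statistics discussed."""

    def __init__(self, parent=None):
        self.parent = parent
        self.n_simulations = 0
        self.sum_rewards = 0.0

    def expected_reward(self):
        if self.n_simulations == 0:
            return 0.0
        return self.sum_rewards / self.n_simulations


def record(node, reward):
    """Back up one simulation result from node up to the root."""
    while node is not None:
        node.n_simulations += 1
        node.sum_rewards += reward
        node = node.parent


def delete_statistics(node):
    """Option 1: drop the node's statistics and correct every node above it."""
    n, s = node.n_simulations, node.sum_rewards
    node.n_simulations, node.sum_rewards = 0, 0.0
    ancestor = node.parent
    while ancestor is not None:  # this walk is the costly part
        ancestor.n_simulations -= n
        ancestor.sum_rewards -= s
        ancestor = ancestor.parent


def init_from_test_statistics(child, n_samples, sum_rewards):
    """Option 2: seed a new child from samples the split test already holds."""
    child.n_simulations = n_samples
    child.sum_rewards = sum_rewards
```

Option 2 leaves the nodes above untouched, which is why it is the cheaper of the two.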
Q: How shall I do my actual “greedy pick” for the “realworld” after “n” simulations?
Shall I compare pairwise to the best leaf of the regression tree, or shall I compare all leaves against each other?
– try both (Micheal mentioned that in theory there should be no difference because the parents should get a better reward)
Q: How to compare the child nodes for the “greedy pick”? There are four known kinds: “highest value”, “most simulated”, “most simulated and highest”, “best mean”
– pick one (expected reward) or check if there is a proof that one of them is the best. Do not invest too much in this
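To make the pick criteria concrete, here is a small sketch of three of them. The `ChildStats` class and the function names are hypothetical. I do not try to separate “highest value” from “best mean” here, because the notes do not say how they differ.

```python
from dataclasses import dataclass


@dataclass
class ChildStats:
    """Hypothetical per-child statistics available at the root."""
    name: str
    n_simulations: int
    sum_rewards: float

    @property
    def mean(self):
        # Unplayed children rank last when comparing by mean.
        if self.n_simulations == 0:
            return float("-inf")
        return self.sum_rewards / self.n_simulations


def pick_best_mean(children):
    """Pick the child with the highest expected reward."""
    return max(children, key=lambda c: c.mean)


def pick_most_simulated(children):
    """Pick the child with the most simulations."""
    return max(children, key=lambda c: c.n_simulations)


def pick_most_simulated_and_highest(children):
    """Return the child that wins both criteria, or None if they disagree."""
    by_mean = pick_best_mean(children)
    by_count = pick_most_simulated(children)
    return by_mean if by_mean is by_count else None
```

When the combined criterion returns None, one option would be to run more simulations before picking; that choice is not covered in the notes.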
Q: Are there known environments with fixed depth?
– pole balancing
– Micheal showed me a second environment
TODOs and advice
visualize your tree
put your structure of the thesis to your block
keep track of time (for timing, always do a separate rerun that saves the results; the first, timed run “just” has to be fast)
Random vs. Vanilla vs. MyAgent vs. MyAgentV2 vs. etc.
Some nice outcome of the thesis might be:
if you have low memory + low time use TLSv1
if you have only low memory use TLSv2
compare the different versions for time and memory purposes
How to handle environments with terminal states at different depths?
I.e. there could be one MCTSNode (representing a range of states) which leads
either to a terminal state or to a “normal” state.
put my structure of the thesis to your block
implement the update of “nSims” “expectedReward”
make all saving and output depend on one BOOLEAN parameter
test with time (does that part of the software still work)
START READING AND WRITING!!!
create a presentation for Thursday presentation
think about reusing and its consequences
next individual meeting maybe in 2 weeks
next joint meeting 2.5.2012