Minutes 11.04.2012

individual meeting 11.04.2012

Questions:
Q: How to implement knowledge reuse?
This is your research topic for the thesis, so come up with an idea by next time.

Q: Is my idea for adaptive UCT OK?
C = K * childrange / totalrange (this is good, since it lets the user define a domain-specific K)
Multidimensional actions?
There is no difference: you look at the range of rewards, not of actions (in HOO there is a difference).
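To make the adaptive UCT idea concrete, here is a minimal sketch of UCB1-style selection where each child gets its own exploration constant C = K * childrange / totalrange. It assumes (not stated in the meeting) that "childrange" is the max minus min of the rewards observed through a child and "totalrange" is the same spread over all rewards seen below the parent. The `Child` type, `adaptive_c` and `select_child` are hypothetical names for illustration only.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Child:
    # Hypothetical node type: keeps the raw rewards observed through this child.
    rewards: List[float] = field(default_factory=list)

    @property
    def n(self) -> int:
        return len(self.rewards)

    @property
    def mean(self) -> float:
        return sum(self.rewards) / self.n if self.rewards else 0.0

    @property
    def reward_range(self) -> float:
        return max(self.rewards) - min(self.rewards) if self.rewards else 0.0


def adaptive_c(k: float, child_range: float, total_range: float) -> float:
    """C = K * childrange / totalrange, using reward ranges (not action ranges)."""
    if total_range <= 0:
        return k  # assumption: no reward spread observed yet, so treat the ratio as 1
    return k * child_range / total_range


def select_child(children: List[Child], k: float) -> Child:
    """UCB1-style selection with a per-child exploration constant."""
    all_rewards = [r for c in children for r in c.rewards]
    total_range = max(all_rewards) - min(all_rewards) if all_rewards else 0.0
    n_parent = sum(c.n for c in children)

    def score(c: Child) -> float:
        if c.n == 0:
            return math.inf  # an unplayed child is tried when it is reached
        c_value = adaptive_c(k, c.reward_range, total_range)
        return c.mean + c_value * math.sqrt(math.log(n_parent) / c.n)

    return max(children, key=score)
```

Because only reward ranges enter C, the same code works unchanged for multidimensional actions, which matches the answer above.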

Q: Do I have to play each child node once?
No, not necessary. HOO has, at some points, nearly 50% unplayed leaves.
You only have to play an unplayed child node when you reach it.

Q: How should “nSimulations” and “sumRewards” be handled after a split for DELETION/REUSE/RECALL?
For RECALL there is no problem.
For REUSE you have to check, but it should be the same, so nothing needs to change.
For DELETION you could either delete the old expectedReward and “nSim” and update all nodes above,
which costs a lot of performance (but gives a better expected value than in usual UCT),
or you take the expected reward and “nSims” from your test statistics.
The latter is what you should do.
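A minimal sketch of the recommended option, assuming a leaf that covers an action interval and, for each candidate split point, keeps separate counts and reward sums for the left and right halves as its test statistics. On a split, the new children start from those statistics instead of zero. The class names, the three evenly spaced candidate points and the `split` method are hypothetical; the statistical test that decides when to split is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class SideStats:
    n_sims: int = 0
    sum_rewards: float = 0.0

    def add(self, reward: float) -> None:
        self.n_sims += 1
        self.sum_rewards += reward

    @property
    def expected_reward(self) -> float:
        return self.sum_rewards / self.n_sims if self.n_sims else 0.0


@dataclass
class Leaf:
    low: float
    high: float
    stats: SideStats = field(default_factory=SideStats)
    # Hypothetical test statistics: per split point, (left, right) stats.
    candidates: Dict[float, Tuple[SideStats, SideStats]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for i in range(1, 4):  # assumption: three evenly spaced candidate points
            point = self.low + (self.high - self.low) * i / 4
            self.candidates[point] = (SideStats(), SideStats())

    def record(self, action: float, reward: float) -> None:
        self.stats.add(reward)
        for point, (left, right) in self.candidates.items():
            (left if action < point else right).add(reward)

    def split(self, point: float) -> Tuple["Leaf", "Leaf"]:
        """Children inherit nSims/sumRewards from the collected test statistics."""
        left_stats, right_stats = self.candidates[point]
        left, right = Leaf(self.low, point), Leaf(point, self.high)
        left.stats = SideStats(left_stats.n_sims, left_stats.sum_rewards)
        right.stats = SideStats(right_stats.n_sims, right_stats.sum_rewards)
        return left, right
```

Since every recorded reward lands on exactly one side of each candidate, the children's counts add up to the parent's, so the nodes above need no costly update.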

Q: How should I make the actual “greedy pick” for the “real world” after “n” simulations?
Should I compare pairwise down to the best leaf of the regression tree, or should I compare all leaves against each other?
Try both. (Micheal mentioned that in theory there should be no difference, because the parents should get a better reward.)
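Both variants can be sketched in a few lines, assuming each node stores a mean reward. One reading of the pairwise option is a descent that keeps the better child at each level until a leaf is reached; the other option simply takes the best leaf overall. `Node`, `pick_by_descent` and `pick_best_leaf` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    name: str
    mean_reward: float
    children: List["Node"] = field(default_factory=list)


def pick_by_descent(root: Node) -> Node:
    """Pairwise variant: at each level keep the better child, down to a leaf."""
    node = root
    while node.children:
        node = max(node.children, key=lambda c: c.mean_reward)
    return node


def leaves(node: Node) -> Iterator[Node]:
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)


def pick_best_leaf(root: Node) -> Node:
    """Global variant: compare all leaves against each other."""
    return max(leaves(root), key=lambda n: n.mean_reward)


tree = Node("root", 0.5, [
    Node("A", 0.6, [Node("A1", 0.5), Node("A2", 0.55)]),
    Node("B", 0.4, [Node("B1", 0.9), Node("B2", 0.1)]),
])
print(pick_by_descent(tree).name, pick_best_leaf(tree).name)
```

In this made-up tree the two picks disagree (A2 vs. B1) because parent B's mean hides a very good leaf; if, as Micheal said, parents reflect their best children's rewards, the two picks should coincide.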

Q: How should the child nodes be compared for the “greedy pick”? There are four known criteria: “highest value”, “most simulated”, “most simulated and highest”, and “best mean”.
Pick one (expected reward), or check whether there is a proof that one of them is the best. Do not investigate this too much.
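The four criteria can be written as small selection functions. This sketch assumes that “highest value” means the best single reward seen through a child, while “best mean” is the average reward (the expected-reward choice recommended above). Returning `None` from the combined criterion when no child wins both is an assumption, e.g. as a signal to keep simulating. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChildStats:
    name: str
    n_sims: int
    sum_rewards: float
    max_reward: float  # assumption: best single reward seen through this child

    @property
    def mean(self) -> float:
        return self.sum_rewards / self.n_sims if self.n_sims else float("-inf")


def highest_value(children: List[ChildStats]) -> ChildStats:
    return max(children, key=lambda c: c.max_reward)


def most_simulated(children: List[ChildStats]) -> ChildStats:
    return max(children, key=lambda c: c.n_sims)


def best_mean(children: List[ChildStats]) -> ChildStats:
    return max(children, key=lambda c: c.mean)


def most_simulated_and_highest(children: List[ChildStats]) -> Optional[ChildStats]:
    """The child that is both most simulated and best on average, else None."""
    candidate = most_simulated(children)
    return candidate if candidate is best_mean(children) else None
```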

Q: Are there known environments with fixed depth?
Pole balancing.
Micheal showed me a second environment

TODOs and advice
Introduce error
Visualize your tree
Put the structure of your thesis on your blog
Keep track of time (for timing purposes, always rerun: the first run, used for timing, should be “just” fast, and a second run saves the results)
Compare performances:
Random vs. Vanilla vs. MyAgent vs. MyAgentV2, etc.
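The timing advice and the comparison could be combined in a small harness like the one below. It is only a sketch: each agent is first run with saving switched off (the timed run), then rerun with the same seed and saving switched on, so the saved results belong to the same episodes. The agents here are hypothetical stand-ins returning a fake episode reward; Vanilla, MyAgent and MyAgentV2 would plug in the same way.

```python
import random
import time
from typing import Callable, Dict, List, Tuple

Agent = Callable[[random.Random], float]  # hypothetical: returns one episode's reward


def random_agent(rng: random.Random) -> float:
    return rng.random()


def constant_agent(rng: random.Random) -> float:
    return 0.5


def run(agent: Agent, episodes: int, save_results: bool, seed: int = 0) -> Tuple[float, List[float]]:
    """One boolean parameter switches all saving on or off."""
    rng = random.Random(seed)
    results: List[float] = []
    start = time.perf_counter()
    for _ in range(episodes):
        reward = agent(rng)
        if save_results:
            results.append(reward)  # stands in for writing logs/files
    return time.perf_counter() - start, results


def compare(agents: Dict[str, Agent], episodes: int = 1000) -> Dict[str, Tuple[float, float]]:
    table = {}
    for name, agent in agents.items():
        elapsed, _ = run(agent, episodes, save_results=False)  # fast, timed run
        _, results = run(agent, episodes, save_results=True)   # rerun that saves
        table[name] = (elapsed, sum(results) / len(results))
    return table


for name, (secs, mean) in compare({"Random": random_agent, "Constant": constant_agent}).items():
    print(f"{name}: {secs:.4f} s, mean reward {mean:.3f}")
```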

Some nice outcomes of the thesis might be:
If you have low memory and little time, use TLSv1.
If you only have low memory, use TLSv2.
If …
Compare the different versions in terms of time and memory.
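For the time and memory comparison, Python's standard `time.perf_counter` and `tracemalloc` are one option. The sketch below measures time and peak memory in two separate calls, since tracing allocations slows the code down. The two "versions" are hypothetical stand-ins, not TLSv1/TLSv2.

```python
import time
import tracemalloc


def measure(fn, *args):
    """Return (seconds, peak_bytes) for fn(*args), measured in two separate calls."""
    start = time.perf_counter()
    fn(*args)
    elapsed = time.perf_counter() - start  # timed without tracing overhead

    tracemalloc.start()
    try:
        fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return elapsed, peak


# Hypothetical stand-ins for two versions with different memory behaviour.
def version_keep_all(n):
    return [{"id": i, "reward": 0.0} for i in range(n)]  # stores every node


def version_compact(n):
    total = 0
    for i in range(n):
        total += i  # keeps only a running sum
    return total


for name, version in [("keep_all", version_keep_all), ("compact", version_compact)]:
    seconds, peak = measure(version, 50_000)
    print(f"{name}: {seconds:.4f} s, peak {peak / 1024:.1f} KiB")
```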

OPEN QUESTIONS:

How to handle environments with terminal states at different depths?
E.g., there could be one MCTSNode (representing a range of states) that leads
either to a terminal state or to a “normal” state.

MY TODO:
Put the structure of my thesis on my blog
Implement the update of “nSims” and “expectedReward”
Implement error
Make all saving and output depend on one BOOLEAN parameter
Test timing (does that part of the software still work?)
Visualize the tree
START READING AND WRITING!!!
Prepare slides for the Thursday presentation
Think about reuse and its consequences
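For "visualize the tree", one lightweight approach is to write the tree as Graphviz DOT text and render it with Graphviz, e.g. `dot -Tpng tree.dot -o tree.png`. The sketch assumes a hypothetical `TreeNode` holding an action interval, nSims and the expected reward; the real node class will differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TreeNode:
    low: float
    high: float
    n_sims: int
    expected_reward: float
    children: List["TreeNode"] = field(default_factory=list)


def to_dot(root: TreeNode) -> str:
    """Return Graphviz DOT text with one box per node and one edge per child."""
    lines = ["digraph MCTS {", "  node [shape=box];"]
    counter = 0

    def visit(node: TreeNode) -> str:
        nonlocal counter
        node_id = f"n{counter}"
        counter += 1
        label = (f"[{node.low:g}, {node.high:g})\\nnSims={node.n_sims}"
                 f"\\nE={node.expected_reward:.3f}")  # \n is a line break in DOT labels
        lines.append(f'  {node_id} [label="{label}"];')
        for child in node.children:
            lines.append(f"  {node_id} -> {visit(child)};")
        return node_id

    visit(root)
    lines.append("}")
    return "\n".join(lines)


root = TreeNode(0.0, 1.0, 3, 0.5, [TreeNode(0.0, 0.5, 1, 1.0), TreeNode(0.5, 1.0, 2, 0.25)])
print(to_dot(root))  # save as tree.dot and render with Graphviz
```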

Appointments
Next individual meeting: maybe in 2 weeks
Next joint meeting: 2.5.2012

This entry was posted in Minutes.