Minutes 11.5

Individual meeting:

We started with “state of the art”:

Available environments:

Environment              Action dimensions   No. of steps
Sinus                    1                   1
Sixhumpcamelback         2                   1
Multidimensional Sinus   n                   1
DonutWorld               1                   n
Acrobot                  1                   variable length

Comment: Start with DonutWorld. It has a fixed depth, and your research topic is the behaviour of TLS on multistep problems.

Output
1. all samples
2. best-seen sample
3. split-probability
4. performance
5. error
todo:
automatic learning curves (currently I have to create them by hand)
regret/error for multistep environments
automatic evaluation for experiments with time constraints
Comment:
– You might also collect the tree depth to look into Colin’s problem: a performance drop after lots of samples. He had a max depth for his action tree -> Is the max depth the reason for the performance drop?
– Michael wants to see the split-probability as a horizontal bar with a certain gray scale value for each sample, indicating how probable a split is after that sample (Black = 100%; White = 0%).
keywords: imagec (vector of colors), subfigure (for multiple figures in one picture)
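A minimal sketch of the gray-scale mapping Michael asked for, written in Python with the standard library only. The function names and the example probabilities are hypothetical, and the PGM text output is just one possible way to get a horizontal bar without a plotting toolbox.

```python
def split_probabilities_to_gray(probabilities):
    """Map split probabilities in [0, 1] to 8-bit gray levels.

    Black (0) means a split is certain (100%), white (255) means no split (0%).
    """
    levels = []
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        levels.append(round(255 * (1.0 - p)))
    return levels


def gray_bar_as_pgm(levels, height=10):
    """Render the gray levels as a plain-text PGM image, one column per sample."""
    row = " ".join(str(v) for v in levels)
    header = f"P2\n{len(levels)} {height}\n255\n"
    return header + "\n".join([row] * height) + "\n"


# Hypothetical split probabilities after samples 1..5 (not measured data).
probs = [0.0, 0.25, 0.5, 1.0, 0.75]
levels = split_probabilities_to_gray(probs)
pgm_text = gray_bar_as_pgm(levels, height=2)
```

Writing `pgm_text` to a file with the extension `.pgm` gives an image that most viewers can open; each column is one sample.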

Agents
MCTS
Vanilla
TLS Deletion
TLS Recall
todo:
– TLS cloning
Comment: We will discuss “TLS cloning” next time.

_______________________________
Questions
I told Kurt and Michael about my results for Global_Greedy_Selection and Global_UCT_Selection.
A:
– Compare the picks of “Global_Greedy_Pick” against those of “pairwise_selection”. -> When do they differ, and why?
– Does it also have an influence on “TLS_Recall”? -> If yes, you have a bug.

Q: What is the reason for “pairwise selection” ?
A:
– It should come to the same result
– It is much faster than comparing all N child nodes against each other
– It is more resistant to noise: if you have noise in your environment, “pairwise selection” should lead you to a region which you can trust. If you just look at single nodes, the result can be strongly affected by noise. If you go down the action tree following the highest average, the effect of noise should be lower.
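A small sketch of how the two kinds of selection can disagree under noise. `Node`, `global_greedy_pick` and `descent_pick` are hypothetical names and not the actual “Global_Greedy_Pick” or “pairwise_selection” code; `descent_pick` only imitates “go down the action tree following the highest average”, and the numbers are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    total_reward: float = 0.0
    visits: int = 0
    children: List["Node"] = field(default_factory=list)

    @property
    def mean(self) -> float:
        # Unvisited nodes are never preferred.
        return self.total_reward / self.visits if self.visits else float("-inf")


def leaves(node):
    """Collect all leaf nodes below (and including) the given node."""
    if not node.children:
        return [node]
    result = []
    for child in node.children:
        result.extend(leaves(child))
    return result


def global_greedy_pick(root):
    """Compare all leaves against each other and return the one with the best mean."""
    return max(leaves(root), key=lambda n: n.mean)


def descent_pick(root):
    """Walk down the tree, always following the child with the highest mean."""
    node = root
    while node.children:
        node = max(node.children, key=lambda n: n.mean)
    return node


# Made-up statistics: leaf R2 has a high mean from only two (noisy) samples.
left = Node("L", 60.0, 100, [Node("L1", 31.0, 50), Node("L2", 29.0, 50)])
right = Node("R", 7.8, 22, [Node("R1", 6.0, 20), Node("R2", 1.8, 2)])
root = Node("root", 67.8, 122, [left, right])

greedy_choice = global_greedy_pick(root)   # picks the leaf with the best single mean
descent_choice = descent_pick(root)        # follows the better region first
```

In this example the global comparison is drawn to the barely sampled leaf, while the descent stays in the region whose average is backed by many samples.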

Q: Shall I evaluate multi-shot experiments just for the cumulative reward?
A: yes!
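A short sketch of what evaluating on the cumulative reward could look like, together with a regret value for the multistep todo. It assumes that an episode is stored as a list of step rewards and that the optimal return of the environment is known; both are assumptions, and the function names are hypothetical.

```python
def cumulative_reward(step_rewards):
    """Sum of the rewards collected in one multi-step episode."""
    return sum(step_rewards)


def mean_regret(episodes, optimal_return):
    """Average gap between the (assumed known) optimal return and the achieved return."""
    returns = [cumulative_reward(episode) for episode in episodes]
    return sum(optimal_return - r for r in returns) / len(returns)


# Two made-up episodes with three steps each.
episodes = [[1.0, 0.5, 0.5], [0.0, 1.0, 0.5]]
returns = [cumulative_reward(episode) for episode in episodes]
regret = mean_regret(episodes, optimal_return=3.0)
```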

Q: Shall I design experiments with time constraints like I did for samples? Result after 1 millisecond, result after 2 milliseconds, …
A: Not that detailed, but with small and significant steps, e.g. a step size of 50 milliseconds.
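A possible sketch of such a time-constrained evaluation with 50 millisecond checkpoints. The function `run_with_time_checkpoints`, its parameters and the injectable `clock` are assumptions for illustration; `step` stands for whatever does one unit of work (for example one sample) and returns the current result.

```python
import time


def run_with_time_checkpoints(step, budget_ms, step_ms=50, clock=time.perf_counter):
    """Call step() repeatedly and record its latest result at every checkpoint.

    Returns a list of (checkpoint_ms, result) pairs. A checkpoint is recorded
    with the first result that is available at or after the checkpoint time.
    """
    start = clock()
    next_checkpoint = step_ms
    records = []
    while next_checkpoint <= budget_ms:
        result = step()
        elapsed_ms = (clock() - start) * 1000.0
        # One slow step may pass several checkpoints at once.
        while next_checkpoint <= budget_ms and elapsed_ms >= next_checkpoint:
            records.append((next_checkpoint, result))
            next_checkpoint += step_ms
    return records


def make_counting_step():
    """Stand-in for an agent: the 'result' is just the number of steps done so far."""
    state = {"count": 0}

    def step():
        state["count"] += 1
        return state["count"]

    return step


# Real-time run with a 100 ms budget; the recorded counts depend on the machine.
demo_records = run_with_time_checkpoints(make_counting_step(), budget_ms=100)
```

Passing the clock as a parameter is a design choice that makes the checkpoint logic testable without waiting for real time to pass.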

Q: What shall I evaluate in the “fine tuning” section? (Currently: TG_Significance, TG_N_TESTS, TG_MIN_EXAMPLES, apdativeC)
A: If you use different UCT_K you have to test it. Otherwise you can leave it out.

Q: How many pages are expected for the thesis?
A: There is no strict number. Kurt’s students had roughly 60 pages.

About the split-probability results:
Kurt: Currently your TG_MIN_EXAMPLES is the indicator when a split is probably done, but it should be TG_Significance.
Me: I also noticed it, but Colin told me that he gets the same results for higher significance levels. I will test that next week.
Kurt: I would expect some kind of standard deviation curve around the most probable splitting point.
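To make the two parameters concrete, here is a hedged sketch of a split decision that needs both a minimum number of examples and a significant difference. The two-sided z-test, the function name `should_split` and the reading of the significance parameter as a p-value threshold are assumptions for this sketch; the actual test behind TG_Significance and TG_MIN_EXAMPLES may differ.

```python
from statistics import NormalDist, mean, variance


def should_split(left, right, min_examples, significance):
    """Return True if both halves have enough samples and their means differ significantly.

    Assumption: a two-sided z-test on the difference of the two sample means,
    with `significance` used as the p-value threshold.
    """
    min_examples = max(min_examples, 2)  # variance() needs at least two values
    if len(left) < min_examples or len(right) < min_examples:
        return False
    standard_error = (variance(left) / len(left) + variance(right) / len(right)) ** 0.5
    if standard_error == 0.0:
        return mean(left) != mean(right)
    z = abs(mean(left) - mean(right)) / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(z))
    return p_value < significance


# Made-up rewards for the two halves of an action interval.
clearly_different = should_split(
    [0.10, 0.20, 0.15, 0.12, 0.18], [0.90, 0.80, 0.85, 0.88, 0.82],
    min_examples=5, significance=0.001,
)
too_few_examples = should_split(
    [0.10, 0.20, 0.15, 0.12, 0.18], [0.90, 0.80, 0.85, 0.88, 0.82],
    min_examples=6, significance=0.001,
)
same_mean = should_split(
    [0.1, 0.9, 0.5, 0.3, 0.7], [0.2, 0.8, 0.5, 0.4, 0.6],
    min_examples=5, significance=0.05,
)
```

In a setup like this, a very clear difference passes the test as soon as the minimum number of examples is reached, so the minimum would decide when the split happens; whether this explains the observed results is left open here.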

TODOs
– finish the pipeline for multistep (general scripts)
– test/search for a bug in “global selection” vs. “pairwise selection”
– test the influence of higher (or lower) significance levels
– benchmark on DonutWorld
– write till section 2.4
