split probabilities (Sixhumpcamelback)


significance   splits    samples   reward
0.1            167.758   1000.0    0.8391405906445085
0.05           127.989   1000.0    0.5690483573253337
0.001          154.865   1000.0    1.0206215175910123
0.0001         134.718   1000.0    1.0236344912295705


significance   splits    samples   reward
0.1            61.82     1000.0    0.7980104591540007
0.05           61.894    1000.0    0.9266597255352897
0.01           60.142    1000.0    1.0055977331055612
0.001          49.74     1000.0    1.0278628308975193
0.0001         36.412    1000.0    1.026826879201695


0.01 over 1000 samples

0.01 1-100


0.0001 1-1000

0.0001 1-100

Posted in Thesis Progress | Leave a comment

split probabilities

I played around with the significance levels of TG.
I tested the Sinus function with the same parameter set as in the last experiment, for
significance = 0.05, 0.01, 0.001, 0.0001

It is good to see that the significance level has more influence than Colin told me before.

significance   splits   samples   reward
0.5            57.419   1000.0    0.6141098429842214
0.1            57.296   1000.0    0.614034635554727
0.05           57.036   1000.0    0.6137081380421051
0.01           55.298   1000.0    0.6126827592559511
0.001          49.881   1000.0    0.6106817905851388
0.0001         43.194   1000.0    0.6072959593103062
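How the significance level gates a split can be illustrated with a minimal sketch (my own simplification, not the actual TG test): a candidate split is accepted only when the rewards on the two sides of the split point differ significantly at the chosen level, so a stricter level directly means fewer splits.

```python
import math
from statistics import mean, stdev

def split_is_significant(left, right, alpha):
    """Accept a candidate split if the rewards on the two sides differ
    significantly (Welch's t-statistic with a normal approximation)."""
    v1, v2 = stdev(left) ** 2, stdev(right) ** 2
    se = math.sqrt(v1 / len(left) + v2 / len(right))
    z = abs(mean(left) - mean(right)) / se
    p = math.erfc(z / math.sqrt(2))  # two-sided p-value
    return p < alpha

# Toy rewards on the two sides of a candidate split point.
left = [0.45, 0.55] * 4   # mean 0.50
right = [0.50, 0.60] * 4  # mean 0.55

print(split_is_significant(left, right, 0.1))     # lenient level accepts the split
print(split_is_significant(left, right, 0.0001))  # strict level refuses it
```

The same data that passes at 0.1 fails at 0.0001, which matches the tables above: lower significance levels consistently produce fewer splits.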

#Sample 1-100

#Sample 1-1000



0.05 #sample 1-1000

0.01 #sample 1-1000

0.001 #sample 1-1000

0.0001 #sample 1-1000

Experiment Config:
-a = agent.tls.TLSAgent
-env = test.sinus.SinusEnvironment
-e = 1000
-t = 0
-er = 1.0E-15

Agent Config:
tg_significance = 0.001
tg_min_examples = 15
tg_n_tests = 30
nrs = 1000
simtime = 0
pseudorandom = 1000
adaptiveuct = 1
uct_c = 2.0
uct_k = 1.0
splittype = 0
greddy_global_compare = true
uct_global_compare = false
discountfactor = false
Posted in Thesis Progress | 1 Comment

Selection strategies in TLS (learning curves)

I also created the learning curves for yesterday's results, which are very interesting.

I will discuss that tomorrow with Kurt.
Today I will try to produce some results on the split probability as a function of the significance level of TG.

Sinus learning (sampling)

Sixhumpcamelback learning (sampling)

DonutWorld(3 steps) learning (sampling)

Posted in Thesis Progress | 3 Comments

Selection strategies in TLS

As I mentioned before, I noticed a difference when I changed the selection strategy. First of all I started debugging my code, which was hard to do, but after 2 days of fixing I believe Perfect Recall should finally work correctly.
There are two different types of selection in my TLS agent.
The first selection type is used during sampling (UCT selection). In standard TLS this goes from the root of an action tree, compares two child nodes against each other, and selects the child node with the higher UCT value. (UCT_SELECTION)
The second selection type is used when the agent has to choose an action for the 'real world' (the environment). Again, standard TLS goes from the root to a leaf by comparing two child nodes.
In normal MCTS you usually compare all children of a (meta) node against each other and select the one with the highest value.
(The pairwise strategy of standard TLS should be more robust against noisy data and should also be much faster, since it compares only two child nodes at a time.)
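The two modes can be sketched as follows (toy node class and values are mine, not the actual TLS code). The example also shows how they can disagree: pairwise descent follows subtree averages, while a global comparison looks at every candidate at once.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value            # value estimate (e.g. mean reward)
        self.left, self.right = left, right

    def leaves(self):
        if self.left is None:
            return [self]
        return self.left.leaves() + self.right.leaves()

def pick_pairwise(node):
    """Standard TLS: descend from the root, comparing the two children
    of each node and following the one with the higher value."""
    while node.left is not None:
        node = node.left if node.left.value >= node.right.value else node.right
    return node

def pick_global(node):
    """MCTS-style global comparison: rank all leaf nodes at once."""
    return max(node.leaves(), key=lambda n: n.value)

# A tree where the strategies disagree: the left subtree has the higher
# average, but the single best leaf sits in the right subtree.
root = Node(0.5,
            left=Node(0.6, Node(0.55), Node(0.65)),
            right=Node(0.4, Node(0.05), Node(0.90)))

print(pick_pairwise(root).value)  # 0.65 -- follows the better average
print(pick_global(root).value)    # 0.9  -- finds the single best leaf
```

This is also the intuition for the noise argument: the pairwise pick never commits to a single leaf estimate, only to subtree averages, at the cost of sometimes missing an isolated strong leaf.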

What I did today was run 1000 experiments with 1000 samples each (for multi-step, 1000 samples per step) on SinusEnvironment, Sixhumpcamelback, and DonutWorld. All experiments were without noise.
I compared 4 TLS agents with Perfect Recall against each other; the agents differ only in their selection strategies.
(I named the N-vs-N comparison "global" and the comparison of two child nodes "pairwise".)

f0 = greedy=global uct=pairwise (red)
f1 = greedy=pairwise uct=pairwise (green)
f2 = greedy=pairwise uct=global (blue)
f3 = greedy=global uct=global (pink)

f2 and f3 are obviously stronger than f0 and f1, so picking with a global comparison during sampling seems to give better results for environments without noise.
Because I have to build a work-around in RL-Glue to get correct results for experiments with noise, those experiments have to wait for a while.

sinus regret (best seen sample)

sinus regret (greedy picked value)

sixhumpcamelback regret (best seen sample)

sixhumpcamelback regret (greedy pick)

DonutWorld(3 steps) regret (best seen sample)

DonutWorld(3 steps) regret (greedy pick)

Posted in Thesis Progress | Leave a comment

Minutes 11.5

Individual meeting:

We started with “state of the art”:

Available environments:

Environment               Action dimensions   No. of steps
Sinus                     1                   1
Sixhumpcamelback          2                   1
Multidimensional Sinus    n                   1
DonutWorld                1                   n
Acrobot                   1                   variable length

Comment: Start with DonutWorld. It has a fixed depth, and your research topic is about the behaviour of TLS for multi-step problems.

1. all samples
2. best-seen sample
3. split-probability
4. performance
5. error
– automatic learning curves (currently I have to create them by hand)
– regret/error for multistep environments
– automatic evaluation for experiments with time constraints
– You might also collect the tree depth to look at Colin's problems (performance leak after lots of samples + he had a max depth for his action tree -> is the max depth the reason for the performance leak?)
– Michael wants to see the split probability as a horizontal bar with a gray-scale value for each sample, indicating how probable a split is after that sample (black = 100%; white = 0%)
keywords: imagec (vector of colors), subfigure (for multiple figures in one picture)
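Michael's gray-scale bar boils down to a simple mapping from split probability to gray intensity (my toy values and function name; the actual plot would use imagec or similar):

```python
def probability_to_gray(p):
    """Map a split probability to an 8-bit gray value:
    p = 1.0 -> 0 (black), p = 0.0 -> 255 (white)."""
    return round((1.0 - p) * 255)

# One gray cell per sample index; this row of values is what the
# horizontal bar would display, one cell per sample.
split_probs = [1.0, 0.75, 0.5, 0.25, 0.0]  # toy probabilities
bar = [probability_to_gray(p) for p in split_probs]
print(bar)  # [0, 64, 128, 191, 255]
```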

TLS Deletion
TLS Recall
– TLS cloning
Comment: We will discuss “TLS cloning” next time.

I told Kurt and Michael about my results for Global_Greedy_Selection and Global_UCT_Selection.
– Compare the different picks if you use “Global_Greedy_Pick” against “pairwise_selection”. ->When do they differ? and why?
– Does it also have an influence for “TLS_Recall” ? -> if yes: you have a bug.

Q: What is the reason for “pairwise selection” ?
– It should come to the same result
– It is much faster than comparing all N child nodes against each other
– It is noise resistant. If you have noise in your environment, "pairwise selection" should lead you to a region which you can trust. By just looking at single nodes, the result can be strongly affected by noise; if you go down the action tree following the highest average, the effect of noise should be lower.

Q: Shall I evaluate multi-shot experiments just for the cumulative reward?
A: yes!

Q: Shall I design experiments with time constraints like I did for samples? Result after 1 millisecond, result after 2 milliseconds, …
A: Not that detailed, but with small and significant steps, e.g. step size: 50 milliseconds.

Q: What shall I evaluate in the "fine tuning" section? (Currently: TG_Significance, TG_N_TESTS, TG_MIN_EXAMPLES, adaptiveC)
A: If you use different UCT_K you have to test it. Otherwise you can leave it out.

Q: How many pages are expected for the thesis?
A: There is no strict number. Kurt’s students wrote around 60 pages.

About the split-probability results:
Kurt: Currently your TG_MIN_EXAMPLES is the indicator of when a split is probably done, but it should be TG_Significance.
Me: I noticed that too, but Colin told me that he got the same results for higher significance levels. I will test that next week.
Kurt: I would expect some kind of standard deviation curve around the most probable splitting point

– finish the pipeline for multistep (general scripts)
– test/search for a bug in "global selection" vs. "pairwise selection"
– test the influence of higher(/lower) significance levels
– benchmark on Donutworld
– write till section 2.4

Posted in Minutes | Leave a comment

split probability

Michael and Kurt asked me for a graph that shows the probability of a split at the i-th sample.

I ran 10000 experiments with 1000 samples each on the Sinus and Sixhumpcamelback functions.
The first 100 samples show a significant point; after that the curve becomes 'quasi' linear at y = 5.5%.
Split policy = deletion
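The estimate behind these plots can be sketched as follows (hypothetical log format and function name of my own): for each sample index, count across runs how often a split happened there.

```python
from collections import Counter

def split_probabilities(runs, n_samples):
    """Empirical probability of a split at the i-th sample. `runs` is a
    list of runs, each a list of the (1-based) sample indices at which
    a split happened in that run."""
    counts = Counter(i for run in runs for i in run)
    return [counts[i] / len(runs) for i in range(1, n_samples + 1)]

# Toy example: 4 runs of 5 samples each (hypothetical split logs).
runs = [[2, 5], [2, 4], [3, 5], [2, 5]]
print(split_probabilities(runs, 5))
# [0.0, 0.75, 0.25, 0.25, 0.75]
```

With 10000 runs the tail of this curve is what flattens out to the roughly constant 5.5% mentioned above.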

Sixhumpcamelback for 10000 runs with 1000 samples

Sinus for 10000 runs with 1000 samples

Sinus for 10000 runs with 1000 samples

Split policy = recall

sinus with recall

sinus with recall

sixhump with recall

sixhumpcamelback with recall

Posted in Thesis Progress | Leave a comment

parameter set

To calm Kurt a bit about the number of parameters necessary for a TLS agent, I summarized my list of parameters.
The list consists of 6 parameters, but:
1. Which TLS_Splittype a user should take will/could be answered by my thesis.
2. We might also fix the significance level.
3. UCT_K could be ignored, though the user would still have the chance to influence the results through it.
4. The discount factor could be treated as an environment problem (a general problem for every MDP system);
TLS also works for environments that do not need a discount factor.
-> So in the best case 2 parameters are left.
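On point 4: the discount factor only enters through the discounted return, which is a property of the MDP rather than of TLS itself. A minimal sketch, using the gamma from my config:

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of gamma^t * r_t over a reward sequence."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

print(discounted_return([1.0, 1.0, 1.0]))  # 1 + 0.9 + 0.81, approx. 2.71
```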

I think that if the significance level is chosen well, then there is no need for TG_MIN_EXAMPLES, except when the user needs/wants a hard limit.
TG_N_TESTS has to be tested, because it influences the memory demand and the simulation time: more tests per node means more values to store, and more tests that have to be updated and checked for each sample that passes the node.
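Why TG_N_TESTS drives both memory and time can be sketched like this (a heavy simplification of an action node with my own names, not the real TG code): every node stores running statistics for each candidate split point, and every sample passing through updates all of them.

```python
class ActionNode:
    """Toy action node: TG_N_TESTS candidate split points, each with
    (count, reward sum) statistics for the samples on either side."""
    def __init__(self, low, high, n_tests):
        step = (high - low) / (n_tests + 1)
        # n_tests candidate split points, evenly spaced over the range
        self.split_points = [low + step * (i + 1) for i in range(n_tests)]
        # per test: [count, reward sum] for the left and right side
        self.stats = [[[0, 0.0], [0, 0.0]] for _ in self.split_points]

    def update(self, action, reward):
        # O(TG_N_TESTS) work for every sample that passes this node
        for point, (left, right) in zip(self.split_points, self.stats):
            side = left if action <= point else right
            side[0] += 1
            side[1] += reward

node = ActionNode(0.0, 1.0, n_tests=25)
node.update(0.3, 1.0)
print(len(node.stats))  # 25 tests stored -> 25 updates per sample
```

So both the storage per node and the per-sample update cost grow linearly in TG_N_TESTS, which is exactly the trade-off that has to be tested.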

//how many tests per node (how many possible splitting points for each action node)
TG_N_TESTS = 25;
//min number of samples, before a test starts to look for a split
//domain specific (should be answered by my thesis)
//tradeoff for exploration and exploitation in adaptive UCT
TLS_UCT_K = 2;
//Discount factor
gamma = 0.9;
//significance for splits (may fix that parameter)

//which adaptive C is the best version -> crossvalidation
adaptiveUCTType = UCT_PARENT;
//Compare all N child nodes against each other or compare pairwise to the root
//same for UCT selection (during simulation)

Posted in Thesis Progress | Leave a comment