Problems with error bars

I have tried to add error bars to my plots, but there are several problems…
I do not see any additional information in any of those plots…
Otherwise I would have to plot only 1-2 graphs per figure, which would lead to many, many plots and a lot of work.

Current version in the thesis

Filled area between the lower and upper bound

Single curve: the colors do not match, and the graphs shift the content of the legend

Error bars on all points: graphical errors while drawing the data

All curves, but no additional information
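The "filled area" variant needs a lower and an upper bound for every x-position. A minimal sketch of how these bounds could be computed from repeated runs, assuming the band is the mean plus/minus one standard error of the mean (a standard deviation or confidence interval would work the same way). The function name and the run layout are hypothetical.

```python
import math
import statistics


def error_band(runs):
    """Return (means, lower, upper) for one curve, one entry per x-position.

    `runs` is a list of runs; each run is a list of values (e.g. the reward
    after each sample) and all runs must have the same length.
    ASSUMPTION: the band is mean +/- one standard error of the mean.
    """
    n_runs = len(runs)
    means, lower, upper = [], [], []
    for values in zip(*runs):  # transpose: iterate over x-positions
        m = statistics.mean(values)
        sem = statistics.stdev(values) / math.sqrt(n_runs) if n_runs > 1 else 0.0
        means.append(m)
        lower.append(m - sem)
        upper.append(m + sem)
    return means, lower, upper
```

With matplotlib, the result can be drawn as `ax.plot(xs, means)` followed by `ax.fill_between(xs, lower, upper, alpha=0.3)`, which keeps one legend entry per curve instead of one set of bars per point.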

Posted in Uncategorized

Minutes 25.6.

Skype meeting with Michael:

We talked about the points that Michael wants changed in chapters 3 and 4.
I should come up with a more precise research question that cannot be answered with “Yes” or “No”.
Section 2.5 should have two subsections: 1 = Hoot, 2 = Colin's thesis in 5 sentences (in my own words).
I should create a list of small things and experiments that I would have done if I had more time.
Michael wants to see the next version on Wednesday or early Thursday so that he can give final advice before Friday.

I should hand in on Friday morning (at 12 o’clock at the latest).
If this version of the thesis is good enough -> everything is fine.
Otherwise they will call me on Friday and discuss further steps with me (whether I get a few more days, or whether the date of the defense has to be changed …).

The version has to be more or less finished, which means all important experiments are done and, in particular, the text for the introduction, the discussion of the results and the conclusion is written.

Posted in Minutes

One-shot environments

I am currently writing the “one-shot environments” section of my thesis and rerunning old experiments.
(Maybe a bug causes this, but) my “Deletion” agent performs worse than the Vanilla agent in the Sinus environments.

The problem is that the “Deletion” agent starts to focus “too” quickly on local maxima instead of searching for the global maximum.
I tried various values for the constant C in the UCT formula, but the result over 1000 runs always stays the same.
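For reference, C is the weight of the exploration term in the UCB1-style score that UCT uses to pick a child. A minimal sketch of that score and the selection step, assuming the common form mean + C * sqrt(ln(N_parent) / n_child) (other papers scale the term differently) and a hypothetical (mean_reward, visits) layout for the children; this is not the TLS code itself.

```python
import math


def uct_score(child_mean, child_visits, parent_visits, c):
    """UCB1-style score: exploitation term + c * exploration term.

    Unvisited children get +inf so that every child is tried at least once.
    """
    if child_visits == 0:
        return math.inf
    exploration = math.sqrt(math.log(parent_visits) / child_visits)
    return child_mean + c * exploration


def select_child(children, parent_visits, c):
    """Return the index of the child with the highest UCT score.

    `children` is a list of (mean_reward, visits) tuples (hypothetical layout).
    """
    scores = [uct_score(m, n, parent_visits, c) for m, n in children]
    return max(range(len(children)), key=scores.__getitem__)
```

With c = 0 the selection is purely greedy; larger values of c favour rarely visited children, e.g. for `[(0.6, 50), (0.5, 5)]` with 55 parent visits, c = 0 picks the first child and c = 1 picks the second.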

I will discuss those results with Kurt during our meeting today.

Error for the Sinus Function

Error for the Sinus3D Function

Error for the Sinus4D Function

Error for the Six-hump Camel Back Function

Posted in Thesis Progress

Minutes 25.5 Joint Meeting

Nice meeting in the garden with Michael, Colin and Andres 🙂 (start: 11:00, end: 15:00)

I showed my results on the significance level and TG_MIN_EXAMPLES and gave a small overview of my results with noise.
Michael told me to focus on the goal of my thesis (“Reuse of Knowledge”). Time keeps running, so I have to focus on the important parts
and may skip a few “fine tunings”. I should also use charts instead of tables.

I presented my “Thesis Plan” and explained why the plan has changed so much.
In the beginning I thought I would have to implement a fixed algorithm, come up with a new version (“Reuse”) and benchmark both on a few environments. The problem was that the algorithm is not explained in enough detail for this to be just a “re-implementation”.
Many detailed questions arose during the implementation, so I had to test a lot of variations. In addition, there was no fixed set of parameters, and none of the existing ones had been tested or evaluated yet.

I guess this is not a bad development for the thesis. It is just a small shift of the main goal. Once I finish my fine-tuning results and come up with a reasonable version of “Reuse”, the essence of my thesis will be analyzing and explaining TLS in more detail, plus benchmarking three different versions of TLS.

My future plan is:
– finish the fine-tuning experiments I have already started
– start/develop/finish the “Reuse” version of TLS
– write the thesis
– benchmark “deletion”, “perfect recall” and “reuse”

We also talked about noise and how to apply it to an environment.
Michael wants noise on the reward instead of noise on the action, because with noise on the reward you still know the optimum.
Calculating the optimum for noise on the action is not that easy and would have to be done for each noise level and for every environment.
Instead of the “modulo” operation that I have chosen, we should use a normal distribution with a “sigma” equal to the noise level.
Reward = reward( action + N(0,sigma) )
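The formula above puts the Gaussian noise on the action, while the preferred option in the discussion is noise on the reward. A minimal sketch of both variants, assuming `reward_fn` is a hypothetical environment reward that takes a list with one value per action dimension and `sigma` is the noise level; whether sigma is scaled by the action range is left open here.

```python
import random


def reward_with_action_noise(reward_fn, action, sigma, rng=random):
    """Formula as written: Reward = reward(action + N(0, sigma)).

    Noise on the action ("noisy engine"): the optimum of the noisy problem
    may differ from the optimum of reward_fn.
    """
    noisy_action = [a + rng.gauss(0.0, sigma) for a in action]
    return reward_fn(noisy_action)


def reward_with_reward_noise(reward_fn, action, sigma, rng=random):
    """Noise on the reward ("noisy sensors"): reward(action) + N(0, sigma).

    The noise has zero mean, so the expected reward of each action equals
    reward_fn(action) and the known optimum stays valid.
    """
    return reward_fn(action) + rng.gauss(0.0, sigma)
```

The second variant is why the optimum stays known: averaging many noisy rewards for a fixed action converges to `reward_fn(action)`.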

– Andreas will check whether the defense date of 25.7 is a problem.

– We are aiming to hand in
draft version: 29.6
final version: 13.7

– Michael will be absent for the next 3 weeks (until 18.6) -> 2 weeks left for meetings and chapter corrections (approximately 2 rounds of corrections)

Posted in Minutes

Selection strategies (3) with noise

I re-ran all experiments, this time with a certain noise level added to the action.
The noise is added to every dimension of the action, with a new random number for each dimension.
The random number is drawn from a normal distribution with mean=0 and sigma=1 and is restricted to the interval [-1;1].
A noise level of 5% means that the largest possible noise value is -5% or +5% of the action range.

In the SinusEnvironment, the action range is [0;3]. 5% of this range is 0.15, which is multiplied by the random number and then added to the action value. The action 1.0 can therefore be changed to any value in the interval [0.85; 1.15].
( actionValue = rangeLength * noiseLevel * randomNumber + realActionValue )
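The formula can be sketched per action dimension as below. The post only says that the random number comes from N(0, 1) and lies in [-1;1]; folding it into that interval with `math.fmod` is an ASSUMPTION inspired by the “modulo” mentioned in the later minutes, not necessarily what the experiments did. The noisy action is not clipped back into the action range, since the post does not say whether that happens. Function names are hypothetical.

```python
import math
import random


def bounded_standard_normal(rng=random):
    """Draw from N(0, 1) and fold it into (-1, 1).

    ASSUMPTION: math.fmod keeps the sign and drops the integer part.
    """
    return math.fmod(rng.gauss(0.0, 1.0), 1.0)


def noisy_action(action, action_ranges, noise_level, rng=random):
    """actionValue = rangeLength * noiseLevel * randomNumber + realActionValue.

    A new random number is drawn for every dimension of the action.
    `action_ranges` holds one (low, high) pair per dimension.
    """
    return [
        (high - low) * noise_level * bounded_standard_normal(rng) + value
        for value, (low, high) in zip(action, action_ranges)
    ]
```

For the SinusEnvironment example, `noisy_action([1.0], [(0.0, 3.0)], 0.05)` always stays within [0.85; 1.15].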

Noise levels: 5%, 10% and 25%

Sinus 5%:

Name reward splits steps samples
greedyTuctF 0.6035023062423907 18.802 2.0 1000.0
greedyFuctF 0.6040585838460535 18.787 2.0 1000.0
greedyFuctT 0.5979691038489539 15.911 2.0 1000.0
greedyTuctT 0.59825262100435 15.88 2.0 1000.0

Sinus 10%:

Name reward splits steps samples
greedyTuctF 0.6066269988505658 8.818 2.0 1000.0
greedyFuctF 0.6067923388366762 8.822 2.0 1000.0
greedyFuctT 0.6062219667448856 8.838 2.0 1000.0
greedyTuctT 0.6062244257106849 8.836 2.0 1000.0

Sinus 25%:

Name reward splits steps samples
greedyTuctF 0.5939506761968181 6.226 2.0 1000.0
greedyFuctF 0.5940263413928148 6.23 2.0 1000.0
greedyFuctT 0.5930231757837592 6.28 2.0 1000.0
greedyTuctT 0.5928948564852644 6.281 2.0 1000.0

Sixhumpcamelback 5%:

Name reward splits steps samples
greedyTuctF 0.7091581425244804 27.924 2.0 1000.0
greedyFuctF 0.7124759285110971 27.943 2.0 1000.0
greedyFuctT 0.7095909070580521 27.728 2.0 1000.0
greedyTuctT 0.7138356501297819 27.598 2.0 1000.0

Sixhumpcamelback 10%:

Name reward splits steps samples
greedyTuctF 0.006789875018259652 24.004 2.0 1000.0
greedyFuctF 0.038062705281156656 24.023 2.0 1000.0
greedyFuctT 0.04887399195052821 23.991 2.0 1000.0
greedyTuctT 0.05299734373558295 23.98 2.0 1000.0

Sixhumpcamelback 25%:

Name reward splits steps samples
greedyTuctF -2.6071878023028106 23.307 2.0 1000.0
greedyFuctF -2.567087832987661 23.266 2.0 1000.0
greedyFuctT -2.464286323408397 23.357 2.0 1000.0
greedyTuctT -2.5385730999717024 23.335 2.0 1000.0

DonutWorld 5%:

Name reward splits steps samples
greedyTuctF 2.581416127418022 127.288 3.0 1000.0
greedyFuctF 2.5779135800014794 127.256 3.0 1000.0
greedyFuctT 2.601451485622715 121.695 3.0 1000.0
greedyTuctT 2.6013879848545107 121.661 3.0 1000.0

DonutWorld 10%:

Name reward splits steps samples
greedyTuctF 2.2842186987644393 98.103 3.0 1000.0
greedyFuctF 2.2804123932855713 98.188 3.0 1000.0
greedyFuctT 2.284743097292101 92.289 3.0 1000.0
greedyTuctT 2.289177410042218 92.123 3.0 1000.0

DonutWorld 25%:

Name reward splits steps samples
greedyTuctF 1.4885869776709448 65.289 3.0 1000.0
greedyFuctF 1.4879622238233174 65.183 3.0 1000.0
greedyFuctT 1.5125176928198325 57.452 3.0 1000.0
greedyTuctT 1.511118881469104 57.406 3.0 1000.0

Posted in Thesis Progress

Results from selection experiments

After fixing my bug, I re-ran the experiments for the selection strategies.
It is obvious that I would need more runs per experiment to reduce the variance, but I think the results are still reasonable.
The pictures in my “last” posts are updated (corrected), and here are the averages of the final picks.


Name reward splits samples
greedyTuctF 0.6200948656572903 64.12 1000.0
greedyFuctF 0.6023553929980933 64.079 1000.0
greedyFuctT 0.6036651458279564 70.646 1000.0
greedyTuctT 0.6267016743750767 70.642 1000.0


Name reward splits samples
greedyTuctF 1.0295803509510142 121.688 1000.0
greedyFuctF 1.029588522858646 121.819 1000.0
greedyFuctT 1.028956652335863 121.647 1000.0
greedyTuctT 1.0289708657884484 121.736 1000.0


Name reward splits samples
greedyTuctF 0.9976101741153903 216.749 1000.0
greedyFuctF 0.9975777801603039 216.864 1000.0
greedyFuctT 0.9981728203167863 212.326 1000.0
greedyTuctT 0.9981697167821809 212.416 1000.0
Posted in Uncategorized

Minutes 21.5

(Individual meeting with Kurt)
Last posts
First we looked at my last posts and my results.
It seems that my results for the “selection strategies” are wrong, because it does not make sense that UCT-pairwise and UCT-global differ after 25 samples.
-> (I looked it up in the data: the runs first differ at the 28th sample, and because it is “TLS_RECALL”, sometimes 2 splits have already been done by then. But I also found a bug in the UCT-pairwise selection ;-) … I will correct the pictures as soon as I have the new results.)

The significance results seem to be good.

There are two ways to add noise to an environment:
1. Add noise to the action (e.g. a noisy engine)
2. Add noise to the reward (e.g. noisy sensors)
Both can be combined, or just one of them can be applied.
Kurt: “For the beginning, just add (Gaussian-distributed) noise to the X and Y values in your environment (DonutWorld). Take noise rates of e.g. 5% and 10%.”

Reusing knowledge
I told Kurt about my problems and ideas regarding “reusing old knowledge” in TLS, and he agreed that my approaches will not help. He gave me the hint that I could somehow weight the “old knowledge” against the “new knowledge”. I guess this is the only reasonable idea, and I have to figure out how (and where) to apply it in TLS.
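One HYPOTHETICAL way to read “weighting old and new knowledge” is a weighted mean over reward statistics, where each old sample counts with a weight between 0 and 1. This is only an illustration of the weighting idea, not how TLS stores or reuses its statistics; the function and parameter names are made up.

```python
def weighted_mean(old_sum, old_count, new_sum, new_count, old_weight):
    """Combine old and new reward statistics, discounting the old samples.

    HYPOTHETICAL: old_weight in [0, 1] scales how much each old sample counts.
    old_weight = 0 ignores the old samples entirely,
    old_weight = 1 treats old and new samples equally.
    """
    if not 0.0 <= old_weight <= 1.0:
        raise ValueError("old_weight must be in [0, 1]")
    total_weight = old_weight * old_count + new_count
    if total_weight == 0:
        raise ValueError("no samples to average")
    return (old_weight * old_sum + new_sum) / total_weight
```

For example, with 10 old samples of mean 1.0 and 5 new samples of mean 0.0, a weight of 0.5 gives a combined estimate of 0.5, halfway between ignoring and fully trusting the old samples.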

– fix the axis labels and make clear what “regret” means
– investigate “reuse”
– also compute the splitting probabilities for the “multi-dimensional sinus function”
– add noise to the environments and run the selection-strategy experiments again

Posted in Minutes