Minutes 21.5

(Individual meeting with Kurt)
First we looked at my last posts and my results.
It seems that my results for the “selection strategies” are wrong, because it doesn’t make sense that UCT-pairwise and UCT-global differ after 25 samples.
-> (I looked it up in the data: it is the 28th sample that differs, and because it is “TLS_RECALL”, sometimes 2 splits have already been done. But I also found a bug in the UCT-pairwise selection ;-) I will correct the pictures as soon as I have the results again.)

The significance results seem to be good.

There are two ways to add noise to your environment.
1. Add noise to your action (i.e. a noisy engine)
2. Add noise to your reward (i.e. noisy sensors)
It is possible to combine the two as well as to use just one of them.
Kurt: “For the beginning, just add (Gaussian-distributed) noise to your X and Y values in your environment (DonutWorld). Use noise rates of e.g. 5% and 10%.”
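To make the two noise types concrete, here is a minimal sketch of how they could look. It is not the actual DonutWorld code: the function names, the `extent` and `reward_range` parameters, and the reading of “noise rate” as the standard deviation relative to the value range are all my assumptions.

```python
import random


def noisy_position(x, y, noise_rate, extent=1.0, rng=random):
    """Perturb an (x, y) position with zero-mean Gaussian noise.

    Assumption: noise_rate (e.g. 0.05 for 5%) scales the standard
    deviation relative to `extent`, a hypothetical width of the world
    along each axis.
    """
    sigma = noise_rate * extent
    return x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)


def noisy_reward(reward, noise_rate, reward_range=1.0, rng=random):
    """Perturb a reward with zero-mean Gaussian noise ("noisy sensors").

    Assumption: the standard deviation is noise_rate times a
    hypothetical `reward_range`.
    """
    sigma = noise_rate * reward_range
    return reward + rng.gauss(0.0, sigma)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so that runs are repeatable
    for rate in (0.05, 0.10):
        print(rate, noisy_position(0.5, 0.5, rate, rng=rng))
```

Passing a seeded `random.Random` instance keeps the experiments repeatable, which should help when comparing the selection strategies with and without noise.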

Reusing knowledge
I told Kurt my problems and ideas about “reusing old knowledge” in TLS and he agreed that my approaches will not help. He gave me the hint that I could somehow weight the “old knowledge” and the “new knowledge”. I guess this is the only reasonable idea and I have to figure out how (and where) to apply this to TLS.

– fix axis labels + make clear what “regret” means
– investigate “reuse”
– also compute the splitting probabilities for the “multi-dimensional sine function”
– add noise to your environments and run the selection strategies experiments again

