
Posted: Tue Apr 27, 2004 12:08 am
by Baak
Was neat to see that the code uses a "tick count" for one aspect of the randomness. :)

I guess theoretically you could get the same seed for a game every 4,294,967,296 games or so (assuming it's a 32-bit unsigned int), but even if it were the same seed, all the players' actions are going to be different in two different games, so they will never play *exactly* the same way. (Every attack/bottle/arrow would have to be shot at precisely the same time as in the previous game, every unit would have to have the same health when killed, etc., etc.)

Let's see, I've played about 3,500 games in the last 3.5 years - so 1,000 games/year - that'll take me 4.3 million years...
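Just for fun, here's the back-of-envelope math in a few lines of Python. Total guesswork on my part about how the game actually seeds things - the function name and the millisecond tick are made up:

import time

def tick_seed():
    # hypothetical: derive a 32-bit seed from a millisecond tick count
    return int(time.time() * 1000) & 0xFFFFFFFF

seeds = 2 ** 32                # 4,294,967,296 possible 32-bit seeds
games_per_year = 1000
print(seeds / games_per_year)  # ~4.3 million years to cycle through them all
# (a repeat seed is actually likely much sooner via the birthday paradox,
# somewhere around sqrt(2^32) = 65,536 games - but as noted above, a
# repeated seed alone still wouldn't replay a game identically)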

I think we're safe mb - whew! :D


Yeah, it probably needs a sample more like 1,000 - but after tracking 312 tosses (156 x 2) I can safely say I doubt I could do 2,000 - my arm was getting tired just watching that poor Mini Rocket Dorf lobbing shots! :;):

I think it's random enough mb... :D

Posted: Tue Apr 27, 2004 3:40 am
by qwerty2
Woden wrote: Running ANOVA tests gives the following p-values:
uphill vs. version p-value .153
level vs. version p-value .1
downhill vs. version p-value .159

For real evidence of difference we want p-values of less than .1 (usually)
For those of you not sure what ANOVA is: a p-value of .153 means there is a 15.3% chance of seeing differences this large even if there is no real difference between the data sets; .1 means a 10% chance, etc.

In ecology anything below .05 is considered statistically significant. Psychology uses .1.
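If anyone wants to try this at home, a one-way ANOVA is only a few lines in Python with scipy. The numbers below are invented just to show the mechanics - they are not the real range data:

from scipy.stats import f_oneway

# made-up range measurements for three versions (NOT the real data)
range_13 = [15.00, 15.25, 14.75, 15.25, 15.00]
range_14 = [15.25, 15.50, 15.00, 15.25, 15.25]
range_15 = [15.00, 15.25, 15.25, 14.75, 15.00]

stat, p = f_oneway(range_13, range_14, range_15)
print(f"F = {stat:.3f}, p-value = {p:.3f}")
# a small p (below .05 or .1, depending on your field) would be
# evidence that the version means really differ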

Posted: Tue Apr 27, 2004 4:42 am
by Woden
qwerty2 wrote: For those of you not sure what ANOVA is: a p-value of .153 means there is a 15.3% chance of seeing differences this large even if there is no real difference between the data sets; .1 means a 10% chance, etc.

In ecology anything below .05 is considered statistically significant. Psychology uses .1.

I don't think I mentioned it the first time, but when comparing range variances for archers from version to version I got p-values of .78, .93, and 1.0.

I was thinking ANOVA p-values only test the means?

Either way, the heavy-handed rounding of the data points (the values vary from about 14 to 16 but are rounded to the nearest .25) is likely blowing any reliability out of the water with the number of samples we are working with (5).

Larger sample sizes are needed imo.
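For what it's worth, ANOVA compares group means; to compare spread directly you'd want something like Levene's test. A rough sketch in Python with fake numbers (just the mechanics, not our data):

from scipy.stats import levene

# fake numbers - five points per version, rounded to the nearest .25
v13 = [15.00, 15.25, 14.75, 15.25, 15.00]
v15 = [15.00, 15.50, 14.50, 15.25, 15.25]

stat, p = levene(v13, v15)
print(f"Levene W = {stat:.3f}, p-value = {p:.3f}")
# a high p-value means no evidence the variances differ - and with 5
# coarsely rounded points per group, don't expect it to detect much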

Posted: Tue Apr 27, 2004 9:11 am
by igmo
ok, so i figured i could start making data points for 1.3 and 1.4.3, since those won't change (and to see how long it would take... it's actually more than 1 min per point)

i also took 10 points for 1.5b1

with more data, the ranges are extremely similar in all versions.

uphill datapoints here: http://home.austin.rr.com/aoffice/rangeTest-042704.htm

if someone wants the excel file, lemmie know.

in that discussion on random number seeds, i have a question: since i am running this test in saved solo games, does the seed get reset each time the game is loaded, or is it part of the save? since i am performing a very similar set of actions (select all units and click target), it seems possible i could repeat the same sequence of random numbers if the seed is saved with my game. how fine are the ticks?

Posted: Tue Apr 27, 2004 9:56 am
by ChrisP
The seed from the game is always saved as part of the film, otherwise every film would go OOS (out of sync).

Posted: Tue Apr 27, 2004 11:03 am
by ducky
erm, in each test, the seed starts out at a different value, right?

SO, if you started 2 games with the exact same seed, you would generate the exact same dud/bounce/archer results, given you performed the actions in the same sequence (say, for instance, you had a dorf scripted to throw bottles, an archer scripted to shoot arrows, or whatever)... right?

If the random number formula + the variance settings in the (monster?) tag haven't been changed, how could ANYTHING discussed here (range, dudding, bouncing) be "different" in 1.5?!


Can you set the seed? Would a 1.5 game with a scripted performance (%$&$&%@%&@^& now i'm thinking about what to do here :( ) with the SAME SEED as a 1.4/1.3 game yield the SAME RESULTS?!

If the answer to this question is yes, I can't possibly consider any hypothesis that the range (among other things) has 'increased' or is even different between versions.
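For what it's worth, that is exactly how seeded generators behave: same seed plus the same sequence of draws gives the same results. A toy demo in Python, with its built-in generator standing in for whatever Myth uses and everything else invented:

import random

def scripted_game(seed, shots=5):
    # pretend each shot's range is one random draw from the generator
    rng = random.Random(seed)
    return [round(rng.uniform(14.0, 16.0), 2) for _ in range(shots)]

game_a = scripted_game(seed=12345)
game_b = scripted_game(seed=12345)
print(game_a == game_b)                      # True: same seed, same "film"
print(scripted_game(seed=54321) == game_a)   # False: different seed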

Posted: Tue Apr 27, 2004 11:45 am
by Woden
igmo wrote: uphill datapoints here: http://home.austin.rr.com/aoffice/rangeTest-042704.htm

if someone wants the excel file, lemmie know.

Woohoo! More data!

I have a test this afternoon, but when I get home tonight I will run some analysis on what is there.

If the web page was generated directly from Excel, it can be saved and then reopened in Excel. :)

One thought: It is probably best to save the films that you use to generate the test numbers - secondary verification of numbers is always a good idea.

Posted: Tue Apr 27, 2004 12:28 pm
by igmo
ChrisP wrote: The seed from the game is always saved as part of the film, otherwise every film would go OOS (out of sync).
so that means that if i saved the game while my units were in the process of walking up for an attack, the result would always be identical? (it seemed to be the case both times i let that happen)

perhaps this test should not be performed with the solo/revert to last save setup.

Posted: Tue Apr 27, 2004 12:32 pm
by igmo
Woden wrote: One thought: It is probably best to save the films that you use to generate the test numbers - secondary verification of numbers is always a good idea.
i could save the films, and that's good practice and all - but it would increase the time required for each data point. you will all just have to trust me or perform your own tests :)

besides, who would really open them all to confirm?

Posted: Tue Apr 27, 2004 12:37 pm
by igmo
ducky wrote: If the random number formula + the variance settings in the (monster?) tag haven't been changed, how could ANYTHING discussed here (range, dudding, bouncing) be "different" in 1.5?!
that seems to be the point at which we are arriving. however, the 1.4 range always felt different - and that was supposedly due to intentional tinkering by mythdev. perhaps that tinkering was present only in 1.4.1 and 1.4.2, and went away in 1.4.3.

anyway, the range felt to me (and to others, based on in-game commentary) to have changed again from 1.4.3 to 1.5b1. the whole point of checking now is to verify that 1.5 is indeed back to 1.3 behavior. my own (limited) testing doesn't show a dramatic difference between the 1.4.3 and 1.3 uphill ranges - so i'm not sure what i was "feeling." mb the level and downhill ranges are more different... we shall see.

Posted: Tue Apr 27, 2004 3:22 pm
by Graydon
Aye... what's the point in comparing to 1.4.x if in fact you are testing the theory that 1.5 is reverted to 1.3 gameplay? Shouldn't you be making your tests against gameplay in 1.3? 1.4 will soon be obsolete... The logic of testing against 1.4 just doesn't make sense to me, is all :)

Posted: Tue Apr 27, 2004 3:57 pm
by igmo
my intent IS to test 1.5 vs 1.3... i'm waiting for 1.5b2 to do so.

just checked 1.4.3 out of interest and to (i thought) confirm it was different from 1.3 in an objective comparison.

Posted: Tue Apr 27, 2004 4:46 pm
by Woden
qwerty2 wrote: For those of you not sure what ANOVA is: a p-value of .153 means there is a 15.3% chance of seeing differences this large even if there is no real difference between the data sets; .1 means a 10% chance, etc.

In ecology anything below .05 is considered statistically significant. Psychology uses .1.

I got out of the test early and have Minitab access at school, so I ran ANOVA checks against the data that igmo linked earlier.

Results:

Archer range up, 1.3 v 1.5: p-value of .388
Archer range up, 1.4 v 1.5: p-value of .955
Archer range up, 1.3 v 1.4: p-value of .201
Archer range up, 1.3 v 1.4 v 1.5: p-value of .374

Still doesn't say much, as we are still fairly low on data points. The only thing I would say with any confidence is that there is no sign the 1.4 and 1.5 archer ranges uphill differ (a high p-value can't prove they are identical, but .955 is about as close to "no difference detected" as it gets).

More data is always better.

Posted: Wed Apr 28, 2004 8:00 am
by igmo
i'll point out b4 others do that there are only 10 points in the 1.5 data woden used... so more points may reveal a different p-value.

Posted: Wed Apr 28, 2004 12:47 pm
by Woden
igmo wrote: i'll point out b4 others do that there are only 10 points in the 1.5 data woden used... so more points may reveal a different p-value.

With enough data I expect the answer would become clear either way: if there's a real difference, the p-values should drop rapidly toward 0, and if there isn't, they'll just keep bouncing around without ever settling low.
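A quick simulation shows what I mean - Python with scipy, purely illustrative numbers:

import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
for n in (10, 100, 1000):
    base = rng.normal(15.0, 0.5, n)      # "version A"
    same = rng.normal(15.0, 0.5, n)      # a truly identical version
    shifted = rng.normal(15.2, 0.5, n)   # a version with a small real shift
    print(n,
          round(f_oneway(base, same).pvalue, 3),
          round(f_oneway(base, shifted).pvalue, 6))
# with a real difference the p-value collapses toward 0 as n grows;
# with no difference it just bounces around rather than climbing to 1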