Whoa....time out folks - My proposal

Post by **Doobie** » Sat Apr 24, 2004 4:38 pm

Hey Baak,

Yes, I really did go to the galapagos and peru for just over 2 weeks. I just got back on Thursday evening (the 22nd of april).

It was awesome, galapagos islands were great, lots of unique animals; and everything is so incredibly tame, you can get within inches of many species.

Good snorkelling too. Apparently 5-7 ocean currents converge there depending on the time of year including both warm and cold water making for an incredibly nutrient rich environment, which means lots of sea life.

We swam with sea lions, sharks, rays, penguins, HUGE schools of fish, sea turtles, and even saw marine iguanas feeding in some shallower areas.

Then we went to peru

Peru was awesome too. We stayed in the capital, Cusco, and saw lots of sweet Inca ruins around there. Then we went to Machu Picchu, which was incredible. So huge and intricate, and such an incredible setting. We also did a fair bit of shopping at some great markets, and bought a bunch of alpaca stuff for really cheap: jackets, blankets, socks, toques, bags, etc. The nice thing about alpaca is it's 8x warmer than wool. I spent about $100cnd for what would likely cost over $1000 here.

Our flights were absolutely atrocious though. In total, I saw 10 airports during this trip, and my way home took 26 hours including layovers. To make matters worse, I got a sinus infection shortly before leaving for home, which made all those hours extra miserable.

But it was all definitely worth it

Once we get our pictures sorted and edited, I'll put together a simple website for people to take a look if they're interested.

igmo · Post by **igmo** » Sat Apr 24, 2004 5:22 pm

Woden wrote:Ideally I am going to need about 30 data points per unit per version per test type (i.e. distances of 30 archer shots in 1.3 and 30 shots in 1.5b1 for uphill, repeat for downhill).

thks

when 1.5b2 comes out ill do one with more data points. 100 is really out of the question, because each will take at least a min (if i get more efficient), and whatever total # you want needs to be multiplied by 6 (3 conditions, 2 game versions)

5 is a very small number of tests, true - but in those cases where there was a difference between the versions (archer say) there was a trend - the lowest values in one near the highest in others.

igmo · Post by **igmo** » Sat Apr 24, 2004 5:38 pm

ChrisP wrote:I looked at your results, but I don't think 5 tests per unit can prove anything.
...
Again, I recommend either using tags without random values or testing with at least 100 shots instead of just 5
....
I tried to use the PoOp forum a couple months ago, but you have to register for the damn thing, and I'm just too lazy.

heh, 100 eh? at a min each and times 6, thats 10 hours....

its pretty funny to say i need to put in 10hrs to make a valid analysis, when you wont take 30 secs to type in a username and fake email to post on the PoOp forum.

5 is small, but does begin to show a trend - and should not be ignored out of hand.

i'll do a larger pool for 1.5b2 (which myrd suggests may be a closer 1.3/1.5 match)

i'm not interested in vetting the arch (5 vet archers are rare in play) or in modifying the randoms. i think both would skew the test away from a "real-world" gameplay check.

Woden · Post by **Woden** » Sat Apr 24, 2004 6:25 pm

igmo wrote:when 1.5b2 comes out ill do one with more data points. 100 is really out of the question, because each will take at least a min (if i get more efficient), and whatever total # you want needs to be multiplied by 6 (3 conditions, 2 game versions)

Unfortunately I am going to need at least 30 data points per setup to be able to give any real result either. Stick with what you think is going to be most likely to show a problem to begin with, say archers shooting uphill and then create 30 data points for 1.5b2 and 30 for 1.3. I would be willing to help out in creating data points, and if a couple of other folks pitch in the majority of the work (getting the data) will hopefully not consume too much time.

If there was some way to know the true standard deviation for all possible shots in a certain setup then we could work with much smaller data sets....

If someone was willing to work with me on trying to build a mathematical model of how all the tag variables for archer shots work we could attack the problem from that end: attempt to show that the tag variables are producing the expected distribution of the range of shots.

Baak · Post by **Baak** » Sat Apr 24, 2004 6:43 pm

I went ahead and watched the "Double Dud Test" for M2 1.5 Beta #1 and have posted the results here for anyone who is interested:

Double Dud Test Results - M2 1.5 Beta 1

Funny that there were no back-to-back duds in 156 throws, but you can see that rebounds/duds tend to "cluster".

I'm perfectly happy with the way this plays, but just thought it was interesting.

I will post the 1.4.3 results tomorrow.

Doob: *too cool!*

Yeah, I really wish I lived in the Star Trek Universe with the ability to just beam places - not to mention the Holodecks!

My wife and I did a trip to Egypt a few years ago and the entire return trip from Mt. Sinai to home took us 48 hours with no actual hotel stays, just sleeping on planes. Needless to say we were thrashed. We both got the sinus infection thing as well. But it was well worth it! Definitely post those photos, etc. - I'd love to see them!

Seeya!

Woden · Post by **Woden** » Sat Apr 24, 2004 7:02 pm

Baak wrote:I went ahead and watched the "Double Dud Test" for M2 1.5 Beta #1 and have posted the results here for anyone who is interested:

Double Dud Test Results - M2 1.5 Beta 1

Funny that there were no back-to-back duds in 156 throws, but you can see that rebounds/duds tend to "cluster".

Woohoo! Numbers!

Question: Exactly how does the rebound setting work? Is there a 16.7% chance of a rebound occurring every time the ball hits something? i.e. If it bounces when it first hits the ground, is there a 16.7% chance of it bouncing again?

And same with the dud setting: Is there a 6.5% chance of bomb turning into a dud every time it hits or bounces?
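
One way to sanity-check the "no back-to-back duds in 156 throws" observation is to ask how likely that would be if every throw had the same independent dud chance. The sketch below assumes a flat 6.5% per throw, which is exactly the part in question here (if the roll happens on every bounce, the real per-throw figure would be higher). The function name is made up for illustration.

```python
def prob_no_back_to_back(n_throws: int, p_dud: float) -> float:
    """Chance that n independent throws contain no two duds in a row.

    Assumption: each throw is a dud with the same probability p_dud,
    independently of every other throw.
    """
    if n_throws <= 0:
        return 1.0
    ok_last_live = 1.0 - p_dud  # no dud pair so far, last throw was live
    ok_last_dud = p_dud         # no dud pair so far, last throw was a dud
    for _ in range(n_throws - 1):
        ok_last_live, ok_last_dud = (
            (ok_last_live + ok_last_dud) * (1.0 - p_dud),  # next throw is live
            ok_last_live * p_dud,                          # next throw is a dud
        )
    return ok_last_live + ok_last_dud


# 156 throws at an assumed 6.5% dud chance per throw
print(f"{prob_no_back_to_back(156, 0.065):.3f}")
```

Under those assumptions the answer lands a little above one half, so seeing zero double duds in 156 throws would not be surprising by itself. It says nothing about the clustering, which would need its own check.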

qwerty2 · Post by **qwerty2** » Sat Apr 24, 2004 9:58 pm

Baak wrote:Now that I've made a 1.5-compatible version of RDF 4, I could also have it with films if needed. I'll see if I can whip out a test this weekend if at all possible. This "double-dud" effect just seems more noticeable than you would expect.

I've been complaining and complaining about duff dud/bounce being different in 1.4.x and 1.5. This is another thing that should really be looked into.

qwerty2 · Post by **qwerty2** » Sat Apr 24, 2004 10:01 pm

Woden wrote:If someone was willing to work with me on trying to build a mathematical model of how all the tag variables for archer shots work we could attack the problem from that end: attempt to show that the tag variables are producing the expected distribution of the range of shots.

I can probably help. Last time I did maths was only last year, and I use a lot of stats for my ecology subjects.

Orlando the Axe · Post by **Orlando the Axe** » Sat Apr 24, 2004 10:04 pm

I thought Lima was the capital of Peru

Post by **Doobie** » Sat Apr 24, 2004 11:24 pm

Lima is the biggest and craziest city in peru, lots of casinos, clubs, big hotels and such.

Cusco is the capital, and the cultural center. It's an ancient Inca city originally, with tons of great ruins surrounding the city, and a lot of old Inca walls right in the city which have been incorporated into cathedrals and museums and such.

ChrisP · Post by **ChrisP** » Sat Apr 24, 2004 11:29 pm

igmo wrote:its pretty funny to say i need to put in 10hrs to make a valid analysis, when you wont take 30 secs to type in a username and fake email to post on the PoOp forum.

5 is small, but does begin to show a trend - and should not be ignored out of hand.

Well, registering for forums happens to be a pet peeve of mine. I even considered holding out on this (Magma) one, but no one seemed to care, even when I tried pouting and holding my breath.

And yes, just like if I _really_ wanted to post on the PoOp forum, I would need to do whatever it takes, I say you need to do whatever it takes if you _really_ want a valid analysis.

Mind you, this experiment isn't at all important to me except in that it seems to be important to you, and doubtless several other people as well. Trying to get accurate measurements of attack speeds for 1.4's vTFL was important to me though. TFL units have a random variable in their attack speed, so averages were needed, and to get an accurate time, the films had to be watched at 1/16th speed. This had to be done for each type of unit, and it probably did take me somewhat close to 10 hours. In the end, my results were judged not accurate enough due to errors I must have made, and weren't used.

However, I don't think your tests need to take a full minute per attack. If you get set up well, play the game, then watch the film at x2 speed while ticking off results on a chart, you could probably do it in less than 5 seconds to an attack + the initial set up time.

Finally, sorry, but I really do think a test with only 5 attacks should be ignored because any trend demonstrated by it could be very misleading. Looking at the minimum and maximum ranges in your archer tests, the results were either identical or very close to it. Were these way off, I might concede there was a trend to be worried about.

For the heck of it, I tossed both a penny and a dime 10 times each. As it turned out, both had identical results: 4 heads and 6 tails. But I'm sure you understand that from this test (which used twice as much data and fewer random variables than your tests) we cannot infer there is a 60% chance of getting tails when flipping a coin. And honestly, if I tossed them 100 times and got only 40 tails, would that prove anything either? This is why I say any test either needs to remove the random variables or use a whole lot of data.
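
To put rough numbers on the coin example, here is a small sketch using the binomial distribution. It assumes a fair coin and independent tosses; the helper names are made up for illustration.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of k or fewer successes in n independent trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


# Exactly 4 heads in 10 tosses of a fair coin
print(f"4 heads in 10:       {binom_pmf(4, 10):.3f}")
# 40 or fewer tails in 100 tosses of a fair coin
print(f"<= 40 tails in 100:  {binom_cdf(40, 100):.3f}")
```

A 4/6 split in 10 tosses happens about one time in five with a fair coin, so it shows nothing. Getting 40 or fewer tails in 100 is much rarer, roughly 3%, which is suggestive but still short of proof.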

I hope I don't seem too argumentative; I'm really only trying to help.

ChrisP · Post by **ChrisP** » Sun Apr 25, 2004 12:00 am

Woden wrote:If there was some way to know the true standard deviation for all possible shots in a certain setup then we could work with much smaller data sets....

If someone was willing to work with me on trying to build a mathematical model of how all the tag variables for archer shots work we could attack the problem from that end: attempt to show that the tag variables are producing the expected distribution of the range of shots.

Hey, I took Algebra I three times in High School. I should be perfect for any mathematical help you guys need!

As it turns out, bowman arrow projectile tags don't have a random initial velocity, so there are only two random variables that I can think of that need to be considered. Both are in the monster tag.

The initial velocity of the bowman arrow attack is 0.240 to 0.260. The increments are in steps of .002, i.e. 0.242, 0.244, etc... How this velocity datum relates to Myth (.240 world units per second?) I have no idea, nor can I guess how it affects range other than by making another test using 1.3 as the "pure" basis. At the VERY least, two special archer tags would have to be made - one with a straight .240 velocity, and the other with .260 - and the impact of these differences would have to be measured.

The other variable is initial velocity error, and the bowman arrow attack is set at 0.008. Again, the increments are in steps of 0.002, and this variable goes down by 0.002 for each vet kill, till it gets to 0.000 or the unit reaches 5 kills, whichever comes first. Now, I'm not sure this setting even has an impact on what igmo is testing, but my _guess_ is that it works like this:

To calculate misses for each attack, a plus or minus range of the initial velocity error (-0.008 to 0.008) is added to the initial velocity in random increments of 0.002. So, assuming distribution is linear (and it probably is) we have the following random velocity variables in the bowman arrow attack:

(0.240 to 0.260) + (-0.008 to 0.008)

Now once we know exactly how much deviation a random setting of 0.002 can cause in terms of igmo's distance measurements, we can get on with a valid analysis.
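
If that guess is right, the combined velocity can be enumerated directly. The sketch below assumes both values are drawn uniformly and independently in 0.002 steps and simply added, which is a guess about the tag behaviour and not anything confirmed about the engine. It works in thousandths to avoid floating-point drift, and the function name is made up for illustration.

```python
from collections import Counter

STEP = 2  # thousandths of a velocity unit, i.e. 0.002


def velocity_distribution(lo=240, hi=260, err=8):
    """Map each possible launch velocity to its probability.

    lo/hi: initial velocity range in thousandths (0.240 to 0.260).
    err:   initial velocity error in thousandths (0.008 for an unvetted bowman).
    Assumes uniform, independent draws that are added together.
    """
    counts = Counter()
    for base in range(lo, hi + 1, STEP):
        for error in range(-err, err + 1, STEP):
            counts[base + error] += 1
    total = sum(counts.values())
    return {v / 1000: c / total for v, c in sorted(counts.items())}


for velocity, prob in velocity_distribution().items():
    print(f"{velocity:.3f}  {prob:.3f}")
```

Under these assumptions the result runs from 0.232 to 0.268 and is not flat: the middle values (0.248 to 0.252) each come up 9 times out of 99 combinations, while the extremes come up once. Passing `err=0` models the fully vetted case, where only the 0.240 to 0.260 spread remains.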

Hope that helps :\

Woden · Post by **Woden** » Sun Apr 25, 2004 1:14 am

igmo wrote:so, big whoop you say? those differences seem pretty damn small you say? ok, mb, but an assertion from projectMagma is the gameplay is the same. a frequent assertion from the players is that range seems different. well, the players are right - the ranges are different.

if the unit tags are unchanged, the physics or some other aspect is altering the end result.

I forgot I had minitab, so I went ahead and used the data you provided in the rangeTest.htm document and ran some tests for archer ranges with the following results:

(much time passes while I monkey around with various things within minitab, start to type up a response, trash it, and type up another one.)

95% Confidence intervals:
v1.3 downhill mean: 18.895-20.005 StDev: 0.268-1.285
v1.3 flat mean: 16.811-17.689 StDev: .212-1.016
v1.3 uphill mean: 14.129-15.371 StDev: .3-1.437

v1.5 downhill mean: 18.419-19.581 StDev: .28-1.344
v1.5 flat mean: 16.29-17.31 StDev: .246-1.18
v1.5 uphill mean: 14.629-15.871 StDev: .3-1.437

What's that mean? I interpret it as meaning that our values are pretty flaky and any results we get can hardly be trusted. By comparison, if I assume that I am using 30 data points per setup instead of 5, and I get the extra data just by repeating the 5 points over and over, my mean CIs drop down to a range of about .4 (from around 1.0) and StDevs drop to .3 (from around 1.0).
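
For anyone wondering where intervals like these come from, here is a minimal sketch of the textbook 95% intervals for a sample of 5: a t interval for the mean and a chi-square interval for the standard deviation. The numbers in the table above look consistent with these formulas, but that is an inference and not a statement about what Minitab does internally. The critical values are hardcoded from standard tables, and the sample at the bottom is made up for illustration (it is not the rangeTest.htm data).

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical values for 4 degrees of freedom (n = 5),
# taken from standard t and chi-square tables.
T_975_DF4 = 2.776
CHI2_975_DF4 = 11.143
CHI2_025_DF4 = 0.4844


def ci_for_five(sample):
    """Return ((mean_lo, mean_hi), (sd_lo, sd_hi)) for a 5-point sample."""
    if len(sample) != 5:
        raise ValueError("critical values above are only valid for n = 5")
    m, s = mean(sample), stdev(sample)
    half_width = T_975_DF4 * s / sqrt(5)
    sd_lo = s * sqrt(4 / CHI2_975_DF4)
    sd_hi = s * sqrt(4 / CHI2_025_DF4)
    return (m - half_width, m + half_width), (sd_lo, sd_hi)


# Hypothetical distances, NOT the real test data
mean_ci, sd_ci = ci_for_five([19.0, 19.5, 19.5, 19.5, 20.0])
print(f"mean:  {mean_ci[0]:.3f}-{mean_ci[1]:.3f}")
print(f"StDev: {sd_ci[0]:.3f}-{sd_ci[1]:.3f}")
```

The upper bound on the standard deviation is nearly three times the sample value when n is 5, which is why the StDev ranges are so wide.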

Null Hyp: equal
Alternate Hyp: not equal

Running ANOVA tests gives the following p-values:
uphill vs. version p-value .153
level vs. version p-value .1
downhill vs. version p-value .159

For real evidence of difference we want p-values of less than .1 (usually)

Problems I see:
Small data sets
Distributions are NOT CONTINUOUS (all these tests assume population distributions are continuous, and I don't really know how the chunkiness of the data affects things, but I will try to find out on Monday). Is there any way to gather a truly continuous set of range data? I don't know anything about how Myth works internally, but I guess that it has to cut up the map into sections and make arrows land in specific sections. Is there some way to capture a true range (via script etc.) in a manner that can be recorded before Myth truncates the data?
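
One option that sidesteps the continuity assumption is an exact permutation test. This is a different technique from the ANOVA runs above, swapped in because it makes no assumption about the shape of the distribution: it pools both versions' distances, tries every possible way of splitting them into two groups, and counts how often the split produces a gap in means at least as large as the observed one. The function name and the sample values are made up for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(a, b):
    """Two-sided exact permutation test on the difference of means."""
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    n, k = len(pooled), len(a)
    hits = total = 0
    for idx in combinations(range(n), k):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # small tolerance so float rounding does not drop exact ties
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical distances, NOT the real test data
v13 = [19.0, 19.5, 19.5, 19.5, 20.0]
v15 = [18.5, 19.0, 19.0, 19.0, 19.5]
print(f"p-value: {permutation_p_value(v13, v15):.3f}")
```

With 5 shots per version there are only 252 possible splits, so the smallest p-value this test can ever report is 2/252, or about .008. With 30 per version the full enumeration gets far too large, and random sampling of splits would be needed.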

Overall thoughts: I called myself an amateur statistician and I meant it - any feedback from others with experience in stats is welcome. Personally I don't really think we have data that means a damn thing either way right now. :)

igmo · Post by **igmo** » Sun Apr 25, 2004 9:33 am

thanks chris and woden for all the thought you've put into this.

to both of you, i repeat that i agree 5 is a small sample and that i plan on running a bigger one for 1.5b2.

to woden, um, since i'm not clear on what all those statistical terms mean, i'll just take your word that five is too small a sample (it also makes sense intuitively that 5 is too small.) it is intriguing that if the 5 were repeated to 30, the sample would get into a range that provides meaning.

to chris, the correlation between coin flipping (cf) and myth unit behavior (mub) is remote in the extreme. cf is a pure random event where you can expect that with a large sample the 2 outcomes will occur with perfect equality (excluding, as that one wonder woman episode showed, the 1 bazzilionth of a time that a coin lands on its edge, heh.) mub on the otherhand is a precisely designed event to which a small amount of randomness has been applied in order to keep things interesting. archer heroes are boring, archers are maddeningly intriguing. we are also looking at only one aspect of that mub - range of shot. accuracy is not considered here, though it is a primary goal of the randomizing.

because mub is a designed event, it will need less data to determine a trend than a pure random event like cf. there is a specific goal in mub - that, say, an archer shoots when 18 units away. that goal is filtered through randoms and shows up as the archer shooting from 17.5 to 18.5 units away... hopefully this goal and variance will be clear to see in 30 data points. cf has no specific goal. although anyone can guess that half the time you will get one result, it is also likely that with even 100 data points you'd get a 49/51 breakdown and could make statements that its not really 50/50. in your 10 point analogy, the result really would be meaningless.

chris, i'd make unrandomed artil units for grins - only im looking at the tags now and don't see where those randoms are.

can you give a brother a clue?

i'd also automate my map with a script that had the units attack the dummy automatically, and only once - only i'd have to figure that script out and it would take me a long while - and thats not the part that takes a minute. its opening the game, (waiting for the units to walk up), (spacebarring them at one attack), closely checking where they stopped, recording the data. stuff in parentheses could be automated. other stuff has to take time - prolly more like 45 secs each test. if you can visualize that script easily and want to apply it to the plugin, i'd use it. the hour or two it would take me would be better spent running tests without automation.

igmo · Post by **igmo** » Sun Apr 25, 2004 9:38 am

aha! double click on the bowman arrow attack. of course! (there should be an edit button down there with the add, dup, delete btw.)
