
“If luck weren’t involved, I guess I’d win every one.” – Phil Hellmuth
I’d like to share my experiences at California States and Regionals, and how they relate to probability and statistics. Few people seem to really understand what statistics is, beyond some nebulous number crunching done by groups of nerds. In card games with a significant amount of luck involved, however, it is crucial to understand the laws of probability and their implications for the game.
The main theme of this article is using statistics to answer two questions: what do testing results mean, and what does it take to win a tournament? You’ll see how I applied this insight to real life tournaments as well!
I had done some extensive testing for States, all detailed in my last article! Since I had not participated in many tournaments this season, that testing proved invaluable for understanding how to play specific matchups and what to test.
From my results and what I heard about the Cali metagame, I decided to go with Sablelock. I really think Vilegar was the play, but I hadn’t tested with it much and didn’t have the cards.
Statistics of Testing
If you test a matchup with 2 games, and you win one and lose one, then you would think that matchup is 50-50 (and given your results, this is the most likely). However, it could almost as plausibly be 45-55, right? There’s a slightly smaller, but still significant, chance that it is 40-60, and even a real chance that it is as lopsided as 10-90!
But surely, you’d have more confidence in your matchup results if you played 10 games – the question is, how much more? This uncertainty in testing results comprises almost the entire study of statistics, which you will certainly use if you go into a science or engineering field.
Despite having studied engineering (or perhaps because of it), I hate unnecessary math. Fortunately, the importance of statistics in everyday life has given us plenty of tools to figure out these things without any calculations.
If we believe (and this is an important assumption) that a deck has a certain intrinsic probability to win against another deck, i.e. the matchup percentages, then we can consider the “game” to be played as a “binomial experiment.” There are two outcomes: you either win or lose. The chance of winning can be denoted by a probability p, which leaves 1 - p as the chance of losing.
Then, if you perform the binomial experiment (Pokemon testing) a certain number of times (which we will call “n”), people have figured out that a certain formula, the “binomial distribution,” will tell you how often you can expect to win a certain number of games.
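To make the binomial distribution a bit more concrete, here is a minimal Python sketch. The function name binomial_pmf and the example matchup values are my own choices for illustration, and it assumes every game is independent with the same win probability p.

```python
from math import comb

def binomial_pmf(wins: int, games: int, p: float) -> float:
    """Chance of exactly `wins` wins in `games` independent games,
    assuming each game is won with the same probability p."""
    return comb(games, wins) * p**wins * (1 - p) ** (games - wins)

# How likely is a 1-1 split over two games under different true matchups?
for p in (0.5, 0.45, 0.4, 0.1):
    print(f"true matchup {p:.0%}: chance of going 1-1 = {binomial_pmf(1, 2, p):.3f}")

# Full spread of results for 10 test games of a (hypothetical) 60-40 matchup.
dist = [binomial_pmf(k, 10, 0.6) for k in range(11)]
for k, prob in enumerate(dist):
    print(f"{k:2d} wins: {prob:.3f}")
```

A 1-1 split is nearly as likely for a 45-55 matchup as for a true 50-50 one, which is why two games tell you so little.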
Taken in the reverse direction, if we perform the binomial experiment a certain number of times (what statisticians call a “sample”), we can estimate p, the matchup percentage, as well as how sure we are about our answer!
In science, this uncertainty is typically expressed with a 95% confidence interval. While I won’t discuss the esoteric specifics, think of it as being 95% sure that the real matchup percentage lies within the defined interval. We don’t need to be as exacting as science, so I think it’ll be okay if we are only 80% sure.
By the law of statistics known as the Central Limit Theorem, the average of a large enough random sample taken from almost any distribution will approximately follow a normal distribution (aka Bell Curve) centered around the true “population” mean and spread out according to the population standard deviation (a measure of how spread out the data are) divided by the square root of the number of samples (in our case, testing games played).
Whew, that was a mouthful.
We invoke the Central Limit Theorem so that we can approximate our binomially-distributed samples as normally-distributed ones, for which a number of tools are available.
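If you would rather see this than take it on faith, the rough simulation below plays many imaginary testing sessions and compares the spread of the observed win rates with the square-root rule. The 60-40 matchup, the 30-game session length, the number of sessions, and the function name simulate_win_rates are all assumptions picked purely for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

def simulate_win_rates(p, games, sessions, seed=0):
    """Play `sessions` testing sessions of `games` games each and
    return the observed win rate of every session."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.random() < p for _ in range(games)) / games
        for _ in range(sessions)
    ]

p_true, games = 0.6, 30  # hypothetical 60-40 matchup, 30 games per session
rates = simulate_win_rates(p_true, games, sessions=20_000)

# Spread predicted by the rule: sqrt(p * (1 - p) / n)
predicted_sd = sqrt(p_true * (1 - p_true) / games)

print(f"average observed win rate: {mean(rates):.3f}")
print(f"observed spread (std dev): {stdev(rates):.3f}")
print(f"predicted spread:          {predicted_sd:.3f}")
```

The observed spread should land very close to the predicted one, which is what lets us borrow the normal distribution's tools.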
I’ll spare you the details, but the simplest way of calculating the confidence interval is to use the z statistic, in the equation: p = p_est +/- z*sqrt(p_est*(1-p_est)/n), where p is the true matchup percentage (the minus and plus versions give the low and high ends of the confidence interval), p_est is the testing result (the fraction of games you won), z is the normal distribution z statistic, which is dependent on how much confidence you want, and n is the number of samples (games) you played.
For 80%, z is about 1.28. Plugging into this equation, if you win 7 of 10 games, you are 80% sure that your matchup is between roughly 50-50 and 90-10. To be 80% sure your matchup is better than 60-40, you’d need to keep up that 70% win rate over a whopping 30 or so games! So you can easily see how testing results may not always be what they seem.
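If you don’t want to punch this into a calculator every time, here is a small Python sketch of the same z-statistic formula. The function name matchup_interval is made up for this example, and clamping the result to the 0 to 1 range is my own addition, since the normal approximation can overshoot for very lopsided results.

```python
from math import sqrt
from statistics import NormalDist

def matchup_interval(wins, games, confidence=0.80):
    """Approximate confidence interval for the true matchup percentage,
    using p_est +/- z * sqrt(p_est * (1 - p_est) / n)."""
    p_est = wins / games
    # Two-sided interval: 80% confidence leaves 10% in each tail, so z is about 1.28.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p_est * (1 - p_est) / games)
    # Clamp so the interval never leaves the 0%-100% range.
    return max(0.0, p_est - margin), min(1.0, p_est + margin)

low, high = matchup_interval(7, 10)
print(f"7 wins in 10 games:  {low:.0%} to {high:.0%}")

low30, high30 = matchup_interval(21, 30)
print(f"21 wins in 30 games: {low30:.0%} to {high30:.0%}")
```

Going 7 of 10 gives a range of about 51% to 89%, while 21 of 30 tightens the low end to just under 60%.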

















