by Charlton Rose
We’re pregnant, and we don’t know the gender of our baby (codenamed “Sharky 2.1”). However, a bag of coins, assembled by an accomplice, does. Inside the bag are 2 coins representing the correct gender and 1 coin representing the wrong gender.^{1} Each night, we draw a coin from the bag at random, learn its identity,^{2} and return it to the bag. As a result, each night, we have a slightly better clue about the gender of our child – but we may never be 100% sure.
We wondered what we could assert, statistically, about the probability of our child being a certain gender, based on our draw history. To satisfy this curiosity, I constructed the following derivation. This is our “gender estimation function.”
Consider the following events:
$B$ | = | It’s a boy! |
$G$ | = | It’s a girl! |
${S}_{b,g}$ | = | A random drawing of $b+g$ coins, with replacement, produces $b$ boy coins and $g$ girl coins. |
If we use the notation $P\left(X\right)$ to indicate the probability that event $X$ occurs, then we can express
$P\left(B\right)$ | = | the probability it's a boy |
$P\left(G\right)$ | = | the probability it's a girl |
$P\left({S}_{b,g}\right)$ | = | the probability that a random drawing of $b+g$ coins, with replacement, produces $b$ boy coins and $g$ girl coins |
Another useful notation, $P\left(X|Y\right)$, denotes the probability that $X$ occurs, given that $Y$ occurs. Thus, as we successively draw coins, we are interested in determining
$P\left(B|{S}_{b,g}\right)$ | = | the probability it's a boy, given that when we draw $b+g$ coins, we draw $b$ boy coins and $g$ girl coins |
Lest we cause a misunderstanding that we are boy-focused, we proclaim that we are also aware that
$$P\left(G|{S}_{b,g}\right)=1-P\left(B|{S}_{b,g}\right)$$
based on a crazy notion^{3} that $P\left(B\right)+P\left(G\right)=1$.
A useful theorem, known as Bayes’ theorem, proposes that^{4}
$$P\left(X|Y\right)=\frac{P\left(Y|X\right)\cdot P\left(X\right)}{P\left(Y\right)}$$
Thus, we can say
$$P\left(B|{S}_{b,g}\right)=\frac{P\left({S}_{b,g}|B\right)\cdot P\left(B\right)}{P\left({S}_{b,g}\right)}$$
We’re going to tackle this formula using two somewhat tricky substitutions.
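Bayes’ theorem is easy to sanity-check numerically. The sketch below (plain Python; the toy joint distribution is made up purely for illustration) computes $P\left(X|Y\right)$ directly from a joint probability table and confirms that it matches the right-hand side of the theorem.

```python
# Sanity check of Bayes' theorem on a made-up joint distribution
# over two binary events X and Y. The numbers are arbitrary; they
# just need to sum to 1.
p_xy = {(True, True): 0.30, (True, False): 0.20,
        (False, True): 0.10, (False, False): 0.40}

p_x = sum(p for (x, _), p in p_xy.items() if x)    # P(X)
p_y = sum(p for (_, y), p in p_xy.items() if y)    # P(Y)

p_x_given_y_direct = p_xy[(True, True)] / p_y      # P(X|Y), by definition
p_y_given_x = p_xy[(True, True)] / p_x             # P(Y|X), by definition

# Bayes' theorem: P(X|Y) = P(Y|X) * P(X) / P(Y)
p_x_given_y_bayes = p_y_given_x * p_x / p_y

assert abs(p_x_given_y_direct - p_x_given_y_bayes) < 1e-12
print(p_x_given_y_direct)  # 0.75
```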
$P\left({S}_{b,g}\right)$ is difficult to evaluate directly, because we don’t know which gender holds the majority in the bag. However, to our rescue comes the law of total probability, which tells us
$$P\left({S}_{b,g}\right)=P\left(B\right)\cdot P\left({S}_{b,g}|B\right)+P\left(G\right)\cdot P\left({S}_{b,g}|G\right)$$
This is true as long as we know that $P\left(B\right)+P\left(G\right)=1$.^{5} Substituting, then, we can arrive at
$$P\left(B|{S}_{b,g}\right)=\frac{P\left({S}_{b,g}|B\right)\cdot P\left(B\right)}{P\left(B\right)\cdot P\left({S}_{b,g}|B\right)+P\left(G\right)\cdot P\left({S}_{b,g}|G\right)}$$
Now, if we are willing to assume that $P\left(B\right)=P\left(G\right)$, we can simplify this to
$$P\left(B|{S}_{b,g}\right)=\frac{P\left({S}_{b,g}|B\right)\cdot P\left(B\right)}{P\left(B\right)\cdot P\left({S}_{b,g}|B\right)+P\left(B\right)\cdot P\left({S}_{b,g}|G\right)}=\frac{P\left({S}_{b,g}|B\right)}{P\left({S}_{b,g}|B\right)+P\left({S}_{b,g}|G\right)}$$
Next, $P\left({S}_{b,g}|B\right)$ can be resolved by observing that ${S}_{b,g}$ follows a binomial distribution.
A “binomial distribution is the discrete probability distribution of the number of successes in a sequence of $n$ independent yes/no experiments, each of which yields success with probability $p$.”^{6} The probability mass function for a binomial distribution is
$$P\left(X=k\right)=\binom{n}{k}{p}^{k}{\left(1-p\right)}^{n-k}$$
where
$X$ | is the variable that follows a binomial distribution and indicates the number of successes, |
$k$ | is the exact number of successes sought, |
$n$ | is the number of trials, and |
$p$ | is the probability of success. |
The notation $\binom{n}{k}$ is called the "binomial coefficient." It is read, "$n$ choose $k$," and is evaluated as $\frac{n!}{k!\left(n-k\right)!}$, but this detail won’t matter by the time we’re done.
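The probability mass function above can be evaluated in a few lines. The sketch below is plain Python (`binom_pmf` is just a name chosen here); it uses the standard library’s `math.comb` for the binomial coefficient and evaluates the chance of drawing a given number of boy coins from our bag.

```python
from math import comb  # comb(n, k) is the binomial coefficient "n choose k"

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Example: if it's a boy, each draw yields a boy coin with p = 2/3.
# Chance of seeing exactly 4 boy coins in 6 draws:
print(binom_pmf(4, 6, 2 / 3))

# The probabilities over all possible outcomes sum to 1:
assert abs(sum(binom_pmf(k, 6, 2 / 3) for k in range(7)) - 1.0) < 1e-12
```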
In our application, we can define $B$ as the successful event,^{7} and then note that
$n$ | = | $b+g$, the total number of coins drawn, |
$k$ | = | $b$, the number of boy coins drawn, and |
$p$ | = | $\frac{2}{3}$, the probability of drawing a boy coin, given $B$. |
Thus, we can infer that
$$P\left({S}_{b,g}|B\right)=\binom{b+g}{b}{\left(\frac{2}{3}\right)}^{b}{\left(1-\frac{2}{3}\right)}^{\left(b+g\right)-b}=\binom{b+g}{b}{\left(\frac{2}{3}\right)}^{b}{\left(\frac{1}{3}\right)}^{g}$$
Using similar logic, we can also infer the other side of the coin (so to speak):
$$P\left({S}_{b,g}|G\right)=\binom{b+g}{b}{\left(\frac{1}{3}\right)}^{b}{\left(\frac{2}{3}\right)}^{g}$$
Substituting both of these expressions into our current expression for $P\left(B|{S}_{b,g}\right)$, we get
$$P\left(B|{S}_{b,g}\right)=\frac{\binom{b+g}{b}{\left(\frac{2}{3}\right)}^{b}{\left(\frac{1}{3}\right)}^{g}}{\binom{b+g}{b}{\left(\frac{2}{3}\right)}^{b}{\left(\frac{1}{3}\right)}^{g}+\binom{b+g}{b}{\left(\frac{1}{3}\right)}^{b}{\left(\frac{2}{3}\right)}^{g}}$$
By letting the binomial coefficient $\binom{b+g}{b}$ cancel out, we can simplify this to
$$P\left(B|{S}_{b,g}\right)=\frac{{\left(\frac{2}{3}\right)}^{b}{\left(\frac{1}{3}\right)}^{g}}{{\left(\frac{2}{3}\right)}^{b}{\left(\frac{1}{3}\right)}^{g}+{\left(\frac{1}{3}\right)}^{b}{\left(\frac{2}{3}\right)}^{g}}$$
Multiplying the numerator and the denominator by ${3}^{b+g}$ gives us
$$P\left(B|{S}_{b,g}\right)=\frac{{2}^{b}}{{2}^{b}+{2}^{g}}$$
Now we have everything we need to determine gender probabilities based on our growing tally of observed boy coins and girl coins. As the total number of draws grows large, it is reasonable to expect that $b\approx 2g$ or $g\approx 2b$. When these ratios hold, the value of $P\left(B|{S}_{b,g}\right)$ asymptotically approaches 1.0 and 0.0, respectively – suggesting that a near perfect certainty about our baby’s gender will develop over time. Certainly, after 9 months of drawing, we’ll know for sure.
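The final formula fits in one line of code. The sketch below (plain Python; the function names `p_boy` and `p_boy_unsimplified` are mine) implements the closed form and cross-checks it against the unsimplified binomial version for a few draw histories.

```python
from math import comb

def p_boy(b: int, g: int) -> float:
    """P(boy | b boy coins and g girl coins drawn), per the derivation above."""
    return 2**b / (2**b + 2**g)

def p_boy_unsimplified(b: int, g: int) -> float:
    """The same posterior via the full binomial likelihoods, as a cross-check."""
    like_boy = comb(b + g, b) * (2 / 3) ** b * (1 / 3) ** g   # P(S_{b,g} | B)
    like_girl = comb(b + g, b) * (1 / 3) ** b * (2 / 3) ** g  # P(S_{b,g} | G)
    return like_boy / (like_boy + like_girl)

for b, g in [(0, 0), (1, 0), (2, 1), (10, 5), (5, 10)]:
    assert abs(p_boy(b, g) - p_boy_unsimplified(b, g)) < 1e-12
    print(b, g, round(p_boy(b, g), 4))
```

A pleasant consequence of the closed form is that only the difference $b-g$ matters: rewriting it as $\frac{1}{1+{2}^{g-b}}$ shows that each net extra boy coin doubles the odds in favor of a boy.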
© 2013 Charlton Rose. All rights reserved.