Homework #1: Counting things with answers

The answers below are suggestions, and most haven't been checked carefully. Verbal answers and general discussion should be taken as a guide. You should not memorize them, especially not if your English is weak. Rewrite them in your own words, in English if you can. (I've done the same for Japanese. I can assure you from experience that despite the large amount of effort, it's excellent practice, giving some of the best performance/cost ratios you'll find in language study.)

For computational problems, my answers are probably accurate, but again, you should not simply assume that. First you should confirm that you understand my calculations, and that you believe they are correct. Then, if your own calculations differ from mine, you should work out why -- whether it was a mistake, a second solution to the same problem, or a calculation based on different assumptions.

The last case is the most interesting. That is the kind of situation that leads to an idea for research.

Problems

Problem 1

In class we showed that the distribution of sums of dots on a pair of dice is:

sum 2 3 4 5 6 7 8 9 10 11 12
frequency 1 2 3 4 5 6 5 4 3 2 1

Recall the state space of sums:

red/blue 1 2 3 4 5 6
1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
4 5 6 7 8 9 10
5 6 7 8 9 10 11
6 7 8 9 10 11 12

For a pair of dice, count the number of combinations of dots that achieve a particular:

1. product

The state space of (red, blue) pairs and products:

red/blue 1 2 3 4 5 6
1 1 2 3 4 5 6
2 2 4 6 8 10 12
3 3 6 9 12 15 18
4 4 8 12 16 20 24
5 5 10 15 20 25 30
6 6 12 18 24 30 36

The frequency of products:

product 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
frequency 1 2 2 3 2 4 0 2 1 2 0 4 0 0 2 1 0 2
product 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
frequency 0 2 0 0 0 2 1 0 0 0 0 2 0 0 0 0 0 1

2. difference

The state space of (red, blue) pairs and differences:

red/blue 1 2 3 4 5 6
1 0 -1 -2 -3 -4 -5
2 1 0 -1 -2 -3 -4
3 2 1 0 -1 -2 -3
4 3 2 1 0 -1 -2
5 4 3 2 1 0 -1
6 5 4 3 2 1 0

The frequency of differences:

difference -5 -4 -3 -2 -1 0 1 2 3 4 5
frequency 1 2 3 4 5 6 5 4 3 2 1

3. quotient

The state space of (red, blue) pairs and quotients:

red/blue 1 2 3 4 5 6
1 1 1/2 1/3 1/4 1/5 1/6
2 2 1 2/3 1/2 2/5 1/3
3 3 3/2 1 3/4 3/5 1/2
4 4 2 4/3 1 4/5 2/3
5 5 5/2 5/3 5/4 1 5/6
6 6 3 2 3/2 6/5 1

The frequency of quotients:

quotient 1/6 1/5 1/4 1/3 2/5 1/2 3/5 2/3 3/4 4/5 5/6 1
frequency 1 1 1 2 1 3 1 2 1 1 1 6
quotient 6/5 5/4 4/3 3/2 5/3 2 5/2 3 4 5 6
frequency 1 1 1 2 1 3 1 2 1 1 1
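
One way to check all of these frequency tables is to enumerate the 36 ordered (red, blue) pairs and count the results. The following is a minimal sketch in Python, not part of the original assignment; the names FACES and distribution are chosen here for illustration, and Fraction is used so that quotients such as 2/4 and 1/2 are counted as the same value.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # the numbers of dots on one die


def distribution(combine):
    """Count how many ordered (red, blue) pairs give each value of combine(red, blue)."""
    return Counter(combine(red, blue) for red, blue in product(FACES, repeat=2))


sums = distribution(lambda r, b: r + b)
products = distribution(lambda r, b: r * b)
differences = distribution(lambda r, b: r - b)
quotients = distribution(lambda r, b: Fraction(r, b))  # exact, reduced fractions

for name, dist in [("sum", sums), ("product", products),
                   ("difference", differences), ("quotient", quotients)]:
    print(name)
    for value in sorted(dist):
        print(f"  {value}: {dist[value]}")
```

Values that never occur (such as a product of 7) simply do not appear in the output. Each set of frequencies should add up to 36, which is a quick check on any table made by hand.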

Discussion:

The distribution of ordered pairs is said to be uniform. For each pair, the frequency seen in a series of throws of the dice should be about the same. Theoretically, this occurs because each (ordered) pair occurs once in a list of pairs (not shown here -- usually displayed as a square table with six rows and six columns). The frequency of sums is nonuniform (different frequencies for different values). Theoretically this difference occurs because there are often multiple pairs of numbers that result in the same sum, and the number of such pairs varies with the sum specified.

Problem 2

Did you notice any interesting similarities or differences between the frequency distributions? Did you notice any interesting similarities or differences between the random variables (i.e., the function from the pairs of dice to the mathematical result)?

The obvious interesting fact is that the distributions of sums and differences are identical. This is not an accident. Look at the mappings from the state space to values (they are not quite random variables, despite what I wrote before, because there is no probability here yet). Notice that the patterns are the same, with a tall ridge along the diagonal, except that the diagonals are perpendicular to each other.

The mathematical reason why this happens is that the operation of subtraction is "the same" as addition in the sense that we can transform the subtracted numbers y (in the distribution of differences x - y) into the added numbers z (in the distribution of sums x + z) by the formula z = 7 - y, so that x - y = (x + z) - 7. As y runs over 1 through 6, so does z. However, although the set of numbers is the same, the order of numbers is reversed. If you're not a mathematician, that may not have much flavor.

But think about this: reversing the order of the columns (the blue die, y) in the difference table doesn't change the distribution. Now add 7 to each number in the table, and compare to the sum table. It's the same.
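
The comparison can also be checked mechanically. The sketch below is an illustration in Python, not part of the original answer; it assumes the layout of the tables above (rows are the red die x, columns are the blue die y). It applies the substitution z = 7 - y by reversing the blue die's axis and adding 7, and for comparison it also reverses the red die's axis, which gives a rotated copy of the sum table with the same frequencies.

```python
from collections import Counter

FACES = range(1, 7)

# Rows are the red die x, columns are the blue die y.
sum_table = [[x + y for y in FACES] for x in FACES]
diff_table = [[x - y for y in FACES] for x in FACES]


def frequencies(table):
    """Count how often each value appears anywhere in a table."""
    return Counter(value for row in table for value in row)


# Reverse the blue axis (y -> 7 - y), then add 7 to every entry.
blue_reversed = [[d + 7 for d in reversed(row)] for row in diff_table]

# Reverse the red axis (x -> 7 - x) instead, then add 7 to every entry.
red_reversed = [[d + 7 for d in row] for row in reversed(diff_table)]

print(blue_reversed == sum_table)                            # True
print(frequencies(red_reversed) == frequencies(sum_table))   # True
```

Reversing either axis leaves the frequencies unchanged, because it only reorders the cells of the table.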

Problem 3

Describe a model of "what is the gender of the first person to arrive in the classroom of 'Mathematics for Policy and Planning Science'" with an underlying set whose probabilities are uniform. What "model" means here is "What things do you count to determine the probability that the first person to arrive is female?"

One convenient underlying set is the set of students in the class. If we assume that the probability of each person arriving first is the same (symmetry of people in terms of class arrival), then we can just count the number of women in the class and divide by the number of students to get the probability that the first person to arrive is female.

Do you think this model actually describes the probability that the first person to arrive is female accurately? If so, why? If not, why not?

No, it's not very accurate. First, to be precise, I might arrive at the class first, but ignoring that was just a convenience. More importantly, the first person to arrive might be someone with no connection to the class itself who needs to talk to someone in the class for a few minutes. That would leave us with no good way to estimate probabilities. Second, even if we ignore non-students (that is, redefine the problem to be the gender of the first student to arrive), the symmetry assumption seems unlikely to hold. Students who had a class in the same room in 4th period are far more likely to arrive first (or simply remain in the room). Students who smoke and had a class in 4th period are likely to want to go somewhere to smoke between classes, so they are relatively unlikely to be first. And so on -- there are many personal characteristics that affect the likely timing of arrival.
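
To see how much the symmetry assumption matters, here is a small sketch in Python. Everything in it is hypothetical: the roster, the genders, and the weights are made-up numbers for illustration, not data about the actual class. A weight of 3 means "three times as likely to arrive first as a student with weight 1".

```python
from fractions import Fraction

# Hypothetical roster: (gender, relative weight of arriving first). Not real data.
roster = [
    ("F", 1), ("F", 1),
    ("F", 3),  # assumed to have a class in the same room in the previous period
    ("M", 1), ("M", 1), ("M", 1), ("M", 1),
]


def prob_first_is_female(students, use_weights):
    """Probability that the first arrival is female.

    With use_weights=False every student counts equally (the uniform model).
    With use_weights=True each student counts in proportion to their weight.
    """
    total = sum(w if use_weights else 1 for _, w in students)
    female = sum(w if use_weights else 1 for g, w in students if g == "F")
    return Fraction(female, total)


print(prob_first_is_female(roster, use_weights=False))  # 3/7
print(prob_first_is_female(roster, use_weights=True))   # 5/9
```

With equal weights the answer is just the share of women in the class. As soon as the weights differ, counting heads no longer gives the right probability.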

Problem 4

Think of a practical or daily life application where you can explain observed rates of occurrence with a model with an underlying uniform distribution, but a nonuniform distribution for the observed (or practically relevant) outcomes. What is the underlying set composed of? Why should its elements be uniformly distributed? What is the function relating the underlying things to observed outcomes?

It's actually quite difficult to come up with a practical application that matters where the underlying set is symmetrical unless it's either very abstract or deliberately constructed to generate equal probabilities (like a gambling game using dice, cards, or a wheel with equally-sized slots).

The best I could come up with is a pencil. A typical (hexagonal) pencil has six equally wide sides. So if you roll it, each side should come up with equal probability. But usually only two sides have imprinting on them, so the probability of "printed side up" is 2/6 = 1/3, while the probability of "blank side up" is 4/6 = 2/3.
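
In the terms of Problem 4, the underlying set is the six sides and the function maps each side to "printed" or "blank". A minimal Python sketch of that counting follows; the side numbering and the choice of which two sides are printed are assumptions made only for illustration.

```python
from collections import Counter
from fractions import Fraction

SIDES = range(6)   # the underlying set: six equally likely sides
PRINTED = {0, 1}   # assumption: these two sides carry the imprint


def outcome(side):
    """The function from the underlying set to the observed outcome."""
    return "printed" if side in PRINTED else "blank"


counts = Counter(outcome(side) for side in SIDES)
probabilities = {name: Fraction(n, len(SIDES)) for name, n in counts.items()}

for name, p in probabilities.items():
    print(name, p)
```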

Problem 5

(Optional) Are there other ways to combine the two numbers on a pair of dice into an interesting single result with non-uniform probability? (Note: I don't have a good answer in mind as of writing this question, so don't lose sleep trying to answer it!)

The best I could come up with is "Do the numbers match?" The probability of such a "doublet" is 6/36 = 1/6, and the probability of non-match is 5/6.
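
The same count can be done by enumeration. This is a short illustrative sketch in Python, with variable names chosen here.

```python
from fractions import Fraction
from itertools import product

pairs = list(product(range(1, 7), repeat=2))           # all 36 ordered pairs
matches = sum(1 for red, blue in pairs if red == blue)  # pairs on the diagonal

p_match = Fraction(matches, len(pairs))
print(p_match, 1 - p_match)  # 1/6 5/6
```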

Problem 6

(Optional) Here's a brain-breaker for those of you who think you're good at math. The numbers on dice are all positive, and therefore it makes sense to take logarithms. There's a one-to-one relationship between numbers and their logarithms, so given a number we can find its logarithm, and it is the unique number with that logarithm. And given a number that is a logarithm, there's a unique number that it is the logarithm of.

Now, after taking logarithms, multiplication becomes addition. Therefore we might expect that there should be a one-to-one relationship between the distribution of sums of two dice and the distribution of products of two dice. That is wrong.

Explain why.

Basically the problem is that we haven't taken exponentials of the addends to get factors. The equivalence of multiplication to addition under the logarithmic transformation requires transforming both sides.

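Thus, if instead of taking products of 1, 2, 3, 4, 5, and 6, where the equivalence doesn't work, we take products of 1, 2, 4, 8, 16, and 32 (the powers of 2 with exponents 0 through 5), we do get an equivalent distribution: the exponents add exactly as the numbers on the dice do, with each exponent one less than the corresponding number of dots. Check it!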
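
One way to "check it" is by enumeration. The following Python sketch is an illustration rather than part of the original answer; the helper names frequencies and pattern are made up here. It compares the pattern of frequencies (listed in increasing order of the value) for sums of 1 through 6, products of the powers of 2, and products of 1 through 6.

```python
from collections import Counter
from itertools import product


def frequencies(values, combine):
    """Count how often each result of combine(a, b) occurs over all ordered pairs."""
    return Counter(combine(a, b) for a, b in product(values, repeat=2))


def pattern(freq):
    """The frequencies alone, listed in increasing order of the value."""
    return [freq[value] for value in sorted(freq)]


dice = [1, 2, 3, 4, 5, 6]
powers = [2 ** k for k in range(6)]  # 1, 2, 4, 8, 16, 32

sum_freq = frequencies(dice, lambda a, b: a + b)
power_prod_freq = frequencies(powers, lambda a, b: a * b)
dice_prod_freq = frequencies(dice, lambda a, b: a * b)

print(pattern(sum_freq))         # [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
print(pattern(power_prod_freq))  # [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
print(len(sum_freq), len(dice_prod_freq))  # 11 18
```

The products of 1 through 6 take 18 distinct values while the sums take only 11, so no one-to-one relabeling of values can turn one distribution into the other.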

Due: June 1, 2015 at 14:00.