UFM Statistics

Year 13 course of Further Statistics

Discrete random variables, including joint distributions and covariance

1972 Paper 4 Q16
D: 1500.0 B: 1500.0

Two players play a dice game on a board marked with squares numbered 0 to 13. Each player has a counter that is initially on square 0 and they take turns to throw a six-sided die. A player's counter is not moved until he throws a six, when it moves to square 6. Thereafter, if it is on square \(m\) and he throws an \(n\), it advances to square \(m + n\) if \(m + n \leq 13\), and 'rebounds' to square \(26 - (m + n)\) if \(m + n > 13\). The winner is the player whose counter first reaches square 13. Find (i) the probability that the first player to throw is the first to move his counter; (ii) the probability that the loser's counter never leaves square 0.

1972 Paper 2 Q8
D: 1500.0 B: 1500.0

Every morning I walk to the bus stop and must then decide whether to continue my journey on foot, taking 10 minutes, or wait for the bus which covers the same distance in 2 minutes. Since I have no watch I have no idea when the next bus will arrive, though I know that on this route buses run exactly 10 minutes apart. My observations suggest that other people arrive randomly at the bus stop in such a way that, \(t\) minutes after the departure of a bus, the probability that there is no-one waiting is \(e^{-t/5}\). Show that, if I adopt the strategy 'wait for the bus if there is already someone at the bus stop, walk if not', my mean journey time will be about 6 seconds less than if I always walk.

1975 Paper 2 Q6
D: 1500.0 B: 1500.0

A proof reader is checking galley-proofs. The number of misprints on a galley is random and has a Poisson distribution with mean \(\mu\). The probability that he detects any one misprint is \(p = 1 - q\), and his result with each misprint is independent of his results with the others. Show that the number of misprints detected (\(X\), say) and the number undetected (\(Y\), say) on a galley are independent random variables with Poisson distributions with means \(p\mu\) and \(q\mu\) respectively.
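
The factorisation asked for here can be checked numerically before proving it. The following is a minimal sketch, not part of the original problem: the function names are hypothetical and the values `mu = 3.0`, `p = 0.4` are illustrative assumptions. It builds \(P(X = x, Y = y)\) by conditioning on the total \(x + y\) and compares it with the product of the two claimed Poisson probabilities.

```python
from math import comb, exp, factorial


def poisson_pmf(k: int, mean: float) -> float:
    """P(K = k) for K ~ Poisson(mean)."""
    return exp(-mean) * mean**k / factorial(k)


def joint_pmf(x: int, y: int, mu: float, p: float) -> float:
    """P(X = x, Y = y): x + y misprints in total, of which x are detected."""
    total = x + y
    return poisson_pmf(total, mu) * comb(total, x) * p**x * (1 - p) ** y


def max_factorisation_error(mu: float, p: float, limit: int = 15) -> float:
    """Largest gap between the joint pmf and the product of Poisson pmfs."""
    return max(
        abs(
            joint_pmf(x, y, mu, p)
            - poisson_pmf(x, p * mu) * poisson_pmf(y, (1 - p) * mu)
        )
        for x in range(limit)
        for y in range(limit)
    )


if __name__ == "__main__":
    # mu and p are illustrative assumptions, not values from the problem.
    print(max_factorisation_error(mu=3.0, p=0.4))
```

A gap at the level of floating-point rounding is consistent with independence, although it is of course no substitute for the algebraic proof.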

1977 Paper 2 Q10
D: 1500.0 B: 1500.0

Copies of a daily newspaper, which appears six times a week, are examined for misprints over a long period. It is discovered that the probability of there being one or more misprints in a given issue is \(\frac{1}{3}\). What is the most likely number of misprints in a week? What assumptions have you made? Find an expression for the probability that there will be fewer than the most likely number in a week. [\(\log_e (3/2)\) is approximately 0.4055.]

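Solution
Assume that misprints occur independently and at a constant average rate, so that the number in a single issue is Poisson with some mean \(\lambda\), and that different issues are independent. Then \(1 - e^{-\lambda} = \frac13\), so \(\lambda = \log_e(3/2) \approx 0.4055\). The sum of independent Poisson random variables is Poisson, so the number of misprints in a week of six issues is \(Pois(\mu)\) with \(\mu = 6\log_e(3/2) \approx 2.433\). For non-integer \(\mu\) the mode of \(Pois(\mu)\) is \(\lfloor \mu \rfloor\), so the most likely number is \(2\). Let \(X \sim Pois(\mu)\), then \begin{align*} \mathbb{P}(X < 2) &= \mathbb{P}(X = 0) + \mathbb{P}(X = 1) \\ &= e^{-\mu}\left ( \frac{\mu^0}{0!} + \frac{\mu^1}{1!}\right) \\ &= \left(\frac{2}{3}\right)^6\left(1 + 6\log_e \frac32\right) \\ &\approx 0.301 \end{align*}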
1978 Paper 2 Q7
D: 1500.0 B: 1500.0

The number of accidents occurring in a particular year on the M1 motorway has the Poisson distribution with mean \(\lambda_1\), while the number occurring on the M2 has the Poisson distribution with mean \(\lambda_2\). Assuming that the numbers of accidents occurring on different motorways are independent, prove that the total number of accidents on both motorways has the Poisson distribution with mean \(\lambda_1+\lambda_2\). Given that the total number of accidents on the two motorways is \(n\), find the probability that there were \(k\) accidents on the M1.

Solution
Suppose \(X_1 \sim Pois(\lambda_1), X_2 \sim Pois(\lambda_2)\) \begin{align*} \mathbb{P}(X_1+X_2 = n) &= \sum_{i=0}^n \mathbb{P}(X_1 = i, X_2 = n-i) \\ &= \sum_{i=0}^n \mathbb{P}(X_1 = i)\mathbb{P}(X_2 = n-i) \tag{assuming independent} \\ &= \sum_{i=0}^n e^{-\lambda_1} \frac{\lambda_1^i}{i!} e^{-\lambda_2}\frac{\lambda_2^{n-i}}{(n-i)!} \\ &= e^{-(\lambda_1+\lambda_2)} \sum_{i=0}^n \frac{\lambda_1^i\lambda_2^{n-i}}{i!(n-i)!} \\ &= e^{-(\lambda_1+\lambda_2)} \frac{1}{n!}\sum_{i=0}^n \frac{n!\lambda_1^i\lambda_2^{n-i}}{i!(n-i)!} \\ &= e^{-(\lambda_1+\lambda_2)} \frac{1}{n!}\sum_{i=0}^n \binom{n}{i}\lambda_1^i\lambda_2^{n-i} \\ &= e^{-(\lambda_1+\lambda_2)} \frac{1}{n!}(\lambda_1+\lambda_2)^n \end{align*} Therefore their sum has the same distribution as \(Pois(\lambda_1+\lambda_2)\). \begin{align*} \mathbb{P}(X_1 = k | X_1 + X_2 = n) &= \frac{\mathbb{P}(X_1 = k, X_1+X_2 = n)}{\mathbb{P}(X_1+X_2=n)} \\ &= \frac{e^{\lambda_1+\lambda_2}n! }{(\lambda_1+\lambda_2)^n} \mathbb{P}(X_1 = k, X_2 = n-k) \\ &= \frac{e^{\lambda_1+\lambda_2}n! }{(\lambda_1+\lambda_2)^n} \mathbb{P}(X_1 = k)\mathbb{P}(X_2 = n-k) \\ &= \frac{e^{\lambda_1+\lambda_2}n! }{(\lambda_1+\lambda_2)^n}e^{-\lambda_1} \frac{\lambda_1^k}{k!}e^{-\lambda_2}\frac{\lambda_2^{n-k}}{(n-k)!} \\ &= \binom{n}{k} \frac{\lambda_1^k\lambda_2^{n-k}}{(\lambda_1+\lambda_2)^n} \\ &= \binom{n}{k}p^k(1-p)^{n-k} \end{align*} Where \(p = \frac{\lambda_1}{\lambda_1+\lambda_2}\), ie it is distributed \(Binomial(n, \frac{\lambda_1}{\lambda_1+\lambda_2})\)
1980 Paper 2 Q8
D: 1500.0 B: 1500.0

A machine produces boiled sweets in large batches. Each batch is either satisfactory, and contains no sub-standard sweets, or defective, when a known proportion \(p\) of the sweets are tasteless. The cost of rejecting a defective batch immediately after production is \(K\); however, if a defective batch is not detected and reaches the customer, it costs \(MK\) to replace it and to recover lost goodwill, where \(M > 1\). The quality control officer decides to test each batch by removing, for tasting, a random number \(N\) of sweets selected from it at random, at a cost of \(c\) per sweet, where \(N\) has a Poisson distribution with mean \(\lambda\). A batch is rejected if any of the \(N\) sweets proves to be tasteless. Show that his chance of detecting a defective batch is \(1 - e^{-\lambda p}\). If the proportion of defective batches produced is known to be \(\alpha\), show that the expected running cost per batch is \(c\lambda + \alpha K[1 + (M-1)e^{-\lambda p}]\). Find the value of \(\lambda\) that minimizes the expected cost (a) if \(c < \alpha p K(M-1)\), and (b) if \(c \geq \alpha p K(M-1)\).
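
The two cases (a) and (b) can be explored numerically. The sketch below is an illustration rather than the intended written solution: the function names are hypothetical, and the closed form in `optimal_lambda` comes from setting the derivative of the stated cost, \(c - \alpha p K (M-1) e^{-\lambda p}\), to zero.

```python
from math import exp, log


def expected_cost(lam, c, alpha, K, M, p):
    """Expected running cost per batch, as given in the problem statement."""
    return c * lam + alpha * K * (1 + (M - 1) * exp(-lam * p))


def optimal_lambda(c, alpha, K, M, p):
    """Value of lambda >= 0 minimising expected_cost.

    Case (a): c < alpha*p*K*(M-1), the derivative vanishes at an interior point.
    Case (b): otherwise the cost is non-decreasing in lambda, so take lambda = 0.
    """
    threshold = alpha * p * K * (M - 1)
    if c < threshold:
        return log(threshold / c) / p
    return 0.0


if __name__ == "__main__":
    # All parameter values below are illustrative assumptions.
    cheap = dict(c=0.01, alpha=0.1, K=100.0, M=5.0, p=0.2)
    dear = dict(c=10.0, alpha=0.1, K=100.0, M=5.0, p=0.2)
    for params in (cheap, dear):
        lam = optimal_lambda(**params)
        print(lam, expected_cost(lam, **params))
```

In case (b) sampling is too expensive relative to the possible saving, so the sketch returns \(\lambda = 0\), meaning no sweets are tested.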

1982 Paper 2 Q12
D: 1500.0 B: 1500.0

The number of messages to be sent by carrier pigeon during a week is a random variable whose distribution is Poisson with parameter \(\mu\). Each bird released with a message has probability \(p\) of evading predatory hawks and arriving at its destination. At the end of each week, all birds which arrived are returned safely to the starting point. Assuming no shortage of pigeons, find the distribution of the net loss of birds per week, and hence write down its mean and variance. Suppose \(\mu = 50\). What is the smallest number of pigeons that needs to be available in order that one may be 99\% confident that all messages to be sent during a week can be carried?
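
For the final part, one reading is that the stock must cover the number of messages \(N \sim Pois(50)\) sent in a week, since birds only return at the end of the week. Under that assumption, the sketch below (hypothetical function names, exact Poisson sums rather than a normal approximation) searches for the smallest \(n\) with \(P(N \leq n) \geq 0.99\).

```python
from math import exp


def poisson_cdf(n: int, mean: float) -> float:
    """P(N <= n) for N ~ Poisson(mean), summing the pmf term by term."""
    term = exp(-mean)  # P(N = 0)
    total = term
    for k in range(1, n + 1):
        term *= mean / k  # P(N = k) obtained from P(N = k - 1)
        total += term
    return total


def smallest_stock(mean: float, confidence: float = 0.99) -> int:
    """Smallest n such that P(N <= n) >= confidence."""
    n = 0
    while poisson_cdf(n, mean) < confidence:
        n += 1
    return n


if __name__ == "__main__":
    n = smallest_stock(50.0)
    print(n, poisson_cdf(n, 50.0))
```

In an examination the same quantity would normally be estimated with the normal approximation to the Poisson distribution; the exact search is only a cross-check.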

1958 Paper 4 Q112
D: 1500.0 B: 1500.0

There are 50,000 shares in a lottery with 1000 prizes. If a syndicate buys 100 shares, write down an approximate expression for, and evaluate approximately, the chance that it wins four or more prizes. Find also the variance of the number of prizes that may be expected. [The approximation \((1 + 1/n)^{xn} = e^x\) for large \(n\) may be used.]

1977 Paper 2 Q8
D: 1500.0 B: 1500.0

A coin which has the probability \(p\) of falling heads is tossed repeatedly until exactly \(k\) heads have been obtained. Show that the probability that this requires \(n\) tosses is \[\binom{n-1}{k-1}p^k(1-p)^{n-k} \quad (n = k, k+1, \ldots).\] Show that this probability is the coefficient of \(z^n\) in the expansion of \[\left(\frac{pz}{1-(1-p)z}\right)^k.\] By differentiating this series, or otherwise, deduce the mean of \(n\).

1972 Paper 4 Q10
D: 1500.0 B: 1500.0

Each of \(n\) men attending a dinner leaves his hat in the cloakroom and collects a hat when he departs. It may be assumed that each man's choice of hat at the end of the dinner is completely random. Let \(P_{n,k}\) be the probability that exactly \(k\) men end up wearing the right hat. Show that \(P_{n,k} = (k+1)P_{n+1, k+1}\), and deduce that if \[F_n(x) \text{ is defined to be } \sum_{k=0}^{n} P_{n,k}x^k, \text{ then}\] \[F_n(x) = \frac{d}{dx}F_{n+1}(x).\] Hence, or otherwise, show that \[P_{n,k} = \frac{1}{k!}\sum_{j=k}^{n}\frac{(-1)^{j-k}}{(j-k)!}.\]

1979 Paper 4 Q10
D: 1500.0 B: 1500.0

\(X\) is an integer-valued random variable, with distribution given by \[\text{Pr}[X = k] = \frac{c}{k \cdot 2^k}, \quad k \geq 1.\] Find the probability generating function of \(X\), and hence deduce the value of \(c\). A car insurance company observes that the number \(N\) of claims in any year is distributed as a Poisson random variable with mean \(\mu\), and that the sums of money paid out on the different claims are distributed, independently of \(N\) and of each other, in the same way as \(X\). By considering probability generating functions, or otherwise, find the mean and variance of the total sum \(S\) paid out per year. [Hint. If the probability generating function of \(S\), given that \(N = n\), is denoted by \(G_n(z)\), then the probability generating function of \(S\) is given by \(\sum_{n \geq 0} G_n(z) \cdot \text{Pr}[N = n]\).]

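Solution
\begin{align*} G_X(z) &= \sum_{k=1}^{\infty} \text{Pr}[X = k]z^k \\ &= \sum_{k=1}^{\infty} \frac{c}{k \cdot 2^k}z^k \\ &= c\sum_{k=1}^{\infty} \frac1k \left ( \frac{z}{2} \right)^k \\ &= -c \ln \left (1 - \frac{z}{2} \right) \end{align*} Since \(G_X(1) = 1\), we must have \(-c \ln \tfrac12 = 1 \Rightarrow c = \frac1{\ln 2}\). Differentiating, \(G_X'(z) = \frac{c}{2-z}\) and \(G_X''(z) = \frac{c}{(2-z)^2}\), so \(\mathbb{E}(X) = G_X'(1) = c\) and \(\mathbb{E}(X^2) = G_X''(1) + G_X'(1) = 2c\). Given \(N = n\), \(S\) is a sum of \(n\) independent copies of \(X\), so \(G_n(z) = G_X(z)^n\) and, by the hint, \begin{align*} G_S(z) &= \sum_{n \geq 0} G_X(z)^n e^{-\mu}\frac{\mu^n}{n!} \\ &= \exp\left(\mu \left(G_X(z) - 1\right)\right) \end{align*} Hence \(\mathbb{E}(S) = G_S'(1) = \mu G_X'(1) = \frac{\mu}{\ln 2}\). Also \(G_S''(1) = \mu G_X''(1) + \mu^2 G_X'(1)^2\), so \(\textrm{Var}(S) = G_S''(1) + G_S'(1) - G_S'(1)^2 = \mu\left(G_X''(1) + G_X'(1)\right) = \frac{2\mu}{\ln 2}\).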
1964 Paper 4 Q101
D: 1500.0 B: 1500.0

A bag contains a large number of red, white and blue dice in equal numbers. If \(n\) are drawn at random, show that the probability \(P(n,r)\) of drawing exactly \(r\) red dice is equal to the term containing \((\frac{1}{3})^r(\frac{2}{3})^{n-r}\) in the binomial expansion of \((\frac{1}{3} + \frac{2}{3})^n\). If \(r\) dice are thrown, find the probability \(Q(r,s)\) of throwing exactly \(s\) sixes. If \(n\) dice are drawn from the bag and the red dice drawn are thrown, show that the probability of throwing exactly \(s\) sixes is $$\sum_{t=0}^{n-s} P(n, s+t)Q(s+t, s)$$ and prove that this is equal to a term in a binomial expansion. Explain why a binomial distribution is obtained.

1953 Paper 2 Q301
D: 1500.0 B: 1500.0

A die marked with the numbers 1, \dots, 6 is thrown \(r\) times and the \(r\) numbers obtained are added. If the numbers 1, \dots, 6 are equally likely to be obtained on each throw show, by considering the coefficient of \(x^n\) in \((x+x^2+\dots+x^6)^r\), that the probability that the sum of the \(r\) numbers should be \(n\) is \[ \frac{1}{6^r}\left[ \frac{r(r+1)\dots(n-1)}{(n-r)!} - r\frac{r(r+1)\dots(n-7)}{(n-r-6)!} + \frac{r(r-1)}{2!}\frac{r(r+1)\dots(n-13)}{(n-r-12)!} - \dots \right], \] and hence find the probability that the total after four throws should be 14.
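
The bracketed expression is an inclusion-exclusion sum, since \(\frac{r(r+1)\dots(n-6j-1)}{(n-r-6j)!} = \binom{n-6j-1}{r-1}\). The sketch below (hypothetical function names) evaluates that sum and compares it with a brute-force count over all \(6^r\) outcomes.

```python
from fractions import Fraction
from itertools import product
from math import comb


def ways_by_formula(r: int, n: int) -> int:
    """Coefficient of x^n in (x + x^2 + ... + x^6)^r via the alternating sum."""
    total = 0
    j = 0
    # Terms are present only while n - r - 6j is non-negative.
    while j <= r and n - r - 6 * j >= 0:
        total += (-1) ** j * comb(r, j) * comb(n - 6 * j - 1, r - 1)
        j += 1
    return total


def ways_by_enumeration(r: int, n: int) -> int:
    """Direct count of ordered throws of r dice that sum to n."""
    return sum(1 for faces in product(range(1, 7), repeat=r) if sum(faces) == n)


def probability(r: int, n: int) -> Fraction:
    """Probability that r throws of a fair die total n."""
    return Fraction(ways_by_formula(r, n), 6**r)


if __name__ == "__main__":
    print(ways_by_formula(4, 14), ways_by_enumeration(4, 14), probability(4, 14))
```

For four throws and a total of 14 the sum is \(\binom{13}{3} - 4\binom{7}{3} = 286 - 140 = 146\), so the probability is \(146/1296 = 73/648\).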

1973 Paper 2 Q6
D: 1500.0 B: 1500.0

Suppose that the random variable \(X\) has cumulative distribution function \(F(x)\) (which is the probability that \(X\) is less than or equal to \(x\)) and probability density function \(f(x)\) (which is \(F'(x)\)). If \(F(x) = 1-e^{-\lambda x}\) for \(x \geq 0\), and \(F(x) = 0\) for \(x < 0\), find the mean and variance of \(X\). A business man awaits an order from each of \(n\) clients. The \(n\) orders are sent out simultaneously, and the times taken to reach the business man are independent random variables, each with density function \(\lambda e^{-\lambda x}\) (\(x \geq 0\)). The first order to be received will be dispatched free of charge to the client. How long should the business man expect to wait before dispatching the free order? [Hint: the minimum of \(n\) variables, \(X_1, ..., X_n\) is greater than \(x\) if and only if each of \(X_1, ..., X_n\) is greater than \(x\).] What chance has a particular client of getting his order free?

1981 Paper 2 Q7
D: 1500.0 B: 1500.0

The lifetime in days, \(X\), of a safety component in a chemical plant is given by the negative exponential distribution \begin{align} P(X \leq t) = 1 - e^{-\lambda t} \text{ for } t \geq 0 \end{align} Find the mean lifetime of the component. The component is checked at 8 o'clock every morning, and if faulty is replaced immediately. Let \(Y\) be the length of time, in days, between the component failing and being replaced. Show that the probability that the component fails on the \(n\)th day and is replaced within \(24y\) hours, where \(0 \leq y \leq 1\), is \((e^{\lambda y} - 1)e^{-\lambda n}\) for \(n = 1, 2, ...\). Hence prove that \begin{align} P(Y \leq y) = \frac{e^{\lambda y} - 1}{e^{\lambda} - 1} \end{align} and calculate the mean of \(Y\).

1969 Paper 3 Q4
D: 1500.0 B: 1500.0

Initially a machine is in good running order but is subsequently liable to break down. As soon as a breakdown occurs repairs begin. If the machine is in good order at time \(t\) then the probability that a breakdown occurs in a small interval \((t, t + dt)\) is \(\alpha dt\), and if it is under repair at time \(t\) the probability that the repair is completed in time \((t, t + dt)\) is \(\beta dt\). Let \(p(t)\) be the probability that the machine is under repair at time \(t\). Write down an equation relating \(p(t + dt)\) to \(p(t)\) and hence show that \(p(t)\) is $$\frac{\alpha}{\alpha + \beta}\{1 - \exp[- (\alpha + \beta)t]\}.$$

Solution
Here we assume that the interval is short enough that at most one event (a breakdown or a completed repair) occurs in it, and that the machine starts in good order, so \(p(0) = 0\). \begin{align*} && \underbrace{p(t+dt)}_{\text{P under repair at time \(t+dt\)}} &= \underbrace{p(t)}_{\text{P under repair at time \(t\)}}\underbrace{(1-\beta\, dt)}_{\text{P not fixed in time}} + \underbrace{(1-p(t))}_{\text{P not under repair}} \underbrace{\alpha\, dt}_{\text{P breaks}} \\ &&&= p(t) - (\alpha+\beta)p(t)\,dt + \alpha\, dt \\ \Rightarrow && \frac{p(t+dt)-p(t)}{dt} &= -(\alpha+\beta)p(t) + \alpha \\ \Rightarrow && \frac{\d p}{\d t} &= -(\alpha+\beta)p(t) + \alpha \\ \Rightarrow && p(t) &= A\exp(-(\alpha+\beta)t) + B \\ \Rightarrow && 0 &= -(\alpha+\beta)B + \alpha \\ \Rightarrow && B &= \frac{\alpha}{\alpha+\beta} \\ \Rightarrow && 0 &= A + \frac{\alpha}{\alpha+\beta} \\ \Rightarrow && A &= - \frac{\alpha}{\alpha+\beta} \\ \Rightarrow && p(t) &= \frac{\alpha}{\alpha+\beta} \left ( 1 - \exp(-(\alpha+\beta)t)\right) \end{align*} Here \(B\) is found by substituting the constant solution \(p = B\) into the differential equation, and \(A\) from the initial condition \(p(0) = 0\).
1978 Paper 3 Q10
D: 1500.0 B: 1500.0

Let the random variable \(X\) have the exponential distribution with parameter \(\lambda > 0\), that is \[P\{X \leq x\} = \begin{cases} 1-e^{-\lambda x}, & \text{if}~ x \geq 0,\\ 0, & \text{if}~ x < 0. \end{cases}\] Let \(Y\) be a random variable having the exponential distribution with parameter \(\mu\), and suppose that \(X\) and \(Y\) are independent. Find the distribution of min\((X, Y)\) and the probability that \(Y\) exceeds \(X\).

1953 Paper 2 Q404
D: 1500.0 B: 1500.0

Find the number of different arrangements of \(n\) different articles in \(m\) different pigeon-holes. An event happens irregularly but in the long run occurs once a year on an average. Show that the chance that it will not take place in a particular future year is \(1/e\).

1955 Paper 2 Q404
D: 1500.0 B: 1500.0

An event happens on an average once a year. Show that the chance it will not happen in any particular future year is \(1/e\), where \(e\) is the base of natural logarithms. Find also the chance of it happening twice in any particular future year.

1974 Paper 2 Q8
D: 1500.0 B: 1500.0

Two independent random variables \(X\) and \(Y\) are each uniformly distributed between 0 and 2. Find the probability that \(X^m Y^n \leq 1\) in the cases (i) \(m = n = 1\), (ii) \(m = 2\), \(n = -1\).

1975 Paper 2 Q8
D: 1500.0 B: 1500.0

The two random variables \(U\) and \(V\) are independent and each is uniformly distributed on \((0, 1)\). The random variables \(X\) and \(Y\) are defined by \(X = \log_e(1/U)\), \(Y = \log_e(1/V)\). Prove that the probability that \(X + Y \leq z\) is \[\int_0^z te^{-t}dt \quad (z > 0).\]

1976 Paper 2 Q8
D: 1500.0 B: 1500.0

Let \(X_1, X_2, \ldots, X_n\) be independent random variables each uniformly distributed on the interval \((0,1)\). Find for \(0 < u < v < 1\) the probability of the event that the smallest of them is between 0 and \(u\) and the largest is between \(u\) and \(v\).

1974 Paper 4 Q11
D: 1500.0 B: 1500.0

Let \(X\) and \(Y\) be two discrete random variables with correlation coefficient \(\rho(X, Y)\). Prove that \(|\rho(X, Y)| \leq 1\). Prove also that, if \(X\) and \(Y\) are independent, then \(\rho(X, Y) = 0\). Show that the converse of the latter result is true if \(X\) and \(Y\) take only two values; and show by giving an example that it is not true in general.

1978 Paper 4 Q10
D: 1500.0 B: 1500.0

On the basis of an interview, the \(N\) candidates for admission to a college may be ranked in order of excellence. The candidates are interviewed in random order; that is, each possible ordering is equally likely.

  1. Given that the \(n\)th candidate interviewed is the best among the first \(n\) what is the probability that he is the best overall?
  2. Given that the \(n\)th candidate interviewed is the best among the first \(n\) what is the probability that he is the best or second-best overall?
  3. For \(n = 1, 2, \ldots, N\), let \(X_n\) denote the rank among the first \(n\) of the \(n\)th candidate interviewed. Prove that \(X_1, X_2, \ldots, X_N\) are independent random variables.

1966 Paper 3 Q12
D: 1500.0 B: 1500.0

A factory makes components in the form of a rectangle whose length is intended to be twice its breadth. There is, however, a random error with standard deviation 0.1\% in the lengths; similarly, the breadths are distributed independently about a certain value with standard deviation 0.1\%. Find the percentage standard deviations of the perimeters and of the areas of the components produced.

1970 Paper 3 Q1
D: 1500.0 B: 1500.0

A computer data tape is prepared with the numbers $$n, x_1, y_1, x_2, y_2, \ldots, x_n, y_n,$$ where \((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\) are pairs of observations of two variables \(X\) and \(Y\). Write a program in any standard language, or draw a flow diagram for such a program, which will read in the data, and print out the mean and variance of \(X\) and of \(Y\) and the (product-moment) correlation coefficient of \(X\) and \(Y\).
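
One possible program is sketched below in Python, which is of course not a language available in 1970. It is an illustration under stated assumptions rather than a model answer: the tape is taken to be an in-memory sequence of numbers, the variances use the divisor \(n\) (the problem does not say whether \(n\) or \(n - 1\) is intended), and the name `summarise` is hypothetical.

```python
from math import sqrt


def summarise(tape):
    """Summarise a tape laid out as n, x1, y1, x2, y2, ..., xn, yn.

    Returns the means and variances of X and Y and their product-moment
    correlation coefficient. Variances use divisor n (an assumption).
    """
    n = int(tape[0])
    values = list(tape[1:1 + 2 * n])
    if n < 1 or len(values) != 2 * n:
        raise ValueError("tape does not hold n complete pairs")

    xs = values[0::2]  # x1, x2, ..., xn
    ys = values[1::2]  # y1, y2, ..., yn

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

    # The correlation is undefined when either variance is zero;
    # in that case the division below raises ZeroDivisionError.
    corr = cov_xy / sqrt(var_x * var_y)

    return {
        "mean_x": mean_x, "var_x": var_x,
        "mean_y": mean_y, "var_y": var_y,
        "corr": corr,
    }


if __name__ == "__main__":
    # Made-up data purely for illustration: three pairs with y = 2x.
    print(summarise([3, 1.0, 2.0, 2.0, 4.0, 3.0, 6.0]))
```

The two-pass approach (means first, then squared deviations) is chosen for clarity; a single-pass version based on running sums of \(x\), \(y\), \(x^2\), \(y^2\) and \(xy\) is closer to what a tape-reading program of the period would do, at some cost in numerical accuracy.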

1917 Paper 1 Q107
D: 1500.0 B: 1500.0

The following readings connect the candle power and voltage of an incandescent lamp.

\begin{tabular}{l|c|c|c|c} Candle power & 20.68 & 23.24 & 26.00 & 28.96 \\ \hline Voltage & 94 & 98 & 102 & 106 \end{tabular}
Assuming that the candle power varies as the \(n\)th power of the voltage, find \(n\). Do the numbers given justify the assumption?
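
If candle power varies as \(V^n\), then \(\log(\text{candle power})\) is linear in \(\log V\) with slope \(n\). The sketch below (hypothetical function names) estimates \(n\) in two ways from the tabulated readings: a least-squares slope over all four points, and the exponent implied by each consecutive pair.

```python
from math import log

CANDLE_POWER = [20.68, 23.24, 26.00, 28.96]
VOLTAGE = [94, 98, 102, 106]


def fit_exponent(volts, powers):
    """Least-squares slope of log(power) against log(voltage)."""
    xs = [log(v) for v in volts]
    ys = [log(c) for c in powers]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def pairwise_exponents(volts, powers):
    """Exponent implied by each consecutive pair of readings."""
    return [
        log(powers[i + 1] / powers[i]) / log(volts[i + 1] / volts[i])
        for i in range(len(volts) - 1)
    ]


if __name__ == "__main__":
    print(fit_exponent(VOLTAGE, CANDLE_POWER))
    print(pairwise_exponents(VOLTAGE, CANDLE_POWER))
```

If the pairwise exponents agree closely with each other and with the fitted slope, the readings are consistent with a power law over this range of voltages.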

1978 Paper 4 Q11
D: 1500.0 B: 1500.0

A basket contains \(N\) eggs, a proportion \(P\) of which are rotten. It is decided to estimate \(P\) by \(R/n\), where \(R\) is the number of rotten eggs in a sample of \(n\) eggs chosen randomly from the basket. Prove that the mean of this estimate is \(P\) and its variance is \((N-n)P(1-P)/n(N-1)\).

1980 Paper 4 Q11
D: 1500.0 B: 1500.0

Martians come in two colours, blue and green, the proportion of blue Martians in the (effectively infinite) population being \(p\). A psychologist, interested in assessing the mean IQ among Martians, decides to take a random sample of size 8 from the population, and to use the average \(I\) of their IQ's as an estimate of the overall mean in the population. His eager young assistant, however, suggests that it may be better to sample \(S_1\) and \(S_2\) of blue and green Martians, \(S_1 + S_2 = 8\), and to use \begin{align*} J = pI_1 + (1-p)I_2 \end{align*} as the estimate of the overall mean IQ, where \(I_1\) and \(I_2\) are the averages of the IQ's in the blue and the green samples, respectively. Show that if the IQ of blue Martians has mean \(\mu_1\) and variance \(\sigma_1^2\), and that of green Martians has mean \(\mu_2\) and variance \(\sigma_2^2\), then both \(I\) and \(J\) are unbiased estimators of the mean IQ in the population; and that, for the best choice of the ratio \(S_1: S_2\) (to be determined), the variance of \(J\) is less than that of \(I\), unless \(\mu_1 = \mu_2\).

1917 Paper 1 Q109
D: 1500.0 B: 1500.0

A steel bar with rectangular faces has diagonal lines drawn on one of its faces, dimensions 9" by 2", and is subjected to a tension of 4 tons per square inch of section. Find the change of angle between the lines, having given that Young's modulus is 13000 tons per square inch, and that Poisson's ratio is 0.25. (Poisson's ratio is the ratio of the numerical magnitude of the lateral strain to that of the longitudinal strain.)

1981 Paper 2 Q8
D: 1500.0 B: 1500.0

Let \(X_1, ..., X_m\) be independent normally distributed random variables, with mean \(\mu\) and variance \(\sigma^2\). Let \(\alpha > 0\), and let \(Y\) be the number of observations falling in the range \((\mu-\alpha, \mu+\alpha)\). Give an expression for \(P(Y = r)\) for \(r = 0, 1, ..., m\). If \(\alpha = \frac{1}{2}\) and \(m = 10\), what is \(P(Y \leq 2)\)? (You may leave your answer in a form suitable for calculation.)

1984 Paper 2 Q9
D: 1500.0 B: 1500.0

Let \(X_1, X_2, ...\) be independent random variables uniformly distributed on \([1, 2]\). Show that \[\Pr(a < (X_1X_2 ... X_n)^{1/n} < b) \to 1 \text{ as } n \to \infty\] if and only if \(a < 4/e < b\). You may use any results from probability theory that you know.

1980 Paper 4 Q10
D: 1500.0 B: 1500.0

A businessman puts money into one deal each year. At the end of each year, the deal has either fallen through, in which case he loses his entire outlay, or has been successful, in which case he recovers twice his outlay. The outcomes of different deals are independent, each one having a chance \(p > \frac{1}{2}\) of success, and the businessman can choose how much money he wishes to risk each year, within the limits of his current fortune. A possible strategy for the businessman is to put his entire current fortune into each deal. If his initial fortune is \(X_0\), show that his fortune \(X_n\) after \(n\) years' dealing has expectation \((2p)^nX_0\), but that he is certain to lose all his money eventually. A more conservative strategy is for him to invest a fixed proportion \(s\) of his current fortune in each deal, where \(0 < s < 1\). Show that the expected value of \(X_n\) is now only \([1+s(2p-1)]^nX_0\), but that, by choosing \(s\) suitably, he can guarantee an eventual compound growth rate of \(\alpha = 2p^p(1-p)^{1-p}\), in the sense that \(\frac{1}{n} \log_e (X_n/X_0)\) tends to \(\log_e \alpha\) as \(n \to \infty\). [Hint. Show that, with an appropriate definition of \(Y_n\), \(X_{n+1}\) can be expressed in the form \(X_{n+1} = X_n (1+s)^{Y_n}(1-s)^{1-Y_n}\). You may assume the 'law of large numbers' in the form: if \(Y_1, Y_2, \ldots\) are independent and identically distributed random variables with mean \(\beta\), then \(\frac{1}{n} \sum_{j=1}^n Y_j \to \beta\) as \(n \to \infty\).]
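
By the hint and the law of large numbers, \(\frac{1}{n}\log_e(X_n/X_0)\) tends to \(p\log_e(1+s) + (1-p)\log_e(1-s)\). The sketch below (hypothetical function names, with \(p = 0.6\) as an illustrative assumption) evaluates this limit, uses the stationary point \(s = 2p - 1\) obtained by differentiating it, and compares the result with \(\log_e \alpha\).

```python
from math import log


def growth_rate(s: float, p: float) -> float:
    """Limit of (1/n) log(X_n / X_0) when a proportion s is staked each year."""
    return p * log(1 + s) + (1 - p) * log(1 - s)


def best_proportion(p: float) -> float:
    """Stake maximising growth_rate: p/(1+s) = (1-p)/(1-s) gives s = 2p - 1."""
    return 2 * p - 1


def claimed_rate(p: float) -> float:
    """log of alpha = 2 p^p (1-p)^(1-p), the rate stated in the problem."""
    return log(2) + p * log(p) + (1 - p) * log(1 - p)


if __name__ == "__main__":
    p = 0.6  # illustrative assumption; the problem only requires p > 1/2
    s = best_proportion(p)
    print(s, growth_rate(s, p), claimed_rate(p))
```

Staking everything corresponds to \(s \to 1\), where the second logarithm tends to \(-\infty\); this matches the first part of the question, in which the expected fortune grows but ruin is certain.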

1973 Paper 3 Q10
D: 1500.0 B: 1500.0

(i) The real numbers \(a_1, \ldots, a_n\) satisfy the constraint \begin{equation*} \sum_{i=1}^{n} a_i = C, \tag{*} \end{equation*} where \(C\) is a given constant. Show that \(\sum_{i=1}^{n} a_i^2\) is minimised subject to (*) by \(a_i = C/n\) for \(i = 1, \ldots, n\). (ii) In an experiment to determine the mean body-weight \(\mu\) of a species of moth, \(n\) moths of this species are weighed, and their weights \(x_1, \ldots, x_n\) recorded. It may be assumed that \(x_1,\ldots, x_n\) are uncorrelated and have common mean \(\mu\) and common variance \(\sigma^2\), where \(\sigma^2\) is known. We wish to find the best linear unbiased estimator of \(\mu\), that is the function \(\sum_{i=1}^{n} a_i x_i\) which has expectation \(\mu\) and smallest variance. Assuming (i), find the appropriate values of the set \(\{a_i\}\), and find the variance of the best linear unbiased estimator.

1981 Paper 3 Q9
D: 1500.0 B: 1500.0

Independent random variables \(X_1, \ldots, X_n\) have a uniform distribution on the interval \([\theta-\frac{1}{2}, \theta+\frac{1}{2}]\), where \(\theta\) is unknown. Let \[V = \max\{X_1, \ldots, X_n\}, \quad U = \frac{1}{n}\sum_{i=1}^{n} X_i.\] Show that if \(\theta = \theta_0\), \[P(V \leq x) = \left (x-\theta_0+\frac{1}{2} \right)^n \quad \text{for } \theta_0-\frac{1}{2} \leq x \leq \theta_0+\frac{1}{2}.\] Calculate the mean and variance of \(V\) and \(U\). If \(W = V - c\), where \(c\) is chosen so that \(EW = \theta_0\), compare \(\text{var}\ U\) and \(\text{var}\ W\). Suppose that \(n\) is large. Which of \(U\), \(W\) would you use if you wished to estimate \(\theta\), and why?

Solution
\begin{align*} P(V \leq x) &= P(\max(X_1, \ldots, X_n) \leq x) \\ &= P(X_1 \leq x) \cdots P(X_n \leq x) \\ &= \frac{\left ( x - \left ( \theta_0 - \frac12\right) \right)}{1} \cdots \frac{\left ( x - \left ( \theta_0 - \frac12\right) \right)}{1} \\ &= \left ( x - \left ( \theta_0 - \frac12\right) \right)^n \end{align*} \begin{align*} \mathbb{E}(U) &= \mathbb{E} \left ( \frac{1}{n}\sum_{i=1}^{n} X_i\right) \\ &= \frac{1}{n}\sum_{i=1}^{n} \mathbb{E} \left ( X_i\right) \\ &=\frac{1}{n}\sum_{i=1}^{n}\theta_0 \\ &= \theta_0 \\ \textrm{Var}(U) &= \textrm{Var}\left ( \frac{1}{n}\sum_{i=1}^{n} X_i\right) \\ \\ &= \frac1{n^2} \sum_{i=1}^{n} \textrm{Var}\left ( X_i\right) \\ &= \frac{1}{n^2} \frac{n}{12} \\ &= \frac{1}{12n} \\ \\ \mathbb{E}(V) &= \int_{\theta_0-\frac12}^{\theta_0+\frac12}xn \left ( x - \left ( \theta_0 - \frac12\right) \right)^{n-1} \d x \\ &= n\int_0^1 \left (u + \left ( \theta_0 - \frac12\right) \right)u^{n-1} \d u \\ &= n \left [ \frac{u^{n+1}}{n+1} + \left ( \theta_0 - \frac12\right) \frac{u^{n}}{n}\right]_0^1 \\ &= \frac{n}{n+1} + \theta_0 - \frac12 \\ \textrm{Var}(V) &= \mathbb{E}(V^2) - \left [ \mathbb{E}(V) \right]^2 \\ &= \int_{\theta_0-\frac12}^{\theta_0+\frac12}x^2n \left ( x - \left ( \theta_0 - \frac12\right) \right)^{n-1} \d x - \left (\frac{n}{n+1} + \theta_0 - \frac12 \right)^2 \\ &= n\int_0^1 \left (u + \theta_0 - \frac12 \right)^2u^{n-1} \d u - \left (\frac{n}{n+1} + \theta_0 - \frac12 \right)^2 \\ &= n \left [ \frac{u^{n+2}}{n+2} + (2 \theta_0 - 1) \frac{u^{n+1}}{n+1} + \left ( \theta_0 - \frac12 \right)^2 \frac{u^n}{n}\right]_0^1 - \left (\frac{n}{n+1} + \theta_0 - \frac12 \right)^2 \\ &= \frac{n}{n+2} + \frac{n}{n+1} \left ( 2\theta_0 - 1 \right) + \left ( \theta_0 - \frac12 \right)^2 - \left (\frac{n}{n+1} + \theta_0 - \frac12 \right)^2 \\ &= \frac{n}{n+2} - \frac{n^2}{(n+1)^2} \\ &= \frac{n(n+1)^2-n^2(n+2)}{(n+2)(n+1)^2} \\ &= \frac{n^3+2n^2+n - n^3-2n^2}{(n+2)(n+1)^2} \\ &= \frac{n}{(n+2)(n+1)^2} \end{align*} Therefore the variance of 
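\(W = V - c\), which equals the variance of \(V\), is of order \(1/n^2\), while that of \(U\) is of order \(1/n\). Both estimators are unbiased, so for large \(n\) we should use \(W\).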
1970 Paper 4 Q9
D: 1500.0 B: 1500.0

If \(x_1, x_2, \ldots, x_n\) is a random sample from the uniform distribution with density function \(f(x) = 1/\theta\), \(0 < x < \theta\), where \(\theta\) is an unknown parameter:

  1. find the maximum likelihood estimate \(\hat{\theta}\) of \(\theta\),
  2. find the density function of \(\hat{\theta}\) and hence find its mean.
[The maximum likelihood estimate of a parameter \(\theta\) based on a random sample \(x_1, x_2, \ldots, x_n\) from a distribution with density function \(f(x, \theta)\) is that value of \(\theta\) which maximizes the likelihood function \(f(x_1, \theta)f(x_2, \theta) \ldots f(x_n, \theta)\).]

Solution
  1. \begin{align*} L(\theta) &= f(x_1, \theta)f(x_2, \theta) \ldots f(x_n, \theta) \\ &= \begin{cases} \frac{1}{\theta^n} & \text{ if } \theta \geq \max(x_1, x_2, \ldots) \\ 0 & \text{otherwise} \end{cases} \end{align*} which is clearly maximised if \(\theta = \max(x_1, x_2, \ldots)\)
  2. \begin{align*} F_{\hat{\theta}} (t) &= P(\hat{\theta} < t) \\ &= P(\max(x_1, \ldots) < t) \\ &= P(x_1 < t)P(x_2 < t) \cdots P(x_n < t) \\ &= \left(\frac{t}{\theta}\right)^n \\ &= \frac{t^n}{\theta^n} \end{align*} Therefore \(f_{\hat{\theta}}(t) = n\theta^{-n}t^{n-1}\). \begin{align*} \mathbb{E}(\hat{\theta}) &= \int_0^{\theta} tf_{\hat{\theta}}(t) \d t \\ &= \int_0^{\theta} n\theta^{-n}t^{n} \d t \\ &= \frac{n}{n+1} \theta \end{align*}
1973 Paper 4 Q10
D: 1500.0 B: 1500.0

\(X\) and \(Y\) are discrete valued random variables, and \[\text{Pr}(X = x, Y = y) = p(x, y), \quad \text{say}.\] The expectation of \(X\) conditional on the value of \(Y\) being \(y\) is defined as \(\mu(y)\), where \[\mu(y) = E(X|Y = y) = \sum_x x \frac{p(x, y)}{b(y)},\] and \[b(y) = \text{Pr}(Y = y),\] so that \[b(y) = \sum_x p(x, y).\] Show that \(E(X) = \sum\mu(y)b(y)\). By taking \(Z = X^2\), find an expression for the variance of \(X\) in terms of \(E(X|Y = y)\) and \(E(X^2|Y = y)\). An ornithologist observes that the number of eggs laid by a sparrow in a nest is distributed approximately as a Poisson random variable with mean \(\lambda\). He suspects that any egg has the same probability \(p\) of hatching, and that they are independent with respect to hatching. Denote by \(X\) the number of fledgelings from a nest, and denote by \(Y\) the number of eggs laid in that nest. Find expressions for \[E(X|Y = y) \quad \text{and} \quad E(X^2|Y = y)\] and hence find the (unconditional) mean and variance of \(X\). A second ornithologist contests that the eggs in a nest are not independent with respect to hatching. He suspects that either, with probability \(\pi\), the whole clutch of eggs hatches, or, with the probability \(1-\pi\), none of the clutch hatches. What are the mean and variance of \(X\) with this model? If you looked at a large sample of sparrows' nests, and found that the mean number of fledgelings per nest was 4, and the sample variance was 12, which ornithologist would you take to be more expert?
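
The comparison in the last part can be explored with a short calculation. The sketch below is an illustration with hypothetical function names; the closed forms in the comments are derived here from the conditional moments \(E(X|Y=y)\) and \(E(X^2|Y=y)\) of each model rather than quoted from the source.

```python
def moments_independent(lam: float, p: float):
    """Mean and variance of X when each egg hatches independently.

    E(X | Y = y) = y*p and E(X^2 | Y = y) = y*p*(1 - p) + (y*p)**2,
    which give mean lam*p and variance lam*p.
    """
    mean = lam * p
    return mean, mean


def moments_all_or_nothing(lam: float, pi: float):
    """Mean and variance of X when the whole clutch hatches with probability pi.

    E(X | Y = y) = pi*y and E(X^2 | Y = y) = pi*y**2, and E(Y^2) = lam + lam**2.
    """
    mean = pi * lam
    second_moment = pi * (lam + lam**2)
    return mean, second_moment - mean**2


def fit_all_or_nothing(mean: float, variance: float):
    """Solve pi*lam = mean and pi*lam*(1 + lam*(1 - pi)) = variance for (lam, pi)."""
    lam = variance / mean + mean - 1
    return lam, mean / lam


if __name__ == "__main__":
    print(moments_independent(lam=8.0, p=0.5))  # illustrative values with mean 4
    print(fit_all_or_nothing(mean=4.0, variance=12.0))
```

Under the first model the variance must equal the mean, so the observed pair (mean 4, variance 12) can only be matched by the second model, which suggests the second ornithologist's description fits the data better.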

1975 Paper 4 Q11
D: 1500.0 B: 1500.0

The random variables \(X_1, X_2, \ldots, X_n\) are independent and have identical probability distributions. The function \(\phi\) of \(n\) arguments is such that \(\phi(X_1, X_2, \ldots, X_n)\) has expectation \(\mu\) and variance \(\sigma^2\). Furthermore, \(\phi\) is not symmetric, so that there is at least one pair of suffixes \((i, j)\) such that with positive probability \[\phi(X_1, \ldots, X_i, \ldots, X_j, \ldots, X_n) \neq \phi(X_1, \ldots, X_j, \ldots, X_i, \ldots, X_n).\] The symmetrisation \(\psi\) of \(\phi\) is defined by \[\psi(X_1, \ldots, X_n) = \frac{1}{n!}\sum\phi(X_{i_1}, \ldots, X_{i_n})\] where the summation is over all \(n!\) permutations \((i_1, i_2, \ldots, i_n)\) of \((1, 2, \ldots, n)\). Prove that \(\psi(X_1, X_2, \ldots, X_n)\) has expectation \(\mu\) but variance less than \(\sigma^2\). [A simpler version, using exactly the same strategy of proof, has \(n = 2\).]

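Solution
Since the \(X_i\) are independent and identically distributed, \(\phi(X_{i_1}, \ldots, X_{i_n})\) has the same distribution as \(\phi(X_1, \ldots, X_n)\) for every permutation \((i_1, \ldots, i_n)\), so each term has mean \(\mu\) and second moment \(\sigma^2+\mu^2\). \begin{align*} && \mathbb{E}\left (\psi(X_1, \ldots, X_n) \right) &=\mathbb{E}\left (\frac{1}{n!}\sum\phi(X_{i_1}, \ldots, X_{i_n}) \right) \\ &&&=\frac{1}{n!}\sum\mathbb{E}\left (\phi(X_{i_1}, \ldots, X_{i_n}) \right) \\ &&&=\frac{1}{n!}\sum\mu \\ &&&= \mu \end{align*} In the following, \(\sum_{(i) \neq (j)}\) denotes the sum over the \(n!(n!-1)\) ordered pairs of distinct permutations \((i_1, \ldots, i_n)\) and \((j_1, \ldots, j_n)\). \begin{align*} && \textrm{Var}\left (\psi(X_1, \ldots, X_n) \right) &=\mathbb{E}\left (\left (\frac{1}{n!}\sum\phi(X_{i_1}, \ldots, X_{i_n}) \right)^2 \right) - \mu^2 \\ &&&=\frac{1}{(n!)^2}\left ( \sum\mathbb{E}\left (\phi(X_{i_1}, \ldots, X_{i_n}) ^2\right) + \sum_{(i) \neq (j)}\mathbb{E}\left (\phi(X_{i_1}, \ldots, X_{i_n}) \phi(X_{j_1}, \ldots, X_{j_n}) \right) \right) -\mu^2\\ &&&=\frac{1}{n!} (\sigma^2+\mu^2) + \frac{1}{(n!)^2}\sum_{(i) \neq (j)}\mathbb{E}\left (\phi(X_{i_1}, \ldots, X_{i_n}) \phi(X_{j_1}, \ldots, X_{j_n}) \right) -\mu^2 \\ &&&\leq \frac{1}{n!} (\sigma^2+\mu^2) + \frac{1}{(n!)^2}\sum_{(i) \neq (j)} \sqrt{\mathbb{E}\left (\phi(X_{i_1}, \ldots, X_{i_n})^2 \right)\mathbb{E}\left ( \phi(X_{j_1}, \ldots, X_{j_n})^2 \right)} -\mu^2 \\ &&&=\frac{1}{n!} (\sigma^2+\mu^2) + \frac{n!(n!-1)}{(n!)^2} (\sigma^2+\mu^2) -\mu^2 \\ &&&= \sigma^2+\mu^2-\mu^2 \\ &&&= \sigma^2 \end{align*} The inequality is Cauchy-Schwarz applied to each cross term. Equality for a given pair of permutations would require the two factors to be equal with probability 1. Since \(\phi\) is not symmetric, there is at least one pair for which they differ with positive probability, so the inequality is strict.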
1976 Paper 4 Q11
D: 1500.0 B: 1500.0

A population contains individuals of \(k\) types, in equal proportions. Among type \(i\), a quantity \(X\) is distributed with mean \(\mu_i\) and variance \(\sigma^2\) (the same for all \(i\)), for \(i = 1, 2, ..., k\). It is desired to estimate the mean of \(X\) over the whole population. Two methods of estimation are considered. In the first a random sample of size \(n\) (with replacement) is drawn from each of the \(k\) types, and in the second a random sample of size \(kn\) is drawn (with replacement) from the whole population without regard to type. In each case the mean of the \(kn\) \(X\)-values is computed. Show that the expectation of the resulting estimate is in each case \[\mu = \frac{1}{k}\sum_{i=1}^k \mu_i,\] but that the second estimate has variance greater than that of the first by an amount \[\frac{1}{k^2n}\sum_{i=1}^k (\mu_i-\mu)^2.\]

Solution
Let \(X_{i,j}\) (\(j = 1, \ldots, n\)) be the independent observations drawn from type \(i\), so that \(\mathbb{E}(X_{i,j}) = \mu_i\) and \(\textrm{Var}(X_{i,j}) = \sigma^2\); no assumption about the shape of the distribution is needed. Let \(Y_1, \ldots, Y_{kn}\) be the independent observations drawn from the whole population, and let \(T\) denote the type of the individual giving \(Y_i\), which is equally likely to be any of \(1, \ldots, k\). In the first case: \begin{align*} && \mathbb{E}\left ( \frac{1}{kn} \sum_{i=1}^k \sum_{j=1}^n X_{i,j} \right) &=\frac{1}{kn} \sum_{i=1}^k \sum_{j=1}^n\mathbb{E}\left ( X_{i,j} \right) \\ &&&= \frac{1}{kn} \sum_{i=1}^k \sum_{j=1}^n\mu_i \\ &&&= \frac{1}{kn} \sum_{i=1}^k n\mu_i \\ &&&= \frac{1}{k}\sum_{i=1}^k \mu_i \end{align*} In the second case, conditioning on the type: \begin{align*} && \mathbb{E}\left ( \frac{1}{kn} \sum_{i=1}^{kn} Y_i \right) &=\frac{1}{kn} \sum_{i=1}^{kn} \mathbb{E} \left (Y_i \right) \\ &&&=\frac{1}{kn} \sum_{i=1}^{kn} \mathbb{E} \left (\mathbb{E} \left (Y_i |Y_i \text{ is of type }T\right)\right) \\ &&&=\frac{1}{kn} \sum_{i=1}^{kn} \mathbb{E} \left (\mu_T\right) \\ &&&= \frac{1}{kn} \sum_{i=1}^{kn} \frac1k \left ( \sum_{j=1}^{k}\mu_j\right) \\ &&&= \frac1k \sum_{j=1}^{k}\mu_j \end{align*} so both expectations equal \(\mu\). For the variances, independence gives in the first case: \begin{align*} && \textrm{Var}\left ( \frac{1}{kn} \sum_{i=1}^k \sum_{j=1}^n X_{i,j} \right) &= \frac{1}{k^2n^2} \sum_{i=1}^k \sum_{j=1}^n \textrm{Var}\left ( X_{i,j} \right) \\ &&&= \frac{1}{k^2n^2} \sum_{i=1}^k \sum_{j=1}^n \sigma^2 \\ &&&= \frac{\sigma^2}{kn} \end{align*} and in the second case, using \(\mathbb{E}(Y_i) = \mu\): \begin{align*} && \textrm{Var}\left ( \frac{1}{kn} \sum_{i=1}^{kn} Y_i \right) &= \frac{1}{k^2n^2}\sum_{i=1}^{kn} \textrm{Var}\left ( Y_i \right) \\ &&&= \frac{1}{k^2n^2}\sum_{i=1}^{kn} \mathbb{E}\left (Y_i^2 \right)-\frac1{kn}\mu^2 \\ &&&= \frac{1}{k^2n^2}\sum_{i=1}^{kn} \mathbb{E} \left ( \mathbb{E}\left (Y_i^2 | Y_i \text{ is of type }T \right)\right)-\frac1{kn}\mu^2 \\ &&&= \frac{1}{kn}\sum_{j=1}^{k} \frac{1}{k} \left (\sigma^2+\mu_j^2 \right)-\frac1{kn}\mu^2 \\ &&&= \frac{\sigma^2}{kn}+\frac{1}{k^2n}\sum_{j=1}^k\mu_j^2-\frac1{kn}\mu^2 \\ &&&= \frac{\sigma^2}{kn}+\frac{1}{k^2n}\sum_{j=1}^k\left(\mu_j^2-\mu^2\right) \end{align*} Finally, since \(\sum_{i=1}^k \mu_i = k\mu\), \begin{align*} \sum_{i=1}^k (\mu_i-\mu)^2 &= \sum_{i=1}^k \mu_i^2-2\sum_{i=1}^k\mu_i\mu+k\mu^2 \\ &= \sum_{i=1}^k \mu_i^2-2k\mu^2+k\mu^2 \\ &= \sum_{i=1}^k \mu_i^2-k\mu^2 \\ &= \sum_{i=1}^k \left(\mu_i^2-\mu^2\right) \end{align*} so the second variance is \(\frac{\sigma^2}{kn}+\frac{1}{k^2n}\sum_{i=1}^k (\mu_i-\mu)^2\), which exceeds the first by the stated amount, as required.
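
The variance comparison can be checked exactly on a small made-up population. The sketch below assumes three types with means 1, 4 and 7, within-type values \(\mu_i \pm 2\) with equal probability (so \(\sigma^2 = 4\)), and \(n = 5\); all of these numbers are illustrative assumptions rather than data from the question.

```python
from fractions import Fraction


def variance(values):
    """Exact variance of a uniform distribution on the listed values."""
    m = Fraction(sum(values), len(values))
    return sum((v - m) ** 2 for v in values) / len(values)


mus = [Fraction(1), Fraction(4), Fraction(7)]  # hypothetical type means
s = Fraction(2)                                # within-type sd, so sigma^2 = 4
n = 5                                          # sample size per type
k = len(mus)

# Within type i, X equals mu_i - s or mu_i + s with equal probability,
# so a draw from the whole population is uniform on these 2k values.
population = [m + d for m in mus for d in (-s, s)]
mu = sum(mus) / k

var_stratified = s ** 2 / (k * n)            # first method: sigma^2 / (kn)
var_simple = variance(population) / (k * n)  # second method: Var(Y) / (kn)
excess = sum((m - mu) ** 2 for m in mus) / (k ** 2 * n)

print(var_stratified, var_simple, excess)
```

With these assumed values the gap between the two variances matches the expression \(\frac{1}{k^2n}\sum_i(\mu_i-\mu)^2\) exactly.
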

Showing 1-17 of 17 problems
1971 Paper 2 Q6
D: 1500.0 B: 1500.0

A hospital buys batches of a certain tablet from a pharmaceutical company. A tablet is considered unsatisfactory if it contains more than 1 microgram of arsenic. It is known that within any batch of tablets the arsenic content is normally distributed with standard deviation 0.05 micrograms about a mean which depends on the batch. From every batch the hospital randomly selects \(n\) tablets for analysis, and rejects the batch if the mean arsenic content of the \(n\) tablets is greater than \(C\). What values should be chosen for \(n\) and \(C\) if the desired chances of rejecting batches with 0.1\% and 1\% of defective tablets, respectively, are 20\% and 90\%?

1972 Paper 2 Q15
D: 1500.0 B: 1500.0

A firm needs to buy a large number of metal links which must stand a load of 1.20 tons weight. There are two grades on the market, the load under which a link will break being in each case normally distributed with parameters as follows:

\begin{tabular}{l c c} & Mean & Standard deviation\\ \hline Grade 1 & 1.20 & 0.10\\ Grade 2 & 1.60 & 0.20 \end{tabular}
Grade 1 cost 80p each, grade 2 cost £1.00 each, and it costs the firm £10.00 in damaged equipment each time a link breaks. Which grade should the firm buy? Grade 3 now comes on the market at 95p. The critical load is again normally distributed with standard deviation 0.10, but the mean is unknown. Testing to destruction ten of these, chosen randomly, gives figures for the critical load:
\(1.17 \quad 1.29 \quad 1.26 \quad 1.31 \quad 1.55 \quad 1.36 \quad 1.42 \quad 1.13 \quad 1.32 \quad 1.29\)
Making use of a suitable 95\% confidence limit, what advice would you give the firm as to whether or not they should buy the new grade?

1973 Paper 2 Q9
D: 1500.0 B: 1500.0

An anthropologist encounters a large group of savages in the jungle. He knows that either they all come from tribe \(A\) or they all come from tribe \(B\). In both cases their heights are independently distributed; if they are from \(A\) then the heights are normal with mean \(\mu_A = 60\) inches and standard deviation \(\sigma = 5\) inches; if they are from \(B\) the heights are normal with mean \(\mu_B = 66\) inches, and standard deviation \(\sigma = 5\) inches. In order to decide to which tribe they belong, the anthropologist uses a rule of the following form. He assigns them to \(A\) if \(\overline{x}_n < \xi\), and otherwise to \(B\), where \(\overline{x}_n\) is the mean of the heights of \(n\) savages. Show how he should choose \(\xi\) in order that \(\alpha\), the probability of wrongly assigning them to \(B\), is 0.05. Find the corresponding value of \(\beta\), the probability of wrongly assigning them to \(A\), and find how large \(n\) should be in order that \(\beta\) is 0.01 or less. [You may assume that \(\overline{x}_n\) has a normal distribution, whose mean depends on whether the savages are from \(A\) or \(B\).]

1974 Paper 2 Q9
D: 1500.0 B: 1500.0

In a sample of 50 male undergraduates at Cambridge in 1900 the mean height was found to be 68.93 in. In a sample of 25 male undergraduates at Cambridge in 1974 the mean height was 70.66 in. It may be assumed that the heights of male undergraduates are always normally distributed with a standard deviation of 2.5 in. Is it reasonable to suppose that there has been an increase in the average height of male undergraduates at Cambridge over the past 74 years? Explain carefully the reasoning you use.

1975 Paper 2 Q9
D: 1500.0 B: 1500.0

An entomologist measures the lengths of 8 specimens of each of two closely related species of bees. His measurements of species \(A\) and of species \(B\) have mean values 15 mm and 17 mm respectively. If he believes that in each species length is normally distributed with standard deviation 2 mm, should he conclude that the mean lengths of the two species differ? What procedure should he use if he does not know the standard deviation (though still believing it to be the same for both species)?

1978 Paper 2 Q8
D: 1500.0 B: 1500.0

A tug-of-war contest is to be held between two colleges. The weights of students in College \(A\) follow a normal distribution with mean 140 lb and standard deviation 8 lb. Thanks to the superiority of its kitchens, the weights of students in College \(B\) follow a normal distribution with mean 150 lb and standard deviation 6 lb. Teams are chosen by selecting \(n\) students at random from each college. How large must \(n\) be in order to ensure that with probability at least 0.9 the combined weight of the College \(B\) team exceeds that of the College \(A\) team by at least 50 lb?
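
One way to explore this question numerically is sketched below. It assumes the \(2n\) weights are independent, so the difference of the team totals is normal with mean \(10n\) and variance \(n(8^2 + 6^2)\), and it searches for the smallest \(n\) meeting the requirement. The function name `prob_b_exceeds_a_by` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_b_exceeds_a_by(n, margin=50.0):
    """P(total weight of B team - total weight of A team >= margin)."""
    mean = n * (150 - 140)            # mean of the difference of totals
    sd = sqrt(n * (8 ** 2 + 6 ** 2))  # sd of the difference, by independence
    return 1 - NormalDist(mean, sd).cdf(margin)


# Smallest team size for which the probability reaches 0.9.
n = 1
while prob_b_exceeds_a_by(n) < 0.9:
    n += 1

print(n, prob_b_exceeds_a_by(n))
```

The loop relies on the probability increasing with \(n\) once \(10n\) exceeds 50, which holds here because the mean grows like \(n\) and the standard deviation only like \(\sqrt{n}\).
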

1979 Paper 2 Q8
D: 1500.0 B: 1500.0

An experiment was conducted to investigate the effect of a new fertilizer on the yield of tomato plants. Ten plants were grown using the new fertilizer, and ten using the one previously recommended, giving yields (in kg):

\begin{array}{l|cccccccccc} \text{New} & 1.5 & 1.9 & 1.7 & 1.8 & 1.5 & 2.0 & 2.0 & 1.8 & 1.9 & 1.8 \\ \text{Old} & 1.4 & 1.3 & 1.3 & 1.5 & 1.8 & 1.3 & 1.1 & 1.3 & 1.4 & 1.6 \\ \end{array}
Assuming that the yields are normally and independently distributed, with means \(\mu_N\) for plants having the new fertilizer and \(\mu_0\) for those having the old one, and with standard deviation 0.3 kg whichever fertilizer was used, test whether or not there is evidence that the new fertilizer is an improvement on the old one. How would you estimate the standard deviation of the yield of a tomato plant if it was not known to be 0.3 kg?

1980 Paper 2 Q7
D: 1500.0 B: 1500.0

The average weight in grams of the contents of a sachet of instant mashed potato varies between batches, because of the variable quality of the synthetic feedstock. Within a given batch, the weights of the sachets are independently and normally distributed, with common unknown mean \(m\) and standard deviation \(0 \cdot 1\). In order to check the weight of a given batch, the manufacturer weighs the contents of 25 sachets, obtaining an average weight of \(4 \cdot 92\). Does this give him good grounds for rejecting the hypothesis that \(m\) is really 5? He now decides upon the policy of rejecting a batch whenever the average weight of a sample of \(N\) sachets falls below \(T\). If \(N\) and \(T\) are to be chosen so that the probabilities of wrongly rejecting a batch with \(m = 5\) and of wrongly accepting a batch with \(m = 4 \cdot 95\) are both less than \(0 \cdot 05\), what values would you choose to make \(N\) as small as possible?

1967 Paper 3 Q12
D: 1500.0 B: 1500.0

A manufacturer is asked to supply steel tubing in lengths of 10 feet. Several samples are obtained from him and the mean lengths in feet of four samples each of 16 tubes found to be as follows: $$10 \cdot 16; \quad 10 \cdot 38; \quad 10 \cdot 31; \quad 10 \cdot 07.$$ What type of distribution would you expect mean lengths such as these to have and why? Samples are also obtained from another source, and in this case the mean lengths in feet of five samples of 16 tubes are found to be as follows: $$10 \cdot 15; \quad 10 \cdot 36; \quad 10 \cdot 11; \quad 10 \cdot 11; \quad 10 \cdot 07.$$ Assuming that both manufacturers produce tubing whose length has a standard deviation of \(0 \cdot 48\) feet, is there any evidence that either manufacturer's tubing has a mean length greater than \(10 \cdot 1\) feet? Is there any evidence that tubes supplied by the two manufacturers differ in mean length? [Let $$\Phi(x) = \int_{-\infty}^x \phi(t)\,dt,$$ where \(\phi(t) = (2\pi)^{-1/2}\exp(-\frac{1}{2}t^2)\). Then \(\Phi(-2 \cdot 58) = 0 \cdot 005\), \(\Phi(-2 \cdot 33) = 0 \cdot 01\), \(\Phi(-1 \cdot 96) = 0 \cdot 025\), \(\Phi(-1 \cdot 64) = 0 \cdot 05\).]

1968 Paper 3 Q4
D: 1500.0 B: 1500.0

For a certain mass-produced item the time that a randomly chosen individual lasts before failure may be supposed for practical purposes to be Normal with mean 100 and variance 1. A slight change is made in the conditions of manufacture, and the times until failure of \(n\) independently chosen items are determined, these being \(x_1, x_2, \ldots, x_n\). Construct a significance test at the 5\% level which would be appropriate in order to discover whether the mean length of life has increased, and explain carefully the meaning of such a procedure. (The variance may be supposed unchanged.) Determine how large \(n\) must be in order that the probability of not rejecting the null hypothesis is 0.05 if in fact the new mean is 101.

1970 Paper 3 Q3
D: 1500.0 B: 1500.0

Explain what is meant by the term 'standard error of the mean'. Matches are put into a box five at a time until the weight of the box and matches combined reaches \(M\) grams, when the box is said to be full. The weight of an individual match is normally distributed with mean \(m\) grams and standard deviation \(\sigma\) grams. The weight of an empty match-box is normally distributed with mean \(5m\) grams and standard deviation \(2\sigma\) grams. Find the value of \(M\) such that there is only one chance in a hundred that a full match-box contains fewer than 50 matches.

1970 Paper 3 Q4
D: 1500.0 B: 1500.0

Two normal distributions have different means of 100 and 110 cm and the same standard deviation of 10 cm. A random sample is to be drawn from one of these distributions on the basis of which we have to decide which distribution is being sampled. We wish to have less than 1\% probability of making an error if the distribution is really the one with mean 100 and less than 5\% probability of error in the other case. What is the smallest possible size of sample?
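
A minimal sketch of the sample-size calculation follows, assuming the decision rule "choose the distribution with mean 110 when the sample mean exceeds a cutoff \(c\)". Both error conditions can be met only if \(\sqrt{n}\,(110-100)/10\) exceeds the sum of the two normal quantiles. The function name `min_sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def min_sample_size(mu0, mu1, sigma, alpha, beta):
    """Smallest n with P(error | mu0) < alpha and P(error | mu1) < beta.

    Uses sqrt(n) * (mu1 - mu0) / sigma > z_(1-alpha) + z_(1-beta); taking the
    ceiling is enough unless the bound happens to be an exact integer.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    z_beta = std_normal.inv_cdf(1 - beta)
    return ceil(((z_alpha + z_beta) * sigma / (mu1 - mu0)) ** 2)


n = min_sample_size(mu0=100, mu1=110, sigma=10, alpha=0.01, beta=0.05)
print(n)
```
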

1971 Paper 3 Q11
D: 1500.0 B: 1500.0

The following figures are the additional hours of sleep gained by the use of a certain drug on ten patients:

\(+1.9\), \(+0.8\), \(+1.1\), \(+0.1\), \(-0.1\), \(+4.4\), \(+5.5\), \(+1.6\), \(+4.6\), \(+3.4\).
  (i) Using a significance test discuss whether these results show convincingly that the drug is an effective sleeping pill.
  (ii) In what circumstances is the test you have used valid?

1980 Paper 3 Q10
D: 1500.0 B: 1500.0

Let \(X_1, X_2, ..., X_n\) be a random sample of size \(n\) drawn from a normal distribution with variance 1 and with unknown mean \(\beta\). Show how to use the sample mean to construct an interval which contains \(\beta\) with probability approximately 0.95. Now suppose that \(X_1, X_2, ..., X_n\) are not necessarily normally distributed, but merely that their common unknown distribution is continuous (so that \(P[X_i = x] = 0\) for any real \(x\)). Show that, if \(q_{\alpha}\) is the \(\alpha\)-quantile of the unknown distribution (i.e. if \(q_{\alpha}\) is such that \(P[X_i \leq q_{\alpha}] = \alpha\)), and if \(X_{(1)}, X_{(2)}, ..., X_{(n)}\) denotes the sample \(X_1, X_2, ..., X_n\) arranged in ascending order, then \(P[X_{(r)} < q_{\alpha} < X_{(r+1)}] = \binom{n}{r}\alpha^r(1-\alpha)^{n-r}\). Use this fact to construct, in the case when \(n = 6\), an interval within which the median \(q_{1/2}\) of the distribution will lie with probability at least 0.95. Evaluate both intervals when \((X_{(1)}, X_{(2)}, ..., X_{(6)}) = (-0.92, -0.77, 0.41, 0.47, 0.48, 0.99)\).
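
The two intervals can be evaluated with a short sketch like the one below. It assumes the usual normal-theory interval \(\bar{X} \pm 1.96/\sqrt{n}\) for the first part and, for the second, sums the stated binomial probabilities over \(r\) to obtain the coverage of \((X_{(lo)}, X_{(hi)})\). The helper name `coverage` is hypothetical.

```python
from fractions import Fraction
from math import comb, sqrt


def coverage(n, lo, hi, alpha=Fraction(1, 2)):
    """P[X_(lo) < q_alpha < X_(hi)], summing C(n,r) alpha^r (1-alpha)^(n-r)
    over r = lo, ..., hi - 1."""
    return sum(comb(n, r) * alpha ** r * (1 - alpha) ** (n - r)
               for r in range(lo, hi))


sample = [-0.92, -0.77, 0.41, 0.47, 0.48, 0.99]  # already in ascending order
n = len(sample)

# Normal-theory interval for the mean (variance known to be 1).
mean = sum(sample) / n
half_width = 1.96 / sqrt(n)
normal_interval = (mean - half_width, mean + half_width)

# Distribution-free interval for the median: widest choice (X_(1), X_(6)).
median_interval = (sample[0], sample[-1])
median_coverage = coverage(n, 1, n)

print(normal_interval, median_interval, median_coverage)
```

With \(n = 6\), only the full range \((X_{(1)}, X_{(6)})\) reaches the 0.95 level; the next narrower symmetric choice \((X_{(2)}, X_{(5)})\) falls well short.
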

1982 Paper 3 Q10
D: 1500.0 B: 1500.0

The King of Smorgasbrod proposes to raise lots of money by fining those who sell underweight kippers. The weight of a kipper is normally distributed with mean 200 grams and standard deviation 10 grams. Kippers are packed in cartons of 625 and vast quantities of them are consumed. The Efficient Extortion Committee has produced three possible schemes for determining the fines.

  1. Weigh the entire carton, and fine the vendor 1500 crowns if the average weight of a kipper is less than 199 grams.
  2. Weigh 25 randomly selected kippers and fine the vendor 100 crowns if the average weight of a kipper is less than 198 grams.
  3. Remove kippers one at a time and at random from the carton until an over-weight kipper has been found, and fine the vendor \(3n(n-1)\) crowns, where \(n\) is the number of kippers removed.
Which of the EEC's schemes should the avaricious king select?

1970 Paper 4 Q8
D: 1500.0 B: 1500.0

The number of hours of sleep of a group of patients was recorded. On a subsequent night the patients were each given a sleeping pill and the number of hours of sleep was again recorded. The results were as follows:

\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|} \hline Patient number & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline Hours, before treatment & 7.0 & 6.1 & 6.0 & 3.0 & 2.7 & 3.2 & 4.1 & 7.1 & 0.1 & 2.6 \\ \hline Hours, after treatment & 7.2 & 5.9 & 6.0 & 6.2 & 4.1 & 3.5 & 4.7 & 7.0 & 0.5 & 3.5 \\ \hline \end{tabular}
The results show that most patients slept better after taking the sleeping pill. Are the figures sufficient to demonstrate beyond reasonable doubt that the pill is effective? Justify the use of any statistical technique you have employed.

1981 Paper 4 Q11
D: 1500.0 B: 1500.0

In the run up to the general election in Ruritania, two polling organisations, \(A\) and \(B\), attempted to measure the support of the two political parties, the Reds and the Blues, by each questioning a random sample of 1000 voters (out of a population of several million). The combined results were

\begin{tabular}{c c c} Polled & Red supporters & Blue supporters \\ \hline 2000 & 1056 & 944 \end{tabular}
What conclusions may be drawn from these figures? A few days after these figures were published, it was discovered that \(A\) and \(B\) had each subcontracted to polling organization \(C\) the task of sampling 500 voters in the \(A\)--\(L\) section of the alphabet, and had sampled the remaining 500 from the \(M\)--\(Z\) section itself. To cut costs, \(C\) had given the results of the same sample of 500 to \(A\) and \(B\). Is it possible to deduce anything from the figures now?

Showing 1-3 of 3 problems
1973 Paper 3 Q9
D: 1500.0 B: 1500.0

In an election there are three candidates, \(A, B\) and \(C\), and \(N\) voters. Each voter acts independently of the others, and is equally likely to vote for any one of the candidates. Each voter votes exactly once. Suppose the voters are numbered \(1, 2, \ldots, N\) and define the random variables \(A_1, \ldots, A_N, B_1, \ldots, B_N\) by \begin{align*} A_j &= \begin{cases} 1 & \text{if the \(j\)th voter votes for \(A\),} \\ 0 & \text{otherwise}; \end{cases} \\ B_j &= \begin{cases} 1 & \text{if the \(j\)th voter votes for \(B\),} \\ 0 & \text{otherwise}. \end{cases} \end{align*} Let \(X, Y\) be the total number of votes cast for \(A, B\) respectively. Find the distribution of \(X\), and the covariance of \(X\) and \(Y\). Why would you expect this covariance to be negative?

1958 Paper 4 Q111
D: 1500.0 B: 1500.0

Defining the coefficient of correlation between two variables \(x\) and \(y\) as \(\rho = \frac{E[(x-Ex)(y-Ey)]}{\sqrt{E[(x-Ex)^2]} \cdot \sqrt{E[(y-Ey)^2]}},\) where \(E\) denotes the expectation value, prove

  (i) \(|\rho| \leq 1\),
  (ii) \(\rho = 1\) if \(y = ax + b\), where \(a\) and \(b\) are constants and \(a > 0\),
  (iii) if \(X = px + q\) and \(Y = ry + s\), where \(p\), \(q\), \(r\) and \(s\) are constants and \(p\) and \(r\) are positive, the coefficient of correlation between \(X\) and \(Y\) is equal to that between \(x\) and \(y\).

1959 Paper 4 Q112
D: 1500.0 B: 1500.0

Define the coefficient of correlation between two variables. The numbers of bacteria present in 10 samples were as follows:

\begin{tabular}{c|cccccccccc} Bacteria A & 9 & 0 & 1 & 6 & 3 & 5 & 6 & 1 & 5 & 4 \\ Bacteria B & 10 & 1 & 4 & 9 & 3 & 5 & 5 & 4 & 0 & 9 \\ Bacteria C & 21 & 20 & 20 & 14 & 23 & 22 & 22 & 20 & 24 & 14 \\ \end{tabular}
Calculate the coefficients of correlation between the pairs AB, AC, BC. What meaning do you attach to your results?
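
The arithmetic for the three coefficients can be organised as in the sketch below, which assumes the usual product-moment definition \(\rho = S_{xy}/\sqrt{S_{xx}S_{yy}}\) applied to the sample. The function name `correlation` is hypothetical.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample product-moment correlation S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


a = [9, 0, 1, 6, 3, 5, 6, 1, 5, 4]
b = [10, 1, 4, 9, 3, 5, 5, 4, 0, 9]
c = [21, 20, 20, 14, 23, 22, 22, 20, 24, 14]

r_ab = correlation(a, b)
r_ac = correlation(a, c)
r_bc = correlation(b, c)
print(round(r_ab, 3), round(r_ac, 3), round(r_bc, 3))
```
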

No problems in this section yet.

Showing 1-5 of 5 problems
1974 Paper 3 Q10
D: 1500.0 B: 1500.0

The following test was designed to examine whether cards shuffled by a machine were in random order. Four red cards followed by six black cards were placed in the machine. After shuffling the first four cards were examined and the number of red cards among them was noted. This was repeated 1260 times and the results are tabled below.

\begin{array}{c|c|c|c|c|c|c} \text{No. of red cards} & 0 & 1 & 2 & 3 & 4 & \text{Total} \\ \hline \text{Frequency} & 78 & 504 & 486 & 180 & 12 & 1260 \\ \end{array}
Using the hypothesis that the machine shuffles successfully, calculate the expected frequencies from a suitable theoretical model and apply the \(\chi^2\)-test to examine the hypothesis.
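
A sketch of the calculation, assuming that under a successful shuffle the number of red cards among the first four is hypergeometric (4 red and 6 black cards, 4 drawn):

```python
from fractions import Fraction
from math import comb

observed = [78, 504, 486, 180, 12]  # frequencies for 0, 1, 2, 3, 4 red cards
trials = sum(observed)

# Hypergeometric probabilities: r red among the first 4 of 10 cards.
probs = [Fraction(comb(4, r) * comb(6, 4 - r), comb(10, 4)) for r in range(5)]
expected = [trials * p for p in probs]

# Pearson chi-squared statistic over the five cells.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print([int(e) for e in expected], float(chi2))
```

The statistic is then compared with tabulated \(\chi^2\) values on 4 degrees of freedom (five cells with a fixed total and no estimated parameters), for which the 5\% point is about 9.49.
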

1976 Paper 3 Q10
D: 1500.0 B: 1500.0

Analyse the following cases by any methods you think suitable, explaining briefly in each case the principles underlying your method.

  1. In order to test whether a certain treatment increases the strength of a material, an experiment was carried out on 12 specimens of the material, of which 6 received the treatment and 6 did not, yielding the following results. \begin{array}{lrrrrrr} \text{With treatment} & 96 & 94 & 102 & 98 & 98 & 100 \\ \text{Without treatment} & 102 & 104 & 108& 110 & 106 & 106 \end{array} (Higher figures correspond to higher strength.)
  2. The following data were obtained on the occurrences of 80 objects among 5 classes. \begin{array}{crrrrr} \text{Class} & A & B & C & D & E \\ \text{Expected numbers} & 10 & 10 & 20 & 20 & 20 \\ \text{Observed numbers} & 9 & 12 & 19 & 19 & 21 \end{array}

1977 Paper 3 Q10
D: 1500.0 B: 1500.0

Sixty cars are chosen at random from all those of makes \(A\), \(B\) and \(C\) with a mileage of approximately 10,000. Each is evaluated for tyre wear and classified as good, average or poor.

\begin{array}{|l|ccc|} \hline {\text{Tyre wear}} &&{\text{Make of car}} \\ & A & B & C \\ \hline \text{Good} & 11 & 8 & 5 \\ \text{Average} & 3 & 8 & 4 \\ \text{Poor} & 6 & 4 & 11 \\ \hline \end{array}
Test the hypothesis that there is no difference in tyre wear between the makes. The manufacturer of car \(A\) suggests that it would be more sensible to multiply the entries in the table by 4 as each car has four tyres. If this suggestion were accepted would it affect the result? Is it a sensible suggestion?

1979 Paper 3 Q10
D: 1500.0 B: 1500.0

A machine produces boiled sweets in 100 kg batches, at the rate of 10 tonnes per day. When the machine is working properly, the chance of a given batch being bad is known to be \(\frac{1}{10}\), different batches being independent: however, the machine is prone to develop a fault which increases the chance of bad batches. At the end of a day in which 2 tonnes of bad sweets were produced, the quality control officer reasoned as follows: 'The machine has produced 8 tonnes of good sweets and 2 tonnes of bad, as against the expected tonnages of 9 and 1 respectively. Hence \begin{align*} \chi^2 = (9-8)^2/9 + (2-1)^2/1 = 10/9, \end{align*} which is not significant against \(\chi_1^2\).' Is his argument satisfactory, and is there evidence for the machine being faulty? Does it matter that the same value of \(\chi^2\) is obtained on a day when all the sweets produced are good?

Solution
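His argument is erroneous. By working in tonnes he is essentially losing information about the size of the sample: the \(\chi^2\) test must be applied to counts of independent batches. A day's production of 10 tonnes is 100 batches, of which 90 are expected to be good and 10 bad, whereas 80 good and 20 bad were observed, so we should be using: \begin{align*} && \chi^2 &= \frac{(90-80)^2}{90} + \frac{(20-10)^2}{10} \\ &&&= \frac{100}{9} = 11.11 \end{align*} which is significant at the 0.1\% level against \(\chi_1^2\), so there is very strong evidence that the machine is faulty. On a day when all the sweets are good the statistic takes the same value, because the \(\chi^2\) test is two-sided: it measures how far the counts are from their expected values, not in which direction. We would still be learning that the machine is acting well outside its assumed distribution, but the departure is not in the direction that the fault described would produce.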
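
The three statistics discussed here can be reproduced with the sketch below. It uses the fact that a \(\chi_1^2\) variable is the square of a standard normal, so its upper tail probability is \(\operatorname{erfc}(\sqrt{x/2})\). The function names are hypothetical.

```python
import math


def chi2_stat(observed, expected):
    """Pearson chi-squared statistic for matching observed/expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_1_pvalue(x):
    """P(chi-squared on 1 d.f. > x) = P(|Z| > sqrt(x)) for standard normal Z."""
    return math.erfc(math.sqrt(x / 2))


in_tonnes = chi2_stat([8, 2], [9, 1])         # the officer's calculation
in_batches = chi2_stat([80, 20], [90, 10])    # counts of 100 kg batches
all_good = chi2_stat([100, 0], [90, 10])      # a day with no bad batches

for label, stat in [("tonnes", in_tonnes), ("batches", in_batches),
                    ("all good", all_good)]:
    print(label, round(stat, 2), chi2_1_pvalue(stat))
```

The batch-count and all-good statistics coincide, which illustrates why the statistic alone cannot say in which direction the machine has departed from its assumed behaviour.
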
1973 Paper 4 Q11
D: 1500.0 B: 1500.0

A survey is conducted among \(n\) people in order to examine whether there is any association between smoking and lung cancer. The following data are obtained

\begin{array}{l|cc|c} & \text{With cancer} & \text{Without cancer} & \text{Total} \\ \hline \text{Smokers} & n_{11} & n_{12} & n_{1.} \\ \text{Non-smokers} & n_{21} & n_{22} & n_{2.} \\ \hline \text{Total} & n_{.1} & n_{.2} & n \\ \end{array}
Here \(n_{11}\) is the number of people who both smoke and have cancer, with \(n_{12}\), \(n_{21}\), \(n_{22}\) defined in the obvious way. It is desired to compress the table \((n_{ij})\) into a single real number \(\delta = \delta(n_{ij})\) which indicates the association between smoking and cancer. Two medical experts are consulted on the choice of the function \(\delta(n_{ij})\). The first says that \(\delta(n_{ij})\) should be determined uniquely by the pair \((n_{11}/n_{1.}, n_{21}/n_{2.})\), and the second says that \(\delta(n_{ij})\) should be determined uniquely by the pair \((n_{11}/n_{.1}, n_{12}/n_{.2})\). Show that if we choose the function \(\delta(n_{ij})\) to satisfy both the experts, and if \((a_{ij})\), \((b_{ij})\) are two tables for which \[(a_{11}a_{22})/(a_{12}a_{21}) = (b_{11}b_{22})/(b_{12}b_{21})\] then \(\delta(a_{ij}) = \delta(b_{ij})\). If each of the \(n_{ij}\) is fairly large, describe the association between smoking and lung-cancer if
  1. \((n_{11}n_{22})/(n_{12}n_{21}) \simeq 1\),
  2. \((n_{11}n_{22})/(n_{12}n_{21}) \gg 1\).

No problems in this section yet.

Showing 1-2 of 2 problems
1979 Paper 4 Q11
D: 1500.0 B: 1500.0

Show that, if \(X_1\), \(X_2\) and \(X_3\) are independent, and have a common continuous distribution, \(\text{Pr}[X_1 > \max(X_2, X_3)] = \frac{1}{3}\). Independent random samples \(X_1, X_2, ..., X_n\) and \(Y_1, Y_2, ..., Y_n\) are drawn from two unknown continuous distributions. It is suspected that, in fact, the distribution of the \(X\)'s is the same as that of the \(Y\)'s, and, to test this, it is proposed to consider the statistic \[W = \sum_{i,j=1}^{n} Z_{ij},\] where \[Z_{ij} = \begin{cases} 1 & \text{if} \quad X_i < Y_j \\ 0 & \text{if} \quad X_i \geq Y_j. \end{cases}\] Show that, if the two distributions are indeed identical, \(W\) has expectation \(n^2/2\) and variance \(n^2(2n + 1)/12\).* Show also that if, instead, \(P[X_i > Y_j] = p \neq \frac{1}{2}\), the expectation of \(W\) is \((1-p)n^2\). Assuming that \(W\) is approximately normally distributed, investigate whether the following data support the view that values from the \(Y\) distribution are typically larger than values from the \(X\) distribution: \(X\) \(3.41\) \(3.63\) \(3.77\) \(4.00\) \(4.54\) \(4.82\) \(4.91\) \(5.08\) \(Y\) \(3.91\) \(4.70\) \(4.71\) \(4.93\) \(4.95\) \(5.12\) \(5.37\) \(5.90\) * [Hint. Find the variance of \(W\) by first evaluating the expectation of \(W^2\).]
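
For the numerical part, \(W\) and its normal approximation can be computed as in the sketch below, which takes the null mean \(n^2/2\) and variance \(n^2(2n+1)/12\) from the question and uses a one-sided test, since the alternative of interest is that the \(Y\) values tend to be larger.

```python
from math import sqrt
from statistics import NormalDist

xs = [3.41, 3.63, 3.77, 4.00, 4.54, 4.82, 4.91, 5.08]
ys = [3.91, 4.70, 4.71, 4.93, 4.95, 5.12, 5.37, 5.90]
n = len(xs)

# W counts the pairs (i, j) with X_i < Y_j.
w = sum(1 for x in xs for y in ys if x < y)

# Null moments quoted in the question.
mean_w = n * n / 2
var_w = n * n * (2 * n + 1) / 12

# Large W supports "Y typically larger than X", so use the upper tail.
z = (w - mean_w) / sqrt(var_w)
p_one_sided = 1 - NormalDist().cdf(z)

print(w, round(z, 3), round(p_one_sided, 4))
```
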

1926 Paper 1 Q602
D: 1500.0 B: 1500.0

A variable point X is taken on the side BC of a triangle ABC in which AB=AC. Points Y, Z are taken in CA, AB such that CY=BX and BZ=CX. Prove that the triangle XYZ is of constant shape and that its circumcentre coincides with the in-centre of the triangle ABC.

No problems in this section yet.