Post on 04-May-2023
Chapter 1. INTRODUCTION
1.1 Examples. The need for probability theory
What is it? Probability theory is the branch
of mathematics that treats those aspects of
systems that have random or haphazard
features.
Example A: Clinical trial of a new drug
Old drug: 82% success rate
New drug: 90% success rate claimed.
Action: Test the new drug on 100 patients,
investigate the success rate.
Notes:
1. We will need to have a ‘reasonable’
number of patients. But how many must
we use before we can be confident that
we can rely on the results?
2. Different groups of patients will give
different results, so there is need for some
theory to describe and explain what
happens.
3. How should one choose the patients who
will receive the new drug?
Example B: Opinion Polling
Political opinion polls are frequently used to
assess the current political view of the
electorate.
Simple illustration: By-election, 2 candidates.
[no ‘don’t knows’]
Action: Choose a sample of 100 individuals
from the electorate, and ask them how they
would vote ‘if the by-election were tomorrow’.
Problems:
• possibility of unrepresentative (‘biased’)
sample
• inherent error (uncertainty) of sampling
We will need the concept of sampling
at random. This in turn will require us to
understand the concept of randomness.
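As a rough illustration of sampling at random, the Python sketch below uses an invented electorate of 10,000 voters, 5,200 of whom support candidate X. These figures, and the names `electorate` and `poll`, are assumptions made for the example and do not come from the notes. Each poll draws 100 distinct voters so that every voter is equally likely to be chosen.

```python
import random

# Hypothetical electorate (illustrative figures only): 5,200 voters for
# candidate "X" and 4,800 for candidate "Y", with no "don't knows".
electorate = ["X"] * 5200 + ["Y"] * 4800

def poll(rng, size=100):
    """Draw `size` distinct voters at random and return the share voting X."""
    sample = rng.sample(electorate, size)  # sampling without replacement
    return sample.count("X") / size

rng = random.Random(1)  # fixed seed so that the run is repeatable
shares = [poll(rng) for _ in range(5)]
print(shares)  # five polls of the same electorate
```

Repeated polls of the same electorate will generally report different shares, which is the inherent sampling uncertainty listed above.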
The same features appear in many other
cases; for example:
Example C: Accident studies: before and
after remodelling a road junction.
Example D: Quality control. We have an
incoming batch of fireworks, which requires
‘accepting’ or ‘rejecting’. We need to test
a representative set of fireworks (a random
sample).
Reminder: Probability theory is the
branch of mathematics that treats those
aspects of systems that have random or
haphazard features.
BOOKS
1. G.M. Clarke and D. Cooke. A Basic Course in Statistics.
2. S. Ross. A First Course in Probability, 4th Ed.
3. P.L. Meyer. Introductory Probability and Statistical Applications.
4. W. Feller. Introduction to Probability Theory and Its Applications, Volume I.
5. Many others in the library . . .
1.2 Experiment and Event
In each example in §1.1
something is done
(e.g. drug given to patient, sample drawn from the electorate)
which has an uncertain result
(e.g. number cured, identity of sample members).
These are called, respectively, the
EXPERIMENT or TRIAL
and the
EVENT, OUTCOME or RESULT.
Experiments
• Experiments are considered to be
repeatable, but at any repetition we do
not know what the result will be.
• But we do know the set of all possible
outcomes.
DEFINITION: THE SAMPLE SPACE
The Sample Space for an experiment is the
set of all possible outcomes of that
experiment.
EXAMPLES:
(1) Experiment: toss a coin 5 times and
count the number of heads.
Sample Space: {0,1,2,3,4,5} .
(2) Experiment: As (1) above, but record
the event in order.
Sample Space: all 2^5 = 32 5-tuples of H and T;
that is:
HHHHH
HHHHT
HHHTH
· · ·
TTTTT
Both these sample spaces are finite. There
are other possibilities.
(3) Experiment: Toss a coin repeatedly, and
record the number of Ts before the first H.
Examples:
TTH → 2
TTTTTH → 5
Sample space: {0,1,2, . . .}
(4) Experiment: Record the duration (in
seconds) of the next telephone call through
the University switchboard.
Sample Space: Set of all positive real
numbers.
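The discrete sample spaces in examples (1) to (3) can be written out explicitly. The following Python sketch is one way to do so; the variable and function names are illustrative. Example (4) is continuous and cannot be listed in this way.

```python
from itertools import product

# Example (2): every ordered sequence of 5 tosses, each H or T.
ordered_space = ["".join(seq) for seq in product("HT", repeat=5)]

# Example (1): the number of heads in each sequence, collected as a set.
count_space = {seq.count("H") for seq in ordered_space}

def tails_before_first_head(seq):
    """Example (3): number of Ts before the first H in a toss sequence."""
    return seq.index("H")

print(len(ordered_space))                    # 32, i.e. 2**5
print(ordered_space[0], ordered_space[-1])   # HHHHH TTTTT
print(sorted(count_space))                   # [0, 1, 2, 3, 4, 5]
print(tails_before_first_head("TTH"))        # 2
print(tails_before_first_head("TTTTTH"))     # 5
```

The same experiment (5 tosses) gives two different sample spaces here, depending on what is recorded.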
Notes on Sample Spaces
• In example 4 the Sample Space is continuous. All the others are discrete.
• Usually an experiment involving measuring has a continuous Sample Space and one involving counting a discrete Sample Space.
• For most practical purposes we ignore the fact that strictly all Sample Spaces are discrete because of the finite precision with which numbers are recorded.
• The Sample Space of an experiment is not necessarily unique: there may well be more than one way to describe the outcomes. (cf. earlier examples on coin tossing.)
• The elements of a Sample Space form a set of outcomes.
Standard notation: Experiment E and Sample Space S.
Definition: An event is any subset of a sample
space.
Examples of events
Sample Space                 Event
1.                           number of heads is even;
                             exceeds 3;
                             is 2 or 5.
2.                           the 5-tuple contains no pair
                             of consecutive H’s;
                             is HTTTH;
                             has 2nd element equal to T.
3. Number of Ts              as 1, above.
   preceding first H
4. Length of                 between 100 and 200;
   phone call                ≤ 60;
   (in seconds)              > 500.
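Since an event is a subset of the sample space, the events listed for sample spaces 1 and 2 can be built by filtering outcomes. The Python sketch below does this; the variable names are illustrative.

```python
from itertools import product

# Sample space 1: number of heads in 5 tosses.
S1 = set(range(6))
even = {x for x in S1 if x % 2 == 0}      # {0, 2, 4}
exceeds_3 = {x for x in S1 if x > 3}      # {4, 5}
two_or_five = {2, 5}

# Sample space 2: the 32 ordered sequences of 5 tosses.
S2 = {"".join(seq) for seq in product("HT", repeat=5)}
no_consecutive_heads = {s for s in S2 if "HH" not in s}
is_HTTTH = {"HTTTH"}                      # an event with a single outcome
second_is_T = {s for s in S2 if s[1] == "T"}

# Every event is a subset of its sample space.
assert even <= S1 and exceeds_3 <= S1 and two_or_five <= S1
assert no_consecutive_heads <= S2 and is_HTTTH <= S2 and second_is_T <= S2

print(len(no_consecutive_heads), len(second_is_T))  # 13 16
```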
VENN DIAGRAMS
These diagrams are just ways of illustrating Sample Spaces, outcomes and events, and their relationships.
The Sample Space S is represented by a rectangle.
The outcomes in S are represented by points in the rectangle.
Events are collections of outcomes, so are represented by collections of points, that is, by regions in the rectangle.
Events are subsets of S. They are identified by drawing a closed curve around the relevant points.
[Figure: Venn diagram showing an event A as a closed region inside the rectangle S.]
EXAMPLE
E: Toss 4 coins, record sequence.
S        HH · ·   HT · ·   TH · ·   TT · ·
· · HH     ×        ×        ×        ×
· · HT     ×        ×        ×        ×
· · TH     ×        ×        ×        ×
· · TT     ×        ×        ×        ×
Event A: 3 or 4 H
Event B: exactly 2 H
Event C: 2nd and 3rd result H
USE OF VENN DIAGRAMS TO EXAMINE
RELATIONSHIPS BETWEEN EVENTS
When there are 3 events (A, B and C, say),
the sample space S is divided into 8 (= 2^3)
regions. The general configuration is of the
form:
[Figure: Venn diagram of three overlapping events A, B and C inside S, with the eight regions labelled R1 to R8.]
It is often convenient to identify the regions in some way.
When using Venn diagrams to prove relationships
between events, always draw them as above, to include
all possible intersections.
The 4 coins example (earlier) is a special case:
• A and B do not overlap
• If C occurs then one of A and B must occur.
This can be illustrated like this:
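[Figure: Venn diagram of the 4 coins example, with A and B drawn as non-overlapping regions of S and C lying inside their union.]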
But usually the general form is used with some regions
being recognised as empty (that is, containing no
outcomes).
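The two statements about the 4 coins example, and the idea of empty regions, can be checked by enumeration. In the Python sketch below each of the 8 regions is identified by a triple (in A?, in B?, in C?); this labelling is an illustrative choice and is not claimed to match the labels R1 to R8 in the diagram.

```python
from itertools import product

S = {"".join(seq) for seq in product("HT", repeat=4)}   # 16 outcomes
A = {s for s in S if s.count("H") >= 3}                 # 3 or 4 H
B = {s for s in S if s.count("H") == 2}                 # exactly 2 H
C = {s for s in S if s[1] == "H" and s[2] == "H"}       # 2nd and 3rd result H

# Each of the 2**3 = 8 regions is fixed by answering "in A?", "in B?", "in C?".
regions = {}
for in_a, in_b, in_c in product([True, False], repeat=3):
    regions[(in_a, in_b, in_c)] = {
        s for s in S
        if (s in A) == in_a and (s in B) == in_b and (s in C) == in_c
    }

for pattern, outcomes in regions.items():
    print(pattern, sorted(outcomes))

assert A & B == set()   # A and B do not overlap
assert C <= A | B       # if C occurs then one of A and B must occur
```

Three of the eight regions come out empty: the two inside both A and B, and the one inside C but outside both A and B.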
Events of special importance
1. S itself – the certain event.
2. 0 or ∅, a dummy event containing no
outcomes, the impossible event.
3. {x}, where x is a single element of S, i.e.
an outcome.
DEFINITION: An event comprising exactly
one outcome is often termed a simple event;
any other event is a compound event.
Reminder: Events can be viewed as sets - it’s
only the context and area of application
which differ.
1.3 Relations between sets
Notation:
Sets (Events): A, B, . . . , E1, B3, . . .
Elements (Outcomes): x, y, z, . . . , x2, z3, . . .
x ∈ A: x is contained in A
x ∉ A: x is not contained in A
A ⊂ B: A is contained in B
THE THREE BASIC RELATIONSHIPS
UNION of sets:
The union of A and B (A ∪ B) consists of all
elements in A or in B or in both A and B.
INTERSECTION of sets:
The intersection of A and B (A ∩ B) consists
of all elements in both A and B.
COMPLEMENT of a set:
The complement of a set A (written Ā) consists of
those elements not in A.
All other forms of relationship between sets
can be derived from combinations of these
three.
It is sometimes convenient to define extra
notation to represent different forms of
relationship.
Example: For two sets A and B, the notation
A − B means ‘in A but not in B’.
Now if x ∈ (A − B), then (x ∈ A) and (x ∉ B, that is, x ∈ B̄), and
so x ∈ (A ∩ B̄).
Similarly, x ∈ (A ∩ B̄) implies that x ∈ (A − B).
Hence A − B = A ∩ B̄.
So the notation A − B is not necessary, since
we can define it in terms of the three basic
relationships (Union, Intersection,
Complement).
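The identity between ‘in A but not in B’ and the intersection of A with the complement of B can be checked on a small case. In the Python sketch below the sets S, A and B and the helper `complement` are illustrative choices; a check on one example is not a proof.

```python
S = set(range(1, 11))          # illustrative sample space {1, ..., 10}
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}

def complement(event, sample_space):
    """Elements of the sample space that are not in the event."""
    return {x for x in sample_space if x not in event}

difference = {x for x in A if x not in B}      # 'in A but not in B'
via_basic_ops = A & complement(B, S)           # A intersected with complement of B

print(sorted(difference))              # [1, 2, 3]
print(difference == via_basic_ops)     # True
```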
Extensions to more than two events
Note: This needs to be done with some care.
It is not difficult to show that
(A ∪ B) ∪ C = A ∪ (B ∪ C).
It is therefore acceptable to write both of these as A ∪ B ∪ C. That is,
A ∪ B ∪ C = (A ∪ B) ∪ C = A ∪ (B ∪ C).
Similarly
A ∩ B ∩ C = (A ∩ B) ∩ C = A ∩ (B ∩ C).
But a Venn diagram shows that, in general,
(A ∩ B) ∪ C ≠ A ∩ (B ∪ C).
Compare with the algebraic use of + and × signs:
2 × 3 × 4: no problem
2 + 3 + 4: no problem
2 × 3 + 4: ambiguous
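A small concrete case makes the point about brackets. In the Python sketch below the three sets are arbitrary illustrative choices; `|` is union and `&` is intersection.

```python
A = {1, 2}
B = {2, 3}
C = {3, 4}

# Union and intersection are each associative, so brackets can be dropped.
assert (A | B) | C == A | (B | C)
assert (A & B) & C == A & (B & C)

# Mixing the two is different: the position of the brackets matters.
left = (A & B) | C     # {2} united with {3, 4}
right = A & (B | C)    # {1, 2} intersected with {2, 3, 4}
print(sorted(left))    # [2, 3, 4]
print(sorted(right))   # [2]
```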
Further relationships between events
DEFINITIONS
Events A1, A2, . . . , An are mutually exclusive
if no two can occur simultaneously, i.e. if
Ai ∩ Aj = ∅ for all i ≠ j.
Events A1, A2, . . . , An are mutually exhaustive
if at least one is certain to occur, i.e. if
A1 ∪ A2 ∪ . . . ∪ An = S.
Several important results are concerned with
events that are
mutually exclusive and exhaustive.
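The two definitions translate directly into checks on a finite collection of sets. The Python sketch below is one possible rendering; the function names and the die example are illustrative and not part of the notes.

```python
from itertools import combinations

def mutually_exclusive(events):
    """True if no two of the events share an outcome."""
    return all(a.isdisjoint(b) for a, b in combinations(events, 2))

def exhaustive(events, sample_space):
    """True if the union of the events is the whole sample space."""
    return set().union(*events) == sample_space

S = {1, 2, 3, 4, 5, 6}                 # one throw of a die (illustrative)
odd_even = [{1, 3, 5}, {2, 4, 6}]
overlapping = [{1, 2, 3, 4}, {4, 5, 6}]
incomplete = [{1, 2}, {3, 4}]

print(mutually_exclusive(odd_even), exhaustive(odd_even, S))        # True True
print(mutually_exclusive(overlapping), exhaustive(overlapping, S))  # False True
print(mutually_exclusive(incomplete), exhaustive(incomplete, S))    # True False
```

Only the first collection is both mutually exclusive and exhaustive.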
1.4 Probability
To each event arising out of an experiment, a
number (the probability of that event) is
permanently assigned.
For the various different events arising out of
the same experiment, the probabilities need
to be consistent - see the next section.
In principle, we can assign any self-consistent
numbers to these events. But, for the
probabilities to be useful, we insist that the
‘more likely’ the event is, the larger its
probability should be.
Standard notation: The probability of an
event A is denoted by
P(A) or Pr(A).
Example:
E: Two dice are thrown.
A: total score is 7.
Pr(A) = 1/6.
It is good practice to define and use clear
notation, as in this example. For each event
you consider, define notation such as A, B,
. . . or A1, A2, . . ., as appropriate. Then write
the probabilities as Pr(A), Pr(B), . . . or
Pr(A1), Pr(A2), . . ..
It is possible to write statements such as
Pr(total of 7 when two dice are thrown) = 1/6,
but this confuses the definitions of the
experiment and event.
Something to avoid doing
We all use abbreviated terminology from time to time.
However, some students get into difficulties through failing to use suitable notation.
A statement such as
Heads = 1/2
is probably not too confusing – but it is much better as
Pr(Heads) = 1/2.
Better still, make sure the reader will know what experiment was performed.
Worse, we sometimes see statements like
0 = (1/2)^3.
This is very bad practice!
What is really meant might be:
E: Three fair coins are tossed, and the number of heads is counted.
A: The number of Heads is 0.
Pr(A) = (1/2)^3.
‘Calibration’ of probability scale
In principle, one can use any scale when
assigning probabilities to events.
In practice, we invariably use a scale on the
range [0,1], as follows.
0            1/2          1
impossible   fair coin    certain
It is not unusual to have probabilities
expressed in percentage terms - but this is
essentially equivalent.
Probability and relative frequency
Suppose that an experiment is performed n
times; suppose that event A occurs on nA of
these.
The frequency of A is nA.
The relative frequency (r.f.) of A is nA/n.
The r.f. may give us some idea of how likely
A is to occur. But note that:
• The probability of A, Pr(A), is a fixed
quantity, but nA/n can and will change.
• If another set of n trials were performed, nA/n
would be very likely to be different.
• If just one further trial is performed, the r.f.
nA/n will change to (nA + 1)/(n + 1) or nA/(n + 1).
So the relative frequency nA/n cannot be used
as the definition of Pr(A).
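The effect of one further trial can be seen with exact fractions. In the Python sketch below the counts nA = 3 and n = 10 are made up for illustration, and `relative_frequency` is a hypothetical helper name.

```python
from fractions import Fraction

def relative_frequency(n_a, n):
    """Relative frequency of A: occurrences of A divided by number of trials."""
    return Fraction(n_a, n)

n_a, n = 3, 10                                      # illustrative counts
current = relative_frequency(n_a, n)                # 3/10
if_a_occurs = relative_frequency(n_a + 1, n + 1)    # 4/11
if_a_fails = relative_frequency(n_a, n + 1)         # 3/11

print(current, if_a_occurs, if_a_fails)   # 3/10 4/11 3/11
```

Whichever way the extra trial turns out, the relative frequency in this case moves away from 3/10, while Pr(A) itself is unchanged.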
1.5 The Axioms of Probability
The theory of probability is developed from
just three statements:
A1 For any event A, 0 ≤ Pr(A) ≤ 1.
A2 For the event S, Pr(S) = 1.
A3 For any two events A and B satisfying
A ∩ B = ∅,
Pr(A ∪ B) = Pr(A) + Pr(B).
Notes
1. The underlying concept of probability is
for practical purposes linked to relative
frequency but it is not the same thing.
For any event, the relative frequency can and
does vary; the probability is fixed.
2. Mathematically, the theory of probability
develops from the axioms alone, without
reference to any interpretation.
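As a concrete check, the Python sketch below takes the two dice experiment and assumes that each of the 36 ordered outcomes is assigned the same probability. That assignment and the second event B (total score 11) are assumptions made for the example. It then confirms A1, A2 and A3 for these particular events, which illustrates the axioms but does not prove anything in general.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))   # two dice: 36 ordered outcomes

def pr(event):
    """Assumed assignment: each of the 36 outcomes is equally likely."""
    return Fraction(len(event), len(S))

A = {o for o in S if sum(o) == 7}         # total score is 7
B = {o for o in S if sum(o) == 11}        # total score is 11

assert 0 <= pr(A) <= 1 and 0 <= pr(B) <= 1      # A1
assert pr(S) == 1                               # A2
assert A & B == set()                           # A and B cannot both occur
assert pr(A | B) == pr(A) + pr(B)               # A3

print(pr(A), pr(B), pr(A | B))   # 1/6 1/18 2/9
```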
1.6 Numerical assessment of probabilities
There are three main ways of doing this:
• symmetry
• limiting relative frequency
• subjective (degrees of belief)
SYMMETRY
This can be applied only when all the
outcomes of the experiment are known to be
equally likely.
The experiment is then symmetrical with
respect to its outcomes.
Calculation: If there are n outcomes and if all are equally likely, each has probability 1/n.
If an event A comprises a outcomes then Pr(A) = a/n.
So
Pr(event) = (No. of outcomes favourable to event) / (Total no. of outcomes).
Applications: mainly to
• games of chance
• sampling problems
• genetics
Probability calculations reduce to counting problems.
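A short Python sketch of the counting rule, assuming fair coins so that all sequences are equally likely. The function name `pr_by_symmetry` is illustrative. It is applied to the three coins example and to the events A, B and C of the 4 coins example.

```python
from fractions import Fraction
from itertools import product

def pr_by_symmetry(event, sample_space):
    """Favourable outcomes over total outcomes (equally likely outcomes assumed)."""
    return Fraction(len(event), len(sample_space))

# Three fair coins: the event 'number of heads is 0'.
S3 = {"".join(seq) for seq in product("HT", repeat=3)}
no_heads = {s for s in S3 if s.count("H") == 0}
print(pr_by_symmetry(no_heads, S3))        # 1/8, i.e. (1/2)**3

# Four fair coins: the events A, B and C from the Venn diagram example.
S4 = {"".join(seq) for seq in product("HT", repeat=4)}
A = {s for s in S4 if s.count("H") >= 3}             # 3 or 4 H
B = {s for s in S4 if s.count("H") == 2}             # exactly 2 H
C = {s for s in S4 if s[1] == "H" and s[2] == "H"}   # 2nd and 3rd result H
print(pr_by_symmetry(A, S4), pr_by_symmetry(B, S4), pr_by_symmetry(C, S4))
# 5/16 3/8 1/4
```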
LIMITING RELATIVE FREQUENCY
We have seen that relative frequencies cannot
be used directly to define probabilities.
We need to define Pr(A), where A is some
event based on the experiment E.
Suppose that the experiment E is repeated
over and over again. Let nA denote the
number of occurrences of A in the first n
repetitions of the experiment.
The relative frequency of A, Sn, is given by
Sn = nA/n,
and we can plot a graph of Sn against n.
[Figure: graph of Sn against n, titled ‘Convergence of S(n)’.]
Graphs of this sort ‘settle down’ to some
constant value.
The intuitive idea is that as n gets large Sn
gets close to some constant, which we then
define as the probability Pr(A).
(Note: The form of convergence is not
straightforward)
This definition is known as the Limiting
Relative Frequency definition of probability.
IMPORTANT DISTINCTION:
For any finite n, Sn is not the probability of
A. It may be thought of as an estimate of
Pr(A).
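The settling-down behaviour can be imitated by simulation. The Python sketch below assumes a fair coin (p = 0.5) and uses pseudo-random numbers with a fixed seed; the function name and parameters are illustrative. It records Sn after every repetition, so the values could be plotted against n.

```python
import random

def running_relative_frequency(n_trials, p=0.5, seed=0):
    """Simulate n_trials repetitions and return the list S_1, ..., S_n."""
    rng = random.Random(seed)
    n_a = 0
    s = []
    for n in range(1, n_trials + 1):
        if rng.random() < p:      # event A occurs on this repetition
            n_a += 1
        s.append(n_a / n)         # S_n = n_A / n
    return s

s = running_relative_frequency(10_000)
for n in (10, 100, 1_000, 10_000):
    print(n, s[n - 1])
```

For small n the printed values can be some way from 0.5; for large n they are typically close to it, but each is still only an estimate of Pr(A).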
SUBJECTIVE PROBABILITY
There are drawbacks with earlier definitions:
Symmetry: restricted to experiments where we know the outcomes have equal probability.
Lim Rel Freq: restricted to experiments which can be repeated over and over again.
If we define probability in some other way, we can extend the concept to experiments which do not have these properties.
Subjective probability does this. It represents a person’s degree of belief in some event.
Examples:
Labour will win the next general election.
The 11.09 to London today will be late.
This definition of probability is closely related to ideas of consistent betting behaviour.
CHAPTER 1 SUMMARY
• The subject of probability relates to experiments whose result is uncertain but must be one of a set of outcomes, the sample space.
• Event: events are sets and can be manipulated (∪, ∩, Ā etc)
• To each event (E) is attached a number, its probability, denoted by P(E) or Pr(E).
• Probabilities satisfy certain conditions – the axioms – on which the development of probability theory is based.
• Several interpretations of probability obey the system of axioms.