Chapter 1. INTRODUCTION - University of Kent


Chapter 1. INTRODUCTION

1.1 Examples. The need for probability theory

What is it? Probability theory is the branch of mathematics that treats those aspects of systems that have random or haphazard features.

Example A: Clinical trial of a new drug

Old drug: 82% success rate

New drug: 90% success rate claimed.

Action: Test the new drug on 100 patients, and investigate the success rate.

Notes:

1. We will need to have a ‘reasonable’ number of patients. But how many must we use before we can be confident that we can rely on the results?

2. Different groups of patients will give different results, so there is a need for some theory to describe and explain what happens.

3. How should one choose the patients who will receive the new drug?

Example B: Opinion Polling

Political opinion polls are frequently used to assess the current political view of the electorate.

Simple illustration: a by-election with 2 candidates [no ‘don’t knows’].

Action: Choose a sample of 100 individuals from the electorate, and ask them how they would vote ‘if the by-election were tomorrow’.

Problems:

• possibility of an unrepresentative (‘biased’) sample

• inherent error (uncertainty) of sampling

We will need the concept of sampling at random. This in turn will require us to understand the concept of randomness.

The same features appear in many other cases; for example:

Example C: Accident studies: before and after remodelling a road junction.

Example D: Quality control. We have an incoming batch of fireworks, which requires ‘accepting’ or ‘rejecting’. We need to test a representative set of fireworks (a random sample).
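The idea of sampling at random in Examples B and D can be sketched with Python's standard `random` module. This is a minimal illustration, assuming a hypothetical batch of 500 fireworks labelled 0 to 499 and a sample of 20; neither size comes from the notes.

```python
import random

# Hypothetical batch: 500 fireworks labelled 0..499 (sizes are illustrative).
BATCH_SIZE = 500
SAMPLE_SIZE = 20

rng = random.Random(2024)   # fixed seed so the run can be reproduced
batch = list(range(BATCH_SIZE))

# Random.sample draws without replacement; it is intended to give every
# set of SAMPLE_SIZE labels the same chance of being the one chosen.
sample = rng.sample(batch, SAMPLE_SIZE)
print(sorted(sample))
```

Each firework is tested at most once, which is why sampling without replacement is used here.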

Reminder: Probability theory is the branch of mathematics that treats those aspects of systems that have random or haphazard features.


1.2 Experiment and Event

In each example in §1.1, something is done (e.g. a drug is given to a patient, or a sample is drawn from the electorate) which has an uncertain result (e.g. the number cured, or the identity of the sample members).

The thing that is done is called the

EXPERIMENT or TRIAL

and its result is called the

EVENT, OUTCOME or RESULT.

Experiments

• Experiments are considered to be repeatable, but at any repetition we do not know what the result will be.

• But we do know the set of all possible outcomes.

DEFINITION: THE SAMPLE SPACE

The Sample Space for an experiment is the set of all possible outcomes of that experiment.

EXAMPLES:

(1) Experiment: toss a coin 5 times and count the number of heads.

Sample Space: {0, 1, 2, 3, 4, 5}.

(2) Experiment: as (1) above, but record the results in order.

Sample Space: all 2^5 = 32 5-tuples of H and T; that is:

HHHHH
HHHHT
HHHTH
...
TTTTT
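Both sample spaces can be generated from the same toss sequences, which also illustrates that the Sample Space of an experiment need not be unique. A minimal Python sketch, representing each outcome of (2) as a string such as "HHTHT" (a representation chosen here for convenience):

```python
from itertools import product

# Sample space for (2): every ordered 5-tuple of H and T, as a string.
ordered_space = ["".join(seq) for seq in product("HT", repeat=5)]

# Sample space for (1): the possible numbers of heads.
count_space = sorted({seq.count("H") for seq in ordered_space})

print(len(ordered_space))                      # 32
print(ordered_space[:3], ordered_space[-1])    # ['HHHHH', 'HHHHT', 'HHHTH'] TTTTT
print(count_space)                             # [0, 1, 2, 3, 4, 5]
```

`itertools.product` lists the tuples in the same order as the listing above, starting with HHHHH and ending with TTTTT.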

Both these sample spaces are finite. There are other possibilities.

(3) Experiment: Toss a coin repeatedly, and record the number of Ts before the first H.

Examples:

TTH → 2

TTTTTH → 5

Sample space: {0, 1, 2, ...}
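Experiment (3) can be sketched in Python. The first function reproduces the two worked examples; the second simulates the experiment, assuming (hypothetically) a fair coin. There is no upper limit on how many Ts can occur, which is why the sample space is the infinite set {0, 1, 2, ...}.

```python
import random

def tails_before_first_head(tosses):
    """Return the number of Ts before the first H in a toss sequence."""
    count = 0
    for toss in tosses:
        if toss == "H":
            return count
        count += 1
    raise ValueError("sequence contains no H")

def run_experiment(rng):
    """Toss an (assumed fair) coin until the first H; return the number of Ts."""
    count = 0
    while rng.random() >= 0.5:   # values >= 0.5 are treated as T
        count += 1
    return count

print(tails_before_first_head("TTH"))      # 2
print(tails_before_first_head("TTTTTH"))   # 5

rng = random.Random(1)
counts = [run_experiment(rng) for _ in range(10)]
print(counts)
```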

(4) Experiment: Record the duration (in seconds) of the next telephone call through the University switchboard.

Sample Space: Set of all positive real numbers.

Notes on Sample Spaces

• In example 4 the Sample Space is continuous. All the others are discrete.

• Usually an experiment involving measuring has a continuous Sample Space and one involving counting a discrete Sample Space.

• For most practical purposes we ignore the fact that strictly all Sample Spaces are discrete because of the finite precision with which numbers are recorded.

• The Sample Space of an experiment is not necessarily unique: there may well be more than one way to describe the outcomes (cf. the earlier examples on coin tossing).

• The elements of a Sample Space form a set of outcomes.

Standard notation: Experiment E and Sample Space S.

Definition: An event is any subset of a sample space.

Examples of events

For each of the sample spaces (1) to (4) above, some possible events are:

1. Number of heads: the number of heads is even; exceeds 3; is 2 or 5.

2. 5-tuple of H and T: the 5-tuple contains no pair of consecutive H’s; is HTTTH; has 2nd element equal to T.

3. Number of Ts preceding the first H: as 1, above.

4. Length of phone call (in seconds): between 100 and 200; ≤ 60; > 500.
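Since an event is just a subset of the sample space, the events for sample spaces 1 and 2 can be written directly as Python sets. In this minimal sketch the 5-tuples are represented as strings; sample space 4 is continuous, so its events cannot be listed and are written as conditions instead (the function name is hypothetical).

```python
from itertools import product

# Sample space 1: number of heads in 5 tosses.
S1 = set(range(6))
even = {x for x in S1 if x % 2 == 0}
exceeds_3 = {x for x in S1 if x > 3}
two_or_five = {2, 5}

# Sample space 2: ordered 5-tuples of H and T.
S2 = {"".join(seq) for seq in product("HT", repeat=5)}
no_consecutive_H = {w for w in S2 if "HH" not in w}
is_HTTTH = {"HTTTH"}
second_is_T = {w for w in S2 if w[1] == "T"}   # index 1 is the 2nd element

# Every event is a subset of its sample space.
assert all(event <= S1 for event in (even, exceeds_3, two_or_five))
assert all(event <= S2 for event in (no_consecutive_H, is_HTTTH, second_is_T))

print(sorted(even), sorted(exceeds_3))          # [0, 2, 4] [4, 5]
print(len(no_consecutive_H), len(second_is_T))  # 13 16

# Sample space 4 is continuous, so an event is described by a condition.
def call_at_most_60(t):
    return t <= 60
```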

VENN DIAGRAMS

These diagrams are just ways of illustratingSample Spaces, outcomes and events, andtheir relationships.

The Sample Space S is represented by arectangle .

The outcomes in S are represented by pointsin the rectangle.

Events are collections of outcomes, so arerepresented by collections of points, that is,by regions in the rectangle.

Events are subsets of S. They are identifiedby drawing a closed curve around the relevantpoints.

[Figure: the sample space S drawn as a rectangle, with an event A drawn as a closed curve inside it.]

EXAMPLE

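E: Toss 4 coins, record the sequence. The 16 outcomes in S are arranged in a grid: the column gives the first two results and the row gives the last two.

          HH··   HT··   TH··   TT··
   ··HH    ×      ×      ×      ×
   ··HT    ×      ×      ×      ×
   ··TH    ×      ×      ×      ×
   ··TT    ×      ×      ×      ×

[Figure: closed curves drawn around the points making up events A, B and C.]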

Event A: 3 or 4 H

Event B: exactly 2 H

Event C: 2nd and 3rd result H

USE OF VENN DIAGRAMS TO EXAMINE RELATIONSHIPS BETWEEN EVENTS

When there are 3 events (A, B and C, say), the sample space S is divided into 8 (= 2^3) regions. The general configuration is of the form:

[Figure: three mutually overlapping closed curves A, B and C inside the rectangle S, dividing it into the eight regions R1 to R8.]

It is often convenient to identify the regions in some way.

When using Venn diagrams to prove relationships between events, always draw them as above, to include all possible intersections.

The 4 coins example (earlier) is a special case:

• A and B do not overlap

• If C occurs then one of A and B must occur.

This can be illustrated like this:

[Figure: A and B drawn as non-overlapping regions inside S, with C drawn inside the union of A and B.]

But usually the general form is used, with some regions being recognised as empty (that is, containing no outcomes).
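The two special features of the 4 coins example, and the way its 16 outcomes fall into the 8 regions of the general three-event diagram, can be checked by enumeration. A minimal Python sketch follows; each region is labelled by whether an outcome lies in A, in B and in C, and this labelling is chosen here for illustration rather than matching R1 to R8.

```python
from itertools import product

# Sample space for E: all 16 ordered sequences of 4 tosses.
S = ["".join(seq) for seq in product("HT", repeat=4)]

A = {w for w in S if w.count("H") >= 3}              # 3 or 4 H
B = {w for w in S if w.count("H") == 2}              # exactly 2 H
C = {w for w in S if w[1] == "H" and w[2] == "H"}    # 2nd and 3rd result H

print(A & B == set())   # True: A and B do not overlap
print(C <= (A | B))     # True: if C occurs, one of A and B occurs

# Sort each outcome into one of the 2**3 = 8 regions of the general diagram.
regions = {pattern: [] for pattern in product((True, False), repeat=3)}
for w in S:
    regions[(w in A, w in B, w in C)].append(w)

for (in_a, in_b, in_c), members in regions.items():
    print(f"in A={in_a}, in B={in_b}, in C={in_c}: {len(members)} outcomes")
```

Three of the eight regions turn out to be empty, which is exactly the situation described in the paragraph above.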

Events of special importance

1. S itself: the certain event.

2. ∅, a dummy event containing no outcomes: the impossible event.

3. {x}, where x is a single element of S, i.e. an outcome.

DEFINITION: An event comprising exactly one outcome is often termed a simple event; any other event is a compound event.

Reminder: Events can be viewed as sets; it is only the context and area of application which differ.

1.3 Relations between sets

Notation:

Sets (Events): A, B, ..., E1, B3, ...

Elements (Outcomes): x, y, z, ..., x2, z3, ...

x ∈ A: x is contained in A

x ∉ A: x is not contained in A

A ⊂ B: A is contained in B

THE THREE BASIC RELATIONSHIPS

UNION of sets:

The union of A and B (A ∪ B) consists of all elements in A or in B or in both A and B.

INTERSECTION of sets:

The intersection of A and B (A ∩ B) consists of all elements in both A and B.

COMPLEMENT of a set:

The complement of a set A (written Ā) consists of those elements of S not in A.

All other forms of relationship between sets can be derived from combinations of these three.

It is sometimes convenient to define extra notation to represent different forms of relationship.

Example: For two sets A and B, the notation A − B means ‘in A but not in B’.

Now if x ∈ (A − B), then x ∈ A and x ∉ B, that is, x ∈ B̄; and so x ∈ (A ∩ B̄).

Similarly, x ∈ (A ∩ B̄) implies that x ∈ (A − B).

Hence A − B = A ∩ B̄.

So the notation A − B is not necessary, since we can define it in terms of the three basic relationships (Union, Intersection, Complement).
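Python's built-in sets provide union (`|`), intersection (`&`) and difference (`-`); the complement has to be taken relative to a chosen sample space. The sketch below, assuming a hypothetical sample space of one die throw, shows the three basic relationships and then checks A − B = A ∩ B̄ for every pair of events in that space. This is a check on one small finite example, not a proof.

```python
from itertools import combinations

# Hypothetical sample space: the score on one throw of a die.
S = frozenset({1, 2, 3, 4, 5, 6})

def complement(event):
    """Complement of an event relative to the sample space S."""
    return S - event

A = {2, 4, 6}   # even score
B = {4, 5, 6}   # score exceeds 3
print(sorted(A | B), sorted(A & B), sorted(complement(A)))
# [2, 4, 5, 6] [4, 6] [1, 3, 5]

# Check X - Y == X ∩ (complement of Y) for every pair of events X, Y in S.
events = [set(c) for r in range(len(S) + 1) for c in combinations(sorted(S), r)]
identity_holds = all(X - Y == X & complement(Y) for X in events for Y in events)
print(identity_holds)   # True
```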

Extensions to more than two events

Note: This needs to be done with some care.

It is not difficult to show that

(A ∪ B) ∪ C = A ∪ (B ∪ C).

It is therefore acceptable to write both of these as A ∪ B ∪ C. That is,

A ∪ B ∪ C = (A ∪ B) ∪ C = A ∪ (B ∪ C).

Similarly

A ∩ B ∩ C = (A ∩ B) ∩ C = A ∩ (B ∩ C).

But a Venn diagram shows that, in general,

(A ∩ B) ∪ C ≠ A ∩ (B ∪ C).

Compare with the algebraic use of the + and × signs:

2 × 3 × 4: no problem
2 + 3 + 4: no problem
2 × 3 + 4: ambiguous, since (2 × 3) + 4 = 10 but 2 × (3 + 4) = 14
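Both points can be checked with Python sets on a small hypothetical sample space S = {1, 2, 3, 4}: associativity holds for every choice of A, B and C drawn from its 16 subsets, while a single counterexample is enough to show that (A ∩ B) ∪ C and A ∩ (B ∪ C) can differ. Exhaustive checking over one small space illustrates the rules; it does not prove them in general.

```python
from itertools import combinations

S = {1, 2, 3, 4}
subsets = [set(c) for r in range(len(S) + 1) for c in combinations(sorted(S), r)]

# Associativity of ∪ and of ∩ for every triple of events from S.
assoc_ok = all(
    (A | B) | C == A | (B | C) and (A & B) & C == A & (B & C)
    for A in subsets for B in subsets for C in subsets
)
print(assoc_ok)   # True

# Mixing ∪ and ∩ without brackets is not safe: one counterexample.
A, B, C = {1}, {2}, {3}
print((A & B) | C, A & (B | C))   # {3} set()
```

The counterexample uses an outcome (3) that lies in C but not in A: it belongs to the left-hand side but not the right.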

Further relationships between events

DEFINITIONS

Events A1, A2, ..., An are mutually exclusive if no two can occur simultaneously, i.e. if

Ai ∩ Aj = ∅ for all i ≠ j.

Events A1, A2, ..., An are mutually exhaustive if at least one is certain to occur, i.e. if

A1 ∪ A2 ∪ ... ∪ An = S.

Several important results are concerned with events that are mutually exclusive and exhaustive.
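The two definitions translate directly into checks on a list of events. This minimal Python sketch uses a hypothetical die-throw sample space and two illustrative lists of events; the function names are chosen here, not taken from the notes.

```python
from itertools import combinations

def mutually_exclusive(events):
    """True if no two of the events share an outcome (Ai ∩ Aj = ∅ for i ≠ j)."""
    return all(not (a & b) for a, b in combinations(events, 2))

def mutually_exhaustive(events, space):
    """True if the union of the events is the whole sample space."""
    return set().union(*events) == space

S = {1, 2, 3, 4, 5, 6}
partition = [{1, 2}, {3, 4}, {5, 6}]
overlapping = [{1, 2, 3}, {3, 4, 5, 6}]

print(mutually_exclusive(partition), mutually_exhaustive(partition, S))      # True True
print(mutually_exclusive(overlapping), mutually_exhaustive(overlapping, S))  # False True
```

Events that are both mutually exclusive and exhaustive, like `partition`, split the sample space so that exactly one of them occurs on each performance of the experiment.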

1.4 Probability

To each event arising out of an experiment, a number (the probability of that event) is permanently assigned.

For the various different events arising out of the same experiment, the probabilities need to be consistent (see the next section).

In principle, we can assign any self-consistent numbers to these events. But, for the probabilities to be useful, we insist that the ‘more likely’ the event is, the larger its probability should be.

Standard notation: The probability of an event A is denoted by P(A) or Pr(A).

Example:

E: Two dice are thrown.

A: total score is 7.

Pr(A) = 1/6.
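The value 1/6 can be checked by listing all 36 ordered pairs of scores and counting those with total 7. This sketch anticipates the symmetry method of §1.6 and assumes both dice are fair, so that the 36 pairs are equally likely; `Fraction` keeps the answer exact.

```python
from fractions import Fraction
from itertools import product

# E: two (assumed fair) dice are thrown; record the ordered pair of scores.
S = list(product(range(1, 7), repeat=2))
A = [(d1, d2) for d1, d2 in S if d1 + d2 == 7]   # A: total score is 7

print(len(S), len(A))            # 36 6
print(Fraction(len(A), len(S)))  # 1/6
```

Naming the experiment E and the event A in the code mirrors the notation recommended in the next paragraph.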

It is good practice to define and use clear notation, as in this example. For each event you consider, define notation such as A, B, ... or A1, A2, ..., as appropriate. Then write the probabilities as Pr(A), Pr(B), ... or Pr(A1), Pr(A2), ....

It is possible to write statements such as

Pr(total of 7 when two dice are thrown) = 1/6,

but this confuses the definitions of the experiment and event.

Something to avoid doing

We all use abbreviated terminology from time to time. However, some students get into difficulties through failing to use suitable notation.

A statement such as

Heads = 1/2

is probably not too confusing, but it is much better as

Pr(Heads) = 1/2.

Better still, make sure the reader will know what experiment was performed.

Worse, we sometimes see statements like

0 = (1/2)^3.

This is very bad practice!

What is really meant might be:

E: Three fair coins are tossed, and the number of heads is counted.

A: The number of Heads is 0.

Pr(A) = (1/2)^3.
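The properly stated version can be checked by enumeration, keeping the same names E and A. This sketch assumes, as the example states, that the coins are fair, so the 8 ordered outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# E: three fair coins are tossed; the 8 ordered outcomes are equally likely.
S = list(product("HT", repeat=3))
A = [w for w in S if w.count("H") == 0]   # A: the number of heads is 0

pr_A = Fraction(len(A), len(S))
print(pr_A, pr_A == Fraction(1, 2) ** 3)  # 1/8 True
```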

‘Calibration’ of probability scale

In principle, one can use any scale when assigning probabilities to events.

In practice, we invariably use a scale on the range [0, 1], as follows:

0 (impossible) ........ 1/2 (fair coin landing heads) ........ 1 (certain)

It is not unusual to have probabilities expressed in percentage terms, but this is essentially equivalent.

Probability and relative frequency

Suppose that an experiment is performed n times, and that event A occurs on n_A of these.

The frequency of A is n_A.

The relative frequency (r.f.) of A is n_A/n.

The r.f. may give us some idea of how likely A is to occur. But note that:

• The probability of A, Pr(A), is a fixed quantity, but n_A/n can and will change.

• If another set of n trials were performed, n_A/n would be very likely to be different.

• If just one further trial is performed, the r.f. n_A/n will change to (n_A + 1)/(n + 1) if A occurs, or to n_A/(n + 1) if it does not.

So the relative frequency n_A/n cannot be used as the definition of Pr(A).
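The update rule for one further trial can be written as a small function. This is a minimal sketch with a hypothetical starting point of n_A = 3 occurrences in n = 10 trials; exact fractions make it easy to see that, apart from the extreme cases n_A = 0 or n_A = n, either new value differs from n_A/n.

```python
from fractions import Fraction

def updated_relative_frequency(n_A, n, a_occurred):
    """Relative frequency of A after one further trial."""
    if a_occurred:
        return Fraction(n_A + 1, n + 1)
    return Fraction(n_A, n + 1)

n_A, n = 3, 10   # hypothetical: A occurred on 3 of 10 trials
print(Fraction(n_A, n))                             # 3/10
print(updated_relative_frequency(n_A, n, True))     # 4/11
print(updated_relative_frequency(n_A, n, False))    # 3/11
```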

1.5 The Axioms of Probability

The theory of probability is developed from just three statements:

A1 For any event A, 0 ≤ Pr(A) ≤ 1.

A2 For the event S, Pr(S) = 1.

A3 For any two events A and B satisfying A ∩ B = ∅,

Pr(A ∪ B) = Pr(A) + Pr(B).
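On a finite sample space, one way to build an assignment is to attach a probability to each outcome and define Pr of an event as the sum over its outcomes. The sketch below assumes a hypothetical loaded die (the axioms do not require equally likely outcomes) and checks A1, A2 and A3 for every event, and every pair of disjoint events, of that space. It verifies this one assignment; it is not a proof of anything more general.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical assignment: a loaded die, with a probability for each outcome.
weights = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 8),
           4: Fraction(1, 8), 5: Fraction(1, 8), 6: Fraction(1, 8)}
S = frozenset(weights)

def pr(event):
    """Pr(event) = sum of the probabilities of the outcomes in it."""
    return sum((weights[x] for x in event), Fraction(0))

events = [frozenset(c) for r in range(len(S) + 1) for c in combinations(sorted(S), r)]

a1 = all(0 <= pr(A) <= 1 for A in events)
a2 = pr(S) == 1
a3 = all(pr(A | B) == pr(A) + pr(B) for A in events for B in events if not (A & B))
print(a1, a2, a3)   # True True True
```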

Notes

1. The underlying concept of probability is, for practical purposes, linked to relative frequency, but it is not the same thing. For any event, the relative frequency can and does vary; the probability is fixed.

2. Mathematically, the theory of probability develops from the axioms alone, without reference to any interpretation.

1.6 Numerical assessment of probabilities

There are three main ways of doing this:

• symmetry

• limiting relative frequency

• subjective (degrees of belief)

SYMMETRY

This can be applied only when all the outcomes of the experiment are known to be equally likely.

The experiment is then symmetrical with respect to its outcomes.

Calculation: If there are n outcomes and if all are equally likely, each has probability 1/n.

If an event A comprises a outcomes, then Pr(A) = a/n.

So

Pr(event) = (No. of outcomes favourable to event) / (Total no. of outcomes).

Applications: mainly to

• games of chance

• sampling problems

• genetics

Probability calculations reduce to counting problems.
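The symmetry formula is a counting rule, sketched below as a Python function. It is only valid when every outcome in the space is equally likely, and the example shows why that matters: with a fair coin (assumed here), the 32 ordered 5-tuples of example (2) in §1.2 are equally likely, but the counts 0, ..., 5 of example (1) are not, so applying the formula to them gives a wrong answer.

```python
from fractions import Fraction
from itertools import product

def pr_by_symmetry(event, space):
    """Favourable outcomes / total outcomes.
    Valid only if every outcome in `space` is equally likely."""
    outcomes = list(space)
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))

# Example (2): ordered 5-tuples from 5 tosses of a fair coin.
tuples = ["".join(seq) for seq in product("HT", repeat=5)]

def heads_even(w):
    return w.count("H") % 2 == 0

def heads_exceed_3(w):
    return w.count("H") > 3

print(pr_by_symmetry(heads_even, tuples))       # 1/2
print(pr_by_symmetry(heads_exceed_3, tuples))   # 3/16

# Wrong: the outcomes 0, ..., 5 of example (1) are not equally likely.
print(pr_by_symmetry(lambda k: k > 3, range(6)))   # 1/3
```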

LIMITING RELATIVE FREQUENCY

We have seen that relative frequencies cannot be used directly to define probabilities.

We need to define Pr(A), where A is some event based on the experiment E.

Suppose that the experiment E is repeated over and over again. Let n_A denote the number of occurrences of A in the first n repetitions of the experiment.

The relative frequency of A, S_n, is given by

S_n = n_A/n,

and we can plot a graph of S_n against n.

Convergence of S_n

Graphs of this sort ‘settle down’ to some constant value.

The intuitive idea is that as n gets large, S_n gets close to some constant, which we then define as the probability Pr(A).

(Note: The form of convergence is not straightforward.)

This definition is known as the Limiting Relative Frequency definition of probability.

IMPORTANT DISTINCTION:

For any finite n, S_n is not the probability of A. It may be thought of as an estimate of Pr(A).
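The ‘settling down’ of S_n can be seen in a simulation. This sketch assumes a hypothetical experiment in which A occurs with probability p on each repetition, independently, and records S_n = n_A/n at a few values of n. The values printed depend on the random seed; they tend to move closer to p as n grows, but no finite run guarantees this, and each S_n is only an estimate of Pr(A).

```python
import random

def relative_frequencies(p, checkpoints, seed=0):
    """Repeat a simulated experiment in which A occurs with (hypothetical)
    probability p; return S_n = n_A / n at each n in `checkpoints`."""
    rng = random.Random(seed)
    n_A = 0
    results = {}
    for n in range(1, max(checkpoints) + 1):
        if rng.random() < p:   # A occurs on this repetition
            n_A += 1
        if n in checkpoints:
            results[n] = n_A / n
    return results

for n, s_n in relative_frequencies(0.5, {10, 100, 1000, 10000, 100000}).items():
    print(n, s_n)
```

Plotting these pairs (n, S_n) would give the kind of graph described above.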

SUBJECTIVE PROBABILITY

There are drawbacks with the earlier definitions:

Symmetry: restricted to experiments where we know the outcomes have equal probability.

Lim Rel Freq: restricted to experiments which can be repeated over and over again.

If we define probability in some other way, we can extend the concept to experiments which do not have these properties.

Subjective probability does this. It represents a person’s degree of belief in some event.

Examples:
Labour will win the next general election.
The 11.09 to London today will be late.

This definition of probability is closely related to ideas of consistent betting behaviour.

CHAPTER 1 SUMMARY

• The subject of probability relates to experiments whose result is uncertain but must be one of a set of outcomes, the sample space.

• Events: events are sets and can be manipulated (∪, ∩, complement, etc.).

• To each event A is attached a number, its probability, denoted by P(A) or Pr(A).

• Probabilities satisfy certain conditions (the axioms) on which the development of probability theory is based.

• Several interpretations of probability obey the system of axioms.