FORMULA SHEET (Statistics)


[1] DESCRIPTIVE STATISTICS

Sample Mean (ungrouped): $\bar{x} = \dfrac{\sum x}{n}$

Population Mean (ungrouped): $\mu = \dfrac{\sum x}{N}$

Population Mean (grouped): $\mu_{grouped} = \dfrac{\sum fM}{N}$, where f is the class frequency and M is the class midpoint.

Sample variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} = \dfrac{\sum x^2 - n\bar{x}^2}{n - 1} = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$

Population variance (ungrouped): $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N} = \dfrac{\sum x^2 - N\mu^2}{N} = \dfrac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$

Population Variance (grouped): $\sigma^2 = \dfrac{\sum f(M - \mu)^2}{N} = \dfrac{\sum fM^2 - \frac{(\sum fM)^2}{N}}{N}$

The standard deviation is the positive square root of the variance.

Quartiles

First quartile position: Q1 = (n+1)/4

Second quartile position: Q2 = (n+1)/2 (the median)

Third quartile position: Q3 = 3(n+1)/4

Where n is the number of observed values (sample size).

Rule 1: If the result is an integer, then the quartile is equal to the ranked value at that position.

Rule 2: If the result is a fractional half (for example 2.5), then the quartile is equal to the mean of the two corresponding ranked values.

Rule 3: If the result is neither an integer nor a fractional half, round the result to the nearest integer and select that ranked value.
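The three quartile rules can be sketched in Python as follows. This is only an illustration: the function name `quartile` and the sample data are assumptions, not part of the formula sheet.

```python
def quartile(values, k):
    """Return the k-th quartile (k = 1, 2 or 3) using the three position rules."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2 or 3")
    ranked = sorted(values)
    n = len(ranked)
    if n < 2:
        raise ValueError("need at least two observations")
    pos = k * (n + 1) / 4              # 1-based position, a multiple of 0.25
    if pos.is_integer():               # Rule 1: take the ranked value itself
        return ranked[int(pos) - 1]
    if (2 * pos).is_integer():         # Rule 2: fractional half, average neighbours
        lower = int(pos)
        return (ranked[lower - 1] + ranked[lower]) / 2
    return ranked[round(pos) - 1]      # Rule 3: round to the nearest position


# Hypothetical sample of n = 9 values: positions are 2.5, 5 and 7.5
data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1, q2, q3 = (quartile(data, k) for k in (1, 2, 3))
```

With this data, Q1 and Q3 fall under Rule 2 and Q2 under Rule 1.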


Coefficient of Variation: $CV = \dfrac{\sigma}{\mu} \cdot 100$

Standardized value or z score: $z = \dfrac{X - \mu}{\sigma}$ or $z = \dfrac{X - \bar{X}}{s}$

and for a given z value: $X = \mu + z\sigma$

The Empirical Rule: The interval of values one standard deviation either side of the mean, $\bar{X} \pm 1S$, contains approximately 68% of the items or people in the sample.
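A short Python sketch of the coefficient of variation and the z score conversions. The function names and the small population are assumptions chosen for illustration.

```python
import statistics


def coefficient_of_variation(mu, sigma):
    """CV = (sigma / mu) * 100."""
    return sigma / mu * 100


def z_score(x, mu, sigma):
    """Standardized value z = (X - mu) / sigma."""
    return (x - mu) / sigma


def value_from_z(z, mu, sigma):
    """Recover X = mu + z * sigma for a given z value."""
    return mu + z * sigma


# Hypothetical population with mean 5 and standard deviation 2
population = [2, 4, 4, 4, 5, 5, 7, 9]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)   # population standard deviation

cv = coefficient_of_variation(mu, sigma)
z = z_score(9, mu, sigma)
x_back = value_from_z(z, mu, sigma)
```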

[ 2 ] PROBABILITY THEORY

If N is the total number of opportunities for the event to occur and x is the number of times the event has occurred, then

$P(E) = \dfrac{x}{N}$

Addition Law P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

= P(A) + P(B) if A and B are mutually exclusive

Conditional Probability $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{P(A) \cdot P(B \mid A)}{P(B)}$

= P(A) if A and B are independent

Multiplication Rule P(A ∩ B) = P(B) P(A | B) = P(A) P(B | A)

= P(A) P(B) if A and B are independent
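The probability laws above can be checked on a small example. The sketch below assumes a fair six-sided die with two hypothetical events, A (even number) and B (greater than 3), and uses exact fractions.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # fair die, equally likely outcomes
A = {2, 4, 6}                       # hypothetical event: even number
B = {4, 5, 6}                       # hypothetical event: greater than 3


def prob(event):
    """P(E) = x / N for equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


# Addition law: P(A or B) = P(A) + P(B) - P(A and B)
p_union = prob(A) + prob(B) - prob(A & B)

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_a_given_b = prob(A & B) / prob(B)

# Multiplication rule: P(A and B) = P(B) * P(A | B)
p_intersection = prob(B) * p_a_given_b

# A and B are independent only if P(A and B) = P(A) * P(B)
independent = prob(A & B) == prob(A) * prob(B)
```

Here A and B are not independent, so P(A | B) differs from P(A).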


PROBABILITY DISTRIBUTIONS

Mean or expected value of a discrete distribution: $\mu = E(x) = \sum [x P(x)]$

Variance of a discrete distribution: $\sigma^2 = \sum [(x - \mu)^2 \cdot P(x)]$

The Covariance

Definition formula: $\sigma_{XY} = \sum_{i=1}^{N} [X_i - E(X)][Y_i - E(Y)] \, P(X_i Y_i)$

Calculation formula: $\sigma_{XY} = \sum_{i=1}^{N} X_i Y_i \, P(X_i Y_i) - E(X)E(Y)$

Where $X_i$, $Y_i$ = the ith outcomes of the discrete random variables X and Y respectively

$P(X_i Y_i)$ = probability of the ith occurrence of X and Y

Portfolio Expected Return and Portfolio Risk

Portfolio expected return (weighted average return): $E(P) = wE(X) + (1 - w)E(Y)$

Portfolio risk (weighted variability): $\sigma_P = \sqrt{w^2\sigma_X^2 + (1 - w)^2\sigma_Y^2 + 2w(1 - w)\sigma_{XY}}$

Where $E(P)$ = portfolio expected return

$w$ = portion of portfolio value in asset X

$(1 - w)$ = portion of portfolio value in asset Y
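The discrete-distribution, covariance and portfolio formulas can be sketched together in Python. The returns, probabilities and the weight w = 0.5 below are hypothetical values chosen so the arithmetic is easy to follow.

```python
from math import sqrt


def expected_value(values, probs):
    """mu = sum of x * P(x)."""
    return sum(v * p for v, p in zip(values, probs))


def variance(values, probs):
    """sigma^2 = sum of (x - mu)^2 * P(x)."""
    mu = expected_value(values, probs)
    return sum((v - mu) ** 2 * p for v, p in zip(values, probs))


def covariance(xs, ys, probs):
    """Calculation formula: sum of X*Y*P(XY) minus E(X)*E(Y)."""
    joint = sum(x * y * p for x, y, p in zip(xs, ys, probs))
    return joint - expected_value(xs, probs) * expected_value(ys, probs)


def portfolio(w, xs, ys, probs):
    """Return (expected return, risk) for weight w in X and 1 - w in Y."""
    e_p = w * expected_value(xs, probs) + (1 - w) * expected_value(ys, probs)
    var_p = (w ** 2 * variance(xs, probs)
             + (1 - w) ** 2 * variance(ys, probs)
             + 2 * w * (1 - w) * covariance(xs, ys, probs))
    return e_p, sqrt(var_p)


# Hypothetical joint outcomes: three scenarios with returns for assets X and Y
probs = [0.2, 0.5, 0.3]
x_returns = [-100, 100, 250]
y_returns = [200, 50, -100]

e_p, risk = portfolio(0.5, x_returns, y_returns, probs)
```

Because the covariance in this example is strongly negative, the portfolio risk is far smaller than the standard deviation of either asset.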

Binomial Distribution: $P(X) = \dfrac{n!}{X!\,(n - X)!} \, p^X q^{n - X} = {}_nC_X \, p^X (1 - p)^{n - X}$ (if using a calculator)

$\mu = np$ and $\sigma^2 = np(1 - p)$

$q = 1 - p$

The standard deviation is the positive square root of the variance.
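A minimal Python sketch of the binomial formulas, using `math.comb` for the nCX term. The function names and the example values n = 4, p = 0.5 are assumptions.

```python
from math import comb, sqrt


def binomial_pmf(x, n, p):
    """P(X) = nCx * p^x * q^(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)


def binomial_mean(n, p):
    return n * p


def binomial_variance(n, p):
    return n * p * (1 - p)


# Hypothetical example: 4 trials with success probability 0.5
n, p = 4, 0.5
p_two = binomial_pmf(2, n, p)                       # 6 * 0.25 * 0.25
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))
std_dev = sqrt(binomial_variance(n, p))
```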


[ 3 & 4 ] STATISTICAL INFERENCE

SAMPLING DISTRIBUTIONS

SAMPLE MEAN

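$\bar{X} \sim N\left(\mu_{\bar{X}} = \mu,\ \sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}\right)$

If the population standard deviation σ is not known we use the unbiased estimate s when we find the estimated value of $\sigma_{\bar{X}}$. The Testing Statistics we use are either

$z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ (when σ is known)

or

$t = \dfrac{\bar{X} - \mu}{s / \sqrt{n}}$ (when σ is not known)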
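The two test statistics for the sample mean share the same structure, as this Python sketch shows. The function names and the example numbers are assumptions.

```python
from math import sqrt


def standard_error(sd, n):
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / sqrt(n)


def z_statistic(x_bar, mu, sigma, n):
    """z = (x_bar - mu) / (sigma / sqrt(n)), used when sigma is known."""
    return (x_bar - mu) / standard_error(sigma, n)


def t_statistic(x_bar, mu, s, n):
    """t = (x_bar - mu) / (s / sqrt(n)), used when sigma is not known."""
    return (x_bar - mu) / standard_error(s, n)


# Hypothetical sample: mean 52 from n = 64 observations, hypothesized mu = 50
z = z_statistic(52, 50, sigma=8, n=64)   # standard error is 8 / 8 = 1
t = t_statistic(52, 50, s=8, n=64)       # compared against t with n - 1 = 63 df
```

The arithmetic is identical; the difference lies in which reference distribution (standard normal or t with n - 1 degrees of freedom) the result is compared against.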

THE SAMPLE PROPORTION

$p \sim N\left(\mu_p = \pi,\ \sigma_p = \sqrt{\dfrac{\pi(1 - \pi)}{n}}\right)$

The Testing Statistic we use is

$z = \dfrac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$
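A small Python sketch of the test statistic for a sample proportion. The function name and the example figures are hypothetical.

```python
from math import sqrt


def proportion_z(p, pi, n):
    """z = (p - pi) / sqrt(pi * (1 - pi) / n)."""
    sigma_p = sqrt(pi * (1 - pi) / n)
    return (p - pi) / sigma_p


# Hypothetical example: sample proportion 0.56 from n = 100, hypothesized pi = 0.5
z = proportion_z(0.56, 0.5, 100)   # standard error is sqrt(0.25 / 100) = 0.05
```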

UNIFORM DISTRIBUTION

WORKING WITH FINITE POPULATIONS

Whenever we have n / N > 0.05, the standard error of the sample estimate is multiplied by the following term

Finite Population Correction Factor $= \sqrt{\dfrac{N - n}{N - 1}}$
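The finite population correction can be sketched in Python as below. The helper `corrected_standard_error` and its example values are assumptions; it applies the factor only when n / N exceeds 0.05, as stated above.

```python
from math import sqrt


def fpc(N, n):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))


def corrected_standard_error(se, N, n):
    """Multiply the standard error by the FPC only when n / N > 0.05."""
    if n / N > 0.05:
        return se * fpc(N, n)
    return se


# Hypothetical cases with an uncorrected standard error of 2.0
se_small_population = corrected_standard_error(2.0, N=26, n=10)    # n/N is about 0.38
se_large_population = corrected_standard_error(2.0, N=1000, n=10)  # n/N = 0.01
```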


INTERVAL ESTIMATION

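[ µ ] Sample Estimate ($\bar{X}$) $\pm\ z_{\alpha/2}\,\sigma_{\bar{X}} = \bar{X} \pm z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$

or

Sample Estimate ($\bar{X}$) $\pm\ t_{\alpha/2,\,n-1} \dfrac{s}{\sqrt{n}}$

[ p ] Sample Estimate ($\hat{p}$) $\pm\ z\,\sigma_{\hat{p}} = \hat{p} \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$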
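The interval formulas can be sketched in Python with the standard library. The z critical value comes from `statistics.NormalDist`; the t critical value is passed in as an argument because the standard library has no t distribution, so it is assumed to come from a table. All function names and example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def mean_interval_z(x_bar, sigma, n, confidence=0.95):
    """x_bar +/- z * sigma / sqrt(n), for known sigma."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


def mean_interval_t(x_bar, s, n, t_crit):
    """x_bar +/- t * s / sqrt(n); t_crit is looked up with n - 1 df."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


def proportion_interval(p_hat, n, confidence=0.95):
    """p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    q_hat = 1 - p_hat
    margin = z_critical(confidence) * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


# Hypothetical examples
ci_mean = mean_interval_z(50, sigma=10, n=100)               # margin is about 1.96
ci_mean_t = mean_interval_t(50, s=10, n=25, t_crit=2.0639)   # t table value, 24 df
ci_prop = proportion_interval(0.4, n=150)                    # standard error 0.04
```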

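Confidence Interval for the Population Total

Population Total $= N\bar{X}$

Confidence Interval Estimate: $N\bar{X} \pm N \left(t_{\alpha/2,\,n-1}\right) \dfrac{S}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}$

Confidence Interval for Total Difference

Total Difference $= N\bar{D}$

Mean Difference: $\bar{D} = \dfrac{\sum_{i=1}^{n} D_i}{n}$

Where: $D_i$ = audited value – original value

Confidence Interval Estimate: $N\bar{D} \pm N \left(t_{\alpha/2,\,n-1}\right) \dfrac{S_D}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}$

Where: $S_D = \sqrt{\dfrac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$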
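Both total intervals have the same shape, so one helper can serve for the population total and the total difference. The sketch below is an assumption-laden illustration: the function name, the example numbers, and the t critical value (taken from a t table for 24 degrees of freedom) are all hypothetical.

```python
from math import sqrt


def total_interval(N, n, sample_mean, sample_sd, t_crit):
    """N*mean +/- N * t * (sd / sqrt(n)) * sqrt((N - n) / (N - 1)).

    Pass the mean and standard deviation of X for a population total,
    or of the differences D for a total difference.
    """
    total = N * sample_mean
    margin = N * t_crit * (sample_sd / sqrt(n)) * sqrt((N - n) / (N - 1))
    return total - margin, total + margin


# Hypothetical example: N = 50 items, n = 25 sampled, mean 10, sd 7
# Here sd / sqrt(n) = 7/5 and the correction factor is 5/7, so they cancel.
low, high = total_interval(N=50, n=25, sample_mean=10, sample_sd=7, t_crit=2.0639)
```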

ESTIMATING SAMPLE SIZE

MEAN $n = \dfrac{z_{\alpha/2}^2 \, \sigma^2}{E^2}$ where $E = (\bar{X} - \mu)$ is the error of estimation

PROPORTION $n = \dfrac{z^2 \, \pi(1 - \pi)}{E^2}$ where E is the largest value of $(p - \pi)$ we will tolerate

In both cases we take the solution for n given by the formula and ROUND UP.
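A Python sketch of both sample-size formulas, including the round-up step via `math.ceil`. The function names and example inputs are assumptions.

```python
from math import ceil
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value z_{alpha/2}."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_mean(sigma, error, confidence=0.95):
    """n = z^2 * sigma^2 / E^2, rounded up."""
    z = z_critical(confidence)
    return ceil(z ** 2 * sigma ** 2 / error ** 2)


def sample_size_proportion(pi, error, confidence=0.95):
    """n = z^2 * pi * (1 - pi) / E^2, rounded up."""
    z = z_critical(confidence)
    return ceil(z ** 2 * pi * (1 - pi) / error ** 2)


# Hypothetical examples at 95% confidence
n_mean = sample_size_mean(sigma=15, error=3)          # formula gives about 96.04
n_prop = sample_size_proportion(pi=0.5, error=0.05)   # formula gives about 384.15
```

Rounding up rather than to the nearest integer ensures the tolerated error E is not exceeded.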


[ 5 ] REGRESSION ANALYSIS

POPULATION REGRESSION FUNCTION (PRF)

yi = β0 + β1 xi + εi

SAMPLE REGRESSION FUNCTION (SRF)

ŷ = b0 + b1 x

KEY VALUES USED IN REGRESSION CALCULATIONS

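$SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \dfrac{(\sum x)(\sum y)}{n}$

$SS_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - \dfrac{(\sum x)^2}{n}$

$SS_{yy} = \sum (y - \bar{y})^2 = \sum y^2 - \dfrac{(\sum y)^2}{n}$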

ESTIMATION FORMULAE

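Slope $b_1 = \dfrac{SS_{xy}}{SS_{xx}} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$

Intercept $b_0 = \bar{y} - b_1\bar{x} = \dfrac{\sum y}{n} - b_1 \dfrac{\sum x}{n}$

$SSE = \sum y^2 - b_0 \sum y - b_1 \sum xy$

Standard error of estimate $s_e = \sqrt{\dfrac{SSE}{n - 2}}$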

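Coefficient of determination $r^2 = 1 - \dfrac{SSE}{SS_{yy}} = 1 - \dfrac{SSE}{\sum y^2 - \dfrac{(\sum y)^2}{n}}$

Computational formula for $r^2$: $r^2 = \dfrac{b_1^2 \, SS_{xx}}{SS_{yy}}$

Pearson Product-Moment Correlation Coefficient $r = \dfrac{SS_{XY}}{\sqrt{SS_{XX} \, SS_{YY}}}$

Adjusted R2: $r_{adj}^2 = 1 - \left[(1 - r^2)\dfrac{n - 1}{n - k - 1}\right]$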
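The regression formulas above chain together naturally, as this Python sketch shows. The function name, the returned dictionary keys, the five-point data set, and the default k = 1 (one independent variable) are assumptions made for illustration.

```python
from math import sqrt


def fit_simple_regression(x, y, k=1):
    """Compute the key values and estimates using the computational formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)

    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n

    b1 = ss_xy / ss_xx                        # slope
    b0 = sum_y / n - b1 * sum_x / n           # intercept
    sse = sum_y2 - b0 * sum_y - b1 * sum_xy   # sum of squared errors
    r2 = 1 - sse / ss_yy
    return {
        "ss_xy": ss_xy, "ss_xx": ss_xx, "ss_yy": ss_yy,
        "b0": b0, "b1": b1, "sse": sse,
        "s_e": sqrt(sse / (n - 2)),           # standard error of estimate
        "r2": r2,
        "r2_computational": b1 ** 2 * ss_xx / ss_yy,
        "r": ss_xy / sqrt(ss_xx * ss_yy),
        "r2_adj": 1 - (1 - r2) * (n - 1) / (n - k - 1),
    }


# Hypothetical data set with n = 5 points
fit = fit_simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

For this data the two r2 formulas agree, and squaring r reproduces the same value.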


Sampling Distribution for the estimated slope

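t test for slope: $t = \dfrac{b_1 - \beta_1}{s_b}$ where: $s_b = \dfrac{s_e}{\sqrt{SS_{xx}}}$

t test for correlation: $t = \dfrac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$

where: $r = +\sqrt{r^2}$ if $b_1 > 0$

$r = -\sqrt{r^2}$ if $b_1 < 0$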
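A Python sketch of the two t statistics. The function names are assumptions, and the input values (b1 = 0.6, SSE = 2.4, SSxx = 10, r2 = 0.6, n = 5) are hypothetical summary figures. The null values default to 0, which is an assumption about the hypothesis being tested.

```python
from math import copysign, sqrt


def t_slope(b1, s_e, ss_xx, beta1=0.0):
    """t = (b1 - beta1) / s_b, with s_b = s_e / sqrt(SSxx)."""
    s_b = s_e / sqrt(ss_xx)
    return (b1 - beta1) / s_b


def signed_r(r2, b1):
    """r takes the sign of the slope: +sqrt(r2) if b1 > 0, -sqrt(r2) if b1 < 0."""
    return copysign(sqrt(r2), b1)


def t_correlation(r, n, rho=0.0):
    """t = (r - rho) / sqrt((1 - r^2) / (n - 2))."""
    return (r - rho) / sqrt((1 - r ** 2) / (n - 2))


# Hypothetical summary values from a fit with n = 5
s_e = sqrt(2.4 / (5 - 2))
t_b = t_slope(b1=0.6, s_e=s_e, ss_xx=10)
r = signed_r(r2=0.6, b1=0.6)
t_r = t_correlation(r, n=5)
```

In simple linear regression with both null values at 0, the two statistics coincide, which this example reproduces.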

COMPLETE MODEL EVALUATION FORMULAE

Total variation = Explained Variation + Unexplained Variation

SST = SSR + SSE

Where $SSR = \sum (\hat{y} - \bar{y})^2$

These results are used in the following definition

Coefficient of determination $r^2 = 1 - \dfrac{SSE}{SST} = 1 - \dfrac{SSE}{\sum y^2 - \dfrac{(\sum y)^2}{n}}$

INTERVAL ESTIMATION

CONFIDENCE INTERVAL FOR THE CONDITIONAL MEAN

$\hat{y}_i \pm t_{n-2} \, s_{YX} \sqrt{h_i}$, where $h_i = \dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$

PREDICTION INTERVAL FOR AN INDIVIDUAL RESPONSE

$\hat{y}_i \pm t_{n-2} \, s_{YX} \sqrt{1 + h_i}$, where $h_i = \dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$
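The two intervals differ only in the term under the square root, as this Python sketch shows. It assumes that s_YX is the standard error of the estimate and that the t critical value for n - 2 degrees of freedom is read from a table; the function names and example numbers are hypothetical.

```python
from math import sqrt


def h_stat(x_values, x_i):
    """h_i = 1/n + (X_i - X_bar)^2 / sum((X - X_bar)^2)."""
    n = len(x_values)
    x_bar = sum(x_values) / n
    ss_xx = sum((x - x_bar) ** 2 for x in x_values)
    return 1 / n + (x_i - x_bar) ** 2 / ss_xx


def confidence_interval_mean(y_hat, t_crit, s_yx, h):
    """Interval for the conditional mean: y_hat +/- t * s_YX * sqrt(h)."""
    half = t_crit * s_yx * sqrt(h)
    return y_hat - half, y_hat + half


def prediction_interval(y_hat, t_crit, s_yx, h):
    """Interval for an individual response: y_hat +/- t * s_YX * sqrt(1 + h)."""
    half = t_crit * s_yx * sqrt(1 + h)
    return y_hat - half, y_hat + half


# Hypothetical fit: X = 1..5, y_hat = 4.0 at X = 3, s_YX = sqrt(0.8),
# t critical value 3.1824 taken from a t table (95%, 3 df)
x_values = [1, 2, 3, 4, 5]
h = h_stat(x_values, 3)                      # X_i equals the mean, so h = 1/n
ci = confidence_interval_mean(4.0, 3.1824, sqrt(0.8), h)
pi = prediction_interval(4.0, 3.1824, sqrt(0.8), h)
```

The prediction interval is always wider than the confidence interval at the same X, because of the extra 1 under the square root.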


[6] TIME SERIES FORECASTING AND INDEX NUMBERS

Mean Absolute Deviation $= \dfrac{\sum_{i=1}^{n} |e_i|}{n}$

Mean Square Error $= \dfrac{\sum_{i=1}^{n} e_i^2}{n}$

Exponential Smoothing: $F_2 = X_1$

$F_{t+1} = aX_t + (1 - a)F_t$
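A Python sketch of exponential smoothing together with the two error measures. It assumes the forecast error is defined as actual minus forecast, e_t = X_t - F_t, which the sheet does not state; the function names and the three-value series are also hypothetical.

```python
def exponential_smoothing(series, a):
    """Return [F_2, F_3, ..., F_{n+1}] with F_2 = X_1 and F_{t+1} = a*X_t + (1-a)*F_t."""
    forecasts = [series[0]]                  # F_2 = X_1
    for x_t in series[1:]:                   # X_2, X_3, ...
        forecasts.append(a * x_t + (1 - a) * forecasts[-1])
    return forecasts


def forecast_errors(series, forecasts):
    """Errors e_t = X_t - F_t for the periods that have both values (t = 2..n)."""
    return [x - f for x, f in zip(series[1:], forecasts)]


def mean_absolute_deviation(errors):
    return sum(abs(e) for e in errors) / len(errors)


def mean_square_error(errors):
    return sum(e ** 2 for e in errors) / len(errors)


# Hypothetical series with smoothing constant a = 0.5
series = [10, 12, 14]
forecasts = exponential_smoothing(series, a=0.5)   # F_2, F_3, F_4
errors = forecast_errors(series, forecasts)        # e_2, e_3
mad = mean_absolute_deviation(errors)
mse = mean_square_error(errors)
```

The last element of `forecasts` is the forecast for the period after the series ends, so it has no matching error.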

Simple Weighted Index: $I_i = \dfrac{X_i}{X_0} (100)$

Weighted Aggregate Price Index: $I_i = \dfrac{\sum P_i Q_i}{\sum P_o Q_o} (100)$

Laspeyres Price Index: $I_L = \dfrac{\sum P_i Q_0}{\sum P_0 Q_0} (100)$

Paasche Price Index: $I_P = \dfrac{\sum P_i Q_i}{\sum P_0 Q_i} (100)$
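The Laspeyres and Paasche indices differ only in which period's quantities act as weights, as this Python sketch illustrates. The function names and the two-item price and quantity lists are hypothetical.

```python
def simple_index(x_i, x_0):
    """I_i = (X_i / X_0) * 100."""
    return x_i / x_0 * 100


def laspeyres_index(p0, p_i, q0):
    """Base-period quantities as weights: sum(P_i*Q_0) / sum(P_0*Q_0) * 100."""
    numerator = sum(p * q for p, q in zip(p_i, q0))
    denominator = sum(p * q for p, q in zip(p0, q0))
    return numerator / denominator * 100


def paasche_index(p0, p_i, q_i):
    """Current-period quantities as weights: sum(P_i*Q_i) / sum(P_0*Q_i) * 100."""
    numerator = sum(p * q for p, q in zip(p_i, q_i))
    denominator = sum(p * q for p, q in zip(p0, q_i))
    return numerator / denominator * 100


# Hypothetical basket of two items
p0, q0 = [2, 5], [10, 4]     # base-period prices and quantities
p_i, q_i = [3, 6], [8, 5]    # current-period prices and quantities

item_index = simple_index(p_i[0], p0[0])
laspeyres = laspeyres_index(p0, p_i, q0)   # 54 / 40 * 100
paasche = paasche_index(p0, p_i, q_i)      # 54 / 41 * 100
```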

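Non-Linear Trend Forecasting:

Quadratic form: $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i$

Exponential trend: $Y_i = \beta_0 \beta_1^{X_i} \varepsilon_i$

Exponential trend (logged transformation): $\log(Y_i) = \log(\beta_0) + X_i \log(\beta_1) + \log(\varepsilon_i)$

Model selection:

First differences (linear trend): $(Y_2 - Y_1) = (Y_3 - Y_2) = \cdots = (Y_n - Y_{n-1})$

Second differences (quadratic): $[(Y_3 - Y_2) - (Y_2 - Y_1)] = [(Y_4 - Y_3) - (Y_3 - Y_2)] = \cdots = [(Y_n - Y_{n-1}) - (Y_{n-1} - Y_{n-2})]$

Percentage differences (exponential): $\dfrac{(Y_2 - Y_1)}{Y_1} \times 100\% = \dfrac{(Y_3 - Y_2)}{Y_2} \times 100\% = \cdots = \dfrac{(Y_n - Y_{n-1})}{Y_{n-1}} \times 100\%$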
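The model selection checks can be sketched in Python: whichever set of differences is roughly constant suggests the matching trend model. The function names and the three short series are hypothetical.

```python
def first_differences(y):
    """Y_2 - Y_1, Y_3 - Y_2, ..., Y_n - Y_{n-1}."""
    return [b - a for a, b in zip(y, y[1:])]


def second_differences(y):
    """Differences of consecutive first differences."""
    return first_differences(first_differences(y))


def percentage_differences(y):
    """(Y_{t+1} - Y_t) / Y_t * 100 for each consecutive pair."""
    return [(b - a) / a * 100 for a, b in zip(y, y[1:])]


# Hypothetical series, one for each trend shape
linear = [3, 5, 7, 9]
quadratic = [1, 4, 9, 16]
exponential = [100, 110, 121]

d1 = first_differences(linear)            # constant, so a linear trend fits
d2 = second_differences(quadratic)        # constant, so a quadratic trend fits
pct = percentage_differences(exponential) # roughly constant, so exponential fits
```

With real data the differences are rarely exactly equal, so the comparison is a judgment about which set is closest to constant.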
