FORMULA SHEET (Statistics)
[1] DESCRIPTIVE STATISTICS
Sample Mean (ungrouped): $\bar{x} = \frac{\sum x}{n}$
Population Mean (ungrouped): $\mu = \frac{\sum x}{N}$
Population Mean (grouped): $\mu_{grouped} = \frac{\sum fM}{N}$, where f is the class frequency and M is the class midpoint
Sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
Population variance (ungrouped): $\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{\sum x^2 - N\mu^2}{N} = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$
Population Variance (grouped): $\sigma^2 = \frac{\sum f(M - \mu)^2}{N} = \frac{\sum fM^2 - \frac{(\sum fM)^2}{N}}{N}$
The standard deviation is the positive square root of the variance.
Quartiles
First quartile position: Q1 = (n+1)/4
Second quartile position: Q2 = (n+1)/2 (the median)
Third quartile position: Q3 = 3(n+1)/4
Where n is the number of observed values (sample size).
Rule 1: If the result is an integer, the quartile is equal to the ranked value at that position.
Rule 2: If the result is a fractional half (2.5, 3.5, etc.), the quartile is equal to the mean of the two adjacent ranked values.
Rule 3: If the result is neither an integer nor a fractional half, round the result to the nearest integer and select that ranked value.
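As a rough illustration of the variance formulas and quartile rules, the Python sketch below computes the sample variance by the definitional and the shortcut form and applies Rules 1 to 3 to a quartile position. The function names and data are hypothetical, and the quartile helper assumes at least two observations.

```python
import math

def sample_variance_definitional(xs):
    """s^2 = sum((x - mean)^2) / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

def quartile(xs, k):
    """k-th quartile (k = 1, 2, 3) from position k(n+1)/4; assumes len(xs) >= 2."""
    ranked = sorted(xs)
    pos = k * (len(ranked) + 1) / 4
    if pos == int(pos):                        # Rule 1: integer position
        return ranked[int(pos) - 1]
    if pos - math.floor(pos) == 0.5:           # Rule 2: fractional half
        lo = math.floor(pos)
        return (ranked[lo - 1] + ranked[lo]) / 2
    return ranked[math.floor(pos + 0.5) - 1]   # Rule 3: round to nearest position

small = [1, 2, 3, 4, 5, 6, 7]                  # illustrative data, n = 7
tens = list(range(10, 101, 10))                # illustrative data, n = 10
```

With `tens`, the positions are 2.75, 5.5 and 8.25, so the three rules select the 3rd value, the mean of the 5th and 6th values, and the 8th value respectively.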
Coefficient of Variation: $CV = \frac{\sigma}{\mu} \cdot 100$
Standardized value or z score: $z = \frac{X - \mu}{\sigma}$ or $z = \frac{X - \bar{X}}{s}$, and for a given z value $X = \mu + z\sigma$
The Empirical Rule: For bell-shaped data, the interval one standard deviation either side of the mean, $\bar{X} \pm 1S$, contains approximately 68% of the items or people in the sample.
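A small Python sketch of the coefficient of variation and the z score may help here. The data values are made up for illustration and are treated as a complete population, so the population standard deviation is used.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]        # illustrative values, treated as a population
mu = statistics.mean(data)             # population mean
sigma = statistics.pstdev(data)        # population standard deviation

cv = sigma / mu * 100                  # coefficient of variation, in percent
z = (9 - mu) / sigma                   # z score of the observation X = 9
x_back = mu + z * sigma                # recover X from a given z value
```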
[ 2 ] PROBABILITY THEORY
If N is the total number of opportunities for the event to occur and x is the number of
times the event has occurred, then
$P(E) = \frac{x}{N}$
Addition Law P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
= P(A) + P(B) if A and B are mutually exclusive
Conditional Probability $P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \cdot P(B \mid A)}{P(B)}$
= P(A) if A and B are independent
Multiplication Rule P(A ∩ B) = P(B) P(A | B) = P(A) P(B | A)
= P(A) P(B) if A and B are independent
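The probability laws above can be checked on a small example. The sketch below assumes one roll of a fair die with equally likely outcomes; the events A and B are hypothetical choices for illustration.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}          # one roll of a fair die (illustrative)
A = {2, 4, 6}                          # event A: an even number
B = {4, 5, 6}                          # event B: a number greater than 3

def prob(event):
    """P(E) = x / N for equally likely outcomes."""
    return Fraction(len(event), len(outcomes))

p_union = prob(A) + prob(B) - prob(A & B)    # addition law
p_a_given_b = prob(A & B) / prob(B)          # conditional probability
p_intersection = prob(B) * p_a_given_b       # multiplication rule
```

Because P(A | B) = 2/3 differs from P(A) = 1/2, these two events are not independent.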
PROBABILITY DISTRIBUTIONS
Mean or expected value of a discrete distribution: $\mu = E(x) = \sum [x P(x)]$
Variance of a discrete distribution: $\sigma^2 = \sum [(x - \mu)^2 P(x)]$
The Covariance
Definition formula: $\sigma_{XY} = \sum_{i=1}^{N} [X_i - E(X)][Y_i - E(Y)] P(X_i Y_i)$
Calculation formula: $\sigma_{XY} = \sum_{i=1}^{N} X_i Y_i P(X_i Y_i) - E(X)E(Y)$
Where
$X_i$, $Y_i$ = the ith outcomes of the discrete random variables X and Y respectively
$P(X_i Y_i)$ = probability of the ith occurrence of X and Y
Portfolio Expected Return and Portfolio Risk
Portfolio expected return (weighted average return): $E(P) = wE(X) + (1 - w)E(Y)$
Portfolio risk (weighted variability): $\sigma_p = \sqrt{w^2\sigma_X^2 + (1 - w)^2\sigma_Y^2 + 2w(1 - w)\sigma_{XY}}$
Where
$E(P)$ = portfolio expected return
$w$ = portion of portfolio value in asset X
$(1 - w)$ = portion of portfolio value in asset Y
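To make the covariance and portfolio formulas concrete, here is a minimal Python sketch. The three scenarios, their probabilities and returns, and the weight w = 0.5 are all invented for illustration.

```python
import math

# Hypothetical joint distribution: (probability, return on X, return on Y)
scenarios = [(0.2, -100, 200), (0.5, 100, 50), (0.3, 250, -100)]

e_x = sum(p * x for p, x, _ in scenarios)                  # E(X)
e_y = sum(p * y for p, _, y in scenarios)                  # E(Y)
var_x = sum(p * (x - e_x) ** 2 for p, x, _ in scenarios)   # variance of X
var_y = sum(p * (y - e_y) ** 2 for p, _, y in scenarios)   # variance of Y

# Covariance by the definition formula and by the calculation formula
cov_definition = sum(p * (x - e_x) * (y - e_y) for p, x, y in scenarios)
cov_calculation = sum(p * x * y for p, x, y in scenarios) - e_x * e_y

w = 0.5                                                    # assumed share held in asset X
portfolio_return = w * e_x + (1 - w) * e_y
portfolio_risk = math.sqrt(
    w**2 * var_x + (1 - w) ** 2 * var_y + 2 * w * (1 - w) * cov_definition
)
```

The negative covariance in this made-up example is what pulls the portfolio risk well below the standard deviation of either asset.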
Binomial Distribution: $P(X) = \frac{n!}{X!(n - X)!} p^X q^{n-X} = {}_nC_X \, p^X (1 - p)^{n-X}$ (if using a calculator)
$\mu = np$ and $\sigma^2 = np(1 - p)$
$q = 1 - p$
The standard deviation is the positive square root of the variance.
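The binomial formula can be evaluated directly with the standard library. In this sketch the function name `binomial_pmf` and the values n = 4, p = 0.5 are illustrative assumptions.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 4, 0.5                                        # illustrative parameters
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]  # P(0), P(1), ..., P(n)
mean = n * p                                         # mu = np
variance = n * p * (1 - p)                           # sigma^2 = np(1 - p)
std_dev = math.sqrt(variance)
```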
[ 3 & 4 ] STATISTICAL INFERENCE
SAMPLING DISTRIBUTIONS
SAMPLE MEAN
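$\bar{X} \sim N\left(\mu_{\bar{X}} = \mu, \ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\right)$
If the population standard deviation σ is not known, we use the sample standard deviation s as its estimate when we find the estimated value of $\sigma_{\bar{X}}$.
The test statistics we use are either
$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ (when σ is known)
or
$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$ (when σ is not known)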
THE SAMPLE PROPORTION
$p \sim N\left(\mu_p = \pi, \ \sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}\right)$
The test statistic we use is
$z = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}}$
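The test statistics for the sample mean and the sample proportion translate into a few lines of Python. The function names below are hypothetical, and the numbers in the usage note are invented.

```python
import math

def z_statistic(x_bar, mu, sigma, n):
    """z = (x_bar - mu) / (sigma / sqrt(n)), for known sigma."""
    return (x_bar - mu) / (sigma / math.sqrt(n))

def t_statistic(x_bar, mu, s, n):
    """t = (x_bar - mu) / (s / sqrt(n)), for unknown sigma; n - 1 degrees of freedom."""
    return (x_bar - mu) / (s / math.sqrt(n))

def z_statistic_proportion(p, pi, n):
    """z = (p - pi) / sqrt(pi (1 - pi) / n)."""
    return (p - pi) / math.sqrt(pi * (1 - pi) / n)
```

For example, `z_statistic(52, 50, 6, 36)` divides a difference of 2 by a standard error of 1, and `z_statistic_proportion(0.6, 0.5, 100)` divides 0.1 by 0.05.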
UNIFORM DISTRIBUTION
WORKING WITH FINITE POPULATIONS
Whenever n / N > 0.05, the standard error of the sample estimate is multiplied by the following term:
Finite Population Correction Factor = $\sqrt{\frac{N - n}{N - 1}}$
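As a sketch of how the correction is applied, the hypothetical helper below computes the standard error of the sample mean and multiplies it by the correction factor only when n / N exceeds 0.05.

```python
import math

def standard_error_of_mean(sigma, n, N):
    """sigma / sqrt(n), times sqrt((N - n) / (N - 1)) when n / N > 0.05."""
    se = sigma / math.sqrt(n)
    if n / N > 0.05:                         # sample is a large share of the population
        se *= math.sqrt((N - n) / (N - 1))   # finite population correction factor
    return se
```

With sigma = 10 and n = 25, a population of N = 10000 leaves the standard error at 2, while N = 101 shrinks it by the factor sqrt(76/100).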
INTERVAL ESTIMATION
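[ μ ] Sample Estimate ( $\bar{X}$ ) $\pm z_{\alpha/2} \sigma_{\bar{X}}$ = $\bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
or
Sample Estimate ( $\bar{X}$ ) $\pm t_{\alpha/2, n-1} \frac{s}{\sqrt{n}}$
[ p ] Sample Estimate ( $\hat{p}$ ) $\pm z_{\alpha/2} \sigma_{\hat{p}}$ = $\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$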
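The z-based intervals can be sketched with the standard library, which supplies the normal quantile through `statistics.NormalDist`. The function names and the 95% default are assumptions made for illustration; the t-based interval is omitted because the standard library has no t quantile.

```python
import math
from statistics import NormalDist

def z_interval_mean(x_bar, sigma, n, confidence=0.95):
    """x_bar +/- z_{alpha/2} * sigma / sqrt(n), for known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

def z_interval_proportion(p_hat, n, confidence=0.95):
    """p_hat +/- z_{alpha/2} * sqrt(p_hat * q_hat / n), with q_hat = 1 - p_hat."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width
```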
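Confidence Interval for the Population Total
Population Total = $N\bar{X}$
Confidence Interval Estimate: $N\bar{X} \pm N (t_{\alpha/2, n-1}) \frac{S}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
Confidence Interval for Total Difference
Total Difference = $N\bar{D}$
Mean Difference: $\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n}$
Where: $D_i$ = audited value - original value
Confidence Interval Estimate: $N\bar{D} \pm N (t_{\alpha/2, n-1}) \frac{S_D}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
Where: $S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$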
ESTIMATING SAMPLE SIZE
MEAN: $n = \frac{z_{\alpha/2}^2 \sigma^2}{E^2}$ where $E = (\bar{X} - \mu)$ is the error of estimation
PROPORTION: $n = \frac{z_{\alpha/2}^2 \pi(1 - \pi)}{E^2}$ where E is the largest value of $(p - \pi)$ we will tolerate
In both cases we take the solution for n given by the formula and ROUND UP.
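The two sample-size formulas, including the round-up step, might be coded as follows. The function names and the 95% default confidence level are assumptions for illustration; the z value comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def sample_size_mean(sigma, E, confidence=0.95):
    """n = z^2 * sigma^2 / E^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * sigma**2 / E**2)

def sample_size_proportion(pi, E, confidence=0.95):
    """n = z^2 * pi * (1 - pi) / E^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * pi * (1 - pi) / E**2)
```

With sigma = 15 and E = 3 the formula gives about 96.04, which rounds up to 97; with pi = 0.5 and E = 0.05 it gives about 384.15, which rounds up to 385.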
[ 5 ] REGRESSION ANALYSIS
POPULATION REGRESSION FUNCTION (PRF)
yi = β0 + β1 xi + εi
SAMPLE REGRESSION FUNCTION (SRF)
ŷ = b0 + b1 x
KEY VALUES USED IN REGRESSION CALCULATIONS
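$SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}$
$SS_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}$
$SS_{yy} = \sum (y - \bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n}$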
ESTIMATION FORMULAE
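Slope: $b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$
Intercept: $b_0 = \bar{y} - b_1\bar{x} = \frac{\sum y}{n} - b_1 \frac{\sum x}{n}$
$SSE = \sum y^2 - b_0 \sum y - b_1 \sum xy$
Standard error of estimate: $s_e = \sqrt{\frac{SSE}{n - 2}}$
Coefficient of determination: $r^2 = 1 - \frac{SSE}{SS_{yy}} = 1 - \frac{SSE}{\sum y^2 - \frac{(\sum y)^2}{n}}$
Computational formula for $r^2$: $r^2 = \frac{b_1^2 SS_{xx}}{SS_{yy}}$
Pearson Product-Moment Correlation Coefficient: $r = \frac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}}$
Adjusted $R^2$: $r_{adj}^2 = 1 - \left[(1 - r^2) \frac{n - 1}{n - k - 1}\right]$, where k is the number of independent variables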
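The estimation formulae chain together naturally, as the following Python sketch shows. The five (x, y) pairs are invented for illustration, and k = 1 is assumed because there is a single independent variable.

```python
import math

x = [1, 2, 3, 4, 5]                    # illustrative data
y = [2, 4, 5, 4, 5]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

ss_xy = sum_xy - sum_x * sum_y / n     # SSxy
ss_xx = sum_x2 - sum_x**2 / n          # SSxx
ss_yy = sum_y2 - sum_y**2 / n          # SSyy

b1 = ss_xy / ss_xx                               # slope
b0 = sum_y / n - b1 * sum_x / n                  # intercept
sse = sum_y2 - b0 * sum_y - b1 * sum_xy          # sum of squared errors
s_e = math.sqrt(sse / (n - 2))                   # standard error of estimate
r2 = 1 - sse / ss_yy                             # coefficient of determination
r2_computational = b1**2 * ss_xx / ss_yy         # same value via the shortcut
r = ss_xy / math.sqrt(ss_xx * ss_yy)             # Pearson correlation
k = 1                                            # one independent variable (assumed)
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)    # adjusted R^2
```

In this example the two routes to r² agree, and squaring the Pearson r gives the same value.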
Sampling Distribution for the estimated slope
t test for slope: $t = \frac{b_1 - \beta_1}{s_b}$ where $s_b = \frac{s_e}{\sqrt{SS_{xx}}}$
t test for correlation: $t = \frac{r - \rho}{\sqrt{\frac{1 - r^2}{n - 2}}}$
where: $r = +\sqrt{r^2}$ if $b_1 > 0$
$r = -\sqrt{r^2}$ if $b_1 < 0$
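A short Python sketch of the two t statistics follows. The function names are hypothetical, and the defaults beta1 = 0 and rho = 0 are an assumption corresponding to the usual null hypothesis of no linear relationship.

```python
import math

def t_slope(b1, s_e, ss_xx, beta1=0.0):
    """t = (b1 - beta1) / s_b, with s_b = s_e / sqrt(SSxx)."""
    s_b = s_e / math.sqrt(ss_xx)
    return (b1 - beta1) / s_b

def t_correlation(r, n, rho=0.0):
    """t = (r - rho) / sqrt((1 - r^2) / (n - 2))."""
    return (r - rho) / math.sqrt((1 - r**2) / (n - 2))

def signed_r(r2, b1):
    """r = +sqrt(r^2) if b1 > 0, -sqrt(r^2) if b1 < 0 (b1 = 0 is treated as positive here)."""
    return math.copysign(math.sqrt(r2), b1)

# Hypothetical inputs from a small simple regression with n = 5
b1, s_e, ss_xx, r2, n = 0.6, math.sqrt(0.8), 10.0, 0.6, 5
t_b = t_slope(b1, s_e, ss_xx)
t_r = t_correlation(signed_r(r2, b1), n)
```

In simple regression with both null values set to zero, the two statistics coincide, which the example reproduces.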
COMPLETE MODEL EVALUATION FORMULAE
Total variation = Explained Variation + Unexplained Variation
SST = SSR + SSE
Where $SSR = \sum (\hat{y} - \bar{y})^2$
These results are used in the following definition
Coefficient of determination: $r^2 = 1 - \frac{SSE}{SST} = 1 - \frac{SSE}{\sum y^2 - \frac{(\sum y)^2}{n}}$
INTERVAL ESTIMATION
CONFIDENCE INTERVAL FOR THE CONDITIONAL MEAN
$\hat{y}_i \pm t_{n-2} \, s_{YX} \sqrt{h_i}$, where $h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$
PREDICTION INTERVAL FOR AN INDIVIDUAL RESPONSE
$\hat{y}_i \pm t_{n-2} \, s_{YX} \sqrt{1 + h_i}$, where $h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$
[6] TIME SERIES FORECASTING AND INDEX NUMBERS
Mean Absolute Deviation: $MAD = \frac{\sum_{i=1}^{n} |e_i|}{n}$
Mean Square Error: $MSE = \frac{\sum_{i=1}^{n} e_i^2}{n}$
Exponential Smoothing: $F_2 = X_1$
$F_{t+1} = aX_t + (1 - a)F_t$
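The smoothing recursion and the two error measures can be sketched in Python as below. The series, the smoothing constant a = 0.5 and the function name are illustrative assumptions; errors are taken as actual minus forecast over the periods that have both.

```python
def exponential_smoothing(series, a):
    """Return the forecasts [F_2, ..., F_{n+1}], starting from F_2 = X_1."""
    forecasts = [series[0]]                               # F_2 = X_1
    for x_t in series[1:]:                                # X_2, ..., X_n
        forecasts.append(a * x_t + (1 - a) * forecasts[-1])
    return forecasts

series = [10, 12, 14, 12]                                 # illustrative X_1 .. X_4
forecasts = exponential_smoothing(series, a=0.5)          # F_2 .. F_5

# Errors e_t = X_t - F_t for t = 2 .. n
errors = [x - f for x, f in zip(series[1:], forecasts[:-1])]
mad = sum(abs(e) for e in errors) / len(errors)           # mean absolute deviation
mse = sum(e * e for e in errors) / len(errors)            # mean square error
```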
Simple Weighted Index: $I_i = \frac{X_i}{X_0} (100)$
Weighted Aggregate Price Index: $I_i = \frac{\sum P_i Q_i}{\sum P_0 Q_0} (100)$
Laspeyres Price Index: $I_L = \frac{\sum P_i Q_0}{\sum P_0 Q_0} (100)$
Paasche Price Index: $I_P = \frac{\sum P_i Q_i}{\sum P_0 Q_i} (100)$
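The difference between the Laspeyres and Paasche indices is only which quantities are used as weights, as this Python sketch illustrates. The two-item basket and all prices and quantities are invented.

```python
p0 = [2.0, 5.0]        # base-period prices (illustrative)
q0 = [10, 4]           # base-period quantities
pi = [3.0, 6.0]        # current-period prices
qi = [8, 5]            # current-period quantities

def weighted_sum(prices, quantities):
    """Sum of price * quantity over all items."""
    return sum(p * q for p, q in zip(prices, quantities))

# Laspeyres: current and base prices, both weighted by base-period quantities
laspeyres = weighted_sum(pi, q0) / weighted_sum(p0, q0) * 100

# Paasche: current and base prices, both weighted by current-period quantities
paasche = weighted_sum(pi, qi) / weighted_sum(p0, qi) * 100
```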
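Non-Linear Trend Forecasting:
Quadratic form: $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i$
Exponential trend: $Y_i = \beta_0 \beta_1^{X_i} \varepsilon_i$
Exponential trend (logged transformation): $\log(Y_i) = \log(\beta_0) + X_i \log(\beta_1) + \log(\varepsilon_i)$
Model selection:
First differences (linear trend): $(Y_2 - Y_1) = (Y_3 - Y_2) = \cdots = (Y_n - Y_{n-1})$
Second differences (quadratic): $[(Y_3 - Y_2) - (Y_2 - Y_1)] = [(Y_4 - Y_3) - (Y_3 - Y_2)] = \cdots = [(Y_n - Y_{n-1}) - (Y_{n-1} - Y_{n-2})]$
Percentage differences (exponential): $\frac{(Y_2 - Y_1)}{Y_1} \times 100\% = \frac{(Y_3 - Y_2)}{Y_2} \times 100\% = \cdots = \frac{(Y_n - Y_{n-1})}{Y_{n-1}} \times 100\%$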
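The model-selection checks amount to seeing which set of differences is roughly constant. A minimal Python sketch, with hypothetical function names and made-up series, could look like this:

```python
def first_differences(ys):
    """Y_2 - Y_1, Y_3 - Y_2, ..., Y_n - Y_{n-1}."""
    return [b - a for a, b in zip(ys, ys[1:])]

def second_differences(ys):
    """Differences of the first differences."""
    return first_differences(first_differences(ys))

def percentage_differences(ys):
    """(Y_t - Y_{t-1}) / Y_{t-1} * 100 for t = 2 .. n."""
    return [(b - a) / a * 100 for a, b in zip(ys, ys[1:])]

linear = [3, 5, 7, 9]                  # constant first differences
quadratic = [1, 4, 9, 16, 25]          # constant second differences
exponential = [100, 110, 121]          # constant percentage differences
```

Real series will rarely give exactly equal differences, so in practice the comparison is about which set is closest to constant.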