Elementary Statistics


    Elementary Statistics Dr. Ghamsary Chapter 2 Page 1


    Elementary Statistics

    M. Ghamsary, Ph.D.

    Chapter 02


    Descriptive Statistics

    Grouped vs Ungrouped Data

    Ungrouped data: have not been summarized in any way; they are also called raw data.

    Grouped data: have been organized into a frequency distribution.

    Raw Data: When data are collected in original form, they are called raw data.

    The following are the scores on the first test of the statistics class in fall of 2004.

    76 78 71 86 80

    62 55 89 66 72

    68 96 78 81 82

    69 89 88 85 86

    79 73 58 85 99

    90 66 76 70 63

    79 88 59 55 75

    86 92 92 62 83

    52 94 93 80 78

    97 50 88 60 61

    Table 2.1: Data from Test #1 of fall 2007

    Stem-and-Leaf: One method of displaying a set of data is with a stem-and-leaf plot.

    Stem Leaf

    5 0 2 5 5 8 9

    6 0 1 2 2 3 6 6 8 9

    7 0 1 2 3 5 6 6 8 8 8 9 9

    8 0 0 1 2 3 5 5 6 6 6 8 8 8 9 9

    9 0 2 2 3 4 6 7 9
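The stem-and-leaf display above can be rebuilt from the raw scores of Table 2.1. The following is a minimal Python sketch, not part of the original notes; the function name `stem_and_leaf` is illustrative, and it assumes two-digit scores so that the tens digit is the stem and the units digit is the leaf.

```python
from collections import defaultdict

# Test #1 scores, copied row by row from Table 2.1.
scores = [
    76, 78, 71, 86, 80, 62, 55, 89, 66, 72,
    68, 96, 78, 81, 82, 69, 89, 88, 85, 86,
    79, 73, 58, 85, 99, 90, 66, 76, 70, 63,
    79, 88, 59, 55, 75, 86, 92, 92, 62, 83,
    52, 94, 93, 80, 78, 97, 50, 88, 60, 61,
]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)


plot = stem_and_leaf(scores)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Each printed row is one stem followed by its leaves in increasing order, so the number of leaves in a row is the frequency of that class.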

    Grouped Data: When the raw data are organized into a frequency distribution, they are called grouped data.

    Frequency Distribution: is the organizing of raw data in table form, using classes and frequencies.


    Class    Frequency
    50-59    6
    60-69    9
    70-79    12
    80-89    15
    90-99    8

    Class: The number of classes in the above table is 5.

    Class Limits: represent the smallest and largest data values in each class.

    Lower Class Limit: the lowest number in each class. In the above table, 50 is the lower class limit of the first class, 60 is the lower class limit of the 2nd class, etc.

    Upper Class Limit: the highest number in each class. In the above table, 59 is the upper class limit of the first class, 69 is the upper class limit of the 2nd class, etc.

    Class Width: for a class in a frequency distribution is found by subtracting the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class. In the above table the class width is 60 - 50 = 10.

    Class Boundaries: are used to separate the classes so that there are no gaps in the frequency distribution.

    Class    Class Boundaries    Frequency
    50-59    49.5-59.5           6
    60-69    59.5-69.5           9
    70-79    69.5-79.5           12
    80-89    79.5-89.5           15
    90-99    89.5-99.5           8
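As a rough illustration of how the class limits, class width, class boundaries and frequencies fit together, here is a small Python sketch that counts the Table 2.1 scores into the five classes. The variable names are illustrative, and the half-unit boundary offset assumes whole-number scores.

```python
# Test #1 scores, copied row by row from Table 2.1.
scores = [
    76, 78, 71, 86, 80, 62, 55, 89, 66, 72,
    68, 96, 78, 81, 82, 69, 89, 88, 85, 86,
    79, 73, 58, 85, 99, 90, 66, 76, 70, 63,
    79, 88, 59, 55, 75, 86, 92, 92, 62, 83,
    52, 94, 93, 80, 78, 97, 50, 88, 60, 61,
]

lower_limits = [50, 60, 70, 80, 90]
# Class width: difference between consecutive lower class limits.
width = lower_limits[1] - lower_limits[0]

table = []
for lower in lower_limits:
    upper = lower + width - 1                 # upper class limit
    boundaries = (lower - 0.5, upper + 0.5)   # close the gap between classes
    freq = sum(lower <= s <= upper for s in scores)
    table.append((lower, upper, boundaries, freq))

for lower, upper, (low_b, high_b), freq in table:
    print(f"{lower}-{upper}  {low_b}-{high_b}  {freq}")
```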


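    Cumulative Frequency: the sum of the frequency of a class and the frequencies of all classes before it.

    Relative Frequency: the frequency of a class divided by the total number of observations, n.

    Class    Frequency    Cumulative Frequency    Relative Frequency
    50-59    6            6                       6/50 = 0.12
    60-69    9            9+6 = 15                9/50 = 0.18
    70-79    12           12+15 = 27              12/50 = 0.24
    80-89    15           15+27 = 42              15/50 = 0.30
    90-99    8            8+42 = 50               8/50 = 0.16

    n = 50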
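The two derived columns of the table can be checked with a few lines of Python. This is a sketch added for illustration; the list names are not from the notes.

```python
from itertools import accumulate

frequencies = [6, 9, 12, 15, 8]   # classes 50-59 through 90-99
n = sum(frequencies)              # total number of scores

# Running total of the frequencies, class by class.
cumulative = list(accumulate(frequencies))
# Share of all scores that falls in each class.
relative = [f / n for f in frequencies]

for f, c, r in zip(frequencies, cumulative, relative):
    print(f, c, round(r, 2))
```

The last cumulative frequency always equals n, and the relative frequencies always add up to 1.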

    Most Popular Graphs in Statistics

    The most commonly used graphs in statistics are:

    1. The Histogram
    2. The Frequency Polygon
    3. The Cumulative Frequency Graph
    4. The Bar Chart
    5. Pie Chart
    6. Pareto Charts
    7. Dot Plot
    8. Stem-Leaf
    9. Time Series Graph

    1. The Histogram

    o Making decisions about a process, product, or procedure that could be improved after examining the variation (example: Should the school invest in a computer-based tutoring program for low-achieving students in Algebra I after examining the grade distribution? Are more shafts being produced out of specifications that are too big rather than too small?)

    o Displaying easily the variation in the process (example: Which units are causing the most difficulty for students? Is the variation in a process due to parts that are too long or parts that are too short?)


    [Figure: Histogram of Test1 (Normal). x-axis: Test1; y-axis: Frequency. Mean 76.8, StDev 12.98, N 50.]

    [Figure: Probability Plot of Test1, Normal - 95% CI. x-axis: Test1; y-axis: Percent. Mean 76.8, StDev 12.98, N 50, AD 0.537, P-Value 0.161.]


    2. The Frequency Polygon

    o Making decisions about a process, product or procedure that could be improved (example: a frequency polygon for 642 psychology test scores, shown below to the right.)

    X Frequency

    54.5 6

    64.5 9

    74.5 12

    84.5 15

    94.5 8

    [Figure: Scatterplot of f vs x (frequency polygon). x-axis: Midpoints x; y-axis: f.]


    3. The Cumulative Frequency Graph (Ogive)

    Cumulative frequency is used to determine the number of observations that lie above

    (or below) a particular value.

    Upper Class Boundaries    Cumulative Frequency

    59.5 6

    69.5 15

    79.5 27

    89.5 42

    99.5 50

    [Figure: Scatterplot of Cumulative f vs x (ogive). x-axis: Upper Class Boundaries; y-axis: Cumulative f.]


    4. The Bar Chart

    Bar charts are useful for comparing classes or groups of data. A class or group can have a single category of data, or it can be broken down further into multiple categories for greater depth of analysis.

    Class Grade Frequency

    50-59 F 6

    60-69 D 9

    70-79 C 12

    80-89 B 15

    90-99 A 8

    [Figure: Bar chart of Frequency by Grade. x-axis: Grade (F, D, C, B, A); y-axis: Frequency.]


    5. Pie Chart

    o A pie chart is a way of summarizing a set of categorical data or displaying the different values of a given variable (example: percentage distribution).

    o Pie charts usually show the component parts of a whole. Often you will see a segment of the drawing separated from the rest of the pie in order to emphasize an important piece of information.

    [Figure: Pie chart of Grade, with the following slices.]

    Grade    Frequency    Percent
    A        8            16.0%
    B        15           30.0%
    C        12           24.0%
    D        9            18.0%
    F        6            12.0%


    6. Pareto Charts

    A Pareto chart is used to graphically summarize and display the relative importance of the differences between groups of data.

    [Figure: Pareto chart of Grade, bars ordered from the most to the least frequent grade (B, C, D, A, F). y-axis: Frequency.]

    7. Dot Plot

    A dot plot shows each data value as a dot above its position on a number line; repeated values are stacked, so the shape of the distribution can be seen.

    [Figure: Dotplot of Test1. x-axis: Test1.]


    8. Stem-Leaf

    o The Stem-and-Leaf Plot summarizes the shape of a set of data (the distribution) and provides extra detail regarding individual values.

    o They are usually used when there are large amounts of numbers to analyze. Series of scores on sports teams, series of temperatures or rainfall over a period of time, and series of classroom test scores are examples of when Stem and Leaf Plots could be used.

    Stem Leaf

    5 0 2 5 5 8 9

    6 0 1 2 2 3 6 6 8 9

    7 0 1 2 3 5 6 6 8 8 8 9 9

    8 0 0 1 2 3 5 5 6 6 6 8 8 8 9 9

    9 0 2 2 3 4 6 7 9

    9. Time Series Graph

    [Figure: Time Series Plot of AOL, MSFT. x-axis: Month; y-axis: price; variables: AOL, MSFT.]

    Month    Price of AOL    Price of MSFT
    Jan      65              110
    Feb      60              115
    Mar      58              120
    Apr      62              100
    May      55              95
    Jun      50              90
    Jul      48              85
    Aug      55              75
    Sep      57              80
    Oct      50              60
    Nov      48              50
    Dec      40              40


    Types of Distributions:

    There are several different kinds of distributions, but the following are the most commonly used in statistics.

    o Symmetric, normal, or bell shape
    o Positively skewed, right tail, or skewed to the right side
    o Negatively skewed, left tail, or skewed to the left side
    o Uniform

    Symmetric, Bell Shape, or Normal Distribution

    [Figure: histogram of a symmetric, bell-shaped distribution.]


    Positively skewed

    [Figure: histogram of a positively skewed distribution.]

    Negatively skewed

    [Figure: histogram of a negatively skewed distribution.]


    Uniform

    [Figure: histogram of a uniform distribution.]


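    1 = Female, 0 = Male

    [Figure: bar chart of Count by Sex (Male, Female), with bars split by Grade (F, D, C, B, A).]

    [Figure: bar chart of Count by Grade (F, D, C, B, A), with bars split by Sex (Male, Female).]

    [Figure: bar chart of Percent by Grade (F, D, C, B, A), with bars split by Sex (Male, Female).]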

    Test1 Sex Grade Test1 Sex Grade

    76 1 C 76 1 C

    62 1 D 59 1 F

    68 1 D 92 1 A

    69 1 D 93 1 A

    79 0 C 88 0 B

    90 0 A 86 0 B

    79 1 C 66 0 D

    86 1 B 81 1 B

    52 0 F 85 0 B

    97 1 A 85 0 B

    78 1 C 70 1 C

    55 1 F 55 1 F

    96 1 A 62 1 D

    89 1 B 80 1 B

    73 0 C 60 1 D

    66 0 D 80 1 B

    88 1 B 72 1 C

    92 0 A 82 0 B

    94 1 A 86 1 B

    50 1 F 99 1 A

    71 0 C 63 1 D

    89 0 B 75 1 C

    78 1 C 83 1 B

    88 0 B 78 0 C

    58 1 F 61 1 D


    [Figure: Boxplot of Test1 vs Sex. x-axis: Sex (Male, Female); y-axis: Test1.]


    Numerical measurements:

    Statistic: any value(s) or measure(s) obtained from a sample.

    Parameter: any value(s) or measure(s) obtained from a specific population.

    Measures of central tendency: are Mean, Median, and Mode.

    Mean: is defined to be the sum of the scores in the data set divided by the total number of scores.

    o Sample Mean: is denoted by $\bar{x}$, and it is defined by:

    $$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \quad \text{or simply} \quad \bar{x} = \frac{\sum x}{n}.$$

    o Population Mean: is denoted by $\mu$, and it is defined by:

    $$\mu = \frac{\sum_{i=1}^{N} x_i}{N}, \quad \text{or simply} \quad \mu = \frac{\sum x}{N}.$$

    Note: The sample mean, $\bar{x}$, is an unbiased estimate of the population mean, $\mu$.

    Example1: Find the mean of 10, 7, 3, 12, 18.

    $$\bar{x} = \frac{10 + 7 + 3 + 12 + 18}{5} = \frac{50}{5} = 10.$$

    Example2: Find the mean of 10, 7, 3, 12, 18, 13, 17, 15, 25, 30.

    $$\bar{x} = \frac{10 + 7 + 3 + 12 + 18 + 13 + 17 + 15 + 25 + 30}{10} = \frac{150}{10} = 15.$$

    Example3: Find the mean of the scores on test #1, 2004, in the data set of this chapter.

    $$\bar{x} = \frac{76 + 62 + \cdots + 78 + 61}{50} = 76.8.$$


    Median: is defined to be the midpoint of the data set that is arranged from smallest to largest.

    Example4: Find the median of 10, 7, 3, 12, 15.

    Solution: First we must sort the data set as follows: 3, 7, 10, 12, 15.

    The median is 10.

    Example5: Find the median of 10, 7, 3, 12, 15, 20.

    Solution: After we sort we get: 3, 7, 10, 12, 15, 20.

    As we observe, there are 2 middle observations. So to find the median we average these 2 values,

    namely: Median=(10+12)/2 =11.

    Example6: The median of the scores on test #1, 2004, in the data set of this chapter is 78.50.

    Median = 78.50

    Mode: is defined to be the value in the data set that occurs most frequently.

    Example7A: Find the mode of 10, 7, 3, 12, 15, 3.

    Mode is 3.

    Example7B: Find the mode of 10, 7, 3, 10, 15, 3.

    Modes are 3 and 10.

    Example7C: Find the mode of 10, 7, 3, 10, 10, 3.

    Mode is 10.

    Example7D: Find the mode of 10, 7, 3, 10, 7, 3.

    There is no mode, since all values occur with same frequency

    Example7E: Find the mode of 10, 7, 3, 12, 15, 18.

    There is no mode, since no values occur more than once.
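The small examples above can be checked with Python's standard `statistics` module. The `modes` helper below is a hypothetical function written for this sketch: it follows the convention of Examples 7D and 7E, where a data set whose values all occur equally often is reported as having no mode. Library mode functions do not generally follow that convention.

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """Return the mode(s); an empty list when all values occur equally often."""
    counts = Counter(values)
    top = max(counts.values())
    # Several distinct values, all tied: treated here as "no mode".
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(mean([10, 7, 3, 12, 18]))          # Example 1
print(median([10, 7, 3, 12, 15]))        # Example 4
print(median([10, 7, 3, 12, 15, 20]))    # Example 5
print(modes([10, 7, 3, 12, 15, 3]))      # Example 7A
print(modes([10, 7, 3, 10, 15, 3]))      # Example 7B
print(modes([10, 7, 3, 10, 7, 3]))       # Example 7D
```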


    Example 8: Find the mean, the median, and the mode of data set:

    10, 17, 13, 12, 15, 18, 10, 17, 14, 16, 35, 28, 22, 17, 23, 12, 15, 28, 10, 20

    Solution: First we must sort the data set

    10, 10, 10, 12, 12, 13, 14, 15, 15, 16, 17, 17, 17, 18, 20, 22, 23, 28, 28, 35

    o Mean: $\bar{x} = \frac{10 + 10 + 10 + 12 + \cdots + 28 + 28 + 35}{20} = \frac{352}{20} = 17.6$

    o Median: $\frac{16 + 17}{2} = 16.5$, since there are 2 middle observations

    o Mode: 10 and 17

    Example 9: Find the mean, the median, and the mode of data set:

    25, 42, 18, 37, 25, 18, 40, 57, 64, 66, 85, 86, 92

    85, 88, 92, 67, 33, 75, 85, 48, 60, 80, 60, 50

    Example10: Find the mean, the median, and the mode of data set:

    12.37, 13.33, 32.67, 12.37, 26.45


    Example11A: Find the mean for the following grouped data

    Class    Frequency
    50-59    6
    60-69    9
    70-79    12
    80-89    15
    90-99    8

    Solution: First we need to find the class marks (midpoints) and then we use the following formula:

    $$\bar{x} = \frac{\sum [x \cdot f]}{n},$$

    where
    x : is the midpoint or class mark,
    f : is the frequency, and
    n : is the number of data points.

    Class    Frequency f    Class marks x    x·f
    50-59    6              54.5             327
    60-69    9              64.5             580.5
    70-79    12             74.5             894
    80-89    15             84.5             1267.5
    90-99    8              94.5             756

    $n = \sum f = 50$, $\sum [x \cdot f] = 3825$

    So the mean is $\bar{x} = \frac{\sum [x \cdot f]}{n} = \frac{3825}{50} = 76.5$.
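The grouped-mean calculation of Example 11A can be reproduced with a short Python sketch. The variable names are illustrative; each class mark is taken as the average of the lower and upper class limits.

```python
classes = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]
frequencies = [6, 9, 12, 15, 8]

# Class mark (midpoint) of each class.
midpoints = [(lower + upper) / 2 for lower, upper in classes]
n = sum(frequencies)

# Sum of x*f over all classes, then divide by n.
sum_xf = sum(x * f for x, f in zip(midpoints, frequencies))
grouped_mean = sum_xf / n

print(midpoints, sum_xf, grouped_mean)
```

The grouped mean (76.5) differs slightly from the raw-data mean (76.8) because every score in a class is treated as if it sat at the class mark.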


    Example11B: Find the mean for the following grouped data

    Class    Frequency
    00-04    4
    05-09    10
    10-14    12
    15-19    20
    20-24    8
    25-29    6

    Weighted Average (Mean): The formula above is also called the weighted average or weighted mean. It can also be written as follows:

    $$\bar{x} = \frac{\sum [w \cdot x]}{\sum w},$$

    where w is the weight and x is the score.

    Example12: Find the GPA of John, who has the following courses with the corresponding units and grades.

    English: 5 units with the grade of A
    Math: 3 units with the grade of F
    Spanish: 2 units with the grade of D

    Solution: In this problem, x will be the value of the grades and w is the number of units,

    $$\bar{x} = \frac{\sum [w \cdot x]}{\sum w} = \frac{4.0[5] + 0.0[3] + 1.0[2]}{5 + 3 + 2} = \frac{20 + 0 + 2}{10} = \frac{22}{10} = 2.2.$$

    Example13: A teacher is teaching 3 classes: There are 30 students in the first class with the average of 70 on the final exam. The second class has 40 students with the average of 60 on the final exam. The 3rd class has 20 students with the average of 80 on the final exam. Find the weighted average of the three classes combined together.

    Solution: Let x be the average of each class and w be the number of students.

    $$\bar{x} = \frac{\sum [w \cdot x]}{\sum w} = \frac{70(30) + 60(40) + 80(20)}{30 + 40 + 20} = \frac{2100 + 2400 + 1600}{90} = \frac{6100}{90} \approx 67.8$$
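Both weighted-average examples follow the same pattern, which a small helper makes explicit. The function name `weighted_mean` is illustrative, and the grade values A = 4.0, D = 1.0, F = 0.0 are the grade points used in Example 12.

```python
def weighted_mean(values, weights):
    """Sum of weight*value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Example 12: grade points weighted by course units.
gpa = weighted_mean([4.0, 0.0, 1.0], [5, 3, 2])

# Example 13: class averages weighted by class sizes.
combined = weighted_mean([70, 60, 80], [30, 40, 20])

print(round(gpa, 2), round(combined, 1))
```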


    Measures of Variation

    Range, Variance, Standard Deviation

    The Range: is defined to be the highest value minus the lowest value in the data set.

    The Variance: is defined by the following:

    Sample:

    $$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \quad \text{or} \quad s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$

    (short cut formula of the sample variance).

    Population:

    $$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}, \quad \text{or} \quad \sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$$

    (short cut formula of the population variance).

    Standard deviation: is the positive square root of the variance.

    $$\text{Standard deviation} = \sqrt{\text{Variance}}$$

    Sample:

    $$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}},$$

    and Population:

    $$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$


    Example14A: Find the range, variance, and the standard deviation of the following data set.

    3, 0, 7, 5, 15.

    Solution:

    o Range: Largest - Smallest = 15 - 0 = 15

    o Variance: If we use $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, first we need to find the sample mean $\bar{x}$.

    So $\bar{x} = \frac{3 + 0 + 7 + 5 + 15}{5} = \frac{30}{5} = 6$, then we substitute in the above formula and we get

    $$s^2 = \frac{(3 - 6)^2 + (0 - 6)^2 + (7 - 6)^2 + (5 - 6)^2 + (15 - 6)^2}{5 - 1},$$

    $$s^2 = \frac{(-3)^2 + (-6)^2 + (1)^2 + (-1)^2 + (9)^2}{5 - 1},$$

    $$s^2 = \frac{9 + 36 + 1 + 1 + 81}{5 - 1},$$

    $$s^2 = \frac{128}{4} = 32.$$

    So the variance is $s^2 = 32$.

    x     x - x̄          (x - x̄)²
    3     3 - 6 = -3      9
    0     0 - 6 = -6      36
    7     7 - 6 = 1       1
    5     5 - 6 = -1      1
    15    15 - 6 = 9      81

    $\sum (x - \bar{x}) = 0$, $\sum (x - \bar{x})^2 = 128$

    $$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{128}{5 - 1} = \frac{128}{4} = 32$$

    o Standard deviation: As we know, the standard deviation is the positive square root of the variance.

    $$\text{standard deviation} = \sqrt{\text{Variance}} = \sqrt{32} \approx 5.66$$


    But if we use the short cut formula

    $$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1},$$

    first we need to find their sum, $\sum x$, and their sum of squares, $\sum x^2$.

    $$\sum x = 3 + 0 + 7 + 5 + 15 = 30$$

    $$\sum x^2 = 3^2 + 0^2 + 7^2 + 5^2 + 15^2 = 9 + 0 + 49 + 25 + 225 = 308$$

    then we have

    $$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{308 - \frac{(30)^2}{5}}{5 - 1} = \frac{308 - \frac{900}{5}}{4} = \frac{308 - 180}{4} = \frac{128}{4} = 32,$$

    which is exactly the same as above.
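The agreement between the definition and the short cut formula in Example 14A can be confirmed numerically. This is a minimal sketch using only the standard library; `statistics.variance` computes the sample variance (divisor n - 1) and serves as an independent check.

```python
from math import sqrt
from statistics import variance

data = [3, 0, 7, 5, 15]
n = len(data)
x_bar = sum(data) / n

# Definition: squared deviations from the mean, divided by n - 1.
s2_definition = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Short cut: sum of squares minus (sum squared)/n, divided by n - 1.
s2_shortcut = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

s = sqrt(s2_definition)
data_range = max(data) - min(data)

print(data_range, s2_definition, s2_shortcut, variance(data), round(s, 2))
```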

    ----------------------------------------------------------------------------------

    Example14B: Find the range, variance, and the standard deviation of the following data set.

    10, 17, 13, 12, 15, 18, 10, 17, 14, 16

    28, 22, 17, 23, 12, 15, 28, 10, 20, 35

    Solution:


    Example15A: Find the standard deviation for the following grouped data

    Class    Frequency
    50-59    6
    60-69    9
    70-79    12
    80-89    15
    90-99    8

    Solution: First we modify the above formula for the variance. We need to find the class marks (midpoints) and then we use the following formula

    $$s^2 = \frac{\sum (x_i - \bar{x})^2 \cdot f}{n - 1} \quad \text{or} \quad s^2 = \frac{\sum x^2 \cdot f - \frac{(\sum x \cdot f)^2}{n}}{n - 1},$$

    where
    x : is the midpoint or class mark
    f : is the frequency
    n : is the number of data points

    We already know the mean $\bar{x} = \frac{\sum [x \cdot f]}{n} = \frac{3825}{50} = 76.5$.

    Class    f     x       x·f       (x - x̄)²                  (x - x̄)²·f
    50-59    6     54.5    327       (54.5 - 76.5)² = 484       2904
    60-69    9     64.5    580.5     (64.5 - 76.5)² = 144       1296
    70-79    12    74.5    894       (74.5 - 76.5)² = 4         48
    80-89    15    84.5    1267.5    (84.5 - 76.5)² = 64        960
    90-99    8     94.5    756       (94.5 - 76.5)² = 324       2592

    $n = \sum f = 50$, $\sum x \cdot f = 3825$, $\sum (x_i - \bar{x})^2 \cdot f = 7800$


    After substitution in $s^2 = \frac{\sum (x_i - \bar{x})^2 \cdot f}{n - 1}$ we get

    $$s^2 = \frac{7800}{50 - 1} = 159.18,$$

    and hence the standard deviation will be $s = \sqrt{159.18} \approx 12.62$.

    If we use the short cut formula

    $$s^2 = \frac{\sum x^2 \cdot f - \frac{(\sum x \cdot f)^2}{n}}{n - 1},$$

    we need the following table.

    Class    f     x       x·f       x²·f
    50-59    6     54.5    327       (54.5)²·6 = 17821.5
    60-69    9     64.5    580.5     (64.5)²·9 = 37442.25
    70-79    12    74.5    894       (74.5)²·12 = 66603
    80-89    15    84.5    1267.5    (84.5)²·15 = 107103.75
    90-99    8     94.5    756       (94.5)²·8 = 71442

    $n = \sum f = 50$, $\sum x \cdot f = 3825$, $\sum x^2 \cdot f = 300412.5$

    $$s^2 = \frac{300412.5 - \frac{(3825)^2}{50}}{50 - 1} = \frac{300412.5 - \frac{14630625}{50}}{49} = \frac{300412.5 - 292612.5}{49} = \frac{7800}{49} = 159.18,$$

    and hence the standard deviation will be $s = \sqrt{159.18} \approx 12.62$, which is the same as the above result.
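The grouped-data calculation of Example 15A can be repeated in Python to confirm that both formulas give the same variance. This is a sketch with illustrative variable names, working from the class marks and frequencies of the table.

```python
from math import sqrt

midpoints = [54.5, 64.5, 74.5, 84.5, 94.5]
frequencies = [6, 9, 12, 15, 8]
n = sum(frequencies)

sum_xf = sum(x * f for x, f in zip(midpoints, frequencies))
sum_x2f = sum(x * x * f for x, f in zip(midpoints, frequencies))
grouped_mean = sum_xf / n

# Definition: frequency-weighted squared deviations, divided by n - 1.
sum_sq_dev = sum((x - grouped_mean) ** 2 * f
                 for x, f in zip(midpoints, frequencies))
s2_definition = sum_sq_dev / (n - 1)

# Short cut formula for grouped data.
s2_shortcut = (sum_x2f - sum_xf ** 2 / n) / (n - 1)

s = sqrt(s2_definition)
print(sum_sq_dev, round(s2_definition, 2), round(s2_shortcut, 2), round(s, 2))
```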


    Example15B: Find the standard deviation for the following grouped data

    Class    Frequency
    00-04    4
    05-09    10
    10-14    12
    15-19    20
    20-24    8
    25-29    6


    Question 1. What will happen to the mean, median, mode, range, and standard deviation if we add a fixed number, c, to all values in the data set?

    Answer. The mean, median, and mode will increase by c units, but the range and standard deviation will not change.

    Question 2. What will happen to the mean, median, mode, range, and standard deviation if we subtract a fixed number, c, from all values in the data set?

    Answer. The mean, median, and mode will decrease by c units, but the range and standard deviation will not change.

    Question 3. What will happen to the mean, median, mode, range, and standard deviation if we multiply all values in the data set by a fixed number, c?

    Answer. The mean, median, and mode will be multiplied by c, and the range and standard deviation will be multiplied by |c|.

    Example 16:

    X     X+7        X-7        X*7
    15    15+7=22    15-7=8     15*7=105
    13    13+7=20    13-7=6     13*7=91
    15    15+7=22    15-7=8     15*7=105
    15    15+7=22    15-7=8     15*7=105
    22    22+7=29    22-7=15    22*7=154

              X       X+7        X-7       X*7
    Mean      16      16+7=23    16-7=9    16*7=112
    Median    15      15+7=22    15-7=8    15*7=105
    Mode      15      15+7=22    15-7=8    15*7=105
    Range     9       9          9         9*7=63
    Sd        3.46    3.46       3.46      3.46*7=24.22

    In general, if $Y = aX + b$, then we have

    Mean of Y = a·[Mean of X] + b, or $\bar{y} = a\bar{x} + b$

    Standard deviation of Y = |a|·[standard deviation of X], or $S_Y = |a| S_X$
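The rule for $Y = aX + b$ can be checked on the data of Example 16. The sketch below uses the standard `statistics` module; the helper name `transform` is illustrative.

```python
from statistics import mean, median, stdev

x = [15, 13, 15, 15, 22]


def transform(values, a, b):
    """Apply y = a*x + b to every value."""
    return [a * v + b for v in values]


# (a, b) pairs for the columns X+7, X-7 and X*7.
for a, b in [(1, 7), (1, -7), (7, 0)]:
    y = transform(x, a, b)
    print(a, b, mean(y), median(y), max(y) - min(y), round(stdev(y), 2))
```

For each column the printed mean equals a·16 + b, while the range and standard deviation are only scaled by |a|. The last column prints a standard deviation of 24.25; the table's 24.22 comes from multiplying the already rounded 3.46 by 7.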


    Empirical Rule

    If the distribution of a data set is bell shaped or normal, then

    Approximately 68% of scores are within one standard deviation of the mean. They fall in the interval $(\bar{x} - 1s,\ \bar{x} + 1s)$.

    Approximately 95% of scores are within two standard deviations of the mean. They fall in the interval $(\bar{x} - 2s,\ \bar{x} + 2s)$.

    Approximately 99.7% of scores are within three standard deviations of the mean. They fall in the interval $(\bar{x} - 3s,\ \bar{x} + 3s)$.

    Example17. Suppose the IQ scores are normally distributed with the mean of $\mu = 100$ and standard deviation of $\sigma = 15$. Then by the empirical rule

    Approximately 68% of scores are in the interval 100-15 to 100+15, or 85 to 115.

    Approximately 95% of scores are in the interval 100-2(15) to 100+2(15), or 70 to 130.

    Approximately 99.7% of scores are in the interval 100-3(15) to 100+3(15), or 55 to 145.
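The three intervals of Example 17 come from one formula, mean ± k standard deviations for k = 1, 2, 3. A small Python sketch, with an illustrative function name:

```python
def empirical_intervals(mu, sigma):
    """Intervals holding roughly 68%, 95% and 99.7% of a bell-shaped data set."""
    return {
        percent: (mu - k * sigma, mu + k * sigma)
        for k, percent in [(1, 68), (2, 95), (3, 99.7)]
    }


iq = empirical_intervals(100, 15)
for percent, (low, high) in iq.items():
    print(f"about {percent}% of scores lie between {low} and {high}")
```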


    Coefficient of Variation

    The coefficient of variation is defined to be the standard deviation divided by the mean.

    Coefficient of variation: $CV = \frac{s}{\bar{x}}$. If $\bar{x}$ is 0 or close to 0, then this measure should not be used.

    Normally this measure is used when we have 2 or more groups of data with different units.

    Example18.

    Class A: Mean = 129, and standard deviation = 11; CV = 11/129 = .085 or 8.5%

    Class B: Mean = 150, and standard deviation = 25; CV = 25/150 = .167 or 16.7%

    Class C: Mean = 60, and standard deviation = 15; CV = 15/60 = .25 or 25.0%

    Class C has the greatest relative variation.
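The comparison in Example 18 is a one-line computation per class. The dictionary layout below is just one illustrative way to hold the (mean, standard deviation) pairs.

```python
# (mean, standard deviation) for each class in Example 18.
classes = {"A": (129, 11), "B": (150, 25), "C": (60, 15)}

# Coefficient of variation: standard deviation divided by the mean.
cv = {name: sd / mean for name, (mean, sd) in classes.items()}
most_variable = max(cv, key=cv.get)

for name, value in cv.items():
    print(name, f"{value:.1%}")
print("greatest relative variation:", most_variable)
```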

    Measures of Position

    Standard Scores: $z = \frac{x - \bar{x}}{s}$ or $z = \frac{x - \mu}{\sigma}$,

    where $\bar{x}$ or $\mu$ is the mean and $s$ or $\sigma$ is the standard deviation. This value, z, measures the deviation from the mean in number of standard deviations, and it has no unit.

    Example19. Suppose John is taking 3 classes with the following scores. In which class does he have the better score?

    Class A: English test score = 145; Mean = 129, and standard deviation = 11; Z = (145-129)/11 = 1.45

    Class B: Physics test score = 190; Mean = 150, and standard deviation = 25; Z = (190-150)/25 = 1.60

    Class C: Statistics test score = 88; Mean = 60, and standard deviation = 15; Z = (88-60)/15 = 1.87

    So his score in class C is relatively the highest.
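Example 19 can be reproduced with a small z-score helper. The function name `z_score` and the dictionary layout are illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above the mean."""
    return (x - mean) / sd


# (score, class mean, class standard deviation) from Example 19.
tests = {
    "English": (145, 129, 11),
    "Physics": (190, 150, 25),
    "Statistics": (88, 60, 15),
}

z = {name: z_score(*values) for name, values in tests.items()}
best = max(z, key=z.get)

for name, value in z.items():
    print(name, round(value, 2))
print("relatively best score:", best)
```

Because z has no unit, scores from tests with different scales can be compared directly.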


    Percentiles

    The percentile corresponding to a given score (x) is denoted by P and it is given by the following formula

    $$P = \frac{\#\text{ of scores less than } x}{\text{total number of scores}} \cdot 100$$

    Example20. John has the score of 88 in a class of 20 students. Find the percentile rank of his score.

    81, 65, 75, 76, 78, 62, 63, 65, 70, 90, 61, 75, 76, 79, 58, 88, 82, 95, 90, 67.

    Solution: In any problem of finding a percentile, we must sort the data set from smallest to largest.

    58, 61, 62, 63, 65, 65, 67, 70, 75, 75

    76, 76, 78, 79, 81, 82, 88, 90, 90, 95.

    $$P = \frac{\#\text{ of scores less than } x}{\text{total number of scores}} \cdot 100 = \frac{16}{20} \cdot 100 = 80$$

    So John's score is at the 80th percentile, which means 80% of all scores are below 88.
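The percentile-rank formula translates directly into code. In this sketch the function name `percentile_rank` is illustrative; sorting is not needed for counting, since the count of scores below x does not depend on the order.

```python
scores = [81, 65, 75, 76, 78, 62, 63, 65, 70, 90,
          61, 75, 76, 79, 58, 88, 82, 95, 90, 67]


def percentile_rank(values, x):
    """Percent of the scores that are strictly less than x."""
    below = sum(v < x for v in values)
    return 100 * below / len(values)


rank = percentile_rank(scores, 88)
print(rank)
```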

    Finding the Score Corresponding to a Given Percentile

    Example21. In the data set of example 20, find the score corresponding to the 12th percentile.

    Solution:

    Step1: Make sure the data is sorted

    58, 61, 62, 63, 65, 65, 67, 70, 75, 75

    76, 76, 78, 79, 81, 82, 88, 90, 90, 95

    Step2: Compute L = p% of n, where L is the location of the score.

    In this example L = 12% of 20 = 0.12(20) = 2.4, which rounds up to 3.

    Step3: Go to the data set and pick the score at the 3rd position, which is 62.

    It is usually written as P12 = 62


    Note:

    If L is not a whole number, round up to the next whole number.

    If L is a whole number, use the average of the scores at the Lth and (L+1)th locations.

    Example22. In the data set of example 20, find the score corresponding to the 40th percentile.

    Step1: as before

    58, 61, 62, 63, 65, 65, 67, 70, 75, 75

    76, 76, 78, 79, 81, 82, 88, 90, 90, 95

    Step2: L = 40% of 20 = 0.40(20) = 8, which is a whole number, so we are going to pick the average of the 8th and 9th scores.

    Step3: The 8th score is 70 and the 9th score is 75, and their average is (70+75)/2 = 72.5. So P40 = 72.5.
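The location rule used in Examples 21 and 22 can be sketched as a function. The name `score_at_percentile` is illustrative, and the sketch assumes a whole-number percentile p with 0 < p < 100. Integer arithmetic (`divmod`) decides whether L = p% of n is a whole number, which avoids floating-point rounding surprises. Statistical software often uses other interpolation rules, so its percentiles may differ slightly.

```python
scores = [81, 65, 75, 76, 78, 62, 63, 65, 70, 90,
          61, 75, 76, 79, 58, 88, 82, 95, 90, 67]


def score_at_percentile(values, p):
    """Score at the p-th percentile, using the chapter's location rule."""
    data = sorted(values)
    whole, remainder = divmod(p * len(data), 100)   # L = whole + remainder/100
    if remainder:
        # L is not a whole number: round up to position whole + 1.
        return data[whole]
    # L is a whole number: average the L-th and (L+1)-th scores.
    return (data[whole - 1] + data[whole]) / 2


print(score_at_percentile(scores, 12))   # Example 21
print(score_at_percentile(scores, 40))   # Example 22
```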

    Deciles: divide the data set into 10 groups.

    D1 = 10th percentile, which is the same as P10

    D2 = 20th percentile, which is the same as P20

    .

    D9 = 90th percentile, which is the same as P90

    Quartiles: divide the data set into 4 groups.

    Q1 = first quartile or 25th percentile, which is the same as P25

    Q2 = second quartile or 50th percentile, which is the same as P50. This is also the median.

    Q3 = third quartile or 75th percentile, which is the same as P75

    Inter-Quartile Range (IQR): is the difference between the 3rd and 1st quartiles; it is denoted by IQR and is defined by IQR = Q3 - Q1.


    Example23. In the data set of example 20, find the scores corresponding to D2, Q1, Q3, and the IQR.

    Outlier: An outlier is an extremely high or an extremely low data value. To check for outliers we compute Q1-1.5(IQR) and Q3+1.5(IQR); then if

    the suspected score is below Q1-1.5(IQR), or

    the suspected score is above Q3+1.5(IQR),

    then the score is said to be an outlier.

    Example24. Is there any outlier in the following data set?

    55 46 46 41

    55 49 51 41

    36 41 86 53

    52 47 44 51

    51 61 51 48

    Sorted Data

    36 41 41 41 44 46 46 47 48 49

    51 51 51 51 52 53 55 55 61 86
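The outlier check can be sketched in Python for the data of Example 24. The notes leave this example unsolved, so the numbers the code prints are this sketch's results under the chapter's percentile location rule, not values taken from the notes. The helper names are illustrative.

```python
data = [55, 46, 46, 41, 55, 49, 51, 41, 36, 41,
        86, 53, 52, 47, 44, 51, 51, 61, 51, 48]


def score_at_percentile(values, p):
    """Chapter's location rule for a whole-number percentile 0 < p < 100."""
    ordered = sorted(values)
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder:                       # L not whole: round up
        return ordered[whole]
    return (ordered[whole - 1] + ordered[whole]) / 2


q1 = score_at_percentile(data, 25)
q3 = score_at_percentile(data, 75)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```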


    Five commonly used Statistics: The five numbers in any data set that are used frequently are

    Minimum, Q1, Q2, Q3, Maximum

    Box plot or box-and-whisker plot: is another graphical representation of any data set. We use the five commonly used statistics to graph the box plot. The box plot can provide answers to the following questions

    o Is a factor significant?
    o Does the location differ between subgroups?
    o Does the variation differ between subgroups?
    o Are there any outliers?

    Example25. In the data set of example 20, find the 5 common statistics.

    58, 61, 62, 63, 65, 65, 67, 70, 75, 75

    76, 76, 78, 79, 81, 82, 88, 90, 90, 95

    1. Minimum: is 58

    2. Q1: L = 25% of 20 = 0.25(20) = 5. Since this is a whole number, we use the average of the 5th and 6th observations. In the above ordered data set we have

    5th score is 65

    6th score is 65

    their average is also 65. So Q1 = 65.

    3. Q2: L = 50% of 20 = 0.50(20) = 10. Again, since this is a whole number, we use the average of the 10th and 11th observations. In the above ordered data set we have

    10th score is 75

    11th score is 76

    their average is (75+76)/2 = 75.5. So Q2 = 75.5.

    4. Q3: L = 75% of 20 = 0.75(20) = 15. This is a whole number, so we use the average of the 15th and 16th observations. In the above ordered data set we have

    15th score is 81


    16th score is 82

    their average is (81+82)/2 = 81.5. So Q3 = 81.5.

    5. Maximum: is 95.

    So the five statistics are 58, 65, 75.5, 81.5, and 95.

    [Figure: Boxplot of C1.]

    Example26 In data set of example 24, find the 5 common statistics.


    Example27. For the data set below, use a computer to find the descriptive statistics and plot all appropriate charts, among those discussed so far, for all variables.

    Test1 Sex Grade Test1 Sex Grade

    76 1 C 76 1 C

    62 1 D 59 1 F

    68 1 D 92 1 A

    69 1 D 93 1 A

    79 0 C 88 0 B

    90 0 A 86 0 B

    79 1 C 66 0 D

    86 1 B 81 1 B

    52 0 F 85 0 B

    97 1 A 85 0 B

    78 1 C 70 1 C

    55 1 F 55 1 F

    96 1 A 62 1 D

    89 1 B 80 1 B

    73 0 C 60 1 D

    66 0 D 80 1 B

    88 1 B 72 1 C

    92 0 A 82 0 B

    94 1 A 86 1 B

    50 1 F 99 1 A

    71 0 C 63 1 D

    89 0 B 75 1 C

    78 1 C 83 1 B

    88 0 B 78 0 C

    58 1 F 61 1 D

    Descriptive Statistics: Test1

    Variable  Sex     N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
    Test1     Female  34  0   75.59  2.36     13.76  50.00    62.00  77.00   86.50  99.00
              Male    16  0   79.38  2.77     11.10  52.00    71.50  83.50   88.00  92.00


    [Figure: Boxplot of Test1 vs Sex. x-axis: Sex (Male, Female); y-axis: Test1.]