Correlation Formula: Multiple choice or True/False questions


Correlation Formula:

Multiple choice or True/False questions:

1. Correlation relates the relative position of a score in one distribution to

a. the relative position of a score in another distribution

b. the mean of the z-scores from another distribution

c. the total variance of all scores in both distributions

d. the standard deviation of the z-scores for both distributions

A

2. (TRUE/FALSE). If a positive correlation exists between height and weight, a person with above-average height is expected to have above-average weight.

TRUE

3. If height is independent of average yearly income, what is the predicted correlation between these two variables?

a. 1

b. -1

c. 0

d. Impossible to say for sure

C

4. Correlation has two symbols associated with it, one for the true (and usually unknown) population correlation, and one for the sample correlation. They are:

a. r for population, ρ (rho) for the sample

b. ρ (rho) for population, r for the sample

c. ρ (rho) for population, for the sample

d. for population, r for the sample

e. both are the same, r

B

5. (TRUE/FALSE) The following plot displays a strong, positive correlation:

FALSE

6. In a linear regression, what kind of distribution does the test statistic for β1 follow under the null hypothesis?

a. normal distribution

b. t-distribution

c. F-distribution

d. exponential distribution

B

7. (TRUE/FALSE). When performing a t-test using a linear regression, if the difference between the groups on the x-axis is equal to 1 (e.g., males are coded “0” and females coded “1”), the value of the β1 estimate equals twice the difference between group means.

FALSE
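A quick way to check this answer is to fit a regression to a 0/1-coded grouping variable. The sketch below uses Python with NumPy and a handful of hypothetical scores (not data from this course); with 0/1 coding, the fitted slope equals the difference between the two group means (not twice it), and the intercept equals the mean of the group coded 0.

```python
import numpy as np

# Hypothetical scores, for illustration only.
group = np.array([0, 0, 0, 1, 1, 1], dtype=float)   # 0 = males, 1 = females
score = np.array([10, 12, 14, 15, 17, 19], dtype=float)

# Least-squares line: np.polyfit returns [slope, intercept] for deg=1.
slope, intercept = np.polyfit(group, score, deg=1)

# Difference between group means (females minus males).
mean_difference = score[group == 1].mean() - score[group == 0].mean()

# With 0/1 coding, slope == mean_difference and intercept == mean of group 0.
print(f"slope = {slope:.3f}, mean difference = {mean_difference:.3f}, intercept = {intercept:.3f}")
```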

8. What is one situation where it can be useful to use the Spearman correlation statistic?

a. nominal data

b. data with dependence between variables

c. continuous data

d. monotonic non-linear data

D
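To see why answer D fits, the sketch below (Python with SciPy, using made-up values) compares Pearson and Spearman correlations for a relationship that always increases but is strongly curved. Spearman correlates the ranks, so any perfectly monotonic relationship gives a Spearman correlation of 1, while Pearson, which measures only linear association, comes out lower.

```python
import numpy as np
from scipy import stats

x = np.arange(1, 11, dtype=float)
y = np.exp(x)   # strictly increasing (monotonic) but strongly non-linear

rho, _ = stats.spearmanr(x, y)   # correlation of the ranks of x and y
r, _ = stats.pearsonr(x, y)      # correlation of the raw values (linear association only)

print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```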

9. (TRUE/FALSE). The range of possible values for a coefficient of correlation is [0, 1].

FALSE

10. For a two-factor experiment with 2 levels of factor A and 3 levels of factor B and n = 10 in each treatment condition, there is a total of _____ subjects in each level of factor A and a total of _____ subjects in each level of factor B.

a. 10, 10

b. 20, 30

c. 30, 20

d. 60, 60

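C (each level of A spans 3 cells × 10 = 30 subjects; each level of B spans 2 cells × 10 = 20 subjects)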

11. In a two-factor ANOVA, what is the implication of a significant AxB interaction on the main effects of factors A and B?

a. At least one of the main effects must also be significant

b. Both of the main effects must also be significant

c. Neither of the main effects can be significant

d. The significance of the interaction has no implication for the main effects

D

12**. (TRUE/FALSE). A two-factor independent measures study with 2 levels of factor A and two levels of factor B and n=5 participants in each treatment condition (each cell) would require a total of 10 participants.

FALSE (2 × 2 = 4 cells × 5 participants = 20 participants)

13. For which of the following correlations would the data points be clustered most closely around a straight line?

a. r = 0.10

b. r = 0.50

c. r = -0.80

d. There is no relationship between the value of r and how close the data points are to a straight line

C

14. For the following data, the Pearson correlation ______.

X Y

2 4

5 2

3 5

2 5

a. is positive

b. is negative

c. is zero

d. cannot be determined

B
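The sign of r can be checked by computing it from deviation scores. This Python/NumPy sketch assumes the table above reads as the four (X, Y) pairs (2, 4), (5, 2), (3, 5), (2, 5); it computes the sum of products (SP) and the sums of squares, then r = SP / sqrt(SS_X × SS_Y).

```python
import numpy as np

# Assumed reading of the (X, Y) table in question 14.
x = np.array([2, 5, 3, 2], dtype=float)
y = np.array([4, 2, 5, 5], dtype=float)

dx = x - x.mean()   # deviations from the mean of X (mean = 3)
dy = y - y.mean()   # deviations from the mean of Y (mean = 4)

sp = np.sum(dx * dy)            # sum of products, SP
ss_x = np.sum(dx ** 2)          # SS_X
ss_y = np.sum(dy ** 2)          # SS_Y
r = sp / np.sqrt(ss_x * ss_y)   # Pearson r = SP / sqrt(SS_X * SS_Y)

print(f"SP = {sp}, SS_X = {ss_x}, SS_Y = {ss_y}, r = {r:.3f}")
```

Because SP is negative, r is negative, which matches answer B.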

15. For the regression equation Y = 3X -2, if the mean of Y is 10, what is the mean of X?

a. 8

b. 28

c. 4

d. cannot be determined

C
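The answer follows because a least-squares regression line always passes through the point (mean of X, mean of Y). Assuming Y = 3X - 2 is the least-squares line for these data, substitute the means:

```latex
\bar{Y} = 3\bar{X} - 2
\quad\Rightarrow\quad
10 = 3\bar{X} - 2
\quad\Rightarrow\quad
\bar{X} = \frac{10 + 2}{3} = 4
```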

16. (TRUE/FALSE). It is possible for the regression equation to have none of the observed data points located on the regression line.

TRUE

17. If factors A and B both have significant main effects, then the interaction of these factors will also be significant.

a. True

b. False

FALSE

18**. Can you infer causation from a study that uses a correlation statistic?

a. Always

b. Never

c. Only if the sample is randomly selected from the larger population

d. Only if the levels of the independent variable are randomly assigned

D

19**. Suppose the correlation between height and weight for adults is +0.90. What percentage of the variability in height is explained by variability in weight?

a. 90%

b. 45%

c. 100 - 90 = 10%

d. 81%

D (r² = 0.90² = 0.81)

20**. (TRUE/FALSE). You measure height in inches and weight in pounds and get a correlation of +.60. If you decide to measure weight in kilograms, the correlation between height in inches and weight in kilograms will be the same (+.60).

TRUE
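This can be demonstrated directly. The Python/NumPy sketch below uses hypothetical heights and weights (made up for illustration) and converts pounds to kilograms by dividing by a constant. Pearson r is based on z-scores, and z-scores do not change when every value is multiplied by the same positive constant (or has a constant added), so the correlation is unchanged.

```python
import numpy as np

# Hypothetical heights (inches) and weights (pounds), for illustration only.
height_in = np.array([62, 65, 67, 70, 72, 74], dtype=float)
weight_lb = np.array([120, 150, 140, 170, 165, 190], dtype=float)

LB_PER_KG = 2.20462                  # pounds per kilogram
weight_kg = weight_lb / LB_PER_KG    # a positive linear rescaling of the same data

r_lb = np.corrcoef(height_in, weight_lb)[0, 1]
r_kg = np.corrcoef(height_in, weight_kg)[0, 1]

print(f"r (inches, pounds) = {r_lb:.4f}; r (inches, kilograms) = {r_kg:.4f}")
```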

21. For a linear regression, assuming that SSy is constant, which of the following correlations would produce the smallest SS residual?

a. r = -.1

b. r = .4

c. r = -.7

d. cannot answer this question without more information

C
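The reasoning behind answer C: in simple linear regression, SS_residual = (1 - r²) × SS_Y, so with SS_Y held constant, the correlation with the largest absolute value leaves the least residual error, regardless of its sign. The Python sketch below assumes a hypothetical SS_Y of 100 just to make the numbers concrete.

```python
# SS_residual = (1 - r**2) * SS_Y for a simple linear regression.
SS_Y = 100.0  # hypothetical total sum of squares, held constant

candidates = [-0.1, 0.4, -0.7]
ss_residual = {r: (1 - r ** 2) * SS_Y for r in candidates}

# The sign of r does not matter; only its size (r squared) does.
best_r = min(ss_residual, key=ss_residual.get)

for r, ss in ss_residual.items():
    print(f"r = {r:+.1f}: SS_residual = {ss:.1f}")
```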

22**. For a multiple regression with 2 continuous predictor variables (X1 and X2) predicting Y, what is the interpretation of a large negative residual?

a. the sum of squared residuals is larger than would be predicted given the mean

b. a Y-hat is much smaller than would be predicted given X1 and X2

c. an actual score on Y is much lower than would be predicted given the person’s scores on X1 and X2

d. none of the above

C

23. (TRUE/FALSE). It is possible for two sets of data to have identical betas across their two regression equations, but different correlations.

TRUE

24) In ANOVA, the term factor refers to _____________

a. the dependent variable

b. an independent (or quasi-independent) variable

c. the levels of independent variables

d. a sums of squares term between variables

B

25) If a two-factor ANOVA produces a statistically significant interaction, you can infer that __________________

a. The significance of the main effects is not related to the significance of the interaction

b. Both main effects are significant

c. At least one main effect is significant

d. Only one main effect is significant

A

26) In what situation would a factorial ANOVA be used?

a) Predicting a discrete dependent variable with two or more categorical independent variables

b) Predicting a continuous dependent variable with two or more continuous independent variables

c) Predicting a continuous dependent variable with two or more discrete independent variables

d) Predicting a continuous dependent variable with another discrete independent variable

C

27) (TRUE/FALSE). A two-factor ANOVA consists of three separate hypothesis tests.

TRUE

28) A two-factor, independent-measures research study with two levels of factor A and two levels of factor B would require two separate samples.

FALSE

29) If the Pearson correlation between X and Y is r = 0.8, then the regression equation (r²) predicts 0.16 (or 16%) of the variance in the Y scores.

FALSE (r² = 0.8² = 0.64, so the regression equation predicts 64% of the variance in Y)

Short answer questions:

30. If we are examining data that contains information on measured cognitive ability and number of hours of sleep the night before testing, what is the null and alternative hypothesis for a correlation between these variables? (Use symbols and write the interpretation relative to the hypothesis in parentheses)

H0: ρ = 0 (There is no correlation between cognitive ability and hours of sleep)

H1: ρ ≠ 0 (There is a correlation between cognitive ability and hours of sleep)

31. List the 4 assumptions we make about our data when we decide to use a linear regression:

i. Independence

ii. Linearity

iii. Normality (of the residuals)

iv. Equal variance (of the residuals across values of X)

32. Why does the first plot have the same correlation as the second, even though it clearly has a much stronger relationship present in the data?

The first relationship is non-linear, and the (Pearson) correlation measures only linear association, so it does not detect this relationship well.

33. If we are examining data that contains information on how many parties students have attended while going to CU and students’ GPA, what is the null and alternative hypothesis for β1 of a linear regression? (Use symbols and write the interpretation of each hypothesis in plain English next to each symbol)

H0: β1 = 0 (number of parties does not predict GPA)

H1: β1 ≠ 0 (number of parties does predict GPA)

34. The following is a sample correlation. Variable Y is plotted against variable X. Would the relationship between these two variables be suitable for analysis with a Pearson’s correlation? Also, briefly justify your answer:

No. There is a clear non-linear (and non-monotonic) relationship.

35. What 2 error terms can SSwithin be broken down into when conducting a repeated measures analysis of variance?

SSbetween subjects + SSerror

36. In words, what does the SSbetween subjects term mean in a repeated measures ANOVA?

This is the (“soaked up”) error due to individual differences. It is measured and removed from the denominator of a repeated measures ANOVA, giving you a more sensitive test.

37. A) Draw a scatterplot of the following data points (place “x” along the x-axis and “y” along the y-axis), and place an approximate “best fit” line on the scatterplot.

Something like this:

B) For the scatterplot you drew above, which person is the biggest “outlier” (most positive or negative residual) in the data? What is the approximate value of this person’s residual? Put in words what this residual means.

Person 2 is the biggest outlier. Person 2’s residual is approximately +14. This means that person 2 scored about 14 points higher on Y than would have been predicted based on their score on X.

38. You perform a 2x2 factorial ANOVA for gender (male vs. female) x smoker (yes vs. no) predicting extraversion. You find that three of the cell means (male smokers, male non-smokers, and female smokers) all have the same mean extraversion scores, but that female non-smokers have extremely high extraversion scores. Describe which effects are likely to be significant (assume that any marginal mean difference is likely to be significant). Feel free to plot the 2x2 ANOVA results if it will help you.

All of the effects are likely to be significant. Females have higher scores than males; non-smokers have higher scores than smokers; and the effect of smoking depends on gender, appearing only among females.

39**. (A) Explain the difference between the correlation statistic and a study that uses a correlational design.

A correlation statistic measures the degree to which an increase in one variable is related to an increase or decrease in the other variable. A correlational design lacks either direct manipulation of the independent variable or random assignment to groups (e.g., comparing Americans and Canadians).

(B) Briefly explain an experimental design that could be analyzed using a correlation statistic.

I could look at the effects of drug dosage on mood by randomly assigning people to many different levels of dosage (e.g., 0, .5, 1, 1.5, 2, 2.5, and 3 mg). Random assignment and manipulation of the predictor variable make this an experimental design. Two continuous variables (the independent variable and the dependent variable) make these data testable with a correlation statistic.

(C) Could you infer causality from an experimental design that uses a correlation statistic?

Yes.

40. A Pearson correlation is calculated for a sample of n = 25 pairs of scores. What are the correct degrees of freedom for the inferential test of whether the correlation in the population is different from 0?

23 (df = n - 2 = 25 - 2).

41**. (a) Explain in simple terms what the 95% confidence interval for a regression coefficient (e.g., β1) means. (b) If the confidence interval included 0 as a possible value of beta, what could you conclude about the p-value?

(a) Based on the variability in our sample data, we are 95% confident that the “true” value of the slope in the population (beta) lies between the two values given for the confidence interval. (b) If the confidence interval includes zero, we cannot rule out that beta is zero, so the p-value would be nonsignificant (i.e., p > alpha, where alpha = .05 for a 95% interval).
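The link between the interval and the p-value can be checked with the coefficient estimates and standard errors printed in question 66 (104 residual degrees of freedom). This Python/SciPy sketch builds each 95% interval as estimate ± t_crit × SE and computes the matching two-sided p-value; the interval for conscientiousness excludes 0 (p < .05), while the interval for parties includes 0 (p > .05).

```python
from scipy import stats

df_resid = 104                          # residual df reported in question 66
t_crit = stats.t.ppf(0.975, df_resid)   # two-sided 95% critical value of t

# (estimate, standard error) copied from the question 66 output.
coefs = {
    "conscientiousness": (0.03679, 0.01570),
    "parties": (-0.00271, 0.01497),
}

results = {}
for name, (b, se) in coefs.items():
    lower, upper = b - t_crit * se, b + t_crit * se   # 95% confidence interval
    t_obs = b / se                                    # t statistic for H0: beta = 0
    p = 2 * stats.t.sf(abs(t_obs), df_resid)          # two-sided p-value
    results[name] = (lower, upper, p)
    print(f"{name}: 95% CI [{lower:.4f}, {upper:.4f}], p = {p:.4f}")
```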

42. We have talked several times in class about the sum of squared residuals (or sum of squared errors) in a regression equation. Explain intuitively how you would figure out the sum of squared residuals for a given best-fit regression line of variables X and Y.

You would find the vertical distance of each point from the best-fit line (the residual, Y minus the predicted Y), square those distances, and sum them together. This gives you the sum of squared residuals.
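The procedure above translates almost line for line into code. This Python/NumPy sketch uses a few hypothetical (X, Y) points, fits the least-squares line with `np.polyfit`, and sums the squared vertical distances. It also compares that total with SS_Y (the squared errors from just guessing the mean of Y), which gives r² = 1 - SS_residual / SS_Y.

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.0, 2.9, 4.2, 4.8, 6.1, 6.9])

slope, intercept = np.polyfit(x, y, deg=1)   # least-squares best-fit line
y_hat = intercept + slope * x                # predicted Y for each point
residuals = y - y_hat                        # vertical distance of each point from the line

ss_residual = np.sum(residuals ** 2)         # square the distances and add them up
ss_y = np.sum((y - y.mean()) ** 2)           # squared errors when predicting with the mean
r_squared = 1 - ss_residual / ss_y           # proportion of SS_Y removed by the line

print(f"SS_residual = {ss_residual:.4f}, SS_Y = {ss_y:.4f}, r^2 = {r_squared:.4f}")
```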

43. A multiple regression equation with 2 continuous predictors has an R² = .25 and SSY = 90 for a sample of n = 30 observations. We find that Fobserved = 4.50 for the regression equation (as a whole). We know that Fcrit = 3.35. (a) What can you conclude about the relationship between your dependent variable and your two predictor variables? (b) Can you draw any conclusions about the individual predictors in your regression without being given more information?

(a) Our two predictor variables significantly reduce the amount of error in our prediction of the DV. Alternatively, we predict the dependent variable significantly better using both predictors together than we would just by using the mean of the DV.

(b) No. At this point we cannot talk about individual predictors.
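The numbers in question 43 are internally consistent, and the overall F can be rebuilt from R² alone. This Python/SciPy sketch splits SS_Y into explained and residual parts, divides each by its degrees of freedom (k = 2 and n - k - 1 = 27), and takes the ratio; it also looks up the critical F at alpha = .05.

```python
from scipy import stats

r_squared, ss_y, n, k = 0.25, 90.0, 30, 2   # values given in question 43

ss_reg = r_squared * ss_y            # SS explained by the two predictors (22.5)
ss_res = (1 - r_squared) * ss_y      # SS left over (67.5)
df_reg, df_res = k, n - k - 1        # 2 and 27

f_obs = (ss_reg / df_reg) / (ss_res / df_res)   # MS_regression / MS_residual
f_crit = stats.f.ppf(0.95, df_reg, df_res)      # critical F at alpha = .05
p_value = stats.f.sf(f_obs, df_reg, df_res)     # p-value for the overall model

print(f"F({df_reg}, {df_res}) = {f_obs:.2f}, F_crit = {f_crit:.2f}, p = {p_value:.4f}")
```

Since F_observed exceeds F_crit, the model as a whole is significant, but this overall test says nothing about which individual predictor is responsible.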

44**) Interpret the beta for the linear regression model: Y-hat = 12X + 10

For every 1 point increase in X, there is a predicted 12 point increase in Y

45) Why does a simple regression with only one independent predictor require two degrees of freedom instead of one?

Answer: Because you are estimating the y-intercept and the slope

46) Two discrete independent variables A and B show a significant interaction when predicting Y. In general, how would you interpret the interaction effect?

The effect of A depends on the level of B (or, equivalently, the effect of B depends on the level of A).

47) When predicting weight from height, would you use an ANOVA or a regression design? Explain why in a single sentence.

I would use a regression design because the independent variable (height) is continuous.

48**) Weight predicts height with an r² of 0.25. Interpret what this r² means when predicting height from weight.

25% of the variance in height can be explained by the variance in weight. In other words, when we predict a person’s height (the dependent variable) from their weight (the predictor), the sum of squared residuals is 25% smaller than when we simply guess the mean height for everyone.

49**) If every student scored 100% on this test, explain why ANOVA or regression on these test scores would not improve the predictive accuracy of determining future test scores.

Guessing the mean score (100) will give you perfect prediction accuracy; using an ANOVA or regression cannot improve upon it.

4-sentence summary of R output:

50. A geneticist theorizes that the greater the number of mutations in a person’s genome, the more colds that person has over the course of their lifetime. Write a 4-sentence summary based on the following R output that shows the linear relationship between number of mutations and number of colds:

We expected to find a positive relationship between number of mutations in the genome and number of colds. To test this, we predicted number of colds over the course of a lifetime from the number of mutations in 20 subjects. We found that each additional mutation was associated with a predicted increase of 9.35 colds over a person’s lifetime (beta1 = 9.35, t(18) = 6.156, p = 8.2e-06). This evidence is consistent with the hypothesis that the greater the number of mutations in a person’s genome, the more colds that person has over the course of their lifetime.

51**. Provide a “4” sentence summary of the following 2x2 ANOVA analysis (you may use up to 6 sentences if you need to). The researcher hypothesized that holding a pencil in the teeth (mimicking a smile) would be related to higher ratings of humor from watching a video (the variable “youtube”) than when holding the pencil in the lips (mimicking a frown; the variable “lipteeth”). The researcher also wanted to know if this effect depended on gender, or if there was an overall difference between humor scores between males and females. The output & interaction plot of this analysis are shown below:

We hypothesized that holding a pencil in the teeth (mimicking a smile) would be related to higher ratings of humor from watching a video than when holding the pencil in the lips (mimicking a frown). To test this question, and to understand whether the effect depended on gender, we asked 104 males and females to watch a humorous video and had them rate how funny it was on a 7-point scale. We found that males rated the video as being funnier than did females (F(1,100) = 22.2, p = .001). There was no overall effect of holding the pencil in the lips or teeth (F(1,100) = 1.68, ns). However, the effects of how the pencil was held in the mouth depended significantly on gender (F(1,100) = 16.6, p = .006), such that males found the video funnier when holding the pencil in the teeth (as predicted), but females found the video less funny when doing so. We conclude that the effects of holding a pencil in the mouth to mimic smiles or frowns depend on gender.

52**. As people get older, they both prepare and set more long-term goals than they did as children. In addition, they also become more involved and interested in politics and world events. We are interested in seeing if there is a relationship between long-term planning and interest in news/politics among college students, so we ran a correlation test in R using the lab survey from psych 3101.

Pearson's product-moment correlation

data: lab_survey$long_term_planning and lab_survey$enjoys_news_politics
t = 1.7525, df = 105, p-value = 0.0826
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.02197843 0.34732503
sample estimates:
      cor
0.1685834

Write a 4-sentence summary explaining these results.

We are interested in looking at the relationship of long-term planning to interest in news/politics. To answer this question, we surveyed a sample of 107 CU psych students. Using a Pearson correlation, we found that long-term planning does not show a significant relationship with interest in news/politics (r(105) = 0.17, t = 1.75, p = 0.08). In conclusion, there is not sufficient evidence to conclude that a relationship exists between long-term planning and interest in news/politics.
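The t value and p-value in the output above follow from r and the degrees of freedom alone (df = n - 2, as in question 40). This Python/SciPy sketch applies the standard test of H0: ρ = 0, t = r × sqrt(df / (1 - r²)), using the sample estimate and df printed by R.

```python
import math
from scipy import stats

r = 0.1685834        # sample estimate from the R output above
df = 105             # df = n - 2, so n = 107
t = r * math.sqrt(df / (1 - r ** 2))   # t statistic for H0: rho = 0
p = 2 * stats.t.sf(abs(t), df)          # two-sided p-value

print(f"t({df}) = {t:.4f}, p = {p:.4f}")
```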

Correlation by hand:

53. A geneticist theorizes that people with fewer mutations in their genome tend to be taller. Below are the heights of 10 male subjects and their genome-wide mutation count. What is the correlation between height in inches and number of mutations in males?

sum(z1*z2)/(n-1) = 3.49/4 = .872 = r

54. There is a well-documented correlation between number of storks per administrative borough (county) in

Germany and the number of births for that county. The following are data from 3 counties.

A) Calculate the correlation between number of storks and number of births across the three counties using the numbers provided:

Storks (hundreds)   Births (thousands)   z-score storks   z-score births   z*z
25                  10                   -1               -1.09            1.09
40                  25                    0                .872            0
55                  20                    1                .218            .21

sum(z1*z2)/(n-1) = 1.3/2 = .65 = r
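The by-hand method above (convert each variable to z-scores using the sample standard deviation, multiply the pairs, sum, and divide by n - 1) can be reproduced in a few lines of Python/NumPy with the three counties from the table. The result matches `np.corrcoef`, which is a useful check on hand calculations.

```python
import numpy as np

storks = np.array([25, 40, 55], dtype=float)   # hundreds
births = np.array([10, 25, 20], dtype=float)   # thousands

def z_scores(values):
    # Sample standard deviation (ddof=1), to match dividing by n - 1 below.
    return (values - values.mean()) / values.std(ddof=1)

z_storks = z_scores(storks)    # -1, 0, 1
z_births = z_scores(births)    # about -1.09, 0.87, 0.22
n = len(storks)
r = np.sum(z_storks * z_births) / (n - 1)

print(f"r = {r:.3f}")
```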

B) Describe the nature of the correlation between stork population and births:

Positive relationship. More storks are associated with a higher birth rate.

C) Do you think the stork population is causally predictive of births in Germany? Justify whether or not you believe so:

No. There is probably another factor (a “confounding” or third variable) that is associated with both the number of births and the number of storks. For example, there are more births in rural areas, and storks prefer to live in rural areas.

Factorial ANOVA:

55. A researcher collects test scores for male and female freshmen and sophomores. The following table contains the means of a 2x2 study on test scores as a function of gender and year in school (freshman or sophomore).

A) The researcher decides to examine these results using factorial ANOVA. What are the three questions this analysis will try to answer in the factorial ANOVA?

1) Do males or females have higher test scores?

2) Do freshmen or sophomores have higher test scores?

3) Is there an interaction between gender and year in school?

B) All other things being equal (e.g., assuming equal sample sizes for each of the four cells in the table), which of the three effects is most likely to be significant?

The main effect for freshman vs. sophomore is most likely to be significant, as it has the largest mean difference (87.5 - 77.5 = 10 points, versus 85 - 80 = 5 points for gender).

Test scores   Male   Female   Mean
Freshman      75     80       77.5
Sophomore     85     90       87.5
Mean          80     85

C) Explain in plain English why there was no interaction in the results presented in the previous question.

The effect of being a freshman or a sophomore on test scores does not depend on one’s gender (it is +10 points for both males and females). Alternatively, the effect of gender on test scores doesn’t depend on whether you are a freshman or sophomore (females score 5 points higher in both years).
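The marginal means and the "no interaction" conclusion can both be read off the cell means. This Python/NumPy sketch assumes equal cell sizes (so each marginal mean is a simple average of two cells) and computes the interaction contrast, the change in the gender difference from freshman to sophomore year; a contrast of 0 means perfectly parallel lines in an interaction plot.

```python
import numpy as np

# Cell means from question 55: rows = year (freshman, sophomore), columns = gender (male, female).
cell_means = np.array([[75.0, 80.0],
                       [85.0, 90.0]])

year_means = cell_means.mean(axis=1)     # [77.5, 87.5]: main effect of year (10 points)
gender_means = cell_means.mean(axis=0)   # [80.0, 85.0]: main effect of gender (5 points)

# Interaction contrast: does the gender difference change across years?
gender_diff_by_year = cell_means[:, 1] - cell_means[:, 0]                # [5, 5]
interaction_contrast = gender_diff_by_year[1] - gender_diff_by_year[0]   # 0 means no interaction

print(year_means, gender_means, interaction_contrast)
```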

56. See the R output below:

> summary(aov(hours_studied_weekly ~ girl_boyfriend*smokes))
                       Df  SumSq  MeanSq  F-value  p-value
girl_boyfriend          1  339.3  339.28   4.7271  0.03196 *
smokes                  1    3.8    3.82   0.0532  0.81810
girl_boyfriend:smokes   1   24.2   24.21   0.3373  0.56264
Residuals             104 7464.4   71.77

For the R output above, identify the three things this analysis of variance is testing. Hint: For full credit, your answer should contain terms such as “main effect” and “interaction”.

Main effect of having a significant other

Main effect of smoking

Interaction between smoking and having a significant other
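Each F value in an ANOVA table is an effect's mean square divided by the residual mean square, and its p-value comes from the F distribution with (effect df, residual df). This Python/SciPy sketch rebuilds the three tests from the Sum Sq and Df columns above; because the printed sums of squares are rounded, the recomputed F values can differ from R's in the third or fourth decimal place.

```python
from scipy import stats

df_residual = 104
ms_residual = 7464.4 / df_residual   # Residuals: Sum Sq / Df, about 71.77

effects = {                          # (Sum Sq, Df) from the output above
    "girl_boyfriend": (339.3, 1),
    "smokes": (3.8, 1),
    "girl_boyfriend:smokes": (24.2, 1),
}

results = {}
for name, (ss, df) in effects.items():
    f_value = (ss / df) / ms_residual              # MS_effect / MS_residual
    p_value = stats.f.sf(f_value, df, df_residual)  # upper-tail probability
    results[name] = (f_value, p_value)
    print(f"{name}: F = {f_value:.4f}, p = {p_value:.5f}")
```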

57. The following output comes from a 2x2 factorial ANOVA analysis done on a dataset of Apgar scores. Apgar scores are a method for quickly and reliably summarizing the health of a newborn infant (lower scores indicate worse health, higher scores indicate better health).

(A). Based on the data provided below, write your conclusions about the effects of a baby’s gender and a mother’s smoking habits (smoker or nonsmoker) on Apgar scores.

A baby’s gender has little to no effect on their Apgar score, whereas a mother’s smoking habits have a significant effect (F(1,56) = 10.46, p = .002), with infants of smokers having lower scores. There is no significant interaction between these terms, suggesting that a mother’s smoking habits have the same effect on both male and female infants.

              Df  Sum Sq  Mean Sq  F value    Pr(>F)
babygender     1   0.417    0.417   0.1063  0.745641
smokes         1  41.000   41.000  10.4575  0.002050 **
BG x Smokes    1   0.009    0.009   0.0024  0.960978
Residuals     56  219.5     3.921

(B). Based on the following means, draw an interaction plot of the data: Boy-Smoker = 4.3, Girl-Smoker = 4.8, Boy-Nonsmoker = 7.1, Girl-Nonsmoker = 7.5.

58. Let’s say that you are interested in whether the effects of placebo on pain differ depending on gender. You do a 2x2 factorial ANOVA. The means plot of your results is shown below (higher scores are associated with higher pain reports). Describe which of the three effects you test will be significant (you used a large sample size, so you can assume that mean differences > .1 are significantly different from 0).

There is a main effect of gender, such that females are higher than males. There is not a main effect of placebo vs. nothing. There is an interaction: females show a placebo effect whereas males show an anti-placebo effect.

59. We want to know how one’s gender and one’s relationship status affect how often they party, so we ran a two-way ANOVA predicting number of parties attended per month with gender and relationship status.

summary(aov(parties ~ gender * girl_boyfriend, data = lab_survey))

                       Df  Sum Sq  Mean Sq  F value    Pr(>F)
gender                  1    3.49     3.49   0.3791  0.539411
girl_boyfriend          1   82.86    82.86   9.0026  0.003375 **
gender:girl_boyfriend   1   28.73    28.73   3.1214  0.080205 .
Residuals             104  957.19     9.20
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
1 observation deleted due to missingness

A) If our alpha level is set at 0.05, is there a statistically significant interaction in this model?

No (p = .080, which is greater than .05).

B) There is a significant main effect in this model. Explain what it is, its direction, and how it persists across levels of the other variable.

There is a significant main effect of relationship status on number of parties per month, where individuals not in a relationship attend more parties than those in a relationship, regardless of whether they are male or female.

Interpreting simple regression output or scatterplot

60. For the next three questions (A – C), refer to the R output below:

A) Let’s say you must guess a male’s weight. Researcher A knows only the mean weight of males, and guesses that. Researcher B also knows the male’s height, and uses the regression equation above to get an estimate of weight based on the male’s height. How much better does Researcher B do than Researcher A? (Be precise).

R² = .6099: Researcher B’s sum of squared prediction errors is about 61% smaller than Researcher A’s.

B) Based on our sample, if a person is 70 inches tall, how much are they predicted to weigh in pounds?

About 151 pounds.

C) What does the number “21.438” mean under the “Residuals” output above? Put your answer in plain English that grandma could understand.

There is a male in the sample who weighs 21.438 pounds more than would be expected given his height.

61. In the following regression, the number of calories (CALORIES) consumed in a single day by a bike commuter is the dependent variable. The number of miles commuted (MILESBIKED) by that bike commuter is the independent variable. Assume the bike commuters were randomly sampled from the population of Boulder County.

lm(formula = CALORIES ~ MILESBIKED)

Residuals:
    Min      1Q  Median      3Q     Max
-222.50  -93.66  -24.11   88.14  218.98

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  1715.726      76.934    22.30   .000018
MILESBIKED     23.613       1.756    13.45   .008632
---
Residual standard error: 150.8 on 7 degrees of freedom
Multiple R-squared: 0.9627, Adjusted R-squared: 0.9574
F-statistic: 180.9 on 1 and 7 DF, p-value: .008632

A) Interpret the β1 estimate for MILESBIKED:

A predicted increase of 23.61 calories consumed for each additional mile biked.

B) Interpret the value of the intercept:

The predicted number of calories consumed (about 1,716) on a day when no biking was done (0 miles).

C) Write a 4-sentence summary for this analysis:

We expected that commuting by bicycle would be associated with greater caloric intake. To test this, we sampled 9 individuals who commuted various distances to work in Boulder County. We found that commuters consumed a predicted 23.6 more calories per mile biked (beta1 = 23.6, t(7) = 13.45, p = .009). We conclude that the farther one bikes to work, the more calories one consumes that day.

62. Based on the survey we conducted in lab, we found that people who report enjoying abstract games are significantly more likely to report being good at math. The simple regression model for this relationship is:

Call:
lm(formula = good_at_math ~ enjoys_abstract_games, data = lab_survey)

Coefficients:
                      Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)            1.97523     0.34889    5.661  1.32e-07 ***
enjoys_abstract_game   0.25181     0.09866    2.552    0.0121 *

(A). What is your interpretation of the intercept and the slope of this regression model?

Someone who does not enjoy abstract games at all (who scores a “0” on that predictor) would be predicted to give a low rating of being good at math, approximately 1.98. For each additional point on the abstract-games rating scale, we would predict that the person rates themselves as good at math about a quarter of a point (0.25) higher.

(B). Use the coefficients above to draw the best fit line on the scatter-plot below. The beta

Should look like this:

63. We want to know how conscientiousness scores predict GPA. Here is a scatterplot of the results:

A) Using the best-fit line from above, what is the predicted GPA of an individual who scores a zero on conscientiousness? What is this also known as in the regression equation?

About 2.6 or 2.7. This is also known as the intercept.

B) The slope of this line is only 0.04; however, it is statistically significant (p = 0.02). What is the interpretation of the slope for GPA scores? What does the p-value mean in terms of observing a slope of 0.04 if conscientiousness and GPA were in reality unrelated?

The slope means that for every 1-point increase in conscientiousness, there is a predicted 0.04-point increase in GPA. The p-value means that we would expect to see a slope at least this far from zero in only 2% of samples if GPA and conscientiousness were unrelated in the population.

Multiple regression output

64**. The following output predicts the weight of high school students from both age and height:

A) Interpret the Beta coefficient for height in inches:

Controlling for the effect of age, for every one-inch increase in height, there is a predicted increase of 3.597 pounds.

B) If someone is 30 years old and is 70 inches tall, what is their predicted weight?

-141.22 + 1.278*30 + 3.597*70 = 148.9 pounds
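The prediction in part B is just the fitted equation evaluated at the given predictor values. This Python sketch wraps the coefficients quoted above in a small, hypothetical helper function `predicted_weight` so the arithmetic can be checked or reused for other ages and heights.

```python
def predicted_weight(age_years, height_in):
    # Hypothetical helper using the coefficients quoted in question 64:
    # intercept -141.22, age 1.278 lb per year, height 3.597 lb per inch.
    return -141.22 + 1.278 * age_years + 3.597 * height_in

weight = predicted_weight(30, 70)
print(f"Predicted weight: {weight:.1f} pounds")
```

Note that the model was fit to high school students, so an age of 30 lies outside the range of the data; this prediction is an extrapolation.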

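C) What is the interpretation of the intercept (-141.2) here?

It is the predicted weight of someone who is 0 years old and 0 inches tall. Because this lies far outside the data, the value is not meaningful on its own (which is why it can be negative).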

65. (NOTE: This is a great question, but way harder than we’d ask on the real test). The following output shows a multiple regression where I asked students to accurately report their SAT Math and Verbal scores. These self-reported scores were then used to predict students’ real SAT total scores.

lm(formula = RealSATTotal ~ SATMath + SATVerbal, data = sat)

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   2.03653     2.39105    0.852     0.396
SATMath       0.98981     0.03300   29.999    <2e-16 ***
SATVerbal     0.95767     0.03712   25.801    <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.377 on 167 degrees of freedom
Multiple R-squared: 0.9241, Adjusted R-squared: 0.9232
F-statistic: 1017 on 2 and 167 DF, p-value: < 2.2e-16

(A) Based on the output of the multiple regression above, can we conclude that students are reporting their SAT scores perfectly?

No, students are clearly not reporting their SAT scores perfectly. Otherwise, the values of β1 and β2 would be 1.0, the intercept would be 0, and R² would be 1 (because true SAT scores would be perfectly explained by reported SAT scores). Thus, although students are fairly accurate in reporting their SAT scores, they are not perfect.

(B) I suspect that students are not simply misremembering their SAT scores; I think that students are generally reporting better scores than they actually received. Does your interpretation of β1 and β2 support this conclusion? Explain why or why not.

The regression coefficients for both reported SATMath and SATVerbal are slightly less than 1. Thus, for each additional point a student reported receiving on the SAT, their true SAT score increased by less than a point. This means that, on average, students are slightly biased to report better scores than they actually received.

66. We want to know how conscientiousness scores AND number of parties per month predict GPA. Here are the results from R:

Call:
lm(formula = lab_survey$GPA ~ lab_survey$conscientiousness + lab_survey$parties)

Residuals:
     Min        1Q    Median        3Q       Max
-1.31924  -0.26802   0.03584   0.33550   0.95387

Coefficients:
                              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                    2.67824     0.24545   10.912    <2e-16 ***
lab_survey$conscientiousness   0.03679     0.01570    2.343    0.0210 *
lab_survey$parties            -0.00271     0.01497   -0.181    0.8567
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.48 on 104 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared: 0.054, Adjusted R-squared: 0.03581
F-statistic: 2.968 on 2 and 104 DF, p-value: 0.05575

A) Fill in the multiple regression model below:

predicted GPA = __________ + __________*conscientiousness + __________*parties

predicted GPA = 2.68 + 0.04*conscientiousness + (-0.003)*parties

B) From the previous question, we recall that conscientiousness significantly predicted GPA. What is the interpretation of conscientiousness predicting GPA in the multiple regression model above?

This is the effect of conscientiousness scores predicting GPA while controlling for number of parties per month (i.e., over and above parties per month).