Post on 04-Apr-2023
REGIONAL MARITIME UNIVERSITY
Unit 8
RESEARCH DATA ANALYSIS
Qualitative & Quantitative Data Analysis
G.S.K. AKAKPO
WHAT IS DATA?
• They are numerical facts and figures from which conclusions are drawn using statistical analysis.
• It is the information researchers obtain on the subjects of their study/research.
• Statistical data usually come in two forms:
1. Demographic information
2. Information gathered on the basis of research objectives
Data Management
• Data management is the assembling and keeping of data accurately and securely and in a form that will be available and easy to use.
• After the instruments have been administered and collated, there is the need to review, assemble and sort them out based on well-defined criteria such as community, job type, gender, religion, social status, marital status, reasons for doing something, etc.
Process of data management (1)
• After the data have been collated, and before they are analysed, do the following:
i. Prepare a coding or scoring scheme
ii. Prepare a data dictionary
iii. Edit and clean the data
Processes Of Data Management (2)
1. Coding data
a. Questionnaires are coded as
i. 01, 02, 03, 04, etc for up to 99 elements
ii. 001, 002, 003, etc for hundreds
b. Gender is coded as
i. 1 for males
ii. 2 for females
Processes Of Data Management (3)
Coding data
c. Likert scale interpretation is coded as
1 = Strongly Agree
2 = Agree
3 = Do not know
4 = Disagree
5 = Strongly Disagree
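The coding scheme above can be sketched in Python as follows. This is only an illustration: the names `GENDER_CODES`, `LIKERT_CODES`, `questionnaire_id` and `code_response` are assumptions made for this example, not part of any particular statistics package.

```python
# Hypothetical coding scheme using the codes listed on the slides above.
GENDER_CODES = {"male": 1, "female": 2}
LIKERT_CODES = {
    "strongly agree": 1,
    "agree": 2,
    "do not know": 3,
    "disagree": 4,
    "strongly disagree": 5,
}


def questionnaire_id(number, total):
    """Zero-pad a questionnaire number: 2 digits for up to 99
    questionnaires, 3 digits when there are hundreds."""
    width = max(2, len(str(total)))
    return str(number).zfill(width)


def code_response(gender, answer):
    """Turn a raw (gender, Likert answer) pair into numeric codes."""
    return (
        GENDER_CODES[gender.strip().lower()],
        LIKERT_CODES[answer.strip().lower()],
    )


print(questionnaire_id(1, 60))         # two-digit code
print(questionnaire_id(9, 250))        # three-digit code
print(code_response("Male", "Agree"))  # numeric codes for one respondent
```

Keeping the codes in one place makes it easier to copy them into the data dictionary later.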
Likert scale: Example 1
On a scale of 1 to 5, (1 being least and 5 being highest), rate your assessment of this course.
----------------------------------------------------
Likert scale: Example 2
In your opinion, how do you rate this course?
1 = Poor
2 = Satisfactory
3 = Good
4 = Very good
5 = Excellent
Processes Of Data Management (4)
2. Data dictionary
This is a record, usually kept in a computer file, of all the variable names and codes used during data collection.
It is prepared by listing all the acronyms and codes used in the data together with their definitions and meanings.
E.g.
GNPC: Ghana National Petroleum Corporation
ECG: Electricity Company of Ghana
GMG: Good Morning Ghana
1: male
2: female
01: questionnaire number 1 when items are numbered in the 1-99 range
009: questionnaire number 9 when items are numbered in the 1-999 range
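A data dictionary of this kind could be kept as a simple lookup structure. The sketch below uses the entries from the slide; the names `DATA_DICTIONARY` and `decode` are hypothetical and chosen only for this example.

```python
# Hypothetical data dictionary built from the entries listed above.
DATA_DICTIONARY = {
    "acronyms": {
        "GNPC": "Ghana National Petroleum Corporation",
        "ECG": "Electricity Company of Ghana",
        "GMG": "Good Morning Ghana",
    },
    "gender": {1: "male", 2: "female"},
}


def decode(variable, code):
    """Look up the meaning of a code; flag codes that were never defined."""
    return DATA_DICTIONARY[variable].get(code, "undefined code")


print(decode("gender", 2))
print(decode("acronyms", "ECG"))
print(decode("gender", 3))  # a code that is not in the dictionary
```

Flagging undefined codes is one simple way to catch entry errors during data cleaning.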
Processes Of Data Management (5)
3. Data editing: Handling missing data
When questionnaires are returned and are being retrieved, one is likely to encounter wrong responses and/or no responses.
When this occurs the researcher can do one of two things:
i. Remove the respondent from the analysis if there is no response
ii. Insert the average response into the missing case. Note that this can lead to data torturing, since the inserted values were never actually observed
Note: Don’t know/not applicable responses must be clearly defined as to how they will be used in the analysis.
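The two options for missing responses can be compared with a small sketch. The response list below is made up for illustration, with `None` standing for a missing answer; `drop_missing` and `impute_mean` are hypothetical helper names.

```python
from statistics import mean


def drop_missing(responses):
    """Option (i): leave out respondents who gave no response."""
    return [r for r in responses if r is not None]


def impute_mean(responses):
    """Option (ii): replace each missing response with the average
    of the responses that were actually given."""
    observed = drop_missing(responses)
    if not observed:
        raise ValueError("no observed responses to average")
    average = mean(observed)
    return [average if r is None else r for r in responses]


# Made-up Likert responses; None marks a missing answer.
responses = [4, None, 5, 3, None, 4]

print(drop_missing(responses))  # fewer cases, all of them real
print(impute_mean(responses))   # same number of cases, two of them invented
```

Option (ii) keeps the sample size but makes the data look less varied than they really are, which is one reason to use it with care.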
Data Analysis
Data analysis can be thought of as making graphical and numerical meaning out of raw or processed statistical data.
Data Analysis
Data analysis involves summarizing data with tables and presenting it using graphs.
It also involves statistical calculations to ensure statistical truths and relevance of data gathered.
• We summarize data with
• Frequency tables or
• Contingency tables
Data Analysis
• The data presented in the tables are represented using:
i. Pie charts
ii. Histograms
iii. Bar charts
iv. Scatter plots/regression lines
Further statistical calculations can be done for the purpose of statistical inference, using statistical tests such as the chi-square test, the t-test, etc.
Data Analysis
Most research data have two parts:
1. Demographic Data Analysis
2. Research objective based data
Part 1: Demographic Data Analysis
Data on personal variables are best presented using basic descriptive statistical charts such as:
Pie charts (nominal variables, e.g. gender, nationality); and
Bar graphs (ordinal variables, e.g. age group, qualification)
Hypothetical Example
Imagine data were collected on 60 respondents with the demographic variables
Gender, and
Age
Demographic Data Analysis 1
Table 1. Frequency table showing gender distribution of respondents
Gender     Number    %
Males      39        65
Females    21        35
Total      60        100
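The percentages in a frequency table like Table 1 follow directly from the counts. A minimal sketch, assuming the counts are held in a dictionary (the function name `frequency_table` is chosen only for this example):

```python
def frequency_table(counts):
    """Return {category: (count, percentage)} with percentages to 1 d.p."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no respondents to tabulate")
    return {
        category: (n, round(100 * n / total, 1))
        for category, n in counts.items()
    }


# Gender counts from Table 1.
gender_counts = {"Males": 39, "Females": 21}
table = frequency_table(gender_counts)

for category, (n, pct) in table.items():
    print(f"{category:<10}{n:>5}{pct:>8}")
print(f"{'Total':<10}{sum(gender_counts.values()):>5}")
```

The same function can be reused for any other demographic variable, such as age group or marital status.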
Demographic data analysis-graphs
Figure 1: A pie chart showing gender distribution of respondents (Males 65%, Females 35%)
Discussion on gender distribution
The graph (Fig. 1) shows that 39 of the 60 respondents, representing 65%, were males.
From this it could be inferred that about two-thirds of the target population were males, meaning that the target population was male dominated.
Demographic data analysis 2
Age of respondent    Number    % of respondents
25-below             15        25.0
26-30                8         13.3
31-35                10        16.7
36-40                17        28.3
41-above             10        16.7
Total                60        100
Table 2. Frequency table showing distribution of respondents in terms of age
Demographic data analysis
Figure 2: A simple bar graph showing age distribution of respondents (x-axis: age group, from 25-below to 41-above; y-axis: number of respondents, 0 to 18)
Discussion on age distribution
The graph (Fig. 2) shows that the modal age group was the 36-40 age bracket, with 17 out of 60 respondents, followed by those aged 25 and below, with 15. The smallest group, with 8 respondents, was the 26-30 age bracket.
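The modal and smallest age groups can also be read off programmatically. The sketch below uses the counts from Table 2; the variable names are illustrative only.

```python
# Age-group counts taken from Table 2.
age_counts = {
    "25-below": 15,
    "26-30": 8,
    "31-35": 10,
    "36-40": 17,
    "41-above": 10,
}

# The modal group is the one with the highest frequency.
modal_group = max(age_counts, key=age_counts.get)
smallest_group = min(age_counts, key=age_counts.get)

# Groups ordered from most to fewest respondents.
ranked = sorted(age_counts.items(), key=lambda item: item[1], reverse=True)

print("Modal group:", modal_group)
print("Smallest group:", smallest_group)
print("Ranking:", ranked)
```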
Other demographic variables
Variables such as
Marital status
Health status
Social status
Level of education
Nationality
Occupation
Qualification
Preferences
CONTENT DATA ANALYSIS
• Content data is usually based on research objectives
• Content data can be analyzed using
i. Descriptive statistics
ii. Inferential statistics
Descriptive analysis can be done using any of the methods earlier indicated
Inferential statistical analysis usually takes the form of hypothesis testing and further statistical computations
Likert Scale Questionnaire analysis
Recall the item
In your opinion, how do you rate this course?
1. Poor
2. Satisfactory
3. Good
4. Very good
5. Excellent
Item            Weight    Frequency    Value (Weight × Frequency)
Poor            1         10           10
Satisfactory    2         12           24
Good            3         7            21
Very good       4         17           68
Excellent       5         14           70
Total                     60           193
Generated frequency table from the administered questionnaire
Overall rating by the respondents
Total respondents = 60
Total points = 193
Mean choice = 193/60 = 3.22 ≈ 3
From the calculation, it can be observed and concluded that the overall rating of the course by the respondents is “3” which means that the respondents rated the course as “Good”.
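The overall rating can be reproduced with a short calculation. This sketch follows the approach on the slide, which treats the Likert codes as numbers that can be averaged; the variable names are assumptions made for the example.

```python
# Weights and frequencies from the generated frequency table.
weights = {"Poor": 1, "Satisfactory": 2, "Good": 3, "Very good": 4, "Excellent": 5}
frequencies = {"Poor": 10, "Satisfactory": 12, "Good": 7, "Very good": 17, "Excellent": 14}

total_respondents = sum(frequencies.values())
total_points = sum(weights[item] * frequencies[item] for item in weights)
mean_choice = total_points / total_respondents

# Map the rounded mean back to its label on the scale.
label_for_weight = {weight: item for item, weight in weights.items()}
overall_rating = label_for_weight[round(mean_choice)]

print(total_respondents, total_points, round(mean_choice, 2), overall_rating)
```

Note that the modal response in the same table is "Very good" (17 respondents), so the mean and the mode can tell slightly different stories about ordinal data.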
Steps in hypothesis testing
a) State the null hypothesis, Ho
b) State the alternative hypothesis, H1
c) Choose the level of significance, α
d) Select an appropriate test statistic (t-test, χ2, etc.)
e) Calculate the value of the test statistic
f) Determine the critical region
g) Reject Ho if the value of the test statistic falls in the critical region; otherwise retain (do not reject) Ho
Test statistic
Two common test statistics used in student research
Student’s t-test
Used to compare the mean of a continuous variable between the two groups of a categorical variable.
Example:
Testing differences in mean performances of boys and girls in class
Test statistic
Chi-square test
Used to analyze the frequencies (counts) of cases falling into the categories of a categorical variable.
Example:
Analyzing equality of illegal electricity connections by customers of different social classes
Let’s focus on content data analysis
Chi-Square Analysis
• Chi-square (χ2) tests are the most popular and frequently used non-parametric tests of significance in social sciences
Chi-square (χ2) analysis
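In a Chi-square (χ2) test, the data collected in the study (observed frequencies) are compared with the expected data (expected frequencies).
The size of their difference, relative to the expected frequencies, determines whether the result is significant.
Formula: $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O is an observed frequency and E is the corresponding expected frequency.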
When to use Chi-square (χ2) analysis?
a. Every value falls in only one category (nominal data)
b. The observations are independent: the category one subject falls into does not affect the category of any other subject
c. The expected frequency in each category is at least 5
Chi-square (χ2) analysis
• Procedure
i. A simple table called contingency table is constructed
ii. The observed frequencies are identified
iii. The expected frequencies are ascertained
iv. The expected frequencies are subtracted from the observed frequencies
v. The differences are squared for each category
vi. Each squared difference is divided by the corresponding expected frequency (E)
vii. The ratios in (vi) are summed to give the calculated χ2
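Steps (iv) to (vii) of the procedure amount to one short function. The sketch below is a plain-Python illustration, with the function name `chi_square` chosen for this example; it is applied to observed counts of 39 and 21 against expected counts of 30 each.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        difference = o - e        # step iv: subtract E from O
        squared = difference ** 2  # step v: square the difference
        total += squared / e       # steps vi and vii: divide by E and sum
    return total


observed = [39, 21]  # males, females
expected = [30, 30]  # equal split of 60 respondents

print(round(chi_square(observed, expected), 2))
```

Each category here contributes 81/30 = 2.7, so the two categories together give a calculated χ2 of 5.4.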
Chi-square (χ2) Analysis
Table 3: Contingency table for gender involvement in illegal connection
Gender          Males    Females
Observed (O)    39       21
Expected (E)    30       30
Calculations
i) Formulate the null hypothesis (Ho) and the alternative hypothesis (H1)
ii) Ascertain the degrees of freedom and the critical value of Chi-square (χ2)
iii) Compare the critical value of χ2 with the calculated χ2 score
Calculations
iv. If the computed χ2 value is greater than or equal to the critical value, the null hypothesis is rejected and the difference is considered to be significant
v. If the computed χ2 value is less than the critical value, the null hypothesis is retained (not rejected), meaning there is no significant difference.
Example
In a study of 60 illegal electrical connections, a researcher reported 30 such cases in the lower social class, 20 in the middle class and 10 in the upper social class.
Test, at the 1% level of significance, the hypothesis that illegal electricity connections are equally likely in all 3 social classes.
Descriptive analysis: Frequency Table
The above data can be summarized and presented using simple frequency tables and graphs
Social class    Frequency
Lower class     30
Middle class    20
Upper class     10
Total           60
Discussion (1)
Looking at the above frequency table, one can say that more of the illegal electricity connections occurred within the lower social class community.
But this statement needs to be tested statistically by performing further calculations.
Hence the chi-square calculation below.
Solution
Recall the hypotheses stated under chapter one
Ho: Illegal connections are the same in all 3 social classes
Ha: Illegal connections vary according to the respondent’s social classes (3 social classes)
If all 3 social classes behave the same way, then the expected number of illegal connections in each class equals 60/3 = 20
χ2 Contingency table
• Table for analysis
There are 3 classes, hence, k=3
Degrees of freedom, df = k-1 = 3-1 = 2
Critical value of χ2 (0.01, 2) = 9.21 (refer to table), where 0.01 is the alpha level and 2 is the degrees of freedom
Social class                Lower    Middle    Upper
Observed frequencies (O)    30       20        10
Expected frequencies (E)    20       20        20
Preparing the Table
Social class    Lower         Middle      Upper           Total
O               30            20          10              60
E               20            20          20              60
O-E             10            0           -10
(O-E)^2         10^2 = 100    0           (-10)^2 = 100
(O-E)^2/E       100/20 = 5    0/20 = 0    100/20 = 5      χ2 = 5 + 0 + 5 = 10
Summary Of Results
• Critical value of χ2 (0.01, 2) =9.21
• Calculated value of χ2 =10
Recall the Decision rule
• When the calculated value of χ2 >= the critical value of χ2, Ho is rejected
• When the calculated value of χ2 < the critical value of χ2, Ho is retained (not rejected)
Decision rule invoked
Since χ2 calc > χ2 critical
=> 10 > 9.21
Conclusion: We reject the null hypothesis that illegal connections are equally likely in all 3 social classes.
That is, illegal electrical connections vary in terms of the inhabitants’ social class status.
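The whole worked example can be checked in a few lines. The sketch below relies on one known property: with 2 degrees of freedom, the chi-square upper-tail probability is exp(-x/2), so the critical value is -2 ln(α). That shortcut holds only for df = 2; for other degrees of freedom a statistical table is still needed. The helper name `critical_value_df2` is an assumption made for this example.

```python
import math


def critical_value_df2(alpha):
    """Upper-tail chi-square critical value for df = 2 only.
    For df = 2, P(X > x) = exp(-x / 2), hence x = -2 * ln(alpha)."""
    return -2.0 * math.log(alpha)


# Observed illegal connections by social class (from the example).
observed = {"Lower": 30, "Middle": 20, "Upper": 10}

# Under Ho every class is equally likely: 60 / 3 = 20 each.
expected_each = sum(observed.values()) / len(observed)

chi2_calc = sum((o - expected_each) ** 2 / expected_each for o in observed.values())
chi2_crit = critical_value_df2(0.01)

decision = "reject Ho" if chi2_calc >= chi2_crit else "retain Ho"
print(round(chi2_calc, 2), round(chi2_crit, 2), decision)
```

The calculated value of 10 exceeds the critical value of about 9.21, which matches the conclusion above.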
Type I & Type II errors
Type I error:
1. Rejection of a true null hypothesis is called the type I error.
2. The subsequent results might not produce the result observed in the original investigation.
3. Leads to changes that are unwarranted.
Type II error:
1. Retention of a false null hypothesis is called the type II error.
2. The ultimate truth remains unknown although evidence might support an alternative hypothesis.
3. Leads to maintenance of a status quo when a change is warranted.
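A Type I error can be illustrated by simulation. In the sketch below the null hypothesis is true by construction: each of 60 connections is assigned at random to one of 3 equally likely classes, and the test is repeated many times. The share of repetitions in which Ho is (wrongly) rejected at the 9.21 critical value should be small, in the region of the 1% significance level. The function name, the number of trials and the random seed are arbitrary choices made for this illustration.

```python
import random


def type_i_error_rate(trials=2000, n=60, k=3, critical=9.21, seed=1):
    """Estimate how often a TRUE null hypothesis is rejected."""
    rng = random.Random(seed)
    expected = n / k
    rejections = 0
    for _ in range(trials):
        # Draw n cases, each equally likely to fall in any of the k classes.
        counts = [0] * k
        for _ in range(n):
            counts[rng.randrange(k)] += 1
        chi2 = sum((o - expected) ** 2 / expected for o in counts)
        if chi2 >= critical:
            rejections += 1  # Ho is true, so every rejection is a Type I error
    return rejections / trials


rate = type_i_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

Choosing a smaller α lowers this rate but makes it harder to detect a real difference, which raises the chance of a Type II error.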
Summary
Data can be referred to as numerical facts and figures from which conclusions are drawn using statistical analysis. It is the information researchers obtain on the subjects of their study/research. The statistical data usually come in two forms. These are: (1) Demographic information, and (2) Information gathered on the basis of research objectives.
After the instruments have been administered and collated, there is the need to review, assemble and sort them out based on well-defined criteria such as community, job type, gender, religion, social status, marital status, etc.
Data management is the assembling and keeping of data accurately and securely and in a form that will be available and easy to use.
Data analysis involves summarizing data into tables and presenting it using graphs. It also involves statistical calculations to establish the statistical validity and relevance of the data gathered.