Screening Depression and Anxiety via Brief Measures of Psychological Inflexibility
Sean N. Weeksᵃ, Tyler L. Renshawᵃ, and Anthony J. Robersonᵇ
ᵃ Department of Psychology, Utah State University
ᵇ Department of Clinical, Health, and Applied Sciences, University of Houston-Clear Lake
Author Note
Correspondence should be addressed to Sean Weeks, Utah State University, Department
of Psychology, 2810 Old Main Hill, Logan, UT 84322, USA. Email: [email protected].
We declare no conflicts of interest related to this study.
Abstract
We evaluated the usefulness of scores from two transdiagnostic scales—the 8-item version of the
Avoidance and Fusion Questionnaire for Youth (AFQ-Y8) and the second edition of the
Acceptance and Action Questionnaire (AAQ-II)—for estimating symptom severity on two
measures of depression and anxiety. Responses from 797 college students, who mostly identified
as White and female, to both measures of psychological inflexibility were analyzed to determine
how well scores estimated anxiety and depression above or below a given severity level and at
specific categories of symptom severity. Findings indicated that scores from both measures were
acceptable to excellent screeners of concurrent ratings of anxiety and depression. Results varied
somewhat depending on the measure used, level of severity targeted, and scope of screening. By
investigating the screening accuracy of these transdiagnostic measures and deriving potential cut scores to
ease interpretation of results, we hope these measures might prove useful for addressing barriers in
public health screening endeavors.
Keywords: psychological inflexibility, mental health, depression, anxiety, screening
Screening Depression and Anxiety via Brief Measures of Psychological Inflexibility
Psychological flexibility is the theorized mechanism of change targeted through
acceptance and commitment therapy (ACT; Hayes et al., 2006). This overarching construct is
derived from six underlying processes: acceptance, defusion, present moment awareness, self-as-
context, values, and committed action. Broadly put, psychological flexibility refers to one’s
ability to adapt or persist in various contexts for the purposes of contacting valued outcomes
(Hayes et al., 2006). Conversely, psychological inflexibility refers to the opposite valence of the
construct. Psychological flexibility was developed out of necessity to be a pragmatic,
overarching construct for guiding the goals of treatment. The aim of improving psychological
flexibility is applicable for all clients, regardless of presentation (Hayes et al., 2006; Levin et al.,
2014). Therefore, the purpose of ACT is to increase psychological flexibility by addressing the
relevant processes implicated in the individual’s presenting inflexibility. This goal can be
generalized across time, place, problem, and pathology—making ACT a truly transdiagnostic
treatment approach (Dindo et al., 2017).
Transdiagnostic treatments, like ACT, have proven to be beneficial by focusing on
changing the mechanism that leads to multiple problems, instead of targeting just one.
Transdiagnostic approaches also help to reduce the burden of highly tailored treatment plans,
such as intensity, cost, resources, and complexity (Martin et al., 2017). In broader
implementations of mental health services, it is difficult to train, finance, and retain professionals
who are knowledgeable in evidence-based practice (EBP; Stewart et al., 2016). Transdiagnostic
mental health treatments offer a potential solution toward this end, as training and
implementation can be streamlined across diverse presenting problems as well as generalized and
scaled for use within a public health framework (Martin et al., 2017).
One promising inroad for making good use of transdiagnostic approaches within a public
health framework is the practice of screening transdiagnostic indicators that might predict the
presence of multiple mental health problems (Renshaw, 2016). Prior to the implementation of a
transdiagnostic treatment in public health, individuals who might benefit from mental health
services must first be identified and prioritized. This identification and prioritization process is
another key function of public health as it helps distribute resources within a population and can
be conducted at the universal level, focusing on everyone, or at a more targeted level, focusing
on those who might already be identified as at-risk. Although screening serves a critical function
within a public health framework, it can be difficult to implement given practical barriers (Weist
et al., 2007). We suggest that brief transdiagnostic measures may greatly reduce the
implementation burden of mental health screening. Furthermore, by developing cut scores for
universal and targeted standard screening procedures that are validated in settings that can access
large numbers of individuals, such as adolescents and young adults in schools and universities,
transdiagnostic screeners might offer an evidence-based means for efficiently identifying
multiple mental health needs within a population.
The 8-item version of the Avoidance and Fusion Questionnaire for Youth (AFQ-Y8;
Greco et al., 2008) and the Acceptance and Action Questionnaire-II (AAQ-II; Bond et al., 2011)
are two brief measures of psychological inflexibility that may be useful as transdiagnostic mental
health screening tools. Both measures share the key usability features of being brief, free, and
quick to administer, while also having adequate validity evidence to support interpreting their
scores as measuring a transdiagnostic construct (e.g., Greco et al., 2008; Bond et al., 2011).
Thus, both the AFQ-Y8 and the AAQ-II are amenable to norming for young adult populations.
However, to address the barriers of resources, fidelity, quality assurance, and training, additional
validation is needed to support the intended screening functions of both measures in a university
context. Thus far, the AAQ-II and AFQ-Y8 have both been used to predict multiple mental
health outcomes by determining at-risk thresholds, but the relative and comparative classification
validity of their scores has yet to be tested. Below is a brief review of the limited literature
supporting the potential of the AAQ-II and AFQ-Y8 as screening tools.
Acceptance and Action Questionnaire-II
In Bond et al.’s (2011) development study of the AAQ-II, preliminary psychometrics of
the new measure were investigated. The AAQ-II measures the constructs of experiential
avoidance, acceptance, and psychological inflexibility—and is the most widely used instrument
for assessing psychological inflexibility and experiential avoidance with adults (Hayes et al.,
2004). Because the AAQ-II was intended to be used as a broad community-based measure, we
suggest the general version has potential as a screener. Additionally, the AAQ-II is a relatively
brief and cost-free (i.e., non-trademarked) self-report measure, with only 10 items, making it
feasible for quick and multi-modal administrations within a variety of settings.
Psychometric properties of the AAQ-II scores were originally evaluated from
administration of the measure with 2,816 participants across six samples. While the authors state
that the AAQ-II was not originally intended for diagnosis, cut points were established for people
who likely meet criteria for a psychological disorder. The following cut scores were derived
from a sample of University of Nevada undergraduates: 28 for depression, 24 for general
psychological distress, and 28 for global psychological symptoms. The cut score for
psychological distress derived from bank and financial samples was set at 25. Given that the
AAQ-II was initially developed to assess an individual’s overall level of psychological
inflexibility, establishing predictive cut scores was not the primary intention of the AAQ-II, and
therefore detailed statistical results were not provided. Thus, many indicators necessary for
defending the screening functions of a measure are unknown. However, the AAQ-II’s
development as a brief, efficient, and psychometrically sound community-based measure
provided sufficient rationale for researchers to consider evaluating it as a possible instrument for
screening.
In other work investigating possible cut scores for the AAQ-II, Hussey and Barnes-
Holmes (2012) sought to validate the use of an implicit measure based on relational frame theory
(Hayes et al., 2001) and depressive mood states. Scores from the Implicit Relational Assessment
Procedure (IRAP; Barnes-Holmes et al., 2010) were tested in relation to psychological
inflexibility scores, measured by the AAQ-II. Results showed a distinction between participants
in the two depression groups based on the AAQ-II, with scores of 41 and below (indicating
participants with “high flexibility”) capturing all participants in the “Normal” range depressive
group. Moreover, using an AAQ-II cut score of 48 and above, 80% of the mild/moderate
depressive group were captured in the “low flexibility” group. Interestingly, Hussey and Barnes-
Holmes (2012) state that these findings are comparable with the cut scores suggested by Bond et
al. (2011); however, a roughly 20-point difference exists between the two studies’ recommended
cut points on the AAQ-II for depression risk.
Avoidance and Fusion Questionnaire for Youth
The AFQ-Y, developed by Greco et al. (2008), is a direct adaptation of the AAQ-I for
measuring psychological inflexibility in youth. Originally created as both a 17- and 8-item
measure, the AFQ-Y has been shown to have strong convergent and discriminant validity based
on Greco et al.’s (2008) initial validation study using confirmatory factor analysis. Moreover, the
results suggested strong internal consistency of scores with both college (α = .82; Renshaw,
2018) and youth samples (α = .90; Livheim et al., 2016).
Venta et al. (2012) conducted one of the few studies in which AFQ-Y cut scores from the
17-item version were evaluated for classification utility in predicting anxiety diagnoses. This
study used a sample of 111 adolescents in an inpatient psychiatric facility. Using a receiver
operating characteristic (ROC) curve analysis, Venta et al. (2012) found that scores from the
AFQ-Y17 had moderate classification accuracy (AUC = .78) for any anxiety diagnosis
(determined by a priori clinical decision making). The optimal cut score on the AFQ-Y17 for
screening for anxiety disorders was determined to be 26.5, which had both modest sensitivity
(.73) and specificity (.71).
Renshaw (2016) evaluated the classification utility of the short, 8-item version of the
AFQ-Y as a universal screener in schools for clinical-level depression and anxiety. This study
involved 219 adolescents attending an urban public high school, who responded to the AFQ-Y8
along with concurrent measures of depression, anxiety, and academic problems. Results based on
a confirmatory factor analysis showed that the AFQ-Y8 provided a better one-factor model fit
with the data than did the AFQ-Y17, and thus a ROC curve analysis was conducted with the
AFQ-Y8 as opposed to the longer version of the measure. Ultimately, findings indicated that the
scores from the AFQ-Y8 had excellent accuracy in identifying clinical-level depression (AUC
= .91) and anxiety (AUC = .92) based on concurrent gold-standard diagnostic scales. Renshaw
(2016) determined that an AFQ-Y8 cut score of 15 or greater was optimal for identifying a
school-based population of youth as being at-risk for significant internalizing problems, with
good sensitivities (.86 and .92) and specificities (.88 and .87) for depression and anxiety,
respectively.
Additional work to identify cut scores for the AFQ-Y8 with youth was done by Oppo and
colleagues (2019) using the Italian version of the measure (I-AFQ-Y8; Schweiger et al., 2017).
In their study, 1,336 Italian secondary students’ scores on measures of psychological inflexibility
and mindfulness were examined using ROC curve analyses to determine who was at risk for
behavioral and emotional problems as well as for social, practical, and academic skill difficulties.
Using an identified cut score of 10.5, scores from the I-AFQ-Y8 were determined to predict 78%
of internalizing cases based on scores on the Youth Self-Report (Achenbach & Rescorla, 2001),
with modest sensitivity (.73) and specificity (.70). It is noteworthy that the results of this study
found AUC, sensitivity, and specificity values that were nearly identical to those found in
Venta et al.’s (2012) study. It is also noteworthy that Renshaw (2016) found the AFQ-Y8 to have
stronger classification utility than did other studies, yet this is likely attributable to the
differences in measurement approaches for defining criterion variables.
The Present Study
As the above review highlights, screening cut scores previously developed for the AAQ-
II and AFQ-Y8 have varied as a function of the instrument used, the populations sampled, and
the criterion variables to which scores were compared. All of the studies outlined above have
used measures derived from the original AAQ-I (Hayes et al., 2004), and scores from these measures
have been shown to have at least modest sensitivity, specificity, and overall classification utility. These findings are
reasonable, given the expectation within the screening literature that classification accuracy will
vary within sub-populations and across measures (Streiner & Cairney, 2007). They are also
promising, suggesting scores from both measures might be useful for transdiagnostic screening.
However, none of the studies reviewed tested the comparative classification utility of the AAQ-II
and AFQ-Y8 within the same sample; rather, each study administered and tested only one of the
brief psychological inflexibility measures, not both. Additionally, neither measure’s utility has
been tested at different levels of screening (i.e., universal or targeted). Considering scores from
the AAQ-II and AFQ-Y8 have both been deemed defensible for use with young adults (e.g.,
Bond et al., 2011; Fergus et al., 2012; Renshaw, 2018), both are potentially viable screeners for
public health purposes within this population. Young adulthood is a transitional stage that is
characterized by increased agency, new responsibilities, and identity formation, making this time
a pivotal contributor to an individual’s lifelong mental health trajectory (Schulenberg et al.,
2004). It is therefore important to find ways to adequately identify those at-risk for a broad range
of mental health problems during the developmental window of young adulthood. Thus, the
overarching goal of the present study was to directly evaluate the comparative classification
utility of scores from the AAQ-II and AFQ-Y8 within a young adult sample. Based on previous
research in this area, we predicted that scores from both the AAQ-II and AFQ-Y8 would yield at
least modest overall classification accuracy (AUC > .70), and that a potentially optimal cut score
would be characterized by at least modest sensitivity and specificity (> .70).
Ultimately, findings from the present study contribute to science-based practice in two
primary ways. First, the study expands the literature regarding validity evidence supporting the
practice of screening for psychological inflexibility. Second, findings help inform applied use of
psychological inflexibility measures with young, college-age adults within a public health
context, providing mental health professionals with evidence-based guidelines for detecting
internalizing problems via scores derived from the AFQ-Y8 and AAQ-II.
Methods
Participants
The present study was a secondary analysis of the same sample of participants described
in Renshaw (2018), using a narrower set of variables derived from responses to the large survey
of undergraduates described in that study. Participants were 797 college students attending a
large university in the southern region of the United States, who were recruited to participate
through an online system managed by the University’s Department of Psychology. Participants
were predominantly White (76.7%) and female (84.6%), and they were relatively evenly
distributed across educational class (first year = 32.6%, second year = 22.3%, third year = 27%,
fourth year or more = 18.1%). Additional details regarding the sampling method and survey
administration are reported in Renshaw (2018).
Measures
AAQ-II
The AAQ-II (Bond et al., 2011) was originally developed as a 10-item measure; however,
the present study used the reduced, 7-item version of the measure recommended following further
psychometric analyses (Hayes, n.d.). Previous research regarding the psychometrics of the 10-
item AAQ-II was reviewed above (see Introduction), and validity evidence supporting the
defensibility of the 7-item version with the present sample was established by Renshaw (2018).
AFQ-Y8
As described above, the full version of the AFQ-Y (Greco et al., 2008) contains 17 items,
yet the briefer version—intended for population-based assessments—consists of a subset of 8
items. Previous validity evidence supporting both versions of the AFQ-Y was reviewed above
(see Introduction), and the psychometric superiority of the 8-item version compared to the 17-
item version was established with the present sample by Renshaw (2018).
Beck Depression Inventory–II (BDI-II)
The BDI-II (Beck et al., 1996) is a 21-item measure of depression. Previous research
indicates that scores from the BDI-II have strong internal consistency reliability and concurrent
validity with other mental health variables (Beck et al., 1996). Cut scores suggested in the user
manual for the BDI-II allow for the following four classifications of depression-related
symptoms: 13 or below = minimal, 14–19 = mild, 20–28 = moderate, 29 or above = severe.
Beck Anxiety Inventory (BAI)
The BAI (Beck & Steer, 1993) is a 21-item measure of anxiety. Previous research shows
that responses to the BAI have strong internal consistency reliability and concurrent validity with
other mental health indicators (Beck & Steer, 1993). Cut scores provided in the user manual for
the BAI allow for the following four classifications of anxiety-related symptoms: 7 or below =
minimal, 8–15 = mild, 16–25 = moderate, 26 or above = severe.
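The two sets of manual cut scores quoted above can be expressed as a single lookup. The following Python sketch is illustrative only and was not part of the original analyses; the names `UPPER_BOUNDS` and `classify_severity` are hypothetical, and the sketch assumes integer total scores.

```python
from bisect import bisect_left

SEVERITY_LABELS = ["minimal", "mild", "moderate", "severe"]

# Highest total score in the minimal, mild, and moderate ranges, as quoted
# above from the BDI-II and BAI manuals. Anything higher is "severe".
UPPER_BOUNDS = {
    "BDI-II": [13, 19, 28],
    "BAI": [7, 15, 25],
}


def classify_severity(measure: str, total_score: int) -> str:
    """Map a total score to one of the four severity categories."""
    bounds = UPPER_BOUNDS[measure]
    # bisect_left keeps a score equal to an upper bound in the lower category.
    return SEVERITY_LABELS[bisect_left(bounds, total_score)]


if __name__ == "__main__":
    for measure, score in [("BDI-II", 13), ("BDI-II", 20), ("BAI", 8), ("BAI", 26)]:
        print(measure, score, classify_severity(measure, score))
```

These four categories are the criterion classifications against which the psychological inflexibility scores are compared in the analyses that follow.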
Data Analyses
Prior to running the primary analyses, descriptive statistics were evaluated to ensure the
usability of the data from each measure for answering the relevant research questions.
Psychometrics reported in Renshaw (2018) were also referenced to support functionality of the
measures.
Next, a series of ROC curve models were run to test the classification accuracy of scores
derived from the AAQ-II and AFQ-Y8 with the several classification options operationalized by
recommended cut scores on the BDI-II and BAI. Area under the curve (AUC) statistics from
these analyses were evaluated to determine the overall risk identification ability of each measure
(AUC values .50–.70 = low, .70–.90 = moderate, .90–1.00 = high; Streiner & Cairney, 2007). All
ROC curve models were evaluated in the R statistical environment (R Development Core Team,
2020) using the pROC package (Robin et al., 2011).
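To make the AUC statistic concrete, the sketch below computes it directly as the probability that a randomly chosen case scores higher on the screener than a randomly chosen non-case, with ties counted as one half. It is written in plain Python for illustration and is not the pROC code used in the study. The function names and the example scores are hypothetical, and assigning the shared boundary values (.70 and .90) to the higher band is an assumption, because the bands quoted above overlap at those points.

```python
def auc(case_scores, control_scores):
    """Rank-based AUC: P(case > control) + 0.5 * P(case == control)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def interpret_auc(value):
    """Label an AUC using the bands cited from Streiner and Cairney (2007)."""
    if value >= 0.90:
        return "high"
    if value >= 0.70:
        return "moderate"
    if value >= 0.50:
        return "low"
    return "below chance"


if __name__ == "__main__":
    # Made-up screener scores for respondents above (cases) and below
    # (controls) a criterion cut; these are not study data.
    cases = [3, 4, 5]
    controls = [1, 2, 3]
    value = auc(cases, controls)
    print(round(value, 3), interpret_auc(value))
```

The pairwise loop is quadratic in sample size, which is acceptable for an illustration; dedicated packages compute the same quantity more efficiently and also provide confidence intervals.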
Model series 1 evaluated the ability of AAQ-II and AFQ-Y8 scores to discriminate
between minimal and mild-only classifications for anxiety and depression, respectively. Model
series 2 assessed the ability of these scores to discriminate between minimal and moderate-only
classifications, whereas Model series 3 examined their ability to discriminate between minimal
and severe classifications. Model series 4 looked at the ability of AAQ-II and AFQ-Y8 scores to
discriminate between minimal and at-least-mild classifications (including scores in the moderate
and severe ranges). Model series 5 assessed the ability of the scores to discriminate between
mild-or-below (including the minimal range) and at-least-moderate classifications (including
scores in the severe range), whereas Model series 6 evaluated the discrimination ability of these
scores for moderate-or-below (including the minimal and mild ranges) and severe classifications.
These various classification accuracy models were conducted to help inform differential
use of screeners within a public health service delivery framework. Specifically, results from the
categorical models (i.e., Model series 1, 2, and 3) predicting the difference between minimal
symptoms and a single category or classification of elevated symptoms (i.e., mild versus
moderate versus severe) may inform targeted screening, which is used to better estimate
symptom severity—and therefore prioritize services—for those who have previously been
identified as at-risk by another means (e.g., referral for services from a peer, colleague, or
caregiver). In contrast, the threshold models (i.e., Model series 4, 5, and 6) that estimate the
difference between a level of symptoms (i.e., mild-or-higher, moderate-or-higher, severe)
and any level below that threshold (i.e., minimal, minimal-to-mild, and minimal-to-moderate)
may inform universal screening, which functions to identify a subset of individuals within a
population who meet a certain threshold of risk deemed consequential by the service system. All
iterations of categorical and threshold models were explored in this study given that there were
no priors from other studies to guide a more limited set of analyses. Therefore, it was assumed
plausible that results from ROC curve analyses could have differential classification utility based
on factors of the measure (i.e., AAQ-II vs. AFQ-Y8), outcome of interest (i.e., depression vs.
anxiety), and model type (i.e., categorical vs. threshold).
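The distinction between categorical and threshold models comes down to how the binary criterion is built from the four severity categories. The Python sketch below shows one way this could be coded; it is an illustration under stated assumptions rather than the procedure used in the study, and the function names are hypothetical. In the categorical case, respondents in any category other than minimal or the target category are marked `None` so that they can be excluded from that model.

```python
LEVELS = ["minimal", "mild", "moderate", "severe"]


def categorical_labels(categories, target):
    """Minimal (0) versus exactly `target` (1); all other categories -> None."""
    if target not in LEVELS[1:]:
        raise ValueError("target must be mild, moderate, or severe")
    labels = []
    for category in categories:
        if category == "minimal":
            labels.append(0)
        elif category == target:
            labels.append(1)
        else:
            labels.append(None)  # excluded from this categorical model
    return labels


def threshold_labels(categories, target):
    """Below `target` (0) versus at or above `target` (1); nobody is excluded."""
    if target not in LEVELS[1:]:
        raise ValueError("target must be mild, moderate, or severe")
    cut = LEVELS.index(target)
    return [1 if LEVELS.index(category) >= cut else 0 for category in categories]


if __name__ == "__main__":
    sample = ["minimal", "mild", "moderate", "severe"]
    print(categorical_labels(sample, "severe"))
    print(threshold_labels(sample, "moderate"))
```

Because the categorical variant drops intermediate categories, it compares more clearly separated groups than the threshold variant at the same severity level.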
Following the series of ROC curve analyses, secondary analyses were conducted to
evaluate potential cut scores for the most viable categorical and threshold uses of each measure
based on evaluation of AUC results. Given the large number of potential measure–outcome–
classification combinations that could warrant cut scores based on promising AUC results
(> .70), a more limited set of analyses was run to serve as a proof-of-concept for how AAQ-II
and AFQ-Y8 cuts might be derived for the purposes of targeted or universal screening.
Appropriate cut scores were determined by creating comparison tables of sensitivity, specificity,
and positive and negative predictive values for the range of possible measure scores using the
pROC package (Robin et al., 2011). To minimize false positive and false negative results, we
considered an optimal cut score as one that balances the highest possible levels of both
sensitivity and specificity (.70–.80 = adequate, .80–.90 = good, and .90–1.00 = excellent), while
prioritizing negative and positive predictive values as a secondary consideration.
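The comparison-table approach described above can be sketched in a few lines of Python. This is an illustration, not the pROC workflow used in the study. The function names are hypothetical, respondents scoring at or above a candidate cut are treated as screening positive, and "balance" is operationalized here as maximizing the smaller of sensitivity and specificity, which is only one possible reading of the criterion. The sketch assumes the criterion labels are coded 0/1 and that both classes are present.

```python
def screening_stats(scores, labels, cut):
    """Conditional probabilities for one candidate cut (positive if score >= cut)."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= cut and y == 1)
    fp = sum(1 for s, y in pairs if s >= cut and y == 0)
    fn = sum(1 for s, y in pairs if s < cut and y == 1)
    tn = sum(1 for s, y in pairs if s < cut and y == 0)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "cut": cut,
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def balanced_cut(scores, labels):
    """Build the comparison table and pick the cut with the best worst-case rate."""
    table = [screening_stats(scores, labels, cut) for cut in sorted(set(scores))]
    return max(table, key=lambda row: min(row["sensitivity"], row["specificity"]))


if __name__ == "__main__":
    # Made-up screener scores and criterion labels; these are not study data.
    scores = [5, 7, 9, 11, 13, 15]
    labels = [0, 0, 0, 1, 1, 1]
    print(balanced_cut(scores, labels))
```

In practice, the full table would be inspected rather than relying on a single automatic rule, so that predictive values can be weighed as the secondary consideration.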
Results
Preliminary Analyses
Data from all 797 respondents were complete with no missing values. The composite
scores derived from each measure of psychological inflexibility (i.e., AAQ-II and AFQ-Y8),
anxiety (i.e., BAI), and depression (i.e., BDI-II) showed relative normality in their sample
distributions and adequate estimates of internal consistency (see Table 1). Pearson’s bivariate
correlations were conducted between scores derived from each measure and showed a strong
positive correlation between both measures of psychological inflexibility. Additionally,
moderately strong positive associations were found between the predictor and criterion measures
(see Table 2).
Primary Analyses
As described above, six series of ROC curve analyses (Model series 1–6) were evaluated
to investigate the classification accuracy of scores from the AAQ-II on the BDI-II (series A), the
AAQ-II on the BAI (series B), the AFQ-Y8 on the BDI-II (series C), and the AFQ-Y8 on the
BAI (series D). For each of these, we described Model series 1, 2, and 3 as categorical models
given they estimated the difference between minimal symptoms and a single other category of
elevated symptoms (i.e., mild, moderate, or severe), mutually exclusive from the others. We
described Model series 4, 5, and 6 as threshold models given they estimated the difference
between a stated threshold level of symptoms (i.e., mild-or-higher, moderate-or-higher, severe)
and any level below that threshold (i.e., minimal, minimal-to-mild, minimal-to-moderate).
Table 3 shows the AUC results for each series of ROC curve analyses. Based on the
perceived utility of both types of ROC models (i.e., categorical vs. threshold) for informing
differential screening uses (i.e., targeted vs. universal), the most appropriate and effective AUC
for each model type and purpose was selected to determine optimal cut scores based on
consideration of sensitivity, specificity, and positive and negative predictive values. Specifically,
threshold cut scores were determined to be more appropriate at the mild or moderate level in
order to be as inclusive as possible for universal screening purposes, whereas categorical cut
scores were determined to be more appropriate at the level that was deemed empirically superior
for classification. Again, the goal of this demonstration was not to exhaustively probe all cut
score possibilities for all purposes, but rather to illustrate the viability of deriving cut scores from
the ROC models analyzed previously.
AAQ-II Depression Classification Accuracy
The AAQ-II had an excellent AUC of .93 for identifying individuals scoring in the
severe versus minimal score range (see Table 3, Categorical Model A3). Table 4 shows that a cut
score of 26 on the AAQ-II accurately identified 84% of the sample with scores in the severe
depression range on the BDI-II and 85% who scored below the severe range. When distinguishing
participants with moderate-or-severe depression symptoms from those with mild-or-minimal symptoms (see Table
3, Threshold Model A5), AAQ-II scores resulted in an AUC of .86. Table 5 shows that an AAQ-
II cut score of 26 accurately identified 79% of the sample with scores in the moderate-or-severe
range on the BDI-II and 78% who scored below. Further, 50% of participants with an AAQ-II
score of 26 or more had concurrent scores in the moderate or severe depression range on the
BDI-II while 93% with a score of 25 or less were also below this range of depression scores.
AAQ-II Anxiety Classification Accuracy
As with the other categorical model described above, individuals scoring in the severe
range of anxiety were most effectively identified by the AAQ-II when compared to minimal
symptoms (AUC = .93; see Table 3, Categorical Model B3). Table 6 shows that a cut score of 21
on the AAQ-II accurately identified 89% of the sample with BAI scores in the severe range and
89% who scored below. When distinguishing participants with at-least-mild anxiety symptoms
from those with minimal symptoms (see Table 3, Threshold Model B4), AAQ-II scores resulted in an AUC
of .82. Table 7 shows that an AAQ-II cut score of 17 accurately identified 73% of the sample
with scores in the mild-to-severe range on the BAI and 75% that scored below.
AFQ-Y8 Depression Classification Accuracy
Based on results in Table 3, scores from the AFQ-Y8 had the best predictive value for
identifying individuals in the severe versus minimal score range as evidenced by an excellent
AUC of .95 (see Table 3, Categorical Model C3). Values presented in Table 8 indicate that a cut
score of 12 on the AFQ-Y8 accurately identified 89% of the sample with scores in the severe
depression range on the BDI-II and 88% who scored below the severe range.
When distinguishing participants with moderate-or-severe depression symptoms from those with
mild-or-minimal symptoms (see Table 3, Threshold Model C5), AFQ-Y8 scores resulted in a good AUC of .88. Table 9
shows that an AFQ-Y8 cut score of 11 accurately identified 80% of the sample with scores in the
moderate-or-severe range on the BDI-II and 77% who scored below.
AFQ-Y8 Anxiety Classification Accuracy
Assessing the utility of the AFQ-Y8 scores for identifying relative levels of anxiety, the
severe versus minimal evaluation (see Table 3; Categorical Model D3) was most accurate overall
with an AUC of .90, falling in the excellent range. Results presented in Table 10 indicate that a
cut score of 9 accurately identified 79% of the sample with scores in the severe anxiety range on
the BAI and 84% who scored below that range. When distinguishing participants with
at-least-mild anxiety symptoms from those with minimal symptoms (see Table 3, Threshold Model D4),
AFQ-Y8 scores resulted in an acceptable AUC of .77. Table 11 shows that an AFQ-Y8 cut score
of 7 accurately identified 68% of the sample with scores in the mild-to-severe range on the BAI
and 73% who scored below.
Discussion
The current study aimed to improve public health-oriented mental health screening by
evaluating the usefulness of scores from two transdiagnostic rating scales of psychological
inflexibility in estimating concurrent symptom severity on measures of depression and anxiety.
By investigating the screening accuracy of these transdiagnostic measures and deriving potential cut
scores to ease interpretation of results, we hoped the AFQ-Y8 and AAQ-II might be useful for
addressing practical barriers inherent in public health screening endeavors.
Both measures of psychological inflexibility were assessed to determine how their scores
may most effectively be used for estimating anxiety and depression risk above or below a given
severity level (i.e., threshold models), and for estimating the specific category of symptom
severity (i.e., mild, moderate, or severe exclusively) compared to minimal symptoms (i.e.,
categorical models). Overall, findings indicated that scores from both measures were acceptable
to excellent screeners of concurrent ratings of anxiety and depression based on certain metrics
but performed more poorly based on others. Further, results varied depending on the measure
used (AFQ-Y8 vs. AAQ-II), level of severity (i.e., mild vs. moderate vs. severe), and scope of
screening (i.e., categorical models vs. threshold models; see Table 3).
Considering the categorical models, in which psychological inflexibility scores were used
to estimate specific risk categories, as would be used in a targeted screening paradigm, scores
from both AFQ-Y8 and AAQ-II were most effective at identifying anxiety and depression
severity in the severe range, with uniformly strong AUC values. Similarly, considering the
results from the threshold models, where risk was set at a given level and all individuals above
that point were considered at-risk, as would be in a universal screening paradigm, scores from
the screening measures also performed best when risk was set at the severe level for both anxiety
and depression scores. However, AUC values for the severe categorical models were consistently
stronger than AUC values for the severe threshold models. Additionally, the AUC values for
the threshold models within each series were not substantively different from one another when
considering their confidence intervals. In the interest of investigating cuts that capture larger swaths
of respondents, as is often desirable in universal screening, lower risk thresholds were chosen,
though these cuts should not be taken as absolutes but rather as a proof-of-concept for the approach.
Results indicated that the optimal performance of derived cut scores on the AFQ-Y8 and
AAQ-II was more meaningfully related to the outcome of interest (i.e., anxiety or depression)
than to properties of the screeners themselves, which performed similarly for each
outcome domain. Generally, both psychological inflexibility measures performed best
when identifying individuals scoring at or above the moderate level of severity for depression
scores (AUC range = .86–.88) and the mild level for anxiety scores (AUC range = .77–.82).
Considering the conditional probabilities from the eight derived cut scores, all showed at
least acceptable sensitivity and specificity values (≥.70), with the exception of sensitivity for the
AFQ-Y8 in estimating mild-or-greater risk on the BAI, which was just below the threshold of
acceptability (.68). Broadly, this pattern of findings suggests that all identified cuts for both
screening measures performed acceptably well for identifying individuals in the respective
anxiety and depression severity ranges—as well as for rejecting individuals whose scores failed
to indicate risk on those measures. Further, considering PPV and NPV, identified cut scores on
both the AAQ-II and AFQ-Y8 for severe anxiety were good to excellent (.80–.90), suggesting a
strong likelihood for individuals identified as at-risk on the screeners to actually be at severe risk
on the anxiety measures and, on the other hand, not at severe risk if not identified. However,
while cut scores from both screeners showed strong ability to correctly identify individuals with
at-least-mild anxiety scores (PPV), the rates of true negatives (NPV) were considerably weaker,
with most individuals with a negative screener result actually endorsing at-least-mild levels of
anxiety. Conversely, while NPVs were excellent for all four cuts based on the
depression measure, PPVs were only at about chance levels for each (PPV range
= .46–.53), suggesting considerable overidentification of depression risk.
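The asymmetry between PPV and NPV described above follows from how predictive values depend on the base rate of the condition being screened. A minimal Python sketch of that relationship (Bayes' rule) is given here; the function name and the two prevalence values are hypothetical and are not taken from this study's data.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) implied by sensitivity, specificity, and base rate."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    # Expected proportions of the population in each cell of the 2x2 table.
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    tn = specificity * (1.0 - prevalence)
    fp = (1.0 - specificity) * (1.0 - prevalence)
    if tp + fp == 0.0 or tn + fn == 0.0:
        raise ValueError("predictive value undefined: no positive or no negative results")
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Identical accuracy (sensitivity = specificity = .85) at two assumed base rates.
low_ppv, low_npv = predictive_values(0.85, 0.85, prevalence=0.15)
high_ppv, high_npv = predictive_values(0.85, 0.85, prevalence=0.80)
```

Under these assumed values, the low base rate gives a PPV of .50 with an NPV near .97, whereas the high base rate gives a PPV near .96 with an NPV near .59, even though sensitivity and specificity are unchanged.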
We suggest that these results were largely consistent with findings from past research
using similar criterion measures. In their development study of the AAQ-II, Bond et al. (2011)
found that cut scores ranging from 24 to 28 were useful, depending on the sample and outcome. The
AAQ-II cut score for depression in the current study also fell within this range (i.e., 26).
It is noteworthy that the AUC values for predicting threshold levels of anxiety were similar
across the AAQ-II and AFQ-Y8 in this sample (.77–.80) and were consistent with results from
Venta et al. (2012) and Oppo et al. (2019), who reported AUCs in the same range. Compared to Renshaw’s (2016) work
with the AFQ-Y8 using an urban high school student sample, our study found similarly high
AUC values when looking at the severe category of depression. However, we determined that
depression and anxiety risk could be more usefully identified with somewhat lower cut scores of
12 and 9, respectively, compared to Renshaw’s cut score of 15 for both criteria. One plausible
explanation for this discrepancy is that the AFQ-Y8 may be more sensitive when used as a
screener with young adults compared to adolescents. Additionally, the difference in samples
went beyond age, with the majority of Renshaw’s (2016) participants identifying as Black or
African American (96.3%) and roughly equally split between female and male respondents,
whereas the current sample comprised mostly individuals identifying as White
(76.7%) and female. Thus, these differences could possibly be explained by cultural and gender
identity factors, as these demographic variables have been shown to influence reporting in
response to sensitive mental health questions (Van de Vijver & Leung, 2021). Future studies
should therefore investigate the potential social/contextual mechanisms that impact the
applicability of psychological inflexibility cut scores across demographically diverse samples.
Implications
The current study’s findings indicate that scores from the AFQ-Y8 and the AAQ-II could
determine with reasonable precision if a college student scores in a given range on concurrent
measures of anxiety and depression. We suggest these findings have implications for both
universal and targeted screening endeavors and that brief transdiagnostic measures, including the
AFQ-Y8 and AAQ-II, could aid in reducing barriers to mental health care outlined by Weist et
al. (2007). Additionally, determining cut scores for risk identification can help to further
streamline the screening process by simplifying score interpretation. Findings from the current
study provide preliminary support for the usefulness of these measures toward such ends.
Although the purposes of screening will differ depending on the needs of the provider or
organization, results from this study suggest that taking a threshold approach may be more
broadly useful in situations where entire populations are being monitored. By using the
psychological inflexibility measures to determine if an individual is above or below a given
symptom level (or threshold) for transdiagnostic internalizing symptoms, practitioners can
capture all respondents as functioning somewhere along a spectrum of symptom severity. This
screening paradigm can inform further assessment or may justify the need for a transdiagnostic
intervention, such as ACT, which addresses psychological inflexibility as the targeted
mechanism for change.
Moreover, the AFQ-Y8 and AAQ-II may be useful measures in multiple-gating
assessment procedures, where multiple measures and methods are used in sequential order—
often increasing in focus and intensity—to determine diagnoses or identify target problems
(Walker et al., 2014). Traditionally, multiple-gating assessment has been an approach to address
the barriers previously mentioned with public health screening by reducing the numbers of
individuals who would require more intensive assessment. This is done by identifying and
eliminating those who do not qualify as “at-risk” on less intensive measures. For this reason,
multiple-gating is considered best practice for screening in schools and community settings
(Whitcomb & Merrell, 2013). Because the AFQ-Y8 and AAQ-II were analyzed at both the universal
and targeted levels, they could be used at multiple gates in the multiple-gating procedure, but with
different thresholds for identifying those in need. Ideally, this could simplify the identification
process by accounting for both levels of screening and further streamlining the process for those
who need more intensive assessment to access services.
While the current study highlighted the most effective cut scores for the AFQ-Y8 and
AAQ-II based on the criterion domains of interest (i.e., anxiety vs. depression), we suggest that
researchers might use the information provided in this study to evaluate alternative cuts for these
measures at other severity levels and with local samples. Furthermore, considering the much
higher and more distributed base rates of anxiety risk compared to the lower prevalence of
depression risk, it may be useful for future research or practical applications to set a lower
threshold for identifying depression risk and a higher bar for anxiety risk, recognizing that how
scores are used should always be driven by the intent and needs of the organization implementing
screening.
Limitations
A major limitation of this study is the generalizability of results based on sample
characteristics. Data for the current study were collected from university undergraduate students
and, when compared to results from a sample of mostly African American high school students
(Renshaw, 2016), the AUC values remained similar; however, identified cut scores were lower.
Considering our findings, AFQ-Y8 and AAQ-II scores are still likely to have
good predictive ability, though the thresholds for determining risk are likely to shift based on the
particular sample, highlighting the need to evaluate local norms, if possible, in comparison with
those derived from previous research. We therefore recommend that future research continue to
explore the usefulness of brief transdiagnostic screeners across populations with varying
demographic characteristics. It is noteworthy that the current study also employed data collection
procedures that were based on convenience sampling, a method that can often produce biased
results that may misrepresent the true population. Thus, future research using purposive,
representative sampling techniques is warranted.
Finally, this study was also limited in its choice of predictor and criterion measures. The
current study used only anxiety and depression as mental health criteria, and measured the
transdiagnostic construct of psychological inflexibility using only the AFQ-Y8 and AAQ-II.
There is a breadth of broadband and narrowband measures that focus on various aspects of
mental health that could be tested in a similar manner. Furthermore, the AFQ-Y8 and AAQ-II
are compared to other screening instruments in this study, but to strengthen confidence in
classification, clinical samples and gold-standard diagnostic instruments/assessment procedures
(e.g., clinical interviews) should be used for comparison in later studies. Future research could
benefit from investigating other mental health criteria, diagnostic tools, more global measures
of mental health, and even different valences of these constructs. Testing the predictive ability of
psychological inflexibility measures with different types of assessment procedures and outcomes
could further aid in the generalizability of evidence supporting the viability of screening mental
health difficulties via brief transdiagnostic measures.
References
Achenbach, T. M., & Rescorla, L. (2001). Manual for the ASEBA school-age forms & profiles:
an integrated system of multi-informant assessment. Burlington, VT: ASEBA.
Barnes-Holmes, D., Barnes-Holmes, Y., Stewart, I., & Boles, S. (2010). A sketch of the Implicit
Relational Assessment Procedure (IRAP) and the Relational Elaboration and Coherence
(REC) model. The Psychological Record, 60(3), 527–542.
https://doi.org/10.1007/BF03395726
Beck, A. T., Steer, R. A., & Brown, G. K. (1996). Manual for the Beck Depression Inventory-2.
Psychological Corporation.
Beck, A. T., & Steer, R. A. (1993). Manual for the Beck Anxiety Inventory. Psychological
Corporation.
Bond, F. W., Hayes, S. C., Baer, R. A., Carpenter, K. M., Guenole, N., Orcutt, H. K., Waltz, T.,
& Zettle, R. D. (2011). Preliminary psychometric properties of the Acceptance and
Action Questionnaire–II: A revised measure of psychological inflexibility and
experiential avoidance. Behavior Therapy, 42(4), 676–688.
https://doi.org/10.1016/j.beth.2011.03.007
Dindo, L., Van Liew, J. R., & Arch, J. J. (2017). Acceptance and commitment therapy: a
transdiagnostic behavioral intervention for mental health and medical
conditions. Neurotherapeutics, 14(3), 546–553.
Fergus, T. A., Valentiner, D. P., Gillen, M. J., Hiraoka, R., Twohig, M. P., Abramowitz, J. S., &
McGrath, P. B. (2012). Assessing psychological inflexibility: The psychometric
properties of the Avoidance and Fusion Questionnaire for Youth in two adult samples.
Psychological Assessment, 24(2), 402–408. https://doi.org/10.1037/a0025776
Greco, L. A., Lambert, W., & Baer, R. A. (2008). Psychological inflexibility in childhood and
adolescence: Development and evaluation of the Avoidance and Fusion Questionnaire for
Youth. Psychological Assessment, 20(2), 93–102. https://doi.org/10.1037/1040-3590.20.2.93
Hayes, S. C. (n.d.). The 7-item Acceptance and Action Questionnaire-II. Association for
Contextual Behavioral Science.
https://contextualscience.org/acceptance_action_questionnaire_aaq_and_variations
Hayes, S. C., Luoma, J. B., Bond, F. W., Masuda, A., & Lillis, J. (2006). Acceptance and
commitment therapy: Model, processes and outcomes. Behaviour Research and Therapy,
44(1), 1–25. https://doi.org/10.1016/j.brat.2005.06.006
Hayes, S. C., Barnes-Holmes, D., & Roche, B. (Eds.). (2001). Relational frame theory: A post-
Skinnerian account of human language and cognition. Plenum.
Hayes, S. C., Strosahl, K. D., Wilson, K. G., Bissett, R. T., Pistorello, J., Toarmino, D., Polusny,
M. A., Dykstra, T. A., Batten, S. V., Bergan, J., Stewart, S. H., Zvolensky, M. J., Eifert,
G. H., Bond, F. W., Forsyth, J. P., Karekla, M., & McCurry, S. M. (2004). Measuring
experiential avoidance: A preliminary test of a working model. The Psychological
Record, 54(4), 553–578. https://doi.org/10.1007/BF03395492
Hussey, I., & Barnes-Holmes, D. (2012). The implicit relational assessment procedure as a
measure of implicit depression and the role of psychological flexibility. Cognitive and
Behavioral Practice, 19(4), 573–582. https://doi.org/10.1016/j.cbpra.2012.03.002
Levin, M. E., MacLane, C., Daflos, S., Seeley, J. R., Hayes, S. C., Biglan, A., & Pistorello, J.
(2014). Examining psychological inflexibility as a transdiagnostic process across
psychological disorders. Journal of Contextual Behavioral Science, 3(3), 155–163.
https://doi.org/10.1016/j.jcbs.2014.06.003
Livheim, F., Tengström, A., Bond, F. W., Andersson, G., Dahl, J., & Rosendahl, I. (2016).
Psychometric properties of the Avoidance and Fusion Questionnaire for Youth: A
psychological measure of psychological inflexibility in youth. Journal of Contextual
Behavioral Science, 5(2), 103–110. https://doi.org/10.1016/j.jcbs.2016.04.001
Martin, P., Murray, L. K., Darnell, D., & Dorsey, S. (2017). Transdiagnostic treatment
approaches for greater public health impact: Implementing principles of evidence-based
mental health interventions. Clinical Psychology: Science and Practice, 25(4), e12270.
https://doi.org/10.1111/cpsp.12270
Oppo, A., Schweiger, M., Ristallo, A., Presti, G., Pergolizzi, F., & Moderato, P. (2019).
Mindfulness skills and psychological inflexibility: Two useful tools for a clinical
assessment for adolescents with internalizing behaviors. Journal of Child and Family
Studies, 28(12), 3569–3580. https://doi.org/10.1007/s10826-019-01539-w
R Core Team. (2020). R: A language and environment for statistical computing. Vienna, Austria:
R Foundation for Statistical Computing. https://www.R-project.org/
Renshaw, T. L. (2016). Screening for psychological inflexibility: Initial validation of the
Avoidance and Fusion Questionnaire for Youth as a school mental health
screener. Journal of Psychoeducational Assessment, 35(5), 482–493.
https://doi.org/10.1177/0734282916644096
Renshaw, T. L. (2018). Probing the relative psychometric validity of three measures of
psychological inflexibility. Journal of Contextual Behavioral Science, 7, 47–54.
https://doi.org/10.1016/j.jcbs.2017.12.001
Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J. C., & Müller, M. (2011).
pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC
Bioinformatics, 12(1), 1–8. https://doi.org/10.1186/1471-2105-12-77
Schulenberg, J. E., Sameroff, A. J., & Cicchetti, D. (2004). The transition to adulthood as a
critical juncture in the course of psychopathology and mental health. Development and
Psychopathology, 16(4), 799–806. https://doi.org/10.1017/s0954579404040015
Schweiger, M., Ristallo, A., Oppo, A., Pergolizzi, F., Presti, G., & Moderato, P. (2017). Ragazzi
in lotta con emozioni e pensieri: la validazione della versione italiana dell’Avoidance and
Fusion Questionnaire for Youth (I-AFQ-Y). Psicoterapia Cognitiva e Comportamentale,
23(2), 141–162.
Stewart, R. E., Adams, D. R., Mandell, D. S., Hadley, T. R., Evans, A. C., Rubin, R., Erney, J.,
Neimark, G., Hurford, M. O., & Beidas, R. S. (2016). The perfect storm:
collision of the business of mental health and the implementation of evidence-based
practices. Psychiatric Services, 67(2), 159–161.
https://doi.org/10.1176/appi.ps.201500392
Streiner, D. L., & Cairney, J. (2007). What's under the ROC? An introduction to receiver
operating characteristics curves. The Canadian Journal of Psychiatry, 52(2), 121–128.
https://doi.org/10.1177/070674370705200210
Van de Vijver, F. J., & Leung, K. (2021). Methods and data analysis for cross-cultural
research (Vol. 116). Cambridge University Press.
Venta, A., Sharp, C., & Hart, J. (2012). The relation between anxiety disorder and experiential
avoidance in inpatient adolescents. Psychological Assessment, 24(1), 240–248.
https://doi.org/10.1037/a0025362
Walker, H. M., Small, J. W., Severson, H. H., Seeley, J. R., & Feil, E. G. (2014). Multiple-
gating approaches in universal screening within school and community settings. In R. J.
Kettler, T. A. Glover, C. A. Albers, & K. A. Feeney-Kettler (Eds.), School psychology
book series. Universal screening in educational settings: Evidence-based decision
making for schools (pp. 47–75). American Psychological
Association. https://doi.org/10.1037/14316-003
Weist, M. D., Rubin, M., Moore, E., Adelsheim, S., & Wrobel, G. (2007). Mental health
screening in schools. Journal of School Health, 77(2), 53–58.
https://doi.org/10.1111/j.1746-1561.2007.00167.x
Whitcomb, S., & Merrell, K. W. (2013). Behavioral, social, and emotional assessment of
children and adolescents. Routledge.
Table 1
Descriptive Statistics for All Study Measures
Measure   M      SD     min  max  skewness  kurtosis  α     ω
AFQ-Y8    9.14   5.95   0    30   0.78      0.18      0.80  0.85
AAQ-II    21.65  9.65   7    49   0.50      -0.45     0.91  0.93
BAI       17.16  11.10  0    57   0.87      0.50      0.92  0.94
BDI-II    12.53  10.51  0    52   1.05      0.64      0.93  0.96
Note. AFQ-Y8 = 8-item version of the Avoidance and Fusion Questionnaire for Youth; AAQ-II
= second edition of the Acceptance and Action Questionnaire; BAI = Beck Anxiety Inventory;
BDI-II = second edition of the Beck Depression Inventory.
Table 2
Bivariate Correlations Among All Study Measures
Pearson r [95% CI]
Measure   AFQ-Y8            AAQ-II            BAI               BDI-II
AFQ-Y8    ––
AAQ-II    .78* [.75, .80]   ––
BAI       .59* [.54, .64]   .63* [.59, .68]   ––
BDI-II    .68* [.64, .72]   .69* [.65, .73]   .67* [.62, .71]   ––
*p < .001.
Note. AFQ-Y8 = 8-item version Avoidance and Fusion Questionnaire for Youth; AAQ-II =
second edition of the Acceptance and Action Questionnaire; BAI = Beck Anxiety Inventory;
BDI-II = second edition of the Beck Depression Inventory.
Table 3
Area Under the Curve (AUC) for All Classification Accuracy Models
Classification Series/Model    Classification Level
                               Mild                 Moderate             Severe
                               AUC [95% CI]         AUC [95% CI]         AUC [95% CI]
Series A: AAQ-II on BDI-II
  Categorical                  A1: .80 [.76, .84]   A2: .88 [.84, .91]   A3: .93 [.90, .96]
  Threshold                    A4: .86 [.83, .89]   A5: .86 [.83, .89]   A6: .87 [.83, .91]
Series B: AAQ-II on BAI
  Categorical                  B1: .71 [.66, .76]   B2: .86 [.82, .90]   B3: .93 [.90, .96]
  Threshold                    B4: .82 [.78, .85]   B5: .81 [.78, .84]   B6: .83 [.79, .86]
Series C: AFQ-Y8 on BDI-II
  Categorical                  C1: .75 [.70, .79]   C2: .87 [.83, .90]   C3: .95 [.93, .97]
  Threshold                    C4: .84 [.81, .87]   C5: .88 [.85, .90]   C6: .90 [.87, .93]
Series D: AFQ-Y8 on BAI
  Categorical                  D1: .67 [.62, .72]   D2: .79 [.74, .84]   D3: .90 [.87, .93]
  Threshold                    D4: .77 [.73, .81]   D5: .77 [.73, .80]   D6: .80 [.76, .84]
Note. AFQ-Y8 = 8-item version of the Avoidance and Fusion Questionnaire for Youth; AAQ-II
= second edition of the Acceptance and Action Questionnaire; BAI = Beck Anxiety Inventory;
BDI-II = second edition of the Beck Depression Inventory.
Table 4
Conditional Probabilities of AAQ-II Scores for Identifying Severe Depression Classification
Score  Sensitivity  Specificity  PPV  NPV
23     .95          .76          .37  .99
24     .92          .78          .39  .98
25     .88          .83          .44  .98
26     .84          .85          .46  .97
27     .80          .88          .49  .97
28     .75          .90          .54  .96
29     .73          .92          .59  .96
Note. Bold text = values associated with the preferred cut score. PPV = positive predictive value;
NPV = negative predictive value.
Table 5
Conditional Probabilities of AAQ-II Scores for Identifying At-Least-Moderate Depression
Classifications
Score  Sensitivity  Specificity  PPV  NPV
23     .90          .67          .43  .96
24     .86          .70          .45  .95
25     .84          .75          .48  .94
26     .79          .78          .50  .93
27     .75          .80          .52  .92
28     .68          .84          .54  .90
29     .65          .87          .58  .90
Note. Bold text = values associated with the preferred cut score. PPV = positive predictive value;
NPV = negative predictive value.
Table 6
Conditional Probabilities of AAQ-II Scores for Identifying Severe Anxiety Classification
Score  Sensitivity  Specificity  PPV  NPV
18     .92          .79          .82  .91
19     .91          .84          .86  .90
20     .91          .87          .88  .90
21     .89          .89          .90  .88
22     .88          .90          .90  .87
23     .85          .91          .91  .85
24     .83          .93          .92  .84
Note. Bold text = values associated with the preferred cut score.
Table 7
Conditional Probabilities of AAQ-II Scores for Identifying At-Least-Mild Anxiety Classification
Score  Sensitivity  Specificity  PPV  NPV
14     .84          .61          .89  .49
15     .80          .64          .90  .45
16     .76          .70          .91  .42
17     .73          .75          .92  .41
18     .69          .79          .93  .40
19     .66          .84          .94  .38
20     .64          .87          .95  .38
Note. Bold text = values associated with the preferred cut score.
Table 8
Conditional Probabilities of AFQ-Y8 Scores for Identifying Severe Depression Classification
Score  Sensitivity  Specificity  PPV  NPV
9      .96          .73          .35  .99
10     .96          .79          .41  .99
11     .93          .84          .46  .99
12     .89          .88          .53  .98
13     .88          .89          .56  .96
14     .77          .91          .57  .96
15     .75          .94          .66  .96
Note. Bold text = values associated with the preferred cut score.
Table 9
Conditional Probabilities of AFQ-Y8 Scores for Identifying At-Least-Moderate Depression
Classification
Score  Sensitivity  Specificity  PPV  NPV
8      .90          .57          .37  .95
9      .88          .66          .42  .95
10     .86          .72          .46  .95
11     .80          .77          .49  .93
12     .73          .82          .54  .92
13     .71          .86          .58  .91
14     .62          .88          .59  .89
Note. Bold text = values associated with the preferred cut score.
Table 10
Conditional Probabilities of AFQ-Y8 Scores for Identifying Severe Anxiety Classification
Score  Sensitivity  Specificity  PPV  NPV
6      .94          .65          .74  .91
7      .89          .73          .78  .87
8      .84          .75          .78  .82
9      .79          .84          .84  .80
10     .77          .88          .87  .78
11     .71          .89          .88  .75
12     .67          .94          .92  .73
Note. Bold text = values associated with the preferred cut score.
Table 11
Conditional Probabilities of AFQ-Y8 Scores for Predicting At-Least-Mild Anxiety Classification
Score  Sensitivity  Specificity  PPV  NPV
4      .89          .37          .85  .46
5      .83          .52          .87  .44
6      .76          .65          .89  .41
7      .68          .73          .91  .37
8      .60          .75          .91  .32
9      .53          .84          .93  .31
10     .48          .88          .94  .30
Note. Bold text = values associated with the preferred cut score.