
Int J Digit Libr (2012) 13:51–64 DOI 10.1007/s00799-012-0096-x

Automated approaches to characterizing educational digital library usage: linking computational methods with qualitative analyses

Keith E. Maull · Manuel Gerardo Saldivar · Tamara Sumner

Received: 28 November 2011 / Revised: 27 June 2012 / Accepted: 3 July 2012 / Published online: 29 July 2012 © Springer-Verlag 2012

Abstract The need for automatic methods capable of characterizing adoption and use has grown in operational digital libraries. This paper describes a computational method for producing two inter-related user typologies based on use diffusion. Furthermore, a case study is described that demonstrates the utility and applicability of the method: it is used to understand how middle and high school science teachers participating in an academic year-long field trial adopted and integrated digital library resources into their instructional planning and teaching. Use diffusion theory views technology adoption as a process that can lead to widely different patterns of use across a given population of potential users; these models use measures of frequency and variety to characterize and describe such usage patterns. By using computational techniques such as clickstream entropy and clustering, the method produces both coarse- and fine-grained user typologies. As a part of improving the initial coarse-grained typology, clickstream entropy improvements are described that aim at better separation of users. In addition, a fine-grained user typology is described that identifies five different types of teacher-users, including “interactive resource specialists” and “community seeker specialists.” This typology was validated through comparison with qualitative and quantitative data collected using traditional educational field research methods. Results indicate that qualitative analyses correlate with the computational results, suggesting automatic methods may prove an important tool in discovering valid usage characteristics and user types.

K. E. Maull (✉) · M. G. Saldivar · T. Sumner
Department of Computer Science, Institute of Cognitive Science, University of Colorado, Boulder, CO 80309, USA
e-mail: maull@colorado.edu

M. G. Saldivar
e-mail: saldivar@colorado.edu

T. Sumner
e-mail: sumner@colorado.edu

Keywords Technology adoption · Diffusion of innovation · Use diffusion models · Educational digital libraries

1 Introduction

Educational digital libraries have evolved from being primarily research-oriented enterprises to encompass a large number of operational library sites, such as NSDL, Merlot, and DLESE in the US, SchoolNet in Europe, and the National Digital Learning Resources Network in Australia, to name just a few. As library efforts continue to mature, there is a growing need for efficient and scalable methods to characterize their uptake and adoption, their impact on teacher and student practices, and ultimately their impact on student learning. In this article, we describe a computational method for automatically identifying and characterizing different patterns of digital library adoption and use.

This method instantiates a particular theoretical model of technology adoption, use diffusion [31], which in turn builds on prior work on the diffusion of innovation [27]. Diffusion of innovation is one of the most researched and widely employed social science models; it has been used to study the adoption of agricultural innovations such as new corn varieties [27], health innovations such as water purification and disease treatments [27], and the very rapid adoption of digital consumer products [24]. Diffusion of innovation theory provides a lens for understanding the different factors that influence a person’s decision to use, or not use, an innovation, and when in the product lifecycle they might adopt. For instance, an “early adopter” farmer might be motivated by the thought of potential harvest gains to be the first farmer in the region to try out a new corn variety, whereas a “late majority” farmer would wait until most of the surrounding farms had already adopted the new corn. Characterizing the adoption of contemporary information services such as educational digital libraries, however, is more complex than identifying when the farmer planted the corn or when the consumer bought the digital device. Did a teacher who used NSDL or MERLOT one time ‘adopt’ the library? In acknowledgement of this complexity, instead of focusing on when an innovation is first used, use diffusion examines both how and how much an innovation is used to identify different adopter categories. It recognizes that both the depth and breadth of usage will vary widely across different users, and that successful adoption will take many different forms.

Building on techniques from web analytics and data mining, the proposed computational method employs two different algorithms in a two-step process to develop both coarse- and fine-grained views of user behavior. These algorithms rely on detailed web site usage logs, where each individual action in the interface is recorded and associated with a unique user identifier. In the first step, one algorithm uses frequency of use and variety of use to sort users into different use diffusion adopter quadrants, such as “intense use,” “limited use,” “specialized use,” etc. One challenge with operationalizing use diffusion in a computational method is modeling variety in a way that is application independent. In Maull et al. [21], variety was calculated using Shannon [30] entropy, a mathematical construct from information theory. While this first attempt using entropy was a valuable step forward in understanding how to map web usage onto the use diffusion model, it did not sufficiently characterize specialized use. In this study, we extend the initial calculations to include an enhanced entropy computation that includes a penalty for low variety. These new computations and their performance are described in detail later in this article. In the second step, a clustering algorithm is employed to develop a finer-grained understanding of the different patterns of generalized and specialized use within and across these quadrants. The result of this step is a user typology, or classification of users along the selected dimensions of each cluster.

To illustrate the utility of the computational method, the resulting user typology, and how they might be applied in practice, we present a case study in which we used this method to better understand how middle and high school science teachers integrated digital library resources from NSDL and DLESE into their instructional planning and teaching practice. Teachers were provided with a web-based planning tool that enabled them to customize their district’s adopted curriculum with digital library resources to better meet diverse learner needs, and to share their customizations with other teachers in their district [34]. This planning tool, the Curriculum Customization Service (CCS), was deployed to all middle and high school Earth science teachers within a large urban school district in the Midwest US for a full academic year; the Service was carefully instrumented to record every user action, and detailed usage logs were collected. This case offers an excellent testbed for this method for two reasons. First, the total potential user population is known and quantifiable, enabling us to more easily assess overall rates of uptake and adoption. Second, this deployment was also studied using traditional educational field research methods, such as surveys, interviews, and classroom observations. Thus, we can compare the coarse- and fine-grained views of user behavior identified by the computational method with findings from the field study to better assess the accuracy and validity of the method’s output. While this case is clearly focused on understanding educational use of digital libraries, we believe the proposed method to be generalizable and applicable to a wide variety of digital libraries, information services, and learning applications.

2 Background and related work

This research draws on theories and computational techniques from several disciplines in order to better understand digital library adoption and use. First, as previously described, we draw on technology adoption and diffusion theories, which are historically rooted in social science, to inform the purpose and overall functioning of our computational method. Second, we discuss related research to understand user behavior and user typologies, describing how our approach compares to other efforts that are similarly focused on developing automatic methods.

2.1 Adoption and diffusion theories

Technology adoption occurs when an individual decides that a given technological innovation has utility and can add value to his or her activities (such as teaching) if that innovation is somehow incorporated into those activities [33]. Thus, much theoretical work to date has focused on understanding the cognitive, affective, and contextual factors that influence a potential user’s decision-making process. One prominent family of theories offers extensions or refinements of Rogers’s innovation diffusion theory [7,25,27]. To Rogers, technology adoption is fundamentally a function of the communication channels and social systems of which one is a part. This theory suggests that (1) within a social system there are typically five different “adopter categories” describing the different characteristics that users bring to bear when considering whether to adopt an innovation, and (2) these characteristics influence how innovations move through the social system in predictable ways. Another model is the concerns-based adoption model developed by Fuller [12] and Hall [13]. As its name implies, this model is focused on individuals’ concerns, which are defined as the specific reasons, situated in one’s socio-cultural context, that one might have to adopt or not adopt technology. A third major approach is the technology acceptance model, or TAM [6,36]. This was one of the first models to take into account individuals’ self-efficacy and expertise. Proponents of this model argue that individuals’ self-perception of their ability to use technology, and their ability to judge whether a technology has utility for them, are important factors for understanding technology adoption behaviors.

These prior models have contributed greatly to our understanding of technology adoption; however, they all share a couple of weaknesses. First, none of them takes into account discontinuance, i.e., people often stop using a new technology after they have tried it out a few times. Second, they provide very little insight into actual use of the new technology. Since most of these methods rely on self-reported survey data, they more often predict variance in self-reported use rather than actual use [35]. Thus, more recently, theoretical attention has shifted from focusing on the decision component of adoption towards understanding adoption as a process that can lead to different patterns of use. Models that account for use fall into the category of “use diffusion” models. These models attempt to characterize the way and the degree to which people make use of the new technology. For example, once a consumer purchases a cell phone, to what degree does he or she actually use the phone? What features are used, and how are these features used? Can different types of usage be recognized and compared?

Formally proposed by Ram and Jung [26] and subsequently updated and expanded by Shih and Venkatesh [31] to accommodate more robust and predictable descriptions of usage, use diffusion extends the traditional notion of adoption diffusion by focusing on system usage patterns. The work in this paper builds on the Shih and Venkatesh use diffusion model, which suggests two dimensions to patterns of technology use. The first dimension, frequency, provides a measurement of how much a technology is used. In the web context, for example, frequency of web site use might be defined as the number of sessions a user has generated over some period of time. This frequency measure can be a very useful indicator of a user’s interest in site content. The second dimension of the use diffusion model is what the authors call variety. This dimension measures the range of use of a technology: did the consumer make use of most of the features of their new cell phone, or only two or three? In the web context, unlike frequency and number of sessions, there are no standard measures of variety. For this study, we model variety as clickstream entropy. The use diffusion model thus produces four categories of use of a technology, as shown in Fig. 1. When plotted along these two dimensions, the population of users is segmented into these four “adopter categories”: intense use, limited use, specialized use, and non-specialized use. A user may move to and from adopter categories depending, for example, on the time interval considered and the granularity of the data used for analysis (e.g., an individual session vs. lifetime sessions).

Fig. 1 Use diffusion model proposed by Shih and Venkatesh [31]

2.2 User typology modeling

In general, typologies aim at classifying or categorizing common characteristics of objects. They are widely used in sociological, biological, linguistic, and psychological classifications to help both researchers and practitioners communicate efficiently about phenomena of interest. For example, in sociological typologies, specific traits and features of social groups are segmented (oftentimes hierarchically) into meaningful categories. Typological determinations are often made by careful examination of patterns in research data, and do not always gain widespread significance or acceptance until they are broadly examined within a scientific community. As massive datasets of interest rapidly become available, however, typological determinations are more often being made automatically, computationally, and experimentally, as in this research.

A great deal of recent typology research has been directed towards understanding and classifying Internet users, the common tasks they perform, and the details of their online behaviors. The Pew Research Group, for example, is developing user typologies describing the technology and Internet use patterns of Americans [16]. In their typology, distinctions are made, for example, between “Light But Satisfied” users, individuals who use some technology but for whom technology does not play a central role in day-to-day life, and “Omnivores,” who embrace technology fully and participate heavily in online activities. Typologies of media users have been extensively explored by Brandtzæg [3], and with the rise of social media systems, user typologies within this context are a growing area of research [2]. In education research, recent work by Eynon and Malmberg [11] illustrates how typologies can inform design and implementation recommendations: they are using Internet usage typologies of young students to more effectively integrate new technologies into classroom practices.


Typologies within digital libraries and other repositories have been used to explore the background, behavior, and motivations of end users to help inform decision making about library user interfaces, possible design or service enhancements, or digital resource development. For example, personas, a type of user typology, have been successfully used to understand the characteristics and needs of institutional repository users [20] and the needs of users of library services supporting scientific data curation [18]. Typically, personas are created using a labor- and expertise-intensive qualitative research method that relies on extensive interviews and observations of users. The result of this qualitative research is a typology of users that describes their needs, motivations, and contextual factors that might influence system adoption and use. Automatic computational methods for creating digital library personas have been explored by Maness et al. [20] and further by Miaskiewicz et al. [23].

Within educational data mining research, computational methods for identifying user typologies often use clustering algorithms to group students into different categories based on skill sets [1] or performance on a test or assessment [9]. Xu et al. [39] use clustering to identify and classify usage types of teachers. They examine features of teacher-generated projects within the Instructional Architect tool to create a typology of users based on the kinds of projects that they produced. As these examples illustrate, clustering algorithms are generally used to assign group membership among items with common attributes or features in large datasets. The computational method presented in this article also uses clustering to identify and classify usage types of teachers. This research differs from prior efforts in that the features selected for the clustering algorithm are theoretically motivated by the use diffusion model.

Thus, the outputs of our computational method are two inter-related user typologies: (1) a coarse-grained view of the user population segmented into use diffusion adopter categories, and (2) a fine-grained view of the same population segmented along the same two dimensions, but using more detailed measures for variety and frequency. Classifying and categorizing users into groups is a common task in user behavior analysis. The output of adoption models is often a set of adopter categories, which are a particular type of user typology. The categories produced by the use diffusion model are analogous to those produced by Rogers’s diffusion of innovation model (i.e., “early adopter” or “late majority”).

3 Computational method

The two-step method for this research is constructed to discover usage patterns and user typologies. The first step captures coarse-grained user categories, while the second step determines fine-grained typologies of system use. The next two sections examine the details of each step: their inputs, processes, and outputs.

Fig. 2 Step 1 overview

3.1 Step 1: Use diffusion patterns (Fig. 2)

To understand how use diffusion patterns are modeled, it is important to examine the frequency and variety dimensions of the model more fully. In this study, frequency is modeled as the number of user-initiated web sessions. While other frequency measures can be considered, this measure provides a good initial approximation of overall system use: fewer sessions imply lower system use, while more sessions imply higher system use. Variety, on the other hand, is more challenging to model because it is difficult to develop application-independent approaches to the concept. For the first step, we chose a variety metric that is based on aggregate user clickstreams. Intuitively, the clickstream of a particular user approximates their broad usage of the system. Furthermore, over time, clickstreams become regular; that is, they become more predictable as users develop normal patterns of use within the system. By applying entropy, and specifically Shannon entropy [30], over the lifetime clickstream of each user, a basic notion of variety is developed that gives an approximate measurement of user behavior. Entropy has been used extensively in many systems to calculate measures of randomness and to approximate the amount of information being communicated in a system.
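The two step 1 measures can be sketched for a single user as follows. This is a minimal illustration, not the authors’ implementation: the (timestamp, URL) log format, the 30-minute inactivity timeout used to delimit sessions, and the example URLs are all assumptions, since the paper does not specify how sessions were delimited or which click attributes fed the entropy calculation.

```python
import math
from collections import Counter
from datetime import datetime, timedelta

# Assumed inactivity gap that closes a session; the paper does not say how
# sessions were delimited.
SESSION_TIMEOUT = timedelta(minutes=30)


def count_sessions(timestamps):
    """Frequency: number of sessions, starting a new one whenever the gap
    between consecutive clicks exceeds SESSION_TIMEOUT."""
    ordered = sorted(timestamps)
    if not ordered:
        return 0
    sessions = 1
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > SESSION_TIMEOUT:
            sessions += 1
    return sessions


def shannon_entropy(items):
    """Variety: Shannon entropy, in bits, of the empirical distribution of
    clicked items (here, URLs) over a user's lifetime clickstream."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical log for one user: (timestamp, URL) pairs.
t0 = datetime(2009, 9, 1, 8, 0)
clicks = [
    (t0, "/ies/unit1"),
    (t0 + timedelta(minutes=5), "/interactive/volcano"),
    (t0 + timedelta(hours=3), "/ies/unit1"),
    (t0 + timedelta(hours=3, minutes=2), "/mystuff"),
]
frequency = count_sessions([t for t, _ in clicks])
variety = shannon_entropy([url for _, url in clicks])
print(frequency, variety)  # 2 1.5
```

The three-hour gap splits the log into two sessions, and the click distribution (1/2, 1/4, 1/4) gives 1.5 bits of entropy.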

In Maull et al. [21], initial entropy computations were made based on a simple, unmodified Shannon entropy calculation. To summarize the first computations from that work, a clickstream is modeled in a robust, domain-independent, and computationally trivial way, using entropy as a model for variety. Since users generate a path of click interactions through a site, applying entropy models allows for a coarse-grained measurement of the predictability of clicks within a site. This intuition is then extended to the notion of variety: highly predictable, low-entropy models imply low variety, whereas highly unpredictable, high-entropy models imply high variety. The result of this coarse-grained step is a projection of frequency and variety patterns onto use diffusion quadrants, where each user is binned into a quadrant.
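A minimal sketch of the binning into use diffusion quadrants, assuming each user has already been reduced to a (frequency, variety) pair. Splitting each dimension at its population mean is an assumption made for illustration, as are the user ids and values; the quadrant names follow the Shih and Venkatesh model, where high frequency with low variety is specialized use and low frequency with high variety is non-specialized use.

```python
from statistics import mean


def assign_quadrants(users):
    """Map user id -> use diffusion quadrant.

    `users` maps user id -> (frequency, variety). Each dimension is split at
    its population mean (an assumption); values equal to the mean count as high.
    """
    freq_cut = mean(f for f, _ in users.values())
    var_cut = mean(v for _, v in users.values())
    labels = {}
    for uid, (f, v) in users.items():
        high_f, high_v = f >= freq_cut, v >= var_cut
        if high_f and high_v:
            labels[uid] = "intense use"
        elif high_f:
            labels[uid] = "specialized use"
        elif high_v:
            labels[uid] = "non-specialized use"
        else:
            labels[uid] = "limited use"
    return labels


# Hypothetical (sessions, clickstream entropy) pairs for four teachers.
users = {
    "t1": (40, 3.1),
    "t2": (35, 0.9),
    "t3": (4, 2.8),
    "t4": (2, 0.5),
}
quadrants = assign_quadrants(users)
print(quadrants)
```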

While these calculations yielded interesting overall results, they were not reliable predictors of specialized use. To experiment with an entropy-based model further, an extension to the entropy calculation was developed. One of the primary disadvantages of the entropy calculation used previously is that it makes little distinction between the breadth and depth of the URLs and clicks used in the calculation. For example, a large variety of URLs and clicks that broadly explore a site is no different from a large variety that deeply explores a site. This is actually not surprising, since Shannon entropy depends only on the distribution of clicks and not on where in the site those clicks fall; the concern it presents in terms of mapping entropy calculations onto the variety model of use diffusion, however, is that there are too few users in the high variety, low-frequency quadrant. This is particularly pronounced since variety and time used in the system tend to go together; that is, the longer one uses a system, the more likely the variety of use of the system increases.
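The breadth-versus-depth limitation shows up in a small worked example: Shannon entropy sees only the relative frequencies of the clicked items, not where those items sit in the site. The URLs below are hypothetical; one user spreads four clicks across four different components, the other spreads four clicks across four pages of a single component.

```python
import math
from collections import Counter


def shannon_entropy(items):
    """Shannon entropy, in bits, of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical URLs: broad exploration touches four different components...
broad = ["/ies/unit1", "/interactive/volcano", "/mystuff", "/sharedstuff"]
# ...while deep exploration stays inside a single component.
deep = ["/ies/unit1", "/ies/unit2", "/ies/unit3", "/ies/unit4"]

print(shannon_entropy(broad), shannon_entropy(deep))  # 2.0 2.0
```

Both clickstreams are uniform over four distinct items, so both score exactly 2 bits even though they describe very different kinds of use.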

To approach this problem experimentally, we revise the entropy calculations by bifurcating clicks into two categories. Since our objective is to increase the number of users in the lower quadrant representing specialized use, we want to account for two kinds of variety: the entropy of clicks related to the interactive resources components of the system (e.g., interactive resources), and the entropy of clicks related to the use of publisher materials (e.g., PDFs). Without getting lost in the specific nuances of the application under examination, such bifurcation should be possible generically, without affecting the goal of our initial step of generalizing variety so that it is not overly specific to any one particular application. It may also be possible to extend this method beyond two partitions to n partitions, though such extensions are beyond the scope of this paper.

Let us consider the set of clicks $C_a = \{a_1, \ldots, a_n\}$, where $a_1, \ldots, a_n$ are clicks in the first category of application clicks, and the set of clicks $C_b = \{b_1, \ldots, b_m\}$, where $b_1, \ldots, b_m$ are clicks in the second category of the application. For generalization, it is left to the specification of the application to determine how these categories are defined. In the case of our application, we chose to split the application along the interactive resources and publisher components of the system.

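The new entropy calculation now considers the balance of clicks within each category of the application, so that this balance $B$ for a user $u_i$ is computed by

$$B_{u_i} = 1 - \frac{\left|\,|C^{u_i}_a| - |C^{u_i}_b|\,\right|}{|C^{u_i}_a| + |C^{u_i}_b|}$$

where $|C^{u_i}_a|$ and $|C^{u_i}_b|$ are the numbers of clicks user $u_i$ made in each of the two categories.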
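The bifurcation and balance calculation can be sketched as follows, assuming clicks are represented by URLs and that a hypothetical predicate, `is_interactive`, assigns each click to category a (interactive resources) or category b (publisher materials). The paper leaves the category split to the application; the URL prefix used here is purely illustrative.

```python
def split_clicks(clicks, in_category_a):
    """Partition a clickstream into C_a and C_b using an application-specific
    predicate; every click not in category a goes to category b."""
    c_a = [c for c in clicks if in_category_a(c)]
    c_b = [c for c in clicks if not in_category_a(c)]
    return c_a, c_b


def balance(clicks, in_category_a):
    """B_{u_i} = 1 - | |C_a| - |C_b| | / (|C_a| + |C_b|), a value in [0, 1]."""
    c_a, c_b = split_clicks(clicks, in_category_a)
    total = len(c_a) + len(c_b)
    if total == 0:
        raise ValueError("balance is undefined for an empty clickstream")
    return 1 - abs(len(c_a) - len(c_b)) / total


def is_interactive(url):
    # Hypothetical URL scheme: interactive-resource clicks form category a;
    # publisher-material clicks (e.g. PDFs) fall into category b.
    return url.startswith("/interactive/")


balanced = ["/interactive/a", "/interactive/b", "/ies/u1.pdf", "/ies/u2.pdf"]
skewed = ["/interactive/a", "/ies/u1.pdf", "/ies/u2.pdf", "/ies/u3.pdf"]
print(balance(balanced, is_interactive))  # 1.0
print(balance(skewed, is_interactive))    # 0.5
```

An even 2:2 split leaves the balance at 1, while a 1:3 split halves it.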

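This new balance calculation thus penalizes large imbalances and allows more balanced ratios to go largely unaffected. Recall the entropy calculation $H$ from [21]:

$$H(S^{u_i}_{\alpha_k}) = -\sum_{i=1}^{n} p_{\alpha_k i} \log_2 p_{\alpha_k i}$$

where $\alpha^{u_i}_k$ is the clickstream $\alpha_k$ for a user $u_i$. Our new entropy calculation $H_B$ is given by

$$H_B(S^{u_i}_{\alpha_k}) = H(S^{u_i}_{\alpha_k}) + \log_2(B_{u_i})$$

Because $0 \le B_{u_i} \le 1$, the term $\log_2(B_{u_i})$ is never positive: a perfectly balanced user ($B_{u_i} = 1$) keeps the original entropy, a user whose clicks split 3:1 between the two categories ($B_{u_i} = 0.5$) loses one bit, and the penalty grows without bound as all clicks concentrate in a single category.

Fig. 3 Updated and re-calculated use diffusion pattern showing new frequency and variety calculations

Fig. 4 Step 2 overview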
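Putting the two pieces together, a minimal sketch of the penalized entropy $H_B$ might look like the following. The helper names and URL scheme are hypothetical, and treating a user whose clicks all fall in one category ($B_{u_i} = 0$, where $\log_2 B_{u_i}$ diverges) as $-\infty$ is an assumption; the paper does not say how that case is handled.

```python
import math
from collections import Counter


def shannon_entropy(items):
    """H: Shannon entropy, in bits, of the empirical click distribution."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def balance(items, in_category_a):
    """B = 1 - | |C_a| - |C_b| | / (|C_a| + |C_b|)."""
    if not items:
        raise ValueError("balance is undefined for an empty clickstream")
    n_a = sum(1 for item in items if in_category_a(item))
    n_b = len(items) - n_a
    return 1 - abs(n_a - n_b) / (n_a + n_b)


def penalized_entropy(items, in_category_a):
    """H_B = H + log2(B); since B <= 1, the log term is never positive."""
    b = balance(items, in_category_a)
    if b == 0:
        # All clicks are in one category and log2(0) diverges; returning
        # -inf is an assumption, not something the paper specifies.
        return -math.inf
    return shannon_entropy(items) + math.log2(b)


def is_interactive(url):
    # Hypothetical URL scheme for the interactive-resources component.
    return url.startswith("/interactive/")


balanced = ["/interactive/a", "/interactive/b", "/ies/u1.pdf", "/ies/u2.pdf"]
skewed = ["/interactive/a", "/ies/u1.pdf", "/ies/u2.pdf", "/ies/u3.pdf"]
one_sided = ["/ies/u1.pdf", "/ies/u2.pdf"]

print(penalized_entropy(balanced, is_interactive))   # 2.0 (B = 1, no penalty)
print(penalized_entropy(skewed, is_interactive))     # 1.0 (H = 2.0, B = 0.5)
print(penalized_entropy(one_sided, is_interactive))  # -inf
```

Both four-click streams have the same unpenalized entropy of 2 bits; only the imbalance between categories separates them.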

Figure 3 shows the results of the new calculations. When the new values are plotted against the means of the original data (Fig. 6), more users now fall into the specialized category, indicating that the penalty calculation did indeed improve the separation of users. While this new calculation is not nearly as specific as the clustering done in the next step, it is a step toward re-balancing the diffusion pattern so that it more robustly accounts for the breadth and depth of use discussed previously.

3.2 Step 2: User typology modeling (Fig. 4)

While use diffusion patterns provide domain-independent quadrants of generalized usage behavior, to understand fine-grained user behavior we apply data mining algorithms, specifically clustering. Having discussed the challenges of the variety variable in step 1 above, in the second step features were selected from the clickstream data that expand variety and frequency in order to discover more detailed views of user behavior. Since clickstream data provide a good basis for computing variety through entropy, selecting features that model variety in more detail, such as the usage of specific system components, yields a higher-fidelity view of user behavior. These new refinements and the application of clustering expand the coarse-grained use diffusion patterns into a fine-grained user typology that continues to model frequency and variety.

Fig. 5 The CCS offers four major capabilities: access to publisher material (IES Investigations tab), access to digital library resources (Interactive Resources tab), personalization capabilities (My Stuff tab), and community features (Shared Stuff tab). The Interactive Resources component is opened above, showing the top recommended digital resources.

The first aspect of refinement requires that we expand frequency and variety by choosing more detailed features to model those variables. In the case of this study, frequency is expanded to include both sessions and total time spent within the system. Variety is expanded to include eight features that capture varying usage across system components, including publisher materials, digital library components, and user-contributed/social features of the system. While there are a number of clustering algorithms to choose from for this second step, these experiments are based on the widely used model-based expectation maximization (EM) algorithm [8]. EM works iteratively: it estimates, for each object instance to be clustered, the probability that the instance belongs to each cluster, and then re-estimates the parameters of each cluster’s probability distribution to best explain those memberships. After many iterations, the model settles into a set of clusters that are represented by the model parameters derived for each cluster. The resulting output is a set of clusters comprising a fine-grained user typology. The EM algorithm is used in these experiments because it is fast, robust, and typically converges quickly. Furthermore, cluster shapes (e.g., circles, ellipsoids) may vary, allowing more flexible cluster membership.
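To make the clustering step concrete, here is a minimal NumPy sketch of EM for a Gaussian mixture with diagonal covariances, one common form of model-based EM clustering. It is not the authors’ implementation: the paper does not state which EM implementation or covariance model was used, and the farthest-point initialization, the fixed number of clusters `k`, the 10-column layout (2 frequency plus 8 variety features), the z-score scaling, and the synthetic data are all assumptions. Diagonal covariances let each cluster take an axis-aligned ellipsoidal shape, one way to get the flexible cluster shapes mentioned above.

```python
import numpy as np


def farthest_point_init(X, k, rng):
    """Choose k starting means: one random row, then repeatedly the row
    farthest (squared Euclidean distance) from every mean chosen so far."""
    idx = [int(rng.integers(len(X)))]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    return X[idx].copy()


def em_gmm(X, k, n_iter=100, seed=0, eps=1e-6):
    """EM for a k-component Gaussian mixture with diagonal covariances.

    Returns hard labels (most probable component per row) and the n x k
    matrix of membership probabilities (responsibilities).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = farthest_point_init(X, k, rng)
    variances = np.tile(X.var(axis=0) + eps, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log of weight * Gaussian density for every (row, component).
        diff = X[:, None, :] - means[None, :, :]
        log_p = (
            np.log(weights)
            - 0.5 * np.log(2 * np.pi * variances).sum(axis=1)
            - 0.5 * (diff**2 / variances[None, :, :]).sum(axis=2)
        )
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means and per-feature variances.
        nk = resp.sum(axis=0) + eps
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff**2) / nk[:, None] + eps
    return resp.argmax(axis=1), resp


# Synthetic data: 10 features per teacher (2 frequency + 8 variety), two groups.
data_rng = np.random.default_rng(42)
X = np.vstack([data_rng.normal(0.0, 1.0, (20, 10)),
               data_rng.normal(8.0, 1.0, (20, 10))])
Z = (X - X.mean(axis=0)) / X.std(axis=0)  # put features on a common scale
labels, resp = em_gmm(Z, k=2)
```

Here `k` is simply fixed; in practice the number of clusters would need to be chosen, for example with cross-validated log-likelihood or an information criterion such as BIC.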

4 Case study

For the remainder of this paper, we focus on the case study that examines the use of the CCS. The CCS is a National Science Foundation-funded program overseen by Digital Learning Sciences (DLS), a joint institute of the University of Colorado at Boulder and the University Corporation for Atmospheric Research. DLS began development of the CCS in early 2008, and in July 2009 the CCS was made available to all Earth science teachers in a large urban school district in the Midwest. Over 100 teachers were trained on the CCS for use in the 2009–2010 school year.

Fig. 6 Use diffusion pattern comparison for step 1 from [21] and the new entropy with penalty calculation, showing frequency and variety of the data source

The CCS provides four major features to the end user (see Fig. 5). First, it provides users with Web-based access to digital versions of the paper-based student textbooks, teacher manuals, and curriculum guides that comprise the Earth science curricula for both Grade 6 and Grade 9. The manuals and guides outline the state standards that must be met, explain how the various units in the Earth science curricula are connected to state standards, and provide additional supplementary materials for teacher use, such as activities, teaching tips, and student assessments. These materials are all grouped under a single user interface component and are organized by key concept, which allows teachers to organize their lessons in a manner that flexibly meets the learning needs of their students. The digital versions of these curricular materials are identical to what was already available to teachers in paper form, but can now be accessed from any computer with a Web connection.

Second, the CCS integrates digitized publisher content with interactive resources available from the Digital Library for Earth System Education (DLESE), a collection of Earth science-related digital resources that are part of the National Science Digital Library. By clicking on the Interactive Resources tab, a user can see recommendations for animations, video clips, classroom activities, and other digital resources that pertain to the given key concept. The interactive resources available via the CCS have been vetted by the experts who manage the DLESE collection. Moreover, these resources are filtered by the system to ensure that they align with the Earth science curricula. Thus, when a teacher accesses a DLESE resource on, for example, volcanoes, the resource not only has been determined to have educational value by a subject matter expert, but it has also been tied to the specific science concepts that must be taught, as well as the science standards that must be met.

The third major feature of the CCS is an interactive Web 2.0 capability whereby teachers can save digital resources recommended to them via the Interactive Resources component, or upload their own resources, to an area called My Stuff, thus storing teacher-developed materials in the same space as interactive resources from DLESE or from the curriculum for easy access. Once a resource is saved to My Stuff, teachers have the option to share a copy of the resource to an area of the CCS called Shared Stuff, which is accessible to any CCS user who clicks on the Shared Stuff component associated with a given key concept.

The final major feature enables teachers to share materials with their peers. When a digital resource is added to Shared Stuff, the teacher who originally uploaded the resource, as well as other CCS users, can add searchable tags or keywords, so that any search of the CCS system for those tags will list all resources tagged with the same keyword or phrase. Finally, CCS users can add “star ratings” to resources so that other users can determine how their colleagues rate a given resource. A resource that many users rate highly (four or five stars) might be more likely to capture the attention of system users than a resource with a low rating. Hew and Hara [15] argue that this kind of sharing may be a catalyst for enabling improvement of practice, because such knowledge sharing tends to be tied to shared, situated instructional goals and challenges and is thus more likely to be relevant to a teacher’s immediate, short-term needs. A more detailed description of the CCS, including results from the field trial, can be found in Sumner et al. [34].

Table 1 Summary statistics for the data plotted in Fig. 6 (N = 98)

Statistic    Entropy    Lifetime sessions
Min.         0.48063      1.00
1st Qu.      3.559        7.00
Median       4.834       23.00
Mean         4.535       37.85
3rd Qu.      5.830       48.00
Max.         7.336      171.00
Std. Dev.    1.56        42.201

4.1 Data source and step 1

The data source for this study was clickstream data from 98 users over 9 months of interaction with the CCS. The data for these experiments were particularly interesting because the user interface environment contains many dynamic client-side components whose use would not ordinarily be captured in a traditional web server log. The system was therefore instrumented to extract additional information from clicks on these dynamic client-side interface components. The CCS clickstream log thus includes detailed tracking of user activity that provides rich user interaction data. Since we were able to capture these data over a relatively long period of time, we were able to analyze actual user behavior as the users worked with the system in a natural, unhindered manner. Furthermore, we were able to examine how publisher materials, which are already core to standard teacher practice, are complemented by digital library resources, giving us a broader view of how teachers integrate these digital library resources into their traditional practices.

As a result of step 1, we obtained initial frequency and variety computations. Table 1 shows the descriptive statistics resulting from the first step of our experimental method. The data show that the mean entropy of our population is 4.48 and the mean frequency, here measured as the lifetime sessions that a user logged, is 37.85.
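To illustrate what a frequency and an entropy-based variety measure look like for a single user, the sketch below counts distinct sessions and computes a plain Shannon entropy (in bits) over the share of that user’s clicks falling in each system component. This is an assumption-laden illustration: it does not reproduce the penalized entropy calculation shown in Fig. 6, and the `(session_id, component)` click format and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def frequency_and_variety(clicks):
    """clicks: list of (session_id, component) pairs for one user (hypothetical format).

    Frequency is the number of distinct sessions; variety is the entropy of
    the user's click distribution across system components.
    """
    sessions = {sid for sid, _ in clicks}
    component_counts = Counter(comp for _, comp in clicks)
    return len(sessions), shannon_entropy(list(component_counts.values()))
```

A user whose clicks are spread evenly over four components scores 2 bits of variety, while a user who only ever touches one component scores 0, regardless of how many sessions either logs.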

When we apply the use diffusion model to the data by dividing it into quadrants at the means of each axis, we obtain a use diffusion pattern that shows a larger (n = 40) limited use group than the previous entropy calculations produced (n = 38). Similarly, there are fewer (n = 26) intense users (entropy > 4.54, frequency > 37.85), that is, users who exhibit both greater variety and greater frequency. This is a 10% decrease from the initial computation (n = 30). The plot also shows that there are now five users in the specialized category (n = 5), where previously there was only one.

Table 2 CCS cluster experiment features and descriptions

     Feature label                   Description
1    Sessions                        Total lifetime sessions
2    Hours                           Total system hours
3    IR Activity                     Total activity within interactive resources
4    IR Saving                       Total interactive resource saving behavior
5    Shared Stuff Activity           Total user-contributed content activity
6    Shared Stuff Saving             Total user-contributed content saves
7    My Stuff Activity               Total “My Stuff” activity
8    My Stuff Saving                 Total “My Stuff” saving behavior
9    Publisher–Teacher Materials     Total activity within publisher–teacher materials
10   Publisher–Student Materials     Total activity within publisher–student materials

The key improvement of the new calculations over the initial calculation is that they spread the data out considerably; this change can be seen both in the standard deviation (from 1.25 up to 1.56) and in the visual spread in the graph. It is encouraging that higher entropy now incurs larger penalties, but there are still generalizable changes that could be made.

As can be seen in Fig. 6, the value of use diffusion pattern modeling is that it provides a comprehensive overview of coarse-grained behavior within the system. The dotted line divides the graph at the means of each axis, thus creating the four use diffusion quadrants, and we can quickly see the distribution of basic usage patterns within the system. At the same time, however, the pattern does not produce enough information to determine the specific details of use. To do this, we must turn to the second step of the method.
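The quadrant assignment described above, which splits the frequency and variety axes at their means, can be sketched as follows. The quadrant names follow the use diffusion categories used in this paper (intense, specialized, non-specialized, limited); the input format and the choice to treat a value exactly at the mean as “low” are our own assumptions.

```python
from statistics import mean


def use_diffusion_quadrants(users):
    """users: dict mapping user id -> (frequency, variety).

    Each axis is split at its mean:
      high frequency, high variety -> intense
      high frequency, low variety  -> specialized
      low frequency, high variety  -> non-specialized
      low frequency, low variety   -> limited
    Values exactly at the mean are treated as low (an assumption).
    """
    f_mean = mean(f for f, _ in users.values())
    v_mean = mean(v for _, v in users.values())
    labels = {}
    for uid, (f, v) in users.items():
        high_f, high_v = f > f_mean, v > v_mean
        if high_f and high_v:
            labels[uid] = "intense"
        elif high_f:
            labels[uid] = "specialized"
        elif high_v:
            labels[uid] = "non-specialized"
        else:
            labels[uid] = "limited"
    return labels
```

Because the cut points are the population means, the same user can move between quadrants when the variety measure changes, which is why the revised entropy calculation shifts the quadrant counts reported above.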

4.2 Feature selection and step 2

We began the second step by choosing 10 features of our data for further analysis. The features for this step of the method represent four major functions of the system: (1) use of digital library-related system functions, (2) use of traditional publisher materials and related system functions, (3) system functions that involve personalization, and (4) user-contributed functions. Table 2 summarizes the system features used in the second step of our proposed method.
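To make the feature definitions in Table 2 concrete, the sketch below aggregates a click log into a 10-element feature vector per user. The event tuple layout, the component and action strings, and the `FEATURE_MAP` dictionary are all hypothetical, and approximating hours as the sum of first-to-last-click spans per session is our own simplification; the actual CCS instrumentation is not described at this level of detail.

```python
from collections import defaultdict

# Hypothetical mapping from (component, action) in a click log to a Table 2 feature label.
FEATURE_MAP = {
    ("interactive_resources", "view"): "IR Activity",
    ("interactive_resources", "save"): "IR Saving",
    ("shared_stuff", "view"): "Shared Stuff Activity",
    ("shared_stuff", "save"): "Shared Stuff Saving",
    ("my_stuff", "view"): "My Stuff Activity",
    ("my_stuff", "save"): "My Stuff Saving",
    ("publisher_teacher", "view"): "Publisher-Teacher Materials",
    ("publisher_student", "view"): "Publisher-Student Materials",
}

# Feature order follows Table 2.
FEATURE_ORDER = [
    "Sessions", "Hours", "IR Activity", "IR Saving",
    "Shared Stuff Activity", "Shared Stuff Saving",
    "My Stuff Activity", "My Stuff Saving",
    "Publisher-Teacher Materials", "Publisher-Student Materials",
]


def build_features(events):
    """events: iterable of (user, session_id, timestamp_seconds, component, action).

    Returns {user: {feature_label: value}}. Hours is approximated as the sum over
    sessions of (last click time - first click time).
    """
    counts = defaultdict(lambda: defaultdict(float))
    session_span = defaultdict(lambda: [float("inf"), float("-inf")])
    for user, sid, ts, component, action in events:
        span = session_span[(user, sid)]
        span[0], span[1] = min(span[0], ts), max(span[1], ts)
        label = FEATURE_MAP.get((component, action))
        if label is not None:
            counts[user][label] += 1
    for (user, _sid), (start, end) in session_span.items():
        counts[user]["Sessions"] += 1
        counts[user]["Hours"] += (end - start) / 3600.0
    return {user: dict(feats) for user, feats in counts.items()}


def to_vector(feats):
    """Fixed-order vector for clustering; features a user never touched become 0."""
    return [feats.get(name, 0.0) for name in FEATURE_ORDER]
```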

4.2.1 Digital library features

The CCS is specifically designed with the goal of providing access to high-quality digital library resources. Analyzing usage of the embedded digital library within the CCS should therefore provide useful data about typical teacher practices around digital resources. To capture this behavior, we selected features that detail the clickstream patterns of the embedded digital library resources within the system. These resources were presented in the user interface under four sub-categories: Top Picks, Animations, Images/Visuals, and Inquiry with Data. Each category contained resources from DLESE that were either selected as highly relevant to the subject materials and unit focus (Top Picks) or contained metadata of the appropriate type, scope, and topic (Animations, etc.). The items presented under each of these views were derived directly from DLESE web services [37] and appropriately presented to the user. Clickstreams into this component are tracked with the IR Activity and IR Saving features.

4.2.2 Publisher materials features

Publisher materials are included as a core component of the system functionality. The majority of the publisher’s items are digitized versions of paper-based materials, whether book chapters or supplemental materials such as hand-outs, assessments, etc. The features here are therefore convenient digital proxies for real-world, paper-based analogs. Clickstreams into this component of the system and its sub-components were tracked and organized with the Publisher–Student Materials and Publisher–Teacher Materials features. Publisher–Student Materials include publisher materials such as digital versions of the student textbook, while Publisher–Teacher Materials include supplemental publisher materials such as instructional support materials.

4.2.3 Personalization features

The CCS provides functionality that allows teachers to personalize the contents of their accounts; in particular, users are provided with the ability to save digital materials that they find of interest. Once saved, items may be retrieved for further review and may even be shared with others if desired. Saving is considered a personalization feature because it provides direct control over items that may fit a specific need (either at the time of saving or in the future). Furthermore, saving implies an interest in the saved item, and while that interest may last only a short time, it nonetheless acts as a marker of personalization behavior. Personalization behavior was captured from the saving behavior in the system through the My Stuff Saving feature. For example, teachers are able to save embedded digital library resources such as animations, images, visuals, and top picks for the units of study they may be interested in. The My Stuff Activity feature represents the total activity performed within the My Stuff features of the system.

Table 3 Experimental cluster results showing the parameters obtained for each feature and cluster, where N indicates the number of users in each cluster

Feature                        Cluster 1   Cluster 2   Cluster 3   Cluster 4   Cluster 5
Sessions                         4.934      47.096     119.551      26.131      83.897
Hours                           10.125      26.702      52.586       8.427      24.984
IR Activity                      2.483      74.253      95.537      26.801      17.109
IR Saving                        0.321       4.703      26.039       1.218       0.649
Shared Stuff Activity            3.382      25.753      81.159      17.558      82.174
Shared Stuff Saving              0.356       4.314      18.942       1.877       3.701
My Stuff Activity                0.123       4.864      19.116       1.106       0.860
My Stuff Saving                  0.741      18.623      11.492       0.388       4.913
Publisher–Student Material       1.421      21.715      98.143      14.212      74.404
Publisher–Teacher Material       4.781      28.790      86.070      14.649      92.306
N                                31          10           8          35          14

4.2.4 Community behavior features

There are features of the system that promote community-centric behaviors. For example, resources and other materials that users find interesting can be shared with the community at large in a kind of community pool of resources called “Shared Stuff.” This feature has many implications when considering the nature of communities of practice among K-12 educators, who are often encouraged to share materials, pedagogical strategies, and best practices amongst their peers. These community behaviors are captured with the Shared Stuff Activity and Shared Stuff Saving features.

4.3 A user typology

Our second step relies on feature clustering to develop a user typology. There are many clustering algorithms to choose from, but for this set of typology experiments we chose the model-based EM algorithm [8]. Elsewhere, we describe other experiments using other clustering algorithms and parameters [22]. Using the Bayesian Information Criterion to determine the optimal number of clusters for a given dataset, we determined that there were five clusters to be discovered in our experiments. The details of the clusters are shown in Table 3. It should be noted that the clusters presented here show representative members of each cluster and do not necessarily represent actual users; that is, for each cluster in the table, the values present the ideal parameters that fit the object instances belonging to the cluster.
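Choosing the number of clusters with the Bayesian Information Criterion can be sketched as below. We assume the common form BIC = p ln n − 2 ln L̂ (lower is better), where p counts the free parameters of a k-component, d-dimensional Gaussian mixture with diagonal covariances: (k − 1) mixing weights, k·d means, and k·d variances. This parameter count is our assumption, since the covariance structure used here is not stated; some packages (for example, R’s mclust) report the sign-flipped quantity and maximize it instead. The log-likelihoods in the test are made up purely to show the mechanics and are not results from this study.

```python
import math


def n_params_diag_gmm(k, d):
    """Free parameters of a k-component diagonal Gaussian mixture in d dimensions."""
    return (k - 1) + k * d + k * d


def bic(log_likelihood, k, d, n):
    """BIC = p * ln(n) - 2 * ln(L); lower values indicate a better fit/complexity trade-off."""
    return n_params_diag_gmm(k, d) * math.log(n) - 2.0 * log_likelihood


def select_k(loglik_by_k, d, n):
    """Pick the number of clusters with the lowest BIC from {k: fitted log-likelihood}."""
    return min(loglik_by_k, key=lambda k: bic(loglik_by_k[k], k, d, n))
```

With d = 10 features and n = 98 users, each additional diagonal component adds 21 parameters, so it must raise the log-likelihood by more than about 21 · ln(98) / 2 ≈ 48 to lower the BIC.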

As can be seen in Table 3, cluster 1 characterizes the low-use pattern (low variety, low frequency) of the step 1 use diffusion pattern. The users in this cluster have logged very few hours of total use within the system. Furthermore, they do not seem to be using the full range of system features. On the other hand, the experiments revealed an intense user cluster (cluster 3) that shows robust use of the system. Indeed, this “intense user” category seems to have used the system in full, exercising nearly every aspect of the system and logging both large numbers of sessions and significant hours of use. Two specialized user groups emerged from the typology. While both show about the same number of total hours within the system, cluster 5 identifies users who spend a great deal of time with the community features and publisher materials of the system, while cluster 2 shows users who spend far more time within the interactive resources and embedded digital library component of the system. These clusters are valuable to understand because they suggest that some users find the embedded digital resources as important as others find the community features. Finally, cluster 4 shows a group of users whose session count was above the median but less than that of the intense and specialized users. This cluster also exhibits broad use of most of the features, with slightly more use of interactive resources. Table 4 summarizes each cluster, giving its key characteristics and the diffusion pattern to which it belongs. Furthermore, we have created fictitious typology labels to provide an easy way to remember the cluster characteristics.


Table 4 An initial user typology derived from step 2, showing the diffusion pattern and characteristics of each usage type

Cluster  Diffusion pattern     Typology label                      Characteristics
1        Limited Use           “Uninterested Non-Adopter”          Very low overall system use
2        Specialized Use       “Interactive Resource Specialist”   Heavy use of interactive resources relative to other system features. Tends to access system weekly
3        Intense Use           “Ardent Power User”                 Heavy and robust overall use of all features. Tends to access system daily
4        Non-Specialized Use   “Moderate Generalist”               Moderate overall system use. Shows slightly more use of interactive resources than other system features. Tends to access system several times monthly
5        Specialized Use       “Community Seeker Specialist”       Makes heavy use of Shared Stuff features and Publisher materials relative to Interactive Resource Activity. Tends to use the system weekly

4.4 Validation of results with field research findings

It is important to note that the method that produced the user typology clusters is not simply a new kind of analysis; rather, it presents an opportunity to bridge different analytical approaches. Although the clusters that emerged from our computational approach are based on clickstream data, these clusters are not arbitrary aggregations of user behaviors. They map onto real-world usage patterns that emerged from the traditional educational research techniques used in our study of the CCS. Thus, the findings from our computational method are validated by the findings from our field research of CCS use “in the wild.”

The CCS field trial used a mixed-method research design [19410]. We collected quantitative data via surveys, longitudinally, from all DPS Earth science teachers (n = 124) during the 2009–2010 academic year. In addition, we collected qualitative data from a large subset of teachers during the same period. See Saldivar [29] for a complete discussion of the CCS field trial; here we summarize our data sources, sample sizes, and analysis techniques:

– Survey 1 (n = 85): A pre-survey administered at the beginning of the field trial to all Earth science teachers to establish a baseline of attitudes and behaviors regarding educational technology (including interactive digital resources), DPS Earth science curricular materials, and differentiated instruction. Data were collected via both quantitative (constrained-response) and qualitative (free-response) items. Quantitative survey items were analyzed by determining counts and frequencies of responses. Qualitative items were analyzed via content analysis. (Subsequent surveys also collected both quantitative and qualitative data and were analyzed in the same manner.)

– Survey 2 (n = 84): A pre-survey administered at the mid-point of the field trial to all Earth science teachers to establish a baseline of attitudes and behaviors regarding access to, and attitudes towards, instructional technology tools in general and the CCS in particular.

– Survey 3 (n = 81): A post-survey administered at the end of the field trial to all Earth science teachers. This final survey comprised a number of items included in the first two surveys to enable a pre- and post-analysis of changes, if any, in CCS users’ attitudes and behaviors during the course of the school year.

– Adoption interviews (n = 24): These semi-structured telephone interviews regarding teachers’ adoption and use of the CCS tool were administered to teachers who were regular-to-frequent CCS users and resulted in an extensive database of qualitative data (interview transcripts), which were analyzed via content analysis.

– Classroom observation cycles (n = 8): Teachers identified as frequent CCS users were each recruited to participate in several hours of interviews and classroom observations per teacher during the course of the school year, resulting in an extensive database of qualitative data (interview transcripts and field notes), which were analyzed via content analysis.
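The counts-and-frequencies analysis of constrained-response items can be illustrated with a small sketch that tallies responses to one Likert item and reports the share who agreed or strongly agreed, the statistic quoted for several survey items in this section. The five-point scale labels, the function name, and the sample responses in the test are hypothetical.

```python
from collections import Counter

# Assumed five-point agreement scale.
SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def summarize_item(responses):
    """Return (count per scale point, percent who agreed or strongly agreed).

    Responses outside SCALE (e.g., blanks) are ignored.
    """
    counts = Counter(responses)
    table = {label: counts.get(label, 0) for label in SCALE}
    n = sum(table.values())
    agree = table["agree"] + table["strongly agree"]
    return table, (100.0 * agree / n if n else 0.0)
```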

Analysis of these field trial data, conducted independently of the computational analysis of clickstream data described above, suggested that teachers in our study fell onto a spectrum of system use behavior, from low-frequency users (corresponding to cluster 1) to moderate (cluster 4) and heavy (cluster 3) users. The final two clusters fell on the moderate-to-heavy side of the spectrum. Rogers’ [27] theory of diffusion of innovation, discussed in our introductory section, predicts that the earliest users of a new technology (“innovators” and “early adopters”) comprise approximately 16% of all users. Moore [24], who revised and extended Rogers’ work, further argues that a technology cannot become “mainstream” within a given population until it is adopted by at least half of all potential users. Our step 2 analysis indicates that about two-thirds of teachers in the district adopted the system to a significant degree, with one-third of users (represented by clusters 2, 3, and 5) making heavy use of the system. Even if we confine ourselves to a theoretical framework such as Rogers’, which focuses on quantifying when different segments of a target population adopt an innovation, our findings indicate that the CCS has been strongly adopted and is heavily used by the target population.
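The proportions cited above can be checked directly against the cluster sizes in Table 3 (N = 31, 10, 8, 35, and 14 for clusters 1 through 5). The short computation below treats clusters 2, 3, 4, and 5 as having adopted the system to a significant degree and clusters 2, 3, and 5 as heavy users, as in the text, and compares the shares with the 16% and 50% figures attributed to Rogers and Moore.

```python
# Cluster sizes from Table 3 (clusters 1-5).
sizes = {1: 31, 2: 10, 3: 8, 4: 35, 5: 14}
total = sum(sizes.values())  # 98 users

significant = sum(sizes[c] for c in (2, 3, 4, 5)) / total  # adopted to a significant degree
heavy = sum(sizes[c] for c in (2, 3, 5)) / total            # heavy users

print(f"significant adopters: {significant:.1%}")  # prints "significant adopters: 68.4%"
print(f"heavy users: {heavy:.1%}")                 # prints "heavy users: 32.7%"
print("heavy share above 16%:", heavy > 0.16)
print("significant share above 50%:", significant > 0.50)
```

Both shares clear the respective thresholds by a wide margin, which is the sense in which “about two-thirds” and “one-third” are used above.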

It can be argued that identifying low, moderate, and intense user clusters is hardly a profound finding; after all, any set of system users can be divided into “low” and “heavy” user categories if the major variable of interest is frequency of system use. Recall, however, that use diffusion theory calls for an analysis that combines frequency of use with an evaluation of how a given technology is used. Looking through the lens of use diffusion, the most salient finding from our computational method is that two of the clusters, cluster 2 (the Interactive Resource Specialist) and cluster 5 (the Community Seeker Specialist), represent “variety of use” behaviors similar to user behaviors identified by our field research. This enables us to validate the findings of our computational method with real-world data, while at the same time giving us deep insights into how teachers integrated digital library resources into their instructional practices.

For example, Interactive Resource Specialists were moderately heavy users who spent most of their time in the Interactive Resources/Embedded Digital Library component of the CCS. Data from the final survey show that many teachers highly valued interactive resources. In response to the statement “Using interactive resources effectively is important to my personal success as a teacher,” about three-fourths of respondents agreed or strongly agreed. When asked about the statement “Using interactive resources in the classroom enables me to better meet the learning needs of students in my classroom,” almost 90% agreed or strongly agreed. Qualitative data provide insight into these survey findings.

In response to an interview question that asked him to discuss why he used the CCS, teacher Corey told us: “My teaching practices have always focused on student engagement. I found that the CCS made it easier for me to find [interactive] resources with which I could capture student attention.”

Many teachers reported that they accessed digital resources to supplement textbook material. Overwhelmingly, the most popular kinds of digital resources teachers found via the CCS were graphic representations (e.g., pictures, diagrams, animations) of Earth science phenomena. In the following example, note Carlie’s observations regarding students’ engagement with, and understanding of, key concepts: “I think that since students are visual learners, they’re hands-on learners, they’re growing up in this technological age. Seeing the animations… really drives home the idea or the topic. I ask them if it does help them [to see graphic representations], if whatever they just viewed made the material make more sense, and often times they [say], ‘Wow. That totally made sense. Can we see it again?’ I think they benefit from it a lot, and they’re vocal about it. They love it.”

Liz’s experience was similar to Carlie’s in that her students seemed excited simply by the presence of digital resources, which led to increased engagement and deeper comprehension. Liz told us in an interview: “I think in general my students are really interested in seeing any kind of [digital resource]. [Digital resources are] really engaging for students because it’s different than what they normally see in the classroom. When they walk in, they’re just super-excited when they see the projector’s set up. [They say,] ‘Oh yes, we get to watch a video,’ or ‘We get to see a PowerPoint.’ They come in automatically excited to learn.”

In contrast, Community Seeker Specialists had CCS usage frequencies comparable to those of Interactive Resource Specialists but spent most of their time in the Shared Stuff and Publisher components of the system. Publisher materials had utility for teachers because they were directly related to the Earth science curricula used by the district, but why were Shared Stuff materials popular? Survey data suggest that the ability to see other teachers’ uploaded materials gave CCS users insights into the community of Earth science educators in their district that were not possible before the system was made available. For instance, in the final survey, almost 61% of respondents agreed or strongly agreed with the statement “The CCS has increased my awareness of other teachers’ practices.” Approximately half of respondents agreed or strongly agreed with the statement “The CCS has helped me become a more active member of the DPS professional learning community.” Qualitative data provide a context for these survey responses.

In response to an open-ended survey question that asked respondents to explain the value of accessing Shared Stuff, teacher Sheila stated: “Looking at the Shared Stuff uploaded by others gives me ideas about how I can present particular concepts in my [own] classroom.” In response to the same question, Norma commented: “[Resources in Shared Stuff] have given me different perspectives on the different topics [in the curriculum] and thus enabled me to teach more effectively.”

Henrietta told us in an interview: “When [another Earth science] teacher from across town might have found a great website or a video clip or something that really does bring the point home and makes it more relevant to the students, that’s what I’m looking for, and so far I have found some things like that on [the CCS].” Later in the same interview, Henrietta added that she and the other Earth science teachers at her school, all CCS users, often discussed useful resources that they had found using the CCS: “We certainly confer on [resources we discover]. [We tell each other,] ‘This was great’ [and] ‘This is something you need to [use] when you get to this [certain] point in your book.’”

5 Discussion

Step 1 of our method shows promise for the rapid determination of application-independent, large-grain usage patterns, while Step 2 produced a meaningful typology based on clusters that emerged from refined, application-dependent feature data. The computational generation of clusters that correspond with user behaviors identified by qualitative analysis of interview and survey data suggests that it is possible to build an understanding of user behaviors and motivations by quantitative examination of their behaviors within a system or application. This is only a first step, however, toward understanding the complex interplay between observable behaviors that occur within an application and behaviors that occur outside of that system.

The CCS’s main purpose was to encourage teachers to differentiate instruction by incorporating high-quality digital resources into their teaching practices. Thus, the system was designed to produce a specific outcome: teacher differentiation of instruction. Our field trial data validated the in-system behaviors characterized, for example, by the Interactive Resource Specialist and Community Seeker Specialist clusters. However, we also know from field trial data that teachers with similar in-system behaviors may have very different teaching practices in the classroom. It is also possible that teachers with similar classroom practices may exhibit different in-system behaviors. Ideally, then, the most complete understanding of a system’s impact on users would come from a method that takes into account in-system and out-of-system behaviors, motivations, and characteristics.

As previously mentioned in Sect. 2.2, existing literature on digital library use offers some guidance. Researchers in the field of human–computer interaction have long used the concept of personas to guide the development of information systems. Personas are constructs devised by system developers to describe different types of system users; they most often take the form of fictional biographies that include demographic characteristics and descriptions of how and why those users interact with the system. With personas, developers can begin to model and build systems that more specifically address the needs of actual users. This increases the chances that the system will better serve the needs of its end users and, in turn, enhances adoption of the system. Our method describes in-system behaviors, but by extending our approach to incorporate a full set of out-of-system behaviors (including, for example, demographic information, teaching behaviors, and student outcomes), that is, by developing personas of CCS users, we can not only develop a richer understanding of the relationship between system use and users’ practices “in the wild” but also extend the notion of personas beyond system design to actual system adoption and use over time.

While we have obtained promising results from this study, there are several significant limitations. First, our method has only been applied to a single population. It is plausible that the results we obtained are specific to the population under examination; a different typology may emerge from larger data populations, for example, new specialized user types. Second, to validate that our results are robust and generalizable, it will be important to study the method with different applications and user populations. Third, while the computational method seems to support the analyses of qualitative data, there are opportunities to extend our understanding of system behavior and user characteristics to their impact on other meaningful behaviors outside the system.

As can be seen in the case study, both steps of the method yielded interesting results. It appears, however, that the entropy modeling of the first step requires more discriminatory power when describing specialized users. While the new calculations do a better job of penalizing usage, future modifications may still be developed to separate the specialized group(s) further. Furthermore, work remains to develop a precise coarse-grained model that better separates every user group. The challenges revealed by the first step could also indicate that specialization does not always imply low variety; this becomes clear when considering the details of system usage once variety is expanded to include more variables, as in the extended entropy calculations and the second step of the method.

Applying automated typology discovery to inform, guide, and develop system user characteristics is one aspect of understanding within-system user behavior that may be extended, through personas, to include broader behaviors and user characteristics. Such an extension may prove to be a useful tool for understanding behavioral characteristics outside the system, so that an extremely rich description of system users emerges that includes both in-system and out-of-system behaviors. This would, in turn, allow more data to be used in making predictive and prescriptive tool (in-system), policy (out-of-system), and training (in-system and out-of-system) decisions.

6 Broader impacts

While this research has focused on demonstrating the value of our method for the study of digital library adoption, we believe it can help address other educational challenges, chief among them teacher professional development, the correlation of teacher practice to student learning, and the evaluation of teaching practices.

Extant research indicates that one of the major barriers to effective integration of technology into teaching practices is the dearth of quality professional development and training vis-à-vis technology [14]. Teachers often complain that, even when technology-related training is made available, it does not meet their needs as practicing educators because it assumes that they do or do not already possess a given set of skills. By modeling teachers’ system use behavior, one will be able to understand inter-user differences and target system training and professional development to users’ true needs. For example, CCS users in the Community Seeker Specialist cluster might be presented with training that would help them integrate more interactive resources into their teaching, because such users are known not to spend much of their CCS usage time exploring interactive resources. For teachers in the Interactive Resource Specialist cluster, such training would be redundant because they already make extensive use of the interactive resources available via the CCS.

While a consensus has emerged among policy makers and education researchers that effective adoption of technology can indeed have a positive impact on teacher practice [1732], the relationship between teachers’ adoption of technological systems and student achievement is still not well understood. As part of our larger study of the CCS, we are analyzing the standardized test scores of students taught by teachers who used the CCS to determine what impact the teachers’ use of the CCS had on student outcomes. Our analysis of student test score data for both our field trial year and the prior school year [28] shows increased overall achievement among Earth science students after the introduction of the CCS; due to the limitations of the data available to us, however, we cannot make strong causal claims. We continue our efforts to understand what link, if any, exists between system use and student outcomes. In the long term, the ability to correlate system use behaviors with students’ academic achievement will make it possible to focus on the teacher use behaviors that most benefit students; in turn, these behaviors could be taught to other teachers.

Evaluating teachers' instructional practices is a very difficult task that requires a massive commitment of human and financial resources [5]. For example, most evaluation systems rely on administrators to observe teachers in the classroom, a very labor-intensive approach that can result in teachers being awarded or denied tenure or pay raises based on the short period of time during which they were observed. Further, variance in evaluators' adherence to evaluation rubrics and bias in the evaluation instruments themselves can make it difficult to assess the validity of the evaluation process [38]. Alternatively, asking teachers to self-report their teaching practices might provide evaluators with additional information, but such data would be subject to all the limitations that come with asking individuals to report on their own behaviors. Neutral "third party" data, such as the clickstream data to which we applied our computational method, can help bridge the gap between what evaluators observe during the relatively brief periods they visit a teacher's classroom and the teacher's activities when he or she is not being formally evaluated. A teacher evaluation process that combines traditional observational and self-reported data with usage data produced by systems like the CCS would give administrators, policymakers, and teachers themselves deeper insights into instructional practices; further, such a hybrid system would produce a rich new understanding of teaching-related best practices that could then be shared with other educators.

7 Conclusion

By applying models of adoption and use diffusion alongside data mining techniques, we have developed a two-step method for discovering patterns within clickstream data that reveal both general and specific typologies of digital library user behavior. The application of this method to data from an academic year-long field trial in a large urban school district provided a rich opportunity to study the adoption and usage behavior of embedded digital library resources by middle and high school science teachers. The method shows considerable promise for extracting useful behavioral patterns "in the wild": the resulting fine-grained user typology maps well onto results emerging from traditional educational field research methods. The proposed method, while requiring more data for further validation, is a valuable contribution towards our effort to develop automatic methods for studying digital library adoption and use. When fully realized, this method also has the potential to be extended to other applications and areas, such as informing teacher professional development, understanding the impact of digital library applications on student learning, and developing new approaches to the evaluation of teacher practice.

Acknowledgments This material is based upon work supported in part by the NSDL program in the National Science Foundation under Awards 0734875 and 0840744, and by the University of Colorado at Boulder.

References

1. Ayers, E., Nugent, R., Dean, N.: Skill Set Profile Clustering Based on Student Capability Vectors Computed From Online Tutoring Data. In: Proceedings of the 1st International Conference on Educational Data Mining, Montreal, Canada, pp. 210–217 (2008)

2. Benevenuto, F., Rodrigues, T., Cha, M., Almeida, V.: Characterizing user behavior in online social networks. In: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement Conference, ACM, Chicago, Illinois, USA, pp. 49–62 (2009). doi:10.1145/1644893.1644900

3. Brandtzæg, P.B.: Towards a unified Media-User typology (MUT): a meta-analysis and review of the research literature on media-user typologies. Comput. Hum. Behav. 26(5), 940–956 (2010). doi:10.1016/j.chb.2010.02.008. http://www.sciencedirect.com/science/article/B6VDC-4YJSW8D-1/2/011453cc70c0a6bdc29d3fde3b8a9304

4. Creswell, J.: Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research. Pearson Education, Upper Saddle River, NJ (2008)

5. Danielson, C., McGreal, T.L.: Teacher Evaluation To Enhance Professional Practice. Association for Supervision and Curriculum Development, Alexandria, VA, USA (2000)


64 K E Maull et al

6. Davis, F.D.: Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quart. 13(3), 319–340 (1989)

7. Deffuant, G., Huet, S., Amblard, F.: An individual-based model of innovation diffusion mixing social value and individual benefit. Am. J. Sociol. 110(4), 1041–1069 (2005)

8. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodological) 39(1), 1–38 (1977)

9. Dominguez, A.K., Yacef, K., Curran, J.R.: Data mining for generating hints in a python tutor. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 91–100 (2010)

10. Dzurec, L., Abraham, I.: The nature of inquiry: linking quantitative and qualitative research. Adv. Nurs. Sci. 16, 73–79 (1993)

11. Eynon, R., Malmberg, L.: A typology of young people's internet use: implications for education. Comput. Educ. 56(3), 585–595 (2011). doi:10.1016/j.compedu.2010.09.020. http://www.sciencedirect.com/science/article/B6VCJ-517J24P-1/2/5c5d02fb284c227d3a9ef1cc4f3e1bf6

12. Fuller, F.F.: Concerns of teachers: a developmental conceptualization. Am. Educ. Res. J. 6(2), 207–226 (1969)

13. Hall, G.E.: The concerns-based approach to facilitating change. Educ. Horizons 57(4), 202–208 (1979)

14. Hanson, K., Carlson, B.: Effective Access: Teachers' Use of Digital Resources in STEM Teaching. Gender, Diversity, and Technology Institute, Education Development Center, Inc., Newton (2005)

15. Hew, K.F., Hara, N.: Empirical study of motivators and barriers of teacher online knowledge sharing. Educ. Technol. Res. Dev. 55(6), 573–595 (2007)

16. Horrigan, J.: A typology of information and communication technology users. Research report, Pew Internet & American Life Project (2007)

17. Kelly, M.G., McAnear, A.: National Educational Technology Standards for Teachers: Preparing Teachers to Use Technology. International Society for Technology in Education (ISTE), Eugene (2002)

18. Lage, K., Maness, J., Losoff, B.: Receptivity to library involvement in scientific data curation: a case study at the University of Colorado Boulder. Portal Libr. Acad. 11(4), 915–937 (2011)

19. Leech, N., Onwuegbuzie, A.: A typology of mixed methods research designs. Qual. Quant. 43(2), 265–275 (2009)

20. Maness, J., Miaskiewicz, T., Sumner, T.: Using personas to understand the needs and goals of institutional repository users. D-Lib Mag. 14(9/10), 1082–9873 (2008)

21. Maull, K., Saldivar, M., Sumner, T.: Understanding digital library adoption: a use diffusion approach. In: Proceeding of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, ACM, pp. 259–268 (2011)

22. Maull, K.E., Saldivar, M.G., Sumner, T.: Online curriculum planning behavior of teachers. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 121–130 (2010)

23. Miaskiewicz, T., Sumner, T., Kozar, K.: A latent semantic analysis methodology for the identification and creation of personas. In: Proceeding of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, ACM, pp. 1501–1510 (2008)

24. Moore, G.A.: Crossing the Chasm: Marketing and Selling Technology Products to Mainstream Customers. HarperCollins, New York (2006)

25. Pennington, M.C.: Cycles of innovation in the adoption of information technology: a view for language teaching. Comput. Assist. Lang. Learn. 17(1), 7–33 (2004)

26. Ram, S., Jung, H.: The conceptualization and measurement of product usage. J. Acad. Mark. Sci. 18(1), 67–76 (1990). doi:10.1007/BF02729763. http://www.springerlink.com/content/9kjl7574145320mv

27. Rogers, E.M.: Diffusion of Innovations, 5th edn. The Free Press, New York (2003)

28. Saldivar, M.: Teacher integration of digital resources into instructional practice. CCS Report No. 4, Digital Learning Sciences, Boulder (2011)

29. Saldivar, M.G.: Teacher adoption of a Web-based instructional planning system. Doctoral dissertation, University of Colorado, Boulder, CO (2012)

30. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948)

31. Shih, C., Venkatesh, A.: Beyond adoption: development and application of a use-diffusion model. J. Mark. 68(1), 59–72 (2004). http://www.jstor.org/stable/30161975

32. Smerdon, B.: Teachers' Tools for the 21st Century: A Report on Teachers' Use of Technology. US Dept. of Education, Office of Educational Research and Improvement, Washington, DC (2000)

33. Straub, E.T.: Understanding technology adoption: theory and future directions for informal learning. Rev. Educ. Res. 79(2), 625–649 (2009)

34. Sumner, T., Team, C.: Customizing science instruction with educational digital libraries. In: Proceedings of the 10th Annual Joint Conference on Digital Libraries, ACM, JCDL '10, New York, NY, USA, pp. 353–356 (2010). doi:10.1145/1816123.1816178

35. Turner, M., Kitchenham, B., Brereton, P., Charters, S., Budgen, D.: Does the technology acceptance model predict actual use? A systematic literature review. Inform. Software Technol. 52(5), 463–479 (2010)

36. Venkatraman, M.P.: The impact of innovativeness and innovation type on adoption. J. Retail. 67(1), 51–67 (1991)

37. Weatherley, J.: A web service framework for embedding discovery services in distributed library interfaces. In: Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05), ACM, New York, NY, USA, pp. 42–43 (2005). doi:10.1145/1065385.1065394

38. Wilson, B., Wood, J.A.: Teacher evaluation: a national dilemma. J. Person. Eval. Educ. 10(1), 75–82 (1996)

39. Xu, B., Recker, M., Hsi, S.: Data deluge: opportunities for research in educational digital libraries. In: Edwards, C.M. (ed.) Internet Issues: Blogging, the Digital Divide and Digital Libraries. Nova Science Pub Inc., New York (2010)


try out a new corn variety, whereas a "late majority" farmer would wait until most of the surrounding farms had already adopted the new corn. Characterizing the adoption of contemporary information services such as educational digital libraries, however, is more complex than identifying when the farmer planted the corn or when the consumer bought the digital device. Did a teacher who used NSDL or MERLOT one time 'adopt' the library? In acknowledgement of this complexity, instead of focusing on when an innovation is first used, use diffusion examines both how and how much an innovation is used to identify different adopter categories. It recognizes that both the depth and breadth of usage will vary widely across different users and that successful adoption will take many different forms.

Building on techniques from web analytics and data mining, the proposed computational method employs two different algorithms in a two-step process to develop both coarse- and fine-grained views of user behavior. These algorithms rely on detailed web site usage logs in which each individual action in the interface is recorded and associated with a unique user identifier. In the first step, one algorithm uses frequency of use and variety of use to sort users into different use diffusion adopter quadrants, such as "intense use", "limited use", "specialized use", etc. One challenge with operationalizing use diffusion in a computational method is modeling variety in a way that is application independent. In Maull et al. [21], variety was calculated using Shannon [30] entropy, a mathematical construct from information theory. While this first attempt using entropy was a valuable step forward in understanding how to map web usage onto the use diffusion model, it did not sufficiently characterize specialized use. In this study, we extend the initial calculations with an enhanced entropy computation that includes a penalty for low variety. These new computations and their performance are described in detail later in this article. In the second step, a clustering algorithm is employed to develop a finer-grained understanding of the different patterns of generalized and specialized use within and across these quadrants. The result of this step is a user typology, that is, a classification of users along the selected dimensions of each cluster.

To illustrate the utility of the computational method, the resulting user typology, and how it might be applied in practice, we present a case study in which we used this method to better understand how middle and high school science teachers integrated digital library resources from NSDL and DLESE into their instructional planning and teaching practice. Teachers were provided with a web-based planning tool that enabled them to customize their district's adopted curriculum with digital library resources to better meet diverse learner needs, and to share their customizations with other teachers in their district [34]. This planning tool, the Curriculum Customization Service (CCS), was deployed to all middle and high school Earth science teachers within a large urban school district in the Midwest US for a full academic year; the Service was carefully instrumented to record every user action, and detailed usage logs were collected. This case offers an excellent testbed for this method for two reasons. First, the total potential user population is known and quantifiable, enabling us to more easily assess overall rates of uptake and adoption. Second, this deployment was also studied using traditional educational field research methods, such as surveys, interviews, and classroom observations. Thus, we can compare the coarse- and fine-grained views of user behavior identified by the computational method with findings from the field study to better assess the accuracy and validity of the method's output. While this case is clearly focused on understanding educational use of digital libraries, we believe the proposed method to be generalizable and applicable to a wide variety of digital libraries, information services, and learning applications.

2 Background and related work

This research draws on theories and computational techniques from several disciplines in order to better understand digital library adoption and use. First, as previously described, we draw on technology adoption and diffusion theories, which are historically rooted in social science, to inform the purpose and overall functioning of our computational method. Second, we discuss related research on understanding user behavior and user typologies, describing how our approach compares to other efforts that are similarly focused on developing automatic methods.

2.1 Adoption and diffusion theories

Technology adoption occurs when an individual decides that a given technological innovation has utility and can add value to his or her activities, such as teaching, if that innovation is somehow incorporated into those activities [33]. Thus, much theoretical work to date has focused on understanding the cognitive, affective, and contextual factors that influence a potential user's decision-making process. One prominent family of theories offers extensions or refinements of Rogers' innovation diffusion theory [7, 25, 27]. To Rogers, technology adoption is fundamentally a function of the communication channels and social systems of which one is a part. This theory suggests that (1) within a social system there are typically five different "adopter categories" describing the different characteristics that users bring to bear when considering whether to adopt an innovation, and (2) these characteristics influence how innovations move through the social system in predictable ways. Another model is the concerns-based adoption model developed by Fuller [12] and Hall [13]. As its name implies, this model is focused on individuals' concerns, which are defined as the specific reasons, situated in one's socio-cultural context, that one might have to adopt or not adopt technology. A third major approach is the technology acceptance model, or TAM [6, 36]. This was one of the first models to take into account individuals' self-efficacy and expertise. Proponents of this model argue that individuals' self-perception of their ability to use technology, and their ability to judge whether a technology has utility for them, are important factors for understanding technology adoption behaviors.

These prior models have contributed greatly to our understanding of technology adoption; however, they all share a couple of weaknesses. First, none of them takes into account discontinuance, i.e., the fact that people often stop using a new technology after they have tried it out a few times. Second, they provide very little insight into actual use of the new technology. Since most of these methods rely on self-reported survey data, they more often predict variance in self-reported use rather than actual use [35]. Thus, more recently, theoretical attention has shifted from focusing on the decision component of adoption towards understanding adoption as a process that can lead to different patterns of use. Models that account for use fall into the category of "use diffusion" models. These models attempt to characterize the way and the degree to which people make use of the new technology. For example, once a consumer purchases a cell phone, to what degree does he or she actually use the phone? What features are used, and how are these features used? Can different types of usage be recognized and compared?

Formally proposed by Ram and Jung [26] and subsequently updated and expanded by Shih and Venkatesh [31] to accommodate more robust and predictable descriptions of usage, use diffusion extends the traditional notion of adoption diffusion by focusing on system usage patterns. The work in this paper builds on the Shih and Venkatesh use diffusion model, which suggests two dimensions to patterns of technology use. The first dimension, frequency, provides a measurement of how much a technology is used. In the web context, for example, frequency of web site use might be defined as the number of sessions a user has generated over some period of time. This frequency measure can be a very useful indicator of a user's interest in site content. The second dimension of the use diffusion model is what the authors call variety. This dimension measures the range of use of a technology: did the consumer make use of most of the features of their new cell phone, or only two or three? In the web context, unlike frequency and number of sessions, there are no standard measures of variety. For this study, we model variety as clickstream entropy. The use diffusion model thus produces four categories of use of a technology, as shown in Fig. 1. When plotted along these two dimensions, the population of users is segmented into four "adopter categories": intense use, limited use, specialized use, and non-specialized use. A user may move to and from adopter categories depending, for example, on the time interval considered and the granularity of the data used for analysis (e.g., an individual session vs. lifetime sessions).

Fig. 1 Use diffusion model proposed by Shih and Venkatesh [31]

2.2 User typology modeling

In general, typologies aim at classifying or categorizing common characteristics of objects. They are widely used in sociological, biological, linguistic, and psychological classifications to help both researchers and practitioners communicate efficiently about phenomena of interest. For example, in sociological typologies, specific traits and features of social groups are segmented (oftentimes hierarchically) into meaningful categories. Typological determinations are often made by careful examination of patterns in research data and do not always gain widespread significance or acceptance until they have been broadly examined within a scientific community. As massive datasets of interest rapidly become available, however, typological determinations are more often being made automatically, computationally, and experimentally, as in this research.

A great deal of recent typology research has been directed towards understanding and classifying Internet users, the common tasks they perform, and the details of their online behaviors. The Pew Research Group, for example, is developing user typologies describing the technology and Internet use patterns of Americans [16]. In their typology, distinctions are made, for example, between "Light But Satisfied" users, individuals who use some technology but for whom technology does not play a central role in day-to-day life, and "Omnivores", who embrace technology fully and participate heavily in online activities. Typologies of media users have been extensively explored by Brandtzæg [3], and with the rise of social media systems, user typologies within this context are a growing area of research [2]. In education research, recent work by Eynon and Malmberg [11] illustrates how typologies can inform design and implementation recommendations: they use Internet usage typologies of young students to more effectively integrate new technologies into classroom practices.


Typologies within digital libraries and other repositories have been used to explore the background, behavior, and motivations of end users to help inform decision making about library user interfaces, possible design or service enhancements, or digital resource development. For example, personas, a type of user typology, have been successfully used to understand the characteristics and needs of institutional repository users [20] and the needs of users of library services supporting scientific data curation [18]. Typically, personas are created using a labor- and expertise-intensive qualitative research method that relies on extensive interviews and observations of users. The result of this qualitative research is a typology of users that describes their needs, motivations, and the contextual factors that might influence system adoption and use. Automatic computational methods for creating digital library personas have been explored by Maness et al. [20] and further by Miaskiewicz et al. [23].

Within educational data mining research, computational methods for identifying user typologies often use clustering algorithms to group students into different categories based on skill sets [1] or performance on a test or assessment [9]. Xu et al. [39] use clustering to identify and classify usage types of teachers. They examine features of teacher-generated projects within the Instructional Architect tool to create a typology of users based on the kinds of projects that they produced. As these examples illustrate, clustering algorithms are generally used to assign group membership among items with common attributes or features in large datasets. The computational method presented in this article also uses clustering to identify and classify usage types of teachers. This research differs from prior efforts in that the features selected for the clustering algorithm are theoretically motivated by the use diffusion model.

Thus, the outputs of our computational method are two inter-related user typologies: (1) a coarse-grained view of the user population segmented into use diffusion adopter categories, and (2) a fine-grained view of the same population segmented along the same two dimensions but using more detailed measures of variety and frequency. Classifying and categorizing users into groups is a common task in user behavior analysis. The output of adoption models is often a set of adopter categories, which are a particular type of user typology. The categories produced by the use diffusion model are analogous to those produced by Rogers' diffusion of innovation model (e.g., "early adopter" or "late majority").

3 Computational method

The two-step method for this research is constructed to discover usage patterns and user typologies. The first step captures coarse-grained user categories, while the second step determines fine-grained typologies of system use. The next two sections examine the details of each step: their inputs, processes, and outputs.

Fig. 2 Step 1 overview

3.1 Step 1: Use diffusion patterns (Fig. 2)

To understand how use diffusion patterns are modeled, it is important to examine the frequency and variety dimensions of the model more fully. In this study, frequency is modeled as the number of user-initiated web sessions. While other frequency measures can be considered, this measure provides a good initial approximation of overall system use: fewer sessions imply lower system use, while more sessions imply higher system use. Variety, on the other hand, is more challenging to model because it is difficult to develop application-independent approaches to the concept. For the first step, we chose a variety metric that is based on aggregate user clickstreams. Intuitively, the clickstream of a particular user approximates their broad usage of the system. Furthermore, over time clickstreams become regular; that is, they become more predictable as users develop normal patterns of use within the system. By applying entropy, and specifically Shannon entropy [30], over the lifetime clickstream of each user, a basic notion of variety is developed that gives an approximate measurement of user behavior. Entropy has been used extensively in many systems to calculate measures of randomness and to approximate the amount of information being communicated in a system.
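A minimal sketch of the lifetime clickstream entropy described above, not the authors' implementation. It assumes each click has already been reduced to a hashable token (for example a URL path) and that the probability of each distinct token is its relative frequency in the user's clickstream; the example paths are hypothetical.

```python
import math
from collections import Counter


def clickstream_entropy(clicks):
    """Shannon entropy, in bits, of one user's lifetime clickstream.

    Assumption: p_i is the relative frequency of the i-th distinct click token.
    Uses p * log2(1/p), which equals -p * log2(p) and avoids printing -0.0.
    """
    counts = Counter(clicks)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no clicks: treated here as zero variety
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical clickstreams, for illustration only.
routine = ["/plan", "/plan", "/plan", "/plan"]          # one page, over and over
varied = ["/plan", "/resources", "/mystuff", "/shared"]  # four pages, once each
print(clickstream_entropy(routine))  # 0.0
print(clickstream_entropy(varied))   # 2.0
```

A perfectly predictable clickstream scores 0 bits, while n equally used targets score log2(n) bits, which is the sense in which higher entropy stands in for higher variety.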

In Maull et al. [21], initial entropy computations were made based on a simple, unmodified Shannon entropy calculation. To summarize the first computations from that work, a clickstream is modeled in a robust, domain-independent, computationally trivial way, using entropy as a model for variety. Since users generate a path of click interactions through a site, applying entropy models allows for a coarse-grained measurement of the predictability of clicks within a site. This intuition is then extended to the notion of variety: highly predictable, low-entropy models imply low variety, whereas highly unpredictable, high-entropy models imply high variety. The result of this coarse-grained step is a projection of frequency and variety patterns onto use diffusion quadrants, where each user is binned into a quadrant.
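The binning into quadrants can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each user is assumed to be already reduced to a (frequency, variety) pair, such as session count and clickstream entropy, and each dimension is assumed to be split at its population mean, a choice the text does not specify. The labels follow the Shih and Venkatesh quadrants, where specialized use is frequent but low in variety.

```python
from statistics import mean


def use_diffusion_quadrants(users):
    """Map each user id to a use diffusion quadrant.

    `users` maps user id -> (frequency, variety). Splitting each axis at the
    population mean is an assumption made for this sketch.
    """
    freq_cut = mean(f for f, _ in users.values())
    var_cut = mean(v for _, v in users.values())
    labels = {
        (True, True): "intense use",
        (True, False): "specialized use",      # frequent but narrow
        (False, True): "non-specialized use",  # infrequent but broad
        (False, False): "limited use",
    }
    return {uid: labels[(f > freq_cut, v > var_cut)] for uid, (f, v) in users.items()}


# Hypothetical (sessions, entropy) pairs for four teachers.
print(use_diffusion_quadrants({"t1": (40, 3.1), "t2": (35, 0.9), "t3": (5, 2.8), "t4": (4, 0.5)}))
```

Mean splits are sensitive to a few very heavy users; medians would be an equally plausible choice and could be swapped in without changing the rest of the sketch.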

While these calculations yielded interesting overall results, they were not reliable predictors of specialized use.


To experiment further with an entropy-based model, an extension to the entropy calculation was developed. One of the primary disadvantages of the entropy calculation used previously is that it makes little distinction between the breadth and depth of the URLs and clicks used in the calculation. For example, a large variety of URLs and clicks that broadly explore a site is no different from a large variety that deeply explores a site. This is not actually surprising, but the concern it presents in terms of mapping entropy calculations onto the variety dimension of use diffusion is that there are too few users in the high-variety, low-frequency quadrant. This is particularly pronounced because variety and time spent in the system tend to go together; that is, the longer one uses a system, the more likely one's variety of use increases.
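The breadth/depth limitation can be seen in a small worked example. The sketch below (hypothetical URLs, not CCS log data) computes plain Shannon entropy for one user who visits four different sections of a site and another who visits four pages inside a single section. Because entropy depends only on how clicks are distributed over distinct targets, not on where those targets sit in the site, both users receive the same score.

```python
import math
from collections import Counter


def entropy(tokens):
    """Shannon entropy in bits over the relative frequencies of distinct tokens."""
    counts = Counter(tokens)
    n = sum(counts.values())
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Hypothetical URLs: one user sweeps across four sections of a site,
# the other drills into four pages inside a single section.
broad = ["/investigations", "/resources", "/mystuff", "/shared"]
deep = ["/resources/1", "/resources/2", "/resources/3", "/resources/4"]

print(entropy(broad), entropy(deep))  # 2.0 2.0
```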

To approach this problem experimentally, we revise the entropy calculations by bifurcating clicks into two categories. Since our objective is to increase the number of users in the lower quadrant representing specialized use, we want to account for two kinds of variety: the entropy of clicks related to the interactive resources components of the system, and the entropy of clicks related to the use of publisher materials (e.g., PDFs). Without getting lost in the specific nuances of the application under examination, such bifurcation should be possible generically, without affecting the goal of our initial step of generalizing variety so that it is not overly specific to any one particular application. It may also be possible to extend this method beyond two partitions to n partitions, though such extensions are beyond the scope of this paper.

Let us consider the set of clicks $C_a = \{a_1, \ldots, a_n\}$, where $a_1, \ldots, a_n$ are clicks in the first category of application clicks, and the set of clicks $C_b = \{b_1, \ldots, b_n\}$, where $b_1, \ldots, b_n$ are the clicks in the second category of the application. For generalization, it is left to the specification of the application to determine how these categories are defined. In the case of our application, we chose to split the application along the interactive resources and publisher components of the system.

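The new entropy calculation now considers the balance of clicks within each category of the application, so that the balance $B$ for a user $u_i$ is computed by

$$B_{u_i} = 1 - \frac{\left|\, |C^{u_i}_a| - |C^{u_i}_b| \,\right|}{|C^{u_i}_a| + |C^{u_i}_b|}$$

This new balance calculation thus penalizes large imbalances and allows more balanced ratios to go largely unaffected. Recall the entropy calculation $H$ from [21]:

$$H(S^{u_i}_{\alpha_k}) = -\sum_{i=1}^{n} p_{\alpha_k i} \log_2 p_{\alpha_k i}$$

where $\alpha^{u_i}_k$ is the clickstream $\alpha_k$ for a user $u_i$. Our new entropy calculation $H_B$ is given by

$$H_B(S^{u_i}_{\alpha_k}) = H(S^{u_i}_{\alpha_k}) + \log_2(B_{u_i})$$

Since $0 \le B_{u_i} \le 1$, the term $\log_2(B_{u_i})$ is never positive: it leaves perfectly balanced users unchanged ($B_{u_i} = 1$) and lowers the entropy of users whose clicks are concentrated in one category.

Fig. 3 Updated and re-calculated use diffusion pattern showing new frequency and variety calculations

Fig. 4 Step 2 overview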

Figure 3 shows the results of the new calculations. Plotted against the means of the original data in Fig. 6, there are now more users in the specialized category, indicating that the penalty calculation did indeed improve the separation of users. While this new calculation is not nearly as specific as the clustering done in the next step, it is a step toward re-balancing the diffusion pattern so that it more robustly accounts for the breadth and depth of use discussed previously.

3.2 Step 2: User typology modeling (Fig. 4)

While use diffusion patterns provide domain-independent quadrants of generalized usage behavior, to understand fine-grained user behavior we apply data mining algorithms, specifically clustering. Having discussed the challenges of the variety variable in step 1 above, in the second step we selected features from the clickstream data that expand variety and frequency in order to discover more detailed views of user behavior. Since clickstream data provides a good basis for computing variety through entropy, selecting features that model variety in more detail, such as the usage of specific system components, yields a higher-fidelity view of user behavior. These new refinements and the application of clustering expand the coarse-grained use diffusion patterns into a fine-grained user typology that continues to model frequency and variety.

Fig. 5 The CCS offers four major capabilities: access to publisher material (IES Investigations tab), access to digital library resources (Interactive Resources tab), personalization capabilities (My Stuff tab), and community features (Shared Stuff tab). The Interactive Resources component is opened above, showing the top recommended digital resources.
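One way to turn raw logs into per-user feature vectors for the clustering step is sketched below. Everything here is hypothetical: the log row format `(user, session, t_seconds, component)`, the component names (borrowed from the four CCS tabs in Fig. 5 as stand-ins for the article's variety features, which are not enumerated here), and the approximation of a session's duration as the time between its first and last click.

```python
from collections import defaultdict

# Hypothetical component names: one click count per CCS tab (Fig. 5).
COMPONENTS = ["investigations", "interactive_resources", "my_stuff", "shared_stuff"]


def user_features(log):
    """Return {user: [sessions, total_seconds, clicks per component...]}.

    `log` is an iterable of (user, session, t_seconds, component) tuples.
    Frequency features: session count and total time, where each session's
    duration is approximated as last click time minus first click time.
    Variety features: click counts per component; unknown components raise KeyError.
    """
    session_times = defaultdict(lambda: defaultdict(list))  # user -> session -> [t, ...]
    clicks = defaultdict(lambda: dict.fromkeys(COMPONENTS, 0))
    for user, session, t, component in log:
        session_times[user][session].append(t)
        clicks[user][component] += 1
    features = {}
    for user, sessions in session_times.items():
        total_time = sum(max(ts) - min(ts) for ts in sessions.values())
        features[user] = [len(sessions), total_time] + [clicks[user][c] for c in COMPONENTS]
    return features


log = [
    ("t1", "s1", 0, "investigations"),
    ("t1", "s1", 300, "interactive_resources"),
    ("t1", "s2", 1000, "shared_stuff"),
    ("t1", "s2", 1600, "shared_stuff"),
    ("t2", "s3", 50, "my_stuff"),
]
print(user_features(log))  # {'t1': [2, 900, 1, 1, 0, 2], 't2': [1, 0, 0, 0, 1, 0]}
```

In practice these columns sit on very different scales (seconds vs. counts), so some normalization before clustering is likely worthwhile.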

The first aspect of refinement requires that we expand frequency and variety by choosing more detailed features to model those variables. In the case of this study, frequency is expanded to include both sessions and total time spent within the system. Variety is expanded to include eight features that capture varying usage across system components, including publisher materials, digital library components, and user-contributed/social features of the system. While there are a number of clustering algorithms to choose from for this second step, these experiments are based on the widely used model-based expectation maximization (EM) algorithm [8]. EM works iteratively: it estimates, for each object instance to be clustered, the probability that the instance belongs to each cluster, then re-estimates the parameters of each cluster's probability distribution from those memberships, so that the model increasingly well explains where each instance belongs. After many iterations, the model settles into a set of clusters that are represented by the model parameters derived for each cluster. The resulting clusters constitute a fine-grained user typology. The EM algorithm is used in these experiments because it is fast, robust, and typically converges quickly. Furthermore, cluster shapes (e.g., circles, ellipsoids) may vary, allowing more flexible cluster membership.
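To make the E-step/M-step loop concrete, here is a minimal pure-Python sketch of EM for a mixture of Gaussians. It is not the authors' implementation: the article does not state the software, covariance structure, or number of clusters used. This sketch assumes diagonal covariances (axis-aligned ellipsoidal clusters), a fixed number of components `k`, a fixed iteration count, and a deterministic farthest-first initialization; the toy data are hypothetical.

```python
import math


def farthest_first(X, k):
    """Initial means: start at X[0], then repeatedly add the row farthest
    (squared Euclidean distance) from the means chosen so far."""
    centers = [list(X[0])]
    while len(centers) < k:
        far = max(X, key=lambda x: min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers))
        centers.append(list(far))
    return centers


def em_diagonal_gmm(X, k, iters=100, var_floor=1e-6):
    """EM for a k-component Gaussian mixture with diagonal covariance.

    Returns (weights, means, variances, resp); resp[i][c] is the posterior
    probability that row i belongs to component c.
    """
    n, d = len(X), len(X[0])
    means = farthest_first(X, k)
    col_mean = [sum(x[j] for x in X) / n for j in range(d)]
    col_var = [max(sum((x[j] - col_mean[j]) ** 2 for x in X) / n, var_floor) for j in range(d)]
    variances = [list(col_var) for _ in range(k)]  # every component starts broad
    weights = [1.0 / k] * k

    for _ in range(iters):
        # E-step: log-likelihood of each row under each component, normalized in log space.
        resp = []
        for x in X:
            logs = []
            for c in range(k):
                lp = math.log(weights[c])
                for j in range(d):
                    v = variances[c][j]
                    lp -= 0.5 * (math.log(2 * math.pi * v) + (x[j] - means[c][j]) ** 2 / v)
                logs.append(lp)
            top = max(logs)
            w = [math.exp(l - top) for l in logs]
            total = sum(w)
            resp.append([wi / total for wi in w])

        # M-step: re-estimate mixing weights, means and variances from the responsibilities.
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-12  # guard against an empty component
            weights[c] = nc / n
            means[c] = [sum(r[c] * x[j] for r, x in zip(resp, X)) / nc for j in range(d)]
            variances[c] = [
                max(sum(r[c] * (x[j] - means[c][j]) ** 2 for r, x in zip(resp, X)) / nc, var_floor)
                for j in range(d)
            ]
    return weights, means, variances, resp


# Hypothetical 2-D feature vectors forming two well-separated groups.
X = [[1, 1], [1.2, 0.9], [0.8, 1.1], [10, 10], [10.2, 9.8], [9.9, 10.1]]
weights, means, variances, resp = em_diagonal_gmm(X, k=2)
labels = [max(range(2), key=lambda c: r[c]) for r in resp]
print(labels)
```

Each user's cluster is taken as the component with the highest posterior probability; the variance floor keeps a component from collapsing onto a single point.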

4 Case study

For the remainder of this paper, we focus on the case study that examines the use of the CCS. The CCS is a National Science Foundation-funded program overseen by Digital Learning Sciences (DLS), a joint institute of the University of Colorado at Boulder and the University Corporation for Atmospheric Research. DLS began development of the CCS in early 2008, and in July 2009 the CCS was made available to all Earth science teachers in a large urban school district in the Midwest. Over 100 teachers were trained on the CCS for use in the 2009–2010 school year.

Fig. 6 Use diffusion pattern comparison for step 1 from [21] and the new entropy with penalty calculation, showing frequency and variety of the data source

The CCS provides four major features to the end user (see Fig. 5). First, it provides users with Web-based access to digital versions of the paper-based student textbooks, teacher manuals, and curriculum guides that comprise the Earth


Linking computational methods with qualitative analyses 57

science curricula for both Grade 6 and Grade 9. The manuals and guides outline the state standards that must be met, explain how the various units in the Earth science curricula are connected to state standards, and provide additional supplementary materials for teacher use, such as activities, teaching tips, and student assessments. These materials are all grouped under a single user interface component and are organized by key concept, which allows teachers to organize their lessons in a manner that flexibly meets the learning needs of their students. The digital versions of these curricular materials are identical to what was already available to teachers in paper form, but can now be accessed from any computer with a Web connection.

Second, the CCS integrates digitized publisher content with interactive resources available from the Digital Library for Earth System Education (DLESE), a collection of Earth science-related digital resources that are part of the National Science Digital Library. By clicking on the Interactive Resources tab, a user can see recommendations for animations, video clips, classroom activities, and other digital resources that pertain to the given key concept. The interactive resources available via the CCS have been vetted by the experts who manage the DLESE collection. Moreover, these resources are filtered by the system to ensure that they align with the Earth science curricula. Thus, when a teacher accesses a DLESE resource on, for example, volcanoes, the resource not only has been determined to have educational value by a subject matter expert but has also been tied to the specific science concepts that must be taught, as well as the science standards that must be met.

The third major feature of the CCS is an interactive Web 2.0 capability whereby teachers can save digital resources recommended to them via the Interactive Resources component, or upload their own resources, to an area called My Stuff, thus storing teacher-developed materials in the same space as interactive resources from DLESE or from the curriculum for easy access. Once a resource is saved to My Stuff, teachers have the option to share a copy of the resource to an area of the CCS called Shared Stuff, which is accessible to any CCS user who clicks on the Shared Stuff component associated with a given key concept.

The final major feature enables teachers to share materials with their peers. When a digital resource is added to Shared Stuff, the teacher who originally uploaded the resource, as well as other CCS users, can add searchable tags or keywords, so that any search of the CCS system for those tags will list all resources tagged with the same keyword or phrase. Finally, CCS users can add “star ratings” to resources so that other users can determine how their colleagues rate a given resource. A resource that many users rate highly (four or five stars) might be more likely to capture the attention of system users than a resource with a low rating. Hew and Hara [15] argue that this kind of sharing may be a catalyst

Table 1 Summary statistics for the data plotted in Fig. 6 (N = 98)

Statistic    Entropy    Lifetime sessions
Min.         0.48063    1.00
1st Qu.      3.559      7.00
Median       4.834      23.00
Mean         4.535      37.85
3rd Qu.      5.830      48.00
Max.         7.336      171.00
Std. Dev.    1.56       42.201

for enabling improvement of practice, because such knowledge sharing tends to be tied to shared, situated instructional goals and challenges and is thus more likely to be relevant to a teacher’s immediate, short-term needs. A more detailed description of the CCS, including results from the field trial, can be found in Sumner et al. [34].

4.1 Data source and step 1

The data source for this study was clickstream data of 98 users from 9 months of interaction within the CCS. The data for these experiments were particularly interesting because the user interface environment contains many dynamic client-side components, which would not ordinarily be captured in a traditional web server log. The system was therefore instrumented to extract additional information from clicks on these dynamic client-side interface components. The CCS clickstream data log therefore includes detailed tracking of user activity that provides rich user interaction data. Since we were able to capture this data over a relatively long period of time, we were able to analyze actual user behavior as the users worked with the system in a natural, unhindered manner. Furthermore, we were able to examine how publisher materials, which are already core to standard teacher practice, are complemented by digital library resources, giving us a broader view of how teachers integrate these digital library resources into their traditional practices.

As a result of step 1, we obtained initial frequency and variety computations. Table 1 shows the descriptive statistics resulting from the first step of our experimental method. The data show that the mean entropy of our population is 4.535 and the mean frequency, here measured as the lifetime sessions that a user logged, is 37.85.
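As an illustration of how the two step 1 measures can be derived from a log, the sketch below counts each user's lifetime sessions (frequency) and computes plain Shannon entropy, in bits, over the actions in that user's clickstream (variety). It is a minimal sketch only: it does not reproduce the entropy-with-penalty calculation behind Table 1, and the `(user, session_id, action)` record layout and action names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def clickstream_entropy(actions):
    """Shannon entropy, in bits, of the distribution of actions in one clickstream."""
    counts = Counter(actions)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of p * log2(1/p) over observed actions.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def frequency_and_variety(log):
    """Map each user to (lifetime sessions, clickstream entropy).

    `log` is an iterable of (user, session_id, action) records.
    """
    sessions = defaultdict(set)
    actions = defaultdict(list)
    for user, session_id, action in log:
        sessions[user].add(session_id)
        actions[user].append(action)
    return {u: (len(sessions[u]), clickstream_entropy(actions[u])) for u in sessions}


log = [
    ("t1", "s1", "interactive_resources"),
    ("t1", "s1", "publisher_student"),
    ("t1", "s2", "interactive_resources"),
    ("t2", "s9", "publisher_teacher"),
]
print(frequency_and_variety(log))
```

A user who only ever performs one kind of action has entropy 0, and entropy grows as clicks spread more evenly over more distinct actions, which is why it serves as a variety measure.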

When we apply the use diffusion model to the data by dividing each axis at its mean, we obtain a use diffusion pattern that shows a larger (n = 40) limited use group compared to the previous entropy calculations (n = 38). Similarly, there are fewer (n = 26) intense users (entropy > 4.54, frequency > 37.85), that is, users who exhibit both a larger amount of variety and a higher frequency.



Table 2 CCS cluster experiment features and descriptions

No.  Feature label                   Description
1    Sessions                        Total lifetime sessions
2    Hours                           Total system hours
3    IR Activity                     Total activity within interactive resources
4    IR Saving                       Total interactive resource saving behavior
5    Shared Stuff Activity           Total user-contributed content activity
6    Shared Stuff Saving             Total user-contributed content saves
7    My Stuff Activity               Total “My Stuff” activity
8    My Stuff Saving                 Total “My Stuff” saving behavior
9    Publisher–Teacher Materials     Total activity within publisher–teacher materials
10   Publisher–Student Materials     Total activity within publisher–student materials

This is a decrease of four users (about 13%) from the initial computation (n = 30). The plot also shows that there are now five users in the specialized category (n = 5), where previously there was only a single point in that category.

The key improvement of the new calculations over the initial calculation is that they spread the data out substantially; this change can be seen both in the standard deviation (from 1.25 up to 1.56) and in the visual spread in the graph. It is encouraging that there are now larger penalties for higher entropy, but further generalizable changes could yet be made.

As can be seen in Fig. 6, the value of use diffusion pattern modeling is that it provides a comprehensive overview of coarse-grained behavior within the system. The dotted line divides the graph at the means of each axis, thus creating the four use diffusion quadrants. We can quickly see the distribution of basic usage patterns within the system. At the same time, however, the pattern does not produce enough information to determine the specific details of use. To obtain them, we must turn to the second step of the method.
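The quadrant assignment in Fig. 6 can be sketched as follows, assuming each user is summarized by a (frequency, variety) pair and both axes are split at their means, as described above. The mapping of quadrants to labels follows the standard use diffusion definitions, which we assume here (specialized use = frequent but narrow; non-specialized use = varied but infrequent). Treating a value exactly at the mean as "low" is our own tie-breaking choice, consistent with the strict thresholds quoted for intense users.

```python
from statistics import mean


def use_diffusion_quadrants(users):
    """Assign users to use diffusion quadrants by splitting each axis at its mean.

    `users` maps user id -> (frequency, variety). Values equal to a mean count as "low".
    Returns (labels, (frequency_cut, variety_cut)).
    """
    f_cut = mean(f for f, _ in users.values())
    v_cut = mean(v for _, v in users.values())
    labels = {}
    for uid, (f, v) in users.items():
        high_f, high_v = f > f_cut, v > v_cut
        if high_f and high_v:
            labels[uid] = "intense"          # frequent and varied use
        elif high_f:
            labels[uid] = "specialized"      # frequent but narrow use
        elif high_v:
            labels[uid] = "non-specialized"  # varied but infrequent use
        else:
            labels[uid] = "limited"          # infrequent and narrow use
    return labels, (f_cut, v_cut)


labels, cuts = use_diffusion_quadrants(
    {"a": (1, 1.0), "b": (100, 1.0), "c": (1, 7.0), "d": (100, 7.0)}
)
print(labels)  # {'a': 'limited', 'b': 'specialized', 'c': 'non-specialized', 'd': 'intense'}
print(cuts)    # (50.5, 4.0)
```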

4.2 Feature selection and step 2

We began the second step by choosing 10 features of our data for further analysis. The features for this step of the method represent four major functions of the system: (1) use of digital library-related system functions, (2) use of traditional publisher materials and related system functions, (3) system functions that involve personalization, and (4) user-contributed functions. Table 2 summarizes the system features used in the second step of our proposed method.
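To make the step 2 input concrete, the sketch below aggregates a raw event log into one ten-element vector per user, in the order of Table 2. The event layout `(user, session_id, component, action, seconds)`, the component identifiers, the use of summed event durations for Hours, and the rule that every event counts as activity while only `"save"` actions also count as saving are hypothetical assumptions for illustration; the CCS instrumentation may have recorded these differently.

```python
from collections import defaultdict

# Feature order follows Table 2.
FEATURES = [
    "Sessions", "Hours", "IR Activity", "IR Saving",
    "Shared Stuff Activity", "Shared Stuff Saving",
    "My Stuff Activity", "My Stuff Saving",
    "Publisher-Teacher Materials", "Publisher-Student Materials",
]
# Hypothetical identifiers for components that have an activity/saving feature pair.
PAIRED = {"interactive_resources": "IR", "shared_stuff": "Shared Stuff", "my_stuff": "My Stuff"}
# Hypothetical identifiers for the two publisher-material components.
PUBLISHER = {"publisher_teacher": "Publisher-Teacher Materials",
             "publisher_student": "Publisher-Student Materials"}


def build_features(events):
    """Aggregate (user, session_id, component, action, seconds) events into Table 2 vectors."""
    sessions = defaultdict(set)
    feats = defaultdict(lambda: dict.fromkeys(FEATURES, 0.0))
    for user, session_id, component, action, seconds in events:
        f = feats[user]
        sessions[user].add(session_id)
        f["Hours"] += seconds / 3600
        if component in PAIRED:
            prefix = PAIRED[component]
            f[prefix + " Activity"] += 1
            if action == "save":
                f[prefix + " Saving"] += 1
        elif component in PUBLISHER:
            f[PUBLISHER[component]] += 1
    for user, f in feats.items():
        f["Sessions"] = float(len(sessions[user]))
    return {user: [f[name] for name in FEATURES] for user, f in feats.items()}


events = [
    ("t1", "s1", "interactive_resources", "view", 1800),
    ("t1", "s1", "interactive_resources", "save", 1800),
    ("t1", "s2", "publisher_student", "view", 3600),
]
print(build_features(events))
# {'t1': [2.0, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]}
```

Vectors of this shape are what a clustering step such as EM would consume; features on very different scales (hours versus click counts) may also need standardizing first, depending on the covariance model used.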

4.2.1 Digital library features

The CCS is specifically designed with the goal of providing access to high-quality digital library resources. Analyzing usage of the embedded digital library within the CCS should therefore provide useful data about typical teacher practices around digital resources. To capture this behavior, we selected features that detail the clickstream patterns of the embedded digital library resources within the system. These resources were presented in the user interface under four sub-categories: Top Picks, Animations, Images/Visuals, and Inquiry with Data. Each category contained resources from DLESE that were either selected as highly relevant to the subject materials and unit focus (Top Picks) or contained metadata that were of the appropriate type, scope, and topic (Animations, etc.). The items presented under each of these views were derived directly from DLESE web services [37] and appropriately presented to the user. Clickstreams into this component are tracked with the IR Activity and IR Saving features.

4.2.2 Publisher materials features

Publisher materials are included as a core component of the system functionality. The majority of the publisher’s items are digitized versions of paper-based materials, whether book chapters or supplemental materials such as hand-outs, assessments, etc. The features here are therefore convenient digital proxies for real-world, paper-based analogs. Clickstreams into this component of the system and its sub-components were tracked and organized with the Publisher–Student Materials and Publisher–Teacher Materials features. The Publisher–Student Materials include publisher materials such as digital versions of the student textbook, while the Publisher–Teacher Materials include supplemental publisher materials such as instructional support materials.

4.2.3 Personalization features

The CCS provides functionality to allow teachers to personalize the contents of their accounts; in particular, users are provided with the ability to save digital materials that they find of interest. Once saved, items may be retrieved for



Table 3 Experimental cluster results showing the parameters obtained for each feature and cluster, where N indicates the number of users in each cluster

Feature                        Cluster 1  Cluster 2  Cluster 3  Cluster 4  Cluster 5
Sessions                       4.934      47.096     119.551    26.131     83.897
Hours                          10.125     26.702     52.586     8.427      24.984
IR Activity                    2.483      74.253     95.537     26.801     17.109
IR Saving                      0.321      4.703      26.039     1.218      0.649
Shared Stuff Activity          3.382      25.753     81.159     17.558     82.174
Shared Stuff Saving            0.356      4.314      18.942     1.877      3.701
My Stuff Activity              0.123      4.864      19.116     1.106      0.860
My Stuff Saving                0.741      18.623     11.492     0.388      4.913
Publisher–Student Material     1.421      21.715     98.143     14.212     74.404
Publisher–Teacher Material     4.781      28.790     86.070     14.649     92.306
N                              31         10         8          35         14

further review and may even be shared with others if desired. Saving is considered a personalization feature because it provides direct control over items that may fit a specific need (either at the time of saving or in the future). Furthermore, saving implies an interest in the saved item, and while that interest may only last for a short time, it nonetheless acts as a marker of personalization behavior. Personalization behavior was captured through the system’s saving functionality with the My Stuff Saving feature. For example, teachers are able to save embedded digital library resources such as animations, images, visuals, and top picks for the units of study they are interested in. The My Stuff Activity feature represents the total activity performed within the My Stuff features of the system.

4.2.4 Community behavior features

There are features of the system that promote community-centric behaviors. For example, resources and other materials that users find interesting can be shared with the community at large in a kind of community pool of resources called “Shared Stuff.” This feature has many implications when considering the nature of communities of practice of K-12 educators, who are often encouraged to share materials, pedagogical strategies, and best practices amongst their peers. The community behaviors are captured with the Shared Stuff Activity and Shared Stuff Saving features.

4.3 A user typology

Our second step relies on feature clustering to develop a user typology. There are many clustering algorithms to choose from, but for this set of typology experiments we chose the model-based EM algorithm [8]. Elsewhere, we describe other experiments using other clustering algorithms and parameters [22]. Using the Bayesian Information Criterion (BIC) to determine the optimal number of clusters for the dataset, we determined that there were five clusters to be discovered in our experiments. The details of the clusters are shown in Table 3. It should be noted that the clusters presented here show representative members of each cluster and do not necessarily represent actual users. That is, for each cluster in the table, the values are the ideal parameters that fit the object instances belonging to that cluster.
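The BIC trades goodness of fit against model size. In the Schwarz form, BIC = -2 ln L + m ln n, where L is the maximized likelihood, m the number of free parameters, and n the number of observations (here, users); the number of clusters k with the lowest BIC is preferred. Some software, such as R's mclust, reports the sign-flipped quantity 2 ln L - m ln n and maximizes it instead. The sketch below assumes diagonal covariances by default when counting parameters, and its log-likelihoods are made-up values for illustration only; it does not reproduce the computation behind the five-cluster solution.

```python
import math


def gmm_param_count(k, d, covariance="diag"):
    """Free parameters of a k-component Gaussian mixture in d dimensions."""
    cov_params = {
        "spherical": k,                  # one variance per component
        "diag": k * d,                   # one variance per feature per component
        "full": k * d * (d + 1) // 2,    # a symmetric d x d matrix per component
    }[covariance]
    return (k - 1) + k * d + cov_params  # mixing weights + means + covariances


def bic(log_likelihood, n_params, n_obs):
    """Schwarz BIC; lower values are preferred."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def choose_k(log_likelihoods, d, n_obs, covariance="diag"):
    """Return the k with the lowest BIC, given {k: maximised log-likelihood}."""
    scores = {k: bic(ll, gmm_param_count(k, d, covariance), n_obs)
              for k, ll in log_likelihoods.items()}
    return min(scores, key=scores.get), scores


# Illustrative (made-up) maximised log-likelihoods for 2-D data with 100 points.
best_k, scores = choose_k({1: -500.0, 2: -300.0, 3: -299.0}, d=2, n_obs=100)
print(best_k)  # 2
```

Moving from two to three components barely improves the fit here, so the extra parameters are penalized and k = 2 wins; the same logic selects the number of clusters in the typology experiments, though with ten features and 98 users the parameter count grows quickly with k.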

As can be seen in Table 3, cluster 1 characterizes the low-use pattern (low variety, low frequency) of the step 1 use diffusion pattern. The users in this cluster have logged very few hours of total use within the system. Furthermore, they do not seem to be using the full range of system features. On the other hand, the experiments revealed an intense user cluster (cluster 3) that shows robust use of the system. Indeed, this “intense user” category seems to have used the system in full, exercising nearly every aspect of the system and logging both large numbers of sessions and significant hours of use. Two specialized user groups emerged from the typology. While both show about the same number of total hours within the system, cluster 5 identifies users who spend a great deal of time with the community features and publisher materials of the system, while cluster 2 shows users who spend far more time within the interactive resources and embedded digital library component of the system. These clusters are valuable to understand because they suggest that some users find the embedded digital resources as important as others find the community features. Finally, cluster 4 shows a group of users whose session count was above the median but below that of the intense and specialized users. This cluster also exhibits broad use of most of the features, with slightly more use of interactive resources. Table 4 summarizes each cluster, giving the key characteristics of the cluster and the diffusion pattern to which the cluster belongs. Furthermore, we have created fictitious typology labels to provide an easy way to remember the cluster characteristics.
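The contrast between the two specialized clusters can be checked directly against the Table 3 parameters. The short sketch below is a simple illustration rather than part of the method: it computes each cluster's ratio of IR Activity to Shared Stuff Activity, and the helper name `ir_to_shared_ratio` is our own.

```python
# Cluster parameters taken from Table 3 (cluster number -> value).
IR_ACTIVITY = {1: 2.483, 2: 74.253, 3: 95.537, 4: 26.801, 5: 17.109}
SHARED_STUFF_ACTIVITY = {1: 3.382, 2: 25.753, 3: 81.159, 4: 17.558, 5: 82.174}


def ir_to_shared_ratio(cluster):
    """How strongly a cluster favours interactive resources over Shared Stuff features."""
    return IR_ACTIVITY[cluster] / SHARED_STUFF_ACTIVITY[cluster]


ratios = {c: round(ir_to_shared_ratio(c), 2) for c in IR_ACTIVITY}
print(ratios)  # {1: 0.73, 2: 2.88, 3: 1.18, 4: 1.53, 5: 0.21}
print(max(ratios, key=ratios.get), min(ratios, key=ratios.get))  # 2 5
```

The ratio is highest for cluster 2 (the Interactive Resource Specialists) and lowest for cluster 5 (the Community Seeker Specialists), while the intense users of cluster 3 sit close to 1, reflecting their balanced use of both components.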



Table 4 An initial user typology derived from step 2, showing the diffusion patterns and characteristics of each usage type

Cluster  Diffusion pattern  “Typology label”  Characteristics

1  Limited Use  “Uninterested Non-Adopter”  Very low overall system use.

2  Specialized Use  “Interactive Resource Specialist”  Heavy use of interactive resources relative to other system features. Tends to access system weekly.

3  Intense Use  “Ardent Power User”  Heavy and robust overall use of all features. Tends to access system daily.

4  Non-Specialized Use  “Moderate Generalist”  Moderate overall system use. Shows slightly more use of interactive resources than other system features. Tends to access system several times monthly.

5  Specialized Use  “Community Seeker Specialist”  Makes heavy use of Shared Stuff features and Publisher materials relative to Interactive Resource Activity. Tends to use the system weekly.

4.4 Validation of results with field research findings

It is important to note that the method which produced the user typology clusters is not simply a new kind of analysis; rather, it presents an opportunity to bridge different analytical approaches. Although the clusters that emerged from our computational approach are based on clickstream data, these clusters are not arbitrary aggregations of user behaviors. They map onto real-world usage patterns that emerged from the traditional educational research techniques used in our study of the CCS. Thus, the findings from our computational method are validated by the findings from our field research of CCS use “in the wild.”

The CCS field trial used a mixed-method research design [19, 4, 10]. We collected quantitative data via surveys in a longitudinal fashion from all DPS Earth science teachers (n = 124) during the 2009–2010 academic year. In addition, we collected qualitative data from a large subset of teachers during the same period. See Saldivar [29] for a complete discussion of the CCS field trial; here we summarize our data sources, sample sizes, and analysis techniques.

– Survey 1 (n = 85): A pre-survey administered at the beginning of the field trial to all Earth science teachers to establish a baseline of attitudes and behaviors regarding educational technology (including interactive digital resources), DPS Earth science curricular materials, and differentiated instruction. Data were collected via both quantitative (constrained-response) and qualitative (free-response) items. Quantitative survey items were analyzed by determining counts and frequencies of responses. Qualitative items were analyzed via content analysis. (Subsequent surveys also collected both quantitative and qualitative data and were analyzed in the same manner.)

– Survey 2 (n = 84): A pre-survey administered at the mid-point of the field trial to all Earth science teachers to establish a baseline of access to, and attitudes towards, instructional technology tools in general and the CCS in particular.

– Survey 3 (n = 81): A post-survey administered at the end of the field trial to all Earth science teachers. This final survey comprised a number of items included in the first two surveys to enable a pre- and post-analysis of changes, if any, in CCS users’ attitudes and behaviors during the course of the school year.

– Adoption interviews (n = 24): These semi-structured telephone interviews regarding teachers’ adoption and use of the CCS tool were administered to teachers who were regular-to-frequent CCS users and resulted in an extensive database of qualitative data (interview transcripts), which were analyzed via content analysis.

– Classroom observation cycles (n = 8): Teachers identified as frequent CCS users were each recruited to participate in several hours of interviews and classroom observations during the course of the school year, resulting in an extensive database of qualitative data (interview transcripts and field notes), which were analyzed via content analysis.

Analysis of these field trial data, conducted independently of the computational analysis of clickstream data described above, suggested that teachers in our study fell onto a spectrum of system use behavior, from low-frequency users (corresponding to cluster 1) to moderate (cluster 4) and heavy (cluster 3) users. The final two clusters fell on the moderate-to-heavy side of the spectrum. Rogers’ [27] theory of diffusion of innovation, discussed in our introductory section, predicts that the earliest users of a new technology (“innovators” and “early adopters”) comprise approximately 16% of all users. Moore [24], who revised and extended Rogers’ work, further argues that a technology cannot become “mainstream” within a given population until it is adopted by at least half of all potential users. Our step 2 analysis indicates that about two-thirds of teachers in the district adopted the system to a significant degree, with one-third of users (represented by clusters 2, 3, and 5) making heavy use of the system. Even if we confine ourselves to a theoretical framework such as Rogers’, which focuses on quantifying when different segments of a target population adopt an innovation,



our findings indicate that the CCS has been strongly adopted and is heavily used by the target population.
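The adoption proportions cited above can be reproduced from the cluster sizes in Table 3. The sketch below assumes, as our own reading of the text, that adoption "to a significant degree" corresponds to clusters 2 through 5 and heavy use to clusters 2, 3, and 5, with the 98 clickstream users as the denominator; it then compares the shares with Rogers’ 16% and Moore’s one-half thresholds.

```python
# Cluster sizes (N) from Table 3.
CLUSTER_SIZES = {1: 31, 2: 10, 3: 8, 4: 35, 5: 14}

total = sum(CLUSTER_SIZES.values())               # 98 users, as in Table 1
heavy = sum(CLUSTER_SIZES[c] for c in (2, 3, 5))  # intense and specialized clusters
adopters = total - CLUSTER_SIZES[1]               # every cluster except limited use

print(f"heavy users: {heavy / total:.1%}")            # heavy users: 32.7%
print(f"adopters: {adopters / total:.1%}")            # adopters: 68.4%
print(heavy / total > 0.16, adopters / total > 0.5)   # True True
```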

It can be argued that identifying low, moderate, and intense user clusters is hardly a profound finding; after all, any set of system users can be divided into “low” and “heavy” user categories if the major variable of interest is frequency of system use. Recall, however, that use diffusion theory calls for an analysis that combines frequency of use with an evaluation of how a given technology is used. Looking through the lens of use diffusion, the most salient finding from our computational method is that two of the clusters, cluster 2 (the Interactive Resource Specialist) and cluster 5 (the Community Seeker Specialist), represent “variety of use” behaviors similar to user behaviors identified by our field research. This enables us to validate the findings of our computational method with real-world data, while at the same time giving us deep insights into how teachers integrated digital library resources into their instructional practices.

For example, Interactive Resource Specialists were moderately heavy users who spent most of their time in the Interactive Resources/Embedded Digital Library component of the CCS. Data from the final survey show that many teachers highly valued interactive resources. In response to the statement “Using interactive resources effectively is important to my personal success as a teacher,” about three-fourths of respondents agreed or strongly agreed. When asked about the statement “Using interactive resources in the classroom enables me to better meet the learning needs of students in my classroom,” almost 90% agreed or strongly agreed. Qualitative data provide insight into these survey findings.

In response to an interview question that asked him to discuss why he used the CCS, teacher Corey told us: “My teaching practices have always focused on student engagement. I found that the CCS made it easier for me to find [interactive] resources with which I could capture student attention.”

Many teachers reported that they accessed digital resources to supplement textbook material. Overwhelmingly, the most popular kinds of digital resources teachers found via the CCS were graphic representations (e.g., pictures, diagrams, animations) of Earth science phenomena. In the following example, note Carlie’s observations regarding students’ engagement with and understanding of key concepts: “I think that since students are visual learners, they’re hands-on learners, they’re growing up in this technological age. Seeing the animations… really drives home the idea or the topic. I ask them if it does help them [to see graphic representations], if whatever they just viewed made the material make more sense, and oftentimes they [say], ‘Wow. That totally made sense. Can we see it again?’ I think they benefit from it a lot, and they’re vocal about it. They love it.”

Liz’s experience was similar to Carlie’s in that her students seemed excited simply by the presence of digital resources, which led to increased engagement and deeper comprehension. Liz told us in an interview: “I think in general my students are really interested in seeing any kind of [digital resource]. [Digital resources are] really engaging for students because it’s different than what they normally see in the classroom. When they walk in, they’re just super-excited when they see the projector’s set up. [They say,] ‘Oh yes, we get to watch a video,’ or ‘We get to see a PowerPoint.’ They come in automatically excited to learn.”

In contrast, Community Seeker Specialists had CCS usage frequencies comparable to Interactive Resource Specialists but spent most of their time in the Shared Stuff and Publisher components of the system. Publisher materials had utility for teachers because they were directly related to the Earth science curricula used by the district, but why were Shared Stuff materials popular? Survey data suggest that the ability to see other teachers’ uploaded materials gave CCS users insights into the community of Earth science educators in their district that were not possible before the system was made available. For instance, in the final survey almost 61% of respondents agreed or strongly agreed with the statement “The CCS has increased my awareness of other teachers’ practices.” Approximately half of respondents agreed or strongly agreed with the statement “The CCS has helped me become a more active member of the DPS professional learning community.” Qualitative data provide a context for these survey responses.

In response to an open-ended survey question that asked respondents to explain the value of accessing Shared Stuff, teacher Sheila stated: “Looking at the Shared Stuff uploaded by others gives me ideas about how I can present particular concepts in my [own] classroom.” In response to the same question, Norma commented: “[Resources in Shared Stuff] have given me different perspectives on the different topics [in the curriculum] and thus enabled me to teach more effectively.”

Henrietta told us in an interview: “When [another Earth science] teacher from across town might have found a great website or a video clip or something that really does bring the point home and makes it more relevant to the students, that’s what I’m looking for, and so far I have found some things like that on [the CCS].” Later in the same interview, Henrietta added that she and the other Earth science teachers at her school, all CCS users, often discussed useful resources that they had found using the CCS: “We certainly confer on [resources we discover]. [We tell each other,] ‘This was great,’ [and] ‘This is something you need to [use] when you get to this [certain] point in your book.’”

5 Discussion

Step 1 of our method shows promise for the rapid determination of application-independent, large-grain usage patterns,



while Step 2 produced a meaningful typology based on clusters that emerged from refined, application-dependent feature data. The computational generation of clusters that correspond with user behaviors identified by qualitative analysis of interview and survey data suggests that it is possible to build an understanding of user behaviors and motivations through quantitative examination of their behaviors within a system or application. This is only a first step, however, toward understanding the complex interplay between observable behaviors that occur within an application and behaviors that occur outside of that system.

The CCS’s main purpose was to encourage teachers to differentiate instruction by incorporating high-quality digital resources into their teaching practices. Thus, the system was designed to produce a specific outcome: teacher differentiation of instruction. Our field trial data validated the in-system behaviors characterized, for example, by the Interactive Resource Specialist and Community Seeker Specialist clusters. However, we also know from field trial data that teachers with similar in-system behaviors may have very different teaching practices in the classroom. It is also possible that teachers with similar classroom practices may exhibit different in-system behaviors. Ideally, then, the most complete understanding of a system’s impact on users would come from a method that takes into account both in-system and out-of-system behaviors, motivations, and characteristics.

As previously mentioned in Sect. 2.2, existing literature on digital library use offers some guidance. Researchers in the field of human–computer interaction have long used the concept of personas to guide the development of information systems. Personas are constructs devised by system developers to describe different types of system users; they most often take the form of fictional biographies of system users that include demographic characteristics and descriptions of how and why they interact with the system. With them, developers can begin to model and build systems that more specifically address the needs of actual users. This increases the chances that the system will better serve the needs of its end users and, subsequently, enhances adoption of the system. Our method describes in-system behaviors, but by extending our approach to incorporate a full set of out-of-system behaviors, including, for example, demographic information, teaching behaviors, and student outcomes (that is, by developing personas of CCS users), we can not only develop a richer understanding of the relationship between system use and users’ practices “in the wild,” but also extend the notion of personas beyond system design to actual system adoption and use over time.

While we have obtained promising results from this study, there are several significant limitations. First, our method has only been applied to a single population. It is plausible that the results we obtained are specific to the population under examination. A different typology may emerge as a result of larger data populations; for example, new specialized user types might emerge. Second, to validate that our results are robust and generalizable, it will be important to study the method with different applications and user populations. Third, while the computational method seems to support the analyses of qualitative data, there are opportunities to extend our understanding of system behavior and user characteristics to their impact on other meaningful behaviors outside the system.

As can be seen in the case study, both steps of the method yielded interesting results. It appears, however, that the entropy modeling of the first step requires more discriminatory power when describing specialized users. While the new calculations do a better job of penalizing usage, future modifications may yet be developed to separate the specialized group(s) further. Furthermore, work still remains to develop a precise coarse-grained model that better separates every user group. The challenges revealed by the first step could also indicate that specialization does not always imply low variety; this becomes clear when considering the details of system usage once variety is expanded to include more variables, as in the extended entropy calculations and the second step of the method.

Applying automated typology discovery to inform, guide, and develop system user characteristics is one aspect of understanding within-system user behavior that may be extended to include broader behaviors and user characteristics through personas. Such an extension may prove to be a useful tool for understanding behavioral characteristics outside the system, so that an extremely rich description of system users emerges that includes both in-system and out-of-system behaviors. This would in turn allow more data to be used in making predictive and prescriptive tool (in-system), policy (out-of-system), and training (in-system and out-of-system) decisions.

6 Broader impacts

While this research has focused on demonstrating the value of our method for the study of digital library adoption, we believe it can help address other educational challenges, chief among which are teacher professional development, the correlation of teacher practice to student learning, and the evaluation of teaching practices.

Extant research indicates that one of the major barriers to effective integration of technology into teaching practices is the dearth of quality professional development and training vis-à-vis technology [14]. Teachers often complain that even when technology-related training is made available, they are confronted with training that does not meet their needs as practicing educators because it assumes that they do, or do not, already possess a given set of skills. By modeling

123

Linking computational methods with qualitative analyses 63

teachersrsquo system use behavior one will be able to under-stand inter-user differences and target system training andprofessional development to usersrsquo true needs For exampleCCS users in the Community Seeker Specialist cluster mightbe presented with training that would help them integratemore interactive resources into their teaching because suchusers are known to not spend much of their CCS usage timeexploring interactive resources For teachers in the InteractiveResource Specialist cluster such training would be redun-dant because they are already making extensive use of theinteractive resources available via the CCS

While a consensus has emerged among policy makers and education researchers that effective adoption of technology can indeed have a positive impact on teacher practice [17, 32], the relationship between teachers' adoption of technological systems and student achievement is still not well understood. As part of our larger study of the CCS, we are analyzing the standardized test scores of students taught by teachers who used the CCS to determine what impact the teachers' use of the CCS had on student outcomes. Our analysis of student test score data for both our field trial year and the prior school year [28] shows increased overall achievement among Earth science students after the introduction of the CCS; due to the limitations of the data available to us, however, we cannot make strong causal claims. We continue our efforts to understand what link, if any, exists between system use and student outcomes. In the long term, the ability to correlate system use behaviors with students' academic achievement will make it possible to focus on the teacher use behaviors that most benefit students; in turn, these behaviors could be taught to other teachers.

Evaluating teachers' instructional practices is a very difficult task that requires a massive commitment of human and financial resources [5]. For example, most evaluation systems rely on administrators to observe teachers in the classroom, a very labor-intensive approach that can result in teachers being awarded or denied tenure or pay raises based on the short period of time during which they were observed. Further, variance between evaluators' adherence to evaluation rubrics, and bias in the evaluation instruments themselves, can make it difficult to assess the validity of the evaluation process [38]. Alternatively, asking teachers to self-report their teaching practices might provide evaluators with additional information, but such data would be subject to all the limitations that come with asking individuals to report on their own behaviors. Neutral "third party" data, such as the clickstream data to which we applied our computational method, can help bridge the gap between what evaluators observe during the relatively brief periods they visit a teacher's classroom and the teacher's activities when he or she is not being formally evaluated. A teacher evaluation process that combines traditional observational and self-reported data with usage data produced by systems like the CCS would give administrators, policymakers, and teachers themselves deeper insights into instructional practices; further, such a hybrid system would produce a rich new understanding of teaching-related best practices that could then be shared with other educators.

7 Conclusion

By applying models of adoption and use diffusion alongside data mining techniques, we have developed a two-step method for discovering patterns within clickstream data that reveal both general and specific typologies of digital library user behavior. The application of this method to data from an academic year-long field trial in a large urban school district provided a rich opportunity to study the adoption and usage behavior of embedded digital library resources by middle and high school science teachers. The method shows considerable promise for extracting useful behavioral patterns "in the wild": the resulting fine-grained user typology maps well onto results emerging from traditional educational field research methods. The proposed method, while requiring more data for further validation, contributes to our effort to develop automatic methods for studying digital library adoption and use. When fully realized, this method also has the potential to be extended to other applications and areas, such as informing teacher professional development, understanding the impact of digital library applications on student learning, and developing new approaches to the evaluation of teacher practice.

Acknowledgments: This material is based upon work supported in part by the NSDL program in the National Science Foundation under Awards 0734875 and 0840744, and by the University of Colorado at Boulder.

References

1. Ayers, E., Nugent, R., Dean, N.: Skill Set Profile Clustering Based on Student Capability Vectors Computed From Online Tutoring Data. In: Proceedings of the 1st International Conference on Educational Data Mining, Montreal, Canada, pp. 210–217 (2008)

2. Benevenuto, F., Rodrigues, T., Cha, M., Almeida, V.: Characterizing user behavior in online social networks. In: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement Conference, ACM, Chicago, Illinois, USA, pp. 49–62 (2009). doi:10.1145/1644893.1644900

3. Brandtzæg, P.B.: Towards a unified Media-User typology (MUT): a meta-analysis and review of the research literature on media-user typologies. Comput. Hum. Behav. 26(5), 940–956 (2010). doi:10.1016/j.chb.2010.02.008. http://www.sciencedirect.com/science/article/B6VDC-4YJSW8D-1/2/011453cc70c0a6bdc29d3fde3b8a9304

4. Creswell, J.: Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research. Pearson Education, Upper Saddle River, NJ (2008)

5. Danielson, C., McGreal, T.L.: Teacher Evaluation To Enhance Professional Practice. Association for Supervision and Curriculum Development, Alexandria, VA, USA (2000)



6. Davis, F.D.: Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quart. 13(3), 319–340 (1989)

7. Deffuant, G., Huet, S., Amblard, F.: An individual-based model of innovation diffusion mixing social value and individual benefit. Am. J. Sociol. 110(4), 1041–1069 (2005)

8. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodological) 39(1), 1–38 (1977)

9. Dominguez, A.K., Yacef, K., Curran, J.R.: Data mining for generating hints in a python tutor. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 91–100 (2010)

10. Dzurec, L., Abraham, I.: The nature of inquiry: linking quantitative and qualitative research. Adv. Nurs. Sci. 16, 73–79 (1993)

11. Eynon, R., Malmberg, L.: A typology of young people's internet use: implications for education. Comput. Educ. 56(3), 585–595 (2011). doi:10.1016/j.compedu.2010.09.020. http://www.sciencedirect.com/science/article/B6VCJ-517J24P-1/2/5c5d02fb284c227d3a9ef1cc4f3e1bf6

12. Fuller, F.F.: Concerns of teachers: a developmental conceptualization. Am. Educ. Res. J. 6(2), 207–226 (1969)

13. Hall, G.E.: The concerns-based approach to facilitating change. Educ. Horizons 57(4), 202–208 (1979)

14. Hanson, K., Carlson, B.: Effective Access: Teachers' Use of Digital Resources in STEM Teaching. Gender Diversity and Technology Institute, Education Development Center, Inc., Newton (2005)

15. Hew, K.F., Hara, N.: Empirical study of motivators and barriers of teacher online knowledge sharing. Educ. Technol. Res. Dev. 55(6), 573–595 (2007)

16. Horrigan, J.: A typology of information and communication technology users. Research report, Pew Internet & American Life Project (2007)

17. Kelly, M.G., McAnear, A.: National Educational Technology Standards for Teachers: Preparing Teachers to Use Technology. International Society for Technology in Education (ISTE), Eugene (2002)

18. Lage, K., Maness, J., Losoff, B.: Receptivity to library involvement in scientific data curation: a case study at the University of Colorado Boulder. Portal Libr. Acad. 11(4), 915–937 (2011)

19. Leech, N., Onwuegbuzie, A.: A typology of mixed methods research designs. Qual. Quant. 43(2), 265–275 (2009)

20. Maness, J., Miaskiewicz, T., Sumner, T.: Using personas to understand the needs and goals of institutional repository users. D-Lib Mag. 14(9/10), 1082–9873 (2008)

21. Maull, K., Saldivar, M., Sumner, T.: Understanding digital library adoption: a use diffusion approach. In: Proceeding of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, ACM, pp. 259–268 (2011)

22. Maull, K.E., Saldivar, M.G., Sumner, T.: Online curriculum planning behavior of teachers. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 121–130 (2010)

23. Miaskiewicz, T., Sumner, T., Kozar, K.: A latent semantic analysis methodology for the identification and creation of personas. In: Proceeding of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, ACM, pp. 1501–1510 (2008)

24. Moore, G.A.: Crossing the Chasm: Marketing and Selling Technology Products to Mainstream Customers. HarperCollins, New York (2006)

25. Pennington, M.C.: Cycles of innovation in the adoption of information technology: a view for language teaching. Comput. Assist. Lang. Learn. 17(1), 7–33 (2004)

26. Ram, S., Jung, H.: The conceptualization and measurement of product usage. J. Acad. Mark. Sci. 18(1), 67–76 (1990). doi:10.1007/BF02729763. http://www.springerlink.com/content/9kjl7574145320mv

27. Rogers, E.M.: Diffusion of Innovations, 5th edn. The Free Press, New York (2003)

28. Saldivar, M.: Teacher integration of digital resources into instructional practice. CCS Report No. 4, Digital Learning Sciences, Boulder (2011)

29. Saldivar, M.G.: Teacher adoption of a Web-based instructional planning system. Doctoral dissertation, University of Colorado, Boulder, CO (2012)

30. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948)

31. Shih, C., Venkatesh, A.: Beyond adoption: development and application of a use-diffusion model. J. Mark. 68(1), 59–72 (2004). http://www.jstor.org/stable/30161975

32. Smerdon, B.: Teachers' Tools for the 21st Century: A Report on Teachers' Use of Technology. U.S. Dept. of Education, Office of Educational Research and Improvement, Washington, DC (2000)

33. Straub, E.T.: Understanding technology adoption: theory and future directions for informal learning. Rev. Educ. Res. 79(2), 625–649 (2009)

34. Sumner, T., Team, C.: Customizing science instruction with educational digital libraries. In: Proceedings of the 10th Annual Joint Conference on Digital Libraries, JCDL '10, ACM, New York, NY, USA, pp. 353–356 (2010). doi:10.1145/1816123.1816178

35. Turner, M., Kitchenham, B., Brereton, P., Charters, S., Budgen, D.: Does the technology acceptance model predict actual use? A systematic literature review. Inform. Software Technol. 52(5), 463–479 (2010)

36. Venkatraman, M.P.: The impact of innovativeness and innovation type on adoption. J. Retail. 67(1), 51–67 (1991)

37. Weatherley, J.: A web service framework for embedding discovery services in distributed library interfaces. In: Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05), ACM, New York, NY, USA, pp. 42–43 (2005). doi:10.1145/1065385.1065394

38. Wilson, B., Wood, J.A.: Teacher evaluation: a national dilemma. J. Person. Eval. Educ. 10(1), 75–82 (1996)

39. Xu, B., Recker, M., Hsi, S.: Data deluge: opportunities for research in educational digital libraries. In: Cassie M. Edwards (ed.) Internet Issues: Blogging, the Digital Divide and Digital Libraries. Nova Science Pub. Inc., New York (2010)



which are defined as the specific reasons, situated in one's socio-cultural context, that one might have to adopt or not adopt technology. A third major approach is the technology acceptance model, or TAM [6, 36]. This was one of the first models to take into account individuals' self-efficacy and expertise. Proponents of this model argue that individuals' self-perception of their ability to use technology, and their ability to judge whether a technology has utility for them, are important factors for understanding technology adoption behaviors.

These prior models have contributed greatly to our understanding of technology adoption; however, they all share a couple of weaknesses. First, none of them take into account discontinuance, i.e., people often stop using a new technology after they have tried it out a few times. Second, they provide very little insight into actual use of the new technology. Since most of these methods rely on self-reported survey data, they more often predict variance in self-reported use than actual use [35]. Thus, more recently, theoretical attention has shifted from the decision component of adoption towards understanding adoption as a process that can lead to different patterns of use. Models that account for use fall into the category of "use diffusion" models. These models attempt to characterize the way and the degree to which people make use of the new technology. For example, once a consumer purchases a cell phone, to what degree does he or she actually use the phone? What features are used, and how are these features used? Can different types of usage be recognized and compared?

Formally proposed by Ram and Jung [26], and subsequently updated and expanded by Shih and Venkatesh [31] to accommodate more robust and predictable descriptions of usage, use diffusion extends the traditional notion of adoption diffusion by focusing on system usage patterns. The work in this paper builds on the Shih and Venkatesh use diffusion model, which suggests two dimensions to patterns of technology use. The first dimension, frequency, measures how much a technology is used. In the web context, for example, frequency of web site use might be defined as the number of sessions a user has generated over some period of time. This frequency measure can be a very useful indicator of a user's interest in site content. The second dimension of the use diffusion model is what the authors call variety. This dimension measures the range of use of a technology: did the consumer make use of most of the features of their new cell phone, or only two or three? In the web context, unlike frequency and number of sessions, there are no standard measures of variety. For this study, we model variety as clickstream entropy. The use diffusion model thus produces four categories of use of a technology, as shown in Fig. 1. When plotted along these two dimensions, the population of users is segmented into four "adopter categories": intense use, limited use, specialized use, and non-specialized use. A user may move to and from adopter categories depending, for example, on the time interval considered and the granularity of the data used for analysis (e.g., an individual session vs. lifetime sessions).

Fig. 1 Use diffusion model proposed by Shih and Venkatesh [31]
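To make the four adopter categories concrete, the following Python sketch bins users into quadrants from a frequency value (e.g., lifetime sessions) and a variety value (e.g., clickstream entropy). Each axis is cut at its population mean, as the case study in Sect. 4.1 does; the function names, the toy data, and treating a value exactly at the mean as "low" are illustrative assumptions. The labels follow our reading of Shih and Venkatesh [31]: intense (high frequency, high variety), specialized (high frequency, low variety), non-specialized (low frequency, high variety), and limited (low on both).

```python
from statistics import mean


def use_diffusion_category(frequency, variety, freq_cut, var_cut):
    """Assign one user to a use diffusion quadrant; values equal to a cut count as low."""
    high_f = frequency > freq_cut
    high_v = variety > var_cut
    if high_f and high_v:
        return "intense"
    if high_f:
        return "specialized"      # frequent use over a narrow range of features
    if high_v:
        return "non-specialized"  # broad range of features, infrequent use
    return "limited"


def bin_users(users):
    """users maps a user id to (frequency, variety); both axes are cut at their means."""
    freq_cut = mean(f for f, _ in users.values())
    var_cut = mean(v for _, v in users.values())
    return {uid: use_diffusion_category(f, v, freq_cut, var_cut)
            for uid, (f, v) in users.items()}


if __name__ == "__main__":
    # Hypothetical toy data: (lifetime sessions, clickstream entropy in bits)
    toy = {"u1": (80, 6.0), "u2": (5, 1.0), "u3": (90, 2.0), "u4": (3, 6.5)}
    print(bin_users(toy))
```

Because the cut points are population statistics, a user's category can change when other users are added or when the observation window changes, which is consistent with the text's remark that users move between categories.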

2.2 User typology modeling

In general, typologies aim at classifying or categorizing common characteristics of objects. They are widely used in sociological, biological, linguistic, and psychological classifications to help both researchers and practitioners communicate efficiently about phenomena of interest. For example, in sociological typologies, specific traits and features of social groups are segmented (often hierarchically) into meaningful categories. Typological determinations are often made by carefully examining patterns in research data, and they do not always gain widespread significance or acceptance until they have been broadly examined within a scientific community. As massive datasets of interest rapidly become available, however, typological determinations are more often being made automatically, computationally, and experimentally, as in this research.

A great deal of recent typology research has been directed towards understanding and classifying Internet users, the common tasks they perform, and the details of their online behaviors. The Pew Research Group, for example, is developing user typologies describing the technology and Internet use patterns of Americans [16]. In their typology, distinctions are made between, for example, "Light But Satisfied" users, individuals who use some technology but for whom technology does not play a central role in day-to-day life, and "Omnivores", who embrace technology fully and participate heavily in online activities. Typologies of media users have been extensively explored by Brandtzæg [3], and with the rise of social media systems, user typologies within this context are a growing area of research [2]. In education research, recent work by Eynon and Malmberg [11] illustrates how typologies can inform design and implementation recommendations: they use Internet usage typologies of young students to integrate new technologies into classroom practices more effectively.



Typologies within digital libraries and other repositories have been used to explore the background, behavior, and motivations of end users to help inform decision making about library user interfaces, possible design or service enhancements, or digital resource development. For example, personas, a type of user typology, have been successfully used to understand the characteristics and needs of institutional repository users [20] and the needs of users of library services supporting scientific data curation [18]. Typically, personas are created using a labor- and expertise-intensive qualitative research method that relies on extensive interviews and observations of users. The result of this qualitative research is a typology of users that describes their needs, motivations, and the contextual factors that might influence system adoption and use. Automatic computational methods for creating digital library personas have been explored by Maness et al. [20] and further by Miaskiewicz et al. [23].

Within educational data mining research, computational methods for identifying user typologies often use clustering algorithms to group students into different categories based on skill sets [1] or performance on a test or assessment [9]. Xu et al. [39] use clustering to identify and classify usage types of teachers: they examine features of teacher-generated projects within the Instructional Architect tool to create a typology of users based on the kinds of projects that they produced. As these examples illustrate, clustering algorithms are generally used to assign group membership among items with common attributes or features in large datasets. The computational method presented in this article also uses clustering to identify and classify usage types of teachers. This research differs from prior efforts in that the features selected for the clustering algorithm are theoretically motivated by the use diffusion model.

Thus, the outputs of our computational method are two inter-related user typologies: (1) a coarse-grained view of the user population segmented into use diffusion adopter categories, and (2) a fine-grained view of the same population segmented along the same two dimensions, but using more detailed measures for variety and frequency. Classifying and categorizing users into groups is a common task in user behavior analysis. The output of adoption models is often a set of adopter categories, which are a particular type of user typology. The categories produced by the use diffusion model are analogous to those produced by Rogers' diffusion of innovation model (i.e., "early adopter" or "late majority").

3 Computational method

The two-step method in this research is designed to discover usage patterns and user typologies. The first step captures coarse-grained user categories, while the second step determines fine-grained typologies of system use. The next two sections examine the details of each step: their inputs, processes, and outputs.

Fig. 2 Step 1 overview

3.1 Step 1: Use diffusion patterns (Fig. 2)

To understand how use diffusion patterns are modeled, it is important to examine the frequency and variety dimensions of the model more fully. In this study, frequency is modeled as the number of user-initiated web sessions. While other frequency measures can be considered, this measure provides a good initial approximation of overall system use: fewer sessions imply lower system use, while more sessions imply higher system use. Variety, on the other hand, is more challenging to model because it is difficult to develop application-independent approaches to the concept. For the first step, we chose a variety metric based on aggregate user clickstreams. Intuitively, the clickstream of a particular user approximates their broad usage of the system. Furthermore, clickstreams become regular over time; that is, they become more predictable as users develop normal patterns of use within the system. By applying entropy, specifically Shannon entropy [30], over the lifetime clickstream of each user, we obtain a basic notion of variety that gives an approximate measurement of user behavior. Entropy has been used extensively in many systems to calculate measures of randomness and to approximate the amount of information being communicated in a system.

In Maull et al. [21], initial entropy computations were made using a simple, unmodified Shannon entropy calculation. To summarize the first computations from that work: a clickstream is modeled in a robust, domain-independent, computationally trivial way, using entropy as a model for variety. Since users generate a path of click interactions through a site, applying entropy models allows for a coarse-grained measurement of the predictability of clicks within a site. This intuition is then extended to the notion of variety: highly predictable, low-entropy models imply low variety, whereas highly unpredictable, high-entropy models imply high variety. The result of this coarse-grained step is a projection of frequency and variety patterns onto use diffusion quadrants, where each user is binned into a quadrant.
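As a minimal sketch of this first step, assume each logged click has already been reduced to a hashable token such as a URL or an interface-component identifier (the text does not specify the token). The Python below computes frequency as the number of sessions and variety as the Shannon entropy, in bits, of the user's pooled lifetime clickstream, estimating each token's probability by its relative frequency. The function names are hypothetical.

```python
import math
from collections import Counter


def clickstream_entropy(clicks):
    """Shannon entropy in bits, H = sum_i p_i * log2(1 / p_i).

    p_i is estimated as the relative frequency of click token i in the stream.
    """
    if not clicks:
        return 0.0
    total = len(clicks)
    return sum((c / total) * math.log2(total / c) for c in Counter(clicks).values())


def frequency_and_variety(sessions):
    """Step-1 coordinates for one user from a list of sessions (lists of click tokens).

    Frequency is the number of sessions; variety is the entropy of the pooled
    lifetime clickstream.
    """
    lifetime = [click for session in sessions for click in session]
    return len(sessions), clickstream_entropy(lifetime)


if __name__ == "__main__":
    # Hypothetical sessions for one user; tokens stand in for logged URLs.
    user_sessions = [["/home", "/ir"], ["/home", "/pdf"]]
    print(frequency_and_variety(user_sessions))  # (2, 1.5)
```

A user who always clicks the same item has zero entropy, while clicks spread evenly over 2^k distinct items give k bits, which is the low-variety/high-variety intuition described above.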

While these calculations yielded interesting overall results, they were not reliable predictors of specialized use.



To experiment further with an entropy-based model, we developed an extension to the entropy calculation. One of the primary disadvantages of the entropy calculation used previously is that it makes little distinction between the breadth and the depth of the URLs and clicks used in the calculation. For example, a large variety of URLs and clicks that broadly explores a site is no different from a large variety that deeply explores a site. This is not surprising, but the concern it raises for mapping entropy calculations onto the variety dimension of use diffusion is that there are too few users in the high-variety, low-frequency quadrant. This is particularly pronounced since variety and time spent in the system also tend to go together; that is, the longer one uses a system, the more likely the variety of one's use of the system is to increase.

To approach this problem experimentally, we revise the entropy calculations by bifurcating clicks into two categories. Since our objective is to increase the number of users in the lower quadrant representing specialized use, we want to account for two kinds of variety: the entropy of clicks related to the interactive resource components of the system, and the entropy of clicks related to the use of publisher materials (e.g., PDFs). Without getting lost in the specific nuances of the application under examination, such a bifurcation should be possible generically, without affecting the goal of our initial step of generalizing variety so that it is not overly specific to any one application. It may also be possible to extend this method beyond two partitions to n partitions, though such extensions are beyond the scope of this paper.

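Let us consider the set of clicks $C_a = \{a_1, \ldots, a_n\}$, where $a_1, \ldots, a_n$ are clicks in the first category of application clicks, and the set of clicks $C_b = \{b_1, \ldots, b_m\}$, where $b_1, \ldots, b_m$ are clicks in the second category of the application. For generality, it is left to the specification of the application to determine how these categories are defined. In the case of our application, we chose to split the application along the interactive resources and publisher components of the system.

The new entropy calculation now considers the balance of clicks within each category of the application, so that the balance $B$ for a user $u_i$ is computed by

$$B_{u_i} = 1 - \frac{\bigl|\,|C_a^{u_i}| - |C_b^{u_i}|\,\bigr|}{|C_a^{u_i}| + |C_b^{u_i}|}$$

where $|C_a^{u_i}|$ and $|C_b^{u_i}|$ are the numbers of clicks user $u_i$ made in each category. $B_{u_i}$ equals 1 for a perfectly even split and 0 when all of a user's clicks fall in one category (in which case $\log_2 B_{u_i}$ below is undefined). This new balance calculation thus penalizes large imbalances and allows more balanced ratios to go largely unaffected. Recall the entropy calculation $H$ from [21]:

$$H(S^{u_i}_{\alpha_k}) = -\sum_{i=1}^{n} p_{\alpha_k i} \log_2 p_{\alpha_k i}$$

where $\alpha^{u_i}_k$ is the clickstream $\alpha_k$ for a user $u_i$. Our new entropy calculation $H_B$ is given by

$$H_B(S^{u_i}_{\alpha_k}) = H(S^{u_i}_{\alpha_k}) + \log_2(B_{u_i})$$

Because $0 \le B_{u_i} \le 1$, the term $\log_2(B_{u_i})$ is never positive, so $H_B$ never exceeds $H$, and the more lopsided a user's clicks are between the two categories, the larger the reduction.

Fig. 3 Updated and re-calculated use diffusion pattern showing new frequency and variety calculations

Fig. 4 Step 2 overview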
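The balance penalty and the penalized entropy $H_B$ can be sketched as follows. The rule that assigns a click to the first category (for the CCS, interactive resources rather than publisher materials) is application-specific, so the `in_category_a` predicate and the `/ir/` URL prefix in the example are hypothetical. The text does not say how the case $B = 0$ is handled; returning negative infinity there is our assumption.

```python
import math


def category_counts(clicks, in_category_a):
    """Split one user's clicks into (category a count, category b count)."""
    n_a = sum(1 for c in clicks if in_category_a(c))
    return n_a, len(clicks) - n_a


def balance(n_a, n_b):
    """B = 1 - |n_a - n_b| / (n_a + n_b): 1 for an even split, 0 if one side is empty."""
    total = n_a + n_b
    if total == 0:
        raise ValueError("user has no clicks in either category")
    return 1 - abs(n_a - n_b) / total


def penalized_entropy(h, n_a, n_b):
    """H_B = H + log2(B); the added term is never positive because 0 <= B <= 1."""
    b = balance(n_a, n_b)
    if b == 0:
        return -math.inf  # assumption: the B = 0 case is not specified in the text
    return h + math.log2(b)


if __name__ == "__main__":
    # Hypothetical clicks; "/ir/" marks interactive-resource clicks in this toy example.
    clicks = ["/ir/1", "/ir/2", "/ir/1", "/pdf/7"]
    n_a, n_b = category_counts(clicks, lambda url: url.startswith("/ir/"))
    print(n_a, n_b, balance(n_a, n_b), penalized_entropy(1.5, n_a, n_b))
```

For instance, a 75/25 split gives B = 0.5 and lowers H by exactly one bit, while a 50/50 split leaves H unchanged; this is the sense in which balanced ratios go largely unaffected.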

Figure 3 shows the results of the new calculations. Plotted against the means of the original data of Fig. 6, there are now more users in the specialized category, indicating that the penalty calculation did indeed improve the separation of users. While this new calculation is not nearly as specific as the clustering done in the next step, it is a step toward re-balancing the diffusion pattern so that it accounts more robustly for the breadth and depth of use discussed previously.

3.2 Step 2: User typology modeling (Fig. 4)

While use diffusion patterns provide domain-independent quadrants of generalized usage behavior, to understand fine-grained user behavior we apply data mining algorithms, specifically clustering. Having discussed the challenges of the variety variable in step 1 above, in the second step we selected features from the clickstream data that expand variety and frequency in order to discover more detailed views of user behavior. Since clickstream data provide a good basis for computing variety through entropy, selecting features that model variety in more detail, such as the usage of specific system components, yields a higher-fidelity view of user behavior. These refinements and the application of clustering expand the coarse-grained use diffusion patterns into a fine-grained user typology that continues to model frequency and variety.

Fig. 5 The CCS offers four major capabilities: access to publisher material (IES Investigations tab), access to digital library resources (Interactive Resources tab), personalization capabilities (My Stuff tab), and community features (Shared Stuff tab). The Interactive Resources component is shown open, displaying the top recommended digital resources.

The first aspect of refinement requires that we expand frequency and variety by choosing more detailed features to model those variables. In this study, frequency is expanded to include both sessions and total time spent within the system. Variety is expanded to include eight features that capture varying usage across system components, including publisher materials, digital library components, and user-contributed/social features of the system. While there are a number of clustering algorithms to choose from for this second step, these experiments are based on the widely used model-based expectation maximization (EM) algorithm [8]. EM works by iteratively examining the features of each object instance to be clustered and building a probability distribution that best explains where each instance should belong. After many iterations, the model settles into a set of clusters, each represented by the model parameters derived for it. The resulting clusters comprise a fine-grained user typology. The EM algorithm is used in these experiments because it is fast, robust, and typically converges quickly. Furthermore, cluster shapes (e.g., circles, ellipsoids) may vary, allowing more flexible cluster membership.
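The text does not report its EM implementation or settings, so the following is only a from-scratch sketch of one common form of model-based EM clustering: a Gaussian mixture with diagonal covariances, whose clusters are axis-aligned ellipsoids. The z-score standardization, farthest-first initialization, fixed iteration count, variance floor, and all names are our assumptions, not the authors' configuration. Each row of `X` would hold one teacher's feature vector (here, the two frequency features and eight variety features described above).

```python
import math
import random
from statistics import mean, pstdev


def standardize(X):
    """Z-score each feature column so sessions, time and click counts share a scale."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant columns are left unscaled
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]


def em_diagonal_gmm(X, k, n_iter=100, seed=0, min_var=1e-6):
    """EM for a k-component Gaussian mixture with diagonal covariances.

    Returns (weights, means, variances, labels); labels are hard assignments
    taken from the final E-step.
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Farthest-first initialization: one random point, then repeatedly the
    # point farthest from all means chosen so far.
    means = [list(X[rng.randrange(n)])]
    while len(means) < k:
        far = max(X, key=lambda x: min(sum((a - b) ** 2 for a, b in zip(x, m))
                                       for m in means))
        means.append(list(far))
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point, computed in
        # log space and normalized with the log-sum-exp trick.
        resp = []
        for x in X:
            logs = []
            for j in range(k):
                lp = math.log(weights[j])
                for t in range(d):
                    v = variances[j][t]
                    lp -= 0.5 * (math.log(2 * math.pi * v) + (x[t] - means[j][t]) ** 2 / v)
                logs.append(lp)
            top = max(logs)
            ex = [math.exp(lp - top) for lp in logs]
            total = sum(ex)
            resp.append([e / total for e in ex])
        # M-step: re-estimate mixing weights, means and per-feature variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # avoid division by zero
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[t] for r, x in zip(resp, X)) / nj for t in range(d)]
            variances[j] = [max(min_var, sum(r[j] * (x[t] - means[j][t]) ** 2
                                             for r, x in zip(resp, X)) / nj)
                            for t in range(d)]
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels
```

A typical call would be `em_diagonal_gmm(standardize(X), k)`. Choosing k, for example by comparing likelihood-based criteria across several candidate values, is outside this sketch; full covariance matrices would allow arbitrarily oriented ellipsoids at the cost of more parameters per cluster.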

4 Case study

For the remainder of this paper, we focus on the case study that examines the use of the CCS. The CCS is a National Science Foundation funded program overseen by Digital Learning Sciences (DLS), a joint institute of the University of Colorado at Boulder and the University Corporation for Atmospheric Research. DLS began development of the CCS in early 2008, and in July 2009 the CCS was made available to all Earth science teachers in a large urban school district in the Midwest. Over 100 teachers were trained on the CCS for use in the 2009–2010 school year.

Fig. 6 Use diffusion pattern comparison for step 1 from [21] and the new entropy with penalty calculation, showing frequency and variety of the data source

The CCS provides four major features to the end user (see Fig. 5). First, it provides users with Web-based access to digital versions of the paper-based student textbooks, teacher manuals, and curriculum guides that comprise the Earth science curricula for both Grade 6 and Grade 9. The manuals and guides outline the state standards that must be met, explain how the various units in the Earth science curricula are connected to state standards, and provide supplementary materials for teacher use, such as activities, teaching tips, and student assessments. These materials are all grouped under a single user interface component and are organized by key concept, which allows teachers to organize their lessons in a manner that flexibly meets the learning needs of their students. The digital versions of these curricular materials are identical to what was already available to teachers in paper form, but can now be accessed from any computer with a Web connection.

Second, the CCS integrates digitized publisher content with interactive resources available from the Digital Library for Earth System Education (DLESE), a collection of Earth science related digital resources that are part of the National Science Digital Library. By clicking on the Interactive Resources tab, a user can see recommendations for animations, video clips, classroom activities, and other digital resources that pertain to the given key concept. The interactive resources available via the CCS have been vetted by the experts who manage the DLESE collection. Moreover, these resources are filtered by the system to ensure that they align with the Earth science curricula. Thus, when a teacher accesses a DLESE resource on, for example, volcanoes, the resource has not only been determined to have educational value by a subject matter expert, but has also been tied to the specific science concepts that must be taught, as well as the science standards that must be met.

The third major feature of the CCS is an interactive Web 2.0 capability whereby teachers can save digital resources recommended to them via the Interactive Resources component, or upload their own resources, to an area called My Stuff, thus storing teacher-developed materials in the same space as interactive resources from DLESE or from the curriculum for easy access. Once a resource is saved to My Stuff, teachers have the option to share a copy of the resource to an area of the CCS called Shared Stuff, which is accessible to any CCS user who clicks on the Shared Stuff component associated with a given key concept.

The final major feature enables teachers to share materials with their peers. When a digital resource is added to Shared Stuff, the teacher who originally uploaded the resource, as well as other CCS users, can add searchable tags or keywords, so that any search of the CCS system for those tags will list all resources tagged with the same keyword or phrase. Finally, CCS users can add "star ratings" to resources so that other users can determine how their colleagues rate a given resource. A resource that many users rate highly (four or five stars) might be more likely to capture the attention of system users than a resource with a low rating. Hew and Hara [15] argue that this kind of sharing may be a catalyst for enabling improvement of practice, because such knowledge sharing tends to be tied to shared, situated instructional goals and challenges, and is thus more likely to be relevant to a teacher's immediate, short-term needs. A more detailed description of the CCS, including results from the field trial, can be found in Sumner et al. [34].

Table 1 Summary statistics for the data plotted in Fig. 6 (N = 98)

Statistic    Entropy    Lifetime sessions
Min          0.48063    1.00
1st Qu.      3.559      7.00
Median       4.834      23.00
Mean         4.535      37.85
3rd Qu.      5.830      48.00
Max          7.336      171.00
Std. Dev.    1.56       42.201

4.1 Data source and step 1

The data source for this study was clickstream data of 98 users from 9 months of interaction within the CCS. The data for these experiments were particularly interesting because the user interface environment contains many dynamic client-side components, which would not ordinarily be captured in a traditional web server log. The system was therefore instrumented to extract additional information from clicks on these dynamic client-side interface components. As a result, the CCS clickstream log includes detailed tracking of user activity and provides rich user interaction data. Since we were able to capture these data over a relatively long period of time, we were able to analyze actual user behavior as the users worked with the system in a natural, unhindered manner. Furthermore, we were able to examine how publisher materials, which are already core to standard teacher practice, are complemented by digital library resources, giving us a broader view of how teachers integrate these digital library resources into their traditional practices.
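Lifetime sessions and total system hours (the first two features in Table 2) must be derived from raw click timestamps. The following is a minimal sketch, not the CCS implementation: it assumes a hypothetical 30-minute inactivity timeout (the paper does not state how sessions were delimited) and measures each session from its first to its last click.

```python
from datetime import datetime, timedelta

# ASSUMPTION: sessions are split at idle gaps longer than 30 minutes.
# The paper does not state how CCS sessions were delimited; this value is hypothetical.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(timestamps, timeout=SESSION_TIMEOUT):
    """Group one user's click timestamps into sessions separated by idle gaps > timeout."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)  # gap is short enough: continue the current session
        else:
            sessions.append([ts])  # gap too long (or first click): start a new session
    return sessions


def lifetime_sessions_and_hours(timestamps, timeout=SESSION_TIMEOUT):
    """Return (number of sessions, total hours); a session lasts from its first to its last click."""
    sessions = sessionize(timestamps, timeout)
    seconds = sum((s[-1] - s[0]).total_seconds() for s in sessions)
    return len(sessions), seconds / 3600.0


clicks = [
    datetime(2009, 9, 1, 8, 0),
    datetime(2009, 9, 1, 8, 20),
    datetime(2009, 9, 1, 8, 45),
    datetime(2009, 9, 2, 14, 0),
]
print(lifetime_sessions_and_hours(clicks))  # (2, 0.75)
```

A single-click session contributes zero hours under this rule; a real log would likely need a minimum session length or client-side heartbeat events to estimate time on page.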

As a result of step 1, we obtained initial frequency and variety computations. Table 1 shows the descriptive statistics resulting from the first step of our experimental method. The data show that the mean entropy of our population is 4.48 and the mean frequency, here measured as the lifetime sessions that a user logged, is 37.85.
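Step 1 requires a variety score per user and population summaries like those in Table 1. A minimal sketch, assuming plain Shannon entropy over each user's action types as a stand-in for the paper's penalized clickstream entropy, whose exact formula is not given in this section:

```python
import math
import statistics
from collections import Counter


def clickstream_entropy(actions):
    """Shannon entropy (in bits) of the distribution of action types in one user's clickstream.

    ASSUMPTION: plain Shannon entropy; the paper's improved entropy adds
    penalties that are not specified here.
    """
    counts = Counter(actions)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def summarize(values):
    """Descriptive statistics with the same rows as Table 1."""
    # "inclusive" quartiles use the same linear interpolation as R's default (type 7)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.mean(values),
        "3rd Qu.": q3,
        "Max": max(values),
        "Std. Dev.": statistics.stdev(values),  # sample standard deviation
    }


print(clickstream_entropy(["view", "save", "view", "save"]))  # 1.0
print(summarize([1, 2, 3, 4, 5])["Median"])  # 3.0
```

A user who only ever performs one kind of action scores 0, and scores grow as activity spreads evenly over more action types, which is the "variety" axis of the use diffusion plot.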

When we apply the use diffusion model to the data by dividing each axis at its mean, we obtain a use diffusion pattern that shows a larger limited use group (n = 40) than the previous entropy calculations produced (n = 38). Similarly, there are fewer intense users (n = 26; entropy > 4.54, frequency > 37.85), that is, users who exhibit both greater variety and greater frequency. This is a decrease of four users (about 13 %) from the initial computation (n = 30). The plot also shows that there are now five users in the specialized category (n = 5), where previously there was only a single point in that category.

Table 2 CCS cluster experiment features and descriptions

No. | Feature label | Description
1 | Sessions | Total lifetime sessions
2 | Hours | Total system hours
3 | IR Activity | Total activity within interactive resources
4 | IR Saving | Total interactive resource saving behavior
5 | Shared Stuff Activity | Total user-contributed content activity
6 | Shared Stuff Saving | Total user-contributed content saves
7 | My Stuff Activity | Total “My Stuff” activity
8 | My Stuff Saving | Total “My Stuff” saving behavior
9 | Publisher–Teacher Materials | Total activity within publisher–teacher materials
10 | Publisher–Student Materials | Total activity within publisher–student materials

The key improvement of the new calculations over the initial calculation is that they spread the data out considerably; this change can be seen both in the standard deviation (from 1.25 up to 1.56) and in the visual spread of the graph. It is encouraging that there are now larger penalties for higher entropy, but further generalizable changes could still be made.

As can be seen in Fig. 6, the value of use diffusion pattern modeling is that it provides a comprehensive overview of coarse-grained behavior within the system. The dotted line divides the graph at the means of each axis, thus creating the four use diffusion quadrants. We can quickly see the distribution of basic usage patterns within the system. At the same time, however, the pattern does not produce enough information to determine the specific details of use. To do this, we must turn to the second step of the method.
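The quadrant split described above can be sketched as a small function. This assumes frequency is lifetime sessions and variety is entropy, treats a value exactly at the mean as "low" (a hypothetical tie rule), and uses the use diffusion quadrant names: specialized use is high frequency with low variety, and non-specialized use is low frequency with high variety.

```python
from statistics import mean


def use_diffusion_quadrants(users):
    """Map each user id to a use diffusion quadrant, splitting both axes at their means.

    users: {user_id: (frequency, variety)}, e.g. (lifetime sessions, entropy).
    Values equal to a mean count as "low" (hypothetical tie rule).
    """
    f_mean = mean(f for f, _ in users.values())
    v_mean = mean(v for _, v in users.values())
    labels = {}
    for uid, (freq, variety) in users.items():
        high_f, high_v = freq > f_mean, variety > v_mean
        if high_f and high_v:
            labels[uid] = "intense"
        elif high_f:
            labels[uid] = "specialized"  # frequent but narrow use
        elif high_v:
            labels[uid] = "non-specialized"  # broad but infrequent use
        else:
            labels[uid] = "limited"
    return labels


example = {"a": (10, 1.0), "b": (50, 5.0), "c": (50, 1.0), "d": (10, 5.0)}
print(use_diffusion_quadrants(example))
```

Because the thresholds are population means, a single very heavy user can shift every other user's quadrant; this is one reason the coarse step 1 picture needs the finer step 2 clustering.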

4.2 Feature selection and step 2

We began the second step by choosing 10 features of our data for further analysis. The features for this step of the method represent four major functions of the system: (1) use of digital library-related system functions, (2) use of traditional publisher materials and related system functions, (3) system functions that involve personalization, and (4) user-contributed functions. Table 2 summarizes the system features used in the second step of our proposed method.
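Building a step 2 feature vector means attributing each logged click to one of the Table 2 features. A sketch under stated assumptions: the log format of (component, action) pairs, the component names, and the rule that a save also counts as activity are all hypothetical, not the CCS schema.

```python
from collections import Counter

# HYPOTHETICAL component names; each maps to its (activity, saving) features in Table 2.
COMPONENTS = {
    "interactive_resources": ("IR Activity", "IR Saving"),
    "shared_stuff": ("Shared Stuff Activity", "Shared Stuff Saving"),
    "my_stuff": ("My Stuff Activity", "My Stuff Saving"),
    "publisher_teacher": ("Publisher–Teacher Materials", None),
    "publisher_student": ("Publisher–Student Materials", None),
}

# Feature order follows Table 2.
FEATURES = [
    "Sessions", "Hours",
    "IR Activity", "IR Saving",
    "Shared Stuff Activity", "Shared Stuff Saving",
    "My Stuff Activity", "My Stuff Saving",
    "Publisher–Teacher Materials", "Publisher–Student Materials",
]


def feature_vector(events, sessions, hours):
    """Aggregate one user's (component, action) events into a Table 2-ordered feature vector."""
    counts = Counter()
    for component, action in events:
        if component not in COMPONENTS:
            continue  # ignore clicks outside the tracked components
        activity, saving = COMPONENTS[component]
        counts[activity] += 1  # every click in a component counts as activity
        if action == "save" and saving is not None:
            counts[saving] += 1  # saves are additionally counted as saving behavior
    counts["Sessions"], counts["Hours"] = sessions, hours
    return [counts[name] for name in FEATURES]
```

Session counts and hours are passed in separately because they come from timestamp-based sessionization rather than from per-click component labels.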

4.2.1 Digital library features

The CCS is specifically designed with the goal of providing access to high-quality digital library resources. Analyzing usage of the embedded digital library within the CCS should therefore provide useful data about typical teacher practices around digital resources. To capture this behavior, we selected features that detail the clickstream patterns of the embedded digital library resources within the system. These resources were presented in the user interface under four sub-categories: Top Picks, Animations, Images/Visuals, and Inquiry with Data. Each category contained resources from DLESE that were either selected as highly relevant to the subject materials and unit focus (Top Picks) or contained metadata of the appropriate type, scope, and topic (Animations, etc.). The items presented under each of these views were derived directly from DLESE web services [37] and appropriately presented to the user. Clickstreams into this component are tracked with the IR Activity and IR Saving features.

4.2.2 Publisher materials features

Publisher materials are included as a core component of the system functionality. The majority of the publisher’s items are digitized versions of paper-based materials, whether book chapters or supplemental materials such as hand-outs, assessments, etc. The features here are therefore convenient digital proxies for real-world, paper-based analogs. Clickstreams into this component of the system and its sub-components were tracked and organized with the Publisher–Student Materials and Publisher–Teacher Materials features. The Publisher–Student Materials include publisher materials such as digital versions of the student textbook, while the Publisher–Teacher Materials include supplemental publisher materials such as instructional support materials.

4.2.3 Personalization features

The CCS provides functionality that allows teachers to personalize the contents of their accounts; in particular, users are provided with the ability to save digital materials that they find of interest. Once saved, items may be retrieved for further review and may even be shared with others if desired. Saving is considered a personalization feature because it provides direct control over items that may fit a specific need (either at the time of saving or in the future). Furthermore, saving implies an interest in the saved item, and while that interest may only last for a short time, it nonetheless acts as a marker of personalization behavior. Personalization behavior was captured through the saving behavior of the system, via the My Stuff Saving feature. For example, teachers are able to save embedded digital library resources such as animations, images, visuals, and top picks for the units of study they may be interested in. The My Stuff Activity feature represents the total activity performed within the My Stuff features of the system.

Table 3 Experimental cluster results showing the parameters obtained for each feature and cluster, where N indicates the number of users in each cluster

Feature | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5
Sessions | 4.934 | 47.096 | 119.551 | 26.131 | 83.897
Hours | 10.125 | 26.702 | 52.586 | 8.427 | 24.984
IR Activity | 2.483 | 74.253 | 95.537 | 26.801 | 17.109
IR Saving | 0.321 | 4.703 | 26.039 | 1.218 | 0.649
Shared Stuff Activity | 3.382 | 25.753 | 81.159 | 17.558 | 82.174
Shared Stuff Saving | 0.356 | 4.314 | 18.942 | 1.877 | 3.701
My Stuff Activity | 0.123 | 4.864 | 19.116 | 1.106 | 0.860
My Stuff Saving | 0.741 | 18.623 | 11.492 | 0.388 | 4.913
Publisher–Student Material | 1.421 | 21.715 | 98.143 | 14.212 | 74.404
Publisher–Teacher Material | 4.781 | 28.790 | 86.070 | 14.649 | 92.306
N | 31 | 10 | 8 | 35 | 14

4.2.4 Community behavior features

There are features of the system that promote community-centric behaviors. For example, resources and other materials that users find interesting can be shared with the community at large in a kind of community pool of resources called “Shared Stuff.” This feature has many implications when considering the nature of communities of practice among K-12 educators, who are often encouraged to share materials, pedagogical strategies, and best practices amongst their peers. Community behaviors are captured with the Shared Stuff Activity and Shared Stuff Saving features.

4.3 A user typology

Our second step relies on feature clustering to develop a user typology. There are many clustering algorithms to choose from, but for this set of typology experiments we chose the model-based EM algorithm [8]. Elsewhere, we describe other experiments using other clustering algorithms and parameters [22]. Using the Bayesian Information Criterion (BIC) to determine the optimal number of clusters for a given dataset, we determined that there were five clusters to be discovered in our experiments. The details of the clusters are shown in Table 3. It should be noted that the clusters presented here show representative members of each cluster and do not necessarily represent actual users. That is, for each cluster in the table, the values are the fitted parameters that best describe the object instances that belong within the cluster.

As can be seen in Table 3, cluster 1 characterizes the low-use pattern (low variety, low frequency) of the step 1 use diffusion pattern. The users in this cluster have produced very few hours of total use within the system. Furthermore, they do not seem to be using the full range of system features. On the other hand, the experiments revealed an intense user cluster (cluster 3) that shows robust use of the system. Indeed, this “intense user” category seems to have used the system in full, exercising nearly every aspect of the system and logging both large numbers of sessions and significant hours of use. Two specialized user groups emerged from the typology. While both show about the same number of total hours within the system, cluster 5 identifies users who spend a great deal of time with the community features and publisher materials of the system, whereas cluster 2 shows users who spend far more time within the interactive resources and embedded digital library component of the system. These clusters are valuable to understand because they suggest that some users find the embedded digital resources as important as others find the community features. Finally, cluster 4 shows a group of users whose session count was above the median but below that of the intense and specialized users. This cluster also exhibits broad use of most of the features, with slightly more use of interactive resources. Table 4 summarizes each cluster, giving its key characteristics and the diffusion pattern to which it belongs. We have also created mnemonic typology labels to provide an easy way to remember the cluster characteristics.
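The contrast between the two specialized clusters can be checked directly against the cluster parameters. A small illustration using values transcribed from Table 3 (each flattened entry read with three decimal places); the ratio of interactive-resource activity to Shared Stuff activity is a summary statistic introduced here for illustration, not one the paper reports.

```python
# Selected cluster parameters transcribed from Table 3 (clusters 1-5, in order).
TABLE3 = {
    "IR Activity": [2.483, 74.253, 95.537, 26.801, 17.109],
    "Shared Stuff Activity": [3.382, 25.753, 81.159, 17.558, 82.174],
    "Hours": [10.125, 26.702, 52.586, 8.427, 24.984],
}


def ir_to_shared_ratio(table):
    """Ratio of interactive-resource activity to Shared Stuff activity, keyed by 1-based cluster number."""
    ir, shared = table["IR Activity"], table["Shared Stuff Activity"]
    return {c + 1: ir[c] / shared[c] for c in range(len(ir))}


ratios = ir_to_shared_ratio(TABLE3)
for cluster in (2, 5):
    hours = TABLE3["Hours"][cluster - 1]
    print(f"cluster {cluster}: hours={hours:.1f}, IR/Shared={ratios[cluster]:.2f}")
# cluster 2: hours=26.7, IR/Shared=2.88
# cluster 5: hours=25.0, IR/Shared=0.21
```

With total hours nearly equal, the ratio flips by more than an order of magnitude between the two clusters, which is what separates the “Interactive Resource Specialist” from the “Community Seeker Specialist.”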

Table 4 An initial user typology derived from step 2, showing the diffusion patterns and characteristics of each usage type

Cluster | Diffusion pattern | “Typology label” | Characteristics
1 | Limited Use | “Uninterested Non-Adopter” | Very low overall system use
2 | Specialized Use | “Interactive Resource Specialist” | Heavy use of interactive resources relative to other system features. Tends to access system weekly
3 | Intense Use | “Ardent Power User” | Heavy and robust overall use of all features. Tends to access system daily
4 | Non-Specialized Use | “Moderate Generalist” | Moderate overall system use. Shows slightly more use of interactive resources than other system features. Tends to access system several times monthly
5 | Specialized Use | “Community Seeker Specialist” | Makes heavy use of Shared Stuff features and Publisher materials relative to Interactive Resource Activity. Tends to use the system weekly

4.4 Validation of results with field research findings

It is important to note that the method which produced the user typology clusters is not simply a new kind of analysis; rather, it presents an opportunity to bridge different analytical approaches. Although the clusters that emerged from our computational approach are based on clickstream data, these clusters are not arbitrary aggregations of user behaviors. They map onto real-world usage patterns that emerged from the traditional educational research techniques used in our study of the CCS. Thus, the findings from our computational method are validated by the findings from our field research of CCS use “in the wild.”

The CCS field trial used a mixed-method research design [19, 4, 10]. We collected quantitative data via surveys in a longitudinal fashion from all DPS Earth science teachers (n = 124) during the 2009–2010 academic year. In addition, we collected qualitative data from a large subset of teachers during the same period. See Saldivar [29] for a complete discussion of the CCS field trial; here we summarize our data sources, sample sizes, and analysis techniques.

– Survey 1 (n = 85): A pre-survey administered at the beginning of the field trial to all Earth science teachers to establish a baseline of attitudes and behaviors regarding educational technology (including interactive digital resources), DPS Earth science curricular materials, and differentiated instruction. Data were collected via both quantitative (constrained-response) and qualitative (free-response) items. Quantitative survey items were analyzed by determining counts and frequencies of responses. Qualitative items were analyzed via content analysis. (Subsequent surveys also collected both quantitative and qualitative data and were analyzed in the same manner.)

– Survey 2 (n = 84): A survey administered at the mid-point of the field trial to all Earth science teachers to establish a baseline of their general access to, and attitudes towards, instructional technology tools in general and the CCS in particular.

– Survey 3 (n = 81): A post-survey administered at the end of the field trial to all Earth science teachers. This final survey comprised a number of items included in the first two surveys to enable a pre- and post-analysis of changes, if any, in CCS users’ attitudes and behaviors during the course of the school year.

– Adoption interviews (n = 24): These semi-structured telephone interviews regarding teachers’ adoption and use of the CCS tool were administered to teachers who were regular-to-frequent CCS users and resulted in an extensive database of qualitative data (interview transcripts), which were analyzed via content analysis.

– Classroom observation cycles (n = 8): Teachers identified as frequent CCS users were each recruited to participate in several hours of interviews and classroom observations per teacher during the course of the school year, resulting in an extensive database of qualitative data (interview transcripts and field notes), which were analyzed via content analysis.
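Quantitative survey items were summarized as counts and frequencies of responses, and later results are reported as the share of respondents who agreed or strongly agreed. A minimal sketch of that tally; the response labels and the toy responses are hypothetical, not data from the field trial.

```python
from collections import Counter

AGREE = {"agree", "strongly agree"}  # hypothetical Likert labels


def response_frequencies(responses):
    """Count each response option and its relative frequency for one survey item."""
    counts = Counter(r.strip().lower() for r in responses)
    total = sum(counts.values())
    return {option: (n, n / total) for option, n in counts.items()}


def share_agree(responses):
    """Fraction of respondents who chose 'agree' or 'strongly agree'."""
    freqs = response_frequencies(responses)
    return sum(share for option, (_, share) in freqs.items() if option in AGREE)


toy = ["Agree", "Strongly agree", "Neutral", "Disagree"]  # hypothetical responses
print(response_frequencies(toy))
print(share_agree(toy))  # 0.5
```

Normalizing case and whitespace before counting keeps free-form exports (for example, "Agree" versus "agree ") from splitting one option into two.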

Analysis of these field trial data, conducted independently of the computational analysis of clickstream data described above, suggested that teachers in our study fell onto a spectrum of system use behavior, from low-frequency users (corresponding to cluster 1) to moderate (cluster 4) and heavy (cluster 3) users. The remaining two clusters fell on the moderate-to-heavy side of the spectrum. Rogers’ [27] theory of diffusion of innovation, discussed in our introductory section, predicts that the earliest users of a new technology (“innovators” and “early adopters”) comprise approximately 16 % of all users. Moore [24], who revised and extended Rogers’ work, further argues that a technology cannot become “mainstream” within a given population until it is adopted by at least half of all potential users. Our step 2 analysis indicates that about two-thirds of teachers in the district adopted the system to a significant degree, with one-third of users (represented by clusters 2, 3, and 5) making heavy use of the system. Even if we confine ourselves to a theoretical framework such as Rogers’, which focuses on quantifying when different segments of a target population adopt an innovation, our findings indicate that the CCS has been strongly adopted and is heavily used by the target population.
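The “two-thirds” and “one-third” figures can be checked against the cluster sizes in Table 3 (N = 31, 10, 8, 35, 14; 98 users in total). A short check of the arithmetic, assuming (as our reading of the text) that every cluster except the limited-use cluster 1 counts as having adopted the system to a significant degree:

```python
# Cluster sizes (N) from Table 3.
SIZES = {1: 31, 2: 10, 3: 8, 4: 35, 5: 14}


def share(clusters, sizes=SIZES):
    """Fraction of all users that fall in the given clusters."""
    return sum(sizes[c] for c in clusters) / sum(sizes.values())


heavy = share({2, 3, 5})  # intense and specialized clusters
adopters = share({2, 3, 4, 5})  # everyone except the limited-use cluster (our reading)
print(f"heavy users: {heavy:.1%}, adopters: {adopters:.1%}")
# heavy users: 32.7%, adopters: 68.4%
```

Both shares clear the thresholds cited above: heavy users exceed the roughly 16 % attributed to innovators and early adopters, and adopters exceed Moore's one-half threshold for becoming mainstream.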

It can be argued that identifying low, moderate, and intense user clusters is hardly a profound finding; after all, any set of system users can be divided into “low” and “heavy” user categories if the major variable of interest is frequency of system use. Recall, however, that use diffusion theory calls for an analysis that combines frequency of use with an evaluation of how a given technology is used. Seen through the lens of use diffusion, the most salient finding from our computational method is that two of the clusters, cluster 2 (the Interactive Resource Specialist) and cluster 5 (the Community Seeker Specialist), represent “variety of use” behaviors similar to user behaviors identified by our field research. This enables us to validate the findings of our computational method with real-world data, while at the same time giving us deep insights into how teachers integrated digital library resources into their instructional practices.

For example, Interactive Resource Specialists were moderately heavy users who spent most of their time in the Interactive Resources/Embedded Digital Library component of the CCS. Data from the final survey show that many teachers highly valued interactive resources. In response to the statement “Using interactive resources effectively is important to my personal success as a teacher,” about three-fourths of respondents agreed or strongly agreed. For the statement “Using interactive resources in the classroom enables me to better meet the learning needs of students in my classroom,” almost 90 % agreed or strongly agreed. Qualitative data provide insight into these survey findings.

In response to an interview question that asked him to discuss why he used the CCS, teacher Corey told us: “My teaching practices have always focused on student engagement. I found that the CCS made it easier for me to find [interactive] resources with which I could capture student attention.”

Many teachers reported that they accessed digital resources to supplement textbook material. Overwhelmingly, the most popular kinds of digital resources teachers found via the CCS were graphic representations (e.g., pictures, diagrams, animations) of Earth science phenomena. In the following example, note Carlie’s observations regarding students’ engagement with, and understanding of, key concepts: “I think that since students are visual learners, they’re hands-on learners, they’re growing up in this technological age. Seeing the animations really drives home the idea or the topic. I ask them if it does help them [to see graphic representations], if whatever they just viewed made the material make more sense, and oftentimes they [say], ‘Wow, that totally made sense. Can we see it again?’ I think they benefit from it a lot, and they’re vocal about it. They love it.”

Liz’s experience was similar to Carlie’s in that her students seemed excited simply by the presence of digital resources, which led to increased engagement and deeper comprehension. Liz told us in an interview: “I think in general my students are really interested in seeing any kind of [digital resource]. [Digital resources are] really engaging for students because it’s different than what they normally see in the classroom. When they walk in, they’re just super-excited when they see the projector’s set up. [They say,] ‘Oh yes, we get to watch a video,’ or ‘We get to see a PowerPoint.’ They come in automatically excited to learn.”

In contrast, Community Seeker Specialists had CCS usage frequencies comparable to those of Interactive Resource Specialists but spent most of their time in the Shared Stuff and Publisher components of the system. Publisher materials had utility for teachers because they were directly related to the Earth science curricula used by the district, but why were Shared Stuff materials popular? Survey data suggest that the ability to see other teachers’ uploaded materials gave CCS users insights into the community of Earth science educators in their district that were not possible before the system was made available. For instance, in the final survey, almost 61 % of respondents agreed or strongly agreed with the statement “The CCS has increased my awareness of other teachers’ practices.” Approximately half of respondents agreed or strongly agreed with the statement “The CCS has helped me become a more active member of the DPS professional learning community.” Qualitative data provide a context for these survey responses.

In response to an open-ended survey question that asked respondents to explain the value of accessing Shared Stuff, teacher Sheila stated: “Looking at the Shared Stuff uploaded by others gives me ideas about how I can present particular concepts in my [own] classroom.” In response to the same question, Norma commented: “[Resources in Shared Stuff] have given me different perspectives on the different topics [in the curriculum] and thus enabled me to teach more effectively.”

Henrietta told us in an interview: “When [another Earth science] teacher from across town might have found a great website or a video clip or something that really does bring the point home and makes it more relevant to the students, that’s what I’m looking for, and so far I have found some things like that on [the CCS].” Later in the same interview, Henrietta added that she and the other Earth science teachers at her school, all CCS users, often discussed useful resources that they had found using the CCS: “We certainly confer on [resources we discover]. [We tell each other,] ‘This was great’ [and] ‘This is something you need to [use] when you get to this [certain] point in your book.’”

5 Discussion

Step 1 of our method shows promise for the rapid determination of application-independent, large-grain usage patterns, while Step 2 produced a meaningful typology based on clusters that emerged from refined, application-dependent feature data. The computational generation of clusters that correspond with user behaviors identified by qualitative analysis of interview and survey data suggests that it is possible to build an understanding of user behaviors and motivations through quantitative examination of their behaviors within a system or application. This is only a first step, however, toward understanding the complex interplay between observable behaviors that occur within an application and behaviors that occur outside of that system.

The CCS’s main purpose was to encourage teachers to differentiate instruction by incorporating high-quality digital resources into their teaching practices. Thus, the system was designed to produce a specific outcome: teacher differentiation of instruction. Our field trial data validated the in-system behaviors characterized, for example, by the Interactive Resource Specialist and Community Seeker Specialist clusters. However, we also know from field trial data that teachers with similar in-system behaviors may have very different teaching practices in the classroom. It is also possible that teachers with similar classroom practices may exhibit different in-system behaviors. Ideally, then, the most complete understanding of a system’s impact on users would come from a method that takes into account both in-system and out-of-system behaviors, motivations, and characteristics.

As previously mentioned in Sect. 2.2, existing literature on digital library use offers some guidance. Researchers in the field of human–computer interaction have long used the concept of personas to guide the development of information systems. Personas are constructs devised by system developers to describe different types of system users; they most often take the form of fictional biographies that include demographic characteristics and descriptions of how and why users interact with the system. With personas, developers can begin to model and build systems that more specifically address the needs of actual users, which increases the chances that the system will serve its end users better and, in turn, enhances adoption of the system. Our method describes in-system behaviors, but by extending our approach to incorporate a full set of out-of-system behaviors, including, for example, demographic information, teaching behaviors, and student outcomes (that is, by developing personas of CCS users), we can not only develop a richer understanding of the relationship between system use and users’ practices “in the wild,” but also extend the notion of personas beyond system design to actual system adoption and use over time.

While we have obtained promising results from this study, there are several significant limitations. First, our method has only been applied to a single population. It is plausible that the results we obtained are specific to the population under examination. A different typology may emerge from larger data populations; for example, new specialized user types might emerge. Second, in order to validate that our results are robust and generalizable, it will be important to study the method with different applications and user populations. Third, while the computational method seems to support the analyses of qualitative data, there are opportunities to extend our understanding of system behavior and user characteristics to their impact on other meaningful behaviors outside the system.

As can be seen in the case study, both steps of the method yielded interesting results. It appears, however, that the entropy modeling of the first step requires more discriminatory power when describing specialized users. While the new calculations do a better job of penalizing usage, further modifications may be developed to separate the specialized group(s) more clearly. Furthermore, work still remains to develop a precise coarse-grained model that better separates every user group. The challenges revealed by the first step could also indicate that specialization does not always imply low variety; this becomes clear when considering the details of system usage once variety is expanded to include more variables, as in the extended entropy calculations and the second step of the method.

Applying automated typology discovery to inform, guide, and develop system user characteristics is one aspect of understanding within-system user behavior that may be extended to include broader behaviors and user characteristics through personas. Such an extension may prove to be a useful tool for understanding behavioral characteristics outside the system, so that an extremely rich description of system users emerges that includes both in-system and out-of-system behaviors. This would in turn allow more data to be used in making predictive and prescriptive tool (in-system), policy (out-of-system), and training (in-system and out-of-system) decisions.

6 Broader impacts

While this research has focused on demonstrating the value of our method for the study of digital library adoption, we believe it can help address other educational challenges, chief among them teacher professional development, the correlation of teacher practice to student learning, and the evaluation of teaching practices.

Extant research indicates that one of the major barriers to effective integration of technology into teaching practices is the dearth of quality professional development and training vis-à-vis technology [14]. Teachers often complain that even when technology-related training is made available, it does not meet their needs as practicing educators because it assumes that they do, or do not, already possess a given set of skills. By modeling teachers’ system use behavior, one will be able to understand inter-user differences and target system training and professional development to users’ true needs. For example, CCS users in the Community Seeker Specialist cluster might be offered training that would help them integrate more interactive resources into their teaching, because such users are known not to spend much of their CCS usage time exploring interactive resources. For teachers in the Interactive Resource Specialist cluster, such training would be redundant, because they are already making extensive use of the interactive resources available via the CCS.

While a consensus has emerged among policy makers and education researchers that effective adoption of technology can indeed have a positive impact on teacher practice [17, 32], the relationship between teachers’ adoption of technological systems and student achievement is still not well understood. As part of our larger study of the CCS, we are analyzing the standardized test scores of students taught by teachers who used the CCS to determine what impact the teachers’ use of the CCS had on student outcomes. Our analysis of student test score data for both our field trial year and the prior school year [28] shows increased overall achievement among Earth science students after the introduction of the CCS; due to the limitations of the data available to us, however, we cannot make strong causal claims. We continue our efforts to understand what link, if any, exists between system use and student outcomes. In the long term, the ability to correlate system use behaviors with students’ academic achievement will make it possible to focus on the teacher use behaviors that most benefit students; in turn, these behaviors could be taught to other teachers.

Evaluating teachers’ instructional practices is a very difficult task that requires a massive commitment of human and financial resources [5]. For example, most evaluation systems rely on administrators to observe teachers in the classroom, a very labor-intensive approach that can result in teachers being awarded or denied tenure or pay raises based on the short period of time during which they were observed. Further, variance in evaluators’ adherence to evaluation rubrics and bias in the evaluation instruments themselves can make it difficult to assess the validity of the evaluation process [38]. Alternatively, asking teachers to self-report their teaching practices might provide evaluators with additional information, but such data would be subject to all the limitations that come with asking individuals to report on their own behaviors. Neutral “third party” data, such as the clickstream data to which we applied our computational method, can help bridge the gap between what evaluators observe during the relatively brief periods they visit a teacher’s classroom and the teacher’s activities when he or she is not being formally evaluated. A teacher evaluation process that combines traditional observational and self-reported data with usage data produced by systems like the CCS would give administrators, policymakers, and teachers themselves deeper insights into instructional practices; further, such a hybrid system would produce a rich new understanding of teaching-related best practices that could then be shared with other educators.

7 Conclusion

By applying models of adoption and use diffusion alongside data mining techniques, we have developed a two-step method for discovering patterns within clickstream data that reveal both general and specific typologies of digital library user behavior. The application of this method to data from an academic year-long field trial in a large urban school district provided a rich opportunity to study the adoption and usage behavior of embedded digital library resources by middle and high school science teachers. The method shows considerable promise for extracting useful behavioral patterns “in the wild”: the resulting fine-grained user typology maps well onto results emerging from traditional educational field research methods. The proposed method, while requiring more data for further validation, is a valuable contribution towards our effort to develop automatic methods for studying digital library adoption and use. When fully realized, this method also has the potential to be extended to other applications and areas, such as informing teacher professional development, understanding the impact of digital library applications on student learning, and developing new approaches to the evaluation of teacher practice.

Acknowledgments This material is based upon work supported in part by the NSDL program in the National Science Foundation under Awards 0734875 and 0840744, and by the University of Colorado at Boulder.

References

1. Ayers, E., Nugent, R., Dean, N.: Skill Set Profile Clustering Based on Student Capability Vectors Computed From Online Tutoring Data. In: Proceedings of the 1st International Conference on Educational Data Mining, Montreal, Canada, pp. 210–217 (2008)

2. Benevenuto, F., Rodrigues, T., Cha, M., Almeida, V.: Characterizing user behavior in online social networks. In: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement Conference, ACM, Chicago, Illinois, USA, pp. 49–62 (2009). doi:10.1145/1644893.1644900

3. Brandtzæg, P.B.: Towards a unified Media-User Typology (MUT): a meta-analysis and review of the research literature on media-user typologies. Comput. Hum. Behav. 26(5), 940–956 (2010). doi:10.1016/j.chb.2010.02.008. http://www.sciencedirect.com/science/article/B6VDC-4YJSW8D-1/2/011453cc70c0a6bdc29d3fde3b8a9304

4. Creswell, J.: Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research. Pearson Education, Upper Saddle River, NJ (2008)

5. Danielson, C., McGreal, T.L.: Teacher Evaluation To Enhance Professional Practice. Association for Supervision and Curriculum Development, Alexandria, VA, USA (2000)

123

64 K E Maull et al

6. Davis, F.D.: Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quart. 13(3), 319–340 (1989)

7. Deffuant, G., Huet, S., Amblard, F.: An individual-based model of innovation diffusion mixing social value and individual benefit. Am. J. Sociol. 110(4), 1041–1069 (2005)

8. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodological) 39(1), 1–38 (1977)

9. Dominguez, A.K., Yacef, K., Curran, J.R.: Data mining for generating hints in a python tutor. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 91–100 (2010)

10. Dzurec, L., Abraham, I.: The nature of inquiry: linking quantitative and qualitative research. Adv. Nurs. Sci. 16, 73–79 (1993)

11. Eynon, R., Malmberg, L.: A typology of young people’s internet use: implications for education. Comput. Educ. 56(3), 585–595 (2011). doi:10.1016/j.compedu.2010.09.020. http://www.sciencedirect.com/science/article/B6VCJ-517J24P-1/2/5c5d02fb284c227d3a9ef1cc4f3e1bf6

12. Fuller, F.F.: Concerns of teachers: a developmental conceptualization. Am. Educ. Res. J. 6(2), 207–226 (1969)

13. Hall, G.E.: The concerns-based approach to facilitating change. Educ. Horizons 57(4), 202–208 (1979)

14. Hanson, K., Carlson, B.: Effective Access: Teachers’ Use of Digital Resources in STEM Teaching. Gender, Diversity, and Technology Institute, Education Development Center, Inc., Newton (2005)

15. Hew, K.F., Hara, N.: Empirical study of motivators and barriers of teacher online knowledge sharing. Educ. Technol. Res. Dev. 55(6), 573–595 (2007)

16. Horrigan, J.: A typology of information and communication technology users. Research report, Pew Internet & American Life Project (2007)

17. Kelly, M.G., McAnear, A.: National Educational Technology Standards for Teachers: Preparing Teachers to Use Technology. International Society for Technology in Education (ISTE), Eugene (2002)

18. Lage, K., Maness, J., Losoff, B.: Receptivity to library involvement in scientific data curation: a case study at the University of Colorado Boulder. Portal Libr. Acad. 11(4), 915–937 (2011)

19. Leech, N., Onwuegbuzie, A.: A typology of mixed methods research designs. Qual. Quant. 43(2), 265–275 (2009)

20. Maness, J., Miaskiewicz, T., Sumner, T.: Using personas to understand the needs and goals of institutional repository users. D-Lib Mag. 14(9/10), 1082–9873 (2008)

21. Maull, K., Saldivar, M., Sumner, T.: Understanding digital library adoption: a use diffusion approach. In: Proceeding of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, ACM, pp. 259–268 (2011)

22. Maull, K.E., Saldivar, M.G., Sumner, T.: Online curriculum planning behavior of teachers. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 121–130 (2010)

23. Miaskiewicz, T., Sumner, T., Kozar, K.: A latent semantic analysis methodology for the identification and creation of personas. In: Proceeding of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, ACM, pp. 1501–1510 (2008)

24. Moore, G.A.: Crossing the Chasm: Marketing and Selling Technology Products to Mainstream Customers. HarperCollins, New York (2006)

25. Pennington, M.C.: Cycles of innovation in the adoption of information technology: a view for language teaching. Comput. Assist. Lang. Learn. 17(1), 7–33 (2004)

26. Ram, S., Jung, H.: The conceptualization and measurement of product usage. J. Acad. Mark. Sci. 18(1), 67–76 (1990). doi:10.1007/BF02729763. http://www.springerlink.com/content/9kjl7574145320mv

27. Rogers, E.M.: Diffusion of Innovations, 5th edn. The Free Press, New York (2003)

28. Saldivar, M.: Teacher integration of digital resources into instructional practice. CCS Report No. 4, Digital Learning Sciences, Boulder (2011)

29. Saldivar, M.G.: Teacher adoption of a Web-based instructional planning system. Doctoral dissertation, University of Colorado, Boulder, CO (2012)

30. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948)

31. Shih, C., Venkatesh, A.: Beyond adoption: development and application of a use-diffusion model. J. Mark. 68(1), 59–72 (2004). http://www.jstor.org/stable/30161975

32. Smerdon, B.: Teachers’ Tools for the 21st Century: A Report on Teachers’ Use of Technology. US Dept. of Education, Office of Educational Research and Improvement, Washington, DC (2000)

33. Straub, E.T.: Understanding technology adoption: theory and future directions for informal learning. Rev. Educ. Res. 79(2), 625–649 (2009)

34. Sumner, T., Team, C.: Customizing science instruction with educational digital libraries. In: Proceedings of the 10th Annual Joint Conference on Digital Libraries, ACM, JCDL ’10, New York, NY, USA, pp. 353–356 (2010). doi:10.1145/1816123.1816178

35. Turner, M., Kitchenham, B., Brereton, P., Charters, S., Budgen, D.: Does the technology acceptance model predict actual use? A systematic literature review. Inform. Software Technol. 52(5), 463–479 (2010)

36. Venkatraman, M.P.: The impact of innovativeness and innovation type on adoption. J. Retail. 67(1), 51–67 (1991)

37. Weatherley, J.: A web service framework for embedding discovery services in distributed library interfaces. In: Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL ’05), ACM, New York, NY, USA, pp. 42–43 (2005). doi:10.1145/1065385.1065394

38. Wilson, B., Wood, J.A.: Teacher evaluation: a national dilemma. J. Person. Eval. Educ. 10(1), 75–82 (1996)

39. Xu, B., Recker, M., Hsi, S.: Data deluge: opportunities for research in educational digital libraries. In: Cassie M. Edwards (ed.) Internet Issues: Blogging, the Digital Divide and Digital Libraries. Nova Science Pub Inc., New York (2010)


Typologies within digital libraries and other repositories have been used to explore the backgrounds, behaviors, and motivations of end users, and so to inform decisions about library user interfaces, possible design or service enhancements, or digital resource development. For example, personas, a type of user typology, have been used successfully to understand the characteristics and needs of institutional repository users [20] and the needs of users of library services supporting scientific data curation [18]. Typically, personas are created using a labor- and expertise-intensive qualitative research method that relies on extensive interviews and observations of users. The result of this qualitative research is a typology of users that describes their needs, motivations, and the contextual factors that might influence system adoption and use. Automatic computational methods for creating digital library personas have been explored by Maness et al. [20] and further by Miaskiewicz et al. [23].

Within educational data mining research, computational methods for identifying user typologies often use clustering algorithms to group students into different categories based on skill sets [1] or performance on a test or assessment [9]. Xu et al. [39] use clustering to identify and classify usage types of teachers: they examine features of teacher-generated projects within the Instructional Architect tool to create a typology of users based on the kinds of projects that they produced. As these examples illustrate, clustering algorithms are generally used to assign group membership among items with common attributes or features in large datasets. The computational method presented in this article also uses clustering to identify and classify usage types of teachers. This research differs from prior efforts in that the features selected for the clustering algorithm are theoretically motivated by the use diffusion model.

Thus, the outputs of our computational method are two inter-related user typologies: (1) a coarse-grained view of the user population segmented into use diffusion adopter categories, and (2) a fine-grained view of the same population segmented along the same two dimensions but using more detailed measures for variety and frequency. Classifying and categorizing users into groups is a common task in user behavior analysis. The output of adoption models is often a set of adopter categories, which are a particular type of user typology. The categories produced by the use diffusion model are analogous to those produced by Rogers’ diffusion of innovations model [27] (i.e., “early adopter” or “late majority”).

3 Computational method

The two-step method for this research is constructed to discover usage patterns and user typologies. The first step captures coarse-grained user categories, while the second step determines fine-grained typologies of system use. The next two sections examine the details of each step: their inputs, processes, and outputs.

Fig. 2 Step 1 overview

3.1 Step 1: Use diffusion patterns (Fig. 2)

To understand how use diffusion patterns are modeled, it is important to examine the frequency and variety dimensions of the model more fully. In this study, frequency is modeled as the number of user-initiated web sessions. While other frequency measures could be considered, this measure provides a good initial approximation of overall system use: fewer sessions imply lower system use, while more sessions imply higher system use. Variety, on the other hand, is more challenging to model because it is difficult to develop application-independent approaches to the concept. For the first step, we chose a variety metric based on aggregate user clickstreams. Intuitively, the clickstream of a particular user approximates their broad usage of the system. Furthermore, over time clickstreams become regular; that is, they become more predictable as users develop normal patterns of use within the system. By applying entropy, specifically Shannon entropy [30], over the lifetime clickstream of each user, we obtain a basic notion of variety that gives an approximate measurement of user behavior. Entropy has been used extensively in many systems to measure randomness and to approximate the amount of information being communicated in a system.
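A minimal Python sketch of the two step 1 measures, assuming a hypothetical log format in which each user's lifetime activity is a list of `(session_id, click_id)` pairs, where a click identifier might be a URL or an interface component. Variety is computed as the Shannon entropy, in bits, of the distribution of distinct click identifiers, and frequency as the number of distinct sessions; the function names are illustrative, not part of the CCS.

```python
from collections import Counter
from math import log2


def clickstream_entropy(clicks):
    """Shannon entropy (bits) of the distribution of distinct clicks.

    Each term is written as p * log2(1/p), which equals -p * log2(p).
    """
    if not clicks:
        return 0.0
    total = len(clicks)
    return sum((n / total) * log2(total / n) for n in Counter(clicks).values())


def frequency_and_variety(events):
    """Return (frequency, variety) for one user's lifetime log.

    events: iterable of (session_id, click_id) pairs (hypothetical log format).
    Frequency is the number of distinct sessions; variety is clickstream entropy.
    """
    events = list(events)
    sessions = {session_id for session_id, _ in events}
    clicks = [click_id for _, click_id in events]
    return len(sessions), clickstream_entropy(clicks)


if __name__ == "__main__":
    log = [("s1", "home"), ("s1", "home"), ("s2", "ir"), ("s2", "pdf")]
    print(frequency_and_variety(log))  # (2, 1.5)
```

A user who always clicks the same thing scores 0 bits, while a user spreading clicks evenly over many distinct targets scores high, which is the low-variety versus high-variety intuition described above.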

In Maull et al. [21], initial entropy computations were based on a simple, unmodified Shannon entropy calculation. To summarize the first computations from that work: a clickstream is modeled in a robust, domain-independent, computationally trivial way, using entropy as a model for variety. Since users generate a path of click interactions through a site, applying entropy models allows for a coarse-grained measurement of the predictability of clicks within a site. This intuition is then extended to the notion of variety: highly predictable, low-entropy models imply low variety, whereas highly unpredictable, high-entropy models imply high variety. The result of this coarse-grained step is a projection of frequency and variety patterns onto use diffusion quadrants, with each user binned into a quadrant.
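The projection onto quadrants can be sketched as follows, assuming per-user `(frequency, variety)` pairs such as those produced in step 1. Each axis is split at its mean, and a value exactly equal to the mean is treated as low; that tie-breaking rule is our assumption. The quadrant names follow the use diffusion model of Shih and Venkatesh [31].

```python
from statistics import mean

# Use diffusion quadrants keyed by (high frequency?, high variety?).
QUADRANTS = {
    (True, True): "Intense Use",
    (False, True): "Specialized Use",      # low frequency, high variety
    (True, False): "Non-Specialized Use",  # high frequency, low variety
    (False, False): "Limited Use",
}


def assign_quadrants(users):
    """Map user_id -> quadrant, splitting each axis at its mean.

    users: dict of user_id -> (frequency, variety).
    A value equal to the mean counts as low (an assumed tie-break).
    """
    f_mean = mean(f for f, _ in users.values())
    v_mean = mean(v for _, v in users.values())
    return {uid: QUADRANTS[(f > f_mean, v > v_mean)] for uid, (f, v) in users.items()}


if __name__ == "__main__":
    users = {"a": (10, 5.0), "b": (1, 5.0), "c": (10, 1.0), "d": (1, 1.0)}
    print(assign_quadrants(users))
```

Splitting at the mean rather than the median follows the text; with skewed session counts such as those in this study, the choice affects how many users land in each quadrant.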

While these calculations yielded interesting overall results, they were not reliable predictors of specialized use.


Linking computational methods with qualitative analyses 55

To experiment further with an entropy-based model, we developed an extension to the entropy calculation. One of the primary disadvantages of the entropy calculation used previously is that it barely distinguishes between the breadth and the depth of the URLs and clicks used in the calculation. For example, a large variety of URLs and clicks that explores a site broadly is no different from one that explores it deeply. This is not surprising, but when entropy calculations are mapped onto the variety dimension of use diffusion, it leaves too few users in the high-variety, low-frequency quadrant. The problem is particularly pronounced because variety and time spent in the system tend to go together: the longer one uses a system, the more likely the variety of one’s use is to increase.

To approach this problem experimentally, we revise the entropy calculation by bifurcating clicks into two categories. Since our objective is to increase the number of users in the lower quadrant representing specialized use, we want to account for two kinds of variety: the entropy of clicks related to the interactive resources components of the system, and the entropy of clicks related to the use of publisher materials (e.g., PDFs). Without getting lost in the specific nuances of the application under examination, such a bifurcation should be possible generically, without undermining the goal of our initial step, which is to generalize variety so that it is not overly specific to any one application. It may also be possible to extend this method beyond two partitions to n partitions, though such extensions are beyond the scope of this paper.

Let $C_a = \{a_1, \ldots, a_n\}$ be the set of clicks in the first category of application clicks, and let $C_b = \{b_1, \ldots, b_m\}$ be the set of clicks in the second category. For generality, how these categories are determined is left to the specification of the application. In the case of our application, we chose to split the clicks along the interactive resources and publisher components of the system.

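The new entropy calculation now considers the balance of clicks between the two categories. Writing $C^{u_i}_a$ and $C^{u_i}_b$ for the clicks of user $u_i$ in each category, the balance $B$ for user $u_i$ is computed by

$$B_{u_i} = 1 - \frac{\left|\,|C^{u_i}_a| - |C^{u_i}_b|\,\right|}{|C^{u_i}_a| + |C^{u_i}_b|}$$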
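To make the penalty concrete, the following worked example uses hypothetical click counts (not data from the study) for three users, each with 40 clicks split between the interactive resources category ($C_a$) and the publisher category ($C_b$):

```latex
\begin{aligned}
|C_a| = 20,\ |C_b| = 20: &\quad B = 1 - \tfrac{|20 - 20|}{40} = 1,    & \log_2 B &= 0\\
|C_a| = 30,\ |C_b| = 10: &\quad B = 1 - \tfrac{|30 - 10|}{40} = 0.5,  & \log_2 B &= -1\\
|C_a| = 39,\ |C_b| = 1:  &\quad B = 1 - \tfrac{|39 - 1|}{40} = 0.05,  & \log_2 B &\approx -4.32
\end{aligned}
```

A perfectly balanced user loses nothing, a 3:1 split costs one bit, and a nearly one-sided clickstream is penalized heavily. One edge case the formula leaves open: a user whose clicks all fall in a single category gets $B = 0$, for which $\log_2 B$ is undefined, so any implementation has to decide how to treat such users.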

This balance term thus penalizes large imbalances while leaving more balanced ratios largely unaffected. Recall the entropy calculation $H$ from [21]:

$$H(S^{u_i}_{\alpha_k}) = -\sum_{j=1}^{n} p_{\alpha_k j} \log_2 p_{\alpha_k j}$$

where $\alpha^{u_i}_k$ is the clickstream $\alpha_k$ for a user $u_i$. Our new entropy calculation $H_B$ is given by

$$H_B(S^{u_i}_{\alpha_k}) = H(S^{u_i}_{\alpha_k}) + \log_2(B_{u_i})$$

Fig. 3 Updated and re-calculated use diffusion pattern showing new frequency and variety calculations

Fig. 4 Step 2 overview
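The two formulas can be combined in a short Python sketch. The category predicate `is_interactive` and its `/ir/` URL prefix are hypothetical stand-ins for the application-specific split described above; $p_{\alpha_k j}$ is assumed to be the relative frequency of each distinct click in the user's clickstream; and returning negative infinity when $B = 0$ is our own choice for the undefined case, not something the method specifies.

```python
from collections import Counter
from math import inf, log2


def entropy(clicks):
    """Shannon entropy H (bits), with p_j the relative frequency of distinct click j."""
    total = len(clicks)
    return sum((n / total) * log2(total / n) for n in Counter(clicks).values())


def balance(n_a, n_b):
    """B = 1 - | |C_a| - |C_b| | / (|C_a| + |C_b|), given the two category counts."""
    return 1 - abs(n_a - n_b) / (n_a + n_b)


def is_interactive(click):
    """Hypothetical category test: treats URLs under /ir/ as interactive resources (C_a)."""
    return click.startswith("/ir/")


def penalized_entropy(clicks, in_category_a=is_interactive):
    """H_B = H + log2(B); returns -inf when B = 0 (our choice for the undefined case)."""
    if not clicks:
        raise ValueError("empty clickstream")
    n_a = sum(1 for c in clicks if in_category_a(c))
    b = balance(n_a, len(clicks) - n_a)
    if b == 0:
        return -inf
    return entropy(clicks) + log2(b)
```

For instance, the clickstream `["/ir/a", "/ir/b", "/ir/c", "/pub/x"]` has four distinct clicks (H = 2 bits) split 3:1, so B = 0.5 and H_B = 2 - 1 = 1 bit.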

Figure 3 shows the results of the new calculations. Plotted against the means of the original data in Fig. 6, there are now more users in the specialized category, indicating that the penalty calculation did indeed improve the separation of users. While this new calculation is not nearly as specific as the clustering done in the next step, it is a step toward rebalancing the diffusion pattern so that it accounts more robustly for the breadth and depth of use discussed previously.

3.2 Step 2: User typology modeling (Fig. 4)

While use diffusion patterns provide domain-independent quadrants of generalized usage behavior, to understand fine-grained user behavior we apply data mining algorithms, specifically clustering. Given the challenges of the variety variable discussed in step 1 above, in the second step we selected features from the clickstream data that expand variety and frequency, in order to discover more detailed views of user behavior. Because clickstream data provide a good basis for computing variety through entropy, selecting features that model variety in more detail, such as the usage of specific system components, yields a higher-fidelity view of user behavior. These refinements and the application of clustering expand the coarse-grained use diffusion patterns into a fine-grained user typology that continues to model frequency and variety.

Fig. 5 The CCS offers four major capabilities: access to publisher material (IES Investigations tab), access to digital library resources (Interactive Resources tab), personalization capabilities (My Stuff tab), and community features (Shared Stuff tab). The Interactive Resources component is opened above, showing the top recommended digital resources.

The first aspect of refinement requires that we expand frequency and variety by choosing more detailed features to model those variables. In the case of this study, frequency is expanded to include both sessions and total time spent within the system. Variety is expanded to include eight features that capture varying usage across system components, including publisher materials, digital library components, and user-contributed/social features of the system. While there are a number of clustering algorithms to choose from for this second step, these experiments are based on the widely used model-based expectation maximization (EM) algorithm [8]. EM models the data as a mixture of probability distributions, one per cluster, and alternates between two steps: it estimates, for each object instance, the probability that the instance belongs to each cluster under the current model parameters, and it then re-estimates each cluster’s parameters from those membership probabilities. After many iterations, the model settles into a set of clusters represented by the parameters derived for each cluster. The resulting clusters comprise a fine-grained user typology. The EM algorithm is used in these experiments because it is robust and typically converges quickly. Furthermore, cluster shapes (e.g., circles, ellipsoids) may vary, allowing more flexible cluster membership.
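To illustrate the two alternating steps described above, here is a small, dependency-free Python sketch of EM for a Gaussian mixture. It is not the implementation used in these experiments: the diagonal covariances, the deterministic farthest-point initialization, and the variance floor `min_var` are all our assumptions, and in practice a library implementation (for example scikit-learn's `GaussianMixture` or R's mclust package) would normally be used.

```python
import math


def _log_gauss_diag(x, mu, var):
    """Log density of point x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mu, var)
    )


def em_gmm(data, k, n_iter=100, tol=1e-6, min_var=1e-3):
    """Fit a k-component diagonal Gaussian mixture to data (a list of equal-length lists).

    Returns (weights, means, variances, responsibilities, log_likelihood), where the
    responsibilities and log-likelihood come from the final E-step.
    """
    n, d = len(data), len(data[0])
    # Farthest-point initialization: start at the first point, then repeatedly add
    # the point farthest from every mean chosen so far.
    means = [list(data[0])]
    while len(means) < k:
        far = max(data, key=lambda x: min(sum((a - b) ** 2 for a, b in zip(x, m)) for m in means))
        means.append(list(far))
    # Start every component with the overall per-feature variance and equal weight.
    centre = [sum(row[j] for row in data) / n for j in range(d)]
    overall = [max(sum((row[j] - centre[j]) ** 2 for row in data) / n, min_var) for j in range(d)]
    variances = [list(overall) for _ in range(k)]
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: membership probability of each point in each cluster (log-sum-exp).
        resp, ll = [], 0.0
        for x in data:
            logs = [math.log(weights[c]) + _log_gauss_diag(x, means[c], variances[c]) for c in range(k)]
            top = max(logs)
            norm = top + math.log(sum(math.exp(lp - top) for lp in logs))
            ll += norm
            resp.append([math.exp(lp - norm) for lp in logs])
        # M-step: re-estimate each cluster's weight, mean, and variances.
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-12  # guard against empty clusters
            weights[c] = nc / n
            means[c] = [sum(r[c] * x[j] for r, x in zip(resp, data)) / nc for j in range(d)]
            variances[c] = [
                max(sum(r[c] * (x[j] - means[c][j]) ** 2 for r, x in zip(resp, data)) / nc, min_var)
                for j in range(d)
            ]
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, variances, resp, ll
```

Each point's cluster is the component with the largest responsibility, e.g. `labels = [max(range(k), key=r.__getitem__) for r in resp]`. Because each component keeps its own variances, clusters can be elongated along some features, which is the flexibility in cluster shape mentioned above.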

4 Case study

For the remainder of this paper, we focus on the case study that examines the use of the CCS. The CCS is a National Science Foundation-funded program overseen by Digital Learning Sciences (DLS), a joint institute of the University of Colorado at Boulder and the University Corporation for Atmospheric Research. DLS began development of the CCS in early 2008, and in July 2009 the CCS was made available to all Earth science teachers in a large urban school district in the midwest. Over 100 teachers were trained on the CCS for use in the 2009–2010 school year.

Fig. 6 Use diffusion pattern comparison for step 1 from [21] and the new entropy with penalty calculation, showing frequency and variety of the data source

The CCS provides four major features to the end user (see Fig. 5). First, it provides users with Web-based access to digital versions of the paper-based student textbooks, teacher manuals, and curriculum guides that comprise the Earth science curricula for both Grade 6 and Grade 9. The manuals and guides outline the state standards that must be met, explain how the various units in the Earth science curricula are connected to state standards, and provide additional supplementary materials for teacher use, such as activities, teaching tips, and student assessments. These materials are all grouped under a single user interface component and are organized by key concept, which allows teachers to organize their lessons in a manner that flexibly meets the learning needs of their students. The digital versions of these curricular materials are identical to what was already available to teachers in paper form, but can now be accessed from any computer with a Web connection.

Second, the CCS integrates digitized publisher content with interactive resources available from the Digital Library for Earth System Education (DLESE), a collection of Earth science-related digital resources that are part of the National Science Digital Library. By clicking on the Interactive Resources tab, a user can see recommendations for animations, video clips, classroom activities, and other digital resources that pertain to the given key concept. The interactive resources available via the CCS have been vetted by the experts who manage the DLESE collection. Moreover, these resources are filtered by the system to ensure that they align with the Earth science curricula. Thus, when a teacher accesses a DLESE resource on, for example, volcanoes, the resource has not only been judged to have educational value by a subject matter expert, but has also been tied to the specific science concepts that must be taught as well as the science standards that must be met.

The third major feature of the CCS is an interactive Web 2.0 capability whereby teachers can save digital resources recommended to them via the Interactive Resources component, or upload their own resources, to an area called My Stuff, thus storing teacher-developed materials in the same space as interactive resources from DLESE or from the curriculum for easy access. Once a resource is saved to My Stuff, teachers have the option to share a copy of it to an area of the CCS called Shared Stuff, which is accessible to any CCS user who clicks on the Shared Stuff component associated with a given key concept.

The final major feature enables teachers to share materials with their peers. When a digital resource is added to Shared Stuff, the teacher who originally uploaded the resource, as well as other CCS users, can add searchable tags or keywords, so that any search of the CCS system for those tags will list all resources tagged with the same keyword or phrase. Finally, CCS users can add “star ratings” to resources so that other users can see how their colleagues rate a given resource. A resource that many users rate highly (four or five stars) might be more likely to capture the attention of system users than a resource with a low rating. Hew and Hara [15] argue that this kind of sharing may be a catalyst for improving practice, because such knowledge sharing tends to be tied to shared, situated instructional goals and challenges and is thus more likely to be relevant to a teacher’s immediate, short-term needs. A more detailed description of the CCS, including results from the field trial, can be found in Sumner et al. [34].

Table 1 Summary statistics for the data plotted in Fig. 6 (N = 98)

Statistic | Entropy | Lifetime sessions
Min | 0.48063 | 1.00
1st Qu. | 3.559 | 7.00
Median | 4.834 | 23.00
Mean | 4.535 | 37.85
3rd Qu. | 5.830 | 48.00
Max | 7.336 | 171.00
Std. Dev. | 1.56 | 42.201

4.1 Data source and step 1

The data source for this study was clickstream data from 98 users over 9 months of interaction with the CCS. The data for these experiments were particularly interesting because the user interface contains many dynamic client-side components whose interactions would not ordinarily be captured in a traditional web server log. The system was therefore instrumented to extract additional information from clicks on these dynamic client-side interface components, so the CCS clickstream log includes detailed tracking of user activity that provides rich user interaction data. Because we were able to capture these data over a relatively long period of time, we could analyze actual user behavior as the users worked with the system in a natural, unhindered manner. Furthermore, we were able to examine how publisher materials, which are already core to standard teacher practice, are complemented by digital library resources, giving us a broader view of how teachers integrate these digital library resources into their traditional practices.
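This section does not specify how instrumented click events were grouped into the user-initiated sessions that step 1 counts as frequency. One common approach, shown here as an assumption rather than the CCS's actual rule, is to start a new session after a period of inactivity; the `ClickEvent` schema and the 30-minute timeout are hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class ClickEvent:
    """One instrumented interaction (hypothetical schema)."""
    user_id: str
    timestamp: float  # seconds since epoch
    component: str    # e.g. "IR", "SharedStuff", "MyStuff", "Publisher"
    action: str       # e.g. "view", "save"


SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity timeout


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Group events into per-user sessions; a gap longer than `gap` seconds starts a new one.

    Returns dict user_id -> list of sessions, each a time-ordered list of ClickEvent.
    """
    sessions = {}
    ordered = sorted(events, key=lambda e: (e.user_id, e.timestamp))
    for user, user_events in groupby(ordered, key=lambda e: e.user_id):
        user_sessions = []
        last_ts = None
        for e in user_events:
            if last_ts is None or e.timestamp - last_ts > gap:
                user_sessions.append([])  # open a new session
            user_sessions[-1].append(e)
            last_ts = e.timestamp
        sessions[user] = user_sessions
    return sessions
```

The lifetime session count for a user is then `len(sessionize(events)[user_id])`. A shorter timeout would inflate session counts for teachers who work in bursts, so the value should be chosen with the system's usage in mind.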

As a result of step 1, we obtained initial frequency and variety computations. Table 1 shows the descriptive statistics resulting from the first step of our experimental method. The data show that the mean entropy of our population is 4.48 and the mean frequency, here measured as the lifetime sessions that a user logged, is 37.85.

When we apply the use diffusion model to the data by dividing it into quadrants at the means of each axis, we obtain a use diffusion pattern with a larger limited use group (n = 40) than under the previous entropy calculation (n = 38). Similarly, there are fewer intense users (n = 26; entropy > 4.54, frequency > 37.85), that is, users who exhibit both greater variety and greater frequency. This is a decrease of four users, or about 13%, from the initial computation (n = 30). The plot also shows that there are now 5 users in the specialized category (n = 5), whereas previously there was only a single point in that category.

Table 2 CCS cluster experiment features and descriptions

# | Feature label | Description
1 | Sessions | Total lifetime sessions
2 | Hours | Total system hours
3 | IR Activity | Total activity within interactive resources
4 | IR Saving | Total interactive resource saving behavior
5 | Shared Stuff Activity | Total user-contributed content activity
6 | Shared Stuff Saving | Total user-contributed content saves
7 | My Stuff Activity | Total “My Stuff” activity
8 | My Stuff Saving | Total “My Stuff” saving behavior
9 | Publisher–Teacher Materials | Total activity within publisher–teacher materials
10 | Publisher–Student Materials | Total activity within publisher–student materials

The key improvement of the new calculation over the initial one is that it spreads the data out considerably; this change can be seen both in the standard deviation (from 1.25 up to 1.56) and in the visual spread of the graph. It is encouraging that there are now larger penalties for higher entropy, but further generalizable changes could still be made.

As can be seen in Fig. 6, the value of use diffusion pattern modeling is that it provides a comprehensive overview of coarse-grained behavior within the system. The dotted lines divide the graph at the means of each axis, thus creating the four use diffusion quadrants, and we can quickly see the distribution of basic usage patterns within the system. At the same time, however, the pattern does not produce enough information to determine the specific details of use. To do this, we must turn to the second step of the method.

4.2 Feature selection and step 2

We began the second step by choosing 10 features of our data for further analysis. The features for this step of the method represent four major functions of the system: (1) use of digital library-related system functions, (2) use of traditional publisher materials and related system functions, (3) system functions that involve personalization, and (4) user-contributed functions. Table 2 summarizes the system features used in the second step of our proposed method.
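As a sketch of how the ten Table 2 features might be assembled for one user, the code below counts logged events per feature. The `(component, action)` event vocabulary, its mapping to features, and the approximation of hours as the span between the first and last event of each session are all hypothetical; the CCS instrumentation is not described at that level of detail.

```python
from collections import Counter

# Feature order follows Table 2.
FEATURES = [
    "Sessions", "Hours", "IR Activity", "IR Saving",
    "Shared Stuff Activity", "Shared Stuff Saving",
    "My Stuff Activity", "My Stuff Saving",
    "Publisher-Teacher Materials", "Publisher-Student Materials",
]

# Hypothetical mapping from a logged (component, action) pair to a Table 2 feature.
EVENT_TO_FEATURE = {
    ("IR", "view"): "IR Activity",
    ("IR", "save"): "IR Saving",
    ("SharedStuff", "view"): "Shared Stuff Activity",
    ("SharedStuff", "save"): "Shared Stuff Saving",
    ("MyStuff", "view"): "My Stuff Activity",
    ("MyStuff", "save"): "My Stuff Saving",
    ("PublisherTeacher", "view"): "Publisher-Teacher Materials",
    ("PublisherStudent", "view"): "Publisher-Student Materials",
}


def feature_vector(sessions):
    """Build one user's 10-feature vector.

    sessions: list of sessions, each a time-ordered list of
    (timestamp_seconds, component, action) tuples. Hours are approximated as the
    sum of first-to-last-event spans per session (single-event sessions add zero).
    """
    counts = Counter()
    seconds = 0.0
    for session in sessions:
        if not session:
            continue
        seconds += session[-1][0] - session[0][0]
        for _, component, action in session:
            feature = EVENT_TO_FEATURE.get((component, action))
            if feature is not None:
                counts[feature] += 1
    counts["Sessions"] = len(sessions)
    counts["Hours"] = seconds / 3600
    return [counts[name] for name in FEATURES]
```

Vectors built this way for every user form the input matrix for the clustering step; unmapped events are simply ignored.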

4.2.1 Digital library features

The CCS is specifically designed with the goal of providing access to high-quality digital library resources. Analyzing usage of the embedded digital library within the CCS should therefore provide useful data about typical teacher practices around digital resources. To capture this behavior, we selected features that detail the clickstream patterns of the embedded digital library resources within the system. These resources were presented in the user interface under four sub-categories: Top Picks, Animations, Images/Visuals, and Inquiry with Data. Each category contained resources from DLESE that were either selected as highly relevant to the subject materials and unit focus (Top Picks) or had metadata of the appropriate type, scope, and topic (Animations, etc.). The items presented under each of these views were derived directly from DLESE web services [37] and presented to the user accordingly. Clickstreams into this component are tracked with the IR Activity and IR Saving features.

4.2.2 Publisher materials features

Publisher materials are included as a core component of the system functionality. The majority of the publisher’s items are digitized versions of paper-based materials, whether book chapters or supplemental materials such as handouts and assessments. The features here are therefore convenient digital proxies for real-world, paper-based analogs. Clickstreams into this component of the system and its corresponding sub-components were tracked and organized with the Publisher–Student Materials and Publisher–Teacher Materials features. The Publisher–Student Materials include publisher materials such as digital versions of the student textbook, while the Publisher–Teacher Materials include supplemental publisher materials such as instructional support materials.

4.2.3 Personalization features

The CCS provides functionality that allows teachers to personalize the contents of their accounts; in particular, users can save digital materials that they find of interest. Once saved, items may be retrieved for further review and may even be shared with others if desired. Saving is considered a personalization feature because it provides direct control over items that may fit a specific need (either at the time of saving or in the future). Furthermore, saving implies an interest in the saved item, and while that interest may last only a short time, it nonetheless acts as a marker of personalization behavior. Personalization behavior was captured through the system’s saving behavior with the My Stuff Saving feature. For example, teachers are able to save embedded digital library resources such as animations, images/visuals, and top picks for the units of study they may be interested in. The My Stuff Activity feature represents the total activity performed within the My Stuff features of the system.

Table 3 Experimental cluster results showing the parameters obtained for each feature and cluster, where N indicates the number of users in each cluster

Feature | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5
Sessions | 4.934 | 47.096 | 119.551 | 26.131 | 83.897
Hours | 10.125 | 26.702 | 52.586 | 8.427 | 24.984
IR Activity | 2.483 | 74.253 | 95.537 | 26.801 | 17.109
IR Saving | 0.321 | 4.703 | 26.039 | 1.218 | 0.649
Shared Stuff Activity | 3.382 | 25.753 | 81.159 | 17.558 | 82.174
Shared Stuff Saving | 0.356 | 4.314 | 18.942 | 1.877 | 3.701
My Stuff Activity | 0.123 | 4.864 | 19.116 | 1.106 | 0.860
My Stuff Saving | 0.741 | 18.623 | 11.492 | 0.388 | 4.913
Publisher–Student Material | 1.421 | 21.715 | 98.143 | 14.212 | 74.404
Publisher–Teacher Material | 4.781 | 28.790 | 86.070 | 14.649 | 92.306
N | 31 | 10 | 8 | 35 | 14

4.2.4 Community behavior features

Some features of the system promote community-centric behaviors. For example, resources and other materials that users find interesting can be shared with the community at large in a kind of community pool of resources called “Shared Stuff.” This feature has many implications when considering the nature of communities of practice of K-12 educators, who are often encouraged to share materials, pedagogical strategies, and best practices among their peers. These community behaviors are captured with the Shared Stuff Activity and Shared Stuff Saving features.

4.3 A user typology

Our second step relies on feature clustering to develop a user typology. There are many clustering algorithms to choose from, but for this set of typology experiments we chose the model-based EM algorithm [8]. Elsewhere, we describe other experiments using other clustering algorithms and parameters [22]. Using the Bayesian Information Criterion to determine the optimal number of clusters for the dataset, we found five clusters in our experiments. The details of the clusters are shown in Table 3. Note that the clusters presented here show representative members of each cluster and do not necessarily correspond to actual users; that is, for each cluster in the table, the values are the ideal parameters that fit the object instances belonging to that cluster.
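Model selection with the Bayesian Information Criterion can be sketched as follows. BIC trades the maximized log-likelihood $\hat{L}_k$ of a $k$-cluster model against its number of free parameters $p_k$, $\mathrm{BIC}_k = -2\ln\hat{L}_k + p_k \ln n$, and the $k$ with the lowest value is chosen. The parameter count below assumes diagonal-covariance Gaussian components, and the log-likelihoods in the usage example are invented for illustration. Some packages, such as R's mclust, report the sign-flipped quantity $2\ln\hat{L}_k - p_k \ln n$ and select the maximum instead, which picks the same $k$.

```python
import math


def diag_gmm_num_params(k, d):
    """Free parameters of a k-component, d-dimensional diagonal Gaussian mixture:
    (k - 1) mixing weights + k*d means + k*d variances."""
    return (k - 1) + 2 * k * d


def bic(log_likelihood, n_params, n_obs):
    """BIC in the 'lower is better' convention: -2 ln L + p ln n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_k(log_likelihoods, d, n_obs):
    """Return (best_k, scores) given a dict k -> maximized log-likelihood."""
    scores = {
        k: bic(ll, diag_gmm_num_params(k, d), n_obs) for k, ll in log_likelihoods.items()
    }
    return min(scores, key=scores.get), scores
```

Here `select_k({1: -500.0, 2: -420.0, 3: -415.0}, d=2, n_obs=100)` picks k = 2: the third cluster improves the log-likelihood by only 5, which does not pay for its five extra parameters.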

As can be seen in Table 3, cluster 1 characterizes the limited use pattern (low variety, low frequency) of the step 1 use diffusion pattern. The users in this cluster produced very few hours of total use within the system, and they do not seem to be using the full range of system features. On the other hand, the experiments revealed an intense user cluster (cluster 3) that shows robust use of the system. Indeed, this “intense user” category seems to have used the system in full, exercising nearly every aspect of it and logging both large numbers of sessions and significant hours of use. Two specialized user groups emerged from the typology. While both show about the same number of total hours within the system, cluster 5 identifies users who spend a great deal of time with the community features and publisher materials of the system, while cluster 2 shows users who spend far more time within the interactive resources and embedded digital library component of the system. These clusters are valuable to understand because they suggest that some users find the embedded digital resources as important as others find the community features. Finally, cluster 4 shows a group of users whose session count was above the median but below that of the intense and specialized users. This cluster also exhibits broad use of most of the features, with slightly more use of interactive resources. Table 4 summarizes each cluster, giving its key characteristics and the diffusion pattern to which it belongs. We have also created descriptive typology labels as an easy way to remember the cluster characteristics.
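The contrast drawn between the two specialized clusters can be checked directly against Table 3. The sketch below transcribes three of its rows and computes, for each cluster, the share of combined IR and Shared Stuff activity that is IR activity; this ratio is our own illustrative summary, not a measure used in the study.

```python
# Cluster means transcribed from Table 3 (clusters 1-5, in order).
TABLE3 = {
    "Hours": [10.125, 26.702, 52.586, 8.427, 24.984],
    "IR Activity": [2.483, 74.253, 95.537, 26.801, 17.109],
    "Shared Stuff Activity": [3.382, 25.753, 81.159, 17.558, 82.174],
}


def ir_share(cluster):
    """Share of (IR + Shared Stuff) activity that is IR activity, for cluster 1-5."""
    ir = TABLE3["IR Activity"][cluster - 1]
    shared = TABLE3["Shared Stuff Activity"][cluster - 1]
    return ir / (ir + shared)


if __name__ == "__main__":
    for cluster in range(1, 6):
        print(cluster, round(ir_share(cluster), 2), TABLE3["Hours"][cluster - 1])
```

Clusters 2 and 5 log similar hours (about 26.7 and 25.0), yet roughly three-quarters of cluster 2's combined activity in these two components is interactive-resource activity, against less than a fifth for cluster 5.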


Table 4 An initial user typology derived from step 2 showing the diffusion patterns and characteristics of each usage type

Cluster | Diffusion pattern | Typology label | Characteristics
1 | Limited Use | "Uninterested Non-Adopter" | Very low overall system use
2 | Specialized Use | "Interactive Resource Specialist" | Heavy use of interactive resources relative to other system features; tends to access system weekly
3 | Intense Use | "Ardent Power User" | Heavy and robust overall use of all features; tends to access system daily
4 | Non-Specialized Use | "Moderate Generalist" | Moderate overall system use; shows slightly more use of interactive resources than other system features; tends to access system several times monthly
5 | Specialized Use | "Community Seeker Specialist" | Makes heavy use of Shared Stuff features and Publisher materials relative to interactive resource activity; tends to use the system weekly

4.4 Validation of results with field research findings

It is important to note that the method which produced the user typology clusters is not simply a new kind of analysis. Rather, it presents an opportunity to bridge different analytical approaches. Although the clusters that emerged from our computational approach are based on clickstream data, these clusters are not arbitrary aggregations of user behaviors. They map onto real-world usage patterns that emerged from the traditional educational research techniques used in our study of the CCS. Thus, the findings from our computational method are validated by the findings from our field research of CCS use "in the wild".

The CCS field trial used a mixed-method research design [19, 4, 10]. We collected quantitative data via surveys in a longitudinal fashion from all DPS Earth science teachers (n = 124) during the 2009–2010 academic year. In addition, we collected qualitative data from a large subset of teachers during the same period. See Saldivar [29] for a complete discussion of the CCS field trial. Here we summarize our data sources, sample sizes, and analysis techniques.

– Survey 1 (n = 85): A pre-survey administered at the beginning of the field trial to all Earth science teachers. It established a baseline of attitudes and behaviors regarding educational technology (including interactive digital resources), DPS Earth science curricular materials, and differentiated instruction. Data were collected via both quantitative (constrained-response) and qualitative (free-response) items. Quantitative survey items were analyzed by determining counts and frequencies of responses. Qualitative items were analyzed via content analysis. (Subsequent surveys also collected both quantitative and qualitative data and were analyzed in the same manner.)

– Survey 2 (n = 84): A pre-survey administered at the mid-point of the field trial to all Earth science teachers. It established a baseline of attitudes and behaviors regarding general access to, and attitudes towards, instructional technology tools in general and the CCS in particular.

– Survey 3 (n = 81): A post-survey administered at the end of the field trial to all Earth science teachers. This final survey included a number of items from the first two surveys so that changes, if any, in CCS users' attitudes and behaviors during the course of the school year could be analyzed pre and post.

– Adoption interviews (n = 24): These semi-structured telephone interviews about teachers' adoption and use of the CCS tool were administered to teachers who were regular-to-frequent CCS users. They produced an extensive database of qualitative data (interview transcripts), which were analyzed via content analysis.

– Classroom observation cycles (n = 8): Teachers identified as frequent CCS users were each recruited to participate in several hours of interviews and classroom observations per teacher during the course of the school year. This produced an extensive database of qualitative data (interview transcripts and field notes), which were analyzed via content analysis.
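The constrained-response items were summarized by counts and frequencies of responses. The percentages reported below, such as the share of respondents who "agreed or strongly agreed", come out of that kind of tally. The following Python sketch shows one way to compute it. The five-point scale labels and the response data are hypothetical; the study's actual instrument wording is not given here.

```python
from collections import Counter

# Five-point agreement scale (an assumption; the exact response options are not given).
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

def summarize_item(responses):
    """Counts and relative frequencies of responses to one constrained-response item."""
    counts = Counter(responses)
    unknown = set(counts) - set(LIKERT)
    if unknown:
        raise ValueError(f"unexpected responses: {sorted(unknown)}")
    n = len(responses)
    freqs = {level: counts[level] / n for level in LIKERT}
    # Share who "agreed or strongly agreed", the statistic reported in the text.
    agree_share = (counts["agree"] + counts["strongly agree"]) / n
    return counts, freqs, agree_share

# Hypothetical responses from ten teachers.
responses = ["agree"] * 5 + ["strongly agree"] * 3 + ["neutral"] * 2
counts, freqs, agree_share = summarize_item(responses)
print(f"{agree_share:.0%} agreed or strongly agreed")  # 80% agreed or strongly agreed
```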

Analysis of these field trial data, conducted independently of the computational analysis of clickstream data described above, suggested that teachers in our study fell onto a spectrum of system use behavior. The spectrum ran from low-frequency users (corresponding to cluster 1) to moderate (cluster 4) and heavy (cluster 3) users. The final two clusters fell on the moderate-to-heavy side of the spectrum. Rogers' [27] theory of diffusion of innovation, discussed in our introductory section, predicts that the earliest users of a new technology ("innovators" and "early adopters") comprise approximately 16% of all users. Moore [24], who revised and extended Rogers' work, further argues that a technology cannot become "mainstream" within a given population until it is adopted by at least half of all potential users. Our step 2 analysis indicates that about two-thirds of teachers in the district adopted the system to a significant degree, with one-third of users (represented by clusters 2, 3, and 5) making heavy use of the system. Even within a theoretical framework such as Rogers', which focuses on quantifying when different segments of a target population adopt an innovation, our findings indicate that the CCS has been strongly adopted and is heavily used by the target population.
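Rogers' 16% figure comes from how he defines adopter categories. Adoption times are assumed to be normally distributed. Innovators adopt more than two standard deviations before the mean (about 2.5%), and early adopters adopt between two and one standard deviations before it (about 13.5%). The short standard-library calculation below reproduces these proportions from the normal CDF. Reading Moore's "at least half" threshold as adoption by the mean time is our illustrative assumption, not a claim from either author.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Adoption time is modelled as normal; categories are cut at whole standard deviations.
innovators = normal_cdf(-2.0)                          # adopt before mean - 2 sd
early_adopters = normal_cdf(-1.0) - normal_cdf(-2.0)   # between mean - 2 sd and mean - 1 sd
earliest = innovators + early_adopters                 # equals normal_cdf(-1.0)
mainstream_threshold = normal_cdf(0.0)                 # everyone adopting before the mean

print(f"{innovators:.3f} {early_adopters:.3f} {earliest:.3f} {mainstream_threshold:.1f}")
# 0.023 0.136 0.159 0.5
```

Rogers rounds these exact values (2.3% and 13.6%) to 2.5% and 13.5%, which together give the familiar 16%.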

It can be argued that identifying low, moderate, and intense user clusters is hardly a profound finding. After all, any set of system users can be divided into "low" and "heavy" user categories if the major variable of interest is frequency of system use. Recall, however, that use diffusion theory calls for an analysis that combines frequency of use with an evaluation of how a given technology is used. Seen through the lens of use diffusion, the most salient finding from our computational method is that two of the clusters represent "variety of use" behaviors similar to user behaviors identified by our field research. These are cluster 2, the Interactive Resource Specialist, and cluster 5, the Community Seeker Specialist. This enables us to validate the findings of our computational method with real-world data. At the same time, it gives us deep insights into how teachers integrated digital library resources into their instructional practices.

For example, Interactive Resource Specialists were moderately heavy users who spent most of their time in the Interactive Resources/Embedded Digital Library component of the CCS. Data from the final survey show that many teachers highly valued interactive resources. In response to the statement "Using interactive resources effectively is important to my personal success as a teacher," about three-fourths of respondents agreed or strongly agreed. In response to the statement "Using interactive resources in the classroom enables me to better meet the learning needs of students in my classroom," almost 90% agreed or strongly agreed. Qualitative data provide insight into these survey findings.

In response to an interview question that asked him to discuss why he used the CCS, teacher Corey told us: "My teaching practices have always focused on student engagement. I found that the CCS made it easier for me to find [interactive] resources with which I could capture student attention."

Many teachers reported that they accessed digital resources to supplement textbook material. Overwhelmingly, the most popular kinds of digital resources teachers found via the CCS were graphic representations (e.g., pictures, diagrams, animations) of Earth science phenomena. In the following example, note Carlie's observations regarding students' engagement with and understanding of key concepts: "I think that since students are visual learners, they're hands-on learners, they're growing up in this technological age. Seeing the animations really drives home the idea or the topic. I ask them if it does help them [to see graphic representations], if whatever they just viewed made the material make more sense, and often times they [say], 'Wow! That totally made sense. Can we see it again?' I think they benefit from it a lot and they're vocal about it. They love it."

Liz's experience was similar to Carlie's in that her students seemed excited simply by the presence of digital resources, which led to increased engagement and deeper comprehension. Liz told us in an interview: "I think in general my students are really interested in seeing any kind of [digital resource]. [Digital resources are] really engaging for students because it's different than what they normally see in the classroom. When they walk in, they're just super-excited when they see the projector's set up. [They say,] 'Oh yes, we get to watch a video,' or 'We get to see a PowerPoint.' They come in automatically excited to learn."

In contrast, Community Seeker Specialists had CCS usage frequencies comparable to Interactive Resource Specialists but spent most of their time in the Shared Stuff and Publisher components of the system. Publisher materials were useful to teachers because they were directly related to the Earth science curricula used by the district. But why were Shared Stuff materials popular? Survey data suggest that seeing other teachers' uploaded materials gave CCS users insights into the community of Earth science educators in their district that had not been possible before the system was made available. For instance, in the final survey, almost 61% of respondents agreed or strongly agreed with the statement "The CCS has increased my awareness of other teachers' practices." Approximately half of respondents agreed or strongly agreed with the statement "The CCS has helped me become a more active member of the DPS professional learning community." Qualitative data provide a context for these survey responses.

In response to an open-ended survey question that asked respondents to explain the value of accessing Shared Stuff, teacher Sheila stated: "Looking at the Shared Stuff uploaded by others gives me ideas about how I can present particular concepts in my [own] classroom." In response to the same question, Norma commented: "[Resources in Shared Stuff] have given me different perspectives on the different topics [in the curriculum] and thus enabled me to teach more effectively."

Henrietta told us in an interview: "When [another Earth science] teacher from across town might have found a great website or a video clip or something that really does bring the point home and makes it more relevant to the students, that's what I'm looking for, and so far I have found some things like that on [the CCS]." Later in the same interview, Henrietta added that she and the other Earth science teachers at her school, all CCS users, often discussed useful resources that they had found using the CCS: "We certainly confer on [resources we discover]. [We tell each other,] 'This was great' [and] 'This is something you need to [use] when you get to this [certain] point in your book.'"

5 Discussion

Step 1 of our method shows promise for the rapid determination of application-independent, large-grain usage patterns. Step 2 produced a meaningful typology based on clusters that emerged from refined, application-dependent feature data. Some of the computationally generated clusters correspond to user behaviors identified by qualitative analysis of interview and survey data. This suggests that user behaviors and motivations can be understood, at least in part, through quantitative examination of users' behaviors within a system or application. This is only a first step, however. Understanding the complex interplay between observable behaviors that occur within an application and behaviors that occur outside of that system remains an open problem.

The CCS's main purpose was to encourage teachers to differentiate instruction by incorporating high-quality digital resources into their teaching practices. Thus, the system was designed to produce a specific outcome: teacher differentiation of instruction. Our field trial data validated the in-system behaviors characterized, for example, by the Interactive Resource Specialist and Community Seeker Specialist clusters. However, we also know from field trial data that teachers with similar in-system behaviors may have very different teaching practices in the classroom. It is also possible that teachers with similar classroom practices may exhibit different in-system behaviors. Ideally, then, the most complete understanding of a system's impact on users would come from a method that takes into account both in-system and out-of-system behaviors, motivations, and characteristics.

As previously mentioned in Sect. 2.2, existing literature on digital library use offers some guidance. Researchers in the field of human–computer interaction have long used the concept of personas to guide the development of information systems. Personas are constructs devised by system developers to describe different types of system users. They most often take the form of fictional biographies that include demographic characteristics and descriptions of how and why users interact with the system. With personas, developers can begin to model and build systems that more specifically address the needs of actual users. This increases the chances that the system will better serve the needs of its end users and, in turn, enhances adoption of the system. Our method describes in-system behaviors. We could extend our approach to incorporate a full set of out-of-system behaviors, for example demographic information, teaching behaviors, and student outcomes; that is, we could develop personas of CCS users. Doing so would give us a richer understanding of the relationship between system use and users' practices "in the wild". It would also extend the notion of personas beyond system design to actual system adoption and use over time.

While we have obtained promising results from this study, there are several significant limitations. First, our method has only been applied to a single population. It is plausible that the results we obtained are specific to the population under examination. A different typology may emerge from larger data populations; for example, new specialized user types might appear. Second, to validate that our results are robust and generalizable, it will be important to study the method with different applications and user populations. Third, while the computational method seems to support the analyses of qualitative data, there are opportunities to link system behavior and user characteristics to other meaningful behaviors outside the system.

As can be seen in the case study, both steps of the method yielded interesting results. It appears, however, that the entropy modeling of the first step requires more discriminatory power when describing specialized users. The new calculations do a better job of penalizing imbalanced usage, but further modifications could be developed to separate the specialized group(s) more clearly. Work also remains to develop a precise coarse-grained model that better separates every user group. The challenges revealed by the first step may further indicate that specialization does not always imply low variety. This becomes clear from the details of system usage once variety is expanded to include more variables, as in the extended entropy calculations and the second step of the method.

Applying automated typology discovery to inform, guide, and develop system user characteristics is one aspect of understanding within-system user behavior. Through personas, it may be extended to include broader behaviors and user characteristics. Such an extension may prove useful for understanding behavioral characteristics outside the system, so that a rich description of system users emerges that covers both in-system and out-of-system behaviors. This would, in turn, allow more data to be used in making predictive and prescriptive decisions about tools (in-system), policy (out-of-system), and training (in-system and out-of-system).

6 Broader impacts

While this research has focused on demonstrating the value of our method for the study of digital library adoption, we believe it can help address other educational challenges. Chief among these are teacher professional development, the correlation of teacher practice with student learning, and the evaluation of teaching practices.

Extant research indicates that one of the major barriers to effective integration of technology into teaching practices is the dearth of quality professional development and training vis-à-vis technology [14]. Teachers often complain that, even when technology-related training is made available, it does not meet their needs as practicing educators because it assumes that they do or do not already possess a given set of skills. By modeling teachers' system use behavior, we can understand inter-user differences and target system training and professional development to users' true needs. For example, CCS users in the Community Seeker Specialist cluster might be offered training that would help them integrate more interactive resources into their teaching, because such users are known to spend little of their CCS usage time exploring interactive resources. For teachers in the Interactive Resource Specialist cluster, such training would be redundant because they are already making extensive use of the interactive resources available via the CCS.

Policy makers and education researchers now broadly agree that effective adoption of technology can indeed have a positive impact on teacher practice [17, 32]. However, the relationship between teachers' adoption of technological systems and student achievement is still not well understood. As part of our larger study of the CCS, we are analyzing the standardized test scores of students taught by teachers who used the CCS, in order to determine what impact the teachers' use of the CCS had on student outcomes. Our analysis of student test score data for both our field trial year and the prior school year [28] shows increased overall achievement among Earth science students after the introduction of the CCS. Because of the limitations of the data available to us, however, we cannot make strong causal claims. We continue our efforts to understand what link, if any, exists between system use and student outcomes. In the long term, the ability to correlate system use behaviors with students' academic achievement will make it possible to focus on the teacher use behaviors that most benefit students. In turn, these behaviors could be taught to other teachers.

Evaluating teachers' instructional practices is a very difficult task that requires a massive commitment of human and financial resources [5]. For example, most evaluation systems rely on administrators to observe teachers in the classroom. This is a very labor-intensive approach that can result in teachers being awarded or denied tenure or pay raises based on the short period of time during which they were observed. Further, variance in evaluators' adherence to evaluation rubrics and bias in the evaluation instruments themselves can make it difficult to assess the validity of the evaluation process [38]. Asking teachers to self-report their teaching practices might provide evaluators with additional information, but such data would be subject to all the limitations that come with asking individuals to report on their own behaviors. Neutral "third party" data, such as the clickstream data to which we applied our computational method, can help bridge a gap. On one side is what evaluators observe during the relatively brief periods they visit a teacher's classroom; on the other are the teacher's activities when he or she is not being formally evaluated. A teacher evaluation process that combines traditional observational and self-reported data with usage data produced by systems like the CCS would give administrators, policymakers, and teachers themselves deeper insights into instructional practices. Such a hybrid system would also produce a rich new understanding of teaching-related best practices that could then be shared with other educators.

7 Conclusion

By applying models of adoption and use diffusion alongside data mining techniques, we have developed a two-step method for discovering patterns within clickstream data. These patterns reveal both general and specific typologies of digital library user behavior. Applying this method to data from an academic year-long field trial in a large urban school district provided a rich opportunity to study how middle and high school science teachers adopted and used embedded digital library resources. The method shows considerable promise for extracting useful behavioral patterns "in the wild": the resulting fine-grained user typology maps well onto results emerging from traditional educational field research methods. The proposed method still requires more data for further validation, but it is a valuable contribution toward our effort to develop automatic methods for studying digital library adoption and use. When fully realized, this method could also be extended to other applications and areas. These include informing teacher professional development, understanding the impact of digital library applications on student learning, and developing new approaches to the evaluation of teacher practice.

Acknowledgments This material is based upon work supported in part by the NSDL program in the National Science Foundation under Awards 0734875 and 0840744, and by the University of Colorado at Boulder.

References

1. Ayers, E., Nugent, R., Dean, N.: Skill Set Profile Clustering Based on Student Capability Vectors Computed From Online Tutoring Data. In: Proceedings of the 1st International Conference on Educational Data Mining, Montreal, Canada, pp. 210–217 (2008)

2. Benevenuto, F., Rodrigues, T., Cha, M., Almeida, V.: Characterizing user behavior in online social networks. In: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement Conference, ACM, Chicago, Illinois, USA, pp. 49–62 (2009). doi:10.1145/1644893.1644900

3. Brandtzæg, P.B.: Towards a unified Media-User typology (MUT): a meta-analysis and review of the research literature on media-user typologies. Comput. Hum. Behav. 26(5), 940–956 (2010). doi:10.1016/j.chb.2010.02.008. http://www.sciencedirect.com/science/article/B6VDC-4YJSW8D-1/2/011453cc70c0a6bdc29d3fde3b8a9304

4. Creswell, J.: Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research. Pearson Education, Upper Saddle River, NJ (2008)

5. Danielson, C., McGreal, T.L.: Teacher Evaluation To Enhance Professional Practice. Association for Supervision and Curriculum Development, Alexandria, VA, USA (2000)


6. Davis, F.D.: Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quart. 13(3), 319–340 (1989)

7. Deffuant, G., Huet, S., Amblard, F.: An individual-based model of innovation diffusion mixing social value and individual benefit. Am. J. Sociol. 110(4), 1041–1069 (2005)

8. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodological) 39(1), 1–38 (1977)

9. Dominguez, A.K., Yacef, K., Curran, J.R.: Data mining for generating hints in a python tutor. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 91–100 (2010)

10. Dzurec, L., Abraham, I.: The nature of inquiry: linking quantitative and qualitative research. Adv. Nurs. Sci. 16, 73–79 (1993)

11. Eynon, R., Malmberg, L.: A typology of young people's internet use: implications for education. Comput. Educ. 56(3), 585–595 (2011). doi:10.1016/j.compedu.2010.09.020. http://www.sciencedirect.com/science/article/B6VCJ-517J24P-1/2/5c5d02fb284c227d3a9ef1cc4f3e1bf6

12. Fuller, F.F.: Concerns of teachers: a developmental conceptualization. Am. Educ. Res. J. 6(2), 207–226 (1969)

13. Hall, G.E.: The concerns-based approach to facilitating change. Educ. Horizons 57(4), 202–208 (1979)

14. Hanson, K., Carlson, B.: Effective Access: Teachers' Use of Digital Resources in STEM Teaching. Gender, Diversity, and Technology Institute, Education Development Center, Inc., Newton (2005)

15. Hew, K.F., Hara, N.: Empirical study of motivators and barriers of teacher online knowledge sharing. Educ. Technol. Res. Dev. 55(6), 573–595 (2007)

16. Horrigan, J.: A typology of information and communication technology users. Research report, Pew Internet & American Life Project (2007)

17. Kelly, M.G., McAnear, A.: National Educational Technology Standards for Teachers: Preparing Teachers to Use Technology. International Society for Technology in Education (ISTE), Eugene (2002)

18. Lage, K., Maness, J., Losoff, B.: Receptivity to library involvement in scientific data curation: a case study at the University of Colorado Boulder. Portal: Libr. Acad. 11(4), 915–937 (2011)

19. Leech, N., Onwuegbuzie, A.: A typology of mixed methods research designs. Qual. Quant. 43(2), 265–275 (2009)

20. Maness, J., Miaskiewicz, T., Sumner, T.: Using personas to understand the needs and goals of institutional repository users. D-Lib Mag. 14(9/10), 1082–9873 (2008)

21. Maull, K., Saldivar, M., Sumner, T.: Understanding digital library adoption: a use diffusion approach. In: Proceeding of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, ACM, pp. 259–268 (2011)

22. Maull, K.E., Saldivar, M.G., Sumner, T.: Online curriculum planning behavior of teachers. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 121–130 (2010)

23. Miaskiewicz, T., Sumner, T., Kozar, K.: A latent semantic analysis methodology for the identification and creation of personas. In: Proceeding of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, ACM, pp. 1501–1510 (2008)

24. Moore, G.A.: Crossing the Chasm: Marketing and Selling Technology Products to Mainstream Customers. HarperCollins, New York (2006)

25. Pennington, M.C.: Cycles of innovation in the adoption of information technology: a view for language teaching. Comput. Assist. Lang. Learn. 17(1), 7–33 (2004)

26. Ram, S., Jung, H.: The conceptualization and measurement of product usage. J. Acad. Mark. Sci. 18(1), 67–76 (1990). doi:10.1007/BF02729763. http://www.springerlink.com/content/9kjl7574145320mv

27. Rogers, E.M.: Diffusion of Innovations, 5th edn. The Free Press, New York (2003)

28. Saldivar, M.: Teacher integration of digital resources into instructional practice. CCS Report No. 4, Digital Learning Sciences, Boulder (2011)

29. Saldivar, M.G.: Teacher adoption of a Web-based instructional planning system. Doctoral dissertation, University of Colorado, Boulder, CO (2012)

30. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948)

31. Shih, C., Venkatesh, A.: Beyond adoption: development and application of a use-diffusion model. J. Mark. 68(1), 59–72 (2004). http://www.jstor.org/stable/30161975

32. Smerdon, B.: Teachers' Tools for the 21st Century: A Report on Teachers' Use of Technology. US Dept. of Education, Office of Educational Research and Improvement, Washington, DC (2000)

33. Straub, E.T.: Understanding technology adoption: theory and future directions for informal learning. Rev. Educ. Res. 79(2), 625–649 (2009)

34. Sumner, T., Team, C.: Customizing science instruction with educational digital libraries. In: Proceedings of the 10th Annual Joint Conference on Digital Libraries, JCDL '10, ACM, New York, NY, USA, pp. 353–356 (2010). doi:10.1145/1816123.1816178

35. Turner, M., Kitchenham, B., Brereton, P., Charters, S., Budgen, D.: Does the technology acceptance model predict actual use? A systematic literature review. Inform. Software Technol. 52(5), 463–479 (2010)

36. Venkatraman, M.P.: The impact of innovativeness and innovation type on adoption. J. Retail. 67(1), 51–67 (1991)

37. Weatherley, J.: A web service framework for embedding discovery services in distributed library interfaces. In: Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05), ACM, New York, NY, USA, pp. 42–43 (2005). doi:10.1145/1065385.1065394

38. Wilson, B., Wood, J.A.: Teacher evaluation: a national dilemma. J. Person. Eval. Educ. 10(1), 75–82 (1996)

39. Xu, B., Recker, M., Hsi, S.: Data deluge: opportunities for research in educational digital libraries. In: Cassie M. Edwards (ed.) Internet Issues: Blogging, the Digital Divide and Digital Libraries. Nova Science Pub. Inc., New York (2010)


To experiment further with an entropy-based model, we developed an extension to the entropy calculation. One of the primary disadvantages of the entropy calculation used previously is that it barely distinguishes between the breadth and the depth of the URLs and clicks used in the calculation. For example, a large variety of URLs and clicks that broadly explores a site is no different from a large variety that deeply explores a site. This is not surprising. In terms of mapping entropy calculations onto the variety model of use diffusion, however, it means that too few users fall in the low-variety, high-frequency (specialized) quadrant. This is particularly pronounced because variety and time spent in the system tend to go together: the longer one uses a system, the more likely one's variety of use of the system increases.

To approach this problem experimentally, we revise the entropy calculations by bifurcating clicks into two categories. Our objective is to increase the number of users in the lower quadrant representing specialized use. We therefore account for two kinds of variety. The first is the entropy of clicks related to the interactive resources components of the system (e.g., interactive resources). The second is the entropy of clicks related to the use of publisher materials (e.g., PDFs). Without getting lost in the specific nuances of the application under examination, such a bifurcation should be possible generically. It should not compromise the goal of our initial step, which is to generalize variety so that it is not overly specific to any one application. It may also be possible to extend this method beyond two partitions to n partitions, though such extensions are beyond the scope of this paper.

Let us consider the set of clicks $C_a = \{a_1, \ldots, a_n\}$, where $a_1, \ldots, a_n$ are clicks in the first category of application clicks, and the set of clicks $C_b = \{b_1, \ldots, b_m\}$, where $b_1, \ldots, b_m$ are the clicks in the second category of the application. For generality, how these categories are determined is left to the specification of the application. In the case of our application, we chose to split the application along the interactive resources and publisher components of the system.
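The bifurcation leaves it to each application to decide which clicks belong to $C_a$ and which to $C_b$. The following minimal Python sketch makes that decision explicit as a table of predicates. The function name and URL prefixes are hypothetical, since the CCS's actual URL scheme is not described. Allowing more than two entries in the table corresponds to the n-partition variant the text mentions.

```python
def partition_clicks(clickstream, categories):
    """Split one user's clickstream into application-defined click categories.

    categories maps a category name to a predicate over a click (here, a URL).
    Each click goes to the first matching category; unmatched clicks are dropped.
    Passing more than two categories gives an n-partition variant.
    """
    parts = {name: [] for name in categories}
    for click in clickstream:
        for name, belongs in categories.items():
            if belongs(click):
                parts[name].append(click)
                break
    return parts

# Hypothetical URL prefixes; the CCS's real URL scheme is not described here.
CCS_CATEGORIES = {
    "a": lambda url: url.startswith("/interactive/"),  # C_a: interactive resources
    "b": lambda url: url.startswith("/publisher/"),    # C_b: publisher materials (PDFs)
}

clicks = ["/interactive/r1", "/publisher/ch2.pdf", "/interactive/r7", "/shared/x"]
parts = partition_clicks(clicks, CCS_CATEGORIES)
print(len(parts["a"]), len(parts["b"]))  # 2 1
```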

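The new entropy calculation now considers the balance of clicks within each category of the application. For a user $u_i$, this balance $B$ is computed as

$$B_{u_i} = 1 - \frac{\left| C^{u_i}_a - C^{u_i}_b \right|}{\left| C^{u_i}_a + C^{u_i}_b \right|}$$

where $C^{u_i}_a$ and $C^{u_i}_b$ are taken as the numbers of clicks user $u_i$ made in each category.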

This balance calculation penalizes large imbalances while leaving more balanced ratios largely unaffected. Recall the entropy calculation $H$ from [21]:

$$H(S^{u_i}_{\alpha_k}) = -\sum_{i=1}^{n} p_{\alpha_k i} \log_2 p_{\alpha_k i}$$

where $S^{u_i}_{\alpha_k}$ is the clickstream $\alpha_k$ for a user $u_i$. Our new entropy calculation $H_B$ is given by

$$H_B(S^{u_i}_{\alpha_k}) = H(S^{u_i}_{\alpha_k}) + \log_2(B_{u_i})$$

Fig. 3 Updated and re-calculated use diffusion pattern showing new frequency and variety calculations

Fig. 4 Step 2 overview
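To show how the penalty behaves, the following Python sketch computes $H$, $B$, and $H_B$ for a single user. It rests on several assumptions, which are ours rather than the paper's. $p$ is the relative frequency of each distinct click in the clickstream; the text defers the details of $H$ to [21]. The click labels are hypothetical. When every categorized click falls into one category, $B = 0$ and $\log_2 B$ is undefined; the sketch returns negative infinity in that case, which is also our choice.

```python
from collections import Counter
from math import log2

def clickstream_entropy(clicks):
    """Shannon entropy H (in bits) of the distribution of distinct clicks."""
    n = len(clicks)
    return -sum((c / n) * log2(c / n) for c in Counter(clicks).values())

def balance(count_a, count_b):
    """B = 1 - |C_a - C_b| / |C_a + C_b|, from one user's two category counts."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("no clicks in either category")
    return 1 - abs(count_a - count_b) / total

def balanced_entropy(clicks, in_a, in_b):
    """H_B = H + log2(B). Since 0 <= B <= 1, log2(B) <= 0: imbalance lowers variety."""
    b = balance(sum(map(in_a, clicks)), sum(map(in_b, clicks)))
    if b == 0:
        return float("-inf")  # all categorised clicks fall in one category
    return clickstream_entropy(clicks) + log2(b)

def in_a(click):
    return click.startswith("ir/")   # hypothetical: interactive resources

def in_b(click):
    return click.startswith("pub/")  # hypothetical: publisher materials

clicks = ["ir/1", "ir/1", "ir/2", "pub/1"]
print(clickstream_entropy(clicks), balance(3, 1), balanced_entropy(clicks, in_a, in_b))
# 1.5 0.5 0.5
```

A 3:1 split halves $B$ and so subtracts one bit of variety, while a perfectly balanced split ($B = 1$) leaves $H$ unchanged.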

Figure 3 shows the results of the new calculations. Plotted against the means of the original data of Fig. 6, more users now fall in the specialized category, indicating that the penalty calculation did indeed improve the separation of users. This new calculation is not nearly as specific as the clustering done in the next step. It is, however, a step toward re-balancing the diffusion pattern so that it more robustly accounts for the breadth and depth of use discussed previously.
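The quadrant assignment behind plots like Fig. 3 can be sketched as a split of the (frequency, variety) plane. We assume, as the text suggests, that the boundaries are drawn at mean values, and that the quadrant names follow Shih and Venkatesh [31]. The function name and the example values are hypothetical. A `thresholds` argument allows the cut points to come from a different dataset, such as "the means of the original data".

```python
from statistics import fmean

def use_diffusion_quadrants(users, thresholds=None):
    """Map users to the four use-diffusion quadrants of Shih and Venkatesh [31].

    users: dict of user id -> (frequency, variety). thresholds: (frequency,
    variety) cut points; by default the population means of each axis.
    """
    if thresholds is None:
        thresholds = (fmean(f for f, _ in users.values()),
                      fmean(v for _, v in users.values()))
    f_cut, v_cut = thresholds
    labels = {}
    for user, (freq, variety) in users.items():
        high_f, high_v = freq >= f_cut, variety >= v_cut
        if high_f and high_v:
            labels[user] = "Intense"
        elif high_f:
            labels[user] = "Specialized"      # frequent but narrow use
        elif high_v:
            labels[user] = "Non-Specialized"  # broad but infrequent use
        else:
            labels[user] = "Limited"
    return labels

# Hypothetical (frequency, variety) values, e.g. session counts and H_B scores.
users = {"u1": (2, 0.5), "u2": (40, 3.0), "u3": (35, 0.8), "u4": (5, 2.9)}
print(use_diffusion_quadrants(users))
```

Under this scheme, lowering a frequent user's variety score through the balance penalty is exactly what moves that user from "Intense" into "Specialized".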

3.2 Step 2: User typology modeling (Fig. 4)

While use diffusion patterns provide domain-independent quadrants of generalized usage behavior, to understand fine-grained user behavior we apply data mining algorithms, specifically clustering. Having discussed the challenges of the variety variable in step 1 above, in the second step we selected features from the clickstream data that expand variety and frequency in order to discover more detailed views of user behavior. Since clickstream data provide a good metric for computing variety through entropy, selecting features that model variety in more detail, such as the usage of specific system components, yields a higher-fidelity view of user behavior. These refinements and the application of clustering expand the large-grained use diffusion patterns into a fine-grained user typology that continues to model frequency and variety.

Fig. 5 The CCS offers four major capabilities: access to publisher material (IES Investigations tab), access to digital library resources (Interactive Resources tab), personalization capabilities (My Stuff tab), and community features (Shared Stuff tab). The Interactive Resources component is opened above, showing the top recommended digital resources.

The first aspect of refinement requires that we expand frequency and variety by choosing more detailed features to model those variables. In this study, frequency is expanded to include both sessions and total time spent within the system. Variety is expanded to include eight features that capture varying usage across system components, including publisher materials, digital library components, and user-contributed/social features of the system. While there are a number of clustering algorithms to choose from for this second step, these experiments are based on the widely used model-based expectation maximization (EM) algorithm [8]. EM alternates between estimating, for each object instance to be clustered, the probability that it belongs to each cluster (the expectation step) and re-estimating each cluster's distribution parameters from those probabilities (the maximization step). After many iterations, the model settles into a set of clusters represented by the model parameters derived for each cluster; these clusters comprise a fine-grained user typology. The EM algorithm is used in these experiments because it is fast, robust, and typically converges quickly. Furthermore, cluster shapes (e.g., circles, ellipsoids) may vary, allowing more flexible cluster membership.
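To make the E- and M-steps concrete, here is a minimal pure-Python sketch of EM for a Gaussian mixture with diagonal covariances, so clusters can be axis-aligned ellipsoids. It is an illustration of the technique, not the software used in these experiments. The deterministic farthest-point initialization, the variance floor `min_var`, and the convergence tolerance `tol` are all assumptions of this sketch, since the paper does not report them.

```python
import math
from typing import List

Vector = List[float]


def _log_gauss_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log-density of x under a Gaussian with a diagonal covariance matrix."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var))


def _logsumexp(values: List[float]) -> float:
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def em_gmm(data: List[Vector], k: int, iters: int = 200, tol: float = 1e-8, min_var: float = 1e-6):
    """Fit a k-component diagonal Gaussian mixture; returns (weights, means, variances, resp, log_lik)."""
    n, d = len(data), len(data[0])
    # Assumed initialization: first point, then repeatedly the point farthest from chosen means.
    means = [list(data[0])]
    while len(means) < k:
        far = max(data, key=lambda x: min(sum((a - b) ** 2 for a, b in zip(x, m)) for m in means))
        means.append(list(far))
    centre = [sum(x[j] for x in data) / n for j in range(d)]
    spread = [max(sum((x[j] - centre[j]) ** 2 for x in data) / n, min_var) for j in range(d)]
    variances = [list(spread) for _ in range(k)]
    weights = [1.0 / k] * k
    log_lik = -math.inf
    resp: List[List[float]] = []
    for _ in range(iters):
        # E-step: posterior probability that each component generated each point.
        resp, new_log_lik = [], 0.0
        for x in data:
            logs = [math.log(weights[c]) + _log_gauss_diag(x, means[c], variances[c]) for c in range(k)]
            norm = _logsumexp(logs)
            new_log_lik += norm
            resp.append([math.exp(lg - norm) for lg in logs])
        # M-step: re-estimate mixing weights, means and per-feature variances.
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-12  # guard against an empty component
            weights[c] = nc / n
            means[c] = [sum(r[c] * x[j] for r, x in zip(resp, data)) / nc for j in range(d)]
            variances[c] = [
                max(sum(r[c] * (x[j] - means[c][j]) ** 2 for r, x in zip(resp, data)) / nc, min_var)
                for j in range(d)
            ]
        converged = abs(new_log_lik - log_lik) < tol
        log_lik = new_log_lik  # log-likelihood at the parameters used in this E-step
        if converged:
            break
    return weights, means, variances, resp, log_lik


data = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [9.0, 8.8], [9.2, 9.1], [8.9, 9.3]]
weights, means, variances, resp, log_lik = em_gmm(data, k=2)
labels = [max(range(2), key=lambda c: r[c]) for r in resp]
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In the study's setting, each row of `data` would be one user's feature vector; the toy two-dimensional points here only show that the soft assignments harden into two well-separated clusters.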

4 Case study

For the remainder of this paper, we focus on the case study that examines the use of the CCS. The CCS is a National Science Foundation-funded program overseen by Digital Learning Sciences (DLS), a joint institute of the University of Colorado at Boulder and the University Corporation for Atmospheric Research. DLS began development of the CCS in early 2008, and in July 2009 the CCS was made available to all Earth science teachers in a large urban school district in the Midwest. Over 100 teachers were trained on the CCS for use in the 2009–2010 school year.

Fig. 6 Use diffusion pattern comparison for step 1 from [21] and the new entropy with penalty calculation, showing frequency and variety of the data source

The CCS provides four major features to the end user (see Fig. 5). First, it provides users with Web-based access to digital versions of the paper-based student textbooks, teacher manuals, and curriculum guides that comprise the Earth science curricula for both Grade 6 and Grade 9. The manuals and guides outline the state standards that must be met, explain how the various units in the Earth science curricula are connected to state standards, and provide additional supplementary materials for teacher use, such as activities, teaching tips, and student assessments. These materials are all grouped under a single user interface component and are organized by key concept, which allows teachers to organize their lessons in a manner that flexibly meets the learning needs of their students. The digital versions of these curricular materials are identical to what was already available to teachers in paper form, but can now be accessed from any computer with a Web connection.

Second, the CCS integrates digitized publisher content with interactive resources available from the Digital Library for Earth System Education (DLESE), a collection of Earth science-related digital resources that are part of the National Science Digital Library. By clicking on the Interactive Resources tab, a user can see recommendations for animations, video clips, classroom activities, and other digital resources that pertain to the given key concept. The interactive resources available via the CCS have been vetted by the experts who manage the DLESE collection. Moreover, these resources are filtered by the system to ensure that they align with the Earth science curricula. Thus, when a teacher accesses a DLESE resource on, for example, volcanoes, the resource has not only been determined to have educational value by a subject matter expert but has also been tied to the specific science concepts that must be taught, as well as the science standards that must be met.

The third major feature of the CCS is an interactive Web 2.0 capability whereby teachers can save digital resources recommended to them via the Interactive Resources component, or upload their own resources, to an area called My Stuff, thus storing teacher-developed materials in the same space as interactive resources from DLESE or from the curriculum for easy access. Once a resource is saved to My Stuff, teachers have the option to share a copy of the resource to an area of the CCS called Shared Stuff, which is accessible to any CCS user who clicks on the Shared Stuff component associated with a given key concept.

The final major feature enables teachers to share materials with their peers. When a digital resource is added to Shared Stuff, the teacher who originally uploaded the resource, as well as other CCS users, can add searchable tags or keywords, so that any search of the CCS for those tags will list all resources tagged with the same keyword or phrase. Finally, CCS users can add “star ratings” to resources so that other users can determine how their colleagues rate a given resource. A resource that many users rate highly (four or five stars) might be more likely to capture the attention of system users than a resource with a low rating. Hew and Hara [15] argue that this kind of sharing may be a catalyst for enabling improvement of practice, because such knowledge sharing tends to be tied to shared, situated instructional goals and challenges and is thus more likely to be relevant to a teacher’s immediate, short-term needs. A more detailed description of the CCS, including results from the field trial, can be found in Sumner et al. [34].

Table 1 Summary statistics for the data plotted in Fig. 6 (N = 98)

| Statistic | Entropy | Lifetime sessions |
|---|---|---|
| Min | 0.48063 | 1.00 |
| 1st Qu. | 3.559 | 7.00 |
| Median | 4.834 | 23.00 |
| Mean | 4.535 | 37.85 |
| 3rd Qu. | 5.830 | 48.00 |
| Max | 7.336 | 171.00 |
| Std Dev | 1.56 | 42.201 |

4.1 Data source and step 1

The data source for this study was clickstream data of 98 users from 9 months of interaction within the CCS. These data were particularly interesting because the user interface environment contains many dynamic client-side components whose interactions would not ordinarily be captured in a traditional web server log. The system was therefore instrumented to extract additional information from clicks on these dynamic client-side interface components, so the CCS clickstream log includes detailed tracking of user activity that provides rich user interaction data. Since we were able to capture these data over a relatively long period of time, we were able to analyze actual user behavior as users worked with the system in a natural, unhindered manner. Furthermore, we were able to examine how publisher materials, which are already core to standard teacher practice, are complemented by digital library resources, giving us a broader view of how teachers integrate these digital library resources into their traditional practices.

As a result of step 1, we obtained initial frequency and variety computations. Table 1 shows the descriptive statistics resulting from the first step of our experimental method. The data show that the mean entropy of our population is 4.48 and the mean frequency, here measured as the lifetime sessions that a user logged, is 37.85.

When we apply the use diffusion model to the data by dividing the plot into quadrants at the means of each axis, we obtain a use diffusion pattern with a larger limited use group (n = 40) than the previous entropy calculations produced (n = 38). Similarly, there are fewer intense users (n = 26; entropy > 4.54, frequency > 37.85), that is, users who exhibit both greater variety and higher frequency.


Table 2 CCS cluster experiment features and descriptions

| # | Feature label | Description |
|---|---|---|
| 1 | Sessions | Total lifetime sessions |
| 2 | Hours | Total system hours |
| 3 | IR Activity | Total activity within interactive resources |
| 4 | IR Saving | Total interactive resource saving behavior |
| 5 | Shared Stuff Activity | Total user-contributed content activity |
| 6 | Shared Stuff Saving | Total user-contributed content saves |
| 7 | My Stuff Activity | Total “My Stuff” activity |
| 8 | My Stuff Saving | Total “My Stuff” saving behavior |
| 9 | Publisher–Teacher Materials | Total activity within publisher–teacher materials |
| 10 | Publisher–Student Materials | Total activity within publisher–student materials |

This count of intense users is a decrease of four users (about 13%) from the initial computation (n = 30). The plot also shows that there are now five users in the specialized category (n = 5), where previously there was only a single user.

The key improvement of the new calculations over the initial calculation is that they spread the data out noticeably; this change can be seen both in the standard deviation (from 1.25 up to 1.56) and in the visual spread in the graph. It is encouraging that there are now larger penalties for higher entropy, but there are still generalizable changes that could be made.

As can be seen in Fig. 6, the value of use diffusion pattern modeling is that it provides a comprehensive overview of coarse-grained behavior within the system. The dotted lines divide the graph at the means of each axis, thus creating the four use diffusion quadrants, and we can quickly see the distribution of basic usage patterns within the system. At the same time, however, the pattern does not produce enough information to determine the specific details of use. To do this, we must turn to the second step of the method.
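The mean split that produces the four quadrants can be sketched in a few lines of Python. The sketch assumes the standard use diffusion reading of the quadrants (intense: high frequency and high variety; specialized: high frequency, low variety; nonspecialized: low frequency, high variety; limited: low on both) and, as an assumption, sends values exactly at a mean to the "low" side, matching the strict ">" thresholds used above. The user IDs and values are invented for illustration.

```python
from statistics import mean
from typing import Dict, Tuple


def use_diffusion_quadrants(users: Dict[str, Tuple[float, float]]) -> Dict[str, str]:
    """Map each user's (frequency, variety) pair to a use diffusion quadrant split at the axis means."""
    f_mean = mean(f for f, _ in users.values())
    v_mean = mean(v for _, v in users.values())
    labels: Dict[str, str] = {}
    for user, (freq, variety) in users.items():
        high_f, high_v = freq > f_mean, variety > v_mean
        if high_f and high_v:
            labels[user] = "intense"
        elif high_f:
            labels[user] = "specialized"      # frequent but narrow use
        elif high_v:
            labels[user] = "nonspecialized"   # varied but infrequent use
        else:
            labels[user] = "limited"
    return labels


# Hypothetical users: (lifetime sessions, entropy)
users = {"u1": (120, 6.0), "u2": (80, 2.0), "u3": (5, 5.5), "u4": (3, 1.0)}
print(use_diffusion_quadrants(users))
```

Here the means are 52 sessions and 3.625 bits, so each toy user lands in a different quadrant.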

4.2 Feature selection and step 2

We began the second step by choosing 10 features of our data for further analysis. The features for this step of the method represent four major functions of the system: (1) use of digital library-related system functions, (2) use of traditional publisher materials and related system functions, (3) system functions that involve personalization, and (4) user-contributed functions. Table 2 summarizes the system features used in the second step of our proposed method.

4.2.1 Digital library features

The CCS is specifically designed with the goal of providing access to high-quality digital library resources. Analyzing usage of the embedded digital library within the CCS should therefore provide useful data about typical teacher practices around digital resources. To capture this behavior, we selected features that detail the clickstream patterns of the embedded digital library resources within the system. These resources were presented in the user interface under four sub-categories: Top Picks, Animations, Images/Visuals, and Inquiry with Data. Each category contained resources from DLESE that were either selected as highly relevant to the subject materials and unit focus (Top Picks) or had metadata of the appropriate type, scope, and topic (Animations, etc.). The items presented under each of these views were derived directly from DLESE web services [37] and presented to the user accordingly. Clickstreams into this component are tracked with the IR Activity and IR Saving features.

4.2.2 Publisher materials features

Publisher materials are included as a core component of the system functionality. The majority of the publisher’s items are digitized versions of paper-based materials, whether book chapters or supplemental materials such as hand-outs and assessments. The features here are therefore convenient digital proxies for real-world, paper-based analogs. Clickstreams into this component of the system and its sub-components were tracked and organized with the Publisher–Student Materials and Publisher–Teacher Materials features. Publisher–Student Materials include publisher materials such as digital versions of the student textbook, while Publisher–Teacher Materials include supplemental publisher materials such as instructional support materials.

4.2.3 Personalization features

The CCS provides functionality that allows teachers to personalize the contents of their accounts; in particular, users can save digital materials that they find of interest. Once saved, items may be retrieved for further review and may even be shared with others if desired. Saving is considered a personalization feature because it provides direct control over items that may fit a specific need (either at the time of saving or in the future). Furthermore, saving implies an interest in the saved item, and while that interest may only last for a short time, it nonetheless acts as a marker of personalization behavior. Personalization behavior was captured through the system’s saving behavior with the My Stuff Saving feature. For example, teachers are able to save embedded digital library resources such as animations, images, visuals, and top picks for the units of study they may be interested in. The My Stuff Activity feature represents the total activity performed within the My Stuff features of the system.

Table 3 Experimental cluster results showing the parameters obtained for each feature and cluster, where N indicates the number of users in each cluster

| Feature | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 |
|---|---|---|---|---|---|
| Sessions | 4.934 | 47.096 | 119.551 | 26.131 | 83.897 |
| Hours | 10.125 | 26.702 | 52.586 | 8.427 | 24.984 |
| IR Activity | 2.483 | 74.253 | 95.537 | 26.801 | 17.109 |
| IR Saving | 0.321 | 4.703 | 26.039 | 1.218 | 0.649 |
| Shared Stuff Activity | 3.382 | 25.753 | 81.159 | 17.558 | 82.174 |
| Shared Stuff Saving | 0.356 | 4.314 | 18.942 | 1.877 | 3.701 |
| My Stuff Activity | 0.123 | 4.864 | 19.116 | 1.106 | 0.860 |
| My Stuff Saving | 0.741 | 18.623 | 11.492 | 0.388 | 4.913 |
| Publisher–Student Material | 1.421 | 21.715 | 98.143 | 14.212 | 74.404 |
| Publisher–Teacher Material | 4.781 | 28.790 | 86.070 | 14.649 | 92.306 |
| N | 31 | 10 | 8 | 35 | 14 |

4.2.4 Community behavior features

Some features of the system promote community-centric behaviors. For example, resources and other materials that users find interesting can be shared with the community at large in a kind of community pool of resources called “Shared Stuff.” This feature has many implications when considering the nature of communities of practice of K-12 educators, who are often encouraged to share materials, pedagogical strategies, and best practices among their peers. Community behaviors are captured with the Shared Stuff Activity and Shared Stuff Saving features.
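Turning a raw event log into the ten features of Table 2 can be sketched as a simple aggregation per user. The event tuple layout `(session_id, component, action, seconds)`, the component names (`ir`, `shared`, `mystuff`, `pub_teacher`, `pub_student`), and the choice to count a save only under the corresponding Saving feature (not also under Activity) are all hypothetical; the actual CCS instrumentation is not described at this level of detail.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Feature order follows Table 2.
FEATURES = [
    "Sessions", "Hours", "IR Activity", "IR Saving", "Shared Stuff Activity",
    "Shared Stuff Saving", "My Stuff Activity", "My Stuff Saving",
    "Publisher-Teacher Materials", "Publisher-Student Materials",
]
# Hypothetical component names -> (activity feature, saving feature).
SAVE_CAPABLE = {
    "ir": ("IR Activity", "IR Saving"),
    "shared": ("Shared Stuff Activity", "Shared Stuff Saving"),
    "mystuff": ("My Stuff Activity", "My Stuff Saving"),
}
PUBLISHER = {"pub_teacher": "Publisher-Teacher Materials", "pub_student": "Publisher-Student Materials"}

Event = Tuple[str, str, str, float]  # (session_id, component, action, seconds)


def feature_vector(events: Iterable[Event]) -> List[float]:
    """Aggregate one user's events into the Table 2 feature order."""
    counts: Counter = Counter()
    sessions = set()
    seconds = 0.0
    for session_id, component, action, secs in events:
        sessions.add(session_id)
        seconds += secs
        if component in SAVE_CAPABLE:
            activity, saving = SAVE_CAPABLE[component]
            counts[saving if action == "save" else activity] += 1
        elif component in PUBLISHER:
            counts[PUBLISHER[component]] += 1
    counts["Sessions"] = len(sessions)
    counts["Hours"] = seconds / 3600.0
    return [float(counts[name]) for name in FEATURES]


events = [
    ("s1", "ir", "view", 600.0), ("s1", "ir", "save", 30.0),
    ("s1", "pub_teacher", "view", 1200.0), ("s2", "shared", "view", 300.0),
    ("s2", "mystuff", "save", 60.0),
]
v = feature_vector(events)
print(v[0], round(v[1], 3))  # 2.0 0.608
```

One such vector per user, stacked into a 98-row matrix, is the kind of input the clustering step consumes.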

4.3 A user typology

Our second step relies on feature clustering to develop a user typology. There are many clustering algorithms to choose from, but for this set of typology experiments we chose the model-based EM algorithm [8]. Elsewhere, we describe other experiments using other clustering algorithms and parameters [22]. Using the Bayesian Information Criterion (BIC) to determine the optimal number of clusters for a given dataset, we determined that there were five clusters to be discovered in our experiments. The details of the clusters are shown in Table 3. It should be noted that the clusters presented here show representative members of each cluster and do not necessarily represent actual users. That is, for each cluster in the table, the values present the ideal parameters that fit the object instances that belong within the cluster.
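The BIC-based choice of the number of clusters amounts to scoring each fitted model and keeping the best. This minimal sketch uses the Schwarz form $\mathrm{BIC} = p \ln n - 2 \ln L$, where lower is better. Some packages, notably R's mclust, report the sign-flipped $2 \ln L - p \ln n$ and keep the largest value instead. The parameter count assumes a diagonal-covariance mixture, and the log-likelihoods in the usage example are invented solely to exercise the arithmetic; they are not results from this study.

```python
import math
from typing import Dict


def diag_gmm_param_count(k: int, d: int) -> int:
    """Free parameters of a k-component, d-feature diagonal Gaussian mixture."""
    return k * d + k * d + (k - 1)  # means + variances + mixing weights (which sum to 1)


def bic(log_likelihood: float, n_params: int, n: int) -> float:
    """Schwarz BIC: lower is better."""
    return n_params * math.log(n) - 2.0 * log_likelihood


def select_k(log_liks: Dict[int, float], d: int, n: int) -> int:
    """Return the number of components whose fitted model has the lowest BIC."""
    return min(log_liks, key=lambda k: bic(log_liks[k], diag_gmm_param_count(k, d), n))


# Hypothetical maximized log-likelihoods for n = 98 users and d = 10 features.
toy = {1: -3000.0, 2: -2800.0, 3: -2700.0, 4: -2690.0}
print(select_k(toy, d=10, n=98))  # 3
```

Going from three to four components adds 21 parameters, a penalty of about 96 BIC units at $n = 98$, which the toy 10-unit gain in log-likelihood (20 BIC units) cannot pay for.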

As can be seen in Table 3, cluster 1 characterizes the low-use pattern (low variety, low frequency) of the step 1 use diffusion pattern. The users in this cluster logged very few total hours within the system, and they do not seem to be using the full range of system features. On the other hand, the experiments revealed an intense user cluster (cluster 3) that shows robust use of the system. Indeed, this “intense user” category seems to have used the system in full, exercising nearly every aspect of the system and logging both large numbers of sessions and significant hours of use. Two specialized user groups emerged from the typology. While both show about the same number of total hours within the system, cluster 5 identifies users who spend a great deal of time with the community features and publisher materials of the system, while cluster 2 shows users who spend far more time within the interactive resources and embedded digital library component of the system. These clusters are valuable to understand because they suggest that some users find the embedded digital resources as important as others find the community features. Finally, cluster 4 shows a group of users whose session count was above the median but below that of the intense and specialized users. This cluster also exhibits broad use of most of the features, with slightly more use of interactive resources. Table 4 summarizes each cluster, giving its key characteristics and the diffusion pattern to which it belongs. We have also created fictitious typology labels to provide an easy way to remember the cluster characteristics.


Table 4 An initial user typology derived from step 2, showing the diffusion patterns and characteristics of each usage type

| Cluster | Diffusion pattern | “Typology label” | Characteristics |
|---|---|---|---|
| 1 | Limited Use | “Uninterested Non-Adopter” | Very low overall system use |
| 2 | Specialized Use | “Interactive Resource Specialist” | Heavy use of interactive resources relative to other system features. Tends to access system weekly |
| 3 | Intense Use | “Ardent Power User” | Heavy and robust overall use of all features. Tends to access system daily |
| 4 | Non-Specialized Use | “Moderate Generalist” | Moderate overall system use. Shows slightly more use of interactive resources than other system features. Tends to access system several times monthly |
| 5 | Specialized Use | “Community Seeker Specialist” | Makes heavy use of Shared Stuff features and Publisher materials relative to Interactive Resource activity. Tends to use the system weekly |

4.4 Validation of results with field research findings

It is important to note that the method which produced the user typology clusters is not simply a new kind of analysis; rather, it presents an opportunity to bridge different analytical approaches. Although the clusters that emerged from our computational approach are based on clickstream data, these clusters are not arbitrary aggregations of user behaviors. They map onto real-world usage patterns that emerged from the traditional educational research techniques used in our study of the CCS. Thus, the findings from our computational method are validated by the findings from our field research of CCS use “in the wild.”

The CCS field trial used a mixed-method research design [19410]. We collected quantitative data via surveys in a longitudinal fashion from all DPS Earth science teachers (n = 124) during the 2009–2010 academic year. In addition, we collected qualitative data from a large subset of teachers during the same period. See Saldivar [29] for a complete discussion of the CCS field trial; here we summarize our data sources, sample sizes, and analysis techniques.

– Survey 1 (n = 85): A pre-survey administered at the beginning of the field trial to all Earth science teachers to establish a baseline of attitudes and behaviors regarding educational technology (including interactive digital resources), DPS Earth science curricular materials, and differentiated instruction. Data were collected via both quantitative (constrained-response) and qualitative (free-response) items. Quantitative survey items were analyzed by determining counts and frequencies of responses; qualitative items were analyzed via content analysis. (Subsequent surveys also collected both quantitative and qualitative data and were analyzed in the same manner.)

– Survey 2 (n = 84): A survey administered at the mid-point of the field trial to all Earth science teachers to establish a baseline of access to, and attitudes towards, instructional technology tools in general and the CCS in particular.

– Survey 3 (n = 81): A post-survey administered at the end of the field trial to all Earth science teachers. This final survey repeated a number of items from the first two surveys to enable a pre- and post-analysis of changes, if any, in CCS users’ attitudes and behaviors during the course of the school year.

– Adoption interviews (n = 24): These semi-structured telephone interviews regarding teachers’ adoption and use of the CCS were administered to teachers who were regular-to-frequent CCS users and resulted in an extensive database of qualitative data (interview transcripts), which were analyzed via content analysis.

– Classroom observation cycles (n = 8): Teachers identified as frequent CCS users were each recruited to participate in several hours of interviews and classroom observations during the course of the school year, resulting in an extensive database of qualitative data (interview transcripts and field notes), which were analyzed via content analysis.

Analysis of these field trial data, conducted independently of the computational analysis of clickstream data described above, suggested that teachers in our study fell along a spectrum of system use behavior, from low-frequency users (corresponding to cluster 1) to moderate (cluster 4) and heavy (cluster 3) users. The final two clusters fell on the moderate-to-heavy side of the spectrum. Rogers’ [27] theory of diffusion of innovation, discussed in our introductory section, predicts that the earliest users of a new technology (“innovators” and “early adopters”) comprise approximately 16% of all users. Moore [24], who revised and extended Rogers’ work, further argues that a technology cannot become “mainstream” within a given population until it is adopted by at least half of all potential users. Our step 2 analysis indicates that about two-thirds of teachers in the district adopted the system to a significant degree, with one-third of users (represented by clusters 2, 3, and 5) making heavy use of the system. Even if we confine ourselves to a theoretical framework such as Rogers’, which focuses on quantifying when different segments of a target population adopt an innovation, our findings indicate that the CCS has been strongly adopted and is heavily used by the target population.
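The one-third and two-thirds figures can be checked directly against the cluster sizes reported in Table 3. In this short Python sketch, reading “adopted the system to a significant degree” as clusters 2 through 5 is our assumption; the heavy-use group (clusters 2, 3, and 5) is as stated in the text.

```python
# Cluster sizes (N) as reported in Table 3.
cluster_sizes = {1: 31, 2: 10, 3: 8, 4: 35, 5: 14}

total = sum(cluster_sizes.values())                   # should equal N = 98
heavy = sum(cluster_sizes[c] for c in (2, 3, 5))      # intense + specialized clusters
adopters = heavy + cluster_sizes[4]                   # assumption: add the moderate generalists

print(total, heavy, adopters)                         # 98 32 67
print(f"{heavy / total:.1%} heavy, {adopters / total:.1%} significant adopters")
# Compare with Rogers' ~16% early-adopter share and Moore's 50% "mainstream" threshold.
print(heavy / total > 0.16, adopters / total > 0.5)   # True True
```

The shares come to 32.7% and 68.4%, consistent with the one-third and two-thirds stated above, and both exceed the respective Rogers and Moore benchmarks.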

It can be argued that identifying low, moderate, and intense user clusters is hardly a profound finding; after all, any set of system users can be divided into “low” and “heavy” user categories if the major variable of interest is frequency of system use. Recall, however, that use diffusion theory calls for an analysis that combines frequency of use with an evaluation of how a given technology is used. Looking through the lens of use diffusion, the most salient finding from our computational method is that two of the clusters, cluster 2 (the Interactive Resource Specialist) and cluster 5 (the Community Seeker Specialist), represent “variety of use” behaviors similar to user behaviors identified by our field research. This enables us to validate the findings of our computational method with real-world data while at the same time giving us deep insights into how teachers integrated digital library resources into their instructional practices.

For example, Interactive Resource Specialists were moderately heavy users who spent most of their time in the Interactive Resources/Embedded Digital Library component of the CCS. Data from the final survey show that many teachers highly valued interactive resources. In response to the statement “Using interactive resources effectively is important to my personal success as a teacher,” about three-fourths of respondents agreed or strongly agreed. In response to the statement “Using interactive resources in the classroom enables me to better meet the learning needs of students in my classroom,” almost 90% agreed or strongly agreed. Qualitative data provide insight into these survey findings.

In response to an interview question that asked him to discuss why he used the CCS, teacher Corey told us: “My teaching practices have always focused on student engagement. I found that the CCS made it easier for me to find [interactive] resources with which I could capture student attention.”

Many teachers reported that they accessed digital resources to supplement textbook material. Overwhelmingly, the most popular kinds of digital resources teachers found via the CCS were graphic representations (e.g., pictures, diagrams, animations) of Earth science phenomena. In the following example, note Carlie’s observations regarding students’ engagement with and understanding of key concepts: “I think that since students are visual learners, they’re hands-on learners, they’re growing up in this technological age. Seeing the animations really drives home the idea or the topic. I ask them if it does help them [to see graphic representations], if whatever they just viewed made the material make more sense, and often times they [say], ‘Wow, that totally made sense. Can we see it again?’ I think they benefit from it a lot and they’re vocal about it. They love it.”

Liz’s experience was similar to Carlie’s in that her students seemed excited simply by the presence of digital resources, which led to increased engagement and deeper comprehension. Liz told us in an interview: “I think in general my students are really interested in seeing any kind of [digital resource]. [Digital resources are] really engaging for students because it’s different than what they normally see in the classroom. When they walk in, they’re just super-excited when they see the projector’s set up. [They say,] ‘Oh yes, we get to watch a video,’ or ‘We get to see a PowerPoint.’ They come in automatically excited to learn.”

In contrast, Community Seeker Specialists had CCS usage frequencies comparable to Interactive Resource Specialists but spent most of their time in the Shared Stuff and Publisher components of the system. Publisher materials had utility for teachers because they were directly related to the Earth science curricula used by the district, but why were Shared Stuff materials popular? Survey data suggest that the ability to see other teachers’ uploaded materials gave CCS users insights into the community of Earth science educators in their district that were not possible before the system was made available. For instance, in the final survey, almost 61% of respondents agreed or strongly agreed with the statement “The CCS has increased my awareness of other teachers’ practices.” Approximately half of respondents agreed or strongly agreed with the statement “The CCS has helped me become a more active member of the DPS professional learning community.” Qualitative data provide context for these survey responses.

In response to an open-ended survey question that asked respondents to explain the value of accessing Shared Stuff, teacher Sheila stated: “Looking at the Shared Stuff uploaded by others gives me ideas about how I can present particular concepts in my [own] classroom.” In response to the same question, Norma commented: “[Resources in Shared Stuff] have given me different perspectives on the different topics [in the curriculum] and thus enabled me to teach more effectively.”

Henrietta told us in an interview: “When [another Earth science] teacher from across town might have found a great website or a video clip or something that really does bring the point home and makes it more relevant to the students, that’s what I’m looking for, and so far I have found some things like that on [the CCS].” Later in the same interview, Henrietta added that she and the other Earth science teachers at her school, all CCS users, often discussed useful resources that they had found using the CCS: “We certainly confer on [resources we discover]. [We tell each other,] ‘This was great’ [and] ‘This is something you need to [use] when you get to this [certain] point in your book.’”

5 Discussion

Step 1 of our method shows promise for the rapid determination of application-independent, large-grained usage patterns, while Step 2 produced a meaningful typology based on clusters that emerged from refined, application-dependent feature data. The computational generation of clusters that correspond with user behaviors identified by qualitative analysis of interview and survey data suggests that it is possible to build an understanding of user behaviors and motivations through quantitative examination of their behaviors within a system or application. This is only a first step, however, toward understanding the complex interplay between observable behaviors that occur within an application and behaviors that occur outside of that system.

The CCS’s main purpose was to encourage teachers to differentiate instruction by incorporating high-quality digital resources into their teaching practices. Thus, the system was designed to produce a specific outcome: teacher differentiation of instruction. Our field trial data validated the in-system behaviors characterized, for example, by the Interactive Resource Specialist and Community Seeker Specialist clusters. However, we also know from field trial data that teachers with similar in-system behaviors may have very different teaching practices in the classroom. It is also possible that teachers with similar classroom practices may exhibit different in-system behaviors. Ideally, then, the most complete understanding of a system’s impact on users would come from a method that takes into account both in-system and out-of-system behaviors, motivations, and characteristics.

As previously mentioned in Sect. 2.2, existing literature on digital library use offers some guidance. Researchers in the field of human–computer interaction have long used the concept of personas to guide the development of information systems. Personas are constructs devised by system developers to describe different types of system users; they most often take the form of fictional biographies that include demographic characteristics and descriptions of how and why users interact with the system. With personas, developers can model and build systems that more specifically address the needs of actual users, which increases the chances that the system will serve its end users better and, subsequently, enhances the adoption of the system. Our method describes in-system behaviors, but by extending our approach to incorporate a full set of out-of-system behaviors, including, for example, demographic information, teaching behaviors, and student outcomes (that is, by developing personas of CCS users), we can not only develop a richer understanding of the relationship between system use and users’ practices “in the wild,” but also extend the notion of personas beyond system design to actual system adoption and use over time.

While we have obtained promising results from this study, there are several significant limitations. First, our method has only been applied to a single population, and it is plausible that the results we obtained are specific to the population under examination. A different typology may emerge from larger data populations; for example, new specialized user types might emerge. Second, to validate that our results are robust and generalizable, it will be important to study the method with different applications and user populations. Third, while the computational method seems to support the analyses of qualitative data, there are opportunities to extend our understanding of system behavior and user characteristics to their impact on other meaningful behaviors outside the system.

As the case study shows, both steps of the method yielded interesting results. It appears, however, that the entropy modeling of the first step requires more discriminatory power when describing specialized users. While the new calculations do a better job of penalizing usage, future modifications may still be developed to separate the specialized group(s) further. Furthermore, work remains to develop a precise coarse-grained model that better separates every user group. The challenges revealed by the first step may also indicate that specialization does not always imply low variety; this becomes clear when the details of system usage are examined with variety expanded to include more variables, as in the extended entropy calculations and the second step of the method.

Applying automated typology discovery to inform, guide, and develop system user characteristics is one aspect of understanding within-system user behavior that may be extended, through personas, to include broader behaviors and user characteristics. Such an extension may prove to be a useful tool for understanding behavioral characteristics outside the system, so that an extremely rich description of system users emerges that includes both in-system and out-of-system behaviors. This would, in turn, allow more data to be used in making predictive and prescriptive tool (in-system), policy (out-of-system), and training (in-system and out-of-system) decisions.

6 Broader impacts

While this research has focused on demonstrating the value of our method for the study of digital library adoption, we believe it can help address other educational challenges, chief among them teacher professional development, the correlation of teacher practice with student learning, and the evaluation of teaching practices.

Extant research indicates that one of the major barriers to effective integration of technology into teaching practices is the dearth of quality professional development and training vis-à-vis technology [14]. Teachers often complain that even when technology-related training is made available, it does not meet their needs as practicing educators because it assumes that they do, or do not, already possess a given set of skills. By modeling teachers’ system use behavior, one will be able to understand inter-user differences and target system training and professional development to users’ true needs. For example, CCS users in the Community Seeker Specialist cluster might be presented with training that would help them integrate more interactive resources into their teaching, because such users are known not to spend much of their CCS usage time exploring interactive resources. For teachers in the Interactive Resource Specialist cluster, such training would be redundant because they already make extensive use of the interactive resources available via the CCS.

While a consensus has emerged among policy makers and education researchers that effective adoption of technology can indeed have a positive impact on teacher practice [17, 32], the relationship between teachers’ adoption of technological systems and student achievement is still not well understood. As part of our larger study of the CCS, we are analyzing the standardized test scores of students taught by teachers who used the CCS to determine what impact the teachers’ use of the CCS had on student outcomes. Our analysis of student test score data for both our field trial year and the prior school year [28] shows increased overall achievement among Earth science students after the introduction of the CCS; due to the limitations of the data available to us, however, we cannot make strong causal claims. We continue our efforts to understand what link, if any, exists between system use and student outcomes. In the long term, the ability to correlate system use behaviors with students’ academic achievement will make it possible to focus on the teacher use behaviors that most benefit students; in turn, these behaviors could be taught to other teachers.
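In its simplest form, the correlation anticipated here could be a Pearson coefficient between a per-teacher use measure and the mean test score of that teacher’s students. The Python sketch below is a minimal illustration under that assumption; the function name `pearson_r` and the choice of Pearson’s r are ours rather than the study’s analysis, and, as the paragraph stresses, a coefficient of this kind would not by itself support causal claims.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers.

    Illustrative only: xs might be a per-teacher system-use measure and ys the
    mean test score of that teacher's students (both hypothetical inputs).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```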

Evaluating teachers’ instructional practices is a very difficult task that requires a massive commitment of human and financial resources [5]. For example, most evaluation systems rely on administrators to observe teachers in the classroom, a very labor-intensive approach that can result in teachers being awarded or denied tenure or pay raises based on the short period of time during which they were observed. Further, variance in evaluators’ adherence to evaluation rubrics and bias in the evaluation instruments themselves can make it difficult to assess the validity of the evaluation process [38]. Alternatively, asking teachers to self-report their teaching practices might provide evaluators with additional information, but such data would be subject to all the limitations that come with asking individuals to report on their own behaviors. Neutral “third party” data, such as the clickstream data to which we applied our computational method, can help bridge the gap between what evaluators observe during the relatively brief periods they visit a teacher’s classroom and the teacher’s activities when he or she is not being formally evaluated. A teacher evaluation process that combines traditional observational and self-reported data with usage data produced by systems like the CCS would give administrators, policymakers, and teachers themselves deeper insights into instructional practices; further, such a hybrid system would produce a rich new understanding of teaching-related best practices that could then be shared with other educators.

7 Conclusion

By applying models of adoption and use diffusion alongside data mining techniques, we have developed a two-step method for discovering patterns within clickstream data that reveal both general and specific typologies of digital library user behavior. The application of this method to data from an academic year-long field trial in a large urban school district provided a rich opportunity to study the adoption and usage behavior of embedded digital library resources by middle and high school science teachers. The method shows considerable promise for extracting useful behavioral patterns “in the wild”; the resulting fine-grained user typology maps well onto results emerging from traditional educational field research methods. The proposed method, while requiring more data for further validation, provides a valuable contribution towards our effort to develop automatic methods for studying digital library adoption and use. When fully realized, this method also has the potential to be extended to other applications and areas, such as informing teacher professional development, understanding the impact of digital library applications on student learning, and developing new approaches to the evaluation of teacher practice.

Acknowledgments This material is based upon work supported in part by the NSDL program in the National Science Foundation under Awards 0734875 and 0840744, and by the University of Colorado at Boulder.

References

1. Ayers, E., Nugent, R., Dean, N.: Skill Set Profile Clustering Based on Student Capability Vectors Computed From Online Tutoring Data. In: Proceedings of the 1st International Conference on Educational Data Mining, Montreal, Canada, pp. 210–217 (2008)

2. Benevenuto, F., Rodrigues, T., Cha, M., Almeida, V.: Characterizing user behavior in online social networks. In: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement Conference, Chicago, Illinois, USA, pp. 49–62. ACM (2009). doi:10.1145/1644893.1644900

3. Brandtzæg, P.B.: Towards a unified Media-User typology (MUT): a meta-analysis and review of the research literature on media-user typologies. Comput. Hum. Behav. 26(5), 940–956 (2010). doi:10.1016/j.chb.2010.02.008. http://www.sciencedirect.com/science/article/B6VDC-4YJSW8D-1/2/011453cc70c0a6bdc29d3fde3b8a9304

4. Creswell, J.: Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research. Pearson Education, Upper Saddle River, NJ (2008)

5. Danielson, C., McGreal, T.L.: Teacher Evaluation: To Enhance Professional Practice. Association for Supervision and Curriculum Development, Alexandria, VA, USA (2000)


6. Davis, F.D.: Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quart. 13(3), 319–340 (1989)

7. Deffuant, G., Huet, S., Amblard, F.: An individual-based model of innovation diffusion mixing social value and individual benefit. Am. J. Sociol. 110(4), 1041–1069 (2005)

8. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodological) 39(1), 1–38 (1977)

9. Dominguez, A.K., Yacef, K., Curran, J.R.: Data mining for generating hints in a python tutor. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 91–100 (2010)

10. Dzurec, L., Abraham, I.: The nature of inquiry: linking quantitative and qualitative research. Adv. Nurs. Sci. 16, 73–79 (1993)

11. Eynon, R., Malmberg, L.: A typology of young people’s internet use: implications for education. Comput. Educ. 56(3), 585–595 (2011). doi:10.1016/j.compedu.2010.09.020. http://www.sciencedirect.com/science/article/B6VCJ-517J24P-1/2/5c5d02fb284c227d3a9ef1cc4f3e1bf6

12. Fuller, F.F.: Concerns of teachers: a developmental conceptualization. Am. Educ. Res. J. 6(2), 207–226 (1969)

13. Hall, G.E.: The concerns-based approach to facilitating change. Educ. Horizons 57(4), 202–208 (1979)

14. Hanson, K., Carlson, B.: Effective Access: Teachers’ Use of Digital Resources in STEM Teaching. Gender, Diversity, and Technology Institute, Education Development Center, Inc., Newton (2005)

15. Hew, K.F., Hara, N.: Empirical study of motivators and barriers of teacher online knowledge sharing. Educ. Technol. Res. Dev. 55(6), 573–595 (2007)

16. Horrigan, J.: A typology of information and communication technology users. Research report, Pew Internet & American Life Project (2007)

17. Kelly, M.G., McAnear, A.: National Educational Technology Standards for Teachers: Preparing Teachers to Use Technology. International Society for Technology in Education (ISTE), Eugene (2002)

18. Lage, K., Maness, J., Losoff, B.: Receptivity to library involvement in scientific data curation: a case study at the University of Colorado Boulder. Portal Libr. Acad. 11(4), 915–937 (2011)

19. Leech, N., Onwuegbuzie, A.: A typology of mixed methods research designs. Qual. Quant. 43(2), 265–275 (2009)

20. Maness, J., Miaskiewicz, T., Sumner, T.: Using personas to understand the needs and goals of institutional repository users. D-Lib Mag. 14(9/10), 1082–9873 (2008)

21. Maull, K., Saldivar, M., Sumner, T.: Understanding digital library adoption: a use diffusion approach. In: Proceeding of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, pp. 259–268. ACM (2011)

22. Maull, K.E., Saldivar, M.G., Sumner, T.: Online curriculum planning behavior of teachers. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 121–130 (2010)

23. Miaskiewicz, T., Sumner, T., Kozar, K.: A latent semantic analysis methodology for the identification and creation of personas. In: Proceeding of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, pp. 1501–1510. ACM (2008)

24. Moore, G.A.: Crossing the Chasm: Marketing and Selling Technology Products to Mainstream Customers. HarperCollins, New York (2006)

25. Pennington, M.C.: Cycles of innovation in the adoption of information technology: a view for language teaching. Comput. Assist. Lang. Learn. 17(1), 7–33 (2004)

26. Ram, S., Jung, H.: The conceptualization and measurement of product usage. J. Acad. Mark. Sci. 18(1), 67–76 (1990). doi:10.1007/BF02729763. http://www.springerlink.com/content/9kjl7574145320mv

27. Rogers, E.M.: Diffusion of Innovations, 5th edn. The Free Press, New York (2003)

28. Saldivar, M.: Teacher integration of digital resources into instructional practice. CCS Report No. 4, Digital Learning Sciences, Boulder (2011)

29. Saldivar, M.G.: Teacher adoption of a Web-based instructional planning system. Doctoral dissertation, University of Colorado, Boulder, CO (2012)

30. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948)

31. Shih, C., Venkatesh, A.: Beyond adoption: development and application of a use-diffusion model. J. Mark. 68(1), 59–72 (2004). http://www.jstor.org/stable/30161975

32. Smerdon, B.: Teachers’ Tools for the 21st Century: A Report on Teachers’ Use of Technology. U.S. Dept. of Education, Office of Educational Research and Improvement, Washington, DC (2000)

33. Straub, E.T.: Understanding technology adoption: theory and future directions for informal learning. Rev. Educ. Res. 79(2), 625–649 (2009)

34. Sumner, T., Team, C.: Customizing science instruction with educational digital libraries. In: Proceedings of the 10th Annual Joint Conference on Digital Libraries, JCDL ’10, New York, NY, USA, pp. 353–356. ACM (2010). doi:10.1145/1816123.1816178

35. Turner, M., Kitchenham, B., Brereton, P., Charters, S., Budgen, D.: Does the technology acceptance model predict actual use? A systematic literature review. Inform. Software Technol. 52(5), 463–479 (2010)

36. Venkatraman, M.P.: The impact of innovativeness and innovation type on adoption. J. Retail. 67(1), 51–67 (1991)

37. Weatherley, J.: A web service framework for embedding discovery services in distributed library interfaces. In: Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL ’05), New York, NY, USA, pp. 42–43. ACM (2005). doi:10.1145/1065385.1065394

38. Wilson, B., Wood, J.A.: Teacher evaluation: a national dilemma. J. Person. Eval. Educ. 10(1), 75–82 (1996)

39. Xu, B., Recker, M., Hsi, S.: Data deluge: opportunities for research in educational digital libraries. In: Edwards, C.M. (ed.) Internet Issues: Blogging, the Digital Divide and Digital Libraries. Nova Science Pub. Inc., New York (2010)


Fig. 5 The CCS offers four major capabilities: access to publisher material (IES Investigations tab), access to digital library resources (Interactive Resources tab), personalization capabilities (My Stuff tab), and community features (Shared Stuff tab). The Interactive Resources component is opened above, showing the top recommended digital resources.

patterns into a fine-grained user typology that continues to model frequency and variety.

The first aspect of refinement requires that we expand frequency and variety by choosing more detailed features to model those variables. In this study, frequency is expanded to include both sessions and total time spent within the system. Variety is expanded to include eight features that capture varying usage across system components, including publisher materials, digital library components, and user-contributed/social features of the system. While there are a number of clustering algorithms to choose from for this second step, these experiments are based on the widely used model-based expectation maximization (EM) algorithm [8]. EM alternates between two steps: given the current model parameters, it estimates for each object instance to be clustered the probability that it belongs to each cluster, and it then re-estimates the parameters of each cluster’s probability distribution from those membership probabilities. After repeated iterations, the model settles into a set of clusters represented by the parameters derived for each cluster. The resulting clusters comprise a fine-grained user typology. The EM algorithm is used in these experiments because it is fast, robust, and typically converges quickly. Furthermore, cluster shapes (e.g., circles, ellipsoids) may vary, allowing more flexible cluster membership.
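The EM loop described above can be sketched in plain Python. This is a minimal illustration, not the implementation used in the study: it assumes Gaussian components with diagonal covariances (whereas the paper notes that cluster shapes may vary), uses a deterministic farthest-point initialisation, and runs a fixed number of iterations instead of testing for convergence. The function name `em_diagonal_gmm` and its parameters are hypothetical.

```python
import math


def em_diagonal_gmm(points, k, iterations=100, min_var=1e-6):
    """Fit a k-component Gaussian mixture with diagonal covariances by EM.

    points: list of equal-length feature vectors (e.g. per-user feature counts).
    Returns (weights, means, variances, responsibilities, log_likelihood),
    where log_likelihood is the value computed in the final E-step.
    """
    n, d = len(points), len(points[0])
    # Deterministic farthest-point initialisation of the component means.
    means = [list(points[0])]
    while len(means) < k:
        far = max(points, key=lambda p: min(
            sum((a - b) ** 2 for a, b in zip(p, m)) for m in means))
        means.append(list(far))
    # Start every component with the overall per-feature variance.
    col_mean = [sum(p[j] for p in points) / n for j in range(d)]
    col_var = [max(sum((p[j] - col_mean[j]) ** 2 for p in points) / n, min_var)
               for j in range(d)]
    variances = [list(col_var) for _ in range(k)]
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: posterior membership probabilities, computed via log-sum-exp.
        resp, log_lik = [], 0.0
        for p in points:
            logs = []
            for c in range(k):
                lp = math.log(weights[c])
                for j in range(d):
                    v = variances[c][j]
                    lp -= 0.5 * (math.log(2 * math.pi * v) + (p[j] - means[c][j]) ** 2 / v)
                logs.append(lp)
            top = max(logs)
            total = top + math.log(sum(math.exp(l - top) for l in logs))
            log_lik += total
            resp.append([math.exp(l - total) for l in logs])
        # M-step: re-estimate weights, means and variances from responsibilities.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            if nc < 1e-9:  # empty component: keep its previous parameters
                continue
            weights[c] = nc / n
            for j in range(d):
                mu = sum(r[c] * p[j] for r, p in zip(resp, points)) / nc
                means[c][j] = mu
                variances[c][j] = max(
                    sum(r[c] * (p[j] - mu) ** 2 for r, p in zip(resp, points)) / nc,
                    min_var)
    return weights, means, variances, resp, log_lik
```

Assigning each user to the component with the largest responsibility yields hard cluster labels. An actual analysis would more likely rely on an established package, such as R’s mclust or scikit-learn’s `GaussianMixture`, which also support other covariance structures.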

4 Case study

For the remainder of this paper, we focus on the case study that examines the use of the CCS. The CCS is a National Science Foundation funded program overseen by Digital Learning Sciences (DLS), a joint institute of the University of Colorado at Boulder and the University Corporation for Atmospheric Research. DLS began development of the CCS in early 2008, and in July 2009 the CCS was made available to all Earth science teachers in a large urban school district in the Midwest. Over 100 teachers were trained on the CCS for use in the 2009–2010 school year.

Fig. 6 Use diffusion pattern comparison for step 1 from [21] and the new entropy with penalty calculation, showing frequency and variety of the data source

The CCS provides four major features to the end user (see Fig. 5). First, it provides users with Web-based access to digital versions of the paper-based student textbooks, teacher manuals, and curriculum guides that comprise the Earth science curricula for both Grade 6 and Grade 9. The manuals and guides outline the state standards that must be met, explain how the various units in the Earth science curricula are connected to state standards, and provide additional supplementary materials for teacher use, such as activities, teaching tips, and student assessments. These materials are all grouped under a single user interface component and are organized by key concept, which allows teachers to organize their lessons in a manner that flexibly meets the learning needs of their students. The digital versions of these curricular materials are identical to what was already available to teachers in paper form but can now be accessed from any computer with a Web connection.

Second, the CCS integrates digitized publisher content with interactive resources available from the Digital Library for Earth System Education (DLESE), a collection of Earth science-related digital resources that is part of the National Science Digital Library. By clicking on the Interactive Resources tab, a user can see recommendations for animations, video clips, classroom activities, and other digital resources that pertain to the given key concept. The interactive resources available via the CCS have been vetted by the experts who manage the DLESE collection. Moreover, these resources are filtered by the system to ensure that they align with the Earth science curricula. Thus, when a teacher accesses a DLESE resource on, for example, volcanoes, the resource has not only been determined to have educational value by a subject matter expert but has also been tied to the specific science concepts that must be taught, as well as the science standards that must be met.

The third major feature of the CCS is an interactive Web 2.0 capability whereby teachers can save digital resources recommended to them via the Interactive Resources component, or upload their own resources, to an area called My Stuff, thus storing teacher-developed materials in the same space as interactive resources from DLESE or from the curriculum for easy access. Once a resource is saved to My Stuff, teachers have the option to share a copy of the resource to an area of the CCS called Shared Stuff, which is accessible to any CCS user who clicks on the Shared Stuff component associated with a given key concept.

Table 1 Summary statistics for the data plotted in Fig. 6 (N = 98)

Statistic   Entropy   Lifetime sessions
Min         48063     1.00
1st Qu.     3.559     7.00
Median      4.834     23.00
Mean        4.535     37.85
3rd Qu.     5.830     48.00
Max         7.336     171.00
Std Dev     1.56      42.201

The final major feature enables teachers to share materials with their peers. When a digital resource is added to Shared Stuff, the teacher who originally uploaded the resource, as well as other CCS users, can add searchable tags or keywords, so that any search of the CCS for those tags will list all resources tagged with the same keyword or phrase. In addition, CCS users can add “star ratings” to resources so that other users can determine how their colleagues rate a given resource. A resource that many users rate highly (four or five stars) might be more likely to capture the attention of system users than a resource with a low rating. Hew and Hara [15] argue that this kind of sharing may be a catalyst for enabling improvement of practice, because such knowledge sharing tends to be tied to shared, situated instructional goals and challenges and is thus more likely to be relevant to a teacher’s immediate, short-term needs. A more detailed description of the CCS, including results from the field trial, can be found in Sumner et al. [34].
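How star ratings might surface highly rated resources can be illustrated with a small ranking function. This is a hypothetical sketch: the data layout (a mapping from resource identifiers to lists of 1–5 star ratings), the function name `rank_by_rating`, and the tie-breaking rule are our assumptions, and nothing here describes how the CCS actually stores or orders ratings.

```python
from statistics import mean


def rank_by_rating(ratings):
    """Order resources by mean star rating, highest first.

    ratings: dict mapping a resource id to a list of 1-5 star ratings
    (hypothetical layout). Unrated resources are omitted; ties are broken
    by the number of ratings (more first), then alphabetically by id.
    """
    averages = {rid: mean(stars) for rid, stars in ratings.items() if stars}
    return sorted(averages,
                  key=lambda rid: (-averages[rid], -len(ratings[rid]), rid))
```

Breaking ties by rating count favours resources that more colleagues have judged, which loosely matches the intuition that a resource many users rate highly is more likely to attract attention.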

4.1 Data source and step 1

The data source for this study was the clickstream data of 98 users from 9 months of interaction with the CCS. The data for these experiments were particularly interesting because the user interface contains many dynamic client-side components whose interactions would not ordinarily be captured in a traditional web server log. The system was therefore instrumented to record additional information from clicks on these dynamic client-side interface components, so the CCS clickstream log includes detailed tracking of user activity that provides rich user interaction data. Since we were able to capture these data over a relatively long period of time, we were able to analyze actual user behavior as the users worked with the system in a natural, unhindered manner. Furthermore, we were able to examine how publisher materials, which are already core to standard teacher practice, are complemented by digital library resources, giving us a broader view of how teachers integrate these digital library resources into their traditional practices.
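Frequency in this study is measured as lifetime sessions and total system hours (the Sessions and Hours features). How sessions were delimited is not described in this section; a common convention is to start a new session after a period of inactivity. The sketch below assumes that convention with a 30-minute timeout, which is our choice, and counts a session’s duration from its first to its last click. The function name `sessions_and_hours` is hypothetical.

```python
from datetime import datetime, timedelta


def sessions_and_hours(timestamps, timeout=timedelta(minutes=30)):
    """Split one user's click timestamps into sessions and total their time.

    A gap strictly longer than `timeout` (an assumed threshold) starts a new
    session. Each session lasts from its first to its last click.
    Returns (session_count, total_hours).
    """
    if not timestamps:
        return 0, 0.0
    ts = sorted(timestamps)
    sessions, total = 1, timedelta(0)
    start = prev = ts[0]
    for t in ts[1:]:
        if t - prev > timeout:
            total += prev - start  # close the current session
            sessions += 1
            start = t
        prev = t
    total += prev - start  # close the final session
    return sessions, total.total_seconds() / 3600
```

Measuring only first-to-last click underestimates time spent on the final page of each session; the timeout value also directly shifts the session counts, so it would need to be fixed before comparing users.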

As a result of step 1, we obtained initial frequency and variety computations. Table 1 shows the descriptive statistics resulting from the first step of our experimental method. The data show that the mean entropy of our population is 4.48 and the mean frequency, here measured as the lifetime sessions that a user logged, is 37.85.
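Variety in step 1 is measured by clickstream entropy, which builds on Shannon’s measure [30]. The exact formulation used in the study, including the penalty behind the new calculation, is not reproduced here; the sketch below computes only plain Shannon entropy (in bits) over the system components a user clicked, as an unpenalized baseline. Treating component labels as the unit of analysis and the function name `clickstream_entropy` are our assumptions.

```python
import math
from collections import Counter


def clickstream_entropy(clicks):
    """Shannon entropy, in bits, of how a user's clicks spread over components.

    clicks: sequence of component labels, one per logged click.
    Returns 0.0 for an empty stream or for a user who only used one component.
    """
    counts = Counter(clicks)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A user who spreads clicks evenly over four components scores 2 bits, while one who only ever opens a single component scores 0, which is why higher entropy is read as greater variety.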

When we apply the use diffusion model to the data by dividing the plot into quadrants at the means of each axis, we obtain a use diffusion pattern with a larger limited use group (n = 40) than under the previous entropy calculations (n = 38). Similarly, there are fewer intense users (n = 26; entropy > 4.54, frequency > 37.85), that is, users who exhibit both greater variety and greater frequency. This is a decrease of four users from the initial computation (n = 30). The plot also shows that there are now five users in the specialized category (n = 5), where previously there was only a single user.

Table 2 CCS cluster experiment features and descriptions

No.  Feature label                 Description
1    Sessions                      Total lifetime sessions
2    Hours                         Total system hours
3    IR Activity                   Total activity within interactive resources
4    IR Saving                     Total interactive resource saving behavior
5    Shared Stuff Activity         Total user-contributed content activity
6    Shared Stuff Saving           Total user-contributed content saves
7    My Stuff Activity             Total “My Stuff” activity
8    My Stuff Saving               Total “My Stuff” saving behavior
9    Publisher–Teacher Materials   Total activity within publisher–teacher materials
10   Publisher–Student Materials   Total activity within publisher–student materials

The key improvement of the new calculations over the initial calculation is that they spread the data out substantially; this change can be seen both in the standard deviation (from 1.25 up to 1.56) and in the visual spread of the graph. It is encouraging that higher entropy now incurs larger penalties, but further generalizable changes could still be made.

As can be seen in Fig. 6, the value of the use diffusion pattern modeling is that it provides a comprehensive overview of coarse-grained behavior within the system. The dotted line divides the graph at the means of each axis, thus creating the four use diffusion quadrants, so we can quickly see the distribution of basic usage patterns within the system. At the same time, however, the pattern does not produce enough information to determine the specific details of use. To obtain them, we must turn to the second step of the method.
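The quadrant split can be written as a small classification function. This sketch assumes the four use diffusion categories of Shih and Venkatesh [31]: intense (high variety and frequency), specialized (high frequency, low variety), non-specialized (high variety, low frequency), and limited (both low). Variety (entropy) and frequency (lifetime sessions) are each compared with their population mean using a strict inequality, as in the thresholds quoted for intense users. The function name and input layout are hypothetical.

```python
from statistics import mean


def use_diffusion_quadrants(users):
    """Assign each user to a use diffusion category by splitting at the means.

    users: dict mapping a user id to a (variety, frequency) pair, e.g.
    (clickstream entropy, lifetime sessions). Values equal to a mean count as low.
    """
    variety_mean = mean(v for v, _ in users.values())
    frequency_mean = mean(f for _, f in users.values())
    labels = {}
    for user, (variety, frequency) in users.items():
        high_v, high_f = variety > variety_mean, frequency > frequency_mean
        if high_v and high_f:
            labels[user] = "intense"
        elif high_f:
            labels[user] = "specialized"      # frequent use, narrow variety
        elif high_v:
            labels[user] = "non-specialized"  # broad variety, infrequent use
        else:
            labels[user] = "limited"
    return labels
```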

4.2 Feature selection and step 2

We began the second step by choosing 10 features of our data for further analysis. The features for this step of the method represent four major functions of the system: (1) use of digital library-related system functions, (2) use of traditional publisher materials and related system functions, (3) system functions that involve personalization, and (4) user-contributed functions. Table 2 summarizes the system features used in the second step of our proposed method.
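Turning a clickstream log into the per-user feature counts of Table 2 amounts to grouping events by user and feature. The sketch below assumes a hypothetical event format of `(user_id, component, action)` tuples and hypothetical component and action names; how the CCS log actually encodes these components is not specified here. It covers only the eight activity and saving features, since Sessions and Hours come from session timing rather than event counts.

```python
from collections import defaultdict

# Hypothetical mapping from logged (component, action) pairs to the eight
# activity/saving features of Table 2 (names and actions are assumptions).
FEATURE_MAP = {
    ("interactive_resources", "view"): "IR Activity",
    ("interactive_resources", "save"): "IR Saving",
    ("shared_stuff", "view"): "Shared Stuff Activity",
    ("shared_stuff", "save"): "Shared Stuff Saving",
    ("my_stuff", "view"): "My Stuff Activity",
    ("my_stuff", "save"): "My Stuff Saving",
    ("publisher_teacher", "view"): "Publisher-Teacher Materials",
    ("publisher_student", "view"): "Publisher-Student Materials",
}
FEATURES = list(dict.fromkeys(FEATURE_MAP.values()))  # fixed column order


def feature_vectors(events):
    """Count each user's events per feature.

    events: iterable of (user_id, component, action) tuples.
    Returns {user_id: [count per feature, in FEATURES order]}; events with no
    mapped feature are ignored.
    """
    counts = defaultdict(lambda: dict.fromkeys(FEATURES, 0))
    for user, component, action in events:
        feature = FEATURE_MAP.get((component, action))
        if feature is not None:
            counts[user][feature] += 1
    return {user: [row[f] for f in FEATURES] for user, row in counts.items()}
```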

4.2.1 Digital library features

The CCS is specifically designed with the goal of providing access to high-quality digital library resources. Analyzing usage of the embedded digital library within the CCS should therefore provide useful data about typical teacher practices around digital resources. To capture this behavior, we selected features that detail the clickstream patterns of the embedded digital library resources within the system. These resources were presented in the user interface under four sub-categories: Top Picks, Animations, Images/Visuals, and Inquiry with Data. Each category contained resources from DLESE that were either selected as highly relevant to the subject materials and unit focus (Top Picks) or had metadata of the appropriate type, scope, and topic (Animations, etc.). The items presented under each of these views were derived directly from DLESE web services [37] and presented to the user accordingly. Clickstreams into this component are tracked with the IR Activity and IR Saving features.

4.2.2 Publisher materials features

Publisher materials are included as a core component of the system functionality. The majority of the publisher’s items are digitized versions of paper-based materials, whether book chapters or supplemental materials such as handouts and assessments. The features here are therefore convenient digital proxies for real-world, paper-based analogs. Clickstreams into this component of the system and its sub-components were tracked and organized with the Publisher–Student Materials and Publisher–Teacher Materials features. Publisher–Student Materials include publisher materials such as digital versions of the student textbook, while Publisher–Teacher Materials include supplemental publisher materials such as instructional support materials.

4.2.3 Personalization features

The CCS provides functionality that allows teachers to personalize the contents of their accounts; in particular, users are provided with the ability to save digital materials that they find of interest. Once saved, items may be retrieved for further review and may even be shared with others if desired. Saving is considered a personalization feature because it provides direct control over items that may fit a specific need (either at the time of saving or in the future). Furthermore, saving implies an interest in the saved item, and while that interest may only last for a short time, it nonetheless acts as a marker of personalization behavior. Personalization behavior was captured from the saving behavior in the system through the My Stuff Saving feature. For example, teachers are able to save embedded digital library resources, such as animations, images, visuals, and top picks, for the units of study they may be interested in. The My Stuff Activity feature represents the total activity performed within the My Stuff features of the system.

Table 3 Experimental cluster results showing the parameters obtained for each feature and cluster, where N indicates the number of users in each cluster

Feature                       Cluster 1   Cluster 2   Cluster 3   Cluster 4   Cluster 5
Sessions                          4.934      47.096     119.551      26.131      83.897
Hours                            10.125      26.702      52.586       8.427      24.984
IR Activity                       2.483      74.253      95.537      26.801      17.109
IR Saving                         0.321       4.703      26.039       1.218       0.649
Shared Stuff Activity             3.382      25.753      81.159      17.558      82.174
Shared Stuff Saving               0.356       4.314      18.942       1.877       3.701
My Stuff Activity                 0.123       4.864      19.116       1.106       0.860
My Stuff Saving                   0.741      18.623      11.492       0.388       4.913
Publisher–Student Material        1.421      21.715      98.143      14.212      74.404
Publisher–Teacher Material        4.781      28.790      86.070      14.649      92.306
N                                    31          10           8          35          14

4.2.4 Community behavior features

Some features of the system promote community-centric behaviors. For example, resources and other materials that users find interesting can be shared with the community at large in a kind of community pool of resources called “Shared Stuff.” This feature has many implications when considering the nature of communities of practice of K-12 educators, who are often encouraged to share materials, pedagogical strategies, and best practices amongst their peers. The community behaviors are captured with the Shared Stuff Activity and Shared Stuff Saving features.

4.3 A user typology

Our second step relies on feature clustering to develop a user typology. There are many clustering algorithms to choose from, but for this set of typology experiments we chose the model-based EM algorithm [8]. Elsewhere, we describe experiments using other clustering algorithms and parameters [22]. Using the Bayesian Information Criterion (BIC) to determine the optimal number of clusters for a given dataset, we determined that there were five clusters to be discovered in our experiments. The details of the clusters are shown in Table 3. Note that the clusters presented here show representative members of each cluster and do not necessarily represent actual users; that is, for each cluster in the table, the values are the ideal parameters that fit the object instances belonging to the cluster.
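Model selection with BIC compares candidate numbers of clusters k by trading fit against complexity. The sketch below uses Schwarz’s form, BIC = −2 ln L + p ln n, where lower is better; some packages, such as R’s mclust, report the negated scale 2 ln L − p ln n and choose the largest value. The parameter count assumes diagonal covariances, which is our assumption, since other covariance structures have more parameters. The function names are hypothetical, and the log-likelihoods are expected to come from whatever EM fit is used for each candidate k.

```python
import math


def diag_gmm_param_count(k, d):
    """Free parameters of a k-component mixture of d-dimensional Gaussians
    with diagonal covariances: k-1 mixing weights, k*d means, k*d variances."""
    return (k - 1) + 2 * k * d


def bic(log_likelihood, n_params, n_obs):
    """Schwarz's BIC in the 'lower is better' form: -2 ln L + p ln n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def choose_k(log_likelihoods, d, n_obs):
    """Pick the number of clusters with the lowest BIC.

    log_likelihoods: dict mapping a candidate k to the maximised
    log-likelihood of the k-component model fitted to n_obs points.
    """
    return min(log_likelihoods,
               key=lambda k: bic(log_likelihoods[k], diag_gmm_param_count(k, d), n_obs))
```

For scale, a five-component diagonal model of 98 users described by 10 features would have 104 free parameters, which shows how quickly the p ln n penalty grows with k and the number of features.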

As can be seen in Table 3, cluster 1 characterizes the limited use pattern (low variety, low frequency) of the step 1 use diffusion pattern. The users in this cluster have produced very few hours of total use within the system, and they do not seem to be using the full range of system features. On the other hand, the experiments revealed an intense user cluster (cluster 3) that shows robust use of the system. Indeed, this “intense user” category seems to have used the system in full, exercising nearly every aspect of the system and logging both large numbers of sessions and significant hours of use. Two specialized user groups emerged from the typology. While both show about the same number of total hours within the system, cluster 5 identifies users who spend a great deal of time with the community features and publisher materials of the system, whereas cluster 2 shows users who spend far more time within the interactive resources and embedded digital library component of the system. These clusters are valuable to understand because they suggest that some users find the embedded digital resources as important as others find the community features. Finally, cluster 4 shows a group of users whose session count was above the median but less than that of the intense and specialized users. This cluster also exhibits broad use of most of the features, with slightly more use of interactive resources. Table 4 summarizes each cluster, giving its key characteristics and the diffusion pattern it belongs to. Furthermore, we have created fictitious typology labels to provide an easy way to remember the cluster characteristics.
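One simple way to see the contrast between the two specialized clusters is to compare, per cluster, the IR Activity parameter with the Shared Stuff Activity parameter from Table 3. The ratio below is our illustration, not a measure used in the paper; the values are copied from Table 3, and the names are hypothetical.

```python
# IR Activity and Shared Stuff Activity parameters per cluster, from Table 3.
IR_ACTIVITY = {1: 2.483, 2: 74.253, 3: 95.537, 4: 26.801, 5: 17.109}
SHARED_STUFF_ACTIVITY = {1: 3.382, 2: 25.753, 3: 81.159, 4: 17.558, 5: 82.174}


def ir_to_shared_ratio(cluster):
    """Interactive-resource activity relative to community (Shared Stuff) activity."""
    return IR_ACTIVITY[cluster] / SHARED_STUFF_ACTIVITY[cluster]


ratios = {c: round(ir_to_shared_ratio(c), 2) for c in IR_ACTIVITY}
print(ratios)  # {1: 0.73, 2: 2.88, 3: 1.18, 4: 1.53, 5: 0.21}
```

Cluster 2 (“Interactive Resource Specialist”) has the highest ratio and cluster 5 (“Community Seeker Specialist”) the lowest, which is consistent with the labels in Table 4.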


Table 4 An initial user typology derived from step 2, showing the diffusion patterns and characteristics of each usage type

Cluster  Diffusion pattern     “Typology label”                     Characteristics
1        Limited Use           “Uninterested Non-Adopter”           Very low overall system use
2        Specialized Use       “Interactive Resource Specialist”    Heavy use of interactive resources relative to other system features. Tends to access system weekly
3        Intense Use           “Ardent Power User”                  Heavy and robust overall use of all features. Tends to access system daily
4        Non-Specialized Use   “Moderate Generalist”                Moderate overall system use. Shows slightly more use of interactive resources than other system features. Tends to access system several times monthly
5        Specialized Use       “Community Seeker Specialist”        Makes heavy use of Shared Stuff features and Publisher materials relative to Interactive Resource Activity. Tends to use the system weekly

4.4 Validation of results with field research findings

It is important to note that the method which produced the user typology clusters is not simply a new kind of analysis; rather, it presents an opportunity to bridge different analytical approaches. Although the clusters that emerged from our computational approach are based on clickstream data, they are not arbitrary aggregations of user behaviors. They map onto real-world usage patterns that emerged from the traditional educational research techniques used in our study of the CCS. Thus, the findings from our computational method are validated by the findings from our field research of CCS use “in the wild.”

The CCS field trial used a mixed-method research design [19,4,10]. We collected quantitative data via surveys, longitudinally, from all DPS Earth science teachers (n = 124) during the 2009–2010 academic year. In addition, we collected qualitative data from a large subset of teachers during the same period. See Saldivar [29] for a complete discussion of the CCS field trial; here we summarize our data sources, sample sizes, and analysis techniques.

– Survey 1 (n = 85): A pre-survey administered at the beginning of the field trial to all Earth science teachers to establish a baseline of attitudes and behaviors regarding educational technology (including interactive digital resources), DPS Earth science curricular materials, and differentiated instruction. Data were collected via both quantitative (constrained-response) and qualitative (free-response) items. Quantitative survey items were analyzed by determining counts and frequencies of responses; qualitative items were analyzed via content analysis. (Subsequent surveys also collected both quantitative and qualitative data and were analyzed in the same manner.)

– Survey 2 (n = 84): A survey administered at the mid-point of the field trial to all Earth science teachers to establish a baseline of access to, and attitudes towards, instructional technology tools in general and the CCS in particular.

– Survey 3 (n = 81): A post-survey administered at the end of the field trial to all Earth science teachers. This final survey repeated a number of items from the first two surveys to enable a pre- and post-analysis of changes, if any, in CCS users’ attitudes and behaviors during the course of the school year.

– Adoption interviews (n = 24): Semi-structured telephone interviews about adoption and use of the CCS tool, conducted with teachers who were regular-to-frequent CCS users. They produced an extensive database of qualitative data (interview transcripts), which were analyzed via content analysis.

– Classroom observation cycles (n = 8): Teachers identified as frequent CCS users were each recruited to participate in several hours of interviews and classroom observations during the course of the school year, producing an extensive database of qualitative data (interview transcripts and field notes), which were analyzed via content analysis.
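Each survey was administered to the full population of DPS Earth science teachers (n = 124), so the sample sizes listed above imply response rates of roughly two-thirds, declining slightly over the year. A worked check, assuming 124 is the denominator for all three surveys:

```latex
\[
\text{Survey 1: } \tfrac{85}{124} \approx 68.5\,\%, \qquad
\text{Survey 2: } \tfrac{84}{124} \approx 67.7\,\%, \qquad
\text{Survey 3: } \tfrac{81}{124} \approx 65.3\,\%
\]
```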

Analysis of these field trial data, conducted independently of the computational analysis of clickstream data described above, suggested that teachers in our study fell onto a spectrum of system use behavior, from low-frequency users (corresponding to cluster 1) to moderate (cluster 4) and heavy (cluster 3) users. The remaining two clusters fell on the moderate-to-heavy side of the spectrum. Rogers’ [27] theory of diffusion of innovation, discussed in our introductory section, predicts that the earliest users of a new technology (“innovators” and “early adopters”) make up approximately 16 % of all users. Moore [24], who revised and extended Rogers’ work, further argues that a technology cannot become “mainstream” within a given population until it is adopted by at least half of all potential users. Our step 2 analysis indicates that about two-thirds of teachers in the district adopted the system to a significant degree, with one-third of users (represented by clusters 2, 3, and 5) making heavy use of it. Even if we confine ourselves to a theoretical framework such as Rogers’, which focuses on quantifying when different segments of a target population adopt an innovation,


Linking computational methods with qualitative analyses 61

our findings indicate that the CCS has been strongly adopted and is heavily used by the target population.
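Rogers’ figure of roughly 16 % follows from how he defines adopter categories: by standard deviations from the mean time of adoption, with adoption times treated as approximately normally distributed. Innovators adopt more than two standard deviations before the mean, and early adopters between one and two. Writing Φ for the standard normal cumulative distribution function, a worked check of the rounded shares he reports:

```latex
\[
\begin{aligned}
\text{innovators:}\quad & \Phi(-2) \approx 0.023 && (\text{Rogers: } 2.5\,\%)\\
\text{early adopters:}\quad & \Phi(-1) - \Phi(-2) \approx 0.159 - 0.023 = 0.136 && (\text{Rogers: } 13.5\,\%)\\
\text{innovators} + \text{early adopters:}\quad & \Phi(-1) \approx 0.159 \approx 16\,\% &&\\
\text{adopters up to the mean:}\quad & \Phi(0) = 0.5 &&
\end{aligned}
\]
```

The last line is the point at which half of potential users have adopted, which is the threshold Moore associates with a technology becoming mainstream.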

It can be argued that identifying low, moderate, and intense user clusters is hardly a profound finding; after all, any set of system users can be divided into “low” and “heavy” user categories if the main variable of interest is frequency of system use. Recall, however, that use diffusion theory calls for an analysis that combines frequency of use with an evaluation of how a given technology is used. Viewed through the lens of use diffusion, the most salient finding from our computational method is that two of the clusters, cluster 2 (the Interactive Resource Specialist) and cluster 5 (the Community Seeker Specialist), represent “variety of use” behaviors similar to user behaviors identified by our field research. This enables us to validate the findings of our computational method with real-world data, while giving us deeper insight into how teachers integrated digital library resources into their instructional practices.

For example, Interactive Resource Specialists were moderately heavy users who spent most of their time in the Interactive Resources/Embedded Digital Library component of the CCS. Data from the final survey show that many teachers highly valued interactive resources. In response to the statement “Using interactive resources effectively is important to my personal success as a teacher,” about three-fourths of respondents agreed or strongly agreed. For the statement “Using interactive resources in the classroom enables me to better meet the learning needs of students in my classroom,” almost 90 % agreed or strongly agreed. Qualitative data provide insight into these survey findings.
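The survey percentages reported here are “top-two-box” shares: the fraction of respondents who chose agree or strongly agree. A minimal sketch, assuming a five-point Likert scale with hypothetical labels (the exact response options are not given in this section) and excluding blank answers from the denominator:

```python
# Hypothetical five-point scale; the survey's actual wording is not given here.
LIKERT = ("strongly disagree", "disagree", "neutral", "agree", "strongly agree")
TOP_TWO = {"agree", "strongly agree"}


def top_two_box(responses):
    """Fraction of valid responses that are 'agree' or 'strongly agree'.

    Responses not on the scale (e.g. blanks) are dropped from the denominator;
    returns NaN when no valid responses remain.
    """
    valid = [r for r in responses if r in LIKERT]
    if not valid:
        return float("nan")
    return sum(r in TOP_TWO for r in valid) / len(valid)


if __name__ == "__main__":
    sample = ["agree", "strongly agree", "neutral", "agree", ""]  # invented responses
    print(f"{top_two_box(sample):.0%}")  # 75%
```

Whether non-responses belong in the denominator is a design choice; including them would lower every reported share.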

In response to an interview question that asked him to discuss why he used the CCS, teacher Corey told us: “My teaching practices have always focused on student engagement. I found that the CCS made it easier for me to find [interactive] resources with which I could capture student attention.”

Many teachers reported that they accessed digital resources to supplement textbook material. Overwhelmingly, the most popular kinds of digital resources teachers found via the CCS were graphic representations (e.g., pictures, diagrams, animations) of Earth science phenomena. In the following example, note Carlie’s observations regarding students’ engagement with, and understanding of, key concepts: “I think that since students are visual learners, they’re hands-on learners, they’re growing up in this technological age. Seeing the animations... Really drives home the idea or the topic. I ask them if it does help them [to see graphic representations]. If whatever they just viewed made the material make more sense, and often times they [say], ‘Wow! That totally made sense. Can we see it again?’ I think they benefit from it a lot, and they’re vocal about it. They love it.”

Liz’s experience was similar to Carlie’s in that her students seemed excited simply by the presence of digital resources, which led to increased engagement and deeper comprehension. Liz told us in an interview: “I think in general my students are really interested in seeing any kind of [digital resource]. [Digital resources are] really engaging for students because it’s different than what they normally see in the classroom. When they walk in, they’re just super-excited when they see the projector’s set up. [They say] ‘Oh, yes! We get to watch a video!’ or ‘We get to see a PowerPoint!’ They come in automatically excited to learn.”

In contrast, Community Seeker Specialists had CCS usage frequencies comparable to those of Interactive Resource Specialists but spent most of their time in the Shared Stuff and Publisher components of the system. Publisher materials were useful to teachers because they were directly related to the Earth science curricula used by the district, but why were Shared Stuff materials popular? Survey data suggest that the ability to see other teachers’ uploaded materials gave CCS users insights into the community of Earth science educators in their district that were not possible before the system was made available. For instance, in the final survey almost 61 % of respondents agreed or strongly agreed with the statement “The CCS has increased my awareness of other teachers’ practices.” Approximately half of respondents agreed or strongly agreed with the statement “The CCS has helped me become a more active member of the DPS professional learning community.” Qualitative data provide context for these survey responses.

In response to an open-ended survey question that asked respondents to explain the value of accessing Shared Stuff, teacher Sheila stated: “Looking at the Shared Stuff uploaded by others gives me ideas about how I can present particular concepts in my [own] classroom.” In response to the same question, Norma commented: “[Resources in Shared Stuff] have given me different perspectives on the different topics [in the curriculum] and thus enabled me to teach more effectively.”

Henrietta told us in an interview: “When [another Earth science] teacher from across town might have found a great website or a video clip or something that really does bring the point home and makes it more relevant to the students, that’s what I’m looking for, and so far I have found some things like that on [the CCS].” Later in the same interview, Henrietta added that she and the other Earth science teachers at her school (all CCS users) often discussed useful resources that they had found using the CCS: “We certainly confer on [resources we discover]. [We tell each other] ‘This was great’ [and] ‘This is something you need to [use] when you get to this [certain] point in your book.’”

5 Discussion

Step 1 of our method shows promise for the rapid determination of application-independent, large-grain usage patterns,


while step 2 produced a meaningful typology based on clusters that emerged from refined, application-dependent feature data. The computational generation of clusters that correspond with user behaviors identified by qualitative analysis of interview and survey data suggests that it is possible to build an understanding of user behaviors and motivations through quantitative examination of their behaviors within a system or application. This is only a first step, however, toward understanding the complex interplay between observable behaviors that occur within an application and behaviors that occur outside that system.

The CCS’s main purpose was to encourage teachers to differentiate instruction by incorporating high-quality digital resources into their teaching practices. Thus, the system was designed to produce a specific outcome: teacher differentiation of instruction. Our field trial data validated the in-system behaviors characterized, for example, by the Interactive Resource Specialist and Community Seeker Specialist clusters. However, we also know from field trial data that teachers with similar in-system behaviors may have very different teaching practices in the classroom. Conversely, teachers with similar classroom practices may exhibit different in-system behaviors. Ideally, then, the most complete understanding of a system’s impact on users would come from a method that takes into account both in-system and out-of-system behaviors, motivations, and characteristics.

As previously mentioned in Sect. 2.2, existing literature on digital library use offers some guidance. Researchers in the field of human–computer interaction have long used the concept of personas to guide the development of information systems. Personas are constructs devised by system developers to describe different types of system users; they most often take the form of fictional biographies that include demographic characteristics and descriptions of how and why those users interact with the system. With personas, developers can begin to model and build systems that more specifically address the needs of actual users, which increases the chances that the system will serve its end users better and, in turn, enhances adoption of the system. Our method describes in-system behaviors. By extending our approach to incorporate a full set of out-of-system behaviors (for example, demographic information, teaching behaviors, and student outcomes), that is, by developing personas of CCS users, we can not only develop a richer understanding of the relationship between system use and users’ practices “in the wild,” but also extend the notion of personas beyond system design to actual system adoption and use over time.

While we have obtained promising results from this study, there are several significant limitations. First, our method has only been applied to a single population, so it is plausible that the results we obtained are specific to the population under examination. A different typology may emerge from larger data populations; for example, new specialized user types might appear. Second, to validate that our results are robust and generalizable, it will be important to study the method with different applications and user populations. Third, while the computational method seems to support the analyses of qualitative data, there are opportunities to extend our understanding of system behavior and user characteristics to their impact on other meaningful behaviors outside the system.

As the case study shows, both steps of the method yielded interesting results. It appears, however, that the entropy modeling of the first step needs more discriminatory power when describing specialized users. While the new calculations do a better job of penalizing usage, further modifications may yet be developed to separate the specialized group(s) more clearly. Furthermore, work remains to develop a precise coarse-grained model that better separates every user group. The challenges revealed by the first step could also indicate that specialization does not always imply low variety; this becomes clear from the details of system usage once variety is expanded to include more variables, as in the extended entropy calculations and the second step of the method.

Applying automated typology discovery to inform, guide, and develop system user characteristics is one aspect of understanding within-system user behavior that may be extended, through personas, to include broader behaviors and user characteristics. Such an extension may prove to be a useful tool for understanding behavioral characteristics outside the system, so that a very rich description of system users emerges, covering both in-system and out-of-system behaviors. This would, in turn, allow more data to be used in making predictive and prescriptive tool (in-system), policy (out-of-system), and training (in-system and out-of-system) decisions.

6 Broader impacts

While this research has focused on demonstrating the value of our method for the study of digital library adoption, we believe it can help address other educational challenges, chief among them teacher professional development, the correlation of teacher practice with student learning, and the evaluation of teaching practices.

Extant research indicates that one of the major barriers to effective integration of technology into teaching practices is the dearth of quality professional development and training vis-à-vis technology [14]. Teachers often complain that even when technology-related training is made available, it does not meet their needs as practicing educators because it assumes that they do, or do not, already possess a given set of skills. By modeling


teachers’ system use behavior, one will be able to understand inter-user differences and target system training and professional development to users’ true needs. For example, CCS users in the Community Seeker Specialist cluster might be offered training that would help them integrate more interactive resources into their teaching, because such users are known to spend little of their CCS usage time exploring interactive resources. For teachers in the Interactive Resource Specialist cluster, such training would be redundant, because they already make extensive use of the interactive resources available via the CCS.

While a consensus has emerged among policy makers and education researchers that effective adoption of technology can indeed have a positive impact on teacher practice [17,32], the relationship between teachers’ adoption of technological systems and student achievement is still not well understood. As part of our larger study of the CCS, we are analyzing the standardized test scores of students taught by teachers who used the CCS to determine what impact the teachers’ use of the CCS had on student outcomes. Our analysis of student test score data for both our field trial year and the prior school year [28] shows increased overall achievement among Earth science students after the introduction of the CCS; due to the limitations of the data available to us, however, we cannot make strong causal claims. We continue our efforts to understand what link, if any, exists between system use and student outcomes. In the long term, the ability to correlate system use behaviors with students’ academic achievement will make it possible to focus on the teacher use behaviors that most benefit students; in turn, these behaviors could be taught to other teachers.

Evaluating teachers’ instructional practices is a very difficult task that requires a massive commitment of human and financial resources [5]. For example, most evaluation systems rely on administrators to observe teachers in the classroom, a very labor-intensive approach that can result in teachers being awarded or denied tenure or pay raises based on the short period during which they were observed. Further, variance in evaluators’ adherence to evaluation rubrics, and bias in the evaluation instruments themselves, can make it difficult to assess the validity of the evaluation process [38]. Alternatively, asking teachers to self-report their teaching practices might provide evaluators with additional information, but such data would be subject to all the limitations that come with asking individuals to report on their own behaviors. Neutral “third-party” data, such as the clickstream data to which we applied our computational method, can help bridge the gap between what evaluators observe during the relatively brief periods they visit a teacher’s classroom and the teacher’s activities when he or she is not being formally evaluated. A teacher evaluation process that combines traditional observational and self-reported data with usage data produced by systems like the CCS would give administrators, policymakers, and teachers themselves deeper insights into instructional practices; further, such a hybrid system would produce a rich new understanding of teaching-related best practices that could then be shared with other educators.

7 Conclusion

By applying models of adoption and use diffusion alongside data mining techniques, we have developed a two-step method for discovering patterns within clickstream data that reveal both general and specific typologies of digital library user behavior. Applying this method to data from an academic year-long field trial in a large urban school district provided a rich opportunity to study how middle and high school science teachers adopted and used embedded digital library resources. The method shows considerable promise for extracting useful behavioral patterns “in the wild”: the resulting fine-grained user typology maps well onto results from traditional educational field research methods. The proposed method, while requiring more data for further validation, is a valuable contribution towards our effort to develop automatic methods for studying digital library adoption and use. When fully realized, this method also has the potential to be extended to other applications and areas, such as informing teacher professional development, understanding the impact of digital library applications on student learning, and developing new approaches to the evaluation of teacher practice.

Acknowledgments This material is based upon work supported in part by the NSDL program of the National Science Foundation under Awards 0734875 and 0840744, and by the University of Colorado at Boulder.

References

1. Ayers, E., Nugent, R., Dean, N.: Skill Set Profile Clustering Based on Student Capability Vectors Computed From Online Tutoring Data. In: Proceedings of the 1st International Conference on Educational Data Mining, Montreal, Canada, pp. 210–217 (2008)

2. Benevenuto, F., Rodrigues, T., Cha, M., Almeida, V.: Characterizing user behavior in online social networks. In: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement Conference, ACM, Chicago, Illinois, USA, pp. 49–62 (2009). doi:10.1145/1644893.1644900

3. Brandtzæg, P.B.: Towards a unified Media-User typology (MUT): a meta-analysis and review of the research literature on media-user typologies. Comput. Hum. Behav. 26(5), 940–956 (2010). doi:10.1016/j.chb.2010.02.008. http://www.sciencedirect.com/science/article/B6VDC-4YJSW8D-1/2/011453cc70c0a6bdc29d3fde3b8a9304

4. Creswell, J.: Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research. Pearson Education, Upper Saddle River, NJ (2008)

5. Danielson, C., McGreal, T.L.: Teacher Evaluation To Enhance Professional Practice. Association for Supervision and Curriculum Development, Alexandria, VA, USA (2000)


6. Davis, F.D.: Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quart. 13(3), 319–340 (1989)

7. Deffuant, G., Huet, S., Amblard, F.: An individual-based model of innovation diffusion mixing social value and individual benefit. Am. J. Sociol. 110(4), 1041–1069 (2005)

8. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodological) 39(1), 1–38 (1977)

9. Dominguez, A.K., Yacef, K., Curran, J.R.: Data mining for generating hints in a python tutor. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 91–100 (2010)

10. Dzurec, L., Abraham, I.: The nature of inquiry: linking quantitative and qualitative research. Adv. Nurs. Sci. 16, 73–79 (1993)

11. Eynon, R., Malmberg, L.: A typology of young people’s internet use: implications for education. Comput. Educ. 56(3), 585–595 (2011). doi:10.1016/j.compedu.2010.09.020. http://www.sciencedirect.com/science/article/B6VCJ-517J24P-1/2/5c5d02fb284c227d3a9ef1cc4f3e1bf6

12. Fuller, F.F.: Concerns of teachers: a developmental conceptualization. Am. Educ. Res. J. 6(2), 207–226 (1969)

13. Hall, G.E.: The concerns-based approach to facilitating change. Educ. Horizons 57(4), 202–208 (1979)

14. Hanson, K., Carlson, B.: Effective Access: Teachers’ Use of Digital Resources in STEM Teaching. Gender, Diversity, and Technology Institute, Education Development Center, Inc., Newton (2005)

15. Hew, K.F., Hara, N.: Empirical study of motivators and barriers of teacher online knowledge sharing. Educ. Technol. Res. Dev. 55(6), 573–595 (2007)

16. Horrigan, J.: A typology of information and communication technology users. Research report, Pew Internet & American Life Project (2007)

17. Kelly, M.G., McAnear, A.: National Educational Technology Standards for Teachers: Preparing Teachers to Use Technology. International Society for Technology in Education (ISTE), Eugene (2002)

18. Lage, K., Maness, J., Losoff, B.: Receptivity to library involvement in scientific data curation: a case study at the University of Colorado Boulder. Portal: Libr. Acad. 11(4), 915–937 (2011)

19. Leech, N., Onwuegbuzie, A.: A typology of mixed methods research designs. Qual. Quant. 43(2), 265–275 (2009)

20. Maness, J., Miaskiewicz, T., Sumner, T.: Using personas to understand the needs and goals of institutional repository users. D-Lib Mag. 14(9/10), 1082-9873 (2008)

21. Maull, K., Saldivar, M., Sumner, T.: Understanding digital library adoption: a use diffusion approach. In: Proceeding of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, ACM, pp. 259–268 (2011)

22. Maull, K.E., Saldivar, M.G., Sumner, T.: Online curriculum planning behavior of teachers. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 121–130 (2010)

23. Miaskiewicz, T., Sumner, T., Kozar, K.: A latent semantic analysis methodology for the identification and creation of personas. In: Proceeding of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, ACM, pp. 1501–1510 (2008)

24. Moore, G.A.: Crossing the Chasm: Marketing and Selling Technology Products to Mainstream Customers. HarperCollins, New York (2006)

25. Pennington, M.C.: Cycles of innovation in the adoption of information technology: a view for language teaching. Comput. Assist. Lang. Learn. 17(1), 7–33 (2004)

26. Ram, S., Jung, H.: The conceptualization and measurement of product usage. J. Acad. Mark. Sci. 18(1), 67–76 (1990). doi:10.1007/BF02729763. http://www.springerlink.com/content/9kjl7574145320mv

27. Rogers, E.M.: Diffusion of Innovations, 5th edn. The Free Press, New York (2003)

28. Saldivar, M.: Teacher integration of digital resources into instructional practice. CCS Report No. 4, Digital Learning Sciences, Boulder (2011)

29. Saldivar, M.G.: Teacher adoption of a Web-based instructional planning system. Doctoral dissertation, University of Colorado, Boulder, CO (2012)

30. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948)

31. Shih, C., Venkatesh, A.: Beyond adoption: development and application of a use-diffusion model. J. Mark. 68(1), 59–72 (2004). http://www.jstor.org/stable/30161975

32. Smerdon, B.: Teachers’ Tools for the 21st Century: A Report on Teachers’ Use of Technology. US Dept. of Education, Office of Educational Research and Improvement, Washington, DC (2000)

33. Straub, E.T.: Understanding technology adoption: theory and future directions for informal learning. Rev. Educ. Res. 79(2), 625–649 (2009)

34. Sumner, T., Team, C.: Customizing science instruction with educational digital libraries. In: Proceedings of the 10th Annual Joint Conference on Digital Libraries, ACM, JCDL ’10, New York, NY, USA, pp. 353–356 (2010). doi:10.1145/1816123.1816178

35. Turner, M., Kitchenham, B., Brereton, P., Charters, S., Budgen, D.: Does the technology acceptance model predict actual use? A systematic literature review. Inform. Software Technol. 52(5), 463–479 (2010)

36. Venkatraman, M.P.: The impact of innovativeness and innovation type on adoption. J. Retail. 67(1), 51–67 (1991)

37. Weatherley, J.: A web service framework for embedding discovery services in distributed library interfaces. In: Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL ’05), ACM, New York, NY, USA, pp. 42–43 (2005). doi:10.1145/1065385.1065394

38. Wilson, B., Wood, J.A.: Teacher evaluation: a national dilemma. J. Person. Eval. Educ. 10(1), 75–82 (1996)

39. Xu, B., Recker, M., Hsi, S.: Data deluge: opportunities for research in educational digital libraries. In: Cassie M. Edwards (ed.) Internet Issues: Blogging, the Digital Divide and Digital Libraries. Nova Science Pub. Inc., New York (2010)


science curricula for both Grade 6 and Grade 9. The manuals and guides outline the state standards that must be met, explain how the various units in the Earth science curricula are connected to state standards, and provide supplementary materials for teacher use, such as activities, teaching tips, and student assessments. These materials are all grouped under a single user interface component and are organized by key concept, which allows teachers to organize their lessons in a manner that flexibly meets the learning needs of their students. The digital versions of these curricular materials are identical to what was already available to teachers in paper form, but they can now be accessed from any computer with a Web connection.

Second, the CCS integrates digitized publisher content with interactive resources available from the Digital Library for Earth System Education (DLESE), a collection of Earth science-related digital resources that are part of the National Science Digital Library. By clicking on the Interactive Resources tab, a user can see recommendations for animations, video clips, classroom activities, and other digital resources that pertain to the given key concept. The interactive resources available via the CCS have been vetted by the experts who manage the DLESE collection. Moreover, these resources are filtered by the system to ensure that they align with the Earth science curricula. Thus, when a teacher accesses a DLESE resource on, for example, volcanoes, the resource has not only been judged educationally valuable by a subject matter expert but has also been tied to the specific science concepts that must be taught, as well as the science standards that must be met.
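The vetting-plus-alignment filter described above can be sketched as follows. The `Resource` fields, the boolean `vetted` flag, and the concept labels are hypothetical; this section does not describe the CCS data model or recommendation logic at that level of detail.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical record for a DLESE resource as the CCS might store it."""
    title: str
    vetted: bool                                  # passed expert review (assumed flag)
    concepts: set = field(default_factory=set)    # curriculum key concepts it is aligned to
    standards: set = field(default_factory=set)   # state standards it supports


def interactive_resources_for(resources, key_concept):
    """Return vetted resources aligned to the given key concept, preserving order."""
    return [r for r in resources if r.vetted and key_concept in r.concepts]


if __name__ == "__main__":
    catalog = [
        Resource("Volcano eruption animation", True, {"volcanoes"}),
        Resource("Plate boundaries video", True, {"plate tectonics"}),
        Resource("Unreviewed volcano page", False, {"volcanoes"}),
    ]
    print([r.title for r in interactive_resources_for(catalog, "volcanoes")])
    # ['Volcano eruption animation']
```

Carrying the `standards` set on each record is what would let a returned resource be shown alongside the standards it supports.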

The third major feature of the CCS is an interactive Web 2.0 capability whereby teachers can save digital resources recommended to them via the Interactive Resources component, or upload their own resources, to an area called My Stuff, thus storing teacher-developed materials in the same space as interactive resources from DLESE or from the curriculum for easy access. Once a resource is saved to My Stuff, teachers have the option to share a copy of it to an area of the CCS called Shared Stuff, which is accessible to any CCS user who clicks on the Shared Stuff component associated with a given key concept.

The final major feature enables teachers to share materials with their peers. When a digital resource is added to Shared Stuff, the teacher who originally uploaded it, as well as other CCS users, can add searchable tags or keywords, so that any search of the CCS for those tags will list all resources tagged with the same keyword or phrase. Finally, CCS users can add “star ratings” to resources so that other users can see how their colleagues rate a given resource. A resource that many users rate highly (four or five stars) might be more likely to capture the attention of system users than a resource with a low rating. Hew and Hara [15] argue that this kind of sharing may be a catalyst for enabling improvement of practice, because such knowledge sharing tends to be tied to shared, situated instructional goals and challenges and is thus more likely to be relevant to a teacher’s immediate, short-term needs. A more detailed description of the CCS, including results from the field trial, can be found in Sumner et al. [34].

Table 1 Summary statistics for the data plotted in Fig. 6 (N = 98)

Statistic | Entropy | Lifetime sessions

Min | 0.48063 | 1.00

1st Qu. | 3.559 | 7.00

Median | 4.834 | 23.00

Mean | 4.535 | 37.85

3rd Qu. | 5.830 | 48.00

Max | 7.336 | 171.00

Std Dev | 1.56 | 42.201
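The tagging and star-rating features of Shared Stuff described earlier in this section can be modeled with a small in-memory index. This is a hypothetical sketch, not the CCS implementation: the class name, the case-insensitive tag normalization, and the one-rating-per-user rule are assumptions.

```python
from collections import defaultdict
from statistics import mean


class SharedStuff:
    """Toy in-memory model of shared resources with tags and 1-5 star ratings."""

    def __init__(self):
        self.tags = defaultdict(set)      # normalized tag -> resource ids
        self.ratings = defaultdict(dict)  # resource id -> {user: stars}

    def add_tag(self, resource_id, tag):
        # Any user may tag a resource; tags are trimmed and lower-cased.
        self.tags[tag.strip().lower()].add(resource_id)

    def search(self, tag):
        # .get avoids creating empty entries for unknown tags.
        return sorted(self.tags.get(tag.strip().lower(), ()))

    def rate(self, resource_id, user, stars):
        if not 1 <= stars <= 5:
            raise ValueError("stars must be between 1 and 5")
        self.ratings[resource_id][user] = stars  # re-rating replaces the user's earlier rating

    def average_rating(self, resource_id):
        votes = self.ratings.get(resource_id)
        return mean(votes.values()) if votes else None
```

Keying ratings by user keeps one teacher from inflating a resource's average by rating it repeatedly; whether the CCS enforces that rule is not stated here.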

4.1 Data source and step 1

The data source for this study was clickstream data from 98 users, covering 9 months of interaction with the CCS. These data were particularly interesting because the user interface contains many dynamic client-side components, whose activity would not ordinarily be captured in a traditional web server log. The system was therefore instrumented to record additional information from clicks on these dynamic client-side components, so the CCS clickstream log includes detailed tracking of user activity and provides rich user interaction data. Because we captured these data over a relatively long period, we were able to analyze actual user behavior as users worked with the system in a natural, unhindered manner. Furthermore, we were able to examine how publisher materials, which are already core to standard teacher practice, are complemented by digital library resources, giving us a broader view of how teachers integrate these digital library resources into their traditional practices.

As a result of step 1, we obtained initial frequency and variety computations. Table 1 shows the descriptive statistics resulting from the first step of our experimental method. The data show that the mean entropy of our population is 4.48 and the mean frequency, here measured as the lifetime sessions that a user logged, is 37.85.
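Variety in step 1 is measured with clickstream entropy. The paper’s exact formulation, including the improved calculation discussed below, is not reproduced in this section, so the sketch below uses plain Shannon entropy (in bits) over the distribution of a user’s clicks across interface components as a stand-in, together with a Table 1-style summary. The component labels and example users are hypothetical, and the quartile convention (`method="inclusive"`, linear interpolation between order statistics) is an assumption; the text does not state which convention produced Table 1.

```python
import math
import statistics
from collections import Counter


def clickstream_entropy(components):
    """Shannon entropy, in bits, of a user's clicks over interface components."""
    counts = Counter(components)
    total = sum(counts.values())
    # Sum of p * log2(1/p); written this way so a single-component user yields 0.0.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def summarize(values):
    """Table 1-style summary: min, quartiles, mean, max, sample standard deviation."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values), "1st_qu": q1, "median": median,
        "mean": statistics.mean(values), "3rd_qu": q3,
        "max": max(values), "sd": statistics.stdev(values),
    }


# Hypothetical clickstreams: one interface-component label per click.
clicks = {
    "t1": ["IR", "IR", "Shared", "Publisher"],
    "t2": ["IR", "IR", "IR", "IR"],
    "t3": ["IR", "Shared", "MyStuff", "Publisher"],
}
entropies = {user: clickstream_entropy(seq) for user, seq in clicks.items()}
print(entropies)  # {'t1': 1.5, 't2': 0.0, 't3': 2.0}
print(summarize(list(entropies.values()))["median"])  # 1.5
```

A user who clicks only one component gets zero variety, while spreading clicks evenly over more components raises the score, which is the behavior the variety axis is meant to capture.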

When we apply the use diffusion model to the data by marking the quadrants at the means of each axis, we obtain a use diffusion pattern with a larger limited-use group (n = 40) than with the previous entropy calculations (n = 38). Similarly, there are fewer intense users (n = 26; entropy > 4.54, frequency > 37.85), that is, users who exhibit both greater variety and greater frequency.
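The quadrant assignment described above can be sketched as a simple rule. This is a minimal illustration, assuming the conventional use diffusion mapping (high frequency with low variety is specialized use; low frequency with high variety is non-specialized use) and using the cut points quoted in the text (entropy 4.54, lifetime sessions 37.85). The function name and the example users are hypothetical.

```python
from collections import Counter


def use_diffusion_quadrant(variety, frequency, variety_cut=4.54, frequency_cut=37.85):
    """Return the use diffusion quadrant for one user.

    Defaults are the population means quoted in the text. Assumed mapping:
    specialized = high frequency and low variety;
    non-specialized = low frequency and high variety.
    Values equal to a cut point count as "low".
    """
    high_variety = variety > variety_cut
    high_frequency = frequency > frequency_cut
    if high_variety and high_frequency:
        return "intense"
    if high_frequency:
        return "specialized"
    if high_variety:
        return "non-specialized"
    return "limited"


# Hypothetical (entropy, lifetime sessions) pairs, not study data.
users = {"a": (5.2, 60), "b": (2.0, 10), "c": (3.0, 80), "d": (6.0, 12)}
quadrants = {u: use_diffusion_quadrant(v, f) for u, (v, f) in users.items()}
print(quadrants)
# {'a': 'intense', 'b': 'limited', 'c': 'specialized', 'd': 'non-specialized'}
print(Counter(quadrants.values()))  # group sizes, analogous to the n values above
```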


Table 2 CCS cluster experiment features and descriptions

No.  Feature label                  Description
1    Sessions                       Total lifetime sessions
2    Hours                          Total system hours
3    IR Activity                    Total activity within interactive resources
4    IR Saving                      Total interactive resource saving behavior
5    Shared Stuff Activity          Total user-contributed content activity
6    Shared Stuff Saving            Total user-contributed content saves
7    My Stuff Activity              Total “My Stuff” activity
8    My Stuff Saving                Total “My Stuff” saving behavior
9    Publisher–Teacher Materials    Total activity within publisher–teacher materials
10   Publisher–Student Materials    Total activity within publisher–student materials

This is a decrease of four intense users (roughly 13%) from the initial computation (n = 30). The plot also shows that there are now five users in the specialized category (n = 5), whereas the initial computation placed only a single point in that category.

The key improvement of the new calculations over the initial calculation is that they spread the data out substantially; this change can be seen both in the standard deviation (from 1.25 up to 1.56) and in the visual spread of the graph. It is encouraging that there are now larger penalties for higher entropy, although further generalizable changes could still be made.

As can be seen in Fig. 6, the value of use diffusion pattern modeling is that it provides a comprehensive overview of coarse-grained behavior within the system. The dotted lines divide the graph at the means of each axis, creating the four use diffusion quadrants, so we can quickly see the distribution of basic usage patterns within the system. At the same time, however, the pattern does not provide enough information to determine the specific details of use. For that, we must turn to the second step of the method.

4.2 Feature selection and step 2

We began the second step by choosing 10 features of our data for further analysis. The features for this step of the method represent four major functions of the system: (1) use of digital library-related system functions, (2) use of traditional publisher materials and related system functions, (3) system functions that involve personalization, and (4) user-contributed functions. Table 2 summarizes the system features used in the second step of our proposed method.
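Turning per-user click events into the 10 features of Table 2 can be sketched as a counting pass. The component labels (`"ir"`, `"shared"`, `"mystuff"`, `"pub_teacher"`, `"pub_student"`), the `(component, action)` event shape, and the counting rule (every click adds to the component’s Activity total; a `"save"` click also adds to its Saving total) are hypothetical assumptions, not the CCS log format; Sessions and Hours are taken as already computed.

```python
from collections import Counter

# Feature order follows Table 2.
FEATURES = [
    "Sessions", "Hours",
    "IR Activity", "IR Saving",
    "Shared Stuff Activity", "Shared Stuff Saving",
    "My Stuff Activity", "My Stuff Saving",
    "Publisher-Teacher Materials", "Publisher-Student Materials",
]

# Hypothetical component labels mapped to Table 2 features.
ACTIVITY = {
    "ir": "IR Activity",
    "shared": "Shared Stuff Activity",
    "mystuff": "My Stuff Activity",
    "pub_teacher": "Publisher-Teacher Materials",
    "pub_student": "Publisher-Student Materials",
}
SAVING = {"ir": "IR Saving", "shared": "Shared Stuff Saving", "mystuff": "My Stuff Saving"}


def feature_vector(events, sessions, hours):
    """Build one user's 10-feature vector from (component, action) click events.

    Assumed rule: every click counts toward the component's Activity total,
    and clicks whose action is "save" also count toward its Saving total.
    """
    totals = Counter({"Sessions": sessions, "Hours": hours})
    for component, action in events:
        totals[ACTIVITY[component]] += 1
        if action == "save" and component in SAVING:
            totals[SAVING[component]] += 1
    # Counter returns 0 for features the user never touched.
    return [totals[name] for name in FEATURES]


events = [("ir", "view"), ("ir", "save"), ("shared", "view"), ("pub_teacher", "view")]
print(feature_vector(events, sessions=3, hours=1.5))
# [3, 1.5, 2, 1, 1, 0, 0, 0, 1, 0]
```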

4.2.1 Digital library features

The CCS is specifically designed with the goal of providing access to high-quality digital library resources. Analyzing usage of the embedded digital library within the CCS should therefore provide useful data about typical teacher practices around digital resources. To capture this behavior, we selected features that detail the clickstream patterns of the embedded digital library resources within the system. These resources were presented in the user interface under four sub-categories: Top Picks, Animations, Images/Visuals, and Inquiry with Data. Each category contained resources from DLESE that were either selected as highly relevant to the subject materials and unit focus (Top Picks) or had metadata of the appropriate type, scope, and topic (Animations, etc.). The items presented under each of these views were derived directly from DLESE web services [37] and presented to the user accordingly. Clickstreams into this component are tracked with the IR Activity and IR Saving features.

4.2.2 Publisher materials features

Publisher materials are included as a core component of the system functionality. The majority of the publisher’s items are digitized versions of paper-based materials, whether book chapters or supplemental materials such as hand-outs, assessments, etc. The features here are therefore convenient digital proxies for real-world, paper-based analogs. Clickstreams into this component of the system and its sub-components were tracked and organized with the Publisher–Student Materials and Publisher–Teacher Materials features. Publisher–Student Materials include items such as digital versions of the student textbook, while Publisher–Teacher Materials include supplemental publisher materials such as instructional support materials.

4.2.3 Personalization features

The CCS provides functionality that allows teachers to personalize the contents of their accounts; in particular, users can save digital materials that they find of interest. Once saved, items may be retrieved for further review and may even be shared with others if desired. Saving is considered a personalization feature because it provides direct control over items that may fit a specific need (either at the time of saving or in the future). Furthermore, saving implies an interest in the saved item, and while that interest may last only a short time, it nonetheless acts as a marker of personalization behavior. Personalization behavior was captured through the system’s saving functions with the My Stuff Saving feature. For example, teachers are able to save embedded digital library resources, such as animations, images, visuals, and top picks, for the units of study they are interested in. The My Stuff Activity feature represents the total activity performed within the My Stuff features of the system.

Table 3 Experimental cluster results showing the parameters obtained for each feature and cluster, where N indicates the number of users in each cluster

Feature                       Cluster 1  Cluster 2  Cluster 3  Cluster 4  Cluster 5
Sessions                          4.934     47.096    119.551     26.131     83.897
Hours                            10.125     26.702     52.586      8.427     24.984
IR Activity                       2.483     74.253     95.537     26.801     17.109
IR Saving                         0.321      4.703     26.039      1.218      0.649
Shared Stuff Activity             3.382     25.753     81.159     17.558     82.174
Shared Stuff Saving               0.356      4.314     18.942      1.877      3.701
My Stuff Activity                 0.123      4.864     19.116      1.106      0.860
My Stuff Saving                   0.741     18.623     11.492      0.388      4.913
Publisher–Student Material        1.421     21.715     98.143     14.212     74.404
Publisher–Teacher Material        4.781     28.790     86.070     14.649     92.306
N                                    31         10          8         35         14

4.2.4 Community behavior features

Some features of the system promote community-centric behaviors. For example, resources and other materials that users find interesting can be shared with the community at large in a kind of community pool of resources called “Shared Stuff.” This feature has many implications when considering the nature of communities of practice among K-12 educators, who are often encouraged to share materials, pedagogical strategies, and best practices with their peers. Community behaviors are captured with the Shared Stuff Activity and Shared Stuff Saving features.

4.3 A user typology

Our second step relies on feature clustering to develop a user typology. There are many clustering algorithms to choose from, but for this set of typology experiments we chose the model-based EM algorithm [8]; elsewhere, we describe experiments using other clustering algorithms and parameters [22]. Using the Bayesian Information Criterion (BIC) to determine the optimal number of clusters for the dataset, we determined that there were five clusters to be discovered in our experiments. The details of the clusters are shown in Table 3. Note that the values in Table 3 describe a representative member of each cluster and do not necessarily correspond to actual users; that is, for each cluster, the values are the fitted parameters that best describe the instances belonging to that cluster.
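The combination of EM clustering and BIC-based model selection described above can be sketched in pure Python. The section does not specify the authors’ mixture model, covariance structure, initialization, or software, so this is an illustration under stated assumptions: a Gaussian mixture with diagonal covariances, a deterministic initialization, a variance floor against degenerate components, and BIC = −2 log L + p log n, where lower is better (R’s mclust reports BIC with the opposite sign, so larger is better there). The synthetic data are not study data.

```python
import math


def _log_gauss(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (variances in `var`)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def _logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fit_gmm(data, k, max_iter=200, tol=1e-8, var_floor=1e-3):
    """Fit a k-component diagonal Gaussian mixture by EM; return the log-likelihood."""
    n, d = len(data), len(data[0])
    ordered = sorted(data)
    # Deterministic start: means spread across the data sorted by its first feature.
    means = [list(ordered[(j * (n - 1)) // max(k - 1, 1)]) for j in range(k)]
    centre = [sum(x[i] for x in data) / n for i in range(d)]
    start_var = [max(sum((x[i] - centre[i]) ** 2 for x in data) / n, var_floor)
                 for i in range(d)]
    variances = [list(start_var) for _ in range(k)]
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibilities computed in log space for numerical stability.
        resp, ll = [], 0.0
        for x in data:
            scores = [math.log(weights[j]) + _log_gauss(x, means[j], variances[j])
                      for j in range(k)]
            norm = _logsumexp(scores)
            ll += norm
            resp.append([math.exp(s - norm) for s in scores])
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step: re-estimate mixing weights, means and floored variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[i] for r, x in zip(resp, data)) / nj
                        for i in range(d)]
            variances[j] = [max(sum(r[j] * (x[i] - means[j][i]) ** 2
                                    for r, x in zip(resp, data)) / nj, var_floor)
                            for i in range(d)]
    return ll


def bic(data, k):
    """BIC = -2 log L + p log n (lower is better)."""
    n, d = len(data), len(data[0])
    n_params = (k - 1) + 2 * k * d  # mixing weights + means + diagonal variances
    return -2.0 * fit_gmm(data, k) + n_params * math.log(n)


def choose_k(data, candidates):
    """Return the candidate number of clusters with the lowest BIC."""
    return min(candidates, key=lambda k: bic(data, k))


# Two well-separated synthetic groups in a 2-feature space (not study data).
data = [(-0.1, 0.0), (0.0, 0.1), (0.1, 0.2), (0.2, -0.1),
        (9.9, 10.0), (10.0, 10.1), (10.1, 10.2), (10.2, 9.9)]
print(choose_k(data, (1, 2)))  # 2
```

On real 10-feature vectors, one would scan a wider candidate range and probably standardize features first, because counts such as Sessions and IR Saving live on very different scales; that preprocessing is also an assumption here.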

As can be seen in Table 3, cluster 1 characterizes the low-use pattern (low variety, low frequency) of the step 1 use diffusion pattern. The users in this cluster logged very few hours of total use within the system, and they do not seem to be using the full range of system features. On the other hand, the experiments revealed an intense user cluster (cluster 3) that shows robust use of the system. Indeed, this “intense user” category seems to have used the system in full, exercising nearly every aspect of the system and logging both large numbers of sessions and significant hours of use. Two specialized user groups emerged from the typology. While both show about the same number of total hours within the system, cluster 5 identifies users who spend a great deal of time with the community features and publisher materials of the system, while cluster 2 shows users who spend far more time within the interactive resources and embedded digital library component of the system. These clusters are valuable to understand because they suggest that some users find the embedded digital resources as important as others find the community features. Finally, cluster 4 shows a group of users whose session count was above the median but below that of the intense and specialized users. This cluster also exhibits broad use of most of the features, with slightly more use of interactive resources. Table 4 summarizes each cluster, giving its key characteristics and the diffusion pattern to which it belongs. We have also created fictitious typology labels to provide an easy way to remember the cluster characteristics.
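The contrast between the two specialized clusters can be checked directly from the IR Activity and Shared Stuff Activity parameters in Table 3. The ratio below is an illustrative summary we chose for this purpose, not a statistic reported by the authors.

```python
# IR Activity, Shared Stuff Activity and cluster size N, copied from Table 3.
table3 = {
    1: {"ir_activity": 2.483, "shared_activity": 3.382, "n": 31},
    2: {"ir_activity": 74.253, "shared_activity": 25.753, "n": 10},
    3: {"ir_activity": 95.537, "shared_activity": 81.159, "n": 8},
    4: {"ir_activity": 26.801, "shared_activity": 17.558, "n": 35},
    5: {"ir_activity": 17.109, "shared_activity": 82.174, "n": 14},
}

# Ratio > 1: interactive resources dominate; ratio < 1: Shared Stuff dominates.
ratios = {c: p["ir_activity"] / p["shared_activity"] for c, p in table3.items()}
total_users = sum(p["n"] for p in table3.values())

for cluster, ratio in ratios.items():
    print(f"cluster {cluster}: IR / Shared Stuff activity = {ratio:.2f}")
print("users:", total_users)  # users: 98
```

The ratio is about 2.88 for cluster 2 and about 0.21 for cluster 5, matching the Interactive Resource Specialist and Community Seeker Specialist descriptions, and cluster 4’s ratio of about 1.53 reflects its slight lean toward interactive resources.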


Table 4 An initial user typology derived from step 2, showing the diffusion patterns and characteristics of each usage type

Cluster  Diffusion pattern     Typology label                      Characteristics
1        Limited Use           “Uninterested Non-Adopter”          Very low overall system use
2        Specialized Use       “Interactive Resource Specialist”   Heavy use of interactive resources relative to other system features; tends to access the system weekly
3        Intense Use           “Ardent Power User”                 Heavy and robust overall use of all features; tends to access the system daily
4        Non-Specialized Use   “Moderate Generalist”               Moderate overall system use; slightly more use of interactive resources than other system features; tends to access the system several times monthly
5        Specialized Use       “Community Seeker Specialist”       Heavy use of Shared Stuff features and publisher materials relative to interactive resource activity; tends to use the system weekly

4.4 Validation of results with field research findings

It is important to note that the method that produced the user typology clusters is not simply a new kind of analysis; rather, it presents an opportunity to bridge different analytical approaches. Although the clusters that emerged from our computational approach are based on clickstream data, they are not arbitrary aggregations of user behaviors: they map onto real-world usage patterns that emerged from the traditional educational research techniques used in our study of the CCS. Thus, the findings from our computational method are validated by the findings from our field research on CCS use “in the wild.”

The CCS field trial used a mixed-method research design [19, 4, 10]. We collected quantitative data via surveys, in a longitudinal fashion, from all DPS Earth science teachers (n = 124) during the 2009–2010 academic year. In addition, we collected qualitative data from a large subset of teachers during the same period. See Saldivar [29] for a complete discussion of the CCS field trial; here we summarize our data sources, sample sizes, and analysis techniques.

– Survey 1 (n = 85): A pre-survey administered at the beginning of the field trial to all Earth science teachers to establish a baseline of attitudes and behaviors regarding educational technology (including interactive digital resources), DPS Earth science curricular materials, and differentiated instruction. Data were collected via both quantitative (constrained-response) and qualitative (free-response) items. Quantitative survey items were analyzed by determining counts and frequencies of responses; qualitative items were analyzed via content analysis. (Subsequent surveys also collected both quantitative and qualitative data and were analyzed in the same manner.)

– Survey 2 (n = 84): A pre-survey administered at the mid-point of the field trial to all Earth science teachers to establish a baseline of access to, and attitudes towards, instructional technology tools in general and the CCS in particular.

– Survey 3 (n = 81): A post-survey administered at the end of the field trial to all Earth science teachers. This final survey included a number of items from the first two surveys to enable a pre/post analysis of changes, if any, in CCS users’ attitudes and behaviors during the course of the school year.

– Adoption interviews (n = 24): Semi-structured telephone interviews about teachers’ adoption and use of the CCS, conducted with teachers who were regular-to-frequent CCS users. These produced an extensive database of qualitative data (interview transcripts), which were analyzed via content analysis.

– Classroom observation cycles (n = 8): Teachers identified as frequent CCS users were each recruited to participate in several hours of interviews and classroom observations during the course of the school year, resulting in an extensive database of qualitative data (interview transcripts and field notes), which were analyzed via content analysis.

Analysis of these field trial data, conducted independently of the computational analysis of clickstream data described above, suggested that teachers in our study fell along a spectrum of system use behavior, from low-frequency users (corresponding to cluster 1) to moderate (cluster 4) and heavy (cluster 3) users. The remaining two clusters fell on the moderate-to-heavy side of the spectrum. Rogers’ [27] theory of diffusion of innovation, discussed in our introductory section, predicts that the earliest users of a new technology (“innovators” and “early adopters”) comprise approximately 16% of all users. Moore [24], who revised and extended Rogers’ work, further argues that a technology cannot become “mainstream” within a given population until it is adopted by at least half of all potential users. Our step 2 analysis indicates that about two-thirds of teachers in the district adopted the system to a significant degree, with one-third of users (represented by clusters 2, 3, and 5) making heavy use of the system. Even if we confine ourselves to a theoretical framework such as Rogers’, which focuses on quantifying when different segments of a target population adopt an innovation, our findings indicate that the CCS has been strongly adopted and is heavily used by the target population.
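The proportions quoted above can be checked against the cluster sizes in Table 3. This worked check reads “adopted the system to a significant degree” as membership in any cluster other than cluster 1 (our interpretation) and uses the N = 98 clickstream users, rather than the n = 124 surveyed teachers, as the denominator:

```latex
\frac{n_2 + n_3 + n_5}{N} = \frac{10 + 8 + 14}{98} = \frac{32}{98} \approx 0.33,
\qquad
\frac{N - n_1}{N} = \frac{98 - 31}{98} = \frac{67}{98} \approx 0.68
```

Both shares exceed the roughly 16% early-adopter share in Rogers’ model, and the second exceeds Moore’s one-half threshold for mainstream adoption.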

It can be argued that identifying low, moderate, and intense user clusters is hardly a profound finding; after all, any set of system users can be divided into “low” and “heavy” user categories if the major variable of interest is frequency of system use. Recall, however, that use diffusion theory calls for an analysis that combines frequency of use with an evaluation of how a given technology is used. Through the lens of use diffusion, the most salient finding from our computational method is that two of the clusters, cluster 2 (the Interactive Resource Specialist) and cluster 5 (the Community Seeker Specialist), represent “variety of use” behaviors similar to user behaviors identified by our field research. This enables us to validate the findings of our computational method with real-world data while also giving us deeper insight into how teachers integrated digital library resources into their instructional practices.

For example, Interactive Resource Specialists were moderately heavy users who spent most of their time in the Interactive Resources/Embedded Digital Library component of the CCS. Data from the final survey show that many teachers highly valued interactive resources. In response to the statement “Using interactive resources effectively is important to my personal success as a teacher,” about three-fourths of respondents agreed or strongly agreed. In response to the statement “Using interactive resources in the classroom enables me to better meet the learning needs of students in my classroom,” almost 90% agreed or strongly agreed. Qualitative data provide insight into these survey findings.

In response to an interview question that asked him to discuss why he used the CCS, teacher Corey told us: “My teaching practices have always focused on student engagement. I found that the CCS made it easier for me to find [interactive] resources with which I could capture student attention.”

Many teachers reported that they accessed digital resources to supplement textbook material. Overwhelmingly, the most popular kinds of digital resources teachers found via the CCS were graphic representations (e.g., pictures, diagrams, animations) of Earth science phenomena. In the following example, note Carlie’s observations regarding students’ engagement with, and understanding of, key concepts: “I think that since students are visual learners, they’re hands-on learners, they’re growing up in this technological age. Seeing the animations… Really drives home the idea or the topic. I ask them if it does help them [to see graphic representations], if whatever they just viewed made the material make more sense, and often times they [say], ‘Wow! That totally made sense. Can we see it again?’ I think they benefit from it a lot, and they’re vocal about it. They love it.”

Liz’s experience was similar to Carlie’s in that her students seemed excited simply by the presence of digital resources, which led to increased engagement and deeper comprehension. Liz told us in an interview: “I think in general my students are really interested in seeing any kind of [digital resource]. [Digital resources are] really engaging for students because it’s different than what they normally see in the classroom. When they walk in, they’re just super-excited when they see the projector’s set up. [They say,] ‘Oh yes, we get to watch a video!’ or ‘We get to see a PowerPoint!’ They come in automatically excited to learn.”

In contrast, Community Seeker Specialists had CCS usage frequencies comparable to those of Interactive Resource Specialists but spent most of their time in the Shared Stuff and Publisher components of the system. Publisher materials were useful to teachers because they were directly related to the Earth science curricula used by the district, but why were Shared Stuff materials popular? Survey data suggest that the ability to see other teachers’ uploaded materials gave CCS users insights into the community of Earth science educators in their district that were not possible before the system was made available. For instance, in the final survey, almost 61% of respondents agreed or strongly agreed with the statement “The CCS has increased my awareness of other teachers’ practices.” Approximately half of respondents agreed or strongly agreed with the statement “The CCS has helped me become a more active member of the DPS professional learning community.” Qualitative data provide context for these survey responses.

In response to an open-ended survey question that asked respondents to explain the value of accessing Shared Stuff, teacher Sheila stated: “Looking at the Shared Stuff uploaded by others gives me ideas about how I can present particular concepts in my [own] classroom.” In response to the same question, Norma commented: “[Resources in Shared Stuff] have given me different perspectives on the different topics [in the curriculum] and thus enabled me to teach more effectively.”

Henrietta told us in an interview: “When [another Earth science] teacher from across town might have found a great website or a video clip or something that really does bring the point home and makes it more relevant to the students, that’s what I’m looking for, and so far I have found some things like that on [the CCS].” Later in the same interview, Henrietta added that she and the other Earth science teachers at her school (all CCS users) often discussed useful resources that they had found using the CCS: “We certainly confer on [resources we discover]. [We tell each other,] ‘This was great’ [and] ‘This is something you need to [use] when you get to this [certain] point in your book.’”

5 Discussion

Step 1 of our method shows promise for the rapid determination of application-independent, large-grain usage patterns, while step 2 produced a meaningful typology based on clusters that emerged from refined, application-dependent feature data. The computational generation of clusters that correspond with user behaviors identified by qualitative analysis of interview and survey data suggests that it is possible to build an understanding of user behaviors and motivations through quantitative examination of their behaviors within a system or application. This is only a first step, however, toward bridging the gap in understanding the complex interplay between observable behaviors that occur within an application and behaviors that occur outside of that system.

The CCS’s main purpose was to encourage teachers to differentiate instruction by incorporating high-quality digital resources into their teaching practices; the system was thus designed to produce a specific outcome: teacher differentiation of instruction. Our field trial data validated the in-system behaviors characterized, for example, by the Interactive Resource Specialist and Community Seeker Specialist clusters. However, we also know from field trial data that teachers with similar in-system behaviors may have very different teaching practices in the classroom. It is also possible that teachers with similar classroom practices may exhibit different in-system behaviors. Ideally, then, the most complete understanding of a system’s impact on users would come from a method that takes into account both in-system and out-of-system behaviors, motivations, and characteristics.

As previously mentioned in Sect. 2.2, existing literature on digital library use offers some guidance. Researchers in the field of human–computer interaction have long used the concept of personas to guide the development of information systems. Personas are constructs devised by system developers to describe different types of system users; they most often take the form of fictional biographies that include demographic characteristics and descriptions of how and why those users interact with the system. With personas, developers can model and build systems that more specifically address the needs of actual users, which increases the chances that the system will serve its end users better and, in turn, enhances adoption of the system. Our method describes in-system behaviors, but by extending our approach to incorporate a full set of out-of-system behaviors, including, for example, demographic information, teaching behaviors, and student outcomes (that is, by developing personas of CCS users), we can not only develop a richer understanding of the relationship between system use and users’ practices “in the wild” but also extend the notion of personas beyond system design to actual system adoption and use over time.

While we have obtained promising results from this study, there are several significant limitations. First, our method has only been applied to a single population, and it is plausible that the results we obtained are specific to the population under examination. A different typology may emerge from larger data populations; for example, new specialized user types might appear. Second, to establish that our results are robust and generalizable, it will be important to study the method with different applications and user populations. Third, while the computational method seems to support the analyses of qualitative data, there are opportunities to extend our understanding of system behavior and user characteristics to their impact on other meaningful behaviors outside the system.

As the case study shows, both steps of the method yielded interesting results. It appears, however, that the entropy modeling of the first step requires more discriminatory power when describing specialized users. While the new calculations do a better job of penalizing usage, future modifications may be developed to separate the specialized group(s) further. Work also remains to develop a precise coarse-grained model that better separates every user group. The challenges revealed by the first step may further indicate that specialization does not always imply low variety; this becomes clear from the details of system usage once variety is expanded to include more variables, as in the extended entropy calculations and the second step of the method.

Applying automated typology discovery to inform, guide, and develop system user characteristics is one aspect of understanding within-system user behavior that may be extended, through personas, to include broader behaviors and user characteristics. Such an extension may prove to be a useful tool for understanding behavioral characteristics outside the system, so that a rich description of system users emerges that includes both in-system and out-of-system behaviors. This would in turn allow more data to be used in making predictive and prescriptive tool (in-system), policy (out-of-system), and training (in-system and out-of-system) decisions.

6 Broader impacts

While this research has focused on demonstrating the value of our method for the study of digital library adoption, we believe it can help address other educational challenges, chief among them teacher professional development, the correlation of teacher practice with student learning, and the evaluation of teaching practices.

Extant research indicates that one of the major barriers to effective integration of technology into teaching practices is the dearth of quality professional development and training vis-à-vis technology [14]. Teachers often complain that even when technology-related training is made available, it does not meet their needs as practicing educators because it assumes that they do, or do not, already possess a given set of skills. By modeling teachers’ system use behavior, one will be able to understand inter-user differences and target system training and professional development to users’ true needs. For example, CCS users in the Community Seeker Specialist cluster might be offered training that would help them integrate more interactive resources into their teaching, because such users are known to spend relatively little of their CCS usage time exploring interactive resources. For teachers in the Interactive Resource Specialist cluster, such training would be redundant because they already make extensive use of the interactive resources available via the CCS.

While a consensus has emerged among policy makers and education researchers that effective adoption of technology can indeed have a positive impact on teacher practice [17, 32], the relationship between teachers’ adoption of technological systems and student achievement is still not well understood. As part of our larger study of the CCS, we are analyzing the standardized test scores of students taught by teachers who used the CCS to determine what impact the teachers’ use of the CCS had on student outcomes. Our analysis of student test score data for both our field trial year and the prior school year [28] shows increased overall achievement among Earth science students after the introduction of the CCS; due to the limitations of the data available to us, however, we cannot make strong causal claims. We continue our efforts to understand what link, if any, exists between system use and student outcomes. In the long term, the ability to correlate system use behaviors with students’ academic achievement will make it possible to focus on the teacher use behaviors that most benefit students; in turn, these behaviors could be taught to other teachers.

Evaluating teachers’ instructional practices is a very difficult task that requires a massive commitment of human and financial resources [5]. For example, most evaluation systems rely on administrators to observe teachers in the classroom, a very labor-intensive approach that can result in teachers being awarded or denied tenure or pay raises based on the short period of time during which they were observed. Further, variance in evaluators’ adherence to evaluation rubrics and bias in the evaluation instruments themselves can make it difficult to assess the validity of the evaluation process [38]. Alternatively, asking teachers to self-report their teaching practices might provide evaluators with additional information, but such data would be subject to all the limitations that come with asking individuals to report on their own behaviors. Neutral “third party” data, such as the clickstream data to which we applied our computational method, can help bridge the gap between what evaluators observe during the relatively brief periods they visit a teacher’s classroom and the teacher’s activities when he or she is not being formally evaluated. A teacher evaluation process that combines traditional observational and self-reported data with usage data produced by systems like the CCS would give administrators, policymakers, and teachers themselves deeper insights into instructional practices; further, such a hybrid system would produce a rich new understanding of teaching-related best practices that could then be shared with other educators.

7 Conclusion

By applying models of adoption and use diffusion alongside data mining techniques, we have developed a two-step method for discovering patterns within clickstream data that reveal both general and specific typologies of digital library user behavior. The application of this method to data from an academic year-long field trial in a large urban school district provided a rich opportunity to study the adoption and usage of embedded digital library resources by middle and high school science teachers. The method shows considerable promise for extracting useful behavioral patterns “in the wild”: the resulting fine-grained user typology maps well onto results emerging from traditional educational field research methods. The proposed method, while requiring more data for further validation, is a valuable contribution to our effort to develop automatic methods for studying digital library adoption and use. When fully realized, this method also has the potential to be extended to other applications and areas, such as informing teacher professional development, understanding the impact of digital library applications on student learning, and developing new approaches to the evaluation of teacher practice.

Acknowledgments This material is based upon work supported in part by the NSDL program in the National Science Foundation under Awards 0734875 and 0840744, and by the University of Colorado at Boulder.

References

1. Ayers, E., Nugent, R., Dean, N.: Skill Set Profile Clustering Based on Student Capability Vectors Computed From Online Tutoring Data. In: Proceedings of the 1st International Conference on Educational Data Mining, Montreal, Canada, pp. 210–217 (2008)

2. Benevenuto, F., Rodrigues, T., Cha, M., Almeida, V.: Characterizing user behavior in online social networks. In: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement Conference, ACM, Chicago, Illinois, USA, pp. 49–62 (2009). doi:10.1145/1644893.1644900

3 Brandtzaeligg PB Towards a unified Media-User typology(MUT) a meta-analysis and review of the research literature onmedia-user typologies Comput Hum Behav 26(5) 940ndash956(2010) doi101016jchb201002008 httpwwwsciencedirectcomsciencearticleB6VDC-4YJSW8D-12011453cc70c0a6bdc29d3fde3b8a9304

4 Creswell J Educational Research Planning Conducting andEvaluating Quantitative and Qualitative Research PearsonEducation Upper Saddle Creek NJ (2008)

5 Danielson C McGreal TL Teacher Evaluation To EnhanceProfessional Practice Association for Supervision and CurriculumDevelopment Alexandria VA USA (2000)

123

64 K E Maull et al

6. Davis, F.D.: Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quart. 13(3), 319–340 (1989)

7. Deffuant, G., Huet, S., Amblard, F.: An individual-based model of innovation diffusion mixing social value and individual benefit. Am. J. Sociol. 110(4), 1041–1069 (2005)

8. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodological) 39(1), 1–38 (1977)

9. Dominguez, A.K., Yacef, K., Curran, J.R.: Data mining for generating hints in a python tutor. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 91–100 (2010)

10. Dzurec, L., Abraham, I.: The nature of inquiry: linking quantitative and qualitative research. Adv. Nurs. Sci. 16, 73–79 (1993)

11. Eynon, R., Malmberg, L.: A typology of young people's internet use: implications for education. Comput. Educ. 56(3), 585–595 (2011). doi:10.1016/j.compedu.2010.09.020. http://www.sciencedirect.com/science/article/B6VCJ-517J24P-1/2/5c5d02fb284c227d3a9ef1cc4f3e1bf6

12. Fuller, F.F.: Concerns of teachers: a developmental conceptualization. Am. Educ. Res. J. 6(2), 207–226 (1969)

13. Hall, G.E.: The concerns-based approach to facilitating change. Educ. Horizons 57(4), 202–208 (1979)

14. Hanson, K., Carlson, B.: Effective Access: Teachers' Use of Digital Resources in STEM Teaching. Gender, Diversity, and Technology Institute, Education Development Center, Inc., Newton (2005)

15. Hew, K.F., Hara, N.: Empirical study of motivators and barriers of teacher online knowledge sharing. Educ. Technol. Res. Dev. 55(6), 573–595 (2007)

16. Horrigan, J.: A typology of information and communication technology users. Research report, Pew Internet & American Life Project (2007)

17. Kelly, M.G., McAnear, A.: National Educational Technology Standards for Teachers: Preparing Teachers to Use Technology. International Society for Technology in Education (ISTE), Eugene (2002)

18. Lage, K., Maness, J., Losoff, B.: Receptivity to library involvement in scientific data curation: a case study at the University of Colorado Boulder. Portal Libr. Acad. 11(4), 915–937 (2011)

19. Leech, N., Onwuegbuzie, A.: A typology of mixed methods research designs. Qual. Quant. 43(2), 265–275 (2009)

20. Maness, J., Miaskiewicz, T., Sumner, T.: Using personas to understand the needs and goals of institutional repository users. D-Lib Mag. 14(9/10), 1082–9873 (2008)

21. Maull, K., Saldivar, M., Sumner, T.: Understanding digital library adoption: a use diffusion approach. In: Proceeding of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, ACM, pp. 259–268 (2011)

22. Maull, K.E., Saldivar, M.G., Sumner, T.: Online curriculum planning behavior of teachers. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 121–130 (2010)

23. Miaskiewicz, T., Sumner, T., Kozar, K.: A latent semantic analysis methodology for the identification and creation of personas. In: Proceeding of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, ACM, pp. 1501–1510 (2008)

24. Moore, G.A.: Crossing the Chasm: Marketing and Selling Technology Products to Mainstream Customers. HarperCollins, New York (2006)

25. Pennington, M.C.: Cycles of innovation in the adoption of information technology: a view for language teaching. Comput. Assist. Lang. Learn. 17(1), 7–33 (2004)

26. Ram, S., Jung, H.: The conceptualization and measurement of product usage. J. Acad. Mark. Sci. 18(1), 67–76 (1990). doi:10.1007/BF02729763. http://www.springerlink.com/content/9kjl7574145320mv

27. Rogers, E.M.: Diffusion of Innovations, 5th edn. The Free Press, New York (2003)

28. Saldivar, M.: Teacher integration of digital resources into instructional practice. CCS Report No. 4, Digital Learning Sciences, Boulder (2011)

29. Saldivar, M.G.: Teacher adoption of a Web-based instructional planning system. Doctoral dissertation, University of Colorado, Boulder, CO (2012)

30. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948)

31. Shih, C., Venkatesh, A.: Beyond adoption: development and application of a use-diffusion model. J. Mark. 68(1), 59–72 (2004). http://www.jstor.org/stable/30161975

32. Smerdon, B.: Teachers' Tools for the 21st Century: A Report on Teachers' Use of Technology. US Dept. of Education, Office of Educational Research and Improvement, Washington, DC (2000)

33. Straub, E.T.: Understanding technology adoption: theory and future directions for informal learning. Rev. Educ. Res. 79(2), 625–649 (2009)

34. Sumner, T., Team, C.: Customizing science instruction with educational digital libraries. In: Proceedings of the 10th Annual Joint Conference on Digital Libraries, JCDL '10, ACM, New York, NY, USA, pp. 353–356 (2010). doi:10.1145/1816123.1816178

35. Turner, M., Kitchenham, B., Brereton, P., Charters, S., Budgen, D.: Does the technology acceptance model predict actual use? A systematic literature review. Inform. Software Technol. 52(5), 463–479 (2010)

36. Venkatraman, M.P.: The impact of innovativeness and innovation type on adoption. J. Retail. 67(1), 51–67 (1991)

37. Weatherley, J.: A web service framework for embedding discovery services in distributed library interfaces. In: Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05), ACM, New York, NY, USA, pp. 42–43 (2005). doi:10.1145/1065385.1065394

38. Wilson, B., Wood, J.A.: Teacher evaluation: a national dilemma. J. Person. Eval. Educ. 10(1), 75–82 (1996)

39. Xu, B., Recker, M., Hsi, S.: Data deluge: opportunities for research in educational digital libraries. In: Cassie M. Edwards (ed.) Internet Issues: Blogging, the Digital Divide and Digital Libraries. Nova Science Pub. Inc., New York (2010)


Table 2 CCS cluster experiment features and descriptions

# | Feature label | Description
1 | Sessions | Total lifetime sessions
2 | Hours | Total system hours
3 | IR Activity | Total activity within interactive resources
4 | IR Saving | Total interactive resource saving behavior
5 | Shared Stuff Activity | Total user-contributed content activity
6 | Shared Stuff Saving | Total user-contributed content saves
7 | My Stuff Activity | Total "My Stuff" activity
8 | My Stuff Saving | Total "My Stuff" saving behavior
9 | Publisher–Teacher Materials | Total activity within publisher–teacher materials
10 | Publisher–Student Materials | Total activity within publisher–student materials

This shows a 10% decrease from the initial computation (n = 30). The plot also shows that there are now 5 users in the specialized category (n = 5), where previously there was only a single point in that category.

The key improvement of the new calculations over the initial calculation is that they spread the data out significantly; this change can be seen both in the standard deviation (from 125 up to 156) and in the visual spread in the graph. It is encouraging that there are now larger penalties for higher entropy, but further generalizable changes could still be made.
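For readers less familiar with clickstream entropy, the sketch below computes plain Shannon entropy [30] of one user's clicks over system components, together with the population standard deviation across users that serves as the measure of spread discussed above. It is a minimal sketch under stated assumptions: the component labels and toy users are hypothetical, and the penalized variant used in the improved calculation is not reproduced here.

```python
import math
from collections import Counter
from statistics import pstdev


def clickstream_entropy(clicks):
    """Shannon entropy (in bits) of one user's clicks over system components."""
    counts = Counter(clicks)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no activity: treat as zero variety
    # sum of p * log2(1/p) over the components the user touched
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical users; the component labels are illustrative, not CCS internals.
users = {
    "u1": ["IR"] * 8,                                      # one component only
    "u2": ["IR", "IR", "SharedStuff", "SharedStuff"],      # two components, evenly
    "u3": ["IR", "SharedStuff", "MyStuff", "Publisher"],   # four components, evenly
}

entropies = {user: clickstream_entropy(clicks) for user, clicks in users.items()}
spread = pstdev(list(entropies.values()))  # population standard deviation across users
print(entropies, round(spread, 3))
```

Here u1, u2, and u3 score 0, 1, and 2 bits respectively; a scoring change that pushes users further apart on this scale increases the standard deviation, which is the effect the text reports.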

As can be seen in Fig. 6, the value of use diffusion pattern modeling is that it provides a comprehensive overview of coarse-grained behavior within the system. The dotted lines divide the graph at the mean of each axis, creating the four use diffusion quadrants, so the distribution of basic usage patterns within the system can be seen at a glance. At the same time, however, the pattern does not provide enough information to determine the specific details of use. To obtain those details, we must turn to the second step of the method.
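The quadrant split can be sketched as follows. This is an illustration, not the authors' code: it assumes each user is summarized by one frequency value (for example, sessions) and one variety value (for example, clickstream entropy), splits each axis at its mean as in Fig. 6, and uses the four use diffusion labels that appear in Table 4. Counting a value exactly at the mean as "high" is our own assumption.

```python
from statistics import mean


def use_diffusion_quadrant(freq, variety, freq_mean, variety_mean):
    """Map one user's (frequency, variety) to a use diffusion quadrant.

    Values exactly at the mean count as "high"; this tie rule is an assumption.
    """
    high_freq = freq >= freq_mean
    high_variety = variety >= variety_mean
    if high_freq and high_variety:
        return "Intense Use"
    if high_freq:
        return "Specialized Use"
    if high_variety:
        return "Non-Specialized Use"
    return "Limited Use"


def classify_users(measures):
    """measures: dict user -> (frequency, variety). Splits each axis at its mean."""
    freq_mean = mean(f for f, _ in measures.values())
    variety_mean = mean(v for _, v in measures.values())
    return {user: use_diffusion_quadrant(f, v, freq_mean, variety_mean)
            for user, (f, v) in measures.items()}


# Hypothetical users: frequency = sessions, variety = clickstream entropy (bits).
example = {"a": (50, 2.0), "b": (40, 0.2), "c": (5, 1.8), "d": (3, 0.1)}
print(classify_users(example))
```

Because the split points are population means, a user's quadrant depends on who else is in the dataset; that is one reason this step gives only a coarse, relative picture.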

4.2 Feature selection and step 2

We began the second step by choosing 10 features of our data for further analysis. The features for this step of the method represent four major functions of the system: (1) use of digital library-related system functions, (2) use of traditional publisher materials and related system functions, (3) system functions that involve personalization, and (4) user-contributed functions. Table 2 summarizes the system features used in the second step of our proposed method.

4.2.1 Digital library features

The CCS is specifically designed with the goal of providing access to high-quality digital library resources. Analyzing usage of the embedded digital library within the CCS should therefore provide useful data about typical teacher practices around digital resources. To capture this behavior, we selected features that detail the clickstream patterns of the embedded digital library resources within the system. These resources were presented in the user interface under four sub-categories: Top Picks, Animations, Images/Visuals, and Inquiry with Data. Each category contained resources from DLESE that were either selected as highly relevant to the subject materials and unit focus (Top Picks) or had metadata of the appropriate type, scope, and topic (Animations, etc.). The items presented under each of these views were derived directly from DLESE web services [37] and presented to the user accordingly. Clickstreams into this component are tracked with the IR Activity and IR Saving features.

4.2.2 Publisher materials features

Publisher materials are included as a core component of the system functionality. The majority of the publisher's items are digitized versions of paper-based materials, whether book chapters or supplemental materials such as hand-outs and assessments. The features here are therefore convenient digital proxies for real-world, paper-based analogs. Clickstreams into this component of the system and its sub-components were tracked and organized with the Publisher–Student Materials and Publisher–Teacher Materials features. The Publisher–Student Materials include publisher materials such as digital versions of the student textbook, while the Publisher–Teacher Materials include supplemental publisher materials such as instructional support materials.

4.2.3 Personalization features

The CCS provides functionality that allows teachers to personalize the contents of their accounts; in particular, users can save digital materials that they find of interest. Once saved, items may be retrieved for further review and may even be shared with others if desired. Saving is considered a personalization feature because it provides direct control over items that may fit a specific need (either at the time of saving or in the future). Furthermore, saving implies an interest in the saved item, and while that interest may last only a short time, it nonetheless acts as a marker of personalization behavior. Personalization behavior was captured from the system's saving behavior through the My Stuff Saving feature. For example, teachers are able to save embedded digital library resources, such as animations, images, visuals, and top picks, for the units of study they may be interested in. The My Stuff Activity feature represents the total activity performed within the My Stuff features of the system.

Linking computational methods with qualitative analyses 59

Table 3 Experimental cluster results showing the parameters obtained for each feature and cluster, where N indicates the number of users in each cluster

Feature | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5
Sessions | 4.934 | 47.096 | 119.551 | 26.131 | 83.897
Hours | 10.125 | 26.702 | 52.586 | 8.427 | 24.984
IR Activity | 2.483 | 74.253 | 95.537 | 26.801 | 17.109
IR Saving | 0.321 | 4.703 | 26.039 | 1.218 | 0.649
Shared Stuff Activity | 3.382 | 25.753 | 81.159 | 17.558 | 82.174
Shared Stuff Saving | 0.356 | 4.314 | 18.942 | 1.877 | 3.701
My Stuff Activity | 0.123 | 4.864 | 19.116 | 1.106 | 0.860
My Stuff Saving | 0.741 | 18.623 | 11.492 | 0.388 | 4.913
Publisher–Student Material | 1.421 | 21.715 | 98.143 | 14.212 | 74.404
Publisher–Teacher Material | 4.781 | 28.790 | 86.070 | 14.649 | 92.306
N | 31 | 10 | 8 | 35 | 14

4.2.4 Community behavior features

Some features of the system promote community-centric behaviors. For example, resources and other materials that users find interesting can be shared with the community at large in a community pool of resources called "Shared Stuff." This feature has many implications when considering the nature of communities of practice of K-12 educators, who are often encouraged to share materials, pedagogical strategies, and best practices amongst their peers. The community behaviors are captured with the Shared Stuff Activity and Shared Stuff Saving features.
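The ten Table 2 features are per-user aggregates of clickstream events. The sketch below shows one way such a vector could be assembled. The CCS log format is not described in this section, so the event tuple layout (user, session, timestamp in seconds, component code, action), the component codes, counting every event as activity, and measuring Hours as the span between a session's first and last click are all hypothetical choices.

```python
from collections import defaultdict

# Feature order follows Table 2.
FEATURES = ["Sessions", "Hours", "IR Activity", "IR Saving",
            "Shared Stuff Activity", "Shared Stuff Saving",
            "My Stuff Activity", "My Stuff Saving",
            "Publisher-Teacher Materials", "Publisher-Student Materials"]

# Hypothetical component codes in the click log -> activity / saving feature names.
ACTIVITY = {"IR": "IR Activity", "SharedStuff": "Shared Stuff Activity",
            "MyStuff": "My Stuff Activity",
            "PublisherTeacher": "Publisher-Teacher Materials",
            "PublisherStudent": "Publisher-Student Materials"}
SAVING = {"IR": "IR Saving", "SharedStuff": "Shared Stuff Saving",
          "MyStuff": "My Stuff Saving"}


def feature_vectors(events):
    """events: iterable of (user, session, timestamp_s, component, action) tuples.

    Returns {user: [value per FEATURES entry]}. Every event counts as activity;
    "save" events additionally count toward the component's saving feature.
    """
    rows = defaultdict(lambda: dict.fromkeys(FEATURES, 0.0))
    spans = {}  # (user, session) -> (first timestamp, last timestamp)
    for user, session, ts, component, action in events:
        row = rows[user]
        row[ACTIVITY[component]] += 1
        if action == "save" and component in SAVING:
            row[SAVING[component]] += 1
        first, last = spans.get((user, session), (ts, ts))
        spans[(user, session)] = (min(first, ts), max(last, ts))
    for (user, _), (first, last) in spans.items():
        rows[user]["Sessions"] += 1
        rows[user]["Hours"] += (last - first) / 3600.0  # session length in hours
    return {user: [row[f] for f in FEATURES] for user, row in rows.items()}


# Hypothetical click log for two teachers.
log = [("t1", "s1", 0, "IR", "view"), ("t1", "s1", 1800, "IR", "save"),
       ("t1", "s2", 0, "SharedStuff", "view"),
       ("t1", "s2", 3600, "PublisherStudent", "view"),
       ("t2", "s3", 100, "MyStuff", "save")]
vectors = feature_vectors(log)
print(vectors)
```

The resulting vectors (one per user, in Table 2 order) are the kind of input the clustering step in Sect. 4.3 operates on.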

4.3 A user typology

Our second step relies on feature clustering to develop a user typology. There are many clustering algorithms to choose from, but for this set of typology experiments we chose the model-based EM algorithm [8]; elsewhere we describe experiments using other clustering algorithms and parameters [22]. Using the Bayesian Information Criterion (BIC) to determine the optimal number of clusters for a given dataset, we found five clusters in our experiments. The details of the clusters are shown in Table 3. Note that the clusters presented there show representative members of each cluster and do not necessarily represent actual users; that is, for each cluster in the table, the values are the ideal parameters that fit the object instances belonging to that cluster.
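Model-based clustering with BIC-based selection of the number of clusters can be sketched as follows. This is not the authors' implementation: the toolkit, covariance structure, and initialization are not stated here, so the sketch assumes a Gaussian mixture with diagonal covariances fitted by EM [8], farthest-first initialization, a small variance floor (`reg`), and BIC = -2 ln L + p ln n, where lower is better. All function names are hypothetical.

```python
import numpy as np


def _log_weighted_pdf(X, weights, means, variances):
    """log(weight_k * N(x_i | mean_k, diag(var_k))) for every point i and component k."""
    diff2 = (X[:, None, :] - means[None, :, :]) ** 2
    log_pdf = -0.5 * (diff2 / variances[None, :, :]
                      + np.log(2.0 * np.pi * variances[None, :, :])).sum(axis=2)
    return log_pdf + np.log(weights)[None, :]


def _logsumexp_rows(a):
    m = a.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(a - m).sum(axis=1))


def _farthest_first(X, k, rng):
    """Start at a random point, then repeatedly add the point farthest from those chosen."""
    idx = [int(rng.integers(len(X)))]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    return X[idx].copy()


def fit_gmm_em(X, k, seed=0, n_iter=300, tol=1e-6, reg=1e-3):
    """EM for a diagonal-covariance Gaussian mixture; returns (weights, means, variances, log-likelihood)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = _farthest_first(X, k, rng)
    variances = np.tile(X.var(axis=0) + reg, (k, 1))
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        log_p = _log_weighted_pdf(X, weights, means, variances)
        log_norm = _logsumexp_rows(log_p)
        resp = np.exp(log_p - log_norm[:, None])
        ll = log_norm.sum()
        # M-step: re-estimate weights, means and floored variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, 0.0) + reg
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    final_ll = _logsumexp_rows(_log_weighted_pdf(X, weights, means, variances)).sum()
    return weights, means, variances, final_ll


def bic(X, k, n_init=3):
    """BIC = -2 ln L + p ln n, using the best log-likelihood over n_init EM runs."""
    n, d = X.shape
    best_ll = max(fit_gmm_em(X, k, seed=s)[3] for s in range(n_init))
    n_params = (k - 1) + 2 * k * d  # mixing weights + means + diagonal variances
    return -2.0 * best_ll + n_params * np.log(n)


def choose_k(X, k_max=6, n_init=3):
    """Return the component count with the lowest BIC, plus all scores."""
    X = np.asarray(X, dtype=float)
    scores = {k: bic(X, k, n_init) for k in range(1, k_max + 1)}
    return min(scores, key=scores.get), scores


# Usage on synthetic data: three well-separated 2-D blobs of 100 points each.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(size=(100, 2)) for c in centers])
best_k, scores = choose_k(X, k_max=5)
print(best_k)
```

On this synthetic data the lowest BIC falls at three components. Applied to real 10-feature usage vectors, the means of the fitted components play the role of the per-cluster "ideal parameters" reported in Table 3; note that some tools report BIC with the opposite sign convention.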

As can be seen in Table 3, cluster 1 characterizes the low-use pattern (low variety, low frequency) of the step 1 use diffusion pattern. The users in this cluster logged very few total hours within the system, and they do not seem to be using the full range of system features. On the other hand, the experiments revealed an intense user cluster (cluster 3) that shows robust use of the system. Indeed, this "intense user" category seems to have used the system in full, exercising nearly every aspect of it and logging both large numbers of sessions and significant hours of use. Two specialized user groups emerged from the typology. Although both show about the same number of total hours within the system, cluster 5 identifies users who spend a great deal of time with the community features and publisher materials, whereas cluster 2 shows users who spend far more time within the interactive resources and embedded digital library component. These clusters are valuable to understand because they suggest that some users find the embedded digital resources as important as others find the community features. Finally, cluster 4 shows a group of users whose session count was above the median but below that of the intense and specialized users. This cluster also exhibits broad use of most of the features, with slightly more use of interactive resources. Table 4 summarizes each cluster, giving its key characteristics and the diffusion pattern to which it belongs. We have also created fictitious typology labels to provide an easy way to remember the cluster characteristics.
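One simple way to make the specialization reading of Table 3 concrete is to compare, for each cluster, its IR Activity parameter with its Shared Stuff Activity parameter. The ratio below is our own illustrative measure, not part of the method; the values are copied from Table 3, and because both features share the same scale the ratio does not depend on where the decimal point falls.

```python
# Cluster parameters for two Table 3 features (clusters 1-5).
IR_ACTIVITY = {1: 2.483, 2: 74.253, 3: 95.537, 4: 26.801, 5: 17.109}
SHARED_STUFF_ACTIVITY = {1: 3.382, 2: 25.753, 3: 81.159, 4: 17.558, 5: 82.174}


def ir_to_shared_ratio(cluster):
    """>1: interactive resources dominate; <1: community (Shared Stuff) features dominate."""
    return IR_ACTIVITY[cluster] / SHARED_STUFF_ACTIVITY[cluster]


ratios = {c: round(ir_to_shared_ratio(c), 2) for c in IR_ACTIVITY}
for c, r in ratios.items():
    print(f"cluster {c}: IR / Shared Stuff = {r}")
```

Cluster 2 comes out near 2.9 and cluster 5 near 0.21, matching the Interactive Resource Specialist and Community Seeker Specialist readings, while cluster 4 sits at about 1.5, consistent with its "slightly more use of interactive resources".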


Table 4 An initial user typology derived from step 2, showing the diffusion patterns and characteristics of each usage type

Cluster | Diffusion pattern | "Typology label" | Characteristics
1 | Limited Use | "Uninterested Non-Adopter" | Very low overall system use
2 | Specialized Use | "Interactive Resource Specialist" | Heavy use of interactive resources relative to other system features. Tends to access system weekly
3 | Intense Use | "Ardent Power User" | Heavy and robust overall use of all features. Tends to access system daily
4 | Non-Specialized Use | "Moderate Generalist" | Moderate overall system use. Shows slightly more use of interactive resources than other system features. Tends to access system several times monthly
5 | Specialized Use | "Community Seeker Specialist" | Makes heavy use of Shared Stuff features and Publisher materials relative to Interactive Resource Activity. Tends to use the system weekly

4.4 Validation of results with field research findings

It is important to note that the method that produced the user typology clusters is not simply a new kind of analysis; rather, it presents an opportunity to bridge different analytical approaches. Although the clusters that emerged from our computational approach are based on clickstream data, they are not arbitrary aggregations of user behaviors: they map onto real-world usage patterns that emerged from the traditional educational research techniques used in our study of the CCS. Thus, the findings from our computational method are validated by the findings from our field research of CCS use "in the wild."

The CCS field trial used a mixed-method research design [19,4,10]. We collected quantitative data via surveys in a longitudinal fashion from all DPS Earth science teachers (n = 124) during the 2009–2010 academic year. In addition, we collected qualitative data from a large subset of teachers during the same period. See Saldivar [29] for a complete discussion of the CCS field trial; here we summarize our data sources, sample sizes, and analysis techniques.

– Survey 1 (n = 85): A pre-survey administered at the beginning of the field trial to all Earth science teachers to establish a baseline of attitudes and behaviors regarding educational technology (including interactive digital resources), DPS Earth science curricular materials, and differentiated instruction. Data were collected via both quantitative (constrained-response) and qualitative (free-response) items. Quantitative survey items were analyzed by determining counts and frequencies of responses; qualitative items were analyzed via content analysis. (Subsequent surveys also collected both quantitative and qualitative data and were analyzed in the same manner.)

– Survey 2 (n = 84): A pre-survey administered at the mid-point of the field trial to all Earth science teachers to establish a baseline of their access to, and attitudes towards, instructional technology tools in general and the CCS in particular.

– Survey 3 (n = 81): A post-survey administered at the end of the field trial to all Earth science teachers. This final survey comprised a number of items included in the first two surveys to enable a pre- and post-analysis of changes, if any, in CCS users' attitudes and behaviors during the course of the school year.

– Adoption interviews (n = 24): These semi-structured telephone interviews regarding teachers' adoption and use of the CCS tool were administered to teachers who were regular-to-frequent CCS users. They resulted in an extensive database of qualitative data (interview transcripts), which were analyzed via content analysis.

– Classroom observation cycles (n = 8): Teachers identified as frequent CCS users were each recruited to participate in several hours of interviews and classroom observations during the course of the school year, resulting in an extensive database of qualitative data (interview transcripts and field notes), which were analyzed via content analysis.
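The quantitative survey analysis described above (counts and frequencies of responses, reported later as the share who "agreed or strongly agreed") can be sketched as follows. The five-point agreement scale and the sample responses are assumptions for illustration; the exact wording and coding of the survey instruments are not given here.

```python
from collections import Counter

# Assumed five-point agreement scale.
SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def response_frequencies(responses):
    """Count how many respondents chose each scale point (unrecognized answers are ignored)."""
    counts = Counter(r for r in responses if r in SCALE)
    return {level: counts[level] for level in SCALE}


def share_agreeing(responses):
    """Fraction of valid responses that are "agree" or "strongly agree"."""
    freq = response_frequencies(responses)
    valid = sum(freq.values())
    return (freq["agree"] + freq["strongly agree"]) / valid if valid else 0.0


# Hypothetical responses to a single item; "" stands for a skipped question.
item = ["agree", "strongly agree", "neutral", "agree", "disagree", "", "strongly agree", "agree"]
print(response_frequencies(item), round(share_agreeing(item), 3))
```

Skipped or malformed answers are excluded from the denominator here; whether the reported percentages were computed that way or over all respondents is not stated.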

Analysis of these field trial data, conducted independently of the computational analysis of clickstream data described above, suggested that teachers in our study fell onto a spectrum of system use behavior, from low-frequency users (corresponding to cluster 1) to moderate (cluster 4) and heavy (cluster 3) users. The final two clusters fell on the moderate-to-heavy side of the spectrum. Rogers' [27] theory of diffusion of innovation, discussed in our introductory section, predicts that the earliest users of a new technology ("innovators" and "early adopters") comprise approximately 16% of all users. Moore [24], who revised and extended Rogers' work, further argues that a technology cannot become "mainstream" within a given population until it is adopted by at least half of all potential users. Our step 2 analysis indicates that about two-thirds of teachers in the district adopted the system to a significant degree, with one-third of users (represented by clusters 2, 3, and 5) making heavy use of the system. Even if we confine ourselves to a theoretical framework, such as Rogers', that focuses on quantifying when different segments of a target population adopt an innovation, our findings indicate that the CCS has been strongly adopted and is heavily used by the target population.
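The "two-thirds" and "one-third" figures can be checked against the cluster sizes in Table 3. The sketch assumes the shares are taken over the 98 users who appear in Table 3 (not the 124 surveyed teachers), treats every cluster except the limited-use cluster 1 as having adopted the system, and compares the result with the Rogers and Moore thresholds quoted above.

```python
# Cluster sizes from the N row of Table 3.
CLUSTER_SIZES = {1: 31, 2: 10, 3: 8, 4: 35, 5: 14}

total = sum(CLUSTER_SIZES.values())
adopters = total - CLUSTER_SIZES[1]                # all but the limited-use cluster
heavy = sum(CLUSTER_SIZES[c] for c in (2, 3, 5))   # specialized + intense clusters
adopter_share = adopters / total
heavy_share = heavy / total

print(f"{adopters}/{total} = {adopter_share:.1%} adopted; {heavy}/{total} = {heavy_share:.1%} heavy")
print("beyond Rogers' ~16% early-adopter share:", adopter_share > 0.16)
print("beyond Moore's 50% mainstream threshold:", adopter_share > 0.5)
```

Under these assumptions, 67 of 98 users (about 68%) fall outside the limited-use cluster and 32 of 98 (about 33%) are in the heavy-use clusters.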

It can be argued that identifying low, moderate, and intense user clusters is hardly a profound finding; after all, any set of system users can be divided into "low" and "heavy" user categories if the major variable of interest is frequency of system use. Recall, however, that use diffusion theory calls for an analysis that combines frequency of use with an evaluation of how a given technology is used. Seen through the lens of use diffusion, the most salient finding from our computational method is that two of the clusters, cluster 2 (the Interactive Resource Specialist) and cluster 5 (the Community Seeker Specialist), represent "variety of use" behaviors similar to user behaviors identified by our field research. This enables us to validate the findings of our computational method with real-world data, while at the same time giving us deep insights into how teachers integrated digital library resources into their instructional practices.

For example, Interactive Resource Specialists were moderately heavy users who spent most of their time in the Interactive Resources/Embedded Digital Library component of the CCS. Data from the final survey show that many teachers highly valued interactive resources. In response to the statement "Using interactive resources effectively is important to my personal success as a teacher," about three-fourths of respondents agreed or strongly agreed. In response to the statement "Using interactive resources in the classroom enables me to better meet the learning needs of students in my classroom," almost 90% agreed or strongly agreed. Qualitative data provide insight into these survey findings.

In response to an interview question that asked him to discuss why he used the CCS, teacher Corey told us: "My teaching practices have always focused on student engagement. I found that the CCS made it easier for me to find [interactive] resources with which I could capture student attention."

Many teachers reported that they accessed digital resources to supplement textbook material. Overwhelmingly, the most popular kinds of digital resources teachers found via the CCS were graphic representations (e.g., pictures, diagrams, animations) of Earth science phenomena. In the following example, note Carlie's observations regarding students' engagement with and understanding of key concepts: "I think that since students are visual learners, they're hands-on learners, they're growing up in this technological age. Seeing the animations really drives home the idea or the topic. I ask them if it does help them [to see graphic representations], if whatever they just viewed made the material make more sense, and often times they [say], 'Wow, that totally made sense. Can we see it again?' I think they benefit from it a lot, and they're vocal about it. They love it."

Liz's experience was similar to Carlie's in that her students seemed excited simply by the presence of digital resources, which led to increased engagement and deeper comprehension. Liz told us in an interview: "I think in general my students are really interested in seeing any kind of [digital resource]. [Digital resources are] really engaging for students because it's different than what they normally see in the classroom. When they walk in, they're just super-excited when they see the projector's set up. [They say,] 'Oh yes, we get to watch a video,' or 'We get to see a PowerPoint.' They come in automatically excited to learn."

In contrast, Community Seeker Specialists had CCS usage frequencies comparable to Interactive Resource Specialists but spent most of their time in the Shared Stuff and Publisher components of the system. Publisher materials had utility for teachers because they were directly related to the Earth science curricula used by the district, but why were Shared Stuff materials popular? Survey data suggest that the ability to see other teachers' uploaded materials gave CCS users insights into the community of Earth science educators in their district that were not possible before the system was made available. For instance, in the final survey almost 61% of respondents agreed or strongly agreed with the statement "The CCS has increased my awareness of other teachers' practices." Approximately half of respondents agreed or strongly agreed with the statement "The CCS has helped me become a more active member of the DPS professional learning community." Qualitative data provide a context for these survey responses.

In response to an open-ended survey question that asked respondents to explain the value of accessing Shared Stuff, teacher Sheila stated: "Looking at the Shared Stuff uploaded by others gives me ideas about how I can present particular concepts in my [own] classroom." In response to the same question, Norma commented: "[Resources in Shared Stuff] have given me different perspectives on the different topics [in the curriculum] and thus enabled me to teach more effectively."

Henrietta told us in an interview: "When [another Earth science] teacher from across town might have found a great website or a video clip or something that really does bring the point home and makes it more relevant to the students, that's what I'm looking for, and so far I have found some things like that on [the CCS]." Later in the same interview, Henrietta added that she and the other Earth science teachers at her school (all CCS users) often discussed useful resources that they had found using the CCS: "We certainly confer on [resources we discover]. [We tell each other,] 'This was great,' [and] 'This is something you need to [use] when you get to this [certain] point in your book.'"

5 Discussion

Step 1 of our method shows promise for the rapid determination of application-independent, large-grain usage patterns, while Step 2 produced a meaningful typology based on clusters that emerged from refined, application-dependent feature data. The computational generation of clusters that correspond with user behaviors identified by qualitative analysis of interview and survey data suggests that it is possible to build an understanding of user behaviors and motivations through quantitative examination of their behaviors within a system or application. This is only a first step, however, toward bridging the gap in understanding the complex interplay between observable behaviors within an application and behaviors that occur outside of that system.

The CCS's main purpose was to encourage teachers to differentiate instruction by incorporating high-quality digital resources into their teaching practices. Thus, the system was designed to produce a specific outcome: teacher differentiation of instruction. Our field trial data validated the in-system behaviors characterized, for example, by the Interactive Resource Specialist and Community Seeker Specialist clusters. However, we also know from field trial data that teachers with similar in-system behaviors may have very different teaching practices in the classroom. It is also possible that teachers with similar classroom practices may exhibit different in-system behaviors. Ideally, then, the most complete understanding of a system's impact on users would come from a method that takes into account both in-system and out-of-system behaviors, motivations, and characteristics.

As previously mentioned in Sect. 2.2, existing literature on digital library use offers some guidance. Researchers in the field of human–computer interaction have long used the concept of personas to guide the development of information systems. Personas are constructs devised by system developers to describe different types of system users; they most often take the form of fictional biographies that include demographic characteristics and descriptions of how and why users interact with the system. With them, developers can begin to model and build systems that more specifically address the needs of actual users, which increases the chances that the system will better serve its end users and, in turn, enhances adoption of the system. Our method describes in-system behaviors, but by extending our approach to incorporate a full set of out-of-system behaviors, including, for example, demographic information, teaching behaviors, and student outcomes (that is, by developing personas of CCS users), we can not only develop a richer understanding of the relationship between system use and users' practices "in the wild" but also extend the notion of personas beyond system design to actual system adoption and use over time.

While we have obtained promising results from this study, there are several significant limitations. First, our method has only been applied to a single population, and it is plausible that the results we obtained are specific to the population under examination. A different typology may emerge from larger data populations; for example, new specialized user types might appear. Second, to validate that our results are robust and generalizable, it will be important to study the method with different applications and user populations. Third, while the computational method seems to support the analyses of qualitative data, there are opportunities to extend our understanding of system behavior and user characteristics to their impact on other meaningful behaviors outside the system.

As the case study shows, both steps of the method yielded interesting results. It appears, however, that the entropy modeling of the first step requires more discriminatory power when describing specialized users. While the new calculations do a better job of penalizing usage, further modifications may yet be developed to separate the specialized group(s) more clearly. Furthermore, work remains to develop a precise coarse-grained model that better separates every user group. The challenges revealed by the first step could also indicate that specialization does not always imply low variety; this becomes clear when considering the details of system usage once variety is expanded to include more variables, as in the extended entropy calculations and the second step of the method.

Applying automated typology discovery to inform, guide, and develop system user characteristics is one aspect of understanding within-system user behavior that may be extended to include broader behaviors and user characteristics through personas. Such an extension may prove to be a useful tool for understanding behavioral characteristics outside the system, so that an extremely rich description of system users emerges that includes both in-system and out-of-system behaviors. This would in turn allow more data to be used in making predictive and prescriptive tool (in-system), policy (out-of-system), and training (in-system and out-of-system) decisions.

6 Broader impacts

While this research has focused on demonstrating the value of our method for the study of digital library adoption, we believe it can help address other educational challenges, chief among them teacher professional development, the correlation of teacher practice with student learning, and the evaluation of teaching practices.

Extant research indicates that one of the major barriers to effective integration of technology into teaching practices is the dearth of quality professional development and training vis-à-vis technology [14]. Teachers often complain that even when technology-related training is made available, it does not meet their needs as practicing educators because it assumes that they do or do not already possess a given set of skills. By modeling teachers' system use behavior, one will be able to understand inter-user differences and target system training and professional development to users' true needs. For example, CCS users in the Community Seeker Specialist cluster might be presented with training that would help them integrate more interactive resources into their teaching, because such users are known to spend little of their CCS usage time exploring interactive resources. For teachers in the Interactive Resource Specialist cluster, such training would be redundant because they are already making extensive use of the interactive resources available via the CCS.

While a consensus has emerged among policy makers and education researchers that effective adoption of technology can indeed have a positive impact on teacher practice [17,32], the relationship between teachers' adoption of technological systems and student achievement is still not well understood. As part of our larger study of the CCS, we are analyzing the standardized test scores of students taught by teachers who used the CCS to determine what impact the teachers' use of the CCS had on student outcomes. Our analysis of student test score data for both our field trial year and the prior school year [28] shows increased overall achievement among Earth science students after the introduction of the CCS; due to the limitations of the data available to us, however, we cannot make strong causal claims. We continue our efforts to understand what link, if any, exists between system use and student outcomes. In the long term, the ability to correlate system use behaviors with students' academic achievement will make it possible to focus on the teacher use behaviors that most benefit students; in turn, these behaviors could be taught to other teachers.

Evaluating teachers' instructional practices is a very difficult task that requires a massive commitment of human and financial resources [5]. For example, most evaluation systems rely on administrators to observe teachers in the classroom, a very labor-intensive approach that can result in teachers being awarded or denied tenure or pay raises based on the short period of time during which they were observed. Further, variance in evaluators' adherence to evaluation rubrics and bias in the evaluation instruments themselves can make it difficult to assess the validity of the evaluation process [38]. Alternatively, asking teachers to self-report their teaching practices might provide evaluators with additional information, but such data would be subject to all the limitations that come with asking individuals to report on their own behaviors. Neutral “third party” data, such as the clickstream data to which we applied our computational method, can help bridge the gap between what evaluators observe during the relatively brief periods they visit a teacher's classroom and the teacher's activities when he or she was not being formally evaluated. A teacher evaluation process that combines traditional observational and self-reported data with usage data produced by systems like the CCS would give administrators, policymakers, and teachers themselves deeper insights into instructional practices; further, such a hybrid system would produce a rich new understanding of teaching-related best practices that could then be shared with other educators.

7 Conclusion

By applying models of adoption and use diffusion alongside data mining techniques, we have developed a two-step method for discovering patterns within clickstream data that reveal both general and specific typologies of digital library user behavior. The application of this method to data from an academic year-long field trial in a large urban school district provided a rich opportunity to study the adoption and usage behavior of embedded digital library resources by middle and high school science teachers. The method shows considerable promise for extracting useful behavioral patterns “in the wild”: the resulting fine-grained user typology maps well onto results emerging from traditional educational field research methods. The proposed method, while requiring more data for further validation, is a valuable contribution towards our effort to develop automatic methods for studying digital library adoption and use. When fully realized, this method also has the potential to be extended to other applications and areas, such as informing teacher professional development, understanding the impact of digital library applications on student learning, and developing new approaches to the evaluation of teacher practice.

Acknowledgments This material is based upon work supported in part by the NSDL program in the National Science Foundation under Awards 0734875 and 0840744, and by the University of Colorado at Boulder.

References

1. Ayers, E., Nugent, R., Dean, N.: Skill Set Profile Clustering Based on Student Capability Vectors Computed From Online Tutoring Data. In: Proceedings of the 1st International Conference on Educational Data Mining, Montreal, Canada, pp. 210–217 (2008)

2. Benevenuto, F., Rodrigues, T., Cha, M., Almeida, V.: Characterizing user behavior in online social networks. In: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement Conference, ACM, Chicago, Illinois, USA, pp. 49–62 (2009). doi:10.1145/1644893.1644900

3. Brandtzæg, P.B.: Towards a unified Media-User typology (MUT): a meta-analysis and review of the research literature on media-user typologies. Comput. Hum. Behav. 26(5), 940–956 (2010). doi:10.1016/j.chb.2010.02.008. http://www.sciencedirect.com/science/article/B6VDC-4YJSW8D-1/2/011453cc70c0a6bdc29d3fde3b8a9304

4. Creswell, J.: Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research. Pearson Education, Upper Saddle River, NJ (2008)

5. Danielson, C., McGreal, T.L.: Teacher Evaluation To Enhance Professional Practice. Association for Supervision and Curriculum Development, Alexandria, VA, USA (2000)


6. Davis, F.D.: Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quart. 13(3), 319–340 (1989)

7. Deffuant, G., Huet, S., Amblard, F.: An individual-based model of innovation diffusion mixing social value and individual benefit. Am. J. Sociol. 110(4), 1041–1069 (2005)

8. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodological) 39(1), 1–38 (1977)

9. Dominguez, A.K., Yacef, K., Curran, J.R.: Data mining for generating hints in a python tutor. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 91–100 (2010)

10. Dzurec, L., Abraham, I.: The nature of inquiry: linking quantitative and qualitative research. Adv. Nurs. Sci. 16, 73–79 (1993)

11. Eynon, R., Malmberg, L.: A typology of young people's internet use: implications for education. Comput. Educ. 56(3), 585–595 (2011). doi:10.1016/j.compedu.2010.09.020. http://www.sciencedirect.com/science/article/B6VCJ-517J24P-1/2/5c5d02fb284c227d3a9ef1cc4f3e1bf6

12. Fuller, F.F.: Concerns of teachers: a developmental conceptualization. Am. Educ. Res. J. 6(2), 207–226 (1969)

13. Hall, G.E.: The concerns-based approach to facilitating change. Educ. Horizons 57(4), 202–208 (1979)

14. Hanson, K., Carlson, B.: Effective Access: Teachers' Use of Digital Resources in STEM Teaching. Gender, Diversity, and Technology Institute, Education Development Center, Inc., Newton (2005)

15. Hew, K.F., Hara, N.: Empirical study of motivators and barriers of teacher online knowledge sharing. Educ. Technol. Res. Dev. 55(6), 573–595 (2007)

16. Horrigan, J.: A typology of information and communication technology users. Research report, Pew Internet & American Life Project (2007)

17. Kelly, M.G., McAnear, A.: National Educational Technology Standards for Teachers: Preparing Teachers to Use Technology. International Society for Technology in Education (ISTE), Eugene (2002)

18. Lage, K., Maness, J., Losoff, B.: Receptivity to library involvement in scientific data curation: a case study at the University of Colorado Boulder. Portal Libr. Acad. 11(4), 915–937 (2011)

19. Leech, N., Onwuegbuzie, A.: A typology of mixed methods research designs. Qual. Quant. 43(2), 265–275 (2009)

20. Maness, J., Miaskiewicz, T., Sumner, T.: Using personas to understand the needs and goals of institutional repository users. D-Lib Mag. 14(9/10), 1082–9873 (2008)

21. Maull, K., Saldivar, M., Sumner, T.: Understanding digital library adoption: a use diffusion approach. In: Proceeding of the 11th Annual International ACM/IEEE Joint Conference on Digital libraries, ACM, pp. 259–268 (2011)

22. Maull, K.E., Saldivar, M.G., Sumner, T.: Online curriculum planning behavior of teachers. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 121–130 (2010)

23. Miaskiewicz, T., Sumner, T., Kozar, K.: A latent semantic analysis methodology for the identification and creation of personas. In: Proceeding of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, ACM, pp. 1501–1510 (2008)

24. Moore, G.A.: Crossing the Chasm: Marketing and Selling Technology Products to Mainstream Customers. HarperCollins, New York (2006)

25. Pennington, M.C.: Cycles of innovation in the adoption of information technology: a view for language teaching. Comput. Assist. Lang. Learn. 17(1), 7–33 (2004)

26. Ram, S., Jung, H.: The conceptualization and measurement of product usage. J. Acad. Mark. Sci. 18(1), 67–76 (1990). doi:10.1007/BF02729763. http://www.springerlink.com/content/9kjl7574145320mv

27. Rogers, E.M.: Diffusion of Innovations, 5th edn. The Free Press, New York (2003)

28. Saldivar, M.: Teacher integration of digital resources into instructional practice. CCS Report No. 4, Digital Learning Sciences, Boulder (2011)

29. Saldivar, M.G.: Teacher adoption of a Web-based instructional planning system. Doctoral dissertation, University of Colorado, Boulder, CO (2012)

30. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948)

31. Shih, C., Venkatesh, A.: Beyond adoption: development and application of a use-diffusion model. J. Mark. 68(1), 59–72 (2004). http://www.jstor.org/stable/30161975

32. Smerdon, B.: Teachers' Tools for the 21st Century: A Report on Teachers' Use of Technology. US Dept. of Education, Office of Educational Research and Improvement, Washington, DC (2000)

33. Straub, E.T.: Understanding technology adoption: theory and future directions for informal learning. Rev. Educ. Res. 79(2), 625–649 (2009)

34. Sumner, T., Team, C.: Customizing science instruction with educational digital libraries. In: Proceedings of the 10th Annual Joint Conference on Digital libraries, ACM, JCDL '10, New York, NY, USA, pp. 353–356 (2010). doi:10.1145/1816123.1816178

35. Turner, M., Kitchenham, B., Brereton, P., Charters, S., Budgen, D.: Does the technology acceptance model predict actual use? A systematic literature review. Inform. Software Technol. 52(5), 463–479 (2010)

36. Venkatraman, M.P.: The impact of innovativeness and innovation type on adoption. J. Retail. 67(1), 51–67 (1991)

37. Weatherley, J.: A web service framework for embedding discovery services in distributed library interfaces. In: Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital libraries (JCDL '05), ACM, New York, NY, USA, pp. 42–43 (2005). doi:10.1145/1065385.1065394

38. Wilson, B., Wood, J.A.: Teacher evaluation: a national dilemma. J. Person. Eval. Educ. 10(1), 75–82 (1996)

39. Xu, B., Recker, M., Hsi, S.: Data deluge: opportunities for research in educational digital libraries. In: Cassie M. Edwards (ed.) Internet Issues: Blogging, the Digital Divide and Digital Libraries. Nova Science Pub Inc., New York (2010)


Table 3 Experimental cluster results showing the parameters obtained for each feature and cluster, where N indicates the number of users in each cluster

Feature | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5
Sessions | 4.934 | 47.096 | 119.551 | 26.131 | 83.897
Hours | 10.125 | 26.702 | 52.586 | 8.427 | 24.984
IR Activity | 2.483 | 74.253 | 95.537 | 26.801 | 17.109
IR Saving | 0.321 | 4.703 | 26.039 | 1.218 | 0.649
Shared Stuff Activity | 3.382 | 25.753 | 81.159 | 17.558 | 82.174
Shared Stuff Saving | 0.356 | 4.314 | 18.942 | 1.877 | 3.701
My Stuff Activity | 0.123 | 4.864 | 19.116 | 1.106 | 0.860
My Stuff Saving | 0.741 | 18.623 | 11.492 | 0.388 | 4.913
Publisher–Student Material | 1.421 | 21.715 | 98.143 | 14.212 | 74.404
Publisher–Teacher Material | 4.781 | 28.790 | 86.070 | 14.649 | 92.306
N | 31 | 10 | 8 | 35 | 14

further review and may even be shared with others if desired. Saving is considered a personalization feature because it provides direct control over items that may fit a specific need (either at the time of saving or in the future). Furthermore, saving implies an interest in the saved item, and while that interest may last only a short time, it nonetheless acts as a marker of personalization behavior. Personalization behavior was captured from the system's saving behavior through the My Stuff Saving feature. For example, teachers are able to save embedded digital library resources, such as animations, images, visuals, and top picks, for the units of study they are interested in. The My Stuff Activity feature represents the total activity performed within the My Stuff features of the system.

4.2.4 Community behavior features

Some features of the system promote community-centric behaviors. For example, resources and other materials that users find interesting can be shared with the community at large in a kind of community pool of resources called “Shared Stuff.” This feature has many implications when considering the nature of communities of practice among K-12 educators, who are often encouraged to share materials, pedagogical strategies, and best practices amongst their peers. The community behaviors are captured with the Shared Stuff Activity and Shared Stuff Saving features.
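The personalization and community features described above are, in effect, per-user counts of particular kinds of logged actions. The following sketch shows one way such feature vectors could be assembled from a clickstream. The raw event-type names and their mapping onto the Table 3 feature names are hypothetical, since the CCS log schema is not described here, and the paper's features may be normalized differently.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from raw log event types to the feature names used in Table 3.
EVENT_TO_FEATURE = {
    "ir_view": "IR Activity",
    "ir_save": "IR Saving",
    "shared_view": "Shared Stuff Activity",
    "shared_save": "Shared Stuff Saving",
    "mystuff_view": "My Stuff Activity",
    "mystuff_save": "My Stuff Saving",
}

def feature_vectors(events):
    """Aggregate (user_id, event_type) pairs into per-user feature counts.

    Event types without a mapping (e.g. navigation noise) are ignored.
    """
    vectors = defaultdict(Counter)
    for user_id, event_type in events:
        feature = EVENT_TO_FEATURE.get(event_type)
        if feature is not None:
            vectors[user_id][feature] += 1
    return {user: dict(counts) for user, counts in vectors.items()}

log = [
    ("t1", "shared_view"), ("t1", "shared_save"), ("t1", "shared_view"),
    ("t2", "ir_view"), ("t2", "page_load"),
]
print(feature_vectors(log))
# {'t1': {'Shared Stuff Activity': 2, 'Shared Stuff Saving': 1}, 't2': {'IR Activity': 1}}
```

Keeping the mapping in one table makes it easy to regroup raw events into different feature sets when experimenting with the typology.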

4.3 A user typology

Our second step relies on feature clustering to develop a user typology. There are many clustering algorithms to choose from, but for this set of typology experiments we chose the model-based EM algorithm [8]; elsewhere we describe experiments using other clustering algorithms and parameters [22]. Using the Bayesian Information Criterion (BIC) to determine the optimal number of clusters for the dataset, we found five clusters in our experiments. The details of the clusters are shown in Table 3. It should be noted that the clusters presented there describe representative members of each cluster and do not necessarily represent actual users; that is, for each cluster, the values in the table are the fitted parameters that best describe the object instances belonging to that cluster.
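The model-based clustering step can be illustrated with a deliberately small example. The sketch below fits a univariate Gaussian mixture by EM [8] and scores candidate numbers of components with BIC, using the convention BIC = p ln n - 2 ln L (lower is better), where p is the number of free parameters. It is a simplified, single-feature stand-in written for illustration only: the paper clustered ten features (Table 3), and the EM implementation, initialization, and BIC sign convention it used are not stated here, so all function names and defaults below are assumptions.

```python
import math

def _log_normal(x, mean, var):
    # Log-density of a univariate Gaussian N(mean, var) at x.
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def _log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_likelihood(xs, weights, means, variances):
    return sum(_log_sum_exp([math.log(w) + _log_normal(x, m, v)
                             for w, m, v in zip(weights, means, variances)])
               for x in xs)

def fit_gmm_1d(xs, k, iters=500, tol=1e-9, min_var=1e-6):
    """Fit a k-component univariate Gaussian mixture by EM.

    Returns (weights, means, variances, log_likelihood)."""
    n = len(xs)
    srt = sorted(xs)
    # Deterministic start: means spread over the sorted data, one shared variance.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(xs) / n
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, min_var)] * k
    weights = [1.0 / k] * k
    prev = -math.inf
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            logs = [math.log(w) + _log_normal(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = _log_sum_exp(logs)
            resp.append([math.exp(l - total) for l in logs])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                min_var)
        ll = log_likelihood(xs, weights, means, variances)
        if ll - prev < tol:  # EM never decreases the likelihood
            break
        prev = ll
    return weights, means, variances, ll

def bic(ll, k, n):
    # Free parameters of a 1-D k-component mixture: k means, k variances, k - 1 weights.
    p = 3 * k - 1
    return p * math.log(n) - 2 * ll  # lower is better under this convention

def select_k(xs, candidates):
    """Return the candidate number of components with the lowest BIC."""
    return min(candidates, key=lambda k: bic(fit_gmm_1d(xs, k)[3], k, len(xs)))

A = [-0.4, -0.3, -0.2, -0.1, 0.0, 0.0, 0.1, 0.2, 0.3, 0.4]
data = A + [x + 10.0 for x in A]  # two well-separated synthetic groups
print(select_k(data, [1, 2]))  # 2
```

With real multi-feature data one would use a multivariate mixture and a wider candidate range; the variance floor `min_var` is a common guard against components collapsing onto single points.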

As can be seen in Table 3, cluster 1 characterizes the low-use pattern (low variety, low frequency) of the step 1 use diffusion patterns. The users in this cluster have produced very few hours of total use within the system, and they do not seem to be using the full range of system features. On the other hand, the experiments revealed an intense user cluster (cluster 3) that shows robust use of the system. Indeed, this “intense user” category seems to have used the system in full, exercising nearly every aspect of the system and logging both large numbers of sessions and significant hours of use. Two specialized user groups emerged from the typology. While both show about the same number of total hours within the system, cluster 5 identifies users who spend a great deal of time with the community features and publisher materials of the system, while cluster 2 shows users who spend far more time within the interactive resources and embedded digital library component of the system. These clusters are valuable to understand because they suggest that some users find the embedded digital resources as important as others find the community features. Finally, cluster 4 shows a group of users whose session count was above the median but below that of the intense and specialized users. This cluster also exhibits broad use of most of the features, with slightly more use of interactive resources. Table 4 summarizes each cluster, giving its key characteristics and the diffusion pattern to which it belongs. We have also created fictitious typology labels to provide an easy way to remember the cluster characteristics.
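This characterization can be checked directly against the cluster parameters. The sketch below transcribes four rows of Table 3 and ranks the clusters by sessions and by the share of interactive-resource activity within combined IR and Shared Stuff activity. The ratio is an illustrative summary chosen here, not a statistic reported in the paper.

```python
# Cluster parameters transcribed from Table 3 (a subset of its rows).
TABLE3 = {
    1: {"sessions": 4.934, "ir_activity": 2.483, "shared_activity": 3.382, "n": 31},
    2: {"sessions": 47.096, "ir_activity": 74.253, "shared_activity": 25.753, "n": 10},
    3: {"sessions": 119.551, "ir_activity": 95.537, "shared_activity": 81.159, "n": 8},
    4: {"sessions": 26.131, "ir_activity": 26.801, "shared_activity": 17.558, "n": 35},
    5: {"sessions": 83.897, "ir_activity": 17.109, "shared_activity": 82.174, "n": 14},
}

def ir_share(row):
    """Fraction of combined IR + Shared Stuff activity that is IR activity."""
    return row["ir_activity"] / (row["ir_activity"] + row["shared_activity"])

for cluster, row in TABLE3.items():
    print(cluster, row["sessions"], round(ir_share(row), 2))

most_sessions = max(TABLE3, key=lambda c: TABLE3[c]["sessions"])    # 3: intense use
fewest_sessions = min(TABLE3, key=lambda c: TABLE3[c]["sessions"])  # 1: limited use
most_ir = max(TABLE3, key=lambda c: ir_share(TABLE3[c]))            # 2: interactive resource specialists
least_ir = min(TABLE3, key=lambda c: ir_share(TABLE3[c]))           # 5: community seeker specialists
```

The two specialized clusters sit at opposite ends of the IR share, while cluster 4 shows a moderately IR-leaning profile, consistent with the prose above.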


Table 4 An initial user typology derived from step 2, showing the diffusion patterns and characteristics of each usage type

Cluster | Diffusion pattern | “Typology label” | Characteristics
1 | Limited Use | “Uninterested Non-Adopter” | Very low overall system use
2 | Specialized Use | “Interactive Resource Specialist” | Heavy use of interactive resources relative to other system features. Tends to access system weekly
3 | Intense Use | “Ardent Power User” | Heavy and robust overall use of all features. Tends to access system daily
4 | Non-Specialized Use | “Moderate Generalist” | Moderate overall system use. Shows slightly more use of interactive resources than other system features. Tends to access system several times monthly
5 | Specialized Use | “Community Seeker Specialist” | Makes heavy use of Shared Stuff features and Publisher materials relative to Interactive Resource Activity. Tends to use the system weekly

4.4 Validation of results with field research findings

It is important to note that the method which produced the user typology clusters is not simply a new kind of analysis; rather, it presents an opportunity to bridge different analytical approaches. Although the clusters that emerged from our computational approach are based on clickstream data, they are not arbitrary aggregations of user behaviors. They map onto real-world usage patterns that emerged from the traditional educational research techniques used in our study of the CCS. Thus, the findings from our computational method are validated by the findings from our field research of CCS use “in the wild.”

The CCS field trial used a mixed-method research design [19,4,10]. We collected quantitative data via surveys in a longitudinal fashion from all DPS Earth science teachers (n = 124) during the 2009–2010 academic year. In addition, we collected qualitative data from a large subset of teachers during the same period. See Saldivar [29] for a complete discussion of the CCS field trial; here we summarize our data sources, sample sizes, and analysis techniques:

– Survey 1 (n = 85): A pre-survey administered at the beginning of the field trial to all Earth science teachers to establish a baseline of attitudes and behaviors regarding educational technology (including interactive digital resources), DPS Earth science curricular materials, and differentiated instruction. Data were collected via both quantitative (constrained-response) and qualitative (free-response) items. Quantitative survey items were analyzed by determining counts and frequencies of responses; qualitative items were analyzed via content analysis. (Subsequent surveys also collected both quantitative and qualitative data and were analyzed in the same manner.)

– Survey 2 (n = 84): A pre-survey administered at the mid-point of the field trial to all Earth science teachers to establish a baseline of attitudes and behaviors regarding general access to, and attitudes towards, instructional technology tools in general and the CCS in particular.

– Survey 3 (n = 81): A post-survey administered at the end of the field trial to all Earth science teachers. This final survey comprised a number of items included in the first two surveys to enable a pre- and post-analysis of changes, if any, in CCS users' attitudes and behaviors during the course of the school year.

– Adoption interviews (n = 24): These semi-structured telephone interviews regarding teachers' adoption and use of the CCS tool were administered to teachers who were regular-to-frequent CCS users and resulted in an extensive database of qualitative data (interview transcripts), which were analyzed via content analysis.

– Classroom observation cycles (n = 8): Teachers identified as frequent CCS users were each recruited to participate in several hours of interviews and classroom observations per teacher during the course of the school year, resulting in an extensive database of qualitative data (interview transcripts and field notes), which were analyzed via content analysis.

Analysis of these field trial data, conducted independently of the computational analysis of clickstream data described above, suggested that teachers in our study fell onto a spectrum of system use behavior, from low-frequency users (corresponding to cluster 1) to moderate (cluster 4) and heavy (cluster 3) users. The final two clusters fell on the moderate-to-heavy side of the spectrum. Rogers' [27] theory of diffusion of innovation, discussed in our introductory section, predicts that the earliest users of a new technology (“innovators” and “early adopters”) comprise approximately 16% of all users. Moore [24], who revised and extended Rogers' work, further argues that a technology cannot become “mainstream” within a given population until it is adopted by at least half of all potential users. Our step 2 analysis indicates that about two-thirds of teachers in the district adopted the system to a significant degree, with one-third of users (represented by clusters 2, 3, and 5) making heavy use of the system. Even if we confine ourselves to a theoretical framework such as Rogers', which focuses on quantifying when different segments of a target population adopt an innovation, our findings indicate that the CCS has been strongly adopted and is heavily used by the target population.
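The adoption shares quoted above can be reproduced from the cluster sizes in Table 3 (98 clustered users in total). The sketch below assumes that “adopted to a significant degree” means every cluster except cluster 1, which the text does not state explicitly; the 16% figure is the sum of Rogers' innovator (2.5%) and early adopter (13.5%) categories.

```python
CLUSTER_SIZES = {1: 31, 2: 10, 3: 8, 4: 35, 5: 14}  # N row of Table 3

def share(clusters, sizes=CLUSTER_SIZES):
    """Fraction of all clustered users who fall in the given clusters."""
    return sum(sizes[c] for c in clusters) / sum(sizes.values())

adopters = share([2, 3, 4, 5])  # assumption: every cluster except limited use
heavy = share([2, 3, 5])        # intense and specialized clusters, as in the text
rogers_earliest = (2.5 + 13.5) / 100  # innovators + early adopters

print(f"adopters {adopters:.1%}, heavy {heavy:.1%}")  # adopters 68.4%, heavy 32.7%
print(adopters > 0.5, heavy > rogers_earliest)        # True True
```

Both comparisons mirror the argument in the text: adoption clears Moore's one-half threshold, and the heavy-use share alone is about twice Rogers' earliest-adopter share.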

It can be argued that identifying low, moderate, and intense user clusters is hardly a profound finding; after all, any set of system users can be divided into “low” and “heavy” user categories if the major variable of interest is frequency of system use. Recall, however, that use diffusion theory calls for an analysis that combines frequency of use with an evaluation of how a given technology is used. Viewed through the lens of use diffusion, the most salient finding from our computational method is that two of the clusters, cluster 2 (the Interactive Resource Specialist) and cluster 5 (the Community Seeker Specialist), represent “variety of use” behaviors similar to user behaviors identified by our field research. This enables us to validate the findings of our computational method with real-world data while at the same time giving us deep insights into how teachers integrated digital library resources into their instructional practices.
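The use diffusion framework of Shih and Venkatesh [31] crosses frequency of use with variety of use to yield the four patterns named in Table 4. The following sketch classifies users with a median split on each dimension; the median split, the input format, and the sample users are assumptions made for illustration and are not necessarily the thresholds used in the paper's Step 1.

```python
from statistics import median

def use_diffusion_pattern(frequency, variety, freq_cut, variety_cut):
    """Map one user onto the four use diffusion patterns (frequency x variety)."""
    high_freq = frequency >= freq_cut
    high_variety = variety >= variety_cut
    if high_freq and high_variety:
        return "Intense Use"
    if high_freq:
        return "Specialized Use"
    if high_variety:
        return "Non-Specialized Use"
    return "Limited Use"

def classify_users(users):
    """users: {user_id: (frequency, variety)}; cut points are medians (an assumption)."""
    freq_cut = median(f for f, _ in users.values())
    variety_cut = median(v for _, v in users.values())
    return {u: use_diffusion_pattern(f, v, freq_cut, variety_cut)
            for u, (f, v) in users.items()}

# Hypothetical users: (sessions, entropy-style variety score)
users = {"a": (100, 3.0), "b": (90, 0.5), "c": (5, 2.8), "d": (3, 0.2)}
print(classify_users(users))
# {'a': 'Intense Use', 'b': 'Specialized Use', 'c': 'Non-Specialized Use', 'd': 'Limited Use'}
```

A fixed 2×2 split like this is what the fine-grained clustering refines: it cannot say *which* features a specialized user concentrates on, which is exactly what separates clusters 2 and 5.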

For example, Interactive Resource Specialists were moderately heavy users who spent most of their time in the Interactive Resources/Embedded Digital Library component of the CCS. Data from the final survey show that many teachers highly valued interactive resources. In response to the statement “Using interactive resources effectively is important to my personal success as a teacher,” about three-fourths of respondents agreed or strongly agreed. When asked about the statement “Using interactive resources in the classroom enables me to better meet the learning needs of students in my classroom,” almost 90% agreed or strongly agreed. Qualitative data provide insight into these survey findings.

In response to an interview question that asked him to discuss why he used the CCS, teacher Corey told us: “My teaching practices have always focused on student engagement. I found that the CCS made it easier for me to find [interactive] resources with which I could capture student attention.”

Many teachers reported that they accessed digital resources to supplement textbook material. Overwhelmingly, the most popular kinds of digital resources teachers found via the CCS were graphic representations (e.g., pictures, diagrams, animations) of Earth science phenomena. In the following example, note Carlie's observations regarding students' engagement with and understanding of key concepts: “I think that since students are visual learners, they're hands-on learners, they're growing up in this technological age. Seeing the animations... really drives home the idea or the topic. I ask them if it does help them [to see graphic representations], if whatever they just viewed made the material make more sense, and often times they [say], ‘Wow. That totally made sense. Can we see it again?’ I think they benefit from it a lot, and they're vocal about it. They love it.”

Liz's experience was similar to Carlie's in that her students seemed excited simply by the presence of digital resources, which led to increased engagement and deeper comprehension. Liz told us in an interview: “I think in general my students are really interested in seeing any kind of [digital resource]. [Digital resources are] really engaging for students because it's different than what they normally see in the classroom. When they walk in, they're just super-excited when they see the projector's set up. [They say,] ‘Oh yes, we get to watch a video,’ or ‘We get to see a PowerPoint.’ They come in automatically excited to learn.”

In contrast, Community Seeker Specialists had CCS usage frequencies comparable to those of Interactive Resource Specialists but spent most of their time in the Shared Stuff and Publisher components of the system. Publisher materials had utility for teachers because they were directly related to the Earth science curricula used by the district, but why were Shared Stuff materials popular? Survey data suggest that the ability to see other teachers' uploaded materials gave CCS users insights into the community of Earth science educators in their district that were not possible before the system was made available. For instance, in the final survey almost 61% of respondents agreed or strongly agreed with the statement “The CCS has increased my awareness of other teachers' practices.” Approximately half of respondents agreed or strongly agreed with the statement “The CCS has helped me become a more active member of the DPS professional learning community.” Qualitative data provide a context for these survey responses.

In response to an open-ended survey question that asked respondents to explain the value of accessing Shared Stuff, teacher Sheila stated: “Looking at the Shared Stuff uploaded by others gives me ideas about how I can present particular concepts in my [own] classroom.” In response to the same question, Norma commented: “[Resources in Shared Stuff] have given me different perspectives on the different topics [in the curriculum] and thus enabled me to teach more effectively.”

Henrietta told us in an interview: “When [another Earth science] teacher from across town might have found a great website or a video clip or something that really does bring the point home and makes it more relevant to the students, that's what I'm looking for, and so far I have found some things like that on [the CCS].” Later in the same interview, Henrietta added that she and the other Earth science teachers at her school (all CCS users) often discussed useful resources that they had found using the CCS: “We certainly confer on [resources we discover]. [We tell each other,] ‘This was great’ [and] ‘This is something you need to [use] when you get to this [certain] point in your book.’”

5 Discussion

Step 1 of our method shows promise for the rapid determination of application-independent, large-grain usage patterns, while Step 2 produced a meaningful typology based on clusters that emerged from refined, application-dependent feature data. The computational generation of clusters that correspond with user behaviors identified by qualitative analysis of interview and survey data suggests that it is possible to build an understanding of user behaviors and motivations through quantitative examination of their behaviors within a system or application. This is only a first step, however, toward understanding the complex interplay between observable behaviors that occur within an application and behaviors that occur outside of that system.

The CCS's main purpose was to encourage teachers to differentiate instruction by incorporating high-quality digital resources into their teaching practices. Thus, the system was designed to produce a specific outcome: teacher differentiation of instruction. Our field trial data validated the in-system behaviors characterized, for example, by the Interactive Resource Specialist and Community Seeker Specialist clusters. However, we also know from field trial data that teachers with similar in-system behaviors may have very different teaching practices in the classroom. It is also possible that teachers with similar classroom practices may exhibit different in-system behaviors. Ideally, then, the most complete understanding of a system's impact on users would come from a method that takes into account both in-system and out-of-system behaviors, motivations, and characteristics.

As previously mentioned in Sect. 2.2, existing literature on digital library use offers some guidance. Researchers in the field of human–computer interaction have long used the concept of personas to guide the development of information systems. Personas are constructs devised by system developers to describe different types of system users; they most often take the form of fictional biographies that include demographic characteristics and descriptions of how and why users interact with the system. With them, developers can begin to model and build systems that more specifically address the needs of actual users. This increases the chances that the system will better serve the needs of its end users and, subsequently, enhances adoption of the system. Our method describes in-system behaviors, but by extending our approach to incorporate a full set of out-of-system behaviors, including, for example, demographic information, teaching behaviors, and student outcomes (that is, by developing personas of CCS users), we can not only develop a richer understanding of the relationship between system use and users' practices “in the wild,” but also extend the notion of personas beyond system design to actual system adoption and use over time.

While we have obtained promising results from this study, there are several significant limitations. First, our method has only been applied to a single population, so it is plausible that the results we obtained are specific to the population under examination. A different typology may emerge from larger data populations; for example, new specialized user types might emerge. Second, to verify that our results are robust and generalizable, it will be important to study the method with different applications and user populations. Third, while the computational method seems to support the analyses of qualitative data, there are opportunities to extend our understanding of system behavior and user characteristics to their impact on other meaningful behaviors outside the system.

As can be seen in the case study both of the steps of themethod yielded interesting results It appears however thatthe entropy modeling of the first step requires more discrim-inatory power when describing specialized users While thenew calculations do a better job of penalizing usage thereare yet future modifications that may be developed to sepa-rate the specialized group(s) more Furthermore work stillremains to develop a precise coarse-grained model that sepa-rates every user group better The challenges revealed by thefirst step could furthermore indicate that specialization doesnot always imply low varietymdashand this is clear when consid-ering the details of the system usage when variety is expandedto include more variables as in the extended entropy calcu-lations and second step of the method

Applying automated typology discovery to inform guideand develop system user characteristics is one aspect of unde-rstanding within-system user behavior that may be extendedto include broader behaviors and user characteristics throughpersonas Such an extension may prove to be a useful toolfor understanding the behavioral characteristics outside thesystem so that an extremely rich description of system usersemerge that include in-system and out-of-system behaviorsThis would in turn allow for more data to be used in mak-ing predictive and prescriptive tool (in-system) policy (out-of-system) and training (in-system and out-of-system)decisions

6 Broader impacts

While this research has focused on demonstrating the value of our method for the study of digital library adoption, we believe it can help address other educational challenges, chief among which are teacher professional development, the correlation of teacher practice to student learning, and the evaluation of teaching practices.

Extant research indicates that one of the major barriers to effective integration of technology into teaching practices is the dearth of quality professional development and training vis-à-vis technology [14]. Teachers often complain that even when technology-related training is made available, it does not meet their needs as practicing educators, because it assumes that they do or do not already possess a given set of skills. By modeling teachers’ system use behavior, one will be able to understand inter-user differences and target system training and professional development to users’ true needs. For example, CCS users in the Community Seeker Specialist cluster might be presented with training that would help them integrate more interactive resources into their teaching, because such users are known not to spend much of their CCS usage time exploring interactive resources. For teachers in the Interactive Resource Specialist cluster, such training would be redundant, because they are already making extensive use of the interactive resources available via the CCS.

While a consensus has emerged among policy makers and education researchers that effective adoption of technology can indeed have a positive impact on teacher practice [17,32], the relationship between teachers’ adoption of technological systems and student achievement is still not well understood. As part of our larger study of the CCS, we are analyzing the standardized test scores of students taught by teachers who used the CCS to determine what impact the teachers’ use of the CCS had on student outcomes. Our analysis of student test score data for both our field trial year and the prior school year [28] shows increased overall achievement among Earth science students after the introduction of the CCS; due to the limitations of the data available to us, however, we cannot make strong causal claims. We continue our efforts to understand what link, if any, exists between system use and student outcomes. In the long term, the ability to correlate system use behaviors with students’ academic achievement will make it possible to focus on the teacher use behaviors that most benefit students; in turn, these behaviors could be taught to other teachers.
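Correlating system use with achievement could start from a simple per-teacher association measure. The minimal sketch below computes a Pearson correlation between hypothetical per-teacher CCS session counts and hypothetical mean test-score changes for each teacher's students; the variable names and values are illustrative assumptions, not data from the study. As noted above, a coefficient of this kind does not by itself support causal claims.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-teacher values (one position per teacher).
ccs_sessions = [3, 10, 25, 40, 60]        # CCS sessions over the school year
score_change = [0.5, 1.0, 2.5, 2.0, 3.5]  # mean change in students' test scores

print(f"r = {pearson_r(ccs_sessions, score_change):.2f}")
```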

Evaluating teachers’ instructional practices is a very difficult task that requires a massive commitment of human and financial resources [5]. For example, most evaluation systems rely on administrators to observe teachers in the classroom, a very labor-intensive approach that can result in teachers being awarded or denied tenure or pay raises based on the short period of time during which they were observed. Further, variance between evaluators’ adherence to evaluation rubrics and bias in the evaluation instruments themselves can make it difficult to assess the validity of the evaluation process [38]. Alternatively, asking teachers to self-report their teaching practices might provide evaluators with additional information, but such data would be subject to all the limitations that come with asking individuals to report on their own behaviors. Neutral “third party” data, such as the clickstream data to which we applied our computational method, can help bridge the gap between what evaluators observe during the relatively brief periods they visit a teacher’s classroom and the teacher’s activities when he or she is not being formally evaluated. A teacher evaluation process that incorporates traditional observational and self-reported data with usage data produced by systems like the CCS would give administrators, policymakers, and teachers themselves deeper insights into instructional practices; further, such a hybrid system would produce a rich new understanding of teaching-related best practices that could then be shared with other educators.

7 Conclusion

By applying models of adoption and use diffusion alongside data mining techniques, we have developed a two-step method for discovering patterns within clickstream data that reveal both general and specific typologies of digital library user behavior. The application of this method to data from an academic year-long field trial in a large urban school district provided a rich opportunity to study the adoption and usage behavior of embedded digital library resources by middle and high school science teachers. The method shows considerable promise for extracting useful behavioral patterns “in the wild”: the resulting fine-grained user typology maps well onto results emerging from traditional educational field research methods. The proposed method, while requiring more data for further validation, provides a valuable contribution towards our effort to develop automatic methods for studying digital library adoption and use. When fully realized, this method also has the potential to be extended to other applications and areas, such as informing teacher professional development, understanding the impact of digital library applications on student learning, and developing new approaches to the evaluation of teacher practice.

Acknowledgments This material is based upon work supported in part by the NSDL program in the National Science Foundation under Awards 0734875 and 0840744, and by the University of Colorado at Boulder.

References

1. Ayers, E., Nugent, R., Dean, N.: Skill set profile clustering based on student capability vectors computed from online tutoring data. In: Proceedings of the 1st International Conference on Educational Data Mining, Montreal, Canada, pp. 210–217 (2008)

2. Benevenuto, F., Rodrigues, T., Cha, M., Almeida, V.: Characterizing user behavior in online social networks. In: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement Conference, ACM, Chicago, Illinois, USA, pp. 49–62 (2009). doi:10.1145/1644893.1644900

3. Brandtzæg, P.B.: Towards a unified Media-User typology (MUT): a meta-analysis and review of the research literature on media-user typologies. Comput. Hum. Behav. 26(5), 940–956 (2010). doi:10.1016/j.chb.2010.02.008. http://www.sciencedirect.com/science/article/B6VDC-4YJSW8D-1/2/011453cc70c0a6bdc29d3fde3b8a9304

4. Creswell, J.: Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research. Pearson Education, Upper Saddle River, NJ (2008)

5. Danielson, C., McGreal, T.L.: Teacher Evaluation to Enhance Professional Practice. Association for Supervision and Curriculum Development, Alexandria, VA, USA (2000)


6. Davis, F.D.: Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quart. 13(3), 319–340 (1989)

7. Deffuant, G., Huet, S., Amblard, F.: An individual-based model of innovation diffusion mixing social value and individual benefit. Am. J. Sociol. 110(4), 1041–1069 (2005)

8. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodological) 39(1), 1–38 (1977)

9. Dominguez, A.K., Yacef, K., Curran, J.R.: Data mining for generating hints in a python tutor. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 91–100 (2010)

10. Dzurec, L., Abraham, I.: The nature of inquiry: linking quantitative and qualitative research. Adv. Nurs. Sci. 16, 73–79 (1993)

11. Eynon, R., Malmberg, L.: A typology of young people’s internet use: implications for education. Comput. Educ. 56(3), 585–595 (2011). doi:10.1016/j.compedu.2010.09.020. http://www.sciencedirect.com/science/article/B6VCJ-517J24P-1/2/5c5d02fb284c227d3a9ef1cc4f3e1bf6

12. Fuller, F.F.: Concerns of teachers: a developmental conceptualization. Am. Educ. Res. J. 6(2), 207–226 (1969)

13. Hall, G.E.: The concerns-based approach to facilitating change. Educ. Horizons 57(4), 202–208 (1979)

14. Hanson, K., Carlson, B.: Effective Access: Teachers’ Use of Digital Resources in STEM Teaching. Gender, Diversity, and Technology Institute, Education Development Center, Inc., Newton (2005)

15. Hew, K.F., Hara, N.: Empirical study of motivators and barriers of teacher online knowledge sharing. Educ. Technol. Res. Dev. 55(6), 573–595 (2007)

16. Horrigan, J.: A typology of information and communication technology users. Research report, Pew Internet & American Life Project (2007)

17. Kelly, M.G., McAnear, A.: National Educational Technology Standards for Teachers: Preparing Teachers to Use Technology. International Society for Technology in Education (ISTE), Eugene (2002)

18. Lage, K., Maness, J., Losoff, B.: Receptivity to library involvement in scientific data curation: a case study at the University of Colorado Boulder. Portal: Libr. Acad. 11(4), 915–937 (2011)

19. Leech, N., Onwuegbuzie, A.: A typology of mixed methods research designs. Qual. Quant. 43(2), 265–275 (2009)

20. Maness, J., Miaskiewicz, T., Sumner, T.: Using personas to understand the needs and goals of institutional repository users. D-Lib Mag. 14(9/10), 1082–9873 (2008)

21. Maull, K., Saldivar, M., Sumner, T.: Understanding digital library adoption: a use diffusion approach. In: Proceeding of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, ACM, pp. 259–268 (2011)

22. Maull, K.E., Saldivar, M.G., Sumner, T.: Online curriculum planning behavior of teachers. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 121–130 (2010)

23. Miaskiewicz, T., Sumner, T., Kozar, K.: A latent semantic analysis methodology for the identification and creation of personas. In: Proceeding of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, ACM, pp. 1501–1510 (2008)

24. Moore, G.A.: Crossing the Chasm: Marketing and Selling Technology Products to Mainstream Customers. HarperCollins, New York (2006)

25. Pennington, M.C.: Cycles of innovation in the adoption of information technology: a view for language teaching. Comput. Assist. Lang. Learn. 17(1), 7–33 (2004)

26. Ram, S., Jung, H.: The conceptualization and measurement of product usage. J. Acad. Mark. Sci. 18(1), 67–76 (1990). doi:10.1007/BF02729763. http://www.springerlink.com/content/9kjl7574145320mv

27. Rogers, E.M.: Diffusion of Innovations, 5th edn. The Free Press, New York (2003)

28. Saldivar, M.: Teacher integration of digital resources into instructional practice. CCS Report No. 4, Digital Learning Sciences, Boulder (2011)

29. Saldivar, M.G.: Teacher adoption of a Web-based instructional planning system. Doctoral dissertation, University of Colorado, Boulder, CO (2012)

30. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948)

31. Shih, C., Venkatesh, A.: Beyond adoption: development and application of a use-diffusion model. J. Mark. 68(1), 59–72 (2004). http://www.jstor.org/stable/30161975

32. Smerdon, B.: Teachers’ Tools for the 21st Century: A Report on Teachers’ Use of Technology. U.S. Dept. of Education, Office of Educational Research and Improvement, Washington, DC (2000)

33. Straub, E.T.: Understanding technology adoption: theory and future directions for informal learning. Rev. Educ. Res. 79(2), 625–649 (2009)

34. Sumner, T., Team, C.: Customizing science instruction with educational digital libraries. In: Proceedings of the 10th Annual Joint Conference on Digital Libraries, JCDL ’10, ACM, New York, NY, USA, pp. 353–356 (2010). doi:10.1145/1816123.1816178

35. Turner, M., Kitchenham, B., Brereton, P., Charters, S., Budgen, D.: Does the technology acceptance model predict actual use? A systematic literature review. Inform. Software Technol. 52(5), 463–479 (2010)

36. Venkatraman, M.P.: The impact of innovativeness and innovation type on adoption. J. Retail. 67(1), 51–67 (1991)

37. Weatherley, J.: A web service framework for embedding discovery services in distributed library interfaces. In: Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL ’05), ACM, New York, NY, USA, pp. 42–43 (2005). doi:10.1145/1065385.1065394

38. Wilson, B., Wood, J.A.: Teacher evaluation: a national dilemma. J. Person. Eval. Educ. 10(1), 75–82 (1996)

39. Xu, B., Recker, M., Hsi, S.: Data deluge: opportunities for research in educational digital libraries. In: Edwards, C.M. (ed.) Internet Issues: Blogging, the Digital Divide and Digital Libraries. Nova Science Pub. Inc., New York (2010)


Table 4 An initial user typology derived from step 2, showing the diffusion patterns and characteristics of each usage type

| Cluster | Diffusion pattern | Typology label | Characteristics |
|---|---|---|---|
| 1 | Limited Use | “Uninterested Non-Adopter” | Very low overall system use |
| 2 | Specialized Use | “Interactive Resource Specialist” | Heavy use of interactive resources relative to other system features. Tends to access system weekly |
| 3 | Intense Use | “Ardent Power User” | Heavy and robust overall use of all features. Tends to access system daily |
| 4 | Non-Specialized Use | “Moderate Generalist” | Moderate overall system use. Shows slightly more use of interactive resources than other system features. Tends to access system several times monthly |
| 5 | Specialized Use | “Community Seeker Specialist” | Makes heavy use of Shared Stuff features and Publisher materials relative to Interactive Resource Activity. Tends to use the system weekly |
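The diffusion-pattern column follows the use-diffusion model of Shih and Venkatesh [31], which crosses rate (frequency) of use with variety of use to give four patterns. A minimal sketch of that coarse assignment follows. The median-split thresholds, the choice of sessions per month as frequency and entropy as variety, and the sample users are all hypothetical assumptions for illustration, not the cut-offs used in our step 1.

```python
from statistics import median


def use_diffusion_pattern(frequency, variety, freq_cut, var_cut):
    """Map a user's (frequency, variety) pair to a use-diffusion quadrant."""
    high_freq = frequency >= freq_cut
    high_var = variety >= var_cut
    if high_freq and high_var:
        return "Intense Use"
    if high_freq:
        return "Specialized Use"
    if high_var:
        return "Non-Specialized Use"
    return "Limited Use"


# Hypothetical users: (sessions per month, clickstream entropy in bits).
users = {
    "u1": (2, 0.3),
    "u2": (12, 0.6),
    "u3": (25, 1.8),
    "u4": (4, 1.4),
}

# Hypothetical thresholds: median split on each dimension.
freq_cut = median(f for f, _ in users.values())
var_cut = median(v for _, v in users.values())

for uid, (f, v) in users.items():
    print(uid, use_diffusion_pattern(f, v, freq_cut, var_cut))
```

A median split always places roughly half the users on each side of each axis, so in practice fixed or data-driven thresholds may separate groups more meaningfully.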

4.4 Validation of results with field research findings

It is important to note that the method which produced the user typology clusters is not simply a new kind of analysis; rather, it presents an opportunity to bridge different analytical approaches. Although the clusters that emerged from our computational approach are based on clickstream data, these clusters are not arbitrary aggregations of user behaviors. They map onto real-world usage patterns that emerged from the traditional educational research techniques used in our study of the CCS. Thus, the findings from our computational method are validated by the findings from our field research of CCS use “in the wild”.

The CCS field trial used a mixed-method research design [19,4,10]. We collected quantitative data via survey in a longitudinal fashion from all DPS Earth science teachers (n = 124) during the 2009–2010 academic year. In addition, we collected qualitative data from a large subset of teachers during the same period. See Saldivar [29] for a complete discussion of the CCS field trial; here we summarize our data sources, sample sizes, and analysis techniques.

– Survey 1 (n = 85): A pre-survey administered at the beginning of the field trial to all Earth science teachers to establish a baseline of attitudes and behaviors regarding educational technology (including interactive digital resources), DPS Earth science curricular materials, and differentiated instruction. Data were collected via both quantitative (constrained-response) and qualitative (free-response) items. Quantitative survey items were analyzed by determining counts and frequencies of responses. Qualitative items were analyzed via content analysis. (Subsequent surveys also collected both quantitative and qualitative data and were analyzed in the same manner.)

– Survey 2 (n = 84): A pre-survey administered at the mid-point of the field trial to all Earth science teachers to establish a baseline of attitudes and behaviors regarding access to, and attitudes towards, instructional technology tools in general and the CCS in particular.

– Survey 3 (n = 81): A post-survey administered at the end of the field trial to all Earth science teachers. This final survey comprised a number of items included in the first two surveys to enable a pre- and post-analysis of changes, if any, in CCS users’ attitudes and behaviors during the course of the school year.

– Adoption interviews (n = 24): These semi-structured telephone interviews regarding teachers’ adoption and use of the CCS tool were administered to teachers who were regular-to-frequent CCS users, and resulted in an extensive database of qualitative data (interview transcripts), which were analyzed via content analysis.

– Classroom observation cycles (n = 8): Teachers identified as frequent CCS users were each recruited to participate in several hours of interviews and classroom observations per teacher during the course of the school year, resulting in an extensive database of qualitative data (interview transcripts and field notes), which were analyzed via content analysis.
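The counts-and-frequencies analysis of quantitative items, and the response rates implied by the survey sizes above, can be sketched as follows. This assumes, as the text states, that each survey went to all 124 teachers; the Likert labels and the individual responses are hypothetical placeholders, and only the sample sizes come from the study.

```python
from collections import Counter

POPULATION = 124  # all DPS Earth science teachers surveyed (from the text)
SURVEY_N = {"Survey 1": 85, "Survey 2": 84, "Survey 3": 81}


def response_rate(n, population=POPULATION):
    """Percentage of the target population that responded."""
    return 100.0 * n / population


def agree_share(responses):
    """Percent answering 'agree' or 'strongly agree' on a Likert item."""
    counts = Counter(responses)
    top_two = counts["agree"] + counts["strongly agree"]
    return 100.0 * top_two / len(responses)


for name, n in SURVEY_N.items():
    print(f"{name}: {response_rate(n):.1f}% response rate")

# Hypothetical answers to a single constrained-response item.
item = ["strongly agree", "agree", "agree", "neutral", "disagree",
        "agree", "strongly agree", "agree"]
print(Counter(item))  # frequency of each response category
print(f"{agree_share(item):.1f}% agreed or strongly agreed")
```

Percentages such as “agreed or strongly agreed” in the survey findings below are of this top-two-box form.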

Analysis of these field trial data, conducted independently of the computational analysis of clickstream data described above, suggested that teachers in our study fell onto a spectrum of system use behavior, from low-frequency users (corresponding to cluster 1) to moderate (cluster 4) and heavy (cluster 3) users. The final two clusters fell on the moderate-to-heavy side of the spectrum. Rogers’ [27] theory of diffusion of innovation, discussed in our introductory section, predicts that the earliest users of a new technology (“innovators” and “early adopters”) comprise approximately 16% of all users. Moore [24], who revised and extended Rogers’ work, further argues that a technology cannot become “mainstream” within a given population until it is adopted by at least half of all potential users. Our step 2 analysis indicates that about two-thirds of teachers in the district adopted the system to a significant degree, with one-third of users (represented by clusters 2, 3, and 5) making heavy use of the system. Even if we confine ourselves to a theoretical framework such as Rogers’, which focuses on quantifying when different segments of a target population adopt an innovation, our findings indicate that the CCS has been strongly adopted and is heavily used by the target population.
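The thresholds attributed to Rogers [27] and Moore [24] are cumulative shares of the population. The sketch below uses Rogers’ standard adopter-category percentages (innovators 2.5%, early adopters 13.5%, early majority 34%, late majority 34%, laggards 16%) to show where the 16% figure and the 50% “mainstream” point fall, and compares them with the roughly two-thirds adoption reported above. It is an arithmetic illustration only; treating the end of the early majority as Moore’s mainstream point is our simplifying assumption.

```python
from itertools import accumulate

# Rogers' adopter categories, as percentages of the adopting population.
CATEGORIES = [
    ("innovators", 2.5),
    ("early adopters", 13.5),
    ("early majority", 34.0),
    ("late majority", 34.0),
    ("laggards", 16.0),
]

# Running total: share of the population that has adopted by the end of each category.
cumulative = dict(zip((name for name, _ in CATEGORIES),
                      accumulate(share for _, share in CATEGORIES)))

earliest = cumulative["early adopters"]    # innovators + early adopters
mainstream = cumulative["early majority"]  # assumed "at least half" point
observed = 100.0 * 2 / 3                   # "about two-thirds" adopted (from the text)

print(f"innovators + early adopters: {earliest:.1f}%")
print(f"mainstream threshold: {mainstream:.1f}%")
print(f"observed adoption: {observed:.1f}% (past mainstream: {observed >= mainstream})")
```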

It can be argued that identifying low, moderate, and intense user clusters is hardly a profound finding; after all, any set of system users can be divided into “low” and “heavy” user categories if the major variable of interest is frequency of system use. Recall, however, that use diffusion theory calls for an analysis that combines frequency of use with an evaluation of how a given technology is used. Looking through the lens of use diffusion, the most salient finding from our computational method is that two of the clusters, cluster 2 (the Interactive Resource Specialist) and cluster 5 (the Community Seeker Specialist), represent “variety of use” behaviors similar to user behaviors identified by our field research. This enables us to validate the findings of our computational method with real-world data, while at the same time giving us deep insights into how teachers integrated digital library resources into their instructional practices.

For example, Interactive Resource Specialists were moderately heavy users who spent most of their time in the Interactive Resources/Embedded Digital Library component of the CCS. Data from the final survey show that many teachers highly valued interactive resources. In response to the statement “Using interactive resources effectively is important to my personal success as a teacher,” about three-fourths of respondents agreed or strongly agreed. In response to the statement “Using interactive resources in the classroom enables me to better meet the learning needs of students in my classroom,” almost 90% agreed or strongly agreed. Qualitative data provide insight into these survey findings.

In response to an interview question that asked him to discuss why he used the CCS, teacher Corey told us: “My teaching practices have always focused on student engagement. I found that the CCS made it easier for me to find [interactive] resources with which I could capture student attention.”

Many teachers reported that they accessed digital resources to supplement textbook material. Overwhelmingly, the most popular kinds of digital resources teachers found via the CCS were graphic representations (e.g., pictures, diagrams, animations) of Earth science phenomena. In the following example, note Carlie’s observations regarding students’ engagement with, and understanding of, key concepts: “I think that since students are visual learners, they’re hands-on learners, they’re growing up in this technological age. Seeing the animations really drives home the idea or the topic. I ask them if it does help them [to see graphic representations], if whatever they just viewed made the material make more sense, and often times they [say], ‘Wow. That totally made sense. Can we see it again?’ I think they benefit from it a lot, and they’re vocal about it. They love it.”

Liz’s experience was similar to Carlie’s in that her students seemed excited simply by the presence of digital resources, which led to increased engagement and deeper comprehension. Liz told us in an interview: “I think in general my students are really interested in seeing any kind of [digital resource]. [Digital resources are] really engaging for students because it’s different than what they normally see in the classroom. When they walk in, they’re just super-excited when they see the projector’s set up. [They say,] ‘Oh yes, we get to watch a video,’ or ‘We get to see a PowerPoint.’ They come in automatically excited to learn.”

In contrast, Community Seeker Specialists had CCS usage frequencies comparable to Interactive Resource Specialists but spent most of their time in the Shared Stuff and Publisher components of the system. Publisher materials had utility for teachers because they were directly related to the Earth science curricula used by the district, but why were Shared Stuff materials popular? Survey data suggest that the ability to see other teachers’ uploaded materials gave CCS users insights into the community of Earth science educators in their district that were not possible before the system was made available. For instance, in the final survey, almost 61% of respondents agreed or strongly agreed with the statement “The CCS has increased my awareness of other teachers’ practices.” Approximately half of respondents agreed or strongly agreed with the statement “The CCS has helped me become a more active member of the DPS professional learning community.” Qualitative data provide a context for these survey responses.

In response to an open-ended survey question that asked respondents to explain the value of accessing Shared Stuff, teacher Sheila stated: “Looking at the Shared Stuff uploaded by others gives me ideas about how I can present particular concepts in my [own] classroom.” In response to the same question, Norma commented: “[Resources in Shared Stuff] have given me different perspectives on the different topics [in the curriculum] and thus enabled me to teach more effectively.”

Henrietta told us in an interview: “When [another Earth science] teacher from across town might have found a great website or a video clip or something that really does bring the point home and makes it more relevant to the students, that’s what I’m looking for, and so far I have found some things like that on [the CCS].” Later in the same interview, Henrietta added that she and the other Earth science teachers at her school, all CCS users, often discussed useful resources that they had found using the CCS: “We certainly confer on [resources we discover]. [We tell each other,] ‘This was great’ [and] ‘This is something you need to [use] when you get to this [certain] point in your book.’”

5 Discussion

Step 1 of our method shows promise for the rapid determination of application-independent, large-grain usage patterns, while Step 2 produced a meaningful typology based on clusters that emerged from refined, application-dependent feature data. The computational generation of clusters that correspond with user behaviors identified by qualitative analysis of interview and survey data suggests that it is possible to build an understanding of user behaviors and motivations through quantitative examination of their behaviors within a system or application. This is only a first step, however, toward understanding the complex interplay between observable behaviors that occur within an application and behaviors that occur outside of that system.
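Step 2 groups users by refined, application-dependent feature vectors. As a hedged illustration only, the sketch below clusters hypothetical per-teacher feature vectors (shares of activity in the Interactive Resources, Shared Stuff, and Publisher components, plus a visit rate scaled to [0, 1]) with plain k-means and a deterministic farthest-point initialization. The paper cites the EM algorithm [8]; k-means is a simpler, swapped-in technique used here to show the idea, and the features, k, and data are all assumptions.

```python
import math


def kmeans(points, k, iters=100):
    """Plain k-means with deterministic farthest-point initialization."""
    # Seed with the first point, then repeatedly add the point farthest
    # from its nearest existing centroid.
    centroids = [points[0]]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(far)

    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical per-teacher features: share of activity in Interactive Resources,
# Shared Stuff, and Publisher components, plus visit rate scaled to [0, 1].
users = {
    "t01": (0.80, 0.10, 0.10, 0.50),
    "t02": (0.85, 0.05, 0.10, 0.60),
    "t03": (0.10, 0.50, 0.40, 0.50),
    "t04": (0.15, 0.45, 0.40, 0.60),
    "t05": (0.34, 0.33, 0.33, 0.05),
    "t06": (0.30, 0.35, 0.35, 0.05),
}

labels, centroids = kmeans(list(users.values()), k=3)
for uid, label in zip(users, labels):
    print(uid, label)
```

With these made-up values the interactive-heavy, Shared Stuff/Publisher-heavy, and low-use teachers fall into separate clusters. A mixture model fitted with EM would additionally give soft cluster memberships, which k-means does not; the choice of k would need to be checked against real data.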

The CCS’s main purpose was to encourage teachers to differentiate instruction by incorporating high-quality digital resources into their teaching practices. Thus, the system was designed to produce a specific outcome: teacher differentiation of instruction. Our field trial data validated the in-system behaviors characterized, for example, by the Interactive Resource Specialist and Community Seeker Specialist clusters. However, we also know from field trial data that teachers with similar in-system behaviors may have very different teaching practices in the classroom. It is also possible that teachers with similar classroom practices may exhibit different in-system behaviors. Ideally, then, the most complete understanding of a system’s impact on users would come from a method that takes into account both in-system and out-of-system behaviors, motivations, and characteristics.

As previously mentioned in Sect. 2.2, existing literature on digital library use offers some guidance. Researchers in the field of human–computer interaction have long used the concept of personas to guide the development of information systems. Personas are constructs devised by system developers to describe different types of system users; they most often take the form of fictional biographies that include demographic characteristics and descriptions of how and why users interact with the system. With personas, developers can begin to model and build systems that more specifically address the needs of actual users. This increases the chances that the system will better serve the needs of its end users and, subsequently, enhances the adoption of the system. Our method describes in-system behaviors, but by extending our approach to incorporate a full set of out-of-system behaviors, including, for example, demographic information, teaching behaviors, and student outcomes (that is, by developing personas of CCS users), we can not only develop a richer understanding of the relationship between system use and users’ practices “in the wild,” but also extend the notion of personas beyond system design to actual system adoption and use over time.


3 Brandtzaeligg PB Towards a unified Media-User typology(MUT) a meta-analysis and review of the research literature onmedia-user typologies Comput Hum Behav 26(5) 940ndash956(2010) doi101016jchb201002008 httpwwwsciencedirectcomsciencearticleB6VDC-4YJSW8D-12011453cc70c0a6bdc29d3fde3b8a9304

4 Creswell J Educational Research Planning Conducting andEvaluating Quantitative and Qualitative Research PearsonEducation Upper Saddle Creek NJ (2008)

5 Danielson C McGreal TL Teacher Evaluation To EnhanceProfessional Practice Association for Supervision and CurriculumDevelopment Alexandria VA USA (2000)

123

64 K E Maull et al

6 Davis FD Perceived usefulness perceived ease of use anduser acceptance of information technology MIS Quart 13(3)319ndash340 (1989)

7 Deffuant G Huet S Amblard F An individual-based modelof innovation diffusion mixing social value and individual benefit1 Am J Sociol 110(4) 1041ndash1069 (2005)

8 Dempster AP Laird NM Rubin DB Maximum likelihoodfrom incomplete data via the EM algorithm J R Stat Soc Ser B(Methodological) 39(1) 1ndash38 (1977)

9 Dominguez AK Yacef K Curran JR Data mining for gen-erating hints in a python tutor In Proceedings of the 3rd Inter-national Conference on Educational Data Mining Pittsburgh PApp 91ndash100 (2010)

10 Dzurec L Abraham I The nature of inquiry linking quantitativeand qualitative research Adv Nurs Sci 16 73ndash79 (1993)

11 Eynon R Malmberg L A typology of young peoplersquos inter-net use implications for education Comput Educ 56(3)585ndash595 (2011) doi101016jcompedu201009020 httpwwwsciencedirectcomsciencearticleB6VCJ-517J24P-125c5d02fb284c227d3a9ef1cc4f3e1bf6

12 Fuller FF Concerns of teachers a developmental conceptualiza-tion Am Educ Res J 6(2) 207ndash226 (1969)

13 Hall GE The concerns-based approach to facilitating changeEduc Horizons 57(4) 202ndash208 (1979)

14 Hanson K Carlson B Effective Access Teachersrsquo Use of DigitalResources in STEM Teaching Gender Diversity and TechnologyInstitute Education Development Center Inc Newton (2005)

15 Hew KF Hara N Empirical study of motivators and barri-ers of teacher online knowledge sharing Educ Technol ResDev 55(6) 573ndash595 (2007)

16 Horrigan J A typology of information and communication tech-nology users Research report Pew Internet amp American Life Pro-ject (2007)

17 Kelly MG McAnear A National Educational Technology Stan-dards for Teachers Preparing Teachers to Use Technology Interna-tional Society for Technology in Education (ISTE) Eugene (2002)

18 Lage K Maness J Losoff B Receptivity to library involve-ment in scientific data curation a case study at the University ofColorado Boulder Portal Libr Acad 11(4) 915ndash937 (2011)

19 Leech N Onwuegbuzie A A typology of mixed methodsresearch designs Qual Quant 43(2) 265ndash275 (2009)

20 Maness J Miaskiewicz T Sumner T Using personas to under-stand the needs and goals of institutional repository users D-LibMag 14(910) 1082ndash9873 (2008)

21 Maull K Saldivar M Sumner T Understanding digital libraryadoption a use diffusion approach In Proceeding of the 11thAnnual International ACMIEEE Joint Conference on Digitallibraries ACM pp 259ndash268 (2011)

22 Maull KE Saldivar MG Sumner T Online curriculum plan-ning behavior of teachers In Proceedings of the 3rd Interna-tional Conference on Educational Data Mining Pittsburgh PApp 121ndash130 (2010)

23 Miaskiewicz T Sumner T Kozar K A latent semantic anal-ysis methodology for the identification and creation of personas

In Proceeding of the Twenty-Sixth Annual SIGCHI Conferenceon Human Factors in Computing Systems ACM pp 1501ndash1510(2008)

24 Moore GA Crossing the Chasm Marketing and SellingTechnology Products to Mainstream Customers HarperCollinsNew York (2006)

25 Pennington MC Cycles of innovation in the adoption of infor-mation technology a view for language teaching Comput AssistLang Learn 17(1) 7ndash33 (2004)

26 Ram S Jung H The conceptualization and measurementof product usage J Acad Mark Sci 18(1) 67ndash76 (1990)doi101007BF02729763 httpwwwspringerlinkcomcontent9kjl7574145320mv

27 Rogers EM Diffusion of Innovations 5th edn The Free PressNew York (2003)

28 Saldivar M Teacher integration of digital resources into instruc-tional practice CCS Report No 4 Digital Learning SciencesBoulder (2011)

29 Saldivar MG Teacher adoption of a Web-based instructionalplanning system Doctoral dissertation University of ColoradoBoulder CO (2012)

30 Shannon CE A mathematical theory of communcation Bell SystTech J 27 379ndash423 (1948)

31 Shih C Venkatesh A Beyond adoption development and appli-cation of a use-diffusion model J Mark 68(1) 59ndash72 (2004)httpwwwjstororgstable30161975

32 Smerdon B Teachersrsquo Tools for the 21st Century A Report onTeachersrsquo Use of Technology US Dept of Education Office ofEducational Research and Improvement Washington DC (2000)

33 Straub ET Understanding technology adoption theory andfuture directions for informal learning Rev Educ Res 79(2)625ndash649 (2009)

34 Sumner T Team C Customizing science instruction with edu-cational digital libraries In Proceedings of the 10th Annual JointConference on Digital libraries ACM JCDL rsquo10 New York NYUSA pp 353ndash356 (2010) doi10114518161231816178

35 Turner M Kitchenham B Brereton P Charters S BudgenD Does the technology acceptance model predict actual useA systematic literature review Inform Software Technol 52(5)463ndash479 (2010)

36 Venkatraman MP The impact of innovativeness and innovationtype on adoption J Retail 67(1) 51ndash67 (1991)

37 Weatherley J A web service framework for embedding discov-ery services in distributed library interfaces In Proceedings of the5th ACMIEEE-CS Joint Conference on Digital libraries (JCDLrsquo05) ACM New York NY USA pp 42ndash43 (2005) doi10114510653851065394

38 Wilson B Wood JA Teacher evaluation a national dilemmaJ Person Eval Educ 10(1) 75ndash82 (1996)

39 Xu B Recker M Hsi S Data deluge opportunities for researchin educational digital libraries In Cassie M Edwards (ed) InternetIssues Blogging the Digital Divide and Digital Libraries NovaScience Pub Inc New York (2010)

123

Linking computational methods with qualitative analyses 61

our findings indicate that the CCS has been strongly adopted and is heavily used by the target population.

It can be argued that identifying low, moderate, and intense user clusters is hardly a profound finding; after all, any set of system users can be divided into “low” and “heavy” user categories if the major variable of interest is frequency of system use. Recall, however, that use diffusion theory calls for an analysis that combines frequency of use with an evaluation of how a given technology is used. Viewed through the lens of use diffusion, the most salient finding from our computational method is that two of the clusters, cluster 2 (the Interactive Resource Specialist) and cluster 5 (the Community Seeker Specialist), represent “variety of use” behaviors similar to user behaviors identified by our field research. This enables us to validate the findings of our computational method with real-world data while also giving us deep insight into how teachers integrated digital library resources into their instructional practices.

For example, Interactive Resource Specialists were moderately heavy users who spent most of their time in the Interactive Resources/Embedded Digital Library component of the CCS. Data from the final survey show that many teachers highly valued interactive resources. In response to the statement “Using interactive resources effectively is important to my personal success as a teacher,” about three-fourths of respondents agreed or strongly agreed. In response to the statement “Using interactive resources in the classroom enables me to better meet the learning needs of students in my classroom,” almost 90% agreed or strongly agreed. Qualitative data provide insight into these survey findings.

In response to an interview question that asked him to discuss why he used the CCS, teacher Corey told us: “My teaching practices have always focused on student engagement. I found that the CCS made it easier for me to find [interactive] resources with which I could capture student attention.”

Many teachers reported that they accessed digital resources to supplement textbook material. Overwhelmingly, the most popular kinds of digital resources teachers found via the CCS were graphic representations (e.g., pictures, diagrams, animations) of Earth science phenomena. In the following example, note Carlie’s observations regarding students’ engagement with, and understanding of, key concepts: “I think that since students are visual learners, they’re hands-on learners, they’re growing up in this technological age. Seeing the animations ... really drives home the idea or the topic. I ask them if it does help them [to see graphic representations], if whatever they just viewed made the material make more sense, and often times they [say], ‘Wow, that totally made sense. Can we see it again?’ I think they benefit from it a lot, and they’re vocal about it. They love it.”

Liz’s experience was similar to Carlie’s in that her students seemed excited simply by the presence of digital resources, which led to increased engagement and deeper comprehension. Liz told us in an interview: “I think in general my students are really interested in seeing any kind of [digital resource]. [Digital resources are] really engaging for students because it’s different than what they normally see in the classroom. When they walk in, they’re just super-excited when they see the projector’s set up. [They say,] ‘Oh yes, we get to watch a video,’ or ‘We get to see a PowerPoint.’ They come in automatically excited to learn.”
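The survey figures above (“about three-fourths,” “almost 90%”) are top-two-box shares: the fraction of respondents who chose either of the two agreeing options on a Likert item. A minimal sketch of that arithmetic follows, assuming a five-point scale; the responses are hypothetical, since the study’s raw counts are not reported here.

```python
from collections import Counter

# The two agreeing options on an assumed five-point Likert scale.
AGREE_LEVELS = ("agree", "strongly agree")


def top_two_box(responses):
    """Percentage of responses that are 'agree' or 'strongly agree'."""
    counts = Counter(r.strip().lower() for r in responses)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses")
    agreeing = sum(counts[level] for level in AGREE_LEVELS)
    return 100.0 * agreeing / total


# Hypothetical responses to one survey statement (not the study's data).
sample = (
    ["strongly agree"] * 9 + ["agree"] * 6 + ["neutral"] * 3 + ["disagree"] * 2
)
print(f"{top_two_box(sample):.1f}% agreed or strongly agreed")
# prints "75.0% agreed or strongly agreed"
```

Normalizing case and whitespace before counting keeps differently typed labels such as “Agree” and “agree” from being tallied separately.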

In contrast, Community Seeker Specialists had CCS usage frequencies comparable to those of Interactive Resource Specialists but spent most of their time in the Shared Stuff and Publisher components of the system. Publisher materials had utility for teachers because they were directly related to the Earth science curricula used by the district, but why were Shared Stuff materials popular? Survey data suggest that the ability to see other teachers’ uploaded materials gave CCS users insights into the community of Earth science educators in their district that were not possible before the system was made available. For instance, in the final survey almost 61% of respondents agreed or strongly agreed with the statement “The CCS has increased my awareness of other teachers’ practices.” Approximately half of respondents agreed or strongly agreed with the statement “The CCS has helped me become a more active member of the DPS professional learning community.” Qualitative data provide a context for these survey responses.

In response to an open-ended survey question that asked respondents to explain the value of accessing Shared Stuff, teacher Sheila stated: “Looking at the Shared Stuff uploaded by others gives me ideas about how I can present particular concepts in my [own] classroom.” In response to the same question, Norma commented: “[Resources in Shared Stuff] have given me different perspectives on the different topics [in the curriculum] and thus enabled me to teach more effectively.”

Henrietta told us in an interview: “When [another Earth science] teacher from across town might have found a great website or a video clip or something that really does bring the point home and makes it more relevant to the students, that’s what I’m looking for, and so far I have found some things like that on [the CCS].” Later in the same interview, Henrietta added that she and the other Earth science teachers at her school, all CCS users, often discussed useful resources that they had found using the CCS: “We certainly confer on [resources we discover]. [We tell each other,] ‘This was great,’ [and] ‘This is something you need to [use] when you get to this [certain] point in your book.’”
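Both specialist profiles are characterized by where a teacher’s activity was concentrated among CCS components. The sketch below computes each user’s share of logged events per component and returns the most-used components whose shares together reach a coverage threshold. The one-label-per-event format and the 0.6 threshold are illustrative assumptions, not the study’s feature definitions.

```python
from collections import Counter


def component_shares(events):
    """Map each component to its fraction of a user's logged events."""
    counts = Counter(events)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {component: n / total for component, n in counts.items()}


def dominant_components(events, coverage=0.6):
    """Most-used components whose combined share first reaches `coverage`."""
    # Sort by descending share; break ties alphabetically for determinism.
    ranked = sorted(component_shares(events).items(), key=lambda kv: (-kv[1], kv[0]))
    chosen, covered = [], 0.0
    for component, share in ranked:
        chosen.append(component)
        covered += share
        if covered >= coverage:
            break
    return chosen


# Hypothetical clickstreams: one component label per logged event.
interactive_user = (
    ["Interactive Resources"] * 14 + ["Shared Stuff"] * 3 + ["Publisher"] * 3
)
community_user = (
    ["Shared Stuff"] * 8 + ["Publisher"] * 7 + ["Interactive Resources"] * 5
)

print(dominant_components(interactive_user))  # ['Interactive Resources']
print(dominant_components(community_user))    # ['Shared Stuff', 'Publisher']
```

Event counts are used here as a proxy for time; with timestamped sessions, summed durations per component could replace the counts without changing the rest of the logic.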

5 Discussion

Step 1 of our method shows promise for the rapid determination of application-independent, large-grain usage patterns, while Step 2 produced a meaningful typology based on clusters that emerged from refined, application-dependent feature data. The computational generation of clusters that correspond with user behaviors identified by qualitative analysis of interview and survey data suggests that it is possible to build an understanding of user behaviors and motivations through quantitative examination of their behaviors within a system or application. This is only a first step, however, toward understanding the complex interplay between observable behaviors that occur within an application and behaviors that occur outside of that system.
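To make the idea of clusters emerging from application-dependent features concrete, the sketch below groups users by their per-component usage proportions. It uses plain k-means (Lloyd’s algorithm) in the standard library as a stand-in chosen for brevity; the features, k, and initialization are assumptions, and the clustering procedure used in the study may differ.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels, centroids


# Hypothetical features: share of each teacher's activity in
# (Interactive Resources, Shared Stuff, Publisher).
users = {
    "t1": [0.80, 0.10, 0.10],
    "t2": [0.10, 0.45, 0.45],
    "t3": [0.75, 0.15, 0.10],
    "t4": [0.15, 0.50, 0.35],
    "t5": [0.70, 0.20, 0.10],
    "t6": [0.05, 0.40, 0.55],
}
ids = list(users)
labels, centroids = kmeans([users[i] for i in ids], k=2)
for c in range(2):
    print(c, [i for i, lab in zip(ids, labels) if lab == c])
# 0 ['t1', 't3', 't5']
# 1 ['t2', 't4', 't6']
```

Deterministic initialization keeps the example reproducible; on real data, multiple random restarts or a model-based method would usually be preferred, and the resulting centroids can then be interpreted, as above, by their dominant components.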

The CCS’s main purpose was to encourage teachers to differentiate instruction by incorporating high-quality digital resources into their teaching practices. Thus, the system was designed to produce a specific outcome: teacher differentiation of instruction. Our field trial data validated the in-system behaviors characterized, for example, by the Interactive Resource Specialist and Community Seeker Specialist clusters. However, we also know from field trial data that teachers with similar in-system behaviors may have very different teaching practices in the classroom. It is also possible that teachers with similar classroom practices may exhibit different in-system behaviors. Ideally, then, the most complete understanding of a system’s impact on users would come from a method that takes into account both in-system and out-of-system behaviors, motivations, and characteristics.

As previously mentioned in Sect. 2.2, existing literature on digital library use offers some guidance. Researchers in the field of human–computer interaction have long used the concept of personas to guide the development of information systems. Personas are constructs devised by system developers to describe different types of system users; they most often take the form of fictional biographies of system users that include demographic characteristics and descriptions of how and why they interact with the system. Using such personas, developers can begin to model and build systems that more specifically address the needs of actual users. This increases the chances that the system will better serve the needs of its end users and, subsequently, enhances adoption of the system. Our method describes in-system behaviors, but by extending our approach to incorporate a full set of out-of-system behaviors (including, for example, demographic information, teaching behaviors, and student outcomes), that is, by developing personas of CCS users, we can not only develop a richer understanding of the relationship between system use and users’ practices “in the wild,” but also extend the notion of personas beyond system design to actual system adoption and use over time.

While we have obtained promising results from this study, there are several significant limitations. First, our method has only been applied to a single population, so it is plausible that the results we obtained are specific to the population under examination. A different typology may emerge from larger data populations; for example, new specialized user types might emerge. Second, in order to validate that our results are robust and generalizable, it will be important to study the method with different applications and user populations. Third, while the computational method seems to support the analyses of qualitative data, there are opportunities to extend our understanding of system behavior and user characteristics to their impact on other meaningful behaviors outside the system.

As the case study shows, both steps of the method yielded interesting results. It appears, however, that the entropy modeling of the first step requires more discriminatory power when describing specialized users. While the new calculations do a better job of penalizing usage, further modifications remain to be developed that would separate the specialized group(s) more clearly. Furthermore, work remains to develop a precise coarse-grained model that better separates every user group. The challenges revealed by the first step may also indicate that specialization does not always imply low variety; this becomes clear when considering the details of system usage once variety is expanded to include more variables, as in the extended entropy calculations and the second step of the method.
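The observation that specialization need not imply low variety can be illustrated with plain Shannon entropy [30] computed over the same clickstream at two granularities. This is a minimal sketch, not the modified entropy calculation used in the method; the (component, action) event format and the action names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of the empirical distribution of `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    # p * log2(1/p) form avoids a negative-zero result for a single label.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical clickstream of a user who never leaves one component but
# spreads activity evenly over four actions inside it.
events = (
    [("Interactive Resources", "search")] * 4
    + [("Interactive Resources", "view")] * 4
    + [("Interactive Resources", "save")] * 4
    + [("Interactive Resources", "share")] * 4
)

coarse = entropy(component for component, _ in events)  # components only
fine = entropy(events)  # (component, action) pairs
print(f"component-level entropy: {coarse:.2f} bits")  # 0.00 bits
print(f"component+action entropy: {fine:.2f} bits")   # 2.00 bits
```

At the component level this user looks perfectly specialized (zero entropy), yet at the action level the same clickstream reaches the maximum of log2(4) = 2 bits for four equally used actions, which is the kind of hidden variety that a coarse-grained entropy measure cannot separate.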

Applying automated typology discovery to inform guideand develop system user characteristics is one aspect of unde-rstanding within-system user behavior that may be extendedto include broader behaviors and user characteristics throughpersonas Such an extension may prove to be a useful toolfor understanding the behavioral characteristics outside thesystem so that an extremely rich description of system usersemerge that include in-system and out-of-system behaviorsThis would in turn allow for more data to be used in mak-ing predictive and prescriptive tool (in-system) policy (out-of-system) and training (in-system and out-of-system)decisions

6 Broader impacts

While this research has focused on demonstrating the value of our method for the study of digital library adoption, we believe it can help address other educational challenges, chief among them teacher professional development, the correlation of teacher practice with student learning, and the evaluation of teaching practices.

Extant research indicates that one of the major barriers to effective integration of technology into teaching practices is the dearth of quality professional development and training vis-à-vis technology [14]. Teachers often complain that even when technology-related training is made available, it does not meet their needs as practicing educators because it assumes that they do or do not already possess a given set of skills. By modeling teachers’ system use behavior, one will be able to understand inter-user differences and target system training and professional development to users’ true needs. For example, CCS users in the Community Seeker Specialist cluster might be presented with training that would help them integrate more interactive resources into their teaching, because such users are known to spend little of their CCS usage time exploring interactive resources. For teachers in the Interactive Resource Specialist cluster, such training would be redundant because they are already making extensive use of the interactive resources available via the CCS.
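The targeting idea can be sketched as a simple rule: recommend training for components a user rarely visits. The training catalogue, the component shares, and the 0.15 threshold below are hypothetical placeholders, not part of the CCS.

```python
# Hypothetical mapping from CCS component to a training module.
TRAINING = {
    "Interactive Resources": "Integrating interactive resources into lessons",
    "Shared Stuff": "Finding and adapting colleagues' shared materials",
    "Publisher": "Using publisher materials aligned to the curriculum",
}


def recommend_training(shares, threshold=0.15):
    """Recommend modules for components whose usage share is below threshold.

    `shares` maps component name -> fraction of the user's activity;
    components missing from `shares` are treated as unused (share 0.0).
    """
    return [
        module
        for component, module in TRAINING.items()
        if shares.get(component, 0.0) < threshold
    ]


# Hypothetical usage profiles resembling the two specialist clusters.
community_seeker = {"Shared Stuff": 0.50, "Publisher": 0.42, "Interactive Resources": 0.08}
interactive_specialist = {"Interactive Resources": 0.70, "Shared Stuff": 0.18, "Publisher": 0.12}

print(recommend_training(community_seeker))
# ['Integrating interactive resources into lessons']
print(recommend_training(interactive_specialist))
# ['Using publisher materials aligned to the curriculum']
```

As the paragraph above suggests, the interactive-resources module is offered to the Community Seeker profile but not to the Interactive Resource Specialist, for whom it would be redundant.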

While a consensus has emerged among policy makers and education researchers that effective adoption of technology can indeed have a positive impact on teacher practice [17,32], the relationship between teachers’ adoption of technological systems and student achievement is still not well understood. As part of our larger study of the CCS, we are analyzing the standardized test scores of students taught by teachers who used the CCS to determine what impact the teachers’ use of the CCS had on student outcomes. Our analysis of student test score data for both our field trial year and the prior school year [28] shows increased overall achievement among Earth science students after the introduction of the CCS; due to the limitations of the data available to us, however, we cannot make strong causal claims. We continue our efforts to understand what link, if any, exists between system use and student outcomes. In the long term, the ability to correlate system use behaviors with students’ academic achievement will make it possible to focus on the teacher use behaviors that most benefit students; in turn, these behaviors could be taught to other teachers.
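Correlating use behaviors with achievement could start from something as simple as a Pearson coefficient between a per-teacher usage measure and a per-teacher change in student scores. The sketch below computes it directly; the session counts and score changes are invented for illustration, and, as noted above, a correlation of this kind would not by itself support causal claims. On Python 3.10 and later, `statistics.correlation` computes the same coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-teacher values (not study data): CCS sessions in the
# trial year and the mean change in their students' test scores.
sessions = [5, 12, 20, 33, 41]
score_change = [1.0, 2.5, 2.0, 4.0, 5.5]
r = pearson(sessions, score_change)
print(f"r = {r:.2f}")  # r = 0.95
```

A real analysis would also need to account for prior achievement, school, and student characteristics, which a single bivariate coefficient ignores.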

Evaluating teachers’ instructional practices is a very difficult task that requires a massive commitment of human and financial resources [5]. For example, most evaluation systems rely on administrators to observe teachers in the classroom, a very labor-intensive approach that can result in teachers being awarded or denied tenure or pay raises based on the short period of time during which they were observed. Further, variance in evaluators’ adherence to evaluation rubrics and bias in the evaluation instruments themselves can make it difficult to assess the validity of the evaluation process [38]. Alternatively, asking teachers to self-report their teaching practices might provide evaluators with additional information, but such data would be subject to all the limitations that come with asking individuals to report on their own behaviors. Neutral “third party” data, such as the clickstream data to which we applied our computational method, can help bridge the gap between what evaluators observe during the relatively brief periods they visit a teacher’s classroom and the teacher’s activities when he or she is not being formally evaluated. A teacher evaluation process that combines traditional observational and self-reported data with usage data produced by systems like the CCS would give administrators, policymakers, and teachers themselves deeper insights into instructional practices; further, such a hybrid system would produce a rich new understanding of teaching-related best practices that could then be shared with other educators.

7 Conclusion

By applying models of adoption and use diffusion alongside data mining techniques, we have developed a two-step method for discovering patterns within clickstream data that reveal both general and specific typologies of digital library user behavior. The application of this method to data from an academic year-long field trial in a large urban school district provided a rich opportunity to study the adoption and usage behavior of embedded digital library resources by middle and high school science teachers. The method shows considerable promise for extracting useful behavioral patterns “in the wild”: the resulting fine-grained user typology maps well onto results emerging from traditional educational field research methods. Although more data are needed to validate it further, the proposed method is a valuable contribution toward our effort to develop automatic methods for studying digital library adoption and use. When fully realized, this method also has the potential to be extended to other applications and areas, such as informing teacher professional development, understanding the impact of digital library applications on student learning, and developing new approaches to the evaluation of teacher practice.

Acknowledgments This material is based upon work supported in part by the NSDL program in the National Science Foundation under Awards 0734875 and 0840744, and by the University of Colorado at Boulder.

References

1. Ayers, E., Nugent, R., Dean, N.: Skill Set Profile Clustering Based on Student Capability Vectors Computed From Online Tutoring Data. In: Proceedings of the 1st International Conference on Educational Data Mining, Montreal, Canada, pp. 210–217 (2008)

2. Benevenuto, F., Rodrigues, T., Cha, M., Almeida, V.: Characterizing user behavior in online social networks. In: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement Conference, ACM, Chicago, Illinois, USA, pp. 49–62 (2009). doi:10.1145/1644893.1644900

3. Brandtzæg, P.B.: Towards a unified Media-User typology (MUT): a meta-analysis and review of the research literature on media-user typologies. Comput. Hum. Behav. 26(5), 940–956 (2010). doi:10.1016/j.chb.2010.02.008. http://www.sciencedirect.com/science/article/B6VDC-4YJSW8D-1/2/011453cc70c0a6bdc29d3fde3b8a9304

4. Creswell, J.: Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research. Pearson Education, Upper Saddle River, NJ (2008)

5. Danielson, C., McGreal, T.L.: Teacher Evaluation To Enhance Professional Practice. Association for Supervision and Curriculum Development, Alexandria, VA, USA (2000)

6. Davis, F.D.: Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quart. 13(3), 319–340 (1989)

7. Deffuant, G., Huet, S., Amblard, F.: An individual-based model of innovation diffusion mixing social value and individual benefit. Am. J. Sociol. 110(4), 1041–1069 (2005)

8. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodological) 39(1), 1–38 (1977)

9. Dominguez, A.K., Yacef, K., Curran, J.R.: Data mining for generating hints in a python tutor. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 91–100 (2010)

10. Dzurec, L., Abraham, I.: The nature of inquiry: linking quantitative and qualitative research. Adv. Nurs. Sci. 16, 73–79 (1993)

11. Eynon, R., Malmberg, L.: A typology of young people’s internet use: implications for education. Comput. Educ. 56(3), 585–595 (2011). doi:10.1016/j.compedu.2010.09.020. http://www.sciencedirect.com/science/article/B6VCJ-517J24P-1/2/5c5d02fb284c227d3a9ef1cc4f3e1bf6

12. Fuller, F.F.: Concerns of teachers: a developmental conceptualization. Am. Educ. Res. J. 6(2), 207–226 (1969)

13. Hall, G.E.: The concerns-based approach to facilitating change. Educ. Horizons 57(4), 202–208 (1979)

14. Hanson, K., Carlson, B.: Effective Access: Teachers’ Use of Digital Resources in STEM Teaching. Gender, Diversity, and Technology Institute, Education Development Center, Inc., Newton (2005)

15. Hew, K.F., Hara, N.: Empirical study of motivators and barriers of teacher online knowledge sharing. Educ. Technol. Res. Dev. 55(6), 573–595 (2007)

16. Horrigan, J.: A typology of information and communication technology users. Research report, Pew Internet & American Life Project (2007)

17. Kelly, M.G., McAnear, A.: National Educational Technology Standards for Teachers: Preparing Teachers to Use Technology. International Society for Technology in Education (ISTE), Eugene (2002)

18. Lage, K., Maness, J., Losoff, B.: Receptivity to library involvement in scientific data curation: a case study at the University of Colorado Boulder. Portal Libr. Acad. 11(4), 915–937 (2011)

19. Leech, N., Onwuegbuzie, A.: A typology of mixed methods research designs. Qual. Quant. 43(2), 265–275 (2009)

20. Maness, J., Miaskiewicz, T., Sumner, T.: Using personas to understand the needs and goals of institutional repository users. D-Lib Mag. 14(9/10), 1082–9873 (2008)

21. Maull, K., Saldivar, M., Sumner, T.: Understanding digital library adoption: a use diffusion approach. In: Proceeding of the 11th Annual International ACM/IEEE Joint Conference on Digital libraries, ACM, pp. 259–268 (2011)

22. Maull, K.E., Saldivar, M.G., Sumner, T.: Online curriculum planning behavior of teachers. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 121–130 (2010)

23. Miaskiewicz, T., Sumner, T., Kozar, K.: A latent semantic analysis methodology for the identification and creation of personas. In: Proceeding of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, ACM, pp. 1501–1510 (2008)

24. Moore, G.A.: Crossing the Chasm: Marketing and Selling Technology Products to Mainstream Customers. HarperCollins, New York (2006)

25. Pennington, M.C.: Cycles of innovation in the adoption of information technology: a view for language teaching. Comput. Assist. Lang. Learn. 17(1), 7–33 (2004)

26. Ram, S., Jung, H.: The conceptualization and measurement of product usage. J. Acad. Mark. Sci. 18(1), 67–76 (1990). doi:10.1007/BF02729763. http://www.springerlink.com/content/9kjl7574145320mv

27. Rogers, E.M.: Diffusion of Innovations, 5th edn. The Free Press, New York (2003)

28. Saldivar, M.: Teacher integration of digital resources into instructional practice. CCS Report No. 4, Digital Learning Sciences, Boulder (2011)

29. Saldivar, M.G.: Teacher adoption of a Web-based instructional planning system. Doctoral dissertation, University of Colorado, Boulder, CO (2012)

30. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948)

31. Shih, C., Venkatesh, A.: Beyond adoption: development and application of a use-diffusion model. J. Mark. 68(1), 59–72 (2004). http://www.jstor.org/stable/30161975

32. Smerdon, B.: Teachers’ Tools for the 21st Century: A Report on Teachers’ Use of Technology. U.S. Dept. of Education, Office of Educational Research and Improvement, Washington, DC (2000)

33. Straub, E.T.: Understanding technology adoption: theory and future directions for informal learning. Rev. Educ. Res. 79(2), 625–649 (2009)

34. Sumner, T., Team, C.: Customizing science instruction with educational digital libraries. In: Proceedings of the 10th Annual Joint Conference on Digital libraries, ACM, JCDL ’10, New York, NY, USA, pp. 353–356 (2010). doi:10.1145/1816123.1816178

35. Turner, M., Kitchenham, B., Brereton, P., Charters, S., Budgen, D.: Does the technology acceptance model predict actual use? A systematic literature review. Inform. Software Technol. 52(5), 463–479 (2010)

36. Venkatraman, M.P.: The impact of innovativeness and innovation type on adoption. J. Retail. 67(1), 51–67 (1991)

37. Weatherley, J.: A web service framework for embedding discovery services in distributed library interfaces. In: Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital libraries (JCDL ’05), ACM, New York, NY, USA, pp. 42–43 (2005). doi:10.1145/1065385.1065394

38. Wilson, B., Wood, J.A.: Teacher evaluation: a national dilemma. J. Person. Eval. Educ. 10(1), 75–82 (1996)

39. Xu, B., Recker, M., Hsi, S.: Data deluge: opportunities for research in educational digital libraries. In: Cassie M. Edwards (ed.) Internet Issues: Blogging, the Digital Divide and Digital Libraries. Nova Science Pub. Inc., New York (2010)

123

62 K E Maull et al

while Step 2 produced a meaningful typology based on clus-ters that emerged from refined application-dependent featuredata The computational generation of clusters that corre-spond with user behaviors identified by qualitative analysisof interview and survey data suggests that it is possible tobuild an understanding of user behaviors and motivations byquantitative examination of their behaviors within a systemor application This is only a first step however at bridg-ing the gap toward understanding the complex interplay ofobservable behaviors that occur within an application andbehaviors that occur outside of that system

The CCS’s main purpose was to encourage teachers to differentiate instruction by incorporating high-quality digital resources into their teaching practices. The system was thus designed to produce a specific outcome: teacher differentiation of instruction. Our field trial data validated the in-system behaviors characterized, for example, by the Interactive Resource Specialist and Community Seeker Specialist clusters. However, the field trial data also show that teachers with similar in-system behaviors may have very different teaching practices in the classroom, and it is equally possible that teachers with similar classroom practices exhibit different in-system behaviors. Ideally, then, the most complete understanding of a system’s impact on users would come from a method that takes into account both in-system and out-of-system behaviors, motivations, and characteristics.

As mentioned in Sect. 2.2, existing literature on digital library use offers some guidance. Researchers in the field of human–computer interaction have long used the concept of personas to guide the development of information systems. Personas are constructs devised by system developers to describe different types of system users; they most often take the form of fictional biographies that include demographic characteristics and descriptions of how and why users interact with the system. With personas in hand, developers can model and build systems that more specifically address the needs of actual users, which increases the chances that the system will serve its end users better and, in turn, enhances adoption of the system. Our method describes in-system behaviors, but by extending our approach to incorporate a full set of out-of-system behaviors (including, for example, demographic information, teaching behaviors, and student outcomes), that is, by developing personas of CCS users, we can not only develop a richer understanding of the relationship between system use and users’ practices “in the wild” but also extend the notion of personas beyond system design to actual system adoption and use over time.

While we have obtained promising results from this study, there are several significant limitations. First, our method has only been applied to a single population, so it is plausible that the results we obtained are specific to the population under examination. A different typology may emerge from larger data populations; for example, new specialized user types might appear. Second, to validate that our results are robust and generalizable, it will be important to study the method with different applications and user populations. Third, while the computational method seems to support the analyses of qualitative data, there are opportunities to extend our understanding of system behavior and user characteristics to their impact on other meaningful behaviors outside the system.

As the case study shows, both steps of the method yielded interesting results. It appears, however, that the entropy modeling of the first step requires more discriminatory power when describing specialized users. While the new calculations do a better job of penalizing usage, further modifications may yet be developed to separate the specialized group(s) more cleanly. Furthermore, work remains to develop a precise coarse-grained model that better separates every user group. The challenges revealed by the first step may also indicate that specialization does not always imply low variety; this becomes clear when the details of system usage are considered and variety is expanded to include more variables, as in the extended entropy calculations and the second step of the method.
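The observation that specialization need not imply low variety can be illustrated with a small sketch. It assumes that clickstream entropy is the Shannon entropy [30] of the distribution of a user’s events across categories; the event tuples below are hypothetical, and this is not the paper’s exact (extended) entropy formulation. A teacher who uses a single action type scores zero entropy on action type alone, yet shows clear variety once resource topic is added as a second variable.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def clickstream_entropy(events: Iterable[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of event categories."""
    counts = Counter(events)
    total = sum(counts.values())
    # p * log2(1/p) for each category; an empty clickstream yields 0.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical clickstream for one teacher: (action type, resource topic) pairs.
# The teacher only ever uses one action type, spread evenly across four topics.
clicks = [
    ("view_interactive", "plate_tectonics"),
    ("view_interactive", "weather"),
    ("view_interactive", "rocks"),
    ("view_interactive", "water_cycle"),
] * 5

# Coarse view: action type only, so the teacher looks fully specialized.
coarse = clickstream_entropy(action for action, _ in clicks)
# Expanded view: action and topic together reveal variety within the specialty.
expanded = clickstream_entropy(clicks)
print(coarse, expanded)  # → 0.0 2.0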

Applying automated typology discovery to inform, guide, and develop system user characteristics is one aspect of understanding within-system user behavior that may be extended, through personas, to include broader behaviors and user characteristics. Such an extension may prove to be a useful tool for understanding behavioral characteristics outside the system, so that an extremely rich description of system users emerges that includes both in-system and out-of-system behaviors. This would in turn allow more data to be used in making predictive and prescriptive tool (in-system), policy (out-of-system), and training (in-system and out-of-system) decisions.

6 Broader impacts

While this research has focused on demonstrating the value of our method for the study of digital library adoption, we believe it can help address other educational challenges, chief among them teacher professional development, the correlation of teacher practice with student learning, and the evaluation of teaching practices.

Extant research indicates that one of the major barriers to effective integration of technology into teaching practices is the dearth of quality professional development and training vis-à-vis technology [14]. Teachers often complain that even when technology-related training is made available, it does not meet their needs as practicing educators because it assumes that they do, or do not, already possess a given set of skills. By modeling


Linking computational methods with qualitative analyses 63

teachers’ system use behavior, one will be able to understand inter-user differences and target system training and professional development to users’ true needs. For example, CCS users in the Community Seeker Specialist cluster might be offered training that helps them integrate more interactive resources into their teaching, because such users are known to spend little of their CCS usage time exploring interactive resources. For teachers in the Interactive Resource Specialist cluster, such training would be redundant because they already make extensive use of the interactive resources available via the CCS.
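Targeting training from usage data could be operationalized in many ways. The sketch below is one hypothetical rule, assuming each teacher’s clickstream has been reduced to a list of resource-type labels: teachers whose share of interactive-resource events falls below a threshold are flagged for interactive-resource training. The labels, teacher IDs, and the 0.2 threshold are invented for illustration; in practice, the Step 2 cluster assignment itself could drive the same decision.

```python
from typing import Dict, List

# Hypothetical cut-off: below this share of interactive-resource events,
# a teacher is flagged for interactive-resource training.
INTERACTIVE_SHARE_THRESHOLD = 0.2


def interactive_share(resource_types: List[str]) -> float:
    """Fraction of a teacher's resource events that involve interactive resources."""
    if not resource_types:
        return 0.0
    return sum(t == "interactive" for t in resource_types) / len(resource_types)


def recommend_training(usage: Dict[str, List[str]]) -> Dict[str, bool]:
    """Map each teacher ID to True if interactive-resource training looks useful."""
    return {
        teacher: interactive_share(types) < INTERACTIVE_SHARE_THRESHOLD
        for teacher, types in usage.items()
    }


# Hypothetical usage: a community-heavy teacher and an interactive-heavy teacher.
usage = {
    "community_seeker": ["community"] * 8 + ["interactive", "document"],
    "interactive_specialist": ["interactive"] * 3 + ["document"],
}
print(recommend_training(usage))
# → {'community_seeker': True, 'interactive_specialist': False}
```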

While a consensus has emerged among policymakers and education researchers that effective adoption of technology can indeed have a positive impact on teacher practice [17, 32], the relationship between teachers’ adoption of technological systems and student achievement is still not well understood. As part of our larger study of the CCS, we are analyzing the standardized test scores of students taught by teachers who used the CCS to determine what impact the teachers’ use of the CCS had on student outcomes. Our analysis of student test score data for both the field trial year and the prior school year [28] shows increased overall achievement among Earth science students after the introduction of the CCS; due to the limitations of the data available to us, however, we cannot make strong causal claims. We continue our efforts to understand what link, if any, exists between system use and student outcomes. In the long term, the ability to correlate system use behaviors with students’ academic achievement will make it possible to focus on the teacher use behaviors that most benefit students; in turn, these behaviors could be taught to other teachers.
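Correlating system use with achievement could start from something as simple as a Pearson correlation between a per-teacher usage measure and a per-teacher change in student scores. The sketch below assumes exactly that; the session counts and score changes are hypothetical, and, in line with the caution above, a correlation coefficient on its own supports no causal claim.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two paired samples with at least two points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


# Hypothetical per-teacher data: CCS sessions during the year, and the change in
# their students' mean standardized score relative to the prior year.
sessions = [5, 12, 20, 33, 41]
score_change = [-1.0, 0.5, 1.0, 2.5, 2.0]
r = pearson_r(sessions, score_change)
print(f"r = {r:.2f}")  # descriptive association only; not evidence of causation
```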

Evaluating teachers’ instructional practices is a very difficult task that requires a massive commitment of human and financial resources [5]. For example, most evaluation systems rely on administrators to observe teachers in the classroom, a very labor-intensive approach that can result in teachers being awarded or denied tenure or pay raises based on the short period of time during which they were observed. Further, variance in evaluators’ adherence to evaluation rubrics and bias in the evaluation instruments themselves can make it difficult to assess the validity of the evaluation process [38]. Alternatively, asking teachers to self-report their teaching practices might provide evaluators with additional information, but such data would be subject to all the limitations that come with asking individuals to report on their own behaviors. Neutral “third party” data, such as the clickstream data to which we applied our computational method, can help bridge the gap between what evaluators observe during the relatively brief periods they visit a teacher’s classroom and the teacher’s activities when he or she is not being formally evaluated. A teacher evaluation process that combines traditional observational and self-reported data with usage data produced by systems like the CCS would give administrators, policymakers, and teachers themselves deeper insights into instructional practices; further, such a hybrid system would produce a rich new understanding of teaching-related best practices that could then be shared with other educators.

7 Conclusion

By applying models of adoption and use diffusion alongside data mining techniques, we have developed a two-step method for discovering patterns within clickstream data that reveal both general and specific typologies of digital library user behavior. Applying this method to data from an academic year-long field trial in a large urban school district provided a rich opportunity to study the adoption and usage behavior of embedded digital library resources by middle and high school science teachers. The method shows considerable promise for extracting useful behavioral patterns “in the wild”: the resulting fine-grained user typology maps well onto results emerging from traditional educational field research methods. Although it requires more data for further validation, the proposed method is a valuable contribution toward our effort to develop automatic methods for studying digital library adoption and use. When fully realized, this method also has the potential to be extended to other applications and areas, such as informing teacher professional development, understanding the impact of digital library applications on student learning, and developing new approaches to the evaluation of teacher practice.
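As a rough illustration of the coarse-grained first step, the sketch below assumes each teacher is summarized by a usage frequency and a clickstream entropy (variety) score, and median-splits both into the four use-diffusion categories described by Shih and Venkatesh [31]: intense, specialized, nonspecialized, and limited users. The median cut points and the user values are assumptions for illustration; the paper’s actual thresholds and refined entropy calculations are not restated in this section.

```python
import statistics
from typing import Dict, Tuple


def use_diffusion_quadrant(freq: float, variety: float,
                           freq_cut: float, variety_cut: float) -> str:
    """Assign a coarse use-diffusion type from rate of use and variety of use."""
    high_rate = freq >= freq_cut
    high_variety = variety >= variety_cut
    if high_rate and high_variety:
        return "intense"
    if high_rate:
        return "specialized"
    if high_variety:
        return "nonspecialized"
    return "limited"


def coarse_typology(users: Dict[str, Tuple[float, float]]) -> Dict[str, str]:
    """Median-split every user on (frequency, entropy) into four quadrants."""
    freq_cut = statistics.median([f for f, _ in users.values()])
    variety_cut = statistics.median([v for _, v in users.values()])
    return {
        user: use_diffusion_quadrant(f, v, freq_cut, variety_cut)
        for user, (f, v) in users.items()
    }


# Hypothetical (sessions per year, clickstream entropy in bits) per teacher.
users = {"a": (40, 2.1), "b": (35, 0.4), "c": (5, 1.9), "d": (3, 0.2)}
print(coarse_typology(users))
# → {'a': 'intense', 'b': 'specialized', 'c': 'nonspecialized', 'd': 'limited'}
```

A median split is only one choice of cut point; as the discussion notes, specialized users may need more discriminating variety measures than a single entropy score provides.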

Acknowledgments This material is based upon work supported in part by the NSDL program in the National Science Foundation under Awards 0734875 and 0840744 and by the University of Colorado at Boulder.

References

1. Ayers, E., Nugent, R., Dean, N.: Skill Set Profile Clustering Based on Student Capability Vectors Computed From Online Tutoring Data. In: Proceedings of the 1st International Conference on Educational Data Mining, Montreal, Canada, pp. 210–217 (2008)

2. Benevenuto, F., Rodrigues, T., Cha, M., Almeida, V.: Characterizing user behavior in online social networks. In: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement Conference, ACM, Chicago, Illinois, USA, pp. 49–62 (2009). doi:10.1145/1644893.1644900

3. Brandtzæg, P.B.: Towards a unified Media-User typology (MUT): a meta-analysis and review of the research literature on media-user typologies. Comput. Hum. Behav. 26(5), 940–956 (2010). doi:10.1016/j.chb.2010.02.008. http://www.sciencedirect.com/science/article/B6VDC-4YJSW8D-1/2/011453cc70c0a6bdc29d3fde3b8a9304

4. Creswell, J.: Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research. Pearson Education, Upper Saddle River, NJ (2008)

5. Danielson, C., McGreal, T.L.: Teacher Evaluation To Enhance Professional Practice. Association for Supervision and Curriculum Development, Alexandria, VA, USA (2000)


6. Davis, F.D.: Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quart. 13(3), 319–340 (1989)

7. Deffuant, G., Huet, S., Amblard, F.: An individual-based model of innovation diffusion mixing social value and individual benefit. Am. J. Sociol. 110(4), 1041–1069 (2005)

8. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodological) 39(1), 1–38 (1977)

9. Dominguez, A.K., Yacef, K., Curran, J.R.: Data mining for generating hints in a python tutor. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 91–100 (2010)

10. Dzurec, L., Abraham, I.: The nature of inquiry: linking quantitative and qualitative research. Adv. Nurs. Sci. 16, 73–79 (1993)

11. Eynon, R., Malmberg, L.: A typology of young people’s internet use: implications for education. Comput. Educ. 56(3), 585–595 (2011). doi:10.1016/j.compedu.2010.09.020. http://www.sciencedirect.com/science/article/B6VCJ-517J24P-1/2/5c5d02fb284c227d3a9ef1cc4f3e1bf6

12. Fuller, F.F.: Concerns of teachers: a developmental conceptualization. Am. Educ. Res. J. 6(2), 207–226 (1969)

13. Hall, G.E.: The concerns-based approach to facilitating change. Educ. Horizons 57(4), 202–208 (1979)

14. Hanson, K., Carlson, B.: Effective Access: Teachers’ Use of Digital Resources in STEM Teaching. Gender, Diversity, and Technology Institute, Education Development Center, Inc., Newton (2005)

15. Hew, K.F., Hara, N.: Empirical study of motivators and barriers of teacher online knowledge sharing. Educ. Technol. Res. Dev. 55(6), 573–595 (2007)

16. Horrigan, J.: A typology of information and communication technology users. Research report, Pew Internet & American Life Project (2007)

17. Kelly, M.G., McAnear, A.: National Educational Technology Standards for Teachers: Preparing Teachers to Use Technology. International Society for Technology in Education (ISTE), Eugene (2002)

18. Lage, K., Maness, J., Losoff, B.: Receptivity to library involvement in scientific data curation: a case study at the University of Colorado Boulder. Portal Libr. Acad. 11(4), 915–937 (2011)

19. Leech, N., Onwuegbuzie, A.: A typology of mixed methods research designs. Qual. Quant. 43(2), 265–275 (2009)

20. Maness, J., Miaskiewicz, T., Sumner, T.: Using personas to understand the needs and goals of institutional repository users. D-Lib Mag. 14(9/10), 1082–9873 (2008)

21. Maull, K., Saldivar, M., Sumner, T.: Understanding digital library adoption: a use diffusion approach. In: Proceeding of the 11th Annual International ACM/IEEE Joint Conference on Digital libraries, ACM, pp. 259–268 (2011)

22. Maull, K.E., Saldivar, M.G., Sumner, T.: Online curriculum planning behavior of teachers. In: Proceedings of the 3rd International Conference on Educational Data Mining, Pittsburgh, PA, pp. 121–130 (2010)

23. Miaskiewicz, T., Sumner, T., Kozar, K.: A latent semantic analysis methodology for the identification and creation of personas. In: Proceeding of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, ACM, pp. 1501–1510 (2008)

24. Moore, G.A.: Crossing the Chasm: Marketing and Selling Technology Products to Mainstream Customers. HarperCollins, New York (2006)

25. Pennington, M.C.: Cycles of innovation in the adoption of information technology: a view for language teaching. Comput. Assist. Lang. Learn. 17(1), 7–33 (2004)

26. Ram, S., Jung, H.: The conceptualization and measurement of product usage. J. Acad. Mark. Sci. 18(1), 67–76 (1990). doi:10.1007/BF02729763. http://www.springerlink.com/content/9kjl7574145320mv

27. Rogers, E.M.: Diffusion of Innovations, 5th edn. The Free Press, New York (2003)

28. Saldivar, M.: Teacher integration of digital resources into instructional practice. CCS Report No. 4, Digital Learning Sciences, Boulder (2011)

29. Saldivar, M.G.: Teacher adoption of a Web-based instructional planning system. Doctoral dissertation, University of Colorado, Boulder, CO (2012)

30. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948)

31. Shih, C., Venkatesh, A.: Beyond adoption: development and application of a use-diffusion model. J. Mark. 68(1), 59–72 (2004). http://www.jstor.org/stable/30161975

32. Smerdon, B.: Teachers’ Tools for the 21st Century: A Report on Teachers’ Use of Technology. US Dept. of Education, Office of Educational Research and Improvement, Washington, DC (2000)

33. Straub, E.T.: Understanding technology adoption: theory and future directions for informal learning. Rev. Educ. Res. 79(2), 625–649 (2009)

34. Sumner, T., Team, C.: Customizing science instruction with educational digital libraries. In: Proceedings of the 10th Annual Joint Conference on Digital libraries, ACM JCDL ’10, New York, NY, USA, pp. 353–356 (2010). doi:10.1145/1816123.1816178

35. Turner, M., Kitchenham, B., Brereton, P., Charters, S., Budgen, D.: Does the technology acceptance model predict actual use? A systematic literature review. Inform. Software Technol. 52(5), 463–479 (2010)

36. Venkatraman, M.P.: The impact of innovativeness and innovation type on adoption. J. Retail. 67(1), 51–67 (1991)

37. Weatherley, J.: A web service framework for embedding discovery services in distributed library interfaces. In: Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital libraries (JCDL ’05), ACM, New York, NY, USA, pp. 42–43 (2005). doi:10.1145/1065385.1065394

38. Wilson, B., Wood, J.A.: Teacher evaluation: a national dilemma. J. Person. Eval. Educ. 10(1), 75–82 (1996)

39. Xu, B., Recker, M., Hsi, S.: Data deluge: opportunities for research in educational digital libraries. In: Edwards, C.M. (ed.) Internet Issues: Blogging, the Digital Divide and Digital Libraries. Nova Science Pub Inc., New York (2010)
