Assess Queries for Interactive Analysis of Data Cubes
-
Upload
khangminh22 -
Category
Documents
-
view
2 -
download
0
Transcript of Assess Queries for Interactive Analysis of Data Cubes
AssessQueries for Interactive Analysis of Data CubesMatteo Francia
DISI - University of BolognaBologna, Italy
Matteo GolfarelliDISI - University of Bologna
Bologna, [email protected]
Patrick MarcelUniversity of Tours
Blois, [email protected]
Stefano RizziDISI - University of Bologna
Bologna, [email protected]
Panos VassiliadisUniversity of Ioannina
Ioannina, [email protected]
ABSTRACTAssessment is the process of comparing the actual to the expectedbehavior of a business phenomenon and judging the outcome ofthe comparison. In this paper we propose assess, a novel query-ing operator that supports assessment based on the results of aquery on a data cube. This operator requires (1) the specificationof an OLAP query over a measure of a data cube, to define thetarget cube to be assessed; (2) the specification of a reference cubeof comparison (benchmark), which represents the expected per-formance of the measure; (3) the specification of how to performthe comparison between the target cube and the benchmark, and(4) a labeling function that classifies the result of this comparisonusing a set of labels. After introducing an SQL-like syntax forour operator, we formally define its semantics in terms of a setof logical operators. To support the computation of assess wepropose a basic plan as well as some optimization strategies, thenwe experimentally evaluate their performance using a prototype.
1 INTRODUCTIONAssume an analyst wants to assess the state of milk sales in Francefor 2019. She will have to issue a query against an OLAP server toobtain a cube, and then ask: โhow good, normal, surprising, etc.is the situation I observe for this particular cube as compared tosome reference data?โ. Assessment, as a process, is about compar-ing the actual to the expected behavior and judging, for instancethrough a labeling, the outcome of the comparison. Examples ofhow to assess the status of a cube (or of each single cell of a cube)include its comparison to:
(1) . . . a predefined target goal for the sales, e.g., because of theexistence of a predefined KPI (Key Performance Indicator);
(2) . . . a predefined golden standard, acting as a referencebenchmark (e.g., comparing French milk sales against theEU average) or, as an example in another domain, compar-ing a stock value to the S&P 500 index);
(3) . . . sibling cells, i.e., cells describing a similar context andsharing some dimension values (i.e., compare sales foryogurt and ice-cream in Greece in 2019, or milk sales inSpain and Italy for 2019);
(4) . . . the expected status of the cube as can be predicted fromthe past (e.g., compare actual milk sales in December 2018with those that can be predicted from the sales of theprevious six months).
ยฉ 2021 Copyright held by the owner/author(s). Published in Proceedings of the24th International Conference on Extending Database Technology (EDBT), March23-26, 2021, ISBN 978-3-89318-084-4 on OpenProceedings.org.Distribution of this paper is permitted under the terms of the Creative Commonslicense CC-by-nc-nd 4.0.
(5) . . . a new, derived measure produced via a function whoseformula involves other measures (e.g., profit=storeSales-storeCost).
This kind of tabular data assessment is consistently reportedas a frequent activity of data explorers [3, 12, 23] who often useSQL in combination with languages like Python and R. Notice-ably, assessment is one of the userโs intentions considered in theIntentional Analytics Model (IAM), which has been envisionedas a way to tightly couple OLAP and analytics [4, 21]. The IAMapproach relies on two major cornerstones: (i) the user exploresthe data space by expressing her analysis intentions rather thanby explicitly stating what data she needs, and (ii) in return shereceives both multidimensional data and knowledge insights inthe form of annotations of interesting subsets of data. Amongthe five intention operators proposed, assess is meant to judge acube measure with reference to some baseline.
In this paper we adopt the OLAP-centered nature of the IAMand operate in the context of a traditional OLAP environmentwith cubes, dimensions, hierarchies, and measures. This allowsus to take advantage of the neat logical-level schema structure ofOLAP and focus on the essence of the paper, which is proposingan assess operator to complement the traditional OLAP roll-upโsand drill-downโs. The idea of how to perform an assessment forthe measure values of a cube encompasses (a) the specificationof another cube, called benchmark, that represents the expectedor desirable performance of the measure; (b) the comparisonof the measure under investigation to the benchmark measure(for instance via a simple mathematical difference); and (c) thecharacterization, or labeling, of the status of the original cellbased on the result of the comparison.
Example 1.1. Given a SALES cube, the userโs intention de-scribed above can be expressed with this statement:
with SALES
for year = โ2019โ, product = โmilkโby year, product
assess quantity against 1000
using ratio(quantity, 1000)
labels {[0, 0.9): bad, [0.9, 1.1]: acceptable, (1.1,inf): good}
Intuitively, the total quantity of milk sold in France in 2019 islabeled as bad/acceptable/good depending on the ratio with thetarget value 1000. โก
Summary of contributions. Our contributions can be listedas follows:
โข We introduce a novel operator, assess, that allows to au-tomatically evaluate and characterize the result of a cubequery.
Series ISSN: 2367-2005 121 10.5441/002/edbt.2021.12
โข We introduce alternatives for specifying benchmarks, com-parison, and labeling schemes against which the resultsof a cube query can be compared and evaluated. For eachalternative we provide rigorous definitions and semantics,based on a set of logical operators, as well an SQL-likesyntax for the specification of an assess statement.
โข We discuss alternative plans for the execution of assessstatements and experimentally evaluate them for theirefficiency and scalability.
Roadmap. In Section 2 we formalize the involved conceptsand give definitions. In Section 3 we explain how assessmentsare computed and introduce the alternatives of the assess opera-tor, while in Section 4 we provide its syntax and semantics. InSection 5 we present alternative strategies for query executionand in Section 6 we experimentally evaluate them. In Section 7we discuss the related work. Finally, in Section 8 we summarizeour findings and discuss our future work.
2 FORMALITIESTo simplify the formalization, we will restrict to consider linearhierarchies.
Definition 2.1 (Hierarchy and Cube Schema). A hierarchy is atriple โ = (๐ฟ, โชฐ, โฅ) where:(i) ๐ฟ is a set of categorical levels, each coupled with a domain
of values (a.k.a. as members), ๐ท๐๐(๐) ;(ii) โชฐ is a roll-up total order of ๐ฟ; and(iii) โฅ is a part-of partial order of
โ๐ โ๐ฟ ๐ท๐๐(๐).
The part-of partial order is such that, for each couple of levels๐ and ๐ โฒ such that ๐ โชฐ ๐ โฒ, for each member ๐ข โ ๐ท๐๐(๐) there isexactly one member ๐ข โฒ โ ๐ท๐๐(๐ โฒ) such that ๐ข โฅ ๐ข โฒ.
A cube schema is a couple C = (๐ป,๐) where:(i) ๐ป is a set of hierarchies;(ii) ๐ is a tuple1 of numerical measures, each coupled with one
aggregation operator ๐๐ (๐) โ {sum, avg, . . .}.
Example 2.2. As a working example we will use cube schemaSALES = (๐ป,๐), where
๐ป = {โDate, โCustomer, โProduct, โStore},๐ = โจquantity, storeSales, storeCostโฉ,date โชฐ month โชฐ year,
customer โชฐ gender,
product โชฐ type โชฐ category,
store โชฐ city โชฐ country
and ๐๐ (quantity) = ๐๐ (storeSales) = ๐๐ (storeCost) = sum. Asto the part-of partial order we have, for instance, Fresh Fruit โฅFruit and 1997-04-15 โฅ 1997. โก
Aggregation is the basic mechanism to query cubes, and itis captured by the following definition of group-by set. As nor-mally done when working with the multidimensional model, ifa hierarchy โ does not appear in a group-by set it is implicitlyassumed that a complete aggregation is done along โ.
Definition 2.3 (Group-by Set and Coordinate). Given cube schemaC = (๐ป,๐), a group-by set of C is a tuple of levels, at most onefrom each hierarchy of ๐ป . The partial order induced on the setof all group-by sets of C by the roll-up orders of the hierarchies1When dealing with tuples we will write ๐ก1 = ๐ก2 |๐ ๐๐๐ก (๐ก1 )
to denote that tuple ๐ก1 iscontained in tuple ๐ก2; (๐ก1, ๐ก2) to denote the tuple that concatenates ๐ก1 and ๐ก2; ๐ก |๐to denote the projection of tuple ๐ก on its component(s) ๐ [1].
in ๐ป , is denoted with โชฐ๐ป . A coordinate of group-by set ๐บ is atuple of members, one for each level of ๐บ . Given coordinate ๐พ ofgroup-by set ๐บ and another group-by set ๐บ โฒ such that ๐บ โชฐ๐ป ๐บ โฒ,we will denote with ๐๐ข๐๐บโฒ (๐พ) the coordinate of ๐บ โฒ whose mem-bers are related to the corresponding members of ๐พ in the part-oforders, and we will say that ๐พ roll-ups to ๐๐ข๐๐บโฒ (๐พ). By definition,๐๐ข๐๐บ (๐พ) = ๐พ .
Definition 2.4 (Detailed Cube). Let๐บ0 be the top group-by setin the โชฐ๐ป partial order (i.e., the finest one). A detailed cube overC is a partial function ๐ถ0 that maps the coordinates of ๐บ0 to anumerical value for each measure๐ in๐ .
The function is partial since cubes are normally sparse: notall possible business events actually occur, and a coordinate par-ticipates in the function only if the event it describes took place.Each coordinate๐พ that participates in๐ถ0, with its associated tuple๐ก of measure values, is called a cell of ๐ถ0 and denoted ๐ = โจ๐พ, ๐กโฉ.With a slight abuse of notation, we will also consider a cube asthe set of the coordinates corresponding to its cells, so we willwrite ๐พ โ ๐ถ0 to state that โจ๐พ, ๐กโฉ is a cell of ๐ถ0.
Example 2.5. Three group-by sets of SALES are
๐บ0 = โจdate, customer, product, store}โฉ๐บ1 = โจdate, type, countryโฉ๐บ2 = โจmonth, categoryโฉ
where ๐บ0 โชฐ๐ป ๐บ1 โชฐ๐ป ๐บ2. ๐บ0 is the top group-by set. ๐บ1 ag-gregates sales by date, product type, and store country (for allcustomers), ๐บ2 by month and category (for all customers andstores). Examples of coordinates of the three group-by sets are,respectively,
๐พ0 = โจ1997-04-15, Eric Long, Lemon, SmartMartโฉ๐พ1 = โจ1997-04-15, Fresh Fruit, Italyโฉ๐พ2 = โจ1997-04, Fruitโฉ
where ๐๐ข๐๐บ1 (๐พ0) = ๐พ1 and ๐๐ข๐๐บ2 (๐พ1) = ๐พ2. An example of cellof a detailed cube over SALES is โจ๐พ0, โจquantity = 5, storeSales =20, storeCost = 12โฉโฉ. โก
Definition 2.6 (Cube Query and Derived Cube). Given a detailedcube ๐ถ0 over schema C, a query over ๐ถ0 is a quadruple ๐ =
(๐ถ0,๐บ๐, ๐๐, ๐๐) where:(i) ๐บ๐ is a group-by set of C;(ii) ๐๐ is a (possibly empty) set of selection predicates each
expressed over one level of ๐ป ;(iii) ๐๐ โ ๐ .The result of ๐ is called a derived cube, i.e., a partial function thatassigns to each coordinate ๐พ of ๐บ๐ satisfying the conjunctionof the predicates in ๐๐ and to each measure๐ in ๐๐ the valuecomputed by applying ๐๐ (๐) to the values of๐ for all the coor-dinates of ๐ถ0 that roll-up to ๐พ , provided that such coordinates of๐ถ0 exist.
Like detailed cubes, even derived cubes can be sparse; a co-ordinate ๐พ does not participate in the function if there is nocoordinate in ๐ถ0 that rolls-up to ๐พ . Like for detailed cubes, wewill write ๐พ โ ๐ถ to state that ๐พ is a coordinate of the derived cube๐ถ . Consistently with this, we will denote with |๐ถ | the number ofcoordinates in ๐ถ .
Example 2.7. A cube query over SALES is ๐ = (๐ถ0,๐บ๐, ๐๐, ๐๐)where ๐บ๐ = โจproduct, countryโฉ, ๐๐ = {type = โFresh Fruitโ,country = โItalyโ}, and ๐๐ = โจquantityโฉ. A cell of the resulting
122
s e l e c t country , product , sum ( qu an t i t y ) as q u an t i t y2 from s a l e s s
j o i n cus tomer c on c . ckey = s . ckey4 j o i n p roduc t p on p . pkey = s . pkey
where type = ' Fre sh F r u i t ' and count ry = ' I t a l y '6 group by country , p roduc t
Listing 1: Getting the sales of fresh fruit products in Italy(Example 2.7)
cube is โจโจApple, Italyโฉ, โจquantity = 100โฉโฉ. The SQL formulationof ๐ on a star schema is given in Listing 1. โก
3 COMPUTING AN ASSESSMENTBasically, the assessment of the values of a measure๐ in a cube๐ถ (called target cube) is done in three steps:
(1) the specification of a benchmark, i.e., a cube ๐ต such that(i) its cells can be mapped one-to-one with the cells of ๐ถ ,and (ii) it has a measure๐โฒ representing the expected/ac-ceptable/normal performance of๐;
(2) the cell-wise comparison of๐ to๐โฒ, which can be donein a basic way (e.g., algebraic/absolute/normalized differ-ence, percentage) or using more elaborate schemes (e.g.,z-scoring), possibly after applying some transformationsto๐ and๐โฒ (e.g., to compute derived measures);
(3) the characterization, or labeling, of the status of each cellof ๐ถ based on the result of the comparison; in the sim-plest case, this is done using a set of rules that map theresult of the comparison to a set of predefined labels (e.g.,โinsufficientโ, โexcellentโ, etc.).
3.1 BenchmarksThe specification of the benchmark is given by the analyst at theposing of the query. Thus, the question is โtell me how we aredoing with respect to this benchmarkโ.
A thorough comparison of a target cube ๐ถ against a bench-mark ๐ต would require that the latter comes with the same levelmembers so that, for each cell of ๐ถ , we can map onto a cell of ๐ต.However, in practical cases, due to cube sparsity, there is no guar-antee that all cells can be mapped โespecially if the benchmarkis retrieved from the web or other external data sources. Thus,in the following we provide a broad definition of the conditionsunder which two cubes are joinable, i.e., one of them can be usedas a benchmark to assess the other; in this definition, we will justrequire that the two cubes have the same group-by set.
Definition 3.1 (Cube Joinability). Let a target cube๐ถ over cubeschema C and a benchmark ๐ต over B (where possibly, but notnecessarily, B = C) be given. Let ๐ = (๐ถ0,๐บ๐ถ , ๐๐ถ , ๐๐ถ ) and๐โฒ = (๐ต0,๐บ๐ต, ๐๐ต, ๐๐ต) be the queries that resulted in ๐ถ and ๐ต,respectively. We say that ๐ถ and ๐ต are joinable if
๐บ๐ถ = ๐บ๐ต
In OLAP terms, two cubes are joinable if a drill-across is possiblebetween ๐ถ and ๐ต.
Let C = (๐ป,๐) be the schema of the target cube ๐ถ , and ๐ถ0 bethe detailed cube from which ๐ถ is derived. There are four typesof benchmarks we consider in our approach:
โข Constant benchmarks. Here the user simply wants to assessthe cells of the target cube ๐ถ against some fixed value, astypically done with key performance indicators. In thiscase, the benchmark ๐ต has schema B = (๐ป, โจ๐๐๐๐๐ ๐ก โฉ);
its cells have exactly the same coordinates as ๐ถ , and allof them store a constant value in๐๐๐๐๐ ๐ก . The cell-to-cellmapping is trivially based on equality of coordinates.
โข External benchmarks. Here the userโs goal is to assess thetarget cube against the data stored in a cube with schemaB = (๐ป โฒ, ๐ โฒ). In principle, as long asB includes the group-by of the target cube (which ensures joinability), it is notnecessary to impose further constraints on B. However,for simplicity, in the following we will assume that theexternal benchmark has been reconciled with the targetcube so that ๐ป = ๐ป โฒ and that all necessary transcodingsto level members have been applied (see e.g. [10] for anapproach that can be pursued to this end). Thus, also inthis case, mapping is based on equality of coordinates.
โข Sibling benchmarks. The idea here is to compare the valuesof a measure in a slice on member ๐ข โ ๐ท๐๐(๐) with thevalues of the same measure in another slice of ๐ถ relatedto a sibling member ๐ข๐ ๐๐ โ ๐ท๐๐(๐) (e.g., assess the salesof fruit in Italy with reference to those in France). In thiscase, the benchmark has the same schema C of the targetcube. Both cubes have the same group-by set, but whilethe cells in ๐ถ are those obtained from ๐ถ0 using predicate๐ = ๐ข, those in ๐ต are obtained from ๐ถ0 using predicate๐ = ๐ข๐ ๐๐ . Then the cell-to-cell mapping is established byreplacing ๐ข with ๐ข๐ ๐๐ in each coordinate of ๐ถ .
โข Past benchmarks. In this case the user wants to assess thevalues taken by a measure๐ in some time slice with thevalues that can be predicted for๐ based on a number ofpast time slices. Like in the previous case, it is B = C. Thecells of ๐ต have exactly the same coordinates as ๐ถ , but the(actual) values of๐ are replaced with the predicted ones.
Example 3.2. Let๐ถ be the derived cube obtained by query ๐ inExample 2.7 (total quantity sold by product and country for freshfruit products and Italy). An example of (joinable) sibling bench-mark is ๐ต returned by ๐โฒ, being ๐โฒ obtained from ๐ by replacingItaly with France. ๐ต can be used to assess the sales of fresh fruitin Italy against those in France. The cell-to-cell mapping is estab-lished by replacing Italy with France; so, for instance, coordinateโจApple, Italyโฉ is mapped onto โจApple, Franceโฉ. โก
3.2 Comparison & TransformationThe essence of assessment is to contrast the actual performanceagainst its expected value. Thus, the goal of this step is to providethe means to express and perform the evaluation of how far apartthe query result and the benchmark are. We refer to this actionas comparison to express the idea that this is not necessarilya simple measure difference. Modeling-wise, we assume that alibrary of comparison functions, all with signature ๐ฟ : R ร RโR, is available to the users. Practically, a cell-wise comparisonbetween measures of the target and benchmark can be easilyimplemented via different functions obeying the above signature,the simplest choice being a difference (either algebraic, or absolute,or normalized, etc.). In our examples, we will use two libraryfunctions of our system, named difference and ratio.
One could possibly expect that, once the target cube and thebenchmark have been obtained, their comparison is immediatelyapplicable. Interestingly, this is not always the case, since thecomparison may require the computation of derived measures.For instance, with reference to the SALES cube, comparing theactual profit for some given sales requires to compute a derivedmeasure as profit = storeSalesโ storeCost. Clearly, this requires
123
de f d i f f e r e n c e ( a , b ) :2 r e t u r n a โ b
4 de f minmaxnorm ( a ) :minv = a . min ( )
6 maxv = a . max ( )r e t u r n ( a โ minv ) / ( maxv โ minv )
Listing 2: Implementation of the difference andminmaxnorm functions
that either of, or both, the target and the benchmark measurespass through a set of transformations to be actually compara-ble. The transformations that are applicable to target cubes andbenchmarks can be simple (like the above mentioned one, wherethe measures are computed via simple per-cell arithmetic opera-tions), or more complex ones (like ranking or z-scoring) whichrequire a holistic scan of the entire cube and cannot produce thenew value on a per-cell basis.
We forego the formalities of the computation of the derivedmeasures (to be discussed in Section 4.2) and simply mentionthat we assume a functional-style composition of the invocationof functions from our library of functions in a nestable way. Forexample, the min-max normalization of the difference betweenstoreSales and target value 1000 is computed as
minMaxNorm(difference(storeSales,1000))
Listing 2 shows the implementation of these two functions inPython using Pandas DataFrames.
3.3 LabelingThe goal of this step is to associate each cell of the target cubewith a label, taken from a predefined set, to express an evaluationof that cell with reference to the benchmark. Clearly, ordinallabeling will frequently be the case, however, for the sake ofgenerality we assume labels to be nominal, i.e., categorical. Givena finite set of distinct values ๐ฟ, a labeling function has the form_ : Rโ ๐ฟ. Each value resulting from the comparison of a targetcube cell with the corresponding benchmark cell is fed to thelabeling function, and assigned the appropriate label.
There are some properties of interest for a labeling function:โข The labeling of comparison values is generic enough toalso incorporate the labeling based on the actual value ofthe cell, without the usage of any benchmark and com-parison. One simply needs to assign a fixed benchmark ofzeros for all cells and a simple arithmetic difference as thecomparison function.
โข A labeling function should partition the values of the com-parison into equivalence classes, i.e., there must be a com-plete mapping of the values of the domain of the compari-son to a set of non-overlapping, disjoint labels. Thus, everycell of the result is assigned to exactly one label.
โข The labeling function does not necessarily have to be pre-defined before the query. Assuming, for example, that aLikert-like scale based on the absolute difference valueis to be adopted, the labeling function is produced afterthe results are obtained and split into a fixed number ofgroups (say 5).
With reference to the last point, in the sequel we introduceand explain two cases of labeling functions.
3.3.1 Labeling based on explicit ranges. In this case, we labeleach cell based on the result of the comparison between the
de f 5 s t a r s ( a ) :2 r e t u r n pd . cu t ( a , [ โ1 , โ0 . 6 , โ0 . 2 , 0 . 2 , 0 . 6 , 1 . 0 ] ,
i n c l u d e _ l owe s t = True ,4 l a b e l s =[ " โ " , " โ โ " , " โ โ โ " , " โ โ โ โ " , " โ โ โ โ โ " ] )
Listing 3: Implementation of the 5stars function
measure values of (a) the target and (b) the benchmark cube, usinga set of explicitly-specified rules. This is the case, e.g., where theorganization has predetermined goals to achieve (expressed viathe benchmark), and the (positive or negative) deviation fromthese goals characterizes the extent of success or failure.
Example 3.3. Let a query be given that computes the totalstore sales by customer gender, returning a target cube ๐ถ withtwo cells, say ๐ถ = {โจmale, 4400โฉ, โจfemale, 6900โฉ}. Assume thatwe have specified an external benchmark with two cells also, ๐ต =
{โจmale, 5400โฉ, โจfemale, 6400โฉ}. Finally, assume that we specify arange-based labeling function called 5stars to be applied overthe min-max normalized difference ๐ฅ of the target cube and thebenchmark:
_5stars (๐ฅ) =
*, if โ 1 โค ๐ฅ โค โ0.6**, if โ 0.6 < ๐ฅ โค โ0.2***, if โ 0.2 < ๐ฅ โค 0.2****, if 0.2 < ๐ฅ โค 0.6*****, if 0.6 < ๐ฅ โค 1
Then, the two cells are labeled as โ*โ and โ*****โ, respectively.Listing 3 shows the implementation of the 5stars function inPython using Pandas DataFrames; the cut function of Pandasbins values into discrete intervals. โก
3.3.2 Labeling based on the overall value distribution. Explic-itly providing rules and ranges for the labels has the benefit thatthe decision on which label to give to a cell of the target cubeis local, i.e., it depends only on the value of the cellโs measure,the benchmarkโs measure, and the result of their comparison.However, the labeling function can also be based on a holisticassessment of the overall distribution of the values of the compar-ison function. In this case, the labeling function first groups thecells of the target cube based on the result of their comparisonwith their respective benchmark cell, and then gives a label toeach group. The simplest possibility would be to split the com-parison value into quartiles or, more generally, into ๐ groups,and label each group as โtop-1โ, โtop-2โ, . . . , โtop-kโ. This involvessimply the ranking of the values and the splitting of the orderedset of cells into ๐ groups. Assuming a fixed set of ๐ labels, thelabel is then determined by the position of a cell in the ranking.
Overall, labels can be assigned either by fixing the number oflabels to a constant number and constructing equi-depth or equi-width histograms, or by allowing the system to come up with theoptimal number of clusters and assign cells accordingly. Moresimplistic schemes (e.g., rounding the z-score of the comparisonvalues) can also be devised. Overall, the idea of these labelingschemes is to avoid predefining ranges, and allowing labels toadapt to the distribution of the comparison values.
4 SYNTAX & SEMANTICS OF THE ASSESSOPERATOR
In this section, we formally define the syntax and semantics ofthe assess operator. We begin by introducing in Section 4.1 auser-friendly SQL-like syntax, to facilitate end users in posing
124
assessment queries with both expressive power and ease. Then,we move on to define the semantics of the assess operator inSection 4.3. To support this task, in Section 4.2 we preliminarilydefine a set of logical operators.
4.1 SyntaxThe general syntax for writing a statement based on the assessoperator includes three parts: one (consisting of the with, assess,by, and for clauses) that specifies the target cube; one (consistingof the against clause) that specifies the benchmark; one (consist-ing of the using and labels clauses) that specifies the assessmentmethod. Importantly, as we will explain in Section 4.3, the bench-mark specification drives the mapping of the assess syntax to thelogical operators defined in Section 4.2.
with ๐ถ0 [ for ๐ ] by ๐บ
assess|assess*๐ [ against < ๐๐๐๐โ๐๐๐๐ > ]
[ using < ๐ ๐ข๐๐๐ก๐๐๐ > ] labels _
where ๐ถ0 is a detailed cube (with schema C = (๐ป,๐)), ๐ is ameasure of ๐ถ0, ๐ is a set of conjunctive selection predicates eachover one level of ๐ป , ๐บ is a group-by set of C, < ๐๐๐๐โ๐๐๐๐ > isthe benchmark specification, < ๐ ๐ข๐๐๐ก๐๐๐ > specifies what willbe compared and how, and _ is a labeling function (optional partsof the syntax are in brackets). While in assess only the cells ofthe target cube that have a match in the benchmark are returned,in the assess* variant all the cells of the target cube are returned,possibly completed with null labels.
The target cube, ๐ถ , is defined by aggregating ๐ถ0 on ๐บ andselecting the cells that meet the conjunctive predicates in ๐ .
As to the benchmark, its specification can take different forms:
โข For constant benchmarks, the against clause has the form
against ๐ฃ
where ๐ฃ is a value compatible with๐. The benchmark ๐ต
is characterized by ๐บ๐ต = ๐บ๐ถ , ๐๐ต = ๐๐ถ . ๐ต has a measure๐๐๐๐๐ ๐ก which takes value ๐ฃ in all cells. A particular caseis when the user wants to directly assess the measurevalue without using any specific value. In this case theagainst clause is omitted; as mentioned in Section 3.3, thispractically corresponds to adopting a dummy benchmarkwhere all cells are zeros.
โข For external benchmarks, the against clause takes the form
against ๐ต.๐๐
where ๐ต is a cube and๐๐ is one of its measures. Note that๐ถ and ๐ต are joinable only if they have the same group-byset.
โข In a sibling benchmark, the for clause must include a pred-icate which slices the target cube on member ๐ข of level๐๐ โ ๐บ๐ถ . In this case,๐ is assessed against a benchmarkrelated to a different member of ๐๐ , say ๐ข๐ ๐๐ :
with < ๐๐ข๐๐ > for ๐1, . . . , ๐๐ , ๐๐ = ๐ข by ๐บ
assess๐ against ๐๐ = ๐ข๐ ๐๐
using < ๐ ๐ข๐๐๐ก๐๐๐ > labels _
Here the benchmark is characterized by ๐บ๐ต = ๐บ๐ถ and๐๐ต = ๐๐ถ \ {๐๐ } โช {(๐๐ = ๐ข๐ ๐๐ )}. In practice, the slicing on๐ข is replaced by one on ๐ข๐ ๐๐ .
โข In a past benchmark the syntax takes the form
with < ๐๐ข๐๐ > for ๐1, . . . , ๐๐ , ๐๐ก = ๐ข by ๐บ
assess๐ against past ๐
using < ๐ ๐ข๐๐๐ก๐๐๐ > labels _
where ๐๐ก is a temporal level, ๐๐ก โ ๐บ , and ๐ is an integer.Here the benchmark is isomorphic to ๐ถ , except that thevalues of๐ are those predicted based on a time series oflength ๐ฃ .
Finally, as to the assessment method, its specification is basedon the using and labels clauses.
โข The using clause specifies a (nested) function that de-scribes how the comparison is made, including possibletransformations to be made on measures (e.g., the compu-tation of a derived measure). Here, a keyword benchmarkis used to distinguish, when necessary, the cells of the tar-get cube from the corresponding ones in the benchmark.
โข The labels clause specifies a labeling function, either basedon explicit ranges or on the overall value distribution, tobe applied to the result of the computation specified bythe using clause. A range-based labeling function can beeither predeclared by the user and given a name (e.g., 5starin Example 3.3) or declared inline within the statementby listing its set of ranges with the corresponding label;the user is in charge of ensuring that the set of ranges iscomplete and non-overlapping. A set of library labelingfunction based on the value distribution (e.g., quartiles) isalso made available to users.
In all cases above, the result returned to the user includes, foreach cell, (i) its coordinate, (ii) the value of๐ for that coordinate,(iii) the value of the benchmark measure, (iv) the value result-ing from the comparison, and (v) the corresponding label. Thebenchmark measure is๐๐๐๐๐ ๐ก for constant benchmarks,๐ forsibling and past benchmarks, and๐๐ for external benchmarks.
Example 4.1. The first example gives an absolute assessmentof the total monthly sales in terms of quartiles:
with SALES by month
assess storeSales labels quartiles
Similarly, sales can be assessed against a goal value, say 1000, viaa 5 star scale in the [0..1] range by first normalizing the differenceand then using the range-based labeling function specified inExample 3.3:
with SALES by month
assess storeSales against 1000
using minMaxNorm(difference(storeSales,1000))
labels 5star
The following statement uses a sibling benchmark; for each prod-uct of type fresh fruit, the total quantity sold in Italy is assessedagainst the one in France. For each product, assessment is basedon the ratio between (i) the difference in quantities sold in Italyand France, and (ii) the total sales of fresh fruit in Italy; this ratio
125
is computed using library function ๐๐๐๐๐ ๐๐๐๐ก๐๐ .
with SALES
for type = โFresh Fruitโ, country =โItalyโby product, country
assess quantity against country = โFranceโusing percOfTotal(difference(quantity, benchmark.quantity))
labels {[-inf, -0.2): bad, [-0.2,0.2]: ok, (0.2, inf]: good},
Finally, in the next statement we use a past benchmark; specifi-cally, we assess the sales of a specific store in July 1997 againstthe past four months:
with SALES
for month = โ1997-07โ, store = โSmartMartโby month, store
assess storeSales against past 4
using ratio(storeSales, benchmark.storeSales)
labels {[0, 0.9): worse, [0.9, 1.1]: fine, (1.1,inf): better}
โก
4.2 Logical operatorsThis section introduces the logical aspects behind the differentsteps of the evaluation of an assess statement, formulated aslogical operators. Note that our aim is not to propose a logicallanguage for manipulating cubes (such languages exist, see e.g.[2]) but to describe specific cube manipulations required to logi-cally optimize assess statements. In particular, we do not detailthe classical (roll-up, etc.) cube manipulations.
We recall from Section 2 that a cube is defined as a partialfunction that maps coordinates into tuples of measures. For acube ๐ถ and a coordinate ๐พ such that ๐ถ (๐พ) = ๐ก , we denote with๐ = โจ๐พ, ๐กโฉ the cell defined by ๐ถ (๐พ) and we abusively note ๐ โ ๐ถ .We define operators that respect the closure property, in the sensethat they operate on cubes and specify cubes.
Get. The first basic operator consists of obtaining the result ofa cube query. Given a cube ๐ถ over a schema C = (๐ป,๐), a set ofselection predicates ๐ and a group-by set๐บ of C, the get operatorcorresponds to the cube query ๐ = (๐ถ,๐บ, ๐,๐), is denoted by[๐], and defines the derived cube being the result of ๐. Note that[(๐ถ,๐บ0, โ , ๐)] is simply noted [๐ถ] in what follows. Besides, thederived cube returned by get can be renamed using the notation[(๐ถ,๐บ, ๐,๐)] โ ๐๐๐๐ .
Join โ . The join operation is essential for putting together thetarget cube (๐ถ1) and the benchmark (๐ถ2). In OLAP terms this is adrill-across operation, or join applied to cubes.
Let ๐ถ1 and ๐ถ2 be two joinable cubes over schemas C1 andC2. As already stated, we assume for simplicity that the twocubes share the same hierarchies, so that C1 = (๐ป,๐1) andC2 = (๐ป,๐2).
๐ถ1 โ ๐ถ2 = {โจ๐พ, (๐ก, ๐ก โฒ)โฉ|โจ๐พ, ๐กโฉ โ ๐ถ1, โจ๐พ, ๐ก โฒโฉ โ ๐ถ2}The schema of the resulting cube is (๐ป, (๐1, ๐2)).
We also define a version of join where we allow partial joiningin the sense that join is made on a subset of the levels of ๐ป .Formally:
๐ถ1 โ ๐1,...,๐๐ ๐ถ2 = {โจ๐พ, (๐ก, ๐ก1, . . . , ๐ก๐ )โฉ|
โจ๐พ, ๐กโฉ โ ๐ถ1, โจ๐พ ๐ , ๐ก ๐ โฉ โ ๐ถ2, ๐พ |๐1,...,๐๐ = ๐พ๐
|๐1,...,๐๐, ๐ โ [1, . . . , ๐]}
Italy
Apple โนquan%ty = 100โบ
Pear โนquan%ty = 90โบ
Lemon โนquan%ty = 30โบ
France
Apple โนquan%ty = 150โบ
Pear โนquan%ty = 110โบ
Lemon โนquan%ty = 20โบ
Italy
Apple โนquan%ty = 100, benchmark.quan%ty = 150โบ
Pear โนquan%ty = 90, benchmark.quan%ty = 110โบ
Lemon โนquan%ty = 30, benchmark.quan%ty = 20โบ
Italy
Apple โนquan%ty = 100, benchmark.quan%ty = 150, diff = โ50โบ
Pear โนquan%ty = 90, benchmark.quan%ty = 110, diff = โ20โบ
Lemon โนquan%ty = 30, benchmark.quan%ty = 20, diff = 10โบ
Italy
Apple โนquan%ty = 100, benchmark.quan%ty = 150, diff = โ50, percOfTotal = โ0.23โบ
Pear โนquan%ty = 90, benchmark.quan%ty = 110, diff = โ20, percOfTotal = โ0.09โบ
Lemon โนquan%ty = 30, benchmark.quan%ty = 20, diff = 10, percOfTotal = 0.05โบ
Italy France
Apple โนquan%ty = 100โบ โนquan%ty = 150โบ
Pear โนquan%ty = 90โบ โนquan%ty = 110โบ
Lemon โนquan%ty = 30โบ โนquan%ty = 20โบ
Italy
Apple โนquan%ty = 100, benchmark.quan%ty = 150, diff = โ50, percOfTotal = โ0.23, label = badโบ
Pear โนquan%ty = 90, benchmark.quan%ty = 110, diff = โ20, percOfTotal = โ0.09, label = okโบ
Lemon โนquan%ty = 30, benchmark.quan%ty = 20, diff = 10, percOfTotal = 0.05, label = okโบ
C B
D
E
F
G
C'
Italy
Apple โนquan%ty = 100, qtyFrance = 150โบ
Pear โนquan%ty = 90, qtyFrance = 110โบ
Lemon โนquan%ty = 30, qtyFrance = 20โบ
D'
Figure 1: Derived cubes resulting from the application oflogical operators for the sibling intention in Example 4.5
Note that, differently from the (natural) join defined above, thispartial join is not commutative.
Finally, the assess* syntactical variant (that also returns thenon-matching cells of the target cube) uses a left-outer join๐ถ1 โ โ ๐ถ2 where non-matching cells are completed with nullvalues.
Example 4.2. Figure 1 shows the results, ๐ถ and ๐ต, of the fol-lowing get operations:
๐ถ =[(SALES, โจproduct, countryโฉ,{type = โFresh Fruitโ, country = โItalyโ},โจquantityโฉ)]
๐ต =[(SALES, โจproduct, countryโฉ,{type = โFresh Fruitโ, country = โFranceโ},โจquantityโฉ)] โ benchmark
(cube ๐ต is given alias benchmark) and the result of their partialjoin, ๐ท = ๐ถ โ product ๐ต. โก
Cell-Transform โ. This operator specifies a cell-at-a-time oper-ation that takes a cube and a function, and outputs a cube wherea new measure is added, containing the value of the functionapplied over the measure(s). Let C = (๐ป,๐) be the schema ofcube ๐ถ with group-by set ๐บ , and๐ be a subtuple of๐ . Let ๐ bea function defined on a tuple of parameters compatible with๐ ;then the cell-transformation operation operated by ๐ returns acube defined by:
โ๐โ๐๐๐๐,๐
(๐ถ) = {โจ๐พ, (๐ก, โจ๐ (๐)โฉ)โฉ | โจ๐พ, ๐กโฉ โ ๐ถ}
The schema of the resulting cube is (๐ป, (๐, โจ๐๐๐๐โฉ)), where๐๐๐๐ is the derived measure returned by ๐ .
126
Italy
Apple โนquan%ty = 100โบ
Pear โนquan%ty = 90โบ
Lemon โนquan%ty = 30โบ
France
Apple โนquan%ty = 150โบ
Pear โนquan%ty = 110โบ
Lemon โนquan%ty = 20โบ
Italy
Apple โนquan%ty = 100, benchmark.quan%ty = 150โบ
Pear โนquan%ty = 90, benchmark.quan%ty = 110โบ
Lemon โนquan%ty = 30, benchmark.quan%ty = 20โบ
Italy
Apple โนquan%ty = 100, benchmark.quan%ty = 150, diff = โ50โบ
Pear โนquan%ty = 90, benchmark.quan%ty = 110, diff = โ20โบ
Lemon โนquan%ty = 30, benchmark.quan%ty = 20, diff = 10โบ
Italy
Apple โนquan%ty = 100, benchmark.quan%ty = 150, diff = โ50, percOfTotal = โ0.23โบ
Pear โนquan%ty = 90, benchmark.quan%ty = 110, diff = โ20, percOfTotal = โ0.09โบ
Lemon โนquan%ty = 30, benchmark.quan%ty = 20, diff = 10, percOfTotal = 0.05โบ
Italy France
Apple โนquan%ty = 100โบ โนquan%ty = 150โบ
Pear โนquan%ty = 90โบ โนquan%ty = 110โบ
Lemon โนquan%ty = 30โบ โนquan%ty = 20โบ
Italy
Apple โนquan%ty = 100, benchmark.quan%ty = 150, diff = โ50, percOfTotal = โ0.23, label = badโบ
Pear โนquan%ty = 90, benchmark.quan%ty = 110, diff = โ20, percOfTotal = โ0.09, label = okโบ
Lemon โนquan%ty = 30, benchmark.quan%ty = 20, diff = 10, percOfTotal = 0.05, label = okโบ
C B
D
E
F
G
C'
Italy
Apple โนquan%ty = 100, qtyFrance = 150โบ
Pear โนquan%ty = 90, qtyFrance = 110โบ
Lemon โนquan%ty = 30, qtyFrance = 20โบ
D'
Figure 2: Example of application of the pivot operator
H-Transform โก. This operator considers holistic (H) transfor-mations, in the sense that computing a new measure value foreach cell of cube ๐ถ requires to know all the cells of ๐ถ .
Let again๐ be a subtuple of๐ as above. In this case, function๐ operates on a tuple of parameters compatible with๐ and on aset of tuples. The H-transformation of ๐ถ operated by ๐ returns acube defined by:
โก๐โ๐๐๐๐,๐
(๐ถ) = {โจ๐พ, (๐ก, โจ๐ (๐,๐ถ)โฉ)โฉ | โจ๐พ, ๐กโฉ โ ๐ถ}
The schema of the resulting cube is (๐ป, (๐, โจ๐๐๐๐โฉ)), where๐๐๐๐ is the extra measure returned by ๐ .
Example 4.3. The following cell-transformation extends cube๐ท with a derived measure storing their difference:
๐ธ = โ๐๐ ๐ ๐ ๐๐๐๐๐๐โdiff, โจquantity,benchmark.quantityโฉ (๐ท)Then, cube ๐น is obtained from ๐ธ by applying an H-transformationas follows:
๐น = โก๐๐๐๐๐๐๐๐๐ก๐๐โpercOfTotal, โจdiff,quantityโฉ (๐ธ)where holistic function ๐๐๐๐๐ ๐๐๐๐ก๐๐ operates on a tuple of twoparameters ๐ and ๐ and computes, for each cell, the ratio between๐ and the sum of ๐ over all cells. โก
Pivot โ. This operator takes a cube including a set of ๐ slicesof some level ๐ (a cube slice is the set of cells corresponding to onesingle member of a level), among which only one slice for a givenmember ๐ข๐ โ ๐ท๐๐(๐) is returned. Each coordinate ๐พ of this slicein the returned cube is associated with its initial tuple of measures๐ก , concatenated with all the ๐ measures๐ = โจ๐1, . . . ,๐๐ โฉ of allits ๐ โ 1 neighbor coordinates ๐พ โฒ in the initial set of ๐ slices. Thenew measures are renamed ๐๐๐๐1, . . . , ๐๐๐๐๐ . Formally, givencube ๐ถ with schema C = (๐ป,๐), let ๐ข1, . . . , ๐ข๐ โ ๐ท๐๐(๐) bethe members of ๐ on which the slices are defined. Let ๐ข๐ be thereference slice for pivoting. Then
โโจ๐1โ๐๐๐๐1,...,๐๐โ๐๐๐๐๐ โฉ,๐,๐ข๐ (๐ถ) =
{โจ๐พ, (๐ก, โจ๐ฃ11, . . . , ๐ฃ1๐โ1, . . . , ๐ฃ
๐
1 , . . . , ๐ฃ๐
๐โ1โฉ)โฉ|โจ๐พ, ๐กโฉ โ ๐ถ,๐พ |๐ = ๐ข๐ , โจ๐พ โฒ, ๐ก โฒโฉ โ ๐ถ,๐พ โฒ|๐ = ๐ข๐ ,
๐พ |๐บ\๐ = ๐พ โฒ|๐บ\๐, ๐ก โฒ|
๐๐= ๐ฃ
๐๐, ๐ โ [1, ๐ โ 1], ๐ โ [1, ๐]}
where the ๐ก โs are tuples of measure values. The schema of the re-sulting cube is (๐ป, (๐,๐๐๐๐1, . . . , ๐๐๐๐๐ )) where in turn๐๐๐๐ =
โจ๐1, . . . ,๐๐โ1โฉ.
Example 4.4. Figure 2 shows the result๐ถ โฒ of the following getoperator:
๐ถ โฒ =[(SALES, โจproduct, countryโฉ,{type = โFresh Fruitโ, country โ {โItalyโ, โFranceโ}},โจquantityโฉ)]
Cube ๐ถ โฒ includes two slices for country. By applying the follow-ing pivot operator:
๐ท โฒ = โโจquantityโฉโqtyFrance,country,โItalyโ (๐ถ โฒ)
a cube๐ท โฒ is obtained that includes only the reference slice (โItalyโ),with an extra measure qtyFrance. โก
4.3 SemanticsAssume the expression of the assess operator as defined in Section4.1:
with ๐ถ0 [ for ๐ ] by ๐บ
assess๐ [ against < ๐๐๐๐โ๐๐๐๐ > ]
[ using < ๐ ๐ข๐๐๐ก๐๐๐ > ] labels _
In terms of the logical operators introduced in Section 4.2, let
(1) โกฮ, ยท (ยท) be the composition of the comparison/transforma-tion functions denoted by the using clause.
(2) โก_, ยท (ยท) be the transformation that applies the labeling func-tion denoted by the labels clause.
Without loss of generalization, we assume that the functions thatare used for the comparison and the labeling are holistic. Clearly,the application of cell-based functions is also possible (and mostwelcome for efficiency and optimization purposes).
The semantics of an assess statement is defined as
โก_โ๐_,๐ฮ (โกฮโ๐ฮ,๐(๐ถ))
where the definition of cube๐ถ depends on the type of benchmarkused, which in turn is determined by the form taken by theagainst clause as explained in Section 4.1:
โข Constant benchmark: ๐ถ = [(๐ถ0,๐บ, ๐,๐)].โข External benchmark ๐ต: ๐ถ = [(๐ถ0,๐บ, ๐,๐)] โ [๐ต]โข Sibling benchmark:
๐ถ = [(๐ถ0,๐บ, ๐,๐)] โ ๐บ\๐๐ [(๐ถ0,๐บ, ๐๐ต, ๐)] โ benchmark
where ๐๐ต = ๐ \ {(๐๐ = ๐ข)} โช {(๐๐ = ๐ข๐ ๐๐ )}.โข Past benchmark:
๐ถ = [(๐ถ0,๐บ, ๐,๐)]โ ๐บ\๐๐ก (โ๐๐๐๐๐๐ ๐ ๐๐๐โ๐โฒ,๐ (โ๐โ๐โฒ,๐๐ก ,๐ข (
[(๐ถ0,๐บ, ๐๐ต, ๐)] โ benchmark)))
where ๐๐ต = ๐ \ {(๐๐ก = ๐ข)}โช {(๐๐ก โ {๐ข1, . . . , ๐ข๐ }), members๐ข1, . . . , ๐ข๐ are predecessors of๐ข for level ๐๐ก , and ๐๐๐๐๐๐ ๐ ๐๐๐is a time series prediction function.
Note that, in the assess* variant, the inner join is replaced bya left-outer join. In all cases, the resulting cube has schema(๐ป, โจ๐,๐๐ต,๐ฮ,๐_โฉ) The benchmark measure๐๐ต is๐๐๐๐๐ ๐ก forconstant benchmarks,๐ for sibling and past benchmarks, and๐๐ for external benchmarks.
Example 4.5. Consider again some of the statements of Exam-ple 4.1. The first one relies on a constant benchmark:
with SALES by month
assess storeSales labels quartiles,
and corresponds to the logical expression:
โก๐๐ข๐๐๐ก๐๐๐๐ , โจstoreSalesโฉ ( [(SALES, โจmonthโฉ, โ , โจstoreSalesโฉ)])
127
The one based on a sibling benchmark,
with SALES
for type = โFresh Fruitโ, country =โItalyโby product, country
assess quantity against country = โFranceโusing percOfTotal(difference(quantity, benchmark.quantity))
labels {[-inf, -0.2): bad, [-0.2,0.2]: ok, (0.2, inf]: good},
corresponds to the following plan (see Figure 1):(1) get the target cube:
๐ถ =[(SALES, โจproduct, countryโฉ,{type = โFresh Fruitโ, country = โItalyโ},โจquantityโฉ)]
(2) get the benchmark:
๐ต =[(SALES, โจproduct, countryโฉ,{type = โFresh Fruitโ, country = โFranceโ},โจquantityโฉ)] โ benchmark
(3) (partially) join ๐ถ and ๐ต:
๐ท = ๐ถ โ product ๐ต
(4) transform ๐ท :
๐ธ = โ๐๐ ๐ ๐ ๐๐๐๐๐๐โdiff, โจquantity,benchmark.quantityโฉ (๐ท)(5) transform ๐ธ:
๐น = โก๐๐๐๐๐๐๐๐๐ก๐๐โpercOfTotal, โจdiff,quantityโฉ (๐ธ)(6) transform ๐น :
๐บ = โ๐๐๐๐๐ ( { [โinf,โ0.2) :bad,
[โ0.2,0.2]:ok,(0.2,inf]:good}), โจpercOfTotalโฉ (๐น )The last one uses a past benchmark:
with SALES
for month = โ1997-07โ, store = โSmartMartโby month, store
assess storeSales against past 4
using ratio(storeSales, benchmark.storeSales)
labels {[0, 0.9): worse, [0.9, 1.1]: fine, (1.1,inf): better}
and corresponds to the following plan:(1) get the target cube:
๐ถ =[(SALES, โจmonth, storeโฉ,{month = โ1997-07โ, store = โSmartMartโ},โจstoreSalesโฉ)]
(2) get the data for the benchmark:
๐ต =[(SALES, โจmonth, storeโฉ,{month โ [โ1997-03โ; โ1997-06โ], store = โSmartMartโ},โจstoreSalesโฉ)] โ benchmark
(3) pivot ๐ต:
๐ท = โโจstoreSalesโฉโpast,month,โ1997-06โ๐ต
(4) transform ๐ท :
๐ธ = โ๐๐๐๐๐๐ ๐ ๐๐๐โโจstoreSalesโฉ,past๐ท
(5) (partially) join ๐ถ and ๐ธ:
๐น = ๐ถ โ store ๐ธ
(6) transform ๐น :
๐บ = โ๐๐๐ก๐๐โr, โจstoreSales,benchmark.storeSalesโฉ๐น
(7) transform ๐บ :
โ๐๐๐๐๐ ( { [0,0.9) :worse, [0.9,1.1]:fine,(1.1,inf) :better}), โจrโฉ๐บ
5 OPTIMIZING ASSESS STATEMENTSThis section illustrates how the logical operators introducedabove allow to optimize the evaluation strategies of assess ina rule-based fashion. We start by giving basic algebraic prop-erties of the operators, and then present optimization schemesexploiting these properties.
5.1 Basic propertiesCommutativity of transform (๐1). An important feature of the
transform operators is that they preserve the set of coordinates ofthe cube they are applied to, monotonically adding newmeasuresto it. In other words, the operators commute when one does notneed the result of the other. Formally,
โ๐โ๐๐ ,๐โฒ (โ๐โ๐๐,๐ (๐ถ)) = โ๐โ๐๐,๐ (โ๐โ๐๐ ,๐
โฒ (๐ถ))
if ๐๐ โ ๐ โฒ and ๐๐ โ ๐ . The same property holds for โก, and forcombinations of โก and โ.
Pushing join through transformation (๐2). A join can be pushedbefore a cell-transformation, if the transformation is appliedto the measures of only one of the joined cubes, by applyingthe transformation directly over that cube and removing thepivot operation needed to guarantee the two cubes are joinable.Formally,
(๐ถ,๐บ, ๐,๐) โ ๐บ\{๐ } (โ๐โ๐๐ ,๐2 โ๐1โ๐2,๐,๐ข (๐ถ,๐บ, ๐ โฒ, ๐1))= โ๐โ๐๐ ,๐1 ((๐ถ,๐บ, ๐,๐) โ ๐บ\{๐ } (๐ถ,๐บ, ๐ โฒ, ๐1))
where ๐ โฒ = ๐ \ {(๐๐ = ๐ข)} โช {(๐๐ โ {๐ข1, . . . , ๐๐}).
Replacing join with pivot (๐3). Joining different slices of thesame cube can be done either by getting each slice individuallyand partially joining them, or by getting the slices together andpivoting all but one of them. Formally,
[(๐ถ,๐บ, ๐,๐)]โ ๐บ\{๐ } [(๐ถ,๐บ, ๐ โฒ, ๐)] = โ๐โ๐โฒ,๐,๐ข [(๐ถ,๐บ, ๐๐๐๐ , ๐)]
where ๐ โฒ is a tuple of measure names not in ๐ , ๐ โฒ = ๐ \ {(๐ =๐ข)} โช {(๐ โ {๐ข1, . . . , ๐ข๐}) and ๐๐๐๐ = ๐ \ {(๐ = ๐ข)} โช {(๐ โ{๐ข,๐ข1, . . . , ๐ข๐})}.
5.2 Optimization strategiesIn a classical interactive cube analysis, a user expresses high-level manipulations through front-end applications over DBMSs.We assume the same context, where cubes are accessed throughcube queries (our logical get operation), over an already properlyoptimized DBMS. In this setting, we work under the followinghypotheses: (i) the get, join, and pivot logical operations can beexecuted via SQL queries; (ii) the results of these SQL queries fitin main memory; (iii) all transformations are seen as black-boxfunctions, thus they are not pushed to SQL. The optimizationopportunities of assess statements are then related to whichlogical operators are pushed to SQL.
Following the above assumptions, for an assess statementwe consider three possible plans, based on different executionstrategies, as described in the following subsections.
128
s e l e c t t 1 . country , t 1 . product ,2 t 1 . quan t i t y , t 2 . q u an t i t y as b c _ qu an t i t y
from4 ( s e l e c t country , product , sum ( qu an t i t y ) as q u an t i t y
from s a l e s s6 j o i n cus tomer c on c . ckey = s . ckey
j o i n produc t p on p . pkey = s . pkey8 where type = ' Fre sh F r u i t ' and count ry = ' I t a l y '
group by country , p roduc t ) t1 ,10 ( s e l e c t country , product , sum ( qu an t i t y ) as q u an t i t y
from s a l e s s12 j o i n cus tomer c on c . ckey = s . ckey
j o i n produc t p on p . pkey = s . pkey14 where type = ' Fre sh F r u i t ' and count ry = ' France '
group by country , p roduc t ) t 216 where t 1 . p roduc t = t 2 . p roduc t
Listing 4: Getting the pivoted cube of the sibling intentionfollowing JOP
5.2.1 Naive Plan. A Naive Plan (NP) faithfully reproduces thesequences of operations shown in Section 4.3; only the get oper-ations are pushed to SQL and all other operations are executedin memory. NP is feasible for all benchmark types.
Example 5.1. Consider the sibling statement of Example 4.5.Its NP consists in translating individually each get operationinto an SQL call, to retrieve the target and benchmark cubes.Specifically, the first get operation is translated in the SQL queryof Listing 1; the second get operation consists of the same SQLcode where the selection is made on โFranceโ instead of โItalyโ.All other subsequent operations of that statement, i.e., the partialjoin and the transformations, are done in memory.
5.2.2 Join-Optimized Plan. In a Join-Optimized Plan (JOP),also the join is pushed to SQL to take advantage of the DBMSoptimizer. This requires that the plan starts with the subexpres-sion ๐ถ โ ๐ต, where ๐ถ and ๐ต are two get operations, so that allthree operations can be pushed to SQL. JOP is not feasible forconstant benchmarks, since there is no join to be done; for theother benchmark types, it may require property ๐2 to be appliedto NP to postpone cell-transformations after the join.
Example 5.2. Consider the sibling statement of Example 4.5,and the subexpression of step (3): ๐ท = ๐ถ โ product ๐ต. This subex-pression is translated to the SQL query of Listing 4, with oneinner subquery for each get operation ๐ถ and ๐ต, and an outerquery for joining them. โก
Example 5.3. As mentioned above, property ๐2 can be usedto put an assess statement in a form that allows pushing thejoin to SQL. Consider for instance the five first steps of the paststatement of Example 4.5. Applying property ๐2 turns these stepsinto the plan:
(1) get the target cube:
๐ถ =[(SALES, โจmonth, storeโฉ,{month = โ1997-07โ, store = โSmartMartโ},โจstoreSalesโฉ)]
(2) get the data for the benchmark:
๐ต =[(SALES, โจmonth, storeโฉ,{month โ [โ1997-03โ; โ1997-06โ], store = โSmartMartโ},โจstoreSalesโฉ)] โ benchmark
(3) (partially) join ๐ถ and ๐ต:
๐ท = ๐ถ โ store ๐ต
(4) transform ๐ท :
๐ธ = โ๐๐๐๐๐๐ ๐ ๐๐๐โโจstoreSalesโฉ,benchmark.storeSales๐ท
The subexpression ๐ท = ๐ถ โ store ๐ต can be then pushed to SQL. โก
5.2.3 Pivot-Optimized Plan. The goal of a Pivot-OptimizedPlan (POP) is to let the DBMS compute pivot operations. To thisend, whenever the plan starts with the subexpression๐ถโ ๐ต, where๐ถ and ๐ต are get operations on the same cube, the join operationis replaced with the pivot operation using property ๐3, resultingin a pivot operation for aligning the target and benchmark slices.Both operations (get and pivot) are then pushed to SQL. POP isfeasible only for sibling and past intentions, which get multipleslices from a single cube.
Example 5.4. Consider the sibling statement of Example 4.5.Using property ๐3 allows to rewrite the plan to (see also Figures1 and 2):
(1) get the (target+benchmark) cube:
๐ถ โฒ =[(SALES, โจproduct, countryโฉ,{type = โFresh Fruitโ, country โ {โItalyโ, โFranceโ},โจquantityโฉ)]
(2) pivot ๐ถ โฒ:
๐ธ = โโจquantityโqtyFranceโฉ,country,โItalyโ (๐ถ โฒ)
(3) transform ๐ท โฒ:
๐ธ โฒ = โ๐๐ ๐ ๐ ๐๐๐๐๐๐โdiff, โจquantity,qtyFranceโฉ (๐ท โฒ)
(4) transform ๐ธ โฒ:
๐น โฒ = โก๐๐๐๐๐๐๐๐๐ก๐๐โpercOfTotal, โจdiff,quantityโฉ (๐ธ โฒ)
(5) transform ๐น โฒ:
๐บ โฒ = โ๐๐๐๐๐ ( { [โinf,โ0.2) :bad, [โ0.2,0.2]:ok,(0.2,inf]:good}),
โจpercOfTotalโฉ (๐น โฒ)
Listing 5 shows the resulting SQL query. Likewise, the past state-ment, in the form given in Example 5.3, can be rewritten with ๐3as:
(1) get (target+benchmark) cube:
๐ท =[(SALES, โจmonth, storeโฉ,{month โ [1997-03; 1997-07], store = โSmartMartโ},โจstoreSalesโฉ)]
(2) pivot ๐ท :
๐ธ = โโจstoreSalesโฉโpast,month,โ1997-07โ (๐ท)
(3) transform ๐ธ:
๐น = โ๐๐๐๐๐๐ ๐ ๐๐๐โbenchmark.storeSales, โจpastโฉ (๐ธ)
(4) transform ๐น :
๐บ = โ๐๐๐ก๐๐โr, โจstoreSales,benchmark.storeSalesโฉ (๐น )
(5) transform ๐บ :
โ๐๐๐๐๐ ( { [0,0.9) :worse, [0.9,1.1]:fine,(1.1,inf) :better}), โจrโฉ (๐บ)
Under this form, the first two steps of the plan can be transformedinto SQL calls. โก
129
s e l e c t ' I t a l y ' as country , product ,2 quan t i t y , b c _ qu an t i t y
from4 ( s e l e c t country , product , sum ( qu an t i t y ) as q u an t i t y
from s a l e s s6 j o i n cus tomer c on c . ckey = s . ckey
j o i n produc t p on p . pkey = s . pkey8 where type = ' Fre sh F r u i t '
and count ry in ( ' I t a l y ' , ' France ' )10 group by country , p roduc t )
p i v o t (12 sum ( qu an t i t y ) f o r count ry
in ( ' I t a l y ' as quan t i t y , ' France ' as b c _qu an t i t y )14 )
where qu an t i t y i s not n u l l and b c _qu an t i t y i s not n u l l
Listing 5: Getting the pivoted cube of the sibling intentionfollowing POP
Table 1: Formulation effort for different intentions
Constant External Sibling PastSQL: 481 989 1169 1954
Python: 7006 6193 6309 7049Total: 7487 7182 7478 9003assess: 143 260 270 254
6 EXPERIMENTSTo test our approach, we implemented the assess operator relyingon the simple multidimensional engine described in [6], whichuses multidimensional metadata to rewrite OLAP queries on astar schema stored in Oracle 11g DBMS. Post-processing of theresults (e.g., to apply transformations) is then done via off-the-shelf Python Scikit-learn over Pandas DataFrames. All tests wererun on an Intel(R) Core(TM)i7-6700 [email protected] CPU with8GB RAM.
The prototype was tested against the Star Schema Benchmark(SSB) cube, described by four hierarchies; please refer to [14]for the logical schema of the SSB dataset. As commonly donein OLAP settings, primary and foreign keys were indexed usingB-Trees, and materialized views were created to improve perfor-mances. The experiments are focused on four assess statementsof different types, henceforth referred to as Constant, External,Sibling, and Past, respectively.
6.1 Formulation effortThe first goal of our experiments is to evaluate the saving in userโseffort when writing an assess statement over the one necessaryto obtain the same result using plain SQL and Python. To thisend we adopt the simple metric proposed in [11], where theASCII character length is used as as a proxy for the effort it takesto craft a query. The results are shown in Table 1. For SQL andPython we considered the code generated by our prototype whenfollowing the less complex plan. Nevertheless, as expected, thetotal formulation effort using SQL+Python is, for each intentiontype, more than one order of magnitude larger than using assessstatements.
6.2 Efficiency and scalabilityOur second experimental goal is to evaluate the efficiency ofour approach in executing (i) different types of intentions, (ii)with different execution plans, and (iii) on cubes with differentcardinalities. To achieve (iii) we generated three detailed SSB
Table 2: Target cube cardinalities for each intention typeapplied to each detailed cube
๐๐๐ต1 ๐๐๐ต10 ๐๐๐ต100Constant 1.2 ยท 105 1.2 ยท 106 1.2 ยท 107External 2.4 ยท 104 2.5 ยท 105 2.5 ยท 106Sibling 2.4 ยท 104 2.5 ยท 105 2.5 ยท 106
Past 1.5 ยท 103 1.6 ยท 104 1.6 ยท 105
Table 3: Minimum execution times (in seconds) for differ-ent intentions (in parentheses, the corresponding execu-tion times for NP)
๐๐๐ต1 ๐๐๐ต10 ๐๐๐ต100Constant 0.60 (0.60) 6.77 (6.77) 45.14 (45.14)External 0.27 (0.31) 2.38 (2.60) 32.86 (35.60)Sibling 0.32 (0.42) 3.69 (4.97) 49.61 (99.93)
Past 1.20 (3.21) 11.72 (30.93) 118.25 (321.11)
cubes, namely ๐๐๐ต1, ๐๐๐ต10, and ๐๐๐ต100, with different scale fac-tors resulting in the following cardinalities:
|๐๐๐ต1 | =6 ยท 106
|๐๐๐ต10 | =6 ยท 107
|๐๐๐ต100 | =6 ยท 108
Note that the cardinality of each cube is equal to the numberof tuples in the corresponding fact table. Since the by and forclauses of each assess statement are not changed, scaling up thecardinality of the detailed cube implies that also the cardinalityof the target cube scales up as shown in Table 2. To reduce theimpact of caching, each assess statement was executed five timeson each detailed cube, and the execution times were averaged.
Figure 3 shows, on a logarithmic scale, the times in seconds forexecuting the Constant, External, Sibling, and Past intentions us-ing the NP, JOP, and POP plans, for increasing cube cardinalities.As to Constant, assessing a target cube of 1.2 ยท 107 tuples (derivedby querying ๐๐๐ต100) takes about 45 seconds, mostly employedto get the data from the DBMS. Note that, since this assessmentdoes not require the retrieval of a benchmark cube, only NP isfeasible. As to External, the only possible plans are NP and JOP(POP is not feasible here), with JOP providing the best perfor-mance. As to Sibling and Past, POP performs the best, taking50 seconds and 118 seconds, respectively. Being based on thepivot operator, POP gets in both cases the target cube and thebenchmark at once by retrieving the slices required together. Inother words, POP avoids the join between the target cube and thebenchmark, a time-consuming operation for NP and JOP. Overall,NP has the worst performance, since (i) it requires to separatelyget both cubes and join them into main memory, and (ii) it mayload into main memory unnecessary data (i.e., the tuples thatwill not match in the join). Overall, we can conclude that (i) JOP,when applicable, outperforms NP, and (ii) POP, when applicable,outperforms JOP and NP. This is summarized in Table 3 which,for each benchmark type, compares the best performance withthe one of the naive execution strategy. Remarkably, this tablealso clearly shows that our approach scales linearly for all theintentions.
Our last experimental goal is to understand which are themost expensive execution steps, i.e., those for which there isroom for further optimizations. The overall execution time for
130
SSB1 SSB10 SSB100
100
101
Tim
e (s
)Constant
SSB1 SSB10 SSB100
100
101
Tim
e (s
)
External
SSB1 SSB10 SSB100
100
101
102
Tim
e (s
)
Sibling
SSB1 SSB10 SSB100
100
101
102
Tim
e (s
)
Past
NPJOPPOP
Figure 3: Execution times for increasing cardinalities of the target cube ๐ถ
SSB1 SSB10 SSB10010 3
10 2
10 1
100
101
102
103
Tim
e (s
)
NP
SSB1 SSB10 SSB10010 3
10 2
10 1
100
101
102
103
Tim
e (s
)
JOP
SSB1 SSB10 SSB10010 3
10 2
10 1
100
101
102
103
Tim
e (s
)
POP
Get C Get B Get C + B Trans. Join Comp. Label
Figure 4: Breakdown of the execution time of the Past intention for increasing cardinalities of the target cube ๐ถ
an intention can be broken down into the time necessary to (1)get the target cube, (2) get the benchmark, (3) transform thetwo cubes (e.g., to apply regression in past benchmarks), (4) jointhem, (5) compute the comparison/transformation, and (6) labelthe result. This breakdown is shown in Figure 4 for each plan andwith increasing cube cardinalities. We focus on the Past intention,that is the most complex one since forecasting measure valuesrequires to compute a regression. First of all, we observe thatthe execution times for comparison and labeling are in the orderof milliseconds. Thus, not surprisingly, they are negligible withrespect to the time necessary to get and join the cubes. For allthree plans, transformation is the most time-consuming step,since linear regression has to be applied to huge numbers oftuples. As to the time for accessing data, note that:
โข NP brings both the target ๐ถ and benchmark ๐ต cubes intomain memory to join them; the cost for this main-memoryjoin is lower than the ones for getting the two cubes, butstill not negligible. The cost for the pivot operation iscounted as transformation.
โข JOP pushes the join to SQL; thus, in this case the cost forthe join is counted together with the one for getting๐ถ +๐ต.The cost for the pivot operation is counted as transforma-tion.
โข POP replaces the join with a pivot operation and pushesit to SQL, thus, the cost of pivot here is part of the cost forgetting ๐ถ + ๐ต.
7 RELATEDWORK7.1 OLAP models and operatorsOLAP comes with a large number of proposals on its foundationsand operators, all of which slowly converged towards the coreideas of cubes, dimensions, dimension hierarchies, and levels aswell as operators like roll-up, drill-down, slice, drill-across duringthe late โ90s. To avoid overcrowding the discussion, we refer theinterested reader to an excellent survey [16].
Over the years, several operators have been proposed to com-plement the fundamental ones. The DIFF operator [17] returnsthe set of tuples that most successfully describe the difference ofvalues between two cells of a cube that are given as input. Thesame author also describes a method that profiles the explorationof a user and uses the Maximum Entropy principle to recommendwhich unvisited parts of the cube can be the most surprising ina subsequent query [18]. Finally, the RELAX operator allows toverify whether a pattern observed at a certain level of detail ispresent at a coarser level of detail too [19].
In a different line of research, prediction cubes are proposedwith the characteristic property that each of the cells comes witha model that is trained to produce a predictive model with datathat correspond to that cell [5]. Then, a comparison betweenmodel and actual value is also possible, assessing the modelโsaccuracy. Also, the Shrink operator [9, 15] has been proposed toreduce the result size of a query with minimal loss of informationvalue via the calculated fusion of data slices.
Alternative operators have also been proposed in the Cinecubesmethod [7, 8]. The goal of this effort is to facilitate automatedreporting, given an original OLAP query as input. To achieve thispurpose two operators (expressed as acts) are proposed, namely,(a) put-in-context, i.e., compare the result of the original queryto query results over similar, sibling values; and (b) give-details,where drill-downs of the original queryโs groupers are performed.
Compared to the previous proposals, our work on the explicitintroduction of an assess operator differs in the fundamentalproblem it addresses. The works of Sarawagi are mostly of ex-planatory rather than assessment nature. Similarly, predictioncubes are trying to assess the impact of a set of predictor at-tributes on a class label in the context of a data cube, via anintroduced model for their relationship โagain, the emphasis ison trying to explain what we see rather than trying to provideassessments and labels on the comparison of the assessment.The Shrink operator is intended to compress without losing too
131
much information. The Cinecubes approach introduces an auto-matically invoked model of assessment in its put-in-context act;this is indeed a first form of assessment, although not tunable orexplicitly invoked by the user.
7.2 The Intentional Analytics ModelThe IAM for OLAP was introduced in [20]. Later this proposalwas significantly extended [21]. The main idea behind the inten-tional model for OLAP is that OLAP models need to be extendedwith (a) new operators, (b) altering of the definition of a queryresult, (c) introducing highlights to annotate the answers. Toaddress the first requirement, the traditional roll-ups and drill-downs operators were complemented with operators that pertainto the intention of the user towards the data โi.e., what is thereason why the user poses the query. The original, large set ofoperators (including operators like verify and analyze) was latersolidified and formalized into five operators, namely, describe,assess, explain, predict, and suggest [21]. The result of a queryis also redefined as a combination of data and KDD models thatare applied over the data. Also, the resulting data and modelsare evaluated with respect to their interestingness to producehighlights, i.e., subsets of the data that provide the most of novelinformation to the user. The foundations of the model can belinked to Bloomโs taxonomy and Anderson and Krathwohlโs re-finement to it [13, 22], which organize cognitive tasks as: (a)remembering, (b) understanding, (c) applying a procedure, (d)analyzing (component interrelationships), (e) evaluating (withrespect to criteria and standards), and (f) creating.
Although the IAM acts as an all-encompassing framework fordefining new operators, results, and highlights for OLAP, thegoal of the previous works was not go down into the detailsof each operator, but rather to dictate templates on what kindof algebraic operators we can introduce. The particularities ofthe describe operator (supporting the understanding process inBloomโs framework) were further explored [4]. The current paperextends the originally proposed assess operator (in turn, inspiredby the put-in-context operator) in significantly deeper ways,as it comes with several alternatives that were not obviouslyexpressed in the original work [21], as well as with the syntax ofan SQL-like language and optimization techniques.
8 CONCLUSIONSIn this paper we have introduced the assess operator to auto-matically evaluate and characterize the result of a cube query interms of labels given to the single cells based on their compar-ison with a benchmark. We have provided several alternativesfor specifying benchmarks, comparison, and labeling schemes.Finally, we have discussed alternative plans for the execution ofassess statements showing that their performance is perfectly inline with the right time requirement of analysis sessions.
Our future work on the assess operator will develop in differ-ent directions:
โข Consider cube schemas including descriptive propertiesof levels (e.g., the population of a country). Introducingproperties will enable users to express more complex state-ments, e.g., to compare per capita sales of different coun-tries.
โข Devise strategies for effectively completing partial assessstatements, for instance, ones where the against, using or
benchmark clauses are not specified by the user. Interest-ingly, this could require different possibilities to be testedand ranked based on their expected interest for the user.
โข Enhance the expressiveness of the assess operator by con-sidering more complex labeling functions (e.g., functionsbased on ranges that depend not only on comparison val-ues of cells, but also on their coordinates) and additionaltypes of benchmarks (for instance to let the sales of milkbe assessed against those of drinks, i.e., against an ancestorof milk in the roll-up order).
โข Investigate the relevant properties of our logical operatorsand develop a cost-based optimization strategy.
REFERENCES[1] Serge Abiteboul, Richard Hull, and Victor Vianu. 1995. Foundations of
Databases. Addison-Wesley.[2] Rakesh Agrawal, Ashish Gupta, and Sunita Sarawagi. 1997. Modeling Multidi-
mensional Databases. In Proceedings of ICDE. Birmingham, UK, 232โ243.[3] Leilani Battle, Michael Stonebraker, and Remco Chang. 2013. Dynamic re-
duction of query result sets for interactive visualizaton. In Proceedings ofInternational Conference on Big Data. Santa Clara, CA, USA, 1โ8.
[4] Antoine Chรฉdin, Matteo Francia, Patrick Marcel, Verรณnika Peralta, and StefanoRizzi. 2020. The Tell-Tale Cube. In Proceedings of ADBIS. Lyon, France, 204โ218.
[5] Bee-Chung Chen, Lei Chen, Yi Lin, and Raghu Ramakrishnan. 2005. PredictionCubes. In Proceedings of VLDB. Trondheim, Norway, 982โ993.
[6] Matteo Francia, Enrico Gallinucci, and Matteo Golfarelli. 2020. TowardsConversational OLAP. In Proceedings of DOLAP. Copenhagen, Denmark, 6โ15.
[7] Dimitrios Gkesoulis and Panos Vassiliadis. 2013. CineCubes: cubes as moviestars with little effort. In Proceedings of DOLAP. San Francisco, CA, USA, 3โ10.
[8] Dimitrios Gkesoulis, Panos Vassiliadis, and Petros Manousis. 2015. CineCubes:Aiding data workers gain insights from OLAP queries. Inf. Syst. 53 (2015),60โ86.
[9] Matteo Golfarelli, Simone Graziani, and Stefano Rizzi. 2014. Shrink: An OLAPoperation for balancing precision and size of pivot tables. Data Knowl. Eng.93 (2014), 19โ41.
[10] Matteo Golfarelli, Federica Mandreoli, Wilma Penzo, Stefano Rizzi, and ElisaTurricchia. 2012. OLAP query reformulation in peer-to-peer data warehousing.Inf. Syst. 37, 5 (2012), 393โ411.
[11] Shrainik Jain, Dominik Moritz, Daniel Halperin, Bill Howe, and Ed Lazowska.2016. SQLShare: Results from a Multi-Year SQL-as-a-Service Experiment. InProceedings of SIGMOD. San Francisco, CA, USA, 281โ293.
[12] Sean Kandel, Andreas Paepcke, Joseph M. Hellerstein, and Jeffrey Heer. 2012.Enterprise Data Analysis and Visualization: An Interview Study. IEEE Trans.Vis. Comput. Graph. 18, 12 (2012), 2917โ2926.
[13] David R. Krathwohl. 2002. A Revision of Bloomโs Taxonomy: An Overview.Theory Into Practice 41, 4 (2002), 212โ218.
[14] Patrick E. OโNeil, Elizabeth J. OโNeil, Xuedong Chen, and Stephen Revilak.2009. The Star Schema Benchmark and Augmented Fact Table Indexing. InProceedings of TPCTC. Lyon, France, 237โ252.
[15] Stefano Rizzi, Matteo Golfarelli, and Simone Graziani. 2015. An OLAM Oper-ator for Multi-Dimensional Shrink. Int. J. of Data Warehousing and Mining 11,3 (2015), 68โ97.
[16] Oscar Romero and Alberto Abellรณ. 2007. On the Need of a Reference Algebrafor OLAP. In Proceedings of DaWaK. Regensburg, Germany, 99โ110.
[17] Sunita Sarawagi. 1999. Explaining Differences inMultidimensional Aggregates.In Proceedings of VLDB. Edinburgh, Scotland, 42โ53.
[18] Sunita Sarawagi. 2000. User-Adaptive Exploration of Multidimensional Data.In Proceedings of VLDB. Cairo, Egypt, 307โ316.
[19] Gayatri Sathe and Sunita Sarawagi. 2001. Intelligent Rollups in Multidimen-sional OLAP Data. In Proceedings of VLDB. Roma, Italy, 531โ540.
[20] Panos Vassiliadis and Patrick Marcel. 2018. The Road to Highlights is Pavedwith Good Intentions: Envisioning a Paradigm Shift in OLAP Modeling. InProceedings of DOLAP. Vienna, Austria.
[21] Panos Vassiliadis, Patrick Marcel, and Stefano Rizzi. 2019. Beyond Roll-Upโsand Drill-Downโs: An Intentional Analytics Model to Reinvent OLAP. Infor-mation Systems 85 (2019), 68โ91.
[22] Leslie Owen Wilson. 2016. Anderson and Krathwohl - Bloomโs Taxonomy Re-vised. thesecondprinciple.com/teaching-essentials/beyond-bloom-cognitive-taxonomy-revised/.
[23] Kanit Wongsuphasawat, Yang Liu, and Jeffrey Heer. 2019. Goals, Process,and Challenges of Exploratory Data Analysis: An Interview Study. CoRRabs/1911.00568 (2019).
132