On-Line Analytical Processing with Conceptual Information Systems
Gerd Stumme
Technische Universität Darmstadt, Fachbereich Mathematik
Schloßgartenstr. 7, D-64289 Darmstadt, stumme@mathematik.tu-darmstadt.de
Abstract. A Conceptual Information System consists of a database together with conceptual hierarchies. The management system TOSCANA visualizes arbitrary combinations of conceptual hierarchies by nested line diagrams and allows an on-line interaction with a database to analyze data conceptually. The paper describes the conception of Conceptual Information Systems and discusses the use of their visualization techniques for On-Line Analytical Processing (OLAP).
1 Introduction
A Conceptual Information System consists of a (relational) database together with conceptual hierarchies. These hierarchies, called conceptual scales, are used to support navigation through the data. An important factor for the success of Conceptual Information Systems is the visualization of conceptual scales by line diagrams. By combining conceptual scales in nested line diagrams, a large variety of perspectives can be generated interactively, in which relationships and dependencies can be investigated. The management system TOSCANA allows an on-line interaction with a database to analyze and explore data conceptually.

On-Line Analytical Processing (OLAP) relies on the metaphor of a (high-dimensional) cube containing the data. For dimensions which are not structured hierarchically, the cube metaphor provides a good intuitive understanding of multidimensional data. But an essential feature of OLAP dimensions is that they are ordered hierarchically: days roll up into months, months into quarters and years, products into product groups and product lines. Often the hierarchies are trees (simple hierarchies), but they may be arbitrary partially ordered sets (multiple hierarchies). In this setting, the cube metaphor, which reflects the mathematical construction of a direct product of linear vector spaces, is not the most natural one, since the hierarchies have to be forced into a flat linear form. Instead of listing the hierarchies on (one-dimensional) axes, we suggest visualizing them by line diagrams. By using nested line diagrams, arbitrary dimensions can be combined for ad hoc analysis.
2 Conceptual Information Systems
Conceptual Information Systems are based on the mathematical theory of Formal Concept Analysis. The aim of Formal Concept Analysis (cf. [11], [2]) is a mathematical formalization of the concept 'concept'. It reflects the philosophical understanding of concepts as units of thought consisting of two parts: the extension, containing all objects which belong to the concept, and the intension, containing the attributes shared by all those objects. This is modeled by formal concepts that are derived from a formal context.

[Figure 1: cross table of the formal context and line diagram of its concept lattice. Objects: Piccolo, GLH, Big River Harp, Marine Band, Marine Band SBS, Marine Band Soloist, Marine Band Oktav, Auto Valve Harp, Chromatic Koch, Slide Harp. Attributes: 20 tongues, 24 tongues, 28 tongues, 40 tongues, Wood, Plastic, C major, D major, E major, F major, G major, A major, B flat major.]
Fig. 1. Formal context of harps and its concept lattice

Definition. A (formal) context is a triple $K := (G, M, I)$ where $G$ and $M$ are sets and $I \subseteq G \times M$ is a relation between $G$ and $M$. The elements of $G$ and $M$ are called objects and attributes, respectively, and $gIm$ is read "the object $g$ has the attribute $m$". A (formal) concept is a pair $(A, B)$ such that $A \subseteq G$ and $B \subseteq M$ are maximal with $A \times B \subseteq I$. The set $A$ is called the extent and the set $B$ the intent of the concept. The hierarchical subconcept-superconcept relation of concepts is formalized by
$(A, B) \le (C, D) :\iff A \subseteq C \ (\iff B \supseteq D)$.
The set of all concepts of the context $K$ together with this order relation is a complete lattice that is called the concept lattice of $K$ and is denoted by $\mathfrak{B}(K)$.

Example. In Figure 1, a formal context of the Richter Harps produced by Hohner Inc. is given. The relation $gIm$ is read as 'harp $g$ is available with feature $m$'. In the line diagram, the circles stand for the concepts. A concept is a subconcept of another if there is an ascending path of straight line segments from the former to the latter. The extent [intent] of each concept contains all objects [attributes] which can be reached from the concept on a descending [ascending] path. If we are, for example, interested in a wooden harp tuned in D major, then we take the largest concept that has Wood and D major in its intent. This concept is represented by the circle just above the label 28 tongues. The extent of this concept contains Marine Band SBS, Marine Band, and Auto Valve Harp, so these are exactly the harps available in wood and D major. Besides Wood and D major, the intent of this concept contains the features A major, F major, G major, and C major, so the three harps are available in these tunings as well. This corresponds to a functional dependency in database theory.

In many applications, attributes are not one-valued as in the previous example, but allow a range of values. This is modeled by many-valued contexts. In order to obtain a concept lattice, many-valued contexts are 'translated' into one-valued contexts by conceptual scales.

Definition. A many-valued context is a tuple $(G, M, (W_m)_{m \in M}, I)$ where $G$ and $M$ are sets of objects and attributes, respectively, $W_m$ is a set of values for each $m \in M$, and $I \subseteq G \times \bigcup_{m \in M} (\{m\} \times W_m)$ such that $(g, m, w_1) \in I$ and $(g, m, w_2) \in I$ imply $w_1 = w_2$. A conceptual scale for an attribute $m \in M$ is a context $S_m := (G_m, M_m, I_m)$ with $W_m \subseteq G_m$. The context $(G, M_m, J)$ with
$gJn :\iff \exists w \in W_m : (g, m, w) \in I \wedge (w, n) \in I_m$
is called the realized scale for the attribute $m$.

Conceptual Information Systems consist of a many-valued context together with a collection of conceptual scales. The many-valued context is implemented as a relational database. The collection of the scales is called the conceptual scheme. It is written in the description language ConScript ([9]). Besides the contexts of the conceptual scales, the conceptual scheme also contains the layout of their line diagrams. The layout has to be provided in advance, since experience has shown that readable line diagrams in general cannot be generated fully automatically. For Conceptual Information Systems, the management system TOSCANA ([3], [10]) has been developed. Based on the paradigm of conceptual landscapes of knowledge ([14]), TOSCANA supports navigation through the data by using the conceptual scales like maps which are designed for different purposes and in different granularities.

Example. Figure 2 shows a realized scale of a Conceptual Information System on pipelines ([8]). The many-valued context consists of 3961 pipes, fittings, etc., and of 54 many-valued attributes. It is meant to support engineers in choosing suitable parts for a projected pipeline system. Since there are almost 4000 objects, the scale does not display their names, but only the contingents, i.e., the numbers of objects. One can see, for instance, that 52 + ... + 27 = 348 of the 3961 different parts are flanges (German: Flansche), which are differentiated further according to the German Industrial Standards (DIN). By zooming into this concept, one can see the distribution of the 348 flanges according to another conceptual scale, e.g., the inner diameter or the wall thickness.

For the exploration of relationships between different attributes, it is desirable to visualize more than one conceptual scale at a time. Nested line diagrams are used to show the direct product of the scales. We introduce them in the next section, where we also discuss their role for On-Line Analytical Processing.
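The definition of formal concepts given in this section can be made concrete with a small sketch. The context below is a hypothetical toy example (the cross table of Fig. 1 is not reproduced in the text, so the harp data are not used), and the brute-force enumeration over attribute subsets is only an illustration, not the algorithm used by TOSCANA. All names (`context`, `extent`, `intent`, `concepts`) are assumptions made for this sketch.

```python
from itertools import combinations

# Hypothetical toy context (NOT the harp context of Fig. 1):
# each object is mapped to the set of attributes it has.
context = {
    "h1": {"wood", "C major", "D major"},
    "h2": {"wood", "C major"},
    "h3": {"plastic", "C major", "D major"},
}
attributes = set().union(*context.values())


def extent(attrs):
    """All objects that have every attribute in attrs."""
    return frozenset(g for g, has in context.items() if attrs <= has)


def intent(objs):
    """All attributes shared by every object in objs."""
    if not objs:
        return frozenset(attributes)
    return frozenset(set.intersection(*(context[g] for g in objs)))


def concepts():
    """Enumerate all formal concepts (A, B) by closing every attribute subset."""
    found = set()
    for r in range(len(attributes) + 1):
        for subset in combinations(sorted(attributes), r):
            a = extent(set(subset))
            found.add((a, intent(a)))
    return found


all_concepts = concepts()

# Largest concept having "wood" and "D major" in its intent,
# analogous to the query in the harp example.
query_extent = extent({"wood", "D major"})
largest = (query_extent, intent(query_extent))
```

Enumerating all attribute subsets is exponential in the number of attributes, so this only works for very small contexts; it is meant to mirror the definition, not to be efficient.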
3 On-Line Analytical Processing
On-Line Analytical Processing (OLAP) has become almost synonymous with multidimensional data. OLAP addresses many topics, like data preprocessing and efficient data storage for supporting the analysis process (cf., e.g., [5]). Here, we focus on the visualization of the data.

[Figure 2: line diagram of the realized scale. Part types: Rohrbogen (pipe bends), Flansche (flanges), Reduzierstücke (reducers), T-Stücke (T-pieces), Rohre (pipes). Further labels: DIN 2605-1, DIN 2605-2, DIN 2631 to DIN 2638, DIN 2641, DIN 2642, Nd6, Nd10. Contingents shown: 1213, 1215, 240, 560, 385, 52, 30, 5, 30, 72, 15, 48, 33, 36, 27.]
Fig. 2. Realized scale 'Part Type'

Definition. A dimension is a set $D$; its elements are called its members. Let $\mathcal{D} := \{D_1, D_2, \dots, D_n\}$ be a set of dimensions. Each tuple of $X_{\mathcal{D}} := D_1 \times D_2 \times \dots \times D_n$ is called a member combination. It addresses a single data point called a cell. A variable is a partial function from $X_{\mathcal{D}}$ to $V$, where $V$ is a set; its value at $(d_1, \dots, d_n)$ is the value of the cell addressed by the member combination $(d_1, \dots, d_n)$. The set $\mathcal{D}$ together with one or more variables is called the data cube.

Example. Our example is about sales data of a (fictitious) soft-drink wholesale company. Suppose that we want to examine the sales of beverages depending on time, region, and type of product. Thus we have three dimensions: region, product, and time. Let us say that they consist of the members
$D_{region}$ := {total, europe, america, north america, south america, asia},
$D_{product}$ := {total, mineral water, juice, orange juice, apple juice, cola},
$D_{time}$ := {1996, 1st quarter 1996, 2nd quarter 1996, 3rd quarter 1996, 4th quarter 1996, 1997, 1st quarter 1997, 2nd quarter 1997, 3rd quarter 1997, 4th quarter 1997}.
In a real application, there will of course be more dimensions and a much finer granularity, for instance down to city or shop level for region, or to day (or even hour) level for time. The sales (in million gallons) are represented by a function sales: $D_{region} \times D_{product} \times D_{time} \to \mathbb{R}^+$. We can imagine the sales as stored in a three-dimensional cube, where the edges are labeled with the members of region, product, and time, respectively. Most OLAP tools display the data in a spreadsheet as in Fig. 3. For instance, we see that sales(cola, north america, 1st quarter 1997) = 89.

                      Total  Europe  America  North  South  Asia
Mineral Water  1997     837     442      268    174     94   127
               1Q7      191      99       63     41     22    29
               2Q7      201     102       66     43     23    33
               3Q7      274     141       82     51     31    51
               4Q7      171     100       57     39     18    14
Cola           1997    1523     432      673    375    298   418
               1Q7      364      99      160     89     71   105
               2Q7      378     103      171     91     80   104
               3Q7      405     120      189    103     86    96
               4Q7      376     110      153     92     61   113
Juice          1997     816     360      257    170     87   199
               1Q7      189      81       62     41     21    46
               2Q7      200      85       63     42     21    52
               3Q7      223      99       68     44     24    56
               4Q7      204      95       64     43     21    45
Total          1997    3176    1234     1198    719    479   744
               1Q7      744     279      285    171    114   180
               2Q7      779     290      300    176    124   189
               3Q7      902     360      339    198    141   203
               4Q7      751     305      274    174    100   172

Fig. 3. Visualization of the data cube in a spreadsheet (nested diagram)
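To illustrate the definition of a variable as a partial function on member combinations, the following sketch stores a few cells of the sales variable from Fig. 3 in a dictionary. The dictionary layout, the key order (product, region, time), and the helper name `value` are assumptions chosen for illustration; they do not describe how TOSCANA or any OLAP tool stores a cube.

```python
# A few cells of the sales variable from Fig. 3 (in million gallons).
# Keys are member combinations (product, region, time); the dict layout
# is an illustrative assumption, not an actual storage format.
sales = {
    ("cola", "north america", "1Q 1997"): 89,
    ("cola", "south america", "1Q 1997"): 71,
    ("cola", "america", "1Q 1997"): 160,
    ("mineral water", "europe", "1Q 1997"): 99,
}


def value(cube, product, region, time):
    """Return the value of the addressed cell, or None where the
    partial function is undefined (the cell holds no data)."""
    return cube.get((product, region, time))


cola_north_q1 = value(sales, "cola", "north america", "1Q 1997")
missing = value(sales, "juice", "asia", "1Q 1996")
```

Returning `None` for an absent key models the partiality of the variable: not every member combination has to address a populated cell.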
Definition. A hierarchy on a dimension $D$ is a partially ordered set $H := (D, \le)$. It is called a simple hierarchy if it is a tree. Otherwise it is called a multiple hierarchy (within the dimension $D$).
Typically, aggregation follows the hierarchy from bottom to top. The type of aggregation depends on the type of variable. For most variables (e.g., budget or sales) the values are summed up. But other ways of aggregation are in use as well. For instance, for share prices or inventory numbers, usually the average is computed.

Example. The hierarchies of the three dimensions product, region, and time are shown in Figure 4. They are all simple hierarchies (trees). The sales are aggregated by summation in all dimensions. Orange juice and apple juice roll up to juice, and juice, mineral water, and cola roll up to total.

In OLAP terminology, diagrams as in Fig. 3 are called nested diagrams. In this section, we examine how nested line diagrams of Conceptual Information Systems can be used as an alternative method of data visualization. Figure 5 shows how the data cube is composed as a direct product of the dimensions, where the members of each dimension are ordered in a linear way. Many tools additionally indicate the hierarchies on the dimensions, in the way a PC file manager displays the folder/subfolder hierarchy. But the basically linear arrangement is essential for the cube metaphor. Since the hierarchies model the analyst's basic conceptual view of the data, they should play a prominent role in the visualization. Indeed, tree diagrams are often used for displaying one single hierarchy, as in Figure 4. But if two or more hierarchies occur simultaneously, then this visualization technique is dropped.
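The bottom-up aggregation by summation described above can be sketched as follows for the region hierarchy. The leaf values are the mineral water sales for the first quarter of 1997 from Fig. 3; the child-to-parent map and the function name `roll_up` are assumptions made for this sketch, and the approach only covers tree hierarchies aggregated by summation.

```python
# Child -> parent map for the region hierarchy (a tree).
parent = {
    "north america": "america",
    "south america": "america",
    "europe": "total",
    "america": "total",
    "asia": "total",
}

# Leaf-level mineral water sales for 1Q 1997, taken from Fig. 3.
leaf_sales = {"north america": 41, "south america": 22, "europe": 99, "asia": 29}


def roll_up(leaves, parent):
    """Aggregate leaf values bottom-up by summation along a tree hierarchy."""
    totals = dict(leaves)
    for member, val in leaves.items():
        node = member
        # Add the leaf value to every ancestor on the path to the root.
        while node in parent:
            node = parent[node]
            totals[node] = totals.get(node, 0) + val
    return totals


rolled = roll_up(leaf_sales, parent)
```

The results for `america` (63) and `total` (191) agree with the corresponding entries of the spreadsheet in Fig. 3. For variables such as share prices, the summation would have to be replaced by a different aggregate, for instance an average.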
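[Figure 4: tree diagrams of the three hierarchies.
Product: Total, with children Mineral Water, Juice (with children Orange J. and Apple J.), and Cola.
Region: Total, with children Europe, America (with children North and South), and Asia.
Time: 1996, with children 1Q6, 2Q6, 3Q6, 4Q6; 1997, with children 1Q7, 2Q7, 3Q7, 4Q7.]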
Fig. 4. The hierarchies

In Conceptual Information Systems, nested line diagrams are used for displaying line diagrams of large partially ordered sets (especially conceptual scales). Hierarchical dimensions roughly correspond to conceptual scales, so OLAP analysis tools can roughly be seen as special Conceptual Information Systems. Nested line diagrams can be used for drawing direct products of the dimensions. In contrast to nested diagrams, they not only provide all member combinations, but also reflect the derived order:

Definition. Let $H_i := (D_i, \le_i)$, $i = 1, \dots, n$, be hierarchies. Then the derived order on the direct product $H := (D, \le)$ with $D := D_1 \times D_2 \times \dots \times D_n$ is defined by
$(d_1, \dots, d_n) \le (e_1, \dots, e_n) :\iff \forall i \in \{1, \dots, n\} : d_i \le_i e_i$.

Example. The nested line diagram of the direct product of the three dimensions region, product, and time (see Fig. 5) is displayed in Figure 6. The derived order can be read by following ascending paths. For this purpose, the lines of the outer two levels have to be replaced by sheaves of 4 and 5 · 4 = 20 parallel lines, respectively, linking corresponding elements. For instance, (south america, 3rd quarter, mineral water) $\le$ (america, total, mineral water), since the cell addressed by the latter member combination can be reached by an ascending path from the former one. To find out how much cola was sold in North America in the first quarter of 1997, we look at the lower left ellipse (labeled north) in Fig. 6. In the leftmost ellipse (1. Q.), we find the entry 89 in the right box (cola).

Clearly this representation needs more space than the one in Fig. 3. Its advantage is the clear structuring along the dimension that is most important from the analyst's current point of view (which is chosen as the outermost hierarchy). Figure 6 shows that displaying a partially ordered set with 120 elements is close to the limits of the system. The spreadsheet can display even larger data volumes, and still look neat at first glance.
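The derived order on the direct product can be checked componentwise, as in the following sketch. It assumes that each hierarchy is a tree given as a child-to-parent map, which matches the example but not the general case of multiple hierarchies; the names `leq` and `derived_leq` and the member labels such as "3Q 1997" are assumptions made here. The top of the time hierarchy restricted to 1997 is written as "1997".

```python
# Child -> parent maps (trees) for the three hierarchies, restricted to 1997.
region = {"north america": "america", "south america": "america",
          "europe": "total", "america": "total", "asia": "total"}
product = {"orange juice": "juice", "apple juice": "juice",
           "juice": "total", "mineral water": "total", "cola": "total"}
time = {"1Q 1997": "1997", "2Q 1997": "1997",
        "3Q 1997": "1997", "4Q 1997": "1997"}


def leq(parent, d, e):
    """d <= e in a tree hierarchy: e is d itself or one of its ancestors."""
    while d != e and d in parent:
        d = parent[d]
    return d == e


def derived_leq(hierarchies, ds, es):
    """Derived order on the direct product: compare every component."""
    return all(leq(h, d, e) for h, d, e in zip(hierarchies, ds, es))


dims = [region, time, product]
lower = ("south america", "3Q 1997", "mineral water")
upper = ("america", "1997", "mineral water")
below = derived_leq(dims, lower, upper)
above = derived_leq(dims, upper, lower)
```

Here `below` evaluates to `True` and `above` to `False`, which corresponds to the ascending path from the first member combination to the second in the nested line diagram.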
As in typography, however, the most important aim is not to provide a neat representation, but one that supports easy reading ([7]). TOSCANA is designed for a more general approach, allowing more complicated scales than the
[Figure 5: data cube with axes labeled Product, Region, and Time.]

[Figure 6: nested line diagram. Outer hierarchy: region (Total, Europe, America, North, South, Asia); middle hierarchy: time (1997, 1. Q. to 4. Q.); inner hierarchy: product (Total, Mineral Water, Juice, Cola). The cell values are those of Fig. 3.]
Fig. 6. Visualization as nested line diagram

Drill-Down. Restricting the number of dimensions means that one can look at them in more detail. Instead of only examining the topmost levels of their hierarchies, one wants to see a finer granularity. This unfolding of the hierarchies is called drill-down. Figure 7 shows the result of zooming into the top element of the outer hierarchy in Fig. 6 (i.e., letting region = "total"), and drilling down the product hierarchy. Additionally, the time dimension has been extended to two years. At the moment, TOSCANA does not support this way of drill-down very well. The hierarchies have to be prepared in advance and cannot be changed on the fly, because of the unsolved problem of a satisfactory automatic layout algorithm
[Figure 7: nested line diagram for region = total. Outer hierarchy: time (1996 and 1997, each with 1. Q. to 4. Q.); inner hierarchy: products (Total, Mineral Water, Juice with Orange J. and Apple J., Cola).]
Fig. 7. Zooming with drill-down

for partially ordered sets. In the next section, we discuss how this problem can be addressed. The current solution is to provide scales with different levels of granularity between which the analyst can choose. Another way of drill-down is to refer to external information sources, e.g., databases of the transaction systems or Internet sites. In TOSCANA, such references can be attached to each data cell. A mouse click opens a report generated by the database, or a Web browser.

Pivoting. Different questions require different views on the data. In a spreadsheet display, this means that the dimensions listed on the vertical and horizontal axes are interchanged. This operation is called pivoting or rotating. For nested line diagrams, it corresponds to permuting the inner and the outer hierarchies. The diagram in Fig. 7 can be used to examine the question "How does the composition of sold products change over time?", while the pivoted version is more suitable for investigating "How do the sales evolve over time for each product?". Pivoting of hierarchies is implemented in TOSCANA.
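Pivoting as a permutation of the outer and inner hierarchies can be sketched on a small excerpt of the data. The cells below are the values for region = total from Fig. 3; the function `nest` and its `order` parameter are hypothetical names introduced for this illustration and do not reflect TOSCANA's implementation.

```python
from collections import defaultdict

# Cells for region = total in 1997, taken from Fig. 3; keys are (time, product).
cells = {
    ("1Q 1997", "cola"): 364, ("1Q 1997", "juice"): 189,
    ("2Q 1997", "cola"): 378, ("2Q 1997", "juice"): 200,
}


def nest(cells, order):
    """Group cells as outer -> inner -> value.

    `order` is a permutation of the key positions: (0, 1) keeps the
    first key component as the outer level, (1, 0) swaps the levels.
    """
    nested = defaultdict(dict)
    for key, val in cells.items():
        outer, inner = (key[i] for i in order)
        nested[outer][inner] = val
    return dict(nested)


by_time = nest(cells, (0, 1))     # outer: time, inner: product
by_product = nest(cells, (1, 0))  # pivoted: outer: product, inner: time
```

With time as the outer level, each quarter shows the composition of products sold; after pivoting, each product shows its development over the quarters. The cell values themselves are unchanged.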
5 Further developments
How can TOSCANA be fine-tuned for the specific structure of OLAP data? TOSCANA was not originally designed for OLAP. In its main applications, a node in the diagram often has more than one label attached, but typically not all nodes are labeled. TOSCANA usually displays the list of labels beside the nodes. In diagrams such as Fig. 6, this would lead to an overfull diagram. But in OLAP applications, each node has exactly one label. Hence the label can be written directly in the node. In the figures in this paper, this had to be done manually, since this feature is not yet supported by TOSCANA. The readability can be improved further by adapting the layout of the labels to this characteristic.

Drill-down requires techniques for extending and pruning hierarchies on the fly. For arbitrary hierarchies (and even for lattices), the development of fully automatic algorithms providing satisfactory diagrams is an open challenge. But most OLAP hierarchies are trees, which can be drawn automatically. Besides supporting drill-down, an automatic layout routine can solve the problem of efficiently exploiting the whole screen space.

Data cubes are usually only sparsely populated. Often less than 10% of the cells contain data. For the visualization this implies that the whole direct product of the hierarchies need not be displayed. In this case local scaling ([6]) can be used for automatic pruning. Its implementation in TOSCANA is under consideration.
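The remark that tree hierarchies can be drawn automatically can be illustrated with a deliberately naive layout rule: leaves receive consecutive horizontal positions, and every inner node is centered above its first and last child. This is a generic textbook-style sketch under that assumption, not the layout method of TOSCANA, and the function name `layout` is hypothetical.

```python
def layout(children, root):
    """Assign (x, depth) to every node of a tree.

    Leaves get consecutive x positions from left to right; an inner node
    is centered between its first and last child.
    """
    pos = {}
    next_x = 0

    def place(node, depth):
        nonlocal next_x
        kids = children.get(node, [])
        if not kids:
            x = float(next_x)
            next_x += 1
        else:
            xs = [place(k, depth + 1) for k in kids]
            x = (xs[0] + xs[-1]) / 2
        pos[node] = (x, depth)
        return x

    place(root, 0)
    return pos


# Product hierarchy of the example, given as parent -> list of children.
product_tree = {
    "total": ["mineral water", "juice", "cola"],
    "juice": ["orange juice", "apple juice"],
}
positions = layout(product_tree, "total")
```

Depth 0 is the top of the hierarchy, so the coordinates can be mapped to the screen with the root at the top, as in a line diagram. The rule does not try to use the screen space efficiently; it only shows that trees, unlike arbitrary partially ordered sets, admit a simple fully automatic placement.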
References
1. E. F. Codd, S. B. Codd, C. T. Salley: Providing OLAP (On-Line Analytical Processing) to User-Analysts: An IT Mandate. /essbase/ wht ppr/coddTOC.html
2. B. Ganter, R. Wille: Formale Begriffsanalyse: Mathematische Grundlagen. Springer, Heidelberg 1996 (English translation to appear)
3. W. Kollewe, M. Skorsky, F. Vogt, R. Wille: TOSCANA - ein Werkzeug zur begrifflichen Analyse und Erkundung von Daten. In: R. Wille, M. Zickwolff (eds.): Begriffliche Wissensverarbeitung - Grundfragen und Aufgaben. B. I.-Wissenschaftsverlag, Mannheim 1994
4. OLAP Council: OLAP glossary. 1995. /research/
5. Pilot Software: An introduction to OLAP: Multidimensional terminology and technology. White Paper, Pilot Software, 1997, /olap/olap.htm
6. G. Stumme: Local scaling in conceptual data systems. LNAI 1115, Springer, Heidelberg 1996, 308-320
7. J. Tschichold: Erfreuliche Drucksachen durch gute Typographie: eine Fibel für jedermann. Maro-Verlag, Augsburg, 2nd edition 1992
8. N. Vogel: Ein Begriffliches Erkundungssystem für Rohrleitungen. TH Darmstadt 1995
9. F. Vogt: Datenstrukturen und Algorithmen zur Formalen Begriffsanalyse: Eine C++-Klassenbibliothek. Springer, Heidelberg 1996
10. F. Vogt, R. Wille: TOSCANA - A graphical tool for analyzing and exploring data. LNCS 894, Springer, Heidelberg 1995, 226-233
11. R. Wille: Restructuring lattice theory: an approach based on hierarchies of concepts. In: I. Rival (ed.): Ordered sets. Reidel, Dordrecht-Boston 1982, 445-470
12. R. Wille: Line diagrams of hierarchical concept systems. Int. Classif. 11 (1984), 77-86
13. R. Wille: Lattices in data analysis: how to draw them with a computer. In: I. Rival (ed.): Algorithms and order. Kluwer, Dordrecht-Boston 1989, 33-58
14. R. Wille: Conceptual landscapes of knowledge: A pragmatic paradigm of knowledge processing. In: Proc. KRUSE '98, Vancouver, Canada, 11.-13. 8. 1997, 2-13