摘要: 檢索性能的優(yōu)劣在數(shù)據(jù)倉庫應(yīng)用中是至關(guān)重要的,位圖索引在性能優(yōu)化中起關(guān)鍵作用之一, 并且它和傳統(tǒng)的B樹索引是不同的。通過實(shí)例描述了位圖索引的構(gòu)成原理,即:位圖索引是由一系列有序的位向量組成;詳細(xì)闡述了位圖索引的特性(包括優(yōu)點(diǎn)和缺點(diǎn))以及其使用條件。最后文章結(jié)論指出合理地使用位圖索引可以極大地改善大型系統(tǒng)的檢索效率和減少系統(tǒng)資源。
關(guān)鍵詞: 位圖索引;性能優(yōu)化;數(shù)據(jù)倉庫
中圖分類號:TP311 文獻(xiàn)標(biāo)識碼:A 文章編號:1009-3044(2013)02-0233-03
Application of Bitmap Indexes in Data Warehouse Performance Tuning
WANG Hui, WEI Shu-di, LIANG Xiao-man
(Dept. of Computer Science, Hengyang Normal University, Hengyang 421002, China)
Abstract: Query performance is critical in data warehouse applications, and bitmap indexes, which are quite different from the traditional B-tree indexes, play one of the important roles in performance tuning. The paper describes the principle of bitmap indexes through examples, i.e. a bitmap index is made up of an ordered series of bits, states detailedly the features (including advantages and disadvantages) and the usage conditions of bitmap indexes. Finally the paper concludes that using bitmap indexes properly can greatly improve query efficiency and reduce system resources in large systems.
Key words: bitmap index; performance tuning; data warehouse
With the rapid development of information technology in various industries, database (DB) and data warehouse (DW) applications are in existence more and more widely. In the applications, system performance is a critical issue, and becomes increasingly important as the systems get larger and more complex. Performance improvements and strategies vary in their effectiveness, and systems with different purposes, such as operational systems and decision support systems, require different performance skills[1]. In the phase of data modeling and physical design, besides the usage of materialized views[2], properly handling derived fields[3] and partitioning large tables, which can improve DB/DW performance to some extent, indexes provide the method for retrieving data rapidly. Creating, maintaining and using indexes properly can speed up data query processing exponentially through avoiding full table scan, and enormously optimize the DB/DW performance.
There are several index types available, and each index has benefits for certain situations[4][5]. Besides the traditional indexes, B-tree indexes, which are excellent for primary key and high-selective columns, bitmap indexes are very effective for low cardinality columns, i.e., they have a small number of distinct values. Through compression techniques, bitmap indexes can generate a large number of rowids with minimal I/O.
Bitmap indexes have been proposed and supported by some commercial DBMS. These data structures are particularly well adapted to DW environment. In this paper we first describe the principle of bitmap indexes in Section 1. In Section 2 we present the SQL statements for creation of bitmap indexes. We study the features of bitmap indexes in Section 3 and usage conditions of bitmap indexes in Section 4. We conclude the paper and present the significance of bitmap indexes in Section 5.
1 Principle of bitmap indexes
1.1 Construction of bitmap indexes
A bitmap index is made up of an ordered series of bits, one for each distinct value of the indexed column.[6]
Suppose that column C is the indexed column of table T, whose cardinality (the number of distinct values) is m, and the record number of table T is n (serial number: 1,2,…,k,…,n). The bitmap index on column C consists of m bit arrays (commonly called bitmaps) with length n, each of which refers to a distinct value in C. If the value of column C in the k-th record is ‘v’, the k-th bit of the bit array corresponding to ‘v’ is set to 1; if the value of column C in the k-th record is not ‘v’, the k-th bit of the bit array corresponding to ‘v’ is set to 0.
A picture is worth a thousand words, Figure 1 shows the example of bitmap indexes:
Figure 1 Example of bitmap indexes
Assume that the table “CUSTOMER” stores customers data with columns “CNO”, “SURNAME”, “GIVEN NAME”, “GENDER”, “MARITAL STATUS” and so on. The column “GENDER” has only two distinct values: Male and Female, and the column for marital status has only four values, namely, Single, Married, Divorced and Widow.
A bitmap index on GENDER column contains two bit arrays. The first one refers to ‘Male’, and the second to ‘Female’. If a customer is male, the bitmap entry for that customer consists of two bits, where the first bit is set to 1, and the second bit is set to 0. If a customer is female, the two bits in the bitmap entry for the customer are 0 and 1. Likewise, the second bitmap index contains four bit arrays, referring to ‘Single’, ‘Married’, ‘Divorced’ and ‘Widow’ respectively.
When a bitmap index is created, an entry is constructed for each row in the base table. Each entry carries the address or rowid of the base table row.
1.2 Query on bitmap indexes
A query on bitmap indexes is eventually converted to a Boolean logic operation on the bit arrays, which can efficiently merge the indexes that correspond to the conditions in a WHERE clause. The rows that don’t satisfy the conditions are filtered out before the base table itself is accessed. This extremely improves query performance through avoiding full table scan.[7]
Consider a query against the table “CUSTOMER” in Figure 1: Select the rows from the table “CUSTOMER” where gender is “Male” and marital status is “Single” or “Divorced”.
Since there are bitmap indexes on columns “GENDER” and “MARITAL STATUS” respectively, the query is based on the Boolean logic operation of corresponding bit arrays. Firstly, the bit array for “Single” is “1000”, and the bit array for “Divorced” is “0001”. The result of their OR operation is “1001”. Secondly, through AND operation with the result above and the bit array for “Male” (“1101”), we get “1001”. Thirdly, the address (or rowid) corresponding to the bit “1” is applied to access the base table to obtain the desired rows, which are the first and fourth ones in this case.
Figure 2 is the example of data retrieval on bitmap indexes.
Figure 2 Example of data retrieval on bitmap indexes
1.3 Compression of bitmap indexes
Consider the space size of a bitmap index. Let us say the records number of a table is n, and the cardinality of the indexed column is m, from the description above, the overall number (s) of bits among a bitmap index is equal to the multiplying product of n and m, i.e. s = n*m. If m is big enough, close to n, s is close to n2, which occupies a big space for the bitmap index.[8]
Software can compress each bitmap in a bitmap index to save space. Bitmap compression algorithms typically employ run-length encoding, such as the Byte-aligned Bitmap Code(BBC), the Word-Aligned Hybrid code(WAH), the Partitioned Word-Aligned Hybrid(PWAH) compression and so forth[9][10]. In these methods, the number of bits s can be reduced to n[log2n], rather than n2. More importantly, the compressed bitmap index can directly participate in bitwise operations without decompression.
The compression techniques are effective in both reducing index sizes and maintaining query performance. Especially, the higher the cardinality of the indexed column is, the more space is saved in the compressed index, and the more significant the compression is.
2 Creation of bitmap indexes
Bitmap index can be created with SQL statement, which is similar to the traditional B-tree index in syntax. The following is the creation statements for the bitmap indexes in Figure 1:
CREATE BITMAP INDEX gender_ind ON customer(gender) TABLESPACE tbs_ind ;
CREATE BITMAP INDEX marital_ind ON customer(marital_status) TABLESPACE tbs_ind ;
In the examples, the user must have the INDEX object privilege on the table to be indexed and quota on tablespace specified.
3 Features of bitmap indexes
There are some features for bitmap indexes.
(1)Since bitmap indexes only store bit arrays for the distinct values of indexed columns, for low cardinality indexed columns, bitmap indexes take significantly less space than other indexes such as B-tree indexes, especially using compression techniques.
(2)Bitmap indexes are effective for the queries using low-selectivity columns in a WHERE clause, through applying Boolean logic to merge the indexes with the conditions in the WHERE clause. Even bitmap indexes are compressed, the merges can be done without decompression.
(3)Bitmap indexes include rows with NULL values of indexed columns, unlike the traditional B-tree indexes. Indexing of nulls is useful for some SQL statements, such as queries with the aggregate function COUNT.
(4)Bitmap indexes are integrated with DBMS optimizer and execution engine, and can be used seamlessly in combination with the execution methods.
(5)If new values are introduced for the indexed columns, the corresponding bitmap indexes have to be reconstructed. So bitmap indexes are not suitable for OLTP applications where users modify the data frequently.
(6)Another disadvantage is that accessing the base table is necessary all the time after the bitmap indexes are accessed, unlike B-tree indexes, accessing the base table may not be required if the requested information is already contained in the B-tree indexes.
4 Usage conditions of bitmap indexes
As described above, bitmap indexes can improve some query performance and reduce storage requirements compared to other indexing techniques, however, there are some usage conditions for bitmap indexes.
(1)Bitmap indexes are ideally suitable for low cardinality columns, in which the number of distinct values is small, compared to the number of rows in the table. For instance, “GENDER”, “MARITAL STATUS” in Figure 1 are candidates for bitmap indexes, so are “EDUCATION”, “OCCUPATION”, “SALUTATION”, “NATIONALITY” etc. for customers.
(2)The advantages of using bitmap indexes are great for low-selectivity data, rather than the columns which are primarily queried with less than or greater than comparisons.
(3)Bitmap indexes are most effective for queries that contain multiple conditions, such as AND, OR, NOT, or equality, in the WHERE clause.
(4)Bitmap indexes are primarily intended for DW applications where there are large amounts of data and ad-hoc queries, and users query the data rather than update it. They are not suitable for OLTP applications with large numbers of concurrent DML transactions modifying the data.
5 Conclusion
Performance is substantial in a large data process system, and bitmap indexes play one of the important roles in query performance-tuning, especially in DW environment. The paper describes the principle, creation, features and usage conditions of bitmap indexes, and it is believable and significant that using bitmap indexes properly will greatly improve query efficiency and reduce system resources in large systems.
References:
[1] MOORE V,BARLOW J,BARRIERE V,etc.Oracle Database Performance Tuning Guide 10g Release 1[M].California, USA, Oracle Corp,2003.
[2] WANG Hui.Li Lang.pplication of Materialized View as Aggregate Table in Data Warehouse[J].Journal of Hengyang Normal University, 2012,33(6):59-62.
[3] WANG Hui,WEI Shu-di.Solution to handle the derived fields in database schema design[J].Computer Knowledge and Technology, 2012,8(28):6652-6654.
[4] CYRAN M, LANE P. Oracle Database Concepts 10g release 1[M].California, USA, Oracle Corp,2003.
[5] ZHAO Zhi-sheng,LI Jiang,YE Yun-long.The Research on Index Strategy in Data Warehouse's Model Desigh[J].Computer Development & Applications, 2010,23(1):8-10,17.
[6] PONNIAH P.Data Warehousing Fundamentals for IT Professionals[M], 2nd ed. New Jersey USA John Wiley & Sons, Inc. 2010.
[7] QIN Xue-yong,YAO Yan-sheng.Research and Analysis on Some Key Problems of Scalable Data Warehouse[J].Computer Technology and Development, 2006,16(12):136-138.
[8] WAN Huai-yu,HUANG Hou-kuan.Study on bitmap index and its application in data warehouse[J].Railway Computer Application, 2006,15(12):31-33.
[9] XIA Fang,CHEN Hong,CAO Li-qiang et al.Using Bitmap Index to Accelerate Accessing Large Scale Scientific Data on Demand[J].Journal of Computer Research and Development, 2011, 48(Suppl.): 94-99.
[10] WU Kesheng,SHOSHANI Arie,STOCKINGER,Kurt.Analyses of Multi-Level and Multi-Component Compressed Bitmap Indexes[J].ACM Transactions on Database Systems, 2010. 35(1):2:1-2:52.