Plover Tsai: Clustered Indexes vs. Non-Clustered Indexes

2011年12月7日星期三

Clustered Indexes vs. Non-Clustered Indexes

A clustered index determines the physical order of data in a table. A clustered index is analogous to a telephone directory, which arranges data by last name. Because the clustered index dictates the physical storage order of the data in the table, a table can contain only one clustered index. However, the index can comprise multiple columns (a composite index), like the way a telephone directory is organized by last name and first name. (Microsoft)

這樣的設計有什麼好處呢？

A clustered index is particularly efficient on columns that are often searched for ranges of values. 如果我們為時間欄位建 clustered index，以後 SELECT 某範圍時間 query 的速度會比較快。為什麼？因為物理硬碟順序保證一件事：找頭找尾，兩邊一夾資料就是你的。
Clustered indexes are also efficient for finding a specific row when the indexed value is unique. 為什麼？這就是以前演算法教的 binary search algorithm，當然資料庫不會建簡單的 binary tree，理論上要建 B-tree 才夠入流。

因此，怎麼善用 clustered index 的優點，就得靠資料庫設計者的智慧。此外，我們還得考慮 INSERT/UPDATE clustered index 的成本 (頻繁操作 clustered index 會導致 B-tree 常常動來動去。此外，clustered index 建太長，搬動 clustered index 也很耗硬碟I/O)

至此只談到 clustered index，那 non-clustered index 呢？或者說，clustered index 那麼好用，為什麼還要建不好用的 non-clustered index？前面說過了，為了保持物理硬碟順序，INSERT/UPDATE clustered index 需要成本。

根據以上推論，non-clustered index 不用保持物理硬碟順序，但要有 index 的功能。因此：

The data rows are not sorted and stored in order based on their nonclustered keys.
The leaf layer of a nonclustered index does not consist of the data pages. (如果 clustered index 跟 non-clustered index 都用到某一欄位，這是有可能的：只要把 clustered index 建在兩個以上的欄位就可以了。所以，non-clustered index 最末端的葉子勢必不能放真正的資料，只能放 pointer。)

用合乎邏輯的方式看 clustered indexes & non-clustered indexes，就不會覺得很奇怪，也不用硬背她們。

Plover Tsai

2011年12月7日星期三

Clustered Indexes vs. Non-Clustered Indexes

沒有留言:

張貼留言

2011年12月7日 星期三

Clustered Indexes vs. Non-Clustered Indexes

沒有留言:

張貼留言

2011年12月7日星期三