Forcing SQL to use an index, SQL Server indexes: Optimizing the SQL Server index strategy

2023-10-18


Index strategies overview

This article covers techniques for optimizing a SQL Server index strategy. It is an appendix to the SQL index overview and strategy article, in which I covered what indexes actually do and how to create them, and briefly mentioned some index design guidelines. I also presented an example of how to design indexes by tuning and optimizing queries. I tried to cover everything there, but there is always more to say when it comes to SQL Server indexes.

So, this is going to be another brief article that goes over a lot of important areas and takes a high-level look at what kind of strategy we should employ for each major kind of index.

If you read the initial article that I plugged at the beginning, you’ll notice that I already covered some of this material. Here we’re going to look at it again from a higher level and with some different examples, so we can get another perspective on how it all works.

Clustered indexes

To kick off, let’s see what clustered SQL Server indexes are and how they should be used. A clustered index is one that physically stores the table’s data sorted by the key. You have most likely heard of clustered indexes and have some understanding of what they are and how they work, but I bet not everyone has heard about heaps. Heaps are tables that don’t have a clustered index. They are just a pile of data, sorted neither logically nor physically. That raises the question: is there any situation where we should use a heap? In other words, is there any situation where we wouldn’t put a clustered index on a table? The answer is: rarely. The best-case scenario is a clustered index on every table, but some tables are so small that an index simply isn’t beneficial, because SQL Server would choose to do a table scan anyway, so the cost of having and maintaining the index would be greater than the cost of not having one.

The bottom line is that heaps are generally bad because they represent unorganized data. They cause a lot more I/O for SQL Server to handle, and the best way to organize this pile of data is simply to put a clustered index on it.

Having a clustered SQL Server index on a table means the data will be physically stored and organized on disk in key order. The key is simply a field that we choose when creating the clustered index. It also forms the base for all non-clustered indexes, which store the clustered key as the pointer to where the data sits inside the clustered index. Therefore, it is important that the key has the following characteristics:

Static – It’s really important to choose a static key, in other words one that doesn’t change. If we choose a field that is often modified, then every time a key value changes SQL Server has a lot of reorganizing to do at the physical level. In addition, all non-clustered indexes need to be updated to reflect the change.
Narrow – A narrow key is equally important because non-clustered indexes store the clustered index key as their pointer. If we have a wide key, every non-clustered index stores that wide key, which requires more data pages and ultimately more I/O, memory, CPU, and work for SQL Server.
Unique – A good clustered key is unique. If we choose a non-unique key, SQL Server makes it unique anyway by adding a four-byte uniquifier to rows with duplicate key values. That is why uniqueness is directly connected to narrowness: a key that isn’t unique ends up wider than its columns suggest.
Sequential – Finally, we want a sequential key, one that always increases. This ensures that new data is placed in order, at the end of the index. If the key is not sequential, SQL Server has to shift rows around to make room, which creates fragmentation and potential problems along the way.

So, when choosing a key for a clustered SQL Server index, ask yourself four questions: is it static, narrow, unique, and sequential? If it has all of these characteristics, then you have a very good clustered index key.
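To make the cost of a wide key concrete, here is a rough back-of-the-envelope sketch in Python. It assumes each non-clustered index row is just its own key plus the clustered key, and it ignores row overhead, page headers, and compression. The row count, index count, and index key width are made-up numbers; only the 4-byte INT and 16-byte UNIQUEIDENTIFIER sizes are real.

```python
# Rough estimate of how the clustered key width inflates non-clustered indexes.
# Assumption: index row size = non-clustered key bytes + clustered key bytes.

def nonclustered_bytes(rows, index_key_bytes, clustered_key_bytes, index_count):
    """Approximate total bytes spent across all non-clustered indexes."""
    per_row = index_key_bytes + clustered_key_bytes
    return rows * per_row * index_count

ROWS = 10_000_000      # hypothetical table size
INDEX_KEY_BYTES = 8    # hypothetical width of each non-clustered key
INDEX_COUNT = 5        # hypothetical number of non-clustered indexes

int_total = nonclustered_bytes(ROWS, INDEX_KEY_BYTES, 4, INDEX_COUNT)    # INT key
guid_total = nonclustered_bytes(ROWS, INDEX_KEY_BYTES, 16, INDEX_COUNT)  # GUID key

print(f"INT clustered key:  {int_total / 1024**2:.0f} MiB")
print(f"GUID clustered key: {guid_total / 1024**2:.0f} MiB")
print(f"extra bytes carried by the wider key: {guid_total - int_total}")
```

Under these assumptions the wider key doubles the space taken by the non-clustered indexes, even though the table's own columns did not change.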

Also worth mentioning is that almost every table should have a clustered index, except for the very rare cases when a heap is okay.

Identity columns are good because they have all of the characteristics of an excellent clustered index key. The only downside of an identity column is that it doesn’t describe the data in any way. If you can find a natural key that has these characteristics, that’s great, but if you can’t, don’t force it. Why? Because if you force it, there is a big chance that the key will lack some of the characteristics mentioned above.

GUID columns are also acceptable, but the big problem with them is that they are not narrow. On the contrary, at 16 bytes they are very wide. They solve the problem of generating unique IDs almost anywhere, but they are not a great clustered key, and in addition they are not sequential by default. The general rule of thumb is to use GUID columns only if you have to, because they solve a bigger problem. Sure, you can generate sequential GUIDs with NEWSEQUENTIALID(), but there’s a catch: after the server restarts, it can start generating values from a lower range than the one it was using before the restart.
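Why a sequential key matters can be shown with a small simulation. This is a simplified model of an index leaf level, not SQL Server's actual storage engine: pages hold a fixed, hypothetical number of keys, an insert into a full page in the middle of the index splits it in half, and an insert past the end of the last page just allocates a new page.

```python
import bisect
import random

PAGE_CAPACITY = 100  # hypothetical number of rows that fit on one page

def load(keys):
    """Insert keys one by one; return (page_count, mid_index_page_splits)."""
    pages = [[]]   # leaf level: a list of pages, each a sorted list of keys
    splits = 0
    for key in keys:
        firsts = [p[0] for p in pages if p]          # first key on each page
        idx = max(bisect.bisect_right(firsts, key) - 1, 0)
        page = pages[idx]
        if len(page) < PAGE_CAPACITY:
            bisect.insort(page, key)                 # room left, no split
        elif idx == len(pages) - 1 and key > page[-1]:
            pages.append([key])                      # append at the end, nothing moves
        else:
            half = len(page) // 2                    # full page: move half the rows
            new_page = page[half:]
            del page[half:]
            pages.insert(idx + 1, new_page)
            splits += 1
            bisect.insort(page if key < new_page[0] else new_page, key)
    return len(pages), splits

sequential_keys = list(range(1, 5001))               # identity-like keys
random_keys = sequential_keys[:]
random.Random(42).shuffle(random_keys)               # GUID-like arrival order

seq_pages, seq_splits = load(sequential_keys)
rand_pages, rand_splits = load(random_keys)
print(f"sequential: {seq_pages} pages, {seq_splits} splits")
print(f"random:     {rand_pages} pages, {rand_splits} splits")
```

In this model the sequential load fills every page completely and never splits one, while the random load keeps splitting pages and leaves them partly empty, which is the fragmentation the text describes.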

Multiple columns are also generally bad as a clustered key for the same reason as GUID columns: they are not narrow, so we run into the same issue with the characteristics above.

The bottom line is to try to stay away from composite keys. The one use case where they are okay is intermediate tables. An intermediate table represents a many-to-many relationship, so it holds a few keys from the tables it links, and most of the time those are integers. A couple of integers as a composite key is okay because the key is still pretty narrow. But if you choose a composite key made up of natural keys, you can get into trouble. So, the ultimate advice is to pay attention to those four characteristics, as this is the best strategy.

Non-clustered indexes

Non-clustered SQL Server indexes are the workhorses when it comes to the performance of our databases. They are a separate structure that is logically sorted and points to the physical data. A non-clustered index can point to either a heap or a clustered index. If the table has a clustered index, the non-clustered index uses the clustered key as its pointer, and if the table is a heap, it uses the row ID as its pointer.

Non-clustered SQL Server indexes are most commonly used for searching columns. A good practice is to analyze our queries every time we define an index: look for the predicates (logical conditions applied to rows in a table), look at all the filters in the WHERE clause, and try to create indexes that include the columns used in the WHERE clause.

If we do this right, it increases the chances that SQL Server can pull all the data it needs from the index itself rather than having to go to the data pages, which can give us a huge performance gain.
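The difference between an index that only locates rows and one that can answer the query by itself can be sketched in Python. This is a toy model, not how SQL Server stores indexes: the table is a dictionary keyed by a hypothetical clustered key `customer_id`, and each index is a sorted list of tuples. All table and column names are invented for illustration.

```python
import bisect

# Hypothetical table, keyed by its clustered key (customer_id).
table = {
    1: {"last_name": "Smith", "city": "Oslo", "email": "smith@example.com"},
    2: {"last_name": "Jones", "city": "Bergen", "email": "jones@example.com"},
    3: {"last_name": "Smith", "city": "Tromso", "email": "s2@example.com"},
    4: {"last_name": "Brown", "city": "Oslo", "email": "brown@example.com"},
}

# Non-clustered index on last_name: sorted (key, clustered key) pairs.
plain_index = sorted((row["last_name"], cid) for cid, row in table.items())
# Covering variant: also carries city, so the query never touches the table.
covering_index = sorted(
    (row["last_name"], cid, row["city"]) for cid, row in table.items()
)

def seek(index, last_name):
    """Return all index entries whose leading column equals last_name."""
    start = bisect.bisect_left(index, (last_name,))
    end = start
    while end < len(index) and index[end][0] == last_name:
        end += 1
    return index[start:end]

def cities_via_plain(last_name):
    """Find cities using the plain index; count trips to the base table."""
    cities, lookups = [], 0
    for _, cid in seek(plain_index, last_name):
        cities.append(table[cid]["city"])   # extra lookup by clustered key
        lookups += 1
    return cities, lookups

def cities_via_covering(last_name):
    """Find cities from the covering index alone; no table lookups."""
    return [city for _, _, city in seek(covering_index, last_name)], 0

print(cities_via_plain("Smith"))
print(cities_via_covering("Smith"))
```

Both functions return the same cities, but the plain index needs one extra lookup per matching row while the covering index needs none.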

Filtered SQL Server indexes are also good because they allow us to create an index on a subset of data, which is accomplished by putting a WHERE clause in the index definition. These are great for sparse columns (ordinary columns that have optimized storage for null values) and popular subsets of data.
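As a rough illustration of why indexing only a subset helps, the Python sketch below compares the number of entries in a full index with a filtered one. The `orders` table, its `status` values, and the 1-in-50 ratio of open orders are all hypothetical.

```python
# Hypothetical orders table: order_id -> status. One order in 50 is still open.
orders = {
    order_id: ("open" if order_id % 50 == 0 else "closed")
    for order_id in range(1, 10_001)
}

# Full index on status: one entry for every row in the table.
full_index = sorted((status, oid) for oid, status in orders.items())

# Filtered index: only rows matching the filter (status = 'open') get an entry.
filtered_index = sorted(oid for oid, status in orders.items() if status == "open")

print(f"full index entries:     {len(full_index)}")
print(f"filtered index entries: {len(filtered_index)}")
```

If queries only ever ask for open orders, the filtered structure answers them with a fraction of the entries to store and maintain.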

Last but not least, one of the greatest strategies when it comes to non-clustered indexes is indexing foreign keys. SQL Server does not create these indexes automatically when a foreign key is defined, so this is pretty much a best practice and you should always do it.

Columnstore indexes

Columnstore SQL Server indexes are used to speed up access to very large amounts of data in large tables. A columnstore index is basically a vertical index. It was read-only when introduced in SQL Server 2012, and later versions made columnstore indexes updatable. To understand what “vertical” means, picture how data is normally stored horizontally on data pages. This is how most relational engines store data: at the record level. Field by field, column by column, the entire row is stored in a data page.

A vertical index, on the other hand, stores the data column by column. Performance-wise, when SQL Server loads row-based data pages into memory, it may be loading a lot of unnecessary data to extract what it needs. When only the needed column’s data is loaded, memory is used efficiently and SQL Server has a lot less to do because it loads only what it needs.
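The horizontal versus vertical layout can be sketched in a few lines of Python. This is only an illustration of the idea, not of SQL Server's columnstore format, which also adds compression and batch processing. The column names and row count are made up, and "values touched" stands in for the amount of data that has to be read.

```python
# Hypothetical order rows with four columns each.
rows = [
    {"order_id": i, "customer": f"c{i % 7}", "amount": i * 10, "note": "x" * 20}
    for i in range(1, 1001)
]

# Row layout: every record carries all of its columns together.
row_store = [tuple(r.values()) for r in rows]
# Column layout: one contiguous list per column.
column_store = {name: [r[name] for r in rows] for name in rows[0]}

def sum_amount_row_store():
    """Sum the amount column; every column of every row gets read."""
    total = touched = 0
    for record in row_store:
        touched += len(record)   # the whole record is loaded
        total += record[2]       # amount is the third column
    return total, touched

def sum_amount_column_store():
    """Sum the amount column; only that column's values get read."""
    amounts = column_store["amount"]
    return sum(amounts), len(amounts)

print(sum_amount_row_store())
print(sum_amount_column_store())
```

Both functions compute the same total, but the row layout touches four values per row while the column layout touches one.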

Strategy guidelines

How many SQL Server indexes a table should have is the most frequently asked question. The rule of thumb is to keep it under 10 for OLTP environments. Depending on the type of table, the number can be lower or higher than 10. For example, if a table is rarely hit with inserts, updates, and deletes, then we can go above 10. On the other hand, if a table is extremely busy, then it’s a good idea to go with far fewer than 10. In general, though, we should stick to under 10 indexes for OLTP environments.

In the world of data warehouses, also known as OLAP, the rule is the opposite. OLAP is characterized by a relatively low volume of transactions, which means there should be little or no write activity in our data warehouses. The main thing to consider here is read performance. Therefore, we should index OLAP environments generously.

Furthermore, don’t forget three important things that help decrease fragmentation. First, try to specify a fill factor for a SQL Server index. If we have a table that is hit very often and is also highly fragmented, a lot of page splitting is going on under the hood. Lowering the fill factor leaves empty space on the pages, which ultimately leads to less fragmentation. Second and third, don’t forget to choose the right data types and to specify defaults when you can, because these also lower fragmentation.
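The effect of the fill factor on page splits can be sketched with a small Python model. It is a simplification under stated assumptions: pages hold a hypothetical 100 rows, the index is built from existing keys at a given fill factor, and new keys then arrive evenly spread across the key range. A full page splits in half when a new key lands on it.

```python
import bisect

PAGE_CAPACITY = 100  # hypothetical number of rows that fit on one page

def build_pages(keys, fill_factor):
    """Pack sorted keys into pages filled to fill_factor percent."""
    per_page = PAGE_CAPACITY * fill_factor // 100
    return [keys[i:i + per_page] for i in range(0, len(keys), per_page)]

def insert_and_count_splits(pages, new_keys):
    """Insert new keys into existing pages; return the number of page splits."""
    splits = 0
    for key in new_keys:
        firsts = [p[0] for p in pages]               # first key on each page
        idx = max(bisect.bisect_right(firsts, key) - 1, 0)
        page = pages[idx]
        if len(page) >= PAGE_CAPACITY:               # no room: split in half
            half = len(page) // 2
            new_page = page[half:]
            del page[half:]
            pages.insert(idx + 1, new_page)
            splits += 1
            page = page if key < new_page[0] else new_page
        bisect.insort(page, key)
    return splits

existing = list(range(0, 70_000, 10))    # 7,000 keys already in the index
incoming = list(range(5, 70_000, 100))   # 700 new keys spread across the range

splits_full = insert_and_count_splits(build_pages(existing, 100), incoming)
splits_70 = insert_and_count_splits(build_pages(existing, 70), incoming)
print(f"fill factor 100: {splits_full} page splits")
print(f"fill factor 70:  {splits_70} page splits")
```

With completely full pages, every page has to split once to absorb the new keys, while at a fill factor of 70 the free space absorbs them all. The trade-off is that the lower fill factor starts with more pages (100 instead of 70 in this model), so reads touch more pages.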

Conclusion

I’ll wrap things up here and recommend a few other articles that I wrote on the subject of indexing with the goal of boosting SQL Server performance. If you’re interested, go ahead and read the following articles: