Description
One of the best ways to optimize performance in a database is to design it right the first time! Making design and architecture decisions based on facts and best practices will reduce technical debt and the number of fixes that you need to implement in the future.
While there are often ways to tweak queries, indexes, and server settings to make things faster, there are limits to what we can accomplish in a cost-efficient and timely manner. Those limits are dictated by database architecture and application design. It may often seem that prioritizing good design up-front costs more than rolling features out quickly, but in the long run, the price paid for hasty or poorly-thought-out decisions will be higher.
This article is a fast-paced dive into the various design considerations that we can make when building new databases, tables, or procedural TSQL. Consider these as ways to improve performance in the long-term by making careful, well thought-out decisions now. Predicting the future of an application may not be easy when we are still nursing it through infancy, but we can consider a variety of tips that will give us a better shot at doing it right the first time!
“Measure twice, cut once” applies to application development in that regard. We need to accept that doing it right the first time will be significantly cheaper and less time-consuming than needing to rearchitect in the future.
Understand the application
Learn the business need behind the application and its database. In addition to a basic understanding of what an app will do, consider the following questions:
- What is it used for? What is the app and its purpose? What kind of data will be involved?
- Who will access it? Will it be end-users on the web or internal employees? Will it be 5 people, 5,000 people, or 5 million people?
- How will they access it? Will it be a web page? A mobile app? Local software on a computer or private cloud? A reporting interface?
- Are there specific times of day when usage is heavier? Do we need to accommodate busy times with extra resources? Will quiet times allow for planned maintenance or downtime? What sort of uptime is expected?
Getting a basic idea of the purpose of a database will allow you to better forecast its future and avoid making any major mistakes in the beginning. If we know about application quirks, such as busy or quiet times, the use of ORMs, or the use of an API, we can better understand how it will interact with the database and make decisions to accommodate that usage.
Often, a good understanding of the app allows us to make simplifying assumptions and cut through a lot of the decisions we need to make. This conversation may involve technical folks (architects, developers, engineers, and other DBAs), and it may also involve business reps who have a strong understanding of the purpose of the app, even if they do not know much about its technical implementation.
Here are some more details on the questions and considerations we should make when designing new database objects:
Scalability
How will the application and its data grow over time? The ways we build, maintain, and query data change when we know the volume will be huge. Very often we build and test code in very controlled dev/QA environments in which the data flow does not mirror a real production environment. Even without this, we should be able to estimate how much an app will be used and what the most common needs will be.
We can then infer metrics such as database size, memory needs, CPU, and throughput. We want the hardware that we put our databases on to be able to perform adequately and this requires us to allocate enough computing resources to make this happen. A 10TB database will likely not perform well (or at all) on a server with 2GB of RAM available to it. Similarly, a high-traffic application may require faster throughput on the network and storage infrastructure, in addition to speedy SSDs. A database will only perform as quickly as its slowest component, and it is up to us to make sure that the slowest component is fast enough for our app.
How will data size grow over time? Can we expand storage and memory easily when needed? If downtime is not an option, then we will need to consider hardware configurations that either provide a ton of extra overhead to start or allow for seamless expansions later on. If we are not certain of data growth, do we expect the user or customer count to grow? If so, we may be able to infer data or usage growth from that.
Licensing matters, too, as licensing database software isn’t cheap. We should consider which edition of SQL Server the application can function on and what the least expensive edition is that we are allowed to use. A completely internal server with no customer-facing access may be able to benefit from using Developer edition. Alternatively, the choice between Enterprise and Standard may be decided by features (such as AlwaysOn) or capacity (memory limitations, for example). A link is provided at the end of this article with extensive comparisons between editions of SQL Server.
High availability and disaster recovery are very important early-on considerations that often are not addressed until it is too late. What is the expected up-time of the app? How quickly are we expected to recover from an outage (recovery time objective/RTO)? In addition, how much data loss is tolerated in the event of an outage or disaster (recovery point objective/RPO)? These are tough questions, as businesses will often ask for a guarantee of zero downtime and no data loss, but will back off when they realize the cost to do so is astronomically high. This discussion is very important to have prior to an application being released, as it ensures that contracts, terms of service, and other documentation accurately reflect the technical capabilities of the systems it resides on. It also allows you to plan ahead with disaster recovery plans and avoid the panic often associated with unexpected outages.
Data types
One of the most basic decisions that we can make when designing a database is to choose the right data types. Good choices can improve performance and maintainability. Poor choices will make work for us in the future 🙂
Choose natural data types that fit the data being stored. A date should be a date, not a string. A bit should be a bit and not an integer or string. Many of these decisions are holdovers from years ago when data types were more limited and developers had to be creative in order to generate the data they wanted.
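As a hypothetical sketch (the table and column names here are invented for illustration), the difference between forced and natural types looks like this:

```sql
-- Avoid: dates and flags stored as strings and integers
CREATE TABLE dbo.order_demo_legacy
(
	order_date VARCHAR(20) NOT NULL, -- a date stored as text
	is_shipped INT NOT NULL -- a flag stored as an integer
);

-- Prefer: natural types that match the data being stored
CREATE TABLE dbo.order_demo
(
	order_date DATE NOT NULL, -- a date is a date
	is_shipped BIT NOT NULL -- a flag is a bit
);
```

With natural types, SQL Server validates values on write and comparisons stay simple; with strings, every query must parse and convert before it can reason about the data.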
Choose length, precision, and size that fits the use case. Extra precision may seem like a useful add-on, but can be confusing to developers who need to understand why a DECIMAL(18,4) contains data with only two digits of decimal detail. Similarly, using a DATETIME to store a DATE or TIME can also be confusing and lead to bad data.
When in doubt, consider using a standard, such as ISO5218 for gender, ISO3166 for country, or ISO4217 for currency. These allow you to quickly refer anyone to universal documentation on what data should look like, what is valid, and how it should be interpreted.
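As a hypothetical sketch, columns sized to these standards might look like this (the table name is invented for illustration):

```sql
CREATE TABLE dbo.customer_demo
(
	customer_id INT NOT NULL PRIMARY KEY,
	gender_code TINYINT NOT NULL, -- ISO 5218: 0 (not known), 1 (male), 2 (female), 9 (not applicable)
	country_code CHAR(2) NOT NULL, -- ISO 3166-1 alpha-2, e.g. 'US'
	currency_code CHAR(3) NOT NULL -- ISO 4217, e.g. 'USD'
);
```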
Avoid storing HTML, XML, JSON, or other markup languages in the database. Storing, retrieving, and displaying this data is expensive. Let the app manage data presentation, not the database. A database exists to store and retrieve data, not to generate pretty documents or web pages.
Dates and times should be consistent across all tables. If time zones or locations will matter, consider using UTC time or DATETIMEOFFSET to model them. Upgrading a database in the future to add time zone support is much harder than using these conventions in the beginning. Dates, times, and durations are different. Label them so that it is easy to understand what they mean. Duration should be stored in a one-dimensional scalar unit, such as seconds or minutes. Storing duration in the format “HH:MM:SS.mmm” is confusing and difficult to manipulate when mathematical operations are needed.
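As a hypothetical sketch (names invented for illustration), a table following these conventions could look like:

```sql
CREATE TABLE dbo.job_run_demo
(
	job_run_id INT NOT NULL PRIMARY KEY,
	start_time_utc DATETIME2(3) NOT NULL, -- all times stored in UTC
	start_time_local DATETIMEOFFSET(3) NOT NULL, -- offset preserved where time zone matters
	duration_in_seconds INT NOT NULL -- one-dimensional scalar duration; no "HH:MM:SS.mmm" strings
);

-- Math on a scalar duration is trivial:
-- SELECT AVG(duration_in_seconds) FROM dbo.job_run_demo;
```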
NULLs
Use NULL when non-existence of data needs to be modelled in a meaningful fashion. Do not use made-up data to fill in NOT NULL columns, such as “1/1/1900” for dates, “-1” for integers, “00:00:00” for times, or “N/A” for strings. NOT NULL should mean that a column is required by an application and should always be populated with meaningful data.
NULL should have meaning and that meaning should be defined when the database is being designed. For example, “request_complete_date = NULL” could mean that a request is not yet complete. “Parent_id = NULL“ could indicate an entity with no parent.
NULL can be eliminated by additional normalization. For example, a parent-child table could be created that models all hierarchical relationships for an entity. This may be beneficial if these relationships form a critical component of how an app operates. Reserve the removal of NULLable columns via normalization for those that are important to an app or that may require additional supporting schema to function well. As always, normalization for the sake of normalization is probably not a good thing!
Beware NULL behavior. ORDER BY, GROUP BY, equalities, inequalities, and aggregate functions will all treat NULL differently. Always SET ANSI_NULLS ON. When performing operations on NULLable columns, be sure to check for NULL whenever needed. Here is a simple example from Adventureworks:
SELECT
	*
FROM Person.Person
WHERE Title = NULL;

SELECT
	*
FROM Person.Person
WHERE Title IS NULL;
These queries look similar but will return different results. The first query will return 0 rows, whereas the second will return 18,963 rows:

The reason is that NULL is not a value and cannot be treated like a number or string. When checking for NULL or working with NULLable columns, always check and validate if you wish to include or exclude NULL values, and always use IS NOT NULL or IS NULL, instead of =, <, >, etc…
SET ANSI_NULLS ON is a default in SQL Server and should be left as a default. Adjusting this will change how the above behavior works and will go against ANSI standards. Building code to handle NULL effectively is a far more scalable approach than adjusting this setting.
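Aggregate functions add one more wrinkle worth illustrating: most of them silently skip NULL. Against the same Person.Person table (exact counts depend on the AdventureWorks version in use):

```sql
SELECT
	COUNT(*) AS total_rows, -- counts every row
	COUNT(Title) AS rows_with_title -- skips rows where Title IS NULL
FROM Person.Person;
```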
Object names
Naming things is hard! Choosing descriptive, useful names for objects will greatly improve readability and the ability for developers to easily use those objects for their work and not make unintended mistakes.
Name an object for what it is. Include units in the name if they are not absurdly obvious. “duration_in_seconds” is much more useful than “duration”. “Length_inches” is easier to understand than “length”. Bit columns should be named in the positive and match the business use case: “is_active”, “is_flagged_for_deletion”, “has_seventeen_pizzas”. Negative columns are usually confusing: “is_not_taxable”, “has_no_pizzas”, and “is_not_active” will lead to mistakes and confusion as they are not intuitive. Database schema should not require puzzle-solving skills to understand!
Other things to avoid:
- Abbreviations & shorthand. These are almost always confusing. If typing speed is a concern, consider the many tools available that provide IntelliSense or similar auto-completion features.
- Spaces & special characters. They will break maintenance processes, confuse developers, and be a nuisance to type correctly when needed. Stick to numbers, letters, and underscores.
- Reserved words. If it’s blue, white, or pink in SSMS, don’t use it! This only causes confusion and increases the chances of logical coding errors.
Consistency is valuable, and creating effective naming schemes early will pay dividends later, when there is no need to “fix” standards that turned out to be awful. As for the debate over capitalization (no capitals, camel case, Pascal case, etc.), this is completely arbitrary and up to a development team. In databases with lots of objects, prefixes can be used to allow objects of specific types, origins, or purposes to be easily searchable. Alternatively, different schemas can be used to divide up objects of different types.
Good object naming reduces mistakes and errors while speeding up app development. While nothing is truly self-documenting, quality object names reduce the need to find additional resources (docs or people) to determine what something is for or what it means.
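As a hypothetical sketch (names invented for illustration), a table following these naming guidelines might look like:

```sql
CREATE TABLE dbo.shipment_demo
(
	shipment_id INT NOT NULL PRIMARY KEY,
	length_in_inches DECIMAL(8,2) NOT NULL, -- units included in the name
	transit_duration_in_minutes INT NOT NULL, -- scalar duration, clearly labelled
	is_active BIT NOT NULL, -- positively named bit column
	is_flagged_for_deletion BIT NOT NULL
);
```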
Old Data
Whenever data is created, ask the question: “How long should it exist?” Forever is a long time, and most data does not need to live forever. Find out about or create a data retention policy for all data and write code to enforce it. Many businesses have compliance or privacy rules to follow that may dictate how long data needs to be retained.
Limiting data size is a great way to improve performance and reduce data footprint! This is true for any table that stores historical data. A smaller table means smaller indexes, less storage use, less memory use, and less network bandwidth usage. All scans and seeks will be faster as the indexes are more compact and quicker to search.
There are many ways to deal with old data. Here are a few examples:
- Delete it. Forever. If allowed, this is an easy and quick solution.
- Archive it. Copy it to a secondary location (a different database, server, partition, etc.) and then delete it.
- Soft-delete it. Have a flag that indicates that it is no longer relevant and can be ignored in normal processes. This is a good solution when you can leverage different storage partitions, filtered indexes, or other ways to segregate data as soon as it is flagged as old.
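One way to implement the soft-delete approach, sketched here against a hypothetical dbo.request_demo table, is a flag plus a filtered index so that routine queries only ever touch the “live” rows:

```sql
ALTER TABLE dbo.request_demo ADD is_deleted BIT NOT NULL DEFAULT 0;

-- Queries that filter on is_deleted = 0 can use this smaller, cheaper index:
CREATE NONCLUSTERED INDEX IX_request_demo_active
ON dbo.request_demo (request_date)
WHERE is_deleted = 0;
```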
- Nothing. Some data truly is needed forever. If so, consider how to make the underlying structures scalable so that they perform well in the future, and consider how large the tables can grow.
Data retention doesn’t only involve production OLTP tables, but may also include backup files, reporting data, or data copies. Be sure to apply your retention policies to everything!
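A retention policy is often enforced with a scheduled, batched delete. The sketch below assumes a hypothetical dbo.event_log_demo table and a 90-day policy; deleting in small batches keeps locks short and the transaction log manageable:

```sql
DECLARE @rows_deleted INT = 1;

WHILE @rows_deleted > 0
BEGIN
	DELETE TOP (5000)
	FROM dbo.event_log_demo
	WHERE event_time_utc < DATEADD(DAY, -90, GETUTCDATE());

	SET @rows_deleted = @@ROWCOUNT;
END;
```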
Cartesian Products (Cross Joins / No Join Predicate)
All joins occur between some data set and a new data set. In bringing them together, we are connecting them via some set of keys. When we join data sets without any matching criteria, we are performing a CROSS JOIN (or cartesian product). While this can be a useful way to generate data needed by an application, it can also be a performance and data quality issue when done unintentionally.
There are a variety of ways to generate CROSS JOIN conditions:
- Use the CROSS JOIN operator
- Enter incorrect join criteria
- Unintentionally omit a join predicate
- Forget a WHERE clause
The following query is an example of the second possibility:
SELECT
Product.Name,
Product.ProductNumber,
ProductModel.Name AS Product_Model_Name
FROM Production.Product
INNER JOIN Production.ProductModel
ON ProductModel.ProductModelID = ProductModel.ProductModelID
WHERE Product.ProductID = 777;
What we expect is a single row returned with some product data. What we get instead are 128 rows, one for each product model:
We have two hints that something has gone wrong: An overly large result set, and an unexpected index scan in the execution plan:
Upon closer inspection of our query, it becomes obvious that I fat-fingered the INNER JOIN and did not enter the correct table names:
INNER JOIN Production.ProductModel
ON ProductModel.ProductModelID = ProductModel.ProductModelID
By entering ProductModel on both sides of the join, I inadvertently told SQL Server not to join Product to ProductModel, but instead to join Product to the entirety of ProductModel. This occurs because ProductModel.ProductModelID will always equal itself. I could have entered “ON 1 = 1” as the join criteria and seen the same results.
The correction here is simple, adjust the join criteria to connect Product to ProductModel, as was intended:
INNER JOIN Production.ProductModel
ON Product.ProductModelID = ProductModel.ProductModelID
Once fixed, the query returns a single row and utilizes an index seek on ProductModel.
Situations in which a join predicate is missing or wrong can be difficult to detect. SQL Server does not always warn you of this situation, and you may not see an error message or show-stopping bad performance that gets your immediate attention. Here are some tips on catching bad joins before they cause production headaches:
- Make sure that each join correlates an existing data set with the new table. CROSS JOINs should only be used when needed (and intentionally) to inflate the size/depth of a data set.
- An execution plan may indicate a “No Join Predicate” warning on a specific join. If so, then you’ll know exactly where to begin your research.
- Check the size of the result set. Is it too large? Are any tables being cross joined across an entire data set, resulting in rows of legitimate data with extraneous data tacked onto the end?
- Do you see any unusual index scans in the execution plan? Are they on tables where you expect to seek only a few rows, such as a lookup table?
For reference, here is an example of what a “No Join Predicate” warning looks like:
We’ll follow the standard rule that yellow and red exclamation marks will always warrant further investigation. In doing so, we can see that this specific join is flagged as having no join predicate. In a short query, this is easy to spot, but in a larger query against many tables, it is easy for these problems to get buried in a larger execution plan.
Iteration
SQL Server is optimized for set-based operations and performs best when you read and write data in batches, rather than row-by-row. Applications are not constrained in this fashion and often use iteration as a method to parse data sets.
While it may intuitively seem that collecting 100 rows from a table one at a time or all at once would take the same effort overall, the reality is that each effort to connect to storage and read pages into memory carries a distinct amount of overhead. As a result, one hundred index seeks of one row each will take far more time and resources than one seek of a hundred rows:
DECLARE @id INT = (SELECT MIN(BusinessEntityID) FROM HumanResources.Employee)
WHILE @id <= 100
BEGIN
UPDATE HumanResources.Employee
SET VacationHours = VacationHours + 4
WHERE BusinessEntityID = @id
AND VacationHours < 200;
SET @id = @id + 1;
END
This example is simple: iterate through a loop, update an employee record, increment a counter and repeat 99 times. The performance is slow and the execution plan/IO cost abysmal:
At first glance, things seem good: Lots of index seeks and each read operation is inexpensive. When we look more closely, we realize that while 2 reads may seem cheap, we need to multiply that by 100. The same is true for the 100 execution plans that were generated for all of the update operations.
Let’s say we rewrite this to update all 100 rows in a single operation:
UPDATE HumanResources.Employee
SET VacationHours = VacationHours + 4
WHERE VacationHours < 200
AND BusinessEntityID <= 100;
Instead of 200 reads, we only need 5, and instead of 100 execution plans, we only need 1.
Data in SQL Server is stored in 8kb pages. When we read rows of data from disk or memory, we are reading 8kb pages, regardless of the data size. In our iterative example above, each read operation did not simply read a few numeric values from disk and update one, but had to read all of the necessary 8kb pages needed to service the entire query.
Iteration is often hidden from view because each individual operation is fast and inexpensive, making it difficult to locate when reviewing extended events or trace data. Watching out for CURSOR use, WHILE loops, and GOTO can help us catch it, even when there is no single poor-performing operation.
There are other tools available that can help us avoid iteration. For example, a common need when inserting new rows into a table is to immediately return the IDENTITY value for that new row. This can be accomplished by using @@IDENTITY or SCOPE_IDENTITY(), but these are not set-based functions. To use them, we must iterate through insert operations one-at-a-time and retrieve/process the new identity values after each loop. For row counts greater than 2 or 3, we will begin to see the same inefficiencies introduced above.
The following code is a short example of how to use OUTPUT INSERTED to retrieve IDENTITY values in bulk, without the need for iteration:
CREATE TABLE #color
(	color_id SMALLINT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
	color_name VARCHAR(50) NOT NULL,
	datetime_added_utc DATETIME);

CREATE TABLE #id_values
(	color_id SMALLINT NOT NULL PRIMARY KEY CLUSTERED,
	color_name VARCHAR(50) NOT NULL);

INSERT INTO #color
	(color_name, datetime_added_utc)
OUTPUT INSERTED.color_id, INSERTED.color_name
	INTO #id_values
VALUES
	('Red', GETUTCDATE()),
	('Blue', GETUTCDATE()),
	('Yellow', GETUTCDATE()),
	('Brown', GETUTCDATE()),
	('Pink', GETUTCDATE());

SELECT * FROM #id_values;

DROP TABLE #color;
DROP TABLE #id_values;
In this script, we insert new rows into #color in a set-based fashion, and pull the newly inserted IDs, as well as color_name, into a temp table. Once in the temp table, we can use those new values for whatever additional operations are required, without the need to iterate through each INSERT operation one-at-a-time.
Window functions are also very useful for minimizing the need to iterate. Using them, we can pull row counts, sums, min/max values, and more without executing additional queries or iterating through data windows manually:
SELECT
SalesOrderHeader.SalesOrderID,
SalesOrderDetail.SalesOrderDetailID,
SalesOrderHeader.SalesPersonID,
ROW_NUMBER() OVER (PARTITION BY SalesOrderHeader.SalesPersonID ORDER BY SalesOrderDetail.SalesOrderDetailID ASC) AS SalesPersonRowNum,
SUM(SalesOrderHeader.SubTotal) OVER (PARTITION BY SalesOrderHeader.SalesPersonID ORDER BY SalesOrderDetail.SalesOrderDetailID ASC) AS SalesPersonSales
FROM Sales.SalesOrderHeader
INNER JOIN Sales.SalesOrderDetail
ON SalesOrderDetail.SalesOrderID = SalesOrderHeader.SalesOrderID
WHERE SalesOrderHeader.SalesPersonID IS NOT NULL
AND SalesOrderHeader.Status = 5;
The results of this query show us not only a row per detail line, but include a running count of orders per sales person and a running total of sales:
Window functions are not inherently efficient: the above query required some hefty sort operations to generate the results. Despite the cost, this is far more efficient than iterating through all salespeople, all orders, or some other iterative operation over a large data set:
In addition to avoiding iteration, we also avoid the need for aggregation within our query, allowing us to freely select whatever columns we’d like without the typical constraints of GROUP BY/HAVING queries.
Iteration is not always a bad thing. Sometimes we need to query all databases on a server or all servers in a list. Other times we need to call a stored procedure, send emails, or perform other operations that are either inefficient or impossible to do in a set-based fashion. In these scenarios, make sure that performance is adequate and that the number of times that a loop needs to be repeated is limited to prevent unexpected long-running jobs.
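For example, a maintenance task that must visit every database on a server is a legitimate use of iteration, since each database has to be addressed individually and the loop count is naturally bounded. A minimal sketch (the per-database work shown is just a placeholder):

```sql
DECLARE @database_name SYSNAME;
DECLARE @sql_command NVARCHAR(MAX);

DECLARE database_cursor CURSOR LOCAL FAST_FORWARD FOR
	SELECT name FROM sys.databases WHERE state_desc = 'ONLINE';

OPEN database_cursor;
FETCH NEXT FROM database_cursor INTO @database_name;
WHILE @@FETCH_STATUS = 0
BEGIN
	-- Example work: count user tables in each database
	SET @sql_command = N'SELECT COUNT(*) FROM ' + QUOTENAME(@database_name) + N'.sys.tables;';
	EXEC sp_executesql @sql_command;

	FETCH NEXT FROM database_cursor INTO @database_name;
END;

CLOSE database_cursor;
DEALLOCATE database_cursor;
```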
Encapsulation
When writing application code, encapsulation is used as a way to reuse code and simplify complex interfaces. By packaging code into functions, stored procedures, and views, we can very easily offload important business logic or reusable code to a common place, where it can be called by any other code.
While this sounds like a very good thing, when overused it can quickly introduce performance bottlenecks as chains of objects linked together by other encapsulated objects grow. For example: a stored procedure that calls a stored procedure that uses a function that calls a view that calls a view that calls a view. This may sound absurd, but it is a very common outcome when views and nested stored procedures are relied on heavily.
How does this cause performance problems? Here are a few common ways:
- Unnecessary joins, filters, and subqueries are applied.
- Columns are returned that are not needed for a given application.
- INNER JOINs, CROSS JOINs, or filters force reads against tables that are not needed for a given operation.
- Query size (the number of tables referenced in a query) results in a poor execution plan.
- Logical mistakes are made because obfuscated query logic is not fully understood.
Here is an example of an AdventureWorks query in which simple intentions have complex results:
SELECT
BusinessEntityID,
Title,
FirstName,
LastName
FROM HumanResources.vEmployee
WHERE FirstName LIKE 'E%'
At first glance, this query is pulling only 4 columns from the employee view. The results are what we expect, but it runs a bit longer than we’d want (over 1 second). Checking the execution plan and IO stats reveals:
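The IO and timing numbers discussed here can be reproduced by enabling statistics before running the query; the output appears on the Messages tab in Management Studio:

```sql
-- Capture per-table logical reads and elapsed time for the query above.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT
	BusinessEntityID,
	Title,
	FirstName,
	LastName
FROM HumanResources.vEmployee
WHERE FirstName LIKE 'E%';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Including the actual execution plan (Ctrl+M in Management Studio) alongside these statistics shows which tables are responsible for the reads.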
What we discover is that there was quite a bit going on behind the scenes that we were not aware of. Tables were accessed that we didn’t need, and excess reads were performed as a result. This leads us to ask: What is in vEmployee anyway!? Here is the definition of this view:
CREATE VIEW [HumanResources].[vEmployee]
AS
SELECT
e.[BusinessEntityID]
,p.[Title]
,p.[FirstName]
,p.[MiddleName]
,p.[LastName]
,p.[Suffix]
,e.[JobTitle]
,pp.[PhoneNumber]
,pnt.[Name] AS [PhoneNumberType]
,ea.[EmailAddress]
,p.[EmailPromotion]
,a.[AddressLine1]
,a.[AddressLine2]
,a.[City]
,sp.[Name] AS [StateProvinceName]
,a.[PostalCode]
,cr.[Name] AS [CountryRegionName]
,p.[AdditionalContactInfo]
FROM [HumanResources].[Employee] e
INNER JOIN [Person].[Person] p
ON p.[BusinessEntityID] = e.[BusinessEntityID]
INNER JOIN [Person].[BusinessEntityAddress] bea
ON bea.[BusinessEntityID] = e.[BusinessEntityID]
INNER JOIN [Person].[Address] a
ON a.[AddressID] = bea.[AddressID]
INNER JOIN [Person].[StateProvince] sp
ON sp.[StateProvinceID] = a.[StateProvinceID]
INNER JOIN [Person].[CountryRegion] cr
ON cr.[CountryRegionCode] = sp.[CountryRegionCode]
LEFT OUTER JOIN [Person].[PersonPhone] pp
ON pp.BusinessEntityID = p.[BusinessEntityID]
LEFT OUTER JOIN [Person].[PhoneNumberType] pnt
ON pp.[PhoneNumberTypeID] = pnt.[PhoneNumberTypeID]
LEFT OUTER JOIN [Person].[EmailAddress] ea
ON p.[BusinessEntityID] = ea.[BusinessEntityID];
This view contains not only basic Employee data, but also joins to many other tables that our query has no need for. While the performance we experienced might be acceptable under some circumstances, it’s important to understand the contents of any objects we use so that we can use them effectively. If performance were a key issue here, we could rewrite our query as follows:
SELECT
e.BusinessEntityID,
p.Title,
p.FirstName,
p.LastName
FROM HumanResources.Employee e
INNER JOIN Person.Person p
ON p.BusinessEntityID = e.BusinessEntityID
WHERE FirstName LIKE 'E%'
This version only accesses the tables we need, thereby generating half the reads and a much simpler execution plan:
It is important to note that encapsulation is in no way a bad thing, but in the world of data, there are dangers to over-encapsulating business logic within the database. Here are some basic guidelines to help in avoiding performance problems resulting from the nesting of database objects:
- When possible, avoid nesting views within views. This improves visibility into code and reduces the chances of misunderstanding the contents of a given view.
- Avoid nesting functions if possible. This can be confusing and lead to challenging performance problems.
- Avoid triggers that call stored procedures or that perform too much business logic. Nested triggers are equally dangerous. Use caution when operations within triggers can fire more triggers.
- Understand the functionality of any defined objects (functions, triggers, views, stored procedures) prior to use. This will avoid misunderstandings of their purpose.
Storing important and frequently used TSQL in stored procedures, views, or functions can be a great way to increase maintainability via code reuse. Exercise caution and ensure that the complexity of encapsulated objects does not become too high. Performance can be inadvertently impacted when objects are nested many layers deep. When troubleshooting a problem query, always research the objects involved so that you have full exposure to any views, functions, stored procedures, or triggers that may also be involved in its execution.
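One way to research those objects is to query the database's dependency metadata. For example, listing everything the vEmployee view from earlier references:

```sql
-- List the objects that a given view, function, or procedure references,
-- exposing nesting before troubleshooting begins.
SELECT
	referencing_object = OBJECT_NAME(d.referencing_id),
	d.referenced_schema_name,
	d.referenced_entity_name
FROM sys.sql_expression_dependencies d
WHERE d.referencing_id = OBJECT_ID('HumanResources.vEmployee');
```

Running the same query against each referenced object in turn reveals how deep a chain of views or procedures actually goes.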
OLTP vs. OLAP
Data is typically accessed either for transactional needs or analytical (reporting) needs. A database can be effectively optimized to handle either of these scenarios very well. The ways in which we tune performance for each are very different and need some consideration when designing database elements.
OLTP
Online transaction processing refers to workloads in which data is written to and read for common interactive business usage. OLTP workloads are typically characterized by the following patterns:
- More writes, such as adding new data, updating rows, or deleting rows.
- More interactive operations in which people are logging into apps or web sites and directly viewing or modifying data. This comprises common business tasks.
- Operations on smaller row counts, such as updating an order, adding a new contact, or viewing recent transactions in a store. These operations often operate on current or recent data only.
- More tables and joins involved in queries.
- Timeliness is important. Since users are waiting for results, latency is not tolerated.
- High transaction counts, but typically small transaction sizes.
OLTP environments tend to be more relational, with indexes targeted at the common updates, searches, and operations that form the core of an application. OLTP processes generally ensure, and rely on, data integrity. This may necessitate the use of foreign keys, check constraints, default constraints, or triggers to assist in guaranteeing real-time data integrity.
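A minimal sketch of that style of table follows. The names are hypothetical, and a Sales.Customer table is assumed to exist:

```sql
-- Hypothetical OLTP table: constraints enforce integrity at write time.
CREATE TABLE Sales.OrderHeader
(
	OrderID INT IDENTITY(1,1) PRIMARY KEY,
	-- Foreign key guarantees every order points at a real customer.
	CustomerID INT NOT NULL
		FOREIGN KEY REFERENCES Sales.Customer (CustomerID),
	-- Default constraint stamps the row when the application omits a date.
	OrderDate DATETIME2 NOT NULL
		CONSTRAINT DF_OrderHeader_OrderDate DEFAULT (SYSUTCDATETIME()),
	-- Check constraint rejects nonsensical totals before they are stored.
	OrderTotal MONEY NOT NULL
		CONSTRAINT CK_OrderHeader_Total CHECK (OrderTotal >= 0)
);
```

Each constraint adds a small cost to every write, which is a worthwhile trade in a transactional system where bad data is far more expensive than the check.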
OLAP
Online analytical processing generally refers to reporting or search environments. These are used for crunching large volumes of data, such as in reporting, data mining, or analytics. Common features of OLAP workloads are:
- Typical workloads are read-only, with writes only occurring during designated load/update times.
- Many operations are automated or scheduled to run and be delivered to users at a later time. These processes are often used to gain insight into a business and to assist in decision-making processes.
- Operations can run on very large quantities of data. This can be crunching data year-over-year, trending spending over the past quarter, or any other task that may require pulling a longer history to complete.
- Tables tend to be wider and fewer, allowing reports to be generated with fewer joins or lookups.
- Users may not be waiting for results, which can be delivered via email, file, or some other asynchronous means. If they are, there may be an expectation of delay due to the size of the data. For reports where timeliness is important, the data can be crunched and staged ahead of time to assist in speedy results when requested.
- Low transaction counts, but typically large transaction sizes.
OLAP environments are usually flatter and less relational. Data is created in OLTP applications and then passed on to OLAP environments where analytics can take place. As a result, we can often assume that data integrity has already been established, and constraints, keys, and other similar checks can often be omitted.
If data is crunched or transformed, we can validate it afterwards, rather than in real time as with OLTP workloads. Quite a bit of creativity can be exercised with OLAP data, depending on how current the data needs to be, how quickly results are requested, and the volume of history required to service requests.
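A hypothetical flattened reporting table illustrates the contrast: wide, denormalized, and free of the constraints an OLTP source would carry, since integrity was enforced upstream. All names here are illustrative:

```sql
-- Hypothetical flattened reporting table: no foreign keys or check
-- constraints, because the OLTP source already validated the data.
CREATE TABLE Report.DailySales
(
	SalesDate DATE NOT NULL,
	StoreName VARCHAR(100) NOT NULL,
	ProductCategory VARCHAR(100) NOT NULL,
	UnitsSold INT NOT NULL,
	Revenue MONEY NOT NULL
);

-- A single clustered index aligned to the most common reporting filter.
CREATE CLUSTERED INDEX CX_DailySales_Date
	ON Report.DailySales (SalesDate);
```

A report that would need a half-dozen joins against the transactional schema becomes a simple filtered aggregate against a table like this.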
Keeping them separated
Due to their vastly different needs, it behooves us to separate transactional and analytical systems as much as possible. One of the most common reasons that applications become slow and we resort to NOLOCK hints is when we try to run huge search queries or bulky reports against our transactional production application. As transaction counts become higher and data volume increases, the clash between transactional operations and analytical ones will increase. The common results are:
- Locking, blocking, and deadlocks when a large search/report runs while users are trying to update data.
- Over-indexing of tables in an effort to service all kinds of different queries.
- The removal of foreign keys and constraints to speed up writes.
- Application latency.
- Use of query hints as workarounds to performance problems.
- Throwing hardware at the database server in an effort to improve performance.
The optimal solution is to recognize the difference between OLAP and OLTP workloads when designing an application, and separate these environments on day 1. This often doesn’t happen due to time, cost, or personnel constraints.
Regardless of the severity of the problem or how long it has persisted, separating operations based on their type is the solution. Creating a new and separate environment to store OLAP data is the first step. Such an environment can be built using AlwaysOn availability groups, log shipping, ETL processes, storage-level data copying, or many other solutions that make a copy of the OLTP data.
Once available, offloading operations can be a process that occurs over time. Easing into it allows for more QA and caution as a business grows familiar with new tools. As more operations are moved to a separate data store, you’ll be able to remove reporting indexes from the OLTP data source and further optimize it for what it does best: servicing OLTP workloads. Similarly, the new OLAP data store can be optimized for analytics, allowing you to flatten tables, remove constraints and OLTP indexes, and speed up the operations that it services.
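As a sketch of one such offloading step, a scheduled ETL job might aggregate the previous day's transactional activity into the reporting store. Every object name here is an illustrative assumption:

```sql
-- Hypothetical nightly ETL step: aggregate yesterday's orders from the
-- transactional schema into a flat reporting table.
DECLARE @day DATE = DATEADD(DAY, -1, CAST(SYSUTCDATETIME() AS DATE));

INSERT INTO Report.SalesSummary (SalesDate, StoreName, UnitsSold, Revenue)
SELECT
	@day,
	s.StoreName,
	SUM(d.OrderQty),
	SUM(d.LineTotal)
FROM Sales.OrderHeader AS o
INNER JOIN Sales.OrderDetail AS d
	ON d.OrderID = o.OrderID
INNER JOIN Sales.Store AS s
	ON s.StoreID = o.StoreID
WHERE o.OrderDate >= @day
	AND o.OrderDate < DATEADD(DAY, 1, @day)
GROUP BY s.StoreName;
```

Because this runs against the reporting copy on a schedule, the heavy aggregation never competes with interactive transactions on the OLTP server.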
The more separated the processes become, the easier it is to optimize each environment for its core uses. This results not only in far better performance, but also easier development of new features. Tables built solely for reporting are far easier to write queries against than transactional tables. Similarly, being able to update application code with the knowledge that large reports won’t be running against the database removes many of the performance worries typically associated with a mixed environment.
Triggers
Triggers themselves are not bad, but overuse of them can certainly become a performance headache. Triggers are placed on tables and can fire instead of, or after, inserts, updates, and/or deletes.
Triggers become a performance problem when there are too many of them. When updating a table results in inserts, updates, or deletes against 10 other tables, tracking down performance issues can become very challenging, as determining the specific code responsible can take time and a lot of searching.
Triggers often are used to implement business/application logic, but this is not what a relational database is built or optimized for. In general, applications should manage as much of this as possible. When not possible, consider using stored procedures as opposed to triggers.
The danger of triggers is that they become a part of the calling transaction. A single write operation can easily become many and result in waits on other processes until all triggers have fired successfully.
To summarize some best practices:
- Use triggers only when needed, and not as a convenience or time-saver.
- Avoid triggers that call more triggers. These can lead to crippling amounts of IO or complex query paths that are frustrating to debug.
- Server trigger recursion should be turned off. This is the default. Allowing triggers to call themselves, directly or indirectly, can lead to unstable situations or infinite loops.
- Keep triggers simple and have them execute a single purpose.
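The recursion settings referenced above can be inspected and set as follows. RECURSIVE_TRIGGERS is the database-level option (OFF by default), while 'nested triggers' is the server-level option controlling whether one trigger's writes can fire other triggers:

```sql
-- Database-level: prevent a trigger from re-firing itself directly.
-- OFF is the default.
ALTER DATABASE CURRENT SET RECURSIVE_TRIGGERS OFF;

-- Server-level: inspect whether triggers may fire other triggers.
-- A run_value of 1 means allowed, which is the default.
EXEC sp_configure 'nested triggers';
```

Changing the server-level option affects every database on the instance, so it should be reviewed with the whole workload in mind rather than toggled for a single application.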
Conclusion
Troubleshooting performance can be challenging, time-consuming, and frustrating. One of the best ways to avoid these troubles is to build a database intelligently up-front and avoid the need to have to fix things later.
By gathering information about an application and how it is used, we can make smart architecture decisions that will make our database more scalable and perform better over time. The result will be better performance and less need to waste time on troubleshooting broken things.
Table of contents
- Query optimization techniques in SQL Server: the basics
- Query optimization techniques in SQL Server: tips and tricks
- Query optimization techniques in SQL Server: Database Design and Architecture
- Query Optimization Techniques in SQL Server: Parameter Sniffing
Translated from: https://www.sqlshack.com/query-optimization-techniques-in-sql-server-database-design-and-architecture/