Couple days ago my friend during a job interview was asked to explain the concept of table partitioning in SQL Server. Tough one? To be honest, neither he nor I had never heard about that. After few hours of learning, I’m ready to present to you the idea. So, is that connected somehow with disk partitions in our computers? No exactly, but the concept seems to be quite similar (at least for me).
As we probably all know, the relational database consists of tables (relations). Each one represents a small part of our business that we try to model. We are accustomed to the fact that the table is one indivisible file that is stored on our hard drive. However, SQL Server provides a solution that allows us to split a chosen table’s file into smaller pieces. That is called table partitioning. Well, at first it seems quite complicated. Does it mean that our queries should target specific partition to get the data, and how will we know which partition contains the data that we are looking for? The simple answer is: we don’t care. For the user, everything looks as it was before. It’s DBMS responsibility to ensure getting correct data. Okay, so if it looks the same, what are the advantages of this approach. There are at least two of them:
- Performance – let’s start with the simple example. Imagine that you’ve collected a big number of invoices for the last year. Now your task is to store them physically in your office. What would you do? The first approach is to store them in one box as a single collection. That’s not the best solution considering the fact that you might need them in the future. For instance, your boss asked you to bring him all invoices from June and November. Because you were such a lazy person in the past now, you have to struggle with a stack of invoice just to select only these from indicated months. Okay, so let’s try something different. How about storing all invoices from each month in a separate box? Now everything seems much simpler. „You want me to bring you invoices from June? Easy, here it is!”. The same thing happens when you are partitioning a table. DBMS does not have to leaf through all rows. Instead of that it just needs to know which partition stores invoices from a specific month. As you probably expect, it’s way faster approach. Moreover, you can reduce the execution time of your queries even more by placing each file on the separate hard drive. Thereby DBMS can look for data from many partitions at the same time.
- Transaction lock – in MS SQL each transaction requests locks of depending resources e.g. rows, tables, etc. Otherwise, another transaction could modify the resources in a way that would cause problems. So if a user’s query starts a transaction locking many records in a single partition only, the DBMS may lock only this one partition instead of the entire table.
What do we need?
Knowing what the table partitioning is, we can consider what we will need to achieve the objective? First of all, we want the mechanism that based on the set criterion will determine the partition to which we must save the new data, or get the existing ones. It’s called partition function. Mentioned criterion (partitioning column) is nothing but a column in our table, for instance, invoice issue date. The last thing is partition scheme consisting of the list of files group on which all the partitions are stored. Diagram below presents the flow of the whole process:
Before we start it should be noted that table partitioning it is only available in Enterprise and Developer editions of SQL Server. Okay, let’s code something!
First, we need to create new database consisting of files groups:
CREATE DATABASE [InvoicesDatabase] ON PRIMARY ( NAME = N'InvoicesDatabase',FILENAME = N'C:\PARTITIONS\InvoicesDatabase.mdf' , SIZE = 5MB , FILEGROWTH = 1024KB ), FILEGROUP [FILE_GROUP1] (NAME = N'fg1', FILENAME = N'C:\PARTITIONS\fg1.ndf' , SIZE = 2048KB , FILEGROWTH = 1024KB ), FILEGROUP [FILE_GROUP2] ( NAME = N'fg2',FILENAME = N'C:\PARTITIONS\fg2.ndf' , SIZE = 2048KB , FILEGROWTH = 1024KB ), FILEGROUP [FILE_GROUP3] (NAME = N'fg3',FILENAME = N'C:\PARTITIONS\fg3.ndf' , SIZE = 2048KB , FILEGROWTH = 1024KB ) LOG ON ( NAME = N'InvoicesDatabase log',FILENAME = N'C:\PARTITIONS\InvoicesDatabase_log.ldf', SIZE = 1024KB, FILEGROWTH = 10%) GO
Next, we need to create a partition function as below:
CREATE PARTITION FUNCTION InvoicesDatabase_PartitionFunction (DATE) AS RANGE RIGHT FOR VALUES('2016-01-01','2016-02-01') GO
Let’s stop here for a while. Notice that we’ve informed SL Server about a type of partitioning column which will be used as a criterion for function. Moreover, we determined explicitly thresholds of intervals, based on which DBMS will assign data to proper partitions. It remains for us to explain the RIGHT words that appeared in the query. It says that the threshold will be included in next (right) interval. By analogy, we can use LEFT word. Let’s go back to the SQL code. Now we need to create partition scheme:
CREATE PARTITION SCHEME InvoicesDatabase_PartitionScheme AS PARTITION InvoicesDatabase_PartitionFunction TO (FILE_GROUP1, FILE_GROUP2, FILE_GROUP3)
It is time to create a table which will be based on the above scheme:
CREATE TABLE Invoices (Id INT IDENTITY,IssueDate DATE) ON InvoicesDatabase_PartitionScheme (IssueDate)
We can check if everything works fine by executing these two queries:
SELECT * FROM sys.partition_schemes SELECT * FROM sys.partitions WHERE OBJECT_ID('Invoices') = object_id
Here’s the result:
Let’s assume that our database is already full of data. Are we able to check if all this actually work? Of course! We can run this query:
SELECT $PARTITION.InvoicesDatabase_PartitionFunction(I.IssueDate) AS PartitionNb, I.* FROM Invoices I
And here is the materialized result:
It seems to work correctly. Our threshold belongs to the second partition as we declared before using the RIGHT word. The last thing I want to show you today is how to add another partition to an existing database. For that purpose we must at first create the new group of files:
ALTER DATABASE [InvoicesDatabase] ADD FILEGROUP [FILE_GROUP4] ALTER DATABASE [InvoicesDatabase] ADD FILE ( NAME = N'fg4', FILENAME = N'C:\PARTITIONS\fg5.ndf', SIZE = 2048KB, FILEGROWTH = 1024KB ) TO FILEGROUP [FILE_GROUP4] GO
Next, we must add the created partition to the existing scheme using the query below:
ALTER PARTITION SCHEME InvoicesDatabase_PartitionScheme NEXT USED [FILE_GROUP4]
At the end, we must add another threshold to our partition function:
ALTER PARTITION FUNCTION InvoicesDatabase_PartitionFunction() SPLIT RANGE ('2016-04-01')
Here are our partitions !
I think that this mechanism is quite useful, especially for developers who pay a lot of attention to the optimization of the database environment. In the next week, we will come back together to the CQRS/ES subject. So if you don’t want to miss that, follow me on Twitter or leave a like on Facebook.