Tales from the T-SQL Tuesday Archive: #003 Relationships

Posted by John Q. Martin on Tue, Apr 16, 2024
T-SQL Tuesday Logo

In this trip down memory lane we are going all the way back to February 2010 and Rob Farley (l|b) proposing the topic of relationships. There were several things that came to mind when I was looking at this topic. These included building collaborative teams that crossed the traditional Dev/DBA/Infra boundaries, or managing external suppliers. However, the one that really stuck in my mind was related to a mistake that I made when managing a database system which resulted in a lot of data being inadvertently deleted.

Today I am going to talk about foreign keys, specifically, how they can be used to cascade referential integrity within a database.

Referential Integrity

Over the years I have worked on many database systems and seen various different options when it comes to managing referential integrity. I have seen it managed entirely in the application layer in products like Dynamics AX. I have seen it added to databases through primary key & foreign key relationships which were disabled and used simply as a metadata layer for reporting and other applications. And I have seen it used in what I would call a ’traditional’ way by enforced primary and foreign key relationships.

I’m a big fan of relational integrity, mainly because I have had to fix it when it has gone wrong, and it is not a fun experience tracking down gaps and broken relationships within multi-terabyte databases.

Fundamentally, the concept is that we define a primary key on a table, then we can use that key value in associated tables which allows us to effectively normalise data structures for optimal data storage and consistency. We can use the Foreign Key constraints to enforce this relationship so that data cannot be inserted in the secondary table if there is not valid data in the primary table to maintain the relationship between the two tables on the specified columns. A good example of this is the relationship hierarchy between products, product categories, and sales tables. We cannot have a product if there is no way to categorise it, which in turn means if there is no product then we cannot sell it. This is illustrated in the figure below.

Entity relationship diagram with three tables for product category, product, and sales.

Defining Foreign Keys

If we take the data structure outlined above and implement that in a SQL Server database we can then see how it will enforce these desired behaviours within our data model. A full set of T-SQL scripts can be found here in my GitHub repository for you to download.

Table Definitions

There are two ways to define the constraints when we define our tables, we can do it in-line or we can add them as a secondary task. If I am using SQL Projects in Visual Studio or Azure Data Studio then I will just define my model and let the tooling handle the dependency graph around what order to create things in. However, if I am manually defining the model and not using those tools to deploy my code then I define the tables and primary keys as a unit, with the foreign keys defined as a separate stage. This means I can reduce the complexity of figuring out and maintaining the order in which objects need to be created.

Here we are going to take the latter option to make it clear what is going on.

The snippet below is creating the three tables from our diagram above.

 1CREATE TABLE dbo.productCategory
 2(
 3	categoryId TINYINT IDENTITY(1,1) PRIMARY KEY,
 4	categoryName NVARCHAR(20) NOT NULL,
 5	categoryDescription NVARCHAR(100) NOT NULL
 6)
 7;
 8
 9CREATE TABLE dbo.product
10(
11	productId SMALLINT IDENTITY(1,1) PRIMARY KEY,
12	productCategoryId TINYINT NOT NULL,
13	productName NVARCHAR(20) NOT NULL,
14	productDescription NVARCHAR(100) NOT NULL,
15	productPrice MONEY NOT NULL
16)
17;
18
19CREATE TABLE dbo.sales
20(
21	salesId BIGINT IDENTITY(1,1) PRIMARY KEY,
22	productId SMALLINT NOT NULL,
23	productQuantity INT NOT NULL
24)
25;

Now we can look to create the foreign keys between productCategory and product, and then product and sales. This will mean that we will be in a position where we cannot create products without a category, and we cannot make a sale line without a product ID. We run an alter table command in each case to create the foreign key and identify which column in the table references in the other table.

1ALTER TABLE dbo.product
2	ADD CONSTRAINT fk_productCategory FOREIGN KEY(productCategoryId)
3		REFERENCES dbo.productCategory(categoryId)
4;
5ALTER TABLE dbo.sales
6	ADD CONSTRAINT fk_product FOREIGN KEY(productId)
7		REFERENCES dbo.product(productId)
8;

Now that we have all of the tables and relationships in place we can load some data into the tables. It is important to remember that we need to load the tables in the correct order because the relationships will prevent data being inserted if the reference values are not present.

 1--// Add some data to the tables.
 2INSERT dbo.productCategory
 3(categoryName,categoryDescription)
 4VALUES
 5(N'Motorcycle',N'Motorcyles'),
 6(N'Electronics',N'Electronic Accessories'),
 7(N'Luggage',N'Additional Luggage'),
 8(N'Performance',N'Performance Parts'),
 9(N'Parts',N'Components')
10;
11
12INSERT dbo.product
13(productCategoryId,productName,productDescription,productPrice)
14VALUES
15(1,N'Tiger 1200 XCA',N'Triumph Tiger 1200 XCA',17995.00),
16(1,N'Tiger 800 XCA',N'Triumph Tiger 800 XCA',12995.00),
17(1,N'Speed Triple 1050 RS',N'Triumph Speed Triple 1050 RS',15995.00),
18(2,N'TomTom Rider 550',N'TomTom SatNav Unit',395.99),
19(4,N'Arrow GP2',N'Arrow Exhaust (ALU)',499.99),
20(5,N'Sprocket F1242',N'7 Tooth Front Sprocket',49.99),
21(5,N'Sprocket F4529',N'Performance Rear Sprocket',78.99),
22(5,N'Serket Taper',N'Scorpion Exhaust (Carbon)',295.95)
23;
24
25INSERT dbo.sales
26(productId, productQuantity)
27VALUES
28(1,1),
29(2,1),
30(1,2),
31(2,1),
32(3,1),
33(4,2),
34(3,1),
35(3,1),
36(3,1),
37(3,1),
38(5,2),
39(6,1),
40(7,1),
41(8,1),
42(8,1),
43(1,1),
44(3,1),
45(1,1),
46(4,1),
47(1,1),
48(5,1),
49(1,1),
50(7,1),
51(7,2),
52(4,1),
53(4,1),
54(4,1),
55(4,1),
56(4,1),
57(4,1),
58(2,1),
59(3,1),
60(2,1),
61(2,1),
62(1,1),
63(2,1),
64(1,1)
65;

Now lets join all these tables up to give us a quick overview of the data to simulate a report being run against the system. Here we will join on the primary key to foreign keys for the relevant tables.

 1SELECT s.salesId,
 2	p.productName,
 3	pc.categoryName,
 4	s.productQuantity
 5FROM dbo.sales AS s
 6JOIN dbo.product AS p
 7	ON s.productId = p.productId
 8JOIN dbo.productCategory AS pc
 9	ON p.productCategoryId = pc.categoryId
10;
Results from T-SQL Select statement above.

Deleting Data

One of the things that I like about these foreign key relationships is that they enforce consistency within the database so that we do not end up with situations where data in one table references a value not present in another. The best way to illustrate this is by trying to delete some of our category data from our database. In the example below we are going to delete the row in dbo.productCategory with the ID value of 3, which has no corresponding records in dbo.product that reference it.

1DELETE dbo.productCategory
2	WHERE categoryId = 3
3;

Upon completion of this statement we will see the message that one row was affected.

Now lets see what happens if we try to delete the product category for Motorcycle which has got product lines associated with using the following statement.

1DELETE dbo.productCategory
2	WHERE categoryId = 1
3;

The result in this case is that we receive an error message stating that the delete statement conflicted with the reference constraint as can be seen below.

Error deleting data from table using delete statement above.

Introducing Cascading Deletes

So, now we reach the point where I talk about the option which bit me in the rear once while I was performing a delete operation. I had a task to clear some data out of one of the tables in a sales queue database which was used to transfer data from one system to another. I wrote my script which cleared the data out of tables in the correct order based on the relationship structure which was defined by the primary and foreign keys. However, in the final table which I was clearing data out of I had forgotten to highlight the where clause to restrict the data which was to be deleted (some important early career lessons learned here about running all of this in transactions, and more robust query patterns).

I was a little concerned when, what should have been a quick query was taking some time to execute. I started by thinking that perhaps I was being blocked by something, and then it completed. Now, I had made the assumption that the foreign keys were the same as we had defined above and that I would have got an error had something gone awry. However, this was not the case.

When the database had been built the developers at the time and used the “ON DELETE CASCADE” option when the foreign key was created. The net result being that by taking out the data in this one little lookup table, I had managed to clear the entire import queue table for the data transfer system.

We can demonstrate this by updating our foreign key definitions to include the ON DELETE CASCADE option.

 1--// Drop and recreate the foreign keys
 2ALTER TABLE dbo.product
 3	DROP CONSTRAINT fk_productCategory
 4;
 5ALTER TABLE dbo.sales
 6	DROP CONSTRAINT fk_product
 7;
 8ALTER TABLE dbo.product
 9	ADD CONSTRAINT fk_productCategory FOREIGN KEY(productCategoryId)
10		REFERENCES dbo.productCategory(categoryId)
11		ON DELETE CASCADE
12;
13ALTER TABLE dbo.sales
14	ADD CONSTRAINT fk_product FOREIGN KEY(productId)
15		REFERENCES dbo.product(productId)
16		ON DELETE CASCADE
17;

Now, if we go back to our delete statement to remove the motorcycle category from our table we will see that it runs successfully.

1DELETE dbo.productCategory
2	WHERE categoryId = 1
3;

What is somewhat misleading is that it will come back and tell us that one row was affected. However, if we go and query our main table again using our report query we will see there is significantly less data in our results.

Smaller result set of 16 records rather than 37.

Now you may ask, surely that is the expected result because if we took the category out of the table then our relationships will break because there will be nothing to join the product and sales rows back to. Yes, that is the case, however if we run the following statement we can see clearly that the sales table has significantly fewer than the 37 we started with when we loaded data.

1SELECT COUNT(*) AS dataRows
2FROM dbo.sales
3;

Now we have sixteen data rows left in our sales table…

This is the point where my recovery strategy got put to the test and I started to recover the database and log backups. I am glad to say that I was better at planning for bad things to happen than I was highlighting and running some T-SQL. Data was recovered up to the point of failure using tail of the log backup and then restoring from full and transaction log backups stopping at the appropriate time and copying the data over.

Summary

I want to use this as a cautionary tale for you all. I’m not saying don’t use this feature for cascading deletes based on the foreign keys. I am saying that you need to be cautious of using it because of the impact that it can have when things go sideways. It can greatly simplify application design when interacting with the data layer as we don’t need to code in the dependency structure to complete deletes. However, if there is a lot of ad-hoc query access and updates by hand then I would probably err on the side of caution and not, but that is just my view.

I hope you have found this useful, and that you can learn from my early career mistake and avoid making it yourself. These days I wrap statements in transactions, leave them open until I have verified that what has happened is what I expected. Another useful tip is to log the start times of activities like this with a print statement or in a table, that way when you do a restore you can use pick the right time to stop the log restore to minimise data loss.

Thanks for your time.

/JQ



comments powered by Disqus