For 2024, I’m trying something new: weekly homework challenges! This week, let’s say we’ve decided to implement foreign keys, and we need to find existing data that would violate them.
We’re going to use the Stack Overflow database, and we’ll focus on these 3 tables:
- dbo.Users table: with Id column as its primary key
- dbo.Posts table: with OwnerUserId column noting which Users.Id wrote the post
- dbo.Comments table: with UserId column noting which Users.Id wrote the comment, and PostId column noting which Posts.Id is being commented on
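For reference, here is roughly what those desired keys look like once they exist. This is a minimal sketch that uses Python’s built-in sqlite3 module as a stand-in for SQL Server, with a hypothetical, stripped-down schema containing only the Id and key columns named above. The real Stack Overflow tables have many more columns, and SQL Server syntax differs. The sketch shows that with enforcement on, a comment pointing at a nonexistent post is rejected, while a NULL owner is allowed through.

```python
import sqlite3

# Hypothetical, stripped-down schema: only the key columns described above.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign key enforcement off by default
conn.executescript("""
    CREATE TABLE Users (Id INTEGER PRIMARY KEY);
    CREATE TABLE Posts (
        Id INTEGER PRIMARY KEY,
        OwnerUserId INTEGER REFERENCES Users (Id)
    );
    CREATE TABLE Comments (
        Id INTEGER PRIMARY KEY,
        UserId INTEGER REFERENCES Users (Id),
        PostId INTEGER REFERENCES Posts (Id)
    );
""")

conn.execute("INSERT INTO Users (Id) VALUES (1)")
conn.execute("INSERT INTO Posts (Id, OwnerUserId) VALUES (10, 1)")     # valid parent user
conn.execute("INSERT INTO Posts (Id, OwnerUserId) VALUES (11, NULL)")  # NULL keys are not checked

try:
    # Post 999 does not exist, so the foreign key on PostId rejects this comment.
    conn.execute("INSERT INTO Comments (Id, UserId, PostId) VALUES (100, 1, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```

That rejection is exactly why the existing data has to be checked first: any orphaned rows already in the tables would stand in the way of adding the keys.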
Before we attempt to implement foreign keys, we need to find data which might violate the foreign key relationships. Are there any:
- Posts rows whose OwnerUserId does not match up with a valid Users.Id
- Comments rows whose UserId doesn’t match up with a valid Users.Id
- Comments rows whose PostId doesn’t match up with a valid Posts.Id
To make your task easier, let’s focus on just the first 100K rows in each table (rows with an Id <= 100000) to see whether foreign keys make sense for this database.
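If you want one possible starting point, the classic shape of an orphan check is an anti-join: keep the child rows for which NOT EXISTS finds no matching parent. Below is a minimal sketch in Python’s sqlite3 against tiny, made-up sample data rather than the real Stack Overflow database, so it says nothing about speed or load on SQL Server. Two assumptions are worth flagging: the Id <= 100000 cutoff is applied to the child table only, with parents looked up across the whole table, and NULL key values are skipped because a foreign key does not check NULLs. The helper names `build_sample_db` and `find_orphans` are hypothetical.

```python
import sqlite3

# Anti-join checks: child rows (Id <= cutoff) whose non-NULL key has no parent row.
# Assumption: the cutoff applies to the child table; parents are searched in full.
ORPHAN_CHECKS = {
    "Posts.OwnerUserId": """
        SELECT p.Id FROM Posts AS p
        WHERE p.Id <= ? AND p.OwnerUserId IS NOT NULL
          AND NOT EXISTS (SELECT 1 FROM Users AS u WHERE u.Id = p.OwnerUserId)
        ORDER BY p.Id""",
    "Comments.UserId": """
        SELECT c.Id FROM Comments AS c
        WHERE c.Id <= ? AND c.UserId IS NOT NULL
          AND NOT EXISTS (SELECT 1 FROM Users AS u WHERE u.Id = c.UserId)
        ORDER BY c.Id""",
    "Comments.PostId": """
        SELECT c.Id FROM Comments AS c
        WHERE c.Id <= ? AND c.PostId IS NOT NULL
          AND NOT EXISTS (SELECT 1 FROM Posts AS p WHERE p.Id = c.PostId)
        ORDER BY c.Id""",
}


def build_sample_db() -> sqlite3.Connection:
    """Hypothetical toy data with no constraints, so orphans can exist."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Users (Id INTEGER PRIMARY KEY);
        CREATE TABLE Posts (Id INTEGER PRIMARY KEY, OwnerUserId INTEGER);
        CREATE TABLE Comments (Id INTEGER PRIMARY KEY, UserId INTEGER, PostId INTEGER);
    """)
    conn.executemany("INSERT INTO Users VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO Posts VALUES (?, ?)", [
        (10, 1),       # valid owner
        (11, 3),       # orphan: user 3 does not exist
        (12, None),    # NULL owner: a foreign key would not check it
        (200000, 99),  # orphan, but outside the Id <= 100000 cutoff
    ])
    conn.executemany("INSERT INTO Comments VALUES (?, ?, ?)", [
        (100, 2, 10),   # valid
        (101, 5, 10),   # orphan user
        (102, 1, 999),  # orphan post
        (103, 1, 11),   # valid: post 11 exists, even though its owner is missing
    ])
    return conn


def find_orphans(conn: sqlite3.Connection, max_id: int = 100_000) -> dict[str, list[int]]:
    """Run each check and return the offending child Ids, keyed by relationship."""
    return {name: [row[0] for row in conn.execute(sql, (max_id,))]
            for name, sql in ORPHAN_CHECKS.items()}


result = find_orphans(build_sample_db())
print(result)
```

On this sample data the checks flag post 11, comment 101, and comment 102. The same NOT EXISTS pattern carries over to T-SQL; how quickly it runs against the real tables will largely depend on the indexes available on the key columns.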
Your query exercise has a few parts:
- Write one or more queries to find these answers as quickly as possible while putting little load on the database.
- Given what you find, hypothesize about what might have caused the foreign key problems.
- Given what you learned, are there any changes you want to make to the app, processes, or database?
You can post your answers in this blog post’s comments and discuss each other’s ideas. We’ll revisit your answers next week. Have fun!