How to efficiently get your data into a hierarchical structure
It is not uncommon for your data to have some hierarchical relationship.
Storing this data can be relatively easy but sometimes it is a bit difficult to get out again, especially if the data has different levels of hierarchy.
There are many situations where you might encounter this kind of data – it could be a list of nested groups, or a family tree, or the structure of employees and managers within a business. For this article, we’ll go with the employee/manager example, but hopefully, you’ll see how it can be applied in different cases.
We will also be using SQL Server 2019, but this will work in all versions of SQL Server going back to SQL Server2008.
Let’s start with how data is modelled.
Let’s say we have many users, and each user can manage other users. We can imagine it like this:
So on top of that, we have John, who manages Ian and Ellie. Ellie manages none, but Ian manages Allen and Tim. Alan manages Robert.
Only two tables are really needed to store this data.
storing hierarchical data
We a. will start with
Users Table, where we store each individual user:
Next, we’ll store each relationship, which represents who is the manager and who is the employee:
Now, we will enter some data. Using the example above, we’ll end up with this data for users:
And the data for managers looks like this:
If we want to get a list of all users and their immediate manager, we can do that quite easily with the following code:
This results in this data:
So far, so good. But what if we want to hierarchically all managers for any given user? So, if we look at Alan, we get a list of Ian and John. If we look at Ellie, we’ll only find John.
This is where we come in to our recurring questions.
Writing a query to get all managers for the same user
Let’s start with a very simple query to get the immediate manager of a given user:
This returns a single value of ‘ian’:
OK so far.
Now we want this result to contain ‘John’. We have the option of adding additional joins to that query, but then we’ll be limited to the number of joins we add. We’d make it work for two levels of managers, but what if we suddenly had three levels? or 30? This is where we need a recursive query to find all possible managers, regardless of the number of levels.
The key element within a recursive query in SQL Server is the Common Table Expression (CTE). There’s a neat feature of CTEs that allows you to handle recursion with ease, so that’s what we’ll be using here.
Let’s update our existing query to use CTEs, so that we can step through the changes needed:
See how we’ve moved part of the query
WITH Block – That’s the CTE. It’s basically the same query as before (it still returns ‘ian’), but now that we have a CTE in there, adding recursion is easy.
The main part of a recursive CTE is a
UNION, We write the CTE to first select the base data (in our case, finding the user’s immediate manager) and then
UNION The resulting data is then returned to the CTE to find the managers of that new result set.
Something like this:
see how we got a new one
UNION There, that joins the CTE (called
AllManagers) on that
dbo.Managers table, using
StaffUserId Columns to match pre-selected managers.
Now we end up with a list of ‘Ian’ and ‘John’, as we’d expect:
if we own
UserId parameter for id 6 and execute again now we get all manager of robert:
And all! We have a recursive query that returns all managers for a given user.
adding to hierarchy-level numbers
You’ll want to return an index for each returned manager, to indicate who is the immediate manager, who is the next manager, etc. This is easy to do and is a helpful addition to the recursive CTE. Here is the code:
See how we have added ‘Status’ column in CTE. In the original query, we are setting this as 1, which will be used for the immediate manager, and then
UNION The division adds 1 each time to count the levels. Running it for ‘robert’, we get these results:
So we can see that the immediate manager (‘Alan’) is ranked 1, and each subsequent manager is assigned an increasing position.
This is useful when you are only interested in a few specific managers (for example, find the first two managers of a user).
avoid infinite loops
One risk with any recursion is an infinite loop, in our case, writing a CTE that loops round and round until there is enough at the end of the SQL and kills the query.
Let’s add an extra line to our
dbo.Managers Table to make ‘Robert’ manager of ‘John’. Visually, the hierarchy would look like this:
And the updated data will be like this:
If we run the latest version of our query, we’ll get an error:
However, if we look at what is returned, we can see that those first 100 rows are still returned:
This is not perfect, and the solution is not to increase the maximum recursion limit unless you need to!
There are several ways to fix this, but one of the simplest (though not necessarily the fastest) is to create a ‘path’ of managers and check that the next manager you’re adding to the CTE doesn’t already exist. Essentially you can create a csv value of the already processed manager id and check against that. Here’s what it looks like:
See how we have determined the initial value of
ManagerPath having the id of the manager before and after it and with a comma in it
UNION We append this value to the next manager’s ID. We check that we haven’t seen this manager already
WHERE clause, checking that the id (comma surrounded) does not exist there.
This results in the correct list:
However, it may not make sense to you to include ‘Robert’ in the list of ‘Robert’ managers. If you want to exclude the original user from the results, you can add him/her to
WHERE section also. Here is the code:
This results in the same result set, without the original user:
We have seen that we can query hierarchical data in SQL Server using a Recursive Common Table Expression (CTE) to return all the parents/ancestors of any row. We have also seen how we can include a positional value for each result and avoid the infinite loop.