Writing Recursive Queries in SQL Server | by Jamie Burns | Nov, 2022

How to efficiently get your data into a hierarchical structure

photo by Michael Dzidzik Feather unsplash

It is not uncommon for your data to have some hierarchical relationship.

Storing this data can be relatively easy but sometimes it is a bit difficult to get out again, especially if the data has different levels of hierarchy.

There are many situations where you might encounter this kind of data – it could be a list of nested groups, or a family tree, or the structure of employees and managers within a business. For this article, we’ll go with the employee/manager example, but hopefully, you’ll see how it can be applied in different cases.

We will also be using SQL Server 2019, but this will work in all versions of SQL Server going back to SQL Server2008.

Let’s start with how data is modelled.

Let’s say we have many users, and each user can manage other users. We can imagine it like this:

Hierarchy of users, which shows John at the top managing Ian and Ellie.  Ellie doesn't manage anyone.  Ian manages Allen and Tim.  Alan manages Robert.
A hierarchical structure of users who manage other users

So on top of that, we have John, who manages Ian and Ellie. Ellie manages none, but Ian manages Allen and Tim. Alan manages Robert.

Only two tables are really needed to store this data.

storing hierarchical data

We a. will start with Users Table, where we store each individual user:

Next, we’ll store each relationship, which represents who is the manager and who is the employee:

Now, we will enter some data. Using the example above, we’ll end up with this data for users:

And the data for managers looks like this:

If we want to get a list of all users and their immediate manager, we can do that quite easily with the following code:

This results in this data:

So far, so good. But what if we want to hierarchically all managers for any given user? So, if we look at Alan, we get a list of Ian and John. If we look at Ellie, we’ll only find John.

This is where we come in to our recurring questions.

Writing a query to get all managers for the same user

Let’s start with a very simple query to get the immediate manager of a given user:

This returns a single value of ‘ian’:

Query results showing 'Ian'
query result

OK so far.

Now we want this result to contain ‘John’. We have the option of adding additional joins to that query, but then we’ll be limited to the number of joins we add. We’d make it work for two levels of managers, but what if we suddenly had three levels? or 30? This is where we need a recursive query to find all possible managers, regardless of the number of levels.

The key element within a recursive query in SQL Server is the Common Table Expression (CTE). There’s a neat feature of CTEs that allows you to handle recursion with ease, so that’s what we’ll be using here.

Let’s update our existing query to use CTEs, so that we can step through the changes needed:

See how we’ve moved part of the query WITH Block – That’s the CTE. It’s basically the same query as before (it still returns ‘ian’), but now that we have a CTE in there, adding recursion is easy.

The main part of a recursive CTE is a UNION, We write the CTE to first select the base data (in our case, finding the user’s immediate manager) and then UNION The resulting data is then returned to the CTE to find the managers of that new result set.

Something like this:

see how we got a new one UNION There, that joins the CTE (called AllManagers) on that dbo.Managers table, using StaffUserId Columns to match pre-selected managers.

Now we end up with a list of ‘Ian’ and ‘John’, as we’d expect:

Query results, showing 'Ian' and 'John'
query result

if we own UserId parameter for id 6 and execute again now we get all manager of robert:

Query results showing 'Alan', 'Ian' and 'John'
query result

And all! We have a recursive query that returns all managers for a given user.

adding to hierarchy-level numbers

You’ll want to return an index for each returned manager, to indicate who is the immediate manager, who is the next manager, etc. This is easy to do and is a helpful addition to the recursive CTE. Here is the code:

See how we have added ‘Status’ column in CTE. In the original query, we are setting this as 1, which will be used for the immediate manager, and then UNION The division adds 1 each time to count the levels. Running it for ‘robert’, we get these results:

The results of the query showing 'Alan' with a condition of 1, 'Ian' with a condition of 2 and 'John' with a condition of 3
query result

So we can see that the immediate manager (‘Alan’) is ranked 1, and each subsequent manager is assigned an increasing position.

This is useful when you are only interested in a few specific managers (for example, find the first two managers of a user).

avoid infinite loops

One risk with any recursion is an infinite loop, in our case, writing a CTE that loops round and round until there is enough at the end of the SQL and kills the query.

Let’s add an extra line to our dbo.Managers Table to make ‘Robert’ manager of ‘John’. Visually, the hierarchy would look like this:

The manager is showing the hierarchy with John being managed by Robert.
update manager hierarchy

And the updated data will be like this:

If we run the latest version of our query, we’ll get an error:

An error, saying 'statement terminated'.  At most 100 recursions have elapsed before the statement is completed.'
The error we get for the infinite loop

However, if we look at what is returned, we can see that those first 100 rows are still returned:

List of results.  Shows repeating 'Allen', 'Ian' and 'John' multiple times as the position value increases for each row
query result

This is not perfect, and the solution is not to increase the maximum recursion limit unless you need to!

There are several ways to fix this, but one of the simplest (though not necessarily the fastest) is to create a ‘path’ of managers and check that the next manager you’re adding to the CTE doesn’t already exist. Essentially you can create a csv value of the already processed manager id and check against that. Here’s what it looks like:

See how we have determined the initial value of ManagerPath having the id of the manager before and after it and with a comma in it UNION We append this value to the next manager’s ID. We check that we haven’t seen this manager already WHERE clause, checking that the id (comma surrounded) does not exist there.

This results in the correct list:

The results of the query are showing 'Alan' with position 1, 'Ian' with position 2, 'John' with position 3 and 'Robert' with position 4.
query result

However, it may not make sense to you to include ‘Robert’ in the list of ‘Robert’ managers. If you want to exclude the original user from the results, you can add him/her to WHERE section also. Here is the code:

This results in the same result set, without the original user:

Query results, showing 'Alan' in position 1, 'Ian' in position 2 and 'John' in position 3
query result

We have seen that we can query hierarchical data in SQL Server using a Recursive Common Table Expression (CTE) to return all the parents/ancestors of any row. We have also seen how we can include a positional value for each result and avoid the infinite loop.

Leave a Reply