Designing Data Teams for Enabling Analytics at Scale | by Manvik Kathuria | Nov, 2022

There is no one-size-fits-all!

Photo by caledico Feather on Unsplash

Most global organizations have embraced data analytics to drive decision-making and reporting. If you are a startup, all you need to get started is a focused data team. If you are a medium-sized or large business, however, you are probably debating how to structure your data teams.

Designing teams is a mix of art and science. It may seem like an easy exercise, but don't underestimate the importance of building a team that will set your organization up for success. Before you design your team structure, you should have your data strategy in mind. Your strategy defines your goals and the actions you need to take to unlock the value of your data. Let's start with the questions that come to mind when building a modern data team:

  • Who owns the data roadmap and delivery?
  • What use cases do we want to address with the data?
  • What is the role of the data team(s)?
  • Should we mix data engineering, platform and analytics or keep them separate?
  • Should the teams be centralized or distributed?
  • How many teams and what roles do we need?
  • What are the vision, mission and values of the data team?

The modern data world looks very different from its predecessor. Historically, small central teams were responsible for ingesting, storing, transforming and reporting on data. The data landscape was simple, and data volumes were small. Most companies did not need a formal organizational structure for data; a small to medium-sized data team could usually serve the rest of the business.

Low data complexity, slow growth and a limited set of analytics use cases did not call for complex team structures. AI/ML was rare, and most people were happy with the basic reports and dashboards they could build with the data team's help. Excel was the power tool for slicing, dicing and analyzing data, and it is still the tool of choice for many people.

Ownership of datasets was straightforward, and the traditional role of data steward was enough to uphold data governance. Big data was largely unheard of, and compute and storage were expensive and far less scalable than they are in the cloud today. More data meant more hardware, often costing millions of dollars in CAPEX, with vendors taking months to install it.

Today, data has changed drastically in volume, complexity and usage. Businesses move faster and have less patience when waiting for answers to important questions. The cloud has opened up new horizons of scalability, although few organizations fully understand what it means and what it requires. There is no shortage of proprietary offerings in the big data space, alongside equally robust open-source projects trying to solve data problems. Data governance, security and management have become far more important than they were a few years ago.

Software engineering has gone through many iterations in terms of people, process and technology. Data engineering is still in its infancy by comparison, and much remains to be done before large-scale data can be delivered reliably and with confidence.

A good place to start is the structure of the data organization. For many companies, data has become a function of its own. However, the structure at the top is only part of the design; there should also be a simple design and good collaboration at the team level. Let's go through some popular team structures.

Centralized Teams

In the traditional structure, a single group (one or more teams) is responsible for everything related to data. From ingestion to curation to reporting, it delivers projects and initiatives from start to finish.

Figure: The central data team and the other teams it works with (image by author)

This structure offers greater collaboration, knowledge sharing, centralized prioritization and a single place for all data needs. It is easy to set standards and best practices that ensure the system is trusted and holds good-quality data. Data governance also tends to be more straightforward, because enforcing governance rules does not require coordinating across multiple teams.

The biggest challenge with central data teams is that they eventually become a bottleneck for new projects and initiatives. Even though prioritization is centralized, stakeholders struggle to get their requests addressed first. The team may also drift away from the organization's objectives and work in a silo.

Embedded Teams

Many organizations let each business unit run its own data team for ingestion, curation and reporting. The data platform is typically a capability shared across all business units, offered as a service for building data pipelines and reports.

Figure: Embedded data teams (image by author)

Being close to the business and data owners helps the team understand the data better. Priorities are set within the business unit and aligned with its objectives. However, this model suffers from a lack of standard practices and guidelines, with the risk of reinventing the wheel. Because each unit has little visibility into other units' data, redundant datasets can emerge and lead to inconsistent reports.

Hybrid/Matrix Teams

The hybrid approach aims to combine the best of both worlds while avoiding the drawbacks of the other two structures. A central data team exists to ingest, curate and prepare standard datasets for all business units, and engineers and analysts from the central team are embedded in the business units.

Figure: Hybrid/matrix data teams (image by author)

The central team makes it easier to create, maintain and reuse standard guidelines, practices and procedures. Central team members who work with the business can bring these best practices into each business unit. Business units feel part of the solution and, by working alongside these embedded resources, provide the data and problem context that a purely central design would miss. This design shares the challenges of any matrix structure: balancing the priorities of the central and embedded teams, and leading people across dotted-line reporting relationships. It requires strong collaboration and cohesion to work smoothly.

What works for one organization may not work for another. Your data strategy should be the starting point for designing your data team organization. Your current organizational structure, size and bandwidth determine which team structure fits best. It is possible to start with one structure and move to another over time. Ultimately, what matters is the number of use cases you solve with data.

“Most people have a very strong sense of organizational ownership, but I think people have an innovation agenda, and everything is shared in terms of implementation.” – Satya Nadella
