CDO's Data Quality Makeover: A 4-Step Blueprint for Success | DGIQ Conference, December 2023
Succeeding with Data Quality in the Era of the Data Cloud | Presented by Manu Bansal, CEO and Founder, Lightup Data | DGIQ East, Washington, D.C., December 5, 2023
Transcript
Thank you all for being here. Is this your last talk of the day? Yeah. Okay. Good. Well, thanks for your commitment. And hopefully this is worth the time you're spending here.
I’m Manu Bansal. I’m one of the founders and the CEO of Lightup Data. We are a data quality company. Started about five years ago, and I’ll talk more about that.
If I get started, I’m curious, how many of you are data quality people specifically?
Okay.
How many of you would call yourself the CDO or the highest-ranking data person at your company?
Okay. And I'm guessing a lot of you would call yourselves practitioners, the people actually responsible for delivering on the initiatives of the CDO. Right? So this is meant for both you and your data officers.
This is insight we have garnered working with companies like McDonald's and Gap and Skechers over the last few years. It's the first time I'm talking about these topics, and I'm actually super excited about the insights we were able to distill as I started to look back at all the work we have been doing with our customers.
What are some lessons that I would love to pass on to you, get your views on, and see if they translate into other organizations?
So like I said, we are a data quality company. We started about five years ago based in Silicon Valley backed by one of the top tier Silicon Valley firms.
Before building this company, I was building another company called Uhana, which was acquired right before I started this one. There I was responsible for building predictive analytics for telecom operators. That was my deep dive into data from the other side, where I was at the receiving end. We would run into data quality issues all the time, and when we looked around we said: the data stack has moved a lot from what it used to be, and the data quality tools that exist don't seem to fit the stack well. The problems have shifted. The solution architecture needs to be different from what I was seeing. So we said, let's go solve that problem the way we would like it solved, right? So that's how Lightup came about.
And before I go into all the insights that I'm going to present to you today, I just wanted to take a moment to thank the customers that have worked so closely with us that I feel they're more collaborators than customers. Right? We don't normally get to thank them enough, so I just wanted to make sure that I'm bringing them up here.
And a lot of what we learned is actually by working with them and at drawing from their experience and translating that into a tool that we could give back to them.
So what I want to talk about today is: how do we go from what we have come to think of as data quality debt to what we would like to achieve, which is data trust? Right? And what I mean by that is, when we talk to all those customers we have worked with, or people we might get to work with, we ask them how they're doing on data quality.
Usually, the answer is that the initiative started with a lot of momentum, but died down at something like five to ten percent of target.
Right. So the rest of what was never implemented is now data quality debt, and that’s what we are all sitting on.
And when we try to understand why that’s the case, what we have noticed time and again is the investment needed to solve the data quality problem at scale is just too high. And traditionally, the reward has been a bit too low to justify that investment.
But that's changing now. That's why we are in market. That's why we are seeing the success that we have with our customers. And in fact, they are seeing success with data quality initiatives that they've never seen before. The ROI script has actually flipped.
Right? And what are some lessons we can draw from the way they approached data quality to derive that new equation of ROI?
So before we go there, though, let's talk about data quality debt as a problem statement. What we notice is that setting up checks has traditionally taken too long or too much effort. We've seen estimates like a month to build a simple null value check end to end, and to push it out into production you're jumping through hoops all the time, right?
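For perspective on the gap between effort and logic: the check itself is tiny. Here is a minimal sketch of a null-value check, using an in-memory SQLite database and illustrative table and column names (the month of effort is all in deployment, not in this logic):

```python
import sqlite3

def null_check(conn, table, column, max_null_rate=0.01):
    """Minimal null-value check: fail when the fraction of NULL
    values in `column` exceeds max_null_rate."""
    total, nulls = conn.execute(
        f"SELECT COUNT(*), COUNT(*) - COUNT({column}) FROM {table}"
    ).fetchone()
    rate = nulls / total if total else 0.0
    return rate <= max_null_rate, rate

# Illustrative usage against an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 4.99), (2, None), (3, 7.49), (4, 8.99)])
ok, rate = null_check(conn, "orders", "amount", max_null_rate=0.1)
print(ok, rate)  # one NULL out of four rows -> rate 0.25, check fails
```

The hard part, as the talk argues, is everything around this: scheduling, scale, alerting, and keeping up with a changing data model.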
I mean, if you’re doing that kind of a development cycle at scale, that’s not going to go very far. And so now before you know it, what happens is the data model has changed, and that’s kind of the promise of the new data stack, which is being able to add applications at a rapid pace. But then now you’re always chasing your own tail because this model has shifted by the time you’re done with building out those checks and you start over again. Right?
So what you end up with is just not enough coverage. That’s why the number usually is like five percent of all the checks they wanted to build. That’s all they got to, right?
So that's what I mean by the data quality debt problem.
But then it doesn’t have to be that way.
So this is what McDonald's was able to accomplish, working with us, which is what I'm drawing upon here. You see that massive spike, and look at the Y axis. What I'm showing you here is the number of data quality checks in production over time. This is actually a dated graph, but I wanted to zoom in on that steep cliff you're seeing there: everything they had done with a legacy tool in about two years, they were able to port over to Lightup, our platform, in about two weeks.
Right. So that's the thesis here: the effort it used to take can be cut down by a couple of orders of magnitude. And the second part I'm going to talk about is that the value of actually implementing those checks is tremendously higher than what it used to be. That's what flips your ROI equation all of a sudden.
Okay. This is the same kind of data from some other customers we have worked with, where the number of checks has started to run into five thousand or around that number. Alright. So I want to take a step back and think: if you had to build that many checks, where each check was taking one engineer a month, what's the cost of just realizing that kind of outcome?
It’s basically infeasible. Right? I mean, there’s no ROI that really justifies it. If you could do that in minutes, that’s what you get, right?
And this is still not one hundred percent coverage you're talking about. These counts are still growing. Right? And they're probably going to get to eight thousand, ten thousand checks before they call it done, right?
But they have been able to come that close that quickly.
All right. And so what they're seeing, with all the work that they're now able to do with us, is of course this massive reduction in data quality incident rate. But what really captures it for me is what Dan from Baker Hughes said, which is that people are now seeing a new art of the possible.
Right? It's the same old problem, but way more solvable than people have ever seen it to be. And what's happening with Dan especially is that people are now coming to him from different business units with their own data quality use cases.
At first, they didn't want to do it. Dan was going and telling them, you need to do this. Then they started to come to him and ask, can you do this for me? And what's happening now is they're asking him, how do I do this?
People are just that motivated, that inspired. They're like: these problems that we've just dealt with with people and processes, you're telling me I can actually build this out as a check, at scale, in a few minutes? And I don't need to be a data engineer. I don't need to be an expert in software and scale.
That’s unprecedented.
Right? So that's what we are really seeing right now, when people have been able to bring in the right technology with all the surrounding process of their own that goes with it. And so what did they do?
What did they specifically do that was different, even from how they themselves were doing it before we got there? So that's my four-point blueprint for success here, which hopefully translates to your organization too. Number one for me has always been focusing on an operational use case.
And I'll show you examples, but every single one of those customers has focused on an operational use case as opposed to something that I would just call analytics. Right? It's not to take away from analytics, but it's one thing to solve a use case that's a report on a Tableau dashboard that someone is looking at internal to the organization.
It's completely different when you're talking about an operational use case, where it's directly driving customer experience or business risk. So they all focused on operational use cases. It's much easier to justify investing in data quality that way. Okay.
Number two, they hitched onto replatforming, and I'll talk more about it. You're rebuilding the platform as we speak. Right? I mean, you're picking up Databricks or Snowflake or Redshift or what have you. But that rebuilding is not necessarily limited to just your data store.
You can and should go beyond that and use this opportunity to justify investing in data quality from scratch.
Number three, they were all big on democratizing data quality. And this was actually something that took us some time to understand from the customers.
They said, look, if we just have a data steward or a small expert group building all the data quality checks, how far can it really go?
Data engineers are some of the most precious resources, the hardest to find, some of the highest-paid people. You can't expect them to just sit and write checks when they should instead be building new applications and pipelines.
Well, so what do you do?
Broaden it out, right? Take it to the business. And number four, piggybacking off the first one there: they all approached data quality from an operational point of view. Just like data is now going into operations from analytics, that's what needs to happen with data quality too. That's what I'm going to talk about, detailing each of these four things in the rest of my talk, giving you concrete examples from what they did and how you can think about approaching this problem.
Now why is it that we are talking about this now? I mean, why is this the right time? Why is this succeeding now, when some of those concepts we maybe realized ten years back? A couple of things, in my opinion. One is this renewed motivation to solve the data quality problem.
Why is that? Because you have shifted from just BI to AI, right, taking data into production.
I'll talk more about that. But that's a good reason, while reinvesting in the platform, to also do data quality from the get-go. Okay.
There's of course scaling happening at the same time, which is driving a lot of this change, but that's an opportunity to rethink your entire data architecture. That's why we are seeing that much openness to asking how we should do it now, when technology has probably moved more than a decade. And we have the chief data officer.
We didn't have that kind of formal discipline for the longest time, but we do now. And that means a lot, because now the CDO is responsible for delivering a platform that people can build on and trust. That's a big part of the measure of success.
And of course, technology has come a long way. So all of those are coming together at the same time. Building on those changes, the first recommendation is, like I said, pick an operational use case. So here's what McDonald's picked as their flagship use case.
They’re operating a lot of marketing tech campaigns with data.
And this is in the wake of the pandemic, where McDonald's, being traditionally a brick-and-mortar business, was at a crossroads: what do we do now?
And they went digital like never before, introducing a personalization campaign. You could now have a McDonald's account for the first time.
Okay. So that was called MyMcDonald's.
The loyalty program and personalization have been longstanding ideas in retail, but McDonald's did it for the first time in twenty twenty, twenty twenty one.
And this is obviously driven by data. Right? You are now buying a burger at a restaurant. That generates a transaction, which needs to flow back to the Redshift cluster. If that comes out broken, guess what? You don't get your points back. So from a McDonald's point of view, they have violated their fiduciary duty by not giving you points back.
And then you make a chargeback, so it's a financial loss. But then there's an NPS loss, so they want to overcompensate you. And then they lost the opportunity to personalize the marketing campaign, because they didn't detect what you bought.
Just imagine the impact of that issue. Right? I mean, this is a business that celebrates when they grow a couple of percent points a year, right? So you can imagine how big that initiative was.
It's very different from saying I have a report on my Power BI dashboard that's not showing the right numbers. Here, people are not getting their points back. You have heard quantifications like: an hour of this incident on a small group of restaurants is worth six figures.
Right? It’s that easy to attribute direct impact.
Baker Hughes chose to focus on supply chain use cases.
They really really care about delivering their equipment to their customers on time at the right place.
Well, it turns out that in the data ending up in the SAP ERP system, seventy percent of late deliveries don't have a cost code.
So when you look back at the data two months down the road, you have no idea why seventy percent of your deliveries are not making it on time. Someone is supposed to capture this data, but it's not getting captured as part of the business process, and you discover that only when you try to actually solve the problem.
It's too late. You can't go back in time and recollect that data.
But you can imagine how operationally important that use case is. Right? There are a lot of other use cases that we now take for granted because data products are the big new thing. We have finally figured out how to take data directly into customer experience.
Those are the use cases that we need to focus on to justify investment in data quality.
Alright. So really, to sum it up, what's happening here is that what used to be just a BI application has kind of graduated into being an AI application, to use the word loosely. That includes data products, that includes ML capabilities, that includes fancy new things like GenAI, but in my books it also includes run-of-the-mill yet very concrete use cases like just giving people points back on their transaction at a store.
Right? But the key here is that data is directly driving a data product. That’s what you want to focus on. Alright. Then data quality is not optional anymore.
That was my first recommendation. The second recommendation, which somewhat flows from this evolution of the data stack, is hitching onto replatforming.
We're seeing massive scaling of data. This is no news to anyone in the room. And it's not just in terms of data volume, but in terms of what you call variety of data, or what I'm calling cardinality.
This is in terms of rapidly evolving shape of data. You’re adding new tables, new views, new columns every week, new applications that are riding on top of those new data models.
And to sustain that kind of data volume and those new applications, we are seeing a rebuild of the data platform. So Baker Hughes, in this case, is now rebuilding the data stack on top of Databricks, where they're creating a clone of SAP so that they can do analytical and operational analysis of that data that they cannot otherwise do on top of SAP directly. Well, okay. Great.
It's necessary to do that because of the data volumes we're dealing with. The same thing at McDonald's looks even more complex, but the fact is still the same. The data stack is getting rebuilt.
It has to be.
Right. So it sounds like this is all negative, right? You’re dealing with that much scale. That only makes the data quality problem harder.
The interesting thing here is there’s actually an opportunity too.
This is a good reason to go and insist on data quality. So if you are reporting to your CDO, that's what you need to tell them. If you are the CDO, this is what you need to ask yourself.
How will your people embrace this new platform you're building out if they don't trust it?
Right. I mean, your legacy stack has matured over two decades.
Your charter is to take the organization to the new data stack in two years?
Fine. Maybe it’s like five years.
Right, but how are you going to get your people to embrace that stack? It's what we keep hearing time and again: the biggest resistance actually comes from the inside, where your own people don't want to be in the line of fire if they make a mistake on the new data stack.
So you cannot afford to wait for data trust to be created. If you're going to succeed in this initiative, you need to shore up trust from day one. It's much better to intersperse the two: take one small application, build data quality on it, get people to embrace that, and then move to the next application. That's exactly what we are seeing customers do at McDonald's or Gap or Baker Hughes. So here's a snapshot of why data quality has traditionally been too slow.
And this looks ugly on purpose, right? I mean, that’s what your process is to build data quality.
I mean, you're practitioners, so you know this probably better than I do. There's a full prototyping phase, but then there's an implementation-at-scale process. And by the time everything is said and done, what finally gets deployed sometimes mismatches the spec, and so you're trying to bake it into a production-grade implementation after it has been deployed. The whole thing might take anywhere from four to eight weeks.
Well, okay. I mean, you can’t afford that. Right? And that’s what your process looks like.
You're going to fail in pushing your new platform through if this doesn't change. But it doesn't have to be like that.
What these customers are showing us is that you can cut this cycle time down to minutes. This is literally what McDonald's talked about at the Data and AI Summit earlier this year.
What would take them a month to build out, they're now doing in twenty minutes.
There are still corner cases where it might take you two hours, maybe even two days. But for the majority of your use cases, that's the massive difference in velocity you can expect to achieve. And that's what you need if you're going to be operating at forty x the scale of data that you had in your last life. Right? You need that.
Good news is you can get that, right? If you’re picking the right technology. So here I’m comparing a couple of different architectures.
Of course, I'm biased, but I just wanted to give you a sense of the different architectures to look for so that you can decide which one is a good fit for you. Maybe there are other architectures than ours that work for you, but just to give you context: if you look at traditional data quality platforms, the way they have been architected is for much lower data volume, when things were, let's say, a few spreadsheets worth of data, a few CSVs, or a MySQL DB.
Right? So the way they would operate is what's to your left, which is an architecture that just imports all the data into the data quality tool first, and then you begin to analyze it for quality. You might then fix it, you might then push it back, whatever, but the first step is: bring data out from where it lives, bring it all into one place.
That used to be fine when volumes were low. It doesn't work anymore.
And so the next evolution of that architecture was to now start to embed a scalable processing engine within the DQ platform.
Right? So that could be Spark; it used to be Hadoop, then that gave way to Spark. And that's what a lot of vendors have tried to do in, I would say, the last six to eight years.
In theory, that works really well. Spark scales. I mean, that’s the basis of Databricks.
Snowflake has a similar MapReduce-style engine. So why wouldn't it work for data quality?
The problem, though, is that it creates a DevOps nightmare for you as the custodian of this DQ tool.
Managing a cluster with embedded Spark, try doing that. You probably have, so you know what I'm talking about. It's just not feasible from an operational point of view. So it sounds good, but doesn't actually work the moment you hit scale.
A competing architecture that came about tried to skirt around the problem of scale by looking at metadata instead of the actual data. That's the observability architecture I'm showing to the far right there.
So the idea here is to just process the metadata and see how far that goes in solving data quality.
Metadata is computed by the data platform, so you’re not spending additional compute resources.
It's a couple of orders of magnitude smaller in scale, so it's much easier to deal with. The problem is it has its own limits. Right? It's good for basic checks, but it's relatively shallow: it doesn't really let you solve problems like price checks.
McDonald's is seeing an issue where my burger transaction is showing up as a five thousand dollar transaction. Okay, so you're giving me one hundred x the points I'm supposed to get back.
That doesn't happen at a McDonald's store, right, unless I bring a truck to carry the burgers out with me. But how do you even check for that using metadata alone?
You have to scan the data. So that's where it runs into limitations. It's a good architecture in the sense that it deals with the scale problem, if that meets your requirements.
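To make that limitation concrete, here is a sketch of the kind of value-level check the metadata-only approach cannot express. Row counts, freshness, and schema never see individual values; only a scan of the rows catches the five-thousand-dollar burger. Table and column names are illustrative:

```python
import sqlite3

def out_of_range_count(conn, table, column, lo, hi):
    """Count rows whose value falls outside a plausible range.
    Metadata (row counts, freshness, schema) never sees these values;
    only a scan of the data itself can catch a $5,000 burger."""
    return conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {column} < ? OR {column} > ?",
        (lo, hi),
    ).fetchone()[0]

# Illustrative usage: one absurd transaction among normal ones
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (amount REAL)")
conn.executemany("INSERT INTO transactions VALUES (?)",
                 [(6.49,), (11.20,), (5000.0,), (8.75,)])
print(out_of_range_count(conn, "transactions", "amount", 0.50, 500.0))  # 1
```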
And then the way we chose to architect the platform: we said, look, we saw this transition happen in BI, where BI tools used to have the same architecture.
Bring everything in, then start to chart it, analyze it, and do what you want with it. And then we said, look, that's not only no longer necessary, it's actually not even working anymore. That's what Redshift solved, and then Snowflake solved, and then Databricks solved: you can just push queries down.
Now the platform is doing the big data compute part for you. You never have to move any data. You can do exactly the same thing for data quality. Every single check can be broken down into two broad components.
Compute a data quality metric, a measure like: what's the volume looking like? And then detect anomalies on it: what is it supposed to look like? The first component is the big data compute part. You can push it down; you don't have to move data. The second component, anomaly detection, is a much lower data volume problem; you can do that outside the tool. So there are different choices available, but I encourage you to go to another talk tomorrow.
One of my colleagues, Brian, is talking about the Lightup platform for a whole session tomorrow. If you're interested in what that looks like, please go to his talk, or you can stop by our booth.
The point here is that there are now platforms which will give you those deep data quality checks without compromising on what you can test for; they will scale, and they're easy to use. It's no longer an expert task. So that's what you want to look for, and the timing is when you're redoing your data platform. That was my second recommendation.
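The two-component split described above — push the metric computation down to the platform where the data lives, then detect anomalies on the small result outside it — can be sketched in a few lines. This is a simplification of the idea, not Lightup's actual implementation; SQLite stands in for a warehouse:

```python
import sqlite3
import statistics

def compute_metric(conn, metric_sql):
    """Component 1: the big-data part. The aggregation runs where the
    data lives (Snowflake, Databricks, Redshift...); only one scalar
    per run comes back over the wire."""
    return conn.execute(metric_sql).fetchone()[0]

def is_anomaly(history, value, z_threshold=3.0):
    """Component 2: anomaly detection on the tiny time series of past
    metric values -- cheap enough to run outside the platform."""
    if len(history) < 3:
        return False  # not enough history to judge
    mean = statistics.mean(history)
    spread = statistics.pstdev(history) or 1.0
    return abs(value - mean) / spread > z_threshold

# Illustrative usage: a daily row-count metric with a sudden drop
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER)")
conn.executemany("INSERT INTO events VALUES (?)", [(i,) for i in range(10)])
history = [1000, 1010, 990, 1005]                            # prior daily volumes
today = compute_metric(conn, "SELECT COUNT(*) FROM events")  # 10
print(is_anomaly(history, today))                            # True: volume fell off a cliff
```

The point of the split is that no raw data ever moves: the warehouse returns one number per metric per run, and the anomaly logic only ever sees that small time series.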
The third thing here is to democratize build out of data quality, right?
Like we said, McDonald's is now running four to five thousand different data quality checks and counting. This is across twenty different data products, which they classify as five or six different data domains. If you tried to bring all of that into one team, that team doesn't even have the context to define good versus bad data.
So it's a massive challenge just from a data literacy point of view, least of all resources. But the way they did it was they broadened it out. So right now there are three hundred people at McDonald's building their own DQ checks.
Well, the entire platform team is not that big. So there are people from sales, from treasury, from marketing, just writing their own data quality checks. These are literally business users.
Right. So the point here is that it can be done. I mean, it sounds obvious in hindsight that of course that's the only way to scale. I think the big question mark was: can this actually happen?
And it was tremendously hard when building data quality checks was an expert task, when you needed to go and learn a domain-specific language, or be a Python expert, or have to write a scalable implementation in Scala or Java or whatnot.
Turns out you don’t have to do that anymore. Right?
Just as one stake in the ground: we have a platform that has three different interfaces to write data quality checks, from least technical to most technical. The first interface is just a drag-and-drop interface, fit for business users.
It doesn't solve all your use cases, but sixty to seventy percent of what business users want to do just happens there, and it takes minutes. Then you have the next, more technical interface, which is a variant of SQL. It's actually pure SQL, but you're not writing SQL for scale. You're writing SQL for function.
Right? The system knows how to derive a SQL query that scales out of that functional description, but you're still writing in exactly the same dialect as the underlying technology. If you're doing this on Snowflake, it's Snowflake SQL you're writing there. So for any use cases that require one more degree of customization than what the prebuilt checks will give you, go to the SQL interface. McDonald's found a lot of success with that specific interface.
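One way to picture "SQL for function, not for scale": the user supplies only the metric expression, and the system wraps it into the time-windowed, scheduled query it actually runs. This templating sketch is hypothetical, purely to illustrate the division of labor, not the vendor's real implementation:

```python
def render_metric_query(metric_expr, table, time_column, start, end):
    """Wrap a user's functional metric expression (an aggregate written in
    the warehouse's own SQL dialect) into the windowed query the system
    executes on each scheduled run. The user never writes the window,
    the schedule, or anything scale-related."""
    return (
        f"SELECT {metric_expr} AS metric_value\n"
        f"FROM {table}\n"
        f"WHERE {time_column} >= '{start}' AND {time_column} < '{end}'"
    )

# The user writes only the first argument; everything else is generated.
q = render_metric_query(
    "AVG(amount)", "transactions", "created_at",
    "2023-12-01", "2023-12-02",
)
print(q)
```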
And then there are operations you want to do which are programmatic in nature. You want to create a data quality check in response to creating a new table. Or you want to trigger a DQ check when data updates in certain tables, so you're tying it into your Airflow orchestrator.
Use the API interface for all those needs. So depending on what you're trying to accomplish, there is a right interface. But the key point here is that the majority of what you're trying to do can be done by business users themselves, as long as the data steward is there to enable them instead of trying to do it all themselves.
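For the programmatic path, the pattern is simply that your orchestrator calls the DQ platform's API after each load. The endpoint and payload fields below are hypothetical, shown only to illustrate the shape of such an integration, not any real vendor schema:

```python
import json

DQ_API = "https://dq.example.com/api/v1/checks/run"  # hypothetical endpoint

def build_trigger_payload(table, reason):
    """Body for a hypothetical 'run checks now' API call -- the field
    names are illustrative, not a real vendor schema."""
    return {"table": table, "trigger": reason}

def on_table_updated(table, post=None):
    """Hook to call from an orchestrator task (e.g. an Airflow callback)
    once a load finishes. `post` is injected so the sketch needs no
    network; in real use it would be something like requests.post."""
    payload = build_trigger_payload(table, reason="post_load")
    body = json.dumps(payload)
    if post is not None:
        return post(DQ_API, body)
    return body

print(on_table_updated("sales.transactions"))
```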
So what we're seeing at those organizations is that the role of the data steward has now shifted, or elevated, from being the doer to being the enabler.
Right? I mean, there’s all those different roles that these customers have defined for us, which we then implemented in the product.
Right? So as you can see, it’s not just one person, one persona writing checks.
There are execs that want to see what the data quality score looks like from a bird’s eye view. Right? Then there are people who are just the responders. You don’t want them authoring any data quality checks, but anytime there’s an incident spotted, they should be the ones to get on it, get people assembled, get experts in the room, coders, whoever it is.
Right? It’s like, what’s the impact of this issue. Is it a true anomaly that’s detected or is this a false anomaly? Do we need to act on this?
If we do, who does it, and by when? Has it been done? Well, this person is not actually authoring the data quality check. But then there are authors that don't want to be in the line of responding to every single incident.
Right? These are two very different skill sets. So those are all the different personas. And what you're seeing at those organizations is that the data steward is now taking on that enabler role of creating literacy across all those different personas, which span all the way from data engineers to line-of-business users. That's the way you want to approach this if you're going to scale out to thousands of checks.
And my last recommendation here is to think operationally.
I mean, data is operational.
It’s enabling closed loop applications, right? It’s an end to end data pipeline. You’re not just sitting there analyzing that interactively. It’s directly driving supply chain or customer loyalty or inventory management.
Well, okay, do the same thing with your data quality as well.
Right? Think how do I make this proactive?
How do I make this an incident detection and management workflow?
Instead of a data quality analysis workflow, you don't even need to wait to run the check. Just run it on a cadence. You don't even want to look at the system if everything is running fine, right? Let it tell you when something needs your attention.
And then plug that signal into PagerDuty, plug it into ServiceNow, wherever people are tuned in for an operational workflow responding to incidents. So approach it from that point of view: we are continuously monitoring data quality, checks are running every day, maybe even every three hours, every hour. And then they are generating incidents, and you only get pinged if there's a problem, not the other way around.
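The operational mindset in miniature: checks run on a cadence, and the only output is incidents — silence means healthy. A sketch, with the notifier standing in for a PagerDuty or ServiceNow webhook (names illustrative):

```python
def run_monitoring_cycle(checks, notify):
    """Run every registered check once. Nothing is reported for healthy
    checks; notify() fires only for failures, so nobody has to sit
    watching a dashboard waiting for green."""
    incidents = []
    for name, check in checks.items():
        ok, detail = check()
        if not ok:
            incidents.append((name, detail))
            notify(name, detail)  # stand-in for a PagerDuty/ServiceNow call
    return incidents

# Illustrative cycle: one healthy check, one failing check
checks = {
    "orders_null_rate": lambda: (True, 0.002),
    "orders_freshness": lambda: (False, "no new rows in 6 hours"),
}
pages = []
incidents = run_monitoring_cycle(checks, lambda n, d: pages.append((n, d)))
print(incidents)  # only the freshness failure surfaces
```

In a real deployment this cycle would be driven by a scheduler (cron, Airflow, or the DQ platform itself) rather than called by hand.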
Okay. It’s a very different mindset, but then you start to think in terms of time series as opposed to a table of data or snapshot of data. Very different way to approach this.
Alright. So just to sum it up, these are the four things I talked to you about, right, pick an operational use case, much easier to justify the ROI.
Hitch onto the refresh of the data platform. You can do better, and you should, because the stack has changed. The old tools don't work anyway, and you don't have to stick to them.
Democratize, don't centralize: it's no longer necessary, no longer scalable. And then operationalize: come out of the analytical mindset and think operations, because that's what data is now driving.
So that's all I had to talk to you about today. There's an announcement, fresh off the press: we just announced our integration with Collibra earlier today.
We have had an integration with Alation for a very long time now. We are the data quality vendor; they are a data catalog and governance solution. And now we have that integration going with Collibra too. So if that's the stack you're interested in learning more about, or how these two tools can work well together, stop by our booth and talk to us about what you can do with Lightup and how these tools work together.
We can deploy in multiple ways. So especially in very regulated enterprise environments, if you want something to run on your side, we can do that. If you want something that's a combination of a SaaS and an on-prem deployment, a hybrid deployment, we can do that.
To learn more about our platform: we have some people wearing Lightup shirts here, feel free to talk to them after this session. There are some data sheets you can grab. We have a booth here, feel free to stop by. And Brian is talking specifically about the Lightup data quality platform tomorrow.
It's in the first half of the day; go to his session if you're interested in learning more about this.
And we do have a free trial, by the way.
So you could just get started right now.
That’s all I have, and we have some time for questions. Thank you.
I have a question. In your client organizations, what kind of oversight have your clients implemented over all these DQ checks, if any?
And then a separate note: I don’t see the slides loaded, so that’s something to look into.
I see. Okay.
The oversight I’m talking about is access control. As in your example, you have three hundred plus people creating DQ checks.
Right.
From a CDO standpoint, have your clients implemented oversight around what kind of checks are being created?
Which ones are running and so forth.
Great question. And so we have this concept of workspaces, which I didn’t go into at all. So I talked about these different data domains. Right.
So those three hundred users are made up of, say, five or six different teams. There’s a workspace for each team, and you can restrict user membership to a workspace. So you can make sure sales is not seeing what treasury is doing, if that’s how you want it to be. Okay.
So that starts to decompose the problem into small groups. Within each group, people then designate the different roles I showed you: there’s an admin for the workspace who has full rights on it, there are people who can author but, say, cannot add users, and there are people who can look at what has been built but cannot author.
There are also people who can respond to incidents. So the same structure you would have in, say, a ten-person team, you see expand into thirty people with different roles, and then you have six, eight, ten of those pods, all operating independently. Then you can start to apply restrictions: even if teams are connecting to the same data source, you can restrict the set of tables one team can access or read, and the number of checks they can run, by creating independent connections to the same source per team.
So you could put constraints like: I want to govern how much Snowflake time I’m going to use, or how much Databricks load I’m going to create. There’s governance built in. So you can say no data quality check should run longer than, say, ten seconds of compute time, or that, put together, they should not consume more than an hour’s worth of resources.
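The governance structure described here, workspaces scoping users and tables plus per-check compute caps, can be sketched abstractly. This is a hypothetical model with made-up names (`Workspace`, `may_run`, the role names), not Lightup’s actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Workspace:
    """Hypothetical model: a team-scoped workspace with roles and limits."""
    name: str
    allowed_tables: set            # tables this team may read
    max_check_seconds: int = 10    # no single check may exceed this
    members: dict = field(default_factory=dict)  # user -> role

    def add_member(self, user: str, role: str) -> None:
        assert role in {"admin", "author", "viewer", "responder"}
        self.members[user] = role

    def can_author(self, user: str) -> bool:
        return self.members.get(user) in {"admin", "author"}

    def may_run(self, user: str, table: str, est_seconds: int) -> bool:
        # A check runs only if the user can author, the table is in
        # scope for this workspace, and the estimated compute time
        # stays under the workspace's cap.
        return (self.can_author(user)
                and table in self.allowed_tables
                and est_seconds <= self.max_check_seconds)

sales = Workspace("sales", allowed_tables={"orders", "customers"})
sales.add_member("alice", "admin")
sales.add_member("bob", "viewer")

print(sales.may_run("alice", "orders", 5))    # True
print(sales.may_run("bob", "orders", 5))      # False: viewer cannot author
print(sales.may_run("alice", "treasury", 5))  # False: table out of scope
print(sales.may_run("alice", "orders", 30))   # False: over the compute cap
```

The point of the design is that each team’s pod is self-governing within its limits, so the CDO org sets the caps without approving every individual check.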
And then you can decide who gets to govern that. So yes, people are governing it. If you have more specific kinds of oversight in mind, I can get into those.
This is helpful for now. Thank you.
Okay. Other questions? You have to have some questions.
Is your product licensed at an ongoing cost, or is it something that’s installed once and then runs as an appliance?
We have designed the pricing model to give you predictability in what you need to budget for this.
We kept hearing about that from enterprise customers.
And you’re seeing the kind of customers I’m talking about here; that’s basically who we work with. As much as we might want usage-based pricing because our investors like that, you, as the customer, want predictability in pricing. So we have a happy medium: there are tiers, like t-shirt sizes that you’re buying. You know when you’re going to graduate out of one, and we tell you. When you do, you’re happy about that and we’re happy about that. But we don’t give you surprises in your spend at all.
Other questions?
Not a question, just an observation. I’m from AP, and we have a massive amount of data. Everything you said tracked exactly with our journey: we tried doing data quality rules one by one, but the team was centralized under the CDO and became a bottleneck too. It just couldn’t scale quality fast enough. So we’re in the same process of looking for the right data quality vendor.
You know who to talk to now.
Exactly the same pain points you’re talking about, and we do address them.
How long have you been at this?
We’ve been at it for two years. So Alright.
How big is the data team?
Well, there are data teams all over AD. But the team that was doing the grinding, responsible for the first wave of defining the rules and working with the stewards, is just four of us.
Our intention was always to federate out and get more people involved, but the tool just didn’t allow for that type of flexibility. And our tech team is about four or five people as well.
Alright. Right.
So we have a very small number of people trying to centralize things.
Thanks for sharing. I mean, it’s a design choice, right? Either the platform supports this kind of structure or it was never designed for it.
And that’s why I’m talking about the principle. So regardless of what tool you pick, that’s the structure you are looking for. Right? Even if you’re building your own, that’s what you want to enable.
And that takes a very different shape and form if you’re trying to democratize versus keeping it an expert task. We’ve seen those pitfalls, where it’s easy for people to prototype but really hard to scale. Scaling then comes back to the CDO org, and that becomes a massive bottleneck.
So even if you’re trying to solve this on your own, that’s what you want to be thinking about. Unless it branches out, it doesn’t scale.
Other questions, comments? We’ll make sure the slides are made available, but feel free to send us a note if you don’t see them, or I’ll get in touch with you and make sure you get them.
Cool. Thank you all and, enjoy your dinner.