Taking time to reflect on the past five years of running Lightup has re-energized me as I remember our humble beginnings and all the lessons we’ve learned along the way — some hard, some easy, some good … and some completely unexpected.
I’m proud to share that we’ve reached a huge five-year milestone, supporting Data Quality (DQ) at an unprecedented scale, where our biggest enterprise customers are now able to:
- Run 500,000+ daily live Data Quality checks.
- Cover 12+ petabytes of data.
- Monitor 2,500+ tables.
- Enable 500+ users to participate in Data Quality management processes.
The net result?
94% reduction in incident complaints from business stakeholders, building stronger data trust across the enterprise.
From 800 metrics before Lightup to upwards of 500,000 unique daily Data Quality metrics with Lightup, we've helped our customers grow the number of people writing checks from 20 before Lightup to nearly 500 engineers, product managers, and platform specialists in our top customer account.
This breakthrough coverage not only raises the bar for industry standards, but also illustrates the transformative impact of the Lightup platform.
But, no matter what challenges or surprises we’ve faced, I’ve remained grateful for every lesson, every learning opportunity, every chance to make things better as we continue to focus on serving our customers’ needs and solving business-oriented Data Quality problems.
These issues were highly complex. They weren't just about running technical Data Quality checks; they were deeper problems that required a different approach and different analysis to understand how to fix the underlying Data Quality incidents.
During this season of gratitude, I wanted to pull back the curtains and look at how and why we got here. Why? Because without our ups and downs, without our ability to pivot quickly and respond to market demands, we wouldn’t have been able to grow, and we wouldn’t have been fortunate enough to be working with some of the brightest minds around, solving complex Data Quality challenges at Fortune 500 enterprises.
Data Quality Democratization
I can’t think about our journey without recognizing the impact of democratizing Data Quality. That part of Lightup’s DNA has been the crux of our success — it’s one of the critical elements for scaling Data Quality without fail.
By democratizing Data Quality and eliminating the need for coding expertise, Lightup has empowered hundreds of users within our enterprise accounts, enabling a spectrum of users — admins, editors, and viewers from data teams, support operations, and business departments — to engage meaningfully in the end-to-end Data Quality management process.
This new approach has helped address historical barriers that have hindered many, if not most, Data Quality initiatives. By shifting away from the legacy approach that relies on specialized coding skills, we’ve opened the door for non-technical line of business users to contribute to the Data Quality life cycle.
Our commitment to making Data Quality more accessible has resonated with our customers, who view our approach as a vital enabler of their success.
Key Learning: Make Data Quality a Business-Driven Success
As we’ve grown, we’ve learned that achieving Data Quality at scale requires democratizing it across the entire organization. In the past, Data Quality was often viewed as the responsibility of a small data team of specialized data engineers.
But the biggest learning is that to truly scale Data Quality, it needs to be broader, more inclusive, and integrated into the business itself. It’s not just about automated checks — it’s about operationalizing business-specific Data Quality checks, so everyone within the business — from the data team to business users — plays a role in defining rules for good data. From there, a collaborative workflow must be established to remediate issues and prevent future incidents from occurring to maintain high Data Quality standards.
Shifting from Technical DQ to Business-Focused DQ
As Data Quality has evolved, we’ve seen a shift from needing purely technical Data Quality checks to more demand for business-focused Data Quality checks.
While prebuilt rules and models are helpful, they can’t always capture the nuances of each organization’s business processes. And that’s where implementing business-focused Data Quality checks becomes extremely useful and important.
Key Learning: Data Quality for the Business, Not Just IT
Data Quality is no longer just an IT tool — it’s a tool for the business. With a no-code user interface (UI), Lightup is designed to be user-friendly, intuitive, and accessible to non-technical business users. Unlike legacy tools, there’s no proprietary rule engine to learn.
Our customers want to express their business logic in SQL because it’s a familiar language. But even if your business users don’t know SQL, Lightup still enables them to define their own no-code Data Quality checks.
That’s creating another shift in data culture, where business users are empowered to get involved in Data Quality, so they can actually automate the data validations they’d be doing manually on their own in Excel.
The impact of Lightup’s approach is clear: Data Quality is now something that’s visible, operational, and essential for business success.
The Evolving Role of CDOs as Facilitators of Data Quality Democratization
For CDOs, the mandate to create real, measurable business value has been historically challenging. What we’ve seen is that when CDOs broaden participation in Data Quality to include non-technical business users from every department, Data Quality can be successfully incorporated into cross-functional processes for business operations — instead of being a siloed data team task.
With Lightup’s approach, CDOs become the facilitators or enablers for the entire organization, fostering data literacy and involving line of business data experts in Data Quality processes.
Data Quality becomes decentralized and democratized across the organization — and that’s the “recipe for success.”
Key Learning: Democratization Through Accessibility Accelerates the CDO's Impact
Data Quality isn’t just the responsibility of the Chief Data Officer (CDO) and the data team anymore. Data Quality is critical for business operations, so it must involve the business’ data experts — those who truly understand the context and nuances of their data — to define rules and help monitor Data Quality as a shared responsibility for maximum impact.
That’s how you reach sustainable Data Quality at enterprise scale. Without broad participation, the process can’t scale. And without scale, there’s no impact.
But that wasn’t immediately obvious to us when we first started.
The Power of Pushdown Architecture
As data increases, the traditional approach of moving data around simply doesn’t work anymore. We’ve seen that enterprise Data Quality initiatives simply can’t scale by moving and copying data to different locations for validation and testing.
Instead, the business logic and Data Quality rules must be brought directly to where the data lives. That’s why we designed Lightup around a pushdown architecture, with the ability to push Data Quality rules directly down to the data source environment for in-place processing.
Key Learning: Pushdown Enables Scalability
Lightup’s modern pushdown architecture was the right design. We followed our instincts and it turned out to be one of the key technical reasons we’ve been able to support large enterprises with highly complex data environments, even as a relatively young company.
Leveraging AI/ML for Anomaly Detection and Scalability
In the past, implementing Data Quality checks was a tedious, manual process — with development and testing cycles taking weeks to months to complete. But technology has evolved. And so have our customers’ expectations.
To meet those expectations, we’ve offered more automated functionality out of the box:
- AI/ML for Anomaly Detection was an obvious starting point, so we included prebuilt Data Quality-specific models for detecting anomalies, identifying trends, and analyzing seasonality.
- Preconfigured metrics help save time and eliminate as much manual effort as possible for basic checks.
- Sliced metrics use AI/ML to take a single metric configuration and automatically expand it into granular checks across every slice or dimension, resulting in productivity gains of 10x or more. (For example, instead of writing a separate volume check for each of your 14,000 restaurants, 200 markets, or 1,000 stores, Lightup applies that single check to every subcategory in the dataset.)
Key Learning: Increase Productivity with AI-Based Automation
Providing out-of-the-box checks and prebuilt Anomaly Detection models has proved to be very useful and valuable to our customers. We focused on integrating more AI/ML capabilities to automate as much as possible and make it even easier for users to interact with Lightup.
This level of automation has allowed our customers to implement Data Quality initiatives at an enterprise scale far beyond their initial expectations — all without the need for extensive coding resources or specialized consultants.
And now it’s clear we need to use AI to simplify the user experience even more by bringing GenAI to Lightup.
Growing Visibility at the C-Level
It’s been fascinating to see how Data Quality has become more and more visible at the C-Suite level, even in large public companies.
As Data Quality touches every part of the business — from reporting to customer-facing applications to regulatory compliance — it’s clear that Data Quality isn’t just an IT issue. It’s a critical component of business growth, customer experience, and overall success.
Key Learning: Successful Data Quality Programs Have C-Level Buy-In
We’ve found that the most successful Data Quality initiatives are those that have strong buy-in from leadership, which can help ensure that it’s handled as a business-critical priority.
Data Quality as an Operational Asset
Data is no longer just an asset for analytics — it’s become an operational asset. Data now drives everything from customer-facing applications to complex business processes, often with minimal to no human intervention.
As a result, Data Quality for these operational pipelines needs to be “hands-off” or autonomous — proactively managed, monitored, and corrected without relying on manual intervention.
Key Learning: Bring DevOps Principles to DQ
Much as DevOps principles transformed software development, Data Quality systems need to operate continuously and autonomously instead of waiting for business stakeholders to react to issues.
This means moving away from static Data Quality reports and leveraging real-time incident alerts that trigger automated workflows and support tickets. The end goal is to build workflows that support a wide range of technical and non-technical users and departments, ensuring Data Quality is continuously monitored and fine-tuned across all parts of the organization.
Embracing the Enterprise Data Stack Sprawl
Today’s enterprise data stack isn’t just about a single, unified system. It’s a sprawling, diverse collection of tools, platforms, and data sources. From legacy systems to the latest cloud data platforms, organizations are working with a wide array of technologies to meet their business needs.
Building Lightup so it integrates across different data sources wasn’t simple. It required significant investment in engineering and a complete reworking of our codebase to ensure we could quickly add new connectors without disrupting the rest of the system.
However, once we achieved this, we knew we could support the increasingly fragmented enterprise data environments we were seeing. The ability to bring all this data together for Data Quality monitoring is a key differentiator for us — and our customers.
Key Learning: Design for Data Stack Sprawl
Our initial assumption about enterprise environments was wrong: Data stack sprawl is here to stay. With that, we knew it was critical for Lightup to support diverse data sources, from 20-year-old systems to brand-new deployments. This enables enterprises to operate more effectively across their growing hybrid and multi-cloud data environments.
Lightup was designed to support these multi-technology environments. Our customers can connect and manage data from any part of their stack, making Data Quality an integral part of their business operations, regardless of the age or complexity of the data source.
Deployment and Data Residency: Critical Considerations for Enterprises
When it comes to deploying Data Quality solutions, enterprises need flexibility and control — especially around cloud infrastructure and data residency. The vast majority of our customers, approximately 90%, require deployment within their own private cloud environments. This is simply how modern enterprises operate, and it means that designing for cloud compatibility can’t be an afterthought.
From day one, we knew our solution had to be cloud-agnostic, able to run seamlessly across multiple cloud platforms without being tied to any one provider. Whether it’s AWS, Azure, Google Cloud, or others, Lightup can be deployed in any of them, ensuring that customers have the freedom to deploy where it makes the most sense for their business.
Key Learning: Cloud-Agnostic for Flexibility
What we’ve learned is that today’s data landscape is highly diverse — and that diversity multiplies every year. While we once thought that enterprises would gravitate towards a single data platform like Databricks or Snowflake, the reality is much more complex.
Many businesses are running both legacy systems and modern platforms in parallel. Data environments are not standardized, and we’ve had to embrace this diversity as an opportunity, not a hindrance. We see this as a chance to build a more extensible, flexible solution that can handle this complexity.
To support this, we’ve focused heavily on making Lightup portable and extensible, making it easier for customers to deploy Lightup anywhere and add new connectors quickly for seamless integration with virtually any data source — old or new.
This approach not only ensures that our platform can scale with an enterprise’s evolving tech stack, but it also enables our customers to compare and work with data side by side from sources that may span decades with new, cutting-edge systems.
Building an Extensible Data Quality Platform with Seamless Integrations
For Data Quality to be successful, it must integrate seamlessly into the enterprise’s existing technology ecosystem. Every enterprise has its own data stack, its own set of requirements, and its own preferred tools.
Our platform is designed to be modular and flexible, allowing for easy integration with existing systems — whether that’s a data catalog, an IT service management (ITSM) platform, or a data visualization tool. This flexibility is critical for ensuring that Data Quality becomes a natural part of the workflow rather than an afterthought or a standalone SaaS application.
Key Learning: Build in Extensibility
Our design intuition was right. We built Lightup to be a highly extensible infrastructure tool, supporting complex enterprise data stacks — with different integration points to accommodate legacy and modern cloud applications.
The Importance of Fast Time-to-Value
Historically, Data Quality initiatives have been slow, complex, and burdensome, which ultimately leads to abandoned tools or failed projects.
Turns out, one of the not-so-obvious technical drivers of Data Quality success is time-to-value: how fast an organization can deploy a Data Quality solution and start seeing meaningful results. That's a critical factor for an initiative's success or failure.
While we already knew AI/ML could be used to enhance our anomaly detection capabilities by automatically spotting bad data at much wider scale than people ever could, we didn’t realize that was just part of the equation for delivering fast time-to-value.
Here’s why. Most of the complex problems we were seeing revolved around business-specific scenarios. That is, business-specific definitions of good and bad data based on the context, business process, and/or use within the company.
For example, basic technical checks — such as volume, freshness, or value too high or low — won’t be able to detect if a value has been rounded. Why would that matter? Well, if your reporting has to be precise for ESG, for instance, then rounded numbers would be insufficient because that would be considered bad data — not fit for purpose.
We quickly realized that monitoring for and detecting those kinds of subtle nuances or subjective requirements is challenging for nearly every organization because they’re too business-specific. They can’t be automatically inferred from the data based on out-of-the-box Data Quality checks or monitoring.
Only the data subject matter experts who know the contextual use of the data within the business can define the characteristics of good and bad data. And that requires a customized check — something other platforms on the market didn’t support well or make accessible to non-technical business users.
Key Learning: Accelerate Time-to-Value by Unlocking Business-Specific DQ
With 20/20 hindsight, we realize that the value of Lightup is that it dramatically accelerates time-to-value, especially when it comes to unlocking all the nuanced business-specific Data Quality requirements that business stakeholders already know.
And that’s made all the difference for our customers — and their success.
Driving Success with Close Partnerships
One of the most unexpected lessons we’ve learned is that building a successful Data Quality initiative requires a close partnership with customers. Why? Data Quality isn’t just about providing a tool — it’s about aligning that tool with the customer’s unique business-specific rules and processes.
Every organization’s data needs are different, and the best Data Quality solutions are those that can become part of the enterprise’s business processes. Our customers understand that Data Quality isn’t just a technical issue that needs to be solved — it’s about the full life cycle of being able to detect incidents and get alerted so the right fix can be implemented before the business is negatively impacted.
Key Learning: Innovate with Customer Partnerships
Lightup’s novel approach to addressing the end-to-end Data Quality life cycle requires strong partnerships with customers to help them get up to speed quickly and adopt best practices.
From meeting compliance standards, to managing customer loyalty programs and global inventory systems, Data Quality impacts business operations in profound ways.
This realization was only possible through our close partnerships with our customers. As a result, we’ve supported our customers’ needs by developing new innovative features and capabilities that we never anticipated at the beginning — such as virtual tables to help with SOX compliance.
Trust and Control: Non-Negotiable Success Factors
Finally, one of the most important lessons we’ve learned is the importance of trust. Our customers need to feel that they have control over the Data Quality process. And they need to trust that the system is working as expected.
That’s why we’ve built in features like backtesting, previews, and feedback loops, so that our customers can continuously evaluate and adjust Lightup’s performance.
Key Learning: Copilot Supervision Over Autopilot
When it comes to building robust Anomaly Detection at scale, some false positives are unavoidable, especially in the beginning for newly trained models. But we’ve worked closely with our customers to keep them to a minimum, making it easy for users to flag issues and fine-tune Lightup over time.
What's important is to keep the users in control with a transparent system that doesn't hide what it's doing. Turns out, black-box unsupervised learning that tries to automate everything isn't always effective, especially because there's no recourse when it fails.
Users need a copilot setup where they stay in the driver's seat and know exactly what's happening within Lightup.
Fostering Collaboration Through Role-Based Access
Collaboration is a core component of democratizing Data Quality. Our solution includes Workspaces with role-based access, allowing teams across departments to collaborate while still maintaining data privacy and security.
These Workspaces help departments manage their data and projects separately, while also enabling them to share relevant data or insights with other teams as needed.
Key Learning: Meet Enterprise Requirements with Different User Roles
In enterprise environments, having different user access permissions is essential for ensuring that Data Quality can be managed at scale, while meeting security, privacy, and regulatory requirements.
An Agile and Realistic Approach to Data Infrastructure Transformation
Lightup isn't a SaaS application bolted on top of the stack — it's a core data infrastructure component, built to become an essential part of the data ecosystem.
However, unlike the legacy waterfall approach to implementing Data Quality as a data infrastructure component, Lightup enables a less risky, more agile approach that ensures step-by-step success.
Our enterprise customers have been successful with their Data Quality initiatives because Lightup enables them to start small with a high-impact use case to show proof of value immediately. From there, they can build on that success to showcase more quick wins.
And since Lightup is portable and extensible, our customers can add connectors as they go, instead of burdening IT with a resource-intensive initiative that disrupts the entire infrastructure architecture.
Key Learning: Start Small, Scale Fast, Amplify Success
Most Data Quality implementations fail or eventually end up abandoned when they become centralized IT initiatives. Why? Because they try to “boil the ocean” all at once, instead of focusing on the business-critical use cases first to show business value.
Initially, we didn’t realize how valuable it would be to our customers to be able to deploy Lightup in their cloud environments so quickly, without enterprise infrastructure downtime or disruptions.
But, we’re proud to share that we’ve seen our customers transform their data infrastructure with an agile approach of starting small, scaling fast, and amplifying that success across the enterprise for record-breaking Data Quality monitoring coverage.
Looking Ahead: The Future of Lightup
Back when we first launched Lightup, “AI” used to be synonymous with “machine learning.” And not too long ago, “data” used to refer to structured data. But now, “AI” means “GenAI,” “GenAI” means “LLMs,” and “data” means “structured and unstructured data for GenAI and LLMs.”
But, GenAI isn’t just a buzzword or the next cryptocurrency fiasco — it’s here to stay. And that was a major tectonic shift in the market we didn’t expect.
That’s made us take stock of where we are, how we got here, and how we’ll transition to the next phase of growth. We still see AI and ML as crucial components for accelerating time-to-value and automating processes, making it easier to monitor, manage, and remediate Data Quality issues across the enterprise.
Looking to the future, we’re investing heavily in more AI — that is, GenAI — to expand Lightup’s feature set to include observability for Large Language Models (LLMs) and monitoring for unstructured data. We’re also exploring automated AI-driven workflows to make it even easier for enterprises to monitor Data Quality with more out-of-the-box checks or even vertical-specific checks by use case.
Ultimately, every step we’ve taken and every decision we’ve made has been guided by our core mission: To make our customers successful, helping them scale Data Quality like never before while achieving real business value from their data.