The Road to InfoOps

November 13, 2025 | Blog, Community

Contributed by Victor Lu and Jesse Saffran, DataOps Initiative Contributors


Traditional DataOps is evolving to encompass new use cases, potentially transforming into what we might call InfoOps. For instance, GitOps was originally developed as a DevOps practice to automate infrastructure and application deployments using Git as the single source of truth. Git repositories, once used solely for storing software code, now serve as the source of truth for Kubernetes infrastructure-as-code workflows driven by GitOps tools like ArgoCD and Flux. With advancements in AI/ML, AI can now generate software code much like a human engineer, and as a result, code stored in GitHub is becoming a source of data for AI models. What was once a core artifact of DevOps is now deeply integrated into MLOps/ModelOps and DataOps frameworks.

This post is split into four parts:

  1. History of DataOps
  2. Role of Metadata in DataOps
  3. Bridging DevOps, DataOps, and ModelOps: A Unified Approach
  4. Decoupling DataOps from Cloud-Specific Infrastructure and DevOps Pipelines

1. History of DataOps

DataOps, short for “Data Operations,” has undergone significant evolution since its inception, expanding well beyond its original definition to address the growing complexities of data management and analytics.

Origins and Early Definitions

First introduced around 2014, DataOps was conceived as a set of best practices aimed at improving the quality and reducing the cycle time of data analytics by integrating Agile methodologies into data processes.

Evolution and Expansion

Over the years, DataOps has evolved from a collection of best practices to a comprehensive approach that combines elements of DevOps, Agile, Lean, and Total Quality Management (TQM). This evolution reflects a shift towards creating greater business value from big data by aligning data management practices with business goals.

In 2017, DataOps gained significant traction, marked by ecosystem development, increased keyword searches, and the emergence of open source projects. Gartner recognized DataOps in its 2018 Hype Cycle for Data Management, signaling its growing importance in the industry.

Modern Interpretations

Today, DataOps is viewed as a transformative approach that automates and optimizes the entire data product lifecycle. It aims to create a factory-like environment for data processing, where data flows seamlessly from ingestion to insight.

This perspective emphasizes the need for collaboration, orchestration, quality, security, accessibility, and ease of use in data operations.

Integration with Other Methodologies

The evolution of DataOps also reflects its integration with other methodologies. While it borrows principles from DevOps, DataOps is not merely “DevOps for data.” Instead, it represents a distinct approach that addresses the unique challenges of data analytics, emphasizing continuous improvement and collaboration across various roles, including data scientists, analysts, engineers, IT, and quality assurance professionals.

2. Role of Metadata in DataOps

Metadata is becoming increasingly critical across various domains, extending beyond software code to AI models and datasets. With the rise of AI/ML and the growing complexity of software supply chains, metadata plays a key role in ensuring transparency, security, and regulatory compliance.

Expanding the Role of Metadata

Originally, metadata in software development primarily focused on source code documentation, versioning, and dependencies. However, with AI advancements, metadata now encompasses details about AI models—including their training data, hyperparameters, lineage, and performance metrics. Similarly, datasets require extensive metadata to track provenance, transformations, biases, and quality assessments.
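
To make this concrete, here is a minimal sketch, assuming hypothetical field names rather than any specific standard, of how model and dataset metadata might be captured as structured records in Python, with the model record linking back to the datasets it was trained on:

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical, illustrative metadata records; the field names are not taken
# from any specific standard such as Model Cards or Datasheets for Datasets.

@dataclass
class DatasetMetadata:
    name: str
    version: str
    source: str                                                # provenance: where the data came from
    transformations: List[str] = field(default_factory=list)   # processing steps applied
    known_biases: List[str] = field(default_factory=list)
    quality_checks: Dict[str, bool] = field(default_factory=dict)

@dataclass
class ModelMetadata:
    name: str
    version: str
    training_datasets: List[DatasetMetadata]                   # lineage back to the data
    hyperparameters: Dict[str, float] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)

# Example: a model record that links back to the dataset it was trained on.
clickstream = DatasetMetadata(
    name="clickstream", version="2024.06", source="s3://example-bucket/clickstream/",
    transformations=["deduplicate", "anonymize-user-ids"],
    quality_checks={"null_rate_below_1pct": True},
)
ranker = ModelMetadata(
    name="ranking-model", version="1.3.0",
    training_datasets=[clickstream],
    hyperparameters={"learning_rate": 0.01},
    metrics={"ndcg_at_10": 0.42},
)
```

Records like these can be versioned alongside the code and pipelines that produce the assets they describe, which is what makes lineage and quality tracking auditable.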

Regulatory Push for Metadata Transparency

New regulations, such as the EU Cyber Resilience Act (CRA), are reinforcing the need for structured metadata. The CRA mandates a Software Bill of Materials (SBOM), ensuring that organizations track and disclose all components in their software supply chain. This is crucial for security, as AI-generated software is increasingly integrated into critical systems.

Beyond SBOM, initiatives like Model Cards (for AI models) and Datasheets for Datasets are becoming essential to improve accountability and reproducibility in AI development. These metadata structures provide necessary details on risks, limitations, and intended uses, aligning with regulatory demands and best practices in responsible AI.

Metadata as a Foundation for Future Governance

As AI-driven automation reshapes DevOps, MLOps, and DataOps, metadata governance will be key in managing risks, enabling explainability, and ensuring compliance. Organizations must move beyond just tracking software components and extend metadata strategies to cover AI models and datasets, building a comprehensive framework for trustworthy and auditable systems.

3. Bridging DevOps, DataOps, and ModelOps: A Unified Approach

Traditionally, DevOps focused on automating software development and deployment pipelines, enabling faster and more reliable software releases. However, with the rise of data-driven applications and AI/ML models, the boundaries between DevOps, DataOps, and ModelOps are becoming increasingly blurred. To keep pace with modern requirements, traditional DevOps workflows must be seamlessly integrated into DataOps workflows, and vice versa—while also considering ModelOps requirements.

Why Traditional DevOps Needs to Integrate with DataOps

DevOps frameworks such as Jenkins, Tekton, ArgoCD, and Flux were originally designed for software applications, focusing on CI/CD pipelines for code deployment. However, today’s data-centric architectures demand a workflow where:

  • Data is treated as a first-class citizen—just like source code.
  • GitOps principles extend to data versioning, governance, and lineage tracking.
  • Observability and monitoring include not just application performance but also data pipeline health.

By integrating traditional DevOps workflows into DataOps, organizations can ensure that data pipelines are version-controlled, tested, and deployed with the same rigor as software applications.
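
As a small illustration of treating data with the same rigor as code, the sketch below shows a hypothetical CI gate that validates a versioned data extract before the pipeline consuming it is promoted. The file path, expected schema, and checks are assumptions made for the example, not part of any particular framework.

```python
import csv
import sys

# Hypothetical CI gate: validate a versioned CSV extract before the pipeline
# that consumes it is promoted. Path and schema below are illustrative.
EXPECTED_COLUMNS = ["order_id", "customer_id", "amount", "created_at"]

def validate_extract(path: str) -> list[str]:
    """Return a list of problems found in the extract (empty means it passed)."""
    errors = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != EXPECTED_COLUMNS:
            errors.append(f"unexpected columns: {reader.fieldnames}")
        row_count = sum(1 for _ in reader)
        if row_count == 0:
            errors.append("extract is empty")
    return errors

if __name__ == "__main__":
    problems = validate_extract(sys.argv[1])
    if problems:
        print("\n".join(problems))
        sys.exit(1)   # non-zero exit fails the CI job and blocks the deployment
    print("extract looks sane; safe to deploy")
```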

Why DataOps Needs DevOps

On the other hand, DataOps frameworks (such as Apache Airflow, Dagster, and Prefect) focus primarily on orchestrating ETL pipelines and managing data workflows. However, they often lack standardized DevOps principles such as:

  • Infrastructure as Code (IaC) for deploying and managing data infrastructure.
  • Automated testing for data quality before it is used by downstream applications.
  • Continuous delivery mechanisms to safely update and deploy data pipelines.

By embedding DevOps principles into DataOps, organizations can achieve faster, more reliable, and reproducible data workflows.
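
For the automated-testing point in particular, data quality checks can be written as ordinary test functions and run by the same CI system that tests application code. The sketch below assumes a pandas DataFrame loaded from a hypothetical staging location; the column names and thresholds are illustrative.

```python
import pandas as pd

# Hypothetical quality tests, meant to be run by pytest in the same CI
# pipeline as the application's unit tests. Columns and thresholds are
# illustrative assumptions.

def load_orders() -> pd.DataFrame:
    # In a real pipeline this would read from wherever the ETL job staged its output.
    return pd.read_parquet("staging/orders.parquet")

def test_no_missing_order_ids():
    orders = load_orders()
    assert orders["order_id"].isna().sum() == 0

def test_amounts_are_positive():
    orders = load_orders()
    assert (orders["amount"] > 0).all()

def test_row_count_within_expected_range():
    orders = load_orders()
    # Guard against a silently truncated or duplicated load.
    assert 1_000 <= len(orders) <= 10_000_000
```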

The Growing Role of ModelOps

Beyond DevOps and DataOps, ModelOps (AI/ML operations) brings additional complexity. ML models are not just software artifacts; they are dynamic, evolving assets that require:

  • CI/CD for ML models—automating model training, validation, and deployment.
  • Model versioning and rollback strategies, similar to software versioning.
  • Data lineage tracking, ensuring that ML models are trained on reproducible datasets.

Frameworks such as Kubeflow, MLflow, and Seldon Core are designed to orchestrate ML model lifecycles, but they must integrate DevOps workflows for deployment automation and governance.
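
As one concrete touchpoint, experiment trackers such as MLflow record exactly the kind of model metadata (parameters, metrics, lineage tags) that a deployment pipeline can later act on. The following is a minimal sketch; the training step is a placeholder, and the tag and metric values are assumptions.

```python
import mlflow

def train_model():
    # Placeholder for real training code.
    return {"accuracy": 0.93}

# Minimal sketch of recording model lineage and performance with MLflow.
with mlflow.start_run(run_name="ranking-model-v1.3"):
    mlflow.log_param("learning_rate", 0.01)
    mlflow.set_tag("training_dataset", "clickstream@2024.06")  # lineage back to the data
    results = train_model()
    mlflow.log_metric("accuracy", results["accuracy"])
    # A deployment pipeline could later compare this run's metrics against the
    # currently served model before promoting it or rolling it back.
```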

Towards a Unified DevOps-DataOps-ModelOps Workflow

To achieve seamless integration, organizations need:

  • GitOps-based workflows where both software code and data pipelines are managed in Git repositories and reconciled by GitOps tools (e.g., ArgoCD, Flux).
  • CI/CD pipelines that extend beyond code—automating not only software deployment but also data validation, ML model updates, and policy enforcement.
  • End-to-end observability using tools like OpenTelemetry, Prometheus, and Grafana to monitor application, data, and ML model performance.
  • Security and compliance automation, including SBOMs, to meet regulatory requirements like the EU CRA.

Traditional DevOps tools must evolve beyond software-focused CI/CD to integrate deeply with DataOps and ModelOps. Likewise, DataOps and ModelOps must embrace DevOps methodologies to ensure robust, automated, and scalable operations. By unifying these workflows, organizations can achieve a fully automated, AI-driven, and data-centric DevOps ecosystem in multicloud and/or hybrid cloud environments.
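
To make the observability point a little more tangible, the sketch below exposes a handful of application-, data-, and model-level metrics from a single Python process using the Prometheus client library, so that one monitoring stack can watch all three. The metric names and the simulated values are assumptions for this sketch.

```python
import random
import time

from prometheus_client import Counter, Gauge, start_http_server

# Illustrative metrics spanning application, data pipeline, and model concerns.
REQUESTS = Counter("app_requests_total", "Application requests handled")
ROWS_PROCESSED = Counter("data_rows_processed_total", "Rows processed by the data pipeline")
PIPELINE_LAG = Gauge("data_pipeline_lag_seconds", "Current lag of the data pipeline")
MODEL_ACCURACY = Gauge("model_validation_accuracy", "Latest validation accuracy of the served model")

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes http://localhost:8000/metrics
    while True:
        # Simulated values stand in for real instrumentation hooks.
        REQUESTS.inc()
        ROWS_PROCESSED.inc(random.randint(100, 1000))
        PIPELINE_LAG.set(random.uniform(0, 30))
        MODEL_ACCURACY.set(0.93)
        time.sleep(5)
```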

4. Decoupling DataOps from Cloud-Specific Infrastructure and DevOps Pipelines

Discussions about DataOps in public cloud environments often revolve around cloud-specific infrastructure and DevOps pipelines, and DataOps pipelines frequently emerge as extensions of those DevOps pipelines. The DevOps tooling and infrastructure automation provided by cloud vendors naturally influence how DataOps workflows are built and managed. Many public cloud providers also integrate DataOps principles within their own ecosystems, making it tempting to define DataOps in a way that is tightly coupled to a particular vendor’s tooling and workflow.

However, DataOps should not be constrained by the infrastructure and DevOps pipelines of a single cloud provider. Instead, it should focus on foundational, cloud-agnostic principles, such as:

  • Independent Data Pipeline Automation: Implementing scalable and portable automation across multi-cloud and hybrid environments.
  • Cross-Cloud Metadata Management: Ensuring data lineage, governance, and observability beyond the boundaries of a single cloud provider.
  • Portable IaC for DataOps: Using vendor-neutral tools (e.g., Terraform, Pulumi) to define data infrastructure without cloud lock-in (see the sketch after this list).
  • Interoperable Data Governance & Compliance: Addressing regulatory requirements (e.g., GDPR, EU CRA) in a way that works across different cloud ecosystems.
  • CI/CD for Data Workflows Decoupled from Any Single Public Cloud: Building flexible, DevOps-driven data pipelines that are not restricted by the deployment model of a specific cloud.
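
As a small illustration of the portable-IaC bullet above, the sketch below defines a raw-data bucket with Pulumi’s Python SDK. The resource shown happens to use the AWS provider, but the program structure, and the pipelines that consume the exported bucket name, stay the same if the provider module is swapped for another cloud; all names here are illustrative.

```python
import pulumi
import pulumi_aws as aws

# Minimal Pulumi sketch: bucket name and tags are illustrative assumptions.
# Swapping the provider module (e.g., to another cloud's object storage
# resource) changes this file, but not the DataOps pipelines that look up
# the exported name.
raw_bucket = aws.s3.Bucket(
    "raw-data",
    tags={"owner": "dataops", "classification": "internal"},
)

# Export the bucket name so pipelines can discover it instead of hard-coding it.
pulumi.export("raw_data_bucket", raw_bucket.bucket)
```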

By separating DataOps from cloud-specific infrastructure and DevOps pipelines, organizations can ensure adaptability, portability, and resilience in their data operations strategy, regardless of the underlying cloud provider.

Conclusion

In the evolving landscape of automation and AI-driven workflows, we must go beyond orchestrating just data flow: we now need to orchestrate information about software code, AI models, datasets, and even the infrastructure supporting these workflows. These artifacts are not only operational assets but also sources of metadata that describe their lineage, dependencies, and transformations. Furthermore, with AI’s ability to learn from and generate software code, other models, and data, the boundaries between them blur, making metadata management even more crucial.

This shift necessitates a broader approach, where DataOps evolves into “InfoOps”—an ecosystem that unifies the orchestration of data, software, and AI models as interdependent elements. In this paradigm, software code is not just an artifact of DevOps, but also a dataset for AI; AI models are not just an outcome of ModelOps but also contributors to new data; and data is not just a resource for analytics but a foundation for AI-driven code and models. This interconnected flow of information demands a new level of metadata governance, automation, and orchestration, making InfoOps a critical evolution of DataOps.

💡 Interested in the future of DataOps? Join our DataOps Initiative