Introduction to Data-Centricity (2024)

Welcome to Fluree’s series on data-centricity. Over the next few months, we’ll peel back the layers of the data-centric architecture stack to explain its relevance to today’s enterprise landscape.

Data-centricity is a mindset as much as it is a technical architecture. At its core, data-centricity acknowledges data’s valuable and versatile role in the larger enterprise and industry ecosystem and treats information as the core asset of enterprise architecture. In contrast to the “application-centric” stack, a data-centric architecture is one where data exists independently of any single application and can empower a broad range of information stakeholders.

Freeing data from a single monolithic stack creates greater opportunities to accelerate digital transformation: data can be more versatile, integrative, and available to those who need it. By baking core characteristics like security, interoperability, and portability directly into the data tier, data-centricity removes the need for proprietary middleware and webs of custom APIs. It also allows enterprises to integrate disparate data sources with far less overhead and to deliver data to its stakeholders with context and speed.

Data-centric architectures have the power to alleviate pain points along the entire data value chain and build a truly secure and agile data ecosystem. But to understand these benefits, we must first understand the problems with the application-centric model still in place at most legacy-driven enterprises.

The application boom of the ’90s led to increased front-office efficiencies but left behind a wasteland of data-as-a-byproduct. Most application developers were concerned with one thing: building a solution that worked. How the application’s data would be formatted, or whether it could ever be reused, was a secondary concern at best.

Businesses quickly realized that their data has a value chain — an ecosystem of stakeholders that need permissioned access to enterprise information for business applications, data analysis, information compliance, and other categories of data collaboration. So, companies invested in building data lakes — essentially plopping large amounts of data, in its original format, into a repository for data scientists to spend some time cleansing and analyzing. But these tools simply became larger data silos, introducing even higher levels of complexity.

In fact, 40% of a typical IT budget is spent simply on integrating data from disparate sources and silos. And integrating new data sources into warehouses can take weeks or months — which is a far cry from becoming truly “data-driven.”

In the application-centric framework, data originates in an application silo and trickles its way down the value chain with little context. Extracting value from this data is painful and expensive. Combining it with other data is nearly impossible. And delivering it to the stakeholders along its value chain runs into technical and bureaucratic roadblocks.

These are not controversial claims. According to an American Management Association survey, 83% of executives think their companies have silos, and 97% think it’s having a negative effect on business.

Let’s explore how these data silos continue to proliferate, even after the explosion of cloud computing and data lake solutions:

  • Today, developers build applications that, by nature, produce data.
  • Middle- and back-office engineers build systems to collect the resulting data and run it through analysis, typically in a data lake or warehouse.
  • Data governance professionals work to ensure the data has integrity, adheres to compliance requirements, and can be reused for maximum value.

To re-use this data or share it with third parties or another business application, it must go through replication, cleansing, harmonization, and more before it is usable. Each stage of re-use introduces new potential attack surfaces, and the added complexity saddles the data pipeline with poor latency and lost context.
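To make that copy-and-transform pattern concrete, here is a minimal sketch in Python with pandas. The silo names, columns, and values are hypothetical; the point is simply that every re-use of application data tends to require another copy plus another layer of cleansing and harmonization logic.

```python
import pandas as pd

# Hypothetical extracts from two application silos, each with its own
# column names, key names, and date formats.
crm_extract = pd.DataFrame({
    "CustomerID": [101, 102],
    "FullName": ["Ada Lovelace", "Alan Turing"],
    "SignupDate": ["03/01/2023", "11/15/2022"],   # MM/DD/YYYY
})
billing_extract = pd.DataFrame({
    "cust_id": [101, 103],
    "name": ["A. Lovelace", "Grace Hopper"],
    "created": ["2023-01-03", "2021-06-30"],      # ISO 8601
})

# Replicate: the silo data is copied into yet another store (here, memory).
# Cleanse and harmonize: rename columns, align keys, normalize date formats.
crm = crm_extract.rename(columns={
    "CustomerID": "customer_id", "FullName": "name", "SignupDate": "created"})
crm["created"] = pd.to_datetime(crm["created"], format="%m/%d/%Y")

billing = billing_extract.rename(columns={"cust_id": "customer_id"})
billing["created"] = pd.to_datetime(billing["created"])

# Only now can the two sources be combined for a downstream consumer.
harmonized = pd.concat([crm, billing], ignore_index=True)
print(harmonized)
```

Every copy is one more store to secure and keep in sync, and every transformation is one more piece of logic to maintain, which is exactly where the attack surface, latency, and lost context accumulate.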

In other words, data is not armed at its core with the characteristics it needs to provide value to its many stakeholders — so we build specialized tools and processes around it to maximize its analytical value, prove compliance, share it with third parties, and operationalize it into new applications. This approach may have worked for ten or so years — but the data revolution is not slowing and these existing workaround tools cannot scale. Our standard ETL process is slow and expensive, and data lakes become data silos with their own sets of compliance, security, and redundancy issues.

But there is a better way — a path to data-centricity — that flips the standard enterprise architecture on its head and starts with an enterprise’s core asset: data.

Industries are moving towards data ecosystems — an integrative and collaborative approach to data management and sharing. Here are just a few examples:

  • Data-driven business applications today touch many internal and external stakeholders (sales, HR, marketing, analysis, compliance, security, customers, third parties…). There is a clear need to collaborate more effectively and dynamically on data that powers multiple applications across multiple contexts.
  • Enterprises are building master data management (MDM) platforms, many for the very first time, for a 360-degree view of their master data assets. This can be as simple as building a “golden record” customer data repository to cut down on redundancies across data silos and implement more centralized data access rules. Next-generation MDM solutions go further, making the golden record operational so that the same repository directly powers applications and analysis from a single source of truth. Data-centricity is essential here.
  • Enterprises are creating “knowledge graphs” that link and leverage vast amounts of enterprise data under a common format for maximum data visibility, analytics, and reuse (a short sketch of this shared-format idea follows this list).
  • More advanced enterprises are building “data fabrics,” a hyperconverged architecture that focuses on integrating data across enterprise infrastructures. Data Fabrics (in theory) provide streamlined and secure access to disparate data across cloud and on-prem deployments in an otherwise complex distributed network environment.
  • Enterprises are realizing the value of “data marketplaces,” where “golden record” information can be subscribed to within a data-as-a-service framework.
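As a toy illustration of the “common format” idea behind golden records and knowledge graphs, the sketch below uses Python’s rdflib with a made-up example.org vocabulary (this is not Fluree’s schema or API) to link facts about the same customer from two silos into a single standards-based graph.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, FOAF

# Hypothetical vocabulary and identifiers, for illustration only.
EX = Namespace("http://example.org/")
customer = URIRef("http://example.org/customer/101")

g = Graph()

# Facts contributed by the CRM silo.
g.add((customer, RDF.type, FOAF.Person))
g.add((customer, FOAF.name, Literal("Ada Lovelace")))

# Facts contributed by the billing silo, attached to the same identifier.
g.add((customer, EX.accountStatus, Literal("active")))
g.add((customer, EX.billingPlan, Literal("enterprise")))

# One golden record: every silo's facts about customer 101, in one open format.
print(g.serialize(format="turtle"))
```

Because the facts share identifiers and a common vocabulary, any RDF-aware tool can read, link, or extend them without silo-specific integration code.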

To accommodate these data-driven trends, we need to build frictionless pipelines to valuable data that is highly contextual, trusted, and interoperable. And we need to answer emerging questions around data, such as:

  1. Data Ownership: Who owns the data, and how is privacy handled?
  2. Data Integrity: How do we know the data has integrity?
  3. Data Traceability: Who put the data into the system, when, and how? How has it changed over time, and who has accessed it?
  4. Data Access Permissions: Who should be able to access the data or metadata, and under what circumstances? How can we change those security rules dynamically? (A sketch after this list illustrates questions 3 and 4.)
  5. Data Explainability: How do we trace back how machines arrived at specific data-driven decisions?
  6. Data Insights: How can we organize our data to maximize value to its various stakeholders?
  7. Data Interoperability: How do we make data natively interoperable with machines and applications that reside within and outside of our organization?
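Questions 3 and 4 are concrete enough to sketch. The toy Python example below is not Fluree’s implementation; it only illustrates the idea that when data is stored as immutable, time-stamped facts with provenance, traceability becomes a simple query, and access rules can be evaluated against the data itself rather than hard-coded into each application.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class Fact:
    """An immutable assertion about a subject, with provenance attached."""
    subject: str
    attribute: str
    value: str
    asserted_by: str
    asserted_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

log: list[Fact] = []   # append-only: facts are added, never edited in place
log.append(Fact("customer/101", "email", "ada@example.org", "crm-app"))
log.append(Fact("customer/101", "email", "ada@newmail.org", "support-app"))

# Traceability: the history of an attribute is just a filter over the log.
for f in (f for f in log
          if f.subject == "customer/101" and f.attribute == "email"):
    print(f.asserted_at, f.asserted_by, f.value)

# Access permissions: a rule evaluated against the data, not baked into an app.
def can_read(role: str, fact: Fact) -> bool:
    # Hypothetical policy: only the "support" role may read email addresses.
    return fact.attribute != "email" or role == "support"

visible = [f for f in log if can_read("analyst", f)]
print(len(visible), "facts visible to the analyst role")
```

Changing who may see what becomes a change to a rule, not a re-engineering of every downstream pipeline.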

Fluree is a data management platform that extends the role of the traditional database to answer the questions above. Fluree breaks its core “data stack” into five key areas: trust, semantic interoperability, security, time, and sharing.

Is Fluree a database, a blockchain, a content delivery network, or a more dynamic, operational data lake?

It is tempting to silo off Fluree into any one of those roles, but its value is best realized in the context of the greater “data value chain.” Fluree’s data-centric features work together to enable the data environment of any CIO’s, CTO’s, or CDO’s dreams: secure data traceability and audit trails; an instant knowledge graph built on RDF semantic standards; blockchain for trusted collaboration; and a scalable graph database with in-memory capabilities to power intelligent, real-time apps and next-generation analysis.
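To ground the “RDF semantic standards” point: once data lives in a standard graph form, it can be queried with W3C SPARQL by any conforming engine. The minimal example below reuses the toy example.org graph from earlier, again via rdflib rather than Fluree’s own query interface.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF

EX = Namespace("http://example.org/")
customer = URIRef("http://example.org/customer/101")

g = Graph()
g.add((customer, FOAF.name, Literal("Ada Lovelace")))
g.add((customer, EX.billingPlan, Literal("enterprise")))

# Standard SPARQL: names of customers on the (hypothetical) enterprise plan.
results = g.query("""
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX ex:   <http://example.org/>
    SELECT ?name WHERE {
        ?customer foaf:name ?name ;
                  ex:billingPlan "enterprise" .
    }
""")
for row in results:
    print(row.name)
```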

But these concepts can feel overwhelming, especially for a business that has always worked in silos of data responsibility. So we decided to break down each component of the data-centric stack in this five-part series. Stay tuned for part 1 on “data trust.”


