Towards a Decentralized Data Consumption Model
Introduction
The application and service portfolios of most large organizations tend to grow organically over time as separate applications are built, each driven by strategic business needs and based on different technology stacks, databases, communication protocols/formats, and hosting models. Each application may have its own implementation of commonly used business entities like Customer, Store, Product, Supplier, and others, tailored to its own set of use cases.
While this decentralized approach has its advantages, there is also great value in deepening the organization's understanding of the data that make up the information models of its software products. This calls for a unified data and service consumption model for an enhanced customer experience, without requiring large-scale re-architecture and governance efforts.
With such a model, an organization would be better positioned to:
Design combined experiences, such as reporting and dashboards, that span multiple applications.
Plan ahead for data re-use and enrichment from one application to another.
Build integrations and expand capabilities where intersections of data are recognized.
Typical Applications Portfolio
As an example, in an industry like retail, multiple applications may serve different parts of the business while sharing a set of common or closely related entities.
For an entity like Product, although conceptually well understood, a different subset of attributes may matter to different applications: for the Point of Sale application it is the sale price; for the Finance application, both sale and purchase prices; and for the Inventory application, the stock count.
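As a minimal sketch of this idea (in Python, with hypothetical class and attribute names), each application models only the Product attributes it cares about, sharing just a common key:

```python
from dataclasses import dataclass

# Each application keeps its own view of Product, sharing only a common
# key (here assumed to be a SKU). All names are hypothetical.

@dataclass
class PosProduct:           # Point of Sale cares about the sale price
    sku: str
    sale_price: float

@dataclass
class FinanceProduct:       # Finance cares about sale and purchase prices
    sku: str
    sale_price: float
    purchase_price: float

@dataclass
class InventoryProduct:     # Inventory cares about the stock count
    sku: str
    stock_count: int
```

The shared key is what later allows an integrated view to be stitched together without forcing every application onto one canonical Product model.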
A federated, decentralized architecture like this has many advantages:
Each application is responsible for maintaining its own data and services.
Each application is designed to satisfy its own set of use cases.
Each application is maintained by a dedicated team of product and technology people who are experts in the domain the application belongs to.
It is easier to build and deploy an application without impacting other unrelated applications.
Although the applications are largely self-contained, data needs to be shared across them to keep the current state in sync. In addition, there may be value in building an integrated view across the separate business domains, to satisfy use cases like the following:
As a user experience specialist, I would like to build a portal for internal users of these tools, using a standard vocabulary for a uniform user experience.
As a product developer, I would like to know whether the data in two products could potentially enhance or conflict with one another.
As an integration solution developer, I would like to see a data dictionary covering likely integration data points and what is known about them, so I can reason about the solution being designed; a sketch of one such entry follows this list.
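As a hedged illustration of that last use case, a data dictionary entry could be a simple structured record describing one integration data point; the field names below are assumptions, not a prescribed format:

```python
# A hypothetical data-dictionary entry for one integration data point.
# Field names and values are illustrative only.
sale_price_entry = {
    "name": "sale_price",
    "entity": "Product",
    "owning_application": "PointOfSale",
    "type": "decimal",
    "unit": "USD",
    "consumers": ["Finance", "Reporting"],
    "refresh_mode": "stream",  # how consumers receive updates
    "notes": "Authoritative source; Finance derives margins from this field.",
}
```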
Building an integrated view from decentralized applications is difficult because:
Data is maintained in several places and lacks a common vocabulary; each entity has its own definition and set of attributes, potentially causing redundancy and confusion.
Each application satisfies its own data quality, security, and access requirements, which need to be reconciled when integrating with another application.
It is difficult to ensure timely synchronization of data across applications, potentially leading to decisions based on stale data.
Integrated Data Sharing Approach
Traditional approaches to data sharing are built on a centralized governance and operating model, which is often difficult to execute for the following reasons:
It represents a large effort requiring coordination across many parts of the business.
Centralized representation creates coupling and forces synchronization.
Attempts to create a single view of an entity that can satisfy the requirements of different applications can lead to additional complexity and data quality issues, as is often seen in data warehouse implementations.
It can lead to increased costs, lack of agility, and slow innovation cycles.
Any changes to a single application can become a global concern, impacting multiple applications.
Modern approaches to enterprise data integration tend to be more decentralized, or “domain-driven,” aiming to link data producers directly to data consumers and “shift left” the responsibilities for data ingestion, preparation, and transformation into the domains.
An application may be viewed as a standalone unit comprising multiple business entities, with input and output mechanisms for data ingestion and export.
The entities in an application ideally belong to a single business domain, but in practice they may span several domains unless carefully designed using domain-driven design principles. In any case, it is important to define the business concepts these entities implement in a domain-specific language and to define mappings across domains.
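As a minimal sketch, assuming two hypothetical domain vocabularies, such a mapping can be expressed declaratively and applied whenever a record crosses a domain boundary:

```python
# Hypothetical mapping from the Inventory domain's vocabulary to the
# Finance domain's. Field names and domains are assumptions for illustration.
INVENTORY_TO_FINANCE = {
    "sku": "product_code",
    "stock_count": "units_on_hand",
}

def translate(record: dict, mapping: dict) -> dict:
    """Rename fields per the mapping; drop fields the target domain does not define."""
    return {target: record[source]
            for source, target in mapping.items() if source in record}

inventory_record = {"sku": "A-1001", "stock_count": 42, "bin_location": "B7"}
print(translate(inventory_record, INVENTORY_TO_FINANCE))
# {'product_code': 'A-1001', 'units_on_hand': 42}
```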
Interactions among the various applications can be built by applying a standard set of patterns of data production, consumption, mapping, and policies, over an interoperability infrastructure.
Some of these patterns are:
Data consumption patterns
Industry-standard formats: JSON, CSV
Mode: Batch, Stream, APIs
Schema: JSON Schema (see the validation sketch after this list)
Policy patterns
Role-based access
Privacy and security considerations (encryption, de-identification, etc.)
Processing patterns
Inbound processing
Data mapping and transformation
Outbound processing
Interoperability infrastructure
Storage
Transformation
Workflow and orchestration
Secrets management
Identity and access
Logging, monitoring, alerting
Input and output processing for multiple formats
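To make the consumption and processing patterns concrete, the sketch below validates an inbound JSON payload against a JSON Schema contract before handing it off for mapping. It assumes the third-party jsonschema package; the schema and payload are hypothetical:

```python
from jsonschema import ValidationError, validate  # pip install jsonschema

# Hypothetical schema contract published by the producing application.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "sku": {"type": "string"},
        "sale_price": {"type": "number", "minimum": 0},
    },
    "required": ["sku", "sale_price"],
}

def consume(payload: dict) -> dict:
    """Inbound processing: enforce the schema contract, then pass the record on."""
    try:
        validate(instance=payload, schema=PRODUCT_SCHEMA)
    except ValidationError as err:
        # In a real pipeline this would be logged and routed to a dead-letter queue.
        raise ValueError(f"Payload rejected by schema contract: {err.message}") from err
    return payload  # next step: domain mapping and transformation

consume({"sku": "A-1001", "sale_price": 19.99})
```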
The benefits of using such an approach to integrate across multiple independent but related applications are:
Alignment of business domain, technology, and data
Clear ownership of both applications and data products
Standardization of domain-specific “data language”
Closing the gap between operational and analytics data
Faster innovation cycles
Reduced cost of ownership
Automated, simplified patterns for integration and consumption
Attainment of higher-order value by interconnecting entities
Conclusion
The demands of digital transformation require organizations to quickly build standalone solutions to address market demands. However, these solutions often lack central governance or oversight with respect to a common data model or standardized communications infrastructure. A wholesale re-architecture of applications and corresponding data models across the enterprise is often too expensive and time-consuming. Modern approaches to integration instead keep the domain-specific applications separate, map between the domain vocabularies as needed, and use an integration layer to standardize information exchange.