Posted by Chris Mc @ 9:01am GMT
In the last few years, enterprises have added several tools to their IT data integration toolbox to improve the return on their CRM investments. Yet most of these companies have not achieved a simple goal: to create reliable, unified views of their customers, aggregated across data silos, and deliver these to all customer-facing applications in a timely fashion. Recently, companies have turned to three common technologies to create solutions for customer data integration: data movement tools such as Extract-Transform-Load (ETL), data query and aggregation tools such as Enterprise Information Integration (EII), and Data Quality (DQ) tools. However, what the tool vendors aren’t telling you is that these tools are woefully inadequate for developing a reliable Customer Data Integration (CDI) platform.
Customer Hubs Emerging
Industry market research firm Gartner Inc. defines CDI as “the combination of the technology, processes and services needed to create and maintain an accurate, timely and complete view of the customer across multiple channels, business lines and enterprises, where there are multiple sources of customer data in multiple application systems and databases.” There are several implementation styles of CDI solutions, but the most effective is one in which an enterprise commits to building and managing a customer hub that serves as a central repository of customer data reconciled from multiple data sources. This hub may contain some or all of the critical customer data needed to provide multiple customer views to downstream applications. While there are significant differences among the various customer hubs available, such as what type of data to persist and how much to aggregate dynamically, there is little doubt that a large, enterprise-class CDI solution needs a central customer hub.
Data Tools Ill-suited
In the past decade, many companies that tried to build an in-house version of a customer data integration hub using ETL, EII and DQ tools are now struggling with the aftermath of a custom solution. There are several reasons for the failure of CDI solutions built with these tools.
First, all three technologies originated for narrow purposes ill-suited for CDI: ETL to move large volumes of data in batch mode; EII to run distributed queries across disparate sources in real time; and DQ tools to “scrub” incorrect names and addresses one source at a time. Each of these technologies effectively supports only a single data modality: batch or real-time. Since customer data is inextricably tied to both the operational and the strategic business processes of a company, such as the order-to-cash process or profitability segmentation analyses, it needs to be delivered in time for each business process. Therefore, any customer data integration solution needs to support a range of modalities of data movement: from a large-volume batch process that loads a new source into a customer hub, to scheduled intra-day batches, to a publish-subscribe model for immediate updates of critical data. Tools designed for a single modality can quickly hamper the reliability and scalability of a CDI solution.
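As a rough illustration of supporting more than one modality, the Python sketch below routes both a bulk batch load and an individual real-time update through the same hub write path, and notifies subscribers of each change. The names (CustomerHub, upsert, load_batch, subscribe) are hypothetical, and an in-memory dictionary stands in for a real persistent store and message bus.

```python
class CustomerHub:
    """Toy in-memory hub; a real hub would persist data and use a message bus."""

    def __init__(self):
        self.records = {}       # customer ID -> attribute dict
        self.subscribers = []   # callbacks invoked on every change

    def subscribe(self, callback):
        """Register a downstream consumer for immediate updates."""
        self.subscribers.append(callback)

    def upsert(self, customer_id, attributes):
        """Single write path shared by every modality."""
        record = self.records.setdefault(customer_id, {})
        record.update(attributes)
        for callback in self.subscribers:
            # Pass a copy so subscribers cannot mutate hub state.
            callback(customer_id, dict(record))

    def load_batch(self, rows):
        """Large-volume or scheduled batch: reuse the same write path per row."""
        for customer_id, attributes in rows:
            self.upsert(customer_id, attributes)
        return len(rows)


hub = CustomerHub()
events = []
hub.subscribe(lambda customer_id, record: events.append((customer_id, record)))

# Batch modality: load a new source in one pass.
loaded = hub.load_batch([
    ("M-1", {"name": "Jonathan Smith"}),
    ("M-2", {"name": "Ann Lee"}),
])

# Real-time modality: a single critical update published immediately.
hub.upsert("M-1", {"address": "12 Elm St"})
```

Because batch and real-time updates share one write path in this sketch, the same reconciliation logic applies regardless of how the data arrives.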
Treat Different Data Types Separately
To build a reliable CDI solution, it is imperative to treat different types of data separately: master reference data, relationship data and transaction data. Master reference data is the foundational entity data (such as name and address) that is critical for uniquely identifying a customer across multiple systems and channels. Without a persistent and trustworthy hub of customer reference or profile data that serves as the “system of record”, other types of data cannot be aggregated reliably. Ideally, such a master store should create and maintain the best-of-breed record for each customer, assembled at the cell or attribute level from all relevant internal and external data sources, along with the associated cross-reference keys. This store then becomes the best source of truth for customer profile information for all downstream operational and analytical applications.
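One way to picture an attribute-level best-of-breed record is the minimal Python sketch below. It assumes a simple rule that is not taken from any particular product: each attribute is taken from the most trusted source that has a non-empty value for it. The source names, the SOURCE_PRIORITY ranking and the build_master_record function are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical trust ranking: a lower number means a more trusted source.
SOURCE_PRIORITY = {"billing": 0, "crm": 1, "web": 2}


@dataclass
class SourceRecord:
    source: str        # name of the originating system
    source_key: str    # that system's own customer ID
    attributes: dict   # e.g. {"name": ..., "address": ...}


def build_master_record(master_id, records):
    """Pick each attribute from the most trusted source with a non-empty value."""
    ranked = sorted(records, key=lambda r: SOURCE_PRIORITY[r.source])
    best = {}      # surviving value per attribute
    lineage = {}   # which source supplied each surviving value
    for rec in ranked:
        for attr, value in rec.attributes.items():
            if value and attr not in best:
                best[attr] = value
                lineage[attr] = rec.source
    # Cross-reference keys let downstream systems map back to each source.
    cross_reference = {rec.source: rec.source_key for rec in records}
    return {
        "master_id": master_id,
        "attributes": best,
        "lineage": lineage,
        "cross_reference": cross_reference,
    }


records = [
    SourceRecord("crm", "C-17", {"name": "Jon Smith", "address": "12 Elm St", "email": ""}),
    SourceRecord("billing", "B-9041", {"name": "Jonathan Smith", "address": ""}),
    SourceRecord("web", "W-553", {"email": "jsmith@example.com"}),
]
master = build_master_record("M-1", records)
```

In this example the name survives from billing, the address from CRM and the email from the web source, while the cross-reference map retains every source key. Real survivorship rules are usually richer (recency, validation status, steward overrides); a fixed source ranking is only the simplest case.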
The next type of data is relationship or hierarchy data. This type of data defines the relationships among various entities (such as individual to organization, organization to organization, or individuals within households). Relationship data can be managed reliably across different sources only after the underlying conflicts of master (entity) data have been resolved. Most of the custom solutions deployed have fixed relationships among entities embedded in the system’s data model, which makes it hard for IT to manage changes in customer relationships and affiliations.
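The contrast with relationships embedded in a fixed data model can be sketched as follows: here relationships are held as plain data rows keyed on master IDs, so a change in affiliation is a data change rather than a schema change. This is only an illustrative Python sketch; the relationship types, IDs and helper functions are hypothetical.

```python
# Each relationship is a data row: (from_master_id, relationship_type, to_master_id).
relationships = [
    ("P-1", "employee_of", "O-1"),
    ("P-2", "employee_of", "O-1"),
    ("P-1", "household_member", "H-1"),
    ("P-3", "household_member", "H-1"),
    ("O-1", "subsidiary_of", "O-2"),
]


def members_of(target_id, rel_type, edges):
    """Return the sorted IDs that have the given relationship to target_id."""
    return sorted(src for src, rel, dst in edges if rel == rel_type and dst == target_id)


def ultimate_parent(org_id, edges):
    """Walk 'subsidiary_of' edges upward to the top of the hierarchy."""
    parents = {src: dst for src, rel, dst in edges if rel == "subsidiary_of"}
    seen = set()  # guards against accidental cycles in the data
    while org_id in parents and org_id not in seen:
        seen.add(org_id)
        org_id = parents[org_id]
    return org_id


household = members_of("H-1", "household_member", relationships)
top_before = ultimate_parent("O-1", relationships)

# An acquisition is recorded by adding a row, not by altering the data model.
relationships.append(("O-2", "subsidiary_of", "O-3"))
top_after = ultimate_parent("O-1", relationships)
```

Note that the rows refer to master IDs, which presumes the entity conflicts have already been resolved, as the paragraph above requires.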
The third type of data is transaction or activity data (such as amount withdrawn from an account). Although there are significant challenges in managing large volumes of transaction data, there is usually little conflict in reconciling such data since there is an unambiguous system of record for each type of transaction. The key issue lies in attaching these transactions correctly to the same customer across multiple CRM touch-points and then aggregating them accurately for other applications to consume (such as the average account balance). Note that transactions can be aggregated for the right customer or household only after the ambiguities of the associated master and relationship data have been removed.
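To make the attachment step concrete, the Python sketch below aggregates account balances per customer by first resolving each source key to a master customer ID through a cross-reference table. The table contents, account keys and function name are hypothetical; the point is only that aggregation depends on the cross-reference being in place first.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cross-reference: (source system, source key) -> master customer ID.
CROSS_REFERENCE = {
    ("checking", "CHK-100"): "M-1",
    ("savings", "SAV-220"): "M-1",
    ("checking", "CHK-101"): "M-2",
}

# Activity rows as they arrive from source systems: (source, key, balance).
balances = [
    ("checking", "CHK-100", 1200.0),
    ("savings", "SAV-220", 800.0),
    ("checking", "CHK-101", 500.0),
    ("cards", "CRD-7", 50.0),  # no cross-reference entry yet
]


def average_balance_by_customer(rows, xref):
    """Group balances by master ID; report rows that cannot be attached."""
    grouped = defaultdict(list)
    unmatched = []
    for source, key, balance in rows:
        master_id = xref.get((source, key))
        if master_id is None:
            unmatched.append((source, key))
        else:
            grouped[master_id].append(balance)
    averages = {master_id: mean(values) for master_id, values in grouped.items()}
    return averages, unmatched


averages, unmatched = average_balance_by_customer(balances, CROSS_REFERENCE)
```

Rows without a cross-reference entry are set aside rather than guessed at, which is one simple way to keep unresolved master data from contaminating the aggregate.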
Essentially, without treating different data types separately and establishing a reliable foundation of master data at the start, a trustworthy CDI platform cannot be built. Yet none of the data tools mentioned earlier maintains a separation of data types. For instance, ETL tools neither recognize master data nor treat it apart from other types of data. EII tools assume that all federated data results are clean and unambiguous; in fact, they rely on an external source to provide correct cross-reference keys and global IDs to accurately join the results of a federated query. DQ tools provide ad-hoc cleansing of a source but neither recognize data types nor offer ongoing management of data changes.
Critical Sequence Needed to Achieve a Trustworthy Customer Hub
An important implication of treating the three data types separately is that there is a critical sequence in which a customer hub must be built if it is to become a trustworthy enterprise-wide foundation for creating and delivering unified customer views. A comprehensive CDI framework for such a foundation must respect this sequence and include all the tools needed for the various processes associated with managing different data types. Following from the data types described above, the logical sequence for building a central data hub is as follows:
1. First, identify and consolidate the most reliable reference or profile data on all customer entities across all data sources (the best version of the truth);
2. Second, determine and manage the various relationships and hierarchies among all the customers (the best insight into relationships);
3. Finally, reliably combine all the relevant customer activity and transaction data into accurate, unified views (the best unified view) and deliver these to various people and systems.
Conversely, short-changing this sequence can result in a high cost of data management, since data conflicts and other reliability issues (arising from different data types) become intermingled. To ensure that a customer is correctly and uniquely identified and managed across all applications and data sources, a customer hub must begin with a foundation of reliable customer reference data. The first step is to build a persistent store of customer master reference data that addresses its complete lifecycle: model, cleanse, match, merge, share, extend and manage. Without managing the full lifecycle, the hub cannot accurately and reliably provide data services for other needs, such as providing cross-keys for transaction aggregation or identifying affiliations among customers. (See side-bar II)
This reliable master data hub can then be extended to include all the relevant affiliation data among customers and organizations. The solution must allow such hierarchy data to be leveraged across data sources instead of tied to a fixed hierarchical view of an implementation. Finally, only after managing verifiable hierarchy data should the solution attempt to access, aggregate and unify the customer activity data with other data types for an accurate, timely and complete view (through caching or aggregation). None of the common data tools (ETL/EII/DQ) provide a solution framework for implementing the critical sequence that is needed to build a trustworthy customer hub.
The Challenge of Data Models
One of the key reasons custom solutions are inextensible is that they instantiate a fixed data model in a physical database repository or data warehouse. This fate is also shared by “packaged” CDI solutions offered by application vendors (such as Siebel, Oracle and SAP). In a large enterprise, a single vendor rarely has access to all sources of customer data, whether external or internal. Therefore, standardizing on the application vendor’s data model means more work, not less, since every data source outside the vendor application has to be transformed to feed into the vendor’s customer data hub. The best approach is to create a template-driven, logical data model specifically for each enterprise, reflecting all of its customer data sources that need to be integrated. Ultimately, the solution provider has to deliver a data model and a solution framework cognizant of the needs of each major industry vertical. None of the ETL, EII or DQ tools attempts to address the challenge of data models for the diverse set of data sources encountered in various vertical sectors.
Meta-data Driven Framework Needed
The most fundamental shortcoming of the trio of data tools (ETL/EII/DQ) is that they do not offer a meta-data framework for managing the complete set of data management tasks required of a customer data integration solution. Each of these tools, along with the numerous enterprise application integration (EAI) technologies, solves only a narrow integration issue within the IT “stack”: integrating application to application, moving data to a single warehouse, cleansing a single source, and so on.
The key advantage of using a meta-data driven CDI framework is that it renders the solution entirely configurable, so that business and IT changes can be implemented rapidly without writing code. Since the CDI framework is manageable by business analysts and data stewards as well as by IT, such a solution becomes the successful foundation for all unified customer views in an enterprise. Additional data sources are easy to add, without additional programming, as businesses evolve through mergers and acquisitions.
For the solution to manage data changes without software programming effort, it must be driven by meta-data that captures the data syntax, semantics and business rules relevant to integrating customer data into unified views. It is important to maintain the distinction between managing meta-data through a generalized meta-data tool and having a meta-data driven framework designed for a specific purpose (such as CDI). A meta-data driven framework captures, stores and uses highly contextual meta-data tied to a business purpose (such as when a customer address was changed and by whom). By separating meta-data from its business context, a generalized meta-data tool often limits the business value of that meta-data.
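A minimal Python sketch of the idea, under assumptions of our own: field mappings and cleansing rules live in a meta-data structure rather than in code, so a new source is added by adding an entry, and each change carries contextual meta-data about who made it and when. The structure of SOURCE_METADATA, the rule names and the to_hub_record function are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical meta-data: per-source field mappings and cleansing rules held as data.
SOURCE_METADATA = {
    "crm": {
        "field_map": {"cust_name": "name", "addr": "address"},
        "rules": {"name": "title_case"},
    },
    "billing": {
        "field_map": {"FULLNAME": "name", "ADDRESS1": "address"},
        "rules": {"name": "title_case", "address": "strip"},
    },
}

# Small library of named rules that the meta-data can refer to.
RULES = {
    "title_case": lambda value: value.strip().title(),
    "strip": lambda value: value.strip(),
}


def to_hub_record(source, raw, changed_by, metadata=SOURCE_METADATA):
    """Map a raw source row to hub fields using only the meta-data."""
    spec = metadata[source]
    record = {}
    for source_field, hub_field in spec["field_map"].items():
        if source_field in raw:
            value = raw[source_field]
            rule = spec["rules"].get(hub_field)
            record[hub_field] = RULES[rule](value) if rule else value
    # Contextual meta-data: who changed the record, from where, and when.
    audit = {
        "source": source,
        "changed_by": changed_by,
        "changed_at": datetime.now(timezone.utc).isoformat(),
    }
    return record, audit


record, audit = to_hub_record(
    "billing",
    {"FULLNAME": " jonathan SMITH ", "ADDRESS1": " 12 Elm St "},
    changed_by="steward01",
)

# Adding a source is a meta-data change; to_hub_record itself is untouched.
extended = dict(SOURCE_METADATA, web={"field_map": {"n": "name"}, "rules": {}})
web_record, _ = to_hub_record("web", {"n": "Ann Lee"}, "steward01", extended)
```

In a real framework the meta-data would be stored and versioned in the hub and edited through a stewardship interface rather than in a Python dictionary.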
Because the custom CDI solutions built with ETL-EII-DQ tools are not meta-data driven, they are not manageable by data stewards, are hard to configure and are generally not extensible beyond a handful of sources.
Service Oriented Architecture Critical
Finally, if a customer data hub is to be the central repository of critical customer information for other systems, it needs the capability to synchronize reliable data back to source systems. In addition, such a CDI solution needs to support a standards-based service-oriented architecture (SOA) so that its underlying data services may be used by future service-oriented applications. Typically, none of the hubs built with data tools offers these critical capabilities, foreshadowing their quick obsolescence.
Summary
Although necessary components of the data integration architecture, ETL, EII and DQ tools are not designed, nor able, to build a trustworthy foundation for customer data integration. For the same reason you wouldn’t hire a plumber to build your house, organizations should not rely primarily on these technologies when developing a reliable customer data foundation.
Like plumbing in a house, the tools that push data through the pipes are not representative of the overarching blueprint needed for customer data integration (CDI) architecture. The cornerstone of the architecture is the recognition that different types of data need to be treated separately. Additionally, data reliability can only be maintained through a set of best practices that first put in place the bedrock of reliable customer master reference data. A solution that has a flexible data model supported by a meta-data driven, configurable framework is the best way to construct such a foundation. Once built, it should be easily manageable by data stewards and extensible to emerging service-oriented architecture standards and therefore to new business conditions.
Before hiring a customer data integration “plumber” to build your customer foundation, take the time to evaluate a data architecture expert who can build a solid foundation from which to achieve your customer data integration goals.
About the Author
Anurag Wadehra is the Vice President of Marketing at Siperian Inc., a leading customer data integration solution provider. The Siperian solution creates the most trustworthy and manageable customer master reference store possible from widely disparate internal and external data sources. It is the foundation for delivering accurate, relevant and actionable 360° customer views. Siperian’s highly manageable and extensible solution enables enterprises to cost-effectively provide trustworthy customer master data to any system or business user, resulting in more efficient and profitable customer relationships, reduced customer data operations costs and increased accuracy of regulatory compliance.
For further information, contact awadehra@siperian.com or visit the website at http://www.siperian.com or call 1-866-SIPERIAN (1-866-747-3742).

