Data Engineering 101 for the Digital Enterprise by Mark Hewitt

Data drives every part of the digital business, from product capabilities to customer experience (CX) to informed decision-making. The quality and structure of that data directly affect the outcomes of all these functions. This is where your organization's data strategy comes into play.

Data engineering is the discipline of designing, building, and maintaining the infrastructure and systems used to collect, store, and process large amounts of data. This typically includes tasks such as data ingestion, data storage, data processing, and data visualization. Perhaps more important than the initial design and implementation, data engineering also involves taking dated, problematic, or incompatible data structures and maturing them to meet evolving needs. Data engineers bring discipline to the systems and pipelines that allow organizations to use their data effectively.

To have a mature data process, organizations must tackle the following areas:

Develop an enterprise data architecture. This includes having a clear understanding of the different types of data the organization collects, as well as the systems and technologies used to store and process that data. Data sources have a habit of growing in number, and without a clear picture of what is currently in play, it is hard for data engineers to make effective choices. Centralized data repositories are typically where data from various sources is gathered so it can be easily shared and used for business decision-making. Although this is a good place to mitigate data incompatibilities and inconsistencies that already exist, it pays dividends to ensure that a common data understanding informs the design of new data structures as they are created.
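
One lightweight way to build that picture of what is currently in play is to keep an explicit catalog of data sources. The sketch below is a minimal, illustrative Python representation; the field names and example sources are assumptions for the sake of the example, not a standard.

```python
from dataclasses import dataclass, field

@dataclass
class DataSource:
    """One entry in a lightweight data catalog (fields are illustrative)."""
    name: str
    owner: str                      # team accountable for the source
    system: str                     # e.g. "PostgreSQL", "Salesforce", "S3"
    refresh_cadence: str            # e.g. "streaming", "hourly", "daily batch"
    feeds_central_repository: bool = False
    known_issues: list[str] = field(default_factory=list)

catalog = [
    DataSource("orders_db", "commerce", "PostgreSQL", "streaming", True),
    DataSource("support_tickets", "customer-experience", "Zendesk export",
               "daily batch", False,
               ["customer_id format differs from orders_db"]),
]

# An inventory like this makes gaps and incompatibilities visible before
# new data structures are designed.
for source in catalog:
    print(f"{source.name}: {source.refresh_cadence}, issues: {source.known_issues}")
```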

Ensure data quality. It is critical that the data being collected and stored is accurate and reliable. Several roles, including application developers and data engineers, share responsibility for the quality of data in the organization. Implementing processes and systems that ensure data quality, especially data validation, gives an organization more confidence that the integrity of its data is being upheld. Data cleansing may be needed to raise the quality of data that already exists. Quality data is needed to ensure the accuracy of data-driven decision-making and allows for more complex transformations and reuse of data to drive new capabilities.
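
As a hedged sketch of what automated validation can look like, the example below applies a few simple checks to incoming records in plain Python. The record shape and the rules are assumptions for illustration; in practice, teams often layer dedicated validation frameworks on top of checks like these.

```python
from datetime import datetime

def validate_order(record: dict) -> list[str]:
    """Return a list of data-quality problems found in one order record (illustrative rules)."""
    problems = []
    if not record.get("order_id"):
        problems.append("missing order_id")
    if record.get("amount") is None or record["amount"] < 0:
        problems.append("amount must be a non-negative number")
    try:
        datetime.fromisoformat(record.get("order_date", ""))
    except ValueError:
        problems.append("order_date is not a valid ISO-8601 date")
    return problems

records = [
    {"order_id": "A-100", "amount": 25.00, "order_date": "2023-04-01"},
    {"order_id": "", "amount": -5, "order_date": "not-a-date"},
]

# Rejecting or quarantining bad records at ingestion time protects
# downstream reports and models from silently absorbing bad data.
for rec in records:
    issues = validate_order(rec)
    print(rec.get("order_id") or "<unknown>", "->", issues or "ok")
```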

Employ effective data processing. Data warehousing, data mining, and data analysis play a big part in enterprise data usage. It is important to have a robust data processing infrastructure in place that can handle large amounts of data. The data processing infrastructure must also function in a repeatable way that can be leveraged by any project that needs it. The infrastructure ultimately dictates how long the time windows for data updates need to be and how close to real time the resulting insights can be.
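
To make the idea of a repeatable processing step concrete, here is a minimal batch-style sketch in plain Python: it filters raw rows to a time window, aggregates them, and returns a summary, with the window passed in as a parameter so the same step can be re-run for any period. The function, field names, and sample data are assumptions; real deployments would typically sit on an orchestration and warehousing stack.

```python
from collections import defaultdict
from datetime import date

def run_sales_pipeline(raw_rows: list[dict], window_start: date, window_end: date) -> dict:
    """Aggregate raw sales rows into per-product totals for one time window (illustrative)."""
    # Extract: keep only rows that fall inside the requested window.
    in_window = [r for r in raw_rows
                 if window_start <= date.fromisoformat(r["sold_on"]) < window_end]

    # Transform: aggregate revenue by product.
    totals = defaultdict(float)
    for row in in_window:
        totals[row["product"]] += row["amount"]

    # Load: in a real pipeline this would be written to a warehouse table;
    # here the summary is simply returned.
    return dict(totals)

raw = [
    {"product": "widget", "amount": 19.99, "sold_on": "2023-04-01"},
    {"product": "widget", "amount": 19.99, "sold_on": "2023-04-02"},
    {"product": "gadget", "amount": 4.50, "sold_on": "2023-04-01"},
]

# The same function can be re-run for any window, which keeps processing repeatable.
print(run_sales_pipeline(raw, date(2023, 4, 1), date(2023, 4, 2)))
```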

Embrace data visualization. Data visualization is the process of presenting data in a way that is easy to understand and interpret. Too often, visualizations emerge from what is obvious and easy to implement rather than from what best informs users. The right combination of application engineers, data engineers, data scientists, analysts, and UX experts should be involved as necessary to create visualizations that effectively communicate the insights that can be derived from the data.
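
As a small example of presenting a summary rather than raw rows, the sketch below plots aggregated totals with matplotlib (assuming it is installed); the data and labels are illustrative only.

```python
import matplotlib.pyplot as plt

# Illustrative summary data; in practice this would come from the processing layer.
product_revenue = {"widget": 39.98, "gadget": 4.50, "gizmo": 12.00}

fig, ax = plt.subplots()
ax.bar(list(product_revenue.keys()), list(product_revenue.values()))
ax.set_title("Revenue by product (illustrative data)")
ax.set_xlabel("Product")
ax.set_ylabel("Revenue (USD)")
fig.tight_layout()
fig.savefig("revenue_by_product.png")  # or plt.show() in an interactive session
```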

With a capable data engineering team, a strong data architecture, and rigorous data governance, organizations can more effectively draw on their data to drive business decisions and gain a competitive advantage. Without these building blocks of an organizational data strategy, data initiatives will occur in a haphazard fashion and yield more limited, inconsistent results.

Mark Hewitt