Developing for Data Quality by Julian Flaks

Data and its applications have never been more under the microscope than they are today. The promise of AI and the never-ending drive toward insights and reporting fidelity take the spotlight, but applying these technologies quickly refocuses technology leaders on their less glamorous counterpart: data quality.

Challenges to data quality never have a single cause; they are better thought of as a syndrome than a single ailment. Each organization has its own path to tread in building and maintaining the meaningful sources of data it comes to rely on.

In reality, compromises in data quality are a multiplicative function of several contributing factors (a compounding effect sketched just after this list), such as:

  • Data schema issues and lack of rigor

  • Legacy data still exerting pressure

  • User interaction and application design limitations
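
To make that multiplicative framing concrete, here is a minimal sketch in Python. The factor names and error rates are invented for illustration, not drawn from any real system; the point is only that individually small leaks compound:

```python
# Hypothetical illustration: if each factor independently taints a small
# share of records, the share of fully clean records shrinks multiplicatively.
error_rates = {
    "schema_gaps": 0.03,     # assumed: 3% of records hit schema issues
    "legacy_data": 0.05,     # assumed: 5% carry legacy inconsistencies
    "ux_limitations": 0.04,  # assumed: 4% suffer from input-flow problems
}

clean_fraction = 1.0
for factor, rate in error_rates.items():
    clean_fraction *= 1.0 - rate

print(f"Fraction of fully clean records: {clean_fraction:.1%}")
# Roughly 88.5% -- three individually small leaks become one large one.
```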

The impact of user experience on data quality is easily overlooked, especially when the focus is on optimizing flows for expediency. It is vital that the information architecture be understood across the board, so that the interactions users are guided through can support the rigor the database needs. This in turn often has downstream UX impact of its own, since the question of what data can be presented to the user is too often answered by what the data cannot yet support.

The difference between strong IA and UX design and merely passable UX design often lies in how well the data domain, and the mapping derived from it, is understood. 80/20 rules about what a record commonly looks like need balancing against an understanding of the outlying cases that may enter the system. Hard questions must be asked about what level of data validation is, and is not, too onerous for a user to comply with. When data engineers later look for insights in the data an application creates, the difference between data that always follows a pattern and data that usually does can be immense.
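
As a hedged sketch of that trade-off (the field, pattern, and helper names here are invented for illustration), an input flow can validate strictly for the common case while still accepting and flagging outliers, so the database records which values followed the expected pattern instead of silently mixing the two:

```python
import re
from dataclasses import dataclass

# Assumed common case: a US-style 5-digit ZIP code.
US_ZIP = re.compile(r"^\d{5}$")

@dataclass
class PostalCodeEntry:
    value: str
    conforms: bool  # True when the value matched the expected pattern

def accept_postal_code(raw: str) -> PostalCodeEntry:
    """Accept the value either way, but record whether it fit the pattern.

    Rejecting everything unusual frustrates legitimate outliers (e.g.
    international addresses); accepting silently leaves data engineers
    guessing. Flagging preserves both the UX and the analyzability.
    """
    cleaned = raw.strip()
    return PostalCodeEntry(value=cleaned, conforms=bool(US_ZIP.match(cleaned)))

print(accept_postal_code("30301"))     # conforms=True
print(accept_postal_code("SW1A 1AA"))  # conforms=False, still accepted
```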

Even the navigational contexts users are guided within can have long-term implications for the cohesiveness of the data being gathered. As features are added to an application piecemeal, insufficiently thought-out interfaces can force compromises in engineering decisions or user flows in a bid to make things work without too much change.

Along with the myriad concerns of nomenclature, navigation, and labeling choices, information architecture can mean the difference between an application developed along truly meaningful lines and one whose domain concepts are confused. Clarity of purpose typically makes it easier to extend an application down the line in ways that keep the data decisions clean.

On the server side, application code builds on user data in various ways. In an age when database administrators are rarely the ones deciding on schemas, it takes great discipline to balance effective application development with adequate analysis of how those decisions affect the health of the data. In an active system that constantly produces new data, it is both easier and more dependable to be informed by real constraints declared in a schema than by sampling values and inferring the limitations of less formally structured data.
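
A minimal sketch of that distinction, using SQLite only because it ships with Python (the table and its constraints are invented for illustration): rules declared in the schema are enforced on every future insert, whereas limits inferred from sampled values hold only until the next surprising row arrives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Constraints live in the schema, so every future write is checked,
# not just the rows someone happened to sample.
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_email TEXT NOT NULL,
        quantity INTEGER NOT NULL CHECK (quantity > 0),
        status TEXT NOT NULL CHECK (status IN ('pending', 'shipped', 'cancelled'))
    )
""")

conn.execute(
    "INSERT INTO orders (customer_email, quantity, status) VALUES (?, ?, ?)",
    ("jane@example.com", 2, "pending"),
)

try:
    # A zero quantity would pass a casual sampling-based check
    # ("quantities look like small integers"), but the schema rejects it.
    conn.execute(
        "INSERT INTO orders (customer_email, quantity, status) VALUES (?, ?, ?)",
        ("bob@example.com", 0, "pending"),
    )
except sqlite3.IntegrityError as err:
    print(f"Rejected at the schema: {err}")
```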

Updating user experiences and data applications in a way that nurtures data quality adds complexity, but it often removes complexity too. A shared framework of understanding keeps every contributor closer to the most important details of the system. Of course, the ergonomics of the user experience will sometimes have to take the highest priority, as it should. At the same time, the more minimalist view that everything can be magically fixed at reporting time will only grow more outdated as we do more exciting things with the data we create.