5 challenges to implementing QA strategy in data and analytics projects

Developing a QA strategy for unstructured data and analytics can be a trying and elusive process, but there are several things we’ve learned that can improve the accuracy of results.

In a traditional application development process, quality assurance occurs at the unit-test level, the integration-test level and, finally, in a staging area where a new application is trialed in an environment similar to the one it will run in during production. While it’s not uncommon for less-than-perfect data to be used in the early stages of application testing, confidence in data accuracy for transactional systems is high. By the time an application gets to final staging tests, the data that it processes is seldom in question.

With analytics, which uses a different development process and a mix of structured and unstructured data, testing and quality assurance for data aren’t as straightforward.

Here are the challenges:

1. Data quality

Unstructured data coming into analytics must be correctly parsed into digestible pieces of information to be of high quality. Before parsing occurs, the data must be prepped so that it is compatible with the data formats of the many different systems it must interact with. Data must also be pre-edited so that as much needless noise as possible (such as connection “handshakes” between appliances in Internet of Things data) is eliminated. With so many different sources of data, each with its own set of issues, data quality can be difficult to achieve.
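
As a rough illustration of the pre-editing step described above, here is a minimal Python sketch that drops handshake-style noise and malformed messages before parsing what remains into uniform records. The message layout, the field names (type, device_id, value) and the list of noise types are assumptions made for the example, not a real device protocol.

```python
import json

# Hypothetical message types treated as noise; adjust to your own devices.
NOISE_TYPES = {"handshake", "keepalive", "ack"}


def parse_readings(raw_lines):
    """Parse raw JSON lines into clean records.

    Returns (clean_records, rejected_count). Noise messages, malformed
    JSON and records missing required fields are all counted as rejected.
    """
    clean, rejected = [], 0
    for line in raw_lines:
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            rejected += 1  # not valid JSON at all
            continue
        if not isinstance(msg, dict) or msg.get("type") in NOISE_TYPES:
            rejected += 1  # connection chatter, not a measurement
            continue
        try:
            # Normalize to one target format: string id, float value.
            record = {
                "device_id": str(msg["device_id"]),
                "value": float(msg["value"]),
            }
        except (KeyError, TypeError, ValueError):
            rejected += 1  # required field missing or not numeric
            continue
        clean.append(record)
    return clean, rejected


sample = [
    '{"type": "handshake", "device_id": "a1"}',
    '{"type": "reading", "device_id": "a1", "value": "21.5"}',
    'not json at all',
    '{"type": "reading", "device_id": "b2"}',
]
records, dropped = parse_readings(sample)
print(records, dropped)
```

Tracking the rejected count alongside the clean records gives a simple per-source quality signal: a source whose rejection rate climbs is a candidate for closer inspection.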

2. Data drift

In analytics, data can begin to drift as new data sources are added and new queries change the direction of the analytics. Data and analytics drift can be a healthy response to changing business conditions, but it can also pull companies away from the original business use case that the data and analytics were intended for.

3. Business use case drift

Use case drift is closely related to drift in data and analytics queries. There is nothing wrong with business use case drift if the original use case has been resolved or is no longer important. However, if the need to satisfy the original business use case remains, it is incumbent on IT and the end business to maintain the integrity of the data needed for that use case and to create a new data repository and analytics for emerging use cases.

4. Eliminating the right data

In one case, a biomedical team studying a particular molecule wanted to accumulate every piece of data it could find about this molecule from a worldwide collection of experiments, papers and research. The amount of data that artificial intelligence and machine learning had to review to collect this molecule-specific data was enormous, so the team decided up front to bypass any data that was not directly related to this molecule. The risk was that they might miss some tangential data that could be important, but the risk was not large enough to stop them from slimming down their data to ensure that only the highest-quality, most relevant data was collected.

Data science and IT teams can use this approach as well. By narrowing the funnel of data that comes into an analytics data repository, data quality can be improved.
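
One simple way to picture narrowing the funnel is a keyword filter applied before data reaches the repository. The sketch below is only an illustration: the document structure, the narrow_funnel helper and the example term "caffeine" are hypothetical, and a real project would likely need something richer than whole-word matching.

```python
import re


def narrow_funnel(documents, target_terms):
    """Split documents into those mentioning a target term and those bypassed."""
    # Whole-word, case-insensitive match on any of the target terms.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in target_terms) + r")\b",
        re.IGNORECASE,
    )
    kept, bypassed = [], []
    for doc in documents:
        (kept if pattern.search(doc["text"]) else bypassed).append(doc)
    return kept, bypassed


# Hypothetical input: the molecule name here is a stand-in for illustration.
docs = [
    {"id": 1, "text": "Binding affinity of caffeine in adenosine receptors."},
    {"id": 2, "text": "A survey of unrelated polymer coatings."},
    {"id": 3, "text": "CAFFEINE metabolism in the liver."},
]
kept, bypassed = narrow_funnel(docs, ["caffeine"])
print([d["id"] for d in kept], [d["id"] for d in bypassed])
```

Returning the bypassed documents rather than discarding them silently makes it possible to sample them later and judge how much tangential but useful data the filter is costing.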

5. Deciding your data QA standards

How perfect does your data need to be in order to perform value-added analytics for your company? The standard for analytics results is that they must be at least 95% accurate relative to what subject matter experts would have determined for any given query. If data quality lags, it won’t be possible to meet the 95% accuracy threshold.
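
A threshold like this can be checked mechanically once expert answers are available for a sample of queries. The sketch below assumes that "accuracy" means exact agreement between the analytics answer and the expert answer on each query, which is one possible reading; the function names and sample data are hypothetical.

```python
THRESHOLD = 0.95  # the 95% standard from the text


def agreement_rate(analytics_answers, expert_answers):
    """Fraction of queries where analytics matches the expert answer exactly."""
    if len(analytics_answers) != len(expert_answers):
        raise ValueError("answer lists must be the same length")
    if not expert_answers:
        raise ValueError("at least one query is required")
    matches = sum(a == e for a, e in zip(analytics_answers, expert_answers))
    return matches / len(expert_answers)


def meets_standard(analytics_answers, expert_answers, threshold=THRESHOLD):
    """True if the agreement rate reaches the chosen threshold."""
    return agreement_rate(analytics_answers, expert_answers) >= threshold


# Hypothetical sample: 20 queries, analytics disagrees with the experts once.
expert = ["yes"] * 20
analytics = ["yes"] * 19 + ["no"]
rate = agreement_rate(analytics, expert)
print(rate, meets_standard(analytics, expert))
```

For numeric results, exact matching would usually be replaced by a tolerance check, and the sample of queries needs to be large enough for the rate to be meaningful.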

However, there are instances when an organization can begin to use data that is less than perfect and still derive value from it. One example is general trends analysis, such as gauging increases in traffic over a road system or increases in temperature over time for a fruit crop. The caveat: If you’re using less-than-perfect data for general guidance, never treat the results as mission-critical analytics.

Did you like this article? You can read it and many others @ Tech Republic!