Why data analytics initiatives still fail
Executives talk about the value of data in generalities, but Michele Koch, director of enterprise data intelligence at Navient Solutions, can calculate the actual worth of her company’s data.
In fact, Koch can figure, in real dollars, the increased revenue and decreased costs produced by the company’s various data elements. As a result, she is well aware that problems within Navient’s data can hurt its bottom line. A mistake in a key data field within a customer’s profile, for instance, could mean the company can’t process a loan at the lowest cost.
“There’s money involved here, so we have a data quality dashboard where we track all of this. We track actual and potential value,” she says.
An early data-related initiative within Navient, an asset management and business processing service company based in Wilmington, Del., illustrates what’s at stake, says Barbara Deemer, chief data steward and vice president of finance. The 2006 initiative focused on improving data quality for marketing and yielded a $7.2 million ROI, with returns coming from an increased loan volume and decreased operating expenses.
Since then, Navient executives have committed to supporting a strong data governance program as a key part of a successful analytics effort, Koch says. Navient’s governance program includes long-recognized best practices, such as standardizing definitions for data fields and ensuring clean data.
It assigns ownership for each of its approximately 2,600 enterprise data elements; ownership goes either to the business area where the data field originated or to the business area whose processes the field is integral to.
The company also runs a data quality program that actively monitors fields to ensure high standards are consistently met, and it launched a Data Governance Council (in 2006) and an Analytics Data Governance Council (in 2017) to address ongoing questions and concerns, make decisions across the enterprise, and continually improve both data operations and how data feeds the company’s analytics work.
“Data is so important to our business initiatives and to new business opportunities that we want to focus on always improving the data that supports our analytics program,” Koch says.
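The field-level monitoring Koch describes can be as simple as tracking completeness for each monitored data element and alerting when it slips. Below is a minimal sketch in Python; the sample fields and the 95 percent floor are illustrative assumptions, not Navient’s actual rules.

# A minimal sketch of field-level data quality monitoring.
# The monitored fields and the 95% floor are illustrative assumptions.
def completeness(records: list[dict], field: str) -> float:
    """Share of records where the field is present and non-empty."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records) if records else 0.0

def quality_dashboard(records: list[dict], fields: list[str], floor: float = 0.95) -> None:
    """Flag any monitored field whose completeness drops below the floor."""
    for field in fields:
        score = completeness(records, field)
        status = "OK" if score >= floor else "ALERT"
        print(f"{field:12s} {score:6.1%}  {status}")

customers = [
    {"name": "Ann Lee", "ssn": "xxx-xx-1234", "loan_type": "consolidation"},
    {"name": "Bob Roe", "ssn": "", "loan_type": "refinance"},
]
quality_dashboard(customers, ["name", "ssn", "loan_type"])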
Most executives agree that data governance is vital, citing compliance, customer satisfaction and better decision-making as key drivers, according to the 2018 State of Data Governance report from data governance software company Erwin and UBM. However, the report found that almost 40 percent of responding organizations don’t have a separate budget for data governance and some 46 percent don’t have a formal strategy for it.
The findings are based on 118 responses from CIOs, CTOs, data center managers, IT staff and consultants.
Given those figures, experts say it’s not surprising that there are weak spots in many enterprise data programs. Here’s a look at seven such problematic data practices.
Bringing data together, but not really integrating it
Integration tops the list of challenges in the world of data and analytics today, says Anne Buff, vice president of communications for the Data Governance Professionals Organization.
True, many organizations gather all their data in one place. But in reality they don’t integrate the various pieces from the multiple data sources, Buff explains. So the Bill Smith from one system doesn’t connect with the data on Bill Smith (and the variations of his name) generated by other systems. This gives the business multiple, incomplete pictures of who he is.
“Co-located data is not the same as integrated data,” Buff says. “You have to have a way to match records from disparate sources. You need to make it so, when this all comes together, it creates this larger view of who Bill Smith is. You have to have something to connect the dots.”
Various data integration technologies enable that, Buff says, and selecting, implementing and executing the right tools is critical to avoid both excessive manual work and redoing the same work over and over.
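To make the “connect the dots” idea concrete, here is a minimal record-matching sketch in Python. The field names, the email tie-breaker and the 0.85 similarity threshold are illustrative assumptions, not a production entity-resolution design.

# A minimal sketch of record matching across two source systems.
# Field names ("name", "email") and the 0.85 threshold are assumptions.
from difflib import SequenceMatcher

def normalize(value: str) -> str:
    """Lowercase and strip punctuation so trivial formatting
    differences ("Bill Smith" vs. "bill smith ") don't block a match."""
    return "".join(ch for ch in value.lower() if ch.isalnum() or ch == " ").strip()

def same_person(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Treat records as the same customer when a strong identifier matches
    exactly, or the normalized names are sufficiently similar."""
    if rec_a.get("email") and rec_a.get("email") == rec_b.get("email"):
        return True
    score = SequenceMatcher(
        None, normalize(rec_a["name"]), normalize(rec_b["name"])
    ).ratio()
    return score >= threshold

crm_record = {"name": "Bill Smith", "email": "bsmith@example.com"}
billing_record = {"name": "William Smith", "email": "bsmith@example.com"}
print(same_person(crm_record, billing_record))  # True: email connects the dots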
Moreover, integration is becoming increasingly critical because data scientists are searching for patterns within data to gain the kind of insights that can yield breakthroughs, competitive advantages and the like.
“But if you can’t bring together data that has never been brought together before, you can’t find those patterns,” says Buff, who is also an advisory business solutions manager at SAS in Cary, N.C.
Not realizing business units have unique needs
Yes, consolidated, integrated data is critical for a successful analytics program. But some business users may need a different version of that data, Buff says.
“Data in one form doesn’t meet the needs for everyone across the organization,” she adds.
Instead, IT needs to think about data provisioning: supplying data in the form the business case requires, as determined by the business user or business division.
She points to a financial institution’s varying needs as an example. While some departments might want integrated data, the fraud detection department might want its data scientists to use unfettered data that isn’t clean so they can search for red flags. For example, they might search for someone at a single address who has applied for multiple loans using slight variations of their personal identifying information.
“You’ll see similar data elements but with some variables, so you don’t want to knock out too much of those variances and clean it up too much,” Buff explains.
On the other hand, she says, the marketing department at that financial institution would want to have the correct version of a customer’s name, address and the like to properly target communications.
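A small sketch of that provisioning idea, assuming a pandas DataFrame of loan applications with illustrative column names: the same source rows yield a raw, variant-preserving view for fraud and a deduplicated, standardized view for marketing.

# A sketch of data provisioning: the same source records served two ways.
# The DataFrame and column names are assumptions for illustration.
import pandas as pd

applications = pd.DataFrame({
    "name":    ["Bill Smith", "Bil Smith", "William Smith"],
    "address": ["12 Oak St", "12 Oak St", "12 Oak St."],
    "loan_id": [101, 102, 103],
})

# Fraud detection: keep the raw rows at addresses that appear more than once,
# preserving the name variants investigators want to see.
addr_key = applications["address"].str.lower().str.rstrip(".")
fraud_view = applications[addr_key.duplicated(keep=False)]

# Marketing: one standardized record per address, trusting the cleaned name.
marketing_view = (
    applications.assign(address=addr_key)
                .drop_duplicates(subset="address", keep="last")
)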
Recruiting only data scientists, not data engineers, too
As companies seek to move beyond basic business intelligence to predictive and prescriptive analytics as well as machine learning and artificial intelligence, they need increasing levels of expertise on their data teams.
That in turn has shined a spotlight on the data scientist position. But equally important is the data engineer, who wrangles all the data sets that need to come together for data scientists to do their work but has (so far) gained less attention in many organizations.
That’s been changing, says Lori Sherer, a partner in Bain & Co.’s San Francisco office and leader of the firm’s Advanced Analytics and Digital practices.
“We’ve seen the growth in the demand for data engineer is about 2x the growth in the demand for data scientist,” Sherer says.
The federal Bureau of Labor Statistics predicts that demand for data engineers will continue to grow at a fast clip for the next decade, with the U.S. economy adding 44,200 positions between 2016 and 2026, at an average annual pay of $135,800.
Yet, as with many key IT positions, experts say there aren’t enough data engineers to meet demand, leaving IT departments that are only now beginning to hire or train for the role playing catch-up.
Keeping data past its prime, instead of managing its lifecycle
The cost of storage has dropped dramatically over the past decade, enabling IT to more easily afford to store reams of data for much longer than it ever could before. That might seem like good news, considering the volume and speed at which data is now created along with the increasing demand to have it for analysis.
But while many have hailed the value of having troves and troves of data, it’s often too much of a good thing, says Penny Garbus, co-founder of Soaring Eagle Consulting in Apollo Beach, Fla., and co-author of Mining New Gold: Managing Your Business Data.
Garbus says too many businesses hold onto data for way too long.
“Not only do you have to pay for it, but if it’s older than 10 years, chances are the information is far from current,” she says. “We encourage people to put some timelines on it.”
The expiration date for data varies not only from organization to organization but also from department to department, Garbus says. The inventory division within a retail company might want only relatively recent data, while marketing might want data that’s years old to track trends.
In that case, IT needs to implement an architecture that delivers data of the right vintage to the right place, so that everyone’s needs are met and stale data doesn’t corrupt timely analytics programs.
As Garbus notes: “Just because you have to keep [old data], doesn’t mean you have to keep it inside your core environment. You just have to have it.”
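In code, that retention split can be as simple as partitioning records on an age cutoff and routing the older partition to cheaper archive storage. The sketch below uses Garbus’s ten-year rule of thumb; the record layout is an assumption.

# A sketch of the retention split Garbus describes: keep recent records in
# the core store, move the rest out. The ten-year cutoff mirrors her rule
# of thumb; the record layout is an illustrative assumption.
from datetime import date, timedelta

RETENTION = timedelta(days=365 * 10)

def split_by_age(records: list[dict], today: date) -> tuple[list[dict], list[dict]]:
    """Partition records into (core, archive) on the retention cutoff."""
    cutoff = today - RETENTION
    core = [r for r in records if r["created"] >= cutoff]
    archive = [r for r in records if r["created"] < cutoff]
    return core, archive

orders = [
    {"id": 1, "created": date(2005, 3, 1)},
    {"id": 2, "created": date(2017, 6, 15)},
]
core, archive = split_by_age(orders, today=date(2018, 1, 1))
# core keeps order 2 for day-to-day analytics; order 1 leaves the core environment.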
Focusing on volume, rather than targeting relevancy
“We’re still building models and running analytics with the data that is most available rather than with the data that is most relevant,” says Steve Escaravage, senior vice president of IT consulting company Booz Allen Hamilton.
He says organizations frequently hold the mistaken notion that they should capture and add more and more datasets. He says they think “that maybe there’s something in there that we haven’t found rather than asking: Do we have the right data?”
Consider, he says, that many institutions look for fraud by analyzing vast amounts of data to look for anomalies. While an important activity, leading institutions also analyze more targeted datasets that can yield better results. In this case, they might look at individuals or institutions that are generating certain types of transactions that could indicate trouble. Or healthcare institutions might consider, when analyzing patient outcomes, data regarding how long doctors were on their shifts when they delivered patient care.
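As a rough illustration of that targeted approach, the sketch below pulls only the slice of transactions relevant to one question, repeated just-under-threshold wire transfers from a single account, rather than modeling everything. The transaction fields and thresholds are assumptions.

# A sketch of the "targeted dataset" idea: instead of scanning all
# transactions for anomalies, pull only the slice that matters for the
# question at hand. Fields and thresholds are illustrative assumptions.
transactions = [
    {"account": "A-17", "type": "wire", "amount": 9500, "country": "XX"},
    {"account": "A-17", "type": "wire", "amount": 9400, "country": "XX"},
    {"account": "B-02", "type": "card", "amount": 40, "country": "US"},
]

# Relevant slice: repeated just-under-threshold wires from the same account.
wires = [t for t in transactions if t["type"] == "wire" and t["amount"] > 9000]
by_account: dict[str, int] = {}
for t in wires:
    by_account[t["account"]] = by_account.get(t["account"], 0) + 1

suspects = [acct for acct, n in by_account.items() if n >= 2]
print(suspects)  # ['A-17']: a far smaller, more relevant dataset to model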
Escaravage says organizations could start by creating a data wish list. Although that exercise starts with the business side, “the mechanisms to capture it and make it available, that’s the realm of the CIO, CTO or chief data officer.”
Providing data, but ignoring where it came from
One of the big topics today is bias in analytics, a scenario that can skew results or even produce faulty conclusions that lead to bad business decisions or outcomes. The problems that produce bias reside in many different arenas within an enterprise analytics program — including how IT handles the data itself, Escaravage says.
Too often, he says, IT doesn’t do a good enough job tracking the provenance of the data it holds.
“And if you don’t know that, it can impact the performance of your models,” Escaravage says, noting the lack of visibility into how and where data originated makes controlling for bias even more difficult.
“It’s IT’s responsibility to understand where the data came from and what happened to it. There’s so much investment in data management, but there should also be a metadata management solution,” he says.
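A minimal version of such a metadata solution simply attaches a provenance record to every dataset and logs each transformation applied to it. The sketch below shows the idea; the fields are illustrative assumptions, not a reference to any particular tool.

# A minimal sketch of dataset provenance tracking: where the data came
# from and what happened to it. Field names are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Provenance:
    source: str                      # originating system or vendor
    acquired: datetime               # when IT ingested the dataset
    transformations: list[str] = field(default_factory=list)

    def record(self, step: str) -> None:
        """Append each cleaning/enrichment step so model builders can
        later audit what might have introduced bias."""
        self.transformations.append(step)

lineage = Provenance(source="crm_export_v2", acquired=datetime(2018, 1, 5))
lineage.record("dropped rows with null SSN")
lineage.record("imputed missing income with regional median")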
Providing data, but failing to help users understand context
IT should not only have a strong metadata management program in place, where it tracks the origin of data and how it moves through its systems, it should provide users insight into some of that history and provide context for some of the results produced via analytics, Escaravage says.
“We get very excited about what we can create. We think we have pretty good data, particularly data that’s not been analyzed, and we can build a mental model around how this data will be helpful,” he says. “But while the analytics methods of the past half-decade have been amazing, the results of these techniques are less interpretable than in the past when you had business rules applied after doing the data mining and it was easy to interpret the data.”
The newer, deep learning models offer insights and actionable suggestions, Escaravage explains. But these systems don’t usually provide context that could be helpful or even critical to the best decision-making. They don’t convey, for instance, the probability, as opposed to the certainty, that something will occur based on the data.
Better user interfaces are needed to help provide that context, Escaravage says.
“The technical issue is how people will interface with these models. This is where a focus on the UI/UX from a transparency standpoint will be very important. So if someone sees a recommendation from an AI platform, to what degree can they drill down to see an underlying probability, the data source, etc.?” he says. “CIOs will have to ask how to build into their systems that level of transparency.”
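One simple way to build in that transparency is to have every model recommendation carry its estimated probability and the datasets behind it, so the interface can expose both for drill-down. The structure below is a sketch of that idea, not an established API.

# A sketch of the transparency Escaravage asks for: a recommendation that
# carries its probability and data sources so a user can drill down rather
# than read it as certainty. The structure is an illustrative assumption.
from dataclasses import dataclass

@dataclass
class Recommendation:
    action: str
    probability: float        # model's estimated likelihood, not a guarantee
    sources: tuple[str, ...]  # datasets behind the score, for drill-down

    def explain(self) -> str:
        return (f"{self.action} (estimated {self.probability:.0%} likely; "
                f"based on: {', '.join(self.sources)})")

rec = Recommendation(
    action="flag account for manual fraud review",
    probability=0.72,
    sources=("transactions_2017", "device_fingerprints"),
)
print(rec.explain())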