The creepiness factor in healthcare data analytics

There are great benefits that can unfold from data sources, but we need to balance them against privacy concerns and unintended consequences.

Last week, I was a speaker at the Healthcare2015 INFORMS conference in Nashville. I happened to sit in on an interesting panel discussion where there was a lively debate about the use of psychographic data for healthcare analysis.

What took me by surprise was the sharp polarization in the panel around the issue of “creepiness.” One of the panelists, a senior analytics executive from a large hospital system, was vehement in his view that using information other than that explicitly covered by data privacy agreements with the patient amounts to a breach of trust in the hospital-patient relationship, and is hence “creepy.” At the other end of the spectrum, a former hospital executive, now an analytics entrepreneur, took the view that any and all available information should go into the analysis, purely to improve its quality.
Both panelists came from scientific backgrounds, yet their worldviews reflected their individual occupations on what is evidently a lightning rod for controversy – the unfettered use of data in healthcare analytics.

More data, better data
Analysts love data – the more, the merrier. As an analytics practitioner, I can relate well to throwing as many different types of data as possible into the mix in order to enrich the understanding of the individual patient. Let’s face it, that’s also where the fun lies in data science.
What gets analyzed today across the healthcare sector at large includes electronic health records (EHR); lab test data from Laboratory Information Management Systems (LIMS); and health insurance information, such as covered items and deductibles, which sits inside Revenue Cycle Management (RCM) systems that enable the hospital to capture data on the applicable charges and raise invoices so that they can be paid.
With this information, a fairly comprehensive view of a patient can be obtained for understanding the patient’s medical history and risk factors, treatment efficacy, and financial risks (such as bad debts).
Analytics teams parse through all this information and, in many cases, employ third-party solutions, often cloud-based, that analyze the data (in an anonymized form) to provide insights for clinicians and administrators. Compliance with HIPAA privacy requirements is paramount in these situations.

How about some more data?
The data available in EHR and claims processing systems has certain limitations. For the most part, these systems provide little or no information about the socio-economic profiles of the patients. In addition, they provide no ability to integrate with other sources of health or medical information, such as wearables like the Apple Watch or Fitbit.
If you push the envelope a little further, other data is also available: social media feeds, credit histories, and genetic test results. The last of these, in particular, has raised several concerns about privacy and ethics.

So, what makes any of this creepy?
At the panel discussion, the main argument against the use of these kinds of data sources went like this:
— If this additional information were to fall into the wrong hands (such as employers or insurance companies), that might impact the patient’s job prospects, increase insurance premiums, and have other undesirable consequences.
An example of this is the effort by Silicon Valley start-up 23andMe to monetize genetic data. This has raised dark concerns about the implications of corporations gaining access to the innermost secrets of our cells, which pharmaceutical marketers and insurance companies might then use against the rest of us.
The main argument for including these sources of data went like this:
— The additional data improves the analysis and benefits the patient and hospital alike – and oh by the way, all this data is out there anyway, so someone has to use it.
What the debate highlighted for me was the balance between science and ethics that we need to strike wherever individual privacy meets the need for improved healthcare.
It is also pertinent to note that, in the same debate, the panelists agreed that certain types of non-traditional medical information, such as wearables data, do have a place in healthcare analytics. With the Internet of Things (IoT) market set to explode to $1.7 trillion by 2020, there is no way we are going to want to ignore this wealth of data for improving public health.

Society has benefited greatly from advances in medicine over the centuries. Today, we are on the cusp of unlocking the potential of data in the practice of evidence-based medicine and population health management. There are great benefits that can unfold from the use of multiple data sources, but we need to balance them against privacy concerns and unintended consequences.