Prediction Modeling and Analytics for Chronic Health Conditions

We study chronic health conditions with the aim of developing prediction models for emergency visits and high-cost patient visits, and of identifying triggers and risk factors for asthma. Our studies in this area combine machine learning, natural language processing, and network science.


Asthma is a common chronic health condition affecting millions of people in the United States. While it cannot be cured, it can be managed if we identify and understand the triggers and risk factors that cause asthma exacerbations. This is challenging because these triggers and risk factors are complex and interconnected, and current mainstream approaches for identifying them have limitations. The recent availability of massive amounts of heterogeneous data has opened up new possibilities for analyzing asthma triggers and risk factors. In this study, we introduce a data-driven framework, adapt and integrate multiple advanced machine learning techniques, and perform an empirical analysis to (i) derive characteristics of self-reported asthma patients from social media; (ii) enable integration and repurposing of highly heterogeneous and commonly available datasets; and (iii) uncover the sequential patterns of asthma triggers and risk factors, and their relative importance, both of which are difficult to achieve via retrospective cohort-based studies. Our methods and results can provide guidance for developing asthma management plans and interventions for specific subpopulations, and eventually have the potential to reduce the societal burden of asthma.

Proactively identifying individual patients who are likely to become high-cost users of a health system is an important challenge. When this prediction is made early, i.e., at the point of admission, appropriate care can be provided and interventions can be put in place to avoid substantial future costs and further deterioration of the patients’ health. In this study, we develop a machine learning approach combined with network science to predict high-cost patients. We extract a large disease co-occurrence network from publicly available inpatient datasets and explore its structure using quantitative measures. We also uncover communities of diseases that tend to co-occur, to understand their interconnections. Using this network, we propose community membership and high-cost propensity scores as additional signals for predicting high-cost patients. We empirically evaluate our approach on a large dataset of 2.3 million patient encounters from Arizona by comparing the performance of models with different input feature sets. Our results demonstrate that the new features significantly improve the accuracy, sensitivity, and specificity of the prediction. Our approach has the potential to improve targeted care management and reduce health care expenditure. It can also be incorporated into decision support systems at the point of admission in hospitals to inform physicians and health care workers.
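To make the network-derived features more concrete, the following is a minimal Python sketch and not the study's actual implementation. It assumes each encounter is a list of diagnosis labels with a binary high-cost flag. All function names, the toy data, and the edge-weight threshold are hypothetical. Connected components stand in for community detection, since the specific algorithm is not stated here, and the propensity score is assumed to be the fraction of encounters containing a diagnosis that were high-cost.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(encounters):
    """Count how often each pair of diagnoses appears in the same encounter."""
    edges = Counter()
    for codes in encounters:
        for a, b in combinations(sorted(set(codes)), 2):
            edges[(a, b)] += 1
    return edges


def find_communities(edges, min_weight=2):
    """Label connected components over edges with weight >= min_weight.

    Assumption: components are a simple stand-in for a real community
    detection algorithm, which would normally be used on a large network.
    """
    neighbors = defaultdict(set)
    for (a, b), weight in edges.items():
        if weight >= min_weight:
            neighbors[a].add(b)
            neighbors[b].add(a)
    community = {}
    next_id = 0
    for start in sorted(neighbors):
        if start in community:
            continue
        community[start] = next_id
        stack = [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            for other in neighbors[node]:
                if other not in community:
                    community[other] = next_id
                    stack.append(other)
        next_id += 1
    return community


def propensity_scores(encounters, high_cost_flags):
    """Assumed definition: share of encounters with a diagnosis that were high-cost."""
    total, high = Counter(), Counter()
    for codes, flag in zip(encounters, high_cost_flags):
        for code in set(codes):
            total[code] += 1
            high[code] += int(flag)
    return {code: high[code] / total[code] for code in total}


def encounter_features(codes, community, propensity):
    """Extra model inputs for one encounter, derived from the network."""
    known = [propensity[c] for c in set(codes) if c in propensity]
    return {
        "communities": sorted({community[c] for c in codes if c in community}),
        "max_propensity": max(known, default=0.0),
    }


# Hypothetical toy data: diagnoses per encounter and a high-cost flag.
encounters = [
    ["asthma", "copd"],
    ["asthma", "copd", "flu"],
    ["diabetes", "ckd"],
    ["diabetes", "ckd"],
    ["asthma"],
]
high_cost = [False, True, True, True, False]

edges = build_cooccurrence(encounters)
community = find_communities(edges)
propensity = propensity_scores(encounters, high_cost)
features = encounter_features(["copd", "ckd"], community, propensity)
print(features)
```

In a setup like this, the propensity scores would need to be computed on training encounters only, so that the outcome labels of held-out encounters do not leak into their own features.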

Our online dashboards also help visualize and analyze chronic conditions in Arizona.