AI Predicts Heart Attacks Using Personal Health Data

The Heart Attack You Didn’t See Coming

A 45-year-old woman goes for a routine checkup. Her cholesterol is fine. Her blood pressure is normal. She doesn’t smoke. She has no family history of heart disease. Six months later, she has a heart attack.

This story is not rare. It happens thousands of times a year. And it points to a gap in how we predict heart attacks. Traditional risk factors like high cholesterol and smoking are powerful indicators, but they miss a lot of people. Something else is going on.

Simon Bin Akter and his colleagues at the University of Texas at El Paso wanted to find that something else. They didn’t invent a new medical test or a new drug. They built a machine learning model that looks at personal health data in a smarter way, and it found patterns that doctors might never see (Akter et al., 2024).

Why Your Doctor’s Checklist Isn’t Enough

The standard approach to heart attack prediction is a checklist. Do you smoke? Is your LDL high? Do you have diabetes? Each yes adds a point. If you reach a threshold, your doctor might put you on a statin or recommend lifestyle changes.
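To make the checklist idea concrete, here is a deliberately simplified points-based score. The three factor names and the threshold of 2 are hypothetical choices for illustration. Real clinical calculators weigh measured values such as LDL and blood pressure rather than just counting yes answers.

```python
# A toy version of the "checklist" approach: one point per yes answer.
# The factor names and the threshold are hypothetical, chosen for illustration.
RISK_FACTORS = ("smoker", "high_ldl", "diabetes")


def checklist_score(patient: dict) -> int:
    """Count how many of the listed risk factors the patient has."""
    return sum(1 for factor in RISK_FACTORS if patient.get(factor, False))


def flag_for_treatment(patient: dict, threshold: int = 2) -> bool:
    """Flag the patient once the score reaches the threshold."""
    return checklist_score(patient) >= threshold


# The woman from the opening: no traditional risk factors, so no flag.
opening_patient = {"smoker": False, "high_ldl": False, "diabetes": False}
print(flag_for_treatment(opening_patient))  # False
print(flag_for_treatment({"smoker": True, "diabetes": True}))  # True
```

Anything the checklist does not ask about, such as bronchitis or shortness of breath, cannot change the result, no matter how strong a signal it might be.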

This works for a lot of people. But it also creates blind spots. The authors note that myocardial infarction, or MI, often strikes “individuals without traditional risk factors” (Akter et al., 2024). That means the checklist is missing information. The question is: what information?

The answer might be buried in the kind of data you already give to your doctor or fill out on health surveys. Things like: Have you ever had bronchitis? Do you feel short of breath when walking up stairs? How often do you see a doctor for something that isn’t heart related?

These are not traditional risk factors. But they might be signals.

The Data Problem Nobody Talks About

Here is the catch. Health survey data is messy. When the CDC collects data through the Behavioral Risk Factor Surveillance System (BRFSS), it gets responses from hundreds of thousands of people. But only a tiny fraction of those people have had a heart attack, maybe 2% or 3% of the sample. The other 97% or so have not, though that does not make all of them healthy.

This is called class imbalance. And it is a nightmare for machine learning models.

Most AI models are built to maximize accuracy. If 97% of your data is healthy people, the easiest way to get 97% accuracy is to predict that nobody has a heart attack. The model looks great on paper. It is completely useless in practice. It will miss every single heart attack.
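A few lines of code show the trap. The sample below is made up: 3 heart attacks among 100 people, and a "model" that simply answers no for everyone.

```python
# Illustrates the accuracy trap on a made-up sample: 3 heart attacks in 100 people.
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred):
    """Fraction of true heart attack cases (label 1) that the model flags."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return true_pos / sum(y_true)


y_true = [1] * 3 + [0] * 97        # 1 = had a heart attack
always_no = [0] * len(y_true)      # a "model" that predicts nobody has one

print(accuracy(y_true, always_no))     # 0.97
print(sensitivity(y_true, always_no))  # 0.0
```

Accuracy rewards the model for the 97 easy answers; sensitivity exposes that it caught none of the 3 cases that matter.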

Akter and his team recognized this problem. They wrote that “health surveys and EHRs frequently suffer from class imbalances, leading to prediction biases and differences between specificity and sensitivity” (Akter et al., 2024). In plain English: the model gets good at saying someone is healthy. It gets bad at saying someone is at risk.

The Three Fixes That Changed Everything

The authors did not just throw data at a neural network and hope for the best. They built three specific techniques to handle the imbalance problem.

Dual Path Neural Network

Most neural networks process all input data through a single pathway. The model learns patterns from the whole dataset at once. But when the dataset is heavily imbalanced, the model’s learning gets skewed. It pays more attention to the majority class (healthy people) because there are more examples of them.

Akter and his colleagues built a Dual Path Artificial Neural Network (DP ANN). The model processes the data through two separate pathways. One pathway specializes in learning patterns from the majority class. The other specializes in learning patterns from the minority class (people who had heart attacks). Then the model combines what it learned from both pathways to make a final prediction.
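The paper’s exact architecture is not reproduced here, so what follows is only a toy sketch of the idea as described above, not the authors’ DP ANN. Each "pathway" is shrunk to a single logistic unit. One is trained on the natural class mix, so healthy examples dominate; the other counts heart attack cases more heavily until both classes contribute equally to the loss. The two predicted probabilities are then averaged. The weighting scheme, the averaging step, and the synthetic data are all assumptions; a real network would have hidden layers and would learn how to combine the paths.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_branch(X, y, pos_weight, lr=0.5, epochs=500):
    """Fit one pathway: a single logistic unit trained by full-batch gradient
    descent on a log loss where cases (label 1) count pos_weight times."""
    w = [0.0] * len(X[0])
    b = 0.0
    sample_w = [pos_weight if label == 1 else 1.0 for label in y]
    total = sum(sample_w)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi, si in zip(X, y, sample_w):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = si * (p - yi)  # gradient of the weighted log loss w.r.t. the logit
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / total for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / total
    return w, b


def branch_prob(params, x):
    w, b = params
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_dual_path(X, y):
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    # Majority pathway: natural class mix, so healthy examples dominate the loss.
    majority_path = train_branch(X, y, pos_weight=1.0)
    # Minority pathway: cases upweighted so both classes contribute equally.
    minority_path = train_branch(X, y, pos_weight=n_neg / n_pos)
    return majority_path, minority_path


def predict(model, x):
    """Average the two pathways' probabilities (a simple stand-in for the
    learned combination a real network would use)."""
    majority_path, minority_path = model
    p_maj = branch_prob(majority_path, x)
    p_min = branch_prob(minority_path, x)
    return (p_maj + p_min) / 2, p_maj, p_min


# Synthetic, imbalanced data: one feature, 95 non-cases and 5 cases.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0)] for _ in range(95)] + [[rng.gauss(2.0, 1.0)] for _ in range(5)]
y = [0] * 95 + [1] * 5
model = train_dual_path(X, y)
combined, p_maj, p_min = predict(model, [2.0])
print(f"majority path {p_maj:.2f}, minority path {p_min:.2f}, combined {combined:.2f}")
```

For a point that looks like a typical case, the majority pathway stays cautious while the minority pathway speaks up, and the combined output sits between the two.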

This is not a small tweak. It is a fundamentally different architecture. And it produced dramatically better results.

Triple Criteria Selection

Feature selection is the process of deciding which variables to include in the model. Most methods use a single criterion. For example, they might ask: does this variable correlate with the outcome? If yes, keep it.

The problem is that a single criterion can be biased. A variable might correlate with the outcome only because it correlates with something else that is truly predictive. The model picks up noise instead of signal.

The authors used Triple Criteria Selection (TCS), which evaluates each variable against three different statistical tests. Only variables that pass all three tests are included. This is a stricter filter. But it produces a cleaner set of predictors.
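The article does not name the three tests, so the sketch below picks three plausible ones for yes/no survey answers: a chi-square test of association (statistic of at least 3.84, roughly p < 0.05 with one degree of freedom), a minimum mutual information, and a minimum share of heart attack cases who answered yes. These criteria, their thresholds, and the toy data are assumptions, not the paper’s TCS.

```python
import math


def contingency(feature, label):
    """2x2 counts: counts[f][y] for a yes/no feature f and outcome y."""
    counts = [[0, 0], [0, 0]]
    for f, y in zip(feature, label):
        counts[f][y] += 1
    return counts


def margins(counts):
    n = sum(map(sum, counts))
    rows = [sum(r) for r in counts]
    cols = [counts[0][j] + counts[1][j] for j in range(2)]
    return n, rows, cols


def chi_square(counts):
    """Pearson chi-square statistic for association (1 degree of freedom)."""
    n, rows, cols = margins(counts)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            if expected > 0:
                stat += (counts[i][j] - expected) ** 2 / expected
    return stat


def mutual_information(counts):
    """Mutual information between feature and outcome, in nats."""
    n, rows, cols = margins(counts)
    mi = 0.0
    for i in range(2):
        for j in range(2):
            if counts[i][j]:
                p_ij = counts[i][j] / n
                mi += p_ij * math.log(p_ij * n * n / (rows[i] * cols[j]))
    return mi


def minority_support(counts):
    """Share of heart attack cases (y = 1) who answered yes to the feature."""
    cases = counts[0][1] + counts[1][1]
    return counts[1][1] / cases if cases else 0.0


def triple_criteria_select(features, label, chi_min=3.84, mi_min=0.01, support_min=0.2):
    """Keep only features that pass all three (hypothetical) criteria."""
    selected = []
    for name, column in features.items():
        c = contingency(column, label)
        if (chi_square(c) >= chi_min
                and mutual_information(c) >= mi_min
                and minority_support(c) >= support_min):
            selected.append(name)
    return selected


label = [1] * 10 + [0] * 90
features = {
    "signal": [1] * 8 + [0] * 2 + [1] * 10 + [0] * 80,  # common in cases, rare otherwise
    "noise": [1] * 1 + [0] * 9 + [1] * 9 + [0] * 81,    # same 10% rate in both groups
    "rare": [1] * 1 + [0] * 9 + [0] * 90,               # reported by one case only
}
print(triple_criteria_select(features, label))  # ['signal']
```

"rare" clears both statistical tests on the strength of a single case but fails the support check. That kind of fluke is what a stricter, multi-criteria filter is meant to catch.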

Minority Weighted Sampling

The third fix addresses the sampling problem directly. When you have a dataset with 97% healthy people and 3% heart attack survivors, you have to decide how to sample from those groups.

The standard approach is random undersampling, where you randomly drop examples from the majority class until the classes are balanced. This works, but it throws away a lot of data. You might lose subtle patterns that only appear in large samples.
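Random undersampling is simple enough to write in a few lines. This is a generic sketch, assuming label 1 marks the minority class; the data is made up.

```python
import random


def random_undersample(X, y, seed=0):
    """Randomly drop majority-class rows (label 0) until both classes are the same size.

    Assumes label 1 is the minority class.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    kept = rng.sample(majority, k=len(minority))   # every other majority row is discarded
    keep = sorted(minority + kept)
    return [X[i] for i in keep], [y[i] for i in keep]


X = [[i] for i in range(100)]
y = [1] * 3 + [0] * 97
X_bal, y_bal = random_undersample(X, y)
print(len(y_bal), sum(y_bal))  # 6 3
```

Here 94 of the 97 majority rows are thrown away, and which ones survive is pure chance.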

Minority Weighted Sampling (MWS) is more sophisticated. Instead of randomly dropping majority class examples, it weights them. Each example from the majority class gets a weight based on how similar it is to the minority class. Examples that look more like heart attack survivors get higher weight. Examples that look nothing like them get lower weight.

This preserves more information while still reducing the imbalance.
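The exact MWS weighting formula is not given here. The sketch below assumes a simple version: each majority example is weighted by 1 / (1 + distance to its nearest minority example), and those weights drive sampling without replacement using Efraimidis-Spirakis keys. Both the similarity measure and the sampling scheme are stand-ins chosen for illustration.

```python
import math
import random


def similarity_weights(majority_X, minority_X):
    """Weight each majority example by how close it is to the nearest minority example."""
    weights = []
    for x in majority_X:
        nearest = min(math.dist(x, m) for m in minority_X)
        weights.append(1.0 / (1.0 + nearest))  # closer to a case -> larger weight
    return weights


def weighted_undersample(majority_X, minority_X, k, seed=0):
    """Keep k majority examples, sampled without replacement in proportion to weight.

    Efraimidis-Spirakis keys: draw u ~ U(0, 1) per item, key = u ** (1 / w),
    and keep the k items with the largest keys.
    """
    rng = random.Random(seed)
    weights = similarity_weights(majority_X, minority_X)
    keys = [rng.random() ** (1.0 / w) for w in weights]
    ranked = sorted(range(len(majority_X)), key=lambda i: keys[i], reverse=True)
    return [majority_X[i] for i in ranked[:k]], weights


minority = [(1.0, 1.0)]
majority = [(0.9, 1.0), (5.0, 5.0), (0.0, 0.0)]
kept, weights = weighted_undersample(majority, minority, k=2)
print([round(w, 3) for w in weights])  # [0.909, 0.15, 0.414]
```

The same weights could instead be used as per-example loss weights, which keeps every majority row; which variant the authors use is not stated here.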

What the Model Actually Found

The DP ANN model achieved an average specificity of 80% and sensitivity of 82%, with an AUC ROC of 89.5% (Akter et al., 2024). In practice, that means the model correctly flags about 8 out of 10 people who have had a heart attack and correctly clears about 8 out of 10 people who have not.
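The definitions behind those numbers are short enough to write out. The sketch below computes sensitivity and specificity from predicted labels, and the ROC AUC as the probability that a randomly chosen case scores higher than a randomly chosen non-case. The labels and scores are made up to reproduce the 82% and 80% figures; they are not the study’s data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = heart attack."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predictions matching the reported rates:
# 100 heart attack cases (82 flagged) and 1,000 non-cases (800 cleared).
y_true = [1] * 100 + [0] * 1000
y_pred = [1] * 82 + [0] * 18 + [0] * 800 + [1] * 200
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)  # 0.82 0.8
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

Unlike sensitivity and specificity, the AUC does not depend on a single cutoff; it measures how well the model’s scores rank cases above non-cases overall.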

The authors also compared their model to previous approaches and found it improved “imbalance variance by approximately 14.96%” (Akter et al., 2024). Roughly, that means the model is more even-handed: it leans less heavily toward the majority class than earlier approaches did.

But the real story is not the model’s accuracy. It is what the model learned.

The Surprising Predictors of Heart Attacks

The authors used SHAP analysis to figure out which variables mattered most. SHAP is a technique that shows how much each input contributes to the model’s output. It is like asking the model: why did you predict this person will have a heart attack?
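Under the hood, SHAP is built on Shapley values from game theory: a feature’s contribution is its average effect on the prediction when it is added to every possible subset of the other features. With only a handful of features this can be computed exactly, as in the sketch below. The toy risk function, its feature names, and the all-zero baseline are hypothetical. The shap library itself uses much faster algorithms, some exact for specific model types and some approximate, and usually compares against a background dataset rather than a single baseline.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values for model f at input x.

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical risk score over three yes/no answers:
# z[0] = prior stroke, z[1] = bronchitis, z[2] = difficulty walking.
def toy_risk(z):
    return 0.5 * z[0] + 0.3 * z[1] + 0.4 * z[0] * z[2]


phi = shapley_values(toy_risk, x=[1, 1, 1], baseline=[0, 0, 0])
print([round(v, 3) for v in phi])  # [0.7, 0.3, 0.2]
```

The three contributions add up to 1.2, the gap between this patient’s score and the baseline score, and the interaction term is split evenly between stroke and difficulty walking. That additivity is what lets SHAP output be read as "this answer pushed the risk up by this much."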

The results were striking.

For women, two of the strongest predictors were coronary heart disease and bronchitis. Bronchitis is not a traditional heart attack risk factor. It is a lung condition. But the model flagged it repeatedly.

For people aged 35 to 54, the strongest predictor was a history of stroke. This makes biological sense: stroke and heart attack share underlying mechanisms like atherosclerosis and inflammation. But most risk calculators do not weight stroke as heavily as the model did.

The model also picked up on less obvious signals, such as general health status, difficulty walking, and frequency of doctor visits. These are not specific to heart disease. But they appear to be proxies for a broader decline in health that can come before or alongside a heart attack.

What This Changes

Akter and his team built a model that outperformed existing approaches across four heavily imbalanced datasets (Akter et al., 2024). That is a technical achievement. But the practical implications are bigger.

Right now, heart attack prediction relies on a small set of well known risk factors. This model suggests that we are leaving information on the table. The data people already provide in health surveys and electronic health records contains patterns that could catch heart attacks earlier.

The authors argue that their approach “provides a robust model for healthcare professionals to assess MI risk through targeted factors, promoting early detection and potentially improving patient outcomes” (Akter et al., 2024).

In other words: your doctor already has the data. They just need the right tool to read it.

What This Does Not Prove

This is a single study on retrospective data. The model has not been tested in a real clinical setting. It has not been deployed in a doctor’s office. It has not been shown to actually reduce heart attack rates.

The data comes from U.S. health surveys, which means the findings might not generalize to other countries or populations. The model also relies on self-reported data, which is notoriously noisy. People forget. People lie. People misunderstand questions.

And the model is a black box. Even with SHAP analysis, it is hard to know exactly why the model makes the predictions it does. A doctor who sees a patient flagged as high risk cannot easily explain to that patient what is going wrong. That matters for trust and treatment.

These are open questions. They are not reasons to dismiss the work. They are reasons to keep pushing.

What This Actually Means

▸Your health survey data is more valuable than you think. The answers you give about bronchitis, difficulty walking, and general health are not random noise. They might be early signals of cardiovascular risk.

▸Machine learning can find patterns humans miss. The human brain is good at linear thinking. High cholesterol equals risk. Smoking equals risk. But the body is a complex system. Nonlinear interactions matter. Models can see those interactions.

▸Class imbalance is a solvable problem. Most health datasets are imbalanced. Most models handle it poorly. The techniques in this paper (DP ANN, TCS, MWS) offer a template for building fairer, more accurate models.

▸The best predictor might not be what you expect. For women, bronchitis mattered. For middle-aged adults, stroke mattered. The model found these patterns because it was not constrained by existing medical assumptions.

▸We need clinical trials next. A good model on retrospective data is promising. A model that actually prevents heart attacks in real patients is the goal. That requires testing, validation, and deployment. The authors have shown the path. Now someone needs to walk it.

References

[1] Simon Bin Akter, Sumya Akter, Moon Das Tuli, David Eisenberg (2024). Fair and explainable Myocardial Infarction (MI) prediction: Novel strategies for feature selection and class imbalance correction.