A Sample Size–Driven Approach to Heart Disease Risk Prediction

Vijayalakshmi Sarraju, Jayapal, Supreeti Kamilya

Abstract

Introduction:
Predicting survival outcomes in cardiovascular disease is a challenging problem in clinical data analytics with practical implications.


Objectives:
This paper analyses the association between sample size and model performance across three diverse datasets, providing insights relevant to generating reliable predictions.


Methods:
We use filter-based mutual information gain to identify significant features. Mutual information gain quantifies the dependency between each predictor variable and the target outcome, which identifies the attributes that carry the most predictive value. Unlike wrapper techniques, it is computationally efficient and scales to very large datasets, which makes it appropriate for clinical prediction applications. This novel combination of mutual information gain with sample-based statistical validation supports robust and interpretable model performance across varied population sizes. Machine learning models, namely the support vector machine (SVM) and logistic regression (LR), are trained on different sample sizes to assess how model performance changes.
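
As an illustration of the filter step described above, the following is a minimal sketch of mutual information ranking for discrete features, using only the Python standard library. It is not the authors' implementation: the function names, the feature names, and the toy data are hypothetical, and the count-based estimator shown here assumes categorical or pre-binned predictors (continuous clinical variables would need discretisation or a different estimator).

```python
import math
from collections import Counter


def mutual_information(feature, target):
    """Mutual information (in nats) between two discrete sequences.

    Uses the plug-in estimate: sum over (x, y) of
    p(x, y) * log(p(x, y) / (p(x) * p(y))).
    """
    n = len(feature)
    joint = Counter(zip(feature, target))
    px = Counter(feature)
    py = Counter(target)
    mi = 0.0
    for (x, y), count in joint.items():
        # count/n is p(x, y); count*n/(px*py) is p(x, y) / (p(x) p(y))
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


def rank_features(columns, target):
    """Return (name, score) pairs sorted from most to least informative."""
    scores = {name: mutual_information(values, target)
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 8 patients, binary outcome, three binary predictors.
target = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "chest_pain":  [1, 1, 1, 1, 0, 0, 0, 0],  # identical to the outcome
    "smoker":      [1, 1, 1, 0, 1, 0, 0, 0],  # partially informative
    "left_handed": [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the outcome
}

ranking = rank_features(columns, target)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because each feature is scored once against the target, the cost grows linearly with the number of features and samples, which is the scalability advantage over wrapper methods that retrain a model for every candidate subset.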


Results:
Across all datasets, larger samples consistently increased accuracy by up to 10%, improved sensitivity by 5–8%, and enhanced specificity, demonstrating the positive impact of statistically representative sample sizes on model generalisation.
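
For reference, the three reported metrics can be computed from a binary confusion matrix as in the sketch below. The function name and the label vectors are hypothetical and purely illustrative; they are not results from the study. The sketch assumes label 1 marks the positive class and that both classes occur in the true labels.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of true positives detected
        "specificity": tn / (tn + fp),  # share of true negatives detected
    }


# Hypothetical labels for 10 patients (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```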
