Rhea Zhou
Cary Academy
Healthcare machine learning (ML) development is constrained by privacy regulations that prevent cross-hospital data sharing. Federated learning (FL) enables collaborative training without exchanging raw data, but hospital data are not independent and identically distributed (non-IID) across sites, which destabilizes optimization and creates performance disparities between large and small institutions.
We introduce CHiP-FL (Client Clustering, Hierarchical Aggregation, and Personalization Federated Learning), a framework designed to balance predictive performance with cross-site equity. Each hospital is represented by a data signature vector capturing dataset size, label prevalence, missingness patterns, and feature statistics, and these vectors are clustered via k-means. Training optimizes a multi-level regularized objective that minimizes local loss while constraining divergence between hospital, cluster, and global models.
Using the eICU Collaborative Research Database, we compare CHiP-FL against federated baselines. While FedProx achieves the highest global AUROC (≈0.933), it exhibits substantial inter-site inequality, including the largest variance (≈0.062) and strong size bias (≈-0.032). In contrast, CHiP-FL maintains a competitive global AUROC (≈0.903) while reducing the magnitude of size bias by over 90% (to ≈-0.002), with low Gini dispersion (≈0.028) and strong worst-decile AUROC (≈0.84).
These results highlight a fairness-efficiency tradeoff in federated clinical AI and position equity as a central objective in healthcare ML modeling.
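The client-clustering step described above (a signature vector per hospital, grouped by k-means) can be illustrated with a minimal, standard-library sketch. Everything specific here is an assumption made for illustration rather than the framework's actual implementation: the function names, the use of log dataset size, per-feature means as the "feature statistics", z-score standardization before clustering, and the deterministic farthest-point initialization.

```python
import math
from statistics import fmean, pstdev


def signature(rows, labels):
    """Build one hospital's signature vector (hypothetical layout).

    rows: list of feature lists, with None marking a missing value.
    labels: list of 0/1 outcomes, one per row.
    """
    n = len(rows)
    n_features = len(rows[0])
    # Observed (non-missing) values per feature column.
    observed = [[r[j] for r in rows if r[j] is not None] for j in range(n_features)]
    missing_rate = [1 - len(col) / n for col in observed]
    means = [fmean(col) if col else 0.0 for col in observed]
    # Assumed layout: [log size, label prevalence, missingness..., feature means...]
    return [math.log(n), fmean(labels)] + missing_rate + means


def standardize(vectors):
    """Z-score each dimension so no single statistic dominates the distance."""
    cols = list(zip(*vectors))
    mus = [fmean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant columns keep scale 1
    return [[(x - m) / s for x, m, s in zip(v, mus, sds)] for v in vectors]


def kmeans(points, k, iters=50):
    """Lloyd's algorithm with farthest-point initialization (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    assign = None
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [fmean(col) for col in zip(*members)]
    return assign, centers
```

In use, one would compute `signature(...)` locally at each hospital, share only these summary vectors, and call `kmeans(standardize(signatures), k)` centrally. Whether sharing such summaries is acceptable under a given privacy regime is outside what this sketch addresses.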
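The equity metrics reported above (Gini dispersion, size bias, worst-decile AUROC) are computed over per-site AUROC values. The abstract does not give their exact definitions, so the sketch below uses assumed ones: Gini as the normalized mean absolute difference, size bias as the least-squares slope of site AUROC on log10 site size, and worst-decile AUROC as the mean over the lowest-scoring 10% of sites. All function names are hypothetical.

```python
import math
from statistics import fmean


def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(values):
    """Gini coefficient: mean absolute pairwise difference over twice the mean."""
    n = len(values)
    mu = fmean(values)
    total_diff = sum(abs(a - b) for a in values for b in values)
    return total_diff / (2 * n * n * mu)


def size_bias(sizes, aurocs):
    """Assumed definition: OLS slope of per-site AUROC on log10(site size)."""
    x = [math.log10(s) for s in sizes]
    mx, my = fmean(x), fmean(aurocs)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, aurocs))
    return sxy / sxx


def worst_decile_mean(values):
    """Assumed definition: mean AUROC over the lowest-scoring 10% of sites."""
    k = max(1, math.ceil(len(values) / 10))
    return fmean(sorted(values)[:k])
```

Under these assumed definitions, a size bias near zero means per-site AUROC does not change systematically with site size, while a low Gini and a high worst-decile value indicate that performance is spread evenly rather than concentrated in a few sites.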
