Multi-source heterogeneous data fusion: a methodological survey from early, late, and hybrid strategies to attention-based architectures

Michael R. Anderson; James W. Mitchel

doi:10.66372/

Authors

Michael R. Anderson, Department of Computer Science, Stanford University, Stanford, CA, USA
James W. Mitchel, Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, USA

DOI:

https://doi.org/10.66372/

Keywords:

multi-source data fusion; multimodal learning; attention mechanism; sensor fusion; graph neural networks; federated learning; differential privacy; methodological survey.

Abstract

Modern decision systems rarely operate on a single, clean stream of data. Sensor arrays in autonomous vehicles, multi-omics records in cardiology, transaction logs combined with social signals in retail, and asynchronous text-tabular-graph evidence in finance all share the same underlying problem: how to fuse heterogeneous sources into a coherent representation that improves downstream estimation. This paper presents a methodological survey of the dominant fusion paradigms—early (feature-level), late (decision-level), hybrid, and attention-based fusion—then connects them to graph fusion and to the privacy- and fairness-aware variants now common in production systems. We construct a unified notation, classify 163 representative recent studies, and complement the survey with a four-scenario controlled experiment covering sensor fusion under degraded perception, multimodal medical risk prediction, multi-source financial early warning, and dynamic graph fusion on transaction networks. Across these scenarios, attention-based hybrid fusion outperforms early-only and late-only baselines on accuracy and calibration metrics by margins of 3.4–8.7 percentage points in F1 and 6–14 percentage points in AUPRC, while late fusion remains attractive when source reliability varies sharply across time. We discuss residual gaps and outline the design space for future fusion architectures combining retrieval, graph attention, and privacy-preserving training.

Author Biography

James W. Mitchel, Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, USA
