In recent years, there has been a concerted effort to improve our understanding of the quality and effectiveness of transfused blood components. The expanding use of large datasets built from electronic health records allows the investigation of potential benefits or adverse outcomes associated with transfusion therapy. Together with data collected on blood donors and components, these datasets permit an evaluation of associations between donor or blood component factors and transfusion recipient outcomes. Large linked donor-component-recipient datasets provide the power to study exposures relevant to transfusion efficacy and safety, many of which would not otherwise be amenable to study for practical or sample size reasons. Analyses of these large blood banking and transfusion medicine datasets allow for characterization of the populations under study and provide an evidence base for future clinical studies. Knowledge generated from linked analyses has the potential to change the way donors are selected and how components are processed, stored, and allocated. However, unrecognized confounding and biased statistical methods continue to be limitations in the study of transfusion exposures and patient outcomes. Results of observational studies of blood donor demographics, storage age, and transfusion practice have been conflicting. This review summarizes statistical and methodological challenges in the analysis of linked blood donor, component, and transfusion recipient outcomes.