Large healthcare databases maintained by health plans have been widely used to conduct customized protocol-based epidemiological safety studies as well as targeted routine sequential monitoring of suspected adverse events for newly licensed vaccines. These databases also offer a rich data source for discovering, through data mining methods, vaccine-related adverse events not known prior to licensure, but they remain relatively under-utilized for this purpose. Initial safety applications of data mining methods using big healthcare data are promising, but stronger integration of database expertise, epidemiological design, and statistical analysis strategies is needed to better leverage the available information, reduce bias, and improve reporting transparency. We enumerate major methodological challenges in mining large healthcare databases for vaccine safety research, describe existing strategies that have been used to address these issues, and identify opportunities for methodological advancements that emphasize the importance of adapting techniques used in customized protocol-based vaccine safety assessments. Investment in such research methods and in the development of deeper collaborations between database safety experts and data mining methodologists has great potential to improve existing safety surveillance programs and further increase public confidence in the safety of newly licensed vaccines.
Integrating database knowledge and epidemiological design to improve the implementation of data mining methods that evaluate vaccine safety in large healthcare databases
Authors: Nelson JC; Shortreed SM; Yu O; Peterson D; Baxter R; Fireman B; Lewis N; McClure D; Weintraub E; Xu S; Jackson LA
Stat Anal Data Min. 2014 Oct;7(5):337-51.