The purpose of this study was to develop an algorithm for identifying patients with chronic hepatitis B virus (HBV) infection using automated data sources from two US health systems, and to evaluate the algorithm's performance by quantifying the incidence of hepatocellular carcinoma (HCC) among patients with chronic HBV infection. To allow comparisons with estimates from automated databases that may not contain all of the data elements used in this algorithm, we created three definitions of chronic HBV infection and used them to construct three overlapping cohorts. We compared the incidence of HCC in each cohort with that in a matched general-population comparison cohort with no evidence of HBV. Patients who met the most stringent criteria for chronic HBV infection (based on the standard definition of at least 6 months of infection, established using repeat laboratory tests and record review) were 146 times more likely to develop HCC than matched comparison patients (adjusted hazard ratio = 146.5, 95% CI: 74.0-289.8). Patients who did not meet the stringent criteria but had at least one positive hepatitis B surface antigen test were 30 times more likely to develop HCC than comparison patients (adjusted hazard ratio = 29.8, 95% CI: 16.5-53.6). Finally, patients who met the criterion of at least one HBV diagnosis were 38 times more likely to develop HCC than matched comparison patients (adjusted hazard ratio = 37.8, 95% CI: 25.9-55.1). The large relative increases in HCC risk observed under each of the three definitions of HBV infection indicate that these automated data algorithms can identify patients with chronic HBV infection.
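The three case definitions can be illustrated with a minimal sketch of how a single patient's automated records might be checked against each one. This is not the study's algorithm: the `PatientRecord` fields, the `definitions_met` function, the 183-day approximation of "6 months", and the assumption that either a repeat-laboratory span or a record-review confirmation satisfies the stringent definition are all hypothetical. The function returns every definition a patient meets, reflecting that the cohorts overlap.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Set

# Assumption: "6 months" of infection is approximated as >= 183 days between
# the earliest and latest positive HBV laboratory results.
CHRONIC_MIN_DAYS = 183


@dataclass
class PatientRecord:
    """Hypothetical per-patient extract; field names are illustrative only."""
    positive_hbsag_dates: List[date] = field(default_factory=list)
    other_positive_hbv_lab_dates: List[date] = field(default_factory=list)  # e.g. HBV DNA
    hbv_diagnosis_dates: List[date] = field(default_factory=list)
    record_review_confirms_chronic: bool = False  # hypothetical chart-review flag


def definitions_met(p: PatientRecord) -> Set[str]:
    """Return every case definition the patient satisfies (cohorts overlap)."""
    met: Set[str] = set()

    # Definition 1 (stringent): repeat positive HBV labs spanning >= 6 months,
    # or (assumed) a record review documenting chronic infection.
    lab_dates = sorted(p.positive_hbsag_dates + p.other_positive_hbv_lab_dates)
    span_ok = (
        len(lab_dates) >= 2
        and (lab_dates[-1] - lab_dates[0]).days >= CHRONIC_MIN_DAYS
    )
    if span_ok or p.record_review_confirms_chronic:
        met.add("stringent")

    # Definition 2: at least one positive hepatitis B surface antigen test.
    if p.positive_hbsag_dates:
        met.add("hbsag")

    # Definition 3: at least one recorded HBV diagnosis.
    if p.hbv_diagnosis_dates:
        met.add("diagnosis")

    return met


if __name__ == "__main__":
    patient = PatientRecord(positive_hbsag_dates=[date(2000, 1, 1), date(2000, 9, 1)])
    print(sorted(definitions_met(patient)))
```

Running the example prints `['hbsag', 'stringent']`: two positive surface antigen tests 244 days apart satisfy both laboratory-based definitions, while the diagnosis-based definition is not met because no HBV diagnosis is recorded.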