Because muscle symptoms are non-specific, studies of statin-induced myopathy (SIM) in electronic health records require accurate algorithms that can reliably identify true statin-related cases. However, prior algorithms were constructed in study populations that preclude broad applicability. Here we developed and validated an algorithm that accurately defines SIM from electronic health records using structured data elements, and we then applied the algorithm to study determinants of SIM. We used electronic records from an integrated health care delivery system (including comprehensive pharmacy dispensing records) and defined SIM as creatine kinase (CK) elevated to ≥4 × the upper limit of normal. A diverse cohort of participants receiving a variety of statin regimens met the criteria for study inclusion. We identified multiple conditions strongly associated with elevated CK independent of statin use. A 2-step algorithm was developed using these conditions as secondary causes of all-cause CK elevation (step 1) along with evidence of a statin regimen change (step 2). We identified 1,262 algorithm-derived statin-induced elevated CK cases. Gold standard SIM cases, determined by manual chart review of a random subset of the all-cause elevated CK cases, were used to validate the algorithm, which had 76% sensitivity and 77% specificity for detecting the most certain cases. Pravastatin use was associated with an odds ratio of 2.18 (95% confidence interval 1.39-3.40, P=0.0007) for statin-induced CK elevation compared with lovastatin use after adjusting for dose and other factors. We have produced an efficient, easy-to-apply methodological tool that can improve the quality of future research on statin-induced myopathy.
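The 2-step logic and its validation against chart review can be illustrated with a minimal Python sketch. This is not the study's implementation: the record layout (`CkEvent` and its fields), the reading of step 1 as an exclusion of events with any documented secondary cause, and the helper names are all assumptions made for illustration. Only the CK threshold of 4 × the upper limit of normal and the two steps themselves come from the text above.

```python
from dataclasses import dataclass, field

# From the abstract: elevated CK means CK >= 4 x the upper limit of normal (ULN).
CK_THRESHOLD_MULTIPLE = 4.0


@dataclass
class CkEvent:
    """Hypothetical structured record for one CK measurement."""
    ck_value: float                 # measured CK
    ck_upper_limit: float           # laboratory ULN for that assay
    secondary_causes: set = field(default_factory=set)  # non-statin conditions
    statin_regimen_changed: bool = False  # e.g. stop, switch, or dose change


def is_elevated_ck(event: CkEvent) -> bool:
    """All-cause elevated CK according to the abstract's threshold."""
    return event.ck_value >= CK_THRESHOLD_MULTIPLE * event.ck_upper_limit


def is_algorithm_sim_case(event: CkEvent) -> bool:
    """Two-step classification (assumed form, not the published rule set)."""
    if not is_elevated_ck(event):
        return False
    # Step 1 (assumed to be an exclusion): any documented secondary cause
    # means the elevation is not attributed to the statin.
    if event.secondary_causes:
        return False
    # Step 2: require evidence of a statin regimen change.
    return event.statin_regimen_changed


def sensitivity_specificity(predicted, gold):
    """Compare algorithm labels with chart-review (gold standard) labels."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)
    fn = sum(1 for p, g in pairs if not p and g)
    tn = sum(1 for p, g in pairs if not p and not g)
    fp = sum(1 for p, g in pairs if p and not g)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("gold standard must contain both cases and non-cases")
    return tp / (tp + fn), tn / (tn + fp)
```

In such a sketch, the algorithm's labels for the chart-reviewed subset would be passed to `sensitivity_specificity` together with the reviewers' labels to obtain figures comparable to the reported 76% and 77%.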