In this large cross-sectional study, we have described a comprehensive strategy for identifying CKD from primary care data in the UK. It is the first such study to combine estimations of both GFR and proteinuria with clinical coding using an ontological method. This case-finding approach found the prevalence of CKD stages 1–5 to be 8.7% in a population of over 1.2 million patients, which is lower than the observed prevalence of 13–14% in the Health Survey for England [13]. Using eGFR in isolation, the prevalence of CKD stages 3–5 was 6.4%, which is in keeping with previous reports [9,10,11,12,13,14,15,16].
Proteinuria estimations identified CKD in 0.6% of the cohort who did not meet the diagnostic criteria based on eGFR in isolation, and clinical coding identified a further 1.6%. The majority of these individuals were classified as having CKD stage 1 or 2. This is of importance given that mild-to-moderate CKD is predominantly managed in the community, and given the crucial role of general practice in identifying these patients and treating them appropriately [3]. It is particularly valuable to identify those with proteinuria, because these individuals are at higher mortality risk independent of eGFR [8].
This study has also confirmed the limitations of using clinical coding in isolation for case finding. CKD Read codes were present in 62% of those with an eGFR < 60 mL/min/1.73 m², affirming the possibility that a significant proportion of CKD in the community may be unrecognised. Our ontological approach did not improve upon the 70% of people identified using only QOF-derived codes in the recently published National CKD Audit [16]. We also found that 26% of individuals with a CKD Read code did not meet the criteria for CKD on the basis of laboratory testing alone. Even allowing for limitations of our method, and for the fact that a normal eGFR does not preclude a diagnosis of CKD, it could be that a significant proportion of individuals are incorrectly coded. To ascertain this would require analysis on a case-by-case basis.
An advantage of our method was the identification of people with ESRD receiving RRT. Using a subset of CKD Read codes, the ontology enabled 1348 individuals to be identified who had received either a renal transplant or dialysis. Although this group represents only 1.6% of those with laboratory-confirmed CKD, these patients are at particularly high risk of complications and, to our knowledge, this is the first description of a method to identify them from routinely collected primary care data. However, it should be noted that the prevalence of ESRD in this cohort was 1110 per million population, 18% higher than the prevalence reported in the latest UK Renal Registry Report (941 per million population) [25]. Whilst it is possible that this could represent a higher rate of ESRD in this cohort, it may also be indicative of some of the limitations with coding discussed below.
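The comparison with the registry figure can be checked with a short calculation. The sketch below is illustrative only: the function name and variable names are hypothetical, and the implied cohort size is back-calculated from the 1348 individuals and the 1110 per million figure quoted above rather than taken from the study data.

```python
def relative_excess(observed: float, reference: float) -> float:
    """Fractional amount by which `observed` exceeds `reference`."""
    return (observed - reference) / reference


# Figures quoted in the text (per million population, pmp).
cohort_pmp = 1110      # ESRD prevalence in this cohort
registry_pmp = 941     # UK Renal Registry prevalence [25]
rrt_cases = 1348       # individuals with a transplant or dialysis code

# Relative difference between the cohort and registry prevalence.
excess = relative_excess(cohort_pmp, registry_pmp)

# Denominator implied by the case count and the per-million rate
# (an assumption-free back-calculation, not a reported value).
implied_population = rrt_cases / cohort_pmp * 1_000_000

print(f"Relative excess: {excess:.1%}")
print(f"Implied cohort size: {implied_population:,.0f}")
```

The implied denominator is consistent with the cohort of over 1.2 million patients described above.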
Limitations
We have described a comprehensive method for identifying CKD from primary care coding that has been applied to a large cohort. However, there are inevitable limitations that come with using routinely collected data, including missing and incorrectly coded information. It is only possible to identify CKD in patients who have visited their GP or had a blood test taken. Furthermore, the busy and high-turnover nature of General Practice results in the unavoidable reality that aspects of the medical history may not be coded, including historical diagnoses that may have preceded computerised medical records. Even with fastidious coding, the Read code hierarchy lacks sufficient granularity in some areas to accurately determine all cases of CKD. Whilst our ontology was created to be as comprehensive as possible, it is limited by the fact that some concepts are not present in the Read code terminology, whilst others are non-specific or do not sufficiently differentiate between acute and chronic disease. Also of note, the NHS in England is to change from using Read codes to using SNOMED CT by April 2018, and Read codes had stopped being updated at the time of this study [26, 27].
Finally, we have used two logical models to derive eGFR and proteinuria based on multiple readings taken over at least 90 days. Whilst these methods improve upon the use of single readings, they do not completely overcome the confounding issue of AKI, the limitations of the tests, and fluctuations in disease states. Additionally, missing ethnicity coding will lead to underestimation of eGFR in those of black ethnicity, and inter-laboratory variation in the creatinine assay may influence prevalence rates. It should also be noted that the blood results in the database are from samples collected in GP practices, and results from tests taken elsewhere will not be accounted for.
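To illustrate why multiple readings over at least 90 days are preferable to a single reading, a deliberately simplified persistence rule is sketched below. This is not the logical model used in this study: the function name, the rule itself (latest reading below threshold plus an earlier sub-threshold reading at least 90 days before it) and the example values are all assumptions made for illustration.

```python
from datetime import date


def meets_chronicity(readings, threshold=60.0, min_days=90):
    """Simplified, hypothetical persistence check for reduced eGFR.

    `readings` is a list of (date, egfr) tuples. Returns True when the
    latest reading is below `threshold` and at least one earlier reading,
    taken `min_days` or more before it, is also below `threshold`.
    """
    ordered = sorted(readings)
    if not ordered:
        return False
    latest_date, latest_value = ordered[-1]
    if latest_value >= threshold:
        return False
    return any(
        value < threshold and (latest_date - when).days >= min_days
        for when, value in ordered[:-1]
    )


# Two low readings about four months apart: persistent reduction.
persistent = [(date(2016, 1, 10), 52.0), (date(2016, 5, 12), 49.0)]

# Two low readings three weeks apart: could reflect an episode of AKI.
transient = [(date(2016, 1, 10), 41.0), (date(2016, 2, 1), 55.0)]

print(meets_chronicity(persistent))
print(meets_chronicity(transient))
```

A rule of this kind excludes short-lived dips in eGFR, but, as noted above, it cannot fully separate AKI or fluctuating disease from established CKD.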