
Do routine hospital data accurately record comorbidity in advanced kidney disease populations? A record linkage cohort study



Routine healthcare datasets capturing clinical and administrative information are increasingly being used to examine health outcomes. The accuracy of such data is not clearly defined. We examine the accuracy of diagnosis recording in individuals with advanced chronic kidney disease using a routine healthcare dataset in England with comparison to information collected by trained research nurses.


We linked records from the Access to Transplant and Transplant Outcome Measures study to the Hospital Episode Statistics dataset. International Classification of Diseases (ICD-10) and Office for Population Censuses and Surveys Classification of Interventions and Procedures (OPCS-4) codes were used to identify medical conditions from hospital data. The sensitivity, specificity, positive and negative predictive values were calculated for a range of diagnoses.


Comorbidity information was available in 96% of individuals prior to starting kidney replacement therapy. There was variation in the accuracy of individual medical conditions identified from the routine healthcare dataset. Sensitivity and positive predictive values ranged from 97.7 and 90.4% for diabetes and 82.6 and 82.9% for ischaemic heart disease to 44.2 and 28.4% for liver disease.


Routine healthcare datasets accurately capture certain conditions in an advanced chronic kidney disease population. They have potential for use within clinical and epidemiological research studies but are unlikely to be sufficient as a single resource for identifying a full spectrum of comorbidities.



Over 50% of individuals receiving kidney replacement therapy (KRT) have a comorbid medical condition in addition to their kidney disease [1]. Comorbidity is associated with increased hospitalisation [2], reduced quality of life [3], and mortality [4, 5]. It is therefore essential to adjust for comorbidity when comparing clinical outcomes, without which confounding due to differences in case-mix may bias results [6, 7]. Further, inaccurate or incomplete data may result in bias, so robust methods of collecting comorbidity information are required.

In clinical research studies, data are often extracted from clinical notes by specially trained staff. Benefits of this approach include collection of high-quality, consistent information with minimal missing data. However, this is resource-intensive and the economic implications of directly gathering information that is already routinely collected elsewhere need to be considered. Disease-specific registries, including the UK Renal Registry (UKRR), record comorbidity information through clinician reporting but with low data completeness: the UKRR only captures comorbidity in half of individuals [1].

One way of improving the completeness of comorbidity data is through linkage to routinely collected healthcare datasets such as Hospital Episode Statistics (HES) [6]. These contain information recorded at the point of care delivery, are cheaper than direct data collection and of minimal burden to study participants and researchers. Long-term follow up of large populations across geographical areas can be efficiently captured with reduced attrition, no recall bias and the ability to adjust for residual confounding relating to the accrual of comorbidity over time [8,9,10]. If data are of sufficient quality, these datasets are an appropriate resource for use within clinical research.

HES records detailed information on National Health Service (NHS) funded hospital care in England to inform reimbursement of health providers [11]. HES data are increasingly used in research to identify participants and record outcomes [12,13,14], and the UKRR established HES linkage to supplement its comorbidity information in 2018 [15].

Although the accuracy of HES in recording individual medical conditions has been compared to various disease registries [16,17,18], its accuracy in people with advanced chronic kidney disease (CKD) is less well documented. Clustering of comorbidities [19] and higher hospitalisation rates [20] may lead to differences in the quality of data compared to the general population, and these merit further exploration.

The aim of this study was to investigate the accuracy of HES comorbidity data in a cohort of individuals with advanced CKD with reference to information collected by trained research nurses. This is to identify whether this resource can be reliably used within epidemiological and clinical research in the KRT population.

Materials and methods

Data sources and study population

We used data from the Access to Transplant and Transplant Outcome Measures (ATTOM) observational cohort study linked to the HES dataset. ATTOM recruited individuals aged 18 to 75 years in the United Kingdom between 2011 and 2013. Patients had started dialysis or received a kidney transplant within the preceding 90 days or were active on the deceased-donor waitlist, and entered ‘incident dialysis’, ‘incident transplant’ or ‘waitlisted’ cohorts respectively. Study methodology has been described previously [21].

Research nurses collected data on patient demographics, socioeconomic indicators, primary renal disease (PRD) and comorbidity (Supplementary table 1) at recruitment. Demographic and clinical data were collected from case notes whilst ethnicity and socioeconomic information were obtained from self-completed patient questionnaires. Research nurses underwent data collection training and received documentation with clear definitions against which to gather information. Independent data validation was performed by a senior nurse in a randomly selected 5% of cases with a concordance of over 98% for all collected variables [21].

Data from HES were available from 1st January 2006 to 31st December 2017, containing demographic and clinical information from NHS secondary care encounters. Encounters are recorded as admitted patient care (APC), outpatient (OP) or emergency department (ED) attendances.

Diagnoses and procedures from APC and OP episodes are coded using International Classification of Diseases 10th revision (ICD-10) and Office for Population Censuses and Surveys Classification of Interventions and Procedures version 4 (OPCS-4) criteria. Up to 20 diagnosis and 24 operation codes are recorded for each APC episode. Information in the primary position reflects the principal diagnosis, with subsequent positions documenting comorbidities collated by professional clinical coders [11].

Data were obtained from NHS Digital, stored at NHS Blood and Transplant, and linked to the ATTOM database by unique patient identifiers (Data Sharing Agreement Number DARS-NIC-14342-Q8W0X-v1.4). Ethical approval for ATTOM was obtained from the National Health Service Health and Social Care Research Ethics Committee (Ref: 11/EE/0120). Patients provided informed consent at ATTOM recruitment for subsequent analysis of outcomes. All data were stored in line with the United Kingdom Data Protection Act 1998 requirements. The study was performed in line with the aforementioned ethical guidelines and regulations.

HES data were only available from hospitals in England, so ATTOM participants from elsewhere in the UK were excluded. From here we refer to ATTOM and HES as ‘study data’ and ‘hospital data’ respectively.

Data completeness and healthcare utilisation

To determine the completeness of HES data, the dataset linkage rate and number of HES entries per individual were determined. Methodology on dataset linkage rate is described within Supplementary Material. As diagnosis recording is most detailed within HES APC [11, 22] only these episodes were used to extract comorbidity information (over 95% of OP episodes were coded as ‘unspecified morbidity’). The number of patients with an APC episode prior to study recruitment was calculated and number of admissions determined. Comorbidities among individuals with and without an APC episode were compared.

Comorbidity recording

The comorbidities recorded by study nurses are shown in Supplementary table 1, alongside corresponding ICD-10 and OPCS-4 codes. Codes were identified from a systematic search of data dictionaries alongside consultation of established algorithms [23]. Comorbidities were extracted from all diagnosis and operation positions from hospital admissions between January 2006 and study recruitment. If a condition was recorded once, it was considered to persist on subsequent attendances, in keeping with established methodology [24]. The prevalence of comorbidities was calculated using the denominator of all individuals with dataset linkage and complete study comorbidity records.
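The extraction rule described above can be sketched in a few lines of Python. This is a minimal illustration only: the ICD-10 prefixes shown are standard code blocks chosen for the example and are not the study's code lists (those are in Supplementary table 1), and the record layout, the function name `comorbidities_before` and the handling of admissions on the recruitment date itself are assumptions.

```python
from datetime import date

# Illustrative ICD-10 prefixes only (assumption); the study's actual code
# lists are given in Supplementary table 1 and are not reproduced here.
CODE_PREFIXES = {
    "diabetes": ("E10", "E11", "E12", "E13", "E14"),
    "ischaemic_heart_disease": ("I20", "I21", "I22", "I23", "I24", "I25"),
}


def comorbidities_before(episodes, recruitment_date):
    """Return conditions coded in any diagnosis position of an admission
    that started before the recruitment date."""
    found = set()
    for episode in episodes:
        if episode["start"] >= recruitment_date:
            continue  # admissions on or after recruitment are ignored here
        for code in episode["diagnoses"]:
            for condition, prefixes in CODE_PREFIXES.items():
                if code.upper().startswith(prefixes):
                    # Recorded once: treated as present from then on.
                    found.add(condition)
    return found


# Hypothetical admissions for one individual.
episodes = [
    {"start": date(2008, 3, 1), "diagnoses": ["N185", "E119"]},
    {"start": date(2010, 6, 15), "diagnoses": ["I219"]},
    {"start": date(2013, 1, 10), "diagnoses": ["I251"]},
]
print(sorted(comorbidities_before(episodes, date(2012, 1, 1))))
```

Because a condition is added to a set and never removed, a single coded admission is enough for it to count at recruitment, which mirrors the persistence assumption stated in the text.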

To maximise their statistical power, studies need to identify conditions with an adequate sensitivity (proportion of true ‘cases’ identified), specificity (proportion of true ‘controls’ identified) and positive predictive value (PPV; proportion of identified cases that truly have the condition). A higher PPV leads to greater statistical power through low misclassification of positive cases which could ‘dilute’ any observed effect. False negatives have less impact on power for conditions with a relatively low prevalence as they join the larger control population. If the condition of interest is rare, specificity and negative predictive value (NPV) are generally high.
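The dependence of predictive values on prevalence can be illustrated with a short calculation. The sketch below applies Bayes' theorem to a fixed sensitivity and specificity; the values 0.80 and 0.95 and the three prevalences are hypothetical and are not taken from the study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) as proportions for the given test characteristics."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test characteristics applied at falling prevalence.
for prevalence in (0.30, 0.05, 0.01):
    ppv, npv = predictive_values(0.80, 0.95, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With sensitivity and specificity held constant, the NPV rises and the PPV falls as the condition becomes rarer, which is why PPV deserves particular attention for low-prevalence comorbidities.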

The study comorbidity dataset was taken to represent the ‘gold standard’. The sensitivity, specificity, PPV and NPV of comorbidities derived from hospital data were calculated. Cohen’s kappa statistic was used to compare the agreement of recording between sources. Accepted values were taken to indicate poor (< 0.2), fair (0.21–0.40), moderate (0.41–0.6), substantial (0.61–0.8) and good (> 0.8) agreement [25]. The ICD-10 and OPCS-4 codes of comorbidities with a PPV below 50% were scrutinised to identify diagnoses giving false positive results. To examine whether disease prevalence associates with recording accuracy, pooled sensitivities and PPVs were calculated using a subgroup meta-analysis.
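As a minimal sketch of these calculations, the function below derives the four accuracy measures and Cohen's kappa from a 2×2 table in which study data are the reference. The counts are hypothetical, the function names are illustrative, and the treatment of values falling exactly on a band boundary is an assumption; the analysis itself was run in Stata.

```python
def accuracy_metrics(tp, fp, fn, tn):
    """Accuracy of hospital data against the reference, from 2x2 counts.

    tp: condition in both sources; fp: hospital data only;
    fn: study data only; tn: condition in neither source.
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (observed - expected) / (1 - expected),
    }


def kappa_band(kappa):
    """Map kappa to the agreement bands quoted in the text."""
    if kappa <= 0.2:
        return "poor"
    if kappa <= 0.4:
        return "fair"
    if kappa <= 0.6:
        return "moderate"
    if kappa <= 0.8:
        return "substantial"
    return "good"


# Hypothetical counts for one comorbidity in 1000 individuals.
metrics = accuracy_metrics(tp=80, fp=20, fn=20, tn=880)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print("agreement:", kappa_band(metrics["kappa"]))
```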

Operations preferentially generate cost codes for hospital episodes and the condition being treated by an operation could be more likely to be ‘truly’ present if requiring an intervention. A subgroup meta-analysis compared the sensitivity and PPV of conditions identified using ICD-10 criteria alone to those also derived from OPCS-4 codes. A random-effects model was used due to heterogeneity in the prevalence of comorbidities and variation in the sensitivity and PPV of comorbidities derived from hospital data reported previously [17, 18].
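For readers unfamiliar with random-effects pooling, the following sketch shows one common approach, the DerSimonian-Laird estimator, applied to untransformed proportions. It is an assumption that this resembles the Stata routine used; the source does not name the estimator or any transformation, and the counts below are hypothetical.

```python
import math


def pool_random_effects(events, totals):
    """DerSimonian-Laird pooled proportion with an approximate 95% CI.

    Assumes at least two conditions and every proportion strictly
    between 0 and 1 (otherwise the within-condition variance is zero).
    """
    if len(events) < 2:
        raise ValueError("need at least two proportions to pool")
    props = [e / n for e, n in zip(events, totals)]
    variances = [p * (1 - p) / n for p, n in zip(props, totals)]
    weights = [1 / v for v in variances]

    # Fixed-effect estimate and Cochran's Q.
    fixed = sum(w * p for w, p in zip(weights, props)) / sum(weights)
    q = sum(w * (p - fixed) ** 2 for w, p in zip(weights, props))

    # Between-condition variance, truncated at zero.
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (len(props) - 1)) / c)

    # Random-effects weights add tau2 to each within-condition variance.
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * p for w, p in zip(re_weights, props)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se


# Hypothetical true-positive counts out of hospital-identified cases.
pooled, lower, upper = pool_random_effects([90, 70, 45], [100, 100, 100])
print(f"pooled PPV = {pooled:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

When the individual estimates differ widely, the between-condition variance dominates the weights and the pooled value moves towards a simple average with a wide interval, consistent with the broad confidence intervals reported for the pooled estimates.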

The renal modified Charlson score was calculated using comorbidities derived from study and hospital data (Supplementary table 2) [26]. The sensitivity, specificity, PPV and NPV of the Charlson score derived from hospital data were calculated.
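Accuracy for a banded score can be computed one band at a time, treating membership of that band as the 'condition'. The sketch below shows this for sensitivity and PPV; the band labels (in particular "3+") and the example data are hypothetical, and the scoring weights themselves are not reproduced here (see Supplementary table 2).

```python
def band_metrics(reference_bands, hospital_bands, band):
    """Sensitivity and PPV for one score band, reference = study data."""
    pairs = list(zip(reference_bands, hospital_bands))
    tp = sum(ref == band and hosp == band for ref, hosp in pairs)
    fn = sum(ref == band and hosp != band for ref, hosp in pairs)
    fp = sum(ref != band and hosp == band for ref, hosp in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, ppv


# Hypothetical banded scores for five individuals.
study = ["0", "0", "1-2", "1-2", "3+"]
hospital = ["0", "1-2", "1-2", "1-2", "1-2"]
for band in ("0", "1-2", "3+"):
    sens, ppv = band_metrics(study, hospital, band)
    print(f"band {band}: sensitivity={sens:.2f}, PPV={ppv:.2f}")
```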

Statistical analyses

Descriptive statistics were used to report baseline characteristics with non-parametric continuous variables expressed as median [interquartile range, IQR] and categorical variables as frequency (percentage). The Chi-square test and Mann-Whitney U test were used to compare categorical and non-parametric continuous variables respectively. Results of regression analyses were presented as odds ratios with 95% confidence intervals. Statistical significance was defined as a p-value < 0.05. Analyses were performed using Stata 15 (Statacorp, College Station, TX).

Results


Data sets and study population

In total, 5703 patients were recruited to ATTOM from an English renal centre. Study and hospital records were linked for 5506 (97%) individuals. Of the 197 individuals whose records did not link, 49 had non-English postcodes and likely received treatment elsewhere in the UK, leaving 148 (2.6%) unmatched (Fig. 1). Factors associated with dataset linkage are described in the Supplementary Material and shown in Supplementary table 3 and Supplementary table 4.

Fig. 1 Flow chart depicting individuals included in the study. There were 69 individuals without an admitted patient care episode prior to study recruitment, but 67 of these had a subsequent admitted patient care episode after recruitment

Of those individuals with linked datasets, the median age was 53 years [IQR 43–63], 62% of individuals were male and 76% were of white ethnicity. Overall, 20% of individuals had a PRD classified as ‘other’, with a further 19% each having diabetes and glomerulonephritis (Table 1).

Table 1 Study dataset linkage by patient demographic and clinical factors. Data are expressed as number (%) or median [IQR]. Standardised differences of 0.2, 0.5 and 0.8 reflect small, medium and large standardised differences respectively

Healthcare utilisation

The median time covered by hospital data prior to study recruitment was 6.7 years [IQR 6.4–7.0]. Of the 5506 individuals whose datasets linked, 5437 (99%) had an APC episode prior to recruitment. The median number of APC episodes was 9 [IQR 5–16] and median time from last admission to recruitment was 58 days [IQR 19–258]. Of those individuals with an admission, 89% had an admission within 1 year of recruitment and 95% within 2 years. Details of the 69 individuals without an admission prior to study recruitment are shown in the Supplementary Material; these individuals are included in subsequent analyses and counted as having no comorbidity in hospital records.

Comorbidity recording

There was variation in the sensitivity, specificity, PPV and NPV of comorbidities (Table 2). Diabetes, ischaemic heart disease and malignancy were most prevalent (Fig. 2) and recorded with a high sensitivity and PPV: 97.7 and 90.4% for diabetes, 82.6 and 82.9% for ischaemic heart disease and 62.8 and 71.9% for malignancy (Figs. 3 and 4). Alongside heart valve replacement, these conditions had a kappa statistic over 0.6, indicating substantial or good agreement.

Table 2 Sensitivity, specificity, positive and negative predictive values and Kappa statistic of hospital data comorbidity as compared to study data. Conditions are ordered by prevalence

Fig. 2 Prevalence of comorbidities derived from study and hospital datasets

Fig. 3 Forest plot displaying sensitivity (%) with 95% confidence intervals for individual comorbidities derived from hospital data. Comorbidities are ordered by prevalence. ES: effect size, represents sensitivity (%)

Fig. 4 Forest plot displaying positive predictive values (%) with 95% confidence intervals for individual comorbidities derived from hospital data. Comorbidities are ordered by prevalence. ES: effect size, represents positive predictive value (%)

Heart failure, chronic lung disease, mental illness and peripheral vascular disease each had a higher sensitivity than PPV, reflecting a greater proportion of false positive cases in hospital data. False positive cases of chronic lung disease reflected recordings of asthma or COPD in 85% of cases, and false positive cases of mental illness were recorded as depression in 46% and harmful or dependent use of alcohol in 32% of cases (Supplementary table 5). Peripheral vascular disease was identified using both ICD-10 and OPCS-4 codes and had a sensitivity of 67.2% and PPV of 47.7%. Examining the ICD-10 codes alone gave a lower sensitivity (51.2%, 95% CI 45.3–57.1) and a similar PPV (51.5%, 95% CI 45.6–57.4).

Blood borne viruses and abdominal aortic aneurysm had the lowest sensitivities but proportionately greater PPVs reflecting a higher rate of false negative cases. Liver disease and dementia both had poor sensitivities and PPVs under 50%. False positive liver disease cases were due to coding of liver transplant, fatty change of the liver and liver failure otherwise unspecified.

To examine whether disease prevalence was associated with the accuracy of comorbidity recording, pooled sensitivities and PPVs were calculated. The three most prevalent comorbidities (diabetes, ischaemic heart disease and malignancy) had a greater pooled PPV than all other conditions combined, at 81.8% (95% CI 70.1–93.6) versus 48.1% (95% CI 37.1–59.0) (p < 0.001), but the association between recording accuracy and disease prevalence was not linear.

The conditions identified through ICD-10 codes alone or a combination of ICD-10 and OPCS-4 codes are shown in Supplementary table 1. There was no significant difference in sensitivity or PPV by coding system. The pooled sensitivity of conditions identified from ICD-10 and OPCS-4 criteria was 69.6% (95% CI 56.4–82.8), and from ICD-10 codes alone 59.8% (95% CI 39.7–80.0) (p = 0.43). The pooled PPV of ICD-10 and OPCS-4 diagnoses was 58.1% (95% CI 43.3–73.0) and for ICD-10 diagnoses alone was 53.5% (95% CI 29.5–77.5) (p = 0.74).

The sensitivity and PPV of Charlson comorbidity scores derived from hospital data are shown in Table 3. These declined with rising Charlson score. The sensitivity and PPV of a Charlson score of 0 were 88.2 and 82.9% respectively, and for a Charlson score of 1–2 were 83.9 and 66.6%.

Table 3 Sensitivity, specificity, positive and negative predictive values and Kappa statistic of hospital data Charlson comorbidity index as compared to study data

Discussion


This observational study of over 5000 individuals with advanced CKD describes the accuracy of comorbidity recording in the Hospital Episode Statistics dataset compared to data collected by trained research nurses. The record linkage rate and proportion of individuals with comorbidity data before starting kidney replacement therapy are high, but there is variation in the sensitivity and positive predictive values of conditions derived from the hospital dataset. We suggest hospital data are adequate for capturing comorbidities including diabetes, ischaemic heart disease and malignancy but caution should be used if using this resource to identify a full spectrum of conditions.

There are several possible explanations for the variation in recording accuracy. First, accuracy may be influenced by the likelihood of a condition being directly implicated in hospital admission. Acute coronary syndromes and the management of malignancy are likely to require hospitalisation and were accurately recorded, whilst conditions predominantly monitored as an outpatient such as blood borne viruses and aortic aneurysms had lower sensitivities. Whilst the working diagnosis will influence the likelihood of hospital admission, this will also vary with clinician, social and geographical factors. We were not able to examine variation in recording accuracy between hospitals due to individuals having admissions across multiple sites and the small number of individuals attending certain hospitals, but inter-centre variation may also exist.

Second, variations in diagnostic criteria may lead to discrepancies in recording. For example, echocardiogram abnormalities are common in people on dialysis in the context of volume overload, but there may not be structural or functional cardiac dysfunction when the patient is at their dry weight [27]. Extracellular fluid overload could be misinterpreted as heart failure and recorded as such in clinical notes, but stricter diagnostic criteria were used in the study proforma. Variation may also reflect how ‘presumed’ diagnoses are recorded, e.g. malignancy without histological confirmation.

Third, the granularity of ICD-10 and OPCS-4 coding systems should be considered. Amputations are coded as a procedure within hospital data but the reason for amputation is not documented. We assumed lower limb amputations related to peripheral vascular disease, though some may have traumatic, infective, or malignant aetiologies. Examining ICD-10 diagnosis codes for peripheral vascular disease alone did not substantially improve the PPV. Previous studies have suggested that severe disease is more likely to be correctly recorded [28], so individuals with peripheral vascular disease requiring amputation might have been expected to also have ICD-10 coding.

Previous studies have assessed the accuracy of hospital coding with reference to primary care and disease registry data, and recommended ways to maximise data quality. Herrett et al. examined the recording of acute myocardial infarction, reporting a PPV of 91.5% in hospital data with reference to a myocardial infarction registry. However, a third of cases were missed and they suggest linked datasets from more than one source can reduce biased estimates [16, 29]. Careful selection of ICD-10 codes is also important: a meta-analysis examining stroke recording found a wide variation in PPV, with the most accurate studies using stroke-specific as opposed to general cerebrovascular disease codes [17]. Finally, the PPV can be increased if diagnoses are recorded only if they correlate to the treating specialty, are in the primary diagnosis position or documented more than once [30]. These techniques will however reduce sensitivity so a balance must be found.
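The trade-off created by stricter case definitions can be sketched as follows. The function and its parameters (`min_occurrences`, `primary_only`) are hypothetical names for two of the rules mentioned above, and the records are invented for illustration.

```python
def identify_cases(records, min_occurrences=1, primary_only=False):
    """Return patient IDs meeting a case definition for one condition.

    records: iterable of (patient_id, diagnosis_position) pairs, where
    position 1 is the primary diagnosis.
    """
    counts = {}
    for patient_id, position in records:
        if primary_only and position != 1:
            continue  # ignore secondary positions under the strict rule
        counts[patient_id] = counts.get(patient_id, 0) + 1
    return {pid for pid, n in counts.items() if n >= min_occurrences}


# Hypothetical coded records for a single condition.
records = [("A", 1), ("A", 3), ("B", 2), ("C", 4), ("C", 5)]
print(sorted(identify_cases(records)))                     # any record
print(sorted(identify_cases(records, min_occurrences=2)))  # coded twice or more
print(sorted(identify_cases(records, primary_only=True)))  # primary position only
```

Each stricter rule returns a subset of the cases found by the permissive rule, so false positives can only fall (raising PPV) while true cases may also be dropped (lowering sensitivity).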

Lessons on improving routine healthcare data quality can also be taken from countries which successfully gather this information [31]. Denmark has a similar healthcare system to the UK and has excellent routine healthcare data which are easily accessible for research purposes. Consultants prospectively enter medical diagnoses in clinical databases that record the quality of healthcare delivered, and as these are used to assess treatment effectiveness and in research, there are constant efforts to ensure the data are valid [32].

One study has previously examined the accuracy of HES comorbidity data in individuals on KRT, using UKRR comorbidity returns as their reference [6]. They reported overall ‘good’ concordance between sources, but the information was not as granular as is presented here and 50% of individuals had missing UKRR comorbidity information. HES comorbidity was however predictive of mortality and partially explained variation in outcomes between centres [6]. It is therefore possible that hospital data could minimise bias arising from comorbidity accrual in longitudinal observational studies [33, 34].

Using routine healthcare data for research purposes comes with economic and practical advantages: it is of low burden to participants and researchers, captures a large study population with high data completeness (96% in our study) and allows longitudinal follow up of individuals. Datasets used for hospital reimbursement also provide a ‘real-world’ view of hospital care and insight into the financial impact of treatment.

Challenges do, however, exist. First, not all individuals are represented within hospital data and 2.6% of datasets in our study were not linked. This could be explained by individuals opting out of record sharing between NHS Digital and third parties, which results in the loss of 2% of hospital episodes [11].

Second, HES does not capture treatment in primary care, in the private sector or outside of England. The development of comorbidity is often associated with hospitalisation and nearly 90% of individuals had an admission within a year of KRT start, so substantial uncaptured accrual of comorbidity in the community seems unlikely in this population. It is also not known whether the absence of hospital data reflects no hospital contact or a loss to follow up. Similarly, hospital data cannot code conditions as absent, so lack of documentation does not definitively confirm absence of disease.

Third, the data inputted into HES are extracted from patient notes often completed by junior members of the medical team, with trained medical coders selecting the best aligned ICD-10 and OPCS-4 codes. The quality of the data depends on the documented information [35], experience of the coder and whether any systematic errors occur during the data collection process.

Finally, whilst cheaper than employing staff to gather patient information, the time and cost in gaining access to hospital data may be a barrier to its use. A new application for HES data costs £1030 and linking a bespoke dataset costs £2060 [36]. The time to receive data varies depending on the information required, but for this project took 2 years.

Our study has several strengths. We examine a large cohort of individuals with advanced CKD who are broadly representative of the UK KRT population [21] and report the accuracy of national hospital data with greater granularity and a lower rate of missing reference data than previous studies [37]. Our reference data collected by trained research nurses is likely to be accurate and reflects standard practice in most clinical research studies.

We acknowledge this study’s limitations. Study comorbidity was used as a gold standard, and although data validation suggested a high concordance between staff this source may still contain errors. Current HES data quality may differ from the 2006–2013 dataset used here. A rise in the number of completed coding fields in HES over time could yield greater data accuracy, but the possibility of over-diagnosis should be considered [37, 38].

In conclusion, the routinely collected HES dataset captured comorbidity information in 96% of individuals before the start of KRT, but there is variation in data accuracy. HES data were accurate for more prevalent conditions, but less suitable for recording a full complement of comorbidities. Understanding patterns of comorbidity among people with advanced kidney disease is crucial in informing policy and service planning, and in shared decision-making with patients. Our work will inform the use of routinely collected data to improve the efficiency of future research.

Availability of data and materials

The HES dataset analysed during the current study is not publicly available and cannot be shared at a patient level as per the NHS Digital data sharing agreement. Analysis codes and ATTOM summary datasets are available from the corresponding author on reasonable request.



Abbreviations

APC: Admitted patient care
ATTOM: Access to Transplant and Transplant Outcome Measures study
CKD: Chronic kidney disease
ED: Emergency department
HES: Hospital Episode Statistics
ICD-10: International Classification of Diseases version 10
IQR: Interquartile range
NHS: National Health Service
NPV: Negative predictive value
KRT: Kidney replacement therapy
OPCS-4: Office for Population Censuses and Surveys Classification of Interventions and Procedures version 4
OR: Odds ratio
PPV: Positive predictive value
PRD: Primary renal disease
UKRR: UK Renal Registry


  1. 1.

    Rao A, Steenkamp R, Caskey F. UK Renal Registry 16th annual report: chapter 5 comorbidities and current smoking status amongst patients starting Renal replacement therapy in England, Wales and Northern Ireland from 2011 to 2012. Nephron Clin Pract. 2013;125(1–4):99–110.

    Article  PubMed  Google Scholar 

  2. 2.

    McPhail SM. Multimorbidity in chronic disease: impact on health care resources and costs. Risk Manag Healthc Policy. 2016;9:143–56.

    Article  PubMed  PubMed Central  Google Scholar 

  3. 3.

    Gijsen R, Hoeymans N, Schellevis FG, Ruwaard D, Satariano WA, van den Bos GAM. Causes and consequences of comorbidity: a review. J Clin Epidemiol. 2001;54(7):661–74.

    CAS  Article  PubMed  Google Scholar 

  4. 4.

    Khan IH, Catto GRD, MacLeod AM, Edward N, Fleming LW, Henderson IS. Influence of coexisting disease on survival on renal-replacement therapy. Lancet. 1993;341(8842):415–8.

    CAS  Article  PubMed  Google Scholar 

  5. 5.

    Liu J, Huang Z, Gilbertson DT, Foley RN, Collins AJ. An improved comorbidity index for outcome analyses among dialysis patients. Kidney Int. 2010;77(2):141–51.

    Article  PubMed  Google Scholar 

  6. 6.

    Fotheringham J, Jacques RM, Fogarty D, Tomson CRV, El Nahas M, Campbell MJ. Variation in Centre-specific survival in patients starting renal replacement therapy in England is explained by enhanced comorbidity information from hospitalization data. Nephrol Dial Transplant. 2013;29(2):422–30.

    Article  PubMed  Google Scholar 

  7. 7.

    Karamadoukis L, Ansell D, Foley RN, McDonald SP, Tomson CRV, Trpeski L, Caskey FJ. Towards case-mix-adjusted international renal registry comparisons: how can we improve data collection practice? Nephrol Dial Transplant. 2009;24(8):2306–11.

    Article  PubMed  Google Scholar 

  8. 8.

    Cook JA, Collins GS. The rise of big clinical databases. BJS (British Journal of Surgery). 2015;102(2):e93–e101.

    CAS  Article  Google Scholar 

  9. 9.

    Grunau GL, Sheps S, Goldner EM, Ratner PA. Specific comorbidity risk adjustment was a better predictor of 5-year acute myocardial infarction mortality than general methods. J Clin Epidemiol. 2006;59(3):274–80.

    Article  PubMed  Google Scholar 

  10. 10.

    Fraccaro P, Kontopantelis E, Sperrin M, et al. Predicting mortality from change-over-time in the Charlson Comorbidity Index: A retrospective cohort study in a data-intensive UK health system. Medicine. 2016;95:43

    Article  Google Scholar 

  11. 11.

    Herbert A, Wijlaars L, Zylbersztejn A, Cromwell D, Hardelid P. Data resource profile: hospital episode statistics admitted patient care (HES APC). Int J Epidemiol. 2017;46(4):1093–1093i.

    Article  PubMed  PubMed Central  Google Scholar 

  12. 12.

    Chaudhry Z, Mannan F, Gibson-White A, Syed U, Majeed A, Ahmed S. Research outputs of England’s Hospital Episode Statistics (HES) database: a bibliometric analysis. BMJ Health Care Inform. 2017;24(4):329.

    Article  Google Scholar 

  13. 13.

    Sarween N, Hughes S, Evison F, Day C, Knox E, Lipkin G. SO012 pregnancy outcomes in renal transplant recipients in England over 15 years. Nephrol Dial Transplant. 2016;31(suppl_1):i6.

    Article  Google Scholar 

  14. 14.

    Judge PK, Harper CHS, Storey BC, Haynes R, Wilcock MJ, Staplin N, Goldacre R, Baigent C, Collier J, Goldacre M, Landray MJ, Winearls CG, Herrington WG. Biliary tract and liver complications in polycystic kidney disease. J Am Soc Nephrol. 2017;28(9):2738–48.

    Article  PubMed  PubMed Central  Google Scholar 

  15. 15.

    UK Renal Registry (2019) UK Renal Registry 21st annual report – data to 31/12/2017, Bristol, UK.

    Google Scholar 

  16. 16.

    Herrett E, Shah AD, Boggon R, Denaxas S, Smeeth L, van Staa T, Timmis A, Hemingway H. Completeness and diagnostic validity of recording acute myocardial infarction events in primary care, hospital care, disease registry, and national mortality records: cohort study. BMJ. 2013;346(may20 3):f2350.

    Article  PubMed  PubMed Central  Google Scholar 

  17. 17.

    Woodfield R, Grant I. UK biobank stroke outcomes group, UK biobank follow-up and outcomes working group, Sudlow CLM. Accuracy of electronic health record data for identifying stroke cases in large-scale epidemiological studies: a systematic review from the UK biobank stroke outcomes group. PLoS One. 2015;10(10):e0140533.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  18. 18.

    Yao RJR, Andrade JG, Deyell MW, Jackson H, McAlister FA, Hawkins NM. Sensitivity, specificity, positive and negative predictive values of identifying atrial fibrillation using administrative data: a systematic review and meta-analysis. Clin Epidemiol. 2019;11:753–67.

    Article  PubMed  PubMed Central  Google Scholar 

  19. 19.

    Schneider KM, O’Donnell BE, Dean D. Prevalence of multiple chronic conditions in the United States’ Medicare population. Health Qual Life Outcomes. 2009;7(1):82.

    Article  PubMed  PubMed Central  Google Scholar 

  20. 20.

    Iimuro S, Kaneko T, Ohashi Y, et al. Analysis of 2897 hospitalization events for patients with chronic kidney disease: results from CKD-JAC study. Clin Exp Nephrol. 2019;23(7):956–68.

    Article  PubMed  PubMed Central  Google Scholar 

  21. 21.

    Oniscu GC, Ravanan R, Wu D, Gibbons A, Li B, Tomson C, Forsythe JL, Bradley C, Cairns J, Dudley C, Watson CJ, Bolton EM, Draper H, Robb M, Bradbury L, Pruthi R, Metcalfe W, Fogarty D, Roderick P, Bradley JA, ATTOM Investigators. Access to transplantation and transplant outcome measures (ATTOM): study protocol of a UK wide, in-depth, prospective cohort analysis. BMJ Open. 2016;6(2):e010377.

  22.

    NHS Digital. Hospital Outpatient Activity, 2015/16: Primary Diagnosis. Published online December 1, 2016. Accessed December 9, 2019.

  23.

    Quan H, Sundararajan V, Halfon P, Fong A, Burnand B, Luthi JC, Saunders LD, Beck CA, Feasby TE, Ghali WA. Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Med Care. 2005;43(11):1130–9.

  24.

    Wang CY, Baldwin L-M, Saver BG, Dobie SA, Green PK, Cai Y, Klabunde CN. The contribution of longitudinal comorbidity measurements to survival analysis. Med Care. 2009;47(7):813–21.

  25.

    McHugh ML. Interrater reliability: the kappa statistic. Biochem Med (Zagreb). 2012;22(3):276–82.

  26.

    Hemmelgarn BR, Manns BJ, Quan H, Ghali WA. Adapting the Charlson comorbidity index for use in patients with ESRD. Am J Kidney Dis. 2003;42(1):125–32.

  27.

    House AA, Wanner C, Sarnak MJ, et al. Heart failure in chronic kidney disease: conclusions from a Kidney Disease: Improving Global Outcomes (KDIGO) Controversies Conference. Kidney Int. 2019;95(6):1304–17.

  28.

    Baecklund E, Iliadou A, Askling J, Ekbom A, Backlin C, Granath F, Catrina AI, Rosenquist R, Feltelius N, Sundström C, Klareskog L. Association of chronic inflammation, not its treatment, with increased lymphoma risk in rheumatoid arthritis. Arthritis Rheum. 2006;54(3):692–701.

  29.

    Millett ERC, Quint JK, De Stavola BL, Smeeth L, Thomas SL. Improved incidence estimates from linked vs. stand-alone electronic health records. J Clin Epidemiol. 2016;75:66–9.

  30.

    Ludvigsson JF, Olén O, Bell M, Ekbom A, Montgomery SM. Coeliac disease and risk of sepsis. Gut. 2008;57(8):1074–80.

  31.

    Ludvigsson JF, Andersson E, Ekbom A, Feychting M, Kim JL, Reuterwall C, Heurgren M, Olausson PO. External review and validation of the Swedish national inpatient register. BMC Public Health. 2011;11(1):450.

  32.

    Schmidt M, Schmidt SAJ, Adelborg K, Sundbøll J, Laugesen K, Ehrenstein V, Sørensen HT. The Danish health care system and epidemiological research: from health care contacts to database records. Clin Epidemiol. 2019;11:563–91.

  33.

    Norris CM, Ghali WA, Knudtson ML, Naylor CD, Saunders LD. Dealing with missing data in observational health care outcome analyses. J Clin Epidemiol. 2000;53(4):377–83.

  34.

    Sarfati D, Hill S, Purdie G, Dennett E, Blakely T. How well does routine hospitalisation data capture information on comorbidity in New Zealand? New Zealand Med J. 2010;123(1310):50–61.

  35.

    Tang KL, Lucyk K, Quan H. Coder perspectives on physician-related barriers to producing high-quality administrative data: a qualitative study. CMAJ Open. 2017;5(3):E617–22.

  36.

    Data Access Request Service (DARS) charges from 2020/2021. Published online November 9, 2020.

  37.

    Fotheringham J, Fogarty D, Jacques R, El Nahas M, Campbell M. Chapter 13 The Linkage of Incident Renal Replacement Therapy Patients in England (2002–2006) to Hospital Episodes and National Mortality Data: Improved Demography and Hospitalisation Data in Patients Undergoing Renal Replacement Therapy. Nephron Clin Pract. 2012;120(suppl 1):c247–60.

  38.

    Burns EM, Rigby E, Mamidanna R, Bottle A, Aylin P, Ziprin P, Faiz OD. Systematic review of discharge coding accuracy. J Public Health (Oxf). 2012;34(1):138–48.



Acknowledgements

This paper presents research from the Access to Transplantation and Transplant Outcome Measures (ATTOM) study, which was funded by the National Institute for Health Research (NIHR) under the Programme Grants for Applied Research scheme (RP-PG-0109-10116). The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.

Many thanks to the ATTOM research team, the research nurses and the patients in the study. We would also like to thank James Fotheringham, who gave insightful input into the manipulation of HES data and provided support for the study.


Funding

The authors, other than RS, received funding from the National Institute for Health Research (NIHR) under the Programme Grants for Applied Research scheme (RP-PG-0109-10116) for completion of the ATTOM study.

Author information




Contributions

AN performed the analyses, produced the tables, and wrote the manuscript under the supervision of DMT and RR. RS assisted with statistical analyses and contributed to manuscript preparation. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Ailish Nimmo.

Ethics declarations

Ethics approval and consent to participate

Ethical approval for the ATTOM study was obtained from the National Health Service Health and Social Care Research Ethics Committee (Ref: 11/EE/0120). At ATTOM recruitment, patients provided informed consent for subsequent analysis of outcomes. All data were stored in line with the requirements of the United Kingdom Data Protection Act 1998. HES data were stored securely at NHS Blood and Transplant as per the data sharing agreement with NHS Digital (Data Sharing Agreement Number DARS-NIC-14342-Q8W0X-v1.4).

Consent for publication

Not applicable.

Competing interests

The authors, other than RS, received funding from the National Institute for Health Research (NIHR) under the Programme Grants for Applied Research scheme (RP-PG-0109-10116) for completion of the ATTOM study. RS declares no conflicts of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Nimmo, A., Steenkamp, R., Ravanan, R. et al. Do routine hospital data accurately record comorbidity in advanced kidney disease populations? A record linkage cohort study. BMC Nephrol 22, 95 (2021).


Keywords

  • Comorbidity
  • Chronic kidney disease
  • Routine healthcare datasets
  • Record linkage
  • Secondary care