This article has Open Peer Review reports available.
Predicting patient survival after deceased donor kidney transplantation using flexible parametric modelling
© Li et al. 2016
Received: 22 January 2016
Accepted: 16 May 2016
Published: 25 May 2016
The influence of donor and recipient factors on outcomes following kidney transplantation is commonly analysed using Cox regression models, but this approach is not useful for predicting long-term survival beyond observed data. We demonstrate the application of a flexible parametric approach to fit a model that can be extrapolated for the purpose of predicting mean patient survival. The primary motivation for this analysis is to develop a predictive model to estimate post-transplant survival based on individual patient characteristics to inform the design of alternative approaches to allocating deceased donor kidneys to those on the transplant waiting list in the United Kingdom.
We analysed data from over 12,000 recipients of deceased donor kidney or combined kidney and pancreas transplants between 2003 and 2012. We fitted a flexible parametric model incorporating restricted cubic splines to characterise the baseline hazard function and explored a range of covariates including recipient, donor and transplant-related factors.
Multivariable analysis showed the risk of death increased with recipient and donor age, diabetic nephropathy as the recipient’s primary renal diagnosis and donor hypertension. The risk of death was lower in female recipients, patients with polycystic kidney disease and recipients of pre-emptive transplants. The final model was used to extrapolate survival curves in order to calculate mean survival times for patients with specific characteristics.
The use of flexible parametric modelling techniques allowed us to address some of the limitations of both the Cox regression approach and of standard parametric models when the goal is to predict long-term survival.
Outcomes following kidney transplantation are commonly analysed using Cox regression models. Such analyses have been instrumental for understanding the influence of both donor and recipient factors on post-transplant events, such as graft failure and patient mortality [1–5]. However, the Cox regression approach places emphasis on estimating relative risk and does not make any distributional assumptions about the absolute risk of an event. Therefore, its usefulness in predicting survival beyond the period of observed data is limited . Following kidney transplantation, the risk of death is highest in the period immediately after surgery, but decreases sharply and then changes direction when the risk of death starts to gradually increase over time. While a number of standard parametric models (such as the exponential, Weibull or loglogistic) are available and could facilitate extrapolation of survival data, they are not flexible enough to accommodate hazard functions that change direction.
In some situations, we not only want to understand what factors influence relative survival, but we also want to predict long-term survival for patients with given characteristics. Estimates of life expectancy following transplant are important as a basis for having informed discussions with individual patients and their relatives. For decision-making at a population level, estimates of mean survival are needed to inform cost-effectiveness evaluations that compare two or more treatment alternatives in terms of both lifetime health gains and costs. There has also been considerable interest in the development of survival prediction models and scoring tools for use in kidney allocation systems. A number of predictive models have been proposed, such as the Recipient Risk Score (RRS), Life Years From Transplant (LYFT) and the Expected Post Transplant Survival (EPTS) score, the latter which was adopted as a measure alongside the Kidney Donor Profile Index (KDPI) to facilitate longevity matching in the revised kidney allocation system approved by the Organ Procurement and Transplantation Network in the United States in 2013 [7–11]. The primary motivation for the current analysis is to develop a predictive model to estimate post-transplant survival as a potential approach to inform the design alternative allocation schemes for deceased donor kidneys in the United Kingdom.
In order to estimate mean patient survival from observed data, it is desirable to have complete information about when most or all patients have died. If the data are not complete, estimates of mean survival will not reflect the full distribution of survival times and will likely underestimate true survival . For recipients of kidney transplants, waiting to observe post-transplant mortality for a complete cohort of patients would require several decades of follow-up. To circumvent this problem, predictive models such as the aforementioned LYFT approach used estimates of median rather than mean survival times . In contrast to the mean, median survival only requires sufficient follow-up to observe when 50 % of patients have died. However, with gradual improvements in post-transplant survival, even median survival can exceed 15 years. The survival models for LYFT were developed based on transplant recipient data spanning the period 1987 to 2006, thus highlighting another dilemma: predicting survival times based on data from patients who received transplants as many as 20 years ago may not accurately reflect the current clinical situation and the data often need to be further adjusted to reflect improvements in survival over time. For example, advances in surgical technique, organ preservation technology, immunosuppressive therapy and changes in the age and comorbidity profiles of both donors and recipients all have the potential to influence post-transplant outcomes.
Unlike the Cox regression approach, flexible parametric models characterise the baseline hazard directly and can therefore provide smooth estimates of the hazard and survival functions for any combination of covariates and can be used to extrapolate survival beyond the observed data . The ability to extrapolate also means that it is not necessary to rely on older historical data simply to have sufficient long-term follow-up to observe enough deaths. By choosing to focus on data from transplants that have been carried out more recently, a parametric modelling approach offers the advantage of allowing us to generate predictions of mean patient survival that are more reflective of the characteristics of the current transplant population and of current clinical practice.
In this analysis, we demonstrate the application of the flexible parametric modelling approach proposed by Royston and Parmar [6, 13] to predict mean patient survival among recipients of kidney transplants from deceased donors in the United Kingdom. We begin by describing the dataset and explaining the approach we took to determine how many years of historical data we should use to inform model development. We then present the fitted flexible parametric model and demonstrate agreement between observed and predicted survival. Finally, we use the model to extrapolate beyond the observed data in order to predict mean survival for patients with a given set of characteristics.
NHS Blood and Transplant is the central authority responsible for managing the UK Transplant Registry, which records mandatory data for kidney transplants performed in all transplant centres across the UK . Anonymised data on all first-time kidney and combined kidney and pancreas transplants performed between 1993 and 2012 were obtained from the registry. Patients <18 years old at the time of transplant, recipients of kidneys from living donors, en bloc and double transplants were excluded from the analysis, as were recipients of kidneys transplanted with organs other than the pancreas.
Determining how many years of transplant data to include in model development
Kaplan-Meier curves and log-rank tests were used to explore if there was any evidence of notable shifts in mortality rates over the 20-year period that would justify controlling for change over time or potentially restricting the analysis to more recent years of data. Several approaches for dividing the dataset into cohorts based on year of transplant were explored, including 5-year intervals, 10-year intervals and intervals that coincided with changes to the UK national kidney allocation scheme in 1998 and 2006. The list of variables that were routinely recorded in the UK Transplant Registry changed between 1993 and 2012 and so the availability of key variables was also an important consideration in deciding whether to model survival using all of the data or to limit the analysis to a more recent subset. Based on a combination of the above factors, a decision was made to restrict the development of the flexible parametric model to patients who received transplants between 2003 and 2012; however, longer-term data from transplants performed between 1993 and 2002 were used to check the plausibility of extrapolated survival based on the fitted model.
Previous published analyses and prognostic models were reviewed to identify potential factors for inclusion in the development of the model to predict post-transplant patient survival [3, 8, 9]. Recipient factors of interest included age, gender, ethnicity, primary renal diagnosis, pre-emptive transplant, waiting time, kidney only versus combined kidney and pancreas transplant and the calculated reaction frequency of antibodies to human leukocyte antigen (HLA). Calculated reaction frequency (cRF) is a measure of the sensitisation level for each patient and is calculated as the percentage of donors in a pool of 10,000 UK donors with whom the patient is HLA antibody incompatible, similar to the concept of calculated panel reactive antibody . Patients with a cRF between 0 and 9 % were considered non-sensitised, whereas patients with a cRF ≥ 85 % were classed as highly sensitised . Donor factors of interest included age, ethnicity, weight, history of hypertension, diabetes, circulatory-death versus brain-death donor and cause of death. Cold ischaemia time and the level of HLA mismatch were also included. HLA mismatch was graded from level 1 (000-mismatched) to level 4 (poorly matched) as described in the UK 2006 National Kidney Allocation Scheme .
Categorical variables were created for each of these factors and the univariate effect of each factor on survival was explored using log-rank tests . After making the decision to restrict model development to patients who received transplants between 2003 and 2012, most variables had either complete or only a small amount of missing data (<2 %) and therefore we did not perform imputation in order to facilitate the model fitting process. However, data for two donor factors, hypertension and diabetes, were missing in approximately 8 % of cases. For these variables, two approaches to handling missing data were explored. First, in order to retain these cases during model fitting, additional categories for missing donor hypertension and donor diabetes status were created. Second, multiple imputation using chained equations was performed and results were compared for consistency with the non-imputed dataset.
Fitting the multivariable flexible parametric model
We followed the Royston-Parmar approach to fitting a flexible parametric model, in which the baseline distribution is modelled as a restricted cubic spline function of log time [6, 17]. The first step in the development of the prognostic model was to determine the appropriate complexity or number of knots to characterise the baseline spline function and choose a suitable scale (proportional hazards, proportional odds or probit) . We initially fitted models on each of the three scales while varying the number of interior knots from 0 to 4 and inspected the Akaike information criterion (AIC) to determine the optimal fit.
For the multivariable model, the data were then split 2:1 into derivation and validation subsets and variable selection was performed on the derivation dataset using backward elimination and a p-value threshold of 0.10. We tested selectively for clinically plausible interactions and explored the possibility of time-dependent effects for specific covariates if log-log plots suggested any departures from proportionality of hazards over time. We used the model fitted to the derivation subset to predict survival curves in the validation subset and compared these graphically. The final model was then refitted to the combined derivation and validation dataset and results are reported with the index of concordance (c index) as a measure of discrimination. The c index estimates the probability of concordance between predicted and observed outcomes with a value of 0.5 indicating no predictive discrimination and a value of 1.0 indicating perfect separation of patients with different outcomes . The fitted model was then used to extrapolate survival curves for patients with given characteristics in order to generate predictions of mean survival by calculating the area under the curve.
All analyses were conducted in Stata (Version 13, Stata Corp, College Station, Texas, USA). The flexible parametric model was fitted using the stpm2 command .
Restricting model development to transplants carried out between 2003 and 2012
Univariate survival analysis by recipient, donor and transplant factors for transplants carried out between 2003 and 2012 (N = 12,307)
Crude mortality rate %
p-value (log-rank test)
Kidney and pancreas
< 6 months
6 months to <2 years
> 2 years
Primary renal disease
Diabetic nephropathy (type 1)
Diabetic nephropathy (type 2)
Renal vascular disease
Polycystic kidney disease
< 55 kg
> 95 kg
Donor cause of death
Level 1 
Level 2 [0 DR + 0/1 B]
Level 3 [0 DR + 2 B] or [1 DR + 0/1 B]
Level 4 [1 DR + 2 B] or [2 DR]
Cold ischaemia time
< 12 hrs
12 to <18 hrs
18 to <24 hrs
> = 24 hrs
Shape of the hazard function and choice of spline function
Based on AIC, the preliminary flexible parametric model with the optimal fit was found to be on a proportional hazards scale with 2 interior knots for the spline function. Before fitting the multivariable model, we compared the preliminary model based on the chosen scale and number of knots without covariates to the observed data by examining the shape of the hazard and survival functions.
Fitting the multivariable flexible parametric model
Final flexible parametric model fitted to combined derivation and validation dataset showing coefficients for each of the 3 spline terms for the baseline hazard function and hazard ratios for significant predictors of post-transplant patient survival (N = 12,283)
Baseline hazard (log hazard scale)
95 % CI
Restricted cubic spline 1
Restricted cubic spline 2
Restricted cubic spline 3
95 % CI
> = 60
Primary renal diagnosis
Diabetic nephropathy (type 1)
Diabetic nephropathy (type 2)
Polycystic kidney disease
> = 60
Interaction terms for recipient age and gender, recipient age and diabetic nephropathy as the primary renal diagnosis, and donor age and hypertension history were tested, but none were found to be significant. To allow for the possibility of time-dependent effects for any of the covariates in the model, we first examined log-log plots for any potential departures from the proportional hazards assumption and identified pre-emptive transplant, type of transplant (kidney only versus combined kidney and pancreas transplant) and cold ischaemia time as potentially varying over time. We tested time-dependent effects for these variables in the flexible parametric model, but again none were found to improve the fit of the model.
Agreement between observed and predicted survival
Extrapolation beyond the observed time period to predict mean survival
There are many examples in the transplant literature of analyses that have examined the influence of various factors on patient survival following kidney transplantation, most of which are based on Cox regression models [3–5]. The objective of the current analysis was to revisit post-transplant mortality using a different modelling technique that facilitates extrapolation of survival curves beyond the period of observed data and allows us to predict mean patient survival times.
Before fitting a flexible parametric model, we felt it was important to first consider how much historical data to include in the development of our model. A conventional approach might be to try and maximise sample size and number of years of follow-up in order to capture any changes in the hazard rate over as long a period as possible; however, we felt that this needed to be balanced with the objective of developing a predictive model that reflects current expectations of post-transplant survival. Although 20 years of historical data on transplants were available for analysis, we chose to restrict model development to the most recent 10 years for two main reasons. First, our exploratory analysis of Kaplan-Meier curves indicated that there had been significant improvements in survival for patients who received transplants between 2003 and 2012 in comparison to patients who received transplants between 1993 and 2002. Second, a wider number of variables that were of potential interest as predictors in the survival model were only available in the more recent subset of the data, including sufficient sample sizes to facilitate a comparison between recipients of organs from circulatory-death donors and brain-death donors. Although restricting model development to transplants performed between 2003 and 2012 reduced the overall sample size and limited the maximum duration of follow-up to 10 years, it was judged that on balance, an analysis based on the more recent subset of data would be a better reflection of current clinical practice and more appropriate given the intended use of the model for predicting survival. Quite often the decision about how much historical data to include in model development is determined primarily by availability of and access to information sources. While the decision that we took to only use the most recent 10 years of transplant data is not widely generalisable beyond our analysis, we advocate considering changes in the clinical context that might influence survival and using exploratory analysis to provide empirical guidance to inform this decision prior to model fitting.
A range of potential explanatory variables were considered during the model development process, but the final model was reduced to just four recipient factors (age, gender, primary renal diagnosis and pre-emptive transplant) and two donor factors (age and hypertension). Notably, we found no difference in death rates between recipients of kidneys from circulatory-death donors in comparison to brain-death donors. In addition, controlling for type 1 diabetic nephropathy as the primary renal diagnosis, we found no difference in death rates for recipients of kidney only transplants compared to recipients of combined kidney and pancreas transplants. These findings are broadly consistent with previous UK analyses based on Cox regression models. For example, Johnson et al identified recipient age, donor age and diabetes to be significant predictors of 5-year patient survival . However, Johnson et al. found that a waiting time of 2 years or more and hypertension as the primary renal diagnosis in transplant recipients also significantly increased the risk of death at 5 years. In the present analysis, hypertension was grouped with other forms of renal vascular disease as a primary diagnosis, the latter which was also not found to be a significant predictor of survival by Johnson et al. The analysis by Johnson et al. was based on a slightly earlier time period and included patients who received transplants in the UK between 1995 and 2001; it did not include recipients of combined kidney and pancreas transplants or recipients of organs from circulatory-death donors. With respect to donor factors, the current analysis reaches similar conclusions to the findings of Watson et al. in the development of the UK Kidney Donor Risk Index (KDRI), which identified donor age group and donor hypertension as the two most important variables with the largest influence on transplant outcomes .
The UK Transplant Registry is a rich source of historical data and among patients who received transplants after 2002, many of the variables that we explored in our model had either complete or only small amounts of missing data. However, the amount of missing data for variables such as recipient primary renal diagnosis and donor hypertension potentially introduce an additional source of uncertainty into our final predictive model. For donor hypertension, we performed multiple imputation and confirmed that this did not change the effect of this variable on post-transplant survival estimates. Nonetheless, information on donor hypertension in the registry is obtained from various sources, ranging from medical records to family members, and we were unable to control for consistency with respect to the definition of donor hypertension in the dataset. The prominence of donor hypertension in post-transplant survival models highlights the importance of improving the completeness and consistency with which this variable is recorded. In addition, the registry does not contain information on other factors such as comorbidities or dialysis history for transplant recipients, so we were unable to explore the potential effect of these variables on patient survival in the current analysis.
The flexible parametric approach to modelling survival offers several advantageous features. In comparison to semi-parametric approaches such as the Cox regression model, fully parametric models characterise the baseline hazard, which facilitates extrapolation beyond the period of observed data. In comparison to standard parametric models such as the Weibull, the use of restricted cubic splines allows for greater flexibility to accommodate more complex hazard functions that increase and decrease over time and are commonly encountered in medical research. The objective of this analysis was to demonstrate the application of a flexible parametric modelling approach to predict mean survival times for recipients of kidney transplants. The application of flexible parametric techniques to estimate mean survival in patients who are receiving dialysis would facilitate comparisons of survival differences between alternative treatment modalities. In addition to informing cost-effectiveness analyses, this approach may have a variety of applications, from the development of prognostic models for informing discussions with patients about treatment outcomes to the use of scoring tools as part of organ allocation schemes. Given the advantages of flexible parametric models, we feel that it is a particularly useful approach for conducting multivariable analysis of patient-level observational data when the goal is to predict long-term survival.
AIC, Akaike information criterion; cRF, calculated reaction frequency; EPTS, expected post transplant survival; HLA, human leukocyte antigen; KDPI, kidney donor profile index; KDRI, kidney donor risk index; LYFT, life years from transplant; RRS, recipient risk score.
The authors are grateful to all the transplant centres in the UK who contributed data on which this article is based.
This article presents independent research commissioned by the National Institute for Health Research(NIHR) under the Programme Grant for Applied Research (RP-PG-0109-10116) entitled Access to Transplantation andTransplant Outcome Measures (ATTOM). The views expressed in this publication are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.
Availability of data and materials
Data for this analysis were obtained from the UK Transplant Registry held by NHSBlood and Transplant, whose Data Access Policy is available from: http://www.odt.nhs.uk/uk-transplantregistry/data/data-access-policy/
All authors contributed to the design of the study. MLR obtained the registry data. BL performedthe analysis and drafted the manuscript. All authors read, revised and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
This study was approved by the Observational Research Ethics Committee of the London School of Hygieneand Tropical Medicine (Ref 6383).
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Morris PJ, Johnson RJ, Fuggle SV, Belger MA, Briggs JD. Analysis of factors that affect outcome of primary cadaveric renal transplantation in the UK. HLA Task Force of the Kidney Advisory Group of the United Kingdom Transplant Support Service Authority (UKTSSA). Lancet. 1999;354:1147–52.View ArticlePubMedGoogle Scholar
- Fuggle SV, Allen JE, Johnson RJ, Collett D, Mason PD, Dudley C, et al. Factors affecting graft and patient survival after live donor kidney transplantation in the UK. Transplantation. 2010;89:694–701.View ArticlePubMedGoogle Scholar
- Johnson RJ, Fuggle SV, O'Neill J, Start S, Bradley JA, Forsythe JL, et al. Factors influencing outcome after deceased heart beating donor kidney transplantation in the United Kingdom: an evidence base for a new national kidney allocation policy. Transplantation. 2010;89:379–86.View ArticlePubMedGoogle Scholar
- Summers DM, Johnson RJ, Allen J, Fuggle SV, Collett D, Watson CJ, et al. Analysis of factors that affect outcome after transplantation of kidneys donated after cardiac death in the UK: a cohort study. Lancet. 2010;376:1303–11.View ArticlePubMedGoogle Scholar
- Summers DM, Johnson RJ, Hudson A, Collett D, Watson CJ, Bradley JA. Effect of donor age and cold storage time on outcome in recipients of kidneys donated after circulatory death in the UK: a cohort study. Lancet. 2013;381:727–34.View ArticlePubMedGoogle Scholar
- Royston P, Lambert PC. Flexible Parametric Survival Analysis Using Stata: Beyond the Cox Model. College Station, Texas: Stata Press; 2011.Google Scholar
- Baskin-Bey ES, Kremers W, Nyberg SL. A recipient risk score for deceased donor renal allocation. Am J Kidney Dis. 2007;49:284–93.View ArticlePubMedGoogle Scholar
- Wolfe RA, McCullough KP, Schaubel DE, Kalbfleisch JD, Murray S, Stegall MD, et al. Calculating life years from transplant (LYFT): methods for kidney and kidney-pancreas candidates. Am J Transplant. 2008;8:997–1011.View ArticlePubMedGoogle Scholar
- Concepts for kidney allocation 2011, Organ Procurement and Transplantation Network.Google Scholar
- A Guide to Calculating and Interpreting the Estimated Post-Transplant Survival (EPTS) Score Used in the Kidney Allocation System (KAS). https://optn.transplant.hrsa.gov/media/1511/guide_to_calculating_interpreting_epts.pdf . Accessed 8 May 2014.
- Israni AK, Salkowski N, Gustafson S, Snyder JJ, Friedewald JJ, Formica RN, et al. New national allocation policy for deceased donor kidneys in the United States and possible effect on patient outcomes. J Am Soc Nephrol. 2014;25:1842–8.View ArticlePubMedPubMed CentralGoogle Scholar
- Latimer N, NICE DSU. Technical Support Document 14: Undertaking survival analysis for economic evaluations alongside clinical trials - extrapolation with patient-level data. 2011. http://www.nicedsu.org.uk. Accessed 8 Dec 2014.Google Scholar
- Royston P, Parmar MK. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Stat Med. 2002;21:2175–97.View ArticlePubMedGoogle Scholar
- Pruthi R, Hilton R, Pankhurst L, Mamode N, Hudson A, Roderick P, et al. UK Renal Registry 16th annual report: chapter 4 demography of patients waitlisted for renal transplantation in the UK: national and centre-specific analyses. Nephron Clin Pract. 2013;125:81–98.View ArticlePubMedGoogle Scholar
- Johnson RJ, Fuggle SV, Mumford L, Bradley JA, Forsythe JL, Rudge CJ. A New UK 2006 National Kidney Allocation Scheme for deceased heart-beating donor kidneys. Transplantation. 2010;89:387–94.View ArticlePubMedGoogle Scholar
- Collett D. Modelling Survival Data in Medical Research 2nd edition ed. London: Chapman & Hall/CRC; 2003.Google Scholar
- Lambert PC, Royston P. Further development of flexible parametric models for survival analysis. Stata J. 2009;9:265–90.Google Scholar
- Harrell Jr FE, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996;15:361–87.View ArticlePubMedGoogle Scholar
- Wolfe RA, McCullough KP, Leichtman AB. Predictability of survival models for waiting list and transplant patients: calculating LYFT. Am J Transplant. 2009;9:1523–7.View ArticlePubMedGoogle Scholar
- Watson CJ, Johnson RJ, Birch R, Collett D, Bradley JA. A simplified donor risk index for predicting outcome after deceased donor kidney transplantation. Transplantation. 2012;93:314–8.View ArticlePubMedGoogle Scholar