Renal function, calcium regulation, and time to hospitalization of patients with chronic kidney disease

BACKGROUND
Chronic kidney disease is associated with disruption of the endocrine system that distorts the balance between calcitriol, calcium, phosphate and parathyroid hormone in the calcium regulation system. This can lead to calcification of the arterial tree and increased risk of cardiovascular disease and death. In this study we develop a health metric, based on biomarkers involved in the calcium regulation system, for use in identifying patients at high risk for future high-cost complications.


METHODS
This study is a retrospective observational study involving a secondary analysis of data from the kidney disease registry of a regional managed care organization. Chronic kidney disease patients in the registry from November 2007 through November 2011 with a complete set of observations of estimated glomerular filtration rate, calcitriol, albumin, free calcium, phosphate, and parathyroid hormone were included in the study (n = 284). Weibull regression model was used to identify the most significant lab tests in predicting "waiting time to hospitalization". A multivariate linear path model was then constructed to investigate direct and indirect effects of the biomarkers on this outcome.


RESULTS
The results showed negative significant direct effects of phosphate and parathyroid hormone on "waiting time to hospitalization". Base on this result, the risk of hospitalization increases 16.8% for each 0.55 mg/dl increase in phosphate level and 13.5% for each 0.467 increase in the natural logarithm of parathyroid hormone. Positive indirect effects of calcitriol surrogate (calcidiol), free calcium, albumin and estimated glomerular filtration rate were observed but were relatively small in magnitude.


CONCLUSION
Variables involved in the calcium regulation system should be included in future efforts to develop a quality of care index for Chronic Kidney disease patients.

Averaging would be a limitation in the current study given our desired to describe causal relationships. Given a longer follow up period of 4 years and a larger sample size, we now can more precisely assess the impact of kidney function and calcium homeostasis imbalances on the risk of hospitalization using longitudinal instead of averaged data.
The main purpose of this study is to develop a health metric, based on variables involved in the calcium regulation system (calcium, PO4, PTH), a kidney function indicator (eGFR), and a set of kidney function associates (calcitriol and albumin), that is related to the risk of future high-cost complications. We test the significance of estimated direct and indirect effects of these kidney function measures and serum chemistry values on waiting time to hospitalization in CKD patients with widely varying severity of disease.
The National Kidney Foundation (NKF) defines CKD as either kidney damage or a sustained kidney glomerular filtration rate (GFR) of less than 60 ml/min/1.73 m 2 for 3 or more months [10]. An estimate of GFR (eGFR) can be calculated from the patient's routine blood tests. The highest incidence rate of End-stage Renal Disease (ESRD) occurs in patients older than 65 years [11]. Age, diabetes mellitus, and hypertension are major predictors of chronic kidney disease [11].
Phosphate (PO4), calcium, activated vitamin D 1,25dihydroxyvitamin D 3 (calcitriol), parathyroid hormone (PTH) and their influence on kidney function play an important role in controlling the level of phosphate and calcium in the bloodstream. Healthy kidneys are rich with 1-alpha hydroxylase enzyme, which plays a major role in turning vitamin D into its active form, calcitriol. Calcitriol acts on vitamin D receptors, which control calcium channels and therefore is an important component of the calcium regulation system. Secondarily, it controls the absorption of phosphate and in turn regulates PTH levels. When kidneys fail, their ability to activate vitamin D is diminished. Without the activated vitamin D to facilitate calcium and phosphate absorption, PTH will increase, which results in calcium and phosphate resorption from the bone [12,13] (see Figure 1). An illustration of the interaction between calcium, phosphate, PTH, and calcitriol is presented in Figure 1 . Silver et al. [3], and Palmer et al. [8] provide additional information.

Data
The data set analyzed contains blood test results: eGFR, calcidiol, albumin, Ca ++ , PO4, and PTH from 284 de-identified patients. The data did not include the measurement of the serum calcitriol and so we use 25hydroxyvitamin D (calcidiol) as a surrogate for serum calcitriol. There is a significant biphasic relationship between serum calcitriol and serum calcidiol. This relation is positive at normal calcidiol levels, because of the effect of substrate deficiency on calcitriol production, but negative at low calcidiol levels, because secondary hyperparathyroidism simulates the synthesis of calcitriol [14]. About 90% of the serum calcidiol measurements, in our data set, were in normal range. So, it is reasonable to used the strongly correlated calcitriol as a surrogate in our study. Therefore, calcidiol effects found in our analyses can be interpreted as calcitriol effects.
The data set were drawn from the kidney disease registry of a managed care organization (MCO) during the 4 year period from Nov. 2007 through Nov. 2011. The data set also included age, gender, service date and a complete financial profile for all medical claims that were paid for these patients over that same time period. Service date is a date that medical services were provided. A service constitutes lab tests, procedures, hospital services, physician services, office visits, or other services for which claims were submitted. Charges for services were matched with corresponding date of service. This data was obtained after review of the study protocol by Figure 1 Interaction between calcium, phosphate, PTH, and calcitriol [12]. http://www.biomedcentral.com/1471-2369/14/154 the university at Buffalo's Health Sciences Institutional Review Board (HSIRB) and permission from the regional MCO.
The registry contained records of 14,264 patients who were treated for kidney disease during this time period. 5,799 of those had confirmed CKD (as defined by eGFR calculated using the MDRD equation of less than 60 ml/min/1.73 m 2 for 3 or more months.) 284 of those had complete observation (see subsequent paragraphs). We compared the 284 selected patients (study group) to the remaining 5,515 CKD patients to check the representativeness of the sample analyzed. For each individual, we calculated the average lab tests over the entire time period and then used PROC TTEST in SAS software. Table 1 gives the summary of the results as the number of observations, mean, and standard deviation for each group and the pooled p-value for mean differences for each variable. Considering the significant level α = 0.05/7 = 0.007, using the Bonferroni correction, the means are not significantly different. Thus, we have no significant evidence that the sample of 284 patients analyzed in this study are not representative of all CKD patients in the registry.
In this data set, a "hospitalization" is said to have occurred when a month of charges exceeded $3,000 and those charges were confirmed to be for hospital services. We defined the date of hospitalization to be the service date with maximum cost in a month with monthly cost greater than $3,000. For each individual, we determined the dates of all hospitalizations, if any.

Definition of variables
It is generally agreed that the ionized (or free) calcium is the form that is biologically active. Because of this, free calcium (Ca ++ ) is a more useful index than total calcium and provides a better indication of calcium status [15][16][17]. Ionized calcium can be measured directly with the use of calcium-specific electrodes. If ionized calcium cannot be measured, however, certain approximations can be utilized to distinguish the protein bound calcium from the ionized fraction calcium. We adopt the following formula from [15] to calculate free calcium.
% protein-bound calcium ≈ 0.8 albumin(g/l) + 0.2 globulin (g/l) + 3 Free calcium (ionzed calcium Ca ++ ) ≈ total calcium − protein bound to calcium For patients with at least one hospitalization, we searched the hospitalization intervals, from the first to the last, for a complete set of observation of lab tests of interest. Hospitalization intervals are; 3 weeks after the first service date until 3 weeks before the first hospitalization, 3 weeks after one hospitalization until 3 weeks before the next hospitalization, and 3 weeks after the last hospitalization until the last service date. We did not include lab tests taken within 3 weeks before a hospitalization due to the fact that these tests could be taken in preparation for that hospitalization, which would make for an artificially short waiting time to hospitalization. We The number of observations, mean and standard deviation for each group and the pooled p-value for mean differences for each variable. ( * ) Study group is the confirmed CKD patients in the registry with use-able complete observations. ( * * ) The remaining confirmed CKD patients in the registry (as defined by at least one eGFR calculated and max{eGFR} ≤ 60 ml/min/1.73 m 2 ). http://www.biomedcentral.com/1471-2369/14/154 also excluded test taken within 3 weeks after the previous hospitalization due to the fact that these tests may reflect a causal association of last hospitalization event on test scores which is not direction of causation we want to study.
To search for a complete set of lab tests in an interval, we selected the first set of observed lab test scores in that interval. Since PO4 comes last in our causal ordering we selected an observation of PO4 which a complete set of other lab tests was observed before or simultaneously with the selected observation. If the first interval did not contain a complete set of lab scores then we moved to the next interval and continued in this fashion until an interval with a complete set of lab scores was observed. If a complete set was not observed in at least one interval then the patient was not included in the analysis data set. After completing this process for each patient, 284 patients remained in the data set to be analyzed.
To define the outcome variable, waiting time to hospitalization (WTH), we considered the hospitalization interval in which a complete set of lab tests was observed and then defined the waiting time to hospitalization by the time from the left of that interval until the next hospitalization. For individuals with no hospitalization, we selected the first lab tests in the time interval from 3 weeks after first service date until the last service date. For these individuals with no hospitalization, waiting time to hospitalization is censored at C, where C is the time from 3 weeks after the first service date until the last service date. Also for individuals with at least one hospitalization and with a complete set of lab observations only after the last hospitalization, waiting time to hospitalization is also censored at C, where C is the time from 3 weeks after the last hospitalization until the last service date. (see Figure 2) Normal probability plots were used to assess the normality of each variable (in a statistical sense). If a variable was non-normal a normalizing transformation was applied. We used the natural log transformation of the non-normal variable PTH to normalize it. No transformation was required for other lab variables because they appeared to be normally distributed, based on normal probability plot. The normal limits for lab test scores studied are given in Table 2. Normal limits for ln(PTH) were calculated by similarly transforming the upper and lower normal limits of PTH.
Using the normal limits in Table 2, we standardized each ln(PTH) and each variable that did not require transformation. The average of the corresponding normal limits was taken as the mean, and the range of the normal limits divided by 4 as the standard deviation in the standardization calculations. Let z-eGFR, z-albumin, z-calcidiol, z-PTH, z-PO4 and z-Ca ++ denote the z-scores of the corresponding variables (transformed variable in the case of PTH).

Statistical methods
We first used a censored regression model [18,19] to estimate the effect of each kidney function and each calcium homeostasis covariate while controlling for other covariates on the mean of the natural log of waiting time to hospitalization. The other variables in causal ordering given in the next section are modeled as function of the previous variables in that ordering, using multiple regression analysis.
In the censored regression model ln(Y ) = μ+γ Z +σ . with Weibull distribution for Y, σ is the scale parameter and is the error term. Z is vector of covariates and γ is the vector of regression coefficients, which represent , where t 1 is 21 days after the first service date and t 2 is 21 days before the first hospitalization. III. Patients with hospitalization who had their first complete set of observation in the k th hospitalization interval [t 1k , t 2k ], where t 1k is 21 days after the k th hospitalization and t 2k is 21 days before the (k + 1) th hospitalization. IV. Patients with hospitalization who had their first complete set of observation in the last hospitalization interval [t 1l , t 2l ], where t 1l is 21 days after the last hospitalization and t 2l is the last service date. http://www.biomedcentral.com/1471-2369/14/154 The Weibull model also has a proportional hazards interpretation. We used residuals to investigate the proportional hazards assumption, using martingale and deviance residuals for the Cox proportional hazards regression analysis. There is no indication of a lack of fit of the model to individual observations. We also created plots of the cumulative martingale residuals against the values of the each covariate and computed the p-value of a Kolmogorov-type supremum tests based on a sample of 1,000 simulated residual patterns. There was no evidence to reject the proportional hazard assumption. With the Cox proportional hazards model, the regression coefficients correspond directly to the log of hazard ratios.
The hazard ratio corresponding to the k th variable for a Weibull model is given by exp(−β k /σ ) where β k is the corresponding regression coefficient for the k th variable and σ is the scale parameter.

Multivariate path analysis
Path analysis is a statistical technique used to examine direct and indirect relationships among a set of causally ordered variables. Such relationships, if linear, can be described in a system of linear equations with random variables of interest and unknown parameters. Path analysis was first developed by Sewall Wright in the 1930s for use in phylogenetic studies [20,21]. Classical path analysis assumes a complete casual ordering of variables with unidirectional relationships (i.e., no feedback loops.) Path analysis allows us decompose total effects of upstream variables on subsequent variables in the causal ordering into direct effects and indirect effects. While classical methods require a complete causal ordering of variables, Multivariate Linear Path Model (MLPM) allows a relaxation of this assumption, requiring only a causal ordering of sets of variables. For details we refer to Pak [22,23].
In linear path analysis direct effects, which are the corresponding regression coefficients, and indirect effects; i.e., those acting through an intermediate variable W, in the pathway X → W → Z, can be examined. Indirect effects can be calculated by multiplying the corresponding direct effects. For example, indirect effect of X on Z in the pathway X → W → Z is product of direct effect of X on W and direct effect of W on Z. The p-value for testing the indirect effects can be found using the Intersection Union Test (IUT) [24]. Total effects are defined as the sum of direct and all possible indirect effects.
In this study not all of the variables of interest can be causally ordered in a unidirectional fashion. Sets of variables, however, can be ordered. Thus, we use multivariate linear path modeling, which requires only partial ordering; i.e., an ordering of sets of variables. The models then involve multivariate regression models in place of the univariate regression models associated with the classical path analysis. In the multivariate setting causal ordering is not necessary within each set of variables and the errors in the model for those variables are not required to be independent. In this application we assume the following causal ordering. Considering age as the exogenous variable and ln(Y) as the endpoint endogenous variable, we assume the following casual ordering of the variables, illustrated above. Note that we do not assume any causal ordering between z-eGFR, z-albumin and z-calcidiol, thus making our problem one that is conducive to MLPM.
The assumption of casual ordering of variables, illustrated above, is based on the review of the literature presented above. Low eGFR, low calcitriol and low albumin can be manifestations of underlying kidney damage. Calcitriol has effects on calcium levels in the bloodstream. Low calcitriol inhibits absorption of calcium more than phosphate from the intestines, while resorption from the bone is equal, thus contributing to high serum phosphate levels. Low calcium induces high PTH, which causes calcium and phosphate resorption from bone. This bone resoption raises calcium to its proper level while raising phosphate above its desired level.
Estimation of the multivariate path model parameters is achieved by the ordinary least square method [24,25] applied separately to each model (univariate or multivariate, as appropriate) for all variables except ln(Y) = ln ( WTH-21). For ln(Y) and associated censored regression model, ordinary least square estimation of the parameters would produce biased estimators. So, we used PROC LIF-EREG, in SAS software [19], which estimates the parameters using maximum likelihood estimation for regression of the lab tests on ln(Y), assuming a Weibull distribution for Y.
For the measures of goodness of fit, in the simple linear regression models, we used the squared coefficient of multiple determination, R 2 , to measure the predictivity of the model outcome variable in terms of explained variability of the dependent variable [25]. We note that for the censored regression model, R 2 can not be calculated due to censoring. So for the measure of explained variation by this model, we used the Kent and O'Quigley [26] product-moment correlation coefficient R 2 PM to measure the strength of relationship of ln(Y) with the covariates, in total. For the Censored regression model given by ln(Y ) = μ + γ Z + σ . , we have where σ is the scale parameter and σ 2 = π 2 /6 ≈ 1.645 is the variance of error term in the Weibull model. This quantity was estimated bŷ whereγ andσ are the MLE's of γ and σ from the Weibull regression analysis.
There were 70 individual with no hospitalization out of the 284 patients in our study and, hence, had censored observation. From the 214 individuals with hospitalization only 137 patients had a complete set of observed lab tests prior to the next hospitalization and 77 patients had a complete set of observed lab tests only after the last hospitalization. We had 147 (70 + 77) right censored values out of the 284 observations. Table 3 gives a summary of the results of direct effects (top value in each cell) and the p-values (bottom value) for the variables in each equation in the multivariate path model. It also gives the estimates of the mean square errors (MSE) and r-squares of the regression equations.

Results of path modeling
In Table 3 We note that the error term in the censored regression model for ln(Y), Y = WTH-21, is σ where σ is the scale parameter of the Weibull distribution, with estimated valueσ = 0.88 from the result, and is an error with variance σ 2 = π 2 /6 ≈ 1.645. Thus, the estimate of the conditional variance of the ln(Y) given covariates in the model, which isσ 2 σ 2 ≈ 1.27, is given in Table 3 instead of MSE. For this censored regression model the Kent and O'Quigley product-moment correlation coefficient measureR 2 PM = 0.09 is given, while the usual R 2 is presented for each other equation.
The primary purpose of path modeling is to propose a plausible interpretation of the observed data and to describe effects in terms of significant direct and indirect effects. To find the best fitted model we used backward selection with significance level α = 0.05 for each linear model to obtain the most parsimonious model that includes only significant effects. Table 4 gives the estimates of direct effects and the pvalues for the variables that remained in the parsimonious http://www.biomedcentral.com/1471-2369/14/154  model. It also gives the estimates of mean square errors and r-squares for the models for upstream variables in the causal ordering. Figure 3 illustrates the significant associations of variables in the parsimonious multivariate path model with ln(Y). The arrows indicate single direct effects and the estimates of these direct effects are given on each arrow. The sign on each estimate of each direct effect indicates the direction of the association. Theses path coefficients measure the effect of a one standard deviation change in each original predictor variable on the response variable. Indirect effects can be calculated by multiplying the corresponding direct effects along the indirect path from a variable to ln(Y). For example, indirect effect of X on Z in the pathway X → W → Z is product of direct effect of X on W and direct effect of W on Z. The p-value for testing the indirect effects can be found using the Intersection Union Test (IUT) [20,21,24].   • Age had a significant second order indirect effects, one through z-calcidiol and then through z-PTH (0.0003598, p-value< 0.05) and a third order indirect effect through z-calcidiol, z-Ca ++ and z-PTH (0.0000739, P-value< 0.05) for the total effect of 0.00043. This effect is clinically insignificant.

Discussion
From the result of the censored regression model, we identified that PTH and PO4 are the most significant predictors of high-cost hospitalization and, along with albumin, they have the largest effects (see Table 3). The prediction equation from the parsimonious model is The right hand side of this equation defines a calcium regulation-base health metric for CKD.
The negative significant effect of PTH on waiting time to hospitalization indicates patients with higher PTH are at higher risk of hospitalization. With Weibull model for ln(Y), the regression coefficients correspond directly to the log of hazard ratios so that a negative coefficient for a particular covariate corresponds to a higher risk of hospitalization occurring. The estimated hazard ratio corresponding to the effect of PTH is exp(0.112/0.88) = 1.135. The hazard ratio of 1.135 indicates that the risk of hospitalization will increase 13.5% for each 1 standard deviation increase in z-PTH. A one standard deviation increase in z-PTH corresponds to an approximate 0.467 increase in the natural logarithm of PTH level (recall Figure 3 The parsimonious multivariate path model. http://www.biomedcentral.com/1471-2369/14/154 that the natural logarithm transform was used in defining z-PTH) or, equivalently a 60% increase in PTH.
The negative significant effect of phosphate indicates higher risk of hospitalization for patients with higher phosphate levels. A hazard ratio of 1.168 associated with this effect indicates that the risk of a hospitalization will increase 16.8% for each 1 standard deviation increase in z-PO4. A one standard deviation increase in z-PO4 is approximately 0.55 mg/dl increase in PO4 level. Thus, the risk of hospitalization is estimated to increase by 16.8% for each 0.55 mg/dl increase in PO4.
Extreme elevations of PTH levels along with hyperphosphatemia could result from an abnormal form of plasma PTH. The latter could result from an abnormal conversion of the pro-hormone to its secreted form [27].
Although, we did not find significant direct effects of eGFR, albumin, calcidiol (calcitriol surrogate) and Ca ++ on ln(Y), and hence on WTH, they are the most significant predictors of PTH and PO4. Therefore they had significant indirect effects on WTH through PTH and PO4. These effects are small, however, relative to the direct effects of PTH and PO4 on WTH.
Free calcium (Ca ++ ) was not observed directly in this study. Instead a derived version [16] was used as an approximation. We repeated our analysis using total serum calcium (Ca) instead of free calcium (Ca ++ ) and obtained essentially the same results. Since Ca ++ is what is regulated [15,16] by the calcium regulation system, we chose to present the analysis of Ca ++ in this paper.
Although, previous studies suggest that high serum alkaline phosphatase (ALP) levels predict mortality independent of bone metabolism parameters and liver function tests in CKD and chronic hemodialysis patients [28], we did not find any association between ALP and WTH when adding ALP in the model. Thus, we removed ALP from our model.
The current results provide a calcium regulation based health metric for identifying patients who are at high risk for future high-cost complications and that can be used as an outcome variable in assessing quality of care. Since the O'Quigley R 2 for the ln(Y) equation is low, however, more work is needed to identify additional lab results that are also predictive. The potential role of Fibroblast growth factor FGF-23 should be considered as candidates for addition. Fibroblast growth factor FGF-23 is a recently discovered regulator of calcium-phosphate metabolism. In CKD patients, FGF-23 levels rise in parallel with declining renal function long before a significant increase in serum phosphate concentration can be detected [29]. Unfortunately, FGF-23 was not available in our data set.
Race is known to be associated with both CKD progression and outcomes and is believed to have effects on PTH and calcium-phosphate metabolism. Our data, however, was provided without a race variable. It should be noted that the MCO calculated race specific eGFR's and that those analyzed in this study are race adjusted. Further study should include race as a factor in addition to using it in the calculation of eGFR.
From 14,264 patients in the registry of the MCO only 5,799 had confirmed CKD. 546 of them had observed lab test scores of interest during the 4 year period of study. Out of 546 patients 284 had observed lab scores in at least one hospitalization interval. Work to develop methods of imputation in the context of longitudinal data analysis is ongoing. Once completed, the method developed will allow more complete utization of data in the kidney registry.
Additional work with data rich in lab scores is needed to identify the additional risk factors for use in predictive models to develop a health metric for assessing the quality of care of CKD patients, while adjusting for the illness severity of their case mix. Such a metric, if highly predictive of hospitalization, ESRD or death outcome could serve as the foundation of a reimbursement system that fairly rewards physicians for quality of care. The results of the current study are encouraging and justify a similar prospectively designed investigation.

Conclusion
Variables involved in the calcium regulation system should be included in future efforts to develop a quality of care index for Chronic Kidney disease patients.