Technical advance
Assessing the agreement of biomarker data in the presence of left-censoring
BMC Nephrology volume 15, Article number: 144 (2014)
In many clinical biomarker studies, Lin’s concordance correlation coefficient (CCC) is commonly used to assess the level of agreement of a biomarker measured under two different conditions. However, measurement of a specific biomarker typically cannot provide accurate numerical values below the lower limit of detection (LLD) of the assay, which results in left-censored data. Most researchers discard the data below the LLD or apply simple data imputation methods in the presence of left-censored data, such as replacing values below the LLD with a fixed number less than or equal to the LLD. This is not statistically optimal, because it often leads to biased estimates and overestimates the precision.
We describe a simple method using a bivariate normal distribution in this situation and apply SAS statistical software to arrive at the maximum likelihood (ML) estimate of the parameters and construct the estimate of the CCC. We conduct a computer simulation study to investigate the statistical properties of the ML method versus the data deletion and simple data imputation method. We also contrast the methods with real data using two urine biomarkers, Interleukin 18 and Cystatin C.
The computer simulation studies confirm that the ML procedure is superior to the data deletion and simple data imputation procedures. In all of the simulated scenarios, the ML method yields the smallest relative bias and the highest percentage of the 95% confidence intervals that include the true value of the CCC. In the first simulation scenario (sample size of 100 paired data points, 25% left-censoring for both members of the pair, true CCC of 0.238), the relative bias is −1.43% for the ML method, −40.97% for the data deletion method, and it ranges between −12.94% and −21.72% for the simple data imputation methods. Similarly, when the left-censoring for one of the members of the data pairs increases from 25% to 40%, the relative bias displays the same pattern for all methods.
When estimating the CCC from paired biomarker data in the presence of left-censored values, the ML method works better than data deletion and simple data imputation methods.
Biomarkers in blood and urine are important indicators for the diagnosis of diseases and risk-stratification. During the development of biomarker assays, several pre-analytic steps require comparison of paired values of biomarkers exposed to separate conditions, such as varying degrees of storage, freeze-thaw cycles, and different antibodies. For these experiments, comparison of paired biomarker values is a critical step to advance the development to the next stage. In practice, assays often have lower limits of detection (LLD) due to the limitations of analytic procedures, which makes the comparison of paired values challenging. A data point below the detection limit is left-censored: its exact value is unknown, and it is known only to lie below the LLD. Although left-censored data are more informative than missing data, they still lead to challenges in the data analysis.
Simple (ad hoc) approaches to address the left-censored data are to delete the value below the LLD or impute a fixed value such as one-half of the LLD or the LLD itself. However, these approaches yield biased estimates of the parameters of interest and they underestimate the variability in the data set because the same value is imputed repeatedly [1–3]. Urine biomarkers are very prone to this problem as the concentration of the biomarkers is greatly influenced by urine volume. In diluted urine, biomarker values may be below the LLD. Also, biomarkers whose concentrations are below ng levels are prone to this problem of having values below LLD. For example, IL-18 is measured in pg/vol in urine, and thus usually has higher proportions of values below the LLD compared to other biomarkers. Many researchers have stressed the importance of data that are below the LLD [1–4].
From a statistical point of view, we expect the ad hoc methods (data deletion or simple imputation) to yield estimates that differ from those of the ML approach and that are biased. The ML method is an ideal approach for handling left-censored data because it accounts for the distribution of the data in the detectable range and extrapolates into the region below the LLD.
The aim of this study is to show that when faced with left-censored data, the ML approach based on a bivariate normal (or lognormal) assumption for estimating the CCC between two assays is a more appropriate approach to use in practice than the ad hoc approaches that involve data deletion or simple data imputation.
In Tables 1, 2, 3, 4, 5 and 6, we report the results of a simulation study to assess the means and the standard deviations for estimating the CCC based on the ML approach and compare them to the four different methods that are described in the methods section, which frequently are applied in clinical research. In addition to means and standard deviations, we also report the relative bias, the mean of the standard error, and the percentage of 95% confidence intervals (CI) that include the true value of CCC for the 1,000 simulated data sets.
As demonstrated in Tables 1 and 2, the estimates from the four simple approaches are clearly biased, although replacing non-detectable data with a fraction of the detection limit or with the detection limit itself is preferable to deleting the pair across the range of sample sizes. From Table 1 (sample size of 100 paired data points, 25% left-censoring for X, 25% left-censoring for Y, and a true CCC of 0.238), the relative bias is −1.43% for the ML method, −40.97% for the data deletion method, and it ranges between −12.94% and −21.72% for the simple data imputation methods. These four ad hoc methods also overstate the precision by underestimating the standard error because the same value is imputed repeatedly. As expected, the ML method provides an excellent estimate of the true value of the CCC even when the censoring percentage increases, although it tends to slightly underestimate the true value. Moreover, the ML approach yields the smallest relative bias and the highest percentage of 95% CIs that include the true value of the CCC among the five methods. To see the impact of the censoring percentage, in Table 2 we increase the censoring rate for X to 40%. The relative bias increases for all approaches. However, the ML approach still yields the smallest relative bias. From Table 2 (sample size of 100 paired data points, 40% left-censoring for X, 25% left-censoring for Y, and a true CCC of 0.238), the relative bias is −1.68% for the ML method, −47.90% for the data deletion method, and it ranges between −18.95% and −22.27% for the simple data imputation methods. The ML method also has the highest percentage of confidence intervals that include the true value of the CCC.
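The summary measures reported in the tables can be illustrated with a short sketch. The Python below is not the authors' SAS code. It assumes the usual definitions, which this section does not spell out: relative bias as the percentage deviation of the mean estimate from the true value, and coverage as the share of Wald-type 95% intervals that contain the true value. The function name `summarize_simulation` is hypothetical.

```python
from statistics import mean, stdev


def summarize_simulation(estimates, std_errors, true_ccc, z=1.959964):
    """Summarize replicated CCC estimates from a simulation study.

    estimates  : one estimated CCC per simulated data set
    std_errors : the matching estimated standard errors
    true_ccc   : the CCC implied by the simulation parameters
    z          : standard normal critical value for a 95% interval
    """
    avg = mean(estimates)
    # Assumed definition: relative bias (%) = 100 * (mean estimate - truth) / truth.
    relative_bias = 100.0 * (avg - true_ccc) / true_ccc
    # An interval estimate +/- z * SE covers the truth when |estimate - truth| <= z * SE.
    covered = sum(
        1
        for est, se in zip(estimates, std_errors)
        if abs(est - true_ccc) <= z * se
    )
    return {
        "mean": avg,
        "sd": stdev(estimates),
        "relative_bias_pct": relative_bias,
        "mean_se": mean(std_errors),
        "coverage_pct": 100.0 * covered / len(estimates),
    }
```

Under this assumed definition, a relative bias of −1.43% at a true CCC of 0.238 corresponds to a mean estimate of roughly 0.235.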
Because of the large sample size (n = 100) in Tables 1 and 2, the ML method displays an excellent result for estimating the CCC with respect to the relative bias and the percentage of confidence intervals that include the true value of the CCC. However, if the sample size were smaller, then the ML method might produce less convincing results. To examine this point, we repeat the simulation studies with sample sizes of 50 (Tables 3 and 4) and 25 (Tables 5 and 6). In both cases, the ML method still performs best among the five approaches according to the means and the standard deviations for estimating the CCC, the relative bias, the mean of the standard error, and the percentage of 95% CIs that include the true value of the CCC.
We illustrate these issues further via a urine stability study to assess agreement for two assays with lower limits of detection. The data set came from the multi-center ASSESS AKI Study (the Assessment, Serial Evaluation, and Subsequent Sequelae in Acute Kidney Injury). The data set was originally analyzed by Parikh et al. The purpose of the ASSESS AKI sub-study was to determine the agreement between the measurements of the urinary biomarkers collected under a standard condition and under different experimental conditions, denoted as Process A, Process B, and Process C. Each experimental situation consisted of 50 paired samples (a selected process versus the standard). There are two biomarkers that we consider here: urine Interleukin 18 (IL-18; LLD = 12.5 pg/ml), and urine Cystatin C (LLD = 0.005 mg/ml). The IL-18 contained 99 undetectable readings (out of a total of 300), yielding a 33% left-censoring rate. The Cystatin C contained 80 undetectable readings, for a 26.7% left-censoring rate. A natural logarithm transformation was applied to both the IL-18 and Cystatin C readings. We treat the natural logarithm of Process A, Process B, and Process C as the X variable and the natural logarithm of the reference standard as the Y variable.
Tables 7 and 8 summarize the results based on the four ad hoc approaches and the ML method for estimating the CCC when comparing the reference standard process to Process A, Process B, and Process C for IL-18 and Cystatin C, respectively. As this example suggests, the four simple approaches can lead to CCC estimates that differ from the CCC estimated by the ML method. For example, when comparing Process B to the standard for urine IL-18 in Table 7, the CCC estimate is 0.73 from the data deletion method, 0.61 from each of the simple data imputation methods, and 0.68 from the ML method.
Biomarkers are being discovered at an accelerated rate due to the availability of genomic and proteomic technologies. Several of these candidate biomarkers are undergoing validation to diagnose diseases and serve as indices for predicting health outcomes. The main purpose of our study was to assist the biomarker development program by confirming that the simple data imputation approaches and the deletion of data are not optimal techniques for arriving at accurate (unbiased) results with the appropriate level of precision in the presence of left-censored data.
Many researchers have stressed the importance of data that are below the LLD [1–4]. Hornung and Reed proposed three methods of estimation with a left-censored lognormal distribution: a maximum likelihood (ML) method and two methods involving the limit of detection. However, they concluded that the ML method is complex to calculate, so they recommended using one-half of the LLD. Lyles et al. evaluated Pearson's correlation coefficient when a subset of data points was below the LLD by using the ML approach under the assumption of bivariate normality. They showed that the ML method was the most accurate among the proposed methods. Barnhart et al. presented a generalized estimating equations (GEE) approach for estimating parameters to calculate the concordance correlation coefficient (CCC), which is a measure of agreement ranging between −1 and +1 for paired data. The GEE approach works well and does not require the bivariate normality assumption if the sample size is large enough, and it is comparable to the ML approach when the bivariate normality assumption is appropriate. Parikh et al. performed a prospective study on hospitalized patients, almost 60% of whom had acute kidney injury (AKI). Five urine biomarkers were used to compare the stability of short-term storage and processing by using the CCC as a measure of agreement. To estimate the CCC, the authors applied the ML method using log-transformed data and accounting for values below the LLD.
We have illustrated with our computer simulation study that estimating the CCC with the imputation methods or data deletion leads to biased estimates compared to the ML approach. We also have shown via the computer simulation study that the proportion of left-censored data affects the degree of bias in estimating the CCC. Our simulation study shows that the ML approach based on the bivariate normality assumption works best among all of the studied approaches. The advantages of the ML approach are that it is accurate (small relative bias) and accounts for the variability in the data set appropriately. Additionally, it uses all the available data for the statistical analysis, in contrast to the data deletion approach, which uses only sample pairs with both values above the LLD. The estimates from the data deletion approach are clearly biased and result in (1) a large relative bias and (2) a high standard error due to the small sample size that remains after deleting paired data points. Although assigning a fixed value such as the LLD (or one-half of the LLD, or the LLD multiplied by a random number from the uniform(0, 1) distribution) yields smaller relative biases than the data deletion approach, the precision from these methods is overestimated due to the assignment of the same value to data below the LLD.
Although we did not investigate the performance of the ML method for censoring above 40%, we expect that the ML method still will perform well when censoring exceeds 50%. Lyles et al. investigated 60% censoring for their situation and the ML method still maintained a high level of accuracy.
The ML approach is very accurate in that it yields small relative biases if the assumption of bivariate normality is appropriate, and it can be readily implemented using SAS PROC NLMIXED [see Additional file 1 for a sample program]. Thus, our simulation study suggests that the ML approach is best for biomarker assay development where paired results need to be compared.
To find the optimal method to deal with left-censored data, we investigate how data deletion and simple data imputation methods compare to the ML approach in a computer simulation study. We adapt the framework from Barnhart et al. for our computer simulation studies. In all simulations described in the results, we generate bivariate normal data for paired data represented by the variables X and Y with a sample size of 100, 50, or 25 for each of 1,000 data sets, using one of the following six combinations of parameter settings for the means, standard deviations, and correlation coefficient: $\mu_x = 0$, $\mu_y = 0.2$, $\sigma_x = 0.8$, $\sigma_y = 1$, $\rho = 0.25, 0.50, 0.75$, and left-censoring rates of (25% for X, 25% for Y) or (40% for X, 25% for Y). The selected values of the LLDs in the simulation study are determined by the censoring rates. All calculations are performed using SAS 9.3 statistical software. All estimated CCCs ($\hat{\rho}_c$) were obtained by maximizing the likelihood function for each of the following five scenarios.
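The simulation design above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' SAS 9.3 program. It assumes that each LLD is placed at the theoretical quantile of the marginal normal distribution that matches the target censoring rate, and it marks censored values as `None`. All function names are hypothetical. The true CCC is computed from Lin's formula, $\rho_c = 2\rho\sigma_x\sigma_y / (\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2)$.

```python
import math
import random
from statistics import NormalDist


def true_ccc(mu_x, mu_y, sd_x, sd_y, rho):
    """Lin's CCC implied by the bivariate normal parameters."""
    return 2 * rho * sd_x * sd_y / (sd_x**2 + sd_y**2 + (mu_x - mu_y) ** 2)


def lld_for_rate(mu, sd, rate):
    """Assumed rule: put the LLD at the `rate` quantile of N(mu, sd^2)."""
    return NormalDist(mu, sd).inv_cdf(rate)


def simulate_pairs(n, mu_x, mu_y, sd_x, sd_y, rho, rng):
    """Draw n bivariate normal pairs from two independent standard normals."""
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mu_x + sd_x * z1
        # Mixing z1 and z2 this way gives corr(X, Y) = rho.
        y = mu_y + sd_y * (rho * z1 + math.sqrt(1 - rho**2) * z2)
        pairs.append((x, y))
    return pairs


def censor(pairs, lld_x, lld_y):
    """Replace values below the detection limit with None."""
    return [
        (x if x >= lld_x else None, y if y >= lld_y else None)
        for x, y in pairs
    ]


if __name__ == "__main__":
    rng = random.Random(2014)
    params = dict(mu_x=0.0, mu_y=0.2, sd_x=0.8, sd_y=1.0, rho=0.25)
    lld_x = lld_for_rate(params["mu_x"], params["sd_x"], 0.25)
    lld_y = lld_for_rate(params["mu_y"], params["sd_y"], 0.25)
    data = censor(simulate_pairs(100, rng=rng, **params), lld_x, lld_y)
    print(round(true_ccc(**params), 3), sum(x is None for x, _ in data))
```

With $\rho = 0.25$ the formula gives $0.4 / 1.68 \approx 0.238$, the true CCC quoted for Tables 1 and 2.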
The pair deletion method discards pairs in which X, Y, or both X and Y lie below the detection limit before the CCC is calculated. The 95% confidence interval (CI) of this method is calculated by using $\hat{\rho}_c \pm Z_{0.025} \, SE(\hat{\rho}_c)$, where $\hat{\rho}_c$ is the estimated CCC, $Z_{0.025}$ is the critical value of the standard normal distribution, and $SE(\hat{\rho}_c)$ is the standard error of the estimated CCC.
Replacing the left-censored data by the LLD method refers to the use of the CCC after replacing all non-detectable data by the applicable detection limit. The 95% confidence interval (CI) of this method is calculated by using $\hat{\rho}_c \pm Z_{0.025} \, SE(\hat{\rho}_c)$, where $\hat{\rho}_c$ is the estimated CCC, $Z_{0.025}$ is the critical value of the standard normal distribution, and $SE(\hat{\rho}_c)$ is the standard error of the estimated CCC.
Replacing the left-censored data by one-half of the LLD method refers to the calculation of the CCC using all pairs after replacing the non-detectable data with 0.5 times the detection limit. The 95% confidence interval (CI) of this method is calculated by using $\hat{\rho}_c \pm Z_{0.025} \, SE(\hat{\rho}_c)$, where $\hat{\rho}_c$ is the estimated CCC, $Z_{0.025}$ is the critical value of the standard normal distribution, and $SE(\hat{\rho}_c)$ is the standard error of the estimated CCC.
Replacing the left-censored data by c × LLD method refers to the situation in which we first generate a random number from the uniform(0, 1) distribution, say c. Then, we replace each non-detectable data point with c times the detection limit. A new value of c is determined for each non-detectable data point. The 95% confidence interval (CI) of this method is calculated by using $\hat{\rho}_c \pm Z_{0.025} \, SE(\hat{\rho}_c)$, where $\hat{\rho}_c$ is the estimated CCC, $Z_{0.025}$ is the critical value of the standard normal distribution, and $SE(\hat{\rho}_c)$ is the standard error of the estimated CCC.
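The four ad hoc methods described so far can be sketched as follows. This Python is an illustration, not the authors' implementation, and the function names are hypothetical. It rests on two assumptions that the text does not state: the imputation is applied to raw (positive) concentrations and the natural logarithm is taken afterwards, as in the real-data example, and the CCC is computed with moment estimates that divide by n. Censored readings are represented as `None`.

```python
import math
import random
from statistics import mean


def sample_ccc(pairs):
    """Lin's sample CCC from moment estimates (divisor n, an assumption)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(pairs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


def ad_hoc_ccc(pairs, lld_x, lld_y, method, rng=random):
    """CCC on the log scale after one of the four ad hoc treatments.

    method: "delete"  -> drop any pair with a censored member
            "lld"     -> replace censored values with the LLD
            "half"    -> replace censored values with 0.5 * LLD
            "uniform" -> replace each censored value with c * LLD,
                         drawing a new c for every censored value
    """
    if method == "delete":
        filled = [(x, y) for x, y in pairs if x is not None and y is not None]
    else:
        draw = {
            "lld": lambda: 1.0,
            "half": lambda: 0.5,
            # 1 - random() lies in (0, 1], which avoids log(0) below.
            "uniform": lambda: 1.0 - rng.random(),
        }[method]
        filled = [
            (
                x if x is not None else draw() * lld_x,
                y if y is not None else draw() * lld_y,
            )
            for x, y in pairs
        ]
    return sample_ccc([(math.log(x), math.log(y)) for x, y in filled])
```

For example, `ad_hoc_ccc(pairs, 12.5, 12.5, "half")` would treat every undetectable IL-18 reading as 6.25 pg/ml before the log transformation.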
The ML approach is performed by constructing a likelihood function based on the bivariate normal distribution of the data in the detectable range, and then extrapolating into the region below the LLD. The 95% confidence interval (CI) of this method is calculated by using $\hat{\rho}_c \pm Z_{0.025} \, SE(\hat{\rho}_c)$, where $\hat{\rho}_c$ is the estimated CCC, $Z_{0.025}$ is the critical value of the standard normal distribution, and $SE(\hat{\rho}_c)$ is the standard error of the estimated CCC. An additional file displays a sample SAS program for the calculations [see Additional file 1] and another additional file explains this ML approach in more detail [see Additional file 2].
Hornung RW, Reed LD: Estimation of average concentration in the presence of nondetectable values. Appl Occup Environ Hyg. 1990, 5: 46-51. 10.1080/1047322X.1990.10389587.
Lyles RH, Williams JK, Chuachoowong R: Correlating two viral load assays with known detection limits. Biometrics. 2001, 57: 1238-1244. 10.1111/j.0006-341X.2001.01238.x.
Barnhart HX, Song J, Lyles RH: Assay validation for left-censored data. Stat Med. 2005, 24: 3347-3360. 10.1002/sim.2225.
Parikh CR, Butrymowicz I, Yu A, Chinchilli VM, Park M, Hsu C, Reeves WB, Devarajan P, Kimmel PL, Siew ED, Liu KD: Urine stability studies for novel biomarkers of acute kidney injury. Am J Kidney Dis. 2013, 63: 567-572.
Patterson SD, Aebersold RH: Proteomics: the first decade and beyond. Nat Genet. 2003, 33: 311-323. 10.1038/ng1106.
Lin LI: A concordance correlation coefficient to evaluate reproducibility. Biometrics. 1989, 45: 255-268. 10.2307/2532051.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2369/15/144/prepub
UD and VMC are supported by research grant U01DK082183 from the National Institute of Diabetes and Digestive and Kidney Diseases of the National Institutes of Health, U.S. Department of Health and Human Services. CRP is supported by the NIH grant K24DK090203. CRP is also a member of the NIH-sponsored Assessment, Serial Evaluation, and Subsequent Sequelae in Acute Kidney Injury Consortium (U01DK082185).
The views expressed do not necessarily represent the views of the Department of Health and Human Services, the National Institutes of Health, the National Institute of Diabetes and Digestive and Kidney Diseases, or the United States Government.
The computing programs from this paper are available from Uthumporn Domthong upon request.
The authors declare that they have no competing interests.
UD performed the statistical analysis and drafted the manuscript. CRP and PLK provided the expertise on biomarker kidney studies. VMC conceived of the study, and participated in its design and coordination. All authors read and approved the final manuscript.
Uthumporn Domthong, Chirag R Parikh, Paul L Kimmel and Vernon M Chinchilli contributed equally to this work.