- Technical advance
Assessing the agreement of biomarker data in the presence of left-censoring
- Uthumporn Domthong†^{1},
- Chirag R Parikh†^{2, 3},
- Paul L Kimmel†^{4},
- Vernon M Chinchilli†^{1} and
- the Assessment, Serial Evaluation, and Subsequent Sequelae of Acute Kidney Injury (ASSESS-AKI) Consortium
https://doi.org/10.1186/1471-2369-15-144
© Domthong et al.; licensee BioMed Central Ltd. 2014
- Received: 9 June 2014
- Accepted: 29 August 2014
- Published: 3 September 2014
Abstract
Background
In many clinical biomarker studies, Lin’s concordance correlation coefficient (CCC) is commonly used to assess the level of agreement of a biomarker measured under two different conditions. However, measurement of a specific biomarker typically cannot provide accurate numerical values below the lower limit of detection (LLD) of the assay, which results in left-censored data. Most researchers discard the data below the LLD or apply simple data imputation methods in the presence of left-censored data, such as replacing values below the LLD with a fixed number less than or equal to the LLD. This is not statistically optimal, because it often leads to biased estimates and overestimates the precision.
Methods
We describe a simple method using a bivariate normal distribution in this situation and apply SAS statistical software to arrive at the maximum likelihood (ML) estimates of the parameters and construct the estimate of the CCC. We conduct a computer simulation study to investigate the statistical properties of the ML method versus the data deletion and simple data imputation methods. We also contrast the methods with real data using two urine biomarkers, Interleukin 18 and Cystatin C.
Results
The computer simulation studies confirm that the ML procedure is superior to the data deletion and simple data imputation procedures. In all of the simulated scenarios, the ML method yields the smallest relative bias and the highest percentage of the 95% confidence intervals that include the true value of the CCC. In the first simulation scenario (sample size of 100 paired data points, 25% left-censoring for both members of the pair, true CCC of 0.238), the relative bias is −1.43% for the ML method, −40.97% for the data deletion method, and it ranges between −12.94% and −21.72% for the simple data imputation methods. Similarly, when the left-censoring for one of the members of the data pairs increases from 25% to 40%, the relative bias displays the same pattern for all methods.
Conclusions
When estimating the CCC from paired biomarker data in the presence of left-censored values, the ML method works better than data deletion and simple data imputation methods.
Keywords
- Acute Kidney Injury
- Standard Normal Distribution
- Generalize Estimate Equation
- Relative Bias
- Concordance Correlation Coefficient
Background
Biomarkers in blood and urine are important indicators for the diagnosis of diseases and risk-stratification. During the development of biomarker assays, several pre-analytic steps require comparison of paired values of biomarkers exposed to separate conditions, such as varying degrees of storage, freeze-thaw cycles, and different antibodies. For these experiments, comparison of paired biomarker values is a critical step to advance the development to the next stage. In practice, assays often have lower limits of detection (LLD) due to the limitation of analytic procedures, thereby making the comparison of paired values challenging. A data point below the detection limit is equivalent to being left censored because the exact value of the data point is unknown – it only is known that it lies below the LLD. Although left-censored data are more informative than missing data, they still lead to challenges in the data analysis.
Simple (ad hoc) approaches to address the left-censored data are to delete the value below the LLD or impute a fixed value such as one-half of the LLD or the LLD itself. However, these approaches yield biased estimates of the parameters of interest and they underestimate the variability in the data set because the same value is imputed repeatedly [1–3]. Urine biomarkers are very prone to this problem as the concentration of the biomarkers is greatly influenced by urine volume. In diluted urine, biomarker values may be below the LLD. Also, biomarkers whose concentrations are below ng levels are prone to this problem of having values below LLD. For example, IL-18 is measured in pg/vol in urine, and thus usually has higher proportions of values below the LLD compared to other biomarkers. Many researchers have stressed the importance of data that are below the LLD [1–4].
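The ad hoc approaches described above can be sketched in a few lines. This is a minimal illustration, not the authors' program: the function names, the use of `None` to mark a value below the LLD, and the moment-based CCC estimator with divisor n are all assumptions made for the example.

```python
from statistics import fmean

def sample_ccc(xs, ys):
    """Moment-based estimate of Lin's CCC from paired samples (divisor n)."""
    n = len(xs)
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)

def delete_pairs(pairs):
    """Data deletion: keep only pairs in which both values were detected."""
    return [(x, y) for x, y in pairs if x is not None and y is not None]

def impute_fixed(pairs, lld_x, lld_y, fraction):
    """Simple imputation: replace each censored value by fraction * LLD.

    fraction=1.0 imputes the LLD itself, fraction=0.5 imputes one-half of it.
    """
    return [
        (x if x is not None else fraction * lld_x,
         y if y is not None else fraction * lld_y)
        for x, y in pairs
    ]

# Hypothetical paired readings; None marks a value below the LLD.
pairs = [(1.0, 1.2), (None, 0.9), (2.0, None), (3.0, 2.8)]
for label, data in [
    ("delete", delete_pairs(pairs)),
    ("LLD", impute_fixed(pairs, 0.5, 0.5, 1.0)),
    ("0.5 x LLD", impute_fixed(pairs, 0.5, 0.5, 0.5)),
]:
    xs, ys = zip(*data)
    print(label, round(sample_ccc(xs, ys), 4))
```

Each variant feeds a different data set to the same estimator, which is why the resulting CCC estimates can differ noticeably even though the detected values are identical.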
From the statistical point of view, we expect the ad hoc methods (data deletion or simple imputation) to yield estimates that differ from those of the ML approach and that are biased. An ideal approach to handle the left-censored data is to invoke the ML method because it accounts for the distribution of data in the detectable range and extrapolates into the region below the LLD.
The aim of this study is to show that when faced with left-censored data, the ML approach based on a bivariate normal (or lognormal) assumption for estimating the CCC between two assays is a more appropriate approach to use in practice than the ad hoc approaches that involve data deletion or simple data imputation.
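For reference, Lin's CCC [6] can be written in terms of the parameters of the bivariate distribution. The symbols for the means, variances, and covariance of the paired measurements X and Y below are notation chosen here for illustration; only ρ and ρ_c appear elsewhere in this article.

```latex
\rho_c \;=\; \frac{2\,\sigma_{XY}}{\sigma_X^{2} + \sigma_Y^{2} + (\mu_X - \mu_Y)^{2}}
       \;=\; \frac{2\,\rho\,\sigma_X\,\sigma_Y}{\sigma_X^{2} + \sigma_Y^{2} + (\mu_X - \mu_Y)^{2}}
```

The CCC equals the Pearson correlation ρ only when the two means and the two variances coincide; otherwise its magnitude is smaller. Under the bivariate normal assumption, an ML estimate of ρ_c follows by substituting the ML estimates of the five parameters into this expression.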
Results
Simulation
Table 1. Simulation results based on 1000 data sets with sample size of 100, per cent censoring (25%, 25%)
Per cent censoring | True ρ | True $\rho_c$ | Method | Mean $\widehat{\rho}_c$ | Relative bias (%) | Empirical SD | Mean SE | 95% CIs that include the true CCC (%) |
---|---|---|---|---|---|---|---|---|
(25%, 25%) | 0.25 | 0.238 | 1. Delete the pair | 0.1405 | −40.97 | 0.1119 | 0.1087 | 85.2 |
(25%, 25%) | 0.25 | 0.238 | 2. Replace by LOD | 0.2072 | −12.94 | 0.1214 | 0.0856 | 91.0 |
(25%, 25%) | 0.25 | 0.238 | 3. Replace by 0.5 × LOD | 0.1863 | −21.72 | 0.1630 | 0.0848 | 88.2 |
(25%, 25%) | 0.25 | 0.238 | 4. Replace by c × LOD | 0.1951 | −18.03 | 0.1240 | 0.0851 | 89.0 |
(25%, 25%) | 0.25 | 0.238 | 5. ML | 0.2346 | −1.43 | 0.0944 | 0.0939 | 94.3 |
(25%, 25%) | 0.50 | 0.476 | 1. Delete the pair | 0.2966 | −37.69 | 0.1197 | 0.0999 | 61.5 |
(25%, 25%) | 0.50 | 0.476 | 2. Replace by LOD | 0.4169 | −12.42 | 0.1569 | 0.0735 | 86.4 |
(25%, 25%) | 0.50 | 0.476 | 3. Replace by 0.5 × LOD | 0.3664 | −23.03 | 0.3516 | 0.0742 | 78.0 |
(25%, 25%) | 0.50 | 0.476 | 4. Replace by c × LOD | 0.3821 | −19.73 | 0.1541 | 0.0744 | 78.3 |
(25%, 25%) | 0.50 | 0.476 | 5. ML | 0.4701 | −1.24 | 0.0807 | 0.0780 | 93.6 |
(25%, 25%) | 0.75 | 0.714 | 1. Delete the pair | 0.5307 | −25.67 | 0.0886 | 0.0762 | 38.6 |
(25%, 25%) | 0.75 | 0.714 | 2. Replace by LOD | 0.6582 | −7.82 | 0.0996 | 0.0497 | 85.1 |
(25%, 25%) | 0.75 | 0.714 | 3. Replace by 0.5 × LOD | 0.6140 | −14.01 | 0.1482 | 0.0519 | 71.5 |
(25%, 25%) | 0.75 | 0.714 | 4. Replace by c × LOD | 0.6140 | −14.01 | 0.1662 | 0.0521 | 72.1 |
(25%, 25%) | 0.75 | 0.714 | 5. ML | 0.7077 | −0.88 | 0.0525 | 0.0503 | 94.8 |
Table 2. Simulation results based on 1000 data sets with sample size of 100, per cent censoring (40%, 25%)
Per cent censoring | True ρ | True $\rho_c$ | Method | Mean $\widehat{\rho}_c$ | Relative bias (%) | Empirical SD | Mean SE | 95% CIs that include the true CCC (%) |
---|---|---|---|---|---|---|---|---|
(40%, 25%) | 0.25 | 0.238 | 1. Delete the pair | 0.1240 | −47.90 | 0.1279 | 0.1230 | 84.9 |
(40%, 25%) | 0.25 | 0.238 | 2. Replace by LOD | 0.1929 | −18.95 | 0.1319 | 0.0843 | 86.8 |
(40%, 25%) | 0.25 | 0.238 | 3. Replace by 0.5 × LOD | 0.1856 | −22.02 | 0.1092 | 0.0845 | 85.9 |
(40%, 25%) | 0.25 | 0.238 | 4. Replace by c × LOD | 0.1850 | −22.27 | 0.1180 | 0.0844 | 85.7 |
(40%, 25%) | 0.25 | 0.238 | 5. ML | 0.2340 | −1.68 | 0.0984 | 0.0971 | 93.9 |
(40%, 25%) | 0.50 | 0.476 | 1. Delete the pair | 0.2754 | −42.14 | 0.1190 | 0.1108 | 60.2 |
(40%, 25%) | 0.50 | 0.476 | 2. Replace by LOD | 0.3901 | −18.05 | 0.1219 | 0.0727 | 80.1 |
(40%, 25%) | 0.50 | 0.476 | 3. Replace by 0.5 × LOD | 0.3661 | −23.09 | 0.1488 | 0.0736 | 73.7 |
(40%, 25%) | 0.50 | 0.476 | 4. Replace by c × LOD | 0.3664 | −23.03 | 0.1603 | 0.0735 | 74.0 |
(40%, 25%) | 0.50 | 0.476 | 5. ML | 0.4692 | −1.43 | 0.0847 | 0.0811 | 93.8 |
(40%, 25%) | 0.75 | 0.714 | 1. Delete the pair | 0.4966 | −30.45 | 0.0978 | 0.0851 | 30.7 |
(40%, 25%) | 0.75 | 0.714 | 2. Replace by LOD | 0.6268 | −12.21 | 0.1054 | 0.0500 | 68.7 |
(40%, 25%) | 0.75 | 0.714 | 3. Replace by 0.5 × LOD | 0.6055 | −15.20 | 0.1389 | 0.0512 | 64.7 |
(40%, 25%) | 0.75 | 0.714 | 4. Replace by c × LOD | 0.6083 | −14.80 | 0.1306 | 0.0513 | 65.0 |
(40%, 25%) | 0.75 | 0.714 | 5. ML | 0.7066 | −1.04 | 0.0554 | 0.0528 | 94.8 |
Table 3. Simulation results based on 1000 data sets with sample size of 50, per cent censoring (25%, 25%)
Per cent censoring | True ρ | True $\rho_c$ | Method | Mean $\widehat{\rho}_c$ | Relative bias (%) | Empirical SD | Mean SE | 95% CIs that include the true CCC (%) |
---|---|---|---|---|---|---|---|---|
(25%, 25%) | 0.25 | 0.238 | 1. Delete the pair | 0.1402 | −41.09 | 0.1693 | 0.1486 | 87.1 |
(25%, 25%) | 0.25 | 0.238 | 2. Replace by LOD | 0.2079 | −12.65 | 0.1509 | 0.1172 | 88.2 |
(25%, 25%) | 0.25 | 0.238 | 3. Replace by 0.5 × LOD | 0.1811 | −23.91 | 0.2088 | 0.1163 | 83.8 |
(25%, 25%) | 0.25 | 0.238 | 4. Replace by c × LOD | 0.1949 | −18.09 | 0.1813 | 0.1162 | 83.9 |
(25%, 25%) | 0.25 | 0.238 | 5. ML | 0.2310 | −2.94 | 0.1351 | 0.1304 | 92.7 |
(25%, 25%) | 0.50 | 0.476 | 1. Delete the pair | 0.2936 | −38.32 | 0.1481 | 0.1386 | 79.4 |
(25%, 25%) | 0.50 | 0.476 | 2. Replace by LOD | 0.4144 | −12.94 | 0.4115 | 0.1016 | 87.3 |
(25%, 25%) | 0.50 | 0.476 | 3. Replace by 0.5 × LOD | 0.3661 | −23.09 | 0.1854 | 0.1024 | 79.4 |
(25%, 25%) | 0.50 | 0.476 | 4. Replace by c × LOD | 0.3722 | −21.81 | 0.1770 | 0.1026 | 81.2 |
(25%, 25%) | 0.50 | 0.476 | 5. ML | 0.4636 | −2.61 | 0.1151 | 0.1095 | 93.4 |
(25%, 25%) | 0.75 | 0.714 | 1. Delete the pair | 0.5169 | −27.61 | 0.1341 | 0.1071 | 66.7 |
(25%, 25%) | 0.75 | 0.714 | 2. Replace by LOD | 0.6429 | −9.96 | 0.1438 | 0.0694 | 88.3 |
(25%, 25%) | 0.75 | 0.714 | 3. Replace by 0.5 × LOD | 0.5812 | −18.60 | 0.2127 | 0.0720 | 78.5 |
(25%, 25%) | 0.75 | 0.714 | 4. Replace by c × LOD | 0.5909 | −17.24 | 0.1917 | 0.0727 | 79.6 |
(25%, 25%) | 0.75 | 0.714 | 5. ML | 0.7025 | −1.61 | 0.0759 | 0.0716 | 93.6 |
Table 4. Simulation results based on 1000 data sets with sample size of 50, per cent censoring (40%, 25%)
Per cent censoring | True ρ | True $\rho_c$ | Method | Mean $\widehat{\rho}_c$ | Relative bias (%) | Empirical SD | Mean SE | 95% CIs that include the true CCC (%) |
---|---|---|---|---|---|---|---|---|
(40%, 25%) | 0.25 | 0.238 | 1. Delete the pair | 0.1334 | −43.95 | 0.1858 | 0.1652 | 88.1 |
(40%, 25%) | 0.25 | 0.238 | 2. Replace by LOD | 0.1908 | −19.83 | 0.1525 | 0.1150 | 84.3 |
(40%, 25%) | 0.25 | 0.238 | 3. Replace by 0.5 × LOD | 0.1796 | −24.54 | 0.1467 | 0.1155 | 80.2 |
(40%, 25%) | 0.25 | 0.238 | 4. Replace by c × LOD | 0.1780 | −25.22 | 0.1528 | 0.1152 | 81.8 |
(40%, 25%) | 0.25 | 0.238 | 5. ML | 0.2296 | −3.53 | 0.1395 | 0.1345 | 92.4 |
(40%, 25%) | 0.50 | 0.476 | 1. Delete the pair | 0.2735 | −42.54 | 0.1693 | 0.1516 | 76.9 |
(40%, 25%) | 0.50 | 0.476 | 2. Replace by LOD | 0.3673 | −22.84 | 0.2110 | 0.1004 | 80.2 |
(40%, 25%) | 0.50 | 0.476 | 3. Replace by 0.5 × LOD | 0.3366 | −29.29 | 0.2771 | 0.1011 | 74.3 |
(40%, 25%) | 0.50 | 0.476 | 4. Replace by c × LOD | 0.3537 | −25.70 | 0.1788 | 0.1011 | 75.9 |
(40%, 25%) | 0.50 | 0.476 | 5. ML | 0.4617 | −3.00 | 0.1206 | 0.1135 | 93.2 |
(40%, 25%) | 0.75 | 0.714 | 1. Delete the pair | 0.4913 | −31.19 | 0.1373 | 0.1184 | 62.3 |
(40%, 25%) | 0.75 | 0.714 | 2. Replace by LOD | 0.6022 | −15.66 | 0.1705 | 0.0696 | 78.7 |
(40%, 25%) | 0.75 | 0.714 | 3. Replace by 0.5 × LOD | 0.5706 | −20.08 | 0.2039 | 0.0710 | 74.7 |
(40%, 25%) | 0.75 | 0.714 | 4. Replace by c × LOD | 0.5689 | −20.32 | 0.2151 | 0.0713 | 74.1 |
(40%, 25%) | 0.75 | 0.714 | 5. ML | 0.7003 | −1.92 | 0.0805 | 0.0750 | 93.2 |
Table 5. Simulation results based on 1000 data sets with sample size of 25, per cent censoring (25%, 25%)
Per cent censoring | True ρ | True $\rho_c$ | Method | Mean $\widehat{\rho}_c$ | Relative bias (%) | Empirical SD | Mean SE | 95% CIs that include the true CCC (%) |
---|---|---|---|---|---|---|---|---|
(25%, 25%) | 0.25 | 0.238 | 1. Delete the pair | 0.1300 | −45.38 | 0.2287 | 0.2002 | 85.7 |
(25%, 25%) | 0.25 | 0.238 | 2. Replace by LOD | 0.2159 | −9.29 | 0.2176 | 0.1563 | 84.1 |
(25%, 25%) | 0.25 | 0.238 | 3. Replace by 0.5 × LOD | 0.1821 | −23.49 | 0.2280 | 0.1550 | 78.5 |
(25%, 25%) | 0.25 | 0.238 | 4. Replace by c × LOD | 0.1897 | −20.29 | 0.2215 | 0.1552 | 78.4 |
(25%, 25%) | 0.25 | 0.238 | 5. ML | 0.2225 | −6.51 | 0.1905 | 0.1783 | 93.1 |
(25%, 25%) | 0.50 | 0.476 | 1. Delete the pair | 0.2862 | −39.87 | 0.2088 | 0.1853 | 83.4 |
(25%, 25%) | 0.50 | 0.476 | 2. Replace by LOD | 0.3963 | −16.74 | 0.2030 | 0.1356 | 83.1 |
(25%, 25%) | 0.50 | 0.476 | 3. Replace by 0.5 × LOD | 0.3538 | −25.67 | 0.2277 | 0.1372 | 76.3 |
(25%, 25%) | 0.50 | 0.476 | 4. Replace by c × LOD | 0.3642 | −23.49 | 0.2105 | 0.1372 | 78.7 |
(25%, 25%) | 0.50 | 0.476 | 5. ML | 0.4496 | −5.55 | 0.1650 | 0.1526 | 93.3 |
(25%, 25%) | 0.75 | 0.714 | 1. Delete the pair | 0.4850 | −32.07 | 0.1951 | 0.1489 | 77.3 |
(25%, 25%) | 0.75 | 0.714 | 2. Replace by LOD | 0.6031 | −15.53 | 0.2122 | 0.0960 | 83.5 |
(25%, 25%) | 0.75 | 0.714 | 3. Replace by 0.5 × LOD | 0.5467 | −23.43 | 0.2545 | 0.0984 | 77.0 |
(25%, 25%) | 0.75 | 0.714 | 4. Replace by c × LOD | 0.5436 | −23.87 | 0.2700 | 0.0991 | 77.6 |
(25%, 25%) | 0.75 | 0.714 | 5. ML | 0.6897 | −3.40 | 0.1136 | 0.1027 | 92.1 |
Table 6. Simulation results based on 1000 data sets with sample size of 25, per cent censoring (40%, 25%)
Per cent censoring | True ρ | True $\rho_c$ | Method | Mean $\widehat{\rho}_c$ | Relative bias (%) | Empirical SD | Mean SE | 95% CIs that include the true CCC (%) |
---|---|---|---|---|---|---|---|---|
(40%, 25%) | 0.25 | 0.238 | 1. Delete the pair | 0.1357 | −42.98 | 0.2485 | 0.2130 | 85.5 |
(40%, 25%) | 0.25 | 0.238 | 2. Replace by LOD | 0.1703 | −28.45 | 0.4223 | 0.1530 | 80.1 |
(40%, 25%) | 0.25 | 0.238 | 3. Replace by 0.5 × LOD | 0.1822 | −23.45 | 0.2936 | 0.1535 | 74.3 |
(40%, 25%) | 0.25 | 0.238 | 4. Replace by c × LOD | 0.1751 | −26.45 | 0.2033 | 0.1533 | 75.4 |
(40%, 25%) | 0.25 | 0.238 | 5. ML | 0.2193 | −7.86 | 0.1948 | 0.1824 | 93.0 |
(40%, 25%) | 0.50 | 0.476 | 1. Delete the pair | 0.2667 | −43.97 | 0.2296 | 0.1982 | 82.3 |
(40%, 25%) | 0.50 | 0.476 | 2. Replace by LOD | 0.3628 | −23.78 | 0.2088 | 0.1334 | 78.6 |
(40%, 25%) | 0.50 | 0.476 | 3. Replace by 0.5 × LOD | 0.3307 | −30.53 | 0.2283 | 0.1350 | 71.7 |
(40%, 25%) | 0.50 | 0.476 | 4. Replace by c × LOD | 0.3304 | −30.59 | 0.2416 | 0.1352 | 72.8 |
(40%, 25%) | 0.50 | 0.476 | 5. ML | 0.4449 | −6.53 | 0.1693 | 0.1570 | 93.3 |
(40%, 25%) | 0.75 | 0.714 | 1. Delete the pair | 0.4600 | −35.57 | 0.2050 | 0.1608 | 75.9 |
(40%, 25%) | 0.75 | 0.714 | 2. Replace by LOD | 0.5599 | −21.58 | 0.2482 | 0.0947 | 78.4 |
(40%, 25%) | 0.75 | 0.714 | 3. Replace by 0.5 × LOD | 0.5246 | −26.53 | 0.2619 | 0.0961 | 72.8 |
(40%, 25%) | 0.75 | 0.714 | 4. Replace by c × LOD | 0.5266 | −26.25 | 0.2678 | 0.0964 | 74.0 |
(40%, 25%) | 0.75 | 0.714 | 5. ML | 0.6847 | −4.10 | 0.1180 | 0.1074 | 92.2 |
As demonstrated in Tables 1 and 2, the estimates from the four simple approaches are clearly biased, although replacing non-detectable data by the detection limit or a fraction of it is preferable to discarding the pair at every sample size considered. In Table 1 (sample size of 100 paired data points, 25% left-censoring for X, 25% left-censoring for Y, and a true CCC of 0.238), the relative bias is −1.43% for the ML method, −40.97% for the data deletion method, and between −12.94% and −21.72% for the simple data imputation methods. The four ad hoc methods also overstate the precision by underestimating the standard error, because the same value is imputed repeatedly. As expected, the ML method provides an excellent estimate of the true value of the CCC even when the censoring percentage increases, although it tends to underestimate the true value slightly. Moreover, the ML approach yields the smallest relative bias and the highest percentage of 95% CIs that include the true value of the CCC among the five methods.
To see the impact of the censoring percentage, in Table 2 we increase the censoring rate for X to 40%. The relative bias increases in magnitude for all approaches, but the ML approach still yields the smallest relative bias. In Table 2 (sample size of 100 paired data points, 40% left-censoring for X, 25% left-censoring for Y, and a true CCC of 0.238), the relative bias is −1.68% for the ML method, −47.90% for the data deletion method, and between −18.95% and −22.27% for the simple data imputation methods. The ML method also has the highest percentage of confidence intervals that include the true value of the CCC.
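The relative biases quoted from Table 1 can be reproduced from the tabulated mean estimates. The sketch below assumes that relative bias is defined as 100 × (mean estimate − true value) / true value; the article does not state the formula explicitly, but this definition agrees with the tabulated figures. The function name is chosen for illustration.

```python
def relative_bias(mean_estimate, true_value):
    """Relative bias in per cent (assumed definition, see lead-in)."""
    return 100.0 * (mean_estimate - true_value) / true_value

# Mean CCC estimates from Table 1 for true rho = 0.25 (true CCC = 0.238).
true_ccc = 0.238
table1_means = {
    "1. Delete the pair": 0.1405,
    "2. Replace by LOD": 0.2072,
    "3. Replace by 0.5 x LOD": 0.1863,
    "4. Replace by c x LOD": 0.1951,
    "5. ML": 0.2346,
}

for method, mean_estimate in table1_means.items():
    print(f"{method}: {relative_bias(mean_estimate, true_ccc):.2f}%")
```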
Because the sample size in Tables 1 and 2 is large (100 pairs), the ML method estimates the CCC very well with respect to both the relative bias and the percentage of confidence intervals that include the true value of the CCC. With a smaller sample size, however, the ML method might produce less convincing results. To examine this point, we repeated the simulation studies with sample sizes of 50 (Tables 3 and 4) and 25 (Tables 5 and 6). In both cases, the ML method still performs best among the five approaches according to the mean and standard deviation of the estimated CCC, the relative bias, the mean standard error, and the percentage of 95% CIs that include the true value of the CCC.
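One simulated data set of the kind summarized in the tables could be generated along the following lines. This is a sketch under stated assumptions rather than the authors' simulation program: the means and standard deviations used in the study are not reported in this section, so the defaults below (0 and 1) are placeholders, and placing each LLD at the marginal quantile that matches the target censoring percentage is an assumed design. The function name and the use of `None` for censored values are likewise illustrative.

```python
import math
import random
from statistics import NormalDist

def simulate_censored_pairs(n, rho, cens_x, cens_y,
                            mu_x=0.0, sd_x=1.0, mu_y=0.0, sd_y=1.0, seed=None):
    """Draw n bivariate normal pairs and left-censor them at assumed LLDs.

    Each LLD is the marginal quantile giving the requested censoring
    fraction. Censored values are returned as None.
    """
    rng = random.Random(seed)
    lld_x = NormalDist(mu_x, sd_x).inv_cdf(cens_x)
    lld_y = NormalDist(mu_y, sd_y).inv_cdf(cens_y)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = mu_x + sd_x * z1
        # Mixing z1 and z2 gives Y correlation rho with X.
        y = mu_y + sd_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        pairs.append((x if x >= lld_x else None,
                      y if y >= lld_y else None))
    return pairs, lld_x, lld_y

# Example: 25 pairs, rho = 0.50, 40% censoring for X and 25% for Y.
pairs, lld_x, lld_y = simulate_censored_pairs(25, 0.50, 0.40, 0.25, seed=2014)
n_cens_x = sum(x is None for x, _ in pairs)
n_cens_y = sum(y is None for _, y in pairs)
print(n_cens_x, n_cens_y)
```

With small samples the realized censoring fractions vary from data set to data set around the target values, which is one reason the estimates in Tables 5 and 6 are more variable than those in Tables 1 and 2.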
Example
We illustrate these issues further via a urine stability study to assess agreement for two assays with lower limits of detection. The data set came from the multi-center ASSESS AKI Study (the Assessment, Serial Evaluation, and Subsequent Sequelae of Acute Kidney Injury) and was originally analyzed by Parikh et al. [4]. The purpose of the ASSESS AKI sub-study was to determine the agreement between measurements of urinary biomarkers collected under a standard condition and under different experimental conditions, denoted as Process A, Process B, and Process C. Each experimental situation consisted of 50 paired samples (a selected process versus the standard). We consider two biomarkers here: urine Interleukin 18 (IL-18; LLD = 12.5 pg/ml) and urine Cystatin C (LLD = 0.005 mg/ml). The IL-18 data contained 99 undetectable readings (out of a total of 300), yielding a 33% left-censoring rate. The Cystatin C data contained 80 undetectable readings, for a 26.7% left-censoring rate. A natural logarithm transformation was applied to both the IL-18 and Cystatin C readings. We treat the natural logarithm of the Process A, Process B, and Process C readings as the X variable and the natural logarithm of the reference standard as the Y variable.
Concordance correlation coefficients (and 95% confidence intervals) for 3 processes using 5 methods based on IL-18 assay
Process | Method 1: Delete the pair | Method 2: Replace by LOD | Method 3: Replace by 0.5 × LOD | Method 4: Replace by c × LOD | Method 5: ML |
---|---|---|---|---|---|
A (Initial 48 hours: 4°C vs −80°C) | 0.8801 (0.81, 0.95) | 0.8228 (0.73, 0.91) | 0.8228 (0.73, 0.91) | 0.8228 (0.73, 0.91) | 0.8314 (0.74, 0.92) |
B (Initial 48 hours: 25°C vs −80°C) | 0.7344 (0.56, 0.91) | 0.6081 (0.43, 0.77) | 0.6081 (0.43, 0.77) | 0.6081 (0.43, 0.77) | 0.6819 (0.51, 0.85) |
C (Centrifuge vs No Centrifuge) | 0.9886 (0.98, 1.00) | 0.9896 (0.98, 1.00) | 0.9896 (0.98, 1.00) | 0.9896 (0.98, 1.00) | 0.9876 (0.99, 1.00) |
Concordance correlation coefficients (and 95% confidence intervals) for 3 processes using 5 methods based on Cystatin C assay
Process | Method 1: Delete the pair | Method 2: Replace by LOD | Method 3: Replace by 0.5 × LOD | Method 4: Replace by c × LOD | Method 5: ML |
---|---|---|---|---|---|
A (Initial 48 hours: 4°C vs −80°C) | 0.9348 (0.89, 0.98) | 0.9641 (0.94, 0.98) | 0.9641 (0.94, 0.98) | 0.9641 (0.94, 0.98) | 0.9735 (0.95, 0.99) |
B (Initial 48 hours: 25°C vs −80°C) | 0.9320 (0.89, 0.98) | 0.9514 (0.92, 0.98) | 0.9514 (0.92, 0.98) | 0.9514 (0.92, 0.98) | 0.9471 (0.91, 0.98) |
C (Centrifuge vs No Centrifuge) | 0.9985 (0.99, 1.00) | 0.9982 (0.99, 1.00) | 0.9758 (0.96, 0.99) | 0.9653 (0.95, 0.98) | 0.9999 (0.99, 1.00) |
Discussion
Biomarkers are being discovered at an accelerated rate due to availability of genomic and proteomic technologies [5]. Several of these candidate biomarkers are undergoing validation to diagnose diseases and serve as indices for predicting health outcomes. The main purpose of our study was to assist the biomarker development program by confirming that the simple data imputation approaches and the deletion of data are not optimal techniques for arriving at accurate (unbiased) results with the appropriate level of precision in the presence of left-censored data.
Many researchers have stressed the importance of data that are below the LLD [1–4]. Hornung and Reed [1] proposed three methods of estimation with a left-censored lognormal distribution: a maximum likelihood (ML) method and two methods involving the limit of detection. However, they concluded that the ML method is complex to calculate and therefore recommended using one-half of the LLD. Lyles et al. [2] evaluated Pearson’s correlation coefficient when a subset of data points was below the LLD by using the ML approach under the assumption of bivariate normality. They showed that the ML method was the most accurate among the proposed methods. Barnhart et al. [3] presented a generalized estimating equations (GEE) approach for estimating the parameters needed to calculate the concordance correlation coefficient (CCC) [6], which is a measure of agreement ranging between −1 and +1 for paired data. The GEE approach works well and does not require the bivariate normality assumption if the sample size is large enough, and it is comparable to the ML approach when the bivariate normality assumption is appropriate. Parikh et al. [4] performed a prospective study on hospitalized patients, almost 60% of whom had acute kidney injury (AKI). Five urine biomarkers were used to compare the stability of short-term storage and processing, with the CCC as the measure of agreement. To estimate the CCC, the authors applied the ML method using log-transformed data and accounting for values below the LLD.
We have illustrated with our computer simulation study that estimating the CCC with the imputation methods or data deletion leads to biased estimates compared to the ML approach. We also have shown via the computer simulation study that the proportion of left-censored data substantially affects the degree of bias in estimating the CCC. Our simulation study shows that the ML approach based on the bivariate normality assumption works best among all of the studied approaches. The advantages of the ML approach are that it is accurate (small relative bias) and accounts for the variability in the data set appropriately. Additionally, it uses all the available data for the statistical analysis, in contrast to the data deletion approach, which uses only the sample pairs with both values above the LLD. The estimates from the data deletion approach are clearly biased and result in (1) a large relative bias and (2) a high standard error, because deleting paired data points reduces the sample size. Although assigning a fixed value such as the LLD (or one-half of the LLD, or the LLD multiplied by a random number from the uniform(0,1) distribution) yields smaller relative biases than the data deletion approach, the precision from these methods is overestimated due to the assignment of the same value to data below the LLD.
Although we did not investigate the performance of the ML method for censoring above 40%, we expect that the ML method will still perform well when censoring exceeds 50%. Lyles et al. [2] investigated 60% censoring for their situation and the ML method still maintained a high level of accuracy.
Conclusions
The ML approach is very accurate in that it yields small relative biases if the assumption of bivariate normality is appropriate, and it can be readily implemented using SAS PROC NLMIXED [see Additional file 1 for a sample program]. Thus, our simulation study suggests that the ML approach is best for biomarker assay development where paired results need to be compared.
Methods
1. Deleting the pair: pairs with X, Y, or both X and Y below the detection limit are discarded before calculation of the CCC. The 95% confidence interval (CI) is calculated as $\widehat{\rho}_c \pm Z_{0.025}\,\mathit{SE}\left(\widehat{\rho}_c\right)$, where $\widehat{\rho}_c$ is the estimated CCC, $Z_{0.025}$ is the upper 2.5% critical value of the standard normal distribution (approximately 1.96), and $\mathit{SE}\left(\widehat{\rho}_c\right)$ is the standard error of the estimated CCC.
2. Replacing the left-censored data by the LLD: the CCC is calculated after replacing all non-detectable data by the applicable detection limit. The 95% CI is calculated with the same formula as in method 1, using this method's estimate and standard error.
3. Replacing the left-censored data by one-half of the LLD: the CCC is calculated using all pairs after replacing the non-detectable data with 0.5 times the detection limit. The 95% CI is calculated with the same formula as in method 1, using this method's estimate and standard error.
4. Replacing the left-censored data by c × LLD: we generate a random number c from the uniform (0, 1) distribution and replace the non-detectable data point with c times the detection limit. A new value of c is drawn for each non-detectable data point. The 95% CI is calculated with the same formula as in method 1, using this method's estimate and standard error.
5. The ML approach: a likelihood function is constructed based on the bivariate normal distribution of the data in the detectable range, which is then extrapolated into the region below the LLD. The 95% CI is calculated with the same formula as in method 1, using the ML estimate of the CCC and its standard error. An additional file displays a sample SAS program for the calculations [see Additional file 1] and another additional file explains this ML approach in more detail [see Additional file 2].
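To make the ML approach more concrete, the following sketch evaluates a log-likelihood for left-censored bivariate normal data. It is not the authors' SAS program: it uses the textbook form of the censored likelihood, in which a fully observed pair contributes the bivariate density, a pair with one censored member contributes the marginal density of the observed value times the conditional probability that the other lies below its LLD, and a pair with both members censored contributes the joint probability of lying below both LLDs. All function names, the use of `None` for censored values, and the Simpson's-rule integration are assumptions made for illustration.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for pdf and cdf

def _cond_cdf(limit, given, mu_a, sd_a, mu_b, sd_b, rho):
    """P(A < limit | B = given) for a bivariate normal pair (A, B)."""
    cond_mean = mu_a + rho * sd_a / sd_b * (given - mu_b)
    cond_sd = sd_a * math.sqrt(1.0 - rho ** 2)
    return STD.cdf((limit - cond_mean) / cond_sd)

def _both_below(lld_x, lld_y, mu_x, sd_x, mu_y, sd_y, rho, steps=400):
    """P(X < lld_x, Y < lld_y) via Simpson's rule over y (steps is even)."""
    lo = min(mu_y, lld_y) - 8.0 * sd_y  # effectively minus infinity
    h = (lld_y - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        dens_y = STD.pdf((y - mu_y) / sd_y) / sd_y
        total += weight * dens_y * _cond_cdf(lld_x, y, mu_x, sd_x, mu_y, sd_y, rho)
    return total * h / 3.0

def _safe_log(p):
    """Log with a tiny floor so that underflow does not raise an error."""
    return math.log(max(p, 1e-300))

def log_likelihood(pairs, lld_x, lld_y, mu_x, sd_x, mu_y, sd_y, rho):
    """Log-likelihood of left-censored pairs; None marks a censored value."""
    total = 0.0
    for x, y in pairs:
        if x is not None and y is not None:
            zx, zy = (x - mu_x) / sd_x, (y - mu_y) / sd_y
            q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho ** 2)
            total += -math.log(2.0 * math.pi * sd_x * sd_y
                               * math.sqrt(1.0 - rho ** 2)) - 0.5 * q
        elif x is None and y is not None:
            dens_y = STD.pdf((y - mu_y) / sd_y) / sd_y
            total += _safe_log(dens_y * _cond_cdf(lld_x, y, mu_x, sd_x, mu_y, sd_y, rho))
        elif x is not None and y is None:
            dens_x = STD.pdf((x - mu_x) / sd_x) / sd_x
            total += _safe_log(dens_x * _cond_cdf(lld_y, x, mu_y, sd_y, mu_x, sd_x, rho))
        else:
            total += _safe_log(_both_below(lld_x, lld_y, mu_x, sd_x, mu_y, sd_y, rho))
    return total

def ccc_from_params(mu_x, sd_x, mu_y, sd_y, rho):
    """Lin's CCC evaluated at given bivariate normal parameters."""
    return 2.0 * rho * sd_x * sd_y / (sd_x ** 2 + sd_y ** 2 + (mu_x - mu_y) ** 2)
```

Maximizing `log_likelihood` over the five parameters requires a numerical optimizer, which is omitted here; the article uses SAS PROC NLMIXED for that step. The ML estimate of the CCC then follows by passing the maximizing parameter values to `ccc_from_params`.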
Notes
Declarations
Acknowledgements
UD and VMC are supported by research grant U01DK082183 from the National Institute of Diabetes and Digestive and Kidney Diseases of the National Institutes of Health, U.S. Department of Health and Human Services. CRP is supported by the NIH grant K24DK090203. CRP is also a member of the NIH-sponsored Assessment, Serial Evaluation, and Subsequent Sequelae of Acute Kidney Injury Consortium (U01DK082185).
The views expressed do not necessarily represent the views of the Department of Health and Human Services, the National Institutes of Health, the National Institute of Diabetes, Digestive and Kidney Diseases, or the United States Government.
The computing programs from this paper are available from Uthumporn Domthong upon request.
References
1. Hornung RW, Reed LD: Estimation of average concentration in the presence of nondetectable values. Appl Occup Environ Hyg. 1990, 5: 46-51. 10.1080/1047322X.1990.10389587.
2. Lyles RH, Williams JK, Chuachoowong R: Correlating two viral load assays with known detection limits. Biometrics. 2001, 57: 1238-1244. 10.1111/j.0006-341X.2001.01238.x.
3. Barnhart HX, Song J, Lyles RH: Assay validation for left-censored data. Stat Med. 2005, 24: 3347-3360. 10.1002/sim.2225.
4. Parikh CR, Butrymowicz I, Yu A, Chinchilli VM, Park M, Hsu C, Reeves WB, Devarajan P, Kimmel PL, Siew ED, Liu KD: Urine stability studies for novel biomarkers of acute kidney injury. Am J Kidney Dis. 2013, 63: 567-572.
5. Patterson SD, Aebersold RH: Proteomics: the first decade and beyond. Nat Genet. 2003, 33: 311-323. 10.1038/ng1106.
6. Lin LI: A concordance correlation coefficient to evaluate reproducibility. Biometrics. 1989, 45: 255-268. 10.2307/2532051.
Pre-publication history
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2369/15/144/prepub
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.