We found low reproducibility of ventricular wall thickness measurements and, as a result, low agreement within and between observers in diagnosing LVH using conventional echocardiography in children with ESRD. The inaccuracy of the wall thickness measurements affects LVH assessment under every definition of LVH. Our data demonstrate that in individual children, the changes in diastolic wall thickness or LVMI that are expected to result from ESRD over a period of a year cannot be distinguished from measurement error, even when measured by the same observer.
With more than 30 years of clinical use, echocardiography has become one of the most important non-invasive imaging methods in the evaluation of cardiac morphology and dynamics. It is generally considered a valuable method for the detection of LVH, due to its wide availability, non-invasiveness, and relatively low cost. However, in children the interpretation of echocardiographic data is hampered by various problems.
First, according to our study, the variability of outcomes even within one experienced observer is of such magnitude that expected changes in wall thickness cannot be monitored reliably. The intra- and inter-observer SDCs of the echocardiographic measurement of the IVSd and the LVPWd ranged from 1.6 to 2.6 mm and 2.0 to 2.9 mm, respectively. In an individual child, only changes in septum or wall thickness larger than the SDC can be considered “real change”. Since in children the normal values of IVSd and LVPWd are small, ranging from 3 to 10 mm and from 3 to 13 mm respectively, changes over time in an individual child need to be as large as 16 to 100% of the normal value before they can be considered “real change”, i.e., not due to measurement variability. The intra- and inter-observer SDC of the LVMI ranged from 17.7 to 30.5 g/m². Similarly, since in children the normal values of LVMI are 40 to 80 g/m², changes over time in an individual child need to be as large as 30 to 60% of the normal value before they can be considered “real change”.
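The arithmetic behind the wall thickness percentages can be sketched as follows. This is only an illustration using the SDC and normal-value ranges quoted above. It assumes the lowest percentage pairs the smallest SDC with the largest normal value and the highest percentage pairs the largest SDC with the smallest normal value. The function and variable names are hypothetical.

```python
# SDC ranges (mm) and normal-value ranges (mm) as quoted in the text.
sdc_mm = {"IVSd": (1.6, 2.6), "LVPWd": (2.0, 2.9)}
normal_mm = {"IVSd": (3.0, 10.0), "LVPWd": (3.0, 13.0)}


def sdc_percent_bounds(sdc_range, normal_range):
    """Return (lowest, highest) SDC expressed as a percentage of the normal value.

    Assumption: the lowest bound pairs the smallest SDC with the largest
    normal value; the highest bound pairs the largest SDC with the smallest.
    """
    sdc_low, sdc_high = sdc_range
    normal_low, normal_high = normal_range
    lowest = 100.0 * sdc_low / normal_high
    highest = 100.0 * sdc_high / normal_low
    return lowest, highest


bounds = {name: sdc_percent_bounds(sdc_mm[name], normal_mm[name]) for name in sdc_mm}

for name, (lowest, highest) in bounds.items():
    print(f"{name}: a change must exceed {lowest:.0f}% to {highest:.0f}% of the normal value")
```

Under these assumptions the bounds fall roughly between 15% and 97%, in line with the 16 to 100% range stated above.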
Secondly, there is little consensus about the definition of LVH in children. In adults, LVH is usually defined as LVMI > 51 g/m^2.7, which is associated with increased cardiovascular morbidity and mortality. In children, the definition of LVH is based on the normal distribution of LVMI or wall thickness in healthy children, because cardiovascular outcome studies in children are lacking due to the low incidence of cardiovascular events. Some physicians define LVH as LVM according to Devereux, corrected for body size, above a cutoff value (> 51 g/m^2.7 or > 38.6 g/m^2.7) or above the P95 of healthy children for age. Yet different indexations with respect to body size are used. These different indexes for LVM lead to important differences in LVH prevalence, varying from 18% to 55% in children with chronic kidney disease and from 27% to 52% in children on peritoneal dialysis. In 2009 Khoury et al. reported normal values for LVM indexed for height in children between 0 and 18 years of age. As LVM indexed for weight or BSA had been found not to be accurate in children with obesity, it is conceivable that Khoury’s normal values are not applicable in growth-retarded (ESRD) children either. Borzych et al. showed that when using the Khoury charts, LVH prevalence was significantly higher in growth-retarded children on peritoneal dialysis than in children on peritoneal dialysis of normal height. Another approach to the definition of LVH is to use z-scores of only the septal and/or posterior wall thickness as the measure for LVH. In this definition, the effect of changes in left ventricular end-diastolic volume is neglected. Since ventricular dilatation leads to wall thinning, LVH may be missed in dilated cardiomyopathy by this method. In the present study we used LVM indexed for BSA because, in our opinion, height and weight together allow a better estimation of lean body mass, and therefore heart size, than height^2.7 alone in children with possible growth retardation.
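How the choice of cutoff alone can change the diagnosis is shown in the minimal sketch below. It indexes LVM to height^2.7 and applies the two cutoffs mentioned above (51 and 38.6 g/m^2.7). The LVM and height of the example child are hypothetical values chosen for illustration, not study data, and the function names are assumptions.

```python
def lvmi_height_2_7(lvm_g, height_m):
    """LV mass indexed to height^2.7, in g/m^2.7."""
    return lvm_g / (height_m ** 2.7)


def is_lvh(lvmi, cutoff):
    """Classify LVH as an index strictly above the given cutoff."""
    return lvmi > cutoff


# Hypothetical child: LVM of 60 g and height of 1.10 m (illustrative only).
example_lvmi = lvmi_height_2_7(lvm_g=60.0, height_m=1.10)

# The two cutoffs quoted in the text, in g/m^2.7.
lvh_by_adult_cutoff = is_lvh(example_lvmi, cutoff=51.0)
lvh_by_lower_cutoff = is_lvh(example_lvmi, cutoff=38.6)

print(f"LVMI = {example_lvmi:.1f} g/m^2.7")
print(f"LVH with cutoff 51.0: {lvh_by_adult_cutoff}")
print(f"LVH with cutoff 38.6: {lvh_by_lower_cutoff}")
```

In this hypothetical case the same child is classified as having LVH under one cutoff and not under the other, which is the kind of disagreement that drives the differences in reported prevalence.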
Our results are in line with other studies. In two studies in healthy children, intra-observer variance was found to be smaller than inter-observer variance, as is usually the case [28, 29]. In the study by Schieken et al. the intra-observer SDC was 1.7 mm for both IVSd and LVPWd. An explanation for this relatively high reproducibility might be that the data in that study were derived from a selection of 20 out of 28 echocardiograms that satisfied the criteria for technical acceptability. Inter-observer SDCs were comparable with our findings. In the study by Day et al. the intra-observer mean differences were equal to those found in our study, but the inter-observer mean differences were considerably larger. Still, the authors concluded that “the echocardiographic measurements taken from healthy children in a longitudinal study can be made accurately with acceptable reproducibility”.
The reproducibility of the LV mass is highly dependent on the reproducibility of IVSd and LVPWd. In M-mode measurements, differences in IVSd and LVPWd of approximately 5% may translate into differences in LV mass of between 8% and 15%, which may represent about 50 g in adults. Thus, measurement inaccuracies in individual adult patients limit the use of the Devereux formula to calculate LV mass. Test-retest studies in adults indeed found differences of up to 30 g between tests [31, 32]. The authors of the Reliability of M-mode Echocardiographic Studies (RES) trial therefore concluded that the probability of a true change in LV mass over time is maximized for a single-reader difference greater than 18% of the initial value.
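The amplification of wall thickness errors arises because the wall thicknesses enter the LV mass calculation inside a cubed term. The sketch below illustrates this with the commonly used ASE-corrected cube (Devereux) formula, LVM = 0.8 × 1.04 × [(IVSd + LVIDd + LVPWd)³ − LVIDd³] + 0.6 g, with dimensions in cm. Whether this exact variant was applied in the cited studies is an assumption, and the dimensions are hypothetical values, not study data. The size of the amplification depends on the dimensions chosen and on whether LVIDd is also mismeasured.

```python
def devereux_lv_mass_g(ivsd_cm, lvidd_cm, lvpwd_cm):
    """LV mass in grams from the ASE-corrected cube (Devereux) formula.

    All dimensions are end-diastolic and given in centimetres.
    """
    outer_cubed = (ivsd_cm + lvidd_cm + lvpwd_cm) ** 3
    cavity_cubed = lvidd_cm ** 3
    return 0.8 * 1.04 * (outer_cubed - cavity_cubed) + 0.6


# Hypothetical dimensions (cm), for illustration only.
ivsd, lvidd, lvpwd = 0.7, 4.0, 0.7
wall_error = 0.05  # 5% overestimation of both wall thicknesses

baseline_mass = devereux_lv_mass_g(ivsd, lvidd, lvpwd)
perturbed_mass = devereux_lv_mass_g(
    ivsd * (1 + wall_error), lvidd, lvpwd * (1 + wall_error)
)
relative_change_percent = 100.0 * (perturbed_mass - baseline_mass) / baseline_mass

print(f"Baseline LV mass:  {baseline_mass:.1f} g")
print(f"Perturbed LV mass: {perturbed_mass:.1f} g")
print(f"Relative change:   {relative_change_percent:.1f}%")
```

Even with LVIDd held fixed, the relative change in LV mass exceeds the 5% error introduced in the wall thicknesses.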
Theoretically, there are several sources of variability that can influence echocardiographic measurements. In our study the following sources of variation were excluded by the design of offline assessment of stored still images: placement of the echocardiography probe, the way the images are captured and, obviously, within-patient variability such as day-to-day variability in fluid status (e.g. before or after dialysis) and blood pressure. To minimize other sources of variation, such as timing in the cardiac cycle, all measurements were made according to the Guidelines and Standards for Performance of a Pediatric Echocardiogram of the American Society of Echocardiography [7]. These guidelines do not define the exact choice of the images and heart cycles within one assessment on which the calculations are based. We therefore adjusted the protocol in part two of the study to exclude any potential variability due to the choice of the image and heart cycle. This did not improve the reproducibility results, indicating that the variability is inherent in the measurement procedure and not due to the choice of the particular image.
To improve reproducibility, the standard method is to measure multiple heart cycles and calculate the mean value, as was done in our study. In addition, some authors even advise interpretation by more than one cardiologist. Others, however, advise that the same cardiologist read all the echocardiograms for an individual patient to reduce the variability of the longitudinal measurements. Unfortunately, the low intra-observer reproducibility that we found does not support this advice. In 2010 Lopez et al. developed recommendations for quantification during the performance of a pediatric echocardiogram. However, these recommendations are based on 2D or 3D short axis imaging, while in Europe M-mode echocardiography is still the most used technique. It remains to be established whether 2D or 3D echocardiography is indeed more reproducible than M-mode echocardiography. Magnetic resonance imaging (MRI) of the heart could be valuable for diagnosing LVH, as it is more accurate and reproducible than echocardiography. It can be used to precisely estimate a patient's left ventricular mass and assess other structural cardiac abnormalities. Unlike echocardiography, however, MRI has several disadvantages that preclude its use in daily clinical practice. It is expensive, time consuming, not easily accessible and obviously not available at the bedside. Furthermore, in young children sedation is often necessary.
A limitation of this study is that the original echocardiograms were assessed retrospectively, without the original observers knowing that they would be used in a reproducibility study. In contrast, observers 1, 2 and 3 were aware that this was a reproducibility study. This could explain the differences between the original observer and the other observers. Another limitation of this retrospective design is that, for the 26 children on hemodialysis, not all original echocardiograms were performed at equal time intervals after a hemodialysis session. Although the reproducibility measurements were made on exactly the same offline images, the wall thickness could have been affected by fluid overload. Finally, although observers 1, 2 and 3 used the same device to evaluate the images, we cannot exclude potential variation due to the use of either the Vivid 7 or the Philips Sonos 5500 for the original echocardiography.
Furthermore, Cohen’s kappa gives a quantitative assessment of how well two raters agree, corrected for chance agreement, but its interpretation is arbitrary. Several difficulties with the interpretation of kappa have been described, one of which is related to the prevalence of the condition [37, 38]. If only the most severe cases are diagnosed as LVH, as was done by observer 2, the intra-observer kappa is inflated.
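The general dependence of kappa on prevalence can be illustrated with a minimal sketch. Two hypothetical 2×2 agreement tables are compared: both have the same observed agreement (80%) but a different proportion of positive diagnoses. The counts are invented for illustration and are not data from this study. The function name is an assumption.

```python
def cohens_kappa(both_pos, first_only, second_only, both_neg):
    """Cohen's kappa for two raters and a binary diagnosis.

    Arguments are the four cell counts of the 2x2 agreement table.
    """
    total = both_pos + first_only + second_only + both_neg
    observed = (both_pos + both_neg) / total

    # Marginal proportions of a positive diagnosis for each rater.
    first_pos = (both_pos + first_only) / total
    second_pos = (both_pos + second_only) / total

    # Agreement expected by chance from the marginals.
    expected = first_pos * second_pos + (1 - first_pos) * (1 - second_pos)
    return (observed - expected) / (1 - expected)


# Hypothetical tables, each with 100 assessments and 80% observed agreement.
kappa_balanced = cohens_kappa(both_pos=40, first_only=10, second_only=10, both_neg=40)
kappa_skewed = cohens_kappa(both_pos=5, first_only=10, second_only=10, both_neg=75)

print(f"Balanced prevalence: kappa = {kappa_balanced:.2f}")
print(f"Skewed prevalence:   kappa = {kappa_skewed:.2f}")
```

Although both tables show the same raw agreement, the kappa values differ because the chance-expected agreement depends on how often each rater makes the diagnosis. This is why kappa values from observers with different diagnostic thresholds are difficult to compare directly.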