Cardiac Imaging
1 From the Department of Radiology (A.R., M.P.) and Image Sciences Institute (I.I.), University Medical Center Utrecht, Heidelberglaan 100, Room E01.132, 3584 CX Utrecht, the Netherlands. Received January 3, 2007; revision requested March 2; revision received March 9; accepted April 13; final version accepted April 26. Address correspondence to A.R. (e-mail: a.rutten{at}umcutrecht.nl).
ABSTRACT
Materials and Methods: Informed consent and institutional review board approval were obtained. A retrospective study was performed by using prospective unenhanced electrocardiographically triggered cardiac multidetector CT scans in 228 women (mean age, 67 years ± 5 [standard deviation]). From the original 1.5-mm data set, two sets of adjacent images with a section thickness of 3 mm and a variation in starting point of 1.5 mm were obtained. Calcium scoring was performed to acquire Agatston, volume, and mass scores. Subjects were assigned to one of five risk categories (I–V) according to the Agatston score of each 3-mm data set and the average score.
The κ value was calculated to assess agreement in risk category assignment. Differences and relative differences between scores obtained for both 3-mm data sets were calculated overall and according to risk category. The effect of scoring algorithm on the relative differences between scores was analyzed with the Wilcoxon signed rank test.
Results: Categories I–V contained 102, 35, 48, 31, and 12 subjects, respectively. For all scoring algorithms, median relative differences decreased from more than 130% in category II to less than 10% in category V. In the three highest categories, relative differences were significantly smaller for volume and mass scores than for Agatston scores (P < .05). Twenty-one subjects were assigned to different risk categories between the two data sets (κ = 0.87). Eleven patients were assigned a nonzero score in one and a zero score in the other data set.
Conclusion: A small variation in scan starting position can substantially influence calcium measurements and poses an inherent limit to calcium scoring with contiguous 3-mm sections. Mass and volume scores are slightly less affected than are Agatston scores.
© RSNA, 2007
INTRODUCTION
Most data on calcium scoring and its predictive value have been obtained with nonoverlapping 3-mm electron-beam computed tomographic (CT) data sets and the Agatston scoring algorithm (16). Despite the high temporal resolution of electron-beam CT, it has a relatively high mean interscan variability of 15%–49% (17–22). New scoring algorithms with continuous measures, such as calcium volume and mineral mass, were developed to replace the stepwise approach of the Agatston score (23–25). Mineral mass quantification, or mass score, has been shown to be the most reproducible (26,27).
With the increasing use of multidetector CT for cardiac examinations, calcium scoring is nowadays frequently performed with multidetector CT instead of electron-beam CT. Multidetector CT not only enables electron-beam CT–like acquisitions with prospective electrocardiographic (ECG) triggering, but also enables helical retrospectively ECG-gated acquisitions with reconstruction of partially overlapping sections. This improves reproducibility, but radiation dose substantially increases compared with that of standard prospectively ECG-triggered acquisitions (28–31).
An international consortium was formed to standardize the acquisition and evaluation of coronary calcium scoring scans (32). While considerable consensus was reached with respect to scan acquisition parameters, the use of transverse and helical acquisition modes was allowed because both have advantages: lower dose for transverse scanning and overlapping reconstructions for helical scanning. A recent scientific statement on cardiac CT by the American Heart Association recommends the use of prospectively ECG-triggered acquisitions because of the lower dose (33).
We hypothesized that scanning sequential 3-mm sections presents an inherent problem with regard to reproducibility because of the occurrence of partial volume effects regardless of scoring algorithm. Small variations in the starting position of the scan can have a major influence on partial volume effects (Fig 1). These inherent effects could render the acquisition technique with contiguous 3-mm sections less suitable for risk categorization and treatment follow-up than an acquisition with reconstruction of overlapping sections. Thus, the purpose of our study was to retrospectively evaluate the effect of a small variation of scan starting position on coronary artery calcium scores based on nonoverlapping 3-mm multidetector CT data sets.
MATERIALS AND METHODS
Data Acquisition
All examinations were performed with a 16-detector CT scanner (16-IDT; Philips Medical Systems, Best, the Netherlands). An unenhanced multidetector CT scan of the whole heart was performed, starting approximately at the level of the tracheal bifurcation and ending below the apex of the heart. The scanning parameters were a collimation of 16 × 1.5 mm, 120 kVp, 40–70 mAs (depending on patient weight), a 420-msec rotation time, and prospective ECG triggering at a time point to obtain data within the diastolic phase.
A nonoverlapping set of 1.5-mm-thick sections and a nonoverlapping set of 3-mm-thick sections, both beginning at the original scan starting point, were reconstructed from the raw data. The scanner software did not allow reconstruction from the raw data of a 3-mm data set with a starting point 1.5 mm below the original scanning starting point. Therefore, two image sets with nonoverlapping 3-mm sections were approximated by averaging sections from the 1.5-mm data set. The first set contained contiguous 3-mm sections beginning at the start of the 1.5-mm data set and therefore could be directly compared with the 3-mm data set reconstructed from the raw data. The second set also contained contiguous 3-mm sections, but this set had a starting point 1.5 mm lower than the first set. This way, two similar data sets of nonoverlapping 3-mm-thick sections were created with an offset of 1.5 mm. The starting point of the second 3-mm data set was still above the highest coronary artery; therefore, no calcifications were discarded in the second data set.
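The averaging step described above can be illustrated with a minimal sketch. It assumes that each 3-mm section is the plain voxel-wise mean of two adjacent 1.5-mm sections and that sections are stored as flat lists of CT numbers; the function name `average_pairs` and the toy data are hypothetical and are not taken from the study software.

```python
def average_pairs(thin_sections, offset=0):
    """Average adjacent pairs of thin sections into thick sections.

    thin_sections: list of sections, each a flat list of HU values
                   (all sections are assumed to have the same length).
    offset: number of thin sections skipped before pairing starts
            (0 for the first set, 1 for the set shifted by one thin section).
    """
    thick = []
    # Step through the stack two thin sections at a time; an unpaired
    # trailing section is dropped.
    for i in range(offset, len(thin_sections) - 1, 2):
        upper, lower = thin_sections[i], thin_sections[i + 1]
        thick.append([(a + b) / 2 for a, b in zip(upper, lower)])
    return thick


# Toy stack of six 1.5-mm sections with a single voxel each (values in HU).
stack = [[0], [0], [400], [0], [0], [0]]
set_a = average_pairs(stack, offset=0)  # pairs (0,1), (2,3), (4,5)
set_b = average_pairs(stack, offset=1)  # pairs (1,2), (3,4)
print(set_a)  # [[0.0], [200.0], [0.0]]
print(set_b)  # [[200.0], [0.0]]
```

With `offset=1` the second set starts one thin section (1.5 mm) lower, which mirrors the offset between the two averaged data sets.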
Coronary Artery Calcium Measurement
Image sets of all 228 patients were analyzed with a standard personal computer by a single investigator (A.R.) who had experience reading more than 500 cardiac scans. This was done to guarantee continuity and consistency of scoring. To avoid observer variability in calcium scoring, we used the following approach. First, the observer manually identified coronary calcifications in the 1.5-mm data sets, in which all regions with CT attenuation higher than the threshold of 130 HU were marked. The identified calcifications in the 1.5-mm data sets were used as an overlay to determine the presence of calcifications in all 3-mm data sets, namely, the data set reconstructed from the raw data and the two averaged data sets. All voxels in the 3-mm data sets at the location of calcifications in the corresponding 1.5-mm data set were automatically analyzed for their attenuation. Only voxels for which the CT attenuation in the 3-mm data set was higher than 130 HU were considered calcifications.
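The overlay approach can be sketched as follows. This is an illustration only: it assumes that the overlay is available as a Boolean mask aligned voxel by voxel with the 3-mm section, and the name `calcified_voxels` and the example values are hypothetical.

```python
THRESHOLD_HU = 130  # detection threshold stated in the text


def calcified_voxels(thick_section, overlay_mask, threshold=THRESHOLD_HU):
    """Return indices of voxels that lie under the overlay and exceed the threshold."""
    return [
        idx
        for idx, (hu, marked) in enumerate(zip(thick_section, overlay_mask))
        if marked and hu > threshold
    ]


# Toy 3-mm section (HU) and the overlay derived from the 1.5-mm markings.
section = [90, 180, 135, 400, 131]
mask = [False, True, True, False, True]
print(calcified_voxels(section, mask))  # [1, 2, 4]
```

The voxel at index 3 exceeds the threshold but is ignored because the observer did not mark it on the 1.5-mm data set, so the identification of lesions is the same for every 3-mm data set.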
The Agatston, volume, and mass scores for the 3-mm image sets were calculated by using software written in C++. Scores were implemented as outlined by Ulzheimer and Kalender (34).
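For orientation, the three scores can be sketched for a single lesion in a single section. This is not the C++ implementation used in the study, and it may differ in detail from the definitions in reference 34. It assumes the conventional stepwise Agatston weights (1 to 4 for peak attenuation bands starting at 130, 200, 300, and 400 HU), a volume score equal to the voxel count times the voxel volume, and a mass score equal to a calibration factor times volume times mean CT number. The calibration value used below is a made-up placeholder.

```python
def agatston_weight(peak_hu):
    """Stepwise density weight conventionally used with the Agatston score."""
    if peak_hu >= 400:
        return 4
    if peak_hu >= 300:
        return 3
    if peak_hu >= 200:
        return 2
    if peak_hu >= 130:
        return 1
    return 0


def lesion_scores(voxel_hu, pixel_area_mm2, section_mm, calibration):
    """Return (agatston, volume_mm3, mass) for one lesion in one section.

    voxel_hu: CT numbers of the lesion voxels that exceed the threshold.
    calibration: hypothetical scanner-specific factor (mass per mm3 per HU).
    """
    if not voxel_hu:
        return 0.0, 0.0, 0.0
    area = len(voxel_hu) * pixel_area_mm2           # lesion area in mm2
    agatston = area * agatston_weight(max(voxel_hu))
    volume = area * section_mm                      # lesion volume in mm3
    mean_hu = sum(voxel_hu) / len(voxel_hu)
    mass = calibration * volume * mean_hu
    return agatston, volume, mass


# Three voxels of 0.25 mm2 each in a 3-mm section; peak of 320 HU gives weight 3.
agatston, volume, mass = lesion_scores([150, 250, 320], 0.25, 3.0, 0.001)
print(agatston, volume)  # 2.25 2.25
```

The sketch shows why the Agatston score is a stepwise measure (the weight jumps at fixed attenuation levels), whereas the volume and mass scores vary continuously with the voxels included.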
Data Evaluation
To determine if our method of obtaining two 3-mm data sets with a 1.5-mm offset by averaging sections from a 1.5-mm data set induced major errors, we first compared the calcium scores from the 3-mm data set reconstructed from raw data with scores from the averaged 3-mm data set with the same starting position. Absolute differences (highest score – lowest score) and percentage differences (that is, the difference as a percentage of the mean: 100·(highest score – lowest score)/[(highest score + lowest score)/2]) were calculated to assess variability between the scores from these two data sets.
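The percentage difference can be written as a short function. The sketch below follows the definition in the text (difference divided by the mean of the two scores); returning 0 when both scores are zero is an assumption made here to avoid division by zero, and the function name is hypothetical.

```python
def percentage_difference(score_a, score_b):
    """Difference between two scores as a percentage of their mean."""
    high, low = max(score_a, score_b), min(score_a, score_b)
    if high == 0:
        # Both scores are zero: treated as no difference (assumption).
        return 0.0
    return 100 * (high - low) / ((high + low) / 2)


print(percentage_difference(90, 110))  # 20.0
print(percentage_difference(0, 5))     # 200.0
```

Because the denominator is the mean of the two scores, a conversion from a zero score to any nonzero score always yields 200%, the upper limit of this measure.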
Absolute and percentage differences were also calculated to assess variability between the two averaged 3-mm image sets. Calcium scores have a nonnormal distribution. Therefore, differences were summarized for the overall group and according to risk category with medians and complete ranges (minimum to maximum differences) or interquartile ranges (25th–75th percentiles). Log transformation was attempted to normalize the data, but this did not result in a normal distribution.
Each subject was assigned to a risk category as defined by Rumberger et al (7), which is based only on the Agatston score. Assignment was performed on the basis of the Agatston score for each of the two averaged data sets and on the basis of the average Agatston score for the two data sets. The risk categories were as follows: category I indicated an Agatston score of 0 (very low risk); category II, an Agatston score of >0–10 (low risk); category III, an Agatston score of >10–100 (moderate risk); category IV, an Agatston score of >100–400 (moderately high risk); and category V, an Agatston score of >400 (high risk).
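The category boundaries listed above translate directly into a lookup. The sketch below is illustrative; the function name and the example scores are hypothetical.

```python
def risk_category(agatston):
    """Map an Agatston score to risk category I-V using the cutoffs in the text."""
    if agatston <= 0:
        return "I"    # score of 0: very low risk
    if agatston <= 10:
        return "II"   # >0-10: low risk
    if agatston <= 100:
        return "III"  # >10-100: moderate risk
    if agatston <= 400:
        return "IV"   # >100-400: moderately high risk
    return "V"        # >400: high risk


# Hypothetical subject whose two data sets fall on either side of a cutoff.
score_a, score_b = 8.0, 14.0
print(risk_category(score_a))                  # II
print(risk_category(score_b))                  # III
print(risk_category((score_a + score_b) / 2))  # III
```

The example shows how a small score difference near a cutoff changes the assigned category, and how the category based on the average score can agree with only one of the two data sets.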
Bland-Altman plots were constructed for each scoring algorithm to assess agreement between scores of the two averaged data sets. Differences between the scores of the two image sets were plotted against the average score of the two image sets.
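The coordinates of such a plot can be computed as in the following sketch, which pairs the average of the two scores (horizontal axis) with their difference (vertical axis). The function name and the example scores are hypothetical; plotting itself is omitted.

```python
def bland_altman_points(scores_a, scores_b):
    """Return (average, difference) pairs for two paired lists of scores."""
    return [((a + b) / 2, a - b) for a, b in zip(scores_a, scores_b)]


# Hypothetical scores for three subjects in the two averaged data sets.
points = bland_altman_points([0, 12, 150], [4, 10, 170])
print(points)  # [(2.0, -4), (11.0, 2), (160.0, -20)]
```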
Shifts in risk category assignment between the two 3-mm image sets with 1.5-mm offset were determined for each subject. The conversion rate was calculated as the number of subjects with a changed risk category between these two 3-mm data sets divided by the total number of subjects, multiplied by 100.
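The conversion rate defined above can be sketched as follows. The category lists are synthetic; only the counts (21 changed assignments among 228 subjects) are taken from the abstract, and the function name is hypothetical.

```python
def conversion_rate(categories_a, categories_b):
    """Percentage of subjects whose risk category differs between two data sets."""
    if not categories_a or len(categories_a) != len(categories_b):
        raise ValueError("category lists must be non-empty and of equal length")
    changed = sum(1 for a, b in zip(categories_a, categories_b) if a != b)
    return 100 * changed / len(categories_a)


# Synthetic lists that reproduce only the reported counts: 21 of 228 changed.
cats_a = ["II"] * 228
cats_b = ["III"] * 21 + ["II"] * 207
print(round(conversion_rate(cats_a, cats_b), 1))  # 9.2
```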
Statistical Analysis
To determine if the scoring algorithms (ie, Agatston, volume, and mass) that were applied have an effect on the relative differences found between the two averaged data sets, Wilcoxon signed rank tests were used. Because calcium scores have a nonnormal distribution, testing was performed after patients were assigned to risk categories on the basis of their average Agatston score. A two-sided P value of less than .05 was considered to indicate a significant difference.
To assess agreement in risk category assignment between the two averaged 3-mm data sets, the κ statistic was calculated. All statistical analyses were performed with software (SPSS for Windows, version 12.0.1, 2004; SPSS, Chicago, Ill).
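As an illustration of an agreement statistic for categorical assignments, the sketch below computes an unweighted Cohen κ from two lists of category labels. Whether the study used a weighted or unweighted statistic is not stated, so the unweighted form is an assumption; the function name and the four-subject example are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen kappa for two paired lists of category labels."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of subjects with identical labels.
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    # Chance agreement from the marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        return 1.0  # all labels identical in both lists
    return (observed - expected) / (1 - expected)


# Hypothetical assignments for four subjects in the two data sets.
kappa = cohen_kappa(["I", "I", "II", "II"], ["I", "I", "II", "I"])
print(kappa)  # 0.5
```

In the example, observed agreement is 0.75 and chance agreement is 0.5, so κ = (0.75 − 0.5)/(1 − 0.5) = 0.5.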
RESULTS
Raw Data Reformation versus Averaging
All raw data reformations with a zero score also had a zero score on the averaged data set for all scoring algorithms. Median differences, after exclusion of all zero scores, were 0.3 (interquartile range, 0–0.8) for Agatston score, 0.5 mm³ (interquartile range, 0–1.0 mm³) for volume score, and 0.06 mg (interquartile range, 0.01–0.11 mg) for calcium mass. Corresponding median percentage differences were 0% (interquartile range, 0%–1%), 0% (interquartile range, 0%–2%), and 0% (interquartile range, 0%–1%), respectively, after exclusion of all zero scores. All percentage differences between the raw data reformation and the corresponding averaged 3-mm data set were smaller than those between the two averaged data sets.
Reproducibility
Differences between the two reformations ranged from 0 to 168 for Agatston scores, from 0 to 115 mm³ for volume scores, and from 0 to 14 mg for calcium mass (Fig 2). Bland-Altman plots showed that there was an increase in differences with higher scores (Fig 3). Percentage differences ranged from 0% to 200% for all scores, with interquartile ranges of 0%–18%, 0%–12%, and 0%–13% for Agatston, volume, and mass scores, respectively. Percentage differences decreased with higher scores. Wilcoxon signed rank test results showed that the percentage differences for volume and mass scores were significantly smaller than those for Agatston scores in risk categories III and higher (P < .05) (Table 1). There was no significant difference between the percentage differences of volume scores and those of mass scores in these categories (P > .05).
The κ statistic was 0.87 (ie, excellent agreement). Although there were shifts in 9% of subjects, they never involved a change of more than one category.
DISCUSSION
To our knowledge, our study is the first that involves solely the effect of minor variation in scan starting position on score variability. An earlier report by Mohlenkamp et al (35) describes the effect of a table shift of 1.5 mm on the reproducibility of the Agatston score and of the calcium area quantification, but, in that study, two scan acquisitions were obtained. The use of two subsequent scan acquisitions introduces variations in patient-related factors, such as heart rate and heart rate variability, which also can influence reproducibility. In light of this, it is not surprising that median percentage differences in the three highest risk categories were almost twice as high as those found in our study. The reproducibility found in our study was truly based on the change in scan starting position; all other factors were constant.
Variability of calcium scoring depends on a large number of factors (eg, observer, pulsation, breathing, partial volume effects) (17,18,20–23). As seen in our study as well, percentage differences decrease with higher scores while absolute differences between measurements increase with higher scores (20–23). Inter- and intraobserver variability are factors that are mainly influenced by the decision whether to call a high-attenuation object a coronary calcification or not. We have explicitly excluded observer variability between the 3-mm data sets by manually identifying coronary calcifications on a 1.5-mm data set and automatically calculating the resulting calcium scores for the various 3-mm data sets.
Several studies have explored possibilities for improving reproducibility. Use of thinner sections, thicker sections, early diastolic triggering, resting heart rate–adjusted triggering, lower minimum attenuation threshold, variation of minimum threshold dependent on noise level, increase in minimum area, and volume score or mass score instead of Agatston score have all been suggested to decrease interscan variability of nonoverlapping data sets (17,18,20,21,23,24,26,36–43). Our study results, however, show that minimal variation of scan starting position strongly influences calcium measurements in contiguous data sets. How much of the variability is explained by variation in scan starting position is not entirely clear. As mentioned previously, the variability we found in our study was about half the variability found in a study by Mohlenkamp et al (35). However, the interscan variability reported by Lu et al (44) was very similar to the median variabilities found in our study. A change in scan starting position therefore can explain a large part of the interscan variability that occurs with the use of contiguous 3-mm sections for calcium scoring.
Partial volume effects and the threshold used for identifying calcifications are the reasons starting position affects calcium scores. Partial volume effects reduce the CT number of a voxel that is only partially filled by a calcification. If the CT number falls below the threshold, this voxel will no longer contribute to the calcium score. This will affect the volume of a calcification but will also indirectly affect Agatston score and calcium mass.
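The mechanism can be made concrete with a toy one-dimensional example. The sketch below assumes a background of 0 HU, a 3-mm-long calcification of 200 HU occupying two adjacent 1.5-mm sections, and thick sections formed by averaging adjacent thin sections; all names and values are hypothetical.

```python
THRESHOLD_HU = 130  # detection threshold stated in the text


def thick_profile(thin_hu, offset):
    """Average adjacent pairs of thin-section values, starting at `offset`."""
    return [
        (thin_hu[i] + thin_hu[i + 1]) / 2
        for i in range(offset, len(thin_hu) - 1, 2)
    ]


def voxels_above(profile, threshold=THRESHOLD_HU):
    """Count the voxels whose CT number exceeds the threshold."""
    return sum(1 for hu in profile if hu > threshold)


# One voxel column through six 1.5-mm sections (values in HU).
thin = [0, 0, 200, 200, 0, 0]

aligned = thick_profile(thin, offset=0)  # calcification fills one 3-mm section
shifted = thick_profile(thin, offset=1)  # calcification split over two sections
print(aligned, voxels_above(aligned))  # [0.0, 200.0, 0.0] 1
print(shifted, voxels_above(shifted))  # [100.0, 100.0] 0
```

When the calcification is aligned with a 3-mm section, one voxel of 200 HU exceeds the threshold. After a 1.5-mm shift the same calcification is diluted to 100 HU in two voxels, neither of which reaches 130 HU, so every scoring algorithm yields zero.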
One way to overcome this cause of decreased reproducibility is to use retrospective ECG gating, which allows the reconstruction of overlapping sections and thus reduces the influence of partial volume effects. However, use of retrospectively ECG-gated scanning substantially increases radiation dose (45–47), which should be avoided, especially in a screening population. Another option for reducing the influence of partial volume effects is to use thinner sections. However, thinner sections raise noise levels, especially at the diaphragmatic surface of the heart. Higher noise results in an increase in false-positive findings, which can negatively influence reproducibility. This is in accordance with the fact that better reproducibility has been found with even thicker sections (21). However, with sections thicker than 3 mm, small calcifications are missed because of partial volume effects, and small nonzero scores are mistaken for zero scores, regardless of scoring algorithm. Because multidetector CT has an improved signal-to-noise ratio compared with that of electron-beam CT, lowering the detection threshold from 130 to 90 HU when using a multidetector CT scan for calcium scoring could also be an option to reduce the influence of partial volume effects (48). A voxel containing calcium is more likely to reach the 90-HU threshold, and partial volume effects may have less influence. However, the differentiation of noise from small calcifications is more difficult, and this will induce more score variability. Lowering the detection threshold is not an option with electron-beam CT because of the high noise level.
In the past, several investigators (19,35,49,50) have suggested improving interscan variability by performing two consecutive scans and averaging the calcium scores. Major disadvantages of performing two consecutive scans are that subjects receive twice the radiation dose and the total examination time is longer. An optimal solution would be the possibility of reconstructing overlapping data sets from raw data acquired with prospective ECG triggering. This would combine a low radiation dose with the advantage of overlapping sections. However, this option is currently not available with the majority of scanners.
The current rapid development of CT scanner technology has substantially improved results of coronary CT angiography (51). However, the advances in scanner technology will probably have a much smaller effect on coronary calcium scoring. Thinner sections or lower thresholds for calcium scoring will induce substantially more image noise and therefore may be less advantageous if a low-dose scanning technique is used.
In our study, 113 subjects had either one scan with a zero score or had a zero score on both scans. Eleven (10%) of these 113 subjects had a nonzero score on the other scan, regardless of the scoring algorithm used. Results of studies (52,53) suggested using calcium scoring as a first screening test in, for instance, emergency department settings. Individuals with a zero Agatston score would be sent home, whereas anyone with a nonzero Agatston score would undergo further testing. In our study, a slight variation in scan starting position would have led to different treatment in about 10% of patients. In the case of a zero score in a 3-mm data set, there is a 5% chance that this would be a false-negative score. A slightly higher percentage of 11% was found for conversions between two consecutive scans in an earlier study by Devries et al (22). Therefore, use of calcium scoring as a screening test in the emergency setting should first undergo careful consideration.
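The percentages in this paragraph can be traced with simple arithmetic. The sketch below is one reading of the reported counts, not a calculation given by the authors: it assumes that the 102 subjects in category I had a zero score in both data sets (an average Agatston score of 0 requires this) and that the 5% figure relates the 11 conversions to all 3-mm data sets with a zero score.

```python
both_zero = 102  # category I subjects: zero score in both data sets
converted = 11   # zero score in one data set, nonzero score in the other

# Subjects with a zero score in at least one data set.
at_least_one_zero = both_zero + converted
share_converted = 100 * converted / at_least_one_zero

# Individual data sets with a zero score: two per category I subject,
# one per converted subject.
zero_score_sets = 2 * both_zero + converted
share_false_negative = 100 * converted / zero_score_sets

print(at_least_one_zero, round(share_converted, 1))     # 113 9.7
print(zero_score_sets, round(share_false_negative, 1))  # 215 5.1
```

Under these assumptions, 11 of 113 subjects (about 10%) would have been handled differently depending on the data set, and about 5% of the zero scores in a single 3-mm data set would be false-negative.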
Our study results demonstrated that, in patients with Agatston scores higher than 10, mass and volume scores showed better reproducibility than Agatston scores, which is in accordance with results of previous studies (20,23,26,27). This improved reproducibility for volume and mass scores could not be shown in the category of subjects with an Agatston score between >0 and 10. This is probably because 11 of the 35 subjects in this category had a conversion from a zero to a nonzero score. This conversion occurs regardless of scoring algorithm and always gives a variability of 200%. We found no significant differences between the variabilities of volume and mass quantification.
Our study had limitations. First, we could not perform two 3-mm reconstructions with an offset of 1.5 mm from the raw data. To work around this, we chose to obtain the two 3-mm data sets by averaging. We tested the validity of our averaging approach by comparing calcium scores obtained from the 3-mm raw data reconstruction with scores obtained from the corresponding averaged 3-mm data set. We did not find identical scores, but differences were extremely small. Therefore, we expect our results to be representative of results obtained for two 3-mm data sets with 1.5-mm offset reconstructed directly from the raw data.
Another limitation of our study was that it included a low-risk population of postmenopausal women. Almost 50% of the subjects had a zero score. To be able to determine the conversion rate from zero to nonzero scores, we did not exclude these subjects. Relatively few subjects had a high score. Although we found lower percentage variabilities in the higher calcium score categories than in the lower calcium score categories, which is consistent with results of previous studies on interscan reproducibility (20–23), changes in scan starting position seem to explain a large part of the variability in all categories.
In conclusion, minimal variation in scan starting position has a substantial effect on the variability of calcium scores. Agatston, volume, and mass scores are all vulnerable to slight changes in scan starting position. This poses an inherent limitation to the reproducibility of calcium scoring by using contiguous 3-mm sections and prospective ECG-triggering techniques.
FOOTNOTES
Abbreviations: ECG = electrocardiography
Guarantor of integrity of entire study, A.R.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; manuscript final version approval, all authors; literature research, A.R., M.P.; clinical studies, all authors; statistical analysis, A.R., M.P.; and manuscript editing, all authors.
Authors stated no financial relationship to disclose.
See also Science to Practice in this issue.