Abstract
Objectives
Attenuation correction is a critical step in quantitative positron emission tomography (PET) imaging and poses its own challenges. However, the computed tomography (CT) scan used for attenuation correction and anatomical localization increases the patient's radiation dose. This study aimed to develop a deep learning model for attenuation correction of whole-body 68Ga-DOTATATE PET images.
Methods
Non-attenuation-corrected (NAC) and computed tomography-based attenuation-corrected (CTAC) whole-body 68Ga-DOTATATE PET images of 118 patients from two imaging centers were used. We implemented a residual deep learning model using the NiftyNet framework. The model was trained four times and evaluated six times using test data from both centers. The quality of the synthesized PET images was compared with that of the PET-CTAC images using several evaluation metrics: peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), mean squared error (MSE), and root mean squared error (RMSE).
Results
Quantitative analysis of the four training sessions and six evaluations revealed the highest and lowest PSNR values of 52.86±6.6 and 47.96±5.09, respectively. Similarly, the highest and lowest SSIM values were 0.99±0.003 and 0.97±0.01, respectively. The highest and lowest RMSE values were 0.0117±0.003 and 0.01072±0.002, and the highest and lowest MSE values were 0.0015±0.000103 and 0.000121±5.07×10⁻⁵, respectively. Training and testing on datasets from the same center yielded the highest PSNR, whereas training and testing on datasets from different centers lowered both PSNR and SSIM. Scenarios that pooled datasets from both centers achieved the best SSIM and the lowest MSE and RMSE.
Conclusion
The acceptable accuracy of deep learning-based attenuation correction of 68Ga-DOTATATE PET images could potentially eliminate the need for an additional X-ray imaging modality that imposes a high radiation dose on the patient.
Introduction
68Ga-DOTATATE positron emission tomography/computed tomography (PET/CT) has emerged as a sensitive and accurate functional imaging method with significant advantages over conventional imaging in the diagnosis and management of neuroendocrine tumors (1,2). In PET imaging, a positron-emitting radiopharmaceutical is administered to the patient; following positron annihilation, two 511-keV gamma photons are emitted in opposite directions. However, these photons can undergo photoelectric and Compton interactions before reaching the detectors, leading to photon attenuation, poor contrast, and errors in quantitative calculations (3,4).
If PET images are adequately corrected for attenuation, quantitative measures such as the standardized uptake value (SUV), which is used for diagnosis, prognosis, and treatment-related decisions, can be obtained with considerable accuracy (5). CT-based attenuation correction (CTAC) algorithms are among the most common and well-known methods of attenuation correction (AC) in PET (4). The main drawback of these methods is the high effective dose imposed on patients. An early report after the introduction of PET/CT showed that the average effective dose from whole-body 18F-fluorodeoxyglucose (18F-FDG) PET/CT examinations was approximately 25 mSv (6), whereas PET radiopharmaceuticals alone typically deliver an effective dose of about 10 mSv (7). Therefore, the majority of the radiation dose received from imaging is attributable to the CT scan. Because obtaining the tissue attenuation map directly from magnetic resonance imaging (MRI) signals is challenging, various methods have been employed to address this issue (8,9,10,11,12). One of the most commonly used AC methods in PET/MRI scanners is the Dixon-based method (9); however, a major drawback of this method is its failure to account for bone tissue (10). A model-based approach was adopted to address this limitation (11), but it introduces quantification errors due to inconsistent registration (12). The registration inconsistency and the small field of view of MRI compared with PET can also result in the loss of information from certain body parts (13). The maximum likelihood reconstruction of activity and attenuation (MLAA) algorithm can recover the missing information and create an attenuation map from the PET emission data (14), but it suffers from high noise and induced cross-talk artifacts (15). Atlas-based segmentation methods (16,17,18) have also been employed, but they are prone to incorrect tissue classification, anatomic abnormalities, noise, and metal-induced artifacts, making AC a challenging issue in PET/MRI (19). In recent years, deep learning has demonstrated great potential in enhancing medical image quality, denoising, and artifact reduction (20,21). To date, deep learning has been used to produce synthetic CT from MRI images for AC in PET (22), including direct transformation to pseudo-CT from T1-weighted MR, ultrashort echo time, zero-TE MR, and Dixon images, estimation of AC factors from time-of-flight data (23,24,25,26,27), generation of synthetic CT images from non-AC (NAC) PET images in whole-body PET/MRI, and MLAA-based AC maps (28,29,30). However, these approaches require structural images, and their accuracy is compromised by image artifacts arising from misregistration and inter-modality errors (31). Several studies have therefore attempted to directly convert NAC PET images to attenuation-corrected PET images without the need for additional imaging modalities such as MRI and CT (31,32), employing different approaches and models in different regions of the body and with different radiopharmaceuticals.
In the present study, we aimed to develop an optimal deep learning model for AC of whole-body 68Ga-DOTATATE PET images without relying on anatomical structures.
Materials and Methods
Data Acquisition
68Ga-DOTATATE whole-body PET images of 118 patients from two imaging centers (59 images from center 1 and 59 images from center 2) were retrospectively included in the study. This study was approved by the Research Ethics Committee of Tabriz University of Medical Sciences (approval no.: IR.TBZMED.REC.1401.584, approval date: 03.10.2022), ensuring adherence to ethical standards. The examinations were performed using a 5-ring BGO-based PET/CT scanner and a 3-ring LSO-based PET/CT scanner. PET imaging was performed approximately 60 min after injection of 1.85 MBq of 68Ga-DOTATATE per kilogram of patient weight. Before the PET acquisition, a low-dose CT scan was performed for AC and anatomical localization.
Data Preprocessing
From the 118 68Ga-DOTATATE PET images, 85% of the data from each center were used for training the model, while 15% were held out as test data for external validation. In addition, 15% of the training dataset was set aside for validation during training to monitor the loss function and prevent overfitting. To reduce the dynamic range of image intensity, all PET images, both CTAC and NAC, were converted to SUVs. Additionally, to reduce the computational load, the CTAC and NAC image intensities were normalized by empirical fixed values of 9 and 3, respectively.
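As an illustration of this preprocessing, the sketch below converts a decay-corrected activity-concentration volume to SUV and applies the fixed normalization constants. The function names, the NumPy pipeline, and the synthetic volumes are illustrative assumptions, not the authors' actual code; the assignment of 9 to CTAC and 3 to NAC follows the order given in the text.

```python
import numpy as np

def to_suv(activity_bq_ml, weight_kg, injected_dose_bq):
    """Convert a decay-corrected activity concentration (Bq/mL) to SUV:
    SUV = concentration * body weight (g) / injected dose (Bq)."""
    return activity_bq_ml * (weight_kg * 1000.0) / injected_dose_bq

# Hypothetical patient: 70 kg, injected with 1.85 MBq/kg of 68Ga-DOTATATE.
weight_kg = 70.0
dose_bq = 1.85e6 * weight_kg

ctac_suv = to_suv(np.random.rand(192, 192, 300) * 5e3, weight_kg, dose_bq)
nac_suv = to_suv(np.random.rand(192, 192, 300) * 5e3, weight_kg, dose_bq)

# Normalize by the empirical fixed values (9 for CTAC, 3 for NAC).
ctac_norm = ctac_suv / 9.0
nac_norm = nac_suv / 3.0
```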
Network Architecture
A deep learning algorithm built on the NiftyNet platform was used to generate attenuation-corrected PET images, with PET-CTAC images as the reference. NiftyNet is an infrastructure built upon the TensorFlow library and is designed for a range of medical image analysis applications; it supports segmentation, regression, image generation, and reconstruction tasks, and thus plays a fundamental role in speeding up clinical workflows, including diagnostic and therapeutic procedures (33). The network used was the high-resolution residual neural network (HighResNet) (34), composed of 20 residual layers. In the first seven layers, a 3x3x3 voxel kernel encodes low-level image features, such as edges and corners. This kernel is dilated by factors of 2 and 4 in subsequent layers to extract mid- and high-level features. Residual connections link every two layers, and within the residual blocks each layer comprises an element-wise rectified linear unit (ReLU) and batch normalization. The structural details of the model are shown in Figure 1.
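For concreteness, the following is a minimal Keras sketch of the dilated residual pattern described above. The channel widths, block counts, and input handling are illustrative assumptions and do not reproduce the exact NiftyNet HighResNet configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters, dilation):
    """Two dilated 3x3x3 convolutions with batch normalization and ReLU,
    wrapped in an identity (residual) connection."""
    shortcut = x
    for _ in range(2):
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
        x = layers.Conv3D(filters, 3, padding="same", dilation_rate=dilation)(x)
    return layers.Add()([shortcut, x])

inputs = layers.Input(shape=(None, None, None, 1))    # NAC PET volume/patch
x = layers.Conv3D(16, 3, padding="same")(inputs)      # initial feature extraction
for dilation in (1, 2, 4):                            # low-, mid-, high-level features
    for _ in range(3):
        x = residual_block(x, 16, dilation)
outputs = layers.Conv3D(1, 1, padding="same")(x)      # synthesized AC PET
model = tf.keras.Model(inputs, outputs)
```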
Implementation Details
In this study, we used the following parameters to train the network: learning rate=0.001; activation function=leaky ReLU; loss function=L2 loss; optimizer=Adam; decay factor=0.00001; batch size=12; queue length=480. The model was trained four times but was evaluated six times using test datasets with different matrix sizes.
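NiftyNet itself is driven by a configuration file, so the following Keras-style sketch is only an assumed equivalent of the reported settings; in particular, modeling the decay factor as an inverse-time learning-rate schedule is an assumption about the training behavior, not a documented NiftyNet detail.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Stand-in model; in practice this would be the residual network sketched above.
model = tf.keras.Sequential(
    [layers.Conv3D(1, 3, padding="same", input_shape=(None, None, None, 1))])

# Decay factor 1e-5 modeled as an inverse-time learning-rate schedule
# starting from lr = 0.001.
schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=0.001, decay_steps=1, decay_rate=1e-5)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=schedule),
              loss="mse")  # L2 loss on voxel intensities, as reported
# model.fit(nac_patches, ctac_patches, batch_size=12)  # batch size 12
```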
Initially, a dataset comprising 50 samples from center 1 with a matrix size of 192x192 was used to train the network. The model was then tested separately with 9 images from each center (the center 1 and center 2 test datasets), constituting the first and second evaluations, with a matrix size of 192x192. Specifically, the 9 test datasets from center 2 were resized from 200x200 to 192x192.
The second training was performed using only the 50 samples from center 2, with a matrix size of 192x192. The model was then tested with 9 datasets from each center (the center 1 and center 2 test datasets), constituting the third and fourth evaluations, with a matrix size of 192x192. For the third training, 100 samples (50 from each center) with a matrix size of 192x192 were used, and the 18 test images (9 from each center) formed the fifth evaluation. For the fourth training, 100 samples from both centers with a matrix size of 200x200 were used; the 50 images from center 1 were resized from 192x192 to 200x200. The 18 test images from both centers, with a matrix size of 200x200, formed the sixth evaluation; see the sketch below for the resizing step.
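The interpolation used for resizing is not stated; a simple SciPy-based sketch, under the assumption of linear interpolation applied in-plane only, might look like this:

```python
import numpy as np
from scipy.ndimage import zoom

def resize_axial(volume, target=(192, 192)):
    """Resize the in-plane matrix of a (H, W, D) PET volume, leaving the
    slice direction untouched. Linear interpolation is an assumption."""
    h, w, _ = volume.shape
    factors = (target[0] / h, target[1] / w, 1.0)
    return zoom(volume, factors, order=1)

vol_200 = np.random.rand(200, 200, 300).astype(np.float32)
vol_192 = resize_axial(vol_200)  # e.g., center 2 test data -> 192x192
```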
Statistical Analysis
In this study, statistical analyses were performed to explore the relationship between the reference and predicted images. Specifically, the Pearson correlation coefficient was used to assess the strength of the association, and the paired-sample t-test was used to calculate p-values. Additionally, several evaluation metrics were computed, including peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), mean squared error (MSE), and root mean squared error (RMSE).
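These analyses map directly onto standard SciPy calls; the sketch below uses hypothetical paired SUVmax readings purely for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical paired SUVmax readings from matched ROIs on PET-CTAC and PET-DLAC.
suvmax_ctac = np.array([4.1, 6.3, 2.8, 9.5, 5.2])
suvmax_dlac = np.array([4.0, 6.1, 2.9, 9.1, 5.0])

r, _ = stats.pearsonr(suvmax_ctac, suvmax_dlac)   # strength of the association
t, p = stats.ttest_rel(suvmax_ctac, suvmax_dlac)  # paired-sample t-test p-value
print(f"Pearson r = {r:.3f}, paired t-test p = {p:.4f}")
```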
Evaluation Strategy
The performance of the prepared model was assessed using quantitative metrics: PSNR (Eq. 1), MSE (Eq. 2), RMSE (Eq. 3), and SSIM (Eq. 4). The metrics were computed by comparing the reference PET-CTAC images with the images generated by the network [PET-deep learning AC (PET-DLAC)] and are defined as follows:

$\mathrm{PSNR} = 10\log_{10}\left(\frac{R^{2}}{\mathrm{MSE}}\right)$ (Eq. 1)

$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(pet_{\mathrm{predict},i} - pet_{\mathrm{ref},i}\right)^{2}$ (Eq. 2)

$\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$ (Eq. 3)

$\mathrm{SSIM} = \frac{\left(2\mu_{\mathrm{ref}}\mu_{\mathrm{pre}} + c_{1}\right)\left(2\sigma_{\mathrm{ref,pre}} + c_{2}\right)}{\left(\mu_{\mathrm{ref}}^{2} + \mu_{\mathrm{pre}}^{2} + c_{1}\right)\left(\sigma_{\mathrm{ref}}^{2} + \sigma_{\mathrm{pre}}^{2} + c_{2}\right)}$ (Eq. 4)

In Eq. (1), R represents the maximum value of the reference PET-CTAC image, and MSE denotes the mean squared error. In Eq. (2), n is the number of voxels inside the region of interest, i is the voxel index, pet_predict is the attenuation-corrected PET image predicted by the network, and pet_ref is the reference PET-CTAC image. In Eq. (4), µ_ref and µ_pre represent the mean values of the reference and predicted PET images, σ²_ref and σ²_pre are their variances, and σ_ref,pre is their covariance; c₁=0.01 and c₂=0.02 are constants included to avoid division by very small values.
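These definitions translate directly into NumPy. The sketch below computes a single global SSIM over the whole volume; windowed SSIM implementations are also common, so this is an assumption rather than the authors' exact implementation.

```python
import numpy as np

def evaluate(ref, pred, c1=0.01, c2=0.02):
    """PSNR, MSE, RMSE, and a single global SSIM per Eqs. (1)-(4)."""
    mse = np.mean((pred - ref) ** 2)                    # Eq. (2)
    rmse = np.sqrt(mse)                                 # Eq. (3)
    psnr = 10.0 * np.log10(ref.max() ** 2 / mse)        # Eq. (1)
    mu_r, mu_p = ref.mean(), pred.mean()
    cov = np.mean((ref - mu_r) * (pred - mu_p))
    ssim = ((2 * mu_r * mu_p + c1) * (2 * cov + c2)) / (
        (mu_r ** 2 + mu_p ** 2 + c1) * (ref.var() + pred.var() + c2))  # Eq. (4)
    return psnr, mse, rmse, ssim

ref = np.random.rand(192, 192, 64)                      # stand-in PET-CTAC SUVs
pred = ref + 0.01 * np.random.randn(*ref.shape)         # stand-in PET-DLAC SUVs
print(evaluate(ref, pred))
```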
Furthermore, to illustrate the voxel-wise correlation of radiotracer uptake between PET-CTAC and PET-DLAC images, a joint histogram analysis was performed for SUVs ranging from 0.1 to 18 using 200 bins.
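A sketch of such a joint histogram with NumPy/Matplotlib, run on synthetic volumes, might look as follows; the axis labels and log scaling are illustrative choices rather than details taken from the study.

```python
import numpy as np
import matplotlib.pyplot as plt

def joint_histogram(ctac, dlac, lo=0.1, hi=18.0, bins=200):
    """2D histogram of paired voxel SUVs within the analyzed range."""
    mask = (ctac >= lo) & (ctac <= hi)
    return np.histogram2d(ctac[mask], dlac[mask],
                          bins=bins, range=[[lo, hi], [lo, hi]])

ctac = np.random.rand(192, 192, 64) * 18                # stand-in volumes
dlac = ctac + 0.5 * np.random.randn(*ctac.shape)
h, xedges, yedges = joint_histogram(ctac, dlac)
plt.imshow(np.log1p(h.T), origin="lower",
           extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.xlabel("PET-CTAC SUV")
plt.ylabel("PET-DLAC SUV")
plt.show()
```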
Results
The mean ± standard deviation of the quantitative image assessment parameters (MSE, PSNR, RMSE, and SSIM), calculated between the SUVs of the reference PET-CTAC images and the 18 test datasets predicted by the model across the six evaluations, is summarized in Table 1. Although the values of these parameters were acceptable in all evaluations, some variation was observed among them. The fourth evaluation obtained the highest PSNR (52.86±6.6), indicating better image quality, while the third evaluation showed the lowest PSNR (47.96±5.09). The fifth evaluation had the lowest MSE (0.000121±5.07×10⁻⁵) and RMSE (0.01072±0.002), indicating the smallest deviation from the reference images, whereas the second evaluation showed the highest MSE (0.0015±0.000103) and RMSE (0.0117±0.003). Additionally, the sixth evaluation showed the highest SSIM (0.99±0.003) among all evaluations, while the second evaluation showed the lowest SSIM (0.97±0.01) relative to the reference images. A box plot comparing the parameters across the six evaluations is shown in Figure 2.

Furthermore, we calculated the maximum SUV (SUVmax) difference between PET-CTAC and PET-DLAC images in 20 superficial and 20 deep regions of interest (ROIs) in the axial section along the x-axis for each evaluation. The p-value was <0.05 for all evaluations, except for some in which the original image size had been changed, such as the sixth evaluation (p>0.05). The SUVmax difference was also calculated for 5 ROIs within the tumor volumes in the axial section along the y-axis for each evaluation; the p-value was <0.05 for the first, fourth, and fifth evaluations.

The coronal views of the NAC, PET-CTAC, and PET-DLAC images, as well as the bias maps between PET-CTAC and PET-DLAC images, are shown in Figures 3 and 4. These figures present the results of the four training sessions on the nine test images from each of the two imaging centers. In all nine test images for each of the four training sets, errors and underestimations were visually observed relative to the reference images. In particular, when the matrix size was set to 200x200, the degree of underestimation increased (center 2 data were natively of this matrix size), whereas with a matrix size of 192x192 the underestimation was at its lowest. Additionally, reducing the image size markedly reduced the errors observed in the lungs; accordingly, the center 2 images in the second training set, which used only data from the same center resized to 192x192, showed the fewest errors and the least underestimation. In general, most images exhibited the largest errors in the lungs, while the liver, kidneys, and bladder exhibited the greatest underestimation.

The joint histograms in Figure 5 show that the highest voxel-wise similarity between PET-CTAC and PET-DLAC images occurred in the first evaluation of the first training session, using data from center 1 (R²=0.95, curve slope 1.10). In contrast, for the fourth evaluation, related to the second training, the correlation coefficient remained high (R²=0.95), but the slope was slightly lower (0.95). The lowest R² (0.82) was observed in the second evaluation, related to the first training.
In this case, the training dataset was obtained from center 1 and the test dataset from center 2, with the test data resized to match the matrix size of the training dataset. In summary, the joint histogram analysis revealed a high level of similarity between the PET-CTAC and PET-DLAC images.
Discussion
In this study, we used a deep learning model for AC of whole-body 68Ga-DOTATATE PET images without the need for structural information. The model was also trained and evaluated with datasets from two distinct imaging centers to assess and enhance its performance. In recent years, AC of PET images using deep learning methods has attracted significant attention. Many studies have generated pseudo-CT images from MRI (22,23,24,25,26,27) or NAC images (28,29) for AC purposes, but these methods require an additional modality and suffer from insufficient accuracy due to the large mismatch between the two modalities, with many artifacts and errors observed between them (31). Hence, several studies have performed PET AC directly from NAC images, without the need for structural images (CT or MRI). Shiri et al. (32) used a deep convolutional encoder-decoder (deep-DAC) network to perform AC directly on 18F-FDG PET brain images, achieving promising results on 18 test images with a PSNR of 38.7±3.54 and an SSIM of 0.988±0.006. Dong et al. (31) proposed 3D patch-based cycle-consistent generative adversarial networks (CycleGAN) for AC of whole-body 18F-FDG PET images (n=30) and reported an average PSNR of 44.3±3.5 and NMSE of 0.72±0.34. Likewise, Mostafapour et al. (35) applied a ResNet model for AC of 46 68Ga-PSMA PET images and reported PSNR and SSIM values of 48.17±2.96 and 0.973±0.034, respectively. However, further studies are needed to improve the accuracy of these outcomes. In this study, we used a ResNet model for AC of whole-body 68Ga-DOTATATE PET images. Our proposed model was trained four times and evaluated six times using 18 test datasets with different matrix sizes from two imaging centers. In all 18 test-data bias maps across the six evaluations, high error rates were observed in the lungs, whereas the liver, bladder, and kidneys displayed a marked tendency toward underestimation. It is worth noting that the magnitude of these errors was substantially diminished by decreasing the image dimensions. Although the evaluations did not show significant differences, certain errors undoubtedly stemmed from the incomplete AC of the reference images, which cannot be overlooked. To achieve optimal AC at a specific center, it may be advisable to train the model with data from that same center. Additionally, the results indicate that reducing, rather than increasing, the image matrix size can improve model performance. From the viewpoint of image quality, although our model was not comparable with the CTAC approach, it eliminates the radiation dose from CT. Our promising findings reveal the potential of the model for further exploration on larger datasets, with possibly enhanced accuracy, in future studies.
Conclusion
This study demonstrated the performance and feasibility of a deep learning model for AC in whole-body 68Ga-DOTATATE PET images. The results indicate the accuracy and high performance of the model, demonstrating its potential for effectively correcting attenuation in PET imaging. It appears that the model can reduce the reliance on CT images for AC of PET images, thereby minimizing additional radiation exposure to the patient.