Medical images represent anatomical and/or functional facsimiles of the human body. As such, they serve a critical role in the diagnosis of diseases and the evaluation of treatment response. Current interpretation of these images by radiologists amounts to a human-generated synopsis of 2-dimensional (2D) or 3-dimensional (3D) spatial data. Despite extensive efforts at standardization, interpretations continue to depend on the individual reading the images, resulting in interobserver variation.
Radiomics is an emergent methodology within image analysis in which quantitative data is acquired using automated analysis techniques (Figure 1).1-4 The extracted information, also known as image features, can be combined with orthogonal data (eg, clinical data or biological measures such as mutations or transcriptomic panels) to build prediction models for diagnosis or treatment selection. These strategies are poised to offer a more quantitative and objective basis for informed medical decision-making.1,5,6
The three mainstays of cancer treatment are radiation therapy, chemotherapy, and surgery. These treatments rely extensively on medical images for diagnosis and for monitoring efficacy. The imaging modalities most commonly used include computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET). The frequent utilization of these technologies provides clinical practices, even those with modest patient volumes, with an extensive collection of mineable image data. Indeed, radiomics features have already been associated with improved diagnostic accuracy in cancer,7 specific gene mutations,8 and treatment responses to chemotherapy and/or radiation therapy in the brain,9,10 head and neck,11,12 lung,13-17 breast,18,19 and abdomen.20 More recently, radiomics features integrated into a multitasked neural network were combined with clinical data to derive a personalized radiation dose for patients treated with stereotactic lung radiation therapy.21 Altogether, these developments suggest that the integration of image data to inform clinical care is on the horizon.
Herein, we review recent developments in radiomics, its applications to lung cancer treatments, and the challenges associated with radiomics as a tool for precision diagnostics and theranostics.
A general workflow of radiomics is depicted in Figure 2. At the data collection stage, imaging data is combined with clinical and histopathological data. Image data must undergo additional steps before downstream analyses, however, including region-of-interest segmentation, and feature and texture extraction. Based on the classification task at hand (eg, local failure after radiation, progression-free survival after immunotherapy, etc.), researchers can then proceed to the next stage, the training and validation of the radiomics model. After training and validation, a dataset that the algorithm has not yet seen (test or holdout set) is used to evaluate the model. If the model is shown to be accurate, it may potentially provide clinicians with improved decision-making capabilities. Transportability testing (using a dataset from a distinct but plausibly related population) of the model is also critical since it can help determine whether the model can be implemented more broadly in other settings. To establish transportability, an independent dataset external to the primary institution should be used.
The first step in radiomics is data acquisition. A large sample size is required because of the complexity of the prediction task. Since machine learning and neural network-based models can learn multifactorial, nonlinear relationships between image-based predictors and outcomes, models can inadvertently too closely fit or “memorize” the data they are built on. This can lead to poor performance on previously unseen data, a phenomenon known as overfitting. To mitigate overfitting, large datasets and other strategies are implemented to build improved and more generalizable models.
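As a minimal illustration of detecting overfitting with held-out data, the sketch below trains a simple classifier with scikit-learn on simulated placeholder features and compares training and test performance; all data, names, and parameter choices here are assumptions for demonstration only.

```python
# Minimal sketch: holding out data to detect overfitting.
# X and y are simulated placeholders for a radiomic feature matrix and a
# binary clinical outcome; they do not come from the studies cited above.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))        # placeholder radiomic features
y = rng.integers(0, 2, size=200)      # placeholder binary outcome

# Split once into training and held-out test data; the test set is only
# examined after model development is complete.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# A large gap between training and test AUC is a sign of overfitting.
auc_train = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
auc_test = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"train AUC = {auc_train:.2f}, test AUC = {auc_test:.2f}")
```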
Although first developed using CT images, radiomic methodologies have also been implemented for other modalities such as MRI, PET, and ultrasound (US). Models are usually built on a single modality to ensure the consistent treatment of images in the preprocessing pipeline. Images and clinical data used to build a radiomic model can be gathered from single or multiple institutions. To ensure standardization among the data presented to the model, quality assurance is essential at both the data acquisition and preprocessing stages. Standardization of imaging protocols and a clearly defined, universally applicable preprocessing pipeline are critical for model reproducibility.
At the time of imaging, acquisition and reconstruction parameters such as voxel size and gray-level discretization are central to achieving reproducible results. Other factors that may affect the stability of radiomic features include respiratory motion and the use of IV contrast. It has previously been shown that inter-CT scanner variability22 and random noise23 may affect the stability of radiomic features. To decrease feature variability during the collection process, resampling and cropping images to a uniform spacing and size prior to extracting features are recommended.24-26 Another data optimization technique involves clipping and normalizing voxel intensities. Lastly, data augmentation through preprocessing transformations or data generation using neural networks can increase the data available to a nascent radiomic model.27
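A minimal preprocessing sketch is shown below, assuming SimpleITK and NumPy: it resamples a CT volume to uniform voxel spacing and then clips and normalizes its intensities. The file path, target spacing, and intensity window are illustrative values, not prescriptions from the studies cited above.

```python
# Preprocessing sketch: resample to uniform spacing, then clip and normalize.
import numpy as np
import SimpleITK as sitk

def resample_to_spacing(image, new_spacing=(1.0, 1.0, 1.0)):
    """Resample a volume to the given voxel spacing with linear interpolation."""
    old_spacing = image.GetSpacing()
    old_size = image.GetSize()
    new_size = [int(round(osz * ospc / nspc))
                for osz, ospc, nspc in zip(old_size, old_spacing, new_spacing)]
    return sitk.Resample(image, new_size, sitk.Transform(),
                         sitk.sitkLinear, image.GetOrigin(),
                         new_spacing, image.GetDirection(),
                         0, image.GetPixelID())

def clip_and_normalize(array, lower=-1000.0, upper=400.0):
    """Clip HU values to an assumed lung/soft-tissue window and scale to [0, 1]."""
    array = np.clip(array, lower, upper)
    return (array - lower) / (upper - lower)

image = resample_to_spacing(sitk.ReadImage("ct_volume.nii.gz"))  # hypothetical path
voxels = clip_and_normalize(sitk.GetArrayFromImage(image))       # numpy array (z, y, x)
```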
Delineation of the tumor and normal tissue is a crucial first step in both radiation therapy and radiomics, directly influencing the performance of radiomic models.28 Appropriate segmentation is critical to models that extract predefined features directly, as well as to neural models, which can be trained to emphasize the designated areas. Identifying the section of the image to be used for segmentation and extraction of radiomic features is a topic of ongoing investigation. Traditionally, features are extracted from the segmented tumor region. However, there is also increasing interest in image characteristics adjacent and external to the gross tumor volume. For example, Dou et al29 have shown that multivariate models predicting the risk of distant metastasis can be improved by extracting features from the peritumoral region.
There are certain obvious challenges with manual segmentation: Tumors may lie near tissue with similar characteristics, making it difficult to distinguish between the two structures. Moreover, medical images may have distortions due to random noise, limited imaging resolution, and artifacts. Automatic or semi-automatic segmentation can reduce intra- and interobserver variability and thereby improve the stability of radiomic features. Various methods have been proposed for semi-automatic segmentation.30,31 With recent advances in deep-learning algorithms, fully automatic segmentation methods have also been developed.32
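As a generic sketch of a seeded, semi-automatic approach (not the specific algorithms of references 30-32), the example below grows a tumor mask from an operator-chosen seed point using SimpleITK; the file paths, seed coordinates, and intensity window are hypothetical values that an operator would supply interactively.

```python
# Region-growing sketch for semi-automatic segmentation (SimpleITK).
import SimpleITK as sitk

image = sitk.Cast(sitk.ReadImage("ct_volume.nii.gz"), sitk.sitkFloat32)  # hypothetical path
seed = (256, 256, 60)   # hypothetical operator-chosen voxel inside the tumor

# Smooth first to reduce sensitivity to noise, then grow a region from the
# seed that includes voxels within an assumed soft-tissue intensity window.
smoothed = sitk.CurvatureFlow(image1=image, timeStep=0.125, numberOfIterations=5)
mask = sitk.ConnectedThreshold(smoothed, seedList=[seed], lower=-50, upper=150)

# Light morphological closing to fill small holes in the grown region.
mask = sitk.BinaryMorphologicalClosing(mask, [2, 2, 2])
sitk.WriteImage(mask, "tumor_mask.nii.gz")
```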
Originally, radiomic models were developed from predefined, “handcrafted” features consisting of algebraic representations of voxel intensities. This structured data can be analyzed with classical statistics or with machine learning and neural networks. More recently, convolutional neural networks have been implemented to directly learn properties of the image, allowing for the extraction of features beyond those conceived and crafted by humans. Aspects from either or both methodologies can then be merged into a representative quantity (or quantities) known as an image signature.
Features can be categorized based on origin. Semantic features are those currently used in clinical practice as visualized and described by the radiologist. Radiomics complements these with nonsemantic features that are quantitatively and systematically extracted from voxel intensities. Classic quantitative radiomic features can be further categorized as structural, first-order, second-order, and higher-order. Structural features are the most basic descriptive and derived measures, such as tumor volume, shape, maximum diameter, and surface area. These features can help quantify tumor spiculation and other factors that may indicate malignancy. First-order features are simple statistical quantities, such as the mean, median, and maximum gray-level values found within the segmented tumor. Second-order, or textural, features quantify statistical inter-relationships between neighboring voxels. This provides a measure of the spatial relationship between voxel intensities in the tumor, which may allow for the determination of tissue heterogeneity.33 Higher-order statistical features are extracted by applying filters and transformations to the image; two of the most popular methods are Laplacian of Gaussian filtering and wavelet transforms. Such higher-order methods multiply the number of extracted features by the number of filters applied, allowing identification of image attributes based on various spatial frequency patterns. Lastly, incorporating the change in radiomic features over time, or delta radiomic features, has been shown to improve prediction of lung cancer incidence,34 overall survival, and metastases.35
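The sketch below illustrates, under assumed inputs (a CT array with a matching binary tumor mask and known voxel spacing, all placeholder names), how a few structural and first-order features could be computed with NumPy; second-order texture and higher-order wavelet/Laplacian-of-Gaussian features would typically be added with a dedicated library such as pyradiomics.

```python
# Sketch of structural and first-order feature extraction from a segmented ROI.
import numpy as np

def basic_features(ct_array, mask, spacing_mm=(1.0, 1.0, 1.0)):
    """Return a few structural and first-order radiomic features."""
    tumor = ct_array[mask > 0]                   # voxel intensities inside the ROI
    counts, _ = np.histogram(tumor, bins=64)     # gray-level histogram
    p = counts[counts > 0] / counts.sum()
    return {
        "volume_mm3": tumor.size * float(np.prod(spacing_mm)),  # structural
        "mean_hu": float(np.mean(tumor)),                       # first order
        "median_hu": float(np.median(tumor)),
        "max_hu": float(np.max(tumor)),
        "entropy": float(-np.sum(p * np.log2(p))),               # histogram entropy
    }
```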
Deep learning, a subset of machine learning, uses a neural network model that mimics the connectivity of a biological brain to identify complex abstractions of patterns using nonlinear transformations. Neural network models learn directly from unstructured data such as images through convolutional layers that synthesize voxel intensities into representative features. Deep-learning approaches typically require more data, a challenge that can be mitigated through various techniques such as data augmentation36 and transfer learning.37
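For illustration, a minimal 3D convolutional classifier is sketched below in PyTorch; the architecture, patch size, and number of classes are assumptions for demonstration and do not reflect the published networks cited in this review (eg, references 21 and 36).

```python
# Minimal 3D CNN sketch: convolutional layers synthesize voxel intensities
# into a learned feature vector, followed by a small classifier head.
import torch
import torch.nn as nn

class SmallRadiomicCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.AdaptiveAvgPool3d(1),       # global pooling -> learned image features
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):                  # x: (batch, 1, depth, height, width)
        z = self.features(x).flatten(1)    # learned feature vector
        return self.classifier(z)

model = SmallRadiomicCNN()
dummy_patch = torch.randn(4, 1, 64, 64, 64)   # placeholder CT patches
logits = model(dummy_patch)                   # shape: (4, 2)
```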
Manual feature extraction can result in thousands of radiomic features, some of which are redundant. When clinical events of interest (eg, local failure after radiation therapy) occur far less frequently than the number of candidate features, including such a large parameter set can contribute to model overtraining or overfitting. Utilization of feature selection techniques can help alleviate this potential pitfall.
Radiomic feature selection methods focus on stability of features, feature independence, and feature relevance. The stability of features may be analyzed with a test-retest dataset in which multiple images of the same modality are taken over a relatively short period to test whether such features are reproducible.38 Feature independence is assessed by statistical methods testing the correlation between the features themselves, such as principal component analysis (PCA). Feature selection based on relevance can be done with a univariate approach, testing whether each individual feature is correlated with the outcome being investigated, or a multivariate approach, which analyzes the combined predictive power of the features.
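A brief sketch of the independence and relevance criteria follows, assuming scikit-learn and simulated placeholder data; stability would first be assessed separately on test-retest images (eg, via intraclass correlation), which is not shown here.

```python
# Feature-selection sketch: univariate relevance screening followed by PCA
# for redundancy reduction. X and y are simulated placeholders.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 500))       # placeholder radiomic feature matrix
y = rng.integers(0, 2, size=120)      # placeholder binary outcome

X_std = StandardScaler().fit_transform(X)

# Univariate relevance: keep the 20 features most associated with the outcome.
univariate = SelectKBest(score_func=f_classif, k=20).fit(X_std, y)
X_relevant = univariate.transform(X_std)

# Independence/redundancy: project onto components explaining 95% of variance.
X_reduced = PCA(n_components=0.95).fit_transform(X_relevant)
print(X_reduced.shape)
```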
Parmar et al11 used clustering as a method to contend with the large number of quantitative features. The high-dimensional feature space was reduced into radiomic clusters, with clusters being predictive of patient survival, tumor stage, and histology. Alternatively, neural networks have been shown to learn increasingly detailed geometries in each subsequent convolutional layer, and can be used to generate a set of highly descriptive image features.39
A predictive model is then constructed from the extracted relevant features, creating a “radiomic signature.” Depending on the task at hand, various prediction models can be utilized (eg, classification and survivability models). Classification models categorize data into known categories (eg, tumor is benign or malignant). Survivability models require additional time-related information about the patients being treated, and aim to predict the time to failure or survival of patients undergoing a certain treatment. One approach to predicting time-to-event clinical outcomes is to make the image signature equivalent to the logarithm of the hazard ratio in a Cox regression model.21,40 Other machine-learning methods can then be used with either manually extracted features or the outputs of neural network models to derive prediction scores.
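As a sketch of the Cox approach described above, the example below fits a proportional hazards model in which a placeholder radiomic signature enters as a covariate, so its fitted coefficient scales the log hazard; it assumes the lifelines package, and all column names and data are hypothetical.

```python
# Cox regression sketch with a radiomic signature as a covariate (lifelines).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "signature": rng.normal(size=100),             # placeholder radiomic signature
    "time_months": rng.exponential(24, size=100),  # placeholder follow-up time
    "event": rng.integers(0, 2, size=100),         # 1 = failure, 0 = censored
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time_months", event_col="event")
cph.print_summary()   # the "signature" coefficient is the log hazard ratio per unit
```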
To show that the radiomic model is generalizable, it must be validated. Model validation on an independently obtained external dataset is recommended. The model is usually analyzed using the receiver operating characteristic (ROC) curve with the area under the curve (AUC) being the commonly reported value in discrimination analysis. Model validation should be repeated on a target population prior to its deployment to ensure transportability.
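For the discrimination analysis described above, a minimal scikit-learn sketch is shown below; the external-cohort outcomes and predicted probabilities are simulated placeholders standing in for a trained model's output on independent data.

```python
# Discrimination analysis sketch: ROC curve and AUC on an external cohort.
import numpy as np
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(0)
y_external = rng.integers(0, 2, size=150)    # placeholder outcomes, external cohort
predicted_prob = np.clip(y_external * 0.4 + rng.uniform(size=150) * 0.6, 0, 1)

fpr, tpr, thresholds = roc_curve(y_external, predicted_prob)
print(f"external-validation AUC = {auc(fpr, tpr):.2f}")
```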
The rapid proliferation of radiomics applications has fueled optimism that medical images can be utilized to better guide clinicians in the recommendation of optimal treatment strategies. As with every technique and technology, however, certain challenges require attention to create and implement a robust radiomics model.
Collecting and sharing data across multiple institutions or hospitals is a significant limitation to model development and testing. A single institution or hospital typically does not have enough events to establish and test a transportable radiomics model. To address this need, multiple data-sharing networks, such as the Cancer Imaging Archive41 and the Quantitative Imaging Network,42 have been established to house shared data. Contributions of well-annotated data to these shared datasets, and collaborations between multiple institutions, are critical for future model development and implementation.
In multi-institutional radiomics studies, it is rare that all institutions share the same imaging acquisition settings, such as imaging modality, protocol, or reconstruction algorithm. Additionally, image segmentation and interpretation of the data may be highly subjective and prone to human variation. While a highly standardized dataset is more likely to guarantee a consistent model and reproducible predictions, this is an impractical expectation for a large dataset, especially in a multi-institutional setting. Data cleaning and preprocessing can mitigate these challenges through selection of similarly annotated images, image resampling, retrospective segmentation, and even translation of one modality to another.43 Additionally, the robustness of radiomic models built on multi-institutional datasets can be inherently higher since they are less prone to overfitting caused by a single institutional standard.
Although radiomic models may be highly performant on the data on which they are built, prediction results may degrade when the models are implemented in real clinical settings due to under- or overfitting. Therefore, it is crucial to use independent, external datasets to evaluate the predictive power of the established radiomic signature. Additionally, the radiomic model should be retrained on new data as the standard of care continues to evolve, so that it adapts to new treatment protocols and prognoses and so that its accuracy can be better quantified. A reliable method to maintain an up-to-date radiomics model can be as critical as establishing the initial model. As data-sharing archives41,42 (noted above) become more prevalent, the need for large volumes of current, external images will be met. Since radiomic models can be deployed through online or locally hosted software, they are highly portable even if the independent data on which they are evaluated is not.
Since radiomics is a fairly new concept and model structures are inherently abstruse (representing a black box), questions and concerns are often raised about the ultimate implementation of radiomic models. Physician skills and intuition are honed over years of training and experience. A gulf of trust is anticipated between physicians’ “gestalt” and experience-driven approaches and the current, difficult-to-interpret output of artificial intelligence systems. Efforts to improve the interpretability of predictive models include feature selection through bootstrapping44 as well as development of saliency maps highlighting the relative importance of voxels to the predicted outcome.21 The implication that radiomic models manifest underlying biology, by being able to classify histological subtypes45,46 and gene mutations,47,48 makes the association between genetics and radiomics an active area of research. This type of integrative analysis of known risk factors is needed to explain the meaning of radiomic features. Promoting enhanced interpretability of radiomic and neural-network-derived models will be a critical step to catalyze implementation as a decision-support tool.
A growing number of studies show the value of radiomics as a tool to augment clinical decision-making, with significant progress in applying radiomics to lung cancer diagnosis, treatment, and risk evaluation.
Aerts et al38 created a radiomic signature prognostic of overall survival in independent cohorts of patients based on intensity, shape, textural, and wavelet features. The features were selected based on stability (using test-retest CT scans), independence, and univariate predictive capability before a multivariate model was constructed from the top feature in each of the four feature groups. Several radiomics studies have also shown the diagnostic potential of CT-based models to discriminate cancerous tumors from benign nodules.
A number of studies have also applied radiomics to predict histology based on pretreatment CT images45,46 and radiogenomics to identify the tumors’ underlying gene expression.47,48 Currently, histological classification and genetic subtyping depend on biopsies and re-biopsies. If radiomics methods achieve clinically acceptable accuracy, they may allow patients to forgo repeated invasive biopsies. For example, Wang et al47 showed that it is possible to create a deep neural network using CT images to provide an accurate method to establish epidermal growth factor receptor (EGFR) status in lung adenocarcinoma patients, potentially reducing the need for biopsy.
Another set of studies looked at the prognostic and predictive possibilities of using the radiomic approach—an important area in precision medicine because it informs the creation of an optimal treatment plan. Such studies predict probability of response to treatment,49 survival,50,51 and risk of metastases.29,52
Extending classification and survivability models to guide treatment, Lou et al21 developed an image-based, deep-learning framework for individualizing radiation therapy dose. First, a risk score was derived by a deep neural network, Deep Profiler. This signature outperformed classical radiomic features in predicting treatment outcome. The framework also incorporates a model that projects an optimized radiation dose to minimize the probability of treatment failure.
Hosny et al36 trained deep neural networks to stratify patients into low- and high-mortality risk groups, and were also able to outperform models based on classical radiomic features as well as clinical parameters. The neural network predictions were largely stable when tested against imaging artifacts and test-retest scans. In addition, there was a suggestion that deep-learning extracted features may be associated with biological pathways including cell cycle, DNA transcription, and DNA replication.
Altogether, radiomics could potentially serve an important complementary role to other orthogonal data such as genetic and clinical information to improve assessment of clinical characteristics and molecular information.
The models discussed have translational potential because they could be integrated into clinical practice upon additional, prospective validation. Imaging is a mainstay of clinical care, and software deployments of radiomic models are noninvasive and, if designed with user input, can be seamlessly integrated into the daily workflow of the intended specialist (eg, radiologist or radiation oncologist).
There are several avenues for implementing software that facilitates radiomic analyses in routine clinical practice. These include improved segmentation through semi-automatic or automatic contouring, which can be achieved by traditional image analysis techniques such as region-growing,30,31 convolutional techniques such as neural network-based segmentation,32 or “smart-contouring” techniques based on the regions of an image determined to be salient by a deep-learning model.21 Another promising area for integration is risk-profiling. Risk modeling can be achieved through a software package paired with an institution’s existing imaging server, which should minimize disruption of the existing clinical workflow. As with any method of risk-profiling, predictive radiomic models could serve as an advisory decision-support tool in the hands of the radiologist and radiation oncologist. Specifically, radiomic models that both model and mitigate risk are poised to alter the clinical paradigm(s). Adjusting treatment strategies through dose-specific21 or targeted agent-specific recommendations represents a possible use that could improve clinical outcomes in select patient populations. As with segmentation and risk-profiling, these applications can be achieved through software deployment.
Lastly, while other biomarkers are likely to represent critical orthogonal inputs to more accurately predict clinical outcomes, it is possible that tumor intrinsic determinants (ie, genetic alterations, RNA gene expression, etc.) can be detected by radiomic features, as suggested.38,53 Additional studies that seek to determine whether these classes of variables (image vs biology) are tautological, orthogonal, or somewhere in between will be critical to assessing the need for additional inputs into the models. Convergence toward an integrative approach that incorporates these varied inputs is likely unavoidable in order to improve model accuracy and ultimate clinical deployment.
Radiomics is a computational image evaluation technique that integrates medical images, clinical data, and machine learning. Despite hurdles to implementation, radiomic models show immense potential for personalized lung cancer diagnosis, risk profiling, and treatment due to their ability to incorporate image characteristics beyond the ken of the human observer.
Kuzmin GA, Gidwani M, Ma T, Zhuang T, Abazeed ME. An emergent role for radiomic decision support in lung cancer. Appl Rad Oncol. 2019;8(4):24-30.
*These authors contributed equally to this work. Dr. Kuzmin is a physics resident, Department of Radiation Oncology, Cleveland Clinic, OH. Ms. Gidwani is a graduate student, Department of Translational Hematology Oncology Research, Cleveland Clinic. Dr. Ma is a physics resident, Department of Radiation Oncology, Cleveland Clinic. Dr. Zhuang is a physicist, Department of Radiation Oncology, Cleveland Clinic. Dr. Abazeed is the director, Center for Precision Radiotherapy, Department of Radiation Oncology, and assistant professor, Cleveland Clinic Lerner College of Medicine. Disclosure: None of the authors received outside funding for the production of this original manuscript and no part of this article has been previously published elsewhere. Dr. Abazeed reports grant support from Siemens Healthcare, Malvern, PA.