Real-World Performance of AI-Powered ICH Triage System

Intracranial hemorrhage (ICH) is a life-threatening condition with high mortality and disability rates. In March 2018, the FDA cleared Viz ICH [1], an AI tool that analyzes brain non-contrast CT (NCCT) studies for the presence of ICH. When an ICH is suspected, the Viz.ai system sends a notification to a multidisciplinary team to support an efficient workflow and enable rapid diagnosis and treatment. The objective of this study is to characterize the performance of Viz ICH in real-world settings. FDA clearance was based on data generated in controlled settings; an effective AI tool, however, should also prove robust on heterogeneous data originating from sites with varying equipment, protocols, and personnel skills. Scans from 3139 patients at 127 hospitals were analyzed by Viz ICH. ICH detection was evaluated by accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and time to notification. On a real-life consecutive set of 3156 scans, Viz ICH achieved an accuracy of 98.57%, with sensitivity, specificity, PPV, and NPV of 90.48%, 98.97%, 81.1%, and 99.53%, respectively. Mean time to notification for suspected ICH cases was 3.39 minutes. Viz ICH demonstrates robust performance in heterogeneous, real-world settings, and its short time to notification can help clinicians reach diagnoses faster. Viz ICH can therefore be considered a reliable AI tool for medical centers of various sizes and capabilities, with the goal of improving patient outcomes.


Introduction
ICH is a medical emergency, with a 30-day mortality rate of 35% to 52%. Survivors' functional outcome is often very poor, with fewer than 20% being independent at six months [2].
ICH is typically divided into spontaneous (nontraumatic) ICH and traumatic ICH. Spontaneous ICH accounts for 9% to 27% of all strokes [3], and although the incidence of ischemic stroke is twice as great, the global burden of brain hemorrhage in terms of death and disability exceeds that of ischemic stroke [4]. […] and epidural hemorrhage (EDH). Traumatic ICH is a common result of traumatic brain injury (TBI) [5]. It is the most common cause of death in lethally injured trauma patients, accounting for up to 50% of fatalities [6], and results in substantial long-term disability [7].
ICH is a dynamically evolving process [8] in which response time is critical. Disease outcome is primarily affected by hematoma growth, perihematomal edema, and evolving intracranial pressure, which may necessitate immediate and potentially life-saving neurosurgical intervention [9]. Early detection and management of ICH have been shown to be associated with improved clinical outcomes [10,11].
In the setting of acute ischemic stroke, too, it is essential to rule out ICH as quickly as possible, so that treatment to restore blood flow can begin before irreversible brain damage occurs [12].
Acute ischemic stroke and ICH present with similar clinical features, but whereas thrombolytic treatment is an essential part of ischemic stroke management [13], it can be fatal when given to a patient with ICH. Detecting ICH in a time-efficient and reliable manner is therefore crucial in both settings.
Since the 1970s, CT has changed the way we diagnose and manage patients, especially in the acute setting, and has improved the treatment of many acute medical conditions, including stroke [14]. According to Power et al. (2016), CT has many advantages, such as wide scanner availability and fast results, allowing physicians to confirm diagnoses and exclude critical conditions.
Over the past decades, imaging volume, and with it radiologists' on-call workload in the ambulatory and emergency room (ER) settings, has grown tremendously. This increase represents a major challenge for radiologists and medical centers around the world and requires adjustments in workforce and workflow to accommodate it while guaranteeing quality and patient safety and avoiding radiologist burnout [15,16].
In its initial clearance in March 2018 [1], the United States Food and Drug Administration (FDA) cleared Viz ICH as a notification-only, parallel workflow tool for use by hospital networks and trained clinicians to identify and communicate images of specific patients to a multidisciplinary team, independent of the standard of care workflow. Viz ICH uses an AI-based algorithm to analyze head NCCT images for findings suggestive of ICH and notifies appropriate medical specialists of these findings in parallel to standard of care image interpretation, as shown in Figure 1. When ICH is suspected, the system sends a notification to a multidisciplinary team, recommending review of those images. Images can be viewed through a mobile application (Figure 2) or via the Viz web viewer (Figure 3).
The primary focus of this study is to evaluate the performance of the Viz ICH device in terms of accuracy, sensitivity, specificity, PPV, NPV, and time to notification on a diverse dataset from hospitals of varying sizes and capabilities, to obtain a robust estimate of performance that represents real-world settings.

Data Collection
The data consist of 4 subsets, each collected sequentially from multiple hospitals during the time periods shown in Table 1. Data were collected and reviewed quarterly as part of the post-market, real-world performance monitoring and quality assurance process for the Viz ICH medical device. Hospitals and collection dates were selected at random to ensure variety and to examine device generalization. Collection time periods were chosen to represent the full range of the intended population scanned throughout the day, on both weekdays and weekends.
Scans were excluded from the set in cases of motion artifacts, metal artifacts, or low quality that limited scan readability. Postoperative scans were also excluded because they are unsuitable for the ICH screening task.
The collected cohort is diverse in patient gender and age and in technical scanning parameters (such as CT manufacturer and model, scan resolution, and protocols). Hospital types also varied, including both Primary Stroke Centers (PSC) and Comprehensive Stroke Centers (CSC) (see Table 2).

Truthing
The CT scans were interpreted by radiology-trained clinical specialists and a neuroradiology fellow. Every scan was evaluated for the presence of ICH and for the bleed type.

Time to Notification Measurement
To assess time savings, the time to notification was computed for each scan as the time elapsed between the CT acquisition start time and the time at which the notification was sent to the user.
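As a minimal sketch of the per-scan computation described above, the Python snippet below takes pairs of acquisition-start and notification-sent timestamps and summarizes the delays in minutes. The record layout, the ISO 8601 timestamp format, and the example values are hypothetical; the source does not describe how Viz ICH stores these times.

```python
from datetime import datetime
from statistics import mean, median

# Hypothetical (acquisition_start, notification_sent) pairs in ISO 8601 format.
# In practice these would come from scan metadata and a notification log.
records = [
    ("2022-05-01T02:14:05", "2022-05-01T02:17:29"),
    ("2022-05-01T03:40:00", "2022-05-01T03:43:00"),
    ("2022-05-01T06:05:30", "2022-05-01T06:09:30"),
]

def time_to_notification_minutes(acquired: str, notified: str) -> float:
    """Minutes elapsed from CT acquisition start to the notification being sent."""
    delta = datetime.fromisoformat(notified) - datetime.fromisoformat(acquired)
    return delta.total_seconds() / 60.0

delays = [time_to_notification_minutes(a, n) for a, n in records]
print(f"mean: {mean(delays):.2f} min, median: {median(delays):.2f} min")
```

For these three toy scans the delays are 3.4, 3.0, and 4.0 minutes, so the script prints a mean of 3.47 min and a median of 3.40 min. Reporting the median alongside the mean is a design choice worth considering, since a few delayed notifications can pull the mean upward.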

Statistical Analysis
Descriptive statistics were reported, including ranges, means, medians, and standard deviations for continuous variables, and frequencies and percentages for categorical variables. System performance was evaluated using accuracy, sensitivity, specificity, PPV, and NPV.
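All five performance measures follow directly from the 2×2 confusion matrix of device output against the ground truth. The sketch below is an illustration, not the study's analysis code. The counts it uses (TP = 133, FN = 14, FP = 31, TN = 2978) are not stated in the text; they are back-calculated as the only integer counts consistent with 147 ICH-positive scans out of 3156 and the rates reported in the abstract.

```python
from dataclasses import dataclass

@dataclass
class ConfusionMatrix:
    tp: int  # ICH present, flagged by the device
    fp: int  # ICH absent, flagged by the device
    tn: int  # ICH absent, not flagged
    fn: int  # ICH present, not flagged

    def metrics(self) -> dict:
        """Accuracy, sensitivity, specificity, PPV and NPV as fractions."""
        total = self.tp + self.fp + self.tn + self.fn
        return {
            "accuracy": (self.tp + self.tn) / total,
            "sensitivity": self.tp / (self.tp + self.fn),
            "specificity": self.tn / (self.tn + self.fp),
            "ppv": self.tp / (self.tp + self.fp),
            "npv": self.tn / (self.tn + self.fn),
        }

# Counts back-calculated from the reported rates (an assumption, see lead-in).
cm = ConfusionMatrix(tp=133, fp=31, tn=2978, fn=14)
for name, value in cm.metrics().items():
    print(f"{name}: {100 * value:.2f}%")
```

Running it prints accuracy 98.57%, sensitivity 90.48%, specificity 98.97%, PPV 81.10%, and NPV 99.53%. Note that PPV is the only one of these measures that depends strongly on the 4.6% prevalence: the 31 false positives are few relative to 3009 negatives but large relative to 133 true positives.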
Confidence intervals were calculated using the Clopper-Pearson (exact) method, which is based on the beta distribution. Receiver operating characteristic (ROC) analysis was used to characterize Viz ICH classification performance across multiple volumetric thresholds on the size of the suspected ICH.
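Both analyses can be sketched in a few lines of standard-library Python. The Clopper-Pearson bounds are found by inverting the binomial tail probabilities with bisection, which is equivalent to taking beta-distribution quantiles. The ROC part assumes, hypothetically, that the device reports one suspected-ICH volume per scan (0 when nothing is detected) and that a scan is flagged when that volume is at or above a threshold. The helper names and toy data are illustrative, not taken from the study.

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); terms are computed in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q, log_n_fact = math.log(p), math.log1p(-p), math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)

def _solve_increasing(g, target: float, iters: int = 60) -> float:
    """Bisection for p in (0, 1) with g(p) = target, where g increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple:
    """Exact two-sided (1 - alpha) confidence interval for the proportion x/n."""
    # Lower bound solves P(X >= x | p) = alpha/2; upper solves P(X <= x | p) = alpha/2.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1.0 - binom_cdf(x - 1, n, p), alpha / 2)
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1.0 - binom_cdf(x, n, p), 1.0 - alpha / 2)
    return lower, upper

def roc_points(volumes, labels):
    """(FPR, TPR) pairs for the rule 'flag scan if suspected volume >= t'."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(volumes), reverse=True):
        tp = sum(1 for v, y in zip(volumes, labels) if v >= t and y)
        fp = sum(1 for v, y in zip(volumes, labels) if v >= t and not y)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Hypothetical toy data: suspected volume in mL (0 = none detected), truth 0/1.
vols = [0.0, 0.0, 3.0, 0.0, 2.0, 12.0, 30.0, 1.0]
truth = [0, 0, 0, 0, 1, 1, 1, 1]
print("sensitivity 95% CI:", clopper_pearson(133, 147))
print("toy AUC:", auc(roc_points(vols, truth)))  # 0.875 for this toy set
```

In practice, a library routine such as SciPy's exact binomial interval would replace the hand-written bisection. The sweep in `roc_points` mirrors the idea behind Figure 4: each distinct volume becomes a candidate threshold, and lowering the threshold trades specificity for sensitivity.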

Results
Over the chosen timeframe, 3420 non-contrast head CT scans were analyzed by the Viz ICH algorithm, of which 264 were excluded (see Data Collection). Of the remaining 3156 scans, 147 were positive for ICH, corresponding to an ICH prevalence of 4.6%. Scans were collected from 127 hospitals, of which 33 were comprehensive stroke centers (hubs) and 94 were primary stroke centers (spokes). Table 2 shows the distribution of the study population by patient age range and gender, hospital hierarchy, and scan technical parameters such as slice thickness, number of slices, and scanner manufacturer.
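The cohort figures can be checked with simple arithmetic. This short sketch derives the analyzed set from the reported totals and expresses case counts as percentages, which keeps the unit explicit: 147 of 3156 scans is about 4.66% (given in the text as 4.6%). The subtype counts (18 SAH, 5 IVH, 3 EDH) and the 2019 US SAH figures are taken from the Discussion of this paper; the helper name is illustrative.

```python
# Counts as reported in this paper; subtype counts come from the Discussion.
total_analyzed = 3420
excluded = 264
scans = total_analyzed - excluded  # scans remaining in the study set
positives = {"any ICH": 147, "SAH": 18, "IVH": 5, "EDH": 3}

def prevalence_pct(cases: float, population: float) -> float:
    """Prevalence expressed as a percentage of the population."""
    return 100.0 * cases / population

for label, n in positives.items():
    print(f"{label}: {prevalence_pct(n, scans):.2f}% of {scans} scans")

# US SAH estimate from the Discussion: 0.85 million cases, 328.24 million people (2019).
print(f"US SAH: {prevalence_pct(0.85e6, 328.24e6):.2f}%")
```

This prints 4.66% for any ICH, 0.57% for SAH, 0.16% for IVH, and 0.10% for EDH among the 3156 scans, and 0.26% for the US SAH estimate.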

Study Scans Categorical Sub-Grouping
Performance Analysis

Discussion
Viz ICH is an AI-based tool intended for analyzing NCCT images of the brain, which are often acquired in the acute setting and demand rapid interpretation for optimal patient care. When the algorithm flags a scan as suspected ICH, the system notifies a multidisciplinary team, recommending immediate review of those images. Images can be promptly previewed through a web interface or a mobile application, allowing clinicians to assess the case remotely.
Over the last years, radiologists' workload has increased dramatically [17]. Long shifts, growing scan volumes, and staff shortages may not only lead to radiologist burnout but also risk critical diagnostic errors [18,19]. Since medical imaging is a major contributor to the overall diagnostic process, it is also a major potential source of diagnostic error [20]. Rates of clinically significant errors are reported to range from 2% to 20% [21]. Methods such as double reporting have been shown to be effective [22] but are time-consuming and demand more workforce [23]. AI-based algorithms such as Viz ICH could be used to reduce radiologists' workload, shorten reporting time, and serve as additional tools for radiologists.
ICH is a life-threatening condition with high mortality rates and complex morbidity [24]. It is generally an acutely evolving process for which quick diagnosis and treatment are key to better outcomes [24,25]. Most importantly, missed ICH on NCCT scans may have a fatal impact on patients, leading to delayed diagnosis, complications, and death [26]. AI tools designed to detect and notify about suspected ICH could help decrease error rates, minimize false-negative diagnoses [27], and shorten notification time.
In this study, we aimed to examine the performance of Viz ICH in real-life settings. While FDA clearances of computer-aided triage devices typically include supporting information on the sensitivity and specificity of the cleared device as well as average time savings, these data points need to be validated in a real-world setting for several reasons:
• Quality bias: Sites participating in pivotal studies supporting FDA submissions are typically at the cutting edge of the field. As such, the average quality of a scan at a site participating in a pivotal study may not represent the average quality of a scan at an average hospital in the US. These data may thus be biased toward higher-quality imaging from more modern or advanced equipment, and the greater experience of CT technologists and staff may lead to fewer technical or patient-driven issues, such as patient motion.
• Overfitting due to lack of diversity in data: Overfitting [28] is a general problem in modeling whereby a model performs well on its training dataset but less well on real-world data. While there are many reasons for this phenomenon, the most relevant for AI applications in medical imaging is a mismatch between the distribution of the training data and that of the real-world data. For example, a device developed using data from only a subset of scanner makes, models, and configurations might perform well on those data but yield lower performance when applied to more diverse data.
• Accuracy: Since Viz ICH was cleared, a large amount of data from many sites has been processed. These real-world data can be leveraged to estimate the device's performance more precisely.
• Generalization: Real-world evidence, such as that presented here, may help payors, clinicians, and administrators decide whether there is sufficient evidence that the proposed solutions would work in their specific situations. Moreover, demonstrating the effectiveness of a workflow enhancement (such as earlier alerting of specialists) in one system does not guarantee that the enhancement transfers to a different system without understanding that system's specific circumstances. The value of such evidence lies in showing that the benefit is realistically achievable.
• IVH, EDH, and SAH have low prevalence in this study (5 IVH-positive, 3 EDH-positive, and 18 SAH-positive cases), resulting in wide confidence intervals that call into question the statistical significance of these specific findings. Hence, further investigation with a cohort enriched with positive EDH, IVH, and SAH cases is needed to determine Viz ICH's sensitivity for these ICH subtypes. It is also worth mentioning that the prevalence of SAH in this study was low, with only 18 positive cases, or about 0.6% of scans. In 2019, the US recorded 0.85 million SAH cases [31], with an estimated population of 328.24 million for that year [32]. Based on these figures, we estimate SAH prevalence to be about 0.26%. SAH incidence rates vary between 6.2 and 11.15 per 100,000 people [33,34]. A low disease prevalence is typically associated with lower detection sensitivity.
The low prevalence of SAH could account for the lower sensitivity observed for SAH compared with the overall sensitivity for all bleed types in this study.
ICH prevalence in this study was 4.6%, which is substantially higher than the ICH prevalence in the general population documented in epidemiological studies [35,36]. Beyond the fact that people who undergo head CT in the hospital setting are more likely to have ICH than the general population, this gap could be explained by the fact that Viz ICH only reviews scans included according to certain criteria (e.g., head injury or stroke protocol), which are more likely to be positive for ICH.

Limitations
The study data were collected from the data available to the Viz ICH algorithm. Important patient information, such as medical history, clinical presentation, laboratory results, and clinical outcomes, was not available for this study.
The literature on ICH subtype prevalence lacks consistent, comparable measurements [33,34,35], making it difficult to determine the prevalence of each subtype in the general population. Thus, we could not assess the external validity of this study's findings in relation to the real-world prevalence of ICH subtypes.

Conclusion
Viz ICH demonstrates robust performance in identifying ICH on NCCT across most parameters, despite the heterogeneity of settings, equipment, and processes. It can therefore be considered a reliable and efficient AI tool for medical centers of all sizes and capabilities. It shows excellent accuracy, sensitivity, specificity, and NPV, as well as a short time to notification, which can have a significant impact on patients' lives.
[…] development and commercialization of the AI algorithm discussed in this article.

Fig 3. Viz Web viewer: list of subjects with suspected ICH (top) and single-subject scan view (bottom).

Fig 4. Viz ICH classification ROC plot, based on the ICH volumetric classification threshold (ranging from 0 to ~206 mL, the largest suspected ICH volume in the examined set) that determines the final device classification. The ROC curve shows an AUC of 0.97.

Table 1: Consecutively collected study subsets, time periods, and hospital distribution. Note that some hospitals contributed data in multiple time periods as part of the timely, multi-site, consecutive performance evaluation of the Viz ICH device for quality control purposes.
Subset # | Time period | Number of hospitals | Number of unique hospitals (appearing in this subset only) | Number of repeat hospitals (appearing in more than one subset)
4 | 05/01/2022 00:00-08:00; 06/01/2022 08:00-12:00; 07/01/2022 12:00-15:00 | 46 | |

Table 2: Study population distribution

Table 3
[…] volumetric detection threshold, is shown in Figure 4. The curve demonstrates the reliability of the AI model's detection as a function of the volumetric threshold on the size of the suspected ICH. The area under the ROC curve (AUC) was 0.97.

Table 4 classifies false positive (FP) cases by the reason for misclassification. Figure 5 shows common FP cases. The Artifacts category, which contains the artifact types mentioned above, includes only scans in which the artifacts did not limit the ability to identify ICH. Space-occupying lesions that present as hyperdense lesions may look similar to ICH, especially IPH. The Calcifications category includes calcified areas (pathological or benign) that the algorithm misidentified as ICH. Low scan quality refers to scans with a low signal-to-noise ratio.

Table 4: Classification of false positive cases by FP reason