Real-World Performance of AI-Powered ICH Triage System

Research Article | DOI: https://doi.org/10.58489/2836-3582/010


  • Khen Sela Peremen 1
  • Naama Avni 1
  • Lilian Atlan 1
  • Dor Amran 1
  1. 150 Menachem Begin Road, WE Tower, 19th floor, 6492105, Tel Aviv, Israel.

*Corresponding Author: Khen Sela Peremen

Citation: Khen Sela Peremen, Naama Avni, Lilian Atlan, Dor Amran. (2024). Real-World Performance of AI-Powered ICH Triage System. Journal of Hematology and Disorders. 3(1); DOI: 10.58489/2836-3582/010

Copyright: © 2024 Khen Sela Peremen, this is an open-access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Received: 14 August 2023 | Accepted: 04 March 2024 | Published: 14 March 2024

Keywords: Intracranial hemorrhage, Stroke, Artificial intelligence, Non-contrast CT scan, Brain bleed.

Abstract

Intracranial hemorrhage (ICH) is a life-threatening condition with high mortality and disability rates.

In March 2018, the FDA cleared Viz ICH [1], an AI tool that analyzes brain NCCT studies for the presence of ICH. Upon detection of a suspected ICH, the system sends a notification to alert a multidisciplinary team, promoting an efficient workflow that can lead to quick diagnosis and treatment.

The objective of this study is to characterize the performance of Viz ICH in real-world settings. FDA clearance is based on data generated in controlled settings; however, an effective AI tool should demonstrate its effectiveness on heterogeneous data originating from sites with varying equipment, protocols, and personnel skills.

Scans of 3139 patients from 127 hospitals were analyzed using Viz ICH. ICH detection was evaluated in terms of accuracy, sensitivity, specificity, PPV, NPV, and time to notification.

On a real-life consecutive set of 3156 scans, Viz ICH showed a high accuracy of 98.57%, with sensitivity, specificity, PPV and NPV of 90.48%, 98.97%, 81.1% and 99.53%, respectively. Mean time to notification was 3.39 minutes for suspected ICH cases.

Viz ICH demonstrates robust performance in heterogeneous, real-world settings. Furthermore, the swift time to notification can assist clinicians in speeding up diagnoses. Viz ICH can therefore be considered a reliable AI tool for medical centers of various sizes and capabilities for the goal of improving patient outcomes.

Abbreviations:

ICH – Intracranial hemorrhage; AI – Artificial intelligence; FDA – Food and Drug Administration; CT – Computed tomography; NCCT – Non-contrast CT; PPV – Positive predictive value; NPV – Negative predictive value; TBI – Traumatic brain injury; IPH – Intraparenchymal cerebral hemorrhage; IVH – Intraventricular hemorrhage; SAH – Subarachnoid hemorrhage; SDH – Subdural hemorrhage; EDH – Epidural hemorrhage; PSC – Primary stroke centers; CSC – Comprehensive stroke centers; ROC – Receiver operating characteristic; AUC – Area under the curve; DICOM – Digital Imaging and Communications in Medicine

Introduction

ICH is a medical emergency, with a 30-day mortality rate of 35% to 52%. Survivors' functional outcome is often very poor, with fewer than 20% being independent at six months [2].

ICH is typically divided into spontaneous (non-traumatic) ICH and traumatic ICH. Spontaneous ICH accounts for 9%-27% of all strokes [3], but the global burden of brain hemorrhage is greater than that of ischemic stroke in terms of death and disability, even though the incidence of ischemic stroke is twice as great [4].

ICH refers to the accumulation of blood outside blood vessels within various spaces in the brain. A common categorization is based on hemorrhage location: intraparenchymal cerebral hemorrhage (IPH), intraventricular hemorrhage (IVH), subarachnoid hemorrhage (SAH), subdural hemorrhage (SDH), and epidural hemorrhage (EDH). Traumatic ICH is a common result of traumatic brain injury (TBI) [5]; it is the most common cause of death in lethally injured trauma patients, accounting for up to 50% of fatalities [6], and results in a substantial amount of long-term disability [7].

ICH is a dynamically evolving process [8] in which response time is critical. Disease outcome is primarily affected by hematoma growth, perihematomal edema, and evolving intracranial pressure, which may necessitate immediate, potentially life-saving neurosurgical intervention [9]. Early detection and management of ICH have been shown to be associated with improved clinical outcomes [10,11].

In the setting of acute ischemic stroke, it is likewise essential to rule out ICH as quickly as possible, to enable optimal treatment that restores blood flow before irreversible brain damage occurs [12].

Acute ischemic stroke and ICH present with similar clinical features, but whereas thrombolytic treatment is an essential part of ischemic stroke management [13], it can be fatal when given to a patient with ICH. Time-efficient and reliable detection of ICH is therefore crucial in both settings.

Since the 1970s, the use of CT has changed the way we diagnose and manage patients, especially in the acute setting, and has improved the treatment of many acute medical conditions, including stroke [14]. According to Power et al. (2016), CT has many advantages, such as the wide availability of scanners and fast results, allowing physicians to confirm diagnoses and exclude critical conditions.

Over the past decades, imaging volume, and with it radiologists' on-call workload in the ambulatory and emergency room (ER) settings, has grown tremendously. This increase in imaging studies represents a major challenge for radiologists and medical centers around the world and necessitates adjustments in workforce and workflow to accommodate these changes while guaranteeing quality of care and patient safety and avoiding radiologist burnout [15,16].

In its initial clearance in March 2018 [1], the United States Food and Drug Administration (FDA) cleared Viz ICH as a notification-only, parallel workflow tool for use by hospital networks and trained clinicians to identify and communicate images of specific patients to a multidisciplinary team, independent of the standard-of-care workflow. Viz ICH uses an AI-based algorithm to analyze head NCCT images for findings suggestive of ICH and notifies appropriate medical specialists of these findings in parallel to standard-of-care image interpretation, as shown in Figure 1. When ICH is suspected, the system sends a notification to a multidisciplinary team, recommending review of those images. Images can be viewed through a mobile application (Figure 2) or via the Viz web viewer (Figure 3).

The primary focus of this study is to evaluate the performance of the Viz ICH device in terms of accuracy, sensitivity, specificity, PPV, NPV, and time-to-notification on a diverse dataset from different hospitals of varying sizes and capabilities to obtain a robust estimate of performance that represents real-world settings.

Methods

Data Collection

The data consist of four subsets, each collected consecutively from multiple hospitals during the time periods shown in Table 1. Data were collected and reviewed quarterly as part of the post-market, real-world performance-monitoring quality assurance mechanism for the Viz ICH medical device. Hospitals and collection dates were selected at random to ensure variety and to examine device generalization. Collection time periods were chosen to represent the full range of the intended population scanned throughout the day, on both weekdays and weekends.

Scans were excluded from the set in cases of motion artifacts, metal artifacts, or low quality that limited scan readability. Postoperative scans were also excluded because they are inadequate for the ICH screening task.

The collected cohort is diverse with respect to patient gender and age and scanning technical parameters (such as CT manufacturer and model, scan resolution, and protocols). Hospital type also varied, including both Primary Stroke Centers (PSC) and Comprehensive Stroke Centers (CSC) (see Table 2).

Truthing

The CT scans were interpreted by radiology-trained clinical specialists and a neuroradiology fellow. Every scan was evaluated for the presence of ICH and for the bleed type.

Time to Notification Measurement

To assess potential time savings, the time-to-notification was computed. The time-to-notification per scan is defined as the time elapsed between the CT acquisition start time and the time at which the notification was sent to the user.
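As a minimal sketch of this definition, the snippet below computes per-scan and mean time-to-notification, assuming each suspected-ICH scan is represented as a pair of Python `datetime` values (acquisition start, notification sent). The function names and example timestamps are hypothetical and are not taken from the study data or the Viz ICH software.

```python
from datetime import datetime, timedelta
from statistics import mean


def time_to_notification(acquisition_start: datetime, notification_sent: datetime) -> float:
    """Seconds elapsed between CT acquisition start and the notification being sent."""
    delta = notification_sent - acquisition_start
    if delta < timedelta(0):
        raise ValueError("notification time precedes acquisition start")
    return delta.total_seconds()


def mean_time_to_notification(records) -> float:
    """Mean time-to-notification in seconds over (start, sent) pairs."""
    return mean(time_to_notification(start, sent) for start, sent in records)


# Hypothetical suspected-ICH scans: (acquisition start, notification sent)
records = [
    (datetime(2021, 11, 7, 10, 0, 0), datetime(2021, 11, 7, 10, 3, 0)),
    (datetime(2021, 11, 8, 22, 15, 0), datetime(2021, 11, 8, 22, 19, 0)),
]
seconds = mean_time_to_notification(records)
print(f"mean time to notification: {seconds:.2f} s ({seconds / 60:.2f} min)")
```

The same seconds-to-minutes conversion applied to the mean reported in the Results gives 203.14 / 60 ≈ 3.39 minutes.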

 

Statistical Analysis

Statistical analysis used descriptive statistics, including ranges, means, medians, and standard deviations for continuous variables, and frequencies and percentages for categorical variables. System performance was evaluated using accuracy, sensitivity, specificity, PPV, and NPV.

Confidence intervals were calculated with the Clopper-Pearson (exact) method, which is based on the beta distribution. Receiver operating characteristic (ROC) analysis was used to characterize Viz ICH classification performance over multiple volumetric thresholds on the size of the suspected ICH.
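The Methods do not state the confidence level, so the sketch below assumes two-sided 95% intervals (alpha = 0.05). It finds the Clopper-Pearson bounds by bisection on the exact binomial tail, which is equivalent to the beta-quantile formulation (lower bound = Beta quantile alpha/2 with parameters k and n - k + 1; upper bound = Beta quantile 1 - alpha/2 with parameters k + 1 and n - k); where SciPy is available, `scipy.stats.beta.ppf` is the usual shortcut. The function names are illustrative and not taken from the study's analysis code.

```python
import math


def binom_tail_ge(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with terms summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_comb = log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        total += math.exp(log_comb + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def solve_increasing(f, target: float, iters: int = 100) -> float:
    """Bisection on [0, 1] for f(p) = target, where f is non-decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple:
    """Exact (Clopper-Pearson) two-sided confidence interval for the proportion k/n."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("require n > 0 and 0 <= k <= n")
    # Lower bound: the p at which P(X >= k) equals alpha/2.
    lower = 0.0 if k == 0 else solve_increasing(lambda p: binom_tail_ge(k, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha/2, i.e. P(X >= k + 1) = 1 - alpha/2.
    upper = 1.0 if k == n else solve_increasing(lambda p: binom_tail_ge(k + 1, n, p), 1 - alpha / 2)
    return lower, upper


# Sensitivity reported in the Results: 133 of 147 ICH-positive scans detected.
low, high = clopper_pearson(133, 147)
print(f"sensitivity 95% CI: [{low:.1%}, {high:.1%}]")
```

For 133/147 this should give roughly [85%, 95%] once rounded to whole percentages, in line with the sensitivity interval reported in the Results. Rounding to whole percentages is presumably also why a narrow interval, such as the one for specificity, appears as [99%, 99%].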

Fig 1: Management workflow for ICH. Left: standard ICH workflow; right: ICH workflow with Viz ICH.

Fig 2: Viz ICH’s mobile DICOM viewer, showing examples of suspected ICH scans.

Fig 3: Viz Web list of suspected ICH subjects (top) and single-subject scan view (bottom).

Table 1: Consecutively collected study subsets, time periods, and hospital distribution. Note that some hospitals contributed data in multiple time periods as part of timely, multi-site, consecutive performance evaluation of the Viz ICH device for quality control purposes.

Subset # | Time Period | Number of hospitals (unique: appear in the specified subset only; repeat: appear in more than one subset)
1 | 10/23/2021 00:00 - 10/24/2021 23:59 | 08
2 | 11/07/2021 00:00 - 11/14/2021 23:59 | 2648
3 | 02/01/2022 00:00 - 02/03/2022 23:59 | 318
4 | 05/01/2022 00:00-08:00; 06/01/2022 08:00-12:00; 07/01/2022 12:00-15:00 | 46, 50

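Table 2: Study population distribution

Category | Subcategory | Number of Cases (out of 3156) | Percentage of Cases
Age range [years] | < 50 | 823 | 26.08%
Age range [years] | 50-80 | 1672 | 52.98%
Age range [years] | > 80 | 661 | 20.94%
Patient gender | F | 1707 | 54.09%
Patient gender | M | 1449 | 45.91%
Hospital hierarchy | Comprehensive Stroke Center (CSC - Hub) | 1585 | 50.22%
Hospital hierarchy | Primary Stroke Center (PSC - Spoke) | 1571 | 49.78%
Slice thickness [mm] | 2.5 | 868 | 27.50%
Slice thickness [mm] | 3.0 | 572 | 18.12%
Slice thickness [mm] | 4.0 | 183 | 5.80%
Slice thickness [mm] | 5.0 | 1533 | 48.57%
Number of slices | 18-39 | 1529 | 48.45%
Number of slices | 40-59 | 779 | 24.68%
Number of slices | 60-125 | 848 | 26.87%
Manufacturer | Canon Medical Systems | 22 | 0.70%
Manufacturer | GE MEDICAL SYSTEMS | 1309 | 41.48%
Manufacturer | Hitachi, Ltd. | 6 | 0.19%
Manufacturer | Philips | 154 | 4.88%
Manufacturer | SIEMENS | 1172 | 37.14%
Manufacturer | TOSHIBA | 493 | 15.62%

Abbreviations:
CSC - Comprehensive Stroke Center
PSC - Primary Stroke Center
mm - millimeter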

Results

Over the chosen timeframe, 3420 non-contrast head CT scans were analyzed by the Viz ICH algorithm, of which 264 were excluded (see the exclusion criteria in the Data Collection section). Of the remaining 3156 scans, 147 were positive for ICH, an ICH prevalence of 4.6%. Scans were collected from 127 hospitals: 33 comprehensive stroke centers (hubs) and 94 primary stroke centers (spokes). Table 2 shows the distribution of the study population by patient age range and gender, hospital hierarchy, and scan technical parameters such as slice thickness, number of slices, and scanner manufacturer.

Study Scans Categorical Sub-Grouping 

Performance Analysis

Table 3 describes Viz ICH’s performance. Viz ICH showed a high accuracy of 98.57% [CI: 98%, 99%]. The algorithm correctly identified 133/147 ICH-positive scans and 2978/3009 ICH-negative scans, resulting in sensitivity, specificity, PPV and NPV of 90.48% [CI: 85%, 95%], 98.97% [CI: 99%, 99%], 81.1% [CI: 74%, 87%] and 99.53% [CI: 99%, 100%], respectively. Performance was stratified by age, gender, and hospital hierarchy. As shown in Table 3, these baseline factors (age, gender, and hospital hierarchy) did not have a significant effect on the sensitivity and specificity of ICH detection, based on a t-test of the differences in these subgroups' metrics (P-Value<0>
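All five point estimates above follow from the four confusion-matrix counts. The sketch below recomputes them from the counts stated in the text (TP = 133, FN = 147 - 133, TN = 2978, FP = 3009 - 2978) and from the hospital-hierarchy counts in Table 3; the `ConfusionCounts` class is a hypothetical helper for illustration, not part of any Viz.ai tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # ICH-positive scans flagged by the device
    fn: int  # ICH-positive scans not flagged
    tn: int  # ICH-negative scans not flagged
    fp: int  # ICH-negative scans flagged (false alarms)

    def metrics(self) -> dict:
        total = self.tp + self.fn + self.tn + self.fp
        return {
            "accuracy": (self.tp + self.tn) / total,
            "sensitivity": self.tp / (self.tp + self.fn),
            "specificity": self.tn / (self.tn + self.fp),
            "ppv": self.tp / (self.tp + self.fp),
            "npv": self.tn / (self.tn + self.fn),
        }


overall = ConfusionCounts(tp=133, fn=147 - 133, tn=2978, fp=3009 - 2978)
csc = ConfusionCounts(tp=86, fn=95 - 86, tn=1474, fp=1490 - 1474)  # hubs
psc = ConfusionCounts(tp=47, fn=52 - 47, tn=1504, fp=1519 - 1504)  # spokes

for name, value in overall.metrics().items():
    print(f"{name}: {value:.2%}")
```

Because each scan falls in exactly one hospital-hierarchy stratum, the stratified counts add back up to the overall ones (86 + 47 = 133 true positives, 16 + 15 = 31 false positives), which is a quick consistency check on the subgroup rows of Table 3.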

In addition, the mean Viz ICH time to notification was 203.14 seconds (3.39 minutes) for all suspected ICH cases. The receiver operating characteristic (ROC) curve for Viz ICH, based on the ICH volumetric detection threshold, is shown in Figure 4. The curve characterizes the model's detection reliability as a function of the volumetric threshold applied to the size of the suspected ICH. The area under the ROC curve (AUC) was 0.97.
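A threshold-sweep ROC of this kind can be sketched as follows, assuming (as the Figure 4 caption suggests, but without confirmation of the device internals) that each scan receives a single suspected-ICH volume in mL, 0 when nothing is detected, and is classified as positive when that volume is at or above the threshold. The volumes and labels below are hypothetical toy data, and the AUC uses the trapezoidal rule.

```python
def roc_points(volumes, labels):
    """(FPR, TPR) pairs obtained by sweeping a volume threshold over all observed values.

    A scan is classified as ICH-suspected when its volume is >= the threshold.
    labels: 1 for ICH-positive ground truth, 0 for ICH-negative.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative scan")
    points = [(0.0, 0.0)]  # threshold above every volume: nothing flagged
    for threshold in sorted(set(volumes), reverse=True):
        flagged = [y for v, y in zip(volumes, labels) if v >= threshold]
        tp = sum(flagged)
        fp = len(flagged) - tp
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve (points sorted by increasing FPR)."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy data: suspected-ICH volume (mL) per scan and its ground-truth label.
volumes = [0.0, 2.0, 1.0, 5.0]
labels = [0, 0, 1, 1]
print(auc_trapezoid(roc_points(volumes, labels)))  # 0.75 for this toy example
```

Lowering the threshold can only add flagged scans, so both rates grow monotonically and the curve always ends at (1, 1). The toy AUC of 0.75 equals the fraction of positive-negative pairs in which the positive scan has the larger volume (3 of 4 pairs).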

Error Analysis

Table 4 classifies false positive (FP) cases according to the reason for misclassification, and Figure 5 shows common FP examples. The Artifacts category, which covers the artifact types listed in Table 4, contains only scans in which the artifacts did not limit the ability to identify ICH. Space occupying lesions (SOLs) that present as hyperdense lesions may have an appearance similar to ICH, especially IPH. The Calcifications category includes calcified areas (pathological or benign) that the algorithm misidentified as ICH. Low scan quality refers to scans with a low signal-to-noise ratio.

Fig 4: Viz ICH classification ROC plot, based on the ICH volumetric classification threshold (ranging from 0 to ~206 mL, the largest suspected ICH volume in the examined set) that determines the final device classification. The ROC curve shows an AUC of 0.97.

Fig 5: FP examples. Left to right: (A) calcifications, (B) space occupying lesion, (C) metal artifacts.

Table 3: Viz ICH’s diagnostic performance. Each cell gives the point estimate, the confidence interval in brackets, and the underlying counts in parentheses.

Category | Subcategory | Accuracy | Sensitivity | Specificity | PPV | NPV
General performance | - | 98.57% [98%, 99%] (3111/3156) | 90.48% [85%, 95%] (133/147) | 98.97% [99%, 99%] (2978/3009) | 81.1% [74%, 87%] (133/164) | 99.53% [99%, 100%] (2978/2992)
Bleed type | IPH | 97.62% [87%, 100%] (41/42) | 97.62% [87%, 100%] (41/42) | ** | ** | **
Bleed type | IVH | 75.0% [19%, 99%] (3/4) | 75.0% [19%, 99%] (3/4) | ** | ** | **
Bleed type | SDH | 90.0% [73%, 98%] (27/30) | 90.0% [73%, 98%] (27/30) | ** | ** | **
Bleed type | SAH | 66.67% [41%, 87%] (12/18) | 66.67% [41%, 87%] (12/18) | ** | ** | **
Bleed type | EDH | 66.67% [9%, 99%] (2/3) | 66.67% [9%, 99%] (2/3) | ** | ** | **
Bleed type | Multiple bleeds | 96.08% [87%, 100%] (49/51) | 96.08% [87%, 100%] (49/51) | ** | ** | **
Age range [years] | < 50 | 98.91% [98%, 99%] (814/823) | 88.89% [71%, 98%] (24/27) | 99.25% [98%, 100%] (790/796) | 80.0% [61%, 92%] (24/30) | 99.62% [99%, 100%] (790/793)
Age range [years] | 50-80 | 98.27% [98%, 99%] (1643/1672) | 88.76% [80%, 94%] (79/89) | 98.8% [98%, 99%] (1564/1583) | 80.61% [71%, 88%] (79/98) | 99.36% [99%, 100%] (1564/1574)
Age range [years] | > 80 | 98.94% [98%, 100%] (654/661) | 96.77% [83%, 100%] (30/31) | 99.05% [98%, 100%] (624/630) | 83.33% [67%, 94%] (30/36) | 99.84% [99%, 100%] (624/625)
Patient gender | Male | 98.62% [98%, 99%] (1429/1449) | 86.9% [77%, 93%] (73/84) | 99.34% [99%, 100%] (1356/1365) | 89.02% [80%, 95%] (73/82) | 99.2% [99%, 100%] (1356/1367)
Patient gender | Female | 98.54% [98%, 99%] (1682/1707) | 95.24% [87%, 99%] (60/63) | 98.66% [98%, 99%] (1622/1644) | 73.17% [62%, 82%] (60/82) | 99.82% [99%, 100%] (1622/1625)
Hospital hierarchy | Comprehensive Stroke Center (Hub) | 98.42% [98%, 99%] (1560/1585) | 90.53% [83%, 96%] (86/95) | 98.93% [98%, 99%] (1474/1490) | 84.31% [76%, 91%] (86/102) | 99.39% [99%, 100%] (1474/1483)
Hospital hierarchy | Primary Stroke Center (Spoke) | 98.73% [98%, 99%] (1551/1571) | 90.38% [79%, 97%] (47/52) | 99.01% [98%, 99%] (1504/1519) | 75.81% [63%, 86%] (47/62) | 99.67% [99%, 100%] (1504/1509)

Abbreviations:
PPV – Positive predictive value
NPV – Negative predictive value
IPH – Intraparenchymal cerebral hemorrhage
IVH – Intraventricular hemorrhage
SAH – Subarachnoid hemorrhage
SDH – Subdural hemorrhage
EDH – Epidural hemorrhage
PSC – Primary stroke centers
CSC – Comprehensive stroke centers

Table 4: Classification of false positive cases according to FP reason

FP reason | Number of Cases | Percentage of Cases
Artifacts (skull artifacts, scan artifacts and metal artifacts) | 9 | 29.03%
Space Occupying Lesion (SOL) | 7 | 22.58%
Calcifications | 3 | 9.68%
Low Scan Quality | 3 | 9.68%
Other | 9 | 29.03%
Total | 31 | 100%

 

Table 5: Viz ICH’s diagnostic performance, sub-grouped by additional technical factors. Each cell gives the point estimate, the confidence interval in brackets, and the underlying counts in parentheses.

Category | Subcategory | Accuracy | Sensitivity | Specificity | PPV | NPV
Number of slices | 18-39 | 98.82% [98%, 99%] (1511/1529) | 92.54% [83%, 98%] (62/67) | 99.11% [98%, 100%] (1449/1462) | 82.67% [72%, 90%] (62/75) | 99.66% [99%, 100%] (1449/1454)
Number of slices | 40-59 | 97.56% [96%, 99%] (760/779) | 86.11% [71%, 95%] (31/36) | 98.12% [97%, 99%] (729/743) | 68.89% [53%, 82%] (31/45) | 99.32% [98%, 100%] (729/734)
Number of slices | 60-125 | 99.06% [98%, 100%] (840/848) | 90.91% [78%, 97%] (40/44) | 99.5% [99%, 100%] (800/804) | 90.91% [78%, 97%] (40/44) | 99.5% [99%, 100%] (800/804)
Slice thickness [mm] | 2.5 | 98.85% [98%, 99%] (858/868) | 89.13% [76%, 96%] (41/46) | 99.39% [99%, 100%] (817/822) | 89.13% [76%, 96%] (41/46) | 99.39% [99%, 100%] (817/822)
Slice thickness [mm] | 3.0 | 98.25% [97%, 99%] (562/572) | 100.0% [77%, 100%] (14/14) | 98.21% [97%, 99%] (548/558) | 58.33% [37%, 78%] (14/24) | 100.0% [99%, 100%] (548/548)
Slice thickness [mm] | 4.0 | 97.27% [94%, 99%] (178/183) | 83.33% [59%, 96%] (15/18) | 98.79% [96%, 100%] (163/165) | 88.24% [64%, 99%] (15/17) | 98.19% [95%, 100%] (163/166)
Slice thickness [mm] | 5.0 | 98.7% [98%, 99%] (1513/1533) | 91.3% [82%, 97%] (63/69) | 99.04% [98%, 99%] (1450/1464) | 81.82% [71%, 90%] (63/77) | 99.59% [99%, 100%] (1450/1456)
Manufacturer | SIEMENS | 98.38% [97%, 99%] (1153/1172) | 91.11% [79%, 98%] (41/45) | 98.67% [98%, 99%] (1112/1127) | 73.21% [60%, 84%] (41/56) | 99.64% [99%, 100%] (1112/1116)
Manufacturer | GE MEDICAL SYSTEMS | 98.78% [98%, 99%] (1293/1309) | 90.62% [81%, 96%] (58/64) | 99.2% [99%, 100%] (1235/1245) | 85.29% [75%, 93%] (58/68) | 99.52% [99%, 100%] (1235/1241)
Manufacturer | Philips | 99.35% [96%, 100%] (153/154) | 85.71% [42%, 100%] (6/7) | 100.0% [98%, 100%] (147/147) | 100.0% [54%, 100%] (6/6) | 99.32% [96%, 100%] (147/148)
Manufacturer | TOSHIBA | 98.38% [97%, 99%] (485/493) | 90.32% [74%, 98%] (28/31) | 98.92% [97%, 100%] (457/462) | 84.85% [68%, 95%] (28/33) | 99.35% [98%, 100%] (457/460)

 

Discussion

Viz ICH is an AI-based tool intended for analyzing brain NCCT images, which are often acquired in the acute setting and demand rapid interpretation for optimal patient care. When the algorithm identifies a scan as suspected ICH, the system notifies a multidisciplinary team, recommending immediate review of those images. Images can be promptly previewed through a web interface or a mobile application, allowing clinicians to assess the case remotely.

In recent years, radiologists' workload has increased dramatically [17]. Long shifts, an increasing number of scans, and manpower shortages may not only lead to radiologist burnout but also risk critical diagnostic errors [18,19]. Since medical imaging is a major contributor to the overall diagnostic process, it is also a major potential source of diagnostic error [20]; rates of clinically significant errors range from 2% to 20% [21]. Methods such as double reporting have been shown to be effective [22] but are time-consuming and demand more workforce [23]. AI-based algorithms such as Viz ICH could help reduce radiologists' workload, shorten reporting time, and serve as additional tools for radiologists.

ICH is a life-threatening condition with high mortality rates and complex morbidity [24]. ICH is generally an acutely evolving process, for which quick diagnosis and treatment are key to better outcomes [24,25]. Most importantly, a missed ICH on NCCT may have a fatal impact on patients, leading to delayed diagnosis, complications, and death [26]. AI tools designed to detect and notify about suspected ICH could help decrease the error rate, minimize false-negative diagnoses [27], and shorten notification time.

In this study, we aimed to examine the performance of Viz ICH in real-life settings. While FDA clearances of computer-aided triage devices typically include supporting information on the sensitivity and specificity of the cleared device as well as average time savings, these data points need to be validated in a real-world setting for several reasons:

  • Quality bias: Sites participating in pivotal studies supporting FDA submissions are typically at the cutting edge of the field. As such, the average quality of a scan at a site participating in a pivotal study may not represent the average quality of a scan at an average hospital in the US. These data may therefore be biased toward higher-quality imaging due to more modern or advanced equipment, and greater experience of the CT technologists and staff may lead to fewer technical or patient-driven issues, such as patient motion.
  • Overfitting due to lack of diversity in data: Overfitting [28] is a general problem in modeling whereby a model has good predictive capabilities on a training dataset but lower performance on real-world data. While there are many reasons for this phenomenon, the most relevant for AI applications in medical imaging is a difference between the distribution of the training data and that of the real-world test data. For example, a device developed using data from only a subset of scanner makes, models, and configurations might perform well on those data but worse on more diverse data. When a device is developed using a dataset with limited diversity, there is a risk that its performance on more diverse data will be lower.
  • Accuracy: Since Viz ICH was cleared, a large amount of data has been processed from many sites. This real-world data can be leveraged to evaluate the AI performance more accurately.
  • Generalization: Real-world evidence, such as that presented here, may help payors, clinicians, and administrators decide whether there is sufficient evidence that the proposed solutions would work in their specific situations. Moreover, demonstrating the effectiveness of a workflow enhancement, such as earlier alerting of specialists, does not guarantee that the enhancement transfers to a different system without an understanding of that system's specific circumstances. The value of such evidence lies in demonstrating that the benefit is realistically achievable.
  • Data drift: Supervised AI algorithms learn from past examples to predict the requested outputs. In a dynamic clinical environment, this static approach preserves the data distributions learned at training time. A likely outcome of this setup is a decrease in performance over time as the data distribution changes, due to technical or clinical changes that manifest as modifications in image appearance. In practice, this means deployed algorithms must be monitored to ensure their safety over time.
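The paper does not specify how drift is measured beyond the quarterly review described in Methods. As one hypothetical illustration, the sketch below uses a population stability index (PSI), a generic drift heuristic swapped in here and not the authors' method, to compare the distribution of a categorical scan attribute between a reference period and a new quarter. The reference counts are the slice-thickness distribution from Table 2; the new-quarter counts and the 0.1 / 0.25 cut-offs (commonly quoted rules of thumb) are assumptions, not validated thresholds for this device.

```python
import math


def population_stability_index(reference: dict, current: dict, eps: float = 1e-6) -> float:
    """PSI between two categorical count distributions (category -> count)."""
    categories = set(reference) | set(current)
    ref_total = sum(reference.values())
    cur_total = sum(current.values())
    psi = 0.0
    for c in categories:
        # eps guards against log(0) when a category is absent in one period
        r = max(reference.get(c, 0) / ref_total, eps)
        q = max(current.get(c, 0) / cur_total, eps)
        psi += (q - r) * math.log(q / r)
    return psi


def drift_level(psi: float) -> str:
    """Map a PSI value to a coarse label using rule-of-thumb cut-offs (assumed)."""
    if psi < 0.1:
        return "stable"
    if psi < 0.25:
        return "moderate shift"
    return "major shift"


# Reference: slice-thickness mix from Table 2; the new-quarter counts are hypothetical.
reference = {"2.5": 868, "3.0": 572, "4.0": 183, "5.0": 1533}
new_quarter = {"2.5": 150, "3.0": 100, "4.0": 30, "5.0": 720}
psi = population_stability_index(reference, new_quarter)
print(f"PSI = {psi:.3f} -> {drift_level(psi)}")
```

A shift flagged this way would only prompt a closer look; whether it actually degrades sensitivity or specificity still has to be checked against ground-truth reads, as in the quarterly review described in this study.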

In 2022, Matsoukas et al. [29] published an article examining the real-world performance of Viz ICH in all hospitals of the Mount Sinai Health System. In the present study, we aimed to examine a larger cohort of sites, varying in equipment, population, protocols, and personnel skills, thus exploring AI model generalization over a diversified dataset.

In this study, we examined scans from 127 hospitals: 33 comprehensive stroke centers (hubs) and 94 primary stroke centers (spokes). The hospitals use different CT manufacturers and protocols, as shown in Table 2. The extensive and varied nature of our study mitigates the risk of bias and of the artificial inflation of performance metrics commonly observed in studies featuring minimally diverse, homogeneous cohorts. Such broad coverage helps ensure that our findings are robust and reflect clinical utility, rather than the potentially biased results of more limited research settings such as single-site studies.

The algorithm's sensitivity and specificity were robust across all subgroups, including gender, age group, and hospital hierarchy. When comparing the causes of FP scans between CSCs and PSCs, the latter group showed more SOLs misidentified as ICH by Viz ICH, a factor unrelated to hospital hierarchy. In addition, a slightly higher number of scans containing artifacts was observed in the PSC subgroup, which, combined with the smaller number of ICH-positive scans in this group compared with CSCs (52 vs. 95, Table 3), may explain the difference detected in PPV. A different prevalence of suspected-stroke patients arriving at PSCs vs. CSCs may also contribute to the difference in PPV.
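To illustrate the prevalence argument, the sketch below applies Bayes' rule for PPV while holding sensitivity and specificity fixed at their overall values and varying only prevalence, using the ICH-positive shares implied by the Table 3 counts (95/1585 at CSCs, 52/1571 at PSCs). It isolates the prevalence effect only; it is not the authors' analysis and ignores the artifact and SOL differences discussed above.

```python
def ppv_from_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


sens, spec = 133 / 147, 2978 / 3009  # overall sensitivity and specificity (Table 3)
prev_csc = 95 / 1585                 # ICH-positive share of scans at CSCs (hubs)
prev_psc = 52 / 1571                 # ICH-positive share of scans at PSCs (spokes)

print(f"CSC: {ppv_from_prevalence(sens, spec, prev_csc):.1%}")
print(f"PSC: {ppv_from_prevalence(sens, spec, prev_psc):.1%}")
```

With these inputs the sketch gives roughly 85% for CSCs and 75% for PSCs, close to the observed 84.31% and 75.81%. This is consistent with, though it does not prove, prevalence being a contributor to the PPV gap.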

The mean Viz ICH time to notification is 3.39 minutes. Such a notification time allows a multidisciplinary team to review suspected cases with higher priority, which can lead to faster diagnosis, assist treatment decisions, and result in better outcomes. For comparison, a previous study reported a mean radiologist time to report of 132 minutes, shortened to 73 minutes when radiologists were assisted by an AI algorithm [30].

When examining Viz ICH’s sensitivity by bleed type, some variation is observed. Viz ICH is more sensitive to IPH (97.62%, CI: 87%, 100%) and SDH (90.0%, CI: 73%, 98%) than to IVH (75%, CI: 19%, 99%), EDH (66.67%, CI: 9%, 99%), and SAH (64.71%, CI: 38%, 86%). Note, however, that IVH, EDH, and SAH have low prevalence in this study (4 IVH-positive cases, 3 EDH-positive cases, and 18 SAH-positive cases), resulting in wide confidence intervals that limit the statistical reliability of these specific findings. Hence, further investigation with a cohort enriched with EDH-, IVH-, and SAH-positive cases is needed to determine Viz ICH’s sensitivity for these ICH subtypes. The prevalence of SAH in this study was low, with only 18 positive cases out of 3156 scans, equivalent to approximately 0.57%. In 2019, the US recorded 0.85 million SAH cases [31], with an estimated population of 328.24 million for that year [32]; based on these figures, the SAH prevalence can be estimated at approximately 0.26%. SAH incidence rates vary between 6.2 and 11.15 per 100,000 people [33,34]. A low disease prevalence is typically associated with lower sensitivity in detection, and the low prevalence of SAH could account for the lower sensitivity observed for SAH compared with the overall sensitivity across all bleed types in this study.
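The prevalence figures in this paragraph can be reproduced directly from the quoted counts. The sketch below only redoes that arithmetic, with no epidemiological adjustment, and prints each value both as a fraction and as a percentage, since the two are easy to conflate.

```python
study_sah_cases, study_scans = 18, 3156                 # SAH-positive scans in this study
us_sah_cases_2019, us_population_2019 = 0.85e6, 328.24e6  # figures quoted from [31] and [32]

study_prevalence = study_sah_cases / study_scans
us_prevalence = us_sah_cases_2019 / us_population_2019

print(f"study SAH prevalence: {study_prevalence:.4f} ({study_prevalence:.2%})")
print(f"US SAH prevalence estimate: {us_prevalence:.4f} ({us_prevalence:.2%})")
```

The first line prints 0.0057 (0.57%) and the second 0.0026 (0.26%); a fraction of 0.0057 therefore corresponds to about 0.57%, i.e. roughly one SAH-positive scan in 175.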

ICH prevalence in this study was 4.6%, substantially higher than the ICH prevalence in the general population documented in epidemiological studies [35,36]. Beyond the fact that people who undergo head CT in the hospital setting are more prone to ICH than the general population, this gap could be explained by the fact that the scans reviewed by the Viz ICH algorithm are only those included based on certain criteria (e.g., head injury, stroke protocol), which are more likely to be positive for ICH.

Limitations

Study data were limited to the data available to the Viz ICH algorithm. Important patient information, such as medical history, clinical presentation, laboratory results, and clinical outcomes, was not available for this study.

The literature on ICH subtype prevalence lacks consistent measures for comparison [33,34,35], making it difficult to determine the prevalence of each subtype in the general population. Thus, we could not assess the external validity of this article with respect to the real-world prevalence of ICH subtypes.

Conclusion

Viz ICH demonstrates robust performance in identifying ICH on NCCT in most parameters, despite the heterogeneity of setting, equipment, and processes. Therefore, it can be considered a reliable and efficient AI tool for medical centers of all sizes and capabilities. It shows excellent accuracy, sensitivity, specificity and NPV values, as well as swift time to notification, which can have a significant impact on patients' lives.

Conflict of interest

All the authors are employees of Viz.ai, and receive a salary from Viz.ai, the company responsible for the development and commercialization of the AI algorithm discussed in this article.

References