TY - JOUR
T1 - Artificial intelligence (AI) models for the ultrasonographic diagnosis of liver tumors and comparison of diagnostic accuracies between AI and human experts
AU - JSUM A. I. investigators
AU - Nishida, Naoshi
AU - Yamakawa, Makoto
AU - Shiina, Tsuyoshi
AU - Mekada, Yoshito
AU - Nishida, Mutsumi
AU - Sakamoto, Naoya
AU - Nishimura, Takashi
AU - Iijima, Hiroko
AU - Hirai, Toshiko
AU - Takahashi, Ken
AU - Sato, Masaya
AU - Tateishi, Ryosuke
AU - Ogawa, Masahiro
AU - Mori, Hideaki
AU - Kitano, Masayuki
AU - Toyoda, Hidenori
AU - Ogawa, Chikara
AU - Kudo, Masatoshi
N1 - Funding Information:
This research was funded by the Japan Agency for Medical Research and Development (AMED) under Grant Number 18lk1010030h0001 (MK) and 20lk1010035h0002 (MK). We thank Professor Tomohiro Kuroda (Division of Medical Information Technology and Administrative Planning, Kyoto University Hospital) for construction of the data curation system and the database.
MK and NN received scholarship grants from GE Healthcare Japan. NS received patent royalties from Gilead Sciences, Otsuka Pharm, AbbVie, Eisai, and MSD; scholarship grants from Eisai, Sumitomo Dainippon Pharma, Nihon Kayaku, Takeda, Bayer, Chugai Pharm, Otsuka Pharm, Gilead Sciences, Astellas, and Daiichi Sankyo; and an endowed chair from Tsushima Management, Nagano Prefecture, Eiken Kagaku, Otsuka Pharm, and EA Pharma. HI and TN received research grants from Canon Medical Systems and scholarship grants from AbbVie, Otsuka Pharm, and Sumitomo Dainippon Pharma. MO received scholarship grants from GE Healthcare Japan. All other authors declare no competing interests.
Publisher Copyright:
© 2022, The Author(s).
PY - 2022/4
Y1 - 2022/4
N2 - Background: Ultrasonography (US) is widely used for the diagnosis of liver tumors. However, diagnostic accuracy depends largely on human visual perception. Hence, we aimed to construct artificial intelligence (AI) models for the diagnosis of liver tumors on US. Methods: We constructed three AI models based on still B-mode images: model-1 using 24,675 images, model-2 using 57,145 images, and model-3 using 70,950 images. A convolutional neural network was trained on the US images. Four-class discrimination of liver tumors by the AI models (cysts, hemangiomas, hepatocellular carcinoma, and metastatic tumors) was examined. The accuracy of the AI diagnosis was evaluated using tenfold cross-validation. The diagnostic performances of the AI models and of physicians were also compared using an independent test cohort of video images. Results: The diagnostic accuracies of model-1, model-2, and model-3 for the four tumor types were 86.8%, 91.0%, and 91.1%, respectively, whereas those for distinguishing malignant from benign tumors were 91.3%, 94.3%, and 94.3%, respectively. In the independent comparison of the AI models and physicians, the percentages of correct diagnoses (accuracies) of model-1, model-2, and model-3 were 80.0%, 81.8%, and 89.1%, respectively. Meanwhile, the median percentages of correct diagnoses by human experts and non-experts were 67.3% (range 63.6%–69.1%) and 47.3% (range 45.5%–47.3%), respectively. Conclusion: The AI models outperformed human experts in both four-class discrimination and benign/malignant discrimination of liver tumors. Thus, the AI models can help prevent human errors in US diagnosis.
AB - Background: Ultrasonography (US) is widely used for the diagnosis of liver tumors. However, diagnostic accuracy depends largely on human visual perception. Hence, we aimed to construct artificial intelligence (AI) models for the diagnosis of liver tumors on US. Methods: We constructed three AI models based on still B-mode images: model-1 using 24,675 images, model-2 using 57,145 images, and model-3 using 70,950 images. A convolutional neural network was trained on the US images. Four-class discrimination of liver tumors by the AI models (cysts, hemangiomas, hepatocellular carcinoma, and metastatic tumors) was examined. The accuracy of the AI diagnosis was evaluated using tenfold cross-validation. The diagnostic performances of the AI models and of physicians were also compared using an independent test cohort of video images. Results: The diagnostic accuracies of model-1, model-2, and model-3 for the four tumor types were 86.8%, 91.0%, and 91.1%, respectively, whereas those for distinguishing malignant from benign tumors were 91.3%, 94.3%, and 94.3%, respectively. In the independent comparison of the AI models and physicians, the percentages of correct diagnoses (accuracies) of model-1, model-2, and model-3 were 80.0%, 81.8%, and 89.1%, respectively. Meanwhile, the median percentages of correct diagnoses by human experts and non-experts were 67.3% (range 63.6%–69.1%) and 47.3% (range 45.5%–47.3%), respectively. Conclusion: The AI models outperformed human experts in both four-class discrimination and benign/malignant discrimination of liver tumors. Thus, the AI models can help prevent human errors in US diagnosis.
KW - Artificial intelligence
KW - Deep neural network
KW - Diagnosis
KW - Liver tumor
KW - Ultrasonography
UR - http://www.scopus.com/inward/record.url?scp=85127729453&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85127729453&partnerID=8YFLogxK
U2 - 10.1007/s00535-022-01849-9
DO - 10.1007/s00535-022-01849-9
M3 - Article
C2 - 35220490
AN - SCOPUS:85127729453
SN - 0944-1174
VL - 57
SP - 309
EP - 321
JO - Journal of Gastroenterology
JF - Journal of Gastroenterology
IS - 4
ER -