Computer-Aided Diagnosis in the Evaluation of Thyroid Nodules: A Study of Intra- And Inter-Rater Reliability and Agreement.

Apr 6

Updated: Apr 13

Artificial Intelligence (AI) Tools for Thyroid Nodules on Ultrasound, From the AJR Special Series on AI Applications.

Abstract

Objective To evaluate the intra- and inter-rater reliability of a computer-aided diagnosis system applied to thyroid nodule assessment.

Methods This prospective, single-center study included 150 thyroid nodules evaluated by two physicians at two time points, 90 days apart. Analyses were performed using the AmCAD-UT system, focusing on morphological features and ACR TI-RADS classification. Cohen's kappa coefficient and percentage agreement were used to assess reliability.
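For readers unfamiliar with the two statistics named above, the following is a minimal sketch of how percentage agreement and Cohen's kappa can be computed for one categorical feature rated twice. It is not the study's analysis code: the function names and the ten "yes"/"no" ratings are made up for illustration, and the confidence intervals reported in the abstract are not reproduced here.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Share of items (in %) on which the two sets of ratings coincide."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of agreement.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rating set's marginal counts.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both sets use a single identical category.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical ratings of a binary feature (e.g. "taller-than-wide")
# for ten nodules; illustrative data only, not taken from the study.
rater_1 = ["yes", "yes", "no", "no", "yes", "no", "no", "no", "yes", "no"]
rater_2 = ["yes", "no", "no", "no", "yes", "no", "yes", "no", "yes", "no"]

print(f"agreement = {percent_agreement(rater_1, rater_2):.1f}%")
print(f"kappa     = {cohen_kappa(rater_1, rater_2):.2f}")
```

The same two functions cover both designs in the study: passing one rater's ratings from the two time points gives an intra-rater estimate, while passing two raters' ratings from the same time point gives an inter-rater estimate. Because kappa subtracts chance agreement, a feature can show high percentage agreement and a much lower kappa when one category dominates.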

Results Intra-rater reliability ranged from moderate to almost perfect, with kappa values from 0.49 (95% CI: 0.31–0.66) to 0.98 (95% CI: 0.96–1.00), and agreement rates from 81.3% to 99.3%. Rater 2 demonstrated higher reproducibility across most variables, particularly for “texture” (k = 0.98), “margin” (k = 0.90), “composition” (k = 0.93), and “taller-than-wide” (k = 0.92). Inter-rater agreement was more variable, with kappa values ranging from 0.43 (95% CI: 0.23–0.62) to 0.96 (95% CI: 0.89–1.00), and agreement percentages from 78.0% to 99.3%. The lowest inter-rater reproducibility was observed for “shape”.

Conclusion The computer-aided diagnosis system demonstrated predominantly moderate to almost perfect intra-rater reliability and moderate to strong inter-rater agreement across most evaluated features. The highest reproducibility was observed for “taller-than-wide,” “texture,” and “composition,” whereas “shape” consistently showed lower agreement. These findings support the system's role as a reliable adjunct for standardizing thyroid nodule assessment, although its performance remains partially influenced by operational factors and warrants further multicenter validation.
