Quantitative Nuclear Grading to Increase Reliability and Consistency in Bladder Cancer Diagnoses: Model Construction and Validation
artificial intelligence, bladder cancer, grading, urothelial carcinoma, digital pathology, validation
Bladder cancer, with a simple yet subjective and unreliable grading scheme, is a prime candidate for the development of a reproducible and quantitative grading model. In the early form of the disease, tumours are separated into low- and high-grade based on qualitative histological differences. Unfortunately, these criteria exist on a spectrum without standardized thresholds, resulting in unacceptably high inter-observer variability. To counteract this unreliability and irreproducibility, we developed internally cross-validated and externally validated quantitative grading models. Using two large non-muscle-invasive bladder cancer cohorts from different continents, we analyzed the histological features in 1023 low-grade and 199 high-grade images taken from 774 cases. Images underwent consensus grading according to the 2004 World Health Organization and International Society of Urological Pathology guidelines. Lymphocytes served as an internal control to standardize image magnification across cohorts. Tissue regions and nuclei were identified using automated image analysis software, and size, shape, and mitotic rate were quantified across all nuclei. We found consistent differences in size-related features across cohorts; specifically, the standard deviation of nuclear area differentiated between the two grades with > 80% balanced accuracy. In contrast, there were large discrepancies in mitotic index and shape-related features between cohorts, which may result from technical errors or sampling differences. We used the first cohort to train a variety of classification models, many of which performed well on the training data. However, of all the multivariate models, only the random forest generalized well to the validation cohort, achieving a balanced accuracy of 80%. These findings, obtained using a discovery-validation approach with two distinct datasets, strongly support the use of a random forest model to help standardize bladder cancer grading.
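To make the reported metric concrete, the following is a minimal sketch of how a single size-related feature, the standard deviation of nuclear area per image, could be thresholded and scored with balanced accuracy. Balanced accuracy is the mean of sensitivity and specificity, which makes it less sensitive to class imbalance such as 1023 low-grade versus 199 high-grade images. The function names, the toy values, the threshold, and the assumption that high-grade images show the larger spread are all hypothetical and are not taken from the study.

```python
from statistics import pstdev


def nuclear_area_sd(areas):
    """Population standard deviation of the nuclear areas measured in one image."""
    return pstdev(areas)


def classify_by_threshold(sd_values, threshold):
    """Label an image high-grade (1) if its area SD exceeds the threshold, else low-grade (0).

    Assumption: high-grade images have the larger spread in nuclear area.
    """
    return [1 if value > threshold else 0 for value in sd_values]


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on high-grade) and specificity (recall on low-grade)."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    true_neg = sum(1 for t, p in pairs if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both grades must be present to compute balanced accuracy")
    return 0.5 * (true_pos / n_pos + true_neg / n_neg)


# Hypothetical per-image SD values (arbitrary units) and consensus grades.
sd_values = [4.0, 5.5, 9.0, 12.0, 7.0, 3.0, 10.5, 6.0]
grades = [0, 0, 0, 1, 1, 0, 1, 0]  # 0 = low-grade, 1 = high-grade

predictions = classify_by_threshold(sd_values, threshold=8.0)  # threshold is illustrative
score = balanced_accuracy(grades, predictions)
print(f"balanced accuracy: {score:.3f}")
```

In practice the threshold would be chosen on the discovery cohort only and then applied unchanged to the validation cohort, mirroring the discovery-validation design described above.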