Machine learning determined risk factors associated with non-adherence to timely surgery for breast cancer patients
Original Article

Guillaume Labilloy1, Bharti Jasra2, Jason Widrich3, Lauren Edgar2, Carmen Smotherman1, Leigh Neumayer2, Brian G. Celso2^

1Center for Data Solutions, College of Medicine, University of Florida, Jacksonville, FL, USA; 2Department of Surgery, College of Medicine, University of Florida, Jacksonville, FL, USA; 3Department of Anesthesia, College of Medicine, University of Florida, Jacksonville, FL, USA

Contributions: (I) Conception and design: BG Celso, G Labilloy, L Neumayer; (II) Administrative support: L Neumayer; (III) Provision of study materials or patients: B Jasra, C Smotherman; (IV) Collection and assembly of data: G Labilloy, C Smotherman; (V) Data analysis and interpretation: G Labilloy, J Widrich, BG Celso; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

^ORCID: 0000-0002-0095-2701.

Correspondence to: Brian G. Celso, PhD, MBA. Department of Surgery, College of Medicine, University of Florida, 853 W. 8th Street, Jacksonville, FL 32209, USA. Email:

Background: We investigated a diverse population of newly diagnosed breast cancer patients to characterize factors associated with adherence to timely surgery.

Methods: Machine learning (ML) methods were applied to electronic health record (EHR) and tumor registry data of women diagnosed with stage 0 through III breast cancer between 2014 and 2019. The ML architectures were evaluated and selected to stratify patients by risk of non-adherence to surgery. The most performant model was selected using the area under the curve (AUC) criterion. The high over low Area Deprivation Index (ADI) adherence ratio (HLAR) was plotted against time. Patients who received timely surgery following diagnostic confirmation were considered adherent.

Results: A total of 1,004 women with breast cancer were included. The performance of the ML model at 110 days had an AUC of 0.82. The ADI was found to be the most important feature. The HLAR curve for high (most deprived) ADI shifted to the right compared to the low ADI curve, suggesting an increased delay to treatment for patients living in the most deprived zones.

Conclusions: The degree of deprivation appeared to be a valid feature to help predict non-adherence, along with cancer stage and patient demographics. This important finding will guide future interventions to first predict who is at risk for non-adherence and then to study interventions to mitigate the risks. Future research needs to move from identification of non-adherence risk factors to development and implementation of interventions to mitigate the risk and improve timely adherence.

Keywords: Adherence; breast cancer; machine learning (ML)

Received: 28 July 2022; Accepted: 23 December 2022; Published online: 14 January 2023.

doi: 10.21037/abs-22-31

Highlight box

Key findings

• The degree of deprivation was the most important variable in predicting breast cancer patients at risk of being non-adherent to timely surgery.

What is known and what is new?

• There are significant differences in breast cancer outcomes based on race as well as insurance status. Additionally, delay in treatment greater than 90 days post-diagnosis is associated with worse outcomes.

• Machine learning (ML) assessed the relation between social determinants of health and adherence to care in an effort to identify future interventions. The degree of deprivation emerged as the foremost predictor to stratify patients at risk of non-adherence.

What is the implication, and what should change now?

• Our ML model may be useful in the clinical setting to risk stratify patients according to their social determinants. Interventions applied, perhaps by a nurse navigator, have the potential to reduce significant health disparities.


There is a need to use innovative methods to study how we can provide high-quality health care to vulnerable populations in an effort to reduce disparities. This is particularly important for women facing a new breast cancer diagnosis living in communities with higher rates of poverty and disease and poor access to primary care. Advances seen across the spectrum of cancer care remain unevenly applied and less likely to benefit those with minority status. Health disparities in breast cancer care are persistent and, in some cases, worsening (1). For breast cancer, there remain significant differences by race and by the type and presence or absence of health insurance coverage. Having no coverage or being underinsured has led to delays in breast cancer detection, breast surgery, and type of reconstructive surgery selected, all of which amplify outcome disparities (2-5). Individuals who are socially and economically disadvantaged often have difficulty navigating the cancer care continuum that includes screening, diagnostic follow-up, treatment, and surveillance (6,7).

Rationale and knowledge gap

Machine learning (ML) is a type of artificial intelligence (AI) that uses computer algorithms and statistical models that transform and analyze datasets for the purpose of discovering new relationships. Some early examples of ML include linear and logistic regression, and support vector machines that uncover the maximum separation between groups. While limited access to care for socially disadvantaged breast cancer patients may help explain disparities, very few studies have attempted a systematic evaluation of the role of social determinants of health (SDOH). ML techniques can provide actionable ways to investigate this problem by using an unbiased exploratory data analysis to identify and better understand the predictive power of SDOH in explaining delays in surgery for patients with breast cancer. In addition, these exploratory analyses can identify potential areas and time frames for intervention. Earlier interventions could then be employed to minimize or eliminate those delays for a patient at risk due to identifiable social determinants. Thus, a deeper understanding of social disparities features might better identify predictors of cancer outcome and their effect sizes.


In the state of Florida, USA, incidence rates of breast cancer were shown to be related to the patient’s race. The county where our safety net hospital, UF Health-Jacksonville, is located has one of the highest breast cancer mortality rates for Black women in the entire state of Florida (8). In our own practice, we noticed delays in care for some of our patients that seemed more related to non-medical factors. Thus, we are uniquely positioned to explore the distinctive traits of our breast cancer patients, such as their genetics, physical characteristics, and health habits. The goal of the present study was to investigate discrepancies that exist in SDOH at the urban core that might provide insight into how locally derived interventions may benefit our patients and other regional medical systems. The overall purpose was to conduct an exploratory analysis of a ML model developed to predict adherence for underserved women newly diagnosed with breast cancer. Our central hypothesis was that a ML algorithm can be employed to accurately define the role social determinants have in the prediction of breast cancer surgery adherence.



Inclusion criteria

Women 18 years of age or older at diagnosis who received a mammogram from any UF Health clinic with an abnormal result and were treated at UF Health in Jacksonville, Florida with stage 0 through stage III breast cancer between 01/01/2014 and 12/31/2019 were included. The Current Procedural Terminology (CPT®) codes 77066 and 77065, or International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) 174.* and International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) C50.*, were used to identify participants who met inclusion criteria. Women with stage 0 breast cancer were included in the present study as standard of care for stage 0 breast cancer remains surgery.

Exclusion criteria

Patients were excluded if they had stage IV breast cancer, if race or ethnicity information was not available, if they received neoadjuvant chemotherapy, or if they received palliative care for significant co-morbidities that precluded curative treatment.

Data collection and procedures

The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the University of Florida Institutional Review Board (IRB No. 202101137) and individual consent for this retrospective analysis was waived. Following approval, a retrospective review of electronic health record (EHR) and the tumor registry data was conducted. Age at diagnosis, cancer stage at diagnosis, race, insurance type at diagnosis, type of surgery, and Area Deprivation Index (ADI) were used to build a ML model. The ADI was established for the purpose of ranking neighborhoods based on socioeconomic status disadvantage on a scale of 1–10. The scale score is a composite measure of 17 census variables designed to describe socioeconomic disadvantage based on such factors as income, education, household characteristics, and housing (9), and later refined to produce national percentiles and state-based deciles (10). This measure has been validated and was actively used by a number of organizations including the Centers for Medicare and Medicaid Services. Furthermore, the ADI has been used to target geographic areas of greatest disadvantage and guide breast cancer treatment (11). We adopted the above average measure of 6 to differentiate groups. For the purpose of this research, a neighborhood was considered “high” (more disadvantaged) if it was above the average (scale 6) for ADI at the census tract block group level.

A variety of ML decision trees are often used in medical differential diagnoses (12). We adopted an ensemble method that uses a sequence of regression trees to find the optimal nodes/branching of given variables to predict outcomes in a supervised learning dataset (13). The decision trees were connected in a sequence where the residual errors in making the current tree are used to create the next tree. Variable importance was calculated by averaging, across all the decision trees, the “gain” that each variable contributes to the branching performance. The PyCaret Python library we used is based on the well-established scikit-learn (14,15). The importance of a feature was established using the weighted “gini importance” or “mean decrease impurity” as described in Breiman et al. (16). Patterns of adherence were depicted by using ML methods applied to data derived from the EHR and the UF Health-Jacksonville Tumor Registry.
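The sequential, residual-driven construction described above can be illustrated with a small sketch. This is not the authors' PyCaret pipeline or their retained AdaBoost model; it is a minimal squared-error boosting loop over one-split trees ("stumps") in plain Python, assuming a toy dataset in which column 0 is a hypothetical ADI decile and column 1 is an uninformative flag. The names `Stump`, `fit_boosted_stumps`, and `gain_importance` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    feature: int      # index of the feature used for the split
    threshold: float  # rows with row[feature] <= threshold go left
    left: float       # value predicted on the left branch
    right: float      # value predicted on the right branch
    gain: float       # reduction in squared error achieved by the split

    def predict(self, row):
        return self.left if row[self.feature] <= self.threshold else self.right


def _sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_stump(X, residuals):
    """Find the single split that most reduces squared error on the residuals."""
    parent = _sse(residuals)
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            gain = parent - _sse(left) - _sse(right)
            if best is None or gain > best.gain:
                best = Stump(j, t, sum(left) / len(left), sum(right) / len(right), gain)
    return best


def fit_boosted_stumps(X, y, n_trees=20, learning_rate=0.5):
    """Fit stumps in sequence, each on the residuals left by the previous ones."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None or stump.gain <= 1e-12:
            break
        stumps.append(stump)
        preds = [p + learning_rate * stump.predict(row) for p, row in zip(preds, X)]
    return base, stumps


def predict(base, stumps, row, learning_rate=0.5):
    """Sum the base value and the shrunken contribution of every stump."""
    return base + learning_rate * sum(s.predict(row) for s in stumps)


def gain_importance(stumps, n_features):
    """Share of the total split gain attributed to each feature."""
    totals = [0.0] * n_features
    for s in stumps:
        totals[s.feature] += s.gain
    total = sum(totals)
    return [t / total if total else 0.0 for t in totals]


# Toy data (hypothetical): [ADI decile, uninformative flag]; y = 1 means non-adherent.
X = [[2, 1], [3, 0], [4, 1], [8, 0], [9, 1], [10, 0]]
y = [0, 0, 0, 1, 1, 1]
base, stumps = fit_boosted_stumps(X, y)
importance = gain_importance(stumps, n_features=2)
print(importance)
```

In this toy case every split lands on the ADI column, so that column receives all of the gain. Real data would spread the gain across several variables, which is what a feature-importance plot summarizes.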

Adherence to treatment was fulfilled if the patient received surgery within a time horizon, expressed in number of days, which followed diagnostic confirmation. The ML architectures were evaluated and selected to stratify patients by risk of non-adherence to timely surgery that followed a diagnostic procedure. We evaluated the performance of the ML model at 30, 60, 90, 100, 110, 120, 130, and 140 days to determine the optimal time horizon to maximize accuracy and precision between the groups in the ML model. Patient surgeries that were delayed beyond 18 months were excluded from the analysis to avoid skewing the data with a few outliers. The purpose of using a ML approach was to produce a model that classified patients into outcome groups at different stages of breast cancer for different types of treatments. The data used to build the model was enhanced with the data from the ADI, which provided a socio-economic score for each patient based on their postal address for a more precise analysis.
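The adherence definition above amounts to a binary label per patient and per time horizon. The following sketch shows one way such labels could be derived from diagnosis and surgery dates. It makes several assumptions not stated in the source: surgery exactly on the horizon day counts as adherent, "18 months" is approximated as 548 days, and the patient records are hypothetical.

```python
from datetime import date

# Time horizons (in days) evaluated in the study.
HORIZONS_DAYS = (30, 60, 90, 100, 110, 120, 130, 140)
# Assumption: the 18-month outlier cut-off is approximated as 548 days.
MAX_DELAY_DAYS = 548


def days_to_surgery(diagnosis, surgery):
    """Number of days between diagnostic confirmation and surgery."""
    return (surgery - diagnosis).days


def label_patient(diagnosis, surgery, horizons=HORIZONS_DAYS):
    """Return {horizon: adherent?}, or None if the delay exceeds the outlier cut-off."""
    delay = days_to_surgery(diagnosis, surgery)
    if delay > MAX_DELAY_DAYS:
        return None
    # Assumption: a surgery on the horizon day itself still counts as adherent.
    return {h: delay <= h for h in horizons}


# Hypothetical patients: (diagnosis date, surgery date).
patients = {
    "A": (date(2018, 1, 10), date(2018, 3, 1)),
    "B": (date(2018, 1, 10), date(2018, 5, 15)),
    "C": (date(2016, 1, 1), date(2018, 1, 1)),
}
labels = {pid: label_patient(dx, sx) for pid, (dx, sx) in patients.items()}
for pid, row in labels.items():
    print(pid, row)
```

Keeping one label per horizon makes it straightforward to refit and compare a classifier at each horizon, as the study did when searching for the optimal cut-off.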

Statistical analysis

Summary statistics include counts and percentages for categorical data and medians and interquartile ranges (IQR = 1st quartile, 3rd quartile) for continuous data. To build the ML model, we compared various classification techniques such as logistic regression, decision trees, random forests, gradient boosted trees, naïve Bayes, and support vector machines. Those models were evaluated by accuracy (the proportion of patients correctly classified), precision (the proportion of true positives out of all detected positives), and the area under the receiver operating characteristic (ROC) curve (AUC). Additionally, the F1 score, the harmonic mean of precision and recall, summarized prediction quality, with a value closer to 1 being better. The binary classifiers were also evaluated using the Matthews Correlation Coefficient (MCC), which is high only when a model performs well in each quadrant of the confusion matrix. The best performing model revealed the contribution of individual factors toward the primary outcome.
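For readers less familiar with these metrics, the sketch below computes them from the four cells of a confusion matrix and computes the AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative one. The counts and scores are hypothetical, and treating "adherent" as the positive class is an assumption, not something the source states.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F1, and MCC from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts, with "adherent" assumed to be the positive class.
metrics = classification_metrics(tp=70, fp=12, fn=8, tn=10)
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2])
print(metrics, auc)
```

With an imbalanced outcome, accuracy and F1 can look strong even when the minority class is poorly predicted, whereas MCC uses all four cells and is therefore lower in this example.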


The dataset analyzed contained 1,004 women who met the inclusion criteria. The median age of the participants was 59 years (IQR, 50–68 years) at the time of diagnosis. Most patients were Caucasian (55%), and 53% were covered by program-type insurance (Medicare, Medicaid, Veterans Affairs, Jax Charity). Fifty-nine percent of patients were associated with a high (most deprived) ADI. Fifty-six percent underwent lumpectomy versus 44% who received mastectomy. Fifteen percent of patients had stage 0 cancer, 40% stage I, 20% stage II, and 6% stage III. Table 1 presents the summary statistics for the adherent and non-adherent groups. Figure 1 shows that the time to surgery was bimodal, with one mode at day 60 and another around day 200 from diagnosis. Figure 2 depicts the adherence ratio, that is, adherent patients over both populations (adherent + non-adherent), at different time horizons for low and high ADI.

Table 1

Baseline characteristics for patients adherent or not adherent to treatment at 110 days from diagnosis

| Variable | Adherence group (n=744, 74%) | Non-adherence group (n=260, 26%) |
|---|---|---|
| Age at diagnosis (year), median [IQR] | 61 [52; 69] | 54 [45; 63] |
| ADI, median [IQR] | 6 [4; 9] | 7 [4; 9] |
| Race, n (%) | | |
| Black | 226 (30) | 116 (45) |
| White | 433 (58) | 121 (47) |
| Other | 85 (11) | 23 (9) |
| Ethnicity, Hispanic | 47 (6) | 14 (5) |
| Insurance type | | |
| Private | 42 (6) | 17 (7) |
| HMO | 215 (29) | 86 (33) |
| Program | 408 (55) | 127 (49) |
| Other | 22 (3) | 7 (3) |
| Uninsured | 57 (8) | 23 (9) |
| Surgery type | | |
| Lumpectomy | 452 (61) | 109 (42) |
| Mastectomy | 292 (39) | 151 (58) |
| Stage of cancer | | |
| 0 | 133 (21) | 13 (8) |
| 1 | 368 (57) | 31 (19) |
| 2 | 128 (20) | 73 (44) |
| 3 | 16 (2) | 48 (29) |

Program insurance includes Medicare, Medicaid, Veterans Affairs, Jax Charity; Other insurance includes Federal, unknown, others. IQR, interquartile range [1st quartile; 3rd quartile]; ADI, Area Deprivation Index; HMO, Health Maintenance Organization.

Figure 1 Histogram of days to breast cancer surgery.
Figure 2 Adherence to treatment ratio against time by ADI. ADI, Area Deprivation Index.
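The curves in Figure 2 can be reproduced in principle by computing, at each time horizon, the fraction of patients in each ADI group who have had surgery. The sketch below does this on hypothetical data and also reports the high over low quotient. Two points are assumptions rather than facts from the source: "high" ADI is taken as strictly greater than 6 (the text could also be read as 6 or above), and the quotient is taken as the high-ADI ratio divided by the low-ADI ratio.

```python
def adherence_ratio(delays, horizon):
    """Fraction of patients whose surgery occurred within `horizon` days."""
    return sum(1 for d in delays if d <= horizon) / len(delays)


def adherence_curves(patients, horizons, high_threshold=6):
    """Per-horizon adherence ratio for low and high ADI, plus their quotient.

    `patients` is a list of (adi_decile, days_to_surgery) pairs.
    Assumption: an ADI strictly above `high_threshold` counts as "high".
    """
    high = [delay for adi, delay in patients if adi > high_threshold]
    low = [delay for adi, delay in patients if adi <= high_threshold]
    curves = {}
    for h in horizons:
        low_ratio = adherence_ratio(low, h)
        high_ratio = adherence_ratio(high, h)
        # The quotient is undefined while no low-ADI patient has had surgery yet.
        quotient = high_ratio / low_ratio if low_ratio else None
        curves[h] = {"low": low_ratio, "high": high_ratio, "high_over_low": quotient}
    return curves


# Hypothetical patients: (ADI decile, days from diagnosis to surgery).
patients = [(2, 35), (3, 50), (5, 70), (4, 115), (8, 55), (9, 95), (10, 125), (7, 150)]
curves = adherence_curves(patients, horizons=(30, 60, 90, 110, 140))
for horizon, row in curves.items():
    print(horizon, row)
```

A high-ADI curve that stays below the low-ADI curve at every horizon, as in this toy data, corresponds to the rightward shift described for the more deprived group.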

Several rounds of model selection and optimization were conducted across the different models and the different time horizons for surgery, and to optimize the parameters and hyperparameters of each model. The AdaBoost model using a time horizon of 110 days had an acceptable sensitivity and specificity, with an AUC and F1 of 0.799 and 0.855 before optimization and 0.820 and 0.856 after optimization, respectively, and was retained as the classification model. A comparison of the various ML classification techniques is presented in Table 2, ordered by accuracy and reporting precision, AUC, and F1. Kappa and MCC were not taken into consideration due to bias in the population distribution. At 110 days from diagnosis to surgery, 744 (74%) patients were classified as adherent, whereas 260 (26%) were non-adherent. The difference between adherent and non-adherent at 110 days for both high and low ADI was approximately 8%. The shift to the right of the high (more deprived) ADI curve compared to the low ADI curve indicates that the delays become apparent early in the treatment process and that this gap is never closed for the more deprived patients. Figure 3 presents the ML model feature importance, and high ADI (disadvantage) is shown to be the most important feature. ADI was consistently one of the most important features across time horizons, followed by the cancer stage, white race, Health Maintenance Organization (HMO) payer, and surgery type.

Table 2

Machine learning model decision parameters at 110 days

| Machine learning method | Accuracy | Precision | AUC | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|
| Ada Boost Classifier | 0.7835 | 0.8541 | 0.8200 | 0.8560 | 0.4226 | 0.4237 |
| CatBoost Classifier | 0.7749 | 0.8409 | 0.8049 | 0.8518 | 0.3839 | 0.3848 |
| Gradient Boosting Classifier | 0.7678 | 0.8331 | 0.8074 | 0.3597 | 0.3597 | 0.3622 |
| Light Gradient Boosting Machine | 0.7649 | 0.8342 | 0.7891 | 0.8452 | 0.3554 | 0.3579 |
| Random Forest Classifier | 0.7621 | 0.8364 | 0.7899 | 0.8427 | 0.3533 | 0.3546 |
| Extra Trees Classifier | 0.7621 | 0.8387 | 0.7431 | 0.8420 | 0.3596 | 0.3610 |
| Extreme Gradient Boosting | 0.7606 | 0.8272 | 0.7844 | 0.8433 | 0.3349 | 0.3388 |
| Quadratic Discriminant Analysis | 0.7450 | 0.7482 | 0.6669 | 0.8539 | −0.0084 | −0.0236 |
| Decision Tree Classifier | 0.7396 | 0.8336 | 0.6659 | 0.8238 | 0.3199 | 0.3218 |
| K Neighbors Classifier | 0.6780 | 0.8557 | 0.7202 | 0.7612 | 0.2843 | 0.3014 |
| Logistic Regression | 0.6211 | 0.8215 | 0.6577 | 0.7135 | 0.1813 | 0.1954 |
| SVM-Linear Kernel | 0.6198 | 0.8095 | 0.0000 | 0.6767 | 0.1274 | 0.1663 |
| Ridge Classifier | 0.6182 | 0.8191 | 0.0000 | 0.7112 | 0.1752 | 0.1889 |
| Linear Discriminant Analysis | 0.6168 | 0.8186 | 0.6568 | 0.7097 | 0.1736 | 0.1872 |
| Naïve Bayes | 0.5811 | 0.8689 | 0.7094 | 0.6475 | 0.2061 | 0.2503 |

AUC, area under the curve; MCC, Matthews Correlation Coefficient; SVM, support vector machine.

Figure 3 Feature importance from model of adherence. ADI, Area Deprivation Index; HMO, Health Maintenance Organization.


Key findings

The present study was carried out to develop a ML model that systematically evaluated how health disparities may delay surgery for disadvantaged women with breast cancer. We identified limited access to care and SDOH as risk factors for non-adherence among these individuals. The ADI was the most significant component used to stratify breast cancer patients at risk of being non-adherent to timely surgery followed by cancer stage and race. Notably, our research discovered that the urban core surrounding UF Health-Jacksonville ranked in the most disadvantaged groups within the ADI. Therefore, these identified predictors of timely surgery may lead to a deeper understanding of social disparities features, their effect size and potentially guide future interventions.

Strengths and limitations

For this study, we gathered extensive EHR and tumor registry data from our diverse, predominantly underserved patient population, which creates a rich foundation for exploration of potential systematic barriers faced by women. However, missing data for specific variables in the EHR and tumor registry data are present, which thwarts further investigation into this area (17,18). By combining tumor registry with EHR data, we can improve the reliability and validity of the data for analysis. It is difficult to collect all data without missing some information, as was the case in our study. The problem of linking databases has resulted in low overall match rates due, in part, to different variable definitions and missing data within each database (19). Ways to statistically account for missing data, such as omitting all cases with missing information or analyzing them as a separate group, may produce biased results (20). At the same time, the convention of imputing all cases with unknown stage proportionally to the known stages increases the probability of mistakenly assuming that the stage distributions of the unknown and observed cases are equivalent.

There are likely other unaccounted-for underlying processes that may contribute to non-adherence. Anecdotally, we can describe missed appointments and patient-reported transportation barriers; however, the current study was unable to account for these patient issues. Collecting data on such factors may help elucidate links to SDOH resulting in non-adherence to timely surgery. Moreover, ML has demonstrated the ability to address these problems through statistical techniques that can impute the missing data without corrupting the results.

Comparison with similar researches

Past research on the impact of geography, in particular the urban-rural divide, has found disparities in cancer treatment and management. Zipkin et al. (21) found that delays in breast cancer surgery were more prevalent among urban patients than among rural patients. A drive of more than one hour to the health care facility was associated with surgical delays for urban residents. In addition, the authors showed that surgical delays were related to age, Black race and Hispanic ethnicity, co-morbidity, and hospital type. Geographic areas of greatest disadvantage have been targeted with use of the ADI to direct adjuvant treatment of breast cancer (11), and the ADI has served as a proxy measure for social determinants not readily available in a region of interest, whether at the national, state, or zip code level. Additionally, the ADI appears to correlate better with mortality and other outcomes than the previously used Index of Medical Underservice, used by the Federal Government for decades to allocate funding for populations without adequate access to health care or other social services.

ML models

The use of AI has been shown to be invaluable in medical science. For example, McKinney et al. (22) demonstrated that AI was superior to experienced radiologists in the detection of breast cancer from mammography, but more importantly, was less likely to falsely detect a tumor when none existed. Likewise, Conant et al. (23) found improved sensitivity and specificity when AI was used along with reduced reading times for digital breast tomosynthesis. Clinical interventions based on predictive models require the correct specification of cause and effect and the calculation of alternative scenarios (24,25). Even indirect measures that include potential disparity variables tracked throughout the course of treatment such as ADI could provide more accurate information to help medical providers treating a diverse population with breast cancer. Unfortunately, data-driven prediction models are often misinterpreted as having causal effects without the necessary parameters or their predictions (26). Nevertheless, the inclusion of social determinants is considered just as important when determining prognosis.

Explanations of findings

In our analysis, the optimal time horizon for surgery was 110 days. This is approximately 22% longer than the 90-day cut-off described by Ho et al. (27), who found that the effect of delayed treatment on outcome depended on tumor stage. Delayed first treatment of more than 30 days was most common among patients with non-invasive breast cancer, followed by those with metastatic and invasive non-metastatic breast cancer. On the other hand, delayed first treatment of greater than 90 days post-diagnosis was associated with worse outcome in patients with invasive non-metastatic and metastatic breast cancer. The authors suggested that by taking into consideration the severity of the disease, wait time for patients to receive treatment could be optimized. Moreover, a longer time to first treatment (31–90 days post-diagnosis) may be viable for more extensive diagnostic workup and allow for patient-centered decision-making that considers patients’ preferences and expressed concerns without compromising survival (28). We believe our large proportion of underserved patients may have led to this finding, which accentuates the need for early intervention. As illustrated in Figure 2, the delay in adherence increased through the first 60 days after diagnosis, and the gap never closed thereafter.

Implications and actions needed

Recent developments in the area of health-related social risks include recommendations for primary care by the US Preventive Services Task Force. Of their 85 active recommendation statements, 67% referenced social determinants to some degree (29). One of the task force’s conclusions was that more evidence was needed to understand the added value for primary care clinicians to build better connections with social service programs. The review also concluded that there were currently no multi-domain social risk screening instruments with evidence that they can accurately identify social risk or measure effective interventions (30). Therefore, the findings may be utilized in the clinical setting to aid predominantly minority patients by using locally derived and culturally appropriate information in order to minimize barriers and optimize the quality and outcomes of breast cancer care.

It is becoming more apparent that accurately identifying at-risk patients likely to be non-adherent requires including social determinants. The American Cancer Society has introduced a framework for understanding and addressing social determinants to advance cancer health equity (31). Unfortunately, in the Surveillance, Epidemiology, and End Results Program (SEER) database, SDOH such as poor housing in impoverished neighborhoods with a lack of educational and economic opportunities are not accounted for (32). Alternatively, the ADI measures neighborhood disadvantage at a more granular level. Moreover, UF Health-Jacksonville has gathered extensive Epic EHR data on a diverse, underserved local population, which creates a rich foundation for exploration of potential systematic barriers.

There is a need for innovative methods to provide quality health care to vulnerable populations. ADI was consistently one of the most important features across time horizons for timely surgery. Furthermore, the degree of deprivation emerged as the foremost predictor to stratify patients at risk of non-adherence. A ML model may be useful in the clinical setting to risk stratify patients according to their SDOH. Since the ADI is known at the time of initial screening, this could be used to identify those patients who will need intervention and to implement that early in the process. Accordingly, the development of a clinically useful tool based on the ML model and from a patient-centered, shared decision perspective may improve breast cancer outcomes for historically underserved patients.


The majority of breast cancer patients were able to undergo surgery within 110 days of diagnosis. The degree of deprivation appeared to be a valid feature to help predict non-adherence, along with cancer stage and patient demographics. This lends support to the need to better understand the relation between SDOH and care received by surgery patients, thus allowing intervention at an earlier period to mitigate the delays. When applied widely with interventions (perhaps by a nurse navigator), the ML model has the potential to reduce significant health disparities in one of the country’s most health disparate counties. Future research needs to move from identification of non-adherence risk factors to development and implementation of interventions to mitigate the risk and improve timely care.


The study was presented as a poster presentation at the 2021 San Antonio Breast Cancer Symposium, San Antonio, TX, December 7–10, 2021.

Funding: None.


Data Sharing Statement: Available at

Peer Review File: Available at

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the University of Florida Institutional Review Board (IRB No. 202101137) and individual consent for this retrospective analysis was waived.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See:


  1. Bradley CJ, Dahman B, Shickle LM, et al. Surgery wait times and specialty services for insured and uninsured breast cancer patients: does hospital safety net status matter? Health Serv Res 2012;47:677-97.
  2. Yang RL, Newman AS, Reinke CE, et al. Racial disparities in immediate breast reconstruction after mastectomy: impact of state and federal health policy changes. Ann Surg Oncol 2013;20:399-406.
  3. Samiian L, Sharma P, Van Den Bruele AB, et al. The Effect of Insurance and Race on Breast Cancer Tumor Biology and Short-Term Outcomes. Am Surg 2018;84:1223-8.
  4. Anderson DR, Olayiwola JN. Community health centers and the patient-centered medical home: challenges and opportunities to reduce health care disparities in America. J Health Care Poor Underserved 2012;23:949-57.
  5. Ganggayah MD, Taib NA, Har YC, et al. Predicting factors for survival of breast cancer patients using machine learning techniques. BMC Med Inform Decis Mak 2019;19:48.
  6. Balasubramanian BA, Demissie K, Crabtree BF, et al. Black Medicaid beneficiaries experience breast cancer treatment delays more frequently than whites. Ethn Dis 2012;22:288-94.
  7. Dankwa-Mullan I, George J, Roebuck MC, et al. Variations in breast cancer surgical treatment and timing: determinants and disparities. Breast Cancer Res Treat 2021;188:259-72.
  8. Florida Department of Health, Bureau of Vital Statistics. 2018. Deaths from Breast Cancer - Florida Health CHARTS - Florida Department of Health. Available online: Accessed 2 Dec 2021.
  9. Knighton AJ, Savitz L, Belnap T, et al. Introduction of an Area Deprivation Index Measuring Patient Socioeconomic Status in an Integrated Health System: Implications for Population Health. EGEMS (Wash DC) 2016;4:1238.
  10. Kind AJ, Jencks S, Brock J, et al. Neighborhood socioeconomic disadvantage and 30-day rehospitalization: a retrospective cohort study. Ann Intern Med 2014;161:765-74.
  11. Griggs JJ, Culakova E, Sorbero ME, et al. Effect of patient socioeconomic status and body mass index on the quality of breast cancer adjuvant chemotherapy. J Clin Oncol 2007;25:277-84.
  12. Sarker IH. Machine Learning: Algorithms, Real-World Applications and Research Directions. SN Comput Sci 2021;2:160.
  13. Friedman JH. Greedy function approximation: A gradient boosting machine. Ann Statist 2001;29:1189-232.
  14. PyCaret, April 2020. Available online: PyCaret version 1.0.0.
  15. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res 2011;12:2825-30.
  16. Breiman L, Friedman JH, Olshen RA, et al. Classification And Regression Trees (1st ed.). Routledge; 1984. Available online: 10.1201/9781315139470
  17. Nathan H, Pawlik TM. Limitations of claims and registry data in surgical oncology research. Ann Surg Oncol 2008;15:415-23.
  18. van Buuren S, Boshuizen HC, Knook DL. Multiple imputation of missing blood pressure covariates in survival analysis. Stat Med 1999;18:681-94.
  19. Eisemann N, Waldmann A, Katalinic A. Imputation of missing values of tumour stage in population-based cancer registration. BMC Med Res Methodol 2011;11:129.
  20. Matsen CB, Luther SL, Stewart AK, et al. A match made in heaven? Trying to combine ACS-NSQIP and NCDB databases. J Surg Res 2012;175:6-11.
  21. Zipkin RJ, Schaefer A, Wang C, et al. Rural-Urban Differences in Breast Cancer Surgical Delays in Medicare Beneficiaries. Ann Surg Oncol 2022;29:5759-69.
  22. McKinney SM, Sieniek M, Godbole V, et al. International evaluation of an AI system for breast cancer screening. Nature 2020;577:89-94.
  23. Conant EF, Toledano AY, Periaswamy S, et al. Improving Accuracy and Efficiency with Concurrent Use of Artificial Intelligence for Digital Breast Tomosynthesis. Radiol Artif Intell 2019;1:e180096.
  24. Park K, Ali A, Kim D, et al. Robust predictive model for evaluating breast cancer survivability. Eng Appl Artif Intell 2013;26:2194-205.
  25. Hegselmann S, Gruelich L, Varghese J, et al. Reproducible Survival Prediction with SEER Cancer Data. Presented at: Proceedings of the 3rd Machine Learning for Healthcare Conference; 2018; Proceedings of Machine Learning Research. Available online:
  26. Prosperi M, Guo Y, Sperrin M, et al. Causal inference and counterfactual prediction in machine learning for actionable healthcare. Nature Machine Intelligence 2020;2:369-75.
  27. Ho PJ, Cook AR, Binte Mohamed Ri NK, et al. Impact of delayed treatment in women diagnosed with breast cancer: A population-based study. Cancer Med 2020;9:2435-44.
  28. Bleicher RJ. Timing and Delays in Breast Cancer Evaluation and Treatment. Ann Surg Oncol 2018;25:2829-38.
  29. Davidson KW, Krist AH, Tseng CW, et al. Incorporation of Social Risk in US Preventive Services Task Force Recommendations and Identification of Key Challenges for Primary Care. JAMA 2021;326:1410-5.
  30. Eder M, Henninger M, Durbin S, et al. Screening and Interventions for Social Risk Factors: Technical Brief to Support the US Preventive Services Task Force. JAMA 2021;326:1416-28.
  31. Alcaraz KI, Wiedt TL, Daniels EC, et al. Understanding and addressing social determinants to advance cancer health equity in the United States: A blueprint for practice, research, and policy. CA Cancer J Clin 2020;70:31-46.
  32. Howlader N, Noone AM, Krapcho M, et al. (eds). SEER Cancer Statistics Review, 1975-2018, National Cancer Institute. Bethesda, MD. Available online:, based on November 2020 SEER data submission, posted to the SEER web site, April 2021.
Cite this article as: Labilloy G, Jasra B, Widrich J, Edgar L, Smotherman C, Neumayer L, Celso BG. Machine learning determined risk factors associated with non-adherence to timely surgery for breast cancer patients. Ann Breast Surg 2024;8:3.