Predicting the success of a radiocephalic arteriovenous fistula using machine learning


Data source

We performed a post hoc analysis of pooled patient-level data from the international, multicenter PATENCY-1 and PATENCY-2 Phase III randomized controlled trials, conducted 2014–2019 (trial registration: ClinicalTrials.gov; NCT02110901, July 2014; and NCT02414841, August 2015). These trials prospectively tracked clinical outcomes for up to 3 years after creation of a new radiocephalic AVF at 31 and 39 centers, respectively, in the United States and Canada. Detailed methodology and results of the primary trials have been published previously14,15,16.

All patients with advanced chronic kidney disease who underwent radiocephalic AVF creation were eligible for enrollment in the trials. Patients with a life expectancy of less than 6 months, active malignancy, or previous treatment with the study drug (vonapanitase, a recombinant human elastase) were excluded. Ultimately, experimental vonapanitase was found to have a limited effect on relevant clinical outcomes at one year, and further investigation of the drug for this indication was abandoned. Participants were prospectively followed for up to three years in a pre-established clinical outcome registry. Enrollment began in July 2014, and registry follow-up ended in April 2019. Key data points collected during the trials and subsequent registry follow-up included baseline comorbidities at trial enrollment, anatomic characteristics and case mix, subsequent surgical or vascular interventions, and postoperative ultrasound measurements.

Routine duplex ultrasounds (US) were performed at 4–6 weeks and 12 weeks after AVF creation. The outflow vein lumen diameter was measured twice at three predetermined sites in the forearm (3 cm from the AVF anastomosis, mid-forearm, and just below the antecubital fossa) and averaged. Flow volume was estimated from three separate measurements at the same site in a straight segment of vein within 5 cm of the AVF anastomosis. Stenosis was dichotomized as the presence or absence of a ≥50% luminal stenosis at any point along the entire length of the access. Access depth was not assessed. All ultrasounds were interpreted by a blinded core laboratory (VasCore; Boston, MA). The methods were performed in accordance with relevant guidelines and regulations, including a waiver of informed consent, and were approved by the Mass General Brigham Human Research Committee Institutional Review Board for this post hoc analysis of previously collected PATENCY-1 and PATENCY-2 trial data.
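The averaging and dichotomization rules above are simple to state programmatically. The following is an illustrative Python sketch (the study's analyses were performed in R); the function names, site labels, and the ≥50% cutoff reading are ours, not the authors'.

```python
from statistics import mean

def mean_vein_diameter(site_measurements):
    """Average the duplicate measurements at each site, then average
    across sites. site_measurements maps a site label (hypothetical
    names) to its pair of diameter measurements in mm."""
    return mean(mean(pair) for pair in site_measurements.values())

def has_significant_stenosis(percent_stenoses):
    """Dichotomize stenosis: any >=50% luminal stenosis anywhere
    along the access counts as present."""
    return any(s >= 50 for s in percent_stenoses)
```

For example, duplicate measurements of (4.0, 4.2), (4.4, 4.6), and (5.0, 5.2) mm at the three sites yield a mean diameter of about 4.57 mm.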

Prediction models

We sought to build on and improve upon threshold-based ultrasound criteria for predicting AVF maturation and suitability for use. To be included in prediction modeling, patients had to be at risk for AVF use during study follow-up (e.g., on hemodialysis) and have complete ultrasound data from the 4–6 week visit. Patients with pre-dialysis chronic kidney disease who did not develop a need for dialysis during study follow-up were excluded (Figure 1).

Outcome

To improve interpretability and simplify model construction, the prediction modeling outcome was dichotomized as successful unassisted AVF use within 1 year, defined as 2-needle cannulation for dialysis for 90 consecutive days with no prior intervention. Patients who had not achieved successful AVF use by 1 year, or before a terminal event (death, transplantation, access abandonment, or loss to follow-up), were classified as unsuccessful. For patients on prevalent hemodialysis, the 1-year window started on the day of surgery. For patients who had not yet received dialysis at the time of AVF creation and who did not start dialysis within 1 year, successful use was defined as 2-needle cannulation for all prescribed dialysis sessions for 90 consecutive days, beginning within 6 weeks of dialysis initiation. Similar approaches have been implemented in previous analyses of AVF data5.
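The outcome logic for prevalent hemodialysis patients can be sketched as a small classifier over per-patient timeline fields. This is an illustrative Python sketch under our own assumptions; the field names (cannulation_day, terminal_event_day, etc.) are hypothetical, not variables from the trial data sets.

```python
def unassisted_use_success(cannulation_day, intervention_before_use,
                           terminal_event_day=None, window_days=365):
    """Classify successful unassisted AVF use within 1 year (sketch).

    cannulation_day: day after surgery on which 90 consecutive days of
    2-needle cannulation was completed, or None if never achieved.
    intervention_before_use: True if any access intervention preceded use.
    terminal_event_day: death/transplant/abandonment/loss to follow-up.
    """
    if cannulation_day is None or intervention_before_use:
        return False
    if terminal_event_day is not None and terminal_event_day < cannulation_day:
        return False  # terminal event censored the patient before success
    return cannulation_day <= window_days
```

Pre-dialysis patients would need the alternative window (90 days of cannulation beginning within 6 weeks of dialysis initiation), which this sketch does not implement.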

Variable selection

Covariates were shared across all predictive modeling procedures and included age, gender, race, ethnicity, BMI, smoking status, medical comorbidities, dialysis status at the time of AVF creation, central venous catheter (CVC) history, CKD etiology, vein and artery diameters measured in the operating room after anesthesia induction, AVF site, anesthesia method, anastomosis suturing technique, statin use, anticoagulant use, and enrolling site size. Ultrasound data from the 4–6 week visit were selected for predictive modeling because of consistency with previous work examining prediction of unassisted AVF use, clinical relevance, and the complexity of including both the 4–6 week and 12 week data together in the models. Ultrasound covariates included cephalic vein diameter, AVF flow volume, and the presence or absence of ≥50% luminal stenosis. Analysis was restricted to patients with complete ultrasound data from the 4–6 week visit, as described above. Missing values were imputed using K-nearest neighbors imputation17.
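K-nearest neighbors imputation fills a missing value from the most similar complete records. A minimal Python sketch of the idea follows (the study used an R implementation; this simplified version, including the choice of Euclidean distance over observed features and a plain neighbor average, is our assumption):

```python
from math import dist, isnan

def knn_impute(rows, k=5):
    """Fill NaNs in each row from the k nearest complete rows.

    Distance is computed only over the features observed in the
    target row; missing entries get the neighbors' mean value."""
    complete = [r for r in rows if not any(isnan(v) for v in r)]
    out = []
    for r in rows:
        missing = [i for i, v in enumerate(r) if isnan(v)]
        if not missing:
            out.append(list(r))
            continue
        obs = [i for i in range(len(r)) if i not in missing]
        neighbors = sorted(
            complete,
            key=lambda c: dist([r[i] for i in obs], [c[i] for i in obs]),
        )[:k]
        filled = list(r)
        for i in missing:
            filled[i] = sum(n[i] for n in neighbors) / len(neighbors)
        out.append(filled)
    return out
```

Production implementations typically also standardize features before computing distances and handle rows where every neighbor candidate is incomplete.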

Statistical analysis

In reporting descriptive statistics, categorical variables were summarized using frequency with percentage. Continuous variables were reported as the mean with standard deviation when normally distributed, and the median with the interquartile range otherwise. Unadjusted comparisons of ultrasound variables were performed using analysis of variance (ANOVA) followed by Tukey’s test. Paired data were compared using paired t tests. Categorical data were compared using Pearson Chi-squared tests. A two-tailed alpha level of 0.05 was used. All analyses were performed with R version 4.0.5 (https://cran.r-project.org/) and the tidymodels, glmnet, rpart, ranger, and xgboost packages.
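The reporting rule for continuous variables (mean with SD if normal, median with IQR otherwise) can be expressed as a small helper. This is an illustrative Python sketch, not the authors' R code; the function name and the caller-supplied `normal` flag are our assumptions.

```python
from statistics import mean, stdev, median, quantiles

def summarize_continuous(values, normal=True):
    """Format a continuous variable for a descriptive table:
    mean (SD) when approximately normal, median [IQR] otherwise."""
    if normal:
        return f"{mean(values):.1f} ({stdev(values):.1f})"
    q1, _, q3 = quantiles(values, n=4)  # exclusive-method quartiles
    return f"{median(values):.1f} [{q1:.1f}-{q3:.1f}]"
```

In practice the normality decision itself comes from inspecting the distribution (or a formal test), not from a hand-set flag.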

Modeling overview

To achieve our goal of building a predictive classification model, we explored several modeling procedures: conventional logistic regression, penalized logistic regression using Lasso methods, classification and regression trees (CART), and two ensemble classification methods, random forest and XGBoost. Each approach has distinct potential benefits and drawbacks; we sought to balance model complexity, flexibility, and performance against interpretability and clinical utility.

Multivariable logistic regression is often treated as a ‘gold standard’ for classification problems. With many covariates, however, simple logistic regression can overfit, producing coefficient estimates that degrade performance when the model is applied to external data. To address this issue, penalized regression techniques apply a shrinkage penalty to reduce overfitting; the Lasso is a popular technique because it can shrink coefficients exactly to zero, acting as an empirical variable selection method and yielding simpler final models18. Notably, the bias-variance trade-off is always a compromise and overfitting cannot be eliminated entirely, but the penalization and validation techniques described here can mitigate it, particularly on smaller datasets.
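The mechanism by which the Lasso sets coefficients exactly to zero is the soft-thresholding operator. The following illustrative Python sketch shows that operator in isolation (real fitting, as in glmnet, solves the full penalized likelihood; this is only the one-coefficient intuition):

```python
def soft_threshold(coef, penalty):
    """Lasso soft-thresholding: shrink a coefficient toward zero by
    `penalty`, and set it exactly to zero when its magnitude is
    below the penalty -- the source of implicit variable selection."""
    if coef > penalty:
        return coef - penalty
    if coef < -penalty:
        return coef + penalty
    return 0.0
```

A coefficient of 2.5 under a penalty of 1.0 shrinks to 1.5, while a coefficient of -0.4 is zeroed out and its variable drops from the model.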

The CART procedure is another traditional classification approach, with the main advantage of flexibly producing a clinically interpretable decision tree, but with the drawback of potentially unstable performance in external data sets even with pruning methods19. To overcome this problem, tree ensemble methods such as random forest and XGBoost have been developed and widely adopted20,21. Random forest and XGBoost are highly flexible and account for interactions between variables with relatively low bias. Random forest grows thousands of trees in a manner similar to CART, but from random samples of both variables and records, which are then averaged into a final model (a technique referred to as bootstrap aggregation, or “bagging”). Similarly, XGBoost builds many trees, but additionally uses the error from each tree to re-weight the samples for each subsequent tree (referred to as gradient boosting), theoretically favoring variables with more predictive value and de-emphasizing less informative ones. Variable importance can be examined by a variety of approaches, but a deeper understanding of the relationships between variables in ensemble techniques is challenging, and the lower interpretability can invite skepticism from clinicians.
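Bootstrap aggregation itself is simple: resample the training records with replacement, fit one learner per resample, and vote. The following toy Python sketch demonstrates the idea with a hypothetical one-feature “stump” learner (all names here are ours; this is not the ranger/XGBoost machinery):

```python
import random

def bagged_predict(train, learner_factory, x, n_trees=25, seed=0):
    """Toy bagging: fit one learner per bootstrap resample of `train`
    (records drawn with replacement) and majority-vote on input x."""
    rng = random.Random(seed)
    votes = 0
    for _ in range(n_trees):
        sample = [train[rng.randrange(len(train))] for _ in train]
        votes += learner_factory(sample)(x)
    return int(votes > n_trees / 2)

def mean_stump_factory(sample):
    """Hypothetical weak learner: predict 1 when x exceeds the
    bootstrap sample's mean feature value."""
    t = sum(p[0] for p in sample) / len(sample)
    return lambda x: int(x > t)
```

Real random forests additionally subsample the *variables* at each split; boosting replaces the uniform resampling with error-driven re-weighting.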

Modeling details

All predictive models were built using a framework of model training, hyperparameter tuning, and model testing, with a combination of baseline clinical characteristics and the 4–6 week US measurements described above. We performed an initial, random 70/30 split into training and testing data sets before any model generation, diagnostics, or data cleaning. Continuous variables were preprocessed by centering (subtracting the mean) and scaling (dividing by the standard deviation) before model fitting. A total of 5 missing values were imputed using the K-nearest neighbors methodology (BMI, n = 1; intraoperative vein diameter, n = 2; intraoperative artery diameter, n = 2)17. Models were built using the training data set, and hyperparameters were tuned using grid search methods with 10-fold cross-validation within the training data set.
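A key detail in this pipeline is that centering and scaling statistics must come from the training split only, then be applied to both splits, to avoid test-set leakage. An illustrative Python sketch (function name and 70/30 default are from the text; the implementation is our assumption):

```python
import random
from statistics import mean, stdev

def split_and_standardize(X, train_frac=0.7, seed=42):
    """Random 70/30 split, then center/scale every column using
    TRAINING-set statistics only, applied to both sets."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(X) * train_frac)
    train = [X[i] for i in idx[:cut]]
    test = [X[i] for i in idx[cut:]]
    n_cols = len(X[0])
    mus = [mean(r[j] for r in train) for j in range(n_cols)]
    sds = [stdev(r[j] for r in train) for j in range(n_cols)]

    def scale(rows):
        return [[(r[j] - mus[j]) / sds[j] for j in range(n_cols)]
                for r in rows]

    return scale(train), scale(test)
```

After this transform the training columns have mean 0 and SD 1, while the test columns generally do not, which is exactly the intended behavior.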

Our modeling approach started with a simple logistic regression including all covariates as main effects. Next, a Lasso-penalized logistic regression model was fit to empirically select the covariates most useful for prediction18. The regularization penalty was chosen as the one yielding the sparsest model within one standard error of the penalty with the minimum 10-fold cross-validated mean log loss. The variables selected by the Lasso were then used to refit the logistic regression model. Finally, an elastic net model was fit using a regular grid search with 10 levels and 10-fold cross-validation to tune both the regularization penalty and the elastic net mixing parameter22. Variable importance was calculated as the absolute value of the scaled coefficients at the optimal regularization penalty.
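The one-standard-error rule used here picks the largest (sparsest) penalty whose cross-validated loss is still within one standard error of the best penalty's loss. A minimal Python sketch, assuming the CV results are already summarized per penalty (the tuple layout is our invention):

```python
def one_se_penalty(cv_results):
    """One-standard-error rule for penalty selection.

    cv_results: list of (penalty, mean_cv_loss, se_of_loss) tuples.
    Returns the largest penalty whose mean loss is within one SE of
    the minimum mean loss -- larger penalty => sparser model."""
    best_loss, best_se = min((m, s) for _, m, s in cv_results)
    eligible = [lam for lam, m, _ in cv_results if m <= best_loss + best_se]
    return max(eligible)
```

This mirrors the distinction glmnet draws between its minimum-loss and one-SE penalty choices.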

A simple classification tree approach was also taken, in the hope of improving interpretability if a simple and useful decision tree could be identified19. The tree was pruned by tuning the complexity parameter and tree depth using a regular grid search with 10 levels and 10-fold cross-validation. Variable importance was calculated by the total impurity reduction method.

A random forest classification model was built with the aim of increasing predictive performance at the expense of some interpretability. Tuned hyperparameters included the number of covariates tried at each node split and the minimum node size, set by a regular grid search with 10 levels and 10-fold cross-validation. All random forest models were built with 1000 trees. Variable importance was calculated by the Gini impurity reduction method20,23.
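A “regular grid search with 10 levels” means crossing evenly spaced candidate values for each hyperparameter. An illustrative Python sketch of grid construction (parameter names like `mtry` and `min_n` follow tidymodels conventions but the ranges here are made up):

```python
from itertools import product

def regular_grid(levels=10, **param_ranges):
    """Build a regular hyperparameter grid: `levels` evenly spaced
    values per parameter, fully crossed. param_ranges maps each
    parameter name to a (low, high) tuple."""
    axes = {
        name: [lo + (hi - lo) * i / (levels - 1) for i in range(levels)]
        for name, (lo, hi) in param_ranges.items()
    }
    return [dict(zip(axes, combo)) for combo in product(*axes.values())]
```

Each grid point would then be scored by 10-fold cross-validation, and the best-scoring configuration refit on the full training set.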

A boosted tree model was built using the XGBoost method with a logistic loss function21. Tree depth, minimum node size, learning rate, and the minimum loss reduction required to create an additional partition on a leaf node were tuned using 10-fold cross-validation and a maximum entropy grid search with 100 hyperparameter configurations. Variable importance was calculated by the information gain method.

After hyperparameter tuning, the final models were refit on the entire training data set. Final model performance was evaluated on predictions for the held-out test data set. A classification threshold of 0.5 was used for all models. Receiver operating characteristic (ROC) curves, calibration plots, and decision curves were generated for each modeling approach. Performance measures were calculated for each modeling approach, including area under the ROC curve (AUROC), area under the precision-recall curve (AUPRC), sensitivity, specificity, accuracy, and calibration slope and intercept. The discriminatory performance of each model was compared with that of the fixed-threshold UAB (flow volume >500 mL/min and venous diameter >4 mm) and KDOQI (flow volume >600 mL/min and venous diameter >6 mm) ultrasound criteria. Decision curves were plotted for each possible AVF prediction strategy across a range of threshold probabilities24.
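The threshold-based metrics and the fixed-threshold comparator rules are straightforward to compute. An illustrative Python sketch (function names are ours; the flow and diameter cutoffs are the KDOQI-style thresholds quoted in the text):

```python
def confusion_metrics(y_true, y_prob, threshold=0.5):
    """Sensitivity, specificity, and accuracy at a fixed
    classification threshold on predicted probabilities."""
    y_pred = [int(p >= threshold) for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }

def kdoqi_positive(flow_ml_min, vein_mm):
    """Fixed-threshold comparator from the text: flow volume
    >600 mL/min AND venous diameter >6 mm predicts success."""
    return int(flow_ml_min > 600 and vein_mm > 6)
```

The same confusion-matrix machinery, applied to the binary outputs of `kdoqi_positive` (or the analogous >500 mL/min and >4 mm UAB rule), allows direct comparison of the learned models against the fixed-threshold criteria.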

Reporting summary

More information about research design is available in the Nature Research Reporting Summary linked to this article.

