Machine-learning-assisted prediction of surgical outcomes in patients undergoing gastrectomy

2019-11-08 01:49:46

Chinese Journal of Cancer Research, 2019, Issue 5

Sheng Lu, Min Yan, Chen Li, Chao Yan, Zhenggang Zhu, Wencong Lu

1 Department of General Surgery, Rui Jin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai Institute of Digestive Surgery, Shanghai 200025, China; 2 Department of Chemistry, College of Sciences, Shanghai University, Shanghai 200444, China

Abstract

Objective: Postoperative complications adversely affect the prognosis of patients with gastric cancer. This study investigates the feasibility of using a machine-learning model to predict surgical outcomes in patients undergoing gastrectomy.

Methods: Cancer patients who underwent gastrectomy at Shanghai Rui Jin Hospital in 2017 were randomly assigned to a development or validation cohort in a 9:1 ratio. A support vector classification (SVC) model to predict surgical outcomes in patients undergoing gastrectomy was developed and further validated.

Results: A total of 321 patients with 32 features were included. Positive outcomes (postoperative complications after gastrectomy) and negative outcomes (no complications) were recorded in 100 (31.2%) and 221 (68.8%) patients, respectively. The accuracy of the SVC model was 78.17% in 10-fold cross-validation and 78.12% in external verification. Further, an online web server has been developed to share the SVC model for machine-learning-assisted prediction of surgical outcomes in future gastrectomy procedures; it is accessible at http://47.100.47.97:5005/r_model_prediction.

Conclusions: The SVC model was a useful predictor of the risk of postoperative complications after gastrectomy and may help stratify patients with different overall status when choosing a surgical procedure or other treatments. Machine-learning models in cancer informatics research can be expected to become shareable and accessible worldwide via web addresses.

Keywords: Gastric cancer; postoperative complications; machine-learning models; support vector classification

Introduction

Gastric cancer is one of the most common malignancies and the second leading cause of cancer death in the world. In China, more than 679,100 new diagnoses are made every year, and an estimated 498,000 patients died from gastric cancer in 2015 (1). Surgery is the only potentially curative treatment, and the results of gastrectomy have improved over the years with respect to survival, morbidity and postoperative mortality (2).

To identify risk factors for postoperative complications, researchers generally perform a Student’s t-test or chi-square test. Other methods include the prognostic nutritional index (PNI) (3), the modified Glasgow prognostic score (mGPS) (4), and the Estimation of Physiological Ability and Surgical Stress (E-PASS) scoring system (5). However, the reliability and practicability of these criteria remain uncertain, and these methods cannot account for the influence of each factor adopted in the equation.

In recent years, cancer informatics and machine-learning models have been successfully applied in cancer research (6,7). In this work, a support vector classification (SVC) model was constructed to predict surgical outcomes in patients undergoing gastrectomy. Furthermore, we provide a web server through which researchers can use the model. Below, we describe in detail how to develop a machine-learning model, covering six steps: 1) how to collect a valid benchmark dataset to train and test the model; 2) how to check basic statistics of the available features; 3) how to construct the optimal model based on data pretreatment, feature reduction, model selection, and model optimization; 4) how to evaluate the anticipated accuracy of the model; 5) how to establish a user-friendly web server for the model that is accessible to the public; and 6) how to apply the model in diagnosing and caring for patients after gastrectomy.

Materials and methods

Data collection of patients and variables

This study enrolled 321 patients who were diagnosed with gastric cancer and underwent gastrectomy with lymph node dissection in 2017 at Rui Jin Hospital, affiliated to Shanghai Jiao Tong University. Patients who received chemotherapy and those who underwent emergency surgery were excluded from the study. Ninety percent of the patients were randomly selected as the training set, and the remaining 10 percent were used as the testing set. We retrospectively reviewed clinical data from the past year only, because surgical and nursing techniques have developed rapidly in recent years. In our center, the number of patients undergoing laparoscopic surgery and enhanced recovery after surgery (ERAS) has been increasing over the past few years. Thus, we decided to collect data from the most recent year to construct the model for predicting surgical outcomes in patients undergoing gastrectomy.

We retrospectively reviewed the medical history, laboratory findings, operative findings, and surgical outcomes of patients undergoing gastrectomy. The variables included in this study are listed in Table 1. Age was defined at the time of surgery. Body height and weight were measured on the day of admission.

Patients with postoperative complications were categorized into the “positive” group, and the others into the “negative” group. The only endpoint of this study was in-patient morbidity. Postoperative complications were defined as either life-threatening or requiring significant deviation from standard management, corresponding to Clavien-Dindo Grade II and above complications (8).

Machine-learning methods for classification and prediction

In this work, supervised machine-learning methods including SVC, k-nearest neighbor (k-NN), linear discriminant analysis (LDA), and general linear model (GLM) were used to construct classification models predicting postoperative complications. The data set was randomly partitioned into a 90% training set and a 10% independent test set. Models were built using the training set and validated using the independent test set. The classification tasks were designed to evaluate the performance of different machine-learning models. For each classification task, feature reduction using principal component analysis was employed to select the most informative latent variables from the training set and to avoid overfitting. The optimal model was determined by comparing the receiver operating characteristic (ROC) curves of the different models on the training set. We built the models and selected the features using data only from the training set, in order to rigorously evaluate the performance of our finalized models with the independent test set. The inputs to the classification algorithms were the principal components, which are linear combinations of the quantitative features described in the previous section, and the output was the predicted surgical outcome: either the positive group with postoperative complications or the negative group. Because the data set was unbalanced between positive and negative samples, Random Over-Sampling Examples (ROSE) (9-11) was applied before modelling to deal with the class imbalance. Caret’s varImp function, which calculates the area under the ROC curve, was used to assess feature importance. An introduction to the machine-learning methods mentioned is provided in the Supplementary materials.
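As a minimal sketch of the random 90%/10% partition described above, the following Python snippet shuffles patient indices and splits them. It is an illustration only: the study was carried out in R, and the function name, the seed, and the plain unstratified shuffle are assumptions rather than the authors' procedure.

```python
import random


def split_train_test(n_samples, train_fraction=0.9, seed=0):
    """Shuffle sample indices and split them into training and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # hypothetical seed, for reproducibility only
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


# 321 patients, as in the study; the indices stand in for patient records.
train_idx, test_idx = split_train_test(321)
print(len(train_idx), len(test_idx))  # 289 32
```

With 321 patients, rounding 90% gives 289 training and 32 test samples, which matches the set sizes reported in the Results.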

Table 1 Variables included in this study

Statistics and implementation

Statistical analyses and machine-learning algorithms were performed using R software (Version 3.5.0; R Foundation for Statistical Computing, Vienna, Austria) with the caret and ROSE packages installed. Clinicopathological variables were analyzed using chi-squared tests for discrete variables and t-tests for continuous variables. P values less than 0.05 were considered significant. Model performance was evaluated by the area under the ROC curve, specificity and sensitivity. The ROC curve was created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. TPR is also known as sensitivity, and FPR can be calculated as (1 - specificity); they are given as follows:

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN} = 1 - \frac{TN}{TN + FP}$$

where TP is true positive, FP is false positive, TN is true negative, and FN is false negative in the prediction results. All computations were carried out on an Intel Core i7 computer with a 4-core 2.7 GHz processor.
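The TPR and FPR definitions, and the way an ROC curve is traced by varying the decision threshold, can be sketched in a few lines of Python. The function names, scores and labels below are hypothetical and are not taken from the study.

```python
def rates(tp, fp, tn, fn):
    """Return (TPR, FPR) from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # sensitivity
    fpr = fp / (fp + tn)  # 1 - specificity
    return tpr, fpr


def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, starting at (0, 0), one per distinct score threshold."""
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < thr and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < thr and y == 0)
        tpr, fpr = rates(tp, fp, tn, fn)
        points.append((fpr, tpr))
    return points


# Hypothetical classifier scores and true labels (1 = positive, 0 = negative).
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]
curve = roc_points(toy_scores, toy_labels)
print(curve)
```

Lowering the threshold moves the curve from (0, 0) towards (1, 1); a better classifier rises towards TPR = 1 while FPR is still small.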

Results

Workflow of machine-learning process

The machine-learning process is illustrated in Figure 1. The modelling workflow consists mainly of basic statistics after collection of the original data, data pretreatment such as deletion of correlated variables and resampling of the data set, feature reduction via principal component analysis, model selection based on machine-learning approaches, model optimization via adjustment of hyper-parameters, model validation, model accessibility, and model application.

Baseline information

Clinical characteristics and corresponding complication rates are presented in Table 2. Out of 321 patients, 100 (31.2%) were diagnosed with postoperative complications. Age (P<0.001), number of comorbidities (P=0.001), surgical mode (P=0.036), length of surgery (P=0.016) and tumor size (P=0.001) were significantly related to postoperative complications among the elderly patients.

Data pretreatment

After splitting the data into a training set (n=289) and a testing set (n=32), one data pretreatment step was to check the collinearity of the features in the training set, since the model would be unstable if two features were collinear. After computing the correlation coefficients between pairs of features (Figure 2), we found that the correlation coefficients of 9 feature pairs exceeded 0.5. Therefore, 9 variables, namely weight, height, type of anastomosis, length of anesthesia, fluid intake, red blood cell (RBC), albumin (ALB), glutamic oxaloacetic transaminase (AST), and total bilirubin (TBIL), were deleted.
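A correlation-based filter of this kind can be sketched as follows. This is a simplified illustration under stated assumptions: the greedy keep-or-drop rule, the function names and the toy values are hypothetical, and the paper does not state how the variable to delete was chosen within each correlated pair.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.5):
    """Greedily keep a feature only if |r| <= threshold with every feature kept so far."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data, not from the study: weight tracks BMI closely here.
toy = {
    "age": [61, 70, 55, 48, 77, 66],
    "bmi": [22.1, 24.8, 21.0, 26.3, 23.5, 20.9],
    "weight": [60, 70, 57, 75, 65, 55],
}
kept_features = drop_correlated(toy)
print(kept_features)
```

In this toy example weight is dropped because it is strongly correlated with BMI, while age and BMI are retained.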

Another data pretreatment step was to resample the data set to address the imbalanced distribution of classes. In classification problems, a disparity in the frequencies of the observed classes can have a significant negative impact on model fitting. In this study, the number of patients with postoperative complications was less than half of that without complications (positive samples vs. negative samples: 31.2% vs. 68.8%). Thus, the ROSE method was used to resample the unbalanced data set. After resampling, the imbalance was considerably relieved (positive samples vs. negative samples: 47.1% vs. 52.9%).

Figure 1 Workflow of machine-learning process. PC, principal component.
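The idea of rebalancing classes before modelling can be illustrated with plain random oversampling of the minority class. Note that this is a deliberately simpler technique than ROSE, which generates synthetic examples rather than duplicating existing ones; the function name, seed and toy labels are assumptions for illustration.

```python
import random


def oversample_minority(labels, seed=0):
    """Return sample indices with the minority class resampled (with replacement)
    until both classes have the same number of entries."""
    rng = random.Random(seed)  # hypothetical seed
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


# Hypothetical toy labels: 3 positive and 7 negative samples.
toy_labels = [1] * 3 + [0] * 7
idx = oversample_minority(toy_labels)
resampled = [toy_labels[i] for i in idx]
print(resampled.count(1), resampled.count(0))  # 7 7
```

Unlike this sketch, which balances the classes exactly, the resampling reported above left a slight residual imbalance (47.1% vs. 52.9%).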

Feature reduction

Since too many variables would reduce the stability and reliability of the constructed models, principal component analysis (PCA) was used in this study to decrease the number of variables. We found that the predictive models were feasible using the top 10 PCs as input features (explaining 67.6% of the variance of all variables).
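The share of variance explained by the leading principal components is a cumulative ratio, which the short sketch below makes explicit. The component variances used here are hypothetical; the paper reports only the 67.6% figure for the top 10 PCs, not the individual variances.

```python
def explained_share(variances, k):
    """Fraction of total variance carried by the k largest components."""
    ordered = sorted(variances, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def n_components_for(variances, target):
    """Smallest number of leading components whose cumulative share reaches target."""
    for k in range(1, len(variances) + 1):
        if explained_share(variances, k) >= target:
            return k
    return len(variances)


# Hypothetical component variances (eigenvalues), not from the study.
toy_variances = [4.0, 3.0, 2.0, 1.0]
print(explained_share(toy_variances, 2))    # 0.7
print(n_components_for(toy_variances, 0.65))  # 2
```

The same calculation applied to the study's components would return 0.676 for k = 10.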

Table 2 Baseline information of clinical features for all patients

Figure 2 Collinearity of features in training set. BMI, body mass index; WBC, white blood cell; RBC, red blood cell; PLT, platelet; HBG, hemoglobin; LYM, lymphocyte absolute value; ALT, glutamic-pyruvic transaminase; AST, glutamic oxaloacetic transaminase; TP, total protein; ALB, albumin; TBIL, total bilirubin; DBIL, direct bilirubin; CREA, creatinine; BUN, blood urea nitrogen; GLU, glucose.

Model selection

To concisely summarize the prediction performance of the models, we constructed ROC curves, which evaluate the performance of a model in a way that takes the uncertainty of each prediction into account. Figure 3 illustrates the ROC curves obtained by SVC with a radial basis function (RBF) kernel, k-NN, linear discriminant analysis (LDA), and general linear model (GLM), each using the top 10 PCs as input features. The ROC results indicated that SVC performed better than the other methods. Thus, the SVC method with an RBF kernel function was selected to construct the optimal model.

The classification performance of the SVC model was strong: the area under the ROC curve was 0.8033, suggesting that this model would be useful in predicting postoperative complications after gastrectomy.
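The area under the ROC curve has a convenient interpretation: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition; the scores and labels are hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    sample scores higher; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores and labels (1 = positive, 0 = negative).
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]
print(auc(toy_scores, toy_labels))  # 0.75
```

Under this reading, an area of 0.8033 means that the model ranks a positive case above a negative one in roughly four out of five such pairs.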

Model optimization

Figure 3 Comparison of receiver operating characteristic (ROC) for different methods.

The optimal SVC model with an RBF kernel function is determined by two parameters, a capacity parameter C and a kernel function parameter σ. Figure 4 shows a ROC heatmap for tuning the parameters of the optimal model. The SVC model with capacity parameter C=8 and RBF kernel parameter σ=0.08786 provided the best performance for predicting surgical outcomes in patients undergoing gastrectomy, with a sensitivity of 81.73% and a specificity of 72.55%. With the optimized parameters, the accuracy on the training set reached 94.81%, the accuracy in 10-fold cross-validation was 78.17%, and the area under the ROC curve was 0.8275.
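The role of σ and the resampling scheme used for tuning can be sketched as follows. Two assumptions are made here: the kernel is written in the form exp(-σ‖x − z‖²), which is the parameterisation used by the kernlab package in R but is not stated in the paper, and the fold assignment is a plain shuffled split with a hypothetical seed. The function names are illustrative.

```python
import random
from math import exp


def rbf_kernel(x, z, sigma):
    """RBF kernel, assuming the form exp(-sigma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return exp(-sigma * sq_dist)


def repeated_kfold(n, k=10, repeats=3, seed=0):
    """Yield (train_indices, held_out_indices) for repeated k-fold cross-validation."""
    rng = random.Random(seed)  # hypothetical seed
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for f in range(k):
            held_out = idx[f::k]  # every k-th shuffled index forms one fold
            held = set(held_out)
            yield [i for i in idx if i not in held], held_out


# Similarity of two hypothetical points at the reported sigma.
print(rbf_kernel([1.0, 0.0], [0.0, 0.0], sigma=0.08786))

# 10 folds repeated 3 times over the 289 training samples gives 30 fits per (C, sigma) pair.
splits = list(repeated_kfold(289))
print(len(splits))  # 30
```

Each candidate pair of C and σ would be scored by averaging the ROC area over these 30 held-out folds, and the pair with the highest average selected.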

Model validation

The prediction accuracy verified on the external dataset was 78.12%, with a sensitivity of 90.91% and a specificity of 50.00%. This result indicated that the SVC model was efficient in predicting surgical outcomes in patients undergoing gastrectomy.
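Because the external test set contains only 32 patients, the reported percentages correspond to small integer counts. The following sketch enumerates the confusion tables that reproduce the three reported figures; the counts it finds are inferred arithmetically and are not a table given in the paper, and the function name and tolerance are assumptions.

```python
def consistent_tables(n, acc, sens, spec, tol=1e-4):
    """List (TP, FN, TN, FP) integer tables over n cases whose accuracy,
    sensitivity and specificity match the given values within tol."""
    found = []
    for n_pos in range(1, n):
        n_neg = n - n_pos
        for tp in range(n_pos + 1):
            for tn in range(n_neg + 1):
                if (abs(tp / n_pos - sens) <= tol
                        and abs(tn / n_neg - spec) <= tol
                        and abs((tp + tn) / n - acc) <= tol):
                    found.append((tp, n_pos - tp, tn, n_neg - tn))
    return found


# Reported external-validation figures for the 32 test patients.
tables = consistent_tables(32, acc=0.7812, sens=0.9091, spec=0.5000)
print(tables)  # [(20, 2, 5, 5)]
```

The single table found corresponds to 25 of 32 patients classified correctly, with 22 cases in the class counted as positive and 10 in the other. The paper does not report this table, so the mapping of these counts to the complication and no-complication groups is left open.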

Model accessibility

To help surgeons use the SVC model constructed in this work, an online web server was developed for predicting surgical outcomes in patients undergoing gastrectomy. The web server is accessible at http://47.100.47.97:5005/r_model_prediction.

Model application

The model can be used not only to predict surgical outcomes of new patients with gastric cancer undergoing gastrectomy but also to evaluate the importance of clinical features using Caret’s varImp function; the resulting ranking is shown in Figure 5.

To apply the model via the web server, surgeons need to enter the original data of the clinical features. After receiving all clinical features, the web server provides an online prediction of the surgical outcome of the patient undergoing gastrectomy. Surgeons can therefore obtain the predicted results and prepare further therapies in advance for patients at risk of postoperative complications after gastrectomy. In particular, patients predicted to have negative outcomes exhibited a considerably reduced risk of postoperative complications, indicating that the SVC model is a helpful predictor of surgical outcomes in patients undergoing gastrectomy.

Discussion

To our knowledge, few studies have applied machine-learning-assisted models to predict surgical outcomes in patients undergoing gastrectomy. In this study, we designed a machine-learning workflow using the top 10 PCs, derived from 23 clinical features, as inputs. The machine-learning classifier was built and evaluated for the prediction of surgical outcomes in patients undergoing gastrectomy. We also validated our methodology using an independent test set and provided an online web server to share the model.

Figure 4 Receiver operating characteristic (ROC) heatmap for tuning parameters. Resampling: 10-fold cross-validation, repeated 3 times.

Our SVC model showed that chronologic age was the most important variable for postoperative complications after gastrectomy, followed by tumor size, number of comorbidities, etc. (Figure 5). These variables reflect both the immunonutritional status and the clinicopathological characteristics of surgical patients. Variables with a higher rank may relate more closely to postoperative complications. For instance, elderly patients often have age-associated physiologic problems such as decreased organ reserve, concomitant comorbidities, and mental imbalance, leading to a higher risk of complications. Several articles have also shown that age is an independent risk factor for postoperative complications, which indicates agreement between the machine-learning results and clinical facts (12,13). The SVC model also indicated that tumor size was a major variable related to postoperative complications after gastrectomy, in agreement with the report that the mean tumor size in the reoperation group was greater than that in the non-reoperation group (14). Besides chronologic age and tumor size, our model also revealed that the number of comorbidities is among the top three factors influencing postoperative complications, in agreement with previous reports (15,16). In concordance with previous studies, the basic statistics of this study confirmed that chronologic age was significantly correlated with postoperative complications (17,18). Univariate analysis also showed that the number of comorbidities, tumor size, surgical mode, and length of surgery were significantly associated with postoperative complications, indicating that the risk of postoperative complications is related to multiple factors, including preoperative performance status, clinicopathological features, surgical stress, etc.

As the population ages, the number of surgical interventions in gastric cancer patients has been increasing rapidly in China. Overall, about 31% of patients developed postoperative complications according to our data. Some researchers have pointed out that postoperative complications adversely affect overall survival in patients with cancer of the digestive system (19,20). Mantovani et al. suggested that poor outcome might result from invisible residual tumor cells, whose proliferation and metastasis could be promoted by inflammatory responses to severe postoperative complications (21). Moreover, severe postoperative complications could also delay the chemotherapy necessary to prolong the survival of patients with gastric cancer. Therefore, it is important to set up an informative model for evaluating performance status and predicting the surgical outcome of elderly patients, considering organ function and surgical invasiveness. A valid predictive model can be used to identify the appropriate treatment modality. It is highly probable that patients predicted to have negative outcomes can minimize cancer-related death and prolong disease-specific survival by undergoing the recommended lymph node resection, regardless of chronologic age or other factors. However, large prospective analyses are necessary to validate this recommendation.

Although this study has a number of strengths, it also has several limitations. Despite the successful application of machine-learning technology, which offers good sensitivity in identifying postoperative complications, the specificity on the external dataset was not high, which means that high-risk patients identified by the prediction model might not actually develop postoperative complications. The practical use of our model is therefore to help doctors find patients who are more likely to suffer postoperative complications. Another limitation of this study is that intraoperative features were essential for a better predictor, although the machine-learning model revealed the critical role of preoperative features. Further validation in additional cohorts of patients undergoing gastrectomy is necessary to confirm these conclusions in prospective research. We hope that the presented work provides readers with machine-learning tools that they can incorporate into their own work.

Figure 5 Importance of clinical features via the model. HBG, hemoglobin; ALT, glutamic-pyruvic transaminase; CREA, creatinine; LYM, lymphocyte absolute value; PLT, platelet; WBC, white blood cell; GLU, glucose; BUN, blood urea nitrogen; BMI, body mass index; DBIL, direct bilirubin.

Conclusions

To improve the long-term prognosis of patients with gastric cancer who have undergone gastrectomy, preventing postoperative complications is of critical importance. The SVC model is a useful predictor of the risk of postoperative morbidity and may help stratify patients with different overall status when choosing surgical procedures or other treatments.

Acknowledgements

None.

Footnote

Conflicts of Interest: The authors have no conflicts of interest to declare.
