Objectives: Survival after liver transplant depends on pretransplant, peritransplant, and posttransplant factors. Identifying effective factors for patient survival after transplant can help transplant centers make better decisions.
Materials and Methods: Our study included 902 adults who received livers from deceased donors from March 2011 to March 2014 at the Shiraz Organ Transplant Center (Shiraz, Iran). In a 3-step feature selection method, effective features of 6-month survival were extracted by (1) F statistics, Pearson chi-square, and likelihood ratio chi-square and by (2) 5 machine-learning techniques. To evaluate the performance of the machine-learning techniques, Cox regression was applied to the data set. Evaluations were based on the area under the receiver operating characteristic curve and sensitivity of models. (3) We also constructed a model using all factors identified in the previous step.
Results: The model predicted survival based on 26 identified effective factors. In order of importance, graft failure, Aspergillus infection, acute renal failure and vascular complications after transplant, graft failure diagnosis interval, previous diabetes mellitus, Model for End-Stage Liver Disease score, donor inotropic support, units of packed red blood cells received, and previous recipient dialysis were found to be predictive factors of patient survival. The area under the receiver operating characteristic curve and model sensitivity were 0.90 and 0.81, respectively.
Conclusions: Data mining analyses can help identify effective features of patient survival after liver transplant and build models with equal or higher performance than Cox regression. The order of influential factors identified with the machine-learning model was close to that expected from clinical experience.
Key words : Cox regression, Data mining, Deceased-donor transplant, Feature selection
Introduction
Liver transplant (LT) is the preferred choice of therapy for patients with end-stage liver disease. Posttransplant outcomes have improved as the result of advances in LT.1 However, due to shortages of donated livers and factors affecting successful LT, liver allocation faces several challenges, and graft survival and patient survival are important research problems.2 In recent years, researchers have conducted studies using Model for End-Stage Liver Disease (MELD) to identify whether it can predict survival after LT.3-7 Evidence has proved that MELD alone is not sufficient to predict survival following LT.4,8,9 Several studies have considered various characteristics to predict LT patient survival.10-13 Although the estimation of LT survival based on pre-LT variables could be beneficial for liver allocation policies,14 enhancing post-LT management can prolong survival time.
Health data sets are imbalanced, complex, and ambiguous; therefore, choosing a good predictive technique to analyze health data is challenging.15 Researchers have previously tried to identify key survival factors of LT patients based on statistical methods.16-21 These methods need to consider some assumptions that limit their application. For example, Cox regression assumes that key factors have similar effects on survival over time.22 In recent years, several studies have predicted LT patient survival by machine-learning methods. Unlike the common survival analysis methods, machine learning is not limited by any assumption on the data distribution or structure, and it can model negative and positive targets based on nonstructured, complex, and noisy data.23-25 Extracting hidden knowledge based on the adaptation of machine-learning algorithms is not straightforward, but it can recognize relations and patterns in the data by learning and can rank the importance of key factors among unstructured and imbalanced data sets.
The modeling of outcomes of organ transplant using machine-learning techniques has been conducted in LT.25-29 In some of these studies, machine learning has predicted graft and patient survival more accurately than other validated scores and Cox regression.24,25,29,30
This study aimed to model the survival of adult LT recipients by identifying key factors that can affect patient survival and to rank the importance of effective factors. In this regard, we investigated pretransplant, peritransplant, and posttransplant factors by using machine-learning techniques and, to confirm our approach, compared the performances with Cox regression performance.
Materials and Methods
Patients
A retrospective cross-sectional study was conducted on adult patients (≥ 18
years old) who received LT for the first time between March 2011 and March 2014
at the Shiraz Organ Transplantation Center (Shiraz, Iran). In this center,
long-term follow-up is done. Hence, patients who had received a liver from an
extended-criteria or non-extended-criteria brain-dead donor were included. The
definition of extended-criteria donor was as follows: donor ≥ 60 years old, or
cold ischemia time of ≥ 14 hours, or donor hypotension. For this study, 932
patients were included. Of these 932 patients, 30 met the following exclusion
criteria: lack of medical records and impossibility of follow-up for 6 months
due to changes in phone number or migration to another city or country.
Therefore, the follow-up status of cases included in the study was recorded as
either alive or dead.
Data set
Fifty-eight variables regarding recipients, donors, surgical procedures, and
complications (n = 13, 20, 7, and 18, respectively) were collected as input
variables. The demographic and clinical variables of all patients were collected
until either death or survival at follow-up. During data collection from patient
medical records, researchers compiled metadata to provide definition, validity,
relevancy, and consistency of data. A database of 932 patients was created. Our
data set included categorical, continuous, and nominal data. With the exception
of continuous variables, other variables were displayed as numeric codes. Data
encoding was conducted based on metadata to improve consistency.
Value consistency was considered during data entry. If inconsistent values were detected, data were verified again according to the patient’s medical records, and precise values were entered into the data set. Missing values were filled by rechecking the medical records, which provided the most accurate information. To select more relevant and informative features, only variables with < 30% missing values were considered in the analysis. Thus, 5 donor-related variables (presence of blood hypotension, hepatitis B core antibody in intensive care unit, donor previous history of diabetes mellitus, alcoholic features, and blood hypertension) were excluded. The remaining missing values were replaced with the mode for categorical variables and the mean for continuous variables.
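The missing-value handling described above can be sketched in a few lines. This is a minimal illustration only, not the procedure as run in the study: the `None` marker, the function names, the column names, and the toy values are all assumptions introduced here.

```python
from statistics import mean, mode

MISSING = None  # assumed marker for a missing entry


def missing_fraction(column):
    """Fraction of entries in a column that are missing."""
    return sum(v is MISSING for v in column) / len(column)


def impute(column, kind):
    """Fill missing entries with the mode (categorical) or mean (continuous)."""
    observed = [v for v in column if v is not MISSING]
    fill = mode(observed) if kind == "categorical" else mean(observed)
    return [fill if v is MISSING else v for v in column]


def prepare(table, kinds, max_missing=0.30):
    """Keep columns with < 30% missing values, then impute what remains."""
    return {
        name: impute(col, kinds[name])
        for name, col in table.items()
        if missing_fraction(col) < max_missing
    }


# Toy data (hypothetical values, not taken from the study)
table = {
    "meld": [18, None, 24, 30, 20],              # 20% missing: kept, mean-imputed
    "sex": [1, 1, 0, None, 1],                   # 20% missing: kept, mode-imputed
    "donor_diabetes": [None, None, 0, 1, None],  # 60% missing: dropped
}
kinds = {"meld": "continuous", "sex": "categorical", "donor_diabetes": "categorical"}
clean = prepare(table, kinds)
```

In this toy table, the donor variable exceeds the threshold and is removed, which mirrors how the 5 donor-related variables were excluded.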
Machine-learning methods
Our approach was a 3-step data mining process based on feature selection
aimed at finding effective variables. Figure 1 shows the schematic representation
of the study methodology. For the first feature selection step, important
variables were identified by statistical tests of the association between each
variable and the output variable. The importance of each candidate predictor was
calculated based on the P value. For continuous variables, P values were
calculated based on the F statistic; for categorical variables, P values were
calculated based on the Pearson chi-square test or likelihood ratio chi-square
test. Only 3 variables were unimportant; therefore, these were filtered from
modeling.
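For a categorical predictor, the two chi-square statistics named above can be computed from a contingency table of predictor level by outcome. The sketch below is illustrative and rests on several assumptions: the function names and the toy counts are invented here, the P value helper is valid only for 1 degree of freedom (a 2 × 2 table), and the F statistic used for continuous variables is not covered.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi2(table):
    """Pearson chi-square: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def likelihood_ratio_chi2(table):
    """Likelihood ratio chi-square: 2 * sum of observed * ln(observed / expected)."""
    expected = expected_counts(table)
    return 2 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0  # a zero cell contributes nothing to the sum
    )


def p_value_df1(statistic):
    """Upper-tail P value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical 2 x 2 table: rows = complication present/absent,
# columns = died/survived (toy counts, not study data)
toy = [[30, 70], [10, 190]]
chi2 = pearson_chi2(toy)
g2 = likelihood_ratio_chi2(toy)
p = p_value_df1(chi2)
```

A variable would pass this first filter when its P value falls below the chosen significance level; a table whose rows are proportional yields a statistic of 0 and a P value of 1.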
Classifiers were built in the training set, and then the second set was used to validate the model. Five well-known machine-learning techniques, that is, support vector machine (SVM),31 Bayesian network,32 C5.0 decision tree,33 multilayer perceptron neural network (MLP),34 and K-nearest neighbors,35 were applied for the second round of feature selection. Next, to compare models with a standard and popular method, Cox regression was fitted to the data set. At step 3, machine-learning models were also compared with each other based on performance evaluation criteria, and the best-performing machine-learning model was applied to the effective features selected in step 2.
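The random division into a training set and a validation set can be sketched as follows. The set sizes are those reported in the Results (634 and 268 of 902 patients); the function name, the integer patient identifiers, and the seed are assumptions made for illustration, and the study itself performed this step in its modeling software.

```python
import random


def split_patients(patient_ids, n_train, seed=0):
    """Randomly partition patient IDs into training and validation lists."""
    shuffled = list(patient_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical IDs 0..901 stand in for the 902 patients
train_ids, validation_ids = split_patients(range(902), n_train=634)
```

Fixing the seed keeps the partition reproducible, so that every classifier is trained and validated on the same patients.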
Statistics and performance metrics
Sensitivity, specificity, precision, accuracy, F measure, and area under the
receiver operating characteristic (ROC) curve were calculated to evaluate the
effectiveness of features as identified by models. True positives (TP) refer to
patients who died and who were correctly predicted as dead. False positives (FP)
refer to patients who survived and who were incorrectly predicted as dead. True
negatives (TN) refer to patients who survived and who were correctly predicted
as having survived. False negatives (FN) refer to deceased patients who were
incorrectly predicted as having survived.
The following equations show definitions of the performance metrics:
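In terms of the TP, FP, TN, and FN counts defined above, the conventional forms of these metrics are given below. These are the standard definitions, stated here as an assumption about the forms used; the F measure is written in its usual form as the harmonic mean of precision and sensitivity.

```latex
\begin{align}
\text{Sensitivity} &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP} \\
\text{Precision}   &= \frac{TP}{TP + FP} \\
\text{Accuracy}    &= \frac{TP + TN}{TP + TN + FP + FN} \\
F\ \text{measure}  &= \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}
\end{align}
```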
The F measure is defined as the harmonic mean of precision and sensitivity, and its
range is between 0 and 1. The area under the ROC curve (AUC) takes values
from 0 to 1, in which a value > 0.9 is considered excellent
discrimination, a value between 0.8 and 0.9 is considered good,
and a value between 0.5 and 0.8 is considered useful.
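One way to make the AUC concrete is its rank interpretation: the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor. The sketch below computes it directly from pairs and attaches the interpretation bands given above. The function names, the toy scores, the handling of values falling exactly on a band boundary, and the label for values below 0.5 are assumptions, and this is not the routine used by the statistical software in the study.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 = died (positive class), 0 = survived. Ties count as one-half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def describe(auc_value):
    """Map an AUC to the bands in the text (boundary handling is assumed)."""
    if auc_value > 0.9:
        return "excellent"
    if auc_value >= 0.8:
        return "good"
    if auc_value >= 0.5:
        return "useful"
    return "below chance"  # label not from the text; added for completeness


# Toy predicted risks (hypothetical, not study data)
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0]
toy_auc = auc(scores, labels)
```

Because the AUC depends only on how patients are ranked, it is unaffected by the proportion of deaths in the data set, which is why it suits imbalanced data.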
According to the AUC and sensitivity on the validation set, we selected the
machine-learning models that performed well at steps 2 and 3. Based on the effective
factors found in all well-performing models in step 2, the data were analyzed
a second time by a well-performing technique in step 3 (Figure 1), and the most
important factors that affected patient survival were proposed and ranked by
importance.
P values < .05 were considered statistically significant. MedCalc for Windows
version 14.8.1 (MedCalc Software, Ostend, Belgium) was used to calculate the
AUC. We used IBM SPSS Statistics for Windows version 23 (IBM SPSS Inc., Armonk,
NY, USA) for statistical analysis. We used IBM SPSS Modeler for Windows, version
14.2 (IBM SPSS Inc.), a predictive analytics platform, to perform data mining
analyses. We also used the IBM SPSS Modeler for Cox regression analyses.
Results
Characteristics
Detailed characteristics are shown in Table 1. The follow-up period was
completed by 902 patients (mean ± standard deviation age of 41.5 ± 13.3 y;
range, 18-74 y), including 571 men (63.3%) and 331 women (36.7%).
Ninety-four patients died from LT complications during the 6-month follow-up
period. Survival probability of patients was 89.6% by Kaplan-Meier method. For
the analysis, patients were randomly divided into either the model building (n =
634) or internal validation (n = 268) group. Survival rates of the model
building and validation subsets were 90.22% and 88.06%, respectively. Mean
survival time of patients in this study was generally 165.06 ± 46.74 days. In
the model building set, it was 166 ± 45.32 days; in the validation set, it was
162.85 ± 49.94 days. The log-rank and Breslow test results showed no significant
difference between the survival curves for model building and validation sets (P
= .328, P = .325, respectively).
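The Kaplan-Meier estimate multiplies, at each death time, the fraction of at-risk patients who survive that time. The sketch below is a minimal version under stated assumptions: the function name and the toy follow-up times are invented here, and the final check assumes that no patient was censored before 6 months, in which case the estimate reduces to the simple proportion surviving (808 of 902).

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    times: follow-up in days; events: 1 = died, 0 = censored (alive at last contact).
    """
    survival, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Toy cohort of 10 patients (hypothetical): 3 deaths, 7 alive at day 180
times = [12, 30, 30, 180, 180, 180, 180, 180, 180, 180]
events = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
curve = kaplan_meier(times, events)

# With no censoring before 6 months, the estimate equals the proportion alive
reported = round(100 * (1 - 94 / 902), 1)
```

In the toy cohort, the final estimate equals 7/10, the fraction of patients alive at the end of follow-up, and the same arithmetic applied to 94 deaths among 902 patients reproduces the reported 89.6%.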
In this study, the most frequent cause of liver failure was hepatitis B virus in
24.7% of all patients. Primary sclerosing cholangitis, cryptogenic cirrhosis,
and autoimmune hepatitis were found in 18.8%, 16.9%, and 15% of the
participants, respectively. Other causes of liver failure such as alcoholic
liver disease, hepatitis C virus, non-alcoholic fatty liver disease, Wilson
disease, fulminant hepatic failure, primary biliary cirrhosis, and Budd-Chiari
disease were 1.4%, 3.3%, 1.1%, 5.5%, 0.7%, 2%, and 3.5%, respectively.
Hepatocellular carcinoma was found in 7.6% of patients. A MELD score of > 20 was
shown in 402 patients (44.57%), and MELD ≤ 20 was shown in 500 patients
(55.43%). Percentages of patients with Child-Pugh class A, B, and C scores were
10.4%, 49.8%, and 39.8%, respectively, showing that about one-half of patients
were in class B. Recipient previous history of diabetes mellitus was found in
124 patients (13.75%), and 150 patients (16.63%) were diagnosed with diabetes
mellitus after LT.
Data mining results: Classifier performance
The validation results are shown in Table 2. The minimum and maximum accuracy
results of the machine-learning models were 0.92 and 0.96, as shown in the MLP
and SVM models, respectively. Therefore, machine learning could correctly
predict at least 92% of all patients, correctly classifying patients as well as
Cox regression (94% accuracy).
However, what matters most in LT studies is the correct prediction of death, that
is, sensitivity. The sensitivity of the SVM model was higher than that of the
other machine-learning techniques in this study. The sensitivity of the Cox
model was less than or equal to that of 3 of the machine-learning models. Because
the data set was imbalanced (94 patients died and 808 patients survived), a
measure that evaluates minority class performance was essential, but
specificity was important, too. Therefore, we considered AUC as a fair base criterion. The ROC curves of
models are shown in Figure 2. Compared with other models in step 2, MLP also
performed well. That is, SVM and MLP models were not significantly different (P
= .1968) based on AUC, but SVM was more sensitive (sensitivity results for SVM
and MLP models were 0.78 and 0.69, respectively). Therefore, SVM was selected as
the well-performing model in step 2. When we compared SVM as the selected
machine-learning model with Cox, based on AUC, SVM performed better than Cox
regression (AUC = 0.78). Results showed that these 2 models were significantly
different (P = .0079). Other criteria such as precision and F measure were also
in favor of the SVM model.
The SVM model outperformed other machine-learning models; therefore, we used it
for data set analyses again for the step 3 feature selection based on effective
variables (Figure 2). Performance results of the final model are also shown in
Table 2. The final SVM model performed well on the data set (AUC = 0.90 and
sensitivity = 0.81). Indeed, the results showed how the 3-step feature reduction
of irrelevant variables could influence the AUC in survival prediction.
Feature selection results
Before partitioning in the first feature selection step, 3 variables (donor
liver microvesicular steatosis, recipient sex and peritransplant portal vein
thrombosis) were not selected. In the second step, to discover the hidden
influential features in the patient data set, machine-learning analysis was
conducted on the data set based on 50 independent variables. Each technique was
considered as a feature selection method, resulting in 5 models with some
suggestions.
The results revealed that postoperative factors could be more effective than
demographic and pretransplant ones with regard to patient survival. Compared
with other machine-learning models in step 2, the order of important variables
in SVM could be clinically more consistent based on experience (Table 3).
Although all of the technically important factors were not clinically
discriminative, a possibly significant effect on survival with a combination of
factors could be predicted.
In the step 2 analysis, graft failure, Aspergillus infection, vascular
complications, acute renal failure, graft type, and graft failure diagnosis
interval (GFDI) were identified as predictors repeatedly across models (in 5, 3,
3, 2, 2, and 2 models, respectively), with other important factors appearing in
only some models or in a single model (Table 3).
Based on the 26 features from step 2, the third feature selection step was
applied on the data set. The final model, as shown in Table 4, included
complications at high ranks and other recipient-, donor-, and surgery-related
factors at lower ranks.
Graft failure, with an importance of 0.47, was identified as the most influential
factor; Aspergillus infection and acute renal failure, with an importance of 0.08,
were the second most important factors. The importance of vascular complications
was 0.07, and the importance of GFDI (primary nonfunctioning or malfunction) was
0.05. The importance values of preoperative predictive factors in the model,
including recipient history of diabetes mellitus, MELD, use of inotropic support
for the donor, and units of packed red blood cells received by the recipient, were
0.03, 0.02, 0.02, and 0.01, respectively. Sixteen other factors had less effect
on patient survival after LT.
In our study, 115 patients received organs from extended-criteria donors (ECD);
these patients had a survival rate of 79%. Patients who received a liver from an
ideal donor had a survival rate of 91%. Although ECD was not identified as a
predictive factor in the final model, it was identified as an independent risk
factor in the SVM model in step 2.
Cox regression was used to compare prediction performances. Table 5 shows the
feature selection results of Cox regression. It was not considered in the third
step of feature selection.
Discussion
The results of our study showed that the machine-learning model outperformed the
Cox regression model. The accuracy of the machine-learning model was higher than
the accuracy of the Cox model, as was its sensitivity to detect effective
factors (Table 2). In addition to the increased sensitivity and AUC of the
model, the machine-learning model reduced the subset of informative features and
ranked the predictive factors appropriately. Furthermore, this study showed
that, in high-dimensional LT data sets, the Cox regression model had lower
sensitivity compared with machine-learning models. Thus, data mining can be a
beneficial method for identifying predictive factors for patient survival after
LT in data sets with high dimensionality.
Based on our machine-learning results, the order of LT patient survival
predictive factors in step 3 of the machine-learning model was close to that
shown in clinical experience (Table 4). The results demonstrated that graft
failure is overwhelmingly the strongest predictive factor, as we expected. It is
not surprising, but this result revealed the performance of our machine-learning
method. As shown in Table 4, some variables, including post-LT Aspergillus
infection, post-LT acute renal failure, vascular complications, GFDI, recipient
history of diabetes mellitus, MELD score, donor inotropic support in the
intensive care unit, number of packed red blood cell units, and pre-LT dialysis,
have more important effects on patient survival after LT. Haseli and
associates36 also reported that patient survival after LT was influenced by
graft type, Pediatric End-Stage Liver Disease or MELD score, complications after
LT, and initial diagnosis. In a study similar to ours, Khosravi and associates
reported that the order of influential factors in the Cox regression model could
be close to clinical findings.25 However, we found that the clinically extracted
factors in machine-learning models are consistent with experience. It is not
clear why some differences exist in the technical results between our study and
the research from Khosravi and colleagues; however, a possible justification is
the difference in our approach and the way we filtered irrelevant variables in
the 3 steps and selected influential variables as inputs for the next phase.
Patients who receive ECD livers have outcomes similar to those who receive
normal livers.37 In our study, we considered recipients of both normal and ECD
livers, with investigation of whether ECD is an influential factor. In step 2 of
feature selection, it was a predictive factor; however, among the 26 features in
the last step, it could not be extracted. Therefore, ECD may be an essential
factor to better understand patient survival, but it is not more important than
the other variables extracted in step 3. Complications are potential causes of
LT surgery death; indeed, based on step 3 results, the post-LT predictive
factors were more influential than the other factors. Our extracted
discriminative factors were in line with results of Haseli and colleagues36 and
Khosravi and associates.25
Our results showed that the number of packed cell units is a predictive factor
of survival after LT, which is in accordance with previous studies.38-40
Furthermore, having a history of diabetes mellitus could have a negative effect
on patient survival.41-43 Diabetes mellitus after LT also increased patient
morbidity.44 We found that recipient diabetes mellitus was an important factor;
however, development of diabetes mellitus post-LT was removed from the model.
Post-LT acute renal failure was extracted as a predictive factor in our study;
this showed the substantial impact of this factor on survival. Other studies
also revealed that post-LT acute renal failure and recipient history of diabetes
mellitus could predict survival in LT recipients.45,46 We found that history of
dialysis could also be a predictive factor for patient survival. Furthermore,
our study showed that donor inotropic support in an intensive care unit was as
important as MELD and the number of packed red blood cell units for LT survival
prediction. However, Feng and associates47 did not identify inotropic support to
be related to mortality after LT.
Most studies on LT survival have had a number of limitations. A major limitation
of our study was the lack of a central data repository; hence, data had to be
collected manually from different records at different places in the center.
Therefore, data collection was time consuming and preprocessing was difficult.
The second limitation of this study was lack of knowledge of nutrition and
lifestyle characteristics of patients. In addition, to investigate other
survival times, such as at 12 or 24 months or more, by classification
techniques, we would require separate runs on separate data sets. This is because
some complications may occur in the second 6 months or in the second year after LT
but not in the first 6 months or the first year. Therefore, this
situation may cause missing values in the data set and may result in differences
in the performance of the models and in the influential factors of patient
survival at 12 and 24 months or more. Another limitation of this study was its
unbalanced data set, which made the analysis complex; however, we showed how
machine learning could correctly recognize minor class examples.
Conclusions
Our study aimed to model patient survival after LT using machine-learning
methods to investigate influential factors and compare the performance of these
methods with a classic statistical method. We applied a 3-step machine-learning
feature selection model to determine factors affecting patient survival after
LT. Our data mining approach was able to provide an improvement in performance
of predicting LT outcomes. Predicting patient survival after LT by machine
learning led to a more consistent selection of important factors that agreed
with clinical experience. We recognize that a large cohort study that would
include characteristics considered in this study, post-LT nutrition, and
lifestyle variables may allow a better understanding of the factors associated
with patient survival after LT. Creating an LT data warehouse can help
facilitate research and improve quality of data.
References:
Volume : 17
Issue : 6
Pages : 775 - 783
DOI : 10.6002/ect.2018.0170
From the 1Student Research Committee, School of Management and Medical
Information Sciences, Shiraz University of Medical Sciences, Shiraz, Iran; the
2Abualisina Transplant Hospital, Shiraz University of Medical Sciences, Shiraz,
Iran; the 3Department of Computer Science and Engineering and IT, School of
Electrical Engineering and Computer, Shiraz University, Shiraz, Iran; and the
4Health Human Resources Research Center, School of Management and Medical
Information Sciences, Shiraz University of Medical Sciences, Shiraz, Iran
Acknowledgements: This research was performed in partial fulfillment of the
requirements for the MSc degree in Medical Informatics, Shiraz University of
Medical Sciences in Shiraz, Iran. This study was supported by the Vice
Chancellor for research affairs of Shiraz University of Medical Sciences
(proposal number of 93-7456). We thank the Research Vice Chancellor of Shiraz
University of Medical Sciences for financially supporting the research, the
Research Consultation Center for statistics consultations, and the Center for
Development of Clinical Research of Nemazee Hospital for editorial assistance
from Dr. Shokrpour. The authors have no conflicts of interest to declare.
Corresponding author: Roxana Sharifian, School of Management and Medical
Information Sciences, Almas Building, Alley 29, Qasrodasht Ave, Shiraz, Iran; or
Ashkan Sami, Department of CSE and IT, School of Electrical Engineering and
Computer Science, Shiraz University, Mollasadra Street, Shiraz, Iran
Phone: +88 6451371989
E-mail: Sharifianr@sums.ac.ir /
sharifianroxana@gmail.com
or Sami@shirazu.ac.ir /
Ashkan.Sami@gmail.com
Table 1. Characteristics of Study Patients (N = 902)
Table 2. Performance Criteria of All Models on Validation Set, Sorted by Area Under the Curve
Table 3. Effective Variables Based on Data Mining Methods
Table 4. Effective Variables of Final Machine-Learning Model
Table 5. Effective Variables Based on Cox Regression
Figure 1. Methodology Representation
Figure 2. Receiver Operating Characteristic Curves of Step 2 Models to Compare Methods
Performance Metrics