Recursive partitioning is a statistical method for multivariable analysis.[1] It creates a decision tree that aims to classify members of a population correctly by splitting the population into sub-populations based on several dichotomous independent variables. The process is termed recursive because each sub-population may in turn be split again, and this continues until a stopping criterion is reached.
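The splitting process described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular published method: it assumes every independent variable is a True/False finding, uses Gini impurity to choose each split, and stops when a node is pure, has fewer than `min_size` members, has no variables left, or cannot be improved by any split. The names `gini`, `split`, `build_tree`, `classify`, and the example findings `fever` and `cough` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split(rows, labels, feature):
    """Partition (row, label) pairs by the value of one dichotomous feature."""
    yes = [(r, l) for r, l in zip(rows, labels) if r[feature]]
    no = [(r, l) for r, l in zip(rows, labels) if not r[feature]]
    return yes, no


def build_tree(rows, labels, features, min_size=2):
    """Recursively partition a non-empty sample into a nested-dict tree."""
    majority = Counter(labels).most_common(1)[0][0]
    parent_impurity = gini(labels)
    # Stopping criteria: pure node, too few members, or no variables left.
    if parent_impurity == 0.0 or len(rows) < min_size or not features:
        return {"leaf": majority}

    best = None
    for feature in features:
        yes, no = split(rows, labels, feature)
        if not yes or not no:
            continue  # this variable does not separate the node
        impurity = (len(yes) * gini([l for _, l in yes])
                    + len(no) * gini([l for _, l in no])) / len(rows)
        if best is None or impurity < best[0]:
            best = (impurity, feature, yes, no)

    # Also stop when no candidate split reduces impurity.
    if best is None or best[0] >= parent_impurity:
        return {"leaf": majority}

    _, feature, yes, no = best
    remaining = [f for f in features if f != feature]
    return {
        "feature": feature,
        "yes": build_tree([r for r, _ in yes], [l for _, l in yes], remaining, min_size),
        "no": build_tree([r for r, _ in no], [l for _, l in no], remaining, min_size),
    }


def classify(tree, row):
    """Follow the splits from the root until a leaf is reached."""
    while "leaf" not in tree:
        tree = tree["yes"] if row[tree["feature"]] else tree["no"]
    return tree["leaf"]


# Hypothetical toy data: four patients, two dichotomous findings.
rows = [
    {"fever": True, "cough": True},
    {"fever": True, "cough": False},
    {"fever": False, "cough": True},
    {"fever": False, "cough": False},
]
labels = ["sick", "sick", "well", "well"]
tree = build_tree(rows, labels, ["fever", "cough"])
```

On this toy sample the tree splits once, on `fever`, because that single split already yields two pure sub-populations. Real implementations add further stopping and pruning rules.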
Recursive partitioning methods have been developed since the 1980s. Well-known methods include Ross Quinlan's ID3 algorithm and its successors, C4.5 and C5.0, as well as Classification and Regression Trees (CART). Ensemble learning methods such as Random Forests help to overcome a common criticism of these methods, their vulnerability to overfitting the data, by combining the output of many models; a random forest, for example, aggregates many trees, each grown on a resampled version of the data.
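As a rough illustration of how an ensemble can combine outputs, the sketch below shows two ingredients commonly used in bagging-style ensembles: drawing a bootstrap sample for each model and taking a majority vote over the models' predictions. It is a simplified assumption-laden sketch, not the full Random Forests procedure (which also randomizes the variables considered at each split). The function names and the stand-in classifiers are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, labels, rng):
    """Draw len(rows) cases with replacement, keeping rows and labels aligned."""
    indices = [rng.randrange(len(rows)) for _ in rows]
    return [rows[i] for i in indices], [labels[i] for i in indices]


def majority_vote(classifiers, row):
    """Return the class predicted by the largest number of classifiers."""
    votes = Counter(classifier(row) for classifier in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for three fitted trees: each maps a row to a class.
stand_in_trees = [
    lambda row: "sick" if row["fever"] else "well",
    lambda row: "sick" if row["cough"] else "well",
    lambda row: "sick" if row["fever"] or row["cough"] else "well",
]
prediction = majority_vote(stand_in_trees, {"fever": True, "cough": False})

# Each model in the ensemble would be fitted to its own resampled data set.
rng = random.Random(0)
sample_rows, sample_labels = bootstrap_sample(
    [{"fever": True}, {"fever": False}, {"fever": True}],
    ["sick", "well", "sick"],
    rng,
)
```

Because each model sees a slightly different sample, the errors of individual trees tend to differ, and the vote smooths them out.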
This article focuses on recursive partitioning for medical diagnostic tests, but the technique has far wider applications. See decision tree.
Compared to regression analysis, which creates a formula that health care providers can use to calculate the probability that a patient has a disease, recursive partitioning creates a rule such as 'If a patient has finding x, y, or z, they probably have disease q'.
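The contrast can be made concrete with a small sketch. It assumes the regression model is a logistic regression, which is one common choice and not stated in the text, and the intercept and coefficients are made-up numbers for illustration only. The findings x, y, z and disease q are the placeholders from the rule above.

```python
import math


def regression_probability(findings, intercept, coefficients):
    """Regression style: a weighted sum of findings passed through a logistic function."""
    score = intercept + sum(
        coefficients[name] * (1.0 if present else 0.0)
        for name, present in findings.items()
    )
    return 1.0 / (1.0 + math.exp(-score))


def partition_rule(findings):
    """Recursive-partitioning style: a rule that needs no arithmetic to apply."""
    if findings["x"] or findings["y"] or findings["z"]:
        return "probably has disease q"
    return "probably does not have disease q"


patient = {"x": False, "y": True, "z": False}

# Hypothetical coefficients, chosen only to make the example run.
probability = regression_probability(
    patient, intercept=-2.0, coefficients={"x": 1.5, "y": 2.0, "z": 0.5}
)
verdict = partition_rule(patient)
```

The regression output is a number that must be calculated and then interpreted, whereas the rule yields a verdict that can be read directly from the findings.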
A variation is 'Cox linear recursive partitioning'.[2]
Advantages and disadvantages
Compared to other multivariable methods, recursive partitioning has advantages and disadvantages.
Advantages are:
Generates clinically more intuitive models that do not require the user to perform calculations.[3]
Allows different types of misclassification to be weighted differently, so that the resulting decision rule favors either sensitivity or specificity.[2]
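One way to see how weighting misclassifications shifts a rule toward sensitivity is to label each terminal node with the class that minimizes total misclassification cost. The sketch below is a simplified assumption: it applies costs only when labeling the leaves of an already-built tree, whereas tree-building methods may also use costs when choosing splits. The leaf counts and cost values are invented for illustration.

```python
def leaf_label(n_diseased, n_healthy, cost_false_negative=1.0, cost_false_positive=1.0):
    """Label a terminal node with the class that minimizes total misclassification cost."""
    cost_if_negative = n_diseased * cost_false_negative  # diseased patients missed
    cost_if_positive = n_healthy * cost_false_positive   # healthy patients flagged
    return "positive" if cost_if_positive <= cost_if_negative else "negative"


def sensitivity_specificity(leaves, **costs):
    """Compute sensitivity and specificity of the rule implied by the leaf labels."""
    tp = fn = tn = fp = 0
    for n_diseased, n_healthy in leaves:
        if leaf_label(n_diseased, n_healthy, **costs) == "positive":
            tp += n_diseased
            fp += n_healthy
        else:
            fn += n_diseased
            tn += n_healthy
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical terminal nodes as (diseased, healthy) patient counts.
leaves = [(30, 5), (8, 20), (2, 35)]

equal_costs = sensitivity_specificity(leaves)
# Treat a missed diagnosis as five times as costly as a false alarm.
sensitive_rule = sensitivity_specificity(leaves, cost_false_negative=5.0)
```

With equal costs only the first node is labeled positive; raising the cost of a false negative also turns the second node positive, which raises sensitivity and lowers specificity.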
Recursive partitioning has been used in research on diagnostic tests.[6][7][8][9][10][11] For example, Goldman used recursive partitioning to prioritize sensitivity in the diagnosis of myocardial infarction among patients with chest pain in the emergency room.[11]
Breiman, Leo (1984). Classification and Regression Trees. Boca Raton: Chapman & Hall/CRC. ISBN 978-0-412-04841-8.
Cook EF, Goldman L (1984). "Empiric comparison of multivariate analytic techniques: advantages and disadvantages of recursive partitioning analysis". Journal of Chronic Diseases. 37 (9–10): 721–31. doi:10.1016/0021-9681(84)90041-9. PMID 6501544.
James KE, White RF, Kraemer HC (2005). "Repeated split sample validation to assess logistic regression and recursive partitioning: an application to the prediction of cognitive impairment". Statistics in Medicine. 24 (19): 3019–35. doi:10.1002/sim.2154. PMID 16149128.
Kattan MW, Hess KR, Beck JR (1998). "Experiments to determine whether recursive partitioning (CART) or an artificial neural network overcomes theoretical limitations of Cox proportional hazards regression". Comput. Biomed. Res. 31 (5): 363–73. doi:10.1006/cbmr.1998.1488. PMID 9790741.
Lee JW, Um SH, Lee JB, Mun J, Cho H (2006). "Scoring and staging systems using cox linear regression modeling and recursive partitioning". Methods of Information in Medicine. 45 (1): 37–43. doi:10.1055/s-0038-1634034. PMID 16482368.
Edworthy SM, Zatarain E, McShane DJ, Bloch DA (1988). "Analysis of the 1982 ARA lupus criteria data set by recursive partitioning methodology: new insights into the relative merit of individual criteria". J. Rheumatol. 15 (10): 1493–8. PMID 3060613.
Stiell IG, Greenberg GH, Wells GA, et al. (1996). "Prospective validation of a decision rule for the use of radiography in acute knee injuries". JAMA. 275 (8): 611–5. doi:10.1001/jama.275.8.611. PMID 8594242.
Goldman L, Weinberg M, Weisberg M, et al. (1982). "A computer-derived protocol to aid in the diagnosis of emergency room patients with acute chest pain". N. Engl. J. Med. 307 (10): 588–96. doi:10.1056/NEJM198209023071004. PMID 7110205.