Statistical Inference for Gradient Boosting Regression
Haimo Fang, Kevin Tan, Giles Hooker
This paper establishes a nice connection between gradient-boosted decision trees and kernel ridge regression. Roughly speaking, the relevant kernel for a single tree is

S_ij = 1 if x_i and x_j fall in the same leaf, and 0 otherwise.
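This leaf-co-membership kernel is easy to compute in practice. A minimal sketch (not code from the paper; the data, tree depth, and variable names are illustrative) using scikit-learn's `apply`, which returns the leaf index each sample lands in:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Illustrative sketch: build the single-tree kernel where
# S[i, j] = 1 when x_i and x_j land in the same leaf, else 0.
X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

leaves = tree.apply(X)  # leaf index for each training point
S = (leaves[:, None] == leaves[None, :]).astype(float)
```

By construction S is symmetric, block-structured (one block per leaf), and has ones on the diagonal, which is what makes it a valid kernel to plug into kernel ridge regression.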
The authors use this kernel to derive uncertainty estimates for the tree predictions. Furthermore, they show that their approach works with the fast, approximate Nyström method, which is O(n) in time, as opposed to O(n³) for full kernel ridge regression.
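To make the Nyström speedup concrete, here is a hedged sketch of a generic Nyström approximation to kernel ridge regression with the leaf kernel (this is my own illustration, not the paper's algorithm; the landmark count m and ridge penalty lam are arbitrary choices):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Illustrative Nystrom sketch: pick m landmark points, form the n x m and
# m x m slices of the leaf kernel, and do ridge regression in the induced
# m-dimensional feature space instead of solving the full n x n system.
rng = np.random.default_rng(0)
X, y = make_regression(n_samples=500, n_features=5, noise=1.0, random_state=0)
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
leaves = tree.apply(X)

m, lam = 50, 1.0                                  # illustrative hyperparameters
idx = rng.choice(len(X), size=m, replace=False)   # landmark samples
K_nm = (leaves[:, None] == leaves[idx][None, :]).astype(float)
K_mm = K_nm[idx]

# Low-rank feature map Phi with Phi @ Phi.T approximating the full kernel:
U, s, _ = np.linalg.svd(K_mm)
s = np.maximum(s, 1e-12)                          # guard rank-deficient K_mm
Phi = K_nm @ U / np.sqrt(s)                       # n x m Nystrom features

# Ridge solve is now an m x m system: O(n m^2) rather than O(n^3).
alpha = np.linalg.solve(Phi.T @ Phi + lam * np.eye(m), Phi.T @ y)
y_hat = Phi @ alpha
```

The point of the construction is that for fixed m the cost is linear in n, since only the m x m landmark kernel is ever factorized.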
I like the connection between trees and kernel methods; it shows that trees already contain information about the uncertainty of their predictions.