I want to predict computation time from a set of parameters theta and a sample size vector n. The computation is solving an integer linear program, and I have about 200,000 runs under varying conditions. The algorithm involves some randomisation, so I have included some duplicated runs to estimate pure error.
The problem is that time = f(theta, n) has some unusual features. The relationship involves hot and cold spots where computation time is higher or lower. If theta were the only input, then nearest neighbours or a random forest would probably model it well. The dependence on the sample sizes, however, has some special features: (a) it can be erratic, in that a small increase in sample size can lead to a much longer computation time, which then disappears for the next larger sample size; and (b) since the problem is NP-hard, computation time does explode beyond a certain point.
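For concreteness, here is a minimal sketch of the kind of baseline I had in mind, using the randomForest package; the data frame `d` and the columns `theta1`, `theta2`, `n` and `time_sec` are just placeholders standing in for my real 200,000 runs.

```r
library(randomForest)

## Placeholder stand-in for the real data: two parameters, a sample size,
## and the observed run time in seconds (conditions are repeated at random).
set.seed(1)
d <- data.frame(
  theta1 = runif(1000),
  theta2 = runif(1000),
  n      = sample(seq(100, 5000, by = 100), 1000, replace = TRUE)
)
d$time_sec <- exp(0.001 * d$n + 2 * sin(10 * d$theta1) + rnorm(1000, sd = 0.3))

## Random forest baseline on log(time): fine for interpolating the hot/cold
## spots in theta, but it cannot predict beyond the largest n it has seen.
fit_rf <- randomForest(log(time_sec) ~ theta1 + theta2 + n, data = d, ntree = 500)
fit_rf
```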
I have limited data where the computation time explodes, for obvious reasons. My aim is a prediction model that can tell me (or the user) when the computation time is likely to exceed, say, 30 minutes (and I have very few instances of this in the data).
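In practice the target is really an exceedance indicator, something like the following (again on the toy data frame `d` above; 30 minutes = 1800 seconds), and the positive class is very rare:

```r
## Binary target: did the run exceed 30 minutes?
d$slow <- factor(d$time_sec > 30 * 60,
                 levels = c(FALSE, TRUE), labels = c("fast", "slow"))
table(d$slow)  # heavily imbalanced: very few "slow" runs observed
```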
I have been trying to think of what kind of model would pick up the anomalous patterns but also extrapolate to larger sample sizes, where times will explode. My intuition is that flexible non-parametric methods do not extrapolate well.
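To make that concern concrete, the sort of structure I have been wondering about is semi-parametric: flexible in theta, but with an explicit parametric term in n on the log-time scale, so that predictions keep growing beyond the observed range of n. A sketch with mgcv, purely illustrative since the true functional form in n is unknown:

```r
library(mgcv)

## Smooth (flexible) terms in the parameters, but a parametric linear term in n
## on the log-time scale, i.e. time is assumed to grow exponentially in n.
fit_gam <- gam(log(time_sec) ~ s(theta1) + s(theta2) + n, data = d)
summary(fit_gam)

## Extrapolating to a sample size larger than anything in the training data
predict(fit_gam, newdata = data.frame(theta1 = 0.5, theta2 = 0.5, n = 10000))
```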
Would anyone have a suggestion for what kind of model I might use (in R / RStudio)?