It's not that you intrinsically need a human; it's that doing this without human oversight requires being very careful not to make tricky mistakes.
The nature of statistical significance (which underpins everything you've said) is that repeating many experiments reduces the confidence you should have in any single result. Supposing each algorithm is an experiment and the experiments are independent: if you target a significance level of p = 0.05, you can expect to find about 1 spuriously "correlated" feature out of every 20 you test just by chance.
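A quick simulation makes this concrete. The sketch below assumes the textbook fact that under the null hypothesis a test's p-value is uniform on [0, 1]; the trial counts and thresholds are illustrative, not from any real dataset.

```python
import random

random.seed(0)
N_TRIALS = 10_000   # simulated analyses
N_TESTS = 20        # e.g. 20 candidate features, all actually uncorrelated
ALPHA = 0.05

false_positives = 0
trials_with_any_hit = 0
for _ in range(N_TRIALS):
    # Under the null hypothesis, each test's p-value is uniform on [0, 1].
    pvals = [random.random() for _ in range(N_TESTS)]
    hits = sum(p < ALPHA for p in pvals)
    false_positives += hits
    trials_with_any_hit += hits > 0

mean_hits = false_positives / N_TRIALS   # averages ~1.0 false "discovery" per run
fwer = trials_with_any_hit / N_TRIALS    # ~1 - 0.95**20, i.e. roughly 0.64
print(mean_hits, fwer)
```

So not only do you average one spurious hit per 20 tests, there's a ~64% chance each run reports at least one.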
Can you automatically correct for this? Sure. But this is just one possible footgun. Are you confident you're avoiding them all? In theory automation could do an even better job than a human of avoiding the myriad statistical mistakes you could make, but in practice that requires significant upfront effort and expertise during the development process.
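For what "automatically correcting" might look like, here is a minimal sketch of one standard multiple-testing correction, the Holm step-down procedure. The function name and interface are hypothetical; in practice you'd reach for a vetted library implementation rather than rolling your own (which is rather the point).

```python
def holm_bonferroni(pvals, alpha=0.05):
    """Return a list of booleans: which hypotheses to reject.

    Holm's step-down procedure controls the family-wise error rate
    at `alpha` across all tests, with no independence assumption.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the k-th smallest p-value against alpha / (m - k).
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

# With three tests, 0.04 and 0.03 no longer clear their adjusted thresholds:
print(holm_bonferroni([0.001, 0.04, 0.03]))
```

Note how results that looked "significant" in isolation stop being significant once the family of tests is accounted for.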
At a certain point doing this automatically becomes analogous to rolling your own crypto. It's not quite an adversarial problem, but it's quite easy to screw up.
I agree that cross validating would work; that's what I was gesturing to when I was talking about making an assessment of the data and partitioning it. Either the provided sample should be partitioned for cross validation, or it should prompt the user for a second set.
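The partitioning step above can be sketched as a plain k-fold split. This is a minimal illustration in pure Python (real pipelines would use a library routine, and `kfold_indices` is a hypothetical helper name):

```python
import random

def kfold_indices(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation.

    The n sample indices are shuffled once, dealt into k folds, and
    each fold takes one turn as the held-out test set.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(kfold_indices(10, k=5))
```

Every sample appears in exactly one test fold, so the model is always scored on data it never saw during fitting.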
Correct. My point is that both humans and machines face the same issues. At least with a machine you get consistent errors (which don't cost you time), and you can reduce them over time.
With humans you must make sure that the same human with the same skill set, who knows statistics at a master's level, will always be there for your specific data and will actually have the time to run the experiments.
Also, I think that 95% of the potential users/consumers of machine learning are non-consumers, i.e. they do not have ANY access to machine learning tech and thus have to resort to guessing.
So the ethical thing to do is to give them some tool, even if it might not be optimal.