It's interesting to compare this with [1], which shows that for some concept classes, neural networks trained with SGD are much easier to train (i.e., they require less data).

In other words, even though these two types of models are in a way equivalent, one can be much easier to train than the other for certain concepts (no free lunch).

[1] https://arxiv.org/abs/2001.04413