thegeomaster
21 days ago
on: The inefficiency of RL, and implications for RLVR ...
You could think of supervised learning as learning against a known ground truth, which pretraining certainly is.
Davidzheng
21 days ago
A large number of breakthroughs in AI are based on turning unsupervised learning into supervised learning (AlphaZero-style MCTS used as a policy improver is also like this). So the confusion is kind of intrinsic.
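To make the point above concrete, here is a minimal sketch of how unlabeled text can be turned into supervised (input, target) pairs, which is roughly what next-token pretraining does. The function name `next_token_pairs`, the `context_size` parameter, and the toy corpus are all made up for illustration; real pipelines use subword tokenizers and batched tensors rather than Python lists.

```python
def next_token_pairs(tokens, context_size):
    """Build (context, target) pairs from an unlabeled token sequence.

    The "label" for each example is simply the next token in the text,
    so the ground truth comes from the data itself (hypothetical helper).
    """
    pairs = []
    for i in range(1, len(tokens)):
        # Use up to `context_size` preceding tokens as the input.
        context = tokens[max(0, i - context_size):i]
        pairs.append((context, tokens[i]))
    return pairs


# Toy corpus: no human-provided labels anywhere.
corpus = "the cat sat on the mat".split()
pairs = next_token_pairs(corpus, context_size=2)

for context, target in pairs:
    print(context, "->", target)
```

Each pair is an ordinary supervised example with a known target, even though nobody labeled the text by hand. That is one way to read both comments at once: the objective is supervised, while the data is unlabeled.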