Hacker News

So you're buying the idea that it looked at a bunch of code snippets embedded in various pages, managed to build a sub-model for PHP (separate from all the other languages it must have encountered), and managed to generate a long, nearly syntactically correct program uninterrupted by English text?

And while it makes tons of obvious mistakes in English (which is a much more flexible and forgiving language), its PHP is somehow nearly syntactically perfect?


The sample outputs in the GPT-2 GitHub repository contain a lot of code:

https://raw.githubusercontent.com/openai/gpt-2/master/gpt-2-...

To me, this doesn't seem like an argument in favor of this model "understanding" English (or C, or PHP). It seems more like an indication that it memorizes way more information than the paper implies and then does clever word substitution.



Yes, I do think it learned a model of PHP and JavaScript syntax. 40 GB of text is a lot of data, and PHP syntax is far simpler than English grammar, which the model also learns quite well.
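To see why rigid syntax is easier to pick up than natural language, here's a toy illustration (not GPT-2, and far cruder than a transformer): a character-level Markov chain trained on a few made-up PHP snippets. Because PHP's surface patterns are so repetitive, even this tiny model reproduces locally well-formed code. All snippet contents and parameter values here are invented for the sketch.

```python
import random
from collections import defaultdict

# Invented training text: a few lines of repetitive PHP-style code.
CORPUS = (
    "<?php\n"
    "$count = 0;\n"
    "$total = 0;\n"
    "foreach ($items as $item) {\n"
    "    $count = $count + 1;\n"
    "    $total = $total + $item;\n"
    "}\n"
    "echo $total;\n"
)

ORDER = 4  # context length in characters (arbitrary choice)

def train(text, order):
    """Map each length-`order` context to the characters that follow it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model

def generate(model, seed, length, rng):
    """Extend `seed` one character at a time by sampling from the model."""
    out = seed
    for _ in range(length):
        choices = model.get(out[-ORDER:])
        if not choices:
            break
        out += rng.choice(choices)
    return out

model = train(CORPUS, ORDER)
sample = generate(model, "<?php\n$", 120, random.Random(0))
print(sample)
```

The output stays locally PHP-shaped (sigils, semicolons, assignments) even though the model has no notion of meaning; it only memorized short character contexts. That's the intuition behind the claim above: a model with vastly more capacity and context, trained on 40 GB of text, has an even easier time with syntax this regular.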

See also the example in the paper of accidentally learning to translate into French even though they tried to remove French pages from the corpus.



