I also have many years of programming experience and find myself strongly "accelerated" by LLMs when writing code. And if you think about it, it makes sense that many seasoned programmers use LLMs better. LLMs are a helpful tool, but also a hard-to-use one, and in general it's fair to think that better programmers can make better use of an assistant (human or otherwise): they understand its strengths better, spot good and bad output faster, and give better guidance to correct the approach.
Other than that, what I believe correlates most strongly with the ability to use LLMs effectively is language skill: the ability to describe problems very clearly. LLM reply quality changes very significantly with the quality of the prompt. Experienced programmers who can also communicate effectively give the model plenty of design hints and point it at the details that matter, basically escaping many local minima immediately.
I completely agree that communication skills are critical in extracting useful work or insight from LLMs. The analogy with communicating with people is not far-fetched. Communicating successfully with a specific person requires an understanding of their strengths and weaknesses, their tendencies and blind spots. The same is true for communicating with LLMs.
I have actually found that, from a documentation point of view, querying LLMs has made me better at explaining things to people. If, given the documentation for a system or API, a modern LLM can't answer specific questions about how to perform a task, a person using the same documentation will also likely struggle. It's proving to be a good way to test the effectiveness of documentation, for humans and for LLMs.
Communication skills are the key to using LLMs. Think about it: every type of information you want is in them; in fact it is in there multiple times, with multiple levels of seriousness in the treatment of the idea. If you are casual in your request, using casual language, the LLM will give you a casual reply, because that matched your request best. To get a hard, factual answer of the kind an expert in the subject would give, use the formal terms, use the expert's language, and you'll get back a reply that is more likely to be correct, because it sits at the same level of formal treatment as the correct answers.
Actually, I'm afraid not. It won't give us the step-by-step, scalable process that lets humanity as a whole enter an indefinitely long period of world peace, with each of us enjoying life in our own thriving way. That would be great information to broadcast, though.
It is also just as able to produce large piles of completely delusional answers that mimic genuinely sincere statements equally well. Of course, we can receive that kind of misguided answer from humans too. But the amount of output that mere humans can throw out in that form is far more limited.
All that said, it's great to be able to experiment with it, and there are a lot of nice and fun things to do with it. It can be a great additional tool, but it won't be a self-sufficient panacea of an information source.
> Think about it: every type of already solved problem you want information about is in them, in fact it is there multiple times, with multiple levels of seriousness in the treatment of the idea.
Then that was not clear from your comment saying LLMs contain any information you want.
One has to be careful communicating about LLMs, because the world is full of people who actually believe LLMs are generally intelligent super-beings.
I think GP's saying that it must be in your prompt, not in the weights.
If you want an LLM to make a sandwich, you have to tell it that you `want triangular sandwiches of standard serving size made with white bread and egg based filling`, not `it's almost noon and I'm wondering if sandwich for lunch is a good idea`. Fine-tuning partially solves that problem, but they still prefer the former.
Sure. Lately I've found that the "role" part of prompt engineering seems to be the most important. So what I've been doing is telling ChatGPT to play the role of the most educated/wise/knowledgeable/skilled $field $role (advisor, lawyer, researcher, etc.) in the history of the world, then giving it some context for the task before asking for the actual task.
Sometimes asking it to self-reflect on how the prompt itself could be better engineered helps if the initial response isn't quite right.
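For the curious, here is a minimal sketch of what that role-plus-context setup might look like when driven through the API rather than the chat UI. It assumes the OpenAI Python SDK (v1+); the model name, role wording, and task are purely illustrative, not a recommendation.

```python
# Minimal sketch of a role-first prompt, assuming the OpenAI Python SDK (v1+).
# Model name, role wording, and the task itself are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

role = (
    "You are the most educated, wise, and skilled contract lawyer "
    "in the history of the world."
)
context = "Context: a two-person software consultancy reviewing a client NDA."
task = "Task: list the three clauses most likely to need renegotiation, and why."

response = client.chat.completions.create(
    model="gpt-4o",  # any reasonably capable chat model
    messages=[
        {"role": "system", "content": role},  # the "role" part of the prompt
        {"role": "user", "content": f"{context}\n\n{task}"},
    ],
)
print(response.choices[0].message.content)
```

The self-reflection step is then just another user turn appended to the same conversation, e.g. "How could the prompt above be better engineered to get a sharper answer?"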
Hey! Asking because I know you're a fellow vimmer [0]. Have you integrated LLMs into your editor/shell? Or are you largely copy-pasting context between a browser and vim? This context-switching of it all has been a slight hang-up for me in adopting LLMs. Or are you asking more strategic questions where copy-paste is less relevant?
[0] your videos on writing systems software were part of what inspired me to make a committed switch into vim. thank you for those!
I do not remember a single instance when code provided to me by an LLM worked at all. Even if I ask for something small that can be done in 4-5 lines of code, it is always broken.
From one "seasoned" programmer to another: how the hell do you write the prompts to get back correct, working code?
I'd ask things like "which LLM are you using", and "what language or APIs are you asking it to write for".
For the standard answers of "GPT-4 or above", "Claude Sonnet or Haiku", or models of similar power, with well-known languages like Python, JavaScript, Java, or C, and assuming no particularly niche or obscure APIs or project contexts, the failure rate of 4-5-line snippets in my experience is less than 1%.
It's mostly Go, some Python, and I'm not asking for anything niche. I'm asking for basic utility functions that I could implement in 10-20 lines of code. There's something broken every single time, and I spend more time debugging the generated code than it would take to write it out myself.
I'm pretty sure everybody measures "failure rate" differently and grossly exaggerates the success rate. There are a lot of suggestions below about "tweaking", but if I have to "tweak" generated code in any way, that counts as a failure for me. By that measure, the failure rate of generated code is about 99%.
I write the prompt as if I’m writing an email to a subordinate that clearly specifies what the code needs to do.
If what I’m requesting is an improvement to existing code, I paste the whole code if practical, or if not, as much of the code as possible, as context before making the request for additional functionality.
Often these days I add something like “preserve all currently existing functionality.” Weirdly, as the models have gotten smarter, they have also gotten more prone to delete stuff they view as unnecessary to the task at hand.
If what I’m doing is complex (a subjective judgement), I ask it to lay out a plan for the intended code before starting, giving me a chance to give it a thumbs-up, or to clarify its understanding of what I’m asking for if its plan is off base.
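As a concrete illustration (not the parent's exact workflow), a prompt following that pattern might be assembled like this; the file name, the requested feature, and the wording are hypothetical:

```python
# Rough sketch of the "email to a subordinate" prompt style described above.
# The file name and the requested feature are hypothetical examples.
from pathlib import Path

existing_code = Path("billing.go").read_text()  # paste the real code as context

prompt = f"""Here is the current implementation:

{existing_code}

Please add support for prorated refunds when a subscription is cancelled
mid-cycle. Preserve all currently existing functionality.

Before writing any code, lay out a short plan of the intended changes so I can
confirm or correct your understanding of what I'm asking for."""
```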
Step 2: Write out your description of the thing you want to the best of your ability but phrase it as "I would like X, could you please help me better define X by asking me a series of clarifying questions and probing areas of uncertainty."
Step 3: Once both Claude and you are satisfied that X is defined, say "Please go ahead and implement X."
Step 4a: If feature Y is incorrect, go to Step 2 and repeat the process for Y
Step 4b: If there is a bug, describe what happened and ask Claude to fix it.
That's the basics of it, should work most of the time.
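If you drive this through the API instead of the Claude UI, the same loop can be sketched roughly as below. This assumes the Anthropic Python SDK; the model name, the `ask` helper, and the example feature are placeholders rather than a prescribed setup.

```python
# Loose sketch of the clarify-then-implement loop, assuming the Anthropic
# Python SDK. Model name, the ask() helper, and the feature are placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
history = []

def ask(text: str) -> str:
    """Send one user turn, keep the running conversation, return the reply."""
    history.append({"role": "user", "content": text})
    reply = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=2048,
        messages=history,
    )
    answer = reply.content[0].text
    history.append({"role": "assistant", "content": answer})
    return answer

# Step 2: define X by inviting clarifying questions.
print(ask(
    "I would like a CLI tool that deduplicates photos by content. "
    "Could you please help me better define it by asking me a series of "
    "clarifying questions and probing areas of uncertainty?"
))

# ...answer those questions with further ask() calls, then:

# Step 3: once both sides are satisfied with the definition, implement.
print(ask("Please go ahead and implement the tool as we have defined it."))
```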
dc: not a seasoned dev, with <b> and <h1> tags on "not".
They can't think for you; all the intelligent thinking you still have to do yourself.
First, give them a high-level requirement that can be clarified into indented bullet points that look like code, or give them such a list directly. Don't give them the half-open questions usually favored by talented and autonomous individuals.
Then let them decompress those pseudocode bullet points further into code. They'll give you back code that resembles a digitized paper-test answer. Fix the obvious errors and you get B-grade compiling code.
They can't do unconventional structures, Quake-style performance-optimized code, realtime robotics, cooperative multithreading, etc.; just good old, it-takes-what-it-takes GUI app, API, and data manipulation code.
For those use cases, with these points in mind, it's a lot faster to let the LLM generate tokens than to type `int this_mandatory_function_does_obvious (obvious *obvious){ ...` manually on a keyboard. That is arguably a productivity boost, in the sense that the user of the LLM is effectively typing faster.
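To make that concrete, here is the kind of indented, pseudocode-like bullet prompt being described, together with roughly the shape of code a model tends to hand back. This is a hypothetical example, written in Python for brevity.

```python
# Illustrative only: an indented bullet prompt that already "looks like code",
# followed by the kind of answer a model typically decompresses it into.
prompt = """Write a Python function that:
- loads records from a CSV file at `path`
  - skip the header row
  - skip rows with an empty `email` column
- returns a dict mapping email -> full row
  - on duplicate emails, keep the first occurrence
"""

# A typical reply mirrors the bullets almost one to one:
import csv

def load_records(path: str) -> dict[str, list[str]]:
    records: dict[str, list[str]] = {}
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)               # skip the header row
        email_idx = header.index("email")
        for row in reader:
            email = row[email_idx].strip()
            if not email:                   # skip rows with an empty email
                continue
            records.setdefault(email, row)  # keep the first occurrence
    return records
```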
There's probably a lot that experience contributes to the interaction as well, for example: knowing when the LLM has gone too far, focusing on what's important vs. irrelevant to the task, modularising and refactoring code, testing, etc.
That's really interesting. What are the most important things you've learned to do with the LLMs to get better results? What do your problem descriptions look like? Are you going back and forth many times, or crafting an especially-high-quality initial prompt?