This is another follow-up to the original video about the AI-powered shell. This time, I compare different GPT-style language models on the task of generating bash commands: GPT-3, GPT-J-6B, and OpenAI Codex.
TL;DR: OpenAI's GPT-3 model ("davinci") still does the best, even though Codex was trained on code. One plausible explanation is that the Linux tutorials included in GPT-3's diverse training set are a better match for what's being asked than GitHub code, which is mostly in other programming languages anyway.
GPT-J-6B, at "merely" 6 billion parameters, doesn't do very well at all, but the key thing it has going for it is that it's free to download and run yourself. And people have! Given the GPU memory needed to run it well, though, it's often easier to use a hosted model via an API. One such API is Grand (usegrand.com) - no affiliation, endorsement, or sponsorship. I'm sure there are other services that provide the exact same thing.
One other important point touched on in the video is "prompt engineering" - the work of finding the right initial text that gets the best results from the language model. It's like figuring out the right incantation for a spell: use the wrong words, and you'll get results that are maybe decent but not great. Find the right words, and you might get magic. Unfortunately, there's no reliable way to tell whether you have the best words, and there's hardly any insight into the decisions the AI makes to help guide you in designing better prompts.
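To make the "incantation" idea concrete, here's a minimal sketch of how a few-shot prompt for bash-command generation might be assembled. The example pairs and the exact layout are my own illustrative assumptions, not the prompt used in the video:

```python
# Hypothetical few-shot prompt builder for bash-command generation.
# The example pairs below are illustrative assumptions.
FEW_SHOT_EXAMPLES = [
    ("list all files, including hidden ones", "ls -la"),
    ("count the lines in every .txt file", "wc -l *.txt"),
]

def build_prompt(request: str) -> str:
    """Assemble a prompt that ends right where the model should
    continue with a bash command (after the trailing $)."""
    parts = []
    for description, command in FEW_SHOT_EXAMPLES:
        parts.append(f"# {description}\n$ {command}")
    # The user's request goes last, with a bare $ for the model to complete.
    parts.append(f"# {request}\n$")
    return "\n".join(parts)

print(build_prompt("show disk usage of the current directory"))
```

Changing the example pairs, the comment style, or even the trailing `$` is exactly the kind of knob-twiddling that prompt engineering amounts to.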
For the record, here's the prompt given to Codex, which does not include examples:
# Linux bash one-liner
# command goes here
$
The $ at the end is supposed to suggest the bash prompt, although you'll notice that in some examples Codex turns the $ into $>, which leaves an extraneous > at the beginning of the output.
Download links and instructions for this tool are on the original page.