Trading performance for productivity with LLMs
Andrej Karpathy has a brilliant new video. He divides coding into pre-LLM and post-LLM paradigms (with a third, middle category for specialist neural networks like the image classifiers trained on ImageNet; LLMs are reprogrammable neural networks in his ontology).
Over time, post-LLM code will start to eat pre-LLM code. The universal programming language is English. In a sense, the actual code that's output isn't relevant - the architect can just instruct the machine what to build in English; the programming language or framework or design patterns that get used are abstracted away as implementation details.
This reminds me of the move from compiled to interpreted programming languages. Traditionally, code was written in a language like C and then compiled. C gives you fine-grained control over memory allocation, pointer arithmetic, and system calls - but needs to be wielded carefully to avoid segfaults, memory leaks, and buffer overflows. Writing robust C code demands understanding the computer's internals, managing memory manually, and dealing with how machines perform at a low level. Over time, interpreted languages like Ruby and Python appeared, which hid a lot of the gnarly details from programmers - no longer do you need to reallocate an array because you added too many items to it, or free memory from variables that are no longer in scope - and these languages allowed for much larger systems to be built by many more people.
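To make that concrete, here's a minimal Python sketch of what the interpreter quietly handles for you. In C you'd allocate the array yourself, call realloc as it fills up, and free it when you're done; in Python the list over-allocates behind the scenes and the garbage collector reclaims the memory once nothing references it.

    import sys

    items = []
    for i in range(20):
        items.append(i)
        # getsizeof shows the allocation jumping in steps as the interpreter
        # over-allocates the list behind the scenes - the realloc you never wrote.
        print(i, sys.getsizeof(items))

    # No free() either: once nothing references the list, its memory is reclaimed.
    del items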
Ruby and Python are much less performant than C. There's substantial overhead in garbage collection, dynamic typing, and runtime interpretation compared to compiled machine code. But they trade off performance for programmer productivity and happiness, and that's worth a lot - as compute became cheap, these languages became hugely popular. The rate-limiting step wasn't running code but writing it.
There's a similar parallel in computer hardware. CPUs are general-purpose processors that do a lot of things well, and most code that gets written will end up running on them. But for more specialist uses where performance and speed are critical, FPGAs or ASICs are used. FPGAs let the user reconfigure a processor for the task at hand; ASICs are specialised chips optimised for specific operations (one use case is mining Bitcoin, where the economic incentive to turn electricity into hashes more efficiently pushed the mining market towards specialised hardware; CPUs and GPUs can no longer compete).
A similar thing is happening with LLMs. Say you're writing an algorithm to score candidates for a job. You can write some code which parses an applicant's resume, extracts some key data points, and then does some numerical analysis to spit out a final score. Or you could give your scoresheet to an LLM and ask it to do the scoring for you. Both are essentially implemented as 'functions' - they take some input (a resume) and give some output (a score). Whether they run as code or as inference is something of an implementation detail.
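Here's a minimal sketch of the two approaches, assuming a toy scoresheet; the regex parsing and the llm callable are illustrative stand-ins, not a real resume parser or a specific inference API.

    import re

    def score_with_code(resume_text: str) -> int:
        """Hand-written version: parse the resume, extract data points, apply the rules."""
        score = 0
        if "python" in resume_text.lower():
            score += 2  # award points for a required skill
        match = re.search(r"(\d+)\s+years", resume_text.lower())
        years = int(match.group(1)) if match else 0
        score += min(years, 8)  # cap the experience bonus
        return score

    def score_with_llm(resume_text: str, scoresheet: str, llm) -> int:
        """LLM version: hand over the scoresheet and resume, parse the number that comes back.

        `llm` is whatever prompt-in, text-out inference call you have available.
        """
        prompt = (
            "Score this candidate from 0 to 10 using the scoresheet below. "
            "Reply with just the number.\n\n"
            f"Scoresheet:\n{scoresheet}\n\nResume:\n{resume_text}"
        )
        return int(llm(prompt).strip())

Either way, the caller just sees a function from resume to score; everything inside it is the implementation detail.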
Of course, doing this on an LLM is much more computationally expensive than running it as code. For the example given, it's probably just going to be simpler to write the code and run it (or get an LLM to write it for you!). But a fine-tuned LLM is likely to be able to give more nuanced scores, even working with the same scoresheet, much as a human reviewer would probably do a better job than the algorithm working alone. And perhaps inference costs will get so low that there will be no real advantage to writing code to do this job over having the LLM do it for you.