(Part 2 can be found here)
(Part 1 can be found here)
One thing you never saw in any of the movies of the Terminator franchise (featuring some of the most menacing villains in all of sci-fi) was any of the various models having to stop for a recharge. I can’t really blame James Cameron for that. How cinematically compelling would it have been if the Cyberdyne Systems Model 101 portrayed by Arnold Schwarzenegger had had to spend a significant amount of downtime recharging his battery? Yet a realistic portrayal of such a machine would have shown it spending most of its time recharging. And escaping the Terminator? Keep running, because his battery is going to be dead in very short order.
All of this is another way of saying that AI is a resource hog. It is a voracious consumer of power and compute resources like the world has never seen. The U.S. federal government looks positively judicious with taxpayer funds when compared to the way AI consumes resources. However, what we call AI is bumping up against some hard physical limits, limits which present a Mt. Everest-sized obstacle to scaling.
A Compute Hog:
When a computer runs a program, it executes instructions, and in particular machine-level instructions, most often generated by a compiler that translates high-level language code into something the processor can understand. The programs you run day-to-day on your PC, your laptop, or that computer you carry in your pocket called a “phone” can consume billions of processor cycles, where a cycle is one tick of the processor’s clock (roughly, one step of instruction execution). But those programs don’t even scratch the surface of what modern AI consumes.
Each of the tokens we mentioned in Part 2 places demands on a processor. How much? A prompt to OpenAI’s GPT-4 model (the latest model is now GPT-5) that generates about 100 tokens can consume between 50 and 100 teraflops. “Flops” in this context means floating-point operations, where floating-point is a type of data computer systems work with (basically a number that includes a mantissa and an exponent, digitally represented); note that we’re counting total operations here, not the operations-per-second rate you see quoted in chip specs. Tera means a trillion. Trillion. Also keep in mind that a prompt to an LLM involves two phases – a prefill phase (where the text you entered is broken down into tokens and processed) and a decode phase (where the LLM generates tokens in response to your prompt). So, for a relatively small prompt-and-answer, an LLM can perform between 50 and 100 trillion floating-point operations. Now consider longer conversations with an LLM. These can easily run into the thousands of teraflops or more.
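If you want a rough feel for where numbers like that come from, here is a back-of-envelope sketch in Python. It uses the common rule of thumb that a transformer performs roughly two floating-point operations per active parameter per token processed; the parameter count below is purely an illustrative assumption (OpenAI has not published GPT-4’s internals), not a measured figure.

```python
# Back-of-envelope estimate of total floating-point operations (FLOPs)
# for one LLM prompt-and-answer. Rule of thumb: ~2 FLOPs per active
# parameter per token. The parameter count is an illustrative assumption.

ACTIVE_PARAMS = 280e9      # assumed active parameters per token (placeholder, not a spec)
FLOPS_PER_PARAM = 2        # one multiply-accumulate ~ 2 floating-point operations

def flops_for_reply(prompt_tokens: int, output_tokens: int) -> float:
    """Rough total FLOPs for the prefill (prompt) plus decode (output) phases."""
    total_tokens = prompt_tokens + output_tokens
    return FLOPS_PER_PARAM * ACTIVE_PARAMS * total_tokens

total = flops_for_reply(prompt_tokens=50, output_tokens=100)
print(f"~{total / 1e12:.0f} trillion floating-point operations")   # ~84 for this toy example
```

Under those assumptions, a short 150-token exchange lands right in the tens-of-teraflops range described above, and the total grows linearly as the conversation gets longer.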
Because of the astronomical amount of computing power AI workloads consume, the heavy lifting is done in data centers that have the requisite horsepower. Modern data centers include row upon row of servers, each with a number of GPUs. As an aside, “GPU” stands for graphics processing unit, and while such processors were originally designed for graphics workloads, they are massively parallel and thus particularly well-suited for AI workloads. Some computers that process AI workloads also use more specialized chips called tensor processing units, or TPUs (which, unlike GPUs, were designed specifically for AI workloads). In addition to all the GPUs/TPUs, each server also includes a large amount of memory, the capacity of which is measured in terabytes.
In a sense, we’ve come full circle with computing. Up until the 1970s, we thought of computers as room-sized behemoths, which they were. That was the amount of space required to run the computing workloads of the time. It was the advent of microprocessors and Moore’s Law (which is now deader than Francisco Franco) that started to shrink computers down to something you can put on your desk or even carry in your pocket. But now, with AI workloads, we are back up to gargantuan sizes, with whole data centers that dwarf the large computers of yesteryear. And we’re there because that’s the kind of space required for computing setups that can run compute-hogging AI.
A Power Hog:
It doesn’t take a leap of imagination to realize that that much computing power necessitates the consumption of a lot of electrical power. But how much is a lot? For this part, I turned to AI itself to tell me how much power it might use, and, lacking any sense of modesty, it spit the answer right out. It gave me the assumption of 750 gigaflops per token (750 billion floating-point operations), with about 0.0001 kWh (kilowatt-hours) per token based on typical GPU/TPU energy usage (doesn’t sound like much, so far, does it? You just wait …). The number of flops and the energy consumed scale linearly with token count. Thus, a query that produces 1000 tokens would use, under this scenario, 0.1 kWh. That’s 100 Wh – enough energy to power a 10-watt LED bulb for ten hours straight. That’s for one fairly small conversation (compare that to a human brain, which runs on about 14 watts, or 14 Wh over an entire hour).
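To make that arithmetic explicit, here is the same calculation as a few lines of Python. The 0.0001 kWh-per-token figure is just the assumption quoted above; real numbers vary widely by model, hardware, and how many requests are batched together.

```python
# Energy estimate for a single LLM conversation, using the per-token
# assumption quoted above (0.0001 kWh per token). Illustrative only.

KWH_PER_TOKEN = 0.0001          # assumed energy per token, in kilowatt-hours

def conversation_energy_wh(tokens: int) -> float:
    """Energy in watt-hours for a conversation of the given token count."""
    return tokens * KWH_PER_TOKEN * 1000   # convert kWh to Wh

wh = conversation_energy_wh(1000)
print(f"{wh:.0f} Wh")                              # 100 Wh for a 1000-token exchange
print(f"10-watt LED bulb: {wh / 10:.0f} hours")    # ~10 hours of light
print(f"14-watt human brain: {wh / 14:.1f} hours") # ~7 hours of thinking
```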
It’s not hard to see how some AI conversations use more power than Clark Griswold’s Christmas lights.
And yet, we’re not done. So far, we’ve only talked about the energy consumed by the computers themselves. Thanks to the Laws of Why We Can’t Have Nice Things (sometimes referred to as “the Laws of Thermodynamics”), using that much compute power and thus that much electricity means a lot of excess heat is generated. Something must be done about that heat, otherwise the computers in these data centers won’t run long before all the electronics are fried like a chicken in the kitchen of your local KFC (btw, Original Recipe >> Extra Krispy).
We need to bring in cooling water, and lots of it. That requires pumps to move the water in and then to move it out. Some data centers also utilize large refrigeration systems to circulate cool air. There has been some improvement on this front. Old data centers had about 30-40% energy overhead for cooling, while newer data centers have about 10-20% overhead. Nevertheless, that’s still a lot of energy.
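That overhead compounds the per-query energy. Roughly speaking, total facility draw is the IT load multiplied by one plus the cooling/infrastructure overhead; the overhead percentages below are just the rough figures mentioned above, not measurements of any particular data center.

```python
# How cooling/infrastructure overhead scales facility energy.
# The overhead values are rough illustrations, not measured data.

def facility_energy_wh(it_energy_wh: float, cooling_overhead: float) -> float:
    """Total energy once cooling/infrastructure overhead is added to the IT load."""
    return it_energy_wh * (1 + cooling_overhead)

it_load_wh = 100.0   # Wh for the example 1000-token conversation above
print(f"older facility (~35% overhead): {facility_energy_wh(it_load_wh, 0.35):.0f} Wh")
print(f"newer facility (~15% overhead): {facility_energy_wh(it_load_wh, 0.15):.0f} Wh")
```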
A recent story serves as an illustrative anecdote about AI energy consumption. The story, linked here, concerns a planned AI data center in Wyoming, one that will consume five times as much electricity as all the residents of Wyoming combined. Not merely more energy, but five times more. Not merely a few residents, but every resident of the state.
All that physical space, all that compute power, and all that energy, and yet this AI is still not intelligent. It still can’t think, and it requires multiple orders of magnitude more energy to accomplish many of the same things humans can do. Sure, it’s better suited than humans for computational mathematics, but that’s not thinking, that’s just number crunching. And of course, it took humans to design computers to be good at such things – humans that have, in their own skulls, a brain that can do amazing things running on a mere 14 watts of power (or, over an hour, 14 Wh). And with that 14 Wh, we have consciousness and true intelligence.
The Wall:
Above, I wrote that AI faces a Mt. Everest-sized obstacle to scaling. But more accurately, AI is racing head-on into a wall, one that will kill scaling.
Let’s return to Moore’s Law, which was mentioned above. The idea came from Intel co-founder Gordon Moore, who observed that the number of transistors that could be packed onto a chip doubled roughly every two years (a figure often quoted as every 18 months). And for decades, that held true. It’s because of Moore’s Law that you can carry in your pocket computing power, running off a battery, that dwarfs a room-sized computer of the 1970s. But you can only get so small (sorry, Steve Martin).
When transistor feature sizes were in the thousands, then the hundreds, and even the tens of nanometers, the progress of packing more functionality onto the same chip area marched onward, largely unabated. But on the most advanced chips now – such as the GPUs/TPUs that run AI workloads – the smallest feature sizes are down in the single-digit nanometers. You know what sits just below that scale? Atoms, the fundamental building blocks of all matter. Silicon atoms are a few tenths of a nanometer across, so a feature a few nanometers wide is only a handful of atoms from edge to edge. And you know what that means? It means you have run into yet another wall. You are not going to build transistors smaller than atoms. That is a hard, non-negotiable physical limitation. And that means the end of Moore’s Law.
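To put those dimensions in perspective, here is a quick count of how many silicon lattice cells span a few-nanometer feature. The lattice constant is the textbook value for crystalline silicon; the feature widths are simply illustrative round numbers, not the marketing node names chipmakers use.

```python
# How many silicon lattice cells fit across a transistor feature of a
# given width. Lattice constant is the textbook value for crystalline
# silicon; feature widths are illustrative.

SI_LATTICE_NM = 0.543   # silicon lattice constant, in nanometers

for feature_nm in (10, 5, 3, 2):
    cells = feature_nm / SI_LATTICE_NM
    print(f"{feature_nm} nm feature ~ {cells:.0f} lattice cells wide")
```

A 3-nanometer-wide structure is only about five or six lattice cells across, which is why there is essentially no room left to shrink.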
Furthermore, top clock speeds for chips haven’t meaningfully increased in roughly two decades. The maximum speed at which an execution unit in one of these chips can execute instructions is therefore also up against a hard limit, driven largely by the power density and heat dissipation of the silicon upon which such chips are fabricated.
Now you will still get some denialists saying Moore’s Law is not dead, and they will point to chips that stack transistors vertically, but that’s not packing more transistors into a given area, that’s just using the vertical dimension to create more area. Moore’s Law only works if individual transistors can keep getting smaller, and with the smallest feature sizes bumping up against atomic dimensions, that is no longer possible. Moore’s Law has been dead for at least a decade.
The denialists might also opine that some other technology on the horizon will transcend the limits on transistor sizes, while remaining vague about what those technologies are. Some might cite different materials for chipmaking. But most of these materials have some sort of fatal flaw. Take graphene, for example – the wonder material that is effectively a flat sheet of carbon a single atom thick. Graphene has been used to make transistors in laboratories, and those transistors can switch significantly faster (at least an order of magnitude) while dissipating heat far better than silicon. But there is a huge problem – graphene lacks something known as a bandgap. Without getting into device physics, we’ll simplify things by saying that without a bandgap such a transistor can never fully turn off, making it useless as a switch, and therefore useless as the basis for a digital computer.
Analog computing is another technology championed by some. And while it can be very useful in certain applications (it can almost instantaneously perform the large matrix multiplications that hog compute cycles in the digital domain), it suffers from the limitations common to all analog circuits: greater susceptibility to noise, error accumulation as signals cascade from stage to stage, and a lack of the precision many workloads require. Analog computing circuits are also much larger than the digital circuits in GPUs/TPUs.
Quantum computers are the great hope for some, but we are a long way from a practical quantum computer. For now, they are very error prone, of limited stability, and require cryogenic cooling (meaning hundreds of degrees below zero, and that’s true whether you are talking Fahrenheit or Celsius). There are also questions as to whether they could provide any advantage over the current computing paradigm for many workloads. Most of the promise lies in specialized workloads, but until we get practical, reliable quantum computers, we can do no more than speculate.
So the upshot of all this is that AI as we know it has run headlong into a wall imposed by the physical limits discussed above. But we haven’t talked about the financial limits yet. If you think AI is a compute hog and a power hog, wait until you find out how much of a money hog it is. The U.S. government has nothing on AI when it comes to burning through cash.