The demand for intelligence is virtually infinite

tl;dr:

- **The strong claim is the defensible one.** Demand for intelligence behaves as highly elastic and nowhere near satiation: unit prices for a fixed capability fall **~10–50x/year**, yet aggregate compute spend *explodes* — falling price is being more than offset by rising quantity. That's the empirical signature of near-infinite demand.[^1][^2]
- **But "infinite demand" is three claims people blur.** (1) The unit price of a fixed capability collapses — true. (2) A specific H100's rental price rises — *cyclical*, not secular ([[H100 rental prices ($ per GPU-hour)|the 2026 shortage is real]], but Hopper still gets obsoleted by Blackwell). (3) Aggregate demand for intelligence is ~unbounded — true, and the load-bearing one.
- **Infinite demand does not rescue owning the GPU.** The chip is reproducible (price-performance improves ~30–40%/yr[^3]). Demand routes past it to the *scarce inputs*: power, leading-edge fabs, CoWoS packaging, HBM — and ultimately energy and matter. Infinite demand for intelligence = infinite demand for the **scarce substrate**, not the melting ice cube. So this *agrees* with [[What's worth owning in the age of AGI? - scarce layers, not GPUs|Should you diversify into real estate in the age of AGI?]]'s reproducible-vs-scarce axis; it doesn't refute it.
- **The uncomfortable counter:** Baumol / weakest-link economics says cheap intelligence becomes a *shrinking* share of GDP. "Infinite demand" in quantity is fully compatible with compute being a *bad investment*.[^4]
- **The cosmic version** (robots → uploads → simulated worlds → tiling the universe with happy minds) is where demand most plausibly becomes physically unbounded — but it's also where "the price of compute" stops being well-defined, because compute becomes capital, labor, and consumption all at once. Nobody has the price theory for that economy yet.

## Three claims people blur together

When you say "demand for intelligence is infinite, so GPU prices keep rising," you're stacking three separate claims. Pulling them apart is the whole game:

1. **The unit price of a fixed capability falls.** GPT-3-level quality cost **$60 per million tokens** in 2021; an equivalent model hit **$0.06** by late 2024 — ~1000x in three years, ~10x/year.[^1] Epoch finds the decline is benchmark-dependent and *accelerating*: 9x–900x/year, median ~50x, ~200x excluding pre-2024 data.[^2] **Strongly true.**
2. **A specific chip's rental price rises.** [[H100 rental prices ($ per GPU-hour)|H100 rentals]] V-shaped: ~$8/hr (2023) → $1.70 (Oct 2025) → $2.35 (Mar 2026). The recent +40% is a real shortage — but it's *cyclical capacity lag*, and the chip underneath is being obsoleted. **Cyclical, not secular.**
3. **Aggregate demand for intelligence is ~unbounded.** Total tokens, total compute, total spend — all climbing faster than price falls. **True, and the real claim.**

The mistake is letting (3), which is right, smuggle in (2), which isn't.

## The strong claim is the defensible one

No serious economist says demand is *literally* infinite — demand curves slope down. The defensible version: intelligence is a **general-purpose input** (in the class of electricity, the internet[^5]) whose demand is *currently highly elastic and far from satiation*.

The clean evidence is **Jevons' paradox**: when DeepSeek crashed inference costs in Jan 2025, Satya Nadella's response was "Jevons paradox strikes again... use will skyrocket, turning it into a commodity we just can't get enough of."[^6] Efficiency *raises* total consumption.

The honest caveat — and where I push back on the AI-Twitter version: Jevons is a **conditional, not a law**. It holds only when demand is elastic enough that the rebound *exceeds* 100% ("backfire"); in most markets rebound is partial.[^7] So invoking it for AI is a *bet that intelligence sits in the elastic regime*.
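
The "conditional, not a law" point can be made concrete with a toy demand curve. This is a minimal sketch assuming constant-elasticity demand, `Q = k * P**(-elasticity)`, which is an illustrative assumption and not something the cited sources estimate for AI. The function name and the three elasticity values are hypothetical. Under this assumption, total spend after a price cut rises only when the elasticity exceeds 1, and if the price cut comes purely from efficiency, the same ratio describes how much underlying compute gets consumed.

```python
def spend_after_price_cut(price_drop_factor: float, elasticity: float) -> float:
    """Total spend relative to baseline after the price falls by price_drop_factor.

    Assumes constant-elasticity demand, Q = k * P**(-elasticity). This is an
    illustrative assumption, not a measured demand curve for intelligence.
    Baseline price and quantity are both normalized to 1.
    """
    new_price = 1.0 / price_drop_factor
    new_quantity = new_price ** (-elasticity)
    return new_price * new_quantity


# Hypothetical elasticities: inelastic, unit-elastic, elastic.
for eps in (0.5, 1.0, 1.5):
    ratio = spend_after_price_cut(10.0, eps)
    print(f"elasticity={eps}: spend is {ratio:.2f}x baseline after a 10x price cut")
```

Below an elasticity of 1 the rebound is partial and spend shrinks; above 1 it "backfires" and spend grows. The claim that AI sits in the second regime is the empirical bet.
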
So far the bet is winning (order-of-magnitude price drops met with more-than-order-of-magnitude usage growth), which is the best evidence we have that demand really is ~unbounded. But it's an empirical bet, not a theorem.

## Why infinite demand doesn't rescue the GPU

Here's the part that cuts against the instinct to *buy chips*. Price is set by supply *and* demand. The supply side of compute is reproducible and deflationary: FLOP-per-dollar doubles roughly every 2.5 years (~30–40%/yr improvement[^3]), and each generation obsoletes the last. So even with demand screaming, *a given H100* is a wasting asset.

What stays scarce is one layer up: **power, leading-edge fabs (TSMC is effectively a frontier monopoly), CoWoS advanced packaging (sold out through 2026), HBM memory.**[^8] Infinite demand for intelligence doesn't bid up the reproducible chip for long — it bids up the **non-reproducible inputs**. Which is exactly the *reproducible-vs-scarce* axis from [[What's worth owning in the age of AGI? - scarce layers, not GPUs|Should you diversify into real estate in the age of AGI?]].

So my disagreement with that note is narrower than it first looks: I think it *understates* how large the intelligence market gets, but its investment conclusion — own the scarce substrate, not the GPU — survives my objection intact. Infinite demand is the *case for owning power and fabs*, not silicon that depreciates in 24 months.

## The uncomfortable counter: cheap things become a small share of GDP

The strongest objection isn't to claim (2) — it's to the *investment relevance* of claim (3). **Baumol / weakest-link economics** (Chad Jones' O-ring model) says aggregate output is capped by whatever is *hardest to automate*, and that cheap, abundant inputs become a *shrinking* share of spending. Moore's Law made compute absurdly cheap — and IT's income share *fell*.
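
The falling-share mechanism can be sketched with a two-input CES aggregate. This is a textbook form assumed here for illustration, not the specific model in Jones' papers. With equal weights, the cost-minimizing expenditure share of compute is `p_c**(1 - sigma) / (p_c**(1 - sigma) + p_o**(1 - sigma))`, where `sigma` is the elasticity of substitution between compute and the other input. The function name, the prices, and the `sigma` values are all hypothetical.

```python
def compute_share(compute_price: float, other_price: float, sigma: float) -> float:
    """Expenditure share of compute under an equal-weight two-input CES aggregate.

    sigma is the elasticity of substitution (assumed, not estimated).
    sigma < 1: inputs are complements, so the cheaper input's share shrinks.
    sigma > 1: inputs are substitutes, so the cheaper input's share grows.
    """
    a = compute_price ** (1.0 - sigma)
    b = other_price ** (1.0 - sigma)
    return a / (a + b)


# Hold the other input's price at 1 and let compute get 10x, then 1000x cheaper.
for sigma in (0.5, 1.0, 2.0):
    shares = [compute_share(p, 1.0, sigma) for p in (1.0, 0.1, 0.001)]
    print(f"sigma={sigma}: shares = {[round(s, 3) for s in shares]}")
```

With `sigma` below 1, the share of compute falls as it gets cheaper, which is the Baumol case; with `sigma` above 1, the share rises toward the whole budget.
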
Nordhaus tested the economy for "singularity" signatures and found it passed 2 of 7, implying any singularity is 100+ years away.[^4][^9]

The disquieting implication: demand for intelligence can be infinite in *quantity* while compute becomes a *smaller* slice of the economy and a *losing* asset, because the value migrates to whatever stays scarce and essential. "Infinite demand" ≠ "good investment."

## The bull mechanism: self-replicating compute-capital

The reason this might *not* play out as ordinary Baumol stagnation is the one genuinely new thing AI does to growth theory: **compute can produce more compute.** Capital becomes accumulable the way only labor used to be. Trammell & Korinek: fully automating production "so that machines can self-replicate" breaks the Kaldor facts and can raise the growth rate dramatically.[^10] Davidson's compute-centric takeoff and Epoch's GATE model put the engine explicitly in *chip production* — GATE implies an "optimal" 2025 AI investment of ~**$25 trillion**.[^11][^12] Christiano notes growth was *hyperbolic* (super-exponential) until the demographic transition capped population ~1960; remove the population cap with copyable AI workers and growth can go hyperbolic again.[^13]

This is the same self-replicating-capital idea as [[AI creates deflationary abundance and inflationary capital demand simultaneously]] — and the whole debate reduces to **one parameter**: the elasticity of substitution between cheap intelligence and whatever stays scarce. High → explosive growth, compute approaches the whole economy. Low → Baumol bottleneck, compute's share falls. Reasonable economists disagree on that number.

## The cosmic limit: where demand really is unbounded — and where price breaks

Your ladder is the right intuition for *why* (3) might have no ceiling, and the compute intensity climbs at every rung:

- **Text** — token-cheap, episodic (a query only runs when prompted).
- **World models for robots & cars** — continuous inference, 24/7.
NVIDIA's whole "physical AI" pitch is that embodied agents "burn through even more infrastructure" than chatbots; a Tesla runs ~144 TOPS *per car, every frame*, versus an intermittent chat.[^14] This is a real step-change, even if Huang's "100x / 10,000x" figures are marketing.
- **An economy run by AI and robots** — compute as the substrate of *production itself*.
- **Uploads** — Robin Hanson's *Age of Em*: copyable minds drive wages to the cost of the hardware that runs them, so **the price of compute literally *is* the subsistence wage**, and the population bottleneck vanishes (the economy could double monthly).[^15]
- **Simulated worlds, then maximizing positive experience** — the utilitarian terminus where compute *is* conscious wellbeing.

The physics permits a staggering amount: Bostrom's "astronomical waste" (~10^38 potential lives lost per century of delay), bounded by Lloyd's ultimate limits (~10^51 ops/sec/kg) and realized as Jupiter/Matrioshka brains — ultimately "hedonium" tiling the accessible universe.[^16] At that limit, demand for computation is bounded only by physics — effectively infinite relative to today.

**But two honest caveats, and they're the same two as before:**

1. **The binding scarcity becomes energy and matter** — the cosmic endowment itself. Even when the universe is GPUs, the GPU isn't the scarce thing; the *negentropy* running it is. The investment corollary still points at the substrate.
2. **The numeraire breaks.** When compute is simultaneously the capital, the labor, *and* the consumption good, "the price of compute" stops being well-defined — you can't price compute *in* compute. Karnofsky's "This Can't Go On" shows a 2%/yr economy needs more than one present-world-economy *per atom* of the galaxy within ~8,200 years, so *something* gives.[^13] Nobody has written the price theory for a compute-denominated economy. That's a real open problem, not a gap in my research.

## So what's the headline?

Keep it.
**"The demand for intelligence is virtually infinite" is true and load-bearing** — it's the half of your instinct that the [[What's worth owning in the age of AGI? - scarce layers, not GPUs|GPU skeptic note]] underweights. But the corollary you were reaching for — *"so GPU prices rise, so own GPUs"* — doesn't follow. Infinite demand prices the **scarce substrate** (power, fabs, energy, matter), routes *around* the reproducible chip, and is even compatible (via Baumol) with compute being a falling share of GDP.

The cleanest one-liner: **demand for intelligence may be infinite, but it accrues to whatever can't be copied.**

---

*Footnotes*

[^1]: Guido Appenzeller (a16z), ["Welcome to LLMflation"](https://a16z.com/llmflation-llm-inference-cost/) (Nov 2024): "for an LLM of equivalent performance, the cost is decreasing by 10x every year." GPT-3 ($60/Mtok, 2021) → Llama 3.2 3B ($0.06/Mtok, 2024) at the same benchmark.
[^2]: Epoch AI, ["LLM inference prices have fallen rapidly but unequally across tasks"](https://epoch.ai/data-insights/llm-inference-price-trends): per-fixed-capability decline of 9x–900x/yr, median ~50x; ~200x excluding pre-Jan-2024 data (accelerating). GPT-4-level science ~40x/yr.
[^3]: Epoch AI, ["Trends in GPU price-performance"](https://epoch.ai/blog/trends-in-gpu-price-performance): FLOP/s-per-dollar doubles ~every 2.5 years (~30–40%/yr for ML accelerators).
[^4]: Chad Jones' weakest-link / O-ring model: output is capped by the hardest-to-automate task, and cheap inputs *fall* as a budget share (as IT did despite Moore's Law). ["A.I. and Our Economic Future"](https://www.nber.org/system/files/working_papers/w34779/w34779.pdf) (NBER w34779, 2026); Aghion, Jones & Jones, ["Artificial Intelligence and Economic Growth"](https://www.nber.org/papers/w23928) (NBER w23928).
[^5]: AI as a general-purpose technology requiring complementary investment: Brynjolfsson, Rock & Syverson, ["Artificial Intelligence and the Modern Productivity Paradox"](https://www.nber.org/system/files/working_papers/w24001/w24001.pdf) (NBER w24001).
[^6]: [Satya Nadella, X, Jan 27 2025](https://x.com/satyanadella/status/1883753899255046301), responding to the DeepSeek R1 cost shock. Origin: W.S. Jevons, *The Coal Question* (1865) — more efficient steam engines *raised* total coal demand.
[^7]: The paradox requires rebound >100% ("backfire," the Khazzoom–Brookes postulate, formalized by Harry Saunders); empirical rebound is usually partial. On loose invocation in AI specifically: Varoquaux et al., ["From Efficiency Gains to Rebound Effects"](https://arxiv.org/abs/2501.16548) (arXiv 2501.16548, 2025).
[^8]: TSMC: CoWoS capacity "sold out through 2025 and into 2026," NVIDIA holding 70%+ of CoWoS-L. Binding constraints are advanced packaging, HBM, leading-edge (2–3nm) fab capacity, and ultimately power. [Introl on CoWoS](https://introl.com/blog/cowos-advanced-packaging-chip-architecture-data-center-2025).
[^9]: William Nordhaus, ["Are We Approaching an Economic Singularity?"](https://www.nber.org/papers/w21547) (NBER w21547; *AEJ: Macro* 2021): the economy passed 2 of 7 "singularity" tests, implying 100+ years — "the Singularity is not near."
[^10]: Philip Trammell & Anton Korinek, ["Economic Growth under Transformative AI"](https://www.nber.org/papers/w31815) (NBER w31815, 2023): fully automating production "so that machines can self-replicate" breaks the Kaldor facts; outcomes hinge on the elasticity of substitution and resource bottlenecks.
[^11]: Tom Davidson / Open Philanthropy, [compute-centric takeoff framework](https://www.governance.ai/post/tom-davidson-compute-centric-framework-takeoff-speeds) (2023): AI capability as a function of training compute + algorithmic progress, compounding once AI automates AI R&D.
[^12]: Erdil & Besiroglu (Epoch), ["Explosive growth from AI automation: a review"](https://arxiv.org/abs/2309.11690) (arXiv 2309.11690); Epoch's [GATE model](https://epoch.ai/gradient-updates/ai-and-explosive-growth-redux), where chip production is the growth engine and implied optimal 2025 AI investment is ~$25T.
[^13]: Paul Christiano, ["Hyperbolic growth"](https://sideways-view.com/2017/10/04/hyperbolic-growth/) (2017) — growth was super-exponential until population capped it ~1960; copyable AI labor removes the cap. Holden Karnofsky, ["This Can't Go On"](https://www.cold-takes.com/this-cant-go-on/) (2021) — a 2%/yr economy needs >1 present-world-economy per galactic atom within ~8,200 years.
[^14]: Jensen Huang's "physical AI" framing (NVIDIA GTC 2025 / CES 2026) — embodied agents and reasoning need "easily 100 times more" compute; treat the 100x/10,000x figures as NVIDIA marketing. Hard anchor: Tesla's in-car inference runs ~144 TOPS continuously per vehicle. [Fortune, Jan 2026](https://fortune.com/2026/01/06/nvidia-jensen-huang-chatgpt-moment-for-robotics/).
[^15]: Robin Hanson, *The Age of Em* (Oxford, 2016): copyable emulated minds drive wages to the marginal cost of hardware, so compute cost *is* the subsistence wage; population bottleneck removed → monthly economic doublings. [ageofem.com](https://ageofem.com/).
[^16]: Nick Bostrom, ["Astronomical Waste"](https://nickbostrom.com/papers/astronomical-waste/) (*Utilitas* 2003) — ~10^38 potential lives lost per century of delay. Physical bound: Seth Lloyd, ["Ultimate physical limits to computation"](https://www.nature.com/articles/35023282) (*Nature* 2000) — ~10^51 ops/sec/kg. Engineering: Anders Sandberg on Jupiter brains; the utilitarian terminus is "hedonium" (Carl Shulman) tiling the accessible universe.

*Research*

*Is the elasticity of substitution between cheap intelligence and the scarce inputs (power, fabs, matter) above or below the threshold for explosive growth?
Everything downstream — whether compute's GDP share rises or falls — turns on this one number.*

- Carl Shulman on the 80,000 Hours podcast (#191, also on Dwarkesh) — compute as fungible with labor; fabs as the real bottleneck.
- [Trammell & Korinek, *Economic Growth under Transformative AI*](https://www.nber.org/papers/w31815) — outcomes hinge on the elasticity of substitution and resource bottlenecks.
- [Epoch's GATE model](https://epoch.ai/gradient-updates/ai-and-explosive-growth-redux) — chip production as the growth engine; implied optimal 2025 AI investment ~$25T.
- [Chad Jones, *A.I. and Our Economic Future*](https://www.nber.org/system/files/working_papers/w34779/w34779.pdf) (NBER w34779) — the weakest-link counter: cheap inputs *fall* as a budget share.