A human brain uses about 20 W of power. A single A100 GPU draws up to 400 W; an H100 draws up to 700 W. Rumor has it that GPT-4 requires 128 such cards to serve one instance of the model, so a low estimate is roughly 50 kW to run inference on a current-generation LLM. Why such a discrepancy? One possible explanation is as follows.
Biological brains do not compute using the standard von Neumann architecture, flipping bits under a synchronous, sequential instruction stream. Brains use time as an important resource for communication and computation, and this use of time goes beyond the simple clocking of a CPU or mere concurrency. Spiking neurons perform communication and processing in space-time, with an emphasis on time: in this paradigm, time is a freely available resource for both communication and computation. Neuromorphic chips that could run such bio-inspired neural networks are an active area of research. Here is one example:
Neuromorphic computing
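To make the "time as a computational resource" idea concrete, below is a minimal sketch of a leaky integrate-and-fire (LIF) neuron, the simplest spiking-neuron model. It is purely illustrative and not tied to any particular neuromorphic chip; the function name and all parameter values (time constant, threshold, input currents) are assumptions chosen for the demo.

```python
def lif_spike_times(input_current, t_max=100.0, dt=0.1,
                    tau=10.0, v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Simulate a leaky integrate-and-fire neuron driven by a constant
    input current and return the times (ms) at which it spikes."""
    v = v_rest
    spikes = []
    for step in range(int(t_max / dt)):
        # Euler step of the membrane equation:
        #   tau * dv/dt = -(v - v_rest) + input_current
        v += dt * (-(v - v_rest) + input_current) / tau
        if v >= v_thresh:
            spikes.append(step * dt)  # the spike's *timing* carries information
            v = v_reset              # membrane potential resets after a spike
    return spikes

# A stronger stimulus reaches threshold sooner and fires more often,
# so intensity is encoded in spike timing and rate rather than in bits.
weak = lif_spike_times(1.2)
strong = lif_spike_times(3.0)
```

Note that the output of the neuron is not a number but a set of event times: downstream neurons read off *when* spikes arrive, which is exactly the sense in which time itself serves as the communication medium.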