Hardware
GPU
Also known as: Graphics Processing Unit, Graphics Card, Video Card
A processor with thousands of small cores optimized for parallel computation. Originally designed for graphics, GPUs now drive AI training and inference, along with other workloads that require massively parallel processing.
GPUs were designed to render video frames — transforming millions of pixels using the same geometric operations simultaneously. That parallel architecture turned out to be exactly what AI needs: performing the same mathematical operation (matrix multiplication) across billions of numbers at once.
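To make the parallelism concrete, here is a minimal pure-Python sketch of matrix multiplication. It runs sequentially and is not GPU code. The function names output_cell and matmul are illustrative choices, not from any library. The point is only that each output cell depends on the inputs alone and never on another cell, which is what lets a GPU assign cells to separate cores.

```python
def output_cell(a, b, i, j):
    """Compute one cell of A @ B: the dot product of row i of A and column j of B."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    rows, cols = len(a), len(b[0])
    # Every (i, j) cell is independent of every other cell, so in principle
    # all of them could be computed at the same time. Here they run one by one.
    return [[output_cell(a, b, i, j) for j in range(cols)] for i in range(rows)]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))  # [[19, 22], [43, 50]]
```

A 2x2 product has only four independent cells; the matrices inside a large model have millions, which is where thousands of cores pay off.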
The result is that NVIDIA, the dominant supplier of GPUs for AI, became the defining hardware company of the AI era. Training a frontier model requires thousands of high-end GPUs running for weeks. Running a model for inference (generating responses) is less intensive but still GPU-heavy at scale.
GPU vs CPU
A CPU has a small number of powerful cores (typically 4-64) optimized for complex, sequential tasks — running an OS, executing varied application logic, handling I/O. A GPU has thousands of simpler cores optimized for doing one thing across massive data in parallel. They're complementary: CPUs handle general workloads; GPUs accelerate specific parallel workloads.
For AI specifically
For local AI inference (running models on your own hardware rather than through a cloud API), GPU memory (VRAM) is the primary constraint. In fp16 format each parameter takes 2 bytes, so a 7B-parameter model requires ~14GB of VRAM for its weights alone, and a 70B model needs ~140GB. Actual usage runs somewhat higher, because the model also needs working memory while it generates. Consumer GPUs with 16-24GB can run capable open-source models. Workstation GPUs with 48GB (like the NVIDIA RTX 6000 Ada) handle larger models.
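The VRAM figures above follow from simple arithmetic: parameter count times bytes per parameter. A minimal sketch, assuming decimal gigabytes (10^9 bytes) and counting weights only. The function name weight_memory_gb is hypothetical, and the fp32 and int8 rows are added for comparison; only fp16 appears in the text above.

```python
# Bytes needed to store one parameter at each numeric precision.
# Only fp16 comes from the text above; fp32 and int8 are shown for comparison.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(params_billions, precision="fp16"):
    """Approximate GB (10^9 bytes) needed to hold the model weights only.

    Billions of parameters times bytes per parameter gives gigabytes directly.
    Working memory used during generation is not included.
    """
    return params_billions * BYTES_PER_PARAM[precision]


for size in (7, 70):
    print(f"{size}B model in fp16: ~{weight_memory_gb(size)} GB")
# 7B model in fp16: ~14 GB
# 70B model in fp16: ~140 GB
```

Treat the result as a lower bound when sizing hardware, since it leaves out everything except the weights.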
For organizations evaluating on-premise AI deployment, the practical question is: what model size do you need, and does the VRAM requirement fit a financially reasonable GPU? See the Understanding AI article for a deeper treatment of local inference hardware.
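The sizing question can also be asked in reverse: given a card's VRAM, roughly how large a model fits? A rough sketch, assuming fp16 weights (2 bytes per parameter) and a 20% headroom reserved for working memory. The headroom value is an illustrative assumption, not a measured figure, and the function name largest_model_billions is hypothetical.

```python
def largest_model_billions(vram_gb, bytes_per_param=2, headroom=0.2):
    """Estimate the largest model, in billions of parameters, that fits in VRAM.

    headroom is the fraction of VRAM held back for working memory.
    The 0.2 default is an assumed placeholder, not a benchmark result.
    """
    usable_gb = vram_gb * (1 - headroom)
    return usable_gb / bytes_per_param


# VRAM sizes mentioned in this entry: consumer (16, 24) and workstation (48).
for vram in (16, 24, 48):
    size = largest_model_billions(vram)
    print(f"{vram} GB card: up to ~{size:.1f}B parameters in fp16")
```

Under these assumptions a 24GB card holds a model of roughly 9 to 10 billion parameters in fp16. Changing the headroom or the bytes per parameter moves the answer considerably, so this is a first-pass filter rather than a purchasing rule.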
Integrated vs discrete
Integrated GPUs are built into the CPU and share system memory. They are adequate for office work and basic display output but generally inadequate for AI inference or serious graphics workloads. Discrete GPUs are separate add-in cards with their own dedicated VRAM and are typically required for any meaningful GPU-accelerated workload.