Understanding AI: From Hardware to Agents

CPUs, GPUs, LLMs, and the tools that connect them — a practical guide to how AI works and how to use it to automate real work.

The hardware that makes AI run

CPUs vs GPUs

A CPU (Central Processing Unit) is the general-purpose brain of a computer. Modern CPUs have anywhere from 8 to 64 cores, each capable of handling complex, varied tasks in sequence. They're fast, flexible, and designed to do the enormous variety of things a computer needs to do: run the operating system, manage network connections, handle file operations, execute business logic. CPUs are generalists.

AI training and inference require something fundamentally different. Training a language model is, at its core, a problem of matrix multiplication — performing the same relatively simple mathematical operation across billions of numbers, simultaneously, billions of times. CPUs handle this poorly. They were architected for breadth and complexity, not for doing one thing at massive, uniform scale.

A GPU (Graphics Processing Unit) was designed to render video frames — which requires transforming millions of pixels simultaneously using the same geometric operations. The architecture that makes GPUs excellent at graphics makes them extraordinarily well-suited for AI: thousands of simpler cores executing the same instruction in parallel, at scale.

This parallel architecture is why NVIDIA became the defining hardware company of the AI era. Their CUDA software platform gave researchers a practical way to harness GPU parallelism for neural network training years before competitors had comparable stacks. Training a frontier model like GPT-4 or Claude requires thousands of high-end GPUs running continuously for weeks. Running a model to generate responses (inference) is less intensive but still GPU-heavy at scale.

For a genuinely fascinating engineering deep dive into how modern GPUs are designed and why the architecture works the way it does, this video is one of the best we've seen.

For organizations considering local AI deployment, the GPU question becomes practical. A consumer GPU with 16GB of VRAM can run capable open-source models locally for private data workflows. A workstation GPU with 48GB handles significantly larger models. The economics of local inference are improving rapidly.

RAM

RAM is where a model lives during operation. Running a language model requires loading its weights — the billions of parameters encoding its learned patterns — into memory where the GPU can access them quickly. A 7-billion-parameter model in compressed format requires roughly 14GB of RAM. A 70-billion-parameter model needs roughly 140GB. RAM capacity determines which models you can run and how many simultaneous sessions you can serve.

For cloud-based AI — Claude.ai, ChatGPT.com, Gemini — this is entirely invisible. The provider handles it. For local model deployment, it becomes a hardware planning constraint worth understanding.

Storage

SSDs matter for AI primarily at model load time — moving weights from disk into memory quickly — and for storing model files. For most organizational AI use through cloud APIs, storage is not a meaningful variable. For local deployments serving multiple users, it becomes one.

What a large language model actually is

An LLM is a mathematical model trained to predict what text should come next, given a sequence of input text. That sounds deceptively simple. The discovery that made modern AI possible was that a model trained with sufficient data, at sufficient scale, to predict text extremely well also develops capabilities that weren't explicitly trained — the ability to follow logical chains, write and debug code, translate languages, answer questions it was never directly shown, and generalize to novel problems.

The training process: feed the model enormous quantities of text, have it predict the next token, compare that prediction to the actual next token, and adjust the model's internal parameters slightly to improve the prediction. Repeat across trillions of tokens. The parameters that emerge encode something that functions like understanding and reasoning — even though the training objective was simply "predict the next word."

Tokens are the units LLMs process — roughly three-quarters of a word on average. "Electromagnetic" is one token. Every input you give a model and everything it generates are measured in tokens. Token counts affect cost, processing time, and the model's ability to handle long documents in a single session.

Context window is the amount of text a model can hold in working memory at once — your full conversation history, any documents you've shared, and the current exchange. Early models had context windows of a few thousand tokens, roughly a few pages of text. Current Claude models measure context in hundreds of thousands of tokens — large enough for an entire codebase, a lengthy legal document, or hours of conversation history. You can give a model a large, complex document and ask questions about any part of it without summarizing or chunking it first.

System prompts are instructions given to a model before a conversation begins, setting its behavior and constraints. When you use Claude.ai, Anthropic provides one. When an organization builds a product on the Claude API, they write their own. The system prompt is what turns a general-purpose model into a purpose-specific tool.

ChatGPT, Gemini, and Claude

These three represent the current leading conversational AI platforms. They are all genuinely capable. The differences are real and worth understanding.

ChatGPT (OpenAI)

The model that introduced most people to modern AI. ChatGPT has the broadest public recognition, the most extensive plugin ecosystem, and strong general-purpose capability across writing, analysis, coding, and conversation. GPT-4o handles text, images, audio, and multimodal tasks.

Where it excels: general writing and communication, creative generation, breadth of integrations, and familiarity as an entry point for teams new to AI.

For organizations committed to Microsoft, Copilot for Microsoft 365 is often the more relevant recommendation — it builds GPT-4 directly into Word, Excel, Teams, and Outlook. If your team already lives in those tools, the integration frequently matters more than which underlying model is technically ahead.

Gemini (Google)

Google's model has a natural home in Google Workspace. Gemini integrated natively into Gmail, Docs, Sheets, and Drive gives the AI access to your organizational data without additional integration work. For a Google Workspace-first organization, that native access often outweighs differences in raw model capability.

Where it excels: Google Workspace integration, multimodal tasks, organizations where data already lives in Google's ecosystem. For Workspace shops, Gemini for tasks involving Drive and Gmail is frequently the pragmatic recommendation.

Claude (Anthropic)

Claude is the model we use daily — for our own operations, for client work, and for everything on this site. Our preference is earned rather than assumed.

In our experience across all three platforms, Claude is the most capable for complex reasoning, long-document analysis, code generation, and — most importantly — agentic use cases where the model is taking real actions in the world rather than just generating text. MCP, the protocol connecting AI models to external systems, was developed by Anthropic and is most mature in the Claude ecosystem.

Anthropic was founded by former OpenAI researchers with a specific focus on AI safety. That focus is visible in Claude's behavior in ways that matter for production deployments: it's more calibrated about uncertainty, less prone to confident confabulation, and more thoughtful about the downstream consequences of actions when operating autonomously.

Where it excels: complex reasoning, code generation and debugging, long-context tasks, autonomous agent workflows, and safety-conscious production deployments.

The honest position: use multiple models for what each does best. ChatGPT or Copilot for Outlook drafting in Microsoft environments. Gemini for Drive and Gmail. Claude for complex reasoning, code, or anything autonomous. These are not in competition for your exclusive loyalty, and treating them as such leaves capability on the table.

The tools

Chat interfaces

Claude.ai, ChatGPT.com, and Gemini.google.com are the most accessible entry point — web-based conversational interfaces requiring no setup.

The ceiling: chat interfaces are conversational. The model can tell you what to do but cannot do it for you. It can draft a provisioning script but not run it. For tasks that require taking action in real systems, the interface alone is insufficient.

Claude for Chrome

Claude for Chrome is a browser extension that gives Claude visibility into what's on your screen and the ability to interact with web pages — filling forms, clicking buttons, navigating between pages, reading content without manual copying.

The significance: web-based tools that expose no API become automatable. Any admin console a human can navigate through a browser, Claude for Chrome can navigate as well. This closes the gap between "systems with proper API integrations" and "everything else that has a web interface."

Claude Code

Claude Code runs in your terminal with access to your local filesystem, the ability to execute shell commands, and the ability to read, write, and modify code files. It's designed for software development but applies to any task involving files and system operations.

The model can read an entire codebase, understand its structure, implement requested changes across multiple files, run the tests, and iterate on errors — without being walked through the code manually. The RSystems website was built this way.

IDE integrations

GitHub Copilot, the Claude extension for VS Code, Cursor, and similar tools bring AI directly into the development environment. The model has context of your open file, project structure, and recent edits — providing suggestions inline and answering questions about the code without leaving the editor.

For active development, IDE integration is often more practical than Claude Code for ongoing work — ambient assistance woven into an existing workflow rather than a separate agent session.

MCP: The bridge between models and systems

MCP (Model Context Protocol) is an open standard, developed by Anthropic, for connecting AI models to external tools and systems. Rather than every AI integration requiring custom code, MCP provides a common interface: the model issues a structured tool call, the MCP server executes it against the real system, and returns the result.

The practical effect: any platform that publishes an MCP server becomes something any compatible model can operate. JumpCloud, Slack, Google Drive, GitHub, HubSpot, and dozens of other platforms have MCP servers. Each one gives the model hands into that system — the ability to read data and take real actions.

The limitation: official MCP servers expose a curated subset of each platform's API. The JumpCloud MCP has roughly a dozen tools. The JumpCloud API has hundreds of endpoints. For production automation, that gap matters — which is why custom MCP development is a meaningful part of making agents genuinely capable rather than merely promising.

How a tool call works: when an AI model is working through a task, it can recognize when it needs capabilities outside its training. It issues a structured tool call — essentially, "call this function with these parameters." The MCP server receives it, executes the corresponding API call, and returns the result. The model incorporates that result and continues to the next step.

A single onboarding task might involve twenty sequential tool calls — check if the user exists, create the account, assign group membership, set device policy, send the welcome message, create the shared folder, log the completion. Each one is a roundtrip through an MCP server to a live system.

Putting it together

The real capability emerges from combining the tools. A chat interface or system prompt defines the task. MCP servers give the model access to the platforms involved. Claude for Chrome handles anything requiring browser interaction. Claude Code handles anything requiring filesystem access or command execution.

A brief example: new hire onboarding

A new employee completes an onboarding form. The response appears in a Google Sheet an MCP server monitors. Claude reads the new hire's details and works through the onboarding sequence:

Via JumpCloud MCP: creates the user account, assigns directory group membership, applies the device enrollment policy, triggers the invitation email. Via Slack MCP: posts a welcome message and adds them to department channels. Via Google Drive MCP: creates their home folder and shares the onboarding document.

Most of the process is done. But the organization uses Adobe Creative Cloud for Teams — not enterprise — and Adobe's admin console has no MCP server and no accessible API for this workflow. The only path is the web interface at adminconsole.adobe.com.

Claude for Chrome picks up here: navigates to the console, finds the product license screen, adds the new user, and confirms the assignment.

The entire sequence — across JumpCloud, Slack, Drive, and Adobe — completes without an administrator touching a single panel. MCP handles every system with a proper integration. Claude for Chrome handles the rest. Together they cover nearly any web-accessible platform.

Where to start

The most effective starting point is usually the simplest: one tool, configured properly, deployed to a team that will use it, against one workflow that saves real and measurable time.

For most organizations that means a properly configured Claude Teams or Enterprise account with SSO, followed by identifying the first workflow where AI genuinely changes how the work gets done — not sounds impressive, but actually changes it.

The more sophisticated stack — custom MCP servers, agent identity architecture, audit infrastructure — follows naturally once the foundational tooling is in place and teams understand what these models can actually do.

If you're not sure where to start, that's the most common situation. Get in touch and we'll figure out what makes sense for your organization.