Combine Claude Code with the Google Workspace CLI and Gemini multimodal embeddings to build RAG systems that understand both text and images, and to give Claude Code direct access to Gmail, Drive, Docs, and Calendar.
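This lesson covers two distinct but complementary upgrades:
- Connecting Claude Code to the Google Workspace CLI (gws) so it can work directly with Gmail, Drive, Docs, and Calendar
- Building multimodal RAG systems on Gemini multimodal embeddings that retrieve from both the text and the images in your documents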
Together, these turn Claude Code into a system that can read your entire digital workspace -- not just code files.
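The Google Workspace CLI (gws) is an open-source tool by Google that exposes all major Workspace services as a command-line interface. When connected to Claude Code, it lets the agent read, search, create, and update content in Gmail, Drive, Docs, Sheets, and Calendar by running gws commands from the terminal.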
Before this integration existed, connecting Claude Code to Google Workspace required complex OAuth flows and custom API wrappers. The GWS CLI handles both. Set it up once, and the access persists across sessions.
Consider what becomes possible when Claude Code can read your inbox:
- Triage unread emails and surface the ones requiring decisions
- Draft replies based on previous conversation context
- Extract data from received reports and update a tracking sheet
- Create calendar events from meeting requests in email
Or when it can read your Drive:
- Summarize documents without manual copy-paste
- Cross-reference multiple documents to answer questions
- Generate reports by pulling data from multiple Sheets tabs
- Convert a YouTube video transcript into a formatted Google Doc
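Setup is a one-time login that names the services to authorize:
gws auth login -s calendar,gmail,drive
After setup, Claude Code can call gws commands as part of any workflow. For example, this command lists up to ten unread messages: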
gws gmail users messages list --params '{"userId":"me","q":"is:unread","maxResults":10}'
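The same command can also be driven from a script, which is roughly what happens when Claude Code chains gws calls into a workflow. The sketch below assumes that gws prints the Gmail API's JSON response body to stdout. The helper names (unread_query_argv, parse_message_ids, list_unread_ids) are hypothetical, and the response shape is the Gmail users.messages.list format. Passing the arguments as a list avoids shell-quoting the JSON.

```python
import json
import subprocess


def unread_query_argv(max_results: int = 10) -> list[str]:
    """Build the gws Gmail list command as an argv list (no shell quoting needed)."""
    params = {"userId": "me", "q": "is:unread", "maxResults": max_results}
    return ["gws", "gmail", "users", "messages", "list", "--params", json.dumps(params)]


def parse_message_ids(stdout: str) -> list[str]:
    """Extract message IDs from a Gmail users.messages.list JSON response.

    Assumption: gws writes the API response body to stdout. When nothing
    matches, the Gmail API omits the "messages" key, so default to [].
    """
    response = json.loads(stdout)
    return [message["id"] for message in response.get("messages", [])]


def list_unread_ids(max_results: int = 10) -> list[str]:
    """Run gws and return unread message IDs; raises CalledProcessError on failure."""
    result = subprocess.run(
        unread_query_argv(max_results), capture_output=True, text=True, check=True
    )
    return parse_message_ids(result.stdout)
```

The returned IDs are only references. Fetching subjects or bodies for triage would take one further get call per message.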
One compelling demo: paste a YouTube URL into Claude Code and say "convert this video into a formatted Google Doc." Claude Code extracts the transcript, structures it with headings and sections, and writes the result directly to a new Google Doc in your Drive -- formatted, clean, and shareable in under a minute.
The same pattern works for any content-to-Doc pipeline: meeting notes, blog drafts, SOPs from conversations.
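The "write a formatted Doc" step ultimately comes down to a Google Docs API documents.batchUpdate call. The sketch below builds that request list from sections Claude Code has already structured. The input shape and function names are hypothetical, and creating the document and sending the body (through gws or a client library) are left out. It inserts all text at index 1, the start of an empty document's body, then marks each heading paragraph as HEADING_2. Body paragraphs keep the document's default style. Docs API indexes count UTF-16 code units, so the offsets are computed in those units.

```python
def utf16_len(text: str) -> int:
    """Length in UTF-16 code units, which is how Docs API indexes count text."""
    return len(text.encode("utf-16-le")) // 2


def build_doc_requests(sections: list[tuple[str, str]]) -> list[dict]:
    """Build batchUpdate requests that insert (heading, body) sections into a new Doc."""
    text = ""
    heading_ranges = []
    for heading, body in sections:
        start = 1 + utf16_len(text)          # index 1 = start of the document body
        text += heading + "\n"
        heading_ranges.append((start, 1 + utf16_len(text)))
        text += body + "\n"

    # One insertText for all content, then one style update per heading paragraph.
    requests = [{"insertText": {"location": {"index": 1}, "text": text}}]
    for start, end in heading_ranges:
        requests.append({
            "updateParagraphStyle": {
                "range": {"startIndex": start, "endIndex": end},
                "paragraphStyle": {"namedStyleType": "HEADING_2"},
                "fields": "namedStyleType",
            }
        })
    return requests
```

Sending the result as {"requests": build_doc_requests(sections)} keeps the whole edit in one round trip.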
Standard RAG systems work well for text-only documents. But real-world documents -- product manuals, compliance guides, research papers, client presentations -- contain charts, diagrams, screenshots, and tables that carry crucial information.
A vacuum cleaner manual might explain a function in text on page 3 and illustrate it with a diagram on page 4. A text-only embedding system sees them as separate, unrelated content. A multimodal system understands the relationship.
Google released Gemini Embedding 2, a multimodal embedding model that maps text and images into the same vector space. When you embed a PDF page with this model, the vector captures the visual content (charts, diagrams, photos) alongside the text. As a result, a text question can be compared directly against page images.
This means a RAG system built on Gemini Embedding 2 can answer questions like:
- "What does the roofing style in the photo on page 7 cost per square foot?"
- "Explain the flow diagram from the compliance document"
- "What does the highlighted section in this screenshot mean?"
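The basic architecture:
- Render each PDF page as an image
- Embed each page image with Gemini Embedding 2
- Store the vectors in a vector database, with the source file and page number as metadata
- At query time, embed the question with the same model and retrieve the most similar pages
- Pass the retrieved pages and the question to a multimodal LLM to generate the answer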
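A minimal in-memory sketch of this architecture shows how the retrieval side fits together. The embedding functions are passed in, and both would wrap Gemini Embedding 2 in a real system. That call is not shown here, and the exact SDK usage is left as an assumption. The Python list stands in for the vector database. PageRecord, build_index, and retrieve are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable, Sequence

Vector = list[float]


@dataclass
class PageRecord:
    source: str     # PDF file name
    page: int       # 1-based page number
    vector: Vector  # embedding of the rendered page image


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(
    pages: Iterable[tuple[str, int, bytes]],
    embed_image: Callable[[bytes], Vector],
) -> list[PageRecord]:
    """Embed each rendered page image and keep it with its source and page number."""
    return [PageRecord(src, num, embed_image(png)) for src, num, png in pages]


def retrieve(
    question: str,
    embed_text: Callable[[str], Vector],
    index: list[PageRecord],
    top_k: int = 3,
) -> list[PageRecord]:
    """Embed the question with the same model and rank pages by cosine similarity."""
    query = embed_text(question)
    ranked = sorted(index, key=lambda rec: cosine(query, rec.vector), reverse=True)
    return ranked[:top_k]
```

The design only works if both embed functions use the same multimodal model. Otherwise question vectors and page vectors live in different spaces, and their similarity scores are meaningless.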
The key insight: by treating each PDF page as an image rather than extracted text, you preserve the visual layout information that traditional text extraction destroys.
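Rendering pages to images is the step that preserves that layout. The lesson does not name a renderer. This sketch swaps in PyMuPDF (installed as pymupdf, imported as fitz), and the 150 dpi default is an illustrative assumption. The page_key ID format is hypothetical. The import is deferred, so the helper still loads without PyMuPDF installed.

```python
from pathlib import Path


def page_key(pdf_path: str, page_number: int) -> str:
    """Stable ID for one page, e.g. 'manual.pdf#p7' (1-based page number)."""
    return f"{Path(pdf_path).name}#p{page_number}"


def render_pdf_pages(pdf_path: str, dpi: int = 150):
    """Yield (page_key, png_bytes) for every page, rendered as a full-page image.

    Rendering the whole page keeps charts, photos, and their position relative
    to the surrounding text together in one image.
    """
    import fitz  # PyMuPDF; imported lazily so page_key works without it

    with fitz.open(pdf_path) as doc:
        for index, page in enumerate(doc):
            pixmap = page.get_pixmap(dpi=dpi)
            yield page_key(pdf_path, index + 1), pixmap.tobytes("png")
```

Higher dpi values make small print and fine diagram detail more legible, at the cost of larger images to send to the embedding model.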
A roofing contractor receives vendor manuals in PDF format. The manuals contain text specifications alongside photos of roof types and installation diagrams. A multimodal RAG system can answer questions that draw on the written specifications, the photos, and the diagrams together.
The same system works for any domain with image-heavy documentation: medical devices, industrial equipment, architectural specifications, legal exhibits.
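Pinecone is a straightforward vector database for this use case. The workflow:
- Create an index whose dimension matches the embedding model's output size
- Upsert one vector per PDF page, with the file name and page number as metadata
- At query time, embed the question and query the index for the nearest pages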
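The Pinecone side of this workflow can be sketched with the Pinecone Python SDK (v3 or later assumed). The index name, cloud, and region are placeholder assumptions, and the API key is read from a PINECONE_API_KEY environment variable. The input tuple shape for to_pinecone_records is hypothetical. The SDK import is deferred, so the record-building helper works without it.

```python
import os


def to_pinecone_records(pages):
    """Turn (page_key, source, page_number, vector) tuples into upsert dicts."""
    return [
        {"id": key, "values": list(vector), "metadata": {"source": source, "page": page}}
        for key, source, page, vector in pages
    ]


def ensure_index(pc, index_name: str, dimension: int):
    """Create a serverless index if missing; dimension must match the embeddings."""
    from pinecone import ServerlessSpec  # pip install pinecone

    if index_name not in pc.list_indexes().names():
        pc.create_index(
            name=index_name,
            dimension=dimension,
            metric="cosine",
            spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # assumed placement
        )
    return pc.Index(index_name)


def index_and_query(records, query_vector, index_name="vendor-manuals", top_k=3):
    """Upsert page vectors, then return the top_k nearest pages with metadata."""
    from pinecone import Pinecone

    pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
    index = ensure_index(pc, index_name, dimension=len(records[0]["values"]))
    index.upsert(vectors=records)
    # Freshly upserted vectors can take a moment before they show up in queries.
    return index.query(vector=query_vector, top_k=top_k, include_metadata=True)
```

Keeping the page number in metadata means each match points straight back to the page image. That image can then go to the answering model alongside the question.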
Claude Code can set up the entire pipeline -- creating the index, writing the embedding scripts, and building the query interface -- from a description of what you want to build.
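When GWS and multimodal RAG are used together, Claude Code becomes a complete knowledge management system. It can pull PDFs and reports from Drive and Gmail, index them for multimodal retrieval, and write answers or summaries back into Docs and Sheets.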
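A first step in such a combined workflow is discovering which Drive files to index. The sketch assumes the Drive command follows the same service/resource/method pattern as the Gmail example (gws drive files list) and that gws prints the JSON response to stdout. Both are assumptions, not confirmed by this lesson. The query string and fields value use Drive API v3 syntax. Pagination (nextPageToken) is not handled, and the helper names are hypothetical.

```python
import json
import subprocess


def drive_pdf_query_argv(page_size: int = 20) -> list[str]:
    """Build a gws call that lists non-trashed PDFs in Drive (Drive API v3 query syntax)."""
    params = {
        "q": "mimeType = 'application/pdf' and trashed = false",
        "pageSize": page_size,
        "fields": "files(id, name)",  # only return what the indexer needs
    }
    return ["gws", "drive", "files", "list", "--params", json.dumps(params)]


def parse_drive_files(stdout: str) -> list[tuple[str, str]]:
    """Return (file_id, name) pairs from a files.list JSON response."""
    return [(f["id"], f["name"]) for f in json.loads(stdout).get("files", [])]


def list_drive_pdfs(page_size: int = 20) -> list[tuple[str, str]]:
    """Run gws and return the first page of PDF files; raises on a non-zero exit."""
    result = subprocess.run(
        drive_pdf_query_argv(page_size), capture_output=True, text=True, check=True
    )
    return parse_drive_files(result.stdout)
```

Each returned file ID can then be downloaded, rendered page by page, and fed into the embedding and indexing steps.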
A workflow that previously required a custom data pipeline and a full engineering sprint can now be described to Claude Code in natural language and deployed in an afternoon.
For GWS integration: Start with read-only access to Gmail and Drive. Build one workflow (for example, a daily inbox triage) before expanding to write access.
For multimodal RAG: Start with a single PDF document you know well. Ask questions you already know the answers to. Verify the system retrieves the correct pages before scaling.
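Checking retrieval against questions with known answers can be made repeatable with a small hit-rate-at-k check. For each question, it asks whether the page you know holds the answer appears in the top k results. The function names and the example questions in the comments are hypothetical. Results come from whatever retriever you built, listed best first.

```python
def hit_rate_at_k(results: dict, expected: dict, k: int = 3) -> float:
    """Fraction of questions whose known answer page is in the top-k results.

    results:  {question: [page_key, ...]}, ranked best-first by the retriever
    expected: {question: page_key}, e.g. {"How do I empty the bin?": "manual.pdf#p4"}
    """
    if not expected:
        return 0.0
    hits = sum(1 for q, page in expected.items() if page in results.get(q, [])[:k])
    return hits / len(expected)


def misses(results: dict, expected: dict, k: int = 3) -> list[str]:
    """Questions whose expected page did not make the top k, for manual inspection."""
    return [q for q, page in expected.items() if page not in results.get(q, [])[:k]]
```

Rerunning this check after each change (new dpi, new documents, different top_k) shows whether retrieval improved before you scale up.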
Both capabilities reward iterative development. Build the simplest version first, verify it works as expected, then expand.
HivePowered AI — AI Like a Pro Training