Lesson 32 of 37 -- Module 6: Advanced Claude Code (Advanced)

Claude Code + Google Tools + RAG

Combine Claude Code with the Google Workspace CLI and Gemini multimodal embeddings to build RAG systems that understand both text and images, and to give Claude Code direct access to Gmail, Drive, Docs, and Calendar.

Two Upgrades That Change How Claude Code Reads the World

This lesson covers two distinct but complementary upgrades:

  1. Google Workspace CLI (GWS) -- gives Claude Code direct access to your Gmail, Drive, Docs, Sheets, and Calendar through one command-line tool
  2. Gemini multimodal RAG -- builds a retrieval system that understands documents containing both text and images, using Gemini Embedding 2

Together, these turn Claude Code into a system that can read your entire digital workspace -- not just code files.


Part 1: Google Workspace CLI

What It Is

The Google Workspace CLI (gws) is an open-source tool by Google that exposes the major Workspace services through a single command-line interface. When connected to Claude Code, it lets the agent:

  • Read and send Gmail messages
  • List and read Google Drive files
  • Read and edit Google Docs and Sheets
  • Check and create Calendar events

Before this integration existed, connecting Claude Code to Google Workspace typically meant writing your own OAuth flow and custom API wrappers. The GWS CLI handles both. You authenticate once, and the stored token keeps working for normal use.

Why This Matters for Automation

Consider what becomes possible when Claude Code can read your inbox:
- Triage unread emails and surface the ones requiring decisions
- Draft replies based on previous conversation context
- Extract data from received reports and update a tracking sheet
- Create calendar events from meeting requests in email

Or when it can read your Drive:
- Summarize documents without manual copy-paste
- Cross-reference multiple documents to answer questions
- Generate reports by pulling data from multiple Sheets tabs
- Convert a YouTube video transcript into a formatted Google Doc

Setup Overview

  1. Download the gws binary for your operating system
  2. Run the auth flow once: gws auth login -s calendar,gmail,drive
  3. Sign in with your Google account in the browser
  4. The token is stored locally -- no re-auth needed for normal use

After setup, Claude Code can call gws commands as part of any workflow:

gws gmail users messages list --params '{"userId":"me","q":"is:unread","maxResults":10}'
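
A minimal Python sketch of how a script might wrap the command above. It assumes gws is on your PATH, is already authenticated, and prints the Gmail API response as JSON on stdout; the function names are hypothetical, not part of gws.

```python
import json
import subprocess


def build_unread_command(max_results: int = 10) -> list[str]:
    """Build the argv for the gws call shown above (a list avoids shell quoting)."""
    params = {"userId": "me", "q": "is:unread", "maxResults": max_results}
    return ["gws", "gmail", "users", "messages", "list",
            "--params", json.dumps(params, separators=(",", ":"))]


def extract_message_ids(response_text: str) -> list[str]:
    """Pull message ids out of a Gmail messages.list response.

    Assumption: gws prints the raw API response as JSON. The Gmail API
    omits the "messages" key when nothing matches the query.
    """
    payload = json.loads(response_text)
    return [message["id"] for message in payload.get("messages", [])]


def list_unread_ids(max_results: int = 10) -> list[str]:
    """Run gws and return unread message ids; raises if gws exits non-zero."""
    result = subprocess.run(build_unread_command(max_results),
                            capture_output=True, text=True, check=True)
    return extract_message_ids(result.stdout)
```

Keeping command construction and response parsing in separate functions makes both easy to check without touching a real inbox.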

The YouTube-to-Google-Doc Demo

One compelling demo: paste a YouTube URL into Claude Code and say "convert this video into a formatted Google Doc." Claude Code extracts the transcript, structures it with headings and sections, and writes the result directly to a new Google Doc in your Drive -- formatted, clean, and shareable in under a minute.

The same pattern works for any content-to-Doc pipeline: meeting notes, blog drafts, SOPs from conversations.
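
In the demo, Claude Code decides where sections begin based on meaning. As a rough illustration of the intermediate structuring step, the sketch below groups timed caption lines into fixed-length sections with a heading each. The Segment type, the function names, and the two-minute default are assumptions made for this example; fetching the transcript and writing to Google Docs are left out.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_seconds: float  # when the caption line begins
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as M:SS for section headings."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


def transcript_to_sections(segments: list[Segment], section_seconds: int = 120) -> str:
    """Group caption lines into fixed-length sections, one heading per section."""
    sections: dict[int, list[str]] = {}
    for segment in segments:
        bucket = int(segment.start_seconds // section_seconds)
        sections.setdefault(bucket, []).append(segment.text.strip())

    lines: list[str] = []
    for bucket in sorted(sections):
        lines.append(f"Section starting at {format_timestamp(bucket * section_seconds)}")
        lines.append(" ".join(sections[bucket]))
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip()
```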


Part 2: Gemini Multimodal RAG

The Problem with Traditional RAG

Standard RAG systems work well for text-only documents. But real-world documents -- product manuals, compliance guides, research papers, client presentations -- contain charts, diagrams, screenshots, and tables that carry crucial information.

A vacuum cleaner manual might explain a function in text on page 3 and illustrate it with a diagram on page 4. A text-only embedding system sees them as separate, unrelated content. A multimodal system understands the relationship.

Gemini Embedding 2 Changes This

Google released Gemini Embedding 2 as a multimodal embedding model that can process both text and images simultaneously. When you embed a PDF page with this model, it captures the visual content (charts, diagrams, photos) alongside the text.

This means a RAG system built on Gemini Embedding 2 can answer questions like:
- "What does the roofing style in the photo on page 7 cost per square foot?"
- "Explain the flow diagram from the compliance document"
- "What does the highlighted section in this screenshot mean?"

Building a Multimodal RAG Pipeline

The basic architecture:

  1. Document ingestion -- convert PDF pages to images (one image per page captures both text and visuals)
  2. Embedding -- pass each page image through Gemini Embedding 2 to get a vector representation
  3. Vector storage -- store embeddings in Pinecone (or another vector database)
  4. Query -- at query time, embed the question and retrieve the most relevant pages
  5. Generation -- pass the retrieved page images plus the question to Claude or Gemini for an answer

The key insight: by treating each PDF page as an image rather than extracted text, you preserve the visual layout information that traditional text extraction destroys.
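
The retrieval core of steps 2 to 4 can be sketched in plain Python. This is a toy under stated assumptions: PageIndex is a hypothetical in-memory stand-in for the vector database, and toy_embed is a keyword counter standing in for Gemini Embedding 2. In a real pipeline the embedder would receive page images and call the embedding model; here pages are short strings so the example runs on its own.

```python
import math
from typing import Callable

Vector = list[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class PageIndex:
    """In-memory stand-in for the vector database in step 3."""

    def __init__(self, embed: Callable[[str], Vector]):
        self.embed = embed  # pages and questions must share one embedder
        self.pages: list[tuple[int, Vector]] = []

    def add_page(self, page_number: int, page_content: str) -> None:
        self.pages.append((page_number, self.embed(page_content)))

    def retrieve(self, question: str, top_k: int = 3) -> list[int]:
        """Return page numbers ranked by similarity to the question."""
        query = self.embed(question)
        ranked = sorted(self.pages,
                        key=lambda page: cosine_similarity(query, page[1]),
                        reverse=True)
        return [page_number for page_number, _ in ranked[:top_k]]


VOCAB = ["fastener", "spacing", "warranty", "diagram", "shingle"]


def toy_embed(text: str) -> Vector:
    """Placeholder embedder that counts a few keywords. NOT a real model."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]
```

The page numbers returned by retrieve are what step 5 would use to select the page images passed to the generating model.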

Practical Example: Equipment Manuals

A roofing contractor receives vendor manuals in PDF format. The manuals contain text specifications alongside photos of roof types and installation diagrams. A multimodal RAG system can answer:

  • "What fastener spacing does the manual recommend for this roof style?" (combines text spec + diagram reference)
  • "Which materials in this photo match the approved vendor list?" (image recognition + text lookup)
  • "What is the warranty period for this installation method?" (text lookup with visual confirmation)

The same system works for any domain with image-heavy documentation: medical devices, industrial equipment, architectural specifications, legal exhibits.

Pinecone Integration

Pinecone is a straightforward managed vector database for this use case. The workflow:

  1. Create a Pinecone index whose dimension equals the length of the vectors Gemini Embedding 2 returns (a mismatch causes upserts to be rejected)
  2. Batch-embed all document pages and upsert to Pinecone
  3. Query with semantic search at runtime

Claude Code can set up the entire pipeline -- creating the index, writing the embedding scripts, and building the query interface -- from a description of what you want to build.
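
A hedged sketch of the first two workflow steps using the Pinecone Python SDK. The record layout, the helper names, the batch size of 100, and the serverless cloud and region are assumptions for illustration; the page vectors are expected to come from your embedding step. The index dimension is read from an actual vector rather than hard-coded, so it cannot drift from what the model returns.

```python
from typing import Iterator


def to_records(doc_id: str, page_vectors: dict[int, list[float]]) -> list[dict]:
    """Shape page embeddings as Pinecone records, keeping the page number as metadata."""
    return [
        {"id": f"{doc_id}-p{page}", "values": vector,
         "metadata": {"doc_id": doc_id, "page": page}}
        for page, vector in sorted(page_vectors.items())
    ]


def batched(records: list[dict], size: int = 100) -> Iterator[list[dict]]:
    """Yield records in fixed-size batches to keep each upsert request small."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def upsert_pages(api_key: str, index_name: str, records: list[dict]) -> None:
    """Create the index if it does not exist, then upsert all records."""
    from pinecone import Pinecone, ServerlessSpec  # pip install pinecone

    pc = Pinecone(api_key=api_key)
    if not pc.has_index(index_name):
        pc.create_index(
            name=index_name,
            dimension=len(records[0]["values"]),  # taken from a real embedding
            metric="cosine",
            spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # assumed region
        )
    index = pc.Index(index_name)
    for batch in batched(records):
        index.upsert(vectors=batch)
```

Storing the page number in metadata is what lets a query result point back to the page image needed for generation.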


Combining Both: The Full Google + RAG Stack

When GWS and multimodal RAG are used together, Claude Code can act as a knowledge management system across your workspace:

  • Inputs: emails, Drive documents, PDFs, calendar events, YouTube transcripts
  • Processing: multimodal embeddings + vector storage
  • Retrieval: semantic search across all content types
  • Output: structured answers, Google Docs reports, email drafts, calendar events

A workflow that previously required a custom data pipeline and a full engineering sprint can now be described to Claude Code in natural language and deployed in an afternoon.


Getting Started

For GWS integration: Start with read-only access to Gmail and Drive. Build one workflow (for example, a daily inbox triage) before expanding to write access.

For multimodal RAG: Start with a single PDF document you know well. Ask questions you already know the answers to. Verify the system retrieves the correct pages before scaling.
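
One way to make that verification repeatable is to record each question with the page you expect and measure how often that page shows up in the top results. The sketch below assumes a hypothetical retrieve(question, k) callable that returns ranked page numbers; adapt it to whatever query interface you build.

```python
from typing import Callable


def hit_rate_at_k(
    retrieve: Callable[[str, int], list[int]],
    known_answers: dict[str, int],
    k: int = 3,
) -> float:
    """Fraction of questions whose expected page appears in the top-k results."""
    if not known_answers:
        raise ValueError("need at least one question to evaluate")
    hits = sum(
        1 for question, expected_page in known_answers.items()
        if expected_page in retrieve(question, k)
    )
    return hits / len(known_answers)
```

A hit rate well below 1.0 on a document you know well suggests fixing ingestion or embedding before adding more documents.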

Both capabilities reward iterative development. Build the simplest version first, verify it works as expected, then expand.


Watch the Originals

  • Google Tool 10x -- Google Workspace CLI -- youtube.com/watch?v=Wu67lLD8bB0 -- 12 min
  • Google New Model + Claude Code Changed RAG -- youtube.com/watch?v=hem5D1uvy-w -- 15 min

Next lesson: Hosting and Monitoring AI Agents

HivePowered AI — AI Like a Pro Training