Next.js with OCR from Google Gemini

A boilerplate that demonstrates OCR (Optical Character Recognition) with Google Gemini (gemini-2.5-flash) through the Vercel AI SDK and its @ai-sdk/google provider.

This example extracts text from uploaded images (PNG or JPG) using a Next.js App Router frontend and an API route, with a clean shadcn/ui interface for image upload and result display.

Features

Next.js App Router for frontend and routing.
Next.js API Routes to handle OCR requests.
Google Gemini (gemini-2.5-flash) for text extraction.
Vercel AI SDK with @ai-sdk/google integration.
File upload interface (PNG or JPG, max 5 MB).
shadcn/ui for a modern, minimal UI.
System prompt designed for OCR-focused LLM behavior.
Simple and easy-to-extend structure.
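
The upload limits listed above (PNG or JPG, at most 5 MB) could be enforced with a small check like the one below. This is a minimal sketch, not the boilerplate's actual code: the names validateUpload, UploadLike, and MAX_UPLOAD_BYTES are hypothetical, and it assumes that 5 MB means 5 × 1024 × 1024 bytes and that JPG files report the MIME type image/jpeg.

```typescript
// Hypothetical limit: "5 MB" is read here as 5 * 1024 * 1024 bytes (assumption).
const MAX_UPLOAD_BYTES = 5 * 1024 * 1024;

// MIME types accepted for PNG and JPG uploads.
const ALLOWED_TYPES = new Set<string>(["image/png", "image/jpeg"]);

// Minimal shape shared by a browser File and a server-side upload.
interface UploadLike {
  type: string; // MIME type reported for the file
  size: number; // size in bytes
}

type UploadCheck = { ok: true } | { ok: false; reason: string };

// Returns { ok: true } when the file is an allowed image within the size limit,
// otherwise { ok: false } with a message suitable for showing in the UI.
function validateUpload(file: UploadLike): UploadCheck {
  if (!ALLOWED_TYPES.has(file.type)) {
    return { ok: false, reason: "Only PNG or JPG images are accepted." };
  }
  if (file.size === 0) {
    return { ok: false, reason: "The file is empty." };
  }
  if (file.size > MAX_UPLOAD_BYTES) {
    return { ok: false, reason: "The image exceeds the 5 MB limit." };
  }
  return { ok: true };
}
```

Because the function only reads type and size, a browser File object can be passed to it directly. Running the same check again in the API route is a reasonable precaution, since client-side validation alone can be bypassed.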

Technical Stack

Framework: Next.js (App Router + API Routes)
AI Provider: Google Gemini (gemini-2.5-flash)
AI SDK: Vercel AI SDK with @ai-sdk/google
UI Library: shadcn/ui (in packages/ui)
Language: TypeScript

How It Works

The user uploads an image (PNG or JPG).
The image is sent via FormData to the /api/ocr endpoint.
The API uses Google Gemini via the Vercel AI SDK to extract text.
The extracted text is returned and displayed in the UI.
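
The extraction step in the flow above might be sketched as building a system prompt plus one multimodal user message, which is then handed to the AI SDK. This is an illustration under stated assumptions rather than the boilerplate's actual route: buildOcrRequest, OCR_SYSTEM_PROMPT, the prompt wording, and the form field name "image" are hypothetical. The commented wiring assumes generateText from the ai package and google from @ai-sdk/google, with AI SDK 5 naming (the image part uses mediaType; AI SDK 4 called it mimeType).

```typescript
// Hypothetical OCR-focused system prompt; the boilerplate's real prompt may differ.
const OCR_SYSTEM_PROMPT =
  "You are an OCR engine. Return only the text visible in the image, " +
  "preserving line breaks. Do not add commentary or explanations.";

// Local types mirroring the text and image content parts of a user message.
type OcrUserPart =
  | { type: "text"; text: string }
  | { type: "image"; image: Uint8Array; mediaType: string };

interface OcrRequest {
  system: string;
  messages: { role: "user"; content: OcrUserPart[] }[];
}

// Builds the system prompt and a single user message carrying the image bytes.
function buildOcrRequest(image: Uint8Array, mediaType: string): OcrRequest {
  return {
    system: OCR_SYSTEM_PROMPT,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Extract all text from this image." },
          { type: "image", image, mediaType },
        ],
      },
    ],
  };
}

// Wiring inside a route such as app/api/ocr/route.ts (not executed here):
//
//   import { generateText } from "ai";
//   import { google } from "@ai-sdk/google";
//
//   export async function POST(req: Request) {
//     const form = await req.formData();
//     const file = form.get("image") as File; // field name is an assumption
//     const bytes = new Uint8Array(await file.arrayBuffer());
//     const { text } = await generateText({
//       model: google("gemini-2.5-flash"),
//       ...buildOcrRequest(bytes, file.type),
//     });
//     return Response.json({ text });
//   }
```

Keeping the message construction in a pure function makes it testable without network access or an API key. By default the @ai-sdk/google provider reads its key from the GOOGLE_GENERATIVE_AI_API_KEY environment variable.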

Use Case

Suited to developers who want to integrate OCR capabilities into Next.js apps using Google Gemini with minimal setup.

Ideal for:

Document scanners
Receipt or invoice extraction
License plate or form recognition
Multilingual text recognition

With Next.js, Gemini, and the Vercel AI SDK, this boilerplate provides a production-ready example of how to add AI-powered OCR to a web app in a way that is fast, clean, and extensible. 🚀
