A simple and powerful boilerplate demonstrating how to perform OCR (Optical Character Recognition) using Google Gemini (gemini-2.5-flash) via the Vercel AI SDK and @ai-sdk/google.
This example shows how to extract text from uploaded images (PNG or JPG) directly through a Next.js App Router + API Route setup, with a clean shadcn/ui interface for image upload and result display.
Features
- Next.js App Router for frontend and routing.
- Next.js API Routes to handle OCR requests.
- Google Gemini (gemini-2.5-flash) for text extraction.
- Vercel AI SDK with @ai-sdk/google integration.
- File upload interface (PNG or JPG, max 5MB).
- shadcn/ui for a modern, minimal UI.
- System prompt designed for OCR-focused LLM behavior.
- Simple and easy-to-extend structure.
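The upload limits listed above (PNG or JPG, max 5MB) could be enforced with a small check like the one below. This is a minimal sketch, not the boilerplate's actual code: the names `validateUpload`, `UploadLike`, `MAX_BYTES`, and `ALLOWED_TYPES` are hypothetical, and it assumes "5MB" means 5 * 1024 * 1024 bytes.

```typescript
// Limits taken from the feature list: PNG or JPG, max 5MB.
// Assumption: "5MB" is interpreted as 5 * 1024 * 1024 bytes.
const MAX_BYTES = 5 * 1024 * 1024;
const ALLOWED_TYPES = ["image/png", "image/jpeg"];

// Structural type so the same check accepts a browser File or a server-side Blob.
interface UploadLike {
  type: string; // MIME type reported for the file
  size: number; // size in bytes
}

type ValidationResult = { ok: true } | { ok: false; error: string };

// Hypothetical helper: rejects unsupported types first, then oversized files.
function validateUpload(file: UploadLike): ValidationResult {
  if (!ALLOWED_TYPES.includes(file.type)) {
    return { ok: false, error: "Only PNG or JPG images are supported." };
  }
  if (file.size > MAX_BYTES) {
    return { ok: false, error: "Image must be 5MB or smaller." };
  }
  return { ok: true };
}
```

Running the same check in the upload form and again in the API route keeps the UI responsive while still guarding the endpoint against direct requests.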
Technical Stack
- Framework: Next.js (App Router + API Routes)
- AI Provider: Google Gemini (gemini-2.5-flash)
- AI SDK: Vercel AI SDK with @ai-sdk/google
- UI Library: shadcn/ui (in packages/ui)
- Language: TypeScript
How It Works
- The user uploads an image (PNG or JPG).
- The image is sent via FormData to the /api/ocr endpoint.
- The API uses Google Gemini via the Vercel AI SDK to extract text.
- The extracted text is returned and displayed in the UI.
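The flow above can be sketched as follows. This is an illustrative outline under stated assumptions, not the boilerplate's actual source: the form field name `image`, the helpers `handleOcrRequest`, `uploadForOcr`, and `jsonResponse`, and the `ExtractText` type are all hypothetical. The Gemini call is injected as a function so the sketch runs without an API key; the comment shows roughly how it might wrap `generateText` from the Vercel AI SDK.

```typescript
// Hypothetical extractor: receives raw image bytes and a MIME type, returns the text.
// In a real route it might wrap the Vercel AI SDK roughly like this (assumed wiring,
// with an example system prompt that is not taken from the boilerplate):
//   import { generateText } from "ai";
//   import { google } from "@ai-sdk/google";
//   const { text } = await generateText({
//     model: google("gemini-2.5-flash"),
//     system: "You are an OCR engine. Return only the text found in the image.",
//     messages: [{ role: "user", content: [{ type: "image", image: bytes }] }],
//   });
type ExtractText = (bytes: Uint8Array, mimeType: string) => Promise<string>;

// Small helper that builds a JSON response with an explicit status code.
function jsonResponse(data: unknown, status = 200): Response {
  return new Response(JSON.stringify(data), {
    status,
    headers: { "Content-Type": "application/json" },
  });
}

// Server side: logic a POST handler for /api/ocr could delegate to.
async function handleOcrRequest(req: Request, extractText: ExtractText): Promise<Response> {
  const form = await req.formData();
  const file = form.get("image"); // field name "image" is an assumption
  if (file === null || typeof file === "string") {
    return jsonResponse({ error: "No image uploaded." }, 400);
  }
  const bytes = new Uint8Array(await file.arrayBuffer());
  const text = await extractText(bytes, file.type);
  return jsonResponse({ text });
}

// Client side: sends the image via FormData and returns the extracted text.
async function uploadForOcr(image: Blob, fetchImpl: typeof fetch = fetch): Promise<string> {
  const body = new FormData();
  body.append("image", image, "upload.png");
  const res = await fetchImpl("/api/ocr", { method: "POST", body });
  const data = (await res.json()) as { text?: string; error?: string };
  if (!res.ok || data.text === undefined) {
    throw new Error(data.error ?? "OCR request failed");
  }
  return data.text;
}
```

Passing the extractor in as a parameter is a design choice for this sketch only: it keeps the request parsing and error handling testable without a network call, and the model call can be swapped in at the route level.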
Use Case
Perfect for developers who want to integrate OCR capabilities into Next.js apps using Google Gemini with minimal setup.
Ideal for:
- Document scanners
- Receipt or invoice extraction
- License plate or form recognition
- Multilingual text recognition
With Next.js, Gemini, and the Vercel AI SDK, this boilerplate provides a production-ready example of how to bring AI-powered OCR to the browser: fast, clean, and extensible.
Boilerplate details
Last update: 2 days ago
Boilerplate age: 2 days ago