What is Mediator?

Mediator is an AI image and video generation tool. The name comes from Media + Creator = Mediator.

About This Project

This is a personal side project that explores the possibilities of AI generation technology while diving deep into advanced browser capabilities such as WebGPU, OPFS, Web Workers, and ONNX Runtime Web.

Core Features

🔒 Fully Client-Side

Mediator is a pure frontend application (PWA); all operations happen in your browser:

No backend server: API calls go directly from browser to Google Gemini
Data stays local: Your images and generated results are only stored locally
Offline capable: As a PWA, Mediator can be installed to your desktop

🎨 Multiple Creation Modes

Mode	Description
Generate	Basic text-to-image, supports multiple styles
Sticker	Generate sticker sheets, auto-split into individual stickers
Edit	Upload reference images for editing or style transfer
Story	Multi-step visual storytelling, maintains character consistency
Diagram	Generate technical diagrams, flowcharts
Video	Generate AI videos using Veo 3.1 API
Slides	AI slide generation, PDF to PPTX, OCR text recognition

How It Works

Prompt Builder

At its core, all image generation features are designed to help you build better prompts.

Each mode provides options and presets tailored to specific use cases. When you select styles, aspect ratios, compositions, and other parameters, Mediator converts these choices into AI-understandable descriptions and combines them with your input to create a complete prompt.

Better Prompts = Better Results

AI models need sufficient information to accurately understand your intent. Through the structured options provided by each mode, even a brief description becomes a rich, detailed prompt, leading to results that better match your expectations.
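The conversion from selected options to a complete prompt can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `PromptOptions` shape, the `STYLE_PHRASES` mapping, and the `buildPrompt` function are hypothetical names, not Mediator's actual internals.

```typescript
// Hypothetical option shape; Mediator's real option names may differ.
interface PromptOptions {
  style?: string;
  aspectRatio?: string;
  composition?: string;
}

// Hypothetical mapping from a UI choice to a richer descriptive phrase.
const STYLE_PHRASES = new Map<string, string>([
  ["watercolor", "soft watercolor painting with visible brush strokes"],
  ["pixel", "retro pixel art with a limited color palette"],
]);

// Combine the user's short description with phrases derived from the options.
function buildPrompt(userInput: string, options: PromptOptions): string {
  const parts: string[] = [userInput.trim()];
  if (options.style) {
    // Fall back to the raw style name when no phrase is registered for it.
    parts.push(`Style: ${STYLE_PHRASES.get(options.style) ?? options.style}`);
  }
  if (options.composition) {
    parts.push(`Composition: ${options.composition}`);
  }
  if (options.aspectRatio) {
    parts.push(`Aspect ratio: ${options.aspectRatio}`);
  }
  // Drop empty pieces (for example, a blank user input) before joining.
  return parts.filter((part) => part.length > 0).join(". ");
}
```

In this sketch, a brief input such as "a cat" with a watercolor style and a 16:9 ratio expands into a single string that carries the style description and the aspect ratio alongside the original text.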

Google Search Integration

When generating images that involve real-world subjects (people, places, events, etc.), Mediator can use Google Search to retrieve accurate information:

How to enable: Toggle "Use Google Search" in the generation options
When to use: When your prompt involves real people, locations, historical events, or anything that benefits from accurate real-world data
How it works: The AI queries Google Search for relevant information and incorporates it into the generation process

Note

This feature uses Gemini's built-in Google Search tool, enabling "grounded" generation that references real-world facts.
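As a rough sketch of what the toggle might change, the function below builds a request body with or without the search tool. It assumes the Gemini REST API's `generateContent` body format, where grounding is requested with a `google_search` entry in `tools`; the `buildRequest` helper and the `GenerateRequest` type are hypothetical names, and Mediator's actual request code may look different.

```typescript
// Assumed subset of the Gemini generateContent request body.
interface GenerateRequest {
  contents: { parts: { text: string }[] }[];
  tools?: { google_search: Record<string, never> }[];
}

// Hypothetical helper: adds the search tool only when the toggle is on.
function buildRequest(prompt: string, useGoogleSearch: boolean): GenerateRequest {
  const request: GenerateRequest = {
    contents: [{ parts: [{ text: prompt }] }],
  };
  if (useGoogleSearch) {
    // An empty object enables the built-in tool with default settings.
    request.tools = [{ google_search: {} }];
  }
  return request;
}
```

Keeping the tool out of the body when the toggle is off means prompts that do not need real-world data are sent unchanged.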

Loop Generation

Some modes need to generate multiple images at once, such as:

Story Mode: Generates scene images for each step in sequence
Slides Mode: Generates each slide page one by one

These modes use a looping mechanism that splits your content and processes each part through the AI sequentially. Once complete, you can export everything as a PDF or other format for immediate use.
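The split-then-process-sequentially idea can be sketched as follows. This is an illustration under stated assumptions, not Mediator's implementation: `splitIntoSegments` assumes content is divided on blank lines, and `generateSequentially` takes a hypothetical `generate` callback standing in for the real AI call.

```typescript
// Hypothetical callback: takes one segment's prompt, resolves to an image reference.
type GenerateFn = (prompt: string) => Promise<string>;

// Assumption: segments are separated by one or more blank lines.
function splitIntoSegments(content: string): string[] {
  return content
    .split(/\n\s*\n/)
    .map((segment) => segment.trim())
    .filter((segment) => segment.length > 0);
}

// Process segments one at a time; each request starts only after the previous one finishes.
async function generateSequentially(
  segments: string[],
  generate: GenerateFn,
  onProgress?: (done: number, total: number) => void,
): Promise<string[]> {
  const results: string[] = [];
  for (let i = 0; i < segments.length; i++) {
    results.push(await generate(segments[i]));
    if (onProgress) {
      onProgress(i + 1, segments.length);
    }
  }
  return results;
}
```

Awaiting inside the loop keeps the results in the same order as the input, and it would let a later step build on earlier output, at the cost of not running requests in parallel.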

🌍 Multi-language Support

繁體中文 (Traditional Chinese)
English

The interface switches automatically based on your browser language.
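One way such a switch could work is sketched below. The `pickLocale` function, the locale codes `zh-TW` and `en`, the choice to map every Chinese variant to Traditional Chinese, and the English fallback are all assumptions for illustration; the source does not describe the actual matching rules.

```typescript
// Assumed locale codes for the two supported languages.
type Locale = "zh-TW" | "en";

// Walk the browser's preferred languages in order and return the first supported match.
function pickLocale(browserLanguages: readonly string[]): Locale {
  for (const language of browserLanguages) {
    const lower = language.toLowerCase();
    // Assumption: any Chinese variant maps to the Traditional Chinese interface.
    if (lower.startsWith("zh")) return "zh-TW";
    if (lower.startsWith("en")) return "en";
  }
  // Assumption: English is the fallback when nothing matches.
  return "en";
}
```

In a browser, the input would typically come from `navigator.languages`, which lists the user's preferred languages in priority order.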

🎭 Theme System

14 built-in themes including dark and light modes:

Slate Blue Pro (default dark)
Greek Blue (default light)
Warm Latte, Espresso, Mocha (coffee series)
Nord, Gruvbox, Everforest (programmer favorites)
Spring, Summer, Autumn, Winter (seasonal series)
Matcha, Matcha Dark (matcha series)

Technical Architecture

Frontend Framework: Vue 3 + Composition API
State Management: Pinia
Styling: Tailwind CSS v4
AI API: Google Gemini API + Veo 3.1 API
Storage: localStorage + IndexedDB + OPFS
OCR: PaddleOCR (ONNX Runtime Web) + Tesseract.js
Image Processing: OpenCV.js (text removal inpainting)

Next Steps

Ready to get started? Go to Getting Started to set up your API Key.
