AI
RAG
LangGraph
React Native
FastAPI
Product

How I Built DuoRead?

June 19, 202612 min read
How I Built DuoRead?

I've built many products before. But DuoRead is the first one that impressed everyone I showed it to — and the first one I actually use every single day myself.

This post is the long, honest, technical version of the story: what I built, why I built it, the exact tech stack, the architecture decisions, the problems that nearly broke me, and how it finally went live on both web and mobile (Play Store) after roughly 6 months to a year of work.

If you just want the one-liner:

DuoRead turns static books and PDFs into a dynamic AI canvas — with an AI tool bar and a side-running DuoRead Agent.

Let's get into how it was actually built.


1. The Problem I Was Trying to Solve

I read a lot. Books, novels, research PDFs, documentation — most of it in English, which isn't my native language. And every serious reading session looked the same:

  • Hit a hard word → switch to Google Translate.
  • Don't know how to say it → switch to Google Pronounce.
  • A paragraph is too dense → paste it into ChatGPT or Gemini to simplify.
  • Need context on a concept → open Wikipedia.
  • Found something worth keeping → switch to a Notes app.

By the time I came back to the book, my flow was gone. I'd lose the thread, get tired, and abandon the book halfway. Millions of people do exactly this every day.

So the thesis behind DuoRead was simple:

What if every one of those actions happened on the same screen as the text — without ever leaving the page?

That single constraint — never break the reader's flow — drove every technical decision that followed.

The DuoRead reading canvas concept The "one screen for everything" idea.


2. What DuoRead Actually Does

Before the architecture, here's the feature surface, because the stack only makes sense once you know what it has to support:

  • DuoRead Agent — a side-running chat agent that knows the book you're reading. Ask it anything.
  • Translate — translate any selection into your native language.
  • Simplify — rewrite complex sentences in plain words.
  • Summarize — compress long passages.
  • Explain — explain a concept with the surrounding context, not in a vacuum.
  • Define — instant definitions for words and terms.
  • Pronunciation & Read Aloud — hear a word, or have a whole page read to you.
  • Sticky Notes — save thoughts and highlights right on the page.
  • Dual ModesOffline Mode and Hybrid Mode.
The tool bar on a single-word selection
The tool bar on a single-word selection
The expanded bar for a multi-word selection
The expanded bar for a multi-word selection

The core interaction is this: select text → a context-aware tool bar floats up → pick an action → the result appears inline, in a panel, beside the text. The book never disappears.


3. The Architecture (Bird's-Eye View)

DuoRead is three codebases that share one backend:

┌─────────────────────┐     ┌─────────────────────┐
│   Web App (React)   │     │  Mobile (React      │
│   Vite + TS +       │     │  Native + Expo)     │
│   Tailwind + shadcn │     │  Android / Play     │
└──────────┬──────────┘     └──────────┬──────────┘
           │                           │
           │      HTTPS / SSE          │
           └─────────────┬─────────────┘
                         ▼
        ┌──────────────────────────────────┐
        │   Backend — FastAPI (Python 3.13)│
        │   • LangChain + LangGraph Agent  │
        │   • RAG over pgvector            │
        │   • JWT Auth                     │
        └───┬───────────┬──────────┬───────┘
            ▼           ▼          ▼
     ┌──────────┐ ┌──────────┐ ┌──────────────┐
     │ Postgres │ │ MongoDB  │ │ Cloudflare R2│
     │ +pgvector│ │ (agent   │ │ (book / PDF  │
     │ (RAG +   │ │ memory   │ │ file storage)│
     │  data)   │ │ checkpts)│ │              │
     └──────────┘ └──────────┘ └──────────────┘

Three clients, one brain. Let me break each layer down.


4. The Frontend — A "Canvas," Not a Reader

The web app (DuoRead-FE-C) is a React + TypeScript single-page app built with Vite (using the SWC compiler for fast builds).

Key choices:

  • Tailwind CSS + shadcn/ui (Radix UI primitives) for the entire design system — dialogs, popovers, tooltips, the floating command bar, sliders for read-aloud speed, etc. Radix gives accessibility and keyboard handling for free, which matters a lot for a tool you live inside for hours.
  • react-pdf + pdfjs-dist to render PDFs in the browser. This is where a static document becomes a selectable, interactive surface.
  • react-resizable-panels — this is the unsung hero. It's what makes the "book on one side, AI on the other" split-canvas layout draggable and resizable. The whole "side-running agent" UX depends on it.
  • TanStack Query (React Query) for all server state — caching, retries, and keeping the UI snappy while the backend thinks.
  • react-markdown + remark-gfm to render the AI's responses (summaries, explanations, agent chat) as rich, formatted text instead of a plain blob.
  • react-router-dom for navigation, react-hook-form + zod for typed, validated forms, and Google OAuth (@react-oauth/google) for sign-in.

DuoRead web app — split reading canvas The web reading canvas.

The hardest frontend problem was text selection across a PDF layer. PDF.js renders an invisible text layer over the canvas, and getting the exact selection coordinates — so the floating tool bar appears in the right place, anchored to the right words — took a lot of iteration. The single-word vs. multi-word bar you saw above are two different selection states that had to be detected reliably.


5. The Mobile App — React Native + Expo

The mobile app (DuoRead-Mobile) is built with React Native + Expo, which let me ship to Android (Play Store) from the same mental model as the web app.

The Expo modules doing the heavy lifting:

  • expo-speech → powers Pronunciation and Read Aloud. On-device text-to-speech, no network round trip needed.
  • expo-document-picker + expo-file-system → let users import their own PDFs and store/cache them locally.
  • @react-native-community/netinfo → detects connectivity, which is the trigger for Offline vs Hybrid Mode (more on that below).
  • expo-secure-store → secure token storage for auth.
  • expo-haptics → subtle tactile feedback on the tool bar (I spent a surprising amount of time tuning this — there's even a haptic feedback plan doc in the repo).
  • react-native-purchases (RevenueCat) → subscription / in-app purchase handling.
  • react-native-webview, react-native-markdown-display / react-native-render-html → rendering documents and AI markdown responses.
  • @react-native-google-signin/google-signin → native Google sign-in.
  • TanStack Query + axios + zod → shared data-fetching patterns with the web app.
Translate
Translate
Simplify
Simplify
Explain
Explain
Read Aloud
Read Aloud
Sticky Notes
Sticky Notes
Highlighting
Highlighting

6. The Backend — Where the "AI-Native" Part Lives

This is the part I'm proudest of. The backend (DuoRead-BE) is Python 3.13 + FastAPI, served by Uvicorn, fully Dockerized.

The route layer is cleanly split by feature — auth, books, chat, sticky notes, highlights, word bank, analytics, notifications, and webhooks — each backed by SQLModel over PostgreSQL, with JWT auth and slowapi rate limiting in front.

But the interesting part is the AI layer.

6.1 The DuoRead Agent — LangChain + LangGraph

The side-running agent isn't a thin wrapper around a chat completion. It's a proper LangGraph agent:

  • Built as a StateGraph with a model node and a ToolNode, wired with tools_condition so the agent can decide when to call a tool versus answer directly.
  • ChatOpenAI as the reasoning model.
  • MongoDB checkpointer (langgraph-checkpoint-mongodb) so each conversation has persistent memory — you can close the app, come back, and the agent still remembers the thread for that book.
  • Responses are streamed to the client over SSE (Server-Sent Events), so the agent feels alive — text appears token by token instead of after a long pause.

6.2 RAG — How the Agent "Knows" Your Book

This is what makes Explain and the agent context-aware instead of generic:

  1. On upload, the PDF is parsed with PyMuPDF / pypdf.
  2. Text is chunked with RecursiveCharacterTextSplitter (≈800-char chunks, ~80-char overlap).
  3. Chunks are embedded — I run sentence-transformers/all-mpnet-base-v2 (768-dim) via HuggingFace, with OpenAI embeddings also wired in.
  4. Vectors are stored in PostgreSQL with the pgvector extension, so retrieval is a similarity search right next to the relational data.
  5. At query time, the agent retrieves the most relevant chunks and grounds its answer in your actual book — not the open internet.

That's why "Explain this concept" gives you an explanation rooted in the paragraph you're reading, with the surrounding context, instead of a Wikipedia-flavored guess.

6.3 Supporting Infrastructure

  • Cloudflare R2 (via boto3) → object storage for book/PDF files (S3-compatible, no egress fees).
  • Firebase Admin → push notifications to mobile.
  • Brevo → transactional email for verification OTPs.
  • langdetect → detecting the source language so Translate/Simplify behave correctly.
  • Deployed as a container (the Dockerfile uses uv for fast, reproducible dependency installs).

7. Offline Mode vs Hybrid Mode

One of DuoRead's differentiators is the mode toggle, and it's a deliberate engineering trade-off:

  • Hybrid Mode — the full experience. The cloud agent, RAG, summarize, explain, and the side-running chat are all live. This is where the heavy AI reasoning happens.
  • Offline Mode — built for reading without a reliable connection. Locally cached documents, on-device features like pronunciation and read-aloud (via expo-speech), highlighting, and sticky notes keep working. netinfo detects the network state so the app degrades gracefully instead of just breaking.

The principle: reading should never depend on a signal bar. AI augments the experience when you're connected; it doesn't hold the basics hostage when you're not.


8. The Problems That Nearly Broke Me

No honest build story is just a victory lap. The hard parts:

  1. PDF text selection & coordinate math. Anchoring a floating tool bar to an exact selection — across PDF.js's text layer on web and a totally different rendering path on React Native — was the single most time-consuming UI problem. Single-word, multi-word, and cross-line selections all behave differently.

  2. Streaming everywhere. Making the agent stream over SSE through FastAPI, then consume that stream cleanly on both a React web client and a React Native client (which handle fetch/streaming very differently), took real plumbing. There are dedicated streaming guides in the repo because I had to document it for myself.

  3. RAG quality vs. cost. Chunk size, overlap, embedding model choice, and how many chunks to retrieve are all knobs that trade accuracy against latency and cost. Getting "Explain" to feel instant and grounded meant tuning all of them.

  4. Agent memory at scale. Persisting LangGraph state per-conversation in MongoDB — so memory survives restarts without bloating — needed the checkpointer wired carefully.

  5. One product, three clients. Keeping web, mobile, and backend in sync — shared validation (zod), shared data patterns (TanStack Query + axios), consistent auth — is an ongoing discipline, not a one-time setup.

  6. Shipping to the Play Store. Build signing, store assets, screenshots, review compliance, and subscriptions (RevenueCat) are their own project on top of the engineering.


9. It's Live — Web + Play Store

After roughly 6 months to a year of building, breaking, and rebuilding, DuoRead is live on both the web and on Android via the Play Store.

It's the first product I've built that I open every single day — not to test it, but to actually read.

DuoRead — live on web and Play Store


10. The Stack, At a Glance

LayerTechnology
WebReact, TypeScript, Vite (SWC), Tailwind CSS, shadcn/ui (Radix), TanStack Query, react-pdf / pdfjs-dist, react-resizable-panels, react-markdown
MobileReact Native, Expo, expo-speech, expo-document-picker, expo-secure-store, netinfo, RevenueCat, Google Sign-In
BackendPython 3.13, FastAPI, Uvicorn, JWT, slowapi, Docker (uv)
AI / AgentLangChain, LangGraph (StateGraph + ToolNode), OpenAI, SSE streaming, MongoDB checkpointer
RAGPyMuPDF/pypdf, RecursiveCharacterTextSplitter, HuggingFace + OpenAI embeddings, PostgreSQL + pgvector
Data & InfraPostgreSQL (SQLModel), MongoDB, Cloudflare R2, Firebase (push), Brevo (email)

Try It & Tell Me What You Think

If you read books or push through a lot of documents — like millions of other people — DuoRead was built for you.

I'd genuinely love your feedback:

  • What's good?
  • What's bad?
  • What should I improve?
  • What should I add next?

Drop a comment or reach out. This is the product I always wanted to exist — and I'm just getting started.

Zain

Muhammad Zain Attiq

Agentic AI Engineer • AI Agents • Freelancer

Back to all posts