ai · web · backend · cv · Featured

ClearSheet

Document scanning and OCR web app that detects a page inside a noisy phone photo, corrects perspective, generates a clean scan-style result, and extracts text in French, English, or both.


Timeline

2026-04 — 2026-04

Technologies

Python, FastAPI, OpenCV, NumPy, pytesseract, Tesseract OCR, pytest, Next.js 16, React 19, Tailwind CSS 4, Material UI, Framer Motion, Docker, Netlify, Hugging Face Spaces

Overview

Turning messy document photos into clean digital scans

ClearSheet is a full-stack document digitization product built to take a raw phone photo of a page, detect the document automatically, flatten it, enhance it into a scan-style image, and extract text with OCR. The project combines a modular OpenCV pipeline, a FastAPI backend, and a polished Next.js frontend for upload, review, and export.

Pipeline: 4-stage (preprocess -> detect -> warp -> OCR)

OCR: FR + EN (French, English, or bilingual extraction)

Scan modes: clean, balanced, ocr-optimized

Export: PNG + PDF (PNG and searchable PDF)

Processing Pipeline

Computer vision tuned for real-world document photos

The backend is intentionally split into small stages so each part of the workflow is testable and easy to evolve. Preprocessing prepares edges and a document mask, contour detection searches for a stable four-point page boundary, the transformer warps and auto-rotates the sheet, and post-processing builds the final scan before OCR ranking chooses the strongest text result.
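As a rough illustration of the hand-off between detection and the transformer, the sketch below orders four detected corners and derives the size of the flattened page. It is not ClearSheet's actual code: the names `order_corners` and `target_size` are hypothetical, and the sum/difference heuristic is a common convention that assumes the page is rotated by less than roughly 45 degrees in the photo.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def order_corners(points: Sequence[Point]) -> List[Point]:
    """Return corners as [top-left, top-right, bottom-right, bottom-left].

    Image coordinates are assumed: x grows to the right, y grows downward.
    """
    if len(points) != 4:
        raise ValueError("expected exactly four corner points")
    # x + y is smallest at the top-left corner and largest at the bottom-right.
    top_left = min(points, key=lambda p: p[0] + p[1])
    bottom_right = max(points, key=lambda p: p[0] + p[1])
    # x - y is largest at the top-right corner and smallest at the bottom-left.
    top_right = max(points, key=lambda p: p[0] - p[1])
    bottom_left = min(points, key=lambda p: p[0] - p[1])
    return [top_left, top_right, bottom_right, bottom_left]


def target_size(ordered: Sequence[Point]) -> Tuple[int, int]:
    """Pick the output (width, height) from the longer of each pair of edges."""
    tl, tr, br, bl = ordered
    width = max(math.dist(tl, tr), math.dist(bl, br))
    height = max(math.dist(tl, bl), math.dist(tr, br))
    return round(width), round(height)


corners = order_corners([(100, 0), (0, 0), (0, 50), (100, 50)])
size = target_size(corners)
```

With OpenCV, the ordered corners and the target size would typically be passed to `cv2.getPerspectiveTransform` and `cv2.warpPerspective`. Keeping the ordering in a small pure function is one way to make this stage easy to unit test without image fixtures.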

Preprocessing normalizes contrast, applies blur/Canny, and builds a cleanup mask for document candidates.
Document detection prefers a convex quadrilateral, then falls back to minAreaRect when the page edges are imperfect.
Perspective correction orders corners, warps the page, and estimates residual tilt with Hough-line based auto-rotation.
Three scan profiles (clean, balanced, ocr-optimized) tune CLAHE, sharpening, thresholding, and text expansion differently.
OCR is evaluated across multiple image variants and Tesseract configs, then ranked by confidence and useful content length.
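The ranking step in the last item can be sketched as a scoring function over OCR candidates. This is a minimal illustration under stated assumptions, not the project's implementation: the `OcrCandidate` type, the 0.7 confidence weight, and the 400-character cap are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class OcrCandidate:
    variant: str       # label of the image variant, e.g. "binary" (hypothetical)
    config: str        # Tesseract options used for this run, e.g. "--psm 6"
    text: str          # recognized text
    confidence: float  # mean word confidence on a 0..100 scale


def useful_length(text: str) -> int:
    """Count letters and digits only, so noise and whitespace do not help."""
    return sum(ch.isalnum() for ch in text)


def score(candidate: OcrCandidate,
          length_cap: int = 400,
          confidence_weight: float = 0.7) -> float:
    """Blend normalized confidence and capped content length into one score."""
    length_term = min(useful_length(candidate.text), length_cap) / length_cap
    conf_term = max(0.0, min(candidate.confidence, 100.0)) / 100.0
    return confidence_weight * conf_term + (1.0 - confidence_weight) * length_term


def pick_best(candidates: Sequence[OcrCandidate]) -> OcrCandidate:
    """Return the highest-scoring candidate."""
    if not candidates:
        raise ValueError("no OCR candidates to rank")
    return max(candidates, key=score)


candidates = [
    OcrCandidate("grayscale", "--psm 6", "Hi", 95.0),
    OcrCandidate("binary", "--psm 6", "a" * 300, 82.0),
    OcrCandidate("binary", "--psm 11", "b" * 1000, 40.0),
]
best = pick_best(candidates)
```

Capping the length term keeps a long but low-confidence result from outranking a shorter, cleaner one, while still penalizing a confident run that recovered almost no text.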

Figure: ClearSheet scanning workbench, showing the frontend workbench and export flow.

User Experience

Built as a complete scanning workflow, not just a model demo

The frontend wraps the backend with a practical workspace: upload a source image, choose OCR languages and scan mode, inspect optional debug snapshots, review the transcript, and export the result without leaving the page.

Upload a raw photo from desktop or mobile and send it to the FastAPI scan endpoint.
Choose fra, eng, or fra+eng plus the scan profile that best fits readability or OCR.
Inspect optional debug images for edges, contour detection, and the warped document.
Review extracted text, OCR confidence, and tokenized text geometry returned by the API.
Download the cleaned scan as PNG or generate a searchable PDF directly in the browser.
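The language and scan-mode choices in the list above lend themselves to validation before a scan is requested. The sketch below is an assumption about how such a check could look, not the API's real contract: the function names and the `langs`, `mode`, and `debug` field names are hypothetical. Only the `fra`, `eng`, and `fra+eng` values and the three scan modes come from the project description.

```python
from typing import Dict, Union

ALLOWED_LANGS = ("fra", "eng")
SCAN_MODES = ("clean", "balanced", "ocr-optimized")


def normalize_langs(raw: str) -> str:
    """Turn input such as 'ENG + fra' into a canonical string like 'fra+eng'."""
    parts = [p.strip().lower() for p in raw.split("+") if p.strip()]
    if not parts:
        raise ValueError("at least one OCR language is required")
    unknown = [p for p in parts if p not in ALLOWED_LANGS]
    if unknown:
        raise ValueError(f"unsupported OCR language(s): {unknown}")
    # Deduplicate and keep a fixed order so equivalent requests compare equal.
    return "+".join(lang for lang in ALLOWED_LANGS if lang in parts)


def build_scan_options(langs: str,
                       mode: str = "balanced",
                       debug: bool = False) -> Dict[str, Union[str, bool]]:
    """Validate the user's choices and return them as a plain dict."""
    if mode not in SCAN_MODES:
        raise ValueError(f"unknown scan mode: {mode!r}")
    return {"langs": normalize_langs(langs), "mode": mode, "debug": debug}


options = build_scan_options("ENG + fra", mode="ocr-optimized", debug=True)
```

The `+` separator mirrors how Tesseract itself accepts multiple languages, so the normalized string could be handed to the OCR layer unchanged.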

Searchable PDF export

The PDF exporter does more than place an image on a page: it overlays invisible OCR tokens on top of the scan, so the text in the exported PDF can be selected, copied, and searched.
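The core of such an overlay is coordinate mapping: each OCR token box, measured in image pixels from the top-left corner, has to land on the matching spot of the PDF page. The sketch below shows only that arithmetic, in Python for consistency with the backend. ClearSheet's exporter runs in the browser, so the `Token` fields and the `token_to_pdf_box` helper are hypothetical, and a given PDF library may use a different origin than the bottom-left default of PDF user space assumed here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Token:
    text: str
    left: int    # pixels from the left edge of the scan
    top: int     # pixels from the top edge of the scan
    width: int   # box width in pixels
    height: int  # box height in pixels


def token_to_pdf_box(token: Token,
                     image_size: Tuple[int, int],
                     page_size: Tuple[float, float]) -> Tuple[float, float, float, float]:
    """Map a pixel box to (x, y, width, height) in PDF points.

    Assumes the scan fills the whole page and that the PDF origin is the
    bottom-left corner, so the vertical axis has to be flipped.
    """
    img_w, img_h = image_size
    page_w, page_h = page_size
    sx = page_w / img_w
    sy = page_h / img_h
    x = token.left * sx
    # y is the distance from the page bottom to the bottom edge of the box.
    y = page_h - (token.top + token.height) * sy
    return (x, y, token.width * sx, token.height * sy)


box = token_to_pdf_box(Token("Bonjour", 100, 200, 300, 50),
                       image_size=(1000, 2000),
                       page_size=(500.0, 1000.0))
```

Each token's text would then be drawn at that box without visible ink (PDF text rendering mode 3 is the usual mechanism), with the font size chosen so the string roughly spans the box width.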

Delivery & QA

Shipped with deployment and testing in mind

The backend is organized into dedicated preprocessing, detection, transform, post-processing, and settings modules instead of one monolithic script.
Settings auto-detect local Tesseract and bundled tessdata paths, which makes Windows development smoother and Docker deployment cleaner.
docker compose up --build starts the frontend and backend together for a reproducible full-stack local environment.
Backend tests cover health and scan endpoints, invalid uploads, document-not-found cases, transformer behavior, post-processing, debug images, and optional OCR integration.
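The path auto-detection mentioned for the settings module could follow a lookup order like the one sketched here. This is an assumption about the approach rather than the project's code: `find_tesseract`, `find_tessdata`, and the `TESSERACT_CMD` variable are hypothetical names, while `TESSDATA_PREFIX` and the `<lang>.traineddata` file naming are standard Tesseract conventions.

```python
import os
import shutil
from pathlib import Path
from typing import Iterable, Optional


def find_tesseract(extra_dirs: Iterable[str] = ()) -> Optional[str]:
    """Locate the tesseract executable: explicit override, PATH, then extra dirs."""
    override = os.environ.get("TESSERACT_CMD")  # hypothetical override variable
    if override and Path(override).is_file():
        return override
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Fall back to known install folders, e.g. a Windows Program Files directory.
    for directory in extra_dirs:
        for name in ("tesseract.exe", "tesseract"):
            candidate = Path(directory) / name
            if candidate.is_file():
                return str(candidate)
    return None


def find_tessdata(bundled_dir: str,
                  languages: Iterable[str] = ("fra", "eng")) -> Optional[Path]:
    """Return the first tessdata directory that holds every required language."""
    languages = tuple(languages)
    candidates = []
    prefix = os.environ.get("TESSDATA_PREFIX")
    if prefix:
        candidates.append(Path(prefix))
    candidates.append(Path(bundled_dir))
    for directory in candidates:
        if all((directory / f"{lang}.traineddata").is_file() for lang in languages):
            return directory
    return None
```

Returning `None` instead of raising lets the caller decide whether a missing OCR engine is fatal, which fits a test suite where OCR integration is optional.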

