WJW Analytics Platform

The Problem
The client needs to digitize and archive analog files and technical drawings held on paper and microfiche in a structured way. Beyond digitization itself, the client requires automated text recognition and data storage that can be searched and analyzed.
The Solution
A containerized full-stack platform was developed that covers the entire process, from document capture through OCR-based text extraction to structured storage. The FastAPI backend handles document processing, text recognition, automatic content extraction, and persistent storage in a database and on the file system. AI-supported analysis backed by a vector database enables semantic search and document comparison. The Angular frontend provides a user-friendly interface for project management, uploading entire folder structures, monitoring OCR status in real time, and reviewing, editing, and exporting results.
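The capture, OCR, and storage flow described above can be illustrated with a minimal sketch. It is not the platform's actual code. To stay dependency-free it uses Python's standard-library sqlite3 module in place of SQLAlchemy, and the OCR engine is passed in as a plain callable. The table layout, the status values (pending, processing, done, failed), and all function names are assumptions made for illustration.

```python
import sqlite3
from typing import Callable, List

# Hypothetical schema: one row per scanned file, with an OCR status column
# that a frontend could poll to show progress.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id     INTEGER PRIMARY KEY,
    path   TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    text   TEXT
)
"""


def init_db(conn: sqlite3.Connection) -> None:
    """Create the documents table if it does not exist yet."""
    conn.execute(SCHEMA)
    conn.commit()


def register_document(conn: sqlite3.Connection, path: str) -> int:
    """Record an uploaded file as 'pending' and return its row id."""
    cur = conn.execute("INSERT INTO documents (path) VALUES (?)", (path,))
    conn.commit()
    return cur.lastrowid


def set_status(conn: sqlite3.Connection, doc_id: int, status: str) -> None:
    """Persist a status change so it is visible to other readers."""
    conn.execute("UPDATE documents SET status = ? WHERE id = ?", (status, doc_id))
    conn.commit()


def run_ocr(conn: sqlite3.Connection, doc_id: int, ocr: Callable[[str], str]) -> str:
    """Run the supplied OCR callable on one document and store the outcome.

    Returns the final status: 'done' on success, 'failed' if the OCR
    callable raises an exception.
    """
    row = conn.execute("SELECT path FROM documents WHERE id = ?", (doc_id,)).fetchone()
    if row is None:
        raise KeyError(doc_id)
    set_status(conn, doc_id, "processing")
    try:
        text = ocr(row[0])
    except Exception:
        set_status(conn, doc_id, "failed")
        return "failed"
    conn.execute(
        "UPDATE documents SET status = 'done', text = ? WHERE id = ?",
        (text, doc_id),
    )
    conn.commit()
    return "done"


def search(conn: sqlite3.Connection, term: str) -> List[str]:
    """Return paths of successfully processed documents whose text contains term."""
    rows = conn.execute(
        "SELECT path FROM documents WHERE status = 'done' AND text LIKE ? ORDER BY id",
        (f"%{term}%",),
    ).fetchall()
    return [r[0] for r in rows]
```

In use, a caller would open a connection with `sqlite3.connect(...)`, call `init_db`, register each uploaded file, and then call `run_ocr` with whatever OCR function is available. The `search` helper only does plain substring matching; the semantic search mentioned above would instead query embeddings in the vector database.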
Tech Stack
FastAPI
Python
SQLAlchemy
SQLite
ChromaDB
Docker

