The surprisingly complex journey to text-selectable client-side generated PDFs
Generating text-selectable PDFs client-side is surprisingly hard, as SmallDocs author FailMore meticulously details in this post about the journey to achieving it. The article uncovers the fundamental differences between browser rendering and PDF structure, revealing why common solutions fall short for privacy-first applications. It's a classic HN tale of a deceptively simple problem requiring a bespoke, technically intricate solution.
The Lowdown
The seemingly simple task of generating client-side, text-selectable PDFs proved unexpectedly complex for SmallDocs, a privacy-first Markdown reader. This article details the arduous journey to overcome the limitations of existing solutions to meet stringent privacy and rendering fidelity requirements.
- SmallDocs required client-side PDF generation due to sensitive user data, coupled with easy download, reliably selectable text, and faithful reproduction of its rich Markdown styling.
- Understanding PDF fundamentals reveals them as "sets of instructions" for drawing glyphs, requiring a separate `ToUnicode CMap` for text selection and copying, which maps glyph shapes back to characters.
- Standard approaches like server-side generation (violates privacy), `window.print()` (poor user experience), `html2pdf.js` (produces unselectable images), and existing JS libraries (`jsPDF`, `pdfmake`, `pdf-lib` without a custom layout) each failed to meet SmallDocs' core requirements.
- The eventual solution combined `pdf-lib` with a custom layout engine, leveraging the finite scope of Markdown elements and the assistance of LLMs like Claude Code for styling and positioning logic.
- The custom engine had to translate browser-specific rendering concepts (pixels to points, inverted Y-axis for positioning) into PDF language and manage page breaks dynamically.
- Key challenges ("gotchas") included ensuring `ToUnicode CMap` accuracy for modern fonts with glyph substitutions (solved by `subset: true` in `pdf-lib`), treating links as separate PDF annotations, and programmatically rendering background styles for elements like `code` blocks.
- SmallDocs successfully achieved its goal of generating highly accurate, text-selectable, client-side PDFs.
- The article concludes by highlighting SmallDocs' broader mission as a developer-friendly, open-source "Office Suite" for the CLI-based agent world, offering private Markdown rendering, sharing, and agent feedback features.
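The browser-to-PDF translation mentioned in the list (pixels to points, the inverted Y-axis, and dynamic page breaks) can be sketched in a few lines. This is a minimal illustration, not SmallDocs' actual engine: the names `pxToPt`, `flipY`, `layoutBlocks`, and `Placement` are hypothetical, and it assumes the usual conventions of 96 CSS pixels per inch, 72 PDF points per inch, and a PDF origin at the bottom-left of the page.

```typescript
// Hypothetical helpers for illustration; none of these names come from
// SmallDocs or pdf-lib.

// CSS defines 96 px per inch; PDF user space defaults to 72 pt per inch.
const PX_TO_PT = 72 / 96;

interface Placement {
  page: number; // zero-based page index
  x: number;    // points from the left edge of the page
  y: number;    // points from the BOTTOM edge (PDF origin is bottom-left)
}

function pxToPt(px: number): number {
  return px * PX_TO_PT;
}

// Browsers measure downward from the top; PDF measures upward from the
// bottom. Returns the y coordinate of the bottom edge of a box.
function flipY(topPt: number, heightPt: number, pageHeightPt: number): number {
  return pageHeightPt - topPt - heightPt;
}

// Stack blocks top to bottom, starting a new page whenever the next block
// would cross the bottom margin. A block taller than the usable area is
// placed on its own page and allowed to overflow (not handled here).
function layoutBlocks(
  heightsPx: number[],
  pageHeightPt: number,
  marginPt: number,
): Placement[] {
  const placements: Placement[] = [];
  let page = 0;
  let cursorTop = marginPt; // distance from the top of the current page

  for (const heightPx of heightsPx) {
    const heightPt = pxToPt(heightPx);
    const fits = cursorTop + heightPt <= pageHeightPt - marginPt;
    const pageIsEmpty = cursorTop === marginPt;

    if (!fits && !pageIsEmpty) {
      page += 1;
      cursorTop = marginPt;
    }

    placements.push({
      page,
      x: marginPt,
      y: flipY(cursorTop, heightPt, pageHeightPt),
    });
    cursorTop += heightPt;
  }
  return placements;
}
```

For example, on a US Letter page (792 pt tall) with 72 pt margins, three blocks of 400 px each become 300 pt each: the first two fit on the first page and the third starts a new one. A real engine would also need to split long paragraphs and code blocks across pages, which this sketch leaves out.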
This detailed technical account showcases the depth of effort required for seemingly straightforward web features and reinforces SmallDocs' commitment to privacy-first, high-fidelity experiences, positioning it as a potential future standard for Markdown interaction.