
The surprisingly complex journey to text-selectable client-side generated PDFs

Generating text-selectable PDFs client-side is surprisingly hard, and this post by SmallDocs author FailMore details the journey to achieving it. The article explains the fundamental differences between browser rendering and PDF structure, and why common solutions fall short for privacy-first applications. It's a classic HN tale of a deceptively simple problem requiring a bespoke, technically intricate solution.

Score: 7 | Comments: 1 | Highest Rank: #8 | Time on Front Page: 3h
First Seen: May 8, 9:00 AM | Last Seen: May 8, 11:00 AM

The Lowdown

The seemingly simple task of generating client-side, text-selectable PDFs proved unexpectedly complex for SmallDocs, a privacy-first Markdown reader. This article details the arduous journey to overcome the limitations of existing solutions to meet stringent privacy and rendering fidelity requirements.

  • SmallDocs required client-side PDF generation due to sensitive user data, coupled with easy download, reliably selectable text, and faithful reproduction of its rich Markdown styling.
  • At their core, PDFs are "sets of instructions" for drawing glyphs. Selecting and copying text requires a separate ToUnicode CMap, which maps the drawn glyphs back to Unicode characters.
  • Standard approaches each failed to meet SmallDocs' core requirements: server-side generation (violates privacy), window.print() (poor user experience), html2pdf.js (produces unselectable images), and existing JS libraries (jsPDF, pdfmake, or pdf-lib without a custom layout engine).
  • The eventual solution combined pdf-lib with a custom layout engine, leveraging the finite scope of Markdown elements and the assistance of LLMs like Claude Code for styling and positioning logic.
  • The custom engine had to translate browser rendering concepts into PDF terms (converting pixels to points, and flipping the Y-axis, since PDF coordinates start at the bottom-left of the page) and manage page breaks dynamically.
  • Key challenges ("gotchas") included ensuring ToUnicode CMap accuracy for modern fonts with glyph substitutions (solved by subset: true in pdf-lib), treating links as separate PDF annotations, and programmatically rendering background styles for elements like code blocks.
  • SmallDocs successfully achieved its goal of generating highly accurate, text-selectable, client-side PDFs.
  • The article concludes by highlighting SmallDocs' broader mission as a developer-friendly, open-source "Office Suite" for the CLI-based agent world, offering private Markdown rendering, sharing, and agent feedback features.
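
The coordinate translation and page-break handling described above can be sketched in a few lines of TypeScript. This is a minimal illustration, not SmallDocs' actual engine: the function names, the US Letter page size, and the 54 pt margin are all assumptions made for the example. It relies only on two well-known facts: CSS defines 96 px per inch while PDF uses 72 pt per inch, and PDF's default origin is the bottom-left corner of the page.

```typescript
// CSS defines 96 px per inch; PDF user space uses 72 pt per inch.
const PX_TO_PT = 72 / 96;

// Assumed page geometry for this sketch: US Letter height with a 0.75 in margin.
const PAGE_HEIGHT_PT = 792;
const MARGIN_PT = 54;

// Hypothetical input: a rendered element's measured height in CSS pixels.
interface Block {
  heightPx: number;
}

// Hypothetical output: where the block lands in the PDF.
interface PlacedBlock {
  page: number;     // zero-based page index
  yPt: number;      // PDF y of the block's bottom edge (origin at bottom-left)
  heightPt: number; // block height in points
}

function pxToPt(px: number): number {
  return px * PX_TO_PT;
}

// Browser y grows downward from the top edge; PDF y grows upward from the
// bottom edge, so a top offset must be flipped against the page height.
function topDownToPdfY(
  topPt: number,
  heightPt: number,
  pageHeightPt: number = PAGE_HEIGHT_PT,
): number {
  return pageHeightPt - topPt - heightPt;
}

// Stack blocks vertically, starting a new page whenever the next block
// would cross the bottom margin.
function layoutBlocks(blocks: Block[]): PlacedBlock[] {
  const placed: PlacedBlock[] = [];
  const bottomLimit = PAGE_HEIGHT_PT - MARGIN_PT;
  let page = 0;
  let cursorTop = MARGIN_PT; // distance from the top of the page, in points

  for (const block of blocks) {
    const heightPt = pxToPt(block.heightPx);
    // Break only if the page already has content; an oversized block on an
    // empty page is placed anyway rather than looping forever.
    if (cursorTop + heightPt > bottomLimit && cursorTop > MARGIN_PT) {
      page += 1;
      cursorTop = MARGIN_PT;
    }
    placed.push({ page, yPt: topDownToPdfY(cursorTop, heightPt), heightPt });
    cursorTop += heightPt;
  }
  return placed;
}
```

For example, three blocks that are each 400 px tall convert to 300 pt each; the first two fit on page 0 and the third is pushed to page 1. A real engine would also need to split blocks that are taller than a page and handle horizontal positioning, which this sketch leaves out.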

This detailed technical account showcases the depth of effort required for seemingly straightforward web features and reinforces SmallDocs' commitment to privacy-first, high-fidelity experiences, positioning it as a potential future standard for Markdown interaction.