Fine-tuning an LLM to write docs like it's 1995

This article details an intriguing experiment in fine-tuning large language models (LLMs) to mimic the writing style of 1990s technical documentation. Driven by a prediction that future tech writers will use specialized local LLMs, the author embarked on a practical project to see if an instruct model could be trained to adopt this specific retro voice.

Data Acquisition: The author leveraged Bitsavers, a vast archive of old computer manuals, specifically the Microsoft collection spanning 1977-2005 (over 37 million words), as the primary training corpus.
Data Preparation: The downloaded OCR'd text underwent a two-pass cleaning process using Python scripts to remove clutter, followed by a filtering step with a cheap LLM (gemma-4-26b) via OpenRouter to ensure paragraph intelligibility, costing approximately $8.
Training Setup: The cleaned text was segmented into training examples, each capped at 512 tokens and paired with synthetic instructions, yielding nearly 200,000 JSONL examples.
Methodology Choice: Fine-tuning via QLoRA (Quantized Low-Rank Adaptation) was chosen over RAG, as the goal was style transfer rather than fact retrieval. QLoRA uses small "adapters" to steer the model's behavior, with quantization reducing memory requirements.
Computational Resources: To manage the computational demands and cost, the author utilized Runpod, an online service providing GPU access, successfully training the adapters for around $50.
Models and Parameters: The experiment involved fine-tuning Llama 3.1 8B Instruct and Qwen 2.5 7B Instruct models, testing various conditions like training data volume, epochs, and adapter rank.
Style Transfer Validation: Models were tested with prompts for known functions (malloc()), fictitious ones (ConnectWifi()), and anachronistic explanations (Explain REST API in 1990s Microsoft style).
Key Findings: Fine-tuned models, particularly Qwen, effectively adopted the 90s documentation structure and tone, even for anachronistic concepts. Interestingly, smaller adapters (lower rank) committed more readily to the impersonation, while base models completely failed. The experiment successfully demonstrated that fine-tuned LLMs can become convincing impersonators of a specific writing style, making them potentially valuable tools for stylistic tasks within technical writing. However, the author cautions that the process demands high-quality data, careful model selection, and meticulous parameter tuning, concluding that these AI tools serve as augmentation for human writers rather than outright replacements.

Fine-tuning an LLM to write docs like it's 1995

The Lowdown