Senior SWE-Bench: open-source benchmark that assesses agents as senior engineers

This document outlines a comprehensive proposal to enhance Open Library's BookWorm metadata ingestion system by integrating Google Books as a supplementary data source. It details the current challenges with incomplete or malformed book metadata, particularly for ISBN-13s, and presents a structured plan to leverage Google Books to improve data quality and user experience.

Problem Statement: BookWorm's current reliance on Amazon and ISBNdb leads to missing or incomplete metadata, causing failed imports and poor-quality entries in Open Library, especially for less common titles.
Justification: Integrating Google Books will enrich edition data, reduce import failures, and boost user trust. Success will be measured by higher import success rates and fewer placeholder entries.
Success Metrics: The solution will be deemed successful when BookWorm can fetch and stage metadata from Google Books using ISBN-13, with automated tests confirming accurate parsing of various Google Books responses and proper handling of edge cases.
Technical Proposal: The plan introduces Google Books as a fallback provider when Amazon lookups fail. Key requirements include updating STAGED_SOURCES, ensuring correct URL formatting for staging, extending source_records rather than replacing them, implementing a stage_from_google_books function, and logging a warning when a lookup returns multiple Google Books results.
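The fallback order and the extend-not-replace rule for source_records can be sketched in Python, the language of scripts/affiliate_server.py. This is a minimal sketch under stated assumptions: the helper names `lookup_with_fallback` and `extend_source_records` and the injected lookup callables are hypothetical illustrations, not the proposal's actual interfaces.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)

# Hypothetical type: a lookup takes an ISBN-13 and returns a metadata dict,
# or None when the provider has nothing usable.
Lookup = Callable[[str], Optional[dict]]


def extend_source_records(existing: list[str], new: list[str]) -> list[str]:
    """Append new source records, keeping existing ones and skipping duplicates."""
    merged = list(existing)
    for record in new:
        if record not in merged:
            merged.append(record)
    return merged


def lookup_with_fallback(isbn_13: str, amazon: Lookup, google_books: Lookup) -> Optional[dict]:
    """Try Amazon first; query Google Books only when Amazon returns nothing."""
    metadata = amazon(isbn_13)
    if metadata is not None:
        return metadata
    logger.info("Amazon lookup failed for %s; trying Google Books", isbn_13)
    return google_books(isbn_13)
```

Passing the lookups in as callables keeps the sketch testable without network access; the real workers may wire the providers together differently.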
Data Fields: Specific metadata fields to be parsed and staged from Google Books responses include isbn_10, isbn_13, title, subtitle, authors, source_records, publishers, publish_date, number_of_pages, and description.
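The field list above could map onto a Google Books volumes response roughly as follows. This is a sketch under stated assumptions: it reads the public Google Books API fields (`volumeInfo.title`, `subtitle`, `authors`, `publisher`, `publishedDate`, `pageCount`, `description`, `industryIdentifiers`), assumes Open Library's `{"name": ...}` author shape, uses a hypothetical `google_books:<isbn_13>` source-record format, and, when several items come back, logs a warning and keeps only the first.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def process_google_book(response: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Map a Google Books volumes response to staging fields (sketch)."""
    items = response.get("items") or []
    if not items:
        return None
    if len(items) > 1:
        # Assumption: warn and continue with the first match.
        logger.warning("Google Books returned %d results; using the first", len(items))
    info = items[0].get("volumeInfo", {})

    # industryIdentifiers looks like [{"type": "ISBN_13", "identifier": "978..."}].
    isbns = {i.get("type"): i.get("identifier") for i in info.get("industryIdentifiers", [])}
    isbn_10 = isbns.get("ISBN_10")
    isbn_13 = isbns.get("ISBN_13")

    return {
        "isbn_10": [isbn_10] if isbn_10 else [],
        "isbn_13": [isbn_13] if isbn_13 else [],
        "title": info.get("title", ""),
        "subtitle": info.get("subtitle", ""),
        "authors": [{"name": name} for name in info.get("authors", [])],
        # Hypothetical record format; the prefix BookWorm actually uses may differ.
        "source_records": [f"google_books:{isbn_13}"] if isbn_13 else [],
        "publishers": [info["publisher"]] if info.get("publisher") else [],
        "publish_date": info.get("publishedDate", ""),
        "number_of_pages": info.get("pageCount"),
        "description": info.get("description", ""),
    }
```

Missing fields fall back to empty values rather than raising, which is one way to cover the edge cases the success metrics call for.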
New Interfaces: The proposal introduces several new public functions and classes in scripts/affiliate_server.py, such as fetch_google_book, process_google_book, and stage_from_google_books, as well as modifications to BaseLookupWorker and AmazonLookupWorker, and documents the inputs, outputs, and purpose of each.

By implementing these changes, Open Library aims to build a more robust and reliable system for acquiring and enriching book metadata, improving the completeness and accuracy of its catalog for users worldwide.
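A fetch_google_book along these lines might look like the sketch below, assuming it queries the public Google Books volumes endpoint by ISBN using only the standard library. The `google_books_url` helper, the timeout default, and the choice to return `None` on network or decoding errors are assumptions for illustration, not details taken from the proposal.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional

GOOGLE_BOOKS_VOLUMES = "https://www.googleapis.com/books/v1/volumes"


def google_books_url(isbn: str) -> str:
    """Build a Google Books volumes query restricted to a single ISBN."""
    return f"{GOOGLE_BOOKS_VOLUMES}?{urllib.parse.urlencode({'q': f'isbn:{isbn}'})}"


def fetch_google_book(isbn: str, timeout: float = 10.0) -> Optional[dict[str, Any]]:
    """Return the decoded JSON response, or None on a network or decode error."""
    try:
        with urllib.request.urlopen(google_books_url(isbn), timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):  # URLError/timeouts are OSError; bad JSON is ValueError
        return None
```

Returning the raw JSON keeps fetching separate from parsing, which matches the proposal's split between fetch_google_book and process_google_book.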

The Lowdown