Show HN: CLI tool for detecting non-exact code duplication with embedding models
Slopo is a new CLI tool designed to identify non-exact code duplication in large codebases using embedding models, tackling the hard-to-detect semantic similarities that often escape traditional tools. This project resonates with HN's interest in developer productivity and leverages cutting-edge AI techniques for code analysis. Its focus on enabling AI agents to verify and refactor detected clusters positions it as a modern solution to a long-standing software engineering challenge.
The Lowdown
Slopo is a lightweight command-line interface (CLI) tool that specializes in detecting non-exact code duplication using embedding models. Unlike tools that find exact copy-pastes, Slopo targets similar code snippets that might be structurally different but semantically identical, often residing far apart in a codebase.
- Core Functionality: It calculates embeddings for individual code units and identifies pairs with high similarity, ranking them by both semantic resemblance and distance within the codebase.
- Targeted Duplication: It focuses on the most challenging duplicates to find: those that are similar but not identical, spread across modules, or separated by significant code.
- AI Integration: The tool generates clusters of potential duplicates, intended as input for AI coding agents to verify, filter (e.g., by marking false positives in slopo.ignore.txt), and ultimately refactor.
- Supported Languages: Currently supports Python, TypeScript, JavaScript, Java, Kotlin, C#, Go, and Rust; according to the author, more languages such as PHP are easy to add.
- Configurability: Users can configure embedding models (compatible with LiteLLM), set thresholds for similarity and reranking, and exclude specific directories or file patterns.
- Workflow: It supports an iterative workflow, allowing users to analyze, review results with AI agents, ignore false positives, and then re-analyze to focus on remaining issues.
By focusing on semantic rather than syntactic duplication and integrating with AI-powered workflows, Slopo aims to help developers maintain cleaner, more efficient codebases by surfacing hidden redundancies that traditional methods often miss.
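To make the "Core Functionality" bullet concrete, here is a minimal sketch of the general technique: compare embeddings pairwise, keep pairs above a similarity threshold, and rank them so that similar code living far apart surfaces first. This is not Slopo's actual implementation. The function names, the toy three-dimensional vectors, the 0.9 threshold, the path-based distance measure, and the sort order are all assumptions made for illustration; a real run would obtain the vectors from an embedding model.

```python
import math
from itertools import combinations
from pathlib import PurePosixPath


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def path_distance(p1, p2):
    """Hypothetical distance: path components not shared by the two files."""
    a, b = PurePosixPath(p1).parts, PurePosixPath(p2).parts
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


def rank_pairs(units, threshold=0.9):
    """units: list of (path, name, embedding) tuples.

    Returns (similarity, distance, unit_a, unit_b) tuples for pairs at or
    above the threshold, most distant first, ties broken by similarity.
    """
    results = []
    for (p1, n1, e1), (p2, n2, e2) in combinations(units, 2):
        sim = cosine(e1, e2)
        if sim >= threshold:
            results.append((sim, path_distance(p1, p2), f"{p1}:{n1}", f"{p2}:{n2}"))
    # Assumed ranking: far-apart duplicates are the hardest to spot, so they go first.
    results.sort(key=lambda r: (r[1], r[0]), reverse=True)
    return results


# Toy embeddings standing in for real model output (illustrative values only).
UNITS = [
    ("billing/invoice.py", "format_total", [0.90, 0.10, 0.00]),
    ("billing/receipt.py", "render_total", [0.88, 0.12, 0.01]),
    ("reports/export/csv.py", "total_as_text", [0.91, 0.09, 0.02]),
    ("auth/login.py", "check_password", [0.00, 0.20, 0.95]),
]

ranked = rank_pairs(UNITS)
for sim, dist, left, right in ranked:
    print(f"{sim:.4f}  distance={dist}  {left}  <->  {right}")
```

With these toy vectors, the three "total" helpers pair up with one another while the unrelated login function is filtered out, and the cross-module pairs rank above the pair that shares a directory. The all-pairs loop is quadratic in the number of code units, so a large codebase would likely need an approximate nearest-neighbor index instead.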
The Gossip
Semantic Similarities and Refactoring
Commenters quickly grasp Slopo's unique value proposition: detecting semantic code duplication rather than just exact copies. The discussion highlights its potential for pre-refactoring analysis, especially where code is functionally similar but structurally varied, making it a valuable asset for improving code quality.
Language Leap and Expansion
A common point of inquiry for developer tools is language support. One commenter expressed interest in PHP support, and the author confirmed that adding new languages is straightforward and a PHP update is imminent, demonstrating the tool's extensibility and responsiveness to community needs.