SQL patterns I use to catch transaction fraud

This post details six practical SQL patterns for identifying transaction fraud and argues that well-written SQL often outperforms complex machine learning in real-world fraud detection. It includes concrete examples and code snippets that data professionals can adapt. The author emphasizes combining the patterns to build a more effective and adaptable fraud prevention system.

Score: 34
Comments: 2
Highest Rank: #7
Time on Front Page: 2h
First Seen: May 16, 4:00 AM
Last Seen: May 16, 5:00 AM

The Lowdown

The article "Six SQL patterns I use to catch transaction fraud" argues that effective fraud detection relies primarily on well-crafted SQL queries rather than trendier technologies such as machine learning or graph databases. The author, a program integrity analyst, shares six core SQL patterns that apply to many kinds of transaction data, from credit cards to government benefits, with an emphasis on practical implementation and tuning.

  • Velocity: Detects unusually rapid transaction frequency, which can indicate card testing or stolen card usage. It uses window functions to count transactions within a moving time window (e.g., 5 minutes) for a given cardholder.
  • Impossible travel: Identifies transactions that occur too far apart geographically for the time elapsed between them, suggesting card cloning. It uses LAG to compare each transaction's location and timestamp with the cardholder's previous one, and a haversine function to compute the distance between them.
  • Amount anomalies: Flags specific transaction amounts often associated with fraudulent activities, such as round dollar amounts ($1.00, $5.00) for card testing, or amounts just under common authorization thresholds ($99.99, $499.99).
  • Suspicious merchants: Pinpoints merchants exhibiting an unusual spike in unique cardholders or transaction volume, which could signal a compromised point-of-sale system. It compares current activity against a merchant's historical baseline using window functions over hourly buckets.
  • Off-hours: Catches transactions made outside a cardholder's typical spending habits. This involves building a historical profile of a cardholder's active hours and flagging transactions that fall outside this established range.
  • Window functions for chained signals: This isn't a pattern itself but a foundational setup that uses various window functions (e.g., LAG, ROW_NUMBER) to pre-compute transaction characteristics like time since last transaction or merchant change. This makes it significantly easier and faster to combine multiple fraud signals into complex rules.
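
The velocity pattern and the LAG-based pre-computed features described above can be sketched roughly as follows. This is not the article's code: the `txns` table, its column names, the sample rows, and the threshold of four transactions in five minutes are all assumptions made for illustration. It runs the SQL through Python's built-in sqlite3 module so it is self-contained; the numeric RANGE frame needs SQLite 3.28 or newer.

```python
import sqlite3

# Hypothetical schema and sample data; ts is seconds since an arbitrary epoch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE txns (txn_id INTEGER PRIMARY KEY, card_id TEXT, "
    "merchant_id TEXT, ts INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO txns VALUES (?, ?, ?, ?, ?)",
    [
        (1, "A", "m1", 0, 1.00),
        (2, "A", "m1", 60, 1.00),
        (3, "A", "m2", 120, 5.00),
        (4, "A", "m2", 200, 99.99),
        (5, "A", "m3", 4000, 42.17),
        (6, "B", "m1", 30, 18.50),
        (7, "B", "m1", 9000, 23.10),
    ],
)

FEATURES_SQL = """
SELECT
    txn_id,
    card_id,
    -- Velocity: transactions by this card in the trailing 300 seconds,
    -- including the current one.
    COUNT(*) OVER (
        PARTITION BY card_id ORDER BY ts
        RANGE BETWEEN 300 PRECEDING AND CURRENT ROW
    ) AS txns_last_5min,
    -- Chained-signal features: gap to the card's previous transaction and
    -- whether the merchant changed (both NULL for a card's first row).
    ts - LAG(ts) OVER (PARTITION BY card_id ORDER BY ts) AS secs_since_prev,
    merchant_id <> LAG(merchant_id) OVER (PARTITION BY card_id ORDER BY ts)
        AS merchant_changed
FROM txns
ORDER BY txn_id
"""

features = conn.execute(FEATURES_SQL).fetchall()

# Assumed threshold: four or more transactions inside five minutes.
VELOCITY_THRESHOLD = 4
velocity_hits = [row[0] for row in features if row[2] >= VELOCITY_THRESHOLD]

for row in features:
    print(row)
print("velocity hits:", velocity_hits)
```

With this sample data only transaction 4 reaches the threshold, because it is card A's fourth transaction within 300 seconds. In practice the window length and threshold would need tuning against real traffic, and a production engine may express the frame with interval syntax rather than integer seconds.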

The author stresses that no single pattern is sufficient in isolation due to false positives. The true power lies in combining these signals and scoring transactions based on how many patterns they trigger, allowing analysts to quickly iterate on new fraud hypotheses by translating them into SQL filters.
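
One way to picture the scoring idea is to compute each signal as a 0/1 flag and add the flags up. The sketch below is an illustration under stated assumptions, not the author's implementation: the `txns` table, the `amount_cents` column (integer cents, chosen here to avoid floating-point comparisons), the specific flagged amounts, the velocity threshold of three, and the cutoff of two signals are all hypothetical. It again uses Python's sqlite3 module and needs SQLite 3.28 or newer for the RANGE frame.

```python
import sqlite3

# Hypothetical schema and sample data; ts is seconds since an arbitrary epoch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE txns (txn_id INTEGER PRIMARY KEY, card_id TEXT, "
    "ts INTEGER, amount_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO txns VALUES (?, ?, ?, ?)",
    [
        (1, "A", 0, 100),
        (2, "A", 60, 100),
        (3, "A", 120, 500),
        (4, "A", 200, 9999),
        (5, "A", 4000, 4217),
        (6, "B", 30, 49999),
        (7, "B", 9000, 2310),
    ],
)

SCORE_SQL = """
WITH flagged AS (
    SELECT
        txn_id,
        -- Signal 1: three or more transactions by the card in 300 seconds.
        (COUNT(*) OVER (
            PARTITION BY card_id ORDER BY ts
            RANGE BETWEEN 300 PRECEDING AND CURRENT ROW
        ) >= 3) AS f_velocity,
        -- Signal 2: small round amounts typical of card testing.
        (amount_cents IN (100, 500)) AS f_test_amount,
        -- Signal 3: amounts just under common authorization thresholds.
        (amount_cents IN (9999, 49999)) AS f_near_threshold
    FROM txns
)
SELECT
    txn_id,
    f_velocity + f_test_amount + f_near_threshold AS score
FROM flagged
-- Keep only transactions that trip at least two signals.
WHERE f_velocity + f_test_amount + f_near_threshold >= 2
ORDER BY score DESC, txn_id
"""

suspicious = conn.execute(SCORE_SQL).fetchall()
print(suspicious)
```

SQLite evaluates comparisons to 0 or 1, so the flags can be summed directly; other engines may need `CASE WHEN ... THEN 1 ELSE 0 END`. Testing a new hypothesis then amounts to adding one more flag column to the CTE and including it in the sum.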