Cross-Lingual Detection of Implicit Compliance Violations in Cross-Border Contracts: Few-Shot LLM, Legal-BERT, and Rule-Augmented Hybrid
DOI:
https://doi.org/10.66372/
Keywords:
implicit compliance detection, cross-lingual NLP, contract analysis, Legal-BERT, few-shot in-context learning, rule-augmented hybrid
Abstract
The proliferation of cross-border commercial transactions has produced contractual portfolios that span multiple jurisdictions and natural languages, creating an acute need for automated tools that can detect compliance violations whose presence is implicit rather than explicit. Most existing systems focus on English-language documents and on overt clause matching, leaving implicit violations such as concealed jurisdiction shifts, unstated data-flow permissions, and indirect intellectual-property assignments largely untreated in a multilingual setting. This paper presents a comparative study of three detection paradigms on a curated four-language contract dataset covering English, Chinese, Japanese, and Korean: a rule-and-dictionary baseline, multilingual encoder fine-tuning with Legal-BERT and XLM-RoBERTa, and few-shot in-context learning with GPT-4 and Claude. We further introduce a rule-augmented hybrid that fuses neural confidence with rule-based vetoes through a calibration-aware decision layer. Across 8,624 annotated clauses covering six implicit-violation categories, the hybrid attains a macro-F1 of 0.8261 averaged across languages, outperforming the best single-paradigm system by 3.97 absolute points and the dictionary baseline by 23.18 points. Ablation studies show that the rule-veto layer contributes 1.84 points of F1 and that retrieval-anchored few-shot example selection adds another 1.27 points. The findings indicate that no single paradigm dominates across all languages or violation types and that disciplined fusion is the more reliable engineering choice for high-stakes cross-border compliance review.

