A Contrastive Learning Patent Document Retrieval Model for Similar Patent Technologies Based on KorPatSTS

Research Article

Jae-Ok Min¹, Sol-Bin Hwang², Young-Hoon Jeon², Song-A Chae², Bong-Gun Lee³

¹Team Manager, Intelligent Information Strategy Department, Korea Institute of Patent Information, Republic of Korea
²Assistant Manager, Intelligent Information Strategy Department, Korea Institute of Patent Information, Republic of Korea
³Head of Strategic Planning Department, Korea Institute of Patent Information, Republic of Korea

Correspondence to Bong-Gun Lee, E-mail: bglee@kipi.or.kr

Volume 21, Number 1, Pages 181-207, March 2026.
Journal of Intellectual Property 2026;21(1):181-207. https://doi.org/10.34122/jip.2026.21.1.181
Received on December 31, 2025, Revised on February 21, 2026, Accepted on March 06, 2026, Published on March 30, 2026.
Copyright © 2026 Korea Institute of Intellectual Property.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits use, distribution, and reproduction in any medium, provided that the article is properly cited, the use is non-commercial, and no modifications or adaptations are made.

Abstract

This study proposes NCE-KorPat, a deep-learning-based patent retrieval model, and KorPatSTS (Korean Patent Semantic Textual Similarity), a high-quality training dataset, to assess more precisely the grounds for rejection that arise from technological redundancy during the patent application process. The model recommends patent documents for citation based on the semantic and technical similarity between an application and prior art patents. KorPatSTS is a sentence-level dataset of similar patent-technology sentence pairs built with the expertise of the Korea Ministry of Intellectual Property (MOIP) AI Examiner Advisory Group. For each application, the dataset takes the technical constituent elements of the claims, together with the sentences of the detailed description that explain those elements, and aligns them with the matching portions of the prior art patents cited as grounds for rejection, forming highly precise sentence-level correspondence pairs. NCE-KorPat was developed in two stages: KorPatBERT, a patent-domain-specific language model, was first fine-tuned for CPC subgroup-level classification, and the resulting model was then trained and optimized with contrastive learning on the KorPatSTS dataset. In Korean patent retrieval experiments, the proposed model outperformed both the previously best-performing Korean embedding models and state-of-the-art global embedding models. To the best of our knowledge, this study is the first attempt to construct similar-technology sentence pairs systematically by integrating the domain expertise of Korean patent examiners and to apply them directly to a practical patent retrieval model. The proposed approach is expected to contribute substantially to the accuracy and efficiency of patent examination.
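
The contrastive-learning stage and the similarity-based recommendation described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the loss function, so the sketch assumes an InfoNCE-style objective with in-batch negatives, a common choice for contrastive sentence-embedding training; it is not presented as the authors' implementation. The function names, the temperature value, and the toy 2-D vectors are all hypothetical, and a real system would obtain embeddings from an encoder such as KorPatBERT rather than hard-coded lists.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(queries, positives, temperature=0.05):
    """Mean InfoNCE loss with in-batch negatives (assumed objective).

    queries[i] and positives[i] embed one matched sentence pair; every
    positives[j] with j != i acts as a negative for queries[i].
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(queries)


def rank_candidates(query, candidates):
    """Return candidate indices sorted by descending cosine similarity."""
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Toy 2-D embeddings (hypothetical values, not taken from the paper).
app_sentences = [[1.0, 0.0], [0.0, 1.0]]    # application-side sentences
aligned_prior = [[0.9, 0.1], [0.1, 0.9]]    # correctly paired prior-art sentences
shuffled_prior = [[0.1, 0.9], [0.9, 0.1]]   # same sentences, pairs swapped

loss_aligned = info_nce_loss(app_sentences, aligned_prior)
loss_shuffled = info_nce_loss(app_sentences, shuffled_prior)
ranking = rank_candidates(app_sentences[0], aligned_prior)
```

In this toy setting the loss is lower when each application sentence is paired with its matching prior-art sentence than when the pairs are swapped, which is the signal that contrastive training uses to pull matched pairs together in the embedding space. At retrieval time the same similarity score orders the candidate prior-art sentences for a given application sentence.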

Keywords

Intellectual Property Rights, Patent, KorPatBERT, KorPatSTS, Contrastive Learning, Patent Search

Notes

Conflicts of Interest

No potential conflict of interest relevant to this article was reported.

Funding

The author received manuscript fees for this article from the Korea Institute of Intellectual Property.

Journal of Intellectual Property (J Intellect Property; JIP)

KCI Indexed
OPEN ACCESS, PEER REVIEWED

pISSN 1975-5945

eISSN 2733-8487
