An Intelligent Patent Summary System Deploying Natural Language Processing and Machine Learning

A.J.C. TRAPPEY a,1, C.V. TRAPPEY b, J. W.-C. WANG a and J.-L. WU c

a Department of Industrial Engineering and Engineering Management, National Tsing Hua University, Hsinchu, Taiwan
b Department of Management Science, National Chiao Tung University, Hsinchu, Taiwan
c Department of Information Management, Chinese Culture University, Taipei, Taiwan

Abstract. With the growing awareness of the fourth industrial revolution and its implications, there are increasing applications of artificial intelligence to secure the value of intellectual property (IP), to develop competitive products, and to license IP. Natural language processing based on artificial intelligence methods has not been sufficiently developed, and many obstacles remain for researchers who seek to interpret the meaning of large numbers of intellectual property documents. The means to explain related intellectual property documents (e.g., those covering products in a given domain) and to summarize sets of such documents remain a significant challenge. In this research, we develop an intelligent patent summarization system based on artificial intelligence approaches that include Recurrent Neural Networks (RNNs), word embedding, and attention mechanisms. The aim of the system is to automatically summarize technical documents in a specific patent domain and to identify potential opportunities or liabilities for R&D engineers, lawyers, and managers. The AI-based experimental solution for summarization captures key technical keywords, popular technical terms, and new technical terms. Compression ratios and retention ratios are used to evaluate the density and consistency of critical information in the proposed summarization system and to measure the quality of the summary output.

Keywords. Artificial intelligence, Natural language processing, Recurrent Neural Network, Intellectual property

Introduction

With the advancement of science and technology, the rapid evolution of technology has greatly increased the value of intellectual property. Enterprises use intellectual property (IP), such as patents licensed in return for royalties to the owners, to develop competitive products that cannot be legally copied, which makes such IP particularly important. In the era of AI and big data, a great amount of IP information has been created along with the advance of technology and R&D progress. The problem of "information overload" has stymied companies and researchers, creating bottlenecks in discovering the important IP technologies and their meaning in a timely manner. Automatic summarization is an indispensable technology when so many essential patents are being traded and litigated. The purpose of automatic summarization is to retrieve important semantic and subject information from a single document or sets of documents so that researchers can efficiently review the documents,

1 Corresponding Author.

Transdisciplinary Engineering Methods for Social Innovation of Industry 4.0, M. Peruzzini et al. (Eds.)
© 2018 The authors and IOS Press. This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-898-3-1204

derive the semantic meaning, access the key information, and reduce the time needed to examine each document [1].

There are two broad approaches to summarization: extractive and abstractive. Extractive methods assemble summaries exclusively from passages taken directly from the source documents, whereas abstractive methods use the complete contents of the document and re-write the original content, so the resulting summary is not drawn entirely from the original document [2]. Neural network models based on the attentional encoder-decoder model for machine translation can generate abstractive summaries with high Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores. However, these systems have typically focused on summarizing short input sequences (one or two sentences) to generate short summaries [3]. The analysis by Nallapati et al. [4] illustrates a key problem with attentional encoder-decoder models: they often generate unnatural summaries consisting of repeated phrases. In addition, few researchers apply automatic summarization techniques to essential intellectual property documents such as patents, which may be quite lengthy (over 40 pages for many patents).

In this research, we propose a new patent summary model, which includes the sequence-to-sequence with attention method, machine learning algorithms, and word embedding. Word embedding enables the algorithm to learn semantic relations, without human effort, and to identify the relationships between texts or phrases. The model uses deep learning to facilitate organizing the meaning of intellectual property documents and writing the summary, which includes key technical phrases, sequence-to-sequence arrangement, and key technical sentences.

1. Literature review

The literature review introduces natural language processing techniques, automatic summarization methods, and summary verification methods.

1.1 Natural language processing

The purpose of Natural Language Processing (NLP) is to convert human language into a formal representation that is easy for computers to manipulate. Natural language processing occurs in two directions. One is from human to computer, letting the computer take human language as input to a program for processing. The other is for the computer to render its computed results in comprehensible human language. Current end applications include information extraction, machine translation, summarization, search, and human-computer interfaces [5]. Text mining is used to reduce the data dimension and improve processing efficiency. Some words or phrases are automatically filtered out before the processing of natural language data (or text); these words or phrases are called stop words. Text filtering is used in the text pre-processing stage to discover the critical information for analysis [6].
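To make the stop-word filtering step concrete, the following Python sketch lowercases a sentence, tokenizes it, and drops words from a small assumed stop-word list; the list and the regular-expression tokenizer are simplified placeholders, not the resources used in this study.

```python
import re

# A small, assumed stop-word list for illustration; a production system
# would use a fuller list (e.g. from NLTK or spaCy).
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "or", "to", "in", "is", "are"}

def preprocess(text: str) -> list:
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

claim = "A method for conducting a transaction using a mobile phone"
print(preprocess(claim))
# ['method', 'conducting', 'transaction', 'using', 'mobile', 'phone']
```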

Word embedding is at the core of language modeling and feature learning techniques in NLP. Word embedding maps all words from the original high-dimensional document space into a lower-dimensional continuous vector space, constructing a set of features for each word in the text. Distributed representations of these features are learned using a neural network-based language model, which enables machines to learn the distributed representation of words and achieves the purpose of reducing the word space dimension [7].


The research uses a programming language to process the patent text and data, performing tasks such as word segmentation, stop-word removal, and conversion of text into embeddings, which requires disassembling each human sentence in order to understand its semantic meaning. The resulting word vectors are then input to the RNN for subsequent training and production of a human-readable summary result.

1.2 Automatic summarization technique

Severyn and Moschitti [8] proposed a Convolutional Neural Network (CNN) model to rank and rematch sentences. The CNN applies a series of operations to the input sentence vectors, including convolution, non-linearity, and reorganization. The purpose of the convolutional layer is to extract the relationships within the word order of sentences so that reorganizing individual words with new input material creates a meaningful summary. Yao et al. [9] proposed a Deep Neural Network (DNN) method for processing documents into abstracts. The DNN trains the sentence representation using a gradient descent method to optimize the sentence output weights, and the K-Nearest-Neighbor method classifies the relevance of sentences for the final summary.

Nallapati et al. [4] proposed the sequence-to-sequence with attention model, known as the encoder-decoder model, to learn sequence traits using embedding vectors. The core concept is to use a Recurrent Neural Network (RNN) to learn the information of a sequence, condense it into a vector, and then use another recurrent neural network to decode this information and generate another sequence. See et al. [10] proposed the hybrid pointer-generator network model. The model uses pointers to identify attention words and then uses the trained model to predict new words which do not occur in the original document. This method solves the Out-Of-Vocabulary (OOV) problem by copying attention words from the original text, while retaining the ability to automatically create or predict new words.

This research uses sequence-to-sequence with attention to train on relevant domain patent sets; after training, the testing sets are input to find the "attention" words through feedback, and the words or sentences are then merged to produce the summary.
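The attention mechanism referred to above can be sketched in a few lines of NumPy: scores between the current decoder state and each encoder state are normalized with a softmax into attention weights, which are then used to form a weighted context vector. This is a generic dot-product illustration under assumed dimensions, not the exact network trained in this research.

```python
import numpy as np

def attention(decoder_state, encoder_states):
    """Dot-product attention: weight each encoder state by its relevance
    to the current decoder state and return the weighted average."""
    scores = encoder_states @ decoder_state        # one score per input position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax -> attention weights
    context = weights @ encoder_states             # weighted average of encoder states
    return weights, context

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(6, 8))   # 6 input positions, hidden size 8
decoder_state = rng.normal(size=8)
weights, context = attention(decoder_state, encoder_states)
print(weights.round(3), context.shape)     # weights sum to 1, context has shape (8,)
```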

1.3 Summary verification technique

Lin [11] proposed Recall-Oriented Understudy for Gisting Evaluation (ROUGE) to measure the quality of natural language processing results for text summaries and machine translation. The system output is compared with human-compiled reference labels using measures such as n-grams, word sequences, and word matching to verify the similarity of the system results to human natural language.
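As a simplified sketch of the ROUGE idea, the following function computes ROUGE-1 recall (the fraction of reference unigrams that also appear in the system output); published evaluations normally rely on the full ROUGE package [11] rather than this toy count.

```python
from collections import Counter

def rouge1_recall(reference: str, candidate: str) -> float:
    """ROUGE-1 recall: fraction of reference unigrams recovered by the candidate."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / max(sum(ref.values()), 1)

ref = "mobile payment transaction using a wireless device"
cand = "a method for mobile payment using a wireless device"
print(round(rouge1_recall(ref, cand), 3))   # 0.857 -> 6 of 7 reference words matched
```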

Hovy and Lin [12] proposed that the Compression Ratio (CR) and the Retention Ratio (RR) are two important indicators for measuring the quality and accuracy of document summarization. The CR is the ratio of the number of words in the summary to the number of words in the original text document; it measures the proportion by which the total amount of information is compressed. The RR is calculated from the recall and precision of the key words selected from the original document, measuring how much key information the summary retains [13].


2. Methodology

In this research, we propose a new patent summary model which includes the sequence-to-sequence with attention method, machine learning algorithms, and word embedding. The methodology consists of four parts: text and document pre-processing, applying the sequence-to-sequence with attention model, testing the output summary, and verification.

The model operates on a large set of related patents. The Latent Dirichlet Allocation (LDA) algorithm is used to identify the important technical topics. In order to achieve a reliable summary, the patents are classified into groups using unsupervised clustering methods, including K-means and a hierarchical algorithm. For the text pre-processing of the grouped patents, the text is converted to lowercase and stop words are removed. In the sequence-to-sequence with attention process, the text must be converted to sequences of word embeddings. The embedded words are added to a vocabulary built from the patents to ensure that the machine can correctly judge the semantics and attributes of the words. The research uses the gradient descent method to train the model, and the trained model generates the output summary. The model is verified using the CR and RR methods. The flowchart of the model is shown in Figure 1.

Figure 1. Flowchart of summary model.


Step 1. Collect the patent data and cluster them into topic groups

Patents are collected from the predefined research domain, and then topics are abstracted using the LDA algorithm. The patents are clustered into topic groups with the K-means and hierarchical algorithms.
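As an illustrative sketch of the topic-abstraction part of Step 1, the following snippet applies LDA with the gensim library to a toy tokenized corpus; the corpus, the number of topics, and the parameter values are assumptions for demonstration, not the settings used in this research.

```python
from gensim import corpora, models

# Toy corpus of tokenized patent texts (illustrative only).
docs = [
    ["mobile", "payment", "transaction", "terminal"],
    ["wireless", "device", "location", "payment"],
    ["image", "recognition", "virtual", "location"],
    ["mobile", "wallet", "transaction", "authorization"],
]

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# Abstract the dominant technical topics with LDA.
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10, random_state=0)
for topic_id, terms in lda.print_topics(num_words=4):
    print(topic_id, terms)
```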

Step 2. Data pre-processing

When pre-processing a group of patents into a set of meaningful words, the text is converted to lowercase, and stop words and punctuation are removed. The documents are divided into training sets and testing sets, where the testing sets are used to generate the summary. The research separates the training sets into an input type and a label type, where the input type includes the patent title and patent claims and the label type includes the patent abstract.

Step 3. Sequence to sequence

The training set is processed using the Python package Word2Vec [14], and a DNN model is used to train and output the word embeddings. This stage converts the vocabulary into a low-dimensional vector mode (300 dimensions). Sequence-to-sequence models then include an encoder step and a decoder step. In the encoder step, the model converts an input sequence into a fixed representation. In the decoder step, a language model is trained on both the output sequence and the fixed representation from the encoder. Since the decoder model sees an encoded representation of the input sequence as well as the label (abstract) sequence, the model is better able to predict future words based on current words.
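A minimal sketch of the embedding part of Step 3, assuming the gensim implementation of Word2Vec (version 4.x, where the dimension argument is named vector_size; older releases call it size). The toy corpus is illustrative and stands in for the pre-processed patent texts, with the 300-dimensional vectors described above.

```python
from gensim.models import Word2Vec

# Toy tokenized corpus standing in for the pre-processed patent texts.
sentences = [
    ["mobile", "payment", "transaction", "terminal"],
    ["wireless", "device", "location", "payment"],
    ["mobile", "wallet", "transaction", "authorization"],
]

# 300-dimensional word embeddings, as used in Step 3.
model = Word2Vec(sentences, vector_size=300, window=5, min_count=1, workers=1, seed=0)

vector = model.wv["payment"]          # 300-dimensional vector for one token
print(vector.shape)                   # (300,)
print(model.wv.most_similar("payment", topn=2))
```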

Step 4. Sequence to sequence with attention

During the decoding phase, the model uses the decoder network to combine the encoder states and pass the information to a feedforward network. The feedforward network returns a weight for each encoder state. The model multiplies the encoder states by these weights and then computes a weighted average of the encoder states. The decoder network can now use different portions of the encoder sequence as context while it is processing the decoder sequence, instead of using a single fixed representation of the input sequence. This allows the network to focus on the most important parts of the input sequence instead of the whole input sequence, therefore producing smarter predictions for the next word in the decoder sequence.

Step 5. Verification of the output summary

The model first labels keywords based on expert perspectives; the CR is then used to calculate the degree of document compression. The research defines the input patent text size (number of words) as $S_{in}$ and the output summary text size as $S_{out}$. The formula of CR is:

$CR = S_{out} / S_{in}$

Moreover, the RR is used to calculate the proportion of keywords retained or removed during model processing, in order to measure the quality of the summary. The RR mainly measures the density and consistency of the key information in the output. The research defines the set of keywords in the input patents as $K_{in}$ and the set of summary keywords that overlap with $K_{in}$ as $K_{out}$. The formula of RR is:

$RR = |K_{out}| / |K_{in}|$

The expert-labelled keywords mentioned above serve as the ground truth (gold standard). Finally, the RR value is compared with a general method, such as TF-IDF, to verify that our model is more accurate.
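As a minimal sketch of this verification step (assuming the CR and RR definitions given above), the following Python snippet computes the ratios for the case-study figures reported in Section 3; the figure of 16 retained keywords out of the 21 testing-set keywords in Table 4 is inferred from the reported 76.2% and is used here purely to illustrate the arithmetic.

```python
def compression_ratio(summary_size: int, input_size: int) -> float:
    """CR: share of the original text length retained in the summary."""
    return summary_size / input_size

def retention_ratio(retained_keywords: int, input_keywords: int) -> float:
    """RR: share of expert-labelled keywords that survive into the summary."""
    return retained_keywords / input_keywords

# Case-study figures from Section 3, used only to illustrate the arithmetic.
print(f"CR (novelty):   {compression_ratio(135, 1414):.1%}")   # ~9.5%
print(f"CR (use):       {compression_ratio(82, 1414):.1%}")    # ~5.8%
print(f"CR (advantage): {compression_ratio(110, 1414):.1%}")   # ~7.8%
print(f"RR:             {retention_ratio(16, 21):.1%}")        # ~76.2%
```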


3. Case Study

The case topic of intellectual property explored in this study is advanced technologies for smart retailing. In this section, the related patent documents are compiled using the summary system developed in the research. The flowchart of patent technical summary report generation is shown in Figure 2. In the initial step, a set of patents in a given topic (sub-domain) is classified into a training set and a testing set, where the training set is input to the smart summary process to train the model and the testing set is used to generate the summary. The summary report then includes the input patents, the summary result (novelty, use, advantage), and key phrases. The features of a patent include novelty, usability, and advantage, which more specifically represent the patent content. Novelty is one of the most important criteria an applicant must demonstrate for an invention before a patent can be granted. The major requirements for patent novelty include the concepts of prior publication, prior use, and prior advantage. The major groupings of patent novelty requirements in the world (local novelty, relative novelty, and the increasingly important absolute novelty) are considered [15]. The summary report generation flow follows Figure 2: the summary output is classified into novelty, use, and advantage, based on the patent abstract format of the Derwent Innovation (DI) patent platform. The research collects the raw patent sets from DI, whose patent information and abstracts are important material for this study and model, so the output report follows the DI patent format, as shown in Figure 2.

Figure 2. Flowchart for generating the summary report.

The study targets a technology-based smart retail ontology that defines the sub-technologies in three subclass domains, based on relevant important documents and information. These include e-transaction, customer experience, and information integration.


The patent search was conducted using the Derwent Innovation platform. The case study focuses on patents related to e-transactions in smart retailing. The preliminary search results show 367 patent families. The patents are clustered into six groups according to their keywords, using unsupervised learning clustering algorithms and silhouette validation. The data are divided into 90 group 1 patents, 21 group 2 patents, 69 group 3 patents, 52 group 4 patents, 96 group 5 patents, and 34 group 6 patents. The case study selects the group 1 patents for a preliminary analysis.
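A hedged sketch of the clustering and silhouette validation step, assuming scikit-learn with TF-IDF features; the toy titles and candidate cluster counts are illustrative only and do not reproduce the 367 patent families of the case study.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy patent titles standing in for the e-transaction patent families.
titles = [
    "mobile payment using wireless device",
    "payment card linked loyalty offers",
    "image recognition for store shelf",
    "virtual image detection for retail",
    "electronic gifting transaction server",
    "gifting order on personal electronic device",
]

X = TfidfVectorizer().fit_transform(titles)

# Pick the number of clusters with the best silhouette score.
for k in range(2, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))
```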

The key technical term identified by the LDA algorithm for the group 1 patents is "mobile payment." Most of the keywords relate to customer wireless transaction devices, wireless device location, image location detection, and virtual image recognition mechanisms. The study divides the initial group of 90 patents into a training set of 85 and a testing set of 5. Table 1 shows the testing set of the group 1 "mobile payment" patents.

Table 1. Testing set of group 1 patents.

Patent 1: US9224154B2
Title: Method for providing payment card-linked offers based on single loyalty program account of user for payment of e.g. goods, involves identifying reward based on loyalty program, and receiving card number in association with transaction

Patent 2: US8968103B2
Title: Method for performing multimedia capture and encrypting using ephemeral messaging and multimedia application, involves receiving registration request to register user in mobile payment service, where user sends registration request

Patent 3: US8972297B2
Title: Method for conducting transaction using e.g. mobile phone, involves establishing communication session with nearest financial transaction terminal, and receiving prompt from financial transaction terminal for user to complete transaction

Patent 4: US9508093B2
Title: Method for performing electronic commerce, particularly electronic gifting, involves preparing gifting order and gifting transaction document on gifting server that has redemption credentials, and communicating to personal electronic device

Patent 5: US9202244B2
Title: Service providing method for patron e.g. customer at resort, involves providing mobile devices for both patron and staff member for placing order and viewing order made by patron, respectively.

The patent features can be divided into novelty, use, and advantage properties so that the summary output follows a consistent format. Table 2 shows the summary output for the testing set patents.

Table 2. Summary output of testing set patents.

Novelty

Method receiving payment credentials corresponding payment mechanism associated bank mobile device based adjusting transaction card account identifier provided authorize void transactions terminal identifier id read customer number associated another payment transaction. Data associated set accounts associated participants received multimedia content transmitting recipient currency received sender universal application executed mobile payment service sent merchant. Data file containing user mobile device based location mobile device nfc inform user merchant associated indication transaction. Opening portable electronic transaction data associated set consumers processor based adjusting electronic payment option identification wireless device positioned user purchase item selected device purchase item service. Characteristic location orientation mobile device includes image associating users mobile device network interface logic mobile wallet computer system wireless communications mobile device network interface logic mobile wallet computer system method paying reader receives service saas compatible application identification wireless.


Use

Method providing graphical location based transaction card moving car retail environment facilities allocating funds account transferring amount payment system. Processing pin debit transaction conducted website operated physical merchant performing internet ecommerce transaction transmitting ephemeral authorization. Transaction nfc transaction hand held mobile communication device displaying pictogram, acquired device installs transaction details. Electronic transaction data routing electronic phone electronic phone receiving authorization electronic device data buyer transaction vehicle received transaction data buyer access secure. Access control services management vehicle car receiving service provider.

Advantage

Method enables providing convenience purchaser buy option goods services without requiring cardholder purchase install special software hardware pc without requiring merchant make right purchase desired goods services using mobile device simple secure manner method enables queuing providers prevent eavesdropping tampering mobile communication device enables facilitating users allowed perform fund transfers payments types transactions. Direct mobile device sensitive information regarding mobile device directly issuer authorization reducing potential identity theft fraudulent purchases nfc device reduced. To facility widespread usage pin debit payment application entity controlled transaction method enables preventing ephemeral card select enhanced number pin secure high volume scanning operation obtain digital image checks payments users, to improve effectively service plans bills.

The testing set patents in the novelty section highlight retail customer transactions using mobile devices, including technology for payment vouchers, image-identification-related devices, and devices for wireless communications. For the use and advantage sections, the patents provide methods for allocating transfer amounts based on image location recognition, including debit transactions, information transmission security, vehicle location identification, and service management. Remote monitoring, image recognition, and decision-making are used to facilitate high-quality trading patterns. Table 3 shows the CR and RR of the summary output. The input testing set of patents contains 1,414 words, and the output summary contains 135 novelty words, 82 use words, and 110 advantage words. Table 4 shows the testing set keywords identified with the LDA algorithm; the summary keywords that overlap with the testing set are marked in red in the original report.

Table 3. Verification of summary output.

                     Novelty   Use    Advantage
Compression ratio    9.5%      5.8%   7.8%
Retention ratio      76.2%

Table 4. Keywords of input sets and summary.

Testing sets: terminal, location, gifting, bank, multimedia, nfc, identification, associated, protocol, secure, loyalty, transmitting, redemption, credentials, ephemeral, authorization, debit, barcode, interface, logic, image

Novelty: currency, terminal, location, bank, orientation, nfc, identification, associated, id, transmitting, credentials, positioned, saas, authorization, portable, interface, logic, image

Use: debit, pin, secure, transferring, location, nfc, transmitting, ephemeral, pictogram, allocating, graphical, routing

Advantage: secure, authorization, cardholder, pc, sensitive, eavesdropping, nfc, ephemeral, scanning, debit, image


4. Conclusion

The research automatically summarizes technical reports for a specific domain of patents. In the experimental results, the AI-based solution for abstractive summarization captures the key technical keywords, which include prominent technical terms and new technical terms. The summary results compress the key information into a technical summary that enables companies and researchers to ascertain the important IP technologies and their meaning in a timely manner. The research constructs a natural language processing and machine learning based method to assist researchers in identifying semantic meanings, accessing key information, and reducing the time spent examining excessive amounts of IP documents. In the future, the goal is to construct models which combine the RNN and CNN algorithms and which will process multi-document text sets with high accuracy and reliable natural language processing.

Acknowledgement

This research is supported by the Institute of Information Science, Academia Sinica, Taiwan.

References

[1] R. Ferreira, F. Freitas, L. de Souza Cabral, R.D. Lins, R. Lima, G. Franca and L. Favaro, A context based text summarization system, In: Document Analysis Systems (DAS), 2014 11th IAPR International Workshop on, IEEE, 2014, pp. 66-70.

[2] Z. Cao, F. Wei, L. Dong, S. Li and M. Zhou, Ranking with Recursive Neural Networks and Its Application to Multi-Document Summarization, Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015, pp. 2153-2159.

[3] R. Paulus, C. Xiong and R. Socher, A Deep Reinforced Model for Abstractive Summarization, arXiv preprint arXiv:1705.04304, 2017.

[4] R. Nallapati, B. Zhou, C. Gulcehre and B. Xiang, Abstractive text summarization using sequence-to-sequence rnns and beyond, arXiv preprint arXiv:1602.06023, 2016.

[5] R. Collobert and J. Weston, A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine learning, ACM, 2008, pp. 160-167.

[6] I.A. El-Khair, Effects of stop words elimination for Arabic information retrieval: a comparative study, International Journal of Computing & Information Sciences, Vol. 4(3), 2006, pp. 119-133.

[7] T. Mikolov, K. Chen, G. Corrado and J. Dean, Efficient Estimation of Word Representations in Vector Space, arXiv:1301.3781, 2013.

[8] A. Severyn and A. Moschitti, Learning to rank short text pairs with convolutional deep neural networks, In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2015, pp. 373-382.

[9] C. Yao, J. Shen and G. Chen, Automatic Document Summarization via Deep Neural Networks, In Computational Intelligence and Design (ISCID), 2015 8th International Symposium on, Vol. 1, IEEE, 2015, pp. 291-296.

[10] A. See, P.J. Liu and C.D. Manning, Get To The Point: Summarization with Pointer-Generator Networks, arXiv preprint arXiv:1704.04368, 2017.

[11] C.-Y. Lin, ROUGE: A Package for Automatic Evaluation of Summaries, Workshop on Text Summarization Branches Out, Post-Conference Workshop of ACL 2004, Barcelona, Spain, 2004.

[12] E. Hovy and C-Y. Lin, Automated text summarization in summarist, in I. Mani et al. (eds.), Advances in Automatic Text Summarization, MIT Press, Cambridge, 1999, pp. 1-14.

[13] A.J. Trappey and C.V. Trappey, An R&D knowledge management method for patent document summarization, Industrial Management & Data Systems, Vol. 108(2), 2008, pp. 245-257.

[14] O. Levy and Y. Goldberg, Dependency-Based Word Embeddings, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, June 23-25 2014, pp. 302–308.


[15] C. Oppenheim, Patent novelty; proposals for change and their possible impact on information scientists, Journal of Information Science, Vol. 10, 1985, Issue 4, pp. 181-186.
