Question answering over knowledge base(KBQA) is an active research area that allows answers to be obtained from knowledge base based on natural language input. Some KBQA systems have been intensively studied for simple questions, which can be answered by one fact in the knowledge base. Even though this task is called “simple”, it is actually not simple at all and is far from being solved perfectly. Moreover, the demand for complex question answering has increased quickly. Answering a complex question can require references to multiple related facts in the knowledge base.
An important line of KBQA research is the step-by-step pipeline method based on semantic parsing. Singh et al.[1] proposed a KBQA component framework to dynamically choose components to create a complete question answering pipeline. A KBQA pipeline consists of three key components:1) Entity linking, which links n-grams to knowledge base entities; 2) Relation linking, which identifies the knowledge base relation(s) to which a question refers; and 3) Query building, which builds the structured query to obtain answers from the knowledge base.
Relation linking, linking the extracted relations from the input question to their knowledge base occurrences, is a core component of the KBQA pipeline. In this work, we propose a multi-attention recurrent neural network(RNN)-based relation linking approach that can deal with relation linking problems for both simple and complex questions. For the input question, entity linking, the basic component of the KBQA pipeline, recognizes the entity mentions and links them to the corresponding entities in the knowledge base first. Then, a classifier is used to judge whether the question is complex or not. Based on the judgement, different relation linking models are used. For a simple question, a one-attention bidirectional long short-term memory (Bi-LSTM) model is applied to detect one relation from the question. For a complex question, a two-attention Bi-LSTM model is applied to detect two relations from the question.
Given a natural language question, entity linking and relation linking are the core tasks for question answering. There is a wide range of tools and research work in the area of entity linking. Mostly, research in this domain targets news corpora, documents and Wikipedia abstracts with long sentences. Based on a vector-space representation of entities, DBpedia Spotlight[2] uses the cosine similarity to rank candidate entities. AIDA[3] builds a coherence graph and applies dense subgraph algorithms for entity linking. Babelfy[4] uses the random walk and the densest subgraph algorithm to tackle the word sense disambiguation and entity linking tasks jointly. FOX[5] deals with named entity recognition based on ensemble learning. However, when these tools are applied to short text in a new domain such as question answering, the performance is limited. Considering short text, TAGME[6] recognizes named entities by matching terms with Wikipedia link texts and disambiguates the match using the in-link graph and the page dataset. KEA[7] begins with the detection of groups of n-grams and a lookup of all potential candidate entities for each n-gram. The disambiguation of candidate entities is based on local and context-related features.
Relation linking is a relatively new research area compared to entity linking. SIBKB[8] represents a background knowledge base as a bi-partite and a dynamic index over the relation patterns included in the knowledge base. A relation linking component is proposed based on the semantic index. With the recent progress of deep learning, neural network (NN)-based methods have been introduced to the relation linking task. NN-based methods represent both the questions and the relations as semantic vectors. Then, the relation linking process can be converted into a similarity matching process between an input question and its candidate relations in a semantic space. The candidates with the highest similarity score will be selected as the final relations. The main difference among these approaches lies in the encoder model and the input granularity. The encoder model can be an RNN model or a convolutional neural network (CNN) model. The input granularity can be word level or character level granularity. ReMatch[9] characterizes both the properties in knowledge and the relations in a question as comparable triples, then leverages both synonyms and semantic similarity measures based on graph distances from Wordnet (https://wordnet.princeton.edu/). Yu et al.[10] proposed a hierarchical RNN enhanced by residual learning for relation detection. Both EARL[11] and Falcon[12] perform joint entity and relation linking. EARL implements two different solution strategies. The first strategy formalizes the problem as an instance of the generalized traveling salesman (GTSP) problem and solves it with an approximation algorithm. The second strategy uses a machine learning method in order to exploit the connection density between nodes in the knowledge graph. Falcon performs joint entity and relation linking of a short text by using a light-weight linguistic approach. It utilizes an extended knowledge graph created by merging entities and relations from various knowledge sources for candidate generation. Specifically, it leverages several fundamental principles of English morphology (e.g. compounding, headword identification) for candidate ranking.
Although there has been some research on entity and relation linking, solving the task of relation linking is still very challenging, especially the relation linking task for complex questions.
In this section, we introduce the overall architecture of our approach, followed by our dictionary-based entity linking approach and the multi-attention RNN based relation linking approach.
The general framework of the proposed approach is shown in Fig.1. Let us consider an example to explain the underlying idea:How many developers make software for Unix-like operating systems? The entity linking component is expected to recognize the mention “Unix-like” as a named entity and link it to the corresponding entity “dbr:Unix-like” in the knowledge base. The relation linking component classifies the question into simple and complex categories. For a simple question, a one-attention Bi-LSTM is applied to detect one relation from the question. For a complex question, a two-attention Bi-LSTM is applied to detect two relations from the question.
Fig.1 Overall architecture of the proposed approach
Entity linking is the first component of the QA pipeline, and the results greatly affect the following components. In general, entity linking is a two-step process to obtain the topic entity from the question. The first step is to recognize the named entity mention. The second step is to disambiguate or link the entity to the knowledge base. The general framework of the proposed entity linking approach is shown in Fig.2.
Fig.2 Architecture of the entity linking
For named entity recognition, we regard it as a sequence annotation task. We encode the question at the word and character levels to provide additional advantages in dealing with unknown or out-of-vocabulary (OOV) words. When a question sequence comes, the pre-trained word embedding and character embedding characters are retrieved to construct a matrix. Then, the Bi-LSTM is applied to generate an abstract representation of the question. Finally, an additional conditional random field (CRF) tagging model is used to detect the named entity mention. For the labeling of the mentions, the BIO tagging scheme (B=beginning of an entity, I=inside of an entity, O=outside of an entity) is used.
For entity disambiguation, we collect a dictionary on the surface forms of entities from entity pages, redirect pages, and disambiguation pages in Wikipedia. We can then generate candidate entities for the given mention on the basis of this dictionary and map these candidate entities in Wikipedia to the entities in the DBpedia knowledge base. For candidate entity ranking, we consider the edit distance and a few linguistic-based rules.
First, a classifier is used to judge whether the question is complex or not. Secondly, simple or complex relation candidates are obtained from the generated subgraph according to the complexity of the question. Finally, different attentive RNN models are applied to encode the given question and relation candidates and score the similarity between them.
The aim of the simple/complex classification is to build a binary classifier to efficiently predict whether a given question is simple or complex. The input of the classifier is the features of the question, and it contains the length of the question, the number of entities, and word POS tagging. After testing eight classifiers, we select logistic regression which performs outstandingly well to classify the complexity of the question.
For each question q, we use the entity linking component to identify the topic entity, which can be simply regarded as the main entity of the question. After obtaining the topic entity, we collect its one-hop or two-hop relations according to the classification of the question. For a simple question, one-hop relations are collected to constitute a simple relation candidate set RS. For a complex question, two-hop relations are collected to constitute a complex relation candidate set RC.
According to the complexity of the question, we adopt different multi-attention Bi-LSTM models to detect relations. For a simple question, the one-attention Bi-LSTM model is applied to detect one relation from the question. For a complex question, the two-attention Bi-LSTM model is applied to detect two relations from the question.
We take the complex question as an example to explain the process. The topic entity mention is replaced in the question with a token “
For each relation r in RC, we describe it from two levels, namely, relation level and word level, and transform each part to its trainable embedding to obtain their vector representations. The running question “How many developers make software for Unix-like operating systems?” is classified as a complex one. Hence, the relation candidate set RC contains all two-hop relations connected to the topic entity. Take the two-hop candidate relation “operatingSystem·developer” for instance. We deal with the relations “operatingSystem” and “developer” separately and feed them into the two-attention Bi-LSTM model. For each input relation, we describe it from the relation and word levels. For example, the relation “operatingSystem” can be divided into “operating” and “system” at the word level. The relation level of the single word “developer” is the same as the word level. The word embedding is fed into the Bi-LSTM network. Then, we obtain ri(i∈{1, 2}) by max-pooling the relation and word levels, respectively. r1 represents the first relation aspect in relation r, and r2 represents the second relation aspect.
Each word in the question is initialized to its word embedding. Then, the embedding is fed into the Bi-LSTM network to obtain the hidden representations, H=[h1,…,hL]. L is the length of the question. Each vector hj is the concatenation between forward/backward representations at time j. We adopt the additive attention mechanism proposed in Ref.[13]. Each aspect ri(i∈{1, 2}) of the relation pays different attention to the question and decides how the question is represented. The extent αij of the attention is used as the weight of each word in the question. Thus, for different relation aspects ri, the corresponding question pattern representation pi is calculated as
(1)
Fig.3 Architecture of the two-attention Bi-LSTM model
(2)
wij=vTtanh(WT[hj;ri]+b)
(3)
where αij denotes the attention weight of the j-th word in the question in terms of relation aspect ri. Let the dimensions of ri and hj be m and n, respectively. Then, W and v are the parameters to be learned with W∈Rc×(m+n), v∈R1×c (where c is a hyper parameter). b is the offset.
At this point, we have the representations of question pattern pi and relation aspect ri. Their similarity is calculated by the following equation:
Si(pi,ri)=pi⊗ri i=1,2
(4)
The operation ⊗ here is the dot product of two vectors.
We compute a matching score S(P,r) that represents the similarity between the question pattern and the candidate relation.
S(P,r)=sigmoid(WT[S1(p1,r1);S2(p2,r2)]+b)
(5)
For all r∈RC, the final prediction is with
(6)
The model described above is trained with a ranking loss to maximize the margin between gold relation r+ and other relations r- in the candidate set:
(7)
where γ is a constant parameter.
We use the LC-QuAD[14] question dataset in our experiments. LC-QuAD (https://figshare.com/projects/LC-QuAD/21812) contains 5 000 questions (4 000 for training and 1 000 for testing) with their intended SPARQL queries for DBpedia. DBpedia is an encyclopedic knowledge base, and the types of entities mentioned in the questions for DBpedia exhibit large variations. There are 5 042 entities, 615 relations, and multifarious entity types (for example, book, television show, company, and politician) in the dataset. We supplement the dataset with the entity mention and relation mention annotations(https://github.com/wds-seu/KBQA-multi-attention-RNN) to benefit the evaluation of each KBQA pipeline step. All reported experiments are run on the DBpedia 2016-04 version.
We report and analyze the experimental results of the entity linking, relation linking components separately to show their effectiveness.
To evaluate the proposed entity linking component, we compare it against the state-of-the-art tools. We report the experimental results of the systems integrated in Gerbil[15] i.e., KEA[7], FOX[5], Babelfy[4], and AIDA[3]. Gerbil is a benchmarking framework for entity linking systems. These methods deal with entity linking task based on the graph algorithm or context-related features. We also report the performance of TAGME[6], DBpedia Spotlight[2], EARL[11] and Falcon[12] for entity linking. Most of these approaches (except Falcon) use state of the art machine learning techniques. It is important to note that Falcon utilizes an extended knowledge graph created by merging entities and relations from various knowledge sources for candidate generation.
Tab.1 lists the macro precision, macro recall, and macro F1 of all systems on the LC-QuAD dataset. The performance of the compared systems listed in Tab.1 is obtained directly from Ref.[12]. The word embedding in our approach is initialized using two pre-trained word vectors, GloVe[16] and BERT[17]. GloVe is an unsupervised learning algorithm for obtaining vector representations for words. BERT, a pre-trained transformer network, is designed to pre-train deep bidirectional representations from unlabeled text. The results demonstrate that Falcon and our approach significantly outperform other systems. Our approach, whether initialized by GloVe or BERT, is superior to other methods in precision. When initializing with BERT, it performs better in both precision and F1 value. The reason for Falcon’s better recall value is that it uses more external knowledge sources to generate candidates.
Tab.1 Macro precision, recall, and F1 values of entity linking approaches
ApproachPrecisionRecallF1KEA0.0010.0010.001FOX0.530.510.51Babelfy0.430.500.44AIDA0.500.450.47DBpedia Spotlight0.600.650.61TAGME0.650.770.68EARL0.530.550.53Falcon0.810.860.83Our GloVe-based approach0.830.810.81Our BERT-based approach0.870.850.86
In the process of relation linking, we evaluate different models to prove that the one-attention Bi-LSTM model is applicable to simple questions and that the two-attention Bi-LSTM model is applicable to complex questions. Among the 1 000 testing questions in LC-QuAD, there are 366 simple and 634 complex questions, respectively. On the basis of the assumption that the simple/complex classification is completely correct, we evaluate the Bi-LSTM model without attention and with one attention separately on the simple question dataset. We also evaluate the Bi-LSTM model without attention, with one attention, and with two attentions on the complex question dataset. We set the embedding dimension to 300 in training process, and all word embedding is initialized using the pre-trained GloVe here. The comparison results are shown in Tab.2.
Tab.2 Macro precision, recall, and F1 values of different models on simple and complex LC-QuAD datasets
DatasetModelPrecisionRecallF1Simple (366)Without attention Bi-LSTM0.510.510.51One-attention Bi-LSTM0.520.520.52Complex (634)Without attention Bi-LSTM0.500.500.50One-attention Bi-LSTM0.490.490.49Two-attention Bi-LSTM0.520.520.52
We observe that the results with attention are better than those without attention for simple questions. Hence, adopting the attention mechanism in the Bi-LSTM model can improve performance. For complex questions, the results of the two-attention Bi-LSTM model that separates two relations are better than those of the one-attention Bi-LSTM model or the model without attention. It means that, merging two aspects of the relation into one attention or without attention is not appropriate for complex questions.
To illustrate the effectiveness of the attention mechanism clearly, we visualize the attention distribution of the question words in the question encoding process (see Fig.4). Color depth represents the attention weights, a darker color indicates a higher attention weight. From the simple question example, we can observe that the one-attention model is able to capture the attention properly. For the complex question, each question word learns a different weight for different relation aspects.
To evaluate our relation linking component, we apply the multi-attention Bi-LSTM model on the basis of the results of our simple/complex classification. The results of our proposed relation linking approach in comparison with four baselines are listed in Tab.3. These baselines are SIBKB[8], ReMatch[9], and the recently released EARL and Falcon systems. Both SIBKB and ReMatch leverage semantic similarity measures to perform relation linking. EARL applies both the graph algorithm and machine learning for relation linking. Falcon utilizes an extended knowledge graph and a linguistic understanding method.
Fig.4 Visualized attention heat map
Our evaluation metrics are the same as those of Falcon. We list two experimental results of our relation linking component. Our RL (golden EL) employs our relation linking component on the basis of the golden entity linking outputs. Our RL (our EL) employs our relation linking component on the basis of our entity linking (BERT) component outputs. From the results, we observe that our independent relation linking component outperforms the state-of-the-art results by Falcon. Despite the error propagation caused by entity linking, the results of our relation linking component are still promising. The experimental results show that it is difficult to obtain better results by only relying on semantic similarity or graph algorithm. Although Falcon leverages external resources to obtain a better recall value, its accuracy is limited by its linguistic methods. Our multi-attention RNN model can better deal with the relation linking task.
Tab.3 Macro precision, recall, and F1 values of relation linking approaches
ApproachPrecisionRecallF1SIBKB0.130.150.14ReMatch0.150.170.16EARL0.170.210.18Falcon0.420.440.43Our RL (golden EL)0.470.470.47Our RL (our EL)0.450.450.45
For error analysis, we list some question examples in Tab.4 (the named entity mentions are underlined), for which our approach does not output correct results.
Tab.4 Failure analysis results
ReasonSample questionEntity linkingQ1:What is the serving railway line of Warwick railway station, Perth?Q2:List the Sci-fi TV shows with theme music given by Ron Grainer?Q3:Who started at the pole position in both 1997 canadian grand prix and the 94 spanish one?Q4:Which Texas based company was founded by Jim Harris?S/C classificationQ5:What is the baseball team whose manager is Chip Hale?Relation linkingQ6:Give me the home town of all musical artists who uses Guitar as instrument?Q7:What magazine companies are of form Limited liability company?Q8:Which universities are also known as the Tulane Green wave?Q9:What was the language of the single which came before To Know Him Is to Love Him?Q10:Who was the voice actor of allen walker also gave voice to kimihiro watanuki?
The first part of the samples is the failure of the entity linking component, which means that we fail to recognize the correct entity mentioned in the question. Such occurrence is common when the mention is too long or too sketchy, such as the long mention “Warwick railway station, Perth” in Q1, the acronym mention “Sci-fi” in Q2, and the long mention “1997 canadian grand prix” and the omitted mention “94 spanish one” in Q3. There are also failures which links the mention to the entity with the same name but the wrong type. For example, the mention “Jim Harris” is linked to the wrong entity “Jim_Harris_(politician)” instead of “Jim_Harris_(entrepreneur)” in Q4.
The second part of the samples is the failure of the simple or complex classification for the question. For example, Q5 is a simple question with a type constraint. However, it is incorrectly recognized as a complex question. The incorrect classification leads to the incorrect results of the following components.
The third part of the samples is the failure of the relation linking component, which has the largest proportion. One cause is the error propagation; for example, the wrong entity linking results for Q6 and Q7 lead to the failure of relation linking. Another cause is that implicit relation is difficult to detect. For the simple question Q8, the surface form “are known as” is not identified as relation “nickname” correctly. The surface form “before” in Q9 is not identified as relation “previousWork” correctly. The causes of wrong relation linking for complex questions are varied because large numbers of two-hop relation candidates compound the difficulties of relation linking. For the complex question Q10, we incorrectly detect the relation “creator·voice” instead of “voice·voice”.
1) For entity linking, which is an essential component for relation linking, we train a Bi-LSTM-CRF model to recognize entities mentioned in a question and link them to the entities in the knowledge base based on a dictionary.
2) We propose a relation linking approach based on the multi-attention RNN, which can deal with the relation linking problem for both simple and complex questions. We first classify questions as simple or complex questions. On the basis of the classification, we apply different attention-based Bi-LSTM models for relation linking.
3) The experimental results show that our entity and relation linking components achieve the best reported macro precision on the LC-QuAD dataset.
4) In future, we plan to improve the relation linking results for complex questions. We also intend to consider aggregation constraints to generate final structured queries for the questions.
[1]Singh K, Radhakrishna A S, Both A, et al. Why reinvent the wheel:Let’s build question answering systems together[C]//Proceedings of the 2018 World Wide Web Conference. Lyon, France, 2018:1247-1256. DOI:10.1145/3178876.3186023.
[2]Mendes P N, Jakob M, García-Silva A, et al. DBpedia spotlight:Shedding light on the web of documents[C]//Proceedings of the 7th International Conference on Semantic Systems. Graz, Austria, 2011:1-8. DOI:10.1145/2063518.2063519.
[3]Hoffart J, Yosef M A, Bordino I, et al. Robust disambiguation of named entities in text[C]//Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. Edinburgh, UK, 2011:782-792.
[4]Moro A, Raganato A, Navigli R. Entity linking meets word sense disambiguation:A unified approach[J]. Transactions of the Association for Computational Linguistics, 2014, 2:231-244. DOI:10.1162/tacl_a_00179.
[5]Speck R, Ngonga Ngomo A C. Ensemble learning for named entity recognition[C]//Proceedings of the 13th International Semantic Web Conference. Trentino, Italy, 2014:519-534. DOI:10.1007/978-3-319-11964-9.
[6]Ferragina P, Scaiella U. TAGME:On-the-fly annotation of short text fragments (by wikipedia entities)[C]//Proceedings of the 19th ACM Conference on Information and Knowledge Management. Toronto, Canada, 2010:1625-1628. DOI:10.1145/1871437.1871689.
[7]Waitelonis J, Sack H. Named entity linking in #Tweets with KEA[C]//Proceedings of the 6th Workshop on ‘Making Sense of Microposts’. Montreal, Canada, 2016:61-63.
[8]Singh K, Mulang I O, Lytra I, et al. Capturing knowledge in semantically-typed relational patterns to enhance relation linking[C]//Proceedings of the Knowledge Capture Conference. Austin, TX, USA, 2017:1-8. DOI:10.1145/3148011.3148031.
[9]Mulang I O, Singh K, Orlandi F. Matching natural language relations to knowledge graph properties for question answering[C]//Proceedings of the 13th International Conference on Semantic Systems. Amsterdam, the Netherlands, 2017:89-96. DOI:10.1145/3132218.3132229.
[10]Yu M, Yin W, Hasan K S, et al. Improved neural relation detection for knowledge base question answering[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Vancouver, Canada, 2017:571-581. DOI:10.18653/v1/p17-1053.
[11]Dubey M, Banerjee D, Chaudhuri D, et al. EARL:Joint entity and relation linking for question answering over knowledge graphs[C]//Proceedings of the 17th International Semantic Web Conference. Monterey, CA, USA, 2018:108-126. DOI:10.1007/978-3-030-00671-6_7.
[12]Sakor A, Mulang I O, Singh K, et al. Old is gold:linguistic driven approach for entity and relation linking of short text[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics. Minneapolis, MN, USA, 2019:2336-2346. DOI:10.18653/v1/n19-1243.
[13]Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate[C]//Proceedings of the International Conference on Learning Representations. San Diego, USA, 2015.
[14]Trivedi P, Maheshwari G, Dubey M, et al. LC-QuAD:A corpus for complex question answering over knowledge graphs[C]//Proceedings of the 16th International Semantic Web Conference. Vienna, Austria, 2017:210-218. DOI:10.1007/978-3-319-68204-4_22.
[15]Usbeck R, Röder M, Ngomo A C N, et al. GERBIL:General entity annotator benchmarking framework[C]//Proceedings of the 24th International Conference on World Wide Web. Florence, Italy, 2015:1133-1143. DOI:10.1145/2736277.2741626.
[16]Pennington J, Socher R, Manning C D. GloVe:Global vectors for word representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Doha, Qatar, 2014:1532-1543. DOI:10.3115/v1/d14-1162.
[17]Devlin J, Chang M, Lee K, et al. BERT:Pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies. Minneapolis, MN, USA, 2019:4171-4186.