UDC 004

An analysis of modern approaches to semantic parsing

Alexander Pak – Associate Professor, Department of Information Technology, Kazakh-British Technical University, Almaty, Republic of Kazakhstan

Zhalgas Zhienbekov – Master's student, Kazakh-British Technical University, Almaty, Republic of Kazakhstan

Abstract: Semantic parsing has recently attracted considerable attention in the research community. Although many neural modelling approaches have considerably improved performance, the data scarcity problem remains. Modelling crisp logical regularities is critical in semantic parsing, which makes it difficult for neural networks to achieve high accuracy without task-specific prior knowledge. This article presents a critical review of modern approaches to semantic parsing, namely preliminary data preparation, data modification, and a framework for injecting prior knowledge into a model. It also gives an overview of how a high-precision synchronous context-free grammar induced from the training data captures the essential conditional independence properties encountered in semantic parsing. The model is then taught these structural properties by training a sequence-to-sequence recurrent neural network (RNN) model with an attention-based copying mechanism on data sampled from this grammar. We consider possible ways to improve the accuracy of the recurrent network model on three semantic parsing datasets, resulting in new state-of-the-art performance on the standard dataset for models with comparable supervision.

Keywords: semantic parsing, RNN, GeoQuery, GEO, OVERNIGHT, SCFG.

Introduction

The objective of semantic parsing is to translate natural language utterances into machine-readable meaning representations. In recent years there has been a surge in the use of neural networks for semantic parsing, from sequence-to-sequence models to more specialized network topologies. While semantic parsers based on neural networks have shown excellent results, there is still room for improvement. Because the main goal has been to increase the accuracy of neural networks, previous works have used many approaches and methods, namely data recombination, reranking, sentence rewriting and paraphrasing. All of them were expected to be effective, but data recombination and reranking have been shown to still have drawbacks in improving semantic parsing results [1].

In this paper, we review practical strategies that may improve outcomes in semantic parsing. It has been shown that an RNN can predict tree-structured outputs in a linear (sequential) fashion. Recurrent neural networks (RNNs) have also made rapid progress in a variety of structured prediction tasks in NLP, such as machine translation and syntactic parsing. Because they make few domain-specific assumptions, RNNs can perform a wide range of tasks with little feature engineering. Nearly all experiments in previous studies used three datasets: GeoQuery, ATIS and Overnight.

  • GeoQuery (GEO) consists of natural language questions about US geography paired with Prolog database queries. We follow Zettlemoyer and Collins' standard split of more than 500 training examples and 280 test examples. To standardize variable naming, we preprocess the logical forms into De Bruijn index notation;
  • ATIS (ATIS) consists of natural language queries to a flight database paired with lambda-calculus database queries. We train on 4470 examples and evaluate on Zettlemoyer and Collins' 438 test examples;
  • Overnight (OVERNIGHT) is a collection of logical forms and natural language paraphrases divided into eight subdomains. The dataset was created by Wang et al. (2015) by generating all possible logical forms up to a certain depth threshold and then collecting a large number of natural language paraphrases for each logical form from Amazon Mechanical Turk workers. We use the same train/test splits for our evaluation.

This paper summarizes findings in which previous researchers achieved considerable gains over strong baselines on all three labelled datasets. Previous research has concentrated on how to train a semantic parser from annotated input utterances, but what if we wanted to create a semantic parser for a different domain, such as a natural language interface to a publications database? Because no such interface exists, we do not even have a natural source of input utterances to annotate [2].

Semantic parsing

Semantic parsing is the process of converting natural language utterances into formal meaning representations. The target meaning representations can be specified using a variety of formalisms. These include linguistically motivated semantic representations such as λ-calculus and abstract meaning representations, which aim to capture the meaning of any sentence. In more task-driven approaches to semantic parsing, meaning representations are widely used to describe executable programs such as SQL queries, robotic commands, smartphone instructions, and even general-purpose programming languages like Python and Java [3].

The executable formalisms and models used in semantic parsing research have traditionally relied heavily on linguistic notions from formal semantics, such as the λ-calculus generated by a CCG parser. Recent work on neural encoder-decoder semantic parsers, on the other hand, has opened the door to more accessible formalisms, such as conventional programming languages, and NMT-style models that are more accessible to a

Most semantic parsing research ignores the context of NL utterances, such as interaction histories in dialogues. The context varies dramatically with the application. In a free-text passage, the context is the text surrounding the current utterance [3].


Fig. 1. An example of semantic parsing.

The context differs from utterance to utterance. Since the primary task considered in this paper is semantic parsing, the model is responsible for converting queries into logical forms (Q2LF). Let $x = x_1 \dots x_{|x|}$ denote the query and $y = y_1 \dots y_{|y|}$ denote the logical form. An encoder encodes the query $x$ into vector representations, and a decoder learns to generate the logical form $y$ conditioned on the encoding vectors.

Current semantic parsers will by default produce an output for any given input, even if that output is only a guess. As a result, system outputs may be unexpected, inadvertently hurting the user experience. Our objective is to address these challenges by developing a confidence scoring model that can estimate the probability that a prediction is correct.

$r_{t,k} \propto \exp\{d_t \cdot e_k\}$    (1)
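Written out with its normalizer, relation (1) takes the form of a softmax. The expression below is a sketch that assumes the scores are normalized over a finite set of candidates indexed by $k' = 1, \dots, K$; the size $K$ of that set is an assumption, since the text does not state it:

```latex
r_{t,k} = \frac{\exp\{d_t \cdot e_k\}}{\sum_{k'=1}^{K} \exp\{d_t \cdot e_{k'}\}},
\qquad \sum_{k=1}^{K} r_{t,k} = 1 .
```

Under this reading, each $r_{t,k}$ lies between 0 and 1 and grows with the dot product between $d_t$ and $e_k$.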

Each frame represents a type of event, scenario, or relation and is associated with a set of frame elements (semantic roles). Frames are evoked in a sentence by targets, which are words or phrases. For each frame, the FrameNet lexicon keeps track of the lexical units, that is, the lemma and part-of-speech combinations that may evoke that frame. The target “drying up”, for example, has the lexical unit dry up.v, which is associated with the frame Becoming_dry. Following previous work, we use the FrameNet lexicon primarily as a mapping between target lexical units and their possible frames, as well as between frames and the roles they can take.

Encoder. A word embedding function ψ(·) maps each word $x_i$ to a fixed-dimensional vector, which is subsequently fed into a bidirectional LSTM. At the i-th time step, the hidden vectors are computed recursively.

Decoder. The attention mechanism is built into the decoder, which is a unidirectional LSTM. The decoder is typically implemented as a vanilla LSTM with extra neural connections. The internal hidden state of the decoder at time step $t$, $s_t$, is given by:

$s_t = f_{\mathrm{LSTM}}([a_{t-1} : c_t : p_t : n_{f_t}], s_{t-1})$    (2)

Here $a_{t-1}$ is the embedding of the previous action. The context vector $c_t$ is obtained from the input encodings $h_i$ via soft attention. The vector $p_t$ encodes information about the parent action, and $n_{f_t}$ is the node type embedding of the current frontier node. Intuitively, supplying this information to the decoder helps the model keep track of the frontier node being expanded.
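The way the decoder input in equation (2) is assembled can be sketched in a few lines of plain Python. This is only an illustration: the dot-product attention score, the use of the previous decoder state as the attention query, and all numeric values are assumptions, since the text specifies only that soft attention over the encodings is used.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def context_vector(query, encodings):
    """Soft attention: average the encoder states h_i, weighted by softmax(query . h_i)."""
    weights = softmax([dot(query, h) for h in encodings])
    dim = len(encodings[0])
    return [sum(w * h[j] for w, h in zip(weights, encodings)) for j in range(dim)]

def decoder_input(a_prev, c_t, p_t, n_ft):
    """Concatenate the four vectors that feed the LSTM cell in equation (2)."""
    return a_prev + c_t + p_t + n_ft

# Hypothetical toy values: two encoder states and a zero query,
# so both states receive equal attention weight.
encodings = [[1.0, 0.0], [0.0, 1.0]]
s_prev = [0.0, 0.0]
c_t = context_vector(s_prev, encodings)
x_t = decoder_input([0.1, 0.2], c_t, [0.3], [0.4])
```

The concatenated vector `x_t` is what an LSTM cell would consume together with the previous state; the cell itself is omitted here.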

The dataset frameworks

We now describe the OVERNIGHT framework for data collection, which we examine and improve upon in this paper. The starting point is a user who needs a semantic parser for a certain domain but does not have any data. OVERNIGHT is a two-step procedure for producing training data. First, logical forms are generated using a synchronous grammar and paired with canonical utterances, which are understandable pseudo-language utterances that do not sound natural. Second, crowd workers paraphrase these canonical utterances into natural language. The result is a training set of utterances paired with logical forms, on which the semantic parser is then trained. We now describe these two steps in more detail [6].

OVERNIGHT's grammar generates logical forms paired with canonical utterances that crowd workers can understand (e.g., “number of states that border California”). The grammar has two components: a domain-general section, which contains domain-independent rules for logical operators (e.g., comparatives, superlatives, negation etc.), and a domain-specific section.

While there are different ways to sample data from the grammar, in OVERNIGHT logical forms and canonical utterances are generated exhaustively up to a specified maximal depth. This is based on the premise that a semantic parser trained on this data will generalize to logical forms that correspond to deeper trees. Furthermore, because a type system is used during generation, semantically vacuous logical forms (e.g., PublicationYear.Parsing) are not generated, which significantly reduces the number of generated examples.
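Exhaustive, type-checked generation up to a maximal depth can be illustrated with a toy enumerator. The sketch below is not the OVERNIGHT grammar: the types, constants and function names are hypothetical, and canonical utterances are left out to keep the example short.

```python
from itertools import product

# Hypothetical toy grammar. Terminals are typed constants; each rule is
# (function name, argument types, result type).
TERMINALS = {"State": ["california", "texas"], "Year": ["2015"]}
RULES = [
    ("borders", ("State",), "State"),
    ("population", ("State",), "Number"),
    ("count", ("State",), "Number"),
]

def generate(max_depth):
    """Return {type: set of logical forms} containing every form of depth <= max_depth."""
    forms = {t: set(names) for t, names in TERMINALS.items()}  # depth 0
    for _ in range(max_depth):
        new = {t: set(fs) for t, fs in forms.items()}
        for name, arg_types, result in RULES:
            # Only arguments of the declared type are tried, so ill-typed
            # forms such as population(2015) are never built.
            pools = [sorted(forms.get(t, ())) for t in arg_types]
            for args in product(*pools):
                new.setdefault(result, set()).add(f"{name}({', '.join(args)})")
        forms = new
    return forms

forms = generate(2)
```

Because the type check happens while arguments are chosen, ill-typed combinations are pruned before they are ever constructed rather than filtered out afterwards.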

Data preprocessing and modification

Our method generalizes data augmentation, which is typically used to inject prior knowledge into a model. Data augmentation approaches focus on modelling invariances, that is, transformations such as translating an image or adding noise that change the input x but not the output y. In domains like computer vision and speech recognition, these approaches have proven useful.

However, in semantic parsing we would like to capture more than just invariance properties. Consider the question “what states surround Kentucky?” as an example. Given this example, generalizing to questions in which Kentucky is replaced by the name of any other state should be straightforward: simply replace the mention of Kentucky in the logical form with the name of the new state. We use a synchronous context-free grammar (SCFG) as the backbone of our generative model $\tilde{p}$ for semantic parsing.
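The Kentucky example amounts to a paired substitution on both sides of a training example, which is the kind of operation a synchronous rule expresses. The following sketch illustrates it; the logical form syntax and the `{STATE}` placeholder are hypothetical and are not taken from any of the datasets.

```python
PLACEHOLDER = "{STATE}"  # hypothetical typed slot shared by both sides

def abstract_entity(utterance, logical_form, entity_word, entity_symbol):
    """Turn one example into a template with the same slot in utterance and logical form."""
    return (utterance.replace(entity_word, PLACEHOLDER),
            logical_form.replace(entity_symbol, PLACEHOLDER))

def recombine(template, entity_word, entity_symbol):
    """Fill the slot on both sides at once, producing a new (utterance, logical form) pair."""
    x_template, y_template = template
    return (x_template.replace(PLACEHOLDER, entity_word),
            y_template.replace(PLACEHOLDER, entity_symbol))

# Illustrative seed example; the logical form syntax is an assumption.
seed = ("what states surround Kentucky?",
        "answer(state(next_to(stateid('kentucky'))))")
template = abstract_entity(seed[0], seed[1], "Kentucky", "kentucky")
new_example = recombine(template, "Ohio", "ohio")
```

The important property is that the utterance and the logical form are always rewritten together, so every generated pair stays consistent.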

Concatenation

The last grammar induction strategy we explored, which also proved effective, is a very simple one. The CONCAT-k strategy, which creates two types of rules, is defined for any k ≥ 2. First, we create a single rule in which the ROOT category expands to a sequence of k sentences.

Concatenation, unlike ABSWHOLEPHRASES and ABSENTITIES, is a general strategy that can be applied to any sequence transduction problem. On the other hand, it introduces no additional information about the compositionality or independence properties of semantic parsing. It does, however, produce harder examples for the attention-based RNN, because the model must learn to attend to the correct parts of the now-longer input sequence. Related research has shown that training a model on more difficult examples improves generalization, the most famous example being dropout.
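For a given k, the concatenation strategy can be sketched as sampling k training pairs and joining their inputs and their outputs in the same order. The separator token `</s>` and the sampling with replacement are assumptions made for illustration.

```python
import random

def concat_k(dataset, k, rng, sep="</s>"):
    """Sample k (x, y) pairs and join inputs and outputs with a separator token."""
    picks = [rng.choice(dataset) for _ in range(k)]
    joined_x = f" {sep} ".join(x for x, _ in picks)
    joined_y = f" {sep} ".join(y for _, y in picks)
    return joined_x, joined_y

# Hypothetical two-example dataset.
data = [("what states border texas", "borders(texas)"),
        ("how big is ohio", "size(ohio)")]
x, y = concat_k(data, 2, random.Random(0))
```

Each segment of the longer input still corresponds to exactly one segment of the longer output, so the model has to learn which part of the input to attend to while producing each part of the output.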

Algorithm 1. The data recombination training procedure. We induce an SCFG G and, at each epoch, sample additional recombinant examples from it.

for each epoch t = 1, ..., T do
    Compute current learning rate η_t
    Initialize current dataset D_t to D
    for i = 1, ..., n do
        Sample new example (x′, y′) from G
        Add (x′, y′) to D_t
    end for
    Shuffle D_t
    for each example (x, y) in D_t do
        Update the model on (x, y) using learning rate η_t
    end for
end for

 

Implementation Details

We tokenize logical forms in a domain-specific manner, based on the syntax of the formal language in use. On GEO and ATIS, we disallow copying of predicate names to ensure a fair comparison with previous work, because string matching between input words and predicate names is not commonly used. Prepending underscores to predicate tokens prevents copying.
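The effect of the underscore prefix can be shown with a short sketch. The token lists and the set of predicate names below are hypothetical; the point is only that a prefixed predicate can no longer string-match an input word.

```python
def mark_predicates(tokens, predicates):
    """Prefix predicate tokens with '_' so they never equal an input word."""
    return ["_" + t if t in predicates else t for t in tokens]

def copyable(output_tokens, input_words):
    """Output tokens that could be produced by copying a word from the input."""
    vocab = set(input_words)
    return [t for t in output_tokens if t in vocab]

# Hypothetical example.
words = "what is the capital of texas".split()
logical_form = ["capital", "(", "texas", ")"]
marked = mark_predicates(logical_form, {"capital"})
```

Before marking, both `capital` and `texas` match input words; after marking, only the entity `texas` remains copyable.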

On ATIS only, we use an external lexicon that maps natural language phrases to entities when performing attention-based copying and data recombination. When we copy a word that is part of a phrase in the lexicon, we write the entity associated with that lexicon entry. When performing data recombination, we find entity alignments based on matching phrases and entities from the lexicon [7].
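Lexicon-aware copying can be sketched as a lookup from word positions to entities. The lexicon entry and the entity identifier below are hypothetical examples, and exact phrase matching is an assumption about how phrases are located in the input.

```python
# Hypothetical lexicon mapping phrases to entity identifiers.
LEXICON = {"kennedy airport": "jfk:ap"}

def find_spans(words, lexicon):
    """Map each word index covered by a lexicon phrase to that phrase's entity."""
    covered = {}
    for phrase, entity in lexicon.items():
        parts = phrase.split()
        for i in range(len(words) - len(parts) + 1):
            if words[i:i + len(parts)] == parts:
                for j in range(i, i + len(parts)):
                    covered[j] = entity
    return covered

def copy_word(index, words, lexicon):
    """Token written when the decoder copies the word at `index`."""
    return find_spans(words, lexicon).get(index, words[index])

words = "flights to kennedy airport".split()
copied = [copy_word(i, words, LEXICON) for i in range(len(words))]
```

Words outside any lexicon phrase are copied verbatim, while every word inside a matched phrase yields the entity for that phrase.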

When performing data recombination, we sample a new round of recombinant examples from our grammar at each epoch. We add these examples to the original training dataset, randomly shuffle all of the examples, and then train the model for the epoch. The pseudocode for this training strategy is given in Algorithm 1. A critical hyperparameter is the number of examples to sample at each epoch: we found that a good rule of thumb is to sample as many recombinant examples as there are examples in the training set, so that half of the examples the model encounters at every epoch are recombinant.
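The per-epoch procedure described above can be sketched as a small training loop. The callbacks `sample_from_grammar` and `update`, as well as the learning rate schedule, are hypothetical stand-ins for the induced grammar and the model, which are not specified here.

```python
import random

def train(dataset, sample_from_grammar, update, epochs, num_samples, rng, lr0=0.1):
    """Each epoch: add freshly sampled recombinant examples, shuffle, then update."""
    for t in range(epochs):
        lr = lr0 / (1 + t)  # assumed decay schedule, for illustration only
        epoch_data = list(dataset)  # start from the original examples
        epoch_data += [sample_from_grammar(rng) for _ in range(num_samples)]
        rng.shuffle(epoch_data)
        for x, y in epoch_data:
            update(x, y, lr)

# Toy usage with stand-in callbacks that only record what the model would see.
seen = []
data = [(f"x{i}", f"y{i}") for i in range(4)]
train(data,
      sample_from_grammar=lambda rng: ("rx", "ry"),
      update=lambda x, y, lr: seen.append(x),
      epochs=2, num_samples=len(data), rng=random.Random(0))
recombinant_share = seen.count("rx") / len(seen)
```

With `num_samples` equal to the size of the training set, exactly half of the examples seen in each epoch are recombinant; sampling twice the size of the training set would raise that share to two thirds.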

Recombination of data without copying

For completeness, we also looked at the effect of data recombination on the model without attention-based copying. Recombination improved the model considerably on GEO and ATIS but had a slightly negative effect on OVERNIGHT. On GEO, the best data recombination strategy yielded a test accuracy of 82.9%, an increase of 8.3 percentage points over the baseline with no copying and no recombination; on ATIS, data recombination yielded a test accuracy of 74.6%, an increase of 4.7 percentage points over the same baseline. On OVERNIGHT, no data recombination strategy increased average test accuracy; the best one led to a drop of 0.3 percentage points. We hypothesize that data recombination is less effective on OVERNIGHT in general because the set of possible logical forms is limited, which makes the task more like a large multiclass classification problem. As a result, it is less important for the model to learn good compositional representations that generalize to new logical forms at test time [6].

We ran further experiments on artificial data to see how important it is to provide longer, more difficult examples. We experimented with adding new examples via data recombination as well as adding new independent examples (e.g. to simulate the acquisition. We train our model on a variety of datasets and then test it on a set of 500 randomly chosen depth-2 examples. The model always has access to a small seed training set of 100 depth-2 examples. The training set is then augmented with one of four types of examples:

  • Same length, independent: new depth-2 examples picked at random;
  • Longer, independent: depth-4 examples picked at random;
  • Same length, recombinant: depth-2 examples sampled from the grammar induced by running ABSENTITIES on the seed set;
  • Longer, recombinant: depth-4 examples sampled from the grammar induced by running ABSENTITIES followed by CONCAT-2 on the seed set.

On the development set, which was drawn from the same distribution as the training set, all models performed well (> 80% accuracy), indicating that performance differences are due to generalization to the true distribution. The accuracy of the paraphrase detection model st( ) on the development set was also high (> 95% F1 measure), indicating that detection is simpler to model than

Conclusion

We look at how to generate data for training semantic parsers from scratch in a variety of domains and at possible ways in which the data can be modified before it is given to the model. We examine the OVERNIGHT process in depth and shed light on the factors that contribute to poor generalization, such as logical form mismatch and language mismatch. We next present GRANNO, a method in which crowd workers directly annotate unlabeled utterances with their logical forms by detecting the matching automatically generated canonical utterances. We show that this strategy works on two popular datasets and that it outperforms OVERNIGHT in terms of generalization to real data.

References

  1. R. Jia, P. Liang, “Data Recombination for Neural Semantic Parsing”, IEEE Transactions on Industry Applications, vol. 54, no. 1, pp. 832–840, August 2016.
  2. B. Wang, P. Liang, “Building a semantic parser overnight”, Association for Computational Linguistics (ACL), 2015.
  3. C. Finegan-Dollak, L. Zhang, K. Ramanathan, S. Sadasivam, R. Zhang, D. Radev, “Improving text-to-SQL evaluation methodology”, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, vol. 1, no. 1, pp. 351–360, 2018.
  4. J. Pennington, R. Socher, C. Manning, “Global vectors for word representation”, Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543, 2014.
  5. A. Krizhevsky, I. Sutskever, G. E. Hinton, “ImageNet classification with deep convolutional neural networks”, Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
  6. N. Jaitly, G. Hinton, “Vocal tract length perturbation (VTLP) improves speech recognition”, International Conference on Machine Learning (ICML), 2013.
  7. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, “Improving neural networks by preventing co-adaptation of feature detectors”, arXiv preprint arXiv, 2012.
  8. L. Dong, M. Lapata, “Language to logical form with neural attention”, Association for Computational Linguistics (ACL), 2016.
  9. J. Gu, Z. Lu, H. Li, “Incorporating copying mechanism in sequence-to-sequence learning”, Association for Computational Linguistics (ACL), 2016.
