УДК 004

Model transformation rules using NLP in model driven architecture

Доснияз Асем Жанабайкызы – Master's student, Kazakh-British Technical University (Almaty, Republic of Kazakhstan)

Abstract: Automating analysis modeling in object-oriented software development can significantly reduce errors that may arise during manual transformation. To achieve this, we propose using a set of model transformation rules to automate the transformation of English use case specifications into class diagrams. Our approach involves parsing the input text using a natural language parser, applying sentence structure rules, and transforming type dependencies and parts of speech tags to identify problem-level objects and their relationships. This can help minimize errors in the manual transformation process.

Keywords: model-driven software development, model transformation, class diagrams, natural language processing.

Introduction

Our paper proposes an approach for generating class diagrams from use case specifications, addressing the challenge of identifying relevant model elements and their contextual relationships. The approach parses the input with the Stanford NL parser to generate parts of speech tags and type dependencies, and then applies comprehensive sentence structure rules and transformation rules to identify potential elements. These rules take into account sentence structure and syntactic/semantic relationships to identify elements precisely. Our approach provides a more efficient and accurate solution for transforming use cases into class diagrams in a lossless manner.

Background and concepts

Use case model

Use case diagrams, along with their corresponding textual descriptions, are frequently employed to document a system's functional requirements. They illustrate functional elements, actors, and objects in communication. Typically, a use case diagram includes actors and use cases. An actor is a participant that interacts with the system. A use case description (UCD) (Table 1) specifies a system's functional requirements by describing a sequence of actions performed by the system in response to the actors' inputs, thus providing the required functions to the actors.

Table 1. Use Case Description template ("Withdraw funds"), taken from Yue et al. (2013a, 2015) with a few modifications.

image001

NLP models

Our method extracts two NLP constructs, namely Parts of Speech tags (POS-tags) and Type Dependencies (TDs), from the sentences in the Use Case Descriptions (UCD) using the Stanford NL parser API. Afterward, we use these NLP constructs to identify the sentence structure of the sentences and extract the elements necessary for the class diagram generation.

Parts of Speech tags (POS-tags)

POS-tags annotate the words in a sentence with their respective parts of speech, such as noun, pronoun, verb, adjective, adverb, and so on. Once a sentence is provided as input to the parser, it analyzes the sentence and assigns each word a POS-tag from a set of 36 available tags.

Type Dependencies (TDs)

TDs represent the grammatical dependency relationships (i.e., bi-lexical asymmetric relationships) between the words in a sentence. The current Stanford typed dependencies set includes 53 different grammatical relations. The parser produces a TD in the form of a triplet tdName(head, dependent), where tdName is the name of the dependency, head is the head word, and dependent is the dependent word.
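To make the extraction step concrete, the sketch below shows how both constructs could be obtained with the Java API of Stanford CoreNLP. It is a minimal illustration rather than the implementation used in this work: it assumes a recent CoreNLP release (classes such as CoreDocument and CoreSentence), and newer releases emit Universal Dependencies labels (e.g., compound instead of nn), so relation names may differ slightly from the examples given later.

import java.util.Properties;

import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.CoreSentence;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphEdge;

public class UcdParser {
    public static void main(String[] args) {
        // Annotators needed for POS-tags and dependency parsing.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,depparse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        CoreDocument doc = new CoreDocument("Customer enters the conversion amount.");
        pipeline.annotate(doc);

        for (CoreSentence sentence : doc.sentences()) {
            // POS-tags: one tag per token, e.g. enters/VBZ, amount/NN.
            System.out.println("Tokens:   " + sentence.tokensAsStrings());
            System.out.println("POS-tags: " + sentence.posTags());

            // Type dependencies in the triplet form tdName(head, dependent).
            SemanticGraph deps = sentence.dependencyParse();
            for (SemanticGraphEdge edge : deps.edgeIterable()) {
                System.out.printf("%s(%s-%d, %s-%d)%n",
                        edge.getRelation().getShortName(),
                        edge.getGovernor().word(), edge.getGovernor().index(),
                        edge.getDependent().word(), edge.getDependent().index());
            }
        }
    }
}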

Language model

In English, a single idea can be communicated using various sentence structures. This applies to generating class diagrams, where essential elements are spread across different sentences. To extract these elements accurately, we must first identify the structures of the sentences.

Our language model is based on the twenty-five verb patterns introduced by A. S. Hornby. These patterns describe the twenty-five different ways in which a verb phrase can be expressed within a sentence.

For the sentence ‘Customer enters the conversion amount’, the generated type dependencies are [root(ROOT-0, enters-2), nsubj(enters-2, Customer-1), det(amount-5, the-3), nn(amount-5, conversion-4), dobj(enters-2, amount-5)]. Such sentences follow the Subject-Verb-DirectObject sentence structure.

In addition to analyzing all sentence structures, and to prevent ambiguity in sentences (Kamsties and Peach, 2000; Wiegers and Beatty, 2013), our language model restricts the way English is used when writing Use Case Descriptions through a few specific rules (an automated check for one of them is sketched after the list):

  • Use simple sentences to write the basic flow steps of the use case description.
  • Do not use pronouns.
  • Use consistent names for things.
  • Use “system” to refer to the system under development.
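These restrictions lend themselves to simple automated pre-checks. The sketch below illustrates the "do not use pronouns" rule using only the POS-tags produced by the parser; the class and method names are hypothetical, and such a check is a pre-processing aid rather than one of the transformation rules themselves.

import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative pre-check for the "do not use pronouns" restriction.
 * It works on the POS-tags produced in the parsing step; the tag names
 * follow the Penn Treebank tag set used by the Stanford parser.
 */
public class RestrictionChecker {

    // Penn Treebank tags for personal, possessive and wh-pronouns.
    private static final List<String> PRONOUN_TAGS = List.of("PRP", "PRP$", "WP", "WP$");

    /** Returns the tokens of a sentence that violate the "no pronouns" rule. */
    public static List<String> findPronouns(List<String> tokens, List<String> posTags) {
        List<String> violations = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (PRONOUN_TAGS.contains(posTags.get(i))) {
                violations.add(tokens.get(i));
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        // "It displays the balance." violates the rule; the writer should
        // repeat the noun instead, e.g. "System displays the balance."
        List<String> tokens = List.of("It", "displays", "the", "balance", ".");
        List<String> tags = List.of("PRP", "VBZ", "DT", "NN", ".");
        System.out.println("Pronouns found: " + findPronouns(tokens, tags));
    }
}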

Methodology

The proposed approach works in four steps. First, it reads the use case descriptions and parses them to generate TDs and POS-tags using the Stanford CoreNLP API. It then applies the sentence structure rules to the TDs and POS-tags to identify the sentence structure of each sentence. Following the identification of sentence structures, transformation rules based on the type dependencies and part-of-speech tags are applied to identify problem-level classes, attributes, methods, and relationships between classes.
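The overall flow of these steps can be pictured with the following skeleton. Every type and method name in it is a placeholder introduced for illustration; the paper specifies the steps in prose and rule tables, not as a concrete API.

import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative skeleton of the overall pipeline. All referenced helpers are
 * stand-ins for the parser call and the rule tables discussed below.
 */
public class Pipeline {

    /** Minimal stand-ins for the artefacts passed between the steps. */
    record ParsedSentence(List<String> posTags, List<String> typeDependencies) {}
    record ClassDiagram(List<String> classes, List<String> relationships) {}

    public static ClassDiagram transform(List<String> ucdSentences) {
        ClassDiagram diagram = new ClassDiagram(new ArrayList<>(), new ArrayList<>());
        for (String sentence : ucdSentences) {
            ParsedSentence parsed = parse(sentence);              // Step 1: TDs and POS-tags
            String structure = identifyStructure(parsed);         // Step 2: sentence structure rules
            applyTransformationRules(structure, parsed, diagram); // Steps 3-4: rules T1-T42
        }
        return diagram;
    }

    static ParsedSentence parse(String sentence) {
        return new ParsedSentence(List.of(), List.of()); // stub for the Stanford CoreNLP call
    }

    static String identifyStructure(ParsedSentence sentence) {
        return "Subject-Verb-DirectObject"; // stub for the sentence structure rule table
    }

    static void applyTransformationRules(String structure, ParsedSentence sentence, ClassDiagram diagram) {
        // Classes, attributes, operations and relationships would be added here.
    }

    public static void main(String[] args) {
        ClassDiagram diagram = transform(List.of("Customer enters the conversion amount."));
        System.out.println("Classes: " + diagram.classes() + ", relationships: " + diagram.relationships());
    }
}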

Step 1. Parse sentences to obtain their type dependencies and POS-tags

For the sentence “Customer enters the conversion amount.”, the TDs and POS-tags generated by the parser are: TDs = [root(ROOT-0, enters-2), nsubj(enters-2, Customer-1), det(amount-5, the-3), nn(amount-5, conversion-4), dobj(enters-2, amount-5)]; POS-tags = [Customer/NN, enters/VBZ, the/DT, conversion/NN, amount/NN, ./.].

Step 2. Identify sentence structures of the sentences

The proposed approach utilizes a set of sentence structure rules, which are described in the Language Model section, to identify the structure of sentences. To identify the sentence structure of a given sentence, the approach sequentially checks each type dependency (TD) and retrieves the matching sentence structure rule from a table containing all rules. As described above, for the sentence ‘Customer enters the conversion amount’, the generated type dependencies are [root(ROOT-0, enters-2), nsubj(enters-2, Customer-1), det(amount-5, the-3), nn(amount-5, conversion-4), dobj(enters-2, amount-5)]. The sentence structure is therefore captured by the pattern nsubj(x, y), dobj(x, z), where both dependencies share the same head verb x. Such sentences follow the Subject-Verb-DirectObject structure.
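A minimal sketch of this matching step is given below. It assumes TDs are available as (name, head, dependent) triplets, as in the parsing sketch above, and its "rule table" contains only the Subject-Verb-DirectObject pattern; the real approach checks the full set of sentence structure rules.

import java.util.List;
import java.util.Objects;

/** Illustrative check for the Subject-Verb-DirectObject sentence structure. */
public class StructureIdentifier {

    /** A type dependency in the triplet form tdName(head, dependent). */
    record Td(String name, String head, String dependent) {}

    /**
     * A sentence is treated as Subject-Verb-DirectObject when an nsubj and a
     * dobj dependency share the same head verb.
     */
    static boolean isSubjectVerbDirectObject(List<Td> tds) {
        return tds.stream()
                .filter(td -> td.name().equals("nsubj"))
                .anyMatch(subj -> tds.stream()
                        .anyMatch(td -> td.name().equals("dobj")
                                && Objects.equals(td.head(), subj.head())));
    }

    public static void main(String[] args) {
        // TDs for "Customer enters the conversion amount."
        List<Td> tds = List.of(
                new Td("root", "ROOT-0", "enters-2"),
                new Td("nsubj", "enters-2", "Customer-1"),
                new Td("det", "amount-5", "the-3"),
                new Td("nn", "amount-5", "conversion-4"),
                new Td("dobj", "enters-2", "amount-5"));
        System.out.println("Subject-Verb-DirectObject: " + isSubjectVerbDirectObject(tds));
    }
}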

Step 3. Identify transformation rules to create class diagram elements

We propose transformation rules (T1-T42) to disambiguate the process of identifying the correct elements of the class diagram from the text. The rules utilize the syntactic and semantic relationships between words in sentences, as determined from type dependencies (TDs) and part-of-speech tags.

Table 2. Transformation rules.

image002

Rules to determine class operations and class attributes

The approach utilizes sentence structure rules outlined in the Language Model section. All nouns identified within a sentence are saved in a set, referred to as the "NounList".

Table 2 (continued).

image003

To determine class operations, the approach first identifies the source and destination nouns within the NounList, along with the operation name. Based on these identified elements, classes, operations, and relationships are created in accordance with the aforementioned sentence structure rules.

The source noun term represents the caller of the identified operation and is a candidate for being associated with only one class. If no class exists for the source noun term, a new class is created for it according to rule T35. The destination noun term may either be an attribute of an existing class in which the identified operation is to be hosted, or it may be a candidate for being associated with a new class. If a class already exists for the destination noun term, the identified operation is hosted in that class as per rule T36. Otherwise, a new entity class is created, the identified operation is hosted in that class, and the class is added to the ClassDiagram instance.
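The decision logic just described (roughly corresponding to rules T35 and T36) can be sketched as follows. The ClassDiagram and UmlClass types are illustrative stand-ins, since the actual rules are defined in Table 2 rather than as code.

import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch of hosting an identified operation (cf. rules T35/T36). */
public class OperationRules {

    static class UmlClass {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<>(); // attribute name -> type
        final Map<String, String> operations = new LinkedHashMap<>(); // operation name -> note
        UmlClass(String name) { this.name = name; }
    }

    static class ClassDiagram {
        final Map<String, UmlClass> classes = new LinkedHashMap<>();
        UmlClass getOrCreate(String name) {
            return classes.computeIfAbsent(name, UmlClass::new);
        }
    }

    /**
     * Hosts an operation extracted from a Subject-Verb-DirectObject sentence,
     * e.g. sourceNoun = "Customer", operationName = "enters",
     * destinationNoun = "conversion amount".
     */
    static void hostOperation(ClassDiagram diagram, String sourceNoun,
                              String destinationNoun, String operationName) {
        // As in rule T35 (as described in the text): the source noun is the
        // caller; create a class for it if none exists yet.
        diagram.getOrCreate(sourceNoun);

        // If the destination noun is already an attribute of an existing class,
        // host the operation in that class.
        for (UmlClass c : diagram.classes.values()) {
            if (c.attributes.containsKey(destinationNoun)) {
                c.operations.put(operationName, "called by " + sourceNoun);
                return;
            }
        }

        // As in rule T36 (as described in the text): otherwise host the operation
        // in the class of the destination noun, creating a new entity class if needed.
        diagram.getOrCreate(destinationNoun).operations.put(operationName, "called by " + sourceNoun);
    }

    public static void main(String[] args) {
        ClassDiagram diagram = new ClassDiagram();
        hostOperation(diagram, "Customer", "ConversionAmount", "enters");
        diagram.classes.values().forEach(c ->
                System.out.println(c.name + " operations=" + c.operations));
    }
}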

Table 2 (continued).

image004

Identifying association relationships

Using the source noun term, destination noun term, and operation name for each operation identified before, the approach establishes an association relationship between the class representing the source noun term and the class representing the destination noun term, with navigability set from the former to the latter.
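A minimal sketch of this step is shown below; the Association record and its fields are illustrative, and navigability is represented simply by the direction from the source class to the destination class.

import java.util.ArrayList;
import java.util.List;

/** Illustrative creation of a navigable association for an identified operation. */
public class AssociationRules {

    /** A directed association from the source class to the destination class. */
    record Association(String sourceClass, String destinationClass, String operationName) {
        @Override
        public String toString() {
            return sourceClass + " --" + operationName + "--> " + destinationClass;
        }
    }

    /**
     * For each operation identified earlier, the class of the source noun term
     * is associated with the class of the destination noun term, navigable from
     * the former to the latter, and labelled with the operation name.
     */
    static Association associate(String sourceNoun, String destinationNoun, String operationName) {
        return new Association(sourceNoun, destinationNoun, operationName);
    }

    public static void main(String[] args) {
        List<Association> associations = new ArrayList<>();
        associations.add(associate("Customer", "ConversionAmount", "enters"));
        associations.forEach(System.out::println);
    }
}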

Table 2 (continued).

image005

Identifying generalization relationships

In order to identify generalization relationships, the approach scans the sentences in the flows and description sections of the UCD. Specifically, it looks for sentences containing phrases such as "is a," "kind of," and their various synonyms; these sentences are then used to determine the existence of generalization relationships. The phrases "is a," "type of," "kind of," and their synonyms are referred to as Generalization Words (GenWords).
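The scan can be sketched as follows; the GenWord list below contains only the phrases named in the text, not the full synonym set used by the approach, and the class name is hypothetical.

import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Illustrative scan for generalization candidates using GenWords. */
public class GeneralizationRules {

    // Only the GenWords explicitly named in the text; the full synonym list
    // used by the approach is not reproduced here.
    static final List<String> GEN_WORDS = List.of("is a", "type of", "kind of");

    /** Returns the GenWord contained in the sentence, if any. */
    static Optional<String> findGenWord(String sentence) {
        String lower = sentence.toLowerCase(Locale.ROOT);
        return GEN_WORDS.stream().filter(lower::contains).findFirst();
    }

    public static void main(String[] args) {
        String sentence = "A savings account is a kind of account.";
        findGenWord(sentence).ifPresent(genWord ->
                System.out.println("Generalization candidate (GenWord: '" + genWord + "')"));
        // A later step would treat the noun before the GenWord as the subclass
        // ("savings account") and the noun after it as the superclass ("account").
    }
}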

Table 2 (continued).

image006

Identifying aggregation relationships

An aggregation relationship refers to the relationship between two classes where the part class is contained in the whole class, either as a part of the whole class or as an attribute within it. The approach identifies aggregation relationships using two main tactics.

  1. When a class (C1) is present as an attribute in another class (C2), C1 is identified as the part class, C2 as the whole class, and an aggregation relationship is established between them.
  2. The approach searches for sentences in the flows and description sections of the UCD containing substrings such as "part of," "consists of," "contains," and their synonyms (AggWords), as these sentences are potential candidates for aggregation relationships. Both tactics are sketched after this list.
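The sketch below illustrates both tactics; the AggWord list contains only the substrings named in the text, and the data structures are illustrative stand-ins for the class diagram model.

import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative sketch of the two tactics for finding aggregation relationships. */
public class AggregationRules {

    // Only the AggWords explicitly named in the text.
    static final List<String> AGG_WORDS = List.of("part of", "consists of", "contains");

    /** Tactic 1: a class that appears as an attribute type of another class is a part of it. */
    static void fromAttributes(Map<String, Map<String, String>> classes) {
        // classes: class name -> (attribute name -> attribute type)
        classes.forEach((whole, attributes) ->
                attributes.forEach((attributeName, attributeType) -> {
                    if (classes.containsKey(attributeType)) {
                        System.out.println(attributeType + " is a part of " + whole);
                    }
                }));
    }

    /** Tactic 2: sentences containing an AggWord are candidate aggregations. */
    static boolean isAggregationCandidate(String sentence) {
        String lower = sentence.toLowerCase(Locale.ROOT);
        return AGG_WORDS.stream().anyMatch(lower::contains);
    }

    public static void main(String[] args) {
        fromAttributes(Map.of(
                "Order", Map.of("lines", "OrderLine"),
                "OrderLine", Map.of("quantity", "int")));
        System.out.println(isAggregationCandidate("An order consists of order lines."));
    }
}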

Table 2 (continued).

image007

Results & Conclusion

This paper suggests a set of transformation rules for creating class diagrams from software requirements documented in the form of use case descriptions (UCDs). The approach begins by parsing the UCDs with the Stanford NL parser APIs to generate TDs and POS-tags. The sentence structure of each sentence is then identified by applying the proposed comprehensive sentence structure rules to its TDs and POS-tags. A comprehensive set of transformation rules is then applied to the sentences' TDs and POS-tags to identify the elements required for generating the class diagram.

This work has several possible future directions, including generalizing the approach to interpret and transform more complex and compound sentences in addition to the simple and complex sentences currently handled. The approach could also be extended to generate platform-specific models (PSMs) from platform-independent models (PIMs), as well as to generate template code.

References

  1. Abbott R.J. (1983) Program design by informal English descriptions. Communications of the ACM 26 (11): 882-894.
  2. Arango G. (1989) Domain analysis: From art form to engineering discipline. ACM SIGSOFT Software Engineering Notes 14 (3): 152-159.
  3. Mich L. (1996) NL-OOPS: from natural language to object oriented requirements using the natural language processing system LOLITA. Natural Language Engineering 2 (02): 161-187.
  4. Mich L., Garigliano R. (2002) NL-OOPS: A requirements analysis tool based on natural language processing. In: Proceedings of the Third International Conference on Data Mining Methods and Databases for Engineering, Bologna, Italy.
  5. Yue T., Briand L., Labiche Y. (2013a) Automatically deriving a UML analysis model from a use case model. Tech. Rep. 2010-15 (Version 2), Simula Research Laboratory.
  6. Greenbaum S. (1996) The Oxford English Grammar. Oxford University Press, Oxford.
  7. Bézivin J. (2006) Model driven engineering: An emerging technical space. In: Generative and Transformational Techniques in Software Engineering, Springer, pp. 36-64.
  8. Siqueira F.L., Silva P.S.M. (2008) An essential textual use case meta-model based on an analysis of existing proposals. In: WER.
  9. Hanks P. (2008) Lexical patterns: From Hornby to Hunston and beyond. In: Proceedings of the XIII EURALEX International Congress, pp. 89-129.
  10. Hornby A. (1975) A guide to patterns and usage in English. Oxford University Press.
  11. Kamsties E., Peach B. (2000) Taming ambiguity in natural language requirements. In: Proceedings of the Thirteenth International Conference on Software and Systems Engineering and Applications.
