Brochures

Temporal Patterns for Document Verification

Categories
Published
of 15
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
Related Documents
Share
Description
Mirjana Jakšić, Burkhard Freitag University of Passau Germany Abstract In this paper we present a novel user-friendly high-level approach to the specification
Transcript
Mirjana Jakšić, Burkhard Freitag University of Passau Germany Abstract In this paper we present a novel user-friendly high-level approach to the specification of temporal properties of web documents which can be used for verification purposes. The method described is based on specification patterns supporting an incremental construction of commonly used consistency criteria. We show that our approach fills the gap between a temporal logic such as CTL as a powerful tool for specifying consistency criteria for web documents and users that maintain documents but have no or very limited knowledge about the specification formalism. An empiric assessment of the usability of specification patterns for web documents confirms that a pattern based specification shows significantly better results than the direct specification with CTL. 1 Introduction The concept of consistency is commonly applied to databases, programs, protocols, concurrent processes, and systems but can be naturally extended to digital documents. Various notions of consistency and a wide range of consistency checking methods have been studied in the field of digital documents. In this paper we address the problem of specifying consistency criteria for the purpose of verification of web documents. This work is part of the Verdikt project [19]. We focus on temporal properties of documents along standard reading paths. For example, we check whether in a web-based training (WBT) document every description of a certain concept is followed by an example of the same concept. This kind of consistency is particularly useful when having to ensure document coherence and certain properties of the narrative flow, e.g., in e-learning or technical documentation. In the Verdikt project the verification is performed by model-checking based on the temporal description logic ALCCTL [18]. For the sake of simplicity and clarity, we express consistency criteria in the more common, but also less expressive computation tree logic - CTL [6, 9] in this paper. However, the results described apply also to ALCCTL. Temporal logics are usually used for verification tasks in the application field of software engineering, but there are also systems using temporal logic for hypertext verification, e.g. [17]. Applying a temporal logic such as CTL or ALCCTL requires good mathematical knowledge and a lot of experience and usually involves considerable effort in terms of manpower and time. For this reason, a high-level mechanism supporting the process of formal specification is highly desirable. Our goal is to provide a user-friendly high-level specification scheme for temporal properties, which supports the incremental construction of commonly used consistency criteria for web documents. Among the existing methods for high-level specification, pattern-based approaches are well established and widely used [5, 7, 14]. In many cases they do not require deep-level knowledge of the underlying specification formalism. We will show that specification patterns which originally have been introduced for the field of reactive systems [5] can be adapted and enhanced for the purpose of specifying consistency properties of documents. Furthermore, we define an appropriate mapping of patterns onto CTL formulae. We also show that the construction of commonly used consistency conditions for web documents can be performed incrementally, thus giving less experienced users the opportunity to proceed This work is partially funded by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) under grant number FR 1021/7-2 17 from low to higher complexity. The usability of specification patterns has been evaluated in comparison to a direct specification using plain CTL. The results show that for inexperienced users the specification with patterns is significantly easier than the one with CTL. The contribution of this paper consists of: defining a set of specification patterns representing general temporal constraints that can be applied to express document consistency, showing how the proposed specification patterns can be used in the process of formalizing consistency criteria for web documents, and evaluating the usability of the approach. The paper is organized as follows. Section 2 describes the problem addressed. Section 3 introduces specification patterns for documents, section 4 deals with pattern transformation into CTL, while our specification tool is introduced in section 5. Evaluation results are presented in section 6. Section 7 discusses our results and related work. Section 8 concludes the paper. 2 Problem Description Our aim is to check the consistency of the narrative structure of a document. The narrative structure represents relevant aspects of the content and structure in a document s model and is defined as follows (see also [19]). Definition 1 (Narrative units. Start units. Narrative relation. Narrative path). A document D is structured as a finite set NU /0 of narrative units where each u NU is a cohesive, self-contained part of D. SU NU denotes a non-empty set of start units such that each u SU is a sensible starting point for reading the document. A narrative relation NR on the set of narrative units is defined by NR := {(u,u ) NU NU it is sensible to proceed to unit u immediately after having read unit u}. Let NR NU NU be a narrative relation of a document. Then a (potentially infinite) sequence (u 0,u 1,...) of narrative units is a narrative path iff (u i,u i+1 ) NR for each i N. Remark 2 (Narrative path). Since web documents are typically not read linearly, narrative paths cannot be assumed to be acyclic. As a consequence, we have to consider narrative paths to be potentially infinite. By allowing infinite paths the document model does not put a limit on the number of times a certain narrative unit can be visited. Definition 3 (Narrative structure). A narrative structure is a tuple NS = (NU, SU, NR) where NU is a set of narrative units, SU NU is a set of start units, and NR NU NU is narrative relation on NU such that the following holds: i) NR is left-total on NU, i.e. for each u NU there is some u NU such that (u,u ) NR. ii) Any narrative unit of NU can be reached on some narrative path in NR: for each u NU there is a start unit s SU and a narrative path (u 0,u 1,...) NR such that u 0 = s and u i = u for some i N. Remark 4 (Narrative structure). Demanding NR to be left-total simplifies the formal verification framework. For the sake of generality, narrative units which do not have any sensible successor unit are modelled as being reflexively related with themselves. The process of reading a web document (by a human reader) can be modeled as a collection of paths in a state transition system (S,R,L) where the set of states S represents the narrative units of the document, R S S is a narrative relation on S and L represents a set of local interpretations, one for each state s S. 18 Definition 5 (Temporal structure. Temporal verification model). A CTL temporal structure is a state transition system M = (S,R,L), where: S is a set of states, AP denotes the set of all atomic CTL propositions, R S S is a left-total binary transition relation, defining the possible transitions between states, L : S P(AP) is a labeling function, that assigns each state a set I AP of atomic CTL propositions that hold at this particular state. A temporal verification model of a document is a temporal structure with a distinguished starting state s 0 S. Example 6 (Narrative structure). Figure 1a) depicts a fragment of a narrative structure of a web document taken from a web based training (WBT) about datastructures. The unit start is followed by the definition of datastructures. After this unit there are two possible branches to follow. The first one proceeds with an example of datastructures, then with a summary and a test about datastructures, and finally with the end unit. The other branch continues with a definition of abstract datatypes, followed by an example of abstract datatypes, and then joins the other branch at the summary unit. Note, that the fact, that a human reader would probably classify the detour over the abstract datatype unit as a side note, is inessential in our context. start s 0 {start} definition of abstract datatype definition of datastructure {definition, abstract_datatype} s 3 s 1 {definition, example of datastructure s 2 {example, example of abstract datatype summary of datastructure {example, abstract_datatype} s 4 s 5 {summary, test about datastructure s 6 {test, end s 7 {end} a) b) Figure 1: a) Narrative structure of a document, b) Temporal structure of a document Let us consider the following sample consistency criteria: 1. On all paths there exists a summary unit before the test unit. 2. Every definition of the topic datastructure is on all succeeding paths followed by an example of the same topic. 3. After the summary unit, no definition units are allowed. 19 Obviously, criterion 1 holds in the structure of Figure 1a). On both paths a summary unit exists immediately before the test unit. On the other hand, criterion 2 does not hold in the given structure, because there is a path ( start, definition of datastructure, definition of abstract datatypes, example of abstract datatypes, summary of datastructure, test about datastructure, end ) with a definition of datastructure not being followed by an example of datastructure. Finally, criterion 3 holds in the given structure, because both definitions ( definition of datastructure, definition of abstract datatype ) appear before the summary unit, concerning both possible paths. Example 7 (Temporal structure of a document). Figure 1b) depicts the CTL temporal structure of a document with a narrative structure as shown in Figure 1a). In this structure holds: the set of states is defined as S = {s 0,s 1,s 2,s 3,s 4,s 5,s 6,s 7 }, atomic propositions correspond to topics (datastructure, abstract datatype) and structural types (definition, example, summary,...), the transition relation is given by R = {(s 0,s 1 ),(s 1,s 2 ),(s 1,s 3 ),...,(s 6,s 7 ),(s 7,s 7 )}, the labeling function determining the interpretation of a state is defined as L = {s 0 {start},s 1 {de f inition,,...,s 7 {end}}. There are two different paths, namely s 0 s 1 s 2 s 5 s 6 s 7... (short for: {(s 0,s 1 ), (s 1,s 2 ), (s 2,s 5 ), (s 5,s 6 ), (s 6,s 7 ), (s 7,s 7 )}) and s 0 s 1 s 3 s 4 s 5 s 6 s The main steps of consistency specification and verification are shown in Figure 2. Users appear in two different roles: First, there are document authors, who provide, organize, and maintain document fragments. Experienced authors may also be able to specify consistency criteria using the interface for pattern-based specification to be described later in this paper. Second, there are temporal logic experts who can specify complex criteria directly in CTL and maintain the verification model, if necessary. 3 2 CTL specification temporal verification model 5 transformation pattern-based specification 4 model-checking error report 1 information extraction and integration documentfragments users Figure 2: Automated verification of semi-structured documents Assume that there are several text components, possibly in different formats (no. 1 in Figure 2). The information about the document s content and structure are available in the form of markup and external metadata or are provided by external information extraction tools. The collected information is represented by a temporal verification model (no. 2 in Figure 2) which essentially formalizes the 20 narrative structure of the document. This way an abstraction is provided from implementation details which are irrelevant for the verification tasks. The specification criteria are expressed in CTL (no. 3 in Figure 2) and verified against the verification model by the CTL model checker. The verification results (counterexamples) are then presented to the user (no. 4 in Figure 2). For example, a CTL formula which expresses the second criterion of Example 6 reads: AG((de f inition datastructure) AF(example datastructure)) Since CTL, as a temporal logic, is likely to be too demanding for non-expert users - which of course tend to be the majority - a user-level specification method based on specification patterns has been developed (no. 5 in Figure 2). Patterns represent commonly occurring requirements concerning the content and structure of documents (see Definition 9). Specification patterns are translated into CTL formulae. Our approach to automated verification of semi-structured documents is presented in detail in [19]. 3 Specification Patterns for Documents The primary goal of the work described in this paper is the definition of a high-level specification formalism for consistency criteria for web documents, which should fulfill the following properties: The proposed high-level formalism must represent the temporal properties of web documents and must be intuitively understandable, so that a user does not have to be aware of the underlying logic formalism. The system should provide a reasonable expressive power and yet stay compact and manageable. It is important to support the incremental development of specifications, i.e., it should allow the user to first recognize the general rule and then to refine it if required. The approach has to be extensible and adaptable to possibly different underlying logic formalisms. A pattern-based approach to the presentation, formulation, and reuse of property specifications in reactive systems has been introduced in [5]. A set of possible constraints has been defined and patterns have been created for them. The patterns are provided to the users who can identify similar requirements in their systems and select patterns that address those requirements. Until now, seven specification formalisms are supported, among them CTL [1]. We found that many of these patterns could also be useful for expressing document properties [10, 13]. The abstraction from temporal properties allows users not to worry about the underlying logic. The flexible definition and organization of the original patterns allow us to choose only a subset and to adapt and extend it easily for our needs. Because patterns defined in [5] are meant to be used by users familiar with the underlying specification formalism, user support for the specification process is not provided. Different from that situation, our use cases (see [15]) involve non-expert users; consequently, we have to support them in expressing formal consistency criteria. To this end, we provide an interface allowing to express loose criteria, which can be later enhanced if necessary. Example 8 (Properties of consistency criteria). Let us consider the consistency criterion: There always exists a summary unit before the first test unit. The following important properties can be observed: 1. It expresses a kind of constraint: the existence of a summary unit. 2. It specifies the part of the document or, more precisely, of its temporal structure, where the specification should hold: before the first test unit. The properties 1. and 2. characterize a specification pattern of the following kind: Within the considered structure, on all paths starting from the current state, property p holds before property s holds for the first time. The considered structure can be the whole document, but also any document fragment. 21 As one can observe, criteria expressed in natural language are quite ambiguous. For example, requiring that each definition of datastructure is followed by an example on the same topic does not specify precisely whether there should be an example of datastructure on all following paths after the definition of datastructure, or whether it is enough having an example on some path. Natural language specifications of certain properties of specification patterns are also ambiguous. Here are some examples of such ambiguities: Does q follows p require that q has to hold on all following paths, or on some path? After s could mean after each s or after the first one. It is also not clear what happens if there is no s in the whole document. Is the criterion satisfied in this case or not? Does the meaning of before s include the narrative unit where s holds for the first time or not? The ambiguities of natural language specifications were the main motivation for us to first define a set of basic specification patterns together with their corresponding CTL formulae and then to determine how the basic patterns can be modified, i.e. we defined a set of modified patterns with their corresponding CTL formulae. This way users can execute a two-stage process, first determining the general properties of the criterion they want to express adding refinements as necessary in the second step. The semantics of pattern types, scopes, and modifiers we use is determined by the definition of the mapping of specification patterns onto CTL as will be detailed in section 4. Definition 9 (Specification pattern). A specification pattern (for documents) is a generalized representation of a commonly occurring requirement on the content and structure of documents (cf. [5]). Specifications are instances of specification patterns. A specification pattern is represented by a 4-tuple: (pattern type, p modi f ier, scope, s modi f ier). A pattern type (pattern type) determines the type of the constraint expressed by the specification pattern. Each pattern type is represented by a pattern type name and one or two pattern properties. Pattern type names (universally, exists, follows, precedes) denote the type of the constraint and can only be understood in conjunction with pattern properties. A pattern property is a parameter which represents the CTL formula required to hold by the pattern type. Let p and q be CTL formulae. Possible values of pattern type are: universally p, exists p, q follows p, and p precedes q. universally p means that p holds in every narrative unit. exists p expresses that p has to hold in some narrative unit. q follows p means each unit satisfying p must be succeeded by a unit for which property q holds. p precedes q means that if property q holds in some narrative unit this unit must be preceded by a unit for which property p holds. By default, each pattern type applies to all paths of a document but this can be overridden. pattern type universally p exists p q follows p p precedes q pattern modifiers null p, absence, some path null p, some path null p, immediate p null p Table 1: Pattern types with allowed pattern modifiers A pattern modifier (p modi f ier) allows to refine a pattern type, by further restricting or loosening the original meaning. Possible values of p modi f ier are: null p, absence, immediate p, some path. Modifier null p indicates that the original meaning of a pattern type is not changed. 22 Table 1 shows the allowed pattern modifiers for each pattern type. For pattern types universally p and exists p there is a pattern modifier some path. It says that the constraint holds on some path of a document, as opposed to the default meaning. For the pattern type universally p a pattern modifier absence is defined, which denotes that p does not hold in any narrative unit. The pattern type q follows p can be used with the modifier immediate p, which expresses that q must hold in all next narrative units of the one where p holds. A scope determines where in a document a specification is intended to hold. A scope is represented by a scope name and one or two scope properties. A scope property is a parameter, which will be replaced by a CTL formula at instantiation time. Let s and r be CTL formulae. Possible values of scope are: globally, before s, after s, and between s and r. Scope globally requires no parameters and actually expresses an unrestricted scope - a specification having this scope applies to the whole document structure. before s expresses that the
We Need Your Support
Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks