请输入您要查询的百科知识:

 

词条 Brzozowski derivative
释义

  1. Derivative of a regular expression

  2. Computation

  3. Properties

  4. See also

  5. References

In theoretical computer science, in particular in formal language theory, the Brzozowski derivative u−1S of a set S of strings and a string u is defined as the set of all strings obtainable from a string in S by cutting off a prefixing u, formally: u−1S = { v ∈ Σ*: uvS }, cf. picture.[1]

It is named after the computer scientist Janusz Brzozowski who investigated their properties and gave an algorithm to compute the derivative of a generalized regular expression.

Derivative of a regular expression

Given a finite alphabet A of symbols,[2] a generalized regular expression denotes a possibly infinite set of finite-length strings of symbols from A. It may be built of:

  • ∅ (denoting the empty set of strings),
  • ε (denoting the singleton set containing just the empty string),
  • a symbol a from A (denoting the singleton set containing the single-symbol string a),
  • RS (where R and S are, in turn, generalized regular expressions; denoting their set's union),
  • RS (denoting the intersection of R 's and S 's set),
  • ¬R (denoting the complement of R 's set with respect to the set of all strings of symbols from A),
  • RS (denoting the set of all possible concatenations of strings from R 's and S 's set),
  • R (denoting the set of n-fold repetitions of strings from R 's set, for any n≥0, including the empty string).

In an ordinary regular expression, neither ∧ nor ¬ is allowed.

The string set denoted by a generalized regular expression R is called its language, denoted as L(R).

Computation

For any given generalized regular expression R and any string u, the derivative u−1R is again a generalized regular expression.[3]

It may be computed recursively as follows.[4]

(ua)−1R = a−1(u−1R)       for a symbol a and a string u
ε−1R = R

Using the previous two rules, the derivative with respect to an arbitrary string is explained by the derivative with respect to a single-symbol string a.

The latter can be computed as follows:[5]

a−1a = ε
a−1b = ∅ for each symbol ba
a−1ε = ∅
a−1 = ∅
a−1(R*) = a−1RR*
a−1(RS) = (a−1R)S ∨ ν(R)a−1S
a−1(RS) = (a−1R) ∧ (a−1S)
a−1(RS) = (a−1R) ∨ (a−1S)
a−1R) = ¬(a−1R)

Here, ν(R) is an auxiliary function yielding a generalized regular expression that evaluates to the empty string ε if R 's language contains ε, and otherwise evaluates to ∅. This function can be computed by the following rules:[6]

ν(a) = ∅ for any symbol a
ν(ε) = ε
ν(∅) = ∅
ν(R*) = ε
ν(RS) = ν(R) ∧ ν(S)
ν(RS) = ν(R) ∧ ν(S)
ν(RS) = ν(R) ∨ ν(S)
ν(¬R) = ε if ν(R) = ∅
ν(¬R) = ∅ if ν(R) = ε

Properties

A string u is a member of the string set denoted by a generalized regular expression R if and only if ε is a member of the string set denoted by the derivative u−1R.[7]

Considering all the derivatives of a fixed generalized regular expression R results in only finitely many different languages. If their number is denoted by dR, all these languages can be obtained as derivatives of R with respect to string of length below dR.[8] Furthermore, there is a complete deterministic finite automaton with dR states which recognises the regular language given by R, as laid out by the Myhill–Nerode theorem.

See also

  • Quotient of a formal language

References

1. ^{{cite journal| author=Janusz A. Brzozowski| title=Derivatives of Regular Expressions| journal=J ACM| year=1964| volume=11| issue=4| pages=481–494| doi=10.1145/321239.321249}}
2. ^Brzozowski (1964), p.481, required A to consist of the 2n combinations of n bits, for some n.
3. ^Brzozowski (1964), p.483, Theorem 4.1
4. ^Brzozowski (1964), p.483, Theorem 3.2
5. ^Brzozowski (1964), p.483, Theorem 3.1
6. ^Brzozowski (1964), p.482, Definition 3.2
7. ^Brzozowski (1964), p.483, Theorem 4.2
8. ^Brzozowski (1964), p.484, Theorem 4.3

1 : Formal languages

随便看

 

开放百科全书收录14589846条英语、德语、日语等多语种百科知识,基本涵盖了大多数领域的百科知识,是一部内容自由、开放的电子版国际百科全书。

 

Copyright © 2023 OENC.NET All Rights Reserved
京ICP备2021023879号 更新时间:2024/11/12 0:07:13