My friend John Jozwiak has held a research interest in Groebner bases for sometime. Why? According to a well-put email he sent me today:
they are the algebraic tool for what is essentially “does this set of rewrite rules define a language which contains this string?”
Put in other terms, John’s saying that these mathematical tools can be used to determine if a sentence is a part of a language – or, if the linguistic description of a language (the “rewrite rules”) can be used to make a given sentence (the “string”).
Could this possible lead to the sort of breakthrough machine translation has been looking for? Perhaps John will comment for us…
Here’s John’s follow up on GBs:
so, i think the real situation is we have
a set or ‘space’, some elements from the set,
some rule for symbolically taking elements and
creating new ones.
if , as in GBs, we are looking at the space
of Integers[x1,...,xm] then the rules are
+,-,* . a basis for that space can create
anyone in it using the rule.
so a basis is {1,x1,…,xm}
as would be other redundant sets like
{1,x1,…,xm,x1+…+xm}.
however, there is a unique minimal basis.
more on that shortly.
if we are interested in strings
(as WE are) then we could consider
sequences over some basis like the letters
and symbols (number, punctuation) of english,
and perhaps emoticons if you like. anyway,
the emoticons are redundant, so would not be
in a minimal basis. the rule for creating
new strings from others is concatenation.
perhaps in a semantic realm there are other rules.
perhaps a semantic space (realm) necessarily has
the syntactic realm as a subspace? hmmm.
anyway, so, if we totally order ALL the elements
in set, and we have some basis B under consideration,
in that we want to answer questions related to it
(as in “did B create _this_?”),
then we can , at least with polynomials,
try to construct an equivalent basis (spans
the same space) whose elements are minimal
in the total order.
that is the idea of Gro”bner bases.
so, with the total order and the insistence
on minimality with respect to the order,
the GB turns out unique, spans the same set,
by construction. we are nearly done.
so, to construct a GB(B) from basis B,
we evolve a basis BB from B in that we
we iterate the following till nothing changes:
choose two elements in BB and look at their
littlest (in the total order) rule-derived offspring O.
set BB = BB setunion {0}.
continue till BB no longer evolves.
in polynomials the evolutionary steps i know of are
to take two polynomials from BB, and create a new one
with smallest leading term (coefficient times variables),
and reinsert that into the new BB. the buchberger algorithm
for creating a GB just does that for all pairs till BB is
invariant, as i think of it.
so, what if we look at strings as polynomials where
the * rule ‘does not commute’ so that it is only coincidence
that a*b=b*a ? then + and * with variables the nonterminals
and the coefficients the letters look like a basis for a
language of strings.
this is all way clearer with examples.
that is it, essentially.
You’ll probably never read this but I was just thinking similar stuff. You can do stuff with the commutative image of a context free algebra nicely with Groebner bases. For example, to count the number of syllables in a sentence you need only the commutative image of the sentence, So Groebner bases give a nice algorithm to generate sentences from a CFG with a given number of syllables, useful when composing poetry