Basic approach

A TGrep2 query consists of regular expressions and links (which can be modified). The simplest query consists of a single regular expression, e.g. /the/. It returns every sentence that contains a node matching the regular expression, e.g. any sentence with the, them, atheist, etc. Likewise, you could use /^VP$/ to get any sentence containing a verb phrase (in the Penn Treebank).
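The difference between an unanchored and an anchored pattern can be tried out in a few lines of Python. This is only a sketch: it uses Python's re module as a stand-in for TGrep2's own regular expression matching, and the list of labels is made up for illustration.

```python
import re

# Hypothetical node labels and words, as they might occur in a parsed sentence.
labels = ["the", "them", "atheist", "VP", "ADVP", "cat"]

# Unanchored pattern: matches every label that contains "the" anywhere.
contains_the = [s for s in labels if re.search(r"the", s)]

# Anchored pattern: matches only the label that is exactly "VP".
exactly_vp = [s for s in labels if re.search(r"^VP$", s)]

print(contains_the)  # ['the', 'them', 'atheist']
print(exactly_vp)    # ['VP']
```

Without the anchors ^ and $, the pattern VP would also pick up ADVP, which is why the anchored form is used for phrase labels.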

Note: It is always a good idea to start simple and then go into more detail. TGrep2 queries are not intuitive. Do not worry: a good query takes time. Even experienced people have to rely on a lot of trial & error.

If we now want to expand our query and look for sentences where the verb phrase is the parent of a noun phrase, the query would look like this: /^VP$/ < /^NP/. Note that we did not put a $ after the NP because function tags (e.g. NP-SBJ) may follow the label. In fact, this query returns 39970 results in the Penn Treebank, whereas /^VP$/ < /^NP$/ gives you only 34881 results.
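The effect of the trailing $ can be checked on a handful of labels. Again, this is a sketch that assumes Python's re module behaves like TGrep2's matching for such simple patterns; the label list is a hypothetical sample, not taken from the treebank counts above.

```python
import re

# Hypothetical sample of node labels; NP-SBJ and NP-TMP carry function tags.
labels = ["NP", "NP-SBJ", "NP-TMP", "NNP", "PP", "VP"]

# /^NP/ : the label only has to start with NP.
open_ended = [s for s in labels if re.search(r"^NP", s)]

# /^NP$/ : the label has to be exactly NP.
exact = [s for s in labels if re.search(r"^NP$", s)]

print(open_ended)  # ['NP', 'NP-SBJ', 'NP-TMP']
print(exact)       # ['NP']
```

The noun phrases with function tags are the ones that the stricter pattern misses.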

The general scheme is: regexp link regexp

Since this is still a rather basic query, we will expand it once again. For the sake of this tutorial, we will now look for verb phrases which immediately dominate a noun phrase which in turn immediately dominates a determiner. A quick glance at the Penn Treebank tagset tells us that the tag for determiners is DT. Your first attempt at a query would probably be /^VP$/ < /^NP/ < /DT/. This query, however, only returns 491 results, which does not seem right because determiners are quite common in noun phrases. Moreover, the actual results do not seem to feature the pattern we were looking for. And indeed, this is not the correct query. In TGrep2, all links in a chain refer to the first element. Overlooking this is a very common mistake. With this query we actually looked for a verb phrase which is the parent of a noun phrase and is also the parent of a determiner. So how do we get around that?

Brackets. With brackets you can make sure that your links refer to the element you want them to. When you use brackets, the link always refers to the first element within the brackets. Thus, the query we need to use looks like this: /^VP$/ < (/^NP/ < /DT/). Since the NP is the first element within the brackets, this is what the following link refers to. Similar to mathematics or labelled bracketing, there can be many levels of brackets which may result in rather complex queries.
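To see why the two queries differ, it can help to spell them out by hand on a toy tree. The following Python sketch is not how TGrep2 works internally; the tuple-based tree format, the helper names and both example trees are assumptions made up for this illustration.

```python
import re

# A tree node is a tuple: (label, child, child, ...). Leaves have no children.
# Both toy trees are invented for illustration.
nested = ("S", ("VP", ("VBD",), ("NP", ("DT",), ("NN",))))
flat = ("S", ("VP", ("DT",), ("NP", ("NN",))))

def label(node):
    return node[0]

def children(node):
    return node[1:]

def subtrees(node):
    """Yield the node itself and every node below it."""
    yield node
    for child in children(node):
        yield from subtrees(child)

def has_child(node, pattern, condition=lambda child: True):
    """True if some child matches `pattern` and also satisfies `condition`."""
    return any(re.search(pattern, label(c)) and condition(c)
               for c in children(node))

def chained(tree):
    """Sketch of /^VP$/ < /^NP/ < /DT/: both links test the VP."""
    return [n for n in subtrees(tree)
            if re.search(r"^VP$", label(n))
            and has_child(n, r"^NP")
            and has_child(n, r"DT")]

def bracketed(tree):
    """Sketch of /^VP$/ < (/^NP/ < /DT/): the second link tests the NP."""
    return [n for n in subtrees(tree)
            if re.search(r"^VP$", label(n))
            and has_child(n, r"^NP", lambda np: has_child(np, r"DT"))]

print(len(chained(nested)), len(bracketed(nested)))  # 0 1
print(len(chained(flat)), len(bracketed(flat)))      # 1 0
```

In the first tree the determiner sits inside the noun phrase, so only the bracketed version finds the VP. In the second tree the determiner is a direct child of the VP, so only the chained version matches.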

Note that the contents of a pair of brackets act like a regular expression, as far as the TGrep2 query syntax is concerned. Thus, our scheme above would now look like this:

regexp or a pair of brackets link regexp or a pair of brackets

A pair of brackets, in turn, must contain a valid TGrep2 query of any kind. You can imagine the contents of one pair of brackets (and any further pairs there might be within that pair) as a special kind of 'regular expression' matching anything that matches the query within the brackets.
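The idea that a bracketed group behaves like one big 'regular expression' maps naturally onto a recursive function. The sketch below supports only the "parent of" link and uses an invented query representation (a pattern plus a list of sub-queries); it is meant to illustrate the nesting, not to reproduce TGrep2.

```python
import re

# A tree node is a tuple: (label, child, child, ...). The tree is invented.
tree = ("S",
        ("NP", ("DT",), ("NN",)),
        ("VP", ("VBD",), ("NP", ("DT",), ("NN",))))

# A query is a pair: (regex for the head node, list of "parent of" sub-queries).
# Each sub-query plays the role of a bracketed group, so queries nest freely.
def matches(node, query):
    pattern, child_queries = query
    if not re.search(pattern, node[0]):
        return False
    # Every link refers back to `node`, the first element of this group.
    return all(any(matches(child, q) for child in node[1:])
               for q in child_queries)

def find(node, query):
    """Collect every node in the tree that satisfies the query."""
    found = [node] if matches(node, query) else []
    for child in node[1:]:
        found.extend(find(child, query))
    return found

# /^VP$/ < (/^NP/ < /DT/)
vp_np_dt = (r"^VP$", [(r"^NP", [(r"DT", [])])])
# /^S$/ < (/^NP/ < /DT/) < (/^VP$/ < (/^NP/ < /DT/))
s_query = (r"^S$", [(r"^NP", [(r"DT", [])]), vp_np_dt])

print([n[0] for n in find(tree, vp_np_dt)])  # ['VP']
print([n[0] for n in find(tree, s_query)])   # ['S']
```

Note how vp_np_dt is reused as a sub-query inside s_query: a whole query can stand wherever a single pattern could.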

To further illustrate this, we will give you a couple of queries and explain what they would look for:

  • (A > B) !. C
    This query would look for an A which is the child of a B and which does not immediately precede a C.

  • A < B < C < D > E
    This query would look for an A which is the parent of a B, a C, a D and is the child of an E.

  • A , (B > (C >> D))
    This query would look for an A which immediately follows a B which is the child of a C which itself is dominated (not necessarily immediately) by a D.

  • A , (B > C >> D)
    This query would look for an A which immediately follows a B which is the child of a C and is also dominated by a D. Note that, because the inner brackets of the former query are missing here, the >> D refers back to the B, since it is the first element within the brackets.

Exercise

Suppose we were looking for an A which is the parent of a B and is the sister of a C which itself is the parent of a D. Which of the following would be a correct query? (Note that there are correct queries which are not listed here; in this list, however, only one query is correct.)

A > B $ (C > D)
A < B $ (C < D)
A < B $ C < D
A < (B $ (C < D))