Labelled Bracketing

Introduction

Although our aim is to aid you in seeing the forest for the trees, one of your first hurdles will be to actually see the trees.

We assume that you have worked with so-called phrase structure trees before. They are a great way to communicate the syntactic structure of a sentence visually. The computer, however, cannot easily deal with the trees you are used to. We will not go into too much technical detail here, but the computer needs the same information in plain-text format. Thus, there has to be a way of encoding all these nodes and branches so that a computer can interpret them easily, and this is where labelled bracketing comes into play.

The first example of labelled bracketing you will encounter during the seminar or during your own experiments with TGrep2 might look something like this sentence from the Penn Treebank:

(S (NP-SBJ  (DT The)
            (NN woman))
   (VP (VBD had)
       (VP (ADVP (RB nearly))
           (VBN died)))
   (. .))

Right now, it will probably just look like a bunch of brackets and some nodes you might remember from your Linguistischer Grundkurs (the names of the nodes might differ slightly, but this issue will be handled in the chapters about the different treebanks and their subjects). With a little bit of practice, though, you will be able to analyse and interpret the labelled bracketing just as easily as this tree:

[Tree diagram: The woman had nearly died]

That looks more like it, doesn’t it?

Both the labelled bracketing and the conventional tree contain the same information (except for the full stop, which was ignored in the tree in order to not further complicate it). It is simply a matter of getting used to either representation.

Hierarchy

The first major difference is that the tree has a hierarchical structure that goes from top to bottom, whereas the labelled bracketing runs from left to right. In the labelled bracketing notation, the mother node S is the leftmost node and the words themselves are on the very right. Correspondingly, we will begin interpreting it at the left and work our way further to the right and to the bottom. One can also see that the S is preceded by an opening bracket. Brackets are used to indicate the different levels of hierarchy in the structure. Each opening bracket represents a new level and requires a closing bracket somewhere later. Every correctly bracketed sentence contains the same number of opening and closing brackets, which makes counting them a good and quick test of whether the trees you convert into labelled bracketing during the seminar are correct.
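If you would rather let the computer do the counting, the bracket test can be sketched in a few lines of Python. This is only an illustration: the function name brackets_balanced is made up for this example, and the sketch assumes that round brackets occur only as structure markers, never as words of the sentence. It is slightly stricter than plain counting, because it also rejects a closing bracket that appears before its opening bracket.

```python
def brackets_balanced(bracketing: str) -> bool:
    """Return True if every opening bracket has a matching closing bracket."""
    depth = 0  # number of levels of hierarchy that are currently open
    for char in bracketing:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                # A closing bracket without a preceding opening bracket.
                return False
    # Every level that was opened must have been closed again.
    return depth == 0


sentence = """
(S (NP-SBJ  (DT The)
            (NN woman))
   (VP (VBD had)
       (VP (ADVP (RB nearly))
           (VBN died)))
   (. .))
"""

print(brackets_balanced(sentence))                            # True
print(brackets_balanced("(S (NP-SBJ (DT The) (NN woman))"))   # False
```

The second call fails the test because the bracket that opens S is never closed.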

Indentation

So far, we have just discussed the very first node. Following the (S, there is another opening bracket, which means that another level of hierarchy comes into play, whose first node is NP-SBJ. From the tree given above, we know that the NP has a sister: the VP. In the tree, both of them were next to each other, but since we shifted our point of view by 90°, the VP is now below the NP. But how do we know that they are sisters? This relationship requires another concept of labelled bracketing: indentation. To illustrate indentation, we will shorten our structure to the following:

(S  (NP-SBJ)
    (VP))

In labelled bracketing, sister-nodes are always on the same level of indentation. The NP and the VP have the same horizontal distance between their beginning and the start of the line, although there is nothing in front of the VP in its line.
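To see how a program might recover the same sister relationship, here is a minimal sketch of a reader for labelled bracketing. The function name parse and the nested-list representation are assumptions made for this illustration, not the way TGrep2 stores trees internally. Note that this sketch looks at the brackets only: for the program, the indentation is a reading aid for us, so the same structure written on a single line gives the same result.

```python
import re


def parse(bracketing: str) -> list:
    """Turn labelled bracketing into nested lists: [label, child, child, ...]."""
    # Split the text into opening brackets, closing brackets, and labels/words.
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketing)
    stack = [[]]  # the outermost list collects the finished tree
    for token in tokens:
        if token == "(":
            stack.append([])            # a new level of hierarchy begins
        elif token == ")":
            finished = stack.pop()      # this level is complete
            stack[-1].append(finished)  # attach it to its mother
        else:
            stack[-1].append(token)     # a node label or a word
    return stack[0][0]


tree = parse("""
(S  (NP-SBJ)
    (VP))
""")

# The daughters of S are sisters of each other.
daughters_of_s = [child[0] for child in tree[1:]]
print(daughters_of_s)                       # ['NP-SBJ', 'VP']
print(parse("(S (NP-SBJ) (VP))") == tree)   # True
```

The sketch assumes a correctly bracketed input; it does not report missing or superfluous brackets.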

Combined Efforts

Now that we have established the notions of hierarchy and indentation in labelled bracketing, we will need to combine these two in order to fully grasp the structure of a sentence. In order to do this, we will once again look at our previous tree. As you can see, the NP has two daughters which obviously are on the same level of hierarchy below the NP. Thus, they need to be indented further to the right. This gives us the following:

(S   (NP-SBJ  (DT The)
              (NN woman)))

But our NP-SBJ has a sister, a VP. How do we include that information? First of all, we know it has to be on the same level of indentation and since it follows the NP, it has to be below it. We cannot, however, put the VP directly under the NP because then we would destroy our hierarchy as you can see here:

(S   (NP-SBJ  (DT The))
     (VP      (NN woman)))

This structure would stand for the following tree:

[Tree diagram: S branches into NP-SBJ, which dominates DT "The", and VP, which dominates NN "woman"]

The logical consequence is that we have to insert our VP on a new line, below any preceding nodes that are on a lower level of hierarchy, which yields the following result:

(S   (NP-SBJ (DT The)
             (NN woman))
     (VP [...]))

By now you will hopefully see the similarity between the labelled bracketing notation and the trees. Finishing this example sentence is simply a matter of applying these concepts for every node.
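Applying the two concepts to every node is mechanical enough to be sketched as a short recursive function. The following is an illustration under assumptions: the tree is written by hand as nested lists of the form [label, child, ...], and the function name to_bracketing is made up for this example. Each new level of hierarchy opens a bracket, and sisters are placed on separate lines that start in the same column.

```python
def to_bracketing(node: list, indent: int = 0) -> str:
    """Format [label, child, ...] as indented labelled bracketing."""
    label, children = node[0], node[1:]
    # A node whose only daughter is a word fits on one line, e.g. (DT The).
    if len(children) == 1 and isinstance(children[0], str):
        return f"({label} {children[0]})"
    opening = f"({label} "
    # The first daughter follows the label; its sisters line up below it.
    child_indent = indent + len(opening)
    separator = "\n" + " " * child_indent
    parts = [
        child if isinstance(child, str) else to_bracketing(child, child_indent)
        for child in children
    ]
    return opening + separator.join(parts) + ")"


tree = ["S",
        ["NP-SBJ", ["DT", "The"], ["NN", "woman"]],
        ["VP",
         ["VBD", "had"],
         ["VP", ["ADVP", ["RB", "nearly"]], ["VBN", "died"]]],
        [".", "."]]

print(to_bracketing(tree))
```

For this tree, the sketch prints the Penn Treebank example from the introduction, apart from using a single space after NP-SBJ.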

The key to fully understanding this is really just practice. Get some trees and convert them into labelled bracketing or the other way around. If you are having trouble, try re-reading this section or ask your fellow students.