The Penn Treebank Tagset

Like the SUSANNE Corpus tagset, the Penn Treebank tagset consists of two main parts: the syntactic tagset and the POS (part-of-speech) tagset.

The syntactic tagset

ADJP Adjective phrase
ADVP Adverb phrase
NP Noun phrase
PP Prepositional phrase
S Simple declarative clause
SBAR Clause introduced by subordinating conjunction or 0 (see below)
SBARQ Direct question introduced by wh-word or wh-phrase
SINV Declarative sentence with subject-aux inversion
SQ Subconstituent of SBARQ excluding wh-word or wh-phrase
VP Verb phrase
WHADVP Wh-adverb phrase
WHNP Wh-noun phrase
WHPP Wh-prepositional phrase
X Constituent of unknown or uncertain category
Null elements
* "Understood" subject of infinitive or imperative
0 Zero variant of that in subordinate clauses
T Trace—marks position where moved wh-constituent is interpreted
NIL Marks position where preposition is interpreted in pied-piping contexts
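
To illustrate how these labels appear in practice, here is a minimal sketch that pulls the bracket labels out of a parse written in the usual parenthesised notation and keeps only those from the syntactic tagset. The example sentence and the helper name constituent_labels are assumptions made for illustration; they are not taken from the treebank or from any particular library.

```python
import re

# Syntactic tags copied from the list above.
SYNTACTIC_TAGS = {
    "ADJP": "Adjective phrase",
    "ADVP": "Adverb phrase",
    "NP": "Noun phrase",
    "PP": "Prepositional phrase",
    "S": "Simple declarative clause",
    "SBAR": "Clause introduced by subordinating conjunction or 0",
    "SBARQ": "Direct question introduced by wh-word or wh-phrase",
    "SINV": "Declarative sentence with subject-aux inversion",
    "SQ": "Subconstituent of SBARQ excluding wh-word or wh-phrase",
    "VP": "Verb phrase",
    "WHADVP": "Wh-adverb phrase",
    "WHNP": "Wh-noun phrase",
    "WHPP": "Wh-prepositional phrase",
    "X": "Constituent of unknown or uncertain category",
}


def constituent_labels(bracketed):
    """Return every label that directly follows an opening bracket, in order."""
    return re.findall(r"\(([^\s()]+)", bracketed)


# Hypothetical hand-written parse, used only for illustration.
example = "(S (NP (DT the) (NN dog)) (VP (VBZ barks)))"

labels = constituent_labels(example)
# Keep only the phrase-level labels; POS tags such as DT or NN are dropped.
phrase_labels = [label for label in labels if label in SYNTACTIC_TAGS]

for label in phrase_labels:
    print(label, "=", SYNTACTIC_TAGS[label])
```

The regular expression only reads labels and does not build a tree, so it is enough for counting or listing constituents but not for recovering their nesting.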

The POS tagset

CC Coordinating Conjunction
CD Cardinal Number
DT Determiner
EX Existential there
FW Foreign word
IN Preposition/subordinating conjunction
JJ Adjective
JJR Adjective, comparative
JJS Adjective, superlative
LS List item marker
MD Modal
NN Noun, singular or mass
NNS Noun, plural
NNP Proper noun, singular
NNPS Proper noun, plural
PDT Predeterminer
POS Possessive ending
PRP Personal pronoun
PRP$ Possessive pronoun
RB Adverb
RBR Adverb, comparative
RBS Adverb, superlative
RP Particle
SYM Symbol (mathematical or scientific)
TO to
UH Interjection
VB Verb, base form
VBD Verb, past tense
VBG Verb, gerund/present participle
VBN Verb, past participle
VBP Verb, non-3rd person singular present
VBZ Verb, 3rd person singular present
WDT wh-determiner
WP wh-pronoun
WP$ Possessive wh-pronoun
WRB wh-adverb
# Pound sign
$ Dollar sign
. Sentence-final punctuation
, Comma
: Colon, semi-colon
( Left bracket character
) Right bracket character
" Straight double quote
` Left open single quote
`` Left open double quote
' Right closed single quote
'' Right closed double quote
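
As a small illustration of how the POS tags might be used, the sketch below looks up the description for each tag in a hand-tagged sentence. The dictionary holds only a subset of the table above, and the sentence, the function name describe and the "unknown tag" fallback are assumptions made for this example.

```python
# Descriptions copied from the table above (a subset, for brevity).
POS_TAGS = {
    "DT": "Determiner",
    "JJ": "Adjective",
    "NN": "Noun, singular or mass",
    "NNS": "Noun, plural",
    "RB": "Adverb",
    "VBP": "Verb, non-3rd person singular present",
    "VBZ": "Verb, 3rd person singular present",
    ".": "Sentence-final punctuation",
}


def describe(tagged_tokens):
    """Pair each word with the description of its tag, or 'unknown tag'."""
    return [(word, POS_TAGS.get(tag, "unknown tag")) for word, tag in tagged_tokens]


# Hypothetical hand-tagged sentence, not taken from the corpus.
sentence = [
    ("The", "DT"),
    ("quick", "JJ"),
    ("dogs", "NNS"),
    ("bark", "VBP"),
    ("loudly", "RB"),
    (".", "."),
]

described = describe(sentence)
for word, description in described:
    print(f"{word}\t{description}")
```

Using dict.get with a fallback keeps the lookup from failing on tags that are missing from the subset.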

This list is taken from the HTML version of "Building a large annotated corpus of English: the Penn Treebank" by Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini, which also contains further useful information about the Penn Treebank.
