Consistency in OntoNotes

July 29th, 2013
ling, tech
OntoNotes is the product of humans marking up a large amount of text with linguistic details, to produce examples for computers to learn from. For example, we might have this sentence, transcribed from a 2001 ABC news broadcast:
Like many Heartland states, Iowa has had trouble keeping young people down on the farm or anywhere within state lines. (en/bn/abc_0001:0)
A human linguist manually parsed this into a tree:
(TOP (S (PP-MNR (IN Like)
                (NP (JJ many) (NNP Heartland) (NNS states)))
        (, ,)
        (NP-SBJ (NNP Iowa))
        (VP (VBZ has)
            (VP (VBN had)
                (NP (NP (NN trouble))
                    (S-NOM (NP-SBJ (-NONE- *PRO*))
                           (VP (VBG keeping)
                               (NP (JJ young) (NNS people))
                               (ADVP-LOC (ADVP (RB down)
                                               (PP (IN on)
                                                   (NP (DT the) (NN farm))))
                                         (CC or)
                                         (ADVP (RB anywhere)
                                               (PP (IN within)
                                                   (NP (NN state)
                                                       (NNS lines))))))))))
        (. .)))
Then it went through several additional layers of annotation to specify things like "Heartland" being a location, "Iowa" being referred to in later sentences with "it", and the relationship of the various arguments to the main verb, 'had'.

There are many places in this process where one could make mistakes, and inconsistent data makes learning much harder for machine learning systems. From the beginning, OntoNotes was intended to generate high-quality data by doing most of the annotation work twice and then adjudicating any disagreements. [1] But how consistent is the final product?

One way to measure this is to look at a document that was accidentally included in the corpus multiple times. The duplication wasn't noticed at the time, so the document was annotated repeatedly. Documents wsj_0190, wsj_0364, wsj_0511, wsj_0696, wsj_1056, wsj_1228, wsj_1382, wsj_1557, and wsj_1558 all read:

Companies listed below reported quarterly profit substantially different from the average of analysts' estimates. The companies are followed by at least three analysts, and had a minimum five-cent change in actual earnings per share. Estimated and actual results involving losses are omitted. The percent difference compares actual profit with the 30-day estimate where at least three analysts have issues forecasts in the past 30 days. Otherwise, actual profit is compared with the 300-day estimate.
My guess is that this appears multiple times in the corpus because it was printed multiple times in the Wall Street Journal. This means it's somewhat atypical and is kind of boilerplate-ish, but we do at least have a lot of copies of it.

How many different ways did these sentences get analyzed? Let's go sentence by sentence.
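The comparison in each section below amounts to grouping the nine documents by which analysis they received. Here is a minimal sketch of that grouping, assuming the parses have already been loaded into a dictionary keyed by document ID. The function names and the shortened toy parses are illustrative and not part of any OntoNotes tooling.

```python
from collections import defaultdict


def normalize(tree):
    """Collapse whitespace so that layout differences don't count as disagreements."""
    return " ".join(tree.split())


def group_by_annotation(annotations):
    """Map each distinct annotation to the documents that share it."""
    groups = defaultdict(list)
    for doc_id, tree in annotations.items():
        groups[normalize(tree)].append(doc_id)
    # Return only the groups of document IDs, in a stable order.
    return sorted(sorted(group) for group in groups.values())


# Toy input: shortened fragments, not the full parses from the corpus.
annotations = {
    "wsj_0190": "(VP (VBN listed)\n    (ADVP-LOC (IN below)))",
    "wsj_0364": "(VP (VBN listed) (ADVP-LOC (IN below)))",
    "wsj_0511": "(VP (VBN listed) (PP-LOC (IN below)))",
}

print(group_by_annotation(annotations))
# [['wsj_0190', 'wsj_0364'], ['wsj_0511']]
```

The same grouping works for any annotation layer that can be written as a string, including the proposition lines.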

"Companies listed below reported quarterly profit substantially different from the average of analysts' estimates."

wsj_0190, wsj_0364, wsj_1228:
(TOP (S (NP-SBJ (NP (NNS Companies))
                (VP (VBN listed) (NP (-NONE- *)) (ADVP-LOC (IN below))))
        (VP (VBD reported)
            (NP (NP (JJ quarterly) (NN profit))
                (ADJP (RB substantially)
                      (JJ different)
                      (PP (IN from)
                          (NP (NP (DT the) (NN average))
                              (PP (IN of)
                                  (NP (NP (NNS analysts) (POS '))
                                      (NNS estimates))))))))
        (. .)))

wsj_0511, wsj_1557, wsj_1558:
(TOP (S (NP-SBJ (NP (NNS Companies))
                (VP (VBN listed) (NP (-NONE- *)) (PP-LOC (IN below))))
        (VP (VBD reported)
            (NP (NP (JJ quarterly) (NN profit))
                (ADJP (RB substantially)
                      (JJ different)
                      (PP (IN from)
                          (NP (NP (DT the) (NN average))
                              (PP (IN of)
                                  (NP (NP (NNS analysts) (POS '))
                                      (NNS estimates))))))))
        (. .)))

wsj_0696, wsj_1056, wsj_1382:
(TOP (S (NP-SBJ (NP (NNS Companies))
                (VP (VBN listed) (NP (-NONE- *)) (ADVP-LOC (RB below))))
        (VP (VBD reported)
            (NP (NP (JJ quarterly) (NN profit))
                (ADJP (RB substantially)
                      (JJ different)
                      (PP (IN from)
                          (NP (NP (DT the) (NN average))
                              (PP (IN of)
                                  (NP (NP (NNS analysts) (POS '))
                                      (NNS estimates))))))))
        (. .)))
These three versions differ only in their analysis of "listed below". We see (ADVP-LOC (IN below)), (PP-LOC (IN below)) and (ADVP-LOC (RB below)).
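Spotting a one-label difference between two long bracketed parses by eye is error-prone. A rough way to do it mechanically is to split each parse into tokens and diff the token sequences with Python's standard `difflib`. This is only a sketch: it compares flat token sequences rather than tree structure, and the helper names are made up for illustration.

```python
import difflib
import re


def tokens(tree):
    """Split a bracketed parse into parentheses, labels, and words."""
    return re.findall(r"[()]|[^\s()]+", tree)


def differing_tokens(a, b):
    """Return (tokens_in_a, tokens_in_b) pairs wherever the two parses differ."""
    ta, tb = tokens(a), tokens(b)
    matcher = difflib.SequenceMatcher(a=ta, b=tb, autojunk=False)
    return [
        (ta[i1:i2], tb[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# The three analyses of "listed below" quoted above.
adv_in = "(VP (VBN listed) (NP (-NONE- *)) (ADVP-LOC (IN below)))"
pp_in = "(VP (VBN listed) (NP (-NONE- *)) (PP-LOC (IN below)))"
adv_rb = "(VP (VBN listed) (NP (-NONE- *)) (ADVP-LOC (RB below)))"

print(differing_tokens(adv_in, pp_in))   # [(['ADVP-LOC'], ['PP-LOC'])]
print(differing_tokens(adv_in, adv_rb))  # [(['IN'], ['RB'])]
```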

Proposition annotation specifies the relationship between various arguments of verbs. For this sentence we had two sets:

wsj_0190, wsj_0364, wsj_0696, wsj_1056, wsj_1228, wsj_1382:
  1 list.01 ----- 1:0-rel 2:0-ARG1 3:1-ARG2 0:1*2:0-LINK-PCR
  4 report.01 ----- 4:0-rel 0:2-ARG0 5:2-ARG1

wsj_0511, wsj_1557, wsj_1558:
  1 list.01 ----- 1:0-rel 2:0-ARG1 3:1-ARGM-LOC 0:1*2:0-LINK-PCR
  4 report.01 ----- 4:0-rel 0:2-ARG0 5:2-ARG1
The disagreement is over whether "below" is the second argument of "listed" or a locative modifier, and was probably caused by the corresponding disagreement in the parsing.
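To compare proposition lines like these programmatically, each line can be split into its predicate and a mapping from node pointer to label. The sketch below assumes the abbreviated layout shown in this post (terminal index, roleset, a `-----` placeholder, then `pointer-LABEL` fields). The function names are hypothetical, and the code does not interpret the `*` or `,` operators inside a pointer.

```python
def parse_prop(line):
    """Parse one proposition line into its predicate and pointer -> label map."""
    fields = line.split()
    sep = fields.index("-----")  # placeholder column before the arguments
    args = {}
    for arg in fields[sep + 1:]:
        # Pointers contain no hyphen, so the first hyphen starts the label
        # (labels such as ARGM-LOC contain hyphens themselves).
        pointer, label = arg.split("-", 1)
        args[pointer] = label
    return {"terminal": int(fields[0]), "roleset": fields[sep - 1], "args": args}


def label_disagreements(line_a, line_b):
    """Return pointers whose labels differ between two annotations."""
    a, b = parse_prop(line_a)["args"], parse_prop(line_b)["args"]
    return {
        pointer: (a.get(pointer), b.get(pointer))
        for pointer in sorted(set(a) | set(b))
        if a.get(pointer) != b.get(pointer)
    }


arg2 = "1 list.01 ----- 1:0-rel 2:0-ARG1 3:1-ARG2 0:1*2:0-LINK-PCR"
argm = "1 list.01 ----- 1:0-rel 2:0-ARG1 3:1-ARGM-LOC 0:1*2:0-LINK-PCR"

print(label_disagreements(arg2, argm))
# {'3:1': ('ARG2', 'ARGM-LOC')}
```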

All named entity annotation passes identified "quarterly" as a date.

"The companies are followed by at least three analysts, and had a minimum five-cent change in actual earnings per share."

wsj_0190:
(TOP (S (NP-SBJ-1 (DT The) (NNS companies))
        (VP (VP (VBP are)
                (VP (VBN followed)
                    (NP (-NONE- *-1))
                    (PP (IN by)
                        (NP-LGS (QP (ADVP (IN at) (JJS least)) (CD three))
                                (NNS analysts)))))
            (, ,)
            (CC and)
            (VP (VBD had)
                (NP (NP (DT a)
                        (JJ minimum)
                        (NML (CD five) (HYPH -) (NN cent))
                        (NN change))
                    (PP (IN in)
                        (NP (NP (JJ actual) (NNS earnings))
                            (PP (IN per) (NP (NN share))))))))
        (. .)))

wsj_0364, wsj_0511, wsj_0696, wsj_1056, wsj_1228, wsj_1382, wsj_1557, wsj_1558:
(TOP (S (NP-SBJ-1 (DT The) (NNS companies))
        (VP (VP (VBP are)
                (VP (VBN followed)
                    (NP (-NONE- *-1))
                    (PP (IN by)
                        (NP-LGS (QP (ADVP (RB at) (RBS least)) (CD three))
                                (NNS analysts)))))
            (, ,)
            (CC and)
            (VP (VBD had)
                (NP (NP (DT a)
                        (JJ minimum)
                        (NML (CD five) (HYPH -) (NN cent))
                        (NN change))
                    (PP-LOC (IN in)
                            (NP (NP (JJ actual) (NNS earnings))
                                (PP (IN per) (NP (NN share))))))))
        (. .)))
The two versions here disagree in two places. In wsj_0190 we have "at least three" being (ADVP (IN at) (JJS least)) which is weird because the ADVP would be an adverb phrase without an adverb, just a preposition (IN) and superlative adjective (JJS). The others, with (ADVP (RB at) (RBS least)), are much more reasonable. That adverb phrase consists of an adverb (RB) and a superlative adverb (RBS).

They also disagree on whether the prepositional phrase "in actual earnings per share" should be locative. I don't see why it would be, but only wsj_0190 doesn't mark it that way, so maybe I'm missing something.

For propositional annotation we had:

wsj_1228, wsj_1382, wsj_1557, wsj_1558:
  3 follow.02 ----- 3:0-rel 4:0-ARG1 5:1-ARG0
  12 have.03 ----- 12:0-rel 0:1-ARG0 13:2-ARG1
wsj_0190, wsj_0364, wsj_0511, wsj_0696, wsj_1056:
  3 follow.02 ----- 3:0-rel 4:0-ARG1 5:1-ARG0
  12 have.03 ----- 12:0-rel 4:0-ARG0 13:2-ARG1
The disagreement here isn't a real disagreement. Node 0:1 is "the companies" and node 4:0 is a trace that refers back to "the companies".
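To check what pointers like 0:1 and 4:0 refer to, one can parse the tree and resolve them. The sketch below assumes the usual reading of a pointer `t:h`: start at terminal number `t` (counting from zero and including traces) and climb `h` levels. The tree is the wsj_0364 parse quoted above, and the helper names are illustrative.

```python
import re

TREE = """
(TOP (S (NP-SBJ-1 (DT The) (NNS companies))
        (VP (VP (VBP are)
                (VP (VBN followed)
                    (NP (-NONE- *-1))
                    (PP (IN by)
                        (NP-LGS (QP (ADVP (RB at) (RBS least)) (CD three))
                                (NNS analysts)))))
            (, ,)
            (CC and)
            (VP (VBD had)
                (NP (NP (DT a)
                        (JJ minimum)
                        (NML (CD five) (HYPH -) (NN cent))
                        (NN change))
                    (PP-LOC (IN in)
                            (NP (NP (JJ actual) (NNS earnings))
                                (PP (IN per) (NP (NN share))))))))
        (. .)))
"""


def parse_tree(text):
    """Parse a bracketed tree into nested lists: [label, child, ...]."""
    toks = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip "("
        label = toks[pos]
        pos += 1
        children = []
        while toks[pos] != ")":
            if toks[pos] == "(":
                children.append(node())
            else:
                children.append(toks[pos])  # the word under a tag
                pos += 1
        pos += 1  # skip ")"
        return [label] + children

    return node()


def leaf_paths(tree, path=()):
    """Yield, for each terminal in order, the nodes from the root down to it."""
    path = path + (tree,)
    if isinstance(tree[1], str):
        yield path
    else:
        for child in tree[1:]:
            yield from leaf_paths(child, path)


def resolve(tree, pointer):
    """Return the node `height` levels above terminal number `terminal`."""
    terminal, height = map(int, pointer.split(":"))
    return list(leaf_paths(tree))[terminal][-1 - height]


def words(node):
    """Collect the words (and traces) under a node."""
    if isinstance(node[1], str):
        return [node[1]]
    return [w for child in node[1:] for w in words(child)]


tree = parse_tree(TREE)
print(resolve(tree, "0:1")[0], words(resolve(tree, "0:1")))  # NP-SBJ-1 ['The', 'companies']
print(resolve(tree, "4:0"))                                  # ['-NONE-', '*-1']
```

The trace `*-1` carries the same index as `NP-SBJ-1`, which is why the two ARG0 pointers pick out the same companies.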

All named entity annotation passes identified "at least three" as numeric and "five-cent" as money. All coreference annotation passes matched "the companies" back to "companies listed below" in the previous sentence.

"Estimated and actual results involving losses are omitted."

wsj_0190, wsj_0511, wsj_1056, wsj_1228, wsj_1557, wsj_1558:
(TOP (S (NP-SBJ-1 (NP (ADJP (VBN Estimated) (CC and) (JJ actual))
                      (NNS results))
                  (VP (VBG involving) (NP (NNS losses))))
        (VP (VBP are) (VP (VBN omitted) (NP (-NONE- *-1))))
        (. .)))

wsj_0364, wsj_0696, wsj_1382:
(TOP (S (NP-SBJ-1 (NP (ADJP (JJ Estimated) (CC and) (JJ actual))
                      (NNS results))
                  (VP (VBG involving) (NP (NNS losses))))
        (VP (VBP are) (VP (VBN omitted) (NP (-NONE- *-1))))
        (. .)))
We see "estimated" being interpreted as either a past participle (VBN) or adjective (JJ). Both are pretty reasonable.

For propositions, only the instances of "estimated" tagged as VBN were eligible for annotation. All of those were annotated as:

0 estimate.01 ----- 0:0-rel 3:0,4:1-ARG1

All documents had the other two propositions annotated the same way:

4 involve.01 ----- 4:0-rel 0:2-ARG2 5:1-ARG1
7 omit-v omit.01 ----- 7:0-rel 8:0-ARG1
No named entity pass found anything here.

"The percent difference compares actual profit with the 30-day estimate where at least three analysts have issues forecasts in the past 30 days."

wsj_0190:
(TOP (S (NP-SBJ (DT The) (NN percent) (NN difference))
        (VP (VBZ compares)
            (NP (JJ actual) (NN profit))
            (PP-CLR (IN with)
                    (NP (DT the)
                        (NML (CD 30) (HYPH -) (NN day))
                        (NN estimate)))
            (SBAR-ADV (WHADVP-1 (WRB where))
                      (S (NP-SBJ (QP (ADVP (RB at) (RBS least)) (CD three))
                                 (NNS analysts))
                         (VP (VBP have)
                             (VP (NNS issues)
                                 (NP (NNS forecasts))
                                 (PP-TMP (IN in)
                                         (NP (DT the)
                                             (JJ past)
                                             (CD 30)
                                             (NNS days)))
                                 (ADVP-LOC (-NONE- *T*-1)))))))
        (. .)))

wsj_0364:
(TOP (S (NP-SBJ (DT The) (NN percent) (NN difference))
        (VP (VBZ compares)
            (NP (JJ actual) (NN profit))
            (PP-CLR (IN with)
                    (NP (NP (DT the)
                            (NML (CD 30) (HYPH -) (NN day))
                            (NN estimate))
                        (SBAR-LOC (WHADVP-1 (WRB where))
                                  (S (NP-SBJ (QP (ADVP (RB at) (RBS least))
                                                 (CD three))
                                             (NNS analysts))
                                     (VP (VBP have)
                                         (NP (NNS issues) (NNS forecasts))
                                         (PP-TMP (IN in)
                                                 (NP (DT the)
                                                     (JJ past)
                                                     (CD 30)
                                                     (NNS days)))
                                         (ADVP-LOC (-NONE- *T*-1))))))))
        (. .)))

wsj_0511, wsj_0696, wsj_1382:
(TOP (S (NP-SBJ (DT The) (NN percent) (NN difference))
        (VP (VBZ compares)
            (NP (JJ actual) (NN profit))
            (PP-CLR (IN with)
                    (NP (NP (DT the)
                            (NML (CD 30) (HYPH -) (NN day))
                            (NN estimate))
                        (SBAR (WHADVP-1 (WRB where))
                              (S (NP-SBJ (QP (ADVP (RB at) (RBS least))
                                             (CD three))
                                         (NNS analysts))
                                 (VP (VBP have)
                                     (VP (NNS issues)
                                         (NP (NNS forecasts))
                                         (PP-TMP (IN in)
                                                 (NP (DT the)
                                                     (JJ past)
                                                     (CD 30)
                                                     (NNS days)))
                                         (ADVP-LOC (-NONE- *T*-1)))))))))
        (. .)))

wsj_1056:
(TOP (S (NP-SBJ (DT The) (NN percent) (NN difference))
        (VP (VBZ compares)
            (NP (JJ actual) (NN profit))
            (PP-CLR (IN with)
                    (NP (DT the)
                        (NML (CD 30) (HYPH -) (NN day))
                        (NN estimate)))
            (SBAR-ADV (WHADVP-1 (WRB where))
                      (S (NP-SBJ (QP (ADVP (RB at) (RBS least)) (CD three))
                                 (NNS analysts))
                         (VP (VBP have)
                             (VP (NNS issues)
                                 (NP (NNS forecasts))
                                 (ADVP-LOC (-NONE- *T*-1))
                                 (PP-TMP (IN in)
                                         (NP (DT the)
                                             (JJ past)
                                             (CD 30)
                                             (NNS days))))))))
        (. .)))

wsj_1228:
(TOP (S (NP-SBJ (DT The) (NN percent) (NN difference))
        (VP (VBZ compares)
            (NP (JJ actual) (NN profit))
            (PP-CLR (IN with)
                    (NP (NP (DT the)
                            (NML (CD 30) (HYPH -) (NN day))
                            (NN estimate))
                        (SBAR (WHADVP-1 (WRB where))
                              (S (NP-SBJ (QP (ADVP (RB at) (RBS least))
                                             (CD three))
                                         (NNS analysts))
                                 (VP (VBP have)
                                     (NP (NNS issues) (NNS forecasts))
                                     (PP-TMP (IN in)
                                             (NP (DT the)
                                                 (JJ past)
                                                 (CD 30)
                                                 (NNS days)))
                                     (ADVP-LOC (-NONE- *T*-1))))))))
        (. .)))

wsj_1557, wsj_1558:
(TOP (S (NP-SBJ (DT The) (NN percent) (NN difference))
        (VP (VBZ compares)
            (NP (JJ actual) (NN profit))
            (PP-CLR (IN with)
                    (NP (NP (DT the) (CD 30-day) (NN estimate))
                        (SBAR (WHADVP-1 (WRB where))
                              (S (NP-SBJ (QP (ADVP (RB at) (RBS least))
                                             (CD three))
                                         (NNS analysts))
                                 (VP (VB have)
                                     (VP (NNS issues)
                                         (NP (NNS forecasts))
                                         (PP-TMP (IN in)
                                                 (NP (DT the)
                                                     (JJ past)
                                                     (CD 30)
                                                     (NNS days)))
                                         (ADVP-LOC (-NONE- *T*-1)))))))))
        (. .)))
This one is complicated, and has several different issues. First, there's a disagreement between
(with the 30-day estimate) (where at least three analysts have issues forecasts in the past 30 days).
and
(with the 30-day estimate (where at least three analysts have issues forecasts in the past 30 days.))
Is the "where" clause under the "with" clause?

Second, there's a disagreement over where the trace goes. All of them put it at the end except for wsj_1056 which puts it after "forecasts". I don't understand traces well enough to say what's going on here.

Third, there's a typo of "issues" for "issued", and so it's tagged as a plural noun (NNS) when it should be a verb. This comes from automated part-of-speech tagging that is supposed to be hand-corrected but in this case was missed. Some of the parses have it heading a verb phrase (VP), which makes sense except for the tag, while others nonsensically treat "issues forecasts" as a noun phrase.

Fourth, there's a disagreement in tokenization. Most of them break "30-day" into (NML (CD 30) (HYPH -) (NN day)) but wsj_1557 and wsj_1558 leave it as a simple (CD 30-day). I think all hyphens in this corpus are supposed to be split, so I'm not sure why these two are left connected.

Fifth, in wsj_0364 the clause "where at least three analysts have issues forecasts in the past 30 days" is marked as locative but not in the others. As before, I don't see how this use has anything to do with location.
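The tokenization disagreement (the fourth point above) is the easiest of these to detect automatically, because it shows up in the terminals alone. Here is a minimal sketch that pulls the (tag, word) pairs out of a bracketed parse with a regular expression and compares two analyses of "the 30-day estimate". The fragments are cut down from the trees above and the names are illustrative.

```python
import re

# A terminal is a parenthesized pair with no nested parentheses: (TAG word)
LEAF = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")


def terminals(tree):
    """Return the (tag, word) pairs of a bracketed parse, in order."""
    return LEAF.findall(tree)


split = "(NP (DT the) (NML (CD 30) (HYPH -) (NN day)) (NN estimate))"
joined = "(NP (DT the) (CD 30-day) (NN estimate))"

split_words = [word for _, word in terminals(split)]
joined_words = [word for _, word in terminals(joined)]

print(split_words)   # ['the', '30', '-', 'day', 'estimate']
print(joined_words)  # ['the', '30-day', 'estimate']

# Same underlying text, different token boundaries.
same_text = "".join(split_words) == "".join(joined_words)
same_tokens = split_words == joined_words
print(same_text, same_tokens)  # True False
```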

For proposition annotation, the two parse trees where the "where" clause isn't under the "with" clause get an extra argument:

wsj_0190, wsj_1056:
  3 compare.01 ----- 3:0-rel 0:1-ARG0 4:1-ARG1 6:1-ARG2 12:2-ARGM-ADV

wsj_0364, wsj_0511, wsj_0696, wsj_1228, wsj_1557, wsj_1558:
  3 compare.01 ----- 3:0-rel 0:1-ARG0 4:1-ARG1 6:1-ARG2

All named entity annotation passes labeled "30-day" and "the past 30 days" as dates, and "at least three" as a number.

"Otherwise, actual profit is compared with the 300-day estimate."

wsj_0190, wsj_0364, wsj_0511, wsj_0696, wsj_1056, wsj_1228, wsj_1382:
(TOP (S (ADVP (RB Otherwise))
        (, ,)
        (NP-SBJ-1 (JJ actual) (NN profit))
        (VP (VBZ is)
            (VP (VBN compared)
                (NP (-NONE- *-1))
                (PP-CLR (IN with)
                        (NP (DT the)
                            (NML (CD 300) (HYPH -) (NN day))
                            (NN estimate)))))
        (. .)))

wsj_1557, wsj_1558:
(TOP (S (ADVP (RB Otherwise))
        (, ,)
        (NP-SBJ-1 (JJ actual) (NN profit))
        (VP (VBZ is)
            (VP (VBN compared)
                (NP (-NONE- *-1))
                (PP-CLR (IN with)
                        (NP (DT the) (CD 300-day) (NN estimate)))))
        (. .)))
These two differ only in whether "300-day" is split at the hyphen, and the two documents that don't split it are the same two that didn't split "30-day". Those two were also grouped together in each of the previous cases and are numbered sequentially, so I'm not sure we should really be treating wsj_1557 and wsj_1558 as independent annotations.

The proposition annotations are almost the same, but some annotate the trace as a direct semantic link (LINK-PCR):

wsj_0190, wsj_0364, wsj_0511, wsj_0696:
  5 compare.01 ----- 5:0-rel 0:1-ARGM-DIS 6:0-ARG1 7:1-ARG2 6:0*6:0-LINK-PCR

wsj_1056, wsj_1228, wsj_1382, wsj_1557, wsj_1558:
  5 compare.01 ----- 5:0-rel 0:1-ARGM-DIS 6:0-ARG1 7:1-ARG2
I can't think of what a link from 6:0 to 6:0 would mean, but this seems simple enough to have been cleaned up manually if it were actually invalid.

All named entity annotation passes labeled "300-day" as a date. All coreference passes but one connected "actual profit" back to "actual profit" in the previous sentence.

Summary

Almost all the disagreement here was in the parses, which are also the most complex annotations. There wasn't enough named entity or coreference evaluation to get a good sense of how accurate they are.

(Note: I worked on OntoNotes at BBN from 2008 to 2010.)


[1] OntoNotes: The 90% Solution. Eduard Hovy, Mitchell Marcus, Martha Palmer, Lance Ramshaw, and Ralph Weischedel (2006).
