Nathan noticed this today:
"boutta", short for "about to", is currently tagged as P in the 0.3 data release, but it should be V since it's like a modal auxiliary verb, similar to "ought to". In fact, the Brown clusters have figured this out, grouping "boutta" with "tryna", "gonna", and "finna" variants ("trying/going to", "going to", "fixing to"): http://www.ark.cs.cmu.edu/TweetNLP/paths/0011001.html
This might also be related to immediate future auxiliaries as mentioned in the NAACL paper (for "finna" and Texan English).
Current examples of the problem, just for "boutta":
~/twi/pos/ark-tweet-nlp/data/twpos-data-v0.3 % grep -ni boutta *.conll
oct27.conll:22611:boutta P
oct27.conll:26789:Boutta P
Some further inconsistencies: here are the occurrences of words from this cluster in the data. I haven't looked at them in context yet, but I highly doubt the P reading is correct.
daily547.conll:1422 Tryna V
daily547.conll:2499 tryna V
daily547.conll:3934 Bouta P
oct27.conll:1534 fiNna R
oct27.conll:3469 fina V
oct27.conll:3923 gon V
oct27.conll:6065 tryna V
oct27.conll:7890 tryna V
oct27.conll:8455 gne V
oct27.conll:11337 tryna V
oct27.conll:13993 gon V
oct27.conll:19302 finna P
oct27.conll:21114 gon V
oct27.conll:22610 boutta P
oct27.conll:24181 tryna V
oct27.conll:26788 Boutta P
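To make this kind of check repeatable, here is a rough Python sketch that tallies the tags assigned to each spelling in the cluster and reports any spelling that is not tagged V throughout. It assumes each non-blank line of a .conll file is a token and a tag separated by a tab, with blank lines between tweets. The names `CLUSTER_FORMS`, `tally_tags`, and `inconsistent` are made up for illustration and are not part of ark-tweet-nlp; the spellings are just the ones that show up in the listing above plus "gonna".

```python
from collections import Counter, defaultdict

# Spellings seen in the listing above (plus "gonna"); an illustrative,
# incomplete set, matched case-insensitively.
CLUSTER_FORMS = {"boutta", "bouta", "tryna", "finna", "fina", "gonna", "gon", "gne"}


def tally_tags(lines, forms=CLUSTER_FORMS):
    """Count tags per lowercased token for tokens whose lowercase form is in `forms`.

    Assumes each non-blank line is `token<TAB>tag`; other lines are skipped.
    """
    counts = defaultdict(Counter)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # blank line: tweet boundary
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # not a token/tag pair
        token, tag = parts[0], parts[1]
        key = token.lower()
        if key in forms:
            counts[key][tag] += 1
    return counts


def inconsistent(counts, expected="V"):
    """Return the tokens that carry at least one tag other than `expected`."""
    return {tok: c for tok, c in counts.items() if set(c) != {expected}}


if __name__ == "__main__":
    # Tiny in-memory sample in the assumed format, not real corpus lines.
    sample = ["Boutta\tP", "go\tV", "", "tryna\tV", "fiNna\tR", "finna\tP"]
    for tok, tags in sorted(inconsistent(tally_tags(sample)).items()):
        print(tok, dict(tags))
```

To run it over a real file, pass an open file handle instead of the sample list, e.g. `tally_tags(open("oct27.conll", encoding="utf-8"))`. It only surfaces candidates; each hit still needs to be read in context before retagging.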