- Notifications
You must be signed in to change notification settings - Fork 5
/
Copy pathterms.tex
437 lines (406 loc) · 14.1 KB
/
terms.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
\section{Terms}
\label{sec:terms}
Terms are lexical units which refer to word forms (and groups multi word
forms). Terms are attached to several morphosyntactic information (such as
part of speech, lemma, etc), and may be linked to external resources like
WordNet senses, etc.\\
The \texttt{<term>} element has the following attributes:
\begin{itemize}
\item\texttt{id} (\textbf{required}): unique identifier starting with the
prefix ``t''. If the term is a compound term (see Section
\ref{sec:compound-terms}), the prefix is ``t.mw''.
\item\texttt{type} (optional): type of the term. Currently, 2 values are
possible:
\begin{itemize}
\item\texttt{open}: open category term
\item\texttt{close}: close category term
\end{itemize}
\item\texttt{lemma} (optional): lemma of the term
\item\texttt{pos} (optional): part of speech. The first letter of the pos
attribute must be one of the following:
\begin{itemize}
\item [N] common noun
\item [R] proper noun
\item [G] adjective
\item [V] verb
\item [P] preposition
\item [A] adverb
\item [C] conjunction
\item [D] determiner
\item [O] other
\end{itemize}
\item\texttt{morphofeat} (optional): morphosyntactic feature encoded as a
single attribute.
\item\texttt{head} (optional): if the term is a compound, the id of the
head component (see Section \ref{sec:compound-terms}).
\item\texttt{case} (optional): declension case of the term.
\end{itemize}
The \texttt{<term>} element have the following sub-element:
\begin{itemize}
\item\texttt{span} (\textbf{obligatory)}: used to identify the tokens that
the term spans. A single term may refer to one or more words, in case of
multiword expressions. The \texttt{<span>} element has as many
\texttt{<target>} sub-elements as spanned tokens (see example below).
\end{itemize}
Example of term level annotations:
\begin{Verbatim}[fontsize=\small]
<terms>
<term id="t1" lemma="John" pos="R">
<span>
<target id="w1"/>
</span>
</term>
<term id="t2" type="open" lemma="teach" pos="V">
<span>
<target id="w2"/>
</span>
</term>
<term id="t3" lemma="mathematics" pos="N">
<span>
<target id="w3"/>
</span>
</term>
<term id="t4" lemma="20" pos="N">
<span>
<target id="w4"/>
</span>
</term>
<term id="t5" lemma="minute" pos="N">
<span>
<target id="w5"/>
</span>
</term>
<term id="t5" lemma="every" pos="D">
<span>
<target id="w6"/>
</span>
</term>
<term id="t6" lemma="Monday" pos="N">
<span>
<target id="w7"/>
</span>
</term>
<term id="t7" lemma="in" pos="P">
<span>
<target id="w8"/>
</span>
</term>
<term id="t.mw8" lemma="New_York" pos="R">
<span>
<target id="w9"/>
<target id="w10"/>
</span>
</term>
</terms>
\end{Verbatim}
\subsection{Sentiment features}
\label{sec:sentiment-features}
The term layer represents sentiment information which is context-independent
and that can be found in a sentiment lexicon. It is related to concepts
expressed by words/ terms (e.g. ``beautiful'') or multi-word expressions
(e. g. ``out of order''). NAF provides possibilities to store sentiment
information at term level and at sense/synset level. In the latter case, the
sentiment information is included in the \texttt{<externalReference>}
section (Section \ref{sec:external-references}) and a word sense
disambiguation process may identify the correct sense along its sentiment
information. \\
%The extension contains the following information categories.
The \texttt{<sentiment>} element has the following attributes:
\begin{itemize}
\item\texttt{Resource} (\textbf{required}): identifier and reference to an external sentiment
resource.
\item\texttt{Polarity} (\textbf{required}): Refers to the property of a
word/sense to express positive, negative or no sentiment. These values are
possible:
\begin{itemize}
\item positive
\item negative
\item neutral
\item Or a numerical value.
\end{itemize}
\item\texttt{Strength} (optional): refers to the strength of the polarity.
\begin{itemize}
\item weak
\item average
\item strong
\item Or a numerical value
\end{itemize}
\item\texttt{Subjectivity} (optional): refers to the property of a words to express an
opinion (or not).
\begin{itemize}
\item subjective
\item objective
\item factual
\item opinionated
\end{itemize}
\item\texttt{Sentiment\_semantic\_type} (optional): refers to a
sentiment-related semantic type. Possible values:
\begin{itemize}
\item Aesthetics\_evaluation
\item Moral\_judgment
\item Emotion
\item etc.
\end{itemize}
\item\texttt{Sentiment\_modifier} (optional): refers to words which modify
the polarity of another word. Possible values:
\begin{itemize}
\item intensifier polarity shifter
\item weakener polarity shifter
\end{itemize}
\item\texttt{Sentiment\_marker} (optional): refers to words which
themselves do not carry polarity, but are kind of vehicles of it. Possible
values:
\begin{itemize}
\item Find, think, in my opinion, according to ...
\end{itemize}
\item\texttt{Sentiment\_product\_feature} (optional): refers to a domain;
mainly used in feature-based sentiment analysis. Values are related to
specific domain. These are some possible values for for the tourist
domain:
\begin{itemize}
\item staff, cleanliness, beds, bathroom, transportation, location, etc..
\end{itemize}
\end{itemize}
Table \ref{tab:sent-attr-some} shows sentiment values for some words.
\begin{table}[t]
\centering
\ttfamily
\small
\begin{tabular}{|c|l|}
\hline
\textrm{Word} & \textrm{Sentiment attributes}\\
\hline
\multirow{6}{*}{\textrm{beatiful}} & polarity = "positive" \\
& strength = "average" \\
& subjectivity = "subjective"\\
& sentiment\_semantic\_type = "aesthetics\_evaluation"\\
& sentiment\_modifier = ""\\
& sentiment\_product\_feature = "general"\\
\hline
\multirow{6}{*}{\textrm{valley}} & polarity = "negative"\\
& strength = "average"\\
& subjectivity = "factual"\\
& sentiment\_semantic\_type = ""\\
& sentiment\_modifier = ""\\
& sentiment\_product\_feature = "beds"\\
\hline
\multirow{6}{*}{\textrm{very}} & polarity = ""\\
& strength = ""\\
& subjectivity = ""\\
& sentiment\_semantic\_type = ""\\
& sentiment\_modifier = "intensifier"\\
& sentiment\_product\_feature = ""\\
\hline
\end{tabular}
\caption{\textrm{Sentiment attributes.}}
\label{tab:sent-attr-some}
\end{table}
Example:
\begin{Verbatim}[fontsize=\small]
<!-- term level sentiment annotation -->
<term id="t2" lemma="nice" pos="G">
<sentiment resource="VUA_polarityLexicon_word" polarity="positive"
strength="average" subjectivity="subjective"
sentiment_semantic_type="behaviour/trait" />
<span><target id="w2"/></span>
<!-- sense level sentiment annotation -->
</term>
<term id="t5" lemma="warm" pos="G">
<sentiment/>
<span><target id="w5"/></span>
<externalReferences>
<externalRef resource="WN-ENG" reference="c_1009" conf="0.38">
<sentiment resource="VUA_polarityLexicon_synset" polarity="positive"
strength="average" subjectivity="subjective"
sentiment_semantic_type="behaviour/traitEvaluation"/>
</externalRef>
<externalRef resource="WN-ENG" reference="c_1008" conf="0.31">
<sentiment resource="VUA_polarityLexicon_synset" polarity="positive"
strength="average" subjectivity="objective"
sentiment_semantic_type="temperature"/>
</externalRef>
</externalReferences>
</term>
\end{Verbatim}
\subsection{Spanning elements}
\label{sec:span}
The \texttt{<span>} element is used to refer to elements of lower
layers. For example, \texttt{<span>} is used within \texttt{<term>} elements
to identify the basic tokens they are composed by (\texttt{<wf>} elements in
the \texttt{text} layer). Similarly, \texttt{<span>} is used in the named
entity layer (see Section \ref{sec:named-entities}) to refer to the terms
that comprise the named entity expression.
In NAF, the \texttt{<span>} element has always the same attributes and
sub-elements, regardless of the specific layer it is used.
\noindent The \texttt{<span>} element has the following attribute:
\begin{itemize}
\item\texttt{primary} (optional). This attribute is only meaningful when
multiple \texttt{<span>} elements exist in the same level (for instance,
on coreference clusters, see Section \ref{sec:coreference}). If present,
it specifies that the element is distinguished or preferred over the
remaining siblings. Only one \texttt{<span>} element among the siblings
can have a \texttt{primary} attribute set (usually with the value
``yes'').
\end{itemize}
\noindent The \texttt{<span>} element must have one or more
\texttt{<target>} sub-elements, each one pointing to an element of another
layer. The order of \texttt{<target>} sub-elements within the same
\texttt{<span>} is important.
\noindent The \texttt{<target>} element has the following attributes:
\begin{itemize}
\item\texttt{id} (\textbf{required)}: used to identify the target
element. The \texttt{id} attribute must be an identifier of another
element and must exist in the NAF document.
\item\texttt{head} (optional): in \texttt{<span>} elements that refer to
more than one element, identifies the distinguished or preferred target.
\end{itemize}
\subsection{External References}
\label{sec:external-references}
The \texttt{<externalReferences>} element is used to associate terms to
external resources, such as elements of a Knowledge base, an ontology,
etc. It consists of several \texttt{<externalRef>} elements, one per
association. The \texttt{<externalRef>} elements may be nested, meaning that
each externalRef refines the relation expressed by the parent element.\\
The \texttt{<externalRef>} element has the following attributes:
\begin{itemize}
\item\texttt{resource} (\textbf{required}): indicates the identifier of the
resource referred to.
\item\texttt{reference} (\textbf{required}): code of the referred
element. If the element is a synset of some version of WordNet, it is
recommended to follow the pattern:\\
\begin{center}
\texttt{[a-z]{3}-[0-9]{2}-[0-9]+-[nvars]}
\end{center}
which is a string composed by four fields separated by a dash. The four
fields are the following:
\begin{itemize}
\item Language code (three characters, lowercase).
\item WordNet version (two digits).
\item Synset identifier composed by digits.
\item POS character:
\begin{itemize}
\item [n] noun
\item [v] verb
\item [a] adjective
\item [r] adverb
\end{itemize}
\end{itemize}
examples of valid patterns are: ``eng-20-12345678-n'',
``spa-16-017403-v'', etc.
\item\texttt{reftype} (optional): indicates the kind of relation the
externalRef is expressing.
\item\texttt{status} (optional): indicates the status of the relationship.
\item\texttt{source} (optional): the name of the process which created the
external reference.
\item\texttt{confidence} (optional): a floating number between 0 and
1. Indicates the confidence weight of the association.
\end{itemize}
Example of term level annotations:
\begin{Verbatim}[fontsize=\small]
<terms>
<term id="t1" lemma="John" pos="R">
<span>
<target id="w1"/>
</span>
</term>
<term id="t2" type="open" lemma="teach" pos="V">
<span>
<target id="w2"/>
</span>
<externalReferences>
<externalRef resource="WN-1.7" reference="eng-17-00861095-v"
confidence="0.80">
<externalRef resource="ontology" reference="Teach"
reftype="SubClassOf">
</externalRef>
</externalRef>
<externalRef resource="WN-1.7" reference="eng-17-00859568-v"
confidence="0.20"/>
</externalReferences>
</term>
<term id="t3" lemma="mathematics" pos="N">
<span>
<target id="w3"/>
</span>
<externalReferences>
<externalRef resource="WN-1.7" reference="eng-17-04597590-n"
confidence="1.0"/>
</externalReferences>
</term>
<term id="t4" lemma="20" pos="N">
<span>
<target id="w4"/>
</span>
</term>
<term id="t5" lemma="minute" pos="N">
<span>
<target id="w5"/>
</span>
</term>
<externalReferences>
<externalRef resource="WN-1.7" reference="eng-17-12621100-n"
confidence="0.80"/>
<externalRef resource="WN-1.7" reference="eng-17-12631889-n"
confidence="0.20"/>
</externalReferences>
<term id="t5" lemma="every" pos="D">
<span>
<target id="w6"/>
</span>
</term>
<term id="t6" lemma="Monday" pos="N">
<span>
<target id="w7"/>
</span>
<externalReferences>
<externalRef resource="WN-1.7" reference="eng-17-12557842-n"
confidence="1.0"/>
</externalReferences>
</term>
<term id="t7" lemma="in" pos="P">
<span>
<target id="w8"/>
</span>
</term>
<term id="mw8" lemma="New_York" pos="R">
<span>
<target id="w9"/>
<target id="w10"/>
</span>
</term>
</terms>
\end{Verbatim}
\subsection{Compound terms}
\label{sec:compound-terms}
Compound terms can be represented in NAF by including \texttt{<component>}
elements within \texttt{<term>} elements. For example, the term
landbouwbeleid (English: agriculture policy) would look like this:
\begin{Verbatim}[fontsize=\small]
<term id="t.mw7" head="t.mw7.1" lemma="landbouwbeleid" pos="N" type="open">
<span>
<target id="w7"/>
<target id="w8/>
</span>
<component id="t.mw7.1" lemma="landbouw" pos="N" type="open">
<span>
<target id="w7"/>
</span>
<externalReferences>...</externalReferences>
</component>
<component id="t.mw7.2" lemma="beleid" pos="N" type="open">
<span>
<target id="w8"/>
</span>
<externalReferences>...</externalReferences>
</component>
<externalReferences>...</externalReferences>
</term>
\end{Verbatim}
Note: the \texttt{id} attribute of compound terms start with the prefix ``t.mw''.\\
The \texttt{<component>} element has the same attributes and structure as
the \texttt{term} element, the only difference being that components can not
contain more \texttt{component} elements.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "naf"
%%% End: