WP1: Integration of language resources in eLearning

Combining pattern-based and machine learning methods to detect definitions for eLearning purposes
Eline Westerhout & Paola Monachesi
Overview
• Extraction of definitions within eLearning
• Types of definitory contexts
• Grammar approach
• Machine learning approach
• Conclusions
• Future work
• Discussion
Extraction of definitions within eLearning
• Definition extraction:
– question answering
– building dictionaries from text
– ontology learning
• Challenges within eLearning:
– corpus
– size of LOs (learning objects)
Types - I
• is_def:
Gnuplot is een programma om grafieken te maken
‘Gnuplot is a program for drawing graphs’
• verb_def:
E-learning omvat hulpmiddelen en toepassingen die via het internet beschikbaar zijn en creatieve mogelijkheden bieden om de leerervaring te verbeteren.
‘eLearning comprises resources and applications that are available via the internet and provide creative possibilities to improve the learning experience’
Types - II
• punct_def:
Passen: plastic kaarten voorzien van een magnetische strip, [...] toegang krijgt tot bepaalde faciliteiten.
‘Passes: plastic cards equipped with a magnetic strip, that [...] gets access to certain facilities.’
• pron_def:
Dedicated readers. Dit zijn speciale apparaten, ontwikkeld met het exclusieve doel e-boeken te kunnen lezen.
‘Dedicated readers. These are special devices, developed with the exclusive aim of reading e-books.’
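As a rough illustration of the four types, the following minimal Python sketch flags candidate sentences by surface cues. It is a hypothetical toy: the cue words and patterns are assumptions, and the actual detection (next slides) works on part-of-speech annotations rather than raw strings.

import re

# Hypothetical surface cues for the four definition types. The real
# system matches POS patterns (see the grammar slides), not raw text.
TYPE_CUES = {
    "is_def":    re.compile(r"\bis\b"),                # copula 'is'
    "verb_def":  re.compile(r"\b(omvat|betekent)\b"),  # defining verbs ('comprises', 'means')
    "punct_def": re.compile(r"^[^:]{1,40}:\s"),        # 'Term: definition ...'
    "pron_def":  re.compile(r"\b[Dd]it zijn\b"),       # anaphoric 'these are'
}

def candidate_types(sentence):
    """Return the definition types whose surface cue occurs in the sentence."""
    return [t for t, pat in TYPE_CUES.items() if pat.search(sentence)]

print(candidate_types("Gnuplot is een programma om grafieken te maken"))
# -> ['is_def']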
Grammar approach
• General
• Example
• Results
Identification of definitory contexts
• Make use of the linguistic annotation of LOs (part-of-speech tags)
• Domain: computer science for non-experts
• Use of language-specific grammars
• Workflow
– Search for and mark definitory contexts in LOs (manually)
– Draft local grammars on the basis of these examples
– Apply the grammars to new LOs
Grammar example
Een vette letter is een letter die zwarter wordt afgedrukt dan de andere letters.
‘A bold letter is a letter that is printed darker than the other letters.’
The defined term (‘Een vette letter’) is matched by the simple_NP rule (a capitalized article, optional adjectives, one or more nouns):
<rule name="simple_NP" >
<seq>
<and>
<ref name="art"/>
<ref name="cap"/>
</and>
<ref name="adj" mult="*"/>
<ref name="noun" mult="+"/>
</seq>
</rule>
The copula (‘is’) is matched by a query on the verb zijn with an auxiliary/copula (hulpofkopp) reading:
<query match="tok[@ctag='V' and @base='zijn' and @msd[starts-with(.,'hulpofkopp')]]"/>
The definition phrase (‘een letter ...’) is matched by the noun_phrase rule, where the article is optional:
<rule name="noun_phrase">
<seq>
<ref name="art" mult="?"/>
<ref name="adj" mult="*" />
<ref name="noun" mult="+" />
</seq>
</rule>
The is_are_def rule chains these parts together, with tok_or_chunk absorbing the rest of the sentence:
<rule name="is_are_def">
<seq>
<ref name="simple_NP"/>
<query match="tok[@ctag='V' and @base='zijn' and
@msd[starts-with(.,'hulpofkopp')]]"/>
<ref name="noun_phrase" />
<ref name="tok_or_chunk" mult="*"/>
</seq>
</rule>
Applied to the example sentence, the rule produces the annotated output:
<definingText>
<markedTerm>
<tok sp="n" msd="onbep,zijdofonzijd,neut" ctag="Art" base="een" id="t214.2">Een</tok>
<tok sp="n" msd="attr,stell,vervneut" ctag="Adj" base="vet" id="t214.3">vette</tok>
<tok sp="n" msd="soort,ev,neut" ctag="N" base="letter" id="t214.4">letter</tok>
</markedTerm>
<tok sp="n" msd="hulpofkopp,ott,3,ev" ctag="V" base="zijn" id="t214.5">is</tok>
<tok sp="n" msd="onbep,zijdofonzijd,neut" ctag="Art" base="een" id="t214.6">een</tok>
<tok sp="n" msd="soort,ev,neut" ctag="N" base="letter" id="t214.7">letter</tok>
...
<tok sp="n" msd="onbep,neut,attr" ctag="Pron" base="andere" id="t214.14">andere</tok>
<tok sp="n" msd="soort,mv,neut" ctag="N" base="letter" id="t214.15">letters</tok>
<tok sp="n" msd="punt" ctag="Punc" base="." id="t214.16">.</tok>
</definingText>
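Read as code, the cascade above amounts to the minimal Python sketch below. It is an illustrative assumption, not the project's implementation: tokens are taken to be (word, ctag, base) triples, and the capitalization test of simple_NP and the msd-based copula test are simplified away.

def match_np(toks, i, need_art=True):
    # Match Art? Adj* N+ from position i; return the end index, or None.
    if i < len(toks) and toks[i][1] == "Art":
        i += 1
    elif need_art:
        return None
    while i < len(toks) and toks[i][1] == "Adj":
        i += 1
    start = i
    while i < len(toks) and toks[i][1] == "N":
        i += 1
    return i if i > start else None

def is_are_def(toks):
    # simple_NP (defined term) + copula 'zijn' + noun_phrase + anything.
    term_end = match_np(toks, 0)
    if term_end is None or term_end >= len(toks):
        return None
    word, ctag, base = toks[term_end]
    if not (ctag == "V" and base == "zijn"):
        return None
    if match_np(toks, term_end + 1, need_art=False) is None:
        return None
    return toks[:term_end], toks[term_end:]

sent = [("Een", "Art", "een"), ("vette", "Adj", "vet"), ("letter", "N", "letter"),
        ("is", "V", "zijn"), ("een", "Art", "een"), ("letter", "N", "letter")]
print(is_are_def(sent))  # (marked term, copula + definition)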
Results (grammar)

Type        P       R       F
is_def      0.2810  0.8652  0.4242
verb_def    0.4464  0.7576  0.5618
punct_def   0.0991  0.6818  0.1731
pron_def    0.0918  0.4130  0.1502
Machine learning
• Features
• Configurations
• Results
Features
• Text properties: bag-of-words, bigrams, and the bigram preceding the definition
• Syntactic properties: type of determiner within the defined term (definite, indefinite, no determiner)
• Proper nouns: presence of a proper noun in the defined term
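A sketch of how these features could be extracted, assuming each candidate arrives with its token list, the bigram immediately before it, and the already-marked defined term. The definiteness test uses the Dutch articles de/het (definite) vs. een (indefinite); the proper-noun test is approximated here by capitalization, whereas the real system can read it off the POS tags. All names are illustrative.

def extract_features(tokens, preceding_bigram, marked_term):
    # tokens: words of the candidate; preceding_bigram: the two words
    # before it; marked_term: words of the defined term.
    feats = {}
    # Text properties: bag-of-words, bigrams, and the preceding bigram.
    for w in tokens:
        feats["bow=" + w.lower()] = 1
    for a, b in zip(tokens, tokens[1:]):
        feats["bigram=" + a.lower() + "_" + b.lower()] = 1
    feats["prec_bigram=" + "_".join(w.lower() for w in preceding_bigram)] = 1
    # Syntactic property: definiteness of the determiner in the defined term.
    first = marked_term[0].lower()
    if first in ("de", "het"):
        feats["det=definite"] = 1
    elif first == "een":
        feats["det=indefinite"] = 1
    else:
        feats["det=none"] = 1
    # Proper noun in the defined term (capitalized non-initial token).
    feats["term_has_proper_noun"] = int(any(w[:1].isupper() for w in marked_term[1:]))
    return feats

The resulting dictionaries can be fed to any off-the-shelf learner, e.g. scikit-learn's DictVectorizer followed by a naive Bayes classifier; the slides do not name the classifier used, so that pairing is an assumption.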
Configurations

Setting  Attributes
1        using bag-of-words
2        using bigrams
3        combining bag-of-words and bigrams
4        adding bigram preceding definition to setting 3
5        adding definiteness of article in marked term to setting 3
6        adding presence of proper noun to setting 3
7        adding bigram preceding definition & definiteness of article in marked term to setting 3
8        adding bigram preceding definition & presence of proper noun to setting 3
9        adding definiteness of article in marked term & presence of proper noun to setting 3
10       using all attributes
Results – is_def (ML)

Setting  P       R       F
1        0.6944  0.6494  0.6711
2        0.6625  0.6883  0.6752
3        0.7662  0.7662  0.7662
4        0.7662  0.7662  0.7662
5        0.7763  0.7662  0.7712
6        0.7662  0.7662  0.7662
7        0.7867  0.7662  0.7763
8        0.7632  0.7532  0.7582
9        0.7895  0.7792  0.7843
10       0.8000  0.7792  0.7895
Results – is_def (final)

Setting  P       R       F
1        0.6944  0.5618  0.6211
2        0.6625  0.5955  0.6272
3        0.7662  0.6629  0.7108
4        0.7662  0.6629  0.7108
5        0.7763  0.6629  0.7152
6        0.7662  0.6629  0.7108
7        0.7867  0.6629  0.7195
8        0.7632  0.6517  0.7030
9        0.7895  0.6742  0.7273
10       0.8000  0.6742  0.7317

(Recall is now measured against all definitions in the corpus, not only those the grammar retrieved; since the classifier filters the grammar output, final recall = classifier recall × grammar recall.)
Results – punct_def (ML)

Setting  P       R       F
1        0.4324  0.3556  0.3902
2        0.3171  0.2889  0.3023
3        0.4510  0.5111  0.4792
4        0.4681  0.4889  0.4783
5        0.4528  0.5333  0.4898
6        0.5000  0.5333  0.5161
7        0.5106  0.5333  0.5217
8        0.5000  0.5333  0.5161
9        0.5000  0.5778  0.5361
10       0.5000  0.5333  0.5161
Results – punct_def (final)

Setting  P       R       F
1        0.4324  0.2424  0.3107
2        0.3171  0.1970  0.2430
3        0.4510  0.3485  0.3932
4        0.4681  0.3333  0.3894
5        0.4528  0.3636  0.4034
6        0.5000  0.3636  0.4211
7        0.5106  0.3636  0.4248
8        0.5000  0.3636  0.4211
9        0.5000  0.3939  0.4407
10       0.5000  0.3636  0.4211

(Recall again measured against all definitions in the corpus.)
Final results

      is_def                       punct_def
      before   after (setting 10)  before   after (setting 9)
P     0.2810   0.8000              0.0991   0.5000
R     0.8652   0.6742              0.6818   0.3939
F     0.4242   0.7317              0.1731   0.4407

• precision up (ca. 50 and 40 percentage points)
• recall down (ca. 20 and 30 percentage points)
• f-score up (ca. 30 and 25 percentage points)
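The F column is consistent with the standard F1 score, the harmonic mean of precision and recall. For is_def after filtering, for example:

F_1 = \frac{2PR}{P + R} = \frac{2 \cdot 0.8000 \cdot 0.6742}{0.8000 + 0.6742} \approx 0.7317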
Related work
• Question answering:
– Fahmi & Bouma (2006)
– Miliaraki & Androutsopoulos (2004)
• Glossary creation:
– Muresan & Klavans (2002)
• Ontology learning:
– Storrer & Wellinghoff (2006)
– Walter & Pinkal (2006)
Future work
• try different features
• evaluate other classifiers
• extend to all types of definitions
• scenario-based evaluation of the GCD
Discussion
• Good features?
• Apply filtering: yes or no?
• How to evaluate the performance?
– scenario-based?
– compare with manual annotation?
– ...