[Librezale] [Firefox OS] Autocorrection and auto-suggestions

Julen Ruiz Aizpuru julenx at gmail.com
Wed May 21 10:17:43 CEST 2014


Hi all!

The other day I opened a bug [1] to enable autocorrection and
auto-suggestions in Firefox OS, and the tireless Kevin Scannell has
offered to help.

Using the web as a corpus, he would build a word frequency list for
Basque. Before doing so, though, he has raised a few questions, mainly
about the number of words:


"That said, Basque is morphologically very complex, and so no matter
how big of a corpus I collect, there will be many words missing.  For
example, the Xuxen spellchecker addon accepts hundreds of millions of
words in total (so many it's hard to even estimate), but accepts
86-87% of words in typical running texts.  Julen, any thoughts on
this?  Would you be satisfied with a frequency list of say 1.5-2M
words even if there are many gaps?

Also, do you want me to only include words that the spellchecker
accepts?   This is what I've done for other languages to avoid English
or Spanish "pollution", but again this might leave out some important
words.   I could also send you a list of the most frequent words not
accepted by the spell checker and you could manually clean that list
(and potentially add them to Xuxen)."
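For anyone weighing in, the filtering Kevin describes can be sketched roughly as follows. This is only an illustration under assumptions, not his actual pipeline: `is_accepted` is a hypothetical stand-in for a Xuxen lookup (here just a toy word set), and the tokenizer is deliberately crude. The sketch counts word frequencies, keeps the accepted words as the frequency list, sets aside the most frequent rejected words for manual review, and computes the share of running-text tokens the checker accepts (the kind of figure behind the 86-87% mentioned above).

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Callable, Iterable

# Crude tokenizer (assumption): runs of letters in any script; digits,
# underscores and punctuation are dropped.
WORD_RE = re.compile(r"[^\W\d_]+")


def count_words(texts: Iterable[str]) -> Counter[str]:
    """Count lowercased word tokens over a collection of texts."""
    counts: Counter[str] = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text.lower()))
    return counts


def split_by_spellchecker(
    counts: Counter[str],
    is_accepted: Callable[[str], bool],
    top_rejected: int = 100,
) -> tuple[list[tuple[str, int]], list[tuple[str, int]]]:
    """Return (accepted frequency list, most frequent rejected words).

    Both lists are in descending frequency order; ties keep the order in
    which the words were first seen in the corpus.
    """
    accepted: list[tuple[str, int]] = []
    rejected: list[tuple[str, int]] = []
    for word, n in counts.most_common():
        (accepted if is_accepted(word) else rejected).append((word, n))
    # Only the top rejected words are kept for manual cleanup.
    return accepted, rejected[:top_rejected]


def token_coverage(counts: Counter[str], is_accepted: Callable[[str], bool]) -> float:
    """Fraction of running-text tokens (not distinct forms) the checker accepts."""
    total = sum(counts.values())
    covered = sum(n for word, n in counts.items() if is_accepted(word))
    return covered / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical toy lexicon standing in for Xuxen, for illustration only.
    lexicon = {"etxea", "etxean", "da", "gaude"}
    counts = count_words(["Etxea handia da.", "Etxean gaude, etxea polita da."])
    accepted, rejected = split_by_spellchecker(counts, lexicon.__contains__)
    print(accepted)  # → [('etxea', 2), ('da', 2), ('etxean', 1), ('gaude', 1)]
    print(rejected)  # → [('handia', 1), ('polita', 1)]
    print(token_coverage(counts, lexicon.__contains__))  # → 0.75
```

Coverage here is measured over tokens rather than distinct word forms, which is why a checker that accepts an enormous number of forms can still miss a noticeable share of running text, and why a list filtered this way will have gaps (in the toy run, valid words like "handia" end up in the rejected list simply because the lexicon lacks them).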


Personally, I'm not used to working with figures like these, so I'd
rather pass the question on to someone who knows. Any experts around?
Igor, any ideas? Would you be able to join the discussion on the bug?

I'll also ask the folks at IXA, since this could help improve Xuxen along the way.


Julen.


[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1007547

