SBBIC has wanted to see a Khmer grammar checker developed for some time now, and we have just begun the process. We selected LanguageTool, a grammar checker extension for OpenOffice, as the best option for a Khmer grammar checker. You can download our latest Alpha 3.5 test release if you want to try it out. What it can check is still very limited at this stage, but please try it out, develop more rules, and pass them on to us so we can include them in the next version. The extension requires Java 5.0 or higher. Currently there are over 25,000 tagged Khmer words in the dictionary.

DOWNLOAD:

Download “SBBIC Khmer Grammar Checker” #libreoffice (downloaded 31315 times)

Here’s a video of it in action:

Here are some test phrases you can use to try out the grammar checker:

ហើយនឹង

ហើយហើយ

ខ្ញុំចង់ឡាននោះ

ខ្ញុំចង់បានទៅ

មនុស្សបួន

មនុស្សបួនជា

មនុស្សពីរ

មនុស្សបីក្បាល

ជាមួយនិង

មកពីរអ្នក

ទាំងពីរអ្នក
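The phrases above are written without spaces between words, as Khmer normally is, so a checker first has to split them into words. The Python sketch below assumes that segmentation has already been done and shows two checks these phrases appear designed to exercise: a word written twice (ហើយហើយ), and a noun followed by a number with no classifier (មនុស្សពីរ; item 42 in the known-issues list says this should be flagged). It also splits sentences at the khan (។) before checking, since item 42 reports problems there. It is a hypothetical illustration, not LanguageTool code: the mini-lexicon, the NUM/CLS tag names and the messages are assumptions.

```python
# A toy sketch, NOT LanguageTool code. Assumes the text is already segmented
# into words; the mini-lexicon below is hypothetical sample data.
from typing import Dict, List, Set

KHAN = "។"  # Khmer full stop (khan)

# Hypothetical mini-lexicon: word -> possible tags.
LEXICON: Dict[str, Set[str]] = {
    "មនុស្ស": {"NN"},   # person
    "ពីរ": {"NUM"},     # two
    "បួន": {"NUM"},     # four
    "នាក់": {"CLS"},    # classifier for people
    "ហើយ": {"CC"},
}


def tags(word: str) -> Set[str]:
    """Look up a word's possible tags (empty set if unknown)."""
    return LEXICON.get(word, set())


def split_sentences(words: List[str]) -> List[List[str]]:
    """Split a word list into sentences at each khan, dropping empty ones."""
    sentences: List[List[str]] = [[]]
    for word in words:
        if word == KHAN:
            sentences.append([])
        else:
            sentences[-1].append(word)
    return [s for s in sentences if s]


def check_sentence(words: List[str]) -> List[str]:
    """Return the messages for one sentence."""
    messages = []
    for i, word in enumerate(words):
        prev = words[i - 1] if i > 0 else None
        nxt = words[i + 1] if i + 1 < len(words) else None
        # The same word twice in a row, e.g. ហើយ ហើយ.
        if prev == word:
            messages.append(f"repeated word: {word}")
        # NOUN + NUM must be followed by a classifier (មនុស្ស ពីរ នាក់).
        if prev is not None and "NN" in tags(prev) and "NUM" in tags(word):
            if nxt is None or "CLS" not in tags(nxt):
                messages.append(f"missing classifier: {prev} {word}")
    return messages


def check(words: List[str]) -> List[List[str]]:
    """Check each sentence separately; one message list per sentence."""
    return [check_sentence(s) for s in split_sentences(words)]
```

For example, `check(["មនុស្ស", "ពីរ", "។", "ក្នុង", "ពេល", "តែ", "មួយ", "។"])` returns `[["missing classifier: មនុស្ស ពីរ"], []]`: the first sentence is flagged and the second is not, which is the behaviour item 42 asks for.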

Also, here is the list of known issues (including issues already corrected):

1. របស់ was marked as a noun (which it is in some cases), so it broke the normal noun + adjective order check (e.g. ធីតាគឺជាបងស្រីម្នាក់មានចិត្តល្អចំពោះប្អូនប្រុសតូចរបស់នាង។). To fix this I removed the NN (noun) tag from របស់, because it seems it should never be used by itself anyway: it is always paired with another noun (e.g. នេះរបស់អ្នកណា? -របស់ខ្ញុំ ។ របស់របរ).
2. ណា was tagged as a noun, so it affected the noun + pronoun order check (e.g. នេះរបស់អ្នកណា?). We removed its noun tag.
3. ដោយ was tagged as a noun, so it caused a false positive with គេដោយថ្នមៗ។ We removed its noun tag.
4. បាន had a noun tag, so it caused a false positive with នាងបានប្រលែងនឹងប្អូនយ៉ាងទន់ភ្លន់ផងដែរ។ We removed its noun tag.
5. ចំពោះ: removed the adjective tag (e.g. និងស្លូតបូតចំពោះប្អូនប្រុសតូចរបស់នាង។).
6. នោះ and នេះ were marked as nouns (e.g. បើគេមិនហាត់រៀនស្លូតបូតនោះ។). Removed the noun tag and added DP (demonstrative pronoun).
7. គឺធ្វើខ្លួនអោយក្លាយជាមិត្តដ៏ល្អម្នាក់ជាមុនសិន។ Changed ម្នាក់ to be a classifier only.
គាត់ជាមនុស្សដ៏កំណាញ់ម្នាក់, គាត់ជាមនុស្សម្នាក់ដែលខ្ញុំទុកចិត្ត, គាត់ជាមនុស្សតែម្នាក់គត់ដែលខ្ញុំស្រឡាញ់: we should make a rule that ដ៏ or ដែលជា + adjective + classifier is OK.
8. ម្នាក់ៗ: removed the noun tag.
9. ពីរការ should be flagged as wrong.
10. ព្រះយេស៊ូវក៏គ្រប់គ្រងលើអាកាសធាតុផងដែរ។ was flagged incorrectly. លើ is not normally an adjective or an adverb; it is normally a preposition, so we removed all of its tags except IN.
11. ព្រះយេស៊ូវមានអំណាចដើម្បីបញ្ឈប់ព្យុះហើយពន្លឺព្រះអាទិត្យផងដែរ។ was flagged incorrectly. ហើយ is not an adjective, so we removed the adjective and adverb tags (it is only a CC).
12. ខ្ញុំទុក: ទុក is not a noun, it is a verb; changed to verb only.
13. ចិត្ត is not an adjective; changed.
14. គាត់ is not a noun; changed.
15. តូច is an adjective; changed.
16. ដោយសារ: when សារ is by itself, it may be a noun (letter), but when ដោយ is in front of it, the pair is a conjunction, so we need to train the disambiguator. DONE
17. មានសេចក្តីភ័យខ្លាច ពេលដែលទូកកំពុងលិច: we need to figure out how to deal with spaces. Should they be checked? If they are, what are the rules? If they aren’t, will we miss anything? Perhaps in this case, because ពេល is linked with ដែល, the meaning changes to “when”, which is not a noun but an adjective. So perhaps a rule in the disambiguator. DONE. Examples: ភ័យខ្លាច ពេល; ភ័យខ្លាច ពេលដែល; ភ័យខ្លាច នៅពេលដែល
18. គឺ was marked as a noun (ទ្រង់គឺ), which caused the grammar checker to think it could be possessed by a pronoun. Changed to verb only. DONE
19. ក្មេងប្រុស was a false positive for the “adjective should follow noun” rule. Added an exception to ignore the match when the noun is also a gender word. DONE
20. ទ្រង់សុគត: សុគត was falsely tagged as a noun; it should be a verb only. CHANGED.
21. អ្នករស់នៅជាមួយទ្រង់អស់កល្បជានិច្ច: អស់ is correctly tagged as a noun, but here the pronoun ទ្រង់ should not possess it; this noun is actually working as an adverb for រស់នៅ (answering the question “how long?”). Some other uses of អស់:
អ្នកទាំងអស់គ្នា
គំនិតឆ្នៃប្រឌិតអស់ទាស់បែបនេះ
ស្លាប់ទាំងអស់នៅក្នុងគ្រោះថ្នាក់
ចាញ់បោកគេអស់លុយ៦ម៉ឺនដុល្លារ
I think we can fix this in the disambiguator with this rule: PRONOUN + អស់ + NOUN → tag អស់ + NOUN as ADVERB. NEED TO CHECK THIS RULE.

22. នឹង was tagged as an adjective, which broke អ្នកនឹងមិនបាត់បង់, but it is rarely an adjective, so we removed the tag and will handle its adjective use in the disambiguator when needed.
23. ខ្ញុំ was tagged as a noun when it should only be a pronoun. CHANGED.
24. ព្រះហស្ត: ព្រះ was tagged as an adjective, although that usage is quite rare. Removed the adjective tag; we will tag it as an adjective in the disambiguator when needed.
25. នៅពេលអ្នក: added an exception to the noun + adjective order rule to allow ពេល to be an adjective with a noun that is also a pronoun. DONE.
26. ស្មោះត្រង់ក្នុងការ: ក្នុង was falsely tagged as a noun. REMOVED the tag.
27. សប្បាយ was tagged as a noun (breaking អ្នកសប្បាយ). REMOVED.
28. អ្នកអាចទៅក្រោយពេលអ្នកស្លាប់ is a false positive for the adjective/noun order rule. Here ពេល works as an adjective (“when”, because it is linked with ក្រោយ), not as a noun. We need a disambiguator rule, but it is not obvious what rule would also cover other situations, because ពេលក្រោយ (meaning “later”) is also very common.
OTHER SAMPLE USES:
PRO + TOBE VB + VERB + ADJ (ក្រោយ) + ADJ (ពេល) + PRO + VERB
អ្នកអាចទៅក្រោយពេលអ្នកស្លាប់
PRO + TOBE VB + VERB + ពេល (NOUN) + ADJ (ក្រោយ)
អ្នកអាចទៅពេលក្រោយ
CC + VERB + AW + VERB + ADJ + NOUN (ពេល) + ADVERB (ក្រោយ) + NOUN + AW
និងណែនាំឲ្យញ៉ាំឲ្យទៀងពេល ក្រោយបាយរួច
NUM + CC + ADJ (ក្រោយ) + ADJ (ពេល) + VERB + VERB
ពីរនាក់ក្រោយពេលវាយសម្លាប់
NOUN + VERB + ADJ (ក្រោយ) + ADJ (ពេល) + VERB + NOUN
ចោរប្លន់ក្រោយពេលចាយលុយ
តើការដុសធ្មេញក្រោយពេលទទួលទានអាហាររួចបាត់បង់កាល់ស្យូមពិតដែរឬអត់ ?
So we might actually just write a grammar rule that deals with ពេល and the words that can be used with it. FOR NOW we use a disambiguator rule: ADJ + ពេល + (VB, NN or PRO) → tag ពេល as ADJ. DONE. ALSO, we need to disallow a visible space between the two words, both in the disambiguator rule and in the grammar rule for NN + ADJ order.
29. ជាយូរឆ្នាំមកហើយ
MORE EXAMPLES:
រយៈពេលជាយូរមុនពេលដែល
បានឃើញជាយូរខែមកហើយ
I think we can tag យូរ as an adverb modifying ជា. DONE in the disambiguator.
30.
31. ជាអ្នកសង្គ្រោះរបស់គេ: added a disambiguator exception to the “noun possessed by pronoun” check: ជា + អ្នក + NOUN → tag អ្នក as part of the noun. NOTE: it is possible we should just tag សង្គ្រោះ as a verb, but for now we added the rule.
32. គ្រប់គ្នាបានធ្វើបាបហើយខ្វះមិនដល់សិរីល្អនៃព្រះ: this might actually be bad grammar; it is from Romans 3:23 in the old version.
33. អ្នកសប្បាយចិត្ត: is PRO + ADJ + NOUN always possible, or only in some cases? For now we added a disambiguator exception for PRO + ADJ + ចិត្ត, because the last two words together act as an adjective. DONE
34. ក៏មិនមែនកើតពីអ្នករាល់គ្នាដែរ: false tag in this situation (tagged as an adjective). មែន with the negative particle មិន works as an adverb (“does not”). Sometimes it appears without the negative particle and still works as an adverb (“truly”). BUT in this case កើត is tagged wrong; it is rarely a noun.
EXAMPLES:
ភាសាខ្មែរស្រួលរៀនមែន
ការមិនមែនអន្តោប្រវេសន៍ចុច
ពេលវាយគ្នាមែនទែន
មិនមែនមកពីការ
កាត់បន្ថយមែនបរិមាណឧស្ម័ន
ពុំមែនកើតមានតែចំពោះ
មិនមែនលុយក្រោមតុទេ

35. ទោះជាយ៉ាងនេះក្តី: added an exception, since នេះ and នោះ cannot possess anything. DONE
36. បានអោយយើងដឹងថា: ដឹង is normally a verb. Removed the verb tag; will add it to an exception later if needed. DONE
37. ទុកចិត្តលើទង្វើល្អផ្ទាល់ខ្លួនរបស់គេ: make an exception for NOUN + ADJ + NOUN where the adjective is linked with the first noun.
EXAMPLES:
រឿងផ្ទាល់ខ្លួន
ដោយខ្លួនឯង : ឲ្យផ្ទាល់ខ្លួន, ទទួលផ្ទាល់ដៃ, និយាយផ្ទាល់មាត់, ធ្វើការផ្ទាល់ខ្លួន ។ រៀនផ្ទាល់មាត់ ឬ រៀនផ្ទាល់ពីមាត់ រៀនតពីមាត់គ្រូ គឺគ្រូប្រាប់ឲ្យថាតាម ឲ្យរៀនទន្ទេញ ដោយឥតមានគម្ពីរឬក្បួនអ្វីសម្រាប់អានទន្ទេញ : ខ្ញុំចេះអាគមនេះ ដោយរៀនផ្ទាល់មាត់គ្រូ ( ម. ព. មុខបាឋ ទៀតផង
បានជាគេផ្ទាល់ឲ្យ រកហើបមាត់មិនរួច, អញដឹងពុតវាអស់ហើយ ចាំមើលអញផ្ទាល់វាម្ដង ។ ផ្ទញ់ផ្ទាល់ គឺផ្ទញ់ឲ្យទាល់សេចក្ដី ។ ផ្ទញ់ផ្ទាល់
តាមរយៈការផលិតរបស់ខ្លួនផ្ទាល់
មានការយល់ព្រមពីសាមីខ្លួនផ្ទាល់ដោយគ្មានការបង្ខិតបង្ខំអ្វីឡើយ
I think this should actually be fixed in the disambiguator with a rule for ផ្ទាល់, which is tagged as a noun (as well as JJ, RB, VB and IN); in this case we can be sure it is a pronoun if it is linked with another pronoun before or after it. Can it be NOUN + ផ្ទាល់ + PRONOUN, VERB + PRONOUN + ផ្ទាល់, and ADJECTIVE + ផ្ទាល់ + PRONOUN? DONE, needs checking.
38. ទ្រង់ធំដឹងក្តីដោយគ្មានអំពើបាបឡើយ: added a disambiguator rule to tag ធំ as an adjective in this case (it can also be a noun): PRONOUN + ADJ → tag as ADJ. DONE. Make sure this doesn’t break other cases.
39. និងជាព្រះក្នុងពេលតែមួយ and ពិតក្នុងពេលតែមួយ: disambiguator rule NOUN + តែ + មួយ → tag តែមួយ as an adjective.
40. មានគម្ពីរឬក្បួនអ្វីសម្រាប់អានទន្ទេញ: incorrect PRP tag for អ្វី. REMOVED.
41. ពេញមួយថ្ងៃជារៀងរាល់ថ្ងៃ: រៀងរាល់ is an adjective, but this is the correct order. Added an exception. DONE
42. មនុស្សពីរ។ ក្នុងពេលតែមួយ។ The khan (។, the Khmer full stop) breaks the checker: the first sentence should be flagged but is not, and the second sentence should not be flagged but is.
43. នៅជាមួយទ្រង់: added a disambiguator rule (នៅ) ជា + មួយ + ANYWORD → tag ជាមួយ as a preposition (“with”). Also removed the NN tag for មួយ.
MORE EXAMPLES:
ចង់ផ្សះផ្សារជាមួយភរិយាវិញ
ជាមួយនឹងអត្ថន័យខ្លី
ប៉ូលិសមួយគ្រឿង
អានជាមួយក្រុមអ្នកសិល្បៈនោះ
44. អានជាមួយក្រុមអ្នកសិល្បៈនោះ: added a disambiguator rule CLS + អ្នក + NOUN → tag អ្នក as a noun.
45. ទ្រង់បានប្រទាន and អ្នករាល់គ្នាបានសង្គ្រោះ: added a disambiguator rule បាន + VERB → tag បាន as PAS.
46. ក្មេងមួយចំនួនពុំមាន
MORE EXAMPLES:
មានចំនួន ៣៦០នាក់
ជ្រើសប្រវែងទីពីរចំនួនពីរដែល
I think this error only occurs because មួយ can also be a noun, so we made a disambiguator rule that tags it as a number. DONE
47. នាងបានប្រលែងនឹងប្អូនយ៉ាងទន់ភ្លន់: added a disambiguator rule យ៉ាង + ANYWORD → tag យ៉ាង as an adverb. DONE
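Many of the fixes in the list above are disambiguator rules of the form “in this context, keep only this reading of the word”. The Python sketch below imitates four of them on a list of already tagged tokens: បាន + VERB (item 45), យ៉ាង + any word (item 47), ADJ + ពេល + VB/NN/PRP with no visible space before ពេល (item 28), and PRONOUN + ADJ (item 38). It is a hypothetical model for discussing rules, not LanguageTool's disambiguation file format; the Token class, the space_before flag and the sample tags are assumptions.

```python
# A toy sketch, NOT LanguageTool's disambiguation format. Tag names follow the
# list above (NN, VB, JJ, RB, PRP, PAS); the Token class is hypothetical.
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Token:
    word: str
    tags: Set[str]
    space_before: bool = False  # True if a visible space precedes the word


def keep(token: Token, tag: str) -> None:
    """Replace the token's readings with the single tag `tag`."""
    token.tags = {tag}


def disambiguate(tokens: List[Token]) -> List[Token]:
    """Apply the context rules left to right, changing tokens in place."""
    for i, tok in enumerate(tokens):
        nxt: Optional[Token] = tokens[i + 1] if i + 1 < len(tokens) else None
        after: Optional[Token] = tokens[i + 2] if i + 2 < len(tokens) else None
        if nxt is None:
            continue
        # Item 45: បាន + VERB -> tag បាន as PAS.
        if tok.word == "បាន" and "VB" in nxt.tags:
            keep(tok, "PAS")
        # Item 47: យ៉ាង + any word -> tag យ៉ាង as an adverb.
        if tok.word == "យ៉ាង":
            keep(tok, "RB")
        # Item 28: ADJ + ពេល + (VB | NN | PRP) -> tag ពេល as ADJ,
        # but only when no visible space separates the ADJ and ពេល.
        if ("JJ" in tok.tags and nxt.word == "ពេល" and not nxt.space_before
                and after is not None and after.tags & {"VB", "NN", "PRP"}):
            keep(nxt, "JJ")
        # Item 38: PRONOUN + ADJ -> keep only the adjective reading
        # (a broad rule; the list notes it may break other cases).
        if "PRP" in tok.tags and "JJ" in nxt.tags:
            keep(nxt, "JJ")
    return tokens
```

For example, tagging អ្នក អាច ទៅ ក្រោយ ពេល អ្នក ស្លាប់ with ពេល as {NN, JJ} and running `disambiguate` leaves ពេល with only JJ; if ពេល is created with `space_before=True`, it keeps both tags. Because the rules run left to right and change tokens in place, a later rule sees the tags an earlier rule left behind.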

You can read how to create rules in the XML file here: http://languagetool.wikidot.com/developing-a-tagger-dictionary
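Most of the other fixes in the known-issues list are edits to the tagger dictionary itself: removing one tag from a word (item 1, where របស់ loses NN) or keeping only one (item 20, where សុគត becomes a verb only). The sketch below shows both operations in Python on a simplified, assumed text format with one `word<TAB>TAG` pair per line. LanguageTool's real dictionary format and build process are described on the page linked above; the sample entries here are hypothetical.

```python
# A sketch, assuming a simplified plain-text dictionary with one
# "word<TAB>TAG" pair per line. This is NOT LanguageTool's real dictionary
# format; the sample entries are hypothetical.
from collections import defaultdict
from typing import Dict, Iterable, List, Set

Dictionary = Dict[str, Set[str]]


def load(lines: Iterable[str]) -> Dictionary:
    """Collect every tag listed for each word."""
    entries: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        word, tag = line.split("\t")
        entries[word].add(tag)
    return dict(entries)


def remove_tag(entries: Dictionary, word: str, tag: str) -> None:
    """Drop one reading, e.g. the NN tag of របស់ (item 1)."""
    tags = entries[word]
    if tag not in tags:
        raise KeyError(f"{word} has no {tag} tag")
    if len(tags) == 1:
        raise ValueError(f"removing {tag} would leave {word} untagged")
    tags.remove(tag)


def keep_only(entries: Dictionary, word: str, tag: str) -> None:
    """Keep a single reading, e.g. make សុគត a verb only (item 20)."""
    if tag not in entries[word]:
        raise KeyError(f"{word} has no {tag} tag")
    entries[word] = {tag}


def dump(entries: Dictionary) -> List[str]:
    """Write the dictionary back out, one word-tag pair per line."""
    return [f"{w}\t{t}" for w in sorted(entries) for t in sorted(entries[w])]


SAMPLE = """\
របស់\tNN
របស់\tIN
សុគត\tNN
សុគត\tVB
"""
```

`remove_tag` refuses to remove a word's last tag, on the assumption that an untagged word is rarely what you want; `keep_only` is the explicit way to reduce a word to a single reading.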

And you can view our working list of grammar rules to be added to the grammar checker here: https://sbbic.org/lang/en-us/current-projects/khmer-grammar-checker-rules/

And here are some more test phrases:

ប្រុសក្មេង

ក្រហមឡាន

ខ្ញុំឡាន

ស្រីលោក

ឡាននោះខៀវមិនទេ

ប្រុសចំណាស់

ខ្ញុំមានមិត្តពីនាក់
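Several of the test phrases (ក្រហមឡាន, ខ្ញុំឡាន) exercise the two word-order rules that many of the known issues revolve around: an adjective should follow its noun (ឡានក្រហម), and a possessive pronoun should follow the noun it possesses (ឡានខ្ញុំ). A minimal Python sketch of both checks follows, with the gender-word exception from item 19 and our reading of the no-visible-space note in item 28. The token format, the tag sets and the gender-word list are assumptions for illustration, not the rules as they exist in the SBBIC extension.

```python
# A toy sketch of two word-order checks, NOT LanguageTool's grammar rules.
# Tokens are (word, tags, space_before); tags and word lists are assumptions.
from typing import List, Set, Tuple

Token = Tuple[str, Set[str], bool]

# Hypothetical gender-word list for the exception in item 19.
GENDER_WORDS: Set[str] = {"ប្រុស", "ស្រី"}


def adjective_before_noun(tokens: List[Token]) -> List[int]:
    """Indexes of a possible adjective directly followed by a possible noun.

    Khmer adjectives normally follow the noun (ឡានក្រហម, not ក្រហមឡាន).
    """
    hits = []
    for i in range(len(tokens) - 1):
        _, tags, _ = tokens[i]
        nword, ntags, nspace = tokens[i + 1]
        if ("JJ" in tags and "NN" in ntags
                and not nspace                   # no visible space (item 28)
                and nword not in GENDER_WORDS):  # e.g. ក្មេងប្រុស (item 19)
            hits.append(i)
    return hits


def pronoun_before_noun(tokens: List[Token]) -> List[int]:
    """Indexes of a pronoun directly followed by a possible noun.

    A possessor normally follows the noun (ឡានខ្ញុំ, not ខ្ញុំឡាន).
    """
    hits = []
    for i in range(len(tokens) - 1):
        _, tags, _ = tokens[i]
        _, ntags, _ = tokens[i + 1]
        if "PRP" in tags and "NN" in ntags:
            hits.append(i)
    return hits
```

Both checks fire whenever the next word merely *can* be a noun, which is why stray NN tags caused so many false positives: ទ្រង់ followed by គឺ is flagged while គឺ still carries NN, and is no longer flagged once គឺ is a verb only (item 18).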

9 Comments

  • There is an error in the download link.

  • Thanks Laura, it should be working now.

  • កូនខ្មែរ
    June 8, 2011 8:45 am

    This video cannot be viewed. Please tell me how to install the program and what to do to be able to use it. Thank you!

  • The demo movie is not very clear. It seems that you used “ប្រុសក្មេង” and said that it is wrong. Please check the following example: អ្នកស្រីខ្ញុំ គាត់ចូលចិត្តតែប្រុសក្មេងទេ។ ឯងស្រលាញ់ទាហានស្រីគឺមិនអីទេ តែប្រសិនបើឯងស្រលាញ់ប៉ះស្រីទាហានវិញ នេះឯងពិតជាមានបញ្ហាធំហើយ។

    • Hello Sovann,
      Yes, you are right. This movie is actually of the first version; the updated version has this issue fixed. I have changed the video to a newer version. Please let us know if you find any problems so we can correct them.
      Also, you can join the SBBIC page on Facebook for more videos: https://www.facebook.com/sbbic
      Thank you!

  • I can’t download it. Can you tell me how to download it? I want to use it because I think it is very good.

  • What about a grammar checker for smartphones?
