A Comparative Study of Some Automatic Arabic Text Diacritization Systems
Table 6
Examples of common mistakes made by the HMM-based systems
Predicted
Correct
Description
لَمْ أَرَهُ ضَحِكَ ضُحَكَا أَكْثَرُ مِنْهُ
لَمْ أَرَهُ ضَحِكَ ضَحِكًا أَكْثَرَ مِنْهُ
(i) “ضُحَكَا” /Duhaka/ is an error made by the letter-based component. It should be “ضَحِكًا” /dahikan/.
(ii) “أَكْثَرُ” /Aktharu/ is a case-ending error. It should be “أَكْثَرَ” /Akthara/.
مَاذَا تَصُـــنٍــعَانٍ
مَاذَا تَـصْــــنَـــعَانِ
“تَصُـــنٍــعَانٍ” /Tasunin’anin/ has a case-ending error: it is a verb, so it should not end with nunation. It also has a morphological error: nunation cannot occur in the middle of a word. This error is generated by the letter-based HMM. The correct form is “تَـصْــــنَـــعَانِ” /tasna’ani/.
فَلَا تُــــخْـــلَـــفُ فِــــيِــــه
فَلَا تَـــخَــلُّـــفَ فِـــيــهِ
(i) “فِــــيِــــه” /Feyeh/ has both a case-ending error and an internal error. The correct form is “فِـــيــهِ” /Fihi/.
(ii) In “تُــــخْـــلَـــفُ” /Tukhlafu/ the whole structure is wrong. The correct form is “تَـــخَــلُّـــفَ” /Takhallufa/. When the letter-based component of the system diacritized the unseen word “تــخـــلــــف”, it also altered some letters of the words surrounding it. The word “فـــيـــه” /Fih/ is diacritized correctly in other places, where no out-of-vocabulary (OOV) word is present.
نٍــقْــــبَـــــلَــــــهُ
نَــــقْــــــبَـــــلُـــــهُ
In “نٍــقْــــبَـــــلَــــــهُ” /Nin-qbalahu/, nunation cannot occur at the beginning or in the middle of a word. The internal diacritization is entirely wrong. The correct form is “نَــــقْــــــبَـــــلُـــــهُ” /naqbaluhu/.
قَوْلُ مَرْغُوبٍ عَنْهُ
قَوْلٌ مَرْغُوبٌ عَنْهُ
“مَرْغُوبٍ” /marghubin/ has a case-ending error here. It should be “مَرْغُوبٌ” /marghubun/.
يَا ابْــنُ أَخِي
يَا ابْــــنَ أَخِي
“ابْــنُ” /Bnu/ has a case-ending error here. It should be “ابْــــنَ” /Bna/. It is a context-related error.
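Several rows of the table rest on one positional rule: nunation cannot appear at the beginning or in the middle of a word. The following is a minimal sketch of how such a check could be written. It is not part of the systems compared in this study, and the function names are hypothetical. It assumes that nunation is valid only on the last letter, or on the letter before a bare final alif or alif maqsura (as in “ضَحِكًا”), and it ignores the elongation character (tatweel). It tests position only, so it does not detect that a verb should carry no nunation at all.

```python
# Arabic nunation (tanween) marks: fathatan, dammatan, kasratan.
TANWEEN = {"\u064B", "\u064C", "\u064D"}
# Short vowels, tanween, shadda and sukun: U+064B..U+0652.
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}
TATWEEL = "\u0640"                    # elongation character, carries no sound
ALIF_LIKE = {"\u0627", "\u0649"}      # alif, alif maqsura


def clusters(word):
    """Split a word into [base_letter, marks] pairs, ignoring tatweel."""
    out = []
    for ch in word:
        if ch == TATWEEL:
            continue
        if ch in DIACRITICS:
            if out:                   # attach the mark to the preceding letter
                out[-1][1] += ch
        else:
            out.append([ch, ""])
    return out


def misplaced_nunation(word):
    """Return 0-based letter positions that carry nunation where the
    assumed rule does not allow it."""
    cl = clusters(word)
    last = len(cl) - 1
    bad = []
    for i, (_, marks) in enumerate(cl):
        if not any(m in TANWEEN for m in marks):
            continue
        on_last = i == last
        # Assumption: fathatan written on the letter before a bare final alif.
        before_final_alif = (
            i == last - 1 and cl[last][0] in ALIF_LIKE and cl[last][1] == ""
        )
        if not (on_last or before_final_alif):
            bad.append(i)
    return bad
```

Applied to the table, a check of this kind would flag the first letter of the predicted “نٍــقْــــبَـــــلَــــــهُ” and the middle nunation of “تَصُـــنٍــعَانٍ”, while accepting “ضَحِكًا” and “مَرْغُوبٌ”. Case-ending errors such as “أَكْثَرُ” depend on syntactic context and are outside its scope.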