Advances in Human-Computer Interaction

Review Article

A Comparative Study of Some Automatic Arabic Text Diacritization Systems

Table 6

Examples of common mistakes made by the HMM-based systems

Predicted	Correct	Description

لَمْ أَرَهُ ضَحِكَ ضُحَكَا أَكْثَرُ مِنْهُ	لَمْ أَرَهُ ضَحِكَ ضَحِكًا أَكْثَرَ مِنْهُ	(i) “ضُحَكَا” /Duhaka/ is an error made by the letter-based component. It should be “ضَحِكًا” /dahikan/.
لَمْ أَرَهُ ضَحِكَ ضُحَكَا أَكْثَرُ مِنْهُ	لَمْ أَرَهُ ضَحِكَ ضَحِكًا أَكْثَرَ مِنْهُ	(ii) “أَكْثَرُ” /Aktharu/ has a case-ending error. It should be “أَكْثَرَ” /Akthara/.
مَاذَا تَصُـــنٍــعَانٍ	مَاذَا تَـصْــــنَـــعَانِ	(i) “تَصُـــنٍــعَانٍ” /Tasunin’anin/ has a case-ending error: since it is a verb, it should not end with nunation. It also has a morphological error: nunation cannot occur in the middle of a word. This error was generated by the letter-based HMM. The correct form is “تَـصْــــنَـــعَانِ” /tasna’ani/.
فَلَا تُــــخْـــلَـــفُ فِــــيِــــه	فَلَا تَـــخَــلُّـــفَ فِـــيــهِ	(i) “فِــــيِــــه” /Feyeh/ has both a case-ending error and an internal error. The correct form is “فِـــيــهِ” /Fihi/.
فَلَا تُــــخْـــلَـــفُ فِــــيِــــه	فَلَا تَـــخَــلُّـــفَ فِـــيــهِ	(ii) In “تُــــخْـــلَـــفُ” /Tukhlafu/ the entire diacritization is wrong. The correct form is “تَـــخَــلُّـــفَ” /Takhallufa/. Here, when the letter-based component diacritized the unseen word “تــخـــلــــف”, it also changed the diacritics of letters in the words surrounding it. The word “فـــيـــه” /Fih/ is correctly diacritized in other places where no out-of-vocabulary (OOV) word is present.
نٍــقْــــبَـــــلَــــــهُ	نَــــقْــــــبَـــــلُـــــهُ	In “نٍــقْــــبَـــــلَــــــهُ” /Nin-qbalahu/, nunation appears on the first letter, but nunation can occur neither at the beginning nor in the middle of a word. The internal diacritization also contains errors. The correct form is “نَــــقْــــــبَـــــلُـــــهُ” /naqbaluhu/.
قَوْلُ مَرْغُوبٍ عَنْهُ	قَوْلٌ مَرْغُوبٌ عَنْهُ	“مَرْغُوبٍ” /marghubin/ has a case-ending error here. It should be “مَرْغُوبٌ” /marghubun/.
يَا ابْــنُ أَخِي	يَا ابْــــنَ أَخِي	“ابْــنُ” /Bnu/ has a case-ending error here. It should be “ابْــــنَ” /Bna/. This is a context-related error: as a vocative noun in a construct phrase following “يَا”, the word takes the accusative case.
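Two checks implied by the descriptions above can be sketched in Python: nunation (tanween) may only mark the end of a word, and a diacritic error is either a case-ending error or an internal one. This is a minimal illustration under stated assumptions, not the method of any of the compared systems. The helper names (`segment`, `case_ending_index`, `misplaced_nunation`, `classify_errors`) are hypothetical. Words are compared letter by letter after dropping tatweel (kashida). The case-ending position is approximated as the last letter, or as the letter before a bare word-final alif or alif maqsura, where fathatan is conventionally written. Distinguishing verbs from nouns (as needed for the verb-final nunation in “تَصُـــنٍــعَانٍ”) would require a morphological analyzer and is not modeled.

```python
# Hypothetical helpers (not taken from any surveyed system) for checking the
# error types discussed in Table 6.
FATHATAN, DAMMATAN, KASRATAN = "\u064B", "\u064C", "\u064D"
TANWEEN = {FATHATAN, DAMMATAN, KASRATAN}  # nunation marks
# fatha, damma, kasra, shadda, sukun, plus the three nunation marks
DIACRITICS = TANWEEN | {"\u064E", "\u064F", "\u0650", "\u0651", "\u0652"}
TATWEEL = "\u0640"  # kashida: stretches a letter for display, carries no vowel
FINAL_ALIF = {"\u0627", "\u0649"}  # alif and alif maqsura


def segment(word):
    """Split a diacritized word into (letter, diacritics) pairs, dropping tatweel."""
    units = []
    for ch in word:
        if ch == TATWEEL:
            continue
        if ch in DIACRITICS:
            if units:  # a diacritic belongs to the preceding letter
                units[-1][1] += ch
            continue
        units.append([ch, ""])
    # Sort each letter's marks so that shadda+fatha equals fatha+shadda.
    return [(letter, "".join(sorted(marks))) for letter, marks in units]


def case_ending_index(units):
    """Heuristic case-ending position: the last letter, or the one before a
    bare word-final alif (fathatan is written there, as in ضَحِكًا)."""
    last = len(units) - 1
    if last > 0 and units[last][0] in FINAL_ALIF and not units[last][1]:
        return last - 1
    return last


def misplaced_nunation(word):
    """Indices of letters carrying nunation anywhere except the case-ending slot."""
    units = segment(word)
    allowed = case_ending_index(units)
    return [i for i, (_, marks) in enumerate(units)
            if TANWEEN & set(marks) and i != allowed]


def classify_errors(predicted, correct):
    """Label each letter whose diacritics differ as 'case-ending' or 'internal'."""
    pred, gold = segment(predicted), segment(correct)
    if [p for p, _ in pred] != [g for g, _ in gold]:
        raise ValueError("words must have the same undiacritized letters")
    case_idx = case_ending_index(gold)
    return [(i, "case-ending" if i == case_idx else "internal")
            for i, ((_, p), (_, g)) in enumerate(zip(pred, gold)) if p != g]


# Examples taken from Table 6 (tatweel is ignored by segment()).
print(misplaced_nunation("تَصُـــنٍــعَانٍ"))  # [2]: nunation on the inner nun
print(misplaced_nunation("ضَحِكًا"))  # []: fathatan before final alif is allowed
print(classify_errors("أَكْثَرُ", "أَكْثَرَ"))  # [(3, 'case-ending')]
print(classify_errors("فِــــيِــــه", "فِـــيــهِ"))  # [(1, 'internal'), (2, 'case-ending')]
```

Sorting each letter's marks makes the comparison insensitive to the order in which a shadda and a short vowel are encoded, which varies between corpora.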