Review Article

A Comparative Study of Some Automatic Arabic Text Diacritization Systems

Table 6

Examples of some common mistakes made by the HMM-based systems

Common mistakes made by the HMM-based systems
| Predicted | Correct | Description |
| --- | --- | --- |
| لَمْ أَرَهُ ضَحِكَ ضُحَكَا أَكْثَرُ مِنْهُ | لَمْ أَرَهُ ضَحِكَ ضَحِكًا أَكْثَرَ مِنْهُ | (i) “ضُحَكَا” /Duhaka/ is an error made by the letter-based component. It should be “ضَحِكًا” /dahikan/. (ii) “أَكْثَرُ” /Aktharu/ is a case-ending error. It should be “أَكْثَرَ” /Akthara/. |
| مَاذَا تَصُـــنٍــعَانٍ | مَاذَا تَـصْــــنَـــعَانِ | “تَصُـــنٍــعَانٍ” /Tasunin’anin/ has a case-ending error: since it is a verb, it should not carry nunation at the end. It also has a morphological error: nunation cannot occur in the middle of a word. This error is generated by the letter-based HMM. The correct form is “تَـصْــــنَـــعَانِ” /tasna’ani/. |
| فَلَا تُــــخْـــلَـــفُ فِــــيِــــه | فَلَا تَـــخَــلُّـــفَ فِـــيــهِ | (i) “فِــــيِــــه” /Feyeh/ has both a case-ending error and an internal error. The correct form is “فِـــيــهِ” /Fihi/. (ii) In “تُــــخْـــلَـــفُ” /Tukhlafu/, the entire structure is wrong. The correct form is “تَـــخَــلُّـــفَ” /Takhallufa/. Here, when the letter-based component diacritized the unseen word “تــخـــلــــف”, it also modified some letters of the words surrounding it. The word “فـــيـــه” /Fih/ is diacritized correctly in other places where there is no OOV word. |
| نٍــقْــــبَـــــلَــــــهُ | نَــــقْــــــبَـــــلُـــــهُ | In “نٍــقْــــبَـــــلَــــــهُ” /Nin-qbalahu/, nunation cannot occur at the beginning or in the middle of a word. The internal diacritization is entirely wrong. The correct form is “نَــــقْــــــبَـــــلُـــــهُ” /naqbaluhu/. |
| قَوْلُ مَرْغُوبٍ عَنْهُ | قَوْلٌ مَرْغُوبٌ عَنْهُ | “مَرْغُوبٍ” /marghubin/ has a case-ending error. It should be “مَرْغُوبٌ” /marghubun/. |
| يَا ابْــنُ أَخِي | يَا ابْــــنَ أَخِي | “ابْــنُ” /Bnu/ has a case-ending error. It should be “ابْــــنَ” /Bna/. It is a context-related error. |
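
Two of the observations in the table lend themselves to a mechanical check: nunation (tanwin) cannot occur at the beginning or in the middle of a word, and an error can be labeled as case-ending or internal by comparing the predicted word with the correct one letter by letter. The following is a minimal Python sketch of both checks, not the evaluation code of any system reviewed here. The function names are hypothetical. It assumes that the marks on the last letter represent the case ending (a simplification that ignores, for example, attached pronouns), that fathatan written before a final bare alif counts as word-final, and that the tatweel character is used only for display and can be ignored.

```python
# Arabic tanwin (nunation) marks: fathatan, dammatan, kasratan.
TANWIN = {"\u064B", "\u064C", "\u064D"}
# Diacritics U+064B..U+0652: tanwin, fatha, damma, kasra, shadda, sukun.
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}
TATWEEL = "\u0640"  # elongation character, assumed to be display-only
BARE_ALIF = {("\u0627", ""), ("\u0649", "")}  # alif / alif maqsura without marks


def split_units(word):
    """Group a word into (letter, diacritics) pairs, skipping tatweel."""
    units = []
    for ch in word:
        if ch == TATWEEL:
            continue
        if ch in DIACRITICS:
            if units:  # attach the mark to the preceding letter
                units[-1][1] += ch
        else:
            units.append([ch, ""])
    return [(letter, marks) for letter, marks in units]


def misplaced_tanwin(word):
    """Return indices of letters that carry tanwin in a non-final position."""
    units = split_units(word)
    last = len(units) - 1
    # Assumption: fathatan on the letter before a final bare alif is word-final.
    if last >= 1 and units[last] in BARE_ALIF and "\u064B" in units[last - 1][1]:
        last -= 1
    return [i for i, (_, marks) in enumerate(units)
            if i != last and TANWIN & set(marks)]


def classify_errors(predicted, reference):
    """Label diacritic mismatches as 'case-ending' (last letter) or 'internal'."""
    p, r = split_units(predicted), split_units(reference)
    if [letter for letter, _ in p] != [letter for letter, _ in r]:
        raise ValueError("words differ in their base letters")
    kinds = set()
    for i, ((_, p_marks), (_, r_marks)) in enumerate(zip(p, r)):
        if set(p_marks) != set(r_marks):  # compare marks regardless of order
            kinds.add("case-ending" if i == len(p) - 1 else "internal")
    return kinds
```

Under these assumptions, `misplaced_tanwin` flags the first letter of the predicted word in the fourth row and accepts the correct “ضَحِكًا” in the first row, while `classify_errors` applied to the predicted and correct forms of “فيه” in the third row reports both an internal and a case-ending error, in line with the table's description.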