A Comparative Study of Some Automatic Arabic Text Diacritization Systems
Table 7
Common mistakes made by the deep-network-based systems.
The predicted
The correct
Description
الوَلَدُ الوَاحِــــدِ
الْوَلَدُ الْـوَاحِــــــدُ
(i) The word “الوَاحِــــدِ” /Alwahidi/ has a case-ending error. The correct form is “الْـوَاحِــــــدُ” /Alwahidu/.
لِأَنَّهُ تَــــمْـــــلِــــــكْ
لِأَنَّهُ تَـــمَــــلَّكَ
(i) The form “تَــــمْـــــلِــــــكْ” /Tamlik/ is wrong in both its internal diacritics and its case ending. The correct form is “تَـــمَـــــلَّكَ” /Tamallaka/.
ضَحِكَ ضَـــحِّـــــكَا أَكْـثَـــــرُ
ضَحِكَ ضَــحِــكًــا أَكْـثَــرَ
(i) The word “ضَـــحِّـــــكَا” /Dahhikan/ has a case-ending error because nunation is required on the letter kaf “كـ”. It also has an internal error: the gemination should not appear. The correct form is “ضَــحِــكًــا” /Dahikan/.
(ii) The word “أَكْـثَـــــرُ” /Aktharu/ has a case-ending error. In this context, the correct form is “أَكْـثَــرَ” /Akthara/.
تَـحْـــرِجُــــتْ
تَحَرَّجْتَ
(i) The whole form of the word “تَـحْـــرِجُــــتْ” /Tahrijut/ is incorrect. The correct form is “تَحَرَّجْتَ” /Taharrajta/.
مَاذَا تَــــصِــنِـــــعَانِ
مَاذَا تَصْــــــنَــعَـــانِ
(i) The word “تَــــصِــنِـــــعَانِ” /Tassini’ani/ has an error in its internal diacritics. The correct form is “تَـصْــــنَـــعَانِ” /Tasna’ani/.
أَعْلَمُ أَنّــــهُــــــمْ
أَعْلَمُ أَنَّــــهُـــــــمْ
(i) The word “أَنّــــهُــــــمْ” /Ann-hum/ has a wrong internal diacritic over the letter “نـ”: the gemination should be accompanied by the short vowel fatha. The correct form is “أَنَّــــهُـــــــمْ” /Annahum/.
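The distinction the table draws between case-ending errors and internal-diacritic errors can be illustrated with a small sketch. It is not the evaluation code used in the study. The function names `split_diacritics` and `classify_errors` are hypothetical, and the rule for locating the case ending (the last letter, or the letter before a word-final bare alif) is a simplifying assumption that will misjudge some words. Words are written as Unicode escapes so that the exact diacritics are unambiguous.

```python
TATWEEL = "\u0640"  # elongation character, carries no linguistic content
ALIF = "\u0627"
# Arabic diacritics U+064B..U+0652: nunation, short vowels, shadda, sukun
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def split_diacritics(word):
    """Return (letter, set-of-diacritics) pairs, ignoring tatweel."""
    pairs = []
    for ch in word:
        if ch == TATWEEL:
            continue
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding letter
                letter, marks = pairs[-1]
                pairs[-1] = (letter, marks | {ch})
        else:
            pairs.append((ch, frozenset()))
    return pairs


def classify_errors(predicted, correct):
    """Return a sorted list drawn from 'case-ending' and 'internal'."""
    pred = split_diacritics(predicted)
    gold = split_diacritics(correct)
    if [l for l, _ in pred] != [l for l, _ in gold]:
        raise ValueError("words differ in their base letters")
    # Assumption: the case ending sits on the last letter, or on the
    # letter before a word-final alif that carries no diacritic.
    end = len(gold) - 1
    if end > 0 and gold[end] == (ALIF, frozenset()):
        end -= 1
    errors = set()
    for i, ((_, p), (_, g)) in enumerate(zip(pred, gold)):
        if p != g:
            errors.add("case-ending" if i == end else "internal")
    return sorted(errors)


# Annahum: the predicted form has shadda without fatha on the noon
print(classify_errors("\u0623\u064E\u0646\u0651\u0647\u064F\u0645\u0652",
                      "\u0623\u064E\u0646\u0651\u064E\u0647\u064F\u0645\u0652"))
# -> ['internal']

# Dahikan: spurious shadda on the hah, missing nunation on the kaf
print(classify_errors("\u0636\u064E\u062D\u0651\u0650\u0643\u064E\u0627",
                      "\u0636\u064E\u062D\u0650\u0643\u064B\u0627"))
# -> ['case-ending', 'internal']
```

Comparing diacritics as sets per letter makes the check insensitive to the order in which shadda and a short vowel are encoded. A missing sukun counts as a mismatch under this sketch, which may be stricter than the judgments reported in the table.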