Arabic Text Classification Using Dynamic N-Gram

Bibliographic details
Title in another language: تصنيف النصوص العربية باستخدام الانغرام المتغير
Publisher: Mafraq
Main author: Al Omoush, Safaa Qasim (Author)
Other authors: Samawi, Venus W. (Advisor)
Gregorian date: 2013
Pages: 1 - 52
MD number: 819023
Content type: Theses
Language: English
Databases: Dissertations
Degree: Master's thesis
University: Al al-Bayt University
College: Prince Hussein Bin Abdullah College for Information Technology
Subjects:
Content link:
LEADER 03798nam a22003257a 4500
001 1469699
041 |a eng 
100 |9 438843  |a Al Omoush, Safaa Qasim  |e Author 
245 |a Arabic Text Classification Using Dynamic N-Gram 
246 |a تصنيف النصوص العربية باستخدام الانغرام المتغير 
260 |a Mafraq  |c 2013 
300 |a 1 - 52 
336 |a Theses 
502 |c Al al-Bayt University  |f Prince Hussein Bin Abdullah College for Information Technology  |g Jordan  |o 0078  |b Master's thesis 
520 |a An N-gram is a subsequence of N items from a given sequence. N-grams are a good fit for noisy text, so we use them to represent text documents. In the literature, the term sometimes refers to sequences that are not ordered or consecutive; in this thesis, an N-gram is a chain of N consecutive characters. A few studies have used a static value of N for Arabic text classification and information retrieval. With static N-grams, the text is segmented into N-grams of the same length (the value of N), such as 3, 4, or 5. The problem with this representation is that any word or stem with fewer than N characters is discarded as useless. For example, if N=4, every word with fewer than 4 letters is discarded. Our work develops an automated system for classifying Arabic text documents using N-grams as the text representation. We propose dynamic N-grams, where N is determined dynamically (based on word length) to reduce the number of common grams that may belong to entirely different words. To study whether dynamic N-grams improve classification accuracy, we built both a traditional static N-gram system and the proposed dynamic N-gram system. The results of the two systems are compared in terms of accuracy, recall, precision, and F-measure. The F-measure is a standard statistical measure of classifier performance that combines precision and recall into a single average. The proposed system consists of several phases: document preprocessing, document feature extraction, construction of the classifier, and document classification. We constructed two classifiers: a Naïve Bayes (NB) classifier and a Dice-measure distance classifier. 
Finally, in the classification phase, we evaluated the performance of the proposed system on the Diab dataset and calculated the standard evaluation measures mentioned above. The classification results were promising (F-measure = 98.87% with the Dice-measure classifier). We also found that the Dice-measure classifier performs better when dynamic N-grams are used. 
653 |a Text classification  |a Arabic text classification  |a Dynamic N-gram  |a Computer science  |a Information technology 
700 |9 46739  |a Samawi, Venus W.  |e Advisor 
856 |u 9802-005-012-0078-T.pdf  |y Title page 
856 |u 9802-005-012-0078-A.pdf  |y Abstract 
856 |u 9802-005-012-0078-C.pdf  |y Table of contents 
856 |u 9802-005-012-0078-F.pdf  |y First 24 pages 
856 |u 9802-005-012-0078-1.pdf  |y Chapter 1 
856 |u 9802-005-012-0078-2.pdf  |y Chapter 2 
856 |u 9802-005-012-0078-3.pdf  |y Chapter 3 
856 |u 9802-005-012-0078-4.pdf  |y Chapter 4 
856 |u 9802-005-012-0078-O.pdf  |y Conclusion 
856 |u 9802-005-012-0078-R.pdf  |y References 
930 |d y 
995 |a Dissertations 
999 |c 819023  |d 819023
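
Note on the abstract: the contrast between static and dynamic N-grams can be illustrated with a minimal Python sketch. It is not the thesis implementation. The abstract only states that N is chosen from the word length, so the rule used here, N = min(word length, max_n), is an assumption, and the function names (static_ngrams, dynamic_ngrams, profile, dice, f_measure) are hypothetical. The Dice coefficient and the balanced F-measure follow their standard definitions; whether the thesis uses exactly these forms is not stated in the record.

```python
def static_ngrams(word, n):
    """Return every run of n consecutive characters in word (n >= 1).

    A word shorter than n yields an empty list, which is the loss
    the abstract describes for static N-grams.
    """
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def dynamic_ngrams(word, max_n=4):
    """ASSUMPTION: N = min(len(word), max_n).

    The abstract only says N depends on word length; this particular
    rule is illustrative, not the author's.
    """
    n = min(len(word), max_n)
    return static_ngrams(word, n) if n > 0 else []


def profile(text, extract):
    """Set of grams produced by applying extract to each whitespace token."""
    return {gram for token in text.split() for gram in extract(token)}


def dice(a, b):
    """Dice coefficient of two gram sets: 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Transliterated tokens keep the example ASCII-only.
print(static_ngrams("min", 4))    # [] : the short word is lost
print(dynamic_ngrams("min", 4))   # ['min'] : the short word is kept
print(static_ngrams("kitab", 4))  # ['kita', 'itab']
```

Under these assumptions, a document could be assigned to the class whose gram profile has the highest Dice score against the document's own profile. Python strings are sequences of Unicode code points, so the same functions should apply to Arabic tokens, provided diacritics are removed during preprocessing.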