Cybercrime and Authorship Detection in Very Short Texts: A Quantitative Morpho-Lexical Approach
Source: | مجلة البحث العلمي في الآداب (Journal of Scientific Research in Arts) |
---|---|
Publisher: | Ain Shams University - Faculty of Women for Arts, Science and Education |
Main author: | |
Volume/Issue: | No. 20, Part 1 |
Peer-reviewed: | Yes |
Country: | Egypt |
Date (Gregorian): | 2019 |
Pages: | 291 - 316 |
DOI: | 10.21608/JSSA.2019.38725 |
ISSN: | 2356-8321 |
MD number: | 978060 |
Content type: | Research and articles |
Language: | English |
Databases: | AraBase |
Subjects: | |
Abstract: | The present study proposes an integrated framework that combines letter-pair frequencies and combinations with the lexical features of documents. Drawing on a quantitative morpho-lexical approach, the study tests the hypothesis that letter information, or letter mapping, carries unique stylistic features, and that detecting stable word combinations and morphological patterns can therefore improve authorship attribution performance on very short texts. The data analyzed are a corpus of 12,240 tweets drawn from 87 Twitter accounts. A self-organizing map (SOM) model is used to group input patterns that share common features, as an indication that tweets grouped under one class were written by the same author. Results indicate that the classification accuracy of the proposed system is around 76%. Up to 22% of this accuracy was lost, however, when only distinctive words were used, and 26% was lost when classification was based on letter combinations and morphological patterns only. Integrating letter pairs and morphological patterns improved the accuracy of determining the author of a given tweet. This indicates that combining different linguistic variables in a single system leads to better classification performance on very short texts. The self-organizing map (SOM) also led to better clustering performance because of its capacity to integrate two different linguistic levels of each author profile. |
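The pipeline summarized in the abstract (letter-pair frequencies combined with lexical features, then grouped by a self-organizing map) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not specify the tokenization, the normalization, the SOM grid size, or the training schedule, and the morphological-pattern features are omitted entirely. The function names `letter_pair_counts`, `word_counts`, `build_vectors`, `best_unit`, and `train_som` are hypothetical and do not come from the paper.

```python
import math
import random
import re
from collections import Counter


def tokens(text):
    """Lowercase alphabetic tokens (an assumed, simple tokenization)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def letter_pair_counts(text):
    """Count adjacent letter pairs inside each token."""
    pairs = Counter()
    for word in tokens(text):
        pairs.update(a + b for a, b in zip(word, word[1:]))
    return pairs


def word_counts(text):
    """Count whole-word occurrences (the lexical level)."""
    return Counter(tokens(text))


def build_vectors(texts):
    """One vector per text: letter-pair frequencies followed by word frequencies."""
    pair_c = [letter_pair_counts(t) for t in texts]
    word_c = [word_counts(t) for t in texts]
    pair_vocab = sorted(set().union(*pair_c))
    word_vocab = sorted(set().union(*word_c))
    vectors = []
    for pc, wc in zip(pair_c, word_c):
        p_total = sum(pc.values()) or 1  # avoid division by zero on empty texts
        w_total = sum(wc.values()) or 1
        vectors.append([pc[p] / p_total for p in pair_vocab]
                       + [wc[w] / w_total for w in word_vocab])
    return vectors


def best_unit(weights, vector):
    """Grid coordinates of the unit whose weights are closest to the vector."""
    return min(weights,
               key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], vector)))


def train_som(vectors, rows=2, cols=2, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM with a Gaussian neighborhood."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                   # learning rate decays linearly
        sigma = max(sigma0 * (1 - frac), 0.1)   # neighborhood radius shrinks
        for v in rng.sample(vectors, len(vectors)):
            bmu = best_unit(weights, v)
            for unit, w in weights.items():
                d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
    return weights


# Toy usage with made-up tweets; real input would be the 12,240-tweet corpus.
tweets = ["good morning world", "good morning world", "late night coding again"]
vecs = build_vectors(tweets)
som = train_som(vecs)
print([best_unit(som, v) for v in vecs])
```

In this sketch, texts that land on the same map unit are treated as candidates for shared authorship, which mirrors the grouping idea in the abstract. Concatenating the two feature families into one vector is only one possible way to integrate the linguistic levels, and the study may weight or combine them differently.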