Classification of Document Similarity Using Winnowing Algorithm with Jaccard Coefficient Approach

Authors

  • Uzdha Zachrias, Universitas Mercu Buana
  • Wawan Gunawan, Universitas Mercu Buana

Keywords:

Jaccard Coefficient, Plagiarism, Similarity, Stopword, Winnowing

Abstract

In an era of rapid advances in information technology, easy access to information sources on the internet has changed how students conduct research. This convenience brings significant benefits, but it also makes plagiarism easier, which harms academic work. Plagiarism is the act of copying or taking ideas from someone else's work without giving proper credit, contrary to academic guidelines. This research aims to develop an effective plagiarism detection system suited to the Indonesian language. The system uses the Winnowing algorithm with a Jaccard Coefficient approach, combined with the removal of non-descriptive words (stopwords) in Indonesian. Sample documents in Indonesian were taken from final projects of Mercu Buana University students, collected from the university's repository, and analyzed to measure the level of similarity between documents and the performance of the Winnowing algorithm in detecting plagiarism. The results show that the system, using the Winnowing algorithm and the Jaccard Coefficient approach with an n-gram value of 7, achieved its best results, with precision, recall, and accuracy each reaching 100%. The similarity detection system is able to provide accurate and relevant results on Indonesian documents.
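
The pipeline described in the abstract (stopword removal, n-gram fingerprinting with Winnowing, then a Jaccard Coefficient over the fingerprints) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the short stopword list, the MD5-based hash, and the window size of 4 are assumptions chosen for the example; only the n-gram value of 7 comes from the abstract.

```python
import hashlib
import re

# Assumption: a tiny illustrative stopword list. A real system would
# use a complete Indonesian stopword list.
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu", "untuk", "dengan", "pada"}


def preprocess(text):
    """Lowercase, drop stopwords, and keep only letters and digits."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "".join(w for w in words if w not in STOPWORDS)


def kgram_hashes(text, k=7):
    """Hash every k-character substring (k=7 mirrors the abstract)."""
    # Assumption: MD5 truncated to 32 bits stands in for a rolling hash.
    return [
        int(hashlib.md5(text[i:i + k].encode("utf-8")).hexdigest()[:8], 16)
        for i in range(len(text) - k + 1)
    ]


def winnow(hashes, window=4):
    """Keep the minimum hash value of each sliding window as a fingerprint."""
    if not hashes:
        return set()
    if len(hashes) <= window:
        return {min(hashes)}
    return {min(hashes[i:i + window]) for i in range(len(hashes) - window + 1)}


def jaccard(a, b):
    """Jaccard Coefficient |A ∩ B| / |A ∪ B|; defined here as 0.0 if both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(doc_a, doc_b, k=7, window=4):
    """Similarity between two documents based on their winnowed fingerprints."""
    fa = winnow(kgram_hashes(preprocess(doc_a), k), window)
    fb = winnow(kgram_hashes(preprocess(doc_b), k), window)
    return jaccard(fa, fb)
```

Calling `similarity(text_a, text_b)` returns a value between 0 and 1. Because fingerprints are compared as sets, two documents that differ only in stopwords, casing, or punctuation score 1.0 under this sketch.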

Published

2025-03-27

How to Cite

Classification of Document Similarity Using Winnowing Algorithm with Jaccard Coefficient Approach. (2025). Science of Information & Technology Applied, 1(1). https://ejournal.bacadulu.net/index.php/sinta/article/view/87