Transformer-based abstractive Indonesian text summarization

Authors

Keywords:

Abstractive summarization, BART model, ChatGPT augmentation, IndoSum, Liputan6

Abstract

The volume of data created, captured, copied, and consumed worldwide has increased from 2 zettabytes in 2010 to over 97 zettabytes in 2020, with an estimated 181 zettabytes in 2025. Automatic text summarization (ATS) eases the extraction of key points of information and reduces the time needed to understand it. Therefore, improving ATS performance in summarizing news articles is the goal of this paper. This work fine-tunes the BART model using the IndoSum, Liputan6, and augmented Liputan6 datasets for abstractive summarization. The Liputan6 data are augmented with the ChatGPT method. This work also uses Recall-Oriented Understudy for Gisting Evaluation (ROUGE) as the evaluation metric. The data augmentation with ChatGPT used 10% of the clean news articles from the Liputan6 training dataset, and ChatGPT generated abstractive summaries based on that input, culminating in over 36 thousand examples for the model's fine-tuning. The BART model fine-tuned using the IndoSum, Liputan6, and augmented Liputan6 datasets has the best ROUGE-2 score, outperforming the ORACLE model, although ORACLE still has the best ROUGE-1 and ROUGE-L scores. This shows that fine-tuning the BART model with multiple datasets increases its performance on abstractive summarization tasks.
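The fine-tuning and ROUGE evaluation pipeline described in the abstract can be sketched as follows. This is a minimal illustration assuming the Hugging Face transformers, datasets, and evaluate libraries, a placeholder BART checkpoint, and a dataset converted to article/summary columns; the checkpoint name, file paths, column names, and hyperparameters are assumptions for illustration, not the exact configuration used in the paper.

```python
# Minimal sketch: fine-tune a BART-style model for abstractive summarization
# and score it with ROUGE-1/2/L. All names below are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)
import evaluate

# Assumption: a placeholder checkpoint; substitute the BART checkpoint actually used.
checkpoint = "facebook/bart-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Assumption: IndoSum / Liputan6 (plus augmented data) converted to JSON files
# with "article" and "summary" text columns.
dataset = load_dataset(
    "json", data_files={"train": "train.json", "validation": "dev.json"}
)

def preprocess(batch):
    # Tokenize source articles and target summaries for seq2seq training.
    model_inputs = tokenizer(batch["article"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(
    preprocess, batched=True, remove_columns=dataset["train"].column_names
)

rouge = evaluate.load("rouge")

def compute_metrics(eval_pred):
    # Decode generated summaries and references, then compute ROUGE scores.
    preds, labels = eval_pred
    preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    labels = [
        [tok if tok != -100 else tokenizer.pad_token_id for tok in seq]
        for seq in labels
    ]
    labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    return rouge.compute(predictions=preds, references=labels)

args = Seq2SeqTrainingArguments(
    output_dir="bart-id-summarization",
    learning_rate=3e-5,                 # illustrative hyperparameters
    num_train_epochs=3,
    per_device_train_batch_size=4,
    predict_with_generate=True,         # generate summaries during evaluation
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    compute_metrics=compute_metrics,
)
trainer.train()
```

The same evaluation loop can be reused unchanged across the IndoSum, Liputan6, and augmented Liputan6 fine-tuning runs, so ROUGE-1, ROUGE-2, and ROUGE-L scores remain directly comparable between models.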

Published

2026-02-12

Issue

Section

Articles