Transformer-based abstractive Indonesian text summarization

Authors

Keywords:

Abstractive summarization, BART model, ChatGPT augmentation, IndoSum, Liputan6

Abstract

The volume of data created, captured, copied, and consumed worldwide has increased from 2 zettabytes in 2010 to over 97 zettabytes in 2020, with an estimated 181 zettabytes in 2025. Automatic text summarization (ATS) eases the extraction of key points of information and reduces the time needed to understand it. Therefore, improving ATS performance in summarizing news articles is the goal of this paper. This work fine-tunes the BART model using the IndoSum, Liputan6, and augmented Liputan6 datasets for abstractive summarization. The Liputan6 data are augmented with the ChatGPT method. This work also uses Recall-Oriented Understudy for Gisting Evaluation (ROUGE) as the evaluation metric. The data augmentation with ChatGPT used 10% of the clean news articles from the Liputan6 training dataset, and ChatGPT generated abstractive summaries based on that input, culminating in over 36 thousand examples for the model's fine-tuning. The BART model fine-tuned using the IndoSum, Liputan6, and augmented Liputan6 datasets has the best ROUGE-2 score, outperforming the ORACLE model, although ORACLE still has the best ROUGE-1 and ROUGE-L scores. This shows that fine-tuning the BART model with multiple datasets increases its performance on abstractive summarization tasks.
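The fine-tuning and ROUGE evaluation pipeline described in the abstract can be sketched as follows. This is a minimal illustration assuming the Hugging Face transformers, datasets, and evaluate libraries, a placeholder BART checkpoint, and a dataset converted to article/summary columns; the checkpoint name, file paths, column names, and hyperparameters are assumptions for illustration, not the exact configuration used in the paper.

```python
# Minimal sketch: fine-tune a BART-style model for abstractive summarization
# and score it with ROUGE-1/2/L. All names below are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)
import evaluate

# Assumption: a placeholder checkpoint; substitute the BART checkpoint actually used.
checkpoint = "facebook/bart-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Assumption: IndoSum / Liputan6 (plus augmented data) converted to JSON files
# with "article" and "summary" text columns.
dataset = load_dataset(
    "json", data_files={"train": "train.json", "validation": "dev.json"}
)

def preprocess(batch):
    # Tokenize source articles and target summaries for seq2seq training.
    model_inputs = tokenizer(batch["article"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(
    preprocess, batched=True, remove_columns=dataset["train"].column_names
)

rouge = evaluate.load("rouge")

def compute_metrics(eval_pred):
    # Decode generated summaries and references, then compute ROUGE scores.
    preds, labels = eval_pred
    preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    labels = [
        [tok if tok != -100 else tokenizer.pad_token_id for tok in seq]
        for seq in labels
    ]
    labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    return rouge.compute(predictions=preds, references=labels)

args = Seq2SeqTrainingArguments(
    output_dir="bart-id-summarization",
    learning_rate=3e-5,                 # illustrative hyperparameters
    num_train_epochs=3,
    per_device_train_batch_size=4,
    predict_with_generate=True,         # generate summaries during evaluation
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    compute_metrics=compute_metrics,
)
trainer.train()
```

The same evaluation loop can be reused unchanged across the IndoSum, Liputan6, and augmented Liputan6 fine-tuning runs, so ROUGE-1, ROUGE-2, and ROUGE-L scores remain directly comparable between models.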

Published

2026-02-12

Issue

Section

Articles