Extraction of association rules in a diabetic dataset using parallel FP-growth algorithm under Apache Spark

Authors

Keywords:

Apache Spark, Association rules, Diabetes prediction, FP-growth, Parallel FP-growth

Abstract

This research paper focuses on enhancing the frequent pattern growth (FP-growth) algorithm, which improves on the Apriori algorithm by mining frequent itemsets without candidate generation, through parallelization with the Apache Spark framework. Association rule mining, particularly on healthcare data for predicting and diagnosing diabetes, requires handling large datasets that traditional methods may not process efficiently. Our method improves the scalability and processing efficiency of FP-growth by leveraging the distributed computing capabilities of Apache Spark. We conducted a comprehensive analysis of diabetes data, focusing on extracting frequent itemsets and association rules to predict diabetes onset. The results demonstrate that our parallelized FP-growth (PFP-growth) algorithm significantly improves prediction accuracy and processing speed compared with traditional methods. These findings provide valuable insights into disease progression and management and suggest a scalable solution for large-scale healthcare analytics.
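The parallel scheme described above can be illustrated with a minimal, single-process Python sketch. It assumes the partition-by-suffix scheme of PFP on which Spark MLlib's `FPGrowth` is based: count item frequencies to build a frequency-sorted F-list, shard each transaction into group-dependent prefixes, mine each group independently, then derive rules from the merged frequent itemsets. The function names (`pfp_growth`, `mine`, `association_rules`), the round-robin group assignment, and the toy records are hypothetical, not the authors' implementation or data. To keep the code short, lists of rank-sorted prefixes (conditional pattern bases) stand in for the compressed FP-tree. On a real cluster the equivalent would be `pyspark.ml.fpm.FPGrowth` with its `minSupport`, `minConfidence` and `numPartitions` parameters.

```python
import math
from collections import Counter, defaultdict


def mine(db, min_count, suffix, out, keep=None):
    """Recursive pattern growth over a conditional database.

    `db` holds (prefix, count) pairs whose items are sorted by F-list rank.
    Itemsets are grown starting from the item that comes last in F-list order,
    so each frequent itemset is reached along exactly one path.
    `keep`, if given, restricts which items may start a pattern (used per group).
    """
    counts = Counter()
    for items, cnt in db:
        for item in items:
            counts[item] += cnt
    for item, c in counts.items():
        if c < min_count or (keep is not None and not keep(item)):
            continue
        pattern = suffix | {item}
        out[pattern] = c
        # Conditional pattern base: the part of each prefix that precedes `item`.
        cond = [(items[: items.index(item)], cnt) for items, cnt in db if item in items]
        mine([(p, cnt) for p, cnt in cond if p], min_count, pattern, out)


def pfp_growth(transactions, min_support, num_groups):
    """Single-process simulation of PFP's count / shard / mine-per-group steps."""
    n = len(transactions)
    # Small epsilon guards against float error, e.g. 0.3 * 10 == 3.0000000000000004.
    min_count = max(1, math.ceil(min_support * n - 1e-9))

    # Step 1 (parallel counting on a cluster): global item frequencies -> F-list.
    counts = Counter(item for t in transactions for item in set(t))
    flist = sorted((i for i, c in counts.items() if c >= min_count),
                   key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(flist)}

    def group_of(item):
        # Hypothetical partitioner: spread F-list items round-robin over groups.
        return rank[item] % num_groups

    # Step 2 (map): for each group, emit the transaction prefix that ends at the
    # last item belonging to that group ("group-dependent transactions").
    shards = defaultdict(list)
    for t in transactions:
        items = sorted((i for i in set(t) if i in rank), key=rank.get)
        emitted = set()
        for j in range(len(items) - 1, -1, -1):
            g = group_of(items[j])
            if g not in emitted:
                emitted.add(g)
                shards[g].append((tuple(items[: j + 1]), 1))

    # Step 3 (reduce): each group mines only the itemsets whose last F-list
    # item belongs to it, so the groups' outputs never overlap.
    result = {}
    for g, db in shards.items():
        mine(db, min_count, frozenset(), result, keep=lambda item, g=g: group_of(item) == g)
    return result


def association_rules(itemsets, n, min_confidence):
    """Rules X => y with a single-item consequent, plus confidence and lift."""
    rules = []
    for itemset, count in itemsets.items():
        for consequent in itemset:
            antecedent = itemset - {consequent}
            if not antecedent:
                continue
            confidence = count / itemsets[antecedent]
            if confidence >= min_confidence:
                lift = confidence / (itemsets[frozenset([consequent])] / n)
                rules.append((sorted(antecedent), consequent, confidence, lift))
    return sorted(rules, key=lambda r: (-r[2], r[0], r[1]))


if __name__ == "__main__":
    # Invented toy records with discretised attributes (not the paper's dataset).
    records = [
        ["glucose=high", "bmi=high", "outcome=diabetic"],
        ["glucose=high", "bmi=high", "age=50+", "outcome=diabetic"],
        ["glucose=normal", "bmi=high"],
        ["glucose=high", "age=50+", "outcome=diabetic"],
        ["glucose=normal", "bmi=normal"],
    ]
    itemsets = pfp_growth(records, min_support=0.4, num_groups=2)
    for ante, cons, conf, lift in association_rules(itemsets, len(records), 0.8):
        print(ante, "=>", cons, f"confidence={conf:.2f} lift={lift:.2f}")
```

With these five records and a minimum support of 0.4 (two transactions), the sketch finds 12 frequent itemsets and 8 rules with confidence 1.0, for example `glucose=high => outcome=diabetic`. Because each group only starts patterns from its own items, the merged result is the same for any number of groups, which is what lets the mining step scale out across Spark partitions.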

Published

2026-02-12

Section

Articles