Unifying Sentiment Analysis and Emotion Recognition for Bangla Text: A Hybrid Approach

Md Motaleb Hossen Manik, Anite Halim Sagor, Fahim Ahmed Mondal, Md Mossadek Touhid, Md Zabirul Islam
Khulna University of Engineering & Technology, Khulna, 9203, Bangladesh

Highlights

Bangla text analysis is often studied in two separate tracks: Sentiment Analysis (SA) and Emotion Recognition (ER). This paper proposes a single framework that performs both SA and ER using a hybrid approach. Instead of relying on one model, it first runs five machine learning algorithms (SVM, Logistic Regression, Decision Tree, Random Forest, KNN) and then combines their predictions using a weighted hybridization strategy based on each model’s accuracy.

Key Achievements (reported in the paper):

🏗️ Overall Workflow (Fig. 1)

Workflow diagram of the proposed framework (Fig. 1)

Figure 1. Workflow diagram of the proposed framework. Data are collected from online sources, then annotated and preprocessed. Next, five ML models generate intermediary predictions, which are combined by a weighted hybridization step to produce final SA/ER outputs and evaluation results.

📊 Datasets, Labels & Distribution (Tables I–II, Fig. 2)

The paper develops two separate datasets: Sentiment Analysis (SA) and Emotion Recognition (ER). Data were collected from social platforms and online sources (e.g., Facebook, YouTube, blogs, Twitter/X), and then manually annotated by the authors and linguistic experts.

Dataset sizes and class counts (reported):
• SA dataset: 8,000 samples labeled into Positive (3,500), Negative (2,500), Neutral (2,000)
• ER dataset: 6,500 samples labeled into Happy (1,300), Excited (1,200), Tender (1,200), Sad (1,150), Scared (950), Angry (700)

Table I. Sample Dataset for SA

| Sentiment | Example |
| --- | --- |
| Positive | এই রেস্টুরেন্টের পরিবেশ অনেক ভালো ("The atmosphere of this restaurant is very good") |
| Negative | তাদের ব্যবহার একদমই ভালো না ("Their behavior is not good at all") |
| Neutral | আমি এখন খবরের কাগজ পড়ছি ("I am reading the newspaper now") |
| Positive | এই খাবার জায়গাটার পরিবেশ দারুণ চমৎকার ("The ambience of this eatery is really wonderful") |

Table I. Sample entries of the SA dataset (as shown in the paper).

Table II. Sample Dataset for ER

| Emotion | Example |
| --- | --- |
| Happy | আমি ভাল আছি ("I am well"); এই খবরটা পেয়ে সে খুশি ("He is happy to receive this news") |
| Excited | ভালো ফলাফল করতে পেরে অনেক ভালো লাগছে ("It feels great to have achieved a good result") |
| Tender | সে একজন ভালো মনের মানুষ ("He is a kind-hearted person") |
| Sad | আজকে আমার ভালো লাগছে না ("I am not feeling good today") |
| Scared | আমি বাবাকে অনেক বেশি ভয় পাই ("I am very afraid of my father") |
| Angry | তার সাথে কথা বলার কোনো মানেই হয় না ("There is no point in talking to him") |

Table II. Sample entries of the ER dataset (as shown in the paper).

🧮 Class Distribution (Fig. 2)

Distribution of data in distinct datasets (Fig. 2)

Figure 2. Distribution of data in distinct datasets. The figure summarizes how samples are distributed across SA (positive/negative/neutral) and ER (six emotion categories).

🧹 Text Preprocessing (as described)

🧠 Hybrid Method: “Intermediary + Weighted Hybridization”

The method has two stages:

  1. Intermediary results: represent each text using TF-IDF, then train and evaluate SVM, Logistic Regression, Decision Tree, Random Forest, and KNN.
  2. Hybridizing results: compute each model’s accuracy, convert them into weights, then combine predicted labels by summing weights per candidate label and choosing the label with the highest total weight (Algorithm 1 in the paper).
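The two stages can be sketched as follows. The combiner below is a minimal illustration of the weighted vote in stage 2 (Algorithm 1 in the paper); the model predictions and accuracies are hypothetical, and in stage 1 the label lists would come from TF-IDF-vectorized text fed to the five trained classifiers.

```python
# Stage 2 of the hybrid method: each model votes for its predicted label,
# and the vote is weighted by that model's held-out accuracy. The label
# with the largest total weight wins.

def hybridize(predictions, accuracies):
    """predictions: model name -> list of predicted labels (one per sample);
    accuracies: model name -> accuracy used as that model's vote weight."""
    n_samples = len(next(iter(predictions.values())))
    final = []
    for i in range(n_samples):
        scores = {}
        for model, preds in predictions.items():
            label = preds[i]
            scores[label] = scores.get(label, 0.0) + accuracies[model]
        final.append(max(scores, key=scores.get))  # highest total weight
    return final

# Hypothetical per-sample predictions from three of the five models:
preds = {
    "SVM": ["Positive", "Negative"],
    "LR":  ["Positive", "Neutral"],
    "DT":  ["Neutral",  "Neutral"],
}
accs = {"SVM": 0.93, "LR": 0.90, "DT": 0.87}
print(hybridize(preds, accs))  # ['Positive', 'Neutral']
```

Note how the second sample flips to "Neutral": the lone SVM vote (0.93) is outweighed by the combined LR + DT votes (1.77), which is exactly the behavior a plain majority vote would also give here, but the weights break ties in favor of stronger models.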

🧪 Experimental Setup

📈 Results & Analysis (Figs. 3–4)

Evaluation metrics of ML algorithms (Fig. 3)

Figure 3. Evaluation metrics of ML algorithms. The paper reports that SVM provides the strongest baseline performance among the five ML models.

Evaluation metrics value of proposed framework (Fig. 4)

Figure 4. Evaluation metrics value of proposed framework. Reported overall performance: Accuracy 96.57%, Precision 95.96%, Recall 95.85%, F1-score 95.90%.

📊 Improvement over Individual ML Models (Table III)

Table III reports the proposed framework’s metric values and how much they improve over each individual ML baseline.

| Metric (Proposed %) | Baseline Algorithm | Baseline Value (%) | Improvement (%) |
| --- | --- | --- | --- |
| Accuracy (96.57) | SVM | 93.48 | 3.30 |
| | LR | 90.25 | 7.00 |
| | DT | 87.54 | 10.31 |
| | RF | 88.68 | 8.89 |
| | KNN | 84.68 | 14.04 |
| Precision (95.96) | SVM | 92.96 | 3.22 |
| | LR | 90.00 | 6.62 |
| | DT | 87.12 | 10.14 |
| | RF | 87.21 | 10.03 |
| | KNN | 84.58 | 13.45 |
| Recall (95.85) | SVM | 92.85 | 3.23 |
| | LR | 90.13 | 6.34 |
| | DT | 86.48 | 10.83 |
| | RF | 88.14 | 8.74 |
| | KNN | 83.17 | 15.24 |
| F1-score (95.90) | SVM | 92.90 | 3.22 |
| | LR | 90.06 | 6.48 |
| | DT | 86.79 | 10.49 |
| | RF | 87.67 | 9.38 |
| | KNN | 83.86 | 14.35 |

Table III. Improvement in results while using the proposed framework (values copied from the paper).
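The improvement column in Table III is consistent with the relative gain over each baseline, truncated to two decimal places (the truncation choice is inferred from the reported digits); a quick check:

```python
import math

# Relative improvement over a baseline, as a percentage of the baseline.
# Truncating (not rounding) to two decimals reproduces the table's digits.

def improvement(proposed, baseline):
    gain = (proposed - baseline) / baseline * 100
    return math.floor(gain * 100) / 100

# Accuracy row of Table III:
print(improvement(96.57, 93.48))  # SVM -> 3.3
print(improvement(96.57, 90.25))  # LR  -> 7.0
print(improvement(96.57, 84.68))  # KNN -> 14.04
```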

🏁 Comparison with Existing Works (Fig. 5, Table IV)

Evaluation metrics comparison between existing works and proposed framework (Fig. 5)

Figure 5. Evaluation metrics comparison between existing works and proposed framework. The paper reports higher overall metric values for the proposed unified SA+ER hybrid framework compared with the selected prior works.

Table IV in the paper compares multiple aspects (task scope, dataset type, feature engineering, data imbalance handling, etc.). Below is a cleaned version of the same comparison, keeping the paper's aspect list and the "Proposed" column content.

| Aspect | What the paper claims for the Proposed framework |
| --- | --- |
| Both SA and ER | Yes (unified) |
| Hybrid approach | Yes (weighted hybridization of multiple ML outputs) |
| Datasets | Developed (new SA and ER datasets) |
| Performance | High (reported metrics in the ~96% range) |
| Efficiency | High (CPU-only setup; lightweight ML pipeline) |
| Domain | Multiple (data collected from multiple online sources) |
| Handling data imbalance | Yes (paper notes ensuring balanced training distribution) |
| Feature engineering | TF-IDF + hybridizing |
| Granularity | 9 total classes (3 sentiment + 6 emotion) |

Table IV (cleaned view). Summary of the “Proposed” column from the paper’s comparison table (Table IV).

🚀 Key Contributions & Practical Impact

Technical Contributions

  • Unified Bangla text analysis: simultaneously supports SA and ER in one pipeline.
  • Hybrid decision rule: uses accuracy-derived weights to combine model predictions per sample.
  • Reproducible ML baseline stack: TF-IDF + 5 classic ML classifiers.

Deployment / Future Directions (as stated)

  • Data expansion: increase dataset size to improve generalizability.
  • Advanced models: explore transformer-based deep learning for harder linguistic patterns.
  • Applications: real-time systems for social media monitoring and sentiment-driven decision-making.

🔬 Research Significance

The core value of this paper is practical: it shows that a lightweight, CPU-friendly hybridization strategy can unify sentiment and emotion understanding for Bangla text and achieve strong reported metrics without relying on large neural models. This can be especially useful when compute is limited but consistent SA+ER outputs are needed.

📝 Citation

If you find this paper useful in your research, please consider citing:

@inproceedings{manik2025unifying,
  title={Unifying Sentiment Analysis and Emotion Recognition for Bangla Text: A Hybrid Approach},
  author={Manik, Md Motaleb Hossen and Sagor, Anite Halim and Mondal, Fahim Ahmed and Touhid, Md Mossadek and Islam, Md Zabirul},
  booktitle={2025 International Conference on Electrical, Computer and Communication Engineering (ECCE)},
  pages={1--6},
  year={2025},
  organization={IEEE}
}

📖 Paper: 2025 International Conference on Electrical, Computer and Communication Engineering (ECCE)

A Hybrid Framework for Sentiment Analysis from Bangla Texts

Md Motaleb Hossen Manik, Fabliha Haque, MMA Hashem, Md Ahsan Habib, Md Zabirul Islam, Tanim Ahmed
Khulna University of Engineering & Technology, Khulna, 9203, Bangladesh

Highlights

This paper proposes a hybrid framework for performing Sentiment Analysis (SA) on Bangla and phonetic Bangla texts. Instead of relying solely on machine learning or solely on rule-based logic, the framework combines four machine learning algorithms with a newly designed rule-based approach, and merges their outputs using a weighted aggregation strategy.

Key Achievements (as reported in the paper):

  • 📚 New manually built dataset: 1,600 reviews (400 Excellent, 400 Good, 400 Neutral, 400 Bad), including Bangla and phonetic Bangla.
  • 🧠 Hybrid design: SVM, Logistic Regression, Decision Tree, Random Forest + 53-rule linguistic system.
  • 🔍 Quantification of Polarity: Supports five sentiment levels (Strongly Positive, Positive, Neutral, Negative, Strongly Negative).
  • ↩ Negation handling: Polarity inversion mechanism included (e.g., “না”, “নয়”).
  • 📈 Final accuracy: 95.54%, outperforming prior Bangla SA works.
  • ⚙️ Lightweight setup: Implemented in Python 3.8.8 on CPU (Intel i5, 12GB RAM), no GPU used.
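The negation-handling idea above can be illustrated with a toy polarity flip; the token sets, lexicon entries, and the `polarity` helper below are a minimal sketch, not the paper's actual 53-rule system:

```python
# Toy polarity scorer with negation inversion: a negation token such as
# "না" or "নয়" flips the sign of the sentiment accumulated so far.
NEGATIONS = {"না", "নয়", "নাই"}
LEXICON = {"ভালো": 1, "চমৎকার": 1, "খারাপ": -1}  # illustrative entries only

def polarity(tokens):
    score = 0
    for tok in tokens:
        if tok in NEGATIONS:
            score = -score  # invert the polarity built up so far
        else:
            score += LEXICON.get(tok, 0)
    return score

print(polarity(["তাদের", "ব্যবহার", "ভালো"]))        # 1  (positive)
print(polarity(["তাদের", "ব্যবহার", "ভালো", "না"]))  # -1 (negated)
```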

🏗️ Overall Hybrid Framework (Fig. 1)

Overall structure of the proposed framework (Fig. 1)

Figure 1 (page 3). The workflow begins with data collection from social platforms, followed by preprocessing and feature engineering. The dataset is split into training/validation/testing sets. In parallel, tokenization feeds a rule-based engine. Finally, the outputs of the ML models and the rule-based approach are combined through weighted aggregation to produce the final sentiment.

📊 Dataset Construction (Section III)

Due to limited Bangla sentiment datasets, the authors manually constructed a new dataset. Reviews were collected from YouTube, Facebook, Instagram, Twitter, and manually annotated into four categories: Excellent, Good, Neutral, Bad.

The dataset contains 1,600 reviews in total: 400 Excellent, 400 Good, 400 Neutral, and 400 Bad, spanning both Bangla script and phonetic (romanized) Bangla.

Table II. Sample Dataset

| Actual Review | Original Form | Class |
| --- | --- | --- |
| তাদের ব্যবহার অনেক ভালো | তাদের ব্যবহার অদেক ভাদ া | Excellent |
| Tmr kotha ar kaj a mil nai | তোমার কথা আর কাজে মিল নাই | Bad |

Table II (page 2). Sample Bangla and phonetic Bangla reviews.

🧠 Methodology

1️⃣ Machine Learning Component

2️⃣ Rule-Based Component (53 Rules)

📈 Accuracy Comparison (Fig. 2)

Accuracy comparison among approaches (Fig. 2)

Figure 2 (page 5). SVM achieves 86.85%, rule-based approach achieves 92.32%, and the final hybrid framework achieves 95.54%.

🔎 Rule Category Impact (Fig. 3)

Accuracy for different rule combinations (Fig. 3)

Figure 3 (page 5). Accuracy improves as rule complexity increases: unigram < unigram+bigram < unigram+bigram+trigram.

⚖️ Weighted Aggregation Strategy

Each approach receives a weight proportional to its accuracy:

W_i = Accuracy(A_i) / Σ_j Accuracy(A_j)

Reported weights:

Final prediction: O_n = Σ_i (C_i × W_i)
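A minimal sketch of the weight computation and aggregation: the two accuracies below are the ones reported in Fig. 2 (SVM and rule-based), while the per-approach class scores are hypothetical.

```python
# Accuracy-proportional weighting: W_i = acc_i / sum(acc), then the final
# output blends each approach's class score: O_n = sum(C_i * W_i).

def weights(accuracies):
    total = sum(accuracies)
    return [a / total for a in accuracies]

def aggregate(class_scores, accuracies):
    return sum(c * w for c, w in zip(class_scores, weights(accuracies)))

# Two-approach illustration with the accuracies reported in Fig. 2
# (SVM 86.85, rule-based 92.32); the class scores 4.0 and 5.0 are made up:
accs = [86.85, 92.32]
print([round(w, 3) for w in weights(accs)])   # rule-based gets more weight
print(round(aggregate([4.0, 5.0], accs), 3))  # blended score between 4 and 5
```

Because the rule-based approach is more accurate, its class score pulls the blended output slightly toward 5.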

📊 Comparison with Prior Work (Table V)

| Aspect | Proposed |
| --- | --- |
| Hybrid Approach | Yes |
| Ranked Algorithms | Yes |
| Quantified Polarity | Yes (five levels) |
| Phonetic Bangla | Yes |
| Negation Handling | Yes |
| Final Accuracy | 95.54% |

Table V (page 6). Proposed framework outperforms prior Bangla SA models.

⚠️ Limitations & Future Work

Future improvements include larger datasets, improved feature selection, and expanding rule coverage for broader domains.

📝 Citation

If you find this paper useful in your research, please consider citing:

@inproceedings{manik2022hybrid,
  title={A Hybrid Framework for Sentiment Analysis from Bangla Texts},
  author={Manik, Md Motaleb Hossen and Haque, Fabliha and Hashem, MMA and Habib, Md Ahsan and Islam, Md Zabirul and Ahmed, Tanim},
  booktitle={2022 25th International Conference on Computer and Information Technology (ICCIT)},
  pages={517--522},
  year={2022},
  organization={IEEE}
}

📖 Paper: 2022 25th International Conference on Computer and Information Technology (ICCIT)

Public sector corruption analysis with modified K-means algorithm using perception data

Anik Pramanik, Amlan Sarker, Md Zabirul Islam, MMA Hashem
Khulna University of Engineering & Technology, Khulna, 9203, Bangladesh

Highlights

Corruption in public-sector services can be difficult to measure early using only official reports. This paper proposes a modified, attribute-weighted K-means clustering model that uses perception (survey) data to segment public organizations by corruption level. The study applies the model to Bangladesh, evaluates it against a reference corruption report, and also presents a cloud-based web application architecture for real-time data collection and analysis.

Key Achievements (as reported in the paper):

  • 🧩 Survey-driven dataset: 70 survey participants; 20 public-sector organizations; 7 attributes scored in [1–5].
  • ⚖️ Attribute weighting: Weights derived from corruption-type frequency statistics (e.g., bribery, negligence) instead of treating all attributes equally.
  • 🔎 Meaningful segmentation: Clusters labeled as Least, Moderately, and Highly corrupted.
  • ✅ Reference-based evaluation: Compared with an official corruption report; achieved accuracy = 0.875 (87.5%).
  • ☁️ Deployment concept: Cloud-based Flask web app architecture for survey intake, database storage, clustering, and visualization.

🏗️ Overall Corruption Analysis Pipeline (Fig. 1)

Block diagram of the proposed corruption analysis model (Fig. 1)

Figure 1 (page 2). The paper’s pipeline starts from survey/perception data and data processing to build a dataset of organizations. In parallel, corruption frequency statistics are used to build a weight vector. The core step is weighted-attribute K-means, producing meaningful clusters, followed by cluster analysis & labeling.

📊 Data Collection & Organization Attributes (Section III)

Each public-sector organization is represented using 7 attributes: Integrity, Accountability, Independence, Resource, Transparency, Co-operation, and Awareness. A multidimensional survey is used to score each attribute using question-level ratings of 1 (minimum), 3 (mid-point), or 5 (maximum). For each organization and attribute, the final value is the average across surveyed users.
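The averaging step can be sketched as follows; the `attribute_score` helper and the sample ratings are illustrative, not from the paper:

```python
# One organization-attribute score = mean of all question ratings
# (each rating is 1, 3, or 5) across the surveyed users.

def attribute_score(ratings):
    assert all(r in (1, 3, 5) for r in ratings), "ratings use the 1/3/5 scale"
    return sum(ratings) / len(ratings)

# Hypothetical Transparency ratings from six respondents:
print(attribute_score([5, 3, 3, 1, 5, 3]))  # about 3.33, i.e. mid-range
```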

🧾 Sample Survey Assessment (Table I)

| Field | Example (from the paper) |
| --- | --- |
| Organization | Passport Office |
| Attribute | Transparency |
| Question examples | Record availability, procurement info accessibility, disclosure of assets/interests, and public advertising of vacancies/openings. |
Table I (page 2). Example questionnaire block used to compute an attribute score (Transparency) for an organization.

Dataset size (as reported): data collected from 70 survey participants; dataset includes 20 public-sector organizations; each described by 7 attributes with values in [1, 5].

📌 Why Attribute Weighting Matters (Fig. 2, Table II)

Standard K-means uses Euclidean distance and effectively treats all attributes as equally important. The paper argues this is misleading for corruption analysis because some corruption types occur much more frequently (for example, bribery and negligence). Therefore, the model builds a weight vector using corruption-type frequency statistics and applies weighted distance in clustering.

Percentages of household victims of different corruption-types (Fig. 2)

Figure 2 (page 2). The paper reports household victim percentages by corruption type, including Bribe (49.8) and Negligence (39.9) as the most frequent categories, followed by smaller-frequency types. These frequencies motivate higher weights for attributes linked to high-frequency corruption types.
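Normalizing the frequencies into weights is a one-liner. Note that only the first two frequencies below (Bribe 49.8, Negligence 39.9) come from Fig. 2; the remaining five values are placeholders chosen for illustration so that the output approximates the weights in Table II.

```python
# Attribute weights from corruption-type frequencies: w_i = f_i / sum(f).
# Only the first two frequencies are from Fig. 2 (Bribe 49.8,
# Negligence 39.9); the last five are assumed values for illustration.

def normalize(freqs):
    total = sum(freqs)
    return [round(f / total, 2) for f in freqs]

freqs = [49.8, 39.9, 6.5, 3.3, 3.3, 2.5, 2.5]  # last five are assumed
print(normalize(freqs))  # [0.46, 0.37, 0.06, 0.03, 0.03, 0.02, 0.02]
```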

⚖️ Weight Vector of Attributes (Table II)

| Attribute | Weight | Indicator of |
| --- | --- | --- |
| Integrity | 0.46 | Bribery, unauthorized money |
| Accountability | 0.37 | Negligence of duties |
| Co-operation | 0.06 | Harassment, misbehavior |
| Transparency | 0.03 | Fraudulence |
| Awareness | 0.03 | Fraudulence |
| Resource | 0.02 | Embezzlement, fraudulence |
| Independence | 0.02 | Nepotism, influential interference |

Table II (page 2). Normalized attribute weights used by the proposed model.

🧠 Core Model: Attribute-Weighted K-means (Section IV)

The clustering uses a weighted distance (instead of standard Euclidean distance). Each attribute belongs to a weight partition, and the distance between organizations is computed by multiplying each squared attribute difference by its attribute weight.

The algorithm initializes K = 3 cluster centers using uniform attribute values from the set {1, 3, 5}, assigns organizations to the nearest weighted-distance centroid, and iteratively updates cluster centers until the stress function falls below a threshold.
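Under those definitions, a minimal pure-Python sketch of the weighted clustering loop might look like this. The helper names and the three organization vectors are illustrative (the weights are those of Table II), and a fixed iteration count stands in for the paper's stress-function stopping criterion:

```python
# Attribute-weighted K-means sketch: each squared attribute difference is
# scaled by that attribute's weight, so Integrity and Accountability
# dominate cluster assignment. K = 3 centers start at uniform {1, 3, 5}.

def wdist2(x, center, w):
    """Weighted squared distance between a point and a centroid."""
    return sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, center, w))

def weighted_kmeans(data, w, iters=20):
    centers = [[v] * len(w) for v in (1, 3, 5)]
    clusters = [[] for _ in centers]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for x in data:  # assign each organization to the nearest centroid
            nearest = min(range(len(centers)),
                          key=lambda c: wdist2(x, centers[c], w))
            clusters[nearest].append(x)
        for j, members in enumerate(clusters):
            if members:  # recompute center as the attribute-wise mean
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, clusters

# Table II weights and three made-up organization vectors:
w = [0.46, 0.37, 0.06, 0.03, 0.03, 0.02, 0.02]
orgs = [
    [4.8, 4.5, 3.4, 3.3, 3.1, 3.2, 3.4],  # high integrity/accountability
    [1.2, 1.5, 3.0, 3.1, 3.0, 3.0, 3.1],  # low integrity/accountability
    [3.0, 2.9, 3.1, 3.2, 2.6, 2.8, 3.3],  # middling
]
centers, clusters = weighted_kmeans(orgs, w)
```

Because Integrity and Accountability carry almost all the weight, the three vectors separate into the low, middle, and high clusters even though their remaining five attributes are nearly identical.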

📈 Results: Cluster Centers & Separation (Table III, Fig. 3)

📐 Resultant Cluster Centers (Table III)

| Attribute | Centroid 1 | Centroid 2 | Centroid 3 |
| --- | --- | --- | --- |
| Integrity | 4.13 | 3.05 | 1.68 |
| Accountability | 3.89 | 2.90 | 2.25 |
| Co-operation | 3.37 | 3.15 | 3.04 |
| Transparency | 3.33 | 3.21 | 3.14 |
| Awareness | 3.11 | 2.60 | 2.98 |
| Resource | 3.20 | 2.80 | 3.01 |
| Independence | 3.41 | 3.34 | 3.11 |

Table III (page 3). Cluster centers found by the proposed weighted K-means.

Scatter plot for Integrity vs Accountability (Fig. 3)

Figure 3 (page 3). Scatter plot of organizations using the two most heavily weighted attributes: Integrity and Accountability, showing cluster separation.

🏷️ Cluster Labels (Table IV)

| Cluster ID | Weighted Sum of Attributes | Classification |
| --- | --- | --- |
| Cluster 1 | 169.24 | Least corrupted |
| Cluster 2 | 123.83 | Moderately corrupted |
| Cluster 3 | 92.50 | Highly corrupted |

Table IV (page 3). Cluster labeling derived from weighted sums of cluster-center attributes.

✅ Evaluation vs Reference Corruption Report (Tables V–VI)

To validate reliability, the paper compares the model’s cluster labels against a reference corruption report (TIB National Household Survey 2017). Performance is summarized using precision, recall, and F1-score per cluster, along with macro and weighted averages.

📊 Performance Analysis (Table V)

| Class | Precision | Recall | F1-score |
| --- | --- | --- | --- |
| Cluster 1 (Least Corrupted) | 1.000 | 0.750 | 0.857 |
| Cluster 2 (Moderately Corrupted) | 0.875 | 0.875 | 0.875 |
| Cluster 3 (Highly Corrupted) | 0.800 | 1.000 | 0.889 |
| Accuracy | | | 0.875 |
| Macro Average | 0.892 | 0.875 | 0.874 |
| Weighted Average | 0.887 | 0.875 | 0.874 |

Table V (page 4). Precision/recall/F1 by cluster, plus macro and weighted averages.
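As a sanity check, the macro averages in Table V follow directly from the per-cluster rows (a macro average is the unweighted mean across classes):

```python
# Macro average = unweighted mean of the per-class values
# (per-cluster numbers copied from Table V).
precision = [1.000, 0.875, 0.800]
recall    = [0.750, 0.875, 1.000]
f1        = [0.857, 0.875, 0.889]

def macro(values):
    return round(sum(values) / len(values), 3)

print(macro(precision), macro(recall), macro(f1))  # 0.892 0.875 0.874
```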

🧾 Sector-by-Sector Comparison (Table VI)

The paper lists sector corruption percentages from the reference report and compares them against the model's predicted class. (Below is a cleaned rendering of the paper's table.)

| Organizations | % Affected | Classification by % | Model Classification |
| --- | --- | --- | --- |
| Law Agency | 72.5 | High | High |
| Passport | 67.3 | High | High |
| BRTA | 65.4 | High | Moderate |
| Judiciary | 60.5 | High | High |
| Land Services | 44.9 | Moderate | Moderate |
| Education | 42.9 | Moderate | Moderate |
| Health | 42.5 | Moderate | Moderate |
| Agriculture | 41.6 | Moderate | Moderate |
| Electricity | 38.9 | Moderate | Moderate |
| Gas | 38.3 | Moderate | Moderate |
| Local Gov. Institute | 26.7 | Moderate | Moderate |
| Insurance | 12.3 | Least | Least |
| Tax and Customs | 11.1 | Least | Moderate |
| Banking | 5.7 | Least | Least |
| NGO | 5.4 | Least | Least |

Table VI (page 4). Side-by-side comparison between reference sector classification and the model’s predicted class.

☁️ Real-Time Cloud Web Application Architecture (Fig. 4)

Architecture of real-time cloud-based web application (Fig. 4)

Figure 4 (page 4). The paper’s deployment design uses a Flask-based web UI where a user selects an organization, answers questionnaires, and submits scores. The backend processes scores, stores the dataset in a database, runs the proposed clustering on a cloud server, and returns results with visualizations (charts/tables/plots).

🚀 Practical Impact (as described)

  • Early segmentation: Uses perception data to group sectors as least/moderate/high corruption, supporting prioritization of audits and interventions.
  • Adaptable weighting: Attribute weights reflect corruption-type frequency, so the model emphasizes dominant corruption channels.
  • Scalable collection: Web-based design supports larger-scale data collection and repeated re-analysis over time.

⚠️ Limitations & Future Directions (as stated)

  • Collect more data across countries, more organizations, and more diverse populations to improve generalization.
  • Expand survey coverage to better represent the full spectrum of public-sector experiences.

📝 Citation

If you find this paper useful in your research, please consider citing:

@inproceedings{pramanik2020public,
  title={Public sector corruption analysis with modified K-means algorithm using perception data},
  author={Pramanik, Anik and Sarker, Amlan and Islam, Zabirul and Hashem, MMA},
  booktitle={2020 11th International Conference on Electrical and Computer Engineering (ICECE)},
  pages={198--201},
  year={2020},
  organization={IEEE}
}

📖 Paper: 2020 11th International Conference on Electrical and Computer Engineering (ICECE)

© 2026 Md Zabirul Islam