Modified EDA and Backtranslation Augmentation in Deep Learning Models for Indonesian Aspect-Based Sentiment Analysis

Natasya, Abba Suganda Girsang

Abstract


In developing a business, aspect-based sentiment analysis (ABSA) can help extract customers' opinions on different aspects of the business from online reviews. Researchers have found great promise in deep learning approaches to ABSA tasks. Studies have also explored text augmentation, such as Easy Data Augmentation (EDA), to improve deep learning models' performance using only simple operations. However, when EDA is applied to ABSA, there is a high chance that the augmented sentences lose aspect or sentiment-related words (target words) that are critical for training. To address this, another study adjusted EDA for English aspect-based sentiment data in which the target words are tagged. That solution still needs additional modifications for non-tagged data. Hence, this work focuses on a modified EDA that integrates POS tagging and word similarity, not only to understand the context of the words but also to extract the target words directly from non-tagged sentences. Additionally, the modified EDA is combined with backtranslation, as several studies have shown that the latter also contributes notably to model performance. The proposed method is then evaluated on a small Indonesian ABSA dataset using baseline deep learning models. Results show that the augmentation method can increase model performance when the dataset is limited. In general, the best performance for aspect classification is achieved by the proposed method, which increases macro-accuracy and F1 on the Long Short-Term Memory (LSTM) and Bidirectional LSTM models, respectively, compared to the original EDA. The proposed method also obtained the best performance for sentiment classification using a convolutional neural network (CNN), increasing overall accuracy by 2.2% and F1 by 3.2%.
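The idea of protecting target words during EDA can be illustrated with a minimal sketch. It is not the authors' implementation: the toy POS lexicon `TOY_POS`, the choice of nouns and adjectives as target tags, and the helper names are all assumptions made for illustration. A real pipeline would use a trained Indonesian POS tagger, and the word-similarity step mentioned in the abstract is omitted here.

```python
import random

# Hypothetical toy POS lexicon (assumption); stands in for a trained Indonesian POS tagger.
TOY_POS = {
    "makanan": "NOUN", "pelayanan": "NOUN", "harga": "NOUN",
    "enak": "ADJ", "lambat": "ADJ", "mahal": "ADJ",
    "sangat": "ADV", "tapi": "CONJ", "dan": "CONJ",
}

# Assumption: aspect words are mostly nouns, sentiment words mostly adjectives.
TARGET_TAGS = {"NOUN", "ADJ"}


def find_targets(tokens):
    """Return the indices of tokens treated as protected target words."""
    return {i for i, t in enumerate(tokens) if TOY_POS.get(t.lower()) in TARGET_TAGS}


def random_deletion(tokens, p, rng):
    """Delete each non-target token with probability p; target words are always kept."""
    protected = find_targets(tokens)
    kept = [t for i, t in enumerate(tokens) if i in protected or rng.random() > p]
    return kept if kept else list(tokens)


def random_swap(tokens, n_swaps, rng):
    """Swap positions of non-target tokens only, so target words stay in place."""
    protected = find_targets(tokens)
    free = [i for i in range(len(tokens)) if i not in protected]
    out = list(tokens)
    if len(free) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(free, 2)
        out[i], out[j] = out[j], out[i]
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "makanan sangat enak tapi pelayanan lambat".split()
    print(random_deletion(sentence, 0.5, rng))
    print(random_swap(sentence, 1, rng))
```

In this sketch the aspect words ("makanan", "pelayanan") and sentiment words ("enak", "lambat") survive every augmentation, which is the property plain EDA does not guarantee.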

 

DOI: 10.28991/ESJ-2023-07-01-018



Keywords


Easy Data Augmentation; Backtranslation; Long Short-Term Memory; Bidirectional LSTM; Convolutional Neural Network.





Copyright (c) 2022 Natasya Natasya