Dual-Agent Q-Learning for Cross-Layer IEEE 802.11bd Optimization in Dense VANETs
Dense vehicular ad hoc networks (VANETs) struggle to deliver safety messages reliably because of channel congestion, packet collisions, and interference. This study develops a dual-agent Q-learning framework for cross-layer IEEE 802.11bd optimization that improves latency and power efficiency while maintaining acceptable packet delivery ratios (PDR) in dense traffic. We propose a decomposed architecture that separates PHY-layer power control from MAC-layer beacon rate adaptation, with deterministic SINR-based modulation and coding scheme (MCS) selection ensuring IEEE 802.11bd compliance. The framework is evaluated with a Python-based VANET simulator that implements the IEEE 802.11bd PHY/MAC stack with realistic SUMO mobility, multi-class background traffic, and omnidirectional and sectoral antennas at densities of 20-90 vehicles/km. Compared with static baselines, dual-agent Q-learning reduces average latency by 44.6% (31.1 ms to 17.2 ms) and transmission power by 55% (15-20 dBm to 9 dBm), at the cost of an acceptable 5-11% PDR reduction (94.2% to 88.6%). The approach converges within 8,500 episodes, considerably faster than single-agent Q-learning (12,500 episodes) and dual-agent DQN (14,000-35,000 episodes). This work introduces the first dual-agent tabular Q-learning framework for joint power-rate-MCS optimization in IEEE 802.11bd VANETs, demonstrating that agent decomposition reduces state-action complexity while enabling interpretable, fast-converging control suitable for sub-100 ms vehicular applications.
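The decomposition described above can be illustrated with a minimal Python sketch: two independent tabular Q-learning agents (one for PHY-layer power, one for MAC-layer beacon rate) alongside a deterministic SINR-to-MCS lookup. This is not the authors' implementation. The class name `QAgent`, the function `select_mcs`, the threshold table, the action sets, and the hyperparameters are all hypothetical values chosen for illustration; the abstract does not specify them.

```python
import random
from collections import defaultdict

# Hypothetical (SINR threshold in dB, MCS index) pairs, in ascending order.
# Illustrative values only; not taken from the paper or from IEEE 802.11bd.
MCS_THRESHOLDS_DB = [(5.0, 0), (8.0, 1), (11.0, 2), (15.0, 3), (19.0, 4)]


def select_mcs(sinr_db):
    """Deterministic lookup: highest MCS whose threshold the SINR meets.

    Falls back to MCS 0 when the SINR is below every threshold.
    """
    mcs = 0
    for threshold, index in MCS_THRESHOLDS_DB:
        if sinr_db >= threshold:
            mcs = index
    return mcs


class QAgent:
    """Tabular Q-learning agent with an epsilon-greedy policy."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration probability (assumed value)
        self.q = defaultdict(float)  # Q-table keyed by (state, action)
        self.rng = random.Random(seed)

    def act(self, state):
        # Explore with probability epsilon, otherwise pick the greedy action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


# One agent per layer; each learns over its own small action set.
phy_agent = QAgent(actions=(-1, 0, 1))          # assumed power steps in dB
mac_agent = QAgent(actions=(-1, 0, 1), seed=1)  # assumed beacon-rate steps in Hz
```

Because each agent keeps its own Q-table over a small action set, the joint table that a single agent would need (the product of the power and rate actions) is avoided, which is one plausible reading of the abstract's claim that decomposition reduces state-action complexity.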
This work (including HTML and PDF files) is licensed under a Creative Commons Attribution 4.0 International License.



















