System and Methods for RL-Based Congestion Control of Real-Time Communications = 강화학습 기반 실시간 커뮤니케이션 정체 제어를 위한 시스템 및 기술

Abstract (translated from Korean)

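With the COVID-19 pandemic, real-time communication (RTC) applications ranging from video conferencing to cloud gaming have become commonplace in everyday life. To develop congestion control (CC) algorithms that consistently achieve high quality of experience (QoE) across increasingly diverse applications and Internet environments, many reinforcement learning (RL)-based CC techniques for RTC have recently emerged. One major obstacle to advancing research on RL-based CC algorithms is the absence of an open framework that supports their training, evaluation, and validation. The first part of this dissertation proposes OpenNetLab, an open framework that addresses this problem. OpenNetLab consists of three components. First, to improve usability and programmability in designing RL agents, it provides a Gym environment decoupled from the internal details of the RTC system, together with easy-to-use interfaces. Second, for training and evaluation during algorithm development, it uses a high-fidelity simulated network environment with millisecond-level timing alignment, enabling fast training and reproducible evaluation. Third, to validate performance in real Internet environments, it provides a public Internet testbed on which customizable RTC calls can be run. Initial use cases show that OpenNetLab facilitated the development of new RL-based CC algorithms that outperform widely used rule-based CC algorithms in both network performance and QoE metrics. In addition, we present a fingerprint-based method that can mitigate performance problems specific to RL-based CC algorithms that are caused by bandwidth overuse and underuse.

The second problem is understanding the key challenges that must be solved to achieve QoE-oriented RL-based CC. Research on designing RL-based CC algorithms that achieve better end-to-end QoE than widely used rule-based CC algorithms such as GCC is growing. However, there has been little study of which challenges must be addressed to explicitly design, train, and run inference with an RL-based CC algorithm so that it satisfies multiple target QoE metrics in a given network environment. To address this, the second part of this dissertation proposes methods for designing, training, and running inference with a QoE-oriented RL-based CC algorithm that uses multi-objective RL. Specifically, it builds on two observations: the gap between the quality-of-service (QoS) metrics that a CC algorithm optimizes and the QoE metrics that are its ultimate performance goal must be handled properly, and the sensitivity of QoE metrics to QoS metrics varies with the characteristics of the network environment. Based on these observations, we design a model architecture that can learn multiple performance objectives, that is, the QoS metric weights that make up multiple reward functions, in order to achieve high QoE across diverse network environments. We also propose a method that, for inference with this model, automates the selection of the most suitable QoS weight values by taking into account the QoE sensitivity in a given network environment.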
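To make the phrase "QoS metric weights that make up multiple reward functions" concrete, here is a minimal sketch of a weighted-sum reward over QoS metrics. The metric names (`throughput`, `delay`, `loss`), their normalization to [0, 1], the linear form, and the weight values are all assumptions chosen for illustration; the abstract does not specify the dissertation's actual reward definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QosSample:
    """Hypothetical per-step QoS measurements, each normalized to [0, 1]."""
    throughput: float  # higher is better
    delay: float       # higher is worse
    loss: float        # higher is worse


def scalarized_reward(sample: QosSample, weights: tuple[float, float, float]) -> float:
    """Weighted-sum reward: credit throughput, penalize delay and loss."""
    w_tput, w_delay, w_loss = weights
    return w_tput * sample.throughput - w_delay * sample.delay - w_loss * sample.loss


# Two assumed performance objectives, each expressed as a QoS weight vector.
OBJECTIVES = {
    "throughput_leaning": (0.8, 0.1, 0.1),
    "delay_leaning": (0.2, 0.6, 0.2),
}

# The same network observation scores differently under each objective.
sample = QosSample(throughput=0.9, delay=0.5, loss=0.0)
rewards = {name: scalarized_reward(sample, w) for name, w in OBJECTIVES.items()}
```

Under these assumed weights the same observation is rewarded by the throughput-leaning objective and penalized by the delay-leaning one, which is why the choice of weights matters for the resulting behavior.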

Multilingual Abstract

Recently, real-time communication (RTC) applications, from video conferencing to cloud gaming, have gained popularity. Various techniques have been proposed that leverage reinforcement learning (RL) for congestion control (CC) to achieve consistently high quality-of-experience (QoE). The first part of this dissertation introduces OpenNetLab, an open framework that fills a missing piece in this line of research: support for training, evaluating, and validating RL-based CC algorithms. For researchers who design RL-based CC for RTC, it provides simple interfaces with a customizable gym environment. The framework enables fast training and reproducible evaluation with a high-fidelity simulated network environment. Finally, it offers a public Internet testbed for running customizable end-to-end RTC calls for validation under unseen network conditions. Additionally, we present a fingerprint-based method that can mitigate performance issues specific to a given RL-based CC algorithm, such as bandwidth overuse and underuse.

The second part of this dissertation presents measurement studies on the quality-of-service (QoS) sensitivity of an RL-based CC for RTC under different network environments. Building on the understanding obtained from the measurement study, we introduce the design, training, and deployment of a QoE-oriented RL-based CC algorithm that aims to bridge the gap between QoS and QoE metrics through a multi-objective RL-based approach that exploits QoS sensitivity-based clustering of network environments. For fast training and high-performance deployment, we present a method for choosing the appropriate performance objective for the sensitivity observed in a given network environment, based on sensitivity-aware K-means clustering of network environments.

Keywords: Reinforcement learning, congestion control, real-time communications
Student Number: 2018-33251
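The clustering-based choice of a performance objective mentioned in the abstract could look roughly like the sketch below. It runs plain K-means (Lloyd's algorithm) on per-environment sensitivity vectors and picks the weight vector assigned to the nearest cluster. The two-dimensional sensitivity features, the sample values, the initial centroids, and the `cluster_weights` mapping are hypothetical; the abstract does not describe how the sensitivity-aware variant differs from plain K-means, so none of this should be read as the dissertation's implementation.

```python
import math

Vector = tuple[float, ...]


def nearest(point: Vector, centroids: list[Vector]) -> int:
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points: list[Vector], centroids: list[Vector], iters: int = 20) -> list[Vector]:
    """Plain Lloyd's algorithm starting from the given initial centroids."""
    for _ in range(iters):
        groups: list[list[Vector]] = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


# Hypothetical sensitivity vectors: (QoE sensitivity to delay, to throughput).
train_envs = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
centroids = kmeans(train_envs, centroids=[train_envs[0], train_envs[2]])

# Assumed mapping from cluster index to a QoS weight vector (delay, throughput).
cluster_weights = {0: (0.7, 0.3), 1: (0.3, 0.7)}

# At deployment, measure the new environment's sensitivity and reuse the
# weights of the closest cluster instead of searching over all weights.
new_env = (0.75, 0.3)
chosen = cluster_weights[nearest(new_env, centroids)]
```

Here the new environment is more delay-sensitive, so it falls into the first cluster and inherits that cluster's delay-leaning weights.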


Table of Contents

Abstract
1 Introduction
2 Background and Related Work
2.1 CC for RTC
2.1.1 Limitation of Rule-based CC for RTC
2.1.2 RL-based CC for RTC
2.1.3 Hybrid CC for RTC
2.2 Performance-oriented CC using RL
2.3 Open Framework for Networked Systems
3 Open Framework for Training, Evaluation and Validation
3.1 Motivation
3.2 OpenNetLab
3.2.1 Design Decisions
3.2.2 Model Design
3.2.3 Training and Evaluation on Network and RTC Environment
3.2.4 Validation on Real Network and End-to-End RTC Environment
3.3 Evaluation
3.4 Fingerprint-Based Hybrid Bandwidth Estimators
3.5 Discussion
3.6 Summary
4 Measurement Study with an Open Benchmark
4.1 Motivation
4.2 Design and Implementation
4.3 Measurement Study
4.4 Summary
5 QoE-oriented RL-Based CC for RTC
5.1 Motivation
5.2 QoE-oriented RL-based CC for RTC
5.2.1 Sensitivity-based QoS Weight Assignment
5.2.2 Model Architecture
5.2.3 Multi-Objective RL Training and Deployment
5.3 Evaluation
5.3.1 Setup
5.3.2 End-to-End QoE
5.3.3 Ablation Study
5.4 Discussion
5.5 Summary
6 Conclusion
Abstract (in Korean)
