Learning action representation with limited information
이필현, Graduate School, Yonsei University, 2023. Doctoral dissertation (Korea).
With the tremendous growth of video content on the Internet, analyzing human actions in long untrimmed videos has become an essential task. Although remarkable advances in deep learning have enabled strong automatic video analysis models, they come at a cost: deep learning models often require expensive information such as human annotations and rich data from various sources. This effectively hinders their deployment in many real-world systems where the available information is restricted. To tackle this challenge, this dissertation aims to build efficient models that can learn action representations under constrained scenarios where only a limited amount of information is available for model training and inference. Specifically, the main focus lies in temporal action localization (or detection), whose goal is to localize the temporal intervals of action instances in a given video. The main contributions of this dissertation are as follows.

First, we utilize video-level weak supervision for model training to alleviate the notoriously expensive cost of human annotations for temporal action localization. Specifically, we make the first attempt to model background frames given only video-level labels. The key idea is to suppress the activations of background frames for precise action localization by forcing them to be classified into an auxiliary background class. We then delve deeper into background modeling and introduce a novel perspective in which background frames are regarded as out-of-distribution samples.

Second, we explore another type of weak supervision, point-level annotations, where only a single frame of each action instance is annotated. In this setting, we propose a pseudo-label-based approach to learn action completeness from sparse point labels. The resulting model produces more complete and accurate action predictions.
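The auxiliary-background-class idea can be sketched as follows. This is a minimal illustrative example, not the dissertation's actual model: the tensor sizes, the attention mechanism, and the rule for selecting background frames are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
T, C = 8, 4                                   # 8 frames, 4 action classes (+1 background)
logits = rng.normal(size=(T, C + 1))          # per-frame class logits (toy values)
attention = softmax(rng.normal(size=T), axis=0)  # per-frame foreground weight

# Video-level score: attention-weighted pooling over frames, then softmax;
# only the C action classes are used for the video-level prediction.
video_score = softmax((attention[:, None] * logits).sum(0))[:C]

# Background objective: frames with low attention are pushed toward the
# auxiliary background class (index C) via a cross-entropy-style loss,
# suppressing their activations during localization.
bg_frames = attention < attention.mean()      # assumed selection rule
frame_probs = softmax(logits, axis=-1)
bg_loss = -np.log(frame_probs[bg_frames, C] + 1e-8).mean()
```

Minimizing `bg_loss` drives the selected frames' probability mass onto the background class, so at inference their action-class activations stay low and action intervals can be localized more precisely.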
Lastly, we identify that the bottleneck of action localization models at inference is the heavy computational cost of the motion modality, i.e., optical flow. To relieve this cost, we design a decomposed cross-modal knowledge distillation pipeline that injects motion knowledge into an RGB-based model. By exploiting multimodal complementarity, the model accurately predicts action intervals at low latency, shedding light on the potential adoption of temporal action localization models in real-world systems. We believe that the action representation learning methods under information constraints proposed in this dissertation will serve as essential tools for real-world action analysis systems and benefit various computer vision applications.
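The cross-modal distillation objective can be sketched with a standard temperature-softened KL divergence between a flow "teacher" and an RGB "student". This is a generic distillation sketch under assumed shapes and temperature, not the dissertation's decomposed pipeline itself.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, tau=2.0):
    """KL(teacher || student) on temperature-softened class distributions.

    The RGB student mimics the flow teacher's soft targets, so motion
    knowledge is available at inference without computing optical flow.
    `tau` is an assumed illustrative temperature.
    """
    p = softmax(teacher_logits / tau)   # soft targets from the flow teacher
    q = softmax(student_logits / tau)   # RGB student predictions
    kl = (p * (np.log(p + 1e-8) - np.log(q + 1e-8))).sum(-1).mean()
    return float(kl) * tau ** 2         # standard temperature rescaling

rng = np.random.default_rng(1)
teacher = rng.normal(size=(5, 4))       # 5 snippets, 4 classes (toy values)
student = teacher + 0.1 * rng.normal(size=(5, 4))
loss = kd_loss(student, teacher)        # small: student is close to teacher
```

At inference, only the RGB student is run, which is where the latency saving comes from: the optical-flow computation and the flow network are dropped entirely.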