Action Recognition Based on Temporal Segment Networks Using 3D Convolution

Korean Abstract

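This paper addresses action recognition, the problem of classifying which action is being performed in an input video. As a solution, it proposes 3D-TSN, which extracts spatio-temporal features with a single 3D convolution operation and combines this with the TSN (Temporal Segment Networks) structure. Compared with existing 3D-convolution-based action recognition algorithms, 3D-TSN takes input of the same form, but the way that input is generated must differ. When sampling input data for each segment, 3D-TSN samples frames that are consecutive along the time axis so that it learns short-term temporal information, and it uses the TSN structure so that it also learns from frames that are far apart in time. The HMDB-51 dataset was used to evaluate the proposed method. Comparing 3D-TSN with action recognition using 3D convolution alone, on the same network, showed a performance improvement of about 3 to 4%. In addition, in experiments that varied the temporal length of the input, 3D-TSN was more robust than 3D-ResNet to changes in the number of input frames. Finally, combining 3D-TSN with an optical-flow network in a two-stream configuration achieved a performance of up to 73.59%.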
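The per-segment sampling described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the function name `sample_segment_clips`, the defaults of 3 segments and 16 frames per clip, and the choice of a uniformly random start offset inside each segment are all assumptions, since the abstract does not give these details.

```python
import random


def sample_segment_clips(num_frames, num_segments=3, clip_len=16, rng=None):
    """Return one list of consecutive frame indices per segment.

    Hypothetical helper: the video is split into equal-length segments
    (the TSN idea), and one short run of consecutive frames is drawn
    from each segment (the input a 3D CNN expects).
    """
    if num_frames < num_segments * clip_len:
        raise ValueError("video too short for the requested segments and clip length")
    rng = rng or random.Random()
    seg_len = num_frames // num_segments
    clips = []
    for s in range(num_segments):
        seg_start = s * seg_len
        # Largest offset that still keeps the whole clip inside this segment.
        max_offset = seg_len - clip_len
        start = seg_start + rng.randint(0, max_offset)
        clips.append(list(range(start, start + clip_len)))
    return clips


if __name__ == "__main__":
    # Assumed example: a 96-frame video, 3 segments, 16 consecutive frames each.
    for clip in sample_segment_clips(96, rng=random.Random(0)):
        print(clip[0], "to", clip[-1])
```

Each returned clip covers a short continuous time span, while the set of clips together spans the whole video, which is the combination of short-range and long-range temporal information the abstract describes.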


Multilingual Abstract

In this paper, we address action recognition, the task of classifying which action an input video shows. We propose 3D-TSN, which combines the TSN structure with 3D convolution. 3D-TSN takes the same input format as existing 3D-convolution-based action recognition algorithms, but its input must be generated differently. When sampling input data within a segment, 3D-TSN samples consecutive frames along the time axis to learn short-term temporal information, and it uses the TSN structure to also learn from frames that are far apart in time. We evaluate the proposed method on the HMDB-51 dataset. The results show a 3 to 4% performance improvement over 3D convolution alone. In addition, when we varied the number of input frames, 3D-TSN was more robust to that change than the 3D-convolution baseline. When 3D-TSN and an optical-flow network were combined as a two-stream network, the maximum performance was 73.59%.
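To make the TSN and two-stream ideas concrete, the sketch below shows one plausible way per-segment class scores could be merged and then fused across streams. The abstract does not state the consensus function or the fusion rule, so everything here is an assumption: averaging over segments is used because it is the consensus commonly used with TSN, and the equal-weight softmax fusion, the `flow_weight` parameter, and all function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def segment_consensus(segment_scores):
    """Average the class scores predicted for each segment (assumed consensus)."""
    n = len(segment_scores)
    num_classes = len(segment_scores[0])
    return [sum(seg[c] for seg in segment_scores) / n for c in range(num_classes)]


def two_stream_fusion(rgb_scores, flow_scores, flow_weight=0.5):
    """Weighted average of per-stream class probabilities (assumed fusion rule)."""
    rgb_p = softmax(rgb_scores)
    flow_p = softmax(flow_scores)
    return [(1 - flow_weight) * r + flow_weight * f for r, f in zip(rgb_p, flow_p)]


if __name__ == "__main__":
    # Made-up scores for 3 segments and 2 classes, for illustration only.
    rgb = segment_consensus([[2.0, 0.5], [1.5, 1.0], [2.5, 0.0]])
    flow = segment_consensus([[1.0, 1.2], [2.0, 0.3], [1.8, 0.4]])
    fused = two_stream_fusion(rgb, flow)
    print("predicted class:", fused.index(max(fused)))
```

In a real system the per-segment scores would come from the 3D CNN applied to each sampled clip; here they are plain lists so the aggregation logic can be read in isolation.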


Table of Contents

Ⅰ. Introduction
Ⅱ. Related Work
1. RNN-Based Methods
1) LRCN
2. CNN-Based Methods
1) single stream
2) two-stream
3) Action Recognition using Temporal Segment Networks
Ⅲ. Proposed Method
1. Motivation
2. 3D CNN and Input Data
3. 3D Temporal Segment Networks
Ⅳ. Experiments and Results
1. Experimental Environment and Data Preprocessing
2. Experimental Results
Ⅴ. Conclusion
References
English Abstract
