2D-3D Object Pose Estimation Using Multi-view Object Co-segmentation
김성흠, 복윤수, 권인소, Korea Robotics Society, 2017, The Journal of Korea Robotics Society, Vol. 12, No. 1
We present a region-based approach for accurate pose estimation of small mechanical components. Our algorithm consists of two key phases: multi-view object co-segmentation and pose estimation. In the first phase, we describe an automatic method to extract binary masks of a target object captured from multiple viewpoints. For initialization, we assume the target object is bounded by a convex volume of interest defined by a few user inputs. The co-segmented target object shares the same geometric representation in space and has color models distinct from those of the backgrounds. In the second phase, we retrieve a 3D model instance with the correct upright orientation and estimate the relative pose of the object observed in the images. Our energy function, combining region and boundary terms for the proposed measures, maximizes the overlap of regions and boundaries between the multi-view co-segmentations and the projected masks of the reference model. Based on high-quality co-segmentations consistent across all viewpoints, our final results are accurate model indices and pose parameters of the extracted object. We demonstrate the effectiveness of the proposed method on various examples.
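The region term of an energy of this kind can be illustrated with a per-view mask-overlap score. This is a minimal sketch with illustrative names (`mask_iou`, `multiview_score`), not the paper's actual energy formulation, which also combines a boundary term:

```python
import numpy as np

def mask_iou(seg_mask: np.ndarray, proj_mask: np.ndarray) -> float:
    """Intersection-over-union between a co-segmentation mask and a
    projected model mask (boolean arrays of the same shape)."""
    inter = np.logical_and(seg_mask, proj_mask).sum()
    union = np.logical_or(seg_mask, proj_mask).sum()
    return float(inter) / float(union) if union else 0.0

def multiview_score(seg_masks, proj_masks) -> float:
    """Average overlap across all viewpoints; the model/pose candidate
    maximizing this score would be kept."""
    scores = [mask_iou(s, p) for s, p in zip(seg_masks, proj_masks)]
    return sum(scores) / len(scores)
```

Averaging over viewpoints reflects the idea that the retrieved model must be consistent with the co-segmentations in every view, not just one.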
Lightweight Model for Indoor Space Layout Reconstruction from a Single Panorama Input
길다영, 김성흠, Institute of Control, Robotics and Systems (ICROS), 2022, Journal of Institute of Control, Robotics and Systems, Vol. 28, No. 10
In this paper, we present a lightweight deep learning model for room layout estimation. We build on HorizonNet, in which a 3D room is restored from a single panoramic picture in three steps (pre-processing, feature extraction, and post-processing), by studying the efficient computation of the feature extraction part. In contrast to the baseline method, whose principal architecture combines a typical residual network (ResNet) with long short-term memory (LSTM), we focus on a platform-aware neural architecture search for mobile applications (MnasNet) and a gated recurrent unit (GRU) in place of the conventional LSTM. Subsequently, the hyperparameters of the suggested architectures are selected using sampling-based optimization. In our qualitative and quantitative experiments, the lightweight model combining MnasNet and GRU required approximately half as many parameters as the original method while achieving competitive room layout estimation performance. Based on the Manhattan world assumption, the proposed architecture was validated on Stanford2D3D, PanoContext, and our real-world panorama dataset collected with off-the-shelf software on a RICOH THETA Z1.
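The parameter saving from replacing an LSTM with a GRU can be checked with a back-of-the-envelope count. The sizes below are illustrative, not HorizonNet's actual dimensions, and the formula follows the common framework convention of two bias vectors per gate block (4 gate blocks for LSTM, 3 for GRU):

```python
def rnn_layer_params(input_size: int, hidden_size: int, gate_blocks: int) -> int:
    """Parameter count of one recurrent layer: input-to-hidden weights,
    hidden-to-hidden weights, and two bias vectors, per gate block."""
    return gate_blocks * (input_size * hidden_size
                          + hidden_size * hidden_size
                          + 2 * hidden_size)

# Illustrative sizes only, chosen for the sketch.
lstm_params = rnn_layer_params(1024, 512, gate_blocks=4)
gru_params = rnn_layer_params(1024, 512, gate_blocks=3)
# For identical sizes, a GRU layer uses exactly 3/4 of an LSTM layer's parameters.
```

The GRU accounts for only part of the reported halving; the MnasNet backbone swap contributes the rest.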
Development of Fall Detection and Driver Inattention Detection Using Static Image-based Action Recognition
김준용, 김성흠, Institute of Control, Robotics and Systems (ICROS), 2022, Journal of Institute of Control, Robotics and Systems, Vol. 28, No. 3
Given image-level training data, a single test image can be used to identify a person's actions or behaviors. This study highlights how keypoint-based representations from single images can simplify the visual patterns of image events. Over the past few decades, deep learning techniques have been applied successfully to data-driven keypoint detection and skeletal analysis. While some engineered features from RGB or RGB-D datasets fail in image recognition applications due to ambiguous lighting conditions, prior knowledge from machine learning experimentation on large-scale data can be transferred into a new domain to improve performance. This idea of reusing previously trained models applies to other applications as well. By adopting pre-trained convolutional neural network (CNN) models for action recognition, new applications can be developed for events such as fall detection and driver drowsiness detection. Using state-of-the-art CNNs as deep feature extractors to locate important keypoints of a human body or face, the geometric relationships of the predicted joints or facial features can be analyzed to aid in the design of hazardous-event detection methods. The methods in this report were validated on publicly available datasets and successfully demonstrated in real time. Two different data acquisition systems were used to train and validate these methods on real-world images and to verify them qualitatively with sequences of static images. Details of the algorithms and two practical applications are also outlined. The approach used in this study is scalable and can be extended to other hazardous-event detection methods in the future.
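As one illustration of analyzing the geometric relationship of predicted joints, the following is a minimal fall-detection heuristic on 2D keypoints. The torso-angle rule, the 60° threshold, and the function names are assumptions made for this sketch, not the report's exact method:

```python
import math

def torso_angle_deg(shoulder_mid: tuple, hip_mid: tuple) -> float:
    """Angle of the shoulder-to-hip vector measured from the vertical
    image axis, in degrees (image y grows downward, so an upright
    person has the hips below the shoulders and an angle near 0)."""
    dx = hip_mid[0] - shoulder_mid[0]
    dy = hip_mid[1] - shoulder_mid[1]
    return math.degrees(math.atan2(abs(dx), dy))

def is_fallen(shoulder_mid: tuple, hip_mid: tuple,
              threshold_deg: float = 60.0) -> bool:
    """Flag a fall when the torso is closer to horizontal than vertical."""
    return torso_angle_deg(shoulder_mid, hip_mid) > threshold_deg
```

A practical system would smooth this decision over several frames and combine it with other joint relationships rather than rely on a single angle.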
Construction of a Large-scale 3D Object Dataset for Six-Degrees-of-Freedom Pose Estimation
장재훈, 김준용, 김성흠, Institute of Control, Robotics and Systems (ICROS), 2023, Journal of Institute of Control, Robotics and Systems, Vol. 29, No. 12
Given the growing necessity of substantial human annotations in deep learning systems to enhance functionality and performance, it is imperative for researchers to scrutinize existing databases and develop their own datasets with custom labels, particularly for target applications such as object detection and pose estimation. This study introduces a large-scale 3D object dataset tailored for six-degrees-of-freedom pose estimation in real-world scenarios. We describe the key features of our datasets available in the AI Hub, emphasizing the expansive 3D object collection. Our methodology involves establishing correspondences for the eight points of an object cube in a 2D image, with the object's pose determined using the conventional perspective-n-point (PnP) algorithm. To analyze the reprojection error, we employed a high-quality 3D mesh model and a binary mask of the target object in the RGB image. For database validation, all object categories were tested using a representative YOLO-like convolutional neural network architecture, such as real-time single-shot pose estimation. In addition, we conduct an in-depth analysis of the current database's limitations. In the AI Hub, we released all information regarding our new database in a format consistent with our baseline database, LINEMOD, and conducted a comparative analysis against this baseline. To overcome the scalability concerns associated with unseen object categories, we explored an effective methodology that leverages vision-and-language knowledge distillation.
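The cube-corner annotation and reprojection-error check described above can be sketched as follows. The helper names and camera intrinsics are illustrative, and the PnP step itself (e.g., OpenCV's `solvePnP`) is omitted; only the projection and error measurement are shown:

```python
import numpy as np

def cube_corners(w: float, h: float, d: float) -> np.ndarray:
    """Eight corners of an axis-aligned object cube centered at the origin."""
    signs = np.array([[x, y, z] for x in (-w, w) for y in (-h, h) for z in (-d, d)])
    return 0.5 * signs.astype(float)

def project_points(pts_3d: np.ndarray, R: np.ndarray,
                   t: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Project 3D points into the image with pose [R|t] and intrinsics K."""
    cam = pts_3d @ R.T + t          # object frame -> camera frame
    uv = cam @ K.T                  # pinhole projection
    return uv[:, :2] / uv[:, 2:3]   # perspective divide

def reprojection_error(pts_2d: np.ndarray, pts_3d: np.ndarray,
                       R: np.ndarray, t: np.ndarray, K: np.ndarray) -> float:
    """Mean Euclidean distance between annotated and reprojected corners."""
    diff = project_points(pts_3d, R, t, K) - pts_2d
    return float(np.linalg.norm(diff, axis=1).mean())
```

With ground-truth 2D corner annotations in hand, a pose recovered by PnP can be scored against them with `reprojection_error`, which is the kind of quantity the dataset's quality analysis relies on.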