3D generative model from a single image (original Korean title: 단일 이미지 기반 3차원 생성 모델)

Abstract (translated from Korean)

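The strong performance of diffusion models has recently been demonstrated, drawing attention in computer vision, particularly for 3D object generation and texture transfer. Existing approaches based on Textual Inversion learn a pseudo-text prompt in a text-to-image generation model and can reproduce the concept and style of a target object, but they are limited by additional training time and a lack of controllability. To address this, Chapter 1 proposes (1) a method for generating 3D objects with a general-purpose image adapter (IP-Adapter) that supports additional controls such as depth, pose, and text without textual inversion, and (2) a depth-conditioned warmup strategy for improving 3D consistency. In experiments, the proposed method outperformed existing models both qualitatively and quantitatively, and a user study also rated it highly on fidelity to the input image and on 3D consistency. Chapter 2 proposes an adaptive attention scaling technique that depends on the diffusion timestep, in order to perform texture transfer effectively across categories. The technique takes into account both the text, which carries mesh information, and the image, which carries texture information, to enable optimal texture transfer. In addition, to address the loss of 3D consistency caused by single-image input, it proposes techniques for controlling the computation in the self-attention and cross-attention blocks. In experiments, the proposed method showed better texture generation than existing models while maintaining strong 3D consistency.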

Multilingual Abstract

Recent advancements in diffusion models have demonstrated their remarkable performance, gaining significant attention in the field of computer vision, particularly in 3D object generation and texture transfer tasks. Traditional approaches based on Textual Inversion learn pseudo-text prompts in text-to-image generation models to synthesize the target object’s concept and style. However, these methods are limited by additional training time and a lack of control. To address these challenges, Chapter 1 proposes (1) a method for generating 3D objects using an image adapter (IP-Adapter) that supports additional controls such as depth, pose, and text without requiring textual inversion, and (2) a depth-conditioned warmup strategy to enhance 3D consistency. Experimental results indicate that the proposed approach outperforms existing models both qualitatively and quantitatively. Moreover, in a user study, the approach was rated highly for fidelity to the input image and for 3D consistency.

In Chapter 2, to effectively perform texture transfer across categories, an adaptive attention scaling mechanism based on timesteps in diffusion models is proposed. This method optimally facilitates texture transfer by considering both text containing semantic information and images containing texture details. Additionally, to address the issue of reduced 3D consistency caused by single-image input, a control mechanism is introduced in self-attention and cross-attention blocks. Experimental results confirm that the proposed method not only achieves superior texture generation capabilities compared to existing models but also maintains 3D consistency.
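As a rough illustration of what timestep-dependent attention scaling between text and image conditions could look like, the sketch below follows the decoupled cross-attention form used by IP-Adapter, where an image-conditioned attention output is added to the text-conditioned one with a scale factor. The linear schedule, its direction (text dominating at noisy timesteps, image at clean ones), the bounds `lo` and `hi`, and all function names are assumptions made for illustration; the abstract does not specify the thesis's actual schedule.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-query scaled dot-product attention on plain Python lists."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]


def image_scale(t, num_steps, lo=0.2, hi=1.0):
    """Hypothetical linear schedule (an assumption, not the thesis's rule).

    t = num_steps - 1 is the noisiest step and gets the smallest image weight;
    t = 0 is the cleanest step and gets the largest.
    """
    frac = t / (num_steps - 1)
    return hi - (hi - lo) * frac


def blended_cross_attention(query, text_kv, image_kv, t, num_steps):
    """Text attention plus timestep-scaled image attention (IP-Adapter-style sum)."""
    text_out = attend(query, *text_kv)
    image_out = attend(query, *image_kv)
    s = image_scale(t, num_steps)
    return [a + s * b for a, b in zip(text_out, image_out)]
```

For example, `blended_cross_attention(q, (k_txt, v_txt), (k_img, v_img), t, 1000)` weights the image branch more heavily as `t` approaches 0 under this assumed schedule; a different schedule shape or direction would only require changing `image_scale`.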

Table of Contents

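1 3D object generation from a single image
1.1 Summary
1.2 Introduction
1.3 Related work
1.3.1 3D representations using neural networks
1.3.2 Diffusion models
1.3.3 3D generative models
1.4 Proposed method
1.4.1 Problem definition
1.4.2 Controllable image-prompt score distillation sampling
1.4.3 Depth-conditioned warmup strategy
1.5 Experiments
1.5.1 Experimental details
1.5.2 Datasets
1.5.3 Quantitative evaluation
1.5.4 Qualitative evaluation
1.5.5 Discussion
1.5.6 User study
1.6 Conclusion
2 3D texture transfer from a single image
2.1 Summary
2.2 Introduction
2.3 Main body
2.3.1 Timestep-dependent adaptive attention scaling between text and image
2.3.2 Self-attention control through view alignment
2.3.3 Text-attention control through view dependence
2.4 Experiments and discussion
2.5 Conclusion
References
Abstract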
