Deep Generative Priors for View Synthesis at Scale.
Details
- Material type: Thesis (overseas)
- Main entry, personal name: Gao, Hang.
- Title and statement of responsibility: Deep Generative Priors for View Synthesis at Scale.
- Publication, distribution, etc.: [S.l.] : University of California, Berkeley, 2025
- Publication, distribution, etc.: Ann Arbor : ProQuest Dissertations & Theses, 2025
- Physical description: 123 p.
- General note: Source: Dissertations Abstracts International, Volume: 87-04, Section: A.
- General note: Advisor: Kanazawa, Angjoo.
- Dissertation note: Thesis (Ph.D.)--University of California, Berkeley, 2025.
- Summary note (abstract): View synthesis, the task of generating photorealistic images of a scene from novel camera viewpoints, is a cornerstone of computer vision, underpinning graphics, immersive reality, and embodied AI. Yet despite its importance, view synthesis has not demonstrated scaling properties comparable to those in language or 2D generation, even when provided with more data and compute: reconstruction-based methods collapse under sparse views or scene motion, while generative models struggle with 3D consistency and precise camera control. This thesis shows that deep generative priors, instantiated as diffusion models conditioned on camera poses, bridge this gap. We proceed in three steps. First, we reveal that state-of-the-art dynamic view-synthesis benchmarks quietly rely on multi-view cues; removing those cues triggers steep performance drops and exposes the brittleness of reconstruction-based models. Then, we present a working solution that injects learned monocular depth and long-range tracking priors into a dynamic 3D Gaussian scene representation, recovering globally consistent geometry and motion from a single video. Finally, we abandon explicit reconstruction altogether, coupling camera-conditioned diffusion with a two-pass sampling strategy to synthesize minute-long, camera-controlled videos from as little as one input image. From diagnosing the limits of reconstruction, to augmenting it with data-driven regularizers, to replacing it with a fully generative pipeline, our results trace a clear progression that delivers state-of-the-art fidelity, temporal coherence, and camera-control precision while requiring orders of magnitude less input signal. We conclude by outlining open challenges and future directions for scaling view synthesis to truly world-scale 3D environments.
- Subject added entry, topical term: Computer science.
- Subject added entry, topical term: Computer engineering.
- Subject added entry, topical term: Information science.
- Uncontrolled index term: Deep generative
- Uncontrolled index term: Camera control
- Uncontrolled index term: 3D Gaussian scene
- Uncontrolled index term: Graphics
- Added entry, corporate name: University of California, Berkeley. Electrical Engineering & Computer Sciences.
- Host item entry: Dissertations Abstracts International. 87-04A.
- Electronic location and access: View full text (http://www.riss.kr/pdu/ddodLink.do?id=T17359367)
MARC
■001000017359367
■00520260202105109
■006m o d
■007cr#unu||||||||
■008260219s2025 us ||||||||||||||c||eng d
■020 ▼a9798297601284
■035 ▼a(MiAaPQ)AAI32236843
■040 ▼aMiAaPQ▼cMiAaPQ
■0820 ▼a004
■1001 ▼aGao, Hang.
■24510▼aDeep Generative Priors for View Synthesis at Scale.
■260 ▼a[S.l.]▼bUniversity of California, Berkeley. ▼c2025
■260 1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2025
■300 ▼a123 p.
■500 ▼aSource: Dissertations Abstracts International, Volume: 87-04, Section: A.
■500 ▼aAdvisor: Kanazawa, Angjoo.
■5021 ▼aThesis (Ph.D.)--University of California, Berkeley, 2025.
■520 ▼aView synthesis, the task of generating photorealistic images of a scene from novel camera viewpoints, is a cornerstone of computer vision, underpinning graphics, immersive reality, and embodied AI. Yet despite its importance, view synthesis has not demonstrated scaling properties comparable to those in language or 2D generation, even when provided with more data and compute: reconstruction-based methods collapse under sparse views or scene motion, while generative models struggle with 3D consistency and precise camera control. This thesis shows that deep generative priors, instantiated as diffusion models conditioned on camera poses, bridge this gap. We proceed in three steps. First, we reveal that state-of-the-art dynamic view-synthesis benchmarks quietly rely on multi-view cues; removing those cues triggers steep performance drops and exposes the brittleness of reconstruction-based models. Then, we present a working solution that injects learned monocular depth and long-range tracking priors into a dynamic 3D Gaussian scene representation, recovering globally consistent geometry and motion from a single video. Finally, we abandon explicit reconstruction altogether, coupling camera-conditioned diffusion with a two-pass sampling strategy to synthesize minute-long, camera-controlled videos from as little as one input image. From diagnosing the limits of reconstruction, to augmenting it with data-driven regularizers, to replacing it with a fully generative pipeline, our results trace a clear progression that delivers state-of-the-art fidelity, temporal coherence, and camera-control precision while requiring orders of magnitude less input signal. We conclude by outlining open challenges and future directions for scaling view synthesis to truly world-scale 3D environments.
■590 ▼aSchool code: 0028.
■650 4▼aComputer science.
■650 4▼aComputer engineering.
■650 4▼aInformation science.
■653 ▼aDeep generative
■653 ▼aCamera control
■653 ▼a3D Gaussian scene
■653 ▼aGraphics
■690 ▼a0984
■690 ▼a0464
■690 ▼a0723
■71020▼aUniversity of California, Berkeley▼bElectrical Engineering & Computer Sciences.
■7730 ▼tDissertations Abstracts International▼g87-04A.
■790 ▼a0028
■791 ▼aPh.D.
■792 ▼a2025
■793 ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T17359367▼nKERIS▼zThe full text of this material is provided by the Korea Education and Research Information Service (KERIS).


