
Graphics

New submissions

New submissions for Tue, 7 May 24

[1]  arXiv:2405.02759 [pdf, other]
Title: Region-Aware Color Smudging
Subjects: Graphics (cs.GR)

Color smudge operations from digital painting software enable users to create natural shading effects in high-fidelity paintings by interactively mixing colors. To precisely control results in traditional painting software, users tend to organize flat-filled color regions in multiple layers and smudge them to generate different color gradients. However, the requirement to carefully deal with regions makes the smudging process time-consuming and laborious, especially for non-professional users. This motivates us to investigate how to infer user-desired smudging effects when users smudge over regions in a single layer. To investigate improving color smudge performance, we first conduct a formative study. Following the findings of this study, we design SmartSmudge, a novel smudge tool that offers users dynamic smudge brushes and real-time region selection for easily and efficiently generating natural shading effects. We demonstrate the efficiency and effectiveness of the proposed tool via a user study and quantitative analysis.

Cross-lists for Tue, 7 May 24

[2]  arXiv:2405.02386 (cross-list from cs.CV) [pdf, other]
Title: Rip-NeRF: Anti-aliasing Radiance Fields with Ripmap-Encoded Platonic Solids
Comments: SIGGRAPH 2024, Project page: this https URL , Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Despite significant advancements in Neural Radiance Fields (NeRFs), the renderings may still suffer from aliasing and blurring artifacts, since it remains a fundamental challenge to effectively and efficiently characterize anisotropic areas induced by the cone-casting procedure. This paper introduces a Ripmap-Encoded Platonic Solid representation to precisely and efficiently featurize 3D anisotropic areas, achieving high-fidelity anti-aliasing renderings. Central to our approach are two key components: Platonic Solid Projection and Ripmap encoding. The Platonic Solid Projection factorizes the 3D space onto the non-parallel faces of a certain Platonic solid, such that the anisotropic 3D areas can be projected onto planes with distinguishable characterization. Meanwhile, each face of the Platonic solid is encoded by the Ripmap encoding, which is constructed by anisotropically pre-filtering a learnable feature grid, to enable featurizing the projected anisotropic areas both precisely and efficiently by anisotropic area-sampling. Extensive experiments on both well-established synthetic datasets and a newly captured real-world dataset demonstrate that our Rip-NeRF attains state-of-the-art rendering quality, particularly excelling in the fine details of repetitive structures and textures, while maintaining relatively swift training times.
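
As a rough illustration of what "anisotropically pre-filtering" a grid means, the sketch below builds a classic ripmap over a plain 2D grid of numbers: every level (i, j) is the input averaged down 2**i times along rows and 2**j times along columns, independently per axis. This is the textbook ripmap construction with a box filter, not the paper's implementation; the function names are hypothetical, the grid holds scalars rather than learnable feature vectors, and power-of-two dimensions are assumed.

```python
def halve_rows(grid):
    """Average each pair of adjacent rows (box filter along the vertical axis)."""
    return [
        [(a + b) / 2.0 for a, b in zip(grid[r], grid[r + 1])]
        for r in range(0, len(grid) - 1, 2)
    ]


def halve_cols(grid):
    """Average each pair of adjacent columns (box filter along the horizontal axis)."""
    return [
        [(row[c] + row[c + 1]) / 2.0 for c in range(0, len(row) - 1, 2)]
        for row in grid
    ]


def build_ripmap(grid):
    """Return {(i, j): level}, where level is `grid` halved i times
    vertically and j times horizontally.

    Assumes a non-empty rectangular grid with power-of-two dimensions.
    """
    ripmap = {}
    row_level = grid
    i = 0
    while True:
        level = row_level
        j = 0
        while True:
            ripmap[(i, j)] = level
            if len(level[0]) < 2:
                break
            level = halve_cols(level)
            j += 1
        if len(row_level) < 2:
            break
        row_level = halve_rows(row_level)
        i += 1
    return ripmap


if __name__ == "__main__":
    grid = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    levels = build_ripmap(grid)
    for (i, j), level in sorted(levels.items()):
        print((i, j), f"{len(level)}x{len(level[0])}")
```

Unlike a mipmap, which only stores the diagonal levels (i == j), a ripmap keeps levels such as (0, 2) that are filtered along one axis only, which is what makes lookups for elongated footprints possible.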

[3]  arXiv:2405.02508 (cross-list from cs.CV) [pdf, other]
Title: Rasterized Edge Gradients: Handling Discontinuities Differentiably
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Computing the gradients of a rendering process is paramount for diverse applications in computer vision and graphics. However, accurate computation of these gradients is challenging due to discontinuities and rendering approximations, particularly for surface-based representations and rasterization-based rendering. We present a novel method for computing gradients at visibility discontinuities for rasterization-based differentiable renderers. Our method elegantly simplifies the traditionally complex problem through a carefully designed approximation strategy, allowing for a straightforward, effective, and performant solution. We introduce a novel concept of micro-edges, which allows us to treat the rasterized images as outcomes of a differentiable, continuous process aligned with the inherently non-differentiable, discrete-pixel rasterization. This technique eliminates the necessity for rendering approximations or other modifications to the forward pass, preserving the integrity of the rendered image, which makes it applicable to rasterized masks, depth, and normals images where filtering is prohibitive. Utilizing micro-edges simplifies gradient interpretation at discontinuities and enables handling of geometry intersections, offering an advantage over the prior art. We showcase our method in dynamic human head scene reconstruction, demonstrating effective handling of camera images and segmentation masks.

[4]  arXiv:2405.02672 (cross-list from cs.HC) [pdf, other]
Title: Effects of Realism and Representation on Self-Embodied Avatars in Immersive Virtual Environments
Subjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR)

Virtual Reality (VR) has recently gained traction with many new and ever more affordable devices being released. The increase in popularity of this paradigm of interaction has given birth to new applications and has attracted casual consumers to experience VR. Providing a self-embodied representation (avatar) of users' full bodies inside shared virtual spaces can improve the VR experience and make it more engaging to both new and experienced users. This is especially important in fully immersive systems, where the equipment completely occludes the real world, making self-awareness problematic. Indeed, the feeling of presence of the user is highly influenced by their virtual representations, even though small flaws could lead to uncanny valley side-effects. Following previous research, we would like to assess whether using a third-person perspective could also benefit the VR experience, via an improved spatial awareness of the user's virtual surroundings. In this paper we investigate realism and perspective of self-embodied representation in VR setups in natural tasks, such as walking and avoiding obstacles. We compare both first- and third-person perspectives with three different levels of realism in avatar representation. These range from a stylized abstract avatar, to a "realistic" mesh-based humanoid representation and a point-cloud rendering. The latter uses data captured via depth sensors and mapped into a virtual self inside the Virtual Environment. We present a thorough evaluation and comparison of these different representations, describing a series of guidelines for self-embodied VR applications. The effects of the uncanny valley are also discussed in the context of navigation and reflex-based tasks.

[5]  arXiv:2405.02676 (cross-list from cs.CV) [pdf, other]
Title: Hand-Object Interaction Controller (HOIC): Deep Reinforcement Learning for Reconstructing Interactions with Physics
Comments: SIGGRAPH 2024 Conference Track
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Manipulating objects by hand is an important interaction motion in our daily activities. We faithfully reconstruct this motion with a single RGBD camera using a novel deep reinforcement learning method that leverages physics. Firstly, we propose object compensation control, which establishes direct object control to make the network training more stable. Meanwhile, by leveraging the compensation force and torque, we seamlessly upgrade the simple point contact model to a more physically plausible surface contact model, further improving the reconstruction accuracy and physical correctness. Experiments indicate that, without involving any heuristic physical rules, this work still successfully involves physics in the reconstruction of hand-object interactions, which are complex motions that are hard to imitate with deep reinforcement learning. Our code and data are available at https://github.com/hu-hy17/HOIC.

[6]  arXiv:2405.03221 (cross-list from cs.CV) [pdf, other]
Title: Spatial and Surface Correspondence Field for Interaction Transfer
Comments: Accepted to SIGGRAPH 2024, project page at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)

In this paper, we introduce a new method for the task of interaction transfer. Given an example interaction between a source object and an agent, our method can automatically infer both surface and spatial relationships for the agent and target objects within the same category, yielding more accurate and valid transfers. Specifically, our method characterizes the example interaction using a combined spatial and surface representation. We correspond the agent points and object points related to the representation to the target object space using a learned spatial and surface correspondence field, which represents objects as deformed and rotated signed distance fields. With the corresponded points, an optimization is performed under the constraints of our spatial and surface interaction representation and additional regularization. Experiments conducted on human-chair and hand-mug interaction transfer tasks show that our approach can handle larger geometry and topology variations between source and target shapes, significantly outperforming state-of-the-art methods.

[7]  arXiv:2405.03417 (cross-list from cs.CV) [pdf, other]
Title: Gaussian Splatting: 3D Reconstruction and Novel View Synthesis, a Review
Comments: 24 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Image-based 3D reconstruction is a challenging task that involves inferring the 3D shape of an object or scene from a set of input images. Learning-based methods have gained attention for their ability to directly estimate 3D shapes. This review paper focuses on state-of-the-art techniques for 3D reconstruction, including the generation of novel, unseen views. An overview of recent developments in the Gaussian Splatting method is provided, covering input types, model structures, output representations, and training strategies. Unresolved challenges and future directions are also discussed. Given the rapid progress in this domain and the numerous opportunities for enhancing 3D reconstruction methods, a comprehensive examination of algorithms appears essential. Consequently, this study offers a thorough overview of the latest advancements in Gaussian Splatting.

[8]  arXiv:2405.03485 (cross-list from cs.CV) [pdf, other]
Title: LGTM: Local-to-Global Text-Driven Human Motion Diffusion Model
Comments: 9 pages, 7 figures, SIGGRAPH 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

In this paper, we introduce LGTM, a novel Local-to-Global pipeline for Text-to-Motion generation. LGTM utilizes a diffusion-based architecture and aims to address the challenge of accurately translating textual descriptions into semantically coherent human motion in computer animation. Specifically, traditional methods often struggle with semantic discrepancies, particularly in aligning specific motions to the correct body parts. To address this issue, we propose a two-stage pipeline: it first employs large language models (LLMs) to decompose global motion descriptions into part-specific narratives, which are then processed by independent body-part motion encoders to ensure precise local semantic alignment. Finally, an attention-based full-body optimizer refines the motion generation results and guarantees the overall coherence. Our experiments demonstrate that LGTM gains significant improvements in generating locally accurate, semantically aligned human motion, marking a notable advancement in text-to-motion applications. Code and data for this paper are available at https://github.com/L-Sun/LGTM

[9]  arXiv:2405.03659 (cross-list from cs.CV) [pdf, other]
Title: A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Novel view synthesis from a sparse set of input images is a challenging problem of great practical interest, especially when camera poses are absent or inaccurate. Direct optimization of camera poses and usage of estimated depths in neural radiance field algorithms usually do not produce good results because of the coupling between poses and depths, and inaccuracies in monocular depth estimation. In this paper, we leverage the recent 3D Gaussian splatting method to develop a novel construct-and-optimize method for sparse view synthesis without camera poses. Specifically, we construct a solution progressively by using monocular depth and projecting pixels back into the 3D world. During construction, we optimize the solution by detecting 2D correspondences between training views and the corresponding rendered images. We develop a unified differentiable pipeline for camera registration and adjustment of both camera poses and depths, followed by back-projection. We also introduce a novel notion of an expected surface in Gaussian splatting, which is critical to our optimization. These steps enable a coarse solution, which can then be low-pass filtered and refined using standard optimization methods. We demonstrate results on the Tanks and Temples and Static Hikes datasets with as few as three widely-spaced views, showing significantly better quality than competing methods, including those with approximate camera pose information. Moreover, our results improve with more views and outperform previous InstantNGP and Gaussian Splatting algorithms even when using half the dataset.
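
The step of "projecting pixels back into the 3D world" from a depth estimate can be pictured with the standard pinhole camera model. The sketch below is only that generic model, not the paper's pipeline: the function names, the intrinsics `fx, fy, cx, cy`, and the camera-to-world pose `(rotation, translation)` are all illustrative assumptions.

```python
def back_project(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with depth along the camera z-axis to a 3D point
    in camera coordinates, using pinhole intrinsics (assumed known)."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)


def camera_to_world(point, rotation, translation):
    """Apply a rigid camera-to-world transform: R @ point + t.

    `rotation` is a 3x3 nested list, `translation` a length-3 sequence.
    """
    return tuple(
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )


if __name__ == "__main__":
    # Hypothetical 640x480 camera with a 500-pixel focal length.
    p_cam = back_project(420, 240, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    p_world = camera_to_world(p_cam, identity, (0.0, 0.0, 1.0))
    print(p_cam, p_world)
```

Because the 3D point depends on both the depth and the pose, an error in either one moves the point, which is one way to read the coupling between poses and depths that the abstract mentions.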

Replacements for Tue, 7 May 24

[10]  arXiv:2310.02043 (replaced) [pdf, other]
Title: View-Independent Adjoint Light Tracing for Lighting Design Optimization
Subjects: Graphics (cs.GR)
[11]  arXiv:2402.02771 (replaced) [pdf, other]
Title: TensoSDF: Roughness-aware Tensorial Representation for Robust Geometry and Material Reconstruction
Comments: Accepted by SIGGRAPH 2024
Subjects: Graphics (cs.GR)
[12]  arXiv:2403.06321 (replaced) [pdf, other]
Title: Vertex Block Descent
Subjects: Graphics (cs.GR)
[13]  arXiv:2306.12422 (replaced) [pdf, other]
Title: DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation
Comments: ICLR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
[14]  arXiv:2308.10638 (replaced) [pdf, other]
Title: SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes
Comments: Updated to camera ready version of CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
[15]  arXiv:2311.10093 (replaced) [pdf, other]
Title: The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Comments: Accepted to SIGGRAPH 2024. Project page is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
[16]  arXiv:2311.17834 (replaced) [pdf, other]
Title: Spice-E: Structural Priors in 3D Diffusion using Cross-Entity Attention
Comments: Project webpage: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[17]  arXiv:2401.00847 (replaced) [pdf, other]
Title: Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera
Authors: Jiye Lee, Hanbyul Joo
Comments: Accepted to CVPR 2024; Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[18]  arXiv:2403.15064 (replaced) [pdf, other]
Title: Recent Trends in 3D Reconstruction of General Non-Rigid Scenes
Comments: 42 pages, 18 figures, 5 tables; State-of-the-Art Report at EUROGRAPHICS 2024. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[19]  arXiv:2404.11565 (replaced) [pdf, other]
Title: MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation
Comments: Project Website: this https URL, Same as previous version, only updated metadata because bib was missing an author name
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
[20]  arXiv:2405.01558 (replaced) [pdf, other]
Title: Configurable Learned Holography
Comments: 14 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Optics (physics.optics)