Open-Vocabulary 3D Scene Understanding towards Embodied Manipulation

Staff - Faculty of Informatics

Date: 7 June 2024 / 16:15 - 17:00

USI East Campus, Room C1.04

Speaker: Francis Engelmann - ETH Zurich

Abstract: 3D scene understanding is a key ability that allows humans (and many other living species) to navigate and interact with the environment around us. Bringing these capabilities to intelligent devices (e.g., household robots, smart glasses) is a key effort in current 3D vision research and embodied AI. In this talk, I will present general deep learning models that address a wide variety of 3D scene understanding tasks across multiple modalities, including 3D instance segmentation, vectorized floorplan estimation, and human body-part segmentation.

In the second part of the talk, I will discuss multi-modal foundation models for 3D scene understanding, in particular large vision-language models (VLMs), which enable possibilities well beyond conventional closed-set 3D vision methods that are constrained to predefined object categories. With this new paradigm, we can relax these strict constraints and obtain open-vocabulary 3D scene representations for querying arbitrary object classes, recognizing scene functionalities and affordances, and more.

Biography: Francis Engelmann is a postdoctoral researcher at ETH Zurich collaborating with Prof. Marc Pollefeys, and a visiting researcher at Google Zurich collaborating with Federico Tombari. His current research interests lie at the intersection of deep learning, computer vision, and large vision-language models. His research focuses on 3D scene understanding and representations for open-vocabulary search and manipulation. Prior to joining ETH Zurich, he obtained his Ph.D. from RWTH Aachen under Prof. Bastian Leibe. Francis is a Fellow of the ETH AI Center, a member of the ELLIS Society, and a recipient of the ETHZ Career Seed Award and an SNSF Postdoc.Mobility fellowship.

Host: Prof. Marc Langheinrich