
Michelangelo: Using A Shape-Image-Text-Aligned Space To Create and Translate 3D Shapes

Do you want to create 3D shapes from images, text or sketches? In this document, we describe Michelangelo, a novel model that can do that and more. Michelangelo is a conditional generative model that learns a shape-image-text-aligned space. This space allows Michelangelo to generate realistic and diverse 3D shapes that match the given condition. It also allows Michelangelo to translate between different inputs.

Uploaded by

My Social

To read more such articles, please visit our blog https://socialviews81.blogspot.com/

Michelangelo: Using a Shape-Image-Text-Aligned Space to Create and Translate 3D Shapes

Introduction

Michelangelo is a new deep learning model developed by a team of
researchers from ShanghaiTech University, Tencent PCG, Fudan University,
the Shanghai Engineering Research Center of Intelligent Vision and
Imaging, and the Shanghai Engineering Research Center of Energy
Efficient and Custom AI IC. These institutions work across computer
vision, natural language processing, artificial intelligence and 3D
graphics. The model can generate realistic and diverse 3D shapes from
different modalities, such as images, text or sketches. The motivation
behind it was to enable creative exploration and manipulation of 3D
shapes using natural language and visual cues.


What is Michelangelo?

Michelangelo is a conditional generative model, built from a variational
auto-encoder and a latent diffusion model, that learns a shared latent
representation for 3D shapes, images and text. It can then use this
representation to generate 3D shapes that match the given condition,
such as an image, a text description or a sketch. Michelangelo can also
perform cross-modal translation, such as converting an image to a text
description or a text description to a sketch.

Key Features of Michelangelo

Some of the key features of Michelangelo are:

● It can generate high-quality 3D shapes that are realistic, diverse
and consistent with the given condition.
● It can handle complex and fine-grained conditions, such as
multiple objects, attributes, poses and viewpoints.
● It can generate 3D shapes from different modalities, such as
images, text or sketches, and perform cross-modal translation
between them.
● It can generate 3D shapes in various formats, such as point
clouds, voxels or meshes.
● It can generate 3D shapes for different categories, such as
animals, cars or chairs.

Capabilities and Use Cases of Michelangelo

Michelangelo has potential applications in several domains, such as:

● Computer graphics and animation: Michelangelo can be used to
create realistic and diverse 3D models for games, movies or virtual
reality.
● Computer vision and robotics: Michelangelo can be used to
recognize and manipulate 3D objects from images or text.
● Education and art: Michelangelo can be used to teach and learn
about 3D shapes and their properties using natural language and
visual cues.
● Design and engineering: Michelangelo can be used to explore and
prototype new 3D designs using sketches or descriptions.

How does Michelangelo work?

Michelangelo is a model that can create 3D shapes from different kinds
of inputs, such as pictures, words or drawings. It can also change one
kind of input into another, such as turning a picture into words or words
into a drawing. To do this, Michelangelo uses two main parts: a
SITA-VAE and an ASLDM.

source - https://neuralcarver.github.io/michelangelo/

The SITA-VAE (Shape-Image-Text-Aligned Variational Auto-Encoder) is
like a translator that can speak three languages: 3D shapes, pictures
and words. It can take any of these inputs and turn them into a code that
can be understood by the other parts. For example, it can take a picture
of a cat and turn it into a code that can be used to make a 3D shape of a
cat. The SITA-VAE has three sub-parts: a shape encoder, an image
encoder and a text encoder. Each sub-part takes one kind of input and
turns it into a code. The codes are all in the same format, so codes
that come from different kinds of input can be compared with each
other directly. This is what lets the SITA-VAE translate between 3D
shapes, pictures and words.
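
To make the idea of a shared code format concrete, here is a minimal
sketch in Python. It is not the authors' code: the three-number codes
are hand-written stand-ins for what real encoders would produce, and the
names cosine_similarity and nearest_shape are hypothetical. It only
shows that once every input lives in the same space, matching a picture
or a sentence to a shape reduces to comparing vectors.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Assumption: hand-written toy codes standing in for real encoder outputs.
shape_codes = {
    "cat_mesh": [0.9, 0.1, 0.0],
    "chair_mesh": [0.0, 0.2, 0.9],
}
image_code = [0.8, 0.2, 0.1]   # pretend image-encoder output for a cat photo
text_code = [0.1, 0.1, 0.95]   # pretend text-encoder output for "a chair"


def nearest_shape(query_code, codes):
    """Pick the shape whose code points in the most similar direction."""
    return max(codes, key=lambda name: cosine_similarity(query_code, codes[name]))


print(nearest_shape(image_code, shape_codes))
print(nearest_shape(text_code, shape_codes))
```

In a trained model the alignment between the three encoders is learned
from data rather than written by hand, but the comparison step looks
much like this.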

The ASLDM (Aligned Shape Latent Diffusion Model) is like an artist that
can make 3D shapes from codes. As a diffusion model, it is trained on
codes to which randomness has been added slowly, over many small steps,
and it learns to undo that randomness. To make a new shape, it starts
from a random code and cleans it up step by step until a usable code
remains, which is then turned into a 3D shape. Working in small steps
is what lets the ASLDM learn to make 3D shapes that are smooth and
realistic.
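
The "adding randomness slowly" step can be sketched as follows. This is
a generic illustration of the noising used in standard diffusion models,
not the ASLDM's actual settings: the linear schedule, the step count and
the names make_alpha_bars and add_noise are all assumptions. The learned
part of a diffusion model, the network that removes the noise again, is
not shown.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal variance left after each step.

    Assumes a linear noise schedule; the schedule values are illustrative.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta  # each step keeps a little less of the signal
        alpha_bars.append(running)
    return alpha_bars


def add_noise(latent, alpha_bar, rng):
    """Blend a clean code with Gaussian noise for a given noise level."""
    signal = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal * x + noise_scale * rng.gauss(0.0, 1.0) for x in latent]


alpha_bars = make_alpha_bars(1000)
rng = random.Random(0)
clean_code = [0.5, -1.0, 2.0]  # toy stand-in for a shape code
slightly_noisy = add_noise(clean_code, alpha_bars[0], rng)
almost_pure_noise = add_noise(clean_code, alpha_bars[-1], rng)
```

Early steps leave the code almost unchanged, while the last steps leave
almost nothing but noise. Training teaches the model to walk this path
in the opposite direction.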

The ASLDM can also make 3D shapes from pictures or words. Because the
SITA-VAE puts the codes for 3D shapes, pictures and words in the same
format, a code that comes from a picture or a sentence can guide the
ASLDM toward a 3D shape that matches that picture or sentence.
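
The overall control flow of such guided generation can be sketched with
a toy loop. Everything here is a stand-in: toy_denoise_step simply
nudges the code toward the condition, whereas the real model uses a
trained neural network, and both function names are hypothetical. The
point is only the order of operations: start from random values, then
refine them over many steps while looking at the condition code.

```python
import random


def toy_denoise_step(latent, condition_code, strength=0.2):
    """Stand-in for a learned, condition-aware denoiser.

    Assumption: moves each value a fixed fraction toward the condition.
    A real denoiser is a neural network, not this formula.
    """
    return [x + strength * (c - x) for x, c in zip(latent, condition_code)]


def generate_latent(condition_code, num_steps=50, seed=0):
    """Start from random noise and refine it step by step."""
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in condition_code]
    for _ in range(num_steps):
        latent = toy_denoise_step(latent, condition_code)
    return latent


condition = [0.5, -1.0, 2.0]  # toy code from a picture or a sentence
result = generate_latent(condition)
```

In the real system the resulting code would then be decoded into a 3D
shape; that decoding step is outside the scope of this sketch.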

Performance Evaluation

To measure the capabilities of Michelangelo, the researchers ran a
series of tests on several datasets, such as ShapeNet, ModelNet and
COCO. They also compared Michelangelo against other methods designed to
generate 3D shapes from images or text, including Occ, ConvOcc, IF-Net,
3DILG and 3DS2V. The findings showed that Michelangelo outperformed the
other methods in creating accurate and diverse 3D shapes.

source - https://arxiv.org/pdf/2306.17115.pdf


The researchers also examined how well Michelangelo could recreate a 3D
shape from its code, using the ShapeNet dataset. They then tested its
ability to generate 3D shapes from both images and text, using a
combined dataset of ShapeNet and 3D Cartoon Monster. The same codes and
inputs were used for the other methods as well. In these tests,
Michelangelo performed better than the other methods in most
categories, and it produced 3D shapes that closely resembled the
original ones within each category.
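
One common way to put a number on how closely a recreated shape
resembles the original is intersection over union (IoU) on occupancy
grids. The sketch below is illustrative only: the function name
voxel_iou and the flat list representation are assumptions, and it is
not claimed to match the exact evaluation protocol used by the
researchers.

```python
def voxel_iou(occupancy_a, occupancy_b):
    """Intersection over union of two flattened occupancy grids.

    Each grid is a sequence of 0/1 (or False/True) values, one per cell.
    Returns 1.0 when both grids are completely empty.
    """
    if len(occupancy_a) != len(occupancy_b):
        raise ValueError("grids must have the same number of cells")
    intersection = sum(1 for a, b in zip(occupancy_a, occupancy_b) if a and b)
    union = sum(1 for a, b in zip(occupancy_a, occupancy_b) if a or b)
    return intersection / union if union else 1.0


original = [1, 1, 0, 0]   # toy 4-cell grid of the original shape
recreated = [1, 0, 1, 0]  # toy 4-cell grid of the recreated shape
score = voxel_iou(original, recreated)
```

A score of 1.0 means the two shapes fill exactly the same cells, and
lower scores mean less overlap.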

Across these datasets and comparisons, Michelangelo produced more
accurate and diverse 3D shapes than the other methods tested, which
points to its potential for 3D shape generation.

How to access and use this model?

Michelangelo is open-source and can be run locally. You can find the
source code, the pre-trained models and the instructions on how to run
the model in the GitHub repository. The model is licensed under the MIT
License, which means you can use it for any purpose, as long as you
give credit to the original authors. The LICENSE file in the repository
has the exact terms.

The Michelangelo project page shows shapes produced by 3DS2V, 3DILG and
Michelangelo itself. The results are displayed side by side, and users
can rotate them in various directions to inspect them in three
dimensions. This makes it easy to compare the generated shapes and to
see how Michelangelo performs against other recent models for 3D shape
generation and cross-modal translation.

If you are interested in learning more about the Michelangelo model, all
relevant links are provided under the 'source' section at the end of this
article.

Limitations

Michelangelo is a remarkable model that can generate realistic and
diverse 3D shapes from different modalities, but it also has some
limitations, such as:

1. It requires a large amount of data and computational resources to
train and run the model.
2. It may generate shapes that are not semantically or physically
plausible, especially for complex or rare conditions.
3. It may not capture all the details or variations of the input condition,
especially for fine-grained attributes or poses.
4. It may not generalize well to unseen categories or modalities that
are not in the training data.

Conclusion

Michelangelo is a novel and powerful model that can generate realistic
and diverse 3D shapes from different modalities, such as images, text or
sketches. It has some limitations that need to be addressed in future
work, but it opens new possibilities for 3D shape generation and
manipulation.


source
research paper - https://arxiv.org/abs/2306.17115
project details - https://neuralcarver.github.io/michelangelo/
GitHub Repo - https://github.com/NeuralCarver/michelangelo
