'CADSketchNet' - An Annotated Sketch Dataset For 3D CAD Model Retrieval With Deep Neural Networks
Article history: Received 2 April 2021; Revised 30 June 2021; Accepted 2 July 2021; Available online 8 July 2021

Keywords: Retrieval; Search; Dataset; Deep Learning; CAD; Sketch

Abstract

Ongoing advancements in the fields of 3D modelling and digital archiving have led to an outburst in the amount of data stored digitally. Consequently, several retrieval systems have been developed depending on the type of data stored in these databases. However, unlike text data or images, performing a search for 3D models is non-trivial. Among 3D models, retrieving 3D Engineering/CAD models or mechanical components is even more challenging due to the presence of holes, volumetric features, sharp edges etc., which make CAD a domain unto itself. The research work presented in this paper aims at developing a dataset suitable for building a retrieval system for 3D CAD models based on deep learning. 3D CAD models from the available CAD databases are collected, and a dataset of computer-generated sketch data, termed 'CADSketchNet', has been prepared. Additionally, hand-drawn sketches of the components are also added to CADSketchNet. Using the sketch images from this dataset, the paper also aims at evaluating the performance of various retrieval systems, or search engines, for 3D CAD models that accept a sketch image as the input query. Many experimental models are constructed and tested on CADSketchNet. These experiments, along with the model architecture and the choice of similarity metrics, are reported along with the search results.

© 2021 Elsevier Ltd. All rights reserved.

∗ Corresponding author. E-mail addresses: [email protected] (B. Manda), [email protected] (S. Dhayarkar), [email protected] (S. Mitheran), [email protected] (V.K. Viekash), [email protected] (R. Muthuganapathy). Shubham Dhayarkar, Sai Mitheran and V.K. Viekash have contributed equally.
https://doi.org/10.1016/j.cag.2021.07.001
1. Introduction

The search or retrieval of Engineering (CAD) models is crucial for a task such as design reuse [1]. Designers spend a significant time searching for the right information and using a large percentage of existing design for a new product development [2]. [3] indicates that a large percentage (75% or greater) of design reuses existing knowledge for the new product development. This calls for the search and classification of 3D Engineering models [4]. With the wide applicability of 3D data and the increased capabilities of modelling, digital archiving, and visualization tools, the problem of searching or retrieving CAD models becomes a predominant one.

The research work presented in this paper aims at developing a well-annotated sketch dataset of 3D Engineering CAD models that can aid in the development of deep learning-based search engines for 3D CAD models, using a sketch image as the input query. Using a sketch-based query for the search offers many advantages. 3D shapes, unlike text documents, are not easily retrieved using textual annotations ([5]) since it is difficult to characterize what human beings see and perceive using a mere text annotation. [6,7] show that content-based 3D shape retrieval methods (those that use the visual/shape properties of the 3D models) are more effective. It is also shown in [6,7] that using traditional search methods for multimedia data will not yield the desired results. [8] utilizes the idea of using the feature descriptors/vectors of the 3D model for the search query. Among the available query options, a sketch-based query is shown to be very intuitive and convenient for the user [9-11], since it is easier for the user to learn and use such a system than to use a 3D model itself as the query, which requires technical expertise and skill [12].

The Princeton Shape Benchmark (PSB) [13] was one of the earlier 3D shape databases. Consequently, large-scale datasets such as ShapeNet [14] came into being. Due to such data availability, many machine learning-based techniques, which require a good amount of data to train the models, have been developed ([15,16]). [17] provided the first benchmark dataset of sketches based on the 3D models in PSB. As a result of this, [18] and [19] have
1. Using the available 3D CAD models from [26] and [27], a large-scale dataset of computer-generated sketch data called 'CADSketchNet' is created.
2. Additional hand-drawn sketches of CAD models are also incorporated into the dataset using the models available from [26].

2.3. 3D CAD models of engineering shapes

The SHREC track [42] presents a retrieval challenge using the Engineering Shape Benchmark (ESB) dataset [26]. [43] applies the idea of content-based image retrieval to the domain of 3D CAD models. [44] performs a visual similarity-based retrieval using 2D
engineering drawings. This method converts 2D drawings to a shape histogram and then applies the idea of spherical harmonics to obtain a rotation-invariant shape descriptor. Minkowski distance was used to measure the similarity between feature representations.

[45] uses the data from ShapeLab [46] and ESB to develop a sketch-based 3D part retrieval system. This paper uses the idea of classifier combinations to aid in the retrieval process. Engineering models are classified functionally and not visually, as opposed to image or 3D graphical data. Taking this into account, the extracted shape descriptors (Zernike Moments and Fourier Transforms) are sent to a Support Vector Machine (SVM) classifier. A weighted combination of the classifier outputs is then used to estimate the class or category of the input query, which is then compared against the classes of the database. This is one of the earlier works that use learning-based methods for building a sketch-based retrieval system for CAD models. However, the sketch data itself is not available.

More recently, a sketch-based semantic retrieval of 3D CAD models is presented by [25]. The CAD models and their parametric features (used in 3D modelling software) are taken. The pre-processed sketch query is first vectorized and then passed onto topology-based rules. An integrated similarity measurement strategy is used to compute the similarity between the query sketch and the CAD models' database. This research work uses a dataset of 2148 CAD models and six corresponding views of each model. This dataset is proprietary and is not available. Also, the method presented here uses the classical rule-based approach over the latest advances in learning-based approaches.

3.1. Challenges in creating a dataset of hand-drawn sketches

Creating a hand-drawn sketch for a 3D mechanical component is much more challenging than sketching a generic 3D shape. This is because:

• It is difficult to capture the detailed information present in a CAD model, such as the presence of holes and volumetric features, in a single sketch.
• Multiple viewing directions can be chosen to draw the sketch.
• The sketches need to be drawn by users with domain knowledge and experience. Gathering a set of users to contribute to building such a dataset is both time-consuming and expensive.
• Once the hand-drawn sketches are obtained, they need to be verified and validated for correctness and closeness to the input CAD model.
• Different users have different drawing styles, and consistency needs to be maintained across the hand-drawn sketches.

Due to these reasons, attempting to create a dataset of hand-drawn sketches for a large number of 3D CAD models is a tedious task.

3.2. Hand-drawn sketch data generation

The ESB dataset [26] is a publicly available CAD database that is also well annotated. In the ESB, there are 801 3D CAD models across 42 classes (excluding the models in the 'Misc' category). Since the ESB is a reasonably sized dataset, we attempt to obtain hand-drawn sketches for all 801 3D CAD models of the ESB.
Fig. 3. (a) 2D image obtained from Algorithm 1; (b) Result of Canny edge detection; (c) Result of Gaussian blurring; (d) Result of the weighted scheme of (b) & (c).

Algorithm 2: Method to generate a computer sketch from an image
Input: Images of 3D CAD models from Algorithm 1
1: procedure img_to_sketch
2:    I ← Read input image in RGB color space
3:    G ← Convert image from RGB colorspace to Grayscale
4:    IG ← Invert color values of all pixels in Grayscale
5:    B ← Convolve with a non-uniform Gaussian Filter (GF): kernel size k & standard deviation σ
6:    IB ← Invert the blurred image
7:    Element-wise division of G & IB with scale = 256.0
8:    O1 ← Binary threshold the obtained image
9:    O2 ← Canny Edge Detection of G
10:   S ← Weighted average over O1 and O2
Output: Computer-generated sketches of 3D CAD models
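The steps of Algorithm 2 map directly onto standard OpenCV operations. The snippet below is a minimal, illustrative sketch rather than the authors' released implementation; the kernel size, threshold value, Canny thresholds and blend weights are assumptions chosen for roughly 256×256 input renders.

```python
import cv2

def img_to_sketch(image_path, k=21, sigma=0, t=240, canny_lo=50, canny_hi=150, w1=0.5, w2=0.5):
    """Approximate Algorithm 2: dodge-blend sketch fused with Canny edges."""
    rgb = cv2.imread(image_path)                          # I  : input image (BGR in OpenCV)
    gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)          # G  : grayscale
    inv_gray = cv2.bitwise_not(gray)                      # IG : inverted grayscale
    blur = cv2.GaussianBlur(inv_gray, (k, k), sigma)      # B  : Gaussian-blurred IG
    inv_blur = cv2.bitwise_not(blur)                      # IB : inverted blur
    dodge = cv2.divide(gray, inv_blur, scale=256.0)       # element-wise division with scale 256
    _, o1 = cv2.threshold(dodge, t, 255, cv2.THRESH_BINARY)  # O1 : binary threshold
    o2 = cv2.Canny(gray, canny_lo, canny_hi)              # O2 : Canny edges of G
    return cv2.addWeighted(o1, w1, o2, w2, 0)             # S  : weighted average of O1 and O2

# Example usage (hypothetical file names):
# cv2.imwrite("cad_sketch.png", img_to_sketch("cad_view.png"))
```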
In addition to the weighted Canny edge detection method mentioned above, other popular edge detectors such as the Scharr, Prewitt, Sobel and Roberts Cross operators are also experimented with. A similar weighted scheme is applied to each of these methods. Other state-of-the-art sketch generation methods, such as NeuralContours [54], PhotoSketch [55], and the Context-aware tracing strategy (CATS) [56], are also experimented with.
3.3.3. Comparing hand-drawn and computer-generated sketches

Creating hand-drawn sketches for a 3D CAD model is extremely difficult, as established in Section 3.1, and since such sketch data is not available for the MCB, we cannot directly compare the computer-generated and hand-drawn sketches. However, for the ESB dataset, we have obtained hand-drawn sketches (see Section 3.2). Therefore, computer sketches are generated for the 801 models in ESB, and these are compared with the corresponding hand-drawn sketches.

Many sketch generation methods were experimented with in Section 3.3.2. To find out which among these methods results in sketches that most closely resemble the hand-drawn ones, we attempt to compare the computer-generated sketch and the hand-drawn sketch of each 3D CAD model in the ESB. Various state-of-the-art similarity metrics are used, and the average similarity score across all models is obtained. A detailed comparison is reported in Table 1. From the Table, it is clear that the proposed weighted canny approach performs much better than the plain Canny edge detection. The MSE values with and without the non-maximal suppression (NMS) stage of the plain canny method are 1010.96 and 1577.43 respectively, while that of weighted canny is 209.41. The values indicate that canny without NMS performs poorly, and the weighted canny approach performs the best.

It can also be seen that for most similarity metrics, NeuralContours [54] generates a sketch closest to the hand-drawn sketch. However, the time taken to generate one sketch through the other methods is much higher than the proposed sketch generation method in Section 3.3.2. The neural contours method takes around 800 seconds to generate a single sketch image on an NVIDIA 1080Ti GPU, owing to the complex neural network pipeline. This is not suitable for generating sketches for a large number of 3D models. On the other hand, while the performance of the proposed weighted-canny method is close enough to the Neural Contours method in most of these cases, the time it takes to generate one sketch from a 3D model is just one second (including converting a 3D model to an image followed by generating a sketch), on the same hardware setup. This aids very much in generating sketches for a large dataset of 3D models such as the MCB. Hence, the proposed method of weighted canny edge detection is chosen to efficiently generate all the computer sketches of the MCB dataset.

It is important to note that the goal of creating this dataset is to aid in the development of deep learning-based CAD model search engines. Since the end-users of the search engine are humans, and sketches drawn by an average human being are bound to have errors, the dataset needs to contain sketches that are not perfect. Only then will the learning-based methodologies that make use of this dataset become robust to input noise and errors. Figure 4 shows, for two sample cases, the image of the 3D CAD model, the hand-drawn sketch and the computer-generated sketch. It can be observed that while the computer-generated sketch contains a lot of detail and bears a close resemblance to the input, the sketch does not look realistic. On the other hand, the hand-drawn sketches provide a realistic database of query sketches that can be used to train a robust search engine. Clearly, a hand-drawn sketch dataset is more important and valuable as compared to computer-generated sketch data. Nevertheless, in the absence of a standard large-scale benchmark dataset of hand-drawn sketches, the computer-generated sketch data generated for the MCB dataset using Algorithm 2 is the best option available.

3.3.4. Analysis of the proposed sketch-generation method

To further understand the complexity involved in generating the sketch data, the time taken by the proposed technique is computed for each class of the MCB. Since the MCB dataset is not class-balanced, i.e. the data is unevenly distributed across classes, we compare the average time for generating a single sketch of a particular class. It is observed that complex object categories such as Helical Geared Motors, Castor, Turbine etc. take a significantly higher time compared to the other categories. Simple object classes such as Convex Washer, Cylindrical Pin, Setscrew, Washer Bolt etc., take negligible time.

As discussed earlier with respect to the hand-drawn sketches, we only attempt to generate a single sketch corresponding to every 3D CAD model in the MCB dataset and do not change anything else. Hence, the number of sketches obtained is the same as the number of CAD models in MCB, i.e. 58696 sketches across the 68 classes of the MCB. For detailed information regarding the categories and the number of models in each class, the reader is directed to the MCB paper [27].
Table 1
Similarity results obtained by using various approaches for comparing the hand-drawn and the computer-generated sketches for various sketch-generation methods on the 801 CAD models of the ESB dataset. ↑ indicates that a greater value for the metric indicates higher similarity, while ↓ indicates the opposite. The plain-canny results reported here are with non-maximal suppression. The similarity measures used are: PSNR - Peak signal-to-noise ratio; MS-SSIM - Multi-Scale Structural Similarity Index [57]; IE - Information Entropy; VIF - Visual Information Fidelity [58]; MSE - Mean Squared Error; UQI - Universal Image Quality Index [59].
Sketch-generation method PSNR ↑ MS-SSIM ↑ IE ↑ VIF ↑ MSE ↓ UQI ↑ Conversion time (per image in sec) ↓
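Per-image scores of the kind summarized in Table 1 can be computed with off-the-shelf implementations. The following is a minimal sketch using scikit-image for MSE, PSNR and (single-scale) SSIM; MS-SSIM, VIF, UQI and IE need additional packages and are omitted here, and the file names are placeholders.

```python
import numpy as np
from skimage.io import imread
from skimage.metrics import mean_squared_error, peak_signal_noise_ratio, structural_similarity

def sketch_similarity(hand_drawn_path, generated_path):
    """Compare a hand-drawn sketch with a computer-generated one (grayscale, same size)."""
    a = imread(hand_drawn_path, as_gray=True).astype(np.float64)
    b = imread(generated_path, as_gray=True).astype(np.float64)
    return {
        "MSE": mean_squared_error(a, b),                        # lower = more similar
        "PSNR": peak_signal_noise_ratio(a, b, data_range=1.0),  # higher = more similar
        "SSIM": structural_similarity(a, b, data_range=1.0),    # single-scale stand-in for MS-SSIM
    }

# scores = sketch_similarity("esb_0001_hand.png", "esb_0001_generated.png")
```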
Fig. 4. (a), (d) - Sample images extracted from two random CAD models in ESB; (b), (e) - Computer-generated sketch data; (c), (f) - Hand-drawn sketch data. Although the computer-generated sketch has a lot of detail and resembles the input closely, the hand-drawn sketch database provides realistic query images that aid in training robust retrieval systems.
Fig. 5. Sample images from the developed ‘CADSketchNet’ Dataset-A: Computer-generated sketches.
3.4. Summary and 'CADSketchNet' Details

The dataset 'CADSketchNet' contains two subsets.

• Dataset-A contains (1) one representative image for each 3D CAD model in the MCB dataset (obtained using Algorithm 1) (2) one computer-generated sketch for each representative image in the MCB dataset (obtained using Algorithm 2). This results in 58,696 computer-generated sketches across 68 categories. Some sample sketches generated by the proposed method are shown in Figure 5.
• Dataset-B contains 801 hand-drawn sketches, one for each 3D CAD model in the ESB dataset across 42 categories (as described in Section 3.2). Since the 3D CAD model data is obtained from the ESB, the same category information applies to the Dataset-B as well. Some sample hand-drawn sketches are shown in Figure 2.

Dataset-A has all the images split into an 80-20 ratio for training and testing, respectively. This split is as per the MCB [27]. Dataset-B contains no train-test split since the size of the dataset is not as large as Dataset-A. Nonetheless, users who intend to use this data can customize the train-test ratio as needed.
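Dataset-B users who want their own split can mirror the Dataset-A convention with a stratified 80-20 partition so that every ESB class appears in both subsets. The snippet below is only an illustration; the directory layout (one sub-folder per class) and the paths are assumptions, not part of the released dataset.

```python
from pathlib import Path
from sklearn.model_selection import train_test_split

# Assumed layout: CADSketchNet/Dataset-B/<class_name>/<sketch>.png
root = Path("CADSketchNet/Dataset-B")
paths = sorted(root.glob("*/*.png"))
labels = [p.parent.name for p in paths]

train_paths, test_paths, train_labels, test_labels = train_test_split(
    paths, labels, test_size=0.2, stratify=labels, random_state=42
)
```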
4. Experiments

In this section, we analyze the behaviour of a few learning algorithms for 3D CAD model retrieval on the Dataset-A and Dataset-B of the CADSketchNet.
Table 2
Results obtained by using various view-based approaches when trained using CADSketchNet - Dataset-A.
Methodology Training Time Precision Recall Retrieval Time mAP Top k-Accuracy %
Fig. 6. A generic pipeline for a sketch-based 3D CAD model search engine. Input query - a sketch image drawn by the user. Pipeline I - input sketch pre-processing followed
by feature extraction. Pipeline II - extracts features of all the CAD models in the database and stores them as a bag of features. The output feature vector from I is compared
against the bag of features from II. Using similarity metrics, the object(s) that is(are) most similar to the input query is(are) retrieved.
4.1. Experiments on Dataset-A

Methods proposed in the literature mainly use point cloud representations ([15,16]) and Voxel-grid representations ([60]). These network architectures take in point cloud inputs or graph inputs etc., and not images. Since the sketch data is available in the form of images, it is not possible to experiment with these architectures. However, a few papers use view-based representations ([37], [61]). Since we are dealing mainly with the image representations of 3D CAD models, only the view-based methods can be experimented with on the Dataset-A of 'CADSketchNet'.

The performances of four view-based learning architectures are analysed: MVCNN [37], GVCNN [61], RotationNet [62], and MVCNN-SA [63]. For training each model, we use the code and the default settings for hyper-parameters, as mentioned in the respective papers. A short description of these papers is mentioned here for the sake of completeness and for a better understanding of the techniques experimented:

• MVCNN - Uses two camera setups (12 views and 80 views) to render 2D images from a 3D model. These views are passed to a first CNN for extracting relevant features. The obtained features are then pooled and fed into a second CNN to obtain a compact shape descriptor.
• GVCNN - The 2D views of a 3D model are generated, followed by a grouping of these views resulting in different clusters with associated weights. GoogLeNet is used as the base architecture.
• RotationNet - Uses only a partial set (≥1) of the full multi-view images of an object as input. For each input image, the CNN also outputs the best viewpoint along with the predicted class, since the network treats the view-images as latent variables that are optimized in the training process.
• MVCNN-SA - Uses an approach similar to MVCNN, but attempts to assign relative importance to the input views by using an additional self-attention network.

A train-test split ratio of 80%-20% is used on 'CADSketchNet', which is similar to that of the MCB. The results of each of these methods are summarized in Section 6.

4.2. Experiments on Dataset-B

Dataset-B in itself is not a large-scale dataset, and hence, not all deep network architectures can be trained on it. However, using the LFD images (1 3D CAD model = 20 images), the amount of available training data also increases. Hence, in addition to the view-based techniques mentioned above, we also come up with a few other rule-based and learning-based approaches and analyze their performance. The overall pipeline for performing a sketch-based search for 3D CAD models using deep learning can be broadly described as follows:

1. Preparing a dataset of 3D CAD Models and their corresponding sketches suitable for training and testing a deep learning model (discussed in Section 3).
2. Extracting feature representations from the CAD model as well as the query sketch.
3. Developing the model architecture that can efficiently be trained using the extracted representation(s) as input.
4. Checking the similarity of the queried sketch and the 3D CAD models either directly or via their feature representations.
5. Retrieving the top-ranked result(s) based on the metric(s) of similarity used.
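As an illustration of steps 4 and 5, once a feature vector has been extracted from the query sketch (Pipeline I) and a bag of features has been stored for the database models (Pipeline II), retrieval reduces to ranking the database by a similarity score, here the mean squared error. This is a schematic sketch under assumed array shapes, not the exact implementation used in the experiments.

```python
import numpy as np

def retrieve_top_k(query_vec, db_feats, db_ids, k=10):
    """Rank database CAD models by MSE between the query feature and each stored feature."""
    diffs = db_feats - query_vec[None, :]      # (N, D) broadcasted differences
    mse = np.mean(diffs ** 2, axis=1)          # one MSE score per database model
    order = np.argsort(mse)[:k]                # smallest error = most similar
    return [(db_ids[i], float(mse[i])) for i in order]

# db_feats: (N, D) array from Pipeline II; query_vec: (D,) array from Pipeline I (both hypothetical).
```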
Table 3
Results obtained by using various view-based approaches when trained using CADSketchNet - Dataset-B.
Methodology Training Time Precision Recall Retrieval Time mAP Top k-Accuracy %
Table 4
Results obtained by using various models when trained on CADSketchNet - Dataset-B. The Siamese Network architecture out-
performs other methods.
Methodology Training Time Precision Recall Retrieval Time mAP Top k-Accuracy %
Table 5
Results obtained by using various CNN architectures for the sketch-based retrieval of 3D CAD models when trained on the
hand-drawn sketches of CADSketchNet (Dataset-B).
An overview of a generic sketch-based search engine for CAD models is shown in Fig. 6. In the following sub-sections, each step of the pipeline used is explained in greater detail.

4.2.1. Feature Extraction and Model Architecture Details

This section discusses steps 2 & 3 of the pipeline, namely (1) extracting the feature representations of the query sketch and the 3D CAD models, and (2) using the extracted representations to build an appropriate network architecture. These two steps go hand-in-hand since the model architecture depends upon the dimensionality of the extracted features. If the extracted representation is a feature vector (1D), a deep neural network (DNN) or variational auto-encoders (VAE) can be used. If the extracted representations are images (2D), then a convolutional neural network (CNN), convolutional auto-encoder (CAE), or a Siamese Network (SN) are some possible options. In some other cases, the features are extracted by the neural network itself. The various methods experimented by us are described in this section.

Model-1 : HOG-HOG. Histogram of Oriented Gradients (HOG) is a widely used feature descriptor in computer vision and image processing. It differs from other feature extraction methods by extracting the edges' gradient and orientation rather than extracting the edges themselves. The input image is broken down into smaller localized regions, and for each region, the gradients and orientations are computed. Using these, a histogram is computed for each region. Our Model-1 uses the HOG in both Pipelines I and II (see Fig. 6).

Pipeline I: The inputs to this pipeline are the sketch images from the dataset. These sketches are passed to the HOG algorithm, which generates the feature vectors for each image separately. The following configuration is used for the HOG algorithm after due experimentation: No. of pixels per cell: (8,8); No. of cells per block: (1,1); Orientations: 8; Block Normalization: L2; Feature Vector Size: 1024×1.

Pipeline II: Using the idea of LFD, 20 images (256×256) are obtained for each 3D CAD model, resulting in 801×20 images. These images are then forwarded to the HOG block, which has a similar configuration as mentioned above, and the bag of features is obtained.
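The Pipeline I configuration above can be reproduced with the HOG implementation in scikit-image. The snippet below is a minimal sketch under the stated settings; the 256×256 resize and the image path are assumptions, and the resulting descriptor length depends on the input resolution.

```python
from skimage.feature import hog
from skimage.io import imread
from skimage.transform import resize

def hog_descriptor(image_path, size=(256, 256)):
    """HOG feature vector with the configuration reported for Pipeline I."""
    img = resize(imread(image_path, as_gray=True), size, anti_aliasing=True)
    return hog(
        img,
        orientations=8,           # 8 orientation bins
        pixels_per_cell=(8, 8),   # cell size
        cells_per_block=(1, 1),   # block size
        block_norm="L2",          # L2 block normalization
        feature_vector=True,      # flatten to a 1D descriptor
    )
```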
Model-2 : HOG-AE. Model-2 uses the HOG pipeline as defined in Model-1 for Pipeline I and an auto-encoder for Pipeline II. The bag of features is obtained by training the auto-encoder (AE) on the LFD images and then extracting the encoded representation from the latent space of the AE. After various experiments, the following architecture for the AE is obtained. Encoder details: 8 conv layers with (3×3) filter and (1,1) stride; Batch Normalization (BN) is applied after every two conv layers; 2 dense layers. Decoder details: 2 dense and 8 deconv layers; an up-sampling layer of size (2,2) is applied after every two deconv layers. Activation function: LReLU with a negative slope of 0.01. This AE is trained for 30 epochs using the Adam Optimization algorithm. Learning rate: 0.0001; Loss: Mean Squared Error (MSE).
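A compact sketch of such a convolutional auto-encoder is given below using Keras. It follows the reported recipe (3×3 convolutions, BN after every two conv layers, two dense layers on each side, LeakyReLU with slope 0.01, Adam at 1e-4 with MSE loss, 30 epochs), but the filter counts, latent size, pooling-based down-sampling and input resolution are assumptions, not the authors' exact architecture.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_autoencoder(input_shape=(256, 256, 1), latent_dim=128):
    inp = layers.Input(shape=input_shape)
    x = inp
    for filters in (16, 32, 64, 128):                     # 4 blocks x 2 = 8 conv layers
        for _ in range(2):
            x = layers.Conv2D(filters, 3, strides=1, padding="same")(x)
            x = layers.LeakyReLU(0.01)(x)
        x = layers.BatchNormalization()(x)                # BN after every two conv layers
        x = layers.MaxPooling2D(2)(x)                     # assumed down-sampling
    bottleneck = (input_shape[0] // 16, input_shape[1] // 16, 128)
    x = layers.Flatten()(x)
    x = layers.Dense(512)(x)                              # encoder dense layer 1
    x = layers.LeakyReLU(0.01)(x)
    latent = layers.Dense(latent_dim, name="latent")(x)   # encoder dense layer 2 (encoded vector)

    x = layers.Dense(512)(latent)                         # decoder dense layer 1
    x = layers.LeakyReLU(0.01)(x)
    x = layers.Dense(int(np.prod(bottleneck)))(x)         # decoder dense layer 2
    x = layers.Reshape(bottleneck)(x)
    for filters in (128, 64, 32, 16):                     # 4 blocks x 2 = 8 deconv layers
        for _ in range(2):
            x = layers.Conv2DTranspose(filters, 3, strides=1, padding="same")(x)
            x = layers.LeakyReLU(0.01)(x)
        x = layers.UpSampling2D(2)(x)                     # up-sampling after every two deconv layers
    out = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)

    autoencoder = models.Model(inp, out)
    encoder = models.Model(inp, latent)                   # used to extract the bag of features
    autoencoder.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4), loss="mse")
    return autoencoder, encoder

# autoencoder, encoder = build_autoencoder()
# autoencoder.fit(lfd_images, lfd_images, epochs=30)      # lfd_images: hypothetical (N, 256, 256, 1) array
```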
Model-3 : HOG-StackedAE. Model-3 uses the HOG pipeline as defined in Model-1 for Pipeline I and a stacked auto-encoder (SAE) for Pipeline II. Pipeline II is similar to the one defined in Model-2. Instead of passing all 801×20 images as separate inputs, the 20 images of each 3D CAD model are stacked and sent as a single input. Changes from the pipeline in Model-2: Number of epochs: 50; Learning rate: 3e-5.

Model-4 : HOG-3DCNN. Model-4 uses the HOG pipeline as defined in Model-1 for Pipeline I and a 3D convolutional neural network (3D-CNN) for Pipeline II. For Pipeline II, each 3D CAD model is passed through a 3D-CNN. The extracted representations from the final dense layer are collected together as the bag of features. The architecture used for the 3D-CNN is 18 3D conv layers with kernel size (1,1,1) and stride (1,1,1); max-pooling with kernel-size=2, stride=2 along with BN applied after every 2 conv layers; average pool layer with kernel-size=2, stride=2; two dense layers with a dropout of 0.5. This network is trained for 120 epochs with a learning rate of 0.0001; LReLU activation; Loss: MSE; Optimizer: Adam.

Model-5 : CNN-CNN / Siamese Network. Model-5 uses a convolutional neural network (CNN) for both pipelines. This architecture is also known as a Siamese Network, which implements the same network architecture and weights for both pipelines. The network is trained for 10 epochs; Learning Rate: 0.0001; Optimizer: Adam; Batch Size: 2; Loss: Siamese loss function as described in [64]; Activation function: LeakyReLU with a negative slope of 0.01.

To ensure that a sufficient proportion of similar and dissimilar pairs are generated, an approach similar to that of [64] is used. For each training sketch, a random number of view pairs (kp) in the same category and kn view samples from other categories (dissimilar pairs) are chosen. In the current experiment, the values kp = 2, kn = 20 are used. This random pairing is done for each training epoch. For increasing the number of training samples, data augmentation for the sketch set is also done.
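The pairing scheme can be written down compactly. The sketch below regenerates, for every training sketch, kp = 2 positive view pairs from the same category and kn = 20 negative views from other categories at the start of each epoch; the data structures (a list of (path, class) tuples and a per-class view dictionary) are illustrative, not the authors' code.

```python
import random

def sample_pairs(sketches, views_by_class, kp=2, kn=20, seed=None):
    """Build (sketch, view, label) pairs: label 1 for same-category pairs, 0 otherwise."""
    rng = random.Random(seed)
    classes = list(views_by_class)
    pairs = []
    for sketch_path, cls in sketches:                      # sketches: list of (path, class_name)
        pos = rng.sample(views_by_class[cls], k=min(kp, len(views_by_class[cls])))
        pairs.extend((sketch_path, v, 1) for v in pos)     # kp similar pairs
        for _ in range(kn):                                # kn dissimilar pairs
            other = rng.choice([c for c in classes if c != cls])
            pairs.append((sketch_path, rng.choice(views_by_class[other]), 0))
    rng.shuffle(pairs)
    return pairs

# Re-sampled at the start of every epoch:
# epoch_pairs = sample_pairs(train_sketches, lfd_views_by_class, kp=2, kn=20)
```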
Fig. 7. Top 10 search results of the developed Siamese network model for some sample input queries. Models retrieved in red boxes are from a different category.

5.2. Hyper-parameter Tuning, Loss function & Optimization

Training a neural network is an arduous process because of the many decisions involved beforehand, such as choice of performance metrics, hyper-parameters, loss function, etc. Our choices are mainly based on heuristics ([65-67]) and are backed by experimental verification. The Weights & Biases (wandb) library is also used to assist in hyper-parameter tuning.

6. Results and discussion

The results of all the Model experiments (see Section 4) are discussed here. Since there are no existing methods on the CADSketchNet dataset, we use these experimental results to compare the performance of the best retrieval system. The results of search or retrieval are subjective and cannot be precisely quantified. Nevertheless, to evaluate the retrieval system's performance, we use the standard metrics popularly used in literature. The 'top k accuracy' denotes how many of the k-retrieved classes match the ground truth class. For instance, if 6 out of the top 20 retrieved results match the ground truth class, the accuracy is 30%. For all the reported results, we use k=10. We also calculate the precision and recall values for the retrieval results. The mean Average Precision (mAP), which is the area under the P-R curve, is also computed.
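These retrieval metrics can be computed per query as shown below. This is a generic sketch (not the evaluation script behind the tables): `retrieved` holds the class labels of the top-k results for one query, and mean Average Precision follows by averaging the per-query values.

```python
import numpy as np

def top_k_accuracy(retrieved, ground_truth):
    """Fraction of the k retrieved results whose class matches the ground-truth class."""
    retrieved = list(retrieved)
    return sum(c == ground_truth for c in retrieved) / len(retrieved)

def average_precision_at_k(retrieved, ground_truth):
    """Average precision over one ranked list (area under its P-R curve)."""
    hits, precisions = 0, []
    for rank, c in enumerate(retrieved, start=1):
        if c == ground_truth:
            hits += 1
            precisions.append(hits / rank)
    return float(np.mean(precisions)) if precisions else 0.0

# Example from the text: 6 of the top 20 results share the query's class -> accuracy 0.3
# top_k_accuracy(["Bushes"] * 6 + ["Knob"] * 14, "Bushes")
```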
6.1. Results of experiments on dataset-A

The results of the four view-based methods experimented in Section 4.1 are summarized in Table 2 along with the time taken for retrieval. Both the MVCNN and the MVCNN-SA methods yield similar performance when tested on Dataset-A, with MVCNN-SA performing slightly better in terms of lesser training time, lesser time taken for retrieval and the top-k accuracy. Although RotationNet takes the longest time for training, it performs better than GVCNN. This could be because of the following reasons.
Table 7
Results obtained by the Siamese Network Architecture (Model-5) for the sketch-based retrieval of 3D CAD models when trained on the computer-
generated sketches of CADSketchNet (Dataset-A): Part 1 of 2
S No Class No.of Models Precision Recall Retrieval Time mAP Top k-Accuracy
1 Articulations, eyelets and other articulated joints 1632 0.992 0.815 4.10E-05 0.986 97.58
2 Bearing accessories 107 0.990 0.803 5.60e-05 0.985 98.11
3 Bushes 764 0.988 0.820 4.20e-05 0.985 97.99
4 Cap nuts 225 0.983 0.809 4.10e-05 0.989 97.84
5 Castle nuts 226 0.979 0.820 4.80e-05 0.988 97.82
6 Castor 99 0.983 0.821 4.60e-05 0.987 98.18
7 Chain drives 100 0.974 0.814 4.00e-05 0.988 98.17
8 Clamps 155 0.979 0.807 4.60e-05 0.989 98.21
9 Collars 52 0.986 0.835 5.10e-05 0.984 97.56
10 Conventional rivets 3806 0.983 0.819 3.70e-05 0.986 97.82
11 Convex washer 91 0.981 0.818 3.40e-05 0.987 98.03
12 Cylindrical pins 1895 0.982 0.814 3.30e-05 0.985 97.69
13 Elbow fitting 383 0.992 0.823 3.10e-05 0.987 97.41
14 Eye screws 1131 0.980 0.831 5.20e-05 0.984 98.16
15 Fan 213 0.995 0.817 3.00e-05 0.989 97.15
16 Flanged block bearing 404 0.988 0.815 2.90e-05 0.988 98.03
17 Flanged plain bearings 110 0.985 0.816 4.30e-05 0.988 98.50
18 Flange nut 53 0.991 0.808 5.60e-05 0.988 97.81
19 Grooved pins 2245 0.980 0.812 3.50e-05 0.987 97.85
20 Helical geared motors 732 0.989 0.825 4.70e-05 0.986 97.83
21 Hexagonal nuts 1039 0.991 0.815 3.20e-05 0.988 97.50
22 Hinge 54 0.976 0.815 5.60e-05 0.986 97.96
23 Hook 119 0.985 0.828 5.40e-05 0.990 98.17
24 Impeller 145 0.997 0.840 4.10e-05 0.989 97.06
25 Keys and keyways, splines 4936 0.993 0.818 6.10e-05 0.985 97.71
26 Knob 644 0.988 0.809 4.20e-05 0.984 97.72
27 Lever 1032 0.972 0.816 3.90e-05 0.987 97.66
28 Locating pins 55 0.992 0.820 4.70e-05 0.989 98.14
29 Locknuts 254 0.988 0.805 5.30e-05 0.991 97.75
30 Lockwashers 434 0.979 0.817 4.60e-05 0.986 98.04
31 Nozzle 154 0.988 0.820 4.40e-05 0.988 97.77
32 Plain guidings 49 0.980 0.811 4.70e-05 0.987 98.25
33 Plates, circulate plates 365 0.985 0.813 2.20e-05 0.985 97.67
34 Plugs 169 0.983 0.815 4.60e-05 0.983 98.03
35 Pulleys 121 0.976 0.830 5.10e-05 0.988 97.99
36 Radial contact ball bearings 1199 0.981 0.824 2.30e-05 0.988 97.83
(1) GVCNN assigns relative weights for the input view images and groups the views according to the assigned weights. It might be that the viewing direction of the sketch image does not match with that of the group with the highest weight, thus slightly affecting the performance. (2) RotationNet takes into account the alignment of the input objects. Since object orientations in the MCB Dataset, and thereby the CADSketchNet-A, are aligned, the performance of the RotationNet model is enhanced.

6.2. Results of experiments on dataset-B

The results of the four view-based methods experimented in Section 4.2 are summarized in Table 3. Also, for the experimental models described in 4.2.1, the outputs from pipelines I and II are compared using a 'similarity check' block. For all the models (except Model-5), the Mean Squared Error (MSE) function is used to calculate the similarity between the extracted features. For Model-5, a custom Siamese loss function described in [64] is used to measure the similarity. These results and the time taken for retrieval are reported in Table 3.

6.2.1. Discussion

Model-1 uses a simple HOG-HOG pipeline and is unable to provide the user with relevant search results. Other models use deep learning techniques and provide better results than the naive approach used in Model-1 (except Model-3). Even the time taken for retrieval is relatively higher, considering the inexpensive computational nature of an algorithmic approach instead of a data-driven approach.

Model-2 and Model-3 use a similar approach to retrieval, with the differences being Model-2 using the view images of the CAD models separately, while Model-3 uses the view images together (stacked). The time taken to train Model-3 (see Table 4) is expectedly lesser since Model-2 has to process 20 times more image data. While the results from Model-2 appear quite satisfactory (in the sense that the top-k accuracy is slightly greater than 50%), Model-3 severely under-performs. This might be because the network uses a stacked set of view images, and in an attempt to capture much information, the network overfits the data and thus under-performs. Thus Model-2, which treats each view image separately, performs much better.

Model-4 uses a 3DCNN, which is computationally very intensive. This is evident from Table 4, where Model-4 takes the highest time for training and searching. This is not conducive since, in real-time scenarios, the search results need to be delivered to the user quickly. Also, the model's performance is not satisfactory since capturing the 3D CAD models directly leads to unnecessary computations since most voxels in the input data would be empty (3D CAD models are typically sparse). This adversely affects the retrieval performance.
Table 8
Results obtained by the Siamese Network Architecture (Model-5) for the sketch-based retrieval of 3D CAD models when trained on the
computer-generated sketches of CADSketchNet (Dataset-A): Part 2 of 2
S No Class No.of Models Precision Recall Retrieval Time mAP Top k-Accuracy
As is evident from Table 4, using a Siamese network architecture (CNN in both pipelines) for the search engine yields the best results among all methods that have been experimented with, for all evaluation metrics. Some sample search results are shown in Figure 7. Also, the model takes much lesser training time compared with other models. The search time depends a lot on the network architecture, the number of parameters to be trained and so forth. Also, the other methods use MSE for similarity measurement, while Model-5 uses a custom Siamese loss function which takes longer to compute. Moreover, search time is only a secondary measure of evaluating model performance. It is more important to obtain a good performance as opposed to obtaining inaccurate results quickly.

Since Model-5 gives the best performance, various deep learning architecture pipelines are used for the Siamese network and are analyzed for performance. These results are reported in Table 5. It can be seen that using a ResNet18 backbone yields the best retrieval accuracy, and a ResNet34 backbone results in the best mAP value.

In addition, the class-wise results of only the Model-5 are reported in Tables 7 & 8 when trained on Dataset-A. It is observed that the Siamese model performs very well on the computer-generated sketch Dataset-A. The top k-accuracy value for every class in the MCB dataset is in the range 97.06% to 98.50%. MCB is a large dataset and has sufficient examples in each category. Hence, the higher scores are justified.

Table 9 reports the class-wise results of the Siamese Model when trained on Dataset-B, i.e. the hand-drawn sketches, which has only 801 training samples. The results obtained differ significantly from those found for Dataset-A. The class 'Long Machine Elements', which comprises only 15 sketches, has the lowest obtained accuracy rating of 86.67 percent. Considering the scarcity of training data, this value indicates a good performance. Some categories only have single-digit training examples, such as 'BackDoors' and 'Clips'. Some of these classes obtain a 100% accuracy, but this is only because there is an insufficient number of testing examples. For a majority of the other categories, where there is a significant number of examples available to train, the Siamese Model performs quite well.

6.3. Comparison with deep-learning approaches used for 3D graphical models

The results mentioned in the preceding sections are for models trained on the created CADSketchNet dataset, which focuses on 3D CAD models of mechanical components. Section 1, Introduction, includes a description of how these CAD models differ from ordinary 3D shapes. In this section, we attempt to validate the assertion by employing deep-learning models designed for 3D graphical models. We examine the performance of the same view-based strategies that were mentioned in Section 4.1.

We use the same pipeline summarized in Figure 6 for this purpose. Pipeline I receives no changes, whereas Pipeline II employs the architecture that was pre-trained on ModelNet40 (a 3D graphical models dataset). The performance of the aforementioned approaches is then evaluated on the created CADSketchNet. Table 6 summarizes these results. Comparing these with the methods trained on 3D CAD model data (Tables 2 & 3), it can be inferred that the methodologies established for 3D graphical models do not translate well to 3D CAD models of engineering shapes. Network architectures trained on a dataset specific to CAD models perform significantly better. Therefore, instead of attempting to generalise the usage of approaches intended for graphical models upon engineering shapes, there is a need for developing dedicated datasets and methodologies that explicitly focus on 3D CAD models.

6.4. Limitations and possible future work

The scope of this work is confined to 3D Engineering CAD Mesh models. The current method does not retrieve other types of data, such as images or 3D point sets. It is worthwhile to investigate the possibility of creating a unified dataset with multiple input formats and developing a retrieval system that can search for multiple data
Table 9
Results obtained by the Siamese Network Architecture (Model-5) for the sketch-based retrieval of 3D CAD models when
trained on the hand-drawn sketches of CADSketchNet (Dataset-B).
S No Class No.of Models Precision Recall Retrieval Time mAP Top k-Accuracy
in deep learning. The performance of standard view-based methods proposed in the literature is analyzed on CADSketchNet. Additionally, the results of a few other experiments using popular deep learning architectures are also reported. The possibilities of extending this research work to other similar problems have also been discussed.

Declaration of Competing Interest

The authors of this manuscript titled "'CADSketchNet' - An Annotated Sketch Dataset for 3D CAD Model Retrieval with Deep Neural Networks" agree to the submission of this manuscript to the Computers and Graphics Journal as part of the 3DOR 2021 Conference. No author is affiliated with any financial organization or has received funding from any source. This paper is not currently being considered for publication elsewhere.

CRediT authorship contribution statement

Bharadwaj Manda: Conceptualization, Methodology, Software, Data curation, Writing – original draft, Visualization, Project administration. Shubham Dhayarkar: Methodology, Software, Validation, Data curation. Sai Mitheran: Methodology, Software, Validation, Data curation. V.K. Viekash: Methodology, Software, Validation, Data curation. Ramanathan Muthuganapathy: Conceptualization, Resources, Writing – review & editing, Supervision.

Acknowledgments

Thanks are due to the teams of the ESB and the MCB datasets for making their data publicly available. Thanks are also due to many users who have contributed to our CADSketchNet dataset.

References

[1] Bai J, Gao S, Tang W, Liu Y, Guo S. Design reuse oriented partial retrieval of cad models. Comput-Aided Des 2010;42(12):1069–84. doi:10.1016/j.cad.2010.07.002.
[2] Gunn TG. The mechanization of design and manufacturing. Sci Am 1982;247(3):114–31.
[3] Ullman DG. The mechanical design process, vol 2. McGraw-Hill New York; 2010. https://www.davidullman.com/mechanical-design-process-6ed
[4] Iyer N, Jayanti S, Lou K, Kalyanaraman Y, Ramani K. Three-dimensional shape searching: state-of-the-art review and future trends. Comput-Aided Des 2005;37(5):509–30. doi:10.1016/j.cad.2004.07.002. Geometric Modeling and Processing 2004.
[5] Funkhouser T, Min P, Kazhdan M, Chen J, Halderman A, Dobkin D, Jacobs D. A search engine for 3d models. ACM Trans Graph 2003;22(1):83–105. doi:10.1145/588272.588279.
[6] Tangelder JWH, Veltkamp RC. A survey of content based 3d shape retrieval methods. In: Proceedings Shape Modeling Applications, 2004; 2004. p. 145–56. doi:10.1109/SMI.2004.1314502.
[7] Bustos B, Keim D, Saupe D, Schreck T. Content-based 3d object retrieval. IEEE Comput Graph Appl 2007;27(4):22–7. doi:10.1109/MCG.2007.80.
[8] Suzuki MT. A web-based retrieval system for 3d polygonal models. In: Proceedings Joint 9th IFSA World Congress and 20th NAFIPS International Conference (Cat. No. 01TH8569), vol. 4; 2001. p. 2271–6. doi:10.1109/NAFIPS.2001.944425.
[9] Aono M, Iwabuchi H. 3d shape retrieval from a 2d image as query. In: Proceedings of The 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference; 2012. p. 1–10.
[10] Shin H, Igarashi T. Magic canvas: Interactive design of a 3-d scene prototype from freehand sketches. In: Proceedings of Graphics Interface 2007. New York, NY, USA: Association for Computing Machinery; 2007. p. 63–70. ISBN 9781568813370. doi:10.1145/1268517.1268530.
[11] Lee J, Funkhouser T. Sketch-based search and composition of 3d models. In: Proceedings of the Fifth Eurographics Conference on Sketch-Based Interfaces and Modeling. Goslar, DEU: Eurographics Association; 2008. p. 97–104. ISBN 9783905674071.
[12] Li B, Lu Y, Godil A, Schreck T, Bustos B, Ferreira A, Furuya T, Fonseca MJ, Johan H, Matsuda T, Ohbuchi R, Pascoal PB, Saavedra JM. A comparison of methods for sketch-based 3d shape retrieval. Comput Vis Image Understand 2014a;119:57–80. doi:10.1016/j.cviu.2013.11.008.
[13] Shilane P, Min P, Kazhdan M, Funkhouser T. The princeton shape benchmark. In: Proceedings Shape Modeling Applications, 2004; 2004. p. 167–78. doi:10.1109/SMI.2004.1314504.
[14] Wu Z, Song S, Khosla A, Yu F, Zhang L, Tang X, Xiao J. 3D shapenets: A deep representation for volumetric shapes. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2015. p. 1912–20. doi:10.1109/CVPR.2015.7298801.
[15] Qi CR, Su H, Mo K, Guibas LJ. Pointnet: Deep learning on point sets for 3d classification and segmentation. In: CVPR. IEEE Computer Society; 2017a. p. 77–85.
[16] Qi CR, Yi L, Su H, Guibas LJ. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In: NIPS.
[17] Eitz M, Richter R, Boubekeur T, Hildebrand K, Alexa M. Sketch-based shape retrieval. ACM Trans Graph 2012a;31 31:1–31:10.
[18] Li B, Lu Y, Godil A, Schreck T, Aono M, Johan H, Saavedra JM, Tashiro S. SHREC'13 Track: Large Scale Sketch-Based 3D Shape Retrieval. In: Castellani U, Schreck T, Biasotti S, Pratikakis I, Godil A, Veltkamp R, editors. Eurographics Workshop on 3D Object Retrieval. The Eurographics Association; 2013. ISBN 978-3-905674-44-6.
[19] Li B, Lu Y, Li C, Godil A, Schreck T, Aono M, Burtscher M, Fu H, Furuya T, Johan H, Liu J, Ohbuchi R, Tatsuma A, Zou C. Extended Large Scale Sketch-Based 3D Shape Retrieval. In: Bustos B, Tabia H, Vandeborre J-P, Veltkamp R, editors. Eurographics Workshop on 3D Object Retrieval. The Eurographics Association. ISBN 978-3-905674-58-3.
[20] Qin F-w, Li L-y, Gao S-m, Yang X-l, Chen X. A deep learning approach to the classification of 3D CAD models. Journal of Zhejiang University SCIENCE C 2014;15(2):91–106. doi:10.1631/jzus.C1300185.
[21] Koch S, Matveev A, Jiang Z, Williams F, Artemov A, Burnaev E, Alexa M, Zorin D, Panozzo D. Abc: A big cad model dataset for geometric deep learning. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2019.
[22] Ip CY, Regli WC, Sieger L, Shokoufandeh A. Automated learning of model classifications. In: Proceedings of the Eighth ACM Symposium on Solid Modeling and Applications. New York, NY, USA: ACM; 2003. p. 322–7. ISBN 1-58113-706-0. doi:10.1145/781606.781659.
[23] Wu MC, Jen SR. A neural network approach to the classification of 3D prismatic parts. Int J Adv Manuf Technol 1996;11(5):325–35. doi:10.1007/BF01845691.
[24] Bespalov D, Ip CY, Regli WC, Shaffer J. Benchmarking CAD search techniques. In: Proceedings of the 2005 ACM Symposium on Solid and Physical Modeling. New York, NY, USA: ACM; 2005. p. 275–86. ISBN 1-59593-015-9. doi:10.1145/1060244.1060275.
[25] Qin F, Gao S, Yang X, Bai J, hong Zhao Q. A sketch-based semantic retrieval approach for 3d cad models. Appl Math-A J Chinese Univer 2017;32:27–52.
[26] Jayanti S, Kalyanaraman Y, Iyer N, Ramani K. Developing an engineering shape benchmark for CAD models. Comput-Aided Des 2006;38(9):939–53. doi:10.1016/j.cad.2006.06.007. Shape Similarity Detection and Search for CAD/CAE Applications.
[27] Kim S, Chi H-g, Hu X, Huang Q, Ramani K. A large-scale annotated mechanical components benchmark for classification and retrieval tasks with deep neural networks. In: Proceedings of 16th European Conference on Computer Vision (ECCV); 2020.
[28] Manda B, Bhaskare P, Muthuganapathy R. A convolutional neural network approach to the classification of engineering models. IEEE Access 2021. doi:10.1109/ACCESS.2021.3055826.
[29] Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L. Imagenet large scale visual recognition challenge. Int J Comput Vis 2015;115(3):211–52. doi:10.1007/s11263-015-0816-y.
[30] Sangkloy P, Burnell N, Ham C, Hays J. The sketchy database: Learning to retrieve badly drawn bunnies. ACM Trans Graph 2016;35(4). doi:10.1145/2897824.2925954.
[31] Eitz M, Hays J, Alexa M. How do humans sketch objects? ACM Trans Graph 2012b;31(4). doi:10.1145/2185520.2185540.
[32] Radenovic F, Tolias G, Chum O. Deep shape matching. In: Proceedings of the European Conference on Computer Vision (ECCV); 2018.
[33] Dey S, Riba P, Dutta A, Lladós J, Song Y. Doodle to search: Practical zero-shot sketch-based image retrieval. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2019. p. 2174–83. doi:10.1109/CVPR.2019.00228.
[34] Yelamarthi SK, Reddy MK, Mishra A, Mittal A. A zero-shot framework for sketch-based image retrieval. In: ECCV; 2018.
[35] Bhunia A, Yang Y, Hospedales TM, Xiang T, Song Y-Z. Sketch less for more: On-the-fly fine-grained sketch-based image retrieval. 2020 IEEE/CVF Conf Comput Vis Pattern Recognit (CVPR) 2020:9776–85.
[36] Labs PVR. ModelNet benchmark leaderboard; 2021. http://modelnet.cs.princeton.edu/.
[37] Su H, Maji S, Kalogerakis E, Learned-Miller EG. Multi-view convolutional neural networks for 3d shape recognition. 2015 IEEE Int Conf Comput Vis (ICCV) 2015:945–53.
[38] Sfikas K, Theoharis T, Pratikakis I. Exploiting the PANORAMA representation for convolutional neural network classification and retrieval. In: Pratikakis I, Dupont F, Ovsjanikov M, editors. Eurographics Workshop on 3D Object Retrieval. The Eurographics Association; 2017. ISBN 978-3-03868-030-7.
[39] Maturana D, Scherer S. Voxnet: A 3d convolutional neural network for real-time object recognition. In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); 2015. p. 922–8. doi:10.1109/IROS.2015.7353481.
[40] Li B, Schreck T, Godil A, Alexa M, Boubekeur T, Bustos B, Chen J, Eitz M, Furuya T, Hildebrand K, Huang S, Johan H, Kuijper A, Ohbuchi R, Richter R, Saavedra JM, Scherer M, Yanagimachi T, Yoon GJ, Yoon SM. SHREC'12 Track: Sketch-Based 3D Shape Retrieval. In: Spagnuolo M, Bronstein M, Bronstein A, Ferreira A, editors. Eurographics Workshop on 3D Object Retrieval. The Eurographics Association; 2012. ISBN 978-3-905674-36-1.
[41] Hua B-S, Truong Q-T, Tran M-K, Pham Q-H, Kanezaki A, Lee T, Chiang H, Hsu W, Li B, Lu Y, Johan H, Tashiro S, Aono M, Tran M-T, Pham V-K, Nguyen H-D, Nguyen V-T, Tran Q-T, Phan TV, Truong B, Do MN, Duong A-D, Yu L-F, Nguyen DT, Yeung S-K. RGB-D to CAD Retrieval with ObjectNN Dataset. In: Pratikakis I, Dupont F, Ovsjanikov M, editors. Eurographics Workshop on 3D Object Retrieval. The Eurographics Association; 2017. ISBN 978-3-03868-030-7.
[42] Muthuganapathy R, Ramani K. Shape retrieval contest 2008: Cad models. In: 2008 IEEE International Conference on Shape Modeling and Applications; 2008. p. 221–2. doi:10.1109/SMI.2008.4547977.
[43] Jain A, Muthuganapathy R, Ramani K. Content-based image retrieval using shape and depth from an engineering database. In: Proceedings of the 3rd International Conference on Advances in Visual Computing - Volume Part II. Berlin, Heidelberg: Springer-Verlag; 2007. p. 255–64. ISBN 3540768556.
[44] Pu J, Ramani K. On visual similarity based 2d drawing retrieval. Comput-Aided Des 2006;38(3):249–59. doi:10.1016/j.cad.2005.10.009.
[45] Hou S, Ramani K. Calligraphic interfaces: Classifier combination for sketch-based 3d part retrieval. Comput Graph 2007;31(4):598–609. doi:10.1016/j.cag.2007.04.005.
[46] Pu J, Ramani K. A 3d model retrieval method using 2d freehand sketches. In: Sunderam VS, van Albada GD, Sloot PMA, Dongarra JJ, editors. Computational Science – ICCS 2005. Berlin, Heidelberg: Springer Berlin Heidelberg; 2005. p. 343–6. ISBN 978-3-540-32114-9.
[47] Bonnici A, Akman A, Calleja G, Camilleri KP, Fehling P, Ferreira A, Hermuth F, Israel JH, Landwehr T, Liu J, et al. Sketch-based interaction and modeling: where do we stand? Artif Intell Eng Design Anal Manuf: AI EDAM 2019;33(4):370–88.
[48] Gryaditskaya Y, Sypesteyn M, Hoftijzer JW, Pont S, Durand F, Bousseau A. Opensketch: a richly-annotated dataset of product design sketches. ACM Trans Graphics (Proc SIGGRAPH Asia) 2019;38.
[49] Han W, Xiang S, Liu C, Wang R, Feng C. Spare3d: A dataset for spatial reasoning on three-view line drawings. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020. p. 14678–87. doi:10.1109/CVPR42600.2020.01470.
[50] Zhong Y, Qi Y, Gryaditskaya Y, Zhang H, Song Y-Z. Towards practical sketch-based 3d shape generation: The role of professional sketches. IEEE Trans Circuit Syst Video Technol 2020. doi:10.1109/TCSVT.2020.3040900.
[51] Seff A, Ovadia Y, Zhou W, Adams R. Sketchgraphs: A large-scale dataset for modeling relational geometry in computer-aided design. ArXiv 2020;abs/2007.08506.
[52] Chen D-Y, Tian X-P, Shen Y-T, Ouhyoung M. On visual similarity based 3D model retrieval. Comput Graphic Forum 2003;22(3):223–32. doi:10.1111/1467-8659.00669.
[53] Canny J. A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 1986;PAMI-8(6):679–98. doi:10.1109/TPAMI.1986.4767851.
[54] Liu D, Nabail M, Hertzmann A, Kalogerakis E. Neural contours: Learning to draw lines from 3d shapes. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020. p. 5427–35. doi:10.1109/CVPR42600.2020.00547.
[55] Li M, Lin Z, Mech R, Yumer E, Ramanan D. Photo-sketching: Inferring contour drawings from images. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV); 2019. p. 1403–12. doi:10.1109/WACV.2019.00154.
[56] Huan L, Zheng X, Xue N, He W, Gong J, Xia G. Unmixing convolutional features for crisp edge detection. CoRR 2020;abs/2011.09808.
[57] Wang Z, Simoncelli E, Bovik A. Multiscale structural similarity for image quality assessment. In: The Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, 2003, vol. 2; 2003. p. 1398–402. doi:10.1109/ACSSC.2003.1292216.
[58] Sheikh H, Bovik A. Image information and visual quality. IEEE Trans Image Process 2006;15(2):430–44. doi:10.1109/TIP.2005.859378.
[59] Wang Z, Bovik A. A universal image quality index. IEEE Signal Process Lett 2002;9(3):81–4. doi:10.1109/97.995823.
[60] Maturana D, Scherer S. Voxnet: A 3d convolutional neural network for real-time object recognition. 2015 IEEE/RSJ Int Conf Intell Robot Syst (IROS) 2015:922–8.
[61] Feng Y, Zhang Z, Zhao X, Ji R, Gao Y. Gvcnn: Group-view convolutional neural networks for 3d shape recognition. 2018 IEEE/CVF Conf Comput Vis Pattern Recognit 2018:264–72.
[62] Kanezaki A, Matsushita Y, Nishida Y. Rotationnet for joint object categorization and unsupervised pose estimation from multi-view images. IEEE Trans Pattern Anal Mach Intell (TPAMI) 2021;43(1):269–83. doi:10.1109/TPAMI.2019.2922640.
[63] Shajahan DA, Nayel V, Muthuganapathy R. Roof classification from 3-d lidar point clouds using multiview cnn with self-attention. IEEE Geosci Remote Sens Lett 2020;17(8):1465–9. doi:10.1109/LGRS.2019.2945886.
[64] Wang F, Kang L, Li Y. Sketch-based 3d shape retrieval using convolutional neural networks. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2015. p. 1875–83. doi:10.1109/CVPR.2015.7298797.
[65] Goodfellow I, Bengio Y, Courville A. Deep Learning, Chapter - Practical Methodology. MIT Press; 2016. http://www.deeplearningbook.org
[66] Bengio Y. Practical recommendations for gradient-based training of deep architectures. CoRR 2012;abs/1206.5533.
[67] Larochelle H, Bengio Y, Louradour J, Lamblin P. Exploring strategies for training deep neural networks. J Mach Learn Res 2009;10:1–40.
[68] Lecun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE 1998;86(11):2278–324. doi:10.1109/5.726791.
[69] Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ, editors. Advances in Neural Information Processing Systems 25. Curran Associates, Inc.; 2012. p. 1097–105.
[70] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. CoRR 2014;abs/1409.1556.
[71] Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2016. p. 2818–26.
[72] Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2017. p. 4700–8.
[73] Chollet F. Xception: deep learning with depthwise separable convolutions. arXiv preprint arXiv:1610.02357; 2016.
[74] He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016. p. 770–8. doi:10.1109/CVPR.2016.90.
[75] Xie S, Girshick R, Dollár P, Tu Z, He K. Aggregated residual transformations for deep neural networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2017. p. 5987–95. doi:10.1109/CVPR.2017.634.
[76] Han Z, Mo R, Yang H, Hao L. CAD assembly model retrieval based on multi-source semantics information and weighted bipartite graph. Comput Ind 2018;96:54–65. doi:10.1016/j.compind.2018.01.003.