Network Inversion and It
CVPR 2025 Submission #18183. CONFIDENTIAL REVIEW COPY. DO NOT DISTRIBUTE.
Paper ID 18183
Additionally, we incorporate feature orthogonality as a regularization term to boost image diversity. It penalises deviations of the Gram matrix of the features from the identity matrix, ensuring orthogonality and promoting distinct, non-redundant representations for each label. The paper concludes by exploring immediate applications of the proposed network inversion approach in interpretability, out-of-distribution detection, and training data reconstruction.

1. Introduction

Neural networks have become indispensable in a wide array of applications, ranging from image recognition and natural language processing to autonomous driving and medical diagnostics. Despite their remarkable performance, the decision-making processes within these networks often remain elusive, earning them the moniker "black boxes." This [...]

[...] that learns the input space corresponding to different classes within a classifier using a single conditioned generator. The generator is trained to produce a diverse set of samples from the input space with desired labels, guided by a combination of losses including cross-entropy, KL Divergence, cosine similarity and feature orthogonality. To ensure the generator learns a diverse set of inputs, we alter the conditioning from simple labels to vectors and matrices that encode the label information. This diversity is further reinforced through the application of heavy dropout during the generation process, specifically during up-convolution, and by minimizing the cosine similarity between the features of the generated images as returned by the classifier. Additionally, we incorporate feature orthogonality as a regularization term by penalizing deviations of the Gram matrix of the features from the identity matrix. The orthogonality loss combined with cosine similarity helps achieve a more varied and representative set of generated images, each corresponding to different conditioning vectors.
Our methodology not only increases the diversity of the generated inputs but also provides deeper insights into the decision-making processes of neural networks. By revealing the hidden patterns and features that influence network predictions, we gain a more comprehensive understanding of neural network behavior. This understanding is crucial for several applications, including improving interpretability, enhancing safety, and boosting adversarial robustness.

We also demonstrate how our approach can be used to generate interpretable decision boundaries from the features of inverted images, providing a clearer view of the network's classification strategies. Furthermore, we explore the applications of inversion in reconstructing training data by exploiting unique properties of the training data relative to the classifier. We also employ inversion for out-of-distribution (OOD) detection by retraining the classifier with an additional class for "garbage" inverted samples, which aids in identifying data that do not belong to the training distribution.

2. Related Works

The concept of neural network inversion has garnered significant attention as a method for visualizing and understanding the internal mechanisms of neural networks. Inversion seeks to identify input patterns that closely approximate a given output target, thereby revealing the information processing capabilities embedded within the network's weights. These methods reveal important insights into how models represent and manipulate data, offering a pathway to expose the latent structure of neural networks. Early research on inversion for multi-layer perceptrons in [8], derived from the back-propagation algorithm, demonstrates the utility of this method in applications like digit recognition. It highlights that while multi-layer perceptrons exhibit strong generalization capabilities (successfully classifying untrained digits), they often falter in rejecting counterexamples, such as random patterns.

Subsequently, [7] expanded on this idea by proposing evolutionary inversion procedures for feed-forward networks, which stand out for their ability to identify multiple inversion points simultaneously, providing a more comprehensive view of the network's input-output relationships. The paper [14] explores the lack of explanation capability in artificial neural networks (ANNs) and introduces an inversion-based method for rule extraction that calculates the input patterns corresponding to specific output targets, allowing for the generation of hyperplane-based rules that explain the neural network's decision-making process. [18] addresses the problem of inverting deep networks to find inputs that minimize certain output criteria by reformulating network propagation as a constrained optimization problem and solving it using the alternating direction method of multipliers. Model Inversion attacks in adversarial settings are studied in [21], where an attacker aims to infer training data from a model's predictions by training a secondary neural network to perform the inversion, using the adversary's background knowledge to construct an auxiliary dataset, without access to the original training data.

The paper [10] presents a method for tackling data-driven optimization problems, where the goal is to find inputs that maximize an unknown score function. It proposes Model Inversion Networks (MINs), which learn an inverse mapping from scores to inputs, allowing them to scale to high-dimensional input spaces. In contrast, [1] introduces an automated method for inversion that focuses on the reliability of inverse solutions by seeking them near reliable data points that are sampled from the forward process and used for training the surrogate model. By incorporating predictive uncertainty into the inversion process and minimizing it, this approach achieves higher accuracy and robustness.

Traditional methods for network inversion often rely on gradient descent through a highly non-convex loss landscape, leading to slow and unstable optimization processes. To address these challenges, recent work by [11] proposes learning a loss landscape where gradient descent becomes efficient, thus significantly improving the speed and stability of the inversion process. Similarly, [16] proposes an alternate approach to inversion by encoding the network into a Conjunctive Normal Form (CNF) propositional formula and using SAT solvers and samplers to find satisfying assignments for the constrained CNF formula. This method, unlike optimization-based approaches, is deterministic and ensures the generation of diverse input samples with desired labels. However, the downside of this approach lies in its computational complexity, which makes it less feasible for large-scale practical applications.

Our approach to neural network inversion aims to strike a balance between computational efficiency and the diversity of generated inputs by using a carefully conditioned generator trained to learn the data distribution in the input space of a trained neural network. The conditioning information is encoded into vectors in a concealed manner to enhance the diversity of the generated inputs by avoiding easy shortcut solutions. This method is further enhanced through the application of heavy dropout during the generation process and the minimization of cosine similarity between the features of a batch of generated images. This combination of techniques encourages a diverse representation of the input space for any given output, thereby addressing the limitations of previous methods. Additionally, our approach is computationally less expensive than search-based SAT solvers, making it more feasible for practical applications.
3. Methodology

Our approach to Network Inversion uses a single carefully conditioned generator that learns diverse data distributions in the input space of the trained classifier.

3.1. Classifier

In this paper, inversion and reconstruction are performed on a classifier which includes convolution and fully connected layers as appropriate to the classification task. We use standard non-linearity layers like Leaky-ReLU [20] and Dropout layers [15] in the classifier for regularisation purposes to discourage memorisation. The classification network is trained on a particular dataset and then held in evaluation mode for the purpose of inversion.

3.2. Generator

The images in the input space of the classifier are generated by an appropriately conditioned generator. The generator builds up from a latent vector through up-convolution operations to generate an image of the given size. Generators are conventionally conditioned on a learnt embedding of a label for generative modelling tasks. We observe that, owing to its simplicity, this form of conditioning is ineffective for network inversion, and instead propose more intense conditioning mechanisms using vectors and matrices.

3.2.1. Label Conditioning

Label Conditioning is a simple approach that conditions the generator on an embedding learnt from the labels, each representative of a separate class. The conditioning labels are then used in the cross-entropy loss function with the outputs of the classifier. While Label Conditioning can be used for inversion, the inverted samples do not seem to have the diversity that is expected of the inversion process, owing to the simplicity of the conditioning and the varying confidence behind the same label.

3.2.2. Vector Conditioning

In order to achieve more diversity in the generated images, the conditioning mechanism of the generator is altered by encoding the label information into an $N$-dimensional vector for an $N$-class classification task. The vectors for this purpose are randomly generated from a normal distribution and then soft-maxed to represent an input conditioning distribution for the generated images. The argmax index of the soft-maxed vectors now serves as the de facto conditioning label, which can be used in the cross-entropy loss function without being explicitly revealed to the generator.

3.2.3. Intermediate Matrix Conditioning

Vector Conditioning allows for encoding the label information into the vectors using the argmax criterion. This can be further extended into Matrix Conditioning, which appears to serve as a better prior for generating images and allows for more ways to encode the label information, better capturing the diversity in the inversion process. In its simplest form we use a Hot Conditioning Matrix, an $N \times N$ matrix in which all the elements in a given row and column (of the same index) are set to one while all remaining entries are zero. The index of the row or column set to one now serves as the label for conditioning purposes. The conditioning matrix is concatenated with the latent vector at an intermediate stage, after the latent vector has been up-sampled to $N \times N$ spatial dimensions, while the generation up to this point remains unconditioned.

3.2.4. Vector-Matrix Conditioning

Since the generation is initially unconditioned in Intermediate Matrix Conditioning, we combine both vector and matrix conditioning. Vectors are used for early conditioning of the generator up to $N \times N$ spatial dimensions, followed by concatenation of the conditioning matrix for subsequent generation. The argmax index of the vector, which is the same as the row or column index set to high in the matrix, now serves as the conditioning label.

3.3. Network Inversion

The main objective of Network Inversion is to generate images that, when passed through the classifier, elicit the same label as the one the generator was conditioned on. Achieving this objective through a straightforward cross-entropy loss between the conditioning label and the classifier's output can lead to mode collapse, where the generator finds shortcuts that undermine diversity. With the classifier trained, the inversion is performed by training the generator to learn the data distribution for different classes in the input space of the classifier, as shown schematically in Figure 1, using a combined loss function $L_{\text{Inv}}$ defined as:

$$L_{\text{Inv}} = \alpha \cdot L_{\text{KL}} + \beta \cdot L_{\text{CE}} + \gamma \cdot L_{\text{Cosine}} + \delta \cdot L_{\text{Ortho}}$$

where $L_{\text{KL}}$ is the KL Divergence loss, $L_{\text{CE}}$ is the Cross Entropy loss, $L_{\text{Cosine}}$ is the Cosine Similarity loss, and $L_{\text{Ortho}}$ is the Feature Orthogonality loss. The hyperparameters $\alpha, \beta, \gamma, \delta$ control the contribution of each individual loss term, defined as:

$$L_{\text{KL}} = D_{\text{KL}}(P \,\|\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}$$

$$L_{\text{CE}} = -\sum_{i} y_i \log(\hat{y}_i)$$

$$L_{\text{Cosine}} = \frac{1}{N(N-1)} \sum_{i \neq j} \cos(\theta_{ij})$$

$$L_{\text{Ortho}} = \frac{1}{N^2} \sum_{i,j} (G_{ij} - \delta_{ij})^2$$
where $D_{\text{KL}}$ represents the KL Divergence between the input distribution $P$ and the output distribution $Q$, $y_i$ is the encoded conditioning label, $\hat{y}_i$ is the predicted label from the classifier, $\cos(\theta_{ij})$ represents the cosine similarity between the features of generated images $i$ and $j$, $G_{ij}$ is an element of the Gram matrix of the features, and $\delta_{ij}$ is the Kronecker delta function. $N$ is the number of feature vectors in the batch.

Thus, the combined loss function ensures that the generator matches the input and output distributions using KL Divergence and also generates images with desired labels using Cross Entropy, while maintaining diversity in the generated images through Feature Orthogonality and Cosine Similarity.

3.3.1. Cross Entropy

The key goal of the inversion process is to generate images with the desired labels, which can be achieved using the cross-entropy loss. In cases where the label information is encoded into the vectors without being explicitly revealed to the generator, the encoded labels can be used in the cross-entropy loss function with the classifier outputs for the generated images in order to train the generator. In contrast to label conditioning, vector conditioning complicates the training objective to the extent that the generator does not converge immediately. Instead, convergence occurs only once the generator figures out the encoded conditioning mechanism, allowing for a better exploration of the input space of the classifier.

3.3.2. KL Divergence

KL Divergence is used to train the generator to learn the data distribution in the input space of the classifier for different conditioning vectors. During training, the KL Divergence loss measures the difference between the output distribution of the generated images, as predicted by the classifier, and the conditioning distribution used to generate these images, and this difference is minimised. This divergence metric is crucial for aligning the generated image distributions with the intended conditioning distribution.

3.3.3. Cosine Similarity

To enhance the diversity of the generated images, we measure the pairwise cosine similarity between the features of a batch of generated images, taken from the last fully connected layers, and minimise it. Minimising the cosine similarity increases the angular distance between features and so promotes variability in the generated images. The combination of cosine similarity with cross-entropy loss not only encourages the generated images to be classified correctly but also enforces diversity among the images produced for each label.

3.3.4. Feature Orthogonality

In addition to the cosine similarity loss, we incorporate feature orthogonality as a regularization term to further enhance the diversity of generated images by minimizing the deviation of the Gram matrix of the features from the identity matrix. By encouraging the features of generated images to be orthogonal, we promote the generation of distinct and non-redundant representations for each conditioning label.
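The four terms of the combined inversion loss in Section 3.3 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gram matrix is computed on L2-normalised features (so that the identity matrix is a meaningful target), per-sample KL and cross-entropy values are averaged over the batch, and the function names and default weights are hypothetical. A practical version would use a tensor library with automatic differentiation.

```python
import math


def argmax(v):
    """Index of the largest entry of a list."""
    return max(range(len(v)), key=v.__getitem__)


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) = sum_i P(i) * log(P(i) / Q(i))."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(label, q, eps=1e-12):
    """-sum_i y_i * log(yhat_i) for a one-hot target at index `label`."""
    return -math.log(q[label] + eps)


def normalize(f):
    """Scale a feature vector to unit L2 norm (assumption, see lead-in)."""
    norm = math.sqrt(sum(x * x for x in f)) or 1.0
    return [x / norm for x in f]


def gram(features):
    """Gram matrix G_ij of the normalised features (pairwise cosines)."""
    unit = [normalize(f) for f in features]
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]


def cosine_loss(g):
    """Mean off-diagonal cosine similarity: 1/(N(N-1)) * sum_{i != j} cos(theta_ij)."""
    n = len(g)
    off_diag = sum(g[i][j] for i in range(n) for j in range(n) if i != j)
    return off_diag / (n * (n - 1))


def ortho_loss(g):
    """1/N^2 * sum_{i,j} (G_ij - delta_ij)^2, with delta_ij the Kronecker delta."""
    n = len(g)
    total = sum(
        (g[i][j] - (1.0 if i == j else 0.0)) ** 2
        for i in range(n)
        for j in range(n)
    )
    return total / n**2


def inversion_loss(cond, probs, features, alpha=1.0, beta=1.0, gamma=1.0, delta=1.0):
    """alpha*L_KL + beta*L_CE + gamma*L_Cosine + delta*L_Ortho for one batch.

    cond     : soft-maxed conditioning vectors (distribution P)
    probs    : classifier softmax outputs for the generated images (Q)
    features : classifier feature vectors for the generated images
    """
    n = len(cond)
    l_kl = sum(kl_divergence(p, q) for p, q in zip(cond, probs)) / n
    # The conditioning label is the argmax of the conditioning vector.
    l_ce = sum(cross_entropy(argmax(p), q) for p, q in zip(cond, probs)) / n
    g = gram(features)
    return alpha * l_kl + beta * l_ce + gamma * cosine_loss(g) + delta * ortho_loss(g)
```

With identical classifier outputs, a batch whose features are collapsed onto one direction is penalised more than a batch with orthogonal features, which is the behaviour the diversity terms are meant to provide.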
Figure 2. Inverted Images for all 10 classes in MNIST, FashionMNIST, SVHN & CIFAR-10 respectively.
In this section, we present the experimental results obtained by applying our inversion technique on the MNIST [4], FashionMNIST [19], SVHN and CIFAR-10 [9] datasets by training a generator to produce images that, when passed through a classifier, elicit the desired labels. The classifier is first trained normally on a dataset and then held in evaluation mode for the purpose of inversion and reconstruction. The images generated by the conditioned generator from the latent and conditioning vectors are then passed through the classifier.

The classifier is a simple multi-layer convolutional neural network consisting of convolutional layers, dropout layers, batch normalization, and leaky-ReLU activation followed by fully connected layers and softmax for classification. The generator is based on Vector-Matrix Conditioning, in which the class labels are encoded into random soft-maxed vectors that are concatenated with the latent vector, followed by multiple layers of transposed convolutions, batch normalization [6] and dropout layers [15] to encourage diversity in the generated images. Once the vectors are up-sampled to $N \times N$ spatial dimensions for an $N$-class classification task, they are concatenated with a conditioning matrix for subsequent generation up to the required image size of $28 \times 28$ or $32 \times 32$.

The inverted images are visualized in Figure 2 to assess the quality and diversity of the generated samples for all 10 classes of MNIST, FashionMNIST, SVHN and CIFAR-10 respectively. Each row corresponds to a different class and each column to a different generator, so the images within each row represent the diversity of samples generated for that class. We observe that a high weightage on cosine similarity increases both the inter-class and the intra-class diversity in the generated samples of a single generator. These inverted samples, which are confidently classified by the classifier, are unlike anything the model was trained on and yet lie in the input space of different labels, highlighting the unsuitability of such classifiers for safety-critical tasks.

In this section we study the applications of our proposed network inversion approach in interpretability, out-of-distribution (OOD) detection, and training data reconstruction.

5.1. Interpretability

In interpretability, we analyze the features of the inverted samples generated after training the generator to an Inversion Accuracy of over 95%. Inversion Accuracy refers to the percentage of generated images for which the output label from the classifier matches the desired conditioning label. Figure 3 shows the PCA plots, decision boundaries and t-SNE plots respectively for the features in the penultimate layer of the classifier for the inverted samples.

To evaluate the diversity of features corresponding to each class, PCA (Principal Component Analysis) plots illustrate the distribution of class-specific features across the feature space. This spread aligns with our objective of generating diverse inputs that represent various characteristics within a particular class, giving a more comprehensive understanding of each class's feature space. We further visualize interpretable decision boundaries by mapping the PCA-transformed features onto a mesh grid and performing inference on this feature mesh grid, allowing us to highlight the regions associated with different classes. This visualization reveals how features from inverted samples influence the network's classification criteria, providing insight into the model's internal decision-making structure.

Additionally, we employ t-SNE (t-distributed Stochastic Neighbor Embedding) plots to explore the distinct distributions from which the inverted images are generated. As can be observed from the t-SNE plots, for a specific value of the hyperparameters in our loss function, images for each class originate from two separate distributions. We also observe that increasing the weightage of the cosine similarity term in the loss function further enhances the diversity among inverted samples, leading to the emergence of multiple clusters within each class.
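The Inversion Accuracy defined above can be illustrated with a short sketch. It assumes the vector conditioning of Section 3.2.2, where the conditioning label is the argmax of a soft-maxed random normal vector. The helper names are hypothetical, and the classifier outputs passed in are stand-ins for what a real pipeline would obtain by running the generated images through the classifier.

```python
import math
import random


def softmax(v):
    """Numerically stable softmax of a list of floats."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(v):
    """Index of the largest entry of a list."""
    return max(range(len(v)), key=v.__getitem__)


def sample_conditioning(num_classes, rng):
    """Random normal vector, soft-maxed; its argmax is the concealed label."""
    z = [rng.gauss(0.0, 1.0) for _ in range(num_classes)]
    return softmax(z)


def inversion_accuracy(cond_vectors, classifier_outputs):
    """Percentage of samples whose predicted label equals the conditioning label."""
    hits = sum(
        argmax(c) == argmax(o) for c, o in zip(cond_vectors, classifier_outputs)
    )
    return 100.0 * hits / len(cond_vectors)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
conds = [sample_conditioning(10, rng) for _ in range(8)]
# Stand-in for classifier(generator(latent, cond)): a perfect generator
# would make the classifier reproduce the conditioning label every time.
print(inversion_accuracy(conds, conds))
```

Only the argmax of each vector enters the metric, so two generators can reach the same Inversion Accuracy while matching the conditioning distribution to very different degrees; the KL term is what accounts for that difference.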
Figure 3. PCA, Decision Boundaries and t-SNE plots of the features extracted from the generated images. Each color represents a different
class. While the PCA plot illustrates the spread of a class in the feature space, t-SNE plot shows the two different clusters for each class.
Lastly, we also use Sparse Autoencoders (SAEs) [12, 13] on the features in the penultimate layer of the classifier for the inverted samples to identify interpretable feature patterns. While SAEs have traditionally been applied to analyze features of the actual training data, our application of SAEs on a diverse set of inverted samples reveals distinct feature activations for different data distributions generated by various generators within the same class. Notably, the set of features activated for the training data differs from those activated for random inverted samples even within the same class, suggesting that this approach could potentially be used in anomaly detection.

5.2. Out-of-Distribution Detection

Classifiers excel at distinguishing between different classes with high accuracy, but they often fail when confronted with out-of-distribution (OOD) samples, that is, inputs that deviate from the training distribution. These counterexamples are frequently misclassified with high confidence, exposing a critical weakness in the model's generalization capabilities. Therefore, it is crucial to make classifiers more robust to counterexamples and enable them to flag OOD samples into a separate class.

To address this issue, we leverage network inversion to generate OOD samples from the classifier's input space for each label. These inverted samples, which represent data outside the classifier's learned distribution, are then assigned to a designated "garbage" class. This process allows the classifier to explicitly recognize and isolate spurious data points, improving its robustness and reliability.

To implement this approach, we begin by training the classifier with an extra "garbage" class that initially contains samples generated from random Gaussian noise. This step provides a baseline for the classifier to identify non-specific, out-of-distribution patterns. After each training epoch, inverted samples generated for each class are added to this "garbage" class, further augmenting the dataset with a wider variety of OOD examples. This iterative retraining process helps the classifier better recognize the boundary between legitimate (in-distribution) and spurious (OOD) data, improving its robustness in practical applications.

To manage the data imbalance introduced by the addition of the garbage class, we employ a weighted cross-entropy loss function. This function assigns a weight to each class, and the weights are dynamically adjusted after each epoch to reflect the evolving class distribution. The weighting lets the classifier continue to learn effectively from the augmented dataset while compensating for the imbalance caused by the inclusion of inverted samples.

OOD detection experiments were conducted on models trained on MNIST and tested for OOD detection on FashionMNIST, and vice versa. Similarly, models trained on SVHN were tested on CIFAR-10, and vice versa. These experiments evaluate the effectiveness of the proposed approach in identifying OOD samples and assigning them to the "garbage" class. We observe that while the majority of OOD samples are correctly assigned to the garbage class, a small percentage of the samples can still be misclassified into in-distribution classes. However, a significant finding is that the least confidently classified in-distribution sample is still classified more confidently than the most confidently misclassified out-of-distribution sample, suggesting the existence of a clear threshold. This distinction in confidence levels indicates that the proposed approach separates in-distribution and OOD data well, highlighting its potential for reliable OOD detection in practical scenarios.
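The weighted cross-entropy and the confidence threshold described in Section 5.2 can be sketched as below. The text does not specify how the class weights are computed, so the inverse-frequency scheme used here is an assumption, and the function names are hypothetical.

```python
import math


def class_weights(counts):
    """Inverse-frequency class weights (assumed scheme).

    Scaled so that the per-sample average weight is 1; recomputed after each
    epoch as the garbage class (last entry of `counts`) grows.
    """
    total = sum(counts)
    k = len(counts)
    return [total / (k * c) if c > 0 else 0.0 for c in counts]


def weighted_cross_entropy(probs, labels, weights, eps=1e-12):
    """Weighted mean of -w_y * log(p_y) over a batch of softmax outputs."""
    num = sum(weights[y] * -math.log(p[y] + eps) for p, y in zip(probs, labels))
    den = sum(weights[y] for y in labels)
    return num / den


def confidence_gap(in_dist_conf, ood_misclassified_conf):
    """Least confident in-distribution score minus most confident OOD error.

    A positive gap means a single confidence threshold separates the two sets.
    """
    return min(in_dist_conf) - max(ood_misclassified_conf)


# Two in-distribution classes plus a garbage class that has grown larger.
weights = class_weights([100, 100, 200])
loss = weighted_cross_entropy(
    probs=[[0.8, 0.1, 0.1], [0.2, 0.2, 0.6]], labels=[0, 2], weights=weights
)
# Hypothetical confidence values, for illustration only.
gap = confidence_gap([0.91, 0.99], [0.40, 0.70])
```

Any threshold placed inside a positive gap rejects every misclassified OOD sample without rejecting an in-distribution sample, which is the property the experiments report.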
5.3. Training-Like Data Reconstruction

We also demonstrate the utility of our proposed network inversion technique for reconstructing training-like data entirely from the weights of a single classifier, without any insight into the training process. The paper [5] introduces a novel reconstruction method based on the implicit bias of gradient-based training methods to reconstruct training samples, focusing specifically on binary MLP classifiers. Later, [3] extends these results by performing training data reconstruction in a multi-class setting on models trained on an even larger number of samples. The paper [2] addresses the issue of whether an informed adversary, who has knowledge of all training data points except one, can successfully reconstruct the missing data point given access to the trained machine learning model. Subsequently, [17] investigates how model gradients can leak sensitive information about training data, posing serious privacy concerns.

While in [5] training data is reconstructed for binary multi-layer perceptron classifiers trained using binary cross entropy, we steer network inversion to reconstruct data similar to the training set for Convolutional Neural Nets. In restricted settings, over-parameterized models can easily memorize portions of the training data, leading to successful reconstructions. For under-parameterized models, where there is no possibility of memorization and the models generalize well, reconstructions are typically more difficult. Also, in fully connected layers each input feature is assigned dedicated weights, which may make reconstruction easier as the model captures more direct associations between inputs and outputs. In convolutional layers, by contrast, the weight-sharing mechanism applies the same set of weights across different parts of the input, which makes reconstruction more challenging.

Network Inversion can be used for training data reconstruction, as shown schematically in Figure 4, by exploiting key properties of the training data in relation to the classifier that guide the generator towards producing training-like data. These include model confidence, robustness to perturbations, and gradient behavior, along with some prior knowledge about the training data.

In order to take model confidence into account, we use hot conditioning vectors in reconstruction instead of the soft conditioning vectors used in inversion, encouraging the generation of samples that elicit high-confidence predictions from the model. Since the classifier is expected to handle perturbations around the training data effectively, the perturbed images should retain the same labels and also be confidently classified. To achieve this, we introduce an $L_\infty$ perturbation to the generated images and pass both the original and perturbed images, represented by dashed lines, through the classifier and use them in the loss evaluation. We also introduce a gradient minimization loss that penalises large gradients of the classifier's output with respect to its weights when processing the generated images, so that the generator produces samples with a small gradient norm, a property expected of the training samples. Furthermore, we incorporate prior knowledge through pixel constraint and variational losses to encourage generated images that have valid pixel values and are noise-free, yielding visually realistic and smooth reconstructions.

Hence the previously defined inversion loss $L_{\text{Inv}}$ is augmented to include the above aspects into a combined reconstruction loss $L_{\text{Recon}}$ defined as:

$$L_{\text{Recon}} = \alpha \cdot L_{\text{KL}} + \alpha' \cdot L_{\text{KL}}^{\text{pert}} + \beta \cdot L_{\text{CE}} + \beta' \cdot L_{\text{CE}}^{\text{pert}} + \gamma \cdot L_{\text{Cosine}} + \delta \cdot L_{\text{Ortho}} + \eta_1 \cdot L_{\text{Var}} + \eta_2 \cdot L_{\text{Pix}} + \eta_3 \cdot L_{\text{Grad}}$$

where $L_{\text{KL}}^{\text{pert}}$ and $L_{\text{CE}}^{\text{pert}}$ represent the KL divergence and cross-entropy losses applied on perturbed images, weighted by $\alpha'$ and $\beta'$ respectively, while $L_{\text{Var}}$, $L_{\text{Pix}}$ and $L_{\text{Grad}}$ represent the variational loss, the pixel loss and the penalty on the gradient norm, weighted by $\eta_1$, $\eta_2$ and $\eta_3$ respectively and defined for an image $I$ as:

$$L_{\text{Var}} = \frac{1}{N} \sum_{i=1}^{N} \left( \sum_{h,w} \left[ (I_{i,h+1,w} - I_{i,h,w})^2 + (I_{i,h,w+1} - I_{i,h,w})^2 \right] \right)$$

$$L_{\text{Grad}} = \left\| \nabla_\theta L(f_\theta(I), y) \right\|$$

$$L_{\text{Pix}} = \sum \max(0, -I) + \sum \max(0, I - 1)$$

The reconstruction experiments were carried out on models trained on datasets of varying size. As a general trend, the quality of the reconstructed samples degrades as the number of training samples increases. For MNIST and FashionMNIST, reconstructions were performed for models trained on datasets of size 1000, 10000 and 60000, whereas for SVHN and CIFAR-10 the dataset sizes were 1000, 5000 and 10000. The reconstruction results using three different generators on each of the three dataset sizes on all four datasets are shown in Figure 5, along with a column of representative training data. For SVHN, we held out a cleaner version of the dataset in which every image includes a single digit. For CIFAR-10, given the low resolution of the images, the reconstructions are in some cases not perfect, although they capture the semantic structure behind the images in the class very well.

Figure 5. Reconstructed Images for all 10 classes in MNIST, FashionMNIST, SVHN and CIFAR-10 respectively.

6. Conclusions

This paper introduced a novel approach to network inversion, utilizing a conditioned generator to generate a diverse set of inputs with desired output labels. By shifting from simple label conditioning to vector encoding and incorporating heavy dropout during the generation process, our method complicates the conditioning mechanism, encouraging the generator to explore a more extensive range of the data distribution.

In interpretability, the approach provides insights into the internal decision-making patterns of neural networks by analyzing the features of inverted samples, visualizing decision boundaries, and identifying diverse feature representations through tools like PCA, t-SNE, and Sparse Autoencoders. For out-of-distribution (OOD) detection, network inversion is used to generate OOD samples that are assigned to a designated "garbage" class, improving the classifier's robustness by distinguishing in-distribution and OOD samples with a clear confidence threshold. Lastly, in training data reconstruction, the method reconstructs training-like data using the classifier's weights, exploiting model confidence, gradient behavior, and prior knowledge to generate data semantically similar to the training data.

Future work will aim to quantify the aspects of the inversion technique and explore its potential in enhancing interpretability and robustness across various real-world tasks.