By Bayes' rule, the posterior probability of $y = 1$ can be written as:
\[
P(y = 1 \mid \phi(x)) = \frac{P(\phi(x) \mid y = 1)\,\eta}{P(\phi(x) \mid y = 1)\,\eta + P(\phi(x) \mid y = -1)\,(1 - \eta)}.
\]

(Failure of OOD detection under invariant classifier) Consider an out-of-distribution input which contains the environmental feature: $\phi_{\text{out}}(x) = M_{\text{inv}} z_{\text{out}} + M_e z_e$, where $z_{\text{out}} \perp z_{\text{inv}}$. Given the invariant classifier (cf. Lemma 2), the posterior probability for the OOD input is $p(y = 1 \mid \phi_{\text{out}}) = \sigma\big(2 p^\top z_e / \sigma_e^2 + \log \eta/(1-\eta)\big)$, where $\sigma$ is the logistic function. Thus for arbitrary confidence $0 < c := P(y = 1 \mid \phi_{\text{out}}) < 1$, there exists $\phi_{\text{out}}(x)$ with $z_e$ such that $p^\top z_e = \frac{\sigma_e^2}{2} \log \frac{c(1-\eta)}{\eta(1-c)}$.
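As a concrete illustration of the lemma (a worked example under the posterior form stated above, with assumed values $\eta = 0.5$ and $\sigma_e = 1$): reaching confidence $c = 0.99$ on the OOD input only requires
\[
p^\top z_e = \tfrac{1}{2} \log \tfrac{0.99 \times 0.5}{0.5 \times 0.01} = \tfrac{1}{2} \log 99 \approx 2.3,
\]
i.e., an OOD input whose environmental component aligns with $p$ receives a near-certain prediction despite being out-of-distribution.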

Proof. Consider an out-of-distribution input $x_{\text{out}}$ with $M_{\text{inv}} = \begin{bmatrix} I_{s \times s} \\ 0_{1 \times s} \end{bmatrix}$ and $M_e = \begin{bmatrix} 0_{s \times e} \\ p^\top \end{bmatrix}$; then the feature representation is $\phi_{\text{out}}(x) = \begin{bmatrix} z_{\text{out}} \\ p^\top z_e \end{bmatrix}$, where $p$ is the unit-norm vector defined in Lemma 2.

Then we have $P(y = 1 \mid \phi_{\text{out}}) = P(y = 1 \mid z_{\text{out}}, p^\top z_e) = \sigma\big(2 p^\top z_e / \sigma_e^2 + \log \eta/(1-\eta)\big)$, where $\sigma$ is the logistic function. Thus for arbitrary confidence $0 < c := P(y = 1 \mid \phi_{\text{out}}) < 1$, there exists $\phi_{\text{out}}(x)$ with $z_e$ such that $p^\top z_e = \frac{\sigma_e^2}{2} \log \frac{c(1-\eta)}{\eta(1-c)}$. ∎
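For completeness, the value of $p^\top z_e$ in the statement follows by inverting the logistic function, using $\sigma^{-1}(c) = \log \frac{c}{1-c}$ (a short derivation under the posterior form stated above):
\[
\sigma\!\Big(\frac{2\, p^\top z_e}{\sigma_e^2} + \log \frac{\eta}{1-\eta}\Big) = c
\;\Longleftrightarrow\;
\frac{2\, p^\top z_e}{\sigma_e^2} = \log \frac{c}{1-c} - \log \frac{\eta}{1-\eta}
\;\Longleftrightarrow\;
p^\top z_e = \frac{\sigma_e^2}{2} \log \frac{c\,(1-\eta)}{\eta\,(1-c)}.
\]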

Remark: In a more general case, $z_{\text{out}}$ can be modeled as a random vector that is independent of the in-distribution labels $y = 1$ and $y = -1$ and of the environmental features: $z_{\text{out}} \perp\!\!\!\perp y$ and $z_{\text{out}} \perp\!\!\!\perp z_e$. Therefore, in Eq. 5 we have $P(z_{\text{out}} \mid y = 1) = P(z_{\text{out}} \mid y = -1) = P(z_{\text{out}})$. Then $P(y = 1 \mid \phi_{\text{out}}) = \sigma\big(2 p^\top z_e / \sigma_e^2 + \log \eta/(1-\eta)\big)$, the same as in Eq. 7. Therefore our main theorem still holds in this more general case.
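The cancellation in the remark can be made explicit. A minimal sketch, assuming the likelihood factorizes as $P(z_{\text{out}}, p^\top z_e \mid y) = P(z_{\text{out}})\, P(p^\top z_e \mid y)$ under the stated independence assumptions:
\[
P(y = 1 \mid \phi_{\text{out}})
= \frac{P(z_{\text{out}})\, P(p^\top z_e \mid y = 1)\, \eta}
       {P(z_{\text{out}})\, P(p^\top z_e \mid y = 1)\, \eta + P(z_{\text{out}})\, P(p^\top z_e \mid y = -1)\, (1-\eta)}
= \frac{P(p^\top z_e \mid y = 1)\, \eta}
       {P(p^\top z_e \mid y = 1)\, \eta + P(p^\top z_e \mid y = -1)\, (1-\eta)},
\]
so the $P(z_{\text{out}})$ factors cancel and the posterior depends on the input only through $p^\top z_e$, recovering the same logistic expression.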

Appendix B Extension: Color Spurious Correlation

To further validate our findings beyond background and gender spurious (environmental) features, we provide additional experimental results on the ColorMNIST dataset, as shown in Figure 5.

Evaluation Task 3: ColorMNIST.

ColorMNIST is a variant of MNIST [lecun1998gradient] that composes colored backgrounds onto the digit images. In this dataset, $E = \{\text{red}, \text{purple}, \text{green}, \text{pink}\}$ denotes the background color and we use $Y = \{0, 1\}$ as the in-distribution classes. The correlation between the background color $e$ and the digit $y$ is explicitly controlled by a parameter $r$. That is, $r$ denotes the probability $P(e = \text{red} \mid y = 0) = P(e = \text{purple} \mid y = 0) = P(e = \text{green} \mid y = 1) = P(e = \text{pink} \mid y = 1)$, while $0.5 - r = P(e = \text{green} \mid y = 0) = P(e = \text{pink} \mid y = 0) = P(e = \text{red} \mid y = 1) = P(e = \text{purple} \mid y = 1)$. Note that the maximum correlation $r$ (reported in Table 4) is 0.45. As ColorMNIST is relatively simpler compared to Waterbirds and CelebA, further increasing the correlation results in less interesting environments where the learner can easily pick up the contextual information. For spurious OOD, we use digits with red and green backgrounds, which contain environmental features overlapping with the training data. For non-spurious OOD, following common practice [MSP], we use the Textures [cimpoi2014describing], LSUN [lsun] and iSUN [xu2015turkergaze] datasets. We train a ResNet-18 [he2016deep], which achieves 99.9% accuracy on the in-distribution test set. The OOD detection performance is shown in Table 4.
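For concreteness, the sketch below illustrates one way such correlation-controlled colorization could be implemented. It is a minimal, illustrative example only: the helper names and RGB values are our own assumptions and do not reproduce the exact pipeline used for the experiments; grayscale digit images are assumed to be given as arrays with values in [0, 1].

import numpy as np

# Background colors as RGB triples (illustrative values only).
COLORS = {
    "red":    (255, 0, 0),
    "purple": (128, 0, 128),
    "green":  (0, 255, 0),
    "pink":   (255, 105, 180),
}

def sample_background_color(y, r, rng):
    """Sample a background color for a digit with in-distribution label y in {0, 1}.

    Class 0 gets red or purple with probability r each, class 1 gets green or pink
    with probability r each; the remaining two colors occur with probability 0.5 - r each.
    """
    if y == 0:
        colors = ["red", "purple", "green", "pink"]
    else:
        colors = ["green", "pink", "red", "purple"]
    return colors[rng.choice(4, p=[r, r, 0.5 - r, 0.5 - r])]

def colorize(digit, color_name):
    """Compose a grayscale digit image of shape (H, W) with values in [0, 1]
    onto a background of the given color, returning an (H, W, 3) RGB image."""
    bg = np.asarray(COLORS[color_name], dtype=np.float32) / 255.0
    digit = digit[..., None]                # (H, W, 1)
    # Bright digit strokes stay white; dark background pixels take the environment color.
    return digit + (1.0 - digit) * bg

# Example: color a blank 28x28 canvas for label y = 0 at the maximum correlation r = 0.45.
rng = np.random.default_rng(0)
img = colorize(np.zeros((28, 28), dtype=np.float32),
               sample_background_color(0, r=0.45, rng=rng))

With $r = 0.45$, for instance, a digit from class 0 receives a red or purple background with probability 0.9, so the background color is highly predictive of the label on the training distribution.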