Saturday, June 12, 2010

False Alarm Probability to Likelihood Ratio Threshold

Let's start from something easy to understand. From the "likelihood ratio approach to classification," we know that we decide $\omega_1$ when

\[P({\bf x}|\omega_1)P(\omega_1) > P({\bf x}|\omega_2)P(\omega_2)\]
\[L_R({\bf x})=\frac{P({\bf x}|\omega_1)}{P({\bf x}|\omega_2)}>\frac{P(\omega_2)}{P(\omega_1)}=\tau_c\]
The $\tau_c$ gives us an automatic decision threshold, which we can later use to calculate the false alarm probability, P(FA).
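For instance, with the priors $P(\omega_1)=0.6$ and $P(\omega_2)=0.4$ used in the simulation below, the automatic threshold works out to

\[\tau_c=\frac{P(\omega_2)}{P(\omega_1)}=\frac{0.4}{0.6}\approx 0.67,\]

so we declare $\omega_1$ whenever the likelihood ratio exceeds about 0.67.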

From the previous post, we know that we can set an arbitrary decision boundary by fixing P(FA). That is, we can either take the P(FA) quantile directly from the observations of the non-target class $\omega_2$,

 xq=quantile(x(ind2),pfa);

or use the inverse cumulative distribution function at P(FA) if we assume the observations come from a known probability distribution:

 xi=norminv(pfa,mu2,sd2);

The question now is: can we derive the threshold from P(FA)? If so, classification reduces to comparing the likelihood ratio of the two classes against a threshold, and assigning the target class whenever the ratio exceeds it, while knowing exactly what the false alarm probability is.

It turns out that if we do cross-validation on $\omega_2$ and calculate the likelihood ratio
\[L_r=\frac{P(x_i^{(2)}|\omega_1)}{P(x_i^{(2)}|\omega_2)}\]
for each test point $x_i^{(2)}$ (the superscript marks a sample from $\omega_2$), we get a distribution of $L_r$. Taking the quantile $\hat q_{1-P(FA)}$ of this distribution gives the threshold at P(FA). (A higher quantile means a higher threshold, which makes it harder to declare the target class and thus implies a lower P(FA).)

The following code demonstrates the two approaches to obtaining the threshold from P(FA). The first takes the $1-P(FA)$ quantile of the cross-validated likelihood ratios (red); the second takes the P(FA) quantile directly from the data and then converts it to a threshold (blue).

% p(x|w1) ~ N(-1,1), p(w1)=0.6
% p(x|w2) ~ N(1,1), p(w2)=0.4
n=1000;
u=rand(1,n);
x=zeros(size(u));
ind1 = u<= 0.60;
ind2 = ~ind1;
n1=sum(ind1);
n2=sum(ind2);
x(ind1)=randn(1,n1)-1;
x(ind2)=randn(1,n2)+1; 


mu1=mean(x(ind1));
sd1=std(x(ind1));
xtrain = x(ind2);
lr2=zeros(1,n2);
 

% cross validation on w2
for i=1:n2
    xi=xtrain(i);
    xtrain2=xtrain;
    xtrain2(i)=[];
    mu2=mean(xtrain2);
    sd2=std(xtrain2);
    lr2(i)=normpdf(xi,mu1,sd1)/normpdf(xi,mu2,sd2);
end

pfa = 0.01:0.01:0.99;

% approach #1
threshold=quantile(lr2,1-pfa);


% approach #2
mu2=mean(x(ind2));   % estimate the w2 parameters from the full sample,
sd2=std(x(ind2));    % not from the last cross-validation fold
xq=quantile(x(ind2),pfa);
threshold2=normpdf(xq,mu1,sd1)./normpdf(xq,mu2,sd2);

plot(pfa,threshold,'r'); hold on;
plot(pfa,threshold2,'b'); hold off;


The two curves are nearly identical, with only slight differences.

>> sum(abs(threshold-threshold2))
ans =
    1.7067
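
As a sanity check (a sketch that assumes the variables from the script above are still in the workspace), we can pick one P(FA), apply the corresponding likelihood ratio threshold to fresh draws from $\omega_2$, and verify that the empirical false alarm rate comes out close to the nominal value:

 % does the LR threshold at P(FA)=0.1 give about 10% false alarms?
 pfa0 = 0.10;
 t0 = quantile(lr2,1-pfa0);         % approach #1 threshold at pfa0
 xnew = randn(1,1e5)+1;             % fresh samples from p(x|w2) ~ N(1,1)
 lrnew = normpdf(xnew,mu1,sd1)./normpdf(xnew,mu2,sd2);
 mean(lrnew > t0)                   % empirical P(FA), should be near 0.10

Since declaring the target class for an $\omega_2$ sample is exactly a false alarm, the fraction of fresh $\omega_2$ samples whose ratio exceeds the threshold estimates P(FA) directly.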
