Monday, December 10, 2007

When is something a "0" versus an "NA"?

In a recent talk in the department, a presenter wanted to code leadership transition outcomes according to a binary rule, where "transition with punishment" outcomes were coded as 1's and "transition with no punishment" outcomes were coded as 0's. The "transition with no punishment" cases included natural death or assassination while in office.

It seemed to me that this coding didn't make sense. When you have death in office, the case should be considered censored, no? In these cases, you are not able to observe whether the outcome is truly "punishment" or not. Think about it. So the "death in office" cases are not really cases of "no punishment" but rather "not observed." Then, one needs to decide whether these cases should just be dropped from the analysis or whether a selection correction should be included.

I claim that a terrible thing to do is to lump censored observations into the "0" category. Here's my reasoning. We know that simply dropping censored cases (listwise deletion) leads to bias when the likelihood of censoring depends on the outcome (Y). When the likelihood of censoring depends only on the explanatory factors of interest (the X's), listwise deletion does not bias estimates of Pr(Y|X). Labeling censored cases as 0's, by contrast, can lead to bias if censoring is associated with either Y or the X's.
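The reasoning above can be put into a few lines of arithmetic. What follows is only a minimal sketch, assuming that censoring depends on X alone (so it is independent of Y once X is fixed) and that the censoring probability is below 1. The function name expected_estimates and its arguments are made up for illustration; they do not come from the talk or from any library.

```python
from fractions import Fraction

def expected_estimates(p_y1, p_censor):
    """Large-sample estimates of Pr(Y=1 | X=x) under two coding rules.

    p_y1     : true Pr(Y=1 | X=x)
    p_censor : Pr(censored | X=x), assumed independent of Y given X
               and strictly less than 1 (hypothetical setup)
    Returns (listwise_deletion_estimate, recode_as_zero_estimate).
    """
    p_y1 = Fraction(p_y1)
    p_censor = Fraction(p_censor)
    observed = 1 - p_censor
    # Listwise deletion: censored cases leave both the numerator and
    # the denominator, so the observed share cancels out.
    listwise = (p_y1 * observed) / observed
    # Recoding as 0: censored cases stay in the denominator but can
    # never count as Y=1, so the estimate shrinks by the observed share.
    recoded = p_y1 * observed
    return listwise, recoded

# True Pr(Y=1 | X=x) is 1/2; half of the cases are censored.
print(expected_estimates(Fraction(1, 2), Fraction(1, 2)))
```

With a true probability of 1/2 and a censoring probability of 1/2, listwise deletion still returns 1/2, while recoding returns 1/4. The same function also shows the reverse problem: true probabilities of 4/5 at X=0 and 2/5 at X=1, with censoring of 1/2 only at X=0, both come out as 2/5 after recoding, so a real difference disappears.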

To see an example of how this works, consider the following contingency table:

Table 1: Table to Estimate Pr(Y=1|X=0) and Pr(Y=1|X=1) When No Censoring Is Present

----+---+---+
Y=1 | 2 | 2 |
----+---+---+
Y=0 | 2 | 2 |
----+---+---+
----|X=0|X=1|
----+---+---+

The relationship shown in Table 1, which is the "true" relationship since all of the data are observed, is that Pr(Y=1|X=0) equals Pr(Y=1|X=1); both are 2/4. Now suppose that there is a 1/2 chance that an observation's outcome will be censored when X=0 and a 0 chance that it will be censored otherwise. In expectation, listwise deletion would produce the following table:

Table 2: Table to Estimate Pr(Y=1|X=0) and Pr(Y=1|X=1) Given Censoring Dependent on X and Listwise Deletion

----+---+---+
Y=1 | 1 | 2 |
----+---+---+
Y=0 | 1 | 2 |
----+---+---+
----|X=0|X=1|
----+---+---+

Table 2 leads us to infer correctly that Pr(Y=1|X=0) equals Pr(Y=1|X=1). However, if we assume the same censoring mechanism, then labeling censored observations as "Y=0" gives the following:

Table 3: Table to Estimate Pr(Y=1|X=0) and Pr(Y=1|X=1) Given Censoring Dependent on X and Labeling Censored Obs as Y=0

----+---+---+
Y=1 | 1 | 2 |
----+---+---+
Y=0 | 3 | 2 |
----+---+---+
----|X=0|X=1|
----+---+---+

Table 3 leads us to infer incorrectly that Pr(Y=1|X=0) does not equal Pr(Y=1|X=1): the estimates are 1/4 and 2/4, respectively. If the "true" relationship is one in which Pr(Y=1|X=0) does not equal Pr(Y=1|X=1), things can be similarly messed up (try it with simple contingency table examples). What a mess...
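For anyone who wants to check the three tables without doing the division by hand, here is a small sketch. The dictionary layout and the helper name pr_y1_given_x are my own choices for illustration; the counts are copied straight from Tables 1 through 3.

```python
from fractions import Fraction

def pr_y1_given_x(table):
    """table maps x -> (count with Y=1, count with Y=0).

    Returns a dict mapping x -> estimated Pr(Y=1 | X=x).
    """
    return {x: Fraction(n1, n1 + n0) for x, (n1, n0) in table.items()}

# Counts copied from the tables above, as x -> (Y=1 count, Y=0 count).
table1 = {0: (2, 2), 1: (2, 2)}  # no censoring
table2 = {0: (1, 1), 1: (2, 2)}  # censoring on X=0, listwise deletion
table3 = {0: (1, 3), 1: (2, 2)}  # censoring on X=0, censored coded as Y=0

for name, table in [("Table 1", table1), ("Table 2", table2), ("Table 3", table3)]:
    est = pr_y1_given_x(table)
    print(name, "Pr(Y=1|X=0) =", est[0], " Pr(Y=1|X=1) =", est[1])
```

Tables 1 and 2 give 1/2 in both columns, while Table 3 gives 1/4 for X=0 and 1/2 for X=1. Swapping in other counts is an easy way to try the cases where the true probabilities differ.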
