Medical Testing Example

At the end of this lecture, you'll understand how these terms are relevant in the context of a medical test. You'll also find them useful when we discuss machine learning.

Let's consider a hypothetical condition called Very Bad Syndrome (VBS), a condition that scientists have just developed tests for. Let's suppose that there is a set X of people who are taking the test that tells you whether or not you have VBS. Let's divide up this set into two subsets: S will be equal to the set of people x in X who have VBS; H will stand for "healthy" and will denote the set of people x in X who do not have VBS.

Now, let's note that X equals S union H, because every person either has VBS or does not have it. And S intersect H is empty: there is no one who both has it and does not have it. Okay, in some sense the whole point of medical testing is to figure out whether a person is in S or H.

Let's think about what the test tells us. Let P, P for positive, be the set of x in X such that x tests positive for VBS. That is, they come into the lab, they take the test, and whatever marker the test has for positive, they test positive. The doctor looks at the test and says, "The test says you have VBS." It's very important to realize that this is a different concept than "you have VBS." All that we're saying is that you test positive for VBS.

And let's let N, N for negative, be the set of x in X such that x tests negative for VBS. Again, you take the test and your doctor or clinician looks at it. If it turns red, or whatever the test needs to do, then you are considered to be negative for VBS. This does not mean that you are in the clear; it simply means that your test results have returned as negative. Again, assuming that the test is deterministic, we have P union N = everyone and P intersect N = no one (just like we had S union H = everyone and S intersect H = no one).

In an ideal world, the sick would be the ones who test positive, and those who tested positive would be sick.
In reality, we cannot always assume this to be true. There are four intersections that help us talk about those discrepancies. Let's consider S intersect P; let's consider H intersect N. Let us consider S intersect N. And let us consider H intersect P. What do these sets represent? We must consider their size and how they relate to the real world.

First, let us consider S intersect P. This means you are in S, so you have VBS, and it means you are in P, so you test positive. These are what are often called the true positives. The bad news is that you have VBS; however, the good news is that the test accurately told you that you had VBS, so perhaps you can seek treatment.

The second set, H intersect N, means that you do not have VBS because you are in H: healthy. And it means that you are in N: negative (with regard to this disease). These are true negatives. The good news is that you don't have the disease. The further good news is that the test told you this, so now you don't need to worry about it.

The next two sets are ones we would like to avoid. We would prefer that these two sets were empty, that is, that their cardinalities were zero, but this is not always the case. The first is S intersect N, the false negatives: a false negative is when you test negative for a disease but truly have it.
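The four intersections can be sketched with Python's built-in sets, whose operators mirror the notation used here (`&` for intersect, `-` for set difference). The eight names and the contents of S and P below are invented purely for illustration; they are not data from the lecture.

```python
# Hypothetical group of eight people, identified by made-up names.
X = {"ann", "bob", "cat", "dan", "eve", "fay", "gus", "hal"}
S = {"ann", "bob", "cat"}   # assumed: the people who actually have VBS
P = {"ann", "bob", "dan"}   # assumed: the people who test positive

H = X - S                   # healthy: everyone in X who is not in S
N = X - P                   # negative: everyone in X who is not in P

true_positives = S & P      # sick and test positive
true_negatives = H & N      # healthy and test negative
false_negatives = S & N     # sick but test negative
false_positives = H & P     # healthy but test positive

# Each person lands in exactly one of the four intersections,
# so together they cover X and their sizes add up to the size of X.
cells = [true_positives, true_negatives, false_negatives, false_positives]
assert set().union(*cells) == X
assert sum(len(c) for c in cells) == len(X)
```

In this made-up group the test is imperfect in both directions: one sick person tests negative and one healthy person tests positive.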
A false negative result occurs when a person has a disease, but the test indicates that he or she does not have it. These people are given false hope because they do not receive the information about their disease that could help them take action to treat it. One common reason for a false negative result is that there is too much overlap between the populations being studied and the group being tested; another common reason is that the test itself is unreliable.

A false positive result occurs when a person does not have a disease but tests positive for it; this, too, can result from problems with the populations being tested and with the test itself. These are the people in H intersect P. Although they do not have the disease, they are still worried about it. Some may even receive treatment that has side effects with no payoff, because they do not have the disease.

Comparing the cardinalities of these sets (the cardinality of a set is the number of elements within it) is useful in medical testing. And we can even generalize this idea to other areas of machine learning later in this course and into future classes.

Let's consider the ratio of the number of people with VBS in the study (the numerator) to the total number of people in the study (the denominator), that is, the cardinality of S divided by the cardinality of X. This ratio must be less than or equal to 1, because S is a subset of X: everyone in the study who has VBS is, of course, in the study. This ratio is the prevalence rate of VBS among the participants.

When designing a study, the goal is to collect data about the people in the study that mirrors data about people in a much larger population. If a study takes place at a VBS clinic, for example, then it is safe to assume that the proportion of people with VBS in that sample will be close to 1. However, this does not mean that the true proportion of people in the United States with VBS is also close to 1.

We can likewise consider the cardinality of H divided by the cardinality of X, the proportion of participants who are healthy. What should we get, by the way, when we add these two quantities? Think about it a second. We had better get 1, because you either have it or you don't.

Let's think about the cardinality of S intersect P. Remember what those were: those were the true positives. Divide it by the cardinality of S. So what do we have? In the numerator, on top, we have the number of people who are true positives, and in the denominator we have the number of people who are sick. This is what is called the true positive rate: the proportion of sick people whom the test correctly flags.

A number we would like to be small is the false positive rate, or the proportion of healthy people who test positive for a disease. Let's look at H intersect P.
Take the cardinality of H intersect P and divide it by the cardinality of H. In the numerator, we have the number of false positives: people who are actually healthy but test positive for a disease. In the denominator, we have the number of people who are actually healthy. This is called the false positive rate.

We would like the false positive rate to be as close to 0 as possible. In the same way, we can define the false negative rate, which is the size of S intersect N divided by the size of S, and the true negative rate, which is the size of H intersect N divided by the size of H.

To summarize, medical testing theory can be simplified into one sentence: the true positive rate and the true negative rate should ideally be 1, and the false positive rate and the false negative rate should ideally be 0; however, this almost never happens. For a good test, the true positive rate and the true negative rate are generally close to 1, while the false positive rate and the false negative rate are generally close to 0.

Later, when you start thinking about business analytics, it may be helpful to consider what types of false positive rates are acceptable. It's important to realize that this concept applies far beyond medical testing. If something is either true or false (either you have a disease or you don't), you can use vocabulary like "positive" or "negative" to describe your test results.
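To make the ratios from this lecture concrete, here is a minimal sketch that computes the prevalence and the four rates from set cardinalities. The function name `rates`, the ten-person study, and the choice of who is sick and who tests positive are all assumptions made up for illustration. Exact fractions are used so that the "adds up to 1" checks hold without rounding error.

```python
from fractions import Fraction

def rates(X, S, P):
    """Prevalence and the four rates, for population X, sick set S, positive set P.

    Assumes S and P are subsets of X (hypothetical helper, not from the lecture).
    """
    H = X - S   # healthy
    N = X - P   # test negative
    if not S or not H:
        raise ValueError("need at least one sick and one healthy person")
    return {
        "prevalence": Fraction(len(S), len(X)),
        "true_positive_rate": Fraction(len(S & P), len(S)),
        "false_negative_rate": Fraction(len(S & N), len(S)),
        "false_positive_rate": Fraction(len(H & P), len(H)),
        "true_negative_rate": Fraction(len(H & N), len(H)),
    }

# Made-up study: people 0..9, of whom 0..3 are sick.
X = set(range(10))
S = {0, 1, 2, 3}
P = {0, 1, 2, 4}   # the test misses person 3 and wrongly flags person 4

r = rates(X, S, P)

# Every sick person tests either positive or negative, and likewise
# every healthy person, so each pair of rates sums to 1.
assert r["true_positive_rate"] + r["false_negative_rate"] == 1
assert r["false_positive_rate"] + r["true_negative_rate"] == 1
```

Because each pair sums to 1, a true positive rate close to 1 goes hand in hand with a false negative rate close to 0, and the same holds for the true negative rate and the false positive rate.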