Cartesian Plane - Distance Formula | Duke University

University: Duke University
Course: Data Science Math Skills
Academic year: 2022
Author: KatrCrayon

First, we will review the distance formula and then introduce some new concepts.

The distance formula: if we have two points in the Cartesian plane, with coordinates (x1, y1) and (x2, y2), then the distance between them is given by the Pythagorean Theorem. The Pythagorean Theorem states the relationship between the lengths of the sides of a right triangle (a triangle where one of the angles is 90 degrees) and its hypotenuse; specifically, the square of the hypotenuse is equal to the sum of the squares of the other two sides. This relationship can be written as z^2 = x^2 + y^2, where x and y are the lengths of the two shorter sides and z is the length of the hypotenuse. Taking the square root of both sides gives z = √(x^2 + y^2).

Now consider two points in the plane: Point A, which is equal to (a, b), and Point C, which is equal to (c, d), and draw the line segment between them. We want to ask: how far apart are Points A and C? What the distance formula says is that the distance between A and C, that is, Dist(A,C), is given by the square root of the "difference in x values" squared plus the "difference in y values" squared:

Dist(A,C) = √((c-a)^2 + (d-b)^2)

Why would that be true?

Let us draw a right triangle. Starting from A, draw a dotted horizontal line, and from C draw a dotted vertical line, until the two meet. If you recall Points in the Plane, the point where they meet has the same y-coordinate as A and the same x-coordinate as C; this point is indeed (c, b). Therefore, the length of the horizontal side is c minus a, and the length of the vertical side is d minus b. So we have a right triangle with side lengths c minus a and d minus b. The hypotenuse of this triangle is the segment from A to C, so its length, which is the distance between A and C, is given by the formula above.

Let's work through some examples. For this problem, we have some points in the plane and we'll compute the distances between them. Let's start with point A at (1, 1) and take point B way up here (not to scale) at (5, 4). We can compute the distance between A and B by using the distance formula: the square root of the difference in x values squared plus the difference in y values squared. Now we have to do a little bit of arithmetic: (5-1)^2 + (4-1)^2 = 4^2 + 3^2 = 16 + 9 = 25, and the square root of 25 works out to be exactly 5. So the length of the line between A and B, let's draw it in, is five. A and B are five units apart, which is interesting because you do not go five units in the x direction to get from A to B (you go four), and you do not go five units in the y direction either (you go three). They are fairly far away from each other.

Let's also draw the origin. This is the point big O, which is (0, 0). And let's compute the distance between A and the origin. It is equal to the square root of the difference in x values squared plus the difference in y values squared, that is, √((1-0)^2 + (1-0)^2).
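The distance formula can be written as a few lines of code. The following is a minimal sketch in Python, not part of the original lecture; the function name `distance` and the tuple representation of points are assumptions made for illustration. It reproduces the worked example with A = (1, 1) and B = (5, 4).

```python
import math

def distance(p, q):
    """Euclidean distance between points p = (x1, y1) and q = (x2, y2).

    Hypothetical helper for illustration; points are plain (x, y) tuples.
    """
    dx = q[0] - p[0]  # difference in x values
    dy = q[1] - p[1]  # difference in y values
    return math.sqrt(dx ** 2 + dy ** 2)

A = (1, 1)
B = (5, 4)

dist_ab = distance(A, B)  # sqrt(4^2 + 3^2) = sqrt(16 + 9) = sqrt(25)
print(dist_ab)  # 5.0
```

Because the differences are squared, swapping the two arguments does not change the result.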

Let's stop for a second, by the way, and point out that (1-0)^2 is the same thing as (0-1)^2. That is, it doesn't matter whether you take the x-value of A minus the x-value of O or the x-value of O minus the x-value of A. That makes sense, because the distance from A to O should be the same as the distance from O to A; distance should be symmetric. So if we work this out, this is just the square root of two. In other words, for the fans of the Pythagorean Theorem, there is a right triangle there with two sides of length one, and its hypotenuse has length square root of two.

OK, let's make one more point. Let's look at D equals (1, 3/2) and the distance along the line from A to D. Here you don't really need a fancy formula: notice that the only difference between A and D is in the y-value. It's pretty clear that the distance between A and D is just 3/2 - 1, which is one half.

Here is the key concept: consider the set S, which consists of the origin O, B and D. Notice that we have computed the distances from A to these three points: the distance from A to O is the square root of two, approximately 1.4; the distance from A to D is one half, or 0.5; and the distance from A to B is five. This shows that the nearest neighbor of A in S is D, because it is the closest point; the second nearest neighbor of A in S is O, since it is the next closest; and finally, B is the farthest from A in S. That's something we often use in data science: when we have these three points O, D and B, we want to say, if A had to be most like one of them, which one would it be? In this case the answer is D, because D is the nearest neighbor of A.

One last little use of the distance formula in data science is the idea of clustering. Suppose we have a set of points in the plane: many, many points bunched together here, another bunch of points over there, and another clump over here. Visually, we might say there are three clusters of points, or clumps. We have not defined what a cluster or clump is; however, it looks like we have three groups: cluster one, cluster two and cluster three. If these were people measured by some blood measurement or something like that, we could say that there are three groups: group one, group two and group three.

A distance measurement can be used to express the degree of separation between two points. If the points A, B and C are all in cluster one, but point D is in cluster three, we might say that the distance between A and B is much less than the distance between A and C, which is also much less than the distance between A and D. So having this distance formula, this distance metric, often allows you to break points up into stereotypical clusters or clumps: whatever these coordinates are measuring, A is much, much more similar to B than it is to C or to D.
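The nearest-neighbor ranking of the set S described earlier can also be sketched in code. This is an illustrative Python sketch under stated assumptions, not the lecturer's own method: the helper `distance`, the dictionary `S`, and the variable names are hypothetical, while the coordinates of A, O, B and D are the ones used in the examples.

```python
import math

def distance(p, q):
    """Euclidean distance between two (x, y) points (hypothetical helper)."""
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

A = (1, 1)

# The set S from the example: the origin O, B and D.
S = {"O": (0, 0), "B": (5, 4), "D": (1, 1.5)}

# Sort the names of the points in S by their distance from A, closest first.
ranked = sorted(S, key=lambda name: distance(A, S[name]))
nearest = ranked[0]

print(ranked)   # ['D', 'O', 'B']
print(nearest)  # D
```

The same distance function is the kind of measurement a clustering method would rely on to decide which points count as close to one another.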