Main result and example with a nonlinear function

● A nonlinear function is a function where the relationship between the input and output is not a straight line. The graph of a nonlinear function is curved; it may look more like an S-shape or an L-shape than a straight line.
● The graph of a linear function looks like a straight line. Every function can be classified as either linear or nonlinear.

So that was a warm-up for understanding nonlinear functions. The ideas you learned can be applied to a nonlinear function because, near a given point, it is well approximated by a linear function.

For example, let us retell the previous story for a nonlinear function. We can go back to the function f from the beginning that we approximated with a linear model. The function f of (x, y) is y squared minus x cubed minus x. The approximation was that f of (1 plus delta x, 1 plus delta y) is approximately negative 1 minus 4 delta x plus 2 delta y.

Here is a question that is similar to the previous one but concerns this more complex function. Let us begin at (1, 1), the point at which we did our linear approximation. We can move in any direction by a small distance, 0.1. How can we make f as large as possible? And why does it say 0.1 instead of 1?

The function f looks a lot like a linear function as long as you stay in a small box, for example, if you only move by about 0.1. It may not look linear if you move by 1, and the effect of a move by 10 may not look linear at all.

If we are only talking about moving a little bit, then we can apply this linear approximation and transform our problem into one that is easier to solve. So, the vector that we move by is (delta x, delta y). Its length is 0.1. The value of f is approximately negative 1 minus 4 times delta x plus 2 times delta y.
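To see why the step is 0.1 and not 1 or 10, here is a minimal Python sketch that compares f with its linear approximation at (1, 1) for steps of different sizes. The function names and the choice to shift both coordinates by the same amount are assumptions made for illustration; they are not part of the lecture.

```python
def f(x, y):
    # The function from the lecture: y squared minus x cubed minus x.
    return y**2 - x**3 - x

def linear_approx(dx, dy):
    # Linear approximation of f(1 + dx, 1 + dy) around the point (1, 1).
    return -1 - 4 * dx + 2 * dy

def approx_error(dx, dy):
    # Gap between the true value and the linear approximation.
    return abs(f(1 + dx, 1 + dy) - linear_approx(dx, dy))

# Illustrative choice: shift each coordinate by the same amount.
for step in (0.1, 1.0, 10.0):
    print(step, approx_error(step, step))
```

With these steps the gap grows from about 0.02 to 3 to 1200, which is why the linear picture is only trusted for small moves.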
We would like the change in f to be as big as possible. It appears to be a dot product once again. The change in f is approximately negative 4 delta x plus 2 delta y. This can be written as (negative 4, 2) dotted with (delta x, delta y). In fact, negative 4 and 2 are the derivatives of f at that point with respect to x and y, so this vector is the gradient of f at (1, 1).

We can then use the dot product formula: the change in f is the norm of the gradient of f times the norm of (delta x, delta y) times cosine of theta, where theta is the angle between the gradient of f and (delta x, delta y). And we're moving by 0.1, so we know that the norm of (delta x, delta y) is 0.1. Therefore, it's the norm of the gradient of f times 0.1 times cosine of theta.

This is largest when theta equals 0, that is, when the cosine of theta equals 1. When the vector (delta x, delta y) points in the same direction as the gradient, the increase in f is largest.

Now we can make a summary that is similar to the one we made for linear functions. Where shall we put it?

In summary, the gradient points straight uphill, in the direction of steepest increase. The magnitude of the gradient is the slope. A steep hill has a gradient of large magnitude, and a shallow hill has a gradient of small magnitude.

Another way to say this is that if we move a small distance d in the direction of the gradient, then f increases by approximately the slope times d, that is, the norm of the gradient times d. We can see this by looking at our formula for the change in f over here. To get the biggest increase, we want to choose theta equals 0. This means that we go in the direction of the gradient. The cosine of theta equals 1. The change in f is the size of the gradient times how far we moved.
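The claim that the best direction is the gradient direction can be checked by brute force. The sketch below tries 3600 evenly spaced directions for a step of length 0.1 and picks the one with the largest linearized change in f. The grid size and all names are illustrative assumptions; note that `direction_angle` is measured from the x-axis, so it is not the same angle as the theta in the dot product formula.

```python
import math

GRAD = (-4.0, 2.0)   # gradient of f at (1, 1), from the lecture
STEP = 0.1           # length of the move

def predicted_change(direction_angle):
    # Linearized change in f for a step of length STEP in the direction
    # that makes angle `direction_angle` (radians) with the x-axis.
    dx = STEP * math.cos(direction_angle)
    dy = STEP * math.sin(direction_angle)
    return GRAD[0] * dx + GRAD[1] * dy

# Try 3600 evenly spaced directions (an arbitrary grid) and keep the best one.
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best_angle = max(angles, key=predicted_change)

grad_angle = math.atan2(GRAD[1], GRAD[0])   # direction of the gradient
grad_norm = math.hypot(GRAD[0], GRAD[1])    # magnitude of the gradient

print(best_angle, grad_angle)
print(predicted_change(best_angle), grad_norm * STEP)
```

Up to the resolution of the grid, the best direction matches the direction of the gradient, and the best change matches the norm of the gradient times 0.1.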
You may ask, why is the slope the magnitude of the gradient and not the product of the magnitude of the gradient and the distance? In other words, why is the slope equal to the magnitude of the gradient, and not to the magnitude of the gradient times 0.1?

Let's visualize a slope. What is it? The slope is here. Consider measuring it. If this is 1, this one is probably 1.5. And if this is 2, that appears to be 3. The slope of a line is rise over run, which in this case is 1.5. The slope formula is versatile because it can be used with any run and its corresponding rise. We would have different vertical displacements, but we would always have the same slope. This slope is important because it describes how the line is angled.

When we look back at the expression with cosine of theta being 1, the norm of the gradient times 0.1, this is the rise. The magnitude of (delta x, delta y), which is 0.1, is the run. So rise over run is the norm of the gradient. So, this is the graph of f, and this is height. We changed x and y, and moved a distance 0.1 there. And then we saw the change in f.

What if we have trouble seeing what the rise and run are in this 2D picture? How does a flat, two-dimensional picture translate into a three-dimensional representation of x, y, and z? So the question is how the 2D picture relates to the 3D one.

The truth is that x and y are down here, z is coming this way, and what do we have? We begin with a point in the xy-plane, and then another. The resulting graph is displayed below.
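The point about rise over run can be made concrete with a short sketch: under the linear approximation, moving different distances along the gradient gives different rises but always the same ratio. The run values below are arbitrary choices for illustration, and the names are assumptions.

```python
import math

GRAD = (-4.0, 2.0)                         # gradient of f at (1, 1)
GRAD_NORM = math.hypot(GRAD[0], GRAD[1])   # magnitude of the gradient

def rise_over_run(run):
    # Step of length `run` along the unit vector in the gradient direction.
    dx = run * GRAD[0] / GRAD_NORM
    dy = run * GRAD[1] / GRAD_NORM
    # Rise predicted by the linear approximation: gradient dotted with the step.
    rise = GRAD[0] * dx + GRAD[1] * dy
    return rise / run

# Arbitrary run lengths: the rise changes, the slope does not.
for run in (0.1, 0.2, 0.5):
    print(run, rise_over_run(run))

print(GRAD_NORM)
```

Every ratio equals the magnitude of the gradient, which is why the slope does not carry the factor of 0.1.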
At these points, the function has a value, and the value at the second point is clearly bigger. The difference between the values of f at the two points is the rise, while the distance between the two points themselves is called the run.

Is the rise just the expression with cosine of theta equal to 1? The entire blue thing is the rise. This is the graph of f. To compute the rise, we determine how much larger f is at this point compared to f at that point. Rephrasing, it's the change in f. We computed the change in f with the linear approximation. This is the old value of f. This is the change in f, and we simplified it to get that.
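As a rough check of the rise described above, the sketch below computes it directly as f at the new point minus f at the old point, for a move of 0.1 from (1, 1) in the gradient direction, and compares it with the rise predicted by the linear approximation. The variable names are illustrative assumptions.

```python
import math

def f(x, y):
    # The function from the lecture: y squared minus x cubed minus x.
    return y**2 - x**3 - x

old_point = (1.0, 1.0)
grad = (-4.0, 2.0)    # gradient of f at (1, 1)
run = 0.1             # distance moved in the xy-plane
grad_norm = math.hypot(grad[0], grad[1])

# New point: move a distance `run` in the direction of the gradient.
new_point = (
    old_point[0] + run * grad[0] / grad_norm,
    old_point[1] + run * grad[1] / grad_norm,
)

actual_rise = f(*new_point) - f(*old_point)   # change in f on the true graph
predicted_rise = grad_norm * run              # rise from the linear approximation

print(actual_rise, predicted_rise)
```

The two values differ by roughly 0.02 for this step, so the linear approximation describes the true rise well but not exactly.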