A 'lesser-known' approach to differentiation from first principles

Students learn at GCSE, or very early on in A level, that there are three main ways of writing a quadratic expression. Each way has its own advantages, particularly in terms of identifying features of quadratic graphs. Most students ought to be confident in their understanding and use of the following results:

Fully expanded form: $y = ax^2 + bx + c$	The sign of $a$ tells us whether the graph is ‘u-shaped’ or ‘n-shaped’. $c$ is the $y$-intercept.
Factorised form: $y = k(x - \alpha)(x - \beta)$	The sign of $k$ tells us whether the graph is ‘u-shaped’ or ‘n-shaped’. $\alpha$ and $\beta$ are the $x$-intercepts.
Completed square form: $y = r(x - p)^2 + q$	The sign of $r$ tells us whether the graph is ‘u-shaped’ or ‘n-shaped’. The vertex has coordinates $(p, q)$.

A lesser-known fact is that, in the fully expanded form $y = ax^2 + bx + c$, the line $y = bx + c$ is the tangent to the curve at the $y$-intercept, and hence $b$ is the gradient of the curve at this point.

Can you demonstrate this using a graph-plotter?
Can you explain this ‘algebraically’?
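
One possible algebraic explanation, offered as a sketch rather than the only route, is to ask where the line $y = bx + c$ meets the curve $y = ax^2 + bx + c$ (assuming $a \neq 0$):

```latex
\begin{align*}
ax^2 + bx + c &= bx + c \\
ax^2 &= 0 \\
x &= 0 \quad \text{(repeated root)}
\end{align*}
```

The line and the curve meet only at $x = 0$, and they do so at a repeated root, which is the algebraic signature of tangency.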

An interesting thought regarding this is that we’re obtaining information about the gradient of a curve at a given point, without using any standard ‘rules’ for differentiation!

The question is, then, can we apply this idea to find the gradient of a curve at points other than the $y$-intercept? Let’s start by considering the curve $y = x^2$. The challenge of differentiation from first principles is to find the gradient of the tangent to this curve at any given point $(a, a^2)$, without using the standard ‘rules’ for differentiation.

Students familiar with the basic ideas of graphical transformations will know that the graph $y = (x + a)^2$ is a translation of the curve $y = x^2$ by $a$ units in the negative $x$-direction.

What happens to the point $(a, a^2)$ under this translation?
Can you explain why the gradient of $y = x^2$ at $(a, a^2)$ is the same as the gradient of $y = (x + a)^2$ at $(0, a^2)$?

If we expand $y = (x + a)^2$, we obtain $y = x^2 + 2ax + a^2$. From our earlier discussion, we can immediately say that the gradient of this curve at $(0, a^2)$ is $2a$, and hence the gradient of $y = x^2$ at $(a, a^2)$ is also $2a$.
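
The expand-and-read-off step can also be checked mechanically. The following is a minimal Python sketch, not part of the original argument: the function names `poly_mul`, `expand_shifted_power` and `gradient_by_translation` are hypothetical, polynomials are stored as coefficient lists with the constant term first, and `n` is assumed to be an integer of at least 1.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists [c0, c1, c2, ...]."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def expand_shifted_power(n, a):
    """Coefficients of (x + a)^n, constant term first."""
    result = [1]
    for _ in range(n):
        result = poly_mul(result, [a, 1])  # multiply by (a + x)
    return result


def gradient_by_translation(n, a):
    """Gradient of y = x^n at x = a, read off as the x-coefficient of (x + a)^n.

    Assumes n >= 1 so that an x-coefficient exists.
    """
    return expand_shifted_power(n, a)[1]


# (x + 3)^2 = 9 + 6x + x^2, so the tangent at the y-intercept is y = 6x + 9.
print(expand_shifted_power(2, 3))
print(gradient_by_translation(2, 3))
```

For $a = 3$ the x-coefficient is 6, in agreement with $2a$.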

In other words, we’ve established the rule $y = x^2 \Rightarrow \frac{dy}{dx} = 2x$ from first principles, but without getting into the rather messy business of taking limits!

Can we extend this idea to, for example, $y = x^3$ or, more generally, $y = kx^n$?

Consider any polynomial graph $y = a_0 + a_1 x + \dots + a_n x^n$. What is the tangent to this graph at the $y$-intercept?
Make sure that you can explain this ‘algebraically’ and demonstrate it using a graph-plotter for a few different examples.

If we consider the curve $y = (x + a)^3$, this is a translation of $y = x^3$, and the gradient of $y = (x + a)^3$ at its $y$-intercept must be the same as the gradient of $y = x^3$ when $x = a$.

$y = (x + a)^3 = x^3 + 3ax^2 + 3a^2 x + a^3$

The gradient of this graph at the $y$-intercept is $3a^2$, and hence the gradient of $y = x^3$ at $(a, a^3)$ must also be $3a^2$. In other words, $y = x^3 \Rightarrow \frac{dy}{dx} = 3x^2$.
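
As a concrete illustration (the value $a = 2$ is chosen here purely as an example), the same steps give:

```latex
\begin{align*}
y &= (x + 2)^3 = x^3 + 6x^2 + 12x + 8 \\
\text{tangent at the } y\text{-intercept: } y &= 12x + 8 \\
\text{gradient of } y = x^3 \text{ at } (2, 8) &= 12 = 3 \times 2^2
\end{align*}
```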

Finally, let’s consider the graph $y = k(x + a)^n$, where $n$ is a positive integer. This is a translation of the graph $y = kx^n$. Using the binomial expansion, $y = k(x + a)^n = k\left(x^n + nax^{n-1} + \binom{n}{2} a^2 x^{n-2} + \dots + na^{n-1}x + a^n\right)$.

The gradient of this graph at the $y$-intercept is $kna^{n-1}$, and hence the gradient of $y = kx^n$ at $(a, ka^n)$ is also $kna^{n-1}$. In other words, $y = kx^n \Rightarrow \frac{dy}{dx} = knx^{n-1}$.
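
The general case can be spot-checked numerically. This is a rough Python sketch under stated assumptions: `binomial_coefficients` and `gradient_of_power` are hypothetical helper names, `n` is a positive integer, and the test values of `k`, `n` and `a` are arbitrary. It builds the coefficients of $k(x + a)^n$ from the binomial expansion and compares the x-coefficient with $kna^{n-1}$.

```python
from math import comb


def binomial_coefficients(k, n, a):
    """Coefficients of k(x + a)^n, constant term first."""
    # The x^j term of (x + a)^n is C(n, j) * a^(n - j) * x^j.
    return [k * comb(n, j) * a ** (n - j) for j in range(n + 1)]


def gradient_of_power(k, n, a):
    """Gradient of y = k x^n at x = a, taken as the x-coefficient above.

    Assumes n >= 1 so that an x-coefficient exists.
    """
    return binomial_coefficients(k, n, a)[1]


for k, n, a in [(1, 2, 3), (1, 3, 2), (5, 4, 2)]:
    # Print the x-coefficient next to the value given by k * n * a^(n - 1).
    print(k, n, a, gradient_of_power(k, n, a), k * n * a ** (n - 1))
```

In each row the last two printed values should agree, which is what the rule $\frac{dy}{dx} = knx^{n-1}$ predicts at $x = a$.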

This is the standard rule for differentiation, obtained from first principles without any apparent reference to the idea of limits! We should probably note that a rigorous justification of the idea of tangents at the y-intercept would still involve the use of limits.
