Root-Finding Algorithms

Every aspect of our lives is filled with problems that require solutions. For simple tasks such as travelling, weekend planning, playing, etc., we want to use the information available and carry out the task efficiently and at minimum cost. All of these are problems that call for optimal solutions.

In this section we will define simple mathematical problems and ways to solve them, and later on tackle real-world problems and approaches to them.

The first class of numerical methods we will discuss are root-finding methods, which approximate the roots (or zeroes) of a function. Suppose that \(f\) is a continuous function; a root of \(f\) is a solution to the equation \(f(x) = 0\). Graphically, a root is where the graph \(y = f(x)\) crosses the x-axis.

Often, simple problems have analytic solutions. For example, to find the roots of \(f(x) = x^2-5x + 6\) we can use any of the known methods, i.e. factorization, completing the square or the quadratic formula, and we would end up with \(x=2\) and \(x = 3\) as the roots. We could simply code the quadratic formula in R so that each time we have a quadratic equation, we can easily find its roots:

\[x = \frac{-b \pm \sqrt{b^2-4ac}}{2a}\]

In R we could do:

root <- function(a, b, c){
  (-b +c(-1,1)*sqrt(b^2 - 4*a*c))/(2*a)
}
root(1,-5,6)
[1] 2 3

The function root above computes the solutions of most quadratic equations. Of course, the function will not work for a quadratic with a negative discriminant.

root(5,-9,5)
Warning in sqrt(b^2 - 4 * a * c): NaNs produced
[1] NaN NaN

Thus we can simply convert the discriminant to a complex number by adding a complex zero (0i) to it, making the function work in all instances.

root <- function(a, b, c){
  (-b +c(-1,1)*sqrt(b^2 - 4*a*c + 0i))/(2*a)
}
root(1,-5,6)
[1] 2+0i 3+0i
root(5,-9,5)
[1] 0.9-0.4358899i 0.9+0.4358899i

Instead of writing our own function, we could use the polyroot function to find the roots of any polynomial, i.e. linear, quadratic, cubic, quartic, quintic, etc.:

polyroot(c(6,-5,1)) #x^2 - 5x + 6 = 0
[1] 2+0i 3-0i
polyroot(c(-18,33,-18,3)) # 3x^3 -18x^2 + 33x -18 = 0
[1] 1-0i 2+0i 3-0i

Numerical Solutions

Most of these functions are in a simplified form. What if we had functions that are not polynomial in nature, a polynomial not in its simplest form, or even a function with no analytic solution? To solve these, we use a controlled trial-and-error method, each time updating the solution depending on how close the previous solution was. We repeat the procedure many times until we converge to a solution that works.

There are various methods for how the iteration is done, each with a different name. Let's look at one simple example. Suppose we want to find \(\sqrt{2}\). We know that the solution lies between 0 and 2. Then our trial-and-error method goes as follows.

\[ \begin{aligned} \operatorname{begin} &= 0, \quad\operatorname{end}=2\\ x_1 &= \frac{0 + 2}{2} = 1 \quad\text{i.e. the midpoint between }0 \text{ and }2 \\ \operatorname{error} &= 2-1^2 = 1\\ \text{Since error}&>0, \text{ we set }\operatorname{begin}=1 \text{ and compute the next viable point}\\ x_2 &= \frac{1+2}{2} = 1.5 \quad\text{i.e. the midpoint between }1 \text{ and }2 \\ \operatorname{error} &= 2-1.5^2 = -0.25\\ \text{Since error}&<0, \text{ we set }\operatorname{end}=1.5\\ x_3 &= \frac{1+1.5}{2} = 1.25\\ \operatorname{error} &= 2-1.25^2 = 0.4375\\ \text{Since error}&>0, \text{ we set }\operatorname{begin}=1.25\\ x_4 &= \frac{1.25+1.5}{2} = 1.375\\ \vdots \end{aligned} \]

This algorithm converges slowly. We continue averaging the interval endpoints until \(|\epsilon|\leq\tau\), where \(\tau\) is taken to be a very small number, for example \(\tau =10^{-8}\) = 1e-8. Write a function sqrt_avg(x) implementing the algorithm above. Notice that at each iteration the interval was halved and we kept the half that contained the solution. This method was introduced in Exercise 3 question 7, and we will discuss it in detail in the next section.
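
A minimal sketch of such a function might look as follows (this is just one possible implementation of the steps above; the names begin and end and the choice of starting interval are our own):

sqrt_avg <- function(x, tol = 1e-8){
  begin <- 0
  end <- max(x, 1)           # sqrt(x) always lies in [0, max(x, 1)]
  repeat{
    mid <- (begin + end)/2   # midpoint of the current interval
    error <- x - mid^2
    if(abs(error) <= tol) return(mid)
    if(error > 0) begin <- mid else end <- mid
  }
}
sqrt_avg(2) # approximately 1.414214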

Other times, we do a quick brute force. That is, since we need the root/zero of a function, i.e. \(f(x) = 0\), we rewrite the problem as \(x \pm f(x) = x\). This simply adds/subtracts a zero, since \(f(x) = 0\) at the root. We begin from a starting position: if we perceive that the solution is below the starting position we use subtraction, otherwise we use addition. We then obtain the new \(x\) by computing the difference between the current \(x\) and the value of the function evaluated at the current position, \(f(x)\).

A quick example: suppose we have the function \(f(x) = x-4\); we know that the zero of the function occurs at \(x=4\). Assuming we start with \(x=10\), we could solve the problem as \(10 - f(10) = 10 - (10-4) = 4\). In one iteration, we converge to the solution \(4\). We used subtraction since our starting point was larger than the solution. What if we had \(f(x) = x^2-5x+6\)? Assuming we start with \(x = 10\), we would have:

\[ \begin{aligned} \text{iteration 1}\quad&10 - (10^2 - 5(10) + 6) = -46\\ \text{iteration 2}\quad &-46 - f(-46) = -2398\\ \vdots \end{aligned} \]

Notice how instead of converging, we are diverging to \(-\infty\). This is because of the very large values returned by \(f(x)\). What if, instead of using \(x\pm f(x)=x\), we use \(x\pm\alpha f(x)=x\), where \(\alpha\in(0,1)\)? Note that at the solution we would still have \(x \pm \alpha\cdot0 = x\), so the equation \(x=x\) is maintained. We can choose an arbitrary \(\alpha\) as long as we remain in the viable region of the solution. Now let's solve the problem above with \(\alpha=0.1\):

\[ \begin{aligned} \text{iteration 1}\quad&10 - 0.1(10^2 - 5(10) + 6) = 4.4\\ \text{iteration 2}\quad &4.4 - 0.1f(4.4) = 4.064\\ \text{iteration 3}\quad &4.064 - 0.1f(4.064) = 3.84439\\ \text{iteration 4}\quad &3.84439 - 0.1f(3.84439) = 3.688652\\ \text{iteration 5}\quad &3.688652- 0.1f(3.688652) = 3.572363\\ \text{iteration 6}\quad &3.572363 - 0.1f(3.572363) = 3.482366\\ \text{iteration 7}\quad &3.482366 - 0.1f(3.482366) = 3.410862\\ \text{iteration 8}\quad &3.410862 - 0.1f(3.410862) = 3.352895\\ \text{iteration 9}\quad &3.352895 - 0.1f(3.352895) = 3.305152\\ \text{iteration 10}\quad &3.305152 - 0.1f(3.305152) = 3.265325\\ \vdots \end{aligned} \]

Notice how we are converging to \(x=3\). We can quickly run 150 iterations to check the results:

f <- function(x)x^2 - 5*x + 6
alpha <- 0.1
x <- 10
for(i in 1:150){
  x <- x - alpha * f(x) # we used subtraction 
}
x
[1] 3

What if we started at \(x = -10\)? We would use addition:

f <- function(x)x^2 - 5*x + 6
alpha <- 0.07
x <- -10
for(i in 1:200){
  x <- x + alpha * f(x) # we used addition
}
x
[1] 2

We were able to get the correct values \(x=2\) and \(x = 3\). Notice that I used different values of \(\alpha\), different numbers of iterations, and different signs, i.e. addition vs subtraction. These variables are difficult to control. Different methods have been developed to bring some uniformity to the choice of \(\alpha\), whether to subtract or add, etc. Take a look at Exercise 4 question 9, where \(\alpha\) is dynamic. In the sections that follow, we discuss methods of determining \(\alpha\).

Bisection Method

The bisection method is one of the simplest numerical methods for root-finding to implement, and it almost always works. The main disadvantage is that convergence is relatively slow. The idea behind the bisection method is root-bracketing, which works by first finding an interval in which the root must lie, and then successively refining the bounding interval in such a way that the root is guaranteed to always lie inside the interval.

In the bisection method, the width of the bounding interval is successively halved.

Suppose that \(f\) is a continuous function. Then \(f\) has a root in the interval \(\left(x_{\ell}, x_r\right)\) if either:

  • \(f\left(x_{\ell}\right)<0\) and \(f\left(x_r\right)>0\), or

  • \(f\left(x_{\ell}\right)>0\) and \(f\left(x_r\right)<0\).

A convenient way to verify this condition is to check if \(f\left(x_{\ell}\right)f\left(x_r\right)<0\). If \(x_{\ell}\) and \(x_r\) satisfy this condition, we say \(x_{\ell}\) and \(x_r\) bracket the root. The bisection method works by taking an interval \(\left(x_{\ell}, x_r\right)\) that contains a root, then successively refining \(x_{\ell}\) and \(x_r\) until \(x_r-x_{\ell} \leq \varepsilon\) , where \(\varepsilon>0\) is the pre-specified tolerance level.
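
For example, using the quadratic from earlier, a quick check confirms that \(2.5\) and \(10\) bracket a root (this snippet is just an illustration of the condition, not part of the method itself):

f <- function(x)x^2 - 5*x + 6
f(2.5)*f(10) # the product is negative, so (2.5, 10) brackets a root
[1] -14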

Procedure

Start with \(x_{\ell}<x_r\) such that \(f\left(x_{\ell}\right) f\left(x_r\right)<0\).

  1. If \(x_r-x_{\ell} \leq \varepsilon\), then stop.

  2. Compute \(x_m=\frac{x_{\ell}+x_r}{2}\). If \(f\left(x_m\right)=0\), then stop.

  3. If \(f\left(x_{\ell}\right) f\left(x_m\right)<0\), then assign \(x_r=x_m\). Otherwise, assign \(x_{\ell}=x_m\).

  4. Go back to step 1.

Notice that, at every iteration, the interval \(\left(x_{\ell}, x_r\right)\) always contains a root. Assuming we start with \(f\left(x_{\ell}\right) f\left(x_r\right)<0\), the algorithm is guaranteed to converge, with the approximation error reducing by a constant factor \(1 / 2\) at each iteration. If we stop when \(x_r-x_{\ell} \leq \varepsilon\), then we know that both \(x_{\ell}\) and \(x_r\) are within \(\varepsilon\) distance of a root.

f <- function(x)x^2-5*x+6

my_roots <- function(f, x_l, x_r, tolerr = 1e-8){
  repeat{
      mid <- (x_l + x_r)/2              # midpoint of the bracketing interval
      f_mid <- f(mid)
      if(f_mid == 0 || x_r - x_l < tolerr) break
      if(f(x_l)*f_mid < 0) x_r <- mid   # root lies in the left half
      else x_l <- mid                   # root lies in the right half
    }
  mid
}

my_roots(f,2.5, 10)
[1] 3
my_roots(f,-10,2.5)
[1] 2
my_roots(\(x)log(x)-exp(-x), 0, 10)
[1] 1.3098

There is a built-in R function that does the same job as what we have implemented above. It is called uniroot and it has various parameters; do a quick help('uniroot') to see how to use the function.

uniroot(f, c(2.5, 10), tol = 1e-10)
$root
[1] 3

$f.root
[1] 5.329071e-15

$iter
[1] 14

$init.it
[1] NA

$estim.prec
[1] 5.000134e-11
uniroot(f, lower = -10, upper = 2.5)
$root
[1] 2.000006

$f.root
[1] -6.423345e-06

$iter
[1] 14

$init.it
[1] NA

$estim.prec
[1] 6.103516e-05
uniroot(f, lower = -10, upper = -9, extendInt = 'yes', tol=1e-15)
$root
[1] 2

$f.root
[1] 0

$iter
[1] 24

$init.it
[1] 7

$estim.prec
[1] 0.0003936229
uniroot(\(x)log(x)-exp(-x), c(0, 10))
$root
[1] 1.309799

$f.root
[1] -1.897809e-07

$iter
[1] 8

$init.it
[1] NA

$estim.prec
[1] 6.103516e-05

Use the function to solve the problems in Exercise 4.

Newton-Raphson

The main idea behind Newton-Raphson is to use the property that the tangent line at a point is the best linear approximation to the function near that point. Given that we know the gradient at the current point, we can write the equation of the tangent line and find where it crosses the x-axis, i.e. the point \((x, 0)\), the zero of the tangent line. Recall that the slope of a line is computed as rise over run:

\[ m = \frac{y_1-y_0}{x_1-x_0} \]

Assume we start at the point \((x_0, y_0)\) on the curve, where \(y_0 = f(x_0)\). Since the tangent line is a good approximation to \(f\) near \(x_0\), the next guess \(x_1\) is the point where the tangent line crosses the x-axis:

\[ m = \frac{0 - y_0}{x_1 - x_0} \]

Then solving for \(x_1\) we get:

\[ x_1 = x_0 - \frac{y_0}{m} \]

Using function notation ie \(m = f'(x_0)\) and \(y_0 = f(x_0)\) we obtain

\[ x_1 = x_0 - \frac{f(x_0)}{f'(x_0)} \]

Repeating the same procedure, the next guess \(x_2\) is then

\[x_2 = x_1-\frac{f\left(x_1\right)}{f^{\prime}\left(x_1\right)}\]

In general we could write this as:

\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \]

Procedure

The Newton-Raphson algorithm is described as follows:

  • Start from an initial value \(x_0\) chosen arbitrarily.

  • Repeat:
    \[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \]

  • Stop when \(\left|f\left(x_n\right)\right| \leq \varepsilon\), where \(\varepsilon>0\) is the pre-specified tolerance level.

Notice how this iteration is similar to \(x_{n+1} = x_{n} - \alpha f(x_n)\) introduced in the section above, with \(\alpha = \frac{1}{f'(x_n)}\). The Newton-Raphson method is thus one way of determining the value of \(\alpha\). Using this method, we always subtract, regardless of the starting point.

f <- function(x)x^2 - 5*x + 6
fprime <- function(x)2*x - 5
x <- 10
for(i in 1:8){
  x <- x -  f(x)/fprime(x) # we used subtraction 
}
x
[1] 3
x <- -10
for(i in 1:8){
  x <- x -  f(x)/fprime(x) # we used subtraction 
}
x
[1] 2

Notice how this time round, we did not have to do any trial and error on what to use. We just wrote the code, and the different starting points led to the different solutions we needed. Notice that only 8 iterations were used to converge to each solution.

x <- 10
for(i in 1:150){
  alpha <- 1/fprime(x)
  # To write in alpha notation:
  x <- x -  alpha * f(x) # we used subtraction 
}
x
[1] 3
g <- function(x) log(x)-exp(-x)
gprime <- function(x)1/x + exp(-x)
x <- 1
tolerr <- 1e-10

while(abs(g(x))>tolerr){
  x <- x - g(x)/gprime(x)
}
x
[1] 1.3098

Secant Method

Note that although Newton's method converges very fast (quadratically), we need the analytic computation of the gradient. Sometimes this is not easily found, and thus we use the secant slope rather than the tangent slope. The formula remains the same, except that \(m \neq f'(x)\); instead, \(m\) is the slope as previously defined.

We thus have the secant update:

\[ x_{n+1} = x_n - \frac{f(x_n)}{m}\quad \text{where } m = \frac{f(x_{n}) - f(x_{n-1})}{x_n - x_{n-1}} \]

Note that if \(x_n\) and \(x_{n-1}\) are close to each other, then
\[ f'(x_n)\approx\frac{f(x_n) - f(x_{n-1})}{x_n - x_{n-1}} \]

which is why the secant method is often seen as an approximation to the Newton-Raphson method, where we approximate the derivative by a finite difference/numerical approximation.

The secant method also uses the form \(x_{n+1} = x_n-\alpha f(x_n)\) where \(\alpha = 1/m\) and \(m\) is as defined above.

Procedure

  • Start from arbitrary initial values \(x_0\) and \(x_1\).

  • Repeat:

    \[x_{n+1} = x_n-\frac{x_n-x_{n-1}}{f(x_n) - f(x_{n-1})}f(x_n)\]

  • Stop when \(\left|f\left(x_n\right)\right| \leq \varepsilon\), where \(\varepsilon>0\) is the pre-specified tolerance level.

f <- function(x) log(x)-exp(-x)
x0 <- 1
x1 <- 2
tolerr <- 1e-10
while(abs(f(x1))>tolerr){
  alpha <- (x1 - x0)/(f(x1) - f(x0))
  x2 <- x1 -  alpha * f(x1)
  x0 <- x1
  x1 <- x2
}
x2
[1] 1.3098

The above code can be shortened to

f <- function(x) log(x)-exp(-x)
x0 <- 1
x1 <- 2
tolerr <- 1e-10
while(abs(f(x1))>tolerr){
  alpha <- (x1 - x0)/(f(x1) - f(x0))
  x1 <- (x0 <- x1) -  alpha * f(x1)
}
x1
[1] 1.3098

Another method is to use a numerical gradient approximation. Recall that the gradient is defined as

\[ f'(x) = \lim_{h\to0}\frac{f(x+h) -f(x)}{h} \]

Since in the world of computing \(h\) cannot be zero, we set it to be a small number and use the approximation:

\[ f'(x) \approx \frac{f(x+h) -f(x)}{h} \]

f <- function(x) log(x)-exp(-x)
x <- 1
tolerr <- 1e-10
h <- 1e-5
while(abs(f(x))>tolerr){
  alpha <- h / (f(x + h) - f(x))
  x <- x -  alpha * f(x)
}
x
[1] 1.3098

Fixed Point Method

A fixed point of a function \(g\) is a point \(x\) such that \(g(x) = x\). Graphically, it is the point where the graph \(y = g(x)\) crosses the line \(y=x\).

An interesting/useful property of fixed points is that they remain fixed after iteratively applying the function an arbitrary number of times. That is, if \(x\) is a fixed point of \(g\), then

\[ x = g(x) = g(g(x)) = g(g(g(x))) = \cdots \]

The problem of finding fixed points can be framed as a root-finding problem. Define the function \(f(x)\) by the equation \(f(x)= c[g(x) − x]\), for some constant \(c\neq 0\). Then \(f(x) = 0\) if and only if \(g(x) = x\). Conversely, the problem of finding solutions to \(f(x) = 0\) is equivalent to finding the fixed points of the function \(g(x) = cf(x) + x\). Note that there are often many ways to write \(f(x) = 0\) as \(g(x) = x\), and, in practice, some are better than others.

Procedure

  • Start from an initial value \(x_0\).

  • Compute \(g\left(x_0\right)\) as the next value \(x_1\).

  • Repeat:

    \[ x_{n+1} = g\left(x_n\right) \]

  • Stop when \(\left|x_n-x_{n-1}\right| \leq \varepsilon\), for some user-specified tolerance level \(\varepsilon>0\).

Using the example above, \(f(x) = \log(x) - \exp(-x)\), we first have to express the problem as a fixed-point problem:

\[ \log(x) - e^{-x} = 0 \implies x = e^{e^{-x}} \]

Now our \(g(x) = e^{e^{-x}}\), and the fixed point of \(g(x)\) corresponds to the root of \(f(x)\).

A quick implementation:

g <- function(x) exp(exp(-x))
x_old <- 10
x_new <- g(x_old)
its <- 0
tol <- 1e-8
while(abs(x_new - x_old) > tol){
  x_old <- x_new
  x_new <- g(x_old)
  its <- its + 1
}
x_new
[1] 1.3098

A shorter version

g <- function(x)exp(exp(-x))
x <- 10
tolerr <- 1e-8
while(abs(x - (x <- g(x)))>tolerr){}
x
[1] 1.3098

Using recursion:

s <- function(x = 10, tol = 1e-8) {
  if(abs((y<-exp(exp(-x)))- x)<tol)x else s(y)
}
s()
[1] 1.3098

Notice that using the fixed-point method for this particular problem, we can begin from any number, including negative numbers, while for the methods previously discussed we could only start from positive numbers, since we cannot evaluate the logarithm of a negative number. Choose the method to implement wisely.
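
As noted earlier, there are often several ways to rewrite \(f(x)=0\) as \(g(x)=x\), and some behave better than others. As a rough illustration (this alternative rearrangement is our own, not part of the original example), the same equation can also be written as \(x = -\log(\log(x))\), but iterating this version moves away from the root:

g2 <- function(x) -log(log(x)) # another rearrangement of log(x) - exp(-x) = 0
x <- 1.3 # start very close to the true root 1.3098
for(i in 1:4){
  x <- g2(x)
}
x # roughly 0.81: the iterates drift away from the root instead of towards it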

The methods mentioned above can be extended to solve multivariate functions.

Suppose we have a system of linear equations. How can we solve this?

\[ \begin{pmatrix}1\\1\end{pmatrix}x + \begin{pmatrix}-4\\2\end{pmatrix}y = \begin{pmatrix}-7\\5\end{pmatrix} \]

Notice that the system has been written in vector notation. We can then implement the solution as \(x_{n+1} = x_{n} - \alpha f(x_n)\).

h <- function(x, z = c(-7,5)) x[1] + c(-4,2)*x[2] - z
x <- c(0,0) #initial starting point
for (i in 1:100){
  x <- x - 0.2*h(x)
}
x # final root of the function
[1] 1 2

Exercise

  1. Find the root of the following system of equations:

\[ \begin{aligned} 3x+2y+4z &= 4\\ 4x+2y+4z &= 2\\ x+y+4z&=4\\ \end{aligned} \]

Since the above is a system of simultaneous equations, the exact solution can be obtained using the solve function. First we rewrite the system in matrix notation, i.e. \(Ax = b\), as shown below:

\[ \begin{aligned} \begin{pmatrix}3&2&4\\4&2&4\\1&1&4\end{pmatrix} \begin{pmatrix}x\\y\\z\end{pmatrix}=\begin{pmatrix}4\\2\\4\end{pmatrix} \end{aligned} \]

This can be solved in R as:

A <- matrix(c(3,4,1,2,2,1,4,4,4), 3)
b <- c(4,2,4)
solve(A, b)
[1] -2.0  4.0  0.5

Now use the Newton-Raphson method to find the root of the problem.
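
A minimal sketch of how this might look (our own illustration, not a full treatment of the multivariate case): since the system is linear, the Jacobian of \(f(x) = Ax - b\) is simply the matrix \(A\), and a single Newton step \(x - J^{-1}f(x)\) already lands on the exact solution.

f <- function(x) A %*% x - b   # uses the A and b defined above
J <- A                         # the Jacobian of a linear map is its coefficient matrix
x <- c(0, 0, 0)                # arbitrary starting point
x <- x - drop(solve(J, f(x)))  # one Newton step: x - J^{-1} f(x)
x                              # same result as solve(A, b) above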

  2. Using the uniroot function, obtain the inverse function of \(f(x) = x\log(x)\)

    i.e. a function \(g\) whereby \(g(\cdot) = f^{-1}(\cdot)\), or simply \(g(f(x)) = f(g(x)) = x\).
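
One possible sketch (our own, restricted to \(x \ge 1\), where \(f\) is increasing): for a given value \(y\), the inverse returns the \(x\) solving \(x\log(x) - y = 0\), which uniroot can find.

f <- function(x) x*log(x)
g <- function(y) uniroot(function(x) f(x) - y, c(1, 10), extendInt = 'upX', tol = 1e-10)$root
g(f(3)) # returns approximately 3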
