Chapter 2
Advanced Integration and Applications
This chapter covers a variety of methods and applications for single-variable integrals. The first two
sections lay the groundwork for multivariable integration by exploring the connections between integration
and geometry. One section touches on approximation methods for integrals. Other sections prepare us
for our goal: applying integration to probability and statistics.
Contents
2.1 Area Between Curves
2.2 Volumes
2.3 Integration by Parts
2.4 Approximate Integration
2.5 Improper Integrals
2.6 Probability
2.7 Functions of Random Variables
Section 2.1
Area Between Curves
Goals:
1 Use integrals to calculate the geometric area of a region.
The Fundamental Theorem of Calculus relates the change in a function to the area under a curve.
Modern scientists have seized upon integration as a way to study change, whether they are measuring
a chemical reaction, the position of a particle, or economic activity. The geometric applications are
irrelevant to most consumers of calculus.
Historically, these methods were exciting to scholars who had been limited to area formulas for circles
and triangles. Now any shape that was defined by an algebraic function was fair game. In this section
we push integration beyond areas under a curve to areas bounded by two or more curves. This gives us
the ability to measure a wide variety of shapes, but geometry is not our end goal. Instead the goal is
to study how integration works on these oddly shaped regions. We will find that the methods of this
section return to relevance when it is time to integrate functions of more than one variable.
Question 2.1.1
How Is the Integral Related to Geometric Area?
When we defined the definite integral, we were attempting to compute the area under a curve.
However, our methods introduced a glitch. Consider the following example.
This region has an area of $\frac{38}{3}$, but $\int_3^8 f(x)\,dx = -\frac{38}{3}$.
Figure: A region below the x-axis and above y = f (x)
We were taught that the integral does not measure geometric area, but instead signed area. Area
below the x axis counts as negative.
Why does this happen? Recall the definition of the definite integral.
Definition
The integral is computed by the following limit
\[ \int_a^b f(x)\,dx = \lim_{\Delta x \to 0} \sum_i f(x_i)\,\Delta x \]
This limit takes better and better approximations of the area. The approximation is a sum of
rectangles, whose area is height × width. All the rectangles have width $\Delta x$, but their heights vary, and
we used the height of the graph y = f(x) to measure them. This works fine when f(x) is positive.
When f(x) < 0, the product $f(x_i)\,\Delta x$ computes a negative “area” for each rectangle.
Figure: An approximation by rectangles of negative height
In this example the resolution of this glitch is straightforward. Eliminating the negative sign, we
obtain the correct area. However, we can imagine a region that requires a more sophisticated approach.
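The difference between signed and geometric area can be checked numerically. The sketch below assumes a hypothetical function, f(x) = -sqrt(x + 1), chosen only because its integral from 3 to 8 equals -38/3; the function behind the figure is not stated in the text. The helper name `riemann_sum` is also just an illustration.

```python
import math

def riemann_sum(func, a, b, n=100_000):
    """Midpoint Riemann sum of func over [a, b] using n rectangles of width dx."""
    dx = (b - a) / n
    return sum(func(a + (i + 0.5) * dx) for i in range(n)) * dx

# Hypothetical example function (an assumption, not taken from the figure).
# Its graph lies below the x-axis on [3, 8].
def f(x):
    return -math.sqrt(x + 1)

signed_area = riemann_sum(f, 3, 8)                        # heights f(x_i) are negative
geometric_area = riemann_sum(lambda x: abs(f(x)), 3, 8)   # heights |f(x_i)| are positive

print(signed_area, geometric_area)  # approximately -12.667 and 12.667
```

Every rectangle in the first sum has a negative height, so the total is the negative of the geometric area.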
Question 2.1.2
What Integral Computes the Geometric Area Between Two Graphs?
Suppose we want to know the area between the graphs y = f(x) and y = g(x) for some interval
$a \le x \le b$. We can approximate this by rectangles. As the number of rectangles increases, the
approximation becomes more accurate.
Figure: The region between y = f(x) and y = g(x), approximated by rectangles
Let’s derive a formula for this rectangle approximation.
We let $x_i$ denote the left endpoint of each subinterval. The rectangles have width $\Delta x$ and height
$g(x_i) - f(x_i)$. We compute:
\[ \text{Area} = \lim_{\Delta x \to 0} \sum_i \big(g(x_i) - f(x_i)\big)\,\Delta x \]
This limit exactly matches the definition of a definite integral. The function being integrated is
$g(x) - f(x)$. Thus we can compute the area below y = g(x) and above y = f(x) by integrating
$g(x) - f(x)$ from a to b.
Main Idea
The area above y = f(x) and below y = g(x) from x = a to x = b is computed by
\[ \int_a^b \big(g(x) - f(x)\big)\,dx. \]
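The rectangle approximation behind this formula can be sketched in a few lines. The function name `area_between` and the two sample curves are assumptions made for illustration; they do not come from the text.

```python
def area_between(top, bottom, a, b, n=100_000):
    """Left-endpoint rectangle sum of (top - bottom) over [a, b]."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x_i = a + i * dx                          # left endpoint of the i-th subinterval
        total += (top(x_i) - bottom(x_i)) * dx    # height times width
    return total

# Hypothetical curves: g(x) = x lies above f(x) = x^2 on [0, 1].
approx = area_between(lambda x: x, lambda x: x * x, 0.0, 1.0)
print(approx)  # close to 1/6
```

Increasing `n` shrinks the width of each rectangle, which mirrors taking the limit in the derivation.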
Example 2.1.3
The Area Between Two Curves
Suppose we want to compute the area between $y = \sqrt{x}$ and $y = x - \sqrt{x}$ from x = 6 to x = 12.
How do we know which graph is on top and which is on the bottom?
The height of a graph is the value of the function. We can evaluate the function at some x in the
interval [6, 12]. The most convenient x is x = 9.
\[ \sqrt{9} = 3 \qquad 9 - \sqrt{9} = 6 \]
So at x = 9, $y = x - \sqrt{x}$ is above $y = \sqrt{x}$.
Exercise
We’ve established that at x = 9, $y = x - \sqrt{x}$ is above $y = \sqrt{x}$. Unfortunately there are infinitely many
points between x = 6 and x = 12. How can we decide which graph is on top at each of them?
1 Does the graph of $y = \sqrt{x}$ intersect the graph of $y = x - \sqrt{x}$ between x = 6 and x = 12?
2 What theorem could we use to argue that if $y = \sqrt{x}$ is ever above $y = x - \sqrt{x}$ then the graphs
must have intersected?
Solution
1 To test where the graphs intersect, we set the functions equal to each other.
\begin{align*}
\sqrt{x} &= x - \sqrt{x} \\
0 &= x - 2\sqrt{x} \\
0 &= \sqrt{x}\,(\sqrt{x} - 2) && \text{(factor)} \\
\sqrt{x} = 0 \quad &\text{or} \quad \sqrt{x} - 2 = 0 \\
x = 0 \quad &\text{or} \quad 4
\end{align*}
Neither of these is in [6, 12].
2 The Intermediate Value Theorem tells us that these functions cannot switch places without
intersecting. Switching places means that the difference $(x - \sqrt{x}) - (\sqrt{x})$ would change from positive
to negative. As this is a continuous function, the Intermediate Value Theorem says there must
be some point along the way where $(x - \sqrt{x}) - (\sqrt{x}) = 0$. We’ve already shown that all those
points lie outside the interval, so we can conclude that $y = x - \sqrt{x}$ is above $y = \sqrt{x}$ over the
entire interval [6, 12].
The figure below confirms that $y = x - \sqrt{x}$ is on top for all x in [6, 12].
Figure: An approximation of the area between $y = x - \sqrt{x}$ and $y = \sqrt{x}$
Main Ideas
Plugging a test point into f(x) and g(x) tells us which graph is above the other.
If the functions are continuous, then solving f (x) = g(x) computes the only points where the
graphs can change positions.
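As a numerical companion to Example 2.1.3, the sketch below repeats the test-point check on a fine grid and then approximates the area with rectangles. A grid check is not a proof (the Intermediate Value Theorem argument is), and the variable names are illustrative assumptions.

```python
import math

def top(x):      # y = x - sqrt(x)
    return x - math.sqrt(x)

def bottom(x):   # y = sqrt(x)
    return math.sqrt(x)

a, b, n = 6.0, 12.0, 60_000
dx = (b - a) / n
midpoints = [a + (i + 0.5) * dx for i in range(n)]

# Test-point check repeated on a fine grid: the gap should stay positive on [6, 12].
smallest_gap = min(top(x) - bottom(x) for x in midpoints)

# Midpoint rectangle sum for the area between the curves.
area = sum((top(x) - bottom(x)) * dx for x in midpoints)

print(smallest_gap > 0, area)
```

The area can also be found exactly by integrating $x - 2\sqrt{x}$ from 6 to 12 with the Fundamental Theorem of Calculus.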
Example 2.1.4
The Area Enclosed by Two Curves
Set up an integral that computes the area enclosed between the curves $y = x^2$ and $y = 3 - x - x^2$.
Figure: The area enclosed by two parabolas
Solution
These are parabolas. If they enclose any area, the downward facing parabola must lie above the upward
facing parabola. This tells us we are integrating
\[ \int_a^b \big((3 - x - x^2) - x^2\big)\,dx \]
But what are the bounds of integration? To know this we must find the points where the graphs
intersect.
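\begin{align*}
3 - x - x^2 &= x^2 \\
0 &= 2x^2 + x - 3 \\
0 &= (2x + 3)(x - 1) \\
x &= -\tfrac{3}{2} \text{ or } 1
\end{align*}
The area is computed by
\[ \text{Area} = \int_{-3/2}^{1} \big((3 - x - x^2) - x^2\big)\,dx \]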
Main Ideas
To determine the range of x values that define an enclosed region, solve for the intersection points
between the graphs.
Sketching the graphs can be a time-saver and a reality check for your answer.
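Example 2.1.4 only asks for the setup, but the integral can be evaluated exactly as a reality check. The sketch below assumes the antiderivative $3x - \frac{x^2}{2} - \frac{2x^3}{3}$ of the integrand $3 - x - 2x^2$ and uses exact fractions; the function name is illustrative.

```python
from fractions import Fraction

def antiderivative(x):
    """Antiderivative of (3 - x - x^2) - x^2 = 3 - x - 2x^2."""
    return 3 * x - x**2 / 2 - 2 * x**3 / 3

left, right = Fraction(-3, 2), Fraction(1)   # the two intersection points
area = antiderivative(right) - antiderivative(left)
print(area)  # 125/24
```

The result is positive, which agrees with the downward-facing parabola being on top.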
Example 2.1.5
The Area Enclosed by Two Curves that Intersect More than Twice
Compute the area enclosed by $f(x) = x^3 - 10x$ and $g(x) = 3x^2$.
Solution
To find the intersections we set f(x) = g(x) and solve:
\begin{align*}
x^3 - 10x &= 3x^2 \\
x^3 - 3x^2 - 10x &= 0 \\
x(x - 5)(x + 2) &= 0 \\
x &= 0,\ 5,\ \text{or } -2
\end{align*}
Our region is bounded between $x = -2$ and $x = 5$, but one graph does not need to be above the other
for the entire region. The graphs intersect at x = 0 so one graph might be on top for $[-2, 0]$, while the
other is on top for [0, 5]. To find out which is which we could evaluate at test points (we would need
two). Alternately, since we’ve already factored $f(x) - g(x) = x(x - 5)(x + 2)$ we can perform a sign
analysis:
Interval        x < -2    -2 < x < 0    0 < x < 5    x > 5
x                 -           -             +           +
(x - 5)           -           -             -           +
(x + 2)           -           +             +           +
f(x) - g(x)       -           +             -           +
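Thus $x^3 - 10x > 3x^2$ on $[-2, 0]$ and $x^3 - 10x < 3x^2$ on $[0, 5]$. The enclosed area is computed by:
\begin{align*}
\text{Area} &= \int_{-2}^{0} \big(x^3 - 10x - 3x^2\big)\,dx + \int_{0}^{5} \big(3x^2 - x^3 + 10x\big)\,dx \\
&= \left[\frac{x^4}{4} - 5x^2 - x^3\right]_{-2}^{0} + \left[x^3 - \frac{x^4}{4} + 5x^2\right]_{0}^{5} \\
&= (0 - 0 - 0 - 4 + 20 - 8) + \left(125 - \frac{625}{4} + 125 - 0 + 0 - 0\right) \\
&= \frac{407}{4}
\end{align*}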
Main Ideas
With more intersections, we must check the region between each pair of intersections to see which
graph is on top.
It can be more efficient to make a sign analysis chart.
Sketching the graphs may be more difficult. If you can do it, it will corroborate (or correct) your
calculations.
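The arithmetic in Example 2.1.5 can be double-checked with exact fractions. This sketch assumes the antiderivative $F(x) = \frac{x^4}{4} - x^3 - 5x^2$ of $f(x) - g(x)$ and flips the sign on the interval where g is on top; the names are illustrative.

```python
from fractions import Fraction

def F(x):
    """Antiderivative of f(x) - g(x) = x^3 - 3x^2 - 10x."""
    return x**4 / 4 - x**3 - 5 * x**2

x0, x1, x2 = Fraction(-2), Fraction(0), Fraction(5)   # the three intersections

left_piece = F(x1) - F(x0)        # f is on top on [-2, 0]
right_piece = -(F(x2) - F(x1))    # g is on top on [0, 5], so the sign flips

print(left_piece, right_piece, left_piece + right_piece)  # 8 375/4 407/4
```

Each piece is positive on its own, which is a quick way to confirm the top and bottom curves were assigned correctly.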
Example 2.1.6
A Region without a Single Top Curve
Compute the area enclosed by the curves $y = 1$, $y = \frac{16}{x}$ and $y = 2\sqrt{x}$.
We should start by drawing this region and finding the coordinates of the intersections.
There are three intersections to solve for, one using each pair of equations.
\[
\begin{array}{lll}
\frac{16}{x} = 2\sqrt{x} & \frac{16}{x} = 1 & 2\sqrt{x} = 1 \\
16 = 2x^{3/2} & 16 = x & \sqrt{x} = \frac{1}{2} \\
8 = x^{3/2} & & \\
x = 4 & x = 16 & x = \frac{1}{4}
\end{array}
\]
If we write this area as an integral $\int_{1/4}^{16} \big(g(x) - f(x)\big)\,dx$, the top function would need to be piece-wise:
\[
g(x) =
\begin{cases}
2\sqrt{x} & \text{if } \frac{1}{4} \le x \le 4 \\
\frac{16}{x} & \text{if } 4 \le x \le 16
\end{cases}.
\]
We don’t know the anti-derivative of a piece-wise function. Instead, we consider a few different
approaches. Since the upper boundary is defined by a different function for different values of x, one
approach is to break the region into two integrals.
Figure: Two subregions whose areas can be expressed by integrals
The area of the region on the left is $\int_{1/4}^{4} \big(2\sqrt{x} - 1\big)\,dx$. The area of the region on the right is
$\int_{4}^{16} \big(\frac{16}{x} - 1\big)\,dx$.
Adding these together gives the total enclosed area.
Another approach would be to obtain the area by subtraction. Find the following two areas on the
diagram:
\[ \int_{1/4}^{16} \big(2\sqrt{x} - 1\big)\,dx \qquad \int_{4}^{16} \left(2\sqrt{x} - \frac{16}{x}\right)dx \]
You should be able to convince yourself that
\[ \text{Enclosed Area} = \int_{1/4}^{16} \big(2\sqrt{x} - 1\big)\,dx - \int_{4}^{16} \left(2\sqrt{x} - \frac{16}{x}\right)dx \]
Both of these approaches require us to evaluate two integrals. That is unavoidable as long as our
integrals are limits of an approximation by rectangles of different heights, because those heights are
determined by different enclosing graphs, depending on which x value we measure at. For this particular
region, there is a way to avoid this.
Instead we can approximate the region by rectangles of different widths.
Notice the left endpoint always lies on $y = 2\sqrt{x}$ and the right endpoint always lies on $y = \frac{16}{x}$. As
the height of the rectangles goes to 0, the approximation becomes exact.
Let’s derive a formula for this rectangle approximation and compute the exact area.
Let $\Delta y$ be the height of each rectangle. The widths are given by the horizontal distance between
the graph $y = 2\sqrt{x}$ and $y = \frac{16}{x}$ at the heights $y_i$ corresponding to the bottom of each rectangle.
Horizontal distance is the difference in x values. What x values correspond to $y_i$? We can plug in $y_i$
and solve for x.
\[
\begin{array}{ll}
y_i = 2\sqrt{x} & y_i = \frac{16}{x} \\
\frac{y_i}{2} = \sqrt{x} & x\,y_i = 16 \\
\frac{(y_i)^2}{4} = x & x = \frac{16}{y_i}
\end{array}
\]
These computations should be familiar. Finding x in terms of y is called finding the inverse function.
These inverse functions give the left and right bounds of our region. To find the area, we take a sum
of the areas of these rectangles of different widths. Then we take a limit. Notice that to make the
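width positive we subtract the smaller x value from the larger x value. Geometrically, this is the right
endpoint $\frac{16}{y_i}$ minus the left endpoint $\frac{(y_i)^2}{4}$.
\[
\lim_{\Delta y \to 0} \sum_i \underbrace{\left(\frac{16}{y_i} - \frac{(y_i)^2}{4}\right)}_{\text{width}} \underbrace{\Delta y}_{\text{height}}
= \int_1^4 \left(\frac{16}{y} - \frac{y^2}{4}\right)dy
\]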
This limit is an integral, but the variable of integration is y, not x. The bounds of integration are
the set of y values in the region. The lowest point in the region is at y = 1. The highest is at y = 4.
We evaluate the integral using the Fundamental Theorem of Calculus, but with y instead of x.
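\begin{align*}
\text{Area Enclosed} &= \int_1^4 \left(\frac{16}{y} - \frac{y^2}{4}\right)dy \\
&= \left[16\ln|y| - \frac{y^3}{12}\right]_1^4 \\
&= \left(16\ln 4 - \frac{64}{12}\right) - \left(16\ln 1 - \frac{1}{12}\right) \\
&= 16\ln 4 - \frac{63}{12}
\end{align*}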
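The two approaches in Example 2.1.6 should agree, and a quick numerical sketch can confirm it. The helper `midpoint_sum` and the variable names are assumptions made for illustration; the integrands and bounds are the ones set up in the example.

```python
import math

def midpoint_sum(func, a, b, n=200_000):
    """Midpoint rectangle sum of func over [a, b]."""
    dx = (b - a) / n
    return sum(func(a + (i + 0.5) * dx) for i in range(n)) * dx

# Approach 1: two dx-integrals, split at x = 4 where the top curve changes.
left = midpoint_sum(lambda x: 2 * math.sqrt(x) - 1, 0.25, 4)
right = midpoint_sum(lambda x: 16 / x - 1, 4, 16)

# Approach 2: one dy-integral of (right boundary) - (left boundary).
sideways = midpoint_sum(lambda y: 16 / y - y**2 / 4, 1, 4)

exact = 16 * math.log(4) - 63 / 12
print(left + right, sideways, exact)
```

All three printed values should match to several decimal places, since they measure the same region.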
Main Idea
The area to the right of $x = f^{-1}(y)$ and to the left of $x = g^{-1}(y)$ for y from a to b can be computed by
\[ \int_a^b \big(g^{-1}(y) - f^{-1}(y)\big)\,dy. \]
Strategy
Changing an integral to dy may be more work than breaking it into two or more parts. When solving
an area problem, consider both methods and use the one that seems more promising. If you run into
problems with your chosen approach, give the other method a try.
Section 2.1
Exercises
Summary Questions
Q1
What is the geometric significance of $g(x) - f(x)$ in the formula for the area between two graphs?
Q2
How do we determine which curve is the top of a region and which is the bottom? Describe the
difficulties that can arise.
Q3
How do we use boundaries of the form y = g(x) and y = f(x) in a dy-integral to compute
geometric area?
Q4
When setting up a dy-integral, how can we visually identify which graph’s function will be sub-
tracted from which?
Q5
An integral can be positive or negative. If we are solving for area (which may not be negative)
describe the steps we take to guarantee our area is positive.
Q6
Explain the difference between “The region enclosed by y = f(x) and y = g(x)” and “The region
$f(x) \le y \le g(x)$.”
2.1.1
Q7
Suppose the graph y = f(x) is above the x-axis.
a
How much would the geometric area between y = f(x) and the x-axis for $a \le x \le b$ increase
if the graph were shifted up by k units? Try to argue geometrically or with a visual.
b
Would shifting the graph down by k instead decrease the area by the same amount? Draw
a graph for which it wouldn’t.
Q8
How would we use integrals to calculate the geometric area of the shaded region below?
Q9
The expressions
\[ \int_a^b |f(x)|\,dx \quad \text{and} \quad \int_a^b f(x)\,dx \]
are not equivalent. Explain why, and draw the graph of a function on which these expressions
disagree.
Q10
Given a differentiable function f(x), the signed area between the graph $y = f'(x)$ and the x-axis
from x = a to x = b is denoted $\int_a^b f'(x)\,dx$ and is equal to the change in f(x) from x = a to
x = b. In what sense does the geometric area between the graph of $y = f'(x)$ and the x-axis
represent a change in f(x)?
2.1.2
Q11
Suppose y = f(x) and y = g(x) are below the x-axis. What integral computes the geometric
area between them? How does this compare to the situation when they are above the x-axis?
Q12
Here is another way to derive the formula for the area between curves. Consider the functions
graphed here:
a
Indicate on the graph what areas are denoted by $\int_a^b f(x)\,dx$ and $\int_a^b g(x)\,dx$. How are they
related to the region between y = f(x) and y = g(x)?
b
Is $\int_a^b g(x)\,dx - \int_a^b f(x)\,dx$ equivalent to the expression for area we derived in 2.1.2? What
integral rule(s) would you apply to justify this?
c
If y = f(x) is below the x-axis, how does this change the meaning of $\int_a^b f(x)\,dx$? Does the
formula from b still work? Explain.
2.1.3
Q13
Compute the area between y = 4x and $y = x^3$ from x = 3 to x = 5.
Q14
Compute the area between $y = e^x$ and $y = \sin(\pi x)$ from $x = -1$ to x = 0.
2.1.4
Q15
Compute the area enclosed by $y = \sqrt{x}$ and $y = x^2$.
Q16
Compute the area enclosed by $y = x^2 - 5$ and y = 4x.
Q17
Compute the area enclosed by $y = x^2$, $y = 2x - 1$ and x = 3.
Q18
Compute the area enclosed by y = x + 2 and $y = 3\sqrt{x}$.
2.1.5
Q19
Compute the area between y = sin x and y = cos x over the interval [0, 2π].
Q20
Erica and Carter were asked to compute the area enclosed by y = 4x and $y = x^3$. They agree
that $4x = x^3$ when $x = -2$ and when x = 2. Erica thinks the area is
\[ \int_{-2}^{2} \big(4x - x^3\big)\,dx \]
Carter thinks it is
\[ \int_{-2}^{2} \big(x^3 - 4x\big)\,dx \]
a
Who is correct?
b
How do you think the mistake could reasonably have happened, and how can you avoid it?
Q21
Compute the area enclosed by $y = xe^{x^2}$ and $y = ex$.
Q22
Set up an integral or integrals to compute the area of the region enclosed by the curves $f(x) = x^2(x^2 - 4)$
and $g(x) = x^4(x^2 - 4)$.
Q23
Often the top curve of an enclosed region alternates between f(x) and g(x) at each intersection.
Can you explain what about the previous problem caused this pattern to fail?
Q24
Suppose y = f(x) and y = g(x) intersect multiple times, with x = a their leftmost intersection
and x = b their rightmost. We can express the area enclosed between them by
$\int_a^b |g(x) - f(x)|\,dx$.
a
Explain why this formula works.
b
Explain why this formula isn’t particularly helpful.
2.1.6
Q25
Compute the area enclosed by y = 6, $y = \sqrt{x}$ and y = 2x.
Q26
Compute the area enclosed by $y = e^x$, $y = e^{4-x}$, and y = 1.
Q27
You have been taught at least three ways to set up an expression that will compute the area
enclosed by (all of) y = 3, y = 3x, y = 9 and x + y = 5. Set up all the methods you know
that will do this. You do not need to evaluate them.
Q28
Write the area in the first quadrant enclosed by $y = \sqrt{3}\,x$, y = 0, and $x^2 + y^2 = 4$ as a single
integral.
Q29
Write the area enclosed by $y = \sqrt{x}$ and $y = x^2$ as
a
an integral in x
b
an integral in y
Q30
Write the area in the first quadrant enclosed by $y = x^2$, $y = 3x^2$, and $y = 18 - 3x$ as
a
a sum of integrals in x
b
a sum of integrals in y
Extension and Synthesis
Q31
Suppose you’ve found that y = f(x) and y = g(x) intersect at x = a (along with perhaps other
places). What could knowing the values of $f'(a)$ and $g'(a)$ tell you about where each graph is
above the other? Be as specific as possible.
Q32
Suppose you are given that for all x:
\[ f'(x) > 0 \qquad g'(x) < 0 \]
We approximate the area between y = f(x) and y = g(x) from x = a to x = b by rectangles,
letting the $x_i$ be the right endpoints of each subinterval. What can we say about whether the
approximation will overestimate or underestimate the true area?
Section 2.2
Volumes
Goals:
1 Recognize cross sections of a solid object.
2 Write the area of each cross section as a function.
3 Compute the volume of a solid.
4 Visualize and compute the volume of a solid of revolution.
The motivation for the definite integral was computing an area. However, the definition turns out
to be more useful than that. With the correct setup, we can express a volume as an integral as well.
Question 2.2.1
What Is Volume?
Dimension
In mathematics, we define the dimension of an object. Dimension measures the number of degrees of
freedom available to a point traveling in the object.
The definition may not match your intuition for dimension. For example, you only encounter a
parabola in two (or more)-dimensional space. However, the parabola itself is one-dimensional. If you
imagine that you are an insect crawling on the parabola, you can only travel forward or backward, not
side to side. If you were small enough, the parabola would seem indistinguishable from a line.
Example
1 A plane is two dimensional. You can travel left/right or up/down.
2 A circle is one dimensional. You can only travel clockwise/counterclockwise.
3 A point is zero dimensional. There is nowhere to travel within it.
We measure objects of different dimensions differently. In all cases, measuring is counting how many
units of measurement fit inside the object. A 6 unit by 3 unit rectangle has area 18 square units, because
18 unit squares can fit inside it. For less regular objects we need to consider parts of square units. This
requires a lot of work to do formally, but the intuition should be straightforward.
Figure: Objects of several dimensions and their units of measurement
We use different names to describe objects and their measurements in different dimensions:
Dimension   Names                                     Measurement
0           point                                     none
1           line, circle, curve                       length
2           square, polygon, disc, sphere, surface    area
3           cube, polyhedron, ball, solid             volume
Vocabulary Check
It doesn’t make sense to talk about the volume of a surface. No unit cubes will fit inside it.
Similarly it doesn’t make sense to talk about the area of a solid. Infinitely many unit squares will fit
in any solid. However, solids have boundary surfaces, and we do sometimes measure their areas.
The simplest solid to measure is a (right) prism. If a prism has height h, we can see that each unit
square (or part thereof) in the base has h unit cubes stacked above it. Thus we have
Formula for Volume of a Prism
volume = area of base × height
Figure: A prism divided into unit cubes and its base divided into unit squares.
Here we see the base of the prism and the square units (or parts thereof) that it contains. The prism
has height 3.5. We can see there are 3.5 cubic units above each square unit in the base.
You may be questioning the relevance of studying areas and volumes in the 21st century. Few people
need to compute geometric measurements in their careers. However, geometry is not the end goal of
this investigation.
Remark
Our motivation for studying solids is not to solve geometry problems. Recall that the definite integral
allowed us to express total change as an area:
total change = rate of change × time
\[ f(b) - f(a) = \int_a^b f'(t)\,dt \]
This allowed us to use our geometric intuition of areas to better understand rates of change. Similarly,
volume will allow us to use geometry to understand different types of rates later on.
Question 2.2.2
How Do We Visualize 3-Dimensional Solids?
Without computer graphics, it can be difficult to visualize anything but the simplest solids. Taking
an arbitrary solid like a lamp or a sculpture, computing its volume by filling it with cubes is a hopeless
endeavor (though a computer could make a decent estimate using small enough cubes). In the absence
of a computer rendering, how do we give our brains a visual reference, and how can we leverage this to
make measurements? We use cross sections.
Definition
A cross section of a solid object is its intersection with some transversal plane.
Transversal means the plane cuts across the solid. In the case of this square-based pyramid, a
transversal plane parallel to the base intersects the pyramid in a square. If it intersects at a different
height, the intersection would be larger or smaller. If it intersects at a different angle, it wouldn’t produce
a square at all.
Figure: A cross section of a pyramid
A solid can be reassembled from its cross sections. This is valuable because cross sections are two-
dimensional, making them easier to draw or visualize. If you have a set of parallel cross sections, you
can imagine them side by side and infer the shape of the original solid.
Figure: A set of parallel cross sections of a solid
Question 2.2.3
How Can We Approximate or Compute the Volume of a Non-Prism Solid?
Suppose we want to find the volume of a pyramid. Different square units of the base have a different
number of cubic units above them. Thus we need a more robust approach than counting cubes.
Figure: A pyramid with its base divided into unit squares
We will approximate the pyramid by prisms, whose bases are cross sections.
Figure: A pyramid approximated by prisms
The key insight is to represent the different heights of these cross sections by the variable x. We can
imagine the x-axis running through the solid in the direction of its height. The bases of the prisms are
cross sections. We let $x_i$ denote the height at which the $i$th prism’s base lies. The distance between the
heights $x_i$ is denoted $\Delta x$, which is also the height of each prism. At different heights, we have different
cross sections with different areas. Area is what we really care about, since we want to compute the
volume of these prisms. We write cross sectional area as a function.
A(x) = Area of the cross section at height x
The sum of the volumes of these prisms can be written:
\[ \sum_i A(x_i)\,\Delta x. \]
Taking a limit gives the exact volume of the solid:
\[ \text{Volume} = \lim_{\Delta x \to 0} \sum_i A(x_i)\,\Delta x \]
Notice that this fits the definition of a definite integral, where A(x) is the function being integrated.
That is excellent news for us. Instead of having to learn a new way of evaluating this limit, we can use
the tools of integration that we already know.
Theorem
If the cross section of a solid, perpendicular to the x-axis, has area A(x) at each x, then the volume of
the solid is
\[ \int_a^b A(x)\,dx \]
where a and b are the values of x at the bottom and top of the solid.
Example 2.2.4
A Solid with Its Cross-Sections Given
Suppose a solid S extends from x = 2 to x = 6 and the cross section at each x is a right triangle
of height $\frac{1}{x}$ and base $x^2$. Compute the volume of S.
Solution
We will let the x direction be the height of our solid. Then the cross sectional area at each x is the area
of the triangle at that x.
\[ A(x) = \frac{1}{2}bh = \frac{1}{2}\,x^2 \cdot \frac{1}{x} = \frac{1}{2}x \]
Integrating this from x = 2 to x = 6 gives the volume.
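\begin{align*}
\text{Volume} &= \int_2^6 A(x)\,dx \\
&= \int_2^6 \frac{1}{2}x\,dx \\
&= \left[\frac{1}{4}x^2\right]_2^6 \\
&= \frac{1}{4}\cdot 36 - \frac{1}{4}\cdot 4 \\
&= 8
\end{align*}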
The volume is 8 cubic units.
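The prism approximation from Question 2.2.3 can be tried on this solid to see it approach 8. In the sketch below, `prism_sum` is an illustrative name and the choice of the cross section at the left of each slice is an assumption; any consistent choice converges to the same value.

```python
def prism_sum(area, a, b, n):
    """Sum of n prism volumes A(x_i) * dx, using the cross section at the left of each slice."""
    dx = (b - a) / n
    return sum(area(a + i * dx) * dx for i in range(n))

def A(x):
    # (1/2) * base * height for the right triangle at x, with base x^2 and height 1/x
    return 0.5 * x**2 * (1 / x)

for n in (4, 40, 400, 4000):
    print(n, prism_sum(A, 2, 6, n))
```

Because A(x) grows with x, the left-hand cross sections underestimate each slice, so the sums climb toward 8 from below as n increases.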
Example 2.2.5
A Solid Obtained by Rotation
Suppose the region under the graph $y = \frac{5}{x+1}$ from x = 1 to x = 4 is rotated around the x-axis.
Compute the volume of the resulting solid.
Figure: The solid obtained by rotating the region under $y = \frac{5}{x+1}$ about the x-axis
Solution
When we cut the region under the graph perpendicular to the x-axis, we obtain a line segment whose
height is the value of the function. When that line segment is rotated around the axis, it sweeps out a
circle, with the line segment as the radius. We can use the formula for the area of a circle.
\[ A(x) = \pi r^2 = \pi\left(\frac{5}{x+1}\right)^2 = \frac{25\pi}{(x+1)^2} \]
We apply our volume formula.
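\begin{align*}
\text{Volume} &= \int_1^4 A(x)\,dx \\
&= \int_1^4 \frac{25\pi}{(x+1)^2}\,dx \\
&= \int_2^5 \frac{25\pi}{u^2}\,du && \text{($u$-substitution: $u = x + 1$, $du = dx$; $x = 1 \Rightarrow u = 2$, $x = 4 \Rightarrow u = 5$)} \\
&= \left[-\frac{25\pi}{u}\right]_2^5 \\
&= -\frac{25\pi}{5} + \frac{25\pi}{2} \\
&= \frac{15\pi}{2}
\end{align*}
The volume of the solid is $\frac{15\pi}{2}$ cubic units.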
Main Idea
When the region under a graph y = f (x) is rotated around the x-axis, the cross sections are discs of
radius f(x). Their areas are $\pi[f(x)]^2$.
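The disc formula can be sanity-checked on a solid whose volume is known from geometry. The sketch below is a hypothetical check, not an example from the text: it rotates the line y = (r/h)x on [0, h] to form a cone and compares the result with the familiar formula $\frac{1}{3}\pi r^2 h$. The name `disc_volume` and the values of r and h are assumptions.

```python
import math

def disc_volume(f, a, b, n=100_000):
    """Midpoint sum of disc areas pi * f(x)^2, each multiplied by the thickness dx."""
    dx = (b - a) / n
    return sum(math.pi * f(a + (i + 0.5) * dx) ** 2 for i in range(n)) * dx

# Hypothetical check: rotating y = (r/h) x on [0, h] about the x-axis gives a cone.
r, h = 3.0, 4.0
cone = disc_volume(lambda x: r / h * x, 0.0, h)
print(cone, math.pi * r**2 * h / 3)
```

The two printed values should agree closely, which suggests the cross-sectional area $\pi[f(x)]^2$ is being summed correctly.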
Example 2.2.6
A Solid Defined by Its Base
Suppose we have a solid S with the following properties:
The base of S is the region enclosed by y = 0 and $y = 4x - x^2$.
The cross-sections of S perpendicular to the x-axis are trapezoids which have one base in the base
of S, another base twice as long, and whose heights are 6 units.
Compute the volume of S.
Solution
We find the x-bounds of S by computing the x-bounds of the base. We solve
\begin{align*}
0 &= 4x - x^2 \\
0 &= x(4 - x) \\
x &= 0 \text{ or } 4
\end{align*}
So x ranges from 0 to 4. The base of the trapezoid at each x is the height from y = 0 to $y = 4x - x^2$.
Note $4x - x^2 > 0$ when 0 < x < 4. Thus the base $b_1 = 4x - x^2$. The other base is twice as long, so it
is $8x - 2x^2$. The height is 6, regardless of x.
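\begin{align*}
A(x) &= \frac{1}{2}(b_1 + b_2)h && \text{area of a trapezoid} \\
&= \frac{1}{2}\left(4x - x^2 + 8x - 2x^2\right)6 \\
&= 36x - 9x^2 \\
\text{Volume} &= \int_0^4 \big(36x - 9x^2\big)\,dx \\
&= \left[18x^2 - 3x^3\right]_0^4 \\
&= 96
\end{align*}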
Figure: A solid with base between two graphs and trapezoidal cross-sections
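The volume in Example 2.2.6 can be cross-checked by building A(x) directly from the description of the trapezoids rather than from the simplified formula. This is a sketch under the example's stated measurements; the function names and the use of midpoint slices are illustrative assumptions.

```python
from fractions import Fraction

def cross_section_area(x):
    b1 = 4 * x - x**2      # the base of the trapezoid that lies in the base of S
    b2 = 2 * b1            # the other base is twice as long
    return Fraction(1, 2) * (b1 + b2) * 6   # trapezoid area with height 6

def midpoint_volume(n):
    """Sum of n slab volumes A(x_i) * dx, with x_i at the midpoint of each slice of [0, 4]."""
    dx = Fraction(4, n)
    return sum(cross_section_area((i + Fraction(1, 2)) * dx) * dx for i in range(n))

for n in (4, 16, 64):
    print(n, float(midpoint_volume(n)))
```

The printed values decrease toward 96 as the slices get thinner, in line with the integral computed above.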
Main Idea
The cross section of the base of a solid is a segment. If we know what role this segment plays in the
cross section of the solid, we can use the expression for the length of this segment to derive an expression
for A(x).
Remark
Notice it is not necessary to be able to visualize the solid to compute its volume from cross sections. It
is not even necessary to know what the cross-sections look like precisely. For instance, our trapezoids
may or may not have a right angle. As long as we can compute the area, the exact shape is irrelevant.
Example 2.2.7
A Solid Described by Measurements
Compute the volume of a pyramid with a square base of side length s and a height of h.
Solution
Let x = 0 be the base of the pyramid and x = h be the vertex. The cross sections are squares. Since
the edges of the pyramid are straight, the squares shrink linearly from s at x = 0 to 0 at x = h. The
line that goes through these two points is
\[ \text{Side length} = -\frac{s}{h}x + s \]
The cross sections have area
\[ A(x) = (\text{Side length})^2 = \left(-\frac{s}{h}x + s\right)^2 = s^2\left(\frac{1}{h^2}x^2 - \frac{2}{h}x + 1\right) \]
We can plug this into the formula for volume.
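\begin{align*}
\text{Volume} &= \int_0^h s^2\left(\frac{1}{h^2}x^2 - \frac{2}{h}x + 1\right)dx \\
&= s^2\left[\frac{1}{3h^2}x^3 - \frac{1}{h}x^2 + x\right]_0^h \\
&= s^2\left(\frac{1}{3h^2}h^3 - \frac{1}{h}h^2 + h - 0\right) \\
&= s^2\left(\frac{1}{3} - 1 + 1\right)h \\
&= \frac{1}{3}s^2h
\end{align*}
The volume of the pyramid in cubic units is $V = \frac{1}{3}s^2h$.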
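The pyramid formula can be spot-checked by summing thin square slabs for a few sample dimensions. The values of s and h below are arbitrary assumptions chosen for illustration, and `pyramid_volume` is a hypothetical helper name.

```python
def pyramid_volume(s, h, n=100_000):
    """Midpoint sum of square cross sections with side -(s/h) x + s for x in [0, h]."""
    dx = h / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        side = -(s / h) * x + s     # side length shrinks linearly from s to 0
        total += side**2 * dx       # area of the square times slab thickness
    return total

for s, h in [(2.0, 3.0), (5.0, 12.0), (1.5, 10.0)]:
    print(s, h, pyramid_volume(s, h), s**2 * h / 3)
```

For each pair the numerical sum and $\frac{1}{3}s^2h$ should agree to many decimal places.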
Section 2.2
Exercises
Summary Questions
Q1
Describe how a cross section of a solid is produced.
Q2
What is the significance of the function A(x) in the formula for the volume of a solid?
Q3
What shapes do we use to approximate the volume of a solid? Why do we choose that shape?
Q4
When we rotate the region under y = f(x) around the x axis, how do we compute the area of
each cross-section?
2.2.1
Q5
Which of the following shapes have (nonzero) volume?
a square
a ball
a sphere
a cube
a cone
a triangle
Q6
Suppose I have a solid S. I tried to fit a unit cube into S but I couldn’t do it, no matter where
I placed the cube or how I rotated it. I conclude that the volume of S is less than 1 unit cube.
What do you think of my conclusion?
Q7
Will the volume of an object be greater if measured in cubic centimeters or cubic inches? Explain
using the definition of how we measure volume.
Q8
Suppose I create a solid by stacking a cone on top of a cylinder. How is the volume of my
new solid related to the volume of the cone and the volume of the cylinder? Explain using the
definition of how we measure volume.
2.2.2
Q9
Let S be a sphere of radius 5 centered at the origin. What are the cross sections, perpendicular
to the x-axis? How do they change as you travel along the axis from $-5$ to 5?
Q10
Describe or draw the cross sections of the pyramid below when it is cut by planes parallel to the
one pictured.
Q11
Suppose all of the cross sections of a solid S, perpendicular to the height, are identical (same
shape and same size). What kind of solid is S?
Q12
Describe the cross sections of a cube
a
perpendicular to an edge.
b
perpendicular to the line connecting the midpoints of two opposite edges.
c
perpendicular to the diagonal that connects two opposite vertices.
2.2.3
Q13
Suppose I’m trying to approximate the volume of a solid S of height 12 using four prisms of equal
height. Suppose those prisms have volumes 5.1, 6, 7.2 and 9.6.
a
What is the approximate volume of S?
b
What are the areas of the cross sections I used to produce each prism?
Q14
Suppose I’m trying to approximate the volume of the half-ball below by prisms. I subdivide the
height into n subheights and use the cross section at the left hand side of each as the base of each
prism. Will I overestimate or underestimate the volume? Explain how you know in a sentence or
two.
Q15
Produce an approximation of the volume of a pyramid with height 9 and square base of side
length 6 using 3 prisms. There are multiple correct answers to this, corresponding to different
choices of where to take the cross sections.
Q16
Suppose a solid S has height 16. Suppose all of its cross-sections perpendicular to the height
have a different shape, but all of those shapes have area 5.
a
What is the volume of S?
b
Do you really need calculus to solve a? Discuss.
2.2.4
Q17
Compute the volume of the solid between x = 0 and x = 3 whose cross sections at each x are
squares of side length $e^x$.
Q18
Compute the volume of the solid between x = 0 and x = 2 whose cross sections at each x are
trapezoids of bases x + 1 and x + 3 and height $x^2$.
Q19
Compute the volume of the solid whose cross sections, perpendicular to the x-axis, are triangles
whose bases lie between y = 3x and $y = x^2$ from x = 0 to x = 3 and whose heights are equal
to the length of their bases.
Q20
Compute the volume of a solid between x = 1 and $x = e^2$ whose cross sections perpendicular to
the x-axis are rectangles of base $\ln x$ and height $\frac{\ln x}{x}$.
2.2.5
Q21
Compute the volume of the solid created by rotating the region under $y = \sqrt{x}$ from x = 0 to
x = 9 around the x-axis.
Q22
Consider the semidisk of radius 3 below:
a
Write a function y = f(x) that defines the boundary of this semidisk.
b
Suppose this semidisk is rotated around the x-axis. Describe the resulting solid.
c
Compute A(x), the area of the cross section at each value of x.
d
Write and evaluate an integral that computes the volume of the solid of rotation.
Q23
Compute the volume of the solid created by rotating the region under $y = 4 - x^2$ from $x = -2$ to
x = 2 about the x-axis.
Q24
Compute the volume of the solid created by rotating a trapezoid with vertices (2, 0), (5, 0), (5, 8)
and (2, 2) around the x-axis.
2.2.6
Q25
Compute the volume of a solid whose base is the triangle under $y = -\frac{1}{2}x + 3$ in the first quadrant
and whose cross sections, perpendicular to the x-axis, are triangles of height 8.
Q26
Compute the volume of a solid whose base is the region enclosed by $y = \sqrt{x}$ and $y = \frac{x}{2}$ and
whose cross sections, perpendicular to the x-axis, are squares.
Q27
Compute the volume of a solid whose base is a right triangle with legs 4 and 3 and whose cross
sections, perpendicular to the leg of length 4, are semicircles with their diameter in the base.
Q28
Compute the volume of a solid S whose base is the unit disc and whose cross sections perpendicular
to the x-axis are isosceles right triangles, with one leg in the base.
Extension and Synthesis
Q29
Let D be the region enclosed by $y = x^2 - 6x$ and the x-axis.
a
Set up an integral that will compute the geometric area of D. You do not need to evaluate
it.
b
Let S be a solid whose base is D and whose cross sections perpendicular to the x-axis are
semicircles with their diameter in D. Set up an integral that will compute the volume of S.
You do not need to evaluate it.
Q30
Consider the solid obtained by rotating the triangle below around the x-axis.
a
Describe the shape of the cross sections. Which measurements of this shape depend on x?
b
Compute a formula for A(x), the area of the cross section at each value of x.
c
Compute the volume of the solid.
Q31
A solid S of height 12 has the following cross-sectional areas A(x) at height x. How would you
approximate the volume?
x A(x)
1 10
5 12
7 11
10 7
12 2
Section 2.3
Integration by Parts
Goals:
1 Use the integration by parts formula to find anti-derivatives and definite integrals.
2 Choose appropriate decompositions for integrating by parts.
3 Recognize when applying the formula multiple times will be fruitful.
The product rule gives us a reliable method for computing derivatives of products. If you can
differentiate each factor in a product, you can differentiate the entire product. This is not the case for
integration. In this section we add another tool to our limited tool set for integrating a product of two
functions. Even with this method, many problems will be permanently out of reach.
Question 2.3.1
How Do We Compute an Anti-Derivative of a Product of Two Functions?
We reversed the chain rule (which computes derivatives) to compute anti-derivatives of certain
functions. This method is called u-substitution. The du term means that we often end up integrating
a product of functions with this method.
Example
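Compute the integral:
\[ \int_0^3 x e^{x^2}\,dx \]
Solution
We use the u-substitution $u = x^2$, $du = 2x\,dx$. The bounds $x = 0$ and $x = 3$ become $u = 0$ and $u = 9$.
\[ \int_0^3 x e^{x^2}\,dx = \int_0^9 \frac{1}{2} e^u\,du = \frac{1}{2} e^u \Big|_0^9 = \frac{1}{2}(e^9 - 1) \]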
Main Idea
u-substitution is extremely fragile. Our example relies on the fact that the factor x is a constant multiple
of the derivative of the inner function, $x^2$.
Since the chain rule can only produce certain products, we should look for other differentiation rules
that could produce other products. The product rule is the obvious candidate.
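The u-substitution example above can be sanity-checked numerically. The sketch below is only an illustration: the helper `midpoint_sum` and the choice of 200,000 rectangles are assumptions made here, not part of the text. It approximates both the original integral in x and the substituted integral in u and compares them to the value found above.

```python
import math

def midpoint_sum(f, a, b, n):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Original integral in x: integrand x * e^(x^2) on [0, 3].
lhs = midpoint_sum(lambda x: x * math.exp(x**2), 0.0, 3.0, 200_000)

# Integral after substituting u = x^2: integrand (1/2) e^u on [0, 9].
rhs = midpoint_sum(lambda u: 0.5 * math.exp(u), 0.0, 9.0, 200_000)

# Value computed by hand in the example.
exact = 0.5 * (math.exp(9) - 1)

print(lhs, rhs, exact)
```

All three printed numbers should agree to several decimal places, which is what we expect if the substitution and the change of bounds were carried out correctly.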
Reminder
The Product Rule states that if f(x) and g(x) are differentiable, then
\[ [f(x)g(x)]' = f'(x)g(x) + g'(x)f(x). \]
Example
Compute
\[ \int x^2 \cos x + 2x \sin x\,dx \]
Solution
This integrand looks like it might be the output of the product rule. If we write
\[ f'(x)g(x) + g'(x)f(x) = x^2 \cos x + 2x \sin x \]
we can match up the factors as
\[ f(x) = \sin x \qquad f'(x) = \cos x \qquad g(x) = x^2 \qquad g'(x) = 2x \]
Since $\frac{d}{dx}(\sin(x)\,x^2) = x^2 \cos x + 2x \sin x$ we can conclude
\[ \int x^2 \cos x + 2x \sin x\,dx = \sin(x)\,x^2 + c \]
If anything, this is more fragile than u-substitution. It requires a sum of compatible products. How
can we make the formula $[f(x)g(x)]' = f'(x)g(x) + g'(x)f(x)$ more useful?
A formula that applies to a single product instead of a sum of two products would be much more
useful. We can obtain it by subtracting.
\begin{align*}
f'(x)g(x) + g'(x)f(x) &= [f(x)g(x)]' && \text{product rule} \\
\int f'(x)g(x) + g'(x)f(x)\,dx &= f(x)g(x) + c && \text{integrate both sides} \\
\int f'(x)g(x)\,dx + \int g'(x)f(x)\,dx &= f(x)g(x) + c && \text{sum rule of integrals} \\
\int g'(x)f(x)\,dx &= f(x)g(x) - \int f'(x)g(x)\,dx && \text{subtract from both sides}
\end{align*}
Notice we don’t need the +c anymore. Both sides contain an indefinite integral so the possible
constant of difference is built in on both sides. We can make one further move to simplify the equation.
Since $g'(x)\,dx$ is the differential of g(x) and $f'(x)\,dx$ is the differential of f(x), it is convenient to
represent these functions with variables. u and v are the traditional choices here.
This method is called integration by parts. Here is the formal statement.
Theorem
Suppose an integral can be written $\int u\,dv$ where
u is a function (more precisely u(x)),
and dv is a differential (more precisely $v'(x)\,dx$).
We can apply the following formula:
\[ \int u\,dv = uv - \int v\,du \]
The integration by parts formula was not difficult to derive. The more pressing question is whether
it is useful. It replaces the problem of evaluating $\int u\,dv$ with a new problem: evaluating $\int v\,du$. We
need to see some examples to determine whether it is ever any help at all.
Example 2.3.2
Computing an Anti-derivative Using Integration by Parts
Compute $\int xe^x\,dx$.
Solution
To use integration by parts, we need to look at the integrand $xe^x$ and decide which part is u and which
part is dv. Let’s try letting $u = x$ and $dv = e^x\,dx$. The formula says
\[ \int u\,dv = uv - \int v\,du. \]
We can replace $\int xe^x\,dx$ by the right hand side, but we need to know what du and v are. We find du
by taking the differential of u. We find v by taking the antiderivative of dv.
\[ u = x \implies du = dx \qquad\qquad dv = e^x\,dx \implies v = e^x \]
Now we can apply the integration by parts formula.
\[ \int xe^x\,dx = xe^x - \int e^x\,dx \]
Notice the integrand $v\,du$ is not a product. It is a function whose antiderivative we know. Thus
integration by parts allowed us to replace a product we couldn’t integrate with something we could.
Evaluating the integral, we obtain:
\[ \int xe^x\,dx = xe^x - e^x + c \]
We can always verify our antiderivatives by differentiating them. In this case
\[ \frac{d}{dx}(xe^x - e^x + c) = \underbrace{xe^x + e^x(1)}_{\text{product rule}} - e^x = xe^x \]
This verifies that we have found the correct antiderivative of $xe^x$.
Remark
The most general antiderivative of $dv = e^x\,dx$ would be $v = e^x + c$. However, we can get away
with using a specific antiderivative instead. To convince yourself of this, try redoing the problem with
$v = e^x + c$, and see that the c cancels out of your answer.
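For readers who want to check their work on this remark, here is one way the computation might go. It reuses $u = x$ and $du = dx$ from the example; the symbol C for the final constant of integration is introduced here only to keep it distinct from c.
\begin{align*}
\int xe^x\,dx &= x(e^x + c) - \int (e^x + c)\,dx \\
&= xe^x + cx - (e^x + cx) + C \\
&= xe^x - e^x + C
\end{align*}
The two cx terms cancel, so the choice of c in v has no effect on the final answer.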
Question 2.3.3
How Do We Choose u and dv?
What would happen if we again solved $\int xe^x\,dx$ by parts, but set
\[ u = e^x \qquad dv = x\,dx? \]
In this case $du = e^x\,dx$ and $v = \frac{1}{2}x^2$, and integrating by parts we compute
\[ \int xe^x\,dx = \frac{1}{2}e^x x^2 - \int \frac{1}{2}x^2 e^x\,dx \]
This is no less correct than our previous application of the formula. It is, however, much less useful.
To evaluate this we need to know an anti-derivative of $\frac{1}{2}x^2 e^x$, which seems like an even harder problem
than the one we started with. As we can see, the choice of u and dv can determine the success or failure
of integration by parts. So what makes a good choice of u and dv?
In integration by parts, u is going to be differentiated. This usually makes functions simpler if
anything. dv is going to be integrated. This could make $\int v\,du$ difficult to compute. The following
mnemonic helps us decide which factor to choose as u and which as dv.
I.L.A.T.E.
When deciding which factor of a product should be u and which should be dv, put them into the chart
below.
I: Inverse functions
L: Logarithms
A: Algebraic expressions (polynomials)
T: Trig functions
E: Exponential functions
Factors earlier in this list make better u’s. Factors later in this list make better dv’s.
Let’s apply I.L.A.T.E to the following products:
1 $\int x^5 \ln x\,dx$
$x^5$ is algebraic. $\ln x$ is a logarithm. We should let $u = \ln x$ and $dv = x^5\,dx$.
2 $\int x \sin x\,dx$
x is algebraic. $\sin x$ is trigonometric. We should let $u = x$ and $dv = \sin x\,dx$.
3 $\int x^2 \tan^{-1}(x)\,dx$
$x^2$ is algebraic. $\tan^{-1}(x)$ is an inverse function. We should let $u = \tan^{-1}(x)$ and $dv = x^2\,dx$.
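by parts: $u = \tan^{-1}(x)$, $dv = x^2\,dx$, $du = \frac{1}{1+x^2}\,dx$, $v = \frac{1}{3}x^3$
u-substitution: $u = 1 + x^2$, $du = 2x\,dx$
\begin{align*}
\int x^2 \tan^{-1}(x)\,dx &= \frac{1}{3}x^3 \tan^{-1}(x) - \int \frac{1}{3}x^3 \frac{1}{1 + x^2}\,dx && \text{by parts} \\
&= \frac{1}{3}x^3 \tan^{-1}(x) - \int \frac{1}{6} \frac{x^2}{1 + x^2}\, 2x\,dx \\
&= \frac{1}{3}x^3 \tan^{-1}(x) - \int \frac{1}{6} \frac{u - 1}{u}\,du && \text{u-substitution} \\
&= \frac{1}{3}x^3 \tan^{-1}(x) - \frac{1}{6} \int 1 - \frac{1}{u}\,du \\
&= \frac{1}{3}x^3 \tan^{-1}(x) - \frac{1}{6}(u - \ln |u|) + c \\
&= \frac{1}{3}x^3 \tan^{-1}(x) - \frac{1}{6}(1 + x^2 - \ln |1 + x^2|) + c
\end{align*}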
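As with any antiderivative, we can check this one by differentiating. The sketch below does so numerically rather than by hand. The function names `F`, `integrand` and `central_difference`, the step size, and the sample points are all choices made for this illustration.

```python
import math

def F(x):
    """Candidate antiderivative of x^2 * arctan(x) from the computation above (with c = 0)."""
    return x**3 * math.atan(x) / 3 - (1 + x**2 - math.log(1 + x**2)) / 6

def integrand(x):
    """The original integrand x^2 * arctan(x)."""
    return x**2 * math.atan(x)

def central_difference(g, x, h=1e-5):
    """Numerical estimate of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

# Largest disagreement between F'(x) and the integrand over a few sample points.
sample_points = [-2.0, -0.5, 0.0, 0.7, 1.5, 3.0]
max_gap = max(abs(central_difference(F, x) - integrand(x)) for x in sample_points)
print(max_gap)
```

A gap near zero at every sample point is consistent with F being an antiderivative of the integrand. It is evidence, not a proof. Differentiating by hand settles the matter.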
Example 2.3.4
Using Integration by Parts More than Once
Compute $\int_0^\pi x^2 \cos x\,dx$
Solution
I.L.A.T.E. suggests $u = x^2$ and $dv = \cos x\,dx$. When we apply integration by parts to a definite integral,
the $\int v\,du$ maintains the same bounds of integration. The uv is evaluated at those bounds, because it
is part of the antiderivative.
by parts: $u = x^2$, $dv = \cos x\,dx$, $du = 2x\,dx$, $v = \sin x$
\[ \int_0^\pi x^2 \cos x\,dx = x^2 \sin x \Big|_0^\pi - \int_0^\pi 2x \sin x\,dx \]
Unfortunately, we don’t know the anti-derivative of $2x \sin x$. It is still a product. We can try applying
integration by parts again to replace $\int_0^\pi 2x \sin x\,dx$ with something we can evaluate.
by parts (again): $u = 2x$, $dv = \sin x\,dx$, $du = 2\,dx$, $v = -\cos x$
\begin{align*}
\int_0^\pi x^2 \cos x\,dx &= x^2 \sin x \Big|_0^\pi - \int_0^\pi 2x \sin x\,dx \\
&= x^2 \sin x \Big|_0^\pi - \left( -2x \cos x \Big|_0^\pi - \int_0^\pi -2 \cos x\,dx \right) \\
&= x^2 \sin x \Big|_0^\pi + 2x \cos x \Big|_0^\pi - 2 \sin x \Big|_0^\pi \\
&= (\pi^2)(0) - (0)(0) + (2\pi)(-1) - (0)(1) - (0) + (0) \\
&= -2\pi
\end{align*}
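A definite integral gives us a number, so it is easy to test the result of a long computation like this one. The following sketch is an illustration only: the helper `midpoint_sum` and the choice of 10,000 rectangles are assumptions made here. It approximates the integral directly and compares it with $-2\pi$.

```python
import math

def midpoint_sum(f, a, b, n):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Numerical estimate of the integral of x^2 cos(x) from 0 to pi.
approx = midpoint_sum(lambda x: x**2 * math.cos(x), 0.0, math.pi, 10_000)

# Value obtained by integrating by parts twice.
by_parts = -2 * math.pi

print(approx, by_parts)
```

The negative value makes sense: $\cos x$ is negative on $(\pi/2, \pi)$, which is where $x^2$ is largest.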
Change of Variables?
Notice that despite defining functions u and v, we continue to work in terms of the variable x. Contrast
this with u-substitution where the variable x can be completely eliminated in a definite integral. That
approach isn’t possible here. We’d have to write v as a function of u. This would be complicated or
impossible.
Example 2.3.5
Using Integration by Parts to Produce an Equation
Compute $\int e^{2x} \cos x\,dx$
Solution
I.L.A.T.E. suggests $u = \cos x$ and $dv = e^{2x}\,dx$. To integrate dv we use a u-substitution. We apply the
integration by parts formula, factoring the $-\frac{1}{2}$ from the integrand:
by parts: $u = \cos x$, $dv = e^{2x}\,dx$, $du = -\sin x\,dx$, $v = \frac{1}{2}e^{2x}$
\begin{align*}
\int e^{2x} \cos x\,dx &= \frac{1}{2}e^{2x} \cos x - \int -\frac{1}{2}e^{2x} \sin x\,dx \\
&= \frac{1}{2}e^{2x} \cos x + \frac{1}{2} \int e^{2x} \sin x\,dx
\end{align*}
Did this help? We don’t know the antiderivative of $e^{2x} \sin x$. Even worse, it doesn’t seem to have
improved in any way. It is just as complicated as what we started with. Our intuition might be to give
up and try another approach. Perhaps I.L.A.T.E. has done us wrong and we should choose a different
u and dv. In this case, however, we should reject that intuition and continue. We’ll apply integration
by parts again.
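by parts again: $u = \sin x$, $dv = e^{2x}\,dx$, $du = \cos x\,dx$, $v = \frac{1}{2}e^{2x}$
\begin{align*}
\int e^{2x} \cos x\,dx &= \frac{1}{2}e^{2x} \cos x + \frac{1}{2} \int e^{2x} \sin x\,dx \\
&= \frac{1}{2}e^{2x} \cos x + \frac{1}{2} \left( \frac{1}{2}e^{2x} \sin x - \frac{1}{2} \int e^{2x} \cos x\,dx \right) \\
&= \frac{1}{2}e^{2x} \cos x + \frac{1}{4}e^{2x} \sin x - \frac{1}{4} \int e^{2x} \cos x\,dx
\end{align*}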
Does this help? Again the integrand does not seem to have improved, until we notice that the
integrand is exactly what we began with. We could add $\frac{1}{4} \int e^{2x} \cos x\,dx$ to both sides of the equation,
and we could solve for $\int e^{2x} \cos x\,dx$ algebraically.
\begin{align*}
\int e^{2x} \cos x\,dx &= \frac{1}{2}e^{2x} \cos x + \frac{1}{4}e^{2x} \sin x - \frac{1}{4} \int e^{2x} \cos x\,dx \\
\frac{5}{4} \int e^{2x} \cos x\,dx &= \frac{1}{2}e^{2x} \cos x + \frac{1}{4}e^{2x} \sin x + c \\
\int e^{2x} \cos x\,dx &= \frac{4}{5} \left( \frac{1}{2}e^{2x} \cos x + \frac{1}{4}e^{2x} \sin x \right) + c \\
\int e^{2x} \cos x\,dx &= \frac{2}{5}e^{2x} \cos x + \frac{1}{5}e^{2x} \sin x + c
\end{align*}
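Because this answer came from solving an equation rather than from evaluating an integral directly, it is worth confirming that its derivative really is the integrand. The sketch below does this numerically. The names `F`, `integrand` and `central_difference` and the sample points are choices made for this illustration.

```python
import math

def F(x):
    """Antiderivative found by solving the equation above (with c = 0)."""
    return 0.4 * math.exp(2 * x) * math.cos(x) + 0.2 * math.exp(2 * x) * math.sin(x)

def integrand(x):
    """The original integrand e^(2x) cos(x)."""
    return math.exp(2 * x) * math.cos(x)

def central_difference(g, x, h=1e-5):
    """Numerical estimate of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

# Largest disagreement between F'(x) and the integrand over a few sample points.
sample_points = [-1.0, -0.3, 0.0, 0.5, 1.0]
max_gap = max(abs(central_difference(F, x) - integrand(x)) for x in sample_points)
print(max_gap)
```

Differentiating by hand with the product rule gives the same conclusion: the two $\sin x$ terms cancel and the $\cos x$ terms add to $e^{2x}\cos x$.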
Main Idea
We’ve seen a variety of techniques to apply when integration by parts does not give us an immediate
answer. The success of integration by parts depends on the $\int v\,du$ term. You might use the following
flow chart to decide how to proceed once you have applied integration by parts.
Is $\int v\,du$ still a product?
  no: Integrate it. You are done.
  yes: Can you apply a u-sub?
    yes: Use u-sub. You are done.
    no: How does $\int v\,du$ compare to the original integrand?
      simpler or similar: Apply integration by parts again.
      more complicated: Use another approach.
      constant multiple: Write an equation and solve.
Section 2.3
Exercises
Summary Questions
Q1
What type of integrands are good candidates for integration by parts?
Q2
How is u handled differently in integration by parts than in u-substitution?
Q3
How is the acronym I.L.A.T.E. used?
Q4
Under what conditions would we want to apply integration by parts more than once?
2.3.1
Q5
Compute $\int \frac{\sin x}{1 + x^2} + \cos x \tan^{-1} x\,dx$
Q6
Which of the following can be integrated using u-substitution?
$\int e^x\,dx$   $\int xe^x\,dx$   $\int x^2 e^x\,dx$   $\int x^3 e^x\,dx$
$\int e^{x^2}\,dx$   $\int xe^{x^2}\,dx$   $\int x^2 e^{x^2}\,dx$   $\int x^3 e^{x^2}\,dx$
$\int e^{x^3}\,dx$   $\int xe^{x^3}\,dx$   $\int x^2 e^{x^3}\,dx$   $\int x^3 e^{x^3}\,dx$
$\int e^{x^4}\,dx$   $\int xe^{x^4}\,dx$   $\int x^2 e^{x^4}\,dx$   $\int x^3 e^{x^4}\,dx$
2.3.3
Q7
Evaluate $\int \frac{\ln x}{x^3}\,dx$.
Q8
Evaluate $\int x \sin x\,dx$.
Q9
Use integration by parts to compute $\int \tan^{-1} x\,dx$. Note that $\frac{d}{dx} \tan^{-1} x = \frac{1}{1+x^2}$.
Q10
We can write $\int \ln x\,dx$ as a product: $\int (1)(\ln x)\,dx$.
a
How does I.L.A.T.E. suggest we proceed?
b
Use integration by parts to compute the antiderivative.
Q11
Compute $\int \sin^{-1} x\,dx$.
Q12
Compute $\int_0^{\pi/4} \tan^{-1} x\,dx$.
2.3.4
Q13
Compute $\int x^2 \cos(x + 2)\,dx$.
Q14
Compute $\int_0^1 x^3 e^x\,dx$.
Q15
Compute $\int x^7 \sin(x^2)\,dx$. Hint: The easiest way to split this is not the correct way. You’ll
need some factors of x to find an antiderivative of your trig function.
Q16
Compute $\int_0^\pi x \sin x\,dx$.
2.3.5
Q17
Compute $\int e^{3x} \sin x\,dx$.
Q18
Compute $\int e^x \cos 2x\,dx$.
Extension and Synthesis
Q19
Compute $\int x^3 e^{x^2}\,dx$. Choose your dv carefully. You want something that you can integrate.
Q20
Compute $\int \sin(\ln x)\,dx$. Perform a u-substitution before trying by parts.
Q21
Compute the area enclosed by $y = xe^x$ and $y = ex$.
Q22
Let S be a solid between x = 0 and x = 3 whose cross-sections perpendicular to the x-axis are
triangles of base x and height $e^x$. Compute the volume of S.
Q23
Let S be the solid obtained by rotating the region below $y = \ln x$ from x = 1 to x = 5 about
the x-axis. Compute the volume of S.
Q24
Suppose that S is a solid between x = 1 and x = 5 whose cross sections (perpendicular to the
x-axis) are triangles of height $x^2$ and base $\ln x$ at each x. Compute the volume of S.
Section 2.4
Approximate Integration
Goals:
1 Use several methods to approximate definite integrals.
2 Assess the accuracy of an approximation.
3 Approximate integrals given incomplete information.
One of the first applications of integration is to measure total change. If v(t) is our velocity, $\int_a^b v(t)\,dt$
computes the total displacement between the times a and b. In practice, to evaluate such an integral,
we need to know the antiderivative of v. Can we realistically expect to do this? Except in theoretical
situations (say a physics experiment), we cannot. A person driving a car will not produce a velocity
function that can be expressed in terms of algebra or trigonometry. While every continuous function has
an antiderivative, it doesn’t help us if we don’t know what it is or how to evaluate it.
Our best option in these situations is to approximate the integral. For instance, if we measure
velocity once per second, we could multiply each velocity by one second to approximate the distance
traveled in that second. Adding these up would approximate the total displacement. What we’ve done
is approximated the integral by rectangles of width 1. The natural question to ask is: how accurate is
such an approximation? How can we make it more accurate? These are the questions we’ll need to
address whenever we want to apply calculus to data sets instead of abstract functions.
Question 2.4.1
What $x_i$ Can We Use when Approximating an Integral?
Recall the following
Definition
The definite integral is given by the formula
\[ \int_a^b f(x)\,dx = \lim_{\Delta x \to 0} \sum_{i=1}^n f(x_i)\,\Delta x \]
where $\Delta x$ is the length of each subinterval of [a, b], and $x_i$ is a number in the $i$th subinterval.
Without the limit (which is difficult or impossible to compute anyway) the sums on the right are
approximations of the integral. Once we choose an $x_i$ for each i, we can evaluate this approximation.
The simplest idea is to just use the left endpoint of each subinterval as $x_i$.
Notation
The notation $L_n$ refers to the approximation of $\int_a^b f(x)\,dx$ by n rectangles,
\[ \sum_{i=1}^n f(x_i)\,\Delta x, \]
where the $x_i$ are the left endpoints of each subinterval.
Similarly $R_n$ refers to the approximation using the right endpoints for $x_i$.
Figure: An $L_4$ approximation
Example 2.4.2
Computing an $L_n$ Approximation
a
Compute an $L_3$ approximation of $\int_{-1}^5 x^2\,dx$.
b
Does $L_3$ over or underestimate the actual value of $\int_{-1}^5 x^2\,dx$?
Solution
a
Let $f(x) = x^2$. The interval [−1, 5] has length 5 − (−1) = 6. Three rectangles means that
$\Delta x = \frac{6}{3} = 2$. We can divide up the interval to find all three subintervals. A diagram is a good
way to avoid mistakes.
Figure: The x-axis marked at −1, 1, 3 and 5
The left endpoints are −1, 1 and 3. Our approximation is
\begin{align*}
L_3 &= \sum_{i=1}^3 f(x_i)\,\Delta x \\
&= f(x_1)\Delta x + f(x_2)\Delta x + f(x_3)\Delta x \\
&= \Delta x\,(f(x_1) + f(x_2) + f(x_3)) \\
&= 2((-1)^2 + 1^2 + 3^2) \\
&= 22
\end{align*}
b
When the function increases, it has more signed area beneath it than the left-endpoint rectangles.
When it decreases it has less. $f(x) = x^2$ increases and decreases, but on the interval [−1, 5], it
spends much more time increasing than decreasing. Thus we expect that $L_3$ underestimates the
true integral. We can verify our intuition with a computation.
\[ \int_{-1}^5 x^2\,dx = \frac{x^3}{3} \Big|_{-1}^5 = \frac{126}{3} > 22 \]
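The left and right endpoint sums are simple enough to compute with a short program. The sketch below is one possible way to do it. The function names `left_sum` and `right_sum` are choices made for this illustration. It reproduces the $L_3$ computed above and also shows $R_3$ for comparison.

```python
def left_sum(f, a, b, n):
    """L_n: n rectangles on [a, b], heights taken at left endpoints."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx

def right_sum(f, a, b, n):
    """R_n: n rectangles on [a, b], heights taken at right endpoints."""
    dx = (b - a) / n
    return sum(f(a + (i + 1) * dx) for i in range(n)) * dx

def f(x):
    return x**2

L3 = left_sum(f, -1, 5, 3)     # left endpoints -1, 1, 3
R3 = right_sum(f, -1, 5, 3)    # right endpoints 1, 3, 5
exact = (5**3 - (-1)**3) / 3   # value from the antiderivative x^3 / 3

print(L3, exact, R3)           # 22.0 42.0 70.0
```

Here $L_3$ lands below the true value and $R_3$ lands above it, which matches our expectation for a function that is mostly increasing on the interval.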
Question 2.4.3
How Accurate is an $L_n$ or $R_n$ Approximation?
An approximation is much more useful if we have some idea of how accurate (or inaccurate) it might
be. The way we quantify this inaccuracy is error.
Definitions
The error in an approximation is given by
\[ \text{error} = \text{approximated value} - \text{actual value} \]
In a real world approximation, we do not know the exact error (why?). We will settle for putting a
bound on error. This is a number N such that we are sure that
\[ |\text{error}| \le N. \]
Determining error bounds can be difficult. Here are some questions to ask.
1 In what circumstances is the approximation exact?
2 What property or measurement seems to correspond to the amount of error?
3 Is there a “worst case scenario” associated to that property or measurement?
The following exercise explores these questions.
Exercise
a
Draw a function for which $L_n$ is always an overestimate.
b
Draw a function for which $L_n$ is always an underestimate.
c
What has to be true of a function for $L_n$ to always be exact?
d
What familiar calculus measurement appears to measure whether you are in the situations you
described in a-c?
Solution
a
A decreasing function will be overestimated by $L_n$.
b
An increasing function will be underestimated by $L_n$.
c
If $L_n$ is always exact, then f(x) is a constant function.
d
Functions can be classified as increasing, decreasing or constant by their first derivative. $f'(x)$
seems to determine the sign (and maybe size) of the error.
Figure: The error of an $L_n$ approximation
Let’s use the results of the exercise to formulate an error bound for $L_n$.
Larger values of the derivative seem to produce more negative errors. If we allow for steeper and steeper slopes,
there is no limit to how large the error could be. So let’s put a bound on how big the derivative is.
Suppose we know that $f'(x) \le S$ on [a, b]. Over each interval $[x_i, x_{i+1}]$ we know that f(x) lies below
the line of slope S through $(x_i, f(x_i))$:
\[ f(x) \le S(x - x_i) + f(x_i) \]
The region below the graph y = f(x) and above the $i$th rectangle is smaller than the region below the
line and above the rectangle, but we can compute the area of the larger region. It is a triangle. Its base
is $\Delta x = \frac{b-a}{n}$. Its height can be determined by the slope of the line.
Figure: The error and the error bound over one rectangle of an $L_n$ approximation
\begin{align*}
\frac{\text{height}}{\text{base}} &= \frac{\text{rise}}{\text{run}} = S & \text{area} &= \frac{1}{2}(\text{base})(\text{height}) \\
\frac{\text{height}}{\Delta x} &= S & &= \frac{1}{2} S (\Delta x)^2 \\
\text{height} &= S \Delta x & &= \frac{1}{2} S \left( \frac{b - a}{n} \right)^2
\end{align*}
So the error over each subinterval can be no larger than $\frac{1}{2} S \left( \frac{b-a}{n} \right)^2$. There are n subintervals, so the
total $L_n$ approximation underestimates $\int_a^b f(x)\,dx$ by no more than $\frac{S(b-a)^2}{2n}$.
We can make a similar argument that if $f'(x) \ge -S$ then $L_n$ overestimates $\int_a^b f(x)\,dx$ by no more
than $\frac{S(b-a)^2}{2n}$. We can combine these two statements into one by using absolute values. $-S \le f'(x) \le S$
is rewritten $|f'(x)| \le S$.
We could make the same argument for the $R_n$ approximation. We’d only need to swap the
overestimate with the underestimate. The error bounds it produces are the same. Our result can be
stated as a theorem:
Theorem
If $E_L$ and $E_R$ are the errors in the $L_n$ and $R_n$ approximations of $\int_a^b f(x)\,dx$ and $|f'(x)| \le S$ on [a, b]
then
\[ |E_L| \le \frac{S(b - a)^2}{2n} \quad \text{and} \quad |E_R| \le \frac{S(b - a)^2}{2n} \]
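The derivation treated a line of slope S as the worst case. We can test that idea on a small example. In the sketch below, the helper `left_sum` and the particular values S = 3, [a, b] = [0, 4] and n = 8 are choices made for this illustration. It computes the actual error of $L_n$ for the line f(x) = Sx and compares it to the bound from the theorem.

```python
import math

def left_sum(f, a, b, n):
    """L_n: n rectangles on [a, b], heights taken at left endpoints."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx

S, a, b, n = 3.0, 0.0, 4.0, 8

def line(x):
    """A line whose slope is exactly S everywhere."""
    return S * x

approx = left_sum(line, a, b, n)
actual = S * (b**2 - a**2) / 2        # exact integral of S*x over [a, b]
error = approx - actual               # negative: L_n underestimates
bound = S * (b - a) ** 2 / (2 * n)    # bound from the theorem

print(error, bound)
```

For this line the size of the error equals the bound, so the bound cannot be improved without knowing more about f than a bound on its first derivative.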
Remark
The argument that the line of slope S is the “worst case” scenario is a useful heuristic, but you may be
unsatisfied with its lack of rigor. A formal argument relies on the following ideas:
Larger functions have larger integrals. If $f(x) \le g(x)$, then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$ as long as
$a \le b$.
The Fundamental Theorem of Calculus tells us we can write $f(x) = f(x_i) + \int_{x_i}^x f'(t)\,dt$.
The line of slope S would be $L(x) = f(x_i) + \int_{x_i}^x S\,dt$. Over the interval $[x_i, x_{i+1}]$, comparing these
integrals shows that $f(x) \le L(x)$. Thus $\int_{x_i}^{x_{i+1}} f(x)\,dx \le \int_{x_i}^{x_{i+1}} L(x)\,dx$. This tells us that there is
more error, and thus a larger underestimate in the left hand approximation of L(x) than there is in the
left hand approximation of f(x).
Example 2.4.4
Computing an $E_L$ Bound
Suppose we want to understand the error of an $L_n$ approximation of $\int_1^{16} \sqrt{x}\,dx$.
a
What bounds can we put on $|f'(x)|$ for our error calculation?
b
What bound can we put on the error of the $L_5$ approximation?
c
What n would we need in order to guarantee that the $L_n$ approximation has error at most $\frac{1}{100}$?
d
What problem would result if we tried to bound the error of an $L_n$ approximation of $\int_0^{16} \sqrt{x}\,dx$?
How might you resolve this?
Solution
a
$f'(x) = \frac{1}{2\sqrt{x}}$. This is always positive, and it decreases as x increases. The largest value of $f'(x)$
on [1, 16] occurs when x = 1. If we let $S = f'(1) = \frac{1}{2}$, we are guaranteed that for all x in [1, 16],
$|f'(x)| \le \frac{1}{2}$.
b
By our theorem
\[ |E_L| \le \frac{S(b - a)^2}{2n} = \frac{\frac{1}{2}(16 - 1)^2}{2(5)} = \frac{45}{4} \]
So the error lies between $-\frac{45}{4}$ and $\frac{45}{4}$.
c
We can set our error bound (with n as a variable) to be at most $\frac{1}{100}$ and solve for n.
\begin{align*}
|E_L| \le \frac{\frac{1}{2}(16 - 1)^2}{2n} &\le \frac{1}{100} \\
\frac{225}{4n} &\le \frac{1}{100} \\
(225)(100) &\le 4n \\
(225)(25) &\le n \\
5625 &\le n
\end{align*}
We conclude that the error will be at most $\frac{1}{100}$ as long as n is at least 5625. Note that since this
is an error bound, the actual error may shrink below $\frac{1}{100}$ with fewer rectangles. We would need a
different method to verify that, though.
d
If we want to apply our theorem to $\int_0^{16} \sqrt{x}\,dx$, we need an S such that $|f'(x)| \le S$. This derivative
is $f'(x) = \frac{1}{2\sqrt{x}}$, which increases without bound as $x \to 0^+$. Thus there is no S, and we cannot
apply the error bound theorem.
To get around this problem we could break the interval into two parts and bound them by different
methods. We can bound the error on rectangles 2 through n over the interval $[\Delta x, 16]$ using the
theorem as above. In this case $S = \frac{1}{2\sqrt{\Delta x}}$ will work. To bound the error over the first rectangle
$[0, \Delta x]$, note that f(x) is increasing. The first rectangle of $L_n$ will underestimate the integral,
while the first rectangle of $R_n$ will overestimate it. Thus the actual error can be no bigger than
the difference between them, which is $\sqrt{\Delta x}\,\Delta x - \sqrt{0}\,\Delta x$. The total error can be no larger than the
sum of the error bound over $[0, \Delta x]$ and the error bound over $[\Delta x, 16]$.
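The arithmetic in parts b and c can be reproduced with a few lines of code. This is a sketch only: the function names `left_right_error_bound`, `min_subintervals` and `left_sum` are invented for this illustration, and exact fractions are used so that rounding cannot affect the count of rectangles. It also computes the actual error of $L_5$ to compare against the bound.

```python
import math
from fractions import Fraction

def left_right_error_bound(S, a, b, n):
    """The bound S(b - a)^2 / (2n) from the theorem for L_n and R_n."""
    return S * (b - a) ** 2 / (2 * n)

def min_subintervals(S, a, b, tol):
    """Smallest n for which S(b - a)^2 / (2n) is at most tol."""
    return math.ceil(S * (b - a) ** 2 / (2 * tol))

def left_sum(f, a, b, n):
    """L_n: n rectangles on [a, b], heights taken at left endpoints."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx

S = Fraction(1, 2)                                           # bound on |f'(x)| from part a
bound_5 = left_right_error_bound(S, 1, 16, 5)                # part b
n_needed = min_subintervals(S, 1, 16, Fraction(1, 100))      # part c

# Actual error of L_5, using the exact value (2/3)(16^(3/2) - 1) = 42.
actual_error = left_sum(math.sqrt, 1, 16, 5) - 42

print(bound_5, n_needed, actual_error)
```

The actual error of $L_5$ is well inside the bound. This illustrates the note in part c: the theorem guarantees a worst case, and a particular function may do much better.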
Question 2.4.5
How Can We Make our Approximation Less Sensitive to Slope?
$L_n$ and $R_n$ have large errors when the function is increasing or decreasing rapidly. We’ll examine two
approximations that are more resilient. The first is the midpoint approximation.
Notation
The $M_n$ approximation of $\int_a^b f(x)\,dx$ is calculated by summing:
\[ \sum_{i=1}^n f(x_i)\,\Delta x \]
where the $x_i$ are the midpoints of each subinterval.
Figure: An $M_4$ approximation
Our final approximation abandons rectangles entirely. Using trapezoids instead allows for shapes that
reflect the value of the function at both the right and left endpoint. In this construction, the trapezoids
are sideways from the way you may be used to looking at them when you learned their area formula
$A = \frac{1}{2}(b_1 + b_2)h$. The parallel bases are vertical. The height is along the x-axis.
Notation
The $T_n$ approximation of $\int_a^b f(x)\,dx$ is calculated by summing:
\[ \sum_{i=1}^n \frac{1}{2}(f(x_i) + f(x_{i+1}))\,\Delta x \]
where $x_i$ and $x_{i+1}$ are the two endpoints of the $i$th subinterval.
$T_n$ can also be calculated as $\frac{1}{2}(L_n + R_n)$.
Figure: A $T_4$ approximation
Example 2.4.6
A Midpoint Approximation
Calculate the $M_3$ approximation of $\int_{-1}^5 x^2\,dx$.
Solution
$\Delta x = \frac{5 - (-1)}{3} = 2$. We can sketch the intervals:
Figure: The x-axis marked at −1, 1, 3 and 5
The midpoints are $x_1 = 0$, $x_2 = 2$ and $x_3 = 4$.
\begin{align*}
M_3 &= \sum_{i=1}^3 f(x_i)\,\Delta x \\
&= \Delta x\,(f(x_1) + f(x_2) + f(x_3)) \\
&= 2(0^2 + 2^2 + 4^2) \\
&= 40
\end{align*}
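The midpoint and trapezoid sums can be computed the same way as the endpoint sums. The sketch below uses function names chosen for this illustration. It reproduces $M_3 = 40$, computes $T_3$ for the same integral, and checks the relationship $T_n = \frac{1}{2}(L_n + R_n)$ stated in the notation.

```python
def left_sum(f, a, b, n):
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx

def right_sum(f, a, b, n):
    dx = (b - a) / n
    return sum(f(a + (i + 1) * dx) for i in range(n)) * dx

def midpoint_sum(f, a, b, n):
    """M_n: n rectangles, heights taken at the midpoint of each subinterval."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def trapezoid_sum(f, a, b, n):
    """T_n: n trapezoids, using f at both endpoints of each subinterval."""
    dx = (b - a) / n
    return sum(0.5 * (f(a + i * dx) + f(a + (i + 1) * dx)) for i in range(n)) * dx

def f(x):
    return x**2

M3 = midpoint_sum(f, -1, 5, 3)
T3 = trapezoid_sum(f, -1, 5, 3)
average_of_L_and_R = 0.5 * (left_sum(f, -1, 5, 3) + right_sum(f, -1, 5, 3))

print(M3, T3, average_of_L_and_R)   # 40.0 46.0 46.0
```

Since the actual value of the integral is 42, the midpoint estimate is low by 2 and the trapezoid estimate is high by 4 in this example.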
Example 2.4.7
A Trapezoid Approximation Using a Table of Values
Approximation has no practical use for algebraic functions. We would rather get the exact answer
by taking an antiderivative and applying the Fundamental Theorem of Calculus. In many real-world
applications, our data about a function consists of a finite number of measurements. In this case, we
don’t even have an expression for the function, let alone its antiderivative. Here is an example where
approximation is the best we can do.
Suppose we have the following table of values for a function f(x)
x      0   2   4   6   8   10   12   14   16
f(x)   2   5   3   4   7   8    5    4    1
Calculate the $T_3$ approximation of $\int_2^{14} f(x)\,dx$.
Solution
$\Delta x = \frac{14 - 2}{3} = 4$. We can sketch the intervals:
Figure: The x-axis marked at 2, 6, 10 and 14
\begin{align*}
T_3 &= \sum_{i=1}^3 \frac{1}{2}(f(x_i) + f(x_{i+1}))\,\Delta x \\
&= \frac{1}{2}\Delta x\,(f(x_1) + f(x_2) + f(x_2) + f(x_3) + f(x_3) + f(x_4)) \\
&= \frac{1}{2}\Delta x\,(f(2) + f(6) + f(6) + f(10) + f(10) + f(14)) \\
&= \frac{1}{2}(4)(5 + 4 + 4 + 8 + 8 + 4) \\
&= 66
\end{align*}
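When the function is known only through a table, a program can look up the values it needs instead of evaluating a formula. The sketch below is one way this might be organized. The function name `trapezoid_from_table` and the decision to raise an error when a needed x-value is missing are choices made for this illustration.

```python
# The table of values from the example.
xs = [0, 2, 4, 6, 8, 10, 12, 14, 16]
fs = [2, 5, 3, 4, 7, 8, 5, 4, 1]
table = dict(zip(xs, fs))

def trapezoid_from_table(table, a, b, n):
    """T_n on [a, b] using only tabulated values of the function."""
    dx, remainder = divmod(b - a, n)
    if remainder != 0:
        raise ValueError("subinterval endpoints would not be integers")
    total = 0.0
    for i in range(n):
        left, right = a + i * dx, a + (i + 1) * dx
        # A KeyError here means the table lacks a value we need.
        total += 0.5 * (table[left] + table[right]) * dx
    return total

T3 = trapezoid_from_table(table, 2, 14, 3)
print(T3)   # 66.0
```

Notice that the table limits which approximations are available. A midpoint approximation with the same three subintervals would need f(4), f(8) and f(12), which this table happens to contain, but other choices of n may require values we were never given.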
Question 2.4.8
How Do the Error Bounds of the Approximations Compare?
$T_n$ and $M_n$ have zero error when f(x) is a straight line, regardless of slope. Larger errors result
from high rates of curvature. You can see this by using a small number of rectangles/trapezoids and
increasing the curvature of the function. Proving an error bound involves using a quadratic as a “worst
case scenario.” Any function with second derivative smaller than the quadratic will have a smaller error.
Here is the result.
Theorem
Suppose $|f''(x)| \le K$ for $a \le x \le b$. If $E_T$ and $E_M$ are the errors in the trapezoid and midpoint
approximations of $\int_a^b f(x)\,dx$ then
\[ |E_T| \le \frac{K(b - a)^3}{12n^2} \quad \text{and} \quad |E_M| \le \frac{K(b - a)^3}{24n^2} \]
Remarks
1 The maximum error is smaller when the function has less curvature.
2 The error is also reduced by increasing n, the number of subintervals.
3 These formulas indicate that we can usually expect $M_n$ to have half as much error as $T_n$.
4 As n increases, the error bounds for $M_n$ and $T_n$ approach 0 much more quickly than those for $L_n$ and $R_n$.
Example 2.4.9
Choosing n to Meet an Error Target
Suppose we wish to approximate $\int_1^{16} \sqrt{x}\,dx$ by a midpoint approximation. How many rectangles
must we use to guarantee that the error is smaller than $\frac{1}{1000}$?
Solution
The midpoint error formula requires us to have a bound K on $|f''(x)|$ on [1, 16].
\[ f'(x) = \frac{1}{2\sqrt{x}} \qquad f''(x) = -\frac{1}{4x^{3/2}} \]
As x gets larger, the denominator of $f''(x)$ gets larger, meaning $|f''(x)|$ gets smaller (we could also
verify this by checking the sign of $f'''(x)$). Thus it will be largest at x = 1. We can safely use the value
there as our K
\[ |f''(x)| \le |f''(1)| = \frac{1}{4} = K \]
We can now apply the error bound formula, leaving n as a variable. We will set the error bound to be
less than $\frac{1}{1000}$ and solve for n.
\begin{align*}
|E_M| \le \frac{K(b - a)^3}{24n^2} &\le \frac{1}{1000} \\
\frac{\frac{1}{4}(16 - 1)^3}{24n^2} &\le \frac{1}{1000} && \text{all factors are positive} \\
\frac{(1000)(15)^3}{(4)(24)} &\le n^2 && \text{isolate } n^2 \\
\frac{140{,}625}{4} &\le n^2 \\
\frac{375}{2} &\le n && \text{square root of both sides}
\end{align*}
Thus any n bigger than 375/2 will work. We need to use at least 188 rectangles to guarantee that the
error is less than $\frac{1}{1000}$. Note that we might achieve a sufficiently small error with fewer rectangles, but
our error bound theorem cannot guarantee it.
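The search for the smallest workable n can be automated. The following sketch is an illustration with invented names (`midpoint_error_bound`, `min_n_midpoint`). It uses exact fractions to avoid rounding trouble near the cutoff, and it confirms that 188 rectangles meet the target while 187 do not.

```python
import math
from fractions import Fraction

def midpoint_error_bound(K, a, b, n):
    """The bound K(b - a)^3 / (24 n^2) from the theorem for M_n."""
    return K * (b - a) ** 3 / (24 * n**2)

def min_n_midpoint(K, a, b, tol):
    """Smallest integer n for which the midpoint error bound is at most tol."""
    n_squared_needed = K * (b - a) ** 3 / (24 * tol)
    n = math.isqrt(math.ceil(n_squared_needed))
    while n * n < n_squared_needed:
        n += 1
    return n

K = Fraction(1, 4)           # bound on |f''(x)| for sqrt(x) on [1, 16]
tol = Fraction(1, 1000)

n_needed = min_n_midpoint(K, 1, 16, tol)
print(n_needed)                                   # 188
print(float(midpoint_error_bound(K, 1, 16, 188)))
print(float(midpoint_error_bound(K, 1, 16, 187)))
```

Compare this with the 5625 rectangles that $L_n$ needed for a target ten times less demanding on the same integral.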
Section 2.4
Exercises
Summary Questions
Q1
How is the error in an approximation defined?
Q2
What does the first derivative of f(x) tell you about the error in the right-hand approximation
of $\int_a^b f(x)\,dx$?
Q3
As the number of subintervals gets large, which approximation(s) converge most quickly to the
actual value?
Q4
Under what situation is a midpoint approximation preferable to a trapezoid approximation? When
would trapezoid be preferable?
2.4.1
Q5
Seong-ju and Anthony are both approximating $\int_{-4}^{4} x^2\,dx$ with 4 rectangles. They know that
they can use any combination of test points in their rectangles. What is the maximum difference
between their approximations?
Q6 a
What $\Delta x$ and $x_i$’s would you use for the $L_4$ approximation of $\int_3^{23} f(x)\,dx$?
b
Can you write a general expression for $\Delta x$ and the $x_i$’s for $\int_a^b f(x)\,dx$?
2.4.2
Q7
Compute the $L_5$ approximation of $\int_1^{16} x^{3/2}\,dx$.
Q8
Compute the $R_3$ approximation of $\int_2^8 x \sin\left(\frac{\pi x}{12}\right) dx$.
Q9
Compute the $L_4$ approximation of $\int_0^2 x^3 e^x\,dx$.
Q10
Compute the L
5
approximation of
Z
18
3
3
x
x
dx.
2.4.3
Q11
Compute the theoretical error bound on the L
14
approximation of
Z
8
1
3
x dx.
Q12
Compute the theoretical error bound on the $R_5$ approximation of $\int_0^{15} \frac{1}{x^2 + 1}\,dx$.
Q13
How large would n need to be to guarantee that the $L_n$ approximation of $\int_2^8 \log_2 x\,dx$ is within
$\frac{1}{10000}$ of the actual value?
Q14
How large would n need to be to guarantee that the $R_n$ approximation of $\int_1^2 x^3\,dx$ is within
$\frac{1}{1000}$ of the actual value?
2.4.4
Q15
Suppose we make the following approximations of $\int_{15}^{30} 4x + 7\,dx$. Without computing them, put
them in order from least to greatest (some may be equal).
$L_4$   $L_8$   $R_4$   $R_8$   $M_4$   $M_8$   The actual value
Q16
Yiming has a great idea. He approximates $\int_a^b f(x)\,dx$ by 12 rectangles. In order to mitigate the
error of left and right hand approximations, he takes the right endpoint of the first subinterval as
a test point, but the left endpoint of the second subinterval. He continues to alternate for all 12
subintervals. What is another name for the approximation Yiming has produced?
2.4.5
Q17
Compute the T
3
approximation of
Z
16
1
x
2
x dx.
Q18
Compute the M
3
approximation of
Z
16
1
x
2
x dx.
Q19
Compute the M
4
approximation of
Z
9
1
cos
πx
2
12
dx.
Q20
Compute the T
2
approximation of
Z
6
0
e
x
2
+2x
.
2.4.6
Q21
Given the following table of values of f(x)
x      0    3    6    9    12   15   18   21
f(x)   10   13   11   15   13   11   9    12
a
Compute the $M_2$ approximation of $\int_3^{15} f(x)\,dx$.
b
Compute the $T_3$ approximation of $\int_0^{18} f(x)\,dx$.
Q22
Given the following table of values of h(x)
x      1   2   3   4   5   6   7   8   9
h(x)   2   1   3   4   2   1   3   5   4
a
Compute the $T_3$ approximation of $\int_1^9 h(x)\,dx$.
b
Compute the $M_3$ approximation of $\int_2^8 h(x)\,dx$.
2.4.7
Q23
Let $f(x) = \frac{1}{x^3}$. Suppose you wanted to use a midpoint approximation with n rectangles to approximate
$\int_3^5 f(x)\,dx$. How large must n be to guarantee your approximation had an error of no more
than $\frac{1}{10000}$? Your answer should have the form $n \ge \ldots$, but you do not need to simplify any
arithmetic.
Q24
Suppose we want to approximate $\int_1^9 \sqrt{x}\,dx$.
a
Produce the $T_4$ approximation. Don’t bother simplifying the arithmetic.
b
Solve for a value n such that $T_n$ has an error of at most $\frac{1}{1000000}$. Don’t simplify the arithmetic.
Q25
Consider the following data about an unknown function g(x).
x      0   2   4   6   8   10   12   14
g(x)   3   5   8   9   7   4    3    1
a
Compute an $M_3$ approximation of $\int_0^{12} g(x)\,dx$.
b
If you are given that $|g''(x)| < \frac{1}{4}$, what bound can you put on the error of the previous
approximation?
Q26
Sasha is trying to bound the error of her $M_{10}$ approximation of $\int_0^\pi \sin x\,dx$. She computes
$f''(0) = 0$ and $f''(\pi) = 0$ and so decides to use K = 0.
a
What does her choice of K imply about the accuracy of her approximation?
b
Explain what is wrong with Sasha’s reasoning.
c
Compute the actual error bound for the $M_{10}$ approximation.
Extension and Synthesis
Q27
Give an example of a function for which $L_4$ and $R_4$ are both overestimates on some interval. You
may want to express your function by drawing its graph.
Q28
Suppose we want to estimate $\int_4^{20} f(x)\,dx$ and have the following table of values
x 4 6 8 10 12 14 16 18 20
f(x) 3 5 4 2 1 6 2 5 8
a
What estimates are possible with this data?
b
Would you expect the $M_4$ or the $T_8$ approximation to give you a better estimate?
Q29
Consider $T_3$, the trapezoid approximation of $\int_2^8 x^3\,dx$.
a
Produce this approximation. Do not simplify the arithmetic.
b
Compute the theoretical error bound for this approximation.
c
Explain in a couple sentences how you can tell whether the error is positive or negative. You
can include a diagram, if you’d like to.
Q30
Suppose you are interested in the value of $\int_0^{25} f(x)\,dx$, but you have only the following data.
x 1 2 6 8 13 14 20 23 25
f(x) 12 19 20 20 28 34 50 57 66
How might you approximate $\int_0^{25} f(x)\,dx$?
Q31
Suppose you invent your own approximation for a definite integral. You name it the “ultimate approximation” and denote it $U_n$. Its formula is
\[
U_n = \frac{L_n + R_n + M_n + T_n}{4}.
\]
Will $U_n$ overestimate or underestimate the integral of a linear function? Justify your answer.
Q32
Suppose we compute an $L_5$ approximation of $\int_7^{13} f(x)\,dx$.
a
What formula that we learned would give a bound on the error of this approximation? Fill in
all the information you can, and indicate the information that you would need to complete
the calculation. Be as specific as possible.
b
Suppose that, instead of the information you need for the formula, you were only given that
f is an increasing function on [7, 13]. How could you compute an error bound in this case?
Justify your answer.
Section 2.5
Improper Integrals
Goals:
1 Integrate a function that has a discontinuity.
2 Recognize when an integral is improper.
3 Determine whether an improper integral converges or diverges.
4 Compute the value of an improper integral.
5 Use comparison to determine convergence.
So far we have been content to evaluate integrals of continuous functions over bounded intervals.
Not all functions are continuous. We may be interested in the area under a discontinuous function, even
one with a vertical asymptote. We may be interested in the area under the entire graph of a function,
not just over some subset. In many cases these areas will be infinite, but in some cases they are not.
We will need to develop the methods to determine which case is which.
Question 2.5.1
What Is Infinity?
In this section we’ll be revisiting ideas about infinity.
Notation
The symbol $\infty$ implies that a variable or function is increasing without bound. It eventually gets bigger than every number.
$\infty$ is not a number. We cannot evaluate $\frac{1}{\infty}$ or $\infty \cdot 0$ or $\tan^{-1}(\infty)$.
The main way that we’ve encountered this notation is with limits. Limits at infinity will also be
relevant to improper integrals, so you may want to review them.
Exercise
Evaluate the following limits:
a
$\lim_{x\to\infty} \frac{1}{x^2}$
b
$\lim_{x\to\infty} x$
c
$\lim_{t\to-\infty} e^t$
d
$\lim_{y\to\infty} \sin y$
e
$\lim_{w\to\infty} \ln w$
f
$\lim_{x\to-\infty} \frac{3x^2 + 7}{x^2 - 5x}$
Solution
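a
$\lim_{x\to\infty} \frac{1}{x^2} = 0$.
b
$\lim_{x\to\infty} x = \infty$.
c
$\lim_{t\to-\infty} e^t = 0$.
d
$\lim_{y\to\infty} \sin y$ does not exist.
e
$\lim_{w\to\infty} \ln w = \infty$.
f
$\lim_{x\to-\infty} \frac{3x^2 + 7}{x^2 - 5x} = 3$.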
Question 2.5.2
How Do We Integrate a Discontinuous Function?
Consider the function
\[
f(x) =
\begin{cases}
3x^2 & \text{if } x \leq 2 \\
10 - 2x & \text{if } x > 2
\end{cases}
\]
What is $\int_0^5 f(x)\,dx$?
Figure: The area beneath a discontinuous graph
$\int_0^5 f(x)\,dx$ is the signed area under $f(x)$ from $x = 0$ to $x = 5$. It is equal to a limit
\[
\int_0^5 f(x)\,dx = \lim_{\Delta x \to 0} \sum_{i=1}^{n} f(x_i)\,\Delta x
\]
If we look at the rectangle approximations in this equation, we see that they can badly estimate the
function near the point of discontinuity.
Figure: Rectangle approximations of the area beneath a discontinuous graph
Remarks
We might worry that the approximations are so bad that the limit $\lim_{\Delta x \to 0} \sum_{i=1}^{n} f(x_i)\,\Delta x$ does not exist. Fortunately, it does, as long as there are only finitely many discontinuities.
$f(x)$ almost has an antiderivative function. $F(x) = \int_0^x f(t)\,dt$ has derivative $f(x)$ at all $x$, except perhaps at the points of discontinuity.
While it may be comforting to know that an antiderivative function exists, it doesn’t help us evaluate the integral. We don’t know what number to assign to $F(x)$ for many values of $x$. So how do we compute $\int_0^5 f(x)\,dx$? Instead of dealing with a function whose antiderivative we don’t know, we break this into two integrals that we do know.
\[
\begin{aligned}
\int_0^5 f(x)\,dx &= \int_0^2 f(x)\,dx + \int_2^5 f(x)\,dx \\
&= \int_0^2 3x^2\,dx + \int_2^5 f(x)\,dx
\end{aligned}
\]
Why can’t we replace $\int_2^5 f(x)\,dx$ with $\int_2^5 (10 - 2x)\,dx$? At $x = 2$, $f(x) = 3x^2$, not $10 - 2x$. This is unfortunate, because for any number $t > 2$ we could replace $\int_t^5 f(x)\,dx$ with $\int_t^5 (10 - 2x)\,dx$. We will need to break our integral down further.
\[
\begin{aligned}
\int_0^5 f(x)\,dx &= \int_0^2 f(x)\,dx + \int_2^t f(x)\,dx + \int_t^5 f(x)\,dx \\
&= \int_0^2 3x^2\,dx + \int_2^t f(x)\,dx + \int_t^5 (10 - 2x)\,dx
\end{aligned}
\]
We still don’t know the value of the middle integral, but we know that as t approaches 2, the domain
of integration shrinks to 0. We can take advantage of this by taking a limit.
\[
\begin{aligned}
\int_0^5 f(x)\,dx &= \lim_{t\to 2^+} \left[ \int_0^2 3x^2\,dx + \int_2^t f(x)\,dx + \int_t^5 (10 - 2x)\,dx \right] \\
&= \lim_{t\to 2^+} \left[ x^3 \Big|_0^2 + \int_2^t f(x)\,dx + \left(10x - x^2\right) \Big|_t^5 \right] \\
&= \lim_{t\to 2^+} \left[ 8 - 0 + \int_2^t f(x)\,dx + (50 - 25) - (10t - t^2) \right] \\
&= \lim_{t\to 2^+} \left[ 33 - 10t + t^2 + \int_2^t f(x)\,dx \right] \\
&= 33 - 10(2) + 2^2 + \int_2^2 f(x)\,dx \\
&= 17
\end{aligned}
\]
Notice that we had to evaluate an integral with the variable t as a bound. Once we had applied the
Fundamental Theorem of Calculus and plugged in t, this integral became a continuous function and we
could evaluate the limit.
Notice also the strange role the limit played in this computation. Usually we take limits to see what
value a changing function approaches. Our function has the same value for any choice of t (make sure
you see why), so technically we were taking the limit of a constant function. The limit was a purely
computational tool.
Remark
The discontinuity at $x = 2$ meant that we were stuck with an integral $\int_2^t f(x)\,dx$. With a less well-behaved function we might have also needed an integral on the left side of 2, like $\int_s^2 f(x)\,dx$. However, these two integrals can always be sent to zero by a limit, so when solving integrals of discontinuous functions, we can leave these out of our calculations.
We can summarize the method as follows:
Integrating discontinuous functions
If $f(x)$ is discontinuous at $x = c$ and $a \leq c \leq b$, then
\[
\int_a^b f(x)\,dx = \lim_{t\to c^-} \int_a^t f(x)\,dx + \lim_{s\to c^+} \int_s^b f(x)\,dx
\]
provided that both of these limits exist.
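As a rough numerical check of this method, the sketch below approximates the integral of the piecewise function from this question in two ways: straight across the jump, and split at $x = 2$. The helper name `midpoint_sum` and the subinterval counts are illustrative choices, not part of the text.

```python
def f(x):
    """Piecewise function from the example: 3x^2 for x <= 2, 10 - 2x for x > 2."""
    return 3 * x**2 if x <= 2 else 10 - 2 * x


def midpoint_sum(func, a, b, n):
    """Midpoint Riemann sum of func over [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(func(a + (i + 0.5) * dx) for i in range(n)) * dx


# Integrate straight across the jump at x = 2.
whole = midpoint_sum(f, 0, 5, 100_000)

# Split at the discontinuity, one sum on each side.
split = midpoint_sum(f, 0, 2, 50_000) + midpoint_sum(f, 2, 5, 50_000)

print(whole, split)  # both should land very close to 17
```

Both sums agree with the value 17 computed above, which is consistent with the claim that a single jump does not change the integral.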
A removable discontinuity should not slow us down even this much. The area under a single point
of discontinuity is zero. We can use the following theorem for a function with any finite number of
removable discontinuities.
Theorem
If $f(x)$ and $g(x)$ are equal on $[a, b]$ except at a finite number of points, then
\[
\int_a^b f(x)\,dx = \int_a^b g(x)\,dx.
\]
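This theorem eliminates the need to use limits in our example.
\[
\begin{aligned}
\int_0^5 f(x)\,dx &= \int_0^2 \underbrace{f(x)}_{=\,3x^2}\,dx + \int_2^5 \underbrace{f(x)}_{\substack{=\,10 - 2x \\ \text{except at } x = 2}}\,dx \\
&= \int_0^2 3x^2\,dx + \int_2^5 (10 - 2x)\,dx
\end{aligned}
\]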
Most discontinuities can be handled this way, but there is one type that will still require limits.
Example 2.5.3
Integrating a Function with a Vertical Asymptote
Definition
When $f(x)$ has a vertical asymptote at $c$ in $[a, b]$ we call $\int_a^b f(x)\,dx$ an improper integral.
How can we compute $\int_0^4 \frac{1}{\sqrt{x}}\,dx$?
In this case, breaking this integral into two doesn’t help.
\[
\int_0^4 \frac{1}{\sqrt{x}}\,dx = \lim_{t\to 0^+} \left[ \int_0^t \frac{1}{\sqrt{x}}\,dx + \int_t^4 \frac{1}{\sqrt{x}}\,dx \right]
\]
We cannot take for granted that $\lim_{t\to 0^+} \int_0^t \frac{1}{\sqrt{x}}\,dx$ goes to 0. The interval is getting smaller, but the values of the function may be so large that its rectangle approximations stay arbitrarily large and do not limit to 0. If there were an unbounded amount of area in $\lim_{t\to 0^+} \int_0^t \frac{1}{\sqrt{x}}\,dx$, then as $t \to 0^+$, $\int_t^4 \frac{1}{\sqrt{x}}\,dx$ would absorb more and more of that area and tend to $\infty$. Thus if (and only if) $\lim_{t\to 0^+} \int_t^4 \frac{1}{\sqrt{x}}\,dx$ exists, we can assume that the remaining piece $\int_0^t \frac{1}{\sqrt{x}}\,dx$ limits to 0 and can be ignored.
Solution
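\[
\begin{aligned}
\int_0^4 \frac{1}{\sqrt{x}}\,dx &= \lim_{t\to 0^+} \int_t^4 \frac{1}{\sqrt{x}}\,dx \\
&= \lim_{t\to 0^+} 2\sqrt{x}\,\Big|_t^4 \\
&= \lim_{t\to 0^+} \left( 2\sqrt{4} - 2\sqrt{t} \right) \\
&= 4 - 0
\end{aligned}
\]
Since $\lim_{t\to 0^+} \int_t^4 \frac{1}{\sqrt{x}}\,dx$ exists, we conclude that
\[
\int_0^4 \frac{1}{\sqrt{x}}\,dx = \lim_{t\to 0^+} \int_t^4 \frac{1}{\sqrt{x}}\,dx = 4
\]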
Figure: The area beneath a function with a vertical asymptote
Main Idea
To compute an improper integral, we introduce a dummy variable $t$ and take limit(s) as $t \to c$. If the limit(s) exist, we say the integral converges. If any do not, we say it diverges.
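To see convergence and divergence side by side, here is a small sketch that evaluates two integrals from $t$ to 4 as $t$ shrinks toward 0. The first uses the antiderivative $2\sqrt{x}$ of $\frac{1}{\sqrt{x}}$ from this example. The second, $\frac{1}{x}$ with antiderivative $\ln x$, is a contrasting case chosen for illustration and is not taken from the example. The function names are hypothetical.

```python
import math


def sqrt_case(t):
    """Integral of 1/sqrt(x) from t to 4, via the antiderivative 2*sqrt(x)."""
    return 2 * math.sqrt(4) - 2 * math.sqrt(t)


def reciprocal_case(t):
    """Integral of 1/x from t to 4, via the antiderivative ln(x)."""
    return math.log(4) - math.log(t)


for t in [1.0, 1e-2, 1e-4, 1e-8, 1e-12]:
    # The first column settles toward 4; the second keeps growing.
    print(t, sqrt_case(t), reciprocal_case(t))
```

The first integral approaches 4, so it converges. The second grows without bound as $t \to 0^+$, so it diverges, even though both integrands have a vertical asymptote at 0.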
Remark
Convergent and divergent are the terms that describe whether the limit which defines an integral approaches a single, finite numerical value. They perform a similar role to “exists” and “does not exist” for limits or “defined” and “undefined” for arithmetic.
Question 2.5.4
How Can We Compute an Integral over an Unbounded Region?
So far we have been interested in integrals over bounded intervals: $a \leq x \leq b$. We approximated these with rectangles.
Figure: The area beneath a graph, approximated by rectangles
Consider how this approach would work with an unbounded interval: $a \leq x$.
Rectangles will not approximate the area we want, but we can compute any finite subsection of it: $\int_a^t f(x)\,dx$. As with a discontinuity, we’ll take a limit.
Definition
An integral of the form $\int_a^{\infty} f(x)\,dx$ is also called an improper integral. We evaluate it by computing
\[
\int_a^{\infty} f(x)\,dx = \lim_{t\to\infty} \int_a^t f(x)\,dx
\]
assuming this limit exists. If the limit exists we say the improper integral converges. Otherwise we say it diverges.
Similarly, we can compute
\[
\int_{-\infty}^{b} f(x)\,dx = \lim_{t\to-\infty} \int_t^b f(x)\,dx.
\]
Example 2.5.5
Evaluating an Improper Integral
Compute $\int_2^{\infty} \frac{32}{x^3}\,dx$.
Figure: An integral over an unbounded domain
Solution
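We’ll compute the limit.
\[
\begin{aligned}
\lim_{t\to\infty} \int_2^t \frac{32}{x^3}\,dx &= \lim_{t\to\infty} \left( -\frac{16}{x^2} \right) \Big|_2^t \\
&= \lim_{t\to\infty} \left( -\frac{16}{t^2} + 4 \right) \\
&= 4
\end{aligned}
\]
Since the limit exists, it is the value of the improper integral.
\[
\int_2^{\infty} \frac{32}{x^3}\,dx = 4.
\]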
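The limit can also be watched numerically. The sketch below, with a hypothetical helper `tail_integral`, approximates $\int_2^t \frac{32}{x^3}\,dx$ with a midpoint sum for a few increasing values of $t$ and prints the closed form $4 - \frac{16}{t^2}$ next to it. The choice of 200,000 subintervals is arbitrary.

```python
def tail_integral(t, n=200_000):
    """Midpoint Riemann sum for the integral of 32/x^3 from 2 to t."""
    dx = (t - 2) / n
    return sum(32 / (2 + (i + 0.5) * dx) ** 3 for i in range(n)) * dx


for t in [10, 100, 1000]:
    # Numerical estimate beside the exact finite integral 4 - 16/t^2.
    print(t, tail_integral(t), 4 - 16 / t**2)
```

As $t$ grows, both columns approach 4, matching the value of the improper integral.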
Example 2.5.6
An Integral over the Entire Real Line
So far we have looked at intervals unbounded in one direction. If the interval is $(-\infty, \infty)$, the entire real line, then we use the following definition.
Definition
The improper integral $\int_{-\infty}^{\infty} f(x)\,dx$ is computed:
\[
\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{a} f(x)\,dx + \int_a^{\infty} f(x)\,dx
\]
for any number $a$, so long as both integrals on the right converge. If either integral diverges, then we say $\int_{-\infty}^{\infty} f(x)\,dx$ diverges as well.
Let
f(x) =
(
e
x
if x < 1
e
x
if x 1
.
Compute $\int_{-\infty}^{\infty} f(x)\,dx$.
Figure: An integral over the real line, broken into two limits
Solution
We break this integral into two limits. The natural breaking point is a = 1 since that is where the
function changes branches anyway. Both limits must converge for the integral to converge.
lim
s→−∞
Z
1
s
f(x) dx lim
t→∞
Z
t
1
f(x) dx
lim
s→−∞
Z
1
s
e
x
dx lim
t→∞
Z
t
1
e
x
dx
= lim
s→−∞
e
x
1
s
= lim
t→∞
2e
x
t
1
= lim
s→−∞
e e
s
= lim
t→∞
2e
t 2e
= e = (diverges)
One limit converges to $e$. The other diverges. This means that $\int_{-\infty}^{\infty} f(x)\,dx$ diverges.
Question 2.5.7
Can We Take a Limit of $\int_{-t}^{t} f(x)\,dx$ Instead?
We might wonder whether we need to break an integral $\int_{-\infty}^{\infty} f(x)\,dx$ into two integrals. Instead of two dummy variables, one going to $-\infty$ and one going to $\infty$, could we replace them by one? The integral $\int_{-\infty}^{\infty} x^3\,dx$ is a useful test case. We can certainly compute
\[
\begin{aligned}
\lim_{t\to\infty} \int_{-t}^{t} x^3\,dx &= \lim_{t\to\infty} \frac{x^4}{4} \Big|_{-t}^{t} \\
&= \lim_{t\to\infty} \left( \frac{t^4}{4} - \frac{(-t)^4}{4} \right) \\
&= \lim_{t\to\infty} 0 \\
&= 0
\end{aligned}
\]
This might even seem right because the area above the axis seems to cancel out the area below the
axis. However, intuitively, we expect that the area of a region should be preserved if we shift it in some
direction. Let’s shift this graph one unit to the left.
\[
\begin{aligned}
\lim_{t\to\infty} \int_{-t}^{t} (x + 1)^3\,dx &= \lim_{t\to\infty} \frac{(x + 1)^4}{4} \Big|_{-t}^{t} \\
&= \lim_{t\to\infty} \left( \frac{(t + 1)^4}{4} - \frac{(-t + 1)^4}{4} \right) \\
&= \lim_{t\to\infty} \left( \frac{t^4 + 4t^3 + 6t^2 + 4t + 1}{4} - \frac{t^4 - 4t^3 + 6t^2 - 4t + 1}{4} \right) \\
&= \lim_{t\to\infty} \left( 2t^3 + 2t \right) \\
&= \infty
\end{aligned}
\]
We can see that, for any choice of $t$, there will be more area above the axis than below it, and the difference grows quickly as $t$ increases. If the area of a region changes when we shift it to the side, then that area was not well defined to begin with. We thus say that these integrals diverge, not because they go to $\infty$ or $-\infty$, but because they are not defined at all. The formal definition above handles this example correctly.
$\int_{-\infty}^{0} x^3\,dx$ diverges, so $\int_{-\infty}^{\infty} x^3\,dx$ also diverges.
Figure: The area under functions of the form $f(x) = (x - a)^3$
Main Idea
Do not replace the correct definition:
\[
\lim_{t\to-\infty} \int_t^a f(x)\,dx + \lim_{t\to\infty} \int_a^t f(x)\,dx
\]
with the “shortcut”:
\[
\lim_{t\to\infty} \int_{-t}^{t} f(x)\,dx
\]
The “shortcut” can suggest that the integral converges, when in fact it diverges.
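The failure of the shortcut can be reproduced with exact arithmetic. This sketch evaluates $\int_{-t}^{t} (x + c)^3\,dx$ from the antiderivative $\frac{(x + c)^4}{4}$ for the unshifted cubic ($c = 0$) and for the cubic shifted one unit left ($c = 1$). The function name `symmetric_integral` is an illustrative choice.

```python
from fractions import Fraction


def symmetric_integral(shift, t):
    """Exact integral of (x + shift)^3 from -t to t, using (x + shift)^4 / 4."""
    def antiderivative(x):
        return Fraction((x + shift) ** 4, 4)
    return antiderivative(t) - antiderivative(-t)


for t in [1, 10, 100]:
    # Unshifted cubic cancels exactly; the shifted one does not.
    print(t, symmetric_integral(0, t), symmetric_integral(1, t))
```

The unshifted values are always 0, while the shifted values equal $2t^3 + 2t$ and grow without bound. A quantity that changes when the graph is slid sideways cannot be a well-defined area, which is why the two-limit definition is required.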
Synthesis 2.5.8
A Comparison Test
Recall the following theorems
Theorem
If $f(x) \leq g(x)$ on $[a, b]$ then
\[
\int_a^b f(x)\,dx \leq \int_a^b g(x)\,dx.
\]
Theorem
Let $a$ be a real number or $\pm\infty$. If $F(x) \leq G(x)$ for all $x$ near $a$, then $\lim_{x\to a} F(x) \leq \lim_{x\to a} G(x)$.
Suppose we have a function $f(x)$ whose anti-derivative we don’t know, and a function $g(x)$ whose anti-derivative we do know. What can the divergence or convergence of $\int_a^{\infty} g(x)\,dx$ tell us about $\int_a^{\infty} f(x)\,dx$?
Solution
If we know that $f(x) \leq g(x)$ then for all $t \geq a$, $\int_a^t f(x)\,dx \leq \int_a^t g(x)\,dx$. This allows us to also compare their limits, which are the improper integrals $\int_a^{\infty} f(x)\,dx$ and $\int_a^{\infty} g(x)\,dx$. This could be useful in a couple of ways.
If $\lim_{t\to\infty} \int_a^t g(x)\,dx = -\infty$ then $\lim_{t\to\infty} \int_a^t f(x)\,dx = -\infty$ as well, meaning $\int_a^{\infty} f(x)\,dx$ diverges.
If on the other hand $f(x) \geq g(x)$ and $\lim_{t\to\infty} \int_a^t g(x)\,dx = \infty$ then $\lim_{t\to\infty} \int_a^t f(x)\,dx = \infty$ as well, which also means $\int_a^{\infty} f(x)\,dx$ diverges.
We might like to reverse these and say that if $\int_a^{\infty} g(x)\,dx$ converges, $\int_a^{\infty} f(x)\,dx$ must as well, but $\int_a^{\infty} f(x)\,dx$ can diverge without going to infinity. $f(x)$ could oscillate between positive and negative so that $\int_a^t f(x)\,dx$ increases and decreases and does not have a limit as $t \to \infty$.
We can actually solve the last issue by adding the assumption that $f(x)$ is non-negative. The result is not easy to prove, but it is useful.
Theorem
Suppose $0 \leq f(x) \leq g(x)$ for all $x$.
If $\int_a^{\infty} f(x)\,dx$ diverges, then $\int_a^{\infty} g(x)\,dx$ diverges.
If $\int_a^{\infty} g(x)\,dx$ converges, then $\int_a^{\infty} f(x)\,dx$ converges.
There are similar versions of this theorem for integrals to −∞ or for functions that are non-positive.
Section 2.5
Exercises
Summary Questions
Q1
What is an improper integral?
Q2
Under what conditions were we able to conclude that $\int_a^b f(x)\,dx = \int_a^b g(x)\,dx$?
Q3
What does it mean for an improper integral to converge or diverge?
Q4
If we know that $\int_a^{\infty} g(x)\,dx$ converges, what condition on $f(x)$ would guarantee that $\int_a^{\infty} f(x)\,dx$ converges?
2.5.1
Q5
In the expressions below, which of the boxes can legally be replaced by an $\infty$ symbol?
lim
x 1
x + 2 = 3
Z
4
0
f(x) dx = e
5
+
1
6
x
2
+ 2x log
7
|x|
8
1
Q6
Evaluate lim
x→∞
4
p
x
3
2x + 1.
Q7
Evaluate the following limits:
a
$\lim_{x\to\infty} \frac{x^2 + 3x + 5}{e^x}$
b
$\lim_{x\to-\infty} \frac{x^2 + 3x + 5}{e^x}$
Q8
Evaluate $\lim_{w\to\infty} \ln \frac{1}{w}$.
2.5.2
Q9
Evaluate
R
3
0
x
2
x
dx. Explain how you dealt with any discontinuities.
Q10
Let
f(x) =
(
4 x = 1, 4, or 6
2 otherwise
.
a
Sketch the graph y = f(x).
b
Evaluate
Z
5
0
f(x) dx. State what tool you used to deal with any discontinuities.
Q11
Let
g(x) =
x if 0 x 4
3 if 4 < x < 6
1
x
2
if 6 x
.
Compute
Z
8
1
g(x) dx.
Q12
The sign function has the form
\[
\sigma(x) =
\begin{cases}
1 & \text{if } x > 0 \\
-1 & \text{if } x < 0
\end{cases}.
\]
Write a formula (in terms of $a$ and $b$) for $\int_a^b \sigma(x)\,dx$. Your answer will be a piecewise expression.
2.5.3
Q13
Consider the integral
Z
2
2
1
x
dx.
a
Sketch the graph of y =
1
x
.
b
Set up the limits that would compute this integral.
c
Do these limits exist?
Q14
Evaluate
Z
1
0
ln x dx.
Q15
Evaluate
Z
4
0
1
x
+
1
4 x
dx.
Q16
Evaluate
Z
3
0
2
w
2
dw.
2.5.4
Q17
How large will the base ($\Delta x$) of each rectangle be, if we want to approximate:
a
The area over the interval $[4, 16]$ with 3 rectangles?
b
The area over the interval $[a, b]$ with $n$ rectangles?
c
The area over the interval $[a, \infty)$ with $n$ rectangles?
Q18
Compute
Z
3
2
x
dx.
Q19
Compute
Z
0
−∞
e
x
dx.
Q20
Evaluate
Z
0
e
2x
dx.
Q21
Evaluate $\int_0^1 \ln x\,dx$. You may need l’Hôpital’s rule.
Q22
Compute
Z
3
1
x
3
dx, showing all necessary steps.
2.5.5
Q23
Compute
Z
−∞
xe
x
2
dx.
Q24
Show how to evaluate
Z
−∞
x
1/3
dx or show that it diverges.
Q25
Let
f(x)
(
1
x
3
if x < 2
1
(x+4)
2
if x 2
.
Evaluate
Z
−∞
f(x) dx.
Q26
How would you write $\int_{-\infty}^{\infty} \frac{1}{1 + x^2}\,dx$ as a sum of two limits? You might recall that $\int \frac{1}{1 + x^2}\,dx = \tan^{-1} x + c$. Use this to evaluate the integral.
Extension and Synthesis
Q27
Let
f(x)
(
3
x if x < 8
10 x if x 8
.
a
Is f (x) continuous? Justify your answer with a calculation
b
What is the area enclosed by y = f(x) and y = 0?
Q28
Let
f(x)
x
4/3
if x < 8
1
3
x
if 8 x < 0
e
x
if x 0
.
Evaluate
Z
−∞
f(x) dx.
Q29
Consider the region R below y =
1
x
, above y = 0 and to the right of x = 1.
a
Try to compute the area of R using an integral.
b
Suppose R is rotated around the x-axis to create a solid S. Compute the volume of S.
c
How annoying are the conclusions of
a
and
b
?
Q30
Consider the region in the first quadrant whose boundary is the curves y =
3
x
, y = 2x 1 and
y = 0.
a
Write the area of this region as an integral in the variable y. Do not evaluate.
b
Suppose this region is rotated around the x-axis. Write the resulting volume using one or
more integrals. Do not evaluate.
Section 2.6
Probability
Goals:
1 Test the properties of a probability density function.
2 Use a probability density function to describe the underlying random variable.
3 Use the uniform, exponential, and normal distributions.
4 Compute probabilities and expected values.
The main problem facing every planner is uncertainty. When will the next epidemic strike? Will the
stock market go up or down? How many rare particles will flow through a detection device? These
outcomes cannot be known ahead of time, but they can be modeled as probabilities. Knowing when the
epidemic is likely to happen can guide our decision of how much to invest in mitigation. Knowing how
many particles are likely to pass through an area can inform us how sensitive our detection device needs
to be.
On the other hand, probabilities can also help us understand what has already happened. Probabilities
tell us whether the results of an experiment are likely to be a coincidence. Is an apparent pattern just
the variation inherent in random sampling, or is it likely to be present if the procedure is repeated? This
is in fact the basic model for statistical reasoning:
1 Assume that the type of pattern you’re looking for does not exist (a null hypothesis).
2 Collect observations.
3 Compute the probability of seeing those observations, given your assumption.
4 If the probability is very low, then the assumption is probably false.
Such reasoning allows us to conclude that a survey is representative of the population as a whole. It
allows us to understand what outcome will occur on average, or how much outcomes are likely to vary.
Such statistics help us understand the way the world works. We can design our next experiment or plan
our future behavior around that understanding. For example, on average, the stock market goes up.
This is one of the most powerful financial facts available to long-term investors, and it can be grounded
in a probabilistic study of past performance.
Question 2.6.1
What Is a Continuous Probability Distribution?
Definition
A random variable encodes the possible outcomes of a random selection. We use the notation
P (outcome) to denote the probability that a particular outcome occurs. If an outcome is impossible,
we write P (outcome) = 0. If it is certain we write P (outcome) = 1.
Example
Our outcome can be any expression concerning the random variable, for instance:
If S is the sum of the rolls of two six-sided dice, then
P (S = 8) =
5
36
.
If T is the number of tails when two coins are flipped then
P (T 1) =
3
4
.
We can encode these probabilities with a distribution function. The value of the function at each
number a is the probability that the outcome is a.
Example
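If $T$ is the number of tails obtained from two fair coins then
\[
f_T(t) =
\begin{cases}
\frac{1}{4} & \text{if } t = 0 \\
\frac{1}{2} & \text{if } t = 1 \\
\frac{1}{4} & \text{if } t = 2 \\
0 & \text{if } t = \text{anything else}
\end{cases}
\]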
Notice
The sum of the probabilities adds to 1.
There are only finitely many values of T that are possible.
What if we wanted to model height with a random variable? No one is exactly 68 inches tall. Even people who say they are “five feet eight inches” are slightly taller or shorter. A distribution function like we made for coins is unsuitable. It would have the property $f_H(h) = 0$ for all $h$. To handle this situation, we need to define a different kind of random variable with a different relationship to a defining function.
Definition
A continuous random variable $X$ is a random variable whose outcomes are real numbers, and whose probability is modeled by a probability density function $f_X(x)$ such that
\[
P(a \leq X \leq b) = \int_a^b f_X(x)\,dx.
\]
$f_X(x)$ must satisfy
1 $f_X(x) \geq 0$ for all $x$.
2 $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
Remark
The term density should give us a hint about how to think about these functions. Density is a rate.
The value of a probability density function tells you the rate of likelihood per unit of length on the real
number line. Integrating this rate over an interval gives the total likelihood of lying on that interval,
much like integrating a rate of change over an interval computes the total change.
An integral is the natural way to measure probability. The rules of integration are compatible with
our intuition of probability. Suppose we have an interval [a, b] broken into two or more subintervals. The
total probability of X having an outcome in [a, b] is equal to the sum of the probabilities of the outcome
lying in each subinterval. Similarly, the area above $[a, b]$ and below the graph $y = f_X(x)$ is equal to the sum of the areas above each subinterval. In equations, these are the laws:
\[
P(a \leq X \leq c) + P(c \leq X \leq b) = P(a \leq X \leq b)
\]
\[
\int_a^c f_X(x)\,dx + \int_c^b f_X(x)\,dx = \int_a^b f_X(x)\,dx
\]
Example 2.6.2
Describing a Random Variable from its Density Function
Consider the function
\[
f_X(x) =
\begin{cases}
\frac{1}{9}x^2 & \text{if } 0 \leq x \leq 3 \\
0 & \text{if } x > 3 \text{ or } x < 0
\end{cases}
\]
a
Verify that $f_X$ is a probability density function.
b
If $f_X$ is the density function of $X$, compute $P(X \geq 2)$.
c
What does $f_X$ tell us about the likely values of $X$?
Solution
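a
We need to check that $f_X(x)$ is never negative and $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.
$f_X(x)$ is never negative, because it is either a square or 0.
\[
\begin{aligned}
\int_{-\infty}^{\infty} f_X(x)\,dx &= \int_{-\infty}^{0} f_X(x)\,dx + \int_0^3 f_X(x)\,dx + \int_3^{\infty} f_X(x)\,dx \\
&= \int_{-\infty}^{0} 0\,dx + \int_0^3 \frac{1}{9}x^2\,dx + \int_3^{\infty} 0\,dx \\
&= \frac{1}{27}x^3 \Big|_0^3 \\
&= \frac{1}{27}(27 - 0) \\
&= 1
\end{aligned}
\]
b
\[
\begin{aligned}
P(X \geq 2) &= \int_2^{\infty} f_X(x)\,dx \\
&= \int_2^3 f_X(x)\,dx + \int_3^{\infty} f_X(x)\,dx \\
&= \int_2^3 \frac{1}{9}x^2\,dx + \int_3^{\infty} 0\,dx \\
&= \frac{1}{27}x^3 \Big|_2^3 \\
&= \frac{1}{27}(27 - 8) \\
&= \frac{19}{27}
\end{aligned}
\]
c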
Outcomes outside of [0, 3] are impossible. Among the outcomes in [0, 3], outcomes closer to 3 are
more likely than outcomes closer to 0, because the density function has a greater value there.
Figure: The density function of X and the area representing P (X > 2)
Main Ideas
To verify that a function is a probability density function, we need to check that it is never negative and that it integrates, over the entire real line, to 1.
We compute the probability that $X$ has an outcome in an interval by integrating $f_X(x)$ over that interval.
Outcomes of $X$ where $f_X(x)$ is large are more likely than outcomes where $f_X(x)$ is small.
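The two checks and the probability from Example 2.6.2 can be approximated numerically. In this sketch the density is the one from the example, while the helper `midpoint_sum` and the finite window $[-1, 4]$ are illustrative choices. The window is enough because the density is 0 outside $[0, 3]$.

```python
def f_X(x):
    """Density from Example 2.6.2: x^2/9 on [0, 3], and 0 elsewhere."""
    return x**2 / 9 if 0 <= x <= 3 else 0.0


def midpoint_sum(func, a, b, n=100_000):
    """Midpoint Riemann sum of func over [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(func(a + (i + 0.5) * dx) for i in range(n)) * dx


# Total probability: the density vanishes outside [0, 3], so [-1, 4] captures all of it.
total = midpoint_sum(f_X, -1, 4)

# P(X >= 2): again the part beyond x = 3 contributes nothing.
p_at_least_2 = midpoint_sum(f_X, 2, 4)

print(total, p_at_least_2, 19 / 27)
```

The first value should be close to 1 and the second close to $\frac{19}{27}$, in agreement with the worked solution.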
Figure: The density function of X and the areas that represent the likelihood of larger and smaller
outcomes
Question 2.6.3
What Density Functions Arise Naturally?
The requirements to be a probability density function are not very strict. The vast majority of prob-
ability density functions do not model a real life phenomenon or even an intriguing thought experiment.
What follows are three families of density functions that are especially useful. The first is the simplest.
When we lack data to suggest otherwise, it is a common choice when creating a model with some
randomness.
Definition
Given an interval $[a, b]$, the uniform distribution on $[a, b]$ is given by
\[
f_X(x) =
\begin{cases}
\frac{1}{b - a} & \text{if } a \leq x \leq b \\
0 & \text{if } x > b \text{ or } x < a
\end{cases}
\]
Notice that the shorter the interval $[a, b]$ is, the higher the density required to integrate to a total probability of 1.
Figure: The density function of a uniform distribution
An intuitive but imprecise way to describe a random variable with a uniform distribution is to say that
all outcomes in [a, b] are equally likely. Since every outcome of a continuous random variable occurs with
probability 0, this is unhelpful. X is remarkable, because all outcomes in [a, b] have equal probability
density. To connect this to actual probabilities, we might say that all subintervals of [a, b] are equally
likely to contain the outcome of X, but this is incorrect. X is 3 times as likely to have an outcome in
an interval of length 6 as an interval of length 2. A precise statement would be: the likelihood of the
outcome of X occurring in each subinterval of [a, b] is proportional to the length of the subinterval.
Our second family of random variables naturally measures waiting time. It answers questions like:
when will the next customer come in? When will this device next detect a certain type of ambient
particle? Here is the formal definition.
Definition
Suppose an event happens randomly and uniformly at an average rate of $\lambda$ times per unit of time ($x$). Then the amount of time until it next occurs is given by the exponential distribution:
\[
f_X(x) =
\begin{cases}
\lambda e^{-\lambda x} & \text{if } 0 \leq x \\
0 & \text{if } x < 0
\end{cases}
\]
Observe the following
1 Higher $\lambda$ means that $X$ is likely to be smaller, as the event occurs sooner.
2 The probability of the event occurring in a given interval, given that it did not occur before that interval, depends only on the length of the interval.
Figure: The density function of an exponential distribution
The second point is best illustrated with a concrete example.
Example
Gravitational waves large enough to detect pass through the earth from time to time. Suppose we switch on a gravitational wave detector, and the time (in days) until the first detection is modeled by the exponential random variable $X$ with density function $0.7e^{-0.7x}$.
The probability that the first detection occurs within two days is approximately 0.75.
If the first detection does not occur in the first two days, then the probability that it occurs in the following two days is approximately 0.75.
If the first detection does not occur in the first four days, then the probability that it occurs in the following two days is approximately 0.75.
And so on.
From this we can compute
\[
\begin{aligned}
P(2 \leq X \leq 4) &= \underbrace{(1 - P(X \leq 2))}_{\substack{X \text{ is not in} \\ \text{the first two days}}} (0.75) \\
&\approx (0.25)(0.75) \\
&= 0.1875
\end{aligned}
\]
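The pattern in this example can be checked directly from the exponential density. Integrating $0.7e^{-0.7x}$ from 0 to $x$ gives $P(X \leq x) = 1 - e^{-0.7x}$, which the sketch below uses. The variable and function names are illustrative.

```python
import math

lam = 0.7  # rate from the example, in detections per day


def cdf(x):
    """P(X <= x) for the exponential random variable with rate lam."""
    return 1 - math.exp(-lam * x) if x > 0 else 0.0


# Probability of a detection in the first two days.
p_first_two = cdf(2)

# Probability of a detection in days 2 to 4, given none in the first two days.
p_next_two_given_none = (cdf(4) - cdf(2)) / (1 - cdf(2))

# Unconditional probability that the first detection falls between day 2 and day 4.
p_two_to_four = cdf(4) - cdf(2)

print(p_first_two, p_next_two_given_none, p_two_to_four)
```

The first two numbers coincide (about 0.753), which is the property described in the second observation above. The third is about 0.186; the figure 0.1875 in the example comes from rounding the two-day probability to 0.75 before multiplying.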
Our final family is the most famous, because it is the most generally applicable.
Definition
The normal distribution is sometimes called a bell curve. Many natural phenomena are normally distributed. The formula is
\[
f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}
\]
The anti-derivative of this density function cannot be expressed with functions that we can evaluate.
Instead we can look up values in a table. The normal distribution has a special role in statistics:
Theorem [The Central Limit Theorem]
The average of any n independent identically distributed random variables (for instance performing the
same experiment n times) will converge to a normal distribution as n gets large.
This theorem helps explain why many natural measurements are approximated by bell curves. For
example, human height is affected by hundreds of factors, including individual genes, nutrition and
environment. If we view human height as an average of these factors, scaled with appropriate units,
then we expect human heights to be modeled by a normal random variable. Viewing a histogram of
human height statistics shows the expected bell curve.
The parameters in $f_X$ can be interpreted as follows:
µ is the average value of X. It corresponds to the peak of the bell curve.
σ is the standard deviation of X. Larger σ means that X has a larger probability of being far
from µ.
Figure: The density function (bell curve) of a normal distribution
Question 2.6.4
What Is the Expected Value of a Random Variable?
Expected value will be the first statistic we can compute for a random variable. Statistics of a data
set tell us something about the numbers in the data set. Statistics of a random variable should tell us
something about the outcomes of the random variable.
The expected value or average value of X describes what the average result will be, if you
let X take a value at random many times. It is typically denoted E[X] or with the letter µ.
Example
Suppose we average our rolls of a six-sided die. As the number of rolls $n$ gets large, we’ll roll each number close to $\frac{n}{6}$ times. The sum of the rolls will be approximately
\[
1 \cdot \frac{n}{6} + 2 \cdot \frac{n}{6} + 3 \cdot \frac{n}{6} + 4 \cdot \frac{n}{6} + 5 \cdot \frac{n}{6} + 6 \cdot \frac{n}{6}
\]
To compute the average, we divide by $n$. Fortunately, every term already has an $n$.
\[
\mu = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6} = 3.5
\]
In general, the number of occurrences of the result $a$ in $n$ evaluations of $X$ will be approximately $n f_X(a)$. When we divide out $n$, we obtain the following weighted average:
Formula
The expected value of a (discrete) random variable $X$ with probability distribution function $f_X$ is
\[
E[X] = \sum_x x f_X(x)
\]
where $x$ is summed over all possible outcomes of $X$.
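The discrete formula is short enough to compute directly. The sketch below stores a distribution as a dictionary from outcomes to probabilities, which is a representation chosen here for illustration, and applies the weighted sum to the six-sided die and to the two-coin tails count from earlier in this section.

```python
from fractions import Fraction


def expected_value(distribution):
    """Sum of outcome * probability over all outcomes of a discrete random variable."""
    return sum(outcome * prob for outcome, prob in distribution.items())


# A fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Number of tails from two fair coins.
tails = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

print(expected_value(die), expected_value(tails))
```

The die gives $\frac{7}{2} = 3.5$, matching the example, and the tails count gives 1.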
To produce the corresponding formula for a continuous random variable, instead of multiplying each outcome by its probability and summing, we multiply each outcome by its density and integrate.
Formula
The expected value of a continuous random variable $X$ with probability density function $f_X$ is
\[
E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx
\]
Example 2.6.5
The Expected Value of a Uniform Random Variable
Compute the expected value of a uniform random variable on [a, b].
Solution
We’ll apply the formula. Since $f_X(x)$ has discontinuities at $a$ and $b$, we will break it into three parts.
\[
\begin{aligned}
E[X] &= \int_{-\infty}^{\infty} x f_X(x)\,dx \\
&= \int_{-\infty}^{a} x(0)\,dx + \int_a^b x \cdot \frac{1}{b - a}\,dx + \int_b^{\infty} x(0)\,dx \\
&= \frac{1}{2(b - a)} x^2 \Big|_a^b \\
&= \frac{1}{2(b - a)} b^2 - \frac{1}{2(b - a)} a^2 \\
&= \frac{b^2 - a^2}{2(b - a)} \\
&= \frac{(b - a)(b + a)}{2(b - a)} \\
&= \frac{b + a}{2}
\end{aligned}
\]
Notice that this is the midpoint of the interval $[a, b]$. Since $X$ is uniformly distributed across the interval, we’d expect the average value to occur at the midpoint.
Main Ideas
$E[X]$ typically occurs somewhere in the middle of the possible outcomes of $X$. With symmetric density functions, it is the midpoint.
Example 2.6.6
The Expected Value of an Exponential Random Variable
a
Compute the expected value of an exponential random variable.
b
Explain why the role of λ in the answer to
a
makes sense.
Solution
a
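We will use the formula. Even after removing the region of 0 density, we are left with an improper integral. We therefore will compute a limit. We integrate by parts with
\[
u = x, \quad dv = \lambda e^{-\lambda x}\,dx, \quad du = dx, \quad v = -e^{-\lambda x}.
\]
\[
\begin{aligned}
E[X] &= \int_{-\infty}^{\infty} x f_X(x)\,dx \\
&= \int_{-\infty}^{0} x(0)\,dx + \int_0^{\infty} x \lambda e^{-\lambda x}\,dx \\
&= \lim_{t\to\infty} \int_0^t x \lambda e^{-\lambda x}\,dx \\
&= \lim_{t\to\infty} \left( -x e^{-\lambda x} \Big|_0^t + \int_0^t e^{-\lambda x}\,dx \right) && \text{(by parts)} \\
&= \lim_{t\to\infty} \left( -x e^{-\lambda x} - \frac{1}{\lambda} e^{-\lambda x} \right) \Big|_0^t \\
&= \lim_{t\to\infty} \left( -t e^{-\lambda t} - \frac{1}{\lambda} e^{-\lambda t} + 0e^0 + \frac{1}{\lambda} e^0 \right) \\
&= \lim_{t\to\infty} \left( -t e^{-\lambda t} \right) - 0 + 0 + \frac{1}{\lambda} \\
&= \frac{1}{\lambda} + \lim_{t\to\infty} \frac{-t}{e^{\lambda t}} && \left( \tfrac{\infty}{\infty} \text{ form} \right) \\
&= \frac{1}{\lambda} + \lim_{t\to\infty} \frac{-1}{\lambda e^{\lambda t}} && \text{(l’Hôpital’s rule)} \\
&= \frac{1}{\lambda} + 0
\end{aligned}
\]
Our final answer is
\[
E[X] = \frac{1}{\lambda}
\]
b
$X$ measures the time until an event with average frequency $\lambda$ occurs. Thus on average, we expect to wait $\frac{1}{\lambda}$ for it. For example, if an event occurs three times per hour, we would expect to wait about 20 minutes for it to occur.
Figure: The expected value of an exponential random variable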
Main Idea
For asymmetric density functions, E[X] will not be in the middle of the range of values. It will be pulled
toward regions of higher likelihood.
Synthesis 2.6.7
Median Wait Time
Suppose that an exponential random variable models the wait time of a random caller to a call
center.
a
What is the median wait time?
b
Explain graphically why the median wait time is less than the expected wait time.
Solution
a
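The median is the number $m$ such that half the outcomes are larger than $m$ and half are smaller. We can write this as the following equation and solve for $m$.
\[
\begin{aligned}
P(X \leq m) &= 0.5 \\
\int_{-\infty}^{m} f_X(x)\,dx &= 0.5 \\
\int_{-\infty}^{0} f_X(x)\,dx + \int_0^m f_X(x)\,dx &= 0.5 && \text{(presumably } m > 0 \text{)} \\
\int_{-\infty}^{0} 0\,dx + \int_0^m \lambda e^{-\lambda x}\,dx &= 0.5 \\
-e^{-\lambda x} \Big|_0^m &= 0.5 \\
-e^{-\lambda m} + e^0 &= 0.5 \\
e^{-\lambda m} &= 0.5 \\
-\lambda m &= \ln 0.5 \\
m &= \frac{1}{\lambda} \ln 2
\end{aligned}
\]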
b
The median is the point such that half the area under $y = f_X(x)$ lies on either side. The expected
value is weighted. A few outcomes far to one side can balance many outcomes slightly to the
other side. The outcomes of X extend to ∞ on the right but only to 0 on the left. These distant
outcomes pull the average to the right, but their distant position has no effect on the median.
Figure: The median M and expected value µ of an exponential random variable
Main Idea
The median is the value m such that half the area under $y = f_X(x)$ lies on either side of x = m.
We compute the median by setting $P(X \le m) = 0.5$ and solving for m.
Median is not the same as expected value. $y = f_X(x)$ may have more area on one side of E[X]
than the other, if the smaller side’s area is farther from the middle.
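The equation P(X ≤ m) = 0.5 can also be solved numerically, which gives a quick check on the closed form m = (ln 2)/λ. The following is a minimal sketch; the helper names and the use of bisection are choices made for this illustration. The cumulative probability 1 − e^{−λx} is the antiderivative evaluated in the solution above.

```python
import math

def exponential_cdf(x, lam):
    """P(X <= x) for an exponential random variable with rate lam."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0

def median_by_bisection(lam, tol=1e-12):
    """Solve P(X <= m) = 0.5 for m by bisection."""
    lo, hi = 0.0, 1.0
    while exponential_cdf(hi, lam) < 0.5:  # widen until the median is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if exponential_cdf(mid, lam) < 0.5:
            lo = mid   # too little area to the left, move right
        else:
            hi = mid   # at least half the area to the left, move left
    return (lo + hi) / 2.0

lam = 3.0
median = median_by_bisection(lam)
mean = 1.0 / lam
print(median, math.log(2) / lam, mean)
```

The first two printed numbers should agree, and both should be smaller than the third, matching the claim that the median wait time is less than the expected wait time.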
Section 2.6
Exercises
Summary Questions
Q1
Describe the difference between a continuous random variable and a non-continuous (discrete)
one.
Q2
How do we use a probability density function to compute the probability of an outcome?
Q3
What must be true about a probability density function?
Q4
How do you compute the expected value of a random variable?
2.6.1
Q5
How many possible outcomes does a continuous random variable have?
Q6
Which of the following probability questions can be answered without any further information?
Explain.
i. If you spin a prize wheel 3 times, what is the probability that your winnings add up to exactly
$80?
ii. If you flip two weighted (unfair) coins, what is the probability that exactly one of them comes
up tails?
iii. If you pick a random person, what is the probability that her height is exactly 68 inches?
iv. If I spin a wheel of names, what is the probability that it takes exactly 7 spins to land on my
own name?
Q7
Let X be a continuous random variable. Compute P (X = 13).
Q8
Another book might teach you that $P(a < X < b) = \int_a^b f_X(x)\,dx$, instead of
$P(a \le X \le b) = \int_a^b f_X(x)\,dx$. Why shouldn’t this bother you?
Q9
Let $f_T(t)$ be a probability density function of a random variable T. What quantity is represented
by $\int_{-\infty}^{5} f_T(t)\,dt$?
Q10
Let $f_X(x)$ be a probability density function of a random variable X. What quantity is represented
by $\int_{2}^{\infty} f_X(x)\,dx$?
Q11
Given a density function $f_U(u)$ for a random variable U, write an integral or integrals to compute
$P(4 \le U^2 \le 9)$.
Q12
Suppose the height of a mature sunflower is given by the random variable H with density function
$f_H(h)$. If your friend tells you that her sunflower is in the top quintile in height, explain how you
could use $f_H$ to determine a range that the height of her sunflower must lie in.
2.6.2
Q13
Let W be a random variable with density function
\[ f_W(w) = \begin{cases} \frac{36 - w^2}{144} & \text{if } 0 \le w \le 6 \\ 0 & \text{otherwise} \end{cases} \]
Compute $P(2 \le W \le 9)$.
Q14
Let T be a random variable with density function
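\[ f_T(t) = \begin{cases} \frac{3\sqrt{t}}{2} & \text{if } 0 \le t \le 1 \\ 0 & \text{otherwise} \end{cases} \]
Compute $P\left(0 \le T \le \frac{1}{4}\right)$.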
2.6.3
Q15
If U is a uniform random variable on [4, 7.5], compute the probability that U 5.5.
Q16
If X is a uniform random variable on [2, c] and $P(0 \le X \le 4) = 0.25$, what is c?
Q17
If W is an exponential random variable such that P (W 1) = $\frac{2}{7}$, then compute the value of the
parameter λ in its density function $f_W$.
Q18
Juan looks at the density function of an exponential random variable X and says “X is more
likely to have the value 1 than 5.” “That’s silly,” replies Neha, “X has exactly zero probability
of being either of those. They are equally likely.” What do you think of their argument?
2.6.4
Q19
Let
\[ f(x) = \begin{cases} bx^{-3} & x \ge 2 \\ 0 & x < 2 \end{cases}. \]
a
Compute a number b so that f is a probability density function.
b
If f is the density function for some random variable Z, compute E[Z].
Q20
Suppose X is a random variable with density function $f_X(x)$. Suppose $f_X(x)$ is 0 outside [3, 11]
and decreasing on [3, 11]. Is E[X] greater or less than 7? Explain.
Q21
Suppose X is a continuous random variable with probability density function
\[ f_X(x) = \begin{cases} \frac{3\sqrt{x}}{16} & \text{if } 0 \le x \le 4 \\ 0 & \text{if } x > 4 \text{ or } x < 0 \end{cases} \]
a
In a sentence or two, state what you would need to check to ensure that $f_X(x)$ is a valid
probability density function. You do not need to actually perform the calculations.
b
Compute E[X].
Q22
Explain how you can use the graph of a normal random variable to identify the expected value.
Then compute that value using the expected value formula.
2.6.5
Q23
Give the expected value of a uniform random variable on [5.2, 9.4].
Q24
If the uniform random variable on [a, b] has expected value 7, and a = 3, what is b?
Q25
In this example, we divided by (b − a). What would happen if b − a = 0?
Q26
If you know the expected value µ of a uniform random variable X, what is the probability that
µ? Is this problem answerable without the assumption that X is uniform? Explain.
2.6.6
Q27
Suppose X and Y are two different exponential random variables modeling events that occur on
average p and 2p times per day respectively. How are their expected values related?
Q28
Does our expected value formula make sense if λ < 0? Why should this not bother us?
Q29
On bus route 70, 3 buses come per hour, on average.
a
Write a probability density function for X, the amount of time until the next bus arrives.
b
What is the expected amount of time until the next bus comes?
c
How likely is it that you will wait more than an hour for the bus?
Q30
If X is an exponential random variable, what is the probability that X E[X].
2.6.7
Q31
Compute the median value of a uniform random variable on [a, b].
Q32
Let W be a random variable with density function
\[ f_W(w) = \begin{cases} \frac{36 - w^2}{144} & \text{if } 0 \le w \le 6 \\ 0 & \text{otherwise} \end{cases} \]
Compute the median value of W .
Q33
Let T be a random variable with density function
\[ f_T(t) = \begin{cases} \frac{3\sqrt{t}}{2} & \text{if } 0 \le t \le 1 \\ 0 & \text{otherwise} \end{cases} \]
Compute the median value of T .
Q34
Examine the graph of the density function of a normal random variable X. What is the median
of X? Explain how you can see this in the graph.
Extension and Synthesis
Q35
Suppose X is a uniform random variable on [a, b] and $P(3 \le X \le 4) = \frac{1}{2}$. Describe all possible
values of a and b.
Q36
Suppose the random variable W has the density function
\[ f_W(w) = \begin{cases} k(7 - w) & \text{if } 1 \le w \le 7 \\ 0 & \text{if } w > 7 \text{ or } w < 1 \end{cases} \]
a
What values of W are possible?
b
What can you say about which values of W are more likely than others?
c
Given that $f_W$ is a density function, what is the value of the constant k?
d
What is the average value of W?
e
Can you compute the median value of W? This might be easier with geometry than with
calculus.
Q37
Suppose that g(x) is a probability distribution for a random variable X and g(x) = 0 for all
x 0.
a
What is the value of $\int_{-\infty}^{0} g(x)\,dx$? Justify your answer with a sentence or computation.
b
Give a formula for E[X]. Is it positive or negative? Justify your answer in a sentence or two.
Q38
Recall that an even function f(x) has the property that f(−x) = f(x) for all x. If the density
function of a random variable is even, what does that say about the expected value and median
of X? Explain your answer.
Section 2.7
Functions of Random Variables
Goals:
1 Compute expected values of functions of a random variable.
2 Compute the average value of a function.
3 Compute the variance of a random variable.
Sometimes the quantity modeled by a random variable is not the quantity we actually care about. For
example, while we might have a model for how many people will contract a disease, what we actually
would like to predict is how many healthcare resources they will require. The number of patients
determines the required resources, so mathematically, resources is a function of patients. Expected
values of such functions turn out to be straightforward to compute. A natural way to generate statistics
about a random variable is to write a function that measures something interesting and compute its
expected value.
Question 2.7.1
What Is a Function of a Random Variable?
When we write a function g(X) of a random variable X, then the output Y of this function is itself
a random variable. These functions are most intuitive with a discrete random variable. In this case we
can compute Y ’s probability distribution function by applying g to each outcome of X and summing
the probabilities that produce each output.
Example
Let X be a discrete random variable with probability distribution function $f_X(x)$. If $Y = g(X) = X^2$
then Y is a random variable and we can compute its probability distribution function $f_Y(y)$.
\[
f_X(x) = \begin{cases} 0.1 & \text{if } x = 0 \\ 0.2 & \text{if } x = 2 \\ 0.3 & \text{if } x = 3 \\ 0.4 & \text{if } x = -2 \\ 0 & \text{otherwise} \end{cases}
\qquad
f_Y(y) = \begin{cases} 0.1 & \text{if } y = 0 \\ 0.6 & \text{if } y = 4 \\ 0.3 & \text{if } y = 9 \\ 0 & \text{otherwise} \end{cases}
\]
Since X = 2 and X = −2 both produce Y = 4, we added their probabilities together.
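The bookkeeping in this example (apply g to each outcome, then add the probabilities that land on the same output) is easy to mechanize. The sketch below is one possible way to do it; the function name is hypothetical, and it assumes that the outcome with probability 0.4 is x = −2. The resulting distribution of Y is the same whichever of the two outcomes is negative.

```python
from collections import defaultdict
from fractions import Fraction

def distribution_of_function(f_x, g):
    """Build the distribution of Y = g(X) by summing the probabilities of
    every outcome x that g sends to the same output y."""
    f_y = defaultdict(Fraction)  # missing keys start at probability 0
    for x, p in f_x.items():
        f_y[g(x)] += p
    return dict(f_y)

# Distribution of X from the example (assumption: the 0.4 outcome is x = -2).
f_x = {
    0: Fraction(1, 10),
    2: Fraction(2, 10),
    3: Fraction(3, 10),
    -2: Fraction(4, 10),
}
f_y = distribution_of_function(f_x, lambda x: x ** 2)
print(f_y)
```

Exact fractions are used instead of floats so that the sum 0.2 + 0.4 is not affected by rounding.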
The function g does not need to be algebraically defined.
Example
Let X be a discrete random variable whose outputs are integers from 1 to 100, uniformly distributed
(meaning each occurs with probability $\frac{1}{100}$). Let N give the number of digits of X. Then N has
distribution function
\[ f_N(n) = \begin{cases} \frac{9}{100} & \text{if } n = 1 \\ \frac{90}{100} & \text{if } n = 2 \\ \frac{1}{100} & \text{if } n = 3 \\ 0 & \text{otherwise} \end{cases} \]
Question 2.7.2
How Do We Compute Expected Value of a Function?
In the case of a discrete random variable, we can compute expected value directly from the distribution
function.
Example
Let X be a discrete random variable whose outputs are integers from 1 to 100, uniformly distributed.
Let N give the number of digits of X.
\[ E[N] = (1)\frac{9}{100} + (2)\frac{90}{100} + (3)\frac{1}{100} = 1.92 \]
Alternately, we could avoid using $f_N$ by directly applying the digits function to each outcome of X and
taking a weighted average.
Example
\[ E[N] = \underbrace{(1)\frac{1}{100} + \cdots + (1)\frac{1}{100}}_{9 \text{ times}} + \underbrace{(2)\frac{1}{100} + \cdots + (2)\frac{1}{100}}_{90 \text{ times}} + (3)\frac{1}{100} = 1.92 \]
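Both computations of E[N] can be reproduced in a few lines. This is a small sketch for illustration; the helper name `digits` and the variable names are choices of this example.

```python
from collections import Counter
from fractions import Fraction

outcomes = range(1, 101)   # X takes the integers 1..100
p = Fraction(1, 100)       # each outcome has probability 1/100

def digits(x):
    """Number of decimal digits of a positive integer."""
    return len(str(x))

# Way 1: build the distribution of N first, then average its outcomes.
counts = Counter(digits(x) for x in outcomes)
f_n = {n: count * p for n, count in counts.items()}
e_from_f_n = sum(n * prob for n, prob in f_n.items())

# Way 2: apply the digits function to each outcome of X directly.
e_from_f_x = sum(digits(x) * p for x in outcomes)

print(f_n, e_from_f_n, e_from_f_x)
```

The second way never constructs the distribution of N, which is the step that becomes difficult for continuous random variables.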
In general this gives us two ways to compute the expected value of a function.
Formulas
If Y = g(X) then we can compute E[Y] from $f_X$ or from $f_Y$.
\[ E[Y] = \sum_{\text{outcomes } y_i} y_i f_Y(y_i) \]
\[ E[Y] = \sum_{\text{outcomes } x_i} g(x_i) f_X(x_i) \]
Remarks
We can equate these formulas by substituting
\[ f_Y(y_i) = \sum_{g(x_j) = y_i} f_X(x_j). \]
All that remains is to distribute the $y_i$.
Both formulas will get us to the answer, but one of them skips the step of finding a distribution
function for Y .
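Written out in full, the argument in the first remark might look like the chain below. It is a sketch that uses the same symbols as the formulas above. The third equality uses the fact that $y_i = g(x_j)$ for every term of the inner sum, and the last equality holds because each outcome $x_j$ belongs to exactly one inner sum.

```latex
\[
\begin{aligned}
E[Y] &= \sum_{\text{outcomes } y_i} y_i f_Y(y_i) \\
     &= \sum_{\text{outcomes } y_i} y_i \sum_{g(x_j) = y_i} f_X(x_j) \\
     &= \sum_{\text{outcomes } y_i} \ \sum_{g(x_j) = y_i} g(x_j) f_X(x_j) \\
     &= \sum_{\text{outcomes } x_j} g(x_j) f_X(x_j)
\end{aligned}
\]
```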
In the case of a continuous random variable X, we might find it difficult to find the expected value
of Y = g(X) directly. We would need to
Find a density function $f_Y(y)$ such that $\int_a^b f_Y(y)\,dy = P(a \le g(X) \le b)$ for all a and b
Integrate $E[Y] = \int_{-\infty}^{\infty} y f_Y(y)\,dy$.
The first step is difficult for any but the simplest functions.
Fortunately, there is an integration analogue of the substitution and distribution argument for discrete
variables. This allows us to compute the average outcome of Y as an average of the values g(x), weighted
by the density of X.
Theorem
If Y = g(X) is a function of a continuous random variable X with density function $f_X(x)$, then
\[ E[Y] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx \]
Notice that the expected value of X is a special case of this theorem. In this case, we are computing
the expected value of the function g(X) = X.
Example 2.7.3
Computing the Expected Value of a Function
Consider the random variable X with density function
\[ f_X(x) = \begin{cases} \frac{1}{9}x^2 & \text{if } 0 \le x \le 3 \\ 0 & \text{if } x > 3 \text{ or } x < 0 \end{cases} \]
What is the expected value of $e^X$?
Solution
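Since we want $E[e^X]$, our function is $g(x) = e^x$.
\[
\begin{aligned}
E[e^X] &= \int_{-\infty}^{\infty} e^x f_X(x)\,dx \\
&= \int_{0}^{3} \frac{1}{9}x^2 e^x\,dx \\
&= \frac{1}{9}x^2 e^x\Big|_{0}^{3} - \int_{0}^{3} \frac{2}{9}xe^x\,dx && \text{by parts: } u = \tfrac{1}{9}x^2,\ dv = e^x\,dx,\ du = \tfrac{2}{9}x\,dx,\ v = e^x \\
&= \frac{1}{9}x^2 e^x\Big|_{0}^{3} - \frac{2}{9}xe^x\Big|_{0}^{3} + \int_{0}^{3} \frac{2}{9}e^x\,dx && \text{by parts again: } u = \tfrac{2}{9}x,\ dv = e^x\,dx,\ du = \tfrac{2}{9}\,dx,\ v = e^x \\
&= \left( \frac{1}{9}x^2 e^x - \frac{2}{9}xe^x + \frac{2}{9}e^x \right)\Big|_{0}^{3} \\
&= \frac{5e^3 - 2}{9}
\end{aligned}
\]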
We can check whether our answer is reasonable. Since X has outcomes between 0 and 3, $e^X$ should
have outcomes between 1 and $e^3$. Our expected value should also fall in that range, and it does.
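A numerical integration gives a second check on the same answer. The following is a sketch under stated assumptions: the midpoint rule and the helper names are choices of this illustration, and the integral is taken only over [0, 3] because the density is zero elsewhere.

```python
import math

def midpoint_integral(h, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    dx = (b - a) / steps
    return sum(h(a + (i + 0.5) * dx) for i in range(steps)) * dx

def f_x(x):
    """Density from the example: x^2 / 9 on [0, 3], zero elsewhere."""
    return x * x / 9.0 if 0.0 <= x <= 3.0 else 0.0

# Sanity check that the density integrates to 1, then apply the theorem
# E[g(X)] = integral of g(x) * f_X(x) with g(x) = e^x.
total_probability = midpoint_integral(f_x, 0.0, 3.0)
expected_exp = midpoint_integral(lambda x: math.exp(x) * f_x(x), 0.0, 3.0)
exact = (5 * math.exp(3) - 2) / 9

print(total_probability, expected_exp, exact)
```

The approximation and the exact value should agree to several decimal places, and both lie between 1 and e³.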
Application 2.7.4
The Average Value of a Function
Sometimes people refer to the average value of a function without any reference to a random variable.
In this case, we understand the input variable to be uniformly distributed.
Definition
The average value of a function from x = a to x = b is the expected value of f(X), where X is
a uniform random variable on [a, b]. The density function is a constant, so we can factor it out of the
integral. We obtain the formula:
\[ f_{\text{ave}} = \frac{1}{b - a} \int_{a}^{b} f(x)\,dx. \]
The number $f_{\text{ave}}$ has geometric significance as well. The signed area under the graph y = f(x) from
x = a to x = b is
\[ \text{Area} = \int_{a}^{b} f(x)\,dx. \]
The region under the horizontal line $y = f_{\text{ave}}$ is a rectangle with equal signed area:
\[ \text{Area} = \text{width} \times \text{height} = (b - a)\left( \frac{1}{b - a} \int_{a}^{b} f(x)\,dx \right). \]
In other words, if we flattened the area under f into a rectangle, $f_{\text{ave}}$ would be its height.
Figure: The graph of y = f(x) and the constant function $y = f_{\text{ave}}$
Example 2.7.5
Computing The Average Value of a Function
Compute the average value of $f(x) = xe^{x^2}$ between x = 1 and x = 3.
Solution
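\[
\begin{aligned}
f_{\text{ave}} &= \frac{1}{3 - 1} \int_{1}^{3} xe^{x^2}\,dx \\
&= \frac{1}{2} \int_{1}^{9} \frac{1}{2} e^{u}\,du && \text{u-substitution: } u = x^2,\ du = 2x\,dx,\ \tfrac{1}{2}\,du = x\,dx \\
&= \frac{1}{4} e^{u}\Big|_{1}^{9} && x = 1 \Rightarrow u = 1,\quad x = 3 \Rightarrow u = 9 \\
&= \frac{1}{4}\left( e^{9} - e \right)
\end{aligned}
\]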
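The average value formula translates directly into code, which gives a way to check the substitution above. This is a minimal sketch; the function name and the midpoint rule are assumptions of this illustration rather than anything prescribed by the text.

```python
import math

def average_value(f, a, b, steps=200_000):
    """Approximate (1 / (b - a)) times the integral of f over [a, b]
    using the midpoint rule."""
    dx = (b - a) / steps
    total = sum(f(a + (i + 0.5) * dx) for i in range(steps)) * dx
    return total / (b - a)

approx = average_value(lambda x: x * math.exp(x * x), 1.0, 3.0)
exact = (math.exp(9) - math.e) / 4
print(approx, exact)
```

Because the function grows so quickly near x = 3, the average is far above the value of f at the midpoint x = 2, which is about 109.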
Application 2.7.6
Variance
Suppose we wanted to plan ahead for the outcome of some random variable X. We might choose
to prepare for the circumstance in which X takes on the value E[X]. This is most likely to be a good
bet, but how much effort should we expend preparing for outcomes far from E[X]? It would help to
know how likely X is to be far from E[X]. We can model this with a distance function (actually we’ll
use distance squared) and compute the expected value of the distance function.
Definition
The variance of a random variable X is the expected value of $(X - E[X])^2$. If X is continuous with
density function $f_X(x)$, we obtain the formula
\[ \int_{-\infty}^{\infty} (x - E[X])^2 f_X(x)\,dx \]
The square root of variance is the standard deviation. Standard deviation is often denoted by σ, and
variance is often denoted by $\sigma^2$.
If the expected value of $(X - E[X])^2$ is larger, then X is more likely to be far from its expected
value.
Figure: A density function with less variance and a density function with more variance
For example, we can compute the variance of X where X is a uniform random variable on [0, 8].
Solution
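Variance is the expected value of $(X - E[X])^2$, so first we need to know the number E[X]. We showed
earlier that for a uniform random variable, E[X] is the midpoint of the interval. In this case that is
$\frac{8 + 0}{2} = 4$. Armed with this value, we can compute the variance.
\[
\begin{aligned}
E\left[ (X - 4)^2 \right] &= \int_{-\infty}^{\infty} (x - 4)^2 f_X(x)\,dx \\
&= \int_{0}^{8} (x - 4)^2 \frac{1}{8 - 0}\,dx && \text{because } f_X(x) = 0 \text{ outside } [0, 8] \\
&= \frac{1}{8} \int_{0}^{8} \left( x^2 - 8x + 16 \right) dx && \text{factor out } \tfrac{1}{8} \\
&= \frac{1}{8} \left( \frac{x^3}{3} - 4x^2 + 16x \right)\Big|_{0}^{8} \\
&= \frac{1}{8} \left( \frac{512}{3} - (4)(64) + (16)(8) - 0 + 0 - 0 \right) \\
&= \frac{1}{8} \cdot \frac{128}{3} \\
&= \frac{16}{3}
\end{aligned}
\]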
Remarks
In order to solve for variance, we need to know the expected value. We may have to compute
\[ E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx. \]
Variance is larger when the area under $y = f_X(x)$ is spread farther to both sides, away from E[X].
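The two-step recipe in these remarks (first E[X], then the expected value of the squared distance from it) can be sketched numerically for the uniform example on [0, 8]. The helper name and the midpoint rule are assumptions of this illustration.

```python
def midpoint_integral(h, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    dx = (b - a) / steps
    return sum(h(a + (i + 0.5) * dx) for i in range(steps)) * dx

a, b = 0.0, 8.0
density = 1.0 / (b - a)  # constant density of a uniform variable on [a, b]

# Step 1: the expected value. Step 2: the expected squared distance from it.
mean = midpoint_integral(lambda x: x * density, a, b)
variance = midpoint_integral(lambda x: (x - mean) ** 2 * density, a, b)
std_dev = variance ** 0.5

print(mean, variance, std_dev)
```

The printed variance should be close to 16/3, and changing `a` and `b` shows how a wider interval spreads the area farther from E[X] and increases the variance.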
Section 2.7
Exercises
Summary Questions
Q1
What kind of object is a function of a random variable?
Q2
How do we compute the expected value of a random variable?
Q3
If someone mentions the “average value” of a function without mentioning what random variable
to use, what do you assume?
Q4
What function’s expected value is the variance?
2.7.1
Q5
Let X be a random variable that indicates how long from now an event will occur (in hours).
How could a random variable indicating how long until the event happens in minutes be defined
in terms of X?
Q6
Suppose the radius of a circle R is a random variable. How could we define a random variable to
express the area of the circle?
Q7
Dominic buys 200 shares of a stock for $60 each. At the end of the day, the stock is worth $V
per share, where V is a random variable. How could you express Dominic’s profit or loss from his
stock purchase with a random variable?
Q8
Suppose X is a random variable with outcomes in the range [2, 7]. What is the range of outcomes
of the random variable Y =
3
X
2
?
2.7.2
Q9
Suppose X is a random variable and Y = cX for some number c. Explain using one or more
rules of integration why E[Y ] = cE[X].
Q10
Suppose X is a random variable and Y = X + d for some number d. Explain using one or more
rules of integration why E[Y ] = E[X] + d.
Q11
Let X be a uniform random variable on [2, 5] with density function $f_X$. Write a density function
$f_Y$ for Y = 10X. Explain how your density function differs from $f_X$.
Q12
Let X be a uniform random variable on [0, 3]. Is $Y = X^2$ a uniform random variable on [0, 9]?
Provide evidence for your answer.
2.7.3
Q13
Let W be a random variable with density function
\[ f_W(w) = \begin{cases} \frac{36 - w^2}{144} & \text{if } 0 \le w \le 6 \\ 0 & \text{otherwise} \end{cases} \]
Compute E
1
W
Q14
Let T be a random variable with density function
f
T
(t) =
(
2
t
3
if 0 t 1
0 otherwise
Compute $E[T^3]$.
Q15
Let X be an exponential random variable. Compute $E[X^2]$.
Q16
Let g(x) = c be a constant function. Let X be a random variable. Compute E[g(X)].
2.7.4
Q17
Suppose that you are told that the average value of f(x) from x = a to x = b is 0.
a
What geometric information does this give you about the graph y = f(x)? Be specific.
b
Suppose you are told that f(x) is non-negative for all x. How does that affect your answer
to part a?
Q18
Suppose you know that $f(x) = \sqrt[3]{x}$ has a positive average value over [a, b]. What does this tell
you about a and b?
2.7.5
Q19
Compute the average value of $f(x) = x^2$ over [0, 3].
Q20
Compute the average value of g(x) = x sin x over [0, π].
Q21
Compute the average value of $f(x) = x^2 e^{3x}$ over [0, 2].
Q22
What happens if we try to compute the average value of $h(x) = \frac{1}{x^2}$ over [−2, 2]?
2.7.6
Q23
Compute the variance of an exponential random variable X. Note that you may already know
some components of this computation from earlier examples and exercises.
Q24
Compute the variance of a uniform random variable on [2, 7].
Q25
Let W be a random variable with density function
\[ f_W(w) = \begin{cases} \frac{36 - w^2}{144} & \text{if } 0 \le w \le 6 \\ 0 & \text{otherwise} \end{cases} \]
Compute the variance of W . I’d suggest using a computer to help with the algebra.
Q26
Let T be a random variable with density function
f
T
(t) =
(
2
t
3
if 0 t 1
0 otherwise
Compute the variance of T .
Synthesis and Extension
Q27
Let X be a random variable with density function $f_X$. Let Y = cX for some number c. Write a
formula for $f_Y$.
Q28
Compute the value b such that the average value of $f(x) = x^2$ over [0, b] is 1.
Q29
Some people compute variance using the memorized formula $\sigma^2 = E[X^2] - E[X]^2$. Explain why
this formula is equivalent to the one we gave. (This is a famous calculation, so if you can’t figure
it out, look it up and try to explain each step.)