Mathematics I (Math 1132)

Study Force Academy
Durham College, Mathematics
Free
  • 42 lessons
  • 0 quizzes
  • 10 week duration

Regression

In statistics, regression is the fitting of a curve to a set of data points. Fitting a straight line to data points is called linear regression, while fitting some other curve is called nonlinear regression. In this course, you’re only expected to learn linear regression.

When the relationship between two variables is studied, say femur length (independent variable) versus height (dependent variable) in humans, you’re likely to obtain a linear relationship if the data is collected from a large enough group of people.

  • It’s assumed that one’s height depends on the length of their longest bone, the femur. Hence, femur length is the independent variable. The dependent variable goes on the vertical axis, while the independent variable goes on the horizontal axis.

After the data is collected and plotted on an x-y plane, a scatter plot is formed (sample shown below). A scatter plot is simply a plot of all of the data points.

Notice the trend: the points move almost linearly from bottom left to top right. This is called a positive correlation between the independent variable x and the dependent variable y. When the points move from top left to bottom right, that’s called a negative correlation. When neither of these occurs, there’s no correlation between the x and y variables. Sometimes you might have a prominent trend, except for one point, as shown below.

The circled point is called an outlier. Such points are often suspected of being the result of an error and are sometimes discarded.

While a data set shows one of three correlations (positive, negative, or none, as described above), the degree of scatter determines how strong the relationship is:

  • strong, positive correlation
  • weak, positive correlation
  • strong, negative correlation
  • weak, negative correlation

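The correlation coefficient r gives a numerical measure of the degree of scatter. Its value always lies between −1 and 1: values near 1 or −1 indicate a strong positive or negative correlation, while values near 0 indicate a weak correlation or none. For n data points (x, y), the correlation coefficient can be calculated using the following formula: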

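$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2} \times \sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$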
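
As a rough illustration of how the correlation coefficient might be computed from the sums of x, y, xy, x² and y², here is a minimal Python sketch. The function name `correlation_coefficient` and the sample numbers are made up for this example and are not part of the course material.

```python
import math

def correlation_coefficient(xs, ys):
    """Correlation coefficient r, computed from the raw sums.

    Hypothetical helper for illustration only.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # All x values (or all y values) are identical, so r is undefined.
        raise ValueError("r is undefined when x or y has no spread")
    return numerator / denominator


# Made-up data lying exactly on a rising line, so r should be very close to 1.
r = correlation_coefficient([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(r, 6))
```

Reversing the y values in the sample data makes the points fall from top left to bottom right, and r comes out very close to −1 instead.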

The following video explains how this formula works:

Line of Best Fit

In this unit, you’ve already learned how to graph linear functions. In other words, you were given a first-degree equation and learned how to spot special features, such as the slope and y-intercept, to graph it. But what if you were working with raw data, such as the data in a scatter plot? In that case, you’d draw a line of best fit by eye to approximate a line that fits the data. The video below explains how this is done:

Method of Least Squares

If you’re looking for a more accurate method to generate an equation, perhaps because the points are too scattered or you’re not that good at eyeballing the line of best fit, you can use the method of least squares to find the slope and y-intercept. This method chooses the line so that the sum of the squares of the residuals (the vertical distances between the data points and the approximating line) is a minimum, hence the name. Two separate formulas are used, one to calculate the slope and the other to calculate the y-intercept. Interestingly, these formulas are derived using calculus.
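
For readers curious about the calculus step, here is a brief sketch of the standard derivation. The notation S(m, b) for the sum of squared residuals and the subscripted points (x_i, y_i) are introduced here for illustration and do not appear in the lesson itself. The idea is to set both partial derivatives of S to zero and solve the resulting pair of linear equations for m and b.

```latex
% Sum of squared residuals for the line y = mx + b
S(m, b) = \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)^2

% Set both partial derivatives to zero
\frac{\partial S}{\partial m} = -2 \sum x_i \,( y_i - m x_i - b ) = 0
\qquad
\frac{\partial S}{\partial b} = -2 \sum ( y_i - m x_i - b ) = 0

% Rearranged, these give two linear equations in m and b
m \sum x^2 + b \sum x = \sum xy
\qquad
m \sum x + b\,n = \sum y
```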

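Slope: $$m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

y-intercept: $$b = \frac{\sum x^2 \sum y - \sum x \sum xy}{n\sum x^2 - \left(\sum x\right)^2}$$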
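
To see how the slope and y-intercept formulas might be applied in practice, here is a minimal Python sketch. The function name `least_squares_line` and the sample data are assumptions made for this example, not part of the course material.

```python
def least_squares_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b.

    Hypothetical helper for illustration only.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Both formulas share the same denominator.
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # All x values are identical, so the best-fit line would be vertical.
        raise ValueError("slope is undefined when all x values are equal")

    m = (n * sum_xy - sum_x * sum_y) / denominator
    b = (sum_x2 * sum_y - sum_x * sum_xy) / denominator
    return m, b


# Made-up points that lie exactly on y = 2x + 1.
m, b = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(m, b)  # prints 2.0 1.0
```

With real, scattered data the points will not lie exactly on the line, but the same two formulas still give the line that minimizes the sum of squared residuals.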

 
