Regression

{:toc}

Regression analysis is a statistical process for estimating the relationships among variables.

Correlation measures only the strength and direction of a linear relationship; it does not say how much one variable changes when the other changes.
Regression estimates the relationship itself.

Eg: a correlation of 0.7 between Y and X indicates a fairly strong positive linear association, i.e. Y tends to increase as X increases. It does not mean that Y increases 70% of the time. Regression tells us, if X increases by 1 unit, by how many units Y is expected to change.
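The difference can be illustrated with a small sketch in plain Python. The data and the helper names (`pearson_r`, `ols_slope`) are made up for illustration. The two Y series differ only in scale, so their correlation with X is the same while the regression slope differs by a factor of 10.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def pearson_r(x, y):
    # Pearson correlation: covariance scaled by both standard deviations.
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ols_slope(x, y):
    # Least-squares slope: change in y per one-unit change in x.
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical data: y_large is y_small multiplied by 10.
x = [1, 2, 3, 4, 5]
y_small = [2.1, 3.9, 6.2, 7.8, 10.0]
y_large = [10 * v for v in y_small]

r_small, r_large = pearson_r(x, y_small), pearson_r(x, y_large)
slope_small, slope_large = ols_slope(x, y_small), ols_slope(x, y_large)

print("correlation:", r_small, r_large)    # same strength of association
print("slope:", slope_small, slope_large)  # different change in Y per unit of X
```

Correlation is unaffected by rescaling Y, which is why it cannot answer the "by how many units" question; the slope can.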

Types of Data

  • Cross-sectional : data on many subjects or variables observed at a single point in time
  • Time Series : data on a single subject or variable observed at multiple points in time
  • Longitudinal / Pooled / Panel : cross-sectional data observed at multiple points in time (a combination of the two above)

Common Terms

  • Dependent Variable (Y) : the outcome the model tries to explain or predict; its value depends on the other variables

  • Independent Variable(s) (X) : the variables which are assumed to have a direct effect on the dependent variable, whether positive or negative.

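  • Linear Equation : Simply the formula of a straight line

    y = mx + c
    y: dependent variable (the predicted value)
    m: slope (change in y for a 1 unit change in x)
    x: independent variable (x co-ordinate)
    c: intercept (constant; the value of y when x = 0)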

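  • B-coefficient : if X increases by 1 unit, then the predicted Y changes by B-coeff units (an increase if the coefficient is positive, a decrease if negative), holding the other X variables constant.

  • Intercept : Value of predicted Y when all X = 0
    The intercept is the baseline value (eg: organic growth)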

  • Degrees of freedom = no of obs - no of estimated parameters = n - (k+1), where k is the number of X variables and the 1 accounts for the intercept [analogy : 5 hats]
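The terms above can be tied together in a minimal sketch that fits y = mx + c by ordinary least squares in plain Python. The data points and the function name `fit_line` are hypothetical, and a single X variable (k = 1) is assumed.

```python
def fit_line(x, y):
    # Closed-form least-squares estimates for y = m*x + c.
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    m = sxy / sxx            # slope (the B-coefficient of x)
    c = mean_y - m * mean_x  # intercept (predicted y when x = 0)
    return m, c

# Hypothetical observations.
x = [0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 6.8, 9.0]

m, c = fit_line(x, y)

n = len(x)        # number of observations
k = 1             # number of X variables
dof = n - (k + 1) # one parameter per X variable, plus the intercept

# Sum of squared errors, and the residual variance estimate that
# divides it by the degrees of freedom.
sse = sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))
residual_variance = sse / dof

print("slope:", m)
print("intercept:", c)
print("degrees of freedom:", dof)
print("residual variance:", residual_variance)
```

Here the slope answers "by how many units does Y change per unit of X", the intercept is the prediction at x = 0, and two estimated parameters leave n - 2 degrees of freedom.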

Linear regression is a minimization problem: the model is fitted by choosing the coefficients that minimize the sum of squared errors. Logistic regression is a maximization problem: the model is fitted by choosing the coefficients that maximize the likelihood of the observed data, so that the predicted probabilities fit the data as well as possible.
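The contrast between the two objectives can be sketched in plain Python. This is an illustration under assumptions, not a reference implementation: the data are made up, the function names are hypothetical, and simple gradient ascent stands in for whatever optimizer a real library would use for logistic regression.

```python
from math import exp, log

def sse(m, c, x, y):
    # Linear regression objective: sum of squared errors (to be minimized).
    return sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))

def ols(x, y):
    # Closed-form minimizer of the sum of squared errors.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    m = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return m, my - m * mx

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def log_likelihood(b0, b1, x, labels):
    # Logistic regression objective: log-likelihood (to be maximized).
    total = 0.0
    for xi, yi in zip(x, labels):
        p = sigmoid(b0 + b1 * xi)
        total += yi * log(p) + (1 - yi) * log(1 - p)
    return total

def fit_logistic(x, labels, learning_rate=0.1, steps=2000):
    # Gradient ascent: repeatedly step uphill on the log-likelihood.
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        errors = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, labels)]
        b0 += learning_rate * sum(errors)
        b1 += learning_rate * sum(e * xi for e, xi in zip(errors, x))
    return b0, b1

# Hypothetical continuous outcome for the linear model.
x_lin = [0, 1, 2, 3, 4]
y_lin = [1.2, 2.9, 5.1, 7.0, 8.8]
m, c = ols(x_lin, y_lin)
print("SSE at fitted slope:", sse(m, c, x_lin, y_lin))
print("SSE at a nearby slope:", sse(m + 0.5, c, x_lin, y_lin))  # larger

# Hypothetical binary outcome (not perfectly separable) for the logistic model.
x_log = [-2, -1, 0, 1, 2, 3]
labels = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x_log, labels)
print("log-likelihood at start:", log_likelihood(0.0, 0.0, x_log, labels))
print("log-likelihood after fitting:", log_likelihood(b0, b1, x_log, labels))  # higher
```

Moving the slope away from the least-squares value raises the squared error, while the logistic fit ends with a higher log-likelihood than it started with.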