# With k-1 dummy variables, why do we need a constant?

In lesson 5, Jeremy mentioned that when setting up dummy variables for tabular data, we include a constant if we use k-1 dummy variables, whereas with k dummy variables we don't need one. Why do we need a constant with k-1 dummy variables?

Hi,

Let’s consider an example where we have a table of values with dependent variable `income` (y) and independent variables `age` (x1) and `marital status` (x2). `age` takes numerical values and `marital status` can be single, married, or divorced.

Let us introduce `k = 3` dummy variables for `marital status`: x21 = 1 if single, x22 = 1 if married, and x23 = 1 if divorced (each is 0 otherwise). Our regression function will look like:

`y = a1*x1 + a21*x21 + a22*x22 + a23*x23`
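Here is a small NumPy sketch of this encoding, using made-up ages and statuses (the data and variable names are hypothetical). It builds the three dummy columns, checks that they sum to 1 in every row, and shows that appending a column of ones to the design matrix does not increase its rank, so a separate constant would be redundant next to all k dummies.

```python
import numpy as np

# Hypothetical toy data, made up for illustration: age (x1) and marital status (x2).
ages = np.array([25.0, 32.0, 47.0, 51.0, 38.0, 29.0, 60.0, 44.0])
status = ["single", "married", "divorced", "married",
          "single", "divorced", "married", "single"]
categories = ["single", "married", "divorced"]

# One 0/1 column per category: x21 (single), x22 (married), x23 (divorced).
dummies = np.array([[1.0 if s == c else 0.0 for c in categories] for s in status])

# Each row has exactly one 1, so the dummy columns always sum to 1.
assert np.all(dummies.sum(axis=1) == 1.0)

# Design matrix for y = a1*x1 + a21*x21 + a22*x22 + a23*x23 (no constant).
X_k = np.column_stack([ages, dummies])

# A column of ones equals x21 + x22 + x23, so adding it does not raise the rank.
X_k_const = np.column_stack([np.ones(len(ages)), X_k])

print(X_k.shape[1], np.linalg.matrix_rank(X_k))              # 4 4
print(X_k_const.shape[1], np.linalg.matrix_rank(X_k_const))  # 5 4
```

With five columns but rank 4, the coefficients of the model with both a constant and all three dummies are not uniquely determined.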

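Because every person has exactly one marital status, x21 + x22 + x23 = 1 in every row, so the three dummies together already act as a constant and no separate one is needed. If we choose to use k - 1 = 2 dummy variables instead, we can substitute x23 = 1 - x21 - x22 into the equation above and rewrite it as: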

`y = a1*x1 + (a21 - a23)*x21 + (a22 - a23)*x22 + a23`

or, writing a21' = a21 - a23 and a22' = a22 - a23,

`y = a1*x1 + a21'*x21 + a22'*x22 + a23`
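As a quick numerical check of this rewrite, here is a sketch with NumPy and made-up incomes (data and names are hypothetical). It fits the k-dummy model without a constant and the (k-1)-dummy model with a constant by ordinary least squares, then compares the coefficients and the predictions.

```python
import numpy as np

# Hypothetical toy data (made up): age x1, marital status, and income y.
ages = np.array([25.0, 32.0, 47.0, 51.0, 38.0, 29.0, 60.0, 44.0])
status = ["single", "married", "divorced", "married",
          "single", "divorced", "married", "single"]
income = np.array([30.0, 45.0, 52.0, 61.0, 40.0, 35.0, 70.0, 48.0])

x21 = np.array([s == "single" for s in status], dtype=float)
x22 = np.array([s == "married" for s in status], dtype=float)
x23 = np.array([s == "divorced" for s in status], dtype=float)

# Model A: k = 3 dummies, no constant.
X_a = np.column_stack([ages, x21, x22, x23])
coef_a = np.linalg.lstsq(X_a, income, rcond=None)[0]
a1, a21, a22, a23 = coef_a

# Model B: k - 1 = 2 dummies (divorced dropped) plus a column of ones.
X_b = np.column_stack([ages, x21, x22, np.ones(len(ages))])
coef_b = np.linalg.lstsq(X_b, income, rcond=None)[0]
b1, a21_p, a22_p, const = coef_b

# Same model in different coordinates: each line should print True.
print(np.isclose(b1, a1))              # same age slope
print(np.isclose(a21_p, a21 - a23))    # a21' = a21 - a23
print(np.isclose(a22_p, a22 - a23))    # a22' = a22 - a23
print(np.isclose(const, a23))          # constant = a23
print(np.allclose(X_a @ coef_a, X_b @ coef_b))  # identical predictions
```

The constant column in model B equals x21 + x22 + x23, so both design matrices span the same column space and least squares finds the same fitted values; only the parameterization differs.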

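Now the coefficient a23 becomes the bias (constant) term, which we have to estimate. It is the intercept for the dropped category (divorced), while a21' and a22' measure how single and married people differ from that baseline. If we dropped the constant as well, the model would force the divorced intercept to zero, so it would no longer be equivalent to the k-dummy model above.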
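To see what is lost without the bias, this NumPy sketch (made-up data, hypothetical names) compares the least-squares error of the k-1 model with and without a constant. The no-constant model is the same model with a23 fixed at 0, so its error can never be lower; on this data it is strictly higher because the best-fitting divorced intercept is not zero.

```python
import numpy as np

# Hypothetical toy data (made up): age x1, marital status, and income y.
ages = np.array([25.0, 32.0, 47.0, 51.0, 38.0, 29.0, 60.0, 44.0])
status = ["single", "married", "divorced", "married",
          "single", "divorced", "married", "single"]
income = np.array([30.0, 45.0, 52.0, 61.0, 40.0, 35.0, 70.0, 48.0])

x21 = np.array([s == "single" for s in status], dtype=float)
x22 = np.array([s == "married" for s in status], dtype=float)


def sse(X, y):
    """Sum of squared residuals of the least-squares fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)


# k - 1 = 2 dummies plus a constant: the divorced intercept a23 is estimated.
sse_with_const = sse(np.column_stack([ages, x21, x22, np.ones(len(ages))]), income)
# k - 1 = 2 dummies, no constant: divorced rows are forced onto y = a1*x1.
sse_no_const = sse(np.column_stack([ages, x21, x22]), income)

print(sse_with_const, sse_no_const)
```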