I want to include the term $x$ and its square $x^2$ as predictor variables in a regression, because I assume that low values of $x$ have a positive effect on the dependent variable and high values have a negative effect. The $x^2$ term should capture the effect of the higher values. I therefore expect that the coefficient of $x$ will be positive and the coefficient of $x^2$ will be negative. Besides $x$, I also include other predictor variables.
I read in some posts here that it is a good idea to center the variables in this case to avoid multicollinearity (see, for example, "When conducting multiple regression, when should you center your predictor variables & when should you standardize them?").
Should I center both variables separately (at the mean), or should I only center $x$ and then take the square, or should I only center $x$ and include the original $x^2$?
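A minimal sketch of the three options, on simulated data. The Poisson counts, the seed, and all variable names here are assumptions made for illustration only; they are not taken from a real data set. The sketch compares how strongly the linear and the squared term are correlated under each option.

```python
import numpy as np

# Simulated count predictor (assumption: Poisson with mean 6, 500 observations)
rng = np.random.default_rng(0)
x = rng.poisson(lam=6.0, size=500).astype(float)


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


x_centered = x - x.mean()

# Option A: center x and x^2 separately at their own means
sq_separately_centered = x**2 - (x**2).mean()
# Option B: center x first, then square the centered variable
sq_of_centered = x_centered**2
# Option C: center x, keep the original x^2
sq_original = x**2

corr_a = corr(x_centered, sq_separately_centered)
corr_b = corr(x_centered, sq_of_centered)
corr_c = corr(x_centered, sq_original)

print(f"A (both centered separately): {corr_a:.3f}")
print(f"B (center, then square):      {corr_b:.3f}")
print(f"C (centered x, original x^2): {corr_c:.3f}")
```

Because a correlation does not change when a constant is subtracted from either variable, options A and C give the same correlation as the uncentered pair. Only option B, squaring the already centered variable, changes the relationship between the two terms in this simulation.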
Is it a problem if $x$ is a count variable?
In order to avoid $x$ being a count variable, I thought about dividing it by a theoretically defined area, for example 5 square kilometers. This would be somewhat similar to a point density calculation.
However, I am afraid that in this situation my initial assumption about the signs of the coefficients would no longer hold: whenever the rescaled value of $x$ (the count divided by the area) is below 1, its square would be smaller than the value itself.
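The concern about rescaling can be checked numerically. The following is a sketch under stated assumptions: the area is the same constant (5) for every observation, the data are simulated with a hypothetical true relationship $y = 1 + 0.8x - 0.05x^2 + \varepsilon$, and ordinary least squares is fitted with `numpy.linalg.lstsq`. None of these numbers come from real data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
area = 5.0  # assumed constant area, identical for all observations

# Simulated counts and response (hypothetical coefficients for illustration)
x = rng.poisson(lam=6.0, size=n).astype(float)
y = 1.0 + 0.8 * x - 0.05 * x**2 + rng.normal(scale=1.0, size=n)


def fit_quadratic(pred, response):
    """OLS fit of response on [1, pred, pred^2]; returns the 3 coefficients."""
    design = np.column_stack([np.ones_like(pred), pred, pred**2])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


coef_count = fit_quadratic(x, y)           # predictor: raw count
coef_density = fit_quadratic(x / area, y)  # predictor: count per unit area

print("raw count:", coef_count)
print("density:  ", coef_density)
```

In this setup the two fits describe the same curve. Dividing the predictor by a positive constant multiplies the linear coefficient by that constant and the quadratic coefficient by its square, so the magnitudes change but the signs do not. This holds only when the divisor is the same for all observations; if the area varied between observations, the two models would no longer be equivalent.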