CST 383 - Intro to Data Science | Week 7
Learning log week 7:
This week we covered encoding categorical variables, logistic regression, and overfitting. I learned that categorical variables need to be converted into numerical values before most machine learning models can use them. One-hot encoding made sense to me because it creates a separate 0/1 column for each category instead of assigning a single number per category, which could accidentally suggest an order (for example, that "red" = 3 is somehow greater than "blue" = 1).
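To check my understanding, here is a minimal sketch of one-hot encoding written by hand in plain Python. The `one_hot` helper and the `colors` list are made up for illustration; in practice a library function such as pandas `get_dummies` would do this job.

```python
def one_hot(values):
    """Return the sorted category names and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows

# Hypothetical example column
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot(colors)

print(categories)  # ['blue', 'green', 'red']
for color, row in zip(colors, encoded):
    print(color, row)
```

Each row has exactly one 1, so no category ends up "larger" than another the way it would with codes like 1, 2, 3.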
Logistic regression was a little confusing at first because the name sounds like it should predict a numerical value, but it is actually used for classification. I understand now that it predicts the probability of an outcome, such as whether a customer will churn or not. I still want more practice interpreting the coefficients and understanding exactly how they affect the predicted probability.
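To practice interpreting coefficients, here is a small sketch of how a fitted logistic regression turns features into a probability. The intercept, coefficients, and feature names (`support_calls`, `years_as_customer`) are invented for illustration and do not come from a real model.

```python
import math

def predict_probability(features, coefficients, intercept):
    """Logistic regression: sigmoid of the linear combination of features."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted values, in the order [support_calls, years_as_customer]
intercept = -1.0
coefficients = [0.8, -0.5]

customer = [2, 1]                # 2 support calls, 1 year as a customer
one_more_call = [3, 1]           # same customer with one extra support call

p = predict_probability(customer, coefficients, intercept)
p_more = predict_probability(one_more_call, coefficients, intercept)

# Odds = p / (1 - p). One extra unit of a feature multiplies the odds
# by exp(coefficient), no matter where the customer started.
odds_ratio = (p_more / (1 - p_more)) / (p / (1 - p))

print(f"churn probability: {p:.3f}")
print(f"with one more call: {p_more:.3f}")
print(f"odds ratio: {odds_ratio:.3f}, exp(0.8): {math.exp(0.8):.3f}")
```

The takeaway for me is that a coefficient shifts the log-odds by a fixed amount, so its effect on the odds is a constant multiplier, while its effect on the probability itself depends on where the prediction currently sits on the S-curve.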
The topic of overfitting also stood out to me. A model can perform very well on the training data but still perform poorly on new data. This helped me understand why we need test sets and cross-validation. One question I still have is how to tell how complex a model can be before it starts overfitting. I understand the general idea, but I think seeing more examples would help.
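Here is a small sketch of one way to approach that question: compare training and test accuracy for a more flexible and a less flexible model. It uses a hand-written k-nearest-neighbors classifier, which is my choice for illustration, and a made-up toy dataset where two training labels are deliberately flipped to act as noise.

```python
from collections import Counter

# Made-up toy data: one feature x; the true rule is "label 1 when x > 5.5".
# The training labels at x=3 and x=8 are flipped on purpose to act as noise.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (5, 0),
         (6, 1), (7, 1), (8, 0), (9, 1), (10, 1)]
test = [(1.4, 0), (2.6, 0), (3.4, 0), (4.6, 0), (5.4, 0),
        (6.6, 1), (7.6, 1), (8.4, 1), (9.6, 1)]

def knn_predict(x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(data, k):
    correct = sum(knn_predict(x, k) == label for x, label in data)
    return correct / len(data)

# Smaller k means a more flexible model that follows the training data closely.
for k in (1, 3):
    print(f"k={k}: train accuracy={accuracy(train, k):.2f}, "
          f"test accuracy={accuracy(test, k):.2f}")
```

With k=1 the model memorizes the training set, noise included, so it scores perfectly on training data and noticeably worse on the test points. With k=3 it gets the two noisy training points "wrong" but generalizes better. As far as I understand it, the practical answer to my question is to try several complexity settings and keep the one with the best held-out (or cross-validated) score, not the best training score.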