Showing posts from June, 2026

CST 383 - Intro to Data Science | Week 5

Learning log 4: This week we learned more about how important preprocessing is before building a machine learning model. One thing that stood out to me is that missing data is not always obvious. Sometimes it shows up as actual NA values, but other times it is hidden behind placeholder values like 0 or “information requested,” depending on the dataset. That made me realize that cleaning data is not just a technical step; it also requires thinking carefully about what the values actually mean. I also learned why scaling matters, especially for models like KNN. Since KNN uses distance to compare points, features measured on larger numeric scales can dominate the distance if the data is not scaled. This helped me understand why preprocessing and modeling are connected instead of being separate tasks. The train/test split and cross-validation topics were useful too. I understand that the test set should be saved for checking how well the model works on new data, while cross-validation helps compare models or hyp...
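
To make the points about hidden missing values and scaling more concrete, here is a small sketch in plain Python (standard library only, not any particular data science library). Everything in it is an assumption for illustration: the toy rows are made up, the idea that an income of 0 means "not reported" is hypothetical, and the helper names `mark_missing`, `min_max_scale`, and `nearest` are my own. It only finds a single nearest neighbor, which is the distance step KNN relies on, not a full KNN classifier.

```python
import math

# Hypothetical toy data (made up for illustration): age in years, income in dollars.
# Assumption: in this dataset an income of 0 means "not reported", not a real zero.
rows = [
    {"age": 25, "income": 50_000},
    {"age": 60, "income": 51_000},
    {"age": 27, "income": 54_000},
    {"age": 45, "income": 90_000},
    {"age": 40, "income": 0},
]

def mark_missing(rows, column, sentinels):
    """Return copies of the rows with sentinel values in `column` replaced by None."""
    return [
        {**r, column: None if r[column] in sentinels else r[column]}
        for r in rows
    ]

def min_max_scale(rows, columns):
    """Return copies of the rows with each listed column rescaled to [0, 1]."""
    scaled = [dict(r) for r in rows]
    for c in columns:
        lo = min(r[c] for r in rows)
        hi = max(r[c] for r in rows)
        for r in scaled:
            r[c] = (r[c] - lo) / (hi - lo)
    return scaled

def nearest(rows, query_index, columns):
    """Index of the row closest to rows[query_index] by Euclidean distance."""
    query = [rows[query_index][c] for c in columns]
    others = [i for i in range(len(rows)) if i != query_index]
    return min(
        others,
        key=lambda i: math.dist(query, [rows[i][c] for c in columns]),
    )

# Step 1: expose the hidden missing value, then drop the incomplete row.
cleaned = mark_missing(rows, "income", {0})
complete = [r for r in cleaned if r["income"] is not None]

# Step 2: compare the nearest neighbor of row 0 before and after scaling.
cols = ["age", "income"]
raw_neighbor = nearest(complete, 0, cols)
scaled_neighbor = nearest(min_max_scale(complete, cols), 0, cols)

print(raw_neighbor)     # 1: the 60-year-old, because income differences swamp age
print(scaled_neighbor)  # 2: the 27-year-old, once both features count equally
```

On the raw numbers, a $1,000 income gap counts far more than a 35-year age gap, so the 25-year-old's "nearest" neighbor is the 60-year-old. After min-max scaling, the 27-year-old with a similar income becomes the nearest neighbor, which is closer to what I would expect. Dropping the incomplete row is only one option here; filling in the missing value is another.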