Showing posts from May, 2026

CST 383 - Intro to Data Science | Week 4

Learning log 4: This week I learned more about how to compare two discrete or categorical variables, especially using statistics and visualizations. One thing that stood out to me was how useful crosstabs are for organizing categorical data. Before this, I mostly thought about looking at one variable at a time, but now I can see how comparing two variables can show patterns that are not obvious at first. I also learned that choosing the right kind of plot matters a lot. For example, bar plots are useful for categorical variables, and grouped or stacked bar plots can help compare categories across another variable. I am starting to understand why we should think about the type of variables first before choosing a visualization. Something I am still trying to get better at is knowing when to use counts, fractions, or percentages. Sometimes the code is not too hard, but deciding what the graph should actually show is harder. I also want more practice with crosstabs and the normalize opt...
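The counts, fractions, and percentages question above can be sketched with `pd.crosstab` and its `normalize` argument. This is a minimal sketch, not the course's lab code: the DataFrame, its column names (`year`, `major`), and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: column names and values are assumptions for illustration.
df = pd.DataFrame({
    "year": ["junior", "junior", "senior", "senior", "senior", "junior"],
    "major": ["CS", "Math", "CS", "CS", "Math", "CS"],
})

# Raw counts: how many rows fall into each (year, major) pair.
counts = pd.crosstab(df["year"], df["major"])

# Row fractions: each row sums to 1, so majors are compared within a year.
row_frac = pd.crosstab(df["year"], df["major"], normalize="index")

# Percentages of the whole table: all cells together sum to 100.
overall_pct = pd.crosstab(df["year"], df["major"], normalize="all") * 100

print(counts)
print(row_frac.round(2))
print(overall_pct.round(1))
```

With matplotlib installed, calling `counts.plot(kind="bar")` should draw a grouped bar plot and `counts.plot(kind="bar", stacked=True)` a stacked one. Which table to plot depends on whether the comparison is about raw sizes or about proportions within a group.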

CST 383 - Intro to Data Science | Week 3

Learning log 3: This week I learned more about how to describe and visualize different types of variables. A lot of the week focused on continuous variables, like using density plots, histograms, and box plots to understand the shape of data. I also learned that the way you choose bins in a histogram can change how the distribution looks, so graphs are not just automatic answers. They need choices that make sense. The famous distributions topic helped me see why distributions matter in data science. The normal distribution is starting to make more sense, especially the idea of mean, standard deviation, and how values are spread out. I still find probability density a little confusing because the y-axis is not exactly a probability by itself. I understand that the area under the curve matters, but I need more practice with that idea. For two continuous variables, correlation and visualization were useful because they show how two things can move together. I understand that correlation...
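The points about bin choice, density, and correlation can be sketched with NumPy alone. This is a rough illustration under stated assumptions: the sample is randomly generated (mean 50, standard deviation 10), and the bin counts 5 and 50 are arbitrary choices, not values from the course.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
x = rng.normal(loc=50, scale=10, size=1000)  # made-up, roughly normal sample

# Same data, two bin choices: the apparent shape depends on the bins.
counts_coarse, edges_coarse = np.histogram(x, bins=5)
counts_fine, edges_fine = np.histogram(x, bins=50)

# With density=True the bar heights are densities, not probabilities.
# A probability comes from area: height * bin width.
density, edges = np.histogram(x, bins=50, density=True)
widths = np.diff(edges)
total_area = (density * widths).sum()  # should be 1 up to rounding error

# Correlation between two continuous variables (y is built from x plus noise).
y = 2 * x + rng.normal(scale=5, size=x.size)
r = np.corrcoef(x, y)[0, 1]

print(counts_coarse)
print(round(total_area, 6))
print(round(r, 3))
```

A single bar height in `density` can be larger than 1 when the bins are narrow, which is one way to see why the y-axis of a density plot is not a probability by itself.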

CST 383 - Intro to Data Science | Week 2

Learning log 2: This week I learned more about pandas and how it is used to work with data in a more organized way. Last week we used NumPy arrays, but this week pandas Series and DataFrames made the data feel easier to understand because the rows and columns can have labels. I learned that a Series is like one column of data with an index, while a DataFrame is like a full table with rows and columns. One topic that made more sense after the labs was indexing. With a pandas Series, I can use dictionary-style indexing like mpg['Ana'], or I can use .loc to get values by label. With DataFrames, I practiced getting columns, rows, and specific values. I also learned that pandas lines up data by index label, not just by position. That was important in the series lab because one student was missing from the distance data, so pandas returned NaN for that student's calculation. I also learned about aggregation, which seems like one of the most important skills so far. Simple aggregation uses func...
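The label-based indexing and index alignment described above can be sketched as follows. Only the name `mpg` and the label `'Ana'` come from the post; the other names, the numbers, and the `distance` Series are hypothetical stand-ins for the lab data.

```python
import pandas as pd

# Hypothetical values: only mpg and the label 'Ana' appear in the post.
mpg = pd.Series({"Ana": 30.0, "Ben": 25.0, "Cruz": 40.0})
distance = pd.Series({"Ana": 60.0, "Cruz": 20.0})  # Ben is missing on purpose

# Two ways to look up one value by label.
by_key = mpg["Ana"]      # dictionary-style indexing
by_loc = mpg.loc["Ana"]  # explicit label-based indexing

# Arithmetic aligns on index labels, not on position.
# Ben has no distance, so his result is NaN.
gallons = distance / mpg

# Simple aggregation: collapse the whole Series to one number.
mean_mpg = mpg.mean()

# A DataFrame is a table; .loc takes a row label and a column label.
df = pd.DataFrame({"mpg": mpg, "distance": distance})
ana_distance = df.loc["Ana", "distance"]

print(gallons)
print(mean_mpg)
```

Building the DataFrame from two Series also aligns them by label, so the row for Ben keeps his `mpg` and gets NaN for `distance`.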

CST 383 - Intro to Data Science | Week 1

Learning log 1: This week I learned the basics of the course, the homework policy, and how Python is used in data science. The biggest topic for me was NumPy. I learned that NumPy arrays are useful because they make it easier to work with a lot of numbers at once. Instead of writing loops for everything, I can use vectorized operations like tuition > 12000 or x * 10, and NumPy applies the operation to every element of the array. One thing that made more sense after doing the labs was slicing versus fancy indexing. Slicing is like x[:3] or x[-3:], where I am taking a range of values. Fancy indexing is when I choose specific positions, like x[[0, 2, 4]]. At first these felt similar, but now I see that slicing is more about a contiguous section and fancy indexing is more about picking exact positions. Boolean masks were probably the most important idea for me this week. A mask is an array of True and False values that can filter another array. For example, tuition[tuition > 12000] gives only the tuition val...
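The expressions mentioned above can be run side by side in a short sketch. The names `tuition` and `x` come from the post, but the array contents are made-up values chosen for illustration.

```python
import numpy as np

# Made-up values; only the names tuition and x come from the post.
tuition = np.array([9500, 12500, 14000, 11000, 13250])
x = np.arange(6)  # array([0, 1, 2, 3, 4, 5])

# Vectorized operations: no explicit loop, applied element by element.
mask = tuition > 12000  # boolean array, one True/False per element
scaled = x * 10

# Slicing: a contiguous range of positions.
first3 = x[:3]
last3 = x[-3:]

# Fancy indexing: an explicit list of positions.
picked = x[[0, 2, 4]]

# Boolean mask: keep only the elements where the mask is True.
expensive = tuition[mask]  # same result as tuition[tuition > 12000]

print(mask)
print(first3, last3, picked)
print(expensive)
```

One difference worth knowing: a basic slice like `x[:3]` is a view onto the original array, while fancy indexing and boolean masking return copies, so writing into `first3` would change `x` but writing into `picked` would not.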