CST 383 - Intro to Data Science | Week 1
Learning log 1:
This week I learned the basics of the course, the homework policy, and how Python is used in data science. The biggest topic for me was NumPy. I learned that NumPy arrays are useful because they make it easier to work with a lot of numbers at once. Instead of writing a loop for everything, I can use a vectorized operation like tuition > 12000 or x * 10, and NumPy applies the operation to every element of the array.
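Here is a small sketch of what vectorized operations look like next to a loop. The tuition and x values are made up for illustration and are not the lab data.

```python
import numpy as np

# Hypothetical values, invented for this example (not from the course labs)
tuition = np.array([9500, 12500, 15000, 11000, 13200])
x = np.array([1, 2, 3])

# Vectorized: the comparison and the multiplication run on every element
above = tuition > 12000   # array of True/False, one per tuition value
scaled = x * 10           # each element multiplied by 10

# The same multiplication written as an explicit Python loop
scaled_loop = np.array([value * 10 for value in x])

print(above)
print(scaled)
```

Both versions of the multiplication give the same array. The vectorized form is just shorter to write and read.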
One thing that made more sense after doing the labs was slicing versus fancy indexing. Slicing is like x[:3] or x[-3:], where I am taking a range of positions. Fancy indexing is when I pass a list of specific positions, like x[[0, 2, 4]]. At first these felt similar, but now I see that slicing is about a contiguous section of the array and fancy indexing is about picking exact positions, which do not have to be next to each other.
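A quick side-by-side of the two, using a made-up array x (an assumption for this example):

```python
import numpy as np

# Hypothetical array, chosen so positions and values are easy to tell apart
x = np.array([10, 20, 30, 40, 50, 60])

# Slicing: a contiguous range of positions
first_three = x[:3]    # positions 0, 1, 2
last_three = x[-3:]    # the final three positions

# Fancy indexing: a list of exact positions, in any order
picked = x[[0, 2, 4]]  # positions 0, 2, and 4

print(first_three, last_three, picked)
```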
Boolean masks were probably the most important idea for me this week. A mask is an array of True and False values that can filter another array. For example, tuition[tuition > 12000] gives only the tuition values above 12000. This feels useful for data science because real datasets are usually too big to check one row at a time.
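To make the mask idea concrete, here is a minimal sketch. The tuition numbers are invented for the example, and the variable names mask and expensive are my own.

```python
import numpy as np

# Hypothetical tuition values (not the lab dataset)
tuition = np.array([9500, 12500, 15000, 11000, 13200])

# Step 1: build the mask, one True/False per element
mask = tuition > 12000

# Step 2: use the mask to keep only the elements where it is True
expensive = tuition[mask]

# True counts as 1, so summing the mask counts the matching values
count = mask.sum()

print(mask)
print(expensive)
print(count)
```

Writing tuition[tuition > 12000] does both steps in one expression.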
I was a little confused by 2D arrays at first, especially remembering that column 0 is the first column, not column 1. For example, X[:, 2] means all rows from the third column, and X[:, 3] means all rows from the fourth column. After doing the 2D lab, I understand that the comma separates rows and columns.
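The row-and-column indexing can be sketched with a small made-up matrix. X here is an assumed 3 by 4 array, not the one from the lab.

```python
import numpy as np

# Hypothetical 3x4 array:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
X = np.arange(12).reshape(3, 4)

third_col = X[:, 2]    # all rows, column index 2 (the third column)
fourth_col = X[:, 3]   # all rows, column index 3 (the fourth column)
first_row = X[0, :]    # row index 0, all columns
one_value = X[1, 2]    # second row, third column

print(third_col, fourth_col, first_row, one_value)
```

The part before the comma always selects rows and the part after it selects columns.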
One question I still have is when it is better to use a NumPy array compared to a pandas DataFrame. In the 2D lab, the data started as a DataFrame but then became a NumPy array. I understand NumPy is good for numeric operations, but pandas seems easier when columns have names. I feel like I understand the basics, but I still need more practice reading NumPy expressions quickly.
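As a rough sketch of the DataFrame-to-array step, this is how I understand the conversion. The column names and numbers are made up, and the lab may have done the conversion differently.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame; column names and values are invented for this example
df = pd.DataFrame({
    "tuition": [9500, 12500, 15000],
    "enrollment": [1200, 3400, 800],
})

# pandas: select a column by name
tuition_by_name = df["tuition"]

# Convert to a NumPy array; the column names are dropped
X = df.to_numpy()

# NumPy: select the same column by position
tuition_by_position = X[:, 0]

print(X.shape)
print(tuition_by_position)
```

After the conversion I have to remember which position each column had, which is part of why pandas feels easier when the columns have names.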