This week's topic is decision trees. A decision tree is a supervised learning method that uses past data, meaning the conditions that led to each known outcome, to predict the outcome of a new case. For example, given past cases of people who were or were not sick, it can tell you from your current condition whether you are likely to be sick.
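To make the idea concrete, here is a minimal sketch of a decision tree that has already been built, written by hand as a nested dictionary. The conditions `fever` and `cough`, the tree itself, and the `predict` helper are all made up for illustration; they do not come from any real dataset.

```python
# Hypothetical hand-written tree: each inner node asks about one condition,
# and each leaf (a plain string) holds the final answer.
tree = {
    "question": "fever",
    "yes": {"question": "cough", "yes": "sick", "no": "not sick"},
    "no": "not sick",
}


def predict(node, patient):
    """Walk from the root down to a leaf, following the patient's answers."""
    while isinstance(node, dict):
        answer = patient[node["question"]]  # "yes" or "no"
        node = node[answer]
    return node


# Illustrative patients, described by the same conditions the tree asks about.
patient_a = {"fever": "yes", "cough": "yes"}
patient_b = {"fever": "no", "cough": "yes"}
result_a = predict(tree, patient_a)
result_b = predict(tree, patient_b)
```

Here `result_a` is `"sick"` and `result_b` is `"not sick"`, because the second patient never passes the first question about fever.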
In supervised learning, overfitting is a problem for the model. Overfitting happens when the machine learns its training data too specifically. For example, if someone gives it a picture of a familiar object but with a spot on it, the machine may fail to recognize the object, because the model fits the training examples so closely that it cannot handle anything that looks even slightly different.
There are three types of learning tasks for a machine: classification, regression, and clustering. Classification and regression are supervised, while clustering is unsupervised.
We learned about clustering a couple of weeks ago. Classification is used when the data describes objects with distinct features that set them apart, so each one can be assigned to a category, for example a type of fruit or animal. Regression is used when the data is numeric and we want to predict a value, for example the rise and fall of a market price or stock price.
As the name implies, a decision tree is a tree built from the given data. For each condition, we calculate the entropy from the yes and no counts of the data under that condition, and use it to compute the information gain. The condition with the highest information gain becomes the root, and the same calculation repeats on each branch until all the child nodes and leaves are determined.
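The root selection can be sketched in a few lines of Python. This is only a minimal illustration using the standard entropy and information gain formulas; the tiny `rows` dataset and the function names `entropy`, `information_gain`, and `best_attribute` are assumptions made up for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy of a list of labels, e.g. ["yes", "no", "yes"]."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(rows, attribute, target):
    """Entropy before the split minus the weighted entropy after it."""
    base = entropy([row[target] for row in rows])
    # Group the target labels by each value of the attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_attribute(rows, attributes, target):
    """The attribute with the highest information gain becomes the node."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Made-up data: in this toy set, fever alone decides whether someone is sick.
rows = [
    {"fever": "yes", "cough": "yes", "sick": "yes"},
    {"fever": "yes", "cough": "no", "sick": "yes"},
    {"fever": "no", "cough": "yes", "sick": "no"},
    {"fever": "no", "cough": "no", "sick": "no"},
]
root = best_attribute(rows, ["fever", "cough"], "sick")
```

In this toy data, splitting on `fever` separates the sick and not sick cases perfectly, so its information gain is 1 and it is chosen as the root, while `cough` has a gain of 0. To build the full tree, the same selection would be repeated on the rows in each branch.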