Last week, we debunked the mythical Cult of the Expert in favor of a more complete and unbiased analysis of data. We also briefly mentioned the role that services like our own LexSemble play in developing different analytical methods. We have spoken at length about cross-validation, and why it is important to check that our decision trees generalize well beyond the data they were trained on. Now it's time to combine these ideas into a powerful new methodology called the Random Forest Method.
The Random Forest Method
The Random Forest Method is essentially a refinement of the concepts we've already covered. First, we gather data. Second, we draw many overlapping samples from that data (in practice these are bootstrap samples, drawn with replacement, which are a close cousin of the cross-validation splits we discussed). Third, we train a separate decision tree on each sample. Multiple decision trees built this way create a more dependable model than any single tree could. Even if every tree produced in this way possesses areas of weakness, the sheer number of trees means that these weaknesses will be compensated for when we aggregate every tree's differing point of view, typically by majority vote. This in turn reduces complications such as overfitting, as we have previously discussed.
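The three steps above can be sketched in a few lines of Python. This is a toy illustration, not a production implementation: the trees here are the simplest possible kind (depth-1 "stumps" that split on a single feature), and the dataset, function names, and parameters are all invented for the example.

```python
import random
from collections import Counter

def fit_stump(rows, labels):
    """Fit a depth-1 tree: pick the (feature, threshold) split that
    lets the two sides' majority labels classify the most points correctly."""
    best = None
    for f in range(len(rows[0])):
        for r in rows:
            t = r[f]
            left = [l for x, l in zip(rows, labels) if x[f] <= t]
            right = [l for x, l in zip(rows, labels) if x[f] > t]
            # Score: how many points each side's majority label gets right.
            score = sum(max(Counter(side).values()) for side in (left, right) if side)
            if best is None or score > best[0]:
                maj = lambda side: Counter(side).most_common(1)[0][0]
                best = (score, f, t,
                        maj(left) if left else maj(labels),
                        maj(right) if right else maj(labels))
    _, f, t, label_le, label_gt = best
    return lambda x: label_le if x[f] <= t else label_gt

def random_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # Step 2: draw a bootstrap sample of the data (with replacement).
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Step 3: train one tree per sample.
        trees.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    # The forest predicts by majority vote over all of its trees.
    return lambda x: Counter(tree(x) for tree in trees).most_common(1)[0][0]

# Toy data: two features, label 1 when both features are "large".
X = [(1, 1), (2, 1), (1, 2), (4, 5), (5, 4), (5, 5)]
y = [0, 0, 0, 1, 1, 1]
predict = random_forest(X, y)
```

Any single stump can only split on one feature, so it is a weak model on its own; the forest's vote is what makes the combination dependable.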
The Random Forest Method is the essential principle behind crowd-sourcing. The antithesis of the Cult of the Expert, the Random Forest Method aggregates multiple decision trees in order to find a predictive algorithm that generalizes across the widest possible range of data.
The Random Forest Method is how we arrive at robust analytical tools for a machine learning program. Similarly, crowd-sourcing is how we can develop decision trees powered by humans. LexSemble is one such tool. Like the Random Forest Method, crowd-sourcing with services like LexSemble allows a large group of people with a wide array of perspectives and expertise to break down and analyze various tasks. Each individual in the crowd can produce their own decision tree. Each tree will be different, but taken together, the forest these trees create will have far greater predictive power than any one of them alone. This predictive power can then reduce the potential for knowledge gaps that could negatively impact a company's data strategy.
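A quick simulation shows why aggregating many imperfect voters, whether trees or crowd members, outperforms any individual. The setup is a deliberately simplified assumption, not a claim about LexSemble itself: each voter is independently correct 65% of the time, and the 65% figure, trial count, and function name are all invented for illustration.

```python
import random

def majority_vote_accuracy(n_voters, p_correct=0.65, trials=2000, seed=1):
    """Estimate how often a majority vote of n_voters independent voters,
    each correct with probability p_correct, gives the right answer."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        correct_votes = sum(rng.random() < p_correct for _ in range(n_voters))
        # The ensemble is right when more than half of its voters are right.
        wins += correct_votes > n_voters / 2
    return wins / trials

single = majority_vote_accuracy(1)   # one voter: roughly its own 65% rate
crowd = majority_vote_accuracy(51)   # 51 voters: substantially higher
```

The improvement depends on the voters being independent; if everyone shares the same blind spot, their errors no longer cancel out, which is exactly why a wide array of perspectives matters.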
Next week, we will throw a few more abstract terms at you: now that we have covered the fundamentals of machine learning, it's time to take a look at some examples.
This article is part of our 7-part Intro series. The others can be found here.