When I decided to build my first machine learning model, I realised there were a couple of fundamental things I needed to figure out first:

  • What do I want to predict?
  • Where do I get the data to train my model?

Since training data is probably the most important factor in the quality of the predictions a model can make, I decided to start by researching the open datasets I could potentially use. Some of the sources I found are: