In a previous blog post, I explained how to search for blogs from the AI domain. The idea was that the most authoritative structure on the internet that already knows about all these blogs is Google, and what the user has to do is find out how to ask Google the right way. Let us start with an example of how not to find relevant blogs on the internet. If we type in “Artificial Intelligence”, lots of websites are shown. The problem is that they talk about the mainstream term “AI”, which stands for anything and nothing. The result list is long and the knowledge on these sites is low. A second problem is that most of the content was created not by amateurs but by large newspapers with the aim of filling the empty gaps in a journal.
The better idea is to search for slightly different keywords and to be flexible in the search request to Google. How do we search the right way for AI blogs? This question is the right one, but it is hard to answer. The trick is to combine detailed keywords with the correct domain names. What we want are not results about AI in general but about “Python game AI”. And we are not interested in articles from the Guardian but in amateur blogs. A possible search request is:
("Fuzzy logic" OR "Model predictive control" OR "Forward model" OR "AI planning" OR "hierarchical task network" OR "blackboard architecture" OR "model-based reinforcement learning" OR "Learning from demonstration") (site:hypotheses.org OR site:wordpress.com OR site:blogspot.com OR site:github.io)
It uses different keywords which are combined with the OR operator, and at the same time different domain names are searched, also connected with the OR operator. The improvement over the previously posted code snippet is the “site:github.io” add-on. Github.io refers to the so-called GitHub Pages. This is a feature of the GitHub social network which allows users to publish static HTML pages on the internet at no cost. Many hobby programmers use this feature because it gives them more control over the content. In theory they can create old-school HTML pages which do not depend on WordPress-like bloatware, but are slim and without any pictures. And it seems that with this feature GitHub has fulfilled the needs of computer science students very well. High-quality content is available under this domain.
The other keywords, which restrict the search results to wordpress.com and blogspot.com, are well-known search techniques for making smaller blogs hosted on one of the major blogging websites visible.
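Because the request is just two OR-groups, it can also be assembled by a small script, which makes it easier to swap keywords or domains in and out. The following is only a minimal sketch: the function names `build_query` and `build_search_url` are my own invention, and it assumes that the Google search page accepts the request in the `q` URL parameter.

```python
from urllib.parse import urlencode

# Keywords and domains taken from the search request in this post.
KEYWORDS = [
    "Fuzzy logic",
    "Model predictive control",
    "Forward model",
    "AI planning",
    "hierarchical task network",
    "blackboard architecture",
    "model-based reinforcement learning",
    "Learning from demonstration",
]
SITES = ["hypotheses.org", "wordpress.com", "blogspot.com", "github.io"]


def build_query(keywords, sites):
    """Return one OR-group of quoted keywords followed by one OR-group of site filters."""
    keyword_group = " OR ".join(f'"{k}"' for k in keywords)
    site_group = " OR ".join(f"site:{s}" for s in sites)
    return f"({keyword_group}) ({site_group})"


def build_search_url(query):
    """Percent-encode the query (hypothetical helper, assumes the 'q' parameter)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = build_query(KEYWORDS, SITES)
    print(query)
    print(build_search_url(query))
```

The printed query can be pasted into the search box by hand, and the URL can be opened in a browser. Adding a new hosting domain is then a one-line change to the `SITES` list.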
I would guess that the code snippet for asking Google the right way is not perfect. Perhaps it is possible to adjust the keywords and the domain names a bit to get better results. In the worst case, not a single blog is shown in the result list, for example because the date range was restricted to the last week. It is then unclear whether Google hasn’t indexed the blogs yet, whether no content was posted, or whether something is wrong with the code snippet. But in general this is the right way to identify relevant AI blogs on the internet.
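One way to narrow down an empty result list is to split the big request into smaller ones and try them one at a time. If a single domain or a single keyword still returns results, the snippet itself is probably fine and the content is simply missing for the chosen date range. The sketch below only generates the smaller requests as text; the function names are hypothetical, and the keyword and domain lists are shortened for the example.

```python
def per_site_queries(keywords, sites):
    """One request per domain, each with the full OR-group of keywords."""
    keyword_group = " OR ".join(f'"{k}"' for k in keywords)
    return [f"({keyword_group}) site:{s}" for s in sites]


def per_keyword_queries(keywords, sites):
    """One request per keyword, each with the full OR-group of domains."""
    site_group = " OR ".join(f"site:{s}" for s in sites)
    return [f'"{k}" ({site_group})' for k in keywords]


if __name__ == "__main__":
    # Shortened lists, only for illustration.
    keywords = ["Fuzzy logic", "AI planning"]
    sites = ["wordpress.com", "github.io"]
    for q in per_site_queries(keywords, sites) + per_keyword_queries(keywords, sites):
        print(q)
```

Each printed line is a self-contained request that can be typed into Google with the same date range, so it becomes visible which domain or keyword contributes nothing.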