It’s pretty obvious to anyone who knows me that computers fascinate me. The hardware, the software, their uses. Everything about them intrigues me.
What tells packets where to go once they are out on the open web? How does a computer generate a random number? What allows memory to hold a persistent electrical signal? I encourage you to find out the answers to each of those in your spare time. All of it is fascinating.
One of the things I am particularly interested in is Artificial Intelligence. It just so happens that one of my favorite YouTube channels, Computerphile, has several recent videos that are extremely informative on AI. They have also covered Machine Learning and Search Engines in recent months. All worth watching. Each of the topics is somewhat related to the others and yet each is distinctly different.
Watching them got me thinking about Structured Data and how exactly the structure is given or defined. At small scale you can take a dataset, find common attributes and organize it by those criteria.
You manually set the criteria and the number of categories, then sort each item into its pile. It’s easy.
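As a rough sketch of that manual approach in Python: the records and the `kind` field below are made up purely for illustration, and `group_by` is a hypothetical helper name. You pick the attribute by hand and everything gets sorted into a pile per value.

```python
from collections import defaultdict

# Made-up sample records; the field names are assumptions for illustration.
records = [
    {"name": "apple", "kind": "fruit"},
    {"name": "carrot", "kind": "vegetable"},
    {"name": "banana", "kind": "fruit"},
]

def group_by(items, attribute):
    """Sort items into piles keyed by the value of one chosen attribute."""
    piles = defaultdict(list)
    for item in items:
        piles[item[attribute]].append(item)
    return dict(piles)

# The criterion ("kind") is chosen manually, which is what makes this easy.
piles = group_by(records, "kind")
for label, members in piles.items():
    print(label, [member["name"] for member in members])
```

This only works because every record already carries the attribute being sorted on.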
How exactly would that be done with data that has no labels or clear set of common attributes? It means taking unorganized data and indexing it, assigning labels and working out attributes. Finding better and more efficient ways of doing that is part of how Machine Learning improves.
That’s exactly what I’m going to investigate in a long-running project. Extremely efficient indexing and giving structure to random data is kind of how search engines work. There’s a strong connection between the kind of thing I want to do and how search engines provide the most relevant results for a given search term.
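To make the indexing idea a bit more concrete, here is a minimal sketch of an inverted index in Python, which maps each word to the documents that contain it. The sample documents and the function names (`tokenize`, `build_index`, `search`) are my own assumptions for illustration. Real search engines do far more than this, ranking results by relevance for a start.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents containing every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Made-up documents standing in for real data.
documents = {
    1: "Machine learning is fun",
    2: "Search engines index the web",
    3: "Learning how search engines work",
}
index = build_index(documents)
print(search(index, "search engines"))  # ids 2 and 3
print(search(index, "learning"))        # ids 1 and 3
```

The lookup never scans the documents themselves, only the index, which is the reason this kind of structure stays fast as the data grows.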
I’m going to grab my data from Twitter and store it, index it, categorize it and learn from it. The data from Twitter already has some structure to start with, but that exact structure might not be what I’m after. I want to structure it in many more ways.
I’m going to make use of what I learn in… maybe no ways at all but I’m gonna do it anyhow haha!
- Make a Twitter Bot with search capabilities.
- Store Tweets in a database.
- Index them.
- Categorize the data.
- Learn and Enjoy!
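The storage step could start out as something like the sketch below, using Python’s built-in `sqlite3` module. The table layout, the sample rows and the `find_tweets` helper are all assumptions for illustration. Fetching the Tweets from Twitter is not shown here.

```python
import sqlite3

# Made-up rows standing in for Tweets fetched from Twitter.
sample_tweets = [
    (1, "alice", "Watching Computerphile videos on AI"),
    (2, "bob", "Search engines are fascinating"),
]

# In-memory database for the sketch; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets ("
    "id INTEGER PRIMARY KEY, author TEXT NOT NULL, text TEXT NOT NULL)"
)

def store_tweets(connection, tweets):
    """Insert tweets, skipping any whose id is already stored."""
    connection.executemany(
        "INSERT OR IGNORE INTO tweets (id, author, text) VALUES (?, ?, ?)",
        tweets,
    )
    connection.commit()

def find_tweets(connection, term):
    """Return stored tweets whose text contains the term, ordered by id."""
    rows = connection.execute(
        "SELECT id, author, text FROM tweets WHERE text LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return rows.fetchall()

store_tweets(conn, sample_tweets)
store_tweets(conn, sample_tweets)  # storing the same rows again adds nothing
matches = find_tweets(conn, "search")
print(matches)
```

Keying the table on the Tweet id means the same Tweet can turn up in several searches without being stored twice.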
I hope that I’ll learn an awful lot from doing this. Probably not directly from the data I gather but definitely in terms of skills. Plus everyone needs a project to keep them focused. Some of the elements of this have been on my project list for a long time, and now is as good a time as any to make some headway.