[Updated March 2020] status: infrequently twiddling with content; confidence: unsolicited internet advice; importance: low
So, I set out to learn to make machines learn, whether they like it or not (or at least what the fuss about other people learning to make machines learn is all about). What did I do, and what would I recommend you do if you like learning so very very much that you want machines to do it for you so that you never have to do it again?
Here are some thoughts from someone who knew almost nothing about computers and had never taken calculus before deciding to try to get up to speed on the state of the art of modern AI.
But first, why?
“Why am I doing this?” is one of the best questions ever composed in human language, and yet is woefully under-asked. So let’s ask it.
Why am I doing this? I am doing this because I believe artificial intelligence, if even a fraction of the promise of recent progress is borne out over the long term, is likely to be one of the most important technologies developed in my lifetime, if not ever. Given this, I am very excited about the historic opportunity to help make sure that the development of this massively powerful technology will bring the greatest possible blessings to our world, with the least possible risks. One thing that seems certain to be useful in this pursuit, no matter my particular role, is having a certain level of fluency with the technical underpinnings of the tools being used.
Recommended:
Get comfortable-ish coding in Python first.
My very first efforts to learn technical things about AI included enrolling in Russian- and Chinese-language Coursera courses on machine learning. That was not a great strategy, but it did imbue me with a certain confidence later on to jump into courses in English, thinking, “meh, it’s in my native language, how hard can it be?” And indeed, it wasn’t actually as hard as I had once imagined! But eventually, the homework called for using something called a “for loop”. This was a beast I had not encountered in my travels yet, and verily did it stop me in my tracks.
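For the similarly uninitiated: a for loop is just a way of running the same chunk of code once for each item in some collection, or a fixed number of times. A minimal sketch, with entirely made-up example data:

```python
# Run the indented block once per item in the list.
courses = ["intro Python", "multivariate calculus", "machine learning"]
for course in courses:
    print(f"Enrolled in: {course}")

# The other classic use: "do this N times" via range().
total = 0
for i in range(5):  # i takes the values 0, 1, 2, 3, 4
    total += i
print(total)  # prints 10
```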
Ok, back to the drawing board. I ended up “learning” Python through a strange chimera of the first half of Coursera’s UMich Python specialization, and the latter half of the Udacity Python course. The prof for the UMich course, Chuck Severance, is a great goofy-uncle type and explains things in a very accessible way, but the content for the back half of the course seemed not super relevant for my exact purposes. The Udacity course content was good, but I hope whichever UX designer made the call to have all their content sliced up into “snackable” ~2 minute video segments has solemnly reconsidered all their life choices leading up to that decision. Seriously, who does that? (Whoever first used the word “snackable”, and whoever decided that the snacking paradigm should metastasize to forms of consumption besides eating, should probably join them in the self-flagellation.)
If you want to be able to actually do things in ML, rather than just understand them, you will need at least enough familiarity with programming in general, and Python in particular, to be able to dig into documentation and Google around for whatever information you lack. Furthermore, computer code, unsurprisingly, forms an important part of the sociolect of the ML community, cropping up everywhere from puns to repurposed metaphors to pseudo-code used in place of formal mathematical notation. Getting to that level also seems to be a surprisingly small up-front investment.
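To give a flavor of what I mean by pseudo-code standing in for math notation: the textbook statement “nudge the parameters a small step against the gradient” very often shows up written as something like the snippet below. This is a hypothetical sketch using NumPy; the function name and toy problem are mine, not anyone’s official API.

```python
import numpy as np

def gradient_descent_step(theta, grad, learning_rate=0.01):
    """One update: move the parameters a small step against the gradient."""
    return theta - learning_rate * grad

# Toy usage: minimize f(theta) = sum(theta ** 2), whose gradient is 2 * theta.
theta = np.array([3.0, -2.0])
for _ in range(100):
    theta = gradient_descent_step(theta, grad=2 * theta, learning_rate=0.1)
print(theta)  # ends up very close to [0.0, 0.0]
```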
That math stuff? Yeah you might want that.
Nobody who is generally numerate should be afraid of machine learning. Apparently rocket science is in fact literally rocket science, but I have a sneaking suspicion that many technical fields, upon a scratch of the surface, might not be as awe-inspiringly intimidating as the Latin- and Greek-derived terminology would have you believe. I’ve gotten surprisingly (to me) far into the territory of “generally having an idea what’s going on” with almost no remedial renovations to the rather sorry edifice of my math education up until now.
However, I do think things would have gone more smoothly had I been a bit more familiar with some fields of math before diving in head first. At the very least, I would have avoided some minor frustration in encountering terms like “the Jacobian”(!), and had more n+1 experiences, and thus learned more, along the way. I’ve recently started going through this Mathematics for Machine Learning specialization from Imperial College London on Coursera. As is my wont, I jumped right into the middle course on multivariate calculus, for no good principled reason other than “bah, I know how to multiply matrices plenty well already!”… I recommend starting at the beginning.
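In case it spares someone else the same minor frustration: despite the imposing name, the Jacobian is just the matrix collecting all the first-order partial derivatives of a vector-valued function. Roughly, in my own paraphrase of the standard definition (nothing specific to that course):

```latex
% Jacobian of a function f : R^n -> R^m, evaluated at a point x
J_f(x) =
\begin{pmatrix}
\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n}
\end{pmatrix}
```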
Recommended with caveats:
Just read a paper! (Or maybe a blog?)
I originally had this in the “Recommended” section but reconsidered based on comments from colleagues. Problems with reading papers, from the point of view of someone just starting to learn ML, include: (1) they can be obscurantist, since they are often optimized for impressiveness rather than readability or honesty, and (2) very few papers are worth reading for a neophyte, and it’s hard to tell which are which. Following from (2) is the phenomenon whereby most truly important papers spawn blog posts, YouTube videos, and other explainer content as their significance becomes clear over time, so you often don’t even need to read the rare few papers that would be worth reading.
However, there are benefits to reading papers as well. Eventually, if you intend to inhabit or sojourn in the frontiers of the field, you will need to be comfortable reading papers, as they do represent a major medium of communication. There is a lot of knowledge encrypted in papers, and it can be exciting to dive directly into such primary documents rather than relying on someone else’s interpretation and repackaging.
Besides that, however, reading papers is a good way to test your comprehension of terminology. A paper is fundamentally a place where some new concept is built out of the building blocks of the concepts of the past. In trying to pick these pieces up and construct something new out of them, it will become very clear which ones you have a grasp on, and which ones you do not. Even better, you will be encountering concepts you are not familiar with embedded in a context that is meaningful to you.
So, my advice is: be on the lookout for papers that seem exciting, or that seem to address a specific question you are wondering about. For example, I recently realized that, while I was familiar with the idea of batches, I did not understand how or why different batch sizes would affect the training and performance of a neural net. I started Googling around and found a paper that specifically discussed this topic, so I was very excited to try to extract what knowledge I could from it. In the process, I also learned about other concepts such as stochastic gradient descent and eigenvalues.
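To make the batch-size question concrete, here is roughly what minibatch gradient descent looks like in plain NumPy. This is a toy linear-regression sketch of my own (all names and numbers invented for illustration), not anything from that paper: the batch_size argument controls how many examples feed into each gradient estimate, so smaller batches give noisier but more frequent updates, while larger batches give smoother but fewer updates per epoch.

```python
import numpy as np

def train_linear_model(X, y, batch_size=32, learning_rate=0.1, epochs=20):
    """Minibatch SGD on mean squared error for a simple linear model y ≈ X @ w."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(epochs):
        indices = np.random.permutation(n_samples)           # reshuffle every epoch
        for start in range(0, n_samples, batch_size):
            batch = indices[start:start + batch_size]
            X_b, y_b = X[batch], y[batch]
            grad = 2 * X_b.T @ (X_b @ w - y_b) / len(batch)  # MSE gradient on this minibatch
            w -= learning_rate * grad
    return w

# Toy data where the "true" weights are [2.0, -3.0]; try batch_size=1 vs. batch_size=128
# and watch how the updates change.
rng = np.random.default_rng(0)
X = rng.normal(size=(512, 2))
y = X @ np.array([2.0, -3.0]) + rng.normal(scale=0.1, size=512)
print(train_linear_model(X, y, batch_size=8))
```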
Not recommended:
Andrew Ng’s “Machine Learning” Coursera course.
You don’t want to learn Octave. And you don’t need to learn Octave. Invest in yourself and just pay the $50/mo for his newer, improveder deeplearning.ai specialization.
Bet you thought I was about to throw shade, huh?