Learning, Machine Learning

So, I set out to learn to make machines learn, whether they like it or not. What did I do, and what would I recommend you do if you like learning so very very much that you want machines to do it for you so that you never have to do it again?

Here are some thoughts from someone who knew almost nothing about computers at all before deciding to try to get up to speed on the state of the art of modern AI. This is a living document that I’ll add to and revise as I go along.

But first, why?

“Why am I doing this?” is one of the best questions ever composed in human language, and yet is woefully under-asked. So let’s ask it.

Why am I doing this? I am doing this because I believe artificial intelligence, if even a fraction of the promise of recent progress is borne out over the long term, is likely to be one of the most important technologies developed in my lifetime, if not ever. Given this, I am very excited about the historic opportunity to help make sure that the development of this massively powerful technology brings the greatest possible blessings to our world, with the least possible risk. One thing that seems certain to be useful in this pursuit, no matter my particular role, is a certain level of fluency with the technical underpinnings of the tools being used.


Recommended:

Get comfortable-ish coding in Python first.

My very first efforts to learn technical things about AI included enrolling in Russian- and Chinese-language Coursera courses on machine learning. That was not a great strategy, but it did imbue me with a certain confidence to jump into courses in English later on, thinking, “meh, it’s in my native language, how hard can it be?” And indeed, it wasn’t actually as hard as I had once imagined! But eventually, the homework called for using something called a “for loop”. This was a beast I had not yet encountered in my travels, and verily did it stop me in my tracks.

OK, back to the drawing board. I ended up “learning” Python through a strange chimera of the first half of Coursera’s UMich Python specialization and the latter half of the Udacity Python course. The prof for the UMich course, Chuck Severance, is a great goofy-uncle type and explains things in a very accessible way, but the content for the back half of the course seemed not super relevant for my exact purposes. The Udacity course content was good, but I hope whichever UX designer made the call to have all their content sliced up into ~2-minute video segments has solemnly reconsidered all their life choices leading up to that decision. Seriously, who does that?

In any case, it is critical to have at least enough familiarity with programming in general, and Python in particular, to be able to dig into the documentation and Google around for whatever information you lack. It also seems to be a surprisingly small up-front investment.

Recommended with caveats:

Just read a paper!

I originally had this in the “Recommended” section but reconsidered based on comments from colleagues at OpenAI. For someone learning ML, reading papers has two problems: (1) they can be obscurantist, as they are often optimized for impressiveness rather than readability or honesty, and (2) very few papers are worth reading for a neophyte, and it’s hard to tell which are which. Following from (2), most truly important papers will spawn blog posts, YouTube videos, and other kinds of explainer content as their significance becomes clear over time, so you often don’t even need to read the rare few papers that would be worth reading.

However, there are benefits to reading papers as well. Eventually, if you intend to inhabit or sojourn in the frontiers of the field, you will need to be comfortable reading papers, as they do represent a major medium of communication. There is a lot of knowledge encrypted in papers, and it can be exciting to dive directly into such primary documents rather than relying on someone else’s interpretation and repackaging.

Besides that, however, reading papers is a good way to test your comprehension of terminology. A paper is fundamentally a place where some new concept is built out of the building blocks of the concepts of the past. In trying to pick these pieces up and construct something new out of them, it will become very clear which ones you have a grasp on, and which ones you do not. Even better, you will be encountering concepts you are not familiar with embedded in a context that is meaningful to you.

So, my advice is: be on the lookout for papers that seem exciting, or seem to address a specific question you are wondering about. For example, recently I realized that, while I was familiar with the idea of batches, I did not understand how or why different batch sizes would affect training and performance of a neural net. I started Googling around, and found a paper that specifically discussed this topic, so I was very excited to try to extract what knowledge I could from it. In the process, I also learned about other concepts such as stochastic gradient descent and eigenvalues.
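If you want to poke at the batch-size question yourself before (or instead of) reading a paper, here is a minimal sketch of mini-batch stochastic gradient descent in plain Python. To be clear, this is my own toy illustration and not something taken from the paper I mentioned: the data (a noisy line with slope 3.0 and intercept 1.0), the learning rate, the epoch count, and the function names `make_data` and `train` are all made up for the example.

```python
import random


def make_data(n=200, seed=0):
    """Toy dataset: y = 3x + 1 plus a little Gaussian noise (made-up numbers)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def train(xs, ys, batch_size, epochs=100, lr=0.1, seed=0):
    """Fit y = w*x + b with mini-batch SGD on mean squared error.

    Returns the learned w and b, plus how many parameter updates were made.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    updates = 0
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the data in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = 0.0
            grad_b = 0.0
            for i in batch:
                error = (w * xs[i] + b) - ys[i]
                grad_w += 2.0 * error * xs[i]
                grad_b += 2.0 * error
            # Average the gradient over the batch, then take one step.
            w -= lr * grad_w / len(batch)
            b -= lr * grad_b / len(batch)
            updates += 1
    return w, b, updates


if __name__ == "__main__":
    xs, ys = make_data()
    for batch_size in (1, 32, 200):
        w, b, updates = train(xs, ys, batch_size)
        print(f"batch_size={batch_size}: w={w:.3f}, b={b:.3f}, updates={updates}")
```

The thing to notice is the trade-off: with the same number of passes over the data, a small batch size means many noisy updates, while a large one means few, smoother updates. Real neural nets are far messier than a straight line, so treat this as a way to build intuition rather than as evidence about any particular result.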

Not recommended:

Andrew Ng’s “Machine Learning” Coursera course.

You don’t want to learn Octave. And you don’t need to learn Octave. Invest in yourself and just pay the $50/mo for his newer, improveder deeplearning.ai specialization.

Bet you thought I was about to throw shade, huh?
