How a Neural Network is like Munna Bhai & Classifying the Largest Number
It took me a good week or so after going through the first 3 videos of the FastAI course for it to click in my head what was going on in a neural network. And now the basics seem embarrassingly simple. On the side, the past few weeks, I've been doing algorithm and data structure problems for fun*, and I thought it might be interesting to see if a neural network could solve some of them.
 And so, I started out trying to predict the largest contiguous sum in an array.
 But to define it better, I'd make sure the array only had 100 numbers.
 But 10 numbers are easier to eyeball, and so the array would only have 10 numbers.
 Shortly after I realized that debugging this would take me too much time, and so I had to think of an easier problem.
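(For the record, the shelved problem has a tidy one-pass classical solution, Kadane's algorithm. This sketch is mine, not anything the network ever learned:)

```python
def max_subarray_sum(nums):
    # Kadane's algorithm: O(n) largest sum over all contiguous subarrays.
    best = current = nums[0]
    for x in nums[1:]:
        # Either extend the running subarray or start fresh at x.
        current = max(x, current + x)
        best = max(best, current)
    return best

max_subarray_sum([-2, 1, -3, 4, -1, 2, 1, -5, 4])  # 6, from [4, -1, 2, 1]
```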
 Okay, how about finding the largest number in a list?
 But a regression problem would most likely spit out arbitrary real numbers, and I'd have to use metrics like RMSE, which don't make much sense for a question like this.
 And so I decided I'll do the easiest thing possible:
 Given two numbers, predict the larger number.
And so, I learnt Keras, which looked easy enough.
I came up with how the network should look on paper. We all know deep learning is just matrix multiplication passed through a nonlinear function several times, and so a single layer looked about right. I didn't need any more than one layer because it's a trivial problem. And it was!
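The idea can be sketched in plain NumPy (one matrix multiplication plus a softmax, trained with gradient descent on cross-entropy); the sizes and hyperparameters below are illustrative guesses, not the exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Training data: pairs of numbers; class 1 means the second one is larger.
x = rng.uniform(-100, 100, size=(5000, 2))
y = (x[:, 1] > x[:, 0]).astype(int)

# One layer: logits = x @ W + b, then softmax.
W = np.zeros((2, 2))
b = np.zeros(2)

lr = 0.01
onehot = np.eye(2)[y]
for _ in range(500):
    p = softmax(x @ W + b)
    grad = (p - onehot) / len(x)   # d(cross-entropy)/d(logits), averaged
    W -= lr * x.T @ grad
    b -= lr * grad.sum(axis=0)

acc = (softmax(x @ W + b).argmax(axis=1) == y).mean()
```

One linear layer is enough because "is the second number bigger?" is linearly separable: the sign of x[1] - x[0] decides it.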
Except:
It wasn't confident about most small numbers (say, numbers with 2 decimal places), no matter what I did. I trained on small numbers exclusively, and yet a lot of the time it was only about 50.000000001% sure, though correct nonetheless.
Why it's not confident on smaller numbers:
Well, it's because of my activation function, the softmax.
For small numbers, the probability distribution is very different than for bigger numbers.
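Concretely, if the two inputs reach the softmax roughly unscaled (an assumption about this particular one-layer network), the output confidence depends only on the gap between the logits, so small inputs can never produce a confident split:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax for a 1-D array of logits.
    e = np.exp(z - z.max())
    return e / e.sum()

# Confidence depends on the gap between logits, not on being correct.
print(softmax(np.array([0.03, 0.01])))  # tiny gap  -> roughly [0.505, 0.495]
print(softmax(np.array([30.0, 10.0])))  # large gap -> essentially [1.0, 0.0]
```

So a correct answer at 50.000000001% confidence is exactly what a softmax hands you when the pre-softmax gap is tiny.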
A friend who does DL at Google convinced me it's alright that it isn't confident, as long as it's right.

Aaand here's why neural networks are like Munna Bhai:
 The neural networks of today are not how we learn. They're primitive function approximators that only work on the kind of data they're trained on; they don't generalize well beyond it.
 That's a lot like Munna Bhai from the movie Lage Raho Munna Bhai, where he sees a projection of Gandhi after studying him for weeks nonstop, but that figment of his imagination can only answer questions Munna himself has studied.
One cringeworthy post a year hurts no one.

* No one really does algorithms and data structures for fun. We all know why people do them ;)