What Are The Bad Things About Machine Learning?

By Kimberly Cook | Mar 6, 2019

There are a number of really unfortunate consequences of getting caught up in the spell of machine learning. I speak as a long-time sufferer of this ailment, having spent 36 long years with this "disease" (I wrote my first ML program in 1982). As it happens, every generation of brain researchers, from the work of the 1950s to the most recent wave of deep learning and ML, has been infected by this disease and trapped in this mode of thinking.

In every field of science, our most cherished hopes have been dashed, and I'll argue below, by analogy, that the same must be true of ML. The famous psychoanalyst Sigmund Freud once wrote that there have been three great advances in human thought, each of which dashed a cherished illusion:
  • Believing we are at the center of the universe: Copernicus and Galileo dashed these hopes, at great personal risk to their lives, since earlier such "blasphemers" were burned alive at the stake. By current estimates, we are one tiny planet revolving around an ordinary star; there are hundreds of billions of stars in our Milky Way galaxy, and as many as two trillion such galaxies. Much as we might care about the next US Presidential Election, or about racking up the maximum number of hits on Instagram, the universe does not care. We are absolutely inconsequential in the grand scheme of things.

  • Assuming that humans are "divine" creatures, distinct from other animals on the planet: Darwin dashed these hopes, and while school boards in Kansas and other US states try, 150 years after Darwin, to throw him out of the school curriculum each year in favor of "intelligent design" (whatever that means), it is an established fact beyond any doubt that we share a common ancestor with the apes, and share huge genetic similarity with millions of other species on the planet. That simple but astonishing fact, strangely enough, may be the single biggest hope for extending human life on this planet, since breakthrough medicines for treating our most basic ailments can be developed by recognizing and exploiting the genetic similarities among species.

  • Freud claimed the mantle for the last advance: dashing the vain hope that humans are rational beings. Freud thought his life's work decisively disproved the illusion that humans act rationally. Much of the work in AI, economics, psychology, and other areas still clings to this illusion, despite overwhelming evidence against it. Behavioral economics is finally realizing that it might be better to actually study how humans make decisions rather than assuming they act rationally. AI has yet to make this intellectual leap, and work in reinforcement learning, to take one area, still assumes that humans act to maximize expected utility and therefore behave rationally.

OK, in a similar vein, what are some of the most common fallacies that ML researchers believe in? If you find yourself nodding in agreement with the first statement in any of the bulleted paragraphs below, consider yourself diagnosed with "ML-itis", for which there is no simple cure other than a concerted effort to challenge your most cherished beliefs and hopes (as the scientists above were forced to do).

  • Believing that every problem in AI, science, business, and entertainment is best approached through an ML solution: this is a "paradigm" best exemplified by deep learning. Whether it is computer vision, speech recognition, NLP, digital marketing, robotics, social interactions, financial investing, you name it, the DL paradigm assumes that a network can be designed to fit some unknown function, given enough data. With every iteration, it seems that the "promised land" is just around the corner, and with a little bit more math, and a lot of GPU power, the magical solution will appear.
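The "fit some unknown function, given enough data" claim can be made concrete with a minimal sketch (numpy only; the target function, the layer width, and every other detail here are illustrative assumptions, not anyone's actual method): a one-hidden-layer network with random ReLU features and a fitted linear readout can drive training error on an "unknown" function essentially to zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# The "unknown" target function the network is supposed to recover.
def f(x):
    return np.sin(3 * x)

x = rng.uniform(-1, 1, (200, 1))
y = f(x)

# One hidden layer of random ReLU features plus a fitted linear readout:
# the "a network can fit any function" claim in its simplest form.
W = rng.normal(size=(1, 500))
b = rng.normal(size=500)
H = np.maximum(x @ W + b, 0)                    # hidden activations
w_out, *_ = np.linalg.lstsq(H, y, rcond=None)   # fit only the output layer

pred = H @ w_out
print(f"train MSE: {float(np.mean((pred - y) ** 2)):.6f}")
```

That training error collapses this easily is precisely why low error, by itself, is such weak evidence that the paradigm has found the right solution.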

  • Evaluating a desired machine learning model based on its fit to some training or test data: another common and hard-to-avoid misapprehension among ML researchers is that the method producing the lowest error on training data (or test data drawn from the same distribution) must be the best approach. So, hypothetically, if a deep learning (or whatever) approach performs, say, 3% better on MNIST than some other approach, e.g., a random forest, then the lower-error approach is intrinsically better. In other words, fit to training or test data trumps all other criteria.
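A toy sketch of why this is a misapprehension (numpy only; the data-generating process and the degree-15 polynomial are illustrative assumptions): two models can look comparable on held-out data drawn from the training distribution, yet behave very differently the moment the distribution shifts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Underlying truth is linear; we observe it with noise.
x_train = rng.uniform(0, 1, 50)
y_train = 2 * x_train + rng.normal(0, 0.1, 50)
x_test = rng.uniform(0, 1, 50)           # same distribution as training
y_test = 2 * x_test + rng.normal(0, 0.1, 50)
x_shift = rng.uniform(1, 2, 50)          # shifted distribution
y_shift = 2 * x_shift + rng.normal(0, 0.1, 50)

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

linear = np.polyfit(x_train, y_train, deg=1)    # matches the true structure
wiggly = np.polyfit(x_train, y_train, deg=15)   # high-capacity curve fit

# On held-out data from the same distribution both look reasonable...
print("in-distribution :", mse(linear, x_test, y_test), mse(wiggly, x_test, y_test))
# ...but under a modest distribution shift the high-capacity fit collapses.
print("shifted         :", mse(linear, x_shift, y_shift), mse(wiggly, x_shift, y_shift))
```

Ranking the two models by in-distribution test error alone tells you nothing about which one captured the actual structure of the problem.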

  • Assuming that for every given problem, there has to be an ML solution, regardless of what computational learning theory has shown: more than five decades ago, E. Mark Gold established the futility of trying to induce grammars from data alone, even given unlimited computation and data. While linguists took Gold's work to heart and sought to develop models that don't assume grammars can be induced from data, ML researchers largely ignored the results from their own community, and have continued to search for the "magic solution", e.g., LSTMs and other sequence models.

  • Wanting a completely domain-independent solution to the problem of how the brain works, according to which any approach that is developed in one domain (e.g., convolutional networks for vision) must automatically also be the best approach in every other domain (e.g., deep reinforcement learning, speech recognition, robotics, etc.). The fact that decades of neuroscience research has produced significant evidence for the modularity of the brain has done little to dampen the enthusiasm for the hope that there is one magical architecture, and one magical solution, that will produce the best result in each and every domain.

The Nobel-prize-winning economist Ronald Coase made two important observations that are worth recalling. In his Warren Nutter lecture in 1981, and in an earlier talk at the University of Virginia in the 1960s, he said:

  • But a theory is not like an airline or bus timetable. We are not interested simply in the accuracy of its predictions. A theory also serves as a base for thinking. It helps us to understand what is going on by enabling us to organize our thoughts. Faced with a choice between a theory which predicts well but gives us little insight into how the system works and one which gives us this insight but predicts badly, I would choose the latter, and I am inclined to think that most economists would do the same.

  • If you torture the data enough, nature will always confess.

In other words, just because a particular ML paradigm does better on some test data sets does not automatically imply it is the right approach. A great deal depends on other criteria, as Coase articulates above. He also cautioned against building your entire research paradigm around one dataset (e.g., MNIST).
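Coase's "torture the data" quip has a precise modern counterpart in multiple-comparisons abuse, sketched below (numpy only; the sample and feature counts are illustrative assumptions): screen enough pure-noise predictors against a pure-noise target, and something will always "confess".

```python
import numpy as np

rng = np.random.default_rng(1)

# 1000 candidate "predictors", all pure noise, and a pure-noise target.
n_samples, n_features = 50, 1000
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)

# Sample correlation of each feature with the target.
corrs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])
best = float(np.abs(corrs).max())

# There is no real signal anywhere, yet the winning feature looks impressive.
print(f"strongest spurious correlation: {best:.2f}")
```

Report only the winning feature and you have a publishable-looking "discovery" manufactured entirely from noise, which is exactly the torture Coase had in mind.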

How do you overcome the dangers of succumbing to these illusions about the power of learning from data? Sadly, the longer you have spent in the ML field, the harder the recovery will be. Try reminding yourself each and every day that machine learning, that is, knowledge acquisition from data in the absence of explicit programming, depends heavily on a specific problem formulation, and that there are many problems for which no machine learning solution may be possible.

It is no accident that the vast majority of species on earth, from baby lizards in the Galapagos that must survive hordes of marauding snakes before finding refuge in the sea, to newborn wildebeest on the African plains that must get up and walk almost immediately, are largely programmed at birth. Evolution is fully capable of programming us when necessary. It stands to reason, then, that much of our neural hardware contains pre-programmed routines, with some parameters to be adjusted from data, rather than trillions of randomly assigned parameters that must all be set by backpropagation. If you can recognize this simple fact, then congratulations! Your recovery has begun…

Source: HOB