You may already know that traditional cybersecurity solutions are no longer effective on their own, or that prescriptive analytics is a critical development for modern industry, but that knowledge only gets you so far. How do you go from recognizing a need for machine learning in your organization to actually implementing a solution, particularly when everyone and their brother is throwing around the term “machine learning” until it seems almost meaningless? How do you separate the wheat from the chaff, and the real machine learning companies from the bandwagon-jumpers looking to cash in on the latest trend? There are plenty of companies that promise machine learning solutions, but not all of them are the real deal.

Luckily, we’ve prepared a toolbox to help you with this exact problem. When you’re deciding whether to work with a company that claims to do machine learning, ask it the following five questions. A true machine learning company will be able to answer them well. The fakes? Not so much.

1. What do you look for in a data scientist when you hire?

These days, a “data scientist” can be anyone from a former postdoctoral researcher in computational astrophysics to somebody who happens to be very crafty with Microsoft Excel. Because data science is such a new field, the role has yet to be clearly defined, so you need to look beneath the surface to identify which candidates can actually add value to a company. A good machine learning company will look for at least three major skill sets in a data scientist, all of which are necessary to build scalable products:

  1. A data scientist must have good knowledge of machine learning algorithms
  2. A data scientist must be statistically inclined (traditionally, this comes from master's or PhD-level training)
  3. A data scientist must be a capable programmer

2. What components of your product use machine learning?

Any product that markets itself as a machine learning product should contain one key feature: adaptive models that learn from data to solve a problem. What most clearly separates machine learning from traditional statistical analysis or heuristics-based systems is that machine learning is designed to adapt to the data it ingests. As a result, each deployment of a machine learning product should behave slightly differently, because each one learns from its own data.
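
To make “adapts to the data it ingests” concrete, here is a minimal sketch, not how any particular product works. It assumes a hypothetical one-feature linear model (`OnlineLinearModel`, a name invented for illustration) that updates its parameters with stochastic gradient descent each time it sees data. The two “deployments” run identical code but are fed different made-up data, so they end up making different predictions, which is the behavior described above.

```python
# Hypothetical model for illustration: a one-feature linear model, y ≈ w * x + b,
# updated online with stochastic gradient descent (SGD) on squared error.
class OnlineLinearModel:
    def __init__(self, learning_rate: float = 0.01) -> None:
        self.w = 0.0
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def partial_fit(self, xs: list[float], ys: list[float], epochs: int = 1) -> None:
        """Update the parameters using whatever data this deployment ingests."""
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                error = self.predict(x) - y
                # Gradient of 0.5 * error**2: d/dw = error * x, d/db = error.
                self.w -= self.lr * error * x
                self.b -= self.lr * error


# Two deployments of the same code, each fed its own (made-up) data.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
deployment_a = OnlineLinearModel()
deployment_a.partial_fit(xs, [2 * x for x in xs], epochs=1000)  # data follows y = 2x
deployment_b = OnlineLinearModel()
deployment_b.partial_fit(xs, [5 - x for x in xs], epochs=1000)  # data follows y = 5 - x

print(round(deployment_a.predict(10.0), 2))  # close to 20
print(round(deployment_b.predict(10.0), 2))  # close to -5
```

A fixed heuristic (for example, a hard-coded rule) would return the same answer in both deployments; the learned model does not, because its parameters come from the data.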

3. How does your product deal with messy data?

Messy data, meaning incomplete, missing, or skewed data, is an unfortunate reality of the world we live in. Anybody who tells you otherwise is probably in finance (an industry that has “clean data” down to an art form). Because of that, any machine learning product worth its salt should have mechanisms built in to clean messy data, or at least to handle it gracefully instead of failing or raising false alarms. Without this functionality, users of the product will frequently receive incorrect predictions or results.
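
As a rough illustration of what “mechanisms built in to clean messy data” can mean, the sketch below handles one numeric column. The function name `clean_column` and both techniques are assumptions chosen for simplicity: missing values (`None` or NaN) are filled with the median of the observed values, and extreme outliers are clipped to within a few median absolute deviations (MADs) of the median. The cutoff of 5 MADs is arbitrary. Real products may use very different strategies, such as model-based imputation.

```python
import math
import statistics
from typing import Optional


def clean_column(values: list[Optional[float]], max_mads: float = 5.0):
    """Impute missing entries with the median and clip extreme outliers.

    Returns the cleaned list plus a small report, so downstream code can log
    data-quality issues instead of failing outright.
    """
    observed = [v for v in values if v is not None and not math.isnan(v)]
    if not observed:
        raise ValueError("column has no usable values")
    median = statistics.median(observed)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = statistics.median(abs(v - median) for v in observed)
    lo, hi = median - max_mads * mad, median + max_mads * mad

    cleaned, n_imputed, n_clipped = [], 0, 0
    for v in values:
        if v is None or math.isnan(v):
            cleaned.append(median)  # fill the gap with a robust typical value
            n_imputed += 1
        elif mad > 0 and not (lo <= v <= hi):
            cleaned.append(min(max(v, lo), hi))  # pull the outlier back to the bound
            n_clipped += 1
        else:
            cleaned.append(v)
    return cleaned, {"imputed": n_imputed, "clipped": n_clipped}


raw = [10.0, 12.0, None, 11.0, float("nan"), 1000.0, 9.0]
cleaned, report = clean_column(raw)
print(cleaned)  # [10.0, 12.0, 11.0, 11.0, 11.0, 16.0, 9.0]
print(report)   # {'imputed': 2, 'clipped': 1}
```

The median and MAD are used here instead of the mean and standard deviation because a single wild value (the `1000.0` above) would drag those statistics toward itself and hide the very outlier the check is meant to catch. When the MAD is zero (most values identical), the sketch skips clipping rather than flag everything.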

4. How are artificial intelligence and machine learning different?

To be an effective machine learning company, it is imperative to have a solid foundational understanding of the theoretical space you work in. Artificial intelligence is the umbrella term, and signifies computers or machines exhibiting human-like intelligence. Artificial General Intelligence (AGI) is a concept within AI in which computers or machines fully think and act like sentient beings. This is what you read about in science fiction novels, and some believe it could become a reality before the end of this century.

Machine learning is a subset of AI, and has very little to do with AGI. Machine learning is the algorithmic practice of learning from data, and it is typically used to build narrow systems that learn to perform one very specific task. Because its inputs and outputs are constrained, machine learning excels at specific tasks, but it will not adapt to new types of data or problem sets without the guidance of a data scientist. For more information on AI and machine learning, check out this article.

5. Do you use deep learning? Explain it to me.

Almost everybody says they use deep learning, but few know what deep learning actually means. The reality is that deep learning is simply a subset of machine learning: an extension of neural networks. To be called “deep,” a neural network must contain multiple hidden layers between its input and output layers. Building one is not an overly complex task for a data scientist, since commoditized open-source packages make deep learning widely accessible, but any machine learning company should be able to define exactly what it is and how it uses it. For more information, and a great read on neural networks and deep learning, check out this article.
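
For readers who want to see what “multiple hidden layers” means in practice, here is a minimal sketch of a feedforward network's forward pass in plain Python. The layer sizes, the ReLU activation, and the random weight initialization are illustrative assumptions, and all function names are hypothetical. Training (backpropagation) is deliberately left out; that is the part open-source packages such as PyTorch or TensorFlow handle for you.

```python
import math
import random


def relu(v: float) -> float:
    return max(0.0, v)


def identity(v: float) -> float:
    return v


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W @ inputs + b)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def init_network(layer_sizes, seed=0):
    """Random weights scaled by 1/sqrt(fan-in); zero biases."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, x):
    activations = x
    for i, (weights, biases) in enumerate(layers):
        # Hidden layers use ReLU; the final (output) layer is left linear.
        act = identity if i == len(layers) - 1 else relu
        activations = dense(activations, weights, biases, act)
    return activations


# 3 input features -> hidden layer of 4 -> hidden layer of 4 -> 1 output.
LAYER_SIZES = [3, 4, 4, 1]
net = init_network(LAYER_SIZES)
n_hidden = len(LAYER_SIZES) - 2  # layers that are neither input nor output
print(n_hidden)                          # 2
print(forward(net, [0.5, -1.0, 2.0]))    # a single-element list
```

With `LAYER_SIZES = [3, 4, 1]` the same code would describe a network with one hidden layer, which is usually called shallow; adding hidden layers is what makes it deep.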

So there you have it: five questions any machine learning company should be able to answer. If the company you’re talking to has good answers, you can move ahead with confidence. If not? Take your business elsewhere.