There are a few essential items you need to solve a problem with machine learning: a straightforward problem to solve, a large amount of data that illustrates the problem, and a person who can organize that data so machine learning can be applied to it. Oh, and a lot of computing power. It seems straightforward, right? Not so fast! Clearing those hurdles can be challenging. Machine learning can help in many situations, and although it could significantly ease many difficulties in education, not all problems fit the mold.
As an example, let’s look at how machine learning has been used for photo recognition. Identifying objects in a photograph is a straightforward task for a human. The average person would have difficulty recognizing cancer in an X-ray or CT scan, but identifying a wine glass or a dog in a photo is far more manageable. So let’s keep it simple.
We can easily have people look at thousands of pictures and tag (or label) them with the objects that appear: this picture has a dog; this picture has a cat. Data scientists know how to organize tagged photos and use machine learning algorithms to build models that learn from those human-provided labels. A data scientist “trains” a machine learning model to recognize items in a picture by feeding the algorithm many labeled examples. As it trains on more cases, the model becomes more adept at finding objects until, eventually, it can be faster, and sometimes more accurate, than the humans who taught it.
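To make “training” a little more concrete, here is a deliberately tiny Python sketch. It assumes each photo has already been reduced to a short list of numbers (features); the feature values and tags below are hypothetical, made up purely for illustration. The model “learns” by averaging the tagged examples for each label, then labels a new photo by whichever average it sits closest to. Real photo-recognition systems use deep neural networks and far more data; this nearest-centroid approach only illustrates the idea of learning from labeled examples.

```python
import math


def train(examples):
    """Average the feature vectors for each label (a nearest-centroid model)."""
    sums = {}
    counts = {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        # Add this example's features to the running total for its label.
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    # The "model" is just one average vector per label.
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(model, features):
    """Return the label whose average vector is closest to the new example."""
    return min(model, key=lambda label: math.dist(model[label], features))


# Hypothetical human-tagged examples: (features, label).
tagged_photos = [
    ([0.9, 0.2], "dog"),
    ([0.8, 0.3], "dog"),
    ([0.2, 0.9], "cat"),
    ([0.3, 0.8], "cat"),
]

model = train(tagged_photos)
print(predict(model, [0.85, 0.25]))  # prints "dog"
```

Adding more tagged examples refines each average, which is a (very) simplified picture of why more labeled data tends to produce a better model.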
Anyone who has used Apple Photos or Google Photos knows the result of this technology. Instead of spending hours and hours looking for pictures of Aunt Matilda for her birthday album, you teach an application what Aunt Matilda looks like with a few pictures, and the app finds the rest (most of the time).
Straightforward Problems
Machine learning cannot solve every problem. At least not yet. A rule of thumb is: “If you can teach an intern to do a repetitive task, you can probably use machine learning to help with that task.” For instance, you can train most adults to drive a car, obey a set of traffic rules, and avoid hitting things. The progress toward driverless cars powered by machine learning illustrates this point: driving is a task on its way to being solved with machine learning.
Another example is finding new antibiotics. Researchers at MIT recently showed that they could use data on the characteristics of existing antibiotics, together with machine learning, to predict which other compounds might work as new antibiotics. Similarly, hedge funds have used ML for years to predict future prices from past financial data.
The main idea is that the problem and its data need to be somewhat structured and repeatable. Given enough time, a person could use the existing data to predict likely future outcomes.
Large Quantities of Clean, Labeled Data
Labeling (or tagging) just a few pictures of Aunt Matilda is not enough data to train a machine learning model on its own. You only have to tag a few because Apple and Google have already trained their models on thousands (if not millions) of photos. To train a model, you generally need a large amount of example data covering a range of distinct outcomes. Some people use a rule of thumb that more than 5,000 samples are required; however, the real threshold depends on the problem you’re solving. Training an ML program to recognize blood cell anomalies, for example, might take thousands of images, while teaching a car to drive through New York City would require millions.
For machine learning to work, the data also needs to be clean and accurately labeled. For example, let’s say I’m trying to predict the number of COVID-19 patients who will arrive at hospitals in San Francisco County next week. That prediction is essential for estimating the staff, beds, and supplies I will need across all hospitals. My data comes from different sources that use different formats: some tables name the same fields differently, and some use words instead of numbers. In some cases, the size and scope of the data collected are inconsistent from one source to the next.
All of the columns must contain consistent data before we can perform any kind of automated data analysis. Machine learning is lousy at taking completely unstructured data lacking uniformity and making sense out of it. That’s why you need a data scientist.
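As a rough illustration of what “consistent columns” means in practice, the Python sketch below merges patient-count records from two hypothetical sources that name the same field differently and sometimes spell numbers out as words. The field names, the alias mapping, and the sample records are all invented for illustration; a real pipeline would typically use a library such as pandas and much more careful validation.

```python
# Hypothetical mapping from each source's column names to one shared name.
FIELD_ALIASES = {
    "admissions": "patients",
    "patient_count": "patients",
    "patients": "patients",
    "hospital": "hospital",
    "facility": "hospital",
}

# Number words that might appear instead of digits (illustrative, not exhaustive).
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def to_int(value):
    """Convert an int, a digit string, or a word listed in NUMBER_WORDS to an int."""
    if isinstance(value, int):
        return value
    text = str(value).strip().lower()
    if text.isdigit():
        return int(text)
    if text in NUMBER_WORDS:
        return NUMBER_WORDS[text]
    raise ValueError(f"cannot interpret {value!r} as a number")


def normalize(record):
    """Rename fields to shared names and make the patient count numeric."""
    clean = {}
    for key, value in record.items():
        name = FIELD_ALIASES.get(key.strip().lower())
        if name is None:
            continue  # drop columns we don't recognize
        clean[name] = to_int(value) if name == "patients" else str(value).strip()
    return clean


# Two hypothetical sources with different column names and formats.
source_a = [{"Facility": "Hospital A", "Admissions": "12"}]
source_b = [{"hospital": "Hospital B", "patient_count": "seven"}]

rows = [normalize(r) for r in source_a + source_b]
print(rows)
# [{'hospital': 'Hospital A', 'patients': 12}, {'hospital': 'Hospital B', 'patients': 7}]
```

Only after every record shares the same column names and number format can the combined table be handed to a machine learning algorithm.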
What is a Data Scientist?
Structuring the data so that machines can make sense of it is no simple task. You need someone who understands how to organize the data sets and, in some cases, restate the problem. A data scientist, among other things, is a person with a background in statistics and computer science who can analyze, process, and model data. They can then interpret the results to create actionable plans for companies and other organizations.
The thing about data scientists today is their scarcity. Silicon Valley is vacuuming up anyone who knows anything about data science. The number of open data science positions is increasing dramatically, but the number of graduates with this specialized knowledge is not keeping up with demand. Online education companies such as Coursera and Udemy offer certificate programs to help close the gap. This scarcity of talent is why data scientists are fast becoming some of the highest-paid people in the industry.
Crunching Data Requires Computer Power
The amount of computing power that AI and machine learning consume around the world is skyrocketing. We hear the phrase “data is the new oil” more and more often, and companies are finding real value in “mining” their data stores. But doing that work could require a lot of shiny new hardware to keep the data scientists happy.
Luckily, you don’t need to own dedicated resources to use machine learning. Major cloud providers such as AWS, Google, and Microsoft let you quickly rent the computing power you need, for only as long as your project requires, while keeping your work private and confidential.
Bringing it Back to Education
There are many problems in education that machine learning can help solve, and there are problems it isn’t ready for yet. Let’s consider an example that could be within reach: teaching math to kindergarteners. We know that kids learn math best through a variety of modalities and at different speeds. We also know there are many different ways to teach math. Generally, however, schools and governments choose a curriculum that works for a statistical majority of kids, then train teachers and administrators on that specific curriculum and approach. If a child who does not learn “that way” is fortunate, they have a teacher who adapts the curriculum for them; unfortunately, many children are left behind.
What if, instead, we could collect, aggregate, and organize data on how thousands of kindergarteners learn math: how they approach problems, which problems they find comfortable, and which they find hard? Educators working with data scientists could use machine learning algorithms to create a customized learning model and plan for each child. Administrators and educators would then need to act on that feedback and faithfully deliver a personalized approach to the children who need it. Together, educators and AI/machine learning can improve education for students.
If you start to look at education as a data problem and treat data collection as a priority, you can begin to see the low-hanging fruit in our schools. Among many possibilities, machine learning could help us identify learning disabilities earlier, intervene where kids are falling behind, and help kids find career paths where they are likely to succeed. In short, we can help all kids thrive in a system that currently leaves so many behind. With customized plans that work for each child, no one has to be left behind.
Next Installment: The Barriers to Education
Let’s examine the real-world problems in education that hold back the least fortunate kids among us. How can ML and AI fuel better outcomes and support equity in education?