Alan Turing OBE (Officer of the Order of the British Empire), FRS (Fellow of the Royal Society), born 23 June 1912 and died 7 June 1954, was an English mathematician, computer scientist, logician, cryptanalyst, philosopher and theoretical biologist, or what we would call today an under-achiever. Turing was highly influential in the development of theoretical computer science, providing a formalization of the concepts of algorithm and computation with the Turing machine, which can be considered a model of a general-purpose computer. He is widely considered to be the father of theoretical computer science and artificial intelligence. At one point he made the statement “A computer would deserve to be called intelligent if it could deceive a human into believing it was human”, which is the basis for what became known as the Turing test. If you’ve seen any of my presentations on Artificial Intelligence (AI), you know I use the Google Assistant as an example. See https://www.youtube.com/watch?v=-RHG5DFAjp8 to hear the Google Assistant call. I’m sure you will agree the Turing test has been passed.
Alan Turing was discussing this idea back in 1950, so AI has been around now for 70 years. Today AI and its subsets, machine learning and deep learning, are all over the Internet, and the new hope is that these systems will be able to solve mankind’s greatest issues such as climate, food, energy and transportation. One may ask why the sudden excitement given its 70-year history? For me it comes down to the three requirements for AI: mathematics, massive data for training, and compute power. This article will discuss these three.
Besides the prodigious work done by Alan Turing you should be aware of a breakthrough by the team of Warren McCulloch and Walter Pitts who, in 1943, proposed the McCulloch-Pitts neuron model. The model was specifically targeted as a computational model of the “nerve net” in the brain. See Figure 1.
Figure 1 McCulloch-Pitts neuron model
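To make the model concrete, here is a minimal Python sketch of a McCulloch-Pitts style threshold unit. The function name, the weights and the thresholds are my own illustrative choices, not something taken from the 1943 paper: the unit adds up its weighted binary inputs and fires (outputs 1) only when the total reaches a threshold.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of binary inputs reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


# All four combinations of two binary inputs.
pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Logical AND: both inputs must be active to reach a threshold of 2.
and_gate = [mcculloch_pitts(p, [1, 1], threshold=2) for p in pairs]

# Logical OR: one active input is enough to reach a threshold of 1.
or_gate = [mcculloch_pitts(p, [1, 1], threshold=1) for p in pairs]

print(and_gate)  # [0, 0, 0, 1]
print(or_gate)   # [0, 1, 1, 1]
```

Notice that nothing is learned here: the weights and thresholds are picked by hand. Learning them from examples is what came next.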
This mathematical model of how a human neuron works would, potentially, allow artificial intelligence: perhaps a machine could learn the way a human being learns. Based on this concept Frank Rosenblatt built a machine known as the Perceptron. It was a machine you didn’t program but trained. In an example I use (see: https://www.youtube.com/watch?v=cNxadbrN_aI&t=7s ) it is trained to distinguish between men and women. It is given a lot of photographs during the training period and told whether each is male or female. After enough training the Perceptron is able to determine, accurately most of the time, whether a photograph is of a man or a woman. This was very promising technology: it supported the McCulloch-Pitts model and showed that machines could learn instead of being programmed. However, there were issues. First and foremost, in the 1950s and 1960s this was very expensive. But what really slowed progress was a book by Marvin Minsky and Seymour Papert, “Perceptrons: An Introduction to Computational Geometry”, which discussed some of the limitations of Perceptrons. It has been argued that it was the reason for what is known as the AI winter, a period of about twenty years when funding for AI virtually dried up. Mathematics marches on, however, and although there are many things to note, two items of significance are the multilayer perceptron and backpropagation.
The original perceptron designed by Rosenblatt and based on the McCulloch-Pitts model had just an input layer and an output. This meant that what it could be ‘trained to do’ was very limited and binary. The invention of the multilayer model opened up much greater capabilities. This type of model is closer to the human brain, which uses a multilayer approach with neurons signaling other neurons to arrive at an answer. See Figure 2.
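To make the "trained, not programmed" idea concrete, here is a small Python sketch of the classic perceptron learning rule. It is a toy under stated assumptions: instead of photographs it uses two binary inputs, and the function names, learning rate and epoch count are mine. It also shows the best-known kind of limitation: a single-layer perceptron can learn AND, but no setting of its weights can represent XOR.

```python
def predict(weights, bias, x):
    """Output 1 when the weighted sum plus bias is positive, else 0."""
    activation = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Show each labeled example repeatedly and nudge the weights after every mistake."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)  # -1, 0 or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [0, 0, 0, 1]  # linearly separable: learnable
xor_labels = [0, 1, 1, 0]  # not linearly separable: not learnable by one layer

and_w, and_b = train_perceptron(samples, and_labels)
and_predictions = [predict(and_w, and_b, x) for x in samples]

xor_w, xor_b = train_perceptron(samples, xor_labels)
xor_predictions = [predict(xor_w, xor_b, x) for x in samples]

print(and_predictions == and_labels)  # True
print(xor_predictions == xor_labels)  # False
```

The code never states the rule for AND anywhere. The weights end up encoding it purely from the labeled examples.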
Figure 2 – multilayer Perceptron
The layers separating the input from the output are called hidden layers, and there can be many of them. Once the number of layers grows (more than 3), the network is considered ‘deep’, which, as we will see, is the basis for the AI method known as ‘deep learning’. More on that later.
Another breakthrough that allowed AI to put the winter behind it and move forward was a tuning technique called backpropagation. What happens when a machine learns is that weights are assigned to the connections feeding each node or neuron (the blue circles in Figure 2). Each blue circle in the first hidden layer is connected to every input node, and based on the input and the training, different weights will be assigned to those connections. All the nodes in the first hidden layer are in turn connected to all the nodes in the second hidden layer, and those connections are weighted as well based on the training. When training is started these weights are simply estimates (or guesses). As more training is done the accuracy increases as the weights are tuned. So if a picture of a cat is input on a trained system, the weights should lead to a cat output. Backpropagation is a method for fine-tuning the weights. Once an input has flowed forward through the network (in our case left to right, from input through the hidden layers to output), the output is compared with the expected answer and backpropagation works backward through the network, adjusting each weight according to how much it contributed to the error. This method has decreased the learning time and increased the accuracy of the models.
At this point, I’d like to turn to the second requirement for AI, which is massive amounts of data. I’m sure we have all heard the predictions for data growth. It doesn’t seem so long ago that a gigabyte was a lot of data. Now we are discussing exabytes and beyond. The amount of data is overwhelming, and we have created a situation where only computers are fast enough to sift through all this information. Making any sense of it will require AI. This is probably a good time to define a few terms. Training a machine or AI system is generally done in one of three ways: supervised learning, unsupervised learning and reinforcement learning.
Supervised learning is when we train a machine with known inputs. In the Perceptron example above, pictures were given to the Perceptron and identified as being male or female. In supervised learning, we provide known, labeled examples to the machine (in the picture case, male or female). The machine is provided with a training set, which is a number of labeled pictures used as input. The machine reads them all in, setting and adjusting its weights based on the inputs. Once this is done, a test set (pictures it has not seen) is given to it and the data scientist sees how accurately the machine identifies the test set. If the accuracy is high enough, the machine is considered trained and can start processing live data. To take a more tangible example, a machine could be trained on fraud patterns provided by labeled transactions. Once it proves accurate at identifying fraud in test sets, actual transactions can be sent through it to check for fraud in real time.
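For readers who like to see the mechanics, here is a rough Python sketch of backpropagation on a tiny network with one hidden layer. Everything specific in it is an assumption of mine for illustration: the XOR toy data, the three hidden nodes, the sigmoid activation, the learning rate and the random seed. It should drive the error down, but whether it learns XOR perfectly depends on the random starting weights.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: input -> hidden layer -> single output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output


def mean_squared_error(data, w_hidden, b_hidden, w_out, b_out):
    return sum((forward(x, w_hidden, b_hidden, w_out, b_out)[1] - t) ** 2
               for x, t in data) / len(data)


def train(data, n_hidden=3, epochs=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_inputs = len(data[0][0])
    # Training starts from guesses: small random weights.
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b_out = 0.0
    initial_loss = mean_squared_error(data, w_hidden, b_hidden, w_out, b_out)
    for _ in range(epochs):
        for x, target in data:
            hidden, output = forward(x, w_hidden, b_hidden, w_out, b_out)
            # Backward pass, step 1: error signal at the output node
            # (gradient of half the squared error through the sigmoid).
            delta_out = (output - target) * output * (1 - output)
            # Step 2: push that signal back through the output weights
            # to get each hidden node's share of the error.
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
                            for j in range(n_hidden)]
            # Step 3: nudge every weight against its gradient.
            for j in range(n_hidden):
                w_out[j] -= lr * delta_out * hidden[j]
                b_hidden[j] -= lr * delta_hidden[j]
                for i in range(n_inputs):
                    w_hidden[j][i] -= lr * delta_hidden[j] * x[i]
            b_out -= lr * delta_out
    final_loss = mean_squared_error(data, w_hidden, b_hidden, w_out, b_out)
    return initial_loss, final_loss


xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
initial_loss, final_loss = train(xor_data)
print(initial_loss, final_loss)  # the second number should be the smaller one
```

The forward pass runs left to right exactly as described above. The three commented steps are the backward pass.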
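The training-set and test-set workflow can be sketched in a few lines of Python. Please treat this as an illustration only: the transactions are made up, the two features are hypothetical, and I have swapped in a very simple nearest-centroid classifier rather than a neural network so the whole thing fits on one screen.

```python
def train_centroids(examples):
    """Learn one average feature vector (centroid) per label from labeled examples."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Predict the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def accuracy(centroids, examples):
    correct = sum(classify(centroids, f) == label for f, label in examples)
    return correct / len(examples)


# Hypothetical features: (amount in thousands of dollars, transactions in the last hour).
training_set = [
    ((0.02, 1), "ok"), ((0.05, 2), "ok"), ((0.03, 1), "ok"),
    ((2.5, 9), "fraud"), ((3.1, 12), "fraud"), ((1.9, 8), "fraud"),
]
# The test set holds labeled examples the model has never seen.
test_set = [((0.04, 1), "ok"), ((2.8, 10), "fraud")]

model = train_centroids(training_set)
test_accuracy = accuracy(model, test_set)
print(test_accuracy)  # 1.0 on this toy data
```

Only after the test accuracy looks good enough would you point `classify` at live transactions.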
A second method is called unsupervised learning. As the name suggests, the machine is not provided labeled data but is simply given raw data that it evaluates on its own, looking for patterns and correlations. A training set is still used. This type of learning is useful in areas such as recommendation engines (“People who bought x also bought y and z”). It is also useful for customer segmentation across intersecting attributes (age, gender, salary, education, etc.). In short, the machine determines what data is like other data. Think of a heat map.
My last method is called reinforcement learning, where the machine learns by trial and error from the feedback its own actions produce. This is especially good for things like a Roomba vacuum. The vacuum goes in a direction until it hits something. Over time it will create a map of the room and be able to vacuum around tables, chairs and other furniture, and it will know the room’s dimensions. Although great for vacuums and robots, we wouldn’t want self-driving cars to learn this way. Smile.
You have probably heard the terms Artificial Intelligence, Machine Learning and Deep Learning. Let’s break those down a bit. Artificial intelligence is the umbrella term used for everything; machine learning and deep learning are subsets of AI. Starting in about the 1990s, as the Internet was ramping up and Moore’s law was going like gangbusters, Machine Learning (ML) got going. Machine learning is the idea of using particular algorithms that may be tuned against a particular data set to derive useful information and insights. Some of these algorithms include linear regression, K-means, Naïve Bayes and self-organizing maps. This is a very short list; for details feel free to Google any and all. Within a company, the data scientist would have a lot of data and would want to create some insights using the data. One of the many things a data scientist does is determine the best algorithm to use. The correct one will likely provide great insights. The wrong ones will provide garbage results. A really good data scientist might determine that a linear regression model will yield the best results and will be able to tune it to get even better ones. If your child is in school and good at math/statistics, take note: we will need lots of data scientists for a long while, and they are pretty well paid.
For me, Deep Learning (DL) covers areas where the machine takes in information much as a human does. We collect our information through our eyes and ears, so visual and audio learning falls into the area of deep learning. Generally, there are many hidden layers in a deep learning model and, based on the evidence so far, more layers tend to give better results. In this area, there are currently two big models: the Recurrent Neural Network (RNN) and the Convolutional Neural Network (CNN). An RNN is a deep learning system that needs to remember information over time. Think of audio input and imagine me speaking a sentence like “My Aunt Betty lived in Georgia until she was 21, then she moved to Florida, met and married Steve”. The neural network takes in a word at a time, just like we do. By using the RNN method the machine can keep track of keywords and concepts. We easily know that the word ‘she’ in the sentence refers back to Betty, but a machine needs to remember that association. Additionally, words such as “Aunt”, “Betty”, “Georgia”, “21”, “Florida”, “married” and “Steve” may be important depending on what the machine needs to learn. By using RNN techniques the system will know there is an association (marriage) between Steve and Betty and that Betty must have been at least 21 when she was married.
A method used for visual information is the Convolutional Neural Network (CNN), which processes pixels for understanding. It is a leading method for facial recognition and self-driving cars, because it can take in information and process it quickly. Think of car cameras looking in all directions, calculating the speed of everything while also scanning for signs, pedestrians and bikes. It’s a lot to process, which is why we have so many commercials on not texting, eating or drinking while driving. It’s pretty hard for us, and so far our brains are way beyond the best supercomputer. The advantage the machines have is that, generally, they specialize. All a self-driving car does is drive, with no distractions. That being said, I have participated in an MIT project to try to provide moral principles to the self-driving car. This involves what to do in a situation that will likely cause injury or death. For example, a self-driving car has 3 people inside and the brakes go out as it’s heading toward a crosswalk with people in it. Assuming no ability to warn the pedestrians, should the car veer into a wall, possibly killing the passengers, or go into the crosswalk, possibly killing the pedestrians? What if there are 2 in the crosswalk? What if there are 4? Does the age of the people in the vehicle or in the crosswalk come into play? Do their careers come into play? These are thorny issues.
No mention of deep learning would be complete without acknowledging the contribution of Fei-Fei Li and ImageNet. A professor of computer science at Stanford University, Co-Director of the Stanford Institute for Human-Centered Artificial Intelligence and Co-Director of the Stanford Vision and Learning Lab, she is clearly another under-achiever. In 2006 Fei-Fei had an idea to create a massive image database. At that time there were pictures all over the Internet but they were not classified (labeled). Fei-Fei managed to get funding to have pictures labeled and placed into the ImageNet database. It grew to millions of labeled pictures, which could then be used for training systems. ImageNet was and is a massive amount of labeled image data. She also started the ImageNet challenge, a contest to see which team could do the best job of correctly identifying pictures. For reference, a human being is about 95% accurate in identifying pictures. The first contest was held in 2010, with the winner achieving 72% accuracy. The next year it was slightly better at 74%, but the following year a team using a CNN won with an accuracy of 84%. Their system used a model with 8 hidden layers. More layers were added each year until 2015, when the winning team had an accuracy of 97% (better than human) using a model with 152 hidden layers. This work really laid the foundation for facial recognition and autonomous driving. See Figure 3.
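As one concrete illustration of finding "what data is like other data", here is a minimal Python sketch of K-means clustering. The customer numbers are invented, the two features are hypothetical, and starting the centroids at the first k points is a simplification I chose to keep the example deterministic.

```python
def kmeans(points, k, iterations=10):
    """Group unlabeled points into k clusters by repeatedly re-centering."""
    centroids = list(points[:k])  # naive start: the first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customers: (age, annual spend in thousands of dollars). No labels.
customers = [(22, 1.0), (58, 9.0), (25, 1.5), (61, 8.0), (23, 0.8), (55, 9.5)]
centroids, clusters = kmeans(customers, k=2)

for group in clusters:
    print(group)
```

Nobody told the algorithm there was a "young, low spend" group and an "older, high spend" group. It found the two segments on its own.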
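To show the trial-and-error flavor in code, here is a small Python sketch of tabular Q-learning, one common reinforcement learning algorithm. To be clear about the assumptions: this is not how any particular vacuum works. The world is a made-up corridor of five positions with a reward at the far right end, and the learning rate, discount factor and seed are arbitrary choices of mine.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn by wandering: try random moves and remember which ones led to reward."""
    rng = random.Random(seed)
    goal = n_states - 1
    moves = (-1, +1)  # action 0 = step left, action 1 = step right
    q = [[0.0, 0.0] for _ in range(n_states)]  # estimated value of each move in each position
    for _ in range(episodes):
        state = 0
        while state != goal:
            action = rng.randrange(2)  # explore with a random move
            # Bumping into the left wall leaves the agent where it is.
            next_state = min(max(state + moves[action], 0), goal)
            reward = 1.0 if next_state == goal else 0.0
            best_next = 0.0 if next_state == goal else max(q[next_state])
            # Move the estimate toward (reward now + discounted best future value).
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
    return q


q_table = q_learning()
greedy_policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(greedy_policy)
```

After enough wandering the learned policy says "right" in every position, even though the reward only ever showed up at the very end of the corridor.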
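Since linear regression keeps coming up, here is what the simplest version looks like in Python: an ordinary least-squares fit of a straight line. The data points are made up for illustration and the function name is mine.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data that follows y = 2x + 1 exactly.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

Real data is never this clean, which is where the data scientist's judgment and tuning come in.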
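The "remembering" part of an RNN can be sketched in a few lines of Python. This is a toy under heavy assumptions: a four-word vocabulary, two memory units, and weights I picked by hand rather than trained. The point is only the mechanism: at every word the new state is computed from the current word and the previous state, so a trace of an early word is still there when a later word arrives.

```python
import math

VOCAB = ["betty", "steve", "moved", "she"]


def one_hot(word):
    return [1.0 if word == v else 0.0 for v in VOCAB]


def rnn_step(hidden, x, w_in, w_rec):
    """One recurrent step: mix the current word with the remembered state."""
    new_hidden = []
    for j in range(len(hidden)):
        from_input = sum(w * xi for w, xi in zip(w_in[j], x))
        from_memory = sum(w * h for w, h in zip(w_rec[j], hidden))
        new_hidden.append(math.tanh(from_input + from_memory))
    return new_hidden


def read_sentence(words, w_in, w_rec):
    hidden = [0.0, 0.0]  # empty memory before the first word
    for word in words:
        hidden = rnn_step(hidden, one_hot(word), w_in, w_rec)
    return hidden


# Hand-picked weights: unit 0 reacts to the names, unit 1 to the other words.
W_IN = [[1.0, -1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.5]]
# Recurrent weights decide how much of the previous state is carried forward.
W_REC = [[0.9, 0.0],
         [0.0, 0.5]]

state_betty = read_sentence(["betty", "moved", "she"], W_IN, W_REC)
state_steve = read_sentence(["steve", "moved", "she"], W_IN, W_REC)
print(state_betty, state_steve)
```

Both sentences end with the same two words, yet unit 0 finishes positive for the Betty sentence and negative for the Steve one, so when "she" arrives the network still carries a trace of who was mentioned first.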
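Stepping back from the ethics to the mechanics for a moment, the core operation inside a CNN is sliding a small filter over the pixels. Here is a bare-bones Python sketch with a made-up 3x4 "image" and a hand-written edge filter; in a real CNN the filter values are learned during training, and there are many filters stacked in many layers.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and record how strongly each patch matches it.

    As in most deep learning libraries, the kernel is applied without flipping
    (strictly speaking this is cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# A tiny grayscale image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A filter that responds where brightness increases from left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # [[0, 2, 0], [0, 2, 0]]
```

The output lights up only in the middle column, which is exactly where the dark-to-bright edge sits. Stack enough learned filters like this and you go from edges to shapes to stop signs and pedestrians.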
Figure 3 Self-driving real-time identification
Our last requirement for successful AI is compute technology. We are all familiar with Moore’s law, commonly quoted as compute capability doubling every 18 months. This held true for much longer than Gordon Moore believed it would, thanks to miniaturization and multicore technology, but depending on who you believe Moore’s law came to an end in 2013 or sometime thereafter. We can attribute the continued compute increases for AI to advances in GPU technology. Originally built for gaming, GPUs were used to process graphics. What became clear was their ability to handle heavy, highly parallel compute workloads, which is exactly what deep learning applications require. In Figure 3 we see a real-time view of a street and the need to process everything on that street in real time. HPE, with its acquisition of SGI along with the recent acquisition of Cray, has two powerhouses in terms of AI/ML/DL. The Apollo line along with Cray provides massive supercomputing to process today’s exabyte workloads.
As I said initially, for AI to be successful it requires mathematics (check), massive amounts of data for training (check) and blazing compute power (check). So I believe we can see why there is so much discussion around AI: data and compute are catching up to the math. But are we destined for some Skynet future? I am hopeful we are not. Most people know that in 1997 an IBM system known as ‘Deep Blue’ beat world chess champion Garry Kasparov. What most people do not know is that Garry went on to create a new chess league known as Centaur. His position was that ‘Deep Blue’ had access to a massive number of historical chess games which Garry could not possibly keep in his own memory. He suggested a combination of machine and grandmaster (a Centaur) as the basis for a new league. The grandmaster would receive a recommended move from the system and could accept it or decide to make a different move. When Centaurs play machine-only systems, the Centaurs usually win. So man plus machine is better than man versus machine. That’s what I hope for as the future of AI.
With that background in mind, can HPE be a force for good in companies exploring AI/ML/DL? HPE brings a massive amount of technology, partnerships and expertise to assist customers in this journey. One of HPE’s goals is to help customers “Unlock data with AI” from the edge to the cloud, now and into the future. We look at this across three dimensions: AI within our products, AI for our customers and AI for the future.
Perhaps the ultimate use of AI is to use AI to predict and solve problems automatically for our customer base. We call this AI-driven operations (AI-Ops) which covers the complete edge-to-cloud-to-core infrastructure. Two HPE product examples include 1) HPE InfoSight and 2) HPE Aruba Introspect. Both of these solutions improve productivity and support by using embedded AI techniques.
When you look to the future, think of Hewlett Packard Labs and the work they’re doing to create the latest HPE technologies and products. Examples include the Dot Product Engine, the Apollo line and Memory-Driven Computing, the ultimate AI-friendly architecture. These will only improve with the acquisition of Cray. Beyond these and by way of example, in the Covid-19 area HPE has been involved in the development of swarm analytics. This is a very creative idea whereby machines at hospitals and clinics can share the ‘hidden layers’ of their models, that is, the weights derived from machine/deep learning using their local information. By sharing this information, which is completely depersonalized, with other clinics around the world, a better combined model can be developed and shared. There is a YouTube discussion of this by our own Dr. Eng Lim Goh, SVP & CTO for AI at HPE. See: https://www.youtube.com/watch?v=Yw15-tLViAs I’m sure you’ll be impressed.
HPE is also focused on AI for Business. Customers face many challenges along their AI journey, from getting started and finding the right tools and services to choosing the right infrastructure and navigating a vast partner ecosystem. We are extending our purpose-built AI portfolio, leveraging Pointnext advisory services to provide much-needed advice and expertise, continuing to build on our partner ecosystem and developing optimized configurations, all to help customers unlock their data to gain faster insights and deliver better business outcomes.
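To give a feel for what "sharing the weights" can mean, here is a deliberately simple Python sketch. I want to be loud about the assumption: this is not HPE's swarm implementation. It is plain weighted averaging of model weights, in the spirit of what is often called federated averaging, with a hypothetical function name and made-up numbers. Each site contributes only its weights and how many local records it trained on, never the patient data itself.

```python
def merge_models(site_weights, site_sizes):
    """Combine per-site model weights into one model, weighting each site by its data volume."""
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [
        sum(weights[i] * size for weights, size in zip(site_weights, site_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical weights learned locally at two hospitals for the same 3-parameter model.
hospital_a = [0.2, -0.4, 1.0]  # trained on 100 local records
hospital_b = [0.4, 0.0, 0.6]   # trained on 300 local records

merged = merge_models([hospital_a, hospital_b], [100, 300])
print(merged)  # roughly [0.35, -0.1, 0.7]
```

The merged model leans toward hospital B because it saw three times as much data, and it can then be sent back to both sites as the new starting point.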