The AI in your Phone

How AI is Taking Over Your Phone

If you’ve bought a new smartphone in the last couple of years you may well have noticed that it’s quite a bit smarter and more capable than models of just a few years earlier.

Now some might put it down to the CPUs, memory and general computer hardware becoming faster, but that doesn't explain the huge performance increases that now allow features like Face ID to verify it's you and unlock your device in under a second, far more accurate voice recognition, automatic text identification and translation using your camera, smart HDR for pictures and video, and yes, even Animoji, plus a whole lot more. This is part of a trend in which a great deal more of the devices we use every day are becoming a lot smarter.

This has all happened in the last couple of years as neural network engines, machine learning software and the data to train them on have converged and become more widely available.

This is how the smartphone in your pocket is helping to lead the way in intelligent machines.

If you think about it, a smartphone has got to be one of the more awkward computing devices to use: you only have a small screen that has to double up as a keyboard and mouse, not ideal for those with sausage fingers and dodgy eyesight, and yet you can now carry around the power of a desktop computer in your pocket.

So to make these devices easier for everyone to use, and to get a competitive advantage over rivals, the search has been on to make them work for us humans, including the older ones like me, in ways that are as seamless and natural as possible, so there will be more talking to your phone and less tapping on the screen.

Now, this sounds like it might be an easy task with today’s powerful processors but talking and interacting with a machine like we would another human is an incredibly difficult task for a traditional computer.

If you take voice recognition, for example: everyone speaks differently, some fast, some slow, and then there are widely varying accents within each language.

Ten years or so ago, dictation software required you to read several paragraphs of text to train it to just your voice, and even then the results weren't that good. Today, with the latest devices, you can just talk to your phone and it will create a very accurate text transcription with few errors and no training.

So what has changed to allow these newfound capabilities? Well, if you have seen any of the marketing blurb, you may have noticed that things like multi-core neural engines and machine learning are getting a lot more mentions.

Apple uses the term Neural Engine, Google has its Tensor chip, Samsung has its Neural Processing Unit and Huawei has the NPU in its Kirin chipset, but basically these are chips specifically designed to run neural network software.

Although the term AI, or artificial intelligence, is used in the marketing, what these chips actually run is machine learning. So what's the difference?

According to computer scientist and philosopher Judea Pearl in his book The Book of Why, machine learning learns and predicts based on passive observations, whereas artificial intelligence implies an agent interacting with its environment to learn and take actions that maximise its chance of achieving its goals.

So what is the difference between that and a normal computer? Well, in a normal computer program, a programmer has to define ahead of time what the program is required to do and then code all the responses it will make depending on the input it receives. If an input deviates from the expected, the system cannot change itself to allow for that; it's effectively a dumb system just following precise instructions exactly.

But sometimes there are circumstances where every possible variation cannot be programmed for ahead of time, like pattern recognition in images and sounds, things that we humans and our biological neural brains are very good at.

Machine learning, on the other hand, is a computer system, or a model of a process, that is like a computer program but one that can learn and adapt without following explicit instructions. So, for example, a handwriting recognition model could be trained to recognise any legible characters by analysing thousands of real-world examples and tuning itself to provide optimum performance that could equal that of a human.

Normal computer CPUs process information in a serial fashion, dealing with one block of data, usually 8, 16 or 32 bits, at a time, but doing this very, very quickly, billions of times per second. Adding more CPU cores increases the amount of data that can be processed. However, with current software there comes a point where adding more cores decreases the efficiency of each one, so a 64-core processor is overall much less efficient from a data throughput point of view than an 8-core processor, and is nowhere near the eight times the computing power that you might think.

A neural network works very differently: it is a parallel computing engine that works like a collection of connected neurons in a biological brain.

In a normal computer, the basic operational element is a gate that performs a Boolean logical operation on its inputs, which are either 1 or 0, and provides an output that is 0 or 1. A neural network uses artificial neurons, and much like their biological counterparts, they work in a very different way.

So let's have a look at an artificial neuron. It can have a number of inputs, each one with a value between 0 and 1 rather than just 0 or 1. Each input also has a weighting value associated with it, and the neuron has a single overall bias value which is added to the weighted sum. The result then goes through an activation function that scales the output, so large changes are scaled down and small ones scaled up.

The weighting value assigned to each input is multiplied by the input value and determines how important that input is in affecting the neuron's output value, and it is these weights, together with the bias values, that allow a neural network to learn without having a specifically programmed action.
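As a rough sketch of how that works, a single artificial neuron can be modelled in a few lines of Python. The input values, weights and bias below are made up purely for illustration, and a sigmoid is used as the activation function:

```python
import math

def sigmoid(x):
    # Activation function: squashes any value into the range 0..1,
    # damping large swings and amplifying small differences around 0.
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    # Multiply each input by its weighting value, sum the results,
    # add the bias, then pass the total through the activation function.
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(total)

# Example: a neuron with three inputs and arbitrary weights and bias
output = neuron([0.5, 0.9, 0.1], [0.8, 0.2, 0.5], 0.1)
print(round(output, 3))
```

That one operation, multiply, sum, add bias, squash, repeated across thousands of connected neurons, is essentially all a neural engine does.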

Because we are still dealing with digital chips, each neuron is like a little program that can run on the cores of highly parallel GPUs or similar dedicated chips, so thousands can run at the same time extremely quickly.

Back in 1983, researchers constructed a neural model to recognise handwritten numbers from 0 to 9 using a 28 x 28 pixel grid and 60,000 images of handwritten numbers taken from US Census Bureau employees. These images were anti-aliased, so their edges were smoothed with greyscale shading, meaning each pixel could have varying values depending on where the character appeared on the grid and how bright or dark the strokes were.

In this model there are three layers. In the first there are 784 neurons, one for each pixel on the 28 x 28 grid (only a few are shown to simplify the image). Each input neuron measures the value of its pixel, with 0.0 representing white and 1.0 black, and greyscale values in between.

The outputs of the 784 neurons feed the second, hidden layer of 15 neurons, which then feeds the 10 output neurons of the third layer, which represent the output of the network. So when the model is shown a number two, output 2 should be close to 1 and the rest close to zero, and if a six were shown, output 6 would be close to 1 and the rest close to zero.

This model has 809 neurons in three layers; together with the weighting values for each input and the bias values, it has a total of 11,935 parameters that can be adjusted.
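Those numbers can be checked with a quick back-of-the-envelope calculation: every neuron in the hidden and output layers has one weight per neuron in the layer before it, plus one bias value of its own.

```python
layers = [784, 15, 10]  # input pixels, hidden neurons, output neurons

neurons = sum(layers)  # 784 + 15 + 10 = 809 neurons in total

# Each neuron in a layer has one weight per neuron in the previous
# layer, plus a single bias of its own (the input layer has neither).
params = sum(prev * cur + cur for prev, cur in zip(layers, layers[1:]))

print(neurons, params)  # 809 neurons, 11935 adjustable parameters
```

That is 784 × 15 + 15 = 11,775 parameters for the hidden layer and 15 × 10 + 10 = 160 for the output layer, giving 11,935 in total.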

To train the network, the 11,935 weight and bias parameters are first set to random values. Then a character image is shown to the network; in this case it is a 5. If the network is correctly calibrated, output 5 should be close to 1 and the others close to zero, so no changes would be needed. However, if, say, output 8 was 0.7, that would show that the weighting values feeding output 8 need to be reduced and the values for 5 need to be increased. Once that is done, it moves on to the next number image.

This is how the neural network learns. During training, when the outputs don't correspond with a known input, the weighting and bias values are adjusted up or down over successive attempts, which could number in the thousands or millions in more sophisticated models, in a method called hill climbing, until the outputs reach optimum detection performance. This is where the network is said to have reached the top of the hill, or its most accurate guess.
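A toy version of hill climbing is easy to sketch in Python. Here the "network" is just two weights, and they are nudged at random, keeping only the nudges that reduce the error. This is purely an illustrative sketch, far simpler than a real training loop:

```python
import random

def error(weights, examples):
    # Mean squared error between the model's outputs and the targets.
    return sum((sum(w * x for w, x in zip(weights, xs)) - target) ** 2
               for xs, target in examples) / len(examples)

# Toy problem: learn weights so that output = 1.0*x0 + 0.5*x1
examples = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.5), ([1.0, 1.0], 1.5)]

random.seed(0)
weights = [random.random(), random.random()]  # start from random values

for _ in range(5000):
    # Nudge one weight at random; keep the change only if error drops.
    candidate = list(weights)
    candidate[random.randrange(len(candidate))] += random.uniform(-0.05, 0.05)
    if error(candidate, examples) < error(weights, examples):
        weights = candidate

print([round(w, 2) for w in weights])  # should approach [1.0, 0.5]
```

Real networks use a more directed version of this idea (following the error gradient rather than random nudges), but the principle is the same: adjust the parameters, keep what improves the guesses.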

Using this method, almost any conceivable mathematical function could be modelled. However, these could also be called probability engines, because the output is the probability of what the network thinks the input is. This also means that they are less accurate for high-precision mathematical models than traditional computers, although this is improving as time goes by and models and techniques are developed and refined.
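One common way classification networks express these probabilities (not mentioned above, but standard practice) is a softmax step, which turns the raw output scores into probabilities that add up to 1:

```python
import math

def softmax(scores):
    # Exponentiate each score and normalise so the results sum to 1,
    # turning raw network outputs into a probability distribution.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for digits 0..9; digit 5 scores highest
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 2.5, 0.4, 0.1, 0.2, 0.3]
probs = softmax(scores)
print(max(range(10), key=lambda i: probs[i]))  # the most probable digit
```

The network's "answer" is simply whichever output carries the highest probability, which is why it is a best guess rather than an exact calculation.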

Since 1983, when this deep learning approach was developed, artificial neural networks have become more sophisticated, but in 2012 a breakthrough was made with a deep learning network entered in the annual ImageNet competition organised by Stanford University.

In this, entrants were given a training set of over 1 million images, each labelled with one of over 1,000 categories like "Articulated Truck", "Cruise Liner", "Lion" and "Sunflower". The winner would be the one that could most accurately classify another set of images that were not part of the training set. The models could make their five best guesses, which were then matched against the human-labelled images.

The winner was a system called AlexNet, after its lead author Alex Krizhevsky. Normally the best error rate in the competition was about 26%, but AlexNet beat them all with an error rate of just 16%.

To compare it to the handwriting network, which had 3 layers, 809 neurons and just under 12,000 parameters, AlexNet had 8 layers, 650,000 neurons and 60 million parameters. The interesting thing, however, was that it was run across two Nvidia GTX 580 graphics cards, each with 512 cores, giving the model much more parallel processing power than a normal CPU.

Even with this and a lot of optimisation, it still took six days to train the model, but the results were pretty good.

Since then, deep learning networks, or convolutional networks as AlexNet was, have evolved rapidly, with models having tens of layers, millions of neurons and billions of adjustable parameters. With this huge number of matrix calculations, you can see why they need the parallel processing power of GPUs, which are also becoming more powerful each year.

By 2017, the best ImageNet competitors had error rates of less than 3%, better than most humans.

This is where we are today. Smartphones make the perfect platform for neural networks powered by custom chips, which are now appearing as standard fare in nearly all high-end smartphones. They are tasked with the fuzzy-edged jobs like image and speech recognition, text and language translation, video and image optimisation and manipulation, augmented reality and a host of other things that bolster the traditional CPUs and make the devices not only faster and easier to use, but able to do tasks that would normally have been done by humans.

This is something that will spread. The new Apple M1 Macs and MacBooks use a 16-core Neural Engine, so we can expect to see similar units integrated into future CPUs from Intel and AMD at some point.

Machine learning is already being used in many areas, not only by the big tech companies like Google and Facebook but in fields as far apart as medicine and finance, and almost everything in between, including things like self-driving cars.

But by using their vast amounts of user data for training machine learning systems, companies like Google, Facebook and Amazon have opened up another can of worms over privacy, and only recently Facebook shut down its face recognition program and deleted the face data of over a billion users because of this very issue.

By keeping the machine learning data and running the neural networks on the mobile devices themselves, rather than on online servers as Google and Facebook do, Apple says it avoids these issues. Either way, it looks like this technology will transform future home and mobile computing, and although it won't be able to take over the world, it will certainly change it.

So I hope you enjoyed the video, and if you did then please thumbs up, subscribe and share, and don't forget that Patreon supporters get ad-free versions of the videos before they are released on YouTube.

Paul Shillito
Creator and presenter of the Curious Droid YouTube channel and website
