by James Longbottom

Technical Consultant

Given the colossal amount of unstructured, potential-laden data in the world, the branch of AI technology we call ‘unsupervised machine learning’ may yet become the most crucial to the business world. But first off: what is it?

What does unsupervised mean?

Unsupervised learning is a branch of machine learning within AI that allows data to be understood with little or no prior information about it. This is the flip side of supervised learning, which relies on labeled data to build the best model. Unsupervised learning is generally useful when the end goal is not clearly defined, so discovering previously unknown patterns or trends is an advantage in itself. The rest of this blog covers some examples of unsupervised machine learning and how they can be used in real-world applications. Perhaps some of the examples overlap with data that you possess?




Clustering is a very common example of unsupervised learning. It takes large volumes of data (typically numeric, since finding patterns in other kinds of data can be difficult and time consuming) and finds commonalities, grouping similar items into clusters. This mode of machine learning has some interesting uses. For example, music players and audio streaming platforms like Spotify and Apple Music use this type of machine learning to cluster music as a basis for recommendations. The AI engine automatically generates a range of features with which to group the music into clusters, allowing previously unprocessed music to be assigned to a cluster straight away and packaged for users.
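As a rough illustration of that last step, here is a minimal sketch of placing a new track into the nearest existing cluster. The two features (scaled tempo and energy) and the centroid values are invented for the example, and this is not how any particular streaming service works.

```python
import math

# Hypothetical cluster centres over two made-up features:
# (tempo scaled to 0..1, energy 0..1). A real system would learn these.
centroids = [
    (0.40, 0.20),  # cluster 0: slower, quieter tracks
    (0.64, 0.85),  # cluster 1: fast, energetic tracks
    (0.70, 0.60),  # cluster 2: fast, mid-energy tracks
]

def assign_cluster(features, centroids):
    """Return the index of the centroid closest to the feature vector."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(features, centroids[i]))

new_track = (0.62, 0.90)  # features of a track the system has not seen before
print(assign_cluster(new_track, centroids))
```

The new track lands in whichever cluster centre it sits closest to, so it can be recommended alongside the other tracks in that cluster.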




Geo-clustering can have many uses, the most obvious being the ability to provide insight into geo-tagged data, usually in latitude-longitude format. One way to achieve this is with ‘k-means’, a clustering algorithm. With k-means, you provide the number of clusters up front as an input and the algorithm groups the unlabeled dataset into that many clusters. At a large scale you may have to factor in the curvature of the earth, but in simpler cases it can provide insight and even assist with scheduling a route with many stops.
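The following is a minimal sketch of k-means on latitude-longitude pairs, written in plain Python for illustration. It treats the coordinates as flat 2-D points, which is only a reasonable assumption over small areas; the sample stops and the function names are made up. The `haversine_km` helper shows one common way to measure distance along the curved surface of the earth when the flat approximation is not good enough.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on (lat, lon) pairs treated as flat 2-D coordinates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                lat = sum(m[0] for m in members) / len(members)
                lon = sum(m[1] for m in members) / len(members)
                new_centroids.append((lat, lon))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centre
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i]))
              for p in points]
    return centroids, labels

def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))

# Illustrative delivery stops: three near London, three near Manchester.
stops = [(51.50, -0.12), (51.52, -0.10), (51.49, -0.14),
         (53.48, -2.24), (53.47, -2.25), (53.49, -2.23)]
centroids, labels = kmeans(stops, k=2)
print(labels)
```

Each stop ends up with a cluster label, which could then be used to split a long route into one trip per cluster.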




Natural language processing is an exciting topic within machine learning. Language is often among the most unstructured and irregular forms of data, especially when multiple languages are involved. Natural language processing applies several layers of preprocessing to the text or audio data to make it workable. The result can then feed into modelling and pattern mining to identify commonalities in language. At its most direct, it can identify patterns in huge data sets of transcribed text or recorded speech. At its best, it can attempt to understand language. Classic examples are voice assistants like Siri and Amazon Alexa, but this approach can also help with language-based chatbots and service call automation.
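To make the preprocessing idea concrete, here is a small sketch that lowercases text, splits it into words, drops a few very common words, and then compares two messages by the words they share. The stop-word list and the sample messages are invented for the example, and production systems use far richer pipelines.

```python
import math
import re
from collections import Counter

# A tiny, hand-picked stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "is", "my", "to", "i", "and", "of", "it"}

def preprocess(text):
    """Lowercase, split into word tokens, and drop common stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words count vectors."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

billing_1 = preprocess("My invoice is wrong, I need a refund")
billing_2 = preprocess("Refund the wrong invoice")
login = preprocess("Reset my password")

print(cosine_similarity(billing_1, billing_2))  # messages about the same issue
print(cosine_similarity(billing_1, login))      # unrelated messages
```

Similarity scores like these can be fed to a clustering step so that, for instance, support messages about the same issue are grouped together without anyone labeling them first.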




Auto-encoders are a type of neural network, related to the more common convolutional neural network (CNN) but suited to tasks that CNNs are not so great at. An auto-encoder is trained to produce an output that is as similar as possible to its input (so it attempts to copy its input to its output), and since the input itself acts as the target it doesn’t need any labels, which makes the training unsupervised. This can power a huge range of applications, from natural-language understanding to improving image recognition, where images the AI has not assessed before can be identified and linked with tags. Possible examples could be a smart image searching tool or a language recognition tool working with unstructured data.
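The copy-the-input idea can be sketched with a deliberately tiny auto-encoder in plain Python. This toy version squeezes two numbers into a single code value and expands it back to two, with no non-linear layers, so it is far simpler than a real auto-encoder; the data, starting weights and learning rate are all assumptions chosen for the example.

```python
def train_autoencoder(data, epochs=500, lr=0.01):
    """Train a 2 -> 1 -> 2 linear auto-encoder with gradient descent.

    Returns the mean reconstruction error before and after training.
    """
    w = [0.5, 0.5]  # encoder weights: 2 inputs -> 1 code value
    v = [0.5, 0.5]  # decoder weights: 1 code value -> 2 outputs

    def reconstruct(x):
        h = w[0] * x[0] + w[1] * x[1]    # encode
        return h, [v[0] * h, v[1] * h]   # decode

    def mean_loss():
        total = 0.0
        for x in data:
            _, r = reconstruct(x)
            total += (r[0] - x[0]) ** 2 + (r[1] - x[1]) ** 2
        return total / len(data)

    initial = mean_loss()
    for _ in range(epochs):
        for x in data:
            h, r = reconstruct(x)
            err = [r[0] - x[0], r[1] - x[1]]  # the target is the input itself
            grad_h = 2 * (err[0] * v[0] + err[1] * v[1])
            for i in range(2):
                v[i] -= lr * 2 * err[i] * h   # decoder update
                w[i] -= lr * grad_h * x[i]    # encoder update
    return initial, mean_loss()

# Unlabeled points that all lie on the line y = 2x.
data = [(-1.0, -2.0), (-0.5, -1.0), (0.5, 1.0), (1.0, 2.0)]
initial_loss, final_loss = train_autoencoder(data)
print(initial_loss, final_loss)
```

No labels appear anywhere in the training loop: the only signal is how far the reconstructed output is from the original input, and that error shrinks as the network learns the structure in the data.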


Making the most of the data


While the above examples are only a brief summary of what unsupervised machine learning can achieve, the theme is consistent: these AI systems are best applied to messy, unstructured, mixed-up data, where they provide structure and meaning beyond what manual processing, or even end users themselves, could achieve. With more and more companies newly investing in AI, unsupervised learning is set to become a prominent way to unlock the value in oceans of unstructured data waiting to be processed.

