What is supervised machine learning and how does it relate to unsupervised machine learning?
In this post you will discover supervised learning, unsupervised learning and semi-supervised learning. After reading this post you will know:
- About the classification and regression supervised learning problems.
- About the clustering and association unsupervised learning problems.
- Example algorithms used for supervised and unsupervised problems.
- A problem that sits in between supervised and unsupervised learning called semi-supervised learning.
Let’s get started.
Supervised Machine Learning
The majority of practical machine learning uses supervised learning.
Supervised learning is where you have input variables (X) and an output variable (Y), and you use an algorithm to learn the mapping function from the input to the output.
Y = f(X)
The goal is to approximate the mapping function so well that when you have new input data (X), you can predict the output variable (Y) for that data.
It is called supervised learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process. We know the correct answers; the algorithm iteratively makes predictions on the training data and is corrected by the teacher. Learning stops when the algorithm achieves an acceptable level of performance.
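To make the teacher analogy concrete, here is a minimal Python sketch of that loop. It assumes a linear mapping y = w * x + b, uses the mean squared error against the known answers as the "correction", and treats an error below an illustrative threshold as "acceptable performance". The function name `train`, the learning rate and the threshold are choices made for this example, not part of any particular library.

```python
def train(xs, ys, learning_rate=0.05, target_mse=1e-4, max_epochs=10_000):
    """Fit y ≈ w * x + b by gradient descent, stopping at an acceptable error."""
    w, b = 0.0, 0.0  # start with an uninformed model
    n = len(xs)
    mse = float("inf")
    for _ in range(max_epochs):
        # The algorithm makes predictions on the training data...
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        mse = sum(e * e for e in errors) / n
        # ...learning stops once performance is acceptable...
        if mse <= target_mse:
            break
        # ...otherwise the known answers "correct" it via the error gradient.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, mse


# Labeled training data: the correct answers come from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b, mse = train(xs, ys)
print(f"w={w:.3f}, b={b:.3f}, mse={mse:.6f}")
```

Each pass predicts on the training data, measures how far the predictions are from the correct answers, and nudges the parameters to reduce that error; the `target_mse` check stands in for "an acceptable level of performance".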
Supervised learning problems can be further grouped into regression and classification problems.
- Classification: A classification problem is when the output variable is a category, such as “red” or “blue”, or “disease” and “no disease”.
- Regression: A regression problem is when the output variable is a real value, such as “dollars” or “weight”.
Some common types of problems built on top of classification and regression include recommendation and time series prediction, respectively.
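The difference between the two problem types is the kind of output being predicted. As an illustration only, the sketch below uses k-nearest neighbors, which is not one of the algorithms listed in this post; it is chosen because the same neighbor search serves both cases. For classification the neighbors vote on a category, and for regression their real values are averaged. The toy data and helper names are hypothetical.

```python
from collections import Counter


def k_nearest(train, query, k):
    """Return the k (input, output) pairs whose 1-D input is closest to the query."""
    return sorted(train, key=lambda pair: abs(pair[0] - query))[:k]


def classify(train, query, k=3):
    """Classification: the output is a category, chosen by majority vote."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]


def regress(train, query, k=3):
    """Regression: the output is a real value, the mean of the neighbors' values."""
    values = [value for _, value in k_nearest(train, query, k)]
    return sum(values) / len(values)


# Hypothetical data: body temperature (°C) -> "disease" / "no disease".
diagnosis = [(36.5, "no disease"), (36.8, "no disease"), (37.0, "no disease"),
             (38.5, "disease"), (39.0, "disease"), (39.4, "disease")]
# Hypothetical data: height (cm) -> weight (kg).
weights = [(150, 50.0), (160, 56.0), (170, 64.0), (180, 72.0), (190, 80.0)]

print(classify(diagnosis, 38.8))  # a category
print(regress(weights, 172))      # a real value
```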
Some popular examples of supervised machine learning algorithms are:
- Linear regression for regression problems.
- Random forest for classification and regression problems.
- Support vector machines for classification problems.
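For the first algorithm in that list, a linear regression with a single input variable can be fitted in closed form with ordinary least squares. This is a minimal sketch assuming one input and a handful of made-up labeled points; practical libraries handle multiple inputs and numerical edge cases that this version ignores.

```python
def fit_linear_regression(xs, ys):
    """Ordinary least squares for a single input: return (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the fitted line passes through the means.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled training data: inputs (X) with known outputs (Y).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

slope, intercept = fit_linear_regression(xs, ys)
print(slope, intercept)        # roughly 0.6 and 2.2
print(slope * 6.0 + intercept)  # prediction for an unseen input, roughly 5.8
```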
Unsupervised Machine Learning
Unsupervised learning is where you only have input data (X) and no corresponding output variables.
The goal for unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about the data.
This is called unsupervised learning because, unlike supervised learning above, there are no correct answers and there is no teacher. Algorithms are left to their own devices to discover and present the interesting structure in the data.
Unsupervised learning problems can be further grouped into clustering and association problems.
- Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
- Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people that buy X also tend to buy Y.
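A rule such as "people who buy X also tend to buy Y" is commonly scored by two numbers: support (the fraction of all transactions containing both X and Y) and confidence (the fraction of transactions containing X that also contain Y). The sketch below computes both for hypothetical shopping baskets. It only scores a rule you supply; it does not search for rules the way a full association rule learning algorithm would.

```python
# Hypothetical shopping baskets: each set is one customer's transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


print(support({"bread", "butter"}, baskets))         # how common the pair is
print(confidence({"butter"}, {"bread"}, baskets))    # butter -> bread
print(confidence({"bread"}, {"butter"}, baskets))    # bread -> butter
```

Confidence is directional: in this data every basket with butter also has bread, but only three of the four bread baskets contain butter.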
Some popular examples of unsupervised learning algorithms are:
- k-means for clustering problems.
- Apriori algorithm for association rule learning problems.
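A compact version of k-means (Lloyd's algorithm) is sketched below for two-dimensional points. It assumes squared Euclidean distance, initializes the centroids from randomly chosen data points with a fixed seed, and stops when the centroids stop moving. The customer data is made up to echo the purchasing-behavior example above.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def k_means(points, k, max_iterations=100, seed=0):
    """Lloyd's algorithm: return (centroids, labels) for a list of 2-D points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving, so the assignments are stable
        centroids = new_centroids
    return centroids, labels


# Hypothetical customers: (visits per month, average spend in hundreds of dollars).
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = k_means(customers, k=2)
print(labels)
```

No labels are given to the algorithm; the two groups emerge only from the distances between the inputs.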
Semi-Supervised Machine Learning
Problems where you have a large amount of input data (X) and only some of the data is labeled (Y) are called semi-supervised learning problems.
These problems sit between supervised and unsupervised learning.
A good example is a photo archive where only some of the images are labeled (e.g. dog, cat, person) and the majority are unlabeled.
Many real-world machine learning problems fall into this area. This is because it can be expensive or time-consuming to label data, as it may require access to domain experts, whereas unlabeled data is cheap and easy to collect and store.
You can use unsupervised learning techniques to discover and learn the structure in the input variables.
You can also use supervised learning techniques to make best-guess predictions for the unlabeled data, feed that data back into the supervised learning algorithm as training data, and use the model to make predictions on new unseen data.
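This best-guess approach is often called self-training or pseudo-labeling. The sketch below assumes a single hypothetical numeric feature per photo and uses a nearest-centroid classifier as a stand-in for whatever supervised algorithm you would actually use: the model is trained on the few labeled examples, labels the unlabeled ones with its best guesses, and is then retrained on everything. All names and data here are illustrative.

```python
def fit_centroids(labeled):
    """Train a nearest-centroid classifier: the mean input value for each label."""
    sums, counts = {}, {}
    for x, label in labeled:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, x):
    """Predict the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))


def self_train(labeled, unlabeled):
    """Pseudo-labeling: guess labels for the unlabeled data, then retrain on all of it."""
    model = fit_centroids(labeled)
    pseudo_labeled = [(x, predict(model, x)) for x in unlabeled]
    return fit_centroids(labeled + pseudo_labeled)


labeled = [(1.0, "cat"), (9.0, "dog")]       # a few expensive labeled examples
unlabeled = [1.5, 2.0, 2.5, 7.0, 7.5, 8.0]   # many cheap unlabeled ones

model = self_train(labeled, unlabeled)
print(model)
print(predict(fit_centroids(labeled), 4.9), predict(model, 4.9))
```

The unlabeled points pull each centroid toward where most of its data lies (cat moves from 1.0 to 1.75, dog from 9.0 to 7.875), so an input of 4.9 that the labeled-only model calls "cat" is classified as "dog" after retraining. Practical self-training often keeps only confident guesses; this sketch keeps them all for brevity.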
In this post you learned the difference between supervised, unsupervised and semi-supervised learning. You now know that:
- Supervised: All data is labeled and the algorithms learn to predict the output from the input data.
- Unsupervised: All data is unlabeled and the algorithms learn the inherent structure from the input data.
- Semi-supervised: Some data is labeled but most of it is unlabeled and a mixture of supervised and unsupervised techniques can be used.
Do you have any questions about supervised, unsupervised or semi-supervised learning? Leave a comment and ask your question and I will do my best to answer it.
WRITING A CLASSIFICATION PAPER
Classification is sorting things into groups or categories on a single basis of division. A classification paper says something meaningful about how a whole relates to parts, or parts relate to a whole. Like skimming, scanning, paraphrasing, and summarizing, classification requires the ability to group related words, ideas, and characteristics.
Prewriting and purpose
It is a rare writer, student or otherwise, who can sit down and draft a classification essay without prewriting. A classification paper requires that you create categories, so prewriting for a classification paper involves grouping things in different ways in order to discover what categories make the most sense for the purpose you intend.
An important part of creating useful categories is seeing the different ways that things can be grouped. For example, a list of United States presidents may be grouped in any number of ways, depending on your purpose. They might be classified by political party, age on taking office, or previous occupation, but you could just as well, depending on your purpose, classify them by the pets they kept or how they kept physically fit. If your purpose were to analyze presidential administrations, you would group information focusing on the presidents' more public actions, such as cabinet appointments and judicial nominations. On the other hand, if you intended to write about the private lives of presidents, you might select information about personal relationships or hobbies.
Make sure the categories you create have a single basis of classification and that the group fits the categories you propose. You may not, for example, write about twentieth century presidents on the basis of the kinds of pets they kept if some of those presidents did not keep pets. The group does not fit the category. If you intend to talk about all the presidents, you must reinvent the categories so that all the presidents fit into them. In the example below, the group is "all U.S. presidents" and the two categories are "those who kept pets" and "those who did not":
Some U.S. presidents have indulged their love of pets, keeping menageries of animals around the White House, and others have preferred the White House pet-free.
Alternatively, in the following example, the group is "twentieth century U.S. presidential pet-keepers" and the three categories are "dog lovers, cat lovers, and exotic fish enthusiasts."
Among the twentieth century presidents who kept pets, presidential pet-keepers can be classified as dog-lovers, cat-lovers, or exotic fish enthusiasts (for who can really love a fish?).
Developing a thesis
Once you have decided on your group, purpose, and categories, develop a thesis statement that does the following three things:
- names what group of people or things you intend to classify
- describes the basis of the classification
- labels the categories you have developed
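Here is a thesis statement for a classification paper written for a Health and Human Fitness class that includes all three of the above elements: the group (our last five U.S. presidents), the basis of classification (how formal their physical fitness regimens were), and the categories (regular private gym-goers, disciplined public joggers, and casual active sports enthusiasts):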
Our last five U.S. presidents have practiced physical fitness regimens that varied from the very formal to the informal. They have been either regular private gym-goers, disciplined public joggers, or casual active sports enthusiasts.
Order is the way you arrange ideas to show how they relate to one another. For example, it is common to arrange facts and discussion points from most- to least-important or from least- to most-important, or from oldest to most recent or longest to shortest. The example thesis statement above is ordered from most- to least-formal physical fitness activities. There is no one right way; use an ordering system that seems best to suit your purpose and the type of information you are working with.
For example, suppose you are writing about the last five U.S. presidents for a psychology class. If you wish to show that these presidents' public decisions spring directly from negative issues in their personal relationships, you might order your information from most private to most public actions to clearly establish this connection. Or, if you wish to give the reader the impression that he is moving into increasingly intimate knowledge of personal presidential foibles, you may choose the reverse, ordering your information from public to private.
Signal phrases, or transitions, typically used for classification papers include the following:
- this type of...
- several kinds of...
- in this category...
- can be divided into...
- classified according to...
- is categorized by...
These phrases signal to the reader your intention to divide and sort things. They also contribute to the unity of the paper.
Classification requires that you invent (or discover) abstract categories, impose them on a concrete whole, and derive something new: a tall order that you can nevertheless manage if you resist the temptation to skip the brainstorming steps. Remember that clinical dissection is never an aim in itself; the point of classification is to reveal and communicate something meaningful.