What Is Machine Learning?

In this tutorial, I am going to teach you about machine learning through short, precise notes. Before proceeding, I recommend reading my previous post on Artificial Intelligence.

Introduction to Machine Learning

  • In short, Machine Learning is a modeling technique that involves data.
  • More precisely, Machine Learning is a technique that works out a “model” from “data.”
  • Here, data literally means information such as documents, audio, images, etc.
  • The “model” is the final product of Machine Learning.

Types of Machine learning


Supervised learning

Supervised learning is very similar to the process by which a human learns. Consider how we gain new knowledge by solving exercise problems:

1: Select an exercise problem. Apply current knowledge to solve it, then compare the answer with the solution.

2: If the answer is wrong, modify the current knowledge.

3: Repeat Steps 1 and 2 for all the exercise problems.

When we draw an analogy between this example and the Machine Learning process:

  • the exercise problems and solutions correspond to the training data;
  • the knowledge corresponds to the model.
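
The exercise-problem loop above maps quite directly onto a perceptron, one of the simplest supervised learners. The sketch below is my own illustration, not part of the original notes: the function name `train_perceptron`, the learning rate, and the toy logical-AND data are all assumptions chosen to keep the example small.

```python
import numpy as np

def train_perceptron(problems, solutions, epochs=20, lr=1.0):
    """Learn weights by repeatedly correcting wrong answers."""
    weights = np.zeros(problems.shape[1])    # the "current knowledge" (the model)
    bias = 0.0
    for _ in range(epochs):                  # Step 3: repeat for all problems
        for x, solution in zip(problems, solutions):
            # Step 1: apply current knowledge, then compare with the solution
            answer = 1 if x @ weights + bias > 0 else 0
            error = solution - answer
            # Step 2: if the answer is wrong, modify the knowledge
            if error != 0:
                weights += lr * error * x
                bias += lr * error
    return weights, bias

# Toy training data (assumed for illustration): the logical AND of two inputs
problems = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
solutions = np.array([0, 0, 0, 1])

weights, bias = train_perceptron(problems, solutions)
answers = [1 if x @ weights + bias > 0 else 0 for x in problems]
print(answers)
```

Here `problems` and `solutions` play the role of the training data, and `weights` plus `bias` are the model.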

Building a supervised machine learning model has four stages:

1: Feature extraction

2: Training (model building)

3: Testing or validation

4: Prediction
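
To make the four stages concrete, here is a minimal sketch using only NumPy. Everything in it is an assumption for illustration: the synthetic height/weight readings, the 80/20 split, and the choice of a nearest-centroid classifier as the "model". Real projects would normally use a dedicated library for these steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Raw data: synthetic (height_cm, weight_kg) readings for two classes
raw_a = rng.normal(loc=[150.0, 50.0], scale=5.0, size=(50, 2))
raw_b = rng.normal(loc=[180.0, 80.0], scale=5.0, size=(50, 2))
raw = np.vstack([raw_a, raw_b])
labels = np.array([0] * 50 + [1] * 50)

# 1: Feature extraction - rescale each column to zero mean and unit variance
mean, std = raw.mean(axis=0), raw.std(axis=0)
features = (raw - mean) / std

# Hold back 20% of the shuffled rows for testing
order = rng.permutation(len(features))
train_idx, test_idx = order[:80], order[80:]

# 2: Training - the "model" here is simply one centroid per class
centroids = np.array([
    features[train_idx][labels[train_idx] == c].mean(axis=0) for c in (0, 1)
])

def predict(x):
    """Return, for each row of x, the class whose centroid is closest."""
    distances = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

# 3: Testing - accuracy on rows the model has not seen during training
accuracy = (predict(features[test_idx]) == labels[test_idx]).mean()
print(f"test accuracy: {accuracy:.2f}")

# 4: Prediction - classify a new, unlabeled reading
new_reading = (np.array([[175.0, 78.0]]) - mean) / std
print(predict(new_reading))
```

Note that the new reading is scaled with the same `mean` and `std` as the training data, so the feature extraction stays consistent between training and prediction.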

How Do Developers Use Supervised Machine Learning?

[Figure: how supervised learning works]

The image contains a tree, a house, green land, clouds, and sky, but it is tagged (labeled) only with house, tree, and cloud, because those are the only target labels we need to classify. The supervised model therefore predicts only house, tree, and cloud for this image. It cannot output green land or sky: although both are among the image's features, they were never given as labels.


This implies that in supervised learning the image features serve only to identify the target labels. In multi-label classification, we can therefore tag (map) an image with any target labels we choose, even labels for things that do not appear in the image. The following figure also illustrates supervised learning.

[Figure: multi-label image]

This image is classified as both cat and bird. It also contains trees, a wooden frame, and an electric wire, but in supervised learning the machine only understands the labels that were tagged (mapped) to the image features.

The model uses the image features to identify the tagged labels. If we had labeled the above image only with trees, wooden frame, and electric wire, the model would classify it with those target labels. Even though the cat and bird features are present, the model cannot recognize them, because the image was tagged (labeled) only with trees, wooden frame, and electric wire.
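
One way to see why untagged objects can never be predicted is to look at how multi-label targets are usually encoded: one 0/1 slot per target label, and nothing else. The snippet below is a small sketch of that idea; the helper names `encode` and `decode` are hypothetical, and the label list is taken from the house/tree/cloud example.

```python
# Target labels chosen by the developer (from the house/tree/cloud example)
target_labels = ["house", "tree", "cloud"]

def encode(tags):
    """Turn a set of tags into a 0/1 vector with one slot per target label."""
    return [1 if label in tags else 0 for label in target_labels]

def decode(vector):
    """Turn a 0/1 vector back into label names."""
    return [label for label, bit in zip(target_labels, vector) if bit]

# Everything visible in the image, including things that are not target labels
visible = {"tree", "house", "green-land", "cloud", "sky"}
target = encode(visible)

print(target)           # one slot per target label only
print(decode(target))   # "green-land" and "sky" have no slot, so they cannot appear
```

Because the model's output has the same shape as `target`, it has no way to express a label that was left out of `target_labels`.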

Broadly, supervised learning problems are of two types: regression and classification.

Regression

The output to be predicted is a continuous number related to a given input dataset. Regression does not determine a class; instead, it estimates a value. For example, if you have a dataset of ages and incomes and want a model that estimates income from age, that is a regression problem.
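
The age and income example can be sketched with a straight-line fit. The numbers below are made up purely for illustration, and a linear model is an assumption on my part; real income data would rarely be this tidy.

```python
import numpy as np

# Made-up illustration data: age in years, income in thousands
age = np.array([22, 25, 30, 35, 40, 45, 50, 55], dtype=float)
income = np.array([28, 32, 41, 48, 55, 61, 66, 70], dtype=float)

# Fit income ~ slope * age + intercept by least squares
slope, intercept = np.polyfit(age, income, deg=1)

def estimate_income(a):
    """Estimate income (a continuous value) for a given age."""
    return slope * a + intercept

print(f"income ~ {slope:.2f} * age + {intercept:.2f}")
print(f"estimated income at age 33: {estimate_income(33):.1f}")
```

The model returns a number rather than a class, which is what makes this a regression problem.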


Classification

A classification problem is literally about finding the class to which each piece of data belongs. Some examples may help.

[Figure: classification in machine learning]

The output to be predicted is the actual class or the probability of an event/class, and the number of classes to be predicted can be two or more. In this case, a training dataset of N elements looks like the figure below.

[Figure: classification using SVM]
[Figure: classification labels]
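
To illustrate the difference between predicting the actual class and predicting a probability, here is a small sketch. The class names and raw scores are assumptions, and softmax is just one common way to turn scores into probabilities.

```python
import numpy as np

classes = ["cat", "bird", "dog"]          # hypothetical class names

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)     # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

scores = np.array([2.0, 0.5, -1.0])       # assumed raw outputs of some model
probabilities = softmax(scores)
predicted = classes[int(np.argmax(probabilities))]

print(dict(zip(classes, probabilities.round(3))))   # probability per class
print(predicted)                                    # the actual class
```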

There are three main types of classes in a classification task.

[Figure: types of classes in machine learning]

Supervised learning algorithms

  • Decision tree
  • Support Vector Machine (SVM)
  • k Nearest Neighbors (kNN)

k Nearest Neighbors (kNN)

kNN classifies a new data point by the majority class among its k closest training points.

Support Vector Machine (SVM)

Here, maximizing the distance between the hyper-plane and the nearest data point of either class helps us decide on the right hyper-plane. This distance is called the margin. Consider the snapshot below:

[Figure: support vector machine]

Above, the margin for hyper-plane C is larger than for both A and B, so we choose C as the right hyper-plane. Another important reason for selecting the hyper-plane with the larger margin is robustness: a hyper-plane with a small margin has a high chance of misclassification.
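
The margin comparison can be checked numerically. In the sketch below, the data points and the three candidate hyper-planes are made up; they only borrow the names A, B, and C from the discussion above and are not the ones in the original figure. A real SVM searches for the best hyper-plane itself, whereas this code only compares three given candidates.

```python
import numpy as np

# Toy 2-D data: two linearly separable classes (made up for illustration)
points = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],    # class -1
                   [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])   # class +1
labels = np.array([-1, -1, -1, 1, 1, 1])

# Candidate hyper-planes of the form w . x + b = 0
candidates = {
    "A": (np.array([1.0, 0.0]), -2.5),   # vertical line x = 2.5
    "B": (np.array([0.0, 1.0]), -3.5),   # horizontal line y = 3.5
    "C": (np.array([1.0, 1.0]), -6.0),   # diagonal line x + y = 6
}

def margin(w, b):
    """Signed distance from the hyper-plane to the nearest data point.

    A negative value would mean at least one point is on the wrong side.
    """
    signed = labels * (points @ w + b) / np.linalg.norm(w)
    return signed.min()

margins = {name: margin(w, b) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)

for name, value in margins.items():
    print(f"hyper-plane {name}: margin {value:.2f}")
print("largest margin:", best)
```

All three candidates separate the two classes, but the one with the largest margin leaves the most room for new points that fall slightly away from the training data.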

Unsupervised Machine learning

There are situations where the desired output class/event is unknown for historical data. The objective in such cases is to study the patterns in the input dataset to get a better understanding of it, and to identify similar patterns that can be grouped into specific classes or events.

This is similar to a student who only sorts the problems by construction and attribute and does not learn how to solve them, because there are no known correct outputs.


The basic types of unsupervised learning are:

  • Clustering
  • Dimension reduction
  • Anomaly detection

Clustering

The goal here is to divide the input dataset into logical groups of related items. Some examples are grouping similar news articles, grouping similar customers based on their profiles, etc. A popular algorithm for clustering is K-means. Suppose we have n data points that we need to cluster into k groups (for example c1, c2, c3 when k = 3).
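
Here is a minimal K-means sketch in NumPy for k = 3. The function name `k_means`, the fixed number of iterations, and the synthetic blobs are my assumptions; library implementations add better initialization and a convergence check, and a random start like this one can occasionally settle on a poor grouping.

```python
import numpy as np

def k_means(data, k, iterations=20, seed=0):
    """Cluster the rows of data into k groups; returns (assignment, centers)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points picked at random
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iterations):
        # Assign every point to its nearest center
        distances = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        assignment = distances.argmin(axis=1)
        # Move each center to the mean of its points (keep it if it has none)
        centers = np.array([
            data[assignment == j].mean(axis=0) if np.any(assignment == j) else centers[j]
            for j in range(k)
        ])
    return assignment, centers

# Synthetic data: n = 90 points drawn around three made-up locations
rng = np.random.default_rng(1)
data = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(30, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(30, 2)),
    rng.normal(loc=[0.0, 5.0], scale=0.5, size=(30, 2)),
])

assignment, centers = k_means(data, k=3)
print(centers.round(2))          # one row per cluster: c1, c2, c3
print(np.bincount(assignment, minlength=3))   # how many points fell in each group
```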


Dimension Reduction

Here the goal is to simplify a large input dataset by mapping it to a lower-dimensional space. For example, carrying out analysis on a high-dimensional dataset is very computationally intensive, so to simplify it you may want to find the key variables that hold a significant percentage (say 95%) of the information and use only those for analysis.

A common algorithm for dimension reduction is Principal Component Analysis (PCA).
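
The "keep 95% of the information" idea can be sketched with PCA computed through NumPy's singular value decomposition. The synthetic dataset (five features driven by two hidden factors) is an assumption, and "information" is interpreted here as explained variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 5 features driven by 2 hidden factors plus small noise
hidden = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
data = hidden @ mixing + 0.05 * rng.normal(size=(200, 5))

# PCA via SVD of the mean-centered data
centered = data - data.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)

# Share of the total variance explained by each principal component
explained = singular_values**2 / np.sum(singular_values**2)
cumulative = np.cumsum(explained)

# Smallest number of components whose cumulative share reaches 95%
n_keep = int(np.searchsorted(cumulative, 0.95) + 1)

# Project the data onto those components
reduced = centered @ components[:n_keep].T
print(n_keep, reduced.shape)
```

The analysis can then continue on `reduced`, which has far fewer columns than the original `data`.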


Anomaly Detection

It is also commonly known as outlier detection.

Anomaly detection is the identification of items, events, or observations that do not conform to an expected pattern or behavior when compared with other items in a given dataset. It applies to a variety of domains, such as machine or system health monitoring, event detection, and fraud/intrusion detection. Recently, anomaly detection has attracted a lot of interest in the world of the Internet of Things, where it enables the detection of new behavior in a given context.
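
A very simple form of anomaly detection is to flag values that lie far from the mean, measured in standard deviations (the z-score). The sensor readings and the threshold of 2.5 below are assumptions for illustration; practical systems often use more robust methods.

```python
import numpy as np

# Made-up sensor readings with one obviously unusual value
readings = np.array([20.1, 19.8, 20.3, 20.0, 19.9, 20.2,
                     35.0, 20.1, 19.7, 20.0, 20.2, 19.9])

def find_anomalies(values, threshold=2.5):
    """Return the indices of values whose z-score exceeds the threshold."""
    z_scores = (values - values.mean()) / values.std()
    return np.flatnonzero(np.abs(z_scores) > threshold)

anomalies = find_anomalies(readings)
print(anomalies)              # positions of the outliers
print(readings[anomalies])    # the outlying values themselves
```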

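Reinforcement learning

Reinforcement learning differs from both supervised and unsupervised learning: instead of learning from labeled examples or from unlabeled patterns, the model learns from the rewards or penalties it receives for its actions.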

Data Analysis packages

There are three key packages that are most widely used for data analysis.

  • NumPy
  • Matplotlib
  • Pandas

NumPy

NumPy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays.

A NumPy array is a collection of values of the same data type, indexed by a tuple of nonnegative integers. The rank of the array is its number of dimensions, and the shape of the array is a tuple of integers giving its size along each dimension.
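
A short example of these terms, using a small array I made up:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.dtype)    # every element shares one data type
print(a.ndim)     # rank: the number of dimensions
print(a.shape)    # size of the array along each dimension
print(a[1, 2])    # indexing with a tuple of nonnegative integers (row 1, column 2)
```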

Pandas

Pandas is an open-source Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive.

Data Structures

Pandas introduces two new data structures to Python, Series and DataFrame, both of which are built on top of NumPy.


Series

This is a one-dimensional object similar to a column in a spreadsheet or SQL table. By default, each item is assigned an index label from 0 to N-1, where N is the number of items.
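
A quick sketch of a Series with the default index and with custom labels (the values and the labels "a", "b", "c" are made up):

```python
import pandas as pd

# Default index: integer labels starting at 0
s = pd.Series([10.5, 20.0, 30.25])
print(s.index.tolist())

# Custom index labels can be supplied instead
named = pd.Series([10.5, 20.0, 30.25], index=["a", "b", "c"])
print(named["b"])
```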


Basic Statistics Summary

Pandas has some built-in functions that help us understand the data better through basic statistical summaries.

describe() returns quick statistics such as count, mean, std (standard deviation), min, first quartile, median, third quartile, and max for each numeric column of the data frame.
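
For example, on a small made-up data frame:

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "age": [22, 35, 47, 51, 29],
    "income": [28.0, 48.0, 63.0, 70.0, 39.0],
})

summary = df.describe()
print(summary)

# Individual statistics can be looked up by row label and column name
print(summary.loc["50%", "age"])   # the median of the age column
```

In the output, the rows labeled 25%, 50%, and 75% are the first quartile, the median, and the third quartile.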

Matplotlib

Matplotlib is a plotting library for Python and its numerical mathematics extension NumPy, and a great package for viewing or presenting data in a pictorial or graphical format. It enables analysts and decision makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns.

Using Global Functions

Let’s look at some of the most commonly used charts.

  • plt.bar – creates a bar chart
  • plt.scatter – makes a scatter plot
  • plt.boxplot – makes a box and whisker plot
  • plt.hist – makes a histogram
  • plt.plot – creates a line plot
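
The five functions above can be tried together in one figure. This is a minimal sketch with made-up data; the output file name `charts.png` is an assumption, and in an interactive session you could call `plt.show()` instead of saving.

```python
import matplotlib.pyplot as plt

# Made-up data for illustration
categories = ["a", "b", "c"]
values = [3, 7, 5]
x = [1, 2, 3, 4, 5]
y = [2, 4, 3, 6, 5]

plt.figure(figsize=(12, 6))

plt.subplot(2, 3, 1)
plt.bar(categories, values, color="green")   # bar chart in a chosen color
plt.title("plt.bar")

plt.subplot(2, 3, 2)
plt.scatter(x, y)                            # scatter plot
plt.title("plt.scatter")

plt.subplot(2, 3, 3)
plt.boxplot(y)                               # box and whisker plot
plt.title("plt.boxplot")

plt.subplot(2, 3, 4)
plt.hist(y, bins=3)                          # histogram
plt.title("plt.hist")

plt.subplot(2, 3, 5)
plt.plot(x, y)                               # line plot
plt.title("plt.plot")

plt.tight_layout()
plt.savefig("charts.png")
```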

Thank you for reading.
