Understanding Machine Learning Methodology

February 22, 2020

Motivation

Consider a human cell sample extracted from a patient. The cell has certain characteristics, and an interesting question is what those characteristics tell us about it. One could easily presume that only a doctor with years of experience could diagnose a tumor and say whether the patient is developing cancer or not.

Let’s imagine that we’ve obtained a dataset containing characteristics of thousands of human cell samples extracted from patients who were believed to be at risk of developing cancer. Analysis of the original data showed that many of the characteristics differed significantly among different samples.

We can use the values of these cell characteristics to give an early indication of which type a new sample from another patient belongs to. To do that, we clean our data, select a suitable algorithm for building a prediction model, and train the model to recognize the patterns of the different kinds of cells within the data.

Once the model has been trained by going through the data iteratively, it can be used to predict the type of a new or unknown cell with rather high accuracy.

This is what we call machine learning! It is the way a machine learning model can take on part of a doctor’s task, or at least help that doctor make the process faster.

What is Machine Learning?

Machine learning is the ability of computers to learn without being explicitly programmed.

“Without being explicitly programmed” means the following. Suppose we have to identify the animal shown in an image. Before machine learning, each image would be transformed into a vector of features, and we would then have to write down a lot of rules or methods by hand to get the computer to detect the animals. That approach would often fail because it is highly dependent on the current data set.

This is where machine learning comes in. It allows us to build a model that looks at all the feature sets and their corresponding types of animals, and learns the pattern of each animal. The model is built by a machine learning algorithm, and it detects animals without being explicitly programmed to do so. In essence, machine learning follows the same process that a 4-year-old child uses to learn, understand, and differentiate animals.

So, machine learning algorithms, inspired by the human learning process, iteratively learn from data and allow computers to find hidden insights. These models help us in a variety of tasks, such as object recognition, summarization, recommendation, and so on.

Machine learning has a significant impact on society. For example:

  • PayPal uses machine learning to detect fraud.
  • Amazon uses machine learning to suggest what you might buy next.
  • Banks use machine learning to help approve loans.
  • Telcos use customer data to segment their customers.

Applications of Machine Learning

There are many applications of machine learning, such as search engine results, voice recognition, number plate recognition, and Dream Reader. This small sample is just the beginning: from self-driving cars to scientific discovery, all of these are part of today’s world of machine learning.

Take the search engine. When we use Google, it returns reliable information at speed. The process is automated, and as time goes on and more information is gathered, the search engine returns better and better results.

The same goes for voice recognition, which keeps getting better at recognizing what we say and transcribing it, whether for Google commands or for home devices that recognize our voice. We can see this in a number of recognition apps.

So machine learning is used because it helps make life easier and makes our processes more consistent and reliable.

Major Techniques of Machine Learning

So, let’s quickly examine a few of the more popular techniques.

  • Regression / Estimation: predicting continuous values.
    • This technique is used for predicting a continuous value.
      • E.g. predicting the price of a house based on its characteristics, or estimating the CO2 emission from a car’s engine.
  • Classification: predicting the class or category of a case.
    • A classification technique assigns each case to one of a set of discrete classes.
      • E.g. whether a cell is benign or malignant, or whether or not a customer will churn.
  • Clustering: finding the structure of data; summarization.
    • Clustering groups similar cases together.
      • E.g. finding similar patients, or customer segmentation in the banking field.
  • Anomaly Detection: discovering abnormal and unusual cases.
    • Anomaly detection flags cases that deviate from the usual pattern.
      • E.g. credit card fraud detection.
  • Sequence Mining: predicting next events (Markov Model, HMM).
    • Sequence mining is used for predicting the next event.
      • E.g. the click-stream in websites.
  • Dimension Reduction: reducing the size of data (PCA).
    • Dimension reduction is used to reduce the size of data.
  • Recommendation Systems: recommending items.
    • This associates people’s preferences with others who have similar tastes and recommends new items to them.
      • E.g. recommended books or food.
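To make one of these techniques concrete, here is a minimal sketch of sequence mining on a click-stream, using a first-order Markov model (the next page depends only on the current page). The session data and page names are made up for illustration, and the helper names `fit_transitions` and `predict_next` are hypothetical, not part of any library.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Count how often each page is directly followed by each other page."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, page):
    """Return the most frequently observed next page, or None if unseen."""
    if page not in transitions:
        return None
    return transitions[page].most_common(1)[0][0]


# Hypothetical click-stream sessions (one list of visited pages per visitor).
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "product", "cart", "checkout"],
    ["home", "search", "product", "reviews"],
    ["search", "product", "cart"],
]

model = fit_transitions(sessions)
print(predict_next(model, "product"))  # cart
print(predict_next(model, "home"))     # search
```

A real system would use far more sessions and would usually turn the counts into probabilities, but the idea of learning "what tends to come next" from past sequences is the same.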

How does Machine Learning work?

Machine Learning works in different phases.

Phase#1: Learning

Phase#1 is “Learning”, which is broken up into three different steps:

  • Pre-Processing: The first step is to clean and format the data. Computers are not smart about telling the difference between a picture and text when we send it in, so we usually start by cleaning the data so that, for example, pictures are kept together and text is processed separately. If we tried to process text the way we process a picture, or vice versa, we would not get the right answer. Once the data is pre-processed and nicely clean, we can start learning.
  • Learning: In this step, we take that data and learn from it. This is where supervised and unsupervised learning come in.
  • Testing: In this step, we give the model a test to make sure we are getting the right answers out of it.

Phase#2: Prediction

In this phase, we actually use the model or put it into commercial use to make predictions. The trained model and the new data come together, and the output is a prediction of what we are looking for. We can see that in the form of predicted data.
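The two phases can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the cell measurements are hand-made, the pre-processing step is skipped because the toy data is already clean, and the "model" is a simple 1-nearest-neighbour rule chosen for brevity rather than anything prescribed above.

```python
import math

# Hypothetical, hand-made samples: (clump_thickness, cell_size) -> label.
samples = [
    ((1.0, 1.0), "benign"), ((2.0, 1.0), "benign"), ((1.0, 2.0), "benign"),
    ((2.0, 2.0), "benign"), ((3.0, 1.0), "benign"),
    ((8.0, 7.0), "malignant"), ((9.0, 8.0), "malignant"), ((7.0, 9.0), "malignant"),
    ((8.0, 9.0), "malignant"), ((9.0, 7.0), "malignant"),
]

# Phase 1, learning: keep some samples for training and hold out the rest.
train = samples[:3] + samples[5:8]
test = samples[3:5] + samples[8:]


def predict(train_set, features):
    """1-nearest-neighbour: return the label of the closest training sample."""
    nearest = min(train_set, key=lambda item: math.dist(item[0], features))
    return nearest[1]


# Phase 1, testing: compare predictions with the known labels of held-out data.
correct = sum(predict(train, x) == y for x, y in test)
accuracy = correct / len(test)
print(f"test accuracy: {accuracy:.2f}")  # test accuracy: 1.00

# Phase 2, prediction: apply the trained model to a new, unlabeled sample.
new_cell = (7.5, 8.0)
print(predict(train, new_cell))  # malignant
```

The perfect test score here only reflects how cleanly the toy data separates; real data is noisier, which is exactly why the testing step matters.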

Machine Learning Workflow: it works iteratively through the following steps.

  • Define Objective
  • Collect Data
  • Prepare the Data
  • Select Algorithm
  • Train Model
  • Test Model
  • Predict

Machine Learning with Python

Python is a preferred language among data scientists. We can write our own machine learning algorithms in Python, and it works very well. However, there are a lot of modules and libraries already implemented in Python that can make our life much easier.

NumPy

NumPy is a math library for working with n-dimensional arrays in Python. It enables you to do computation efficiently and effectively. For numerical work it is usually much faster than plain Python lists, because operations on whole arrays are vectorized and run in compiled code.

  • E.g. for working with arrays, data types, and images (which are stored as arrays of pixel values), we need to know NumPy.

SciPy

SciPy is a collection of numerical algorithms and domain-specific toolboxes, including signal processing, optimization, statistics and much more. SciPy is a good library for scientific and high-performance computation.

Matplotlib

Matplotlib is a very popular plotting package that provides 2D plotting as well as 3D plotting.

Pandas

Pandas is a very high-level Python library that provides high-performance, easy-to-use data structures. It has many functions for data importing, manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.

Scikit-learn

Scikit-learn is a collection of algorithms and tools for machine learning. Scikit-learn is a free machine learning library for the Python programming language.

  • It has most of the classification, regression and clustering algorithms.
  • It’s designed to work with the Python numerical and scientific libraries:
    • NumPy
    • SciPy

Most of the tasks that need to be done in a machine learning pipeline are already implemented in scikit-learn, including:

  • Pre-processing of data
  • Feature selection
  • Feature extraction
  • Train/test splitting
  • Defining the algorithms
  • Fitting models
  • Tuning parameters
  • Prediction
  • Evaluation
  • Exporting the model

Tools for Hands-On

There are a number of tools and IDEs available to start with. One of the cool tools to begin with is “Jupyter Notebook”. An easy way to get it is to install Anaconda, which includes it. We then get a simple interface where we can easily run and test our code.

Supervised vs Unsupervised vs Reinforcement

Supervised Learning

Supervised learning is “task driven” (predict the next value). Here, we teach the model, and with that knowledge it can predict unknown or future instances.

Supervise means to observe and direct the execution of a task, project, or activity. Obviously, we aren’t going to be supervising a person. Instead, we’ll be supervising a machine learning model that might, for example, produce classification regions.

So, how do we supervise a machine learning model? We do this by “teaching” the model, i.e. we load the model with knowledge so that we can have it predict future instances.

But how exactly do we teach a model? We teach the model by training it with some data from a labeled dataset. It’s important to note that the data is labeled.

And what does a labeled dataset look like? Well, it can look something like a spreadsheet with proper labeling over it. The column names in the top row are called attributes, and the columns themselves, which hold the data, are called features.

If you plot this data and look at a single data point on a plot, it’ll have all of these attributes. That data point corresponds to one row of the table, also referred to as an observation.

Looking directly at the values of the data, you can have two kinds.

  • The first is numerical. When dealing with machine learning, the most commonly used data is numeric.
  • The second is categorical. It’s non-numeric because it contains characters rather than numbers.
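Because many algorithms expect numbers, categorical values are often converted to a numeric form before training. One common approach, which is an addition here and not something the text above prescribes, is one-hot encoding: each category becomes its own 0/1 column. The sketch below uses a hypothetical helper `one_hot_encode` and made-up colour values.

```python
def one_hot_encode(values):
    """Map each categorical value to a list of 0/1 indicators, one per category."""
    categories = sorted(set(values))
    encoded = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, encoded


# Hypothetical categorical feature.
colours = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colours)

print(categories)  # ['blue', 'green', 'red']
for colour, row in zip(colours, encoded):
    print(colour, row)
```

Each row contains exactly one 1, marking the category that observation belongs to.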

Supervised Learning Types

There are two types of Supervised Learning techniques.

  • Classification is the process of predicting a discrete class label or category.
  • Regression is the process of predicting a continuous value, as opposed to predicting a categorical value as in classification.
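As a small illustration of the regression case, the sketch below fits a straight line to house sizes and prices with ordinary least squares and then predicts a continuous value for an unseen size. The numbers are invented (they lie exactly on a line to keep the arithmetic easy to follow), and `fit_line` is a hypothetical helper, not a library function.

```python
def fit_line(xs, ys):
    """Ordinary least squares for the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: house size in square metres, price in thousands.
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [110.0, 170.0, 210.0, 250.0]

slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 90.0 + intercept

print(slope, intercept)   # 2.0 10.0
print(predicted_price)    # 190.0
```

The output is a number on a continuous scale; a classifier trained on the same kind of table would instead return one label out of a fixed set.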

Unsupervised Learning

Unsupervised learning is “data driven” (identify clusters). Here, we do not supervise the model; we let it work on its own to discover information that may not be visible to the human eye.

In other words, the unsupervised algorithm trains on the dataset and draws conclusions from UNLABELED data.

Unsupervised learning generally relies on more difficult algorithms than supervised learning, since we know little to nothing about the data or the outcomes that are to be expected.

Dimension reduction, density estimation, market basket analysis, and clustering are the most widely used unsupervised machine learning techniques.

  • Dimensionality reduction: Together with feature selection, it removes redundant features, which makes later tasks such as classification easier.
  • Market basket analysis: It is a modeling technique based upon the theory that if you buy a certain group of items, you’re more likely to buy another group of items.
  • Density estimation: It is a very simple concept that is mostly used to explore the data to find some structure within it.
  • Clustering: It is considered to be one of the most popular unsupervised machine learning techniques, used for grouping data points or objects that are somehow similar. It is commonly used for:
    • Discovering structure
    • Summarization
    • Anomaly detection

Cluster analysis has many applications in different domains, whether it be a bank’s desire to segment its customers based on certain characteristics, or helping an individual to organize and group his/her favorite types of books!
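A minimal sketch of customer segmentation by clustering is shown below, using plain k-means written from scratch. Several things here are assumptions made for illustration: the customer data is invented, k-means is just one of many clustering algorithms, and the starting centroids are picked by hand to keep the run deterministic (real implementations usually choose them randomly or with a smarter scheme).

```python
import math


def k_means(points, centroids, iterations=10):
    """Plain k-means: repeatedly assign points to the nearest centroid,
    then move each centroid to the mean of the points assigned to it."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (age, monthly spend in hundreds). No labels are given.
customers = [(25, 2), (27, 3), (24, 2), (52, 9), (55, 10), (50, 8)]

# Assumption: start from two hand-picked customers as initial centroids.
centroids, clusters = k_means(customers, [customers[0], customers[3]])

for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

The algorithm is never told which customers belong together; the two segments emerge from the distances between the data points alone.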

Comparison:

The biggest difference between supervised and unsupervised learning is that supervised learning deals with labeled data, while unsupervised learning deals with unlabeled data.

  • In supervised learning, we have machine learning algorithms for Classification and Regression.
  • In unsupervised learning, we have methods such as clustering.
  • In comparison to supervised learning, unsupervised learning has fewer models and fewer evaluation methods that can be used to ensure that the outcome of the model is accurate.
  • As such, unsupervised learning creates a less controllable environment, as the machine is creating outcomes for us.

Reinforcement Learning

Reinforcement learning involves teaching the machine to make decisions for itself based on the rewards its past actions have earned.
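The idea of learning from past rewards can be sketched with a tiny "multi-armed bandit" agent. This is a simplified illustration, not a full reinforcement learning setup: there is no state, the rewards are fixed numbers rather than noisy signals (to keep the sketch deterministic), and the epsilon-greedy strategy and the name `run_bandit` are choices made here for the example.

```python
import random


def run_bandit(rewards, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy agent that learns the value of each action from rewards."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)   # learned value of each action
    counts = [0] * len(rewards)        # how often each action was taken
    for step in range(steps):
        if step < len(rewards):
            action = step                             # try every action once
        elif rng.random() < epsilon:
            action = rng.randrange(len(rewards))      # explore a random action
        else:
            action = max(range(len(rewards)), key=lambda a: estimates[a])  # exploit
        reward = rewards[action]
        counts[action] += 1
        # Incremental average of the rewards seen so far for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


# Hypothetical environment: three actions with fixed rewards.
estimates, counts = run_bandit([1.0, 5.0, 2.0])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])

print(estimates)    # [1.0, 5.0, 2.0]
print(best_action)  # 1
```

The agent is never told which action is best; it discovers that by acting, observing the reward, and updating its estimates.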

Comparison of Machine Learning with other Key Technologies

  • AI tries to make computers intelligent in order to mimic the cognitive functions of humans. So, Artificial Intelligence is a general field with a broad scope including computer vision, language processing, creativity, and summarization.
  • Machine Learning is the branch of AI that covers the statistical part of artificial intelligence. It teaches the computer to solve problems by looking at hundreds or thousands of examples, learning from them, and then using that experience to solve the same problem in new situations.
  • Deep Learning is a specialized field of Machine Learning, based on multi-layer neural networks, in which computers learn and make intelligent decisions largely on their own. Deep learning involves a deeper level of automation in comparison with most machine learning algorithms.

Conclusion

That covers the basic understanding of machine learning. The next step is to go hands-on with its key concepts, one by one.

Data Scientist & Solution Architect || IBM Recognised Speaker, Mentor, and Teacher || Debater || Blogger || Guinness World Record Holder || Watson Solution Developer || IBM Community Activist || Aspiring to Inspire.
