July 17, 2018

Machine learning

Machine learning: PREDICT THE OUTPUT

Field of study that gives computers the ability to learn (train) without being explicitly programmed.

Machine learning is a set of algorithms that can take a set of inputs and return a prediction.

Artificial intelligence and machine learning are two easily confused terms. Artificial intelligence is the science of training machines to imitate or reproduce human tasks. A scientist can use different methods to train a machine. In the early days of AI, programmers wrote hard-coded programs, that is, they typed out every logical possibility the machine could face and how to respond. When a system grows complex, it becomes difficult to manage the rules. To overcome this issue, the machine can use data to learn how to handle all the situations in a given environment. We need ML in cases where we cannot directly write a program to handle every case.

Artificial intelligence improves an existing product. Before the age of machine learning, core products were built upon hard-coded rules. Firms introduced artificial intelligence to enhance the functionality of the product rather than starting from scratch to design new products. Think of a Facebook photo: a few years ago, you had to tag your friends manually. Nowadays, with the help of AI, Facebook recommends which friends to tag.

There are 5 basic steps used to perform a machine learning task:

Collecting data: Be it raw data from Excel, Access, text files etc., this step (gathering past data) forms the foundation of the future learning. The better the variety, density and volume of relevant data, the better the learning prospects for the machine become.

Preparing the data: Any analytical process thrives on the quality of the data used. One needs to spend time determining the quality of the data and then taking steps to fix issues such as missing data and the treatment of outliers.

Training a model (learning): This step involves choosing the appropriate algorithm and representation of the data in the form of a model. The cleaned data is split into two parts – train and test (the proportion depending on the prerequisites); the first part (training data) is used for developing the model. The second part (test data) is used as a reference.

Evaluating the model: To test the accuracy, the second part of the data (holdout / test data) is used. This step determines the precision of the chosen algorithm based on the outcome. A better test of a model’s accuracy is to see its performance on data that was not used at all during model building.

Improving the performance: This step might involve choosing a different model altogether or introducing more variables to augment the efficiency. That is why a significant amount of time needs to be spent on data collection and preparation. (A minimal code sketch of these steps follows below.)
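
As a rough end-to-end sketch of these steps, assuming scikit-learn and its bundled iris dataset (the dataset, the decision-tree model and the 70/30 split are illustrative choices, not prescribed by the steps above):

# Minimal sketch of the collect -> prepare -> train -> evaluate loop (assumes scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Collect and prepare: a clean bundled dataset stands in for real data gathering/cleaning.
X, y = load_iris(return_X_y=True)

# Train: split into train and test parts, fit a model on the training part only.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = DecisionTreeClassifier(max_depth=3)
model.fit(X_train, y_train)

# Evaluate: measure accuracy on the held-out test data the model never saw.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Improve: try a different model or settings and compare the test accuracy again.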


Machine learning is a subfield of computer science (CS) and artificial intelligence (AI) that deals with the construction and study of systems that can learn from data, rather than follow only explicitly programmed instructions.

Besides CS and AI, it has strong ties to statistics and optimization, which deliver both methods and theory to the field.


Machine learning is employed in a range of computing tasks where designing and programming explicit, rule-based algorithms is infeasible. Example applications include spam filtering, optical character recognition (OCR), search engines and computer vision. Machine learning, data mining, and pattern recognition are sometimes conflated.

Machine learning tasks can be of several forms.

In supervised learning, the computer is presented with example inputs and their desired outputs, given by a “teacher”, and the goal is to learn a general rule that maps inputs to outputs. Spam filtering is an example of supervised learning.
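
As a tiny illustration of learning a rule from labelled examples, here invented numeric features (count of suspicious words, count of links) stand in for real email features and scikit-learn’s logistic regression plays the role of the learner; this is only a sketch of the supervised idea:

# Toy supervised learning: learn a rule mapping inputs to labels (assumes scikit-learn).
from sklearn.linear_model import LogisticRegression

# Made-up features per email: [count of suspicious words, count of links]; 1 = spam, 0 = not spam.
X = [[8, 5], [7, 3], [6, 4], [0, 1], [1, 0], [2, 1]]
y = [1, 1, 1, 0, 0, 0]

clf = LogisticRegression().fit(X, y)    # the "teacher" supplies the labels y
print(clf.predict([[5, 4], [1, 1]]))    # learned rule applied to unseen emails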

In unsupervised learning, no labels are given to the learning algorithm, leaving it on its own to find groups of similar inputs (clustering), density estimates, or projections of high-dimensional data that can be visualized effectively. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end. Topic modeling is an example of unsupervised learning, where a program is given a list of human-language documents and is tasked to find out which documents cover similar topics.
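
For example, k-means clustering groups unlabelled points on its own; a minimal sketch assuming scikit-learn, with invented 2-D data:

# Toy unsupervised learning: group similar inputs with no labels given (assumes scikit-learn).
from sklearn.cluster import KMeans

# Unlabelled 2-D points that fall into two loose groups.
X = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
     [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)   # cluster assignments discovered by the algorithm, e.g. [0 0 0 1 1 1]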

In reinforcement learning, a computer program interacts with a dynamic environment in which it must perform a certain goal (such as driving a vehicle), without a teacher explicitly telling it whether it has come close to its goal or not.
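
A toy tabular Q-learning loop on a made-up 5-state corridor (step left or right, reward only at the right end) gives the flavour; the environment and all numbers here are invented purely for illustration:

# Toy reinforcement learning: tabular Q-learning on a 5-state corridor (illustrative only).
import random

n_states, actions = 5, [-1, +1]          # actions: step left or step right
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.3    # learning rate, discount, exploration rate

for _ in range(500):                     # episodes
    s = 0
    while s != n_states - 1:
        a = random.randrange(2) if random.random() < epsilon else Q[s].index(max(Q[s]))
        s2 = min(max(s + actions[a], 0), n_states - 1)
        r = 1.0 if s2 == n_states - 1 else 0.0            # reward only at the goal
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print([q.index(max(q)) for q in Q])      # learned policy per state (1 = move right); the terminal state is never updated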

Generalization: A core objective of a learner is to generalize from its experience. Generalization in this context is the ability of a learning machine to perform accurately on new, unseen tasks after having experienced a learning data set. The training examples come from some generally unknown probability distribution (considered representative of the space of occurrences) and the learner has to build a general model about this space that enables it to produce sufficiently accurate predictions in new cases.


Machine learning and data mining are commonly confused, as they often employ the same methods and overlap significantly.

1. Machine learning focuses on prediction, based on known properties learned from the training data. Machine learning concentrates on performing a given task.

2. Data mining focuses on the discovery of (previously) unknown properties in the data. Data mining deals with searching for specific information.


Data mining is the analysis step of Knowledge Discovery in Databases (KDD).

The two areas overlap in many ways: data mining uses many machine learning methods, but often with a slightly different goal in mind. On the other hand, machine learning also employs data mining methods as “unsupervised learning” or as a preprocessing step to improve learner accuracy.


Some machine learning systems attempt to eliminate the need for human intuition in data analysis, while others adopt a collaborative approach between human and machine.


1) The benefit of machine learning is that it can predict

If you’re just tagging your friends’ faces in pictures, you’re not using a machine learning model. If you upload a new photo and it suddenly tells you who each person is, that is machine learning. The whole point of machine learning is to predict things based on patterns and other factors it has been trained with. It can be anything: housing prices based on zip code and number of bedrooms, the likelihood of a flight delay based on time of year and weather, tagging of objects or people in pictures, etc.
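
For instance, a toy price predictor; the features (bedrooms and square footage – zip code would need extra encoding, so it is left out) and all numbers are invented, and scikit-learn is assumed:

# Toy prediction: house price from simple numeric features (assumes scikit-learn; data is invented).
from sklearn.linear_model import LinearRegression

# Features per house: [number of bedrooms, size in square feet]; target: price in dollars.
X = [[2, 900], [3, 1400], [3, 1600], [4, 2000], [5, 2600]]
y = [150_000, 230_000, 255_000, 320_000, 410_000]

model = LinearRegression().fit(X, y)
print(model.predict([[4, 1800]]))   # predicted price for a house it has never seen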


2) Machine learning requires training

You have to tell a machine learning model what it’s trying to predict. Think about how a human child learns. The first time they see a banana, they have no idea what it is. You then tell them it is a banana. The next time they see one (not the one you trained them on, because you already ate it), they’ll identify it as a banana. Machine learning works in a similar way. You show it as many pictures of a banana as you possibly can, tell it it’s a banana, and then test it with a picture of a banana it wasn’t trained on. This is a bit of an oversimplification, because I’m leaving out the part where you also have to tell it what isn’t a banana, and show it different kinds of bananas, different colors, pictures from different perspectives and angles, etc.
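
In code, that show-many-examples-then-test-on-a-new-one loop looks roughly like the sketch below, with made-up colour/shape features standing in for real banana pictures (assumes scikit-learn):

# Toy training loop: learn "banana vs not banana" from labelled examples (assumes scikit-learn).
from sklearn.neighbors import KNeighborsClassifier

# Made-up features per image: [yellowness 0-1, elongation 0-1]; label 1 = banana, 0 = not banana.
X_train = [[0.9, 0.8], [0.8, 0.9], [0.95, 0.85],   # bananas of different shades/angles
           [0.2, 0.3], [0.1, 0.2], [0.3, 0.1]]     # things that are not bananas
y_train = [1, 1, 1, 0, 0, 0]

clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(clf.predict([[0.85, 0.9]]))   # a banana it was never trained on -> expected [1]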


3) 80% accuracy is considered a success

We are not at the point in technology where a machine learning platform will achieve 100% accuracy at identifying bananas in pictures. But that is OK. It turns out that humans aren’t 100% accurate either. The unspoken rule in the industry is that a model with 80% accuracy is a success. If you think about identifying 800,000 images in your collection correctly, while maybe getting 200,000 wrong, you’re still saving yourself 80% of your time. That is huge from a value perspective. If I could wave a magic wand and increase your productivity that much, you’d give me lots of money. Well, it turns out I can, using machine learning, so please send check or cash.


4) Machine learning is different from AI, deep learning, and neural networks

People tend to throw all of these terms around casually. To sound like an expert, learn the difference.


AI — Artificial intelligence just means a computer that is as good as, or better than, humans at doing specific tasks. It can also mean a robot that can make decisions based on lots of input, not unlike the Terminator or C-3PO.

ML — Machine learning is one method for achieving AI. It means making a prediction about something based on training from sets of parsed data. There are lots of different ways an ML platform can implement training sets to predict things.


NN — Neural networks are one of the ways a machine learning model can predict things. Neural networks work a bit like your brain, tuning themselves through lots and lots of training to understand what a banana is supposed to look like. You create layers of nodes that get very deep.
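
As a rough sketch of ‘layers of nodes’, a tiny two-hidden-layer network learning XOR, assuming scikit-learn’s MLPClassifier:

# Tiny neural network: two hidden layers of nodes learning XOR (assumes scikit-learn).
from sklearn.neural_network import MLPClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]                      # XOR: not linearly separable, needs hidden layers

net = MLPClassifier(hidden_layer_sizes=(8, 8), activation="tanh",
                    solver="lbfgs", max_iter=2000, random_state=0)
net.fit(X, y)                         # training repeatedly tunes the connection weights
print(net.predict(X))                 # expected [0 1 1 0] once the weights are tuned
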
Machine learning is, in essence, giving training data to a program so that it gets better results on complex problems. It is very close to data mining. While many machine learning algorithms have been around for a long time, the ability to automatically apply complex mathematical calculations to big data – over and over, faster and faster – is a recent development.

Here are a few widely publicized examples of machine learning applications you may be familiar with. The heavily hyped, self-driving Google car? The essence of machine learning. Online recommendation offers such as those from Amazon and Netflix? Machine learning applications for everyday life. Knowing what customers are saying about you on Twitter? Machine learning combined with linguistic rule creation. Fraud detection?

How is machine learning different from X?
X = Artificial Intelligence(AI):
It refers to the procedure of programming a computer (machine) to make rational decisions. Ah! What is rational? Rationality is the basis of decision making.

I mentioned ‘rational’ instead of intelligence (as expected) because we human beings tend to take decisions that are rational and feasible rather than explicitly intelligent. This is because not all intelligent decisions need be rational and feasible (my hypothesis). Hence, the central motive behind using AI is to have the computer (machine) behave sensibly without human guidance rather than behave foolishly.

AI may include programs that check whether certain parameters within a program are behaving normally. For example, the machine may raise an alarm if a parameter, say ‘X’, crosses a certain threshold which might in turn affect the outcome of the related process.
Use of Artificial Intelligence in Machine Learning

Machine Learning is a subset of AI where the machine is trained to learn from its past experience. The past experience is developed through the data collected. It is then combined with algorithms such as Naïve Bayes or Support Vector Machines (SVM) to deliver the final results.
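
A minimal sketch of feeding collected past data to one such algorithm, here an SVM, assuming scikit-learn (the data is invented):

# Minimal sketch: past experience (labelled data) fed to an SVM (assumes scikit-learn; data invented).
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [1, 0], [4, 5], [5, 5], [5, 4]]   # past observations
y = [0, 0, 0, 1, 1, 1]                                  # outcomes recorded for them

model = SVC(kernel="linear").fit(X, y)
print(model.predict([[0.5, 0.5], [4.5, 4.5]]))          # final results for new cases -> [0 1]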

X = Statistics:
Statistics is the branch of mathematics which uses data, either from the entire population or a sample drawn from it, to carry out analysis and present inferences. Some statistical techniques used are regression, variance, standard deviation, conditional probability and many others.
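
A few of these quantities computed directly, using only Python’s standard library (the sample and the probabilities are invented):

# Basic statistics the text mentions, computed with Python's standard library.
import statistics

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print("mean:              ", statistics.mean(sample))
print("variance:          ", statistics.pvariance(sample))   # population variance
print("standard deviation:", statistics.pstdev(sample))

# Conditional probability P(A|B) = P(A and B) / P(B), e.g. P(spam | contains "offer").
p_b, p_a_and_b = 0.20, 0.15          # invented illustrative probabilities
print("P(A|B):            ", p_a_and_b / p_b)
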
Use of Statistics in Machine Learning
Let’s understand this. Suppose I need to separate the mails in my inbox into two categories: ‘spam’ and ‘important’. To identify the spam mails, I can use a machine learning algorithm known as Naïve Bayes, which checks the word frequencies of past spam mails to decide whether a new email is spam. Naïve Bayes uses the statistical technique of Bayes’ theorem (commonly known as conditional probability). Hence, we can say that machine learning algorithms use statistical concepts to execute machine learning.
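
That spam/important split looks roughly like the sketch below, with made-up word counts standing in for real mail text, assuming scikit-learn’s Naïve Bayes implementation:

# Naïve Bayes spam filter sketch: Bayes' theorem applied to word counts (assumes scikit-learn).
from sklearn.naive_bayes import MultinomialNB

# Made-up counts per mail of the words ["free", "winner", "meeting", "report"].
X = [[3, 2, 0, 0], [4, 1, 0, 0], [2, 3, 0, 1],    # past spam mails
     [0, 0, 2, 3], [0, 0, 3, 1], [1, 0, 2, 2]]    # past important mails
y = ["spam", "spam", "spam", "important", "important", "important"]

clf = MultinomialNB().fit(X, y)                   # estimates word frequencies per class
print(clf.predict([[2, 2, 0, 0], [0, 0, 1, 3]]))  # new mails -> expected ['spam' 'important']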

X = Deep Learning:
Deep Learning is associated with a machine learning algorithm (the Artificial Neural Network, ANN) which uses concepts inspired by the human brain to facilitate the modeling of arbitrary functions. ANNs require a vast amount of data, and the algorithm is highly flexible when it comes to modeling multiple outputs simultaneously. ANNs are a more complex topic, and we may do justice to them in a separate article.
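
To illustrate the ‘multiple outputs simultaneously’ point, a small network regressing two invented targets at once, assuming scikit-learn’s MLPRegressor:

# Small ANN modelling two outputs at once (assumes scikit-learn; data is invented).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
Y = np.column_stack([X[:, 0] + X[:, 1],           # output 1: a sum
                     X[:, 0] * X[:, 1]])          # output 2: a product (non-linear)

net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=3000, random_state=0)
net.fit(X, Y)                                     # one network, two targets per example
print(net.predict([[0.5, -0.25]]))                # roughly [0.25, -0.125]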

X = Data Mining:
During my initial days as an analyst, I always used to muddle the two terms Machine Learning and Data Mining. But later I learnt that Data Mining deals with searching for specific information, while Machine Learning solely concentrates on performing a given task. Let me cite the example which helped me remember the difference: teaching someone how to dance is Machine Learning, while using someone to find the best dance centers in the city is Data Mining.


RPA (Robotic Process Automation) – used to work with semi-structured/unstructured data


Machine Learning in Practice
Machine learning algorithms are only a very small part of using machine learning in practice as a data analyst or data scientist. In practice, the process often looks like:


1. Understand the domain, prior knowledge and goals. Talk to domain experts. Often the goals are very unclear. You often have more things to try than you can possibly implement.

2. Data integration, selection, cleaning and pre-processing. This is often the most time-consuming part. It is important to have high-quality data. The more data you have, the more dirty data you have to deal with. Garbage in, garbage out.

3. Learning models. The fun part. This part is very mature. The tools are general.

4. Interpreting results. Sometimes it does not matter how the model works as long as it delivers results. Other domains require that the model be understandable. You will be challenged by human experts.

5. Consolidating and deploying discovered knowledge. The majority of projects that are successful in the lab are not used in practice. It is very hard to get something used.


Uses
1) Predicting the stock market
2) Recommending movies – Netflix or Amazon
3) Recommending friends/ads on Facebook