Pathfinder

July 17, 2018

Machine learning

Machine learning: PREDICT THE OUTPUT

Machine learning is the field of study that gives computers the ability to learn (to be trained) without being explicitly programmed.

Machine learning is a set of algorithms that can take a set of inputs and return a prediction.

Artificial intelligence and machine learning are two terms that are easily confused. Artificial intelligence is the science of training machines to imitate or reproduce human tasks. A scientist can use different methods to train a machine. In the early days of AI, programmers wrote hard-coded programs, that is, they typed out every logical possibility the machine could face and how to respond. When a system grows complex, the rules become difficult to manage. To overcome this issue, the machine can use data to learn how to handle all the situations in a given environment. We need ML in cases where we cannot directly write a program to handle every case.

Artificial intelligence improves existing products. Before the age of machine learning, core products were built on hard-coded rules. Firms introduced artificial intelligence to enhance the functionality of a product rather than designing new products from scratch. Think of photos on Facebook: a few years ago, you had to tag your friends manually. Nowadays, with the help of AI, Facebook suggests which friends to tag.

There are 5 basic steps used to perform a machine learning task:

Collecting data: Whether it is raw data from Excel, Access, text files, etc., this step (gathering past data) forms the foundation of the future learning. The better the variety, density and volume of relevant data, the better the learning prospects for the machine.

Preparing the data: Any analytical process thrives on the quality of the data used. One needs to spend time determining the quality of the data and then taking steps to fix issues such as missing data and outliers.

Training a model (learn): This step involves choosing the appropriate algorithm and representing the data in the form of a model. The cleaned data is split into two parts, train and test (the proportion depends on the requirements). The first part (training data) is used for developing the model. The second part (test data) is used as a reference.

Evaluating the model: To test the accuracy, the second part of the data (holdout / test data) is used. The outcome shows how well the chosen algorithm fits the problem. An even better check of a model's accuracy is its performance on data that was not used at all while building the model.

Improving the performance: This step might involve choosing a different model altogether or introducing more variables to improve the results. That’s why a significant amount of time needs to be spent on data collection and preparation.
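The train/test split and the evaluation step can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the data (hours studied, passed or not) is made up, the "model" is a single threshold rather than a real algorithm, and the function names are hypothetical.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and split them into a training part and a test part."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def train_threshold(train_rows):
    """'Train' a one-feature model: the midpoint between the two class means."""
    positives = [x for x, label in train_rows if label == 1]
    negatives = [x for x, label in train_rows if label == 0]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2

def predict(threshold, x):
    """Predict class 1 when the feature is above the learned threshold."""
    return 1 if x > threshold else 0

def accuracy(threshold, rows):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(1 for x, label in rows if predict(threshold, x) == label)
    return correct / len(rows)

# Made-up, already cleaned data: (hours_studied, passed)
data = [(h, 0) for h in range(1, 9)] + [(h, 1) for h in range(11, 19)]

train_rows, test_rows = train_test_split(data)   # split into train and test
threshold = train_threshold(train_rows)          # learn from the training part only
test_accuracy = accuracy(threshold, test_rows)   # evaluate on the held-out part
print(len(train_rows), len(test_rows), test_accuracy)
```

The toy data is cleanly separable, so the held-out accuracy here says nothing about real problems; the point is only that the test rows are never seen during training.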

Machine learning is a subfield of computer science (CS) and artificial intelligence (AI) that deals with the construction and study of systems that can learn from data, rather than follow only explicitly programmed instructions.

Besides CS and AI, it has strong ties to statistics and optimization, which deliver both methods and theory to the field.

Machine learning is engaged in a range of computing tasks where designing and programming explicit, rule-based algorithms is infeasible. Example applications include spam filtering, optical character recognition (OCR), search engines and computer vision. Machine learning, data mining, and pattern recognition are sometimes conflated.

Machine learning tasks can be of several forms.

In supervised learning, the computer is presented with example inputs and their desired outputs, given by a “teacher”, and the goal is to learn a general rule that maps inputs to outputs. Spam filtering is an example of supervised learning.

In unsupervised learning, no labels are given to the learning algorithm, leaving it on its own to find groups of similar inputs (clustering), density estimates or projections of high-dimensional data that can be visualized effectively. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end. Topic modeling is an example of unsupervised learning, where a program is given a list of human language documents and is tasked to find out which documents cover similar topics.
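To make the clustering idea concrete, here is a small sketch of k-means, one common clustering algorithm (the text above does not name a specific one, so this choice is an assumption). The points are made up, and the naive initialization (the first k points) is chosen only to keep the example deterministic.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=10):
    """Tiny k-means: returns the centroids and the cluster index of each point."""
    centroids = [points[i] for i in range(k)]  # naive init: the first k points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignment

# Unlabeled, made-up 2-D points that form two visible groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

No labels are passed in; the grouping comes only from the distances between the points.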

In reinforcement learning, a computer program interacts with a dynamic environment in which it must perform a certain goal (such as driving a vehicle), without a teacher explicitly telling it whether it has come close to its goal or not.

Generalization: A core objective of a learner is to generalize from its experience. Generalization in this context is the ability of a learning machine to perform accurately on new, unseen tasks after having experienced a learning data set. The training examples come from some generally unknown probability distribution (considered representative of the space of occurrences) and the learner has to build a general model about this space that enables it to produce sufficiently accurate predictions in new cases.

Machine learning and data mining are commonly confused, as they often employ the same methods and overlap significantly.

1. Machine learning focuses on prediction, based on known properties learned from the training data. It concentrates on performing a given task.

2. Data mining focuses on the discovery of (previously) unknown properties in the data. It deals with searching for specific information, and it is the analysis step of Knowledge Discovery in Databases (KDD).

The two areas overlap in many ways: data mining uses many machine learning methods, but often with a slightly different goal in mind. On the other hand, machine learning also employs data mining methods as “unsupervised learning” or as a preprocessing step to improve learner accuracy.

Some machine learning systems attempt to eliminate the need for human intuition in data analysis, while others adopt a collaborative approach between human and machine.

1) The benefit of machine learning is that it can predict

If you’re just tagging your friends' faces in pictures, you’re not using a machine learning model. If you upload a new photo and it suddenly tells you who each person is, you are. The whole point of machine learning is to predict things based on patterns and other factors it has been trained with. It can be anything: housing prices based on zip code and number of bedrooms, the likelihood of a flight delay based on time of year and weather, tagging of objects or people in pictures, etc.

2) Machine learning requires training

You have to tell a machine learning model what it’s trying to predict. Think about how a human child learns. The first time they see a banana, they have no idea what it is. You then tell them it is a banana. The next time they see one (not the one you trained them on, because you already ate it) they’ll identify it as a banana. Machine learning works in a similar way. You show it as many pictures of a banana as you possibly can, tell it it’s a banana, and then test it with a picture of a banana it wasn’t trained on. This is a bit of an oversimplification, because I’m leaving out the part where you also have to tell it what isn’t a banana, and show it different kinds of bananas, different colors, pictures from different perspectives and angles, etc.

3) 80% accuracy is considered a success

We are not at the point in technology where a machine learning platform will achieve 100% accuracy at identifying bananas in pictures. But that is OK. It turns out that humans aren’t 100% accurate either. The unspoken rule in the industry is that a model with 80% accuracy is a success. If you think about how useful it is to identify 800,000 images out of a collection of 1,000,000 correctly, whilst MAYBE not getting 200,000 correct, you’re still saving yourself 80% of your time. That is huge from a value perspective. If I could wave a magic wand and increase your productivity that much, you’d give me lots of money. Well, it turns out I can, using machine learning, so please send check or cash.

4) Machine learning is different from AI, deep learning, or neural networks

People tend to throw all of these terms around casually. To sound like an expert, learn the difference.

AI: Artificial intelligence just means a computer that is as good as or better than humans at doing specific tasks. It can also mean a robot that can make decisions based on lots of input, not unlike the Terminator or C-3PO.

ML: Machine learning is a method for achieving AI. It means making a prediction about something based on training from sets of parsed data. There are lots of different ways an ML platform can implement training sets to predict things.

NN: Neural networks are one of the ways a machine learning model can predict things. Neural networks work a bit like your brain, tuning themselves through lots and lots of training to understand what a banana is supposed to look like. You create layers of nodes that get very deep.

Machine learning essentially means giving training data to a program so that it gets better results on complex problems. It is very close to data mining. While many machine learning algorithms have been around for a long time, the ability to automatically apply complex mathematical calculations to big data – over and over, faster and faster – is a recent development. Here are a few widely publicized examples of machine learning applications you may be familiar with: The heavily hyped, self-driving Google car? The essence of machine learning. Online recommendation offers such as those from Amazon and Netflix? Machine learning applications for everyday life. Knowing what customers are saying about you on Twitter? Machine learning combined with linguistic rule creation. Fraud detection?
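The "tuning through lots of training" idea behind neural networks can be shown with the smallest possible building block, a single artificial neuron (a perceptron). This is only a sketch under assumptions: real networks stack many such nodes in layers and use other training methods, and the task here (learning the logical AND of two inputs) is a made-up toy problem.

```python
def predict(weights, bias, inputs):
    """Fire (return 1) when the weighted sum of the inputs plus the bias is positive."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(examples, epochs=20, learning_rate=1.0):
    """Perceptron learning rule: nudge the weights after every wrong prediction."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)  # 0 when the guess is right
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Toy training set: the neuron should output 1 only when both inputs are 1.
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(examples)
predictions = [predict(weights, bias, inputs) for inputs, _ in examples]
print(weights, bias, predictions)
```

Nothing about AND is written into the code; the weights end up encoding it only because the examples were shown repeatedly.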

How is machine learning different from X?
X = Artificial Intelligence(AI):
It refers to the procedure of programming a computer (machine) to act rationally. What is rational? Rationality is the basis for making a decision.

I mentioned ‘rational’ instead of ‘intelligent’ (as you might have expected) because we human beings tend to make decisions that are rational and feasible rather than explicitly intelligent. This is because not all intelligent decisions are rational and feasible (my hypothesis). Hence, the central motive behind using AI is to get the computer (machine) to behave sensibly without human guidance, instead of being doltish!

AI may include programs to check whether certain parameters within a program are behaving normally. For example, the machine may raise an alarm if a parameter say ‘X’ crosses a certain threshold which might in turn affect the outcome of the related process.
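The alarm example above is a plain rule-based check, and a minimal sketch of it could look like this. The parameter name, the readings and the threshold of 90 are made up for illustration.

```python
def check_parameter(name, value, threshold):
    """Return an alarm message if the value crosses the threshold, else None."""
    if value > threshold:
        return f"ALARM: {name} = {value} crossed the threshold of {threshold}"
    return None

# Hypothetical readings of parameter 'X' over time.
readings = [72, 75, 91, 68]
alarms = []
for reading in readings:
    message = check_parameter("X", reading, threshold=90)
    if message is not None:
        alarms.append(message)

print(alarms)
```

Note that nothing is learned here: the threshold is fixed by a person, which is exactly the kind of hard-coded rule that machine learning tries to replace with behavior learned from data.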
Use of Artificial Intelligence in Machine Learning

Machine Learning is a subset of AI where the machine is trained to learn from its past experience. The past experience is built up from the data collected. That data is then combined with algorithms such as Naïve Bayes and Support Vector Machines (SVM) to deliver the final results.

X = Statistics:
Statistics is the branch of mathematics that uses data, either from the entire population or from a sample drawn from the population, to carry out analysis and present inferences. Some of the statistical techniques used are regression, variance, standard deviation, conditional probability and many others.
Use of Statistics in Machine Learning
Let’s understand this. Suppose I need to separate the mails in my inbox into two categories: ‘spam’ and ‘important’. To identify the spam mails, I can use a machine learning algorithm known as Naïve Bayes, which looks at how often words occurred in past spam mails to decide whether a new email is spam. Naïve Bayes relies on Bayes' theorem, a result about conditional probability. Hence, we can say that machine learning algorithms use statistical concepts to do their work.
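A minimal sketch of the Naïve Bayes spam example, written in plain Python so the statistics stay visible. The four training mails are made up, the function names are hypothetical, and add-one (Laplace) smoothing is an assumption added so that unseen words do not produce a probability of zero.

```python
import math
from collections import Counter

def train_naive_bayes(messages):
    """Count words per label and mails per label from (text, label) pairs."""
    word_counts = {"spam": Counter(), "important": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Pick the label with the highest log P(label) + sum of log P(word | label)."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["important"])
    total_messages = sum(label_counts.values())
    scores = {}
    for label in word_counts:
        score = math.log(label_counts[label] / total_messages)  # prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Conditional probability of the word given the label, add-one smoothed.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)

# Made-up past mails with known categories.
past_mails = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "important"),
    ("project report attached", "important"),
]
word_counts, label_counts = train_naive_bayes(past_mails)
first = classify("free money prize", word_counts, label_counts)
second = classify("monday project meeting", word_counts, label_counts)
print(first, second)
```

The "naïve" part is the assumption that words occur independently of each other given the label, which is why the per-word probabilities can simply be multiplied (added, in log form).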

X = Deep Learning:
Deep Learning is associated with a machine learning algorithm, the Artificial Neural Network (ANN), which is inspired by the human brain and can model arbitrary functions. An ANN requires a vast amount of data, and it is highly flexible when it comes to modeling multiple outputs simultaneously. ANN is a more complex topic, and we may do justice to it in an altogether separate article.

X = Data Mining:
During my initial days as an analyst, I always used to muddle the two terms Machine Learning and Data Mining. Later I learnt that Data Mining deals with searching for specific information, while Machine Learning concentrates solely on performing a given task. Let me cite the example that helped me remember the difference: teaching someone how to dance is Machine Learning, and using someone to find the best dance centers in the city is Data Mining.

RPA – to work with semi-structured/unstructured data

Machine Learning in Practice
Machine learning algorithms are only a very small part of using machine learning in practice as a data analyst or data scientist. In practice, the process often looks like:

1. Understand the domain, prior knowledge and goals. Talk to domain experts. Often the goals are very unclear. You often have more things to try than you can possibly implement.

2. Data integration, selection, cleaning and pre-processing. This is often the most time-consuming part. It is important to have high-quality data. The more data you have, the more it sucks because the data is dirty. Garbage in, garbage out.

3. Learning models. The fun part. This part is very mature. The tools are general.

4. Interpreting results. Sometimes it does not matter how the model works as long as it delivers results. Other domains require that the model is understandable. You will be challenged by human experts.

5. Consolidating and deploying discovered knowledge. The majority of projects that are successful in the lab are not used in practice. It is very hard to get something used.

Uses
1) Predicting the stock market
2) Recommending movies (Netflix or Amazon)
3) Recommending friends/ads on Facebook

June 27, 2018

RPA Blogs

http://automatorsworld.com/
https://www.totalebizsolutions.com/blogs/robotic-process-automation-blogs/

RPA Team structure

CoE Team

IT Architecture
Delivery Manager

- Focused on the strategic nature of Robotics for each of the delivery areas
- Stakeholder management at a senior level
- The delivery manager reports into the Head of Robotic Process Automation
- There may be multiple delivery managers, depending on the number of individual businesses within an organization.
- Collates and communicates the update on each project being delivered
- Holds a weekly meeting with process owners on the delivery of their automated processes
- Undertakes assessments of new robotic processes of a high-value nature
- Tracks delivery vs costs vs benefits
- Manages escalations for the development pods and ensures they are raised to the right stakeholders
- Manages the overall delivery plans for each delivery pod

Business Analyst
RPA Developer
Support and Infra Team
- When a process goes live, it transitions from test to live production. There is a handover at this point with the delivery pod.
- From this point forward, the support team monitors processes all day. Processes can have a number of breakpoints within the process itself (handovers/pick-ups from workflows/folders, etc.).
- Their role is to step through a sample of cases each day and to respond to alerts where there is a direct failure in the process.
- The control room acts as the first point of contact for all updates and alerts on a process. The process owner can liaise with them directly.

April 15, 2018

Identify right process for automation

Benefit to the company after automation

Volume of transactions
Number of employee hours (FTE)
Faster processing (reduced SLA time)
Error-prone tasks that humans do
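Transaction volume and employee hours can be combined into a rough estimate of the FTE a bot might free up. This is only a back-of-the-envelope sketch: the formula, the function name, the example numbers and the assumption of 2,080 working hours per FTE per year (40 hours × 52 weeks) are all illustrative and not taken from any RPA tool.

```python
def estimate_fte_saved(transactions_per_year, minutes_per_transaction, hours_per_fte_year=2080):
    """Rough FTE estimate: manual hours spent per year divided by hours in one FTE-year."""
    hours_spent = transactions_per_year * minutes_per_transaction / 60
    return hours_spent / hours_per_fte_year

# Hypothetical process: 52,000 transactions a year, 12 minutes of manual work each.
fte_saved = estimate_fte_saved(transactions_per_year=52_000, minutes_per_transaction=12)
print(fte_saved)
```

A real business case would also subtract the bot's development and maintenance effort and account for exceptions that still need manual handling.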

Identify the RPA process candidate

Define the start and end point of the process
Take one process and break it down into subprocesses
Structured Process
Rule-based process
Clear decision logic
Definable workflow
Uses multiple applications or tools
Repetitive, mundane and unsatisfying Process
High volume and low to medium complexity
Duplication of efforts
Development time for BOT
Stability of process

How often does the process get changed?

How much does it change?

What is the lead time for change?

Operation team Governance

- Monitor BOT result

- Run BOT on time

- Convey change in Password

April 8, 2018

Understand the complexity of Process for automation

Understand the complexity of Process

- Number of steps, scenarios, applications and roles
- Stability of process
- Number of IT systems or IT services required to fulfill the process, or
- Number of activities/Task, Number of controls, or Number of decisions, or
- Number of people involved in the process, or
- Number of departments involved in the process, or
- The mechanism of the decision making, or
- Level of uncertainty and potential change in any of the process activities

March 31, 2018

RPA

Evolution of RPA: Do/Learn/Think

Do (structured data and rule-based flow)

- RPA tool (Automation Anywhere, UiPath, etc.)
- VBA (macros)
- VBScript / BAT files / PowerShell
- JavaScript/jQuery
- Excel formulas

Learn (Unstructured data and complex process)
The BOT needs to be retrained frequently; how often depends on how often its decisions change and on the diversity of the input data.
- Python
- Machine Learning
- Deep Learning
- Cognitive Learning

Think (BOT can take a decision)
- AI

Advantage of RPA

- Frees up employees to do higher-value work
- Save FTE
- Speed up the processing time (Reduce SLA)
- Works 24x7
- Communicate with multiple technologies (can eliminate development efforts)
- Checker BOT
- Maker BOT
- Generate Report (data massage)
- Process improvement
- Data migration
- Higher outcome quality
- Scalability and agility
etc...

Timeline of a project in RPA

- BRD missing some of the details
- Application access not granted on time
- POC
- Depends on many factors, such as project complexity, the RPA tool used, whether Citrix automation is involved, the reusable assets available (bots, libraries, etc.) and the project team size
- Changes in requirements after development

RPA can fail for the reasons below

Organizational pitfalls:
- Lack of time commitment from the local team
- Lack of leadership buy-in
- Lack of IT ownership
- Unclear responsibilities
- Unrealistic expectation
- Communication gaps between the operations and application teams

Process pitfalls:
- Choosing a process with insignificant business impact
- Choosing a too complex process
- Choosing a process where better custom solutions exist
- Lack of focus in process selection
- Striving for end-to-end automation when it is not cost-effective
- Business exceptions (improper data)

Technical pitfalls:
- Not following the SDLC properly
- Architecture and authentication
- Choosing a solution that requires intensive programming
- System exceptions (application bugs)

Post-implementation pitfalls:
- Operation team governance
- Scalability
- Maintenance

Test Cases for Automation

- Test case executed with a different set of data
- Test case executed with complex business logic
- Test case executed with a different set of users
- Test case involves a large amount of data

AI Product

- Chatbot
- Voicebot
- Siri/Cortana
etc...

January 27, 2018

Oracle and SQL Server Topics

Oracle Topics

- Installation of Oracle client

- Tablespace

- DBF file

- Relation between tablespace and DBF file

- User

- Role and permission

- Pseudo columns

- Savepoint, rollback and commit

- Insert, delete, update and select

- Equi join and non equi join

- Natural join

- Cross join/ Cartesian join

- Self join

- Left, Right and full outer join

- Column format

- Aggregate, Number and Conversion function

- Buffer command

- IN / EXISTS

- Subquery

- Table

- View

- Index

- Materialized view

- Exception handling

- Cursor

- Trigger

- Function

- Procedure

- Package

- Dynamic SQL

- Loops

- Array

- Collection

- PL/SQL Debugger, PL/SQL Profiler, PL/SQL Tuning Tips

- Replication

- Job

http://www.way2tutorial.com/plsql/tutorial.php

https://docs.oracle.com/en/database/oracle/oracle-database/index.html

https://docs.oracle.com/cd/B14117_01/nav/portal_1.htm - Download PDF

https://www.tutorialspoint.com/plsql/plsql_arrays.htm

https://www.guru99.com/pl-sql-data-types.html

http://sql-plsql.blogspot.in/2007/05/oracle-plsql-nested-tables.html

http://dotnetmentors.com/sql/sql-server-common-table-expression-with-examples.aspx
https://www.tutorialgateway.org/sql-server-cte/

A common table expression (CTE) is a temporary named result set that you can reference within a SELECT, INSERT, UPDATE, DELETE or MERGE statement.

SQL Server supports two types of CTEs: recursive and nonrecursive.

WHEN TO USE

If you need to reference or join the same data set multiple times, you can do so by defining a CTE. It can therefore be a form of code reuse.
Once the CTE is declared, you can refer to it multiple times within the scope of the same query.
It can be used in place of views or temporary tables.
Recursive queries are easy to write with a CTE.
It can sometimes be used instead of a cursor.

Syntax of CTE
WITH CTE_name (col1, col2, ...) AS
(
    <CTE_query>
)
SELECT col1, col2, ...
FROM CTE_name;
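To try the syntax without a SQL Server instance, here is a small sketch that runs one nonrecursive and one recursive CTE through Python's built-in sqlite3 module. SQLite is used purely as a stand-in, and the orders table and its rows are made up. One difference to keep in mind: SQLite accepts the RECURSIVE keyword, while SQL Server writes recursive CTEs with a plain WITH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('alice', 120.0), ('bob', 40.0), ('alice', 30.0), ('carol', 75.0);
""")

# Nonrecursive CTE: name an intermediate result set, then query it like a table.
nonrecursive = conn.execute("""
    WITH customer_totals (customer, total) AS (
        SELECT customer, SUM(amount)
        FROM orders
        GROUP BY customer
    )
    SELECT customer, total
    FROM customer_totals
    WHERE total > 50
    ORDER BY customer
""").fetchall()

# Recursive CTE: an anchor row plus a member that refers back to the CTE itself.
recursive = conn.execute("""
    WITH RECURSIVE counter (n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM counter WHERE n < 5
    )
    SELECT n FROM counter
""").fetchall()

print(nonrecursive)
print(recursive)
conn.close()
```

The nonrecursive query shows the "define once, reference by name" use described above; the recursive one shows the kind of row-by-row generation that might otherwise be written with a loop or cursor.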