July 17, 2018

Machine learning

Machine learning: PREDICT THE OUTPUT

Field of study that gives computers the ability to learn (train) without being explicitly programmed.

Machine learning is a set of algorithms that can take a set of inputs and return a prediction.
Artificial intelligence and machine learning are two terms that are often confused. Artificial intelligence is the science of training machines to imitate or reproduce human tasks. A scientist can use different methods to train a machine. In the early days of AI, programmers wrote hard-coded programs, that is, they typed out every logical possibility the machine could face and how to respond to it. When a system grows complex, the rules become difficult to manage. To overcome this issue, the machine can use data to learn how to handle all the situations in a given environment. We need ML in cases where we cannot directly write a program to handle every case.

Artificial intelligence improves existing products. Before the age of machine learning, core products were built upon hard-coded rules. Firms introduced artificial intelligence to enhance the functionality of a product rather than designing new products from scratch. Think of Facebook photos: a few years ago, you had to tag your friends manually. Nowadays, with the help of AI, Facebook recommends which friends to tag.

There are 5 basic steps used to perform a machine learning task:

Collecting data: Whether the raw data comes from Excel, Access, text files, etc., this step (gathering past data) forms the foundation of future learning. The greater the variety, density and volume of relevant data, the better the machine's learning prospects become.

Preparing the data: Any analytical process thrives on the quality of the data used. One needs to spend time determining the quality of the data and then take steps to fix issues such as missing data and outliers.

(Learn) Training a model: This step involves choosing the appropriate algorithm and representing the data in the form of a model. The cleaned data is split into two parts, train and test (the proportion depends on the requirements). The first part (training data) is used to develop the model; the second part (test data) is kept aside as a reference.

Evaluating the model: To test accuracy, the second part of the data (holdout / test data) is used. The outcome shows how well the chosen algorithm performs. The most reliable check of a model's accuracy is its performance on data that was not used at all while building it.

Improving the performance: This step might involve choosing a different model altogether or introducing more variables to improve the results. That’s why a significant amount of time needs to be spent on data collection and preparation.
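The five steps above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not a recipe: the data rows are made up, "preparing" is reduced to dropping rows with a missing value, and the model is a simple nearest-centroid classifier (the average feature vector per label), chosen only because it fits in a few lines. The names `prepare`, `train_test_split`, `train`, `predict` and `evaluate` are hypothetical helpers, not a library API.

```python
import random

# Hypothetical "collected" data: (features, label) rows, one of them with a missing value.
raw_rows = (
    [((1 + 0.1 * i, 1 + 0.05 * i), "small") for i in range(8)]
    + [((5 + 0.1 * i, 5 - 0.05 * i), "large") for i in range(8)]
    + [((None, 3.0), "small")]
)

def prepare(rows):
    """Preparing the data: drop rows that have any missing feature value."""
    return [(features, label) for features, label in rows if None not in features]

def train_test_split(rows, test_ratio=0.25, seed=42):
    """Shuffle, then hold back the last test_ratio share of rows as test data."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]

def train(rows):
    """Training: a nearest-centroid model, i.e. the average feature vector per label."""
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(model, features):
    """Predict the label whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(features, model[label])))

def evaluate(model, rows):
    """Evaluating: accuracy on rows the model did not see during training."""
    return sum(predict(model, f) == label for f, label in rows) / len(rows)

clean = prepare(raw_rows)
train_rows, test_rows = train_test_split(clean)
model = train(train_rows)
print(len(clean), len(train_rows), len(test_rows), evaluate(model, test_rows))
```

If the test accuracy came out poor, the fifth step would mean going back and trying a different model or adding more features.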


Machine learning is a subfield of computer science (CS) and artificial intelligence (AI) that deals with the construction and study of systems that can learn from data, rather than follow only explicitly programmed instructions.

Besides CS and AI, it has strong ties to statistics and optimization, which deliver both methods and theory to the field.


Machine learning is employed in a range of computing tasks where designing and programming explicit, rule-based algorithms is infeasible. Example applications include spam filtering, optical character recognition (OCR), search engines and computer vision. Machine learning, data mining, and pattern recognition are sometimes conflated.

Machine learning tasks can be of several forms.

In supervised learning, the computer is presented with example inputs and their desired outputs, given by a “teacher”, and the goal is to learn a general rule that maps inputs to outputs. Spam filtering is an example of supervised learning.

In unsupervised learning, no labels are given to the learning algorithm, leaving it on its own to find structure in its input, such as groups of similar inputs (clustering), density estimates or projections of high-dimensional data that can be visualized effectively. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end. Topic modeling is an example of unsupervised learning, where a program is given a list of human language documents and is tasked to find out which documents cover similar topics.
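Clustering can be illustrated with k-means, one common clustering algorithm (swapped in here as an example; the text does not name one). A minimal sketch, assuming 2-D points and using the first k points as the starting centroids; the points are hypothetical. No labels are given: the algorithm only groups points that lie close together.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=100):
    """Plain k-means: assign each point to its nearest centroid, then move centroids to cluster means."""
    centroids = [list(p) for p in points[:k]]  # naive initialisation: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of its cluster (keep the old one if a cluster is empty)
        new_centroids = [
            [sum(dim) / len(c) for dim in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1.5, 2), (2, 1), (8, 8), (9, 8.5), (8.5, 9)]  # hypothetical, unlabeled
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

The result depends on the starting centroids; real implementations usually try several initialisations.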

In reinforcement learning, a computer program interacts with a dynamic environment in which it must achieve a certain goal (such as driving a vehicle), without a teacher explicitly telling it whether it has come close to its goal or not.
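To make the reinforcement learning idea concrete, here is a minimal tabular Q-learning sketch; Q-learning is one standard reinforcement learning algorithm, picked here for illustration. The environment is a hypothetical five-cell corridor: the agent starts at cell 0 and only receives a reward when it reaches cell 4, so nobody tells it which individual move was right. The learning rate, discount and exploration values are arbitrary assumptions.

```python
import random

N_STATES = 5       # cells 0..4 of a hypothetical corridor; cell 4 is the goal
ACTIONS = (-1, 1)  # move left or move right

def env_step(state, action):
    """Move within the corridor; reward 1.0 only when the goal cell is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=1000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            # Epsilon-greedy: mostly exploit the best-known action, sometimes explore
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt, reward, done = env_step(state, action)
            # Q-learning update towards reward + discounted best future value
            future = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state, steps = nxt, steps + 1
    return q

q = q_learning()
# Greedy policy for every non-goal cell: 1 means "move right"
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy should settle on moving right in every cell, learned purely from the delayed reward.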

Generalization: A core objective of a learner is to generalize from its experience. Generalization in this context is the ability of a learning machine to perform accurately on new, unseen tasks after having experienced a learning data set. The training examples come from some generally unknown probability distribution (considered representative of the space of occurrences) and the learner has to build a general model about this space that enables it to produce sufficiently accurate predictions in new cases.
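A quick way to see the difference between memorizing and generalizing: the hypothetical "memorizer" below is a lookup table that is perfect on the training examples but has no answer for anything new, while the threshold learner picks the single cut-off that best separates the training data and can therefore label unseen inputs. Both models and the data are made up for illustration.

```python
# Hypothetical training data: small numbers are "low", large ones are "high"
train = [(1, "low"), (2, "low"), (3, "low"), (7, "high"), (8, "high"), (9, "high")]
unseen = [(0, "low"), (4, "low"), (6, "high"), (10, "high")]

def memorizer(rows):
    """Remembers every training example exactly; returns None for anything new."""
    table = dict(rows)
    return lambda x: table.get(x)

def threshold_learner(rows):
    """Picks the cut-off (midpoint between neighbouring inputs) with the fewest training errors."""
    xs = sorted(x for x, _ in rows)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def errors(t):
        return sum(("high" if x > t else "low") != y for x, y in rows)

    best = min(candidates, key=errors)
    return lambda x: "high" if x > best else "low"

def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

memo, rule = memorizer(train), threshold_learner(train)
for name, model in [("memorizer", memo), ("threshold", rule)]:
    print(name, accuracy(model, train), accuracy(model, unseen))
```

Both score 100% on the training data; only the threshold learner keeps that accuracy on the unseen cases, which is what generalization means here.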


Machine learning and data mining are commonly confused, as they often employ the same methods and overlap significantly.

1. Machine learning focuses on prediction, based on known properties learned from the training data. Machine Learning concentrates on performing a given task.

2. Data Mining focuses on the discovery of (previously) unknown properties in the data. Data Mining deals with searching for specific information.


Data mining is the analysis step of Knowledge Discovery in Databases (KDD).

The two areas overlap in many ways: data mining uses many machine learning methods, but often with a slightly different goal in mind. On the other hand, machine learning also employs data mining methods as “unsupervised learning” or as a preprocessing step to improve learner accuracy.


Some machine learning systems attempt to eliminate the need for human intuition in data analysis, while others adopt a collaborative approach between human and machine.


1) The benefit of machine learning is that it can predict

If you’re just tagging your friends’ faces in pictures, you’re not using a machine learning model. If you upload a new photo and it suddenly tells you who each person is, that is machine learning. The whole point of machine learning is to predict things based on patterns and other factors it has been trained with. It can be anything: housing prices based on zip code and number of bedrooms, likelihood of a flight delay based on time of year and weather, tagging of objects or people in pictures, etc.
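As a minimal sketch of the housing-price example, the snippet below fits a straight line by ordinary least squares using only the number of bedrooms (zip code is left out to keep it short). The prices are invented numbers in thousands, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

bedrooms = [1, 2, 3, 4]
prices = [100, 150, 200, 250]  # hypothetical prices, in thousands
slope, intercept = fit_line(bedrooms, prices)
predicted = slope * 5 + intercept  # prediction for an unseen 5-bedroom house
print(slope, intercept, predicted)
```

The "training" is finding the slope and intercept; the prediction is applying them to a house the model never saw.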


2) Machine learning requires training

You have to tell a machine learning model what it’s trying to predict. Think about how a human child learns. The first time they see a banana, they have no idea what it is. You then tell them it is a banana. The next time they see one (not the one you trained them on, because you already ate it), they’ll identify it as a banana. Machine learning works in a similar way. You show it as many pictures of a banana as you possibly can, tell it it’s a banana, and then test it with a picture of a banana it wasn’t trained on. This is a bit of an oversimplification, because I’m leaving out the part where you also have to tell it what isn’t a banana, and show it different kinds of bananas, different colors, pictures from different perspectives and angles, etc.


3) 80% accuracy is considered a success

We are not at the point in technology where a machine learning platform will achieve 100% accuracy in identifying bananas in pictures. But that is OK. It turns out that humans aren’t 100% accurate either. The unspoken rule in the industry is that a model with 80% accuracy is a success. If you think about how useful it is to correctly identify 800,000 out of 1,000,000 images in your collection, whilst MAYBE not getting 200,000 correct, you’re still saving yourself 80% of your time. That is huge from a value perspective. If I could wave a magic wand and increase your productivity that much, you’d give me lots of money. Well, it turns out I can, using machine learning, so please send check or cash.
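The accuracy figure in this example is simply correct predictions divided by total predictions. A trivial sketch of the arithmetic, using the 800,000 / 200,000 split from the text:

```python
def accuracy(correct, total):
    """Share of predictions that were right."""
    return correct / total

total_images = 1_000_000  # collection size implied by the 800,000 / 200,000 split
correct = 800_000
acc = accuracy(correct, total_images)
wrong = total_images - correct
summary = f"accuracy={acc:.0%}, wrong={wrong:,}"
print(summary)
```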


4) Machine learning is different from AI, deep learning, or neural networks

People tend to throw all of these terms around casually. To sound like an expert, learn the difference.


AI — Artificial Intelligence just means a computer that is as good as or better than humans at doing specific tasks. It can also mean a robot that can make decisions based on lots of input, not unlike the Terminator or C-3PO.

ML — Machine learning is a method for achieving AI. It means making a prediction about something based on training from sets of parsed data. There are lots of different ways a ML platform can implement training sets to predict things.


NN — Neural networks are one of the ways a machine learning model can predict things. Neural networks work a bit like your brain, tuning themselves through lots and lots of training to understand what a banana is supposed to look like. You create layers of nodes that can get very deep.
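A single artificial neuron (a perceptron) is the basic building block that neural networks stack into layers. The minimal sketch below, an illustration rather than how production networks are trained, shows the "tuning itself through training" idea: the weights start at zero and are nudged after every wrong answer until the neuron reproduces logical AND. The learning rate and epoch count are assumptions.

```python
# Truth table for logical AND: ((x1, x2), target)
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum plus bias is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, epochs=10, lr=1.0):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # Perceptron rule: nudge each weight in the direction that reduces the error
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

weights, bias = train_perceptron(AND_SAMPLES)
print(weights, bias, [predict(weights, bias, x) for x, _ in AND_SAMPLES])
```

Deep networks connect many such units in layers and use smoother activation functions so they can be trained with gradient descent, but the idea of adjusting weights from errors is the same.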

Machine learning means giving training data to a program so that it produces better results for complex problems. It is very close to data mining. While many machine learning algorithms have been around for a long time, the ability to automatically apply complex mathematical calculations to big data, over and over, faster and faster, is a recent development. Here are a few widely publicized examples of machine learning applications you may be familiar with: The heavily hyped, self-driving Google car? The essence of machine learning. Online recommendation offers such as those from Amazon and Netflix? Machine learning applications for everyday life. Knowing what customers are saying about you on Twitter? Machine learning combined with linguistic rule creation. Fraud detection? Another common application.

How is machine learning different from X?
X = Artificial Intelligence (AI):
It refers to the procedure of programming a computer (machine) to take rational decisions. Ah! What is rational? Rational is the basis of taking a decision.

I mentioned ‘rational’ instead of intelligence (as expected) because we human beings tend to take decisions which are high on being rational and feasible rather than being explicitly intelligent. This is because all intelligent decisions needn’t be rational and feasible (my hypothesis). Hence, the central motive behind using AI is to get the computer (machine) to behave in a dandy fashion in lieu of human guidance instead of being doltish!

AI may include programs to check whether certain parameters within a program are behaving normally. For example, the machine may raise an alarm if a parameter, say ‘X’, crosses a certain threshold, which might in turn affect the outcome of the related process.
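The alarm example is the hard-coded, rule-based style of programming described at the start: nothing is learned, a person writes the threshold. A minimal sketch, where the threshold value and the readings are hypothetical:

```python
THRESHOLD = 100.0  # hypothetical limit for parameter 'X', chosen by a person

def check_parameter(name, value, threshold=THRESHOLD):
    """Hard-coded rule: return an alarm message if value crosses the threshold."""
    if value > threshold:
        return f"ALARM: {name}={value} exceeds {threshold}"
    return None

readings = [95.0, 99.5, 101.2]  # hypothetical readings of 'X'
alerts = [msg for msg in (check_parameter("X", r) for r in readings) if msg]
print(alerts)
```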
Use of Artificial Intelligence in Machine Learning

Machine Learning is a subset of AI where the machine is trained to learn from its past experience. The past experience is developed through the data collected. This data is then combined with algorithms such as Naïve Bayes and Support Vector Machine (SVM) to deliver the final results.

X = Statistics:
Statistics is that branch of mathematics which utilizes data, either of the entire population or a sample drawn from the population, to carry out the analysis and present inferences. Some statistical techniques used are regression, variance, standard deviation, conditional probability and many others.
Use of Statistics in Machine Learning
Let’s understand this. Suppose I need to separate the mails in my inbox into two categories: ‘spam’ and ‘important’. To identify the spam mails, I can use a machine learning algorithm known as Naïve Bayes, which uses how often words appeared in past spam mails to decide whether a new email is spam. Naïve Bayes is built on Bayes’ theorem, a statistical technique based on conditional probability. Hence, we can say machine learning algorithms use statistical concepts to execute machine learning.
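A minimal multinomial Naïve Bayes spam filter in plain Python, as a sketch of the idea: it counts words per category in past mails, then scores a new mail with log P(category) plus the sum of log P(word | category), using add-one (Laplace) smoothing so a word never seen in a category does not zero out the score. The training mails and the helper names `train_nb` and `classify` are hypothetical.

```python
import math
from collections import Counter

def train_nb(mails):
    """mails: list of (text, label). Counts words per label and mails per label."""
    word_counts, label_counts, vocab = {}, Counter(), set()
    for text, label in mails:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return word_counts, label_counts, vocab

def classify(model, text):
    word_counts, label_counts, vocab = model
    total_mails = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # log P(label) from how often each label appeared in past mails
        score = math.log(label_counts[label] / total_mails)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # log P(word | label) with add-one smoothing (an unseen word counts as 1)
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

past_mails = [  # hypothetical training mails
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("win a prize now", "spam"),
    ("meeting agenda attached", "important"),
    ("project status meeting", "important"),
    ("please review the agenda", "important"),
]
model = train_nb(past_mails)
print(classify(model, "win cheap money"), classify(model, "agenda for the meeting"))
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow on long mails.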

X = Deep Learning:
Deep Learning is associated with a machine learning algorithm (Artificial Neural Network, ANN) which borrows the concept of the human brain to facilitate the modeling of arbitrary functions. ANN requires a vast amount of data, and this algorithm is highly flexible when it comes to modeling multiple outputs simultaneously. ANN is a more complex topic, and we may do justice to it in an altogether separate article.

X = Data Mining:
During my initial days as an analyst, I always used to muddle the two terms: Machine Learning and Data Mining. But later I learnt that Data Mining deals with searching for specific information, and Machine Learning solely concentrates on performing a given task. Let me cite the example which helped me remember the difference: teaching someone how to dance is Machine Learning, and using someone to find the best dance centers in the city is Data Mining.


RPA – to work with semi-structured/unstructured data


Machine Learning in Practice
Machine learning algorithms are only a very small part of using machine learning in practice as a data analyst or data scientist. In practice, the process often looks like:


1. Understand the domain, prior knowledge and goals. Talk to domain experts. Often the goals are very unclear. You often have more things to try than you can possibly implement.

2. Data integration, selection, cleaning and pre-processing. This is often the most time-consuming part. It is important to have high-quality data. The more data you have, the more it sucks, because the data is dirty. Garbage in, garbage out.

3. Learning models. The fun part. This part is very mature. The tools are general.

4. Interpreting results. Sometimes it does not matter how the model works as long as it delivers results. Other domains require that the model is understandable. You will be challenged by human experts.

5. Consolidating and deploying discovered knowledge. The majority of projects that are successful in the lab are not used in practice. It is very hard to get something used.


Uses
1) Predicting the stock market
2) Movie recommendations – Netflix or Amazon
3) Friend/ad recommendations on Facebook

June 27, 2018

RPA Blogs

RPA Team structure

CoE Team

IT Architecture
Delivery Manager

- Focused on the strategic nature of Robotics for each of the delivery areas
- Stakeholder management at a senior level
- The delivery manager reports into the Head of Robotic Process Automation
- There may be multiple delivery managers, depending on the number of individual businesses within an organization.
- Collates and communicates the update on each project being delivered
- Holds a weekly meeting with process owners on the delivery of their automated processes
- Undertakes assessments of new robotic processes of a high-value nature
- Tracks delivery vs costs vs benefits
- Manages escalations for the development pods; ensures they are raised to the right stakeholders
- Manages the overall delivery plans for each delivery pod

Business Analyst
RPA Developer
Support and Infra Team
- When a process goes live, it transitions from test to live production. There is a handover with the delivery pod at this point.
- From this point forward, the support team monitors processes all day. Processes can have a number of breakpoints within the process itself (handovers/pick-ups from workflows/folders, etc.).
- Their role is to step through a sample of cases each day and respond to alerts where there is a direct failure in the process.
- The control room acts as the first point of contact for all updates and alerts on the process. The process owner can liaise with them directly.

April 15, 2018

Identify right process for automation

Benefit to the company after automation

  1. Volume of transactions
  2. Number of employee hours saved (FTE)
  3. Faster processing (reduced SLA time)
  4. Error-prone tasks that humans do
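A rough way to put a number on item 2 is to convert the manual effort into full-time equivalents. The formula below is a common rule of thumb rather than a standard, and every input (transactions per month, minutes per transaction, working hours per FTE per year) is a hypothetical assumption; it also assumes the bot removes the whole manual effort.

```python
def fte_saved(transactions_per_month, minutes_per_transaction, hours_per_fte_year):
    """Rule-of-thumb estimate: yearly manual hours removed / yearly hours of one FTE."""
    manual_hours_per_year = transactions_per_month * 12 * minutes_per_transaction / 60
    return manual_hours_per_year / hours_per_fte_year

# Hypothetical inputs: 10,000 transactions a month, 6 minutes each,
# 1,600 working hours per FTE a year
print(fte_saved(10_000, 6, 1_600))
```

In practice the estimate is usually reduced for exceptions that still need a human.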


Identify the RPA process candidate
  1. Define the start and end point of the process
  2. Take one process and break it down into subprocesses
  3. Structured Process
  4. Rule-based process
  5. Clear decision logic
  6. Definable workflow
  7. Uses multiple applications or tools
  8. Repetitive, mundane and unsatisfying Process
  9. High volume and low to medium complexity
  10. Duplication of efforts
  11. Development time for BOT
  12. Stability of process
      - How often does the process get changed?
      - How much does it change?
      - What is the lead time for change?


Operations team governance
- Monitor BOT results
- Run BOTs on time
- Communicate password changes

April 8, 2018

Understand the complexity of a process for automation

Understand the complexity of the process
- Number of steps, scenarios, applications, roles
- Stability of process
- Number of IT systems or IT services required to fulfill the process, or
- Number of activities/tasks, number of controls, or number of decisions, or
- Number of people involved in the process, or
- Number of departments involved in the process, or
- The mechanism of decision making, or
- Level of uncertainty and potential change in any of the process activities

March 31, 2018

RPA

Evolution of RPA: Do/Learn/Think

Do (structured data and rule-based flow)

- RPA tools (Automation Anywhere, UiPath, etc.)
- VBA (Macro)
- VBScript/BAT files/PowerShell
- JavaScript/jQuery
- Excel formula

Learn (unstructured data and complex processes)
The BOT needs to be retrained frequently so that its decision-making keeps up with the diversity of the input data.
- Python
- Machine Learning
- Deep Learning
- Cognitive Learning

Think (BOT can take a decision)
- AI

Advantages of RPA

- Frees up employees to do higher-value work
- Saves FTE
- Speeds up the processing time (reduces SLA)
- Works 24x7
- Communicates with multiple technologies (can eliminate development effort)
- Checker BOT
- Maker BOT
- Report generation (data massaging)
- Process improvement
- Data migration
- Higher outcome quality
- Scalability and agility
etc...

Timeline of a project in RPA

- BRD not having all the details
- Not getting application access on time
- POC
- Depends on many factors, such as project complexity, the RPA tool used, whether Citrix automation is involved, available reusable assets (bots, libraries, etc.) and the project team size.
- Change in requirement post development


RPA can fail for the following reasons

Organizational pitfalls:
- Lack of time commitment from the local team
- Lack of leadership buy-in
- Lack of IT ownership
- Unclear responsibilities
- Unrealistic expectations
- Poor communication between the operations and application teams

Process pitfalls:
- Choosing a process with insignificant business impact
- Choosing a too complex process
- Choosing a process where better custom solutions exist
- Lack of focus in process selection
- Striving for end-to-end automation when it is not cost-effective
- Business exceptions (improper data)

Technical pitfalls:
- Not following the SDLC properly
- Architecture and Authentication
- Choosing a solution that requires intensive programming
- System exceptions (application bugs)

Post-implementation pitfalls:
- Operation team governance
- Scalability
- Maintenance

Test Cases for Automation

- Test case executed with a different set of data
- Test case executed with complex business logic
- Test case executed with a different set of users
- Test case involves a large amount of data




AI Products


- Chatbot
- Voicebot
- Siri/Cortana
etc...

January 27, 2018

Oracle and SQL Server Topics

Oracle Topics

- Installation of Oracle client
- Tablespace
- DBF file
- Relation between tablespace and DBF file
- User
- Roles and permissions
- Pseudo columns
- Savepoint, rollback and commit
- Insert, delete, update and select
- Equi join and non-equi join
- Natural join
- Cross join/Cartesian join
- Self join
- Left, right and full outer join
- Column format
- Aggregate, number and conversion functions
- Buffer command
- IN/EXISTS
- Subquery
- Table
- View
- Index
- Materialized view
- Exception handling
- Cursor
- Trigger
- Function
- Procedure
- Package
- Dynamic SQL
- Loops
- Array
- Collection
- PL/SQL Debugger, PL/SQL Profiler, PL/SQL Tuning Tips
- Replication
- Job

http://sql-plsql.blogspot.in/2007/05/oracle-plsql-nested-tables.html

http://dotnetmentors.com/sql/sql-server-common-table-expression-with-examples.aspx
https://www.tutorialgateway.org/sql-server-cte/


Common table expression (CTE) is a temporary named result set that you can reference within a SELECT, INSERT, UPDATE, DELETE or MERGE statement.

SQL Server supports two types of CTEs: recursive and nonrecursive.

WHEN TO USE

  • If you need to reference/join the same data set multiple times, you can do so by defining a CTE. Therefore, it can be a form of code reuse.
  • Once a CTE is declared, you can refer to it multiple times within the scope of the same query.
  • Can be used in place of views or temporary tables.
  • Recursive queries are easily written with a CTE.
  • Can be used instead of a cursor.


Syntax of CTE
WITH CTE_name (col1, col2, ...) AS
(
    <CTE_query>
)
SELECT col1, col2, ...
FROM CTE_name;
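A runnable sketch of both CTE types, using Python's built-in `sqlite3` module so it needs no database server. SQLite's CTE syntax is close to SQL Server's; one difference is that standard SQL (and SQLite) accept `WITH RECURSIVE` for the recursive form, while SQL Server uses plain `WITH` for both kinds. The `employees` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Asha", None), (2, "Ben", 1), (3, "Chen", 2), (4, "Dev", 1)],  # hypothetical rows
)

# Non-recursive CTE: name an aggregated result set once, then join to it
direct_reports = conn.execute("""
    WITH reports (manager_id, n) AS (
        SELECT manager_id, COUNT(*) FROM employees
        WHERE manager_id IS NOT NULL
        GROUP BY manager_id
    )
    SELECT e.name, r.n
    FROM reports AS r JOIN employees AS e ON e.id = r.manager_id
    ORDER BY e.name
""").fetchall()

# Recursive CTE: anchor member (top manager) UNION ALL recursive member (their reports)
chain = conn.execute("""
    WITH RECURSIVE org (id, name, depth) AS (
        SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, o.depth + 1
        FROM employees AS e JOIN org AS o ON e.manager_id = o.id
    )
    SELECT name, depth FROM org ORDER BY depth, name
""").fetchall()

print(direct_reports)
print(chain)
conn.close()
```

The recursive query walks the reporting hierarchy without a cursor or loop, which is the "instead of cursor" use case listed above.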

October 9, 2017

RPA (Automation Anywhere)

Tips for AA
  1. Web objects can be captured with object cloning, manage window control or manage web control.
  2. A Control Room client can be a bot runner or a bot creator.
  3. IQBOT is a vision BOT.
  4. Always use variables to make things dynamic, and set their values from an external mapping file.
  5. Avoid the system variable Date; instead, use Day + Month + Year, because the format Date returns is inconsistent across environments.
  6. Excel, CSV/Text and browser sessions cannot be shared across tasks.
  7. Subtasks should be small.
  8. MetaBot licenses are consumed on a first-logged-in, first-allocated basis.
  9. To access a column value in a table, use index 1, 2, 3, e.g. $dataset columns(1)$.
  10. Do use copy & paste and shortcut keystrokes.
  11. Don't use differing variables between tasks (Quick Map).
  12. Avoid too many subtasks.
  13. Avoid bi-directional dependencies between tasks.
  14. Reduce unnecessary steps.
  15. Configurable BOT
    1. Pro - gives the liberty to extend/modify
    2. Con - complex, and less user-friendly due to more configuration
AA Control Room (VSI) consists of the following components:
1. .NET Framework
2. IIS
3. IE
4. Schedule service
5. AAE Control Room API
6. AAE Control Room Site
7. AAE Analytics

Shared data & service – consists of the following components:
1. File repository
2. Web socket service
3. SQL Server
4. SAN

AAE Client Setup
1. AAE client setup
2. .NET Framework
3. IE
4. SVN

Control Room installation approaches
1. Standalone – all components can be installed in a single VM
2. Distributed
   1. Application servers – VM1, VM2…
   2. Shared data & service – VM3
   3. Disaster recovery – VM4
   4. AAE Analytics – VM5

What is the method to get RPA analytics data?
1. Tag – tag the variable
2. Log – start capturing real-time data log
3. Analyze – generate the reports based on historical data

AA - Surface Automation Techniques
1. Always capture only image 2, which in turn will dynamically load the bigger image (image 1).
2. Repeat the capture of image 2 until the image 1 section loads automatically, and run the Quick Test to ensure the images are captured successfully.
3. Identify the right combination of Match Percentage and Tolerance by running the automation several times under different user logins.
4. When the required image is on a multicolored background and cannot be identified during execution, change the mode to Monochrome.
5. When the required image is on a black-and-white background, Grayscale mode can be used to identify it if required.

Question and Answer

Which option in AA is used to read all cells in Excel?
Select one:
a. Get Multiple cells
b.  Go to cell
c. Get all cells 
d. Enough to work with
e. All the above

Which of the below options are available in AA to activate a sheet in Excel?
i) Sheet By Index
ii) Sheet By Name
Select one:
a. Both i & ii 
b. Only i
c. Only ii

Is it possible to check for broken links using the web recorder in AA?

Select one:
a. False
b. True 

Which of the following activities can be performed using Read From CSV-Text Command in AA?
Select one or more:
a. Use encoding options: ANSI, UNICODE, and UTF8. 
b. Read multiple lines in CSV or text files 
c. None of the above
d. Read List Separated or Tab delimited data from a CSV file. 

PDF Integration in AA supports both encrypted and unencrypted files.
Select one:
a. True 
b. False

What are the actions that can be performed in image recognition once the image is found?
Select one:
a. Double Click
b. Left Click
c. Right Click
d. All of the above 

To compare, find, split and join strings, which feature in AA can be used?
Select one:
a. Object cloning
b. Keystrokes
c. Variable operation
d. String operation 

Which of the following privileged users can run bots?
Select one:
a. Bot runner
b. Bot Creator
c. All of the above 

How do we allocate an existing license from one user to another user in the same Machine?
Select one:
a. Create a new user and upload the license file
b. Deactivate the existing user, create a new user and login with the credentials 
c. Create a new user and login with the credentials in client

Choose the commands that support Secure Recording mode
Select one or more:
a.  All the above
b.  Object cloning 
c.  All the commands in the Enterprise client
d.  Image Recognition 
e.  OCR

Does the PDF integration command support extracting structured text? If so, select the appropriate option below
Select one:
a. All the above
b. Extract form fields
c. PDF to image
d. Extract text 

If credential information needs to be stored in a centralized location, what options are available in AA?
Select one:
a.  Windows Vault
b.  Automation Anywhere Credential Manager 
c.  Automation Anywhere Credential Vault
d.  Windows Credential Manager

Which of the following actions can be performed by the user with Bot runner role?
Select one:
a. Can run the tasks using the 'One Time Only' schedule option only.
b. Can add, edit and delete schedules for tasks on your Client machine 
c. All of the above
d. Can create and run the tasks.

Which of the following is the best option to extract data from a flattened PDF?
Select one:
a. All of the above
b. Extract Text 
c. OCR
d. Extract Form Fields