# In online learning, the error function is considered on which of the given options?

### Mohammed

Guys, does anyone know the answer?


## Online machine learning


From Wikipedia, the free encyclopedia



In computer science, **online machine learning** is a method of machine learning in which data becomes available in sequential order and is used to update the best predictor for future data at each step, as opposed to batch learning techniques, which generate the best predictor by learning on the entire training data set at once. Online learning is a common technique in areas of machine learning where it is computationally infeasible to train over the entire dataset, so out-of-core algorithms are required. It is also used in situations where the algorithm must dynamically adapt to new patterns in the data, or when the data itself is generated as a function of time, e.g., stock price prediction. Online learning algorithms may be prone to catastrophic interference, a problem that can be addressed by incremental learning approaches.
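The contrast between the two settings can be sketched with a toy estimator (this example is illustrative, not from the article): a batch version recomputes from the full dataset, while the online version folds in one observation at a time, storing only its current estimate.

```python
def batch_mean(data):
    # Batch-learning analogue: needs the entire dataset at once.
    return sum(data) / len(data)

def online_mean(stream):
    # Online-learning analogue: one sequential update per observation,
    # keeping only the running estimate and a count (constant storage).
    mean, n = 0.0, 0
    for x in stream:
        n += 1
        mean += (x - mean) / n  # incremental update using only the new point
    return mean

data = [3.0, 1.0, 4.0, 1.0, 5.0]
assert abs(batch_mean(data) - online_mean(data)) < 1e-12
```

Both produce the same answer here, but the online version never revisits past data, which is what makes it usable on streams too large to hold in memory.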


## Introduction

In the setting of supervised learning, a function $f : X \to Y$ is to be learned, where $X$ is thought of as a space of inputs and $Y$ as a space of outputs, that predicts well on instances drawn from a joint probability distribution $p(x, y)$ on $X \times Y$. In reality, the learner never knows the true distribution $p(x, y)$ over instances. Instead, the learner usually has access to a training set of examples $(x_1, y_1), \ldots, (x_n, y_n)$. In this setting, the loss function is given as $V : Y \times Y \to \mathbb{R}$, such that $V(f(x), y)$ measures the difference between the predicted value $f(x)$ and the true value $y$. The ideal goal is to select a function $f \in \mathcal{H}$, where $\mathcal{H}$ is a space of functions called a hypothesis space, so that some notion of total loss is minimised. Depending on the type of model (statistical or adversarial), one can devise different notions of loss, which lead to different learning algorithms.

## Statistical view of online learning

In statistical learning models, the training samples $(x_i, y_i)$ are assumed to have been drawn from the true distribution $p(x, y)$, and the objective is to minimize the expected "risk"

$$I[f] = \mathbb{E}[V(f(x), y)] = \int V(f(x), y)\, dp(x, y)\,.$$

A common paradigm in this situation is to estimate a function $\hat{f}$ through empirical risk minimization or regularized empirical risk minimization (usually Tikhonov regularization). The choice of loss function here gives rise to several well-known learning algorithms such as regularized least squares and support vector machines. A purely online model in this category would learn based on just the new input $(x_{t+1}, y_{t+1})$, the current best predictor $f_t$, and some extra stored information (which is usually expected to have storage requirements independent of training data size). For many formulations, for example nonlinear kernel methods, true online learning is not possible, though a form of hybrid online learning with recursive algorithms can be used, where $f_{t+1}$ is permitted to depend on $f_t$ and all previous data points $(x_1, y_1), \ldots, (x_t, y_t)$. In this case, the space requirements are no longer guaranteed to be constant, since all previous data points must be stored, but the solution may take less time to compute with the addition of a new data point, as compared to batch learning techniques.
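The "purely online" setting above can be sketched with stochastic gradient descent on squared loss for a 1-D linear predictor: each step uses only the newest pair $(x_t, y_t)$ and the current parameters. The learning rate of 0.1 and the synthetic stream (generated from $y = 2x + 1$) are illustrative choices, not part of the source.

```python
def sgd_step(w, b, x, y, lr=0.1):
    # Gradient step on the squared loss V(f(x), y) = (w*x + b - y)^2
    # for the linear predictor f(x) = w*x + b.
    err = (w * x + b) - y
    return w - lr * err * x, b - lr * err

w, b = 0.0, 0.0
for _ in range(200):  # several passes over a tiny noiseless stream
    for x, y in [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]:  # y = 2x + 1
        w, b = sgd_step(w, b, x, y)

assert abs(w - 2.0) < 1e-3 and abs(b - 1.0) < 1e-3
```

Because the data are exactly realizable, the updates converge to $w = 2$, $b = 1$; with noisy data a decaying learning rate would typically be used instead.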

## 45 Questions to test a data scientist on Deep Learning (along with solution)

45 must know questions on Deep Learning which every data scientist should know. Answer these questions on neural networks, deep learning & its libraries

JalFaizy Shaikh — Published On January 29, 2017 and Last Modified On June 24th, 2022


## Introduction

Back in 2009, deep learning was only an emerging field. Only a few people recognised it as a fruitful area of research. Today, it is being used for developing applications which were considered difficult or impossible to do till some time back.

Speech recognition, image recognition, finding patterns in a dataset, object classification in photographs, character text generation, self-driving cars and many more are just a few examples. Hence it is important to be familiar with deep learning and its concepts.

In this skilltest, we tested our community on basic concepts of Deep Learning. A total of 1070 people participated in this skill test.

If you missed taking the test, here is your opportunity to look at the questions and check your skill level.

## Overall Results

Below is the distribution of scores; this will help you evaluate your performance:

More than 200 people participated in the skill test and the highest score was 35. Here are a few statistics about the distribution.

**Overall distribution**

- Mean Score: 16.45
- Median Score: 20
- Mode Score: 0

It seems like a lot of people started the competition very late or didn't take it beyond a few questions. I am not completely sure why, but it may be because the subject is advanced for much of the audience.

If you have any insight on why this is so, do let us know.

## Helpful Resources

- Fundamentals of Deep Learning – Starting with Artificial Neural Network
- Practical Guide to implementing Neural Networks in Python (using Theano)
- A Complete Guide on Getting Started with Deep Learning in Python
- Tutorial: Optimizing Neural Networks using Keras (with Image recognition case study)
- An Introduction to Implementing Neural Networks using TensorFlow

## Questions and Answers

**Q1. A neural network model is said to be inspired from the human brain.**

**The neural network consists of many neurons, each neuron takes an input, processes it and gives an output. Here’s a diagrammatic representation of a real neuron.**

**Which of the following statement(s) correctly represents a real neuron?**

A. A neuron has a single input and a single output only

B. A neuron has multiple inputs but a single output only

C. A neuron has a single input but multiple outputs

D. A neuron has multiple inputs and multiple outputs

E. All of the above statements are valid

**Solution: (E)**

A neuron can have a single Input / Output or multiple Inputs / Outputs.

**Q2. Below is a mathematical representation of a neuron.**

**The different components of the neuron are denoted as:**

**x1, x2,…, xN: the inputs to the neuron. These can be either actual observations from the input layer or intermediate values from one of the hidden layers.**

**w1, w2,…, wN: the weight of each input.**

**bi: termed the bias unit. This is a constant value added to the input of the activation function, corresponding to each weight. It works similarly to an intercept term.**

**a: termed the activation of the neuron, which can be represented as**

**and y: the output of the neuron**

**Considering the above notations, will a line equation (y = mx + c) fall into the category of a neuron?**

A. Yes B. No

**Solution: (A)**

A single neuron with no non-linearity can be considered as a linear regression function.
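The solution can be made concrete with a small sketch (illustrative, not from the quiz): a neuron computes the weighted sum of its inputs plus a bias, then applies an activation. With the identity activation and one input, this is exactly the line equation y = mx + c.

```python
def neuron(xs, ws, b, activation=lambda a: a):
    # a = w1*x1 + ... + wN*xN + b, then the activation is applied.
    # The default identity activation makes this plain linear regression.
    a = sum(w * x for w, x in zip(ws, xs)) + b
    return activation(a)

# One input, identity activation: y = m*x + c with m=2, c=1 at x=3.
assert neuron([3.0], [2.0], 1.0) == 7.0
```

Adding a non-linear activation (e.g., a sigmoid) is what takes the unit beyond a linear model.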

**Q3. Let us assume we implement an AND function to a single neuron. Below is a tabular representation of an AND function:**

| **X1** | **X2** | **X1 AND X2** |
|--------|--------|---------------|
| 0      | 0      | 0             |
| 0      | 1      | 0             |
| 1      | 0      | 0             |
| 1      | 1      | 1             |

**The activation function of our neuron is denoted as:**
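The activation function itself is not reproduced in this scrape. As a hedged illustration, assume a unit-step activation (output 1 if the weighted sum is positive, else 0); then weights of 1 and a bias of -1.5 — illustrative values, not from the quiz — make a single neuron reproduce the AND table above.

```python
def and_neuron(x1, x2, w1=1.0, w2=1.0, b=-1.5):
    # Weighted sum plus bias, followed by a unit-step activation.
    a = w1 * x1 + w2 * x2 + b
    return 1 if a > 0 else 0

table = [(0, 0), (0, 1), (1, 0), (1, 1)]
assert [and_neuron(x1, x2) for x1, x2 in table] == [0, 0, 0, 1]
```

The bias of -1.5 places the decision threshold between "one input on" (sum 1) and "both inputs on" (sum 2), which is what makes AND linearly separable by a single neuron.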

Source: **www.analyticsvidhya.com**

## Choosing an Error Function

The error function expresses how much we care about a deviation of a certain size. The choice of error function depends entirely on how our model will be used.


By **Brandon Rohrer**, Staff Machine Learning Engineer at LinkedIn on June 10, 2019 in Cost Function, Machine Learning


For all the content, including video and code, visit the course page for How Modeling Works.


When we go to fit a model to some data, it’s easy to overlook the choice of error function. Fitting the model is an exercise in optimization. It is finding the set of parameter values that minimizes a loss function. (If you need a refresher, check out the how optimization works series.)

The difference between a model and a measured data point is called deviation. The error function expresses how much we care about a deviation of a certain size. Are small errors OK, but large errors really bad? Or is being off by a little just as bad as being off by a lot? In business terms, we can think of the error function as how much it costs us in dollars to be wrong by a certain amount. In fact, error functions are also called cost functions.

The choice of error function depends entirely on how our model will be used.

**Use case: squared deviation**

Imagine our temperature predictions are being used to design a greenhouse. The thickness of the glass and the amount of insulation at the base are carefully selected to create an ideal growing environment. There won't be any heaters or air-conditioners to modulate the temperature, just the passive heat flow determined by the design of the greenhouse. The plants are hardy and can tolerate being off by a few degrees in any direction fairly well. It stunts them a little, but not catastrophically. However, the further the temperature gets from ideal, the more detrimental it is to the plants, and the effects quickly become more severe. This suggests that the cost function is something like the square of the deviation.

**Use case: absolute deviation**

Now, we are designing a greenhouse again, but this time including heaters and coolers. That means we will be able to modulate the temperature in order to make it suitable for the plants, but the more heating and cooling we do, the more energy we will have to buy, the more money we will have to spend on it. The cost of being off in our prediction is related to how much it costs to correct for it, the energy cost to bring the temperature back to the appropriate range. This suggests an error function where the cost is proportional to the absolute value of the deviation.

All the models fit to our temperature data in part 1 used an absolute deviation error function.

**Use case: Absolute deviation with saturation**

Our temperature forecasts are now being used to make decisions about when to pre-heat or pre-cool an office building for a workday. Pre-heating and pre-cooling during the night allows a lower energy price and saves the company money. The cost of any deviations is the additional cost of daytime peak energy. This is proportional to the amount of time the equipment runs during the day, which in turn is directly proportional to the prediction error. However, above a certain prediction threshold no amount of heating or cooling time will fully make up the difference, so the cost has a ceiling. The equipment just runs all day. This suggests an error function of absolute deviation with saturation.

**Use case: Squared deviation with a don’t care region**

Now our temperature predictions are being used in television forecasts. Our viewers don't expect the predictions to be exact, so if they are off by a little bit there is no penalty. This gives us a "don't care" region: there is no cost to being wrong by a small amount. However, if the temperatures are off by much more than that, then viewers become very upset and are likely to switch to another television station for their weather reports. A quadratic curve gives us the steeply increasing cost associated with this.
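The four error functions described so far can be sketched as simple Python functions mapping a deviation to a cost. This is an illustration, not the author's code; the saturation ceiling (4.0) and the don't-care half-width (1.0) are assumed parameters chosen for the example.

```python
def squared(dev):
    # Use case 1: cost grows quadratically with the deviation.
    return dev ** 2

def absolute(dev):
    # Use case 2: cost proportional to the size of the deviation.
    return abs(dev)

def absolute_saturated(dev, ceiling=4.0):
    # Use case 3: absolute deviation, but capped at a ceiling.
    return min(abs(dev), ceiling)

def squared_dont_care(dev, width=1.0):
    # Use case 4: zero cost inside the don't-care region,
    # quadratic growth beyond it.
    return 0.0 if abs(dev) <= width else (abs(dev) - width) ** 2

assert squared(-2.0) == 4.0
assert absolute(-2.0) == 2.0
assert absolute_saturated(10.0) == 4.0
assert squared_dont_care(0.5) == 0.0 and squared_dont_care(3.0) == 4.0
```

Each satisfies the one real constraint discussed below: the cost never decreases as the deviation moves further from zero.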

**Use case: custom error function**

We can even handle much more complex cases. Imagine that our top notch business analytics team determines that our energy costs have a complicated relationship to prediction errors, say, something that looks like this.

That’s not a problem. We can use that just as easily as any of the other candidates we have considered. The only real constraint on our error function is that it doesn’t decrease as it gets further away from zero. It can follow any pattern we want as long as it always increases or remains flat.

The choice of error function makes a difference in which model will fit the best and what the parameter values of that model will be. Each one of these error functions would produce a different set of best fit parameters in our temperature model. The best fit curve will be different each time. Starting with the right error function can make a big difference in how useful our model is. The wrong error function can give us a model that is worse than useless.

Keep your eyes open for squared deviation as an error function. It's a very common choice. So common, in fact, that inexperienced modelers might assume it's the only choice. It has some very nice properties for mathematical analysis, and for that reason is favored for theoretical and academic work. But other than that, there is nothing that makes it special. Chances are it's not right for your application. If you spend some time carefully choosing your error function, you will be glad you did.
