
the likelihood that some value belongs to the class "+". And of course the probability of the class "-" is simply the complement, 1 - P("+"). Thus, the result of the logistic regression is always in the interval [0, 1].
If it is impossible to produce a linear separation of the points in the original space, it is worth transforming the feature vectors into a higher-dimensional space by adding interaction effects, higher-degree terms, and so on. Using a linear algorithm in such a space gives some of the benefits of learning a non-linear function, since a boundary that is linear in the transformed space becomes non-linear when mapped back to the original space.
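A minimal sketch of this idea in plain Python, with invented data: points inside the unit circle cannot be separated from points outside it by any line in (x1, x2), but after adding squared and interaction terms, the circular boundary x1² + x2² = 1 becomes the *linear* equation z1 + z2 = 1 in the new coordinates. The weights below are chosen by hand purely for illustration, not learned:

```python
def expand(x1, x2):
    """Map a 2-D point into a higher-dimensional feature space
    (original terms, squares, and the interaction term)."""
    return (x1, x2, x1 * x1, x2 * x2, x1 * x2)

def boundary(features):
    """A function that is linear in the expanded features: x1^2 + x2^2 - 1.
    In the original (x1, x2) space this is a circle, not a line."""
    x1, x2, x1sq, x2sq, x1x2 = features
    return 1.0 * x1sq + 1.0 * x2sq - 1.0

inside = boundary(expand(0.2, 0.3))   # negative: point inside the circle
outside = boundary(expand(2.0, 1.0))  # positive: point outside the circle
```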
If the source variables are x1 and x2, then the function corresponding to the boundary takes the form:

b0 + b1*x1 + b2*x2 = 0
It is important to note that both x1 and x2 are source variables; the output variable is not part of the source space, unlike in the linear regression method.
Consider some point (a, b). Substituting the values x1 = a and x2 = b into the boundary function, we get the result b0 + b1*a + b2*b. Now, depending on the position of (a, b), three options should be considered:
- (a, b) lies in the area bounded by the points of the class "+". Then b0 + b1*a + b2*b will be positive, falling somewhere within (0, ∞). Mathematically, the larger this value, the greater the distance between the point and the boundary, and that means a greater likelihood that (a, b) belongs to the class "+". Consequently, the predicted probability of the class "+" will be within (0.5, 1].
- (a, b) lies in the area bounded by the points of the class "-". Now b0 + b1*a + b2*b will be negative, falling within (-∞, 0). But, as in the positive case, the greater the absolute value of the output, the greater the likelihood that (a, b) belongs to the class "-", and the predicted probability of "+" lies in the interval [0, 0.5).
- (a, b) lies on the boundary itself. In this case, b0 + b1*a + b2*b = 0. This means that the model genuinely cannot determine whether (a, b) belongs to the class "+" or to the class "-", and as a result the predicted probability will be equal to 0.5.
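The three cases can be sketched in a few lines of Python. The weights (-1, 1, 1) are invented for illustration, making the boundary the line x1 + x2 = 1:

```python
import math

def boundary(a, b):
    """Boundary function b0 + b1*a + b2*b with made-up weights (-1, 1, 1)."""
    return -1.0 + 1.0 * a + 1.0 * b

def prob_plus(a, b):
    """Logistic transform of the boundary value: probability of the class '+'."""
    return 1.0 / (1.0 + math.exp(-boundary(a, b)))

p_inside = prob_plus(3.0, 3.0)     # deep in the '+' region: boundary value > 0
p_outside = prob_plus(-2.0, -2.0)  # deep in the '-' region: boundary value < 0
p_on_line = prob_plus(0.5, 0.5)    # exactly on the boundary: value == 0
```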
So, we have a function that produces a value in the range (-∞, ∞) given a source data point. But how to convert the resulting value into the probability P, whose limits are [0, 1]? The answer is the odds ratio (OR) function.
Let P(X) be the probability of an event X occurring. Then the odds ratio is determined by OR(X) = P(X) / (1 - P(X)), which is the ratio of the probabilities of the event occurring versus not occurring. It is obvious that probability and odds ratio contain the same information. But, while P(X) ranges from 0 to 1, OR(X) ranges from 0 to ∞.

Next, calculate the logarithm log(OR(X)), which is called the logarithm of the odds ratio (the log-odds). In the mathematical sense, OR(X) has limits from 0 to ∞, but log(OR(X)) ranges from -∞ to ∞.
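These three scales are easy to check numerically; a small sketch, where the probability 0.8 is just an arbitrary example value:

```python
import math

p = 0.8                    # probability of the event, in [0, 1]
odds = p / (1 - p)         # odds ratio: 0.8 / 0.2 = 4.0, in (0, inf)
log_odds = math.log(odds)  # logarithm of the odds ratio, in (-inf, inf)

# The mapping is invertible: from the odds ratio back to the probability.
recovered = odds / (1 + odds)
```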
Thus, the logistic regression algorithm will look as follows:

1. Compute the value of the boundary function (or, equivalently, of the log-odds function). For simplicity, we denote this value t = b0 + b1*x1 + b2*x2.
2. Compute the odds ratio: OR(X) = e^t (because t is the logarithm of OR(X)).
3. Compute the probability from the odds ratio using the simple dependency P(X) = OR(X) / (1 + OR(X)).

Substituting the value of t from step 1, you can combine steps 2 and 3:

P(X) = e^t / (1 + e^t) = 1 / (1 + e^(-t))
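As a sanity check, the step-by-step route through the odds ratio and the combined sigmoid formula give identical results; a sketch in plain Python:

```python
import math

def prob_via_odds(t):
    """Steps 2 and 3 performed separately: t -> odds ratio -> probability."""
    odds = math.exp(t)        # step 2: OR = e^t
    return odds / (1 + odds)  # step 3: P = OR / (1 + OR)

def prob_direct(t):
    """Steps 2 and 3 combined: the logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-t))
```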
How is the model trained? The mathematical basis of this is beyond the scope of the article, but the general idea is as follows. Define a function likelihood(x), where x is a data point of the training set. In simple form, likelihood(x) can be described as: if x is part of the "+" class, then likelihood(x) = P(x) (here P(x) is the output value obtained from the logistic regression model); if x is part of the "-" class, then likelihood(x) = 1 - P(x).
The value likelihood(x) quantifies the likelihood that a training sample point is classified by the model in the correct way. Therefore, the average of likelihood(x) over the entire training sample shows the probability that a random data point will be correctly classified by the system, regardless of its class. Training maximizes this quantity, and the name of this method is the maximum likelihood method. If you are not a mathematician, you will be able to understand how the optimization occurs only if you have a good idea of what is being optimized.
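A sketch of this likelihood computation on a tiny invented training set. The model weights here are fixed by hand rather than learned; actual maximum likelihood training would search for the weights that make the quantity below as large as possible:

```python
import math

def prob_plus(x1, x2):
    """Model output P(x): probability of '+', with made-up weights (-1, 1, 1)."""
    t = -1.0 + 1.0 * x1 + 1.0 * x2
    return 1.0 / (1.0 + math.exp(-t))

def likelihood(point, label):
    """Probability that the model assigns this point its correct class."""
    p = prob_plus(*point)
    return p if label == "+" else 1.0 - p

# Invented training set: '+' points lie above the line x1 + x2 = 1.
train = [((2.0, 2.0), "+"), ((1.5, 1.0), "+"),
         ((-1.0, 0.0), "-"), ((0.0, -0.5), "-")]
avg_likelihood = sum(likelihood(pt, lbl) for pt, lbl in train) / len(train)
```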
Source: https://habr.com/ru/post/265007/