
import struct
import numpy as np
import requests
import gzip
import pickle

TRAIN_IMAGES_URL = "http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz"
TRAIN_LABELS_URL = "http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz"
TEST_IMAGES_URL = "http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz"
TEST_LABELS_URL = "http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz"


def downloader(url: str):
    response = requests.get(url, stream=True)
    if response.status_code != 200:
        print("Response for", url, "is", response.status_code)
        exit(1)
    print("Downloaded", int(response.headers.get('content-length', 0)), "bytes")
    decompressed = gzip.decompress(response.raw.read())
    return decompressed


def load_data(images_url: str, labels_url: str) -> (np.array, np.array):
    images_decompressed = downloader(images_url)

    # Header: big endian, four unsigned 4-byte ints: magic, size, rows, cols
    magic, size, rows, cols = struct.unpack(">IIII", images_decompressed[:16])
    if magic != 2051:
        print("Wrong magic for", images_url, "Probably file corrupted")
        exit(2)

    image_data = np.array(
        np.frombuffer(images_decompressed[16:], dtype=np.dtype((np.ubyte, (rows * cols,)))) / 255,
        dtype=np.float32)

    labels_decompressed = downloader(labels_url)

    # Header: big endian, two unsigned 4-byte ints: magic, size
    magic, size = struct.unpack(">II", labels_decompressed[:8])
    if magic != 2049:
        print("Wrong magic for", labels_url, "Probably file corrupted")
        exit(2)

    labels = np.frombuffer(labels_decompressed[8:], dtype=np.ubyte)

    return image_data, labels


with open("test_images.pkl", "w+b") as output:
    pickle.dump(load_data(TEST_IMAGES_URL, TEST_LABELS_URL), output)

with open("train_images.pkl", "w+b") as output:
    pickle.dump(load_data(TRAIN_IMAGES_URL, TRAIN_LABELS_URL), output)
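Just to show how the pickled files can be used later, here is a minimal sketch (the file names come from the script above; the shapes are simply what MNIST is expected to contain):

import pickle

# Load the (images, labels) tuples that the download script pickled.
with open("train_images.pkl", "rb") as f:
    train_images, train_labels = pickle.load(f)
with open("test_images.pkl", "rb") as f:
    test_images, test_labels = pickle.load(f)

# Each image is a flat float32 vector of 28 * 28 = 784 pixels scaled to [0, 1].
print(train_images.shape, train_labels.shape)  # expected: (60000, 784) (60000,)
print(test_images.shape, test_labels.shape)    # expected: (10000, 784) (10000,)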
Epsilon here is the model error. For clarity and simplicity we will work with a one-dimensional model: the multidimensional case adds no conceptual complexity, but it would ruin the illustrations. For the moment we forget about MNIST and generate some data stretched along a line. We also rewrite the regression model (hypothesis) as follows:

\hat{y} = \theta_1 + \theta_2 x

Here \hat{y} (y with a hat) is the value predicted by the model, \theta_1 and \theta_2 are the unknown parameters (finding them is the main task), and x is a free variable whose values are known to us. Let us state the problem once more, in slightly different language: we have a set of experimental data in the form of pairs of values (x_i, y_i),
and we need to find the straight line on which these values lie, that is, the line that best summarizes the experimental data. Some code to generate the data:

import numpy as np
import matplotlib.pyplot as plt

TOTAL = 200
STEP = 0.25


def func(x):
    return 0.2 * x + 3


def generate_sample(total=TOTAL):
    x = 0
    while x < total * STEP:
        yield func(x) + np.random.uniform(-1, 1) * np.random.uniform(2, 8)
        x += STEP


X = np.arange(0, TOTAL * STEP, STEP)
Y = np.array([y for y in generate_sample(TOTAL)])
Y_real = np.array([func(x) for x in X])

plt.plot(X, Y, 'bo')
plt.plot(X, Y_real, 'g', linewidth=2.0)
plt.show()

The task is to pick the parameters so that the predicted value is as close as possible to the real one. Graphically it can be represented something like this:

import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4, 5], [4, 2, 9, 9, 5], 'bo')
plt.plot([1, 2, 3, 4, 5], [3, 5, 7, 9, 11], '-ro')
plt.show()
Each difference vector \hat{y}_i - y_i should have the shortest possible length. Since there is more than one such vector, it is postulated that the sum of the squares of the lengths of all these vectors, considered as a function of the parameter vector \theta, should tend to a minimum. In my opinion this is quite a logical, if somewhat speculative, method; nevertheless, there is a mathematical proof of its correctness. Remark 1: by length we mean the Euclidean metric, although this is not necessary. Remark 2: note that it is the sum of the squares; again, no one forbids trying to minimize simply the sum of the lengths. In this picture the red dots are the predicted values (\hat{y}), the blue ones are the values obtained in the experiment (y without a hat), and \hat{y}_i - y_i is simply the difference between them; the length of this vector is what enters the sum.
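A tiny illustrative snippet, reusing the toy points from the plot above (not needed for what follows): the quantity we postulate to minimize is just the sum of squared differences between the red (predicted) and the blue (experimental) values.

import numpy as np

y_blue = np.array([4, 2, 9, 9, 5])    # experimental values (blue points)
y_red = np.array([3, 5, 7, 9, 11])    # predicted values (red points on the line)

residuals = y_red - y_blue            # the difference vectors
loss = np.sum(residuals ** 2)         # sum of their squared lengths
print(loss)                           # 1 + 9 + 4 + 0 + 36 = 50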
It is required to find a vector of parameters \theta at which the expression

\sum_{i=1}^{n} \left( f(x_i, \theta) - y_i \right)^2

reaches a minimum. The function f in this expression is the hypothesis

f(x, \theta) = \theta_1 + \theta_2 x

or, in matrix form,

\hat{Y} = A\theta

Here Y is the vector consisting of the values of the dependent variable y:

Y = (y_1, y_2, \ldots, y_n)^T

\theta is the vector of parameters:

\theta = (\theta_1, \theta_2)^T

and A is the matrix built from the values of the free variable. In the one-dimensional case matrix A has only two columns, a column of ones and the column of x values:

A = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}
In this equation both the predicted values and the parameters are unknown. We can try to find the parameters from the same equation, but with the known, measured values substituted for the predictions:

Y = A\theta

Equivalently, it can be written as a system of equations:

\begin{cases} \theta_1 + \theta_2 x_1 = y_1 \\ \theta_1 + \theta_2 x_2 = y_2 \\ \dots \\ \theta_1 + \theta_2 x_n = y_n \end{cases}

There are n equations but only two unknowns, so it is unlikely that an exact solution of such a system exists.
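A quick illustrative sketch of why an exact solution should not be expected (the numbers are made up; only standard NumPy routines are used): any two of the equations determine \theta exactly, but that \theta fails the remaining ones, while least squares returns a compromise with a non-zero residual.

import numpy as np

np.random.seed(0)
n = 5
x = np.arange(n, dtype=float)
y = 0.2 * x + 3 + np.random.uniform(-1, 1, size=n)  # noisy observations

# Matrix of the system: a column of ones and a column of x values.
A = np.column_stack([np.ones(n), x])

# Any two equations alone determine theta exactly...
theta_12 = np.linalg.solve(A[:2], y[:2])
# ...but that theta does not satisfy the remaining equations.
print(A[2:].dot(theta_12) - y[2:])          # non-zero discrepancies

# Least squares finds the best compromise over all n equations.
theta_ls, residual, rank, _ = np.linalg.lstsq(A, y, rcond=None)
print(theta_ls, residual)                   # residual sum of squares > 0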
The sum of squares, now that everything has been packed into vectors and matrices, can be written as follows:

S(\theta) = \|Y - A\theta\|^2 = (Y - A\theta)^T (Y - A\theta)

The dimension of the matrix A is (n; p) (in our one-dimensional case p = 2), the dimension of the vector \theta is (p; 1), and that of the vector Y is (n; 1). The product A\theta therefore has dimension (n; 1), and we obtain the difference of two vectors of dimension (n; 1):

Y - A\theta
We write further:

S(\theta) = (Y - A\theta)^T (Y - A\theta) = Y^T Y - Y^T A\theta - \theta^T A^T Y + \theta^T A^T A\theta

The two cross terms Y^T A\theta and \theta^T A^T Y are equal: each of them is a constant (a scalar), and the transpose of a scalar is the same scalar, so Y^T A\theta = (Y^T A\theta)^T = \theta^T A^T Y. You can prove that it is a constant by taking the dimensions of the matrices from their definitions and computing the dimension of the expression after all the multiplications:

(1; p)(p; n)(n; 1) = (1; 1)

The sum of squares therefore simplifies to

S(\theta) = Y^T Y - 2\theta^T A^T Y + \theta^T A^T A\theta
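A quick numerical check of this expansion (an illustrative sketch with arbitrary numbers, not part of the derivation): both cross terms evaluate to the same scalar, and the expanded expression coincides with the direct computation of the squared norm.

import numpy as np

np.random.seed(1)
n, p = 6, 2
A = np.column_stack([np.ones(n), np.arange(n, dtype=float)])  # shape (n, p)
Y = np.random.rand(n)                                         # shape (n,)
theta = np.array([3.0, 0.2])                                  # shape (p,)

# The cross terms are equal scalars: (1; p)(p; n)(n; 1) = (1; 1).
print(theta @ A.T @ Y, Y @ A @ theta)

# Direct squared norm vs. the expanded form: the two numbers coincide.
direct = (Y - A @ theta) @ (Y - A @ theta)
expanded = Y @ Y - 2 * theta @ A.T @ Y + theta @ A.T @ A @ theta
print(direct, expanded)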

The minimum is found quite routinely, by equating the first derivative with respect to \theta to zero. Strictly speaking, one should first prove that this minimum exists at all; I propose to omit the proof here and look it up in the literature. Intuitively it is clear enough: the function is quadratic in \theta, a parabola, and a parabola has a minimum.

\frac{\partial S(\theta)}{\partial \theta} = -2 A^T Y + 2 A^T A \theta = 0

A^T A \theta = A^T Y

\theta = (A^T A)^{-1} A^T Y

The matrix (A^T A)^{-1} A^T is called the pseudo-inverse matrix of A, so the solution can be written compactly as \theta = A^{+} Y. NumPy computes it with np.linalg.pinv, which is exactly what the script below uses:
import numpy as np
import matplotlib.pyplot as plt

TOTAL = 200
STEP = 0.25


def func(x):
    return 0.2 * x + 3


def prediction(x, theta):
    # Hypothesis: y_hat = theta_1 + theta_2 * x
    return theta[0] + theta[1] * x


def generate_sample(total=TOTAL):
    x = 0
    while x < total * STEP:
        yield func(x) + np.random.uniform(-1, 1) * np.random.uniform(2, 8)
        x += STEP


X = np.arange(0, TOTAL * STEP, STEP)
Y = np.array([y for y in generate_sample(TOTAL)])
Y_real = np.array([func(x) for x in X])

# Matrix A: a column of ones and a column of x values.
A = np.empty((TOTAL, 2))
A[:, 0] = 1
A[:, 1] = X

# Least-squares solution via the pseudo-inverse.
theta = np.linalg.pinv(A).dot(Y)
print(theta)

Y_prediction = A.dot(theta)  # same as prediction(X, theta)

error = np.abs(Y_real - Y_prediction)
print("Error sum:", sum(error))

plt.plot(X, Y, 'bo')
plt.plot(X, Y_real, 'g', linewidth=2.0)
plt.plot(X, Y_prediction, 'r', linewidth=2.0)
plt.show()
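As a sanity check (a separate sketch, not part of the script above), the pseudo-inverse returned by np.linalg.pinv agrees with the explicit formula (A^T A)^{-1} A^T, and the recovered parameters land close to the true \theta_1 = 3, \theta_2 = 0.2 from func; the data here are simply regenerated with the same kind of noise.

import numpy as np

np.random.seed(42)
X = np.arange(0, 50, 0.25)
Y = 0.2 * X + 3 + np.random.uniform(-1, 1, size=X.size) * np.random.uniform(2, 8, size=X.size)

A = np.empty((X.size, 2))
A[:, 0] = 1
A[:, 1] = X

# Explicit normal-equation formula vs. NumPy's pseudo-inverse.
pinv_explicit = np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(pinv_explicit, np.linalg.pinv(A)))  # True

theta = pinv_explicit @ Y
print(theta)  # roughly [3, 0.2], up to the noise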
The same fit once more, this time estimating the mean M and the variance D of the residuals:

import numpy as np
import matplotlib.pyplot as plt

TOTAL = 200
STEP = 0.25


def func(x):
    return 0.2 * x + 3


def prediction(x, theta):
    return theta[0] + theta[1] * x


def generate_sample(total=TOTAL):
    x = 0
    while x < total * STEP:
        yield func(x) + np.random.uniform(-1, 1) * np.random.uniform(2, 8)
        x += STEP


X = np.arange(0, TOTAL * STEP, STEP)
Y = np.array([y for y in generate_sample(TOTAL)])
Y_real = np.array([func(x) for x in X])

A = np.empty((TOTAL, 2))
A[:, 0] = 1
A[:, 1] = X

theta = np.linalg.pinv(A).dot(Y)
print(theta)

Y_prediction = A.dot(theta)

error = Y - Y_prediction
error_squared = error ** 2

# M is the mean of the residuals, D their (biased) variance: D = E[e^2] - (E[e])^2.
M = sum(error) / len(error)
M_squared = M ** 2
D = sum([sq - M_squared for sq in error_squared]) / len(error)
print("M:", M)
print("D:", D)

plt.plot(X, Y, 'bo')
plt.plot(X, Y_real, 'g', linewidth=2.0)
plt.plot(X, Y_prediction, 'r', linewidth=2.0)
plt.show()
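A short note on these statistics: M is simply the mean of the residuals and D their variance with the biased divide-by-n estimate, so after running the script the same numbers can be obtained directly from NumPy (illustrative):

print("M:", np.mean(error))  # same as sum(error) / len(error)
print("D:", np.var(error))   # same as mean(error**2) - mean(error)**2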
Source: https://habr.com/ru/post/307004/