
Python and Data Science: exploring the capabilities of the versatile NumPy library





From the translator: this is a translation of an article by Rakshit Vasudeva, who has been studying data science closely for a long time and uses Python for it. The author describes the powerful NumPy library, which makes it possible to implement many machine learning and big-data techniques.



NumPy is a math library for Python. It lets you perform all kinds of calculations efficiently and quickly, and it greatly extends what Python can do thanks to its specialized array type and optimized routines. This article covers the basic capabilities of NumPy. It is only the first part; others will be published later. It is aimed at those who are just starting to learn NumPy and entering the wondrous world of mathematics in Python.





Importing the library



import numpy as np 


Here we tell Python that np is an alias for numpy, and we will use this short name from now on.


Now let's create a regular Python list and a NumPy array.



    # Python list
    a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    # NumPy array
    A = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])


There is no big difference in the output.



    print(a)
    print(A)
    ====================================================================
    [1, 2, 3, 4, 5, 6, 7, 8, 9]
    [1 2 3 4 5 6 7 8 9]


So why use a NumPy array instead of a regular list? Because NumPy lets us perform calculations much faster and write them more concisely: operations are applied to the whole array at once instead of element by element in a Python loop.
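To make the difference concrete, here is a small sketch (not from the original article) that applies the same `* 2` expression to a Python list and to a NumPy array. The list is repeated, while the array is multiplied element by element. The three-element values are arbitrary.

```python
import numpy as np

a = [1, 2, 3]            # regular Python list
A = np.array([1, 2, 3])  # NumPy array

# For a list, * 2 concatenates the list with itself
list_result = a * 2
# For an array, * 2 multiplies every element (a vectorized operation)
array_result = A * 2

# The list equivalent of the vectorized operation needs an explicit loop
looped = [x * 2 for x in a]

print(list_result)   # [1, 2, 3, 1, 2, 3]
print(array_result)  # [2 4 6]
print(looped)        # [2, 4, 6]
```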



np.arange()



    np.arange(0, 10, 2)
    ====================================================================
    array([0, 2, 4, 6, 8])


np.arange([start,] stop[, step]) returns evenly spaced numbers within an interval. Here is what the call above does:



We build a NumPy array that starts at 0 and goes up to 10, not including 10, increasing by 2 at each step.



Thus, we get this:

array([0, 2, 4, 6, 8])



It is important to remember that the stop value itself is never included in the result.
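A few more calls illustrate the [start,] stop[, step] convention. start defaults to 0, step defaults to 1, and stop is always excluded. This is a minimal sketch with arbitrary example values, not taken from the original article.

```python
import numpy as np

# Only stop given: start defaults to 0, step to 1; 5 itself is excluded
r1 = np.arange(5)          # array([0, 1, 2, 3, 4])
# start and stop given: 2 is included, 5 is not
r2 = np.arange(2, 5)       # array([2, 3, 4])
# A negative step counts down; the stop value 0 is still excluded
r3 = np.arange(10, 0, -3)  # array([10, 7, 4, 1])

print(r1, r2, r3)
```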



Another example:



    np.arange(2, 29, 5)
    ====================================================================
    array([ 2,  7, 12, 17, 22, 27])


An array like this can also be called a matrix or a vector. So don't be confused when I say, for example, "the shape of the matrix is 2 x 3". It simply means that the array looks like this:



    array([[ 2,  7, 12],
           [17, 22, 27]])
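One way to get that 2 x 3 array from the arange result is to reshape it. This sketch anticipates the reshape() method covered further on. It is our own illustration rather than code from the original article.

```python
import numpy as np

v = np.arange(2, 29, 5)    # 1-D: [2, 7, 12, 17, 22, 27]
m = v.reshape(2, 3)        # the same 6 values as 2 rows x 3 columns

print(m)
```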


Now let's talk about shape. It is an attribute of every NumPy array and gives the size of the array along each dimension. Here is an example:



    A = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
    A.shape
    ====================================================================
    (9,)
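Two attributes related to shape are also handy: ndim, the number of dimensions, and size, the total number of elements. The short sketch below is not from the original article. It also shows that shape belongs to NumPy arrays, not to plain Python lists.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

print(A.shape)  # (9,)  -> one axis of length 9
print(A.ndim)   # 1     -> a one-dimensional array
print(A.size)   # 9     -> total number of elements

# A plain Python list has no shape attribute at all
print(hasattr([1, 2, 3], "shape"))  # False
```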


This is a one-dimensional array: a single row of 9 elements. But shouldn't such a matrix really have the shape 1 x 9?



In principle, yes, and this is where reshape() comes into play. This method returns an array with the same elements arranged in the dimensions we ask for. The array it is called on keeps its original shape.



Here is an example of using reshape() in practice.



    A = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
    A.reshape(1, 9)
    ====================================================================
    array([[1, 2, 3, 4, 5, 6, 7, 8, 9]])


Note that reshape(1, 9) returns a two-dimensional array, which is what the two opening brackets at the beginning indicate. [[1, 2, 3, 4, 5, 6, 7, 8, 9]] is a 2-D array with one row. [1, 2, 3, 4, 5, 6, 7, 8, 9] is one-dimensional.
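The same nine values can also be arranged as a column rather than a row. This sketch, which is not part of the original article, compares the shapes and the number of dimensions of the three versions.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

row = A.reshape(1, 9)  # 1 row, 9 columns: [[1, 2, ..., 9]]
col = A.reshape(9, 1)  # 9 rows, 1 column: [[1], [2], ..., [9]]

print(A.ndim, A.shape)      # 1 (9,)
print(row.ndim, row.shape)  # 2 (1, 9)
print(col.ndim, col.shape)  # 2 (9, 1)
```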



Another example:



    B = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
    # reshape() returns a new array, so assign the result back to B
    B = B.reshape(3, 3)
    B
    ====================================================================
    array([[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]])


If we now look at the shape attribute of B, it is (3, 3):



    B.shape
    ====================================================================
    (3, 3)
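Two details of reshape() are worth knowing. First, it does not modify the array it is called on, which is why the result has to be stored. Second, one dimension may be given as -1, and NumPy then infers it from the total number of elements. The total must stay the same, otherwise NumPy raises a ValueError. The sketch below illustrates all three points. The names C and D are only illustrative.

```python
import numpy as np

B = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

C = B.reshape(3, 3)    # returns a reshaped array...
print(B.shape)         # (9,)   ...while B keeps its original shape
print(C.shape)         # (3, 3)

# -1 lets NumPy work out that dimension: 9 elements / 3 rows = 3 columns
D = B.reshape(3, -1)
print(D.shape)         # (3, 3)

# 9 elements cannot be arranged as 2 x 4, so this raises ValueError
reshape_failed = False
try:
    B.reshape(2, 4)
except ValueError as err:
    reshape_failed = True
    print("ValueError:", err)
```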


Now let's move on to np.zeros().



What does this code do?



    np.zeros((4, 3))
    ====================================================================
    ???????????


It creates a 4 x 3 matrix (4 rows and 3 columns) filled with zeros. Here is the output:



    np.zeros((4, 3))
    ====================================================================
    array([[0., 0., 0.],
           [0., 0., 0.],
           [0., 0., 0.],
           [0., 0., 0.]])


np.zeros((n, m)) returns an n x m matrix (n rows, m columns) filled with zeros. Note that the shape is passed as a single tuple. It's that simple.
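By default the zeros are floating-point numbers, which is why the output shows `0.` rather than `0`. The dtype parameter changes this if integers are needed. The sketch below is our own and is not from the original article. It also shows that a single integer produces a one-dimensional array.

```python
import numpy as np

Z = np.zeros((4, 3))              # shape passed as one tuple: 4 rows, 3 columns
print(Z.shape, Z.dtype)           # (4, 3) float64

Zi = np.zeros((4, 3), dtype=int)  # integer zeros instead of floats
print(Zi.dtype.kind)              # i (a signed integer type)

# A single integer gives a 1-D array
z1 = np.zeros(5)
print(z1.shape)                   # (5,)
```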



What does np.eye() do?



It returns an identity matrix: a square matrix with ones on the main diagonal and zeros everywhere else. np.eye(N) creates one of size N x N.



    np.eye(5)
    ====================================================================
    array([[1., 0., 0., 0., 0.],
           [0., 1., 0., 0., 0.],
           [0., 0., 1., 0., 0.],
           [0., 0., 0., 1., 0.],
           [0., 0., 0., 0., 1.]])
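Here are two quick checks of what "identity" means. The diagonal holds only ones and every other entry is zero, so both the trace and the total sum of np.eye(5) equal 5. np.eye also accepts an optional k argument that shifts the diagonal of ones. This is an illustrative sketch and was not part of the original article.

```python
import numpy as np

I5 = np.eye(5)

print(np.diag(I5))     # [1. 1. 1. 1. 1.] -> ones on the main diagonal
print(np.trace(I5))    # 5.0 -> sum of the diagonal
print(I5.sum())        # 5.0 -> so every off-diagonal entry is zero

# k=1 moves the ones one position above the main diagonal
U = np.eye(3, k=1)
print(U)
```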


How to multiply two matrices?



No problem: use np.dot(). When it is given two vectors, it computes their scalar (dot) product. When it is given two 2-D matrices, it computes the ordinary matrix product.
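For two one-dimensional vectors, np.dot returns a single number: the sum of the products of corresponding elements. Here is a minimal sketch with arbitrary example vectors u and v.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

# Scalar (dot) product: 1*4 + 2*5 + 3*6 = 32
s = np.dot(u, v)
# The same computation written out element by element
manual = sum(x * y for x, y in zip(u, v))

print(s, manual)  # 32 32
```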



For example, let A have the shape (2, 3) and B the shape (3, 2). The number of columns in A is 3, and the number of rows in B is also 3. Since these numbers match, the multiplication is possible, and the result has the shape (2, 2).
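The shape rule can be checked directly. In this sketch, A and B are arbitrary matrices with the shapes from the example, filled using arange for convenience. Multiplying them works and gives a (2, 2) result. Two (2, 3) matrices cannot be multiplied, and np.dot raises a ValueError.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # shape (2, 3)
B = np.arange(6).reshape(3, 2)   # shape (3, 2)

# Inner dimensions match (3 columns in A, 3 rows in B)
P = np.dot(A, B)
print(P.shape)  # (2, 2): rows of A x columns of B

# Inner dimensions differ (3 columns vs 2 rows), so this fails
mismatch = False
try:
    np.dot(A, A)
except ValueError:
    mismatch = True
print(mismatch)  # True
```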



    # generate a (3 x 3) identity matrix
    I = np.eye(3)
    I
    ====================================================================
    array([[1., 0., 0.],
           [0., 1., 0.],
           [0., 0., 1.]])

    # generate another (3 x 3) matrix to multiply by
    D = np.arange(1, 10).reshape(3, 3)
    D
    ====================================================================
    array([[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]])


The matrices are ready for multiplication. Now let's multiply them.



    # perform the actual dot product
    M = np.dot(D, I)
    M
    ====================================================================
    array([[1., 2., 3.],
           [4., 5., 6.],
           [7., 8., 9.]])
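I is the identity matrix, so the product M is simply D, converted to floats because np.eye returns floats. The sketch below checks this. It also uses the @ operator, which for 2-D arrays performs the same matrix multiplication as np.dot. The operator is our addition and does not appear in the original article.

```python
import numpy as np

I = np.eye(3)
D = np.arange(1, 10).reshape(3, 3)

M = np.dot(D, I)
print(np.array_equal(M, D))  # True: multiplying by the identity changes nothing
print(M.dtype)               # float64, because I holds floats

# For 2-D arrays, the @ operator gives the same result as np.dot
print(np.array_equal(D @ I, M))  # True
```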


Now let's add up the individual elements of the matrix.



    # add up all the elements of the matrix
    sum_val = np.sum(M)
    sum_val
    ====================================================================
    45.0


np.sum() adds up all the entries of the matrix.



We can also sum along a single axis, which gives us two options.



1. Sum along rows (axis=1)



    # sum along the rows
    np.sum(M, axis=1)
    ====================================================================
    array([ 6., 15., 24.])


6 - the sum of the first row (1, 2, 3).

15 - the second (4, 5, 6).

24 - the third (7, 8, 9).



2. Sum along columns (axis=0)



    # sum along the columns
    np.sum(M, axis=0)
    ====================================================================
    array([12., 15., 18.])


12 - the sum of the first column (1, 4, 7).

15 - the second (2, 5, 8).

18 - the third (3, 6, 9).
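To double-check the axis convention, the sketch below recomputes both results with plain Python loops. It is our own illustration and rebuilds M as a float version of D. A handy way to remember the convention: axis=1 collapses the columns and leaves one value per row, while axis=0 collapses the rows and leaves one value per column. Both ways add up to the same total as np.sum(M).

```python
import numpy as np

M = np.arange(1, 10).reshape(3, 3).astype(float)  # same values as M above

row_sums = np.sum(M, axis=1)   # one value per row
col_sums = np.sum(M, axis=0)   # one value per column

# The same sums computed with explicit loops over a nested Python list
rows = M.tolist()
manual_rows = [sum(row) for row in rows]
manual_cols = [sum(row[j] for row in rows) for j in range(3)]

print(row_sums, manual_rows)   # [ 6. 15. 24.] [6.0, 15.0, 24.0]
print(col_sums, manual_cols)   # [12. 15. 18.] [12.0, 15.0, 18.0]
# Either way, the grand total is the same
print(row_sums.sum(), col_sums.sum(), np.sum(M))  # 45.0 45.0 45.0
```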



The original article also includes a video by the author that explains everything described above once more, in a more visual way.








Source: https://habr.com/ru/post/422423/


