
Introduction to Python

In this article we cover the basics of Python. We are getting closer to our goal: soon we will start working with the main data science libraries and with TensorFlow (for writing and deploying neural networks, that is, deep learning).

Installation


Python can be downloaded from python.org. However, if it is not already installed, the Anaconda distribution is recommended instead, since it already includes most of the libraries needed for data science work.

If you do not use the Anaconda distribution, do not forget to install the pip package manager, which makes it easy to install third-party packages, since we will need some of them. It is also worth installing IPython, a much more convenient interactive shell. Note that the Anaconda distribution comes with pip and IPython.

Whitespace characters


In many programming languages, braces are used to delimit blocks of code. Python uses indentation:

    for i in [1, 2, 3, 4, 5]:
        print(i)                    # first line of the "for i" block
        for j in [1, 2, 3, 4, 5]:
            print(j)                # first line of the "for j" block
            print(i + j)            # last line of the "for j" block
        print(i)                    # last line of the "for i" block
    print("done looping")

This makes the code easy to read, but it also means you have to be careful with formatting. Whitespace inside parentheses and square brackets is ignored, which makes it easier to write long expressions:

    long_winded_computation = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 +
                               13 + 14 + 15 + 16 + 17 + 18 + 19 + 20)

and to write code that is easy to read:

    list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

    easy_to_read_list_of_lists = [[1, 2, 3],
                                  [4, 5, 6],
                                  [7, 8, 9]]

To continue a statement on the next line, you can use a backslash, although this is rarely needed:

    two_plus_three = 2 + \
                     3
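
As a minimal sketch of the two continuation styles described above (the variable names are illustrative and not from the article), the same sum can be split across lines either inside parentheses or with a trailing backslash:

```python
# Implicit continuation: line breaks inside parentheses are ignored.
sum_in_parentheses = (1 + 2 +
                      3 + 4)

# Explicit continuation: the backslash must be the very last character on the line.
sum_with_backslash = 1 + 2 + \
                     3 + 4

print(sum_in_parentheses, sum_with_backslash)  # both names hold the same total
```

The parenthesized form is usually the safer choice, because a stray space after a backslash is a syntax error.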

Because code blocks are defined by whitespace, copying and pasting code into the Python shell can cause problems. For example, pasting the following code:

    for i in [1, 2, 3, 4, 5]:

        # note the blank line
        print(i)

into the standard Python shell raises an error:

    IndentationError: expected an indented block

because the interpreter treats the blank line as the end of the for loop's block.

The IPython shell has a “magic” %paste function that correctly pastes whatever is on the clipboard, including whitespace.

Modules (Importing Libraries)


Some libraries in the Python environment are not loaded by default. To use these tools, you must import the modules that contain them.

One approach is to simply import the module itself:

    import re
    my_regex = re.compile("[0-9]+", re.I)

Here re is the name of the module containing functions and constants for working with regular expressions. After importing the whole module this way, you access its functions by prefixing them with re.

If the code already contains a variable named re, you can use a module alias:

    import re as regex
    my_regex = regex.compile("[0-9]+", regex.I)

An alias is also used when the imported module has a cumbersome name or when the code refers to the module frequently.

For example, when visualizing data with the matplotlib module, the following standard alias is usually used:

    import matplotlib.pyplot as plt

If you need a few specific values from a module, you can import them explicitly and use them without a prefix:

    from collections import defaultdict, Counter
    lookup = defaultdict(int)
    my_counter = Counter()

Functions


A function is a rule that takes zero or more input arguments and returns a corresponding result. In Python, functions are usually defined using the def statement:

    def double(x):
        """This is where you put an optional docstring that explains
        what the function does. For example, this function multiplies
        its input by 2."""
        return x * 2

Functions in Python are first-class objects. This means that they can be assigned to variables and passed to other functions just like any other arguments:

    def apply_to_one(f):
        """Calls the function f with 1 as its argument"""
        return f(1)

    my_double = double             # refers to the previously defined function
    x = apply_to_one(my_double)    # equals 2

In addition, you can easily create short anonymous functions or lambda expressions:

    y = apply_to_one(lambda x: x + 4)    # equals 5
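
To illustrate passing functions as arguments, here is a small sketch. The helper names apply_twice and add_three are hypothetical and introduced only for this example; the function accepts either a named function or a lambda:

```python
def apply_twice(f, x):
    """Call f on x, then call f again on the result."""
    return f(f(x))


def add_three(n):
    return n + 3


print(apply_twice(add_three, 1))             # 1 -> 4 -> 7
print(apply_twice(lambda s: s + "!", "hi"))  # "hi" -> "hi!" -> "hi!!"
```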

Lambda expressions can be assigned to variables. However, it is recommended to use def instead:

    another_double = lambda x: 2 * x      # not recommended

    def another_double(x):                # recommended instead
        return 2 * x

Function parameters can also be given default values, which need to be specified only when you want a value other than the default:

    def my_print(message="my default message"):
        print(message)

    my_print("hello")    # prints 'hello'
    my_print()           # prints 'my default message'

Sometimes it is useful to specify arguments by name:

    def subtract(a=0, b=0):
        return a - b

    subtract(10, 5)    # returns 5
    subtract(0, 5)     # returns -5
    subtract(b=5)      # same as the previous call
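
Default values and named arguments work well together. The following sketch uses a hypothetical function describe (not from the article) whose label and precision parameters have defaults, so callers override only what they need:

```python
def describe(value, label="value", precision=2):
    """Format a number with a label; label and precision have defaults."""
    return f"{label} = {value:.{precision}f}"


print(describe(3.14159))                     # value = 3.14
print(describe(3.14159, precision=4))        # value = 3.1416
print(describe(2, label="x", precision=0))   # x = 2
```

Naming the argument at the call site makes it possible to skip label and still set precision.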

Functions will be used very often from here on.

Strings


Character strings are delimited by single or double quotes (the opening and closing quotes must match):

    single_quoted_string = 'data science'
    double_quoted_string = "data science"

Backslash is used to encode special characters. For example:

    tab_string = "\t"      # represents the tab character
    len(tab_string)        # equals 1

If you want a literal backslash, as found in directory names on Windows, you can create a raw string using r"":

    not_tab_string = r"\t"    # represents the characters '\' and 't'
    len(not_tab_string)       # equals 2
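
As a short sketch of why raw strings help with Windows paths (the path itself is made up for illustration), the same string can be written with escaped backslashes or as a raw string:

```python
escaped_path = "C:\\data\\table.csv"   # each backslash is written twice
raw_path = r"C:\data\table.csv"        # raw string: backslashes are kept as-is

print(escaped_path == raw_path)   # the two spellings give the same string
print(raw_path)
```

Without the r prefix and without doubling, the \t in this path would be read as a tab character. Note that a raw string cannot end with a single backslash.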

Multi-line text blocks are created using triple single (or double) quotes:

    multi_line_string = """This is the first line.
    This is the second line,
    and this is the third line."""

Exceptions


When something goes wrong, Python raises an exception. Unhandled exceptions cause the program to stop unexpectedly. Exceptions are handled using the try and except statements:

    try:
        print(0 / 0)
    except ZeroDivisionError:
        print("cannot divide by zero")

Although in many programming languages using exceptions is considered bad style, in Python there is nothing wrong with using them to make code cleaner, and we will sometimes do just that.
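
One way exceptions can make code cleaner is to wrap a risky operation in a small function. The sketch below uses a hypothetical helper safe_divide, with an assumed default parameter that is returned when division by zero occurs:

```python
def safe_divide(numerator, denominator, default=None):
    """Divide two numbers, returning `default` instead of raising on zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return default


print(safe_divide(6, 3))                # 2.0
print(safe_divide(1, 0))                # None
print(safe_divide(1, 0, default=0.0))   # 0.0
```

Only ZeroDivisionError is caught here, so any other problem (for example, passing a string) still surfaces as an error.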

Lists


Probably the most important data structure in Python is the list. It is simply an ordered collection, similar to an array in other programming languages, but with additional functionality.

    integer_list = [1, 2, 3]
    heterogeneous_list = ["string", 0.1, True]
    list_of_lists = [integer_list, heterogeneous_list, []]

    list_length = len(integer_list)    # equals 3
    list_sum = sum(integer_list)       # equals 6

You can get or set the n-th element of a list using square brackets:

    x = list(range(10))    # the list [0, 1, ..., 9]
    zero = x[0]            # equals 0, lists are 0-indexed
    one = x[1]             # equals 1
    nine = x[-1]           # equals 9, the Pythonic way to get the last element
    eight = x[-2]          # equals 8, the Pythonic way to get the next-to-last element
    x[0] = -1              # now x is [-1, 1, 2, 3, ..., 9]

In addition, square brackets are used for "slicing" lists:

    first_three = x[:3]                  # [-1, 1, 2]
    three_to_end = x[3:]                 # [3, 4, ..., 9]
    one_to_four = x[1:5]                 # [1, 2, 3, 4]
    last_three = x[-3:]                  # [7, 8, 9]
    without_first_and_last = x[1:-1]     # [1, 2, ..., 8]
    copy_of_x = x[:]                     # [-1, 1, 2, ..., 9]

Python has an in operator that checks whether an item belongs to a list:

    1 in [1, 2, 3]    # True
    0 in [1, 2, 3]    # False

This check examines the elements one at a time, so use it only when you know the list is small or when the time the check takes does not matter.

Lists are easy to concatenate:

    x = [1, 2, 3]
    x.extend([4, 5, 6])    # x is now [1, 2, 3, 4, 5, 6]

If you want to leave x unchanged, you can use list addition:

    x = [1, 2, 3]
    y = x + [4, 5, 6]      # y is [1, 2, 3, 4, 5, 6]; x is unchanged

More often, items are appended to a list one at a time:

    x = [1, 2, 3]
    x.append(0)    # x is now [1, 2, 3, 0]
    y = x[-1]      # equals 0
    z = len(x)     # equals 4

Often it is convenient to unpack the list, if you know how many items it contains:

    x, y = [1, 2]    # now x is 1, y is 2

If the number of elements on the two sides is not the same, a ValueError is raised.

For a value you intend to discard, an underscore is commonly used:

    _, y = [1, 2]    # now y == 2, the first element is ignored
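
The following sketch shows both successful unpacking and the error raised when the lengths do not match. The list point and the other variable names are illustrative:

```python
point = [3, 4]

x, y = point          # lengths match, so x is 3 and y is 4
_, second = point     # keep only the second element

try:
    a, b, c = point   # three names, but only two elements
except ValueError as error:
    unpack_error = str(error)
    print("unpacking failed:", unpack_error)
```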

Tuples


Tuples are the immutable cousins of lists.

Almost anything you can do to a list without modifying it can also be done to a tuple. Tuples are written with parentheses instead of square brackets, or with no brackets at all:

    my_list = [1, 2]
    my_tuple = (1, 2)
    other_tuple = 3, 4
    my_list[1] = 3       # my_list is now [1, 3]

    try:
        my_tuple[1] = 3
    except TypeError:
        print("cannot modify a tuple")

Tuples provide a convenient way to return multiple values from a function:

    def sum_and_product(x, y):
        return (x + y), (x * y)

    sp = sum_and_product(2, 3)       # equals (5, 6)
    s, p = sum_and_product(5, 10)    # s is 15, p is 50

Tuples (and lists) are also used in multiple assignments:

    x, y = 1, 2    # now x is 1, y is 2
    x, y = y, x    # the Pythonic way to swap variables; now x is 2, y is 1
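
Multiple assignment is handy whenever two values must be updated together. As an illustrative sketch (the function fibonacci is not part of the article), each step below updates both running values in a single statement, with no temporary variable:

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers, starting from 0."""
    a, b = 0, 1
    result = []
    for _ in range(n):
        result.append(a)
        a, b = b, a + b   # the right-hand side is evaluated before assignment
    return result


print(fibonacci(8))   # [0, 1, 1, 2, 3, 5, 8, 13]
```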

Dictionaries


A dictionary, or associative array, is another fundamental data structure.

It associates values with keys and lets you quickly retrieve the value corresponding to a given key:

    empty_dict = {}                         # the Pythonic way
    empty_dict2 = dict()                    # less Pythonic
    grades = {"Grigoriy": 80, "Tim": 95}    # dictionary literal

You can look up the value for a key using square brackets:

    grigoriy_grade = grades["Grigoriy"]    # equals 80

If you ask for a key that is not in the dictionary, you get a KeyError:

    try:
        kates_grade = grades["Kate"]
    except KeyError:
        print("no grade for Kate!")

You can check for the presence of a key using the in operator:

    grigoriy_has_grade = "Grigoriy" in grades    # True
    kate_has_grade = "Kate" in grades            # False

Dictionaries have a get() method that returns a default value instead of raising an exception when you look up a missing key:

    grigoriy_grade = grades.get("Grigoriy", 0)    # equals 80
    kates_grade = grades.get("Kate", 0)           # equals 0
    no_ones_grade = grades.get("No One")          # the default default is None

Value assignment by key is performed using the same square brackets:

    grades["Tim"] = 99            # replaces the old value
    grades["Kate"] = 100          # adds a third entry
    num_students = len(grades)    # equals 3

Dictionaries are often used as a simple way to represent structured data:

    tweet = {
        "user": "grinaleks",
        "text": "Data science is awesome",
        "retweet_count": 100,
        "hashtags": ["#data", "#science", "#datascience", "#awesome", "#yolo"]
    }

Besides looking up specific keys, you can look at all of them at once:

    tweet_keys = tweet.keys()        # the keys
    tweet_values = tweet.values()    # the values
    tweet_items = tweet.items()      # (key, value) tuples

    "user" in tweet_keys             # True, but not the Pythonic way
    "user" in tweet                  # the Pythonic way, uses the fast dict in
    "grinaleks" in tweet_values      # True (the values are checked one by one)

Keys must be immutable; in particular, lists cannot be used as keys. If you need a composite key, it is better to use a tuple or to find a way to convert the key to a string.
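
A brief sketch of a composite key follows. The dictionary cell_labels and its contents are made up for illustration; a tuple of coordinates works as a key, while a list is rejected:

```python
# Tuples are immutable, so they can serve as composite keys.
cell_labels = {(0, 0): "start", (2, 3): "goal"}
label = cell_labels[(2, 3)]
print(label)

# Lists are mutable, so using one as a key raises TypeError.
try:
    cell_labels[[1, 1]] = "wall"
except TypeError as error:
    list_key_error = str(error)
    print("list key rejected:", list_key_error)
```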

Dictionary defaultdict


Suppose you need to count the words in a document. An obvious approach is to create a dictionary whose keys are words and whose values are word counts (the number of times each word occurs in the text). As you check each word, you increment its count if it is already in the dictionary and add it to the dictionary if it is not:

    word_counts = {}
    document = []    # some document; here it is an empty list of words
    for word in document:
        if word in word_counts:
            word_counts[word] += 1
        else:
            word_counts[word] = 1

You could also follow the principle “it is better to ask for forgiveness than permission” and catch the error raised when looking up a missing key:

    word_counts = {}
    for word in document:
        try:
            word_counts[word] += 1
        except KeyError:
            word_counts[word] = 1

A third approach is to use the get() method, which handles missing keys gracefully:

    word_counts = {}
    for word in document:
        previous_count = word_counts.get(word, 0)
        word_counts[word] = previous_count + 1

All of these techniques are slightly unwieldy, which is why defaultdict (a dictionary with a default value) is useful. It is like a regular dictionary, except that when you try to look up a key it does not contain, it first adds a value for that key using the zero-argument function you provided when you created it. To use defaultdict, you must import it from the collections module:

    from collections import defaultdict

    word_counts = defaultdict(int)    # int() produces 0
    for word in document:
        word_counts[word] += 1

defaultdict is also useful with list, dict, and even your own functions:

    dd_list = defaultdict(list)                # list() produces an empty list
    dd_list[2].append(1)                       # now dd_list contains {2: [1]}

    dd_dict = defaultdict(dict)                # dict() produces an empty dict
    dd_dict["Grigoriy"]["City"] = "Seattle"    # {"Grigoriy": {"City": "Seattle"}}

    dd_pair = defaultdict(lambda: [0, 0])
    dd_pair[2][1] = 1                          # now dd_pair contains {2: [0, 1]}

These features are useful when dictionaries are used to “collect” results by some key and you want to avoid checking each time whether the key is already in the dictionary.
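
As a minimal sketch of "collecting" results by key, the example below groups words by their first letter. The word list is made up for illustration, and no check for an existing key is needed:

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Each missing key starts out as an empty list, created by list().
words_by_letter = defaultdict(list)
for word in words:
    words_by_letter[word[0]].append(word)

for letter, group in words_by_letter.items():
    print(letter, group)
```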

Dictionary Counter


Counter, a subclass of the dictionary, turns a sequence of values into a defaultdict(int)-like object that maps keys to counts.

It will mainly be used to create histograms:

    from collections import Counter

    c = Counter([0, 1, 2, 0])    # c is (basically) {0: 2, 1: 1, 2: 1}

Its functionality makes it quite easy to solve the problem of counting word frequencies:

    # count the word frequencies
    word_counts = Counter(document)

A Counter instance has a most_common() method, which is often useful:

    # print the 10 most common words and their counts
    for word, count in word_counts.most_common(10):
        print(word, count)
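
Here is a small self-contained sketch of the word-counting task. The sample sentence is made up, and splitting on whitespace is an assumed, simplistic way of turning text into words:

```python
from collections import Counter

# A made-up sentence, split into words on whitespace.
document = "the cat saw the dog and the dog saw the cat".split()

word_counts = Counter(document)

print(word_counts["the"])           # 4
print(word_counts["bird"])          # 0, missing keys count as zero
print(word_counts.most_common(1))   # [('the', 4)]
```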

Sets


The set data structure is a collection of distinct elements with no defined order:

    s = set()
    s.add(1)       # s is now {1}
    s.add(2)       # s is now {1, 2}
    s.add(2)       # s is still {1, 2}
    x = len(s)     # equals 2
    y = 2 in s     # equals True
    z = 3 in s     # equals False

Sets will be used for two reasons. First, the in operation on sets is very fast. If you need to test many items for membership in a large collection, a set is better suited for this than a list:

    # list version
    stopwords_list = ["a", "an", "at"] + hundreds_of_other_words + ["yet", "you"]
    "zip" in stopwords_list    # False, but every element has to be checked

    # set version
    stopwords_set = set(stopwords_list)
    "zip" in stopwords_set     # very fast to check

The second reason is to find the distinct items in a collection:

    item_list = [1, 2, 3, 1, 2, 3]
    num_items = len(item_list)             # equals 6
    item_set = set(item_list)              # {1, 2, 3}
    num_distinct_items = len(item_set)     # equals 3
    distinct_item_list = list(item_set)    # [1, 2, 3]

Sets will be used much less frequently than dictionaries and lists.
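
Because a set has no defined order, converting a list to a set and back may reorder the items. One possible sketch, using a made-up list visited_pages, combines a set for fast membership checks with a list that preserves first-seen order:

```python
visited_pages = ["home", "docs", "home", "blog", "docs"]

unique_pages = set(visited_pages)   # distinct items, order not guaranteed

# Keep the first occurrence of each page, in the original order.
seen = set()
pages_in_first_seen_order = []
for page in visited_pages:
    if page not in seen:            # fast membership check on the set
        seen.add(page)
        pages_in_first_seen_order.append(page)

print(len(unique_pages))            # 3
print(pages_in_first_seen_order)    # ['home', 'docs', 'blog']
```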

Control structures


As in most other programming languages, actions can be performed by condition using the if statement:

    if 1 > 2:
        message = "if only 1 were greater than two..."
    elif 1 > 3:
        message = "elif stands for 'else if'"
    else:
        message = "when all else fails, use else (if you want to)"

In addition, you can use the single-line ternary if-then-else operator, which will sometimes be used later:

    parity = "even" if x % 2 == 0 else "odd"

Python has a while loop:

    x = 0
    while x < 10:
        print(x, "is less than 10")
        x += 1

However, the for loop with the in operator will be used more often:

    for x in range(10):
        print(x, "is less than 10")

If you need more complex loop control, you can use the continue and break statements:

    for x in range(10):
        if x == 3:
            continue    # go immediately to the next iteration
        if x == 5:
            break       # quit the loop entirely
        print(x)

As a result, 0, 1, 2 and 4 will be printed.
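
The same statements can be combined in a slightly more practical sketch. The list readings is made-up data, and the rules used here (skip negative values, treat 0 as an end marker, call a total above 10 "high") are assumptions for illustration only:

```python
readings = [4, -1, 7, 0, 9]   # hypothetical data; 0 marks the end

total = 0
for reading in readings:
    if reading < 0:
        continue              # skip invalid negative readings
    if reading == 0:
        break                 # stop at the end marker, so 9 is never added
    total += reading

status = "high" if total > 10 else "low"   # ternary if-then-else
print(total, status)                       # 11 high
```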

Truth


Booleans in Python work the same way as in most other programming languages, with one exception: they are capitalized:

    one_is_less_than_two = 1 < 2         # equals True
    true_equals_false = True == False    # equals False

To denote a nonexistent value, Python uses the special object None, which corresponds to null in other languages:

    x = None
    print(x == None)    # prints True, but is not Pythonic
    print(x is None)    # prints True, and is Pythonic

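In Python, any value can be used where a Boolean is expected. All of the following values are treated as False: False, None, [] (an empty list), {} (an empty dictionary), "" (an empty string), set() (an empty set), 0, and 0.0.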


Almost everything else is treated as True. This makes it easy to use if statements to test for empty lists, empty strings, empty dictionaries, and so on. Sometimes, however, it causes hard-to-spot bugs if you do not expect this behavior:

    s = some_function_that_returns_a_string()
    if s:
        first_char = s[0]
    else:
        first_char = ""

Here is a simpler way to do the same:

    first_char = s and s[0]

since the logical operator and returns its second value when the first is truthy, and its first value when it is not. Similarly, if x in the following expression is either a number or possibly None, the result will definitely be a number:

    safe_x = x or 0    # definitely a number

The built-in all function takes a list and returns True only when every element is truthy, and the built-in any function returns True when at least one element is truthy:

    all([True, 1, {3}])    # True
    all([True, 1, {}])     # False, {} is falsy
    any([True, 1, {}])     # True, True is truthy
    all([])                # True, there are no falsy elements in the list
    any([])                # False, there are no truthy elements in the list
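
A short sketch of these idioms follows, including one pitfall. The helper or_default and the grades list are hypothetical, and the passing threshold of 60 is an assumption made for the example:

```python
def or_default(x, default=0):
    """Return x unless it is falsy, in which case return default."""
    return x or default


print(or_default(None))    # None is falsy, so the default 0 is returned
print(or_default(5))       # 5 is truthy, so it is returned unchanged
print(or_default(0, 7))    # caution: 0 is falsy too, so this returns 7

grades = [80, 95, 100]
everyone_passed = all(grade >= 60 for grade in grades)   # every grade is at least 60
anyone_perfect = any(grade == 100 for grade in grades)   # at least one grade is 100
print(everyone_passed, anyone_perfect)
```

When 0 is a legitimate value, an explicit check such as x if x is not None else default avoids the pitfall shown in the third call.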

Source: https://habr.com/ru/post/450474/

