
Basics of Python - briefly. Strings.

Since the positive responses outnumbered the negative ones, I will keep posting the lessons. If you already know the basics, you can either skip this lesson or try to solve task 3 in the shortest way possible :)

First, a small remark.

Starting with Python 2.3, anyone who uses non-ASCII characters in source code needs to add an encoding declaration at the very beginning of the program. For Russian this will mostly be:
 # -*- coding: cp1251 -*-

or store the source files in UTF-8 (which is preferable).
Having covered working with numbers, it's time to study strings. Python has a very rich set of features in this area.

Strings


Strings can be enclosed in single or double quotes, can use C-style escape sequences, and multi-line literals are written in triple quotes.
 >>> "habrahabr"
 'habrahabr'
 >>> 'foo bar boz'
 'foo bar boz'
 >>> "won't"
 "won't"
 >>> 'don"t'
 'don"t'
 >>> "couldn\"t"
 'couldn"t'
 >>> """multi line
 ... very long
 ... string constant"""
 'multi line\nvery long\nstring constant'
 >>> 'this takes \
 ... two lines'
 'this takes two lines'
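As a small sketch of the rules above (the variable names are mine, not part of the lesson), the snippet below checks that the choice of quotes does not change the value and that a triple-quoted literal keeps its line breaks. I wrote it so that it should run unchanged on both Python 2 and Python 3.

```python
# Quote style is only notation: both literals produce the same string.
single = 'habrahabr'
double = "habrahabr"
same_value = (single == double)

# Escaping a quote inside matching delimiters vs. switching delimiters.
escaped = "couldn\"t"
switched = 'couldn"t'
same_escaped = (escaped == switched)

# A triple-quoted literal keeps the real newlines between its lines.
multi = """multi line
very long
string constant"""
line_count = multi.count("\n") + 1

print(same_value)
print(same_escaped)
print(line_count)
```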

Strings can be concatenated with the + operator and “multiplied” (repeated) with the * operator. Note that no spaces are added automatically; here the space is part of the literal.
 >>> "Hello " + "word"
 'Hello word'
 >>> "Hello " * 3
 'Hello Hello Hello '
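One place where + and * come in handy together is building simple text decorations. The following is only an illustrative sketch; the names rule and banner and the width of 20 are my own choices.

```python
# Repeat one character to get a horizontal rule of a fixed width.
rule = "=" * 20

# Glue the pieces together; every + produces a new string.
banner = rule + "\n" + "Strings" + "\n" + rule

# Multiplying by zero gives an empty string.
nothing = "Hello " * 0

print(banner)
```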

In essence, a string is a sequence of characters with random access. To get part of a string, you can use the so-called slice operator. Note that numbering starts from zero (and looks rather chaotic at first glance).
 >>> str = "Hello, cruel world!"
 # get the 4th character of the string
 >>> str[3]
 'l'
 # all characters from the 8th through the 14th
 >>> str[7:14]
 'cruel w'
 # every second character from the 2nd through the 12th
 >>> str[1:12:2]
 'el,cul'
 # some values can be omitted
 # every second character of the string
 >>> str[::2]
 'Hlo re ol!'

If you omit the first of the three parameters, it is taken to be zero; if you omit the second, the slice continues to the end of the string.
 # first 2 characters of the string
 >>> str[:2]
 'He'
 # the whole string except the first 2 characters
 >>> str[2:]
 'llo, cruel world!'

Slices with out-of-range bounds are handled as follows:
- if the upper bound of the slice is greater than the length of the string, it is reduced to the length of the string
- if the lower bound is greater than the upper bound, an empty string is returned
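Both rules are easy to check on the same string as before. This is a minimal sketch; the variable names are mine, and the last part (plain indexing raising IndexError) goes slightly beyond what the lesson states, so treat it as my addition.

```python
s = "Hello, cruel world!"   # 19 characters

# An upper bound past the end is clamped to the length of the string.
tail = s[13:100]

# A lower bound greater than the upper bound gives an empty string.
empty = s[10:5]

# Slices forgive out-of-range bounds, but a single index does not.
try:
    s[100]
    index_failed = False
except IndexError:
    index_failed = True

print(repr(tail))
print(repr(empty))
print(index_failed)
```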

Indices in slices can also be negative; in that case they are counted from the end of the string.
 # last character
 >>> str[-1]
 '!'
 # second character from the end
 >>> str[-2]
 'd'
 # last two characters
 >>> str[-2:]
 'd!'
 # all characters except the last two
 >>> str[:-2]
 'Hello, cruel worl'

The best way to remember how slice indices work is to think of them as pointing between the characters, with 0 at the left edge of the first character. The right edge of the last character then has an index equal to the length of the string.
For non-negative indices that lie within the string, the length of the slice equals the difference between its two bounds.

The len() function returns the length of a string.
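The "indices point between characters" model and len() can be checked together. Below is a small sketch under that model; the bounds 7 and 12 and the variable names are just an example of mine.

```python
s = "Hello, cruel world!"

# Boundaries are numbered 0..len(s); s[a:b] takes what lies between a and b.
a, b = 7, 12
piece = s[a:b]
assert len(piece) == b - a

# Cutting at any boundary i and gluing the halves back restores the string.
for i in range(len(s) + 1):
    assert s[:i] + s[i:] == s

# A negative index -k names the same boundary as len(s) - k.
assert s[-2:] == s[len(s) - 2:]

print(piece)
```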

Unicode

Recent versions of Python support working with Unicode strings very well.

To write a Unicode string as a literal, use the prefix u.
 >>> uni = u"Тест"
 >>> uni
 u'\u0422\u0435\u0441\u0442'

In addition, Python lets you create a Unicode string with the function of the same name, unicode().
 >>> uni = unicode("Тест", "UTF-8")
 >>> uni
 u'\u0422\u0435\u0441\u0442'

This function can work with Latin-1, ASCII, UTF-8, UTF-16, the Russian encodings ISO-8859-5, KOI8-R, CP1251, CP866 and Mac-cyrillic, and many others.

For the reverse conversion there is the encode method, which converts a Unicode string into a string in the given encoding.
 >>> uni.encode("UTF-8")
 '\xd0\xa2\xd0\xb5\xd1\x81\xd1\x82'
 >>> uni.encode("CP1251")
 '\xd2\xe5\xf1\xf2'
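To see that the two encodings really are different byte representations of the same text, here is a rough round-trip sketch. The word is written with \u escapes so the file needs no encoding declaration, and the decode() method used for the way back is not covered in the lesson, so take it as my addition. The snippet should run on both Python 2 and Python 3.

```python
# The word "Тест" written with escapes: four Cyrillic letters.
uni = u"\u0422\u0435\u0441\u0442"

utf8_bytes = uni.encode("UTF-8")      # two bytes per Cyrillic letter
cp1251_bytes = uni.encode("CP1251")   # one byte per Cyrillic letter

# decode() turns the bytes back into a Unicode string.
restored = cp1251_bytes.decode("CP1251")

print(len(uni))
print(len(utf8_bytes))
print(len(cp1251_bytes))
print(restored == uni)
```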

To convert a string into a list using a specific delimiter, use the split method.
This method takes a separator as its parameter and returns a list of individual “words”, which you can iterate over in a for loop.
 >>> str = "Mary has a little lamb"
 >>> str.split(" ")
 ['Mary', 'has', 'a', 'little', 'lamb']
 >>> for word in str.split(" "):
 ...     print word
 ...
 Mary
 has
 a
 little
 lamb
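A few more split variations, as a sketch only. The sample strings are made up, and I use the name text instead of str so as not to hide the built-in type. Calling split() with no argument is standard behaviour but is not mentioned in the lesson.

```python
text = "Mary has a little lamb"

# Explicit separator: split exactly on single spaces.
words = text.split(" ")

# With another separator, an empty field between two commas is kept.
fields = "a,b,,c".split(",")

# With no argument, split() breaks on any run of whitespace.
loose = "Mary   has\ta little lamb".split()

for word in words:
    print(word)
```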


Homework.

1. Write a program that displays a user-specified string in at least 3 different encodings. The call to the encode() method may appear in the code only once.
2. Write a program that finds the longest word in a string of words separated by spaces.
3. (Increased difficulty) Write a program that decodes a phone number for caller ID.
On a caller ID request, the PBX sends the phone number using the following rules:
- If a digit is repeated fewer than 2 times, it is noise and must be discarded.
- Each significant digit is repeated at least 2 times.
- If the number contains the same digit several times in a row, the sign # repeated 2 or more times is used to mean “the same digit as the previous one”.

For example, the incoming string 4434###552222311333661 corresponds to the number 4452136.
By the way, regular expressions may not be used in these tasks :)

Source: https://habr.com/ru/post/29980/

