40 key information technology concepts accessible and understandable
I present to you the translation of a very capacious, and at the same time fairly short (for such a scale of problem) article by Carl Cheo . I decided that I really wanted to make her translation almost as soon as I began to read, and I am very glad that I finally did it.
In order to make learning more fun and interesting , I present to you a list of important theories and concepts of computer science, explained with the help of analogies with a minimum of technical details. It will be like a very fast computer science course for everyone with the goal of simply giving you a general idea of ​​the basic concepts.
Important notes:
Items with an unspecified source are written by me independently. Correct me if you notice any inaccuracies. Offer the best analogy if possible.
Titles refer to their corresponding Wikipedia articles. Please read these articles for more serious and detailed explanations.
Suppose you order a complete collection of Harry Potter films from 8 parts on Amazon and download the same collection at the same time. You want to check which method is faster. The store delivered the collection almost every other day, and the download was completed just 30 minutes earlier. Great! It was a close race.
But what if I order several Blu-ray films, such as Lord of the Rings, Twilight, the Dark Knight trilogy and the like, and at the same time I will try to download all these films? Delivery will still take a day, but downloading on the network will last this time for 3 days.
When shopping on the Internet, the number of purchased items (input data) does not affect the time for which they will be delivered. The output is constant. We call it O (1).
But when downloading online, download time is directly proportional to the size of downloaded movies (input). We call it O (n). From these experiments, we understand that online shopping scales better than downloading files from the network. It is very important to understand the “O-large” -notation, as this will help you analyze the efficiency and scalability of the algorithms.
Note: the “O-large” notation is used to evaluate the performance at worst for an algorithm. Let's agree that O (1) and O (n) are the worst-case scenarios for this example.
Someone in the cinema asks you which row you're on. You are too lazy to count, so you ask the person sitting in front of you. You will simply need to add 1 to his answer to get the number of your series. Great idea, right? However, the person in front of you will do the same, and so on. In the end, the question will reach the first row and the person on the first row will answer “I am in row number 1!”. From this point on, the correct message (increased by 1 for each row) will go all its way back to what was originally asked.
A nurse carries a tray with a pack of cocoa and a cup on which a reduced image of her is drawn, holding all the same, which in turn contains an even smaller image of the same picture, and so on.
Let's imagine that your pipe is flowing in the garden. You take a bucket and a bit of sealer to fix it. After a while you see that the flow has become even bigger and you already need a plumber with normal tools. At the same time, you still use the bucket to somehow manage the water. After some time, you notice that a huge underground stream has opened. You already need to somehow handle gallons of water per second!
The buckets will no longer help. You need a completely new approach to solve the problem, as the quantity and rate of water flow has increased. To save the city from flooding, you may need the help of the government to build a dam, which requires a huge amount of engineering knowledge and a carefully crafted control system.
The term “big data” describes data sets that are so large and complex that it is impossible to work with them using ordinary data processing tools.
Imagine that you decide to walk, and your goal is to get to the highest place. You have a map, but there are thousands of possible paths on the map. You are too lazy and you simply do not have time to try each of them. So to hell with the map! You just go ahead with a simple strategy - to be greedy and short-sighted. You simply choose paths that are tilted up as much as possible.
After your journey was over and your body was exhausted and tired, you looked at the map for the first time. Oh my God! Here is a dirty river through which I had to cross instead of going up!
The greedy algorithm makes the best choice at the current moment in time and never reconsiders its decisions.
This time you climb another mountain. You decide to find a path that leads you to the highest point. However, you have no map and it is very foggy around. In order to make your task easier, you have downloaded a mobile application that tracks the path that you have covered and measures your current height above sea level.
You climb the mountain again and again. Each time you choose the exact same path that led you to the highest point you recorded before, but somewhere in the middle of your path you choose a slightly different route. You can also randomly select different starting points, such an algorithm is called climbing to the top with a random start. In this way, you do not attach to one area and reduce the likelihood that you will be stuck. The climb-to-top algorithm tries to find the best solution by generating nearby solutions. Each such solution is generated on the basis of a better solution for the current moment, but with one modified element.
This is Mount Everest, the biggest challenge you faced. Your task is to reach the summit, but it is very impractical to climb Everest many times. You only have one chance. And you have become much more prudent than before. Instead of always climbing up, you occasionally move down and find new ways to reduce the risk of going the wrong way. The higher you climb, the less likely you are to go down.
The example above describes memoization ( in Russian ) (yes, memoization is not memorization), a top-down approach to dynamic programming, in which the results of previous calculations are saved for future use.
Equality of the P and NP classes of problems is one of the most popular and important unresolved problems in computer science.
Suppose we have the following task:
Task 1: 7 x 17 = p
The answer is 119.
It was easy, right? But what if we have the opposite task:
Problem 2: p xq = 119 (p and q cannot be 1 and 119)
In order to solve the second problem, assuming that you have not seen the first, you need to go through all the possible numbers from 2 to 118. There is a good algorithm that allows you to simply decompose numbers into factors ( in Russian ).
What if I ask: can p be 7? You can answer very simply by dividing 119 by 7.
Multiplication is easy. Decomposition of a number is difficult.
As a result, task 1 is P (polynomial), since it is easy to solve. The computer copes with the multiplication of two huge numbers, without spending significantly more time than multiplying two small numbers.
Task 2 is NP (nondeterministic polynomial), because it is easy to verify, but difficult to solve. The factorization of the number 119 is still not a very difficult task for the computer, but what about the number, consisting of 500 characters? This is not possible for any computer today.
And here the important part begins: are NP-problems (for example, factoring) also P-problems (as multiplication), but we just haven’t yet discovered an effective way to solve NP-problems? Are NP problems really complex, or do we just need a moment of enlightenment from some good scientist (maybe you?) Who will come up with an efficient algorithm? Or maybe people are too stupid? Imagine that there is a machine or life that has a far greater intellect than a person . We are for them, like ants for us. Our level of intelligence for them is completely insignificant. The solution to this problem for them is 1 + 1.
One way or another, why is the problem of the equality of the P and NP classes of problems so important? If we can prove that P = NP, this means that all NP problems can be solved in reasonable computer time. We can cure cancer ( protein packing , in Russian ), crack passwords ( RSA , in Russian ), and the like. It will change the world.
The problem of equality of classes of P- and NP-complete problems is one of the seven in the list of millennium problems from the Clay Mathematical Institute. The reward for the first correct decision is $ 1 000 000.
Key Concepts # 3 - Computer Architecture and Engineering
3.1 How computers work
Computers work by adding complexity over complexity ( Computers work by adding complexity on top of complexity ). In order to drive a car, it is not necessary to understand how its engine works. Difficult details are hidden.
So how do computers turn binary code, zeroes and edinichki into programs? Here is an excellent video in which dominoes visualize how computers perform binary computations at the most basic, fundamental level:
Suppose you are working as a secretary in company A. You must answer calls, schedule meetings, type documents and stuff like that. You should always switch between these tasks according to priorities. Every time the phone rings, you must stop, no matter what you do, and answer.
Parallelism (parallel execution) is a property of programs and systems that allows tasks to run in overlapping periods of time.
In the end, you can no longer do your job due to the huge amount of data entry tasks. You complain to your boss and he happily hires a specially trained person to do data entry tasks.
This is what happens if you allow the execution of parallel transactions in the banking system, and the race condition is not processed:
You have $ 1000 in your account.
Someone transfers you $ 500, and you withdraw $ 300 from an ATM
Imagine that both transactions occur simultaneously — both transactions will be counted. that your current balance is $ 1000
Now, transaction A adds $ 500 to your account and your current balance becomes $ 1500. However, transaction B also considered that your current balance is $ 1000, and, say, it ends one millisecond later than transaction A. It will deduct $ 300 from $ 1000 and make your current balance equal to $ 700
Now you have $ 700 instead of $ 1200, because transaction B has overwritten transaction A
This happened because the banking system did not take into account current ongoing transactions.
So what can you do to cope with this situation? One simple way is mutual exclusion.
Now, regardless of which transactions go through, the system will lock [the slang, the closest definition in Russian, to block from the actions of other transactions, approx.per. ] accounts involved in the transaction.
This time, at the moment transaction A is completed, your account will be blocked. You can not withdraw money from an ATM. It will be unlocked as soon as transaction A is completed.
So mutual exclusion solves the problem, right? Yes, but no one wants an ATM to reject all requests every time a transaction occurs.
Now let's set different priority levels for different types of transactions. Let's say a cash withdrawal request has a higher priority than a bank transfer. When you withdraw money from an ATM, transaction A (which is about transferring money between accounts) will stop, allowing transaction B to be processed first, as it has a higher priority. Transaction A will continue to run as soon as Transaction B is completed.
4.4.2 Semaphore counting
The binary semaphore is simple. 1 = continue transaction. 0 = wait. On the other hand, a counting semaphore allows more than one process to run simultaneously.
Suppose you are a luggage manager in a spa and you have 30 cameras. You need to keep track of the number of keys you have, each time you receive or give someone a key, although you may not know someone. If all cameras are busy, people should queue up for them. The moment someone finishes, he will give his key to the first person in the line.
Interlocking is another common problem with parallel computing.
Let's use a similar analogy with the banking system, but with a different scenario. Just in case, let me remind you that access to a bank account is blocked if there is any transaction.
Peter transfers you $ 1000 (transaction A), and you transfer him $ 500 at the same time (transaction B)
Transaction A blocks Peter’s account and withdraws $ 1000 from his account
Transaction B blocks your account and withdraws $ 500 from your account.
Now, transaction A is trying to access your account to add $ 1000 from Peter to it.
At the same time, transaction B is also trying to add your $ 500 to Peter’s account.
However, since both transactions are not completed, neither of them can access the locked account. Both are waiting for each other to complete. Mutual blocking.
The couple just moved into the next apartment. They look very nice and friendly, often inviting you to dinner. One day you mention that you are soon leaving for a two-week vacation. They happily offer you to take care of your dog. You leave them your spare key. Since then you have not heard anything about them.
Social engineering is an attempt to deceive people to get them to disclose their personal information.
The hacker checks every possible entry with the aim of finding the easiest way to get inside. Who knows, maybe someone left the window open on the second floor?
The burglar pretends to be a plumber, and you yourself open the door for him. He fixes your current pipe and everything looks great. But after he leaves, you discover that your jewelry is gone.
Trojan is a malicious program that pretends to be useful by running malicious code in the background.
Your lock is jammed and you called a locksmith. He repaired your lock and secretly made himself a duplicate key.
The rootkit obtains administrator rights on a computer in some way (for example, using social engineering), after which it masks the necessary files in a special way so that it is difficult to detect with antivirus software.
Imagine 100 people coming to your little bookstore at one time. Your bookstore is full and no one else can enter. You can't ask them to leave, because it doesn't seem like they come in groups. They may not know each other at all. Many of them seem really interested in your books. Some even ask you where the bookshelves are. Someone at the checkout is trying to pay a penny.
People come and go for hours. They all look absolutely normal. By the end of the day, you realize that you have sold only one book. Remember the guy who paid in pennies?
DDoS is trying to disable your site or service, just flooding, overflowing it with visitors.
Suppose Alice and Bob want to send something to each other. To make sure no one can see what they send to each other, they lock it in a box. They make 2 identical (symmetric) keys for the lock and meet to exchange these keys.
Exchange of the same keys tolerably works for two people. What if Alice wants to share things with another guy named Carl, and Alice also doesn't want anyone to see these things. Alice cannot use the same lock and key they used with Bob, since Bob can open the box with his key.
Of course, Alice can exchange a completely new and correspondingly different key and lock with Karl, but what if Alice wants to exchange things with 10 different people? She will have to own 10 different keys!
So Alice comes up with a great idea. She makes one key (private key). She also distributes identical locks (public key) to her friends. Anyone can lock the lock (encrypt), but only she can open the box (decrypt). Now, anyone can send things to Alice using the locks she has distributed, so Alice no longer needs to have different keys for different people.
If Alice wants to send something to Karl, she will ask Karl for the lock (public key) with which she can lock (encrypt) her things and send it to Karl.
The basic principle is this: everyone has his own private key to decrypt messages and provides senders with public keys to encrypt messages.
You figure out everything you need to do, and write down the requirements. It's like a waterfall: there is no turning back until you start from the very beginning. You move to the next phase only when the current phase is completed.
So, you learned how to program. You write a good and beautiful code (I hope), and everything is fine. Let me introduce you to cowboy programming — a development methodology that is not taught in college.
Writing very detailed instructions for a dull, but obedient machine.
What does it mean? Imagine that you have to teach a child to shower. The child only knows how to follow your instructions. So you say to the child:
Enter the bath
Turn on shower
Take a shower
Take the soap
And so on...
Oh, hell, the child did not take off his clothes before getting into the shower!
This is how the computer works. You have to tell the computer what exactly it needs to do. He does not know how to assume and will never think about the consequences.
Question 3: why shouldn't you distract the programmer when he is focused?
The process of concentration is like falling asleep. Imagine that you will be a person who is close to falling asleep in the next few seconds. Now he will have to spend more time to fall asleep!
Objects like people. They live by doing what they have some inner knowledge about. And they have a memory, so they can memorize something. And instead of interacting with them at a low level, you interact with them at a high level of abstractions, just like we are now.
Here is an example: if I am your laundry facility, you can give me your dirty clothes and send me a message saying “Please wash my things”. I know the best laundry in San Francisco. - . , . , . « ».