Do it yourself: MSc Computer Science at the level of top American universities from home

Introduction

I have long wanted to write an article about education in Computer Science, but my hands did not reach. I finally decided to do it. So, what will be discussed? It's about what MSc Computer Science diploma at top US universities (in full detail, including core courses, books and projects) and how to comply with it.

Why MSc? This is a kind of fork: on the one hand, after MSc, you are an engineer who is already ready for life (yes, we are talking about engineering training, I think this is the most painful place in our education system), on the other hand, you can safely follow the PhD path. As you know, you can get into the PhD program and not really knowing how to program - this is especially true for theoretical Computer Science. On the other hand, finding a programmer’s job is also not very difficult, and often does not require a powerful education. But reaching the level of MSc - you get the opportunity to understand how all the new ideas in Computer Science, and the opportunity to put them into practice. That is, on the one hand, it's cool to sort out some deep learning and do something new in it, as well as take and write your operating system (who did that?). And you are not clamped into the framework of a narrow specialization (unless of course you continue to learn). So you are now a universal soldier, ready for anything.

I hope this article will be useful:
1. Students who want to meet the high standards of top US universities, or are going there to graduate school in Computer Science
2. Professionals who want to close the "holes" and gaps
3. Maybe someone from the teachers will take a note for their courses.
4. Students, graduate students of American universities - I would also like to receive feedback, especially with regard to the latest trends in education
')
What will be written here? Minimum philosophy and general thoughts: a specific program of undergraduate and graduate courses, of course from the disciplines closest to me. All courses were personally felt in their own skin, so I am writing about it. (I tried to enroll in all the interesting courses that were, but my main focus is system programming, databases and artificial intelligence. Of course, there is some bias, but I try to offer a more or less universal program).

Content

1. Basic training
2. Undergraduate program
3. Graduate program
4. Ready to check yourself? Computer Science Comps.

Basic training

The first step is to go through math. The generally accepted theory in the Russian academic environment is that our mathematics is very, very cool, and we are ahead of the rest. But the line between theoretical Computer Science and mathematics is thin, and further, we will not call everything that is included in Computer Science. Well, in Computer Science, our successes in recent years, alas ...

In a nutshell, the essence is this - even though there is not much mathematics, you should not go too far. We need to get a hybrid education - a mixture of a scientist and an engineer, and do it in a finite time. Therefore, mathematics should be minimized. For a lot of interesting things are in Computer Science.

- Analysis - a confident mastery of the basics will be fine, that is, multidimensional analysis must be understood, but it is not necessary to go deeply with all the evidence
- Linear algebra - you need to understand well, a very necessary thing everywhere. Moreover, it is desirable at a fairly advanced level (eigenvectors, singular decomposition, conjugate gradients)
- Difury - you can safely neglect, very rarely where you need them
- Optimization is very useful, especially in machine learning this is just an iron requirement
- Algebras, topologies, etc. - on the one hand, it is terribly useful, but on the other - it seems to me to study it mathematically, abstractly, without direct application - you can learn when you need it (for example, relational algebra or category theory for type systems ) and the necessary properties and principles have already been studied in conjunction with practice
- Logic, set theory - I think that it is necessary to understand the basics. ZFC must be taken.
- The theory of probability, statistics - at a minimum from classical mathematics, it is better to learn what is needed for Computer Science in the context of machine learning, and you risk digging into what is not particularly useful
- Game theory is a useful thing, but superficial knowledge will last long
- Functional analysis, variational methods - very cool, but to study only if pripret, for example in machine learning
- Numerical methods - only if you want to do them later

Everything else is either not math, but Computer Science, or not (until needed for a specific case)

Programming languages

A controversial topic, there is a lot of choice, I propose such a minimal gentlemanly set:

- assembler - you need to own some assembler, because we have to make our own compiler. There are several options:

RISC is a good thing, but where to get it, just emulate, is not very convenient. But if you adjust the environment - then RISC
LLVM is a useful thing, but it simplifies a lot of things, not 100% iron.
x86 is terrible, but gcc -S is very nice to use right on your laptop.
JVM bytecode - did not try to generate, probably there is a convenient way. But, if you use JVM a lot later, it can be very useful.

- C ++ / Java - unfortunately in system programming you cannot get away from these. But the maximum need to do without them.
- Python - quickly build something, test a hypothesis, cool libraries, especially in machine learning and math + basic pieces from functional programming
- Scala - a practical functional language, if desired, you can go down and close enough to the gland. A lot of all the system already written on Scala. It should be the main workhorse.

(SQL, Prolog - naturally too, but these are small tongues)

Let's start?

Undergraduate

The course will be considered as one quarter - 3 months. Skip all intro programming courses.

1. Discrete mathematics (do not forget that this is no longer mathematics)
Combinatorics, graph theory, discrete ter. ver., recurrence relations, function generators.

In fact, this is usually the initial course, you can watch online, for example at MIT:
ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-042j-mathematics-for-computer-science-fall-2010

All this can be missed and mastered in other courses (as time does not press). But it can become boring, and this is bad.

2. Algorithms and data structures
Starting from any sorting, hash tables, different trees, then algorithms on graphs (Djikstra, min cut / max flow).
On conceptual knowledge: assessment of complexity in O notation, greedy algorithms, dynamic programming.
Additional topics: linear programming, algorithms for strings, random algorithms.
The very first beginning of the theory of complexity.

Book: www.amazon.com/Introduction-Algorithms-Edition-Thomas-Cormen/dp/0262033844

Dynamic programming is omnipresent, trees must be understood, the lower bound must be able to be derived for sorting. Purely theoretical course, tasks in textbooks there. In general, quite simply, it does not take too long.

PS Algorithms go on in each subject area, and return to the general course only in the graduate program.
But you can immediately recommend a book on random algorithms (recently a colleague advised, so far only flipped through, but it seems very competent), it is graduate level, but you should start diving early: www.designofapproxalgs.com/index.php

3. Theory of Computation
Here I am promoting Sipser, in general a great book - must read, and it is also suitable for the graduate program.
www.amazon.com/Introduction-Theory-Computation-Michael-Sipser/dp/113318779X

This is a super-important course, it will lay the foundation for everything else. Sipser is very intuitive, logical and connected. (Recently I looked through Kolmogorov - they immediately understand that it’s better not to enter without a level in Mechmatovo. Sipster has, on the contrary, a minimum of requirements, a minimum of formalization, only the most necessary, and everything becomes very accessible).

We start with the definition of a “task” - in Sipser this is a language. That is, many lines. The algorithm is something that says on any line - yes / no. This concept goes all the work. Further, the hierarchy of languages: regular, context-independent, computable, enumerable, noncomputable. Also very well presented is complexity - P, NP, NP-complete, NP-hard, co-NP + random classes.

In addition to good theoretical training, we get excellent skills:
We work with finite automata and regulars
We work with grammars and with a machine with a stack.
We program in the Turing machine - it really needs to be done, it is fun, it widens the mind.
You can program by car here, even have breakpoints! morphett.info/turing/turing.html
We learn to prove which tasks are unsolvable, with the most convenient and fastest ways.
Playing with reductions - transferring one task to another - also greatly expands the mind

The course is purely theoretical, excellent tasks, those with asterisks can break the brain
For example: we will prove that the Turing machine becomes a finite state machine in power, if we forbid it to rewrite the input characters on the tape.

4. Mathematical logic and set theory
Usually not included in the program Computer Science, but I think you need to learn. I studied this book, a very simple book, very pleasant:
www.amazon.com/Elements-Set-Theory-Herbert-Enderton/dp/0122384407

Everything, on this purely theoretical training in undergraduate Computer Science is complete, now - applied courses

5. Compilers (2 quarters)

Book: en.wikipedia.org/wiki/Compilers : _Principles, _Techniques, _and_Tools

Much has already been mastered from the theory of computing, here the whole emphasis on practice. Our task is to make a real compiler from a serious language to an assembler. For example, we were given en.wikipedia.org/wiki/Object-Oriented_Turing , but something more interesting is possible.

- Parsing: you need to take something reasonable, like JavaCC or ANTLR.
- Transfer to AST
- Semantic analysis: light, although it can be confused with type systems
- Code generation

If there is time and energy, add Intermediate Language and a little bit of optimization, the simplest one.

As a result, we perfectly understand how the compiler works, how the function call is implemented, how to make objects, methods, arrays and so on.

Note: we had to write everything in C ++, but in the modern world it is absolutely not necessary for educational purposes. On the other hand, if the compiler is written on Python or Scale (ANTLR can work with python, I don’t know what Scala is - if someone knows a good tool, please suggest. I tried to understand what Spark SQL uses, but didn’t dig it) much more interesting with the least loss.

6. Architecture
Book: www.amazon.com/Computer-Architecture-Fifth-Edition-Quantitative/dp/012383872X

Well here it is - read, do puzzles. But if there is a possibility, it would be nice to design a mini-CPU.
On something like: www.logiccircuit.org

7. Advanced data structures
Not sure what good books are here, but it would be nice to make your own disk indexes:
- B-tree
- Linear hash
- R-tree

Everything is tough here, C ++. But you can not bother, unless you want to build a database / search engines.

8. Operating systems (2 quarters)
Here I have this approach - read the book: www.amazon.com/Operating-System-Concepts-Abraham-Silberschatz/dp/0470128720 , but only for the sake of general concepts. But really learn the OS this way:

Take Nachos: en.wikipedia.org/wiki/Not_Another_Completely_Heuristic_Operating_System (or Nachos 5.0j) and write your own modules:
- Synchronization primitives
- Library of threads
- Multi-processing
- Mini Shell
- Virtual Memory
- file system

Note: this is certainly pretty hardcore, but worth it. Probably better to take Nachos 5.0j, debugging virtual memory, which is not very pleasant in the memory problem itself.

After such an exercise, there will be no mystery about operating systems.

9. Databases (2 quarters)
We read the book: www.amazon.com/Database-Systems-Complete-Book-Edition/dp/0131873253
We do the next project: we write the SQL engine on top of a simple repository, if there is not enough time, we do it like MySQL - we run AST. If there is more time, we translate it into relational algebra and add an optimization pair (some rule-based).

Note: again, at one time I had to do everything in C ++ / lex / yacc. Time has gone ahead, if done on Python or Scala, much more can be done with less losses. Or take immediately more interesting query language, for example OQL or SQL ++.

10. Artificial Intelligence
AI - in my eyes has always been and will be the most interesting area in Computer Science, since there they always solve very complex problems. At the same time, as soon as something starts to work out well, it stops being an AI and stands out as a separate discipline. In general, we read a wonderful book, and we are doing 2-3 projects.
www.amazon.com/Artificial-Intelligence-Modern-Approach-Edition/dp/0136042597

Recommended projects:
- A * search for the problem 8-queens or something else, tinker with heuristics
- theorem proving (resolution theorem proving)
- make an effective conclusion for the Baes network
- implement the temporary logic of Alain
- self-learn to play checkers or play mini-max games of chess (counting wood)

Hardcore: you can do all the tasks in the Probabilistic Graphical Models course, but this is more for the postgraduate level.

11. Machine learning
Without this topic now in any way. I don’t know which tutorial is better for undergrad, but ideally to master Bishop:
www.amazon.com/Pattern-Recognition-Learning-Information-Statistics/dp/0387310738

Here is the class Andew Ng at Stanford for undergrad, not to be confused to the class on the Curser, completely different levels:
cs229.stanford.edu

And I found just wonderful lectures on YouTube (mathematicalmonk): www.youtube.com/playlist?list=PLD0F06AA0D2E8FFBA

12. Computer graphics
Not really necessary, but each programmer once wanted to write his own game, so you have to take it for fun and talk for beer.
Not sure what good books are now.
We wrote a piece of OpenGL in C, very useful - it gives an idea how all 3D engines work.
You can even write your own Ray-Tracer - also a cool thing.

13. Computer networks
Missed this course, but there is a very good course on the Cursor.
www.coursera.org/course/comnetworks

14. Distributed systems
Strongly intersect with operating systems, but a lot of their important pieces.
I would just read a book, especially key concepts, and not specific systems:
www.amazon.com/Distributed-Systems-Concepts-Design-Edition/dp/0132143011

Synchronization, global system state, consensus, transactions, etc. Now all sorts of MPP systems have become very popular, here are the basics on which they hold (or do not hold, I am preparing a popular article about all sorts of fashionable databases).

15. Programming languages
Such a course often comes across, but usually it is a waste of time imo. At this point, a compiler has been written for Turing and SQL, everything is clear. You can get stuck with something like Haskell or ML. As an option, examine XQuery for a small expansion of consciousness.

Here I would finish the undergrad program, you have a diploma B.Sc., congratulations.

Or you can go in breadth: Security / Cryptography, Scientific Programming, bore in AI: Computational Lingustics for example. What do we have after such a program? You can safely go to work, but there are still gaps in the theoretical basis. You can fill in everything yourself, but you can continue to study at M.Sc.

Graduate

Graduate school in Computer Science is not our graduate school. Here you continue to study for a couple of years and pass a comprehensive exam, in the case of a PhD in 5 disciplines, that is, you need to keep a very serious basis in your head at this moment. Very useful exercise (I passed on MSc on the 3rd, it is much easier).

And specialization begins very quickly. But let's get the basics:

1. Algorithms
The same as in undergrad, but already really, deeper, plus randomization.
Here, as if there is no universally recognized book, I advise you to climb on the websites of universities. That is, many are costing articles, slides, etc.
Well, random algorithms are our everything. So open the book: www.designofapproxalgs.com/index.php

2. Computation Theory, Essentially Pure Complexity Theory
You can cover Sipser, and try to read something else. We had Papadimitrou, but this is certainly not for the faint of heart. There are mostly some reductions, there used to be NP-complete, now there are many in random classes.

Still, if you like logic, it makes sense to read Descriptive Complexity: people.cs.umass.edu/~immerman/book/descriptiveComplexity.html

3. Architecture
The same book, but in an adult way. You can build some advanced CPU.

Next is specialization by and large. I would advise such areas, but this is my bias:

4. Machine learning
At the level of graduate school, you can take Bishop (he himself has not passed all of it), and the theory of machine learning
I like the good old: www.amazon.com/An-Introduction-Computational-Learning-Theory/dp/0262111934
But probably already much more relevant are the books.

And you can hang on the cursor:
- Probabilistic Graphical Models
- Another very good Natural Language Processing course from Columbia University
- Modern neural networks: class.coursera.org/neuralnets-2012-001

5. Database Theory
www.e-booksdirectory.com/details.php?ebook=7942
This is a very exciting field, everything is mixed here: Logic, Model theory, Complexity, Descriptive Complexity, even game theory have been exploited. The book is quite heavy and formal, but worth it, at least selectively read and do the exercises.

What is missing?
There is absolutely nothing about formal methods, well, it's a complicated thing. In theory, between set theory, artificial intelligence and database theory + descriptive complexity - there is all the tools for verification and evidence, so this course should already be purely applied. If you have experience of such a course - it would be very interesting to learn.

Internet math - well, this is also a slightly separate topic, but the whole base is laid.

Everything, if you get here, doing projects and solving problems, we can assume that you have reached the level of M.Sc. in Computer Science at the top universities in the world.

Let's pass the test?

You can search the network for “Computer Science Comprehensive Exam” or something like that, and you can find real M.Sc. exams. and Ph.D.

I decided not to post the links, so as not to scorch once again, as they are usually not very spread.

PS That's all. Probably something is outdated, but the basics always remain relevant.

PPS Of course, sitting at home on the couch is difficult to master such a program. Doing exercises and big projects is especially difficult. It is also very difficult without a university environment (but it can also be difficult at the university — nights in the laboratory are a private phenomenon). But! - everything is possible. All the above books (well, almost all) are written as simply as possible, they require a minimum of preliminary knowledge, they immediately show how to apply this knowledge, in general they are very suitable for independent study. Courses on a course or lecture notes from universities are also very useful, it takes much less effort to see a lecture than reading a book, solving problems, etc.

PPPS Tips for professionals who have “forgotten everything” are clamped in their niche, but they want to go out again to the bleeding edge: he went through it himself, there is a clouding of brains and it seems that everything is lost and there is no hope. But in essence, this is how to start playing sports after several years of unhealthy food - we start small, set short goals, rejoice at the slightest progress, and quite satisfied all the formulas will become clear again, the general worldview will be built again, and you can feel yourself like a student / graduate student again.

University sites:

Stanford: cs.stanford.edu
MIT: www.eecs.mit.edu
UC Berkeley: www.cs.berkeley.edu
UC San Diego: cs.ucsd.edu (my alma mater, well, not yet # 4, but crawling up every year!)

Source: https://habr.com/ru/post/251747/

All Articles