📜 ⬆️ ⬇️

Lectures of the Technosphere. 1 semester Methods of using DBMS in Internet applications



Today we bring to your attention the next publication in the framework of the regular column "Lectures of the Technosphere." This time you can study the materials on the course “Methods of using DBMS in Internet applications”. The goal of the course is to study the topology, diversity, and basic principles of data storage systems, as well as the algorithms underlying both centralized and distributed systems, demonstrating the fundamental trade-offs inherent in one or another solution. Course teachers: Konstantin Osipov kostja , Eugene Blikh bigbes , Roman Tsisyk.

Lecture 1. Variety of storage solutions. Data models of classical and NoSQL systems. Consistency patterns. Semantics and overdraft acceptability in Internet applications


The advantages and disadvantages of the relational model for working with data on the Internet. Key-value data model. BigTable model. Differences between document and object. The concept of storage aggregate. Data schema management. The trade-off between consistency and performance. Competitive data access in a client-server and fully distributed architecture. An example of graph problems in RDBMS.


')

Lecture 2. Classical and modern algorithms for data organization for two-level memory


(Beginning of the lecture in the previous video, ending next)
B-trees. Inverted listings. Multi-pass merge sort. The cost model is DAM. The concept of cache-oblivious algorithm. Basic cache-oblivious algorithms. The concept of write amplification. Fractal trees. LSM trees. Bloom filters. Duplex trees. BitCask: AOF architecture, keydir architecture.



Lecture 3. Caching as a mechanism for improving the efficiency of the system (Part 2)


Algorithm Least Recently Used, implementation in a DBMS, Midpoint insertion strategy. The concept of online-algorithm. The problem of "renting skis." Paging / caching as an online algorithm. Algorithms LFD (Longest Forward Distance), FIFO (First In, First Out). Conservative algorithm Randomized MARK algorithm.



Lecture 4. DBMS Architecture


Methods for reviewing the DBMS architecture. Request life cycle: receiving, parsing (secondary keys), optimization based on rules, reading metadata, checking access rights, building a plan for executing a query, evaluating and optimizing a plan, executing, sending a response. Manage locks. Disk storage format. Magazine.



Lecture 5. Transactions


ACID transaction properties (Atomicity, Consistency, Isolation, Durability). Atomicity, durability of transactions. Apply locks. The degree of detail. Principles of WAL. Physical logging. Minimal bookkepping. Simple journal optimization. Cache model Recovery after failure, acceleration recovery. Logging of completed transactions. Isolation of transactions (intuition, two-phase blocking, formalization, theorem 2PL, graph serializability).



Lecture 6. Managing locks


Capture locks to ensure consistency and isolation of transactions. Compatibility Matrix The hierarchy of locks. The concept of deadlock'a. Algorithms to identify and prevent deadlock. Update lock, upgradable lock, granularity lock, intention lock, lock for missing entries. "Fasting" in the seizure of locks. The problem of reducing the performance of 2PL-system with glut. Locks on the physical level.



Lecture 7. Optimistic transaction management algorithms


Definition of optimistic algorithms. The principle of validation / certification, the three phases of the transaction. Parallel validation. Validation - for and against. Timestamp ordering: write, read, rollback. MVCC (Multi-Version Concurrency Control).



Lecture 8. Automating the growth of a distributed system


Sharding function Garage Sharding: Double up with replication. Bad and good shard keys. Tabular function. Consistent hashing. Guava, Sumbur. Routing (smart client, coordinator, proxy, local proxy on each application server, routing inside the database). Re-sharding. Sharding and data schema change.



Lecture 9. Introduction to distributed systems. 2PC protocol


Model of a distributed system (methods of communication and coordination, possible failures). Variants of the model. Properties of a distributed system. The dilemma of the two generals. Variants of distributed DBMS. Protocol 2PC: model, role and actions of the coordinator, preparation phase, recovery options, period of uncertainty, workflow, actions of the participant, crash-safe TC, logging, example of blocking the algorithm, performance, optimization.



Lecture 10. DCA replication, Paxos algorithms


Log Replication Task. Distributed algorithm requirement as applied to Paxos. Distributed DFA: Paxos approach. Paxos components. Problem setting mutually exclusive choice. Identification of offers. Basics, steps, scripts work Paxos. Liveness property. Multi-Paxos, its tasks. Selection of LSN for the proposal. Leader selection method. PREPARE query optimization. Client protocol Changes in the composition of the quorum.



Lecture 11. RAFT algorithm


RAFT overview. Election of the leader (possible states of the participants, the concept of a period (epoch)). Normal mode (log replication). Log format, RAFT steps with no failures, consistency of distributed log. Checks in AppendEntries. Secure leader change. Neutralization of the former leaders. Correction of discrepancies in journals, commit of current period records, records from previous periods, complete rules for commit. The protocol of the client. Changes in the composition of the quorum. Change cluster configuration. Double consensus.



Previous issues


Technopark:

Technosphere:

Subscribe to the youtube channel Technopark and Technosphere!

Source: https://habr.com/ru/post/256039/


All Articles