📜 ⬆️ ⬇️

Data Modeling and Databases for a Freshman

I, as a freshman student at Innopolis University , are often asked what we are doing here. Especially for Habr, I wrote a tale about how we sawed the DMD project.



Attention! The author does not guarantee that his jokes will be clear and funny.

We have at the University such a basic (core) course that is taught to absolutely everyone - Data Modeling and Databases (DMD). As the name implies, in this course we are taught databases: what it is, why it is needed, how to use it, and what happens else.
')
Mainly we are working with PostgreSQL DBMS, that is, we are working with a relational database. As a beginner, this topic is very interesting to me, since the database is a “repository” of information. Well, who owns the information, he owns the world. The teaching staff of this course is as follows:
  1. Primary Instructor - Qiang Qu (China),
  2. Secondary Instructor - Sadegh Nobari (Iran),
  3. Assistant Instructor - Jooyoung Lee (South Korea) and Waqas Nawaz (Pakistan),
  4. Teaching Assistant - Marat Valiev (Russia).

Basically, all the lectures are held by Qiang Qu, but sometimes the lecture is divided into two parts, where Sadegh Nobari joins it. Freshmen have seminars conducted by Marat Valiev. It is also worth noting that the lectures are held in English (even in spite of the slightly noticeable accent of the teachers, everything is perceived easily and clearly).

One sunny September day, the instructors told us the good news: “Guys, in our course you must develop a system for managing publications (articles). This will be your project, which is estimated at 15% of your final result on the subject. " The whole project is divided into five parts:
  1. Develop and implement a relational model using existing DBMS.
  2. Develop a web interface for interacting with the database created in phase 1.
  3. Develop your database and link it with the one used in the first phase.
  4. Develop a web application that in real time shows the occupied memory, CPU load and disk I / O.
  5. Creative idea

To start work on a project, you need to share on the teams from your group. Requirement: a team of no more than three people. As a result, it turned out that one person turned out to be a serious team.

So, I was lucky to work in three and we were united by the fact that none of us did such things. It is necessary, it is necessary (the course you do not want to fill up the course, and it’s interesting all the same). Freshmen were assigned the task: to complete only two phases.

For the first phase , it was necessary: ​​create an ER-model and Relational model of our future database, normalize the tables, find a website with scientific articles and extort data on these articles into our database (at least a million articles), write SQL queries to rank these articles and queries to search for similar articles. Go!

To begin with, we created the ER-model of our database (I’ll say right away that a good model came out somewhere after three or four reworkings). Then they transformed all this into a relational model and normalized it to BCNF. Everything turned out just fine. After long and painful attempts to write a parser in PHP, it was decided to start Google. As a result, a grabber was found on a python, which downloads information on articles from the site and translates it all into XML files. As a result, we received 1074 XML files, each containing 1000 entries about articles. The site from which everything was pumped out - arxiv.org. Now we need to parse all these files into our database. The DOM-parser in Java was used to process the XML file. In it, we connected to our database and sent data using the JDBC Driver, kindly provided by PostgreSQL. At the exit we got a database with 1074000 entries. Wow! Half the phase is complete. The time has come for requests, but they will be written about later.

The time has come for the second phase . The team shared on the back-end and front-end. For the front used HTML, CSS and JavaScript. The first step was to create a registration form and connect a database to it to create new users and validate the entered data by registered users. No e-mail newsletter, just a pop-up window about successful registration. For the registration form, ready-made templates for JavaScript and PHP for the connection with the base were used. Next was the creation of the main pages of our site. Initially it was assumed 5 pages, in the end, three were created - Home, Add (adding a new article) and Search (for searching articles in the database). On the last two pages, simple search forms and additions were used, diluted by CSS for beautiful things. On the main page, there are two diagrams showing the number of articles from 1996 to 2015 and the number of articles in 6 main categories. For this, we used ready-made templates for JavaScript and PHP for select from the database of necessary data. At this front-end completes its activities and puts itself in the hands of back'a.

Back is divided into 4 stages. At the first stage requests for search were written. The search is performed by keyword, category, publication date, or various combinations of the above methods. Search by keyword is a search by name, summary and category. Results are given by relevance - in the first place is a coincidence in name, in the last place - in category. For quick retrieval of the search, full-text search and a GIN index with the tsvector type were implemented. GIN was chosen because it makes a quick and efficient search in the database.

At the second stage, sorting methods were written. The default setting is sorting by relevance (using the Levenshtein algorithm). Sorting by date is also possible (user can select this type). The third stage - requests to add articles. Implemented with simple INSERTs. It seems to be simple, but it was necessary to provide one thing - protection against SQL injection. For this PHP function pg_escape_string was used for the entered data. The same method was implemented when creating the registration form.

The last stage - updating and deleting articles. Again standard SQL queries are used and protection against injections is provided.

In this two phases, well, our project is completed.
Below you can see examples of some works of first-year students.




On this story of the first experience in creating projects came to an end. I hope that I have satisfied the curiosity of many, and in the future we will have projects more abruptly, more fun and more large-scale.

Good days / nights and success in your studies / work! Go to the University, the competition for grants for the next academic year is already open at apply.innopolis.ru

Marcel Gusmanov, 1st year undergraduate student

Winter photos of the University Innopolis campus
University

Dormitories

Sport center

Source: https://habr.com/ru/post/272235/


All Articles