📜 ⬆️ ⬇️

Echoes of the past - getting data from framework database 3

In this article I want to tell you about the situation that recently happened to me and the accepted way of its solution. I'm not a professional programmer, but a little experience helped me solve this problem.

Immediately I would like to draw your attention that this article will not go about the “popular framework” for any programming language, it will be about the old “framework” for ms dos and parsing the format in which it stores data with further conversion of this data to one of the modern database formats.

Pre story
I am a student in one of the city universities, studying in the specialty “Aquatic biological resources and mariculture” and, as I said before, I am not a professional programmer, however, starting the practice, which should result in graduation work - the programming skill did not stand aside.

One of the tasks of my work was the analysis of a certain number of morphometric parameters of fish. However, as you may have guessed, the original data was in handwritten form. For the figurative representation of data size, these are 40 parameters for each specimen (fish, and for a technician, 40 columns in the database) and the sample size is 188 fish (and as a result, 188 lines with 40 columns filled).
')
It would not be the most rational decision to enter this data into any database manually. Inquiring from a practitioner’s mentor who analyzed this data in 1994 with the question “Is there an electronic version of this data?” A positive answer was received. After which they were successfully copied from a 5-inch floppy disk and taken home for analysis. It seemed that success was close and a large amount of button work bypassed me. So it was, but not quite.

Data analysis
Having opened the copied data from a floppy disk, the first thing that caught my eye was the file extension (.FW3). I haven’t met with such a format yet, and naturally I turned firstly to my favorite search engines for advice. Almost immediately it became clear that this is the format of the framework 3rd version for OS dos, the latest version of this software product is 10. After reading the site of the creator of this product, it turned out that it would not be easy to get it (it is paid). Search by torrent trackers and other sources of this software also did not give anything, after which it was decided to do an analysis of the format of the files received.

For each table in this format there were 3 files (let’s say that the table name is table):


Opening several index files (table_I.FW3) showed a definite pattern:

now we could safely find the beginning and end of the columns for parsing.

A similar situation was repeated with the row repository:

however, here the separator indicated not only the beginning and end of table a and the beginning and end of the line.

Even to an unskilled programmer, the presence of such patterns clearly indicates that the data can be programmatically converted to any desired format. In my case, mysql was chosen as the data warehouse, since I had sufficient experience with it.

Data conversion
At hand there was a bunch of wamp, and therefore php was chosen for data mining (perhaps this is not the best solution, however, the functionality of the language is quite enough for this task).

I think there is no need to describe the implementation of the 200strok class for the implementation of this task. I give the source code below, maybe it will help someone when converting such files.
Parser class
$config = array( 'url' => 'http://localhost', 'db_host' => 'localhost', 'db_user' => 'mysql', 'db_pass' => 'mysql', 'db_name' => 'fish' ); class dataMiner { private $database = null; private $file_byte_column = null; private $file_byte_table = null; private $filename = null; //    private $string_column_array = array(); private $string_table_array = array(); //     private $column_byte_point = array(0, 129, 1, 255, 129, 5, 0); private $row_start_point = array(0, 129, 1, 255, 129, 8, 0); private $row_string_point = null; //    private function db() { global $config; if($this->database == null) { try { $this->database = new PDO("mysql:host={$config['db_host']};dbname={$config['db_name']}", $config['db_user'], $config['db_pass']); $this->database->setAttribute(PDO::MYSQL_ATTR_INIT_COMMAND, "SET NAMES utf8"); } catch(PDOException $e) { exit("Database connection error ".$e); } } return $this->database; } /** **   */ public function open($file_fw3_name) { $this->filename = $file_fw3_name; $this->file_byte_column = file_get_contents('db/'.$file_fw3_name.'_I.FW3'); $this->file_byte_table = file_get_contents('db/'.$file_fw3_name.'_T.FW3'); return $this; } //         //    //    $this->string_column_array //         public function makeReadable() { //          $point_string = $this->byteToString($this->column_byte_point); $start_point = strpos($this->file_byte_column, $point_string); $end_point = strpos($this->file_byte_column, $point_string, $start_point+1); $length = $end_point - $start_point; $column_string = substr($this->file_byte_column, $start_point+strlen($point_string), $length-strlen($point_string)); $column_array = explode(" ", $column_string); foreach($column_array as $value) { if(strlen($value) > 0) { $this->string_column_array[] = $value; } } //           $this->row_string_point = $this->byteToString($this->row_start_point); $table_start = strpos($this->file_byte_table, $this->row_string_point); $table_end = $this->findEndPoint(); $table_length = $table_end-$table_start; $row_string = substr($this->file_byte_table, $table_start, $table_length); $line_row = explode($this->row_string_point, $row_string); $i_j = 1; //    for($i=1;$i<=sizeof($line_row);$i++) -     foreach($line_row as $single_row) { if(strlen($single_row) > 0) { $item_array = explode(" ", $single_row); foreach($item_array as $field) { if(strlen($field) > 0) $this->string_table_array[$i_j][] = $field; } $i_j++; } } } /** **       */ private function findEndPoint($start = 0, $before_starter = 0) { //    ? if($start == 0) { $start = strpos($this->file_byte_table, $this->row_string_point); } if(FALSE === ($end = strpos($this->file_byte_table, $this->row_string_point, $start+1))) { return $before_starter; } return $this->findEndPoint($end+1, $start); } /** **       */ public function byteToString($data) { return call_user_func_array("pack", array_merge(array("C*"), $data)); } public function storeDb($table) { // ..        prepared statement - ,    //    -     )) $column_size = sizeof($this->string_column_array); $create_query = "CREATE TABLE IF NOT EXISTS `{$table}` ( `table` VARCHAR( 24 ) NOT NULL ,"; $insert_query = "INSERT INTO `{$table}` (`table`, "; for($i=0;$i<sizeof($this->string_column_array);$i++) { if($i==(sizeof($this->string_column_array)-1)) { $create_query .= "`{$this->string_column_array[$i]}` decimal(24,2) NOT NULL DEFAULT '0.00'"; $insert_query .= "`{$this->string_column_array[$i]}`"; } else { $create_query .= "`{$this->string_column_array[$i]}` decimal(24,2) NOT NULL DEFAULT '0.00' , "; $insert_query .= "`{$this->string_column_array[$i]}`, "; } } $create_query .= " ) ENGINE = MYISAM ;"; $insert_query .= ") VALUES "; $s = 1; foreach($this->string_table_array as $row_array) { $field_size = sizeof($row_array); $insert_query .= "( '{$this->filename}', "; foreach($row_array as $field) { $insert_query .= "'{$field}', "; } $null_diff = $column_size-$field_size; for($i=1;$i<=$null_diff;$i++) { if($i==$null_diff) { $insert_query .= "'0.00'"; } else { $insert_query .= "'0.00', "; } } if($s == sizeof($this->string_table_array)) { $insert_query .= " );"; } else { $insert_query .= " ), "; } $s++; } $this->db()->query($create_query); $this->db()->query($insert_query); } public function clean() { $this->file_byte_column = null; $this->file_byte_table = null; $this->string_column_array = array(); $this->string_table_array = array(); } } 

An example of using the class to parse data into the mysql database from FW3 (source table FW3 M1 _ *. FW3, target table in mysql bio_trachurus):
 //   $miner = new dataMiner; //     fw3       $miner->open('M1')->makeReadable(); //              $miner->storeDb('bio_trachurus'); 


results
As a result, I successfully received the observations I needed for analysis in a convenient format for me, removing myself from wasting time on manual entry.



Source codes published on github with an example.

Ps - do not scold for the quality of the code, it was written only to perform one task (in my case).

Ppt - trachurus has no relation to the possible train of thought of some users. Refine on Wiki

Source: https://habr.com/ru/post/180087/


All Articles