VARCHAR, CHAR, STRING, INT, SMALLINT, BIGINT, NUMERIC, FLOAT, DOUBLE, DATE, TIME, DATETIME, TIMESTAMP.
ON DUPLICATE KEY UPDATE "count"="count"+1
for all insert requests. I did not use hashing, since tests showed it gave no benefit (most likely CUBRID already hashes string keys internally). Multithreading would not help either: at any given moment only one process can modify the data in the table while the others wait (the lock can be removed, but then data integrity is broken).
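As a minimal illustration of this counting pattern (a sketch only; $someValue is a hypothetical placeholder, while $conn and $tempTableName refer to the init function shown below):

// Insert a value with count=1; if it is already in the temporary table,
// the ON DUPLICATE KEY UPDATE clause increments its counter instead.
$q = 'INSERT INTO '.$tempTableName.' ("count","val") '.
     "VALUES (1, '".cubrid_real_escape_string($someValue)."') ".
     'ON DUPLICATE KEY UPDATE "count"="count"+1';
cubrid_execute($conn, $q);

The overall scheme of the solution: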
init();
foreach (list_tables() as $table) {
    process_table($table);
}
save_result();
init is a function that creates a temporary table:
global $tempTableName, $conn;
$tempTableName = '"temp'.rand(0, 32767).'"';
$q = 'CREATE TABLE '.$tempTableName.' (
    "val" character varying(255) NOT NULL,
    "count" bigint DEFAULT 1 NOT NULL,
    CONSTRAINT pk_temp_hash_val PRIMARY KEY ("val")
)';
cubrid_execute($conn, $q);
list_tables is a function that returns the list of tables in the current database.
The obvious candidate is SHOW TABLES, which, however, is not supported by version 8.3.1 (alas, the documentation says little about getting a list of tables in the database), so we go to the CUBRID WebQuery project page and study its code. For the most impatient, here is the required SQL query right away:
select class_name as table_name
from db_class
where class_type='CLASS' and is_system_class='NO' and class_name!='results'
order by table_name asc;
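A possible list_tables wrapper around this query might look like the sketch below (the use of a global $conn and the exact key returned by cubrid_fetch_assoc are assumptions; only standard functions of the cubrid PHP extension are used):

// Sketch: run the catalog query and collect the table names.
function list_tables() {
    global $conn;
    $q = "select class_name as table_name from db_class ".
         "where class_type='CLASS' and is_system_class='NO' ".
         "and class_name!='results' order by table_name asc";
    $req = cubrid_execute($conn, $q);
    $tables = array();
    while ($row = cubrid_fetch_assoc($req)) {
        $tables[] = $row['table_name'];
    }
    cubrid_close_request($req);
    return $tables;
}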
One could also use the cubrid_schema function, but there you additionally have to filter by class type and do some extra processing, roughly as in the sketch below.
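A sketch only: it assumes that cubrid_schema with CUBRID_SCH_CLASS returns rows with NAME and TYPE fields, and that TYPE lets you tell ordinary classes from system and virtual ones; check the driver documentation for the exact codes.

// Sketch: list user tables via cubrid_schema instead of querying db_class.
function list_tables_via_schema() {
    global $conn;
    $tables = array();
    foreach (cubrid_schema($conn, CUBRID_SCH_CLASS) as $row) {
        // Keep only ordinary classes and skip the "results" table,
        // mirroring the filters of the SQL variant above.
        if ($row['TYPE'] == 2 && $row['NAME'] !== 'results') {
            $tables[] = $row['NAME'];
        }
    }
    return $tables;
}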
process_table is the most interesting function in the whole solution: it is the one that actually gathers the statistics. Possible approaches:
- select * combined with setFetchSize/cubrid_unbuffered_query;
- select * plus a condition that calls a Java stored function, something like
  where "int"<0 and counter("int") or length(translate("str",'0123456789',''))>0 and counter("str")
  where counter saves the value in the database and returns false;
- insert from select ... ON DUPLICATE KEY UPDATE "count"="count"+1 (the variant used below).
For the integer type (or a decimal without a fractional part) it is enough to check the sign; for strings, the presence of a non-digit character. The type aliases and per-type handlers:
static $aliases = array(
    'STRING'           => 'VARCHAR',
    'CHAR'             => 'VARCHAR',
    'INT'              => 'INTEGER',
    'SHORT'            => 'INTEGER',
    'SMALLINT'         => 'INTEGER',
    'BIGINT'           => 'INTEGER',
    'DECIMAL'          => 'NUMERIC',
    'DEC'              => 'NUMERIC',
    'REAL'             => 'FLOAT',
    'DOUBLE PRECISION' => 'DOUBLE',
);

// column process criteria and/or format
static $handlers = array(
    'VARCHAR' => array(
        'select'   => 'cast(%s as varchar(255))',
        'criteria' => "length(translate(%s,'0123456789',''))>0",
    ),
    'INTEGER' => array(
        'select'   => 'cast(%s as varchar(255))',
        'criteria' => '%s<0',
    ),
    'FLOAT' => array(
        'select' => "to_char(%s,'9.999999EEee')",
    ),
    'DOUBLE' => array(
        'select' => "to_char(%s,'9.9999999999999999EEee')",
    ),
    'DATE' => array(
        'select' => "TO_CHAR(%s,'YYYY-MM-DD')",
    ),
    'TIME' => array(
        'select' => "TO_CHAR(%s,'HH24:MI:SS')",
    ),
    'DATETIME' => array(
        'select' => "to_char(%s,'YYYY-MM-DD HH24:MI:SS.FF')",
    ),
    'TIMESTAMP' => array(
        'select' => "to_char(%s,'YYYY-MM-DD HH24:MI:SS')",
    ),
    'DEFAULT' => array(
        'select' => 'cast(%s as varchar(255))',
    ),
);
foreach (get_columns($q) as $column) {
    //echo "\tProcess column:".$column['column']."\n";
    // Resolve type aliases (e.g. STRING -> VARCHAR, SHORT -> INTEGER)
    while (isset(self::$aliases[$column['type']]))
        $column['type'] = self::$aliases[$column['type']];
    // If the column is decimal with scale 0 (no fractional part) treat it as integer
    if ($column['type'] === 'NUMERIC' && $column['scale'] === 0)
        $column['type'] = 'INTEGER';
    $handler  = isset(self::$handlers[$column['type']])
        ? self::$handlers[$column['type']]
        : self::$handlers['DEFAULT'];
    $toSelect = isset($handler['select'])   ? $handler['select']   : '%s';
    $criteria = isset($handler['criteria']) ? $handler['criteria'] : '1=1';
    // Depending on the column type build the appropriate criteria
    $q = 'insert into '.$tempTableName.' ("count","val") '.
         'SELECT 1,'.$toSelect.' '.
         'FROM `'.$qtable.'` '.
         'where %s is not null and '.$criteria.' '.
         'ON DUPLICATE KEY UPDATE "count"="count"+1';
    $q = str_replace('%s', '`'.$this->escape($column['column']).'`', $q);
    cubrid_execute($conn, $q);
}
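get_columns is not shown in the snippets above; a possible sketch is below (the helper name, the returned column/type/scale keys, and the exact format of the strings returned by cubrid_column_types are assumptions, so adjust the parsing to whatever your driver version actually reports):

// Sketch: read column names and types from a request resource and normalize
// them into the array('column'=>..., 'type'=>..., 'scale'=>...) shape used above.
function get_columns($req) {
    $columns = array();
    $names = cubrid_column_names($req);
    $types = cubrid_column_types($req);
    foreach ($names as $i => $name) {
        $type  = strtoupper($types[$i]);   // e.g. "NUMERIC(10,2)" or "VARCHAR(255)"
        $scale = 0;
        // Pull the scale out of a "numeric(precision,scale)" declaration, if any
        if (preg_match('/^NUMERIC\((\d+),(\d+)\)/', $type, $m)) {
            $scale = (int)$m[2];
        }
        // Strip the length/precision part, keeping just the type name
        $type = preg_replace('/\(.*$/', '', $type);
        $columns[] = array('column' => $name, 'type' => $type, 'scale' => $scale);
    }
    return $columns;
}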
save_result saves the result and deletes the temporary table:
'replace into results ("userid","most_duplicated_value","total_occurrences") '.
'select \''.cubrid_real_escape_string($userid).'\',"val","count" '.
'from '.$tempTableName.' '.
'order by "count" desc limit 1'
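Deleting the temporary table then comes down to a single statement (a sketch, reusing $conn and $tempTableName from init):

// Drop the temporary table once the result row has been written.
// $tempTableName already contains the surrounding double quotes (see init above).
cubrid_execute($conn, 'DROP TABLE '.$tempTableName);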
replace is used because during testing the code is run many times, and with a plain insert the same set of verification data would cause an error.

Source: https://habr.com/ru/post/123727/