Ideally, every site on the search engine results page (SERP) answers a specific need of the target audience, the very need that prompted the search in the first place. The Internet has made searching for information fast and easy. It has also made it simple to analyze the opinions of people who run blogs, write books (including e-books), leave comments on social networks or forums, and vote in polls.
Today a huge number of people use search engines to solve all kinds of problems, including personal ones. This is logical. Drawing on other people's experience and on the knowledge of professionals helps you avoid common mistakes, and it also significantly increases the chance of choosing the right tactics and strategy. Unfortunately, this process is hard to automate fully, but that does not mean we cannot partially automate data collection and processing. This note is aimed at beginners. It should help anyone who wants to make their information analysis more efficient by partially automating it.
Evaluation of people's opinions
First, an example. Imagine a young man bombarding a search engine with queries such as “How do I meet a girl?”, “How do I become confident?”, “What should I talk about with a girl?”, and so on. In fact, he lacks the knowledge of applied psychology he would need to evaluate his own beliefs critically. He does not try to understand the cause of his problem and gain the experience he lacks, even though that lack is what causes his fear and uncertainty. Instead, he tries to hide the symptoms. Confidence is built on achieving results in areas that matter and on other people recognizing those results.

A figurative example: a novice driver does not need confidence training. He needs to learn to drive by studying both theory and practice, and then he will feel more confident behind the wheel. So what prevents this composite character from building a normal relationship with a girl? Wrong beliefs (he thinks he has correctly identified the cause of his problem) and the inability to evaluate them critically. Perhaps he is using the search engine only to confirm his existing opinion, without seeing the real problems.
It is important to treat an opinion as a set of several beliefs. Interestingly, a person may criticize the views of well-known scientists, views tested by many experiments and described in detail in books. Yet the same person easily believes his own most naive convictions, such as the ones his grandmother told him in childhood. The problem is that a person evaluates information by comparing it with his own standards, which are often accepted uncritically. Therefore, to evaluate your own opinion, you need to find out and consciously consider the opinions of authoritative sources.

An opinion can be formed consciously: a person gets a higher education, gains solid work experience, reads a great many quality books and talks to other professionals. That person maximizes the likelihood that his convictions on the topic are correct. Nobody can be an expert in every field, but anyone can learn to use search engines more effectively.
Where to begin? The point is not how to phrase queries for a search engine, but how to analyze the results. If the question matters to you, be sure to write down the most important thoughts. To do this, we will create an electronic summary with notes in a microblog format. It will be stored on the local computer, since it is intended solely for personal use. There are many ways to implement such a helper tool, and we will consider only a few of them. Let's try the Laravel framework, which lets us solve this problem simply and quickly.
The model will be as follows:
class Report extends Eloquent
{
    protected $table = 'report';
}
And the controller is no more difficult:
class ReportController extends BaseController
{
    public function index()
    {
        return View::make('report', ['report' => Report::paginate(5)]);
    }
}
Yes, Laravel really makes pagination this easy. And here is the view:
@extends('main')

@section('content')
    @forelse($report as $v)
        <h3>{{ $v->author }}</h3>
        <p>{{ $v->opinion }}</p>
        <p><em>{{ $v->url }}</em></p>
    @empty
        <p></p>
    @endforelse

    {{ $report->links() }}

    {{-- dd(DB::getQueryLog()) dumps the executed queries and stops the script; enable it only while debugging --}}
@stop
Is the proposed table structure (columns: id, author's name, opinion, link to the source, date) convenient for analyzing statistics? In my view, this format is very inconvenient for statistics, where a table of numbers is preferable. Let's design a table structure suited to this task.

A recorded opinion has several attributes:
- a degree of reliability (the authority of the source);
- a rating, meaning how strongly the author of the opinion supports or rejects the statement, for example on a scale from 0 to 10;
- a source type (forum post, blog post, social network comment, official document).

It is also useful to store a link to the source in the database, along with a short comment that states the gist as concisely as possible. You could also store a copy of the text, but that is up to you. Opinions about different objects can be recorded in the same table, so I will add an identifier of the object the opinion is about. The result is a data table that makes it much easier to build a variety of reports.
The structure of our table can be as follows:
CREATE TABLE IF NOT EXISTS `opinions` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `url` varchar(255) NOT NULL,
  `description` varchar(255) NOT NULL,
  `type` int(11) NOT NULL,
  `credibility` int(11) NOT NULL,
  `object` int(11) NOT NULL,
  `rate` double NOT NULL,
  `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;
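The type column stores an integer, so the application needs a fixed mapping between codes and kinds of source. Here is a minimal Java sketch that assumes the four kinds listed above. The enum name SourceType and the codes 1 to 4 are arbitrary choices made only for illustration.

```java
// Hypothetical mapping for the integer `type` column; the codes are illustrative only.
enum SourceType {
    FORUM_POST(1),
    BLOG_POST(2),
    SOCIAL_COMMENT(3),
    OFFICIAL_DOCUMENT(4);

    private final int code;

    SourceType(int code) {
        this.code = code;
    }

    // Value to write into the `type` column.
    int code() {
        return code;
    }

    // Converts a value read from the `type` column back into the enum.
    static SourceType fromCode(int code) {
        for (SourceType t : values()) {
            if (t.code == code) {
                return t;
            }
        }
        throw new IllegalArgumentException("Unknown source type code: " + code);
    }
}
```

This stores an explicit code rather than ordinal(). Rows already in the table then stay valid even if the constants are later reordered.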
Suppose we want to see how a particular fact was rated by the people whose opinions we collected. Because rate is a double, you can use fine-grained gradations. However, we want the ratings only from sources with a certain level of authority (a rough estimate of the source's reliability), not from everyone. This table structure lets us run the following query:
SELECT
  COUNT(`rate`)      AS 'count',
  MIN(`rate`)        AS 'min',
  MAX(`rate`)        AS 'max',
  SUM(`rate`)        AS 'sum',
  AVG(`rate`)        AS 'avg',
  STDDEV_POP(`rate`) AS 'stddev_pop',
  VARIANCE(`rate`)   AS 'variance'
FROM `opinions`
WHERE `object` = 1
  AND `credibility` IN (2, 3);
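Here is a minimal Java sketch of what this query computes, assuming the rows have already been loaded into memory. The names Opinion, RateSummary and OpinionStats are hypothetical and introduced only for illustration. In MySQL, VARIANCE() is the population variance (a synonym for VAR_POP()), so both the variance and the standard deviation below divide by n, not n - 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory copy of one row of the opinions table (only the columns the query uses).
final class Opinion {
    final int object;
    final int credibility;
    final double rate;

    Opinion(int object, int credibility, double rate) {
        this.object = object;
        this.credibility = credibility;
        this.rate = rate;
    }
}

// One field per column of the result set; NaN stands in for SQL NULL when nothing matches.
final class RateSummary {
    long count;
    double min = Double.NaN;
    double max = Double.NaN;
    double sum = Double.NaN;
    double avg = Double.NaN;
    double stddevPop = Double.NaN;
    double variance = Double.NaN;
}

final class OpinionStats {
    // Same filter as WHERE object = ? AND credibility IN (...), followed by the same aggregates.
    static RateSummary summarize(List<Opinion> rows, int object, Set<Integer> credibilities) {
        List<Double> rates = new ArrayList<>();
        for (Opinion o : rows) {
            if (o.object == object && credibilities.contains(o.credibility)) {
                rates.add(o.rate);
            }
        }
        RateSummary s = new RateSummary();
        s.count = rates.size();
        if (rates.isEmpty()) {
            return s; // COUNT is 0, every other aggregate stays NaN (NULL in SQL)
        }
        double sum = 0;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double r : rates) {
            sum += r;
            min = Math.min(min, r);
            max = Math.max(max, r);
        }
        double avg = sum / rates.size();
        double squares = 0;
        for (double r : rates) {
            squares += (r - avg) * (r - avg);
        }
        s.sum = sum;
        s.min = min;
        s.max = max;
        s.avg = avg;
        s.variance = squares / rates.size(); // population variance, like VARIANCE() / VAR_POP()
        s.stddevPop = Math.sqrt(s.variance); // like STDDEV_POP()
        return s;
    }
}
```

The call OpinionStats.summarize(rows, 1, new HashSet<>(Arrays.asList(2, 3))) corresponds to the query above.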
I decided to leave the code for processing statistics and the CRUD implementation out of this note. Otherwise it would turn into a Laravel tutorial. Friends, the specific implementation does not matter. You could do it just as well with other frameworks, for example Yii 2.0, which generates models and CRUD out of the box. Yii 2.0 also ships with Bootstrap, which makes it quick and easy to style your reports with ready-made styles. Add ChartJS or jqPlot to the template and you can easily build attractive charts. If you are curious about my take on the code for working with statistics, I have posted example implementations in PHP and Java on GitHub (this is just my approach, and certainly not the best one).
Descriptive statistics
In the query above (against the MySQL database) we used descriptive statistics: maximum, minimum, mean, sum, variance and standard deviation. This approach is very common in analytics systems. For example, Yandex.Metrica, Google Analytics and Piwik track general indicators such as traffic, bounce rate, average time on site, average page views per session and share of returning visitors. They also track site-specific indicators, namely custom goals and events. Some corporate systems may use an API to load data from these systems. Often not all the data is needed, only pre-processed results. For example, instead of a very large table of goal completions over a period of time, only the daily average, maximum and minimum values are transmitted.
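The idea of transmitting only pre-processed results fits in a few lines of Java. This minimal illustration assumes that raw goal-completion measurements arrive as (day, value) pairs. The Event and DailyRollup classes are hypothetical names, not part of any analytics API. Each day's values collapse into one java.util.DoubleSummaryStatistics object (JDK 8+), and its average, maximum and minimum can be sent instead of the raw rows.

```java
import java.time.LocalDate;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical raw measurement: one goal-completion value recorded on a given day.
final class Event {
    final LocalDate day;
    final double value;

    Event(LocalDate day, double value) {
        this.day = day;
        this.value = value;
    }
}

final class DailyRollup {
    // Groups raw events by day; per day only count, min, max, sum and average survive.
    static Map<LocalDate, DoubleSummaryStatistics> byDay(List<Event> events) {
        return events.stream().collect(Collectors.groupingBy(
                (Event e) -> e.day,
                TreeMap::new, // keeps the days in chronological order
                Collectors.summarizingDouble((Event e) -> e.value)));
    }
}
```

For each entry of the resulting map, getAverage(), getMax() and getMin() give exactly the three numbers mentioned above. The raw events can then be discarded.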
The same applies to every area of life. Try to evaluate everything that matters to you as precise indicators of goal achievement. It is extremely important to put knowledge into practice and to get feedback, not only from specialists but from circumstances in general. Try to estimate in concrete figures what percentage of each goal you have achieved. This helps you understand the situation much better and assess it critically. It sounds trite and everyone knows it, but try to actually start practicing. Just try, but do it consciously.
For these purposes you can use Excel or write database queries with Hibernate (sometimes even Solr via SolrJ). Here, though, we will build a Java learning project that uses the Apache Commons Mathematics Library. I generated the DescriptiveStatisticsInfo POJO class automatically (with the IDE) from the following properties:
private Double variance;
private Double standardDeviation;
private Double summ;
private Double max;
private Double min;
private Double mean;
private Long count;
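The generated class itself is not shown. A plausible minimal version might look like the sketch below. It assumes the constructor takes the values in the order DescriptiveStatisticsReport passes them (variance, standard deviation, sum, max, min, mean, count). It also assumes toString() produces the text that ends up in the output file. This is a hypothetical reconstruction: an IDE-generated POJO would usually also include setters, which are omitted here.

```java
// Hypothetical reconstruction of the generated POJO. In the project it would be a public class
// in package kalinin.example.help.statistics; both are omitted so the sketch compiles standalone.
final class DescriptiveStatisticsInfo {
    private final Double variance;
    private final Double standardDeviation;
    private final Double summ; // field name kept as in the property list above
    private final Double max;
    private final Double min;
    private final Double mean;
    private final Long count;

    // Argument order matches the call in DescriptiveStatisticsReport.statInfo()
    public DescriptiveStatisticsInfo(Double variance, Double standardDeviation, Double summ,
                                     Double max, Double min, Double mean, Long count) {
        this.variance = variance;
        this.standardDeviation = standardDeviation;
        this.summ = summ;
        this.max = max;
        this.min = min;
        this.mean = mean;
        this.count = count;
    }

    public Double getVariance() { return variance; }
    public Double getStandardDeviation() { return standardDeviation; }
    public Double getSumm() { return summ; }
    public Double getMax() { return max; }
    public Double getMin() { return min; }
    public Double getMean() { return mean; }
    public Long getCount() { return count; }

    // This text is what Run writes to the output file.
    @Override
    public String toString() {
        return "DescriptiveStatisticsInfo [variance=" + variance
                + ", standardDeviation=" + standardDeviation
                + ", summ=" + summ + ", max=" + max + ", min=" + min
                + ", mean=" + mean + ", count=" + count + "]";
    }
}
```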
Next, I created an interface with a single method:
package kalinin.example.report;

import java.util.List;

import kalinin.example.help.statistics.DescriptiveStatisticsInfo;

public interface IReport {
    DescriptiveStatisticsInfo statInfo(List<Double> in);
}
The factory needs an enum, which for now has a single value:
package kalinin.example.report;

public enum EReport {
    DESCRIPTIVE
}
The factory itself looks like this:
package kalinin.example.report;

public class ReportFactory {

    public IReport getReport(EReport reportType) {
        switch (reportType) {
            case DESCRIPTIVE:
                return new DescriptiveStatisticsReport();
            default:
                return null;
        }
    }
}
The report itself uses the Apache Commons Mathematics Library, a free and widely used library:
package kalinin.example.report;

import java.util.List;

import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;

import kalinin.example.help.statistics.DescriptiveStatisticsInfo;

public class DescriptiveStatisticsReport implements IReport {

    private DescriptiveStatistics stats = new DescriptiveStatistics();

    @Override
    public DescriptiveStatisticsInfo statInfo(List<Double> in) {
        this.stats.clear();
        for (Double iStatistics : in) {
            this.stats.addValue(iStatistics);
        }
        DescriptiveStatisticsInfo stat = new DescriptiveStatisticsInfo(
                this.stats.getVariance(),
                this.stats.getStandardDeviation(),
                this.stats.getSum(),
                this.stats.getMax(),
                this.stats.getMin(),
                this.stats.getMean(),
                this.stats.getN());
        return stat;
    }
}
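Check one detail before comparing these numbers with the MySQL query in the previous section. In Commons Math, DescriptiveStatistics.getVariance() returns the bias-corrected sample variance (dividing by n - 1), and getStandardDeviation() is its square root. MySQL's VARIANCE() and STDDEV_POP() divide by n instead. The plain-Java sketch below computes both values on the data set used in the Run class, so the difference is visible. The class name VarianceKinds is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class VarianceKinds {

    private static double mean(List<Double> xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.size();
    }

    // Sum of squared deviations from the mean.
    private static double squaredDeviations(List<Double> xs) {
        double m = mean(xs);
        double acc = 0;
        for (double x : xs) {
            acc += (x - m) * (x - m);
        }
        return acc;
    }

    // Divides by n: what MySQL VARIANCE() / VAR_POP() computes.
    static double populationVariance(List<Double> xs) {
        return squaredDeviations(xs) / xs.size();
    }

    // Divides by n - 1: what DescriptiveStatistics.getVariance() returns (assumes at least two values).
    static double sampleVariance(List<Double> xs) {
        return squaredDeviations(xs) / (xs.size() - 1);
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(12.0, 8.4, 6.1, 3.5, 9.154);
        System.out.println("population variance: " + populationVariance(data));
        System.out.println("sample variance:     " + sampleVariance(data));
    }
}
```

For these five values the two results differ by a factor of 5/4, which is n / (n - 1). Keep that in mind when a Java report and a SQL report are supposed to agree.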
In effect, we have selected only the data that matters most for decision-making. If you export all the statistics from a large number of sites, analysts will simply not have enough time to read it all. Descriptive statistics let us build a summary table on which decisions can be based. To make a decision, we do not need the whole giant ArrayList of numbers (Java stores references to Double objects there, not the values themselves). So instead of storing all this data, and there can be a great deal of it, we pass into the system only the necessary descriptions of this finite set. Is this relevant only to websites? To check that everything works correctly, I simply wrote the result to a file using a simple class:
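You can take the point about not keeping the whole list one step further. DescriptiveStatistics stores every value added to it, which is what allows it to compute percentiles. Commons Math's SummaryStatistics, by contrast, computes its results without storing the values. The underlying idea is an online (single-pass) update such as Welford's algorithm, sketched below in plain Java. RunningStats is a hypothetical name. The sketch illustrates the technique and does not claim to show how Commons Math implements it internally.

```java
// Welford's online algorithm: one pass, constant memory, no stored values.
final class RunningStats {
    private long n;
    private double mean;
    private double m2; // running sum of squared deviations from the current mean
    private double sum;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean); // second factor uses the already updated mean
        sum += x;
        min = Math.min(min, x);
        max = Math.max(max, x);
    }

    long count() { return n; }
    double sum() { return sum; }
    double mean() { return n > 0 ? mean : Double.NaN; }
    double min() { return n > 0 ? min : Double.NaN; }
    double max() { return n > 0 ? max : Double.NaN; }

    // Divides by n - 1, like DescriptiveStatistics.getVariance(); 0 for one value, NaN for none.
    double sampleVariance() {
        return n > 1 ? m2 / (n - 1) : (n == 1 ? 0.0 : Double.NaN);
    }

    // Divides by n, like MySQL VARIANCE() / VAR_POP().
    double populationVariance() {
        return n > 0 ? m2 / n : Double.NaN;
    }
}
```

Values can be fed one at a time with add(x), for example while reading rows from a database cursor. Only the resulting handful of numbers needs to be kept.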
package kalinin.example.run;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.io.FileUtils;
import org.apache.log4j.Logger;

import kalinin.example.help.statistics.DescriptiveStatisticsInfo;
import kalinin.example.report.EReport;
import kalinin.example.report.ReportFactory;

public class Run {

    final static Logger logger = Logger.getLogger(Run.class);

    public static void main(String[] args) {
        List<Double> data = new ArrayList<Double>();
        data.add(12.0);
        data.add(8.4);
        data.add(6.1);
        data.add(3.5);
        data.add(9.154);

        DescriptiveStatisticsInfo result =
                new ReportFactory().getReport(EReport.DESCRIPTIVE).statInfo(data);

        try {
            FileUtils.writeStringToFile(new File(Config.getConf("Config.fileName")), result.toString(), "UTF-8");
        } catch (IOException e) {
            logger.error(e);
        }
    }
}
There is also a heuristic algorithm for analyzing opinions. At first glance everything is simple: there is a feature (a sign) and its degree of reliability, both determined empirically. The hardest part, however, is identifying these features and their weights. After collecting the data, we sum the “validity weight” of every matching feature. For this we need an empirically derived scale against which we assess reliability. A single object can have any positive number of features. As a result, we get a finite set (an array) whose elements are the weights of the features that matched during the check. If the sum of all elements of this set exceeds an empirically determined threshold, the heuristic algorithm gives a positive answer. In practice, heuristics are hard to apply, because you have to know the features, their weights and how to detect them very precisely.
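The heuristic described above reduces to a weighted sum compared against a threshold. Here is a minimal Java sketch. It assumes each feature can be expressed as a predicate over a piece of text, and that the weights and the threshold have already been found empirically. Feature and HeuristicScorer are hypothetical names. The features, weights and threshold in main are invented purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// One empirically identified feature ("sign"), its weight and a way to detect it.
final class Feature {
    final String name;
    final double weight;
    final Predicate<String> detector;

    Feature(String name, double weight, Predicate<String> detector) {
        this.name = name;
        this.weight = weight;
        this.detector = detector;
    }
}

final class HeuristicScorer {
    private final List<Feature> features;
    private final double threshold;

    HeuristicScorer(List<Feature> features, double threshold) {
        this.features = features;
        this.threshold = threshold;
    }

    // The finite set from the text: weights of all features that matched.
    List<Double> matchedWeights(String text) {
        List<Double> weights = new ArrayList<>();
        for (Feature f : features) {
            if (f.detector.test(text)) {
                weights.add(f.weight);
            }
        }
        return weights;
    }

    // Positive answer when the summed weights exceed the empirical threshold.
    boolean isPositive(String text) {
        double total = 0;
        for (double w : matchedWeights(text)) {
            total += w;
        }
        return total > threshold;
    }

    public static void main(String[] args) {
        // Invented features, weights and threshold, for illustration only.
        HeuristicScorer scorer = new HeuristicScorer(Arrays.asList(
                new Feature("links to a source", 0.5, t -> t.contains("http")),
                new Feature("contains figures", 0.3, t -> t.matches(".*\\d.*"))),
                0.6);
        System.out.println(scorer.isPositive("See http://example.com: 42% agree")); // prints true
    }
}
```

The real difficulty stays where the text puts it. Choosing the detectors, their weights and the threshold is empirical work that no code can do for you.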
Another banal and very simple fact: the human brain needs not only to receive information but also to systematize it as an outline or a table. Next, for every opinion in this table that matters to you, look for evidence that confirms it and evidence that refutes it. A person may overlook quite trivial things, and a conscious analysis with concrete numbers will draw his attention to many important ones. Of course, a person keeps developing throughout life, and tomorrow he may hold a completely different opinion. Even this note will be read differently at each stage of development. It is very dangerous to be certain about everything and to stop looking for quality information on important issues in any area of life. The main thing is to start practicing in small steps, just as training starts with light weights, without hoping to become a great bodybuilder in a week. And if you have at least basic programming and database skills, this kind of information analysis will be not only useful but also fascinating.