I took the liberty of translating an interesting article from The New York Times.
After graduating from Harvard with a degree in archeology and anthropology, Carrie Grimes studied Maya settlement patterns and mapped the places where artifacts had been found. But she became fascinated by what she called “all these mathematical and computer things” that were part of the work.
“People think of archeology as what Indiana Jones did, but in fact most of the work is data analysis,” says Ms. Grimes.
Now Ms. Grimes does a different kind of “digging.” She works at Google, where she performs statistical analysis of huge amounts of data to find ways to improve Google's search engine.
Ms. Grimes is a statistician of the Internet generation, one of many who are changing the image of a profession once considered a refuge for dull number-crunchers. Statisticians now face growing demand for their services.
“I keep saying that the most attractive profession of the next ten years will be statistics,” says Hal Varian, Google's chief economist. “And I'm not kidding!”
The rising status of statisticians, who can earn $125,000 a year at leading companies right after completing a doctorate, is a consequence of the explosive growth in data volumes. Computing and the Internet have opened up entirely new sources of data to analyze: sensor readings, surveillance camera footage, social network correspondence and much more. Digital data shows no sign of slowing down: according to a study by IDC, its volume will increase fivefold by 2012.
Data is only the raw material from which knowledge is extracted. “We are rapidly moving towards a world where everything is measured and recorded,” says Erik Brynjolfsson, an economist and director of the MIT Center for Digital Business. “But the hard part remains people's ability to use, analyze and extract something meaningful from the data.”
The new generation of statisticians is tackling this problem energetically. They use powerful computers and sophisticated mathematical models to search for meaningful patterns in large data stores. The applications are remarkably diverse, from improving Internet search and online advertising to cancer treatment and optimizing food delivery.
Even the recently concluded Netflix Prize, which offered one million dollars to anyone who could significantly improve the company's system of movie recommendations, was in essence a contest between modern statistical methods.
Yet statisticians are only a small fraction of the many experts who use statistics to analyze data. Computational and numerical skills matter more than one might think, which is why the new data analysts come from fields such as economics, computer science and mathematics.
Data analysts are in great demand in the White House today. “Clean, reliable data is the first step towards coordinating our long-term economic policy and key policy priorities,” said Peter Orszag, director of the Office of Management and Budget, in a May speech. Later that day, Mr. Orszag admitted on his blog that a speech about the importance of statistics was “close to my (admittedly wonky) heart.”
IBM, seeing the future in data analysis, created its Business Analytics and Optimization Services division in April. The division will draw on more than 200 mathematicians, statisticians and other analysts from IBM's research laboratories, but that is not enough: IBM plans to recruit and retrain 4,000 more analysts from among its employees.
Another sign of the growing activity in this field: about 6,400 people attended a professional statistics conference in Washington this week, compared with about 5,400 in previous years, according to the American Statistical Association. The participants, men and women, young and already graying, looked like any other crowd of tourists in the capital. But their animated conversations were about randomness, parameters, regression and clustering. The data boom is elevating a profession that traditionally handled less visible and less lucrative work, such as calculating life insurance rates.
Ms. Grimes, 32, received a degree in statistics from Stanford in 2003 and joined Google the same year. She is now one of many statisticians in a group of 250 data analysts, and she uses statistical modeling to help improve Google's search technology.
For example, Ms. Grimes worked on an algorithm that fine-tunes Google's web crawler. Her model made the crawler more likely to revisit frequently updated pages and less likely to waste visits on pages that rarely change.
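The article does not describe Google's actual model. Purely as a hedged illustration of the general idea, the sketch below assumes that each page changes as a Poisson process and that every visit reveals only whether the page changed since the previous visit. The helper names `estimate_change_rate` and `revisit_interval` and the clamping bounds are hypothetical. The rule "wait about one expected change between visits" is a simple heuristic, not a known-optimal crawl policy.

```python
import math

def estimate_change_rate(changes_seen: int, visits: int, interval_hours: float) -> float:
    """Hypothetical helper: estimate a page's change rate (changes per hour),
    assuming Poisson-distributed changes and visits every `interval_hours`
    that only reveal whether the page changed since the last visit."""
    if visits <= 0:
        raise ValueError("need at least one visit")
    # Fraction of visits that found a change, capped below 1 so the log stays finite.
    p = min(changes_seen / visits, (visits - 0.5) / visits)
    # Under the Poisson assumption, P(change detected in one interval) = 1 - exp(-rate * interval).
    return -math.log(1.0 - p) / interval_hours

def revisit_interval(rate_per_hour: float, min_hours: float = 1.0,
                     max_hours: float = 24.0 * 30) -> float:
    """Hypothetical heuristic: wait roughly one expected change between visits,
    clamped so no page is crawled too often or forgotten entirely."""
    if rate_per_hour <= 0:
        return max_hours  # never seen to change: fall back to the longest interval
    return max(min_hours, min(max_hours, 1.0 / rate_per_hour))
```

With daily visits, a page that changed on all 10 of its last 10 visits gets a revisit interval of about 8 hours, one that changed on 1 of 10 visits gets about 228 hours, and one that never changed falls back to the 30-day cap.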
The goal, according to Ms. Grimes, is to achieve small gains in computational efficiency. “Increasing efficiency by a percent or two can have a huge effect when the operation is repeated millions and billions of times, as it is at Google,” she adds.
The sheer amount of data on the Web is also opening up a new world of research. Traditionally, the social sciences observed behavior through interviews and surveys. “But the Web provides this wonderful opportunity to observe how millions of people behave,” says Jon Kleinberg, a social networking specialist at Cornell.
For example, in a just-published study, Mr. Kleinberg and two colleagues traced the flow of ideas across the Web. They tracked 1.6 million news sites and blogs during the 2008 presidential campaign, using algorithms that detected and followed news-related phrases.
The Cornell researchers found that, in general, traditional news outlets lead and blogs follow, usually with a lag of about two and a half hours. But a few blogs were the quickest to pick up quotes that later spread widely.
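The article does not explain how the lag was measured, so the following is only a hedged sketch of one simple way to frame it. It assumes that every mention of a tracked phrase is time-stamped in hours and labeled as coming from either a news site or a blog. For each phrase, the lag is the gap between the hour when blog mentions peak and the hour when news mentions peak. The sketch then takes the median across phrases. The function names and the input format are hypothetical and are not the researchers' method.

```python
from collections import Counter
from statistics import median

def peak_hour(mention_hours):
    """Hour bucket with the most mentions of a phrase (ties go to the earliest hour)."""
    counts = Counter(int(h) for h in mention_hours)
    best = max(counts.values())
    return min(h for h, c in counts.items() if c == best)

def median_lag_hours(phrases):
    """phrases maps phrase -> (news_mention_hours, blog_mention_hours).
    A positive lag means blogs peaked after news outlets.
    Phrases missing either source are skipped; raises StatisticsError if none remain."""
    lags = [peak_hour(blogs) - peak_hour(news)
            for news, blogs in phrases.values() if news and blogs]
    return median(lags)
```

Taking the median rather than the mean keeps a handful of phrases with extreme lags from dominating the summary.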
Experts warn that the Web's huge troves of data also carry risks. Their sheer volume can overwhelm statistical models. And researchers caution that a strong correlation between two variables does not necessarily mean that one causes the other.
For example, in the late 1940s, before the polio vaccine was invented, public health experts in America noticed that polio cases rose as people consumed more ice cream and soft drinks, according to a historian and statistician at George Washington University. Some even recommended an anti-polio diet that eliminated these treats. It later turned out that polio outbreaks were simply more common in the hot summer months, when people naturally ate more ice cream.
The data “explosion” is also spurring new research in statistics itself and opening up new frontiers.
“The key is to let computers do what they are good at, which is combing these huge data sets for anything that looks mathematically strange,” says Daniel Gruhl, an IBM researcher whose latest work involves analyzing medical data to improve the quality of care. “And that leaves people to do what they do best: interpret those anomalies.”
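The division of labor Mr. Gruhl describes can be sketched in its simplest form: the computer flags values that are statistically odd, and a person investigates them. The article does not say which technique IBM uses. This hedged sketch swaps in the modified z-score, which measures distance from the median in units of the median absolute deviation (MAD) and uses the conventional 3.5 cutoff. It is more robust than a plain mean and standard deviation, because the outliers being hunted distort the median far less. The function name `flag_anomalies` is hypothetical.

```python
from statistics import median

def flag_anomalies(values, threshold=3.5):
    """Hypothetical helper: return indices of values whose modified z-score
    (0.6745 * |x - median| / MAD) exceeds `threshold`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values are identical: flag anything that differs from them.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]

readings = [5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 12.0, 5.1]
print(flag_anomalies(readings))  # [6]: only 12.0 is handed to a human to explain
```

The code only says that the reading at index 6 is unusual. Explaining why it is unusual remains the human's job.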