Review of the most interesting materials on high performance (September 15-21, 2014)

This is the first issue of a review of the most interesting materials on high performance. While preparing the next issue of my review of materials on data analysis and machine learning, I realized that the collected high-performance materials form a self-contained topic of their own. I hope this type of review will also be useful and interesting. I will try to expand the list of resources I follow when preparing these reviews.

High Performance Materials

Using Apache Samza at LinkedIn
An article from the LinkedIn blog about how they use Apache Samza in their applications and how it helped them solve problems when working with data.
Who uses Hadoop and how
An interesting article about the current state of affairs in the Hadoop ecosystem: who uses it and how, as well as the prospects for development.
Upcoming meetings on Data Science in Moscow
Several interesting meetings are planned for the near future, so I decided to publish a short list of upcoming meetings on data analysis and high performance in Moscow.
New type of aggregation in Elasticsearch
An article from the Elasticsearch blog about top_hits, a new aggregation that joined the already long list of aggregations in version 1.3.0.
New version of Apache Tez
A short article from the Hortonworks blog about the capabilities of the new Apache Tez 0.5 release.
SQL queries to Hadoop using Apache Drill
A short article about Apache Drill, which lets you query Hadoop data using SQL syntax.
Study of the impact of multi-user load on Cloudera Impala
An article from the Cloudera blog presenting the results of a study of Cloudera Impala under various load profiles.
Top 10 SlideShare Presentations on Data Science and Big Data
An article listing the 10 most-viewed SlideShare presentations on Data Science and Big Data.
Disk Space Usage in MongoDB
A short article that will help you better understand how the MongoDB NoSQL database uses disk space.
Weak isolation is a serious problem
Interesting thoughts about database isolation levels.
10 lessons from Microsoft Azure
A very interesting post in which the authors draw on their own experience to give 10 useful recommendations for scaling an application properly in the Microsoft Azure cloud.
Using Redis at Twitter
An interesting video in which Yao Yu talks about using Redis at Twitter for scaling. The linked article contains excellent material based on the presentation.
KDD 2014: Google KV and Topic Modeling
The authors of the URX company blog share their impressions of the recent KDD 2014 conference in New York. They discuss Google Knowledge Vault, a system that Google actively uses to improve search quality, as well as topic modeling.
Why Loggly chose AWS Route 53 over ELB
An interesting article from the Loggly blog about why they chose Amazon Route 53 DNS rather than AWS Elastic Load Balancing (ELB).
FireBox: building block for Warehouse-Scale Computers in 2020
A video from the FAST'14 conference titled “FireBox: A Hardware Building Block for 2020 Warehouse-Scale Computers”, in which Krste Asanović (University of California, Berkeley) presents his view on the future development of Warehouse-Scale Computers (WSC).
Caching at @Scale
The authors of the OpenDNS company blog share their impressions of the @Scale conference organized by Facebook and describe the modern approaches to caching that were presented there.
Facebook completely shut down one data center to test fault tolerance
At the @Scale conference in San Francisco, Jay Parikh from Facebook described an interesting experiment: Facebook completely disconnected one of its data centers to check the overall resiliency of the system.
Apache Spark 1.1 Announcement
Announcement of the new version of Apache Spark 1.1 and a description of the main innovations.
Stream processing in Apache Spark 1.1
An article about the new stream processing capabilities in Apache Spark 1.1 and how to use them.
Statistical computing in Apache Spark 1.1
A description of the advanced statistical computing features in Apache Spark 1.1.
Elasticsearch metrics
A short article from the Compose blog about Elasticsearch metrics.
News from the Apache Software Foundation Blog
A short list of the latest news from the Apache Software Foundation Blog.
Rackspace Weekly Digest
A weekly digest of interesting materials from Rackspace.
10 ways to work with Hadoop through SQL queries
10 tools and approaches for working with Hadoop through SQL queries, with a short description of each.
Review of the most interesting materials on Hadoop No. 87
The regular weekly digest of the most interesting materials on Hadoop from the Hadoop Weekly portal.
174 open source drivers for MongoDB
A large collection of 174 open source drivers for the MongoDB NoSQL database, covering different programming languages.
What's new in RavenDB 3.0
A description of the features in the new version of the popular RavenDB database.
MongoDB and Elasticsearch synchronization
A short article about the Transporter service, which lets you quickly synchronize MongoDB and Elasticsearch.
Introduction to HBase
An article with video and explanatory material on HBase, a data store from the Hadoop ecosystem, covering when this solution should be used and when it should not.
Using ORCFile in Cascading and Apache Crunch
An example of using ORCFile with Cascading and Apache Crunch, which can improve their performance.
Welcome to HadoopKitchen
Announcement of a meeting dedicated to Hadoop, which will be held at the Mail.ru office. I am also going to attend this event.
How to succeed in Big Data
A short article with an infographic about the main factors that influence a company's success in the field of Big Data.
Vincent Granville about Big Data
Vincent Granville, the author of DataScienceCentral, shares his thoughts on Big Data and gives his definition of it.
5 key ideas for understanding Big Data
An interesting post from the Smart Data Collective portal that presents 5 key points for getting the most benefit from data.

Source: https://habr.com/ru/post/237581/
