About three weeks ago I read a topic on Habr about Dapper, an ORM from one of the leading developers of the popular site Stack Overflow. The name of this superhero is Sam Saffron (hereinafter simply Sam). Before that topic appeared, Stack Overflow was known to use Linq-to-Sql, which is the main reason why I, like many other developers, started studying Dapper's source code. As it turns out, there is not much of it: just one file. Reviewing it carefully, I wondered whether it could be made even faster. Speeding up Sam's code was not easy; it was written too well. Below I describe my micro-optimizations as advice to other developers. But first, a warning: this optimization sped Dapper up by about 5%, which is significant for a project like Stack Overflow but may be insignificant for your project. So always consider macro-optimization first (examples at the end of the topic), based on profiling results, and resort to micro-optimization only in special cases.
Always use the minimum contract.
Strictly speaking, this only makes the code cleaner and more resistant to change; it does not by itself speed up execution. Sometimes the right contract is easy to determine, sometimes not. For example, there is no point in returning an IList if the rest of the code performs a simple iteration over the collection; just return an IEnumerable. The choice of this interface allowed Sam to use the yield return construct in the next version:
public static IEnumerable<T> ExecuteMapperQuery<T>(this IDbConnection con, string sql, object param = null, SqlTransaction transaction = null)
{
    using (var reader = GetReader(con, transaction, sql, GetParamInfo(param)))
    {
        var identity = new Identity(sql, con.ConnectionString, typeof(T));
        var deserializer = GetDeserializer<T>(identity, reader);
        while (reader.Read())
        {
            yield return deserializer(reader);
        }
    }
}
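The benefit of IEnumerable plus yield return is deferred, streaming execution: rows are materialized one at a time, only as the caller iterates. A minimal sketch (with a hypothetical ReadRows method and a counter added purely for illustration) of the pattern:

```csharp
using System;
using System.Collections.Generic;

static class StreamingDemo
{
    // Counter added only to make the laziness observable.
    public static int RowsProduced;

    // yield return makes this method lazy: each row is produced
    // only when the caller's enumeration asks for it.
    public static IEnumerable<int> ReadRows(int count)
    {
        for (var i = 0; i < count; i++)
        {
            RowsProduced++;
            yield return i;
        }
    }
}
```

Calling `ReadRows(1000).Take(2).ToList()` materializes only two rows; with a `List<T>` return type, all 1000 would have to be built before the caller saw the first one.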
A less obvious choice involves the IDataReader interface. Sam often uses the FieldCount property on objects through this interface, although a careful look at the full interface hierarchy shows that FieldCount actually belongs to IDataRecord, the narrower interface that IDataReader extends.
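Applied to a helper that only inspects a single record, the narrower contract looks like this (GetColumnNames is a hypothetical method, not from Dapper):

```csharp
using System.Collections.Generic;
using System.Data;

static class ContractDemo
{
    // Asks for IDataRecord rather than IDataReader: both FieldCount
    // and GetName are declared on the narrower interface.
    public static List<string> GetColumnNames(IDataRecord record)
    {
        var names = new List<string>(record.FieldCount);
        for (var i = 0; i < record.FieldCount; i++)
            names.Add(record.GetName(i));
        return names;
    }
}
```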
Consider removing a contract.
This advice follows from the previous one, so we will not dwell on it in detail. Sometimes the contract is so minimal that it can safely be removed altogether. In the example below, instead of IDbConnection you could just pass a string:
private class Identity : IEquatable<Identity>
{
    private readonly string connectionString;

    internal Identity(string sql, IDbConnection cnn, Type type)
    {
        connectionString = cnn.ConnectionString;
        // ...
    }
}
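Applying the advice, Identity can take the connection string directly and drop the IDbConnection dependency entirely. A sketch of the reduced contract (the equality members are my own illustrative filling, not Dapper's actual implementation):

```csharp
using System;

// The caller passes cnn.ConnectionString; Identity itself no longer
// needs to know anything about IDbConnection.
class Identity : IEquatable<Identity>
{
    private readonly string sql;
    private readonly string connectionString;
    private readonly Type type;

    internal Identity(string sql, string connectionString, Type type)
    {
        this.sql = sql;
        this.connectionString = connectionString;
        this.type = type;
    }

    public bool Equals(Identity other) =>
        other != null &&
        sql == other.sql &&
        connectionString == other.connectionString &&
        type == other.type;

    public override bool Equals(object obj) => Equals(obj as Identity);

    public override int GetHashCode() =>
        (sql, connectionString, type).GetHashCode();
}
```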
Learn to predict
Sounds a little strange, doesn't it? What I mean is reasoning about the runtime behavior of the algorithm in advance. Predictions come in two kinds: accurate and inaccurate. Consider the inaccurate kind first, where you cannot say with certainty what will happen. For example, Dapper fills a dictionary of well-known types. We know that when a certain size is reached, the dictionary will need time to grow in case the user adds new elements. What prediction can we make here? A simple one: count all the types in advance and tell the dictionary up front how much capacity we need. I counted 35:
public static class SqlMapper
{
    static readonly Dictionary<Type, DbType> typeMap;

    static SqlMapper()
    {
        typeMap = new Dictionary<Type, DbType>(35);
        // ...
    }
}
Where is the inaccuracy? In the fact that this number depends heavily on changes: as soon as a new type is added, it becomes stale. Nothing terrible will happen, of course, but the code becomes less reliable and the prediction turns out wrong.
Accurate predictions are very good, and you should use them wherever possible. A vivid example of such a prediction is replacing a list with an array when the number of elements is known exactly. The main reason is the same as for the dictionary: memory reallocation. Another significant reason is that indexing into an array is much faster than calling the Add method on a list. This is clearly seen in the code-generation example:
private static Func<object, List<ParamInfo>> CreateParamInfoGenerator(Type type)
{
    DynamicMethod dm = new DynamicMethod("ParamInfo" + Guid.NewGuid().ToString(), typeof(List<ParamInfo>), new Type[] { typeof(object) }, true);
    var il = dm.GetILGenerator();
    // ...
And for the array:
private static Func<object, IEnumerable<ParamInfo>> CreateParamInfoGenerator(Type type)
{
    var dm = new DynamicMethod("ParamInfo" + Guid.NewGuid(), typeof(IEnumerable<ParamInfo>), new[] { typeof(object) }, true);
    var il = dm.GetILGenerator();
    // ...
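Outside of IL generation, the same list-versus-array trade-off can be sketched in plain C#. Both methods below (hypothetical names, illustrative only) produce the same sequence, but the array version allocates exactly once and writes by index, while the unsized list pays for capacity doubling and Add calls:

```csharp
using System.Collections.Generic;

static class ArrayVsListDemo
{
    // Count is known up front: allocate once, fill by index.
    public static int[] SquaresArray(int count)
    {
        var result = new int[count];
        for (var i = 0; i < count; i++)
            result[i] = i * i;
        return result;
    }

    // Unsized list: internal buffer is reallocated as it grows,
    // and every element goes through the Add method.
    public static List<int> SquaresList(int count)
    {
        var result = new List<int>();
        for (var i = 0; i < count; i++)
            result.Add(i * i);
        return result;
    }
}
```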
Reduce the use of reflection
The advice is obvious: everyone knows that reflection is an extremely slow mechanism. Sam knows it too, and he uses code generation to speed up access to object methods and properties (here I am against manual IL generation and consider expression trees a worthy substitute). There is also a second generally accepted way to fight the cost of reflection: caching. In Dapper, a deserializer is created for each class, and in the creation code you can find this line:
var getItem = typeof(IDataRecord).GetProperties(BindingFlags.Instance | BindingFlags.Public)
    .Where(p => p.GetIndexParameters().Any() && p.GetIndexParameters()[0].ParameterType == typeof(int))
    .Select(p => p.GetGetMethod())
    .First();
Obviously, this information can be cached: move getItem to the class level and initialize it in a static constructor.
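Concretely, the lookup from the snippet above can be hoisted into a static field so the reflection cost is paid once per AppDomain rather than once per generated deserializer (SqlMapperCache is a hypothetical class name for the sketch; the lookup itself is the one from Dapper's code):

```csharp
using System.Data;
using System.Linq;
using System.Reflection;

static class SqlMapperCache
{
    // Resolved once, in the static constructor, instead of on
    // every call that builds a deserializer.
    public static readonly MethodInfo GetItem;

    static SqlMapperCache()
    {
        GetItem = typeof(IDataRecord)
            .GetProperties(BindingFlags.Instance | BindingFlags.Public)
            .Where(p => p.GetIndexParameters().Any()
                     && p.GetIndexParameters()[0].ParameterType == typeof(int))
            .Select(p => p.GetGetMethod())
            .First();
    }
}
```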
Double-check closures
Most often, programmers create closures (for those who have never encountered them in C#, I recommend following this link) unconsciously, and pay for them with unexpected errors (even Sam got caught!). However, closures can also be used for acceleration. Example:
private static object GetDynamicDeserializer(IDataReader reader)
{
    List<string> colNames = new List<string>();
    for (int i = 0; i < reader.FieldCount; i++)
    {
        colNames.Add(reader.GetName(i));
    }
    Func<IDataReader, ExpandoObject> rval = r =>
    {
        IDictionary<string, object> row = new ExpandoObject();
        int i = 0;
        foreach (var colName in colNames)
        {
            var tmp = r.GetValue(i);
            row[colName] = tmp == DBNull.Value ? null : tmp;
            i++;
        }
        return (ExpandoObject)row;
    };
    return rval;
}
As you can see, the lambda expression closes over the local variable colNames to speed up retrieval of the column names. In theory this can give a performance boost, since the column names do not change while we iterate over all the records in the IDataReader. Unfortunately for us, the developers of SqlDataReader thought of this too and already store the column names in a similar array inside the class, so the following code performs the same as the previous one, but without the closure:
private static Func<IDataRecord, ExpandoObject> GetDynamicDeserializer()
{
    return r =>
    {
        IDictionary<string, object> row = new ExpandoObject();
        for (var i = 0; i < r.FieldCount; i++)
        {
            var tmp = r.GetValue(i);
            row[r.GetName(i)] = tmp == DBNull.Value ? null : tmp;
        }
        return (ExpandoObject)row;
    };
}
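As an aside on the "unexpected errors" mentioned above: the classic unconscious-closure pitfall is capturing a loop variable. This sketch (my own illustration, not Dapper code) shows the bug and the standard fix:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ClosureDemo
{
    // Buggy: every lambda captures the SAME variable i, so by the
    // time the lambdas run, all of them see its final value.
    public static List<int> CapturedLoopVariable()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
            funcs.Add(() => i);
        return funcs.Select(f => f()).ToList(); // 3, 3, 3
    }

    // Fixed: a fresh copy per iteration gives each lambda its own variable.
    public static List<int> CapturedCopy()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            var copy = i;
            funcs.Add(() => copy);
        }
        return funcs.Select(f => f()).ToList(); // 0, 1, 2
    }
}
```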
Avoid multiple concatenation operations for strings.
Yes, I know that every .NET developer has heard you should use a StringBuilder to build a string from several pieces. But how many is "several"? For two or three strings, creating a StringBuilder can itself be wasteful. Example:
private static IDbCommand SetupCommand(IDbConnection cnn, IDbTransaction transaction, string sql, List<ParamInfo> paramInfo)
{
    // ...
We are interested in the string formed as "@" + info.Name + i: the IDbCommand parameter name. For each such name, three strings are created in memory. If we named the parameter text, the strings would look like this:
@
@text
@text1
Not much in principle, but for 5 parameters we get 15 strings. Time for a StringBuilder? No, probably not. Analyzing the rest of the code shows that the "@" + info.Name construction is used very often, so we replace it with a variable infoName. This saves on strings and, additionally, on re-reading the property. As a result, for 5 parameters only 6 strings are created (one for infoName and one per concatenation operation).
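The hoisting can be sketched like this (BuildNames and its zero-based numbering are illustrative, not Dapper's exact code):

```csharp
using System.Collections.Generic;

class ParamInfo
{
    public string Name { get; set; }
}

static class ConcatDemo
{
    // "@" + info.Name is computed once and reused, so each generated
    // parameter name costs one concatenation instead of two.
    public static List<string> BuildNames(ParamInfo info, int count)
    {
        var infoName = "@" + info.Name;      // hoisted: one string total
        var names = new List<string>(count); // pre-sized, per the earlier advice
        for (var i = 0; i < count; i++)
            names.Add(infoName + i);         // one string per parameter
        return names;
    }
}
```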
I could go on about trivial things such as defining variables as close as possible to where they are used (or, on the contrary, hoisting them out of loops), discarding unnecessary branches in if-else statements, or inlining short methods at their call sites. But I would rather mention macro-optimizations. Sam is currently working on speeding up the addition of parameters to the IDbCommand. As an outside observer, I can suggest paying attention to reusing commands and preparing them (this works well for SqlCommand and its Prepare method).
Perhaps when Dapper moves from a working state to a release I will do another review; for now I will keep a close eye on the project and wish Sam good luck.
P.S.: The author tried to do his civic duty and sent a pull request on GitHub, but, unfortunately, while he was writing this topic for Habr, Sam kept developing Dapper and the request became irrelevant. However, the author wrote to Sam, who promised to take all the suggestions into account in the Dapper release.