Parse and calculate MS Excel formulas

One of the most interesting tasks that we had to face in the process of working on the Spreadsheet component was the mechanism for calculating formulas. Working on it, we thoroughly delved into the mechanics of the functioning of a similar mechanism in MS Excel.

And today I want to tell you about the principles of its work, tricks and pitfalls. And in order not to be reduced to dry retellings of documentation diluted with add-ons “from life” - at the same time I will briefly describe how we implemented such a mechanism.

So, in this article we will talk about the three main parts of the classical calculator of formulas - expression analysis, storage and calculation.

Internal expression

The expression in Excel is stored in reverse polish notation , RPN . An expression in RPN form is a simple array whose elements are called ParsedThing.
')
The complete set of ParsedThing consists of the following elements:

Operands - constants, arrays, references;

Constants:

ParsedThingNumeric
ParsedThingInt
ParsedThingString
ParsedThingBool
ParsedThingMissingArg
ParsedThingError

Arrays:

ParsedThingArray

References:

ParsedThingName, ParsedThingNameX
ParsedThingArea, ParsedThingAreaErr, ParsedThingArea3d, ParsedThingAreaErr3d, ParsedThingAreaN, ParsedThingArea3dRel
ParsedThingRef, ParsedThingRefErr, ParsedThingRef3d, ParsedThingErr3d, ParsedThingRefRel, ParsedThingRef3dRel
ParsedThingTable, ParsedThingTableExt

Operators - mathematical, logical, reference, as well as function calls;

Function calls:

ParsedThingFunc
ParsedThingFuncVar

Binary operators:

ParsedThingAdd
ParsedThingSubtract
ParsedThingMultiply
ParsedThingDivide
ParsedThingPower
ParsedThingConcat
ParsedThingLess
ParsedThingLessEqual
ParsedThingEqual
ParsedThingGreaterEqual
ParsedThingGreater
ParsedThingNotEqual
ParsedThingIntersect
ParsedThingUnion
ParsedThingRange

Unary operators:

ParsedThingUnaryPlus
ParsedThingUnaryMinus
ParsedThingPercent

Auxiliary elements and attributes (to optimize the speed of calculations, save spaces and line breaks in the expression, ie, do not affect the result of the calculation of the expression).

Auxiliary:

ParsedThingMemArea
ParsedThingMemNoMem
ParsedThingMemErr
ParsedThingMemFunc
ParsedThingParentheses

Attributes:

ParsedThingAttrSemi
ParsedThingAttrIf
ParsedThingAttrChoose
ParsedThingAttrGoto
ParsedThingAttrSum
ParsedThingAttrSpace

I will give a couple of examples.

"= A1 * (1 + true)". In the internal view it will look like this: {ParsedThingRef (A1), ParsedThingInt (1), ParsedThingBool (true), ParsedThingAdd, ParsedThingMultiply}
"= SUM (A1,1," 2 ",)". In the internal view it will look like this: {ParsedThingRef (A1), ParsedThingInt (1), ParsedThingString (“2”), ParsedThingMissing, ParsedThingFuncVar (“SUM”, 4 arguments)}

Calculations

Calculation of the expression written in reverse Polish notation is performed using the stack machine. A good example is given in Wikipedia.

But in the calculation of expressions from Excel was not without tricks. The developers endowed all the operands with the "value type" property. This property indicates how the operand must be converted before evaluating an operator or function. For example, ordinary mathematical operators cannot be performed on links, but can only on simple values (numeric, logical, etc.). In order for the expression “A1 + B1: C1” to work correctly, Excel indicates for references A1 and B1: C1 that they must be converted to a simple value before putting the result of the calculation on the stack.

There are three types of operands:

Reference;
Value;
Array.

Each operand is of type “default”:

All kinds of links	Reference
Constants except arrays	Value
Arrays	Array
Function calls	Value, Reference, Array

The result of the function calculation can be of any type. Most functions return a Value, some (for example, INDIRECT) are a Reference, the rest are Array (MMULT).

End users do not need to fill their heads with data types: Excel itself selects the desired type of operand already at the stage of expression analysis. And at the computation stage, one cannot do without “implicit type conversion”. It occurs according to the following scheme:

The value of the Value type can be converted to Array , in this case an array of one value will be created. In the opposite direction ( Array-> Value ) the transformation is also quite simple - the first element is taken from the array.

As can be seen from the diagram, a value of type Reference cannot be obtained from Value or Array . It is quite logical, from number, line, etc. get the link does not work.

When converting Reference to Array, all values from cells in the range are overwritten into an array. In the case when the range is complex (consisting of two or more other ranges) - the result of the conversion is equal to the error #VALUE!

The conversion of Reference to Value occurs in an interesting way. Between ourselves, this rule we called "Crossing." The easiest way to explain its essence is an example:

Suppose we want to bring to Value the values of the cells from A1 to A4, which contain the same formula “= B2: B3”, having the type Reference. The range of B2: B3 consists of one column. If this were not the case and there would be more columns, converting the Reference to Value for all cells A1 to A4 would return #VALUE! and that would be the end. Cells A2 and A3 are located in rows intersecting the range B2: B3. Conversion Reference-> Value for these cells will return the corresponding value from the range B2: B3, i.e. the conversion for A2 returns 1, and for A3 returns 2. For the remaining cells, such as A1 and A4, the conversion returns #VALUE!

The same behavior will be for the range of B7: C7, consisting of one line. For cells B6 and C6, the conversion returns the values 3 and 4, respectively, and for A6 and - D6 #VALUE! Similarly, if there were more lines in the range, the conversion would return #VALUE! for all cells from A6 to D6

There are several types conversion rules.

The values of all formulas inside cells are always reduced to the type Value.

For example:

"= 123" In this formula, a constant is specified, it is already of type Value. Nothing to convert.
"= {1,2,3}" An array is given here. Conversion to Value by the rule gives us the first element of the array - 1. It will be the result of evaluating the expression.
The formula "= A1: B1" located in cell B2. The operand link to the default range is of type Reference. In the calculation, it will be reduced to Value by the "crossing" rule. The result in this case is the value from cell B1.

Mathematical, logical and text operators can not work with links . Therefore, the arguments for them are prepared and cast to either Value or Array. The second option is only possible inside Array formulas. For example, when calculating the expression "= B1: B2 + A3: B3" written in cell A1, both arguments of the mathematical addition operator will first be converted to the Value type according to the "Crossing" rule, and then the results will be added. Those. the value will be equal to the sum of the values of the cells B1 and A3.

Reference operators cannot work with any other type than Reference . For example, the formula "= A1:" test "" will be incorrect, entering such a formula will lead to an error - Excel simply will not allow such a formula to be written into a cell.

Expressions inside “names” and some other constructions are converted to the “default” type . Unlike formulas inside cells, expressions in which are reduced to the type Value. The expression inside some “name” name "= A1: B1" as a result of the calculation will be equal to the range A1: B1. The same expression in the cell will be calculated and as a result there will be either one value or the error #VALUE! But the expression in the cell "= name" will already have the type Value and will be calculated depending on the current cell.

Parser

Having written the first version of the parser on the knee, we realized that the monster is too large and poorly amenable to modernization. And in our case, it was inevitable, since we learned a large number of subtleties already when the parser worked at the very least. For the sake of interest I decided to try other methods and armed myself for this with the Coco / R translator generator. The choice for him at that moment fell mainly due to the fact that I was already well acquainted with him. Coco / R met my expectations. The parser generated by him showed quite good results in terms of speed of work, so we decided to stop at this option.

Of course, in this article I will not dwell on the description of the features and the retelling of the Coco / R documentation. Fortunately, the documentation is written in my opinion very understandable. In addition, I recommend to read the article on the site .

We collected Coco / R from source codes

In some places, Coco / R does not generate a CLS-compliant code.

The problem is in public constants whose name begins with an underscore. The only way out of the situation is to correct Coco / R, since the full source code is available on the developers website.

Having rummaged in source codes found three places where public constants are created. All of them in the ParserGen.cs file. For example:

void GenTokens() { foreach (Symbol sym in tab.terminals) { if (Char.IsLetter(sym.name[0])) gen.WriteLine("\tpublic const int _{0} = {1};", sym.name, sym.n); } }

Further, the resulting code continues to be invalid, now according to FxCop . In our company, assemblies are constantly being tested for compliance with a large number of rules. Of course, since the code was generated, it would be possible to make an exception for it and suppress the verification of the generated classes. But this is not the best way. Fortunately, there is only one problem - the public fields do not comply with the Microsoft.Design rule : CA1051 . To fix everything, it is enough to make the necessary changes to the Parser.frame and Scanner.frame files, which are located next to the grammar file. That is, Coco / R itself is not necessary to rebuild. Here are some examples:

 public Scanner scanner; public Errors errors; public Token t; // last recognized token public Token la; // lookahead token

Some of these fields are not used at all outside the class — we simply make them private, the rest — we create public properties.

When developing grammar for Coco / R, I used a studio plugin .

His buns

Syntax highlighting for a grammar file;
Automatic generator start when saving a file with grammar;
Intellisense for keywords;
Shows compilation errors that occur in the parser file in the appropriate place in the grammar file

The last feature would be very convenient, but, unfortunately, it worked for me incorrectly - it showed errors not where necessary.

The plugin also had to be taught to generate CLS compliant code. Download the source code of the plugin , and repeat the same operations as with Coco / R itself.

We modernize the scanner and parser

I recall that to parse the expression Coco / R creates a pair of classes - Parser and Scanner. Both are re-created for each new expression. Since in our case there are many expressions, the re-creation of the scanner may take additional time on a large number of calls. In general, one parser scanner is enough for us. The first upgrade affected exactly this.

The second upgrade touched the auxiliary class Buffer, which is created by the scanner to read the incoming character stream. Out of the box, Coco / R contains a pair of Buffer and UTF8Buffer implementations. Both of them work with the stream. We do not need a thread: enough work with the string. To do this, create a third implementation of the StringBuffer, highlighting the IBuffer interface along the way:

 public interface IBuffer { string GetString(int beg, int end); int Peek(); int Pos { get; set; } int Read(); }

The implementation of StringBuffer itself is simple:

 public class StringBuffer : IBuffer { int stringLen; int bufPos; string str; public StringBuffer(string str) { stringLen = str.Length; this.str = str; if (stringLen > 0) Pos = 0; else bufPos = 0; } public int Read() { if (bufPos < stringLen) return str[bufPos++]; else return StreamBuffer.EOF; } public int Peek() { int curPos = Pos; int ch = Read(); Pos = curPos; return ch; } public string GetString(int beg, int end) { return str.Substring(beg, end - beg); } public int Pos { get { return bufPos; } set { if (value < 0 || value > stringLen) throw new FatalError("buffer out of bounds access, position: " + value); bufPos = value; } } }

We are testing

Just in case, we check that we created an additional class for a reason. We start testing three scenarios for a string of length N:

initialization from string;
reading a character (calling the IBuffer.Read () method N times);
get 10 characters from a string (call IBuffer.GetString (i-10, i) (N-10) times).

When N = 100:
Init x 100000:
Buffer: 171 ms
StringBuffer: 2ms
Read xNx10000:
Buffer: 14 ms
StringBuffer: 8ms
GetString x (N-10) x 10000:
Buffer: 250 ms
StringBuffer: 20 ms

Grammar Development

The grammar for Coco / R is described in the EBNF . The development of grammar for Coco / R is reduced to the construction of the RBNF and its design in accordance with the grammar of Coco / R in a file with the extension atg.

The parser is based on a recursive descent, the grammar must satisfy LL (k). The scanner is based on a deterministic finite state machine .

So let's get started. The first in the grammar file is the name of the future compiler:

 COMPILER FormulaParserGrammar

Next, the scanner specification must follow. The scanner will be case-insensitive, we indicate this with the help of the keyword IGNORECASE. Now you need to decide on the characters. We need to separate the numbers, letters, control characters. It turned out the following:

 CHARACTERS digit = "0123456789". chars = "~!@#$%^&*()_-+={[]}|\\:;\"',./?<> ". eol = '\r'. blank = ' '. letter = ANY - digit - chars - eol - blank + '_'.

Coco / R allows you to not only add sets of characters, but also to subtract. So, in the letter description, the keyword ANY is used, which substitutes the entire set of characters from which the other sets defined above are subtracted.

An important task of the scanner is to identify the tokens in the input sequence. While working on the parser, the set of tokens was constantly changing and as a result it looks like this:

 TOKENS ident = letter {letter | digit | '.'}. wideident = letter {letter | digit} ('?'|'\\') {letter | digit | '?'|'\\'}. positiveinumber = digit {digit}. fnumber = "." digit {digit} [("e" | "E") ["+" | "-"] digit {digit}] | digit {digit} ( "." digit {digit} [("e" | "E" ) ["+" | "-"] digit {digit} ] | ("e" | "E") ["+" | "-"] digit {digit} ). space = blank. quotedOpenBracket = "'[". quotedSymbol = "''" | "']" | "'@" | "'#". pathPart = ":\\". trueConstant = "TRUE". falseConstant = "FALSE".

Please note that an identifier may contain one or several dot characters. Such, for example, may be the name of the sheet in the link to the range. An additional, extended identifier is also required. It differs from the usual presence of a question mark or bekslesha. I note that in Excel the concept of an identifier is quite complex and difficult to describe in grammar. Instead, I check all lines identified by the scanner as ident and wideident already in the code for compliance with the following rules:

It can contain only letters, numbers, and symbols: _,., \,?;
Cannot be TRUE or FALSE;
The first character can only be a letter, underscore, or beksleshem;
If the first character of the string is Bexlesh, then the second character may not be, or it must be one of: _,., \,?;
Must not be similar to the name of the range (for example, A10);
Should not begin on a line that can be interpreted as a reference in the format R1C1. The nature of this condition is difficult to explain; I will give only a few examples of identifiers that do not satisfy him: “R1_test”, “R1test”, “RC1test”, “R”, “C”. At the same time, “RCtest” is quite suitable.

Selection quotedOpenBracket, quotedSymbol and pathPart to a separate token is nothing more than a trick. It allowed to skip characters in the column names in the table link, before which the apostrophe should follow. For example, in the expression “= Table1 [Column '[1']]”, the column name starts after the '[' character and continues to the ']' character. At the same time, the first such symbol, together with the apostrophe preceding it, will be read by the scanner as a quotedSymbol (']) terminal and, thus, reading the column name will not stop there.

Finally, let the scanner skip line feeds and tabs.
IGNORE eol + '\ n' + '\ t'. Expressions themselves can be written in several lines, but this does not affect the grammar.

We have come to the main section of PRODUCTIONS. It is necessary to describe all non-terminals. In the examples below, I will provide a simplified code, I cut out most semantic inserts, since they will make it difficult to understand grammar. I will leave only the places of construction of the expression itself, without optimizations.

For all non-terminals (of which Coco / R will make methods), a reference to the expression in the RPN form will be transmitted, as well as the data type to which it should be converted. When calling a parser for a formula inside a cell, the initial data type is Value. Further, during parsing, it will change, and the prepared type will be transferred to the branches of the parse tree. For example, when parsing the expression “= OFFSET (A1: B1, A1, A2)”, the element of the Polish notation — the OFFSET function — will receive the Value type, while parsing the first argument, it will be cast to Reference, the other two to Value. For all functions, we store information, which arguments and which types should be passed to it.

The task of the parser is also to check the formula for correctness. The formula will be considered incorrect if Excel does not allow to write it in a cell. In addition to syntax errors, an incorrect number of arguments passed to a function or a data type mismatch to the requested one can make the formula incorrect. For example, the ROW function either needs no parameters at all or only one, and it should be exclusively Reference. We have already said that it is impossible to bring any other type to the Reference, which means that the expressions “= ROW (1)”, “= ROW (“ A1 ”)” will be invalid.

Since our grammar is a sophisticated grammar for ordinary mathematical expressions, it is also constructed depending on the precedence of operations. The first are the logical expressions, the last are the operators for working with ranges. Those. in order of increasing priority.

To visualize RBNF I use a small program EBNF Visualizer . This is how the first non-terminal in our grammar will look like - a logical expression:

Further grammar for Coco / R. In the semantic inserts between “(..” And “.)” I add the desired ParsedThing to the expression.

 LogicalClause<OperandDataType dataType, ParsedExpression expression> (. IParsedThing thing = null; .) = ConcatenateClause<dataType, expression> { ( '<' (.thing = ParsedThingLess.Instance; .) | '>' (.thing = ParsedThingGreater.Instance; .) | "<=" (.thing = ParsedThingLessEqual.Instance; .) | ">=" (.thing = ParsedThingGreaterEqual.Instance; .) | "<>" (.thing = ParsedThingNotEqual.Instance; .) | '=' (.thing = ParsedThingEqual.Instance; .) ) ConcatenateClause<dataType, expression> (. expression.Add(thing); .) } .

According to this principle will be built: ConcatenateClause, AddClause, MultipyClause, PowerClause, UnaryClause, PercentClause, RangeUnionClause, RangeIntersectionClause, CellRangeClause. On the CellRangeClause, the nonterminals describing the operators end. It is followed by the first operand - OperandCommonClause. It will look something like this:

However, there is ambiguity in the above grammar. It lies in the fact that SheetName and OperandClause can begin with the same terminal - with an identifier. For example, the expression “= Sheet! A1” may follow, or it may be “= name”. Here “Sheet” and “name” are identifiers. Fortunately, Coco / R allows you to resolve conflicts by viewing the incoming stream with a scanner several terminals ahead. Those. we can look for the character '!', if one is found, then we parse the SheetName, otherwise OperandClause. This is how the grammar will look like:

 OperandCommonClause<OperandDataType dataType, ParsedExpression expression> = ( IF(IsSheetDefinition()) ( ( SheetNameQuoted<sheetDefinitionContext> | [ '[' FileDefinitionSimple ] [ SheetName<out sheetName> [ ':' SheetName<out sheetName> | '|' SingleQuotedIdent<out ddeTopic> ] ] ) '!' CellReferenceClause<dataType, expression> ) | OperandClause<dataType, expression> ) .

The IsSheetDefinition () method defined in the Parser class is used to resolve the conflict. Such methods are conveniently written in a separate file, marking the class as partial.

The SheetName terminal can start with a digit or consist of numbers only. In this case, the name of the sheet must be enclosed in apostrophes. Otherwise, Excel adds the missing extras.

 SheetName<out string sheetName> (. int sheetNameStart = la.pos;.) = ( [positiveinumber | fnumber] [ident] ) (. sheetName = scanner.Buffer.GetString(sheetNameStart, la.pos); .) .

There are a couple of mistakes in this grammar. First, all terminals inside the non-terminal are optional. And secondly, it allows you to insert a space or any character described in the IGNORE section between the number and the identifier. The problem can be solved with the help of the conflict resolution mechanism, but without additional functions. Directly in the grammar we check for the presence of a gap between the number and the identifier following it:

 SheetName<out string sheetName> (. int sheetNameStart = la.pos;.) = ( positiveinumber | fnumber [ IF(la.pos - t.pos == t.val.Length) ident ] | ident ) (. sheetName = scanner.Buffer.GetString(sheetNameStart, la.pos); .) .

In OperandClause we will get from OperandCommonClause if there is no link to a worksheet, external book or DDE source. From this non-terminal, we can get into ArrayClause, StringConstant (both cannot have a link to the list in front of them), CellReferenceClause, or we can see a bracket and go to the beginning of the entire parse tree - to LogicalClause.

 OperandClause<OperandDataType dataType, ParsedExpression expression> = ( CellReferenceClause<dataType, expression> | ArrayClause<dataType, expression> | StringConstant<expression, dataType> ) | '(' CommonCellReference<dataType, expression> ')' (. expression.Add(ParsedThingParentheses.Instance);.) .

CellReferenceClause is probably the largest non-terminal, it contains almost all types of operands:

 CellReferenceClause<OperandDataType dataType, ParsedExpression expression> = ( IF (IsTableDefinition()) TableReferenceExpressionClause<dataType, expression> | IF (IsFunctionDefinition()) FunctionClause<dataType, expression> | IF (IsDefinedNameDefinition()) DefinedNameClause<dataType, expression> | IF(IsRCCellPosition()) CellPositionRCClause<dataType, expression> | IF(IsA1CellPosition()) CellPositionA1Clause<dataType, expression> | CellError<dataType, expression> | TermNumber<expression> | BoolConstant<expression, dataType> | wideident (. expression.Add(new ParsedThingName(t.val);.) ) .

For most non-terminals, you need to create conflict resolution methods. Then we describe the grammar for all the remaining non-terminals.

Parsing incomplete expressions and "predictions"

Consider the task of highlighting the ranges involved in the formula.

The problem here is that the expression may be invalid, and you must still highlight the ranges. At that time, the parser, without mastering the expression, always returned null. Of course, it would be possible to try to figure out a partially read expression and gather links from it, but how do you know where in the expression is a specific link? Therefore, we decided to teach the parser to immediately put everything interesting in places as it reads. So, all links by type were in specially designed collections. The parser when detecting, for example, a reference to a range, calls the appropriate method

 void RegisterCellRange(CellRange range, int sheetDefinitionIndex, int position, int length)

After reading, regardless of whether it was successfully completed or not, we have a set of links.

Another mechanism began to be based on this - predictions. In the expression “= 1 * (1 + 2” the balance of brackets is broken, but, with high probability, the user forgot to put the bracket exactly at the end of the expression. That is, you can try to correct this formula by adding the missing bracket to it. Of course, the parser will do this it will not, he will only say where and what he thinks is missing. For example, the following lines appeared in OperandClause already familiar to us:

 '(' CommonCellReference<dataType, expression> (. if(la.val != ")") parserContext.RegisterSuggestion(new FunctionCloseBracketSuggestion(la.pos)); .) ')'

Of course, all these actions are done only when the expression is entered manually and there is a need to draw the involved ranges or to try to correct the unfinished formula.

Computation Performance Optimization

Excel, still at the stage of parsing the expression, tries to make life easier for itself. To this end, he, if possible, fills the expression with all sorts of auxiliary elements. Let's look at them.

AttrSemi attribute. This attribute is added to the first element in those expressions that contain volatile functions.

Attribute class Mem. This includes several attributes at once. They are united by the fact that they are designed to optimize the calculation of links. In essence, they are a wrapper over some expression. During the evaluation, the inner expression may not be calculated, the result will be Mem for it. A distinctive feature of these elements is that they are inserted in the reverse Polish record before the expression that is optimized.

ParsedThingMemFunc - indicates that the expression inside must be evaluated every time and no result can be cached. For example, the whole expression = INDIRECT ("A1"): B1 will be wrapped in MemFunc, since The INDIRECT function is a volatile function and, at the next calculation, may return a different value.
ParsedThingMemArea. It contains an expression whose value has already been calculated and will not change. This value will remain inside the attribute and at the next calculation it will be added to the stack, and the internal expression will not be calculated at all.
ParsedThingMemErr. It contains an expression whose value is calculated, will not change and is equal to an error.
ParsedThingMemNoMem. When calculating an expression inside Excel, there is a shortage of memory. In practice, I have never met this.

The AttrSum attribute is used as a simplified form of writing the SUM function in the case when only one argument is passed to the function.

The AttrIf attribute is used in conjunction with one or two Goto operators to optimize the calculation of the IF function. Let me remind the syntax of the IF function: IF (condition, true_value, [false_value]). Of the two values, you can calculate only one and save time on the calculation of the other, if you immediately go to the desired value after the condition is calculated. Thus, the simple expression = IF (condition, ”v_true”, ”v_false”) Excel dilutes with dense attributes. It turns out about the following:

The calculation goes like this. The condition value is pushed onto the stack. The next in line is the IF attribute. He looks at the value at the top of the stack. If it is true, it does nothing. If false, adds the current item counter in the expression to the offset written inside, thereby starting the counter to point to “v_false”. Next, either “v_true” or “v_false” is calculated and the result is pushed onto the stack. Next is Goto, first or second. But both of them refer to the end of the expression (or the following operators in the expression, if any).

AttrChoose works very similarly. Let me remind you that the CHOOSE function selects one of the arguments, the sequence number of which is specified in the first argument.
Thus, only one argument affects the further result of the calculation, all the others can be skipped. AttrChoose stores a set of blends, each of which points to the beginning of each next argument. The argument is followed by the familiar AttrGoto, which points to the end of the expression or the element following the CHOOSE function.

The Space attribute is used to preserve the format of a user-entered expression, namely, string and space transfers. Each such attribute except quantity has type. There are only six types:

SpaceBeforeBaseExpression,
CarriageReturnBeforeBaseExpression,
SpaceBeforeOpenParentheses,
CarriageReturnBeforeOpenParentheses,
SpaceBeforeCloseParentheses,
CarriageReturnBeforeCloseParentheses,
SpaceBeforeExpression;

Looking at these types, you can guess that Excel cannot save spaces at the end of the line and before the “=” sign. In addition, spaces within structured references and arrays will not be saved either.

Testing

When our control just learned to read and write files in the OpenXML format, we were quick to check it out. And the best way to do this is to find many, many files and try to drive reading and writing these files. So we pumped about 10k random OpenXML files and wrote a simple program. We created a task for her on the test farm. Every night, the task automatically starts and reads and writes files. If any errors occur, all the necessary information is recorded in the log. So we were able to debug a huge number of errors.

With the development of controls, both supported formats and features were added. So now 20k xls files and 15k csv files are constantly being tested. -, , .

Excel , 10 OpenXML 20 xls . , . , , . – , Excel.

, , – . , . , , .

Thanks for attention!

Source: https://habr.com/ru/post/222523/

All Articles