Converting Bencode to XML

Hello, Habr!
In this post I suggest we write an application that converts a file in Bencode format into an XML file. The programming language will be C#.
(Yes, we are going to reinvent the wheel. Why bother? If you cannot solve typical problems, you will not manage the non-standard ones either.)

Bencode as it is



So, let's look at what the Bencode format consists of.
It supports 4 data types:
  1. Byte string
  2. Integer
  3. List
  4. Dictionary

Bencode uses ASCII characters and digits as delimiters.
How is the data stored?
In my opinion, the types can be split into simple and complex ones.
Simple types:
1) Byte string - the length of the string (a number) comes first, then a colon (":"), then the string itself. For example: 5:hello or 12:hello, habr!
2) Integer - written as the character "i" (integer), the number, and the character "e" (end). For example: i9e or i199e. This type also supports negative numbers, for example i-9e (-9).
Complex data types (composed of simple and complex types):
1) List (an array) - contains other Bencode values written one after another. It is written as the character "l" (list), the elements, and the character "e". Example: l1:I3:You2:Wee ["I", "You", "We"]
2) Dictionary (an associative array) - contains key-value pairs. Keys must be byte strings, and the pairs are sorted lexicographically by key. It is written as the character "d" (dictionary), the key-value pairs, and the character "e". For example: d5:alpha4:beta10:filesCounti9ee ["alpha" - "beta", "filesCount" - 9]
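
To make the four notations concrete, here is a minimal encoder sketch. It is not part of the article's code: the class name BencodeSketch and its Encode method are hypothetical, and it assumes ASCII text, long for integers, and plain .NET collections for lists and dictionaries.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper for illustration only; not one of the article's classes.
public static class BencodeSketch
{
    // Encodes a string, long, list or dictionary as Bencode text.
    public static string Encode(object value)
    {
        switch (value)
        {
            case string s:
                // Byte string: <length>:<data>. The length is counted in bytes,
                // which equals the character count for ASCII text.
                return Encoding.ASCII.GetByteCount(s) + ":" + s;
            case long n:
                // Integer: i<number>e
                return "i" + n.ToString(CultureInfo.InvariantCulture) + "e";
            case IList<object> list:
            {
                // List: l<elements>e
                var sb = new StringBuilder("l");
                foreach (object item in list)
                    sb.Append(Encode(item));
                return sb.Append('e').ToString();
            }
            case IDictionary<string, object> dict:
            {
                // Dictionary: d<key><value>...e, keys in lexicographical order.
                var keys = new List<string>(dict.Keys);
                keys.Sort(StringComparer.Ordinal);
                var sb = new StringBuilder("d");
                foreach (string key in keys)
                    sb.Append(Encode(key)).Append(Encode(dict[key]));
                return sb.Append('e').ToString();
            }
            default:
                throw new ArgumentException("Unsupported type: " + value?.GetType());
        }
    }
}
```

Calling BencodeSketch.Encode on the values from the examples above (a string, a long, a List<object> and a Dictionary<string, object>) reproduces the same notations; note that the dictionary keys are sorted regardless of insertion order.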

Where is BEncode used?


Bencode is used in the .torrent files we all know and love. Each such file is an associative array (dictionary).
We will not go into the structure of .torrent files here.

Organizing the data structures


In my opinion, it makes sense to introduce several classes so that everything has its place.
So we will create a class for simple elements (BItem), which handles both numbers and strings, then a list class (BList), and then a dictionary class (BDictionary).
Since we do not know in advance in what order the elements of a BEncode file follow each other, we create a class (BElement) that encapsulates the methods for working with elements and holds a simple item, a list, or a dictionary. The result is a composite class.
The fifth and last class (FileBEncoding) will contain a list of elements. (A single element would do, but we take the more general case.)
Graphically, it looks like this:


Reading a BEncode file



The class BItem contains


// Namespaces used by the classes in this article.
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

/// <summary>
/// Integer value: starts with 'i', ends with 'e'. Example: i145e => 145
/// String value: starts with its length and ':'. Example: 5:hello => "hello"
/// </summary>
public class BItem
{
    protected string strValue = "";
    protected int intValue = 0;
    protected bool IsInt = true;

    public bool isInt { get { return IsInt; } }

    public BItem(string A)
    {
        strValue = A;
        IsInt = false;
    }

    public BItem(int A)
    {
        IsInt = true;
        intValue = A;
    }

    // Overrides Object.ToString() so that the stored value is returned as text.
    public override string ToString()
    {
        if (IsInt)
            return intValue.ToString();
        return strValue;
    }
}


The BList class contains


/// <summary>
/// List: starts with 'l', ends with 'e'.
/// Example: l5:helloi145ee => ("hello", 145)
/// </summary>
public class BList
{
    List<BElement> Items = null;

    public BList()
    {
        Items = new List<BElement>();
    }

    public BElement this[int index]
    {
        get
        {
            if (Items.Count > index)
                return Items[index];
            return new BElement();
        }
        set
        {
            if (Items.Count > index)
                Items[index] = value;
            else
                throw new Exception("Index out of range.");
        }
    }

    public int Count { get { return Items.Count; } }

    /// <summary>
    /// Appends an element to the list.
    /// </summary>
    public void Add(BElement inf)
    {
        Items.Add(inf);
    }
}


The BDictionary class contains

/// <summary>
/// Dictionary: starts with 'd', ends with 'e'.
/// Example: d2:hi7:goodbyee => ("hi" => "goodbye")
/// </summary>
public class BDictionary
{
    protected List<BElement> FirstItem = null;  // keys
    protected List<BElement> SecondItem = null; // values

    public BDictionary()
    {
        FirstItem = new List<BElement>();
        SecondItem = new List<BElement>();
    }

    public int Count { get { return FirstItem.Count; } }

    /// <summary>
    /// Gets or sets the key-value pair at the given index as a two-element array.
    /// </summary>
    public BElement[] this[int index]
    {
        get
        {
            if (FirstItem.Count > index)
            {
                BElement[] Items = new BElement[2];
                Items[0] = FirstItem[index];
                Items[1] = SecondItem[index];
                return Items;
            }
            return new BElement[2];
        }
        set
        {
            if (FirstItem.Count > index)
            {
                FirstItem[index] = value[0];
                SecondItem[index] = value[1];
            }
            else
            {
                // Nothing is appended here (no FirstItem.Add / SecondItem.Add); use Add() for that.
                throw new Exception("Index out of range.");
            }
        }
    }

    /// <summary>
    /// Appends a key-value pair.
    /// </summary>
    public void Add(BElement First, BElement Second)
    {
        FirstItem.Add(First);
        SecondItem.Add(Second);
    }
}


Now we turn to the "universal" class, BElement.
The BElement class contains


/// <summary>
/// "Universal" element: holds a simple item, a list or a dictionary.
/// </summary>
public class BElement
{
    public BItem STDItem = null;
    public BList LSTItem = null;
    public BDictionary DICItem = null;

    /// <summary>
    /// Reads a simple element (string or integer).
    /// </summary>
    /// <param name="Reader">Stream to read from</param>
    /// <param name="CurrentCode">First character of the element (already read)</param>
    public void AddToBItem(StreamReader Reader, char CurrentCode)
    {
        char C;
        if (CurrentCode == 'i')
        {
            // Integer: collect characters up to the closing 'e' (or the end of the stream).
            string Value = "";
            C = (char)Reader.Read();
            while (C != 'e' && C != char.MaxValue)
            {
                Value += C;
                C = (char)Reader.Read();
            }
            try
            {
                // Values outside the Int32 range fail to parse and leave STDItem null.
                int Res = Int32.Parse(Value);
                STDItem = new BItem(Res);
            }
            catch (Exception)
            {
                // No throw here: a malformed number is simply stored as null.
                STDItem = null;
            }
            return;
        }

        // String: read the decimal length up to ':'.
        int length = (int)CurrentCode - (int)'0';
        C = (char)Reader.Read();
        while (C != ':' && (C >= '0' && C <= '9'))
        {
            length = length * 10 + (int)C - (int)'0';
            C = (char)Reader.Read();
        }
        if (C != ':')
        {
            // Malformed length prefix. Alternatively, skip the throw and leave STDItem null.
            throw new Exception("Invalid string length prefix.");
        }
        StringBuilder value = new StringBuilder(length);
        for (int CurrentCount = 0; CurrentCount < length; CurrentCount++)
        {
            value.Append((char)Reader.Read());
        }
        STDItem = new BItem(value.ToString());
    }

    /// <summary>
    /// Reads a list. The leading 'l' has already been read.
    /// </summary>
    /// <param name="Reader">Stream to read from</param>
    public void AddToBList(StreamReader Reader)
    {
        LSTItem = new BList();
        BElement Temp = GetNewBElement(Reader);
        while (Temp != null)
        {
            LSTItem.Add(Temp);
            Temp = GetNewBElement(Reader);
        }
        if (LSTItem.Count == 0)
            LSTItem = null; // an empty list is stored as null
    }

    /// <summary>
    /// Reads a dictionary. The leading 'd' has already been read.
    /// </summary>
    /// <param name="Reader">Stream to read from</param>
    public void AddToBDic(StreamReader Reader)
    {
        DICItem = new BDictionary();
        // Read the key first and stop at the closing 'e' (GetNewBElement returns null).
        // The value is read only after a key was found, so nothing past the end
        // of the dictionary is consumed.
        BElement FirstTemp = GetNewBElement(Reader);
        while (FirstTemp != null)
        {
            BElement SecondTemp = GetNewBElement(Reader);
            DICItem.Add(FirstTemp, SecondTemp);
            FirstTemp = GetNewBElement(Reader);
        }
        if (DICItem.Count == 0)
            DICItem = null; // an empty dictionary is stored as null
    }

    /// <summary>
    /// Reads the next element, dispatching on its first character.
    /// </summary>
    /// <param name="Reader">Stream to read from</param>
    /// <returns>The element, or null at an 'e' terminator or any other character</returns>
    public static BElement GetNewBElement(StreamReader Reader)
    {
        char C = (char)Reader.Read();
        switch (C)
        {
            case '0': case '1': case '2': case '3': case '4':
            case '5': case '6': case '7': case '8': case '9':
            case 'i':
                {
                    // Simple element (string or integer)
                    BElement STDElement = new BElement();
                    STDElement.AddToBItem(Reader, C);
                    return STDElement;
                }
            case 'l':
                {
                    // List
                    BElement LSTElement = new BElement();
                    LSTElement.AddToBList(Reader);
                    return LSTElement;
                }
            case 'd':
                {
                    // Dictionary
                    BElement DICElement = new BElement();
                    DICElement.AddToBDic(Reader);
                    return DICElement;
                }
            default: // "e"
                return null;
        }
    }
}
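
For comparison, the same recursive-descent idea can be sketched over an in-memory string. This is not the author's implementation: BencodeReaderSketch and its methods are hypothetical names, the input is assumed to be well-formed ASCII text, and values are returned as plain .NET objects (long, string, List<object>, Dictionary<string, object>) rather than BElement. The point it illustrates is that a list or dictionary loop checks for the closing 'e' before reading the next element, so a dictionary never reads a value after its terminator.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical illustration; these names are not from the article.
public static class BencodeReaderSketch
{
    // Parses one Bencode value from ASCII text. No input validation is done.
    public static object Parse(string text)
    {
        int pos = 0;
        return ParseValue(text, ref pos);
    }

    private static object ParseValue(string text, ref int pos)
    {
        char c = text[pos];
        if (c == 'i')
        {
            // Integer: i<number>e
            int end = text.IndexOf('e', pos);
            long number = long.Parse(text.Substring(pos + 1, end - pos - 1),
                                     CultureInfo.InvariantCulture);
            pos = end + 1;
            return number;
        }
        if (c == 'l')
        {
            pos++; // skip 'l'
            var list = new List<object>();
            while (text[pos] != 'e')
                list.Add(ParseValue(text, ref pos));
            pos++; // skip the closing 'e'
            return list;
        }
        if (c == 'd')
        {
            pos++; // skip 'd'
            var dict = new Dictionary<string, object>();
            while (text[pos] != 'e')
            {
                // The key is read first; the value only if a key was present.
                string key = (string)ParseValue(text, ref pos);
                dict[key] = ParseValue(text, ref pos);
            }
            pos++; // skip the closing 'e'
            return dict;
        }
        if (c >= '0' && c <= '9')
        {
            // Byte string: <length>:<data>
            int colon = text.IndexOf(':', pos);
            int length = int.Parse(text.Substring(pos, colon - pos),
                                   CultureInfo.InvariantCulture);
            string value = text.Substring(colon + 1, length);
            pos = colon + 1 + length;
            return value;
        }
        throw new FormatException("Unexpected character '" + c + "' at position " + pos);
    }
}
```

For instance, BencodeReaderSketch.Parse("d1:ad1:bi1ee1:ci2ee") yields an outer dictionary with the keys "a" (a nested dictionary) and "c" (the number 2): the nested dictionary ends at its own 'e' and the key "c" stays with the outer one.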


The last class, the "heart" of our structure, is FileBEncoding.
It also implements the BEncode reading and XML output algorithms.

To begin with, we implement reading.
Let this class contain:


public class FileBEncoding
{
    // General case: a list of top-level elements. A single BElement BenItem would also do.
    List<BElement> BenItems = new List<BElement>();

    BElement this[int index]
    {
        get
        {
            if (BenItems.Count > index)
                return BenItems[index];
            return null;
        }
        set
        {
            if (BenItems.Count > index)
                BenItems[index] = value;
            else
                throw new Exception("Index out of range.");
        }
    }

    public FileBEncoding(string Path)
    {
        if (!File.Exists(Path))
            return; // BenItems stays empty

        // "using" closes the reader even if parsing throws.
        using (StreamReader Reader = new StreamReader(Path, Encoding.ASCII))
        {
            while (!Reader.EndOfStream)
            {
                BElement temp = BElement.GetNewBElement(Reader);
                if (temp != null)
                    BenItems.Add(temp);
            }
        }
    }

    // The methods from the following sections (BElementToSTR, PasteTab, ToString,
    // BElementToXML and ToXMLFile) are also members of this class.
}


More readable string output



If you would rather see the XML output right away, you can skip this part.
Here I would like to offer a "structured" output of the data to a file or the console.
What do we need for this?
All classes in C# derive from the Object class, which has a ToString() method. By default this method returns the name of the type, so we override it.

private string BElementToSTR(BElement CurrentElement, int TabCount, bool Ignore = true)
{
    if (CurrentElement == null)
        return ""; // nothing to print
    string Result = "";
    if (Ignore) // indent, except for a dictionary value that continues the key's line
        PasteTab(ref Result, TabCount);
    if (CurrentElement.STDItem != null)
    {
        Result += CurrentElement.STDItem.ToString();
        return Result;
    }
    if (CurrentElement.LSTItem != null)
    {
        // List
        Result += "List{\n";
        for (int i = 0; i < CurrentElement.LSTItem.Count; i++)
            Result += BElementToSTR(CurrentElement.LSTItem[i], TabCount + 1) + '\n';
        PasteTab(ref Result, TabCount);
        Result += "}List\n";
        return Result;
    }
    if (CurrentElement.DICItem != null)
    {
        // Dictionary
        Result += "Dict{\n";
        for (int i = 0; i < CurrentElement.DICItem.Count; i++)
        {
            Result += BElementToSTR(CurrentElement.DICItem[i][0], TabCount + 1) + " => "
                    + BElementToSTR(CurrentElement.DICItem[i][1], TabCount + 1, false) + '\n';
        }
        PasteTab(ref Result, TabCount);
        Result += "}Dict\n";
        return Result;
    }
    return ""; // all three fields are null, nothing to print
}

// Appends "count" tab characters to STR.
private string PasteTab(ref string STR, int count)
{
    for (int i = 0; i < count; i++)
        STR += '\t';
    return STR;
}

public override string ToString()
{
    string Result = "";
    for (int i = 0; i < BenItems.Count; i++)
    {
        Result += BElementToSTR(BenItems[i], 0) + "\n\n";
    }
    return Result;
}


Here we used two helper functions. The first processes a single BElement (note that it is recursive); the second creates the indentation.

Create an XML file



Before we start coding, I would like to say a few words about XML itself.
XML has become widespread as a common language for data exchange. C# (or rather the .NET platform) has excellent XML support. For convenience, we use the built-in tools from the System.Xml namespace.

There are a number of ways to create an XML document. I chose the XmlWriter class. It builds a document from scratch, writing each element and attribute in order. The main advantage of this approach is speed.
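
As a small, self-contained sketch of this style of writing (separate from the converter itself): the class XmlWriterSketch, the in-memory StringWriter target and the attribute names Key and Value are assumptions made for the illustration, not names from the article.

```csharp
using System.IO;
using System.Xml;

// Hypothetical illustration of sequential XML writing with XmlWriter.
public static class XmlWriterSketch
{
    // Builds a tiny document in memory and returns it as a string.
    public static string Build()
    {
        var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
        var text = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(text, settings))
        {
            writer.WriteStartElement("Dictionary");        // opens <Dictionary>
            writer.WriteStartElement("Dictionary_Items");  // opens <Dictionary_Items>
            // Attributes belong to the element opened last and must be
            // written before any child content of that element.
            writer.WriteAttributeString("Key", "filesCount");
            writer.WriteAttributeString("Value", "9");
            writer.WriteEndElement();                      // closes <Dictionary_Items>
            writer.WriteEndElement();                      // closes <Dictionary>
        }
        return text.ToString();
    }
}
```

Everything is emitted strictly in call order, which is why the writer is fast but also why attributes cannot be added to an element once its children have been started.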

To create the document, we define two methods:
void ToXMLFile(string path) and void BElementToXML(BElement Current, XmlWriter Writer, int order = 0)
The first method creates an XmlWriter object and calls BElementToXML for each item in the list.
The second method "unwinds" a BElement (if it is a list or a dictionary) and actually generates the file contents.

private void BElementToXML(BElement Current, XmlWriter Writer, int order = 0)
{
    if (Current == null)
        return; // nothing to write
    if (Current.STDItem != null)
    {
        // Simple value: stored as an attribute of the current element.
        Writer.WriteAttributeString("STDType" + '_' + order.ToString(), Current.STDItem.ToString());
        return;
    }
    if (Current.LSTItem != null)
    {
        // List
        Writer.WriteStartElement("List"); // opens <List>
        for (int i = 0; i < Current.LSTItem.Count; i++)
            BElementToXML(Current.LSTItem[i], Writer, i); // the index keeps attribute names unique within <List>
        Writer.WriteEndElement(); // closes </List>
        return;
    }
    if (Current.DICItem != null)
    {
        // Dictionary: one <Dictionary_Items> element per key-value pair.
        Writer.WriteStartElement("Dictionary");
        for (int i = 0; i < Current.DICItem.Count; i++)
        {
            Writer.WriteStartElement("Dictionary_Items");
            BElementToXML(Current.DICItem[i][0], Writer, order);
            BElementToXML(Current.DICItem[i][1], Writer, order + 1);
            Writer.WriteEndElement();
        }
        Writer.WriteEndElement();
        return;
    }
}

public void ToXMLFile(string path)
{
    using (XmlTextWriter XMLwr = new XmlTextWriter(path, System.Text.Encoding.Unicode))
    {
        XMLwr.Formatting = Formatting.Indented;
        XMLwr.WriteStartElement("Bencode_to_XML");
        foreach (BElement X in BenItems)
        {
            XMLwr.WriteStartElement("BenItem");
            BElementToXML(X, XMLwr);
            XMLwr.WriteEndElement();
        }
        XMLwr.WriteEndElement();
    }
}


As a result, we get more convenient ways of viewing a BEncode file.
(The example uses an Ubuntu torrent file, with the SHA hashes removed.)

Console (ToString() method):


Excel (XML):


Conclusion



In this short tutorial we learned how to parse BEncode files and how to generate an XML file.
Converting BEncode to XML can be useful, for example, when writing an editor for BEncode files. Here is the code put together.

Source: https://habr.com/ru/post/147990/
