Import large dbf files into Ms SQL server 2008 using SqlBulkCopy

In this article I will tell you how to load many huge dbf files consisting of millions of records into your database on the ms sql server in a reasonable time.

The task at first glance is trivial. You can use the wizard in the sql management studio or the OPENROWSET function via the query.

But the first option after several attempts was dropped due to different glitches and the need to load multiple files into one table (about 100 files). In addition, with a long download, an error occurred.

The second option also did not fit due to the different bit depth of the drivers and the bit width of the server.
')
Since the file is just huge, it was decided to read it through the stream and write to the database. Further, after reading the line in the file, you need to write this line to the table. The first thing that came to mind is to use insert, but the entry in this case would take too much time.

And then I remembered another writing mechanism via SqlBulkCopy, which allows you to fill a huge number of records without insert requests.
In practice, this is the use of the SqlBulkCopy class, in order to write through which it is necessary to implement only the IDataReader interface.

So let's start with the implementation of the interface

public class BDFBulkReader : IDataReader

Let's start with the function that returns the value of the current record:

  public object GetValue(int i) { return R[FieldIndex[i]]; }

I draw your attention to the fact that the fields in the file and the fields in the table may be in a different order. And by index I would like to get the value for the corresponding table field. Therefore, I additionally used the FieldIndex dictionary, where the mapping of field names to the number in the sql table. The number takes the name of the field, the name from the dictionary R takes the value from the read line dbf file. As a result, for the nth index in the GetValue database, it returns the corresponding value.

  Dictionary<string, object> R = new Dictionary<string, object>(); Dictionary<int, string> FieldIndex = new Dictionary<int, string>();

We will pass FieldIndex already filled in for the table, and R will fill, when called by the reader, the Read function, which we also implement in the future.

So, the constructor:

  System.IO.FileStream FS; byte[] buffer; int _FieldCount; int FieldsLength; System.Globalization.DateTimeFormatInfo dfi = new System.Globalization.CultureInfo("en-US", false).DateTimeFormat; System.Globalization.NumberFormatInfo nfi = new System.Globalization.CultureInfo("en-US", false).NumberFormat; string[] FieldName; string[] FieldType; byte[] FieldSize; byte[] FieldDigs; int RowsCount; int ReadedRow = 0; Dictionary<string, object> R = new Dictionary<string, object>(); Dictionary<int, string> FieldIndex = new Dictionary<int, string>(); public BDFBulkReader(string FileName, Dictionary<int, string> FieldIndex) { FS = new System.IO.FileStream(FileName, System.IO.FileMode.Open); buffer = new byte[4]; FS.Position = 4; FS.Read(buffer, 0, buffer.Length); RowsCount = buffer[0] + (buffer[1] * 0x100) + (buffer[2] * 0x10000) + (buffer[3] * 0x1000000); buffer = new byte[2]; FS.Position = 8; FS.Read(buffer, 0, buffer.Length); _FieldCount = (((buffer[0] + (buffer[1] * 0x100)) - 1) / 32) - 1; FieldName = new string[_FieldCount]; FieldType = new string[_FieldCount]; FieldSize = new byte[_FieldCount]; FieldDigs = new byte[_FieldCount]; buffer = new byte[32 * _FieldCount]; FS.Position = 32; FS.Read(buffer, 0, buffer.Length); FieldsLength = 0; for (int i = 0; i < _FieldCount; i++) { FieldName[i] = System.Text.Encoding.Default.GetString(buffer, i * 32, 10).TrimEnd(new char[] { (char)0x00 }); FieldType[i] = "" + (char)buffer[i * 32 + 11]; FieldSize[i] = buffer[i * 32 + 16]; FieldDigs[i] = buffer[i * 32 + 17]; FieldsLength = FieldsLength + FieldSize[i]; } FS.ReadByte(); this.FieldIndex = FieldIndex; }

His task is to open the file, determine the field names, their number and their types. The second parameter of the constructor, as I wrote above, is a dictionary of correspondences, so that, for example, by the first field number we are guaranteed to get the required field from the file.

We now turn to the implementation of bool Read (). It will return true if the string was successfully read. And false if the string was not read and at the same time the end of the data was reached.

  public bool Read() { if (ReadedRow >= RowsCount) return false; R.Clear(); buffer = new byte[FieldsLength]; FS.ReadByte(); FS.Read(buffer, 0, buffer.Length); int Index = 0; for (int i = 0; i < FieldCount; i++) { string l = System.Text.Encoding.GetEncoding(866).GetString(buffer, Index, FieldSize[i]).TrimEnd(new char[] { (char)0x00 }).TrimEnd(new char[] { (char)0x20 }); Index = Index + FieldSize[i]; object Tr; if (l.Trim() != "") { switch (FieldType[i]) { case "L": Tr = l == "T" ? true : false; break; case "D": Tr = DateTime.ParseExact(l, "yyyyMMdd", dfi); break; case "N": { if (FieldDigs[i] == 0) Tr = int.Parse(l, nfi); else Tr = decimal.Parse(l, nfi); break; } case "F": Tr = double.Parse(l, nfi); break; default: Tr = l; break; } } else { Tr = DBNull.Value; } R.Add(FieldName[i], Tr); } ReadedRow++; return true; }

Once again I remind you that after calling it, the read line will be written into the dictionary R, for later reading by the reader.
So, it remains to implement, the method returns the number of fields:

 public int FieldCount { get { return _FieldCount; } }

And stubs for the interface:

 public void Dispose() { FS.Close(); } public int Depth { get { return -1; } } public bool IsClosed { get { return false; } } public Object this[int i] { get { return new object(); } } public Object this[string name] { get { return new object(); } } public int RecordsAffected { get { return -1; } } public void Close() { } public bool NextResult() { return true; } public bool IsDBNull(int i) { return false; } public string GetString(int i) { return ""; } public DataTable GetSchemaTable() { return null; } public int GetOrdinal(string name) { return -1; } public string GetName(int i) { return ""; } public long GetInt64(int i) { return -1; } public int GetInt32(int i) { return -1; } public short GetInt16(int i) { return -1; } public Guid GetGuid(int i) { return new Guid(); } public float GetFloat(int i) { return -1; } public Type GetFieldType(int i) { return typeof(string); } public double GetDouble(int i) { return -1; } public decimal GetDecimal(int i) { return -1; } public DateTime GetDateTime(int i) { return new DateTime(); } public string GetDataTypeName(int i) { return ""; } public IDataReader GetData(int i) { return this; } public long GetChars(int i, long fieldoffset, char[] buffer, int bufferoffset, int length) { return -1; } public char GetChar(int i) { return ' '; } public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferoffset, int length) { return -1; } public byte GetByte(int i) { return 0x00; } public bool GetBoolean(int i) { return false; } public int GetValues(Object[] values) { return -1; }

Where in Dispose () I just close the file.

After the interface is implemented, you can write a method to download the file:

  void SaveToTable(FileInfo dir, string TableName, string connestionString, Dictionary<int, string> FieldIndex) { using (var loader = new SqlBulkCopy(connestionString, SqlBulkCopyOptions.Default)) { loader.DestinationTableName = TableName; loader.BulkCopyTimeout = 9999; loader.WriteToServer(new BDFBulkReader(dir.FullName, FieldIndex)); } }

That's all. It remains to transfer the file location, the table name, the connection string and the corresponding correspondence dictionary to this function, for example:

 Dictionary<int, string> FieldIndex= new Dictionary<int, string>(); FieldIndex.Add(0, "POSTALCODE"); FieldIndex.Add(1, "IFNSFL"); FieldIndex.Add(2, "TERRIFNSFL"); FieldIndex.Add(3, "IFNSUL"); FieldIndex.Add(4, "TERRIFNSUL"); FieldIndex.Add(5, "OKATO"); FieldIndex.Add(6, "OKTMO"); FieldIndex.Add(7, "UPDATEDATE"); FieldIndex.Add(8, "HOUSENUM"); FieldIndex.Add(9, "ESTSTATUS"); FieldIndex.Add(10, "BUILDNUM"); FieldIndex.Add(11, "STRUCNUM"); FieldIndex.Add(12, "STRSTATUS"); FieldIndex.Add(13, "HOUSEID"); FieldIndex.Add(14, "HOUSEGUID"); FieldIndex.Add(15, "AOGUID"); FieldIndex.Add(16, "STARTDATE"); FieldIndex.Add(17, "ENDDATE"); FieldIndex.Add(18, "STATSTATUS"); FieldIndex.Add(19, "NORMDOC"); FieldIndex.Add(20, "COUNTER");

All, thank you all for your attention, pleasant loading.

Useful links:
SqlBulkCopy Description
Insides dbf

Source: https://habr.com/ru/post/249321/

All Articles

Import large dbf files into Ms SQL server 2008 using SqlBulkCopy

More articles: