📜 ⬆️ ⬇️

ETL process using web services in Integration Services 2012

We reserve for the future


Sometimes in the course of work, data from web services is needed, especially SOAP connections today are practically standard.

The ETL process (Extract - Transform - Load) is a term from Business Intelligence and describes the process of collecting and transforming data to create an analytical database (for example, a data warehouse).

SOAP data exchange protocol and web services described by WSDL are common windows into the world of almost all ERP systems, many online portals and financial organizations.
')
I will try to describe step by step the ETL process using one of the most powerful tools in the class - MS Integration Services.

So, we will consider a test task.

Task



It is necessary to collect data on exchange rates against the ruble for each date of the last year and load them into a table for further analysis. The Central Bank of Russia provides historical data - in the form of web services with a good description.
It looks like this is the solution.

Create a project



First, let's create a project in Business Intelligence Development Studio (and more recently, SQL Server Data Tools)

In the toolbox there is an element Web-service-task, and we will transfer it to the workspace.


In the line HttpConnection create a new connection - selecting NewConnection
and put the URL http://www.cbr.ru/DailyInfoWebServ/DailyInfo.asmx
By the way, you can go there - do not hesitate.

WSDL - the file is available there, here it is www.cbr.ru/DailyInfoWebServ/DailyInfo.asmx?WSDL
Download and select it in the settings dialog.

Everything, go to the Input tab

image

Here you can choose among the services and their methods - supplied by the provider.
We need GetCourseOnDate - the issuing DataSet for the requested date.
Set the date value manually - in the Value field, return later and automate it.
Output we simply make to the file by creating a new connection output - to the file output.xml

image

Start the process - it went. File is created - currency rates appear as currency rates.
The format is odd: diffgr attributes, inline scheme. This is the XML output of the Dataset object.

In the BI-studio there is an XML-Source connector - it gives out a zero result when you try to set it on such a file. (Scheme reads).

We put after the Web Service Task - Skript Task.

Workaround


Hidden text
object rawConnection = Dts.Connections["output"].AcquireConnection(Dts.Transaction); DataSet CoursesFile = new DataSet(); string filePath = (string)rawConnection; object rawConnection2 = Dts.Connections["output2"].AcquireConnection(Dts.Transaction); string filePath_Out = (string)rawConnection2; object rawConnection3 = Dts.Connections["output.xsd"].AcquireConnection(Dts.Transaction); string filePath_Schema = (string)rawConnection3; CoursesFile.ReadXml(@filePath); CoursesFile.AcceptChanges(); CoursesFile.WriteXmlSchema(@filePath_Schema); CoursesFile.WriteXml(@filePath2); 

Script in C #, you can use the entire toolkit .net platform and 2012 VisualStudio.
I added to the beginning using System.Xml;
Appeal to the collection of connections - I created 2 more, for saving the schema and for output.

The output is pure XML - thanks to acceptChanges ();

Reconsidering the Beginning



It would be nice to take a range of dates - and receive for each currency exchange rates, and then record, adding a date field to the database.

To begin with, we will create a temporary dimension, it will come in handy and more than once - in each data warehouse.

I create with my variation of the script scanned in the network, here it is.

Hidden text
 -- Delete time dimension if it already exists. IF Exists(Select Name from sysobjects where name = 'Dim_Time') BEGIN Drop Table Dim_Time END GO -- Standard options for creating tables SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO -- Create your dimension table -- Adjust to your own needs Create Table dbo.Dim_Time ( Dateid int IDENTITY (1,1) PRIMARY KEY CLUSTERED, Date date, DateString varchar(10), Day int, DayofYear int, DayofWeek int, DayofWeekName varchar(10), Week int, Month int, MonthName varchar(10), Quarter int, Year int, IsWeekend bit, IsLeapYear bit ) -- Declare and set variables for loop Declare @StartDate datetime, @EndDate datetime, @Date datetime Set @StartDate = '2000/01/01' Set @EndDate = '2020/12/31' Set @Date = @StartDate -- Loop through dates WHILE @Date <=@EndDate BEGIN -- Check for leap year DECLARE @IsLeapYear BIT IF ((Year(@Date) % 4 = 0) AND (Year(@Date) % 100 != 0 OR Year(@Date) % 400 = 0)) BEGIN SELECT @IsLeapYear = 1 END ELSE BEGIN SELECT @IsLeapYear = 0 END -- Check for weekend DECLARE @IsWeekend BIT IF (DATEPART(dw, @Date) = 1 OR DATEPART(dw, @Date) = 7) BEGIN SELECT @IsWeekend = 1 END ELSE BEGIN SELECT @IsWeekend = 0 END -- Insert record in dimension table INSERT Into Dim_Time ( [Date], [DateString], [Day], [DayofYear], [DayofWeek], [Dayofweekname], [Week], [Month], [MonthName], [Quarter], [Year], [IsWeekend], [IsLeapYear] ) Values ( @Date, CONVERT(varchar(10), @Date, 105), -- See links for 105 explanation Day(@Date), DATEPART(dy, @Date), DATEPART(dw, @Date), DATENAME(dw, @Date), DATEPART(wk, @Date), DATEPART(mm, @Date), DATENAME(mm, @Date), DATENAME(qq, @Date), Year(@Date), @IsWeekend, @IsLeapYear ) -- Goto next day Set @Date = @Date + 1 END GO 


Move the new item to the workspace.

image

image

Base TEST and OLE DB connection localhost.TEST created along the way.

How to make the process take each date from the result and transfer it to WebServiceTask.

Using variables.

Variables


Right click on the workspace - Variables. Or the menu View - Other Windows - Variables. This is a very powerful tool, but we will only mention it in passing; without it, SSIS cannot be half as much.
Create a UDate variable of type System.Object, since We will display the result of the query.
Connect it with our SQL Task.

image

Now add successively to SQL-Task Foreach Loop Container.
This is a container (there you can shove any sequence of actions), it performs the foreach routine.
In our case, it will connect to web services and record currency rates for each date in the UDate set.
That's how it is configured
image
image

The Date variable of the Date type created by us will be available inside the container so we will pass the actual date for the record.

Here is what we put in the container.

image

Let's see what DataFlow follows our script - everything is simple there. Read XML with XML - Source,
add a column Date - write to the table.
And so for each date.

image

In the element Derived Column - insert our variable Date - to write to the database.

Here is the result
image

Source: https://habr.com/ru/post/198224/


All Articles