In my earlier post, I showed one of the ways to 'get' InfoPath files from SharePoint 2003 Forms Library. Once the xml files are extracted onto local directory or network share, now the task is to read the xml files and load it in database. Well, since this is a typical ETL (extraction, transformation & load) task, the obvious solution for us Microsoft techies is SSIS (or DTS for old timers). Using a XML Data source, embedded inside a 'for loop' - this could really be an easy job. Wrong! The real challenge is that InfoPath form has multiple namespaces. An XML Data source cannot work with multiple namespaces. So, you have to use an XML task to use an XSLT to strip multiple namespaces, before you execute XML Datasource. There is an excellent post by Ashwin at SQLJunkies site that you can refer on how to strip out multiple namespaces from an InfoPath data file. Here is a link to that article.

http://www.sqljunkies.ddj.com/Article/67C11DD1-080E-4F46-818B-4B9BD332DCA6.scuk 

After removing multiple namespace, now your task is rather simpler. Use a For Loop inside, and move your data flow and xml task. When all said and done your final control flow should look something like this.

But I took a totally different approach. I actually had to leverage the business Logic and data access layer which the Client uses for all their data processing. Now, I could use a .net assembly and call that from SSIS package, but this approach requires the assembly to be strongly named and GAC'ed. So, in order to save some trouble and work for the IT team here, I wrote a simple Console application in C#. This application reads the XML files and loads the data using the framework blessed by IT and Enterprise Architecture. The 'ProcessExtracts' function, loops through the folder and instantiate an XMLDocument object for each file, which further gets passed to 'SetCustomerServiceObject' function which does the real processing. In order to deal with multiple namespaces, I use 'InitNameSpaceManager' function which returns a namespace manager for the XML Objects to understand "my:" in all the InfoPath xml nodes.

public void ProcessExtracts()

{

string[] Files = Directory.GetFiles(GetAppDirectory());foreach(string file in Files)

{

 

FileInfo fileI = new FileInfo(file);

XmlDocument xDoc = new XmlDocument();

xDoc.Load(file);

SetServiceObject(xDoc);

 

}

}

private void SetServiceObject(XmlDocument xDoc)

{

XmlNamespaceManager xns = InitNamespaceManager(xDoc);

XmlNodeList xnList = xDoc.SelectNodes("//my:myFields", xns);

foreach (XmlNode xn in xnList)

{

//MetaData

Service.FormLastModifiedDt = _lastModifiedDt;

Service.AsOfDt = System.
DateTime.Now;

 

//FormData

Service.Invoice= xn["my:Invoice"].InnerText;

Service.EnteredBy = xn["my:EnteredBy"].InnerText;

IServiceRepository repository = RepositoryFactory.Create<IServiceRepository>("ConnectionString");
repository.Add(Service); 
}

}

I took the approach of using a Console application to run on a nightly basis solely because of the requirement of leveraging existing framework and business logic. If I had a choice, I would have used SSIS to execute these jobs. But the bottomline is, this makes my client happy :-). Hopefully, you had something to learn reading this blog.