SQL Server csv in xml out Project


Using SQL Server we will design a project where we process an incoming csv file and prepare an xml file for output.

We will be working for a fictitious company called I Love Books Inc. We will create a new SQL Server database called [ILoveBooks] to do this. The files contain books that we have brought into the retail store as inventory. The company purchases books from different suppliers. All of the suppliers will use exactly the same format of csv file to record the books sent to I Love Books Inc. This is because when we asked them, they agreed to do so. The output xml file will contain the same data as the input file, plus a bit more information. It will be consumed by I Love Books Inc.’s accounting system. We don’t need to worry about how the accounting system imports the xml file.

Our job is to design an algorithm that will solve this challenge. First though we will start with a list of considerations that will be the beginnings of formulating the Requirements and Specifications.

  1. A folder for the incoming csv files and a naming convention
  2. A specific file called inbooks.csv in a specific folder will be SQL Server’s data source
  3. A folder for the outgoing xml files and a naming convention
  4. The entire process of importing and exporting is all-or-nothing
  5. We will take control of error trapping and provide good error reporting
  6. May or may not record all errors to an error log table
  7. Import the csv file and save data as a permanent record
  8. Incoming data must be validated and cleaned if unclean
  9. As part of validation, there can be no duplicates inside of the incoming file itself
  10. The incoming file cannot be stored in database more than once (duplicates will result)
  11. A table containing a list of valid suppliers is stored in the database
  12. The csv file cannot be imported unless the supplier is found in the [Suppliers] table
  13. Our output xml file must be validated against an xml schema (XSD)
  14. A permanent record of all xml files produced are stored in the database

Technical Specifications – Initial Thoughts

This section will explore some ideas as to how we will accomplish some of the above requirements.

  1. Date and time stamp the csv file name. The file name will include the supplier’s company code name followed by a date and time stamp in the format “_YYYYMMDDHHMMSS. There will be one header line , at least one detail line and one trailer line in the csv file. The folder will be named C:\data\ilovebooks\incoming\pending
  2. C:\data\ilovebooks\incoming
  3. Date and time stamp the xml file name. The file name will include our company name “ilovebooks” followed by “_YYYYMMDDHHMMSS.xml”. There will be a header, detail and trailer sections to the csv file. The folder will be named C:\data\ilovebooks\incoming
  4. SQL Server Transactions and Stored Procedures
  5. We will use TRY…CATCH blocks
  6. We may not implement logging
  7. We need “history” tables to append to each time we import. Transactions will roll it back if there is a critical error. The data will come from the incoming tables. The data moves from the csv file to the incoming tables to the history tables. After the incoming data is cleaned and validated it can be copied to the “history” tables.
  8. Validate the supplier code and the date-time in the header against the file name itself. They must match or abort the whole process. the to ensure it matches with the company name in the [Suppliers] table that holds a list of all of the suppliers. This [Suppliers] table has columns for [FullSupplierName], [SupplierCode] and [datetimeadded] to table, with the [SupplierCode] as a primary key. No columns may contain NULL. All of the columns in the incoming csv must exist in the correct order and must contain data, even if empty strings.
  9. The ISBN number is a primary key, so we will get an error from SQL Server if it is a violation, so we do not need to check this one.
  10. We could store the xml in tables and in an xml data type

Stored Procedures

At this point we could begin designing our stored procedures. We will have one stored procedure that calls (EXEC) the other stored procedures. There are benefits to breaking down code into distinct modules. How will we structure it? What are our naming conventions for our user-defined stored procedures. One convention is to start all of them with “usp_”.

  1. usp_00RunProgram
  2. First Stage
    1. usp_10CheckIfFileExists – Part 7
    2. usp_11CheckValidCSVFile
    3. usp_12DeletePreviousOrder Part 5
    4. usp_13OrderImport –
      Part 4
    5. usp_14OrderClean –
      Part 6
    6. usp_15NoDuplicateOrders – Part 8
  3. Second Stage
    1. usp_21OrderCopy – Part 9
    2. usp_22OrderUpdateTotals – Part 10
  4. Third Stage
    1. usp_31CreateXMLData – Part 11
    2. usp_32CreateEmptyXMLFile
  5. Fourth Stage
    1. usp_41CopyOrderToHistory
    2. usp_42XMLCopyHistory
  6. Fifth Stage
    1. usp_51MoveDataToXmlFile

If we are happy with our structure shown above, we could proceed to create all of the stored procedures. We also know that each of these stored procedures are using transactions and TRY…CATCH blocks. Since this is “boilerplate code, we could simply copy the code into each stored procedure. Since our code is well-structured, we can add more stored procedures if we need to. Since we are ordering the stored procedures with numbers, they will sort well in SSMS.

We also can map these procedures to the previous posts at this site or at other sites so that when you are beginning to code the stored procedure, you can refer to the code you need and modify it to suit your needs.

Series NavigationSQL Server csv in xml out Project Part 2 >>

Leave a comment

Your email address will not be published. Required fields are marked *