Our SQL Server database was growing quickly.
So we did some analysis to find what was causing the growth.
We quickly realised that most of the storage space was taken up by XML data from web service requests and responses.
We store the request and response data from every call to our web services; our web site uses this data to render reports in HTML or PDF format.
This XML data is stored in two VARCHAR(MAX) columns in a single table, and each row can hold anywhere from 100 KB to a few megabytes.
We had been planning to shift that data from SQL Server into AWS S3 for some time, but of course other things always took priority.
Then one day, I decided to just do it. It took me one day of design and coding to implement in our test environment, and it will be rolled out to our live environment as part of the next release.
This is what I did...
I added a few methods to the C# class that implements the data access layer for the table in question:
- SaveDataToS3
  - uploads the XML data to S3
  - the S3 bucket name and the S3 key used to store the data are placed into the column that previously held the actual XML data
  - the value saved to the database with the S3 bucket name and S3 key looks like this:
    - S3:acmeinc-blobs/MessageData/123/456/7890/1234567890-Request.xml
    - S3:acmeinc-blobs/MessageData/123/456/7890/1234567890-Response.xml
- FetchDataFromS3
  - downloads the XML data from S3 using the S3 bucket name and S3 key stored in the database column
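The post's code is C#, but the pointer value stored in the column is easy to sketch in Python. The "S3:" prefix, bucket name, and path layout follow the example values above; splitting the message ID into 3/3/4-digit segments is my guess from those examples, not something the post states.

```python
def build_s3_pointer(bucket: str, message_id: int, direction: str) -> str:
    """Return the value saved in place of the raw XML, e.g.
    S3:acmeinc-blobs/MessageData/123/456/7890/1234567890-Request.xml
    """
    s = str(message_id)
    # Hypothetical partitioning: first 3 digits, next 3, then the rest.
    path = f"{s[:3]}/{s[3:6]}/{s[6:]}"
    return f"S3:{bucket}/MessageData/{path}/{s}-{direction}.xml"

def parse_s3_pointer(value: str):
    """Split a stored pointer back into (bucket, key); None if the
    column still holds inline XML rather than a pointer."""
    if not value.startswith("S3:"):
        return None
    bucket, _, key = value[3:].partition("/")
    return bucket, key
```

Keeping a recognisable prefix like "S3:" in the same column means no schema change is needed and old rows remain readable until they are migrated.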
I updated the code in our web services that writes the request and response data to the database; it now calls SaveDataToS3 as part of saving each record to SQL Server.
I also updated the code that uses the request and response data to generate the HTML and PDF reports; it now calls FetchDataFromS3 to fetch the data from S3 on demand.
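Because old rows still hold inline XML until the migration finishes, the read path has to handle both forms. A minimal Python sketch of that dual-read logic (the real code is C#, and `fetch_object` here is a stand-in for the actual S3 client call):

```python
def load_xml(column_value: str, fetch_object) -> str:
    """Return the XML for a row, whether the column holds an S3
    pointer (new rows) or the original inline XML (unmigrated rows)."""
    if column_value.startswith("S3:"):
        bucket, _, key = column_value[3:].partition("/")
        return fetch_object(bucket, key)   # download from S3
    return column_value                    # legacy inline XML

# Usage with a stand-in fetcher instead of a real S3 client:
fake_store = {("acmeinc-blobs", "MessageData/1-Request.xml"): "<request/>"}
xml = load_xml("S3:acmeinc-blobs/MessageData/1-Request.xml",
               lambda b, k: fake_store[(b, k)])
```

Checking the prefix on read is what lets the cutover ship before every historical row has been migrated.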
A row is still written to the SQL Server table, but it no longer contains the large request and response XML.
To shift existing records over to the new storage scheme, I wrote a small C# console app. It walks through the table, uploads each row's XML to S3, and updates the row to hold the pointer instead. It can be run during off-peak times and will eventually process every row.
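The backfill loop can be sketched like this, again in Python for brevity. The batching, the skip-if-already-migrated check, and the `upload` callable are my illustration of the approach, not the post's actual code:

```python
def migrate_batch(rows, upload, batch_size=100):
    """Migrate up to batch_size unmigrated rows.

    rows:   iterable of (row_id, column_value) from the table
    upload: callable(row_id, xml) -> S3 pointer string, standing in
            for the real S3 upload
    Returns a list of (row_id, new_value) updates to write back.
    """
    updates = []
    for row_id, value in rows:
        if value.startswith("S3:"):
            continue                      # already migrated, skip
        pointer = upload(row_id, value)   # put the XML into S3
        updates.append((row_id, pointer))
        if len(updates) >= batch_size:
            break                         # next off-peak run continues
    return updates
```

Small batches keep each database transaction short, which is what makes it safe to run repeatedly during off-peak windows until the whole table is done.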
During testing, saving to and fetching from S3 took hundredths of a second, so performance is not noticeably affected.
Our test database size has been reduced by around 50%.