Since SharePoint was first released, Microsoft has used SQL Server as the underlying data management and storage mechanism. Although SQL Server is adequate for storing SharePoint data, it isn’t always ideal. The reason for this is that like most relational databases, SQL Server is optimized for storing structured data, such as configuration, table and list data. However, the files that are typically stored in SharePoint document libraries are not structured.
When a user saves a file to a SharePoint document library, that file is saved to the SQL Server database as blob, or binary large object, data. There is no rule that says blob data can’t be kept in SQL Server, but SharePoint Server 2010 provides the option of moving it out of content databases and into less costly storage resources through Microsoft’s Remote BLOB Store add-on software.
Architecturally, SharePoint 2010 remote blob storage is somewhat similar to SQL Server storage. The blob data isn’t merely offloaded onto a file share. Instead, an external storage medium (which can be a file server or another SQL Server system) is configured to act as an external blob store. SharePoint uses a component object model (COM) interface to redirect open-and-save commands to the blob store when appropriate.
Microsoft offers a COM object known as the external blob store provider (sometimes referred to as the EBS Provider). Its main job is to keep SQL Server and blob storage in sync. A SQL Server content database must have metadata references to any blob data (which Microsoft refers to as blob streams) residing on the external blob store. And the EBS Provider object has to be installed, registered and configured on each front-end Web server for remote blob storage to work properly.
Advantages of separating blobs, SQL Server
There are at least four compelling reasons for moving blob data out of SQL Server:
- Blob data typically represents the vast majority of the data stored in a SharePoint content database. Relocating blob data can reduce the demands against the SQL Server database, thereby improving performance.
- If SharePoint’s versioning feature is enabled, each time someone makes a change to a document, a new instance of the corresponding blob is saved in the SQL Server database. SharePoint does not have a way of saving only the deltas. Hence, versioning can cause the database to grow exponentially. This can be problematic because Microsoft’s best practices for SQL Server state that database sizes ideally should be kept under 100 GB.
- There is no real advantage to keeping blob data in SQL Server. The very nature of blob data means that the data cannot take advantage of SQL’s query environment beyond the use of simple Get and Put statements.
- Remote blob storage is usually less expensive than SQL Server storage.
There are some clear advantages to using the SharePoint remote blob storage feature, but whether that means all blob data should be moved out of SQL Server depends on whom you ask. Some IT pros insist that all blob data should be stored remotely, while others believe that it should remain in SQL Server. Personally, I like a combined approach.
I don’t recommend moving all of your blob data to remote storage because most organizations place their SharePoint content database on a high-performance server. SQL Server is a very demanding application, so organizations tend to invest in storage hardware that can deliver plenty of I/O operations per second. On the other hand, remote blob storage is possible on less expensive storage. Although an organization might not use low-end hardware for remote blob storage, the storage infrastructure probably isn’t as efficient as what SQL Server is using. As a result, it stands to reason that moving all of your user data to remote blob storage could negatively impact performance.
Remote storage: Nice home for old data
Remote storage is best suited for older data that is not accessed frequently. For example, an organization’s accounting department might still need access to the financials from 2007, but that information probably isn’t accessed on a daily basis. Such data is ideal for remote blob storage. In contrast, it probably wouldn’t be wise to place current collaborative data in remote blob storage because the hardware might not deliver sufficient I/O performance for easy access.
Though it can be handy to move older content to remote blob storage, SharePoint 2010 does not include capabilities for moving blob data to a remote storage location as part of a document lifecycle plan. There are, however, a number of third-party applications that can automatically move blob data to remote storage based on the file’s age, size, type and a number of other criteria.
Remote blob storage is also handy for storing excessively large files, such as video, which can quickly fill up expensive storage in a content database. Again, unless they are accessed frequently, it is better to store large files remotely.
Finally, when considering using remote blob storage, think about how doing so will affect an organization’s disaster recovery strategy. SharePoint data will exist in two separate locations, in the SQL Server database and in the remote blob storage, which will likely require a rethink of backup architecture and any fault-tolerant mechanisms that might exist. But with some proper planning, those issues can also be put to rest.
ABOUT THE AUTHOR:
Brien M. Posey is a Microsoft MVP with two decades of IT experience. Before becoming a freelance technical writer, Posey was CIO for a national chain of hospitals and health care facilities and a network administrator for insurance companies and the Department of Defense.