What SQL Server Data Services Means for Bible Software
Yesterday Microsoft announced a beta of SQL Server Data Services (SSDS), a cloud database. The features of SSDS provide a possible way to solve data portability and synchronization problems in Bible software.
(If terms like “cloud database” and “data portability” put you to sleep, then you should probably stop reading now. This post is fairly technical.)
The Problem
At BibleTech08, Craig Rairdin from Laridian spoke about “synchronizing user-created data between platforms, readers, and vendors” (mp3 of his talk):
In the last ten years, Bible software users have moved from being 100% desk-bound to nearly 100% mobile. Unfortunately, mobile devices are significantly more disposable than desktop systems, and users move from device to device, platform to platform, and Bible software to Bible software. Through this they long to have portability of not just their libraries but their own annotations, highlights, cross-references, bookmarks, and any other user-created data their programs allow them to create.
In other words, Craig describes the digital analog of the classic problem when you buy a new print Bible: what do you do with all the notes, highlights, and underlines in your old Bible? Similarly, when you buy new hardware or software, what do you do with all your customizations? Your notes, highlights, history, saved reports, workspace settings, etc., should ideally come with you when you upgrade your computer or switch programs.
Unfortunately, each Bible software vendor stores this user data in different formats, depending on the needs of the program, and the vendor may or may not choose to export the data in an easily consumable format. Transferring data between programs, therefore, involves a lot of work—probably not as much work as copying out all the notes from your old print Bible, though.
(Naturally, vendors make it relatively easy to transfer your data when you upgrade to a newer version of their program or when you switch computers and need to reinstall your program. They could, however, simplify the process further.)
About SSDS
According to the SSDS overview (pdf), SSDS is a schemaless (no fixed schema), queryable, REST-accessible data store. Basically, you send it an HTTP request, and it sends you back an XML document containing the results of your query.
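To make the request/response idea concrete, here is a rough Python sketch of what a client might do. We don't have access to SSDS yet, so the base URL, the `q` query parameter, and the XML element names below are all our own placeholders, not anything taken from Microsoft's documentation.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint; the real SSDS URL layout is not assumed here.
BASE_URL = "https://example.invalid/v1/my-authority/my-container"


def build_query_url(query: str) -> str:
    """Return a GET URL that carries the query in a 'q' parameter (assumed name)."""
    return BASE_URL + "?" + urlencode({"q": query})


# Made-up response shape: one <entity> per result, one child element per property.
SAMPLE_RESPONSE = """\
<entities>
  <entity id="note-1">
    <kind>note</kind>
    <reference>John 3:16</reference>
    <text>Compare with 1 John 4:9.</text>
  </entity>
</entities>
"""


def parse_entities(xml_text: str) -> list[dict[str, str]]:
    """Turn each <entity> element into a flat dict of its properties."""
    root = ET.fromstring(xml_text)
    results = []
    for entity in root.findall("entity"):
        props = {child.tag: (child.text or "") for child in entity}
        props["id"] = entity.get("id", "")
        results.append(props)
    return results


url = build_query_url("kind == 'note'")
notes = parse_entities(SAMPLE_RESPONSE)
```

The point is only that a client in any language needs nothing more than an HTTP library and an XML parser.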
In many ways, SSDS is part of a trend in recent large-scale database development toward denormalization, key/value lookup, flexible schemas, and standards-based protocols. CouchDB and Amazon’s SimpleDB are both examples of this trend. The need for ORM schemes, like the one used in Ruby on Rails, shows that modern programming languages and relational databases aren’t exactly using the same playbook when it comes to data modeling. And scaling a relational database requires a good deal of specialized knowledge.
A Solution
SSDS offers a potential solution to the problem of data portability and synchronization in Bible software: instead of only storing user data in a binary format on a local computer, also store it in the cloud so other programs and the user can access it when and how they want.
Let’s look at the SSDS data model and see how it applies to a user-data scenario. An SSDS database has four levels:
- Authority: a group of containers. As a software vendor, you’d probably have one authority per program.
- Container: a group of entities. Containers come with their own security model, so each container would be a user.
- Entity: a group of properties. Each entity would be a distinct piece of data—a user’s notes on a particular verse, for example. The lack of a defined schema means that an entity can consist of only those properties that are relevant to the data. A highlight, for example, might record the start and end points and a timestamp, while a note would need to record the text of the note, as well as a mime type.
- Property: a key-value pair. The raw data. The trick for data portability is that vendors have to agree on the terminology and format for various properties.
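The four levels above can be sketched as plain Python data classes. This is only our mental model of the hierarchy, not SSDS's API; the class names, the property names (`kind`, `start`, `end`, `mime_type`, and so on), and the sample values are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A schemaless bag of properties (key-value pairs)."""
    entity_id: str
    properties: dict[str, object] = field(default_factory=dict)


@dataclass
class Container:
    """All the entities that belong to one user."""
    user: str
    entities: dict[str, Entity] = field(default_factory=dict)

    def put(self, entity: Entity) -> None:
        self.entities[entity.entity_id] = entity


@dataclass
class Authority:
    """All the containers that belong to one program."""
    program: str
    containers: dict[str, Container] = field(default_factory=dict)

    def container_for(self, user: str) -> Container:
        # Create the user's container on first use.
        return self.containers.setdefault(user, Container(user))


authority = Authority("example-bible-app")
alice = authority.container_for("alice")

# A highlight and a note carry different properties; nothing enforces a schema.
alice.put(Entity("hl-1", {
    "kind": "highlight",
    "start": "John 3:16",
    "end": "John 3:17",
    "created": "2008-03-06T11:50:00Z",
}))
alice.put(Entity("note-1", {
    "kind": "note",
    "reference": "John 3:16",
    "text": "Compare with 1 John 4:9.",
    "mime_type": "text/plain",
}))
```

Two programs could only exchange these entities if they agreed on names like `kind` and `reference`, which is exactly the coordination problem the property level raises.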
In this scenario, when the Bible software program starts up, it queries the SSDS database and looks for new data created by the user, integrating the data into the program as necessary. Then, when users write a note in their Bible software program, the program sends the note to the SSDS database and records it there. If the user doesn’t have an Internet connection at the moment, the program queues the data until a connection becomes available. (The talk by Craig from Laridian goes into a bunch of different synchronization cases.)
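Here is a minimal Python sketch of the "queue it until the Internet comes back" behavior. The `InMemoryStore` class is a stand-in we made up so the example runs without a network; a real client would replace its `put` method with an HTTP call, and it would also need the conflict handling that Craig's talk covers, which this sketch ignores.

```python
from collections import deque


class InMemoryStore:
    """Stand-in for the remote data store (hypothetical; no real HTTP here)."""

    def __init__(self):
        self.online = True
        self.entities = {}

    def put(self, entity_id, properties):
        if not self.online:
            raise ConnectionError("no Internet connection")
        self.entities[entity_id] = dict(properties)


class SyncClient:
    """Queues writes locally and sends them when the store is reachable."""

    def __init__(self, store):
        self.store = store
        self.pending = deque()

    def save(self, entity_id, properties):
        """Queue the write, then try to send everything that is pending."""
        self.pending.append((entity_id, properties))
        return self.flush()

    def flush(self):
        """Send queued writes in order; stop at the first failure and keep the rest."""
        while self.pending:
            entity_id, properties = self.pending[0]
            try:
                self.store.put(entity_id, properties)
            except ConnectionError:
                return False
            self.pending.popleft()
        return True


store = InMemoryStore()
client = SyncClient(store)

store.online = False
sent_offline = client.save("note-1", {"kind": "note", "text": "Written on the train."})

store.online = True
sent_online = client.flush()
```

Leaving a write at the head of the queue until it succeeds keeps notes in the order the user wrote them, at the cost of one stuck write blocking everything behind it.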
Now, in theory, different programs—or the user—could query the database directly and read or write data as necessary. For example, someone might want to create a feed of recent Bible annotations and publish it on a blog. If SSDS produces an Atom feed natively, then it gets even easier to publish the data widely (should the user choose to do so).
In practice, for write requests, you’d probably want to have a trusted and an untrusted container, or proxy the requests through a program that validates the data. OAuth provides a standard way to give programmatic access to data.
In the end, users have more control over their data in an environment that provides them more permanence than a local store. Users who have confidence that they won’t lose their data will hopefully be more likely to commit more data to their Bible software. And the more they use their Bible software, the more loyal they become (and the more money they’ll spend on upgrades in the future).
Data portability is a harder sell to Bible software vendors—after all, you’re giving people an easy way to move to different software. But you’re also letting people, in principle, integrate one activity—studying and annotating the Bible—with other activities in their lives. One example is a Facebook application that shows what Bible passages you’re reading that day. Another example is twittering a brief response to your Bible study. (Are these applications useful? Maybe not. But you can probably come up with some that are.)
The digerati are beginning to demand data portability in online applications; at some point, they’ll start demanding it in their offline applications, too. Will most users of Bible software also demand data portability? No, probably not right away. But they may start to do so if other programs and websites they use begin offering it.
Conclusion
We’re not suggesting that any Bible software vendors go out and start publishing their users’ data on SSDS—we only suggest that if they want data portability, synchronization, and users’ control of their data, then SSDS is a good fit.
Undoubtedly, we haven’t thought through a lot of problems related to this approach, but we thought we’d publish this idea and let others push it further if they want.
Our websites run on LAMP rather than on Microsoft technology, and the beauty of SSDS’s REST approach is that we don’t need to care what’s running under SSDS’s hood. (CouchDB, for example, runs on Erlang, which is cool but not the most mainstream language.) We’ve been considering using (and have been testing) Amazon’s SimpleDB for some upcoming applications, but SSDS overcomes a number of present limitations in SimpleDB (notably the 1,024-character limit, the need for massive concurrency to retrieve many items in a reasonable amount of time, and the lack of a numeric data type). The main questions for us about SSDS revolve around price and what limitations SSDS has that the materials released so far don’t mention. We’ve requested access to SSDS, so hopefully we’ll learn some specifics firsthand sometime soon.
If Microsoft can make good on the promise of SSDS, they’ll have made a huge contribution to the notion of cloud computing. At the very least, it will definitely be a good thing for Amazon and Microsoft to compete in this area; competition should result in better products from both of them.
Comments are temporarily open on this post; we may not publish every comment, but we’ll definitely read all of them and publish the constructive ones.
Appendix: Comparison of SSDS, SimpleDB, and CouchDB
| | SSDS | SimpleDB | CouchDB |
|---|---|---|---|
| Protocol | REST (the two examples published by Microsoft point to a true REST approach) or SOAP; XML output (though Microsoft also mentions Atom); JSON is possible in principle, though Microsoft has made no promises | REST (technically, if not in spirit) or SOAP; XML output | REST; JSON output |
| Query Language | LINQ; returns only complete entities | Proprietary Amazon language; returns item names—retrieving attributes requires one additional query per item | Javascript functions for mapping and, eventually, reducing |
| Immediacy | Not mentioned | Eventual consistency; new data may take several seconds to be available | Eventual consistency; views are fast |
| Smallest Updatable Unit | Property (source) | Attribute | Document (group of name/value pairs) |
| Limitations | Only in private beta; no benchmarks; scant information about other possible limitations (which undoubtedly exist) | Each attribute value can only be 1,024 bytes; no sorting; no types (hacks needed to query numbers); lots of parallel requests needed to fetch items | Currently in alpha; you manage the servers it runs on, unlike SSDS and SimpleDB; you can only update complete documents |
| Price | Unknown; Microsoft is targeting small- and medium-sized businesses | Increases with usage and data; starts small | Free (open source) |
| Use as Primary Data Source? | Latency issues (unless application server is in the same data center) make it infeasible if you need to make lots of queries in realtime. Microsoft reportedly plans to sell self-hosted editions of SSDS. | Could work if combined with EC2 to minimize latency | Too early for production |
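A note on the "hacks needed to query numbers" entry for SimpleDB: because every value is compared as a string, one common workaround is to zero-pad numbers to a fixed width so that string order matches numeric order. The Python sketch below shows the idea; the function names and the width of 10 digits are our own choices, and negative numbers (which need an added offset) are left out.

```python
def encode_number(n: int, width: int = 10) -> str:
    """Zero-pad a non-negative integer so that string order equals numeric order."""
    if n < 0 or n >= 10 ** width:
        raise ValueError("number does not fit in the chosen width")
    return str(n).zfill(width)


def decode_number(s: str) -> int:
    """Reverse encode_number."""
    return int(s)


# As plain strings, "9" sorts after "10"; once padded, the order is correct.
unpadded_is_wrong = "9" > "10"
padded_is_right = encode_number(9) < encode_number(10)
```

A store with a real numeric type would make this unnecessary.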




March 6th, 2008 at 11:50 am
Very nice post. I’ll probably be incorporating this into that one application I showed you that one time as it does most of the work that you have mentioned here (obviously I’m being vague so as not to reveal proprietary project secrets.) My own Bible/Seminary study is done almost entirely on the web. With the exception of BibleWorks, all my Bible/Seminary work (much of which is custom) is done via my iPod Touch (and as soon as I jailbreak my iPod Touch giving me RDP support– I’ll have BibleWorks!) Having centralized data is *critical* for my architectures, Bible-related or not. We need to be able to have centralized “resources” that we can “view” via various applications on various platforms using various technologies. That’s just good architecture.
Also, I can almost guarantee that there will be JSON support. The reason I say this is because Microsoft’s base communication system is Windows Communication Foundation which makes adding more protocols as simple as adding a line of XML to a configuration file (or for very strange protocols, simply writing a new binding and plugging it in– which can be any communication from your own proprietary XML format to LINQ). In the case of JSON, all they would need to do is add the webHttpBinding to their endpoint and voilà, JSON support. This is exactly what I did with my own Bible system service-architecture– something I’ll be writing about extensively in the coming months.
March 6th, 2008 at 12:50 pm
Microsoft’s David Treadwell also announced a Windows Live Application Based Storage service that he describes as follows:
“Application Based storage is an experimental API which allows application developers to store a small amount of state/configuration data in the Windows Live data centers on behalf of a user. This API has an AtomPub service end point so developers will be able to call this using ADO.NET data services or other AtomPub compatible tools. The real value kicks in here if an application was to have hundreds of thousands of users as the storage is offloaded to Windows Live infrastructure.”
See: http://oakleafblog.blogspot.com/2008/02/atom-and-atompub-support-extended-to.html
This sounds as if it would be an ideal companion to SSDS for saving user annotations.
For more on SSDS see http://oakleafblog.blogspot.com/2008/03/sql-server-data-services-to-deliver.html
ESV blog edit: Updated first link
March 6th, 2008 at 12:56 pm
Correction: The first link should be to http://oakleafblog.blogspot.com/2008/02/atom-and-atompub-support-extended-to.html
Sorry about that.
–rj
March 6th, 2008 at 1:05 pm
Colin Neller has notes from today’s Mix session about SSDS:
http://www.colinneller.com/blog/IntroToSQLServerDataServicesSSDSSQLServerInTheCloud.aspx
It looks like the API won’t be entirely RESTful; there are definitely verbs in those URLs. But that’s a relatively minor nitpick.
More problematic, the public beta won’t be until July (alas!), with a final launch next year. In the near term, SimpleDB seems like a better solution, especially if Amazon addresses some of the current perceived shortcomings over the next few months.
March 6th, 2008 at 2:02 pm
The Standard Bible Society Considers SSDS for Bible Software…
March 6th, 2008 at 3:02 pm
interesting to me, as one who loves my Bible software, but is bothered by the fact that I’ve already lost the ability to use some of the older programs that I purchased years ago. The ESV blog talks about some advances that could help in the future…
March 7th, 2008 at 7:39 am
LINQ and Entity Framework-Related Sessions WMVs from MIX08. Select a presentation from MIX08’s Silverlight sliding sessions panel. Wednesday Sessions Posted Thursday March 6, 2008. Pablo Castro: T07 - RESTful Data Services with ADO….
March 7th, 2008 at 4:08 pm
Nigel Ellis’s talk (from the SSDS product team) at Mix is now available at http://sessions.visitmix.com/?selectedSearch=BT05 .
Of note: it does look RESTful–the tool he demoed had GET/POST/PUT/DELETE buttons, and it looked like it returned HTTP status codes properly. That’s exciting.
Second: The smallest updatable unit is an entity, not a property. You have to PUT the whole entity back when you want to change part of it. It uses simple version control similar to how CouchDB does, so you send a version with your update, and if it’s out-of-sync with the version on the server, you get a 409/Conflict response. This approach sounds fantastic. (SimpleDB’s approach, allowing you to update individual attributes, is better–and simpler–in some cases, but it does force you to be more careful about overwriting your data.)
On the less fantastic side, latency from our servers here in Chicago to data.sitka.microsoft.com is fairly high. It’ll be interesting to see how fast it can be to query across the Internet–that’s a key question to determine the kinds of use cases SSDS is appropriate for if you need to access data synchronously from a server located outside Microsoft’s data centers.
March 8th, 2008 at 7:38 am
It is simple, but it is not SimpleDB. There are press articles and some blog posts here and here that is comparing SQL Server Data Services to Amazon SimpleDB or S3. If we look at the data model and query capabilities of SSDS as…