Converting Comma Separated Value to Rows and Vice Versa in SQL Server

By Arshad Ali

Introduction

Often while reporting you will encounter a situation where you have comma separated (or otherwise delimited) values in a single column but want to report them as rows. In other cases you might have values in multiple rows and want them combined into a single value separated by a comma or some other character. In this article, I am going to demonstrate how you can write queries in SQL Server to handle these scenarios quickly.

Converting Comma Separated Value to Rows

For converting a comma separated value to rows, I have written a user defined function that returns a table with the values in rows. It takes the comma separated values as its input parameter, iterates through the string as long as it finds a comma, inserts each value before the comma into a table, and finally returns the inserted data from that table.

It uses the CHARINDEX built-in function to search for the next comma in the input parameter and return its starting position, looping as long as that position is greater than zero. It then uses the STUFF built-in function to replace the part of the input parameter that has already been inserted into the table with a zero-length string, effectively removing the value before the comma. It also uses the LTRIM and RTRIM functions to remove any extra spaces from the beginning or end of each value.

CREATE FUNCTION dbo.BreakStringIntoRows(@CommadelimitedString VARCHAR(1000))
RETURNS @Result TABLE (Column1 VARCHAR(100))
AS
BEGIN
        DECLARE @IntLocation INT
        WHILE (CHARINDEX(',', @CommadelimitedString, 0) > 0)
        BEGIN
              SET @IntLocation = CHARINDEX(',', @CommadelimitedString, 0)
              INSERT INTO @Result(Column1)
              --LTRIM and RTRIM to ensure blank spaces are removed
              SELECT RTRIM(LTRIM(SUBSTRING(@CommadelimitedString, 0, @IntLocation)))
              SET @CommadelimitedString = STUFF(@CommadelimitedString, 1, @IntLocation, '')
        END
        INSERT INTO @Result(Column1)
        SELECT RTRIM(LTRIM(@CommadelimitedString)) --LTRIM and RTRIM to ensure blank spaces are removed
        RETURN
END
GO
 
--Using the UDF to convert comma separated values into rows
SELECT * FROM dbo.BreakStringIntoRows('Apple,Banana,Orange')
SELECT * FROM dbo.BreakStringIntoRows('Apple   ,    Banana,    Orange')
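On SQL Server 2016 and later, the built-in STRING_SPLIT table-valued function can replace such a user defined function entirely. Note that it does not trim spaces, so LTRIM and RTRIM may still be useful:

```sql
--Requires SQL Server 2016 or later (database compatibility level 130+)
SELECT RTRIM(LTRIM(value)) AS Column1
FROM STRING_SPLIT('Apple   ,    Banana,    Orange', ',')
```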

 

Result Screenshot

Converting Rows to Comma Separated Values

Before I go into detail and demonstrate converting rows into comma separated values, let’s first create a table and load some data with the script given below and as shown in the image.

CREATE TABLE Fruits
(
Name VARCHAR(50)
)
GO
INSERT Fruits(Name) VALUES ('Apple')
INSERT Fruits(Name) VALUES ('Banana')
INSERT Fruits(Name) VALUES ('Orange')
GO
SELECT * FROM Fruits
GO

 

Result Screenshot

 

Now we have values in rows and to convert them to a single comma separated value, you can use the script below, which uses the COALESCE inbuilt function.

DECLARE @ConcatString VARCHAR(4000)
SELECT @ConcatString = COALESCE(@ConcatString + ', ', '') + Name FROM Fruits
SELECT @ConcatString AS Fruits
GO
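On SQL Server 2017 and later, the built-in STRING_AGG aggregate function produces the same result without the variable-concatenation trick:

```sql
--Requires SQL Server 2017 or later
SELECT STRING_AGG(Name, ', ') AS Fruits FROM Fruits
```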

 

Result Screenshot

 

The COALESCE function takes a list of parameters, separated by commas, evaluates them and returns the value of the first of its input parameters that is not NULL.

Though the COALESCE and ISNULL functions have a similar purpose, they can behave differently. For example, ISNULL is evaluated only once, whereas the inputs to COALESCE can be evaluated multiple times; it evaluates them in order only until it reaches the first non-NULL value to return.
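Another difference worth knowing, shown here as a small illustration, is data type handling: ISNULL uses the data type of its first argument, while COALESCE uses the highest-precedence type among all its arguments:

```sql
DECLARE @s VARCHAR(2) = NULL
SELECT ISNULL(@s, 'ABCDE')   --returns 'AB', truncated to VARCHAR(2)
SELECT COALESCE(@s, 'ABCDE') --returns 'ABCDE'
```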

In the above example, I have considered the comma as a character to separate values; though you can modify the above script to have any other character like $ or |, etc. for separating values.

Conclusion

In this article, I talked about how you can write a query using inbuilt SQL Server functions to convert comma separated values into rows or convert multiple rows values to be a single value separated by a comma. I have demonstrated using a comma for separating values but you can use any other character for value separation.

Resources

CONCAT and STUFF Functions in SQL Server 2012

COALESCE function

CHARINDEX function

Converting Rows to Columns (PIVOT) and Columns to Rows (UNPIVOT) in SQL Server

By Arshad Ali

Introduction

In my last article “Converting Comma Separated Value to Rows and Vice Versa in SQL Server”, I talked about how you can convert comma separated (or separated with some other character) values in a single column into rows and vice versa. In this article, I demonstrate how you can convert rows values into columns values (PIVOT) and columns values into rows values (UNPIVOT) in SQL Server.

Converting Rows to Columns – PIVOT

SQL Server has a PIVOT relational operator to turn the unique values of a specified column from multiple rows into multiple column values in the output (cross-tab), effectively rotating a table. It also allows performing aggregations, wherever required, for column values that are expected in the final output. The basic syntax for a PIVOT relational operator looks like this:

SELECT <<ColumnNames>>
FROM <<TableName>>
PIVOT
 (
   AggregateFunction(<<ColumnToBeAggregated>>)
   FOR <<PivotColumn>> IN(<<PivotColumnValues>>)
 ) AS <<Alias>>

With the script below, let’s create a table and load some data into it. As you can see in the image below, it has sales information for some countries for a couple of years. Also, if you notice, for each country and for each year there is a separate row.

CREATE TABLE [dbo].[PivotExample](
       [Country] [nvarchar](50) NULL,
       [Year] [smallint] NOT NULL,
       [SalesAmount] [money] NULL
)
GO

INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'Australia', 2005, 1309047.1978)
INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'Germany', 2006, 521230.8475)
INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'United States', 2007, 2838512.3550)
INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'France', 2008, 922179.0400)
INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'Australia', 2007, 3033784.2131)
INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'France', 2005, 180571.6920)
INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'United Kingdom', 2006, 591586.8540)
INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'Canada', 2006, 621602.3823)
INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'United Kingdom', 2005, 291590.5194)
INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'United States', 2005, 1100549.4498)
INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'Canada', 2007, 535784.4624)
INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'France', 2007, 1026324.9692)
INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'Germany', 2007, 1058405.7305)
INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'Australia', 2006, 2154284.8835)
INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'United Kingdom', 2008, 1210286.2700)
INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'United States', 2008, 3324031.1600)
INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'Germany', 2008, 1076890.7700)
INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'United Kingdom', 2007, 1298248.5675)
INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'Australia', 2008, 2563884.2900)
INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'Canada', 2005, 146829.8074)
INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'Germany', 2005, 237784.9902)
INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'Canada', 2008, 673628.2100)
INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'United States', 2006, 2126696.5460)
INSERT [dbo].[PivotExample]([Country], [Year], [SalesAmount]) VALUES (N'France', 2006, 514942.0131)
GO

SELECT * FROM [dbo].[PivotExample] ORDER BY Country
GO

Results Screenshot

 

Now by using the PIVOT operator, we will convert row values into column values with the script given below and the results as shown in the image below.

Though we have used the SUM aggregation function, in this case there is no summation, as there is only one row for each unique combination of country and year. Please note the use of brackets for the pivot column values; these are required.

SELECT [Country], [2005], [2006], [2007], [2008]
FROM [dbo].[PivotExample]
PIVOT
(
       SUM(SalesAmount)
       FOR [Year] IN([2005], [2006], [2007], [2008])
) AS P

Results Screenshot

Bringing Dynamism to the PIVOT

If you notice in the above script, we have provided values (2005, 2006, 2007 and 2008) for pivot columns as these values are available in the original datasets. But what if some additional values are expected to come in the future, for example 2009 and 2010, etc. for the pivot column?

In that case, you can still include pivot column values that are expected in the future (or are not yet available in the original dataset), though you will see NULL for their values. The script below shows this scenario, and the image below shows NULLs for the years 2009 and 2010, as there is no data for these years.

SELECT [Country], [2005], [2006], [2007], [2008], [2009], [2010]
FROM [dbo].[PivotExample]
PIVOT
(
       SUM(SalesAmount)
       FOR [Year] IN([2005], [2006], [2007], [2008], [2009], [2010])
) AS P

Results Screenshot

The above discussed approach works fine if you already know all of the possible values for the pivot column, but what if you don’t?

In that case, you can write a dynamic query that first grabs all the unique values for the pivot column at runtime, builds the pivot query from them, and then executes it, as shown below:

--Declare necessary variables
DECLARE @SQLQuery AS NVARCHAR(MAX)
DECLARE @PivotColumns AS NVARCHAR(MAX)

--Get unique values of pivot column
SELECT @PivotColumns = COALESCE(@PivotColumns + ',', '') + QUOTENAME(Year)
FROM (SELECT DISTINCT Year FROM [dbo].[PivotExample]) AS PivotExample

SELECT @PivotColumns

--Create the dynamic query with all the values for
--pivot column at runtime
SET @SQLQuery =
    N'SELECT Country, ' + @PivotColumns + '
    FROM [dbo].[PivotExample]
    PIVOT( SUM(SalesAmount)
          FOR Year IN (' + @PivotColumns + ')) AS P'

SELECT @SQLQuery
--Execute dynamic query
EXEC sp_executesql @SQLQuery

 

Results Screenshot

Converting Columns to Rows – UNPIVOT

UNPIVOT is another relational operator in SQL Server that performs almost the reverse operation of PIVOT, rotating column values into row values. Let me demonstrate this with an example; let's create a table containing pivoted data from the previous query with the script below. The image below shows the data of the newly created table.

SELECT [Country], [2005], [2006], [2007], [2008]
INTO [dbo].[UnpivotExample]
FROM [dbo].[PivotExample]
PIVOT
(
       SUM(SalesAmount)
       FOR [Year] IN([2005], [2006], [2007], [2008])
) AS P
GO
SELECT * FROM [dbo].[UnpivotExample] ORDER BY Country
GO

Results Screenshot

To rotate column values into row values, we will now use the UNPIVOT operator as shown below. The image below shows rotated data:

SELECT Country, Year, SalesAmount
FROM [dbo].[UnpivotExample]
UNPIVOT
(
       SalesAmount
       FOR [Year] IN([2005], [2006], [2007], [2008])
) AS P

Results Screenshot

If you refer back, I said UNPIVOT is almost the reverse of the PIVOT operator; the result might or might not be exactly the same as the original data. The reason is that PIVOT performs aggregation while rotating row values into column values, and may merge multiple row values into a single column value in the output. For example, suppose that for a given country and year there are two values, say 5000 and 6000. When you pivot with SUM as the aggregation function, the output will have 11000 as the column value. If you later unpivot it back, you will get 11000, not the original bifurcated values (5000 and 6000). In other words, if the pivoted values are aggregated values, it will not be possible to get the original data back.
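A minimal sketch of this loss of information, using a hypothetical PivotLossDemo table (not part of the examples above):

```sql
CREATE TABLE dbo.PivotLossDemo
(
Country NVARCHAR(50),
[Year] SMALLINT,
SalesAmount MONEY
)
GO
INSERT dbo.PivotLossDemo VALUES (N'France', 2005, 5000)
INSERT dbo.PivotLossDemo VALUES (N'France', 2005, 6000)
GO
--The two rows collapse into a single pivoted value of 11000.00;
--UNPIVOT cannot recover the original 5000 and 6000.
SELECT [Country], [2005]
FROM dbo.PivotLossDemo
PIVOT
(
       SUM(SalesAmount)
       FOR [Year] IN([2005])
) AS P
```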

Conclusion

In this article, I demonstrated how you can convert row values into column values (PIVOT) and column values into row values (UNPIVOT) in SQL Server. I also talked about writing a dynamic pivot query in which the possible pivot column values are determined at runtime.


Client-Side Storage

Introduction

This is an overview of client-side storage, a general term for several separate but related APIs: Web Storage, Web SQL Database, Indexed Database, and File Access. Each of these techniques provides a distinct way to store data on the user's hard drive, instead of the server, where data usually resides. There are two main reasons to do this: (a) to make the web app available offline; (b) to improve performance. For a detailed explanation of the use cases for client-side storage, see the HTML5Rocks article, "Offline": What does it mean and why should I care?.

The APIs share a similar scope and similar principles, so let's first understand what they have in common before launching into the specifics of each.

Common Features

Storage on the Client Device

In practice, "client-side storage" means data is passed to the browser's storage API, which saves it on the local device in the same area as it stores other user-specific information, e.g. preferences and cache. Beyond saving data, the APIs let you retrieve data, and in some cases, perform searches and batch manipulations.

Sandboxed

All four storage APIs tie data to a single "origin", e.g. if http://abc.example.com saves some data, then the browser will only permit http://abc.example.com to access that data in the future. When it comes to "origins", the domain must be exactly the same, so http://example.com and http://def.example.com are both disqualified. The port must match too, so http://abc.example.com:123 cannot see data saved by http://abc.example.com (which defaults to port 80), and the protocol must match as well (http versus https, etc.).
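As a sketch of the matching rule (a hypothetical helper, not part of any storage API), two URLs share an origin only when protocol, host, and port all agree:

```javascript
// Hypothetical illustration of the same-origin rule: protocol, host,
// and port must all match. Default ports: 80 for http, 443 for https.
function sameOrigin(a, b) {
  function normalize(url) {
    var m = /^(https?):\/\/([^\/:]+)(?::(\d+))?/.exec(url);
    var port = m[3] || (m[1] === "https" ? "443" : "80");
    return m[1] + "://" + m[2] + ":" + port;
  }
  return normalize(a) === normalize(b);
}

sameOrigin("http://abc.example.com", "http://abc.example.com:80");  // true
sameOrigin("http://abc.example.com:123", "http://abc.example.com"); // false
sameOrigin("http://example.com", "http://def.example.com");         // false
```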

Quotas

You can imagine the chaos if any website was allowed to populate unsuspecting hard drives with gigabytes of data! Thus, browsers impose limits on storage capacity. When your app attempts to exceed that limit, the browser will typically show a dialog to let the user confirm the increase. You might expect the browser to enforce a single limit for all storage an origin can use, but most enforce limits separately for each storage mechanism. This will change as the Quota API is adopted, but for now, you should think of the browser as maintaining a 2-D matrix, with "origin" in one dimension and "storage" in the other. For example, "http://abc.example.com" may be allowed to store up to 5MB of Web Storage, 25MB of Web SQL Database Storage, and forbidden to use Indexed Database due to the user denying access. The Quota API brings this into a central location and lets you query how much space is available and in use.

There are also environments where the user can see upfront how much storage will be used, e.g. in the case of the Chrome Web Store, when a user installs an app, they will be prompted upfront to accept its permissions, which include storage limits. One possible value in the manifest is "unlimited_storage".

Transactions

The two "database" storage formats support transactions. The aim is the same reason regular relational databases use transactions: to ensure the integrity of the database. Transactions prevent "race conditions", a phenomenon where two sequences of operations are applied to the database at the same time, leading to unpredictable results and a database whose state is of dubious accuracy.

Synchronous and Asynchronous Modes

Most of the storage formats support both synchronous and asynchronous modes. Synchronous mode is blocking: the storage operation executes to completion before the next line of JavaScript runs. Asynchronous mode lets the next lines of JavaScript execute before the storage operation completes; the operation is performed in the background, and the application is notified when it finishes by way of a callback function, which must be specified when the call is made.
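A minimal sketch of the asynchronous callback pattern these APIs follow (the asyncGet helper and in-memory store are hypothetical, for illustration only):

```javascript
// Hypothetical asynchronous read: the result is delivered to a callback
// after the current run of JavaScript has finished.
function asyncGet(store, key, callback) {
  // setTimeout(..., 0) stands in for the background work a real
  // storage API would perform off the main thread.
  setTimeout(function () {
    callback(store[key]);
  }, 0);
}

console.log("before");
asyncGet({ mood: "happy" }, "mood", function (value) {
  console.log("got " + value); // runs last, after "after"
});
console.log("after");
```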

Synchronous mode should be avoided at all costs. It may seem like a simpler API, but it blocks rendering on the page while the operation completes, and in some cases can freeze the whole browser. You've probably noticed when sites and even apps do this: you click on a button and everything freezes, you wonder whether it's crashed, and then it springs back to life.

Some APIs, such as localStorage, don't have an async mode. You should carefully monitor the performance of your use of these APIs, and be prepared to switch to one of the async APIs if it becomes an issue.

Overview and Comparison of APIs

Web Storage

Web Storage is basically a single persistent object called localStorage. You can set values using localStorage.foo = "bar" and retrieve them later on — even when the browser has been closed and re-opened — as localStorage.foo. There's also a second object called sessionStorage available, which works the same way, but clears when the window is closed.

Web Storage is an example of a NoSQL key-value store.
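Because the major implementations store only strings, structured data must be serialized. A minimal sketch (the helper names are mine, not part of the Web Storage API) that works with any Storage-like object:

```javascript
// localStorage values are strings, so objects must go through JSON.
// "storage" can be window.localStorage or anything with the same methods.
function saveObject(storage, key, obj) {
  storage.setItem(key, JSON.stringify(obj));
}

function loadObject(storage, key) {
  var raw = storage.getItem(key);
  return raw === null ? null : JSON.parse(raw);
}

// Usage in a browser:
//   saveObject(localStorage, "prefs", { theme: "dark", fontSize: 14 });
//   var prefs = loadObject(localStorage, "prefs");
```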

Strengths of Web Storage

  1. Supported on all modern browsers, as well as on iOS and Android, for several years (IE since version 8).
  2. Simple API signature.
  3. Simple call flow, being a synchronous API.
  4. Semantic events available to keep other tabs/windows in sync.

Weaknesses of Web Storage

  1. Poor performance for large/complex data, when using the synchronous API (which is the most widely supported mode).
  2. Poor performance when searching large/complex data, due to lack of indexing. (Search operations have to manually iterate through all items.)
  3. Poor performance when storing and retrieving large/complex data structures, because it's necessary to manually serialize and de-serialize to/from string values. The major browser implementations only support string values (even though the spec says otherwise).
  4. Need to ensure data consistency and integrity, since data is effectively unstructured.

Web SQL Database

Web SQL Database is a structured database with all the functionality - and complexity - of a typical SQL-powered relational database. Indexed Database sits somewhere between the two. It has free-form key-value pairs, like Web Storage, but also the capability to index fields from those values, so searching is much faster.

Strengths of Web SQL Database

  1. Supported on major mobile browsers (Android Browser, Mobile Safari, Opera Mobile) as well as several desktop browsers (Chrome, Safari, Opera).
  2. Good performance generally, being an asynchronous API. Database interaction won't lock up the user interface. (A synchronous API is also available for WebWorkers.)
  3. Good search performance, since data can be indexed according to search keys.
  4. Robust, since it supports a transactional database model.
  5. Easier to maintain integrity of data, due to rigid data structure.

Weaknesses of Web SQL Database

  1. Deprecated. Will not be supported on IE or Firefox, and will probably be phased out from the other browsers at some stage.
  2. Steep learning curve, requiring knowledge of relational databases and SQL.
  3. Suffers from object-relational impedance mismatch.
  4. Diminishes agility, as database schema must be defined upfront, with all records in a table matching the same structure.

Indexed Database (IndexedDB)

So far, we have seen that Web Storage and Web SQL Database both have major strengths as well as major weaknesses. Indexed Database has arisen from experiences with both of those earlier APIs, and can be seen as an attempt to combine their strengths without incurring their weaknesses.

An Indexed Database is a collection of "object stores" which you can just drop objects into. The stores are something like SQL tables, but in this case, there's no constraints on the object structure and so no need to define anything upfront. So this is similar to Web Storage, with the advantage that you can have as many databases as you like, and as many stores within each database. But unlike Web Storage, there are important performance benefits: An asynchronous API, and you can create indexes on stores to improve search speed.

Strengths of IndexedDB

  1. Good performance generally, being an asynchronous API. Database interaction won't lock up the user interface. (A synchronous API is also available for WebWorkers.)
  2. Good search performance, since data can be indexed according to search keys.
  3. Supports versioning.
  4. Robust, since it supports a transactional database model.
  5. Fairly easy learning curve, due to a simple data model.
  6. Decent browser support: Chrome, Firefox, mobile FF, IE10.

Weaknesses of IndexedDB

  1. Very complex API resulting in large amounts of nested callbacks.

FileSystem

The previous formats are all suitable for text and structured data, but when it comes to large files and binary content, we need something else. Fortunately, we now have a FileSystem API standard. It gives each domain a full hierarchical filesystem, and in Chrome at least, these are real files sitting on the user's hard drive. For reading and writing of individual files, the API builds on the existing File API.

Strengths of FileSystem API

  1. Can store large content and binary files, so it's suitable for images, audio, video, PDFs, etc.
  2. Good performance, being an asynchronous API.

Weaknesses of FileSystem API

  1. Very early standard. Only available in Chrome and Opera.
  2. No transaction support.
  3. No built-in search/indexing support.

Show Me the Code

This section compares how the various APIs tackle the same problem. The example is a "geo-mood" check-in system, where you can track your mood across time and place. The interface lets you switch between database types. Of course, this is slightly contrived as in real world situations, one database type will always make more sense than the rest, and FileSystem API is not suited to this kind of application at all! But for demonstration purposes, it's helpful indeed if we can see the different means we can use to achieve the same end. Note too that some of the code snippets have been refactored for readability.

Try the Geo-Mood demo now.

To make the demo interesting, we'll isolate the data storage aspects using standard object-oriented design techniques. The UI logic will only know there is a "store"; it won't need to know how the store is implemented, because each store exposes exactly the same methods. So the UI code can just call store.setup(), store.count(), and so on. In reality, there are four implementations of the store, one for each storage type. When the app starts up, it inspects the URL and instantiates the right store.

To keep the API consistent, the methods are asynchronous, i.e. they pass results back to the caller via callbacks. This is true even for the Web Storage implementation, where the underlying operations are synchronous.

In the walkthroughs below, we'll skip the UI and geolocation logic to focus on the storage techniques.

Setting up the Store

For localStorage, we do a simple check to see if the store exists. If not, we'll create a new array and store it against the localStorage "checkins" key. We use JSON to convert the structure to a string first, since, in most browsers, localStorage only stores strings.

if(!localStorage.checkins) localStorage.checkins = JSON.stringify([]);

For Web SQL Database, we need to create the database structure if it doesn't exist. openDatabase fortunately creates the database automatically if it doesn't exist, and, likewise, we use the SQL phrase "if not exists" to ensure the new checkins table is not overridden if it is already present. We have to define the structure of the data upfront, i.e. the name and type of each column in the checkins table. Each row will represent a single checkin.

this.db = openDatabase("geomood", "1.0", "Geo-Mood Checkins", 8192);
this.db.transaction(function(tx) {
  tx.executeSql(
    "create table if not exists " +
    "checkins(id integer primary key asc, time integer, latitude float," +
    "longitude float, mood string)",
    [],
    function() { console.log("success"); }
  );
});

Indexed Database setup takes some work, because it enforces a database version system. When we make a connection to our database, we specify which version we're expecting. If the current database uses a previous version, or hasn't been created yet, the onupgradeneeded event is fired, and onsuccess is called once the upgrade is complete. If no upgrade is needed, onsuccess is called straight away.

Another thing we do here is create a mood index, so we will later be able to quickly search for all checkins matching a particular mood.

var db;
var version = 1;

window.indexedStore = {};

window.indexedStore.setup = function(handler) {
  // attempt to open the database
  var request = indexedDB.open("geomood", version);

  // upgrade/create the database if needed
  request.onupgradeneeded = function(event) {
    var db = request.result;
    if (event.oldVersion < 1) {
      // Version 1 is the first version of the database.
      var checkinsStore = db.createObjectStore("checkins", { keyPath: "time" });
      checkinsStore.createIndex("moodIndex", "mood", { unique: false });
    }
    if (event.oldVersion < 2) {
      // In future versions we'd upgrade our database here.
      // This will never run here, because we're version 1.
    }
    db = request.result;
  };

  request.onsuccess = function(ev) {
    // assign the database for access outside
    db = request.result;
    handler();
    db.onerror = function(ev) {
      console.log("db error", arguments);
    };
  };
};

Finally, FileSystem setup. We'll store each checkin in its own file, JSON-encoded, and all of them inside a "checkins/" directory. Again, this is not the most appropriate use of FileSystem API, but good for demonstration purposes.

The setup gets a handle on the overall FileSystem, using it to check for the "checkins" directory. If it's not there, we create it with getDirectory.

setup: function(handler) {
  requestFileSystem(
    window.PERSISTENT,
    1024 * 1024,
    function(fs) {
      fs.root.getDirectory(
        "checkins",
        {}, // no "create" option, so this is a read op
        function(dir) {
          checkinsDir = dir;
          handler();
        },
        function() {
          fs.root.getDirectory(
            "checkins",
            { create: true },
            function(dir) {
              checkinsDir = dir;
              handler();
            },
            onError
          );
        }
      );
    },
    function(e) {
      console.log("error " + e.code + " initialising - see http://goo.gl/YW0TI");
    }
  );
}

Saving a Check-in

With localStorage, we simply pull the check-in array out, add a new one to the end, and save it again. We also have to do the JSON dance to store it in string form.

var checkins = JSON.parse(localStorage["checkins"]);
checkins.push(checkin);
localStorage["checkins"] = JSON.stringify(checkins);

With Web SQL Database, we run everything inside a transaction. We're going to create a new row in the checkins table. It's a straightforward SQL call, and instead of including the checkin data in the "insert" command, we use "?" syntax because it's cleaner and more secure. The actual data - the four values we want to store as columns in the new checkins row - are specified in the second argument. The "?" elements will be replaced by those values (checkin.time, checkin.latitude, etc.). The next two arguments indicate functions which will be called when the operation has completed, one for success and one for failure. In this app, we use the same generic error handler for all transactions. In this case, the success function is simply the handler that was passed into the save function - we ensure the handler will be called on success so that the UI logic can be notified when the operation has completed (e.g. to update the count of checkins so far).

store.db.transaction(function(tx) {
  tx.executeSql(
    "insert into checkins " +
    "(time, latitude, longitude, mood) values (?,?,?,?);",
    [checkin.time, checkin.latitude, checkin.longitude, checkin.mood],
    handler,
    store.onError
  );
});

Once the store is set up, saving in IndexedDB is almost as simple as Web Storage, with the advantage of working asynchronously, in a transaction:

var transaction = db.transaction("checkins", "readwrite");
transaction.objectStore("checkins").put(checkin);
transaction.oncomplete = handler;

With the FileSystem store, once we create a file and get a handle on it, we can use the FileWriter API to populate it:

fs.root.getFile(
  "checkins/" + checkin.time,
  { create: true, exclusive: true },
  function(file) {
    file.createWriter(function(writer) {
      writer.onerror = fileStore.onError;
      var bb = new WebKitBlobBuilder();
      bb.append(JSON.stringify(checkin));
      writer.write(bb.getBlob("text/plain"));
      handler();
    }, fileStore.onError);
  },
  fileStore.onError
);

Searching for Matching Check-ins

The next function fishes out all checkins matching a particular mood, so the user can see where and when they were happy recently, for example. With localStorage, we have to manually walk through each checkin and compare it to the mood, building up a list of matches. It's good practice to return clones of the stored data rather than the actual objects, since searching should be a read-only operation; hence we pass each matching checkin object through a generic clone() operation.

var allCheckins = JSON.parse(localStorage["checkins"]);
var matchingCheckins = [];
allCheckins.forEach(function(checkin) {
  if (checkin.mood == moodQuery) {
    matchingCheckins.push(clone(checkin));
  }
});
handler(matchingCheckins);
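The article doesn't show clone() itself; one minimal way to implement it for plain data objects like checkins (an assumption on my part, since checkins are JSON-serializable anyway) is a JSON round-trip:

```javascript
// Deep-copies plain data (objects, arrays, numbers, strings, booleans).
// Not suitable for functions, Dates, or cyclic structures.
function clone(obj) {
  return JSON.parse(JSON.stringify(obj));
}
```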

With Web SQL Database, we perform a query that returns only the checkin rows that we need. However, we still have to manually walk through that list to accumulate the checkin structures, as the database API returns database rows, rather than an array. (This is a good thing for large result sets, but right now, it adds some work for us to do!)

var matchingCheckins = [];
store.db.transaction(function(tx) {
  tx.executeSql(
    "select * from checkins where mood=?",
    [moodQuery],
    function(tx, results) {
      for (var i = 0; i < results.rows.length; i++) {
        matchingCheckins.push(clone(results.rows.item(i)));
      }
      handler(matchingCheckins);
    },
    store.onError
  );
});

Naturally enough, the IndexedDB solution uses an index: the index on "mood" we created earlier, called "moodIndex". We use a cursor to iterate through each checkin matching the query. Note that this cursor pattern can also be used against an entire store; so, with indexes, it's like we get a window into the store where we can only see matching objects (similar to a "view" in traditional databases).

var store = db.transaction("checkins", "readonly").objectStore("checkins");
var request = moodQuery ?
  store.index("moodIndex").openCursor(IDBKeyRange.only(moodQuery)) :
  store.openCursor();

request.onsuccess = function(ev) {
  var cursor = request.result;
  if (cursor) {
    handler(cursor.value);
    cursor["continue"]();
  }
};

With FileSystem, as with many traditional filesystems, there's no indexing, so a search algorithm (like that used by the Unix "grep" command) must iterate through each file. We extract a Reader from the checkins directory, which lets us walk through each file via readEntries(). For each file, we again extract a reader and inspect its contents via readAsText(). As these operations are asynchronous, we need to chain the calls together, which is the function served by readNextFile().

checkinsDir.createReader().readEntries(function(files) {
  var reader, fileCount = 0, checkins = [];
  var readNextFile = function() {
    reader = new FileReader();
    if (fileCount == files.length) return;
    reader.onload = function(e) {
      var checkin = JSON.parse(this.result);
      if (moodQuery == checkin.mood || !moodQuery) handler(checkin);
      readNextFile();
    };
    files[fileCount++].file(function(file) { reader.readAsText(file); });
  };
  readNextFile();
});

Counting All Checkins

Finally, we need to count all checkins.

For localStorage, we simply de-serialize the checkins array structure and find its length.

handler(JSON.parse(localStorage["checkins"]).length);

With Web SQL Database, we could retrieve each row in the database (select * from checkins) and look at the length of the result set, but if we know our way around SQL, there's an easier - and faster - way. We can perform a special select statement to retrieve the count. It will return exactly one row, having one column containing the count.

store.db.transaction(function(tx) {
  tx.executeSql("select count(*) from checkins;", [],
    function(tx, results) {
      handler(results.rows.item(0)["count(*)"]);
    },
    store.onError
  );
});

Unfortunately, IndexedDB doesn't offer any counting facility, so we have to iterate through all checkins.

var count = 0;
var request = db.transaction(["checkins"], 'readonly').objectStore("checkins").openCursor();
request.onsuccess = function(ev) {
  var cursor = request.result;
  cursor ? ++count && cursor["continue"]() : handler(count);
};

 

For FileSystem, a directory reader's readEntries() method provides a list of files, so we can just return the length of that list.

checkinsDir.createReader().readEntries(function(files){
  handler(files.length);});

Summary

This has been a high-level overview of modern client-side storage techniques. You should also check out the overview on offline apps.

Author : Michael Mahemoff

The Meaning of "Offline"

"Web" and "online" are two closely associated terms, downright synonymous to many people. So why on earth would we talk about "offline" web technologies, and what does the term even mean?

At one level, we can talk about completely offline web apps, those unusual creatures that run inside a browser, but never see the light of the internet. Typically, they will run on a file:/// URI, pointing to a hard drive, USB key, or DVD (if you remember those). For example, those USB keys you sometimes get at conferences, which you stick in your computer to receive a multimedia presentation of something-or-other, running in your browser. A more complex example is the concept of Single Page Applications, like TiddlyWiki, a kind of personal notepad that runs entirely offline, exploiting custom browser features to save changes to the hard drive.

However, it gets more nuanced than "always-offline". Another class of applications are "online-offline" apps (sometimes just called "offline apps", somewhat ambiguously). These apps are ultimately intended to sync with the cloud, but can survive periods of downtime. The downtime might be an interim loss of connection, like passing through a tunnel, or an extended period of offline activity, like a long-haul flight. Online-offline applications will handle these cases with varying degrees of gracefulness. An example is GMail's offline facility.

There are also online-offline applications which intentionally keep certain information locally. Sometimes, that might be done to tie data to the browser, rather than the user's account, e.g. the current song being played by a media player application. Local storage is also an interesting alternative to server-based storage for simple applications without user registration; by offloading data storage to clients, you avoid the hassle of maintaining a database on your server. Of course, you should only do this for non-critical data, e.g. a list of recent searches. Another reason is about security and privacy, the concept of Host-Proof Hosting. An early example is HalfNote, by Aaron Boodman (now a Chrome engineer).

Online-offline applications come at the cost of increased complexity, and you ought to consider whether offline support is justified. The argument against this mode is best summed up in a famous article from a few years back, in which David Heinemeier Hansson argues that it's not worth building offline apps just for airplane usage, which is the one place you're likely to be offline for a long period of time. But there are still situations where it's worthwhile, and also situations where offline technologies are useful for protecting against server or network outages. Just beware that it comes at the cost of complexity, and that complexity needs to be justified.

Even assuming internet utopia, where everything is always online, offline technologies still serve a purpose. Why? Because it's faster that way. Offline technologies support caching and detailed control over the caching process. Therefore, web apps can boot quickly and show data instantly. You might protest that these technologies shouldn't be called "offline" if they are used by purely online apps. Well, the term is ambiguous. At one level, "offline technologies" could be defined as "technologies supporting offline apps", making the protest valid. But another definition is "technologies operating outside of the cloud", and that's the one used here. Of course, there's a lot of overlap between the two definitions; offline apps certainly need to operate outside of the cloud.

As well as quick startup, offline can be used in a transient way too. Although bandwidth is much faster these days, there are still noticeable limitations for certain applications, in terms of both throughput and latency. If you were designing a high-definition video editor, you'd probably find the video is too big to fit into memory at once, meaning you have to swap content in and out of your app. It would be painfully slow to store the whole thing on a server (unless you and all your users are lucky enough to be on a super-fast network). Much better to store the entire video offline and push changes to the server as and when they happen.

Application Cache and Offline Storage

There are primarily two offline capabilities in HTML5: application caching and offline storage (or "client-side storage"). The distinction is core application logic versus data. Application caching involves saving the application's core logic and user-interface. Offline storage is about capturing specific data generated by the user, or resources the user has expressed interest in.

Imagine a game. Application caching would retain the initial HTML document, the JavaScript, the CSS, frequently used icons and images for game characters and scenes, and sound samples. Consequently, the next time the user visited the site, it would load immediately. To some extent, regular browser caching will already enable that capability, but browser caches are easily overwritten, whereas application caches are intended to enjoy special status on the user's hard drive, so they can't be displaced just because the user downloaded a large video. In addition, there are APIs to control what's cached and what's not, which is not the same as regular browser caching.

In our game example, visiting the site the second time round would load the site instantly, but what if we wanted to let the user continue playing the game from a previous position, i.e. we'd like to restore the state of the game. The kind of data required is not held in the application cache, because it's user-specific, so this is a job for offline storage. Whenever the user hits save, we upload to the cloud but we also store game data on disk at the same time, i.e. store it offline (for example, we might save the player's health level and the states of all objects and characters inside the game universe). We might even keep a loop running to continuously store the game data offline. That way, the latest changes are still there if the user lost connectivity and had to shut down the browser before they could get online again. It also takes less bandwidth than continuously uploading state to the server, e.g. we might save data locally every 10 seconds, while uploading it only every 5 minutes as a bandwidth-saving measure.

Read more about AppCache.

Older Offline Storage Techniques

While there's only one kind of application cache, there's a whole flock of offline storage capabilities available, or becoming available. Let's begin with a quick survey of older storage techniques, those that have been around several years and are not generally associated with the "HTML5" label.

Cookies

Cookies have been around since the early days of the web. They were originally intended to associate a little data with the user, though these days, most applications only store identifying information, and store the rest of the user's data on the server. Still, the fact that cookies can store some data does put them in the category of offline storage. However, the data capacity is extremely limited: the specification allows for as few as 20 cookies per domain, limited to 4KB each. Just as important, cookies slow down network activity because they are transferred to and from the server inside HTTP headers.

Plugin Based Storage

Plugin based storage has been supported by several plugins, notably Flash, Java, and Google Gears. Since Flash 6, there has been the concept of a local shared object. Java applets have also been an option, since Java can write to local files. Google Gears is a third example of a plugin, and in its case, offline storage was one of the main selling points right from its inception. The main downside of plugin based storage is the obvious one: You have to assume the plugin is present, and with the right version. It's not uncommon for company firewalls to block plugins and for plugin releases to lag behind, or not exist at all, on certain operating systems. You also have to have faith in the plugin manufacturer keeping up plugin quality and market share going forward, versus the easy substitutability of open-source, standards-based browsers.

Browser-specific features

Browser-specific features are capabilities that are available, but outside the realm of standard web technologies. They typically have an "underground" nature, being little-known and lightly documented. For example, TiddlyWiki and the derivative twFile jQuery plugin use IE-specific and Firefox-specific APIs to access the underlying file system on those respective browsers, falling back to Java applets where that's not possible. IE, since version 5, also supports the UserData API. Similar to plugin based storage, the problem with these is significant: You're requiring the user to be working with a particular browser and you're putting faith in the browser manufacturer going forward. This is different from using features only present in just one or two browsers so far, where the feature is part of an open standards process and other manufacturers are likely to include it in the future.

Offline Storage in the Era of HTML5

The newer storage APIs, what we might call "HTML5 storage", are generally superior in terms of openness and standards compliance. Of course, not all browsers include all of the new APIs, and you may have to support older browsers that don't include any of them at all. So the older techniques are still useful for graceful degradation.

The new APIs are Web Storage (also called "Local Storage" or "DOM Storage"), Web SQL Database (or simply "SQL Storage"), IndexedDB, and File Storage. In all cases, the same-origin principle applies: the store is tied to the origin of the site that creates it (typically this amounts to the site domain or subdomain), so it cannot be accessed by any other origin. Stores and origins are therefore associated 1:N, where N is the number of distinct storage mechanisms represented by the current browser. For example, Chrome currently supports Web Storage and Web SQL Database Storage, so a given origin (e.g., "html5rocks.com") has access to precisely two dedicated stores when the user is visiting from Chrome.

Now let's meet each of the new storage technologies.

Web Storage

Web Storage is a very convenient form of offline storage, being just a simple structure of key-value pairs like any JavaScript object. This works well for situations such as the game example, where you could store the current game under a "currentGame" key: localStorage["currentGame"] = gameState, and load it back the same way: gameState = localStorage["currentGame"]. If you wanted to keep a list of saved games, you'd just store an array inside this structure, savedGames = localStorage["savedGames"];.

Although the spec indicates values can be any type, this is not the case in practice. Browsers presently support only string values, so you will need to serialize your values for storage (i.e., convert them to strings). Fortunately, we have built-in serialization these days, by way of the JSON API, so you can do something like: localStorage["currentGame"] = JSON.stringify(gameState) and gameState = JSON.parse(localStorage["currentGame"]).

There's also a variant of localStorage: sessionStorage. It's there mostly for security purposes, as the data is removed when the window is closed.

For many offline storage scenarios, Web Storage might be all you need, and it's the most compatible format right now. However, the data isn't structured, so if you wanted to locate all games at level 10 or above, for example, you'd have to iterate through every single game item manually.
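The manual iteration this implies can be sketched as follows. This is an illustrative example only (the game names and the gamesAtLevel helper are hypothetical), with localStorage stubbed as a plain object so the sketch runs outside a browser:

```javascript
// Stand-in for window.localStorage, so this sketch runs anywhere.
var localStorage = {};
localStorage["savedGames"] = JSON.stringify([
  { name: "a", level: 4 },
  { name: "b", level: 12 },
  { name: "c", level: 10 }
]);

// With no query facility, finding all games at level 10 or above means
// deserializing the whole collection and walking it ourselves.
function gamesAtLevel(minLevel) {
  var games = JSON.parse(localStorage["savedGames"]);
  return games.filter(function(game) {
    return game.level >= minLevel;
  });
}
```

Calling gamesAtLevel(10) here walks every saved game, which is fine for a handful of items but scales poorly; that's exactly the gap IndexedDB's indexes fill, as we'll see below.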

A gotcha and a further downside of Web Storage is the lack of transactions. For example, the user might have your app open in two tabs. Tab 1 starts writing several things to the database, then tab 2 starts reading it, getting the data when it has only partially been updated. Thus, Web Storage is "racy", meaning it's susceptible to race conditions, and you need to take precautions to ensure the integrity of your data and the accuracy of any queries.

Web SQL Database

Web SQL Database is just like it says on the tin: an offline SQL database. Implementations today are based on SQLite, a general-purpose open-source SQL engine. It comes with all the pros and cons of traditional databases. On the one hand, you have a fully relational structure, allowing you to rapidly query and manipulate data via joins, e.g. "delete all saved games where the level is less than 10 and the game hasn't been loaded in the past 30 days". You can express that in a line of SQL, whereas you'd have to manually wander through the equivalent corpus of Web Storage data. And if you have a strong need for performance, you can also lean on several decades of research on database optimisation (though tuning would have to be done separately for each possible SQL engine being used).

Furthermore, there's support for transactions, so your database is protected from the kind of race conditions that can arise with Web Storage.

On the other hand, Web SQL Database also comes at the cost of complexity. Do you really need a full-blown database in your front-end web client? Do you really want to go through the effort of specifying a schema, and coding up SQL for every data access? Sometimes it's overkill. The other issue is compatibility. For now, Microsoft and Mozilla haven't implemented it and have no (public) plans to do so. You can still use it if you feel there will be advantages, but unless you only need to target a particular browser which supports it (e.g. a Chrome extension), don't use it in a way that assumes it's definitely present.


IndexedDB

IndexedDB is a nice compromise between Web Storage and Web SQL Database. Like the former, it's relatively simple; and like the latter, it's capable of being very fast. The trick is to take a standard mapping, like LocalStorage, but index on certain fields inside the stored data. So if your game data was a structure including a "level" field, you could tell IndexedDB to index on "level". Consequently, you can make fast queries about game levels; your application won't have to trawl through all the game data structures, because it has a separate list of just the "level" fields. And it has a concept of cursors, so your application can quickly walk through result sets.

As with Web SQL Database, you also have transactions available to protect against race conditions.

Read more about IndexedDB.

Filesystem API

File API: Directories and System fills a niche not supported by the other techniques. It gives you a way to store binary content (as well as plain text), create folder hierarchies, and a means to store potentially huge data structures. There are separate specifications for its other component APIs: FileReader and FileWriter.

The Filesystem API is supported today in Chrome, enabling apps to create and access an entire sandboxed directory structure.

If you need a directory structure, use this API to store large chunks of data, e.g. large text documents or even entire movies.

Read more about the File APIs.

How to Use the Offline Technologies

In this section, we'll look at some common problems developers face. The solutions are somewhat speculative, given that most of these technologies are new and in some cases, not even implemented yet! Nevertheless, applications like GMail have been using some aspects of these technologies for several years (as well as the earlier Google Gears plugin), and so we can suggest some meaningful rules of thumb.

Storing miscellaneous data

All things being equal, Web Storage should be your default port of call for storing offline data. It's a dead-simple API and one that's widely supported. Only consider using the other techniques if you have a specialised requirement not satisfied by Web Storage.

Extracting binary resources

You will sometimes want to extract binary data from a server so it can be retained locally. You can download binary files very easily with data type improvements made to XMLHttpRequest 2. See Fetching data.

Note this isn't specifically about offline technologies, but it's a common task that will come up when you're using them. The section below explains how you can store binary data, once you've read it.

Storing binary data

You can store binary data directly or you can convert it to plain text. For direct binary storage, you can either use a Web SQL Database with a column of "BLOB" type or the File Writer API. Alternatively, you can convert the binary content to plain text using a binary encoding technique. The obvious choice is base64, since that's the format commonly used on the net and in HTML5, e.g. it's supported by browsers via data: URIs. When you come to read back your base64-encoded content, you'll find you can include it in your web app using a data URI, e.g. image.src = "data:image/jpg;base64," + base64data.
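The round trip from binary to text-safe storage and back can be sketched as below. Node's Buffer is used purely so the sketch runs standalone; in the browser you would obtain the base64 string via FileReader.readAsDataURL() or btoa(), and the byte values here are illustrative:

```javascript
// A few pretend image bytes (stand-ins for a real JPEG's content).
var bytes = Buffer.from([0xff, 0xd8, 0xff]);

// Encode to base64: the result is plain text, safe for any string-only store
// such as Web Storage.
var base64data = bytes.toString("base64");

// Later, the stored text can be embedded directly via a data: URI...
var dataUri = "data:image/jpg;base64," + base64data;

// ...or decoded back into the original bytes.
var restored = Buffer.from(base64data, "base64");
```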

Unfortunately, base64 encoding can be quite slow, so use the techniques described in the next section to ensure the system stays responsive to user events. Alternatively, it might be worthwhile exposing base64-encoded representations right on your server, so your app can download them and directly store them. It will take up a little more space, to the tune of around 30%, but the UI will run faster.

Ensuring the Application Runs Smoothly

When dealing with large amounts of data, there's a risk your JavaScript will get so busy processing data, it won't devote any time to the user interface. JavaScript is fundamentally single-threaded, meaning that only one thing happens at a time. If the user clicks a button, for instance, and the system is engaged in a long data activity, the click event will linger on a queue for several seconds and the user will be dismayed. Even longer delays will likely lead to the browser opening a dialog box asking the user if they want to stop the script. Best avoided.

Assuming you can't (or don't want to) reduce the amount of data you're storing, the most important thing you can do is exploit asynchronous APIs. With these APIs, the browser will automatically create a new thread for you, do the work, and then return results back to the main thread. While the other thread is doing the work, your user-interface will keep chugging along nicely in the main thread. If you've made XMLHttpRequest before, you'll already know that it's almost always used in asynchronous mode, so you should do that for any Ajax calls. As for storage, most but not all of the APIs have asynchronous modes. LocalStorage is the exception; it's only synchronous, so large quantities of data I/O may slow things down. Web SQL Database, IndexedDB, and File Writer all have both synchronous and asynchronous modes. You should generally favour asynchronous mode, even if it makes the programming model more complex.

The other way to do things asynchronously is to use WebWorkers, a way to introduce new threads into your JavaScript, with your own code running in the derivative thread. However, there are some major limitations; the storage APIs generally won't work inside a Worker thread, and messages passed in and out of Workers can take a long time; I've found transferring several MB to and from a worker will take several seconds and lock up the user interface. So only use WebWorkers if you can't achieve the same thing with an aysnchronous storage call. The situation will improve as more browsers support SharedWorkers, which have less limitations (though more complexity for the developer to deal with).

Handling simultaneous data operations

It was noted that Web Storage doesn't support transactions, so if you're using it, you have four choices: prevent multiple clients, use locking, design around it, or hope for the best! The last choice is certainly viable, but not for robust web apps. The first choice has been attempted many times, with various tricks used to constrain an app to a single window, but somehow goes against the grain of the web. Locking is more feasible and hopefully libraries will evolve to support this pattern. Designing around it is also possible, as evidenced by various "NOSQL" frameworks for server-side applications and elsewhere. For example, in some circumstances, you might be able to just keep appending info instead of replacing it (and periodically removing old data).
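The append-instead-of-replace idea can be sketched like this. The design and names (appendEntry, allEntries, the "log:" key prefix) are hypothetical, and localStorage is stubbed as a plain object so the sketch runs anywhere:

```javascript
// Stand-in for window.localStorage.
var localStorage = {};

function appendEntry(entry) {
  // Each write gets its own key, so two tabs writing at once overwrite each
  // other far less often than with a single shared list under one key.
  // A timestamp-plus-random suffix makes key collisions unlikely.
  var key = "log:" + Date.now() + ":" + Math.random();
  localStorage[key] = JSON.stringify(entry);
}

function allEntries() {
  // Reads reassemble the log by scanning for the prefix.
  var entries = [];
  for (var key in localStorage) {
    if (key.indexOf("log:") === 0) entries.push(JSON.parse(localStorage[key]));
  }
  return entries;
}

appendEntry({ action: "save", level: 3 });
appendEntry({ action: "save", level: 4 });
```

Old entries would periodically need pruning, as the paragraph above notes; the win is that no writer ever rewrites another writer's key.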

A simpler solution, in many cases, will be to use transactions, as supported by Web SQL Database and (in the future) IndexedDB.

Storing large amounts of data

If you are looking to store many megabytes or more, beware that there are limits in place, which are handled in different ways depending on the browser and the particular API we're talking about. In most cases, there is a magic number of 5MB. For Application Cache and the various offline stores, there will be no problem if your domain stores under 5MB. When you go above that, various things can happen: (a) it won't work; (b) the browser will prompt the user for more space; (c) the browser will check for special configuration. In the case of Chrome, extensions can request certain permissions, including "unlimited_storage" in the extension manifest.

Detecting if you're online

There's an HTML5 property to let you query whether the browser is online: navigator.onLine. It works on recent IE, Firefox, Opera, and Chrome. You can also listen for the online/offline events using document.body.addEventListener("online", function() {...}) and document.body.addEventListener("offline", function() {...}).

You should generally assume the network is present and make calls as normal. It's only in the event of a server timeout or related error that you should check navigator.onLine. Doing so will serve two purposes: (a) you can decide if the error is caused by a network outage versus a server-specific outage, and if it's a network outage, inform the user so they can get back online; (b) you can start watching for the "online" event, and take action when the user's back online.
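The "watch for 'online', act when back" pattern can be sketched as a small send queue. Everything here is illustrative (SendQueue and transport are hypothetical names; transport stands in for a real network call), and the logic is deliberately browser-independent so it can be exercised anywhere:

```javascript
// A queue that sends immediately while online, and holds messages otherwise.
function SendQueue(transport) {
  this.pending = [];
  this.online = true;
  this.transport = transport; // function(msg) that performs the real send
}

SendQueue.prototype.send = function(msg) {
  if (this.online) {
    this.transport(msg);
  } else {
    this.pending.push(msg); // hold until connectivity returns
  }
};

// Call from an error handler when navigator.onLine is false or a request
// times out.
SendQueue.prototype.handleOffline = function() {
  this.online = false;
};

// Call from an "online" event listener: flush everything that queued up.
SendQueue.prototype.handleOnline = function() {
  this.online = true;
  while (this.pending.length) {
    this.transport(this.pending.shift());
  }
};
```

In a browser you would wire handleOnline/handleOffline to the online/offline events described above.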

Structuring file data

The Filesystem API is designed for this use case. However, it is currently not supported everywhere. Alternatives include IndexedDB and Web SQL Database, which support structured data and queries against those structures. The general guideline would be to use one of those if you're in need of complex queries. But what if you need to store file data? You can use the BLOB type of Web SQL Database, but it's not intended for very large data items and it won't work on all browsers.

Ensuring cross-browser compatibility

As you've probably noticed by now, storage support varies widely across the different browsers. If that's true for the latest versions of each browser, imagine the differences that arise when you consider older versions too. To handle this, you're best off creating - or reusing - a common storage abstraction, with calls such as store.save(id, data) and store.findByIDs(). An example of an open-source abstraction like this is persist-js. An abstraction should determine which storage APIs are available (possibly using the Modernizr API) and then access the store using that API. This will help keep your library portable.
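A minimal sketch of that abstraction idea follows. It is not persist-js or any real library: makeStore() is a hypothetical factory, findByID() is a simplified cousin of the findByIDs() call mentioned above, and the in-memory backend stands in for whatever fallback (cookies, server round-trips) you would really use:

```javascript
function makeStore() {
  // Feature-detect Web Storage; fall back to a plain in-memory object.
  if (typeof localStorage !== "undefined" && localStorage !== null) {
    return {
      save: function(id, data) {
        localStorage["obj:" + id] = JSON.stringify(data);
      },
      findByID: function(id) {
        var raw = localStorage["obj:" + id];
        return raw ? JSON.parse(raw) : null;
      }
    };
  }
  // In-memory fallback: data is lost on reload, but the API stays the same,
  // so calling code never needs to know which backend it got.
  var memory = {};
  return {
    save: function(id, data) { memory[id] = JSON.stringify(data); },
    findByID: function(id) {
      return memory[id] ? JSON.parse(memory[id]) : null;
    }
  };
}

var store = makeStore();
store.save("g1", { level: 7 });
```

The point of the pattern is that only makeStore() knows about browser differences; the rest of your app codes against save()/findByID() alone.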

Protecting against lost data

Offline data is best used for caching content, rather than being the primary placeholder for user content. Browsers generally let users delete local storage, and there's a strong risk some users will do so without realising the consequences. Thus, ensure data is frequently sync'd to the server, preferably in an automated fashion.

Securing offline data

As Nicholas Zakas has pointed out, a risk with offline storage is that users sometimes share the same browser. This may not be true with mobile browsers, but it's certainly true in many family homes and scarily more true in the case of internet cafes. There's a big risk of sensitive data being available to other users. They may not even be malicious users; they might just visit the site as someone else, only to find all of the previous user's locally stored data in front of them.

A good compromise may be to use the LocalStorage variant, SessionStorage, which deletes data when the window is closed. However, even this is not guaranteed to happen when a user leaves, and sessionStorage partly defeats the purpose of offline storage anyway. Zakas proposes encrypting any sensitive data before storing it locally, though it remains to be seen if this will become a common practice among developers. There's a counter-argument that users can encrypt their own file system separately (as happens with many corporate systems), and that users who leave other data on their hard drive, e.g. cookies and regular data files, are already at significant risk.

For now, use offline storage for caching public data and perhaps other data of a general nature, e.g. an e-book the user has paid to download. For more sensitive data, you should consider the usage context and the nature of the data; sometimes it will make sense to store offline, sometimes not.

Summary

Offline storage has many potential applications, well beyond the obvious "I want my application to run when users are on a plane" scenario. Developers in the past had to resort to workarounds, plugins, or browser-specific APIs, but modern developers have a veritable arsenal of standards-based capabilities to leverage. The challenge is to decide which to use when, and to use them in the right way. Hopefully, this article has been helpful in guiding your decision process.

Author : Michael Mahemoff

Storing Data on The Client with LocalStorage

The LocalStorage API gives front-end web developers access to a simple key-value datastore that can be used to save data on a user's computer. Saving data on the client-side can help to speed up the performance of your web applications as it can reduce the number of database queries that are needed on the server. This frees up valuable server resources and can potentially even lead to reduced infrastructure costs.

Before the introduction of LocalStorage, developers that wanted to store data on the client would need to use browser cookies. While this approach did work it had some problems. The first issue is that a cookie can only store 4,096 bytes of data, which isn’t really all that much. Another issue is that cookies are sent up to the server with every HTTP request that is made by the client. This increases the size of requests, leading to higher bandwidth usage and slower request times.

In this blog post you are going to learn how to take advantage of the new LocalStorage API when building your own web applications. Let's go!

Checking for Browser Support

LocalStorage is a new technology, so it is important that we test for browser support and consider fallbacks for browsers that do not support the API. Testing for LocalStorage support is very straightforward. All you need to do is create a simple if statement that uses the localStorage interface as the condition. Note the lowercase ‘l’ in localStorage.

The JavaScript code below shows how you could test to see if a browser supports the LocalStorage API.

if (localStorage) {
  // LocalStorage is supported!
} else {
  // No support. Use a fallback such as browser cookies or store on the server.
}

If the browser does not support LocalStorage you could fallback to using browser cookies or just send the data to be stored on the server.

Now that you understand how to check for support for LocalStorage, let's take a look at what the API has to offer.

Storing Data in LocalStorage

To store data you use the setItem() function. This function takes two parameters, the item key and a value.

localStorage.setItem('name', 'Matt West');

Note: There are multiple ways to interact with the localStorage interface. In this blog post I use the functions outlined in the official specification, but you can also treat the localStorage interface like a JavaScript object or array. The examples below will all store data.

// Functions
localStorage.setItem('name', 'Matt West');

// Object
localStorage.name = 'Matt West';

// Array
localStorage['name'] = 'Matt West';

For the remainder of this blog post I will be using functions to interact with the localStorage interface.


Let's take a look at a simple use of LocalStorage in a website. The code below is the markup for a contact form. When the user submits the form we are going to save their name so that we can use it later to show them a personalised message.

<form id="contactForm" action="contact.php" method="POST">
  <div class="field">
    <label for="name">Name</label>
    <input type="text" name="name" id="name">
  </div>
  <div class="field">
    <label for="email">Email</label>
    <input type="email" name="email" id="email">
  </div>
  <div class="field">
    <label for="message">Message</label>
    <textarea name="message" id="message"></textarea>
  </div>
  <div class="field">
    <input type="submit" value="send">
  </div>
</form>

The JavaScript code below shows how you could intercept the form submission and save the user’s name.

window.onload = function() {

  // Check for LocalStorage support.
  if (localStorage) {

    // Add an event listener for form submissions
    document.getElementById('contactForm').addEventListener('submit', function() {
      // Get the value of the name field.
      var name = document.getElementById('name').value;

      // Save the name in localStorage.
      localStorage.setItem('name', name);
    });

  }

}

Now that you know how to save data to the datastore, let's take a look at how to get it back out again.

Retrieving Data from LocalStorage

To retrieve data you use the getItem() function. This takes a single parameter; the key that you used when storing data.

var name = localStorage.getItem('name');

Building on our previous contact form example, in the code below we retrieve the user’s name from the datastore and then update an element on the page with the text Hello {name}!.

window.onload = function() {
  ...

  // Retrieve the user's name.
  var name = localStorage.getItem('name');

  if (name !== null) {
    document.getElementById('welcomeMessage').innerHTML = "Hello " + name + "!";
  } else {
    document.getElementById('welcomeMessage').innerHTML = "Hello!";
  }
}

The if statement is used to make sure that there is a name stored in the datastore. If we didn't do this our program might say Hello null!. If you attempt to access data that does not exist, the localStorage interface will return either null or undefined depending on how you tried to access the data (see the note earlier in this post).

Removing Data from LocalStorage

To remove an item from the datastore you use the removeItem() function. This function takes the key of the item that you wish to delete.

localStorage.removeItem('name');

Clearing the Datastore

If you want to delete all of the data in the datastore you can use the clear() function.

localStorage.clear();

Retrieving Keys

The localStorage interface also includes a function called key(), which can be used to retrieve the key of a data item using the index (numerical position) of the item in the datastore. Admittedly you will probably not be using this function very often, but it is useful to know that it exists.

The JavaScript code below shows how you might use this function to output the keys for each of the items in the datastore.

for (var i = 0; i < localStorage.length; i++) {
  console.log(localStorage.key(i));
}

Note the use of localStorage.length in this example. The length property tells you how many items are in the datastore.
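As a sketch, the loop above can be wrapped into a small helper (dumpStorage is a hypothetical name, not part of the Storage API) that collects every key/value pair. The storage object is passed in as a parameter, so it works with localStorage, sessionStorage, or anything else exposing length, key() and getItem():

```javascript
// Hypothetical helper: collect every key/value pair from a
// Storage-like object into an array of "key=value" strings.
function dumpStorage(storage) {
  var pairs = [];
  for (var i = 0; i < storage.length; i++) {
    var key = storage.key(i);
    pairs.push(key + '=' + storage.getItem(key));
  }
  return pairs;
}
```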

Sandboxing and Storage Limits

The data that you add to the LocalStorage datastore is sandboxed to your website's domain name. This means that your web application cannot see the data stored by any other applications, and those applications cannot see the data stored by yours. This is an important security measure.

Sub-domains (i.e. a.example.com, b.example.com, c.example.com) are treated as separate domains and are therefore given their own datastore.

There is a limit on how much data you can put in LocalStorage. This limit is set by browser vendors and therefore varies between browsers. To be safe, you should assume that your web application has only 2.5MB of storage space. This should be more than enough, as LocalStorage is only designed to hold basic key-value data. If you find yourself needing more space, you might want to reconsider whether LocalStorage is the best storage technology for your application.
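When the limit is hit, browsers throw an exception from setItem() (commonly a QuotaExceededError), so writes that might push you over the limit are worth wrapping in a try/catch. A sketch of a hypothetical safeSet() helper; the storage object is a parameter so it works with localStorage or any object with the same setItem() signature:

```javascript
// Hypothetical helper: write a value, reporting failure instead
// of throwing when the datastore is full or storage is disabled.
function safeSet(storage, key, value) {
  try {
    storage.setItem(key, value);
    return true;
  } catch (e) {
    // Browsers throw here (commonly a QuotaExceededError) once
    // the storage limit is reached.
    return false;
  }
}
```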

SessionStorage

Data stored in LocalStorage is persistent. This means that if you store some data, close your browser and then open up your application again, all of the data will still be retrievable. However, you may only want to store some data for the duration of a user session. In this case you can use the sessionStorage interface. This has all of the same functions that localStorage does, but the data you save will automatically be wiped when the user closes the browser tab.

// Storing Data
sessionStorage.setItem('name', 'Matt West');

// Retrieving Data
var name = sessionStorage.getItem('name');

// Deleting Data
sessionStorage.removeItem('name');

// Retrieving an Item Key
sessionStorage.key(n);

// Clearing the Datastore
sessionStorage.clear();

Thoughts on LocalStorage

The LocalStorage API allows developers to make considerable advances in the performance of their web applications. Looking past this initial advantage, LocalStorage also provides a datastore for offline applications (assuming that a technology like AppCache is being used to make the other application resources available). I am especially interested in the potential applications that LocalStorage has for HTML5 mobile applications. Mobile devices are much more likely to be without an internet connection and therefore without access to traditional server-based datastores.

Are you using LocalStorage in any interesting ways in your projects? Let us know in the comments below.

Example:

$(document).ready(function () {

    // Build some sample data.
    var items = [];

    for (var i = 0; i < 50; i++) {
        items.push({ 'name': 'name' + i, 'value': i });
    }

    if (localStorage) {

        // Only seed the datastore if the key is missing;
        // getItem() returns null when it does not exist.
        if (localStorage.getItem('tempData') === null) {
            localStorage.setItem('tempData', JSON.stringify(items));
        }

        var getTempData = JSON.parse(localStorage.getItem('tempData'));

        console.log('getTempData: ', getTempData);

        // Removing the data from LocalStorage:
        //localStorage.removeItem('tempData');

        // Clearing the datastore:
        //localStorage.clear();
    }

});
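The example above pushes the array through JSON.stringify() and JSON.parse() because localStorage only holds strings. As a sketch, that pattern can be factored into a pair of helpers (saveJSON and loadJSON are hypothetical names), written against any object with Storage-style getItem()/setItem() functions:

```javascript
// Hypothetical helpers: store and retrieve structured data in a
// Storage-like object by serialising it to a JSON string.
function saveJSON(storage, key, value) {
  storage.setItem(key, JSON.stringify(value));
}

function loadJSON(storage, key) {
  var raw = storage.getItem(key);
  // getItem() returns null for a missing key; pass that through
  // rather than handing JSON.parse a null.
  return raw === null ? null : JSON.parse(raw);
}
```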

Author: Matt West

Troubleshooting Expired ASP.NET Session State and Your Options

I have a love/hate relationship with the ASP.NET Session. It's such a convenient place to put things, but when you start putting applications into production there are a number of less-than-obvious edge cases that can come up and bite you.

Most often the Session is used when managing state over a long process like a multi-step wizard or questionnaire. However, when people use the Session, they often lean on it a little. They'll bake it into their design so deep that when it doesn't work, they're screwed. That's not to say they shouldn't be able to lean on it, I'm just saying that there's a lot of things going on with Session (not just on ASP.NET, but other frameworks as well) in order to get it to look seamless.

BUILT IN OPTIONS

ASP.NET offers three options (four if you count rolling your own).

§  InProc - The default, and usually works fine. However, you can get into trouble in a few scenarios:

     o  Web Farms - If you have more than one web server, it's important to remember that your users may not "stick" to the same web server with each request. Some routers offer Sticky-Sessions or the ability to "pin" a user to a server. This works well if the router uses cookies as its key, but it's less reliable if the router uses IP address/source port as the key, as these may change, especially if the user is behind a mega-proxy.

     o  Web Gardening - If you've set up IIS to run multiple instances of the IIS Worker Process on a single multi-proc machine, this is the equivalent of running a Web Farm, just on one machine. This technique is usually only useful when you've got a very CPU-intensive application - in other words, don't just turn on Web Gardening and expect your problems to get better instantly. It's subtle.

     o  Unexpected Process Recycling - IIS6 had some wonky defaults and would recycle the AppPool or Process when certain limits were hit, like after a number of requests or after 20 minutes. This is the classic "flaky session state is expiring" issue that lots of folks hit. You'll be more likely to see this if you've got really long-running processes where users are logged in for long periods of time.

§  Out of proc - A good next step, this moves session out to a Windows Service. You can run one per Web Farm (meaning, you've got multiple machines but one instance of this service) and your session data will survive process recycles, but not system reboots. This is useful for both Web-Gardening and Web-Farming.

     o  Folks usually forget to mark their objects as [Serializable], which basically gives your objects "permission" to leave their process space and be stored in memory in the State Service. If you've got a high-traffic site you might want to avoid storing complex objects and object graphs, as you'll pay for it on the serialization. Of course, with all things, measure everything! You'll get the best performance if you stick with basic types like strings, ints, etc.

     o  UPDATE: I wanted to update this post and point folks to Maarten Balliauw's most excellent series on Out of Proc Session State (StateServer). He covers the basic setup, which is unremarkable, but then digs into the advanced stuff including "partitionResolvers", which I am ashamed to say I hadn't heard of! Recommended.

          o  ASP.NET load balancing and ASP.NET state server (aspnet_state)

          o  ASP.NET Session State Partitioning using State Server Load Balancing

§  SQL Server - The most robust, but now you'll pay for not only serialization, but storage. However, SQL Server is a highly tuned system, and if you've got a site with any significant traffic I really recommend just skipping out-of-proc and putting your session state into a SQL Server with a lot of memory. Rather than trusting the ASP.NET out-of-proc Session State Server to be a small database, leave the database work to the databases.

     o  The benefits of SQL Server for your Session State include surviving process recycles and reboots, but more importantly, using it removes a lot of variables from your troubleshooting in the sense that you no longer worry about the storage of your Session; now you just need to worry about whether your Session Cookies are getting passed back and forth from browser to server.

     o  Make sure you're using Windows Integrated Security and that you decide if you want ASP.NET to store Session in tempdb (which won't survive a SQL recycle) or a dedicated database (my recommendation).
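The mode is selected via the sessionState element in web.config. A minimal sketch with placeholder server names and connection strings (only one sessionState element may appear in a given config; the three variants below are shown side by side for comparison):

```xml
<!-- InProc (the default): session lives inside the worker process. -->
<sessionState mode="InProc" timeout="20" />

<!-- StateServer: requires the ASP.NET State Service (aspnet_state)
     to be running on the target machine. -->
<sessionState mode="StateServer"
              stateConnectionString="tcpip=localhost:42424"
              timeout="20" />

<!-- SQLServer: point at a SQL Server prepared with aspnet_regsql. -->
<sessionState mode="SQLServer"
              sqlConnectionString="Data Source=MYSERVER;Integrated Security=SSPI"
              timeout="20" />
```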

TROUBLESHOOTING

There's a number of things that can go wrong, some of which I mention above, but here's what I usually run through when troubleshooting things.

§  Is the ASP.NET SessionID Cookie actually moving back and forth between browser and server? This can be confirmed by:

     o  Using an HTTP sniffer like ieHttpHeaders, HttpWatch or Fiddler and confirming that the Session ID cookie's value isn't changing between requests.

     o  Confirming that the cookie isn't being blocked by IE privacy settings, lack of a P3P policy document, a local firewall like ZoneAlarm or Symantec, or a corporate proxy with an attitude problem.

§  Is IIS recycling the AppPool or Worker Process? Confirm the settings in IIS Manager and make sure they are right for what you're doing.

§  Is the session timing out? Are you sure you're hitting the same VDir from whence you came and successfully resetting the sliding expiration on the Session ID?

§  Is some other thing like an Ajax call or IE's Content Advisor simultaneously hitting the default page or login page and causing a race condition that calls Session.Abandon? (It's happened before!)

At my last company, Session became such a hassle for large high-traffic applications that we just stopped using in-proc and started exploring alternatives.

SOME 3RD PARTY SESSION STATE OPTIONS

§  NCache from Alachisoft - An in-memory object cache that's distributed across your web farm. Think of it like Out of Process Session State, but distributed/clustered in their Enterprise Edition.

§  ScaleOut Software SessionServer - Fast, scalable in-memory storage that is distributed across machines. Full Disclosure: we worked with these guys while I was at Corillian, but never put them into production.

§  Memcached Session State Provider - Fahad has created ASP.NET Session State providers that will talk to memcached, a very popular distributed memory caching system originally created for LiveJournal.com and now used all over.