Part 1 - Fundamental Enhancements to the Primary Data Store
Note: the code described in this article
is based on the Beta 1 release of the .NET Framework v 2.0, with notes as to
the upcoming changes in the Beta 2 release. All the examples can be downloaded
from our Web site at http://www.daveandal.net/articles/datasetv2/, and you can also run most of them online from the same location.
Version 1.0 of the .NET Framework introduced a completely new approach to handling relational data,
by separating out the two basics functions of all data access application
requirements into separate classes that are optimized for each specific task.
For applications that just need to access rowsets in a stream fashion, the DataReader classes are lightweight and give great performance in a connected
scenario.
On the other hand, the DataSet class is designed for
applications that require disconnected access to data, or which need to persist
data without maintaining an open connection to the database. The DataSet is rather
like the Recordset object from classic ADO, but has several useful advantages:
- The DataSet can store more than one table or rowset.
- The DataSet can store the relationships between these rowsets.
- The DataSet provides excellent integration with XML, through a range of methods and
properties that control the format and serialization capabilities of the
stored data when reading and writing XML.
Although the DataSet class in version 1.x of the .NET Framework
performs well, and can provide a suitable data store for all kinds of
disconnected applications, Web Services, etc., there are some shortcomings. These
have been addressed in version 2.0, where the DataSet grows up into a fully-fledged and
extremely powerful data container that supports much better serialization,
update capabilities and performance than in version 1.x.
In this and a follow-up article, we look at the
changes between the version 1.x and version 2.0 DataSet and their associated classes,
showing you how you can take advantage of the new features to improve your
applications' capabilities and performance. The topics we'll be covering in
this part are:
- Loading a DataSet, and the new LoadOption enumeration.
- Using stand-alone DataTable instances.
- Streaming data into and out of DataSet tables with a DataTableReader.
- Performing batched updates from a DataSet.
- Performance and scalability improvements.
Then, in a follow-up article, we'll look in more detail at:
- Manipulating the XML data type with a DataSet.
- How the DataSet supports user-defined data types (UDTs).
Loading a DataSet, and the New LoadOption Enumeration
One of the issues developers came across when using version
1.x of the DataSet was the awkward way that you have to use the GetChanges and Merge methods of the DataSet to support
updates where you want to allow user-reconciliation of update errors. You must
keep a copy of the DataSet containing the user's updates, so that the current values (the values
they entered) are not lost when the Fill method of the DataAdapter refreshes the rows after attempting an update (or if the stored procedures you
use for the update return the values from the database after the update has
been executed).
In version 1.x, this could result in a six-step process,
where you had to:
- Fill the DataSet with the rows for display to the user
- Allow the user to edit the rows
- Call GetChanges on the DataSet to create a copy of the edited rows
- Execute the Update method to push the changes into the database
- Call Fill again to refresh the rows in the DataSet
- Use the Merge method to get the user's original edited values
into the current values of the DataSet for rows where the update
failed.
In version 2.0, Microsoft has addressed this issue by
providing an enumeration called LoadOption, which allows you to control precisely how the
values in existing rows in a DataSet are updated when incoming rows (from the Fill or the new Load methods)
match the existing rows on the primary key - and hence effectively
"replace" these existing rows. And they have also filled out the API
for the DataSet, DataAdapter and DataView classes
by adding a new property called AcceptChangesDuringUpdate, implementing methods that allow
you (under certain circumstances) to change the RowState property value for rows in the DataSet tables,
and adding a method to convert a DataView instance into a DataTable instance. We'll look at all of
these topics in the following sections of this article.
The LoadOption Enumeration
The LoadOption enumeration is used to specify how the values in incoming rows replace (update)
the values in existing rows in the DataSet tables. Remember that this only
applies if the table has a primary key defined (so that incoming rows can be
matched with existing rows), and where the incoming row actually does match
(and hence replace or refresh) an existing row. The names of the enumeration
members differ between the Beta 1 and Beta 2 releases, as shown in Table 1. The
Beta 2 names should be the ones used in the release version of ADO.NET 2.0.
Table 1 - The Values in the
LoadOption Enumeration for Beta 1 and Beta 2
Beta 1 Enumeration Member | Beta 2 Enumeration Member | Description |
PreserveCurrentValues | PreserveChanges | Synchronize the rows and keep the
modifications to rows in the DataSet. Effectively,
only the original values in the existing rows are replaced with
incoming values. |
UpdateCurrentValues | Upsert | Aggregate the rows in the DataSet with the new rows. Effectively, only the current values in
the existing rows are replaced with incoming values. |
OverwriteRow | OverwriteChanges | Synchronize the rows and abandon modifications to rows in the DataSet. Effectively,
both the current and original values in the existing rows are
replaced with incoming values. |
Table 2 shows the effects of each of the LoadOption values
in more detail. You can see, for example, that the PreserveCurrentValues option only updates
the original value in matching rows in the DataSet tables, while keeping the user's
edited value as the current value. This is the ideal situation where you
want to attempt to push the user's changes to the rows into the database, but
allow them to reconcile any failed updates afterwards.
Usually an update will fail because another user has
performed an update on the row in the database while the first user is holding a
disconnected copy of the row in their DataSet (the application should, of
course, validate values they enter and prevent most of the other common update
errors). The application will be able to display the value that is in the
database now (the underlying value) and the proposed value that was
entered by the user (the current value), so that they can make an
informed decision about how to resolve the failed update.
Table 2 - The Effects of the LoadOption
Enumeration in More Detail
RowState of Existing Row | PreserveCurrentValues PreserveChanges | UpdateCurrentValues Upsert | OverwriteRow (default) OverwriteChanges |
Added | Current = Existing Original = Incoming RowState = Modified | Current = Incoming Original = Existing RowState = Added | Current = Incoming Original = Incoming RowState = Unchanged |
Modified | Current = Existing Original = Incoming RowState = Modified | Current = Incoming Original = Existing RowState = Modified | Current = Incoming Original = Incoming RowState = Unchanged |
Deleted | Current = Existing Original = Incoming RowState = Deleted | In Beta 1: Undo Delete Current = Incoming Original = Existing RowState = Modified See note below for Beta 2 behavior. | Undo Delete Current = Incoming Original = Incoming RowState = Unchanged |
Unchanged | Current = Incoming Original = Incoming RowState = Unchanged | Current = Incoming Original = Existing if new value = existing: RowState = Unchanged else: RowState = Modified | Current = Incoming Original = Incoming RowState = Unchanged |
No existing matching row in
table | Current = Incoming Original = Incoming RowState = Unchanged | Current = Incoming Original = [n/a] RowState = Added | Current = Incoming Original = Incoming RowState = Unchanged |
Using the LoadOption Enumeration Values
The values in the LoadOption enumeration are used when populating a DataSet table with the Fill method or the new Load method. To use them in the Fill method, you set the FillLoadOption property of the DataAdapter that is performing the Fill:
myDataAdapter.FillLoadOption =
LoadOption.PreserveCurrentValues // Beta 1
myDataAdapter.FillLoadOption =
LoadOption.PreserveChanges // Beta 2
The new Load method takes a LoadOption value as its second parameter, for example:
myDataSet.Load(data-reader,
LoadOption.PreserveCurrentValues, table-name array)
We look at the Load method in more detail later in this article. Meanwhile, to see the
effects of the LoadOption enumeration values, the next example uses them to control how rows
are updated when using the Fill method of the DataAdapter. Figure 1 shows the example page we provide that demonstrates some
of the effects of the LoadOption enumeration. At the top are a
set of option (radio) buttons that you can use to specify the LoadOption value to use, followed by a list of three rows from the Northwind database. The page shows the current and original values, and the value of the RowState property, for
the three stages that each row goes through as the page is executed.

Figure 1 - The Example Page that Demonstrates the Effects
of the LoadOption Enumeration
The three stages of processing in the
example page are:
- Fill a DataSet table with
two rows from the database.
- Edit these rows within the code by
modifying the first one, deleting the second one, and adding a new row to
the table.
- Refresh the table by calling the Fill method again, using the selected value from the LoadOption enumeration in the FillLoadOption property of the DataAdapter.
There is quite a lot of code in the page,
because it has to keep copies of the DataSet at each stage to
be able to display the values afterwards. It also has to create a primary key
on the table in the DataSet before making copies, so that the Fill method will match
incoming rows with the existing rows (and hence update them). All these
techniques are the same as you would use in version 1.x, and so we
haven’t listed all the code here. You can use the [view source] link at the foot of the page to see it, or download the examples
and play with the code yourself.
Examining the Effects of the LoadOption Enumeration Values
To see the effects of the LoadOption enumeration values, you can select each one and view the results in
the example page. The following three screenshots (Figures 2, 3 and 4) show
these results. When the LoadOption value is PreserveCurrentValues (PreserveChanges in Beta 2), the current value of the customer name in the
modified row is maintained and the RowState remains set to Modified (see
Figure 2). For the deleted row, there are no current values and the Fill method does not re-instance them - the RowState remains set to Deleted.
In the row that was added to the table,
there is no change to the current values or the RowState because
there is no matching incoming row (this row does not exist in the database). If
another user had added a row to the database with the same primary key,
however, the original values in this row would be set to the incoming
values, and the row would be marked as Modified.
The overall effect is to synchronize the original values in the rows with the values in the database and keep the
modifications to rows in the DataSet.

Figure 2 - The Effects of the PreserveCurrentValues
LoadOption value
When the LoadOption is UpdateCurrentValues (Upsert in Beta 2), the modified customer name in the current value
of first row is replaced by the incoming row value, and the RowState remains set to Modified (see Figure 3). For the deleted
row, where there are no current values, the incoming row re-instates
these with the values in the database. It also changes the RowState from Deleted to Modified, because
the row exists in the database and so an UPDATE command is required for the next
update, and not an INSERT command.
The third row, which was added to the DataSet, remains
as an Added row because no row in the database matches it. If another user had added a row
with the same primary key, however, their values would replace the current value. But even if another user had added rows to the database that match any of the rows in the DataSet,
these values would only replace the current values - while the original values would be retained.
The overall effect, therefore, is to aggregate
the rows in the DataSet with the new rows, without losing the original values.

Figure 3 - The Effects of the UpdateCurrentValues
LoadOption value
Finally, when the LoadOption value is OverwriteRow (OverwriteChanges in Beta 2), the current and the original values in each matching
row (the first and second rows) are replaced by the values in the incoming rows
(see Figure 4). And, because these match the values in the database, the rows
are marked as Unchanged because there is no need to push the values back into the database when the
next update is attempted. The Update method will process them and update the database
only if you edit the rows again.
The only exception is the row that was added to the DataSet, which
remains marked as an Added row ready to be inserted into the database when the next Update takes place. However, if another user had added a row to the database with
the same primary key, their values would replace both the current and original values in the DataSet and the RowState would be set to Unchanged.
The overall effect is to synchronize the rows,
and abandon modifications to rows in the DataSet.

Figure 4 - The Effects of the OverwriteRow LoadOption
value
Updating the RowState Property
Notice how, in Table 2, the LoadOption values affect the setting of
the RowState property for each row that an update is attempted on. A common problem in
version 1.x of ADO.NET is that the RowState values are read-only, and cannot
be changed. While this is fine for common update operations on the rows, it
does make it hard to perform custom management of the data - perhaps when you
want to manipulate the contents of the DataSet so that updates and/or inserts are
attempted on rows that are currently marked as Unchanged (and so the Update method will
ignore them).
In version 2.0, you can change the values of the RowState property
using two new methods that are added to the DataRow class. These are SetAdded and SetModified, and
change the RowState property of the row to Added or Modified.
However, they can only be called on rows that are currently marked at Unchanged, and not on rows that are marked as Added, Deleted or Modified.
The DataAdapter.AcceptChangesDuringUpdate Property
In version 1.x of the DataAdapter class, you can change the AcceptChangesDuringFill property from its default value of true to false, so that incoming row values only
set the current values in the rows in the DataSet table, and are not propagated to
the original values. In other words, the Update method does not call the AcceptChanges method on each of the rows as it fills them. This is useful when, for example,
you want to keep existing values in the original values of the rows -
perhaps to check for changes to the data since the DataSet was last filled.
In version 2.0, Microsoft has added the AcceptChangesDuringUpdate property to the DataAdapter.
This has the same effect as AcceptChangesDuringFill,
except it affects the outcome when the Update method is called. Effectively, it
prevents the AcceptChanges method from being executed on each row. It is useful where, for example, a SQL batch
statement or stored procedure you are using to perform the updates returns
values. These values will replace the current values of the row in the DataSet table, and
by default are copied to the original values by the AcceptChanges method. By changing the
value of this property from the default true to false, you prevent the values being
copied into the original values of the rows.
The DataView.ToTable Method
In version 2.0 of ADO.NET, the DataView class gains a new method named ToTable. This
allows you to convert an existing DataView instance into a DataTable¸ which
might be useful if you have existing code that creates a DataView, or a
class that exposes only a DataView and not the original DataTable.
There are three overloads of the ToTable method that allow you to use the existing view and
column names as the new table and column names, or specify different table
and/or column names. You can find out more from the .NET Framework SDK - search
for "DataView.ToTable".
Stand-alone DataTable Instances
Those developers who looked at the Technical Preview release
of ADO.NET will have noticed a class named DbTable that appeared there - with no
obvious links to existing classes or the natural hierarchy of the other ADO.NET
objects. The idea was to allow developers to work with single tables or
rowsets, without having to create a DataSet to manage them. DbTable was
designed to be used standalone, but could also be stored in a DataSet.
However, the whole approach was confusing and this class
disappeared in Beta 1. Instead, the more obvious approach was taken, by
surfacing the existing DataTable class with new methods and properties that allow it to be used in stand-alone
mode. Effectively, it still lives in a DataSet but you no longer have to create
and manipulate this DataSet.
You just work directly with the individual DataTable instance.
What this means is that common operations for which you
would have required a DataSet in version 1.x are now supported against DataTable. These include the ReadXml, ReadXmlSchema, WriteXml, WriteXmlSchema, Clear, Clone, Copy, Merge, GetChanges methods. The DataSet is also auto-serializable, and so can be returned from a Web Service or via
Remoting. And you can use the Fill and Update methods of the DataAdapter on a DataTable:
myDataAdapter.Fill(myDataTable)
myDataAdapter.Update(myDataTable)
You can also fill an array of DataTable instances:
myDataAdapter.Fill(myDataTableArray)
There are also other overloads of the Fill method that
allow you to fill subsets of rows in a DataTable.
Using the Load Method with a DataSet and DataTable
The stand-alone DataTable also supports the Load method, as
mentioned in connection with the DataSet class earlier in this article. Here, we'll look at
the Load method in more detail, both for the DataTable and the DataSet classes. The only difference
between the two is that, when calling Load on DataSet, you must specify the tables to
load. When calling it on DataTable,
it loads the current DataTable instance.
The syntax for the three overloads of the Load method for the DataSet class is shown
in Table 3. You must always specify a DataReader containing
the source data (it can, of course, return more than one rowset), and a LoadOption value. You must also specify the target tables as an array, either
by name or as references to DataTable instances.
And, finally, you can specifya handler for the FillError event
that is raised if an error occurs during the Load process.
Table 3 - The Load Method
Overloads for the DataSet Class
Method overload | Description |
Load(data-reader, load-option, table-name-array) | Takes a reference to a class that implements the IDataReader interface, such as a SqlDataReader or OleDbDataReader,
and loads the rows it exposes into the specified tables of the DataSet. The
target tables are specified as a String array of table names (less than
or equal to the number of rowsets available through the DataReader). The load-option parameter is a value from the LoadOptions enumeration that determines
how the values of incoming rows will be used when existing matching rows are
present in the DataSet tables. |
Load(data-reader, load-option,data-table-array) | Loads data from a reader instance that implements the IDataReader interface into the tables in a DataSet. Works exactly like the previous overload except
that the list of target tables is specified as an array of DataTable instances instead of the table names. |
Load(data-reader, load-option, fill-error-handler, data-table-array) | Loads data from a reader instance that implements the IDataReader interface into the tables in a DataSet. Works exactly like the previous overload except
that a reference to a delegate (event handler) can be specified. This event
handler will be executed if there is an error while loading the data. |
For a DataTable,
there are also three overloads of the Load method, as shown in Table 4. These
also allow you to specify a LoadOption value, and (if required) a handler for the FillError event that is raised if an
error occurs during the Load process. There is no need to specify the target table, as the data is loaded
into the DataTable instance on which the method is called.
Table 4 - The Load Method
Overloads for the DataTable Class
Method overload | Description |
Load(data-reader) | Takes a reference to a class that implements the IDataReader interface, such as a SqlDataReader or OleDbDataReader,
and loads the rows it exposes into this DataTable. |
Load(data-reader, load-option) | Loads data from a reader instance that implements the IDataReader interface into this DataTable.
The load-option parameter is a value from the LoadOptions enumeration that determines
how the values of incoming rows will be used when existing matching rows are
present in the DataSet tables. |
Load(data-reader, load-option, fill-error-handler) | Loads data from a reader instance that implements the IDataReader interface into this DataTable.
Works exactly like the previous overload except that a reference to a
delegate (event handler) can be specified. This event handler will be
executed if there is an error while loading the data. |
Remember that the Load method (like the Fill method) will
match incoming rows against any existing rows if a primary key is declared in
the table. It must do this to maintain data integrity - it cannot insert a
duplicate row that has the same primary key value as an existing row. However,
if the table already contains rows and there is no primary key defined, the
incoming rows are added to the table after the existing rows.
If the schema of the table (i.e. the names and ordering of any
existing columns) does not match the incoming rows, the schema may be adjusted.
If there are columns in the incoming rows that are not in the existing table, any
that are missing are appended to the end of the Columns collection of the table. This can
cause Load errors if the new columns have constraints, for example if they cannot contain
null values, and you have used PreserveCurrentValues (PreserveChanges in Beta 2) as the LoadOption for the
process. The new columns in existing rows will contain null values (unless you have specified
default values), and so an error will occur when the Load method tries to re-enable
constraints for the row after loading the data.
Using a DataReader to Load Tables in a DataSet
As an example of using the Load method, the code listed below demonstrates how you can create a DataReader (in this case a SqlDataReader) that
returns multiple rowsets using a SQL batch statement, and then push these rows
into the tables in a DataSet. The code creates the DataReader, and then
declares a String array containing two table names, Orders and OrderDetails. Then it creates a new empty DataSet and calls its Load method - passing in the DataReader and the array
of table names.
All the overloads of the Load method for a DataSet require a value from the LoadOption enumeration to be specified, and we use OverwriteRow (OverwriteChanges in Beta 2) here. In this case, because the DataSet is empty, the value we choose makes no difference - there are no
existing rows that could match the incoming rows, and there is no primary key
defined either.
// connection and query details
String connect = "your-connection-string";
const String SQL = "SELECT TOP 5
OrderID, OrderDate, ShipName, ShipCity, "
+ "ShipCountry FROM
[Orders];"
+ "SELECT TOP 5 *
FROM [Order Details];"
+ "SELECT TOP 5
CustomerID, CompanyName, Address, City, "
+ "Country FROM
[Customers]";
// create a connection, command and get a
DataReader
using (SqlConnection con = new
SqlConnection(connect))
{
SqlCommand cmd = new SqlCommand(SQL,
con);
con.Open();
reader =
cmd.ExecuteReader(CommandBehavior.CloseConnection);
// create an array of table names for first
two incoming rowsets
String[] tablenames = new String[]
{"Orders", "Order Details"};
// load the data from the DataReader
// no existing data, but have to provide a
value for LoadOptions
// so use OverwriteRow - makes no
difference in this case
DataSet ds = new DataSet();
ds.Load(reader, LoadOption.OverwriteRow,
tablenames);
// load another table directly using the
Load method
DataTable dt = new DataTable();
dt.TableName = "Customers";
dt.Load(reader);
ds.Tables.Add(dt);
// display the results
...
}
So far, however, we've only loaded two
tables (Orders and OrderDetails). To demonstrate another way of using the Load method, the code then calls it on an individual DataTable. To achieve this, we have to first create a new DataTable. We assign the name Customers to this table
then call its Load method to load the remaining rowset into it from the DataReader, and finally add it to the Tables collection of the DataSet.
The remaining code (not shown here) displays the progress messages you see at
the top of the page shown in Figure 5, and then assigns the
tables to three GridView controls to display the contents:

Figure 5 - Loading DataTables from a DataReader using the
Load Method
The BeginLoadData and EndLoadData Methods of the DataTable
The DataTable class also exposes two more methods, named BeginLoadData and EndLoadData, which can be used to disable
constraint checking and internal index maintenance in a DataTable while data is being loaded.
This speeds up load times, and only raises any constraint violations (such as
primary key duplication) when the EndLoadData method is called. If you
intend to load a large number of rows, you should consider using these methods:
try
{
myDataTable.BeginLoadData();
myDataTable.Load(myDataReader);
myDataTable.EndLoadData();
}
catch (Exception e)
{
// ... handle error ...
}
Using the FillErrorEventHandler to Catch Load Errors
You can also create an error handler
routine that will be executed if an error occurs during the loading of the data
into the target table. This is basically the same process as handling the
existing FillError event of the DataAdapter class in version 1.x of ADO.NET. However, as there is no DataAdapter involved in
the Load process, the technique for connecting the event handler to the
method involves passing a delegate reference into the Load method:
myDataSet.Load(data-reader, load-option,
new FillErrorHandler(myFillHandler), tables);
Then you create the event handler - where,
for example, you can access the table name, the values in the row that caused
the error, and set the Continue property of the FillErrorEventArgs to specify if processing should halt or continue with the next row:
private void MyFillHandler(Object sender,
FillErrorEventArgs args)
{
// write message to a StringBuilder for
display later
builder.Append("Error while
loading table " + args.DataTable.TableName);
builder.Append("Values of the row
being loaded are:<br />");
foreach (Object o in args.Values)
{
builder.Append(" - " +
o.ToString() + "<br />");
}
// tell Load method to continue with
next row
args.Continue = true;
}
Figure 6 shows an example that uses both
the Load method by itself, and the sequence of BeginLoadData, Load and EndLoadData methods, to load two DataTable instances
directly. It also contains a handler for the FillError event, which
you can view from the source code, or by clicking the [view source] link that is at the foot of all the example pages on our Web site.

Figure 6 - Using the BeginLoadData, Load and EndLoadData
Methods
Page 2 -->