Chapter 1 presents LINQ, its history, the reasons to use it, and quick “hello world” examples with objects, XML, and SQL.
Software is simple. It boils down to two things: code and data. Writing software is
not so simple, and one of the major activities it involves is writing code that deals
with data.
To write code, we can choose from a variety of programming languages. The
selected language for an application may depend on the business context, on
developer preferences, on the development team’s skills, on the operating system,
or on company policy.
Whatever language you end up with, at some point you will have to deal with
data. This data can be in files on a disk, tables in a database, or XML documents
coming from the Web, or often you have to deal with a combination of all of
these. Ultimately, managing data is a requirement for every software project you’ll
work on.
Given that dealing with data is such a common task for developers, we would
expect rich software development platforms like the .NET Framework to provide
an easy way to do it. .NET does provide wide support for working with data. You will
see, however, that something had yet to be achieved: deeper language and data
integration. This is where LINQ to Objects, LINQ to XML, and LINQ to SQL fit in.
The technologies we present in this book have been designed as a new way to
write code. This book has been written by developers for developers, so don’t be
afraid: You won’t have to wait too long before you are able to write your first lines
of LINQ code! In this chapter, we will quickly introduce "hello world" pieces of
code to give you hints on what you will discover in the rest of the book. The aim is
that, by the end of the book, you will be able to tackle real-world projects while
being convinced that LINQ is a joy to work with.
The intent of this first chapter is to give you an overview of LINQ and to help
you identify the reasons to use it. We will start by describing LINQ
and the LINQ toolset, which includes LINQ to Objects, LINQ to XML, and LINQ to
SQL. We will then review some background information to clearly understand why
we need LINQ and where it comes from. The second half of this chapter will guide
you as you take your first steps with LINQ code.
1.1 WHAT IS LINQ?
Suppose you are writing an application using .NET. Chances are high that at some
point you’ll need to persist objects to a database, query the database, and load the
results back into objects. The problem is that in most cases, at least with relational
databases, there is a gap between your programming language and the database.
Good attempts have been made to provide object-oriented databases, which
would be closer to object-oriented platforms and imperative programming languages
such as C# and VB.NET. However, after all these years, relational databases
are still pervasive, and you still have to struggle with data access and persistence in
all of your programs.
The original motivation behind LINQ was to address the conceptual and technical
difficulties encountered when using databases with .NET programming languages.
With LINQ, Microsoft’s intention was to provide a solution for the
problem of object-relational mapping, as well as to simplify the interaction
between objects and data sources. LINQ eventually evolved into a general-purpose
language-integrated querying toolset. This toolset can be used to access data coming
from in-memory objects (LINQ to Objects), databases (LINQ to SQL), XML
documents (LINQ to XML), a file system, or any other source.
We will first give you an overview of what LINQ is, before looking at the tools it
offers. We will also introduce how LINQ extends programming languages.
1.1.1 Overview
LINQ could be the missing link—whether this pun is intended is yet to be discovered—
between the data world and the world of general-purpose programming
languages. LINQ unifies data access, whatever the source of data, and allows mixing
data from different kinds of sources. It allows for query and set operations, similar
to what SQL statements offer for databases. LINQ, though, integrates queries
directly within .NET languages such as C# and Visual Basic through a set of extensions
to these languages: LINQ means Language-INtegrated Query.
Before LINQ, we had to juggle different languages like SQL, XML, or XPath
along with various technologies and APIs like ADO.NET or System.Xml in every
application written using general-purpose languages such as C# or VB.NET. It goes
without saying that this approach had several drawbacks.1 LINQ glues several
worlds together. It helps us avoid the bumps we would usually find on the road
from one world to another: using XML with objects, objects with relational data,
and relational data with XML are some of the tasks that LINQ will simplify.
One of the key aspects of LINQ is that it was designed to be used against any
type of object or data source and to provide a consistent programming model for
doing so. The syntax and concepts are the same across all of its uses: Once you
learn how to use LINQ against an array or a collection, you also know most of the
concepts needed to take advantage of LINQ with a database or an XML file.
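Here is a minimal sketch of that claim. It is not one of the book's listings: the same query shape runs over an in-memory array (LINQ to Objects) and over a small XML fragment (LINQ to XML). The data and variable names are made up for illustration. The program uses C# top-level statements, so it assumes a C# 9 or later compiler, although the queries themselves only use C# 3.0 syntax.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// LINQ to Objects: query an in-memory array.
int[] numbers = { 5, 12, 8, 21, 3 };
var bigNumbers =
    from n in numbers
    where n > 6
    orderby n
    select n;

// LINQ to XML: the same from/where/orderby/select shape over XML elements.
XElement doc = XElement.Parse(
    "<numbers><n>5</n><n>12</n><n>8</n><n>21</n><n>3</n></numbers>");
var bigNumbersFromXml =
    from e in doc.Elements("n")
    let n = (int)e          // explicit conversion of the element's text to int
    where n > 6
    orderby n
    select n;

Console.WriteLine(string.Join(", ", bigNumbers));        // 8, 12, 21
Console.WriteLine(string.Join(", ", bigNumbersFromXml)); // 8, 12, 21
```

Only the data source and the conversion of XML text to int differ between the two queries. The query keywords and the way the query is written stay the same.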
Another important aspect of LINQ is that when you use it, you work in a
strongly typed world. The benefits include compile-time checking for your queries
as well as nice hints from Visual Studio’s IntelliSense feature.
LINQ will significantly change some aspects of how you handle and manipulate
data with your applications and components. You will discover how LINQ is a step
toward a more declarative programming model. Maybe you will wonder in the
not-so-distant future why you used to write so many lines of code.
There is duality in LINQ. You can conceive of LINQ as consisting of two complementary
parts: a set of tools that work with data, and a set of programming language
extensions.
You’ll first see how LINQ is a toolset that can be used to work with objects, XML
documents, relational databases, or other kinds of data. You’ll then see how LINQ
is also an extension to programming languages like C# and VB.NET.
1.1.2 LINQ as a toolset
LINQ offers numerous possibilities. In this
book, we’ll detail the use of three major flavors of LINQ, or LINQ providers: LINQ
to Objects, LINQ to SQL, and LINQ to XML, covered in parts 2, 3, and 4, respectively. These
three LINQ providers form a family of tools that can be used separately for particular
needs or combined for powerful solutions.
We will focus on LINQ to Objects, LINQ to SQL, and LINQ to XML in this book,
but LINQ is open to new data sources. The three main LINQ providers discussed
in this book are built on top of a common LINQ foundation. This foundation consists
of a set of building blocks including query operators, query expressions, and expression
trees, which allow the LINQ toolset to be extensible.
Other variants of LINQ can be created to provide access to diverse kinds of
data sources. Implementations of LINQ will be released by software vendors, and
you can also create your own implementations, as you’ll see in chapter 12, which
covers LINQ’s extensibility. You can plug a wide array of data sources into LINQ,
including the file system, Active Directory, WMI, the Windows Event Log, or any
other data source or API. This is excellent because you can benefit from LINQ’s
features with a lot of the data sources you deal with every day. In fact, Microsoft
already offers more LINQ providers than just LINQ to Objects, LINQ to SQL, and
LINQ to XML. Two of them are LINQ to DataSet and LINQ to Entities (to work
with the new ADO.NET Entity Framework). We will present these tools in the second
and third parts of this book. For now, let’s keep the focus on the big picture.
Figure 1.1 shows how we can represent the LINQ building blocks and toolset in
a diagram.
The LINQ providers presented in Figure 1.1 are not standalone tools. They can
be used directly in your programming languages. This is possible because the
LINQ framework comes as a set of language extensions. This is the second aspect
of LINQ, which is detailed in the next section.
1.1.3 LINQ as language extensions
LINQ allows you to access information by writing queries against various data
sources. Rather than being simply syntactic sugar2 that would allow you to easily
include database queries right into your C# code, LINQ provides the same type of
expressive capabilities that SQL offers, but in the programming language of your
choice. This is great because a declarative approach like the one LINQ offers
allows you to write code that is shorter and to the point.
Listing 1.1 shows sample C# code you can write with LINQ.
The listing demonstrates all you need to write in order to extract data from a database
and create an XML document from it. Imagine how you would do the same
without LINQ, and you’ll realize how things are easier and more natural with
LINQ. You will soon see more LINQ queries, but let’s keep focused on the language
aspects for the moment. With the from, where, orderby, and select keywords
in the listing, it’s obvious that C# has been extended to enable language-integrated
queries.
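The following hedged sketch is written in the spirit of Listing 1.1. It uses an in-memory array of anonymous objects in place of a database table, so no database or LINQ to SQL mapping is involved. It turns the result of a from/where/orderby/select query into an XML document. The contacts data and element names are hypothetical, and the program uses C# top-level statements.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical in-memory rows standing in for a database table.
var contacts = new[]
{
    new { Name = "Ana",   City = "Paris", Age = 34 },
    new { Name = "Bruno", City = "Lyon",  Age = 28 },
    new { Name = "Chloe", City = "Paris", Age = 41 },
};

// The query keywords are part of C# itself; the query's results become
// the child elements of the new <contacts> element.
XElement xml = new XElement("contacts",
    from c in contacts
    where c.City == "Paris"
    orderby c.Name
    select new XElement("contact",
        new XAttribute("name", c.Name),
        new XAttribute("age", c.Age)));

Console.WriteLine(xml);
```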
We’ve just shown you code in C#, but LINQ provides a common querying architecture
across programming languages. It works with C# 3.0 and VB.NET 9.0 (also
known as VB 2008), and as such requires dedicated compilers, but it can be
ported to other .NET languages. This is already the case for F#, a functional language
for .NET from Microsoft Research, and you can expect to see LINQ support
appear in more .NET languages in the future.
Figure 1.2 shows a typical language-integrated query that is used to talk to
objects, XML, or data tables.
The query in the figure is expressed in C# and not in a new language. LINQ is
not a new language. It is integrated into C# and VB.NET. In addition, LINQ can be
used to avoid entangling your .NET programming language with SQL, XSL, or
other data-specific languages. The set of language extensions that come with
LINQ enables queries over several kinds of data stores to be formulated right into
programming languages. Think of LINQ as a universal remote control, if you wish.
At times, you’ll use it to query a database; at others, you’ll query an XML document.
But you’ll do all this in your favorite language, without having to switch to
another one like SQL or XQuery.
In chapter 2, we’ll show you the details of how the programming languages
have been extended to support LINQ. In chapter 3, you’ll learn how to write LINQ
queries. This is where you’ll learn about query operators, query expressions, and
expression trees. But you still have a few things to discover before getting there.
Now that we have given you an idea of what LINQ is, let’s discuss the motivation
behind it, and then we’ll review its design goals and a bit of history.
1.2 WHY DO WE NEED LINQ?
We have just provided you with an overview of LINQ. The big questions at this
point are: Why do we want a tool like LINQ? What makes the previous tools inconvenient?
Was LINQ created only to make working with programming languages,
relational data, and XML at the same time more convenient?
At the origin of the LINQ project is a simple fact: The vast majority of applications
access data or talk to a relational database. Consequently,
in order to program applications, learning a language such as C# is not enough.
You also have to learn another language such as SQL, and the APIs that tie it
together with C# to form your full application.
We’ll start by taking a look at a piece of data-access code that uses the standard
.NET APIs. This will allow us to point out the common problems that are encountered
in this kind of code. We will then extend our analysis by showing how these
problems exist with other kinds of data such as XML. You’ll see that LINQ
addresses a general impedance mismatch between data sources and programming
languages. Finally, a short code sample will give you a glimpse of how LINQ is a
solution to the problem.
1.2.1 Common problems
The frequent use of databases in applications requires that the .NET Framework
address the need for APIs that can access the data stored within. Of course, this
has been the case since the first appearance of .NET. The .NET Framework Class
Library (FCL) includes ADO.NET, which provides an API to access relational databases
and to represent relational data in memory. This API consists of classes such
as SqlConnection, SqlCommand, SqlDataReader, DataSet, and DataTable, to name a
few. The problem with these classes is that they force the developer to work explicitly
with tables, records, and columns, while modern languages such as C# and
VB.NET use object-oriented paradigms.
Now that the object-oriented paradigm is the prevailing model in software
development, developers incur a large amount of overhead in mapping it to other
abstractions, specifically relational databases and XML. The result is that a lot of
time is spent on writing plumbing code.3 Removing this burden would increase
productivity in data-intensive programming, which LINQ helps us do.
But it’s not only about productivity! It also impacts quality. Writing tedious
and fragile plumbing code can lead to insidious defects in software or degraded
performance.
Listing 1.2 shows how we would typically access a database in a .NET program.
By looking at the problems that exist with traditional code, you’ll be able to see
how LINQ comes to the rescue.
Just by taking a quick look at this code, we can list several limitations of the model:
- Although we want to perform a simple task, several steps and verbose code
are required.
- Queries are expressed as quoted strings, which means they bypass all
kinds of compile-time checks. What if the string does not contain a valid
SQL query? What if a column has been renamed in the database?
- The same applies to the parameters and to the result sets: they are
loosely defined. Are the columns of the type we expect? Also, are we sure
we’re using the correct number of parameters? Are the names of the parameters
in sync between the query and the parameter declarations?
- The classes we use are dedicated to SQL Server and cannot be used with
another database server. Naturally, we could use DbConnection and its
friends to avoid this issue, but that would solve only half of the problem.
The real problem is that SQL has many vendor-specific dialects and data
types. The SQL we write for a given DBMS is likely to fail on a different one.
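The classic ADO.NET code of Listing 1.2 needs a live SQL Server. This sketch makes the point about loosely defined result sets with an in-memory System.Data.DataTable instead, which is an assumption: a real program would typically fill the table from a query. The table, column names, and values are hypothetical. Column names are plain strings and values come back as object, so a misspelled column compiles without complaint and fails only at run time.

```csharp
using System;
using System.Data;

// An in-memory DataTable standing in for a result set read through ADO.NET.
var table = new DataTable("Contacts");
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Age", typeof(int));
table.Rows.Add("Ana", 34);
table.Rows.Add("Bruno", 28);

// Columns are looked up by string and values are returned as object:
// the compiler checks neither the column name nor the cast.
string firstName = (string)table.Rows[0]["Name"];
int firstAge = (int)table.Rows[0]["Age"];

// A misspelled column name compiles fine; DataRow throws only at run time.
bool failedAtRunTime = false;
try
{
    _ = table.Rows[0]["Nmae"];
}
catch (ArgumentException)
{
    failedAtRunTime = true;
}

Console.WriteLine($"{firstName}, {firstAge}");
Console.WriteLine($"Typo detected only at run time: {failedAtRunTime}");
```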
Other solutions exist. We could use a code generator or one of the several object-relational
mapping tools available. The problem is that these tools are not perfect
either and have their own limitations. For instance, if they are designed for
accessing databases, most of the time they don’t deal with other data sources
such as XML documents. Also, one thing that language vendors such as Microsoft
can do that mapping tool vendors can’t is integrate data-access and -querying features
right into their languages. Mapping tools at best present a partial solution
to the problem.
The motivation for LINQ is twofold: Microsoft did not have a data-mapping
solution yet, and with LINQ it had the opportunity to integrate queries into its
programming languages. This could remove most of the limitations we identified
in Listing 1.2.
The main idea is that by using LINQ you are able to gain access to any data
source by writing queries like the one shown in Listing 1.3, directly in the programming
language that you master and use every day.
In this query, the data could be in memory, in a database, in an XML document,
or in another place; the syntax would remain similar if not exactly the same. As
you saw in Figure 1.2, this kind of query can be used with multiple types of data
and different data sources, thanks to LINQ’s extensibility features. For example, in
the future we are likely to see an implementation of LINQ for querying a file system
or for calling web services.
1.2.2 Addressing a paradigm mismatch
Let’s continue looking at why we need LINQ. The fact that modern application
developers have to simultaneously deal with general-purpose programming languages,
relational data, SQL, XML documents, XPath, and so on means that we
need two things:
- To be able to work with any of these technologies or languages individually
- To mix and match them to build a rich and coherent solution
The problem is that object-oriented programming (OOP), the relational database
model, and XML—just to name a few—were not originally built to work together.
They represent different paradigms that don’t play well with each other.
What is this impedance mismatch everybody’s talking about?
Data is generally manipulated by application software written using OOP languages
such as C#, VB.NET, Java, Delphi, and C++. But translating an object graph
into another representation, such as tuples of a relational database, often requires
tedious code.
The general problem LINQ addresses has been stated by Microsoft like this:
"Data != Objects." More specifically, for LINQ to SQL: "Relational data != Objects."
The same could apply for LINQ to XML: "XML data != Objects." We should also add:
"XML data != Relational data."
We’ve used the term impedance mismatch. It is commonly applied to incompatibility
between systems and describes an inadequate ability of one system to accommodate
input from another. Although the term originated in the field of
electrical engineering, it has been generalized and used as a term of art in systems
analysis, electronics, physics, computer science, and informatics.
Object-relational mapping
If we take the object-oriented paradigm and the relational paradigm, the mismatch
exists at several levels. Let’s name a few.
Relational databases and object-oriented languages don’t share the same set of primitive
data types. For example, strings usually have a maximum length in databases, which
is not the case in C# or VB.NET. This can be a problem if you try to persist a 150-
character string in a table field that accepts only 100 characters. Another simple
example is that many databases don’t have a Boolean type, whereas we frequently
use true/false values in many programming languages.
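Here is a small sketch of the two mismatches just described. The 100-character limit is a hypothetical column size. Encoding Booleans as 1 and 0 is one common convention (SQL Server's bit type stores 0 or 1, for example), not the only one. The helper names ToBit and FromBit are made up for illustration, and the program uses C# top-level statements.

```csharp
using System;

// Hypothetical size of the target column, e.g. NVARCHAR(100).
const int MaxTitleLength = 100;

// A C# string has no declared maximum length, so nothing prevents us
// from building a value that the column cannot hold.
string title = new string('x', 150);
bool fitsColumn = title.Length <= MaxTitleLength;

// Without a Boolean column type, mapping code has to translate true/false
// to and from an integer representation (hypothetical helpers).
static int ToBit(bool value) => value ? 1 : 0;
static bool FromBit(int value) => value != 0;

Console.WriteLine($"Length {title.Length}, fits column: {fitsColumn}");
Console.WriteLine($"true -> {ToBit(true)}, 0 -> {FromBit(0)}");
```

Checks like these are exactly the kind of plumbing code that mapping layers otherwise leave to the developer.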
OOP and relational theories come with different data models. For performance reasons
and due to their intrinsic nature, relational databases are usually normalized.
Normalization is a process that eliminates redundancy, organizes data efficiently,
reduces the potential for anomalies during data operations, and improves
data consistency. Normalization results in an organization of data that is specific
to the relational data model. This prevents a direct mapping of tables and records
to objects and collections. Relational databases are normalized in tables and relations,
whereas objects use inheritance, composition, and complex reference
graphs. A common problem exists because relational databases don’t have concepts
like inheritance: Mapping a class hierarchy to a relational database requires
using "tricks."
Programming models. In SQL, we write queries, and so we have a higher-level,
declarative way of expressing the set of data that we’re interested in. With imperative
programming languages such as C# or VB.NET, we have to write for loops and if
statements and so forth.
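To illustrate the difference, this sketch computes the same result twice with made-up data. The first version is imperative and uses foreach and if. The second is declarative and uses a query expression that reads much like SQL. Both sort with the default string comparer, so they produce the same order. The program uses C# top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] cities = { "Paris", "Lyon", "Lille", "Nantes", "Lens" };

// Imperative: spell out how to loop, test, collect, and sort.
var imperative = new List<string>();
foreach (string city in cities)
{
    if (city.StartsWith("L"))
        imperative.Add(city);
}
imperative.Sort();

// Declarative: describe the result we want, as a SQL query would.
var declarative =
    (from city in cities
     where city.StartsWith("L")
     orderby city
     select city).ToList();

Console.WriteLine(string.Join(", ", imperative));  // Lens, Lille, Lyon
Console.WriteLine(string.Join(", ", declarative)); // Lens, Lille, Lyon
```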
Encapsulation. Objects are self-contained and include data as well as behavior.
In databases, data records don’t have behavior, per se. It’s possible to act on database
records only through the use of SQL queries or stored procedures. In relational
databases, code and data are clearly separated.
The mismatch is a result of the differences between a relational database and a
typical object-oriented class hierarchy. We might say relational databases are from
Mars and objects are from Venus.
Let’s take the simple example shown in Figure 1.3. We have an object model
we’d like to map to a relational model.
Concepts such as inheritance or composition are not directly supported by
relational databases, which means that we cannot represent the data the same way
in both models. You can see here that several objects and types of objects can be
mapped to a single table.
Even if we wanted to persist an object model like the one we have here in a
new relational database, we would not be able to use a direct mapping. For
instance, for performance reasons and to avoid duplication, it’s much better in
this case to create only one table in the database. A consequence of doing so, however,
is that data coming from the database table cannot be easily used to repopulate
an object graph in memory. When we win on one side, we lose on the other.
We may be able to design a database schema or an object model to reduce the
mismatch between both worlds, but we’ll never be able to remove it because of
the intrinsic differences between the two paradigms. We don’t even always have
the choice. Often, the database schema is already defined, and in other cases we
have to work with objects defined by someone else.
The complex problem of integrating data sources with programs involves
more than simply reading from and writing to a data source. When programming
using an object-oriented language, we normally want our applications to use an
object model that is a conceptual representation of the business domain, instead
of being tied directly to the relational structure. The problem is that at some
point we need to make the object model and the relational model work together.
This is not an easy task because object-oriented programming languages and .NET
involve entity classes, business rules, complex relationships, and inheritance,
whereas a relational data source involves tables, rows, columns, and primary and
foreign keys.
A typical solution for bridging object-oriented languages and relational databases
is object-relational mapping. This refers to the process of mapping our relational
data model to our object model, usually back and forth. Mapping can be
defined as the act of determining how objects and their relationships are persisted
in permanent data storage, in this case relational databases.
Databases4 do not map naturally to object models. Object-relational mappers
are automated solutions to address the impedance mismatch. To make a long
story short: We provide an object-relational mapper with our classes, database,
and mapping configuration, and the mapper takes care of the rest. It generates
the SQL queries, fills our objects with data from the database, persists them in the
database, and so on.
As you can guess, no solution is perfect, and object-relational mappers could
be improved. Some of their main limitations include the following:
- A good knowledge of the tools is required before being able to use them
efficiently and avoid performance issues.
- Optimal use still requires knowledge of how to work with a relational
database.
- Mapping tools are not always as efficient as handwritten data-access code.
- Not all the tools come with support for compile-time validation.
Multiple object-relational mapping tools are available for .NET. There is a choice
of open source, free, or commercial products. As an example, Listing 1.4 shows a
mapping configuration file for NHibernate, one of the open source mappers.
In part 3 of this book, you’ll see how LINQ to SQL is an object-relational mapping
solution and how it addresses some of these issues. But for now, we are going to
look at another problem LINQ can solve.
Object-XML mapping
Analogous to the object-relational impedance mismatch, a similar mismatch also
exists between objects and XML. For example, the type system described in the W3C
XML Schema specification has no one-to-one relationship with the type system of
the .NET Framework. However, using XML in a .NET application is not much of a
problem because we already have APIs that deal with this under the System.Xml
namespace as well as the built-in support for serializing and deserializing objects.
Still, a lot of tedious code is required most of the time for doing even simple things
on XML documents.
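Here is a short comparison that makes "tedious" concrete, assuming a tiny product catalog is representative. The sketch builds the same document twice: once with the System.Xml DOM (XmlDocument) and once with LINQ to XML (XElement, from System.Xml.Linq), which is covered later in the book. Element names and values are illustrative, and the program uses C# top-level statements.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// DOM style (System.Xml): create every node through the document,
// then attach it to its parent explicitly.
var dom = new XmlDocument();
XmlElement domProducts = dom.CreateElement("products");
dom.AppendChild(domProducts);
XmlElement domProduct = dom.CreateElement("product");
domProduct.SetAttribute("name", "Chai");
XmlElement domPrice = dom.CreateElement("price");
domPrice.InnerText = "18";
domProduct.AppendChild(domPrice);
domProducts.AppendChild(domProduct);

// LINQ to XML style (System.Xml.Linq): the nesting of the constructor
// calls mirrors the structure of the resulting document.
var linqProducts = new XElement("products",
    new XElement("product",
        new XAttribute("name", "Chai"),
        new XElement("price", 18)));

Console.WriteLine(dom.OuterXml);
Console.WriteLine(linqProducts.ToString(SaveOptions.DisableFormatting));
```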
Given that XML has become so pervasive in the modern software world, something
had to be done to reduce the work required to deal with XML in programming
languages.
When you look at these domains, it is remarkable how different they are. The
main source of contention relates to the following facts:
- Relational databases are based on relational algebra and are all about
tables, rows, columns, and SQL.
- XML is all about documents, elements, attributes, hierarchical structures,
and XPath.
- Object-oriented general-purpose programming languages and .NET live in
a world of classes, methods, properties, inheritance, and loops.
Many concepts are specific to each domain and have no direct mapping to
another domain. Figure 1.4 gives an overview of the concepts used in .NET and
object-oriented programming, in comparison to the concepts used in data
sources such as XML documents or relational databases.
Too often, programmers have to do a lot of plumbing work to tie together the
different domains. Different APIs for each data type cause developers to spend an
inordinate amount of time learning how to write, debug, and rewrite brittle code.
The usual culprits that break the pipes are bad SQL query strings or XML tags, or
content that doesn’t get checked until runtime. .NET languages such as C# and
VB.NET assist developers and provide such things as IntelliSense, strongly typed
code, and compile-time checks. Still, these safeguards break down as soon as we include
malformed SQL queries or XML fragments in our code, because the compiler validates
neither.
A successful solution requires bridging the different technologies and solving
the object-persistence impedance mismatch—a challenging and resource-intensive
problem. To solve this problem, we must resolve the following issues between .NET
and data source elements:
- Fundamentally different technologies
- Different skill sets
- Different staff and ownership for each of the technologies
- Different modelling and design principles
Some efforts have been made to reduce the impedance mismatch by bringing
some pieces of one world into another. For example: SQLXML 4.0 ties SQL to XSD;
System.Xml spans XML/XML DOM/XSL/XPath and CLR; the ADO.NET API
bridges SQL and CLR data types; and SQL Server 2005 includes CLR integration.
All these efforts are proof that data integration is essential; however, they represent
distinct moves without a common foundation, which makes them difficult to
use together. LINQ, in contrast, offers a common infrastructure to address the
impedance mismatches.
1.2.3 LINQ to the rescue
To succeed in using objects and relational databases together, you need to understand
both paradigms, along with their differences, and then make intelligent
tradeoffs based on that knowledge. The main goal of LINQ and LINQ to SQL is to
get rid of, or at least reduce, the need to worry about these limits.
An impedance mismatch forces you to choose one side or the other as the "primary"
side. With LINQ, Microsoft chose the programming language side, because
it’s easier to adapt the C# and VB.NET languages than to change SQL or XML.
With LINQ, the aim is to deeply integrate the capabilities of data query and
manipulation languages into programming languages.
LINQ removes many of the barriers among objects, databases, and XML. It
enables us to work with each of these paradigms using the same language-integrated
facilities. For example, we are able to work with XML data and data coming from a
relational database within the same query.
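Here is a minimal sketch of such a mixed query. It assumes an XML order feed plus an in-memory array that stands in for rows of a relational Customers table. No database is involved, and all names and values are hypothetical. A single query expression joins the two sources and totals the orders per customer. The program uses C# top-level statements.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// XML data, e.g. an order feed received from a partner (illustrative).
XElement orders = XElement.Parse(
    @"<orders>
        <order customerId='1' total='120' />
        <order customerId='2' total='75' />
        <order customerId='1' total='30' />
      </orders>");

// Rows that would normally come from a relational table; an in-memory
// array stands in for the database here.
var customers = new[]
{
    new { Id = 1, Name = "Ana" },
    new { Id = 2, Name = "Bruno" },
};

// One query joins relational-style rows with XML elements, then groups.
var totals =
    from c in customers
    join o in orders.Elements("order")
        on c.Id equals (int)o.Attribute("customerId")
    group (decimal)o.Attribute("total") by c.Name into g
    orderby g.Key
    select new { Customer = g.Key, Total = g.Sum() };

foreach (var t in totals)
    Console.WriteLine($"{t.Customer}: {t.Total}");
```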
Because code is worth a thousand words, let’s take a look at a quick code sample
using the power of LINQ to retrieve data from a database and create an XML document
in a single query. Listing 1.5 creates an RSS feed based on relational data.
We will not detail here how this code works. You will see documented examples
like this one in parts 3 and 4 of the book. What is important to note at this point is
how LINQ makes it easy to work with relational data and XML in the same piece of
code. If you have already done this kind of work before, it should be obvious that
this code is very concise and readable in comparison to the solutions at your disposal
before LINQ appeared.
Before seeing more code samples and helping you write your own LINQ code,
we’ll now quickly review where LINQ comes from.
1.3 DESIGN GOALS AND ORIGINS OF LINQ
It’s important to know clearly what Microsoft set out to achieve with LINQ. This is
why we’ll start this section by reviewing the design goals of the LINQ project. It’s
also interesting to know where LINQ has its roots and to understand its links
with other projects you may have heard of. We’ll spend some time looking at the
history of the LINQ project to see how it was born.
LINQ is not a recent project from Microsoft in the sense that it inherits a lot of
features from research and development work done over the last several years.
1.3.1 The goals of the LINQ project
Table 1.1 reviews the design goals Microsoft set for the LINQ project in order to
give you a clear understanding of what LINQ offers.
The number-one LINQ feature presented in Table 1.1 is the ability to deal with
several data types and sources. LINQ ships with implementations that support
querying against regular object collections, databases, entities, and XML sources.
Because LINQ supports rich extensibility, developers can also easily integrate it
with other data sources and providers.
Another essential feature of LINQ is that it is strongly typed. This means the
following:
- We get compile-time checking for all queries. Unlike SQL statements today,
where we typically only find out at runtime if something is wrong, this
means we can check during development that our code is correct. The
direct benefit is a reduction of the number of problems discovered late in
production. Most of the time, issues come from human factors. Strongly
typed queries allow us to detect, early on, typos and other mistakes made by the
developer in charge of the keyboard.
- We get IntelliSense within Visual Studio when writing LINQ queries. This
not only makes typing faster, but also makes it much easier to work against
both simple and complex collection and data source object models.
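The following sketch uses hypothetical product data to show what strong typing buys in practice. The compiler infers the query's element type, so a misspelled member is rejected at compile time instead of surfacing at run time. The rejected line appears only as a comment, since it would not compile. The program uses C# top-level statements.

```csharp
using System;
using System.Linq;

var products = new[]
{
    new { Name = "Chai",  UnitPrice = 18.0m, Discontinued = false },
    new { Name = "Chang", UnitPrice = 19.0m, Discontinued = true  },
};

// The element type of the query is inferred by the compiler; member names
// and types are checked at compile time, which is also what lets
// IntelliSense list them while typing.
var cheap =
    from p in products
    where !p.Discontinued && p.UnitPrice < 20m
    select new { p.Name, p.UnitPrice };

// A typo such as the following is a compile-time error, not a run-time one:
//   where p.UnitPrise < 20m

foreach (var item in cheap)
    Console.WriteLine($"{item.Name}: {item.UnitPrice}");
```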
This is all well and good, but where does LINQ come from? Before delving into
LINQ and starting to use it, let’s see how it was born.
1.3.2 A bit of history
LINQ is the result of a long-term research process inside Microsoft. Several
projects involving evolutions of programming languages and data-access methods
can be considered to be the parents of LINQ to Objects, LINQ to XML (formerly
known as XLinq), and LINQ to SQL (formerly known as DLinq).
Cω (or the C-Omega language)
Cω (pronounced "c-omega") was a project from Microsoft Research that extended
the C# language in several areas, notably the following:
- A control flow extension for asynchronous wide-area concurrency (formerly
known as Polyphonic C#)
- A data type extension for XML and database manipulation (formerly known
as Xen and as X#)
Cω covered more than what comes with LINQ, but a good deal of what is now
included as part of the LINQ technologies was already present in Cω. The Cω
project was conceived to experiment with integrated queries, mixing C# and SQL,
C# and XQuery, and so on. This was carried out by researchers such as Erik
Meijer, Wolfram Schulte, and Gavin Bierman, who published multiple papers on
the subject.
Cω was released as a preview in 2004. A lot has been learned from that prototype,
and a few months later, Anders Hejlsberg, chief designer of the C# language,
announced that Microsoft would be working on applying a lot of that knowledge in
C# and other programming languages. Anders said at that time that his particular
interest in the past couple of years had been to think deeply about the big impedance
mismatch between programming languages, C# in particular, and the data
world. This includes databases and SQL, but also XML and XQuery, for example.
Cω’s extensions to the .NET type system and to the C# language were the first
steps to a unified system that treated SQL-style queries, query result sets, and XML
content as full-fledged members of the language. Cω introduced the stream type,
which is analogous to the .NET Framework 2.0 type System.Collections.Generic.IEnumerable<T>.
Cω also defined constructors for typed tuples (called anonymous
structs), which are similar to the anonymous types we get in C# 3.0 and VB.NET 9.0.
Cω also supported embedded XML, something we find in
VB.NET 9.0 (but not in C# 3.0).
ObjectSpaces
LINQ to SQL is not Microsoft’s first attempt at object-relational mapping. Another
project with a strong relationship to LINQ was ObjectSpaces.
The first preview of the ObjectSpaces project appeared in a PDC 2001
ADO.NET presentation. ObjectSpaces was a set of data access APIs. It allowed data
to be treated as objects, independent of the underlying data store. ObjectSpaces
also introduced OPath, a proprietary object query language. In 2004, Microsoft
announced that ObjectSpaces depended on the WinFS[5] project, and as such
would be postponed to the Orcas timeframe (the next releases after .NET 2.0 and
Visual Studio 2005). No new releases happened after that. Everybody realized that
ObjectSpaces would never see the light of day when Microsoft announced that
WinFS wouldn’t make it into the first release of Windows Vista.
XQuery implementation
Similar to what happened with ObjectSpaces and about the same time, Microsoft
had started working on an XQuery processor. A preview was included in the first
beta release of the .NET Framework version 2.0, but eventually it was decided not
to ship a client-side[6] XQuery implementation in the final version. One problem
with XQuery is that it was an additional language we would have to learn specifically
to deal with XML.
Why all these steps back? Why did Microsoft apparently stop working on these
technologies? Well, the cat came out of the bag at PDC 2005, when the LINQ
project was announced.
LINQ was designed by Anders Hejlsberg and others at Microsoft to
address this impedance mismatch from within programming languages like C#
and VB.NET. With LINQ, we can query pretty much anything. This is why Microsoft
favored LINQ over continued investment in separate projects like
ObjectSpaces or client-side support for XQuery.
As you’ve seen, LINQ has a rich history behind it and has benefited from all the
research and development work done on prior, now-defunct projects. Before we
go further and show you how it works, how to use it, and its different flavors, what
about writing your first lines of LINQ code?
The next three sections provide simple code that demonstrates LINQ to
Objects, LINQ to XML, and LINQ to SQL. This will give you an overview of what
LINQ code looks like and show you how it can help you work with object collections,
XML, and relational data.
1.4 FIRST STEPS WITH LINQ TO OBJECTS: QUERYING COLLECTIONS IN MEMORY
After this introduction, you’re probably eager to look at some code and to take
your first steps with LINQ. We think that you’ll get a better understanding of the
features LINQ provides if you spend some time on a piece of code. Programming
is what this book is about, anyway!
1.4.1 What you need to get started
Before looking at code, let’s spend some time reviewing all you need to be able to
test this code.
Compiler and .NET Framework support and required software
LINQ is delivered as part of the Orcas wave, which includes Visual Studio 2008 and
the .NET Framework 3.5. This version of the framework comes with additional
and updated libraries, as well as new compilers for the C# and VB.NET languages,
but it stays compatible with the .NET Framework 2.0.
LINQ features are a matter of compiler and libraries, not runtime. It is important
to understand that although the C# and VB.NET languages have been enriched
and a few new libraries have been added to the .NET Framework, the .NET runtime
(the CLR) did not need to evolve. New compilers are required for C# 3.0 and
VB.NET 9.0, but the required runtime is still an unmodified version 2.0. This means
that the applications you’ll build using LINQ can run in a .NET 2.0 runtime.[7]
At the time of this writing, LINQ and LINQ to XML, or at least subsets of them,
are supported by the current releases of the Silverlight runtime. They are available
through the System.Linq and System.Xml.Linq namespaces.
All the content of this book and the code samples it contains are based on the
final products, Visual Studio 2008 and .NET 3.5 RTM,[8] which were released on
November 19, 2007.
To set up your machine and be able to run our code samples as you read, you
only need to install the following:
At least one of these versions of Visual Studio:
- Visual C# 2008 Express Edition
- Visual Basic 2008 Express Edition
- Visual Web Developer 2008 Express Edition
- Visual Studio 2008 Standard Edition or higher
If you want to run the LINQ to SQL samples, one of the following is required:
- SQL Server 2005 Express Edition or SQL Server 2005 Compact Edition
(included with most versions of Visual Studio)
- SQL Server 2005
- SQL Server 2000a
- A later version of SQL Server[9]
That’s all for the required software. Let’s now review the programming languages
we’ll use in this book.
Language considerations
In this book, we assume you know the syntax of the C# programming language
and occasionally a bit of VB.NET. For the sake of simplicity, we’ll be light on the
explanations while we introduce our first few code samples. Don’t worry: In chapters
2 and 3, we’ll take the time to present in detail the syntax evolutions provided
by C# 2.0, C# 3.0, VB.NET 9.0, and LINQ. You will then be able to fully understand
LINQ queries.
NOTE:
Most of the examples contained in this book are in C#, but they can
easily be ported to VB.NET, because the syntax is similar between the
two languages.
Code examples are in VB.NET when we examine the features specific
to this language or simply when it makes sense. All the code samples are
available both in C# and VB.NET as a companion source code download,
so you can find them in your language of choice.
All right, enough preliminaries! Let’s dive into a simple example that will show
you how to query a collection in memory using LINQ to Objects. Follow the guide,
and be receptive to the magic of all these new features you’ll be using soon in
your own applications.
1.4.2 Hello LINQ to Objects
You may have had little contact with these new concepts and syntactic constructs.
Fear not! Our ultimate goal is for you to master these technologies, but don’t
force yourself to understand everything at once. We’ll take the time we need to
come back to every detail of LINQ and the new language extensions as we
progress through the book.
Listing 1.6 shows our first LINQ example in C#.
Listing 1.7 shows the same example in VB.NET.
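A minimal C# sketch of such a first query follows. It assumes an input array of five words (hello, wonderful, linq, beautiful, world), chosen so that the output matches the one shown below; the HelloLinq class and its ShortWords helper are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class HelloLinq
{
    // Query expression: keep only the words of five characters or fewer,
    // in the order in which they appear in the source array
    public static IEnumerable<string> ShortWords(string[] words)
    {
        return from word in words
               where word.Length <= 5
               select word;
    }

    static void Main()
    {
        // Hypothetical input, chosen to be consistent with the output shown in the text
        string[] words = { "hello", "wonderful", "linq", "beautiful", "world" };

        foreach (string word in ShortWords(words))
            Console.WriteLine(word);
    }
}
```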
NOTE:
Most of the code examples contained in this book can be copied and
pasted without modification into a console application for testing.
If you were to compile and run this code, here is the output you’d see:
hello
linq
world
As is evident from the results, we have filtered a list of words to select only the
ones whose length is less than or equal to five characters.
We could argue that the same result could be achieved without LINQ using the
code in Listing 1.8.
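For comparison, a loop-based version might look like the following sketch. It filters the same hypothetical word list with a plain foreach and an if statement, with no LINQ involved; the HelloWithoutLinq class and PrintShortWords method are our own names, and the TextWriter parameter is only there so the output can be redirected.

```csharp
using System;
using System.IO;

static class HelloWithoutLinq
{
    // "Old-fashioned" filtering: loop over the words and print the short ones
    public static void PrintShortWords(string[] words, TextWriter output)
    {
        foreach (string word in words)
            if (word.Length <= 5)
                output.WriteLine(word);
    }

    static void Main()
    {
        string[] words = { "hello", "wonderful", "linq", "beautiful", "world" };
        PrintShortWords(words, Console.Out);
    }
}
```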
Notice how this "old-fashioned" code is much shorter than the LINQ version and
very easy to read. Well, don’t give up yet. There is much more to LINQ than what
we show in this first simple program! If you read on, we will help you discover all
the power of LINQ to Objects, LINQ to SQL, and LINQ to XML.
To give you some motivation to keep reading, let’s try to improve our simple
example with grouping and sorting. This should give you an idea of why LINQ is
useful and powerful.
In order to get this result
Words of length 9
  beautiful
  wonderful
Words of length 5
  hello
  world
Words of length 4
  linq
we can use the C# code shown in Listing 1.9.
Listing 1.10 shows the equivalent VB.NET code.
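A minimal C# sketch of such a grouping query, again assuming the same five-word array, can be written with a query continuation (into): a second query then runs over the groups produced by the first. The HelloLinqGrouping class and GroupByLength method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class HelloLinqGrouping
{
    // First query: sort the words alphabetically and group them by length.
    // Continuation (into): sort the resulting groups by length, longest first.
    public static IEnumerable<IGrouping<int, string>> GroupByLength(string[] words)
    {
        return from word in words
               orderby word ascending
               group word by word.Length into lengthGroups
               orderby lengthGroups.Key descending
               select lengthGroups;
    }

    static void Main()
    {
        // Hypothetical input, the same five words as before
        string[] words = { "hello", "wonderful", "linq", "beautiful", "world" };

        foreach (IGrouping<int, string> group in GroupByLength(words))
        {
            Console.WriteLine("Words of length " + group.Key);
            foreach (string word in group)
                Console.WriteLine("  " + word);
        }
    }
}
```

Grouping preserves the order of the elements it receives, which is why the words inside each group stay in alphabetical order.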
In the preceding examples, we have expressed in one query (or, more precisely, two
nested queries) what could be formulated in English as "Sort words from a list
alphabetically and group them by their length in descending order."
We’ll leave doing the same without LINQ as an exercise for you. If you take the
time to do it, you’ll notice that it takes more code and requires a lot of manual work
with collections. One of the first advantages of LINQ that stands out with this example
is the expressiveness it enables: We can express declaratively what we want to
achieve using queries instead of writing convoluted pieces of code.
We won’t take the time right now to get into the details of the code you’ve just
seen. If you are familiar with SQL, you probably already have a good idea of what
the code is doing. In addition to all the nice SQL-like querying, LINQ also provides
a number of other functions such as Sum, Min, Max, Average, and many more.
They let us perform a rich set of operations.
For example, here we sum the amount of each order in a list of orders to compute
a total amount:
decimal totalAmount = orders.Sum(order => order.Amount);
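To try this one-liner on its own, you need an order type and some data. The sketch below assumes a hypothetical Order class with a decimal Amount property and three made-up amounts. It also shows the equivalent foreach loop, which is one way to read what the arrow (a C# 3.0 lambda expression) asks Sum to do for each order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical order type; only the Amount property matters here
class Order
{
    public decimal Amount { get; set; }
}

static class OrderTotals
{
    // The lambda "order => order.Amount" tells Sum which value to add up for each order
    public static decimal TotalWithLinq(IEnumerable<Order> orders)
    {
        return orders.Sum(order => order.Amount);
    }

    // Equivalent hand-written loop, for comparison
    public static decimal TotalWithLoop(IEnumerable<Order> orders)
    {
        decimal total = 0m;
        foreach (Order order in orders)
            total += order.Amount;
        return total;
    }

    static void Main()
    {
        // Made-up sample data
        var orders = new List<Order>
        {
            new Order { Amount = 10.50m },
            new Order { Amount = 25.00m },
            new Order { Amount = 4.25m }
        };

        Console.WriteLine(TotalWithLinq(orders));
        Console.WriteLine(TotalWithLoop(orders));
    }
}
```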
If you haven’t dealt with C# 3.0 yet, you may find the syntax confusing. "What’s
this strange arrow?" you may wonder. We’ll explain this type of code in greater
detail later in the book so you can fully understand it. However, before we continue,
you may wish to test our "Hello LINQ" example and start playing with the
code. Feel free to do so to get an idea of how easy to use LINQ really is.
Once you are ready, let’s move on to LINQ to XML and LINQ to SQL. We’ll spend
some time with these two other flavors of LINQ so you can get an idea of what they
taste like. We will get back to LINQ to Objects in detail in part 2 of this book.
1.5 FIRST STEPS WITH LINQ TO XML: QUERYING XML DOCUMENTS
As we said in the first half of this chapter, the extensibility of the LINQ query architecture
is used to provide implementations that work over both XML and SQL
data. We will now help you take your first steps with LINQ to XML.
LINQ to XML takes advantage of the LINQ framework to offer XML query and
transform capabilities integrated into host .NET programming languages. You can
also think of LINQ to XML as a full-featured XML API comparable to a modernized,
redesigned .NET 2.0 System.Xml plus a few key features from XPath and XSLT. LINQ
to XML provides facilities to edit XML documents and element trees in-memory, as
well as streaming facilities. This means that you’ll be able to use LINQ to XML to
more easily perform many of the XML-processing tasks that you have been performing
with the traditional XML APIs from the System.Xml namespace.
We will first examine why we need an XML API like LINQ to XML by comparing
it to some alternatives. You’ll then take your first steps with some code using
LINQ to XML in an obligatory "Hello World" example.
1.5.1 Why we need LINQ to XML
XML is ubiquitous nowadays, and is used extensively in applications written using
general-purpose languages such as C# or VB.NET. It is used to exchange data
between applications, store configuration information, persist temporary data,
generate web pages or reports, and perform many other things. It is everywhere!
Until now, XML hasn’t been natively supported by most programming languages,
so developers have had to use APIs to deal with XML data. These
APIs include XmlDocument, XmlReader, XPathNavigator, XslTransform for XSLT,
and SAX and XQuery implementations. The problem is that these APIs are not
well integrated with programming languages, often requiring several lines of
unnecessarily convoluted code to achieve a simple result. You’ll see an example of
this in the next section (see Listing 1.13). But for the moment, let’s see what LINQ
to XML has to offer.
LINQ to XML extends the language-integrated query features offered by LINQ
to add support for XML. It offers the expressive power of XPath and XQuery but in
our programming language of choice and with type safety and IntelliSense.
If you’ve worked on XML documents with .NET, you probably used the XML
DOM (Document Object Model) available through the System.Xml namespace.
LINQ to XML leverages experience with the DOM to improve the developer toolset
and avoid the limitations of the DOM.
Table 1.2 compares the characteristics of LINQ to XML with those of the XML
DOM.
Whereas the DOM is low-level and requires a lot of code to precisely formulate
what we want to achieve, LINQ to XML provides a higher-level syntax that allows us
to do simple things simply.
LINQ to XML also enables an element-centric approach in comparison to the
document-centric approach of the DOM. This means that we can easily work with
XML fragments (elements and attributes) without having to create a complete
XML document.
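As a small illustration of this element-centric style, the following sketch builds a standalone book element, with an attribute and a child element, without ever creating an XDocument. The XmlFragments class and CreateBookElement method are hypothetical names; XElement and XAttribute come from the System.Xml.Linq namespace.

```csharp
using System;
using System.Xml.Linq;

static class XmlFragments
{
    // Builds a standalone <book> element: no XDocument is involved
    public static XElement CreateBookElement(string title, string publisher)
    {
        return new XElement("book",
            new XAttribute("title", title),
            new XElement("publisher", publisher));
    }

    static void Main()
    {
        XElement book = CreateBookElement("Ajax in Action", "Manning");

        // The element has no owning document and can be used on its own
        Console.WriteLine(book.Document == null);
        Console.WriteLine(book);
    }
}
```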
Two classes that the .NET Framework offers are XmlReader and XmlWriter.
These classes provide support for working on XML text in its raw form and are
lower-level than LINQ to XML. LINQ to XML uses the XmlReader and XmlWriter
classes underneath and is not a completely new XML API. One advantage of this is
that it allows LINQ to XML to remain compatible with XmlReader and XmlWriter.
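Because of this compatibility, an element can be loaded from an existing XmlReader and written back through an XmlWriter. The sketch below relies on XElement.Load(XmlReader) and XElement.WriteTo(XmlWriter); the ReaderWriterInterop class and RoundTrip method are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

static class ReaderWriterInterop
{
    // Reads XML through an XmlReader into an XElement, then writes it back
    // out through an XmlWriter and returns the resulting text
    public static string RoundTrip(string xml)
    {
        XElement element;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            element = XElement.Load(reader);
        }

        StringBuilder output = new StringBuilder();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            element.WriteTo(writer);
        }
        return output.ToString();
    }

    static void Main()
    {
        string xml = "<book title=\"Ajax in Action\"><publisher>Manning</publisher></book>";
        Console.WriteLine(RoundTrip(xml));
    }
}
```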
LINQ to XML makes creating documents more direct, but it also makes it easier
to query XML documents. Expressing queries against XML documents feels more
natural than having to write a lot of code with several loop instructions. Also,
being part of the LINQ family of technologies, it is a good choice when we need to
join diverse data sources.
With LINQ to XML, Microsoft is aiming at 80 percent of the use cases. These
cases involve straightforward XML formats and common processing. For the other
cases, developers will continue to use the other APIs. Also, although LINQ to XML
takes inspiration from XSLT, XPath, and XQuery, these technologies have benefits
of their own and are designed for specific use cases, and within those scopes LINQ
to XML is in no way able to compete with them. LINQ to XML is not enough for
some specific cases, but its compatibility with the other XML APIs allows us to use
it in combination with these APIs. We’ll keep these kinds of advanced scenarios
for part 4 of this book.
For the moment, let’s discover how LINQ to XML makes a difference by looking
at some code.
1.5.2 Hello LINQ to XML
The running example application we’ll use in this book deals, appropriately
enough, with books. We’ll detail this example in chapter 4. For the moment,
we’ll stick to a simple Book class because it is enough for your first contact with
LINQ to XML.
In our first example, we want to filter and save a set of Book objects as XML.
Here is how the Book class could be defined in C#:[10]
C#
class Book
{
    public string Publisher;
    public string Title;
    public int Year;
    public Book(string title, string publisher, int year)
    {
        Title = title;
        Publisher = publisher;
        Year = year;
    }
}
And here it is in VB.NET:
VB.NET
Public Class Book
    Public Publisher As String
    Public Title As String
    Public Year As Integer
    Public Sub New( _
        ByVal title As String, _
        ByVal publisher As String, _
        ByVal year As Integer)
        Me.Title = title
        Me.Publisher = publisher
        Me.Year = year
    End Sub
End Class
Let’s say we have the following collection of books:
Book[] books = new Book[] {
    new Book("Ajax in Action", "Manning", 2005),
    new Book("Windows Forms in Action", "Manning", 2006),
    new Book("RSS and Atom in Action", "Manning", 2006)
};
Here is the result we would like to get if we ask for the books published in 2006:
<books>
  <book title="Windows Forms in Action">
    <publisher>Manning</publisher>
  </book>
  <book title="RSS and Atom in Action">
    <publisher>Manning</publisher>
  </book>
</books>
Using LINQ to XML, this can be done with the code shown in Listing 1.11.
Listing 1.12 shows the same code in VB.NET.
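A minimal sketch of this LINQ to XML approach follows. It assumes the Book class and book collection shown above (repeated so the program compiles on its own) and nests a query inside the XElement constructor, so that each book published in the requested year becomes one book element. The HelloLinqToXml class and BooksToXml method are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Book class as defined in the text
class Book
{
    public string Publisher;
    public string Title;
    public int Year;

    public Book(string title, string publisher, int year)
    {
        Title = title;
        Publisher = publisher;
        Year = year;
    }
}

static class HelloLinqToXml
{
    // Functional construction: the nesting of constructors mirrors the XML shape,
    // and the embedded query produces one <book> element per matching book
    public static XElement BooksToXml(Book[] books, int year)
    {
        return new XElement("books",
            from book in books
            where book.Year == year
            select new XElement("book",
                new XAttribute("title", book.Title),
                new XElement("publisher", book.Publisher)));
    }

    static void Main()
    {
        Book[] books = new Book[] {
            new Book("Ajax in Action", "Manning", 2005),
            new Book("Windows Forms in Action", "Manning", 2006),
            new Book("RSS and Atom in Action", "Manning", 2006)
        };

        Console.WriteLine(BooksToXml(books, 2006));
    }
}
```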
In contrast, Listing 1.13 shows how we would build the same document without
LINQ to XML, using the XML DOM.
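For comparison, a DOM-based sketch of the same output, using XmlDocument from System.Xml, could look like the following. The explicit element creation, attribute setting, and AppendChild calls are what make this style more verbose. The HelloDom class and BooksToXml method are hypothetical names, and the Book class is repeated so the sketch compiles on its own.

```csharp
using System;
using System.Xml;

// Book class as defined in the text
class Book
{
    public string Publisher;
    public string Title;
    public int Year;

    public Book(string title, string publisher, int year)
    {
        Title = title;
        Publisher = publisher;
        Year = year;
    }
}

static class HelloDom
{
    // DOM version: every node is created through the document and appended by hand
    public static XmlDocument BooksToXml(Book[] books, int year)
    {
        XmlDocument doc = new XmlDocument();
        XmlElement root = doc.CreateElement("books");
        doc.AppendChild(root);

        foreach (Book book in books)
        {
            if (book.Year != year)
                continue;

            XmlElement bookElement = doc.CreateElement("book");
            bookElement.SetAttribute("title", book.Title);

            XmlElement publisherElement = doc.CreateElement("publisher");
            publisherElement.InnerText = book.Publisher;

            bookElement.AppendChild(publisherElement);
            root.AppendChild(bookElement);
        }
        return doc;
    }

    static void Main()
    {
        Book[] books = new Book[] {
            new Book("Ajax in Action", "Manning", 2005),
            new Book("Windows Forms in Action", "Manning", 2006),
            new Book("RSS and Atom in Action", "Manning", 2006)
        };

        // Writes the resulting XML to the console
        BooksToXml(books, 2006).Save(Console.Out);
    }
}
```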
As you can see, LINQ to XML is more visual than the DOM. The structure of the
code that builds our XML fragment is close to the structure of the document we want to produce.
We could say that it’s WYSIWYM code: What You See Is What You Mean.
Microsoft names this approach the Functional Construction pattern. It allows us
to structure code in such a way that it reflects the shape of the XML document (or
fragment) that we’re constructing.
In VB.NET, the code can be even closer to the resulting XML, as shown in Listing
1.14.
The listing uses a new syntax named XML literals, which is highlighted in bold. Literal
means something that is output as part of the result. Here, the books, book, and
publisher XML elements will be part of the generated XML. XML literals allow us
to use a template of the XML we’d like to get, with a syntax comparable to ASP.
The XML literals feature is not provided by C# 3.0. It exists only in VB.NET 9.0.
You will discover that VB.NET comes with more language-integrated features than
C# to work with XML.
You’ll get the details about XML literals and everything else you need to know
to make the most of LINQ to XML in part 4 of the book. For the moment, we still
have one major piece of the LINQ trilogy to introduce: LINQ to SQL.
1.6 FIRST STEPS WITH LINQ TO SQL: QUERYING RELATIONAL DATABASES
LINQ’s ambition is to make queries a natural part of the programming language.
LINQ to SQL, which made its first appearance as DLinq, applies this concept to
allow developers to query relational databases using the same syntax that you have
seen with LINQ to Objects and LINQ to XML.
After summing up how LINQ to SQL will help us, we’ll show you how to write
your first LINQ to SQL code.
1.6.1 Overview of LINQ to SQL’s features
LINQ to SQL provides language-integrated data access by using LINQ’s extension
mechanism. It builds on ADO.NET to map tables and rows to classes and objects.
LINQ to SQL uses mapping information encoded in .NET custom attributes or
contained in an XML document. This information is used to automatically handle
the persistence of objects in relational databases. A table can be mapped to a class
and the table’s columns to properties of the class, and relationships between
tables can be represented by additional properties.
LINQ to SQL automatically keeps track of changes to objects and updates the
database accordingly through dynamic SQL queries or stored procedures. This is
why we don’t have to write the SQL queries ourselves most of the time. But all
this will be covered in part 3 of this book. For the moment, let’s take our first
steps with LINQ to SQL code.
1.6.2 Hello LINQ to SQL
The time has come to look at some code using LINQ to SQL. As you saw in our
Hello LINQ example, we are able to write queries against a collection of objects.
The following C# code snippet filters an in-memory collection of contacts based
on their city:
from contact in contacts
where contact.City == "Paris"
select contact;
The good news is that thanks to LINQ to SQL, doing the same on data from a relational
database is direct:
from contact in db.GetTable<Contact>()
where contact.City == "Paris"
select contact;
This query works on a list of contacts from a database. Notice how subtle the difference
is between the two queries. Only the object on which we are working is
different; the query syntax is exactly the same. This shows how we’ll be able to
work the same way with multiple types of data. This is what is so great about LINQ!
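To experiment with the first, in-memory form of the query without a database, you can run it against an ordinary array. The sketch below assumes a hypothetical Contact class with Name and City properties and made-up data; the InMemoryContacts class and InParis method are also our own names. With LINQ to SQL, only the source of the query would change, to db.GetTable<Contact>().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity used only for this in-memory illustration
class Contact
{
    public string Name { get; set; }
    public string City { get; set; }
}

static class InMemoryContacts
{
    // Same query shape as in the text; with LINQ to SQL the source
    // would be db.GetTable<Contact>() instead of an in-memory collection
    public static IEnumerable<Contact> InParis(IEnumerable<Contact> contacts)
    {
        return from contact in contacts
               where contact.City == "Paris"
               select contact;
    }

    static void Main()
    {
        // Made-up sample data
        Contact[] contacts =
        {
            new Contact { Name = "Alice", City = "Paris" },
            new Contact { Name = "Bob", City = "London" },
            new Contact { Name = "Chloe", City = "Paris" }
        };

        foreach (Contact contact in InParis(contacts))
            Console.WriteLine("Bonjour " + contact.Name);
    }
}
```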
As an astute reader, you know that the language a relational database understands
is SQL, and you suspect that our LINQ query must be translated into a SQL
query at some point. This is the heart of the technology: In the first example, the
collection is iterated in memory, whereas in the second code snippet, the query is
used to generate a SQL query that is sent to a database server. In the case of LINQ
to SQL queries, the real processing happens on the database server. What’s appealing
about these queries is that we have a nice strongly typed query API, in contrast
with SQL, where queries are expressed in strings and not validated at compile time.
We will dissect the inner workings of LINQ to SQL in the third part of this
book, but let’s first walk through a simple complete example. To begin with,
you’re probably wondering what db.GetTable<Contact>() means in our LINQ to
SQL query.
Entity classes
The first step in building a LINQ to SQL application is declaring the classes we’ll
use to represent our application data: our entities.
In our simple example, we’ll define a class named Contact and associate it
with the Contacts table of the Northwind sample database provided by Microsoft
with the LINQ code samples.[11] To do this, we need only to apply a custom
attribute to the class:
[Table(Name="Contacts")]
class Contact
{
    public int ContactID;
    public string Name;
    public string City;
}
The Table attribute is provided by LINQ to SQL in the System.Data.Linq.Mapping
namespace. It has a Name property that is used to specify the name of the
database table.
In addition to associating entity classes with tables, we need to denote each
field or property we intend to associate with a column of the table. This is done
with the Column attribute:
[Table(Name="Contacts")]
class Contact
{
    [Column(IsPrimaryKey=true)]
    public int ContactID { get; set; }
    [Column(Name="ContactName")]
    public string Name { get; set; }
    [Column]
    public string City { get; set; }
}
The Column attribute is also part of the System.Data.Linq.Mapping namespace. It
has a variety of properties we can use to customize the exact mapping between
our fields or properties and the database’s columns. You can see that we use the
IsPrimaryKey property to tell LINQ to SQL that the table column named
ContactID is part of the table’s primary key. Notice how we indicate that the ContactName
column is to be mapped to the Name property. We don’t specify the names of
the other columns or the types of the columns: In our case, LINQ to SQL will
deduce them from the members of the class.
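To see how LINQ to SQL interprets these attributes without connecting to a database, you can ask the attribute-based mapping source for its model. This sketch assumes a .NET Framework project that references System.Data.Linq.dll; it uses AttributeMappingSource, MetaModel, MetaTable, and MetaDataMember from System.Data.Linq.Mapping, while the MappingInspector class is a hypothetical name. We pass typeof(DataContext) because our example uses a plain DataContext rather than a derived class.

```csharp
using System;
using System.Data.Linq;
using System.Data.Linq.Mapping;

// Same mapping as in the text (requires a reference to System.Data.Linq.dll)
[Table(Name = "Contacts")]
class Contact
{
    [Column(IsPrimaryKey = true)]
    public int ContactID { get; set; }

    [Column(Name = "ContactName")]
    public string Name { get; set; }

    [Column]
    public string City { get; set; }
}

static class MappingInspector
{
    // Builds the attribute-based mapping model a plain DataContext would use
    // and returns the metadata for the Contact entity; no connection is opened
    public static MetaTable GetContactTable()
    {
        MappingSource mappingSource = new AttributeMappingSource();
        MetaModel model = mappingSource.GetModel(typeof(DataContext));
        return model.GetTable(typeof(Contact));
    }

    static void Main()
    {
        MetaTable table = GetContactTable();
        Console.WriteLine("Table: " + table.TableName);

        // Each persistent member knows the column it maps to
        foreach (MetaDataMember member in table.RowType.PersistentDataMembers)
        {
            Console.WriteLine("{0} -> {1}{2}",
                member.Name,
                member.MappedName,
                member.IsPrimaryKey ? " (primary key)" : string.Empty);
        }
    }
}
```

Running it should list the table name followed by each mapped member and its column name, with ContactID flagged as the primary key.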
The DataContext
The next thing we need to prepare before being able to use language-integrated
queries is a System.Data.Linq.DataContext object. The purpose of DataContext
is to translate requests for objects into SQL queries made against the database and
then assemble objects out of the results.
We will use the Northwnd.mdf database provided with the code samples
accompanying this book. This database is in the Data directory, so the creation of
the DataContext object looks like this:
string path = Path.GetFullPath(@"..\..\..\..\Data\northwnd.mdf");
DataContext db = new DataContext(path);
The constructor of the DataContext class takes a connection string as a parameter.
Because we are using SQL Server 2005 Express Edition, a path to the database
file is sufficient.
The DataContext provides access to the tables in the database. Here is how to
get access to the Contacts table mapped to our Contact class:
Table<Contact> contacts = db.GetTable<Contact>();
DataContext.GetTable is a generic method, which allows us to work with strongly
typed objects. This is what will allow us to use a LINQ query.
We are now able to write a complete code sample, as seen in Listing 1.15.
Executing this code gives the following result:
Bonjour Marie Bertrand
Bonjour Dominique Perrier
Bonjour Guylène Nodier
Here is the SQL query that was sent to the server transparently:
SELECT [t0].[ContactID], [t0].[ContactName] AS [Name], [t0].[City]
FROM [Contacts] AS [t0]
WHERE [t0].[City] = @p0
Notice how easy it is to get strongly typed access to a database thanks to LINQ.
This is a simplistic example, but it gives you a good idea of what LINQ to SQL has
to offer and how it could change the way you work with databases.
Let’s sum up what has been done automatically for us by LINQ to SQL:
- Opening a connection to the database
- Generating the SQL query
- Executing the SQL query against the database
- Creating and filling our objects out of the tabular results
As an exercise, you can try to do the same without LINQ to SQL. For example, you
can try to use a DataReader. You’ll notice the following things in the old-school
code when comparing it with our LINQ to SQL code:
- Queries written explicitly in SQL, in quotes
- No compile-time checks
- Loosely bound parameters
- Loosely typed result sets
- More code required
- More knowledge required
Writing standard data-access code hinders productivity for simple cases. In contrast,
LINQ to SQL allows us to write data-access code that doesn’t get in the way.
Before concluding our introduction to LINQ to SQL, let’s review some of its
features.
1.6.3 A closer look at LINQ to SQL
You have seen that LINQ to SQL is able to generate dynamic SQL queries based
on language-integrated queries. This may not be suitable for every situation, and
so LINQ to SQL also supports custom SQL queries and stored procedures so that
we can use our own handwritten SQL code and still benefit from the LINQ to
SQL infrastructure.
In our example, we provided the mapping information using custom attributes
on our classes; but if you prefer not to have this kind of information hard-coded
in your binaries, you are free to use an external XML mapping file to do the same.
To get a better understanding of how LINQ to SQL works, we created our entity
classes and provided the mapping information. In practice, this code would typically
be generated by tools that come with LINQ to SQL or by the graphical
LINQ to SQL Designer.
The list of LINQ to SQL’s features is much longer than this and includes things
such as support for data binding, interoperability with ADO.NET, concurrency
management, support for inheritance, and help for debugging. Let’s keep that
for later; we promise that all this and more will be covered in detail in part 3 of
the book.[12]
1.7 SUMMARY
This first chapter presented the motivation behind the LINQ technologies. You
also took your first steps with LINQ to Objects, LINQ to XML, and LINQ to SQL
code.
Although we have just scratched the surface of the possibilities offered by
LINQ, we hope you now have an idea of the potential power these technologies
provide. As you’ve seen, LINQ is not about taking SQL or XML and slapping
them into C# or VB.NET code. It’s much more than that, as you’ll see in
the next chapters.
LINQ unlocks a whole new way to access data from within your applications.
However, LINQ would not be possible without the addition of a number of features
to programming languages. We will start the next chapter by reviewing the
enhancements that have been made to the C# and VB.NET languages to enable
language-integrated queries.
[1] "It was like you had to order your dinner in one language and drinks in another," said Jason McConnell,
product manager for Visual Studio at Microsoft. "The direct benefit is programmers are more productive
because they have this unified approach to querying and updating data from within their language."
[2] Syntactic sugar is a term coined by Peter J. Landin for additions to the syntax of a computer language
that do not affect its expressiveness but make it "sweeter" for humans to use. Syntactic sugar gives the
programmer an alternative way of coding that is more practical, either by being more succinct or more
like some familiar notation.
[3] It is estimated that dealing with the task of storing and retrieving objects to and from data stores
accounts for between 30 and 40 percent of a development team’s time.
[4] We are talking only about relational databases here because this is what is used in the vast majority of
business applications. Object-oriented databases offer a different approach that allows persisting objects
more easily. Whether object-oriented databases are better than relational databases is another debate,
which we are not going to address in this book.
[5] WinFS was a project for a relational file system Microsoft had been developing for Windows. It was canceled
in 2006.
[6] A server-side implementation of XQuery is included with SQL Server 2005, and now that the XQuery
standard has been finalized, Microsoft is once again considering whether to add support for XQuery in
.NET.
[7] Nevertheless, .NET 2.0 Service Pack 1 is required for LINQ to SQL.
[8] Release To Manufacturing.
[9] The new data types provided by SQL Server 2008 are not supported by the first release of LINQ to SQL.
[10] Here we use public fields in the Book class for the sake of simplicity, but properties and private fields
would be better. Another option is to use auto-implemented properties, which is a new feature of C# 3.0.
You’ll see auto-implemented properties in action in chapters 2, 7, and 13.
[11] See the CSharpSamples.zip and VBSamples.zip files in the Samples subfolder of your Visual Studio 2008
installation folder.
[12] It should be noted that while LINQ to SQL includes a lot of functionality, its narrow focus means it
doesn’t include some of the features found in other object-relational mapper products available today
on the market. In 2008, Microsoft will be providing an even broader object-relational mapping solution:
the ADO.NET Entity Framework. We will include a quick introduction to it after discussing LINQ to
SQL later in this book.