What to Know About Value Types and Reference Types in C#: ASP Alliance


by Brendan Enrick

Introduction

Anyone planning on writing code in C# should understand the data types available. Before diving into those, however, it is important to understand value types and reference types. This article is designed to give you a quick understanding of both; it is not meant for people who already understand the difference between these two categories of data types. Knowing this information will give you a much better understanding of how data is stored and how to correctly interact with it in C#. As you read on, I explain the difference between these two classifications of data types and how these differences impact you as a C# developer.

Simple Conceptual Difference

I will start by giving a short and simple explanation of the differences between these two. Every data type in C# is either a value type or a reference type. You can think of a value type variable as actually being the "value." All we are storing is that value. Think of a reference type variable as being a "reference" to the value. In this case we are storing a reference to the value you are using, not the value itself.

Value Type Examples

· int

· float

· bool

· byte

Reference Type Examples

· string

· object

· List<T>

· Regex

Notice that the simpler, baser types tend to be values and the more advanced types tend to be references. If it really seems like what is being stored is just one single value, it is likely what you are dealing with is a value type. If the data seems more complex or there are simply more bits and pieces of information which are stored together, it is likely you are dealing with a reference type.
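If you are ever unsure which category a type falls into, you can ask the runtime. The following is a minimal sketch, not a definitive tool; the class name TypeKindDemo and the method Classify are made up for illustration. It relies on the Type.IsValueType property from the .NET base class library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, used only for this illustration.
public static class TypeKindDemo
{
    // Returns "value" for value types and "reference" for reference types.
    public static string Classify(Type type)
    {
        return type.IsValueType ? "value" : "reference";
    }

    public static void Main()
    {
        Type[] types =
        {
            typeof(int), typeof(float), typeof(bool), typeof(byte),
            typeof(string), typeof(object), typeof(List<int>), typeof(Regex)
        };

        // Print each type's runtime name next to its category.
        foreach (Type type in types)
        {
            Console.WriteLine(type.Name + ": " + Classify(type));
        }
    }
}
```

The first four types in the array report "value" and the last four report "reference", matching the two lists above.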

Value types can often be set directly from literal values such as 42, 3.14159, true, and false.

Reference types are similar to pointers in C and C++. (They are not the same, but they are close.) I say this because it helps people understand reference types to think of them as being like pointers to the data. Some C++ purists will probably criticize me for explaining it this way.

Figure 1: Basic Concept

In Figure 1, I have defined two variables: foo and bar. Notice that the value of foo is stored at the location of foo, but with bar there is only a reference to the letters.

Storage Considerations

One fairly obvious statement is that data types take up different amounts of space. An integer obviously does not take as much space as a Person class would. The Person class would require space for a name, birth date, etc. We also have collection types, and with collections we do not know how much space we will need for them. It is not defined in advance at all. We need to consider and understand how each of these cases is handled.

As a general rule, we store value types on the stack in C# and we store reference types on the heap. This is not entirely correct, but it is fairly close. A value type is stored on the stack when it is a local variable within a method. This is because each method call gets its own area on the stack for its local data. If the value type is instead a field of a class, it is part of a reference type and is therefore stored with that reference type. This means that it will be included on the heap along with the rest of the data associated with the object.
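C# does not let you look at stack and heap addresses in ordinary code, but you can observe the consequence of a value type living inside an object. The following is a sketch under that assumption; the Person class and the StorageDemo class are hypothetical names chosen for illustration.

```csharp
using System;

// Hypothetical class for illustration only.
public class Person
{
    public string Name; // reference-type field
    public int Age;     // value-type field, stored inside the Person object
}

public static class StorageDemo
{
    public static int AgeSeenThroughSecondReference()
    {
        int localAge = 30; // a local value type variable
        Person first = new Person { Name = "Brendan", Age = localAge };
        Person second = first; // copies the reference, not the object

        first.Age = 31; // changes the single shared object
        localAge = 99;  // changes only the local copy

        return second.Age;
    }

    public static void Main()
    {
        Console.WriteLine(AgeSeenThroughSecondReference()); // prints 31
    }
}
```

The Age field travels with the object it belongs to, so both references see the update, while the separate local variable localAge is unaffected by it and does not affect it.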

Stack

The stack manages the currently executing code, which means that it contains, for example, method-specific information. Local variables are stored here, which includes value types and, in some sense, reference types. A reference type variable is here, but it only contains the address of the actual data. This lets you have access to the data in the current method even though the data is stored elsewhere. The stack builds up as we go deeper into methods. You have probably seen a "Stack Trace." It is basically some diagnostic info showing what is currently on the stack. For the most part, this is just a list of nested method calls. We call it a stack because each new context stacks on top of the previous one.

Heap

The heap is a separate section of memory whose main purpose is to manage the data we want to allocate dynamically. The main benefit of this memory is that it is easily allocated for objects of varying size, usually collections. This is the area where the actual data is stored for variables of reference type. We have a pointer to the data which we do not see because .NET is a managed environment. The pointer is the variable we are using. It is stored on the stack when it is a local variable. It is the way we access this heap-stored data. This is why we call it a reference type: we basically just keep a reference to the data.

Dynamic Data Types

Some data types require an unknown amount of space. This creates the need for memory to be dynamically allocated for this data. When we dynamically allocate data, we are allocating it in the managed heap of the .NET runtime. As I have said previously, collections are the primary form of dynamic data in C#. Lists, stacks, queues, etc. are a few of the collections which we use. These are all reference types. If you tried to store collections on the stack it would get pretty messy pretty quickly, since collections tend to grow and shrink. That's why the heap is so much nicer than the stack for dynamic data.
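As a small illustration of data whose size is not known in advance, the sketch below grows and then shrinks a List<int> at run time. The class name DynamicDataDemo and the method BuildList are hypothetical; the List<T> members used (Add, RemoveRange, Count) are standard.

```csharp
using System;
using System.Collections.Generic;

public static class DynamicDataDemo
{
    // Builds a list whose size is only known when the method is called.
    public static List<int> BuildList(int itemCount)
    {
        var numbers = new List<int>();
        for (int i = 0; i < itemCount; i++)
        {
            numbers.Add(i); // the list grows as needed
        }
        return numbers;
    }

    public static void Main()
    {
        List<int> numbers = BuildList(1000);
        Console.WriteLine(numbers.Count); // prints 1000

        numbers.RemoveRange(0, 990);      // the list can shrink again
        Console.WriteLine(numbers.Count); // prints 10
    }
}
```

The variable numbers itself stays the same size the whole time, since it only holds a reference; the storage that grows and shrinks belongs to the list object.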

Reference Assignment

One of the hardest concepts for people to understand when it comes to reference types is assignment. It is often difficult to realize how assignments work with reference types. The problem with this is that a lot of people assume that value types and reference types handle assignment in the same way. This is wrong. Assignments with value types copy the value stored in the memory location. Reference types merely copy the reference to the value.

Basic assignment of reference types

When you assign a variable which is a reference type to another variable of the same type, it does not duplicate the values. What happens behind the scenes is that the pointer to the memory location is duplicated.

In the following code I create two variables; both of them are strings. The first one I start by giving a string value. The second one I start by assigning null. I then assign the first variable to it. In the accompanying illustration you can see what changes after the assignment.

Listing 1: Reference Assignment

string myName = "Brendan";
string authorName = null;
 
authorName = myName;

Figure 2: Reference Assignment

It is important to notice here that we have not copied the data. All we have done is copy the reference. This means that we are pointing at the exact same data for both variables. I have seen many people run into problems where they assumed that an assignment with reference type variables worked the same way as with value types. They assume the value has been duplicated. This is a dangerous assumption, since they may alter the data through one variable without realizing the change is visible through both. Strings happen to be immutable in C#, so you cannot alter one in place; the danger shows up with mutable reference types such as collections or your own classes.
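To make the difference concrete, here is a minimal sketch comparing the two kinds of assignment. The class name AssignmentDemo and its method names are made up for illustration. It uses a List<string> for the mutable case because strings cannot be changed in place.

```csharp
using System;
using System.Collections.Generic;

public static class AssignmentDemo
{
    // Value type: the assignment copies the value itself.
    public static int OriginalAfterValueCopy()
    {
        int original = 1;
        int copy = original;
        copy = 2;        // only the copy changes
        return original;
    }

    // Reference type: the assignment copies only the reference.
    public static int OriginalCountAfterReferenceCopy()
    {
        var original = new List<string> { "Brendan" };
        List<string> alias = original;
        alias.Add("Enrick");   // changes the one shared list
        return original.Count;
    }

    // Both string variables refer to the same string object.
    public static bool StringsShareOneObject()
    {
        string myName = "Brendan";
        string authorName = myName;
        return object.ReferenceEquals(myName, authorName);
    }

    public static void Main()
    {
        Console.WriteLine(OriginalAfterValueCopy());          // prints 1
        Console.WriteLine(OriginalCountAfterReferenceCopy()); // prints 2
        Console.WriteLine(StringsShareOneObject());           // prints True
    }
}
```

If you need an independent copy of a list, one option is to construct a new one from the old, for example new List<string>(original).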

Null Values

One big difference between value and reference type variables is null. With a reference type, having the null value basically means that it is not referencing anything. Value types obviously cannot ever have the null value because we are saying that it is a special value having to do with references.

If you have ever tried to assign the null value to a value type, such as assigning null to an integer variable, you have probably received a compiler error. This is because value types do not reference anything, so the concept of null makes no sense for them. It is not a valid integer value.

Nullable Types

In C# we have some special types which let our value types accept null. These are called Nullable Types. That description is a simplification, as I explain after the listing. The main way to obtain a nullable type is to follow the type identifier with a question mark. So, for example, I would define one in the following way.

Listing 2: Nullable Values

int? myNullableInteger = null;
bool? myNullableBool = null;

What I have done here is not really what it appears. I am allowing a null value for an integer, but not really. The way this works is that there is a struct called Nullable<T>. It is a generic type, so really I made a Nullable<Int32> and a Nullable<Boolean> with the above statements. The Nullable<T> struct defines a pair of read-only properties: HasValue and Value. HasValue is just used to check if it is null or not. If it is not null then you are safe to read the Value. Reading Value when HasValue is false throws an InvalidOperationException.
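The following sketch shows one way HasValue and Value might be used together, along with the ?? operator as a shorthand for supplying a default. The class name NullableDemo and its method names are hypothetical.

```csharp
using System;

public static class NullableDemo
{
    // Checks HasValue before touching Value.
    public static string Describe(int? number)
    {
        if (number.HasValue)
        {
            return "has value " + number.Value;
        }
        return "no value";
    }

    // The ?? operator supplies a fallback when the nullable is empty.
    public static int OrZero(int? number)
    {
        return number ?? 0;
    }

    // Reading Value without a value present throws InvalidOperationException.
    public static bool ValueThrowsWhenEmpty()
    {
        int? empty = null;
        try
        {
            Console.WriteLine(empty.Value); // never prints
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Describe(42));            // prints "has value 42"
        Console.WriteLine(Describe(null));          // prints "no value"
        Console.WriteLine(OrZero(null));            // prints 0
        Console.WriteLine(ValueThrowsWhenEmpty());  // prints True
    }
}
```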

Boxing and Unboxing

The concepts of boxing and unboxing are interesting ones. They're associated with value and reference types. Boxing is the process through which a programming language stores a value type as a reference type. You might be wondering how this is different from the Nullable type discussed in the section before this one.

Nullable<T> is a special generic struct which wraps the value type it manages, and this is how it gives the illusion of a nullable value. The result is still a value type. When you use boxing, generics are not involved; the value is simply stored as an object.

When you use boxing, the fact that your variable is a value type is hidden, because the variable is stored as an object. A good way to remember this term is that boxing is hiding your value inside of a box, so code holding the box no longer sees at compile time what type of value is inside. The following lines of code show an example of this.

Listing 3: Boxing and Unboxing

int myValue = 7; // Value type
object myObject = myValue; // Implicitly boxing the value
object myObject2 = (object)myValue; // Explicitly boxing the value
 
int theValueAgain = (int)myObject; // Explicitly unboxing the value

As you can see in this example, boxing and unboxing are easy to do and happen often. Whenever a value type is converted to object (which is a reference type), boxing must occur. When you extract the value type back out of that object you are unboxing the value.
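Two details of boxing are easy to miss: the box holds its own copy of the value, and unboxing must name the exact type that was boxed. The sketch below illustrates both; the class name BoxingDemo and its method names are made up for illustration.

```csharp
using System;

public static class BoxingDemo
{
    // Boxing copies the value, so later changes to the variable
    // do not affect what is inside the box.
    public static int BoxKeepsItsOwnCopy()
    {
        int myValue = 7;
        object boxed = myValue; // boxing
        myValue = 8;
        return (int)boxed;      // unboxing
    }

    // The box still remembers its real type at run time.
    public static string RuntimeTypeOfBox()
    {
        object boxed = 7;
        return boxed.GetType().Name;
    }

    // Unboxing a boxed int as a long throws InvalidCastException.
    public static bool UnboxToWrongTypeFails()
    {
        object boxed = 7;
        try
        {
            long wrong = (long)boxed;
            Console.WriteLine(wrong); // never prints
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(BoxKeepsItsOwnCopy());    // prints 7
        Console.WriteLine(RuntimeTypeOfBox());      // prints Int32
        Console.WriteLine(UnboxToWrongTypeFails()); // prints True
    }
}
```

To get a long out of a boxed int, unbox to int first and then convert: (long)(int)boxed.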

An easy way to remember these terms is that when you put something in a box you no longer see what it is, you just see the box. When you pull something out of a box, you see what it really is since it is no longer hidden away in the box.

Notes to remember

Value types store the value and reference types store a reference to the value. The value types are how most people think of variables. Reference types work a little bit like pointers in other languages. Since C# is a managed language these pointers are hidden away so the programmer does not have access to them.

Assignment of reference types does not duplicate the value as is done with value types. The reference types only copy the reference to the value. This means that the information for a reference type is only stored in that one spot and multiple variables can point to it.

Reference types can be null. Value types cannot be null, but a struct exists which can wrap around value types so they can be null. The Nullable struct can be used to allow the types which are normally value types to be able to handle null values.

Boxing is the process of storing a value type as a reference type. Unboxing is the process of retrieving the value type back from the reference type.

Conclusion

After reading this article you should have a basic understanding of how value types and reference types work. Do not believe you are an expert on the subject now; reading this article will not make you one. It should give you enough of an understanding of the subject that working with these types in the future will not be difficult. When problems with them arise you will understand how they work and be able to work your way through. Go forth with this new knowledge and solve the problems of the programming world.

About the Author

Brendan Enrick