Skip to content

Arrays

A string, as you’ve recently learned, is a sequence or ordered list of zero or more characters. Strings are ideal for representing text, such as words, sentences, or paragraphs.

But what if you need to represent an ordered list of a different data type, such as integers? For example, you might want to store the ages of ten people as a list of ten integers or keep track of the names of a user’s top five favorite restaurants in Corvallis, OR, as a list of five strings.

In C++, there are several ways to create lists. The most common and idiomatic approach is to use a container class template from the Standard Template Library (STL), which is part of the C++ standard library. Examples of these class templates include std::vector, std::list, std::array, std::map, and many others, each designed for specific use cases.

However, these STL containers are built upon a more fundamental, low-level list data type known as an array. Studying arrays provides a deeper understanding of basic list structures, which is why arrays are often introduced before STL containers in programming courses. Learning about arrays now will better prepare you for future programming challenges.

For these reasons, we’ll focus on arrays rather than the STL in this discussion. It’s important to emphasize that this choice is purely for academic purposes. In real-world C++ applications, the STL is generally the preferred option, except in specific cases—such as when a custom high-performance container is required or when containers need to cross DLL boundaries. In fact, plain C++ arrays can be risky to use due to the potential for buffer overflows (which will be discussed later). If you ever write a C++ application in practice, make sure to follow good programming practices and become familiar with the STL. While you won’t be tested on the STL in this course, exploring std::vector is a good starting point for learning about it.

Finally, there are two types of arrays in C++: statically allocated arrays (with automatic storage duration) and dynamically allocated arrays (with dynamic storage duration). In this course, we will focus only on statically allocated arrays.

Declaring Statically Allocated Arrays

In C++, there is no generic “array data type.” Instead, arrays are strictly homogeneous, meaning all elements in an array must be of the same data type, and that data type must be specified when the array is declared.

Since arrays are variables, they must be declared like any other variable. An array represents a fixed location in memory with a specific size and type, all of which are determined during declaration.

It’s important to note that arrays, like other variables, have a fixed size. This may seem counterintuitive because arrays represent lists, but their size cannot change once declared—they cannot grow or shrink. This behavior differs from std::string variables, which can be reassigned to hold larger or smaller strings as needed. For statically allocated arrays, both the type and size are specified at the time of declaration. Dynamically allocated arrays, however, can change size after declaration.

When declaring a statically allocated array, you must ensure its size is “big enough” to accommodate all the data it will ever need to store, even if it doesn’t initially require that much space. If the array is larger than needed, unused elements can remain uninitialized and ignored until required. In practical scenarios, dynamically allocated arrays or STL containers (like std::vector) are typically preferred for resizable lists.

Syntax

The syntax for declaring a statically allocated array in C++ is:

<element data type> <array variable name>[<array size>];
  • Replace <element data type> with the desired data type of the elements in the array. • Replace <array variable name> with the name you want to give the array.
  • Replace <array size> with the number of elements the array should hold.

For example, the following code declares an array named pages containing 200 strings (each representing a page in a book):

std::string pages[200];

Compile-Time Constants

An essential detail of this syntax is that the size of a statically allocated array must be a compile-time constant. This restriction stems from the definition of “statically allocated”:

  • Static means the size is determinable at compile time.
  • Allocation means deciding the total memory required for the array variable.

Since the total memory depends on the array’s size, the size must be known at compile time. Consider the following examples:

array_examples.cpp
// A valid statically allocated array of 200 strings.
std::string pages[200];
// A valid statically allocated array of x doubles, where x is a compile-time constant.
constexpr int x = 5;
double numbers[x];
// An invalid statically allocated array of n booleans, where n is a runtime variable.
int n = 12;
bool my_booleans[n]; // Syntax error

While the third example will compile and run with g++ by default, it uses a feature called variable-length arrays (VLAs). VLAs allow arrays to have non-constant sizes at compile time, but they are not part of the formal C++ standard. As a result, VLAs are brittle, non-portable, and should generally be avoided in C++ (though they are officially supported in C since C99).

To disable VLA support in g++, use the -Werror=vla flag during compilation. For instance:

Terminal window
g++ -Werror=vla -o my_program my_program.cpp

When applied to the invalid example, the compiler produces the following error:

Terminal window
array_examples.cpp:15:22: error: variable length arrays are a C99 feature [-Werror,-Wvla-extension]
15 | bool my_booleans[n]; // Syntax error
| ^
array_examples.cpp:15:22: note: read of non-const variable 'n' is not allowed in a constant expression
array_examples.cpp:14:9: note: declared here
14 | int n = 12;
| ^
1 error generated.

Tradeoffs

Statically allocated arrays have strict limitations: their sizes must be compile-time constants, and they cannot be resized after declaration. However, these restrictions enable compilers to implement them efficiently. If these constraints are problematic, consider using dynamically allocated arrays or STL containers, which are more flexible. They are outside the scope of this course.

Arrays in Memory

Although we haven’t focused much on memory in this course, understanding it is essential when working with arrays.

Every byte of data in your computer is stored at a unique numeric location called a memory address. While memory addresses are simply non-negative integers, they are often expressed in hexadecimal format for readability (e.g., 15 in hexadecimal is written as 0x000F). Technically, a “byte” is defined as the smallest addressable unit of memory, which on most modern computers corresponds to 8 bits.

Most data types require more than one byte of storage. For instance, on the ENGR servers, an int-typed variable occupies 4 bytes. This means that an int technically spans 4 memory addresses. However, because memory is ordered sequentially, these bytes are treated as a single block with a specific starting point. When we refer to a “variable’s memory address,” we’re typically talking about the address of the first byte in that block.

Memory Structure

In C++, an array’s memory is contiguous. This means that all the elements of the array are stored sequentially in memory. When you allocate an array, it occupies a single continuous block of memory.

For example, suppose you allocate an array of 100 integers, and each integer requires 4 bytes of memory. The array will require a total of 400 bytes of memory. The memory address of the first byte in this block is called the array’s base address. If the base address of the array is 0x0004 (byte 4), the memory layout would look like this:

  • The first integer starts at 0x0004 (byte 4),
  • The second integer starts at 0x0008 (byte 8),
  • The third integer starts at 0x000C (byte 12),
  • The fourth integer starts at 0x0010 (byte 16), and so on.

Each integer occupies 4 bytes, and the elements are tightly packed with no wasted space.

Advantages of Contiguous Memory

This memory structure makes arrays highly memory efficient, as they do not waste space between elements. Additionally, arrays support a feature called direct access (or random access). This means your computer can access any element in the array in constant time, regardless of its position. For example, accessing the millionth element in an array is just as fast as accessing the 10th element.

How Direct Access Works

The efficiency of direct access is achieved through pointer arithmetic. Suppose the base address of an int array is 0x000F (byte 15), and you want to access the 17th integer. Since each int occupies 4 bytes, your computer calculates the memory address of the 17th integer by shifting over 16 integers from the base address:

  • 16×4=6416 \times 4 = 64 bytes.
  • Adding 64 bytes to the base address (0x000F) yields 0x004F (byte 79).

This calculation determines the exact memory address of the 17th integer. Once the address is known, modern memory hardware enables the computer to retrieve the value in constant time, as RAM (Random Access Memory) is specifically designed to support direct access.

Practical Implications

While these details are fascinating, you don’t need to perform such calculations manually when working with arrays in code. Instead, you interact with arrays using indices, and the compiler handles the conversion from indices to memory addresses using pointer arithmetic. Nonetheless, understanding how arrays are stored and accessed in memory provides valuable context and deepens your understanding of how computers operate.

Accessing Array Elements

Once you’ve declared an array variable, you can reference it by name, just like any other variable. However, working with arrays is slightly different from working with other types of variables.

Names and Base Addresses

After declaring an array, its name acts as a handle to the array’s base address—the memory address of the first byte of the array. For example, consider the following code:

array_doubles.cpp
#include <iostream>
int main() {
// Declare an array of 10 doubles:
double my_list_of_numbers[10];
// Print the array's base address:
std::cout << my_list_of_numbers << std::endl;
return 0;
}

When you run this program, the output will display something like this:

Terminal window
0x16b82af88

This hexadecimal value represents the array’s base address.

Accessing and Writing Individual Elements

To work with the actual values stored in an array, you need to access its individual elements. Unlike some programming languages, C++ doesn’t provide built-in aggregate operations for arrays. For instance, there are no simple operators to print all elements of an array, add two arrays, or concatenate them. Instead, you manipulate arrays by accessing and modifying their elements individually, often using loops.

Array elements are accessed using their indices. Arrays in C++ are zero-indexed, meaning:

  • The first element is at index 0.
  • The second element is at index 1.
  • And so on.

To access an array element at a specific index, use the subscript operator, which consists of square brackets ([]). The syntax is:

<array name>[<index>]

This provides direct access to the element at the specified index, allowing you to both read and modify its value.

Here’s an example of declaring an array of five doubles and initializing all its elements to zero:

double my_list_of_numbers[5];
my_list_of_numbers[0] = 0;
my_list_of_numbers[1] = 0;
my_list_of_numbers[2] = 0;
my_list_of_numbers[3] = 0;
my_list_of_numbers[4] = 0;

To print the value of the third element:

std::cout << my_list_of_numbers[2] << std::endl;

This will output 0 because all elements were initialized to zero.

Using Loops

Manually initializing or accessing each element can become tedious for large arrays. Instead, it’s common to use loops, especially for loops, to work with arrays. For example, the following program initializes an array of 100 integers with values from 1 to 100 and prints them:

array_ints_for.cpp
#include <iostream>
int main() {
int values[100];
// Initialize the array with values 1 to 100
for (int i = 0; i < 100; i++) {
values[i] = i + 1;
}
// Print the array values
for (int i = 0; i < 100; i++) {
std::cout << values[i] << std::endl;
}
return 0;
}

Declaration vs. Element Access

It’s important to distinguish between array declaration syntax and element access syntax, as they look similar.

When declaring an array, you specify the element type, array name, and size:

std::string my_array[20];

When accessing an array element, you specify the array name and the target index:

my_array[20]; // Accesses the 21st element

The difference lies in the presence of a type specifier:

  • If a type specifier (e.g., std::string) is included, you’re declaring a new array.
  • If there’s no type specifier, you’re working with an existing array.

Additionally, the meaning of square brackets changes:

  • In declarations, square brackets specify the array’s size.
  • In element access, square brackets represent the subscript operator, which specifies an index.

Reading Data into Arrays

The subscript operator ([]) takes an index as its operand and provides direct access to the array element at that index. This allows you to both read from and write to individual elements of an array.

Because of this, you can directly read input from std::cin and store it in a specific array element using the subscript operator. For example, the following code demonstrates how to populate an array of int elements with user-provided data using the stream extraction operator (>>):

// Reading integers into an array
int array_of_numbers[10];
for (int i = 0; i < 10; i++) {
std::cout << "Enter a number: ";
std::cin >> array_of_numbers[i];
}

Similarly, for an array of std::string values, you can use std::getline() to read entire lines of input and store them in individual array elements:

// Reading strings into an array using std::getline
std::string array_of_sentences[5];
for (int i = 0; i < 5; i++) {
std::cout << "Enter a sentence: ";
std::getline(std::cin, array_of_sentences[i]);
}

Arrays Assignment and Initialization

While you can modify individual array elements using the assignment operator (=) with the subscript operator ([]), you cannot assign or reassign an entire array. For example, the following code snippet would produce a syntax error:

int numbers[100];
int other_numbers[100];
numbers = other_numbers; // Syntax error

To understand this restriction, recall that a variable is simply a name that represents a fixed location in memory with a specific size and type. For arrays, the array name (after declaration) specifically refers to the base address of the array—the memory address of the first element in the sequence.

The statement numbers = other_numbers would attempt to change the base address of numbers to match that of other_numbers. However, this would violate the principle that variables represent fixed locations in memory. For numbers to reflect the base address of other_numbers, the data in numbers would need to be moved to a new location. Since this contradicts the immutability of variable memory locations, C++ disallows reassignment of entire arrays.

If you want to copy the contents of one array into another, you must do so by copying each element individually. This can be achieved using a loop or a higher-level mechanism. Here’s an example using a loop:

// Copying array elements manually
for (int i = 0; i < 100; i++) {
numbers[i] = other_numbers[i];
}

By iterating over each element, you can replicate the data from one array to another while adhering to C++‘s memory constraints.

When you declare an array, you can initialize its elements directly using an initializer list. This list is enclosed in curly braces ({}) and contains the initial values for each element, separated by commas. The initializer list is placed after the array name and size during declaration.

int arr[5] = {1, 2, 3, 4, 5};

This is part of the array’s construction process and does not involve reassigning an existing array. The initializer list is a convenient way to set the initial values of an array when it is first declared.

An invalid but common mistake is to attempt to use the assignment operator (=) to initialize an array after declaration. This is not allowed in C++:

int arr[5];
arr = {1, 2, 3, 4, 5}; // Syntax error

When you declare an array, arr is not a modifiable lvalue—it represents a fixed block of memory, and its address cannot be changed or reassigned.

Lastly, what happens when you partly initialize an array? For example, consider the following code:

int arr[5] = {1, 2};

What would happen if you try accessing the uninitialized elements of arr? The answer is that the uninitialized elements of partially initialized arrays are automatically initialized to zero (for scalar types), \0 (for characters), or an empty string (for strings). This is a feature of C++ that ensures all elements of an array are initialized, even if you provide only a partial initializer list. Global or static arrays are zero-initialized by default.

Passing Arrays to Functions

Passing an array to a function is straightforward in C++, but the process involves some unique details worth understanding.

Declaration and Calling Syntax

To use an array as a function parameter, the syntax for declaring the parameter is similar to that of declaring a local array variable. For example, the following function receives an array of 10 std::string elements and prints them:

// Function to print an array of 10 strings
void print_strings(std::string my_strings[10]) {
for (int i = 0; i < 10; i++) {
std::cout << my_strings[i] << std::endl;
}
}

You can call this function by providing an array as an argument:

// Main function using print_strings
int main() {
std::string array_of_strings[10];
// Ask the user for 10 strings
for (int i = 0; i < 10; i++) {
std::cout << "Enter a sentence: ";
std::getline(std::cin, array_of_strings[i]);
}
// Print the strings
print_strings(array_of_strings);
}

The above function works only for arrays of exactly 10 strings because the size is embedded in the parameter declaration. To make the function more flexible, you can omit the size in the parameter declaration by using empty square brackets ([]). This syntax is valid only for function parameters, not for local array declarations:

// Function with flexible array size
void print_strings(std::string my_strings[], int n_strings) {
for (int i = 0; i < n_strings; i++) {
std::cout << my_strings[i] << std::endl;
}
}

With this approach, the function works with arrays of any size, but you must explicitly pass the array’s size as an additional parameter:

// Main function using print_strings with size parameter
int main() {
std::string array_of_strings[10];
// Ask the user for 10 strings
for (int i = 0; i < 10; i++) {
std::cout << "Enter a sentence: ";
std::getline(std::cin, array_of_strings[i]);
}
// Print the strings along with the array size
print_strings(array_of_strings, 10);
}

This pattern—passing both an array and its size—is common because C++ lacks a built-in way to deduce an array’s size from its variable.

Arrays as Memory Addresses

When passing an array to a function, the array’s behavior differs from regular variables. Instead of copying the array, only the base address of the array (the memory address of its first element) is passed. This behavior makes array parameters act like reference parameters.

Consider the function call print_strings(array_of_strings, 10). Here, the base address of array_of_strings is passed to the function. Inside the function, pointer arithmetic is used to access elements in the array using the subscript operator ([]).

This means that the function’s array parameter my_strings[] is not an entire copy of the supplied array argument, but rather a copy of the base address referred to by the supplied array argument.

Remember, when you call a function with non-array variable argument, the argument is fully copied into the parameter, and you end up with two separate variables (the argument and parameter) each with their own entirely separate memory and separate data. In such a case, modifying the parameter’s data will not effect the argument’s data.

void foo(int x) {
x += 1;
}
int main() {
int y = 0;
foo(y);
std::cout << y << std::endl; // Outputs 0
}

For arrays, however, only the base address is copied, meaning both the function’s parameter and the original array refer to the same memory. This allows functions to modify the array’s contents directly.

This means that modifications to the elements of an array parameter do affect the data in the corresponding array argument. In some sense, this means that arrays behave like reference parameters (e.g., the second string parameter in std::getline()). Study the below example for a better understanding of how this works:

array_as_reference.cpp
void foo(int x[]) {
// Print the base address of the array
std::cout << x << std::endl;
// Modify the first element
x[0] = 10;
}
int main() {
int y[10];
y[0] = 100;
// Print the base address of y
std::cout << y << std::endl;
foo(y);
// Print the modified first element of y
std::cout << y[0] << std::endl; // Outputs 10
}

Returning a Statically Allocated Array

While it’s technically possible to return a statically allocated array from a function, it’s almost always a bad idea due to the risk of issues like dangling pointers, depending on the allocation context. We’ll discuss these pitfalls in more detail during the lecture on references.

Instead of attempting to return an array from a function, a better approach is to have the function receive a statically allocated array as a parameter, as described in the previous section. The function can then modify the array’s elements directly. Since arrays in C++ are via their base address, any changes made to the array within the function will be reflected in the original array used as the argument. This effectively achieves the same result as “returning” an array.

“Resizing” a Statically Allocated Array

Statically allocated arrays cannot be resized after they are declared. However, there is a common trick, especially in C where the STL is unavailable, that allows you to simulate resizing. The idea is straightforward:

  1. Define a Maximum Size: Choose a “maximum size” for the array, which should reflect the largest number of elements you will ever need to store.
  2. Declare the Array: Create an array with this maximum size.
  3. Use a Size Variable: Introduce a variable (e.g., an int) to keep track of the number of elements in the array that your program “acknowledges” . Let’s call this variable n.
  4. Acknowledge Elements: Treat any elements beyond index n - 1 as unused and ignore them in your program logic.
  5. Add and Remove Elements: To “add” an element, store it at index n then increment n by 1. To “remove” an element, decrement n by 1. Insertions and deletions at other positions require rearranging elements before adjusting n.

Below is an example that uses this strategy to manage an array of integers, allowing the user to input up to 100 integers.

array_resizing.cpp
#include <iostream>
int main() {
// Array with a maximum capacity of 100 integers
int arr[100];
// Initially, the array acknowledges 0 elements
int arr_size = 0;
// Variable to hold user input for continuing
char user_input;
// Prompt the user for numbers until they choose to stop or the array is full
do {
std::cout << "Enter a number: ";
// Store the number at the next available index
std::cin >> arr[arr_size];
// Increment the size variable to acknowledge the new element
arr_size++;
// Ask if they want to continue, but only if the array isn't full
if (arr_size < 100) {
std::cout << "Would you like to enter another number? Type Y for yes: ";
std::cin >> user_input;
}
} while (user_input == 'Y' && arr_size < 100);
// Print all acknowledged elements in the array
std::cout << "You entered the following numbers:\n";
for (int i = 0; i < arr_size; i++) {
std::cout << arr[i] << std::endl;
}
return 0;
}

Here’s the explanation:

  1. Initial State:
    • The array arr is declared with a fixed size of 100.
    • The variable arr_size starts at 0, meaning no elements are initially acknowledged.
  2. Input Loop:
    • The program prompts the user to input numbers.
    • Each number is stored at the index arr_size, and the arr_size variable is incremented to acknowledge the new element.
    • The loop stops either when the user chooses to stop (user_input != 'Y') or when the array reaches its maximum size (arr_size == 100).
  3. A for loop prints only the acknowledged elements (those with indices from 0 to arr_size - 1).
  4. Any elements beyond remain uninitialized and unused. These elements are safely ignored throughout the program.

The downsides are twofold. First, you must decide on the maximum number of elements the array can hold. If the actual number of elements exceeds this limit, the array cannot accommodate them. Second, if the array doesn’t reach its maximum capacity, unused elements waste memory. These elements are uninitialized and serve no purpose.

Dynamically allocated arrays and STL containers (e.g., std::vector) eliminate these limitations but involve trade-offs, such as slightly increased complexity or runtime overhead, but they are better suited for scenarios where the array size cannot be predicted in advance.

Buffer Overflows

When you provide an out-of-range index to a std::string’s .at() function, an exception is thrown, typically terminating the program (unless you handle the exception, which we haven’t covered). But what happens if you pass an out-of-range index to an array’s subscript operator ([])?

You might assume it behaves similarly, throwing an exception and terminating the program. Unfortunately, the reality is much worse: passing an out-of-range index to an array subscript operator results in undefined behavior. This kind of error, known as a buffer overflow, is particularly dangerous, and the C++ standard provides no guarantee about what will happen in such cases.

Undefined Behavior

When a buffer overflow occurs, the program’s behavior is unpredictable. It might:

  • Seem to work correctly.
  • Crash unexpectedly.
  • Corrupt data in memory.
  • Introduce subtle bugs that are difficult to diagnose.

Unlike exceptions, which provide error messages and diagnostics, buffer overflows often fail silently. This lack of immediate feedback makes them difficult to detect and fix.

Security Risks

The risks associated with buffer overflows go beyond simple program instability. Due to the memory structure of arrays, buffer overflows can pose severe security threats. Let’s revisit how arrays are stored and accessed in memory to understand why.

Remember, arrays are stored contiguously in memory, and their name serves as a handle to the base address (the address of the first element). When you use the subscript operator ([]), the compiler calculates the memory address of the indexed element by performing pointer arithmetic:

  • Offset Calculation: The address is calculated by adding an offset to the base address. The offset is determined by multiplying the index by the size of each element in bytes.

For example:

  • The element at index 0 is at the base address.
  • The element at index 1 is located at base address + size of one element.
  • The element at index 2 is at base address + 2 * size of one element.

If you supply an out-of-range index, the subscript operator performs the same calculation but computes the address for a memory location outside the array’s bounds.

When an out-of-range index is used, the subscript operator might access memory that belongs to other variables or even parts of the program’s control flow. If the program modifies this memory, it can corrupt unrelated data, leading to unpredictable behavior.

Buffer overflows become even more dangerous when they extend into memory regions used for program control, such as the return address—a memory location that stores the address of the next instruction to execute after a function call. If a buffer overflow modifies the return address:

  • The program might jump to an incorrect memory location after the function ends.
  • It could execute unintended or malicious code.

Skilled attackers can exploit buffer overflows to gain control over a program. For example:

  1. Arbitrary Code Execution (ACE): Attackers manipulate a buffer overflow to modify the return address and inject malicious code. The program is coerced into executing this code.
  2. Remote Code Execution (RCE):** In networked applications, attackers exploit buffer overflows remotely to execute arbitrary code on the target machine.

The simplest way to avoid buffer overflows is to avoid raw arrays in favor of high-level containers, such as those in the Standard Template Library (STL). STL containers, like std::vector and std::array, provide bounds-checked access through their .at() member function. If an out-of-range index is supplied, these functions throw exceptions instead of causing undefined behavior.

This discussion is intended, in part, to emphasize the risks of using raw arrays in C++—whether statically or dynamically allocated—in real-world applications. Unless absolutely necessary and justifiable, you should avoid raw arrays and instead rely on high-level container objects.

If you plan to use C++ beyond an academic setting, take the time to study the STL and learn how to use it effectively. Additionally, familiarize yourself with related topics such as dangling pointers, which can also lead to Arbitrary Code Execution (ACE) exploits, and explore tools like smart pointers and Resource Acquisition Is Initialization (RAII) to manage memory safely.

Finally, if you must use raw arrays, ensure your program rigorously avoids supplying out-of-range indices to the subscript operator. This is essential for maintaining both program stability and security.