Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
Java provides an ArrayList class, but how does it work? Let’s code up a very basic array list class as a learning aid to get an appreciation of how it might work and how its time complexity compares with other options.
Outside of this training exercise, always use the language provided implementations, as these are fully featured, tested and optimised. The implementations here will be a little more basic to illustrate the main principles.
Our basic implementation will be backed by an array. The limitation with a Java primitive array is the size is fixed, you have to specify it upfront and it cannot change, as all the memory is allocated in a contiguous block.
Define a public API for the supported operations; add, remove and get items for now, along with a method to get the size of the array. By using Java’s generics, the list is more flexible as its able to support any object type and enforce this to ensure the list only contains objects of the same type.
public interface SimpleList<T>
{
void add( T element );
void remove( int index );
T get( int index );
int size();
}
As this is backed by an array (primitive data type), the trade-off it how large to make the array? Too small, and once it’s full, we will have to resize it, too large and we waste space.
If you look at the actual source code for Java’s implementation, this sets a default size of 10, but allows the value to be overridden via the constructor. We will do the same.
Note: You cannot create a generic type primitive array, so we need to use Object to store the data elements, this is ok, as it’s all internal (private access modifier) and not exposed outside of the implementation.
public class SimpleArrayList<T> extends SimpleList<T>
{
private static final int DEFAULT_CAPACITY = 10;
private Object[] data;
public SimpleArrayList()
{
this( DEFAULT_CAPACITY );
}
public SimpleArrayList( final int capacity )
{
data = new Object[ capacity ];
}
}
Let’s implement this TDD, so write a test to add a single item and verify the size fo the list has increased by 1. Note: size() will return the number of elements the calling code has added to the list (and track those removed), this has nothing to do with the size of the underlying primitive array. The Constructor will create an array of size 10 internally, but the size of the list should be zero, as nothing has been added as yet.
@Test
public void shouldGetItemFromListAtIndex0()
{
list.add( 100 );
assertEquals( 100, (int) list.get( 0 ) );
}
Adding an item within the capacity is pretty straightforward. Define a variable to keep track of the current position, and use this as the index into the primitive array, incrementing each time add is called. This an be done with a single line of code – Add the object to the backing array, increment the index ready for the next element to be added.
@Override
public void add( T element )
{
// Store the element in the primitive array at the next available index
data[ currentIndex++ ] = element;
}
This is where things get more interesting, and this is the functionality that makes the ArrayList so powerful and easy to use. Image we specify an initial capacity of 2. This declares a backing array of 2 elements. As this is fixed we cannot go beyond it, as priced with this test (using current implementation).
@Test
public void shouldAddItemBeyondInitialCapacity()
{
list = new SimpleArrayList<>( 2 );
// These 2 are fill the initial capacity
list.add( 100 );
list.add( 101 );
// adding a new element should 'grow' the list
list.add( 102 );
assertEquals( 102, (int) list.get( 2 ) );
}
This test wil FAIL with the following error: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
We cannot add an element beyond the fixed size of the array, so here we need to resize the backing array. This can be done by creating a new larger array, then copying the data form the old smaller array into this one.
@Override
public void add( T element )
{
// If the current backing array is full, resize it
if( currentIndex == data.length )
{
// Create a new array with extra capacity, copy in the old array's data
data = Arrays.copyOf( data, data.length + DEFAULT_CAPACITY );
}
data[ currentIndex++ ] = element;
}
After the array has been copied, there is now room for 10 more elements. This same resizing process will occur again when the 21st element is added (assuming no deletes).
There is a cost to this operation, this is where the trade-off comes into play. If we know we are planning on holding say a minimum 100 items in the array list, its more efficient to specify that value as the initial capacity, otherwise the array will need to be resized 9 times, whereas choosing a more appropriate value, could be no resizing at all. Ok, say why not just define a capacity of 10,000? Well doing this will create a backing array with a capacity of 10,000 elements, and if only using 100, then memory has ben wasted, as 9,900 elements have been provisioned but never used, so this is a waste of memory allocation. For our integers it snot really that big a deal, but for lists holding much larger objects, it could be a problem, as even though the array is empty, the compiler will provision enough memory to hold that number of elements.
There is always a trade-off, in this case its memory over performance. Making the right decision will depend on which one you value the most for a given situation, or maybe just going with a sensible middle-ground option could be the best fit in many cases.