class: center, middle # Java Collections --- # Main interfaces
.center[![](images/colls-coreInterfaces.png)] --- class: center, middle # java.util.Collection --- # Collection * A collection is an object that represents a group of objects; * The Collections Framework is a unified architecture for representing and manipulating collections; * Enables collections to be manipulated independently of implementation details; * The primary advantages of a collections framework are that it: * Reduces programming effort by providing data structures and algorithms so you don't have to write them yourself; * Provides high-performance implementations of data structures and algorithms; ??? * Because the various implementations of each interface are interchangeable, programs can be tuned by switching implementations; --- # Collection * The primary advantages of a collections framework are that it: * Provides interoperability between unrelated APIs; * Reduces the effort required to learn or to design and implement ad hoc APIs; * Fosters software reuse by providing a standard interface for collections and algorithms with which to manipulate them; ??? * Provides interoperability: by establishing a common language to pass collections back and forth; * Reduces the effort: by not requiring you to learn and produce your own ad hoc collections APIs; --- # Collection * Many modification methods in the interfaces are labeled optional; * The documentation for each implementation must specify which optional operations are supported; * Several terms are introduced to aid in this specification: * Collections that do not support modification operations are referred to as "unmodifiable", the counterpart being "modifiable"; * Collections that guarantee that no change in the collection object will be visible are referred to as "immutable", the counterpart being "mutable"; ??? * Optional: implementations are permitted to not perform one or more of these operations, throwing a runtime exception (UnsupportedOperationException) if they are attempted; * Modification operations such as add, remove and clear; --- # Collection * Several terms are introduced to aid in this specification: * Lists that guarantee that their size remains constant even though the elements can change are referred to as "fixed-size", the counterpart being "variable-size"; * Lists that support fast (generally constant time) indexed element access are known as "random access lists", the counterpart being "sequential access lists"; * The RandomAccess marker interface is used by lists to advertise the fact that they support random access; ??? * This enables generic algorithms to change their behavior to provide good performance when applied to either random or sequential access lists; --- class: center, middle # Set --- # Set * A collection that contains no duplicate elements and at most one null element; * All constructors must create a set that contains no duplicate elements; * Some set implementations have restrictions on the elements that they may contain: * Some implementations prohibit null elements; * Some have restrictions on the types of their elements; * Common implementations: * HashSet * TreeSet * LinkedHashSet --- # TreeSet * Based on a TreeMap; * Not synchronized; * Elements are ordered using their natural ordering, or by a Comparator provided at set creation time; * Provides guaranteed log(n) time cost for the basic operations (add, remove and contains); --- # HashSet * Backed by a HashMap instance; * Permits the null element; * Not synchronized; * Makes no guarantees as to the iteration order of the set; * In particular, it does not guarantee that the order will remain constant over time; * Offers constant time performance for the basic operations: add, remove, contains and size; --- # LinkedHashSet * Hash table and linked list implementation of the Set interface, with predictable iteration order; * Permits the null element; * Not synchronized; * Differs from HashSet because it maintains a doubly-linked list running through all of its entries, so it maintains insertion-order; * Provides constant-time performance for the basic operations: add, contains and remove; --- class: center, middle # List --- # List * An ordered collection (a sequence); * User has precise control over where each element is inserted, and can access elements by their index; * Unlike sets, lists typically allow duplicate elements, including multiple null elements; * Some list implementations have restrictions on the elements that they may contain: * Some implementations prohibit null elements; * Some have restrictions on the types of their elements; * Common implementations: * Vector * ArrayList * LinkedList --- # Vector * This class was retrofitted to implement the List interface, making it a member of the Java Collections Framework; * Is roughly equivalent to ArrayList, except that it is synchronized; * If a thread-safe implementation is not needed, it is recommended to use ArrayList; --- # ArrayList * Implements a growable array of objects; * The size can grow as needed to accommodate adding items after the ArrayList has been created; * Operations size, isEmpty, get, set, iterator, and listIterator run in constant time; * All of the other operations run in linear time (roughly); * Each ArrayList instance has a capacity and, as elements are added, its capacity grows automatically; ??? * The add operation runs in amortized constant time; * The capacity is always at least as large as the list size; --- # LinkedList * [Doubly-linked list](https://en.wikipedia.org/wiki/Doubly_linked_list) implementation of the List and Deque interfaces; * A set of sequentially linked nodes; * Each node contains two fields that are references to the previous and to the next node in the sequence; * The beginning and ending nodes point to null; * Not synchronized; * Implements all optional list operations, and permits all elements (including null);
.center[![](images/doubly-linked-list.png)] --- class: center, middle # Queue --- # Queue * A collection designed for holding elements prior to processing; * Provide additional insertion, extraction, and inspection operations; * Every Queue implementation must specify its ordering properties; * Queues typically, but do not necessarily, order elements in a FIFO manner; * Priority queues order elements according to a supplied comparator, or the elements' natural ordering; * Stacks are queues ordering the elements in a LIFO manner; * The head of the queue is always that element which would be removed by a call to remove() or poll(); ??? * Each these additional operations exists in two forms: * One throws an exception if the operation fails, designed specifically for use with capacity-restricted Queue implementations; * The other returns a special value (null or false), for most implementations, where insert operations cannot fail; --- # PriorityQueue * Does not permit null elements; * Not synchronized; * The elements of the priority queue are ordered according to their natural ordering, or by a provided Comparator; * The head is the least element with respect to the specified ordering; ??? * Head: if multiple elements are tied for least value, the head is one of those elements -- ties are broken arbitrarily; --- # PriorityQueue * Is unbounded, but has an internal capacity; * Capacity is always at least as large as the queue size; * As elements are added, its capacity grows automatically; * Provides: * O(log(n)) time for offer, poll, remove and add methods; * Linear time for the remove(Object) and contains(Object) methods; * Constant time for the retrieval methods (peek, element, and size) --- class: center, middle # Deque --- # Deque * Linear collection that supports insertion and removal at both ends; * Name is short for "double ended queue", usually pronounced "deck"; * Extends the Queue interface; * When a deque is used as a queue, FIFO behavior results; * Deques can also be used as LIFO stacks; * Should be used in preference to the legacy Stack class; * Provides two methods to remove interior elements, removeFirstOccurrence and removeLastOccurrence; * Does not provide support for indexed access to elements; * LinkedList is the most popular implementation; --- class: center, middle # Comparisons ## ArrayList vs LinkedList --- # ArrayList vs LinkedList * LinkedList implements List with a doubly-linked list: * Allows for constant-time insertions or removals using iterators; * You can walk the list forwards or backwards, but finding an element in the list takes time proportional to its size; * ArrayList implements List with a dynamically re-sizing array: * Allows constant time random read access; * Adding/removing from anywhere but the end requires shifting all the latter elements over; * When the capacity of the underlying array is exceeded, a bigger array is allocated; ??? * ArrayList resizing makes add operations O(n) at worst case, but constant on average; --- # LinkedList Complexities * get(int index) is O(n) average; * .green[add(E element) is O(1);] * add(int index, E element) is O(n) average; * remove(int index) is O(n) average; * .green[Iterator.remove() is O(1);] * .green[ListIterator.add(E element) is O(1);] --- # ArrayList Complexities * .green[get(int index) is O(1);] * .green[add(E element) is O(1) amortized]; * O(n) worst-case, while resizing array; * add(int index, E element) is O(n) average; * remove(int index) is O(n) average; * Iterator.remove() is O(n) average; * ListIterator.add(E element) is O(n) average; --- # ArrayList vs LinkedList Complexities
.center[![](images/complexity-arraylist-vs-linkedlist.png)] --- class: center, middle # Comparisons ## HashSet vs TreeSet --- # HashSet vs TreeSet * Both implementations are not synchronized; * Both garantee duplicate-free collections of elements; * HashSet is faster than TreeSet on basic operations; * O(1) for operations like add, remove and contains; * TreeSet guarantees that elements will be sorted; * Ascending natural order, or using a comparator specified via constructor; * The cost is O(log n) complexity for basic operations; --- # HashSet vs TreeSet Complexities
.center[![](images/complexity-hashset-vs-treeset.png)] --- class: center, middle # Map --- # Map * A map is an object that maps keys to values; * It cannot contain duplicate keys, each key can map to at most one value; * Not true collections: implementations based on java.util.Map do not implement the Collection interface; * However, they contain collection-view operations, which enables them to be manipulated as collections; * Provides three collection views, which allow a map's contents to be viewed as: * A set of keys; * A collection of values; * A set of key-value mappings; * Common implementations: * HashMap * TreeMap ??? * Some implementations make specific guarantees as to their order, like the TreeMap, and others do not, like the HashMap; * keySet(), values(), entrySet(); --- # HashMap * Implementation of the Map interface based on a hash table; * Is roughly equivalent to Hashtable, except that it is not synchronized and permits null values and the null key; * Provides all of the optional map operations; * Makes no guarantees as to the order of the map and does not guarantee that it will remain constant over time either; * Provides constant-time performance for basic operations (get and put); * Assuming the hash function disperses the elements properly among the buckets; --- #HashMap
.center[![](images/hashtable.png)] --- # HashMap * A HashMap has two parameters that affect its performance: initial capacity and load factor: * The capacity is the number of buckets in the hash table; * The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased; * When the number of entries in the hash table exceeds the limit, it is rehashed (internal data structures are rebuilt) so that the number of buckets grows; * Creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing; ??? * Limit of hash table: product of the load factor and the current capacity; --- # TreeMap * A [Red-Black tree](https://en.wikipedia.org/wiki/Red-black_tree) based NavigableMap/SortedMap implementation; * Sorted according to the natural ordering of its keys, or by a provided Comparator; * This order is reflected when iterating over the map's collection views; * Provides guaranteed O(log(n)) time cost for the containsKey, get, put and remove operations; * Not synchronized; * All keys inserted into a sorted map must implement the Comparable interface (or be accepted by the specified comparator); * All such keys must be mutually comparable: k1.compareTo(k2) or comparator.compare(k1, k2) must not throw a ClassCastException; ??? * Map collection views: entrySet(), keySet() and values() methods; * It is the map analogue of SortedSet; --- class: center, middle # Comparisons ## HashMap vs TreeMap --- # HashMap vs TreeMap * HashMap: * Makes no guarantees about the iteration order; * The order might even change when new elements are added; * Is faster in general, providing O(1) complexity for basic operations; * Only works with objects having a suitable hashCode() operation; * Iteration performance depends on capacity and load factor; * TreeMap: * Guarantees ordering at the cost of being slower: O(log(n)) for basic operations; * Only works with Comparable objects; * Additionally, it implements the SortedMap interface, which contains methods that depend on this sort order; * Does not provide tuning parameters; --- # Credits * [Oracle - Collections Framework Overview](https://docs.oracle.com/javase/8/docs/technotes/guides/collections/index.html) * [StackOverflow](https://stackoverflow.com) * [When to use LinkedList over ArrayList?](https://stackoverflow.com/a/322742/7976727) * [What are the time complexities of various data structures?](https://stackoverflow.com/a/7294635/7976727) * [Hashset vs Treeset](https://stackoverflow.com/a/4464394/7976727) * [What are hashtables and hashmaps and their typical use cases?](https://stackoverflow.com/a/138285/7976727) * [Big-O Cheat Sheet](http://bigocheatsheet.com/) * [Wikipedia](https://en.wikipedia.org/) * [Doubly Linked List](https://en.wikipedia.org/wiki/Doubly_linked_list) * [Hash table](https://en.wikipedia.org/wiki/Hash_table)