The content for this site available on GitHub. If you want to launch the notebooks interactively click on the binder stamp below. Binder

< Control Flow in Python | Contents | Functions in Python >

Data Structures in Python

Purpose: To help you get comfortable with the topics outlined below.

Recomended Usage

  • Run each of the cells (Shift+Enter) and edit them as necessary to solidify your understanding
  • Do any of the exercises that are relevant to helping you understand the material

Topics Covered

  • Common Data Structures
  • Additional Data Structures
  • Time Complexity

Workbook Setup

The following magics will reload all modules before executing a new line and help make sure you follow PEP8 code style.

In [1]:
%load_ext autoreload
%autoreload 2

%load_ext pycodestyle_magic
%pycodestyle_on

Common Data Structures

Data Structure Category Definition
ChainMap Dictionary-like list of dictionaries
Counter Dictionary-like keeping track of the most occurring elements
Defaultdict Dictionary-like dictionary with a default value
OrderedDict Dictionary-like a dictionary where insertion order matters
Dqueue List-like the double ended queue (doubly linked list)
Heapq List-like the ordered list
NamedTuple Tuple-like tuples with field names
Enum Other a group of constants

... and many more

Note: It's really important to know how to READ THE DOCS to check the runtimes of operations, common usage cases, etc. Take some time now and familiarize yourself with them.

Tuples

Tuple are immutable groupings (BUT they can contain mutable types)

my_tuple = ()

Create tuples

In [87]:
my_tuple = ()
type(my_tuple)
Out[87]:
tuple
In [1]:
t = 1111, 22222, 'papapap!'
print(type(t))
print(t[0])
<class 'tuple'>
1111
In [94]:
# Nested tuples
u = t, (1, 2, 3, 4, 5)
u
Out[94]:
((1111, 22222, 'papapap!'), (1, 2, 3, 4, 5))
In [52]:
# Tuples are immutable
t[0] = 88888
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-52-9e19cce22619> in <module>
      1 # Tuples are immutable:
----> 2 t[0] = 88888

TypeError: 'tuple' object does not support item assignment
In [95]:
# but they can contain mutable objects
v = ([1, 2, 3], [3, 2, 1])
print(v)

v[0][0] = 0
print(v)
([1, 2, 3], [3, 2, 1])
([0, 2, 3], [3, 2, 1])

Lists

A list is just a mutable collection of items

my_list = []

Create and modify lists

In [2]:
doubles = []
for x in range(10):
    doubles.append(x*2)
print(doubles)
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

Create list using comprehensions

List comprehensions are written using the following syntax

[expression for item in list]

In [9]:
doubles = [x*2 for x in range(10)]
print(doubles)
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
In [10]:
# Create a list of tuples using list comprehensions
[(x, y) for x in [1, 2, 3] for y in [3, 1, 4] if x != y]
Out[10]:
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]

The following examples are from the docs

In [11]:
# Filter the list to exclude negative numbers
vec = [-10, 12, 0, -5, 2]
[x for x in vec if x >= 0]
Out[11]:
[12, 0, 2]
In [12]:
# Apply a function to all the elements
[abs(x) for x in vec]
Out[12]:
[10, 12, 0, 5, 2]
In [1]:
# Call a method on each element
freshfruit = ['  banana', '  loganberry ', 'passion fruit  ']
[fruit.strip() for fruit in freshfruit]
Out[1]:
['banana', 'loganberry', 'passion fruit']

Note: The strip() function takes a string and strips if of spaces in front and behind.

In [9]:
fruit = '   banana'
fruit.strip?
Signature: fruit.strip(chars=None, /)
Docstring:
Return a copy of the string with leading and trailing whitespace remove.

If chars is given and not None, remove characters in chars instead.
Type:      builtin_function_or_method
In [15]:
# Create a list of 2-tuples like (number, square)
[(x, x**2) for x in range(6)]
Out[15]:
[(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]
In [16]:
# Flatten a list using a listcomp with two 'for'
vec = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
[num for elem in vec for num in elem]
Out[16]:
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Membership Testing

In [69]:
vec
Out[69]:
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
In [68]:
[1, 2, 3] in vec
Out[68]:
True
In [70]:
1 in vec
Out[70]:
False

Iterate over lists

In [61]:
my_list = ['big', 'belly', 'chuck']

The enumerate() function take an iterable (a list in this case) and returns it indexed. For example enumerating my_list returns tuples with the index number followed by the list item.

In [2]:
enumerate?
Init signature: enumerate(iterable, start=0)
Docstring:     
Return an enumerate object.

  iterable
    an object supporting iteration

The enumerate object yields pairs containing a count (from start, which
defaults to zero) and a value yielded by the iterable argument.

enumerate is useful for obtaining an indexed list:
    (0, seq[0]), (1, seq[1]), (2, seq[2]), ...
Type:           type
Subclasses:     
In [62]:
for item in enumerate(my_list):
    print(item)
(0, 'big')
(1, 'belly')
(2, 'chuck')

We can also access each item in the tuple directly from the for statement.

In [63]:
for i, v in enumerate(my_list):
    print(i, v)
0 big
1 belly
2 chuck

Zip two lists together and iterate

In [5]:
questions = ['fname', 'lname', 'favorite teacher']
answers = ['james', 'cook', 'ms prescott']
In [6]:
zip?
Init signature: zip(self, /, *args, **kwargs)
Docstring:     
zip(*iterables) --> zip object

Return a zip object whose .__next__() method returns a tuple where
the i-th element comes from the i-th iterable argument.  The .__next__()
method continues until the shortest iterable in the argument sequence
is exhausted and then it raises StopIteration.
Type:           type
Subclasses:     
In [8]:
for q, a in zip(questions, answers):
    print('{0}? {1}'.format(q, a))
fname? james
lname? cook
favorite teacher? ms prescott

Dicts

A dictionary as a set of key:value pairs, with the requirement that the keys are unique (within one dictionary).

my_dict = {}

Create and modify dicts

In [43]:
# Create a dict
zip_codes = {'jane': 54837, 'jack': 43521, 'fred': 3982}
zip_codes
Out[43]:
{'jane': 54837, 'jack': 43521, 'fred': 3982}
In [55]:
# Create a dict
dict(sape=4139, guido=4127, jack=4098)
Out[55]:
{'sape': 4139, 'guido': 4127, 'jack': 4098}
In [44]:
# Update an entry
zip_codes['jane'] = 3333
zip_codes
Out[44]:
{'jane': 3333, 'jack': 43521, 'fred': 3982}
In [45]:
# Get the value of the `jack` key
zip_codes['jack']
Out[45]:
43521
In [46]:
# Remove key value pair
del zip_codes['jack']
zip_codes
Out[46]:
{'jane': 3333, 'fred': 3982}
In [48]:
# Sort a dict
sorted(zip_codes)
Out[48]:
['fred', 'jane']

Dict Comprehensions

In [54]:
# Create a dict using comprehensions
{x: x**2 for x in (2, 4, 6)}
Out[54]:
{2: 4, 4: 16, 6: 36}

Membership Testing

In [52]:
'fred' in zip_codes
Out[52]:
True
In [51]:
'fred' not in zip_codes
Out[51]:
False

Type conversion

In [47]:
# Convert to list
list(zip_codes)
Out[47]:
['jane', 'fred']
In [57]:
# Convert list of tuples to dict
dict([('fred', 4355), ('james', 3333), ('tammy', 10392)])
Out[57]:
{'fred': 4355, 'james': 3333, 'tammy': 10392}

Iterate over a dict

In [58]:
for k, v in zip_codes.items():
    print(k, v)
jane 3333
fred 3982

Sets

A set is an unordered collection with no duplicate elements. Common use cases include things like membership testing and eliminating duplicate entries.

my_set = set()

Set objects also support mathematical operations like union, intersection, difference, and symmetric difference.

Create and modify

In [76]:
empty_set = set()
type(empty_set)
Out[76]:
set
In [77]:
# Duplicates will be removed when creating a set
basket = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'}
print(basket)
{'apple', 'banana', 'orange', 'pear'}

Note: This doesn't create a dictionary even though there are curly brackets because a dictionary takes key value pairs, here we only give one value each.

In [81]:
# Only unique letters will be added to the sets below
a = set('aaabbbbbbcccc')
b = set('aaaaaddddeeeeefffff')

# Unique letters in each
print('Unique letters in a: {}'.format(a))

print('Unique letters in b: {}'.format(b))
Unique letters in a: {'b', 'a', 'c'}
Unique letters in b: {'f', 'e', 'a', 'd'}

Creation using comprehensions

In [86]:
a = {x for x in 'aaaaabbbbccccdddd' if x not in 'abc'}
a
Out[86]:
{'d'}

Membership Testing

In [78]:
# Fast membership testing
'orange' in basket
Out[78]:
True

Mathematical operations on sets

In [82]:
# Letters in a but not in b
a - b
Out[82]:
{'b', 'c'}
In [83]:
# Letters in a or b or both
a | b
Out[83]:
{'a', 'b', 'c', 'd', 'e', 'f'}
In [84]:
# Letters in both a and b
a & b
Out[84]:
{'a'}
In [85]:
# Letters in a or b but not in both
a ^ b
Out[85]:
{'b', 'c', 'd', 'e', 'f'}

Additional Data Structures

Time Complexity

Before we go into specific data structures, we need to understand how Big O Notation is used to show time complexity so that we can make good judgments about when to use certain data structures.

Big O Notation Examples

O(1) - a function that takes O(1) time takes 1 step to execute (pronounced "order-1" or "oh-one")

O(n) - takes n steps to execute where n is generally the size of the object (pronounced "order-n" or "oh-n")

O(2**n) - takes 2**n steps to execute (pronounced "order-two-to-the-n" or "oh-two-to-the-n")

Data Structure Definition Common Time Complexities Notes/Examples
list mutable list of items O(1): append, get, set, len
O(n): remove, insert
remove and insert are O(n) operations because Python must copy the whole list
dict unsorted but fast map of items (uses hash structure) O(1): get, set, delete
O(m): remove, insert
Where m is the max size of the dictionary
dict ops like get, set may not be O(1) if hash collisions
set like a dict without values append, get set, len O(1); remove, insert O(n) Uses a hash to get a unique collection of values
Particularly useful for permission systems where you have to do mass adding or removing of users
tuple immutable list append, get set, len O(1); remove, insert O(n) Because tuples are hashable there are some cases where its more advantageous to have a tuple over a list
Tuple packing and unpacking is supported

Yea yea things take different amounts of time to complete....who cares? Well, you do if you work with large quantities of data.

For example, say you are working with a database of X users and ...

Troubleshooting Tips

If you run into issues running any of the code in this notebook, check your version of Python and Jupyter against mine below

import sys
print(sys.version)
3.7.6 (default, Dec 31 2019, 17:12:14) 
[Clang 11.0.0 (clang-1100.0.33.16)]
!jupyter --version
jupyter core     : 4.6.1
jupyter-notebook : 6.0.2
qtconsole        : not installed
ipython          : 7.9.0
ipykernel        : 5.1.3
jupyter client   : 5.3.4
jupyter lab      : 1.2.3
nbconvert        : 5.6.1
ipywidgets       : not installed
nbformat         : 4.4.0
traitlets        : 4.3.3
In [7]:
# import sys
# print(sys.version)
In [6]:
# !jupyter --version


< Control Flow in Python | Contents | Functions in Python >