< Control Flow in Python | Contents | Functions in Python >
Data Structures in Python¶
Purpose: To help you get comfortable with the topics outlined below.
Recomended Usage
- Run each of the cells (Shift+Enter) and edit them as necessary to solidify your understanding
- Do any of the exercises that are relevant to helping you understand the material
Topics Covered
- Common Data Structures
- Additional Data Structures
- Time Complexity
Workbook Setup¶
The following magics will reload all modules before executing a new line and help make sure you follow PEP8 code style.
%load_ext autoreload
%autoreload 2
%load_ext pycodestyle_magic
%pycodestyle_on
Common Data Structures¶
| Data Structure | Category | Definition |
|---|---|---|
ChainMap |
Dictionary-like | list of dictionaries |
Counter |
Dictionary-like | keeping track of the most occurring elements |
Defaultdict |
Dictionary-like | dictionary with a default value |
OrderedDict |
Dictionary-like | a dictionary where insertion order matters |
Dqueue |
List-like | the double ended queue (doubly linked list) |
Heapq |
List-like | the ordered list |
NamedTuple |
Tuple-like | tuples with field names |
Enum |
Other | a group of constants |
... and many more
Note: It's really important to know how to READ THE DOCS to check the runtimes of operations, common usage cases, etc. Take some time now and familiarize yourself with them.
Create tuples¶
my_tuple = ()
type(my_tuple)
t = 1111, 22222, 'papapap!'
print(type(t))
print(t[0])
# Nested tuples
u = t, (1, 2, 3, 4, 5)
u
# Tuples are immutable
t[0] = 88888
# but they can contain mutable objects
v = ([1, 2, 3], [3, 2, 1])
print(v)
v[0][0] = 0
print(v)
Create and modify lists¶
doubles = []
for x in range(10):
doubles.append(x*2)
print(doubles)
Create list using comprehensions¶
List comprehensions are written using the following syntax
[expression for item in list]
doubles = [x*2 for x in range(10)]
print(doubles)
# Create a list of tuples using list comprehensions
[(x, y) for x in [1, 2, 3] for y in [3, 1, 4] if x != y]
The following examples are from the docs
# Filter the list to exclude negative numbers
vec = [-10, 12, 0, -5, 2]
[x for x in vec if x >= 0]
# Apply a function to all the elements
[abs(x) for x in vec]
# Call a method on each element
freshfruit = [' banana', ' loganberry ', 'passion fruit ']
[fruit.strip() for fruit in freshfruit]
Note: The strip() function takes a string and strips if of spaces in front and behind.
fruit = ' banana'
fruit.strip?
# Create a list of 2-tuples like (number, square)
[(x, x**2) for x in range(6)]
# Flatten a list using a listcomp with two 'for'
vec = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
[num for elem in vec for num in elem]
Membership Testing¶
vec
[1, 2, 3] in vec
1 in vec
Iterate over lists¶
my_list = ['big', 'belly', 'chuck']
The enumerate() function take an iterable (a list in this case) and returns it indexed. For example enumerating my_list returns tuples with the index number followed by the list item.
enumerate?
for item in enumerate(my_list):
print(item)
We can also access each item in the tuple directly from the for statement.
for i, v in enumerate(my_list):
print(i, v)
Zip two lists together and iterate¶
questions = ['fname', 'lname', 'favorite teacher']
answers = ['james', 'cook', 'ms prescott']
zip?
for q, a in zip(questions, answers):
print('{0}? {1}'.format(q, a))
Dicts¶
A dictionary as a set of key:value pairs, with the requirement that the keys are unique (within one dictionary).
my_dict = {}
Create and modify dicts¶
# Create a dict
zip_codes = {'jane': 54837, 'jack': 43521, 'fred': 3982}
zip_codes
# Create a dict
dict(sape=4139, guido=4127, jack=4098)
# Update an entry
zip_codes['jane'] = 3333
zip_codes
# Get the value of the `jack` key
zip_codes['jack']
# Remove key value pair
del zip_codes['jack']
zip_codes
# Sort a dict
sorted(zip_codes)
Dict Comprehensions¶
# Create a dict using comprehensions
{x: x**2 for x in (2, 4, 6)}
Membership Testing¶
'fred' in zip_codes
'fred' not in zip_codes
Type conversion¶
# Convert to list
list(zip_codes)
# Convert list of tuples to dict
dict([('fred', 4355), ('james', 3333), ('tammy', 10392)])
Iterate over a dict¶
for k, v in zip_codes.items():
print(k, v)
Sets¶
A set is an unordered collection with no duplicate elements. Common use cases include things like membership testing and eliminating duplicate entries.
my_set = set()
Set objects also support mathematical operations like union, intersection, difference, and symmetric difference.
Create and modify¶
empty_set = set()
type(empty_set)
# Duplicates will be removed when creating a set
basket = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'}
print(basket)
Note: This doesn't create a dictionary even though there are curly brackets because a dictionary takes key value pairs, here we only give one value each.
# Only unique letters will be added to the sets below
a = set('aaabbbbbbcccc')
b = set('aaaaaddddeeeeefffff')
# Unique letters in each
print('Unique letters in a: {}'.format(a))
print('Unique letters in b: {}'.format(b))
Creation using comprehensions¶
a = {x for x in 'aaaaabbbbccccdddd' if x not in 'abc'}
a
Membership Testing¶
# Fast membership testing
'orange' in basket
Mathematical operations on sets¶
# Letters in a but not in b
a - b
# Letters in a or b or both
a | b
# Letters in both a and b
a & b
# Letters in a or b but not in both
a ^ b
Additional Data Structures¶
Time Complexity¶
Before we go into specific data structures, we need to understand how Big O Notation is used to show time complexity so that we can make good judgments about when to use certain data structures.
Big O Notation Examples
O(1) - a function that takes O(1) time takes 1 step to execute (pronounced "order-1" or "oh-one")
O(n) - takes n steps to execute where n is generally the size of the object (pronounced "order-n" or "oh-n")
O(2**n) - takes 2**n steps to execute (pronounced "order-two-to-the-n" or "oh-two-to-the-n")
| Data Structure | Definition | Common Time Complexities | Notes/Examples |
|---|---|---|---|
list |
mutable list of items | O(1): append, get, set, lenO(n): remove, insert |
remove and insert are O(n) operations because Python must copy the whole list |
dict |
unsorted but fast map of items (uses hash structure) | O(1): get, set, deleteO(m): remove, insert |
Where m is the max size of the dictionarydict ops like get, set may not be O(1) if hash collisions |
set |
like a dict without values | append, get set, len O(1); remove, insert O(n) | Uses a hash to get a unique collection of values Particularly useful for permission systems where you have to do mass adding or removing of users |
tuple |
immutable list | append, get set, len O(1); remove, insert O(n) | Because tuples are hashable there are some cases where its more advantageous to have a tuple over a list Tuple packing and unpacking is supported |
Yea yea things take different amounts of time to complete....who cares? Well, you do if you work with large quantities of data.
For example, say you are working with a database of X users and ...
Troubleshooting Tips¶
If you run into issues running any of the code in this notebook, check your version of Python and Jupyter against mine below
import sys
print(sys.version)
3.7.6 (default, Dec 31 2019, 17:12:14)
[Clang 11.0.0 (clang-1100.0.33.16)]
!jupyter --version
jupyter core : 4.6.1
jupyter-notebook : 6.0.2
qtconsole : not installed
ipython : 7.9.0
ipykernel : 5.1.3
jupyter client : 5.3.4
jupyter lab : 1.2.3
nbconvert : 5.6.1
ipywidgets : not installed
nbformat : 4.4.0
traitlets : 4.3.3
# import sys
# print(sys.version)
# !jupyter --version