11 Data Structures: Dictionaries and Sets
In the previous chapter, we explored lists and tuples, which are foundational data structures in Python. Now, we turn our attention to two additional data structures: dictionaries and sets. These structures provide efficient ways to store and manipulate data, particularly when managing large or complex datasets.
11.1 Dictionaries
A dictionary in Python is a versatile and powerful data structure that stores data in key-value pairs. Unlike lists, which use numerical indices, dictionaries use keys that can be of any immutable data type (e.g., strings, numbers, tuples). This makes dictionaries an ideal structure for tasks that require fast lookups, updates, and association between related pieces of data.
11.1.1 Creating a Dictionary
Dictionaries are created using curly braces {} or the dict() constructor, and key-value pairs are defined using the syntax key: value. Let’s explore various ways to create dictionaries:
Literal Notation
The most straightforward way to create a dictionary is by using curly braces:
# Creating a dictionary with student grades
grades = {
    "John": 85,
    "Alice": 92,
    "Bob": 78
}
Using the dict() Constructor
Alternatively, dictionaries can be created using the dict() constructor, which accepts keyword arguments or an iterable of key-value pairs:
# Creating a dictionary using keyword arguments
grades = dict(John=85, Alice=92, Bob=78)

# Creating a dictionary from a list of tuples
grades = dict([("John", 85), ("Alice", 92), ("Bob", 78)])
In this example, both methods create the same dictionary as the one using literal notation.
11.1.2 Accessing Values
To access values in a dictionary, you use the key as the index. This allows for quick lookup time, which is one of the key advantages of dictionaries over lists or tuples.
# Accessing the value associated with the key "Alice"
print(grades["Alice"])
92
If the key is not found, Python raises a KeyError. To avoid this, you can use the get() method, which returns None or a specified default value if the key does not exist:
# Safely accessing a key
print(grades.get("David", "Not Found"))
Not Found
Note that the second argument to the get() method is the value returned when the specified key does not exist. If it is omitted, the default is None.
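As a minimal sketch of the lookup styles described above, the following compares bracket access, get(), and a membership check with the in operator. The dictionary reuses the chapter's example data; the variable names are illustrative only.

```python
grades = {"John": 85, "Alice": 92, "Bob": 78}

# Bracket access raises KeyError when the key is missing
try:
    grades["David"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# get() returns None by default, or the fallback value you supply
missing_default = grades.get("David")
missing_custom = grades.get("David", 0)

# The in operator tests for a key without retrieving its value
has_alice = "Alice" in grades

print(lookup_failed, missing_default, missing_custom, has_alice)
```

Bracket access suits keys that must exist, since a missing key then fails loudly; get() suits keys that are genuinely optional.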
11.1.3 Adding and Modifying Entries
Dictionaries are mutable, meaning you can add or modify key-value pairs after the dictionary has been created.
Modifying Values
To modify the value associated with an existing key, simply assign a new value to that key:
# Modifying the value associated with "John"
grades["John"] = 88
print(grades)
{'John': 88, 'Alice': 92, 'Bob': 78}
Adding New Key-Value Pairs
To add a new key-value pair, use the same syntax as modifying an existing pair:
# Adding a new key-value pair
grades["David"] = 90
print(grades)
{'John': 88, 'Alice': 92, 'Bob': 78, 'David': 90}
11.1.4 Deleting Key-Value Pairs
There are several ways to remove key-value pairs from a dictionary:
Using the del Statement
The del statement removes the key-value pair from the dictionary:
# Removing the key-value pair for "Bob"
del grades["Bob"]
print(grades)
{'John': 88, 'Alice': 92, 'David': 90}
Using the pop() Method
The pop() method removes a key-value pair and returns the value. If the key is not found, it raises a KeyError, unless a default value is provided.
# Removing and returning the value for "Alice"
alice_grade = grades.pop("Alice")
print(alice_grade)
print(grades)
92
{'John': 88, 'David': 90}
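The default-value form of pop() mentioned above is not shown in the example, so here is a small sketch of it. The starting dictionary mirrors the state of grades at this point in the chapter.

```python
grades = {"John": 88, "David": 90}

# With a default, pop() returns it instead of raising KeyError
removed = grades.pop("Bob", None)
print(removed)
print(grades)  # unchanged, because "Bob" was not a key

# Without a default, a missing key raises KeyError
try:
    grades.pop("Bob")
    pop_raised = False
except KeyError:
    pop_raised = True
print(pop_raised)
```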
11.1.5 Dictionary Methods
Dictionaries come with a variety of built-in methods that simplify common tasks, such as adding, removing, and checking for keys and values.
keys(), values(), and items()
- keys() returns a view of all the keys in the dictionary.
- values() returns a view of all the values in the dictionary.
- items() returns a view of all key-value pairs as tuples.
print(grades.keys())
print(grades.values())
print(grades.items())
dict_keys(['John', 'David'])
dict_values([88, 90])
dict_items([('John', 88), ('David', 90)])
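The word "view" matters here: a view object reflects later changes to the dictionary, whereas a list built from it is an independent copy. The sketch below illustrates the difference; the names names_view and snapshot are illustrative.

```python
grades = {"John": 88, "David": 90}

names_view = grades.keys()   # a live view, not a copy
snapshot = list(names_view)  # an independent list

grades["Eve"] = 85           # modify the dictionary afterwards

print(list(names_view))      # the view now includes the new key
print(snapshot)              # the list still holds the old keys
```

Converting a view to a list is a common choice when you need to modify the dictionary while looping over its keys.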
update()
The update() method allows you to merge two dictionaries or add key-value pairs from an iterable of pairs:
# Merging dictionaries
extra_grades = {"Eve": 85, "Charlie": 79}

grades.update(extra_grades)
print(grades)
{'John': 88, 'David': 90, 'Eve': 85, 'Charlie': 79}
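The example above only adds new keys. As a brief sketch, the following shows that update() overwrites the value of a key that already exists, and that it also accepts an iterable of pairs or keyword arguments. The specific names and scores are made up for illustration.

```python
grades = {"John": 88, "David": 90}

# An existing key ("David") is overwritten; a new key ("Eve") is added
grades.update({"David": 95, "Eve": 85})

# update() also accepts an iterable of key-value pairs
grades.update([("Charlie", 79)])

# ...or keyword arguments, when the keys are valid identifiers
grades.update(Frank=60)

print(grades)
```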
clear()
The clear() method removes all key-value pairs from the dictionary:
grades.clear()
print(grades)
{}
11.1.6 Iterating Over Dictionaries
There are several ways to iterate over dictionaries, depending on whether you need keys, values, or both:
Iterating Over Keys
By default, iterating over a dictionary yields its keys:
for student in grades:
    print(student)
Iterating Over Values
You can iterate over the values by using the values() method:
for grade in grades.values():
    print(grade)
Iterating Over Key-Value Pairs
The items() method allows you to iterate over both keys and values simultaneously:
for student, grade in grades.items():
    print(f"{student}: {grade}")
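If grades is still empty from the earlier clear() call, these loops produce no output. The sketch below rebuilds a small dictionary from the chapter's original data and runs the three iteration styles, collecting the results so they are easy to inspect.

```python
grades = {"John": 85, "Alice": 92, "Bob": 78}

# Iterating over the dictionary itself yields keys, in insertion order
names = [student for student in grades]

# values() is convenient for aggregates
total = sum(grades.values())

# items() yields (key, value) tuples that can be unpacked in the loop
lines = [f"{student}: {grade}" for student, grade in grades.items()]

print(names)
print(total)
for line in lines:
    print(line)
```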
11.1.7 Dictionary Comprehension
Like list comprehensions, Python also supports dictionary comprehensions, which provide a concise way to create dictionaries from iterables.
# Creating a dictionary of squares
squares = {x: x**2 for x in range(1, 6)}
print(squares)
{1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
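Dictionary comprehensions can also filter or reshape an existing dictionary. The sketch below is one illustration; the passing threshold of 80 is an arbitrary assumption, not something defined in this chapter.

```python
grades = {"John": 85, "Alice": 92, "Bob": 78}

# Keep only the entries that satisfy a condition (threshold is hypothetical)
passed = {name: score for name, score in grades.items() if score >= 80}

# Swap keys and values (safe here only because the scores are unique)
by_score = {score: name for name, score in grades.items()}

print(passed)
print(by_score)
```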
11.1.8 Practical Applications of Dictionaries
Dictionaries are used in a variety of real-world applications, particularly where fast lookups or associations between pieces of data are needed.
Frequency Count
One common use of dictionaries is counting the frequency of elements in a collection. Here is an example that counts the occurrence of each character in a string:
def char_frequency(text):
    freq = {}
    for char in text:
        if char in freq:
            freq[char] += 1
        else:
            freq[char] = 1
    return freq

text = "data science"
print(char_frequency(text))
{'d': 1, 'a': 2, 't': 1, ' ': 1, 's': 1, 'c': 2, 'i': 1, 'e': 2, 'n': 1}
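Two common variations on this pattern are worth knowing. The sketch below first shortens the loop with get(), then uses collections.Counter from the standard library, which is a different tool from the hand-written function above but produces the same counts. The function name char_frequency_get is illustrative.

```python
from collections import Counter

def char_frequency_get(text):
    """Count characters using get() with a default of 0."""
    freq = {}
    for char in text:
        freq[char] = freq.get(char, 0) + 1
    return freq

text = "data science"
manual = char_frequency_get(text)

# Counter is a dict subclass specialised for counting hashable items
counted = Counter(text)

print(manual == dict(counted))
print(counted["a"])
```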
Storing Configuration Settings
Dictionaries are often used to store configuration settings because they allow easy lookups by key:
config = {
    "host": "localhost",
    "port": 8080,
    "debug": True
}
print(config["host"])
localhost
Caching Computations
Dictionaries can be used to cache results of expensive computations to avoid recalculating them:
A cache (pronounced “cash”) is memory used to store something, usually data, temporarily in a computing environment.
factorial_cache = {}

def factorial(n):
    if n in factorial_cache:
        return factorial_cache[n]
    if n == 0:
        result = 1
    else:
        result = n * factorial(n-1)
    factorial_cache[n] = result
    return result

print(factorial(5))
print(factorial_cache)
120
{0: 1, 1: 1, 2: 2, 3: 6, 4: 24, 5: 120}
By caching the results, subsequent calls to factorial(n) for previously computed values are faster, as they avoid redundant calculations.
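The same idea is available ready-made in the standard library. As a sketch of an alternative (not the approach used in the example above), functools.lru_cache wraps a function with a dictionary-backed cache and reports hits and misses. The name cached_factorial is illustrative.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # maxsize=None means the cache never evicts entries
def cached_factorial(n):
    return 1 if n == 0 else n * cached_factorial(n - 1)

first = cached_factorial(5)   # computes and stores results for 0 through 5
second = cached_factorial(5)  # answered directly from the cache

info = cached_factorial.cache_info()
print(first, second)
print(info.hits, info.misses, info.currsize)
```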
11.1.9 Using **kwargs in Functions
In Python, dictionaries are often used to pass and manage named arguments to functions. One powerful feature that leverages dictionaries is the **kwargs mechanism, which allows functions to accept an arbitrary number of keyword arguments (recall the earlier material on functions). These keyword arguments are collected into a dictionary, which provides flexibility when you do not know in advance what arguments might be passed to a function.
The **kwargs construct is particularly useful when writing functions that need to accept a variable number of named parameters or when extending existing functions with new optional arguments without changing their function signature.
How **kwargs Works
The term kwargs stands for “keyword arguments,” and when used with **, it allows you to pass a variable number of named arguments to a function. Inside the function, these keyword arguments are captured as a dictionary.
def print_student_scores(**kwargs):
    for student, score in kwargs.items():
        print(f"{student}: {score}")

# Calling the function with multiple keyword arguments
print_student_scores(John=85, Alice=92, Bob=78)
John: 85
Alice: 92
Bob: 78
In the example above, the function print_student_scores() accepts any number of keyword arguments and prints them. The **kwargs parameter collects the keyword arguments as a dictionary, where the keys are the argument names (John, Alice, Bob), and the values are the respective scores.
Accessing and Using **kwargs
Once inside the function, kwargs behaves like a normal dictionary. You can access, iterate over, and modify its elements just as you would with any other dictionary.
def get_student_grade(**kwargs):
    student = kwargs.get("student")
    grade = kwargs.get("grade")
    if student and grade:
        print(f"{student}'s grade is {grade}")
    else:
        print("Missing student or grade information")

# Providing student and grade as keyword arguments
get_student_grade(student="John", grade=85)

# Missing one argument
get_student_grade(student="Alice")
John's grade is 85
Missing student or grade information
In this example, the kwargs.get() method is used to safely retrieve values from the kwargs dictionary. If the key does not exist, get() returns None, which prevents the function from raising a KeyError.
Combining **kwargs with Regular and Positional Arguments
You can combine **kwargs with regular positional and keyword parameters. However, **kwargs must always come last in the function signature, after all other parameters:
def student_info(course, **kwargs):
    print(f"Course: {course}")
    for key, value in kwargs.items():
        print(f"{key}: {value}")

# Calling the function with both positional and keyword arguments
student_info("Mathematics", name="John", grade=90, age=20)
Course: Mathematics
name: John
grade: 90
age: 20
Here, course is a regular argument, and the remaining keyword arguments (e.g., name, grade, and age) are captured into the kwargs dictionary.
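To make the ordering rule concrete, here is a sketch of a signature that mixes every kind of parameter: a regular parameter, *args, a keyword parameter with a default, and **kwargs last. The function describe_course and its parameter names are hypothetical.

```python
def describe_course(course, *topics, level="intro", **kwargs):
    """Return the arguments grouped by how Python bound them."""
    return {
        "course": course,    # regular positional parameter
        "topics": topics,    # extra positional arguments, as a tuple
        "level": level,      # keyword parameter with a default
        "extra": kwargs,     # any remaining keyword arguments, as a dict
    }

info = describe_course("Mathematics", "algebra", "calculus",
                       level="advanced", name="John")
print(info)
```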
Passing a Dictionary as **kwargs
If you already have a dictionary of key-value pairs, you can pass it to a function using ** to unpack the dictionary into keyword arguments.
def print_details(name, age, occupation):
    print(f"Name: {name}, Age: {age}, Occupation: {occupation}")

# Creating a dictionary of arguments
person = {"name": "Alice", "age": 30, "occupation": "Data Scientist"}

# Passing the dictionary as keyword arguments
print_details(**person)
Name: Alice, Age: 30, Occupation: Data Scientist
In this case, the **person syntax unpacks the dictionary into keyword arguments, allowing you to pass the dictionary directly into the function.
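Unpacking only works when the dictionary's keys match the function's parameter names. The sketch below shows what happens when a key is missing, and how to supply the missing argument alongside the unpacked dictionary. This variant of print_details returns the string rather than printing it so the result can be inspected.

```python
def print_details(name, age, occupation):
    return f"Name: {name}, Age: {age}, Occupation: {occupation}"

person = {"name": "Alice", "age": 30}  # no "occupation" key

# A missing required parameter raises TypeError
try:
    print_details(**person)
    error = None
except TypeError as exc:
    error = str(exc)
print(error)

# Extra keyword arguments can be combined with the unpacked dictionary
summary = print_details(**person, occupation="Data Scientist")
print(summary)
```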
**kwargs and Default Arguments
While **kwargs allows for flexible keyword arguments, you can also combine it with default arguments to give function parameters some predefined behavior.
def log_message(level="INFO", **kwargs):
    message = kwargs.get("message", "No message provided")
    timestamp = kwargs.get("timestamp", "No timestamp")
    print(f"[{level}] {timestamp}: {message}")

# Logging a message with a default level
log_message(message="System started", timestamp="2023-09-12 10:00:00")

# Overriding the default level
log_message(level="ERROR", message="System failure", timestamp="2023-09-12 10:01:00")
[INFO] 2023-09-12 10:00:00: System started
[ERROR] 2023-09-12 10:01:00: System failure
In this example, the log_message() function uses a default level of "INFO" and then uses **kwargs to collect additional information such as the message and timestamp.
Summary
The **kwargs construct provides a flexible way to pass and handle keyword arguments in Python. By collecting all keyword arguments into a dictionary, you gain the ability to write dynamic and adaptable functions. Whether used for configuration settings, logging, or passing optional parameters, **kwargs is a powerful tool that makes functions more reusable and extensible.
11.2 Sets
A set in Python is an unordered collection of unique elements. This data structure is useful when you need to store distinct items and perform operations such as union, intersection, difference, or membership testing efficiently. Sets are particularly powerful when handling large datasets where duplication is unnecessary or undesirable.
11.2.1 Creating a Set
Sets are created by placing items inside curly braces {} or by using the set() function. Unlike lists or dictionaries, sets do not maintain any order, and duplicate values are automatically removed. Sets can hold items of any immutable (more precisely, hashable) data type, such as numbers, strings, or tuples.
Literal Notation
You can create a set directly by enclosing a sequence of values in curly braces:
# Creating a set of integers
numbers = {1, 2, 3, 4, 5}
print(numbers)

# Creating a set of strings
fruits = {"apple", "banana", "cherry"}
print(fruits)
{1, 2, 3, 4, 5}
{'banana', 'cherry', 'apple'}
Using the set() Function
The set() function is particularly useful when creating a set from an iterable, such as a list or a string. It automatically removes duplicates:
# Creating a set from a list with duplicate values
numbers_list = [1, 2, 2, 3, 4, 4, 5]
unique_numbers = set(numbers_list)
print(unique_numbers)

# Creating a set from a string
letters = set("hello")
print(letters)
{1, 2, 3, 4, 5}
{'o', 'e', 'l', 'h'}
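Two pitfalls follow from the rules above, and the sketch below illustrates both: empty curly braces create a dictionary rather than a set, and mutable items such as lists cannot be stored in a set. The variable names are illustrative.

```python
# Empty braces are a dictionary; use set() for an empty set
empty_braces = {}
empty_set = set()
print(type(empty_braces).__name__, type(empty_set).__name__)

# Lists are mutable and therefore unhashable, so they cannot be set elements
try:
    invalid = {[1, 2], [3, 4]}
    unhashable_rejected = False
except TypeError:
    unhashable_rejected = True
print(unhashable_rejected)

# Tuples of immutable items are fine
pairs = {(1, 2), (3, 4), (1, 2)}
print(len(pairs))
```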
11.2.2 Adding and Removing Elements
Adding Elements
You can add elements to a set using the add() method. However, since sets do not allow duplicate values, adding an existing element has no effect:
# Adding elements to a set
fruits = {"apple", "banana"}
fruits.add("cherry")
print(fruits)

# Attempting to add a duplicate element
fruits.add("apple")
print(fruits)
{'banana', 'cherry', 'apple'}
{'banana', 'cherry', 'apple'}
Removing Elements
There are multiple ways to remove elements from a set, including the remove(), discard(), and pop() methods:
- remove() raises a KeyError if the element does not exist.
- discard() does not raise an error if the element is not found.
- pop() removes and returns an arbitrary element, as sets are unordered.
# Removing elements using remove()
fruits.remove("banana")
print(fruits)

# Using discard() to remove an element safely
fruits.discard("apple")
print(fruits)

# Removing an arbitrary element with pop()
random_fruit = fruits.pop()
print(random_fruit)
print(fruits)
{'cherry', 'apple'}
{'cherry'}
cherry
set()
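The example above only removes elements that exist. As a short sketch of the error behavior described in the list, the following tries each method on a missing element or an empty set. The element "mango" is just a placeholder.

```python
fruits = {"apple", "banana"}

# discard() silently ignores a missing element
fruits.discard("mango")

# remove() raises KeyError for a missing element
try:
    fruits.remove("mango")
    remove_raised = False
except KeyError:
    remove_raised = True

# pop() on an empty set also raises KeyError
empty = set()
try:
    empty.pop()
    pop_raised = False
except KeyError:
    pop_raised = True

print(remove_raised, pop_raised, fruits == {"apple", "banana"})
```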
11.2.3 Set Operations
One of the most powerful features of sets is their support for mathematical operations such as union, intersection, difference, and symmetric difference. These operations are efficient and allow for concise and readable code.
Union (| or union())
The union operation combines all elements from two sets, excluding duplicates. This operation can be performed using the | operator or the union() method.
set1 = {1, 2, 3}
set2 = {3, 4, 5}

# Using the | operator
union_set = set1 | set2
print(union_set)

# Using the union() method
union_set = set1.union(set2)
print(union_set)
{1, 2, 3, 4, 5}
{1, 2, 3, 4, 5}
Intersection (& or intersection())
The intersection operation returns only the elements that are present in both sets. It can be performed using the & operator or the intersection() method.
set1 = {1, 2, 3}
set2 = {2, 3, 4}

# Using the & operator
intersection_set = set1 & set2
print(intersection_set)

# Using the intersection() method
intersection_set = set1.intersection(set2)
print(intersection_set)
{2, 3}
{2, 3}
Difference (- or difference())
The difference operation returns the elements that are in the first set but not in the second. This can be done using the - operator or the difference() method.
set1 = {1, 2, 3}
set2 = {2, 3, 4}

# Using the - operator
difference_set = set1 - set2
print(difference_set)

# Using the difference() method
difference_set = set1.difference(set2)
print(difference_set)
{1}
{1}
Symmetric Difference (^ or symmetric_difference())
The symmetric difference operation returns elements that are in either of the sets but not in both. This operation can be performed using the ^ operator or the symmetric_difference() method.
set1 = {1, 2, 3}
set2 = {2, 3, 4}

# Using the ^ operator
sym_diff_set = set1 ^ set2
print(sym_diff_set)

# Using the symmetric_difference() method
sym_diff_set = set1.symmetric_difference(set2)
print(sym_diff_set)
{1, 4}
{1, 4}
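The operator and method forms are not fully interchangeable. As the sketch below illustrates, the methods accept any iterable (and more than one), while the operators require both operands to be sets. It also shows the in-place operator |=, which updates a set rather than creating a new one.

```python
set1 = {1, 2, 3}

# Methods accept arbitrary iterables, and several at once
merged = set1.union([3, 4], (5, 6))
print(merged)

# Operators require sets on both sides
try:
    set1 | [3, 4]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False
print(operator_accepts_list)

# The in-place form modifies set1 directly
set1 |= {7}
print(set1)
```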
11.2.4 Checking for Subsets and Supersets
Sets also support operations that allow you to check whether one set is a subset or superset of another. These operations are particularly useful in scenarios where you need to compare sets.
Subset (<= or issubset())
A set A is a subset of set B if all elements of A are also in B. You can check for subsets using the <= operator or the issubset() method.
set1 = {1, 2}
set2 = {1, 2, 3, 4}

# Using the <= operator
print(set1 <= set2)

# Using the issubset() method
print(set1.issubset(set2))
True
True
Superset (>= or issuperset())
A set A is a superset of set B if all elements of B are also in A. You can check for supersets using the >= operator or the issuperset() method.
set1 = {1, 2, 3, 4}
set2 = {1, 2}

# Using the >= operator
print(set1 >= set2)

# Using the issuperset() method
print(set1.issuperset(set2))
True
True
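Note that every set is both a subset and a superset of itself under these definitions. When equal sets should not count, Python's strict comparison operators test for a proper subset or proper superset, as in this brief sketch (the set names are illustrative):

```python
small = {1, 2}
same = {1, 2}
large = {1, 2, 3, 4}

# <= allows equality; < requires the right-hand set to be strictly larger
subset_of_equal = small <= same
proper_subset_of_equal = small < same
proper_subset_of_large = small < large

# > is the mirror image for proper supersets
proper_superset = large > small

print(subset_of_equal, proper_subset_of_equal,
      proper_subset_of_large, proper_superset)
```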
11.2.5 Frozen Sets
A frozen set is an immutable version of a set, meaning that once a frozen set is created, its elements cannot be modified (i.e., you cannot add or remove elements). Frozen sets are useful when you need a collection of unique elements that should remain constant throughout the program. You create a frozen set using the frozenset() function:
# Creating a frozen set
immutable_set = frozenset([1, 2, 3, 4])
print(immutable_set)

# Attempting to add an element to a frozen set will raise an error
# immutable_set.add(5)  # Raises AttributeError
frozenset({1, 2, 3, 4})
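Because a frozen set is immutable, it is also hashable, which means it can serve as a dictionary key or as an element of another set. The sketch below shows one possible use, keying a dictionary by unordered pairs of names; the data is invented for illustration.

```python
# Keys are unordered pairs, so the order of the names does not matter
pair_scores = {frozenset({"Alice", "Bob"}): 3}
score = pair_scores[frozenset({"Bob", "Alice"})]
print(score)

# A regular set cannot be used as a key
try:
    invalid = {set(["Alice", "Bob"]): 3}
    set_key_rejected = False
except TypeError:
    set_key_rejected = True
print(set_key_rejected)

# Frozen sets still support the non-mutating set operations
combined = frozenset({1, 2}) | frozenset({2, 3})
print(combined == frozenset({1, 2, 3}))
```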
11.2.6 Set Comprehensions
Python provides a concise way to create sets using set comprehensions, similar to list comprehensions. A set comprehension is written with curly braces {} and allows you to define sets based on existing iterables, often including a filtering condition.
# Creating a set of squares for numbers 1 to 5
squares = {x**2 for x in range(1, 6)}
print(squares)

# Set comprehension with a condition
even_squares = {x**2 for x in range(1, 11) if x % 2 == 0}
print(even_squares)
{1, 4, 9, 16, 25}
{64, 100, 4, 36, 16}
Set comprehensions are a powerful tool when you need to transform or filter data while maintaining unique elements.
11.2.7 Set Best Practices
While sets are efficient for handling unique elements, there are a few best practices to keep in mind when working with sets in Python:
- Avoid Unnecessary Duplicate Checks: Since sets automatically remove duplicates, there’s no need to check for duplicates before adding elements.
- Use Sets for Membership Testing: When you need to check if an item exists in a collection and the collection does not need to maintain order or allow duplicates, sets are usually the best choice because membership testing takes O(1) time on average, compared with O(n) for a list.
- Choose the Right Operation: Use set operations such as union, intersection, and difference to simplify complex data comparison tasks. These operations are typically more efficient and more readable than custom loops that achieve the same results.
Sets are an invaluable data structure for handling collections of unique items. Their efficiency in membership testing, combined with their ability to perform set operations such as union, intersection, and difference, makes them ideal for a wide variety of tasks, from data processing to mathematical computations. With their unordered nature and automatic deduplication, sets help simplify code and ensure efficient performance, especially when working with large datasets.
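To see the membership-testing difference for yourself, a rough comparison can be made with the standard timeit module. This is only a sketch: the collection size and repetition count are arbitrary choices, and the actual timings depend on your machine.

```python
import timeit

n = 50_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1  # worst case for the list: the last element

# Both containers give the same answer...
found_in_list = target in as_list
found_in_set = target in as_set

# ...but the list scans its elements while the set uses a hash lookup
list_time = timeit.timeit(lambda: target in as_list, number=100)
set_time = timeit.timeit(lambda: target in as_set, number=100)

print(found_in_list, found_in_set)
print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```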
11.2.8 Combining Dictionaries and Sets
You can often combine dictionaries and sets in practical applications. For example, to find unique words and their counts from a list of sentences:
def unique_words(sentences):
    word_dict = {}
    for sentence in sentences:
        words = set(sentence.split())  # Use a set to find unique words
        for word in words:
            if word in word_dict:
                word_dict[word] += 1
            else:
                word_dict[word] = 1
    return word_dict

sentences = ["data science is great", "data science is evolving"]
result = unique_words(sentences)
print(result)
{'is': 2, 'great': 1, 'data': 2, 'science': 2, 'evolving': 1}
This function uses sets to ensure that each word in a sentence is only counted once per sentence and then stores the results in a dictionary.
11.3 Exercises
Exercise 1: Student Grades
Write a function called update_grades() that accepts a dictionary of student names and their grades. The function should also accept new student names and grades as keyword arguments and update the dictionary with them. Finally, it should return the updated dictionary.
Example:
grades = {"John": 85, "Alice": 92}
new_grades = {"Bob": 78, "Alice": 95}

updated_grades = update_grades(grades, **new_grades)
print(updated_grades)
Expected Output:
{"John": 85, "Alice": 95, "Bob": 78}
Exercise 2: Word Frequency Counter
Write a function word_frequency(text) that takes a string and returns a dictionary with the frequency count of each word in the string.
Example:
text = "data science is data fun science is fun"
result = word_frequency(text)
print(result)
Expected Output:
{'data': 2, 'science': 2, 'is': 2, 'fun': 2}
Exercise 3: Dictionary of Squares
Create a function squares_dict(n) that generates a dictionary where the keys are numbers from 1 to n and the values are their corresponding squares.
Example:
print(squares_dict(5))
Expected Output:
{1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
Exercise 4: Merge Dictionaries
Write a function merge_dictionaries(*args) that accepts any number of dictionaries and merges them into a single dictionary. If a key is repeated, the value from the last dictionary should be retained.
Example:
dict1 = {"a": 1, "b": 2}
dict2 = {"b": 3, "c": 4}
dict3 = {"d": 5}

result = merge_dictionaries(dict1, dict2, dict3)
print(result)
Expected Output:
{'a': 1, 'b': 3, 'c': 4, 'd': 5}
Exercise 5: Remove Duplicates
Write a function remove_duplicates(lst) that takes a list and returns a new list with all the duplicates removed using a set.
Example:
numbers = [1, 2, 2, 3, 4, 4, 5]
print(remove_duplicates(numbers))
Expected Output:
[1, 2, 3, 4, 5]
Exercise 6: Set Operations
Given two sets of students enrolled in two different courses, write functions to:
- Find students enrolled in both courses.
- Find students enrolled only in the first course.
- Find students enrolled in either course but not both.
Example:
course_A = {"Alice", "Bob", "Charlie", "David"}
course_B = {"Charlie", "David", "Eve", "Frank"}

# Students in both courses
print(students_in_both(course_A, course_B))
# Students only in course A
print(only_in_first(course_A, course_B))
# Students in either course but not both
print(either_but_not_both(course_A, course_B))
Expected Output:
{'Charlie', 'David'}
{'Alice', 'Bob'}
{'Alice', 'Bob', 'Eve', 'Frank'}
Exercise 7: Flexible Function with **kwargs
Write a function student_profile(**kwargs) that accepts student information (name, age, grade, etc.) as keyword arguments and prints the information in a readable format.
Example:
student_profile(name="Alice", age=20, grade="A", major="Mathematics")
Expected Output:
Name: Alice
Age: 20
Grade: A
Major: Mathematics
Exercise 8: Summing Keyword Arguments
Write a function sum_values(**kwargs) that takes any number of keyword arguments where the values are integers and returns their sum.
Example:
result = sum_values(a=5, b=10, c=3)
print(result)
Expected Output:
18