11  Data Structures: Dictionaries and Sets

In the previous chapter, we explored lists and tuples, which are foundational data structures in Python. Now, we turn our attention to two additional data structures: dictionaries and sets. These structures provide efficient ways to store and manipulate data, particularly when managing large or complex datasets.

11.1 Dictionaries

A dictionary in Python is a versatile data structure that stores data as key-value pairs. Unlike lists, which are indexed by integer position, dictionaries are indexed by keys, which can be of any hashable type; in practice this means immutable types such as strings, numbers, and tuples whose elements are themselves hashable. This makes dictionaries well suited to tasks that require fast lookups, updates, and associations between related pieces of data.
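As a small illustration of the restriction on keys, the sketch below uses a tuple as a key and then shows that a list is rejected. The names coordinates and invalid are made up for this example.

```python
# Tuples of immutable values are hashable, so they can serve as keys.
coordinates = {(0, 0): "origin", (1, 2): "point A"}
print(coordinates[(1, 2)])  # point A

# Lists are mutable and therefore unhashable, so using one as a key fails.
try:
    invalid = {[1, 2]: "point B"}
except TypeError as error:
    key_error_message = str(error)
    print(f"TypeError: {error}")
```

The exact wording of the error message differs between Python versions, but it always reports that the list is unhashable.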

11.1.1 Creating a Dictionary

Dictionaries are created using curly braces {} or the dict() constructor, and key-value pairs are defined using the syntax key: value. Let’s explore various ways to create dictionaries:

Literal Notation

The most straightforward way to create a dictionary is by using curly braces:

# Creating a dictionary with student grades
grades = {
    "John": 85,
    "Alice": 92,
    "Bob": 78
}

Using the dict() Constructor

Alternatively, dictionaries can be created using the dict() constructor, which allows for the creation of dictionaries using keyword arguments or iterables of key-value pairs:

# Creating a dictionary using keyword arguments
grades = dict(John=85, Alice=92, Bob=78)

# Creating a dictionary from a list of tuples
grades = dict([("John", 85), ("Alice", 92), ("Bob", 78)])

Both calls create the same dictionary as the literal notation shown earlier. Note that the keyword-argument form only works when the keys are strings that are valid Python identifiers.

11.1.2 Accessing Values

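To access a value in a dictionary, place its key in square brackets after the dictionary name. Lookups by key take roughly constant time on average, regardless of how many entries the dictionary holds, which is one of the key advantages of dictionaries over searching through a list or tuple.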

# Accessing the value associated with the key "Alice"
print(grades["Alice"])  
92

If the key is not found, Python raises a KeyError. To avoid this, you can use the get() method, which returns None or a specified default value if the key does not exist:

# Safely accessing a key
print(grades.get("David", "Not Found")) 
Not Found

Note that the second argument to get() is the value returned when the specified key does not exist. If it is omitted, get() returns None.
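Besides get(), two other common ways to handle a possibly missing key are testing with the in operator and catching the KeyError. The following is a minimal sketch; the variable names sample_grades and david_grade are chosen only for this illustration.

```python
sample_grades = {"John": 85, "Alice": 92, "Bob": 78}

# Option 1: test for the key with `in` before indexing.
if "David" in sample_grades:
    print(sample_grades["David"])
else:
    print("David has no grade yet")

# Option 2: attempt the lookup and catch the KeyError.
try:
    david_grade = sample_grades["David"]
except KeyError:
    david_grade = None

# get() without a second argument also yields None for a missing key.
print(sample_grades.get("David"))  # None
```

Which option reads best depends on the situation: get() is the most compact when a default value is enough, while try/except is useful when a missing key should trigger different logic.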

11.1.3 Adding and Modifying Entries

Dictionaries are mutable, meaning you can add or modify key-value pairs after the dictionary has been created.

Modifying Values

To modify the value associated with an existing key, simply assign a new value to that key:

# Modifying the value associated with "John"
grades["John"] = 88
print(grades)  
{'John': 88, 'Alice': 92, 'Bob': 78}

Adding New Key-Value Pairs

To add a new key-value pair, use the same syntax as modifying an existing pair:

# Adding a new key-value pair
grades["David"] = 90
print(grades) 
{'John': 88, 'Alice': 92, 'Bob': 78, 'David': 90}

11.1.4 Deleting Key-Value Pairs

There are several ways to remove key-value pairs from a dictionary:

Using the del Statement

The del statement removes the key-value pair from the dictionary:

# Removing the key-value pair for "Bob"
del grades["Bob"]
print(grades) 
{'John': 88, 'Alice': 92, 'David': 90}

Using the pop() Method

The pop() method removes a key-value pair and returns the value. If the key is not found, it raises a KeyError, unless a default value is provided.

# Removing and returning the value for "Alice"
alice_grade = grades.pop("Alice")
print(alice_grade)
print(grades)  
92
{'John': 88, 'David': 90}
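The default-value behavior of pop() described above can be sketched as follows. A separate dictionary named remaining_grades is used here so that the running grades example is left untouched.

```python
remaining_grades = {"John": 88, "David": 90}

# "Bob" is not a key, so pop() returns the supplied default
# instead of raising a KeyError.
bob_grade = remaining_grades.pop("Bob", None)
print(bob_grade)         # None
print(remaining_grades)  # {'John': 88, 'David': 90}
```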

11.1.5 Dictionary Methods

Dictionaries come with a variety of built-in methods that simplify common tasks, such as adding, removing, and checking for keys and values.

keys(), values(), and items()

  • keys() returns a view of all the keys in the dictionary.
  • values() returns a view of all the values in the dictionary.
  • items() returns a view of all key-value pairs as tuples.

print(grades.keys())   
print(grades.values()) 
print(grades.items()) 
dict_keys(['John', 'David'])
dict_values([88, 90])
dict_items([('John', 88), ('David', 90)])
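Because these methods return views rather than copies, a view reflects later changes to the dictionary. The sketch below illustrates this with a separate dictionary called enrollment (a name invented for this example) and shows how to take a fixed snapshot with list().

```python
enrollment = {"John": 88, "David": 90}
student_names = enrollment.keys()   # a live view, not a copy

enrollment["Eve"] = 85              # change the dictionary afterwards
print(student_names)                # dict_keys(['John', 'David', 'Eve'])

# Convert to a list when a fixed snapshot is needed.
snapshot = list(enrollment.keys())
enrollment["Charlie"] = 79
print(snapshot)                     # ['John', 'David', 'Eve']
print(len(student_names))           # 4
```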

update()

The update() method allows you to merge two dictionaries or add key-value pairs from another iterable:

# Merging dictionaries
extra_grades = {"Eve": 85, "Charlie": 79}
grades.update(extra_grades)
print(grades) 
{'John': 88, 'David': 90, 'Eve': 85, 'Charlie': 79}
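The merge above passes a dictionary, but update() also accepts an iterable of key-value pairs and keyword arguments, and it overwrites the values of keys that already exist. A short sketch, using a hypothetical inventory dictionary:

```python
inventory = {"apples": 3, "pears": 5}

# An iterable of (key, value) pairs; the existing "apples" entry is overwritten.
inventory.update([("plums", 7), ("apples", 4)])

# Keyword arguments work as well.
inventory.update(pears=6, kiwis=2)

print(inventory)
# {'apples': 4, 'pears': 6, 'plums': 7, 'kiwis': 2}
```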

clear()

The clear() method removes all key-value pairs from the dictionary:

grades.clear()
print(grades) 
{}

11.1.6 Iterating Over Dictionaries

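There are several ways to iterate over dictionaries, depending on whether you need keys, values, or both. The loops below assume that grades contains some entries; immediately after the clear() call above it would be empty and the loops would print nothing.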

Iterating Over Keys

By default, iterating over a dictionary yields its keys:

for student in grades:
    print(student)

Iterating Over Values

You can iterate over the values by using the values() method:

for grade in grades.values():
    print(grade)

Iterating Over Key-Value Pairs

The items() method allows you to iterate over both keys and values simultaneously:

for student, grade in grades.items():
    print(f"{student}: {grade}")

11.1.7 Dictionary Comprehension

Like list comprehensions, Python also supports dictionary comprehensions, which provide a concise way to create dictionaries from iterables.

# Creating a dictionary of squares
squares = {x: x**2 for x in range(1, 6)}
print(squares)
{1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
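A dictionary comprehension can also combine two sequences and filter the result. The sketch below pairs names with scores using zip() and keeps only scores of at least 80; the lists and the threshold are invented for illustration.

```python
students = ["John", "Alice", "Bob"]
scores = [85, 92, 78]

# Pair each name with its score, keeping only scores of at least 80.
high_scores = {name: score for name, score in zip(students, scores) if score >= 80}
print(high_scores)  # {'John': 85, 'Alice': 92}
```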

11.1.8 Practical Applications of Dictionaries

Dictionaries are used in a variety of real-world applications, particularly where fast lookups or associations between pieces of data are needed.

Frequency Count

One common use of dictionaries is counting the frequency of elements in a collection. Here is an example that counts the occurrence of each character in a string:

def char_frequency(text):
    freq = {}
    for char in text:
        if char in freq:
            freq[char] += 1
        else:
            freq[char] = 1
    return freq

text = "data science"
print(char_frequency(text))
{'d': 1, 'a': 2, 't': 1, ' ': 1, 's': 1, 'c': 2, 'i': 1, 'e': 2, 'n': 1}
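The if/else branch in this function can be shortened with get(), and the standard library offers collections.Counter for the same job. The following sketch shows both alternatives; char_frequency_get is a hypothetical name used only here.

```python
from collections import Counter

def char_frequency_get(text):
    """Count characters using dict.get() with a default of 0."""
    freq = {}
    for char in text:
        freq[char] = freq.get(char, 0) + 1
    return freq

text = "data science"
manual_counts = char_frequency_get(text)
print(manual_counts)

# Counter is a dict subclass designed for counting hashable items.
counter_counts = Counter(text)
print(dict(counter_counts) == manual_counts)  # True
```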

Storing Configuration Settings

Dictionaries are often used to store configuration settings because they allow easy lookups by key:

config = {
    "host": "localhost",
    "port": 8080,
    "debug": True
}
print(config["host"])
localhost

Caching Computations

Dictionaries can be used to cache results of expensive computations to avoid recalculating them:

cache

A cache (pronounced “cash”) is memory used to store something, usually data, temporarily in a computing environment.

factorial_cache = {}

def factorial(n):
    if n in factorial_cache:
        return factorial_cache[n]
    if n == 0:
        result = 1
    else:
        result = n * factorial(n-1)
    factorial_cache[n] = result
    return result

print(factorial(5))  
print(factorial_cache)
120
{0: 1, 1: 1, 2: 2, 3: 6, 4: 24, 5: 120}

By caching the results, subsequent calls to factorial(n) for previously computed values are faster, as they avoid redundant calculations.
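The same idea is available in the standard library through functools.lru_cache, which maintains the cache for you. This is an alternative to the hand-written dictionary above rather than a replacement for understanding it; cached_factorial is a name chosen for this sketch.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # maxsize=None lets the cache grow without limit
def cached_factorial(n):
    """Recursive factorial whose results are memoized by lru_cache."""
    if n == 0:
        return 1
    return n * cached_factorial(n - 1)

print(cached_factorial(5))            # 120
print(cached_factorial.cache_info())  # reports hits, misses, and cache size
```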

11.1.9 Using **kwargs in Functions

In Python, dictionaries are often used to pass and manage named arguments to functions. One powerful feature that leverages dictionaries is the **kwargs mechanism, which allows functions to accept an arbitrary number of keyword arguments (recall the earlier discussion of functions). These keyword arguments are collected into a dictionary, which provides flexibility when you do not know in advance what arguments might be passed to a function.

The **kwargs construct is particularly useful when writing functions that need to accept a variable number of named parameters or when extending existing functions with new optional arguments without changing their function signature.

How **kwargs Works

The term kwargs stands for “keyword arguments,” and when used with **, it allows you to pass a variable number of named arguments to a function. Inside the function, these keyword arguments are captured as a dictionary.

def print_student_scores(**kwargs):
    for student, score in kwargs.items():
        print(f"{student}: {score}")

# Calling the function with multiple keyword arguments
print_student_scores(John=85, Alice=92, Bob=78)
John: 85
Alice: 92
Bob: 78

In the example above, the function print_student_scores() accepts any number of keyword arguments and prints them. The **kwargs parameter collects the keyword arguments as a dictionary, where the keys are the argument names (John, Alice, Bob), and the values are the respective scores.

Accessing and Using **kwargs

Inside the function, kwargs (written without the asterisks) is an ordinary dictionary. You can access, iterate over, and modify its elements just as you would with any other dictionary.

def get_student_grade(**kwargs):
    student = kwargs.get("student")
    grade = kwargs.get("grade")
    if student is not None and grade is not None:  # a grade of 0 still counts as provided
        print(f"{student}'s grade is {grade}")
    else:
        print("Missing student or grade information")

# Providing student and grade as keyword arguments
get_student_grade(student="John", grade=85)

# Missing one argument
get_student_grade(student="Alice")  
John's grade is 85
Missing student or grade information

In this example, the kwargs.get() method is used to safely retrieve values from the kwargs dictionary. If the key does not exist, get() returns None instead of raising a KeyError.

Combining **kwargs with Regular and Positional Arguments

You can combine **kwargs with regular parameters, whether they are passed positionally or by keyword. However, **kwargs must always come last in the function signature, after all other parameters:

def student_info(course, **kwargs):
    print(f"Course: {course}")
    for key, value in kwargs.items():
        print(f"{key}: {value}")

# Calling the function with both positional and keyword arguments
student_info("Mathematics", name="John", grade=90, age=20)
Course: Mathematics
name: John
grade: 90
age: 20

Here, course is a regular argument, and the remaining keyword arguments (e.g., name, grade, and age) are captured into the kwargs dictionary.

Passing a Dictionary as **kwargs

If you already have a dictionary of key-value pairs, you can pass it to a function using ** to unpack the dictionary into keyword arguments.

def print_details(name, age, occupation):
    print(f"Name: {name}, Age: {age}, Occupation: {occupation}")

# Creating a dictionary of arguments
person = {"name": "Alice", "age": 30, "occupation": "Data Scientist"}

# Passing the dictionary as keyword arguments
print_details(**person)
Name: Alice, Age: 30, Occupation: Data Scientist

In this case, the **person syntax unpacks the dictionary into keyword arguments, allowing you to pass the dictionary directly into the function.
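One caveat with unpacking is that every key in the dictionary must match a parameter name. The sketch below shows the error raised by an extra key and one possible way to filter the dictionary first. The function describe_person and the extra "city" entry are hypothetical and used only for illustration.

```python
def describe_person(name, age, occupation):
    return f"Name: {name}, Age: {age}, Occupation: {occupation}"

# The "city" key has no matching parameter.
person_record = {"name": "Alice", "age": 30,
                 "occupation": "Data Scientist", "city": "Boston"}

try:
    describe_person(**person_record)
except TypeError as error:
    unpack_error = str(error)
    print(f"TypeError: {error}")

# Keep only the keys the function expects before unpacking.
expected_keys = ("name", "age", "occupation")
filtered = {key: person_record[key] for key in expected_keys}
print(describe_person(**filtered))
# Name: Alice, Age: 30, Occupation: Data Scientist
```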

**kwargs and Default Arguments

While **kwargs allows for flexible keyword arguments, you can also combine it with default arguments to give function parameters some predefined behavior.

def log_message(level="INFO", **kwargs):
    message = kwargs.get("message", "No message provided")
    timestamp = kwargs.get("timestamp", "No timestamp")
    print(f"[{level}] {timestamp}: {message}")

# Logging a message with a default level
log_message(message="System started", timestamp="2023-09-12 10:00:00")

# Overriding the default level
log_message(level="ERROR", message="System failure", timestamp="2023-09-12 10:01:00")
[INFO] 2023-09-12 10:00:00: System started
[ERROR] 2023-09-12 10:01:00: System failure

In this example, the log_message() function uses a default level of “INFO” and then utilizes **kwargs to collect additional information like the message and timestamp.

Summary

The **kwargs construct provides a flexible way to pass and handle keyword arguments in Python. By collecting all keyword arguments into a dictionary, you gain the ability to write dynamic and adaptable functions. Whether used for configuration settings, logging, or passing optional parameters, **kwargs is a powerful tool that makes functions more reusable and extensible.

11.2 Sets

A set in Python is an unordered collection of unique elements. This data structure is useful when you need to store distinct items and perform operations such as union, intersection, difference, or membership testing efficiently. Sets are particularly powerful when handling large datasets where duplication is unnecessary or undesirable.

11.2.1 Creating a Set

Sets are created by placing items inside curly braces {} or by using the set() function. Note that empty braces {} create an empty dictionary, so an empty set must be created with set(). Unlike lists or dictionaries, sets do not maintain any order, and duplicate values are automatically removed. Set elements must be hashable, which in practice means immutable types such as numbers, strings, or tuples.
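Two details are easy to trip over when creating sets: empty braces produce a dictionary rather than a set, and mutable objects such as lists cannot be set elements. A minimal sketch, with variable names invented for this example:

```python
empty_braces = {}
empty_set = set()
print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>

# A tuple is hashable and can be an element; a list cannot.
points = {(0, 0), (1, 2)}
try:
    points.add([3, 4])
except TypeError as error:
    element_error = str(error)
    print(f"TypeError: {error}")
```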

Literal Notation

You can create a set directly by enclosing a sequence of values in curly braces:

# Creating a set of integers
numbers = {1, 2, 3, 4, 5}
print(numbers)  

# Creating a set of strings
fruits = {"apple", "banana", "cherry"}
print(fruits)  
{1, 2, 3, 4, 5}
{'banana', 'cherry', 'apple'}

Using the set() Function

The set() function is particularly useful when creating a set from an iterable, such as a list or a string. It automatically removes duplicates:

# Creating a set from a list with duplicate values
numbers_list = [1, 2, 2, 3, 4, 4, 5]
unique_numbers = set(numbers_list)
print(unique_numbers) 

# Creating a set from a string
letters = set("hello")
print(letters)
{1, 2, 3, 4, 5}
{'o', 'e', 'l', 'h'}

11.2.2 Adding and Removing Elements

Adding Elements

You can add elements to a set using the add() method. However, since sets do not allow duplicate values, adding an existing element has no effect:

# Adding elements to a set
fruits = {"apple", "banana"}
fruits.add("cherry")
print(fruits)  

# Attempting to add a duplicate element
fruits.add("apple")
print(fruits)  
{'banana', 'cherry', 'apple'}
{'banana', 'cherry', 'apple'}

Removing Elements

There are multiple ways to remove elements from a set, including the remove(), discard(), and pop() methods:

  • remove() raises a KeyError if the element does not exist.
  • discard() does not raise an error if the element is not found.
  • pop() removes and returns an arbitrary element, as sets are unordered.

# Removing elements using remove()
fruits.remove("banana")
print(fruits)  

# Using discard() to remove an element safely
fruits.discard("apple")
print(fruits)  

# Removing an arbitrary element with pop()
removed_fruit = fruits.pop()
print(removed_fruit)  
print(fruits)  
{'cherry', 'apple'}
{'cherry'}
cherry
set()
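The difference between remove() and discard() only shows up when the element is missing, and pop() raises a KeyError on an empty set. The sketch below demonstrates these cases with a separate set named colors.

```python
colors = {"red", "green"}

colors.discard("blue")   # element is absent: nothing happens
try:
    colors.remove("blue")  # element is absent: KeyError
except KeyError as error:
    remove_failed = True
    print(f"KeyError: {error}")  # KeyError: 'blue'

no_colors = set()
try:
    no_colors.pop()      # nothing to remove: KeyError
except KeyError:
    pop_failed = True
    print("Cannot pop from an empty set")
```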

11.2.3 Set Operations

One of the most powerful features of sets is their support for mathematical operations such as union, intersection, difference, and symmetric difference. These operations are efficient and allow for concise and readable code.

Union (| or union())

The union operation combines all elements from two sets, excluding duplicates. This operation can be performed using the | operator or the union() method.

set1 = {1, 2, 3}
set2 = {3, 4, 5}

# Using the | operator
union_set = set1 | set2
print(union_set)  

# Using the union() method
union_set = set1.union(set2)
print(union_set)  
{1, 2, 3, 4, 5}
{1, 2, 3, 4, 5}

Intersection (& or intersection())

The intersection operation returns only the elements that are present in both sets. It can be performed using the & operator or the intersection() method.

set1 = {1, 2, 3}
set2 = {2, 3, 4}

# Using the & operator
intersection_set = set1 & set2
print(intersection_set) 

# Using the intersection() method
intersection_set = set1.intersection(set2)
print(intersection_set)  
{2, 3}
{2, 3}

Difference (- or difference())

The difference operation returns the elements that are in the first set but not in the second. This can be done using the - operator or the difference() method.

set1 = {1, 2, 3}
set2 = {2, 3, 4}

# Using the - operator
difference_set = set1 - set2
print(difference_set) 

# Using the difference() method
difference_set = set1.difference(set2)
print(difference_set)  
{1}
{1}

Symmetric Difference (^ or symmetric_difference())

The symmetric difference operation returns elements that are in either of the sets but not in both. This operation can be performed using the ^ operator or the symmetric_difference() method.

set1 = {1, 2, 3}
set2 = {2, 3, 4}

# Using the ^ operator
sym_diff_set = set1 ^ set2
print(sym_diff_set)  

# Using the symmetric_difference() method
sym_diff_set = set1.symmetric_difference(set2)
print(sym_diff_set)  
{1, 4}
{1, 4}
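The operator and method forms are not fully interchangeable: the methods accept any iterable, whereas the operators require both operands to be sets. The operators also have in-place variants such as |=. A brief sketch, using names chosen for this example:

```python
base = {1, 2, 3}
other_values = [3, 4, 5]   # a list, not a set

# The method form accepts any iterable.
merged = base.union(other_values)
print(merged)  # {1, 2, 3, 4, 5}

# The operator form requires a set on both sides.
try:
    base | other_values
except TypeError as error:
    operator_error = str(error)
    print(f"TypeError: {error}")

# In-place union modifies the existing set.
base |= {6}
print(base)  # {1, 2, 3, 6}
```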

11.2.4 Checking for Subsets and Supersets

Sets also support operations that allow you to check whether one set is a subset or superset of another. These operations are particularly useful in scenarios where you need to compare sets.

Subset (<= or issubset())

A set A is a subset of set B if all elements of A are also in B. You can check for subsets using the <= operator or the issubset() method.

set1 = {1, 2}
set2 = {1, 2, 3, 4}

# Using the <= operator
print(set1 <= set2) 

# Using the issubset() method
print(set1.issubset(set2))
True
True

Superset (>= or issuperset())

A set A is a superset of set B if all elements of B are also in A. You can check for supersets using the >= operator or the issuperset() method.

set1 = {1, 2, 3, 4}
set2 = {1, 2}

# Using the >= operator
print(set1 >= set2)  

# Using the issuperset() method
print(set1.issuperset(set2))  
True
True
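The <= and >= operators treat equal sets as subsets and supersets of each other. For a proper subset, where the sets must differ, Python provides the < and > operators, and isdisjoint() checks whether two sets share no elements. A short sketch with illustrative names:

```python
small = {1, 2}
large = {1, 2, 3, 4}

print(small < large)             # True: proper subset
print(small < small)             # False: a set is not a proper subset of itself
print(small <= small)            # True: but it is a subset of itself
print(small.isdisjoint({5, 6}))  # True: no elements in common
```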

11.2.5 Frozen Sets

A frozen set is an immutable version of a set, meaning that once a frozen set is created, its elements cannot be modified (i.e., you cannot add or remove elements). Frozen sets are useful when you need a collection of unique elements that should remain constant throughout the program. You create a frozen set using the frozenset() function:

# Creating a frozen set
immutable_set = frozenset([1, 2, 3, 4])
print(immutable_set)  

# Attempting to add an element to a frozen set will raise an error
# immutable_set.add(5)  # Raises AttributeError
frozenset({1, 2, 3, 4})
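Because a frozen set is immutable, it is also hashable, so it can be used as a dictionary key or as an element of another set, which a regular set cannot. The sketch below illustrates this; the team_scores data is invented for the example.

```python
# Frozen sets as dictionary keys: each key is an unordered pair of names.
team_scores = {
    frozenset({"Alice", "Bob"}): 91,
    frozenset({"Charlie", "David"}): 84,
}

# Element order does not matter when looking up a frozen set.
print(team_scores[frozenset({"Bob", "Alice"})])  # 91

# A regular (mutable) set cannot be used as a key.
try:
    invalid_scores = {{"Alice", "Bob"}: 91}
except TypeError as error:
    set_key_error = str(error)
    print(f"TypeError: {error}")
```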

11.2.6 Set Comprehensions

Python provides a concise way to create sets using set comprehensions, similar to list comprehensions. A set comprehension is written with curly braces {} and allows you to define sets based on existing iterables, often including a filtering condition.

# Creating a set of squares for numbers 1 to 5
squares = {x**2 for x in range(1, 6)}
print(squares)  

# Set comprehension with a condition
even_squares = {x**2 for x in range(1, 11) if x % 2 == 0}
print(even_squares)  
{1, 4, 9, 16, 25}
{64, 100, 4, 36, 16}

Set comprehensions are a powerful tool when you need to transform or filter data while maintaining unique elements.

11.2.7 Set Best Practices

While sets are efficient for handling unique elements, there are a few best practices to keep in mind when working with sets in Python:

  1. Skip Manual Duplicate Checks: Since sets automatically discard duplicates, there’s no need to check whether an element is already present before adding it.

  2. Use Sets for Membership Testing: When you need to check whether an item exists in a collection, and the collection does not need to maintain order or allow duplicates, a set is usually the best choice because membership tests take O(1) time on average, compared with O(n) for a list.

  3. Choose the Right Operation: Use set operations such as union, intersection, and difference to simplify complex data comparison tasks. These built-in operations are typically faster and less error-prone than hand-written loops that achieve the same results.

Sets are an invaluable data structure for handling collections of unique items. Their efficiency in membership testing, combined with their ability to perform set operations such as union, intersection, and difference, makes them ideal for a wide variety of tasks, from data processing to mathematical computations. With their unordered nature and automatic deduplication, sets help simplify code and ensure efficient performance, especially when working with large datasets.
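The membership-testing advantage can be observed with a rough timing comparison. The sketch below uses timeit from the standard library; the collection size and repetition count are arbitrary choices, and the measured times will vary from machine to machine.

```python
import timeit

values_list = list(range(100_000))
values_set = set(values_list)
target = 99_999   # last element: the slowest case for a list scan

# Both containers give the same answer.
print(target in values_list, target in values_set)  # True True

# Time 100 membership tests against each container.
list_time = timeit.timeit(lambda: target in values_list, number=100)
set_time = timeit.timeit(lambda: target in values_set, number=100)
print(f"list: {list_time:.5f} s, set: {set_time:.5f} s")
```

On typical hardware the set lookup is expected to be far faster, since the list must be scanned element by element.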

11.2.8 Combining Dictionaries and Sets

Dictionaries and sets are often combined in practical applications. For example, the following function counts, for each unique word, how many sentences in a list contain it:

def unique_words(sentences):
    word_dict = {}
    for sentence in sentences:
        words = set(sentence.split())  # Use a set to find unique words
        for word in words:
            if word in word_dict:
                word_dict[word] += 1
            else:
                word_dict[word] = 1
    return word_dict

sentences = ["data science is great", "data science is evolving"]
result = unique_words(sentences)
print(result)
{'is': 2, 'great': 1, 'data': 2, 'science': 2, 'evolving': 1}

This function uses sets to ensure that each word in a sentence is only counted once per sentence and then stores the results in a dictionary.
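To see why the set matters, the sketch below compares per-sentence counts with total occurrence counts for input where a word repeats within one sentence. It uses collections.Counter for brevity; the sample sentences are invented for this illustration.

```python
from collections import Counter

sample_sentences = ["data is data", "data science"]

sentence_counts = Counter()  # number of sentences containing each word
total_counts = Counter()     # total occurrences of each word

for sentence in sample_sentences:
    words = sentence.split()
    sentence_counts.update(set(words))  # each word counted once per sentence
    total_counts.update(words)          # every occurrence counted

print(sentence_counts["data"])  # 2
print(total_counts["data"])     # 3
```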

11.3 Exercises

Exercise 1: Student Grades

Write a function called update_grades() that accepts a dictionary of student names and their grades, followed by new student names and grades passed as keyword arguments. The function should update the dictionary with the new entries and return the updated dictionary.

Example:

grades = {"John": 85, "Alice": 92}
new_grades = {"Bob": 78, "Alice": 95}

updated_grades = update_grades(grades, **new_grades)
print(updated_grades)

Expected Output:

{'John': 85, 'Alice': 95, 'Bob': 78}

Exercise 2: Word Frequency Counter

Write a function word_frequency(text) that takes a string and returns a dictionary with the frequency count of each word in the string.

Example:

text = "data science is data fun science is fun"
result = word_frequency(text)
print(result)

Expected Output:

{'data': 2, 'science': 2, 'is': 2, 'fun': 2}

Exercise 3: Dictionary of Squares

Create a function squares_dict(n) that generates a dictionary where the keys are numbers from 1 to n and the values are their corresponding squares.

Example:

print(squares_dict(5))

Expected Output:

{1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

Exercise 4: Merge Dictionaries

Write a function merge_dictionaries(*args) that accepts any number of dictionaries and merges them into a single dictionary. If a key is repeated, the value from the last dictionary should be retained.

Example:

dict1 = {"a": 1, "b": 2}
dict2 = {"b": 3, "c": 4}
dict3 = {"d": 5}

result = merge_dictionaries(dict1, dict2, dict3)
print(result)

Expected Output:

{'a': 1, 'b': 3, 'c': 4, 'd': 5}

Exercise 5: Remove Duplicates

Write a function remove_duplicates(lst) that takes a list and returns a new list with all the duplicates removed using a set. Because sets are unordered, sort the result so that the output is predictable.

Example:

numbers = [1, 2, 2, 3, 4, 4, 5]
print(remove_duplicates(numbers))

Expected Output:

[1, 2, 3, 4, 5]

Exercise 6: Set Operations

Given two sets of students enrolled in two different courses, write functions to:

  1. Find students enrolled in both courses.
  2. Find students enrolled only in the first course.
  3. Find students enrolled in either course but not both.

Example:

course_A = {"Alice", "Bob", "Charlie", "David"}
course_B = {"Charlie", "David", "Eve", "Frank"}

# Students in both courses
print(students_in_both(course_A, course_B))

# Students only in course A
print(only_in_first(course_A, course_B))

# Students in either course but not both
print(either_but_not_both(course_A, course_B))

Expected Output (the order of elements within each set may vary):

{'Charlie', 'David'}
{'Alice', 'Bob'}
{'Alice', 'Bob', 'Eve', 'Frank'}

Exercise 7: Flexible Function with **kwargs

Write a function student_profile(**kwargs) that accepts student information (name, age, grade, etc.) as keyword arguments and prints each item on its own line, with the key capitalized.

Example:

student_profile(name="Alice", age=20, grade="A", major="Mathematics")

Expected Output:

Name: Alice
Age: 20
Grade: A
Major: Mathematics

Exercise 8: Summing Keyword Arguments

Write a function sum_values(**kwargs) that takes any number of keyword arguments where the values are integers and returns their sum.

Example:

result = sum_values(a=5, b=10, c=3)
print(result)

Expected Output:

18