3 Describing Data with Tables and Graphs

3.1 Organizing Categorical Data

“The greatest value of a picture is when it forces us to notice what we never expected to see.” – John Tukey

Guiding question: Which graphs work best for categorical data?

When you collect data that fall into groups—like preferred streaming service, political affiliation, or type of pet—the first step is to count how many observations fall into each category. Those counts form the backbone of both tables and graphs for categorical data. In this section we’ll learn how to build simple frequency tables, translate them into proportions or percentages, and organize two categorical variables together in a two‑way table. Along the way we’ll see when different visual summaries make sense and preview bar and pie charts (covered in detail in Section 3.2).

Frequency and relative frequency tables

A frequency table lists the possible values of a categorical variable and the number of times each value occurs. It’s one of the simplest ways to summarize data, but it packs a punch. Such tables usually have two columns: one for the categories and one for their counts. Creating one involves listing the distinct categories and tallying how many observations fall into each category.

Sometimes absolute counts aren’t enough—especially when comparing samples of different sizes. A relative frequency expresses each category’s count as a proportion of the total. When we add a column of proportions (or percentages) to a frequency table we get a relative frequency table. To compute the relative frequencies, divide each count by the total number of observations. Relative frequencies always add up to 1 (or 100% when expressed as percentages), which makes them handy for comparing distributions across different sample sizes.

An example: common symptoms in a clinic

Imagine you survey 30 patients at a local clinic about the primary symptom that brought them in. You record four categories: “Headache,” “Back pain,” “Fatigue,” and “Nausea.” We can organize the responses in a simple table of counts and proportions. Below we simulate such a survey and display the results.

response	frequency	relative_frequency
Back pain	9	0.300
Fatigue	9	0.300
Headache	8	0.267
Nausea	4	0.133

The table lists the four categories in alphabetical order with their counts and relative frequencies. For instance, if 8 of the 30 patients reported “Headache,” the relative frequency of “Headache” is \(8/30 \approx 0.27\). The accompanying bar chart gives a visual sense of the same information: each bar’s height corresponds to a category’s frequency, and the bars are separated to emphasize that the categories have no inherent order. In practice you might reorder the bars to make the graph easier to read—perhaps putting the largest category first.
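To make the tallying concrete, here is a minimal Python sketch that rebuilds the table from raw responses. The `responses` list is hypothetical: it is constructed only to match the counts shown above, and the variable names are our own rather than part of any particular software workflow.

```python
from collections import Counter

# Hypothetical raw responses, constructed to match the counts in the table above.
responses = (["Back pain"] * 9 + ["Fatigue"] * 9
             + ["Headache"] * 8 + ["Nausea"] * 4)

counts = Counter(responses)      # tally how many times each category occurs
total = sum(counts.values())     # total number of observations

# Rows of (category, frequency, relative frequency), in alphabetical order.
freq_table = [(cat, n, n / total) for cat, n in sorted(counts.items())]

print(f"{'response':<10}{'frequency':>10}{'rel_freq':>10}")
for cat, n, rel in freq_table:
    print(f"{cat:<10}{n:>10}{rel:>10.3f}")
```

Dividing each count by the same total is what guarantees that the relative frequencies sum to 1.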

Pareto charts

Sometimes you want to highlight the few categories that account for most of the observations. A Pareto chart is a bar chart arranged in descending order of frequency and often paired with a cumulative percentage line. It helps you identify the “vital few and trivial many” in quality control and business applications. Pareto charts are useful when there are many categories and you want to focus attention on the most common causes or responses.
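The numbers behind a Pareto chart are easy to compute by hand or in code. The sketch below, which assumes the symptom counts from the clinic example earlier in this section, sorts the categories from most to least frequent and accumulates a running percentage. It produces only the values that would be plotted, not the chart itself.

```python
# Symptom counts from the clinic example in this section.
counts = {"Back pain": 9, "Fatigue": 9, "Headache": 8, "Nausea": 4}
total = sum(counts.values())

# Sort from most to least frequent (tied categories keep their original order).
ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)

pareto = []
running = 0
for category, n in ordered:
    running += n  # cumulative count so far
    pareto.append((category, n, round(100 * running / total, 1)))

for category, n, cum_pct in pareto:
    print(f"{category:<10}{n:>4}{cum_pct:>8.1f}%")
```

With only four categories of similar size, the cumulative line rises fairly evenly; a Pareto chart pays off when a few categories dominate a long list.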

Tip:

In JMP Pro 17 you can create a frequency table by selecting Analyze → Distribution, assigning your categorical variable to the Y, Columns role, and examining the resulting counts. To add relative frequencies, use the red triangle menu (▸) to choose Display Options → Show Percent. JMP’s Graph Builder will automatically construct a bar chart when you drag a categorical variable to the X‑axis and the count statistic to the Y‑axis.

Two‑way (contingency) tables

What if you have two categorical variables and want to see how they interact? A contingency table (also called a two-way table) displays the counts for each combination of levels of the two variables. One variable defines the rows and the other defines the columns. Such tables are the starting point for examining associations and will underlie chi‑square tests in later chapters.

An example: symptom by age group

Suppose we collect data on the same symptom question but also record each patient’s age group: “Under 30,” “30–50,” or “Over 50.” We can summarize the joint distribution in a two‑way table.

age_group	Back pain	Fatigue	Headache	Nausea
30–50	3	4	5	3
Over 50	2	3	1	0
Under 30	4	2	2	1

Each cell in the table shows the number of patients who fall into the corresponding combination of age group and symptom. We can also compute row or column relative frequencies to see percentages within each group; for example, dividing each row by its total gives the distribution of symptoms within each age group. Contingency tables allow us to see whether symptom patterns differ across age groups and serve as input for clustered or stacked bar charts (discussed in Section 3.2).
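As a rough illustration of the row calculation just described, the following Python sketch divides each cell by its row total. The counts are copied from the two-way table above; the dictionary layout and variable names are our own choices for the example.

```python
symptoms = ["Back pain", "Fatigue", "Headache", "Nausea"]

# Counts copied from the two-way table above (rows: age group, columns: symptom).
table = {
    "Under 30": [4, 2, 2, 1],
    "30–50":    [3, 4, 5, 3],
    "Over 50":  [2, 3, 1, 0],
}

# Row relative frequencies: divide each cell by its row total.
row_props = {}
for age_group, row in table.items():
    row_total = sum(row)
    row_props[age_group] = [n / row_total for n in row]

# Column totals recover the one-variable frequency table for symptoms.
col_totals = [sum(col) for col in zip(*table.values())]

for age_group, props in row_props.items():
    cells = "  ".join(f"{s}: {p:.0%}" for s, p in zip(symptoms, props))
    print(f"{age_group:<9}{cells}")
print("Column totals:", dict(zip(symptoms, col_totals)))
```

Because the age groups have different sizes (9, 15, and 6 patients), the row percentages are the fair way to compare symptom patterns between them.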

Why percentages matter

Because categorical variables can have different numbers of levels and sample sizes can vary, relative frequencies are essential for fair comparisons. Reporting only counts can be misleading: 20 supporters of a movie genre in a survey of 50 people represent a large fraction (20/50 = 40%), while 20 supporters in a survey of 500 people represent a much smaller one (20/500 = 4%). Percentages put both surveys on the same scale.

When displaying percentages, make sure they add to 100%. In a pie chart (a circular graph we’ll describe in the next section), each slice represents a category’s percentage of the whole. Pie charts are useful for showing how the total is divided among categories, but they become cluttered with too many slices. Bar charts are more flexible: you can reorder the bars, show counts or percentages, and compare multiple groups using side‑by‑side or stacked bars.

Working in JMP Pro 17

In JMP, tables and graphs for categorical variables are straightforward:

To create a frequency table, go to Analyze → Distribution, assign your categorical variable to Y, Columns, and click OK. The report shows counts and percentages; use the red triangle (▸) menu to toggle percentages, counts, or both.
For two categorical variables, use Analyze → Fit Y by X and assign one variable to Y and the other to X. Choose Contingency Table from the platform to see the two‑way counts and associated statistics.
To visualize categorical distributions, open Graph Builder, drag the categorical variable to the X‑axis, and drop the N summary statistic onto the Y‑axis. You can change the chart type to “Bar” or “Pie.” Dragging a second categorical variable onto the Group drop zone will create clustered or stacked bars.

Recap

Keyword	Definition
Frequency table	A table that lists each category of a variable and the number of observations in that category.
Relative frequency	The proportion or percentage of observations in a category, equal to the category’s count divided by the total count.
Relative frequency table	A frequency table with an additional column showing the relative frequency of each category.
Two‑way (contingency) table	A table that displays the counts for each combination of levels of two categorical variables.

Check your understanding

In a survey of 80 households, 32 own a dog, 20 own a cat, 12 of these households own both, and the remainder own no pets. Construct a frequency table that shows the number and percentage of households in each pet ownership category (Dog only, Cat only, Both, None). Which visualization, a bar chart or a pie chart, would you choose, and why?
Explain the difference between a frequency table and a relative frequency table. In what situations is it more informative to look at relative frequencies rather than absolute frequencies?
What is a two‑way (contingency) table? Describe a scenario where a two‑way table could help you explore the relationship between two categorical variables.
Bar charts have spaces between bars and can be drawn in any order. Why are these design choices appropriate for categorical variables? What might go wrong if you drew the bars touching or forced them into a numerical order?

Solutions

Pet ownership table. The 12 households that own both pets are counted among the 32 dog owners and the 20 cat owners, so subtract them to get the "only" categories: Dog only is 32 − 12 = 20 and Cat only is 20 − 12 = 8. With a total of 80 households, None is 80 − (20 + 8 + 12) = 40. The four categories and their counts are therefore: Dog only (20), Cat only (8), Both (12), None (40). The relative frequencies are 25% dog only, 10% cat only, 15% both, and 50% none. A bar chart would be preferable here because it allows you to order the bars from most to least common and makes it easy to compare magnitudes. A pie chart could work for four categories, but it becomes harder to read when slices are similar in size or when there are many categories.
Frequency vs. relative frequency. A frequency table reports the counts of observations in each category. A relative frequency table adds a column showing the proportion or percentage of observations in each category. Relative frequencies are more informative when comparing groups of different sizes or when you want to focus on the distribution rather than the sample size—for example, comparing survey results from two classes of different sizes.
Contingency table example. A two‑way table displays counts for each combination of levels of two categorical variables. For instance, you could record whether each patient in a clinic has insurance (Yes/No) and whether they arrived on time (On time/Late). A contingency table would show how many patients fall into each combination (e.g., insured & on time, insured & late, uninsured & on time, uninsured & late), helping you explore whether punctuality differs by insurance status.
Design choices. Categories have no intrinsic numeric order, so bars in a bar chart can be arranged in any order without misrepresenting the data. Leaving space between bars reinforces that the categories are distinct and unordered. If you drew the bars touching, it might suggest a continuous scale (like a histogram), which could confuse readers. Forcing categories into a numerical order might imply ranking where none exists.
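If you would like to check the arithmetic in the pet ownership solution (the first solution above), a short Python sketch along these lines would do it. It assumes, as the solution does, that the 32 dog-owning and 20 cat-owning households each include the 12 that own both.

```python
total_households = 80
own_dog, own_cat, own_both = 32, 20, 12  # dog and cat counts include "both"

counts = {
    "Dog only": own_dog - own_both,
    "Cat only": own_cat - own_both,
    "Both": own_both,
}
# Whatever is left after the three pet-owning categories owns no pets.
counts["None"] = total_households - sum(counts.values())

percentages = {cat: 100 * n / total_households for cat, n in counts.items()}

for cat, n in counts.items():
    print(f"{cat:<9}{n:>3}{percentages[cat]:>7.1f}%")
```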

3.2 Bar Charts and Pie Charts

“Normally if given a choice between doing something and nothing, I’d choose to do nothing. But I would do something if it helps someone do nothing. I’d work all night if it meant nothing got done.” – Ron Swanson

Guiding question: How do we make clear, truthful bar charts and pie charts?

When you’ve tallied the counts of a categorical variable, your next job is to turn those numbers into a picture. Two of the simplest pictures—bar charts and pie charts—seem deceptively alike: each shows categories and their sizes. But as we’ll see, they serve different purposes and come with different design rules.

What is a bar chart?

As we have already seen in Section 3.1, a bar chart displays categories along one axis and uses the length of a bar on the other axis to represent a numerical value. Bars can be vertical (a “column” chart) or horizontal, and they are separated by gaps to emphasize that the categories are discrete. Because humans are good at comparing lengths that share a common baseline, bar charts are our go‑to tool for comparing counts, percentages or other statistics across categories.

Example: distribution of blood types. Suppose a hospital records the blood type (A, B, AB or O) of 200 randomly chosen donors. The counts are shown in the table below along with a bar chart. Notice that the bars are separated and can be reordered to make patterns easy to see.

type	count	prop
A	66	0.330
AB	9	0.045
B	31	0.155
O	94	0.470

The vertical bar chart emphasizes how common type O is relative to the others. You could flip the axes to make a horizontal bar chart if your category names are long or if you prefer to read labels on the y‑axis.
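To see how reordering helps, here is a small Python sketch that sorts the blood type counts from the table above and prints a rough horizontal bar chart in plain text. The scale of one `#` per two donors is an arbitrary choice for the illustration, and a real chart would of course be drawn with graphing software.

```python
# Blood type counts from the table above.
counts = {"A": 66, "AB": 9, "B": 31, "O": 94}
total = sum(counts.values())

# Sort from most to least common so the pattern is easy to scan.
ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)

# One '#' per 2 donors (integer division, so odd counts round down).
lines = [f"{btype:<3}{'#' * (n // 2)} {n} ({n / total:.1%})"
         for btype, n in ordered]
print("\n".join(lines))
```

Every bar starts at the same left edge, so the eye compares lengths directly, and the sorted order puts type O at the top.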

Design tips for bar charts. A few simple rules help make bar charts honest and clear:

Start the axis at zero. Because bar length encodes value, truncating the axis exaggerates differences.
Keep consistent spacing. Leave a gap between bars—about half the width of a bar is a good rule of thumb.
Sort deliberately. Arrange bars in a logical order (alphabetical, chronological, or by size) to help the reader scan.
Avoid clutter and gimmicks. Skip 3‑D effects and decorative icons; they distort perception and add no information.
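The first tip is worth a quick calculation. Using the type A and type O counts from the blood type table, the sketch below compares how long the two bars look relative to each other when the axis starts at zero and when it starts at 60. The starting value of 60 is a hypothetical choice made only to show the effect.

```python
count_a, count_o = 66, 94  # type A and type O from the blood type table

def drawn_ratio(larger, smaller, axis_start):
    """Ratio of bar lengths as drawn when the axis begins at axis_start."""
    return (larger - axis_start) / (smaller - axis_start)

honest = drawn_ratio(count_o, count_a, axis_start=0)      # true ratio of the counts
truncated = drawn_ratio(count_o, count_a, axis_start=60)  # hypothetical truncated axis

print(f"Axis from 0:  O bar is {honest:.2f} times as long as A")
print(f"Axis from 60: O bar is {truncated:.2f} times as long as A")
```

Type O is about 1.4 times as common as type A, but on the truncated axis its bar is drawn more than five times as long.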

What is a pie chart?

A pie chart shows how a total is divided among categories. A circle is divided into slices; each slice represents a category and its angle corresponds to the category’s proportion of the whole. Pie charts are familiar and immediately signal a “part of a whole” story.

Example: reasons for missing an appointment. A dental clinic tracks why patients miss scheduled cleanings. Out of 100 missed appointments, 50 were due to forgetfulness, 20 to fear, 15 to cost, and 15 to other reasons. A pie chart makes the share of each reason obvious.

The slices emphasize that half of the missed appointments were simply forgotten. However, imagine adding three more reasons of similar size. The slices would become crowded and hard to compare. Pie charts work only when the categories sum to a meaningful whole and there are no more than a few slices.
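Since each slice's angle is its share of the full circle, the angles for the missed-appointment data can be worked out directly. This is a minimal sketch of that calculation; the short labels are our own abbreviations of the four reasons.

```python
# Missed-appointment counts from the dental clinic example.
reasons = {"Forgot": 50, "Fear": 20, "Cost": 15, "Other": 15}
total = sum(reasons.values())

# Each slice's central angle is its share of the full 360-degree circle.
angles = {reason: 360 * n / total for reason, n in reasons.items()}

for reason, angle in angles.items():
    print(f"{reason:<7}{reasons[reason] / total:>5.0%}{angle:>7.1f} degrees")
```

The "Forgot" slice spans 180 degrees, exactly half the circle, which is why it stands out so clearly.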

When to use bar charts vs. pie charts

Although you can plot the same data with either chart, they are not interchangeable. Use a bar chart when you want to:

Compare values across categories or between groups.
Display a statistic that does not sum to a meaningful whole (e.g., average pain scores by medication).
Show many categories, even if some are small.

Use a pie chart only when:

The values represent parts of a whole that add up to 100%.
The number of categories is small (ideally no more than five).
You care more about conveying the big picture of how the whole is divided than about exact comparisons.

When you find yourself squinting at a pie chart to see which slice is bigger, switch to a bar chart; our brains judge lengths more accurately than angles.

Clustered and stacked bar charts

Sometimes you have two categorical variables and want to see how their categories interact. We introduced two‑way tables in Section 3.1; here’s how to graph them.

A clustered (side‑by‑side) bar chart groups bars for each level of a second variable next to each other so you can compare across groups. For example, imagine you survey 120 patients about how satisfied they were with a new physical therapy program (satisfied, neutral, dissatisfied) and record whether they were in the treatment or control group. A clustered bar chart shows differences in satisfaction between the two groups.

In a stacked bar chart, the segments for the levels of one variable are stacked on top of one another within a single bar for each level of the other variable. This emphasizes the total size of each bar but makes it harder to compare segments across bars. You might use a stacked chart to show how types of injuries (sprain, fracture, other) contribute to emergency visits across departments; if you rescale every bar to the same height so that its segments show percentages rather than counts, you get a 100% stacked bar chart that highlights composition within each group.

Cautions with stacked bars

Stacked bars are useful when you care about the total across categories, but they hide patterns in the middle segments. In the last plot, you can easily compare the overall emergency visits across departments and the share of fractures, but it’s harder to compare the “other” injuries across departments because their segments float at different heights. If your goal is to compare subgroups, a clustered bar chart is usually better.
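The "floating segment" problem can be seen in numbers as well as in a picture. Because the injury counts are not listed here, the sketch below uses the symptom-by-age counts from Section 3.1 as a stand-in and computes where each segment of a 100% stacked bar would begin and end. The function name `stacked_segments` is our own.

```python
symptoms = ["Back pain", "Fatigue", "Headache", "Nausea"]

# Stand-in data: symptom-by-age counts from the two-way table in Section 3.1.
table = {
    "Under 30": [4, 2, 2, 1],
    "30–50":    [3, 4, 5, 3],
    "Over 50":  [2, 3, 1, 0],
}

def stacked_segments(row):
    """Return (bottom, top) percentages for each segment of a 100% stacked bar."""
    total = sum(row)
    segments, bottom = [], 0.0
    for n in row:
        top = bottom + 100 * n / total
        segments.append((bottom, top))
        bottom = top  # the next segment starts where this one ends
    return segments

segments = {group: stacked_segments(row) for group, row in table.items()}

for group, segs in segments.items():
    bottom, top = segs[symptoms.index("Fatigue")]
    print(f"{group:<9}Fatigue spans {bottom:.1f}% to {top:.1f}%")
```

The first segment of every bar starts at zero, so it is easy to compare, but the Fatigue segment begins at a different height in each age group, which is what makes its size hard to judge across bars.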