How to Create Network Visualizations in Tableau

Network visualizations display relationships between elements by linking nodes with common characteristics. This allows the user to visualize clusters quickly and determine relationships. Unfortunately, the development process in Tableau isn’t as straightforward. To get the data into a Tableau-friendly format for network visualization, we’ll first do some data preparation then bring it into Gephi to produce spatial coordinates.

This example is expanding on the blog published by The Data Surfers. I’ll be showing the steps in both Python and Excel. I prefer Python, but either (or a tool of your choice) should produce the same result.

Prepare a Dataset Capable of Creating Network Visualizations

To accomplish this, you’ll need a dataset capable of showing a network of relationships. In this example, I’ll be using Harry Potter screen time statistics gathered from IMDB. This allows us to link each actor to the movies they appeared in. I’ll be referring to this as the main dataset.

If you’d like to follow along with the same data, do it! I even cleaned it up a bit for you; thank me later.

To prepare the data for Gephi, we need to create two CSV files: nodes and edges.

Prepare Data for Gephi: Create a CSV Nodes File

The nodes file will contain three columns:

ID: a unique identifier for each label
Nodes: a unique identifier to create each node
Label: a list of all unique values of the two columns you’re visualizing (in my case: Actors and Movies)

Python

# 1. Read in the main dataset

import pandas as pd

import numpy as np

main = pd.read_csv(yourfilename)

# 2. Take a unique list of the two network columns (Actor and Movie)

movies = main.Movie.unique()

actors = main.Actor.unique()

# 3. Concatenate the two list into one array

label = np.concatenate([movies, actors])

# 4. Create the nodes dataframe from the label array

nodes = pd.DataFrame(label, columns = [“Label”])

# 5. Add the ID and Nodes unique identifier columns

nodes[“ID”] = range(1, 1 + len(nodes))

nodes[“Nodes”] = range(1, 1 + len(nodes))

# 6. Export the nodes dataframe as a CSV

nodes.to_csv(‘nodes.csv’, index = False)

Excel

Create a new sheet named nodes
Create three columns “ID,” “Nodes,” and “Label”
Copy the Movies column from main and paste it under “Label”
Copy the Actors column from main and paste it under “Label” below the Movies values
Select the entire “Label” column and remove duplicates
- Data – Data Tools – Remove Duplicates
Create a unique ID and Node for every value in “Label”
- Make sure the “ID” and “Node” columns are identical
Save the sheet as a CSV

Prepare Data for Gephi: Create a CSV Edges File

Before we attack the edges file, let’s add two columns to the main dataset: “Source” and “Target.” We’ll use the nodes file to get the unique ID of both the Actor and Movie labels. The “Source” value will contain the Movie ID and “Target” the Actor ID.

Python

# use the map() function to add the columns with data from nodes

main[“Source”] = main.Movie.map(nodes.set_index(‘Label’)[‘Nodes’].to_dict())

main[“Target”] = main.Actor.map(nodes.set_index(‘Label’)[‘Nodes’].to_dict())

I found this article helpful in implementing .map()

Excel

Use the Index Match function combination to transfer the IDs from the nodes to the “Source” and “Target” columns.

Here’s a helpful article on using Index Match

Build the Edges Dataset

Ok, now we’re ready to build the edges dataset.

The edges file will contain four columns:

Source: The unique Movie ID from the main dataset
Target: The unique Actor ID from the main dataset
Type: “Directed,” which tells Gephi that edges can have a direction. If you’d like to read further into it, here ya go.
Weight: “1,” which tells Gephi the total connections between Source and Target. Again, if you’d like to read further into it, have at it.

Python

# 1. Grab the Source and Target columns from main

edges = main[[“Source”, “Target”]]

# 2. Add the Type and Directed columns

edges.insert(2, ‘Type’, “Directed”)

edges.insert(3, ‘Weight’, 1)

# 3. Export the edges dataframe as a CSV

edges.to_csv(“edges.csv”, index = False)

# 4. Export the main data frame as a CSV

main.to_csv(“main.csv”, index = False)

Excel

Create a new sheet named edges
Copy the “Source” and “Target” columns and paste them into the edges sheet
Create a column named “Type” and populate every row with “Directed”
Create a column named “Weight” and populate every row with “1”
Save the edges sheet as a CSV
Save the main sheet as a CSV

Loading Data into Gephi

We now have all the data necessary for Gephi. If you haven’t already, install the free software.

When loading the software, I got an error “Cannot find Java 1.8 or higher”. This site helped resolve it. To summarize, add a new line to the top of the gephi.conf file directing it to your instance of Java (ex. jdkhome=”C:/Program Files (x86)/Java/jre1.8.0_291″)

Let’s start loading some data.

Click File – Open and select your nodes CSV file
- Gephi should recognize this as a Nodes table
Click Next – Finish
Change Graph Type from Mixed to Directed
Click OK

At this point, you should see an array of nodes that appear randomly distributed in your workspace.

Let’s input the edges file (a very similar process)

Click File – Open and select your edges CSV file
- Gephi should recognize this as an Edges table
Click Next – Finish
Change Graph Type to Directed
Select “Append to existing workspace”
Click OK

We’re getting somewhere! It’s time to utilize the Layout dropdown to format our network.

I recommend going through each layout and altering its preset properties until you find one that meets your needs. I ended up using the Yifan Hu Proportional.

Once you’ve selected a layout, you can manually move nodes using the drag tool (select Configure to change the diameter of your cursor).

To export the coordinate data, select File – Export – Graph File and save the file as a .gexf

Final Data Prep

We’re almost to Tableau, I promise. To extract the coordinates, we must open the .gexf file as an XML table. I found this easiest to implement in Excel.

Open an empty Excel sheet
Click File – Open and navigate to the .gexf file you just created
- Make sure that “All Files” is selected from the file search dropdown

Select the .gexf file and click Open
When prompted to select how to open the file, choose “As an XML table”

You’re looking at something like this, right?

Perfect. We’ll call this new table network. The only columns you’ll need from network are “label,” “value,” “x,” and “y.” Do what you wish with the remaining columns.

The goal is to transfer the x and y values to our main dataset using the unique ID of each node. Notice the “value” column for each label in the network sheet corresponds to the “Source” and “Target” columns in the main dataset.

For the links in the network visualization, we need to create two nodes for each source and target combination in the main dataset. (ex. one at Harry Potter and one at The Sorcerer’s Stone)

Python

Save the network table as an .XLSX file.

# 1. Read in the main dataset

main = pd.read_csv(“main.csv”)

# 2. Read in the network dataset

network = pd.read_excel(“network.xlsx”)

# 3. Merge the ‘x’ and ‘y’ coordinates corresponding to the “Target” value for each main dataset combination. Save this as a new dataframe

target = main.merge(network[[‘x’, ‘y’, ‘value’]], how = ‘left’,

left_on = ‘Target’, right_on = ‘value’).drop(columns = [‘value’])

# 4. Merge the ‘x’ and ‘y’ coordinates corresponding to the “Source” value for each main dataset combination. Save this as a new dataframe

source = main.merge(network[[‘x’, ‘y’, ‘value’]], how = ‘left’,

left_on = ‘Source’, right_on = ‘value’).drop(columns = [‘value’])

# 5. Append the source data to target and save it as main

main = target.append(source)

# 6. Export main as a CSV

main.to_csv(“main.csv”, index = False)

Excel

Make a copy of the main sheet
In the original sheet, use the Index Match function combination to match the Target value of each pair with its corresponding x and y coordinates.
- ex. = INDEX(array of x coordinates, MATCH(label in main, array of labels in network, 0))
In the copy of the sheet, use the same function combination to match the Source value of each pair with its corresponding x and y coordinates
Copy and paste all the data in the second sheet (the source value coordinates) under the data in the original sheet (the target value coordinates).
Delete the second sheet and save

Connect to Main Dataset in Tableau

About time, right? Now, all you’re going to need is your main dataset. Once you’ve connected to the main dataset, in a new sheet, create a dual-axis chart using the X and Y coordinates.

Add X to columns
Add Y to rows twice
- Note both the X and Y variables must be continuous dimensions
Right-click on the second Y variable and select “Dual Axis”

Right-click on either Y-axis and select “Synchronize Axis”
Set the first value on the Marks card, Y, to Line
Set the second value on the Marks card, Y (2), to Circle

Select All on the Marks card and add Source and Target to detail as continuous dimensions.

Bingo.

I’ll leave the design decisions up to you, but this was the process I used to create the Network Visualization – Harry Potter dashboard.

How to Create Network Visualizations in Tableau

Prepare a Dataset Capable of Creating Network Visualizations

Prepare Data for Gephi: Create a CSV Nodes File

Prepare Data for Gephi: Create a CSV Edges File

Build the Edges Dataset

Loading Data into Gephi

Final Data Prep

Connect to Main Dataset in Tableau

More to explore

phData Achieves the AWS Generative AI Competency

Omni Analytics: What You Need to Know

Table Groupings & Table Summary in Sigma Computing

Join our team

Partners

Resources

Software

Accelerate and automate your data projects with the phData Toolkit

Industries

Solutions

Company

Technology Partners

Other Technology Partners

Check out our latest insights

phData Achieves the AWS Generative AI Competency

Omni Analytics: What You Need to Know

Data Engineering

Consulting, Migrations, Data Pipelines, DataOps

Change Management, Enablement & Learning

COE, Coaching, PMO

Data Science and Machine Learning Services

MLOps Enablement, Prototyping, Model Development and Deployment

Strategy Services

Data, Analytics, and AI Strategy, Architecture and Assessments

Reporting, Analytics, and Visualization Services

Self-Service, Integrated Analytics, Dashboards, Automation

Elastic Operations

Data Platforms, Data Pipelines, and Machine Learning