🧔 - me

👩‍🔬 - volunteer, please

Data Visualizations are Cool

By Patrick Dudas

why?

🧔 - Let's look at some data.

data
     dataset          x          y
0          A  55.384600  97.179500
1          A  51.538500  96.025600
2          A  46.153800  94.487200
3          A  42.820500  91.410300
4          A  40.769200  88.333300
...      ...        ...        ...
1841       M  33.674442  26.090490
1842       M  75.627255  37.128752
1843       M  40.610125  89.136240
1844       M  39.114366  96.481751
1845       M  34.583829  89.588902

1846 rows × 3 columns

🧔 - Wow, that's a lot of data.

🧔 - Let's see if we can figure anything out with this data.

🧔 - How many data sets are there?

len(data['dataset'].unique())
13

And they are...

data['dataset'].unique()
array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M'],
      dtype=object)

🧔 - Wow, that's a lot of data sets.

🧔 - Let's calculate the mean values for the x values.

data.groupby('dataset')["x"].mean()
dataset
A    54.263273
B    54.266100
C    54.261442
D    54.269927
E    54.260150
F    54.267341
G    54.268805
H    54.260303
I    54.267320
J    54.268730
K    54.265882
L    54.267849
M    54.266916
Name: x, dtype: float64

🧔 - Hmmm... 🤔

🧔 - They are all roughly the same!

🧔 - Let's calculate the mean values for the y values.

data.groupby('dataset')["y"].mean()
dataset
A    47.832253
B    47.834721
C    47.830252
D    47.836988
E    47.839717
F    47.839545
G    47.835450
H    47.839829
I    47.837717
J    47.830823
K    47.831496
L    47.835896
M    47.831602
Name: y, dtype: float64

🧔 - Hmmm... 🤔

🧔 - They are all roughly the same!

🧔 - Let's calculate the mean and sd values for both the x and y values.

d = {}
d['mean x'] = data.groupby('dataset')["x"].mean().tolist()
d['mean y'] = data.groupby('dataset')["y"].mean().tolist()
d['sd x'] = data.groupby('dataset')["x"].std().tolist()
d['sd y'] = data.groupby('dataset')["y"].std().tolist()
df = pd.DataFrame(data=d)
df.index.name = "data sets"
df
              mean x     mean y       sd x       sd y
data sets
0          54.263273  47.832253  16.765142  26.935403
1          54.266100  47.834721  16.769825  26.939743
2          54.261442  47.830252  16.765898  26.939876
3          54.269927  47.836988  16.769959  26.937684
4          54.260150  47.839717  16.769958  26.930002
5          54.267341  47.839545  16.768959  26.930275
6          54.268805  47.835450  16.766704  26.939998
7          54.260303  47.839829  16.767735  26.930192
8          54.267320  47.837717  16.760013  26.930036
9          54.268730  47.830823  16.769239  26.935727
10         54.265882  47.831496  16.768853  26.938608
11         54.267849  47.835896  16.766759  26.936105
12         54.266916  47.831602  16.770000  26.937902

🧔 - 😲 They are all roughly the same! 😲
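
As an aside, the four separate `groupby` calls above can be collapsed into a single `agg` call. This is only a minimal sketch: `toy` is a hypothetical stand-in for `data`, filled with random numbers, so its values will not match the table above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `data` frame used above: two small data sets.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "dataset": ["A"] * 50 + ["B"] * 50,
    "x": rng.normal(54, 17, size=100),
    "y": rng.normal(48, 27, size=100),
})

# One groupby call computes the mean and sample standard deviation of x and y.
summary = toy.groupby("dataset")[["x", "y"]].agg(["mean", "std"])

# Flatten the two-level column index into labels such as "mean x" and "sd y".
summary.columns = [
    f"{'sd' if stat == 'std' else stat} {col}" for col, stat in summary.columns
]
print(summary.round(3))
```

Unlike the `tolist()` version, this keeps the data set letters as the row index instead of 0 to 12.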

🧔 - In conclusion, these data sets must all be the same!

🧔 - ✨ Thanks for coming to my TED talk! ✨

🧔 - Have a great day! 😃

👩‍🔬 - Wait...

🧔 - yes?

👩‍🔬 - have you looked at the data?

🧔 - why?

🧔 - fine...

👩‍🔬 - Let's try sound. This is called 💥sonification💥.

import numpy as np
import IPython.display as ipd

sr = 22050  # sample rate in Hz
T = .1      # length of each tone in seconds
t = np.linspace(0, T, int(T*sr), endpoint=False)  # time axis for one tone
# Sample 100 x values from data set A and scale them by 10 to use as frequencies in Hz
x = data[data['dataset'] == "A"]["x"].multiply(10).sample(n=100, random_state=1).tolist()
e = []
for a in x:
    e.append(0.5*np.sin(2*np.pi*a*t))  # one sine tone per data point
sound1 = np.array(e)
ipd.Audio(sound1.flatten(), rate=sr)  # play the tones back to back

# Same settings (sr, T, t), now for data set B
x = data[data['dataset'] == "B"]["x"].multiply(10).sample(n=100, random_state=1).tolist()
e = []
for a in x:
    e.append(0.5*np.sin(2*np.pi*a*t))
sound2 = np.array(e)
ipd.Audio(sound2.flatten(), rate=sr)
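
The two cells above differ only in the data set they read, so the tone-building step could be pulled into a helper. A minimal sketch, assuming the same sample rate and tone length; the name `sonify` and its parameters are hypothetical and not part of the original notebook.

```python
import numpy as np

def sonify(values, sr=22050, tone_seconds=0.1, amplitude=0.5):
    """Turn each value into a short sine tone and join the tones end to end."""
    t = np.linspace(0, tone_seconds, int(tone_seconds * sr), endpoint=False)
    freqs = np.asarray(values, dtype=float)
    # Broadcasting builds one row of samples per frequency.
    tones = amplitude * np.sin(2 * np.pi * freqs[:, None] * t[None, :])
    return tones.ravel()

# Hypothetical input: three frequencies in Hz instead of sampled x values.
wave = sonify([440.0, 550.0, 660.0])
print(wave.shape)
```

In a notebook, the result could be played with `ipd.Audio(sonify(x), rate=22050)`, where `x` is the list of scaled values from the cells above.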

🧔 - WOW! This sound is banging. I am going to sample this for my rave band.

👩‍🔬 - you're not in a band

😎 - yes I am

👩‍🔬 - ...

🧔 - ...

🧔 - WOW! You can hear a difference! I wish I could see the sound!

👩‍🔬 - You can! Remember, visualizations are just representations of signals. See...

fig
fig
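
The cells that built `fig` are not shown here. As a minimal sketch of how a sound can be drawn, assuming matplotlib and two made-up tones standing in for rows of `sound1` and `sound2`, a waveform plot might look like this:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

sr = 22050
t = np.linspace(0, 0.1, int(0.1 * sr), endpoint=False)
# Hypothetical tones standing in for one row each of sound1 and sound2.
tone_a = 0.5 * np.sin(2 * np.pi * 440.0 * t)
tone_b = 0.5 * np.sin(2 * np.pi * 660.0 * t)

fig, axes = plt.subplots(2, 1, sharex=True, figsize=(8, 4))
for ax, tone, label in zip(axes, (tone_a, tone_b), ("tone A", "tone B")):
    ax.plot(t[:400], tone[:400])  # the first 400 samples keep the cycles visible
    ax.set_ylabel(label)
axes[-1].set_xlabel("time (s)")
fig.tight_layout()
```

The same signal that was mapped to pitch is now mapped to position on a page, which is all a waveform plot is.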

🧔 - ooh, that's pretty...

👩‍🔬 - This is one of the big "gotchas" of data visualizations. It is part data and part art.

🧔 - why is this a "gotcha"?

👩‍🔬 - because art is subjective. It is a bit of a balancing act between "What looks cool!" and "What looks precise."

🧔 - so how do I build one of these visualatrons?

👩‍🔬 - visualizations. Well, there are two factors to consider when building visualatrons, I mean visualizations: the spatial (or planar) encoding and the visual (or retinal) encoding. For now, we will call them spatial and visual encodings. Let's start with the visual encodings.

🧔 - ohhh

🧔 - ahhh

🧔 - whhhat?

👩‍🔬 - When you translate signals to symbols, we call these symbols glyphs, and we call the process of connecting glyphs with data 💥semiotics💥.

🧔 - Can you encode multiple things with objects?

👩‍🔬 - Yep, but we need to ask, are the data integral or separable? Let's look at an example.

👩‍🔬 - Let's map our x and y values two different ways and see if we can pick how best to map our values.
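
The original figures for this step are not shown here. As a rough sketch, assuming matplotlib, one way to map the same x and y values twice is with a pair of channels commonly cited as separable (marker size and color) and a pair commonly cited as integral (ellipse width and height). The values and the choice of channels are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line inside a notebook
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
import numpy as np

# Hypothetical values standing in for a handful of (x, y) rows.
rng = np.random.default_rng(1)
x = rng.uniform(20, 90, size=8)
y = rng.uniform(20, 90, size=8)
slots = np.arange(len(x))  # position is used only to keep the glyphs apart

fig, (ax_sep, ax_int) = plt.subplots(1, 2, figsize=(9, 3))

# Separable pair: x drives marker size, y drives color.
ax_sep.scatter(slots, np.zeros_like(slots), s=x * 5, c=y, cmap="viridis")
ax_sep.set_title("size = x, color = y")

# Integral pair: x drives ellipse width, y drives ellipse height.
for slot, w, h in zip(slots, x, y):
    ax_int.add_patch(Ellipse((slot, 0), width=w / 100, height=h / 100, fill=False))
ax_int.set_xlim(-1, len(x))
ax_int.set_ylim(-1, 1)
ax_int.set_title("width = x, height = y")
```

With the separable pair, size and color can each be read on its own. With the integral pair, width and height tend to blend into a single impression of shape.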

🧔 - I think I am starting to get it. I have an idea! Let's check out all the datasets based on these visual encodings.

🧔 - There is definitely something going on with these datasets. But I am still not convinced that some are the same.

👩‍🔬 - Good call, maybe we should discuss the last component of visualizations...

🧔 - tacos?

👩‍🔬 - I thought we were finally having a breakthrough. No, the spatial component. This is the most important component of them all. It's like the old saying...

🧔 - you are what you eat?

👩‍🔬 - You're hungry, aren't you?

🧔 - yes.

👩‍🔬 - Well, the saying is... location, location, location.

👩‍🔬 - The spatial component is all about using space to code data. Let's take a look at two plots.
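
The interactive plots themselves are not reproduced in this text. As a minimal sketch of spatial encoding, assuming matplotlib, each row's x and y can simply be used as a position on the page, with one panel per data set. `toy` is a hypothetical stand-in for `data`, holding a circle and a line built to share the same mean x and mean y.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for `data`: a circle and a diagonal line.
theta = np.linspace(0, 2 * np.pi, 60, endpoint=False)
toy = pd.concat([
    pd.DataFrame({"dataset": "circle",
                  "x": 54 + 20 * np.cos(theta),
                  "y": 48 + 20 * np.sin(theta)}),
    pd.DataFrame({"dataset": "line",
                  "x": np.linspace(30, 78, 60),
                  "y": np.linspace(24, 72, 60)}),
], ignore_index=True)

# One scatter panel per data set, with shared axes so shapes are comparable.
groups = list(toy.groupby("dataset"))
fig, axes = plt.subplots(1, len(groups), sharex=True, sharey=True, figsize=(8, 4))
for ax, (name, group) in zip(axes, groups):
    ax.scatter(group["x"], group["y"], s=10)
    ax.set_title(name)
    ax.set_xlabel("x")
axes[0].set_ylabel("y")
```

The two panels have matching means but look nothing alike, which is the kind of difference position can reveal and a table of means cannot.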

Data Set 1

Forbidden B C D E F G H I J K L M

Data Set 2

Forbidden B C D E F G H I J K L M

🧔 - wait, what's that forbidden one?

👩‍🔬 - what forbidden one?

🧔 - the one that says "forbidden!"

👩‍🔬 - oh, we can't touch that one.

🧔 - hmmm... let's see if I can fix this.

ICDS Image

👩‍🔬 - what are you doing?? Don't hack the webpage! NOOOOOO!!

🧔 - and there we go.

Data Set 1

Forbidden B C D E F G H I J K L M

Data Set 2

Forbidden B C D E F G H I J K L M

🧔 - AHHH!!!

👩‍🔬 - AHHH!!!

🧔 - aww... he's cute...

👩‍🔬 - kill it with fire!

🧔 - NO! I will bring him to life!

👩‍🔬 - there is no way... it's just data!

🧔 - Now it is my turn to show the power of visualatrons!

🧔 - NOW I BRING HIM TO LIFE!

👩‍🔬 - now that is cool! Where can I learn more about this amazing technology?

🧔 - well at https://immersive.psu.edu/ of course!

👩‍🔬 - well, it's about time we end this conversation, don't you think?

👩‍🔬 - Well, there is a dark side to visualizations...

To be continued.....

Right now.