🧔 - me

👩‍🔬 - volunteer, please

Data Visualizations are Cool

By Patrick Dudas

why?

🧔 - Let's look at some data.

data

	dataset	x	y
0	A	55.384600	97.179500
1	A	51.538500	96.025600
2	A	46.153800	94.487200
3	A	42.820500	91.410300
4	A	40.769200	88.333300
...	...	...	...
1841	M	33.674442	26.090490
1842	M	75.627255	37.128752
1843	M	40.610125	89.136240
1844	M	39.114366	96.481751
1845	M	34.583829	89.588902

1846 rows × 3 columns

🧔 - Wow, that's a lot of data.

🧔 - Let's see if we can figure anything out with this data.

🧔 - How many data sets are there?

len(data['dataset'].unique())

And they are...

data['dataset'].unique()

array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M'],
      dtype=object)
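A small aside on the counting step: pandas can report the number of distinct labels directly, and a per-label row count is often worth a glance as well. The sketch below uses a made-up `toy` frame (its values are invented for illustration); the same calls should apply to the talk's `data` frame.

```python
import pandas as pd

# Hypothetical stand-in for the talk's `data` frame; the values are invented.
toy = pd.DataFrame({
    "dataset": ["A", "A", "B", "B", "B", "C"],
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "y": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
})

n_sets = toy["dataset"].nunique()             # number of distinct labels
rows_per_set = toy.groupby("dataset").size()  # number of rows under each label

print(n_sets)
print(rows_per_set)
```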

🧔 - Wow, that's a lot of data sets.

🧔 - Let's calculate the mean values for the x values.

data.groupby('dataset')["x"].mean()

dataset
A    54.263273
B    54.266100
C    54.261442
D    54.269927
E    54.260150
F    54.267341
G    54.268805
H    54.260303
I    54.267320
J    54.268730
K    54.265882
L    54.267849
M    54.266916
Name: x, dtype: float64

🧔 - Hmmm... 🤔

🧔 - They are all roughly the same!

🧔 - Let's calculate the mean values for the y values.

data.groupby('dataset')["y"].mean()

dataset
A    47.832253
B    47.834721
C    47.830252
D    47.836988
E    47.839717
F    47.839545
G    47.835450
H    47.839829
I    47.837717
J    47.830823
K    47.831496
L    47.835896
M    47.831602
Name: y, dtype: float64

🧔 - Hmmm... 🤔

🧔 - They are all roughly the same!

🧔 - Let's calculate the mean and sd values for both the x and y values.

import pandas as pd

# Per-data-set mean and sample standard deviation of x and y.
grouped = data.groupby('dataset')
d = {}
d['mean x'] = grouped["x"].mean().tolist()
d['mean y'] = grouped["y"].mean().tolist()
d['sd x'] = grouped["x"].std().tolist()
d['sd y'] = grouped["y"].std().tolist()
# Rows 0..12 follow the sorted data set labels A..M.
df = pd.DataFrame(data=d)
df.index.name = "data sets"
df

	mean x	mean y	sd x	sd y
data sets
0	54.263273	47.832253	16.765142	26.935403
1	54.266100	47.834721	16.769825	26.939743
2	54.261442	47.830252	16.765898	26.939876
3	54.269927	47.836988	16.769959	26.937684
4	54.260150	47.839717	16.769958	26.930002
5	54.267341	47.839545	16.768959	26.930275
6	54.268805	47.835450	16.766704	26.939998
7	54.260303	47.839829	16.767735	26.930192
8	54.267320	47.837717	16.760013	26.930036
9	54.268730	47.830823	16.769239	26.935727
10	54.265882	47.831496	16.768853	26.938608
11	54.267849	47.835896	16.766759	26.936105
12	54.266916	47.831602	16.770000	26.937902
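The table above is indexed 0 to 12, so the data set labels are lost. One possible alternative, sketched here on an invented `toy` frame rather than the talk's data, is to let `groupby(...).agg(...)` build the summary so that the labels stay in the index. Note that pandas' `std` is the sample standard deviation (`ddof=1`).

```python
import pandas as pd

# Hypothetical stand-in for the talk's `data` frame; the values are invented.
toy = pd.DataFrame({
    "dataset": ["A", "A", "A", "B", "B", "B"],
    "x": [1.0, 2.0, 3.0, 0.0, 2.0, 4.0],
    "y": [5.0, 5.0, 8.0, 4.0, 6.0, 8.0],
})

# One row per data set, one column per (variable, statistic) pair.
summary = toy.groupby("dataset")[["x", "y"]].agg(["mean", "std"])

# Flatten the two-level column names into "mean x", "sd x", ...
summary.columns = [f"{stat} {col}".replace("std", "sd") for col, stat in summary.columns]

print(summary)
```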

🧔 - 😲 They are all roughly the same! 😲

🧔 - In conclusion, these data sets must all be the same!

🧔 - ✨ Thanks for coming to my TED talk! ✨

🧔 - Have a great day! 😃

👩‍🔬 - Wait...

🧔 - yes?

👩‍🔬 - have you looked at the data?

🧔 - why?

👩‍🔬 - sometimes statistics don't tell the whole story. Convert the data into another format and see if these similarities still hold.
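To make the point concrete, here is a minimal sketch showing that two very differently shaped samples can share the same mean and standard deviation. The helper `match_mean_sd` and both samples are hypothetical; the targets 54.26 and 16.77 are simply the rounded x statistics from the table above.

```python
import numpy as np

def match_mean_sd(values, mean, sd):
    """Shift and scale `values` so they have the given mean and sample sd."""
    v = np.asarray(values, dtype=float)
    z = (v - v.mean()) / v.std(ddof=1)  # standardize first
    return z * sd + mean

rng = np.random.default_rng(0)
cloud = match_mean_sd(rng.normal(size=200), 54.26, 16.77)              # bell-shaped
two_clumps = match_mean_sd(np.repeat([-1.0, 1.0], 100), 54.26, 16.77)  # only two distinct values

for name, v in [("cloud", cloud), ("two_clumps", two_clumps)]:
    print(name, round(v.mean(), 2), round(v.std(ddof=1), 2), len(np.unique(v)))
```

Both samples report the same two summary numbers, yet one is a smooth cloud and the other takes only two values, which is exactly the kind of difference a mean and an sd cannot reveal.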

🧔 - fine...

👩‍🔬 - Let's try sound. This is called 💥sonification💥

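import numpy as np
import IPython.display as ipd

sr = 22050  # sample rate in Hz
T = .1      # length of each tone in seconds
t = np.linspace(0, T, int(T*sr), endpoint=False)  # time axis for one tone
# 100 randomly sampled x values from data set A, scaled by 10 and used as frequencies in Hz
x = data[data['dataset'] == "A"]["x"].multiply(10).sample(n=100, random_state=1).tolist()
e = []
for a in x:
    e.append(0.5*np.sin(2*np.pi*a*t))  # one sine tone per value
sound1 = np.array(e)
ipd.Audio(sound1.flatten(), rate=sr)  # play the tones back to back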

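sr = 22050  # sample rate in Hz
T = .1      # length of each tone in seconds
t = np.linspace(0, T, int(T*sr), endpoint=False)  # time axis for one tone
# Same recipe for data set B, with the same random_state so the same row positions are drawn
x = data[data['dataset'] == "B"]["x"].multiply(10).sample(n=100, random_state=1).tolist()
e = []
for a in x:
    e.append(0.5*np.sin(2*np.pi*a*t))  # one sine tone per value
sound2 = np.array(e)
ipd.Audio(sound2.flatten(), rate=sr)  # play the tones back to back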
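The two cells above repeat the same recipe, so it could be wrapped in a small helper. This is only a sketch: the function name `sonify` and its parameters are hypothetical, and the three frequencies are arbitrary test values rather than values from the talk's data.

```python
import numpy as np

def sonify(values, sr=22050, note_seconds=0.1, amplitude=0.5):
    """Turn each value into a short sine tone (value = frequency in Hz) and join the tones."""
    t = np.linspace(0, note_seconds, int(note_seconds * sr), endpoint=False)
    tones = [amplitude * np.sin(2 * np.pi * f * t) for f in values]
    return np.concatenate(tones)

# Three arbitrary frequencies, one tone each.
wave = sonify([440.0, 550.0, 660.0])
print(wave.shape)
```

In a notebook, the result could then be played with `ipd.Audio(wave, rate=22050)`, as in the cells above.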

🧔 - WOW! This sound is banging. I am going to sample this for my rave band.

👩‍🔬 - you're not in a band

😎 - yes I am

👩‍🔬 - ...

🧔 - ...

🧔 - WOW! You can hear a difference! I wish I could see the sound!

👩‍🔬 - You can! Remember, visualizations are just representations of signals. See...

fig

fig
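The code behind the two figures is not shown here, so the following is only one plausible way to "see" a sound: plot the first few milliseconds of each waveform. It assumes matplotlib is available, and the two tones (440 Hz and 660 Hz) are arbitrary stand-ins for the sonified data.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # draw off-screen; this line can be dropped inside a notebook
import matplotlib.pyplot as plt

sr = 22050
t = np.linspace(0, 0.1, int(0.1 * sr), endpoint=False)
tone_a = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # stand-in for the first sound
tone_b = 0.5 * np.sin(2 * np.pi * 660.0 * t)  # stand-in for the second sound

# One panel per sound, sharing the time axis so the panels are comparable.
fig, axes = plt.subplots(2, 1, sharex=True, figsize=(8, 4))
for ax, tone, label in zip(axes, [tone_a, tone_b], ["440 Hz", "660 Hz"]):
    ax.plot(t[:400], tone[:400])  # first 400 samples only, so the cycles are visible
    ax.set_ylabel(label)
axes[-1].set_xlabel("time (s)")
fig.canvas.draw()
```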

🧔 - ooh, that's pretty...

👩‍🔬 - This is one of the big "gotchas" of data visualizations. It is part data and part art.

🧔 - why is this a "gotcha"?

👩‍🔬 - because art is subjective. It is a bit of a balancing act between "What looks cool!" and "What looks precise."

🧔 - so how do I build one of these visualatrons?

👩‍🔬 - visualizations. Well, there are two factors to consider when building visualatrons, I mean visualizations. They are the spatial (or planar) encoding and the visual (or retinal) encoding. For now, we will call them spatial and visual encodings. Let's start with the visual encodings.
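As a rough illustration of the two kinds of encoding, the sketch below draws the same four values twice: once encoded spatially (value mapped to vertical position) and once encoded visually (value mapped to color, with position carrying no information). It assumes matplotlib is available; the values and variable names are invented.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # draw off-screen; this line can be dropped inside a notebook
import matplotlib.pyplot as plt

values = np.array([3.0, 8.0, 5.0, 1.0])  # invented example values
slots = np.arange(len(values))           # one horizontal slot per value

fig, (ax_pos, ax_col) = plt.subplots(1, 2, figsize=(8, 3))

# Spatial (planar) encoding: the value decides where the mark sits.
ax_pos.scatter(slots, values)
ax_pos.set_title("value -> position")

# Visual (retinal) encoding: every mark sits on the same line, the value decides its color.
marks = ax_col.scatter(slots, np.zeros(len(values)), c=values, cmap="viridis", s=200)
ax_col.set_title("value -> color")
fig.colorbar(marks, ax=ax_col)
fig.canvas.draw()
```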

🧔 - ohhh

🧔 - ahhh

🧔 - whhhat?

👩‍🔬 - When you translate signals to symbols, we call these symbols glyphs, and we call the process of connecting glyphs with data 💥semiotics💥

🧔 - Can you encode multiple things with objects?

👩‍🔬 - Yep, but we need to ask, are the data integral or separable? Let's look at an example.

👩‍🔬 - Let's map our x and y values two different ways and see if we can pick how best to map our values.
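The original figures for this step are not reproduced, so here is a hedged sketch of what "two different ways" might look like. Pairs such as a shape's width and height are usually described as integral (they are read together as one overall shape), while size and color are usually described as more separable. The data values, the scaling factors, and the choice of ellipses are all assumptions, and matplotlib is assumed to be available.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # draw off-screen; this line can be dropped inside a notebook
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

xs = np.array([20.0, 40.0, 60.0, 80.0])  # invented x values
ys = np.array([90.0, 30.0, 70.0, 10.0])  # invented y values
slots = np.arange(len(xs))

fig, (ax_sep, ax_int) = plt.subplots(1, 2, figsize=(8, 3))

# Mapping 1 (more separable): x -> marker size, y -> marker color.
ax_sep.scatter(slots, np.zeros(len(xs)), s=xs * 5, c=ys, cmap="viridis")
ax_sep.set_title("x -> size, y -> color")

# Mapping 2 (more integral): x -> ellipse width, y -> ellipse height.
for i, (w, h) in enumerate(zip(xs, ys)):
    ax_int.add_patch(Ellipse((i, 0), width=w / 100, height=h / 100))
ax_int.set_xlim(-1, len(xs))
ax_int.set_ylim(-1, 1)
ax_int.set_title("x -> width, y -> height")
fig.canvas.draw()
```

In the second panel it tends to be harder to read x and y individually, because the eye picks up the overall shape of each ellipse first.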

🧔 - I think I am starting to get it. I have an idea! Let's check out all the datasets based on these visual encodings.

🧔 - There is definitely something going on with these datasets. But I am still not convinced that some are the same.

👩‍🔬 - Good call, maybe we should discuss the last component of visualizations...

🧔 - tacos?

👩‍🔬 - I thought we were finally having a breakthrough. No, the spatial component. This is the most important component of them all. It's like the old saying...

🧔 - you are what you eat?

👩‍🔬 - You're hungry, aren't you.

🧔 - yes.

👩‍🔬 - Well, the saying is... location, location, location.

👩‍🔬 - The spatial component is all about using space to encode data. Let's take a look at two plots.
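A minimal sketch of the spatial idea: two invented point sets are rescaled so that their per-axis mean and sd match, then drawn side by side as scatter plots. The helper `standardize`, the ring and blob shapes, and the panel titles are hypothetical placeholders, and matplotlib is assumed to be available.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # draw off-screen; this line can be dropped inside a notebook
import matplotlib.pyplot as plt

def standardize(points):
    """Rescale each column to mean 0 and sample sd 1."""
    p = np.asarray(points, dtype=float)
    return (p - p.mean(axis=0)) / p.std(axis=0, ddof=1)

# Two invented shapes: points on a ring and a random blob.
theta = np.linspace(0, 2 * np.pi, 120, endpoint=False)
ring = standardize(np.column_stack([np.cos(theta), np.sin(theta)]))
rng = np.random.default_rng(1)
blob = standardize(rng.normal(size=(120, 2)))

# Same summary statistics, very different pictures once position encodes the data.
fig, axes = plt.subplots(1, 2, sharex=True, sharey=True, figsize=(8, 4))
for ax, pts, title in zip(axes, [ring, blob], ["Data Set 1", "Data Set 2"]):
    ax.scatter(pts[:, 0], pts[:, 1], s=10)
    ax.set_title(title)
fig.canvas.draw()
```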

🧔 - wait, what's that forbidden one?

👩‍🔬 - what forbidden one?

🧔 - the one that says "forbidden!"

👩‍🔬 - oh, we can't touch that one.

🧔 - hmmm... let's see if I can fix this.


👩‍🔬 - what are you doing?? Don't hack the webpage! NOOOOOO!!

🧔 - and there we go.

🧔 - AHHH!!! 👩‍🔬 - AHHH!!!

🧔 - aww... he's cute...

👩‍🔬 - kill it with fire!

🧔 - NO! I will bring him to life!

👩‍🔬 - there is no way... it's just data!

🧔 - Now it is my turn to show the power of visualatrons!

🧔 - NOW I BRING HIM TO LIFE!

👩‍🔬 - now that is cool! Where can I learn more about this amazing technology?

🧔 - well at https://immersive.psu.edu/ of course!

👩‍🔬 - well, it's about time we end this conversation, don't you think?

🧔 - it looks like visualizations are the answer to all data related problems!

👩‍🔬 - Well, there is a dark side to visualizations...

To be continued.....

Right now.