Accessing Data - 🐴¶
In this notebook, we discuss how to access datasets within D3.js. There are two ways to accomplish this in our Jupyter Lab environment.
d3-fetch¶
The most straightforward way is to use d3-fetch.
JavaScript
d3.csv("/path/to/file.csv").then((data) => { })
Python (Pandas) + d3-fetch¶
The second way include d3-fetch, but processing the data first in python, saving the file, and then using d3.fetch. Typically, I will use a library called Pandas to do this, as it similar features to d3.js.
Python
import pandas as pd
data=pd.read_csv("/path/to/file.csv")
data.to_csv('newFile.csv', index = False, header=True)
JavaScript
d3.csv("/path/to/newFile.csv").then((data) => { })
Pros/Cons of Both¶
There is little difference between the two, obviously you are adding another step with Python, but you get the added benefit of being able to use all of Python’s tools before working with the file. Not to say D3.js does not have these similar capabilities. If anything, I (my preference) think D3.js are a bit better and easier to use, but this not the case for everyone. Let’s showcase both.
Acknowledgement for Data¶
Cite as: Menne, Matthew J., Imke Durre, Bryant Korzeniewski, Shelley McNeal, Kristy Thomas, Xungang Yin, Steven Anthony, Ron Ray, Russell S. Vose, Byron E.Gleason, and Tamara G. Houston (2012): Global Historical Climatology Network - Daily (GHCN-Daily), Version 3. CITY:US420020. NOAA National Climatic Data Center. doi:10.7289/V5D21VHZ 02/22/2021.
Publications citing this dataset should also cite the following article: Matthew J. Menne, Imke Durre, Russell S. Vose, Byron E. Gleason, and Tamara G. Houston, 2012: An Overview of the Global Historical Climatology Network-Daily Database. J. Atmos. Oceanic Technol., 29, 897-910. doi:10.1175/JTECH-D-11-00103.1.
Use liability: NOAA and NCEI cannot provide any warranty as to the accuracy, reliability, or completeness of furnished data. Users assume responsibility to determine the usability of these data. The user is responsible for the results of any application of this data for other than its intended purpose.
d3-fetch Implementation¶
The data we will be using for this exercise is acknowledged above, but the basic idea is this is a weather data for the State College, PA area (near Penn State University).
The data preview is:
DATE - date in YYYY-MM-DD format
DAY - the day (_D format)
MONTH - the month (_M format)
YEAR - the year (YYYY format)
PRCP - the amount of precipitation in inches
SNOW - the amount of snow in inches
TMAX - the max temperature in F°
TMIN - the min temperature in F°
Here’s a print out of the first row of data:
%%html
<p id="print1"></p>
<script type="module">
import * as d3 from "https://cdn.skypack.dev/d3@7";
d3.csv("https://gist.githubusercontent.com/dudaspm/13174849c09aba7a0716d5fa230ebe95/raw/a4866834185bad7e4a1e1b8f90da76b168eb1361/StateCollege2010-2020_min.csv")
.then((data) =>
document.getElementById("print1").innerHTML = JSON.stringify(data[0])
)
</script>
💥💥AND I STRONGLY RECOMMEND DOING THIS💥💥
with these types of calls (.then()), you can also include .catch((error) => console.log(error)). This will make sure that you notified if there is an error and what that error is.
Here is what the complete step will look like.
d3.csv("file")
.then((data) => do something with the data
.catch((error) => console.log(error))
Why? because Jupyter does a not so great job of showcasing errors. As much we try and teach ourselves to be perfect coders 🙄. We all make mistakes and too be honest, as much as I love D3.js. One small thing can cause problems. If you have not looked at the troubleshooting section of the notebook, please check it out. It will save you hours of your life in small errors.
%%html
<p id="print2"></p>
<script>
d3.csv("https://raw.githubusercontent.com/dudaspm/d3plotbook/main/weather.csv")
.then((data) =>
document.getElementById("print2").innerHTML = JSON.stringify(data[0])
)
.catch((error) => console.log(error) )
</script>
Python + Pandas + d3-fetch Implementation¶
Now this implementation adds an extra step but again, if you want to do more in Python and make the data available to D3.js afterwards. This is where you want to be!
Let’s do the same steps as above, but this time we will add Pandas.
Note
For those using this notebook in Google Colabs, Pandas is already installed. But, for those running this locally or on another system. You may need to install pandas first. You can actually do this in the notebook. Create a new cell (after this one) and run the following:
!pip install pandas
Here, I’ll help with this. The next cell will have this, but it will commented out. Just remove the #.
#!pip install pandas
We need to break this up into three steps. This will allow Jupyter Lab to keep this variable within the notebook and use it internally (instead of writing to file).
Also, and most importantly, this works in Google Colab.
Using Python + Pandas¶
Suprisingly, this is much harder than it needs to be. Specifically because I am trying to design this for Jupyter Lab and (the culprit 😈) Google Colab. Google Colab is a fantastic implementation of Jupyter Lab but it has to be locked down a bit more because well, people could do some really bad things if they didn’t. So, after months (and I mean months) of searching and trying things, this is my best implementation.
Side note, if by chance you figure out a better solution, please let me know! It would make my day 😄
Starting Pandas and Getting the File¶
The first step is enabling Pandas in Jupyter Lab and in Python we use import statements for this. Think about imports as extra libraries built using Python to increase what they can do. In this case, Pandas is a great, as they specify, “a fast, powerful, flexible and easy to use open source data analysis and manipulation tool [pdt20, WesMcKinney10].”
Note
When using an external library, like Pandas, we can rename the library (as in this case, pandas to pd). This is a common way to see Pandas being used. This means when I want to use something from this library, I will use pd instead of pandas.
import pandas as pd
data=pd.read_csv("https://raw.githubusercontent.com/dudaspm/d3plotbook/main/data/StateCollege2000-2020.csv")
Obviously, because this is csv, we use
pd.read_csv
Here is a listing of all of Pandas I/Os (input/outputs): https://pandas.pydata.org/docs/reference/io.html
We next need to create a sender and receiver model. Again, it is not as simple as handing the data to JavaScript and having it work. Instead we need to broadcast the data to the receiver using a
new BroadcastChannel('name')
You do need to run these in order but it works ¯\_(ツ)_/¯
The receiver is waiting for a message. This is where we would put our D3.js code once we send it from Pandas.
The sender code takes takes the Pandas data (in what is called a DataFrame), turns it into a CSV structure, and removes all new lines (\n) with (;;;). Why ;;;? Because that’s what I like to use as the ;;; pattern is quite a random patterns, so its unlikely the data would have ;;; in it.
Receiver (JavaScript)¶
%%html
<p id="print3"></p>
<script type="module">
import * as d3 from "https://cdn.skypack.dev/d3@7";
var receiver = new BroadcastChannel('channel');
receiver.onmessage = (msg) => {
var data = d3.csvParse(msg.data.replaceAll(";;;","\n"))
document.getElementById("print3").innerHTML = JSON.stringify(data[0])
//###################################################
////# This is where we would put our D3.js code #////
//###################################################
};
</script>
Sender (JavaScript)¶
from IPython.display import display, HTML
d = data.head().to_csv(index=False,line_terminator='\n').replace('\n',';;;')[:-3]
#############################################################
#//// This is where we send out Pandas data from Python ////#
#############################################################
js = """
<script>
var sender = new BroadcastChannel('channel');
var message = '{}'
sender.postMessage(message);
</script>""".format(d)
print(js)
display(HTML(js))
<script>
var sender = new BroadcastChannel('channel');
var message = 'DATE,DAY,MONTH,YEAR,PRCP,SNOW,TMAX,TMIN;;;2000-01-01,1,1,2000,0.0,0.0,44.0,23;;;2000-01-02,2,1,2000,0.0,0.0,52.0,23;;;2000-01-03,3,1,2000,0.01,0.0,60.0,35;;;2000-01-04,4,1,2000,0.12,0.0,62.0,54;;;2000-01-05,5,1,2000,0.04,0.0,60.0,30'
sender.postMessage(message);
</script>
Examples of Each Approach¶
Note, there are several concepts that we have not talked about yet here. So, don’t be afraid 👻
I just want to showcase how to implement both.
d3-fetch only¶
%%html
<div id="gohere1"></div>
<script type="module">
import * as d3 from "https://cdn.skypack.dev/d3@7";
d3.csv("https://raw.githubusercontent.com/dudaspm/d3plotbook/main/data/StateCollege2000-2020.csv")
.then(function(data) {
var width = 600
var height = 300
var margin = 60
var dateConverter = d3.timeParse("%Y-%m-%d")
data = data.filter(d=> (d.MONTH=="3") && (d.YEAR=="2020"))
data = data.map(d=> ({"DATE":dateConverter(d.DATE),"TMAX":+d.TMAX}))
var xScale = d3.scaleTime().range([margin , width - margin]).domain(d3.extent(data, (d,i) => d.DATE))
var yScale = d3.scaleLinear().range([height-margin , margin]).domain(d3.extent(data, (d,i) => d.TMAX))
var svg = d3.select("div#gohere1").append("svg")
.attr("width", width)
.attr("height", height)
svg.selectAll("circle")
.data(data)
.join("circle")
.attr("cx", d => xScale(d.DATE))
.attr("cy", d => yScale(d.TMAX))
.attr("r", 5)
.style("stroke", "darkgrey" )
.style("stroke-width", 1)
.style("fill", "steelblue")
})
.catch(function(error){
console.log(error)
})
</script>
BroadcastChannels¶
%%html
<div id="gohere2"></div>
<script type="module">
import * as d3 from "https://cdn.skypack.dev/d3@7";
var receiver2 = new BroadcastChannel('channel2');
receiver2.onmessage = (msg) => {
var data = d3.csvParse(msg.data.replaceAll(";;;","\n"))
var width = 600
var height = 300
var margin = 60
var dateConverter = d3.timeParse("%Y-%m-%d")
data = data.filter(d=> (d.MONTH=="3") && (d.YEAR=="2020"))
data = data.map(d=> ({"DATE":dateConverter(d.DATE),"TMAX":+d.TMAX}))
var xScale = d3.scaleTime().range([margin , width - margin]).domain(d3.extent(data, (d,i) => d.DATE))
var yScale = d3.scaleLinear().range([height-margin , margin]).domain(d3.extent(data, (d,i) => d.TMAX))
d3.select("div#gohere2").selectAll("svg").remove()
var svg = d3.select("div#gohere2").append("svg")
.attr("width", width)
.attr("height", height)
svg.selectAll("circle")
.data(data)
.join("circle")
.attr("cx", d => xScale(d.DATE))
.attr("cy", d => yScale(d.TMAX))
.attr("r", 5)
.style("stroke", "darkgrey" )
.style("stroke-width", 1)
.style("fill", "steelblue")
}
</script>
from IPython.display import display, HTML
d = data.to_csv(index=False,line_terminator='\n').replace('\n',';;;')[:-3]
js = """
<script>
var sender2 = new BroadcastChannel('channel2');
var message2 = '{}'
sender2.postMessage(message2);
</script>""".format(d)
display(HTML(js))
Your Turn¶
Remember this code? Use it to create a d3-fetch version and then a BroadcastChannels version
Here is a link to the data: https://raw.githubusercontent.com/dudaspm/d3plotbook/main/data/smallData.csv
d3-fetch¶
%%html
<div id="yourturn1"></div>
<script type="module">
import * as d3 from "https://cdn.skypack.dev/d3@7";
// USE D3-fetch TO GET THE DATA
drawingCircles = []
var width = 300
var height = 200
var svg = d3.select("div#yourturn1").append("svg")
.attr("width", width)
.attr("height", height)
// Observe
svg.selectAll("circle")
// Collect
.data(drawingCircles)
// Update
.join("circle")
.attr("cx", (d,i) => i*30)
.attr("cy", (d,i) => i*20)
.attr("r", (d,i) => d)
</script>
BroadcastChannels¶
%%html
<div id="yourturn1"></div>
<script type="module">
import * as d3 from "https://cdn.skypack.dev/d3@7";
// USE BroadcastChannels TO GET THE DATA
drawingCircles = []
var width = 300
var height = 200
var svg = d3.select("div#yourturn2").append("svg")
.attr("width", width)
.attr("height", height)
// Observe
svg.selectAll("circle")
// Collect
.data(drawingCircles)
// Update
.join("circle")
.attr("cx", (d,i) => i*30)
.attr("cy", (d,i) => i*20)
.attr("r", (d,i) => d)
</script>