Bar Charts¶
Data¶
import pandas as pd
url = "https://gist.githubusercontent.com/dudaspm/e518430a731ac11f52de9217311c674d/raw/4c2f2bd6639582a420ef321493188deebc4a575e/StateCollege2000-2020.csv"
data = pd.read_csv(url)
data = data.fillna(0)  # replace all NAs with 0s
# save a local copy so the D3 cells below can load it with d3.csv('weather.csv')
data.to_csv('weather.csv', index=False, header=True)
data.head()
|   | DATE | DAY | MONTH | YEAR | PRCP | SNOW | TMAX | TMIN | WT_FOG | WT_THUNDER | WT_SLEET | WT_HAIL | WT_GLAZE | WT_HIGHWINDS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1/1/2000 | 1 | 1 | 2000 | 0.00 | 0.0 | 44.0 | 23 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 1/2/2000 | 2 | 1 | 2000 | 0.00 | 0.0 | 52.0 | 23 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 1/3/2000 | 3 | 1 | 2000 | 0.01 | 0.0 | 60.0 | 35 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 1/4/2000 | 4 | 1 | 2000 | 0.12 | 0.0 | 62.0 | 54 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 1/5/2000 | 5 | 1 | 2000 | 0.04 | 0.0 | 60.0 | 30 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
Acknowledgement¶
Cite as: Menne, Matthew J., Imke Durre, Bryant Korzeniewski, Shelley McNeal, Kristy Thomas, Xungang Yin, Steven Anthony, Ron Ray, Russell S. Vose, Byron E. Gleason, and Tamara G. Houston (2012): Global Historical Climatology Network - Daily (GHCN-Daily), Version 3. CITY:US420020. NOAA National Climatic Data Center. doi:10.7289/V5D21VHZ 02/22/2021.
Publications citing this dataset should also cite the following article: Matthew J. Menne, Imke Durre, Russell S. Vose, Byron E. Gleason, and Tamara G. Houston, 2012: An Overview of the Global Historical Climatology Network-Daily Database. J. Atmos. Oceanic Technol., 29, 897-910. doi:10.1175/JTECH-D-11-00103.1.
Use liability: NOAA and NCEI cannot provide any warranty as to the accuracy, reliability, or completeness of furnished data. Users assume responsibility to determine the usability of these data. The user is responsible for the results of any application of this data for other than its intended purpose.
Links: https://data.noaa.gov/onestop/
https://www.ncdc.noaa.gov/cdo-web/search
Bostock, M., Ogievetsky, V., & Heer, J. (2011). D³ data-driven documents. IEEE transactions on visualization and computer graphics, 17(12), 2301-2309.
from IPython.display import HTML, Javascript, display
def configure_d3():
    display(Javascript("""
    require.config({
      paths: {
        d3: "https://d3js.org/d3.v6.min"
      }
    })"""))
configure_d3()
Group¶
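%%html
<script type="text/javascript">
require(['d3'], function (d3) {
  d3.csv('weather.csv')
    .then(function(data) {
      // parse "m/d/yyyy" strings into Date objects and PRCP into numbers
      const dateConverter = d3.timeParse("%_m/%_d/%Y")
      const daysOfTheWeek = d3.timeFormat("%a")
      data = data.map(d => ({"DATE": dateConverter(d.DATE), "PRCP": +d.PRCP}))
      // group the rows by abbreviated weekday name (Sun, Mon, ...)
      console.log(d3.group(data, d => daysOfTheWeek(d.DATE)))
    })
    .catch(function(error) {
      console.error(error) // report load or parse failures instead of ignoring them
    })
})
</script>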
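The cell above only logs to the browser console, so here is a minimal plain-JavaScript sketch of the same idea for readers who want to see the shape of the result. It does not use D3: `groupBy` and `weekdayOf` are hypothetical helpers written for this illustration, the first three sample rows are copied from the table above, and the fourth row is made up so that one weekday has two entries. `d3.group` returns a `Map` keyed by whatever the accessor returns, which is what this sketch imitates.

```javascript
// Hypothetical stand-in for d3.group: collect rows into a Map keyed by keyFn(row).
function groupBy(rows, keyFn) {
  const groups = new Map();
  for (const row of rows) {
    const key = keyFn(row);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  return groups;
}

// Hypothetical helper: turn an "m/d/yyyy" string into a short weekday name.
const DAY_NAMES = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
function weekdayOf(dateString) {
  const [month, day, year] = dateString.split("/").map(Number);
  return DAY_NAMES[new Date(year, month - 1, day).getDay()];
}

const sampleRows = [
  { DATE: "1/1/2000", PRCP: 0.0 },
  { DATE: "1/2/2000", PRCP: 0.0 },
  { DATE: "1/3/2000", PRCP: 0.01 },
  { DATE: "1/8/2000", PRCP: 0.2 }, // made-up value, only here to share a weekday with 1/1/2000
];

const byDay = groupBy(sampleRows, row => weekdayOf(row.DATE));
for (const [day, rows] of byDay) {
  console.log(day, rows.length); // one line per weekday, in order of first appearance
}
```

The keys come out in the order they are first seen in the data, which is why the weekdays in the later chart start with the weekday of the first row rather than with Sunday or Monday.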
Rollup¶
%%html
<script type="text/javascript">
require(['d3'], function (d3) {
  d3.csv('weather.csv')
    .then(function(data) {
      const dateConverter = d3.timeParse("%_m/%_d/%Y")
      const daysOfTheWeek = d3.timeFormat("%a")
      data = data.map(d => ({"DATE": dateConverter(d.DATE), "PRCP": +d.PRCP}))
      // group by weekday, then reduce each group to its mean precipitation
      console.log(d3.rollup(data, v => d3.mean(v, d => d.PRCP), k => daysOfTheWeek(k.DATE)))
    })
    .catch(function(error) {
      console.error(error) // report load or parse failures instead of ignoring them
    })
})
</script>
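Where group keeps every row, rollup reduces each group to a single value. The following plain-JavaScript sketch shows that reduction for a mean. It is not D3's implementation: `rollupMean` is a hypothetical helper, the rows already carry a `day` field for brevity, and all precipitation values are made up.

```javascript
// Hypothetical stand-in for d3.rollup(rows, v => d3.mean(v, valueFn), keyFn).
function rollupMean(rows, valueFn, keyFn) {
  const sums = new Map(); // key -> { total, count }
  for (const row of rows) {
    const key = keyFn(row);
    const entry = sums.get(key) || { total: 0, count: 0 };
    entry.total += valueFn(row);
    entry.count += 1;
    sums.set(key, entry);
  }
  const means = new Map();
  for (const [key, { total, count }] of sums) {
    means.set(key, total / count);
  }
  return means;
}

// Made-up rows, chosen so the means are easy to check by hand.
const rainRows = [
  { day: "Sat", PRCP: 0.0 },
  { day: "Sun", PRCP: 0.0 },
  { day: "Sat", PRCP: 0.5 },
  { day: "Sun", PRCP: 0.25 },
];

const avgByDay = rollupMean(rainRows, d => d.PRCP, d => d.day);
console.log(avgByDay.get("Sat")); // 0.25
console.log(avgByDay.get("Sun")); // 0.125
```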
ScaleBand¶
%%html
<script type="text/javascript">
require(['d3'], function (d3) {
  const someData = [0,4,14,20,30,31,42,50,59,62]
  // first attempt: d3.extent returns only [min, max], so the domain has just two entries
  const tryingScaleBands = d3.scaleBand().range([0,100]).domain(d3.extent(someData))
  d3.select("div#graph1").text(someData.map(d=>tryingScaleBands(d)))
})
</script>
<div id="graph1"></div>
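One possible reading of why the first attempt above prints mostly empty slots: `d3.extent` returns only the minimum and maximum, so the band scale knows two domain values and, as far as I understand it, returns `undefined` for everything else. The sketch below imitates that lookup with a plain `Map`. `extentOf` and `bandLookup` are hypothetical stand-ins, not D3 functions, and the sketch assumes no padding.

```javascript
const values = [0, 4, 14, 20, 30, 31, 42, 50, 59, 62];

// Hypothetical stand-in for d3.extent: returns [min, max].
const extentOf = list => [Math.min(...list), Math.max(...list)];

// Hypothetical stand-in for a band scale without padding: each domain value
// maps to the start of its band, anything outside the domain is undefined.
function bandLookup(domain, [r0, r1]) {
  const step = (r1 - r0) / domain.length;
  const starts = new Map(domain.map((value, i) => [value, r0 + i * step]));
  return value => starts.get(value);
}

const twoBands = bandLookup(extentOf(values), [0, 100]);
const positions = values.map(value => twoBands(value));

// Only the first and last entries are numbers; the rest are undefined,
// which turn into empty slots when the array is converted to text.
console.log(String(positions));
```

With only two domain entries, each band takes up half of the range, so the two known values land at 0 and 50.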
%%html
<script type="text/javascript">
require(['d3'], function (d3) {
  const someData = [0,4,14,20,30,31,42,50,59,62]
  // second attempt: use every value as its own domain entry
  const tryingScaleBands = d3.scaleBand().range([0,100]).domain(someData.map(d=>d))
  d3.select("div#graph2").text("list length: "+(someData.length)+" scaleBand output: "+someData.map(d=>tryingScaleBands(d)))
})
</script>
<div id="graph2"></div>
This makes a bit more sense. As you can see, scaleBand() spaces all of our data evenly: it takes the maximum range value (100) minus the minimum range value (0), then divides this by the size of the list (the number of values). Or…
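The same arithmetic can be written out in plain JavaScript. This is only a sketch of the calculation described above, assuming no padding; the variable names are made up for the illustration and nothing here calls D3.

```javascript
const listValues = [0, 4, 14, 20, 30, 31, 42, 50, 59, 62];
const rangeMin = 0;
const rangeMax = 100;

// Width of each slot: (maximum range - minimum range) / number of values.
const slotWidth = (rangeMax - rangeMin) / listValues.length;

// Each value is placed by its position in the list, not by its magnitude.
const slotStarts = listValues.map((_, i) => rangeMin + i * slotWidth);

console.log(slotWidth);  // 10
console.log(slotStarts); // [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
```

Note that 4 and 14 end up exactly as far apart as 59 and 62: a band scale treats its domain as a list of categories, not as numbers on a line.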
scaleBand() has a couple of neat features that help with bar chart design. Two in particular are scaleBand().bandwidth and scaleBand().padding.
scaleBand().bandwidth - returns the width of each band. With no padding, this is also the distance from the start of one band to the start of the next. It is perfect for our bar charts, because we will be using rectangles for our bars.
scaleBand().padding - sets the padding between bars (and at both ends of the range) as a fraction between 0 and 1.
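To make the effect of padding concrete, here is a sketch of the arithmetic as I understand D3's band scale to work with its default centered alignment and no rounding. `bandLayout` is a hypothetical helper, not part of D3, and it assumes that `padding` applies the same fraction to the inner and outer padding.

```javascript
// Hypothetical helper mirroring the band-scale arithmetic for a given padding.
function bandLayout(count, [r0, r1], padding) {
  // Distance from the start of one band to the start of the next.
  const step = (r1 - r0) / (count + padding);
  // Width of each band: the step minus the gap between bands.
  const bandwidth = step * (1 - padding);
  // Leftover space is split evenly between the two ends of the range.
  const offset = (r1 - r0 - step * (count - padding)) / 2;
  const starts = Array.from({ length: count }, (_, i) => r0 + offset + i * step);
  return { step, bandwidth, starts };
}

const noPadding = bandLayout(10, [0, 100], 0);
const halfPadding = bandLayout(10, [0, 100], 0.5);
const fullPadding = bandLayout(10, [0, 100], 1);

console.log(noPadding.bandwidth);   // 10: bars touch each other
console.log(halfPadding.bandwidth); // a little under 5: bar and gap are equally wide
console.log(fullPadding.bandwidth); // 0: no rectangle at all
```

The bars stay evenly spaced for every padding value; only the share of each step that is drawn as a rectangle changes.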
Here is a graph that uses scaleBand() for the positions and scaleBand().bandwidth for the width of the rectangles.
The figure below maps out 10 values with padding from 0 to 1, where 0 means no padding and 1 means 100% padding (or no rectangle at all). Notice how the values and the axis are evenly spaced based on the padding as well.
Creating the rectangles themselves is another significant change from the line chart. With rectangles, you need 4 components:
svg.append("g").selectAll("rect")
.data(data)
.join("rect")
the x position (as it relates to the day)
.attr("x", (d,i)=>x(d.day))
the y position (as it relates to the average), NOTE it is the top edge of the bar, NOT the x-axis
.attr("y",(d,i)=>y(d.avg))
the width, which we talked about in regards to using the bandwidth
.attr("width",x.bandwidth())
the height, which is a bit, well, weird. I will explain below.
.attr("height", d => y(0) - y(d.avg))
.style("stroke-width", 2)
.style("stroke","black")
.style("fill", "steelblue")
OK, let's talk about height. The weirdness stems from our 0,0 being in the top-left corner of the screen. This means that when we create a rectangle and add a "height", it goes down and not up.
Here is an example of a rectangle that starts in the middle of the box. When we add height, it goes down. You may think, well, can I use a negative height? The answer is no. What does this mean? Continue below.
This is how we get the following. We first need to recall that the y is
.attr("y",(d,i)=>y(d.avg))
This is NOT the x-axis, but the pixel position of the actual value for that given avg, which is the top edge of the bar.
Next, we take the largest pixel value from the range of our y scaleLinear().
y = d3.scaleLinear().range([height-margin.bottom , margin.top]).domain([0,d3.max(backToList, (d,i) => d.avg)])
The largest value is (height-margin.bottom), or in other words, the output for the smallest value in the domain of our y scaleLinear(), y(0). (The Graph code at the end of this page uses a single number for margin, so there the same expression is written height-margin.)
Lastly, we need to remember that we are starting at y(d.avg) and we are trying to get back to the x-axis. This means we need to subtract y(d.avg) from that largest value.
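A small worked example may help here. The sketch below replaces `d3.scaleLinear` with a hypothetical `linearScale` helper so that it runs on its own, uses the same width-independent numbers as the Graph code (a height of 300 and a margin of 60), and assumes a made-up maximum average of 0.2 inches.

```javascript
// Hypothetical stand-in for d3.scaleLinear: maps the domain [d0, d1] onto the range [r0, r1].
function linearScale([d0, d1], [r0, r1]) {
  return value => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

const chartHeight = 300;
const chartMargin = 60;

// Inverted range: the domain minimum maps to the bottom of the chart (a large pixel value).
const yScale = linearScale([0, 0.2], [chartHeight - chartMargin, chartMargin]);

const avg = 0.1;                     // made-up average rainfall
const barTop = yScale(avg);          // 150: where the rectangle starts
const baseline = yScale(0);          // 240: where the x-axis sits
const barHeight = baseline - barTop; // 90: how far the rectangle must grow downward

console.log(barTop, baseline, barHeight); // 150 240 90
```

The rectangle starts at the pixel position of the value and grows down by exactly the distance left to the x-axis.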
We can write out the height two different ways:
.attr("height", d => y(0) - y(d.avg))
OR
.attr("height", d => (height-margin.bottom) - y(d.avg))
My choice? .attr("height", d => (height-margin.bottom) - y(d.avg))
because this holds no matter what the minimum value of the y domain is. (height-margin.bottom) is a constant, while y(0) only lands on the x-axis when the domain starts at 0.
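To see the difference between the two versions, consider a y domain that does not start at 0. The sketch below uses a hypothetical `linearScale` helper in place of `d3.scaleLinear`, and the domain and value are made up so the numbers are easy to follow.

```javascript
// Hypothetical stand-in for d3.scaleLinear: maps the domain [d0, d1] onto the range [r0, r1].
function linearScale([d0, d1], [r0, r1]) {
  return value => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

const plotBottom = 300 - 60; // height - margin, where the x-axis is drawn
const plotTop = 60;

// Made-up domain that starts at 0.25 instead of 0.
const y = linearScale([0.25, 0.75], [plotBottom, plotTop]);

const avgValue = 0.5;
const heightFromZero = y(0) - y(avgValue);          // 330 - 150 = 180
const heightFromBottom = plotBottom - y(avgValue);  // 240 - 150 = 90

console.log(heightFromZero, heightFromBottom); // 180 90
```

With `y(0)`, the bar would be 180 pixels tall and run 90 pixels past the x-axis. With the constant, it stops at the axis.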
Map - JavaScript | MDN. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map. Accessed 9 Apr. 2021.
%%html
<script type="text/javascript">
require(['d3'], function (d3) {
  d3.csv('weather.csv')
    .then(function(data) {
      const dateConverter = d3.timeParse("%_m/%_d/%Y")
      const daysOfTheWeek = d3.timeFormat("%a")
      data = data.map(d => ({"DATE": dateConverter(d.DATE), "PRCP": +d.PRCP}))
      // d3.rollup returns a Map of weekday -> mean precipitation
      const nestedData = d3.rollup(data, v => d3.mean(v, d => d.PRCP), d => daysOfTheWeek(d.DATE))
      console.log(nestedData)
      // convert the Map back into a list of objects that .data() can bind to
      const backToList = []
      for (let [key, value] of nestedData) {
        console.log(key + ' = ' + value)
        backToList.push({"day": key, "avg": value})
      }
      console.log(backToList)
    })
    .catch(function(error) {
      console.error(error) // report load or parse failures instead of ignoring them
    })
})
</script>
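The loop above can also be written in one step with `Array.from`, which accepts any iterable (including a `Map`) and a mapping function. This is an alternative sketch, not the approach the cells on this page use; the `Map` below is built by hand with made-up averages in place of the `d3.rollup` result.

```javascript
// Made-up averages standing in for the Map that d3.rollup returns.
const averages = new Map([
  ["Sat", 0.25],
  ["Sun", 0.125],
]);

// Each Map entry arrives as a [key, value] pair, which we destructure and rename.
const asList = Array.from(averages, ([day, avg]) => ({ day, avg }));

console.log(asList); // [{ day: "Sat", avg: 0.25 }, { day: "Sun", avg: 0.125 }]
```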
Graph¶
%%html
<div id="graph3"></div>
<script type="text/javascript">
require(['d3'], function (d3) {
  d3.csv('weather.csv')
    .then(function(data) {
      const dateConverter = d3.timeParse("%_m/%_d/%Y")
      const daysOfTheWeek = d3.timeFormat("%a")
      data = data.map(d => ({"DATE": dateConverter(d.DATE), "PRCP": +d.PRCP}))
      // mean precipitation per weekday, converted from a Map to a list
      const nestedData = d3.rollup(data, v => d3.mean(v, d => d.PRCP), k => daysOfTheWeek(k.DATE))
      const backToList = []
      for (let [key, value] of nestedData) {
        backToList.push({"day": key, "avg": value})
      }
      const width = 600
      const height = 300
      const margin = 60
      const svg = d3.select("div#graph3").append("svg")
        .attr("width", width)
        .attr("height", height)
      // x: one band per weekday; y: inverted so larger averages are drawn higher
      const x = d3.scaleBand().range([margin , width - margin]).domain(backToList.map(d=>d.day)).padding(0)
      const y = d3.scaleLinear().range([height-margin , margin]).domain([0,d3.max(backToList, (d,i) => d.avg)])
      const xAxis = d3.axisBottom().scale(x)
      svg.append("g")
        .attr("class", "axis")
        .attr("transform", "translate(0," + (height-margin) + ")")
        .call(xAxis)
      svg.append("text")
        .attr("x", width/2)
        .attr("y", height-5)
        .style("text-anchor", "middle")
        .text("Days of the Week")
      const yAxis = d3.axisLeft().scale(y)
      svg.append("g")
        .attr("class", "axis")
        .attr("transform", "translate(" + margin + ",0)")
        .call(yAxis)
      svg.append("text")
        .attr("transform", "rotate(-90,15,"+(height/2)+")")
        .attr("x", 15)
        .attr("y", height/2)
        .style("text-anchor", "middle")
        .text("Average Rainfall (inches)")
      // one rectangle per weekday, growing down from y(d.avg) to the x-axis
      svg.append("g").selectAll("rect")
        .data(backToList)
        .join("rect")
        .attr("x", (d,i)=>x(d.day))
        .attr("y",(d,i)=>y(d.avg))
        .attr("width",x.bandwidth())
        .attr("height", d => (height-margin) - y(d.avg))
        .style("stroke-width", 2)
        .style("stroke","black")
        .style("fill", "steelblue")
        .append("title")
        .text(d=>d.avg)
    })
    .catch(function(error) {
      console.error(error) // report load or parse failures instead of ignoring them
    })
})
</script>