Using Pandas to Work with Data - 🐼

Why are we using Pandas?

  • Pandas lets us quickly load our comma-separated values (CSV) file.

  • It lets us fill in missing values (NAs).

  • It lets us preview the data (.head()).

  • It lets us filter our data.

import pandas as pd

# CSV of State College weather data, 2000-2020
url = "https://gist.githubusercontent.com/dudaspm/e518430a731ac11f52de9217311c674d/raw/4c2f2bd6639582a420ef321493188deebc4a575e/StateCollege2000-2020.csv"
data = pd.read_csv(url)  # load the CSV straight from the URL into a DataFrame
data = data.fillna(0)  # replace all NAs with 0s
data.head()  # preview the first five rows
DATE DAY MONTH YEAR PRCP SNOW TMAX TMIN WT_FOG WT_THUNDER WT_SLEET WT_HAIL WT_GLAZE WT_HIGHWINDS
0 1/1/2000 1 1 2000 0.00 0.0 44.0 23 0.0 0.0 0.0 0.0 0.0 0.0
1 1/2/2000 2 1 2000 0.00 0.0 52.0 23 0.0 0.0 0.0 0.0 0.0 0.0
2 1/3/2000 3 1 2000 0.01 0.0 60.0 35 0.0 0.0 0.0 0.0 0.0 0.0
3 1/4/2000 4 1 2000 0.12 0.0 62.0 54 0.0 0.0 0.0 0.0 0.0 0.0
4 1/5/2000 5 1 2000 0.04 0.0 60.0 30 0.0 0.0 0.0 0.0 0.0 0.0
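
Filling every NA with 0 is convenient, but a 0 in a measurement column such as TMAX reads like a real observation. The following is a minimal sketch of checking how many values are missing before filling, and of filling only the weather-type flags. It uses a small made-up DataFrame (the values are illustrative and are not taken from the State College file), and the names sample, flag_columns, and filled are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the real data comes from pd.read_csv(url)
sample = pd.DataFrame({
    "TMAX": [44.0, np.nan, 60.0],
    "SNOW": [0.0, 0.0, np.nan],
    "WT_FOG": [np.nan, 1.0, np.nan],
})

# Count the missing values in each column before changing anything
missing_counts = sample.isna().sum()
print(missing_counts)

# Fill only the weather-type flag columns with 0; measurements stay NaN
flag_columns = ["WT_FOG"]
filled = sample.copy()
filled[flag_columns] = filled[flag_columns].fillna(0)
print(filled)
```

Whether 0 is a sensible replacement depends on the column: for a flag like WT_FOG, treating a missing value as "did not occur" is a plausible reading, while for temperatures it may be safer to leave the gap.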

Acknowledgement#

Cite as: Menne, Matthew J., Imke Durre, Bryant Korzeniewski, Shelley McNeal, Kristy Thomas, Xungang Yin, Steven Anthony, Ron Ray, Russell S. Vose, Byron E. Gleason, and Tamara G. Houston (2012): Global Historical Climatology Network - Daily (GHCN-Daily), Version 3. CITY:US420020. NOAA National Climatic Data Center. doi:10.7289/V5D21VHZ 02/22/2021.

Publications citing this dataset should also cite the following article: Matthew J. Menne, Imke Durre, Russell S. Vose, Byron E. Gleason, and Tamara G. Houston, 2012: An Overview of the Global Historical Climatology Network-Daily Database. J. Atmos. Oceanic Technol., 29, 897-910. doi:10.1175/JTECH-D-11-00103.1.

Use liability: NOAA and NCEI cannot provide any warranty as to the accuracy, reliability, or completeness of furnished data. Users assume responsibility to determine the usability of these data. The user is responsible for the results of any application of this data for other than its intended purpose.

Links: https://data.noaa.gov/onestop/

https://www.ncdc.noaa.gov/cdo-web/search

NumPy Absolute Basics - https://numpy.org/doc/2.2/user/absolute_beginners.html

Filtering#

Pandas and D3.js filter data in very similar ways: both keep only the rows that satisfy a condition. I will discuss the Pandas version.
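
In Pandas, a filter is a two-step operation: a comparison on a column produces one True/False value per row, and indexing the DataFrame with that result keeps only the True rows. Here is a minimal sketch on a small made-up DataFrame (the rows and the names sample and mask are hypothetical, chosen only to mirror the column names used in this section):

```python
import pandas as pd

# Hypothetical sample rows with the same column names as the weather data
sample = pd.DataFrame({
    "YEAR": [2019, 2020, 2020],
    "MONTH": [12, 1, 11],
    "TMAX": [35.0, 40.0, 46.0],
})

# The comparison returns a boolean Series, one value per row
mask = sample.YEAR == 2020
print(mask.tolist())

# Indexing with the mask keeps only the rows where it is True
print(sample[mask])
```

Writing sample[sample.YEAR == 2020] does the same thing in one line; the mask is simply built inline.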

Filter by year#

data[data.YEAR==2020].head()
DATE DAY MONTH YEAR PRCP SNOW TMAX TMIN WT_FOG WT_THUNDER WT_SLEET WT_HAIL WT_GLAZE WT_HIGHWINDS
7274 1/1/2020 1 1 2020 0.10 0.3 40.0 28 0.0 0.0 0.0 0.0 0.0 0.0
7275 1/2/2020 2 1 2020 0.00 0.0 36.0 27 0.0 0.0 0.0 0.0 0.0 0.0
7276 1/3/2020 3 1 2020 0.05 0.0 46.0 29 0.0 0.0 0.0 0.0 0.0 0.0
7277 1/4/2020 4 1 2020 0.28 0.0 49.0 42 0.0 0.0 0.0 0.0 0.0 0.0
7278 1/5/2020 5 1 2020 0.00 0.0 49.0 31 0.0 0.0 0.0 0.0 0.0 0.0
data[(data.YEAR==2020) & (data.MONTH==11)].head()
DATE DAY MONTH YEAR PRCP SNOW TMAX TMIN WT_FOG WT_THUNDER WT_SLEET WT_HAIL WT_GLAZE WT_HIGHWINDS
7579 11/1/2020 1 11 2020 0.00 0.0 46.0 38 0.0 0.0 0.0 0.0 0.0 0.0
7580 11/2/2020 2 11 2020 0.19 0.0 50.0 32 0.0 0.0 0.0 0.0 0.0 0.0
7581 11/3/2020 3 11 2020 0.00 0.0 40.0 33 0.0 0.0 0.0 0.0 0.0 0.0
7582 11/4/2020 4 11 2020 0.00 0.0 55.0 33 0.0 0.0 0.0 0.0 0.0 0.0
7583 11/5/2020 5 11 2020 0.00 0.0 69.0 34 0.0 0.0 0.0 0.0 0.0 0.0
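
The year-and-month filter above combines two conditions with &, and each comparison sits in its own parentheses because & binds more tightly than ==. As a sketch of an alternative spelling, DataFrame.query accepts the same condition as a string. The DataFrame below is made up for illustration, and the names with_masks and with_query are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows with the same column names as the weather data
sample = pd.DataFrame({
    "YEAR": [2019, 2020, 2020, 2020],
    "MONTH": [11, 1, 11, 11],
    "TMAX": [50.0, 40.0, 46.0, 50.0],
})

# & combines the two boolean masks row by row (logical AND)
with_masks = sample[(sample.YEAR == 2020) & (sample.MONTH == 11)]

# query() expresses the same condition as a string
with_query = sample.query("YEAR == 2020 and MONTH == 11")

print(with_masks)
print(with_masks.equals(with_query))
```

Both forms select the same rows; which one to use is mostly a matter of readability.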

Filter by WT_HAIL or WT_HIGHWINDS#

data[(data.WT_HAIL==1) | (data.WT_HIGHWINDS==1)].head()
DATE DAY MONTH YEAR PRCP SNOW TMAX TMIN WT_FOG WT_THUNDER WT_SLEET WT_HAIL WT_GLAZE WT_HIGHWINDS
96 4/6/2000 6 4 2000 0.02 0.0 47.0 30 0.0 1.0 0.0 1.0 0.0 0.0
135 6/15/2000 15 6 2000 0.14 0.0 72.0 63 0.0 1.0 0.0 1.0 0.0 1.0
182 8/1/2000 1 8 2000 0.00 0.0 85.0 69 0.0 1.0 0.0 1.0 0.0 0.0
315 12/12/2000 12 12 2000 0.13 0.0 42.0 29 0.0 0.0 0.0 0.0 0.0 1.0
375 2/10/2001 10 2 2001 0.02 0.0 59.0 31 0.0 0.0 0.0 0.0 0.0 1.0
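
The | operator keeps a row when at least one of the conditions is True, so a day with both hail and high winds appears once. When more flag columns are involved, chaining | gets long. One possible shortcut, sketched here on made-up rows (the names sample, flags, either, and either_any are hypothetical), is to compare several columns at once and use any(axis=1):

```python
import pandas as pd

# Hypothetical sample rows with the same flag columns as the weather data
sample = pd.DataFrame({
    "WT_HAIL": [1.0, 0.0, 0.0, 1.0],
    "WT_HIGHWINDS": [0.0, 0.0, 1.0, 1.0],
})

# | combines the two boolean masks row by row (logical OR)
either = sample[(sample.WT_HAIL == 1) | (sample.WT_HIGHWINDS == 1)]

# Same filter over a list of columns: True if any listed flag equals 1
flags = ["WT_HAIL", "WT_HIGHWINDS"]
either_any = sample[(sample[flags] == 1).any(axis=1)]

# Number of days with hail, high winds, or both
matching_days = len(either)
print(matching_days)
print(either.equals(either_any))
```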