From: https://github.com/ksatola
Version: 1.0.0

Introduction

The data comes from the website of the Polish Chief Inspectorate of Environmental Protection (GIOS, Glowny Inspektorat Ochrony Srodowiska) and was downloaded on Feb 18th, 2020.

Methodology

For the download, I used web scraping techniques. Because the XLSX files differ in structure from year to year and contain either a single sheet or multiple sheets, the ETL logic is defined as follows:

  • Identify the codes of the emission measurement stations (Krakow, PL) to be used as a filter while transforming the data from XLSX files into an analytical table (Metadane_wer20190813.xlsx).
  • For all available years of observations, extract hourly (1g) and daily (24g) measurements separately for the selected stations.
  • For each pollutant, if more than one measurement station reports it, average the non-NaN values so that there is a single value per hour or day. This approach treats the Krakow area as a single measurement point and partially compensates for values missing at specific stations and for stations changing over time. In the latter case, averaging keeps the series numerically continuous for the area over all years.
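
The averaging step can be illustrated with a minimal pandas sketch. The values below are made up for illustration and are not taken from the GIOS files; the point is only that a row-wise mean that skips NaN yields one value per timestamp and stays NaN only when no station reported.

```python
import numpy as np
import pandas as pd

# Illustrative values only: two stations, three hourly timestamps
index = pd.to_datetime(
    ["2018-01-01 01:00", "2018-01-01 02:00", "2018-01-01 03:00"]
)
stations = pd.DataFrame(
    {
        "MpKrakAlKras": [10.0, np.nan, np.nan],
        "MpKrakBujaka": [20.0, 30.0, np.nan],
    },
    index=index,
)

# Row-wise mean over the stations, ignoring missing values
area_mean = stations.mean(axis=1, skipna=True)
print(area_mean)
```

The first hour averages both stations, the second uses the only available value, and the third remains NaN because no station reported.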

Comments

The downloaded content consists of metadata files describing the emission measurement stations (their codes, locations, and measurement characteristics over time) as well as aggregated statistics. All downloaded files are ZIP archives containing XLSX files. Measurements are organized in files by year (from 2000 to 2018), emission measurement station, and pollutant. The data covers hourly and daily averages of pollutant measurements.

There was a major change in the metadata format and naming convention in 2016. I had to take this into account while working on the automated ETL pipeline.

Currently, eight emission measurement stations operate in the Krakow area, taking different sets of measurements.

Three of the stations were renamed in 2016, and five others were closed between 2004 and 2018:

  • 'MpKrakowWIOSPrad6115', # closed on 2010-02-28
  • 'MpKrakowWSSEKapi6108', # closed on 2009-12-31
  • 'MpKrakowWSSEPrad6102', # closed on 2004-12-31
  • 'MpKrakowWSSERPod6113', # closed on 2004-12-31
  • 'MpKrakTelime' # closed on 2018-06-01

The first two stations in the Krakow area (MpKrakAlKras, MpKrakBulwar) were initiated on Jan 1st, 2003.
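
One possible way to handle the 2016 renaming is to translate old station codes to their current ones before any grouping. The sketch below is an assumption about how this could be done, not the method used in the prepare module (the notebook itself simply keeps both old and new codes in its filter list). The three code pairs come from the station list used later in this notebook; current_code is a hypothetical helper name.

```python
# Old station codes and the current codes they were renamed to in 2016
OLD_TO_NEW = {
    "MpKrakowWIOSAKra6117": "MpKrakAlKras",
    "MpKrakowWIOSBuja6119": "MpKrakBujaka",
    "MpKrakowWIOSBulw6118": "MpKrakBulwar",
}


def current_code(code):
    """Return the current station code; codes that were never renamed pass through."""
    return OLD_TO_NEW.get(code, code)


print(current_code("MpKrakowWIOSAKra6117"))
print(current_code("MpKrakTelime"))
```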


In [1]:
%load_ext autoreload
In [2]:
%autoreload 2
In [3]:
import sys
sys.path.insert(0, '../src')
In [4]:
import pandas as pd
import numpy as np
import time
import os
import random
import re
import fnmatch

from pathlib import Path
import zipfile
import csv

import requests
import urllib.request
from bs4 import BeautifulSoup
In [5]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', 1000)
In [6]:
from prepare import (
    get_gios_pollution_data_files,
    extract_archived_data,
    get_pollutant_measures_for_locations,
    get_files_for_name_pattern,
    build_gios_analytical_view
)

Data Web Scraping

In [7]:
# Set the url to the website and access the site with our requests library
url = 'http://powietrze.gios.gov.pl/pjp/archives'
response = requests.get(url)
response
Out[7]:
<Response [200]>
In [8]:
# https://www.crummy.com/software/BeautifulSoup/bs4/doc/
soup = BeautifulSoup(response.text, "html.parser")
In [9]:
# Use the .find method to locate the <ul> element with id="archive_files"
ul = soup.find("ul", {"id": "archive_files"})
print(ul)
<ul class="list-unstyled" id="archive_files">
<li> <a href="/pjp/archives/downloadFile/102">
<div class="col-md-1 col-sm-2 col-xs-3 text-center" style="color: black;">
<div style="width: 50px; height: 52px; display: table; margin: 0 auto;"><img alt="" src="/pjp/assets-0.0.31/img/zip.png"/></div>
<p class="archive_file_name">Kody stacji pomiarowych</p>
</div>
</a>
</li>
<li> <a href="/pjp/archives/downloadFile/305">
<div class="col-md-1 col-sm-2 col-xs-3 text-center" style="color: black;">
<div style="width: 50px; height: 52px; display: table; margin: 0 auto;"><img alt="" src="/pjp/assets-0.0.31/img/zip.png"/></div>
<p class="archive_file_name">Metadane - stacje i stanowiska pomiarowe</p>
</div>
</a>
</li>
<li> <a href="/pjp/archives/downloadFile/304">
<div class="col-md-1 col-sm-2 col-xs-3 text-center" style="color: black;">
<div style="width: 50px; height: 52px; display: table; margin: 0 auto;"><img alt="" src="/pjp/assets-0.0.31/img/zip.png"/></div>
<p class="archive_file_name">Statystyki z lat 2000-2018</p>
</div>
</a>
</li>
<li> <a href="/pjp/archives/downloadFile/223">
<div class="col-md-1 col-sm-2 col-xs-3 text-center" style="color: black;">
<div style="width: 50px; height: 52px; display: table; margin: 0 auto;"><img alt="" src="/pjp/assets-0.0.31/img/zip.png"/></div>
<p class="archive_file_name">Wyniki pomiarów z 2000 roku</p>
</div>
</a>
</li>
<li> <a href="/pjp/archives/downloadFile/224">
<div class="col-md-1 col-sm-2 col-xs-3 text-center" style="color: black;">
<div style="width: 50px; height: 52px; display: table; margin: 0 auto;"><img alt="" src="/pjp/assets-0.0.31/img/zip.png"/></div>
<p class="archive_file_name">Wyniki pomiarów z 2001 roku</p>
</div>
</a>
</li>
<li> <a href="/pjp/archives/downloadFile/225">
<div class="col-md-1 col-sm-2 col-xs-3 text-center" style="color: black;">
<div style="width: 50px; height: 52px; display: table; margin: 0 auto;"><img alt="" src="/pjp/assets-0.0.31/img/zip.png"/></div>
<p class="archive_file_name">Wyniki pomiarów z 2002 roku</p>
</div>
</a>
</li>
<li> <a href="/pjp/archives/downloadFile/226">
<div class="col-md-1 col-sm-2 col-xs-3 text-center" style="color: black;">
<div style="width: 50px; height: 52px; display: table; margin: 0 auto;"><img alt="" src="/pjp/assets-0.0.31/img/zip.png"/></div>
<p class="archive_file_name">Wyniki pomiarów z 2003 roku</p>
</div>
</a>
</li>
<li> <a href="/pjp/archives/downloadFile/202">
<div class="col-md-1 col-sm-2 col-xs-3 text-center" style="color: black;">
<div style="width: 50px; height: 52px; display: table; margin: 0 auto;"><img alt="" src="/pjp/assets-0.0.31/img/zip.png"/></div>
<p class="archive_file_name">Wyniki pomiarów z 2004 roku</p>
</div>
</a>
</li>
<li> <a href="/pjp/archives/downloadFile/203">
<div class="col-md-1 col-sm-2 col-xs-3 text-center" style="color: black;">
<div style="width: 50px; height: 52px; display: table; margin: 0 auto;"><img alt="" src="/pjp/assets-0.0.31/img/zip.png"/></div>
<p class="archive_file_name">Wyniki pomiarów z 2005 roku</p>
</div>
</a>
</li>
<li> <a href="/pjp/archives/downloadFile/227">
<div class="col-md-1 col-sm-2 col-xs-3 text-center" style="color: black;">
<div style="width: 50px; height: 52px; display: table; margin: 0 auto;"><img alt="" src="/pjp/assets-0.0.31/img/zip.png"/></div>
<p class="archive_file_name">Wyniki pomiarów z 2006 roku</p>
</div>
</a>
</li>
<li> <a href="/pjp/archives/downloadFile/228">
<div class="col-md-1 col-sm-2 col-xs-3 text-center" style="color: black;">
<div style="width: 50px; height: 52px; display: table; margin: 0 auto;"><img alt="" src="/pjp/assets-0.0.31/img/zip.png"/></div>
<p class="archive_file_name">Wyniki pomiarów z 2007 roku</p>
</div>
</a>
</li>
<li> <a href="/pjp/archives/downloadFile/229">
<div class="col-md-1 col-sm-2 col-xs-3 text-center" style="color: black;">
<div style="width: 50px; height: 52px; display: table; margin: 0 auto;"><img alt="" src="/pjp/assets-0.0.31/img/zip.png"/></div>
<p class="archive_file_name">Wyniki pomiarów z 2008 roku</p>
</div>
</a>
</li>
<li> <a href="/pjp/archives/downloadFile/230">
<div class="col-md-1 col-sm-2 col-xs-3 text-center" style="color: black;">
<div style="width: 50px; height: 52px; display: table; margin: 0 auto;"><img alt="" src="/pjp/assets-0.0.31/img/zip.png"/></div>
<p class="archive_file_name">Wyniki pomiarów z 2009 roku</p>
</div>
</a>
</li>
<li> <a href="/pjp/archives/downloadFile/231">
<div class="col-md-1 col-sm-2 col-xs-3 text-center" style="color: black;">
<div style="width: 50px; height: 52px; display: table; margin: 0 auto;"><img alt="" src="/pjp/assets-0.0.31/img/zip.png"/></div>
<p class="archive_file_name">Wyniki pomiarów z 2010 roku</p>
</div>
</a>
</li>
<li> <a href="/pjp/archives/downloadFile/232">
<div class="col-md-1 col-sm-2 col-xs-3 text-center" style="color: black;">
<div style="width: 50px; height: 52px; display: table; margin: 0 auto;"><img alt="" src="/pjp/assets-0.0.31/img/zip.png"/></div>
<p class="archive_file_name">Wyniki pomiarów z 2011 roku</p>
</div>
</a>
</li>
<li> <a href="/pjp/archives/downloadFile/233">
<div class="col-md-1 col-sm-2 col-xs-3 text-center" style="color: black;">
<div style="width: 50px; height: 52px; display: table; margin: 0 auto;"><img alt="" src="/pjp/assets-0.0.31/img/zip.png"/></div>
<p class="archive_file_name">Wyniki pomiarów z 2012 roku</p>
</div>
</a>
</li>
<li> <a href="/pjp/archives/downloadFile/234">
<div class="col-md-1 col-sm-2 col-xs-3 text-center" style="color: black;">
<div style="width: 50px; height: 52px; display: table; margin: 0 auto;"><img alt="" src="/pjp/assets-0.0.31/img/zip.png"/></div>
<p class="archive_file_name">Wyniki pomiarów z 2013 roku</p>
</div>
</a>
</li>
<li> <a href="/pjp/archives/downloadFile/302">
<div class="col-md-1 col-sm-2 col-xs-3 text-center" style="color: black;">
<div style="width: 50px; height: 52px; display: table; margin: 0 auto;"><img alt="" src="/pjp/assets-0.0.31/img/zip.png"/></div>
<p class="archive_file_name">Wyniki pomiarów z 2014 roku</p>
</div>
</a>
</li>
<li> <a href="/pjp/archives/downloadFile/236">
<div class="col-md-1 col-sm-2 col-xs-3 text-center" style="color: black;">
<div style="width: 50px; height: 52px; display: table; margin: 0 auto;"><img alt="" src="/pjp/assets-0.0.31/img/zip.png"/></div>
<p class="archive_file_name">Wyniki pomiarów z 2015 roku</p>
</div>
</a>
</li>
<li> <a href="/pjp/archives/downloadFile/242">
<div class="col-md-1 col-sm-2 col-xs-3 text-center" style="color: black;">
<div style="width: 50px; height: 52px; display: table; margin: 0 auto;"><img alt="" src="/pjp/assets-0.0.31/img/zip.png"/></div>
<p class="archive_file_name">Wyniki pomiarów z 2016 roku</p>
</div>
</a>
</li>
<li> <a href="/pjp/archives/downloadFile/262">
<div class="col-md-1 col-sm-2 col-xs-3 text-center" style="color: black;">
<div style="width: 50px; height: 52px; display: table; margin: 0 auto;"><img alt="" src="/pjp/assets-0.0.31/img/zip.png"/></div>
<p class="archive_file_name">Wyniki pomiarów z 2017 roku</p>
</div>
</a>
</li>
<li> <a href="/pjp/archives/downloadFile/303">
<div class="col-md-1 col-sm-2 col-xs-3 text-center" style="color: black;">
<div style="width: 50px; height: 52px; display: table; margin: 0 auto;"><img alt="" src="/pjp/assets-0.0.31/img/zip.png"/></div>
<p class="archive_file_name">Wyniki pomiarów z 2018 roku</p>
</div>
</a>
</li>
</ul>
In [10]:
lis = ul.find_all('li')
resources = []

for li in lis:
    file_name = li.find("p", {"class": "archive_file_name"}).getText()
    file_url = li.find("a")['href'].split('/')
    #print(file_url)
    resources.append((file_name, file_url[3]+'/'+file_url[4]))

resources
Out[10]:
[('Kody stacji pomiarowych', 'downloadFile/102'),
 ('Metadane - stacje i stanowiska pomiarowe', 'downloadFile/305'),
 ('Statystyki z lat 2000-2018', 'downloadFile/304'),
 ('Wyniki pomiarów z 2000 roku', 'downloadFile/223'),
 ('Wyniki pomiarów z 2001 roku', 'downloadFile/224'),
 ('Wyniki pomiarów z 2002 roku', 'downloadFile/225'),
 ('Wyniki pomiarów z 2003 roku', 'downloadFile/226'),
 ('Wyniki pomiarów z 2004 roku', 'downloadFile/202'),
 ('Wyniki pomiarów z 2005 roku', 'downloadFile/203'),
 ('Wyniki pomiarów z 2006 roku', 'downloadFile/227'),
 ('Wyniki pomiarów z 2007 roku', 'downloadFile/228'),
 ('Wyniki pomiarów z 2008 roku', 'downloadFile/229'),
 ('Wyniki pomiarów z 2009 roku', 'downloadFile/230'),
 ('Wyniki pomiarów z 2010 roku', 'downloadFile/231'),
 ('Wyniki pomiarów z 2011 roku', 'downloadFile/232'),
 ('Wyniki pomiarów z 2012 roku', 'downloadFile/233'),
 ('Wyniki pomiarów z 2013 roku', 'downloadFile/234'),
 ('Wyniki pomiarów z 2014 roku', 'downloadFile/302'),
 ('Wyniki pomiarów z 2015 roku', 'downloadFile/236'),
 ('Wyniki pomiarów z 2016 roku', 'downloadFile/242'),
 ('Wyniki pomiarów z 2017 roku', 'downloadFile/262'),
 ('Wyniki pomiarów z 2018 roku', 'downloadFile/303')]
In [11]:
links = [a["href"] for a in ul.select("a[href]")]
links
Out[11]:
['/pjp/archives/downloadFile/102',
 '/pjp/archives/downloadFile/305',
 '/pjp/archives/downloadFile/304',
 '/pjp/archives/downloadFile/223',
 '/pjp/archives/downloadFile/224',
 '/pjp/archives/downloadFile/225',
 '/pjp/archives/downloadFile/226',
 '/pjp/archives/downloadFile/202',
 '/pjp/archives/downloadFile/203',
 '/pjp/archives/downloadFile/227',
 '/pjp/archives/downloadFile/228',
 '/pjp/archives/downloadFile/229',
 '/pjp/archives/downloadFile/230',
 '/pjp/archives/downloadFile/231',
 '/pjp/archives/downloadFile/232',
 '/pjp/archives/downloadFile/233',
 '/pjp/archives/downloadFile/234',
 '/pjp/archives/downloadFile/302',
 '/pjp/archives/downloadFile/236',
 '/pjp/archives/downloadFile/242',
 '/pjp/archives/downloadFile/262',
 '/pjp/archives/downloadFile/303']
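
The hrefs collected above are site-relative. One way to turn them into absolute download URLs is urljoin from the standard library. This is a sketch; it is not necessarily how get_gios_pollution_data_files builds its URLs.

```python
from urllib.parse import urljoin

base_url = "http://powietrze.gios.gov.pl/pjp/archives"
links = ["/pjp/archives/downloadFile/102", "/pjp/archives/downloadFile/305"]

# An href starting with "/" replaces the whole path of the base URL
absolute_links = [urljoin(base_url, link) for link in links]
print(absolute_links)
```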

Download GIOS data files

In [12]:
%%time

download_base_url = 'http://powietrze.gios.gov.pl/pjp/archives'
path_to_save = "/Users/ksatola/Documents/git/air-polution/data/gios/etl"

get_gios_pollution_data_files(download_base_url, path_to_save)
ok: 200 http://powietrze.gios.gov.pl/pjp/archives/downloadFile/102
ok: 200 http://powietrze.gios.gov.pl/pjp/archives/downloadFile/305
ok: 200 http://powietrze.gios.gov.pl/pjp/archives/downloadFile/304
ok: 200 http://powietrze.gios.gov.pl/pjp/archives/downloadFile/223
ok: 200 http://powietrze.gios.gov.pl/pjp/archives/downloadFile/224
ok: 200 http://powietrze.gios.gov.pl/pjp/archives/downloadFile/225
ok: 200 http://powietrze.gios.gov.pl/pjp/archives/downloadFile/226
ok: 200 http://powietrze.gios.gov.pl/pjp/archives/downloadFile/202
ok: 200 http://powietrze.gios.gov.pl/pjp/archives/downloadFile/203
ok: 200 http://powietrze.gios.gov.pl/pjp/archives/downloadFile/227
ok: 200 http://powietrze.gios.gov.pl/pjp/archives/downloadFile/228
ok: 200 http://powietrze.gios.gov.pl/pjp/archives/downloadFile/229
ok: 200 http://powietrze.gios.gov.pl/pjp/archives/downloadFile/230
ok: 200 http://powietrze.gios.gov.pl/pjp/archives/downloadFile/231
ok: 200 http://powietrze.gios.gov.pl/pjp/archives/downloadFile/232
ok: 200 http://powietrze.gios.gov.pl/pjp/archives/downloadFile/233
ok: 200 http://powietrze.gios.gov.pl/pjp/archives/downloadFile/234
ok: 200 http://powietrze.gios.gov.pl/pjp/archives/downloadFile/302
ok: 200 http://powietrze.gios.gov.pl/pjp/archives/downloadFile/236
ok: 200 http://powietrze.gios.gov.pl/pjp/archives/downloadFile/242
ok: 200 http://powietrze.gios.gov.pl/pjp/archives/downloadFile/262
ok: 200 http://powietrze.gios.gov.pl/pjp/archives/downloadFile/303
CPU times: user 8.46 s, sys: 8.85 s, total: 17.3 s
Wall time: 4min 41s
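
The implementation of get_gios_pollution_data_files lives in the prepare module and is not shown here. As a rough, hypothetical illustration of what downloading a single archive could look like, the sketch below streams a URL to disk using only the standard library. The function name download_archive and its opener parameter are assumptions introduced for this example.

```python
import shutil
import urllib.request
from pathlib import Path


def download_archive(url, target_path, opener=urllib.request.urlopen):
    """Stream the response body for `url` into `target_path` and return the path.

    `opener` is injectable so the function can be exercised without network access.
    """
    target_path = Path(target_path)
    target_path.parent.mkdir(parents=True, exist_ok=True)
    with opener(url) as response, open(target_path, "wb") as out_file:
        # Copy in chunks so large archives are not held in memory at once
        shutil.copyfileobj(response, out_file)
    return target_path
```

A call such as download_archive(download_base_url + '/downloadFile/102', path_to_save + '/Kody stacji pomiarowych.zip') would then save the first archive under its display name, matching the file names that appear in the extraction step.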

Extract files to a folder

In [13]:
%%time

source_dir = '/Users/ksatola/Documents/git/air-polution/data/gios/etl'
target_dir = '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/'
file_search_pattern = '*.zip'

extract_archived_data(source_dir, target_dir, file_search_pattern)
Found directory: /Users/ksatola/Documents/git/air-polution/data/gios/etl
Extracting: /Users/ksatola/Documents/git/air-polution/data/gios/etl/Wyniki pomiarów z 2000 roku.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/gios/etl/Wyniki pomiarów z 2001 roku.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/gios/etl/Statystyki z lat 2000-2018.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/gios/etl/Wyniki pomiarów z 2017 roku.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/gios/etl/Wyniki pomiarów z 2016 roku.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/gios/etl/Wyniki pomiarów z 2010 roku.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/gios/etl/Wyniki pomiarów z 2011 roku.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/gios/etl/Kody stacji pomiarowych.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/gios/etl/Wyniki pomiarów z 2007 roku.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/gios/etl/Wyniki pomiarów z 2006 roku.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/gios/etl/Wyniki pomiarów z 2014 roku.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/gios/etl/Wyniki pomiarów z 2015 roku.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/gios/etl/Wyniki pomiarów z 2009 roku.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/gios/etl/Wyniki pomiarów z 2008 roku.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/gios/etl/Wyniki pomiarów z 2003 roku.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/gios/etl/Wyniki pomiarów z 2002 roku.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/gios/etl/Metadane - stacje i stanowiska pomiarowe.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/gios/etl/Wyniki pomiarów z 2004 roku.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/gios/etl/Wyniki pomiarów z 2005 roku.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/gios/etl/Wyniki pomiarów z 2018 roku.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/gios/etl/Wyniki pomiarów z 2013 roku.zip
Extracting: /Users/ksatola/Documents/git/air-polution/data/gios/etl/Wyniki pomiarów z 2012 roku.zip
CPU times: user 3.01 s, sys: 1.26 s, total: 4.28 s
Wall time: 4.57 s
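
For readers without access to the prepare module, the extraction step can be approximated with zipfile and fnmatch from the standard library. This is a sketch under the assumption that all archives sit directly in source_dir; extract_matching_archives is a hypothetical name and not the author's extract_archived_data.

```python
import fnmatch
import os
import zipfile


def extract_matching_archives(source_dir, target_dir, pattern="*.zip"):
    """Extract every archive in `source_dir` whose name matches `pattern`."""
    os.makedirs(target_dir, exist_ok=True)
    extracted = []
    for name in sorted(os.listdir(source_dir)):
        if fnmatch.fnmatch(name, pattern):
            archive_path = os.path.join(source_dir, name)
            with zipfile.ZipFile(archive_path) as archive:
                archive.extractall(target_dir)
            extracted.append(archive_path)
    return extracted
```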

Data Transformation

In [14]:
# Emission measurement stations codes in the Krakow area

ems_codes = [
    
    # Active stations
    'MpKrakOsPias', # from 2016-01-01, pm25, pm10, http://powietrze.gios.gov.pl/pjp/current/station_details/info/10139
    'MpKrakWadow',  # from 2017-01-01, pm25, pm10, http://powietrze.gios.gov.pl/pjp/current/station_details/info/10447
    'MpKrakSwoszo', # from 2019-01-01, pm10, http://powietrze.gios.gov.pl/pjp/current/station_details/info/11303
    'MpKrakZloRog', # from 2016-01-01, pm10, http://powietrze.gios.gov.pl/pjp/current/station_details/info/10123
    'MpKrakAlKras', # from 2003-01-01, pm25, pm10, CO, NO2, NOx, benzen, http://powietrze.gios.gov.pl/pjp/current/station_details/info/400
    'MpKrakBujaka', # from 2010-01-01, pm25, pm10, CO, NO2, NOx, benzen, SO2, O3 http://powietrze.gios.gov.pl/pjp/current/station_details/info/401
    'MpKrakBulwar', # from 2003-01-01, pm25, pm10, CO, NO2, NOx, benzen, SO2, http://powietrze.gios.gov.pl/pjp/current/station_details/info/402
    'MpKrakDietla', # from 2016-01-01, pm10, NO2, NOx, http://powietrze.gios.gov.pl/pjp/current/station_details/info/10121
    
    # Old codes and historical stations
    'MpKrakowWIOSAKra6117', # MpKrakAlKras
    'MpKrakowWIOSBuja6119', # MpKrakBujaka
    'MpKrakowWIOSBulw6118', # MpKrakBulwar
    'MpKrakowWIOSPrad6115', # closed on 2010-02-28
    'MpKrakowWSSEKapi6108', # closed on 2009-12-31
    'MpKrakowWSSEPrad6102', # closed on 2004-12-31
    'MpKrakowWSSERPod6113', # closed on 2004-12-31
    'MpKrakTelime'          # closed on 2018-06-01
]
In [15]:
source_dir = '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/'

years = [
    '2000',
    '2001',
    '2002',
    '2003',
    '2004',
    '2005',
    '2006',
    '2007',
    '2008',
    '2009',
    '2010',
    '2011',
    '2012',
    '2013',
    '2014',
    '2015',
    '2016',
    '2017',
    '2018',
    '2019'
]
In [16]:
%%time

# Get all 1g files from 2016-2019 inclusive
file_search_pattern = '201[6789]_*_1g.xlsx'

get_files_for_name_pattern(source_dir, file_search_pattern)
CPU times: user 1.17 ms, sys: 681 µs, total: 1.86 ms
Wall time: 1.77 ms
Out[16]:
['/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_O3_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_NO2_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_PM2.5_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_PM25_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_Hg(TGM)_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_SO2_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_Hg(TGM)_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_C6H6_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_CO_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_Hg(TGM)_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_C6H6_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_NOx_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_PM10_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_NO2_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_O3_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_SO2_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_NOx_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_PM25_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_CO_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_NO2_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_PM10_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_NOx_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_C6H6_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_SO2_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_O3_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_PM10_1g.xlsx',
 '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_CO_1g.xlsx']
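
The listing above shows that PM2.5 files are named inconsistently (2016_PM2.5_1g.xlsx versus 2018_PM25_1g.xlsx). A minimal sketch of shell-style file matching plus file name parsing that normalizes the pollutant name might look as follows. The helper names find_files and parse_file_name and the regular expression are assumptions made for this example, including the assumption that daily files use the same naming scheme with a 24g suffix.

```python
import fnmatch
import os
import re

# Assumed naming scheme: <year>_<pollutant>_<1g|24g>.xlsx
FILE_NAME_RE = re.compile(r"^(?P<year>\d{4})_(?P<pollutant>.+)_(?P<freq>1g|24g)\.xlsx$")


def find_files(root_folder, pattern):
    """Return the sorted full paths of files in `root_folder` matching `pattern`."""
    return sorted(
        os.path.join(root_folder, name)
        for name in os.listdir(root_folder)
        if fnmatch.fnmatch(name, pattern)
    )


def parse_file_name(path):
    """Split a file name into (year, pollutant, sampling frequency)."""
    match = FILE_NAME_RE.match(os.path.basename(path))
    if match is None:
        raise ValueError(f"Unexpected file name: {path}")
    # Drop the dot so that 'PM2.5' and 'PM25' end up under the same name
    pollutant = match.group("pollutant").replace(".", "")
    return match.group("year"), pollutant, match.group("freq")


print(parse_file_name("2016_PM2.5_1g.xlsx"))
print(parse_file_name("2018_Hg(TGM)_1g.xlsx"))
```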
In [17]:
%%time

file = '2018_SO2_1g.xlsx'
full_path_to_file = os.path.join(source_dir, file)

# Take the measurement name from the file name (here: 'SO2')
measurement_name = file.split('_')[1]
measurement_name

df1 = get_pollutant_measures_for_locations(full_path_to_file, ems_codes, measurement_name, '2018')
df1.head()
/Users/ksatola/anaconda3/lib/python3.7/site-packages/numpy/lib/nanfunctions.py:1115: RuntimeWarning: All-NaN slice encountered
  overwrite_input=overwrite_input)
CPU times: user 11.9 s, sys: 101 ms, total: 12 s
Wall time: 12 s
Out[17]:
SO2_mean SO2_median SO2_min SO2_max SO2_std SO2_sum SO2_obs_num
Datetime
2018-01-01 01:00:00 8.07894 8.07894 8.07894 8.07894 NaN 8.07894 1
2018-01-01 02:00:00 NaN NaN NaN NaN NaN 0.00000 0
2018-01-01 03:00:00 NaN NaN NaN NaN NaN 0.00000 0
2018-01-01 04:00:00 NaN NaN NaN NaN NaN 0.00000 0
2018-01-01 05:00:00 NaN NaN NaN NaN NaN 0.00000 0
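
The column layout above (mean, median, min, max, std, sum, obs_num) can be reproduced with row-wise pandas aggregations. The sketch below is an assumption about how such a summary could be computed, not the code of get_pollutant_measures_for_locations. The station column names are placeholders, and the values are taken from the SO2 output rows shown in this notebook.

```python
import numpy as np
import pandas as pd


def summarize_across_stations(stations, name):
    """Collapse per-station columns into one set of summary columns per timestamp."""
    return pd.DataFrame(
        {
            f"{name}_mean": stations.mean(axis=1),
            f"{name}_median": stations.median(axis=1),
            f"{name}_min": stations.min(axis=1),
            f"{name}_max": stations.max(axis=1),
            f"{name}_std": stations.std(axis=1),         # sample std (ddof=1)
            f"{name}_sum": stations.sum(axis=1),         # 0.0 when every value is NaN
            f"{name}_obs_num": stations.count(axis=1),   # number of non-NaN values
        }
    )


index = pd.to_datetime(
    ["2018-01-01 01:00", "2018-01-01 02:00", "2018-12-31 20:00"]
)
stations = pd.DataFrame(
    {
        "station_a": [8.07894, np.nan, 5.81358],  # placeholder column names
        "station_b": [np.nan, np.nan, 7.25033],
    },
    index=index,
)
summary = summarize_across_stations(stations, "SO2")
print(summary)
```

With a single observation the sample standard deviation is undefined, which explains the NaN in SO2_std next to SO2_obs_num equal to 1. With no observations, the sum is 0.0 while the other statistics are NaN. The RuntimeWarning in the cell output is raised by NumPy's NaN-aware functions when they meet such an all-NaN slice.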
In [18]:
%%time

file = '2017_C6H6_1g.xlsx'
full_path_to_file = os.path.join(source_dir, file)

# Take the measurement name from the file name (here: 'C6H6')
measurement_name = file.split('_')[1]
measurement_name

df2 = get_pollutant_measures_for_locations(full_path_to_file, ems_codes, measurement_name, '2017')
df2.head()
CPU times: user 4.52 s, sys: 27.9 ms, total: 4.55 s
Wall time: 4.56 s
Out[18]:
C6H6_mean C6H6_median C6H6_min C6H6_max C6H6_std C6H6_sum C6H6_obs_num
Datetime
2017-01-01 01:00:00 5.895385 5.895385 5.53153 6.25924 0.514569 11.79077 2
2017-01-01 02:00:00 6.491270 6.491270 5.64930 7.33324 1.190725 12.98254 2
2017-01-01 03:00:00 7.056075 7.056075 5.99393 8.11822 1.502100 14.11215 2
2017-01-01 04:00:00 8.039045 8.039045 6.58716 9.49093 2.053275 16.07809 2
2017-01-01 05:00:00 8.633105 8.633105 7.06201 10.20420 2.221864 17.26621 2
In [19]:
# Merge the data frames on their datetime index
# Note: the merge also works if one of the data frames is empty, e.g. pd.DataFrame()
merged = pd.merge(df1, df2, how='outer', left_index=True, right_index=True)
merged.head()
Out[19]:
SO2_mean SO2_median SO2_min SO2_max SO2_std SO2_sum SO2_obs_num C6H6_mean C6H6_median C6H6_min C6H6_max C6H6_std C6H6_sum C6H6_obs_num
Datetime
2017-01-01 01:00:00 NaN NaN NaN NaN NaN NaN NaN 5.895385 5.895385 5.53153 6.25924 0.514569 11.79077 2.0
2017-01-01 02:00:00 NaN NaN NaN NaN NaN NaN NaN 6.491270 6.491270 5.64930 7.33324 1.190725 12.98254 2.0
2017-01-01 03:00:00 NaN NaN NaN NaN NaN NaN NaN 7.056075 7.056075 5.99393 8.11822 1.502100 14.11215 2.0
2017-01-01 04:00:00 NaN NaN NaN NaN NaN NaN NaN 8.039045 8.039045 6.58716 9.49093 2.053275 16.07809 2.0
2017-01-01 05:00:00 NaN NaN NaN NaN NaN NaN NaN 8.633105 8.633105 7.06201 10.20420 2.221864 17.26621 2.0
In [20]:
merged.tail()
Out[20]:
SO2_mean SO2_median SO2_min SO2_max SO2_std SO2_sum SO2_obs_num C6H6_mean C6H6_median C6H6_min C6H6_max C6H6_std C6H6_sum C6H6_obs_num
Datetime
2018-12-31 20:00:00 6.531955 6.531955 5.81358 7.25033 1.015936 13.06391 2.0 NaN NaN NaN NaN NaN NaN NaN
2018-12-31 21:00:00 7.601315 7.601315 5.50472 9.69791 2.965033 15.20263 2.0 NaN NaN NaN NaN NaN NaN NaN
2018-12-31 22:00:00 8.165295 8.165295 5.41679 10.91380 3.886973 16.33059 2.0 NaN NaN NaN NaN NaN NaN NaN
2018-12-31 23:00:00 8.826955 8.826955 5.91481 11.73910 4.118395 17.65391 2.0 NaN NaN NaN NaN NaN NaN NaN
2019-01-01 00:00:00 9.130160 9.130160 6.26282 11.99750 4.055031 18.26032 2.0 NaN NaN NaN NaN NaN NaN NaN
In [21]:
df1.shape
Out[21]:
(8760, 7)
In [22]:
df2.shape
Out[22]:
(8760, 7)
In [23]:
merged.shape
Out[23]:
(17520, 14)
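
The merged shape follows from the outer join: df1 covers the hours of 2018 and df2 the hours of 2017, so their indexes do not overlap and the result has 8760 + 8760 = 17520 rows, with NaN filling the columns of the other data frame. The toy example below (values chosen for illustration only) shows the same effect, including why the integer obs_num columns appear as floats after the merge.

```python
import pandas as pd

so2 = pd.DataFrame(
    {"SO2_obs_num": [1, 0]},
    index=pd.to_datetime(["2018-01-01 01:00", "2018-01-01 02:00"]),
)
c6h6 = pd.DataFrame(
    {"C6H6_obs_num": [2, 2]},
    index=pd.to_datetime(["2017-01-01 01:00", "2017-01-01 02:00"]),
)

# Outer join on the index keeps every timestamp from both frames
merged_toy = pd.merge(so2, c6h6, how="outer", left_index=True, right_index=True)
print(merged_toy.shape)
print(merged_toy.dtypes)
```

Because NaN cannot be stored in a plain integer column, pandas upcasts both obs_num columns to float64 once missing values are introduced.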

Build 1g analytical view

In [24]:
%%time

df_1g = build_gios_analytical_view(years=years, sampling_freq='1g', root_folder=source_dir, ems_codes=ems_codes)
Year: 2000 - df_full.shape (0, 0)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2000_NOx_1g.xlsx - measurement_name: NOx
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2000_NO2_1g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2000_O3_1g.xlsx - measurement_name: O3
----------------------------------------

Year: 2001 - df_full.shape (8784, 14)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2001_O3_1g.xlsx - measurement_name: O3
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2001_NO2_1g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2001_NOx_1g.xlsx - measurement_name: NOx
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2001_SO2_1g.xlsx - measurement_name: SO2
----------------------------------------

Year: 2002 - df_full.shape (17544, 21)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2002_NO2_1g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2002_SO2_1g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2002_NOx_1g.xlsx - measurement_name: NOx
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2002_O3_1g.xlsx - measurement_name: O3
----------------------------------------

Year: 2003 - df_full.shape (26304, 28)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2003_SO2_1g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2003_O3_1g.xlsx - measurement_name: O3
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2003_NOx_1g.xlsx - measurement_name: NOx
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2003_C6H6_1g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2003_CO_1g.xlsx - measurement_name: CO
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2003_PM10_1g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2003_NO2_1g.xlsx - measurement_name: NO2
----------------------------------------

Year: 2004 - df_full.shape (35064, 35)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2004_O3_1g.xlsx - measurement_name: O3
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2004_NO2_1g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2004_C6H6_1g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2004_NOx_1g.xlsx - measurement_name: NOx
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2004_CO_1g.xlsx - measurement_name: CO
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2004_PM10_1g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2004_SO2_1g.xlsx - measurement_name: SO2
----------------------------------------

Year: 2005 - df_full.shape (43848, 42)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2005_CO_1g.xlsx - measurement_name: CO
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2005_NOx_1g.xlsx - measurement_name: NOx
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2005_C6H6_1g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2005_SO2_1g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2005_O3_1g.xlsx - measurement_name: O3
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2005_NO2_1g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2005_PM10_1g.xlsx - measurement_name: PM10
----------------------------------------

Year: 2006 - df_full.shape (52608, 42)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2006_PM2.5_1g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2006_SO2_1g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2006_PM10_1g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2006_NOx_1g.xlsx - measurement_name: NOx
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2006_O3_1g.xlsx - measurement_name: O3
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2006_C6H6_1g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2006_NO2_1g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2006_CO_1g.xlsx - measurement_name: CO
----------------------------------------

Year: 2007 - df_full.shape (61368, 42)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2007_PM10_1g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2007_CO_1g.xlsx - measurement_name: CO
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2007_NO2_1g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2007_C6H6_1g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2007_SO2_1g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2007_O3_1g.xlsx - measurement_name: O3
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2007_NOx_1g.xlsx - measurement_name: NOx
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2007_PM2.5_1g.xlsx - measurement_name: PM25
----------------------------------------

Year: 2008 - df_full.shape (70128, 42)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_CO_1g.xlsx - measurement_name: CO
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_NO2_1g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_C6H6_1g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_PM2.5_1g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_PM10_1g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_O3_1g.xlsx - measurement_name: O3
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_NOx_1g.xlsx - measurement_name: NOx
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_SO2_1g.xlsx - measurement_name: SO2
----------------------------------------

Year: 2009 - df_full.shape (78912, 56)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_NOx_1g.xlsx - measurement_name: NOx
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_C6H6_1g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_O3_1g.xlsx - measurement_name: O3
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_SO2_1g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_PM2.5_1g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_CO_1g.xlsx - measurement_name: CO
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_NO2_1g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_PM10_1g.xlsx - measurement_name: PM10
----------------------------------------

Year: 2010 - df_full.shape (87672, 56)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_O3_1g.xlsx - measurement_name: O3
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_NOx_1g.xlsx - measurement_name: NOx
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_C6H6_1g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_SO2_1g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_NO2_1g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_CO_1g.xlsx - measurement_name: CO
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_PM10_1g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_PM2.5_1g.xlsx - measurement_name: PM25
----------------------------------------

Year: 2011 - df_full.shape (96432, 56)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_PM2.5_1g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_CO_1g.xlsx - measurement_name: CO
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_NO2_1g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_C6H6_1g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_NOx_1g.xlsx - measurement_name: NOx
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_O3_1g.xlsx - measurement_name: O3
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_SO2_1g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_PM10_1g.xlsx - measurement_name: PM10
----------------------------------------

Year: 2012 - df_full.shape (105192, 56)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_PM10_1g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_PM2.5_1g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_NO2_1g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_O3_1g.xlsx - measurement_name: O3
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_C6H6_1g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_SO2_1g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_CO_1g.xlsx - measurement_name: CO
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_NOx_1g.xlsx - measurement_name: NOx
----------------------------------------

Year: 2013 - df_full.shape (113977, 56)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_SO2_1g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_PM10_1g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_NOx_1g.xlsx - measurement_name: NOx
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_CO_1g.xlsx - measurement_name: CO
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_C6H6_1g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_PM2.5_1g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_O3_1g.xlsx - measurement_name: O3
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_NO2_1g.xlsx - measurement_name: NO2
----------------------------------------

Year: 2014 - df_full.shape (122737, 56)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_NO2_1g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_CO_1g.xlsx - measurement_name: CO
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_PM10_1g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_C6H6_1g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_O3_1g.xlsx - measurement_name: O3
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_NOx_1g.xlsx - measurement_name: NOx
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_PM2.5_1g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_SO2_1g.xlsx - measurement_name: SO2
----------------------------------------

Year: 2015 - df_full.shape (131497, 56)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_PM10_1g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_NOx_1g.xlsx - measurement_name: NOx
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_O3_1g.xlsx - measurement_name: O3
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_SO2_1g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_PM25_1g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_Hg(TGM)_1g.xlsx - measurement_name: Hg(TGM)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_CO_1g.xlsx - measurement_name: CO
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_C6H6_1g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_NO2_1g.xlsx - measurement_name: NO2
----------------------------------------

Year: 2016 - df_full.shape (140257, 56)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_PM2.5_1g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_Hg(TGM)_1g.xlsx - measurement_name: Hg(TGM)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_SO2_1g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_CO_1g.xlsx - measurement_name: CO
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_C6H6_1g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_NOx_1g.xlsx - measurement_name: NOx
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_NO2_1g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_PM10_1g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_O3_1g.xlsx - measurement_name: O3
----------------------------------------

Year: 2017 - df_full.shape (149041, 56)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_C6H6_1g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_Hg(TGM)_1g.xlsx - measurement_name: Hg(TGM)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_NO2_1g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_O3_1g.xlsx - measurement_name: O3
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_SO2_1g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_PM25_1g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_NOx_1g.xlsx - measurement_name: NOx
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_PM10_1g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_CO_1g.xlsx - measurement_name: CO
----------------------------------------

Year: 2018 - df_full.shape (157801, 56)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_O3_1g.xlsx - measurement_name: O3
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_NO2_1g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_PM25_1g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_Hg(TGM)_1g.xlsx - measurement_name: Hg(TGM)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_PM10_1g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_NOx_1g.xlsx - measurement_name: NOx
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_CO_1g.xlsx - measurement_name: CO
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_C6H6_1g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_SO2_1g.xlsx - measurement_name: SO2
----------------------------------------

Year: 2019 - df_full.shape (166561, 56)
----------------------------------------

CPU times: user 13min 43s, sys: 4.54 s, total: 13min 48s
Wall time: 13min 50s
In [25]:
df_1g.shape
Out[25]:
(166561, 56)
In [26]:
df_1g.head()
Out[26]:
C6H6_max C6H6_mean C6H6_median C6H6_min C6H6_obs_num C6H6_std C6H6_sum CO_max CO_mean CO_median CO_min CO_obs_num CO_std CO_sum NO2_max NO2_mean NO2_median NO2_min NO2_obs_num NO2_std NO2_sum NOx_max NOx_mean NOx_median NOx_min NOx_obs_num NOx_std NOx_sum O3_max O3_mean O3_median O3_min O3_obs_num O3_std O3_sum PM10_max PM10_mean PM10_median PM10_min PM10_obs_num PM10_std PM10_sum PM25_max PM25_mean PM25_median PM25_min PM25_obs_num PM25_std PM25_sum SO2_max SO2_mean SO2_median SO2_min SO2_obs_num SO2_std SO2_sum
Datetime
2000-01-01 01:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 NaN 0.0 NaN NaN NaN NaN 0.0 NaN 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2000-01-01 02:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 62.0 49.333333 48.0 38.0 3.0 12.055428 148.0 170.0 121.000000 105.0 88.0 3.0 43.278170 363.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2000-01-01 03:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 56.0 46.666667 47.0 37.0 3.0 9.504385 140.0 181.0 116.000000 96.0 71.0 3.0 57.662813 348.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2000-01-01 04:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 52.0 44.666667 46.0 36.0 3.0 8.082904 134.0 162.0 115.333333 106.0 78.0 3.0 42.770706 346.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2000-01-01 05:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 53.0 43.666667 43.0 35.0 3.0 9.018500 131.0 154.0 113.000000 105.0 80.0 3.0 37.643060 339.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
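The column suffixes (`_max`, `_mean`, `_median`, `_min`, `_obs_num`, `_std`, `_sum`) suggest that each pollutant is summarised across the selected stations for every timestamp. The body of `build_gios_analytical_view` is not shown in this output, so the following is only a minimal sketch of how such columns could be produced with pandas. The helper name `summarise_across_stations` and the station column names are hypothetical. The three NO2 readings (62, 48, 38) are chosen so that the second row reproduces the `2000-01-01 02:00:00` values printed above.

```python
import numpy as np
import pandas as pd

# Hypothetical per-station readings for one pollutant (columns = stations).
# The first row has no observations at all, the second has three.
stations = pd.DataFrame(
    {
        "station_a": [np.nan, 62.0],
        "station_b": [np.nan, 48.0],
        "station_c": [np.nan, 38.0],
    },
    index=pd.DatetimeIndex(
        [pd.Timestamp("2000-01-01 01:00:00"), pd.Timestamp("2000-01-01 02:00:00")],
        name="Datetime",
    ),
)


def summarise_across_stations(df: pd.DataFrame, name: str) -> pd.DataFrame:
    """Row-wise summary statistics over station columns, ignoring NaN values."""
    return pd.DataFrame(
        {
            f"{name}_max": df.max(axis=1),
            f"{name}_mean": df.mean(axis=1),
            f"{name}_median": df.median(axis=1),
            f"{name}_min": df.min(axis=1),
            # Number of stations that reported a value for this timestamp
            f"{name}_obs_num": df.count(axis=1).astype(float),
            # Sample standard deviation (pandas default, ddof=1)
            f"{name}_std": df.std(axis=1),
            f"{name}_sum": df.sum(axis=1),
        }
    )


no2 = summarise_across_stations(stations, "NO2")
print(no2)
```

Under these assumptions a timestamp with no readings yields `obs_num` and `sum` of 0.0 and NaN elsewhere, and a timestamp with a single reading yields a NaN standard deviation. Both patterns appear in the previews in this section.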
In [27]:
df_1g.tail()
Out[27]:
C6H6_max C6H6_mean C6H6_median C6H6_min C6H6_obs_num C6H6_std C6H6_sum CO_max CO_mean CO_median CO_min CO_obs_num CO_std CO_sum NO2_max NO2_mean NO2_median NO2_min NO2_obs_num NO2_std NO2_sum NOx_max NOx_mean NOx_median NOx_min NOx_obs_num NOx_std NOx_sum O3_max O3_mean O3_median O3_min O3_obs_num O3_std O3_sum PM10_max PM10_mean PM10_median PM10_min PM10_obs_num PM10_std PM10_sum PM25_max PM25_mean PM25_median PM25_min PM25_obs_num PM25_std PM25_sum SO2_max SO2_mean SO2_median SO2_min SO2_obs_num SO2_std SO2_sum
Datetime
2018-12-31 20:00:00 2.76298 1.713593 1.75158 0.62622 3.0 1.068886 5.14078 0.72661 0.585620 0.585620 0.44463 2.0 0.199390 1.17124 67.4538 45.557525 42.55340 29.6695 4.0 15.854688 182.2301 212.6990 97.096300 69.32805 37.0301 4.0 78.963510 388.3852 23.7920 23.7920 23.7920 23.7920 1.0 NaN 23.7920 41.8932 29.219671 28.5545 15.3653 7.0 9.677638 204.5377 25.1614 20.140967 23.6990 11.5625 3.0 7.465067 60.4229 7.25033 6.531955 6.531955 5.81358 2.0 1.015936 13.06391
2018-12-31 21:00:00 3.61236 2.154820 1.68318 1.16892 3.0 1.288190 6.46446 0.77990 0.660650 0.660650 0.54140 2.0 0.168645 1.32130 56.6802 41.029525 38.75120 29.9355 4.0 11.413497 164.1181 165.4850 81.358750 61.15985 37.6303 4.0 57.981694 325.4350 21.1737 21.1737 21.1737 21.1737 1.0 NaN 21.1737 53.3517 38.305571 37.9557 27.0842 7.0 9.636778 268.1390 35.7650 30.312100 32.6308 22.5405 3.0 6.910436 90.9363 9.69791 7.601315 7.601315 5.50472 2.0 2.965033 15.20263
2018-12-31 22:00:00 3.35900 2.026807 1.43370 1.28772 3.0 1.156020 6.08042 0.54587 0.535710 0.535710 0.52555 2.0 0.014368 1.07142 39.3984 32.127175 32.45540 24.1995 4.0 7.101207 128.5087 98.4181 55.409125 47.31220 28.5940 4.0 31.921626 221.6365 27.0917 27.0917 27.0917 27.0917 1.0 NaN 27.0917 50.7413 39.311457 37.1867 30.2702 7.0 7.212393 275.1802 35.1773 30.402933 31.0801 24.9514 3.0 5.146472 91.2088 10.91380 8.165295 8.165295 5.41679 2.0 3.886973 16.33059
2018-12-31 23:00:00 3.17358 2.017590 1.51083 1.36836 3.0 1.003648 6.05277 0.54440 0.497900 0.497900 0.45140 2.0 0.065761 0.99580 37.9001 28.491200 27.40825 21.2482 4.0 7.657392 113.9648 85.6241 46.385250 37.66430 24.5883 4.0 28.021586 185.5410 32.3864 32.3864 32.3864 32.3864 1.0 NaN 32.3864 56.5092 42.888271 44.2766 31.8605 7.0 7.730065 300.2179 34.8589 32.065400 33.2028 28.1345 3.0 3.503519 96.1962 11.73910 8.826955 8.826955 5.91481 2.0 4.118395 17.65391
2019-01-01 00:00:00 2.78365 1.957933 1.68273 1.40742 3.0 0.728220 5.87380 0.56017 0.515095 0.515095 0.47002 2.0 0.063746 1.03019 37.5347 27.325025 25.80380 20.1578 4.0 8.543424 109.3001 79.4643 43.043625 35.21135 22.2875 4.0 26.750338 172.1745 34.5747 34.5747 34.5747 34.5747 1.0 NaN 34.5747 58.9693 48.698329 49.1963 37.4338 7.0 7.329880 340.8883 44.9021 38.654567 36.6074 34.4542 3.0 5.516595 115.9637 11.99750 9.130160 9.130160 6.26282 2.0 4.055031 18.26032
In [28]:
df_1g.sample(5)
Out[28]:
C6H6_max C6H6_mean C6H6_median C6H6_min C6H6_obs_num C6H6_std C6H6_sum CO_max CO_mean CO_median CO_min CO_obs_num CO_std CO_sum NO2_max NO2_mean NO2_median NO2_min NO2_obs_num NO2_std NO2_sum NOx_max NOx_mean NOx_median NOx_min NOx_obs_num NOx_std NOx_sum O3_max O3_mean O3_median O3_min O3_obs_num O3_std O3_sum PM10_max PM10_mean PM10_median PM10_min PM10_obs_num PM10_std PM10_sum PM25_max PM25_mean PM25_median PM25_min PM25_obs_num PM25_std PM25_sum SO2_max SO2_mean SO2_median SO2_min SO2_obs_num SO2_std SO2_sum
Datetime
2018-03-09 02:00:00 14.6407 9.556283 8.371350 5.65680 3.0 4.607675 28.66885 1.84287 1.65907 1.65907 1.47527 2.0 0.259932 3.31814 67.5374 52.612175 54.03655 34.83820 4.0 13.492997 210.44870 533.433 326.583000 286.147 200.605 4.0 148.347785 1306.332 2.58988 2.58988 2.58988 2.58988 1.0 NaN 2.58988 149.5000 114.131813 110.0360 86.5432 8.0 21.258410 913.0545 141.4940 101.296267 86.70480 75.69000 3.0 35.245209 303.8888 6.98110 6.923020 6.923020 6.86494 2.0 0.082138 13.84604
2004-06-18 23:00:00 NaN NaN NaN NaN NaN NaN NaN 1.00000 0.70000 0.70000 0.40000 2.0 0.424264 1.40000 55.0000 45.000000 44.00000 36.00000 3.0 9.539392 135.00000 124.000 72.333333 54.000 39.000 3.0 45.368859 217.000 NaN NaN NaN NaN NaN NaN NaN 54.0000 36.500000 36.5000 19.0000 2.0 24.748737 73.0000 NaN NaN NaN NaN NaN NaN NaN 4.00000 4.000000 4.000000 4.00000 2.0 0.000000 8.00000
2015-06-08 14:00:19.020000 0.3000 0.300000 0.300000 0.30000 1.0 NaN 0.30000 0.49916 0.39086 0.39086 0.28256 2.0 0.153159 0.78172 71.4505 30.252313 11.31010 7.99634 3.0 35.717127 90.75694 161.895 61.559000 12.100 10.682 3.0 86.896417 184.677 92.99680 92.99680 92.99680 92.99680 1.0 NaN 92.99680 39.4836 35.250333 36.9433 29.3241 3.0 5.287103 105.7510 13.8651 11.458133 11.10930 9.40000 3.0 2.252897 34.3744 6.76999 4.731125 4.731125 2.69226 2.0 2.883391 9.46225
2008-11-18 14:00:00 0.7000 0.700000 0.700000 0.70000 1.0 NaN 0.70000 0.16000 0.16000 0.16000 0.16000 1.0 NaN 0.16000 81.0000 44.333333 30.00000 22.00000 3.0 32.005208 133.00000 276.000 119.666667 52.000 31.000 3.0 135.795189 359.000 40.00000 40.00000 40.00000 40.00000 1.0 NaN 40.00000 59.0000 35.333333 29.0000 18.0000 3.0 21.221059 106.0000 15.0000 14.000000 14.00000 13.00000 2.0 1.414214 28.0000 26.00000 17.333333 23.000000 3.00000 3.0 12.503333 52.00000
2016-06-05 15:00:00 1.0000 0.918085 0.918085 0.83617 2.0 0.115845 1.83617 0.80519 0.55546 0.55546 0.30573 2.0 0.353172 1.11092 75.5272 44.189950 38.83565 23.56130 4.0 23.243374 176.75980 212.400 94.433333 38.700 32.200 3.0 102.213812 283.300 68.99890 68.99890 68.99890 68.99890 1.0 NaN 68.99890 33.7213 18.862950 15.1281 12.7374 6.0 8.253398 113.1777 16.4247 10.254800 7.68853 6.65117 3.0 5.368406 30.7644 2.54552 2.045820 2.045820 1.54612 2.0 0.706683 4.09164
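Two details in the previews are worth auditing. `df_1g.shape` reports 166,561 rows, while the span shown by `head()` and `tail()` (2000-01-01 01:00:00 through 2019-01-01 00:00:00) contains 166,560 whole hours. The sample above also includes a timestamp that is not on the hour (`2015-06-08 14:00:19.020000`). The sketch below shows one possible way to look for such irregularities, assuming the index is a pandas `DatetimeIndex`. The helper `audit_hourly_index` and the toy index are hypothetical and not part of the original pipeline.

```python
import pandas as pd

ONE_HOUR = pd.Timedelta(hours=1)


def audit_hourly_index(index: pd.DatetimeIndex) -> dict:
    """Summarise how an index deviates from a regular hourly grid."""
    # Hourly stamps expected between the first and last timestamp, inclusive
    expected_rows = int((index.max() - index.min()) / ONE_HOUR) + 1
    # Timestamps with non-zero minutes, seconds or sub-second parts
    off_grid = index[index != index.floor(ONE_HOUR)]
    # Second and later occurrences of repeated timestamps
    duplicated = index[index.duplicated()]
    return {
        "expected_rows": expected_rows,
        "actual_rows": len(index),
        "off_grid": off_grid,
        "duplicated": duplicated,
    }


# Toy index: four hourly slots, one stamp shifted off the hour, one repeated
base = pd.Timestamp("2015-06-08 12:00:00")
toy_index = pd.DatetimeIndex(
    [
        base,
        base + pd.Timedelta(hours=1),
        base + pd.Timedelta(hours=2, seconds=19, milliseconds=20),
        base + pd.Timedelta(hours=3),
        base + pd.Timedelta(hours=3),
    ]
)
report = audit_hourly_index(toy_index)
print(report)

# Calendar check for the full period covered by the hourly view
full_grid = pd.date_range("2000-01-01 01:00:00", "2019-01-01 00:00:00", freq=ONE_HOUR)
print(len(full_grid))
```

If the notebook's index were parsed as timestamps, `audit_hourly_index(df_1g.index)` would list the candidate rows to inspect. Whether they should be dropped, rounded, or kept is a separate decision.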
In [29]:
# Create the save directory if it does not exist
save_dir = '/Users/ksatola/Documents/git/air-polution/data/final'
Path(save_dir).mkdir(parents=True, exist_ok=True)
In [30]:
# Save the hourly (1g) analytical view as CSV in the save directory
gios_1g_all_file = Path(save_dir) / 'gios_1g_all.csv'
df_1g.to_csv(gios_1g_all_file, encoding="utf-8", index=True)
In [31]:
# Read the file back to verify the export
df_1g_read = pd.read_csv(gios_1g_all_file, encoding='utf-8', sep=",", index_col="Datetime")
df_1g_read.head()
Out[31]:
C6H6_max C6H6_mean C6H6_median C6H6_min C6H6_obs_num C6H6_std C6H6_sum CO_max CO_mean CO_median CO_min CO_obs_num CO_std CO_sum NO2_max NO2_mean NO2_median NO2_min NO2_obs_num NO2_std NO2_sum NOx_max NOx_mean NOx_median NOx_min NOx_obs_num NOx_std NOx_sum O3_max O3_mean O3_median O3_min O3_obs_num O3_std O3_sum PM10_max PM10_mean PM10_median PM10_min PM10_obs_num PM10_std PM10_sum PM25_max PM25_mean PM25_median PM25_min PM25_obs_num PM25_std PM25_sum SO2_max SO2_mean SO2_median SO2_min SO2_obs_num SO2_std SO2_sum
Datetime
2000-01-01 01:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 NaN 0.0 NaN NaN NaN NaN 0.0 NaN 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2000-01-01 02:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 62.0 49.333333 48.0 38.0 3.0 12.055428 148.0 170.0 121.000000 105.0 88.0 3.0 43.278170 363.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2000-01-01 03:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 56.0 46.666667 47.0 37.0 3.0 9.504385 140.0 181.0 116.000000 96.0 71.0 3.0 57.662813 348.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2000-01-01 04:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 52.0 44.666667 46.0 36.0 3.0 8.082904 134.0 162.0 115.333333 106.0 78.0 3.0 42.770706 346.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2000-01-01 05:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 53.0 43.666667 43.0 35.0 3.0 9.018500 131.0 154.0 113.000000 105.0 80.0 3.0 37.643060 339.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
In [32]:
assert df_1g.shape == df_1g_read.shape
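
The shape assertion confirms only that no rows or columns were lost. A stricter round-trip check would also compare the values and the index. The sketch below uses a small in-memory stand-in instead of `df_1g`, and its column names and values are illustrative. It also assumes the index should come back as timestamps, so it passes `parse_dates=True`, which the test read above does not do.

```python
import io

import numpy as np
import pandas as pd

# Small stand-in for the analytical view (illustrative values only)
index = pd.DatetimeIndex(
    [pd.Timestamp("2000-01-01 01:00:00") + pd.Timedelta(hours=i) for i in range(4)],
    name="Datetime",
)
df_small = pd.DataFrame(
    {
        "NO2_mean": [np.nan, 49.333333, 46.666667, 44.666667],
        "NO2_obs_num": [0.0, 3.0, 3.0, 3.0],
    },
    index=index,
)

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df_small.to_csv(buffer, index=True)
buffer.seek(0)

# Parse the index column back into timestamps while reading
df_small_read = pd.read_csv(buffer, index_col="Datetime", parse_dates=True)

# Compares shape, column names, index values and cell values
# (floats are compared with a small tolerance by default)
pd.testing.assert_frame_equal(df_small, df_small_read, check_freq=False)
print("round trip OK")
```

`assert_frame_equal` raises an `AssertionError` describing the first mismatch, which tends to be more informative than a bare shape comparison.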

Build 24g analytical view

In [34]:
%%time

df_24g = build_gios_analytical_view(years=years, sampling_freq='24g', root_folder=source_dir, ems_codes=ems_codes)
Year: 2000 - df_full.shape (0, 0)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2000_NO2_24g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2000_SO2_24g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2000_PM10_24g.xlsx - measurement_name: PM10
----------------------------------------

Year: 2001 - df_full.shape (366, 14)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2001_PM10_24g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2001_Ni(PM10)_24g.xlsx - measurement_name: Ni(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2001_NO2_24g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2001_BaP(PM10)_24g.xlsx - measurement_name: BaP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2001_SO2_24g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2001_C6H6_24g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2001_Cd(PM10)_24g.xlsx - measurement_name: Cd(PM10)
----------------------------------------

Year: 2002 - df_full.shape (731, 14)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2002_Ni(PM10)_24g.xlsx - measurement_name: Ni(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2002_BaP(PM10)_24g.xlsx - measurement_name: BaP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2002_SO2_24g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2002_PM10_24g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2002_Cd(PM10)_24g.xlsx - measurement_name: Cd(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2002_NO2_24g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2002_PM2.5_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2002_C6H6_24g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2002_As(PM10)_24g.xlsx - measurement_name: As(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2002_Pb(PM10)_24g.xlsx - measurement_name: Pb(PM10)
----------------------------------------

Year: 2003 - df_full.shape (1096, 21)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2003_Ni(PM10)_24g.xlsx - measurement_name: Ni(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2003_C6H6_24g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2003_SO2_24g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2003_NO2_24g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2003_PM10_24g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2003_BaP(PM10)_24g.xlsx - measurement_name: BaP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2003_Cd(PM10)_24g.xlsx - measurement_name: Cd(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2003_As(PM10)_24g.xlsx - measurement_name: As(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2003_Pb(PM10)_24g.xlsx - measurement_name: Pb(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2003_PM2.5_24g.xlsx - measurement_name: PM25
----------------------------------------

Year: 2004 - df_full.shape (1461, 28)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2004_C6H6_24g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2004_BaP(PM10)_24g.xlsx - measurement_name: BaP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2004_Ni(PM10)_24g.xlsx - measurement_name: Ni(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2004_SO2_24g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2004_NO2_24g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2004_As(PM10)_24g.xlsx - measurement_name: As(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2004_Pb(PM10)_24g.xlsx - measurement_name: Pb(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2004_PM10_24g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2004_Cd(PM10)_24g.xlsx - measurement_name: Cd(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2004_PM2.5_24g.xlsx - measurement_name: PM25
----------------------------------------

Year: 2005 - df_full.shape (1827, 42)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2005_SO2_24g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2005_PM10_24g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2005_Ni(PM10)_24g.xlsx - measurement_name: Ni(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2005_As(PM10)_24g.xlsx - measurement_name: As(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2005_NO2_24g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2005_Pb(PM10)_24g.xlsx - measurement_name: Pb(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2005_PM2.5_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2005_C6H6_24g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2005_Cd(PM10)_24g.xlsx - measurement_name: Cd(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2005_BaP(PM10)_24g.xlsx - measurement_name: BaP(PM10)
----------------------------------------

Year: 2006 - df_full.shape (2192, 42)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2006_Ni(PM10)_24g.xlsx - measurement_name: Ni(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2006_NO2_24g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2006_PM10_24g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2006_SO2_24g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2006_PM2.5_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2006_Cd(PM10)_24g.xlsx - measurement_name: Cd(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2006_As(PM10)_24g.xlsx - measurement_name: As(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2006_Pb(PM10)_24g.xlsx - measurement_name: Pb(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2006_BaP(PM10)_24g.xlsx - measurement_name: BaP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2006_C6H6_24g.xlsx - measurement_name: C6H6
----------------------------------------

Year: 2007 - df_full.shape (2557, 42)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2007_BaP(PM10)_24g.xlsx - measurement_name: BaP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2007_C6H6_24g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2007_Ni(PM10)_24g.xlsx - measurement_name: Ni(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2007_NO2_24g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2007_PM10_24g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2007_Cd(PM10)_24g.xlsx - measurement_name: Cd(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2007_SO2_24g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2007_PM2.5_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2007_As(PM10)_24g.xlsx - measurement_name: As(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2007_Pb(PM10)_24g.xlsx - measurement_name: Pb(PM10)
----------------------------------------

Year: 2008 - df_full.shape (2922, 49)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_IP(PM10)_24g.xlsx - measurement_name: IP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_Cd(PM10)_24g.xlsx - measurement_name: Cd(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_PM10_24g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_DBahA(PM10)_24g.xlsx - measurement_name: DBahA(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_SO2_24g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_BaA(PM10)_24g.xlsx - measurement_name: BaA(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_As(PM10)_24g.xlsx - measurement_name: As(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_Pb(PM10)_24g.xlsx - measurement_name: Pb(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_C6H6_24g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_PM2.5_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_Ni(PM10)_24g.xlsx - measurement_name: Ni(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_BkF(PM10)_24g.xlsx - measurement_name: BkF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_NO2_24g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_BaP(PM10)_24g.xlsx - measurement_name: BaP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_BjF(PM10)_24g.xlsx - measurement_name: BjF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2008_BbF(PM10)_24g.xlsx - measurement_name: BbF(PM10)
----------------------------------------

Year: 2009 - df_full.shape (3288, 70)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_BbF(PM10)_24g.xlsx - measurement_name: BbF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_SO2_24g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_IP(PM10)_24g.xlsx - measurement_name: IP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_BjF(PM10)_24g.xlsx - measurement_name: BjF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_Cd(PM10)_24g.xlsx - measurement_name: Cd(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_BaP(PM10)_24g.xlsx - measurement_name: BaP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_As(PM10)_24g.xlsx - measurement_name: As(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_Pb(PM10)_24g.xlsx - measurement_name: Pb(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_DBahA(PM10)_24g.xlsx - measurement_name: DBahA(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_BkF(PM10)_24g.xlsx - measurement_name: BkF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_C6H6_24g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_Ni(PM10)_24g.xlsx - measurement_name: Ni(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_PM2.5_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_BaA(PM10)_24g.xlsx - measurement_name: BaA(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_PM10_24g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2009_NO2_24g.xlsx - measurement_name: NO2
----------------------------------------

Year: 2010 - df_full.shape (3653, 105)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_BaA(PM10)_24g.xlsx - measurement_name: BaA(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_C6H6_24g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_Cd(PM10)_24g.xlsx - measurement_name: Cd(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_DBahA(PM10)_24g.xlsx - measurement_name: DBahA(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_IP(PM10)_24g.xlsx - measurement_name: IP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_SO2_24g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_Pb(PM10)_24g.xlsx - measurement_name: Pb(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_As(PM10)_24g.xlsx - measurement_name: As(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_PM10_24g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_Ni(PM10)_24g.xlsx - measurement_name: Ni(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_BaP(PM10)_24g.xlsx - measurement_name: BaP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_BjF(PM10)_24g.xlsx - measurement_name: BjF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_NO2_24g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_formaldehyd_24g.xlsx - measurement_name: formaldehyd
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_BbF(PM10)_24g.xlsx - measurement_name: BbF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_PM2.5_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2010_BkF(PM10)_24g.xlsx - measurement_name: BkF(PM10)
----------------------------------------

Year: 2011 - df_full.shape (4018, 105)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_Cd(PM10)_24g.xlsx - measurement_name: Cd(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_Ca2+(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_BkF(PM10)_24g.xlsx - measurement_name: BkF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_IP(PM10)_24g.xlsx - measurement_name: IP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_NH4+(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_BbF(PM10)_24g.xlsx - measurement_name: BbF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_Mg2+(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_DBahA(PM10)_24g.xlsx - measurement_name: DBahA(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_PM10_24g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_SO2_24g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_Pb(PM10)_24g.xlsx - measurement_name: Pb(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_BjF(PM10)_24g.xlsx - measurement_name: BjF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_BaP(PM10)_24g.xlsx - measurement_name: BaP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_As(PM10)_24g.xlsx - measurement_name: As(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_NO2_24g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_NO3-(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_SO42_(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_EC(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_Ni(PM10)_24g.xlsx - measurement_name: Ni(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_Na+(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_K+(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_formaldehyd_24g.xlsx - measurement_name: formaldehyd
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_BaA(PM10)_24g.xlsx - measurement_name: BaA(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_PM2.5_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_OC(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2011_C6H6_24g.xlsx - measurement_name: C6H6
----------------------------------------

Year: 2012 - df_full.shape (4383, 105)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_Pb(PM10)_24g.xlsx - measurement_name: Pb(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_formaldehyd_24g.xlsx - measurement_name: formaldehyd
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_SO42_(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_As(PM10)_24g.xlsx - measurement_name: As(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_BkF(PM10)_24g.xlsx - measurement_name: BkF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_BaP(PM10)_24g.xlsx - measurement_name: BaP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_NO2_24g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_BjF(PM10)_24g.xlsx - measurement_name: BjF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_PM10_24g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_NO3-(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_Cd(PM10)_24g.xlsx - measurement_name: Cd(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_BbF(PM10)_24g.xlsx - measurement_name: BbF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_IP(PM10)_24g.xlsx - measurement_name: IP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_EC(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_DBahA(PM10)_24g.xlsx - measurement_name: DBahA(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_Mg2+(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_SO2_24g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_Na+(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_NH4+(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_PM2.5_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_OC(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_C6H6_24g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_Ni(PM10)_24g.xlsx - measurement_name: Ni(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_BaA(PM10)_24g.xlsx - measurement_name: BaA(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_K+(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2012_Ca2+(PM2.5)_24g.xlsx - measurement_name: PM25
----------------------------------------

Year: 2013 - df_full.shape (4751, 105)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_K+(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_NO3-(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_C6H6_24g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_OC(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_Pb(PM10)_24g.xlsx - measurement_name: Pb(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_As(PM10)_24g.xlsx - measurement_name: As(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_BaA(PM10)_24g.xlsx - measurement_name: BaA(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_Cd(PM10)_24g.xlsx - measurement_name: Cd(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_IP(PM10)_24g.xlsx - measurement_name: IP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_NO2_24g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_EC(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_BbF(PM10)_24g.xlsx - measurement_name: BbF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_PM2.5_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_SO2_24g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_SO42_(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_BjF(PM10)_24g.xlsx - measurement_name: BjF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_Na+(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_BaP(PM10)_24g.xlsx - measurement_name: BaP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_PM10_24g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_Ca2+(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_BkF(PM10)_24g.xlsx - measurement_name: BkF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_DBahA(PM10)_24g.xlsx - measurement_name: DBahA(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_NH4+(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_Ni(PM10)_24g.xlsx - measurement_name: Ni(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2013_Mg2+(PM2.5)_24g.xlsx - measurement_name: PM25
----------------------------------------

Year: 2014 - df_full.shape (5481, 105)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_IP(PM10)_24g.xlsx - measurement_name: IP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_C6H6_24g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_Ca2+(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_Cd(PM10)_24g.xlsx - measurement_name: Cd(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_BbF(PM10)_24g.xlsx - measurement_name: BbF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_BaP(PM10)_24g.xlsx - measurement_name: BaP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_BjF(PM10)_24g.xlsx - measurement_name: BjF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_NH4+(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_Mg2+(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_BkF(PM10)_24g.xlsx - measurement_name: BkF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_SO42_(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_Pb(PM10)_24g.xlsx - measurement_name: Pb(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_NO2_24g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_As(PM10)_24g.xlsx - measurement_name: As(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_DBahA(PM10)_24g.xlsx - measurement_name: DBahA(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_SO2_24g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_PM10_24g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_Na+(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_K+(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_PM2.5_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_BaA(PM10)_24g.xlsx - measurement_name: BaA(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_Ni(PM10)_24g.xlsx - measurement_name: Ni(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_NO3-(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_OC(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_EC(PM2.5)_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2014_formaldehyd_24g.xlsx - measurement_name: formaldehyd
----------------------------------------

Year: 2015 - df_full.shape (5846, 112)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_IP(PM10)_24g.xlsx - measurement_name: IP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_Cd(PM10)_24g.xlsx - measurement_name: Cd(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_Jony_w_PM25_24g.xlsx - measurement_name: Jony
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_NO2_24g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_Pb(PM10)_24g.xlsx - measurement_name: Pb(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_BaA(PM10)_24g.xlsx - measurement_name: BaA(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_As(PM10)_24g.xlsx - measurement_name: As(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_DBah(PM10)_24g.xlsx - measurement_name: DBah(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_PM10_24g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_Hg(TGM)_24g.xlsx - measurement_name: Hg(TGM)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_PM25_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_Ni(PM10)_24g.xlsx - measurement_name: Ni(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_SO2_24g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_BkF(PM10)_24g.xlsx - measurement_name: BkF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_formaldehyd_24g.xlsx - measurement_name: formaldehyd
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_BjF(PM10)_24g.xlsx - measurement_name: BjF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_BaP(PM10)_24g.xlsx - measurement_name: BaP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_BbF(PM10)_24g.xlsx - measurement_name: BbF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2015_C6H6_24g.xlsx - measurement_name: C6H6
----------------------------------------

Year: 2016 - df_full.shape (6211, 119)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_Pb(PM10)_24g.xlsx - measurement_name: Pb(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_As(PM10)_24g.xlsx - measurement_name: As(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_Jony_PM2_5_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_formaldehyd_24g.xlsx - measurement_name: formaldehyd
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_IP(PM10)_24g.xlsx - measurement_name: IP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_PM10_24g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_Cd(PM10)_24g.xlsx - measurement_name: Cd(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_SO2_24g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_BaA(PM10)_24g.xlsx - measurement_name: BaA(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_NO2_24g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_BkF(PM10)_24g.xlsx - measurement_name: BkF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_Hg(TGM)_24g.xlsx - measurement_name: Hg(TGM)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_DBahA(PM10)_24g.xlsx - measurement_name: DBahA(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_C6H6_24g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_BbF(PM10)_24g.xlsx - measurement_name: BbF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_PM2.5_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_Ni(PM10)_24g.xlsx - measurement_name: Ni(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_BaP(PM10)_24g.xlsx - measurement_name: BaP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2016_BjF(PM10)_24g.xlsx - measurement_name: BjF(PM10)
----------------------------------------

Year: 2017 - df_full.shape (6577, 119)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_BjF(PM10)_24g.xlsx - measurement_name: BjF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_Pb(PM10)_24g.xlsx - measurement_name: Pb(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_BaP(PM10)_24g.xlsx - measurement_name: BaP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_As(PM10)_24g.xlsx - measurement_name: As(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_formaldehyd_24g.xlsx - measurement_name: formaldehyd
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_BbF(PM10)_24g.xlsx - measurement_name: BbF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_C6H6_24g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_Jony_PM2_5_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_SO2_24g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_IP(PM10)_24g.xlsx - measurement_name: IP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_BkF(PM10)_24g.xlsx - measurement_name: BkF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_Cd(PM10)_24g.xlsx - measurement_name: Cd(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_DBahA(PM10)_24g.xlsx - measurement_name: DBahA(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_BaA(PM10)_24g.xlsx - measurement_name: BaA(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_PM25_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_NO2_24g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_PM10_24g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_Ni(PM10)_24g.xlsx - measurement_name: Ni(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2017_Hg(TGM)_24g.xlsx - measurement_name: Hg(TGM)
----------------------------------------

Year: 2018 - df_full.shape (6942, 119)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_Jony_PM25_24g.xlsx - measurement_name: Jony
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_formaldehyd_24g.xlsx - measurement_name: formaldehyd
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_PM10_24g.xlsx - measurement_name: PM10
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_NO2_24g.xlsx - measurement_name: NO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_PM25_24g.xlsx - measurement_name: PM25
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_Ni(PM10)_24g.xlsx - measurement_name: Ni(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_BaA(PM10)_24g.xlsx - measurement_name: BaA(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_Pb(PM10)_24g.xlsx - measurement_name: Pb(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_As(PM10)_24g.xlsx - measurement_name: As(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_DBahA(PM10)_24g.xlsx - measurement_name: DBahA(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_C6H6_24g.xlsx - measurement_name: C6H6
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_BkF(PM10)_24g.xlsx - measurement_name: BkF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_Hg(TGM)_24g.xlsx - measurement_name: Hg(TGM)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_BjF(PM10)_24g.xlsx - measurement_name: BjF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_SO2_24g.xlsx - measurement_name: SO2
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_BaP(PM10)_24g.xlsx - measurement_name: BaP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_IP(PM10)_24g.xlsx - measurement_name: IP(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_BbF(PM10)_24g.xlsx - measurement_name: BbF(PM10)
File: /Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/2018_Cd(PM10)_24g.xlsx - measurement_name: Cd(PM10)
----------------------------------------

Year: 2019 - df_full.shape (7307, 119)
----------------------------------------

CPU times: user 44.3 s, sys: 495 ms, total: 44.8 s
Wall time: 45.8 s
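The log above shows that file names do not always map one-to-one to measurement names. Ion and carbon files such as `2011_Ca2+(PM2.5)_24g.xlsx` or `2011_EC(PM2.5)_24g.xlsx` are reported as `PM25`, `2015_Jony_w_PM25_24g.xlsx` and `2018_Jony_PM25_24g.xlsx` as `Jony`, and `2015_DBah(PM10)_24g.xlsx` as `DBah(PM10)` rather than `DBahA(PM10)`. If the intent is to keep PM2.5 composition files separate from the PM2.5 mass series, a stricter parser could look like the sketch below. The function `parse_measurement_name`, the regular expression, and the `ALIASES` table are hypothetical and not part of the original pipeline. Treating `DBah(PM10)` as the same compound as `DBahA(PM10)` is an assumption that should be checked against the GIOS metadata.

```python
import re
from pathlib import Path

# Hypothetical alias table: spelling variants mapped to one canonical name.
# ASSUMPTION: DBah(PM10) is the same compound as DBahA(PM10); verify first.
ALIASES = {
    "PM2.5": "PM25",
    "DBah(PM10)": "DBahA(PM10)",
}

# Expected stem layout: <year>_<measurement>_<1g|24g>
FILENAME_RE = re.compile(r"^(?P<year>\d{4})_(?P<name>.+)_(?P<freq>1g|24g)$")


def parse_measurement_name(path):
    """Return the full measurement name encoded in a GIOS-style file name."""
    stem = Path(path).stem  # drops the directory and the .xlsx suffix
    match = FILENAME_RE.match(stem)
    if match is None:
        raise ValueError(f"Unexpected file name layout: {path}")
    name = match.group("name")
    # Keep the whole name, so Ca2+(PM2.5) does not collapse into PM25.
    return ALIASES.get(name, name)


for file_name in [
    "2008_PM2.5_24g.xlsx",
    "2011_Ca2+(PM2.5)_24g.xlsx",
    "2016_Jony_PM2_5_24g.xlsx",
    "2015_DBah(PM10)_24g.xlsx",
]:
    print(file_name, "->", parse_measurement_name(file_name))
```

Keeping the full name makes it possible to decide explicitly, per measurement, whether a file should be merged into an existing pollutant column or skipped.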
In [35]:
df_24g.shape
Out[35]:
(7307, 119)
In [36]:
df_24g.head()
Out[36]:
As(PM10)_max As(PM10)_mean As(PM10)_median As(PM10)_min As(PM10)_obs_num As(PM10)_std As(PM10)_sum BaA(PM10)_max BaA(PM10)_mean BaA(PM10)_median BaA(PM10)_min BaA(PM10)_obs_num BaA(PM10)_std BaA(PM10)_sum BaP(PM10)_max BaP(PM10)_mean BaP(PM10)_median BaP(PM10)_min BaP(PM10)_obs_num BaP(PM10)_std BaP(PM10)_sum BbF(PM10)_max BbF(PM10)_mean BbF(PM10)_median BbF(PM10)_min BbF(PM10)_obs_num BbF(PM10)_std BbF(PM10)_sum BjF(PM10)_max BjF(PM10)_mean BjF(PM10)_median BjF(PM10)_min BjF(PM10)_obs_num BjF(PM10)_std BjF(PM10)_sum BkF(PM10)_max BkF(PM10)_mean BkF(PM10)_median BkF(PM10)_min BkF(PM10)_obs_num BkF(PM10)_std BkF(PM10)_sum C6H6_max C6H6_mean C6H6_median C6H6_min C6H6_obs_num C6H6_std C6H6_sum Cd(PM10)_max Cd(PM10)_mean Cd(PM10)_median Cd(PM10)_min Cd(PM10)_obs_num Cd(PM10)_std Cd(PM10)_sum DBah(PM10)_max DBah(PM10)_mean DBah(PM10)_median DBah(PM10)_min DBah(PM10)_obs_num DBah(PM10)_std DBah(PM10)_sum DBahA(PM10)_max DBahA(PM10)_mean DBahA(PM10)_median DBahA(PM10)_min DBahA(PM10)_obs_num DBahA(PM10)_std DBahA(PM10)_sum IP(PM10)_max IP(PM10)_mean IP(PM10)_median IP(PM10)_min IP(PM10)_obs_num IP(PM10)_std IP(PM10)_sum NO2_max NO2_mean NO2_median NO2_min NO2_obs_num NO2_std NO2_sum Ni(PM10)_max Ni(PM10)_mean Ni(PM10)_median Ni(PM10)_min Ni(PM10)_obs_num Ni(PM10)_std Ni(PM10)_sum PM10_max PM10_mean PM10_median PM10_min PM10_obs_num PM10_std PM10_sum PM25_max PM25_mean PM25_median PM25_min PM25_obs_num PM25_std PM25_sum Pb(PM10)_max Pb(PM10)_mean Pb(PM10)_median Pb(PM10)_min Pb(PM10)_obs_num Pb(PM10)_std Pb(PM10)_sum SO2_max SO2_mean SO2_median SO2_min SO2_obs_num SO2_std SO2_sum
Datetime
2000-01-01 00:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 135.9 132.95 132.95 130.0 2.0 4.171930 265.9 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 106.0 106.0 106.0 106.0 1.0 NaN 106.0
2000-01-02 00:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 129.1 122.55 122.55 116.0 2.0 9.263099 245.1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 93.0 93.0 93.0 93.0 1.0 NaN 93.0
2000-01-03 00:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 41.2 37.10 37.10 33.0 2.0 5.798276 74.2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 42.0 42.0 42.0 42.0 1.0 NaN 42.0
2000-01-04 00:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 36.4 31.20 31.20 26.0 2.0 7.353911 62.4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 38.0 38.0 38.0 38.0 1.0 NaN 38.0
2000-01-05 00:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 33.9 28.95 28.95 24.0 2.0 7.000357 57.9 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 33.0 33.0 33.0 33.0 1.0 NaN 33.0
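Each pollutant is summarised across stations by seven columns (`_max`, `_mean`, `_median`, `_min`, `_obs_num`, `_std`, `_sum`), computed over the non-NaN station values for a given day. A minimal sketch of such a row-wise aggregation follows. The function `aggregate_stations` and the station column names are hypothetical, and this is not necessarily how the notebook implements it. The first toy row uses two values consistent with the PM10 minimum and maximum shown for 2000-01-01 (130.0 and 135.9); the other rows are made up to show the single-observation and no-observation cases.

```python
import numpy as np
import pandas as pd


def aggregate_stations(df_stations, name):
    """Collapse per-station columns into one set of statistics per row."""
    obs_num = df_stations.count(axis=1)  # number of non-NaN stations
    return pd.DataFrame({
        f"{name}_max": df_stations.max(axis=1),
        f"{name}_mean": df_stations.mean(axis=1),
        f"{name}_median": df_stations.median(axis=1),
        f"{name}_min": df_stations.min(axis=1),
        # Report NaN instead of 0 when no station has a value.
        f"{name}_obs_num": obs_num.where(obs_num > 0).astype(float),
        # Sample standard deviation (ddof=1): NaN for a single observation.
        f"{name}_std": df_stations.std(axis=1),
        # min_count=1 keeps the sum NaN when every station is NaN.
        f"{name}_sum": df_stations.sum(axis=1, min_count=1),
    })


stations = pd.DataFrame(
    {
        "station_a": [130.0, 116.0, np.nan],   # hypothetical station columns
        "station_b": [135.9, np.nan, np.nan],
    },
    index=pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-03"]),
)
stations.index.name = "Datetime"

pm10 = aggregate_stations(stations, "PM10")
print(pm10)
```

With two observations the standard deviation is defined, with one it is NaN, which matches the pattern visible in the `SO2_std` column above where `SO2_obs_num` is 1.0.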
In [37]:
df_24g.tail()
Out[37]:
As(PM10)_max As(PM10)_mean As(PM10)_median As(PM10)_min As(PM10)_obs_num As(PM10)_std As(PM10)_sum BaA(PM10)_max BaA(PM10)_mean BaA(PM10)_median BaA(PM10)_min BaA(PM10)_obs_num BaA(PM10)_std BaA(PM10)_sum BaP(PM10)_max BaP(PM10)_mean BaP(PM10)_median BaP(PM10)_min BaP(PM10)_obs_num BaP(PM10)_std BaP(PM10)_sum BbF(PM10)_max BbF(PM10)_mean BbF(PM10)_median BbF(PM10)_min BbF(PM10)_obs_num BbF(PM10)_std BbF(PM10)_sum BjF(PM10)_max BjF(PM10)_mean BjF(PM10)_median BjF(PM10)_min BjF(PM10)_obs_num BjF(PM10)_std BjF(PM10)_sum BkF(PM10)_max BkF(PM10)_mean BkF(PM10)_median BkF(PM10)_min BkF(PM10)_obs_num BkF(PM10)_std BkF(PM10)_sum C6H6_max C6H6_mean C6H6_median C6H6_min C6H6_obs_num C6H6_std C6H6_sum Cd(PM10)_max Cd(PM10)_mean Cd(PM10)_median Cd(PM10)_min Cd(PM10)_obs_num Cd(PM10)_std Cd(PM10)_sum DBah(PM10)_max DBah(PM10)_mean DBah(PM10)_median DBah(PM10)_min DBah(PM10)_obs_num DBah(PM10)_std DBah(PM10)_sum DBahA(PM10)_max DBahA(PM10)_mean DBahA(PM10)_median DBahA(PM10)_min DBahA(PM10)_obs_num DBahA(PM10)_std DBahA(PM10)_sum IP(PM10)_max IP(PM10)_mean IP(PM10)_median IP(PM10)_min IP(PM10)_obs_num IP(PM10)_std IP(PM10)_sum NO2_max NO2_mean NO2_median NO2_min NO2_obs_num NO2_std NO2_sum Ni(PM10)_max Ni(PM10)_mean Ni(PM10)_median Ni(PM10)_min Ni(PM10)_obs_num Ni(PM10)_std Ni(PM10)_sum PM10_max PM10_mean PM10_median PM10_min PM10_obs_num PM10_std PM10_sum PM25_max PM25_mean PM25_median PM25_min PM25_obs_num PM25_std PM25_sum Pb(PM10)_max Pb(PM10)_mean Pb(PM10)_median Pb(PM10)_min Pb(PM10)_obs_num Pb(PM10)_std Pb(PM10)_sum SO2_max SO2_mean SO2_median SO2_min SO2_obs_num SO2_std SO2_sum
Datetime
2018-12-27 00:00:00 1.06 0.780000 0.78 0.5 2.0 0.395980 1.56 6.02000 6.02000 6.02000 6.02000 1.0 NaN 6.02000 6.07 4.752500 4.745 3.45 4.0 1.218589 19.01000 2.69000 2.69000 2.69000 2.69000 1.0 NaN 2.69000 2.08000 2.08000 2.08000 2.08000 1.0 NaN 2.08000 2.00000 2.00000 2.00000 2.00000 1.0 NaN 2.00000 NaN NaN NaN NaN NaN NaN NaN 0.37000 0.315000 0.315 0.26 2.0 0.077782 0.63000 NaN NaN NaN NaN NaN NaN NaN 0.4700 0.4700 0.4700 0.4700 1.0 NaN 0.4700 3.99000 3.99000 3.99000 3.99000 1.0 NaN 3.99000 NaN NaN NaN NaN NaN NaN NaN 1.00 0.625000 0.62500 0.25 2.0 0.530330 1.25000 19.36 18.195 18.87 15.68 4.0 1.713816 72.78 16.32 16.32 16.32 16.32 1.0 NaN 16.32 0.00969 0.007035 0.007035 0.00438 2.0 0.003755 0.01407 NaN NaN NaN NaN NaN NaN NaN
2018-12-28 00:00:00 1.06 0.686667 0.50 0.5 3.0 0.323316 2.06 6.02000 6.02000 6.02000 6.02000 1.0 NaN 6.02000 6.07 4.768000 4.830 3.45 5.0 1.055898 23.84000 2.69000 2.69000 2.69000 2.69000 1.0 NaN 2.69000 2.08000 2.08000 2.08000 2.08000 1.0 NaN 2.08000 2.00000 2.00000 2.00000 2.00000 1.0 NaN 2.00000 NaN NaN NaN NaN NaN NaN NaN 0.37000 0.306667 0.290 0.26 3.0 0.056862 0.92000 NaN NaN NaN NaN NaN NaN NaN 0.4700 0.4700 0.4700 0.4700 1.0 NaN 0.4700 3.99000 3.99000 3.99000 3.99000 1.0 NaN 3.99000 NaN NaN NaN NaN NaN NaN NaN 1.94 1.063333 1.00000 0.25 3.0 0.846778 3.19000 27.65 22.016 23.81 11.99 5.0 6.295084 110.08 23.42 23.42 23.42 23.42 1.0 NaN 23.42 0.01195 0.008673 0.009690 0.00438 3.0 0.003886 0.02602 NaN NaN NaN NaN NaN NaN NaN
2018-12-29 00:00:00 1.06 0.686667 0.50 0.5 3.0 0.323316 2.06 6.02000 6.02000 6.02000 6.02000 1.0 NaN 6.02000 6.07 4.768000 4.830 3.45 5.0 1.055898 23.84000 2.69000 2.69000 2.69000 2.69000 1.0 NaN 2.69000 2.08000 2.08000 2.08000 2.08000 1.0 NaN 2.08000 2.00000 2.00000 2.00000 2.00000 1.0 NaN 2.00000 NaN NaN NaN NaN NaN NaN NaN 0.37000 0.306667 0.290 0.26 3.0 0.056862 0.92000 NaN NaN NaN NaN NaN NaN NaN 0.4700 0.4700 0.4700 0.4700 1.0 NaN 0.4700 3.99000 3.99000 3.99000 3.99000 1.0 NaN 3.99000 NaN NaN NaN NaN NaN NaN NaN 1.94 1.063333 1.00000 0.25 3.0 0.846778 3.19000 23.15 18.350 17.70 14.50 5.0 3.357871 91.75 20.64 20.64 20.64 20.64 1.0 NaN 20.64 0.01195 0.008673 0.009690 0.00438 3.0 0.003886 0.02602 NaN NaN NaN NaN NaN NaN NaN
2018-12-30 00:00:00 1.06 0.686667 0.50 0.5 3.0 0.323316 2.06 6.02000 6.02000 6.02000 6.02000 1.0 NaN 6.02000 6.07 4.768000 4.830 3.45 5.0 1.055898 23.84000 2.69000 2.69000 2.69000 2.69000 1.0 NaN 2.69000 2.08000 2.08000 2.08000 2.08000 1.0 NaN 2.08000 2.00000 2.00000 2.00000 2.00000 1.0 NaN 2.00000 NaN NaN NaN NaN NaN NaN NaN 0.37000 0.306667 0.290 0.26 3.0 0.056862 0.92000 NaN NaN NaN NaN NaN NaN NaN 0.4700 0.4700 0.4700 0.4700 1.0 NaN 0.4700 3.99000 3.99000 3.99000 3.99000 1.0 NaN 3.99000 NaN NaN NaN NaN NaN NaN NaN 1.94 1.063333 1.00000 0.25 3.0 0.846778 3.19000 22.84 17.136 16.70 13.48 5.0 3.579648 85.68 19.74 19.74 19.74 19.74 1.0 NaN 19.74 0.01195 0.008673 0.009690 0.00438 3.0 0.003886 0.02602 NaN NaN NaN NaN NaN NaN NaN
2018-12-31 00:00:00 0.50 0.500000 0.50 0.5 3.0 0.000000 1.50 4.21025 4.21025 4.21025 4.21025 1.0 NaN 4.21025 4.48 3.569046 3.780 2.36 5.0 0.773945 17.84523 1.84231 1.84231 1.84231 1.84231 1.0 NaN 1.84231 1.90216 1.90216 1.90216 1.90216 1.0 NaN 1.90216 1.35311 1.35311 1.35311 1.35311 1.0 NaN 1.35311 NaN NaN NaN NaN NaN NaN NaN 0.28623 0.238743 0.260 0.17 3.0 0.060961 0.71623 NaN NaN NaN NaN NaN NaN NaN 0.2472 0.2472 0.2472 0.2472 1.0 NaN 0.2472 2.46422 2.46422 2.46422 2.46422 1.0 NaN 2.46422 NaN NaN NaN NaN NaN NaN NaN 1.68 0.982390 1.01717 0.25 3.0 0.715634 2.94717 20.22 15.644 16.28 10.71 5.0 3.850303 78.22 17.83 17.83 17.83 17.83 1.0 NaN 17.83 0.00693 0.005540 0.005650 0.00404 3.0 0.001448 0.01662 NaN NaN NaN NaN NaN NaN NaN
In [38]:
df_24g.sample(5)
Out[38]:
As(PM10)_max As(PM10)_mean As(PM10)_median As(PM10)_min As(PM10)_obs_num As(PM10)_std As(PM10)_sum BaA(PM10)_max BaA(PM10)_mean BaA(PM10)_median BaA(PM10)_min BaA(PM10)_obs_num BaA(PM10)_std BaA(PM10)_sum BaP(PM10)_max BaP(PM10)_mean BaP(PM10)_median BaP(PM10)_min BaP(PM10)_obs_num BaP(PM10)_std BaP(PM10)_sum BbF(PM10)_max BbF(PM10)_mean BbF(PM10)_median BbF(PM10)_min BbF(PM10)_obs_num BbF(PM10)_std BbF(PM10)_sum BjF(PM10)_max BjF(PM10)_mean BjF(PM10)_median BjF(PM10)_min BjF(PM10)_obs_num BjF(PM10)_std BjF(PM10)_sum BkF(PM10)_max BkF(PM10)_mean BkF(PM10)_median BkF(PM10)_min BkF(PM10)_obs_num BkF(PM10)_std BkF(PM10)_sum C6H6_max C6H6_mean C6H6_median C6H6_min C6H6_obs_num C6H6_std C6H6_sum Cd(PM10)_max Cd(PM10)_mean Cd(PM10)_median Cd(PM10)_min Cd(PM10)_obs_num Cd(PM10)_std Cd(PM10)_sum DBah(PM10)_max DBah(PM10)_mean DBah(PM10)_median DBah(PM10)_min DBah(PM10)_obs_num DBah(PM10)_std DBah(PM10)_sum DBahA(PM10)_max DBahA(PM10)_mean DBahA(PM10)_median DBahA(PM10)_min DBahA(PM10)_obs_num DBahA(PM10)_std DBahA(PM10)_sum IP(PM10)_max IP(PM10)_mean IP(PM10)_median IP(PM10)_min IP(PM10)_obs_num IP(PM10)_std IP(PM10)_sum NO2_max NO2_mean NO2_median NO2_min NO2_obs_num NO2_std NO2_sum Ni(PM10)_max Ni(PM10)_mean Ni(PM10)_median Ni(PM10)_min Ni(PM10)_obs_num Ni(PM10)_std Ni(PM10)_sum PM10_max PM10_mean PM10_median PM10_min PM10_obs_num PM10_std PM10_sum PM25_max PM25_mean PM25_median PM25_min PM25_obs_num PM25_std PM25_sum Pb(PM10)_max Pb(PM10)_mean Pb(PM10)_median Pb(PM10)_min Pb(PM10)_obs_num Pb(PM10)_std Pb(PM10)_sum SO2_max SO2_mean SO2_median SO2_min SO2_obs_num SO2_std SO2_sum
Datetime
2013-03-07 23:59:59 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 3.100 2.150 2.150 1.200 2.0 1.343503 4.300 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2016-04-01 00:00:00 2.008 1.7545 1.7545 1.501 2.0 0.358503 3.509 4.928 4.928 4.928 4.928 1.0 NaN 4.928 5.712 4.974 4.974 4.236 2.0 1.043690 9.948 1.920 1.920 1.920 1.920 1.0 NaN 1.920 3.497 3.497 3.497 3.497 1.0 NaN 3.497 4.083 4.083 4.083 4.083 1.0 NaN 4.083 NaN NaN NaN NaN 0.0 NaN 0.0 1.943 1.2850 1.2850 0.627 2.0 0.930553 2.570 NaN NaN NaN NaN NaN NaN NaN 0.536 0.536 0.536 0.536 1.0 NaN 0.536 4.075 4.075 4.075 4.075 1.0 NaN 4.075 NaN NaN NaN NaN NaN NaN NaN 2.056 1.816 1.816 1.576 2.0 0.339411 3.632 41.0 32.525 30.15 28.8 4.0 5.697587 130.1 22.300 22.3000 22.3000 22.300 1.0 NaN 22.300 0.07153 0.047235 0.047235 0.02294 2.0 0.034358 0.09447 NaN NaN NaN NaN NaN NaN NaN
2002-01-06 00:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 73.6 73.600 73.60 73.6 1.0 NaN 73.6 NaN NaN NaN NaN 0.0 NaN 0.000 NaN NaN NaN NaN NaN NaN NaN 64.625 64.625 64.625 64.625 1.0 NaN 64.625
2014-05-14 00:00:00 0.500 0.4660 0.4660 0.432 2.0 0.048083 0.932 0.376 0.376 0.376 0.376 1.0 NaN 0.376 0.985 0.842 0.842 0.699 2.0 0.202233 1.684 1.019 1.019 1.019 1.019 1.0 NaN 1.019 2.438 2.438 2.438 2.438 1.0 NaN 2.438 0.587 0.587 0.587 0.587 1.0 NaN 0.587 NaN NaN NaN NaN 0.0 NaN 0.0 0.283 0.2605 0.2605 0.238 2.0 0.031820 0.521 NaN NaN NaN NaN NaN NaN NaN 0.051 0.051 0.051 0.051 1.0 NaN 0.051 0.350 0.350 0.350 0.350 1.0 NaN 0.350 NaN NaN NaN NaN NaN NaN NaN 1.498 1.200 1.200 0.902 2.0 0.421436 2.400 15.0 14.500 14.50 14.0 2.0 0.707107 29.0 12.000 12.0000 12.0000 12.000 1.0 NaN 12.000 0.00700 0.006000 0.006000 0.00500 2.0 0.001414 0.01200 NaN NaN NaN NaN NaN NaN NaN
2009-10-08 00:00:00 NaN NaN NaN NaN 0.0 NaN 0.000 1.300 1.300 1.300 1.300 1.0 NaN 1.300 6.600 6.600 6.600 6.600 1.0 NaN 6.600 3.500 3.500 3.500 3.500 1.0 NaN 3.500 NaN NaN NaN NaN NaN NaN NaN 2.300 2.300 2.300 2.300 1.0 NaN 2.300 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 NaN 0.000 NaN NaN NaN NaN NaN NaN NaN 0.900 0.900 0.900 0.900 1.0 NaN 0.900 0.200 0.200 0.200 0.200 1.0 NaN 0.200 31.0 31.0 31.0 31.0 1.0 NaN 31.0 NaN NaN NaN NaN 0.0 NaN 0.000 46.0 42.000 42.00 38.0 2.0 5.656854 84.0 22.458 22.4165 22.4165 22.375 2.0 0.05869 44.833 NaN NaN NaN NaN 0.0 NaN 0.00000 10.000 7.024 7.024 4.048 2.0 4.2087 14.048
In [39]:
# Create the save directory if it does not exist
save_dir = '/Users/ksatola/Documents/git/air-polution/data/final'
Path(save_dir).mkdir(parents=True, exist_ok=True)
In [40]:
# Save
gios_24g_all_file = '/Users/ksatola/Documents/git/air-polution/data/final/gios_24g_all.csv'
df_24g.to_csv(gios_24g_all_file, encoding="utf-8", index=True)
In [41]:
# Test read (parse the index back into datetimes instead of leaving it as strings)
df_24g_read = pd.read_csv(gios_24g_all_file, encoding='utf-8', sep=",", index_col="Datetime", parse_dates=True)
df_24g_read.head()
Out[41]:
As(PM10)_max As(PM10)_mean As(PM10)_median As(PM10)_min As(PM10)_obs_num As(PM10)_std As(PM10)_sum BaA(PM10)_max BaA(PM10)_mean BaA(PM10)_median BaA(PM10)_min BaA(PM10)_obs_num BaA(PM10)_std BaA(PM10)_sum BaP(PM10)_max BaP(PM10)_mean BaP(PM10)_median BaP(PM10)_min BaP(PM10)_obs_num BaP(PM10)_std BaP(PM10)_sum BbF(PM10)_max BbF(PM10)_mean BbF(PM10)_median BbF(PM10)_min BbF(PM10)_obs_num BbF(PM10)_std BbF(PM10)_sum BjF(PM10)_max BjF(PM10)_mean BjF(PM10)_median BjF(PM10)_min BjF(PM10)_obs_num BjF(PM10)_std BjF(PM10)_sum BkF(PM10)_max BkF(PM10)_mean BkF(PM10)_median BkF(PM10)_min BkF(PM10)_obs_num BkF(PM10)_std BkF(PM10)_sum C6H6_max C6H6_mean C6H6_median C6H6_min C6H6_obs_num C6H6_std C6H6_sum Cd(PM10)_max Cd(PM10)_mean Cd(PM10)_median Cd(PM10)_min Cd(PM10)_obs_num Cd(PM10)_std Cd(PM10)_sum DBah(PM10)_max DBah(PM10)_mean DBah(PM10)_median DBah(PM10)_min DBah(PM10)_obs_num DBah(PM10)_std DBah(PM10)_sum DBahA(PM10)_max DBahA(PM10)_mean DBahA(PM10)_median DBahA(PM10)_min DBahA(PM10)_obs_num DBahA(PM10)_std DBahA(PM10)_sum IP(PM10)_max IP(PM10)_mean IP(PM10)_median IP(PM10)_min IP(PM10)_obs_num IP(PM10)_std IP(PM10)_sum NO2_max NO2_mean NO2_median NO2_min NO2_obs_num NO2_std NO2_sum Ni(PM10)_max Ni(PM10)_mean Ni(PM10)_median Ni(PM10)_min Ni(PM10)_obs_num Ni(PM10)_std Ni(PM10)_sum PM10_max PM10_mean PM10_median PM10_min PM10_obs_num PM10_std PM10_sum PM25_max PM25_mean PM25_median PM25_min PM25_obs_num PM25_std PM25_sum Pb(PM10)_max Pb(PM10)_mean Pb(PM10)_median Pb(PM10)_min Pb(PM10)_obs_num Pb(PM10)_std Pb(PM10)_sum SO2_max SO2_mean SO2_median SO2_min SO2_obs_num SO2_std SO2_sum
Datetime
2000-01-01 00:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 135.9 132.95 132.95 130.0 2.0 4.171930 265.9 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 106.0 106.0 106.0 106.0 1.0 NaN 106.0
2000-01-02 00:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 129.1 122.55 122.55 116.0 2.0 9.263099 245.1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 93.0 93.0 93.0 93.0 1.0 NaN 93.0
2000-01-03 00:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 41.2 37.10 37.10 33.0 2.0 5.798276 74.2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 42.0 42.0 42.0 42.0 1.0 NaN 42.0
2000-01-04 00:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 36.4 31.20 31.20 26.0 2.0 7.353911 62.4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 38.0 38.0 38.0 38.0 1.0 NaN 38.0
2000-01-05 00:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 33.9 28.95 28.95 24.0 2.0 7.000357 57.9 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 33.0 33.0 33.0 33.0 1.0 NaN 33.0
In [42]:
df_24g_read.shape
Out[42]:
(7307, 119)
In [43]:
assert df_24g.shape == df_24g_read.shape
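
The shape assertion above confirms only the dimensions of the re-read table. A stricter round-trip check could also compare the values and the index type. The sketch below is a minimal illustration on a made-up frame written to an in-memory buffer; `df_toy`, its column names and its values are assumptions for the example, not GIOS data.

```python
import io

import numpy as np
import pandas as pd

# Toy stand-in for df_24g (illustrative values only)
idx = pd.date_range("2000-01-01", periods=3, freq="D", name="Datetime")
df_toy = pd.DataFrame(
    {"PM10_mean": [132.95, 122.55, np.nan], "PM10_obs_num": [2.0, 2.0, 0.0]},
    index=idx,
)

# Write to an in-memory CSV and read it back, parsing the index as datetimes
buffer = io.StringIO()
df_toy.to_csv(buffer, index=True)
buffer.seek(0)
df_toy_read = pd.read_csv(buffer, index_col="Datetime", parse_dates=True)

# Same shape, same timestamps, same values (NaN treated as equal to NaN)
assert df_toy.shape == df_toy_read.shape
assert list(df_toy.index) == list(df_toy_read.index)
assert np.allclose(df_toy.to_numpy(), df_toy_read.to_numpy(), equal_nan=True)
```

Comparing with a tolerance rather than exact equality avoids false alarms from float formatting in the CSV.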

Testing

In [ ]:
%%time

df_full = pd.DataFrame()

for year in years:
    
    file_search_pattern = year+'_*_1g.xlsx'
    files = get_files_for_name_pattern(folder, file_search_pattern)
        
    df_for_year = pd.DataFrame()
    
    print(f"Year: {year} - df_full.shape {df_full.shape}")# - files: {files}")
    
    for file in files:
        # Manual corrections to inconsistent names created by data supplier
        file_name = os.path.basename(file)
        
        # Take measurement from the file name (not the full path, which may contain underscores)
        measurement_name = file_name.split('_')[1]
        
        # Unify headers, instead of PM2.5 we should have PM25
        # (plain substring test; in a regex the unescaped dot would match any character)
        if 'PM2.5' in file_name:
            #if file_name in ['2012_PM2.5_1g.xlsx', '2016_PM2.5_1g.xlsx']: 
            measurement_name = 'PM25'
        
        #print(measurement_name)
        print(f"File: {file} - measurement_name: {measurement_name}")
        
        # Gather data for a measurement
        df_measure = get_pollutant_measures_for_locations(file, ems_codes, measurement_name, year)
        
        print(f"df_measure: {df_measure.head(2)}")
        
        # Merge data frames on datetime index (add more columns for the specified time range)
        df_for_year = pd.merge(df_for_year, df_measure, how='outer', left_index=True, right_index=True)
        
        print(f"{measurement_name} - df_measure.shape {df_measure.shape} - df_for_year.shape {df_for_year.shape}")
        
        print(f"df_full.columns: {df_full.columns} - df_for_year.columns {df_for_year.columns}")
    
    # Append new rows with new range of datetimes
    # (DataFrame.append was removed in pandas 2.0; pd.concat keeps the appended df index intact)
    df_full = pd.concat([df_full, df_for_year], ignore_index=False, verify_integrity=True, sort=False)
    
In [ ]:
df_full.shape
In [ ]:
df_full.head()
In [ ]:
df_full.tail()
In [ ]:
df_full.sample(5)
In [ ]:
# Keep the datetime index in the file, otherwise the timestamps are lost
df_full.to_csv('/Users/ksatola/Documents/git/air-polution/data/final/gios_df_full_24g_ok.csv', encoding="utf-8", index=True)
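
The accumulation pattern in the loop above (add columns per pollutant within a year, then add rows per year) can be illustrated on toy data. This is only a sketch under stated assumptions: `toy_measure` is a hypothetical stand-in for `get_pollutant_measures_for_locations`, and it uses `pd.concat` along columns in place of the repeated outer merge.

```python
import pandas as pd


def toy_measure(year, name):
    """Hypothetical stand-in: one per-pollutant frame for a given year."""
    idx = pd.date_range(f"{year}-01-01", periods=2, freq="D", name="Datetime")
    return pd.DataFrame({f"{name}_mean": [1.0, 2.0]}, index=idx)


yearly_frames = []
for year in ["2000", "2001"]:
    # Same dates, different pollutants: align on the index and add columns
    per_pollutant = [toy_measure(year, name) for name in ["PM10", "SO2"]]
    yearly_frames.append(pd.concat(per_pollutant, axis=1, join="outer"))

# Different dates, same columns: stack rows and fail on duplicated timestamps
df_toy_full = pd.concat(yearly_frames, verify_integrity=True, sort=False)
print(df_toy_full.shape)
```

Collecting the yearly frames in a list and concatenating once avoids growing a DataFrame inside the loop.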

Check format for different years

In [ ]:
%%time

extracted_dir = '/Users/ksatola/Documents/git/air-polution/data/gios/etl/extracted/'
#file = '2017_C6H6_1g.xlsx'
#file = '2015_CO_1g.xlsx'
#file = '2014_CO_1g.xlsx'
#file = '2012_NOx_1g.xlsx'
file = '2005_NOx_1g.xlsx'

full_path_to_file = os.path.join(extracted_dir, file)

# 2016-2019
#dft = pd.read_excel(full_path_to_file, header=1) # read 2nd row as header
# 2012-2015
dft = pd.read_excel(full_path_to_file, header=0) # read 1st row as header

dft.rename(columns={dft.columns[0]: datetime_col_name}, inplace = True)
dft.head(10)
In [ ]:
# Get columns defined in ems_codes and datetime
# (build a new list so that ems_codes itself is not modified)
cols_in_scope = list(ems_codes) + [dft.columns[0]] # add time column
dft = dft.loc[:, dft.columns.isin(cols_in_scope)] # handle not existing columns
dft.head()
In [ ]:
# Remove first X rows as they contain metadata
#dft = dft.iloc[4:, :] # 2016-2019
dft = dft.iloc[2:, :] # 2015, 2014
dft.head()
In [ ]:
cols = dft.columns[1:]
In [ ]:
# Replace commas with dots (in all columns but the first one - datetime)
# for 2016-2019
# not needed for 2012-2015 (values that are already numeric are left unchanged)
dft[cols] = dft[cols].apply(lambda x: x.map(lambda v: v.replace(',', '.') if isinstance(v, str) else v))
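
A small, self-contained illustration of the decimal-comma step may help here. It assumes hypothetical station columns (`StationA`, `StationB`) holding a mix of comma-decimal strings, plain numbers and missing values, which is roughly what a sheet with metadata rows can produce.

```python
import pandas as pd

# Made-up raw values: comma decimals as text, plain floats, and a missing value
df_raw = pd.DataFrame(
    {"StationA": ["12,5", "7,25", None], "StationB": [3.0, "4,5", "10"]}
)


def normalize_decimal(value):
    """Replace a decimal comma in strings; pass every other value through."""
    return value.replace(",", ".") if isinstance(value, str) else value


# Normalize each column element-wise, then convert it to a numeric dtype
df_clean = df_raw.apply(lambda col: pd.to_numeric(col.map(normalize_decimal)))
print(df_clean)
```

Handling strings element by element keeps numeric cells intact, whereas the `.str` accessor returns NaN for non-string entries.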
In [ ]:
# Not used when only datetime column is present
if len(cols) > 0:
    # Change columns type
    dft[dft.columns[0]] = pd.to_datetime(dft[dft.columns[0]])
    dft[cols] = dft[cols].apply(pd.to_numeric)
    dft.head()
In [ ]:
# Set datetime index
dft = dft.set_index(dft.columns[0])
dft.head()
In [ ]:
# Calculate statistics for the measure
cols = dft.columns
df_return = pd.DataFrame(index=dft.index.copy())
In [ ]:
# If measurements are available from at least one station
if len(cols) >= 1:
    df_return[measurement_name+'_mean'] = dft[cols].mean(axis=1, skipna=True)
    df_return[measurement_name+'_median'] = dft[cols].median(axis=1, skipna=True)
    df_return[measurement_name+'_min'] = dft[cols].min(axis=1, skipna=True)
    df_return[measurement_name+'_max'] = dft[cols].max(axis=1, skipna=True)
    df_return[measurement_name+'_std'] = dft[cols].std(axis=1, skipna=True)
    df_return[measurement_name+'_sum'] = dft[cols].sum(axis=1, skipna=True)
    df_return[measurement_name+'_obs_num'] = dft[cols].count(axis=1) # count not-null values in a row
In [ ]:
df_return.head(10)
In [ ]:
df_return.tail(5)
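
The row-wise statistics can be checked by hand on a toy frame. The sketch below assumes two hypothetical stations and made-up values; it shows why, in the output tables, a row with a single observation has a NaN standard deviation and a row with no observations has a sum of 0.0 and an observation count of 0.

```python
import numpy as np
import pandas as pd

# Two hypothetical stations over three days, with gaps
idx = pd.date_range("2018-12-29", periods=3, freq="D", name="Datetime")
df_stations = pd.DataFrame(
    {"StationA": [20.0, 18.0, np.nan], "StationB": [24.0, np.nan, np.nan]},
    index=idx,
)

name = "PM10"
stats = pd.DataFrame(index=df_stations.index.copy())
stats[f"{name}_mean"] = df_stations.mean(axis=1, skipna=True)
stats[f"{name}_std"] = df_stations.std(axis=1, skipna=True)  # sample std (ddof=1)
stats[f"{name}_sum"] = df_stations.sum(axis=1, skipna=True)  # all-NaN row sums to 0.0
stats[f"{name}_obs_num"] = df_stations.count(axis=1)         # non-null values per row
print(stats)
```

Because an all-NaN row yields a sum of 0.0 rather than NaN, the `_obs_num` column is the safer signal for whether a value was actually measured.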
In [ ]: