Tuesday 15 September 2015

Visualizing Open NASA Data

NASA & Data

The National Aeronautics and Space Administration (NASA) is the United States government agency responsible for the civilian space program as well as aeronautics and aerospace research (Wikipedia). NASA has long made public the data it collects, including satellite data, as we will see later. It also organises a yearly Space Apps Challenge event to promote innovative projects using the data.

This blog post is about accessing Earth science data, focusing solely on CO2 information at a particular altitude range, processing the data, building a map of the Earth and finally presenting the result on the map.

Goal
The goal of this article is to allow you to create a map like, or better than, the one below:
https://github.com/Lougarou/Co2D3/

Getting The Right Data

It is important to get the right data source, in this case CO2 data, and to do that we need to find the right NASA mission. Searching through NASA's data portals, our hunt for CO2 data finally ends at the Jet Propulsion Laboratory, California Institute of Technology.
The next task is to identify which mission suits us; this narrows the data sources down to the ACOS GOSAT satellite and the OCO-2 satellite. OCO-2 is more accurate, so we will mainly be using it.

Data levels

If you tried to download the data you would notice that there are two ways to do so:
  1. Download Level 2 data using a Python script
  2. Customize the data and download Level 3 data
Data levels classify data into categories from Level 0 to Level 4, as explained here:

Level 0: Reconstructed, unprocessed instrument and payload data at full resolution, with any and all communications artifacts (e.g., synchronization frames, communications headers, duplicate data) removed. (In most cases, the EOS Data and Operations System (EDOS) provides these data to the data centers as production data sets for processing by the Science Data Processing Segment (SDPS) or by a SIPS to produce higher-level products.)

Level 1A: Reconstructed, unprocessed instrument data at full resolution, time-referenced, and annotated with ancillary information, including radiometric and geometric calibration coefficients and georeferencing parameters (e.g., platform ephemeris) computed and appended but not applied to Level 0 data.

Level 1B: Level 1A data that have been processed to sensor units (not all instruments have Level 1B source data).

Level 2: Derived geophysical variables at the same resolution and location as Level 1 source data.

Level 3: Variables mapped on uniform space-time grid scales, usually with some completeness and consistency.

Level 4: Model output or results from analyses of lower-level data (e.g., variables derived from multiple measurements).

This means we are going to have to convert our Level 2 or Level 3 data into Level 4 data (our final result).
However, if you tried to download the customized Level 3 data you would notice one major shortcoming: the altitudes are all fixed at 2600 km above the ground, among other limitations.
This is not what we want for this article, so we are going to process the Level 2 data instead, which is very abundant.
The easiest way is to get the Python download script here.

Note: you can stop downloading after the first few netCDF files (.nc4 files of around 20 MB to 70 MB each) for this article. 12+ GB would take too much time on a Mauritian internet connection anyway!

Understanding The Data

Now you should have a bunch of files which look as follows:

data/oco2_LtCO2_140906_B7101Ar_xxxxxxxxxxxxs.nc4
data/oco2_LtCO2_140907_B7101Ar_xxxxxxxxxxxxs.nc4
...

Your first thought might be to open one in a text editor, but these files are sadly in netCDF format and you need libraries to access them easily. We will be using the format's Java library for this later on.
The easiest way to get an idea of the data is to view it with Panoply. Below is a screenshot of the data opened in Panoply.

The dataset shows the data's hierarchy and variables; this will be particularly useful when we access it in the next section. If you wish to understand the variables, which you normally should, you can read the documentation here. For this article we will only use longitude, latitude, xco2 and Sounding->altitude.
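
If you prefer exploring the file programmatically, the netCDF Java library (which we add to the project in the next section) can list the same hierarchy. Below is a minimal sketch; the file path is just an example, adjust it to your own download location.

 import java.io.IOException;
 import ucar.nc2.NetcdfFile;
 import ucar.nc2.Variable;

 public class ListVariables {
     public static void main(String[] args) throws IOException {
         // Adjust the path to one of your downloaded .nc4 files
         NetcdfFile dataFile = NetcdfFile.open("data/oco2_LtCO2_140906.nc4", null);
         try {
             // Print every variable with its dimensions, mirroring Panoply's tree view
             for (Variable v : dataFile.getVariables()) {
                 System.out.println(v.getFullName() + "  " + v.getDimensionsString());
             }
         } finally {
             dataFile.close();
         }
     }
 }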

Processing Level 2 Data

Get your preferred version of the netCDF library. An easy way to include it in Java is by creating a Maven project and adding the following to your project's pom.xml.

 <dependency>
     <groupId>edu.ucar</groupId>
     <artifactId>netcdf</artifactId>
     <version>4.2</version>
 </dependency>

Accessing the data within Java is even easier. Below is a snippet to give an idea of this.

import java.util.ArrayList;
import java.util.List;
import ucar.ma2.Array;
import ucar.nc2.NetcdfFile;
import ucar.nc2.Variable;

// Open the netCDF file (filename points to one of the downloaded .nc4 files)
NetcdfFile dataFile = NetcdfFile.open(filename, null);
// Retrieve variables
final Variable xco2 = dataFile.findVariable("xco2");
final Variable latitude = dataFile.findVariable("latitude");
final Variable longitude = dataFile.findVariable("longitude");
final Variable altitude = dataFile.findGroup("Sounding").findVariable("altitude");
// Read all four variables into memory in one call;
// res is ordered (xco2, altitude, latitude, longitude)
List<Array> res = dataFile.readArrays(new ArrayList<Variable>() {{
    this.add(xco2);
    this.add(altitude);
    this.add(latitude);
    this.add(longitude);
}});
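
As a quick sanity check, you can print the first few soundings. This is a minimal sketch assuming the res list from the snippet above:

 // Print the first five soundings: xco2 (ppm), altitude, latitude, longitude
 for (int i = 0; i < 5 && i < res.get(0).getSize(); i++) {
     System.out.printf("xco2=%.2f alt=%.1f lat=%.4f lon=%.4f%n",
             res.get(0).getDouble(i),   // xco2
             res.get(1).getDouble(i),   // altitude
             res.get(2).getDouble(i),   // latitude
             res.get(3).getDouble(i));  // longitude
 }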

But why is it necessary to even process the data? After all, we have all the data required to plot this on a map. To understand why, let's actually plot it on a map without any processing!

Making A Map

Let's move away from the world of Java, make a map and use D3 to draw it. Thankfully for us, the open-source spirit lives inside the cartography community as well. We are going to make our own map instead of using Google Maps or OpenStreetMap. If you want to use Google Maps or another map, you can skip this part.

The first step is to get a shapefile of the Earth from Natural Earth Data. Since the accuracy of our data is roughly 0.5 degrees of longitude or latitude, download the small-scale data. Once you have downloaded your .shp file, the next step is to convert it into GeoJSON format. To do that we need yet another tool, ogr2ogr. Using ogr2ogr:

ogr2ogr -f "GeoJSON" "countries.geojson" "ne_10m_admin_0_countries.shp"

You should now have countries.geojson, but we will not stop there. Next install TopoJSON (using npm install -g topojson). Then run:

topojson -o countries.topojson countries.geojson

Notice that there is roughly an 80% size reduction from GeoJSON to TopoJSON. You should now have a TopoJSON file like this one: http://pastebin.com/1sRWybhe

If you feel lost at any point, consult this documentation: http://bost.ocks.org/mike/map/

Displaying The Map

Download our example source code from GitHub to help with the following example. Make sure to run it on a web server; this is because D3 needs to fetch the external files being read, like countries.topojson, over HTTP.

We will be using D3.js and topojson.v0.min.js to visualize the map. Create a .html file and include the following libraries after downloading them.

 <script src="js/d3.min.js"></script>  
 <script src="js/topojson.v0.min.js"></script>  

Create a div element; then we can use some JavaScript to read countries.topojson and draw it as SVG.

 <div id="divMap"></div>  

var projection = d3.geo.mercator();
var worldSvg = d3.select("#divMap").append("svg")
    .attr("style", "display: block; margin: auto;")
    .attr("width", 1000)
    .attr("height", 600); // an explicit height keeps the map from being clipped
var path = d3.geo.path()
    .projection(projection);

Those are the main parts of the code to generate the world map.

We start by telling D3 to create a Mercator projection. Our data is in latitude and longitude, but we want to draw it as pixels in SVG; D3 facilitates this conversion using projections.

d3.json("countries.topojson", function(error, topology) {  
 countries = g.selectAll("path").data(topojson.object(topology, topology.objects.countries_new)  
                .geometries);  
  countries.enter()  
 .append("path")  
 .attr("d", path)  
 .attr("class","country")  
 .attr("stroke","black").attr("fill", "white").attr("stroke-width","1");  

After that we read our countries.topojson file. If you look inside it, you will notice that every country is actually a geometry object. Using D3 we read those geometries and draw them into worldSvg, an svg DOM element. The geometry points are in latitude and longitude, but the projection we defined before does the job of converting those points to points on your screen. The rest of the .attr() calls (attributes) specify the colour and other details. Feel free to play with them in our source.
Also, for some cool effects, try using an orthographic projection.


You should get something similar to the above screenshot if everything went well.

Plotting Our Level 2 Data

First of all, congratulations for getting this far. You are a patient one. Now code a Java program to generate, for example, a simple CSV file as follows (a sketch of the export is given after the sample):

xco2,latitude,longitude
397.2427062988281,-40.625553131103516,175.2617950439453
396.0091857910156,-12.630569458007812,169.10618591308594
396.2909240722656,-12.609465599060059,169.06430053710938
396.73992919921875,-12.18317699432373,169.01231384277344
396.9307861328125,-12.164556503295898,169.0084228515625
396.8463134765625,-12.18188762664795,168.9932403564453
....
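
One way to produce such a file is a short export loop. Below is a minimal sketch reusing the res list read earlier; the level2.csv output path is just an example, and the enclosing method must declare the checked FileNotFoundException.

 import java.io.PrintWriter;

 // Write xco2, latitude, longitude for every sounding
 try (PrintWriter out = new PrintWriter("level2.csv")) {
     out.println("xco2,latitude,longitude");
     for (int i = 0; i < res.get(0).getSize(); i++) {
         out.println(res.get(0).getDouble(i) + ","  // xco2
                 + res.get(2).getDouble(i) + ","    // latitude
                 + res.get(3).getDouble(i));        // longitude
     }
 }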

Then, using the example D3 code, draw small rectangles (small ones because of the smaller longitude and latitude variance here compared to the data inside our GitHub example). You should get the following picture.


Well, this is not really useful, and THIS is the reason why we should process it further.

Processing Level 2 Data Into Level 3 Data Into Level 4 Data

The algorithm for processing the Level 2 data into Level 3 data is up to you. In this article we will use a simple clustering algorithm based on quadtiles, which basically involves sectioning our map into squares. We then keep an average CO2 PPM (parts per million) value for every square (the first column in the CSV above). This also means that the more squares we use, the more accurate the Level 3 data will be. A sample Java algorithm is outlined below:

 // Tile the map: each tile stores (lonMin, latMin, lonMax, latMax, averageXco2)
 ArrayList<ArrayList<Double>> quadTiles = new ArrayList<ArrayList<Double>>();
 int chunks = 32;
 // Integer arithmetic gives steps of 360/chunks-1 = 10 degrees of longitude
 // and 180/chunks-1 = 4 degrees of latitude, i.e. 36 * 45 = 1620 tiles
 for (int i = -179; i <= 180; i += 360 / chunks - 1) {
     for (int j = -89; j <= 90; j += 180 / chunks - 1) {
         ArrayList<Double> tile = new ArrayList<Double>();
         // Bounds
         tile.add((double) i);                       // lonMin
         tile.add((double) j);                       // latMin
         tile.add((double) (i + 360 / chunks - 1));  // lonMax
         tile.add((double) (j + 180 / chunks - 1));  // latMax
         tile.add((double) 0);                       // running average; 0 means empty
         quadTiles.add(tile);
     }
 }
 // res is ordered (xco2, altitude, latitude, longitude) as read earlier
 for (int i = 0; i < res.get(0).getSize(); i++) {
     for (int j = 0; j < quadTiles.size(); j++) {
         // Check whether the sounding lies inside this tile
         if (res.get(2).getDouble(i) >= Math.min(quadTiles.get(j).get(1), quadTiles.get(j).get(3)) &&
             res.get(2).getDouble(i) <= Math.max(quadTiles.get(j).get(1), quadTiles.get(j).get(3)) &&
             res.get(3).getDouble(i) >= Math.min(quadTiles.get(j).get(0), quadTiles.get(j).get(2)) &&
             res.get(3).getDouble(i) <= Math.max(quadTiles.get(j).get(0), quadTiles.get(j).get(2))) {
             if (quadTiles.get(j).get(4).equals(0.0)) { // note: equals(0) would never match a Double
                 quadTiles.get(j).set(4, res.get(0).getDouble(i));
             } else {
                 // Running pairwise average: later soundings weigh more than earlier ones;
                 // keep a per-tile count instead if you need a true mean
                 quadTiles.get(j).set(4, (res.get(0).getDouble(i) + quadTiles.get(j).get(4)) / 2);
             }
         }
     }
 }

We are basically creating chunks/tiles over the map and then checking whether the points of the Level 2 data fall into those tiles. We are using -179 instead of -180 to start our longitude because D3 has issues with displaying -180; this is a quick sacrifice which will affect accuracy and should be avoided if possible.
Then, if a point lies inside a tile, we update its average CO2 PPM value. We can use this data to generate a CSV file, as seen here: http://pastebin.com/wZVYj0p2 (one way to write the tiles out is sketched right below). If we plot it (the Level 3 data we just made), the result is as follows (the Level 4 data).
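
A minimal sketch of that export, assuming the quadTiles list built above; the column layout and the level3.csv output path are illustrative choices, not the exact format of the pastebin file:

 import java.io.PrintWriter;

 // Dump each non-empty tile: average xco2 followed by the tile bounds
 // (the enclosing method must declare throws FileNotFoundException)
 try (PrintWriter out = new PrintWriter("level3.csv")) {
     out.println("xco2,lonMin,latMin,lonMax,latMax");
     for (ArrayList<Double> tile : quadTiles) {
         if (tile.get(4) != 0.0) { // skip tiles that received no soundings
             out.println(tile.get(4) + "," + tile.get(0) + "," + tile.get(1)
                     + "," + tile.get(2) + "," + tile.get(3));
         }
     }
 }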




This plot is much more indicative of the sinks and sources of CO2, but still not very accurate, as we have used only one .nc4 file and averaged it over only 1620 tiles.


Final words & Inspirations

With enough processing and data, one can plot far more accurate maps, like the following:



The second diagram shows CO2 PPM concentration at a 2600 km altitude (2014), while the first shows it at a 600 km altitude (2013); that is why altitude was important. They are both Level 3 data from https://co2.jpl.nasa.gov/. The second diagram actually contains 64000 points.

This blog post only talks about CO2 data, but there is an enormous amount of data on other things like sea level or heat. You could maybe monitor the sea level around the Maldives or look at the heat levels in India!
It is now up to you to use this power, maybe for a good cause!

- Yog


Comments:

  1. Nice post, hmm you just got me interested in playing with that amount of data.

     Reply: I have to say, though, that we are very bottlenecked by our download speed. Hopefully that will change in the future. A quick solution to this is to buy a server, for example in the US, download all the data from it, then process and simplify it from there, but oh well, the server costs money too.