Saturday, October 5, 2019

(Finally) Getting going with python (and a bit of history)

I certainly recognize the power of scientific programming. Programming spreadsheets is obvious and there is nothing to be ashamed of there (see weeks 2-6 in my Computers in Earth and Space Exploration course). I started off in grad school with Mathemematica and appreciated the notebook style of computations and integrated graphics and text. I also taught myself enough fortran to get the main calculations for my dissertation completed.

Modeling profile development with simple diffusion (Arrowsmith, et al., 1998) using fortran code and Mathematica for the basic graphics.

I took some programming courses (Pascal) back in grad school. I was not very good at it. Once the professor even said "that is the stupidest way I have ever seen for doing that" in office hours. I was not too offended; I barely understood what I was doing. Nevertheless, I got the big picture and have stumbled along ever since.

Professor George Hilley taught me many things. One thing he was able to do after a fair amount of cajoling was to get me to start in MATLAB. The more data-oriented and matrix handling of MATLAB ended being something I could use effectively. We also worked with Don Ragan a lot on MATLAB and Latex. While I am no expert, I have taught the basics to students over the years see weeks 6-9 in my Computers in Earth and Space Exploration course). I cannot say that I have had any really great programming projects, but the various analysis and plotting needs have been satisfied. For what it is worth, I even set up a GitHub page to hold a few things. Dr. Olaf Zielke is a serious MATLAB programmer and wrote some impressive tools with GUIS for his PhD and related work. TopoToolbox is another set of MATLAB-based tools which I had the opportunity to learn and appreciate their transformative power for a lot of geomorphic analyses.

I have been watching the progressive adoption of Python and related tools in my little scientific bubble over the last 5 or so years. I have not had the time to do much as far as learning until recently, however. While my MATLAB expertise won't go away, I have appreciated the fact that it is hard to share and teach with students and colleagues who don't have access to the rather expensive licenses for MATLAB. On the other hand, Python and related tools are open and apparently so adaptable and customizeable.

Recently, I finally had an excuse and the time to get my feet wet with Python. Chris Crosby and I (OpenTopography) helped out with a short course From point clouds and full-waveform data to DEM analysis (Sep-30 to Oct-4 2019) led by Professor Bodo Bookhagen and his team.

Specific catchment area computed with the tools from Rheiwalt, et al., 2019. The basic processing was in Python with some c code and then visualized using Displaz.

Here is some basic full waveform lidar processing from Bookhagen and Rheinwalt again basic procesing in Python and then visualized using Displaz.

I still don't understand all that I am doing, but I got the basic set up and can sort of understand packages and environments. Javier Colunga helped me by getting Ananconda installed. Spyder is the development environment I had been looking for. There is so much that is possible; it is hard to even know where to start.

For my first project, I thought it would be nice to play around with a lidar point cloud (using the Dragon's Back of course): grid it using pdal and then make a hillshade using gdal. Download the data from OpenTopography.

The steps are:

  1. Launch an Ananconda terminal.
  2. Add these conda channels for package install:
    conda config --prepend channels conda-forge/label/dev
    conda config --prepend channels conda-forge
  3. Then define an environment and add the packages using conda: conda create -y -n PC_py3 python=3.6 pip scipy pandas numpy matplotlib scikit-image gdal pdal xarray packaging ipython multiprocess h5py lastools pykdtree spyder gmt=5* (this comes from the workshop).

My first problem was that I could not run pdal from inside a python script. There is something I don't understand there (even though it is installed, etc.). I see that it is possible to call python from inside the pdal json files... But, I can run it from the command line:
> pdal pipeline db.json
where the db.json has the parameters for the simple neighborhood gridding run:

                "resolution": 1,
                "radius": 0.707,
                "gdaldriver": "GTiff",
                "gdalopts": "COMPRESS=DEFLATE, ZLEVEL=7, GDAL_NUM_THREADS=ALL CPUS",
                "data_type": "float",
                "output_type": "idw",
I was able to make a little dem (the tif file). But then, I ran the gdal from the command line only:
>gdaldem hillshade small_idw_1m.tif small_idw_1m_shd.tif

Finally, I could run a little python script to draw the hillshaded geotiff (this I could run from Spyder):

#!/usr/bin/env python
import gdal
import matplotlib.pyplot as plt

ds = gdal.Open('small_idw_1m_shd.tif').ReadAsArray()


plt.imshow(ds, cmap='gray')

f.savefig('smallDB.png', dpi=300)

Small piece of the Dragon's Back lidar data (B4 project) gridded with pdal, hillshaded with gdal, and drawn with matplotlib

So, I guess that is a bit of a success, there is a lot more to do and learn!

I keep finding references to help learn: