Clip raster with vector and calculate mean of clipped area, revisited

Two months ago, I posted the “Clip raster with vector and calculate mean of clipped area” blog post. I mentioned in that post that extraction and cellStats were taking time. I posted the blog on reddit.com as well. One of the users (REORS) responded with a few suggestions (see below).

Neat, I’ve got a similar tool in my own code (though my handling of shapefiles is really basic, not got to grips with them in R yet – had other things on my to-do list). Just a few ideas:

  • The extract() function might be closer to what you’re after – it bypasses the need to mask out cells, and actually has some fairly advanced settings – this is kind of another case where the raster package has a really useful tool, but keeps somewhat quiet about it.
  • You might consider adding a crop() function in front of the mask() – where the polygon is a lot smaller than the raster, this reduces the number of cells R has to mask out (though my notes on the speed of this are fairly subjective).
  • The drawPoly() function is a neat little tool that lets you define a shapefile by clicking on the plot window – though that’s kind of different to what you’re doing here.

And about speed, he said:

If your computer’s as old as the one I use – then speed is really a concern…
My version’s more aimed as a spectral profiler (along the same lines as can be found in ERDAS Imagine), but the broad idea is similar. If you’re interested the code’s here. Would you mind if I borrowed your idea of adding the results to the shapefile (if I can work out how to make it work)?

You can read the full comments at this link. I applied one of the suggestions (the use of the crop() function); the following is the new code (with a couple of changes).

# install.packages("raster") #into your R console 

library(raster)

#Notes:
# (1) I assume the raster and vector share one EPSG code and the vector (polygons) overlaps the raster
# (2) Vector and raster are in the project directory

# read shapefile
shp = shapefile("Africa_boundaries_4326.shp")

# load raster
img = raster("monthlyPCP1998_1.tif")

# filter by polygon ( this will go into loop)
country <- c("Algeria","Angola","Benin","Botswana","Burkina Faso","Burundi","Cameroon","Cape Verde","Central African Republic","Chad","Comoros","Congo","Djibouti","Egypt","Equatorial Guinea","Eritrea","Ethiopia","Gabon","Gambia, The","Gaza Strip","Ghana","Guinea","Guinea-Bissau","Ivory Coast","Kenya","Lebanon","Lesotho","Liberia","Libya","Madagascar","Malawi","Mali","Mauritania","Mauritius","Morocco","Mozambique","Namibia","Niger","Nigeria","Reunion","Rwanda","Sao Tome and Principe","Senegal","Seychelles","Sierra Leone","Somalia","South Africa","Sudan","Swaziland","Tanzania, United Republic of","Togo","Tunisia","Uganda","Western Sahara","Zaire","Zambia","Zimbabwe");

output <- matrix(ncol=3, nrow=length(country))

for (i in 1:length(country))
{

  # now filter
  filtered = shp[match(toupper(country[i]),toupper(shp$name)),]

  #now use the mask function
  # crop() has reduced the area for mask() thus very fast.
  rr <- mask(crop(img, filtered),filtered)

  country_sum <-cellStats(rr, mean, na.rm=TRUE);

  output[i,1] <- i;
  output[i,2] <- country[i];
  output[i,3] <- country_sum;

  #print(i,country_sum);
}

write.csv(output, file ="data.csv")

#ref
#http://gis.stackexchange.com/questions/61243/clipping-a-raster-in-r

Thanks to REORS: the crop() function reduces the area to be masked (clipped), so the script became super fast.
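For completeness, REORS’s other suggestion, extract(), can replace the mask()/cellStats() pair entirely. A hedged sketch, assuming the same input files as above load correctly:

```r
library(raster)

# extract() pulls the cell values that fall inside each polygon directly,
# so no intermediate masked raster is created
shp <- shapefile("Africa_boundaries_4326.shp")
img <- raster("monthlyPCP1998_1.tif")

# fun=mean asks extract() to summarise each polygon's cells itself;
# the result is one mean value per polygon, in the order of shp
means <- extract(img, shp, fun = mean, na.rm = TRUE)

write.csv(data.frame(name = shp$name, mean_pcp = means), "data_extract.csv")
```

I have not timed this against crop() + mask(), so which is faster here is an open question.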

Open Source rocks 🙂


Clip raster with vector and calculate mean of clipped area

A few days back a friend asked me to help him with a task. He wanted to clip a raster with a polygon vector file and then calculate the mean of the clipped raster’s pixel values.

I figured out that it could be done in the following steps:

  • Load raster and vector
  • Read vector and get all polygons
  • While looping through the polygons:
    • clip raster based on the selected polygon
    • calculate mean
    • write in a csv with polygon’s unique key

I used Tropical Rainfall Measuring Mission (TRMM, January 1998) raster image and African boundaries shape file for this script (see below).

trmm

I decided to write a script in R. I looked at some of the examples on stackexchange.com and came across an answer (link) which helped in writing the code below (almost 80% of the code).


# install.packages("raster") #into your R console 

library(raster)

#Notes:
# (1) I assume the raster and vector share one EPSG code and the vector (polygons) overlaps the raster
# (2) Vector and raster are in the project directory

# read shapefile
shp = shapefile("Africa_boundaries.shp")

# load raster
img = raster("rasterfile.tif")

# filter by unique polygons ( this will go into loop)
country <- c("Algeria","Angola","Bahrain","Benin","Botswana","Burkina Faso","Burundi","Cameroon","Cape Verde","Central African Republic","Chad","Comoros","Congo","Djibouti","Egypt","Equatorial Guinea","Eritrea","Ethiopia","Gabon","Gambia, The","Gaza Strip","Ghana","Guinea","Guinea-Bissau","Ivory Coast","Kenya","Lebanon","Lesotho","Liberia","Libya","Madagascar","Malawi","Mali","Mauritania","Mauritius","Morocco","Mozambique","Namibia","Niger","Nigeria","Reunion","Rwanda","Sao Tome and Principe","Senegal","Seychelles","Sierra Leone","Somalia","South Africa","Sudan","Swaziland","Tanzania, United Republic of","Togo","Tunisia","Uganda","Western Sahara","Zaire","Zambia","Zimbabwe");

output <- matrix(ncol=3, nrow=length(country))

for (i in 1:length(country))
{

   # now filter shape file to get one polygon record
   filtered = shp[match(toupper(country[i]),toupper(shp$name)),]

   #now use the mask function to clip based on polygon record
   rr <- mask(img, filtered)

   # get mean of all cells
   country_sum <-cellStats(rr, mean, na.rm=TRUE);

   # write record in csv
   output[i,1] <- i;
   output[i,2] <- country[i];
   output[i,3] <- country_sum;

}

write.csv(output, file ="data.csv")

I should have looped through the shapefile to read each polygon rather than using a variable like country. Next time I will do that :).
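Looping over the shapefile’s own records would look roughly like the sketch below. This is my reconstruction, not the original script; it assumes the same files and the same name attribute on the polygons:

```r
library(raster)

shp <- shapefile("Africa_boundaries.shp")
img <- raster("rasterfile.tif")

# one row per polygon record in the shapefile itself
output <- matrix(ncol = 3, nrow = length(shp))

for (i in 1:length(shp)) {
  filtered <- shp[i, ]                   # i-th polygon record
  rr <- mask(img, filtered)              # clip raster to this polygon
  output[i, 1] <- i
  output[i, 2] <- filtered$name          # the polygon's own attribute is the key
  output[i, 3] <- cellStats(rr, mean, na.rm = TRUE)
}

write.csv(output, file = "data.csv")
```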

The extraction and cellStats functions took a little bit of time. Finally, I joined my CSV with the shapefile (using QGIS, see below).

mean_rf

 

It’s better to know more than one programming language, and if you are looking for another language, R is a strong candidate, especially for GIS/RS.

reference:
http://gis.stackexchange.com/questions/61243/clipping-a-raster-in-r

Brown-bag seminars: Learn R by coding examples and hands-on.

bb

Some people listen to lectures/talks (“TED Talks” or “Talks at Google”) while eating lunch; it’s an effective way of learning new things. This habit is similar to brown-bag seminars (I did not realize it until a friend mentioned the similarities).

Brown-bag seminars are held around lunch time to learn new things (read more here); they are short (30 to 45 minutes), thus using only the lunch break. Brown-bags are excellent opportunities, especially for students/interns or for someone who wants to learn new things in a relaxed environment while enjoying lunch.

Geo Science Lab (ICRAF) holds weekly brown-bag seminars related to R. These are code examples (with an exercise dataset) and hands-on. The good thing about code examples is that they are easy to learn and replicate (fast-track learning) – look at the following example (taken from seminar 6): anyone can replicate these commands for one’s own use within minutes.


#load ggplot package
library(ggplot2)
#Let's plot Clay by Carbon
#First Graphic
ggplot(data=tree)+geom_point(aes(x=Carbon,y=Clay))
#Second Graphic- add the color of Site
ggplot(data=tree)+geom_point(aes(x=Carbon,y=Clay,col=Site))
#Third Graphic- add faceting of VegStructure
ggplot(data=tree)+geom_point(aes(x=Carbon,y=Clay,col=Site))+facet_wrap(~VegStructure,ncol=1)
#Fourth Graphic- add xaxis and yaxis labels and move legend title
ggplot(data=tree)+geom_point(aes(x=Carbon,y=Clay,col=Site))+facet_wrap(~VegStructure,ncol=1)+theme(legend.position="top")+ xlab("Carbon (%)")+ylab("Clay (%)")

Below is a list of 7 seminars

1) Welcome to Brown-bags seminars 

2) Introduction to R: part 2 (data import, data frames)

3) Introduction to R: part 3 and 4  (Basic data analysis, basic stats)

4) Introduction to R: part 5 (graphs)

5) Introduction to ggplot

6) ggplot and dplyr packages

Next few sessions will be on ggplot, dplyr and raster packages.
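In the same replicate-in-minutes spirit, a dplyr pipeline from seminar 6 might look like the following. This sketch uses a made-up stand-in for the seminar’s tree exercise dataset:

```r
library(dplyr)

# a tiny stand-in for the seminar's exercise dataset
tree <- data.frame(
  Site   = c("A", "A", "B", "B"),
  Carbon = c(1.2, 2.4, 0.8, 1.6),
  Clay   = c(30, 35, 20, 25)
)

# group by Site and summarise mean Carbon and Clay per site
tree %>%
  group_by(Site) %>%
  summarise(meanCarbon = mean(Carbon), meanClay = mean(Clay))
```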

OSGeo Live on virtual box

Osgeolive_wordle

For those of you who still want to keep Windows as their main operating system but want to try the goodies of Free and Open Source software for Geo (FOSS4Geo), OSGeo has many options like a live DVD and VirtualBox VMs. All you have to do is install VirtualBox and download the OSGeo virtual machine (plus 7-Zip for extracting the zipped file). Follow the instructions below:

http://live.osgeo.org/en/quickstart/virtualization_quickstart.html

The download will take time as it is around 3 GB (I downloaded the VMDK); extract it using the following command (if you are on Windows, use the 7-Zip GUI):

7z x software-backup/osgeo-live-vm-8.0.7z

When you run the command sudo apt-get update, a password is required and it is “user” (on the desktop there is a file named passwords.txt, which contains all the passwords, e.g. OS user ‘user’ has password ‘user’).

One more thing: to set the screen to a reasonable size, there are some commands at the end of the OSGeo blog. If they do not work (like in my case), go to Start -> System Tools -> Synaptic Package Manager and install the following packages (read my old blog about it: Resize virtual machine’s desktop in VirtualBox):

virtualbox-guest-dkms
virtualbox-guest-utils
virtualbox-guest-x11

The first package will install the second package as a dependency; however, you have to install the third package yourself. If the installation is successful, you should see a desktop like the one below.

OSGeoLive

What it contains

  1. Libraries with paths configured (this is what you need the most e.g. gdal)
  2. Programming languages like python (with packages installed)
  3. QGIS, GRASS GIS, gvSIG, OpenJump, SAGA GIS and uDig
  4. R Statistics
  5. and many more 🙂

Trust me, for new users the first two are very important, as they reduce complexity and frustration (a big thanks to OSGeo, of course).

 

GDAL – a Swiss army knife

572px-GDALLogoColor.svg

I worked with the Global Inventory Modeling and Mapping Studies (GIMMS) dataset a few weeks back; it is based on AVHRR and goes back to 1981. For analysis, I had to resample the GIMMS dataset from 8 km to 5 km (the version of GIMMS I was working with had 8 km resolution). I was working with approx. 736 images. Since I was performing another task in R, I decided to combine resampling with it. I wrote a bash script to call my R script and passed it the source image and an image at the desired resolution. The process was very simple, as shown below (full script available here).

.......
# source image folder
src1folder="path-to-files"

# image at the desired resolution
src2folder="folder-for-desired-image"
src2img="desired-image-name.tif" 

for myfile in *.tif
do
  # target path and image name
  target="target-image-folder-path" 
  
  Rscript /media/external/muhammad/myscripts/scale.r $src1folder $myfile $src2folder $src2img $target
  
  # R temp folder is being filled by temp files, we need to clean it as well. 
  rm /tmp/R_raster_geoscilab/*.gri
  rm /tmp/R_raster_geoscilab/*.grd

done
.......
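The core of scale.r, called from this loop, was essentially a projectRaster() resample. A minimal sketch under simplified assumptions (full paths as three arguments and bilinear resampling; the real script is linked above and takes folders and file names separately):

```r
library(raster)

# simplified argument handling: full paths for the source image,
# the desired-resolution image, and the output file
args <- commandArgs(trailingOnly = TRUE)

src <- raster(args[1])   # 8 km GIMMS image
to  <- raster(args[2])   # image already on the desired 5 km grid

# projectRaster() resamples src onto to's grid; this was the slow step
res <- projectRaster(src, to, method = "bilinear")
writeRaster(res, args[3], overwrite = TRUE)
```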

The R script was executed inside the loop for each image in the folder “path-to-files”. The R script (available here) read the desired and source images and performed resampling using the projectRaster() method. There were a couple of other tasks performed by my R script, but resampling was the major one. When I tested this approach, a single image took 12 minutes (5 images per hour). WOW! My script would run for 6 days to finish this task. Yes, the scripts worked, but 6 days were too much. I found out that it was the projectRaster() function that was taking almost 11 minutes. After a quick discussion with a friend of mine (an R guru and big fan of GDAL + Open Source), I tried the GDAL utilities. I quickly wrote a new script utilizing the gdalwarp utility (command below, full script available here).

gdalwarp -tr 0.050000000000000 -0.050000000000000 $outputfile $output_newfile

Notice the values for the -tr switch, which dictate that the output image has a 5 km (0.05 degree) pixel size. When I tested this, WOW, it took only 5 seconds per image. R was still required for filtering, but now I will not run my scripts for 6 days. It is important to note here that I used the default resampling method (nearest neighbour) in gdalwarp. Below is a GIMMS 5 km resolution image (with nodata set for further processing).

GIMMS-2011-dec-a

Moral of the story: try to use other tools rather than focusing on just one.

Upgrading R to 3.1.0 on Ubuntu 12.04

I had R 2.14.1 on one of my servers. I needed packages like raster and rgdal but could not install them due to the error that these packages are not available for this version.

I tried to install the installr package (below) so that I could run updateR() and move to R 3.1.0:

install.packages("installr");

but got the following error

Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
 package ‘installr’ is not available (for R version 2.14.1)

It turned out that installr is only for Windows.

Then I read this post (How to install R ver 3.0) and followed it from the section “Uninstall old R”. The update and upgrade part will take some time; installing R itself will take time too.

Finally, R 3.1.0 was installed (Spring Dance is ON) and I started installing packages. First was raster:

install.packages("raster");

an error occurred

Error : package ‘sp’ was built before R 3.0.0: please re-install it
ERROR: lazy loading failed for package ‘raster’
* removing ‘/home/myserver/R/x86_64-pc-linux-gnu-library/3.1/raster’
The downloaded source packages are in
 ‘/tmp/Rtmp1n6Mdz/downloaded_packages’
Warning message:
In install.packages("raster") :
 installation of package ‘raster’ had non-zero exit status

Thus I installed sp first with the following command:

install.packages("sp");

Then raster

install.packages("raster");

Then came scales:

install.packages("scales");

Now, before installing rgdal (remember, I ran into problems previously – see blog), I checked the GDAL version with the following command:

 gdalinfo --version

I had GDAL 1.7.3; aaaaaaaaaah, I needed to install version 1.10. Thus I followed my own blog 🙂 “Installing rgdal package for R 3”.

First I removed GDAL 1.7.x with the following command:

sudo apt-get remove libgdal-dev gdal-bin 

Then I added the UbuntuGIS Unstable PPA with the following commands:

sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable 
sudo apt-get update

Finally I installed GDAL 1.10 with the following command:

sudo apt-get install libgdal-dev gdal-bin

WOW, it took a lot of time, maybe due to slow internet on my side. Then I started R and installed the rgdal package:

install.packages("rgdal");

and the last package to install was ggplot2:

install.packages("ggplot2");

Finally I got my script running successfully.

rainfallTS

Installing rgdal package for R 3

Day by day, my usage of R is increasing; it is simple, requires less code, and yet is powerful. Recently I used R to get pixel values from a stack of images and generated graphs. Today I had to upgrade a server to R 3 and ran into installation-related issues.

The rgdal package installation was causing problems; the error message was:

gdal-config
--configure-args='--with-gdal-config=/usr/local/bin/gdal-config'
with appropriate values for your installation.

ERROR: configuration failed for package ‘rgdal’
* removing ‘/home/mycomputer/R/x86_64-pc-linux-gnu-library/3.0/rgdal’

The downloaded source packages are in
‘/tmp/Rtmp27M68g/downloaded_packages’

Warning message:
In install.packages("rgdal") :installation of package ‘rgdal’ had non-zero exit status

I didn’t have gdal-config on my computer, and when I tried to install it using the following command

sudo apt-get install libgdal1-dev

I got the following errors:

Reading package lists... Done
Building dependency tree 
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
libgdal1-dev : Depends: libdap-dev but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

I checked my GDAL version using Synaptic (an alternative to the gdalinfo --version command) and it was 1.9 (and this was the issue: GDAL 1.10 was required), thus I added the PPA from UbuntuGIS. I started Synaptic Package Manager, clicked ‘Repositories’ in the Settings menu, went to the ‘Other Software’ tab and added the following repositories (one by one):

deb http://ppa.launchpad.net/ubuntugis/ubuntugis-unstable/ubuntu precise main

deb http://ppa.launchpad.net/ubuntugis/ubuntugis-unstable/ubuntu precise main (Source Code)

Now, from the Edit menu, I clicked ‘Reload Package Information’; GDAL 1.9 required an upgrade, and once the update was complete the command below worked:

sudo apt-get install libgdal1-dev

Relax, it will take some time. Finally I installed the rgdal package in R using

install.packages('rgdal')  

I checked installation by executing my graph R script and it worked.
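A quicker sanity check is to ask rgdal which GDAL it was built against; a small sketch, assuming the install succeeded:

```r
library(rgdal)

# prints the GDAL version rgdal was built against,
# e.g. a string starting with "GDAL 1.10"
getGDALVersionInfo()
```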

rain-fall

But I discovered another issue (after some time): now an application (GeoNode) requiring GDAL was not working. I installed GeoNode on another server, and this time, instead of the above process, I did not update GDAL but simply used the following command (use it after installing R):

sudo apt-get install libgdal1-dev libgdal-dev

Finally, I installed the rgdal package. So far I have not seen any issues with applications (and the GDAL version has not changed).