Saturday, November 12, 2011

Using a csv file as a metadata database in Greenstone (2.84)

An alternative to entering metadata manually in GLI in Greenstone (currently 2.84): create a CSV file containing all the metadata and import it into Greenstone using GLI. This is very useful in at least two cases:
  1. You can get the metadata from somewhere else in CSV format.
  2. You want to experiment with a collection. If you manage the metadata in a separate CSV file, you do not need to re-enter all of it after you screw up the collection.

Greenstone has a MetadataCSV plugin, but unfortunately the author cannot figure out how to use it. Alternatively, Greenstone has an "explode metadata database" mechanism, which is very useful. Let us go through it in detail.

Let us first assume we have three files in Greenstone:

a.txt, b.txt, c.html

plus a metadata file, meta.csv, containing:

Filename,dc.Title,dc.Creator,Description,Contributor
a.txt, a, GZ, first test, GJZ
b.txt, b, GZ, second test, GJZ
c.html, c, GZ, third one, GJZ
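If the metadata already lives somewhere else (case 1 above), a short script can write this file so you don't have to type it. Below is a minimal Python sketch that uses the standard csv module. The function name and the UTF-8 encoding are my own assumptions and are not Greenstone requirements. The script deliberately writes Unix-style (LF) line endings, because Greenstone 2.84's CSVPlugin seems to dislike Windows ones (see the line-end subtlety below). Unlike the hand-typed example, the output has no spaces after the commas.

```python
import csv

# Hypothetical example data: in practice these rows would come from wherever
# the metadata already lives (a database export, a catalogue, etc.).
HEADER = ["Filename", "dc.Title", "dc.Creator", "Description", "Contributor"]
ROWS = [
    ["a.txt", "a", "GZ", "first test", "GJZ"],
    ["b.txt", "b", "GZ", "second test", "GJZ"],
    ["c.html", "c", "GZ", "third one", "GJZ"],
]


def write_metadata_csv(path, header, rows):
    """Write the header row, then one row per file, to `path`."""
    # newline="" lets the csv module control line endings itself, and
    # lineterminator="\n" makes them Unix-style (LF) on every platform.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


if __name__ == "__main__":
    write_metadata_csv("meta.csv", HEADER, ROWS)
```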

A couple of points to notice about this file:
  • The first line is the header row, holding the column labels.
  • Each label must exactly match your metadata name, including case: a column labelled "title" will be mapped to "dc.title" instead of the standard "dc.Title".
  • Later we will choose a default metadata set. Unqualified labels (such as "Description") will be mapped into this default set (Dublin Core in this case).
  • You can use a simple text editor to create the file.
  • Google Docs is also an excellent tool, with built-in collaboration: create a spreadsheet, fill it in with your teammates, then "download as" a CSV file.
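The label rules above are easy to get wrong, especially the case sensitivity. One option is to check the header row before importing. The following is a minimal sketch of that idea in Python. It only mimics the mapping described above (qualified labels kept, bare labels put into the default set) and is not Greenstone's own matching code. `KNOWN` is a hypothetical subset of the Dublin Core names, so replace it with the element names from the metadata set you actually use.

```python
# Hypothetical subset of the Dublin Core element names in Greenstone's "dc"
# set; replace it with the names from the metadata set you actually use.
KNOWN = {"dc.Title", "dc.Creator", "dc.Subject", "dc.Description", "dc.Contributor"}


def qualify(label, default_ns="dc"):
    """Return the qualified name a column label should map to.

    Labels that already carry a namespace ("dc.Title") are kept as-is;
    bare labels ("Description") go into the default metadata set.
    """
    return label if "." in label else f"{default_ns}.{label}"


def check_labels(header, known, filename_column="Filename", default_ns="dc"):
    """Return warnings for header labels that do not exactly match `known`."""
    by_lower = {name.lower(): name for name in known}
    warnings = []
    for label in header:
        if label == filename_column:
            continue  # the file name column is not metadata
        name = qualify(label, default_ns)
        if name in known:
            continue
        if name.lower() in by_lower:
            warnings.append(f"{label!r} -> {name!r}: case differs from {by_lower[name.lower()]!r}")
        else:
            warnings.append(f"{label!r} -> {name!r}: not in the metadata set")
    return warnings


if __name__ == "__main__":
    header = ["Filename", "dc.Title", "dc.Creator", "Description", "Contributor"]
    print(check_labels(header, KNOWN) or "all labels match")
    print(check_labels(["Filename", "title"], KNOWN))
```

The second call flags "title" because it maps to "dc.title", which matches "dc.Title" only when case is ignored.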

Here is the procedure in GLI:
  1. (optional) create a new collection
  2. In the "Gather" tab, add all three files (a.txt, b.txt, c.html)
  3. In the "Gather" tab, add meta.csv (a window will pop up asking which plugin to use; choose either one, it does not matter much)
  4. In the "Enrich" tab, right-click "meta.csv" and choose "explode metadata database". A new window will pop up (it is labelled CSVPlugin, which is fine)
  5. Choose the "metadata_set" accordingly (Dublin Core in the example). This will be your default set: all your unqualified metadata will be mapped into it (in our example, "Description" is mapped to "dc.Description")
  6. Tick "document_field" and enter the label of the file name column ("Filename" in the example)
  7. Click "explode"
  8. In the "Enrich" tab, you will notice that all three files have become sub-levels of meta, and their metadata fields are populated. You can enter more metadata by hand from here
  9. Build the collection, tweak the display, and do the rest
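To double-check what step 8 should show, you can preview the mapping from steps 5 and 6 outside GLI. The following is a rough Python approximation of that mapping, written from the description above and not taken from Greenstone's code. `preview_explode` is a hypothetical helper. It strips the space after each comma ("a.txt, a, GZ"), but I am not certain Greenstone's CSVPlugin does the same, so it is safer to leave those spaces out of the real file.

```python
import csv
import io


def preview_explode(csv_text, document_field="Filename", default_ns="dc"):
    """Approximate what "explode metadata database" assigns to each file.

    Returns {filename: {qualified_metadata_name: value}}. Unqualified
    column labels are put into the default metadata set (step 5), and the
    document_field column (step 6) names the file each row belongs to.
    """
    # skipinitialspace=True ignores the blank after each comma.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    result = {}
    for row in reader:
        filename = row.pop(document_field)
        meta = {}
        for label, value in row.items():
            name = label if "." in label else f"{default_ns}.{label}"
            meta[name] = value
        result[filename] = meta
    return result


SAMPLE = """Filename,dc.Title,dc.Creator,Description,Contributor
a.txt, a, GZ, first test, GJZ
b.txt, b, GZ, second test, GJZ
c.html, c, GZ, third one, GJZ
"""

if __name__ == "__main__":
    for name, meta in preview_explode(SAMPLE).items():
        print(name, meta)
```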


Adding metadata with a new metadata scheme


Sometimes you have special metadata fields that do not fit any existing schema. You might want to create a new metadata set from scratch or by modifying an existing one. This can be done in Greenstone's GEMS (Greenstone Editor for Metadata Sets), or in GLI via "Manage Metadatasets..." in the "Enrich" tab.



I made a new metadata set, "my new DC", from Dublin Core, so it has no sub-level complications; everything is flat. I gave it a new namespace, "mdc", to distinguish it from the normal dc. I then used the same procedure as on the course website to explode a CSV metadata file, and it worked fine.
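Suppose you already have a CSV whose header uses "dc." labels and you want the values to land in the new "mdc" set. One option is to rewrite the header before exploding. The following is a minimal Python sketch. It assumes the new set kept the Dublin Core element names, as it would when copied from Dublin Core. `retarget_namespace` is a hypothetical helper and not part of Greenstone. As a side effect, the output is written with Unix-style (LF) line endings.

```python
import csv
import io


def retarget_namespace(csv_text, old_ns="dc", new_ns="mdc"):
    """Rewrite header labels such as "dc.Title" to "mdc.Title".

    Data rows are copied unchanged. Bare labels (no namespace) are left
    alone: they follow whichever default set is chosen when exploding.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    prefix = old_ns + "."
    rows[0] = [
        new_ns + "." + label[len(prefix):] if label.startswith(prefix) else label
        for label in rows[0]
    ]
    out = io.StringIO()
    # Write back with Unix-style (LF) line endings.
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(retarget_namespace("Filename,dc.Title,Description\na.txt,a,first test\n"))
```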



Subtlety with line end in CSV

Sometimes exploding the CSV file just gives you a 000001.nul file. It seems that Greenstone's CSVPlugin (as of 2.84) only works well with Unix-style line endings (LF), not with Windows-style ones (CR LF). Unfortunately, Excel generates the Windows style. One workaround is to copy the whole sheet from Excel into a Google Docs spreadsheet and then "download as" CSV. Google Docs seems to produce the kind of CSV Greenstone likes, and that file should explode fine. The Greenstone team needs to work on the CSVPlugin.
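If you would rather not round-trip through Google Docs, you can convert the line endings directly. The following is a minimal Python sketch that assumes line endings really are the cause, as described above. `to_unix_line_endings` is a hypothetical helper. It also converts a bare CR, just in case. One caveat: it rewrites line breaks inside quoted cells too, which is usually harmless for metadata.

```python
def to_unix_line_endings(src_path, dst_path):
    """Copy a CSV file, converting CR LF (Windows) and lone CR to LF (Unix)."""
    with open(src_path, "rb") as f:
        data = f.read()
    # Order matters: turn CR LF into LF first, then any remaining lone CR.
    data = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    with open(dst_path, "wb") as f:
        f.write(data)
```

Usage (hypothetical file names): `to_unix_line_endings("meta_from_excel.csv", "meta.csv")`, then add the converted meta.csv in the "Gather" tab as before.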
