Michael Armbrust

Berkeley, California, United States
2K followers 500+ connections

Experience

  • Databricks

    San Francisco Bay Area

  • -

    San Francisco Bay Area

  • -

    Mountain View, CA

Education

  • University of California, Berkeley
  • -

    Activities and Societies: Phi Beta Kappa, Delta Lambda Phi, Phi Sigma Pi, Mortar Board, College Mentors for Kids (CMFK), Computer Science Undergraduate Student Board (USB), Queer Student Union, Purdue Science Student Council (PSSC), Science Ambassadors, Boiler Gold Rush (BGR), Engineering Projects in Community Service (EPICS)

Publications

  • Creating Data Pipelines for PDS Datasets

    We present the details of an image processing pipeline and a new Python library providing a convenient interface to Planetary Data System (PDS) data products. The library aims to be a useful tool for general-purpose PDS processing. Test images have been extracted from existing PDS data products using the library, which will also work with lunar images from LRO/LROC. To process high-volume data sets we employ Hadoop, an open-source framework implementing the Map/Reduce paradigm for writing data-intensive distributed applications. By harnessing a cluster of processing nodes we are able to extract raw images from data products and convert them to web-friendly formats at the rate of gigabytes per minute. The resultant images have been converted using the Python Imaging Library. Additionally, the images have been cropped to postage stamp images supporting various zoom levels. The final images, along with some metadata, are uploaded to Amazon's S3 data storage system, where they are served. Preliminary tests of the pipeline are promising, having processed 10,000 sample files totaling 30 GB in 15 minutes. The resultant JPEGs totaled only 3 GB after compression. The code base has not only proven successful in its own right, but also shows Python, an interpreted language, to be a viable alternative to more mainstream compiled languages such as C/C++ or Fortran, especially when combined with Hadoop. This work was funded through NASA ROSES NNX09AD34G.
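
    The figures and the cropping step in the abstract can be illustrated with a minimal Python sketch. It is not the publication's code: the function names and the 256-pixel tile size are assumptions made here for illustration, and only the 30 GB, 15 minute, and 3 GB figures come from the abstract.

```python
def throughput_gb_per_min(total_gb, minutes):
    """Average processing rate in GB per minute."""
    return total_gb / minutes


def compression_ratio(input_gb, output_gb):
    """How many times smaller the output is than the input."""
    return input_gb / output_gb


def tile_boxes(width, height, tile=256):
    """Return (left, upper, right, lower) crop boxes that cover an image.

    The tile size of 256 pixels is a hypothetical choice, not a value
    taken from the abstract. Edge tiles are clipped to the image bounds.
    """
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            right = min(left + tile, width)
            lower = min(top + tile, height)
            boxes.append((left, top, right, lower))
    return boxes


if __name__ == "__main__":
    # Figures reported in the abstract: 30 GB in 15 minutes, 3 GB of JPEGs.
    print(throughput_gb_per_min(30, 15))
    print(compression_ratio(30, 3))
    # Hypothetical 512 x 300 pixel image split into postage-stamp tiles.
    print(tile_boxes(512, 300))
```

    Each box uses the (left, upper, right, lower) ordering that an image library's crop call would typically expect, so a separate set of boxes could be computed per zoom level after resizing the source image.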
