Wednesday, October 23, 2013

Managing the Deluge of 'Big Data' From Space

Ed remembered a link from a previous post that fits with where we are going to be as we find ways to return more data from missions.  Once you get the data back, how do you handle it?  How hard is it to digest what you received?  It also brought to mind the discussions about the large databases that companies and our national security folks are collecting.  As I mentioned to Ed, Amazon.com certainly has algorithms that can guess at what book I might like to buy.
-LRK-

-------------------------------------------

From one of your previous messages
http://www.nasa.gov/mission_pages/spitzer/news/spitzer20131017.html
Managing the Deluge of 'Big Data' From Space

An interesting overlap with your current post.

As best I can tell hard disk drives hit their equivalent of "Moore's Law" two to three years ago.
Up until then home disk drives at Costco were doubling in size every year or so.  They have ceased doing so.
Solid state drives may now catch up with spinning drives - but may hit "Moore's Law" before then.

One thing an optical link could really do is shrink antenna sizes on earth and in space, while maintaining current data rates.
For an optical link a parabolic reflector is probably lighter than a lens.
...
-------------------------------------------

I would not have been able to copy and save the Pioneer 10 and 11 Master Data Record data if it had come in the amounts that can be gathered now.
The data from the total mission time is just a drop in the bucket compared to what you might gather with something new, like the download rate of 622 megabits per second (Mbps) shown by NASA's Lunar Laser Communication Demonstration (LLCD).
Pioneer 10: 155 disks x 128 MB (16.33 GB)
Pioneer 11: 217 disks x 128 MB (23.01 GB)
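
To put "a drop in the bucket" into numbers, here is a rough back-of-the-envelope sketch in Python. It assumes the sizes listed above are decimal gigabytes, that the 622 Mbps rate could be sustained the whole time, and that there is no protocol overhead, none of which is guaranteed in practice. The function and variable names are my own, made up for illustration.

```python
# Rough transfer-time arithmetic using the figures quoted above.
# Assumptions (not from any NASA source): decimal gigabytes, a sustained
# link rate, and no protocol or error-correction overhead.

LLCD_RATE_MBPS = 622  # LLCD download rate in megabits per second

# Archive sizes in GB as listed above.
ARCHIVES_GB = {"Pioneer 10": 16.33, "Pioneer 11": 23.01}


def transfer_seconds(size_gb: float, rate_mbps: float) -> float:
    """Seconds needed to move size_gb gigabytes at rate_mbps megabits/s."""
    megabits = size_gb * 1000 * 8  # GB -> MB -> megabits
    return megabits / rate_mbps


if __name__ == "__main__":
    for name, size_gb in ARCHIVES_GB.items():
        secs = transfer_seconds(size_gb, LLCD_RATE_MBPS)
        print(f"{name}: {size_gb} GB in about {secs / 60:.1f} minutes")

    total_gb = sum(ARCHIVES_GB.values())
    total_secs = transfer_seconds(total_gb, LLCD_RATE_MBPS)
    print(f"Both archives: about {total_secs / 60:.1f} minutes")
```

Under those assumptions, both Pioneer archives together would come down in under ten minutes at the LLCD rate.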

NASA is looking at the problems of dealing with large amounts of data, as Ed reminded me.
-LRK-

-------------------------------------------
For NASA and its dozens of missions, data pour in every day like rushing rivers. Spacecraft monitor everything from our home planet to faraway galaxies, beaming back images and information to Earth. All those digital records need to be stored, indexed and processed so that spacecraft engineers, scientists and people across the globe can use the data to understand Earth and the universe beyond.
At NASA's Jet Propulsion Laboratory in Pasadena, Calif., mission planners and software engineers are coming up with new strategies for managing the ever-increasing flow of such large and complex data streams, referred to in the information technology community as "big data."
How big is big data? For NASA missions, hundreds of terabytes are gathered every hour. Just one terabyte is equivalent to the information printed on 50,000 trees worth of paper.
"Scientists use big data for everything from predicting weather on Earth to monitoring ice caps on Mars to searching for distant galaxies," said Eric De Jong of JPL, principal investigator for NASA’s Solar System Visualization project, which converts NASA mission science into visualization products that researchers can use. "We are the keepers of the data, and the users are the astronomers and scientists who need images, mosaics, maps and movies to find patterns and verify theories."
...
-------------------------------------------

Even if you have the kind of money the NSA commands to store the data it gathers, just running the hardware can be a problem.
- LRK -

-------------------------------------------
The NSA's Hugely Expensive Utah Data Center Has Major Electrical Problems And Basically Isn't Working
10/07/2013 @ 9:31PM

Well, this is good news for those with privacy concerns about the NSA and terrible news for those concerned about government spending. The National Security Agency’s new billion-dollar-plus data center in Bluffdale, Utah was supposed to go online in September, but the Wall Street Journal’s Siobhan Gorman reports that it has major electrical problems and that the facility known as “the country’s biggest spy center” is presently nearly unusable:

"Chronic electrical surges at the massive new data-storage facility central to the National Security Agency’s spying operation have destroyed hundreds of thousands of dollars worth of machinery and delayed the center’s opening for a year, according to project documents and current and former officials.

There have been 10 meltdowns in the past 13 months that have prevented the NSA from using computers at its new Utah data-storage center, slated to be the spy agency’s largest, according to project documents reviewed by The Wall Street Journal."
...
-------------------------------------------

Data from missions has to be used and evaluated. The same goes for other databases.
-LRK-

-------------------------------------------
Data mining

Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD),[1] an interdisciplinary subfield of computer science,[2][3][4] is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems.[2] The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.[2] Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.[2]
The term is a buzzword,[5] and is frequently misused to mean any form of large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics) but is also generalized to any kind of computer decision support system, including artificial intelligence, machine learning, and business intelligence. In the proper use of the word, the key term is discovery[citation needed], commonly defined as "detecting something new". Even the popular book "Data mining: Practical machine learning tools and techniques with Java"[6] (which covers mostly machine learning material) was originally to be named just "Practical machine learning", and the term "data mining" was only added for marketing reasons.[7] Often the more general terms "(large scale) data analysis", or "analytics" – or when referring to actual methods, artificial intelligence and machine learning – are more appropriate.
...
--------------------------------------------

Just for fun, consider the Internet and its many nodes, connections, and data transferred.  Now take a snapshot of what is out there and store it in the Web Archive.
-LRK-

--------------------------------------------
The Internet Archive has updated its Wayback Machine with a significant bump in coverage: the service has gone from 150,000,000,000 URLs to having 240,000,000,000 URLs, a total of about 5 petabytes of data. More specifically, the Wayback Machine now covers the Web from late 1996 to December 9, 2012.
Why is this significant? Well, as the Internet Archive points out, the Wayback Machine’s database is queried over 1,000 times every second by over 500,000 people a day, making Archive.org the 250th most popular site on the Web.
The team has spent the last year archiving various pages, including those related to the US 2012 presidential election. Yet the Wayback Machine doesn’t just feature archives of sites that still exist today: it also contains many that no longer exist.
...
https://archive.org/
Wayback Machine
--------------------------------------------
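
The Wayback Machine figures quoted above invite a little arithmetic. This is only a sketch in Python: it assumes "about 5 petabytes" means 5 x 10^15 bytes and takes the "over 1,000 times every second" figure as a lower bound. The names are mine, chosen for illustration.

```python
# Back-of-the-envelope numbers from the quoted Wayback Machine figures.
# Assumptions: decimal petabytes, and the query rate held as a floor
# for a full 24-hour day.

URL_COUNT = 240_000_000_000   # URLs in the updated Wayback Machine
ARCHIVE_BYTES = 5 * 10**15    # "about 5 petabytes"
QUERIES_PER_SECOND = 1_000    # "over 1,000 times every second"
SECONDS_PER_DAY = 24 * 60 * 60


def average_bytes_per_url(total_bytes: int, url_count: int) -> float:
    """Average stored size per archived URL, in bytes."""
    return total_bytes / url_count


def queries_per_day(queries_per_second: int) -> int:
    """Queries in one day at a constant per-second rate."""
    return queries_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    avg = average_bytes_per_url(ARCHIVE_BYTES, URL_COUNT)
    print(f"Average per URL: about {avg / 1000:.1f} kB")
    print(f"Queries per day: at least {queries_per_day(QUERIES_PER_SECOND):,}")
```

On those assumptions the archive averages roughly 21 kB per stored URL and answers at least 86 million queries a day.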

And if you like science fiction that comes close to reality, or might in a few years, you might enjoy the WWW Trilogy by Robert J. Sawyer, where the Internet wakes up and begins to communicate with a blind girl. The security folks don't like someone else looking at the whole Internet.  (I am cheap and bought my copies used.) :-)
-LRK-

--------------------------------------------
The World Wide Web wakes up
"Lately, I've been inspired by ideas from Robert J. Sawyer."
—Artificial-intelligence pioneer Marvin Minsky

WAKE

WATCH

WONDER

--------------------------------------------

Thanks for looking up with me.  
- LRK -
============================================

WHAT THE MIND CAN CONCEIVE, AND BELIEVE, IT WILL ACHIEVE - LRK -

============================================