Approved for public release; distribution is unlimited.

Document created: 1 December 05
Air & Space Power Journal - Winter 2005

Diving the Digital Dumpster

The Impact of the Internet on Collecting
Open-Source Intelligence

Lt Col David A. Umphress, PhD, USAFR

Editorial Abstract: Initially a research project for networking computers, the Internet now enables a free flow of information and has woven itself into the very fabric of our culture. However, the Air Force should be judicious about the information it places on Web sites. In this article, Colonel Umphress argues that although the Internet is an information system we cannot avoid, the Air Force must use this resource responsibly to avoid falling prey to its vulnerabilities.

Air Force organizations commonly ponder the type of information they should post to Web sites. On the one hand, they could reasonably consider posting as much information as possible. Web sites so constructed might include an organization’s functions; its list of personnel; details of its operations, policies, major decisions, finances; and so forth, thus conveying a sense of openness and transparency. Visitors to the site could easily find whatever they are looking for. On the other hand, these organizations could argue in favor of posting very little information beyond perhaps their names and post-office-box addresses. Although such a policy would not present a very friendly “Web presence,” it would certainly prevent someone from using information for nefarious purposes.

Common sense says that the answer lies somewhere between these two extremes. But where? Although the Internet makes possible the free flow of information, the Air Force should not necessarily make all information freely available through the Internet. Obviously, the service should not post classified or sensitive information, appropriate only for a restricted audience, without appropriate information-protection mechanisms. The less obvious question addresses how much unclassified information the Air Force should make publicly available, realizing the possibility of assembling compromising intelligence from seemingly innocent information.

We live in an information age, one requiring that we carefully consider possible threats to national security before we openly provide certain information. This article explores the issue of legal data collection in the context of the Internet by describing its susceptibility to exploitation for open-source intelligence (OSINT), current Air Force efforts to prevent OSINT collection, current practices that expose the Air Force to such collection, and possible countermeasures.1

The Internet:
An Information Delivery System We Cannot Avoid

Picture a world in which everyone has a printing press and potential access to everyone else’s documents. Because of today’s technology—specifically, the Internet—this image is not far removed from reality. Rising from the modest roots of 1960s technology, the Internet currently attracts an estimated 935 million users across more than 214 countries. At its current growth rate, usage could reach world saturation by 2010.2

The Internet first demonstrated its usefulness by providing the underlying computer-network infrastructure for transmitting data files from one computer to another, thereby spawning electronic mail, news groups, chat boards, and other applications that support information transfer. Over the past decade, the World Wide Web (WWW) has given the Internet a user-friendly veneer by providing data-transfer protocols and addressing schemes needed to deliver text, pictures, sounds, and videos. It has put the Internet—hence information—directly into the hands of ordinary citizens. Indeed, the WWW has made it possible for anyone with access to a Web server—something supplied by all major Internet service providers—to publish information to the world.

The Internet’s potential has prompted technology pundits to declare it a major force of change because it offers information unfettered by physical, political, or cultural boundaries. With the Internet, for instance, a student can find information on how computer networks work; a civil engineer can locate an aerial photo of a road system; a child can send an instant message to a military mother deployed to a foreign country; a doctor can download a scholarly paper on diseases; a shopper can purchase electronic equipment from a geographically distant retailer; a tourist can read information about her native country in her native language; a blogger can append a description of daily observations to a Web log; and so forth.

This hunger for information is not likely to subside. Industry is increasingly turning to publishing information on Web pages in an effort to enable consumers to find answers to their questions. The US government has taken a similar path with the E-Government Act of 2002, which promotes “establishing a broad framework of measures that require using Internet-based information technology to enhance citizen access to Government information and services.”3 The Department of Defense (DOD) echoes this desire with a policy that states that “using the World Wide Web is strongly encouraged in that it provides the DoD with a powerful tool to convey information quickly and efficiently on a broad range of topics relating to its activities, objectives, policies and programs.”4 Each military service has a derivative policy conveying the same idea.5

The Internet as
Open-Source Intelligence

We no longer question whether to use the Internet to convey information to the public—only what information. It is naïve to think of the Internet purely in terms of the 1960s fulfillment of a “global embrace” in which we use information for the betterment of all.6 Instead, we need to consider the Internet a vast pool of data from which we can draw information, recognizing that doing so might lead to unintended consequences.7

People commonly use the Internet, particularly the WWW, for open-source information—that is, “publicly available information (i.e., any member of the public could lawfully obtain the information by request or observation).”8 Because the Internet has such a popular following, it is a good candidate for OSINT, the discipline of acquiring open-source information for the purpose of answering a specific question. The student, civil engineer, doctor, and so forth, in the earlier hypothetical examples practiced OSINT in their use of the Internet. One could easily imagine more sinister real-life scenarios of Internet OSINT: using the Internet to locate information on how to build a bomb; to obtain high-resolution aerial photos of major US metropolitan areas, including many military installations; to learn the lethality level of various nerve agents; to download computer-hacking tools; and to learn how to conduct OSINT operations.9 Plainly put, the Internet can serve as a resource for helping schoolchildren with their homework as well as for helping terrorists plan attacks.

The Internet is not the only source of OSINT. Other forms include newspapers, phone books, scientific journals, textbooks, broadcasts, and the like. A combination of three features makes the Internet unique. First, it provides access to the largest body of public information in the world. One can envision the Internet as having both surface and deep content. The surface Internet refers to information accessible through search engines and public links. Traditional WWW pages alone contain, at a minimum, an estimated 170 terabytes of information—a body of data roughly 17 times the size of the print holdings of the Library of Congress.10 The deep Internet describes publicly available information—but only to those who know how to access it. This includes information that software assembles on the fly; that resides in a database, transmitted in response to a specific query; or that simply does not have a publicly known address. Examples include the Berkeley Digital Library Project and Amazon.com.11 Estimates put the volume of the deep Internet at over 400 times that of its surface counterpart.12

Second, as noted previously, since anyone can publish to the Internet, that information might be unregulated and unexpurgated, even within organizations that have strict constraints on electronic publishing. Web pages that abet audience comments—such as public chat rooms, communities of interest, and newsgroups—frequently convey information that a public affairs office would not approve for publication. Blogs and discussion forums frequently offer a unique mixture of raw information and emotion, providing an interested observer not only with content but also with a sense of how it is perceived. A sponsoring organization may monitor such Web pages, in which case it will remove damaging information—but oftentimes not before its wide dissemination.

Third, the information appears in a format that computers can process. Unlike open sources such as print media and broadcasts, which require humans to identify and isolate every piece of information, the Internet has facilities that search for specific information based on certain characteristics (e.g., keywords, location, organization, etc.). As one commentator on the intelligence community puts it,

Collecting intelligence these days is at times less a matter of stealing through dark alleys in a foreign land to meet some secret agent than one of surfing the Internet under the fluorescent lights of an office cubicle to find some open source. The world is changing with the advance of commerce and technology. Mouse clicks and online dictionaries today often prove more useful than stylish cloaks and shiny daggers in gathering intelligence required to help analysts and officials understand the world.13

This is not to say that finding useful intelligence on the Internet is easy. Quite the contrary: the superabundance of information means that specific searches often yield an intractable number of results, the majority of which are irrelevant. Primitive search capabilities based on an exact keyword match have such a narrow focus that they may miss useful intelligence.

OSINT Technology Fronts
on the Internet

Managing information on the scale available through the Internet has become an active research area for today’s computer scientists—an area in which major strides occur almost continuously. Technology fronts of particular interest to Internet OSINT include search engines and data mining.

Search Engines

A search engine is a software system that allows users to locate an Internet resource—a Web page, news-group entry, or other public file—based on some search characteristic. Popular Internet search engines include Google, Alta Vista, Excite, Yahoo, and MSN Search.14 Currently the leader in the highly competitive search-engine market, Google well illustrates the information-location capabilities available today. With a searchable database consisting of over 8 billion Web pages, Google seeks “to organize the world’s information and make it universally accessible and useful.”15 To that end, it offers a wide selection of searching capabilities, including searches constrained to a particular segment of the Internet, such as scholarly journals, news groups, military Web sites, geographic locations, and so forth. Google not only locates information based on textual keywords and phrases but also provides limited capabilities for locating images based on the name of the file containing the image.

Google’s searchable database uses two approaches common to other search engines. First, it draws most of its information from “crawling” Web pages. That is, Google downloads the content of a Web page; indexes that information into a database, based on a number of parameters; and then downloads the content of pages linked to the Web page under examination.16 In this fashion, it visits page upon page by following links. Second, it has the user community submit addresses of Web pages, thus forming a human-edited directory. Categorizing the submissions into a directory that resembles the yellow pages of a phone book allows incorporation of information into the search database that might not have links on pages on the path of the Web crawler. These submissions, presumably, then become candidates for crawling by the automated Web-crawling mechanism.
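
The crawl-and-index cycle described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of Google’s actual software: the PAGES dictionary is a hypothetical in-memory stand-in for the Web, the example.mil addresses are invented, and the word index is far simpler than anything a commercial search engine would use.

```python
from collections import deque
import re

# Hypothetical in-memory "Web": address -> (page text, outgoing links).
# These pages are invented for illustration only.
PAGES = {
    "http://example.mil/": (
        "Welcome to the wing home page",
        ["http://example.mil/units", "http://example.mil/news"],
    ),
    "http://example.mil/units": (
        "Units and organization of the wing",
        ["http://example.mil/"],
    ),
    "http://example.mil/news": (
        "Exercise news for the wing",
        ["http://example.mil/units"],
    ),
    "http://example.mil/orphan": ("Page with no inbound links", []),
}


def crawl(seed_urls, pages):
    """Breadth-first crawl: read a page, index its words, queue its links."""
    index = {}      # word -> set of addresses whose text contains the word
    visited = set()
    queue = deque(seed_urls)
    while queue:
        url = queue.popleft()
        if url in visited or url not in pages:
            continue
        visited.add(url)
        text, links = pages[url]
        for word in re.findall(r"[a-z]+", text.lower()):
            index.setdefault(word, set()).add(url)
        queue.extend(links)
    return index, visited


# Crawl from the home page only, then again with a user-submitted address.
index, visited = crawl(["http://example.mil/"], PAGES)
index2, visited2 = crawl(["http://example.mil/", "http://example.mil/orphan"], PAGES)
```

Starting from the home page alone, the crawl never reaches the page that nothing links to. Adding that address as a seed, as a user submission would, brings it into the index.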

The company recently added an “alert” feature, which sends e-mail to users who have registered search criteria, notifying them when the search engine finds matching content. Google points out that users can employ alerts for “monitoring a developing news story[,] keeping current on a competitor or industry[,] getting the latest on a celebrity or event[, and] keeping tabs on your favorite sports teams.”17 Because this feature uses the same search techniques as the traditional Google services, it might yield irrelevant results; however, it ups the OSINT ante by allowing the user to wait passively for information.

For the technologically sophisticated audience, Google makes many of its services accessible programmatically through “application program interfaces.”18 In other words, a user can write software that draws upon Google’s search features rather than having to use its Web interface. Thus, a user can have a highly specialized program for finding information based on an in-depth analysis of the results of Google searches, possibly combining Google results with those from other sources.

Data Mining

Also known as knowledge discovery, data mining attempts to extract meaning from large amounts of data. “Soft” data mining identifies already-existing patterns of knowledge within a body of data; “hard” data mining discovers heretofore unknown knowledge from a body of data. The former is analogous to an analyst recognizing specific bits of intelligence from a mass of raw data; the latter to a scientist originating new facts by extending concepts gleaned from raw data. Both are relevant to OSINT operations for different reasons; however, both require extensive human intervention because they demand understanding the semantics of information—something very difficult to automate.

Common search engines give a glimpse into the world of data mining at the most fundamental level. All engines present search results in some type of rank ordering. Some rank results based on the number of times keywords appear in the content. Google, for example, utilizes a more complex approach by presenting search results based on a “fitness” measurement that takes into account the content of the page, number of links pointing to the page, arrangement of information on the page, and other factors.19
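
The difference between the two ranking styles can be shown with a toy example. The sketch below is hypothetical throughout: the pages, the inbound-link counts, and the link_weight value are invented, and the combined score is only a stand-in for the idea of a “fitness” measurement, not the formula any real search engine uses.

```python
import re

# Hypothetical pages: name -> (page text, number of inbound links).
PAGES = {
    "a": ("radar radar radar schedule", 1),
    "b": ("radar maintenance overview", 40),
    "c": ("dining hall menu", 5),
}


def keyword_count(text, keyword):
    """Count whole-word occurrences of the keyword, ignoring case."""
    pattern = r"\b%s\b" % re.escape(keyword.lower())
    return len(re.findall(pattern, text.lower()))


def rank_by_count(pages, keyword):
    """Order matching pages by how often the keyword appears."""
    scored = [(keyword_count(text, keyword), name) for name, (text, _) in pages.items()]
    ordered = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [name for score, name in ordered if score > 0]


def rank_by_fitness(pages, keyword, link_weight=0.1):
    """Order matching pages by keyword count plus weighted inbound links.

    The weight is an arbitrary illustrative value.
    """
    scored = []
    for name, (text, inbound) in pages.items():
        count = keyword_count(text, keyword)
        if count:
            scored.append((count + link_weight * inbound, name))
    return [name for score, name in sorted(scored, key=lambda item: (-item[0], item[1]))]


by_count = rank_by_count(PAGES, "radar")
by_fitness = rank_by_fitness(PAGES, "radar")
```

With these invented numbers, the page that repeats the keyword most often leads the count-based ranking, while the heavily linked page leads once inbound links are taken into account.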

Google’s search results typically have three links: one pointing to the address of the Web page where the search result was initially found, one pointing to the copy of the Web page at the time the search result was found, and one pointing to pages “similar to” the page pointed to by the search result. The company has not revealed how it determines its similar-to links. They appear to be based on major keywords on the target page, perhaps employing synonyms as well. Regardless of the underlying mechanism, the similar-to links seem to represent the accepted state-of-the-practice of generalized Internet data mining.

Air Force Efforts to
Prevent OSINT

The World War II maxim “loose lips sink ships” well illustrates the concern with public disclosure of information relating to military operations. In a sense, the Internet is an extremely large collection of loose lips. The Air Force attempts to prevent these lips from sinking its metaphorical ships through three primary means: policies, technological practices, and training.

Policies

As addressed earlier, policies show that the DOD recognizes the worth of using the Web to promote public awareness of the military services. It also recognizes the danger that accompanies public disclosure:

The considerable mission benefits gained by using the Web must be carefully balanced through the application of comprehensive risk management procedures against the potential risk to DoD interests, such as national security, the conduct of federal programs, the safety and security of personnel or assets, or individual privacy created by having electronically aggregated DoD information more readily accessible to a worldwide audience.20

This balance is codified in a chain of policies that begin with DOD Directive (DODD) 5230.9, Clearance of DOD Information for Public Release, which states that “information shall be reviewed for clearance by appropriate security review and public affairs offices prior to release.”21 Attendant DOD Instruction (DODI) 5230.29, Security and Policy Review of DOD Information for Public Release, assigns the director of the Washington Headquarters Services the responsibility of monitoring compliance of public disclosure and directs each DOD component to “issue any guidance necessary for the internal administration” of information released to the public.22

With DODD 5230.9 and DODI 5230.29 primarily setting the stage for information released to the public, the DOD’s Web Site Administration Policies and Procedures specifically addresses the face that the DOD places on the Web.23 It issues instructions on the process a Web site administrator goes through in posting information to the Web, noting in particular a number of reviews that information should undergo before release.24 Importantly, it also contains a “guide for identifying information inappropriate for posting to a publicly accessible DOD web site.”25 Although termed a guide, the categories of information deemed incompatible with public release are quite extensive and detailed. They address areas of military operations and exercises, personnel information, proprietary information, test-and-evaluation information, scientific and technological information, intelligence information, and miscellaneous confidential information. As a whole, the categories preclude the publishing of maps, detailed organization charts, notices of exercises, and so forth.26

Air Force–specific policy mirrors DOD policy. For example, Air Force Instruction (AFI) 35-101, Public Affairs Policies and Procedures, places Web information under the purview of the public affairs function and echoes the content guidelines of the DOD-level Web Site Administration Policies and Procedures.27 AFI 33-129, Web Management and Internet Use, gives individual commands the authority to establish Web sites, subject to approval by higher headquarters, assigning them responsibility for assuring the “content and security” of information posted to the Web.28 Importantly, it also directs that all Web sites be evaluated using a checklist that includes open-source vulnerability criteria.29 AFI 10-1101, Operations Security (OPSEC), also provides indicators of information vulnerabilities, albeit in a more general sense than does AFI 33-129.30

Technological Practices

AFI 33-129 explicitly segments the Air Force’s public Web information into two parts: pages accessible to the general public and pages intended for a restricted audience, namely .mil and .gov users.31 Public pages contain information released through Air Force public-affairs channels, accessible by any Web browser. A limited number of users can access private pages, based on the network address of their browsing computer, password, and possibly other information-assurance certifications. AFI 33-129 further outlines the security mechanisms (e.g., password and type of encryption) appropriate for placement on Web site information.32
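
The address-and-password gate on restricted pages can be pictured with a short sketch. Everything specific here is an assumption made for illustration: the allowed networks are reserved documentation ranges rather than real military address space, the function name is hypothetical, and requiring both checks at once is one possible reading of the policy, not a statement of what AFI 33-129 mandates.

```python
import ipaddress

# Hypothetical allowed networks (reserved documentation ranges).
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def restricted_page_allowed(client_ip, password_ok):
    """Allow access only from an allowed network and with a valid password.

    Combining the two checks with "and" is an assumption of this sketch.
    """
    address = ipaddress.ip_address(client_ip)
    on_allowed_network = any(address in network for network in ALLOWED_NETWORKS)
    return on_allowed_network and password_ok
```

A request from 192.0.2.44 with a valid password passes; the same request from an address outside the listed networks, or without the password, does not.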

Training

Personnel authorized to place information on Air Force Web pages must undergo training in topics relating to OPSEC, Privacy Act information, information designated “for official use only,” and Web administration.33 The Web Administration course (the primary training source), offered through the Air Force’s computer-based training program, distills much of AFI 33-129 into practices required specifically of people working at the operational level.34 Although the course addresses the protection of Web information, it emphasizes network security and only alludes to OPSEC and information vulnerability, which it treats perfunctorily.

Typically, the base or wing prescribes supplemental training. Maxwell AFB, Alabama, for example, requires Webmasters to pass a locally constructed course that covers AFI 33-129 in depth. Additionally, the base Webmaster holds quarterly meetings of Web-page maintainers, during which personnel learn about the latest changes in Internet policies. Training related to information protection uses the same means as does the line Air Force: computer-based courses and briefings on information-assurance awareness, network-user license awareness, Privacy Act information, and operational risk management.

OSINT Vulnerabilities
on the Internet

Assuming adherence to all of its policies and practices, the Air Force appears to have in place the mechanisms needed to minimize exposure to traditional OSINT collection. What vulnerabilities remain? The answer lies along two fronts.

First, information available on sites outside the scope of Air Force control represents a much deadlier threat than the OSINT-collection possibilities on Air Force–owned systems because the service has no control over the data (nor does the US government, in the case of information protected by the First Amendment or of Internet portals hosted by foreign entities). Moreover, the sheer volume of such data, relative to that available on Air Force systems, increases the probability that useful intelligence information actually exists and can be found.

The second, more tractable, threat involves systems under Air Force control. The current process to reduce the risk of OSINT collection focuses almost exclusively on information content, but the Internet delivers more than content over open sources: it also delivers metadata, that is, information about content. For example, when a Web browser requests a Web page, the Web server transmitting the page may also transmit the date of the Web page’s creation and modification as well as the name of the server software transmitting the page. The dates yield insight into the currency of information on the Web page; the software name points an attacker to documented software flaws susceptible to network attacks. Similarly, because the uniform resource locator (URL), the addressing scheme used by the WWW, contains a wealth of information, it can be unintentionally compromising. The Air Force’s Internet policies and practices mitigate risk from visible information but not from the unseen information that accompanies transmitted content.
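
A short sketch shows how little effort it takes to read this collateral information. The response headers and the URL below are hypothetical samples written for illustration (ExampleHTTPD and www.example.mil are invented names); Server and Last-Modified are standard HTTP header fields, and the parsing uses only the Python standard library.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical response headers of the kind a Web server may send with a page.
SAMPLE_HEADERS = """\
HTTP/1.1 200 OK
Date: Thu, 01 Dec 2005 12:00:00 GMT
Server: ExampleHTTPD/1.3.27 (Unix)
Last-Modified: Mon, 14 Nov 2005 08:30:00 GMT
Content-Type: text/html
"""


def parse_headers(raw):
    """Turn a raw header block into a dictionary keyed by lowercase name."""
    headers = {}
    for line in raw.splitlines()[1:]:   # skip the status line
        if ":" in line:
            name, value = line.split(":", 1)
            headers[name.strip().lower()] = value.strip()
    return headers


def url_clues(url):
    """Break a URL into the parts an observer can read meaning into."""
    parts = urlsplit(url)
    return {
        "host": parts.hostname,
        "path_segments": [segment for segment in parts.path.split("/") if segment],
        "query": parse_qs(parts.query),
    }


headers = parse_headers(SAMPLE_HEADERS)
clues = url_clues("http://www.example.mil/wing/ops/exercise/schedule.asp?unit=42&year=2005")
```

In this invented case the headers disclose the server product and version along with the page’s last modification date, and the URL alone suggests an organizational hierarchy, a scripting technology, and the parameters the page accepts.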

Deterrents and

The Internet presents the Air Force with an inexpensive and pervasive mechanism for transmitting information. It can improve communication with the public and within the Air Force community. However, any information that the Air Force releases to the public could also potentially reveal a military vulnerability. The following recommendations address the need to decrease the Air Force’s exposure to vulnerabilities.

Long-Term Recommendations

The Air Force should present itself as a player in crafting national and international policy for open-source information. At present, such policy is in the formative stage; immediate and aggressive support could put the Air Force in a role to substantially influence the management of information on the Internet. Some institutions have already offered suggestions regarding such policy. For example, a recent University of Maryland report proposes developing a definition of sensitive information that is unclassified but controlled; identifying mechanisms for controlling such information within the public, private, and academic/scientific sectors; encouraging a process of reviewing research findings for possible open-source vulnerabilities; and launching an education campaign to make all sectors of the information community aware of such vulnerabilities.35

Currently, we have no principal defender of cyberspace in the same sense that we have principal defenders of land, sea, and air. Since the Air Force includes information superiority as one of its distinctive capabilities—as demonstrated by its capabilities in information warfare and network security—the service is in a position to assume that role.36 Adopting positions advocated by the University of Maryland’s policy researchers, particularly the ones outlined above, will move the service in that direction.

Short-Term Recommendations

The Air Force could take a number of actions within its own bounds, thereby eliminating the need for cooperation from multiple information communities.

1. Work with the research community to help it understand its ethical obligation to control the distribution of sensitive work (a recommendation of the University of Maryland’s report). The authors of the report clearly understand the culture of biological and nuclear scientists, an audience that has traditionally appreciated the need to use research results responsibly. However, this optimism may prove misplaced when it comes to engendering a culture of introspection and cooperation among the computer-hacker community, which historically has flouted conventional ethical standards by posting online tools and techniques for breaking into computers, launching denial-of-service attacks, constructing viruses, and so forth.

These actions have isolated the hacker community into a subculture all its own. Few legitimate organizations are willing to understand the hacker community, much less work with it. As information plays an increasingly important role in maintaining the country’s infrastructure, it becomes necessary to persuade those who could damage that infrastructure to adopt mainstream ethical behavior. The Air Force can and should assume this role by working actively with the computer-hacker community, attending hackers’ conferences, and showing a desire to rechannel their creative work into more productive endeavors. A successful effort would reduce the amount of computer-hacking information openly available to the public and would facilitate understanding and prevention of possible attacks.

2. Reevaluate Web-site policy. Web sites should periodically review their presentation of information in light of advancing search-engine capabilities. Organizations should be encouraged to determine how they want their Web information accessed. Search engines’ default position of directing searchers to any page within a Web site allows visitors to access pages deep in a semantic hierarchy of Web pages without first visiting so-called entry pages, which provide context and meaning. Air Force organizations wishing to enforce entry pages should take appropriate action to remove access to deep pages by search-engine crawlers.

3. Identify an information-removal policy. Air Force policy that calls for the removal of sensitive information discovered on its Web sites is only a starting point. It should follow up with an assessment of who may have gotten that information, creation of worst-case and probable-case scenarios should someone use the information, actions to take if the user copies information to a server beyond Air Force control, and so forth.

4. Establish an active OSINT-collection initiative within the Air Force that would attempt to track down useful intelligence data for the purpose of identifying how the information came to be posted on the Internet. If the information originated with people under Air Force jurisdiction (e.g., military personnel, employees, or contractors), then the service could take remedial action.

5. Assemble a team to examine the metadata exposure of Air Force Web sites and develop recommendations for minimizing such vulnerabilities. Make recommendations for improving Web administrators’ awareness of these technical OSINT possibilities.

6. Conduct red-team attacks regularly on each organization’s Web presence, looking specifically for OSINT candidates and examining the content of each Web page as well as browser tags that describe the rendering of information. These tags can reveal information that the browser does not display but that is useful in determining the software capabilities of the page builder.

7. Enable or disable Web pages for searches. Use the robots “noindex” and “nofollow” directives to keep legitimate search-engine Web crawlers from indexing Web pages and following their links. To thwart illicit crawlers that ignore such directives and surreptitiously examine the Web, use links that require the user to enter the text displayed in computer-generated pictures. Analyze Web traffic to detect traces of automated Web crawlers.
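
Several of the short-term recommendations above (2, 6, and 7) lend themselves to simple tooling. The following sketch is a rough illustration only. The robots.txt content, page source, log records, the example.mil host, and the max_requests threshold are all hypothetical; robots.txt is the conventional file that well-behaved crawlers consult, and the parsing relies on the Python standard library.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Recommendations 2 and 7: ask well-behaved crawlers to stay out of deep pages.
# Hypothetical robots.txt for an illustrative site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /units/
Disallow: /docs/
"""
robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())


def may_crawl(path, agent="ExampleBot"):
    """True if a crawler that honors robots.txt may fetch the path."""
    return robots.can_fetch(agent, "http://www.example.mil" + path)


# Recommendation 6: look for information a browser does not display.
# Hypothetical page source; the tag values and the comment are invented.
SAMPLE_PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<meta name="generator" content="ExampleEditor 4.0">
<title>Unit Home</title></head>
<body><!-- draft copied from intranet server alpha01 -->
<p>Welcome.</p></body></html>"""


class HiddenInfoFinder(HTMLParser):
    """Collects meta tags and comments, neither of which is rendered."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.comments = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "name" in attrs and "content" in attrs:
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_comment(self, data):
        self.comments.append(data.strip())


finder = HiddenInfoFinder()
finder.feed(SAMPLE_PAGE)

# Recommendation 7: look for traces of automated crawlers in Web traffic.
# Hypothetical, simplified log records: (client address, path, user agent).
LOG = [
    ("192.0.2.10", "/robots.txt", "ExampleBot/1.0"),
    ("192.0.2.10", "/", "ExampleBot/1.0"),
    ("198.51.100.7", "/", "Mozilla/4.0"),
    ("198.51.100.7", "/news.html", "Mozilla/4.0"),
] + [("203.0.113.5", "/page%d.html" % i, "Mozilla/4.0") for i in range(50)]


def suspected_crawlers(log, max_requests=30, markers=("bot", "crawler", "spider")):
    """Map each suspicious client to the reasons it was flagged."""
    counts = Counter(client for client, _, _ in log)
    suspects = {}
    for client, path, agent in log:
        reasons = suspects.setdefault(client, set())
        if path == "/robots.txt":
            reasons.add("fetched robots.txt")
        if any(marker in agent.lower() for marker in markers):
            reasons.add("self-identified crawler")
        if counts[client] > max_requests:
            reasons.add("high request volume")
    return {client: reasons for client, reasons in suspects.items() if reasons}


suspects = suspected_crawlers(LOG)
```

Note the limits of each measure. The robots.txt file and the robots meta tag only request restraint, so they stop cooperative crawlers and nothing else; the log check catches the client that announces itself and the one that poses as a browser but requests pages at machine speed, while an ordinary visitor goes unflagged.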

Conclusion

Not a passing trend, Internet technology has woven itself into the very fabric of our culture. Although it began as a research project for networking computers, it has evolved into a societal project for networking people. That the Internet is an inherently public medium makes it attractive to individuals—both friendly and unfriendly—who seek information over open sources.

Air Force policies and practices put into place mechanisms for minimizing the risk of exposing sensitive content on its public Web sites. However, two vulnerabilities remain: OSINT collection from Web sites not under Air Force control and intelligence collection from metadata delivered collaterally with content. Generally intractable, the first vulnerability requires defensive measures to detect the intelligence and then take action necessary to protect the target of the intelligence. The second vulnerability, which lies within the scope of Air Force control, requires offensive measures to change the way of posting information to public Web sites.


Notes

1. This article uses Internet as an umbrella term for describing any information service available to globally networked computer users. Although such usage is technically inaccurate, it reflects the colloquial trend of referring to e-mail, Internet messaging, the World Wide Web, and so forth, by the generic network that carries them.

Strictly speaking, the Internet is a worldwide system of connected computers that exchange information using a common protocol. Information sent over the Internet is broken into small segments, each one transmitted along computers in the Internet network until it reaches its destination computer, at which point the segments are reassembled into the original format. The Internet proper remains unaware of the content of the information transmitted. It simply provides a mechanism for transporting data, regardless of whether that data represents text, static images, voice, video, or sound.

Software running on computers connected to the Internet uses the Internet to provide useful services. Built on top of the basic Internet transmission protocol, these services, in effect, give meaning to the information routed through the network. Typical Internet services include sending and receiving electronic mail (SMTP protocol), logging in to a remote computer (telnet protocol), text conferencing (IRC protocol), transferring files (FTP protocol), posting and reading user-generated news (NNTP protocol), and retrieving and displaying a Web page (HTTP protocol).

2. “Worldwide Internet Users Will Top 1 Billion in 2005,” Computer Industry Almanac, Inc., 3 September 2004, http://www.c-i-a.com/pr0904.htm; “Rank Order—Internet Users,” The World Factbook, 17 May 2005, http://www.odci.gov/cia/publications/factbook/rankorder/2153rank.html; and “Global Internet Trends,” Nielsen/NetRatings, http://www.nielsen-netratings.com.

3. “The E-Government Act of 2002,” H.R. 2458/S. 803, 17 December 2002, E-Gov, http://www.whitehouse.gov/omb/egov/g-4-act.html.

4. Web Site Administration Policies and Procedures (Washington, DC: Department of Defense, Office of the Assistant Secretary of Defense, 25 November 1998), http://www.defenselink.mil/webmasters/policy/dod_web_policy_12071998_with_amendments_and_corrections.html.

5. Army Regulation (AR) 25-1, Army Knowledge Management and Information Technology, 15 July 2005, http://www.usapa.army.mil/pdffiles/r25_1.pdf; Secretary of the Navy Instruction (SECNAVINST) 5720.47A, Department of the Navy Policy for Content of Publicly Accessible World Wide Web Sites, 24 October 2003, http://www.chinfo.navy.mil/navpalib/internet/secnav5720-47a.pdf; Marine Corps Order (MCO) 5720.76, Standardization of Publicly Accessible Web Pages, 14 September 2001, http://www.usmc.mil/directiv.nsf/bf7ed869c4398a1685256517005818da/afa583efe72bda7685256af10060c9d7/$FILE/MCO%205720.76.pdf; and Air Force Instruction (AFI) 33-129, Web Management and Internet Use, 3 February 2005, http://www.e-publishing.af.mil/pubfiles/af/33/afi33-129/afi33-129.pdf.

6. Marshall McLuhan, Understanding Media (1964; repr., Cambridge, MA: MIT Press, 1995), n.p.

7. Here we make a distinction among data, information, and intelligence. Data is raw content; information is data that has meaning and context; intelligence is the application of information for a particular purpose. Note that we make no assertion as to the accuracy of the information on the Internet.

8. Director of Central Intelligence Directive (DCID) 2/12, Community Open Source Program, 1 March 1994, http://www.fas.org/irp/offdocs/dcid212.htm.

9. Report on the Availability of Bombmaking Information (Washington, DC: Department of Justice, April 1997), http://www.usdoj.gov/criminal/cybercrime/bombmakinginfo.html; “Will I Be Able to See My House?” Keyhole, http://www.keyhole.com; “Medical Management Guidelines (MMGs) for Nerve Agents: Tabun (GA), Sarin (GB), Soman (GD), and VX,” Agency for Toxic Substances and Disease Registry, 2004, http://www.atsdr.cdc.gov/MHMI/mmg166.html; “Tools,” Hacking Exposed, http://www.hackingexposed.com/tools/tools.html; and NATO Open Source Intelligence Handbook, November 2001, http://www.oss.net/dynamaster/file_archive/030201/ca5fb66734f540fbb4f8f6ef759b258c/NATO%20OSINT%20Handbook%20v1.2%20-%20Jan%202002.pdf.

10. How Much Information? (Berkeley: University of California, School of Information Management and Systems, 2003), http://www.sims.berkeley.edu/research/projects/how-much-info-2003/execsum.htm.

11. Digital Library Project, University of California–Berkeley, http://elib.cs.berkeley.edu; and Amazon.com, http://www.amazon.com.

12. Michael K. Bergman, “The Deep Web: Surfacing Hidden Value,” White Paper, BrightPlanet, 2005, http://www.brightplanet.com/technology/DeepWeb.asp.

13. Stephen C. Mercado, “Sailing the Sea of OSINT in the Information Age,” Studies in Intelligence 48, no. 3 (2004), http://www.odci.gov/csi/studies/vol48no3/article05.html.

14. Note that a number of engines search a specific collection of electronic information. LexisNexis (www.lexisnexis.com), for example, yields results from a search of its proprietary library of legal, regulatory, and business documents. See also www.google.com; www.altavista.com; www.excite.com; www.yahoo.com; and search.msn.com.

15. “Press Center: Google Inc. Fact Sheet,” Google, 2005, http://www.google.com/press/facts.html; and “Corporate Information: Company Overview,” Google, 2005, http://www.google.com/intl/en/corporate/index.html.

16. “Corporate Information: Technology Overview,” Google, 2004, http://www.google.com/corporate/tech.html.

17. “Google Alerts,” Google, 2005, http://www.google.com/alerts.

18. “Google Code: Google APIs,” Google, 2005, http://code.google.com/apis.html.

19. “Corporate Information: Technology Overview.”

20. Web Site Administration Policies and Procedures, pt. 1, sec. 4.2.

21. DODD 5230.9, Clearance of DOD Information for Public Release, 9 April 1996, par. D.2, https://sites.defenselink.mil/dd5230_9.html.

22. DODI 5230.29, Security and Policy Review of DoD Information for Public Release, 6 August 1999, par. 5, http://www.dtic.mil/whs/directives/corres/pdf/i523029_080699/i523029p.pdf.

23. Web Site Administration Policies and Procedures.

24. Ibid., pt. 2, sec. 3.

25. Ibid., pt. 4, sec. 2.

26. Ironically, by explicitly naming what should not be posted to public Web pages, the military makes known what constitutes sensitive data. Web searches tuned to these areas would presumably yield useful intelligence.

27. AFI 35-101, Public Affairs Policies and Procedures, 26 July 2001, chap. 18.

28. AFI 33-129, Web Management and Internet Use, sec. 3.

29. “Checklist,” Air Force Public Web Site Review, https://www.webreview.hq.af.mil; and AFI 33-129, Web Management and Internet Use, sec. 3 and table 1.

30. AFI 10-1101, Operations Security (OPSEC), 31 May 2001, attachment 3, http://www.e-publishing.af.mil/pubfiles/af/10/afi10-1101/afi10-1101.pdf.

31. AFI 33-129, Web Management and Internet Use, sec. 1.

32. Ibid., table 1.

33. Ibid., sec. 3.10.4.

34. Web Administration, Course YAF04SE, Air Force ITE-Learning, https://usaf.skillport.com.

35. Jacques S. Gansler and William Lucyshyn, The Unintended Audience: Balancing Openness and Secrecy: Crafting an Information Policy for the 21st Century (College Park: University of Maryland, Center for Public Policy and Private Enterprise, School of Public Policy, September 2004), http://www.cpppe.umd.edu/Bookstore/Documents/UnintendedAudience_3.05.pdf.

36. Air Force Doctrine Document (AFDD) 1, Air Force Basic Doctrine, 17 November 2003, 78.


Lt Col David A. Umphress, USAFR (BS, Angelo State University; MCS, PhD, Texas A&M University), is an associate professor of computer science and software engineering at Auburn University and a researcher at the College of Aerospace Doctrine, Research and Education, Maxwell AFB, Alabama. He has served on the faculty at Seattle University; as lead software engineer, US Strategic Command; as assistant professor at the Air Force Institute of Technology; as communications-computer officer at Texas A&M University; and as lead systems programmer for the 1020th Computer Services Squadron. A Certified Software Development Professional, Colonel Umphress led an Air Force–Navy venture to modernize a software organization of over 900 people and led a project to develop the Air Force’s first workstation application in the Ada software language.


The conclusions and opinions expressed in this document are those of the author cultivated in the freedom of expression, academic environment of Air University. They do not reflect the official position of the U.S. Government, Department of Defense, the United States Air Force or the Air University.
