Search Engines

1.2. Search Engines

The business-related information gathering phase involves several distinct areas of research, which feed into one another as follows:

                      -------------> Web Presence -------------
                    /                                           \
                   /                                             V  
        Cached and Archival Sites                    Partners and Third Parties
                  ^                Public Information            |
                  |                                              V  
              Harvesting                                  Job Posting
                  ^                                             /
                   \                                           /
                     ------------Financial Information <------

1.2.1. Web Presence

In this phase, you will learn a great deal more about your target including:

What they do
What their business purpose is
Physical and logical locations
Employees and departments
Email and contact information
Alternative web sites and sub-domains
Press releases, news, comments, opinions

Sources that you can get the data from:

Organization websites

You can get:
- The location of the company
- The name of the business
- Projects
- External links (e.g. social media)

Google Dorks

Operators:
- AND
- OR
- "" (exact phrase match)
Filters:
- cache [cache:www.website.com]
- link [link:www.website.com]
- site [some query string site:www.website.com]
- filetype [some query string filetype:pdf]
References:

Other search engines

Examples:
- LinkedIn
- Bing
- Yahoo
- Ask
- Aol
- Pandastats.net
- Dogpile.com

DUNS number and CAGE code

Organizations that operate globally and want to sell to the U.S. government or its agencies are required to possess two codes that are useful to us:
- DUNS number (Dun & Bradstreet)
- CAGE code (or NCAGE for a non-U.S. business)
These two codes allow us to retrieve even more information, such as contacts, product lists, active and inactive contracts with the government, and much more.
You can retrieve the DUNS and CAGE code for a given company from the following website

You may have noticed by now that this process is not set in stone and is never the same for every organization. Organizations in different industries can be investigated by searching different publicly available databases. Compliance requirements and regulations may force companies to publish different kinds of information.

One example is publicly traded companies, which have to file their financial documents with the SEC. To search these filings, you can use EDGAR (the Electronic Data Gathering, Analysis, and Retrieval system).

1.2.2. Partners and Third Parties

Other information that you can gather about the company includes mergers, acquisitions, partnerships, third parties, etc.

From these you can often deduce what types of technologies and systems they use internally.

1.2.3. Job Posting

From job postings we can deduce internal hierarchies, vacancies, projects, responsibilities, weak departments, financed projects, technology implementations and more.

Job posting websites:

LinkedIn
Indeed
Monster
CareerBuilder
Glassdoor
SimplyHired
Dice
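
As a rough illustration of how job posting text can hint at technology implementations, the sketch below counts mentions of known technology names in a posting. The keyword list, the function name, and the sample text are all hypothetical and chosen only for illustration; a real engagement would need a list tailored to the target's industry.

```python
import re
from collections import Counter

# Hypothetical keyword list; extend it for the target's industry.
TECH_KEYWORDS = [
    "Active Directory",
    "Windows Server",
    "Oracle",
    "SAP",
    "Cisco",
    "VMware",
    "Linux",
]


def find_technologies(posting_text, keywords=TECH_KEYWORDS):
    """Count case-insensitive, whole-phrase mentions of each keyword."""
    counts = Counter()
    for keyword in keywords:
        # Lookarounds stop "Oracle" from matching inside "Oracles".
        pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
        hits = re.findall(pattern, posting_text, flags=re.IGNORECASE)
        if hits:
            counts[keyword] = len(hits)
    return counts


if __name__ == "__main__":
    # Made-up posting text, not taken from any real company.
    sample = (
        "Seeking an administrator with Active Directory and VMware "
        "experience. Linux scripting is a plus; VMware certification "
        "preferred."
    )
    for tech, mentions in find_technologies(sample).most_common():
        print(tech, mentions)
```

Technologies that appear across several postings from the same organization are more likely to be in active use, so it can help to run this over every posting you collect and add up the counters.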

1.2.4. Financial Information

With a company's financial information, you can often find out whether the organization:

is going to invest in a specific technology
might be subject to a possible merger with another organization
has critical assets and business services

Tools:

Crunchbase

You can find information about:
- Companies
- People
- Investors and financial information
Note that anyone can edit the information in it.

Inc

Inc. focuses on growing companies and provides them with advice, resources, and information. It publishes a list of the 500/5000 fastest-growing private companies, with very useful information and statistics about them.

1.2.5. Harvesting

In this phase, we look at methods for gathering company documents such as charts (detailing the company structure), database files, diagrams, papers, documentation, spreadsheets, and so on. This is also the right time to begin harvesting email addresses, social media accounts (Twitter, Facebook, etc.), names, roles, and more.

It is important to know that when a document is created, it automatically stores information (metadata) like who created it, date and time of creation, software used, computer name, and so on.

If we are able to retrieve documents online and inspect the underlying metadata, we can extract useful information.
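
To make this concrete, here is a minimal sketch of reading metadata from a downloaded Office Open XML file (.docx, .xlsx, .pptx) using only the Python standard library. These files are ZIP archives that normally carry their properties in docProps/core.xml and docProps/app.xml. The function name and the choice of fields are assumptions made for illustration; other formats such as PDF store metadata differently and are not covered here.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the OOXML property parts.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Output key -> element name inside docProps/core.xml.
CORE_FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def _read_part(archive, name):
    """Parse one XML part of the archive, or return None if it is absent."""
    try:
        return ET.fromstring(archive.read(name))
    except KeyError:
        return None


def read_document_metadata(path_or_file):
    """Return a dict with whatever metadata fields the document contains."""
    metadata = {}
    with zipfile.ZipFile(path_or_file) as archive:
        core = _read_part(archive, "docProps/core.xml")
        app = _read_part(archive, "docProps/app.xml")
    if core is not None:
        for key, tag in CORE_FIELDS.items():
            element = core.find(tag, NS)
            if element is not None and element.text:
                metadata[key] = element.text
    if app is not None:
        # The application name reveals the software used to write the file.
        element = app.find("ep:Application", NS)
        if element is not None and element.text:
            metadata["application"] = element.text
    return metadata
```

Calling read_document_metadata("report.docx") on a harvested file (the file name here is hypothetical) returns only the fields that are actually present, so an empty dict means the document was stripped of metadata or never had any.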

1. Google Dorks

We can use the following Google filters:

site:[website] filetype:[filetype]

This narrows down the results and displays only links to files with the [filetype] extension that are hosted on [website].
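
When you need to check many file types against one target, building the query strings by hand gets repetitive. The sketch below is one possible way to generate them. The function names are hypothetical, example.com is a placeholder domain, and the search URL is simply the ordinary Google search address with the query encoded in the q parameter.

```python
from urllib.parse import urlencode


def build_dork(terms="", site=None, filetype=None):
    """Combine free-text terms with the site: and filetype: filters."""
    parts = []
    if terms:
        parts.append(terms)
    if site:
        parts.append(f"site:{site}")
    if filetype:
        # Accept both "pdf" and ".pdf".
        parts.append(f"filetype:{filetype.lstrip('.')}")
    return " ".join(parts)


def dorks_for_filetypes(site, filetypes):
    """Build one query per file type for the same site."""
    return [build_dork(site=site, filetype=ft) for ft in filetypes]


def google_search_url(query):
    """Return a URL that opens the query in a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # example.com is a placeholder, not a real target.
    for dork in dorks_for_filetypes("example.com", ["pdf", "xlsx", "docx"]):
        print(dork)
        print(google_search_url(dork))
```

Opening the generated URLs manually in a browser is the safest way to use them, since search engines tend to throttle or block automated querying.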

2. FOCA

Doing this manually can be very tedious and time-consuming. A very useful tool that automatically finds and downloads files is FOCA.

By querying search engines like Google and Bing, FOCA retrieves files and then attempts to extract metadata such as names, usernames, passwords, OS, etc.

Note that, unfortunately, this tool only works on Windows.

FOCA can download and extract infrastructure information as well as business information; for now, we will focus only on the business information.

3. theHarvester

Thanks to search engines and social networks, theHarvester is able to enumerate email accounts, usernames, domains, and hostnames.

Once we have the tool installed on our machine, we can run the following command in order to retrieve information about elearnsecurity.com:

theharvester -d elearnsecurity.com -l 100 -b google

where:

-d is the domain or the company to search
-l limits the results to the value specified
-b is the data source (google, linkedin, bing, etc.)
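
If you want to repeat the same query across several data sources and keep the output, the command can be wrapped in a short script. The sketch below assumes the tool is on your PATH under the name theharvester, as in the command above; depending on how it was installed, the executable may be named differently, and the set of supported data sources varies between versions. Only the -d, -l, and -b options described above are used, and the function names are hypothetical.

```python
import shlex
import shutil
import subprocess


def build_harvester_command(domain, limit=100, source="google",
                            executable="theharvester"):
    """Build the argument list for the command shown above."""
    return [executable, "-d", domain, "-l", str(limit), "-b", source]


def run_harvester(domain, limit=100, source="google",
                  executable="theharvester"):
    """Run the tool and return its standard output as text."""
    command = build_harvester_command(domain, limit, source, executable)
    if shutil.which(command[0]) is None:
        # Assumption: the executable name may differ on your system.
        raise FileNotFoundError(f"{command[0]} not found on PATH")
    result = subprocess.run(command, capture_output=True, text=True,
                            check=False)
    return result.stdout


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed.
    for source in ("google", "bing", "linkedin"):
        command = build_harvester_command("elearnsecurity.com", source=source)
        print(shlex.join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if a domain or source name ever contains unusual characters.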

1.2.6. Cached and Archival Sites

Since information on the web changes so quickly, an older version of a site can sometimes prove useful to our cause.

Consider a job post. If the organization deletes it from its website, you "lose" that information; if you could see the web page as it was before the update, you could still harvest it. It turns out this is entirely possible through cache and archival technology.

Tools:

archive.org
Google dork (cache:URL)
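
For archive.org, older versions of a page can be reached directly by URL. The sketch below builds Wayback Machine addresses and, optionally, asks the Internet Archive's availability endpoint for the capture closest to a given date. The function names and the example.com address are placeholders; the lookup function needs network access and assumes the endpoint's JSON response keeps its documented shape.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def wayback_snapshot_url(url, timestamp):
    """URL of the capture closest to timestamp (YYYYMMDDhhmmss or a prefix)."""
    return f"https://web.archive.org/web/{timestamp}/{url}"


def wayback_calendar_url(url):
    """URL of the page listing all captures for the address."""
    return f"https://web.archive.org/web/*/{url}"


def availability_api_url(url, timestamp=None):
    """URL of the availability query for the given address."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


def closest_snapshot(url, timestamp=None):
    """Return the closest archived capture URL, or None if there is none.

    Performs a network request; the response layout is an assumption
    based on the endpoint's documentation.
    """
    with urlopen(availability_api_url(url, timestamp), timeout=30) as response:
        data = json.load(response)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


if __name__ == "__main__":
    # example.com/careers is a placeholder for a removed job page.
    print(wayback_calendar_url("example.com/careers"))
    print(wayback_snapshot_url("example.com/careers", "2019"))
```

A partial timestamp such as a year is enough for the snapshot URL, since the Wayback Machine redirects to the nearest capture it holds.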

Remember Logging!!
