Under The Ocean of the Internet

Under The Ocean of the Internet - The Deep Web

Introduction

The Internet is like a big ocean. The ocean is filled with large continents and islands that people visit. A large continent would be Google, and an island would be the news site for your local newspaper. Every day the average person visits these continents and islands using a web browser, which acts as a boat navigating to destinations on the Internet. In reality, though, these continents and islands make up only 4% of the Internet. The rest of the Internet is the Deep Web, which lies under the ocean. The Deep Web, or Invisible Web, is used for both good and bad purposes, even though some may assume it is used only for illegal ones. The use of the Internet continues to evolve, and the Deep Web is a big part of that.

Internet Usage

People all over the world use the Internet every day. There are currently over 3 billion people who use the Internet, more than 1 billion websites, and 3.5 billion Google searches a day. There are also 500 million tweets sent a day ("Internet Live Stats - Internet Usage & Social Media Statistics", 2016). These numbers have grown significantly in the past 10 years and will continue to grow as the Internet evolves and its use expands.

This vast number of people have an equally vast range of needs for the Internet. Among the top 10 tasks that the Internet is used for are:

Email
Music & Movies
Searching
Buying Tickets
Shopping

Social media falls just outside the top 10. Of the listed tasks, how many do you regularly perform online? Your answer is most likely all of them. You can perform these tasks either in a web browser, such as Chrome or Firefox, or with dedicated software, such as iTunes for downloading music and movies. As the use of the Internet continues to change and evolve, the design of the Internet continues to be tested as well.

Internet Design

The Internet was originally designed as an open-architecture network, which allowed communication and collaboration. As stated in "Brief History of the Internet":

"In an open-architecture network, the individual networks may be separately designed and developed and each may have its own unique interface which it may offer to users and/or other providers, including other Internet providers. Each network can be designed in accordance with the specific environment and user requirements of that network. There are generally no constraints on the types of network that can be included or on their geographic scope, although certain programmatic considerations will dictate what makes sense to offer".

As you can see, the Internet was meant to be a free and open space from its conception. It was designed so that multiple systems, no matter how different, could all communicate with each other. This architecture still stands today, and it is what ensures that your packets are delivered to their destination.

The Surface Web vs The Deep Web

The Internet is made up of two pieces: the Surface Web and the Deep Web. The Surface Web is the area of the Internet that the average person visits, such as Facebook and Google, using an ordinary web browser. The other area of the Internet is the Deep Web, which is made up of the Dark Web, Deep Web databases, and much more. You need specialized software or access in order to interact with the Deep Web. The distinction between these two areas of the Internet is very important.

The Surface Web

The Surface Web is the area of the Internet that is indexable by search engines such as Google and Bing. Other names for this area of the Internet are the Visible Web, Lightnet, Indexed Web, Clearnet, or Indexable Web. To put it in perspective, there are currently over 8 billion indexed web pages.

Let's take a look at how a search engine like Google indexes web pages. It uses pieces of software called web crawlers, whose primary purpose is to discover web pages on the Internet. You can tell that a Google web crawler is crawling your site when "Googlebot" appears in the user agent string of its requests, although an attacker could spoof that string. Once the web crawler visits a page, it looks for any links on that page and visits those pages too. As it visits these pages, it gathers data and sends it to Google. Google has software that determines which sites are crawled, how often they are crawled, and how many pages are retrieved from each site. It gives additional attention to sites that are new, have changed, or are no longer available.

The crawled pages are then indexed by Google so that they can be retrieved efficiently and appropriately when needed. Google's index contains information about keywords and where they are located. When you search for something, Google looks up your search term in the index to find matching pages on the web ("Crawling & Indexing - Inside Search - Google"). There is an area of the Internet, though, that Google cannot reach through search, because it is not indexable.
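The crawl-then-index process described above can be sketched in a few lines. This is a minimal toy, not Google's implementation: it assumes a hypothetical in-memory link graph (`WEB`, with made-up `example.com` URLs) in place of real HTTP fetching, crawls it breadth-first by following links, and builds a simple inverted index mapping each keyword to the pages that contain it. The helper names `crawl`, `build_index`, and `search` are illustrative only.

```python
import re
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP; this toy graph stands in for that.
WEB = {
    "https://example.com/": ("Welcome to the example home page",
                             ["https://example.com/news", "https://example.com/about"]),
    "https://example.com/news": ("Local news and weather", ["https://example.com/"]),
    "https://example.com/about": ("About the example news site", []),
    # Nothing links to this page, so a link-following crawler never finds it.
    "https://example.com/hidden": ("Unlinked archive page", []),
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def crawl(seed):
    """Breadth-first crawl from a seed URL, visiting each linked page once."""
    seen = {seed}
    queue = deque([seed])
    visited = []
    while queue:
        url = queue.popleft()
        if url not in WEB:
            continue  # dead link: nothing to fetch
        visited.append(url)
        _, links = WEB[url]
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


def build_index(urls):
    """Build an inverted index: keyword -> set of URLs whose text contains it."""
    index = {}
    for url in urls:
        text, _ = WEB[url]
        for word in tokenize(text):
            index.setdefault(word, set()).add(url)
    return index


def search(index, query):
    """Return the URLs that contain every keyword in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))  # copy so the index is not mutated
    for word in words[1:]:
        results &= index.get(word, set())
    return results


crawled = crawl("https://example.com/")
index = build_index(crawled)
print(crawled)
print(sorted(search(index, "news")))
print(search(index, "archive"))
```

Because the search step only consults the index, the unlinked `/hidden` page never appears in results: the crawler had no path to it, which mirrors how unlinked pages end up outside what a search engine can find.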

The Deep Web

The Deep Web is the area of the Internet that is not indexable by search engines and not linked from pages on the Surface Web. Other names for this area of the Internet are the Deep Net, Hidden Web, or Invisible Web. This part makes up an estimated 96% of the Internet, which makes it significantly larger than the Surface Web. By that figure the Deep Web is roughly 24 times the size of the Surface Web, and some estimates put it as much as 700 times larger.

There are many reasons that a web page is not crawlable. The page could be password protected, which prevents a web crawler from accessing it. Another scenario is that the page can only be accessed a certain number of times before it becomes unavailable; if that threshold is reached before a crawler arrives, the page is never crawled. A page also will not be crawled if the site's robots.txt file explicitly says not to crawl it. A robots.txt file is located in the root of a website and tells web crawlers which directories they are not allowed to crawl and which user agents each rule applies to. The last scenario that makes a web page uncrawlable is if the page is simply hidden, that is, not linked from any other page, so a crawler has no path by which to reach it ("The Ultimate Guide to the Invisible Web", 2013). The average Internet user is not going to use the Deep Web, so its use should be considered suspicious.
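To make the robots.txt rules concrete, here is a small sketch using Python's standard-library `urllib.robotparser`. The robots.txt contents, the `example.com` URLs, and the `SomeOtherBot` user agent are hypothetical; the file is parsed from a string so no network request is made.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com: Googlebot may not crawl /private/,
# and every other crawler ("*") may not crawl /admin/.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
# parse() takes the file's lines directly instead of downloading robots.txt.
parser.parse(ROBOTS_TXT.splitlines())

checks = [
    ("Googlebot", "https://example.com/private/report.html"),
    ("Googlebot", "https://example.com/admin/"),
    ("SomeOtherBot", "https://example.com/admin/settings"),
    ("SomeOtherBot", "https://example.com/index.html"),
]
for agent, url in checks:
    # can_fetch() reports whether the rules allow this user agent to crawl the URL.
    print(agent, url, parser.can_fetch(agent, url))
```

Note that Googlebot matches its own group, so the `*` rules do not apply to it and `/admin/` stays crawlable for Googlebot. Also keep in mind that robots.txt is advisory: well-behaved crawlers honor it, but nothing technically stops a crawler from ignoring it.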

The Deep Web

The Deep Web is a complex and mysterious area of the Internet. Its content can be accessed and used for both legitimate and illegitimate purposes. There is plenty of content available in the Deep Web, such as Dark Web hidden services and Deep Web databases, to name a couple. Special software, such as Tor, is required to access parts of the Deep Web. The details of each of these aspects of the Deep Web are covered in the sections that follow.

Why Go Under Water?

The use of the Deep Web falls into two categories: legal activity and illegal activity. An example of illegal activity is the selling of stolen credit cards. An example of legal activity is using the Wayback Machine to see a previous version of a web page. Regardless of whether the usage is legal or illegal, accessing the Deep Web is an intentional action.

Legal Activity

Believe it or not, there is plenty of legal activity that goes on in the Deep Web. The Deep Web can be a very valuable resource for a wealth of information. For instance, there are plenty of search engines that let you search databases not indexed by the Googles and Bings of the world. These databases can contain virtual academic libraries or old versions of web pages ("The Ultimate Guide to the Invisible Web", 2013).

There are several tasks that are perfectly legal to perform on the Deep Web, and you might not realize that the data being accessed actually resides there. When somebody performs a background check on an individual, the service searches several databases on the Internet for information, and those databases are actually on the Deep Web. Similarly, if an adopted person wants to search for their birth parents, the databases that house this adoption information are on the Deep Web. You can also use the Deep Web to research veteran records or look up your family's genealogy. Lawyers also conduct legal research for cases on the Deep Web. Students can use the Deep Web too, since there are several academic databases, such as collections of scientific journals, that can be searched by topic (Dube, 2014). Although there are legal activities to perform on the Deep Web, there are illegal activities as well.

Illegal Activity

It is no surprise that there is illegal activity on the Deep Web. Because the Deep Web lets you remain anonymous, it is a popular destination for criminals to buy and sell information. Information sold on the Deep Web includes Social Security numbers, medical records, credit card numbers, and other Personally Identifiable Information (PII). You can buy enough information about somebody to easily steal their identity (Ingevaldson, 2015). The Deep Web is also used to sell drugs, distribute child pornography, trade weapons, and hire hitmen (Reporter, 2013).

What is in The Ocean?

There is plenty of content available on the Deep Web; some of it is good and some of it is bad. The bad content includes things such as Dark Web hidden services, the Hidden Wiki, and Silk Road. Although Silk Road has been taken down, it was one of the most popular sites to visit while it was running. This content is available through the Dark Web.


Conclusion

The Deep Web is the largest part of the Internet, yet most people do not know about it, let alone access it. It can be used for both legal and illegal activity, and it is important to understand that it is not all bad. There is plenty of good in the Deep Web, including the right to privacy when surfing the Internet. Understanding the Deep Web and its capabilities is vital to the future of the Internet, and hopefully this paper helps accomplish that goal.
