Overview
You may need to remove documents that you have uploaded to your website or content that you have created (pages, events, news articles, staff profiles, etc.). Content may become outdated. External events could change the sensitivity of content. Individuals featured in content may wish to disassociate themselves from it after finding it by searching for their name in Google.
However, deleting that piece of content may be just the first step in removing it from public access. As soon as you publish content on the public web, automated scripts start making copies of your content across the internet. Some copies are easy to remove. Some are not.
Public versus private content
Should I make this content public?
Consider whether each item of content you publish is appropriate for the public. The Information Security Office's Protect Our Info website provides guidance on the proper handling of information by classification type. Protected or sensitive information should be placed behind authentication and access control, because search engine crawlers, AI crawlers, and internet archives cannot make copies of password-protected content. Princeton Site Builder has an optional Access Control & Private Content feature, and the University provides options for private intranet sites.
Caching
I just deleted this page, why does it still show up in my browser?
Princeton Site Builder has four layers of caching. Serving cached pages allows websites to load faster and reduces hosting costs. Some cache layers clear automatically; others may take a few minutes to clear. For more details, read our Cache documentation page. If a cached version of a deleted page persists, contact WDS, and we will identify the issue and help you fully remove that content.
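To tell whether a deleted page is still being served by the site or only lingering in your own browser, you can request it from a script, which has no browser cache. The sketch below is a minimal example assuming Python 3.9+ and only the standard library. The header names it looks for (`Age`, `Cache-Control`, `X-Cache`, `X-Drupal-Cache`) are common cache headers, but which ones Site Builder's cache layers actually send is an assumption; missing headers are simply skipped.

```python
import urllib.error
import urllib.request

# Headers that often reveal caching. Which ones Site Builder's cache layers
# actually send is an assumption; headers that are absent are skipped.
CACHE_HEADERS = ("Age", "Cache-Control", "X-Cache", "X-Drupal-Cache")


def describe_response(status: int, headers: dict[str, str]) -> str:
    """Summarize whether a deleted page is still being served, and how."""
    lowered = {name.lower(): value for name, value in headers.items()}
    cache_info = {h: lowered[h.lower()] for h in CACHE_HEADERS if h.lower() in lowered}
    if status in (404, 410):
        return f"gone ({status})"
    if status == 200:
        return f"still served; cache headers: {cache_info}"
    return f"status {status}; cache headers: {cache_info}"


def check_deleted_page(url: str) -> str:
    """Request the URL from a script, so no browser cache is involved."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return describe_response(response.status, dict(response.headers))
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError but still carry a status and headers.
        return describe_response(err.code, dict(err.headers))
```

For example, `check_deleted_page("https://example.princeton.edu/deleted-page")` (a hypothetical URL) reporting "gone" while your browser still shows the page suggests your browser's own cache; reporting "still served" after several minutes is the point at which to contact WDS.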
Documents and Images
Site Builder completely removes deleted documents and images from the file system within a few minutes.
Removing a document or image from a page does not remove the file itself from the file system. You must also delete it from the Document Library or Media Library.
By default, documents and images are uploaded to Site Builder's public file system, and anyone with the full URL can open them, even if the document or image is added to an access-controlled page. With Site Builder's optional Access Control & Private Content feature enabled, content authors can upload private documents and images. Access to a private document is restricted to authenticated users or to a custom role. Search engines and other bots cannot access or copy a private document.
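Because unlinking a file from a page leaves it reachable at its full URL, it can help to list every uploaded file a page links to or embeds before you edit it, so you know what to delete from the Document Library or Media Library afterwards. The sketch below is a minimal example using Python's standard `html.parser`. It assumes uploaded files are served from a path containing `/files/`, as in Drupal's default `/sites/<site>/files/` layout; that marker is a hypothetical convention you may need to adjust for your site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: uploaded files are served from a path containing "/files/",
# as in Drupal's default "/sites/<site>/files/" layout. Adjust if needed.
FILE_PATH_MARKER = "/files/"


class FileLinkCollector(HTMLParser):
    """Gather absolute URLs of uploaded files linked or embedded in a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.file_urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        for name in ("href", "src"):
            value = attr_map.get(name)
            if value and FILE_PATH_MARKER in value:
                absolute = urljoin(self.base_url, value)
                if absolute not in self.file_urls:  # keep first-seen order
                    self.file_urls.append(absolute)


def find_file_links(html: str, base_url: str) -> list[str]:
    """Return the uploaded-file URLs referenced by a page's HTML."""
    collector = FileLinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.file_urls
```

Each URL in the result stays publicly reachable until the file itself is deleted from the library, regardless of whether any page still links to it.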
Google Search and Bing
I deleted this page, but why does it still show up in search results?
Search engines like Google and Bing no longer provide links to view cached copies of pages or documents they have crawled. However, search results and search snippets of deleted pages can stay in search results for a few days or a few weeks. If you can prove to Google that you are the owner of a website page, you can submit a removal request that Google will usually honor within 24 hours. Bing has a similar service.
For Google, use the Google Search Console. When adding a website property to your Google Search Console account, use the URL Prefix option instead of the Domain option. If you have set up a Google Analytics property for that site with the same Google account that you used to sign into Google Search Console, you should be able to quickly verify your ownership. If that fails, you can use the HTML tag verification method. Copy the supplied meta tag, then have someone with the site administrator role log into the admin dashboard for your Site Builder site. Go to Configuration, Basic Site Settings, and paste that meta tag into the Google verification code field under the Search Engines section.
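After pasting the tag and saving, you can confirm that it actually appears in your homepage's HTML before clicking Verify in Google Search Console. The sketch below is a minimal example using Python's standard library; it looks for a `<meta name="google-site-verification" content="...">` tag, which is the form Google's HTML tag method supplies. The token value in the test is a placeholder, and because of Site Builder's caching the tag may take a few minutes to appear.

```python
import urllib.request
from html.parser import HTMLParser


class VerificationTagFinder(HTMLParser):
    """Collect the content of every google-site-verification meta tag."""

    def __init__(self) -> None:
        super().__init__()
        self.tokens: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing
        # <meta ... /> tags reach this method via the default handle_startendtag.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if attr_map.get("name") == "google-site-verification":
            self.tokens.append(attr_map.get("content") or "")


def find_verification_tokens(html: str) -> list[str]:
    finder = VerificationTagFinder()
    finder.feed(html)
    finder.close()
    return finder.tokens


def fetch_html(url: str) -> str:
    """Download a page and decode it using the charset the server reports."""
    with urllib.request.urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Usage: `find_verification_tokens(fetch_html("https://example.princeton.edu/"))` (hypothetical URL) should return a list containing the same token Google showed you. An empty list means the tag is not being served yet.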
Once you have verified ownership of your site, you can navigate to Indexing, then Removals, and click the New Request button. You have the option to request removal for one URL or all URLs that begin with the same URL prefix.
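Before submitting a prefix request, it is worth checking exactly which URLs the prefix would cover. The sketch below assumes the removal tool performs a plain string-prefix match, which is an assumption about its semantics rather than documented behavior. Under that assumption, a prefix like `.../news/2020` would also match an unrelated page such as `.../news/2020-archive`; ending the prefix with `/` restricts it to one directory. The URLs used are hypothetical.

```python
def split_prefix_matches(urls: list[str], prefix: str) -> tuple[list[str], list[str]]:
    """Split URLs a literal prefix would cover into expected and surprising matches.

    Assumes the removal tool does a plain string-prefix match.
    """
    expected: list[str] = []
    surprising: list[str] = []
    for url in urls:
        if not url.startswith(prefix):
            continue
        rest = url[len(prefix):]
        # A match is "expected" when the prefix ends at a path boundary.
        if prefix.endswith("/") or rest == "" or rest[0] in "/?#":
            expected.append(url)
        else:
            surprising.append(url)
    return expected, surprising
```

Any URL that lands in the `surprising` list is a sign that the prefix is broader than intended.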
For Bing, use Bing Webmaster Tools. The simplest way to verify site ownership is to let Bing Webmaster Tools connect to Google Search Console. Then go to Configuration, then Block URLs. There you can enter a single page or an entire directory to block. You will usually want to block both the URL and the cache. The block remains in place for 90 days, but if your content has been deleted or placed behind access control, Bing will not try to reindex it.
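Since the Bing block lapses after 90 days, it can be useful to know when that happens and whether it matters. The sketch below is a minimal example assuming the 90 days are counted from the day the request is submitted (an assumption; check the date Bing Webmaster Tools displays). The function names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: the 90-day block is counted from the day the request is submitted.
BING_BLOCK_DAYS = 90


def block_expires(submitted: date) -> date:
    """Date on which a Bing URL block submitted on `submitted` would lapse."""
    return submitted + timedelta(days=BING_BLOCK_DAYS)


def block_status(submitted: date, today: date, content_still_public: bool) -> str:
    expires = block_expires(submitted)
    if today < expires:
        return f"blocked until {expires.isoformat()}"
    if content_still_public:
        return "block expired; content is still public and could be reindexed"
    return "block expired; content is deleted or access controlled"
```

For example, a block submitted on 2024-01-01 would lapse on 2024-03-31. The expiry only matters if the content is still publicly reachable by then.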
Princeton.edu Search
The search engine on the main princeton.edu website is a custom view of Google's search index. A completed removal request for the search index at google.com will also update Princeton's custom search. For particularly sensitive items, contact WDS. WDS staff can immediately exclude certain results from Princeton's custom search index.
Internet Archive Wayback Machine
The Wayback Machine, founded by the Internet Archive, is a digital archive of the world wide web at archive.org that hosts copies of over 900 billion public webpages, dating back to 1995. The Wayback Machine archives a wide variety of document types, including HTML, PDF, plain text, and various data compression formats. It also preserves embedded objects like images, videos, and stylesheets associated with webpages.
If the Wayback Machine's crawler detects a significant change to a web document, it creates a new snapshot, and visitors can browse a timeline of multiple versions of that document. As a result, sensitive information removed from the current version of a webpage may remain readily available in an earlier snapshot. Archive.org does have a process for removal requests; however, removal is not guaranteed.
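A removal request asks for the URLs involved and the time period to exclude, so it helps to see which snapshots exist first. The Wayback Machine exposes a public CDX server API that lists captures of a URL. The sketch below is a minimal example using the `url`, `output=json`, and `fl` query parameters as we understand them; treat the exact parameters and response shape (a JSON array whose first row holds field names) as assumptions to confirm against the CDX server documentation.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(target: str) -> str:
    """Build a CDX query that lists captures of one URL as JSON."""
    params = {"url": target, "output": "json", "fl": "timestamp,original,statuscode"}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def fetch_captures(target: str) -> list[list[str]]:
    """Fetch the capture list; the first row is expected to be the header."""
    with urllib.request.urlopen(cdx_query_url(target), timeout=30) as response:
        body = response.read().decode("utf-8")
    return json.loads(body) if body.strip() else []


def capture_period(rows: list[list[str]]) -> Optional[tuple[str, str]]:
    """Return the earliest and latest capture dates as YYYY-MM-DD, or None."""
    if len(rows) < 2:  # header only, or nothing at all: no captures
        return None
    header, *records = rows
    ts = header.index("timestamp")
    # Wayback timestamps are 14-digit YYYYMMDDhhmmss strings, so they sort lexically.
    stamps = sorted(record[ts] for record in records)

    def to_date(stamp: str) -> str:
        return f"{stamp[0:4]}-{stamp[4:6]}-{stamp[6:8]}"

    return to_date(stamps[0]), to_date(stamps[-1])
```

Usage: `capture_period(fetch_captures("example.princeton.edu/page"))` (hypothetical URL) gives a date range you could cite as the time period in a removal request.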
If you would like to submit a request for archives of your site or account to be excluded from web.archive.org, send us a request to [email protected] and indicate:
- the URL or URLs of the material
- the time period that you wish to have excluded
- the time period during which you had control of the site or relevant user account (if applicable) and
- any other information that you think would be helpful for us to better understand your request.
This will initiate a review by our team. We do not make any guarantees beforehand about the outcome of a request.
Source: How do I request to remove something from archive.org?
Search Bots and AI
While Google and Bing dominate the global search market, there are other players, which include DuckDuckGo, Yahoo Search, Yandex (Russia), and Baidu (China). DuckDuckGo and Yahoo Search share Bing's index, so a Bing removal request affects these two search engines. Other search engines have their own procedures for removal requests.
A significant portion of the page views on Site Builder sites comes from website-crawling scripts, or bots. Some collect website data for public or private AI content generators and other large language models (LLMs). Others scrape data for use in marketing campaigns.
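If you have a list of user-agent strings for your site's requests (for example, from an exported access log, which is an assumption about what is available to you), a rough estimate of the bot share can be made by matching common crawler markers. The sketch below is a naive heuristic, not a bot-detection method: scrapers that imitate ordinary browsers will be missed, and an unusual browser string containing "bot" could be misclassified.

```python
# Naive heuristic: well-known crawlers such as Googlebot, bingbot, GPTBot,
# and CCBot include "bot" in their user-agent strings. Scrapers that mimic
# browsers are not caught, so the share is likely an undercount.
BOT_MARKERS = ("bot", "crawl", "spider")


def is_probable_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def bot_share(user_agents: list[str]) -> float:
    """Fraction of requests whose user agent looks like a crawler."""
    if not user_agents:
        return 0.0
    bots = sum(1 for ua in user_agents if is_probable_bot(ua))
    return bots / len(user_agents)
```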
In summary, you can delete published content from your own site and request its removal from search engines and archives, but once content has been on the public web, you can only make it difficult to find, not impossible to find.