Search Engines

Scattered notes on alternative search engines. Just getting things written down.

https://github.com/djoerd/searsiaclient -- Searsia is a protocol and implementation for large-scale federated web search.

Today's web search engines need to combine search results from many independent sources, such as results from a web crawl, advertisements, videos, books, news, shopping, etc. Searsia provides an approach to search that:

Manages and shares large collections of independent sources.
Selects for each query the most relevant sources.
Combines sources in an aggregated search interface.
Learns over time what kind of information each source provides.

https://searsia.org/about.html
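
To make the "select the most relevant sources per query" and "learn over time" ideas concrete, here is a minimal sketch. It is not Searsia's actual algorithm or API: the Source class, the term-count profile, and the scoring rule are all assumptions made up for illustration.

```python
from collections import Counter


class Source:
    """A hypothetical search source with a learned term profile (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.profile = Counter()  # term -> how often it appeared in past results

    def learn(self, result_text):
        # "Learns over time": update the profile from a result this source returned.
        self.profile.update(result_text.lower().split())

    def score(self, query):
        # Share of the learned profile that matches the query terms.
        total = sum(self.profile.values())
        if total == 0:
            return 0.0
        return sum(self.profile[t] for t in query.lower().split()) / total


def select_sources(sources, query, k=2):
    """Return up to k best-scoring sources for the query, dropping zero scores."""
    ranked = sorted(sources, key=lambda s: s.score(query), reverse=True)
    return [s for s in ranked[:k] if s.score(query) > 0]


news = Source("news")
news.learn("election results today")
books = Source("books")
books.learn("novel by author")
print([s.name for s in select_sources([news, books], "election today")])
```

A real federated engine would then query only the selected sources and merge their result lists into one page.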

https://github.com/dato-ai/dato.rss -- A seamless RSS Search Engine experience with a hint of Machine Learning.

Search Engine: Quickly search through the millions of available RSS feeds.

RESTful API: Turns feed data into an awesome API. The API simplifies how you handle RSS, Atom, or JSON feeds. You can add and keep track of your favourite feed data with a simple, fast and clean REST API. All entries are enriched by Machine Learning and Semantic engines.

https://teclis.com/ -- Teclis is a search engine for finding interesting, unique results on 'clean' websites.

Teclis is not a Google replacement, and works best for research and discovery with broad(er) search phrases. For a full Google replacement that incorporates Teclis, check Kagi.

The crawler is hybrid, using async Python requests and Puppeteer with uBlock Origin (uBO). Detection works by counting the number of uBO-blocked requests on the page; if there are too many (the threshold is set to 5), we kick the page out, leaving only "clean" pages in the index.
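
The filtering rule can be sketched in a few lines. This is only an illustration, not Teclis code: the function names are made up, the blocked-request counts are assumed to come from the crawler, and "too many" is assumed to mean strictly more than the threshold of 5.

```python
BLOCKED_REQUEST_THRESHOLD = 5  # value quoted in the Teclis description


def is_clean(blocked_requests, threshold=BLOCKED_REQUEST_THRESHOLD):
    """Assumption: a page is rejected only when it exceeds the threshold."""
    return blocked_requests <= threshold


def filter_clean(blocked_by_url):
    """Keep the URLs whose blocked-request count passes the check."""
    return [url for url, count in blocked_by_url.items() if is_clean(count)]


# Hypothetical crawl output: URL -> number of requests uBO blocked on that page.
crawl = {"https://a.example/": 0, "https://b.example/": 5, "https://c.example/": 12}
print(filter_clean(crawl))
```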

The crawler is also unique in the sense that it will follow an interesting dead link to its Internet Archive page, trying its best to preserve the page in our index (you will see those results under the "Internet Archive" section).
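
A rough sketch of the dead-link fallback, under assumptions: the set of "dead" status codes and the function name are made up here, and the Wayback Machine URL form used (prefix plus original URL) is the one that, as far as I know, redirects to the most recent capture. How Teclis actually does this is not documented in these notes.

```python
WAYBACK_PREFIX = "https://web.archive.org/web/"
DEAD_STATUSES = {404, 410}  # assumption: which HTTP statuses count as "dead"


def resolve_link(url, status_code):
    """Return the archive URL for a dead link, or the original URL otherwise."""
    if status_code in DEAD_STATUSES:
        return WAYBACK_PREFIX + url
    return url


print(resolve_link("http://example.com/old-page", 404))
```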

https://www.alexandria.org/ -- Alexandria.org is a non-profit, ad-free search engine. Our goal is to provide the best available information without compromise.

The index is built on data from Common Crawl and the engine is written in C++. The source code is available at https://github.com/alexandria-org

https://wiby.me/ -- Search engine for the classic web. Submit pages for indexing at https://wiby.me/submit/

https://search.marginalia.nu/ -- An independent DIY search engine that focuses on non-commercial content, and attempts to show you sites you perhaps weren't aware of rather than the sort of sites you probably already knew existed. https://memex.marginalia.nu/projects/edge/about.gmi

https://github.com/cblgh/lieu -- An alternative search engine. Created in response to the environs of apathy concerning the use of hypertext search and discovery. In Lieu, the internet is not what is made searchable, but instead one's own neighbourhood. Put differently, Lieu is a neighbourhood search engine, a way for personal webrings to increase serendipitous connexions. https://lieu.cblgh.org/

https://mwmbl.org/ -- An open source, non-profit search engine implemented in Python. https://github.com/mwmbl/mwmbl

https://millionshort.com/ -- Removes the top 100 to 1,000,000 websites from search results. Has filters to include or exclude e-commerce sites and sites with live chat.
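
The "remove the top N sites" idea can be sketched as a post-filter over a result list. This is an illustration only, not how Million Short is implemented: the popularity ranking, the function name, and the handling of a leading "www." are assumptions.

```python
from urllib.parse import urlparse


def remove_top_sites(results, rank_of, top_n):
    """Drop results whose domain is ranked within the top_n most popular sites.

    rank_of maps a domain to its popularity rank (1 = most popular);
    domains missing from the ranking are kept.
    """
    kept = []
    for url in results:
        host = urlparse(url).hostname or ""
        if host.startswith("www."):
            host = host[4:]
        rank = rank_of.get(host)
        if rank is None or rank > top_n:
            kept.append(url)
    return kept


# Hypothetical ranking and results.
ranks = {"example.com": 1, "small.example": 500000}
results = [
    "https://www.example.com/a",
    "https://small.example/b",
    "https://unranked.example/c",
]
print(remove_top_sites(results, ranks, top_n=100))
```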

https://github.com/yacy/yacy_search_server -- YaCy is a self-hosted and distributed search engine. http://yacy.net/

https://random.surf/#miguelibarra.dev -- Not a search engine; it serves random websites, as StumbleUpon used to. Interestingly, they use a Bloom filter to avoid duplicate content without keeping history. https://random.surf/advanced
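
For reference, a minimal Bloom filter sketch showing why no history is needed: only a bit array is stored, never the URLs themselves. This is a generic textbook version, not random.surf's implementation; the sizes and the SHA-256-based hashing are arbitrary choices for illustration.

```python
import hashlib


class BloomFilter:
    """Set-membership test with possible false positives but no false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


seen = BloomFilter()
seen.add("https://a.example/")
print("https://a.example/" in seen)  # an added item is always reported as seen
```

The trade-off is that an unseen site can occasionally be reported as already seen, which for a random-site picker just means it gets skipped.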

https://hndex.org/ -- Indexes Hacker News (HN) posts.

https://github.com/sergiotapia/torrentinim -- self-hosted torrent search engine.

https://stovetop.app/ -- no bs recipe search

https://gigablast.com/ -- "no more dictators". Source at https://github.com/gigablast/open-source-search-engine

https://private.sh/ -- "encrypted" search, by Gigablast (above).

https://re-search.xyz/ -- Research collective composed of search engineers -- https://re-search.xyz/writing/mapping-the-new-world-towards-a-new-information-engine

https://github.com/metarank/metarank -- A low-code Machine Learning tool that personalizes product listings, articles, recommendations, and search results in order to boost sales. A friendly Learn-to-Rank engine. https://metarank.ai

https://dontbeevil.rip/search -- API-only search engine for developers. Indexes HN/SO/Reddit/GitHub.

https://locserendipity.com/edu.html -- Search EDU domains on DMOZ.

https://www.mojeek.com -- Mojeek is a web search engine that provides unbiased, fast, and relevant search results combined with a no tracking privacy policy.

https://searchmysite.net/ -- Open source search engine and search as a service for personal and independent websites. Help improve it by submitting your favourite sites via Quick Add or your own site via Verified Add (both available via Add Site). Open source: https://github.com/searchmysite/searchmysite.net

https://breezethat.com/ -- Breeze has many specialty search engines.

https://blogsurf.io/ -- search engine for blogs.

https://torrents-csv.ml/ -- Torrents.csv is a collaborative git repository of torrents, consisting of a single, searchable torrents.csv file. It's initially populated with a January 2017 backup of The Pirate Bay, and new torrents are periodically added from various torrent sites. It comes with a self-hostable webserver, a command line search, and a folder scanner to add torrents.

https://gitea.com/heretic/torrents-csv-server
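
Searching a single CSV file needs very little code, which is part of the appeal. The sketch below is not the project's own search tool, and the column names ("infohash", "name") and sample rows are made up for illustration; check the real torrents.csv header and delimiter before reusing it.

```python
import csv
import io


def search_torrents(csv_file, query):
    """Return rows whose name contains every query term (case-insensitive)."""
    terms = query.lower().split()
    reader = csv.DictReader(csv_file)
    return [row for row in reader if all(t in row["name"].lower() for t in terms)]


# Made-up sample data standing in for the real file.
sample = io.StringIO(
    "infohash,name\n"
    "aaa111,Ubuntu 22.04 Desktop ISO\n"
    "bbb222,Debian 12 Netinst ISO\n"
)
for row in search_torrents(sample, "ubuntu iso"):
    print(row["infohash"], row["name"])
```

With a real file, pass an open file handle instead of the StringIO object.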

https://github.com/hnhx/librex -- A privacy-respecting, free-as-in-freedom meta search engine.

Online instances: search.davidovski.xyz

Ad & JavaScript free
Torrent results from popular torrent sites
Special queries (e.g. "1 btc to usd", "what does xyz mean")
Tracking snippets from URLs are removed
Image results are converted to base64 to prevent clients from connecting to Google servers
Supports both POST and GET requests
Popular social media sites (YouTube, Instagram, Twitter) are replaced with privacy friendly front-ends
Easy to use JSON API for developers
No third-party libraries are used
Easy to set up
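
Two of the features above, removing tracking snippets from URLs and serving images as base64, are easy to sketch. This is not LibreX's code: the list of tracking parameters and both function names are assumptions for illustration.

```python
import base64
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PREFIXES = ("utm_",)        # assumption: common campaign parameters
TRACKING_NAMES = {"fbclid", "gclid"}  # assumption: common click identifiers


def strip_tracking(url):
    """Remove known tracking parameters from the query string of a URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_NAMES
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def to_data_uri(image_bytes, mime="image/png"):
    """Embed image bytes in a data URI so the client never fetches the original."""
    return f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")


print(strip_tracking("https://example.com/page?id=7&utm_source=news&fbclid=abc"))
print(to_data_uri(b"abc"))
```

In the data URI approach the search server downloads the image itself, so the image host only ever sees the server's address.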