
The ultimate meme search tool is here, and it searches videos too! Netizen: a meme I hunted for six years, found in two minutes

Songsong, reporting from Aofei Temple

QbitAI | Official account QbitAI

Heavy Internet users, rejoice!

Have you ever searched the whole Internet for a meme image and still come up empty?

Now a developer has built an Internet-scale meme search engine whose library holds nearly 20 million meme images, covering all kinds of niche subcultures.

Search by keyword or upload a similar image, and results come back in seconds!

If you come across a meme that isn't in the library yet, you can upload it and share it.


One netizen found, in 2 minutes on his site, a meme image they had been looking for for six years.


However, the hardware behind these instant meme results actually looks like this:

(Isn't that a bit shabby?)

[Image: the hardware setup, several iPhones]

You may be wondering: how does such a crude setup retrieve memes so quickly?

Let's take a look at how this "Meme search engine" was built~

Inspired by iPhone text recognition

The first and most important problem in building a meme search engine is: how do you accurately recognize the text in a meme image?

In technical terms: how do you get scalable OCR (optical character recognition)?

OCR solutions are readily available, but existing ones either perform poorly on more abstract memes or are too expensive.

Here's a simple example:


When he tested Tesseract OCR for extracting text from images, it could only recognize memes with very standard fonts and color schemes; anything else gave results like this.

Here is the original image:

[Image: the original meme]

Here is the recognized text:

30 BLUE man41;? S4-5?’ flew/ — V [IL ‘ . “,2; g” .’ Sj /B”f; T”EArmDand [red] mvslmunlm: sawmills

Emmmmmm

But inspiration soon struck when he happened to send someone a picture of a captcha from his iPhone.

Here is the captcha picture:

[Image: the captcha]

Here is the copied text:

[Image: the text copied from the captcha]

This iPhone capability is exposed to developers through the iOS Vision framework, so doesn't that solve the scalable OCR problem~


However, there is currently no ready-made open-source wrapper for the Vision framework, so he had to write one himself, and he has not yet released the code.

BUT he did sum up how he got the code written, as a novice who had never written anything serious in Swift:

When in doubt, Google it

Reverse-engineer various Swift repos on GitHub

Ask friends who know iOS for help with Xcode problems

……

In the end, he cobbled together a working solution: an iOS Vision OCR server that runs on a single iPhone.
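The OCR server's code hasn't been published, so here is only a minimal sketch of how another service might call such a phone-hosted server over HTTP, using Python's standard library. The URL, the choice to POST raw image bytes, and the JSON reply shape `{"text": ...}` are all hypothetical assumptions, not the real service's interface.

```python
import json
import urllib.request

# Hypothetical address of an iPhone running the OCR server; the real host,
# port, path, and payload format were not published.
OCR_URL = "http://iphone-ocr.local:8080/ocr"


def ocr_image(image_bytes: bytes, url: str = OCR_URL, timeout: float = 30.0) -> str:
    """POST raw image bytes to an OCR server and return the recognized text.

    Assumes (hypothetically) that the server answers with JSON such as
    {"text": "recognized words"}.
    """
    request = urllib.request.Request(
        url,
        data=image_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)  # parse the JSON body
    return payload.get("text", "")
```

For example, `ocr_image(open("meme.png", "rb").read())` would return whatever text the phone recognized. Keeping the client this thin means the phone does all the recognition work, and the rest of the pipeline only needs to pass bytes in and receive text back.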


With text recognition sorted out, next comes the search side, which is much simpler than the OCR part.

For this he uses Elasticsearch (open source) and Postgres.

Elasticsearch runs on multiple nodes to avoid failures and can handle millions of memes, but at the expense of reliability.

Postgres, on the other hand, is reliable, but searching it becomes particularly slow once the library grows beyond about a million images.

One guarantees speed and the other guarantees reliability, so...


Done!

To tie the two together, he uses PGSync, a middleware that syncs data from Postgres to Elasticsearch/OpenSearch. The search process looks like this:

[Figure: the search process]
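To make the division of labor concrete, here is a toy, in-memory model: a dictionary stands in for the authoritative Postgres table, an inverted index stands in for Elasticsearch, and a `sync` step copies data from one to the other. All class and function names are hypothetical, and none of this uses the real Postgres, Elasticsearch, or PGSync APIs. In particular, `sync` simply rebuilds the whole index, whereas PGSync propagates changes incrementally.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


class MemeStore:
    """Stand-in for the Postgres table: the authoritative copy of each meme."""

    def __init__(self) -> None:
        self.rows: dict[int, dict] = {}

    def insert(self, meme_id: int, url: str, ocr_text: str) -> None:
        self.rows[meme_id] = {"id": meme_id, "url": url, "ocr_text": ocr_text}


class SearchIndex:
    """Stand-in for Elasticsearch: an inverted index from token to meme ids."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)

    def clear(self) -> None:
        self.postings.clear()

    def add(self, meme_id: int, text: str) -> None:
        for token in tokenize(text):
            self.postings[token].add(meme_id)

    def query(self, text: str) -> set[int]:
        """Return ids of memes containing every query token (AND semantics)."""
        tokens = tokenize(text)
        if not tokens:
            return set()
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result


def sync(store: MemeStore, index: SearchIndex) -> None:
    """Rebuild the index from the store (a crude stand-in for PGSync)."""
    index.clear()
    for row in store.rows.values():
        index.add(row["id"], row["ocr_text"])


def search(store: MemeStore, index: SearchIndex, text: str) -> list[dict]:
    """Find matching ids in the fast index, then load the records from the store."""
    ids = index.query(text)
    return [store.rows[i] for i in sorted(ids) if i in store.rows]
```

The design point is that matches come from the fast index, but the records returned are read from the reliable store, so a stale or lost index entry can be rebuilt without losing any data.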

At this point, the search engine has taken shape, but that's not the end...

Video memes are supported too

Memes aren't only images; sometimes they're videos.

This is simple too: split the video into a set of screenshots, which can then be recognized just like ordinary meme images.

Specifically, he wrote a small microservice that uses FFmpeg (a tool for recording, converting, and streaming audio and video in many formats) to capture 10 evenly spaced frames from each video.

The screenshots are then sent to the iPhone OCR service, and each video ends up with the set of OCR results from all of its screenshots.
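The frame-sampling step might be sketched like this. The helper names are hypothetical, sampling the midpoint of each of 10 equal segments is an assumption (the exact spacing rule wasn't described), and `ocr` is any callable that turns image bytes into text, such as a client for the phone OCR service. The `ffprobe` and `ffmpeg` invocations use standard options (`-show_entries format=duration`, `-ss`, `-frames:v 1`).

```python
import subprocess
from pathlib import Path
from typing import Callable


def frame_timestamps(duration: float, n: int = 10) -> list[float]:
    """Midpoints of n equal segments, so every timestamp lies inside the video."""
    if duration <= 0 or n <= 0:
        raise ValueError("duration and n must be positive")
    step = duration / n
    return [(i + 0.5) * step for i in range(n)]


def probe_duration(video_path: str) -> float:
    """Ask ffprobe for the container duration in seconds."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", video_path],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())


def frame_command(video_path: str, t: float, out_path: str) -> list[str]:
    """ffmpeg arguments that seek to t seconds and write exactly one frame."""
    return ["ffmpeg", "-y", "-ss", f"{t:.3f}", "-i", video_path,
            "-frames:v", "1", out_path]


def ocr_video(video_path: str, out_dir: str,
              ocr: Callable[[bytes], str], n: int = 10) -> list[str]:
    """Extract n frames, OCR each one, and return the per-frame texts in order."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    texts = []
    for i, t in enumerate(frame_timestamps(probe_duration(video_path), n)):
        frame_path = out / f"frame_{i:02d}.png"
        subprocess.run(frame_command(video_path, t, str(frame_path)), check=True)
        texts.append(ocr(frame_path.read_bytes()))
    return texts
```

For a 20-second clip, `frame_timestamps(20.0)` yields 1.0, 3.0, ..., 19.0 seconds. Keeping the list of per-frame texts, rather than merging them, preserves which part of the video each caption came from.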

However, video support inevitably puts a much heavier load on the OCR service: with 10 screenshots per video, OCR for one video takes roughly 10 times the work of an ordinary meme image.

Fast as the OCR server is, it couldn't keep up with that alone, so he scaled out the iOS OCR service (by adding a few more phones), ending up with the setup pictured at the beginning.
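One simple way to spread that tenfold load across several phones is round-robin dispatch, sketched below. The source doesn't say how requests are actually distributed, so round-robin is an assumption, and `RoundRobinPool` and `ocr_many` are hypothetical names; each backend is any callable from image bytes to text (for instance, a client pointed at one phone).

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

Backend = Callable[[bytes], str]


class RoundRobinPool:
    """Hand out OCR backends (e.g. one per phone) in strict rotation."""

    def __init__(self, backends: Sequence[Backend]) -> None:
        if not backends:
            raise ValueError("need at least one backend")
        self._cycle = itertools.cycle(backends)
        # Guard the shared iterator so concurrent callers rotate cleanly.
        self._lock = threading.Lock()

    def next_backend(self) -> Backend:
        with self._lock:
            return next(self._cycle)

    def ocr(self, image: bytes) -> str:
        return self.next_backend()(image)


def ocr_many(pool: RoundRobinPool, images: Sequence[bytes], workers: int = 4) -> list[str]:
    """OCR a batch (e.g. a video's 10 frames) concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(pool.ocr, images))
```

With N phones, each one sees roughly 1/N of the requests, so a video's 10 frames can be recognized in parallel instead of queuing on a single device.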

Finally, he also kindly shared the overall flowchart:

[Figure: overall flowchart]

After his meme search engine launched, netizens praised it.


Of course, some netizens also offered suggestions: the current engine leans too heavily on text, while many meme images contain little text and rely more on unspoken understanding.

He responded himself, saying he will keep improving the search engine:

Considering converting images into text descriptions...


It's worth noting, though, that the engine currently has limited Chinese support, and searching for Chinese memes doesn't work very well. But since he has shared how it was built, let's look forward to some capable netizens taking it on. (doge)

If you're interested in the project, check out the link below~

Link:

https://findthatmeme.com/

Reference Links:

https://findthatmeme.com/blog/2023/01/08/image-stacks-and-iphone-racks-building-an-internet-scale-meme-search-engine-Qzrz7V6T.html
