ArtemisSearch: A Multimodal Search Engine for Efficient Video Log-Life Event Retrieval Using Time-Segmented Queries and Vision Transformer-based Feature Extraction
ArtemisSearch is a text-based multimodal search engine for retrieving log-life events in videos. It can retrieve keyframes from a long query that describes a series of events, and it can match words that appear in the video frames. Our motivation is the desire to learn, innovate, and make a difference.
ArtemisSearch is designed for temporal event retrieval in videos. The proposed system implements an efficient Content-Based Image Retrieval (CBIR) algorithm that uses ViT-H/14 and BEiT3 for feature extraction and Milvus, an open-source vector database, for storing and searching the resulting vectors. To further improve retrieval, we propose using EasyOCR for Optical Character Recognition (OCR)-based queries on text in images or videos.
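The retrieval idea behind CBIR can be illustrated with a small nearest-neighbor sketch. This is only a pure-Python stand-in under stated assumptions: it does not use Milvus or the ViT-H/14 and BEiT3 models, and the names `cosine_similarity`, `top_k`, and the toy two-dimensional vectors are hypothetical. In the real system the vectors would come from the feature extractor and the search would be delegated to the vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=3):
    """Return the k (frame_id, score) pairs most similar to query_vec.

    `index` is a hypothetical in-memory dict {frame_id: feature_vector};
    it stands in for the vector database in this sketch.
    """
    scored = [(frame_id, cosine_similarity(query_vec, vec))
              for frame_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real feature vectors are high-dimensional.
    toy_index = {"frame_001": [1.0, 0.0],
                 "frame_002": [0.0, 1.0],
                 "frame_003": [1.0, 1.0]}
    for frame_id, score in top_k([1.0, 0.0], toy_index, k=2):
        print(frame_id, round(score, 3))
```

A brute-force scan like this is linear in the number of frames, which is one reason a dedicated vector database is a sensible choice for larger video collections.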
- We extract frames from videos in our dataset at a predefined interval (e.g., every second) to ensure we capture a representative sample of the video’s content.
- Each extracted frame is fed into the CLIP-ViT-H/14 or BEiT3 model, generating a high-dimensional feature vector that captures the visual semantics of the frame.
- The resulting feature vectors are stored in a Milvus database for efficient retrieval during the query phase.
- The engine can search for keyframes using a long, multi-sentence query. It also supports breaking such long-series queries into individual sentences, which is very useful for retrieval.
- Users can also view the video associated with a selected keyframe. This view includes the keyframes before and after the chosen event, which gives context for the result.
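Three of the steps in the list above (sampling frames at a fixed interval, splitting a long query into sentences, and showing the keyframes around a selected one) can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names, the sentence-splitting rule, and the default `window` of 2 are all assumptions made for this example.

```python
import re


def sample_timestamps(duration_s, interval_s=1.0):
    """Timestamps (in seconds) at which frames would be extracted from a video."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    timestamps = []
    i = 0
    while i * interval_s < duration_s:
        timestamps.append(i * interval_s)
        i += 1
    return timestamps


def split_query(query):
    """Split a long query into sentences at '.', '!' or '?' followed by whitespace.

    Assumption: simple punctuation-based splitting is enough for this sketch.
    """
    parts = re.split(r"(?<=[.!?])\s+", query.strip())
    return [part.strip() for part in parts if part.strip()]


def neighbor_keyframes(index, total, window=2):
    """Indices of the selected keyframe plus up to `window` frames on each side."""
    start = max(0, index - window)
    end = min(total - 1, index + window)
    return list(range(start, end + 1))


if __name__ == "__main__":
    print(sample_timestamps(3.5))
    print(split_query("A man opens a door. He walks to a car. Then he drives away."))
    print(neighbor_keyframes(index=5, total=10))
```

Each sentence returned by `split_query` could then be issued as its own search, so that consecutive sentences map to consecutive moments in the video.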
# 1. Clone the repository:
git clone https://github.com/LoylP/AIC2024
# 2. Install dependencies:
pip install -r requirement.txt
# 3. Start the search engine:
uvicorn serve:app --reload
# 4. Start the frontend app:
npm run start


