This package extracts data from images such as social media posts that contain an image part and a text part. The analysis can generate a very large number of features, depending on the user input. See our paper for a more in-depth description.
This project is currently under development!
Use pre-processed image files, such as social media posts with comments, and process them to collect the following information:
- Text extraction from the images
  - Language detection
  - Translation into English or other languages
- Content extraction from the images
  - Textual summary of the image content ("image caption")
  - Feature extraction from the images: User inputs query and images are matched to that query (both text and image query)
  - Question answering about image content
- Content extraction from videos
  - Textual summary of the video content
  - Question answering about video content
- Color analysis
  - Analyse hue and percentage of color on image
- Multimodal analysis
  - Find best matches for image content or image similarity
- Cropping images to remove comments from posts
The AMMICO package can be installed using pip:
```bash
pip install ammico
```
Or install the development version from GitHub (currently recommended for the new features):
```bash
pip install git+https://github.com/ssciwr/AMMICO.git
```
This will install the package and its dependencies locally.
Demonstration notebooks can be found in the docs/tutorials folder and also on Google Colab.
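To give a first impression, here is a minimal sketch of a typical ammico session following the pattern used in the demonstration notebooks. The helper names (`find_files`, `TextDetector`, `get_dataframe`) are taken from the tutorials and may change while the project is under development; the paths are placeholders:

```python
import ammico

# Collect images from a folder into ammico's input dictionary
# (path and limit are placeholders; adjust them to your data).
image_dict = ammico.find_files(path="data/", limit=10)

# Run text extraction (and language detection/translation) on each image.
for key in image_dict.keys():
    image_dict[key] = ammico.TextDetector(
        image_dict[key], analyse_text=True
    ).analyse_image()

# Convert the nested results to a dataframe and export as csv.
image_df = ammico.get_dataframe(image_dict)
image_df.to_csv("results.csv")
```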
The text is extracted from the images using google-cloud-vision. For this, you need an API key. Set up your Google account following the instructions on the Google Vision AI website or as described here. You then need to export the location of the API key as an environment variable:
```bash
export GOOGLE_APPLICATION_CREDENTIALS="location of your .json"
```
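If you work in a notebook, the variable can also be set from Python before any ammico detectors are run; the path below is a placeholder:

```python
import os

# Point Google Cloud clients at your service-account key file;
# replace the path with the location of your own .json key.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/your-key.json"
```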
The extracted text is then stored under the text key (column when exporting a csv).
Googletrans is used to recognize the language automatically and translate the text into English. The text language and the translated text are then stored under the text_language and text_english keys (columns when exporting a csv).
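Put together, the text-related part of one image's result dictionary then looks roughly like this (the keys are those described above; the values are invented for illustration):

```python
# Example of the text-related keys in one image's result dictionary;
# the values are made up for illustration.
entry = {
    "text": "Schönes Wetter heute!",        # text found in the image
    "text_language": "de",                  # detected language code
    "text_english": "Nice weather today!",  # English translation
}
```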
The image and video content ("caption") is extracted using the Qwen2.5-VL vision-language model family. Qwen2.5-VL is a multimodal large language model capable of understanding and generating content from both images and videos. With its help, ammico supports tasks such as image/video summarization and image/video visual question answering, where the model answers users' questions about the content of a media file.
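For reference, here is a minimal sketch of captioning a single image directly with Qwen2.5-VL through Hugging Face transformers, following the example in the Qwen2.5-VL model card. ammico wraps this behind its own detector classes; the checkpoint name, image path, and prompt below are placeholders:

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

# Load a Qwen2.5-VL checkpoint (placeholder; pick a size that fits your GPU).
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")

# Chat-style request: one image plus a captioning instruction.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "post.jpg"},
        {"type": "text", "text": "Describe this image in one sentence."},
    ],
}]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens before decoding the generated caption.
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```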
The audio transcription, language detection and translation are carried out using the WhisperX model family, which builds on the Whisper model developed by OpenAI.
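For orientation, here is a minimal sketch of transcribing a media file directly with WhisperX, following its README. ammico handles this internally; the model size, device, and file name below are placeholders:

```python
import whisperx

device = "cuda"  # or "cpu"

# Load the transcription model (size and compute type are placeholders).
model = whisperx.load_model("large-v2", device, compute_type="float16")

# Decode the audio track and transcribe it in batches.
audio = whisperx.load_audio("post_video.mp4")
result = model.transcribe(audio, batch_size=16)

print(result["language"])  # detected language code
print(result["segments"])  # transcribed segments with timestamps
```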
Color detection is carried out using colorgram.py and colour for the distance metric. The colors can be classified into the main named colors/hues of the English language: red, green, blue, yellow, cyan, orange, purple, pink, brown, grey, white, and black.
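As an illustration, colorgram.py can also be used on its own to extract the dominant colors of an image. The file name and number of colors below are placeholders; ammico additionally maps the extracted colors to the named hues listed above:

```python
import colorgram

# Extract the six most prominent colors from an image (placeholders).
colors = colorgram.extract("post.jpg", 6)

for color in colors:
    # Each color carries an RGB value and its share of the image area.
    print(color.rgb, f"{color.proportion:.1%}")
```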
We welcome contributions to the ammico project! If you'd like to help improve the tool, add new features, or report or fix bugs, please follow these guidelines.
Please use the issues tab to report bugs, request features, or start discussions.
ammico is licensed under the MIT license.
Ammico has been published in Comp. Comm. Res.; please cite the paper as specified in the Citation file.