Computer Vision

  • CVAT : Powerful and efficient Computer Vision Annotation Tool (CVAT).

  • PaddleGAN : PaddlePaddle GAN library with many interesting applications, such as DeepFake-style First-Order motion transfer, Mai-ha-hi (蚂蚁呀嘿), faceswap, Wav2Lip, picture repair, image editing, photo2cartoon, image style transfer, and so on.

  • 2020-MediaPipe : Cross-platform, customizable ML solutions for live and streaming media.


  • 2023-roboflow/notebooks : Examples and tutorials on using SOTA computer vision models and techniques. Learn everything from old-school ResNet, through YOLO and object-detection transformers like DETR, to the latest models like Grounding DINO and SAM.

Image & Text

  • Versatile-Diffusion : We built Versatile Diffusion (VD), the first unified multi-flow multimodal diffusion framework, as a step towards Universal Generative AI.


  • Rembg : Rembg is a tool to remove the background from images. That is it.


  • Nsfw JS : A simple JavaScript library to help you quickly identify unseemly images; all in the client’s browser. NSFWJS isn’t perfect, but it’s pretty accurate (~90% from our test set of 15,000 test images)… and it’s getting more accurate all the time.

  • DeepCreamPy : A deep learning-based tool to automatically replace censored artwork in hentai with plausible reconstructions.


  • Tess4j : Java JNA wrapper for Tesseract OCR API.

  • 2018-alpr-unconstrained : License Plate Detection and Recognition in Unconstrained Scenarios.

  • 2020-PaddleOCR : PaddleOCR aims to create rich, leading, and practical OCR tools that help users train better models and apply them in practice.

  • 2020-EasyOCR : Ready-to-use OCR with 40+ languages supported including Chinese, Japanese, Korean and Thai

  • keras-ocr : A packaged and flexible version of the CRAFT text detector and Keras CRNN recognition model.

  • 2019-PaddleOCR : Awesome OCR toolkits based on PaddlePaddle (8.6M ultra-lightweight pre-trained model; supports training and deployment on server, mobile, embedded, and IoT devices).

  • 2020-mmocr : OpenMMLab Text Detection, Recognition and Understanding Toolbox


  • 2023-Segment Anything : The Segment Anything Model (SAM) produces high quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 billion masks, and has strong zero-shot performance on a variety of segmentation tasks.

    • 2023-Grounded-Segment-Anything : Marrying Grounding DINO with Segment Anything & Stable Diffusion & BLIP - Automatically Detect, Segment and Generate Anything with Image and Text Inputs

    • Magic Copy : Magic Copy is a Chrome extension that uses Meta’s Segment Anything Model to extract a foreground object from an image and copy it to the clipboard.

    • 2023-Semantic-Segment-Anything : Automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset (SA-1B).

    • 2023-ZrrSkywalker/Personalize-SAM : How to customize SAM to automatically segment your pet dog in a photo album?

    • 2023-opengeos/segment-geospatial : A Python package for segmenting geospatial data with the Segment Anything Model (SAM)

    • 2023-SysCV/sam-hq : We propose HQ-SAM to upgrade SAM for high-quality zero-shot segmentation. Refer to the project's paper for more details.

  • 2023-Painter : Painter & SegGPT Series: Vision Foundation Models from BAAI

  • 2023-Inpaint-Anything : Users can select any object in an image by clicking on it. With powerful vision models, e.g., SAM, LaMa and Stable Diffusion (SD), Inpaint Anything is able to remove the object smoothly (i.e., Remove Anything). Further, prompted by user input text, Inpaint Anything can fill the object with any desired content (i.e., Fill Anything) or replace its background arbitrarily (i.e., Replace Anything).

  • 2023-EditAnything : Edit anything in images powered by segment-anything, ControlNet, StableDiffusion, etc.

  • 2023-GroundingDINO : The official implementation of “Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection”

  • 2023-Segment-Everything-Everywhere-All-At-Once : We introduce SEEM, which can Segment Everything Everywhere with Multi-modal prompts all at once. SEEM allows users to easily segment an image using prompts of different types, including visual prompts (points, marks, boxes, scribbles and image segments) and language prompts (text and audio). It can also work with any combination of prompts or generalize to custom prompts!

  • 2023-Recognize_Anything-Tag2Text : Recognize Anything: A Strong Image Tagging Model, and Tag2Text: Guiding Vision-Language Model via Image Tagging

Object Detection


  • 2017-Detectron : Detectron is Facebook AI Research’s software system that implements state-of-the-art object detection algorithms, including Mask R-CNN.

  • 2017-Multi Object Tracker : Object detection using deep learning and multi-object tracking.

  • 2018-OpenPose : Real-time multi-person keypoint detection library for body, face, hands, and foot estimation.

  • 2021-CLIP : CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs.

  • yolov5 : This repository represents Ultralytics open-source research into future object detection methods, and incorporates our lessons learned and best practices evolved over training thousands of models on custom client datasets with our previous YOLO repository.

  • YOLOX : YOLOX is a high-performance anchor-free YOLO that exceeds YOLOv3 through YOLOv5, with ONNX, TensorRT, ncnn, and OpenVINO supported.

  • YOLOv6 : A single-stage object detection framework dedicated to industrial applications.

  • Handtrack.js : Lets developers quickly prototype gesture interactions using a trained hand-detection model.

  • MMDetection : MMDetection is an open source object detection toolbox based on PyTorch. It is a part of the OpenMMLab project.

  • 2022-MMYOLO : MMYOLO is an open source toolbox for YOLO series algorithms based on PyTorch and MMDetection. It is a part of the OpenMMLab project.

  • detrex : IDEA Open Source Toolbox for Transformer Based Object Detection Algorithms


  • 2018-pico.js : A face-detection library in 200 lines of JavaScript.

  • face-api.js : JavaScript API for Face Recognition in the Browser with tensorflow.js.

  • 2018-Faceswap : Faceswap is a tool that utilizes deep learning to recognize and swap faces in pictures and videos.

  • 2019-faceai : An entry-level project for detecting and recognizing faces, video, and text.

  • SeetaFace : Open source, full-stack face recognition toolkit.

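  • 2019-Ultra-lightweight face detection model (超轻量级人脸检测模型) : A real-time, ultra-lightweight, general-purpose face detection model designed for edge-computing or low-compute devices (for example, inference on ARM). It can run real-time face detection in general scenes on such devices and is equally suitable for mobile devices and PCs.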

  • 2020-Face Depixelizer : Face Depixelizer based on “PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models” repository.

  • 2020-FaceX Zoo : FaceX-Zoo is a PyTorch toolbox for face recognition. It provides a training module with various supervisory heads and backbones towards state-of-the-art face recognition, as well as a standardized evaluation module that lets you evaluate models on most of the popular benchmarks just by editing a simple configuration.


  • 2021-ByteTrack : ByteTrack: Multi-Object Tracking by Associating Every Detection Box.

  • 2023-Track-Anything : Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.


  • 2018-videoflow : Python framework that facilitates the quick development of complex video analysis applications and other series-processing based applications in a multiprocessing environment.

  • 2021-RobustVideoMatting : Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

  • 2023-DINOv2 : DINOv2 models produce high-performance visual features that can be directly employed with classifiers as simple as linear layers on a variety of computer vision tasks; these visual features are robust and perform well across domains without any requirement for fine-tuning. The models were pretrained on a dataset of 142 M images without using any labels or annotations.