AI OpenSource List


Pretrained Models

  • 2019-Deep Learning Models 🗃️ : A collection of various deep learning architectures, models, and tips for TensorFlow and PyTorch in Jupyter Notebooks.

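  • PyTorch Hub 🗃️ : PyTorch Hub provides a library of pretrained models designed to support research reproducibility and help new research get started quickly. It has built-in support for Colab and integrates with Papers With Code. PyTorch Hub currently includes a broad range of models, including classifiers, segmenters, generators, and transformers.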

  • Papers with Code 🗃️

  • UER-py : Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo.

  • google-research 🗃️ : This repository contains code released by Google AI Research.

  • 2021-AliceMind : This repository provides pre-trained encoder-decoder models and their related optimization techniques developed by Alibaba’s MinD (Machine IntelligeNce of Damo) Lab.


  • TensorFlow : TensorFlow is an open source software library for numerical computation using data flow graphs.

  • PyTorch : Tensors and dynamic neural networks in Python with strong GPU acceleration.

  • scikit-learn : scikit-learn is a Python module for machine learning built on top of SciPy and distributed under the 3-Clause BSD license.

  • SciPy : SciPy (pronounced “Sigh Pie”) is open-source software for mathematics, science, and engineering. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more.

  • 2019-Deep Java Library (DJL) : An Engine-Agnostic Deep Learning Framework.

  • 2019-NNI : An open source AutoML toolkit for neural architecture search, model compression and hyper-parameter tuning.

  • 2019-Thinc : A refreshing functional take on deep learning, compatible with your favorite libraries.

  • 2019-Streamlit : Streamlit’s open-source app framework is the easiest way for data scientists and machine learning engineers to create beautiful, performant apps in only a few hours! All in pure Python. All for free.

  • 2020-MegEngine : MegEngine is a fast, scalable, and easy-to-use numerical computation framework with support for automatic differentiation.

  • 2021-Kedro : Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code. It borrows concepts from software engineering and applies them to machine-learning code; applied concepts include modularity, separation of concerns and versioning.

  • 2022-Towhee : Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.


  • tensorflow-playground: Play with neural networks!

  • Sonnet : Sonnet is a library built on top of TensorFlow for building complex neural networks.

  • TFLearn: Deep learning library featuring a higher-level API for TensorFlow.

  • Spleeter : Spleeter is Deezer's source separation library with pretrained models, written in Python and built on TensorFlow.


  • PyTorch Lightning : The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.

  • AITemplate : AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Universal Toolkits

  • 2021-AugLy : AugLy is a data augmentations library that currently supports four modalities (audio, image, text & video) and over 100 augmentations.

  • 2022-SkyPilot : SkyPilot is a framework for easily running machine learning workloads on any cloud through a unified interface.

Dataset Management

  • Hub : Fastest unstructured dataset management for TensorFlow/PyTorch. Stream data real-time & version-control it.


  • TensorSpace.js : A neural network 3D visualization framework for building interactive and intuitive models in the browser, with support for pre-trained deep learning models from TensorFlow, Keras, and TensorFlow.js.

  • Curve : An Integrated Experimental Platform for time series data anomaly detection.

  • wandb : wandb helps you track and visualize machine learning experiments.

  • Streamlit : Streamlit lets you create apps for your machine learning projects with deceptively simple Python scripts.

  • 2021-lux : Python API for Intelligent Visual Data Discovery

Utils & IDE

  • 2020-Otto : Otto is an intelligent chat application, designed to help aspiring machine learning engineers go from idea to implementation with minimal domain knowledge.

  • 2020-Spyder : Spyder is a powerful scientific environment written in Python, for Python, and designed by and for scientists, engineers and data analysts. It offers a unique combination of the advanced editing, analysis, debugging, and profiling functionality of a comprehensive development tool with the data exploration, interactive execution, deep inspection, and beautiful visualization capabilities of a scientific package.

  • 2014-Jupyter : Project Jupyter exists to develop open-source software, open-standards, and services for interactive computing across dozens of programming languages.

  • 2019-Jupytext : Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts.

Machine Learning

  • NumPy : NumPy is the fundamental package for scientific computing with Python.

  • pandas : pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

  • Matplotlib : Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.

  • feature-selector : Feature selector is a tool for dimensionality reduction of machine learning datasets

  • SPTAG : A distributed approximate nearest neighbor (ANN) search library that provides high-quality vector index building, search, and distributed online serving toolkits for large-scale vector search scenarios.

Feature Engineering

Time Series

  • 2019-adtk : A Python toolkit for unsupervised anomaly detection in time series

  • 2020-sktime : A unified framework for machine learning with time series.

  • 2021-Kats : Kats is a toolkit for analyzing time series data. It is a lightweight, easy-to-use, generalizable, and extensible framework for time series analysis, covering everything from understanding key statistics and characteristics, to detecting change points and anomalies, to forecasting future trends.

Deep Learning

  • tfjs : A WebGL accelerated, browser based JavaScript library for training and deploying ML models.

  • brain.js : brain.js is a library of Neural Networks written in JavaScript.

  • neurojs : neurojs is a JavaScript framework for deep learning in the browser. It mainly focuses on reinforcement learning, but can be used for any neural network based task. It contains neat demos to visualise these capabilities, for instance a 2D self-driving car.


  • tianshou : An elegant, flexible, and superfast PyTorch deep Reinforcement Learning platform.

Distributed Training

  • BytePS : BytePS is a high performance and general distributed training framework.

  • SQLFlow : SQLFlow is a bridge that connects a SQL engine, e.g. MySQL, Hive or MaxCompute, with TensorFlow, XGBoost and other machine learning toolkits. SQLFlow extends the SQL syntax to enable model training, prediction and model explanation.

  • Horovod : Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

  • 2019-ElasticDL : ElasticDL is a Kubernetes-native deep learning framework built on top of TensorFlow 2.0 that supports fault-tolerance and elastic scheduling.

  • 2019-Alink : Alink is a general-purpose algorithm platform based on Flink, developed by the PAI team of Alibaba's computing platform.

Integrated Tools

  • Deepo : Deepo is a Docker image with a fully reproducible deep learning research environment. It contains the most popular deep learning frameworks: theano, tensorflow, sonnet, pytorch, keras, lasagne, mxnet, cntk, chainer, caffe, torch.

  • 2017-Turi Create : Turi Create simplifies the development of custom machine learning models. You don’t have to be a machine learning expert to add recommendations, object detection, image classification, image similarity or activity classification to your app.

  • Ludwig : Ludwig is a toolbox that lets you train and test deep learning models without writing code.

Federated Learning

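  • FATE : FATE (Federated AI Technology Enabler) is the world's first industrial-grade federated learning framework, developed in-house by WeBank's AI team. It provides a distributed secure computing framework based on data privacy protection, with high-performance secure computation support for machine learning, deep learning, and transfer learning algorithms. FATE also offers a user-friendly solution for managing cross-domain interaction information, which addresses the difficulty of auditing information security in federated learning.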


  • 2018-ONNX Runtime : ONNX Runtime inference can enable faster customer experiences and lower costs, supporting models from deep learning frameworks such as PyTorch and TensorFlow/Keras as well as classical machine learning libraries such as scikit-learn, LightGBM, XGBoost, etc. ONNX Runtime is compatible with different hardware, drivers, and operating systems, and provides optimal performance by leveraging hardware accelerators where applicable alongside graph optimizations and transforms.