Recent AIGC practice and resource collection

Over the holiday I led a group of undergraduates on a project: use ChatGPT to generate a story, then turn it into a comic strip with AI painting. They experimented with AI image generation and model fine-tuning and wrote up reports. I had not fine-tuned a model myself, though, so my grasp of some AI-painting pain points was shallow. As the saying goes, "what you learn on paper is always shallow; to truly understand, you must do it yourself", so over the past couple of days I have joined the ranks of the "alchemists" and am now playing with model fine-tuning too.

Screenshot of a student report: the story was written with ChatGPT, then illustrated with DreamStudio.
Search results on pixiv for the keyword "AI-generated". Most of the polished pieces come from fine-tuning a pretrained model on a small batch of images. Because this fine-tune-and-generate process is unstable, it is jokingly called "alchemy" (炼丹).

To be fair, although AI has brought enormous change since late 2022, it is still not easy for ordinary users to get good results from AI painting. A recent paper catalogues the problems and opportunities in today's AIGC field; see how the experts critique it all: Doom or Deliciousness: Challenges and Opportunities for Visualization in the Age of Generative Models.

To make AI painting easier to use, a wave of auxiliary tools has appeared recently, such as Promptist, which rewrites prompts for image generators, and prompt helpers for exploring painting styles. A PhD student in my lab at Zhejiang University is also writing a paper in this area. This site collects many related products and tools: https://www.chinaz.com/ai/categorie/262_1.shtml
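To give a feel for what these prompt helpers do, here is a minimal sketch of prompt augmentation: expanding a plain description with style and quality modifiers. Real tools such as Promptist use a fine-tuned language model for this; the modifier lists below are made up for illustration.

```python
from typing import Optional

# Hypothetical modifier lists; real prompt helpers learn or curate these.
QUALITY_MODIFIERS = ["highly detailed", "sharp focus", "trending on artstation"]
STYLE_PRESETS = {
    "watercolor": "watercolor painting, soft palette",
    "anime": "anime style, cel shading",
}

def augment_prompt(base: str, style: Optional[str] = None) -> str:
    """Append a style preset and generic quality modifiers to a base prompt."""
    parts = [base.strip()]
    if style is not None:
        parts.append(STYLE_PRESETS[style])
    parts.extend(QUALITY_MODIFIERS)
    return ", ".join(parts)

print(augment_prompt("a fox in a forest", style="watercolor"))
```

The actual Promptist model is trained with reinforcement learning against aesthetic scores, but the input/output contract is the same: plain prompt in, enriched prompt out.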

Compared with the graphics side, the technology with more immediate real-world potential is still natural-language models, with ChatGPT as the flagship. The site futurepedia.io collects more than 1,000 AI tools and startup projects. One example is character.ai, which lets you build chat models with their own personas; you could, say, create a Lu Xun to chat with. I imagine that given enough of my own data, I could someday build a cyber version of myself and have it keep my wife company in chat, freeing me up to play games (joking).

The futurepedia homepage. At the top are MagicForm and AI website builders, billed as killers of front-end jobs. The AI boom really has stirred wide discussion among front-end developers; some Zhejiang University graduate students around me have even given up AI positions to pursue PhDs in other directions.

Here is a list covering the various directions within AIGC: https://github.com/yzihan/Generative-AI

Frontend

painter: https://github.com/aml2610/react-painter#readme

replit: https://replit.com/~

Stable Diffusion related

Stable Diffusion web UI: https://github.com/AUTOMATIC1111/stable-diffusion-webui

txt2mask: https://github.com/ThereforeGames/txt2mask

Img2img Video: https://github.com/memes-forever/Stable-diffusion-webui-video

Vid2Vid: https://github.com/Filarius/stable-diffusion-webui/blob/master/scripts/vid2vid.py
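All of the tools above drive a diffusion model under the hood. As a reminder of the core math, here is a toy sketch (plain Python, not the actual Stable Diffusion code) of the forward noising process: a linear beta schedule and the closed-form q(x_t | x_0) that lets you jump to any noise level in one step.

```python
import math

T = 1000
# Linear beta schedule from 1e-4 to 0.02, as in the original DDPM paper.
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alphas = [1.0 - b for b in betas]

# Cumulative product: alpha_bar_t = prod_{s<=t} alpha_s
alpha_bars = []
prod = 1.0
for a in alphas:
    prod *= a
    alpha_bars.append(prod)

def q_sample(x0: float, t: int, eps: float) -> float:
    """Noise a (scalar) sample x0 directly to timestep t:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    ab = alpha_bars[t]
    return math.sqrt(ab) * x0 + math.sqrt(1.0 - ab) * eps

# At t=0 almost no noise survives; at t=T-1 the signal is nearly destroyed.
print(q_sample(1.0, 0, 0.5), q_sample(1.0, T - 1, 0.5))
```

Sampling (what the web UI does when you hit Generate) runs this process in reverse, with a trained network predicting the noise to subtract at each step.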

Image segmentation and recognition

semantic-segmentation: https://huggingface.co/nvidia/segformer-b0-finetuned-ade-512-512

plant recognition: https://web.plant.id/plant-identification-api/

texture recognition (requires sign-up): https://www.clarifai.com/models/texture-recognition

doodle-recognition: https://github.com/zhangchaodesign/doodle-recognition

Image style transfer

Arbitrary Neural Style Transfer: https://replicate.com/collections/style-transfer

Image search

image search: https://www.microsoft.com/en-us/bing/apis/bing-image-search-api

Graph-based

spaCy: https://spacy.io/api

Stanford OpenIE: https://nlp.stanford.edu/software/openie.html

Text generation

Demo-InferKit: https://app.inferkit.com/demo

Sassbook AI Story Generator: https://sassbook.com/ai-story-writer

Rytr-an AI writing assistant: https://rytr.me/

Title generation

OpenBMB: https://live.openbmb.org/ant

Text classification

Cohere: https://os.cohere.ai/playground/large/classify

Text analysis

KPI Bees (advanced text analysis for unstructured data): https://kpibees.com/

Semantic Role Labeling: https://demo.allennlp.org/semantic-role-labeling/semantic-role-labeling

Event extraction: https://huggingface.co/veronica320/QA-for-Event-Extraction

Sentiment analysis: https://huggingface.co/models?other=sentiment-analysis

Named entity recognition: https://huggingface.co/dslim/bert-base-NER
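The sentiment models linked above return a label plus a confidence score. To illustrate just the input/output shape of the task, here is a toy lexicon-based scorer; real models are transformer classifiers, and the word lists here are made up.

```python
# Toy lexicon-based sentiment scorer (illustration only, not a real model).
POSITIVE = {"good", "great", "love", "excellent", "nice"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "poor"}

def sentiment(text: str) -> dict:
    """Return a {label, score} dict mimicking a classifier's output."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    if total == 0:
        return {"label": "NEUTRAL", "score": 0.5}
    label = "POSITIVE" if pos >= neg else "NEGATIVE"
    return {"label": label, "score": max(pos, neg) / total}

print(sentiment("I love this, it is great!"))
```

A Hugging Face pipeline for the same task returns the same kind of structure, just with a learned model behind it.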

Sequential / story image generation

StoryDALL-E: https://github.com/adymaharana/storydalle https://huggingface.co/spaces/ECCV2022/storydalle

StoryGAN: https://arxiv.org/pdf/1812.02784.pdf https://github.com/yitong91/StoryGAN

stable-diffusion-videos: https://github.com/nateraw/stable-diffusion-videos

StyleCLIP: https://github.com/orpatashnik/StyleCLIP

Generating images with a specified subject (DreamBooth): https://github.com/XavierXiao/Dreambooth-Stable-Diffusion
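Tools like stable-diffusion-videos above make smooth clips by interpolating between the latent noise vectors of two prompts. The standard trick is spherical linear interpolation (slerp), which keeps the norm of the Gaussian latents roughly constant; plain linear interpolation would pass through low-norm points the model handles poorly. A minimal sketch on plain Python lists (the real repo works on torch tensors):

```python
import math

def slerp(t: float, a: list, b: list) -> list:
    """Spherical linear interpolation between vectors a and b at fraction t."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    cos_omega = max(-1.0, min(1.0, dot / (na * nb)))
    omega = math.acos(cos_omega)
    if omega < 1e-6:  # nearly parallel: fall back to linear interpolation
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    sin_omega = math.sin(omega)
    c0 = math.sin((1 - t) * omega) / sin_omega
    c1 = math.sin(t * omega) / sin_omega
    return [c0 * x + c1 * y for x, y in zip(a, b)]

# Intermediate "frames" between two toy latents:
frames = [slerp(i / 4, [1.0, 0.0], [0.0, 1.0]) for i in range(5)]
```

Each interpolated latent is then decoded to an image, and the decoded sequence becomes the video frames.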

Text-to-Video Generation

Phenaki: Variable Length Video Generation from Open Domain Textual Descriptions: https://phenaki.video/index.html

Make-A-Video: https://makeavideo.studio/

MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model: https://mingyuan-zhang.github.io/projects/MotionDiffuse.html

CogVideo: https://github.com/THUDM/CogVideo

text-image-video: https://huggingface.co/spaces/Kameswara/TextToVideo

Imagen Video: https://imagen.research.google/video/

MagicVideo: Efficient Video Generation with Latent Diffusion Models

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

Video Diffusion Models

Image-to-Video Generation

Make it move: Controllable image-to-video generation with text descriptions

3D Generation

DreamFusion: Text-to-3D using 2D Diffusion: https://dreamfusion3d.github.io/ https://github.com/ashawkey/stable-dreamfusion

dreamfields-3D: https://github.com/shengyu-meng/dreamfields-3D

ZoeDepth: https://github.com/isl-org/ZoeDepth

Text to 3D Scene

https://github.com/oaishi/3DScene_from_text

https://nlp.stanford.edu/projects/text2scene.shtml

Scene Representation (NeRF)

https://www.matthewtancik.com/nerf

Depth Analysis

https://github.com/EPFL-VILAB/omnidata

Compositional Visual Generation

https://energy-based-model.github.io/Compositional-Visual-Generation-with-Composable-Diffusion-Models/

https://people.csail.mit.edu/lishuang/#Home

Dynamic Human

https://developer.nvidia.com/blog/human-like-character-animation-system-uses-ai-to-navigate-terrains/

Human Body Swapping (video-to-video)

https://github.com/NVIDIA/vid2vid

Image to Text

img2prompt: https://replicate.com/methexis-inc/img2prompt

Text to Image

Deforum Stable Diffusion: https://colab.research.google.com/github/deforum/stable-diffusion/blob/main/Deforum_Stable_Diffusion.ipynb

stable diffusion demo: https://demo.rowy.io/table/imageGeneration

Disco Diffusion: https://colab.research.google.com/github/alembics/disco-diffusion/blob/main/Disco_Diffusion.ipynb

Latent Diffusion: https://huggingface.co/spaces/multimodalart/latentdiffusion

Dreamstudio: https://beta.dreamstudio.ai/dream

CLIPDraw: https://deepai.org/publication/clipdraw-exploring-text-to-drawing-synthesis-through-language-image-encoders

StyleCLIPDraw: https://github.com/pschaldenbrand/StyleCLIPDraw

Midjourney: https://www.midjourney.com/home/

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

Image Editing with Text

Prompt-to-Prompt Image Editing with Cross-Attention Control: https://prompt-to-prompt.github.io/

Imagic: Text-Based Real Image Editing with Diffusion Models

Text augmentation

Bloom: https://huggingface.co/bigscience/bloom

OPT : Open Pre-trained Transformer Language Models: https://huggingface.co/facebook/opt-125m

seq2seq

Bart: https://huggingface.co/facebook/bart-base

Outpainting

stablediffusion-infinity: https://github.com/lkwq007/stablediffusion-infinity
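Outpainting tools like stablediffusion-infinity work by extending the canvas and handing the model a binary mask that marks the blank region to fill, then running ordinary inpainting on it. A minimal sketch of that canvas/mask construction (plain Python lists; real tools operate on image tensors):

```python
def outpaint_mask(width: int, height: int, pad_right: int) -> list:
    """Build the mask for extending an image to the right.

    Returns a height x (width + pad_right) grid where 0 marks pixels
    that already exist and 1 marks the new strip the model must fill.
    """
    new_width = width + pad_right
    return [
        [0 if x < width else 1 for x in range(new_width)]
        for _ in range(height)
    ]

# Extend a 4x2 image by 3 columns: the last 3 columns of each row are masked.
mask = outpaint_mask(4, 2, 3)
```

In the real pipeline this mask, together with the padded image, goes into an inpainting-capable diffusion model; repeating the step in different directions yields an "infinite" canvas.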

Pytorch implementation

video-diffusion-pytorch: https://github.com/lucidrains/video-diffusion-pytorch

phenaki-pytorch: https://github.com/lucidrains/phenaki-pytorch

make-a-video-pytorch: https://github.com/lucidrains/make-a-video-pytorch

imagen-pytorch: https://github.com/lucidrains/imagen-pytorch

DALLE2-pytorch: https://github.com/lucidrains/DALLE2-pytorch

Access/Share State-of-the-Art Models

Hugging Face: https://huggingface.co/

Replicate: https://replicate.com/

Rapid API: https://rapidapi.com/hub

ml5.js: https://ml5js.org/

Pollinations: https://pollinations.ai/c/Anything

Some development issues

Access-Control-Allow-Origin: https://chrome.google.com/webstore/detail/allow-cors-access-control/lhobafahddgcelffkeicbaginigeejlf/related?hl=en

Some Human-AI Systems

StoryBuddy: A Human-AI Collaborative Chatbot for Parent-Child Interactive Storytelling with Flexible Parental Involvement

StoryDrawer: A Child–AI Collaborative Drawing System to Support Children's Creative Visual Storytelling