Recent AIGC practice and a collection of resources
Table of Contents
- Frontend
- Stable Diffusion related
- Image segmentation and recognition
- Image style transfer
- Image search
- Graph-based
- Text generation
- Title generation
- Text classification
- Text analysis
- Continue images generation
- Text-to-Video Generation
- Image-to-Video Generation
- 3D Generation
- Text to 3D Scene
- Representing Scenes Generation
- Depth Analysis
- Compositional Visual Generation
- Dynamic Human
- Switch Human Body
- Image to Text
- Text to Image
- Image Editing with Text
- Text augmentation
- seq2seq
- Outpainting
- Pytorch implementation
- Access/Share State of Art Models
- Some development issues
- Some Human-AI Systems
Over the break I led a group of undergraduates on a project whose goal was to "have ChatGPT generate a story, then turn it into a picture book with AI painting"; we experimented with AI painting and model fine-tuning and wrote up a report. But I never fine-tuned a model myself, so my grasp of some of AI painting's pain points was not very solid. As the saying goes, "what you learn on paper always feels shallow; to truly know a thing you must do it yourself." So over the past couple of days I have joined the ranks of the "alchemists" and am now playing with model fine-tuning too.
Objectively speaking, although AI has undergone enormous changes since late 2022, it is still not that easy for ordinary people to get good results from AI painting. There is a paper that critiques a whole series of problems and opportunities in today's AIGC field; have a look at how the experts see it: Doom or Deliciousness: Challenges and Opportunities for Visualization in the Age of Generative Models
To make AI painting easier to use, a batch of auxiliary tools has emerged recently, such as Promptist, which enriches prompts for AI painting, and prompt helpers for exploring painting styles. A PhD student in my lab at Zhejiang University is also writing a paper in this area. This site collects many related products and tools: https://www.chinaz.com/ai/categorie/262_1.shtml
Compared with the graphics domain, the models with more real-world deployment potential are still the natural-language models represented by ChatGPT. The site futurepedia.io collects more than 1,000 AI tools and startup projects. One example is character.ai, which lets you create dialogue agents with their own personas; you could, say, create a Lu Xun to chat with. I imagine that in time, if I collected enough of my own data and built a cyber-me, I could have it keep my wife company in conversation now and then, freeing me up to focus on gaming (just kidding).
Here is a list enumerating the many directions of AIGC: https://github.com/yzihan/Generative-AI
Frontend
painter: https://github.com/aml2610/react-painter#readme
replit: https://replit.com/~
Stable Diffusion related
Stable Diffusion web UI: https://github.com/AUTOMATIC1111/stable-diffusion-webui
txt2mask: https://github.com/ThereforeGames/txt2mask
Img2img Video: https://github.com/memes-forever/Stable-diffusion-webui-video
Vid2Vid: https://github.com/Filarius/stable-diffusion-webui/blob/master/scripts/vid2vid.py
Image segmentation and recognition
semantic-segmentation: https://huggingface.co/nvidia/segformer-b0-finetuned-ade-512-512
plant recognition: https://web.plant.id/plant-identification-api/
texture recognition (need to apply): https://www.clarifai.com/models/texture-recognition
doodle-recognition: https://github.com/zhangchaodesign/doodle-recognition
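The SegFormer checkpoint listed above can be driven through the Hugging Face `transformers` image-segmentation pipeline; a minimal sketch (the image filename is a placeholder you would replace with your own photo):

```python
from PIL import Image
from transformers import pipeline

# Downloads nvidia/segformer-b0-finetuned-ade-512-512 on first use.
segmenter = pipeline("image-segmentation",
                     model="nvidia/segformer-b0-finetuned-ade-512-512")

image = Image.open("scene.jpg")  # placeholder: any RGB photo
for region in segmenter(image):
    # Each entry carries a class label and a PIL mask for that region.
    print(region["label"])
```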
Image style transfer
Arbitrary Neural Style Transfer: https://replicate.com/collections/style-transfer
Image search
image search: https://www.microsoft.com/en-us/bing/apis/bing-image-search-api
Graph-based
spaCy: https://spacy.io/api
Stanford OpenIE: https://nlp.stanford.edu/software/openie.html
Text generation
Demo-InferKit: https://app.inferkit.com/demo
Sassbook AI Story Generator: https://sassbook.com/ai-story-writer
Rytr-an AI writing assistant: https://rytr.me/
Title generation
OpenBMB: https://live.openbmb.org/ant
Text classification
Cohere: https://os.cohere.ai/playground/large/classify
Text analysis
KPI Bees - advanced text analysis for unstructured text data: https://kpibees.com/
Semantic Role Labeling: https://demo.allennlp.org/semantic-role-labeling/semantic-role-labeling
Event extraction: https://huggingface.co/veronica320/QA-for-Event-Extraction
Sentiment analysis: https://huggingface.co/models?other=sentiment-analysis
Named entity recognition: https://huggingface.co/dslim/bert-base-NER
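Several of the Hugging Face models above are one-liners through the `transformers` pipeline API; a small sketch combining the default sentiment-analysis model with the NER checkpoint listed above (the sample sentences are just examples):

```python
from transformers import pipeline

# Sentiment analysis with the pipeline's default English model.
sentiment = pipeline("sentiment-analysis")
print(sentiment("The illustrations turned out great!"))  # label + score

# Named entity recognition with the dslim/bert-base-NER checkpoint;
# aggregation_strategy="simple" merges word pieces into whole entities.
ner = pipeline("ner", model="dslim/bert-base-NER",
               aggregation_strategy="simple")
print(ner("Lu Xun lived in Shanghai."))
```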
Continue images generation
StoryDall-E: https://github.com/adymaharana/storydalle?continueFlag=aecf3cf42991a37d09397fc61687c405 https://huggingface.co/spaces/ECCV2022/storydalle
Storygan: https://arxiv.org/pdf/1812.02784.pdf https://github.com/yitong91/StoryGAN
stable-diffusion-videos: https://github.com/nateraw/stable-diffusion-videos
StyleCLIP: https://github.com/orpatashnik/StyleCLIP
Generating images with a specified character: https://github.com/XavierXiao/Dreambooth-Stable-Diffusion
Text-to-Video Generation
Phenaki: Variable Length Video Generation from Open Domain Textual Descriptions: https://phenaki.video/index.html
Make-A-Video: https://makeavideo.studio/
MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model: https://mingyuan-zhang.github.io/projects/MotionDiffuse.html
CogVideo: https://github.com/THUDM/CogVideo
text-image-video: https://huggingface.co/spaces/Kameswara/TextToVideo
Imagen Video: https://imagen.research.google/video/
Magicvideo: Efficient video generation with latent diffusion models
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
Video Diffusion Models
Image-to-Video Generation
Make it move: Controllable image-to-video generation with text descriptions
3D Generation
DreamFusion: Text-to-3D using 2D Diffusion: https://dreamfusion3d.github.io/; https://github.com/ashawkey/stable-dreamfusion
dreamfields-3D: https://github.com/shengyu-meng/dreamfields-3D
ZoeDepth: https://github.com/isl-org/ZoeDepth
Text to 3D Scene
https://github.com/oaishi/3DScene_from_text
https://nlp.stanford.edu/projects/text2scene.shtml
Representing Scenes Generation
https://www.matthewtancik.com/nerf
Depth Analysis
https://github.com/EPFL-VILAB/omnidata
Compositional Visual Generation
https://people.csail.mit.edu/lishuang/#Home
Dynamic Human
Switch Human Body
https://github.com/NVIDIA/vid2vid
Image to Text
img2prompt: https://replicate.com/methexis-inc/img2prompt
Text to Image
Deforum Stable Diffusion: https://colab.research.google.com/github/deforum/stable-diffusion/blob/main/Deforum_Stable_Diffusion.ipynb
stable diffusion demo: https://demo.rowy.io/table/imageGeneration
Disco Diffusion: https://colab.research.google.com/github/alembics/disco-diffusion/blob/main/Disco_Diffusion.ipynb
Latent Diffusion: https://huggingface.co/spaces/multimodalart/latentdiffusion
Dreamstudio: https://beta.dreamstudio.ai/dream
styleclipdraw: https://github.com/pschaldenbrand/StyleCLIPDraw
Midjourney: https://www.midjourney.com/home/
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
Image Editing with Text
Prompt-to-Prompt Image Editing with Cross-Attention Control: https://prompt-to-prompt.github.io/
Imagic: Text-Based Real Image Editing with Diffusion Models
Text augmentation
Bloom: https://huggingface.co/bigscience/bloom
OPT: Open Pre-trained Transformer Language Models: https://huggingface.co/facebook/opt-125m
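The OPT checkpoint above is small enough (125M parameters) to try locally; a sketch of free-form continuation with the `transformers` text-generation pipeline (the prompt is just an example):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="facebook/opt-125m")

out = generator("Once upon a time, a robot learned to paint.",
                max_new_tokens=40, do_sample=True)
print(out[0]["generated_text"])  # the prompt plus the sampled continuation
```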
seq2seq
Bart: https://huggingface.co/facebook/bart-base
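bart-base is a pretrained denoising seq2seq model rather than a task-specific one; a quick way to poke at it is the fill-mask pipeline (BART's mask token is `<mask>`, and the sentence is just an example):

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="facebook/bart-base")

# Ask the model to propose tokens for the masked position.
for candidate in fill("The story ends with a <mask> twist."):
    print(candidate["token_str"], round(candidate["score"], 3))
```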
Outpainting
stablediffusion-infinity: https://github.com/lkwq007/stablediffusion-infinity
Pytorch implementation
video-diffusion-pytorch: https://github.com/lucidrains/video-diffusion-pytorch
phenaki-pytorch: https://github.com/lucidrains/phenaki-pytorch
make-a-video-pytorch: https://github.com/lucidrains/make-a-video-pytorch
imagen-pytorch: https://github.com/lucidrains/imagen-pytorch
DALLE2-pytorch: https://github.com/lucidrains/DALLE2-pytorch
Access/Share State of Art Models
Hugging Face: https://huggingface.co/
Replicate: https://replicate.com/
Rapid API: https://rapidapi.com/hub
ml5.js: https://ml5js.org/
Pollinations: https://pollinations.ai/c/Anything
Some development issues
Access-Control-Allow-Origin: https://chrome.google.com/webstore/detail/allow-cors-access-control/lhobafahddgcelffkeicbaginigeejlf/related?hl=en
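The extension above patches CORS on the browser side, which is fine for local prototyping; the durable fix is for the server to send the header itself. A minimal sketch with Flask (a common choice here; the route name is just an example):

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.after_request
def add_cors_headers(response):
    # Let any origin read responses from this API during development.
    response.headers["Access-Control-Allow-Origin"] = "*"
    return response

@app.route("/api/hello")
def hello():
    return jsonify(message="hi")
```

In production you would typically restrict the allowed origin to your own frontend's domain instead of `*`.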
Some Human-AI Systems
StoryBuddy: A Human-AI Collaborative Chatbot for Parent-Child Interactive Storytelling with Flexible Parental Involvement
StoryDrawer: A Child–AI Collaborative Drawing System to Support Children's Creative Visual Storytelling