OpenAI Announces Sora: AI-Generated Video Will Change the World

2024-09-19 09:56:29 智能写作 (Smart Writing)

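OpenAI has released a new AI product, Sora: users type in text and it generates an extremely realistic and imaginative video.
OpenAI also shared several sample videos demonstrating that generating video from text is possible.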

All right, so this is simultaneously really impressive and really frightening at the same time, and it's hitting me in ways that I didn't really expect.

So do you remember Will Smith eating spaghetti?

Do you remember when this was what AI generated videos looked like?

Remember when we said, "Okay, this AI stuff is cool and all, but clearly there's a long way to go before there's any need for concern."

Well, welcome to the future, people, because this is also an AI generated video.

And so is this, completely synthesized out of thin air by computers.

This one too, this is not real.

Absolutely ridiculous how far we've come in literally one year.

This does feel like another ChatGPT, DALL-E moment for AI.

And maybe I'm overreacting because, okay, I am a video creator, so an AI that's actually doing my job, maybe that feels a little more threatening so I'm particularly impressed by it.

But also this stuff is really good.

So today, Sam Altman and OpenAI announced a new model called Sora, and it can generate full video clips, up to one minute long, from just text input.

So the same way DALL-E was able to understand our text input and turn it into a photorealistic or stylized image or whatever you want, same thing with Sora, but now since it's videos, it also needs to understand how all these things like reflections and textures and materials and physics all interact with each other over time to make a reasonable looking video.

And of course, right away, there's a bunch of examples on their website that are crazy.

Now, before I show you these, I just need you to keep this in mind: you're about to watch a bunch of AI generated videos and you know that you're about to watch a bunch of AI generated content.

So your brain, you're already looking for this stuff and it's not perfect, you will find imperfections, but not everybody who sees AI generated content on the internet knows to be looking for that.

So also keep that in mind.

This is also the worst that this technology is going to be from here on out.

So, okay, here's one of the videos.

There's no audio to any of these clips, but the prompt for this one is: a stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage.

She wears a black leather jacket, a long red dress and black boots.

This video is already miles ahead of where we were.

It has accurate lighting, it has materials, it has skin tones, movements, even has reflections all over the place.

Now, of course, if you look at it for more than about 10 seconds, very closely, there are lots of giveaways.

Like this dude in the background kinda looks like he's gliding in a weird way.

The frame rate of the reflections in the water is, for some reason, lower than the rest of the video.

The camera movement overall is just a bit inconsistent and it just, I don't know, it just kinda feels a little bit off.

But then again, this is where we were one year ago.

So just keep that in the back of your head for all this.

Okay, how about this one?

This is another one which has a long prompt about a camera following behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road.

This is also, again, really good.

It kinda looks a little more video gamey because of how rock solid the drone footage is, but clearly very usable.

Here's another one, a litter of golden retriever puppies playing in the snow.

Their heads pop in and out of the snow, covered in it. It's so good.

It feels like the physics of the fur and the ears and everything with the snow flying around in slow motion is incredible.

I've looked through all of the sample videos on OpenAI's website, and clearly these are the handpicked best ones that they chose to share where they just put in some text and then get a video and don't modify it.

But there's really impressive stuff in there.

Some of it has humans, some of it doesn't.

Some of it is more realistic feeling like the truck driving one, but some of them are more video gamey or more stylized.

A lot of it is slow motion. I just have to say, how insanely fast these models are improving is genuinely, like, that's the shocking part.

Like I remember not even that many months ago, DALL-E 3, really, really high end, and you could always still find something off about it.

Like especially if you ask it for something like a photorealistic image of a human, something about like the hands or the ears would always just be a little bit off, never mind the physics.

But even this video here is crazy at first glance.

The prompt for this AI generated video is: a young man in his 20s is sitting on a piece of a cloud in the sky reading a book.

This one feels like 90% of the way there for me.

Like it's beyond the uncanny valley of like Apple's Personas, which are actually based on humans.

This is a made up person.

I mean, his eyes are kinda weird, and the motion of the pages in the book is kinda odd.

And yeah, obviously, he's in a cloud and that's a giveaway, but like, the lighting and the shadows and the skin tones and then all the realism of the textures on the shirt and the way the shirt and the pants move and the hair, they're all really impressive.

And then for this one, they typed in: a movie trailer featuring the adventures of the 30-year-old spaceman wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35 millimeter film.

And the closeups of his face, the fabrics on the helmet, the film grain through every shot and the cinematic style, this is one of the most convincing AI generated videos I've ever seen, minus maybe the weird physics of that dude walking kind of in fast motion.

So Sam Altman, if you follow him on Twitter, he's going through a whole bunch more of like people's requests and posting a bunch more generated videos.

And so if you wanna check out his profile, you can see those.

But here's the thing about these AI generated videos now: as good as they've gotten to this point, they can and will pass as real videos to people who are not looking for AI generated videos.

Now that is obviously insanely sketchy during an election year in the U.S. and also terrifying for a bunch of other internet related reasons, but it's also perfect for stock footage.

Like there are already all kinds of presentations and advertisements and then PowerPoints that are in need of oddly specific stock videos.

And these AI generated videos are already good enough to 100% pass for that purpose.

Like look at this one, this one with the waves at Big Sur, this drone shot.

Honestly, if I saw this on Twitter, I wouldn't even think twice.

I'd be like, "Oh, nice drone shot, dude." Wouldn't even think about AI if I wasn't pixel peeping at like the way the water was moving.

Like this is a totally usable video in an ad for some California based product.

And that has all sorts of implications for the drone pilot that no longer needs to be hired, for all the photographers and videographers whose footage no longer needs to be licensed to show up in that ad that's being made.

It's already that good.

There's other stuff like this wall of TVs, which would be a totally expensive and difficult thing to shoot with a camera and all these old expensive props, but if you can just generate it this well with reflections and the environment and everything else around it, I mean, why do it any other way?

It's also very capable of historical themed footage.

So this is supposed to be California during the Gold Rush.

It's AI generated but it could totally pass for the opening scene in an old Western with the right music over it.

How long until an entire ad, every single shot, is completely generated with AI?

Or what about an entire YouTube video or an entire movie?

I'm tempted to say like we're a long way away from that because you know, this still has flaws clearly and there's no sound, and there's a long way to go with the prompt engineering to iron these things out.

But then again, the spaghetti was like a year ago.

Now, I actually like that OpenAI, on their website, they show some of the downfalls too of this particular model.

And because who would know better than the people who have been using it?

This is a very private tool, by the way, right now.

It's in super limited access, so it's in the hands of red teamers, which basically means people testing it, pushing the limits, trying to break it, and a few trusted creators.

But they have found plenty of weird edge stuff.

Like this clip here of a bunch of gray wolf pups looks normal at first but then it's pretty clear that something's kinda off with the way they're just kinda appearing out of nowhere and walking through each other.

That's kinda weird.

Or this clip of a guy running on a treadmill, which I mean, I don't really have to say much more about why this one is weird.

But this is my favorite one, again, so again, just try to put yourself in the mind of someone who's not expecting AI.

You're just scrolling through Facebook or Twitter or something, right?

So you just see this video.

So first I just want you to watch this clip as if it's just a stock video you found of a grandma celebrating her birthday.

And just try to think like, I wonder what birthday she's celebrating, right?

I don't know, how old do you think she is?

60?

65?

Maybe it's the big 70. She seems to really like that cake.

Now, did you see it?

Did you catch that?

I'm gonna play it again, but this time, watch the video knowing that AI generated photos and videos have trouble accurately doing hands.

I'll play it again.

And now it feels super obvious. Like every time you watch it, watch a different set of hands, it gets weirder and weirder.

You can watch it like five times and there's dead giveaway after dead giveaway, not even mentioning the weird inconsistencies with the direction of the wind on the candles.

But even as I'm saying all that, even as it's coming outta my mouth, I can't help but remember that 12 months ago we were critiquing this.

So what does this all mean?

Well, I mean, there's what it means now and there's what it means for the future.

Now, Sora, this thing that they've made, is clearly a really impressive video generation AI tool that is both going to fool people and also be very useful.

There's also a watermark in the bottom corner of every video generated by it.

So if you see one of those videos and ideally it hasn't been cropped out, then that's at least a pretty clear indicator that it's AI generated.

It's a Sora video.

But also, I do think they're gonna have to be very careful with this, they're gonna have a whole bunch of safety stuff to keep in mind.

I think they'll probably have to be even more safe than DALL-E. Like you shouldn't be able to generate people's likenesses.

Like you shouldn't be able to make a politician look like they're doing something on video, especially this year.

You probably won't be able to make Will Smith eating spaghetti, but it also definitely means stock video generation is absolutely going to take a dent out of video licensing.

Like I can basically guarantee that.

Like logistically, why would anyone making something pay for footage of a house in the cliffs when they can generate one for free or for a small subscription price?

Like that is the real scary part of what this tool implies.

But in the future, it gets pretty existential, man.

I mean, okay, if this is trained on all videos that have ever been made by humans, then surely it can't be innovative or creative in ways that humans haven't already been, right?

I don't know.

Either way, I'll have all the links below for all the Sora stuff, for OpenAI stuff, and I guess I'll talk to you next year when we look back and go, "Remember that first version of Sora and how bad those wolf pups looked when they spawned out of nowhere?" Just remember, this is the worst that this technology is going to be from here on out.

Thanks for watching.

Catch you in the next one, peace.