Opened Feb 12, 2025 by Angus Lachance@anguslachance

DeepSeek: The Chinese AI Model That's a Tech Breakthrough and a Security Risk


DeepSeek: at this stage, the only takeaway is that open-source models surpass proprietary ones. Everything else is questionable, and I don't buy the public numbers.

DeepSink was built on top of open-source Meta technology (PyTorch, Llama), and ClosedAI is now at risk because its valuation is outrageous.

To my knowledge, no public documentation links DeepSeek directly to a specific "Test Time Scaling" technique, but that's highly likely, so allow me to simplify.

Test Time Scaling is used in machine learning to scale the model's performance at test time rather than during training.

That means fewer GPU hours and less powerful chips.

In other words, lower computational requirements and lower hardware costs.
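To make the idea concrete, here is a minimal sketch of one well-known form of test-time scaling: sample several answers at inference time and keep the most common one (often called majority voting or self-consistency). This is not DeepSeek's method, which the text above cannot confirm. `noisy_solver` is a hypothetical stand-in for a model, and the 60% accuracy figure is an assumption chosen for illustration.

```python
import random
from collections import Counter

CORRECT_ANSWER = 42  # assumed ground truth for this toy question


def noisy_solver(rng, accuracy=0.6):
    """Hypothetical stand-in for one sampled model answer.

    Returns the right answer with probability `accuracy`,
    otherwise one of a few wrong answers.
    """
    if rng.random() < accuracy:
        return CORRECT_ANSWER
    return rng.choice([40, 41, 43, 44])


def majority_vote(samples):
    """Return the most frequent answer among the samples."""
    return Counter(samples).most_common(1)[0][0]


def accuracy_with_n_samples(n_samples, trials=2000, seed=0):
    """Estimate how often majority voting over n samples is correct."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        samples = [noisy_solver(rng) for _ in range(n_samples)]
        if majority_vote(samples) == CORRECT_ANSWER:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (1, 5, 15):
        print(n, "samples ->", accuracy_with_n_samples(n))
```

The model itself never changes here. The extra quality comes from spending more compute per question at inference time, which is the trade this technique makes against training compute.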

That's why Nvidia lost almost $600 billion in market cap, the biggest one-day loss in U.S. history!

Many people and institutions who shorted American AI stocks became incredibly rich in a few hours because investors now project we will need less powerful AI chips ...

Nvidia short-sellers just made a single-day profit of $6.56 billion according to research from S3 Partners. That is nothing compared to the market cap, but I'm looking at the single-day amount. More than 6 billion in less than 12 hours is a lot in my book. And that's just for Nvidia. Short sellers of chipmaker Broadcom made more than $2 billion in profits in a few hours (the US stock market operates from 9:30 AM to 4:00 PM EST).

The Nvidia Short Interest Over Time data shows we had the second highest level in January 2025 at $39B, but this is outdated because the last record date was Jan 15, 2025, so we have to wait for the latest data!

A tweet I saw 13 hours after publishing my post! Perfect summary.

Distilled language models

Small language models are trained on a smaller scale. What makes them different isn't just the capabilities, it is how they have been built. A distilled language model is a smaller, more efficient model created by transferring the knowledge from a larger, more complex model like the future ChatGPT 5.

Imagine we have a teacher model (GPT5), which is a large language model: a deep neural network trained on a lot of data. Such a model is highly resource-intensive, which is a problem when computational power is limited or when you need speed.

The knowledge from this teacher model is then "distilled" into a student model. The student model is simpler and has fewer parameters/layers, which makes it lighter: less memory usage and lower computational demands.

During distillation, the student model is trained not only on the raw data but also on the outputs or "soft targets" (probabilities for each class rather than hard labels) produced by the teacher model.

With distillation, the student model learns from both the original data and the detailed predictions (the "soft targets") made by the teacher model.

In other words, the student model doesn't just learn from "soft targets" but also from the same training data used for the teacher, with the guidance of the teacher's outputs. That's how knowledge transfer is optimized: dual learning from data and from the teacher's predictions!

Ultimately, the student mimics the teacher's decision-making process ... all while using much less computational power!
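The "dual learning" described above can be sketched as a single loss with two parts: one term pulls the student toward the teacher's soft targets, the other toward the hard label from the original data. This is a minimal illustration of the classic distillation loss, not anyone's production training code. The `temperature` and `alpha` values are assumptions, and the logits in the example are made up.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature gives softer targets."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    soft term: KL(teacher || student), both softened with `temperature`
    hard term: cross-entropy between the student and the true label
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher, p_student_soft) if t > 0)

    p_student = softmax(student_logits)
    cross_entropy = -math.log(p_student[hard_label])

    # Scaling the soft term by temperature**2 is a common convention that
    # keeps its gradient magnitude comparable to the hard term.
    return alpha * (temperature ** 2) * kl + (1 - alpha) * cross_entropy


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.0]          # made-up teacher logits
    good_student = [2.0, 1.0, 0.0]     # agrees with the teacher
    bad_student = [0.0, 1.0, 2.0]      # disagrees with the teacher
    print(distillation_loss(good_student, teacher, hard_label=0))
    print(distillation_loss(bad_student, teacher, hard_label=0))
```

A student that matches the teacher and the label gets a lower loss than one that disagrees, which is what drives the student to mimic the teacher during training.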

But here's the twist as I understand it: DeepSeek didn't just extract content from a single large language model like ChatGPT 4. It relied on many large language models, including open-source ones like Meta's Llama.

So now we are distilling not one LLM but multiple LLMs. That was one of the "genius" ideas: mixing different architectures and datasets to create a seriously adaptable and robust small language model!
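One simple way to picture distilling from several teachers is to blend their soft targets into a single distribution for the student to imitate. The sketch below does this with a weighted average. It only illustrates the general idea: whether DeepSeek combined teachers this way is not something the text above establishes, and the function name, weights, and logits are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature gives softer targets."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_targets(teacher_logits_list, weights=None, temperature=2.0):
    """Blend the soft targets of several teachers into one distribution.

    `weights` says how much to trust each teacher; it defaults to equal trust.
    """
    n_teachers = len(teacher_logits_list)
    if weights is None:
        weights = [1.0 / n_teachers] * n_teachers
    distributions = [softmax(logits, temperature)
                     for logits in teacher_logits_list]
    n_classes = len(distributions[0])
    return [sum(w * dist[k] for w, dist in zip(weights, distributions))
            for k in range(n_classes)]


if __name__ == "__main__":
    teacher_a = [3.0, 0.0, 0.0]  # made-up logits: prefers class 0
    teacher_b = [0.0, 3.0, 0.0]  # made-up logits: prefers class 1
    print(ensemble_soft_targets([teacher_a, teacher_b]))
```

The blended distribution can then be used as the teacher side of an ordinary distillation loss.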

DeepSeek: Less supervision

Another essential innovation: less human supervision/guidance.

The question is: how far can models go with less human-labeled data?

R1-Zero learned "reasoning" capabilities through trial and error. It evolves, and it has unique "reasoning behaviors" that can lead to noise, endless repetition, and language mixing.

R1-Zero was experimental: there was no initial guidance from labeled data.
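To show what "learning through trial and error" without labeled examples can look like, here is a toy sketch: a policy picks among candidate answers, a simple rule checks the answer, and a REINFORCE-style (gradient bandit) update shifts probability toward what got rewarded. This is a teaching toy, not R1-Zero's training algorithm. The question, the candidate answers, the reward rule, and the learning rate are all assumptions.

```python
import math
import random

CANDIDATE_ANSWERS = [3, 4, 5, 22]  # hypothetical answers to "2 + 2"


def softmax(preferences):
    top = max(preferences)
    exps = [math.exp(p - top) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(probs, rng):
    """Draw one index according to the given probabilities."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1


def rule_based_reward(answer):
    """No human label per sample: a rule checks whether the answer is right."""
    return 1.0 if answer == 2 + 2 else 0.0


def train(steps=2000, learning_rate=0.1, seed=0):
    """Return the final answer probabilities after trial-and-error learning."""
    rng = random.Random(seed)
    preferences = [0.0] * len(CANDIDATE_ANSWERS)
    baseline = 0.0  # running average reward, used to reduce variance
    for step in range(1, steps + 1):
        probs = softmax(preferences)
        action = sample_index(probs, rng)
        reward = rule_based_reward(CANDIDATE_ANSWERS[action])
        baseline += (reward - baseline) / step
        for i in range(len(preferences)):
            indicator = 1.0 if i == action else 0.0
            preferences[i] += learning_rate * (reward - baseline) * (indicator - probs[i])
    return softmax(preferences)


if __name__ == "__main__":
    print(dict(zip(CANDIDATE_ANSWERS, train())))
```

The policy starts with no idea which answer is right and ends up strongly preferring the one the rule rewards, with no labeled example ever shown to it.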

DeepSeek-R1 is different: it used a structured training pipeline that includes both supervised fine-tuning and reinforcement learning (RL). It started with initial fine-tuning, followed by RL to refine and enhance its reasoning capabilities.

The end result? Less noise and no language mixing, unlike R1-Zero.

R1 uses human-like reasoning patterns first and then advances through RL. The innovation here is less human-labeled data + RL to both guide and refine the model's performance.

My question is: did DeepSeek really solve the problem, knowing they extracted a lot of data from the datasets of LLMs, which all learned from human supervision? In other words, is the traditional dependency really broken when they relied on previously trained models?

Let me show you a live real-world screenshot shared by Alexandre Blanc today. It shows training data extracted from other models (here, ChatGPT) that have learned from human supervision ... I am not convinced yet that the traditional dependency is broken. It is "easy" to not need massive amounts of high-quality reasoning data for training when taking shortcuts ...

To be balanced and show the research, I've uploaded the DeepSeek R1 Paper (downloadable PDF, 22 pages).

My concerns regarding DeepSink?

Both the web and mobile apps collect your IP, keystroke patterns, and device details, and everything is stored on servers in China.

Keystroke pattern analysis is a behavioral biometric method used to identify and authenticate individuals based on their unique typing patterns.
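For readers wondering how typing can identify a person, here is a minimal sketch of the usual starting point: measure how long each key is held (dwell time) and how long the gaps between keys are (flight time), then compare those numbers with a stored profile. It is a toy illustration, not how any particular app does it. The event format, the two features, the threshold, and the sample timings are all assumptions.

```python
from statistics import mean


def timing_features(events):
    """Summarize typing rhythm.

    `events` is a list of (key, press_ms, release_ms) tuples in typing order.
    dwell  = how long each key is held down
    flight = gap between releasing one key and pressing the next
    """
    dwell = [release - press for _, press, release in events]
    flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return {"mean_dwell": mean(dwell), "mean_flight": mean(flight)}


def distance(profile, sample):
    """Sum of absolute differences between the two feature sets, in ms."""
    return (abs(profile["mean_dwell"] - sample["mean_dwell"])
            + abs(profile["mean_flight"] - sample["mean_flight"]))


def same_typist(profile, sample, threshold_ms=40.0):
    """Accept the sample if its rhythm is close enough to the stored profile."""
    return distance(profile, sample) <= threshold_ms


if __name__ == "__main__":
    # Made-up timings for someone typing "pass" on three occasions.
    enrolled = [("p", 0, 90), ("a", 150, 235), ("s", 300, 395), ("s", 460, 550)]
    same_person = [("p", 0, 95), ("a", 160, 250), ("s", 310, 400), ("s", 465, 560)]
    someone_else = [("p", 0, 40), ("a", 200, 245), ("s", 420, 460), ("s", 640, 685)]

    profile = timing_features(enrolled)
    print(same_typist(profile, timing_features(same_person)))
    print(same_typist(profile, timing_features(someone_else)))
```

Real systems use many more features and statistical models, but the point stands: the rhythm of your typing, not just what you type, can be turned into an identifier.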

I can hear the "But 0p3n s0urc3 ...!" comments.

Yes, open source is great, but this reasoning is limited because it does not consider human psychology.

Regular users will never run models locally.

Most will simply want quick answers.

Technically unsophisticated users will use the web and mobile versions.

Millions have already downloaded the mobile app on their phone.

DeepSeek's models have a real edge and that's why we see ultra-fast user adoption. For now, they are superior to Google's Gemini or OpenAI's ChatGPT in many ways. R1 scores high on objective benchmarks, no doubt about that.

I suggest searching for anything sensitive that does not align with the Party's propaganda on the web or mobile app, and the output will speak for itself ...

China vs America

Screenshots by T. Cassel. Freedom of speech is beautiful. I could share terrible examples of propaganda and censorship but I won't. Just do your own research. I'll end with DeepSeek's privacy policy, which you can read on their website. This is a simple screenshot, nothing more.

Rest assured, your code, ideas and conversations will never be archived! As for the real investments behind DeepSeek, we have no idea if they are in the hundreds of millions or in the billions. We just know the $5.6M amount the media has been pushing left and right is misinformation!

Reference: anguslachance/radiobatallontopater#1