Announcement

**ssk** · 14 Jul 2020, 22:23:26

Exactly. Apple is an excellent catalyst.
In fact, this group has been preparing for a long time:
Office Online and Adobe CC cloud for mobile are already there. All that's left is the port to native ARM on macOS.

I was planning to talk about AMD ROCm and CDNA as the counter to AVX512,
before someone claims AMD is behind in this area. Where should I post it? Would ROZ be a good place?

**micronz** · 14 Jul 2020, 22:30:47

So is there an A13 or not? I saw the news you posted saying the code name goes to A14. If I were an executive I'd be wary of the number 13 too, honestly.
.........
Another update: among the OSes that have always had an ARM version, MX Linux currently holds the number 1 spot. It's easy to use
and very fast, and it's based on Debian, just like Ubuntu.
If anyone wants a change of experience, give it a try.
There's already an x64 version.

When I get home I'll wipe Ubuntu and install MX.

**micronz** · 14 Jul 2020, 23:51:56

https://www.beartai.com/news/itnews/454315
Still just news for now. Let's wait and see how it goes
for Microsoft and company.

**ssk** · 15 Jul 2020, 07:26:31

Google uses AMD Epyc chips in new Cloud instances with extra memory protection

There you go. Google chose AMD EPYC for its servers, and EPYC uses an embedded ARM core to encrypt the data inside each VM.
An Epyc 2 chip can hold keys for up to 511 VMs, and according to earlier news it works together with the NVIDIA A100. There is more to the world than AVX512.

https://tekdeeps.com/pro-google-uses...ry-protection/

AMD and Google have announced that AMD’s Epyc Rome chips will be deployed in new Google Cloud server instances. The instances will be part of Google’s Confidential Computing portfolio.

This new segment uses virtual machines that come with a feature called ‘secure encrypted virtualization’, which encrypts the memory that can be allocated to one virtual machine using a key unique to each VM. The key is generated by an embedded Arm CPU on the Epyc chip, called the Platform Security Processor. To be precise, AES encryption with a 128-bit key is used. An Epyc 2 chip holds up to 511 keys.

The data is also encrypted with this functionality when the data is used by the CPU, and not only when data is sent or temporarily stored in the ram. This means that if malware manages to read the memory attached to another VM, it encounters encrypted data.

The functionality is made available in a beta project on Google’s N2D-instances. Currently, Ubuntu 18.04, Ubuntu 20.04, Container Optimized OS 81 and RHEL 8.2 are supported on these servers. CentOS and Debian will be added in the future.

Sources:
AMD, Anandtech, Heise

Better to post it here. Let's pile the datacenter topics in here as well.
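To picture the per-VM key scheme the quoted article describes, here is a toy Python model. It is only a sketch: the real keys live inside the Platform Security Processor and never reach software, and `KeySlotTable`, `launch_vm` and `stop_vm` are names invented for illustration. The only figures taken from the article are the 128-bit key size and the limit of 511 keys per chip.

```python
import secrets

MAX_KEY_SLOTS = 511   # from the quoted article: an Epyc 2 chip holds up to 511 keys
KEY_BYTES = 16        # 128-bit AES key


class KeySlotTable:
    """Toy bookkeeping of one key per VM. Hypothetical, not real PSP firmware."""

    def __init__(self, max_slots=MAX_KEY_SLOTS):
        self.max_slots = max_slots
        self.keys = {}  # vm_id -> key bytes

    def launch_vm(self, vm_id):
        # Each VM gets its own random key; no two VMs share one.
        if vm_id in self.keys:
            raise ValueError(f"{vm_id} already has a key")
        if len(self.keys) >= self.max_slots:
            raise RuntimeError("no free key slot")
        self.keys[vm_id] = secrets.token_bytes(KEY_BYTES)
        return self.keys[vm_id]

    def stop_vm(self, vm_id):
        # Releasing the VM frees its slot for a new one.
        self.keys.pop(vm_id, None)


table = KeySlotTable()
for i in range(MAX_KEY_SLOTS):
    table.launch_vm(f"vm-{i}")
print(len(table.keys))
```

Once all slots are taken, launching one more VM raises `RuntimeError` until some VM is stopped, which is the practical meaning of the 511 limit.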

**micronz** · 15 Jul 2020, 19:44:23

On the server and supercomputer side, many systems have moved to EPYC,
because people are wary of the security holes that keep leaking out of
the Skylake architecture.

**ssk** · 15 Jul 2020, 20:24:34

Originally posted by micronz

On the server and supercomputer side, many systems have moved to EPYC,
because people are wary of the security holes that keep leaking out of
the Skylake architecture.

There's also the strong-arm bundling. To get the CPUs you have to take Optane DCPMM along with them,
or use Intel SSDs, or take Phi as part of the deal. That's what I came across on foreign sites.
Even Samsung has complained about the SSD bundling.

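What Linus was actually complaining about is this: when he looked at the new CPUs the kernel has to support,
it turned out not all of them support AVX512. Some have it, some don't. From a developer's point of view
that earned it the middle finger. Going by the latest news it is still on-again, off-again: to use it you have to enable the big cores.
So why use an AVX512 library at all? Writing for AVX/AVX2 is already fine, and the code can run on whichever core is free.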

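On top of that, cross-platform support matters a great deal for operating systems and software these days. If you insist on writing against 512,
you have to redo the work when you recompile for another platform, and that is a mess. ML/AI/DL work
only uses FP32 anyway. It is vector math of the form A×B+C: two matrices multiplied together, plus a counterweight (bias) term, giving a result.
Whichever number comes out points to something, and then the matching block of data is loaded and sent to the user. However many variables the job has, you repeat this until every integer condition comes out true,
and that never goes beyond INT32. With data this small, why cram it into long 512-bit instructions? The existing implementation
is already known to be fine, and it works on everything from tiny ARM chips to EPYC systems with 128 cores. That is why he complained: it isn't needed, so why add it?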
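For readers who want to see the A×B+C pattern spelled out, here is a minimal pure-Python sketch. It is only an illustration: `matmul_add` is a made-up helper, Python floats are 64-bit rather than FP32, and real frameworks hand this loop to vector units or a GPU instead of running it element by element.

```python
def matmul_add(a, b, c):
    """Return A x B + C for matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [row[:] for row in c]  # start every cell at its bias value C[i][j]
    for i in range(rows):
        for j in range(cols):
            acc = out[i][j]
            for k in range(inner):
                # One fused multiply-add step: acc = a * b + acc
                acc = a[i][k] * b[k][j] + acc
            out[i][j] = acc
    return out


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [[0.5, 0.5], [0.5, 0.5]]
D = matmul_add(A, B, C)
print(D)  # [[19.5, 22.5], [43.5, 50.5]]
```

The inner line is the whole trick: every vector width, 128, 256 or 512 bits, just does more of these multiply-adds per instruction.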

GPUs are already good at matrix and vector work. The CPU is the manager; let it handle the OS and distribute the workload.
Don't try to take over someone else's job, just do your own well. That, I think, is what Linus wanted to get across.

And Linux is the most widely used OS in the world once you count Android and the datacenters where all those Xeons have to work.
It's all Linux there, not Windows. Someone who maintains the kernel, like Linus, has to support CPUs on every platform,
from ARM A5x to Xeon or EPYC. He must really have found it bad to come out and give it the middle finger, to make clear he wants none of it.

**micronz** · 15 Jul 2020, 20:32:04

Whenever the godfather Linus tells someone off, the target flinches. These days the green team's drivers on Linux are
much better.

**micronz** · 15 Jul 2020, 20:36:16

Whenever the godfather Linus tells someone off, the target flinches. These days the green team's drivers on Linux are
much better. The x86 architecture is being
eaten by ARM today precisely because of
Intel's stinginess in refusing to license it so that others
could build competing chips. The Windows side may be slower to go,
but its day will come too, within 30 years at most.

Ever since Apple announced its change to ARM,
I've noticed that some experts in the PC scene
haven't dared post the news on their websites or Facebook pages. They must really love Intel.
Even O.c.z. didn't hold back, haha.

A CPU architecture that lets more than
two or three players compete is one I will always
support more, and such a platform naturally has a better
future because it is more widespread.

**ssk** · 15 Jul 2020, 21:40:14

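I'm taking a break from the cross-room arguments for a while, for the sake of peace.
In the meantime I've been trying out Python in my spare time,
so I'll probably come asking questions now and then. With this kind of programming the results are really hard to get a feel for; when you program a PLC you at least see the machine move. That's just how I am.
I'll try it for a while and quit if it goes nowhere.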

**micronz** · 15 Jul 2020, 23:58:22

https://venturebeat.com/2020/03/03/a...ver-processor/
There are several ARM server contenders in 2020.

**ssk** · 16 Jul 2020, 10:05:50

Galaxy Note 20 may be the first Samsung smartphone with wireless desktop DeX mode

https://tekdeeps.com/galaxy-note-20-...ktop-dex-mode/

July 15, 2020
The Samsung DeX function implies several ways to connect your smartphone to an external monitor or PC. What unites them is that in all cases a cable is used. However, DeX may soon become a completely wireless technology. Moreover, in this version, it may appear already in the Galaxy Note 20.

The evidence of the development of the DeX wireless mode was found in the form of a recording in the Samsung Tips application, which gives users tips on how to use certain functions of Galaxy smartphones. Instead of the header, the entry contained the following line: DREAM_DEX_HEADER_USE_DEX_WIRELESSLY_M_TIPS. The strange format of the text indicates that it got into the application by mistake and clearly prematurely.

Further, the message, already in normal English, explained how to connect the Samsung DeX to a TV in the living room without a cable. The advice was accompanied by a visual animation of what actions the user should perform to connect. The presence of ready-made instructions may indirectly indicate that the function is almost ready for implementation in one of the smartphones of the South Korean company. It may well be the Galaxy Note 20, the announcement of which is scheduled for August 5.

This is not the first time we have learned about Samsung’s intention to rid DeX mode of wires. The company first reported this two years ago. Since then, DeX has received many new features, in particular, it has become possible to connect a smartphone via a USB cable to a computer or laptop on Windows or Mac platforms. So far, only the wireless version of DeX has yet to appear.

Recall that DeX is a mode that adds functionality similar to a desktop device to a mobile device, allowing you to connect computer peripherals to it. After connecting the monitor, the user is provided with an interface oriented to work with the mouse and keyboard and supporting operation in windowed mode.

It keeps coming, as you can see: cloud services in your hand, on a tablet, or on a cloud book. Maybe we'll have to start letting go of some of the old ways.

**ssk** · 16 Jul 2020, 18:46:05

Adding to what Linus said, AMD room version

To make the picture clearer, let me bring back something from early this year, when I came back and was sparring with people in here.

The CPU FMA table looks roughly like this:


That is, 2 FMA256 + 1 FMA512 = 256 + 256 + 512 = 1024 bits per cycle.

- If the instruction is AVX2, it has to go to ports 0/1 and cannot enter port 5; to enter port 5 it must be rearranged first. And for AVX512 to use ports 0/1, they have to be fused first.
Every extra step costs one instruction cycle, so of course latency appears.
- If we split the FMA512 into two FMA256 units, we can execute two 256-bit AVX instructions there just the same,
that is, 4 FMA256 = 4 × 256 = 1024 bits per cycle.

- Now for the registers. These are just tags; they are not as magical as the AVX512 fanatics claim.
Reciting the reference word for word doesn't change that.
What matters is the FMA length. The Z registers going from 16 to 32 steps
is nothing magical either: if the FMA work covers the same 32 steps, the result is the same, as follows.

A = 512 × 32 = 16384 bits per cycle
B = (256 × 32) + (256 × 32) = 8192 + 8192 = 16384 bits per cycle
C = 4 × (256 × 16) = 4 × 4096 = 16384 bits per cycle
A = B = C anyway; it only depends on how the FMA instructions are arranged.
So whether you use AVX2 or 512 makes no difference, but AVX2 is certainly usable by more people.

- And with four pure FMA256 units replacing the 2×256 + 512 arrangement, things come out nicely symmetric once SMT is added.
Both virtual cores can take any job, with no need to sort out which one can accept it and which cannot, so there is no need for special instructions that tag
certain threads to run only on a particular core. That reduces his (Linus's) workload and makes life easier for other developers as well.
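For anyone who wants to check the arithmetic in the list above, here is a small Python sketch. It takes the post's figures at face value (the 256/512-bit unit widths and the 16 and 32 register counts). The names `bits_per_cycle`, `mixed` and `uniform` are made up for illustration, and the sketch says nothing about real port scheduling or latency.

```python
FP32_BITS = 32


def bits_per_cycle(unit_widths):
    """Sum the widths of all FMA units that can issue in one cycle."""
    return sum(unit_widths)


mixed = [256, 256, 512]          # 2 x FMA256 + 1 x FMA512, as in the post
uniform = [256, 256, 256, 256]   # 4 x FMA256, the all-AVX2 layout

mixed_bits = bits_per_cycle(mixed)
uniform_bits = bits_per_cycle(uniform)
fp32_lanes = uniform_bits // FP32_BITS  # FP32 values covered by 1024 bits

# The three register cases A, B and C from the post.
case_a = 512 * 32
case_b = (256 * 32) + (256 * 32)
case_c = 4 * (256 * 16)

print(mixed_bits, uniform_bits, fp32_lanes)  # 1024 1024 32
print(case_a, case_b, case_c)                # 16384 16384 16384
```

Both layouts add up to the same 1024 bits, or 32 FP32 values, per cycle, which is the whole point of the comparison.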

That's why Linus came out and said AVX2 is enough, and this is exactly what AMD is doing, in line with what he said.
It's no surprise that Amazon, Google, Facebook and Oracle chose EPYC to pair with NVIDIA GPUs for their AI servers.

**ssk** · 16 Jul 2020, 19:37:52

I've been fussing over AVX512 for a long time, so let me move over to this side for a change, to cover everything.
Since people keep chanting AI, ML, DL over and over so annoyingly, we should have our say too.

Now for AI on the AMD side. I'm not a developer or a programmer. When I said I was dabbling in Python, I only meant getting to know it
so that some of it sticks. If anyone spots a mistake, please correct me and add to it.

Welcome to AMD ROCm Platform
https://rocmdocs.amd.com/en/latest/

AMD ROCm is the first open-source software development platform for HPC/Hyperscale-class GPU computing. AMD ROCm brings the UNIX philosophy of choice, minimalism and modular software development to GPU computing.

Since the ROCm ecosystem is comprised of open technologies: frameworks (Tensorflow / PyTorch), libraries (MIOpen / Blas / RCCL), programming model (HIP), inter-connect (OCD) and up streamed Linux® Kernel support – the platform is continually optimized for performance and extensibility. Tools, guidance and insights are shared freely across the ROCm GitHub community and forums.

It's nothing new. It's an open framework for using AMD CPUs and GPUs together.
From what I've read and looked at over the past two weeks, it runs on the Linux that HPC systems and servers already use.
It also speaks OpenCL, alongside HIP as the description above notes, so it doesn't need any exotic instruction set like AVX512.
The API supports the major frameworks (TensorFlow and PyTorch are listed above) and, as I understand it, can be used with APUs as well, which is an advantage,
because GCN, RDNA and Vega iGPUs have at least performed better than
any Intel HD Graphics part ever since the FM1/FM2 days.
With just an APU you can already train AI and use it the way you would an NVIDIA Jetson or a Raspberry Pi,
without having to add anything else. An APU, a motherboard and RAM, and you're done, with the flexibility of ATX and ITX platforms.
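Since PyTorch is one of the frameworks listed in the ROCm description, a quick way to see which backend an installed build targets might look like the sketch below. It assumes PyTorch is installed; `describe_gpu_backend` is a made-up helper name. ROCm builds of PyTorch reuse the `torch.cuda` namespace and set `torch.version.hip`. Whether a particular APU is actually supported is something to check against AMD's own documentation.

```python
def describe_gpu_backend():
    """Report which GPU backend the installed PyTorch build targets, if any."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"

    # torch.version.hip is a version string on ROCm builds and None otherwise.
    hip_version = getattr(torch.version, "hip", None)
    if not torch.cuda.is_available():
        return "PyTorch found, but no GPU is visible"
    device_name = torch.cuda.get_device_name(0)
    if hip_version:
        return f"ROCm/HIP {hip_version} on {device_name}"
    return f"CUDA {torch.version.cuda} on {device_name}"


print(describe_gpu_backend())
```

On a machine without PyTorch this simply prints that it is not installed, so it is safe to run anywhere.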

But from what I've seen, the community is small.
AMD has been a bit slow to move here. Sometimes I want to complain too: they have good things in hand, yet who knows what they're dawdling over.

**ssk** · 16 Jul 2020, 19:41:16

Or you could just set it up like this.

**ssk** · 16 Jul 2020, 20:12:42

AMD reveals third-generation Epyc arriving in late 2020, and a new Radeon Instinct on the CDNA architecture

https://www.blognone.com/node/115039

This isn't new either. I think I've seen some friends here post it before, but I'm retelling it for completeness.
Besides the framework, AMD is also releasing CDNA, its next compute GPU.
According to the news, AMD has been chosen as the core hardware of supercomputers
for the US government. This is real deployment and real news, not something made up from sniffing a press kit and spinning a story out of a clip.

The architectures of the new Epyc and the new Radeon Instinct will be used in two new US supercomputers that
AMD is building together with Cray: Frontier (2021) and El Capitan (2022). El Capitan
is to be the highest-performance supercomputer in the world (as announced so far), at 2 exaFLOPS.

From what I've read, CDNA is designed to support the TensorFlow API in the same way as Nvidia Volta/Ampere.
As I said, AMD's GPUs are faster than any Intel part with the same number of unified shaders. Even Xe/Xe DG can't keep up, and that's from real news.
In the HPC AI server segment, Intel has already EOL'd Phi because customers no longer want it, so the choice now comes down to
ARM or AMD.

AMD Announces CDNA, RDNA2 Architectures, Significant Leap in Performance-per-Watt

https://www.extremetech.com/gaming/3...-architectures

CDNA: Data Center Driven
Although AMD didn’t explicitly make this connection during the event, the emergence of CDNA as a separate architecture may explain why we’ve seen no formal ROCm support for Navi 10 under Linux. (ROCm is AMD’s open source GPGPU computing platform that translates CUDA into code AMD GPUs can run).

AMD’s compute-centric roadmap starts with GCN in 2019 (Radeon Instinct MI50 and MI60), progresses through CDNA, and arrives at “CDNA2” in 2022. CDNA2, therefore, would be the architecture that supports the El Capitan supercomputer. CDNA is a compute-centric version of RDNA, but AMD didn’t give specifics about the changes between the two families beyond that statement. Sometimes it’s possible to draw conclusions about what one company will do by examining the behavior of a competitor, but Nvidia has actually used several different strategies for its high-end Tesla GPUs compared with its consumer cards. There have been times when Team Green deployed significantly different architectures for Tesla compared with GeForce and times when the company tapped the same core design for parts in both families.

This slide is cropped differently to maximize visibility of the small text elements. Gray = Unsupported, Dark blue = early, and bright blue = full production.

One major feature of CDNA? Fully connected, cache-coherent architecture between CPU and GPU, which CDNA will introduce. AMD did not specifically say if CDNA corresponds exactly to RDNA, while CDNA2 corresponds to RDNA2. In theory, CDNA might have a similar relationship to RDNA that Nvidia’s old GK104 had to GK110. Both chips implemented the Kepler architecture, but GK110 supported several features that GK104 didn’t, in addition to packing more GPU cores and supporting faster double-precision floating-point performance.

According to this article, one key feature of ROCm when working with RDNA is that it can also emulate/translate the CUDA API for CDNA. Going by the roadmap and development plan,
it will keep following NVIDIA's path, so we'll probably see the RX series doing what RTX does in the future. That's just my personal opinion.

AMD CDNA Architecture Based Arcturus GPU ‘Radeon Instinct’ Test Board Spotted – 120 CUs With 7680 Cores, 1200 MHz HBM2 Clock, 878 MHz GPU Clock

https://wccftech.com/amd-cdna-archit...cu-7680-cores/
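The numbers in that headline can be sanity-checked with a few lines of Python. The cores-per-CU figure follows directly from the headline. The peak FP32 estimate is only a rough sketch that assumes each core retires one fused multiply-add (2 FLOPs) per clock, which the article does not state, and the 878 MHz test-board clock is unlikely to be a final spec.

```python
compute_units = 120        # from the headline
stream_processors = 7680   # from the headline
gpu_clock_mhz = 878        # test-board clock from the headline

cores_per_cu = stream_processors // compute_units

# Assumption: 1 FMA per core per clock = 2 FLOPs. MHz * cores * 2 gives MFLOPS.
peak_tflops = stream_processors * 2 * gpu_clock_mhz / 1_000_000

print(cores_per_cu)            # 64
print(round(peak_tflops, 2))   # 13.49
```

64 cores per CU matches the layout AMD has used since GCN, so the leaked figures are at least internally consistent.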

That's all of it. The world can do AI, ML and DL work without AVX512, because there is more to the world than Intel.
Not having AVX512 doesn't make you backward, and nobody suffers for lack of AVX512 in a CPU. The AMD side
has Radeon, which is actually better suited to AI work. Everyone uses the GPU for that work and leaves the CPU to manage the system, as is its job.


ARM ROOM: a place to chat about ARM and RISC technology
