Announcement

**นกแสก** · 1 Aug 2020, 21:07:05

(continued)

In this part I discuss a few issues around AVX. First of all, AVX is an instruction set for accelerating vector processing, developed by Intel. Today, mainstream desktop CPUs still only support up to AVX2 (256-bit), but in the near future Intel's new CPUs will support the latest version, AVX512 (512-bit). The key point, however, is that although these instructions are very powerful, they only suit certain kinds of work, and they are extremely hard to use.

What makes AVX so powerful?
- It can process many numbers at once (especially the popular FMA operation, which does a multiply and an add in one step) with a single instruction! For example, suppose we have a large amount of 32-bit floating-point numbers to process. On a CPU that supports AVX2 (256-bit), we can use AVX instructions to process 8 of them at a time (256/32 = 8), which is in theory up to 8 times faster than ordinary computation!!! In practice, though, you only see this benefit when there is a large volume of numeric work and the AVX instructions are used continuously.
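To make the 256/32 = 8 arithmetic concrete, here is a small Python model. It is an illustration only, not real SIMD code: it packs 8 FP32 values into 32 bytes (one 256-bit register's worth) and applies a multiply-add to every lane. The names `pack_register` and `fma_lanes` are hypothetical; real AVX2 code would use compiler intrinsics or a vectorizing compiler.

```python
import struct

REGISTER_BITS = 256   # AVX2 YMM register width
FLOAT32_BITS = 32     # single-precision float

# Number of FP32 values one 256-bit register holds: 256 / 32 = 8
LANES = REGISTER_BITS // FLOAT32_BITS


def pack_register(values):
    """Pack exactly LANES floats as little-endian FP32, like the contents of a YMM register."""
    if len(values) != LANES:
        raise ValueError(f"expected {LANES} floats, got {len(values)}")
    return struct.pack(f"<{LANES}f", *values)


def fma_lanes(a, b, c):
    """Model one vector FMA: every lane computes a[i] * b[i] + c[i] in the same step.

    Note: Python floats are 64-bit and this expression rounds twice (after * and +);
    a real FMA instruction rounds only once. This models the data flow, not the numerics.
    """
    return [x * y + z for x, y, z in zip(a, b, c)]


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [2.0] * LANES
c = [0.5] * LANES
print(LANES)                          # 8 lanes per 256-bit register
print(len(pack_register(a)) * 8)      # 256 bits
print(fma_lanes(a, b, c))             # one "instruction", eight results
```

A scalar loop would need eight separate multiply-adds for the same eight results, which is where the theoretical 8x comes from.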

What kind of work is AVX suited for?
- Numeric work whose data can be arranged as vectors, with enough volume to keep the AVX instructions running continuously at full efficiency. The longer such a job takes, the more clearly users feel the speedup. Examples include scientific or engineering simulation, AI, big-data analytics, video transcoding, rendering, cryptography, and supercomputing.

Is it suited for games?
- No. Games are not just number crunching; other work is constantly interleaved, such as AI scripts, handling player input, and checking game state. On top of that, games are processed frame by frame, so there is hardly any chance to use AVX instructions continuously. Even if a game could use AVX continuously, few game programmers would invest in learning these instructions, and few game companies would pay to hire programmers who can use them. Most importantly, AVX support doesn't make a game popular or sell better; the time, money, and manpower are better spent making the game fun or fixing bugs first.
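The "must be used continuously" argument is essentially Amdahl's law, a standard model the post does not name: if only a fraction p of the runtime can run on 8-wide AVX2 lanes, the whole program speeds up by 1 / ((1 - p) + p / 8). A minimal sketch; the two workload fractions below are made-up assumptions, not measurements.

```python
def amdahl_speedup(vector_fraction: float, lanes: int = 8) -> float:
    """Overall speedup when only `vector_fraction` of the runtime runs `lanes` times faster.

    Amdahl's law: S = 1 / ((1 - p) + p / n)
    """
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be between 0 and 1")
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / lanes)


# Hypothetical workload mixes (assumed fractions, not profiled data):
for name, p in [("long number-crunching job", 0.95),
                ("game frame (mostly logic and scripts)", 0.10)]:
    print(f"{name}: {amdahl_speedup(p):.2f}x")
```

Even with perfect 8-wide lanes, a frame that is only 10% vectorizable gains roughly 10% overall, while a job that is 95% vectorizable approaches 6x.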

Here are a few example AVX512 instructions, only a tiny fraction of them; the full list is available here.

So why is it hard to use?

- Programmers must learn the tools and steps needed to invoke AVX instructions.
- They must study bizarre instructions that read almost like an alien language... (I don't know how to describe them; just look at the picture.)
- They must first arrange the data into a suitable vector layout, which is the hardest part. The programmer needs a solid understanding of the algorithm, of vectors, and of the instruction set.
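The data-layout step can be illustrated with one common technique (named here as an example; the post does not prescribe one): converting an "array of structures" (AoS) into a "structure of arrays" (SoA) so each field sits contiguously and can be loaded 8 values at a time, then handling the leftover tail. The `Particle` record and the helper names are hypothetical.

```python
from typing import Dict, List, NamedTuple

LANES = 8  # FP32 lanes in a 256-bit AVX2 register


class Particle(NamedTuple):
    """Hypothetical record type; a list of these is an 'array of structures' (AoS)."""
    x: float
    y: float
    z: float


def aos_to_soa(particles: List[Particle]) -> Dict[str, List[float]]:
    """Regroup AoS records into one contiguous list per field (structure of arrays, SoA)."""
    return {
        "x": [p.x for p in particles],
        "y": [p.y for p in particles],
        "z": [p.z for p in particles],
    }


def lane_chunks(values: List[float], lanes: int = LANES, pad: float = 0.0) -> List[List[float]]:
    """Split one field into register-sized groups, padding the last group (the 'tail')."""
    chunks = []
    for start in range(0, len(values), lanes):
        chunk = values[start:start + lanes]      # slicing makes a new list
        chunk += [pad] * (lanes - len(chunk))    # pad only the final, short chunk
        chunks.append(chunk)
    return chunks


particles = [Particle(float(i), 2.0 * i, 3.0 * i) for i in range(10)]
soa = aos_to_soa(particles)
xs = lane_chunks(soa["x"])
print(len(xs))   # 8 real values, then 2 real values + 6 padding
print(xs[1])
```

Padding with 0.0 is only safe when the padding cannot change the result (a sum, for example). For products, minimums, or divisions the tail needs different handling, such as masked loads (`_mm256_maskload_ps` in AVX, or mask registers in AVX-512) or a short scalar loop for the last few elements.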

(to be continued)

**นกแสก** · 1 Aug 2020, 21:14:16

(continued)

If AVX is so hard, why do so many benchmarks these days support it? (In my opinion:)

- Some benchmarks are mathematical computations, such as calculating Pi or primes, which are easy to arrange as vectors.
- Similarly, some benchmarks are scientific and engineering computations, which contain a fair amount of vector-style computation.
- Video transcoding programs are also popular as benchmarks, and a key step in converting video is computing an FFT (or another transform, such as the DCT), which I know is vector work.
- For some graphics rendering programs, the developers have already added GPU support (CUDA, OpenCL), which is about as hard as AVX. These developers are more capable than ordinary programmers, so they can learn and use AVX too.
- Some programs are computation tools in their own right, such as Matlab, Octave, Scilab, etc., so trying to support AVX makes sense.
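Why a Pi-style series is "easy to vectorize" can be sketched with the Leibniz series (chosen only as an example; real Pi benchmarks use much faster algorithms): every term is independent, so 8 lanes can each accumulate every 8th term and be added together at the end. This is a pure-Python model of that lane split, assuming an 8-wide vector; it does not itself use AVX, and Python floats are 64-bit rather than FP32.

```python
import math

LANES = 8  # assumed vector width (e.g. 8 FP32 lanes in AVX2)


def term(k: int) -> float:
    """k-th Leibniz term: pi/4 = 1 - 1/3 + 1/5 - 1/7 + ..."""
    sign = 1.0 if k % 2 == 0 else -1.0
    return sign / (2 * k + 1)


def leibniz_serial(n_terms: int) -> float:
    """Add the terms one after another, like plain scalar code."""
    total = 0.0
    for k in range(n_terms):
        total += term(k)
    return 4.0 * total


def leibniz_lanes(n_terms: int, lanes: int = LANES) -> float:
    """Each 'lane' keeps its own partial sum of every lanes-th term.

    Because the terms are independent, they can be spread across vector lanes
    and combined (a 'horizontal' reduction) only once, at the end.
    """
    partial = [0.0] * lanes
    for k in range(n_terms):
        partial[k % lanes] += term(k)
    return 4.0 * sum(partial)


n = 100_000
print(abs(leibniz_serial(n) - math.pi) < 1e-4)
print(abs(leibniz_lanes(n) - leibniz_serial(n)) < 1e-9)
```

Because the lane version adds the terms in a different order, its result can differ from the serial one in the last bits. This is one reason compilers such as GCC generally won't auto-vectorize a floating-point sum unless reassociation is allowed (for example with `-ffast-math`).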

Why did Intel develop AVX? (Again, my opinion:)

- To serve the workstation and supercomputer markets, which need enormous compute power.
- GPUs have started being used seriously for computation, and Intel, the CPU market leader, had no good GPU to compete with, so it brought out AVX to fight back.
- AVX is a shortcut to multiplying a CPU's compute power without adding cores or raising clock speeds.
- A marketing bonus: soaring benchmark scores make Intel's new CPUs look good, whether compared with previous-generation CPUs or with competitors' CPUs.
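The "multiply compute without more cores or clock" point can be written as the usual theoretical-peak formula: cores × clock × FMA units per core × lanes per vector × 2 (one FMA counts as a multiply plus an add). A sketch using a hypothetical 8-core, 3.0 GHz chip with 2 FMA units per core; real chips rarely sustain this peak, partly because clocks may drop under heavy AVX load.

```python
def peak_gflops(cores: int, ghz: float, fma_units: int, vector_bits: int,
                element_bits: int = 32) -> float:
    """Theoretical peak GFLOPS if every FMA unit on every core issues one FMA per cycle.

    Each FMA counts as 2 floating-point operations (one multiply + one add).
    """
    lanes = vector_bits // element_bits
    flops_per_cycle_per_core = fma_units * lanes * 2
    return cores * ghz * flops_per_cycle_per_core


# Same hypothetical chip; only the vector width changes (SSE, AVX2, AVX-512):
for bits in (128, 256, 512):
    print(bits, peak_gflops(cores=8, ghz=3.0, fma_units=2, vector_bits=bits))
```

Doubling the vector width doubles the theoretical peak with the same cores and clock, which is exactly the kind of number that shows up in vector-heavy benchmarks.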

Back to the "benchmark trap". I call it a trap because soaring benchmark scores make media reviews come out positive (if the media aren't savvy enough) and help build hype that sways a lot of people, while the chance that ordinary users actually benefit is tiny, for these reasons:

- Many popular programs, such as Chrome, MS Office, Line, Skype, video and music players, and various utilities, hardly rely on heavy number crunching. As for games, I've already covered that.
- Some programs could benefit from AVX, but their development teams aren't interested in supporting it because it's hard.
- Some programs may support AVX indirectly (e.g. through compiler optimizations or libraries that support it), but mostly it isn't needed; ordinary computation is already fast enough.
- The people who really benefit are those doing serious work with specialized programs (whose developers have carefully judged that AVX support is worth it), or specialist fields such as research or numeric data processing, which are only a small group.
- Most programs that feel sluggish are actually slow for other reasons, not because they compute numbers slowly.
- Think about real-world use. With Excel, for example, we spend most of our time entering and editing data, while the actual calculation takes less than a second. So what would supporting AVX even be for??
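Indirect AVX support through a library usually works by runtime dispatch: the library checks which instruction sets the CPU reports and picks a matching code path, falling back to plain scalar code otherwise. The sketch below assumes Linux, where the kernel lists CPU features on the `flags` line of `/proc/cpuinfo` (e.g. `avx2`, `fma`, `avx512f`); `sum_avx2` is a hypothetical placeholder for compiled SIMD code.

```python
from typing import Callable, List, Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect feature flags from Linux /proc/cpuinfo text (the x86 'flags' lines)."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def sum_scalar(xs: List[float]) -> float:
    """Portable fallback path."""
    return sum(xs)


def sum_avx2(xs: List[float]) -> float:
    """Hypothetical placeholder: a real library would call compiled AVX2/FMA code here."""
    return sum(xs)


def pick_sum(flags: Set[str]) -> Callable[[List[float]], float]:
    """Choose the best implementation the CPU supports, falling back to scalar."""
    if {"avx2", "fma"} <= flags:
        return sum_avx2
    return sum_scalar


sample = "processor\t: 0\nflags\t\t: fpu sse sse2 avx avx2 fma\n"
impl = pick_sum(cpu_flags(sample))
print(impl.__name__)   # sum_avx2
```

On a real Linux machine you could pass `open("/proc/cpuinfo").read()` to `cpu_flags`. Either way the user sees the same results; only the speed of the chosen path differs, which is why such support is invisible unless the workload is numeric-heavy.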

That's it for now. Friends, feel free to join in and share your thoughts.

**Beaver_XT** · 1 Aug 2020, 22:40:10

You have to look at the long term. It's just like CUDA vs OpenCL: in the end CUDA took over because more people develop for it.

**micronz** · 1 Aug 2020, 23:12:06

Following along! Thanks a lot for the easy-to-read Thai translation, even if it's a bit of personal opinion here and there. นกแสก, you've been gone a long time.

**นกแสก** · 2 Aug 2020, 02:09:07

Originally posted by Beaver_XT

You have to look at the long term. It's just like CUDA vs OpenCL: in the end CUDA took over because more people develop for it.

CUDA having more developers isn't really a problem. There is software that converts code between CUDA and OpenCL in both directions, so people can write in whichever one they prefer, and in the end it can all be converted and used.

Originally posted by micronz

Following along! Thanks a lot for the easy-to-read Thai translation, even if it's a bit of personal opinion here and there. นกแสก, you've been gone a long time.

Thanks for remembering me. The content of this post isn't translated from anywhere; I wrote it from knowledge I've gradually absorbed from many sources.

**ssk** · 2 Aug 2020, 07:30:04

I've been clashing with the other camp over AVX512 for a long time. In my view, on the hardware side, apart from the credit-card-melting Xeons, no chip implements it at full width. The mobile parts that supposedly "can use it" just fuse 2 AVX ports together. That's why they can't beat Ryzen U (and even Xeon can't topple EPYC). Like นกแสก says, I also find it fishy, going by the news of Tiger Lake knocking out Ryzen U. It's strange for a 4-core CPU with fewer compute units to beat an 8-core CPU with more compute units; it must have been given a task the other side isn't good at, something like that.
If so, that more or less confirms the test ran on AVX512 instructions, which [Ryzen] currently can't transcode.

Fundamentally, AVX512 isn't that magical. As I've said before, it's just a "competition API" that lost to CUDA/OpenCL and, with nowhere else to go, got stuffed into the CPU. That's based on what I argued with people about back in the Xeon Phi days.

As for how it can be transcoded: it's just a method of calculation. If you know the method, you can key it into a calculator one step at a time or use a ready-made formula stored in memory; both work.
Anyone with an engineering calculator knows this; who didn't store formulas in memory for an exam?
It's an extension of AVX/AVX2, and that's easy to see with the Ryzen 1000 I'm using right now.
Ryzen 1000 has only 2x128-bit FMA units, yet it can execute AVX2: it splits each 256-bit AVX2 operation into two 128-bit halves that run in parallel. Remember that most floating-point work today is still FP32; FP64 isn't used much. And one FP32 really needs 2 blocks reserved to hold an overflowing result, so the result is only 64 bits, 1 block for computing AxB+C.

Roughly like this:

( 32[32] x 32[32] )

64+64 = 128

AVX/AVX2/AVX512 Basic Block Size

| FMA-128 | 64 | 64 |
|---|---|---|
| A | 32 [32] | 32 [32] |
| B | 32 [32] | 32 [32] |
| C | 32 [32] | 32 [32] |

That is, 32 x 4 = 128. At 256 or 512 bits it's simply wider, doing 2 or 4 of these operations at a time, that's all.
People write an equation left to right but work it out top to bottom, and the computer works top to bottom the same way.
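The "split a wider operation into 128-bit pieces" idea can be modeled in a few lines. This is only a data-flow illustration under stated assumptions (128-bit execution units, FP32 lanes), not how a real scheduler works: a 256-bit vector splits into 2 pieces and a 512-bit one into 4, and running the pieces gives the same result as one wide operation, at the cost of issuing more operations per instruction.

```python
from typing import List

NATIVE_BITS = 128   # assumed execution-unit width (the post cites 2 x 128-bit FMA on Zen 1)
FP32_BITS = 32


def split_into_native_ops(values: List[float], vector_bits: int) -> List[List[float]]:
    """Model how a wide vector op is cracked into NATIVE_BITS-wide pieces."""
    lanes_per_piece = NATIVE_BITS // FP32_BITS          # 4 FP32 values per 128-bit piece
    expected = vector_bits // FP32_BITS
    if len(values) != expected:
        raise ValueError(f"a {vector_bits}-bit vector holds {expected} FP32 values")
    return [values[i:i + lanes_per_piece] for i in range(0, len(values), lanes_per_piece)]


def fma_wide(a: List[float], b: List[float], c: List[float], vector_bits: int) -> List[float]:
    """Compute a*b+c piece by piece; the concatenated result equals one wide operation."""
    out: List[float] = []
    pieces = (split_into_native_ops(v, vector_bits) for v in (a, b, c))
    for pa, pb, pc in zip(*pieces):
        out.extend(x * y + z for x, y, z in zip(pa, pb, pc))
    return out


a = [float(i) for i in range(8)]
b = [2.0] * 8
c = [1.0] * 8
print(len(split_into_native_ops(a, 256)))            # 2 pieces for a 256-bit vector
print(len(split_into_native_ops([0.0] * 16, 512)))   # 4 pieces for a 512-bit vector
print(fma_wide(a, b, c, 256) == [x * y + z for x, y, z in zip(a, b, c)])
```

The results are identical; what changes is throughput, since each wide instruction occupies the narrow units for more than one slot.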

In fact, given the structure of the Zen 2 CPU, it could also execute AVX512 instructions this way, but the catch is that AVX512 isn't settled yet; Intel keeps cramming new instructions into it. Once it settles and it's clear how each instruction works, [AMD] can release transcode or native support in one go. Zen 3 probably won't have it yet, but it may add more FMA units instead to keep pace: 4 FMA 256 = 256+256+512 (1024 bits in total), as Xeon/Core X already use; the throughput is the same.
As for the other camp's claim that VIA bought AVX512: in reality, VIA only wanted AVX/AVX2.
As a Cyrix fan, I can see it: VIA's CPUs are Atom/Celeron J class, so why would they cram in 512?
But Intel forces it as a bundle, because that earns more, and whether VIA wants it or not, it has to take it because nobody else sells it.
It has to buy it anyway, tears and all.

**ssk** · 2 Aug 2020, 12:05:03

And strange benchmark news like this keeps coming out, backing up what นกแสก says. This story is the same: two next-gen Xeons ganging up on one EPYC with an AVX512 benchmark, which is also the kind professional shills love to use. INKAK doesn't just slip money under the table; it also hands out hardware to shills so they'll hype it. I've seen the same thing elsewhere too.
Same playbook: two of the newest CPUs ganging up, on AVX512, and still losing to EPYC.

https://wccftech.com/intel-10nm-ice-...nchmarks-leak/

Intel’s Next-Gen 10nm Ice Lake-SP CPU Tested, Dual 28 Cores & 56 Threads Chips Compared To A Single AMD EPYC 7742 64 Core CPU

By Hassan Mujtaba
Jul 28

The latest benchmarks of Intel's next-generation Ice Lake-SP Xeon CPU server family have leaked out and they show some interesting results when compared to AMD's current generation 2nd Gen EPYC Rome CPUs.
Intel's Next-Gen 10nm Ice Lake-SP CPUs Tested, Two Chips With 28 Cores and 56 Threads Each Against A Single AMD EPYC Rome 64 Core Flagship

As part of the Whitley platform, the Intel Ice Lake-SP CPU lineup will be composed of several Xeon chips. We have already seen 6 core and 24 core parts but the latest one is a 28 core part and has been spotted by TUM_APISAK in the Geekbench database and Momomo_US too in the SiSoftware database.


The Intel Ice Lake-SP CPU was tested on a dual-socket server and features two of the chips. Each chip features 28 cores and 56 threads which round up to 56 cores and 112 threads in total. Since the chip is still an early engineering sample, it features lower clock speeds of 1.5 GHz base and up to 3.20 GHz boost clocks. The CPU features 42 MB of L3 and 35 MB of L2 cache for a total of 77 MB cache. The 2S Ice Lake-SP server was equipped with 512 GB of memory which should be clocked at 3200 MHz and featured in an 8-channel configuration which is one of the key highlights of the new Whitley platform.

The performance of the 2S Intel Ice Lake-SP server was evaluated within Geekbench 4 which does benefit from the AVX-512 instruction set featured on Intel's current & upcoming Xeon CPU families. In single-core tests, the server scored up to 3443 points and in multi-core tests, the chip scored up to 37317 points.

Before we compare it to the AMD EPYC 7742, it should be pointed out that both of these test results are based on early engineering samples with lower clock speeds, so final performance is expected to be much better. However, at the same time, the Intel CPUs benefit in this benchmark from their AVX-512 instruction set, which the AMD CPUs lack. The entries are also shown in different operating system environments: although the EPYC 7742 CPU was in fact tested on a Windows 10 server setup, Geekbench isn't reporting it correctly. Regardless of that, let's see how the two 28 core Ice Lake-SP Xeon CPUs stack up against a single AMD EPYC 7742 CPU.


We used a single EPYC 7742 entry for comparison so we are comparing a total of 64 cores and 128 threads from AMD against 56 cores and 112 threads from Intel. The AMD EPYC 7742 CPU easily outperforms the Intel chips in single-core tests, which is due to the higher clock speeds of 3.4 GHz versus 1.5 GHz on the Intel parts. At the same time, the AMD platform delivers around 35000 multi-core points, which is slightly lower than the Intel Ice Lake-SP parts. With final clock speeds, the Ice Lake-SP CPUs can easily outperform the AMD EPYC Rome parts but the lead may not be as big as Intel had hoped for.

It obviously looks like Intel's Ice Lake-SP was more of an EPYC Rome competitor which missed its initial schedule due to poor 10nm yields and now has to compete against AMD's EPYC Milan that is just around the corner. We will wait to see more tests and performance results for Ice Lake-SP CPUs in non-AVX-512 optimized workloads, but every benchmark leak makes it very clear that Ice Lake-SP is late and AMD isn't going to make things any better for Intel.

And it also just isn't about the performance metrics, we still don't know the prices and power efficiency of Ice Lake-SP yet but we do know that the existing EPYC Rome lineup has far lower prices and TCO than the Cascade Lake-SP lineup and that is expected to remain intact even when Ice Lake-SP ships.
Intel Xeon SP Families:

| Process Node | 14nm+ | 14nm++ | 14nm++ | 10nm+ | 10nm++ | 7nm+? |
|---|---|---|---|---|---|---|
| Platform Name | Intel Purley | Intel Purley | Intel Cedar Island | Intel Whitley | Intel Eagle Stream | Intel Eagle Stream |
| MCP (Multi-Chip Package) SKUs | No | Yes | No | Yes | TBD | TBD |
| Socket | LGA 3647 | LGA 3647 / BGA 5903 | LGA 4189 | LGA 4189 | LGA 4677 | LGA 4677 |
| Max Core Count | Up To 28 | Up To 28 / Up To 48 | Up To 28 | Up To 56? | TBD | TBD |
| Max Thread Count | Up To 56 | Up To 56 / Up To 96 | Up To 56 | Up To 112? | TBD | TBD |
| Max L3 Cache | 38.5 MB L3 | 38.5 MB L3 / 66 MB L3 | 38.5 MB L3 | TBA (1.5 MB Per Core) | TBD | TBD |
| Memory Support | DDR4-2666 6-Channel | DDR4-2933 6-Channel / DDR4-2933 12-Channel | Up To 6-Channel DDR4-3200 | Up To 8-Channel DDR4-3200 | 8-Channel DDR5 | 8-Channel DDR5 |
| PCIe Gen Support | PCIe 3.0 (48 Lanes) | PCIe 3.0 (48 Lanes) | PCIe 3.0 (48 Lanes) | PCIe 4.0 (64 Lanes) | PCIe 5.0 | PCIe 5.0 |
| TDP Range | 140W-205W | 165W-205W | 150W-250W | ~250W-~300W | TBD | TBD |
| 3D Xpoint Optane DIMM | N/A | Apache Pass | Barlow Pass | Barlow Pass | Crow Pass | Donahue Pass |
| Competition | AMD EPYC Naples 14nm | AMD EPYC Rome 7nm | AMD EPYC Rome 7nm | AMD EPYC Milan 7nm+ | AMD EPYC Genoa ~5nm | AMD Next-Gen EPYC (Post Genoa) |
| Launch | 2017 | 2018 | 2020 | 2020 | 2021 | 2022-2023? |

Intel Xeon 10nm+ Ice Lake-SP Family

Intel Ice Lake-SP processors will be shipping later this year and will be based on the 10nm+ process node. We have seen earlier slides say that the Ice Lake family would feature up to 28 cores but the one from ASUS's presentation says that it would actually feature up to 38 cores & 76 threads per socket. There are also rumors indicating up to 56 cores and 112 threads, so we cannot say for sure what the actual core counts on the new chips will look like.

The main highlight of Ice Lake-SP processors will be support for PCIe Gen 4 and 8-channel DDR4 memory. The Ice Lake Xeon family would offer up to 64 PCIe Gen 4 lanes and would offer support for 8-channel DDR4 memory clocked at 3200 MHz (16 DIMM per socket with 2nd Gen Persistent memory support). Intel Ice Lake Xeon processors would be based on the brand new Sunny Cove core architecture which delivers an 18% IPC improvement versus the Skylake core architecture that has been around since 2015.

One thing to note is that Intel's 10nm for 2020 is an enhanced node of the original 10nm node that will mark its debut with the Tiger Lake CPUs. It's marked as 10nm+ and that is specifically what the Ice Lake-SP Xeon line will make use of. Some of the major upgrades that 10nm will deliver include:

2.7x density scaling vs 14nm
Self-aligned Quad-Patterning
Contact Over Active Gate
Cobalt Interconnect (M0, M1)
1st Gen Foveros 3D Stacking
2nd Gen EMIB

The Intel Ice Lake-SP lineup would be directly competing against AMD's enhanced 7nm based EPYC Milan lineup, which will feature the brand new 7nm Zen 3 core architecture, confirmed to be one of AMD's biggest architectural upgrades since the original Zen core. Expect to see more Intel & NVIDIA based servers in the coming months.

Some people have commented along the same lines: it looks good only on paper and in test results, but is pretty useless in real use. And honestly, in this era of YouTubers and reviewers as countless as grass in a field, that trick works far less well. People about to buy also look at these reviewers' test results, which are sometimes hard to fake. Even though people today are less interested in PCs, they choose more wisely. No wonder Intel's desktop PC + notebook share shrank in Q2,
while AMD grew and has pushed its market share into double digits within just the last 2-3 years.

ACTCech • 3 days ago
Intel fake marketing at its best. Intel's AVX-512 has a lot of performance issues. Sure, it excels in some niche cases/benchmarks but it is useless in relevant ones.

As Linus Torvalds stated:

"I hope AVX-512 dies a painful death, and that Intel starts fixing real problems instead of trying to create magic instructions to then create benchmarks that they can look good on,"

Web performance firm Cloudflare has written about the performance impact of AVX-512. It advised customers who don't need AVX-512 for high-performance tasks to disable AVX-512 execution on the server and desktop to avoid its "accidental" throttling.

"I want my power limits to be reached with regular integer code, not with some AVX-512 power virus that takes away top frequency (because people ended up using it for memcpy!) and takes away cores (because those useless garbage units take up space)," continued Torvalds.

"Yes, yes, I'm biased. I absolutely detest FP benchmarks, and I realize other people care deeply. I just think AVX-512 is exactly the wrong thing to do."

**นกแสก** · 3 Aug 2020, 03:09:12

Originally posted by ssk

.....
In fact, given the structure of the Zen 2 CPU, it could also execute AVX512 instructions this way, but the catch is that
AVX512 isn't settled yet; Intel keeps cramming new instructions into it....
...

I think so too; that's why I'm confident Zen 3 won't support AVX512.

Another reason is that AVX doesn't fit well with HSA, which AMD has been pushing. The HSA approach is that different kinds of processors should cooperate and split work easily, not keep rearranging data into vector layouts back and forth. The smoothest approach is OpenCL, where you write once and run on every vendor's CPUs and GPUs.

**นกแสก** · 3 Aug 2020, 03:10:56

That's strange.

Where did all the images I posted go? Were they deleted??

**นกแสก** · 3 Aug 2020, 03:16:15

I tried attaching an image again, and it says...

"You are not authorized to create or remove attachments."

Have I been banned from posting images?!?

**micronz** · 3 Aug 2020, 06:20:51

Attaching images on the new board is tricky; I'm still confused by it myself. Try uploading the image to an image-hosting site first, then use the image icon to paste the link in; that should work.
I doubt anyone banned you.

**Hirasawa** · 3 Aug 2020, 10:30:06

People who don't know fall victim to the marketing.
natural selection

*Not knowing ≠ being stupid (saying this upfront before there's more drama)

I'm more worried about the all-hype types; people who don't know tend to fall victim to them.

As for AVX512: if Intel sells a lot and brings it to mainstream CPUs, developers will probably use it more, since it would be worth the investment (whether it actually will be remains to be seen) because so many people would have it.

It's similar to Apple's iOS, a closed system. Why do so many apps come to iOS even though it's closed and heavily restricted? Because lots of people use it.

**นกแสก** · 3 Aug 2020, 13:20:56

Finally got it posted. Turns out you have to upload to the cloud first...

**ssk** · 10 Aug 2020, 14:32:01

Dav1d AV1 Decoder Begins Adding AVX-512 Optimizations For Intel Ice Lake
Written by Michael Larabel in Multimedia on 22 January 2020 at 04:45 PM EST.
Phoronix tested the codec optimizations on Ice Lake U and found they brought no benefit at all.
https://www.phoronix.com/scan.php?pa...X-512-Ice-Lake

Ahead of the forthcoming dav1d 0.6 release, this open-source AV1 video decoder has begun implementing AVX-512 optimizations targeting Intel Ice Lake processors.
That is, ahead of the dav1d 0.6 release, AVX512 optimizations targeting Ice Lake CPUs have started landing.

The work has begun on AVX-512 optimizations focused on Ice Lake for this already quite speedy AV1 video decoder.

These optimizations are for its AV1 decoding.

Ice Lake introduced six additional instructions to the AVX-512 capabilities and other AVX improvements in general we've seen to the performance. This is on top of Ice Lake's better IPC and other improvements for this 10nm+ processor albeit at lower clock speeds currently compared to 14nm parts.

This is on top of Intel's claim that Ice Lake has higher IPC than before (around 18% over Skylake, according to Intel).

I did fire up some dav1d Git benchmarks to see if the AVX-512 work so far has made much of an impact. At least with the Core i7 1065G7 as my lone Ice Lake system at the moment, it didn't make any measurable increase.
The tester benchmarked it and found it wasn't measurably faster at all. In short, adding it changed nothing and brought no benefit. Why? I'll explain below.

Of course, I'll keep monitoring the AVX-512 progress in dav1d and other open-source applications.

As I keep saying, Core U/Y chips get AVX512 as fused AVX, just one unit. I'm answering from information I've come across: the 512-bit FMA on port 5 is cut to reduce TDP/power usage.
So, as they say, AVX512 like this makes no difference whether it's there or not compared with plain Zen 2, and the 11th-gen parts will probably come the same way, so in the end nothing changes. Oh, and fused AVX only works on Zmm16, that's all.
So it's fair to say this kind of AVX512 is just a toy that gives INKALA shills around the world, including in this room, something to talk about, and that's it; it has no use beyond that.

Hellish subs crafted BY SSK: I've added more news. As we discussed, unless the test runs on something locked to AVX512,
by CPU structure it isn't any better than Ryzen 2 at all. No wonder YouTube reviews show Ryzen faster in some areas.

Zen3, Rocket Lake and the Benchmark Trap
