Reynold Xin

{{Infobox scientist
| name = Reynold Xin
| caption =
| birth_place =
| death_date =
| death_place =
| residence =
| alma_mater = UC Berkeley (doctoral study); University of Toronto (BASc)
| doctoral_advisor = Michael J. Franklin
| known_for = Apache Spark, Databricks
| footnotes =
| ethnicity =
| field = Computer Science
| author_abbreviation_bot =
| author_abbreviation_zoo =
| prizes =
| religion =
}}
Reynold Xin is a computer scientist and engineer specializing in big data, distributed systems, and cloud computing. He is a co-founder and Chief Architect of Databricks.[1] He is a frequent conference speaker on big data and open-source software. He is best known for his work on Apache Spark, which as of June 2016 was the top open-source big data project.[2] He designed and led development of the GraphX, Project Tungsten, and Structured Streaming components and co-designed DataFrames, all of which are part of the core Apache Spark distribution. He also served as the release manager for the Spark 2.0 release.[3] As of September 2016, he was also the most active contributor to Spark, with over 1,000 commits.[4]

Biography

UC Berkeley

Xin started his work on the Spark open source project while he was a PhD candidate at the UC Berkeley AMPLab.

His first research project, Shark,[5] was a system able to efficiently execute SQL and advanced analytics workloads at scale. Shark won the Best Demo Award at SIGMOD 2012.[6] It was one of the first open-source interactive SQL-on-Hadoop systems, with claims that it ran between 10 and 100 times faster than Apache Hive. Shark was used by technology companies such as Yahoo,[7] but it was replaced in 2014 by a newer system, Spark SQL.[8]

His second research project, GraphX,[9] created a graph processing system on top of Spark, a general data-parallel system. At the same time, GraphX challenged the notion that specialized systems are necessary for graph computation. GraphX was released as an open-source project and merged into Spark in 2014 as Spark's graph processing library.

Databricks

In 2013, along with Matei Zaharia and other key Spark contributors, Xin co-founded Databricks, a venture-backed company based in San Francisco that offers a Spark-based data platform as a service.

In 2014, Xin led a team of Databricks engineers that competed in the Sort Benchmark and set the 2014 world record in Daytona GraySort using Spark, surpassing the previous record, held by Apache Hadoop, by a factor of 30.[10] Xin claimed that Spark was the fastest open-source engine for sorting a petabyte of data.[11]

While at Databricks, he also started the DataFrames project,[12] Project Tungsten,[13] and Structured Streaming.[14] DataFrames has become Spark's foundational API, while Tungsten has become its new execution engine.

References

1. "Reynold Xin: Executive Profile & Biography - Businessweek". bloomberg.com. Bloomberg Businessweek. https://www.bloomberg.com/research/stocks/private/person.asp?personId=367761215&privcapId=247369843&previousCapId=247369843&previousTitle=Databricks%20Inc. Retrieved 21 September 2016.
2. Woodie, Alex (8 June 2016). "Apache Spark Adoption by the Numbers". datanami.com. Tabor Communications. https://www.datanami.com/2016/06/08/apache-spark-adoption-numbers/ Retrieved 21 September 2016.
3. "Apache Spark Developers List - [ANNOUNCE] Announcing Apache Spark 2.0.0". apache-spark-developers-list.1001551.n3.nabble.com. http://apache-spark-developers-list.1001551.n3.nabble.com/ANNOUNCE-Announcing-Apache-Spark-2-0-0-td18471.html Retrieved 4 August 2016.
4. "apache/spark". GitHub. https://github.com/apache/spark/graphs/contributors Retrieved 4 August 2016.
5. Xin, Reynold S.; Rosen, Josh; Zaharia, Matei; Franklin, Michael J.; Shenker, Scott; Stoica, Ion (2013). "Shark: SQL and Rich Analytics at Scale". Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. SIGMOD '13. New York, NY, USA: ACM. pp. 13–24. doi:10.1145/2463676.2465288. ISBN 9781450320375. http://doi.acm.org/10.1145/2463676.2465288
6. "Shark Wins Best Demo Award at SIGMOD 2012". AMPLab - UC Berkeley. https://amplab.cs.berkeley.edu/news/shark-wins-best-demo-award-at-sigmod-2012/ Retrieved 4 August 2016.
7. Tully. "Analytics on Spark & Shark @Yahoo". https://spark-summit.org/2013/wp-content/uploads/2013/10/Tully-SparkSummit4.pdf
8. "Shark, Spark SQL, Hive on Spark, and the future of SQL on Apache Spark". 1 July 2014. https://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html Retrieved 4 August 2016.
9. Gonzalez, Joseph E.; Xin, Reynold S.; Dave, Ankur; Crankshaw, Daniel; Franklin, Michael J.; Stoica, Ion (2014). "GraphX: Graph Processing in a Distributed Dataflow Framework". Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation. OSDI'14. Berkeley, CA, USA: USENIX Association. pp. 599–613. ISBN 9781931971164. http://dl.acm.org/citation.cfm?id=2685048.2685096
10. "Startup Crunches 100 Terabytes of Data in a Record 23 Minutes". https://www.wired.com/2014/10/startup-crunches-100-terabytes-data-record-23-minutes/ Retrieved 4 August 2016.
11. "Apache Spark the fastest open source engine for sorting a petabyte". 10 October 2014. https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html Retrieved 4 August 2016.
12. "Introducing DataFrames in Apache Spark for Large Scale Data Science". 17 February 2015. https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html Retrieved 4 August 2016.
13. Woodie, Alex (4 May 2015). "Deep Dive Into Databricks' Big Speedup Plans for Apache Spark". datanami.com. Tabor Communications. https://www.datanami.com/2015/05/04/deep-dive-into-databricks-big-speedup-plans-for-apache-spark/ Retrieved 21 September 2016.
14. Woodie, Alex (25 February 2016). "Spark 2.0 to Introduce New 'Structured Streaming' Engine". datanami.com. Tabor Communications. https://www.datanami.com/2016/02/25/spark-2-0-to-introduce-new-structured-streaming-engine/ Retrieved 21 September 2016.