I run the Products Strategic Insights function at Splunk, where I own the vision for using product data to accelerate Splunk's business. My team has purview over portfolio-wide insights and actions in Splunk's Products and Technology organization, as well as in Sales, Customer Success, Corporate Strategy, Marketing, Finance, and Legal. My work aligns perspectives across 10+ Splunk products and influences more than 40% of Splunk's overall annual recurring revenue.
Past activities
I was previously a member of the Performance Team at Cloudera, where my work involved both internal and competitive performance analysis and optimization. I specialized in performance across multiple components of our big data platform, including Hadoop MapReduce, Impala, HBase, Search, and Hive; someone had to make sure the entire Hadoop ecosystem ran fast together.
I'm a lead author of the Statistical Workload Injector for MapReduce (SWIM), an open source tool for synthesizing and replaying MapReduce production workloads. SWIM has become a standard MapReduce performance measurement tool and has been used to certify many Cloudera partners. You can learn more about SWIM in our MASCOTS 2011 and VLDB 2012 papers.
I contributed to the first-generation big data industry-standard benchmarks within the Transaction Processing Performance Council (TPC): TPC-DS 2.0, TPCx-BigBench, and TPCx-HS. I also serve as a program committee member and reviewer for various conferences, publications, and NSF funding panels.
I hold a PhD in computer science, with an MBA minor, from UC Berkeley. My dissertation is titled Workload-Driven Design and Evaluation of Large Scale Data Centric Systems. I worked with Professor Randy Katz at the AMPLab, and my dissertation committee also included Professors Vern Paxson and Ray Larson.
Publications
- Data Quality: Experiences and Lessons from Operationalizing Big Data. A. Ganapathi, Y. Chen. IEEE Big Data. 2016.
- Scaling SQL-on-Hadoop for BI. Y. Chen, D. Kumar. Strata Hadoop World London. 2015. See also extended version on Cloudera Engineering Blog.
- Five Challenges for Energy Efficient Computing Research. Y. Chen. Invited talk. National Science Foundation Workshop on Sustainable Data Centers. 2015.
- The Truth About MapReduce Performance on SSDs. K. Kambatla, Y. Chen. Strata Hadoop World San Jose. 2015. See also extended version at the Large Installation System Administration Conference (LISA) 2014, and abbreviated version on Cloudera Engineering Blog (This post was a top-ten Cloudera Engineering Blog of 2014).
- Underhyped - Big Data as an Advance in the Scientific Method. Y. Chen. Cloudera Vision Blog. 2014.
- Five Pitfalls of Benchmarking Big Data Systems. Y. Chen, G. Shapira. Big Data Spain. 2014. See also talk at LISA 2014, post on Cloudera Engineering Blog, and featured article in IEEE Transactions on Services Computing, Jan/Feb 2016.
- On Big Data benchmarks. Interview with Francois Raab and Yanpei Chen. F. Raab, Y. Chen. R. V. Zicari (Ed). ODBMS Industry Watch. 2014. Also appeared as Big Data Benchmarks: Toward Real-Life Use Cases on Cloudera Engineering Blog.
- Fine Tuning a Hadoop Cluster to Increase Performance. A. Acosta, B. Gowda, Y. Chen. Panel, Dell-Intel-Cloudera. Hadoop Summit. 2014. See also post-panel interview.
- Rigorous and Multi-Tenant HBase Performance Measurement. G. Kamat, Y. Chen. Hadoop Summit. 2014. Slides.
- Impala Performance Update: Now Reaching DBMS-Class Speed. J. Erickson, G. Rahn, M. Kornacker, Y. Chen. Cloudera Engineering Blog. 2014. (This post was a top-ten Cloudera Engineering Blog of 2014)
- From TPC-C to Big Data Benchmarks: A Functional Workload Model. Y. Chen, F. Raab, R. Katz. Lecture Notes in Computer Science, Volume 8163, 2014. Extended proceedings from Workshop on Big Data Benchmarks, 2012.
- Configuring Impala and MapReduce for Multi-tenant Performance. Y. Chen, P. Gokhale, A. Singla. Cloudera Engineering Blog. 2013.
- Interactive Query Processing in Big Data Systems: A Cross-Industry Study of MapReduce Workloads. Y. Chen, S. Alspaugh, R. Katz. International Conference on Very Large Data Bases (VLDB) 2012.
- We Don't Know Enough to make a Big Data Benchmark Suite - An Academia-Industry View. Y. Chen. Workshop on Big Data Benchmarking. 2012.
- Understanding TCP Incast and Its Implications for Big Data Workloads. Y. Chen, R. Griffith, D. Zats, A. D. Joseph, R. Katz. USENIX ;login: Magazine. Vol. 37. No. 3. pp. 24-38. June 2012.
- Energy Efficiency for Large-Scale MapReduce Workloads with Significant Interactive Analysis. Y. Chen, S. Alspaugh, D. Borthakur, R. Katz. European Conference on Computer Systems (EuroSys) 2012.
- Challenges and Opportunities for Managing Data Systems Using Statistical Models. Y. Chen, A. Ganapathi, R. Katz. IEEE Data Engineering Bulletin. Vol. 34. No. 4. pp. 53-60. December 2011.
- Hadoop and Performance. T. Lipcon, Y. Chen. Hadoop World. November 2011.
- Design Implications for Enterprise Storage Systems via Multi-Dimensional Trace Analysis. Y. Chen, K. Srinivasan, G. Goodson, R. Katz. 23rd ACM Symposium on Operating Systems Principles (SOSP) 2011.
- The Case for Evaluating MapReduce Performance Using Workload Suites. Y. Chen, A. Ganapathi, R. Griffith, R. Katz. 19th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS) 2011.
- Integrating Renewable Energy Using Data Analytics Systems: Challenges and Opportunities. A. Krioukov, C. Goebel, S. Alspaugh, Y. Chen, D. Culler, R. Katz. IEEE Data Engineering Bulletin. Vol. 34. No. 1. pp. 3-15. March 2011.
- What's New about Cloud Security. Y. Chen, V. Paxson, R. Katz. UC Berkeley EECS Tech Report. 2010.
Non-technical interests
I am interested in how technology affects society at large.
I believe we computer scientists should participate in relevant discussions and contribute our perspectives.
Half-presentable work from the past:
- Gender Balance in UC Berkeley EECS. Y. Chen & J. Nam. 2007.
A self-started research project. Results were released in Spring 2007 to the Chair of EECS and the department's Diversity Director. We met with the UC Berkeley Chancellor and the Associate Vice Provost for Faculty Equity to discuss the results, and the report was forwarded to the Vice Chancellor for Equity and Inclusion, as well as to other entities on campus.
Personal stuff
I was born in China and hold Australian and US citizenship. I have interests unique to each country: I do Chinese calligraphy, I love college football (Go Bears!!!), and I still miss playing cricket. I went to an all-boys public high school in Australia, and I hold BS, MS, and PhD (with an MBA minor) degrees in Computer Science from UC Berkeley. I am a happy father of two energetic toddler boys, and I currently spend most of my spare time playing with them. When they grow older, I hope to revive my neglected interests and introduce them to my sons, including various sports (badminton, table tennis, fencing, archery), classical guitar, chess, hiking, and traveling.
The rest to be filled later ...
Past awards
National Science Foundation Graduate Research Fellow
UC Berkeley Regents' and Chancellor's Scholar
UC Berkeley Lipson Humanistic Values Scholar
Premier's Awards in Math, Physics, Chemistry (Victoria, Australia)
Australian Student Prize 2001