
Distributed System Prerequisite List

October 27, 2013

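A note up front: Without my quite noticing, a full year has passed since I came to Beijing, which also means a full year since I truly moved from search to distributed systems. When I chose this direction I knew it would not be easy, but having made the decision, I had to work hard to do it well. Over the past year there were times I felt lost and times I felt discouraged, but I managed to persevere. Overall, I am confident about where I am now and about the work I am doing. I have always believed that if you work hard at something you genuinely enjoy, you will be able to do it well; that belief is what keeps me pushing forward. Lately I have been thinking about one thing: over the past year, in my spare time, I read a fair amount of distributed-systems material in a scattered, piecemeal way. I now plan to organize it properly, and I hope that over the next year or so, through systematic and in-depth study, my understanding of this field will become more complete and thorough. That is the motivation for writing this post today: partly to lay out a clear path for my own study, and partly to share it with friends who have the same needs so that we can encourage one another.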

The reading list below is grouped into several broad categories:
1. File Systems
a. GFS – The Google File System
b. HDFS
1) The Hadoop Distributed File System
2) The Hadoop Distributed File System: Architecture And Design
c. XFS – The Tencent File System

2. Database Systems
a. Bigtable – Bigtable: A Distributed Storage System for Structured Data
b. HBase – The Apache HBase Reference Guide
c. Dynamo – Dynamo: Amazon’s Highly Available Key-Value Store
d. Megastore – Megastore: Providing Scalable, Highly Available Storage for Interactive Services
e. Spanner – Spanner: Google’s Globally-Distributed Database
f. Azure – Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency
g. Percolator – Large-scale Incremental Processing Using Distributed Transactions and Notifications

3. Cluster/Resource Management Systems
a. Omega – Omega: Flexible, Scalable Schedulers for Large Compute Clusters
b. Autopilot – Autopilot: Automatic Data Center Management
c. YARN
1) Architecture of Next Generation Apache Hadoop MapReduce Framework
2) The Next Generation of Apache Hadoop MapReduce
3) Introducing Apache Hadoop YARN
d. Mesos – A Platform for Fine-Grained Resource Sharing in the Data Center

4. Computing Frameworks
a. MapReduce – MapReduce: Simplified Data Processing on Large Clusters
b. Storm – Storm: Distributed and Fault-Tolerant Realtime Computation
c. Spark – Spark: Cluster Computing with Working Sets
d. Impala – Cloudera Impala: Real-Time Queries in Apache Hadoop
e. Dremel – Dremel: Interactive Analysis of Web-Scale Datasets
f. Hive/Stinger
1) Hive: A Warehousing Solution Over a MapReduce Framework
2) Hive: A Petabyte Scale Data Warehouse Using Hadoop
3) The Stinger Initiative: Making Apache Hive 100 Times Faster
4) Stinger, Interactive Query for Apache Hive
g. FlumeJava/Crunch
1) FlumeJava: Easy, Efficient Data-Parallel Pipelines
2) Introducing Crunch: Easy MapReduce Pipelines for Apache Hadoop
h. Tez
1) Apache Hadoop Tez
2) Apache Tez: A New Chapter in Hadoop Data Processing
i. Presto – Presto: Interacting with petabytes of data at Facebook

5. Distributed Consistency
a. Paxos – Paxos Made Simple
b. ZooKeeper
1) ZooKeeper: A Distributed Coordination Service for Distributed Applications
2) ZooKeeper: Wait-free Coordination for Internet-scale Systems
c. Chubby – The Chubby Lock Service for Loosely-coupled Distributed Systems
d. Raft – In Search of an Understandable Consensus Algorithm

6. Miscellaneous
a. SequenceFile – Sequence File Format
b. SSTable
1) SSTable and Log Structured Storage: LevelDB
2) SSTable Storage Format
c. RCFile – RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems
d. ORCFile – ORC File Format
e. Parquet – Parquet: Columnar Storage for The People

Category: BigData
  1. duliangang@gmail.com
    May 24, 2014, 17:47 | #1

    I accidentally Googled my way to Xiao Wu's blog again...
    Leaving a footprint here first, then on to studying.
