Storm杂谈之Topology的启动过程-云计算-火龙果软件工程

捐助

Storm杂谈之Topology的启动过程

作者：joeywen 来源：CSDN 发布于 2015-6-9

次浏览

topology的提交

大家都知道，要提交Storm Topology 到Cluster，需要运行如下命令：

${STORM_HOME}/bin/storm jar xxxxxxxxxxx.jar ${main class} [args ...]

bin目录下storm是一个Python文件，我们可以看一下Python脚本的main方法

def main():  
    if len(sys.argv) <= 1:  
        print_usage()  
        sys.exit(-1)  
    global CONFIG_OPTS  
    config_list, args = parse_config_opts(sys.argv[1:])  
    parse_config(config_list)  
    COMMAND = args[0]  
    ARGS = args[1:]  
    (COMMANDS.get(COMMAND, unknown_command))(*ARGS)  
      
if __name__ == "__main__":  
    main()

首先解析args参数，解析完了之后，把所有的参数传递给COMMANDS，由COMMANDS调用正确的方法，COMMANDS是一个Dict，key是string，value是function

COMMANDS = {"jar": jar, "kill": kill, "shell": shell, "nimbus": nimbus, "ui": ui, "logviewer": logviewer,  
            "drpc": drpc, "supervisor": supervisor, "localconfvalue": print_localconfvalue,  
            "remoteconfvalue": print_remoteconfvalue, "repl": repl, "classpath": print_classpath,  
            "activate": activate, "deactivate": deactivate, "rebalance": rebalance, "help": print_usage,  
            "list": listtopos, "dev-zookeeper": dev_zookeeper, "version": version, "monitor": monitor}

我们是调用jar方法：

def jar(jarfile, klass, *args):  
    """Syntax: [storm jar topology-jar-path class ...] 
 
    Runs the main method of class with the specified arguments.  
    The storm jars and configs in ~/.storm are put on the classpath.  
    The process is configured so that StormSubmitter  
    (http://storm.incubator.apache.org/apidocs/backtype/storm/StormSubmitter.html) 
    will upload the jar at topology-jar-path when the topology is submitted. 
    """  
    exec_storm_class(  
        klass,  
        jvmtype="-client",  
        extrajars=[jarfile, USER_CONF_DIR, STORM_DIR + "/bin"],  
        args=args,  
        jvmopts=JAR_JVM_OPTS + ["-Dstorm.jar=" + jarfile])

exec_storm_class时加了一些默认的参数，jvmtype是client的，为什么用client模式启动，而不是server呐？二者区别请看之前的一篇blog：Real differences between “java -server” and “java -client” ，其他的就是把系统配置传进去：

def exec_storm_class(klass, jvmtype="-server", jvmopts=[], extrajars=[], args=[], fork=False):  
    global CONFFILE  
    storm_log_dir = confvalue("storm.log.dir",[CLUSTER_CONF_DIR])  
    if(storm_log_dir == None or storm_log_dir == "nil"):  
        storm_log_dir = STORM_DIR+"/logs"  
    all_args = [  
        JAVA_CMD, jvmtype, get_config_opts(),  
        "-Dstorm.home=" + STORM_DIR,  
        "-Dstorm.log.dir=" + storm_log_dir,   
        "-Djava.library.path=" + confvalue("java.library.path", extrajars),  
        "-Dstorm.conf.file=" + CONFFILE,  
        "-cp", get_classpath(extrajars),  
    ] + jvmopts + [klass] + list(args)  
    print("Running: " + " ".join(all_args))  
    if fork:  
        os.spawnvp(os.P_WAIT, JAVA_CMD, all_args)  
    else:  
        os.execvp(JAVA_CMD, all_args) # replaces the current process and  
        # never returns

组件初始化

进程启动之后，就开始调用你自己写的Topology代码了，我们一般用TopologyBuilder来构建Topology，TopologyBuilder有三个变量

private Map<String, IRichBolt> _bolts = new HashMap<String, IRichBolt>();  
    private Map<String, IRichSpout> _spouts = new HashMap<String, IRichSpout>();  
    private Map<String, ComponentCommon> _commons = new HashMap<String, ComponentCommon>();

_bolts和_spouts就不言而喻了，就是存放你定义的bolt和spout，然后setXXX（）进来的，key=componentId，value是自定义实现的组件
_commons存放该组件额外的一些信息，并行度，额外配置等等。每set一个组件时都会调用初始化common方法

private void initCommon(String id, IComponent component, Number parallelism) {  
        ComponentCommon common = new ComponentCommon();  
        common.set_inputs(new HashMap<GlobalStreamId, Grouping>());  
        if(parallelism!=null) common.set_parallelism_hint(parallelism.intValue());  
        Map conf = component.getComponentConfiguration();  
        if(conf!=null) common.set_json_conf(JSONValue.toJSONString(conf));  
        _commons.put(id, common);  
    }

该方法会调getComponentCommon方法

private ComponentCommon getComponentCommon(String id, IComponent component) {  
        ComponentCommon ret = new ComponentCommon(_commons.get(id));  
          
        OutputFieldsGetter getter = new OutputFieldsGetter();  
        component.declareOutputFields(getter);  
        ret.set_streams(getter.getFieldsDeclaration());  
        return ret;          
    }

大家会看到方法调用组件的declareOutputFields方法，所以在一般重载的方法（Sput会重载open，nextTuple等等，Bolt会重载prepare，execute等等）中declareOutputFields是被最先调用的，所以是不能再declareOutputFields中使用未被初始化的变量（我们一般会在open或prepare中初始化变量，一般也不强调在构造函数中初始化，因为Storm自身的序列化框架机制），这样会抛出NullPointer异常。

当所有的bolt和spout都set完毕之后，我们就会调用createTopology方法生成一个StormTopology，由StormSubmitter来submit topology

/** 
    * Submits a topology to run on the cluster. A topology runs forever or until 
    * explicitly killed. 
    * 
    * 
    * @param name the name of the storm. 
    * @param stormConf the topology-specific configuration. See {@link Config}. 
    * @param topology the processing to execute. 
    * @param opts to manipulate the starting of the topology 
    * @param progressListener to track the progress of the jar upload process 
    * @throws AlreadyAliveException if a topology with this name is already running 
    * @throws InvalidTopologyException if an invalid topology was submitted 
    */  
   public static void submitTopology(String name, Map stormConf, StormTopology topology, SubmitOptions opts,
 ProgressListener progressListener) throws AlreadyAliveException, InvalidTopologyException {  
       if(!Utils.isValidConf(stormConf)) {  
           throw new IllegalArgumentException("Storm conf is not valid. Must be json-serializable");  
       }  
       stormConf = new HashMap(stormConf);  
       stormConf.putAll(Utils.readCommandLineOpts());  
       Map conf = Utils.readStormConfig();  
       conf.putAll(stormConf);  
       try {  
           String serConf = JSONValue.toJSONString(stormConf);  
           if(localNimbus!=null) {  
               LOG.info("Submitting topology " + name + " in local mode");  
               localNimbus.submitTopology(name, null, serConf, topology);  
           } else {  
               NimbusClient client = NimbusClient.getConfiguredClient(conf);  
               if(topologyNameExists(conf, name)) {  
                   throw new RuntimeException("Topology with name `" + name + "` already exists on cluster");  
               }  
               submitJar(conf, progressListener);  
               try {  
                   LOG.info("Submitting topology " +  name + " in distributed mode with conf " + serConf);  
                   if(opts!=null) {  
                       client.getClient().submitTopologyWithOpts(name, submittedJar, serConf, topology, opts);  
                   } else {  
                       // this is for backwards compatibility  
                       client.getClient().submitTopology(name, submittedJar, serConf, topology);  
                   }  
               } catch(InvalidTopologyException e) {  
                   LOG.warn("Topology submission exception: "+e.get_msg());  
                   throw e;  
               } catch(AlreadyAliveException e) {  
                   LOG.warn("Topology already alive exception", e);  
                   throw e;  
               } finally {  
                   client.close();  
               }  
           }  
           LOG.info("Finished submitting topology: " +  name);  
       } catch(TException e) {  
           throw new RuntimeException(e);  
       }  
   }

提交Topology的操作是，初始化NimbusClient，上传Jar包，检查该Topology是否存在，一切完工后，接下来就交由Nimbus来做了。

Nimbus

Nimbus可以说是storm中最核心的部分，它的主要功能有两个：

对Topology的任务进行分配资源

接收用户的命令并做相应的处理，如Topology的提交，杀死，激活等等

Nimbus本身是基于Thrift框架实现的，使用了Thrift的THsHaServer服务，即半同步半异步服务模式，使用一个单独的线程来处理网络IO，使用一个独立的线程池来处理消息，大大提高了消息的并发处理能力。

服务接口的定义都在storm.thrift文件中定义，贴下部分代码：

service Nimbus {  
  void submitTopology(1: string name, 2: string uploadedJarLocation, 3: string jsonConf,
4: StormTopology topology) throws (1: AlreadyAliveException e, 2: InvalidTopologyException ite);  
  void submitTopologyWithOpts(1: string name, 2: string uploadedJarLocation, 3: string jsonConf, 
4: StormTopology topology, 5: SubmitOptions options) throws (1: AlreadyAliveException e, 2: InvalidTopologyException ite);  
  void killTopology(1: string name) throws (1: NotAliveException e);  
  void killTopologyWithOpts(1: string name, 2: KillOptions options) throws (1: NotAliveException e);  
  void activate(1: string name) throws (1: NotAliveException e);  
  void deactivate(1: string name) throws (1: NotAliveException e);  
  void rebalance(1: string name, 2: RebalanceOptions options) throws (1: NotAliveException e, 
2: InvalidTopologyException ite);  
  
  // need to add functions for asking about status of storms, what nodes they're running on, looking at task logs  
  
  string beginFileUpload();  
  void uploadChunk(1: string location, 2: binary chunk);  
  void finishFileUpload(1: string location);  
    
  string beginFileDownload(1: string file);  
  //can stop downloading chunks when receive 0-length byte array back  
  binary downloadChunk(1: string id);  
  
  // returns json  
  string getNimbusConf();  
  // stats functions  
  ClusterSummary getClusterInfo();  
  TopologyInfo getTopologyInfo(1: string id) throws (1: NotAliveException e);  
  //returns json  
  string getTopologyConf(1: string id) throws (1: NotAliveException e);  
  StormTopology getTopology(1: string id) throws (1: NotAliveException e);  
  StormTopology getUserTopology(1: string id) throws (1: NotAliveException e);  
}

当执行命令 nohup ${STORM_HOME}/bin/storm nimbus & 时，会启动nimbus服务，具体的代码执行：storm python脚本代码，默认启动backtype.storm.daemon.nimbus程序：

def nimbus(klass="backtype.storm.daemon.nimbus"):  
    """Syntax: [storm nimbus]  
  
    Launches the nimbus daemon. This command should be run under   
    supervision with a tool like daemontools or monit.   
  
    See Setting up a Storm cluster for more information.  
    (http://storm.incubator.apache.org/documentation/Setting-up-a-Storm-cluster)  
    """  
    cppaths = [CLUSTER_CONF_DIR]  
    jvmopts = parse_args(confvalue("nimbus.childopts", cppaths)) + [  
        "-Dlogfile.name=nimbus.log",  
        "-Dlogback.configurationFile=" + STORM_DIR + "/logback/cluster.xml",  
    ]  
    exec_storm_class(  
        klass,   
        jvmtype="-server",   
        extrajars=cppaths,   
        jvmopts=jvmopts)

然后执行nimbus.clj 脚本，主要涉及两个方法——launch-server!(nimbus的启动入口)和service-handler（真正定义处理逻辑的地方）。

nimbus启动后，对外提供了一些服务，topology的提交，UI信息，topology的kill，rebalance等等。在文章一中讲到提交topology给nimbus，这些服务的处理逻辑全部在service-handler方法中。以下截取service-handler里面处理提交Topology的逻辑

(reify Nimbus$Iface  
      (^void submitTopologyWithOpts  
        [this ^String storm-name ^String uploadedJarLocation ^String serializedConf ^StormTopology topology  
         ^SubmitOptions submitOptions]  
        (try  
          (assert (not-nil? submitOptions))  
          (validate-topology-name! storm-name)  
          (check-storm-active! nimbus storm-name false)  
          (let [topo-conf (from-json serializedConf)]  
            (try  
              (validate-configs-with-schemas topo-conf)  
              (catch IllegalArgumentException ex  
                (throw (InvalidTopologyException. (.getMessage ex)))))  
            (.validate ^backtype.storm.nimbus.ITopologyValidator (:validator nimbus)  
                       storm-name  
                       topo-conf  
                       topology))  
          (swap! (:submitted-count nimbus) inc)  
          (let [storm-id (str storm-name "-" @(:submitted-count nimbus) "-" (current-time-secs))  
                storm-conf (normalize-conf  
                            conf  
                            (-> serializedConf  
                                from-json  
                                (assoc STORM-ID storm-id)  
                              (assoc TOPOLOGY-NAME storm-name))  
                            topology)  
                total-storm-conf (merge conf storm-conf)  
                topology (normalize-topology total-storm-conf topology)  
                storm-cluster-state (:storm-cluster-state nimbus)]  
            (system-topology! total-storm-conf topology) ;; this validates the structure of the topology  
            (log-message "Received topology submission for " storm-name " with conf " storm-conf)  
            ;; lock protects against multiple topologies being submitted at once and  
            ;; cleanup thread killing topology in b/w assignment and starting the topology  
            (locking (:submit-lock nimbus)  
              (setup-storm-code conf storm-id uploadedJarLocation storm-conf topology)  
              (.setup-heartbeats! storm-cluster-state storm-id)  
              (let [thrift-status->kw-status {TopologyInitialStatus/INACTIVE :inactive  
                                              TopologyInitialStatus/ACTIVE :active}]  
                (start-storm nimbus storm-name storm-id (thrift-status->kw-status (.get_initial_status submitOptions))))  
              (mk-assignments nimbus)))  
          (catch Throwable e  
            (log-warn-error e "Topology submission exception. (topology name='" storm-name "')")  
            (throw e))))  
        
      (^void submitTopology  
        [this ^String storm-name ^String uploadedJarLocation ^String serializedConf ^StormTopology topology]  
        (.submitTopologyWithOpts this storm-name uploadedJarLocation serializedConf topology  
                                 (SubmitOptions. TopologyInitialStatus/ACTIVE)))

检查Topology的DAG图是否是有效连接图、以及该topology Name是否已经存在，然后分配资源和任务调度（mk-assignments ）方法，等分配好资源之后，把数据写入到zookeeper，watcher发现有数据，就通知supervisor读取数据启动新的worker，一个worker就是一个JVM进程，worker启动后就会按照用户事先定好的task数来启动task，一个task就是一个thread

在executor.clj中mk-threads: spout ,mk-threads: bolt方法就是启动task，而task就是对应的spout或bolt 组件，而且这时Spout的open，nextTuple方法，以及bolt的preapre，execute方法都是在这里被调用的，结合文章一中提到的，

对于Spout 方法调用顺序:declareOutputFields-> open -> nextTuple -> fail/ack or other

Bolt 方法调用顺序：declareOutputFields-> prepare -> execute

需要的注意的是在Spout中fail、ack方法和nextTuple是在同一线程中被顺序调用的，所以在nextTuple中不要做延迟很大的操作。

至此，一个topology算是可以正式启动工作了。

次浏览