个人签名

529篇博客

编辑推荐

网易数帆开源API网关与容器云项目，让云原生生产落地“多快好

网易汪源：统一负载与多云环境的“开放姿态”，才是云原生

网易数帆如何用 Kubernetes“原语”搞定云原生中间件

快手打新挤爆券商系统，网易数帆推出券商稳定性保障方案

探索智慧校园新模式，网易有数在教育行业的实践分享

金融行业大数据治理之路——数据模型篇

如何实现一颗虚拟DOM树

勿忘初心2018-10-26 11:40

此文已由作者杨介正授权网易云社区发布。

欢迎访问网易云社区，了解更多网易技术产品运营经验。

只是读书总结，并不是发明，但伴随一点点创造，不对之处欢迎大家交流指正。开门进山，实现虚拟树需要分两步：第一步是写一个虚拟DOM类，第二步是写一个html解析器去解析html字符串，然后以虚拟DOM为节点建立一颗虚拟树。

1、虚拟DOM类：

写虚拟DOM类时可以参照真实DOM赋予如appendChild, removeChild，childNodes, setAttribute等方法或属性。

2、html解析器:

直接举个例子吧

<div id="a">hello</div>

如何解析上面这段html字符串呢，也就是说如何去验证上面这段html的正确性并提取出有用的关键信息如<div、id="a"等然后转化成我们想要的虚拟DOM树。开动脑筋想，方法肯定是N多。这里介绍一种学来加上自己简化的方法：分为两步，用lexer（词法分析器）去提取关键词<div、id="a"、>、hello、</div>，并验证html的正确性。然后用parser(语法分析器)去分析lexer提取来的关键词并根据html标准解析成虚拟DOM树。为什么要分两步，不能边提取边解析吗，也可以。但是html的结构可能本身是错误的，如果把这个错误在lex阶段就中断掉，不是更好吗？那么问题来了，lexer怎么写。简单的说就是遍历html字符串，根据“当前状态”提取关键词。遍历html字符串可以用while循环加string.charAt方法，这里的当前状态重点解释一下。如果大家接触过状态机，就会很好理解，没有接触过的话（我也没接触过），也没关系，我通俗的解释一下。比如有beginStartTag(开始标签开始状态，如<div)、attribute（属性状态，如id="a"）、endStartTag(开始标签结束状态，如>)、startEndTag(结束标签开始状态，如</)、endEndTag(结束标签结束状态，如>)5个状态。这5个状态之间的关系如下图

注：char是当前遍历到的单个字符，next char是下一个字符，chars和next chars 分别是当前遍历到的字符段和接下来的字符段

根据上图状态机来分析下第一个例子中lexer的提取过程：

0:初始状态为beginStartTag，初始化tokens=[]。tokens是用来保存提取的词法信息的，比如提取<div id="a"></div>的结果为

[
  {
    type: 'tagStart',
    tagname: 'div',
    nodeType: '1'
  },

  {
    type: 'attribute',
    name: 'id',
    value: 'a'
    nodeType: '2'
  },

  {
    type: 'tagEnd'
  },

]

1.beginStartTag(开始标签开始)状态:如果chars等于<div，则push对应token到tokens里并将状态改为attribute。

2.attributes（属性）状态：重复遍历出所有attributes，这里只有一个id="a"属性。为方便属性词法的提取，可能需要额外再加一个或多个状态，比如startAttributeName, endAttributeName, startAttributeValue和endAttributeValue。在文章的最后，我会给出示例代码供大家参考，根据我的方法，只需再多引入一个状态就好了。当下一个char等于'>'时，则进入endStartTag状态。

3.endStartTag（开始标签结束）状态:可以作一些处理，也可以什么都不做，直接回到beginStartTag状态。

4.beginStartTag（开始标签开始）状态：如何下一个chars是</则进入beginEndTag状态。

5.beginEndTag(结束标签开始)状态：如果当前进行中的标签名等于div则进入endEndTag。

6.endEndTag（结束标签结束）状态:如果char等于'>'，则重新回到beginStartTag状态。

7.在状态的跳转中，如果出现不可跳转的状态，则说明html的语法是错误的，直接抛出异常并提示异常位置及字符串信息。

接下来，并是根据lexer提取出的词法信息，生成出一个虚拟DOM树，这个过程我们称为语法解析，有parser语法解析器完成。具体的解析过程如下:

1.遍历tokens。

2.当type等于tagStart时，根据tagname创建虚拟DOM节点，并将DOM推入栈中。之所以要入栈，是因为DOM之间是有父子关系的，如果栈中有DOM，则后创建的DOM将被视为栈尾DOM的子DOM。

3.当type等于attribute时，为当前DOM添加属性。

4.当type等于tagEnd时，栈尾DOM出栈。

5.当遍历完tokens时，整颗虚拟树也就生成好了。

3、示例代码:

接下来的示例代码大家是无法正常运行的，我只是想给大家提供一下参考。lexer会比上面分析的略复杂一点点，因为要考虑到textNode，<input />自关闭标签，attribute的状态里多加了个beginAttributeValue状态。刚开始看可能麻烦了点，多看几遍就懂了，也就是几个状态跳来跳去。

注：对NEJ不太熟的同学注意，下面的变量名的命名规则：public _$前缀， protected __前缀， private _前缀

define([
	'./eventEmitter',
	'../util/util',
	'./vDom'
],function(EventEmitter, u, VDom){


	var
		Lexer = EventEmitter._$extend({
			
			__init: function(options){

				this.__super(options);


				this.__state = 'beginStartTag';
				this.__tagName = [];
				
			},


			_error: function(type){
				var
					message = '',
					i;
				for(i = -10; i < 10 ; i++){
					message += this._peek(i);
				}
				throw ('html syntax error at position' + this.__index + ' and constructor:' +  message + ' status: ' + type);
				this.__state = 'error';
			},

			_states: {
				
				
				beginStartTag: function(ch){
					var
						name;

					if(ch != '<'){
						//textNode
						this.__tokens.push({
							type: 'startTag',
							nodeType: 3,
							tagName: 'text',
							text: this.__readIdent(ch, function(ch){
								return ch != '<';
							})
						});
						this.__tokens.push({
							type: 'endTag',
							tagName: 'text'
						});
						this.__state = 'beginStartTag';

					}else if(ch == '<' && this._peek(1) == '/'){

						this.__state = 'beginEndTag';
						this.__index ++;


					}else if(ch == '<' && this._isIdent(this._peek(1))){
						
						name = this.__readIdent();


						this.__tokens.push({
							type: 'startTag',
							tagName: name,
							nodeType: '1'
						});

						this.__state = 'attribute';
						this.__tagName.unshift(name);

					}else {
						this._error('beginStartTag');
					}

				},

				attribute: function(ch){
					var
						tagName = this.__tagName[0],
						name,
						value = '',
						beginAttributeValue,
						singleQuotes,
						doubleQuotes;	

					if(ch == '/' && this._is('>', this._peek(1)) && (tagName == 'img' || tagName == 'input') ){
						
						this.__state = 'endSelfEndTag';
						return;

					}
					// > transfer to beginStartTag status
					else if(ch == '>'){
			
						this.__state = 'beginStartTag';
						return;

					}else if(this._isIdent(ch)){

						name = this.__readIdent(ch);
					
					}				
					
					while(this.__index < this.__length){

						ch = this._peek(1);

						if(ch == '/' && this._is('>', this._peek(1)) && (tagName == 'img' || tagName == 'input') ){
						
							this.__state = 'endSelfEndTag';
							break;

						}
						// > transfer to endStartTag status
						else if(ch == '>'){
							this.__state = 'endStartTag';
							break;

						}
						else if(!beginAttributeValue){

							if( this._isWhite(ch) ){

								this.__index ++;
								continue;

							}else if(this._isIdent(ch)){

								this.__state = 'attribute';
								break;

							}else if(ch == '='){

								beginAttributeValue = true;
								this.__index ++;

							}else{
								this._error('!beginAttributeValue');
								break;
							}
								
						}else if(beginAttributeValue){

							
							if(!singleQuotes && !doubleQuotes){


								if( this._isWhite(ch) ){

									this.__index ++;
									continue;

								}else if(ch == '\''){

									singleQuotes = true;
									this.__index ++;

								}else if(ch == '\"'){

									doubleQuotes = true;
									this.__index ++;

								}else{
									this._error('beginAttributeValue');
									break;
								}
							}else{
								
								value = this.__readIdent('', function(ch){

									return ch != (singleQuotes ? '\'' : '\"');
								});

								this.__state = 'attribute';
								this.__index ++;
								break;

							}
							

						}else{
							this._error('attribute');
							break;
						}
					}
					this.__tokens.push({
						type: 'attribute',
						name: name,
						value: value
					});

				},

				endStartTag: function(ch){

					this.__state = 'beginStartTag';
					
				},

				beginEndTag: function(ch){
					
					name = this.__readIdent(ch);

					if(name == this.__tagName[0]){

						this.__state = 'endEndTag';

					}else{

						this._error('beginEndTag');
					}
						
				},

				endEndTag: function(ch){
					
					if(ch == '>'){
						this.__state = 'beginStartTag';	
										
						this.__tokens.push({
							type: 'endTag',
							tagName: this.__tagName[0]
						});

						this.__tagName.shift();
					}else{
						this._error('endEndTag');
					}
				},

				endSelfEndTag: function(ch){
					this.__state = 'beginStartTag';					
					this.__tokens.push({

						type: 'endTag',
						tagName: this.__tagName[0]
					});
					this.__tagName.shift();
				}

			},

			_peek: function(num){
				num = num || 0;
				if(this.__index + num < this.__text.length){
					
					return this.__text.charAt(this.__index + num);
				}
				return undefined;
			},

			_is: function(chars, ch){
				return chars.indexOf(ch || this.__ch) >= 0;
			},

			_isWhite: function(ch){
				return ch == ' ' || ch == '\n' || ch == '\r' || ch == '\t' || ch == '\v';
			},

			_isNotWhite: function(ch){
				return !this._isWhite(ch);
			},

			_isNumber: function(ch){
				return ch >= '0' && ch <= '9';
			},

			_isString: function(ch){
				return ch == '\'' || ch == '\"';
			},


			_isIdent: function(ch){
				return ch >= 'a' && ch <= 'z' || ch >= 'A' && ch <= 'Z' || ch == '_';
			},



			_$lex: function(text){
							

				this.__tokens = [];
				this.__text = text;
				this.__length = text.length;
				this.__index = 0,
				this.__ch = undefined;


				while(this.__index < this.__length && this.__state != 'error'){
					
					this.__ch = this._peek();
					
					if(this._isWhite(this.__ch)){

						this.__index ++;
						continue;

					}else{

						this._states[this.__state].call(this, this.__ch);
						this.__index ++;
					}
					
				}

				return this.__tokens;
			},

			

			__readIdent: function(ch, _is){
				var
					index = this.__index,
					text = ch || '';
				
				_is = _is || this._isIdent;

				while(this.__index < this.__length){
					
					ch = this._peek(1);
					
					if(_is.call(this, ch)){
						text += ch;
						this.__index ++;
					}else{
						break;
					}															
				}

				return text;
			}




		});



	var
		HtmlParser = EventEmitter._$extend({
			
			__init: function(){

				this.__super();

				
			},

			_$parse: function(text){
				var
					fn;
				this.__lexer = new Lexer();
				this.__tokens = this.__lexer._$lex(text);
				fn  = this._statements();
				return fn;
			},

			_statements: function(){
				
				return this._consume();
			},

			_consume: function(){
				var
					token,
					parentNode = [],
					node,
					type,
					tagName,
					nodeType,
					attrName,
					attrValue,
					text;


				while(this.__tokens.length){

					token = this.__tokens.shift();

					type = token.type;

					if(type == 'startTag'){

						node = new VDom(token);
						if(parentNode[0]){
							parentNode[0].appendChild(node);
						}
						parentNode.unshift(node);

					}else if(type == 'attribute'){

						attrName = token.name;
						attrValue = token.value;
						node.setAttribute(attrName, attrValue);

					}else if(type == 'endTag'){
						node = parentNode.shift();
					}

				}

				return node;
									
			}

		});



	return HtmlParser;
})

网易云免费体验馆，0成本体验20+款云产品！

更多网易技术、产品、运营经验分享请点击。

<上一篇NEJ-Dispatcher剖析(二) 调度原理

下一篇既生AngularJs何生ReactJs>

最新博客

最新资源下载

编辑推荐

如何实现一颗虚拟DOM树

最新博客

最新资源下载

编辑推荐

如何实现一颗虚拟DOM树

推荐博客