30. PyQuery: HTML-based CSS selectors

Vec_Kun 2023-01-25 21:26:16 Views: 411


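Contents

Preface

Importing the package

Basic usage

Selecting by tag

Chained selection

Shorthand chain: the descendant selector

Class selector

ID selector

Attribute/text selection (key point)

Getting attributes from multiple tags

Quick summary

PyQuery's power feature: modifying the source code

Adding tags

Modifying/adding attributes

Removing attributes/tags, etc.

Summary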


Preface

In the previous chapter we covered CSS and the concept of CSS selectors. In this section we introduce a Python module: PyQuery.

Some readers may already know jQuery from JavaScript; PyQuery offers the same style of API in Python.


Importing the package

from pyquery import PyQuery


Basic usage

The commonly used operations are explained below through an example:

html = """
<ul>
<li class="aaa"><a href="http://www.google.com">Google</a></li>
<li class="aaa"><a href="http://www.baidu.com">Baidu</a></li>
<li class="bbb" id="qq"><a href="http://www.qq.com">Tencent</a></li>
<li class="bbb"><a href="http://www.csdn.net">CSDN</a></li>
</ul>
"""
# Load the HTML content
p = PyQuery(html)
print(p)
print(type(p))

Here we take a simple piece of HTML and load it with PyQuery. Printing the object shows the HTML content itself, and printing its type shows that, without any further processing, it is already a PyQuery object.

What is the PyQuery object good for?

It holds the loaded HTML, so we can then run CSS selector queries against it.

CSS selectors were covered in the previous section; readers who need a refresher can go back to it.

Selecting by tag

# Call the PyQuery object directly with a CSS selector
a = p("a")
print(a)
print(type(a))  # still a PyQuery object

This selects all the a tags, and the result is still a PyQuery object, which leads to the next example.

Chained selection

# Chained selection
a = p("li")("a")
print(a)


Because the filtered result is still a PyQuery object, we can keep filtering it. The example above first selects the li tags, then selects the a tags inside them, and prints the result.

Shorthand chain: the descendant selector

a = p("li a")
print(a)

The output is the same as above. This uses the CSS descendant selector syntax: it selects every a tag contained in an li tag.

Class selector

a = p(".aaa a") # class="aaa"
print(a)

The result is every a tag inside an element whose class is aaa.

ID selector

a = p("#qq a") # id="qq"
print(a)

The result is every a tag inside the element whose id is qq.

Attribute/text selection (key point)

href = p("#qq a").attr('href')  # get an attribute
text = p("#qq a").text()  # get the text
print(href)
print(text)

This gets the href attribute of the a tag inside the element whose id is qq, and the text that tag contains.

Note: if several tags match, attr() only returns the attribute of the first one:

# Pitfall: when several tags match, attr() returns only the first tag's value
href = p("li a").attr("href")
print(href)

With several matching tags, attr() stops at the first match and returns only http://www.google.com, so this is not the way to collect the attribute from every tag.

Getting attributes from multiple tags

# Get the attribute from every matched tag
it = p("li a").items()
for item in it:  # each item is a PyQuery object wrapping one tag
    href = item.attr("href")  # the href attribute
    text = item.text()
    print(text, href)

items() returns a generator that yields each matched tag as its own PyQuery object. Looping over it, we read every tag's href attribute and text and print them.

Quick summary

1. pyquery_object(selector): select elements with a CSS selector
2. items(): use it when the selector matches many elements and you need to handle them one by one
3. attr(attribute_name): get an attribute value
4. text(): get the text


PyQuery's power feature: modifying the source code

PyQuery's biggest difference from other query tools is that it can also modify the source of a web page to make it “tidy”, which in turn makes extracting information and similar operations easier.

Consider the following example:

Adding tags

html = """
<HTML>
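<div class="aaa">Dadada</div>
<div class="bbb">Dududu</div>
</HTML>
"""
p = PyQuery(html)
# after(): insert a new tag right after the selected tag, as its sibling
p("div.aaa").after("""<div class="ccc">Hohoho</div>""")
# append(): insert a new tag inside the selected tag, after its existing content
p("div.aaa").append("""<span>I love you</span>""")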
print(p)

The after() call inserts the given HTML right after the div with class aaa, as its next sibling.

The append() call inserts the given HTML inside that div, at the end of its existing content.

Modifying/adding attributes

p("div.bbb").attr("class", "ccc")  # modify an attribute
p("div.ccc").attr("id", "12306")  # add an attribute (assuming the tag does not have it yet)
print(p)

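Both modifying and adding attributes use the attr() method: if the tag already has the attribute, its value is modified; if not, the attribute is added.

This works much like assigning to a key in a Python dict: an existing key gets a new value, and a missing key is created.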

Removing attributes/tags, etc.

p("div.ccc").remove_attr("id")  # remove an attribute
p("div.aaa").remove()  # remove a tag
print(p)

The code above removes the id attribute from every div with class ccc, and removes the div with class aaa itself.


With the methods above you can remove the selected tags themselves, their attributes, their classes, and other fields (rarely needed).


Summary

In this section we covered PyQuery's basic usage and its distinctive features. The next section is a hands-on exercise that uses real examples to show when PyQuery is the best fit (namely the case mentioned above: tidying the page source so that information is easy to extract).

Copyright notice: this article was written by [Vec_Kun]. Please include a link to the original when reposting. Thank you. https://qdmana.com/2023/025/202301252104213848.html