Python: How to make absolute links in BeautifulSoup
Real HTML data obtained from the web often contains relative internal links, which are inconvenient for further parsing. Whenever you parse a new HTML page to find its internal links, it is therefore worth converting all of them to absolute URLs first. This is done by adding the scheme and domain name and resolving the path, and all of it can be done with Python tools.

For this example we will analyse the page https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes. If you look at the source code of this page, you will find many links whose href holds only a path, without the scheme and domain name. Relative links of this kind should be converted. For example, the link to the ISO 3166-2 article should become: https://en.wikipedia.org/wiki/ISO_3166-2

Downloading HTTP data

HTTP data can be downloaded with the requests library. I always advise giving the web server a proper browser name (the User-Agent header) and a proper referrer (the Referer header). The page itself can be used as the referrer.
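The download step can be sketched as follows. This is a minimal illustration, not necessarily the exact code used for this article: the helper names `build_headers` and `download_html`, the browser string, and the 10 second timeout are all assumptions chosen for the example.

```python
import requests

PAGE_URL = "https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes"


def build_headers(url):
    """Return request headers with a browser-like name and the page as referrer."""
    return {
        # Assumed example browser string; any realistic value can be used.
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
        # The page itself is given as the referrer.
        "Referer": url,
    }


def download_html(url, timeout=10):
    """Download a page and return its HTML text, raising on HTTP errors."""
    response = requests.get(url, headers=build_headers(url), timeout=timeout)
    response.raise_for_status()
    return response.text
```

Calling `download_html(PAGE_URL)` returns the page source as a string, ready to be passed to BeautifulSoup.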
Convert relative path to absolute in BeautifulSoup

Relative links can appear in several HTML tags, most commonly in the href attribute of a and link tags and in the src attribute of img and script tags.
To convert a relative link to its absolute value we can use the urljoin function from the urllib.parse library. This function joins the URL of the current page with a link found on that page to produce an absolute URL. If the link is already in absolute format, the function returns it unchanged.
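A few calls show how urljoin behaves for the kinds of links found on the example page. The example.org address and the image path are made-up values used only for illustration.

```python
from urllib.parse import urljoin

base_url = "https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes"

# Root-relative path: scheme and domain are taken from base_url.
print(urljoin(base_url, "/wiki/ISO_3166-2"))
# https://en.wikipedia.org/wiki/ISO_3166-2

# Already absolute link: returned unchanged.
print(urljoin(base_url, "https://example.org/page"))
# https://example.org/page

# Protocol-relative link: only the scheme is taken from base_url.
print(urljoin(base_url, "//upload.wikimedia.org/example.png"))
# https://upload.wikimedia.org/example.png
```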
Once urljoin has been applied to every tag that carries a link, all links will be in absolute format. The tags are modified in place, so the original soup content is modified as well.
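The conversion described above could look roughly like this. It is a sketch under stated assumptions: the function name `make_links_absolute` and the `LINK_ATTRIBUTES` mapping are hypothetical, and the set of tags can be extended to whatever your pages use.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumed tag/attribute pairs to rewrite; extend as needed.
LINK_ATTRIBUTES = {"a": "href", "link": "href", "img": "src", "script": "src"}


def make_links_absolute(soup, base_url):
    """Rewrite relative URLs in the soup in place and return the same soup."""
    for tag_name, attr in LINK_ATTRIBUTES.items():
        # attrs={attr: True} selects only tags that actually carry the attribute.
        for tag in soup.find_all(tag_name, attrs={attr: True}):
            tag[attr] = urljoin(base_url, tag[attr])
    return soup


html = '<p><a href="/wiki/ISO_3166-2">ISO 3166-2</a> <img src="flag.png"></p>'
soup = BeautifulSoup(html, "html.parser")
make_links_absolute(soup, "https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes")

print(soup.a["href"])    # https://en.wikipedia.org/wiki/ISO_3166-2
print(soup.img["src"])   # https://en.wikipedia.org/wiki/flag.png
```

Because the function assigns to `tag[attr]` directly, no new document is built: the same `soup` object now holds the absolute URLs.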
© 2020 MyCoding.uk - My blog about coding and further learning. This blog was written with pure Perl and front-end output was performed with TemplateToolkit.