I need to scrape every article, with its title and its paragraphs, from this page: https://portaljuridic.gencat.cat/eli/es-ct/l/2014/12/29/19

The problem is that I tried selecting some of the div, h3 and p elements, but nothing is returned.

    import requests  # was missing; needed for requests.get below
    from bs4 import BeautifulSoup
    import lxml
    import pandas as pd
    from tqdm import tqdm_notebook

    def parse_url(url):
        # Download the page and parse the raw HTML with the lxml parser
        response = requests.get(url)
        content = response.content
        parsed_response = BeautifulSoup(content, "lxml")
        return parsed_response

    url = "https://portaljuridic.gencat.cat/eli/es-ct/l/2014/12/29/19"
    soup = parse_url(url)

    # Look for the container that should hold the article text
    article = soup.find("div", {"class": "article-document"})
    article

The page seems to be rendered with JavaScript, but I don't know how to get its content.

    1 Answer

    The website makes 3 API calls to get the data.
    The code below makes the same calls and retrieves the data.

    (In the browser, press F12 -> Network -> XHR to see the API calls.)

    import requests

    # Call 1: traceability list for the document
    payload1 = {'language': 'ca', 'documentId': 680124}
    r1 = requests.post('https://portaldogc.gencat.cat/eadop-rest/api/pjc/getListTraceabilityStandard', data=payload1)
    if r1.status_code == 200:
        print(r1.json())
    print('------------------')

    # Call 2: validity list for the document
    payload2 = {'documentId': 680124, 'orderBy': 'DESC', 'language': 'ca', 'traceability': '02'}
    r2 = requests.post('https://portaldogc.gencat.cat/eadop-rest/api/pjc/getListValidityByDocument', data=payload2)
    if r2.status_code == 200:
        print(r2.json())
    print('------------------')

    # Call 3: the document itself
    payload3 = {'documentId': 680124, 'traceabilityStandard': '02', 'language': 'ca'}
    r3 = requests.post('https://portaldogc.gencat.cat/eadop-rest/api/pjc/documentPJC', data=payload3)
    if r3.status_code == 200:
        print(r3.json())

    • Hi balderman, thanks for your help and explanation. May I ask one more question? I'm sorry, I'm really new to this. Some parts of the text have special characters like ' or `, and in the extracted text they show up with an &. How can I change them back to the actual characters? Thanks again for the support.
      – Merinoide
      Commented Oct 13, 2021 at 5:49
    • I am not sure I understand the question. Can you come up with a specific example?
      – balderman
      Commented Oct 13, 2021 at 7:05
    • Hi balderman. For example, when I extract the first article, inside 'text': '<p align="JUSTIFY">\n\t1. the text starts with "Aquesta llei t&eacute; per objecte", but on the website it appears as "1. Aquesta llei té per objecte:". How can I change this so that I see "Aquesta llei té per objecte:" instead of "Aquesta llei t&eacute; per objecte"? Thanks for your support.
      – Merinoide
      Commented Oct 13, 2021 at 9:55
    • Well... I have no idea. Sorry.
      – balderman
      Commented Oct 13, 2021 at 10:27
    • Hi balderman. OK, I will look and see what I find. Thanks a lot for your help and support!
      – Merinoide
      Commented Oct 13, 2021 at 10:41
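
On the entity question raised in the comments: sequences such as t&eacute; look like HTML character references, so one possible approach is to decode them with Python's standard library. The sketch below is only an illustration under that assumption. The `sample` string is a made-up fragment modeled on the comment (the closing `</p>` is added here), and `html_to_text` is a hypothetical helper name, not something the API or the answer provides.

```python
from html import unescape
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text between tags; character references are decoded."""

    def __init__(self):
        # convert_charrefs=True turns &eacute; into é before handle_data runs
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment: str) -> str:
    """Hypothetical helper: drop the tags, decode entities, tidy whitespace."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    # Collapse newlines, tabs and repeated spaces into single spaces
    return " ".join("".join(parser.parts).split())


# Made-up sample modeled on the 'text' value quoted in the comments
sample = '<p align="JUSTIFY">\n\t1. Aquesta llei t&eacute; per objecte:</p>'

print(unescape("t&eacute;"))   # decodes entities only, tags are left alone
print(html_to_text(sample))    # decodes entities and removes the tags
```

If only the entities need decoding and the tags should stay, `html.unescape` on its own is probably enough; the small parser is there for the case where the plain text is wanted.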
