Beautifulsoup Select All Href In Some Element With Specific Class

July 30, 2023 Post a Comment

I'm trying to scrap images from this website. I tried with Scrapy(using Docker)and with scrapy/slenium. Scrapy seems not to work in windows10 home so I'm now trying with Selenium/B

Solution 1:

you can get href by class name as:

que1:

for link in soup.findAll('a', {'class': 'emblem'}):
   try:
      print link['href']
   except KeyError:
      pass`

Solution 2:

Try this. It will give you all the urls traversing all the pages in that site. I've used Explicit Wait to make it faster and dynamic.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
url = "http://emblematica.grainger.illinois.edu/"
wait = WebDriverWait(driver, 10)
driver.get("http://emblematica.grainger.illinois.edu/browse/emblems?Filter.Collection=Utrecht&Skip=0&Take=18")
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".emblem")))

whileTrue:
    soup = BeautifulSoup(driver.page_source,"lxml")
    for item in soup.select('.emblem'):
        links = url + item['href']
        print(links)

    try:
        link = driver.find_element_by_id("next")
        link.click()
        wait.until(EC.staleness_of(link))
    except Exception:
        break
driver.quit()

Partial output:

http://emblematica.grainger.illinois.edu/detail/emblem/av1615001
http://emblematica.grainger.illinois.edu/detail/emblem/av1615002
http://emblematica.grainger.illinois.edu/detail/emblem/av1615003

Solution 3:

Not sure if above answers did the job. Here is one which does the work for me.

url = "SOME-URL-YOU-WANT-TO-SCRAPE"response = requests.get(url=url)
urls = BeautifulSoup(response.content, 'lxml').find_all('a', attrs={"class": ["YOUR-CLASS-NAME"]}, href=True)

Html5 Academy

Beautifulsoup Select All Href In Some Element With Specific Class

Solution 1:

Solution 2:

Solution 3:

Post a Comment for "Beautifulsoup Select All Href In Some Element With Specific Class"