Web Technologies QA
Q: What is HTTP, and how does it work in web communication? Describe the structure of HTTP requests and responses.
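A: HTTP (HyperText Transfer Protocol) is the foundation of web communication. It is a stateless protocol used for
transmitting data between a client (usually a browser) and a server. HTTP works on a request-response model, where
the client sends a request and the server returns a response.
HTTP Request:
- Request Line: Includes the HTTP method (GET, POST, etc.), the URL, and the version (e.g., HTTP/1.1)
- Headers: Name-value pairs describing the request (e.g., Host, Content-Type)
- Body (Optional): Contains data sent to the server (e.g., form inputs)
HTTP Response:
- Status Line: Includes the HTTP version, the status code, and a reason phrase (e.g., 200 OK)
- Headers: Name-value pairs describing the response (e.g., Content-Type, Content-Length)
- Body (Optional): Contains the returned content (e.g., an HTML page)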
HTTP enables communication and data transfer across the internet, forming the basis of web browsing.
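The message structure can be sketched in Python without any network access. This is a minimal illustration rather than a full HTTP implementation: the helper names build_request and parse_response, the host example.com, and the sample response text are all assumptions made for the example.

```python
def build_request(method, path, host, body=""):
    """Assemble a raw HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if body:
        lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    # A blank line (CRLF CRLF) separates the headers from the optional body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_response(raw):
    """Split a raw HTTP response into status line parts, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # maxsplit=2 keeps multi-word reason phrases such as "Not Found" intact.
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"version": version, "status": int(code), "reason": reason,
            "headers": headers, "body": body}


request = build_request("GET", "/index.html", "example.com")

# Hypothetical response text, written by hand for this example.
sample_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html>Hello</html>"
)
parsed = parse_response(sample_response)

print(request.splitlines()[0])             # GET /index.html HTTP/1.1
print(parsed["status"], parsed["reason"])  # 200 OK
```

In practice a library such as urllib handles this framing for you. The sketch only shows where the request line, status line, headers, and body sit in the raw text.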
Q: How can you retrieve web pages and images using the urllib library in Python?
A: The urllib library in Python is used to access URLs and perform HTTP requests. To retrieve web pages, you can use
urllib.request.urlopen():
import urllib.request
url = 'https://www.example.com'
response = urllib.request.urlopen(url)
html = response.read().decode('utf-8')
To download an image and save it to disk, you can use urllib.request.urlretrieve():
import urllib.request
image_url = 'https://www.example.com/image.jpg'
urllib.request.urlretrieve(image_url, 'image.jpg')
These methods are useful for web scraping and downloading content from the web.
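A slightly more defensive version of the page fetch might close the connection automatically and use the character set the server declares instead of assuming UTF-8. The sketch below is one way to do that. The function name fetch_text is hypothetical, and a data: URL stands in for a real page so that the example runs offline.

```python
import urllib.request


def fetch_text(url, default_charset="utf-8"):
    """Fetch a URL and decode the body using the charset the server declares."""
    # The with-block closes the response even if decoding raises an error.
    with urllib.request.urlopen(url) as response:
        # Fall back to default_charset when the Content-Type header names none.
        charset = response.headers.get_content_charset() or default_charset
        return response.read().decode(charset)


# Offline stand-in for a real page; replace with an http(s) URL in practice.
demo_url = "data:text/plain;charset=utf-8,Hello%20web"
html = fetch_text(demo_url)
print(html)  # Hello web
```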
Q: Why are parsing HTML and web scraping important?
A: Parsing HTML and web scraping are important because they allow the extraction of useful data from web pages. This
makes it possible to:
- Access Public Data: Gather data like prices, reviews, and articles.
- Enable Research and Analysis: Extract data for machine learning or trend analysis.
In short, web scraping and HTML parsing unlock structured data from unstructured sources on the web.
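Q: What is the difference between using regular expressions and BeautifulSoup to parse HTML?
A: Regular Expressions:
- Match raw text patterns and have no notion of HTML structure, so they break easily on nested or irregular markup.
BeautifulSoup:
- Builds a parse tree from the document, so elements can be located by tag, attribute, or position.
Conclusion: Use BeautifulSoup for robust and structured HTML parsing. Regular expressions should only be used for
simple, well-defined patterns.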
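The difference can be seen on a small snippet. BeautifulSoup is a third-party package, so this sketch substitutes Python's built-in html.parser module as the structured parser. It is not BeautifulSoup, but it shows the same contrast with a regular expression. The sample HTML and the class name LinkCollector are made up for the example.

```python
import re
from html.parser import HTMLParser

# Two links: one with a double-quoted href, one with a single-quoted href.
html = '<a class="nav" href="/home">Home</a> <a href=\'/about\' id="x">About</a>'

# Regex approach: this pattern assumes double quotes, so it misses the second link.
regex_links = re.findall(r'<a[^>]*href="([^"]*)"', html)


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags using a real HTML tokenizer."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


collector = LinkCollector()
collector.feed(html)
parser_links = collector.links

print(regex_links)   # ['/home']
print(parser_links)  # ['/home', '/about']
```

The regular expression could be extended to accept single quotes, but each such variation in the markup needs another patch to the pattern. The parser handles them as part of tokenizing the HTML.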
Q: What is XML, and how is it used for data? Show how to parse XML in Python.
A: XML (eXtensible Markup Language) is a format for storing and transporting structured data. It uses custom tags and
a tree structure.
Use Cases:
- Data interchange
- Configuration files
Example:
import xml.etree.ElementTree as ET
xml_data = """<students><student><name>John</name></student></students>"""
root = ET.fromstring(xml_data)  # root is the <students> element
student = root.find('student')  # first <student> child
name = student.find('name').text
print(name)
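Real XML documents usually contain several records and often carry data in attributes as well as in child elements. The following sketch extends the idea to that case. The id attribute, the grade element, and the second student are invented sample data.

```python
import xml.etree.ElementTree as ET

# Sample document made up for illustration.
xml_data = """
<students>
  <student id="1"><name>John</name><grade>A</grade></student>
  <student id="2"><name>Maria</name><grade>B</grade></student>
</students>
"""

root = ET.fromstring(xml_data)

records = []
# findall() returns every direct <student> child of the root element.
for student in root.findall("student"):
    records.append({
        "id": student.get("id"),              # attribute value
        "name": student.findtext("name"),     # text of the <name> child
        "grade": student.findtext("grade"),   # text of the <grade> child
    })

for record in records:
    print(record["id"], record["name"], record["grade"])
```

Here findtext() returns None instead of raising an error when a child element is missing, which tends to be convenient for records with optional fields.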
Q: What is JSON, and why is it better than XML? Show how to parse JSON in Python.
A: JSON (JavaScript Object Notation) is a lightweight format for data exchange. It's easier to read, write, and parse than
XML.
Advantages over XML:
- Cleaner syntax
- Smaller size
Example:
import json
json_data = '{"name": "John"}'  # sample input, assumed for illustration
data = json.loads(json_data)
print(data['name'])
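JSON values map directly onto Python types, which is a large part of why parsing it feels simple. The sketch below shows nested data, conversion back to text with json.dumps(), and one way to handle malformed input. The sample record and the helper name parse_or_none are assumptions for the example.

```python
import json

# Sample record made up for illustration.
json_data = '{"name": "John", "courses": ["Math", "Physics"], "active": true}'

student = json.loads(json_data)
# JSON object -> dict, array -> list, true -> True
first_course = student["courses"][0]

# Modify the Python object and serialize it back to JSON text.
student["courses"].append("Chemistry")
encoded = json.dumps(student, indent=2)


def parse_or_none(text):
    """Return the parsed value, or None when the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


print(first_course)  # Math
print(encoded)
```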
Q: How do web services retrieve external data?
A: Web services retrieve external data by making HTTP requests to external APIs. The response is usually in JSON or
XML format.
Example:
import requests
url = 'https://api.openweathermap.org/data/2.5/weather?q=London&appid=API_KEY'
response = requests.get(url)
if response.status_code == 200:
    data = response.json()
    print(data['main']['temp'])
This allows web applications to use real-time data from other systems.
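Two parts of this flow can be tried without a network connection or an API key: building the query string safely and pulling a field out of the JSON body. The sketch below uses only the standard library. The helper names and the sample payload are hypothetical, and the payload is hand-written rather than real API output.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.openweathermap.org/data/2.5/weather"


def build_weather_url(city, api_key):
    """Build the request URL, letting urlencode escape spaces and symbols."""
    query = urlencode({"q": city, "appid": api_key})
    return f"{BASE_URL}?{query}"


def extract_temperature(payload_text):
    """Return main.temp from a JSON payload, or None when the field is absent."""
    data = json.loads(payload_text)
    return data.get("main", {}).get("temp")


url = build_weather_url("New York", "API_KEY")

# Hand-written sample shaped like the response used in the example above.
sample_payload = '{"name": "London", "main": {"temp": 280.32}}'
temperature = extract_temperature(sample_payload)

print(url)
print(temperature)  # 280.32
```

Using .get() with a default avoids a KeyError when the service returns an error object instead of the expected data.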
Q: How can you read binary files with urllib? What are common use cases?
A: Binary files (images, PDFs, audio) can be read with urllib by using urlopen() and reading the content as bytes.
Example:
import urllib.request
url = 'https://example.com/file.jpg'
response = urllib.request.urlopen(url)
data = response.read()
with open('file.jpg', 'wb') as f:  # 'wb' writes the raw bytes unchanged
    f.write(data)
Use Cases:
- Downloading images
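For large files, reading the whole response into memory with read() may be wasteful. One alternative is to stream the response to disk in chunks, as sketched below. The function name download_binary, the chunk size, and the output file name are assumptions, and a base64 data: URL stands in for a remote file so that the example runs offline.

```python
import base64
import os
import shutil
import tempfile
import urllib.request


def download_binary(url, dest_path, chunk_size=64 * 1024):
    """Stream a URL to disk in chunks instead of holding it all in memory."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return os.path.getsize(dest_path)


# Offline stand-in for a remote file: four raw bytes wrapped in a data: URL.
payload = bytes([0x89, 0x50, 0x4E, 0x47])
demo_url = ("data:application/octet-stream;base64,"
            + base64.b64encode(payload).decode("ascii"))

dest = os.path.join(tempfile.mkdtemp(), "demo.bin")
size = download_binary(demo_url, dest)
print(size)  # 4
```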