#python
Articles with this tag
Spark SQL is a component of Apache Spark that works with tabular data.
# Load data from file
df = spark.read.csv("trains.csv", header=True)
# Create...
To interact with AWS from Python, use the Boto3 library.
1. AWS S3
S3 is the AWS object storage service.
import boto3
# Generate the boto3 client for...
Spark is a tool for parallel computation on large datasets: it lets you spread data and computations over clusters with multiple nodes.
pyspark...
Airflow: a platform to programmatically author, schedule, and monitor workflows.
DAG (directed acyclic graph): a workflow made up of tasks with dependencies.
Define a DAG in Python:
from airflow.models import DAG
from...
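The idea of "tasks with dependencies" can be seen without Airflow at all. The following is a minimal, Airflow-independent sketch using Python's standard-library graphlib (Python 3.9+); it is not Airflow's API, and the task names are hypothetical. It computes an execution order that respects every dependency and rejects graphs that contain a cycle, which is what the "acyclic" in DAG rules out.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical tasks: each key runs only after every task in its set.
dependencies = {
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

def run_order(deps):
    """Return one execution order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())

def is_dag(deps):
    """True if the dependency graph has no cycle, i.e. it is a valid DAG."""
    try:
        TopologicalSorter(deps).prepare()  # prepare() raises CycleError on a cycle
        return True
    except CycleError:
        return False

order = run_order(dependencies)
print(order)  # ['extract', 'transform', 'load', 'report']
print(is_dag({"a": {"b"}, "b": {"a"}}))  # False: a and b wait on each other
```

A scheduler such as Airflow does much more (retries, scheduling intervals, operators), but the ordering constraint is the same.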
Download data using curl (https://curl.haxx.se/download.html)
curl -O https://websitename.com/file001.txt  # -O: save the file under its remote name
curl...
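For comparison, here is a rough Python analogue of `curl -O` using only the standard library (urllib). This is a swapped-in technique, not curl itself: the helper names `remote_filename` and `download` are hypothetical, and the naming rule (last segment of the URL path) only approximates curl's behaviour.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse

def remote_filename(url):
    """Approximate curl -O naming: take the last segment of the URL path."""
    name = os.path.basename(urlparse(url).path)
    if not name:
        raise ValueError(f"URL has no file name to save under: {url}")
    return name

def download(url, dest_dir="."):
    """Download url into dest_dir under its remote file name; return the path."""
    path = os.path.join(dest_dir, remote_filename(url))
    with urllib.request.urlopen(url) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)  # stream the body to disk
    return path

print(remote_filename("https://websitename.com/file001.txt"))  # file001.txt
```

Calling `download("https://websitename.com/file001.txt")` would need network access and a real URL; the name logic can be checked offline.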
1. Importing Data from Flat Files and Spreadsheets
Use read_csv for all flat files:
import pandas as pd
data = pd.read_csv('file.csv')
data =...
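The reason read_csv covers "all flat files" is its `sep` parameter: the same function reads comma-, tab-, or otherwise-delimited text. A minimal sketch, assuming small hypothetical in-memory data (via io.StringIO) in place of a real 'file.csv':

```python
import io

import pandas as pd

# Hypothetical data standing in for a CSV file and a tab-separated file.
csv_text = "name,count\nalpha,1\nbeta,2\n"
tsv_text = "name\tcount\nalpha\t1\nbeta\t2\n"

# Comma is the default delimiter.
df_csv = pd.read_csv(io.StringIO(csv_text))
# Any other flat file: same function, different separator.
df_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(df_csv.equals(df_tsv))  # True: same columns, values, and dtypes
```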