Introduction
Python is increasingly used in scientific applications traditionally dominated by R, MATLAB, Stata, SAS, and other commercial or open-source research environments.
The maturity and stability of the fundamental numerical libraries NumPy, SciPy, and others, the quality of documentation, and the availability of “kitchen-sink” distributions have made Python accessible and convenient for a broad audience.
Matplotlib integrated with IPython provides an interactive research and development environment with data visualization suitable for most users.
Pandas
The pandas library, under development since 2008, is intended to close the gap in the richness of available data analysis tools between Python, a general purpose systems and scientific computing language, and the numerous domain-specific statistical computing platforms and database languages.
It aims to provide equivalent functionality and to implement many features, such as automatic data alignment and hierarchical indexing, which are, to our knowledge, not readily available in such a tightly integrated way in other libraries or computing environments.
While initially developed for financial data analysis applications, pandas will enable scientific Python to be a more attractive and practical statistical computing environment for academic and industry practitioners.
The library’s name derives from panel data, a common term for multidimensional data sets encountered in statistics and econometrics.
I/O with Pandas
Installing Pandas
!pip install pandas
Importing Pandas
import pandas as pd
Reading Data with Pandas
Note: To run the code below, we need raw data in CSV, XLSX, SQL, and JSON formats. Make sure the file names match those used in the code, for example data.csv, data.xlsx, and data.json.
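If no sample files are at hand, they can be generated first. The following is a minimal sketch that uses only the Python standard library. The helper name create_sample_files and the sample rows are assumptions made for illustration; the file names and the players table name follow the examples in this article. No Excel file is created here because the standard library has no Excel writer; data.xlsx can be produced with to_excel(), covered in the writing section of this article.

```python
import csv
import json
import sqlite3
from pathlib import Path

# Hypothetical sample rows, used only for illustration.
ROWS = [
    {"ID": 1, "Name": "Aditya", "Age": 19},
    {"ID": 2, "Name": "Shubh", "Age": 20},
    {"ID": 3, "Name": "Hritik", "Age": 23},
]


def create_sample_files(directory="."):
    """Write data.csv, data.json and codingninjas.db into `directory`."""
    base = Path(directory)

    # CSV: a header row followed by one line per record.
    with open(base / "data.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["ID", "Name", "Age"])
        writer.writeheader()
        writer.writerows(ROWS)

    # JSON: a list of objects, one per record.
    with open(base / "data.json", "w") as f:
        json.dump(ROWS, f)

    # SQLite: a 'players' table matching the query used in this article.
    connection = sqlite3.connect(str(base / "codingninjas.db"))
    try:
        connection.execute("DROP TABLE IF EXISTS players")
        connection.execute(
            "CREATE TABLE players (ID INTEGER, Name TEXT, Age INTEGER)"
        )
        connection.executemany(
            "INSERT INTO players VALUES (:ID, :Name, :Age)", ROWS
        )
        connection.commit()
    finally:
        connection.close()


if __name__ == "__main__":
    # Writes the three files into the current working directory.
    create_sample_files()
```

After running this, the CSV, SQL, and JSON reading examples have files to work with.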
1. Reading Data from CSV
import pandas as pd
data = pd.read_csv('data.csv')
print(data)
2. Reading Data from Excel
import pandas as pd
data = pd.read_excel('data.xlsx')
print(data)
3. Reading Data from SQL Database
import pandas as pd
import sqlite3
connection = sqlite3.connect('codingninjas.db')
query = "SELECT * FROM players"
data = pd.read_sql(query, connection)
connection.close()
print(data)
4. Reading Data from JSON
import pandas as pd
data = pd.read_json('data.json')
print(data)
Writing Data to Different Formats
Writing Data to CSV
import pandas as pd
data = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Aditya', 'Shubh', 'Hritik'], 'Age': [19, 20, 23]})
data.to_csv('output.csv', index=False)
We use Pandas to create a DataFrame named 'data' with three columns: 'ID', 'Name', and 'Age'. Then we use the to_csv() method to save this DataFrame as a CSV file named 'output.csv'. The index=False argument ensures that the DataFrame's index is not written to the CSV file.
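The effect of index=False is easiest to see by comparing the two outputs side by side. A minimal sketch, assuming the same sample DataFrame; it relies on to_csv() returning the CSV text as a string when no file path is given, so no files are written.

```python
import pandas as pd

data = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Aditya', 'Shubh', 'Hritik'], 'Age': [19, 20, 23]})

# With no path, to_csv() returns the CSV text instead of writing a file.
with_index = data.to_csv()
without_index = data.to_csv(index=False)

# Compare the header rows: the default output has an extra, unnamed
# leading column that holds the index labels 0, 1, 2.
print(with_index.splitlines()[0])     # ,ID,Name,Age
print(without_index.splitlines()[0])  # ID,Name,Age
```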
Writing Data to Excel
import pandas as pd
data = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Aditya', 'Shubh', 'Hritik'], 'Age': [19, 20, 23]})
data.to_excel('output.xlsx', index=False)
We use Pandas to create a DataFrame named 'data' with three columns: 'ID', 'Name', and 'Age'. Then we use the to_excel() method to save this DataFrame as an Excel file named 'output.xlsx'. The index=False argument ensures that the DataFrame's index is not written to the Excel file.
Writing Data to SQL
To write a DataFrame to an SQL database, we import the Pandas and sqlite3 libraries and create a DataFrame called 'data'. We then connect to an SQLite database named 'codingninjas.db' and write the DataFrame to a table named 'players', replacing the table if it already exists. Finally, we close the database connection.
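The steps above can be sketched as follows. This is a minimal sketch rather than a definitive listing: the sample data mirrors the other examples in this article, while index=False and the read-back check at the end are assumptions added for illustration.

```python
import sqlite3

import pandas as pd

data = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Aditya', 'Shubh', 'Hritik'], 'Age': [19, 20, 23]})

connection = sqlite3.connect('codingninjas.db')
try:
    # if_exists='replace' drops and recreates the table if it already exists.
    # index=False (an assumption here) keeps the DataFrame index out of the table.
    data.to_sql('players', connection, if_exists='replace', index=False)

    # Read the table back to confirm what was written.
    check = pd.read_sql("SELECT * FROM players", connection)
finally:
    connection.close()

print(check)
```

Other values of if_exists are 'fail' (the default, which raises an error if the table exists) and 'append' (which adds the rows to the existing table).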
Writing Data to JSON
import pandas as pd
data = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Aditya', 'Shubh', 'Hritik'], 'Age': [19, 20, 23]})
data.to_json('output.json', orient='records')
We use Pandas to create a DataFrame named 'data' with three columns: 'ID', 'Name', and 'Age'. Then we use the to_json() method to save this DataFrame as a JSON file named 'output.json'. The orient='records' parameter specifies the JSON layout: the file contains a single list in which each row of the DataFrame becomes a separate JSON object.
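To make the role of orient concrete, the sketch below compares orient='records' with orient='columns', which is the default for a DataFrame. It assumes the same sample DataFrame and relies on to_json() returning a string when no path is given; the result is parsed with the standard json module so the structure can be inspected.

```python
import json

import pandas as pd

data = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Aditya', 'Shubh', 'Hritik'], 'Age': [19, 20, 23]})

# With no path, to_json() returns the JSON text instead of writing a file.
records = json.loads(data.to_json(orient='records'))
columns = json.loads(data.to_json(orient='columns'))

# orient='records': a list with one object per row.
print(records[0])       # {'ID': 1, 'Name': 'Aditya', 'Age': 19}

# orient='columns': one object per column, keyed by index label.
print(columns['Name'])  # {'0': 'Aditya', '1': 'Shubh', '2': 'Hritik'}
```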
Frequently Asked Questions
Can Pandas work with databases?
Yes, Pandas can work with various databases using libraries like SQLAlchemy. We can read data from databases using pd.read_sql() and write data using to_sql().
How to import Pandas in a Python script?
We can import Pandas in our script with import pandas as pd. The alias pd is the conventional short name for Pandas and keeps code concise.
Is Pandas suitable for handling huge datasets?
Pandas is optimized for in-memory data processing, so performance can become a problem with very large datasets. For big data, distributed computing frameworks like Apache Spark might be more suitable.
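Before moving to a distributed framework, one option within Pandas itself is to read a file in pieces. A minimal sketch, assuming a CSV with the same columns as the earlier examples: chunksize is an existing parameter of read_csv(), but the in-memory sample data and the chunk size of 2 are chosen only for illustration.

```python
import io

import pandas as pd

# A small in-memory CSV standing in for a file too large to load at once.
csv_text = "ID,Name,Age\n1,Aditya,19\n2,Shubh,20\n3,Hritik,23\n"

total_age = 0
rows_seen = 0

# chunksize=2 makes read_csv yield DataFrames of at most two rows each,
# so only one chunk is held in memory at a time.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total_age += int(chunk['Age'].sum())
    rows_seen += len(chunk)

print(rows_seen, total_age)  # 3 62
```

This approach suits aggregations that can be computed incrementally; operations that need the whole dataset at once still require enough memory or a different tool.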
Conclusion
Mastering input and output operations with Pandas opens a world of data manipulation and analysis possibilities. By seamlessly reading and writing data from and to various sources, Pandas empowers users to harness the insights hidden within datasets.
In this article, we introduced Pandas and covered how to read and write data with it in CSV, Excel, SQL, and JSON formats.