AI Glossary
The Definitive Glossary for Understanding Artificial Intelligence (AI)
Term | Definition |
---|---|
Algorithm | A set of rules or steps followed to solve a problem or perform a task. |
Artificial Intelligence (AI) | Simulation of human intelligence by machines, especially computers. |
Machine Learning | A subset of AI where machines improve performance through experience. |
Deep Learning | A type of machine learning using neural networks with many layers. |
Neural Network | A model made of layers of interconnected nodes (neurons), loosely inspired by the human brain, that learns to recognize patterns. |
Supervised Learning | Machine learning with labeled data for training. |
Unsupervised Learning | Machine learning with unlabeled data, finding hidden patterns. |
Reinforcement Learning | Learning by interacting with an environment and receiving rewards or penalties. |
Natural Language Processing (NLP) | AI that understands and processes human language. |
Computer Vision | AI that interprets and understands visual information from the world. |
Data Mining | The process of discovering patterns in large data sets. |
Big Data | Extremely large data sets analyzed to reveal patterns and trends. |
Predictive Analytics | Using data, statistical algorithms, and machine learning techniques to predict future outcomes. |
Classification | Assigning data into predefined categories. |
Regression | A statistical method for predicting a continuous outcome. |
Clustering | Grouping data points into clusters based on similarity. |
Anomaly Detection | Identifying rare items, events, or observations which differ significantly from the majority of the data. |
Decision Tree | A tree-structured model that makes predictions by applying a sequence of if-then rules learned from the data. |
Random Forest | An ensemble of decision trees used for classification and regression. |
Support Vector Machine (SVM) | A supervised learning model used for classification and regression analysis. |
K-Nearest Neighbors (KNN) | A simple algorithm that stores all available cases and classifies a new case by majority vote among its k most similar stored cases. |
Gradient Descent | An optimization algorithm used to minimize the cost function in machine learning models. |
Overfitting | When a model fits the training data too closely, including its noise and incidental details, and performs worse on new data as a result. |
Underfitting | When a model is too simple and cannot capture the underlying pattern of the data. |
Cross-Validation | A technique for assessing how the results of a statistical analysis will generalize to an independent data set. |
Training Data | Data used to train machine learning models. |
Test Data | Data used to test the trained model's performance. |
Validation Data | Data held out from training and used to tune hyperparameters and compare candidate models. |
Hyperparameter | A parameter set before the learning process begins that controls how the model learns. |
Feature Engineering | The process of using domain knowledge to extract features from raw data. |
Feature Selection | The process of selecting a subset of relevant features for model construction. |
Dimensionality Reduction | Reducing the number of random variables under consideration by obtaining a set of principal variables. |
Principal Component Analysis (PCA) | A dimensionality reduction technique that projects data onto the orthogonal directions of greatest variance. |
Linear Regression | A linear approach to modeling the relationship between a dependent variable and one or more independent variables. |
Logistic Regression | A statistical model that uses a logistic function to model a binary dependent variable. |
Bias | Error introduced by approximating a complex real-world problem with an overly simple model. |
Variance | Error introduced by the model's sensitivity to small fluctuations in the training set. |
Loss Function | A function that measures how far a model's predictions are from the actual values. |
Gradient Boosting | A machine learning technique for regression and classification problems that builds a model in a stage-wise fashion, with each stage correcting the errors of the previous ones. |
AdaBoost | A boosting algorithm that combines multiple weak classifiers to create a strong classifier. |
Bagging | A technique that trains multiple models on bootstrap samples of the training data and aggregates their predictions to reduce variance. |
Ensemble Learning | Combining multiple models to achieve better performance than any single model alone. |
Convolutional Neural Network (CNN) | A deep learning model that applies learned convolutional filters to an input image, assigning importance to various aspects of the image so it can differentiate one from another. |
Recurrent Neural Network (RNN) | A type of neural network where connections between nodes can create cycles, allowing output from some nodes to affect subsequent input to the same nodes. |
Long Short-Term Memory (LSTM) | A type of RNN architecture that uses gated memory cells to retain information over long sequences, addressing the long-term dependency problem. |
Autoencoder | A type of artificial neural network used to learn efficient codings of input data. |
Generative Adversarial Network (GAN) | A class of machine learning frameworks in which two neural networks, a generator and a discriminator, compete with each other to generate new data. |
Transfer Learning | A machine learning method where a model developed for a particular task is reused as the starting point for a model on a second task. |
Tokenization | Breaking text into smaller pieces, like words or phrases, for analysis. |
Embedding | A learned vector representation of text in which words with similar meanings have similar representations. |
Bag of Words (BoW) | A representation of text that describes the occurrence of words within a document. |
Term Frequency-Inverse Document Frequency (TF-IDF) | A statistical measure used to evaluate how important a word is to a document in a collection. |
Word2Vec | A group of related models that are used to produce word embeddings. |
Sentence Embedding | A method to represent entire sentences as vectors. |
Named Entity Recognition (NER) | A process in NLP that locates and classifies named entities in text into predefined categories. |
Sentiment Analysis | The process of determining the emotional tone behind a series of words. |
Text Classification | Assigning predefined categories to text. |
Text Generation | Using machine learning to generate new, similar text based on a given input. |
Chatbot | A computer program designed to simulate conversation with human users. |
Speech Recognition | The ability of a machine to identify words and phrases in spoken language and convert them to a machine-readable format. |
Image Recognition | The process of identifying and detecting an object or feature in a digital image or video. |
Object Detection | Identifying and locating objects within an image. |
Image Segmentation | Partitioning a digital image into multiple segments to make it easier to analyze. |
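Generative Model | A model that learns the distribution of the data and can generate new samples from it. |
Discriminative Model | A model that learns to distinguish between classes by predicting the label given the input, without modeling how the data is generated. |
Markov Decision Process (MDP) | A mathematical framework for modeling a sequence of decisions whose outcomes are partly random and partly under the control of the decision maker. |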
Bayesian Network | A graphical model that represents the probabilistic relationships among a set of variables. |
Hidden Markov Model (HMM) | A statistical model where the system being modeled is assumed to be a Markov process with hidden states. |
Fuzzy Logic | A form of many-valued logic dealing with approximate, rather than fixed and exact reasoning. |
Expert System | A computer system that emulates the decision-making ability of a human expert. |
Heuristic | A practical problem-solving approach that finds a good-enough solution quickly, without guaranteeing an optimal one. |
Cognitive Computing | Technologies that mimic human brain function to perform tasks. |
Autonomous Systems | Systems capable of performing tasks without human intervention. |
Robotics | The branch of technology dealing with the design, construction, operation, and application of robots. |
Internet of Things (IoT) | Interconnected devices that communicate and exchange data over the internet. |
Edge Computing | Computing that’s done at or near the source of data. |
Cloud Computing | Delivering computing services over the internet. |
Quantum Computing | Computing using quantum-mechanical phenomena. |
Blockchain | A decentralized digital ledger of transactions. |
Cryptography | The practice of securing communication from third parties. |
Cybersecurity | Protecting systems, networks, and programs from digital attacks. |
Data Science | A field that uses scientific methods, processes, algorithms, and systems to extract knowledge from data. |
Data Engineer | A professional who prepares ‘big data’ for analytical or operational uses. |
Data Analyst | A professional who collects, processes, and performs statistical analyses of data. |
Data Visualization | The graphical representation of information and data. |
Business Intelligence (BI) | Technologies and strategies used by enterprises for data analysis and management. |
Data Warehousing | The process of constructing and using a data warehouse. |
ETL (Extract, Transform, Load) | A process in data warehousing responsible for pulling data out of the source systems and placing it into a data warehouse. |
SQL (Structured Query Language) | A standard language for querying and managing data in relational databases. |
NoSQL | A class of database management systems that store data in non-relational models, such as key-value, document, column-family, or graph. |
Hadoop | An open-source framework for storing and processing big data. |
Spark | An open-source unified analytics engine for big data processing. |
Tableau | A data visualization tool. |
Power BI | A business analytics tool by Microsoft. |
Python | A high-level programming language used for general-purpose programming. |
R | A programming language and software environment for statistical computing and graphics. |
Java | A high-level, class-based, object-oriented programming language. |
C++ | A general-purpose programming language created as an extension of C. |
TensorFlow | An open-source machine learning framework developed by Google. |
PyTorch | An open-source machine learning library originally developed by Facebook (now Meta). |
Keras | An open-source software library that provides a Python interface for neural networks. |
Scikit-learn | A free software machine learning library for the Python programming language. |
OpenCV | An open-source computer vision and machine learning software library. |
API (Application Programming Interface) | A set of functions and procedures allowing the creation of applications that access features or data of an operating system, application, or other service. |
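
Several entries above (Gradient Descent, Loss Function, Linear Regression) fit together in practice. The following is a minimal sketch in plain Python, not a reference implementation: it assumes a mean squared error loss, a fixed learning rate, and a small made-up data set. The function names `mse` and `gradient_descent` are hypothetical and chosen only for illustration.

```python
def mse(w, b, xs, ys):
    """Mean squared error loss for the line y = w * x + b."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by repeatedly stepping against the MSE gradient."""
    w, b = 0.0, 0.0  # initial guess for the model parameters
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move each parameter a small step in the direction that lowers the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = gradient_descent(xs, ys)
print(f"w={w:.3f}, b={b:.3f}, loss={mse(w, b, xs, ys):.6f}")
```

On this toy data the fitted values should land very close to w = 2 and b = 1. Here `learning_rate` and `steps` are hyperparameters in the sense defined above: they are set before training starts, whereas `w` and `b` are learned from the data.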